Copyright © 1999-2006 Steven J. Zeil, Old Dominion University

In the first section of this course, we looked at the process of analyzing the worst-case running time of an algorithm, and the use of worst-case analysis and big-O notation as a way of describing it.

In this section, we will introduce average-case complexity. Just as worst-case complexity describes an upper bound on the time we would see in the worst case, average-case complexity provides an upper bound on the average time we would see when running the program many times on many different inputs.

It is true that worst-case complexity gets used more often than average-case. There are a number of reasons for this.

The worst-case complexity is often easier to compute than the average case. Just figuring out what an “average” set of inputs will look like is often a challenge. To figure out the worst case complexity, we only need to identify that one single input that results in the slowest running.

In many cases, the worst-case and average-case complexities will turn out to be the same.

Finally, reporting the worst case to your boss or your customers is often “safer” than reporting the average. If you give them the average, then sometimes they will run the program and see slower performance than they had expected. Human nature being what it is, they will probably get rather annoyed. On the other hand, if you go to those same customers with the worst-case figure, most of the time they will observe faster-than-expected behavior, and will be more pleased.

This appears to be particularly true of interactive programs. When people are actually sitting there typing things in or clicking with the mouse and then waiting for a response, if they have to sit for a long time waiting for a response, they're going to remember that. Even if 99.9% of the time they get instant response (so the average response is still quite good), they will characterize your program as “sluggish”.

In those circumstances, it makes sense to focus on worst-case behavior and to do what we can to improve that worst case.

On the other hand, suppose we're talking about a batch
program that will process thousands of inputs per run, or we're talking
about a critical piece of an interactive program that gets run hundreds or
thousands of times in between *each* response from the
user. In that situation, adding up hundreds or thousands of worst-cases may
be just too pessimistic. In that circumstance, the cumulative time of
thousands of different runs should show some averaging out of the worst-case
behavior, and an average case analysis may give a more realistic picture of
what the user will be seeing.

We say that an algorithm requires *average time proportional to f(n)* (or that it has average-case complexity O(f(n))) if there are constants c and n_{0} such that the average time the algorithm requires to process an input set of size n is no more than c*f(n) time units whenever n ≥ n_{0}.

This definition is very similar to the one for worst case complexity.

The biggest difference is that we deal in the average time to process input sets of size n, instead of the maximum (worst case) time.

But note that we are still looking for an *upper
bound* on the algorithm's behavior.

The average case complexity describes how quickly the average time increases when n increases, just as the worst case complexity describes how quickly the worst case time increases when n increases.

**Question:** Suppose we have an
algorithm with worst case complexity O(n).

True or false: It is possible for that algorithm to have average-case complexity O(n^{2}) (and no better).

**False**. If it were true, the
average case time would be increasing faster than the worst case time.
Eventually, that means that the average case time for some large
n would be larger than the worst case among all input
sets of size n. But an average can never be larger than the maximum
value in the set being averaged.

So the only way this statement could be true is in the trivial
sense that any O(n) is also O(n^{2}). That trivial case is ruled out by the “no
better than” stipulation.

This is an important idea to keep in mind as we discuss the rules for average case analysis. The worst-case analysis rules all apply, because they do provide an upper bound. But that bound is sometimes not as tight as it could be. One of the things that makes average case analysis tricky is recognizing when we can get by with the worst-case rules and when we can gain by using a more elaborate average-case rule. There's no hard-and fast way to tell --- it requires the same kind of personal judgment that goes on in most mathematical proofs.

The process is much the same as for worst-case analysis, except that:

- When evaluating function calls, we replace them by the *average*-case complexity of their bodies.
- When evaluating loops (or recursion), we look at the *average* number of iterations.

For some people, average case analysis is difficult because they don't have a very flexible idea of what an "average" is.

Last semester, Professor Cord gave out the following grades in his CS361 class:

A, A, A-, B+, B, B, B, B-, C+, C, C-, D, F, F, F, F

Translating these to their numerical equivalents,

4, 4, 3.7, 3.3, 3, 3, 3, 2.7, 2.3, 2, 1.7, 1, 0, 0, 0, 0

what was the average grade in Cord's class?

According to some classic forms of average:

- Median: the middle value of the sorted list (the midpoint between the two middle values, given an even number of items):

  avg_{median} = 2.5

- Mode: the most commonly occurring value:

  avg_{modal} = 0

- Mean: computed from the sum of the elements:

  avg_{mean} = (4 + 4 + 3.7 + 3.3 + 3 + 3 + 3 + 2.7 + 2.3 + 2 + 1.7 + 1 + 0 + 0 + 0 + 0) / 16 = 2.11

The mean is the most commonly used form of average, but even it comes in many varieties.

- Simple mean: `\bar{x} = (\sum_{i=1}^N x_i) / N`
- Weighted mean: `\bar{x} = (\sum_{i=1}^N w_i * x_i) / (\sum_{i=1}^N w_i)`
Example: Last semester Professor Cord gave the following grades:

Grade | # students
---|---
4.0 | 2
3.7 | 1
3.3 | 1
3.0 | 3
2.7 | 1
2.3 | 1
2.0 | 1
1.7 | 1
1.3 | 0
1.0 | 1
0.0 | 4

The weighted average is

(2*4.0 + 1*3.7 + 1*3.3 + 3*3.0 + 1*2.7 + 1*2.3 + 1*2.0 + 1*1.7 + 0*1.3 + 1*1.0 + 4*0.0) / (2 + 1 + 1 + 3 + 1 + 1 + 1 + 1 + 0 + 1 + 4) = 2.11

Example: When one student asked about his grade, Professor Cord pointed out that assignments were worth 50% of the grade, the final exam was worth 30%, and the midterm exam worth 20%. The student had a B, an A, and a C-, respectively, in these categories.

Category | Score | Weight
---|---|---
Assignments | 3.0 | 50
Final | 4.0 | 30
Midterm | 1.7 | 20

So the student's average grade was

(50*3.0 + 30*4.0 + 20*1.7) / (50 + 30 + 20) = 3.04
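That weighted-mean computation can be written directly in a few lines of Python. This is just an illustrative sketch; the function name `weighted_mean` is mine, not part of the notes:

```python
def weighted_mean(values, weights):
    """Weighted mean: the sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Category scores and weights from the example above:
# assignments (B = 3.0, weight 50), final (A = 4.0, weight 30),
# midterm (C- = 1.7, weight 20).
print(weighted_mean([3.0, 4.0, 1.7], [50, 30, 20]))   # approximately 3.04
```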

The *expected value* is a special version
of the weighted mean in which the weights are the probability of
seeing each particular value.

If `x_1, x_2, \ldots,` are all the possible values of some quantity, and
these values occur with probability `p_1, p_2, \ldots,`, then the *expected value* of
that quantity is

`E(x) = \sum_{i=1}^N p_i * x_i`

Note that if we have listed all possible values, then

`\sum_{i=1}^N p_i = 1`

so the usual denominator in the definition of the weighted mean becomes simply 1.

Example: after long observation, we have determined that Professor Cord tends to give grades with the following distribution:

Grade | Probability
---|---
4.0 | 2/16
3.7 | 1/16
3.3 | 1/16
3.0 | 3/16
2.7 | 1/16
2.3 | 1/16
2.0 | 1/16
1.7 | 1/16
1.3 | 0/16
1.0 | 1/16
0.0 | 4/16

So the expected value of the grade for an average student in his class is

`((2/16)*4.0 + (1/16)*3.7 + (1/16)*3.3 + (3/16)*3.0
+ (1/16)*2.7 + (1/16)*2.3 + (1/16)*2.0 + (1/16)*1.7 + (0/16)*1.3
+ (1/16)*1.0 + (4/16)*0.0) = 2.11`
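The expected-value calculation above can be checked mechanically. In this sketch the helper name `expected_value` is illustrative; the grades and probabilities come from the table:

```python
def expected_value(values, probs):
    """E(x) = sum of p_i * x_i, where the probabilities p_i sum to 1."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * x for x, p in zip(values, probs))

grades = [4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0, 0.0]
probs = [2/16, 1/16, 1/16, 3/16, 1/16, 1/16, 1/16, 1/16, 0/16, 1/16, 4/16]
print(round(expected_value(grades, probs), 2))   # 2.11
```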

The expected value is the kind of average we will use throughout this course.

We don't need a whole lot of facts about probabilities in this course. The following will do:

- Every probability is between 0 and 1, inclusive.
- The sum of the probabilities of all possible events must be 1.0.
- If p is the probability that an event will happen, then (1-p) is the probability that the event will not happen.
- If two events are independent and occur with probabilities p_{1} and p_{2}, respectively, then the probability that both events will occur is p_{1}*p_{2}.
- If we make a number of independent attempts at something, and the chance of seeing an event on any single attempt is p, then on average we will need 1/p attempts before seeing that event.
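The last of these facts can be sanity-checked numerically: the number of independent attempts until the first success follows a geometric distribution, whose expected value `\sum_k k*p*(1-p)^{k-1}` works out to 1/p. A truncated version of that sum, sketched in Python (the helper name is mine):

```python
def expected_attempts(p, max_k=10000):
    """Truncated sum of k * P(first success on attempt k) for a geometric distribution."""
    return sum(k * p * (1 - p) ** (k - 1) for k in range(1, max_k + 1))

print(expected_attempts(0.1))   # close to 1/0.1 = 10
print(expected_attempts(0.5))   # close to 1/0.5 = 2
```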

When we encounter a loop, we compute the run time by adding up the time required for an average number of iterations.

**Special case: constant # of iterations.** If we knew that the loop would terminate after exactly k iterations, where k was a constant, then the rule would be

`t_{\text{loop}} = t_{\text{init}} + t_{\text{condition}}(k+1) + \sum_{i=1}^{k}
(t_{\text{condition}}(i) + t_{\text{body}}(i))`

where `t_{\text{condition}}(i)` is the average time required to evaluate the condition on the ith iteration, `t_{\text{body}}(i)` is the time to evaluate the body on the ith iteration, and the `t_{\text{condition}}(k+1)` term accounts for the final evaluation of the condition, which ends the loop.

**General case: unknown # of iterations.** But in many cases, the number of iterations is not a constant but depends on the inputs. In general, if `p_k` is the probability that the loop terminates after `k` iterations, the average running time would be:

`t_{\text{loop}} = t_{\text{init}} + \sum_{k=0}^{\infty}
p_k (t_{\text{condition}}(k+1: k) + \sum_{i=1}^{k}
(t_{\text{condition}}(i: k) + t_{\text{body}}(i: k)))`

where `t_{\text{condition}}(i: k)` is the average time required to evaluate the condition on the ith iteration when the loop will take k total iterations, and `t_{\text{body}}(i: k)` is defined similarly for loop bodies.

This is a fairly messy formula, but we'll usually fall back on special cases.

The `t_{\text{condition}}(k+1: k)` term reflects the fact that, on the final iteration, we evaluate the condition but do not execute the body.

The `\infty` is not a mistake. It expresses the general case that some loops could execute an arbitrary number of times depending upon the input. If, however, we knew that a loop never executed more than, say, 100 times, then the probabilities `p_{101}, p_{102}, \ldots` would all be zero.

Suppose that we knew that k was a constant. Then `p_k = 1` for that single value of k, and `p_j = 0` for all `j != k`. That leads directly to our earlier special case.

**Special case: fixed bound on body & condition.** If `(t_{\text{condition}}(i: k) + t_{\text{body}}(i: k))` does not vary from one iteration to another, then

`t_{\text{loop}} = t_{\text{init}} + \sum_{k=0}^{\infty}
p_k ((k+1)*t_{\text{condition}} + k*t_{\text{body}})`

**Special case: fixed bound and equally probable iterations.** If the loop condition and body times do not vary across iterations, and we know that the loop takes anywhere from M to N iterations, with each number of iterations in that range equally likely, then

`t_{\text{loop}} = t_{\text{init}} + O(1 + ((M + N)/2) *
(t_{\text{condition}} + t_{\text{body}}))`

We'll illustrate the process of doing average-case analysis by looking at the ordered insertion algorithm.
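The algorithm's source code is not reproduced in this excerpt, so as a working reference, here is a minimal Python sketch of an ordered-insertion routine of the kind being analyzed (the function name and list representation are my own assumptions, not the notes' original code):

```python
def ordered_insert(arr, x):
    """Insert x into the already-sorted list arr, keeping it sorted.

    Scans from the back, shifting larger elements up one position
    until the correct slot for x is found.
    """
    arr.append(None)               # make room for one more element
    i = len(arr) - 2
    while i >= 0 and arr[i] > x:   # the loop whose iteration count we analyze
        arr[i + 1] = arr[i]        # shift a larger element up one slot
        i -= 1
    arr[i + 1] = x

data = []
for v in [3, 1, 2]:
    ordered_insert(data, v)
print(data)   # [1, 2, 3]
```

Note that if `x` is larger than everything already in `data`, the loop condition fails immediately and the body never runs, which is the zero-iteration case discussed below.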

We start, as usual, by marking the simple bits O(1).

Next we note that the loop body can be reduced to O(1).

The complexity of the loop condition and body do not vary from one iteration to another.

It can execute:

- 0 times (if `x` is larger than anything already in the array),
- 1 time (if `x` is larger than all but one element already in the array),
- and so on, up to a maximum of n times (if `x` is smaller than everything already in the array).

What we don't know are the probabilities to associate with these different numbers of iterations.

That depends upon the way the successive inputs to this function are distributed.

When I used this algorithm as part of a spell-checking program, I saw two different examples of possible input patterns:

In some cases, we were getting elements that were already in sorted order (e.g., reading from a sorted dictionary file).

We also faced input distributions where successive values of x were coming in essentially random order (e.g., adding words from the document into a "concordance" - a set of all words known to have appeared in the document).

In the first case, where the inputs arrive already in sorted order, each new element is at least as large as everything already in the array, so the algorithm always uses 0 iterations of the loop:

`p_0=1, p_1 = 0, p_2 = 0, \ldots`

So the time is

`t_{\text{loop}} = t_{\text{init}} + \sum_{k=0}^{\infty}
p_k ((k+1)*t_{\text{condition}} + k*t_{\text{body}})`

but plugging in the p's simplifies to

`t_{\text{loop}} = t_{\text{init}} + t_{\text{condition}} = O(1)`

For this input pattern, the entire algorithm has an average-case complexity of O(1).

In the second case, where the inputs arrive in random order, we are equally likely to need 0 iterations, 1 iteration, 2 iterations, …, or `n` iterations. Since all iteration counts from 0 to n are equally likely, we can use the “fixed bound & equally probable” special case for loops.

`t_{\text{loop}} = t_{\text{init}} + O(1 + ((M + N)/2) *
(t_{\text{condition}} + t_{\text{body}}))`

and, after our usual simplifications

`t_{text{loop}} = O(n)`

And we can then replace the entire loop by `O(n)`.

And now, we add up the complexities in the remaining straight-line sequence, and conclude that the entire algorithm has an average case complexity of O(n) when presented with randomly arranged inputs.
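The arithmetic behind that conclusion is easy to check: when the iteration counts 0, 1, …, n are equally likely, their average is n/2, which is O(n). An illustrative check (the helper name is mine):

```python
def avg_iterations_uniform(n):
    """Average of the equally likely iteration counts 0, 1, ..., n."""
    return sum(range(n + 1)) / (n + 1)

print(avg_iterations_uniform(100))   # 50.0, i.e. n/2 -- so the loop is O(n)
```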

This is the same result we had for the worst case analysis. That's not unusual.

Under similar randomly arranged inputs, the average-case complexity of ordered search is O(n) and the average-case complexity of binary search is O(log n). Again, these are the same as their worst-case complexities.

Looking back a few steps in the analysis of this code, what can we
say about the average-case complexity *before*
looking at the specific input distribution?

Let k_{avg} denote the average number of times that the loop is
executed.

Then we quickly conclude that the loop is `O(k_{\text{avg}})`, and so, in fact, is the entire function.

We've already considered the case where the inputs to this function were already arranged into ascending order. What would happen if the inputs were almost, but not exactly, already sorted into ascending order?

For example, suppose that, on average, one out of n items is out of order. Then the probability of a given input executing the loop zero times would be p_{0} = (n-1)/n, and some single p_{i} would have probability 1/n, with all the other probabilities being zero.

Assuming the worst (because we want to find an upper bound), let's assume that the one out-of-order element is the very last one added, and that it actually gets inserted into position 0. Then we have

`p_0 = (n-1)/n, p_1 = 0, p_2 = 0, \ldots, p_{n-1} = 0,
p_n = 1/n`

So the average number of iterations would be given by

`k_{\text{avg}} = \sum_{i=0}^{n} (i+1)p_i`

`\ \ \ = 1*((n-1)/n) + (n+1)*(1/n)`

`\ \ \ = (2n)/n`

`\ \ \ = 2`

and the function is `O(k_{\text{avg}}) = O(1)`.
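That derivation can be confirmed numerically. This sketch (the helper name is mine) evaluates `\sum (i+1)p_i` for the distribution just described and shows the result stays at 2 regardless of n:

```python
def k_avg_one_out_of_order(n):
    """Evaluate sum((i+1) * p_i) with p_0 = (n-1)/n, p_n = 1/n, all others 0."""
    p = [0.0] * (n + 1)
    p[0] = (n - 1) / n
    p[n] = 1 / n
    return sum((i + 1) * p[i] for i in range(n + 1))

print(k_avg_one_out_of_order(10), k_avg_one_out_of_order(1000))   # both approximately 2
```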

Now, that's only one possible scenario in which the inputs are almost sorted. Let's look at another. Suppose that we knew that, for each successive input, the probability of it appearing in the input `m` steps out of its correct position is proportional to `1/(m+1)` (i.e., each additional step out of its correct position is progressively more unlikely). Then we have

`p_{0}=c, p_{1}=c/2, p_{2}=c/3, \ldots p_{n-1}=c/n,
p_{n}=c/(n+1)`

The constant `c` is necessary because the sum of all the probabilities must be exactly 1. We can compute the value of `c` by using that fact:

`\sum_{i=0}^{n} p_i = 1`

`\sum_{i=0}^{n} \frac{c}{i+1} = 1`

`c \sum_{i=0}^{n} \frac{1}{i+1} = 1`

This sum, for reasonably large n, is approximately `log n`.

So we conclude that `c` is approximately `= 1/log(n)`.

So the function, for this input distribution, is

`t_{\text{seqIns}} = O(k_{\text{avg}})`

`\ \ \ = O(\sum_{i=0}^n (i + 1)p_i)`

`\ \ \ = O(\sum_{i=0}^n (i + 1) * c/(i+1))`

`\ \ \ = O(\sum_{i=0}^n c)`

`\ \ \ = O((n+1)c)`

`\ \ \ = O(n/(log n))`

So the average case is slightly smaller than the worst case, though not by much (remember that `log n` is nearly constant over large ranges of `n`, so `n/log(n)` grows only slightly slower than `n`).
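This result, too, can be confirmed numerically: computing `c` from the normalization condition and then `k_{avg}` shows growth proportional to n/log n (the helper name is my own):

```python
import math

def k_avg_nearly_sorted(n):
    """Evaluate sum((i+1) * p_i) with p_i = c/(i+1), c fixed by normalization."""
    c = 1 / sum(1 / (i + 1) for i in range(n + 1))   # c is roughly 1/log(n)
    return sum((i + 1) * (c / (i + 1)) for i in range(n + 1))

n = 10000
print(k_avg_nearly_sorted(n))   # roughly comparable to n / log(n)
print(n / math.log(n))
```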

You can see, then, that average case complexity can vary considerably depending upon just what constitutes an “average” set of inputs. Utility functions that get used in many different programs may see different input distributions in each program, and so their average performances in different programs will vary accordingly.
