Analysis of Algorithms: Average Case Analysis

In the first section of this course, we looked at the process of analyzing the worst-case running time of an algorithm, and the use of worst-case analysis and big-O notation as a way of describing it.

In this section, we will introduce average-case complexity. Just as the worst-case complexity describes an upper bound on the worst-case time we would see when running an algorithm, average case complexity will present an upper bound on the average time we would see when running the program many times on many different inputs.

It is true that worst-case complexity gets used more often than average-case. There are a number of reasons for this.

• The worst-case complexity is often easier to compute than the average case. Just figuring out what an average set of inputs will look like is often a challenge. To figure out the worst case complexity, we only need to identify that one single input that results in the slowest running.

• In many cases, the worst-case and average-case complexities will turn out to be the same.

• Finally, reporting the worst case to your boss or your customers is often safer than reporting the average. If you give them the average, then sometimes they will run the program and see slower performance than they had expected. Human nature being what it is, they will probably get rather annoyed. On the other hand, if you go to those same customers with the worst-case figure, most of the time they will observe faster-than-expected behavior, and will be more pleased.

This appears to be particularly true of interactive programs. When people are actually sitting there typing things in or clicking with the mouse and then waiting for a response, if they have to sit for a long time waiting for a response, they're going to remember that. Even if 99.9% of the time they get instant response (so the average response is still quite good), they will characterize your program as sluggish.

In those circumstances, it makes sense to focus on worst-case behavior and to do what we can to improve that worst case.

On the other hand, suppose we're talking about a batch program that will process thousands of inputs per run, or we're talking about a critical piece of an interactive program that gets run hundreds or thousands of times in between each response from the user. In that situation, adding up hundreds or thousands of worst cases may be just too pessimistic. The cumulative time of thousands of different runs should show some averaging out of the worst-case behavior, and an average-case analysis may give a more realistic picture of what the user will be seeing.

1. Definition

We say that an algorithm requires average time proportional to f(n) (or that it has average-case complexity O(f(n))) if there are constants c and n0 such that the average time the algorithm requires to process an input set of size n is no more than c*f(n) time units whenever n ≥ n0.

This definition is very similar to the one for worst case complexity.

The biggest difference is that we deal in the average time to process input sets of size n, instead of the maximum (worst case) time.

But note that we are still looking for an upper bound on the algorithm's behavior.

The average case complexity describes how quickly the average time increases when n increases, just as the worst case complexity describes how quickly the worst case time increases when n increases.

Question: Suppose we have an algorithm with worst case complexity O(n).

True or false: It is possible for that algorithm to have average case complexity O(n^2) (and no better).

1.1. Answer

False. If it were true, the average case time would be increasing faster than the worst case time. Eventually, that means that the average case time for some large n would be larger than the worst case among all input sets of size n. But an average can never be larger than the maximum value in the set being averaged.

So the only way this statement could be true is in the trivial sense that any O(n) algorithm is also O(n^2). That trivial case is ruled out by the "and no better" stipulation.

This is an important idea to keep in mind as we discuss the rules for average case analysis. The worst-case analysis rules all apply, because they do provide an upper bound. But that bound is sometimes not as tight as it could be. One of the things that makes average case analysis tricky is recognizing when we can get by with the worst-case rules and when we can gain by using a more elaborate average-case rule. There's no hard-and-fast way to tell --- it requires the same kind of personal judgment that goes on in most mathematical proofs.

2. Determining the Average Case Complexity

The process is much the same as for worst-case analysis, except that:

• When evaluating function calls, we replace them by the average-case complexity of their bodies.

• When evaluating loops (or recursion), we look at the average number of iterations.

2.1. What's an Average?

For some people, average case analysis is difficult because they don't have a very flexible idea of what an "average" is.

Last semester, Professor Cord gave out the following grades in his CS361 class:

A, A, A-, B+, B, B, B, B-, C+, C, C-, D, F, F, F, F

Translating these to their numerical equivalent,

4, 4, 3.7, 3.3, 3, 3, 3, 2.7, 2.3, 2, 1.7, 1, 0, 0, 0, 0

what was the average grade in Cord's class?

According to some classic forms of average:

Median

the middle value of the sorted list (the midpoint between the two middle values given an even number of items)

avg_median = 2.5

Mode

the most commonly occurring value

avg_modal = 0

Mean

computed from the sum of the elements

avg_mean = (4 + 4 + 3.7 + 3.3 + 3 + 3 + 3 + 2.7 + 2.3 + 2 + 1.7 + 1 + 0 + 0 + 0 + 0) / 16 = 2.11
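All three of these averages can be checked with Python's standard statistics module, using the grade list given above:

```python
# Check the three classic averages of Professor Cord's grades
# using Python's standard library.
import statistics

grades = [4, 4, 3.7, 3.3, 3, 3, 3, 2.7, 2.3, 2, 1.7, 1, 0, 0, 0, 0]

print(statistics.median(grades))  # 2.5  (midpoint of the two middle values)
print(statistics.mode(grades))    # 0    (four students received an F)
print(statistics.mean(grades))    # 2.10625, i.e., 2.11 after rounding
```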

Mean Average

The mean average is the most commonly used, but even that comes in many varieties.

Simple mean

\bar{x} = (\sum_{i=1}^N x_i) / N

Weighted mean

\bar{x} = (\sum_{i=1}^N w_i * x_i) / (\sum_{i=1}^N w_i)

Example: Last semester Professor Cord gave the following grades

Grade Count
4.0 2
3.7 1
3.3 1
3.0 3
2.7 1
2.3 1
2.0 1
1.7 1
1.3 0
1.0 1
0.0 4

The weighted average is

(2*4.0 + 1*3.7 + 1*3.3 + 3*3.0 + 1*2.7 + 1*2.3 + 1*2.0 + 1*1.7 + 0*1.3 + 1*1.0 + 4*0.0) / (2 + 1 + 1 + 3 + 1 + 1 + 1 + 1 + 0 + 1 + 4) = 2.11

Example: When one student asked about his grade, Professor Cord pointed out that assignments were worth 50% of the grade, the final exam was worth 30%, and the midterm exam was worth 20%. The student had a B, an A, and a C-, respectively, on these.

Category Score Weight
Assignments 3.0 50
Final 4.0 30
Midterm 1.7 20

So the student's average grade was

(50*3.0 + 30*4.0 + 20*1.7)/(50+30+20) = 3.04
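The weighted-mean formula is easy to express in code. A small sketch (the helper name weighted_mean is my own; the weights are the category percentages from the example above):

```python
# Weighted mean sketch: sum(w_i * x_i) / sum(w_i).

def weighted_mean(pairs):
    """Compute the weighted mean of (weight, value) pairs."""
    total_weight = sum(w for w, _ in pairs)
    return sum(w * x for w, x in pairs) / total_weight

# Assignments 50%, Final 30%, Midterm 20%; grades B, A, and C-.
grade = weighted_mean([(50, 3.0), (30, 4.0), (20, 1.7)])
print(round(grade, 2))  # 3.04
```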

Expected Value

The expected value is a special version of the weighted mean in which the weights are the probabilities of seeing each particular value.

If x_1, x_2, \ldots, x_N are all the possible values of some quantity, and these values occur with probabilities p_1, p_2, \ldots, p_N, then the expected value of that quantity is

E(x) = \sum_{i=1}^N p_i * x_i

Note that if we have listed all possible values, then

\sum_{i=1}^N p_i = 1

so the usual denominator in the definition of the weighted mean becomes simply "1".

Example: after long observation, we have determined that Professor Cord tends to give grades with the following distribution:

Grade Probability
4.0 2/16
3.7 1/16
3.3 1/16
3.0 3/16
2.7 1/16
2.3 1/16
2.0 1/16
1.7 1/16
1.3 0/16
1.0 1/16
0.0 4/16

So the expected value of the grade for an average student in his class is

((2/16)*4.0 + (1/16)*3.7 + (1/16)*3.3 + (3/16)*3.0 + (1/16)*2.7 + (1/16)*2.3 + (1/16)*2.0 + (1/16)*1.7 + (0/16)*1.3 + (1/16)*1.0 + (4/16)*0.0) = 2.11
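As a check, the expected value can be computed directly from the distribution above; exact fractions keep the probability arithmetic exact:

```python
# Expected value: a weighted mean whose weights are probabilities.
from fractions import Fraction

distribution = {
    4.0: Fraction(2, 16), 3.7: Fraction(1, 16), 3.3: Fraction(1, 16),
    3.0: Fraction(3, 16), 2.7: Fraction(1, 16), 2.3: Fraction(1, 16),
    2.0: Fraction(1, 16), 1.7: Fraction(1, 16), 1.3: Fraction(0, 16),
    1.0: Fraction(1, 16), 0.0: Fraction(4, 16),
}

assert sum(distribution.values()) == 1  # all possibilities accounted for

expected = sum(p * x for x, p in distribution.items())
print(round(float(expected), 2))  # 2.11
```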

The expected value is the kind of average we will use throughout this course.

2.2. Probably Not a Problem

We don't need a whole lot of facts about probabilities in this course. The following will do:

• Every probability is between 0 and 1, inclusive.

• The sum of the probabilities of all possible events must be 1.0.

• If p is the probability that an event will happen, then (1-p) is the probability that the event will not happen.

• If two events are independent and occur with probabilities p_1 and p_2, respectively, then the probability that both events will occur is p_1*p_2.

• If we make a number of independent attempts at something, and the chance of seeing an event on any single attempt is p, on average we will need 1/p attempts before seeing that event.
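The last fact (1/p attempts on average) is easy to check by simulation. A rough sketch, with p = 0.25 chosen arbitrarily for illustration:

```python
# Simulation sketch of the "1/p attempts on average" fact.
import random

random.seed(17)  # fixed seed so the run is reproducible

def attempts_until_success(p):
    """Count independent attempts until the first success."""
    attempts = 1
    while random.random() >= p:  # failure with probability 1 - p
        attempts += 1
    return attempts

p = 0.25
trials = 100_000
average = sum(attempts_until_success(p) for _ in range(trials)) / trials
print(average)  # should come out close to 1/p = 4
```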

2.3. New Loop Rules

When we encounter a loop, we compute the run time by adding up the time required for an average number of iterations.

Special Case: constant # of iterations. If we knew that the loop would terminate after exactly k iterations, where k was a constant, then the rule would be

t_{\text{loop}} = t_{\text{init}} + t_{\text{condition}}(k+1) + \sum_{i=1}^{k} (t_{\text{condition}}(i) + t_{\text{body}}(i))

where t_{\text{condition}}(i) is the average time required to evaluate the condition on the ith iteration, t_{\text{body}}(i) is the time to evaluate the body on the ith iteration, and the t_{\text{condition}}(k+1) term covers the final evaluation of the condition that ends the loop.

General case: unknown # of iterations. But in many cases, the number of iterations is not a constant but depends on the inputs. In general, if p_k is the probability that the loop terminates after k iterations, the average running time is:

t_{\text{loop}} = t_{\text{init}} + \sum_{k=0}^{\infty} p_k (t_{\text{condition}}(k+1:k) + \sum_{i=1}^{k} (t_{\text{condition}}(i:k) + t_{\text{body}}(i:k)))

where t_{\text{condition}}(i:k) is the average time required to evaluate the condition on the ith iteration when the loop will take k total iterations, and t_{\text{body}}(i:k) is defined similarly for loop bodies.

This is a fairly messy formula, but we'll usually fall back on special cases.

• The t_{\text{condition}}(k+1:k) term reflects the fact that, on the final iteration, we evaluate the condition but do not execute the body.

• The \infty is not a mistake. It expresses the general case that some loops could execute an arbitrary number of times depending upon the input. If, however, we knew that a loop never executed more than, say, 100 times, then the probabilities p_{101}, p_{102}, \ldots would all be zero.

• Suppose that we knew that k was a constant. Then p_k = 1 for that single value of k, and p_j = 0 for all j != k. That leads directly to our earlier special case.

Special case: fixed bound on body & condition. If (t_{\text{condition}}(i:k) + t_{\text{body}}(i:k)) does not vary from one iteration to another, then

t_{\text{loop}} = t_{\text{init}} + \sum_{k=0}^{\infty} p_k ((k+1)*t_{\text{condition}} + k*t_{\text{body}})

Special case: fixed bound and equally probable iterations. If the loop condition and body times do not vary across iterations, and we know that the loop takes anywhere from M to N iterations, and that each number of iterations in that range is equally likely, then

t_{\text{loop}} = t_{\text{init}} + O(1 + ((M + N)/2) * (t_{\text{condition}} + t_{\text{body}}))

3. Example: Ordered Insert

We'll illustrate the process of doing average-case analysis by looking at the ordered insertion algorithm.
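The notes analyze this algorithm without reproducing its code here, so the following minimal Python sketch (the function name and list representation are my own) may help in following the steps below:

```python
def ordered_insert(arr, x):
    """Insert x into the sorted list arr, keeping it sorted.

    Elements greater than x are shifted one slot to the right;
    the loop body executes once per shifted element.
    """
    arr.append(None)                  # open one new slot: O(1)
    i = len(arr) - 1
    while i > 0 and arr[i - 1] > x:   # loop condition
        arr[i] = arr[i - 1]           # shift one element: O(1) body
        i -= 1
    arr[i] = x                        # drop x into place: O(1)

data = [1, 3, 5, 7]
ordered_insert(data, 4)
print(data)  # [1, 3, 4, 5, 7]
```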

We start, as usual, by marking the simple bits O(1).

Next we note that the loop body can be reduced to O(1).

The complexities of the loop condition and of the loop body do not vary from one iteration to another.

The loop can execute

• 0 times (if x is larger than anything already in the array),

• 1 time (if x is larger than all but one element already in the array),

• and so on up to a maximum of n times (if x is smaller than everything already in the array).

What we don't know are the probabilities to associate with these different numbers of iterations.

• These depend upon the way the successive inputs to this function are distributed.

When I used this algorithm as part of a spell checking program, I saw two different examples of possible input patterns:

• In some cases, we were getting elements that were already in sorted order (e.g., reading from a sorted dictionary file).

• We also faced input distributions where successive values of x were coming in essentially random order (e.g., adding words from the document into a "concordance" - a set of all words known to have appeared in the document).

3.1. Input in Sorted Order

In that case, each new x is at least as large as everything already in the array, so we know the algorithm always uses 0 iterations of the loop:

p_0=1, p_1 = 0, p_2 = 0, \ldots

So the time is

t_{\text{loop}} = t_{\text{init}} + \sum_{k=0}^{\infty} p_k ((k+1)*t_{\text{condition}} + k*t_{\text{body}})

but plugging in the p's simplifies to

t_{\text{loop}} = t_{\text{init}} + t_{\text{condition}} = O(1)

For this input pattern, the entire algorithm has an average-case complexity of O(1).

3.2. Input in Arbitrary Order

In this case, we are equally likely to need 0 iterations, 1 iteration, 2 iterations, …, n iterations. All iteration counts are equally likely, so we can use the "fixed bound and equally probable iterations" special case for loops, with M = 0 and N = n.

t_{\text{loop}} = t_{\text{init}} + O(1 + ((0 + n)/2) * (t_{\text{condition}} + t_{\text{body}}))

and, after our usual simplifications

t_{\text{loop}} = O(n)

And we can then replace the entire loop by O(n).

And now, we add up the complexities in the remaining straight-line sequence, and conclude that the entire algorithm has an average case complexity of O(n) when presented with randomly arranged inputs.
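That O(n) average can also be observed empirically by counting loop iterations over many random insertions; the count should come out near n/2. A rough simulation (the helper name is my own):

```python
# Empirical check: with randomly ordered input, the insertion loop
# runs about n/2 times on average, consistent with the O(n) result.
import random

random.seed(3)  # fixed seed so the run is reproducible

def iterations_for(arr, x):
    """Number of loop iterations ordered insertion would use for x."""
    count = 0
    i = len(arr)
    while i > 0 and arr[i - 1] > x:
        i -= 1
        count += 1
    return count

n = 1000
arr = sorted(random.random() for _ in range(n))
trials = 10_000
avg = sum(iterations_for(arr, random.random()) for _ in range(trials)) / trials
print(avg)  # close to n/2 = 500
```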

This is the same result we had for the worst case analysis. That's not unusual.

Under similar randomly arranged inputs, the average-case complexity of ordered search is O(n) and the average-case complexity of binary search is O(log n). Again, these are the same as their worst-case complexities.

3.3. General Case

Looking back a few steps in the analysis of this code, what can we say about the average-case complexity before looking at the specific input distribution?

Let k_{avg} denote the average number of times that the loop is executed.

Then we quickly conclude that the loop is O(k_{avg}).

and so, in fact, is the entire function.

3.4. Inputs in Almost-Sorted Order

We've already considered the case where the inputs to this function were already arranged into ascending order. What would happen if the inputs were almost, but not exactly, already sorted into ascending order?

For example, suppose that, on average, one out of n items is out of order. Then the probability of a given input repeating the loop zero times would be p_0 = (n-1)/n, and some single p_i would have probability 1/n, with all the other probabilities being zero.

Assuming the worst (because we want to find an upper bound), let's assume that the one out-of-order element is the very last one added, and that it actually gets inserted into position 0. Then we have

p_0 = (n-1)/n, p_1 = 0, p_2 = 0, \ldots, p_{n-1} = 0, p_n = 1/n

So the average number of condition evaluations (the number of iterations plus the final test that ends the loop) would be given by

k_{\text{avg}} = \sum_{i=0}^{n} (i+1)p_i

\ \ \ = 1*(n-1)/n + (n+1)*(1/n)

\ \ \ = (n - 1 + n + 1)/n

\ \ \ = (2n)/n

\ \ \ = 2

and the function is O(k_{\text{avg}}) = O(1).
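The k_avg = 2 result is easy to verify numerically; the sketch below uses exact fractions, with n = 1000 as an arbitrary choice:

```python
# Numeric check of the almost-sorted analysis: with p_0 = (n-1)/n and
# p_n = 1/n, the average number of condition evaluations is exactly 2.
from fractions import Fraction

n = 1000
p = [Fraction(0)] * (n + 1)
p[0] = Fraction(n - 1, n)
p[n] = Fraction(1, n)

assert sum(p) == 1  # the probabilities form a distribution

k_avg = sum((i + 1) * p[i] for i in range(n + 1))
print(k_avg)  # 2
```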

3.5. Almost Sorted - version 2

Now, that's only one possible scenario in which the inputs are almost sorted. Let's look at another. Suppose that we knew that, for each successive input, the probability of it appearing in the input m steps out of its correct position is proportional to 1/(m+1) (i.e., each additional step out of its correct position is progressively more unlikely). Then we have

p_0 = c, p_1 = c/2, p_2 = c/3, \ldots, p_{n-1} = c/n, p_n = c/(n+1)

The constant c is necessary because the sum of all the probabilities must be exactly 1. We can compute the value of c by using that fact:

\sum_{i=0}^{n} p_i = 1

\sum_{i=0}^{n} \frac{c}{i+1} = 1

c \sum_{i=0}^{n} \frac{1}{i+1} = 1

This sum is the harmonic number H_{n+1}, which for reasonably large n is approximately log n.

So we conclude that c is approximately 1/log(n).

So the function, for this input distribution, is

t_{\text{seqIns}} = O(k_{\text{avg}})

\ \ \ = O(\sum_{i=0}^n (i + 1)p_i)

\ \ \ = O(\sum_{i=0}^n (i + 1) * c/(i+1))

\ \ \ = O(\sum_{i=0}^n c)

\ \ \ = O((n+1)c)

\ \ \ = O(n/(log n))

So the average case is slightly smaller than the worst case, though not by much (remember that log n is nearly constant over large ranges of n, so n/(log n) grows only slightly slower than n).
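A quick numeric check of this distribution (with an arbitrarily chosen n) confirms that k_avg = (n+1)*c grows like n/log n:

```python
# Numeric check of the 1/(m+1) distribution: p_i = c/(i+1), with c
# chosen so the probabilities sum to 1. Then k_avg = (n+1)*c, which
# grows like n/log n.
import math

n = 100_000
harmonic = sum(1 / (i + 1) for i in range(n + 1))  # approximately log n
c = 1 / harmonic

probabilities = [c / (i + 1) for i in range(n + 1)]
k_avg = sum((i + 1) * p_i for i, p_i in enumerate(probabilities))

print(k_avg)            # equals (n+1)*c
print(n / math.log(n))  # same order of growth
```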

4. The Input Distribution is Key

You can see, then, that average case complexity can vary considerably depending upon just what constitutes an average set of inputs. Utility functions that get used in many different programs may see different input distributions in each program, and so their average performances in different programs will vary accordingly.