Analysis of Algorithms: Average Case Analysis

Steven J. Zeil

Old Dominion University, Dept. of Computer Science

Table of Contents

1. Definition
1.1.
2. Determining the Average Case Complexity
2.1. What's an Average?
2.2. Probably Not a Problem
2.3. New Loop Rules
3. Example: Ordered Insert
3.1. Input in Sorted Order
3.2. Input in Arbitrary Order:
3.3. General Case
3.4. Inputs in Almost-Sorted Order
3.5. Almost Sorted - version 2
4. The Input Distribution is Key

In the first section of this course, we looked at the process of analyzing the worst-case running time of an algorithm, and the use of worst-case analysis and big-O notation as a way of describing it.

In this section, we will introduce average-case complexity. Just as the worst-case complexity describes an upper bound on the worst-case time we would see when running an algorithm, average case complexity will present an upper bound on the average time we would see when running the program many times on many different inputs.

It is true that worst-case complexity gets used more often than average-case. There are a number of reasons for this.

On the other hand, suppose we're talking about a batch program that will process thousands of inputs per run, or about a critical piece of an interactive program that gets run hundreds or thousands of times in between each response from the user. In that situation, adding up hundreds or thousands of worst cases may simply be too pessimistic. The cumulative time of thousands of different runs should show some averaging out of the worst-case behavior, and an average-case analysis may give a more realistic picture of what the user will be seeing.

1. Definition

We say that an algorithm requires average time proportional to f(n) (or that it has average-case complexity O(f(n))) if there are constants c and n0 such that the average time the algorithm requires to process an input set of size n is no more than c*f(n) time units whenever n ≥ n0.

This definition is very similar to the one for worst case complexity.

The biggest difference is that we deal in the average time to process input sets of size n, instead of the maximum (worst case) time.

But note that we are still looking for an upper bound on the algorithm's behavior.

The average case complexity describes how quickly the average time increases when n increases, just as the worst case complexity describes how quickly the worst case time increases when n increases.

Question: Suppose we have an algorithm with worst case complexity O(n).

True or false: It is possible for that algorithm to have average case complexity O(n²) (and no better).

1.1. 

Question: Suppose we have an algorithm with worst case complexity O(n).

True or false: It is possible for that algorithm to have average case complexity O(n²) (and no better).

False. If it were true, the average case time would be increasing faster than the worst case time. Eventually, that means that the average case time for some large n would be larger than the worst case among all input sets of size n. But an average can never be larger than the maximum value in the set being averaged.
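
In symbols: if `x_1, x_2, \ldots, x_N` are the values being averaged, then

`\frac{1}{N}\sum_{i=1}^{N} x_i \le \max_i x_i`

since every term in the sum is at most the maximum.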

So the only way this statement could be true is in the trivial sense that anything that is O(n) is also O(n²). That trivial case is ruled out by the "and no better" stipulation.

This is an important idea to keep in mind as we discuss the rules for average case analysis. The worst-case analysis rules all apply, because they do provide an upper bound. But that bound is sometimes not as tight as it could be. One of the things that makes average case analysis tricky is recognizing when we can get by with the worst-case rules and when we can gain by using a more elaborate average-case rule. There's no hard-and-fast way to tell --- it requires the same kind of personal judgment that goes on in most mathematical proofs.

2. Determining the Average Case Complexity

The process is much the same as for the worst case, except that

  • When evaluating function calls, we replace them by the average-case complexity of their bodies.

  • When evaluating loops (or recursion), we look at the average number of iterations.

2.1. What's an Average?

For some people, average case analysis is difficult because they don't have a very flexible idea of what an "average" is.

Last semester, Professor Cord gave out the following grades in his CS361 class:

A, A, A-, B+, B, B, B, B-, C+, C, C-, D, F, F, F, F

Translating these to their numerical equivalent,

4, 4, 3.7, 3.3, 3, 3, 3, 2.7, 2.3, 2, 1.7, 1, 0, 0, 0, 0

what was the average grade in Cord's class?

According to some classic forms of average:

Median

the middle value of the sorted list (or, given an even number of items, the midpoint between the two middle values)

`avg_{\text{median}} = 2.5`

Mode

the most commonly occurring value

`avg_{\text{mode}} = 0`

Mean

computed from the sum of the elements

`avg_{\text{mean}} = (4 + 4 + 3.7 + 3.3 + 3 + 3 + 3 + 2.7 + 2.3 + 2 + 1.7 + 1 + 0 + 0 + 0 + 0) / 16 = 2.11`

Mean Average

The mean average is the most commonly used, but even that comes in many varieties.

Simple mean
`\bar{x} = (\sum_{i=1}^N x_i) / N`
Weighted mean
`\bar{x} = (\sum_{i=1}^N w_i * x_i) / (\sum_{i=1}^N w_i)`

Example: Last semester Professor Cord gave the following grades

Grade # students
4.0 2
3.7 1
3.3 1
3.0 3
2.7 1
2.3 1
2.0 1
1.7 1
1.3 0
1.0 1
0.0 4

The weighted average is

(2*4.0 + 1*3.7 + 1*3.3 + 3*3.0 + 1*2.7 + 1*2.3 + 1*2.0 + 1*1.7 + 0*1.3 + 1*1.0 + 4*0.0) / (2 + 1 + 1 + 3 + 1 + 1 + 1 + 1 + 0 + 1 + 4) = 2.11

Example: When one student asked about his grade, Professor Cord pointed out that assignments were worth 50% of the grade, the final exam was worth 30%, and the midterm exam was worth 20%. The student had a B, an A, and a C-, respectively, on these.

Category Score Weight
Assignments 3.0 50
Final 4.0 30
Midterm 1.7 20

So the student's average grade was

(50*3.0 + 30*4.0 + 20*1.7)/(50+30+20) = 3.04
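
As a small illustration (not part of the original notes), here is a sketch of the weighted-mean formula in C++; the function name and the hard-coded scores and weights are simply the numbers from this example:

```cpp
#include <iostream>
#include <vector>

// Weighted mean: (sum of w_i * x_i) / (sum of w_i)
double weightedMean(const std::vector<double>& values,
                    const std::vector<double>& weights)
{
    double weightedSum = 0.0;
    double totalWeight = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        weightedSum += weights[i] * values[i];
        totalWeight += weights[i];
    }
    return weightedSum / totalWeight;
}

int main()
{
    // Assignments (B), final (A), midterm (C-), weighted 50/30/20
    std::vector<double> scores  {3.0, 4.0, 1.7};
    std::vector<double> weights {50.0, 30.0, 20.0};
    std::cout << weightedMean(scores, weights) << "\n";   // prints 3.04
    return 0;
}
```

Run on these inputs it prints 3.04, matching the hand computation above.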

Expected Value

The expected value is a special version of the weighted mean in which the weights are the probability of seeing each particular value.

If `x_1, x_2, \ldots` are all the possible values of some quantity, and these values occur with probabilities `p_1, p_2, \ldots`, then the expected value of that quantity is

`E(x) = \sum_{i=1}^N p_i * x_i`

Note that if we have listed all possible values, then

`\sum_{i=1}^N p_i = 1`

so the usual denominator in the definition of the weighted mean becomes simply "1".

Example: after long observation, we have determined that Professor Cord tends to give grades with the following distribution:

Grade probability
4.0 2/16
3.7 1/16
3.3 1/16
3.0 3/16
2.7 1/16
2.3 1/16
2.0 1/16
1.7 1/16
1.3 0/16
1.0 1/16
0.0 4/16

So the expected value of the grade for an average student in his class is

`((2/16)*4.0 + (1/16)*3.7 + (1/16)*3.3 + (3/16)*3.0 + (1/16)*2.7 + (1/16)*2.3 + (1/16)*2.0 + (1/16)*1.7 + (0/16)*1.3 + (1/16)*1.0 + (4/16)*0.0) = 2.11`

The expected value is the kind of average we will use throughout this course.
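
A minimal sketch of that kind of average in C++ (the container layout and function name are illustrative assumptions; the distribution is Professor Cord's from above):

```cpp
#include <iostream>
#include <utility>
#include <vector>

// Expected value: sum of p_i * x_i, where the probabilities p_i sum to 1
double expectedValue(const std::vector<std::pair<double, double>>& dist)
{
    double e = 0.0;
    for (const auto& [probability, value] : dist)
        e += probability * value;
    return e;
}

int main()
{
    // {probability, grade} pairs for Professor Cord's grade distribution
    std::vector<std::pair<double, double>> grades {
        {2.0/16, 4.0}, {1.0/16, 3.7}, {1.0/16, 3.3}, {3.0/16, 3.0},
        {1.0/16, 2.7}, {1.0/16, 2.3}, {1.0/16, 2.0}, {1.0/16, 1.7},
        {0.0/16, 1.3}, {1.0/16, 1.0}, {4.0/16, 0.0}
    };
    std::cout << expectedValue(grades) << "\n";   // prints roughly 2.11
    return 0;
}
```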

2.2. Probably Not a Problem

We don't need a whole lot of facts about probabilities in this course. The following will do:

  • Every probability is between 0 and 1, inclusive.

  • The sum of the probabilities of all possible events must be 1.0

  • If p is the probability that an event will happen, then (1-p) is the probability that the event will not happen.

  • If two events are independent and occur with probabilities `p_1` and `p_2`, respectively, then the probability that both events will occur is `p_1 * p_2`.

  • If we make a number of independent attempts at something, and the chance of seeing an event on any single attempt is p, then on average we will need 1/p attempts before seeing that event. (A quick simulation illustrating this appears below.)
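
As a sanity check on that last fact (this simulation is an illustration, not part of the original notes), suppose p = 1/6, as if we were waiting to roll a six:

```cpp
#include <iostream>
#include <random>

int main()
{
    const double p = 1.0 / 6.0;   // chance of the event on any single attempt
    const int trials = 100000;    // number of independent experiments

    std::mt19937 gen(12345);
    std::bernoulli_distribution event(p);

    long long totalAttempts = 0;
    for (int t = 0; t < trials; ++t) {
        int attempts = 0;
        do {
            ++attempts;
        } while (!event(gen));
        totalAttempts += attempts;
    }

    // The average number of attempts should come out close to 1/p = 6.
    std::cout << static_cast<double>(totalAttempts) / trials << "\n";
    return 0;
}
```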

2.3. New Loop Rules

When we encounter a loop, we compute the run time by adding up the time required for an average number of iterations.

Special Case: constant # of iterations. If we knew that the loop would terminate after exactly k iterations, where k was a constant, then the rule would be

`t_{\text{loop}} = t_{\text{init}} + \sum_{i=0}^{k} (t_{\text{condition}}(i) + t_{\text{body}}(i))`

where `t_{\text{condition}}(i)` is the average time required to evaluate the condition on the ith iteration, and `t_{\text{body}}(i)` is the average time to evaluate the body on the ith iteration.
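
For instance (an illustrative fragment, not from the notes), a loop whose iteration count is a fixed constant, independent of the input size, falls into this case; with an O(1) condition and body, the whole loop is O(1):

```cpp
// Always performs exactly 10 iterations, regardless of how large
// the array is, so t_loop = t_init + 10 * O(1) = O(1).
double sumOfFirstTen(const double data[])
{
    double sum = 0.0;                 // initialization: O(1)
    for (int i = 0; i < 10; ++i)      // condition: O(1) per evaluation
        sum += data[i];               // body: O(1)
    return sum;
}
```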

General case: unknown # of iterations. But in many cases, the number of iterations is not a constant but depends on the inputs. In general, if `p_k` is the probability that the loop would terminate after `k` iterations, the average running time would be:

`t_{\text{loop}} = t_{\text{init}} + \sum_{k=0}^{\infty} p_k (t_{\text{condition}}(k+1: k) + \sum_{i=0}^{k} (t_{\text{condition}}(i: k) + t_{\text{body}}(i: k)))`

where `t_{\text{condition}}(i: k)` is the average time required to evaluate the condition on the ith iteration when the loop will take k total iterations, and `t_{\text{body}}(i: k)` is defined similarly for loop bodies.

This is a fairly messy formula, but we'll usually fall back on special cases.

  • The `t_{\text{condition}}(k+1: k)` term reflects the fact that the condition is evaluated one last time, after the final iteration, without the body being executed.

  • The `\infty` is not a mistake. It expresses the general case that some loops could execute an arbitrary number of times depending upon the input. If, however, we knew that a loop never executed more than, say, 100 times, then the probabilities `p_{101}, p_{102}, \ldots` would all be zero.

  • Suppose that we knew that k was a constant. Then `p_k = 1` for that single value of k, and `p_j = 0` for all `j != k`. That leads directly to our earlier special case.

Special case: fixed bound on body & condition. If `t_{\text{condition}}(i: k) + t_{\text{body}}(i: k)` does not vary from one iteration to another, then

`t_{\text{loop}} = t_{\text{init}} + \sum_{k=0}^{\infty} p_k ((k+1)*t_{\text{condition}} + k*t_{\text{body}})`

Special case: fixed bound and equally probable iterations. If the loop condition and body times do not vary across iterations, and we know that the loop takes anywhere from M to N iterations, and that each number of iterations in that range is equally likely, then

`t_{\text{loop}} = t_{\text{init}} + O(1 + \frac{M + N}{2} \cdot (t_{\text{condition}} + t_{\text{body}}))`
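
That average iteration count is just the expected value of k when each of the `N - M + 1` possible counts is equally likely (`p_k = 1/(N - M + 1)` for `M \le k \le N`):

`\sum_{k=M}^{N} \frac{k}{N - M + 1} = \frac{1}{N - M + 1} \cdot \frac{(M + N)(N - M + 1)}{2} = \frac{M + N}{2}`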

3. Example: Ordered Insert

We'll illustrate the process of doing average-case analysis by looking at the ordered insertion algorithm.
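
The original notes carry out this analysis by annotating the routine's source code directly. That listing is not reproduced here, so the following is a rough C++ sketch of the kind of ordered-insertion routine being discussed (the name orderedInsert and the parameter types are illustrative assumptions, not Zeil's exact code):

```cpp
// Insert x into array[0..size-1], which is already in ascending order,
// shifting larger elements up so the array stays sorted.
// Assumes the array has room for at least one more element.
void orderedInsert(double array[], int& size, double x)
{
    int i = size;                          // O(1)
    while (i > 0 && array[i-1] > x) {      // loop condition: O(1)
        array[i] = array[i-1];             // loop body: O(1)
        --i;
    }
    array[i] = x;                          // O(1)
    ++size;                                // O(1)
}
```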

We start, as usual, by marking the simple bits O(1).

Next we note that the loop body can be reduced to O(1).

The complexity of the loop condition and of the body does not vary from one iteration to another.

The loop can execute

  • 0 times (if x is larger than anything already in the array),

  • 1 time (if x is larger than all but one element already in the array),

  • and so on up to a maximum of n times (if x is smaller than everything already in the array).

What we don't know are the probabilities to associate with these different numbers of iterations.

  • They depend upon the way the successive inputs to this function are distributed.

When I used this algorithm as part of a spell-checking program, I saw two different examples of possible input patterns:

  • In some cases, we were getting elements that were already in sorted order (e.g., reading from a sorted dictionary file).

  • We also faced input distributions where successive values of x were coming in essentially random order (e.g., adding words from the document into a "concordance" - a set of all words known to have appeared in the document).

3.1. Input in Sorted Order

In this case, we know the algorithm always uses 0 iterations of the loop:

`p_0=1, p_1 = 0, p_2 = 0, \ldots`

So the time is

`t_{\text{loop}} = t_{\text{init}} + \sum_{k=0}^{\infty} p_k ((k+1)*t_{\text{condition}} + k*t_{\text{body}})`

but plugging in the p's simplifies to

`t_{\text{loop}} = t_{\text{init}} + t_{\text{condition}} = O(1)`

For this input pattern, the entire algorithm has an average-case complexity of O(1).

3.2. Input in Arbitrary Order:

In this case, we are equally likely to need 0 iterations, 1 iteration, 2 iterations, …, or n iterations. The iterations counts are all equally likely, so we can use the "fixed bound and equally probable iterations" special case for loops.

`t_{\text{loop}} = t_{\text{init}} + O(1 + \frac{M + N}{2} \cdot (t_{\text{condition}} + t_{\text{body}}))`
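
Here the loop may require anywhere from 0 to n iterations, so `M = 0` and `N = n`, giving an average of `(0 + n)/2 = n/2` iterations,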

and, after our usual simplifications

`t_{\text{loop}} = O(n)`

And we can then replace the entire loop by `O(n)`.

And now, we add up the complexities in the remaining straight-line sequence, and conclude that the entire algorithm has an average case complexity of O(n) when presented with randomly arranged inputs.

This is the same result we had for the worst case analysis. That's not unusual.

Under similar randomly arranged inputs, the average case complexity of ordered search is O(n) and the average case complexity of binary search is O(log n). Again, these are the same as their worst-case complexities.

3.3. General Case

Looking back a few steps in the analysis of this code, what can we say about the average-case complexity before looking at the specific input distribution?

Let `k_{\text{avg}}` denote the average number of times that the loop is executed.

Then we quickly conclude that the loop is `O(k_{\text{avg}})`,

and so, in fact, is the entire function.

3.4. Inputs in Almost-Sorted Order

We've already considered the case where the inputs to this function were already arranged into ascending order. What would happen if the inputs were almost, but not exactly, already sorted into ascending order?

For example, suppose that, on average, one out of n items arrives out of order. Then the probability that a given input requires zero iterations of the loop would be `p_0 = (n-1)/n`, some single `p_i` would be `1/n`, and all the other probabilities would be zero.

Assuming the worst (because we want to find an upper bound), let's say that the one out-of-order element is the very last one added, and that it actually gets inserted into position 0. Then we have

`p_0 = (n-1)/n, \ p_1 = 0, \ p_2 = 0, \ \ldots, \ p_{n-1} = 0, \ p_n = 1/n`

So the average number of iterations would be given by

`k_{\text{avg}} = \sum_{i=0}^{n} (i+1) p_i`

`\ \ \ = 1 \cdot \frac{n-1}{n} + (n+1) \cdot \frac{1}{n}`

`\ \ \ = \frac{2n}{n}`

`\ \ \ = 2`

and the function is `O(k_{\text{avg}}) = O(1)`.

3.5. Almost Sorted - version 2

Now, that's only one possible scenario in which the inputs are almost sorted. Let's look at another. Suppose that we knew that, for each successive input, the probability of it appearing in the input `m` steps out of its correct position is proportional to `1/(m+1)` (i.e., each additional step out of its correct position is progressively more unlikely). Then we have

`p_{0}=c, p_{1}=c/2, p_{2}=c/3, \ldots p_{n-1}=c/n, p_{n}=c/(n+1)`

The constant `c` is necessary because the sum of all the probabilities must be exactly 1. We can compute the value of `c` by using that fact:

`\sum_{i=0}^{n} p_i = 1`

`\sum_{i=0}^{n} \frac{c}{i+1} = 1`

`c \sum_{i=0}^{n} \frac{1}{i+1} = 1`

This sum, for reasonably large n, is approximately `log n`.
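
More precisely, that sum is the harmonic number `H_{n+1}`:

`\sum_{i=0}^{n} \frac{1}{i+1} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n+1} = H_{n+1} \approx \ln(n+1)`

and inside a big-O, the natural log and `\log n` differ only by a constant factor.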

So we conclude that `c` is approximately `1/\log n`.

So the function, for this input distribution, is

`t_{\text{seqIns}} = O(k_{\text{avg}})`

`\ \ \ = O(\sum_{i=0}^{n} (i + 1) p_i)`

`\ \ \ = O(\sum_{i=0}^{n} (i + 1) \frac{c}{i+1})`

`\ \ \ = O(\sum_{i=0}^{n} c)`

`\ \ \ = O((n+1) c)`

`\ \ \ = O(n / \log n)`

So the average case is slightly smaller than the worst case, though not by much (remember that `\log n` is nearly constant over large ranges of `n`, so `n/\log n` grows only slightly more slowly than `n`).
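
As a rough numeric check (an illustration, not part of the original notes), we can compute `k_{\text{avg}} = (n+1)c` for this distribution directly and compare it with `n / \ln n`:

```cpp
#include <cmath>
#include <iostream>

int main()
{
    // k_avg for the "almost sorted, version 2" distribution:
    // p_i = c/(i+1), with c chosen so that the probabilities sum to 1.
    for (int n : {100, 1000, 10000, 100000}) {
        double harmonic = 0.0;               // sum of 1/(i+1) for i = 0..n
        for (int i = 0; i <= n; ++i)
            harmonic += 1.0 / (i + 1);

        double c = 1.0 / harmonic;           // normalizing constant
        double kAvg = (n + 1) * c;           // sum of (i+1) * c/(i+1)

        std::cout << "n = " << n
                  << ", k_avg = " << kAvg
                  << ", n/ln(n) = " << n / std::log(n) << "\n";
    }
    return 0;
}
```

Both quantities grow at essentially the same `n/\log n` rate, which is the point of the analysis above.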

4. The Input Distribution is Key

You can see, then, that average case complexity can vary considerably depending upon just what constitutes an average set of inputs. Utility functions that get used in many different programs may see different input distributions in each program, and so their average performances in different programs will vary accordingly.

