Copyright © 1999-2006 Steven J. Zeil, Old Dominion University
In the first section of this course, we looked at the process of analyzing the worst-case running time of an algorithm, and at the use of big-O notation as a way of describing it.
In this section, we will introduce average-case complexity. Just as the worst-case complexity describes an upper bound on the worst-case time we would see when running an algorithm, average-case complexity will present an upper bound on the average time we would see when running the algorithm many times on many different inputs.
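To make the distinction concrete, here is a minimal sketch in Python (not part of the original notes) that counts the comparisons made by a sequential search. The function name `comparisons_to_find` and the list size are hypothetical, and the "average" assumes that every search succeeds and that each position in the list is equally likely to hold the target. Different assumptions about the inputs would give a different average.

```python
def comparisons_to_find(items, target):
    """Return how many element comparisons a sequential search makes."""
    count = 0
    for item in items:
        count += 1
        if item == target:
            break
    return count


n = 1000
items = list(range(n))

# Assumption: every target is present and every position is equally likely.
costs = [comparisons_to_find(items, target) for target in items]

worst = max(costs)                  # the single slowest input
average = sum(costs) / len(costs)   # mean over all equally likely inputs

print(worst, average)  # prints: 1000 500.5
```

Under these assumptions the worst case is n comparisons and the average is (n+1)/2. The average is about half the worst case, but both grow linearly with n.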
It is true that worst-case complexity gets used more often than average-case. There are a number of reasons for this.
The worst-case complexity is often easier to compute than the average case. Just figuring out what an “average” set of inputs will look like is often a challenge. To figure out the worst-case complexity, we only need to identify the one input that results in the slowest running time.
In many cases, the worst-case and average-case complexities will turn out to be the same.
Finally, reporting the worst case to your boss or your customers is often “safer” than reporting the average. If you give them the average, then sometimes they will run the program and see slower performance than they had expected. Human nature being what it is, they will probably get rather annoyed. On the other hand, if you go to those same customers with the worst-case figure, most of the time they will observe faster-than-expected behavior, and will be more pleased.
This appears to be particularly true of interactive programs. When people are typing things in or clicking with the mouse and then have to sit for a long time waiting for a response, they're going to remember that. Even if they get an instant response 99.9% of the time (so the average response is still quite good), they will characterize your program as “sluggish”.
In those circumstances, it makes sense to focus on worst-case behavior and to do what we can to improve that worst case.
On the other hand, suppose we're talking about a batch program that will process thousands of inputs per run, or about a critical piece of an interactive program that gets run hundreds or thousands of times between responses to the user. In that situation, adding up hundreds or thousands of worst cases may be just too pessimistic. The cumulative time of thousands of different runs should show some averaging out of the worst-case behavior, and an average-case analysis may give a more realistic picture of what the user will be seeing.
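The batch scenario can be illustrated with a small simulation. This is only a sketch under stated assumptions: each run is modeled as a successful sequential search over n items whose cost is a position drawn uniformly at random from 1 to n, and the name `batch_cost` and the chosen sizes are hypothetical.

```python
import random


def batch_cost(n, runs, seed=0):
    """Total comparisons over `runs` simulated sequential searches.

    Assumption: each search finds its target at a position drawn uniformly
    from 1..n, so that single run costs exactly that many comparisons.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return sum(rng.randint(1, n) for _ in range(runs))


n, runs = 1000, 10_000

observed = batch_cost(n, runs)
pessimistic = runs * n  # every run charged at the worst-case cost

print("observed total:    ", observed)
print("worst-case total:  ", pessimistic)
print("observed mean cost:", observed / runs)
```

With these assumptions the observed total should come out to roughly half of the worst-case total, which is the kind of averaging out described above.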