Analysis of Algorithms: Motivation

1. Comparison of Spellcheck Solutions
2. Why Not Just Time the Programs?
2.1. Better than Timing

An important theme throughout this semester will be viewing the process of developing software as an engineering process. Engineers in traditional disciplines (civil engineers, electrical engineers, and the like) face trade-offs among cost, performance, and quality when developing a new product. One of the jobs of an engineer is to look at competing proposals for the design of a new product and to estimate ahead of time the cost, the speed, the strength, the quality, etc., of the products that would result from these competing designs.

Software developers face the same kinds of choices. Early on, you may have several alternative designs and need to decide which of those designs to actually pursue. It's no good waiting until the program has already been implemented in code. By then you've already committed to one design and invested significant resources in it.

How do we make these kinds of choices? In this course, we'll be looking at mathematical techniques for analyzing algorithms to determine what their speed will be. It will be important that we do this both with real algorithms already written into code and with proposed algorithms that have been given a much sketchier description, probably written in pseudocode.

Suppose that we work for a company that produces word processors and other text manipulation programs. The company has decided to add an automatic spell-check feature to the product line. Our designers have considered how to check a document for spelling errors (i.e., any words not in a "dictionary" of known words) and have proposed two different algorithms for finding the set of misspelled words within a target file.

In the first, shown here, we read words, one at a time, from the target file. Each word that is not in the dictionary gets added to the set of misspellings.
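As a rough illustration of this first approach, here is a minimal sketch in Python. It assumes the dictionary is held as a set of lowercase words and that a "word" is any run of letters and apostrophes. The function name find_misspellings_direct and the tokenizing pattern are hypothetical choices made for this sketch, not part of the designers' proposal.

```python
import re

# Assumption for this sketch: a "word" is a run of letters and apostrophes.
WORD_PATTERN = re.compile(r"[A-Za-z']+")

def find_misspellings_direct(text, dictionary):
    """Return the set of words in text that are not in dictionary.

    Every occurrence of every word triggers its own dictionary lookup.
    """
    misspellings = set()
    for match in WORD_PATTERN.finditer(text):
        word = match.group().lower()
        # One lookup per occurrence, even for words already seen.
        if word not in dictionary:
            misspellings.add(word)
    return misspellings
```

Note that a common word such as "the" is looked up once for each time it appears in the text, which is exactly the behavior the second proposal tries to avoid.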

Some of the designers, however, have objected that the first algorithm will waste time by repeatedly looking up common words like "a", "the", etc., in the dictionary.

They suggest an alternative algorithm shown here. It works by first collecting all words from the document to form a concordance, an index of all the words taken from a document together with the locations where they were found. Then each word is checked just once against the dictionary, no matter how many times that word actually occurs within the target document.
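The concordance-based alternative might be sketched as follows, again in Python and under the same assumptions (a set-based dictionary of lowercase words and a simple letters-and-apostrophes tokenizer). The names build_concordance and find_misspellings_concordance are hypothetical, and recording character offsets as the "locations" is one possible choice among several.

```python
import re
from collections import defaultdict

# Assumption for this sketch: a "word" is a run of letters and apostrophes.
WORD_PATTERN = re.compile(r"[A-Za-z']+")

def build_concordance(text):
    """Map each distinct word to the character offsets where it occurs."""
    concordance = defaultdict(list)
    for match in WORD_PATTERN.finditer(text):
        concordance[match.group().lower()].append(match.start())
    return concordance

def find_misspellings_concordance(text, dictionary):
    """Return the set of words in text that are not in dictionary.

    Each distinct word is checked against the dictionary exactly once.
    """
    concordance = build_concordance(text)
    # Iterating over the concordance visits each distinct word once.
    return {word for word in concordance if word not in dictionary}
```

This version performs one dictionary lookup per distinct word, but it pays for that saving with the extra work and storage needed to build the concordance first.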

So, which of these algorithms is likely to run faster?