Hash-Based Sets and Maps

Steven J. Zeil

Old Dominion University, Dept. of Computer Science

Table of Contents

1. Examples: The Unordered Set
1.1. An Unordered Set of Strings
1.2. Supplying a Hash Function
2. All This May Change

Coming soon to a compiler near you!

The associative containers (sets and maps) in the C++ std library are based, as we have seen, on balanced binary trees. There are times when even the O(log size()) performance of these containers is considered too slow. Hashing-based versions of these have been one of the more common requests to the C++ standards committees.

As part of the ongoing standardization process, a set of hash-based containers has been proposed and has made it through some of the early rounds of review. Before too long, you may begin to see provisional forms of these being distributed with some C++ compilers. (In fact, the latest versions of g++ now include these containers.)

The hash-based versions of set and map (and of their "multi-" cousins) will offer an average time of nearly O(1) for insertion and searching. As always with hashing, we pay for this increase in speed with an increase in the memory required. To keep the average time near O(1), these classes will rehash into a larger table whenever the table gets full enough to degrade performance.
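To see the rehashing at work, here is a minimal sketch. It assumes a compiler that ships these containers under the names later standardized in C++11 (std::unordered_set in the header <unordered_set>). The helper name countRehashes is hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper (not part of any library): insert n distinct keys
// and count how often the table changed its number of buckets.
// Each change in bucket_count() corresponds to one rehash.
std::size_t countRehashes(std::size_t n)
{
    std::unordered_set<std::string> words;
    std::size_t rehashes = 0;
    std::size_t buckets = words.bucket_count();

    for (std::size_t i = 0; i < n; ++i) {
        words.insert("key" + std::to_string(i));   // average O(1)

        if (words.bucket_count() != buckets) {      // the table grew
            ++rehashes;
            buckets = words.bucket_count();
        }

        // Rehashing keeps the table from getting too full.
        assert(words.load_factor() <= words.max_load_factor());
    }

    // Searching is also average O(1).
    assert(n == 0 || words.find("key0") != words.end());
    return rehashes;
}
```

The exact number of rehashes depends on the library's starting bucket count and growth policy, so we should expect only that it is much smaller than the number of insertions.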

The tree-based set and map containers have the property that they keep their keys in order. When we use iterators to look at the contents of a std::set, for example, we get the data in ascending order.
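As a small illustration, the sketch below inserts strings into a std::set in arbitrary order and copies them back out by iterator. The function name visitOrder is hypothetical, used only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: insert the given strings into a tree-based set,
// then return them in the order that the set's iterators visit them.
std::vector<std::string> visitOrder(const std::vector<std::string>& insertions)
{
    std::set<std::string> s(insertions.begin(), insertions.end());
    std::vector<std::string> visited(s.begin(), s.end());

    // A std::set always yields its keys in ascending order.
    assert(std::is_sorted(visited.begin(), visited.end()));
    return visited;
}
```

Whatever order the strings are inserted in, the vector that comes back is sorted, and any duplicates have been dropped.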

Hash tables, on the other hand, by their very nature try to distribute their keys as randomly as possible. So one of the things that we give up when using hash-based storage is that ordering. We can still use iterators to get at all the keys, but there's no telling in what order we will see those data values appear. Because of this, the new hash-based containers have been dubbed unordered associative containers.
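If we do need the keys in order, one possible workaround is to copy them out of the container and sort the copy. The sketch below assumes the C++11 name std::unordered_set for the hash-based set; the helper name sortedKeys is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: return the keys of an unordered set in ascending order.
std::vector<std::string> sortedKeys(const std::unordered_set<std::string>& s)
{
    // Iteration visits every key exactly once, but in an unspecified order.
    std::vector<std::string> keys(s.begin(), s.end());

    // Restore the ordering that the hash table does not keep for us.
    std::sort(keys.begin(), keys.end());
    return keys;
}
```

The sort costs O(n log n), so this only pays off when ordered output is needed occasionally. If every traversal must be in order, the tree-based std::set is probably the better fit.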
