A tree in which every parent has at most 2 children is a
*binary tree*.

The most common use of binary trees is for ADTs that require frequent searches for arbitrary keys.

E.g., sets, maps

For this we use a special form of binary tree, the binary search tree.

A binary tree T is a *binary search tree* if
for each node n with children T_{L} and T_{R}:

The value in n is greater than the values in every node in T_{L}.

The value in n is less than the values in every node in T_{R}.

Both T_{L} and T_{R} are binary search trees.

**Question**: Is this a BST?

Yes, this is a binary search tree. Each node is greater than or equal to all of its left descendants, and less than or equal to all of its right descendants.

Let's look at the basic interface for a binary search tree.

This code is taken from your textbook and is the same code used in our prior discussion of tree iterators.

Some points of note:

The `stnode` template implements individual tree nodes.

The `stree` template represents the entire tree, with functions for searching, insertion, iteration, etc.

Our primary focus in this lecture will be on the `find`, `insert`, and `erase` functions.
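A minimal sketch of how such a pair of templates might be laid out (the names `stnode` and `stree` follow the lecture, but the members and signatures here are assumptions, not the textbook's actual code):

```cpp
#include <cstddef>

// Sketch of a tree node: holds one value plus child pointers.
template <typename T>
struct stnode {
    T nodeValue;
    stnode<T>* left;
    stnode<T>* right;
    stnode(const T& v, stnode<T>* l = nullptr, stnode<T>* r = nullptr)
        : nodeValue(v), left(l), right(r) {}
};

// Sketch of the tree class: owns the root and tracks the node count.
template <typename T>
class stree {
public:
    stree() : root(nullptr), treeSize(0) {}
    std::size_t size() const { return treeSize; }
    // find, insert, erase, and the iterator types would be declared here ...
private:
    stnode<T>* root;
    std::size_t treeSize;
};
```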

Since you have, presumably, read your text's discussion of how to implement BSTs, I'm mainly going to hit the high points.

We'll start by reviewing the basic searching algorithm.

The tree's `find` operation works by using a private
utility function, `findNode`, to find a pointer to the node
containing the desired data, and then uses that pointer to construct an
iterator representing the position of that node.

We search a tree by comparing the value we're searching for to the
“current” node, `t`. If the value we want is
smaller, we look in the left subtree. If the value we want is larger, we
look in the right subtree.
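The search just described can be sketched as a recursive function (a minimal illustration; the `Node` type and member names here are simplified stand-ins for the textbook's `stnode`):

```cpp
// Minimal node type for the sketch (a stand-in for stnode).
struct Node {
    int value;
    Node* left;
    Node* right;
};

// Recursive BST search: compare, then descend left or right.
Node* findNode(Node* t, int item) {
    if (t == nullptr) return nullptr;    // fell off the tree: not found
    if (item < t->value)
        return findNode(t->left, item);  // smaller values live on the left
    if (item > t->value)
        return findNode(t->right, item); // larger values live on the right
    return t;                            // equal: found it
}
```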

You may note that this algorithm bears a certain resemblance to the binary search algorithm we studied earlier in the semester. We shall see shortly that the performance of both search algorithms on a collection of N items is O(log N), but that binary trees support faster insertion operations, allowing us to build the searchable collection in less time than when using binary search over sorted arrays.

You can run this algorithm to see how it works.

The first part of the
insertion function is closely related to the recursive form of the
search. In fact, we are searching for the place where the new data
*would* reside, if it were in the tree.

We know we have not found it when we reach a null pointer. Since that pointer (as either the left or right child of some parent node) was found by asking “where would this data go if it were in the tree?”, we know that we can, in fact, insert the data here.

You might want to run this algorithm and experiment with inserting nodes into binary search trees. Take particular note of what happens if you insert data in ascending or descending order, as opposed to inserting “randomly” ordered data.
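The insertion idea above can be sketched the same way (again with a simplified, assumed `Node` type, not the textbook's code; passing the pointer by reference lets the recursion overwrite the null child where the search “falls off”):

```cpp
// Minimal node type for the sketch.
struct Node {
    int value;
    Node* left;
    Node* right;
};

// Recursive BST insertion: search for the value's place, then attach
// a new node at the null pointer where the search falls off the tree.
void insertNode(Node*& t, int item) {
    if (t == nullptr)                    // the spot where item "would" be
        t = new Node{item, nullptr, nullptr};
    else if (item < t->value)
        insertNode(t->left, item);
    else if (item > t->value)
        insertNode(t->right, item);
    // equal: already present; do nothing (set-style semantics assumed)
}
```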

Our tree class actually provides two distinct approaches to erasing. We can erase the data at a given position (iterator) or erase a given value, if it exists.

Deleting a value is shown here. We simply do a conventional binary
search tree `findNode` call and, if the value actually exists
in the tree, erase the node at the position where we found the
data.

In essence, this “passes the buck” to the "erase at a position" function, which we will look at next.

Here is the `erase` algorithm. For the moment,
concentrate on the code for replacing the
node we want to erase, `pNodePtr`, by a
replacement node `rNodePtr`. You can see that it is careful
to place the address of the replacement into either the tree root, the
left child of the erased node's parent, or the right child of the erased
node's parent, depending on the data value in the parent.

Most of the code in this function is actually concerned with finding that replacement node. We can break down the problem of finding a suitable replacement when removing a given node from a BST into cases:

Removing a leaf

Removing a node that has only one child

only a left child

only a right child

Removing a node that has two children

Case 1: Suppose we wanted to remove the “40” from this tree. What would we have to do so that the remaining nodes would still be a valid BST?

*Nothing at all*!

If we simply delete this node (setting the pointer to it from its parent to 0), what's left would still be a perfectly good binary search tree --- it would satisfy all the BST ordering requirements.

Now, take a look at this code for removing a node, pointed at by
`dNodePtr`, from a BST. Find the “leaf” case, and you can see
that all we do is to delete the node. (Note that when we assign
`dNodePtr->left` to `rNodePtr`, in this
leaf case `dNodePtr->left` is null.)

So if we are removing a tree leaf, we "replace" it by a null pointer.

Case 2: Suppose we wanted to remove the “20” or the “70” from this tree. What would we have to do so that the remaining nodes would still be a valid BST?

There is one pointer to the node being deleted, and one pointer from that node to its only child. So this is actually a bit like deleting a node from the middle of a linked list. All we need to do is to reroute the pointer from the parent (“30”) to the node we want to remove, making that pointer point directly to the child of the node we are going to remove.

For example, starting from this:

Verify for yourself that if we remove 20:

or 70:

in this manner, the results are still valid BSTs.

Again, take a look at this
code for the case when the node being erased has exactly
one child. Notice that its non-null child is chosen as the replacement
node, `rNodePtr`.

Case 3: Suppose we wanted to remove the “50” or the “30” from this tree. What would we have to do so that the remaining nodes would still be a valid BST?

This is a hard case. Clearly, if we remove either the "50" or "30" nodes, we break the tree into pieces, with no obvious place to put the now-detached subtrees.

So let's take a different tack. Instead of deleting this node, is there some other data value that we could put into that node that would preserve the BST ordering (all nodes to the left must be less, all nodes to the right must be greater than or equal)?

There are, in fact, two values that we could safely put in there: the smallest value from the right subtree, or the largest value from the left subtree.

We can find the largest value on the left by

taking one step to the left

then running as far down to the right as we can go

We can find the smallest value on the right by

taking one step to the right

then running as far down to the left as we can go

Now, if we replace “30” by the largest value from the left:

or by the smallest value from the right:

the results are properly ordered for a BST, except possibly for the node we just copied the value from. But since that node is now redundant, we can delete it from the tree.

And here's the best part. Since we find the node to copy from by running as far as we can go in one direction or the other, we know that the node we copied from has at least 1 null child pointer (otherwise we would have kept running past it). So removing it from the tree will always fall into one of the earlier, simpler cases (leaf or only one child).

Again, take a look at the code for removing a node. This code does the "step to the right, then run to the left" behavior we have just described in order to find the replacement node. The remaining code is concerned with removing that replacement node from where it currently resides so that we can then link it in to the parent of the node being erased.
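The three cases can be sketched together in one recursive function (an illustration with an assumed `Node` type, not the textbook's `erase`; for simplicity this version copies the replacement *value* up rather than relinking the replacement node as the textbook code does):

```cpp
// Minimal node type for the sketch.
struct Node {
    int value;
    Node* left;
    Node* right;
};

// Remove item from the BST rooted at t, handling the three cases:
// leaf, one child, and two children.
void eraseNode(Node*& t, int item) {
    if (t == nullptr) return;                      // value not present
    if (item < t->value)       eraseNode(t->left, item);
    else if (item > t->value)  eraseNode(t->right, item);
    else if (t->left != nullptr && t->right != nullptr) {
        // Two children: step right, then run left to the smallest
        // value in the right subtree.
        Node* repl = t->right;
        while (repl->left != nullptr) repl = repl->left;
        t->value = repl->value;                    // copy replacement up
        eraseNode(t->right, repl->value);          // now a simpler case
    } else {
        // Leaf or one child: splice the node out. For a leaf, both
        // children are null, so t simply becomes null.
        Node* old = t;
        t = (t->left != nullptr) ? t->left : t->right;
        delete old;
    }
}
```

Note how the two-children case always bottoms out in one of the simpler cases, exactly as argued above: the smallest node of the right subtree has no left child.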

Finally, try running this algorithm, available as “erase from a position”. Try to observe each of the major cases, as outlined here, in action.

Each step in the BST `insert` and `findNode`
algorithms moves one level deeper in the tree. Similarly, in
`erase`, the only part that is not constant time is the
“running down the tree” to find the smallest value to the
right.

The number of recursive calls/loop iterations in all these algorithms is therefore no greater than the height of the tree.

But how high can a BST be?

That depends on how well the tree is “balanced”.

A binary tree is *balanced* if for every
interior node, the height of its two children differ by at most
1.

Unbalanced trees are easy to obtain.

This is a BST.

But, so is this!

The shape of the tree depends upon the order of insertions.

The worst case is when the data being inserted is already in order
(or in reverse order). In that case, the tree
*degenerates* into a sorted linked list, as shown
here.
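You can check the degenerate behavior with a short experiment (illustrative code, not the textbook's `stree`): inserting n keys in ascending order produces a tree of height n - 1, while a “shuffled” insertion order stays much shallower.

```cpp
#include <algorithm>

// Minimal node type and insertion, as in the earlier sketches.
struct Node {
    int value;
    Node* left;
    Node* right;
};

void insertNode(Node*& t, int item) {
    if (t == nullptr) t = new Node{item, nullptr, nullptr};
    else if (item < t->value) insertNode(t->left, item);
    else if (item > t->value) insertNode(t->right, item);
}

// Height counted in edges; the empty tree has height -1.
int height(const Node* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(height(t->left), height(t->right));
}
```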

The best case is when the tree is *balanced*,
meaning that, for each node, the heights of the node's children are
nearly the same.

Consider the `findNode` operation on a nearly balanced tree with N nodes.

**Question**: What is the complexity
of the best case?

O(1)

O(log N)

O(N)

O(N log N)

O(N^2)

In the best case, we find what we're looking for in the root of the tree. That's O(1) time.

**Question**: Consider the `findNode` operation on a nearly balanced tree with N nodes.

What is the complexity of the worst case?

O(1)

O(log N)

O(N)

O(N log N)

O(N^2)

The `findNode` operation starts at the root and moves
down one level with each recursive call. So it is, in the worst case,
O(h), where h is the height of the tree.

But how high is a balanced tree?

A nearly balanced tree will have height approximately log N. Consider a tree that is completely balanced and has its lowest level full. Since every node on the lowest level shares a parent with one other, there will be exactly half as many nodes on the next-to-lowest level as on the lowest. And, by the same reasoning, each level will have half as many nodes as the one below it, until we finally get to the single root at the top of the tree.

So a balanced tree has height log N.
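The halving argument can be written out: counting 1 node at the root, 2 below it, and so on down to the full bottom level of a completely balanced tree of height h gives

```latex
N = 1 + 2 + 4 + \cdots + 2^{h} = \sum_{i=0}^{h} 2^{i} = 2^{h+1} - 1,
\qquad\text{so}\qquad
h = \log_2(N + 1) - 1 = O(\log N).
```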

**Question**: Now, consider the `findNode` operation on a degenerate tree with N nodes.

What is the complexity of the best case?

O(1)

O(log N)

O(N)

O(N log N)

O(N^2)

In the best case, we find what we're looking for in the root of the tree. That's O(1) time.

**Question**: Consider the `findNode` operation on a degenerate tree with N nodes.

What is the complexity of the worst case?

O(1)

O(log N)

O(N)

O(N log N)

O(N^2)

A degenerate tree looks like a linked list. In the worst case, the value we're looking for is at the end of the list, so we have to search through all N nodes to get there. Thus the worst case is O(N).

There's quite a difference, then, between the worst case behavior of trees, depending upon the tree's “shape”.

So the question is, does the "average" binary tree look more like the balanced or the degenerate case?

An intuitive argument is:

No tree with n nodes has height `< log(n)`

No tree with n nodes has height `> n`

The average depth of all nodes is therefore bounded between `(log n)/2` and `n/2`.

The more unbalanced a tree is, the less likely that a random insertion would increase the tree height.

For example, if we are inserting into this tree, then any insertion will increase the tree's height.

But if we were inserting a randomly selected value into this one, then there is only a `2/8` chance that we will increase the height of the tree.

For trees that are somewhere between those two extremes, the chances of a random insertion actually increasing the height of the tree will fall somewhere between those two probability extremes.

Insertions that don't increase the tree height make the tree more balanced.

So, the more unbalanced a tree is, the more likely that a
*random* insertion will actually tend to increase the
balance of the tree. This suggests (but does not prove) that randomly
constructed binary search trees tend to be reasonably balanced.

It is possible to prove this claim, but the proof is beyond the scope of this class.

But, it's not safe to be too sanguine about the height of binary search trees. Although random construction tends to yield reasonable balance, in real applications we often do not get random values.

**Question**: Which of the following
data would, if inserted into an initially empty binary search tree,
yield a degenerate tree?

data that is in ascending order

data that is in descending order

both of the above

none of the above

Data inserted in either ascending or descending order results in a degenerate tree. (Try it if you are not convinced.)

It's very common to get data that is in sorted or almost sorted order, so degenerate behavior turns out to be more common than we might expect.

Also, the arguments made so far don't take deletions into account, which tend to unbalance trees.

Later, we'll look at variants of the binary search tree that use more elaborate insertion and deletion algorithms to maintain tree balance.

## In the Forum: