Lists are great for storing a bunch of items, but what about looking up specific elements? In the previous chapter, a skip list greatly outperformed a regular linked list when simply finding an item. Why? Because it was utilizing an iteration strategy that resembles that of a balanced tree structure: there, the internal order lets the algorithm strategically skip items. However, that's only the beginning. Many libraries, databases, and search engines are built on trees; in fact, whenever a program is compiled, the compiler creates an abstract syntax tree.
Tree-based data structures incorporate all kinds of smart ideas that we will explore in this chapter, so you can look forward to the following:
A tree structure is almost like a linked list: each node has branches (two, in the case of a binary tree), which represent children of that node. Since these children can have children of their own, the node count can grow exponentially with each level, building a hierarchical structure that looks like a regular tree turned on its head.
Binary trees are a subset of these structures with only two branches, typically called left and right. However, that alone does not help the tree's performance. This is why the binary search tree was established: there, left holds values smaller than or equal to the parent, and right holds anything greater than the parent node.
If that was confusing, don't worry; there will be code. First, some vocabulary though: what would you call the far ends of the tree? Leaves. Cutting off branches? Pruning. The number of branches per node? Branching factor (binary trees have a branching factor of 2).
Great, with that out of the way, the nodes can be shown—although they look a lot like the doubly linked list from the previous chapter:
type Tree = Option<Box<Node>>;

struct Node {
    pub value: u64,
    left: Tree,
    right: Tree,
}
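To make the vocabulary concrete, the following minimal sketch builds a three-node tree by hand and counts its leaves. The `leaf`, `example_tree`, and `count_leaves` helpers are hypothetical names introduced only for this illustration; they are not part of the chapter's implementation.

```rust
type Tree = Option<Box<Node>>;

struct Node {
    pub value: u64,
    left: Tree,
    right: Tree,
}

// Hypothetical helper: wraps a value in a node without children (a leaf).
fn leaf(value: u64) -> Tree {
    Some(Box::new(Node { value, left: None, right: None }))
}

// Hypothetical helper: 2 is the root; 1 and 3 are its left and right children.
fn example_tree() -> Tree {
    Some(Box::new(Node { value: 2, left: leaf(1), right: leaf(3) }))
}

// Hypothetical helper: counts the nodes that have no children.
fn count_leaves(tree: &Tree) -> usize {
    match tree {
        None => 0,
        Some(n) if n.left.is_none() && n.right.is_none() => 1,
        Some(n) => count_leaves(&n.left) + count_leaves(&n.right),
    }
}
```

Called from a `main` function, `count_leaves(&example_tree())` evaluates to 2, since the nodes holding 1 and 3 are the far ends of this small tree.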
Similarly, the tree structure itself is only a pointer to the root node:
pub struct BinarySearchTree {
    root: Tree,
    pub length: u64,
}
Yet before you can get comfortable with the new data structure, the product team from the previous chapter is back! You did a great job improving the transaction log, and they want to continue that progress and build an Internet of Things (IoT) device management platform so users can register a device with a numerical name and later search for it. However, the search has to be really fast, which is especially critical since many customers have announced the incorporation of more than 10,000 devices into the new system!
Isn't this a great opportunity to get more experience with a binary search tree?
Device management in the IoT space is mostly about storing and retrieving specific devices or device twins. These objects typically store addresses, configuration values, encryption keys, or other things for small devices so nobody has to connect manually. Consequently, keeping an inventory is critical!
For now, the product team has settled on a numerical "name" in order to ship faster than the competition and to keep the requirements short:
- Store IoT device objects (containing the IP address, numerical name, and type)
- Retrieve IoT objects by numerical name
- Iterate over IoT objects
A great use for a tree: the numerical name can be used to create a tree and search for it nice and quickly. The basic object for storing this IoT device information looks like this:
#[derive(Clone, Debug)]
pub struct IoTDevice {
    pub numerical_id: u64,
    pub address: String,
}
For simplicity, this object will be used in the code directly (adding generics isn't too tricky, but would go beyond the scope of this book):
type Tree = Option<Box<Node>>;

struct Node {
    pub dev: IoTDevice,
    left: Tree,
    right: Tree,
}
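Before any operation can run, nodes and an empty tree have to be created somehow. The following is a minimal sketch of the two constructors under stated assumptions: `Node::new` is assumed to return a `Tree` (a boxed leaf wrapped in `Some`), and `new_empty` is an assumed name for the tree's constructor.

```rust
#[derive(Clone, Debug)]
pub struct IoTDevice {
    pub numerical_id: u64,
    pub address: String,
}

type Tree = Option<Box<Node>>;

struct Node {
    pub dev: IoTDevice,
    left: Tree,
    right: Tree,
}

impl Node {
    // Assumed constructor: returns a ready-to-attach leaf wrapped as a `Tree`.
    pub fn new(dev: IoTDevice) -> Tree {
        Some(Box::new(Node { dev, left: None, right: None }))
    }
}

pub struct BinarySearchTree {
    root: Tree,
    pub length: u64,
}

impl BinarySearchTree {
    // Assumed constructor name: an empty tree has no root and a length of 0.
    pub fn new_empty() -> BinarySearchTree {
        BinarySearchTree { root: None, length: 0 }
    }
}
```

Returning a `Tree` rather than a bare `Node` keeps call sites short, because the result can be assigned directly to a `left` or `right` field.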
Starting with this basic implementation, the requisite operations, add and find, can be implemented.
Unlike lists, trees make a major decision on insert: which side does the new element go to? Starting at the root node, each node's value is compared to the value that is going to be inserted: is this greater than or less than that? Either decision will lead down a different subtree (left or right).
This process is (usually recursively) repeated until the targeted subtree is None, which is exactly where the new value is inserted—as a leaf of the tree. If this is the first value going into the tree, it becomes the root node. There are some problems with this, and the more experienced programmers will have had a strange feeling already: what happens if you insert numbers in ascending order?
These feelings are justified. Inserting in ascending order (for example, 1, 2, 3, 4) will lead to a tree that is basically a list in disguise! This is also called a (very) unbalanced tree and won't have any of the benefits of other trees:
  1
 / \
    2
   / \
      3
     / \
        4
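The effect of insertion order can be checked with a small experiment. The following is a minimal sketch, not the chapter's implementation: it uses plain `u64` values instead of devices, and `insert`, `height`, and `build` are hypothetical helper names introduced for illustration.

```rust
type Tree = Option<Box<Node>>;

struct Node {
    value: u64,
    left: Tree,
    right: Tree,
}

// Inserts a value using the binary search tree rule:
// smaller or equal values go left, greater values go right.
fn insert(tree: Tree, value: u64) -> Tree {
    match tree {
        Some(mut n) => {
            if value <= n.value {
                n.left = insert(n.left.take(), value);
            } else {
                n.right = insert(n.right.take(), value);
            }
            Some(n)
        }
        // Empty spot: the new value becomes a leaf here.
        None => Some(Box::new(Node { value, left: None, right: None })),
    }
}

// Number of nodes on the longest path from the root down to a leaf.
fn height(tree: &Tree) -> usize {
    match tree {
        Some(n) => 1 + height(&n.left).max(height(&n.right)),
        None => 0,
    }
}

// Builds a tree by inserting the values in the given order.
fn build(values: &[u64]) -> Tree {
    values.iter().fold(None, |tree, &v| insert(tree, v))
}
```

Under these assumptions, `height(&build(&[1, 2, 3, 4, 5, 6, 7]))` evaluates to 7 (one level per element, just like a list), while `height(&build(&[4, 2, 6, 1, 3, 5, 7]))`, which starts with the median, evaluates to 3.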
During this chapter, we are going to cover a lot more on balancing trees and why that is important for achieving high performance. To avoid this pitfall associated with binary search trees, the first value inserted should ideally be the median of all elements, since it becomes the root node, as is visible in the following code snippet:
pub fn add(&mut self, device: IoTDevice) {
    self.length += 1;
    // Take ownership of the root (leaving None) so it can be passed by value
    let root = self.root.take();
    self.root = self.add_rec(root, device);
}

fn add_rec(&mut self, node: Tree, device: IoTDevice) -> Tree {
    match node {
        Some(mut n) => {
            // Smaller or equal IDs go left, greater IDs go right
            if device.numerical_id <= n.dev.numerical_id {
                n.left = self.add_rec(n.left, device);
            } else {
                n.right = self.add_rec(n.right, device);
            }
            Some(n)
        }
        // An empty spot was found: the new device becomes a leaf here
        _ => Node::new(device),
    }
}
Split into two parts, this code walks the tree recursively to find the appropriate position and attache...