Understanding Binary Search Trees (BSTs)
What Is a Binary Search Tree?
A Binary Search Tree (BST) is a type of binary tree where each node has at most two children, and the tree is organized based on the following properties:
- Left Child Property: All nodes in the left subtree of a node have values less than the node's value.
- Right Child Property: All nodes in the right subtree of a node have values greater than the node's value.
- Always restate the ordering rule (left < node < right) in answers.
- IB mark schemes expect this explicitly.
These properties ensure that a BST is always sorted, making it efficient for operations like searching, insertion, and deletion.
Structure of a BST
Structure of a BST
- A binary tree where each node has at most two children:
- Left child contains values smaller than the parent.
- Right child contains values greater than the parent.
- Ensures that the tree remains ordered, making searching efficient.
- Can be represented as a tree diagram with nodes and branches.
- Leaf Nodes: Nodes that have no children (both left and right pointers are null), marking the end of a branch in the tree.
- When drawing or describing a BST, always label the root, internal nodes, and leaf nodes.
- Use the ordering rule (left < parent < right) in explanations: examiners often award marks for explicitly stating this.
The root node is the starting point for all operations in a BST, while leaf nodes represent the endpoints of the tree's branches.
- Confusing leaf nodes with nodes that have one child: leaf nodes must have no children at all.
- Forgetting that BSTs are ordered: placing nodes randomly breaks the BST property.
Properties of a BST
- Ordered Structure: The left subtree of a node contains values less than the node, and the right subtree contains values greater than the node.
- Dynamic Size: BSTs can grow and shrink as nodes are inserted or deleted.
- Balanced vs. Unbalanced Trees:
- A balanced BST has a minimized depth, ensuring optimal search times.
- An unbalanced BST can degrade to a linked list, resulting in inefficient operations.
A balanced BST maintains a time complexity of $O(\log n)$ for search operations, while an unbalanced BST can degrade to $O(n)$.
Operations on a BST
Insertion
- Start at the Root: Begin at the root node.
- Compare Values:
- If the new value is less than the current node, move to the left child.
- If the new value is greater, move to the right child.
- Find the Correct Position: Repeat the comparison until an empty spot is found.
- Insert the Node: Place the new node in the correct position.
When inserting a node, always maintain the BST's ordering properties by comparing the new value with existing nodes.
Deletion
- Find the Node: Locate the node to be deleted.
- Three Cases:
- No Children: Simply remove the node.
- One Child: Replace the node with its child.
- Two Children: Find the node's in-order successor (the smallest node in the right subtree) and replace the node with the successor.
- Deleting a node with two children requires careful handling to maintain the BST's structure.
- The in-order successor is often used to preserve the tree's order.
Searching
- Start at the Root: Begin at the root node.
- Compare Values:
- If the target value is less than the current node, move to the left child.
- If the target value is greater, move to the right child.
- Repeat Until Found: Continue until the target value is found or a leaf node is reached.
- The efficiency of searching in a BST depends on its balance.
- A balanced BST offers $O(\log n)$ search time, while an unbalanced BST can degrade to $O(n)$.
Traversal
- In-Order Traversal: Visit the left subtree, then the current node, and finally the right subtree. This produces a sorted list of values.
- Pre-Order Traversal: Visit the current node, then the left subtree, and finally the right subtree.
- Post-Order Traversal: Visit the left subtree, then the right subtree, and finally the current node.
In-order traversal is particularly useful for BSTs, as it returns the nodes in sorted order.
Sketching a BST
Constructing a BST with Integers
Data Set: 40, 20, 60, 10, 30, 50, 70
Steps:
- Insert 40: Becomes the root.
- Insert 20: Less than 40, so it becomes the left child.
- Insert 60: Greater than 40, so it becomes the right child.
- Insert 10: Less than 40 and 20, so it becomes the left child of 20.
- Insert 30: Less than 40 but greater than 20, so it becomes the right child of 20.
- Insert 50: Greater than 40 but less than 60, so it becomes the left child of 60.
- Insert 70: Greater than 40 and 60, so it becomes the right child of 60.
For any given node, all values in the left subtree are less than the node's value, and all values in the right subtree are greater.
Efficiency and Limitations
Time Complexity
Search, Insertion, Deletion:
- Best and Average Case: \$O(\log n)\$ for balanced trees.
- Worst Case: \$O(n)\$ for unbalanced trees (e.g., a skewed tree).
- A common misconception is that BSTs always provide $O(\log n)$ performance.
- This is only true for balanced trees.
- Unbalanced trees can degrade to \$O(n)\$.
Balancing BSTs
- Balanced Trees: Maintain optimal performance by minimizing tree depth.
- Unbalanced Trees: Can degrade to a linked list, reducing efficiency.
Balanced BSTs, such as AVL trees and Red-Black trees, automatically maintain their balance, ensuring consistent performance.
Applications of BSTs
- Database Indexing: Efficiently organize and retrieve data.
- Memory Allocation: Manage free and allocated memory blocks.
- Autocompletion: Store and search for prefixes in text-based applications.