CPSC 327 | Data Structures and Algorithms | Spring 2024 |
Design a data structure to efficiently support:
Answer:
[This reflects as far as we got in class; it may not be the final answer.]
Use a balanced search tree to store (key,element) pairs, a pointer to the node with the minimum key, and a hashtable-based dictionary with (element,node) pairs.
findMin() — use the min pointer = Θ(1)
insert(k,e) — O(log n) to insert (k,e) into the balanced search tree + Θ(1) to update the min pointer (compare k to current min) + O(1) to add (e,node) to hashtable and O(log n) to update hashtable entries for nodes affected by restructuring = O(log n)
removeMin() — Θ(1) to find the node with the min key using the min pointer + O(log n) to remove from balanced search tree + O(log n) to find new min in the search tree + O(1) to remove (e,node) from the hashtable and O(log n) to update hashtable entries for nodes affected by restructuring = O(log n)
decreaseKey(e,k) — O(1) to find the node containing e (hashtable lookup) + O(log n) to delete from a balanced search tree and re-insert + O(log n) to update the hashtable for nodes affected by restructuring = O(log n)
The hashtable stores the node containing each element — but insert and remove in a balanced search tree may involve restructuring of the tree. How does that affect the contents of the hashtable?
With 2-4 trees, all operations may move elements between nodes — but only a couple of nodes (and a fixed number of nodes) are affected in each operation and there are at most three elements per node, so that gives a maximum number of updates in the hashtable no matter the total number of elements. For AVL trees, nodes are relinked but their elements aren't changed (except for one node with a swap during delete). So updating the hashtable is O(1) per restructuring operation.
Discussion:
[This isn't part of the answer or a writeup, but instead outlines a thought process for finding the answer.]
A strategy for designing a data structure is to start with something straightforward and see where it breaks.
insert, findMin, and removeMin sound like OrderedDictionary, but decreaseKey is not an OrderedDictionary operation and doesn't seem like something that could be implemented with OrderedDictionary operations (nothing works with elements instead of keys). So we consider a data structure underlying OrderedDictionary — working directly with the data structure may allow implementation of additional operations.
OrderedDictionary is commonly implemented with a balanced search tree. (No need to get more specific about the kind unless it is matter.) Let's consider how to implement the desired operations in a balanced search tree.
insert(k,e) — balanced search tree insert, O(log n)
findMin() — to find the min is O(log n) to start from the root and repeatedly go left, so what if we added a pointer to the node with the min key element?
New plan: a balanced search tree with a pointer to the node with the min key.
findMin() — use the min pointer, Θ(1)
insert(k,e) — balanced search tree insert, O(log n) + update min node, Θ(1) to compare min and new key and update as needed
removeMin() — Θ(1) to find the min, O(log n) to remove from balanced search tree, O(log n) to find new min
decreaseKey(e,k) — could be done by remove and re-insert, but this operation is to decrease a particular element's key — the search tree is organized by keys, not elements, and searching for a particular element requires traversal, O(n)
New plan: need to be able to find the node containing an element given the element. This sounds like a lookup task — add a dictionary with (element,node) as the (key,value) pairs. So now we have a balanced search tree with the (key,element) pairs, a pointer to the node with the minimum key, and a dictionary with (element,node) pairs. Since the implementation of the dictionary matters for running time, let's go with a hashtable.
decreaseKey(e,k) — O(1) to find the node containing e (hashtable lookup), O(log n) to delete that from a balanced search tree, O(log n) to re-insert = O(log n)
...but wait, what about the search tree delete and insert? That may involve restructuring of the tree, what does that mean for the (element,node) hashtable? With 2-4 trees, all operations may move elements between nodes — but only a couple of nodes (and a fixed number of nodes) are affected in each operation and there are at most three elements per node, so that gives a maximum number of updates in the hashtable no matter the total number of elements. For AVL trees, nodes are relinked but their elements aren't changed (except for one node with a swap during delete). So updating the hashtable is O(1) per restructuring operation and thus doesn't change the overall big-Oh.
decreaseKey(e,k) — O(1) to find the node containing e (hashtable lookup) + O(log n) to delete that from a balanced search tree + O(log n) to re-insert + O(1) per restructuring operation (O(log n) restructuring operations) to update the hashtable during restructuring = O(log n)
insert(k,e) — balanced search tree insert, O(log n) + update min node, Θ(1) to compare min and new key and update as needed + O(1) to add (e,node) to hashtable and O(log n) to update hashtable entries for nodes affected by restructuring = O(log n)
removeMin() — Θ(1) to find the min + O(log n) to remove from balanced search tree + O(log n) to find new min + O(1) to remove e from the hashtable and O(log n) to update hashtable entries for nodes affected by restructuring = O(log n)
findMin() — use the min pointer, Θ(1)
So we now have O(log n) for modification operations and O(1) for find. It's hard to expect that we'll do better for a balanced search tree because decreaseKey would potentially involve moving the element all the way up or down the tree, so something more clever than delete and insert doesn't seem helpful. And it seems pretty good, particularly because the better option for insert(k,e) and remove(k) — a hashtable — is not efficient for ordered dictionaries...but the O(log n) for insert, remove, decrease is mostly due to the balanced search tree, not maintaining the min — a hashtable would reduce everything to O(1) except for an O(N) update-the-min step in removeMin. Could we do better with updating the min?