Exotic Data Structures

A few basic data structures:

  1. Dynamic arrays
  2. Linked lists
  3. Unordered maps
  4. Ordered maps
  5. Ordered maps (over finite keys)
  6. Fully persistent ordered maps
  7. Fully persistent ordered sets
  8. Heaps

1. Compact dynamic array (compact-arrays)

An indexable deque which is optimal in space and time [1].

This is simply a O(sqrtN) array of O(sqrtN) sub-arrays.

Two lists of arrays are maintained, small and big (twice bigger)

Also, pointers to head/tail indexes, and the big/small separation are maintained.

Conceptually, the virtual array is the concatenation of all small sub-arrays followed by the big sub-arrays, and indexed between head/tail.

All operations are straightforward. In order to maintain the global invariants, small sub-arrays are sometimes merged into big arrays, or big arrays are split into small arrays (this happens at the boundary between the two lists). when one of both lists is empty, lists are swapped (big arrays become small and vice-versa)

Asymptotic complexity:

  • O(1) worst case queries (get/set)
  • O(1) amortized, O(sqrtN) worst case update (push/pop) at both ends
  • N + O(sqrtN) records of space

Variant: Compact integer arrays

This is implemented by growing the integer range of sub-arrays dynamically when an update overflows. It makes updates O(sqrtN) worst case (O(1) amortized if O(N) numbers take O(lgM) bits).

Asymptotic complexity:

  • O(1) amortized, O(sqrtN) worst case queries (get/set)
  • O(1) amortized, O(sqrtN) worst case update (push/pop) at both ends
  • NlgM + O(N) bits of space in the worst case, where M is the biggest integer stored in the array.

2. Monlithic and Succinct lists (monolithic-lists, succinct-lists)

Monolithic lists

Monolithic lists are lists stored without any managed pointers, in a flat array. Each node in the list is identified by a number in [1..N]. 0 is the head and -1 the tail. Insertion allocates nodes, removal reclaims them. Because the links are xor-coded, two consecutive nodes are needed to form a cursor. given a cursor, you can query the list for the next or previous element. To store values at each node, you can simply use node identifiers to index a [1..N] array. This makes cache behavior even worse, but it doesn't matter much, as indexing lists is slow anyway.

note: unrolled lists are isomorphic to lists of arrays. they offer no improvement.

Asymptotic complexity:

  • O(1) worst case queries (prev/next)
  • O(1) amortized, O(N) worst case update (insertion/removal)
  • N + o(N) words of space

Succinct lists

A succinct data structure has space usage close to the information theoretic lower bound.

In essence, a list is isomorphic to an Hamiltonian cycle in a complete graph of n nodes. There are O(N!) such cycles.

Thus at least lg(N!) + o(N) bits are needed to encode a list. This gives us NlgN + o(N) as a lower bound.

Succinct lists achieve this lower bound (tightly) and can still operate efficiently. To achieve this, succinct lists are simply monolithic lists stored in the previously defined compact integer arrays.

Asymptotic complexity:

  • O(1) worst case queries (prev/next)
  • O(1) amortized, O(sqrtN) worst case update (insertion/removal)
  • NlgN + o(N) bits of space

3. Split Hashtables (split-hashtables)

Split hashtables are a variation of cuckoo hashing that doesn't move anything around.

In essence, this structure is a pair of arrays of lists of key/value pairs. Lookup is done by splitting the hash of the key in two, using this pair to index the arrays, and then search each list for the key. The twist is that insertion is always in the shortest list. Surprisingly, that gives us an information-theoretic advantage that reduces the expected worst case list length significantly [2]. This works so well that it's a waste of time to make any effort to move elements around (as in cuckoo hashing). Also, the structure can achieve near 100% load efficiently.

Of course, to get good performance, some tweaks can make the difference.

For space usage: If possible, lists are inlined in the tables. When it's inevitable, full lists are stored as arrays. The effort to recover inlined lists is to scan a single cache line in each table.

For speed: The most critical operation for an unordered set is to answer queries. This is divided into three stages in split hashtables:

  1. Short inlined path: Both arrays are looked up for key/values. if any key matches, it's a hit, and the appropriate value is returned with branchless arithmetic. This costs two 2-word loads (2 potential cache misses) and a conditional branch.
  2. Long inlined path: If the short path fails, there's a branch to the escape path if any of the 2 keys is ((list)). If there's no full list, all the key/value pairs in both cache lines are loaded. Using branchless arithmetic again, the matching key/value pair (or f f) is returned after we swapped it with the pair in the primary position (which is not a full list link because of the previous branch). This costs cached memory accesses and a conditional branch.
  3. Escape: For the inevitable cases where full lists have been allocated, we swap the pair of hashes and use linear probing on parallel on each list (probing all elements of each list in the worst case) and return the match.

Insertions and removal are performed with a variant of query that returns the key/value pair by reference.

  • Removals query for the key and and replace the key/value pair with ((free)) f. (compaction is done when load is low)
  • Insertions query for a ((free)) key and store the key/value pair over it. If the query fails, ((free)) keys are allocated and the operation is restarted.

Allocation has many possibilities: allocating full lists, growing tables, or even swapping values around (like cuckoo hashing) to achieve better results.

A load of 50% gives expected performance similar to cuckoo hashes, as the short inline path will miss with very low probability. Very high loads (near 100%) can be achieved as the saturation of inlined lists has very low probability (on the order of 1% for millions of elements, grows in lglgN).

Asymptotic complexity:

  • 2 cache misses expected on cold cache for queries and updates
  • O(1) worst case queries with cuckoo allocation
  • O(1) expected queries and updates
  • O(lglgN) worst case queries and updates with probability 1 - O(1/n)
  • 2N words of space with a load factor close to 100
  • 4N words of space to get near 100 hit rate for the fast path
  • 5N words of space expected for O(1) worst case queries

4. B+ trees, Adaptive packed memory arrays (b-trees)

B+ trees

B+ trees (hereinafter called btrees) are boring. In essence, a btree is a tree where each node contains a perfectly balanced implicit binary tree along with pointers to the child nodes. All the nodes but the root are balanced (they have between B and 2B elements) and the leaf nodes contain the records (for us, key/value pairs).

The large size of the nodes make them very cache efficient. Also, it's very easy to navigate an implicit binary tree (childs = position2 + 0 or 1) so search is very fast. One weakness is the cost of insertion/removal in a node. rotated lists can bring that cost down, especially for the key exchanges needed to balance.

Balance is maintained by exchanging elements with neighbours. if consecutive nodes are at the threshold, then nodes are split (2 nodes into 3) or merged (3 nodes into 2). The cost of a parent pointers at each node is very low, and allows finger operations along with simplified maintainance. Space usage is at least between 50-100% load, but constant time on updates can be traded for higher compacity (using leaves of variable sizes).

Asymptotic complexity:

  • O(lgN) worst case queries and updates
  • O(lgD) worst case finger queries and updates (where D is the distance from the finger)
  • typically between 2N and 4N words of space

Adaptive packed memory arrays

A packed memory array (hereinafter called PMA) is a an array of records stored in order, with some gaps to allow for efficient updates. The array is cut in lgN chunks, and an implicit binary tree keeps track of the number of elements for each 2klgN region. To enforce balance, regions have density thresholds, which are tighter for larger regions. After updates, if the density of any region is out of bounds, all elements in the region are redistributed uniformly. The adaptive PMA [3] improves upon pmas by redistributing unevenly in order to achieve lgN updates for many common update patterns (like bulk insertions). PMAs are especially attractive for external storage (like disks) as they have great locality (all elements are in order) and do not age (locality isn't degraded by updates). This is why PMAs are often a building block for cache oblivious data structures. For instance, using an implicit binary tree with a van Emde Boas layout to index a PMA gives a 'cache oblivious B+ tree'. Note that the log2(N) amortized cost can be reduced to lgN using an indirection with lgN sub-blocks.

Asymptotic complexity:

  • O(lgN) worst case queries
  • O(lgD) worst case finger queries
  • O(lgN) amortized, O(N) worst case updates (for random insertions)
  • O(lg2(N)) amortized, O(N) worst case updates (for some update patterns)
  • typically between 2N and 4N words of space

5. van Emde Boas trees, y-fast trees, approximate ordered sets (y-fast-trees)

van Emde Boas trees

note: this is a simplified version of the real thing

Search in the comparison model is optimal in lgN. However, for an indexable, finite universe, van Emde Boas trees can be go faster.

The way to achieve that is very simple. The universe (the set of all possible elements) has size U. Consider an implicit radix tree over U where every node

stores the maximum and minimum of all it's childs, or empty if it has no childs. Also, all values are stored in a doubly linked list.

Following the path from a value to the root, at some point there has to be a non-empty node. One of the extremas of this node has to be the predecessor or successor of the queried value. The list gives the other one. Such a naive approach would take lgU, however, because non-empty/empty nodes are monotonous along a path, binary search can be used, thus queries can be done in lglgU operations.

Naive updates would take lgU, but the deterministic steps of the binary search can be exploited, and updates only need to affect as much nodes as queries touch. Thus, search is altered to terminate early when a node has only one child in order to bring the update cost down to lglgU.

Asymptotic complexity:

  • O(lglgU) worst case queries and updates
  • O(U) words of space, where U is the size of the universe

y-fast trees

note: this is a simplified version of the real thing

y-fast trees are simply sparse van Emde Boas trees. As is, the space usage is impractical. Using a hashtable to store the implicit tree, we can get the space usage down to O(N), in exchange for randomization.

Asymptotic complexity:

  • O(lglgU) expected queries and updates
  • O(N) words of space

Approximate ordered sets

Given a set U that can be partitionned into a finite amount of [a, (1+e)a) intervals (for instance, any interval of the reals), we can store the values in an ordered multimap using the intervals as keys. Thus we get an e-approximate ordered map, where all operations take constant time O(lg(loge(Umax-Umin)). This can be used to sort sets with a minimum relative difference between elements in linear time, to compute e-approximate minimum spanning trees in linear time, etc...

Asymptotic complexity:

  • O(1) expected e-approximate queries and updates
  • O(N) words of space

6. Fully persistent red-black trees

The popular way to implement fully persistent data structures is path copying: When an update is done in a linked list or a binary tree, the whole path from the root is copied. This doesn't change the time complexity (if we needed to reach the updated node from the root). However, this leaks memory: Unless the the former root dies (but then, why bother with persistence ?), the GC can't collect all the nodes that have been copied. Linked lists will take O(N2) space and binary trees O(NlgN) space. This is not an acceptable cost when full persistence is needed.

However, implementing full persistence in O(1) space per update is not trivial, especially in the presence of random access [4]. The specific case of graphs of bounded degrees is easier to handle [5], and red-black trees, with O(1) rotations per update, are ideal for that usage case.

In order to get full persistence, a way to embbed several trees (one for each version) into one graph is needed. A elegant, amortized solution is to keep some space in every node to store future updates, tagged with their base version. When that space is full, then the node is copied, the contents updated to the current version, and the parent itself will store the update. Amortized analysis shows that this costs O(1) space per update. To navigate over a given version of the structure, the stored updates must be taken into account when their version is an ancestor of the given version. A version tree could be used, but then ancestor queries take O(lnV) (V the number of versions). However, there is a way to answer ancestor queries in O(1) space by solving the order-maintenance problem [6] on a list produced by in-order traversal of the version tree. The last detail is to alter the red-black balancing algorithm so that changes to the color of nodes is not stored persistently.

Asymptotic complexity:

  • O(lnN) worst case search
  • O(lnN) amortized updates
  • O(1) space per update

7. key/value difference lists (diff-assocs)

It is possible to implement fully persistent arrays in O(1) space per update, with O(lglgN) time for access and updates [4]. But these great asymptotic bounds hide large constant costs. However it's possible to design a much more simple datastructure with inefficient access to random versions [7].

Specifically, each version of the datastructure is only a key/value pair, along with a link to the previous version of the structure. Thus the datastructure is fully persistent, and optimal in space as well as update time. The drawback is that access time grows linearly with the number of updates. However it's possible to exploit the locality of accesses by using a full dictionnary as a cache of the current version. This way all consecutive access to a version after the first take constant time.

Such a datastructure is highly efficient for backtracking purposes. It's also possible to efficiently access different versions of the same datastructure in parallel using a cache for each access pattern. However full random access cannot be improved without a radically different approach [4].

Asymptotic complexity:

  • O(k) worst case access, where k is the update distance between accessed versions, thus:
  • O(1) worst case access with bounded backtracking
  • O(N) worst case access for a random version
  • O(1) worst case updates
  • O(1) space per update

8. Incremental selection implicit heaps (select-heap)

Implicit heaps are a way to store elements in an array that allows quick insertion as well as retrieval in sorted order. The well known binary heap is the basis for heapsort, and takes O(lnN) for updates. However it's possible to go much faster when only a subset of the sorted values is needed. For instance, sorting a single value (finding the Kth value), which is called selection, can be done in linear time, and sorting the first X elements of an array can be done in O(XlnX) operations.

A way to achieve all these bounds very efficiently is to alter the quickselect algorithm to keep the stack of pivots between calls [8], along with linear time pivot selection to garantee worst-case complexity (like introsort). Updates take constant time on average, with an algorithmic worst case for the update of small elements. Such a datastructure can be used to compute minimum spanning trees in O(E+VlnV) using either Kruskal's algorithm or Prim's algorithm.

Asymptotic complexity:

  • O(N) worst case to heapify an array of N elements
  • O(1) worst case to find the smallest element
  • O(K) worst case to find the Kth element
  • O(lnX) worst case to delete the minimum element for the Xth time
  • O(1) average for updates
  • O(lnN) worst case for updates
  • O(lnN) additional space

References

[1] A. Brodnik, S. Carlsson, E. D. Demaine, J. I. Munro, and R. Sedgewick: Resizable arrays in optimal time and space. Technical Report CS-99-09, U. Waterloo, 1999

http://www.cs.uwaterloo.ca/~imunro/resize.ps

[2] M. Mitzenmacher: The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems, Volume 12, Issue 10, Oct 2001

http://www.eecs.harvard.edu/~michaelm/postscripts/mythesis.pdf

[3] M. A. Bender, H. Hu: An adaptive packed-memory array. ACM Transactions on Database Systems, Volume 32, Issue 4, 2006

http://www.inf.uni-konstanz.de/disy/teaching/ws0607/filesystems/download/adaptivepackedmemoryarray.pdf

[4] P. F. Dietz. Fully Persistent Arrays (Extended Array): Lecture Notes in Computer Science, Volume 382, 1989

https://gummo.lib.rochester.edu/retrieve/14496/tr%23290.pdf

[5] J. R. Driscoll, N. Sarnak, D. D. Sleator, R. E. Tarjan: Making data structures persistent. Journal of Computer and System Sciences, Volume 38, Issue 1, February 1989

http://www.cs.cmu.edu/~sleator/papers/making-data-structures-persistent.pdf

[6] M. A. Bender, R. Cole, E. D. Demaine, M. Farach-Colton, J. Zito: Two Simplified Algorithms for Maintaining Order in a List. Lecture Notes in Computer Science, Volume 2461, 2002

http://erikdemaine.org/papers/DietzSleator_ESA2002/paper.pdf

[7] S. Conchon, J.-C. Filli√Ętre: A persistent union-find data structure. Proceedings of the 2007 workshop on Workshop on ML, 2007

http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.ps

[8] R. Paredes G. Navarro: Optimal Incremental Sorting, Proceedings of the Ninth Workshop on Algorithm Engineering and Experiments, 2006

http://www.siam.org/proceedings/alenex/2006/alx06_016rparedes.pdf

This revision created on Fri, 6 May 2011 01:35:09 by akkartik (Rollback)