Exotic Data Structures

Or maybe not so exotic...

  1. Dynamic arrays
  2. Linked lists
  3. Unordered maps
  4. Ordered maps
  5. Ordered maps (over finite keys)

The last one is starting to sound exotic though.

1. Compact dynamic array (compact-arrays)

Based on the deque variant of [1].


  • O(1) worst case access (get/set)
  • O(1) amortized, O(sqrtN) worst case update (push/pop) at both ends
  • N + O(sqrtN) space

This is simply a O(sqrtN) array of O(sqrtN) sub-arrays.

Two lists of arrays are maintained, small and big (twice bigger)

Also, pointers to head/tail indexes, and the big/small separation are maintained.

Conceptually, the virtual array is the concatenation of all small sub-arrays followed by the big sub-arrays, and indexed between head/tail.

All operations are straightforward. In order to maintain the global invariants, small sub-arrays are sometimes merged into big arrays, or big arrays are split into small arrays (this happens at the boundary between the two lists). when one of both lists is empty, we swap the lists (big arrays become small and vice-versa)

Variant: Compact integer arrays

Has the same time complexity bounds, but takes NlgM +o(N) bits in the worst case, where M is the biggest integer stored in the array. This is implemented by growing sub-arrays dynamically when an update overflows. it makes updates O(sqrtN) worst case (amortized if O(N) numbers take O(lgM) bits).

2. Monlithic and Succinct lists (monolithic-lists, succinct-lists)

A succinct data structure has space usage close to the information theoretic lower bound.

In essence, a list is isomorphic to an Hamiltonian cycle in a complete graph of n nodes. There are O(N!) such cycles.

Thus we need at least lg(N!) + o(N) bits to encode a list. This gives us NlgN + o(N) as a lower bound.

Succinct lists achieve this lower bound (tightly) and can still operate efficiently. To achieve this, succinct lists are simply monolithic lists stored in the previously defined compact integer arrays.

Monolithic lists are lists stored without any managed pointers, in a flat array. Each node in the list is identified by a number in [1..N]. 0 is the head and -1 the tail. Insertion allocates nodes, deletion reclaims them. Because the links are xor-coded, two consecutive nodes are needed to form a cursor. given a cursor, you can query the list for the next or previous element. To store values at each node, you can simply use node identifiers to index a [1..N] array. This makes cache behavior even worse, but it doesn't matter much, as indexing lists is slow anyway.

Split-Hash Table

A variation of cuckoo hashing, that doesn't move anything around. Inspired by [2]


[1] A. Brodnik, S. Carlsson, E. D. Demaine, J. I. Munro, and R. Sedgewick. Resizable arrays in optimal time and space. Technical Report CS-99-09, U. Waterloo, 1999.


[2] Mitzenmacher, M. The power of two choices in randomized load balancing. Parallel and Distributed Systems, IEEE Transactions on.


This revision created on Mon, 8 Dec 2008 07:44:31 by prunedtree