Advanced Data Structures
Segment/Interval/Range/Binary Index Tree
All these data structures are used for solving different problems:
- Segment tree stores intervals, and optimized for "which of these intervals contains a given point" queries.
- Interval tree stores intervals as well, but optimized for "which of these intervals overlap with a given interval" queries. It can also be used for point queries - similar to segment tree.
- Range tree stores points, and optimized for "which points fall within a given interval" queries.
- Binary indexed tree stores items-count per index, and optimized for "how many items are there between index m and n" queries.
Performance / Space consumption for one dimension:
- Segment tree -
O(n logn)preprocessing time,O(k+logn)query time,O(n logn)space - Interval tree -
O(n logn)preprocessing time,O(k+logn)query time,O(n)space - Range tree -
O(n logn)preprocessing time,O(k+logn)query time,O(n)space - Binary Indexed tree -
O(n logn)preprocessing time,O(logn)query time,O(n)space
(k is the number of reported results).
All data structures can be dynamic, in the sense that the usage scenario includes both data changes and queries:
- Segment tree - interval can be added/deleted in O(logn) time (see here)
- Interval tree - interval can be added/deleted in O(logn) time
- Range tree - new points can be added/deleted in O(logn) time (see here)
- Binary Indexed tree - the items-count per index can be increased in O(logn) time
Higher dimensions (d>1):
- Segment tree -
O(n(logn)^d)preprocessing time,O(k+(logn)^d)query time,O(n(logn)^(d-1))space - Interval tree -
O(n logn)preprocessing time,O(k+(logn)^d)query time,O(n logn)space - Range tree -
O(n(logn)^d)preprocessing time,O(k+(logn)^d)query time,O(n(logn)^(d-1))space - Binary Indexed tree -
O(n(logn)^d)preprocessing time,O((logn)^d)query time,O(n(logn)^d)space
LSM: Log-Structured Merge Tree
https://en.wikipedia.org/wiki/Log-structured_merge-tree
- The LSM-tree is actually a collection of trees but which is treated as a single key-value store.
- attractive for providing indexed access to files with high insert volume, such as transactional log data.
Self-balancing Trees
- B-Tree, B+Tree, B*Tree
- Red-black Tree
- AVL Tree
- Splay Tree
- Treap: a really cool way to implement a balanced binary search tree. https://jvns.ca/blog/2017/09/09/data-structure--the-treap-/
B tree vs red-black tree
A red/black tree is more or less equivalent to a 2-3-4 tree, which is just a type of B-tree. The worst-case performance is identical, provided you do a binary search of the B-tree node values.
Merkle Tree
https://medium.com/netlify/how-netlifys-deploying-and-routing-infrastructure-works-c90adbde3b8d
We’ve built Netlify’s core around Merkle trees, the same structures used in Git, ZFS file systems, and blockchain technology. A Merkle tree is a structure where each node is labeled with the cryptographic hash of the labels of all the nodes under it.
merkel tree=hash tree, every leaf node is labelled with the hash of a data block and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes Git and Mercurial distributed revision control systems the Bitcoin and Ethereum peer-to-peer networks a number of NoSQL systems like Apache Cassandra, Riak and Dynamo hash 0 = hash( hash 0-0 + hash 0-1 )
Other Advanced Data Structures
- k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space.
- pairing heaps
- Fibonacci heaps
- union-find with disjoint-set structures
- longest common prefix array: https://en.wikipedia.org/wiki/LCP_array