Advanced Data Structures
Segment/Interval/Range/Binary Index Tree
All these data structures are used for solving different problems:
- Segment tree stores intervals, and optimized for "which of these intervals contains a given point" queries.
- Interval tree stores intervals as well, but optimized for "which of these intervals overlap with a given interval" queries. It can also be used for point queries - similar to segment tree.
- Range tree stores points, and optimized for "which points fall within a given interval" queries.
- Binary indexed tree stores items-count per index, and optimized for "how many items are there between index m and n" queries.
Performance / Space consumption for one dimension:
- Segment tree -
O(n logn)
preprocessing time,O(k+logn)
query time,O(n logn)
space - Interval tree -
O(n logn)
preprocessing time,O(k+logn)
query time,O(n)
space - Range tree -
O(n logn)
preprocessing time,O(k+logn)
query time,O(n)
space - Binary Indexed tree -
O(n logn)
preprocessing time,O(logn)
query time,O(n)
space
(k is the number of reported results).
All data structures can be dynamic, in the sense that the usage scenario includes both data changes and queries:
- Segment tree - interval can be added/deleted in O(logn) time (see here)
- Interval tree - interval can be added/deleted in O(logn) time
- Range tree - new points can be added/deleted in O(logn) time (see here)
- Binary Indexed tree - the items-count per index can be increased in O(logn) time
Higher dimensions (d>1):
- Segment tree - O(n(logn)^d) preprocessing time, O(k+(logn)^d) query time, O(n(logn)^(d-1)) space
- Interval tree - O(n logn) preprocessing time, O(k+(logn)^d) query time, O(n logn) space
- Range tree - O(n(logn)^d) preprocessing time, O(k+(logn)^d) query time, O(n(logn)^(d-1))) space
- Binary Indexed tree - O(n(logn)^d) preprocessing time, O((logn)^d) query time, O(n(logn)^d) space
LSM: Log-Structured Merge Tree
https://en.wikipedia.org/wiki/Log-structured_merge-tree
- The LSM-tree is actually a collection of trees but which is treated as a single key-value store.
- attractive for providing indexed access to files with high insert volume, such as transactional log data.
Self-balancing Trees
- B-Tree, B+Tree, B*Tree
- Red-black Tree
- AVL Tree
- Splay Tree
- Treap: a really cool way to implement a balanced binary search tree. https://jvns.ca/blog/2017/09/09/data-structure--the-treap-/
Merkle Tree
https://medium.com/netlify/how-netlifys-deploying-and-routing-infrastructure-works-c90adbde3b8d
We’ve built Netlify’s core around Merkle trees, the same structures used in Git, ZFS file systems, and blockchain technology. A Merkle tree is a structure where each node is labeled with the cryptographic hash of the labels of all the nodes under it.
merkel tree=hash tree, every leaf node is labelled with the hash of a data block and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes Git and Mercurial distributed revision control systems the Bitcoin and Ethereum peer-to-peer networks a number of NoSQL systems like Apache Cassandra, Riak and Dynamo hash 0 = hash( hash 0-0 + hash 0-1 )
Other Advanced Data Structures
- k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space.
- pairing heaps
- Fibonacci heaps
- union-find with disjoint-set structures
- longest common prefix array: https://en.wikipedia.org/wiki/LCP_array