Radix tree

Radix tree
Patricia trie.svg

In computer science, a radix tree (also patricia trie or radix trie) is a space-optimized trie data structure where each node with only one child is merged with its child. The result is that every internal node has at least two children. Unlike in regular tries, edges can be labeled with sequences of characters as well as single characters. This makes them much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes.

It supports the following main operations, all of which are O(k), where k is the maximum length of all strings in the set:

  • Lookup: Determines if a string is in the set. This operation is identical to tries except that some edges consume multiple characters.
  • Insert: Add a string to the tree. We search the tree until we can make no further progress. At this point we either add a new outgoing edge labeled with all remaining characters in the input string, or if there is already an outgoing edge sharing a prefix with the remaining input string, we split it into two edges (the first labeled with the common prefix) and proceed. This splitting step ensures that no node has more children than there are possible string characters.
  • Delete: Delete a string from the tree. First, we delete the corresponding leaf. Then, if its parent only has one child remaining, we delete the parent and merge the two incident edges.
  • Find predecessor: Locates the largest string less than a given string, by lexicographic order.
  • Find successor: Locates the smallest string greater than a given string, by lexicographic order.

A common extension of radix trees uses two colors of nodes, 'black' and 'white'. To check if a given string is stored in the tree, the search starts from the top and follows the edges of the input string until no further progress can be made. If the search-string is consumed and the final node is a black node, the search has failed; if it is white, the search has succeeded. This enables us to add a large range of strings with a common prefix to the tree, using white nodes, then remove a small set of "exceptions" in a space-efficient manner by inserting them using black nodes.

Contents

Applications

As mentioned, radix trees are useful for constructing associative arrays with keys that can be expressed as strings. They find particular application in the area of IP routing, where the ability to contain large ranges of values with a few exceptions is particularly suited to the hierarchical organization of IP addresses.[1] They are also used for inverted indexes of text documents in information retrieval.

History

Donald R. Morrison first described what he called "Patricia trees" in 1968;[2] the name comes from the acronym PATRICIA, which stands for "Practical Algorithm To Retrieve Information Coded In Alphanumeric". Gernot Gwehenberger independently invented and described the data structure at about the same time.[3]

Comparison to other data structures

(In the following comparisons, it is assumed that the keys are of length k and the data structure contains n elements.)

Unlike balanced trees, radix trees permit lookup, insertion, and deletion in O(k) time rather than O(log n). This doesn't seem like an advantage, since normally k ≥ log n, but in a balanced tree every comparison is a string comparison requiring O(k) worst-case time, many of which are slow in practice due to long common prefixes. In a trie, all comparisons require constant time, but it takes m comparisons to look up a string of length m. Radix trees can perform these operations with fewer comparisons and require many fewer nodes.

Radix trees also share the disadvantages of tries, however: as they can only be applied to strings of elements or elements with an efficiently reversible mapping (injection) to strings, they lack the full generality of balanced search trees, which apply to any data type with a total ordering. A reversible mapping to strings can be used to produce the required total ordering for balanced search trees, but not the other way around. This can also be problematic if a data type only provides a comparison operation, but not a (de)serialization operation.

Hash tables are commonly said to have expected O(1) insertion and deletion times, but this is only true when considering computation of the hash of the key to be a constant time operation. When hashing the key is taken into account, hash tables have expected O(k) insertion and deletion times, but may take longer in the worst-case depending on how collisions are handled. Radix trees have worst-case O(k) insertion and deletion. The successor/predecessor operations of radix trees are also not implemented by hash tables.

Variants

The HAT-trie is a radix tree based cache-conscious data structure that offers efficient string storage and retrieval, and ordered iterations. Performance, with respect to both time and space, is comparable to the cache-conscious hashtable.[4][5]

See also

References

  1. ^ Knizhnik, Konstantin. "Patricia Tries: A Better Index For Prefix Searches", Dr. Dobb's Journal, June, 2008.
  2. ^ Morrison, Donald R. Practical Algorithm to Retrieve Information Coded in Alphanumeric
  3. ^ G. Gwehenberger, Anwendung einer binären Verweiskettenmethode beim Aufbau von Listen. Elektronische Rechenanlagen 10 (1968), pp. 223–226
  4. ^ Askitis, Nikolas; Sinha, Ranjan (2007). HAT-trie: A Cache-conscious Trie-based Data Structure for Strings. 62. 97–105. ISBN 1-920-68243-0. http://portal.acm.org/citation.cfm?id=1273749.1273761&coll=GUIDE&dl= 
  5. ^ Askitis, Nikolas; Sinha, Ranjan (2010). Engineering scalable, cache and space efficient tries for strings. doi:10.1007/s00778-010-0183-9. ISBN 1066-8888 (Print) 0949-877X (Online). http://www.springerlink.com/content/86574173183j6565/. 

External links

Implementations


Wikimedia Foundation. 2010.

Игры ⚽ Нужно сделать НИР?

Look at other dictionaries:

  • Radix (disambiguation) — Radix can refer to: *Radix, another word for mathematical bases *In botany, radix is the root of a plant **Also used for dorsal root and ventral root in anatomy. Derived terms e.g. radiculoneuropathy * Radix (novel), a science fiction novel by A …   Wikipedia

  • Tree (data structure) — A simple unordered tree; in this diagram, the node labeled 7 has two children, labeled 2 and 6, and one parent, labeled 2. The root node, at the top, has no parent. In computer science, a tree is a widely used data structure that emulates a… …   Wikipedia

  • Radix sort — In computer science, radix sort is a sorting algorithm that sorts integers by processing individual digits. Because integers can represent strings of characters (e.g., names or dates) and specially formatted floating point numbers, radix sort is… …   Wikipedia

  • Binary tree — Not to be confused with B tree. A simple binary tree of size 9 and height 3, with a root node whose value is 2. The above tree is unbalanced and not sorted. In computer science, a binary tree is a tree data structure in which each node has at… …   Wikipedia

  • B-tree — In computer science, a B tree is a tree data structure that keeps data sorted and allows searches, insertions, and deletions in logarithmic amortized time. It is most commonly used in databases and filesystems. In B trees, internal (non leaf)… …   Wikipedia

  • Phylogenetic tree — ptree redirects here. For Patricia tree, see Radix tree …   Wikipedia

  • Hash tree — A binary hash tree In cryptography and computer science Hash trees or Merkle trees are a type of data structure[citation needed] which contains a tree of summary information about a larger piece of da …   Wikipedia

  • Suffix tree — In computer science, a suffix tree (also called suffix trie, PAT tree or, in an earlier form, position tree) is a data structure that presents the suffixes of a given string in a way that allows for a particularly fast implementation of many… …   Wikipedia

  • Dancing tree — For the film Dancing tree, see Dancing tree (film) In computer science, a dancing tree is a tree data structure, which is similar to B+ tree. Invented by Hans Reiser, for use by the Reiser4 file system. As opposed to self balancing binary search… …   Wikipedia

  • Metric tree — This article is about the data structure. For the type of metric space, see Real tree. A metric tree is any tree data structure specialized to index data in metric spaces. Metric trees exploit properties of metric spaces such as the triangle… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”