A trie
is a special kind of tree data structure in which items are stored and
accessed based upon their key value alone, not based upon their key
value in relation to others in the tree. In contrast, recall that a
binary search tree
uses a greaterthan or lessthan relationship between keys in the tree
to determine where a new insertion would be placed. In tries, the new
insertion's place is predetermined based on its key value. For this
reason a trie is somewhat of a cross between a tree and a hash table.
Tries can only be used when the range of valid keys to be stored is
known upfront. To store a new item in a trie, that item's key value
is somehow broken down into components. If, for example, the item to
be stored is a string, a logical way to break its value down is by
letter. For a number, perhaps a good way to break down the key value
is by digits in its binary representation. Every internal node in a
trie can have as many child nodes as the number of items in the
alphabet you are storing. That is, if you are traversing on English
letters each node can have up to 26 children. If you are traversing
on binary digits, each node can have two children, 0 or 1.
Imagine we have a trie in which we are storing strings. We wish to
store the new item ``dog'' in the trie. From the root node we follow
the ``d'' edge and arrive at a child node. The only data stored in
leaves under this child node are words that begin with the letter
``d.''
Next, we traverse along the ``o'' link from the ``d'' node. We reach
yet another internal node under which only words which start with the
prefix ``do'' are stored. Finally suppose we find that there is no
``g'' link off the ``do'' node. We create one and store ``dog'' at this
new leaf node.
Likewise, if we want to store the number six (6) in a numeric trie, we
might convert six to it's binary representation (110). From the first
node we follow the ``1'' edge. Next we traverse down the ``1'' edge
again. And finally we store the value six in a leaf node that is lies
along the ``110'' edges from the root node.
Any path from the root to a leaf in a trie corresponds to the value of
the item stored at the leaf node reached.
Huffman trees,
the data structures behind Huffman data compression algorithms, are a
popular use of tries. Huffman coding
is discussed in the data compression section of this document.
