## Trie and Patricia, with Functional and imperative implementation

Liu Xinyu *

October 26, 2009

*Liu Xinyu
Email: liuxinyu95@gmail.com
 Abstract Trie and Patricia are important data structures in information retreiving and manipulating. None of these data structures are new. They were invented in 1960s. This post collects some existing knowledge about them. Some functional and imperative implementation are given in order to show the basic idea of these data structures. There are multiple programming languages used, including, C++, Haskell, python and scheme/lisp. C++ and python are mostly used to show the imperative implementation, while Haskell and Scheme are used for functional purpose. There may be mistakes in the post, please feel free to point out. This post is generated by LATEX2ɛ, and provided with GNU FDL(GNU Free Documentation License). Please refer to http://www.gnu.org/copyleft/fdl.html for detail.

Keywords: Trie, Patricia, Radix tree, Suffix tree

Please browse the PDF file for the full text of this post.

### 1 Introduction

There isn’t a separate chapter about Trie or Patricia in CLRS book. While these data structure are very basic, especially in information retrieving. Some of them are also widely used in compiler design[2], and bio-information area, such as DNA pattern matching [4].

In CLRS book index, Trie is redirected to Radix tree, while Radix tree is described in Problem 12-1 [1].
 Figure 1: an Radix tree example in CLRS

Figure 1 shows a radix tree contains the bit strings 1011, 10, 011, 100 and 0. When searching for a key k = b0b1...bn, we take the first bit b0 (MSB from left), check if it is 0 or 1, if it is 0, we turn left, and turn right for 1. Then we take the 2nd bit and repeat this search until we either meet a leaf or finish all n bits.

Note that radix tree needn’t store key in node at all. The information are represented by edges in fact. The node with key string in the above figure are only for illustration.

It is very natural to come to the idea ‘is it possible to represent keys in integers instead of string, because integer can be denoted in binary format?’. Such approach can save spaces and it is fast if we can use bit-wise manipulation.

I’ll first show the integer based Trie data structure and implementation in section 2. Then we can point out the drawbacks and go to the improved data structure of integer based Patricia in section 3. After that, I’ll show alphabetic Trie and Patricia and list some typical use of them in textual manipulation engineering problems.

This article provides example implementation of Trie and Patricia in C, C++, Haskell, Python, and Scheme/Lisp languages. Some functional implementation can be referenced from current Haskell packages ?? ??.

### 2 Integer Trie

Let’s give a definition of the data structure in figure 1. To be more accurate, it should be called as binary trie. a binary trie is a binary tree in which the placement of each key is controlled by its bits, each 0 means “go left at the next node” and each 1 means “go right at the next node”[2].

Because integers can be represented in binary format in computer, it is possible to store integer keys rather than 0,1 strings. When we insert an integer as a new key to the trie, we take first bit, if it is 0, we recursively insert the other bits to the left sub tree; if it is 1, we insert into right sub tree.

However, there is a problem if we treat the key as integer. Consider a binary trie shows in figure 2. If the keys are represented in string based on ’0’ and ’1’, all the three keys are different. While if they are turned into integers, they are identical. So if we want to insert a data with integer key 3, where should we put it into the trie?
 Figure 2: a big-endian trie

One approach is to treat all prefix zero as effective bits. Suppose an integer is represented in 32-bits, If we want to insert key 1 to a trie, it will end up with a 32-levels tree. There are 31 node has only 1 left sub tree and the last node has a right sub tree. It is very inefficient in terms of space.

Chris Okasaki shows a method to solve this problem in [2]. Instead of normal big-endian integer, we can use little-endian integer as key. By using little-endian, decimal integer 1 is represent as binary 1, if we insert it to an empty binary trie, we get a trie with a root and a right leaf. There is only 1 level. For integer 3, it is 11 in binary, we needn’t add any prefix 0, the position in the trie is unique.

#### 2.1 Definition of Integer Trie

Trie is invented by Edward Fredkin. It comes from “retrieval”, pronounce as /’tri:/ by the inventor, while it is pronounced /’trai/ “try” by other authors [5]. The definition of little-endian binary trie is simple, we can reuse the structure of binary tree, with its left sub tree to store 0 part and right tree to store 1 part. The augment data can be stored as value.

##### Definition of little-endian integer Trie in C++

We can utilize C++ template to abstract the value stored in Trie. The type of key is integer. Each node contains a value, a left child and a right child.

template <class  T >
struct IntTrie{
IntTrie():value(), left(0), right(0){}
˜IntTrie(){
delete left;
delete right;
}
T  value;
IntTrie* left;
IntTrie* right;
};

In order to simplify the release, recursive destruction is added.

##### Definition of little-endian integer Trie in Haskell

In trie, since a node may not contains value, so we use Haskell Maybe data to represent this situation. A IntTrie node is either an empty node, or a branch node. The branch node contains a left child a ‘Maybe’ value and a right child.

data IntTrie a  =  Empty
| Branch (IntTrie a) (Maybe  a) (IntTrie a)  --  left, value, right

type Key  =  Int

--  helpers
left :: IntTrie a  ->  IntTrie a
left (Branch l _ _)  =  l
left Empty  =  Empty

right :: IntTrie a  ->  IntTrie a
right (Branch _ _ r)  =  r
right Empty  =  Empty

value :: IntTrie a  ->   Maybe  a
value (Branch _ v _)  =  v
value Empty  =  Nothing

In order to access the children and value some helper functions are given.

##### Definition of little-endian integer Trie in Python

The definition of integer trie in Python is shown as below. All fields are initialized as empty values.

class IntTrie:
def __init__(self):
self.value  =  None
self.left  =  self.right  =  None

left child and right child represent sub Trie branches, and value is used to store actual data.

##### Definition of little-endian integer Trie in Scheme/Lisp

In Scheme/Lisp, we provides some helper functions to create and access Trie data. The underground data structure is still list.

;; Definition

(define (make-int-trie l v r) ;; left, value, right
(list l v r))

;; Helpers
(define (left trie)
(if (null? trie) ’() (car trie)))

(define (value trie)
(if (null? trie) ’() (cadr trie)))

(define (right trie)
(if (null? trie) ’() (caddr trie)))

Some helper functions are provided for easy access to children and value.

#### 2.2 Insertion of integer trie

##### 2.2.1 Iterative insertion algorithm

Since the key is little-endian, when we insert a key into trie, we take the bit from right most (LSB). If it is 0, we go to the left child, and go to right for 1. if the child is empty, we need create new node, and repeat this until meet the last bit (MSB) of the integer. Below is the iterative algorithm of insertion.

INT - TRIE - INSERT(T,x,data)
if T = NIL then
T EmptyNode

end if
p = T
while xdo
if EV EN(x) = TRUE then
p LEFT(p)

else
p RIGHT(p)

end if
if p = NIL then
p EmptyNode

end if
x x∕2

end while
DATA(p) data

##### Insertion of integer Trie in C++

With C++ language we can speed up the above even/odd test and key update with bit-wise operation.

template <class  T >
IntTrie< T >* insert(IntTrie< T >* t, int key,  T  value= T()){
if(!t)
t  =   new  IntTrie< T>();

IntTrie< T >* p  =  t;
while(key){
if( (key&0x1)  ==  0){
if(!p ->left) p ->left  =   new  IntTrie< T>();
p  =  p ->left;
}
else{
if(!p ->right) p ->right  =   new  IntTrie< T>();
p  =  p ->right;
}
key>>=1;
}
p ->value  =  value;
return t;
}

In order to verify this program, some helper functions are provided to simplify the repeat insertions. And we also provide a function to convert the Trie to readable string.

template <class  T, class Iterator>
IntTrie< T >* list_to_trie(Iterator first, Iterator last){
IntTrie< T >* t(0);
for(;first!=last;  ++first)
t  =  insert(t, *first);
return t;
}

template <class  T, class Iterator>
IntTrie< T >* map_to_trie(Iterator first, Iterator last){
IntTrie< T >* t(0);
for(;first!=last;  ++first)
t  =  insert(t, first->first, first->second);
return t;
}

template <class  T >
std::string trie_to_str(IntTrie< T >* t, int prefix=0, int depth=0){
std::stringstream s;
s<< ”(” <<prefix;
if(t->value!= T())
s<< ”:” << t->value;
if(t->left)
s<< ”,_” <<trie_to_str(t->left, prefix, depth+1);
if(t->right)
s<< ”,_” <<trie_to_str(t->right, (1<<depth)+prefix, depth+1);
s<< ”)”;
return s.str();
}

Function list_to_trie just inserts keys, all values are treated as default value as type T, while map_to_trie inserts both keys and values repeatedly. Function trie_to_str helps to convert a Trie to literal string in a modified pre-order traverse.

The verification cases are as the following.

const int lst[]  =  {1, 4, 5}
std::list< int>  l(lst, lst+sizeof(lst)/sizeof(int));

IntTrie< int>* ti  =  list_to_trie< int, std::list< int>::iterator>(l.begin(), l.end());
std::copy(l.begin(), l.end(),
std::ostream_iterator< int>(std::cout, ”,_”));
std::cout<< ” = =><<trie_to_str(ti)<<\n”;

IntTrie< char>* tc;
typedef std::list<std::pair< intchar >   >  Dict;
const int  keys[]  =  {4, 1, 5, 9}
const char vals[]  =  ”bacd”;
Dict  m;
for(int i=0; i<sizeof(keys)/sizeof(int);  ++i)
m.push_back(std::make_pair(keys[i], vals[i]));
tc  =  map_to_trie< char, Dict::iterator>( m.begin(),  m.end());
std::copy(keys, keys+sizeof(keys)/sizeof(int),
std::ostream_iterator< int>(std::cout, ”,_”));
std::cout<< ” = =><<trie_to_str(tc);

The above code lines will output results in console like this.

 1, 4, 5, ==>(0, (0, (0, (4))), (1, (1, (5))))  4, 1, 5, 9, ==>(0, (0, (0, (4:b))), (1:a, (1, (1, (9:d)), (5:c))))

##### Insertion of integer trie in Python

Imperative implementation of insertion in Python can be easily given by translating the pseudo code of the algorithm.

def trie_insert(t, key, value  =  None):
if t is None:
t  =  IntTrie()

p  =  t
while key != 0:
if key  &  1  ==  0:
if p.left is None:
p.left  =  IntTrie()
p  =  p.left
else
if p.right is None:
p.right  =  IntTrie()
p  =  p.right
key  =  key>>1
p.value  =  value
return t

In order to test this insertion program, some test helpers are provided:

def trie_to_str(t, prefix=0, depth=0):
to_str  =  lambda x: ” % s”  % x
str= ”(” +to_str(prefix)
if t.value is not None:
str  +=  ”:” +t.value
if t.left is not None:
str  +=  ”,_” +trie_to_str(t.left, prefix, depth+1)
if t.right is not None:
str  +=  ”,_” +trie_to_str(t.right, (1<<depth)+prefix, depth+1)
str+= ”)”
return str

def list_to_trie(l):
t  =  None
for x in l:
t  =  trie_insert(t, x)
return t

def map_to_trie( m):
t  =  None
for k, v in  m.items():
t  =  trie_insert(t, k, v)
return t

Function trie_to_str can print the contents of trie in pre-order, It doesn’t only print the value of the node, but also print the edge information.

Function list_to_trie can repeatedly insert a list of keys into a trie, since the default argument of value is None, so all data relative to keys are empty. If the data isn’t empty, function map_to_trie can insert a list of key-value pairs into the trie.

Then a test class is given to encapsulate test cases:

class IntTrieTest:
def run(self):
self.test_insert()

def test_insert(self):
t  =  None
t  =  trie_insert(t, 0);
t  =  trie_insert(t, 1);
t  =  trie_insert(t, 4);
t  =  trie_insert(t, 5);
print trie_to_str(t)
t1  =  list_to_trie([1, 4, 5])
print trie_to_str(t1)
t2  =  map_to_trie({4:’b’, 1:’a’, 5:’c’, 9:’d’}
print trie_to_str(t2)

if __name__  ==  ”__main__”:
IntTrieTest().run()

Running this program will print the following result.

 (0, (0, (0, (4))), (1, (1, (5))))  (0, (0, (0, (4))), (1, (1, (5))))  (0, (0, (0, (4:b))), (1:a, (1, (1, (9:d)), (5:c))))

Please note that by pre-order traverse trie, we can get a ’lexical’ order result of the keys. For instance, the last result print the little-endian format of key 1, 4, 5, 9 as below.

 001  1  1001  101

They are in lexical order. We’ll go back to this feature of trie later in alphabetic trie section.

##### 2.2.2 Recursive insertion algorithm

Insertion can also be implemented in a recursive manner. If the LSB is 0, it means that the key to be inserted is an even number, we recursively insert the date to left child, we can divide the number by 2 and round to integer to get rid of the LSB. If the LSB is 1, the key is then an odd number, the recursive insertion will be happened to right child. This algorithm is described as below.

INT - TRIE - INSERT(T,x,data)
if  x = 0 then
V ALUE(T) data

else
if EV EN(xthen
LEFT(T) INT - TRIE - INSERT(LEFT(T),x∕2,data)

else
RIGHT(T) INT - TRIE - INSERT(RIGHT(T),x∕2,data)

end if

end if
return  T

##### Insertion of integer Trie in Haskell

To simply the problem, If user insert a data with key already exists, we simply overwrite the previous stored data. This approach can be easily replaced with other methods, such as storing data as a linked list etc.

Insertion integer key into a trie can be implemented with Haskell as below.

insert :: IntTrie a  ->  Key  ->  a  ->  IntTrie a
insert t 0 x  =  Branch (left t) (Just x) (right t)
insert t k x  =
if even k
then Branch (insert (left t) (k ‘div‘ 2) x) (value t) (right t)
else Branch (left t) (value t) (insert (right t) (k ‘div‘ 2) x)

If the key is zero, we just insert the data to current node, in other cases, the program go down along the trie according to the last bit of the key.

To test if this program, some test helper functions are provided.

fromList :: [(Key, a)]  ->  IntTrie a
fromList xs  =  foldl ins Empty xs where
ins t (k, v)  =  insert t k v

--  k  =  ... a2, a1, a0  = =>  k  =  ai *  m   +  k, where  m =2ˆi
toString :: (Show  a)=> IntTrie a  ->  String
toString t  =  toStr t 0 1 where
toStr Empty k  m   =  ”.”
toStr tr k  m   =  ”(”  ++  (toStr (left tr) k (2* m))  ++
”_”  ++  (show k)  ++  (valueStr (value tr))  ++
”_”  ++  (toStr (right tr) ( m + k) (2* m))  ++  ”)”
valueStr (Just x)  =  ”:”  ++  (show x)
valueStr _  =  ””

fromList function can create a trie from a list of integer-data pairs. toString function can turn a trie data structure to readable string for printing. This is a modified in-order tree traverse, since the number stored is in little-endian, the program store the 2m to calculate the keys. The following code shows a test.

main  =  do
putStrLn (toString (fromList [(1, ’a’), (4, ’b’), (5, ’c’), (9, ’d’)]))

This will output:

 (((. 0 (. 4:’b’ .)) 0 .) 0 (((. 1 (. 9:’d’ .)) 1 (. 5:’c’ .)) 1:’a’ .))

Figure 2 shows this result.

 Figure 3: An little-endian integer binary trie for the map {1 → a,4 → b,5 → c,9 → d}.

##### Insertion of integer Trie in Scheme/Lisp

Insertion program implemented with Scheme/Lisp is quite similar, since Scheme has a complex numeric system, it will use fraction instead of round to integer, we minus the key by 1 before divide it with 2.

;; Insertion
;;   if user insert an existed value, just overwrite the old value
;;   usage: (insert t k x) t: trie, k: key, x: value
(define (insert t k x)
(if (= k 0)
(make-int-trie (left t) x (right t))
(if (even? k)
(make-int-trie (insert (left t) (/ k 2) x) (value t) (right t))
(make-int-trie (left t) (value t) (insert (right t) (/ (- k 1) 2) x)))))

In order to make creation of Trie from a list of key-value pairs easy, here is a helper function.

(define (list->trie lst) ;;lst is list of pairs
(define (insert-pair t p)
(insert t (car p) (cadr p)))
(fold-left insert-pair ’() lst))

In order to convert the Trie to a readable string, a converter function is given as the following.

(define (trie->string trie)
(define (value->string x)
(cond ((null? x) ”.”)
((number? x) (number- >string x))
((string? x) x)
(else ”unknown_value”)))
(define (trie->str t k  m )
(if (null? t)
”.”
(string-append ”(” (trie->str (left t) k (*  m  2)) ”_”
(number- >string k) (value->string (value t)) ”_”
(trie->str (right t) (+  m  k) (*  m  2)) ”)”)))
(trie->str trie 0 1))

To verify the program, we can test it with a easy test case.

(define (test-int-trie)
(define t (list->trie (list ’(1 ”a”) ’(4 ”b”) ’(5 ”c”) ’(9 ”d”))))
(display (trie->string t)) (newline))

Evaluate this test function will generate below output.

(test-int-trie)
(((. 0. (. 4b .)) 0. .) 0. (((. 1. (. 9d .)) 1. (. 5c .)) 1a .))

Which is identical to the Haskell output.

#### 2.3 Look up in integer binary trie

##### 2.3.1 Iterative looking up algorithm

To look up a key in a little-endian integer binary trie. We take each bit of the key from left (LSB), and go left or right according to if the bit is 0, until we consumes all bits. this algorithm can be described as below pseudo code.

INT - TRIE - LOOKUP(T,x)
while x0 and TNIL do
if EV EN(x) = TRUE then
T LEFT(T)

else
T RIGHT(T)

end if
x x∕2

end while
if TNIL then
return  DATA(T)

else
return  NIL

end if

##### Look up implemented in C++

In C++, we can test the LSB by using bit-wise operation. The following code snippet searches a key in an integer Trie. If the target node is found, the value of that node is returned, else it will return the default value of the value type1 .

template <class  T >
T  lookup(IntTrie< T >* t, int key){
while(key  &&  t){
if( (key  &  0x1)  ==  0)
t  =  t->left;
else
t  =  t->right;
key>>=1;
}
if(t)
return t->value;
else
return  T();
}

To verify this program, some simple test cases are provided.

std::cout<<\nlook_up_4:_” <<lookup(tc, 4)
<<\nlook_up_9:_” <<lookup(tc, 9)
<<\nlook_up_0:_” <<lookup(tc, 0);

Where ’tc’ is the Trie we created in insertion section. the output is like below.

 look up 4: b  look up 9: d  look up 0:

##### Look up implemented in Python

By translating the pseudo code algorithm, we can easily get a python implementation.

def lookup(t, key):
while key != 0 and (t is not None):
if key  &  1  ==  0:
t  =  t.left
else
t  =  t.right
key  =  key>>1
if t is not None:
return t.value
else
return None

In this implementation, instead of using even-odd property, bit-wise manipulation is used to test if a bit is 0 or 1.

Here is the smoke test of the lookup function.

class IntTrieTest:
#...
def test_lookup(self):
t  =  map_to_trie({4:’y’, 1:’x’, 5:’z’}
print ”look_up_4:_”, lookup(t, 4)
print ”look_up_5:_”, lookup(t, 5)
print ”look_up_0:_”, lookup(t, 0)

The output of the test cases is as below.

 look up 4:  y  look up 5:  z  look up 0:  None

##### 2.3.2 Recursive looking up algorithm

Looking up in integer Trie can also be implemented in recursive manner. We take the LSB of the key to be found, if it is 0, we recursively look it up in left child, else in right child.

INT - TRIE - LOOKUP(T,x)
if T = NIL then
return  NIL

else if x = 0 then
return  V ALUE(T)

else if EV EN(xthen
return  INT - TRIE - LOOKUP(LEFT(T),x∕2)

else
return  INT - TRIE - LOOKUP(RIGHT(T),x∕2)

end if

##### Look up implemented in Haskell

In Haskell, we can use pattern matching to realize the above long if-then-else statements. The program is as the following.

search :: IntTrie a  ->  Key  ->   Maybe  a
search Empty k  =  Nothing
search t 0  =  value t
search t k  =  if even k then search (left t) (k ‘div‘ 2)
else search (right t) (k ‘div‘ 2)

If trie is empty, we simply returns nothing; if key is zero we return the value of current node; in other case we recursively search either left child or right child according to the LSB is 0 or not.

To test this program, we can write a smoke test case as following.

testIntTrie  =  ”t= ”  ++  (toString t)  ++
”\nsearch_t_4:_”  ++  (show \$ search t 4)  ++
”\nsearch_t_0:_”  ++  (show \$ search t 0)
where
t  =  fromList [(1, ’a’), (4, ’b’), (5, ’c’), (9, ’d’)]

main  =  do
putStrLn testIntTrie

This program will output these result.

 t=(((. 0 (. 4:’b’ .)) 0 .) 0 (((. 1 (. 9:’d’ .)) 1 (. 5:’c’ .)) 1:’a’ .))  search t 4: Just ’b’  search t 0: Nothing

##### Look up implemented in Scheme/Lisp

Scheme/Lisp implementation is quite similar. Note that we decrees key by 1 before divide it with 2.

(define (lookup t k)
(if (null? t) ’()
(if (= k 0) (value t)
(if (even? k)
(lookup (left t) (/ k 2))
(lookup (right t) (/ (- k 1) 2))))))

Test cases use the same Trie which is created in insertion section.

(define (test-int-trie)
(define t (list->trie (list ’(1 ”a”) ’(4 ”b”) ’(5 ”c”) ’(9 ”d”))))
(display (trie->string t)) (newline)
(display ”lookup_4:_”) (display (lookup t 4)) (newline)
(display ”lookup_0:_”) (display (lookup t 0)) (newline))

the result is as same as the one output by Haskell program.

(test-int-trie)
(((. 0. (. 4b .)) 0. .) 0. (((. 1. (. 9d .)) 1. (. 5c .)) 1a .))
lookup 4: b
lookup 0: ()

### 3 Integer Patricia Tree

It’s very easy to find the drawbacks of integer binary trie. Trie wasts a lot of spaces. Note in figure 2, all nodes except leafs store the real data. Typically, an integer binary trie contains many nodes only have one child. It is very easy to come to the idea for improvement, to compress the chained nodes which have only one child. Patricia is such a data structure invented by Donald R. Morrison in 1968. Patricia means practical algorithm to retrieve information coded in alphanumeric[3]. Wikipedia redirect Patricia as Radix tree.

Chris Okasaki gave his implementation of Integer Patricia tree in paper [2]. If we merge the chained nodes which have only one child together in figure 3, We can get a patricia as shown in figure 4.
 Figure 4: Little endian patricia for the map {1 → a,4 → b,5 → c,9 → d}.

From this figure, we can found that they keys of sibling nodes have the longest common prefix. They only branches out at certain bit. It means that we can save a lot of data by storing the common prefix.

Different from integer Trie, using big-endian integer in Patricia doesn’t cause the problem mentioned in section 2. Because all zero bits before MSB can be just omitted to save space. Big-endian integer is more natural than little-endian integer. Chris Okasaki list some significant advantages of big-endian Patricia trees [2].

#### 3.1 Definition of Integer Patricia tree

Integer Patricia tree is a special kind of binary tree, it is

• either a leaf node contains an integer key and a value
• or a branch node, contains a left child and a right child. The integer keys of two children shares the longest common prefix bits, the next bit of the left child’s key is zero while it is one for right child’s key.

##### Definition of big-endian integer Patricia tree in Haskell

If we translate the above recursive definition to Haskell, we can get below Integer Patrica Tree code.

data IntTree a  =  Empty
| Leaf Key a
| Branch Prefix Mask (IntTree a) (IntTree a)  --  prefix, mask, left, right

type Key  =  Int
type Prefix  =  Int

In order to tell from which bit the left and right children differ, a mask is recorded by the branch node. Typically, a mask is 2n, all lower bits than n doesn’t belong to common prefix

##### Definition of big-endian integer Patricia tree in Python

Such definition can be represent in Python similarly. Some helper functions are provided for easy operation later on.

class IntTree:
def __init__(self, key  =  None, value  =  None):
self.key  =  key
self.value  =  value
self.left  =  self.right  =  None

def set_children(self, l, r):
self.left  =  l
self.right  =  r

def replace_child(self, x, y):
if self.left  ==  x:
self.left  =  y
else
self.right  =  y

def is_leaf(self):
return self.left is None and self.right is None

def get_prefix(self):
if self.prefix is None:
return self.key
else
return self.prefix

Some helper member functions are provided in this definition. When Initialized, prefix, mask and children are all set to invalid value. Note the get_prefix() function, in case the prefix hasn’t been initialized, which means it is a leaf node, the key itself is returned.

##### Definition of big-endian integer Patricia tree in C++

With ISO C++, the type of the data stored in Patricia can be abstracted as a template parameter. The definition is similar to the python version.

template <class  T >
struct IntPatricia{
IntPatricia(int k=0,  T  v = T()):
key(k), value(v), prefix(k), mask(1), left(0), right(0){}

˜IntPatricia(){
delete left;
delete right;
}

bool is_leaf(){
return left  ==  0  &&  right  ==  0;
}

bool match(int x){
}

void replace_child(IntPatricia< T >* x, IntPatricia< T >* y){
if(left  ==  x)
left  =  y;
else
right  =  y;
}

void set_children(IntPatricia< T >* l, IntPatricia< T >* r){
left  =  l;
right  =  r;
}

int key;
T  value;
int prefix;
IntPatricia* left;
IntPatricia* right;
};

In order to release the memory easily, the program just recursively deletes the children in destructor. The default value of type T is used for initialization. The prefix is initialized to be the same value as key.

For the member function match(), I’ll explain it in later part.

##### Definition of big-endian integer Patricia tree in Scheme/Lisp

In Scheme/Lisp program, the data structure behind is still list, we provide creator functions and accessors to create Patricia and access the children, key, value, prefix and mask.

(define (make-leaf k v) ;; key and value
(list k v))

(define (make-branch p  m  l r) ;; prefix, mask, left and right
(list p  m  l r))

;; Helpers
(define (leaf? t)
(= (length t) 2))

(define (branch? t)
(= (length t) 4))

(define (key t)
(if (leaf? t) (car t) ’()))

(define (value t)
(if (leaf? t) (cadr t) ’()))

(define (prefix t)
(if (branch? t) (car t) ’()))

(if (branch? t) (cadr t) ’()))

(define (left t)
(if (branch? t) (caddr t) ’()))

(define (right t)
(if (branch? t) (cadddr t) ’()))

Function key and value are only applicable to leaf node while prefix, mask, children accessors are only applicable to branch node. So we test the node type in these functions.

#### 3.2 Insertion of Integer Patricia tree

When insert a key into a integer Patricia tree, if the tree is empty, we can just create a leaf node with the given key and data. (as shown in figure 5).
 Figure 5: (a). Insert key 12 to an empty patricia tree.

If the tree only contains a leaf node x, we can create a branch, put the new key and data as a leaf y of the branch. To determine if the new leaf y should be left node or right node. We need find the longest common prefix of x and y, for example if key(x) is 12 (1100 in binary), key(y) is 15 (1111 in binary), then the longest common prefix is 11oo. The o denotes the bits we don’t care about. we can use an integer to mask the those bits. In this case, the mask number is 4 (100 in binary). The next bit after the prefix presents 21. It’s 0 in key(x), while it is 1 in key(y). So we put x as left child and y as right child. Figure 6 shows this case.
 Figure 6: (b). Insert key 15 to the result tree in (a).

If the tree is neither empty, nor a leaf node, we need firstly check if the key to be inserted matches common prefix with root node. If it does, then we can recursively insert the key to the left child or right child according to the next bit. For instance, if we want to insert key 14 (1110 in binary) to the result tree in figure 6, since it has common prefix 11oo, and the next bit (the bit of 21) is 1, so we tried to insert 14 to the right child. Otherwise, if the key to be inserted doesn’t match the common prefix with the root node, we need branch a new leaf node. Figure 7 shows these 2 different cases.
 Figure 7: (c). Insert key 14 to the result tree in (b); (d). Insert key 5 to the result tree in (b).

##### 3.2.1 Iterative insertion algorithm for integer Patricia

Summarize the above cases, the insertion of integer patricia can be described with the following algorithm.

INT - PATRICIA - INSERT(T,x,data)
if T = NIL then
T CREATE - LEAF(x,data)
return  T

end if
y T
p NIL
while y is not LEAF and MATCH(x,PREFIX(y),MASK(y)) do
p y
y LEFT(y)

else
y RIGHT(y)

end if

end while
if LEAF(y) = TRUE and x = KEY (ythen
DATA(y) data

else
z BRANCH(y,CREATE - LEAF(x,data))
if p = NIL then
T z

else
if LEFT(p) = y then
LEFT(p) z

else
RIGHT(p) z

end if

end if

end if
return  T

In the above algorithm, MATCH procedure test if an integer key x, has the same prefix of node y above the mask bit. For instance, Suppose the prefix of node y can be denoted as p(n),p(n - 1),...,p(i),...,p(0) in binary, key x is k(n),k(n - 1),...,k(i),...,k(0), and mask of node y is 100...0 = 2i, if and only if p(j) = k(j) for all i j n, we say the key matches.

##### Insertion of big-endian integer Patricia tree in Python

Based on the above algorithm, the main insertion program can be realized as the following.

def insert(t, key, value  =  None):
if t is None:
t  =  IntTree(key, value)
return t

node  =  t
parent  =  None
while(True):
if match(key, node):
parent  =  node
node  =  node.left
else:
node  =  node.right
else:
if node.is_leaf() and key  ==  node.key:
node.value  =  value
else:
new_node  =  branch(node, IntTree(key, value))
if parent is None:
t  =  new_node
else:
parent.replace_child(node, new_node)
break
return t

The sub procedure of match, branch, lcp etc. are given as below.

def match(key, tree):
if tree.is_leaf():
return False

return x  &  (mask >>1)  ==  0

def lcp(p1, p2):
diff  =  (p1 ˆ p2)
while(diff!=0):
diff>>=1

def branch(t1, t2):
t  =  IntTree()
t.set_children(t1, t2)
else:
t.set_children(t2, t1)
return t

Function maskbit() can clear all bits covered by a mask to 0. For instance, x = 101101(b), and mask = 23 = 100(b), the lowest 2 bits will be cleared to 0, which means maskbit(x,mask) = 101100(b). This can be easily done by bit-wise operation.

Function zero() is used to check if the bit next to mask bit is 0. For instance, if x = 101101(b),y = 101111(b), and mask = 23 = 100(b), zero will check if the 2nd lowest bit is 0. So zero(x,mask) = true,zero(y,mask) = false.

Function lcp can extract ’the Longest Common Prefix’ of two integer. For the x and y in above example, because only the last 2 bits are different, so lcp(x,y) = 101100(b). And we set a mask to 23 = 100(b) to indicate that the last 2 bits are not effective for the prefix value.

To convert a list or a map into a Patricia tree, we can repeatedly insert the elements one by one. Since the program is same, except for the insert function, we can abstract list_to_xxx and map_to_xxx to utility functions

#  in trieutil.py
def from_list(l, insert_func):
t  =  None
for x in l:
t  =  insert_func(t, x)
return t

def from_map(m, insert_func):
t  =  None
for k, v in  m.items():
t  =  insert_func(t, k, v)
return t

With this high level functions, we can provide list_to_patricia and map_to_patricia as below.

def list_to_patricia(l):
return from_list(l, insert)

def map_to_patricia( m):
return from_map(m, insert)

In order to have smoke test of the above insertion program, some test cases and output helper are given.

def to_string(t):
to_str  =  lambda x: ” % s”  % x
if t is None:
return ””
if t.is_leaf():
str  =  to_str(t.key)
if t.value is not None:
str  +=  ”:” +to_str(t.value)
return str
str  = ”[” +to_str(t.prefix)+” @ ” +to_str(t.mask )+”]”
str+= ”(” +to_string(t.left)+”,” +to_string(t.right)+”)”
return str

class IntTreeTest:
def run(self):
self.test_insert()

def test_insert(self):
print ”test_insert”
t  =  list_to_patricia([6])
print to_string(t)
t  =  list_to_patricia([6, 7])
print to_string(t)
t  =  map_to_patricia({1:’x’, 4:’y’, 5:’z’}
print to_string(t)

if __name__  ==  ”__main__”:
IntTreeTest().run()

The program will output a result as the following.

 test insert  6  [6@2](6,7)  [0@8](1:x,[4@2](4:y,5:z))

This result means the program creates a Patrica tree shown in Figure 8.
 Figure 8: Insert map 1 → x,4 → y,5 → z into a big-endian integer Patricia tree.

##### Insertion of big-endian integer Patricia tree in C++

In the below C++ program, the default value of data type is used if user doesn’t provide data. It is nearly strict translation of the pseudo code.

template <class  T >
IntPatricia< T >* insert(IntPatricia< T >* t, int key,  T  value= T()){
if(!t)
return  new  IntPatricia< T >(key, value);

IntPatricia< T >* node  =  t;
IntPatricia< T >* parent(0);

while( node->is_leaf()==false  &&  node-> match(key) ){
parent  =  node;
node  =  node->left;
else
node  =  node->right;
}

if(node->is_leaf()  &&  key  ==  node-> key)
node->value  =  value;
else{
IntPatricia< T >* p  =  branch(node,  new  IntPatricia< T >(key, value));
if(!parent)
return p;
parent->replace_child(node, p);
}
return t;
}

Let’s review the implementation of member function match()

bool match(int x){
}

if a node is not a leaf, and it has common prefix (in bit-wise) as the key to be inserted, we say the node match the key. It is realized by a maskbit() function as below.

}

Since mask is always 2n, minus 1 will flip it to 111...1(b), then we reverse the it by bit-wise not, and clear all the lowest n - 1 bits of x by bit-wise and.

The branch() function in above program is as the following.

template <class  T >
IntPatricia< T >* branch(IntPatricia< T >* t1, IntPatricia< T >* t2){
IntPatricia< T >* t  =   new  IntPatricia< T>();
t-> mask  =  lcp(t->prefix, t1->prefix, t2->prefix);
t->set_children(t1, t2);
else
t->set_children(t2, t1);
return t;
}

It will extract the ’Longest Common Prefix’, and create a new node, put the 2 nodes to be merged as its children. Function lcp() is implemented as below.

int lcp(int&  p, int p1, int p2){
int diff  =  p1ˆp2;
while(diff){
diff>>=1;
}
}

Because we can only return one value in C++, we set the reference of parameter p as the common prefix result and returns the mask value.

To decide which child is left and which one is right when branching, we need test if the bit next to mask bit is zero.

return (x  &  (mask>>1))  ==  0;
}

To verify the C++ program, some simple test cases are provided.

IntPatricia< int>* ti(0);
const int lst[]  =  {6, 7}
ti  =  std::accumulate(lst, lst+sizeof(lst)/sizeof(int), ti,
std::ptr_fun(insert_key< int>));
std::copy(lst, lst+sizeof(lst)/sizeof(int),
std::ostream_iterator< int>(std::cout, ”,_”));
std::cout<< ” = =><<patricia_to_str(ti)<<\n”;

const int keys[]  =  {1, 4, 5}
const char vals[]  =  ”xyz”;
IntPatricia< char>* tc(0);
for(unsigned int i=0; i<sizeof(keys)/sizeof(int);  ++i)
tc  =  insert(tc, keys[i], vals[i]);
std::copy(keys, keys+sizeof(keys)/sizeof(int),
std::ostream_iterator< int>(std::cout, ”,_”));
std::cout<< ” = =><<patricia_to_str(tc);

To avoid repeating ourselves, we provide a different way instead of write a list_to_patrica(), which is very similar to list_to_trie in previous section.

In C++ STL, std::accumulate() plays a similar role of fold-left. But the functor we provide to accumulate must take 2 parameters, so we provide a wrapper function as below.

template <class  T >
IntPatricia< T >* insert_key(IntPatricia< T >* t, int key){
return insert(t, key);
}

With all these code line, we can get the following result.

 6, 7, ==>[6@2](6,7)  1, 4, 5, ==>[0@8](1:x,[4@2](4:y,5:z))

##### 3.2.2 Recursive insertion algorithm for integer Patricia

To implement insertion in recursive way, we treat the different cases separately. If the tree is empty, we just create a leaf node and return; if the tree is a leaf node, we need check the key of the node is as same as the key to be inserted, we overwrite the data in case they are same, else we need branch a new node and extract the longest common prefix and mask bit; In other case, we need examine if the key as common prefix with the branch node, and recursively perform insertion either to left child or to right child according to the next different bit is 0 or 1; Below recursive algorithm describes this approach.

INT - PATRICIA - INSERT(T,x,data)
if T = NIL or (T is a leaf and x = KEY (T)) then
return  CREATE - LEAF(x,data)

LEFT(T) INT - PATRICIA - INSERT(LEFT(T),x,data)

else
RIGHT(T) INT - PATRICIA - INSERT(RIGHT(T),x,data)

end if
return  T

else
return  BRANCH(T,CREATE - LEAF(x,data))

end if

##### Insertion of big-endian integer Patricia tree in Haskell

Insertion of big-endian integer Patricia tree can be implemented in Haskell by Change the above algorithm to recursive approach.

--  usage: insert tree key x
insert :: IntTree a  ->  Key  ->  a  ->  IntTree a
insert t k x
=  case t of
Empty  ->  Leaf k x
Leaf k’ x’  ->  if k == k’ then Leaf k x
else join k (Leaf k x) k’ t  --  t@(Leaf k x’)
Branch p  m  l r
| match k p  m   ->  if zero k  m
then Branch p  m  (insert l k x) r
else Branch p  m  l (insert r k x)
| otherwise  ->  join k (Leaf k x) p t  --  t@(Branch p  m  l r)

The match, zero and join functions in this program are defined as below.

--  join 2 nodes together.
--  (prefix1, tree1)  ++  (prefix2, tree2)
--   1. find the longest common prefix  ==  lcp(prefix1, prefix2), where
--          prefix1  =  a(n),a(n-1),...a(i+1),a(i),x...
--          prefix2  =  a(n),a(n-1),...a(i+1),a(i),y...
--          prefix   =  a(n),a(n-1),...a(i+1),a(i),00...0
--   2. mask bit  =  100...0b (=2ˆi)
--          so mask is something like, 1,2,4,...,128,256,...
--   3. if      x==’0’, y==’1’ then (tree1=>left, tree2=>right),
--      else if x==’1’, y==’0’ then (tree2=>left, tree1=>right).
join :: Prefix  ->  IntTree a  ->  Prefix  ->  IntTree a  ->  IntTree a
join p1 t1 p2 t2  =  if zero p1  m  then Branch p  m  t1 t2
else Branch p  m  t2 t1
where
(p,  m )  =  lcp p1 p2

--  lcp means longest common prefix
lcp :: Prefix  ->  Prefix  ->  (Prefix, Mask)
lcp p1 p2  =  (p,  m ) where
m   =  bit (highestBit (p1 ‘xor‘ p2))

--  get the order of highest bit of 1.
--  For a number x  =  00...0,1,a(i-1)...a(1)
--  the result is i
highestBit :: Int  ->  Int
highestBit x  =  if x ==0  then 0 else 1+highestBit (shiftR x 1)

--  For a number x  =  a(n),a(n-1)...a(i),a(i-1),...,a(0)
--  and a mask  m   =  100..0 (=2ˆi)
--  the result of mask x  m  is a(n),a(n-1)...a(i),00..0
mask x  m   =  (x .&. complement (m -1))  --  complement means bit-wise not.

--  Test if the next bit after mask bit is zero
--  For a number x  =  a(n),a(n-1)...a(i),1,...a(0)
--  and a mask    m   =  100..0 (=2ˆi)
--  because the bit next to a(i) is 1, so the result is False
--  For a number y  =  a(n),a(n-1)...a(i),0,...a(0) the result is True.
zero :: Int  ->  Mask  ->  Bool
zero x  m   =  x .&. (shiftR  m  1)  ==  0

--  Test if a key matches a prefix above of the mask bit
--  For a prefix: p(n),p(n-1)...p(i)...p(0)
--      a key:    k(n),k(n-1)...k(i)...k(0)
--  and a mask:                 100..0  =  (2ˆi)
--  If and only if p(j)==k(j), i<= j<= n the result is True
match :: Key  ->  Prefix  ->  Mask  ->  Bool
match k p  m   =  (mask k  m )  ==  p

In order to test the above insertion program, some test helper functions are provided.

--  Generate a Int Patricia tree from a list
--  Usage: fromList [(k1, x1), (k2, x2),..., (kn, xn)]
fromList :: [(Key, a)]  ->  IntTree a
fromList xs  =  foldl ins Empty xs where
ins t (k, v)  =  insert t k v

toString :: (Show  a)=> IntTree a  ->  String
toString t  =
case t of
Empty  ->  ”.”
Leaf k x  ->  (show k)  ++  ”:”  ++  (show x)
Branch p  m  l r  ->  ”[”  ++  (show p)  ++  ” @ ”  ++  (show  m )  ++  ”]”  ++
”(”  ++  (toString l)  ++  ”,_”  ++  (toString r)  ++  ”)”

With these helpers, insertion can be test as the following.

testIntTree  =  ”t= ”  ++  (toString t)
where
t  =  fromList [(1, ’x’), (4, ’y’), (5, ’z’)]

main  =  do
putStrLn testIntTree

This test will output:

 t=[0@8](1:’x’, [4@2](4:’y’, 5:’z’))

This result means the program creates a Patrica tree shown in Figure 8.

##### Insertion of big-endian integer Patricia tree in Scheme/Lisp

In Scheme/Lisp, we use switch-case like condition to test if the node is empty, or a leaf or a branch.

(define (insert t k x) ;; t: patrica, k: key, x: value
(cond ((null? t) (make-leaf k x))
((leaf? t) (if (= (key t) k)
(make-leaf k x) ;; overwrite
(branch k (make-leaf k x) (key t) t)))
((branch? t) (if (match? k (prefix t) (mask t))
(make-branch (prefix t)
(insert (left t) k x)
(right t))
(make-branch (prefix t)
(left t)
(insert (right t) k x)))
(branch k (make-leaf k x) (prefix t) t)))))

Where the function match?, zero-bit?, and branch are given as the following. We use the scheme fix number bit-wise operations to mask the number and test bit.

(fix:and x (fix:not (-  m  1))))

(define (zero-bit? x  m )
(= (fix:and x (fix:lsh  m  -1)) 0))

(define (lcp x y) ;; get the longest common prefix
(if (= z 0) 1 (* 2 (count-mask (fix:lsh z -1)))))
(let* ((m  (count-mask (fix:xor x y)))
(cons p  m)))

(define (match? k p  m )
(= (mask-bit k  m ) p))

(define (branch p1 t1 p2 t2) ;; pi: prefix i, ti: Patricia i
(let* ((pm  (lcp p1 p2))
(p (car  pm))
( m  (cdr  pm)))
(if (zero-bit? p1  m )
(make-branch p  m  t1 t2)
(make-branch p  m  t2 t1))))

We can use the very same list-¿trie function which is defined in integer trie. Below is an example to create a integer Patricia tree.

(define (test-int-patricia)
(define t (list->trie (list ’(1 ”x”) ’(4 ”y”) ’(5 ”z”))))
(display t) (newline))

Evaluate it will generate a Patricia tree like below.

(test-int-patricia)
(0 8 (1 x) (4 2 (4 y) (5 z)))

It is identical to the insert result output by Hasekll insertion program.

#### 3.3 Look up in Integer Patricia tree

Consider the property of integer Patricia tree, to look up a key, we test if the key has common prefix with the root, if yes, we then check the next bit differs from common prefix is zero or one. If it is zero, we then do look up in the left child, else we turn to right.

##### 3.3.1 Iterative looking up in integer Patricia tree

In case we reach a leaf node, we can directly check if the key of the leaf is equal to what we are looking up. This algorithm can be described with the following pseudo code.

INT - PATRICIA - LOOK - UP(T,x)
if T = NIL then
return  NIL

end if
while T is not LEAF and MATCH(x,PREFIX(T),MASK(T)) do
T LEFT(T)

else
T RIGHT(T)

end if

end while
if T is LEAF and KEY (T) = x then
return  DATA(T)

else
return  NIL

end if

##### Look up in big-endian integer Patricia tree in Python

With Python, we can directly translate the pseudo code into valid program.

def lookup(t, key):
if t is None:
return None
while (not t.is_leaf()) and match(key, t):
t  =  t.left
else
t  =  t.right
if t.is_leaf() and t.key  ==  key:
return t.value
else
return None

We can verify this program by some simple smoke test cases.

print ”test_look_up”
t  =  map_to_patricia({1:’x’, 4:’y’, 5:’z’}
print ”look_up_4:_”, lookup(t, 4)
print ”look_up_0:_”, lookup(t, 0)

We can get similar output as below.

 test look up  look up 4:  y  look up 0:  None

##### Look up in big-endian integer Patricia tree in C++

With C++ language, if the program doesn’t find the key, we can either raise exception to indicate a search failure or return a special value.

template <class  T >
T  lookup(IntPatricia< T >* t, int key){
if(!t)
return  T(); //or throw exception

while( (!t->is_leaf())  &&  t-> match(key)){
t  =  t->left;
else
t  =  t->right;
}
if(t->is_leaf()  &&  t-> key  ==  key)
return t->value;
else
return  T(); //or throw exception
}

We can try some test cases to search keys in a Patricia tree we created when test insertion.

std::cout<<\nlook_up_4:_” <<lookup(tc, 4)
<<\nlook_up_0:_” <<lookup(tc, 0)<<\n”;

The output result is as the following.

 look up 4: y  look up 0:

##### 3.3.2 Recursive looking up in integer Patricia tree

We can easily change the while-loop in above iterative algorithm into recursive calls, so that we can have a functional approach.

INT - PATRICIA - LOOK - UP(T,x)
if T = NIL then
return  NIL

else if T is a leaf and x = KEY (Tthen
return  V ALUE(T)

return  INT - PATRICIA - LOOK - UP(LEFT(T),x)

else
return  INT - PATRICIA - LOOK - UP(RIGHT(T),x)

end if

else
return  NIL

end if

##### Look up in big-endian integer Patricia tree in Haskell

By changing the above if-then-else into pattern matching, we can get Haskell version of looking up program.

--  look up a key
search :: IntTree a  ->  Key  ->   Maybe  a
search t k
=  case t of
Empty  ->  Nothing
Leaf k’ x  ->  if k == k’ then Just x else Nothing
Branch p  m  l r
| match k p  m   ->  if zero k  m  then search l k
else search r k
| otherwise  ->  Nothing

And we can test this program with looking up some keys in the previously created Patricia tree.

testIntTree  =  ”t= ”  ++  (toString t)  ++  ”\nsearch_t_4:_”  ++  (show \$ search t 4)  ++
”\nsearch_t_0:_”  ++  (show \$ search t 0)
where
t  =  fromList [(1, ’x’), (4, ’y’), (5, ’z’)]

main  =  do
putStrLn testIntTree

The output result is as the following.

 t=[0@8](1:’x’, [4@2](4:’y’, 5:’z’))  search t 4: Just ’y’  search t 0: Nothing

##### Look up in big-endian integer Patricia tree in Scheme/Lisp

Scheme/Lisp program for looking up is similar in case the tree is empty, we just returns nothing; If it is a leaf node and the key is equal to the number we are looking for, we find the result; If it is branch, we need test if binary format of the prefix matches the number, then we recursively search either in left child or in right child according to the next bit after mask is zero or not.

(define (lookup t k)
(cond ((null? t) ’())
((leaf? t) (if (= (key t) k) (value t) ’()))
((branch? t) (if (match? k (prefix t) (mask t))
(lookup (left t) k)
(lookup (right t) k))
’()))))

We can test it with the Patricia tree we create in the insertion program.

(define (test-int-patricia)
(define t (list->trie (list ’(1 ”x”) ’(4 ”y”) ’(5 ”z”))))
(display t) (newline)
(display ”lookup_4:_”) (display (lookup t 4)) (newline)
(display ”lookup_0:_”) (display (lookup t 0)) (newline))

The result is like below.

(test-int-patricia)
(0 8 (1 x) (4 2 (4 y) (5 z)))
lookup 4: y
lookup 0: ()

### 4 Alphabetic Trie

Integer based Trie and Patricia Tree can be a good start point. Such technical plays important role in Compiler implementation. Okasaki pointed that the widely used Haskell Compiler GHC (Glasgow Haskell Compiler), utilizes a similar implementation for several years before 1998 [2].

While if we extend the type of the key from integer to alphabetic value, Trie and Patricia tree can be very useful in textual manipulation engineering problems.

#### 4.1 Definition of alphabetic Trie

If the key is alphabetic value, just left and right children can’t represent all values. For English, there are 26 letters and each can be lower case or upper case. If we don’t care about case, one solution is to limit the number of branches (children) to 26. Some simplified ANSI C implementation of Trie are defined by using an array of 26 letters. This can be illustrated as in Figure 9.
 Figure 9: A Trie with 26 branches, with key a, an, another, bool, boy and zoo inserted.

In each node, not all branches may contain data. for instance, in the above figure, the root node only has its branches represent letter a, b, and z have sub trees. Other branches such as for letter c, is empty. For other nodes, empty branches (point to nil) are not shown.

I’ll give such simplified implementation in ANSI C in later section, however, before we go to the detailed source code, let’s consider some alternatives.

In case of language other than English, there may be more letters than 26, and if we need solve case sensitive problem. we face a problem of dynamic size of sub branches. There are 2 typical method to represent children, one is by using Hash table, the other is by using map. We’ll show these two types of method in Python and C++.

##### Definition of alphabetic Trie in ANSI C

ANSI C implementation is to illustrate a simplified approach limited only to case-insensitive English language. The program can’t deal with letters other than lower case ’a’ to ’z’ such as digits, space, tab etc.

struct Trie{
struct Trie* children[26];
void* data;
};

In order to initialize/destroy the children and data, I also provide 2 helper functions.

struct Trie* create_node(){
struct Trie* t  =  (struct Trie*)malloc(sizeof(struct Trie));
int i;
for(i=0; i<26;  ++i)
t->children[i]=0;
t->data=0;
return t;
}

void destroy(struct Trie* t){
if(!t)
return

int i;
for(i=0; i<26;  ++i)
destroy(t->children[i]);

if(t->data)
free(t->data);
free(t);
}

Note that, the destroy function uses recursive approach to free all children nodes.

##### Definition of alphabetic Trie in C++

With C++ and STL, we can abstract the language and characters as type parameter. Since the number of characters of the undetermined language varies, we can use std::map to store children of a node.

template <class Char, class Value>
struct Trie{
typedef Trie< Char, Value>  Self;
typedef std::map < Char, Self*> Children;
typedef Value ValueType;

Trie():value(Value()){}

virtual ˜Trie(){
for(typename Children::iterator it=children.begin();
it!=children.end();  ++it)
delete it->second;
}

Value value;
Children children;
};

For simple illustration purpose, recursive destructor is used to release the memory.

##### Definition of alphabetic Trie in Haskell

We can use Haskell record syntax to get some “free” accessor functions[5].

data Trie a  =  Trie { value ::  Maybe  a
, children :: [(Char, Trie a)]}

empty  =  Trie Nothing []

Neither Map nor Hash table is used, just a list of pairs can realize the same purpose. Function empty can help to create an empty Trie node. This implementation doesn’t constrain the key values to lower case English letters, it can actually contains any values of ’Char’ type.

##### Definition of alphabetic Trie in Python

In Python version, we can use Hash table as the data structure to represent children nodes.

class Trie:
def __init__(self):
self.value  =  None
self.children  =  {}

##### Definition of alphabetic Trie in Scheme/Lisp

The definition of alphabetic Trie in Scheme/Lisp is a list of two elements, one is the value of the node, the other is a children list. The children list is a list of pairs, one is the character binding to the child, the other is a Trie node.

(define (make-trie v lop) ;; v: value, lop: children, list of char-trie pairs
(cons v lop))

(define (value t)
(if (null? t) ’() (car t)))

(define (children t)
(if (null? t) ’() (cdr t)))

In order to create the child and access it easily, we also provide functions for such purpose.

(define (make-child k t)
(cons k t))

(define (key child)
(if (null? child) ’() (car child)))

(define (tree child)
(if (null? child) ’() (cdr child)))

#### 4.2 Insertion of alphabetic trie

To insert a key with type of string into a Trie, we pick the first letter from the key string. Then check from the root node, we examine which branch among the children represents this letter. If the branch is null, we then create an empty node. After that, we pick the next letter from the key string and pick the proper branch from the grand children of the root.

We repeat the above process till finishing all the letters of the key. At this time point, we can finally set the data to be inserted as the value of the node.

Note that the value of root node of Trie is always empty.

##### 4.2.1 Iterative algorithm of trie insertion

The below pseudo code describes the above insertion algorithm.

TRIE - INSERT(T,key,data)
if T = NIL then
T EmptyNode

end if
p = T
for each c in key do
if CHILDREN(p)[c] = NIL then
CHILDREN(p)[c] EmptyNode

end if
p CHILDREN(p)[c]

end for
DATA(p) data
return  T

##### Simplified insertion of alphabetic trie in ANSI C

Go on with the above ANSI C definition, because only lower case English letter is supported, we can use plan array manipulation to do the insertion.

struct Trie* insert(struct Trie* t, const char* key, void* value){
if(!t)
t=create_node();

struct Trie* p  =t;
while(*key){
int c  =  *key  -  ’a’;
if(!p ->children[c])
p ->children[c]  =  create_node();
p  =  p ->children[c];
++ key;
}
p ->data  =  value;
return t;
}

In order to test the above program, some helper functions to print content of the Trie is provided as the following.

void print_trie(struct Trie* t, const char* prefix){
printf(”(%s”, prefix);
if(t->data)
printf(”:%s”, (char*)(t->data));
int i;
for(i=0; i<26;  ++i){
if(t->children[i]){
printf(”,_”);
char* new_prefix=(char*)malloc(strlen(prefix+1)*sizeof(char));
sprintf(new_prefix, ” % s % c”, prefix, i+’a’);
print_trie(t->children[i], new_prefix);
}
}
printf(”)”);
}

After that, we can test the insertion program with such test cases.

struct Trie* test_insert(){
struct Trie* t=0;
t  =  insert(t, ”a”, 0);
t  =  insert(t, ”an”, 0);
t  =  insert(t, ”another”, 0);
t  =  insert(t, ”boy”, 0);
t  =  insert(t, ”bool”, 0);
t  =  insert(t, ”zoo”, 0);
print_trie(t, ””);
return t;
}

int main(int argc, char** argv){
struct Trie* t  =  test_insert();
destroy(t);
return 0;
}

This program will output a Trie like this.

 (, (a, (an, (ano, (anot, (anoth, (anothe, (another))))))),  (b, (bo, (boo, (bool)), (boy))), (z, (zo, (zoo))))

It is exactly the Trie as shown in figure 9.

##### Insertion of alphabetic Trie in C++

With above C++ definition, we can utilize STL provided search function in std::map to locate a child quickly, the program is implemented as the following, note that if user only provides key for insert, we also insert a default value of that type.

template <class Char, class Value, class Key >
Trie< Char, Value>* insert(Trie< Char, Value>* t, Key key, Value value= Value()){
if(!t)
t  =   new  Trie< Char, Value>();

Trie< Char, Value>* p(t);
for(typename Key::iterator it= key.begin(); it!=key.end();  ++it){
if(p ->children.find(*it)  ==  p ->children.end())
p ->children[*it]  =   new  Trie< Char, Value>();
p  =  p ->children[*it];
}
p ->value  =  value;
return t;
}

template <class  T, class  K >
T * insert_key(T * t,  K  key){
return insert(t, key);
}

Where insert_key() acts as a adapter, we’ll use similar accumulation method to create trie from list later.

To test this program, we provide the helper functions to print the trie on console.

template <class  T >
std::string trie_to_str(T * t, std::string prefix= ””){
std::ostringstream s;
s<< ”(” <<prefix;
if(t->value != typename  T::ValueType())
s<< ”:” << t->value;
for(typename  T::Children::iterator it= t->children.begin();
it!=t->children.end();  ++it)
s<< ”,_” <<trie_to_str(it-> second, prefix+it->first);
s<< ”)”;
return s.str();
}

After that, we can test our program with some simple test cases.

typedef Trie< char, std::string>  TrieType;
TrieType* t(0);
const char* lst[]  =  {”a”, ”an”, ”another”, ”b”, ”bob”, ”bool”, ”home”}
t  =  std::accumulate(lst, lst+sizeof(lst)/sizeof(char*), t,
std::ptr_fun(insert_key< TrieType, std::string>));
std::copy(lst, lst+sizeof(lst)/sizeof(char*),
std::ostream_iterator<std::string>(std::cout, ”,_”));
std::cout<<\n = =><<trie_to_str(t)<<\n”;
delete t;

t=0;
const char* keys[]  =  {”001”, ”100”, ”101”}
const char* vals[]  =  {”y”, ”x”, ”z”}
for(unsigned int i=0; i<sizeof(keys)/sizeof(char*);  ++i)
t  =  insert(t, std::string(keys[i]), std::string(vals[i]));
std::copy(keys, keys+sizeof(keys)/sizeof(char*),
std::ostream_iterator<std::string>(std::cout, ”,_”));
std::cout<< ” = =><<trie_to_str(t)<<\n”;
delete t;

It will output result like this.

 a, an, another, b, bob, bool, home,  ==>(, (a, (an, (ano, (anot, (anoth, (anothe, (another))))))), (b, (bo,  (bob), (boo, (bool)))), (h, (ho, (hom, (home)))))  001, 100, 101, ==>(, (0, (00, (001:y))), (1, (10, (100:x), (101:z))))

##### Insertion of alphabetic trie in Python

In python the implementation is very similar to the pseudo code.

def trie_insert(t, key, value  =  None):
if t is None:
t  =  Trie()

p  =  t
for c in key:
if not c in p.children:
p.children[c]  =  Trie()
p  =  p.children[c]
p.value  =  value
return t

And we define the helper functions as the following.

def trie_to_str(t, prefix= ””):
str= ”(” +prefix
if t.value is not None:
str  +=  ”:” +t.value
for k,v in sorted(t.children.items()):
str  +=  ”,_” +trie_to_str(v, prefix+ k)
str+= ”)”
return str

def list_to_trie(l):
return from_list(l, trie_insert)

def map_to_trie( m):
return from_map(m, trie_insert)

With these helpers, we can test the insert program as below.

class TrieTest:
#...
def test_insert(self):
t  =  None
t  =  trie_insert(t, ”a”)
t  =  trie_insert(t, ”an”)
t  =  trie_insert(t, ”another”)
t  =  trie_insert(t, ”b”)
t  =  trie_insert(t, ”bob”)
t  =  trie_insert(t, ”bool”)
t  =  trie_insert(t, ”home”)
print trie_to_str(t)

It will print a trie in console.

 (, (a, (an, (ano, (anot, (anoth, (anothe, (another))))))),  (b, (bo, (bob), (boo, (bool)))), (h, (ho, (hom, (home)))))

##### 4.2.2 Recursive algorithm of Trie insertion

The iterative algorithms can transform to recursive algorithm by such approach. We take one character from the key, and locate the child branch, then recursively insert the left characters of the key to that branch. If the branch is empty, we create a new node and add it to children before doing the recursively insertion.

TRIE - INSERT(T,key,data)
if T = NIL then
T EmptyNode

end if
if key = NIL then
V ALUE(T) data

else
p FIND(CHILDREN(T),FIRST(key))
if p = NIL then
p APPEND(CHILDREN(T),FIRST(key),EmptyNode)

end if
TRIE - INSERT(p,REST(key),data)

end if
return  T

##### Insertion of alphabetic trie in Haskell

To realize the insertion in Haskell, The only thing we need do is to translate the for-each loop into recursive call.

insert :: Trie a  ->  String  ->  a  ->  Trie a
insert t []     x  =  Trie (Just x)  (children t)
insert t (k:ks) x  =  Trie (value t) (ins (children t) k ks x) where
ins [] k ks x  =  [(k, (insert empty ks x))]
ins (p:ps) k ks x  =  if fst p  ==  k
then (k, insert (snd p) ks x):ps
else p:(ins ps k ks x)

If the key is empty, the program reaches the trivial terminator case. It just set the value. In other case, it examine the children recursively. Each element of the children is a pair, contains a character and a branch.

Some helper functions are provided as the following.

fromList :: [(String, a)]  ->  Trie a
fromList xs  =  foldl ins empty xs where
ins t (k, v)  =  insert t k v

toString :: (Show  a)=>  Trie a  ->  String
toString t  =  toStr t ”” where
toStr t prefix  =  ”(”  ++  prefix  ++  showMaybe (value t)  ++
(concat (map  (\(k, v)-> ”,_”  ++  toStr v (prefix++[k])))
(sort (children t)))
++  ”)”
showMaybe Nothing  =  ””
showMaybe (Just x)   =  ”:”  ++  show x

sort :: (Ord  a)=>[(a, b)]  ->  [(a, b)]
sort []  =  []
sort (p:ps)  =  sort xs  ++  [p]  ++  sort ys where
xs  =  [x | x <- ps, fst x  <=  fst p ]
ys  =  [y | y <- ps, fst y  >  fst p ]

The fromList function provide an easy way to repeatedly extract key-value pairs from a list and insert them into a Trie.

Function toString can print the Trie in a modified pre-order way. Because the children stored in a unsorted list, a sort function is provided to sort the branches. The quick-sort algorithm is used.

We can test the above program with the below test cases.

testTrie  =  ”t= ”  ++  (toString t)
where
t  =  fromList [(”a”, 1), (”an”, 2), (”another”, 7), (”boy”, 3),
(”bool”, 4), (”zoo”, 3)]

main  =  do
putStrLn testTrie

The program outputs:

 t=(, (a:1, (an:2, (ano, (anot, (anoth, (anothe, (another:7))))))),  (b, (bo, (boy:3), (boo, (bool:4)))), (z, (zo, (zoo:3))))

It is identical to the ANSI C result except the values we inserted.

##### Insertion of alphabetic trie in Scheme/Lisp

In order to manipulate string like list, we provide two helper function to provide car, cdr like operations for string.

(define (string-car s)

(define (string-cdr s)
(string-tail s 1))

After that, we can implement insert program as the following.

(define (insert t k x)
(define (ins lst k ks x) ;; return list of child
(if (null? lst)
(list (make-child k (insert ’() ks x)))
(if (string=? (key (car lst)) k)
(cons (make-child k (insert (tree (car lst)) ks x)) (cdr lst))
(cons (car lst) (ins (cdr lst) k ks x)))))
(if (string-null? k)
(make-trie x (children t))
(make-trie (value t)
(ins (children t) (string-car k) (string-cdr k) x))))

In order to print readable string for a Trie, we provide a pre-order manner of Trie traverse function. It can convert a Trie to string.

(define (trie->string t)
(define (value->string x)
(cond ((null? x) ”.”)
((number? x) (number- >string x))
((string? x) x)
(else ”unknown_value”)))
(define (trie->str t prefix)
(define (child->str c)
(string-append ”,_” (trie->str (tree c) (string-append prefix (key c)))))
(let ((lst (map  child->str (sort-children (children t)))))
(string-append ”(” prefix (value->string (value t))
(fold-left string-append ”” lst) ”)”)))
(trie->str t ””))

Where sort-children is a quick sort algorithm to sort all children of a node based on keys.

(define (sort-children lst)
(if (null? lst) ’()
(let ((xs (filter (lambda (c) (string<=?  (key c) (key (car lst))))
(cdr lst)))
(ys (filter (lambda (c) (string>?  (key c) (key (car lst))))
(cdr lst))))
(append (sort-children xs)
(list (car lst))
(sort-children ys)))))

Function filter is only available after R6RS, for R5RS, we define the filter function manually.

(define (filter pred lst)
(keep-matching-items lst pred))

With all of these definition, we can test our insert program with some simple test cases.

(define (test-trie)
(define t (list->trie (list ’(”a” 1) ’(”an” 2) ’(”another” 7)
’(”boy” 3) ’(”bool” 4) ’(”zoo” 3))))
(define t2 (list->trie (list ’(”zoo” 3) ’(”bool” 4) ’(”boy” 3)
’(”another” 7) ’(”an” 2) ’(”a” 1))))
(display (trie->string t)) (newline)
(display (trie->string t2)) (newline))

In the above test program, function trie-¿string is reused, it is previous defined for integer trie.

Evaluate test-trie function will output the following result.

(test-trie)
(., (a1, (an2, (ano., (anot., (anoth., (anothe., (another7))))))),
(b., (bo., (boo., (bool4)), (boy3))), (z., (zo., (zoo3))))
(., (a1, (an2, (ano., (anot., (anoth., (anothe., (another7))))))),
(b., (bo., (boo., (bool4)), (boy3))), (z., (zo., (zoo3))))

#### 4.3 Look up in alphabetic trie

To look up a key in a Trie, we also extract the character from the key string one by one. For each character, we search among the children branches to see if there is a branch represented by this character. In case there is no such child, the look up process terminates immediately to indicate a fail result. If we reach the last character, The data stored in the current node is the result we are looking up.

##### 4.3.1 Iterative look up algorithm for alphabetic Trie

This process can be described in pseudo code as below.

TRIE - LOOK - UP(T,key)
if T = NIL then
return  NIL

end if
p = T
for each c in key do
if CHILDREN(p)[c] = NIL then
return  NIL

end if
p CHILDREN(p)[c]

end for
return  DATA(p)

##### Look up in alphabetic Trie in C++

We can easily translate the iterative algorithm to C++. If the key specified can’t be found in the Trie, our program returns a default value of the data type. As alternative, it is also a choice to raise exception.

template <class  T, class Key >
typename  T::ValueType lookup(T * t, Key key){
if(!t)
return typename  T::ValueType(); //or throw exception

T * p(t);
for(typename Key::iterator it= key.begin(); it!=key.end();  ++it){
if(p ->children.find(*it)  ==  p ->children.end())
return typename  T::ValueType(); //or throw exception
p  =  p ->children[*it];
}
return p ->value;
}

To verify the look up program, we can test it with some simple test cases.

Trie< charint>* t(0);
const char* keys[]  =  {”a”, ”an”, ”another”, ”b”, ”bool”, ”bob”, ”home”}
const int vals[]  =  {1, 2, 7, 1, 4, 3, 4}
for(unsigned int i=0; i<sizeof(keys)/sizeof(char*);  ++i)
t  =  insert(t, std::string(keys[i]), vals[i]);
std::cout<<\nlookup_another:_” <<lookup(t, std::string(”another”))
<<\nlookup_home:_” <<lookup(t, std::string(”home”))
<<\nlookup_the:_” <<lookup(t, std::string(”the”))<<\n”;
delete t;

We can get result as below.

 lookup another: 7  lookup home: 4  lookup the: 0

We see that key word “the” isn’t contained in Trie, in our program, the default value of integer, 0 is returned.

##### Look up in alphabetic trie in Python

By translating the algorithm in Python language, we can get a imperative program.

def lookup(t, key):
if t is None:
return None

p  =  t
for c in key:
if not c in p.children:
return None
p  =  p.children[c]
return p.value

We can use the similiar test cases to test looking up function.

class TrieTest:
#...
def test_lookup(self):
t  =  map_to_trie({”a”:1, ”an”:2, ”another”:7, ”b”:1,
”bool”:4, ”bob”:3, ”home”:4}
print ”find_another:_”, lookup(t, ”another”)
print ”find_home:_”, lookup(t, ”home”)
print ”find_the:_”, lookup(t, ”the”)

The result of these test cases are.

 find another:  7  find home:  4  find the:  None

##### 4.3.2 Recursive look up algorithm for alphabetic Trie

In recursive algorithm, we take first character from the key to be looked up. If it can be found in a child for the current node, we then recursively search the rest characters of the key from that child branch. else it means the key can’t be found.

TRIE - LOOK - UP(T,key)
if key = NIL then
return  V ALUE(T)

end if
p FIND(CHILDREN(T),FIRST(key))
if p = NIL then
return  NIL

else
return  TRIE - LOOK - UP(p,REST(key))

end if

##### Look up in alphabetic trie in Haskell

To express this algorithm in Haskell, we can utilize ’lookup’ function in Haskell standard library[5].

find :: Trie a  ->  String  ->   Maybe  a
find t []  =  value t
find t (k:ks)  =  case lookup k (children t) of
Nothing  ->  Nothing
Just t’  ->  find t’ ks

We can append some search test cases right after insert.

testTrie  =  ”t= ”  ++  (toString t)  ++
”\nsearch_t_an:_”  ++  (show (find t ”an”))  ++
”\nsearch_t_boy:_”  ++  (show (find t ”boy”))  ++
”\nsearch_t_the:_”  ++  (show (find t ”the”))
...

Here is the search result.

 search t an: Just 2  search t boy: Just 3  search t the: Nothing

##### Look up in alphabetic trie in Scheme/Lisp

In Scheme/Lisp program, if the key is empty, we just return the value of the current node, else we recursively find in children of the node to see if there is a child binding to a character, which match the first character of the key. We repeat this process till examine all characters of the key.

(define (lookup t k)
(define (find k lst)
(if (null? lst) ’()
(if (string=? k (key (car lst)))
(tree (car lst))
(find k (cdr lst)))))
(if (string-null? k) (value t)
(let ((child (find (string-car k) (children t))))
(if (null? child) ’()
(lookup child (string-cdr k))))))

we can test this look up with similar test cases as in Haskell program.

(define (test-trie)
(define t (list->trie (list ’(”a” 1) ’(”an” 2) ’(”another” 7)
’(”boy” 3) ’(”bool” 4) ’(”zoo” 3))))
(display (trie->string t)) (newline)
(display ”lookup_an:_”) (display (lookup t ”an”)) (newline)
(display ”lookup_boy:_”) (display (lookup t ”boy”)) (newline)
(display ”lookup_the:_”) (display (lookup t ”the”)) (newline))

This program will output the following result.

(test-trie)
(., (a1, (an2, (ano., (anot., (anoth., (anothe., (another7))))))),
(b., (bo., (boo., (bool4)), (boy3))), (z., (zo., (zoo3))))
lookup an: 2
lookup boy: 3
lookup the: ()

### 5 Alphabetic Patricia Tree

Alphabetic Trie has the same problem as integer Trie. It is not memory efficient. We can use the same method to compress alphabetic Trie into Patricia.

#### 5.1 Definition of alphabetic Patricia Tree

Alphabetic patricia tree is a special tree, each node contains multiple branches. All children of a node share the longest common prefix string. There is no node has only one children, because it is conflict with the longest common prefix property.

If we turn the Trie shown in figure 9 into Patricia tree by compressing those nodes which has only one child. we can get a Patricia tree like in figure 10.
 Figure 10: A Patricia tree, with key a, an, another, bool, boy and zoo inserted.

Note that the root node always contains empty value.

##### Definition of alphabetic Patricia Tree in Haskell

We can use a similar definition as Trie in Haskell, we need change the type of the first element of children from single character to string.

type Key  =  String

data Patricia a  =  Patricia { value ::  Maybe  a
, children :: [(Key, Patricia a)]}

empty  =  Patricia Nothing []

leaf :: a  ->  Patricia a
leaf x  =  Patricia (Just x) []

Besides the definition, helper functions to create a empty Patricia node and to create a leaf node are provided.

##### Definition of alphabetic Patricia tree in Python

The definition of Patricia tree is same as Trie in Python.

class Patricia:
def __init__(self, value  =  None):
self.value  =  value
self.children  =  {}

##### Definition of alphabetic Patricia tree in C++

With ISO C++, we abstract the key type of value type as type parameters, and utilize STL provide map container to represent children of a node.

template <class Key, class Value>
struct Patricia{
typedef Patricia< Key, Value>  Self;
typedef std::map < Key, Self*> Children;
typedef Key   KeyType;
typedef Value ValueType;

Patricia(Value v = Value()):value(v){}

virtual ˜Patricia(){
for(typename Children::iterator it=children.begin();
it!=children.end();  ++it)
delete it->second;
}

Value value;
Children children;
};

For illustration purpose, we simply release the memory in a recursive way.

##### Definition of alphabetic Patricia tree in Scheme/Lisp

We can fully reuse the definition of alphabetic Trie in Scheme/Lisp. In order to provide a easy way to create a leaf node, we define an extra helper function.

(define (make-leaf x)
(make-trie x ’()))

#### 5.2 Insertion of alphabetic Patricia Tree

When insert a key, s, into the Patricia tree, if the tree is empty, we can just create an leaf node. Otherwise, we need check each child of the Patricia tree. Every branch of the children is binding to a key, we denote them as, s1,s2,...,sn, which means there are n branches. if s and si have common prefix, we then need branch out 2 new sub branches. Branch itself is represent with the common prefix, each new branches is represent with the different parts. Note there are 2 special cases. One is that s is the substring of si, the other is that si is the substring of s. Figure 11 shows these different cases.

Figure 11: (a). Insert key, “boy” into an empty Patricia tree, the result is a leaf node;
(b). Insert key, “bool” into (a), result is a branch with common prefix “bo”.
(c). Insert “an”, with value y into node x with prefix “another”.
(d). Insert “another”, into a node with prefix “an”, the key to be inserted update to “other”, and do further insertion.
##### 5.2.1 Iterative insertion algorithm for alphabetic Patricia

The insertion algorithm can be described as below pseudo code.

PATRICIA - INSERT(T,key,value)
if T = NIL then
T NewNode

end if
p = T
loop
match FALSE
for each i in CHILDREN(pdo
if key = KEY (ithen
V ALUE(p) value
return  T

end if
prefix LONGEST - COMMON - PREFIX(key,KEY (i))
key1 key subtract prefix
key2 KEY (i) subtract prefix
if prefixNIL then
match TRUE
if key2 = NIL then
p TREE(i)
key key substract prefix
break

else
CHILDREN(p)[prefix] BRANCH(key1,value,key2,TREE(i))
DELETE CHILDREN(p)[KEY (i)]
return  T

end if

end if
if match = FALSE then
CHILDREN(p)[key] CREATE - LEAF(value)

end if

end for

end loop
return  T

In the above algorithm, LONGEST-COMMON-PREFIX function will find the longest common prefix of two given string, for example, string “bool” and “boy” has longest common prefix “bo”. BRANCH function will create a branch node and update keys accordingly.

##### Insertion of alphabetic Patricia in C++

in C++, to support implicit type conversion we utilize the KeyType and ValueType as parameter types. If we define Patricia¡std::string, std::string¿, we can directly provide char* parameters. the algorithm is implemented as the following.

template <class  K, class  V >
Patricia< K,  V >* insert(Patricia< K,  V >* t,
typename Patricia< K,  V>::KeyType key,
typename Patricia< K,  V>::ValueType value= V()){
if(!t)
t  =   new  Patricia< K,  V>();

Patricia< K,  V >* p  =  t;
typedef typename Patricia< K,  V>::Children::iterator Iterator;
for(;;){
bool match(false);
for(Iterator it  =  p ->children.begin(); it!=p ->children.end();  ++it){
K  k =it->first;
if(key  ==  k){
p ->value  =  value; //overwrite
return t;
}
K  prefix  =  lcp(key, k);
if(!prefix.empty()){
match = true
if(k.empty()){ //e.g. insert another into an
p  =  it->second;
break
}
else{
p ->children[prefix]  =  branch(key,  new  Patricia< K,  V >(value),
k, it->second);
p ->children.erase(it);
return t;
}
}
}
if(!match){
p ->children[key]  =   new  Patricia< K,  V >(value);
break
}
}
return t;
}

Where the lcp and branch functions are defined like this.

template <class  K >
K  lcp(K &  s1,  K &  s2){
typename  K::iterator it1(s1.begin()), it2(s2.begin());
for(; it1!=s1.end()  &&  it2!=s2.end()  &&  *it1  ==  *it2;  ++it1,  ++it2);
K  res(s1.begin(), it1);
s1  =   K(it1, s1.end());
s2  =   K(it2, s2.end());
return res;
}

template <class  T >
T * branch(typename  T::KeyType k1,  T * t1,
typename  T::KeyType k2,  T * t2){
if(k1.empty()){ //e.g. insert an into another
t1->children[k2]  =  t2;
return t1;
}
T * t  =   new   T();
t->children[k1]  =  t1;
t->children[k2]  =  t2;
return t;
}

Function lcp() will extract the longest common prefix and modify its parameters. Function branch() will create a new node and set the 2 nodes to be merged as the children. There is a special case, if the key of one node is sub-string of the other, it will chain them together.

We find the implementation of patricia_to_str() will be very same as trie_to_str(), so we can reuse it. Also the convert from a list of keys to trie can be reused

// list_to_trie
template <class Iterator, class  T >
T * list_to_trie(Iterator first, Iterator last,  T * t){
typedef typename  T::ValueType ValueType;
return std::accumulate(first, last, t,
std::ptr_fun(insert_key< T, ValueType>));
}

We put all of the helper function templates to a utility header file, and we can test patricia insertion program as below.

template <class Iterator>
void test_list_to_patricia(Iterator first, Iterator last){
typedef Patricia<std::string, std::string>  PatriciaType;
PatriciaType* t(0);
t  =  list_to_trie(first, last, t);
std::copy(first, last,
std::ostream_iterator<std::string>(std::cout, ”,_”));
std::cout<<\n = =><<trie_to_str(t)<<\n”;
delete t;
}

void test_insert(){
const char* lst1[]  =  {”a”, ”an”, ”another”, ”b”, ”bob”, ”bool”, ”home”}
test_list_to_patricia(lst1, lst1+sizeof(lst1)/sizeof(char*));

const char* lst2[]  =  {”home”, ”bool”, ”bob”, ”b”, ”another”, ”an”, ”a”}
test_list_to_patricia(lst2, lst2+sizeof(lst2)/sizeof(char*));

const char* lst3[]  =  {”romane”, ”romanus”, ”romulus”}
test_list_to_patricia(lst3, lst3+sizeof(lst3)/sizeof(char*));

typedef Patricia<std::string, std::string>  PatriciaType;
PatriciaType* t(0);
const char* keys[]  =  {”001”, ”100”, ”101”}
const char* vals[]  =  {”y”, ”x”, ”z”}
for(unsigned int i=0; i<sizeof(keys)/sizeof(char*);  ++i)
t  =  insert(t, std::string(keys[i]), std::string(vals[i]));
std::copy(keys, keys+sizeof(keys)/sizeof(char*),
std::ostream_iterator<std::string>(std::cout, ”,_”));
std::cout<< ” = =><<trie_to_str(t)<<\n”;
delete t;
}

Running test_insert() function will generate the following output.

 a, an, another, b, bob, bool, home,  ==>(, (a, (an, (another))), (b, (bo, (bob), (bool))), (home))  home, bool, bob, b, another, an, a,  ==>(, (a, (an, (another))), (b, (bo, (bob), (bool))), (home))  romane, romanus, romulus,  ==>(, (rom, (roman, (romane), (romanus)), (romulus)))  001, 100, 101, ==>(, (001:y), (10, (100:x), (101:z)))

##### Insertion of alphabetic Patrica Tree in Python

By translate the insertion algorithm into Python language, we can get a program as below.

def insert(t, key, value  =  None):
if t is None:
t  =  Patricia()

node  =  t
while(True):
match  =  False
for k, tr in node.children.items():
if key  ==  k:  #  just overwrite
node.value  =  value
return t
(prefix, k1, k2)=lcp(key, k)
if prefix != ””:
match  =  True
if k2  ==  ””:
#  example: insert another into an”, go on traversing
node  =  tr
key  =  k1
break
else# branch out a new leaf
node.children[prefix]  =  branch(k1, Patricia(value), k2, tr)
del node.children[k]
return t
if not match:  #  add a new leaf
node.children[key]  =  Patricia(value)
break
return t

Where the longest common prefix finding and branching functions are implemented as the following.

#  longest common prefix
#  returns (p, s1’, s2’), where p is lcp, s1’=s1- p, s2’=s2- p
def lcp(s1, s2):
j=0
while j<=len(s1) and j<=len(s2) and s1[0:j]==s2[0:j]:
j+=1
j-=1
return (s1[0:j], s1[j:], s2[j:])

def branch(key1, tree1, key2, tree2):
if key1  ==  ””:
# example: insert an into another
tree1.children[key2]  =  tree2
return tree1
t  =  Patricia()
t.children[key1]  =  tree1
t.children[key2]  =  tree2
return t

Function lcp check every characters of two strings are same one by one till it met a different one or either of the string finished.

In order to test the insertion program, some helper functions are provided.

def to_string(t):
return trie_to_str(t)

def list_to_patricia(l):
return from_list(l, insert)

def map_to_patricia( m):
return from_map(m, insert)

We can reuse the trie_to_str since their implementation are same. to_string function can turn a Patricia tree into string by traversing it in pre-order. list_to_patricia helps to convert a list of object into a Patricia tree by repeatedly insert every elements into the tree. While map_to_string does similar thing except it can convert a list of key-value pairs into a Patricia tree.

Then we can test the insertion program with below test cases.

class PatriciaTest:
#...
def test_insert(self):
print ”test_insert”
t  =  list_to_patricia([”a”, ”an”, ”another”, ”b”, ”bob”, ”bool”, ”home”])
print to_string(t)
t  =  list_to_patricia([”romane”, ”romanus”, ”romulus”])
print to_string(t)
t  =  map_to_patricia({”001”:’y’, ”100”:’x’, ”101”:’z’}
print to_string(t)
t  =  list_to_patricia([”home”, ”bool”, ”bob”, ”b”, ”another”, ”an”, ”a”]);
print to_string(t)

These test cases will output a series of result like this.

 (, (a, (an, (another))), (b, (bo, (bob), (bool))), (home))  (, (rom, (roman, (romane), (romanus)), (romulus)))  (, (001:y), (10, (100:x), (101:z)))  (, (a, (an, (another))), (b, (bo, (bob), (bool))), (home))

##### 5.2.2 Recursive insertion algorithm for alphabetic Patricia

The insertion can also be implemented recursively. When doing insertion, the program check all the children of the Patricia node, to see if there is a node can match the key. Match means they have common prefix. One special case is that the keys are same, the program just overwrite the value of that child. If there is no child can match the key, the program create a new leaf, and add it as a new child.

PATRICIA - INSERT(T,key,value)
if T = NIL then
T EmptyNode

end if
p FIND - MATCH(CHILDREN(T),key)
if p = NIL then

else if KEY (p) = key then
V ALUE(p) value

else
q BRANCH(CREATE - LEAF(key,value),p)
DELETE(CHILDREN(T),p)

end if
return  T

The recursion happens inside call to BRANCH. The longest common prefix of 2 nodes are extracted. If the key to be inserted is the sub-string of the node, we just chain them together; If the prefix of the node is the sub-string of the key, we recursively insert the rest of the key to the node. In other case, we create a new node with the common prefix and set its two children.

BRANCH(T1,T2)
prefix LONGEST - COMMON - PREFIX(T1,T2)
p EmptyNode
if prefix = KEY (T1) then
KEY (T2) KEY (T2) subtract prefix
p CREATE - LEAF(prefix,V ALUE(T1))

else if prefix = KEY (T2) then
KEY (T1) KEY (T1) subtract prefix
p PATRICIA - INSERT(T2,KEY (T1),V ALUE(T1))
KEY (p) prefix

else
KEY (T2) KEY (T2) subtract prefix
KEY (T1) KEY (T1) subtract prefix
KEY (p) prefix

end if
return  p

##### Insertion of alphabetic Patrica Tree in Haskell

By implementing the above algorithm in Recursive way, we can get a Haskell program of Patricia insertion.

insert :: Patricia a  ->  Key  ->  a  ->  Patricia a
insert t k x  =  Patricia (value t) (ins (children t) k x) where
ins []     k x  =  [(k, Patricia (Just x) [])]
ins (p:ps) k x
| (fst p)  ==  k
=  (k, Patricia (Just x) (children (snd p))):ps  --overwrite
| match (fst p) k
=  (branch k x (fst p) (snd p)):ps
| otherwise
=  p:(ins ps k x)

Function insert takes a Patricia tree, a key and a value. It will call an internal function ins to insert the data into the children of the tree. If the tree has no children, it simply create a leaf node and put it as the single child of the tree. In other case it will check each child to see if any one has common prefix with the key. There is a special case, the child has very same key, we can overwrite the data. If the child has common prefix is located, we branch out a new node.

A function match is provided to determine if two keys have common prefix as below.

match :: Key  ->  Key  ->  Bool
match [] _  =  False
match _ []  =  False

This function is straightforward, only if the first character of the two keys are identical, we say they have common prefix.

Branch out function and the longest common prefix function are implemented like the following.

branch :: Key  ->  a  ->  Key  ->  Patricia a  ->  (Key, Patricia a)
branch k1 x k2 t2
| k1  ==  k
--  ex: insert an into another
=  (k, Patricia (Just x) [(k2’, t2)])
| k2  ==  k
--  ex: insert another into an
=  (k, insert t2 k1’ x)
| otherwise  =  (k, Patricia Nothing [(k1’, leaf x), (k2’, t2)])
where
k  =  lcp k1 k2
k1’  =  drop (length k) k1
k2’  =  drop (length k) k2

lcp :: Key  ->  Key  ->  Key
lcp [] _  =  []
lcp _ []  =  []
lcp (x:xs) (y:ys)  =  if x == y then x:(lcp xs ys) else []

Function take a key, k1, a value, another key, k2, and a Patricia tree t2. It will first call lcp to get the longest common prefix, k, and the different part of the original key. If k is just k1, which means k1 is a sub-string of k2, we create a new Patricia node, put the new value in it, then set the left part of k2 and t2 as the single child of this new node. If k is just k2, which means k2 is a sub-string of k1, we recursively insert the update key and value to t2. Otherwise, we create a new node, along with the the longest common prefix. the new node has 2 children, one is t2, the other is a leaf node of the data to be inserted. Each of them are binding to updated keys.

In order to test the above program, we provided some helper functions.

fromList :: [(Key, a)]  ->  Patricia a
fromList xs  =  foldl ins empty xs where
ins t (k, v)  =  insert t k v

sort :: (Ord  a)=>[(a, b)]  ->  [(a, b)]
sort []  =  []
sort (p:ps)  =  sort xs  ++  [p]  ++  sort ys where
xs  =  [x | x <- ps, fst x  <=  fst p ]
ys  =  [y | y <- ps, fst y  >  fst p ]

toString :: (Show  a)=>Patricia a  ->  String
toString t  =  toStr t ”” where
toStr t prefix  =  ”(”  ++  prefix  ++  showMaybe (value t)  ++
(concat \$  map  (\(k, v)->”,_”  ++  toStr v (prefix++ k))
(sort \$children t))
++  ”)”
showMaybe Nothing  =  ””
showMaybe (Just x)  =  ”:”  ++  show x

Function fromList can recursively insert each key-value pair into a Patricia node. Function sort helps to sort a list of key-value pairs based on keys by using quick sort algorithm. toString turns a Patricia tree into a string by using modified pre-order.

After that, we can test our insert program with the following test cases.

testPatricia  =  ”t1= ”  ++  (toString t1)  ++  ”\n”  ++
”t2= ”  ++  (toString t2)
where
t1  =  fromList [(”a”, 1), (”an”, 2), (”another”, 7),
(”boy”, 3), (”bool”, 4), (”zoo”, 3)]
t2  =  fromList [(”zoo”, 3), (”bool”, 4), (”boy”, 3),
(”another”, 7), (”an”, 2), (”a”, 1)]

main  =  do
putStrLn testPatricia

No matter what’s the insert order, the 2 test cases output an identical result.

 t1=(, (a:1, (an:2, (another:7))), (bo, (bool:4), (boy:3)), (zoo:3))  t2=(, (a:1, (an:2, (another:7))), (bo, (bool:4), (boy:3)), (zoo:3))

##### Insertion of alphabetic Patrica Tree in Scheme/Lisp

In Scheme/Lisp, If the root doesn’t has child, we create a leaf node with the value, and bind the key with this node; if the key binding to one child is equal to the string we want to insert, we just overwrite the current value; If the key has common prefix of the string to be inserted, we branch out a new node.

(define (insert t k x)
(define (ins lst k x) ;; lst: [(key patrica)]
(if (null? lst) (list (make-child k (make-leaf x)))
(cond ((string=? (key (car lst)) k)
(cons (make-child k (make-trie x (children (tree (car lst)))))
(cdr lst)))
((match? (key (car lst)) k)
(cons (branch k x (key (car lst)) (tree (car lst)))
(cdr lst)))
(else (cons (car lst) (ins (cdr lst) k x))))))
(make-trie (value t) (ins (children t) k x)))

The match? function just test if two strings have common prefix.

(define (match? x y)
(and (not (or (string-null? x) (string-null? y)))
(string=? (string-car x) (string-car y))))

Function branch takes 4 parameters, the first key, the value to be inserted, the second key, and the Patricia tree need to be branch out. If will first find the longest common prefix of the two keys, If it is equal to the first key, it means that the first key is the prefix of the second key, we just create a new node with the value and chained the Patricia tree by setting it as the only one child of this new node; If the longest common prefix is equal to the second key, it means that the second key is the prefix of the first key, we just recursively insert the different part (the remove the prefix part) into this Patricia tree; In other case, we just create a branch node and set its two children, one is the leaf node with the value to be inserted, the other is the Patricia tree passed as the fourth parameter.

(define (branch k1 x k2 t2) ;; returns (key tree)
(let* ((k (lcp k1 k2))
(k1-new (string-tail k1 (string-length k)))
(k2-new (string-tail k2 (string-length k))))
(cond ((string=? k1 k) ;; e.g. insert an into another
(make-child k (make-trie x (list (make-child k2-new t2)))))
((string=? k2 k) ;; e.g. insert another into an
(make-child k (insert t2 k1-new x)))
(else (make-child k (make-trie
’()
(list (make-child k1-new (make-leaf x))
(make-child k2-new t2))))))))

Where the longest common prefix is extracted as the following.

(define (lcp x y)
(let ((len (string-match-forward x y)))

We can reuse the list-¿trie and trie-¿string functions to test our program.

(define (test-patricia)
(define t (list->trie (list ’(”a” 1) ’(”an” 2) ’(”another” 7)
’(”boy” 3) ’(”bool” 4) ’(”zoo” 3))))
(define t2 (list->trie (list ’(”zoo” 3) ’(”bool” 4) ’(”boy” 3)
’(”another” 7) ’(”an” 2) ’(”a” 1))))
(display (trie->string t)) (newline)
(display (trie->string t2)) (newline))

Evaluate this function will print t and t2 as below.

(test-patricia)
(., (a1, (an2, (another7))), (bo., (bool4), (boy3)), (zoo3))
(., (a1, (an2, (another7))), (bo., (bool4), (boy3)), (zoo3))

#### 5.3 Look up in alphabetic Patricia tree

Different with Trie, we can’t take each character from the key to look up. We need check each child to see if it is a prefix of the key to be found. If there is such a child, we then remove the prefix from the key, and search this updated key in that child. if we can’t find any children as a prefix of the key, it means the looking up failed.

##### 5.3.1 Iterative look up algorithm for alphabetic Patricia tree

This algorithm can be described in pseudo code as below.

PATRICIA - LOOK - UP(T,key)
if T = NIL then
return  NIL

end if
repeat
match FALSE
for each i in CHILDREN(Tdo
if key = KEY (ithen
return  V ALUE(TREE(i))

end if
if KEY (i) IS-PREFIX-OF key then
match TRUE
key key subtract KEY (i)
T TREE(i)
break

end if

end for

until match = FALSE
return  NIL

##### Look up in alphabetic Patrica Tree in C++

In C++, we abstract the key type as a template parameter. By refer to the KeyType defined in Patricia, we can get the support of implicitly type conversion. If we can’t find the key in a Patricia tree, the below program returns default value of data type. One alternative is to throw exception.

template <class  K, class  V >
V  lookup(Patricia< K,  V >* t, typename Patricia< K,  V>::KeyType key){
typedef typename Patricia< K,  V>::Children::iterator Iterator;
if(!t)
return  V(); //or throw exception
for(;;){
bool match(false);
for(Iterator it= t->children.begin(); it!=t->children.end();  ++it){
K  k  =  it->first;
if(key  ==  k)
return it-> second->value;
K  prefix  =  lcp(key, k);
if((!prefix.empty())  &&  k.empty()){
match  =  true
t  =  it->second;
break
}
}
if(!match)
return  V(); //or throw exception
}
}

To verify the look up program, we test it with the following simple test cases.

Patricia<std::string, int>* t(0);
const char* keys[]  =  {”a”, ”an”, ”another”, ”boy”, ”bool”, ”home”}
const int vals[]  =  {1, 2, 7, 3, 4, 4}
for(unsigned int i=0; i<sizeof(keys)/sizeof(char*);  ++i)
t  =  insert(t, keys[i], vals[i]);
std::cout<<\nlookup_another:_” <<lookup(t, ”another”)
<<\nlookup_boo:_” <<lookup(t, ”boo”)
<<\nlookup_boy:_” <<lookup(t, ”boy”)
<<\nlookup_by:_” <<lookup(t, ”by”)
<<\nlookup_boolean:_” <<lookup(t, ”boolean”)<<\n”;
delete t;

This program will output the the result like below.

 lookup another: 7  lookup boo: 0  lookup boy: 3  lookup by: 0  lookup boolean: 0

##### Look up in alphabetic Patricia Tree in Python

The implementation of looking up in Python is similar to the pseudo code. Because Python don’t support repeat-until loop directly, a while loop is used instead.

def lookup(t, key):
if t is None:
return None
while(True):
match  =  False
for k, tr in t.children.items():
if k  ==  key:
return tr.value
(prefix, k1, k2)  =  lcp(key, k)
if prefix != ”” and k2  ==  ””:
match  =  True
key  =  k1
t  =  tr
break
if not match:
return None

We can verify the looking up program as below.

class PatriciaTest:
#  ...
def test_lookup(self):
t  =  map_to_patricia({”a”:1, ”an”:2, ”another”:7, ”b”:1, ”bob”:3, \
”bool”:4, ”home”:4}
print ”search_t_another”, lookup(t, ”another”)
print ”search_t_boo”, lookup(t, ”boo”)
print ”search_t_bob”, lookup(t, ”bob”)
print ”search_t_boolean”, lookup(t, ”boolean”)

The test result output in console is like the following.

 search t another 7  search t boo None  search t bob 3  search t boolean None

##### 5.3.2 Recursive look up algorithm for alphabetic Patricia tree

To implement the look up recursively, we just look up among the children of the Patricia tree.

PATRICIA - LOOK - UP(T,key)
if T = NIL then
return  NIL

else
return  FIND - IN - CHILDREN(CHILDREN(T),key)

end if

The real recursion happens in FIND-IN-CHILDREN call, we pass the children list as parameter. If it is not empty, we take first child, and check if the prefix of this child is equal to the key; the value of this child will be returned if they are same; if the prefix of the child is just a prefix of the key, we recursively find in this child with updated key.

FIND - IN - CHILDREN(l,key)
if l = NIL then
return  NIL

else if KEY (FIRST(l)) = key then
return  V ALUE(FIRST(l))

else if KEY (FIRST(l)) is prefix of key then
key key subtract KEY (FIRST(l))
return  PATRICIA - LOOK - UP(FIRST(l),key)

else
return  FIND - IN - CHILDREN(REST(l),key)

end if

##### Look up in alphabetic Patrica Tree in Haskell

In Haskell implementation, the above algorithm should be turned into recursive way.

--  lookup
import qualified Data.List

find :: Patricia a  ->  Key  ->   Maybe  a
find t k  =  find’ (children t) k where
find’ [] _  =  Nothing
find’ (p:ps) k
| (fst p)  ==  k  =  value (snd p)
| (fst p) ‘Data.List.isPrefixOf‘ k  =  find (snd p) (diff (fst p) k)
| otherwise  =  find’ ps k
diff k1 k2  =  drop (length (lcp k1 k2)) k2

When we search a given key in a Patricia tree, we recursively check each of the child. If there are no children at all, we stop the recursion and indicate a look up failure. In other case, we pick the prefix-node pair one by one. If the prefix is as same as the given key, it means the target node is found and the value of the node is returned. If the key has common prefix with the child, the key will be updated by removing the longest common prefix and we performs looking up recursively.

We can verify the above Haskell program with the following simple cases.

testPatricia  =  ”t1= ”  ++  (toString t1)  ++  ”\n”  ++
”find_t1_another_ = ”  ++  (show (find t1 ”another”))  ++  ”\n”  ++
”find_t1_bo_ =_”  ++  (show (find t1 ”bo”))  ++  ”\n”  ++
”find_t1_boy_ =_”  ++  (show (find t1 ”boy”))  ++  ”\n”  ++
”find_t1_boolean_ =_”  ++  (show (find t1 ”boolean”))
where
t1  =  fromList [(”a”, 1), (”an”, 2), (”another”, 7), (”boy”, 3),
(”bool”, 4), (”zoo”, 3)]

main  =  do
putStrLn testPatricia

The output is as below.

 t1=(, (a:1, (an:2, (another:7))), (bo, (bool:4), (boy:3)), (zoo:3))  find t1 another =Just 7  find t1 bo = Nothing  find t1 boy = Just 3  find t1 boolean = Nothing

##### Look up in alphabetic Patricia Tree in Scheme/Lisp

The Scheme/Lisp program is given as the following. The function delegate the looking up to an inner function which will check each child to see if the key binding to the child match the string we are looking for.

(define (lookup t k)
(define (find lst k) ;; lst, [(k patricia)]
(if (null? lst) ’()
(cond ((string=? (key (car lst)) k) (value (tree (car lst))))
((string-prefix? (key (car lst)) k)
(lookup (tree (car lst))
(string-tail k (string-length (key (car lst))))))
(else (find (cdr lst) k)))))
(find (children t) k))

In order to verify this program, some simple test cases are given to search in the Patricia we created in previous section.

(define (test-patricia)
(define t (list->trie (list ’(”a” 1) ’(”an” 2) ’(”another” 7)
’(”boy” 3) ’(”bool” 4) ’(”zoo” 3))))
(display (trie->string t)) (newline)
(display ”lookup_another:_”) (display (lookup t ”another”)) (newline)
(display ”lookup_bo:_”) (display (lookup t ”bo”)) (newline)
(display ”lookup_boy:_”) (display (lookup t ”boy”)) (newline)
(display ”lookup_by:_”) (display (lookup t ”by”)) (newline)
(display ”lookup_boolean:_”) (display (lookup t ”boolean”)) (newline))

This program will output the same result as the Haskell one.

(test-patricia)
(., (a1, (an2, (another7))), (bo., (bool4), (boy3)), (zoo3))
lookup another: 7
lookup bo: ()
lookup boy: 3
lookup by: ()
lookup boolean: ()

### 6 Trie and Patricia used in Industry

Trie and Patricia are widely used in software industry. Integer based Patricia tree is widely used in compiler. Some daily used software has very interesting features can be realized with Trie and Patricia. In the following sections, I’ll list some of them, including, e-dictionary, word auto-completion, t9 input method etc. The commercial implementation typically doesn’t adopt Trie or Patricia directly. However, Trie and Patricia can be shown as a kind of example realization.

#### 6.1 e-dictionary and word auto-completion

Figure 12 shows a screen shot of an English-Chinese dictionary. In order to provide good user experience, when user input something, the dictionary will search its word library, and list all candidate words and phrases similar to what user have entered.
 Figure 12: e-dictionary. All candidates starting with what user input are listed.

Typically such dictionary contains hundreds of thousands words, performs a whole word search is expensive. Commercial software adopts complex approach, including caching, indexing etc to speed up this process.

Similar with e-dictionary, figure 13 shows a popular Internet search engine, when user input something, it will provide a candidate lists, with all items start with what user has entered. And these candidates are shown in an order of popularity. The more people search for a word, the upper position it will be shown in the list.
 Figure 13: Search engine. All candidates key words starting with what user input are listed.

In both case, we say the software provide a kind of word auto-completion support. In some modern IDE, the editor can even helps user to auto-complete programmings.

In this section, I’ll show a very simple implementation of e-dictionary with Trie and Patricia. To simplify the problem, let us assume the dictionary only support English - English information.

Typically, a dictionary contains a lot of key-value pairs, the keys are English words or phrases, and the relative values are the meaning of the words.

We can store all words and their meanings to a Trie, the drawback for this approach is that it isn’t space effective. We’ll use Patricia as alternative later on.

As an example, when user want to look up ’a’, the dictionary does not only return the meaning of then English word ’a’, but also provide a list of candidate words, which are all start with ’a’, including ’abandon’, ’about’, ’accent’, ’adam’, ... Of course all these words are stored in Trie.

If there are too many candidates, one solution is only display the top 10 words for the user, and if he like, he can browse more.

Below pseudo code reuse the looking up program in previous sections and Expand all potential top N candidates.

TRIE - LOOK - UP - TOP - N(T,key,N)
p TRIE - LOOK - UP(T,key)
return  EXPAND - TOP - N(p,key,N)

Note that we should modify the TRIE-LOOK-UP a bit, instead of return the value of the node, TRIE-LOOK-UP’ returns the node itself.

Another alternative is to use Patricia instead of Trie. It can save much spaces.

##### 6.1.1 Iterative algorithm of search top N candidate in Patricia

The algorithm is similar to the Patricia look up one, but when we found a node which key start from the string we are looking for, we expand all its children until we get N candidates.

PATRICIA - LOOK - UP - TOP - N(T,key,N)
if T = NIL then
return  NIL

end if
prefix NIL
repeat
match FALSE
for each i in CHILDREN(Tdo
if key IS-PREFIX-OF KEY (ithen
return  EXPAND - TOP - N(TREE(i),prefix,N)

end if
if KEY (i) IS-PREFIX-OF key then
match TRUE
key key subtract KEY (i)
T TREE(i)
prefix prefix + KEY (i)
break

end if

end for

until match = FALSE
return  NIL

##### An e-dictionary in Python

In Python implementation, a function trie_lookup is provided to perform search all top N candidate started with a given string.

def trie_lookup(t, key, n):
if t is None:
return None

p  =  t
for c in key:
if not c in p.children:
return None
p = p.children[c]
return expand(key, p, n)

def expand(prefix, t, n):
res  =  []
q  =  [(prefix, t)]
while len(res)<and len(q)>0:
(s, p)  =  q.pop(0)
if p.value is not None:
res.append((s, p.value))
for k, tr in p.children.items():
q.append((s+ k, tr))
return res

Compare with the Trie look up function, the first part of this program is almost same. The difference part is after we successfully located the node which matches the key, all sub trees are expanded from this node in a bread-first search manner, and the top n candidates are returned.

This program can be verified by below simple test cases.

class LookupTest:
def __init__(self):
dict  =  {”a”:”the_first_letter_of_English”, \
self.tt  =  trie.map_to_trie(dict)

def run(self):
self.test_trie_lookup()

def test_trie_lookup(self):
print ”test_lookup_top_5”
print ”search_a_”, trie_lookup(self.tt, ”a”, 5)
print ”search_ab_”, trie_lookup(self.tt, ”ab”, 5)

The test will output the following result.

 test lookup to 5  search a  [(’a’, ’the first letter of English’), (’an’, "used instead of ’a’  when the following word begins with a vowel sound"), (’adam’, ’a character in  the Bible who was the first man made by God’), (’about’, ’on the subject of;  connected with’), (’abandon’, ’to leave a place, thing or person forever’)]  search ab  [(’about’, ’on the subject of; connected with’), (’abandon’, ’to  leave a place, thing or person forever’)]

To save the spaces, we can also implement such a dictionary search by using Patricia.

def patricia_lookup(t, key, n):
if t is None:
return None
prefix  =  ””
while(True):
match  =  False
for k, tr in t.children.items():
if string.find(k, key)  ==  0:  #is prefix of
return expand(prefix+ k, tr, n)
if string.find(key, k) ==0:
match  =  True
key  =  key[len(k):]
t  =  tr
prefix  +=  k
break
if not match:
return None

In this program, we called Python string class to test if a string x is prefix of a string y. In case we locate a node with the key we are looking up is either equal of as prefix of the this sub tree, we expand it till we find n candidates. Function expand() can be reused here.

We can test this program with the very same test cases and the results are identical to the previous one.

##### An e-dictionary in C++

In C++ implementation, we overload the look up function by providing an extra integer n to indicate we want to search top n candidates. the result is a list of key-value pairs,

//lookup top n candidate with prefix key in Trie
template <class  K, class  V >
std::list<std::pair< K,  V >   >  lookup(Trie< K,  V >* t,
typename Trie< K,  V>::KeyType key,
unsigned int n)
{
typedef std::list<std::pair< K,  V >   >  Result;
if(!t)
return Result();

Trie< K,  V >* p(t);
for(typename  K::iterator it= key.begin(); it!=key.end();  ++it){
if(p ->children.find(*it)  ==  p ->children.end())
return Result();
p  =  p ->children[*it];
}
return expand(key, p, n);
}

The program is almost same as the Trie looking up one, except it will call expand function when it located the node with the key. Function expand is as the following.

template <class  T >
std::list<std::pair< typename  T::KeyType, typename  T::ValueType>   >
expand(typename  T::KeyType prefix,  T * t, unsigned int n)
{
typedef typename  T::KeyType KeyType;
typedef typename  T::ValueType ValueType;
typedef std::list<std::pair< KeyType, ValueType>   >  Result;

Result res;
std::queue<std::pair< KeyType,  T *>  >  q;
q.push(std::make_pair(prefix, t));
while(res.size()<n  &&  (!q.empty())){
std::pair< KeyType,  T *> i  =  q.front();
KeyType s  =  i.first;
T * p  =  i.second;
q.pop();
if(p ->value != ValueType()){
res.push_back(std::make_pair(s, p ->value));
}
for(typename  T::Children::iterator it  =  p ->children.begin();
it!=p ->children.end();  ++it)
q.push(std::make_pair(s+it->first, it->second));
}
return res;
}

This function use a bread-first search approach to expand top N candidates, it maintain a queue to store the node it is currently dealing with. Each time the program picks a candidate node from the queue, expands all its children and put them to the queue. the program will terminate when the queue is empty or we have already found N candidates.

Function expand is generic we’ll use it in later sections.

Then we can provide a helper function to convert the candidate list to readable string. Note that this list is actually a list of pairs so we can provide a generic function.

//list of pairs to string
template <class Container>
std::string lop_to_str(Container coll){
typedef typename Container::iterator Iterator;
std::ostringstream s;
s<< ”[”;
for(Iterator it=coll.begin(); it!=coll.end();  ++it)
s<< ”(” <<it->first<< ”,_” <<it-> second<< ”),_”;
s<< ”]”;
return s.str();
}

After that, we can test the program with some simple test cases.

Trie<std::string, std::string>* t(0);
const char* dict[]  =  {
”a”, ”the_first_letter_of_English”, \
”another”, ”one_more_person_or_thing_or_an_extra_amount”, \
”abandon”, ”to_leave_a_place,_thing_or_person_forever”, \
”boy”, ”a_male_child_or,_more_generally,_a_male_of_any_age”, \
”body”, ”the_whole_physical_structure_that_forms_a_person_or_animal”, \
”zoo”, ”an_area_in_which_animals,_especially_wild_animals,_are_kept” \
”so_that_people_can_go_and_look_at_them,_or_study_them”}

const char** first=dict;
const char** last  =dict  +  sizeof(dict)/sizeof(char*);
for(;first!=last;  ++first,  ++first)
t  =  insert(t, *first, *(first+1));
}

std::cout<< ”test_lookup_top_5_in_Trie\n”
<< ”search_a_” <<lop_to_str(lookup(t, ”a”, 5))<<\n”
<< ”search_ab_” <<lop_to_str(lookup(t, ”ab”, 5))<<\n”;
delete t;

The result print to the console is something like this:

 test lookup top 5 in Trie  search a [(a, the first letter of English), (an, used instead of ’a’  when the following word begins with a vowel sound), (adam, a character  in the Bible who was the first man made by God), (about, on the  subject of; connected with), (abandon, to leave a place, thing or  person forever), ]  search ab [(about, on the subject of; connected with), (abandon, to  leave a place, thing or person forever), ]

To save the the space with Patricia, we provide a C++ program to search top N candidate as below.

template <class  K, class  V >
std::list<std::pair< K,  V >   >  lookup(Patricia< K,  V >* t,
typename Patricia< K,  V>::KeyType key,
unsigned int n)
{
typedef typename std::list<std::pair< K,  V >   >  Result;
typedef typename Patricia< K,  V>::Children::iterator Iterator;
if(!t)
return Result();
K  prefix;
for(;;){
bool match(false);
for(Iterator it= t->children.begin(); it!=t->children.end();  ++it){
K  k(it->first);
if(is_prefix_of(key, k))
return expand(prefix+ k, it-> second, n);
if(is_prefix_of(k, key)){
match  =  true
prefix  +=  k;
lcp< K >(key, k); //update key
t  =  it->second;
break
}
}
if(!match)
return Result();
}
}

The program iterate all children if the string we are looked up is prefix of one child, we expand this child to find top N candidates; If the in the opposite case, we update the string and go on examine into this child Patricia tree.

Where the function is_prefix_of() is defined as below.

// x is prefix of y?
template <class  T >
bool is_prefix_of(T  x,  T  y){
if(x.size()<=y.size())
return std::equal(x.begin(), x.end(), y.begin());
return false
}
\end{lstlisitng}

We  use STL equal function to check if x is prefix of y.

The test case is nearly same as the one in Trie.

\begin{lstlisting}
Patricia<std::string, std::string>* t(0);
const char* dict[]  =  {
”a”, ”the_first_letter_of_English”, \
”another”, ”one_more_person_or_thing_or_an_extra_amount”, \
”abandon”, ”to_leave_a_place,_thing_or_person_forever”, \
”boy”, ”a_male_child_or,_more_generally,_a_male_of_any_age”, \
”body”, ”the_whole_physical_structure_that_forms_a_person_or_animal”, \
”zoo”, ”an_area_in_which_animals,_especially_wild_animals,_are_kept” \
”so_that_people_can_go_and_look_at_them,_or_study_them”}

const char** first=dict;
const char** last  =dict  +  sizeof(dict)/sizeof(char*);
for(;first!=last;  ++first,  ++first)
t  =  insert(t, *first, *(first+1));
}

std::cout<< ”test_lookup_top_5_in_Trie\n”
<< ”search_a_” <<lop_to_str(lookup(t, ”a”, 5))<<\n”
<< ”search_ab_” <<lop_to_str(lookup(t, ”ab”, 5))<<\n”;
delete t;

This test case will output a very same result in console.

##### 6.1.2 Recursive algorithm of search top N candidate in Patricia

This algorithm can also be implemented recursively, if the string we are looking for is empty, we expand all children until we get N candidates. else we recursively examine the children of the node to see if we can find one has prefix as this string.

PATRICIA - LOOK - UP - TOP - N(T,key,N)
if T = NIL then
return  NIL

end if
if KEY = NIL then
return  EXPAND - TOP - N(T,NIL,N)

else
return  FIND - IN - CHILDREN - TOP - N(CHILDREN(T),key,N)

end if
FIND - IN - CHILDREN - TOP - N(l,key,N)
if l = NIL then
return  NIL

else if KEY (FIRST(l)) = key then
return  EXPAND - TOP - N(FIRST(l),key,N)

else if KEY (FIRST(l) is prefix of key then
return  PATRICIA-LOOK-UP -TOP -N(FIRST(l),key subtract KEY (FIRST(l)))

else if key is prefix of KEY (FIRST(lthen
return  PATRICIA - LOOK - UP - TOP - N(FIRST(l),NIL,N)

else
return  FIND - IN - CHILDREN - TOP - N(REST(l),key,N)

end if

In Haskell implementation, we provide a function named as findAll. Thanks for the lazy evaluation support, findAll won’t produce all candidates words until we need them. we can use something like ’take 10 findAll’ to get the top 10 words easily.

findAll is given as the following.

findAll:: Trie a  ->  String  ->  [(String, a)]
findAll t []  =
case value t of
Nothing  ->  enum (children t)
Just x   ->  (””, x):(enum (children t))
where
enum []  =  []
enum (p:ps)  =  (mapAppend (fst p) (findAll (snd p) []))  ++  (enum ps)
findAll t (k:ks)  =
case lookup k (children t) of
Nothing  ->  []
Just t’  ->  mapAppend k (findAll t’ ks)

mapAppend x lst  =   map  (\p->(x:(fst p), (snd p))) lst

function findAll take a Trie, a word to be looked up, it will output a list of pairs, the first element of the pair is the candidate word, the second element of the pair is the meaning of the word.

Compare with the find function of Trie, the none-trivial case is very similar. We take a letter form the words to be looked up, if there is no child starting with this letter, the program returns empty list. If there is such a child starting with this letter, this child should be a candidate. We use function mapAppend to add this letter in front of all elements of recursively founded candidate words.

In case we consumed all letters, we next returns all potential words, which means the program will traverse all children of the current node.

Note that only the node with value field not equal to ’None’ is a meaningful word in our dictionary. We need append the list with the right meaning.

With this function, we can construct a very simple dictionary and return top 5 candidate to user. Here is the test program.

testFindAll  =  ”\nlook_up_a:_”  ++  (show \$ take 5 \$findAll t ”a”)  ++
”\nlook_up_ab:_”  ++  (show \$ take 5 \$findAll t ”ab”)
where
t  =  fromList [
(”a”, ”the_first_letter_of_English”),
”a_vowel_sound”),
(”another”, ”one_more_person_or_thing_or_an_extra_amount”),
(”abandon”, ”to_leave_a_place,_thing_or_person_forever”),
(”boy”, ”a_male_child_or,_more_generally,_a_male_of_any_age”),
(”body”, ”the_whole_physical_structure_that_forms_a_person_or_animal”),
(”zoo”, ”an_area_in_which_animals,_especially_wild_animals,_are_kept”
”_so_that_people_can_go_and_look_at_them,_or_study_them”)]

main  =  do
putStrLn testFindAll

This program will out put a result like this:

 look up a: [("a","the first letter of English"),("an","used instead of ’a’  when the following word begins with a vowel sound"),("another","one more  person or thing or an extra amount"),("abandon","to leave a place, thing  or person forever"),("about","on the subject of; connected with")]  look up ab: [("abandon","to leave a place, thing or person forever"),  ("about","on the subject of; connected with")]

The Trie solution wasts a lot of spaces. It is very easy to improve the above program with Patricia. Below source code shows the Patricia approach.

findAll’ :: Patricia a  ->  Key  ->  [(Key, a)]
findAll’ t []  =
case value t of
Nothing  ->  enum \$ children t
Just x   ->  (””, x):(enum \$ children t)
where
enum []  =  []
enum (p:ps)  =  (mapAppend’ (fst p) (findAll’ (snd p) []))  ++  (enum ps)
findAll’ t k  =  find’ (children t) k where
find’ [] _  =  []
find’ (p:ps) k
| (fst p)  ==  k
=  mapAppend’ k (findAll’ (snd p) [])
| (fst p) ‘Data.List.isPrefixOf‘ k
=  mapAppend’ (fst p) (findAll’ (snd p) (k ‘diff‘ (fst p)))
| k ‘Data.List.isPrefixOf‘ (fst p)
=  findAll’ (snd p) []
| otherwise  =  find’ ps k
diff x y  =  drop (length y) x

mapAppend’ s lst  =   map  (\p->(s++(fst p), snd p)) lst

If compare this program with the one implemented by Trie, we can find they are very similar to each other. In none-trivial case, we just examine each child to see if any one match the key to be looked up. If one child is exactly equal to the key, we then expand all its sub branches and put them to the candidate list. If the child correspond to a prefix of the key, the program goes on find the the rest part of the key along this child and concatenate this prefix to all later results. If the current key is prefix to a child, the program will traverse this child and return all its sub branches as candidate list.

This program can be tested with the very same case as above, and it will output the same result.

##### An e-dictionary in Scheme/Lisp

In Scheme/Lisp implementation with Trie, a function named find is used to search all candidates start with a given string. If the string is empty, the program will enumerate all sub trees as result; else the program calls an inner function find-child to search a child which matches the first character of the given string. Then the program recursively apply the find function to this child with the rest characters of the string to be searched.

(define (find t k)
(define (find-child lst k)
(tree (find-matching-item lst (lambda (c) (string=? (key c) k)))))
(if (string-null? k)
(enumerate t)
(let ((t-new (find-child (children t) (string-car k))))
(if (null? t-new) ’()
(map-string-append (string-car k) (find t-new (string-cdr k)))))))

Note that the map-string-append will insert the first character to all the elements (more accurately, each element is a pair with a key and a value, map-string-append insert the character in front of each key) in the result returned by recursive call. It is defined like this.

(define (map-string-append x lst) ;; lst: [(key value)]
(map  (lambda (p) (cons (string-append x (car p)) (cdr p))) lst))

The enumerate function which can expend all sub trees is implemented as the following.

(define (enumerate t) ;; enumerate all sub trees
(if (null? t) ’()
(let ((res (append-map
(lambda (p)(map-string-append (key p)(enumerate (tree p))))
(children t))))
(if (null? (value t)) res
(cons (cons ”” (value t)) res)))))

The test case is a very simple list of word-meaning pairs.

(define dict
(list ’(”a” ”the_first_letter_of_English”)
’(”another” ”one_more_person_or_thing_or_an_extra_amount”)
’(”abandon” ”to_leave_a_place,_thing_or_person_forever”)
’(”boy” ”a_male_child_or,_more_generally,_a_male_of_any_age”)
’(”body” ”the_whole_physical_structure_that_forms_a_person_or_animal”)
’(”zoo” ”an_area_in_which_animals,_especially_wild_animals,

__are_kept_so_that_people_can_go_and_look_at_them,_or_study_them”)))

After feed this dict to a Trie, if user tries to find ’a*’ or ’ab*’ like below.

(define (test-trie-find-all)
(define t (list->trie dict))
(display ”find_a*:_”) (display (find t ”a”)) (newline)
(display ”find_ab*:_”) (display (find t ”ab”)) (newline))

The result is a list with all candidates start with the given string.

 (test-trie-find-all)  find a*: ((a . the first letter of English) (an . used instead of ’a’  when the following word begins with a vowel sound) (another . one more  person or thing or an extra amount) (abandon . to leave a place, thing  or person forever) (about . on the subject of; connected with) (adam  . a character in the Bible who was the first man made by God))  find ab*: ((abandon . to leave a place, thing or person forever)  (about . on the subject of; connected with))

Trie approach isn’t space effective. Patricia can be one alternative to improve in terms of space.

We can fully reuse the function enumerate, map-string-append which are defined for trie. the find function for Patricia is implemented as the following.

(define (find t k)
(define (find-child lst k)
(if (null? lst) ’()
(cond ((string=? (key (car lst)) k)
(map-string-append k (enumerate (tree (car lst)))))
((string-prefix? (key (car lst)) k)
(let ((k-new (string-tail k (string-length (key (car lst))))))
(map-string-append (key (car lst)) (find (tree (car lst)) k-new))))
((string-prefix? k (key (car lst))) (enumerate (tree (car lst))))
(else (find-child (cdr lst) k)))))
(if (string-null? k)
(enumerate t)
(find-child (children t) k)))

If the same test cases of search all candidates of ’a*’ and ’ab*’ are fed we can get a very same result.

#### 6.2 T9 input method

Most mobile phones around year 2000 has a key pad. To edit a short message/email with such key-pad, users typically have quite different experience from PC. Because a mobile-phone key pad, or so called ITU-T key pad has few keys. Figure fig:itut-keypad shows an example.
 Figure 14: an ITU-T keypad for mobile phone.

There are typical 2 methods to input an English word/phrase with ITU-T key pad. For instance, if user wants to enter a word “home”, He can press the key in below sequence.

• Press key ’4’ twice to enter the letter ’h’;
• Press key ’6’ three times to enter the letter ’o’;
• Press key ’6’ twice to enter the letter ’m’;
• Press key ’3’ twice to enter the letter ’e’;

Another high efficient way is to simplify the key press sequence like the following.

• Press key ’4’, ’6’, ’6’, ’3’, word “home” appears on top of the candidate list;
• Press key ’*’ to change a candidate word, so word “good” appears;
• Press key ’*’ again to change another candidate word, next word “gone” appears;
• ...

Compare these 2 method, we can see method 2 is much easier for the end user, and it is operation efficient. The only overhead is to store a candidate words dictionary.

Method 2 is called as T9 input method, or predictive input method [6], [7]. The abbreviation ’T9’ stands for ’textonym’. In this section, I’ll show an example implementation of T9 by using Trie and Patricia.

In order to provide candidate words to user, a dictionary must be prepared in advance. Trie or Patricia can be used to store the Dictionary. In the real commercial software, complex indexing dictionary is used. We show the very simple Trie and Patricia only for illustration purpose.

##### 6.2.1 Iterative algorithm of T9 looking up

Below pseudo code shows how to realize T9 with Trie.

TRIE - LOOK - UP - T9(T,key)
PUSH - BACK(Q,NIL,key,T)
r NIL
while Q is not empty do
p,k,t POP - FRONT(Q)
i FIRST - LETTER(k)
for each c in T9 - MAPPING(ido
if c is in CHILDREN(tthen
k′← k subtract i
if kis empty then
APPEND(r,p + c)

else
PUSH - BACK(Q,p + c,k,CHILDREN(t)[c])

end if

end if

end for

end while
return  r

This is actually a bread-first search program. It utilizes a queue to store the current node and key string we are examining. The algorithm takes the first digit from the key, looks up it in T9 mapping to get all English letters corresponding to this digit. For each letter, if it can be found in the children of current node, the node along with the English string found so far are push back to the queue. In case all digits are examined, a candidate is found. We’ll append this candidate to the result list. The loop will terminate when the queue is empty.

Since Trie is not space effective, minor modification of the above program can work with Patricia, which can help to save extra spaces.

PATRICIA - LOOK - UP - T9(T,key)
PUSH - BACK(Q,NIL,key,T)
r NIL
while Q is not empty do
p,k,t POP - FRONT(Q)
for each child in CHILDREN(tdo
k′← CONV ERT - T9(KEY (child))
if kIS-PREFIX-OF k then
if k= k then
APPEND(r,p + KEY (child))

else
PUSH - BACK(Q,p + KEY (child),k - k,child)

end if

end if

end for

end while
return  r

##### T9 implementation in Python

In Python implementation, T9 looking up is realized in a typical bread-first search algorithm as the following.

T9MAP ={’2’:”abc”, ’3’:”def”, ’4’:”ghi”, ’5’:”jkl”, \
’6’:”mno”, ’7’:”pqrs”, ’8’:”tuv”, ’9’:”wxyz”}

def trie_lookup_t9(t, key):
if t is None or key  ==  ””:
return None
q  =  [(””, key, t)]
res  =  []
while len(q)>0:
(prefix, k, t)  =  q.pop(0)
i= k[0]
if not i in  T9MAP:
return None  #invalid input
for c in  T9MAP[i]:
if c in t.children:
if k[1:]==””:
res.append((prefix+ c, t.children[c].value))
else
q.append((prefix+ c, k[1:], t.children[c]))
return res

Function trie_lookup_t9 check if the parameters are valid first. Then it push the initial data into a queue. The program repeatedly pop the item from the queue, including what node it will examine next, the number sequence string, and the alphabetic string it has been searched.

For each popped item, the program takes the next digit from the number sequence, and looks up in T9 map to find the corresponding English letters. With all these letters, if they can be found in the children of the current node, we’ll push this child along with the updated number sequence string and updated alphabetic string into the queue. In case we process all numbers, we find a candidate result.

We can verify the above program with the following test cases.

class LookupTest:
def __init__(self):
t9dict  =  [”home”, ”good”, ”gone”, ”hood”, ”a”, ”another”, ”an”]
self.t9t  =  trie.list_to_trie(t9dict)

def test_trie_t9(self):
print ”search_4_”, trie_lookup_t9(self.t9t, ”4”)
print ”search_46_”, trie_lookup_t9(self.t9t, ”46”)
print ”search_4663_”, trie_lookup_t9(self.t9t, ”4663”)
print ”search_2_”, trie_lookup_t9(self.t9t, ”2”)
print ”search_22_”, trie_lookup_t9(self.t9t, ”22”)

If we run the test, it will output a very same result as the above Haskell program.

 search 4  [(’g’, None), (’h’, None)]  search 46  [(’go’, None), (’ho’, None)]  search 4663  [(’gone’, None), (’good’, None), (’home’, None), (’hood’, None)]  search 2  [(’a’, None)]  search 22  []

To save the spaces, Patricia can be used instead of Trie.

def patricia_lookup_t9(t, key):
if t is None or key  ==  ””:
return None
q  =  [(””, key, t)]
res  =  []
while len(q)>0:
(prefix, key, t)  =  q.pop(0)
for k, tr in t.children.items():
digits  =  toT9(k)
if string.find(key, digits)==0:  #is prefix of
if key  ==  digits:
res.append((prefix+ k, tr.value))
else
q.append((prefix+ k, key[len(k):], tr))
return res

Compare to the implementation with Trie, they are very similar. We also used a bread-first search approach. The different part is that we convert the string of each child to number sequence string according to T9 mapping. if it is prefix of the key we are looking for, we push this child along with updated key and prefix. In case we examined all digits, we find a candidate result.

The convert function is a reverse mapping process as below.

def toT9(s):
res= ””
for c in s:
for k, v in  T9MAP.items():
if string.find(v, c)>=0:
res+= k
break
#error handling skipped.
return res

For illustration purpose, the error handling for invalid letters is skipped. If we feed the program with the same test cases, we can get a result as the following.

 search 4  []  search 46  [(’go’, None), (’ho’, None)]  search 466  []  search 4663  [(’good’, None), (’gone’, None), (’home’, None), (’hood’, None)]  search 2  [(’a’, None)]  search 22  []

The result is slightly different from the one output by Trie. The reason is as same as what we analyzed in Haskell implementation. It is easily to modify the program to output a similar result.

##### T9 implemented in C++

First we define T9 mapping as a Singleton object, this is because we want to it can be used both in Trie look up and Patricia look up programs.

struct t9map{
typedef std::map < char, std::string>   Map;
Map  map;

t9map(){
map[’2’]=”abc”;
map[’3’]=”def”;
map[’4’]=”ghi”;
map[’5’]=”jkl”;
map[’6’]=”mno”;
map[’7’]=”pqrs”;
map[’8’]=”tuv”;
map[’9’]=”wxyz”;
}

static t9map &  inst(){
static t9map i;
return i;
}
};

Note in other languages or keypad layout, we can define different mappings and pass them as a parameter to the looking up function.

With this mapping, the looking up in Trie can be given as below. Although we want to keep the genericity of the program, for illustration purpose, we just simply use the t9 mapping directly.

In order to keep the code as short as possible, a boost library tool, boost::tuple is used. For more about boost::tuple, please refer to [8].

template <class  K, class  V >
std::list<std::pair< K,  V >   >  lookup_t9(Trie< K,  V >* t,
typename Trie< K,  V>::KeyType key)
{
typedef std::list<std::pair< K,  V >   >  Result;
typedef typename Trie< K,  V>::KeyType Key;
typedef typename Trie< K,  V>::Char Char;

if((!t) || key.empty())
return Result();

Key prefix;
std::map < Char, Key >   m   =  t9map::inst().map;
std::queue<boost::tuple< Key, Key, Trie< K,  V >*>   >  q;
q.push(boost::make_tuple(prefix, key, t));
Result res;
while(!q.empty()){
boost::tie(prefix, key, t)  =  q.front();
q.pop();
Char c  =  *key.begin();
key  =  Key(key.begin()+1, key.end());
if( m.find(c)  ==   m.end())
return Result();
Key cs  =   m[c];
for(typename Key::iterator it=cs.begin(); it!=cs.end();  ++it)
if(t->children.find(*it)!=t->children.end()){
if(key.empty())
res.push_back(std::make_pair(prefix+*it, t->children[*it]->value));
else
q.push(boost::make_tuple(prefix+*it, key, t->children[*it]));
}
}
return res;
}

This program will first check if the Patricia tree or the key are empty to deal with with trivial case. It next initialize a queue, and push one tuple to it. the tuple contains 3 elements, a prefix to represent a string the program has been searched, current key it need look up, and a node it will examine.

Then the program repeatedly pops the tuple from the queue, takes the first character from the key, and looks up in T9 map to get a candidate English letter list. With each letter in this list, the program examine if it exists in the children of current node. In case it find such a child, if there is no left letter to look up, it means we found a candidate result, we push it to the result list. Else, we create a new tuple with updated prefix, key and this child; the push it to the queue for later process.

Below are some simple test cases for verification.

Trie<std::string, std::string>* t9trie(0);
const char* t9dict[]  =  {”home”, ”good”, ”gone”, ”hood”, ”a”, ”another”, ”an”}
t9trie  =  list_to_trie(t9dict, t9dict+sizeof(t9dict)/sizeof(char*), t9trie);
std::cout<< ”test_t9_lookup_in_Trie\n”
<< ”search_4_” <<lop_to_str(lookup_t9(t9trie, ”4”))<<\n”
<< ”serach_46_” <<lop_to_str(lookup_t9(t9trie, ”46”))<<\n”
<< ”serach_4663_” <<lop_to_str(lookup_t9(t9trie, ”4663”))<<\n”
<< ”serach_2_” <<lop_to_str(lookup_t9(t9trie, ”2”))<<\n”
<< ”serach_22_” <<lop_to_str(lookup_t9(t9trie, ”22”))<<\n\n”;
delete t9trie;

It will output the same result as the Python program.

 test t9 lookup in Trie  search 4 [(g, ), (h, ), ]  serach 46 [(go, ), (ho, ), ]  serach 4663 [(gone, ), (good, ), (home, ), (hood, ), ]  serach 2 [(a, ), ]  serach 22 []

In order to save space, a looking up program for Patricia is also provided.

template <class  K, class  V >
std::list<std::pair< K,  V >   >  lookup_t9(Patricia< K,  V >* t,
typename Patricia< K,  V>::KeyType key)
{
typedef std::list<std::pair< K,  V >   >  Result;
typedef typename Patricia< K,  V>::KeyType Key;
typedef typename Key::value_type Char;
typedef typename Patricia< K,  V>::Children::iterator Iterator;

if((!t) || key.empty())
return Result();

Key prefix;
std::map < Char, Key >   m   =  t9map::inst().map;
std::queue<boost::tuple< Key, Key, Patricia< K,  V >*>   >  q;
q.push(boost::make_tuple(prefix, key, t));
Result res;
while(!q.empty()){
boost::tie(prefix, key, t)  =  q.front();
q.pop();
for(Iterator it= t->children.begin(); it!=t->children.end();  ++it){
Key digits  =  t9map::inst().toT9(it->first);
if(is_prefix_of(digits, key)){
if(digits  ==  key)
res.push_back(std::make_pair(prefix+it->first, it-> second->value));
else{
key  = Key(key.begin()+it->first.size(), key.end());
q.push(boost::make_tuple(prefix+it->first, key, it->second));
}
}
}
}
return res;
}

The program is similar to the one with Trie very much. This is a typical bread-first search approach. Note that we added a member function to_t9() to convert a English word/phrase back to digit number string. This member function is implemented as the following.

struct t9map{
//...
std::string to_t9(std::string s){
std::string res;
for(std::string::iterator c =s.begin(); c!=s.end();  ++ c){
for(Map::iterator  m = map.begin();  m !=map.end();  ++ m ){
std::string val  =   m ->second;
if(std::find(val.begin(), val.end(), *c)!=val.end()){
res.push_back(m ->first);
break
}
}
} // skip error handling.
return res;
}

The error handling for invalid letters is omitted in order to keep the code short for easy understanding. We can use the very similar test cases as above except we need change the Trie to Patrica. It will output as below.

 test t9 lookup in Patricia  search 4 []  serach 46 [(go, ), (ho, ), ]  serach 466 []  serach 4663 [(gone, ), (good, ), ]  serach 2 [(a, ), ]  serach 22 []

The result is slightly different, please refer to the Haskell section for the reason of this difference. It is very easy to modify the program to output the very same result as Trie’s one.

##### 6.2.2 Recursive algorithm of T9 looking up

In Haskell, we first define a map from key pad to English letter. When user input a key pad number sequence, we take each number and check from the Trie. All children match the number should be investigated. Below is a Haskell program to realize T9 input.

mapT9  =  [(’2’, ”abc”), (’3’, ”def”), (’4’, ”ghi”), (’5’, ”jkl”),
(’6’, ”mno”), (’7’, ”pqrs”), (’8’, ”tuv”), (’9’, ”wxyz”)]

lookupT9 :: Char  ->  [(Char, b)]  ->  [(Char, b)]
lookupT9 c children  =  case lookup c mapT9 of
Nothing  ->  []
Just s   ->  foldl f [] s where
f lst x  =  case lookup x children of
Nothing  ->  lst
Just t   ->  (x, t):lst

--  T9 -find in Trie
findT9:: Trie a  ->  String  ->  [(StringMaybe  a)]
findT9 t []  =  [(””, Trie.value t)]
findT9 t (k:ks)  =  foldl f [] (lookupT9 k (children t))
where
f lst (c, tr)  =  (mapAppend c (findT9 tr ks))  ++  lst

findT9 is the main function, it takes 2 parameters, a Trie and a number sequence string. In non-trivial case, it calls lookupT9 function to examine all children which match the first number.

For each matched child, the program recursively calls findT9 on it with the left numbers, and we use mapAppend to insert the currently finding letter in front of all results. The program use foldl to combine all these together.

Function lookupT9 is used to filtered all possible children who match a number. It first call lookup function on mapT9, so that a string of possible English letters can be identified. Next we call lookup for each candidate letter to see if there is a child can match the letter. We use foldl to collect all such child together.

This program can be verified by using some simple test cases.

testFindT9  =  ”press_4:_”  ++  (show \$ take 5 \$ findT9 t ”4”)++
”\npress_46:_”  ++  (show \$ take 5 \$ findT9 t ”46”)++
”\npress_4663:_”  ++  (show \$ take 5 \$ findT9 t ”4663”)++
”\npress_2:_”  ++  (show \$ take 5 \$ findT9 t ”2”)++
”\npress_22:_”  ++  (show \$ take 5 \$ findT9 t ”22”)
where
t  =  Trie.fromList lst
lst  =  [(”home”, 1), (”good”, 2), (”gone”, 3), (”hood”, 4),
(”a”, 5), (”another”, 6), (”an”, 7)]

The program will output below result.

 press 4: [("g",Nothing),("h",Nothing)]  press 46: [("go",Nothing),("ho",Nothing)]  press 4663: [("gone",Just 3),("good",Just 2),("home",Just 1),("hood",Just 4)]  press 2: [("a",Just 5)]  press 22: []

The value of each child is just for illustration, we can put empty value instead and only returns candidate keys for a real input application.

Tries consumes to many spaces, we can provide a Patricia version as alternative.

findPrefixT9’ :: String  ->  [(String, b)]  ->  [(String, b)]
findPrefixT9’ s lst  =  filter f lst where
f (k, _)  =  (toT9 k) ‘Data.List.isPrefixOf‘ s

toT9 :: String  ->  String
toT9 []  =  []
toT9 (x:xs)  =  (unmapT9 x mapT9):(toT9 xs) where
unmapT9 x (p:ps)  =  if x ‘elem‘ (snd p) then (fst p) else unmapT9 x ps

findT9’ :: Patricia a  ->  String  ->  [(StringMaybe  a)]
findT9’ t []  =  [(””, value t)]
findT9’ t k  =  foldl f [] (findPrefixT9’ k (children t))
where
f lst (s, tr)  =  (mapAppend’ s (findT9’ tr (k ‘diff‘ s)))  ++  lst
diff x y  =  drop (length y) x

In this program, we don’t check one digit at a time, we take all the digit sequence, and we examine all children of the Patricia node. For each child, the program convert the prefix string to number sequence by using function toT9, if the result is prefix of what user input, we go on search in this child and append the prefix in front of all further results.

If we tries the same test case, we can find the result is a bit different.

 press 4: []  press 46: [("go",Nothing),("ho",Nothing)]  press 466: []  press 4663: [("good",Just 2),("gone",Just 3),("home",Just 1),("hood",Just 4)]  press 2: [("a",Just 5)]  press 22: []

If user press key ’4’, because the dictionary (represent by Patricia) doesn’t contain any candidates matches it, user will get an empty candidates list. The same situation happens when he enters “466”. In real input method implementation, such user experience isn’t good, because it displays nothing although user presses the key several times. One improvement is to predict what user will input next by display a partial result. This can be easily achieved by modify the above program. (Hint: not only check

findPrefixT9’ s lst  =  filter f lst where
f (k, _)  =  (toT9 k) ‘Data.List.isPrefixOf‘ s

but also check

f (k, _)  =  s ‘Data.List.isPrefixOf‘ (toT9 k)

)

##### T9 implemented in Scheme/Lisp

In Scheme/Lisp, T9 map is defined as a list of pairs.

(define  map-T9  (list ’(”2” ”abc”) ’(”3” ”def”) ’(”4” ”ghi”) ’(”5” ”jkl”)
’(”6” ”mno”) ’(”7” ”pqrs”) ’(”8” ”tuv”) ’(”9” ”wxyz”)))

The main searching function is implemented as the following.

(define (find-T9 t k) ;; return [(key value)]
(define (accumulate-find lst child)
(append (map-string-append (key child) (find-T9 (tree child) (string-cdr k)))
lst))
(define (lookup-child lst c) ;; lst, list of childen [(key tree)], c, char
(let ((res (find-matching-item  map-T9  (lambda (x) (string=? c (car x))))))
(if (not res) ’()
(filter (lambda (x) (substring? (key x) (cadr res))) lst))))
(if (string-null? k) (list (cons k (value t)))
(fold-left accumulate-find ’() (lookup-child (children t) (string-car k)))))

This function contains 2 inner functions. If the string is empty, the program returns a one element list. The element is a string value pair. For the none trivial case, the program will call inner function to find in each child and then put them together by using fold-left high order function.

To test this T9 search function, a very simple dictionary is established by using Trie insertion. Then we test by calling find-T9 function on several digits sequences.

(define dict-T9 (list ’(”home” ()) ’(”good” ()) ’(”gone” ()) ’(”hood” ())
’(”a” ()) ’(”another” ()) ’(”an” ())))

(define (test-trie-T9)
(define t (list->trie dict-T9))
(display ”find_4:_”) (display (find-T9 t ”4”)) (newline)
(display ”find_46:_”) (display (find-T9 t ”46”)) (newline)
(display ”find_4663:_”) (display (find-T9 t ”4663”)) (newline)
(display ”find_2:_”) (display (find-T9 t ”2”)) (newline)
(display ”find_22:_”) (display (find-T9 t ”22”)) (newline))

Evaluate this test function will output below result.

 find 4: ((g) (h))  find 46: ((go) (ho))  find 4663: ((gone) (good) (hood) (home))  find 2: ((a))  find 22: ()

In order to be more space effective, Patricia can be used to replace Trie. The search program modified as the following.

(define (find-T9 t k)
(define (accumulate-find lst child)
(append (map-string-append (key child) (find-T9 (tree child) (string- k (key child))))
lst))
(define (lookup-child lst k)
(filter (lambda (child) (string-prefix? (str->t9 (key child)) k)) lst))
(if (string-null? k) (list (cons ”” (value t)))
(fold-left accumulate-find ’() (lookup-child (children t) k))))

In this program a string helper function ’string-’ is defined to get the different part of two strings. It is defined like below.

(define (string- x y)
(string-tail x (string-length y)))

Another function is ’str-¿t9’ it will convert a alphabetic string back to digit sequence base on T9 mapping.

(define (str->t9 s)
(define (unmap-t9 c)
(car (find-matching-item  map-T9  (lambda (x) (substring? c (cadr x))))))
(if (string-null? s) ””
(string-append (unmap-t9 (string-car s)) (str->t9 (string-cdr s)))))

We can feed the almost same test cases, and the result is output as the following.

 find 4: ()  find 46: ((go) (ho))  find 466: ()  find 4663: ((good) (gone) (home) (hood))  find 2: ((a))  find 22: ()

Note the result is a bit different, the reason is described in Haskell section. It is easy to modify the program, so that Trie and Patricia approach give the very same result.

### 7 Short summary

In this post, we start from the Integer base trie and Patricia, the map data structure based on integer patricia plays an important role in Compiler implementation. Next, alphabetic Trie and Patricia are given, and I provide a example implementation to illustrate how to realize a predictive e-dictionary and a T9 input method. Although they are far from the real implementation in commercial software. They show a very simple approach of manipulating text. There are still some interesting problem can not be solved by Trie or Patrica directly, how ever, some other data structures such as suffix tree have close relationship with them. I’ll note something about suffix tree in other post.

### 8 Appendix

#### 8.1 Prerequisite software

GNU Make is used for easy build some of the program. For C++ and ANSI C programs, GNU GCC and G++ 3.4.4 are used. I use boost triple to reduce the amount of our code lines, boost library version I am using is 1.33.1. The path is in CXX variable in Makefile, please change it to your path when compiling. For Haskell programs GHC 6.10.4 is used for building. For Python programs, Python 2.5 is used for testing, for Scheme/Lisp program, MIT Scheme 14.9 is used.

all source files are put in one folder. Invoke ’make’ or ’make all’ will build C++ and Haskell program.

Run ’make Haskell’ will separate build Haskell program. There will be two executable file generated one is htest the other is happ (with .exe in Window like OS). Run htest will test functions in IntTrie.hs, IntPatricia.hs, Trie.hs and Patricia.hs. Run happ will execute the editionary and T9 test cases in EDict.hs.

Run ’make cpp’ will build c++ program. It will create a executable file named cpptest (with .exe in Windows like OS). Run this program will test inttrie.hpp, intpatricia.hpp, trie.hpp, patricia.hpp, and edict.hpp.

Run ’make c’ will build the ANSI C program for Trie. It will create a executable file named triec (with .exe in Windows like OS).

Python programs can run directly with interpreter.

Scheme/Lisp program need be loaded into Scheme evaluator and evaluate the final function in the program. Note that patricia.scm will hide some functions defined in trie.scm.

Here is a detailed list of source files

• IntTrie.hs, Haskell version of little-endian integer Trie.
• IntPatricia.hs, integer Patricia tree implemented in Haskell.
• Trie.hs, Alphabetic Trie, implemented in Haskell.
• Patricia.hs, Alphabetic Patricia, implemented in Haskell.
• TestMain.hs, main module to test the above 4 programs.
• EDict.hs, Haskell program for e-dictionary and T9.

#### 8.3 C++/C source files

• inttrie.hpp, Integer base Trie;
• intpatricia.hpp, Integer based Patricia tree;
• trie.c, Alphabetic Trie only for lowercase English language, implemented in ANSI C.
• trie.hpp, Alphabetic Trie;
• patricia.hpp, Alphabetic Patricia;
• trieutil.hpp, Some generic utilities;
• edit.hpp, e-dictionary and T9 implemented in C++;
• test.cpp, main program to test all above programs.

#### 8.4 Python source files

• inttrie.py, Python version of little-endian integer Trie, with test cases;
• intpatricia.py, integer Patricia tree implemented in Python;
• trie.py, Alphabetic Trie, implemented in Python;
• patricia.py, Alphabetic Patricia implemented in Python;
• trieutil.py, Common utilities;
• edict.py, e-dictionary and T9 implemented in Python.

#### 8.5 Scheme/Lisp source files

• inttrie.scm, Little-endian integer Trie, implemented in Scheme/Lisp;
• intpatricia.scm, Integer based Patricia tree;
• trie.scm, Alphabetic Trie;
• patricia.scm, Alphabetic Patricia, reused many definitions in Trie;
• trieutil.scm, common functions and utilities.

#### 8.6 Tools

Besides them, I use graphviz to draw most of the figures in this post. In order to translate the Trie, Patrica and Suffix Tree output to dot language scripts. I wrote a python program. it can be used like this.

 trie2dot.py -o foo.dot -t patricia "1:x, 4:y, 5:z"  trie2dot.py -o foo.dot -t trie "001:one, 101:five, 100:four"

### References

[1]   Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “Introduction to Algorithms, Second Edition”. ISBN:0262032937. The MIT Press. 2001

[2]   Chris Okasaki and Andrew Gill. “Fast Mergeable Integer Maps”. Workshop on ML, September 1998, pages 77-86, http://www.cse.ogi.edu/ andy/pub/finite.htm

[3]   D.R. Morrison, “PATRICIA – Practical Algorithm To Retrieve Information Coded In Alphanumeric”, Journal of the ACM, 15(4), October 1968, pages 514-534.

[4]   Suffix Tree, Wikipedia. http://en.wikipedia.org/wiki/Suffix_tree

[5]   Trie, Wikipedia. http://en.wikipedia.org/wiki/Trie

[6]   T9 (predictive text), Wikipedia. http://en.wikipedia.org/wiki/T9_(predictive_text)

[7]   Predictive text, Wikipedia. http://en.wikipedia.org/wiki/Predictive_text

[8]   Bjorn Karlsson. “Beyond the C++ Standard Library: An Introduction to Boost”. Addison Wesley Professional, August 31, 2005, ISBN: 0321133544

Č
Ċ
ď
Xinyu LIU,
2010年11月16日 上午12:02
ċ
ď
trie.zip
(44k)
Xinyu LIU,
2010年1月24日 下午6:47