We know tree well, a binary search tree performs well in searching. Today, I’ll introduce another tree-like data structure, which is widely used for retrieval of a key in a dataset of strings: trie.
Introduction
A trie also called prefix-tree, is a type of search tree, a tree data structure used for locating specific keys from within a set. These keys are always strings and characters. A trie is structured by trie nodes. Each trie node has two components: a set of children that leads to the next trie node, and a boolean value defines if it is the end for the word.
A trie has many applications such as:
- Autocomplete: when googling something, it will automatically predict the rest of a word a user is typing. Basically, it’s searching the prefix in a trie and provides the following words as options.
- Spell checker: it checks if the user’s typing by searching the word in a trie(dictionary).
- Solving Boggle games: starts with a random character in the given matrix, check whether it is a prefix in a trie(dictionary).
Basic functions of a trie
The basic functions of a trie include: insert, search a word, and search a prefix:
- Insert a word into a trie
Insert every letter of the word into a trie. The process is 1). start from the root to check if the current letter node exists; 2) if the letter node exists, move one level down to the next letter; 3) if the letter does not exist, create a new node; 4) repeat the previous steps until we met the end of the word, and mark the last node’s boolean True.
- Search if a word exists in a trie
We start from the root with the first letter, examine if the letter exists in its children set. There are two cases: 1). if a letter can’t be found in a trie, simple return False; 2). if all letters in a word can be found in a trie and they all linked, check the last letter’s boolean value, if it’s not the end of the word, return False, otherwise return True.
- Check if a prefix exists in a trie
Similar to the searching function. In this case, we don’t need to check the last letter node’s boolean value.
Implementation
There’re two ways of implementation. The trie structures are the same, the only difference is the representation of the trie node’s children.
- Trie node’s children is an array
For every trie node in a tree, its children is a form of an array with size 26. Index 0 is representing ‘a’, and index 25 is representing ‘z’. When inserting a new letter, we’ll first calculate its index, then create a new trie node at children[index].
Code:
class TrieNode:
def __init__(self):
self.children = [None] * 26
self.isLeaf = Falseclass Trie: def __init__(self):
"""
Initialize your data structure here.
"""
self.root = TrieNode() def insert(self, word: str) -> None:
"""
Inserts a word into the trie.
"""
current = self.root
for letter in word:
index = ord(letter) - ord('a')
if not current.children[index]:
current.children[index] = TrieNode()
current = current.children[index]
current.isLeaf = True def search(self, word: str) -> bool:
"""
Returns if the word is in the trie.
"""
current = self.root
for letter in word:
index = ord(letter) - ord('a')
if not current.children[index]:
return False
current = current.children[index]
return current.isLeaf and current def startsWith(self, prefix: str) -> bool:
"""
Returns if there is any word in the trie that starts with the given prefix.
"""
current = self.root
for letter in prefix:
index = ord(letter) - ord('a')
if not current.children[index]:
return False
current = current.children[index]
return True
- Trie node’s children is a dictionary
Each trie node’s children set is represented by a dictionary. When inserting a new word into a trie, the letter is the key of the node’s children dictionary, the value is the next letter’s node.
Compare with array approach, it saves some spaces due to the size of trie node children is not fixed to 26.
Code:
class TrieNode:
def __init__(self):
self.children = collections.defaultdict(TrieNode)
self.is_end = Falseclass Trie: def __init__(self):
"""
Initialize your data structure here.
"""
self.root = TrieNode() def insert(self, word: str) -> None:
"""
Inserts a word into the trie.
"""
current = self.root
for letter in word:
current = current.children[letter]
current.is_end = True def search(self, word: str) -> bool:
"""
Returns if the word is in the trie.
"""
current = self.root
for letter in word:
current = current.children.get(letter)
if current is None:
return False
return current.is_end def startsWith(self, prefix: str) -> bool:
"""
Returns if there is any word in the trie that starts with the given prefix.
"""
current = self.root
for letter in prefix:
current = current.children.get(letter)
if not current:
return False
return True