Skip to main content

Suffix Tree

ImplementationSpaceTime
AccessLookupInsertionDeletion
Sibling lists / unsorted arraysO(n)N/AO(n)O(1)O(1)
Bitwise sibling treesO(n)N/AO(log n)O(1)O(1)
Hash mapsO(n)N/AO(1)O(1)O(n)
Balanced search treeO(n)N/AO(log n)O(log n)O(log n)
Sorted arraysO(n)N/AO(log n)O(n)O(n)
Hash maps + sibling listsO(n)N/AO(1)O(1)O(1)

Definition​

Suffix Tree is a tree-based data structure that efficiently represents all suffixes of a given string, enabling fast pattern matching and substring search algorithms

Simplified

Imagine you have a giant book, and a Suffix Tree is like a smart index that helps you quickly find any word or phrase in the book without reading it all over again, making searches faster and more efficient

Practice​

AspectPseudo Code
Build Suffix Tree
build_suffix_tree():
for index in text.length:
_add_suffix(index, root)
Add Suffix
add_suffix(suffix):
_add_suffix(0, root)

_add_suffix(suffix_start, node):
if suffix_start == text.length:
return

current_char = text[suffix_start]
if current_char not in node.children:
node.children[current_char] = SuffixNode()
node.children[current_char].index = suffix_start
else:
existing_node = node.children[current_char]
_add_suffix(suffix_start + 1, existing_node)
Insertion
insert(new_text):
text += new_text
for index in range(text.length - new_text.length, text.length):
_add_suffix(index, root)
Search
search(pattern):
current_node = root
n = len(pattern)
i = 0

while i < n:
char = pattern[i]
if char in current_node.children:
current_node = current_node.children[char]
i += 1
else:
return False

return True
Delete
delete(pattern):
if search(pattern):
text = text.replace(pattern, "", 1)
root = SuffixNode()
build_suffix_tree()
Longest Common Substring
longest_common_substring():
result = [0, 0]
_longest_common_substring_dfs(root, "", result)
return text.substring(result[0], result[1])

_longest_common_substring_dfs(node, current_suffix, result):
if not node.children:
return

for child in node.children.values():
new_suffix = current_suffix + text[child.index]

if len(new_suffix) > result[1] - result[0]:
result[0] = child.index - len(new_suffix) + 1
result[1] = child.index + 1

_longest_common_substring_dfs(child, new_suffix, result)
Substring Count
substring_count(node=None):
if node is None:
node = root

count = 1
for child in node.children.values():
count += substring_count(child)

return count
Pattern Matching
pattern_matching(pattern, node=None):
if node is None:
node = root

for i, char in enumerate(pattern):
if char in node.children:
node = node.children[char]
else:
return []

return _get_leaf_indices(node)

_get_leaf_indices(node):
indices = []
if not node.children:
indices.append(node.index)

for child in node.children.values():
indices.extend(_get_leaf_indices(child))

return indices
Traverse
traverse(node=None, depth=0):
if node is None:
node = root
print(" " * depth + "Node: " + node.index)
for char, child in node.children.items():
print(" " * (depth + 1) + "Edge: " + char)
traverse(child, depth + 2)