The big picture#
TL;DRthe 30-second version
- A trie stores strings as a tree of characters: the path from the root to a node spells a prefix, so every word sharing that prefix shares that path.
- Prefix lookup is O(prefix length) — completely independent of how many words the dictionary holds. That's the property a hash set can't give you.
- Autocomplete = walk to the prefix node, then collect the word-ends in its subtree and rank them by frequency, returning the top-k.
- The cost is memory: one small node per character and a pointer per child make a plain trie cache-unfriendly. Production systems compress it (radix/Patricia trees, DAWGs, FSTs) and cache the top-k completions per node.
- It powers search typeahead, spell-checkers, IP routing tables (longest-prefix match), and Lucene's term dictionary (an FST).
Everything below expands on these points. Read the core sections top to bottom for the full mental model; the collapsible "Go deeper" boxes hold the advanced internals (compression schemes, top-k caching, memory math) you can skip on a first pass.
- root
- c
- a
- r★ word
- t★ word
- e★ word
- r★ word
- a
- d
- o
- g★ word
- o
- c
Start here: the problem it solves#
A search box needs to suggest completions on every keystroke: type 'car' and instantly see 'care', 'cart', 'card'. Scanning a million-word dictionary for everything starting with 'car' on each keypress is far too slow, and a hash set can test exact membership but can't answer 'what starts with this prefix?' at all.
Why can't a hash set do it? A hash function deliberately scatters keys: 'car', 'care', and 'cart' land in three unrelated buckets with no notion that one is a prefix of another. The only way to find every key beginning with 'car' is to enumerate the whole table and test each one — O(n) per keystroke, and you do it again on every letter typed. That is exactly the work a typeahead box can't afford.
A sorted array does better: binary-search to the first key ≥ 'car', then scan forward while the prefix still matches. That's O(log n + matches) for a query, which is fine for a static dictionary — but every insert or frequency update has to shift elements, so it's painful to keep current under a live stream of edits.
A trie is built exactly for prefix queries. Because every word with a given prefix lives under the same node, you walk to that node once and everything beneath it is a completion. The lookup cost depends on the prefix length, not the dictionary size — and inserts and updates are just as cheap as lookups.
The structure: one node per character#
The root represents the empty string. Each edge is labeled with a character, so the path from the root to a node spells a prefix. A node is flagged as a word end when a complete word stops there. Inserting a word walks down character by character, creating any missing nodes, and marks the final node as a word end.
- Insert 'car': create nodes c → a → r, mark r as a word end.
- Insert 'cart': c, a, r already exist — reuse them, just add t and mark it.
- Insert 'care': reuse c, a, r, add e and mark it.
- Three words, but the shared prefix 'car' is stored once.
Each node also needs a way to find its children by character. Two layouts dominate: a fixed-size array indexed by character (e.g. 26 slots for a–z, or 256 for bytes), which gives O(1) child lookup but wastes a pointer per unused slot; or a hash map / small sorted list of (char → child), which is compact for sparse nodes but adds a per-access constant. The array is faster and cache-friendlier near the dense root; maps win deep in the tree where most nodes have one or two children.
Autocomplete: walk, then gather and rank#
To autocomplete a prefix, walk from the root following one node per character. If any character has no node, the prefix isn't present and there are no suggestions. Otherwise you reach the prefix node, and every word-end in its subtree is a completion. Collect them with a depth-first search, sort by frequency (breaking ties alphabetically), and return the top-k.
- Walk: from the root, follow the child for 'c', then 'a', then 'r'. Three hops, O(prefix length).
- Land on the prefix node for 'car'. If any hop had failed, return no suggestions.
- DFS the subtree under 'car', collecting every node flagged as a word end: car, card, care, cart…
- Rank by stored frequency (ties broken alphabetically) and keep the top-k, say the top 10.
- Return that list as the dropdown.
Walking to the prefix is O(prefix length). Gathering completions costs as much as the subtree you traverse — for a short, popular prefix like 'a' that subtree can be enormous. That's why real systems cache the top-k completions directly on each node: the keystroke becomes a single node lookup that returns a precomputed list, turning suggest into O(prefix length) with no subtree scan at all.
PredictA dictionary holds 1,000,000 words. You type the 3-character prefix 'car'. Roughly how many nodes must the trie walk to reach the prefix node?
Hint: The walk cost tracks the prefix length, not the word count.
About 3 — one node per character of the prefix. It's the same 3 steps whether the dictionary holds 100 words or 100 million. (Gathering the completions underneath is extra work, which is why systems cache the top-k per node.)
Complexity: time independent of dictionary size#
The headline property: lookup, insert, and delete of a key of length L are all O(L) — proportional to the key length and completely independent of n, the number of keys stored. A balanced BST is O(log n · L) in the worst case (a comparison touches up to L characters at each of log n levels); a hash set is O(L) to hash but offers no prefix search. The trie's cost simply does not grow as the dictionary grows.
| Operation | Cost | Note |
|---|---|---|
| Insert key length L | O(L) | walk/create one node per character |
| Exact lookup | O(L) | walk L nodes, check the word-end flag |
| Prefix walk | O(P) | P = prefix length; n drops out entirely |
| Suggest (no cache) | O(P + S) | S = size of the subtree gathered + sort |
| Suggest (top-k cached) | O(P) | read the precomputed list at the node |
| Space | O(total chars) | minus shared prefixes; plus pointer overhead |
Space is where tries cost you. Total storage is O(sum of all key lengths) in characters — shared prefixes are stored once, which is a genuine saving — but each node carries child pointers and a word-end flag, and there are a lot of nodes. The pointer overhead, not the characters, dominates the memory bill.
Go deeperWhy the pointers, not the letters, cost you
Consider an array-of-children layout over a 26-letter alphabet. Each node holds 26 pointers; at 8 bytes each that's 208 bytes of pointers per node, the vast majority of them null. A dictionary with a few million nodes can need gigabytes — most of it empty slots. Switch to a 256-way byte alphabet (for Unicode/UTF-8) and the per-node cost balloons to 2 KB.
A hash-map-per-node layout fixes the waste (only real children stored) but trades it for hashing overhead and worse cache locality, since each child is a separate heap allocation reached by pointer-chasing. Either way, the structure is pointer-heavy and scatters its nodes across memory — a stark contrast to a sorted array, which is one contiguous block the CPU prefetcher loves. This memory profile, not the asymptotic time, is what pushes production systems to the compressed variants below.
Variants: compressing the trie#
A plain trie's weakness is all those single-child chains and null pointers. Every serious variant attacks that memory cost while preserving prefix lookup.
- Radix tree (Patricia trie) — collapse any chain of single-child nodes into one edge labeled with the whole substring. 'r → o → u → t → e' becomes a single edge 'route'. Same prefix queries, dramatically fewer nodes. This is the standard fix and the structure inside Linux's IP routing table and many key-value stores.
- Ternary search tree (TST) — store children as a small BST keyed by character (left/middle/right per node) instead of an array or hash map. Far less memory than a 256-way array, supports near-neighbor and wildcard search, at the cost of an extra log factor per character.
- DAWG (directed acyclic word graph) — a trie that also shares common suffixes by merging identical subtrees. 'fishing' and 'washing' can share the trailing 'shing'. It minimizes the structure into a DAG, hugely compact for a fixed dictionary — but it loses easy per-word frequencies and is awkward to update.
- Suffix tree / suffix array — index all suffixes of a text (not a word list) to answer substring (not just prefix) queries; the workhorses of bioinformatics and full-text search. A different problem, same family.
- FST (finite-state transducer) — a DAWG that also maps each key to an output value (an offset, a weight). Lucene stores its entire term dictionary as an FST: shared prefixes and suffixes plus an associated value, in a fraction of the memory of a plain trie.
Go deeperCaching top-k per node — O(1)-after-walk suggest
Gathering and ranking a subtree on every keystroke is the expensive part of autocomplete, especially for short popular prefixes whose subtree is most of the dictionary. The fix: precompute and store, at each node, the top-k completions of the prefix it represents (with their frequencies). A query then walks to the prefix node and returns its cached list — no DFS, no sort. Suggest drops to O(prefix length), and in practice that list is what gets served straight from an in-memory cache.
The trade-off moves to writes and memory. Each node now stores k entries, multiplying space, and updating a single word's frequency may have to refresh the cached lists of every ancestor up to the root. Production typeaheads therefore rebuild these caches in batch from query logs rather than on every edit, accepting that suggestions lag the latest data by minutes — a fine trade for a feature that only needs to feel instant, not be perfectly fresh.
When to reach for a trie (and when not to)#
A trie is the right call when the query is prefix-shaped and the data is string-like over a manageable alphabet — autocomplete, typeahead, longest-prefix match, spell-check. It's the wrong call when you only ever need exact membership (a hash set is smaller and simpler) or when the alphabet is huge and the keys are sparse (the node overhead explodes).
- Good fit: prefix and range-of-prefix queries, ordered iteration, shared-prefix data (URLs, file paths, dictionary words, IP addresses), and live updates mixed with queries.
- Weaker fit: exact-membership-only workloads (hash set wins on memory), or set membership where order never matters.
- Watch out: large/sparse alphabets (full Unicode), or short keys where a hash set's flat layout beats the trie's pointer chasing.
Failure modes: where tries hurt#
- Memory blow-up — a wide alphabet (256-way byte nodes, or full Unicode) with sparse data leaves most child slots null. Millions of nodes × hundreds of mostly-empty pointers each can turn a megabyte of words into gigabytes of trie. Radix compression or a map layout is the defense.
- Cache misses from pointer chasing — each step of a walk dereferences a pointer to a node that may live anywhere in memory. A long walk is a string of cache misses, so wall-clock time can lag the clean O(L) analysis badly. Contiguous structures (sorted arrays, FSTs) win on locality.
- Top-k cache update cost — caching completions per node makes a frequency bump cascade up to the root, refreshing every ancestor's list. Under a high write rate this dominates; the usual fix is asynchronous, batched rebuilds from logs, accepting slightly stale suggestions.
- Deletion leaving dead nodes — removing a word means clearing its end flag and pruning now-childless nodes back up the path; skip the prune and the trie accumulates dead internal nodes that waste space and slow walks.
Trie vs hash set vs BST vs sorted array#
| Trie | Hash set | Balanced BST | Sorted array + binary search | |
|---|---|---|---|---|
| Prefix query | O(P) — native | Not supported (must scan all) | O(log n · L) to locate, then in-order | O(log n + matches) |
| Exact lookup | O(L) | O(L) average (hash + compare) | O(log n · L) | O(log n · L) |
| Insert / update | O(L) | O(L) average | O(log n · L) | O(n) — shift elements |
| Ordered iteration | Yes (DFS in sorted order) | No | Yes (in-order) | Yes (already sorted) |
| Memory | High — pointer per child, many nodes | Low — flat table | Moderate — 2 pointers/node | Lowest — contiguous, no pointers |
Read it as: hash set wins memory and exact lookup but can't do prefixes; sorted array wins memory and read-side prefix queries but is painful to update; BST gives order and updates but pays a log factor and no native prefix walk. The trie is the only one where the prefix query itself is O(prefix length) and updates stay cheap — which is precisely why typeaheads use it.
Where tries run in the wild#
Tries (usually in a compressed form) show up wherever a system has to answer 'what comes after this prefix?' at speed.
- Search autocomplete / typeahead — the canonical use. Google, Elasticsearch's completion suggester, and most search boxes walk a prefix structure and serve frequency-ranked top-k completions, typically precomputed and cached.
- IP routing tables — routers do longest-prefix match on destination addresses to pick a next hop. Linux's routing table and hardware forwarding tables use radix/Patricia tries (the classic LPC-trie / LPM structure).
- Spell-checkers and fuzzy match — a trie supports edit-distance search by walking the tree while tracking a small dynamic-programming row, finding all words within k edits efficiently.
- Lucene / Elasticsearch term dictionary — stored as an FST (a transducer DAWG), giving prefix and term lookup over millions of terms in a tiny memory footprint.
- Redis — radix trees (rax) back the stream (radix-keyed entry IDs) and cluster key-slot tracking; key-value engines use them for ordered, prefix-friendly indexes.
- T9 / predictive text — old phone keypads mapped digit sequences to a trie of words to predict input from numeric keystrokes.
Common questions & gotchas#
What does O(prefix length) mean?
The work grows with how long the prefix is, not how many words are stored. Searching 'car' costs the same three steps in a 100-word dictionary or a 100-million-word one — the dictionary size simply drops out of the cost.
Why is trie lookup O(prefix length)?
Walking the trie follows exactly one node per character of the prefix: 'c' then 'a' then 'r' is three hops, period. There's no comparison against other keys and no scan of the dictionary, so the count of stored words never enters the cost — only the prefix's length does.
Why not just use a hash set for autocomplete?
A hash set tests exact membership in O(1) but can't answer 'what starts with car?' at all — hashing scatters related words to unrelated slots, so the only way to find prefix matches is to scan the entire table. A trie keeps every word with a shared prefix on a shared path, which is exactly what prefix queries need.
Trie vs hash set — when is the hash set actually better?
When you only ever need exact membership or exact-key lookup and never prefixes. A hash set is a flat array with no per-child pointers, so it's smaller and more cache-friendly. The moment you need 'everything starting with X', ordered iteration, or longest-prefix match, the hash set can't help and the trie wins.
What is a radix tree (Patricia tree)?
A compressed trie: any chain of single-child nodes is merged into one edge labeled with the whole substring. Same prefix queries, far fewer nodes — the standard way to cut a plain trie's memory cost, and what Linux routing tables and Redis streams use.
How do you rank suggestions?
Store a frequency (or score) on each word-end node, gather all completions in the prefix's subtree, and return the top-k by frequency, breaking ties alphabetically. To make every keystroke instant, precompute and cache that top-k list at each node so a query just reads it after the walk.
QuizYour autocomplete service stores 50 million words in a plain 256-way (byte-alphabet) trie and is running out of RAM, even though the words themselves are only a few hundred megabytes. What is the most effective first fix?
- Switch to a hash set so lookups become O(1)
- Compress to a radix/Patricia tree (and/or map-based children) to remove single-child chains and null slots
- Add more frequency metadata to each node
- Lower k so fewer completions are returned
Show answer
Compress to a radix/Patricia tree (and/or map-based children) to remove single-child chains and null slots — The memory is going to per-node pointers — a 256-way node carries hundreds of mostly-null child slots, and single-character chains create huge numbers of nodes. Radix compression merges those chains and map-based children drop the null slots, cutting node count and pointer overhead while keeping prefix queries O(prefix length). A hash set would shrink memory but destroy the prefix-query ability the service exists for; the other options don't address node overhead.
In an interview#
Lead with the fit: prefix queries (autocomplete, typeahead, spell-check, IP routing tables) want a trie, because lookup is O(prefix length) and prefixes are shared. Mention ranking by frequency for 'most popular completions', and that you'd cache top-k completions per node so each keystroke is O(1) after the walk.
Be ready to discuss memory: a plain trie has many small nodes, so compress it into a radix/Patricia tree, or use a DAWG/FST to share suffixes too. For a web-scale typeahead, describe sharding the trie by first letters, serving precomputed suggestion lists from cache, and updating frequencies asynchronously from query logs. A common comparison is trie vs. hash set (hash gives exact membership but no prefix search) and trie vs. sorted array + binary search (works for prefixes but is slow to update).
Then open the simulator: INSERT words with frequencies and watch shared prefixes reuse nodes, then type a prefix to walk the tree and produce the frequency-ranked top-k dropdown.
References & further reading#
- Fredkin — Trie Memory, Communications of the ACM (1960) — the paper that introduced the trie
- Morrison — PATRICIA: Practical Algorithm To Retrieve Information Coded in Alphanumeric, JACM (1968) — the original radix/Patricia trie
- McCandless — Using Finite State Transducers in Lucene — how Lucene stores its term dictionary as an FST
- Wikipedia — Deterministic acyclic finite state automaton (DAWG) — sharing suffixes as well as prefixes
- Wikipedia — Radix tree — compressed-trie variants and where they're used
Ready to try it?
The simulator is a real, deterministic implementation — pick a scenario and step through it, scrubbing the timeline forward and backward through every change.