System Internals
Open the simulator →
Ordered data structure

Skip List

Sorted, O(log n) search without the rebalancing a tree needs — using coin flips.

A skip list is a sorted linked list with extra 'express lanes'. By randomly promoting some nodes to higher levels, it lets a search skip over large stretches of the list and only walk node-by-node near its target — giving expected O(log n) search, insert, and delete with far simpler code than a balanced tree. Invented by William Pugh in 1989, it is the structure behind Redis sorted sets, many LSM-tree memtables, and Java's lock-free ConcurrentSkipListMap.

The big picture#

TL;DRthe 30-second version
  • A skip list is a sorted linked list plus randomized 'express lanes' — each node gets a tower of forward pointers whose height is decided by repeated coin flips (promote with probability p, usually 1/2).
  • Search starts at the top-left sentinel, moves right while the next key is smaller than the target, and drops a level when it would overshoot. The geometric thinning of upper levels makes this expected O(log n).
  • Insert and delete run the same search to find each level's predecessor, then splice or unlink one tower — no rotations, no global rebalancing. That makes the code short and easy to make lock-free.
  • It powers Redis sorted sets (ZSET), the LevelDB/RocksDB memtable, and java.util.concurrent.ConcurrentSkipListMap. The classic alternative — a balanced BST (red-black/AVL) — gives a hard worst-case bound but is fiddlier and harder to make concurrent.

Everything below expands on these points. Read the core sections top to bottom for the full mental model; the collapsible "Go deeper" boxes hold the probability math and edge cases you can skip on a first pass and return to later.

every key lives at L0; taller towers are rarer — a box means the key reaches that level

H36912172125303340NILL3H25NILL2H925NILL1H69172533NILL0H36912172125303340NIL
Three express lanes over a sorted base list (p = 1/2)

Start here: the problem it solves#

A sorted linked list is easy to insert into, but searching it is O(n): you have to walk one node at a time because there is no way to jump ahead. An array fixes search with binary search at O(log n), but then every insert or delete shifts elements — O(n) again — so neither plain structure gives you fast ordered search and fast updates at the same time.

The textbook answer is a balanced binary search tree — red-black or AVL — which guarantees O(log n) search, insert, and delete. The catch is the machinery that keeps it balanced: rotations, recoloring, and case analysis that are famously fiddly to implement correctly. Worse, those rotations restructure pointers far from the point of change, which makes a balanced tree hard to update from multiple threads without taking coarse locks.

A skip list gets the same expected O(log n) operations as a balanced tree, but stays balanced on average through randomness instead of rotations. There is no rebalancing step at all — an insert or delete only touches the tower of the affected node and the predecessors that point at it. That locality is what makes the code short and what makes concurrent, even lock-free, implementations practical.

The trade-offSkip lists give probabilistic O(log n) search/insert/delete with simple, rotation-free code that is easy to make concurrent. The cost is extra forward pointers (about two per node on average) and bounds that hold in expectation and with high probability, not as a hard worst case. For the overwhelming majority of systems that is a trade worth making.

The mechanism: stacked linked lists and coin-flip towers#

Picture several linked lists stacked on top of each other. The bottom level (level 0) contains every node in sorted order. Each level above holds a sparse subset of the level below — the express lanes. Every node carries a 'tower' of forward pointers, one per level it participates in, so a tall tower is reachable from several lanes and a height-1 node only from the base list.

Heights are chosen by chance, not by position. When you insert a node you flip a coin to decide its tower height: it is always at level 0; with probability p (typically 1/2) it is also promoted to level 1; with probability p again to level 2; and so on, stopping at the first tails. This makes roughly half the nodes reach level 1, a quarter level 2, an eighth level 3 — the geometric thinning that makes the upper lanes act like a coarse index over the lanes beneath them.

  1. Allocate the new node and run a search to record the predecessor at every level (the 'update' array).
  2. Flip coins to pick the tower height h: start at 1, and while a fair coin says heads and h is below the cap, increment h.
  3. If h exceeds the list's current top level, raise the head sentinel's level so the new tower has somewhere to attach.
  4. For each level 0..h-1, splice the node in: new.next = pred.next; pred.next = new — exactly like a singly linked list insert, repeated per level.
Why randomness instead of balancing?A balanced tree carefully chooses structure to guarantee depth. A skip list lets the coin flips produce a good-enough structure on average — no existing node ever has to move to keep things balanced. Crucially, a node's height is decided locally at insert time and never changes, so updates stay local and the analysis is the same no matter what order keys arrive in.
Go deeperSentinels, the level cap, and back pointers

Real implementations use a head sentinel whose tower is as tall as the current list, and often a NIL/tail sentinel with a +∞ key so the 'move right' loop needs no null check. The maximum level is capped — Pugh suggests MaxLevel = log_{1/p}(N) for the expected size N (16 levels covers ~65k keys at p=1/2; Redis uses 32, covering ~2^64). Capping bounds the pointer overhead and is harmless because exceeding it is astronomically unlikely.

Plain skip lists store only forward pointers, so they are singly linked per level. Implementations that need to walk backwards — Redis ZSETs support ZREVRANGE — add a single back pointer at level 0, turning the base list into a doubly linked list while leaving the express lanes one-directional.

Complexity: why the coin flips give O(log n)#

All three operations — search, insert, delete — are expected O(log n) time. Space is O(n): every node stores its key/value plus its tower of forward pointers, and the towers add only a constant factor on average.

  • Expected levels: a node reaches level k with probability p^k, so the tallest tower in n nodes is ~log_{1/p}(n) levels. At p = 1/2 that is about log2(n) levels.
  • Expected pointers per node: a node's height is a geometric random variable with mean 1/(1−p). At p = 1/2 that is exactly 2 forward pointers per node on average, so total space is ~2n pointers.
  • Expected search cost: ~(1/p)·log_{1/p}(n) comparisons. At p = 1/2 that is ~2·log2(n) — the same Θ(log n) shape as a balanced tree, with a small constant.
Go deeperThe backward-climb argument and high-probability bounds

Pugh analyzes the search path by walking it backwards from the target. At each step the path either climbs up a level (probability p, because the current node has a taller tower) or steps left (probability 1−p). The expected number of leftward steps needed to traverse one level is 1/p, and the number of levels is log_{1/p}(n) with high probability, giving expected cost (1/p)·log_{1/p}(n). Minimizing this over p gives an optimum near p ≈ 1/e ≈ 0.37, but p = 1/2 is the standard choice because it makes the coin flip a single bit and costs almost nothing extra.

The bound is not just 'expected' — it holds with high probability. The probability that search cost exceeds its mean by a constant factor falls off exponentially in n (a Chernoff-style tail). Concretely, the chance the structure degenerates badly shrinks faster than any polynomial, which is why in practice a skip list never visibly misbehaves even though, in principle, an unlucky run of coin flips could.

Tuning pp controls the search-speed/space trade. Lower p (e.g. 1/4) means taller-but-fewer express lanes: fewer pointers per node (1/(1−p) ≈ 1.33) and less memory, but more horizontal steps per level. Higher p (e.g. 1/2) means more pointers per node (2) and faster searches. Redis uses p = 1/4 to save memory; many textbook and in-memory implementations use p = 1/2 for speed.

Variants worth knowing#

  • Deterministic / 1-2-3 skip lists — replace the coin flips with a balance rule (between 1 and 3 nodes of one height between any two of the next height up), giving a guaranteed O(log n) worst case at the cost of more bookkeeping. They are the skip-list analogue of a 2-3-4 tree.
  • Indexable skip lists — store a 'span' count on each forward pointer (how many base nodes it jumps). Summing spans along the search path yields the rank of a key in O(log n), enabling order-statistic queries like 'the 1000th smallest' or ZRANK. Redis ZSETs use exactly this.
  • Concurrent / lock-free skip lists — because updates are local pointer splices, they map naturally onto compare-and-swap (CAS). Java's ConcurrentSkipListMap is a non-blocking design; deletions mark a node logically before physically unlinking it, so readers never see a torn structure. This concurrency-friendliness is a major reason skip lists are chosen over balanced trees in multithreaded code.
Go deeperHow lock-free deletion avoids a half-removed node

Naively CAS-ing a node out level by level can let another thread observe it present at one level and gone at another. Lock-free skip lists (Pugh's concurrent design; Java's ConcurrentSkipListMap) solve this with logical deletion: the node's value is first marked (e.g. set to a sentinel or its next pointer 'marked') with a single CAS, which atomically makes it logically absent. Physical unlinking from each level then happens lazily, and any thread that trips over a marked node helps finish the unlink before proceeding. Readers therefore always see a consistent linearizable view without locks.

Trade-offs and when to reach for one#

A skip list is the right call when you want an ordered map with O(log n) operations and you value simple, concurrency-friendly code over a hard worst-case guarantee. It is a weaker fit when you need a strict bound, the tightest possible memory, or cache-optimal scans over huge on-disk datasets.

  • For: in-memory ordered maps/sets, ranking and range queries, and anything multithreaded — the local, rotation-free updates make lock-free implementations tractable where a balanced tree would need coarse locking.
  • Against (hard real-time): the O(log n) bound is probabilistic. A safety-critical system that cannot tolerate even an astronomically rare slow operation should prefer a deterministic balanced tree or a deterministic skip list.
  • Against (memory-tight or on-disk): towers cost extra pointers and the nodes are scattered in memory, so cache locality is worse than a packed array or a B-tree. For disk-resident data, a B-tree's high fan-out and page alignment win decisively — which is why databases index on disk with B-trees, not skip lists.
Simplicity is the headline featurePugh's original motivation was not raw speed — a skip list and a balanced tree have the same asymptotics — but that the skip list is dramatically easier to implement, reason about, and parallelize. In an interview, 'simpler than a balanced tree and naturally concurrent' is the answer to 'why would anyone use this instead of a red-black tree?'

Failure modes: the unlucky worst case#

The pathological case is real but exponentially unlikely: if every coin flip comes up tails, every node lands at level 0 and the skip list degenerates into a plain linked list with O(n) search. Less dramatically, an unlucky distribution of heights can make some searches longer than the mean.

Two properties keep this from mattering. First, the bad cases are vanishingly rare — the probability of degradation by a constant factor falls off exponentially in n, so a million-key list effectively never misbehaves. Second, the randomness comes from the structure's own coin flips, not from the keys, so there is no adversarial input ordering that forces the worst case the way sorted insertions degrade a naive BST. (If an attacker could observe or control the RNG, they could in principle bias heights — so security-sensitive deployments seed the RNG unpredictably.)

The p tuning trade, restated as a failure tradePush p too high and you spend memory on pointers for little search gain; push p too low and searches do more horizontal walking per level, widening the tail of slow operations. p = 1/2 (speed) and p = 1/4 (memory, Redis's choice) bracket the sweet spot for almost all workloads.

Skip list vs balanced BST vs B-tree vs hash table#

Skip listBalanced BST (red-black)B-treeHash table
Search / insert / deleteO(log n) expectedO(log n) worst caseO(log n) worst caseO(1) average
Ordered ops (range, rank, min/max)Yes — base list + spansYes — in-order traversalYes — sorted leavesNo — unordered
Worst-case guaranteeProbabilisticDeterministicDeterministicProbabilistic (O(n) on collisions)
Implementation simplicitySimple, no rotationsFiddly rotations/recoloringComplex (splits/merges)Simple
ConcurrencyEasy — local CAS, lock-free existsHard — rotations touch far nodesHard — node splits lock subtreesModerate — bucket/stripe locks
Memory localityPoor — scattered towersPoor — scattered nodesExcellent — packed pagesGood — contiguous buckets
Best homeIn-memory ordered + concurrentIn-memory ordered, strict boundOn-disk / large ordered indexesUnordered key lookup

Read this table as: hash tables win when you do not need order; B-trees win on disk because of fan-out and locality; balanced BSTs win when you need a hard bound in memory; and skip lists win when you want ordered, in-memory operations with simple code that is easy to make concurrent.

Where skip lists run in the wild#

  • Redis sorted sets (ZSET) — the canonical example. A ZSET is a skip list ordered by score (with a back pointer for reverse range, and span counts for ZRANK) paired with a hash map from member to score. Small ZSETs use a compact listpack; large ones switch to the skip list. This is why ZADD/ZRANGE/ZRANK are all O(log n).
  • LevelDB & RocksDB memtables — the in-memory write buffer in front of an LSM tree is a skip list. It must support concurrent sorted inserts and ordered iteration during a flush; the skip list's lock-friendly inserts fit perfectly. (The LSM explainer's memtable is exactly this.)
  • Java's java.util.concurrent.ConcurrentSkipListMap / ConcurrentSkipListSet — the standard library's concurrent sorted map, a lock-free skip list. It is the go-to when you need a thread-safe NavigableMap.
  • Apache Lucene & HBase — Lucene has used skip lists within postings to skip over document IDs during query evaluation; HBase's in-memory store (MemStore) is skip-list backed (Java's ConcurrentSkipListMap), mirroring the LSM memtable pattern.
A recurring shapeNotice the pattern: skip lists show up wherever you need an ordered, in-memory index that several threads touch at once — Redis ZSETs, LSM memtables, concurrent maps, search postings. That is precisely the niche where 'ordered + simple + concurrent' beats a balanced tree.

Common misconceptions & gotchas#

Why use randomness instead of a balanced tree?

Same asymptotics, far less code. A skip list reaches O(log n) without rotations, recoloring, or the case analysis a red-black/AVL tree needs — and because updates are local pointer splices, it is much easier to make concurrent (even lock-free). The price is a probabilistic rather than deterministic bound.

Is the O(log n) guaranteed?

No — it is expected and holds with high probability, not as a hard worst case. With catastrophically unlucky coin flips the list degenerates toward O(n), but the probability of degrading by any constant factor falls off exponentially in n, so a large list effectively never misbehaves. The randomness is internal, so there is no adversarial key ordering that forces the worst case.

Why p = 1/2?

It makes the height a single coin flip and balances search speed against memory: ~2 pointers per node and ~2·log2(n) search steps. The theoretical optimum for search time is around p = 1/e ≈ 0.37, and memory-sensitive systems like Redis use p = 1/4 to cut pointers to ~1.33 per node at the cost of slightly longer walks per level.

How do concurrent skip lists work?

Updates are local, so they map onto compare-and-swap. Lock-free designs (Pugh's, Java's ConcurrentSkipListMap) delete in two phases: a single CAS logically marks the node absent, then physical unlinking from each level happens lazily, with passing threads helping finish it. Readers always see a consistent, linearizable structure without taking locks.

QuizRedis chose p = 1/4 (rather than 1/2) for its sorted-set skip list. What is it primarily optimizing, and what does it give up?

  1. Optimizing search speed; giving up worst-case guarantees
  2. Optimizing memory (fewer pointers per node); giving up some search speed
  3. Optimizing concurrency; giving up ordered iteration
  4. Optimizing insert speed; giving up delete speed
Show answer

Optimizing memory (fewer pointers per node); giving up some search speedAverage pointers per node is 1/(1−p): 2.0 at p=1/2 but only ~1.33 at p=1/4. Lowering p makes towers taller-but-rarer, cutting per-node pointer memory — important when a server holds millions of sorted-set members. The cost is more horizontal steps per level, i.e. slightly slower searches. Concurrency and ordered iteration are unaffected by p.

In an interview#

Lead with the pitch: a skip list gives balanced-tree performance (expected O(log n) search/insert/delete) with much simpler, rotation-free code that is easy to make concurrent — at the cost of expected rather than worst-case bounds. Name the uses: Redis sorted sets (ZSET), LevelDB/RocksDB memtables, and Java's ConcurrentSkipListMap.

Be ready to explain the coin-flip promotion (geometric level distribution, ~2 pointers/node and ~log2(n) levels at p=1/2), the skim-high/drop-low search, and why there is no rebalancing. Expect the follow-up about the worst case — with terrible luck every node lands at level 0 and search degrades to O(n), which is vanishingly unlikely, which is why the bound is probabilistic. A strong candidate also mentions span counts for O(log n) rank queries and the two-phase logical-delete trick that makes lock-free versions safe.

Then open the simulator: INSERT a spread of keys and watch towers of different heights appear, SEARCH a key and follow the path skipping across the express lanes, then DELETE a key and see it unlinked from every level of its tower.

References & further reading#

References