Quadtree · System Internals

The big picture#

TL;DRthe 30-second version

A quadtree indexes points in 2D by recursively cutting a square into four quadrants — NW, NE, SW, SE — refining only where points are dense.
Points live in leaf cells. A cell splits into four children when it overflows a small capacity; sparse regions stay one big cell, dense regions get deep.
A range or radius query walks from the root and prunes any quadrant whose bounding box can't intersect the query region — touching only a sliver of the tree.
Query and insert are ~O(log n) when points are well spread, but degrade to O(n) for heavily clustered or coincident points unless you cap the depth.
It's the spatial cousin of the B-tree: where a 1D index sorts on one key, a quadtree partitions two dimensions at once. Relatives: k-d trees, R-trees, geohash, S2, H3.

Everything below expands on these points. Read the core sections top to bottom for the full mental model; the collapsible "Go deeper" boxes hold the advanced internals (worst-case analysis, depth caps, variant trade-offs) you can skip on a first pass and return to later.

a cell splits into NW / NE / SW / SE only when it overflows — depth grows where data is dense

worldthe root cell
- NWdense → split again
  - NW
  - NE
  - SW
  - SE
- NE
- SW
- SE

One square, recursively cut into four

Start here: the problem it solves#

You have millions of points — drivers, restaurants, sensors — each with a latitude and longitude, and you constantly need to answer 'which ones are within 2km of this location?'. Checking the distance to every single point is O(n) per query and hopeless at scale: a ride-hailing service matching riders to nearby drivers can't scan every driver on the continent for each request.

The obvious fix — put a normal database B-tree index on the coordinates — barely helps, and understanding why is the whole motivation for a spatial index. A B-tree sorts on a single key. Index latitude, and you can quickly grab every point in a latitude band; but that band wraps the entire globe east-to-west, so it still contains far too many points. Worse, nearness is genuinely two-dimensional: two points can be one degree apart in latitude (close on that axis) yet be on opposite sides of the planet in longitude. A 1D order can keep neighbors-on-one-axis together, but it can't keep neighbors-in-the-plane together — sorting forces you to pick an axis, and the other axis scatters.

Why one dimension can't index twoAny 1D sort imposes a total order on points; 2D nearness is a partial relationship with no total order that preserves it. A range query 'lat in [a,b] AND lon in [c,d]' over two separate B-tree indexes still has to intersect two long candidate lists. A composite (lat, lon) index helps the first axis only — once you fix latitude, longitude is unsorted within it. You need a structure that partitions both axes together.

A spatial index organizes points by where they are, so a query can rule out whole areas at once. A quadtree does this by carving space into a hierarchy of squares: ask only the squares that overlap your search area, and ignore everything else. The payoff is turning a nearby search from O(n) into roughly O(log n), while adapting automatically to how the data clusters.

The mechanism: squares within squares#

The root node covers the whole world as one square (its bounding box). Each node holds up to a fixed capacity of points — often a small number like 4. When a node overflows, it subdivides into four equal quadrants — NW, NE, SW, SE — and pushes its points down into whichever child quadrant contains each one. That node is now an internal node; its points live in its children. Points only ever live in leaf cells.

Start with one square (the root) covering all of space, holding a list of points.
Insert a point: descend from the root, at each internal node picking the child quadrant (NW/NE/SW/SE) whose box contains the point, until you reach a leaf.
Add the point to that leaf. If the leaf now exceeds its capacity, split it into four equal quadrants and redistribute its points into the children.
If all the points fall into the same child (clustered), that child immediately overflows too — splitting cascades downward until they separate or a depth cap stops it.
Repeat — dense regions keep splitting and get deeper; sparse regions stay shallow as a single coarse cell.

Adaptive, not uniformUnlike a fixed grid, a quadtree only refines where the data is. A city full of points becomes a deep, fine subtree; an empty ocean stays a single big cell. That data-adaptivity is the key advantage over a uniform grid, which must pick one cell size everywhere — too coarse for the city, too fine (and mostly empty) for the ocean.

Go deeperCapacity, depth caps, and coincident points

Capacity (the bucket size before a leaf splits) trades tree height against per-leaf scan cost. A small capacity means a deeper tree with tiny leaves — more pointer-chasing but fewer distance checks per leaf. A larger capacity means a shallower tree with fatter leaves — less memory overhead, but each leaf scan touches more points. Typical implementations use 4–16.

A depth cap is mandatory in practice. If many points share the same (or nearly the same) coordinates, no subdivision can ever separate them: all four children keep funnelling them into the same quadrant, and the tree tries to recurse forever. Capping the depth turns the bottom cell into an oversized bucket that simply holds all the coincident points in a list — the query still works, it just scans that list linearly.

Querying: prune regions you can't reach#

To find every point within a radius of a location, walk the tree from the root. At each node, check whether its square (bounding box) intersects the search circle. If it doesn't, skip the entire subtree — none of those points can possibly be in range. If it does, descend into its children; at a leaf, distance-check only the handful of points it holds and keep the ones inside the radius.

highlighted = descended (box overlaps the circle) · dashed = pruned (box misses it)

rootbox hits circle → descend
- NWbox misses → PRUNE
  - a
  - b
  - SW
- NEbox hits → descend
  - leaf pointsdistance-checked
- SWbox misses → PRUNE
- SEbox misses → PRUNE

A radius query prunes non-overlapping quadrants

Cell's box doesn't intersect the search circle ⇒ prune it and everything inside, with zero further work.
Cell's box intersects ⇒ descend into its four quadrants and test each the same way.
Leaf that intersects ⇒ distance-check just its few points and collect those within radius.

Because most of the world's cells don't intersect a small search circle, the query touches only a tiny fraction of the tree and distance-checks only a few points instead of all of them. The cheap box-vs-circle test at each node is what lets you discard millions of points without ever looking at them individually. That pruning is the entire value of a spatial index. Nearest-neighbor queries work the same way, but instead of a fixed radius they use a best-so-far distance to prune: once you've found a candidate at distance d, any cell whose box is farther than d can be skipped.

PredictYour search circle sits entirely inside one leaf cell's square. How much of the rest of the tree does the query examine?

Hint: What does the box-intersection test say at the root's other three children?

Almost none of it. From the root, only the one child whose box contains the circle intersects it; the other three quadrants are pruned immediately, and recursively their subtrees are never visited. The query descends a single path to that leaf and distance-checks just the points there. This is the best case — O(depth) ≈ O(log n) — and it's why pruning, not the tree itself, is the source of the speedup.

Complexity: when it's fast, and when it isn't#

When points are reasonably well distributed, a quadtree behaves like a balanced search tree. Its depth is about log4(n) — every level cuts the space (and roughly the point count) by four — so descending to a leaf is O(log n). Insert pays the same cost: find the leaf, then occasionally split. A point or small-radius query that touches few cells is O(log n + k), where k is the number of results reported.

Insert: O(log n) average — descend to a leaf, append, occasionally split. Each split is O(capacity), a small constant.
Point / small-range query: O(log n + k) average, where k is the matches returned. Pruning keeps the visited cells small.
Space: O(n) for the points, plus internal nodes. Heavily subdivided regions add overhead — each split allocates four child cells even if some stay empty.
Worst case (clustered or coincident points): O(n). The tree degenerates into a long thin chain, descent walks every level, and the box test stops pruning.

The worst case is real, not theoreticalUnlike a B-tree, a quadtree does not self-balance. Its shape is dictated entirely by the data's spatial distribution. Pile a million points into one tiny area and the tree becomes a degenerate chain of single-occupied quadrants all the way down to the depth cap — every operation along that path is O(depth), and with no cap the depth (and memory) is effectively unbounded. Balanced behavior is a property of well-spread data, not a guarantee of the structure.

Go deeperWhy depth depends on the data, not just n

For uniformly random points, expected depth is O(log n) and total nodes O(n), giving the clean averages above. But the worst case is governed by the spatial spread of the keys: two points at distance d in a domain of width W force subdivision down to a cell smaller than d, i.e. about log2(W/d) levels — regardless of how many points there are. Near-coincident points (tiny d) therefore drive depth toward the cap independent of n.

This is the core reason quadtrees are simple but fragile: their cost is data-dependent. k-d trees address one symptom (imbalance) by splitting at the median point so each cut halves the point count, guaranteeing O(log n) depth — but they lose the quadtree's fixed, predictable cell geometry. R-trees address another (clustering and updates) by grouping nearby objects into balanced, overlapping bounding rectangles with guaranteed height. Each picks a different point on the simplicity-versus-robustness curve.

Variants and relatives#

"Quadtree" is a family, not a single structure. The differences come down to what lives in a node and how space is split.

Point quadtree — the original (Finkel & Bentley, 1974). Each inserted point itself becomes a node and splits space at its own coordinates into four (unequal) quadrants. Shape depends on insertion order.
PR (point-region) quadtree — the common modern form: space is split into equal quadrants regardless of the points, and points sit in leaf buckets. Shape depends only on point positions, not insertion order — this is the one in the simulator.
Region quadtree — used for images and rasters: recursively split a grid until each cell is uniform (e.g. all-black or all-white pixels), giving a compact representation of large flat areas. The basis of quadtree image compression.
k-d tree — generalizes to k dimensions by splitting on one axis at a time (alternating x, y, …) at the median point. Balanced by construction; great for k-nearest-neighbor in higher dimensions.
R-tree (Guttman, 1984) — indexes rectangles/extents, not just points, by grouping them into balanced, possibly overlapping bounding boxes. The workhorse inside PostGIS, SQLite R*Tree, and most geospatial databases.
Geohash — encodes a 2D coordinate into a 1D base-32 string by interleaving latitude/longitude bits; shared prefixes mean spatial proximity, so an ordinary B-tree or sorted key store becomes a spatial index. Easy to shard.
Google S2 / Uber H3 — production global grid systems. S2 projects the sphere onto a cube and uses a Hilbert-curve cell ID (a quadtree-on-a-sphere); H3 tiles the globe with hexagons (no diagonal-neighbor ambiguity). Both give stable cell IDs for sharding and aggregation.

The jump from 2D to 3D follows the same recipe with eight children instead of four: an octree. It's used for 3D collision detection, voxel worlds (Minecraft-style), point-cloud indexing, and graphics culling.

Trade-offs: simplicity vs. robustness#

A quadtree's biggest selling point is conceptual and implementation simplicity. Splitting a square into four equal quadrants is trivial; the cell geometry is fixed and easy to reason about; and range/radius pruning is a one-line box test. For in-memory, point-only, moderately-distributed workloads it's hard to beat on effort-to-value.

For: dead simple to build and debug; fixed, predictable cell boundaries; adapts to clustering by refining only where needed; pruning makes range and radius queries fast.
Against: no self-balancing, so it's sensitive to skewed/clustered data; uneven depth and degenerate chains; updates can trigger cascades of splits; equal-quadrant splits can waste space on empty children.
Versus k-d tree: k-d trees guarantee balance (median splits) and handle higher dimensions, but lose the fixed cell geometry and are fiddlier to update incrementally.
Versus R-tree: R-trees stay balanced, index extents (not just points), and handle dynamic updates well — at the cost of overlapping bounding boxes that can force exploring multiple branches per query, and a more complex implementation.

Rule of thumbReach for a quadtree when data is points, mostly in memory, and not pathologically clustered, and you value simplicity. Reach for an R-tree when you index rectangles/shapes, need balance and durable on-disk updates (a database), or when data is dynamic. Reach for geohash/S2/H3 when you need to shard space across many machines using a stable 1D cell ID.

Failure modes#

Most quadtree problems trace back to the same root cause: the tree's shape is dictated by the data, and pathological data makes a pathological tree.

Degenerate deep trees — heavily clustered or near-coincident points force subdivision far past the point of usefulness, producing a long chain of single-occupied cells. Descent and query cost climb toward O(n).
Unbounded subdivision — coincident points (exact duplicates) can never be separated by splitting, so without a depth cap the tree recurses forever and exhausts memory. Always cap depth and let the bottom cell hold a bucket.
Imbalance — even short of degeneracy, real-world skew (everyone in the city, no one in the desert) makes some branches far deeper than others, so latency varies wildly by query location.
Costly updates — frequent inserts/deletes in dense regions trigger repeated split (and, on delete, merge) cascades. Moving objects (e.g. live drivers) that constantly cross cell boundaries make this worse; many systems rebuild periodically instead of updating in place.
Empty-child overhead — equal-quadrant splitting allocates all four children even when points fall into one, wasting memory in sparse-but-deep regions.

MitigationsCap the maximum depth and let leaf buckets absorb coincident points. Tune capacity so leaves aren't too tiny. For high-churn or moving-point workloads, prefer periodic bulk rebuilds, a grid + quadtree hybrid, or an R-tree designed for dynamic updates. For global scale, shard with geohash/S2/H3 so each shard's quadtree stays small.

Quadtree vs. k-d tree vs. R-tree vs. geohash#

	Quadtree	k-d tree	R-tree	Geohash
Indexes	Points (2D)	Points (k-D)	Points & rectangles	Points → 1D string
Build	Simple, top-down splits	Median splits per axis	Complex, balanced	Trivial bit-interleave
Query	Box-prune, ~O(log n)	~O(log n), good k-NN	~O(log n), balanced	Prefix range scan
Updates	Split cascades; no rebalance	Hard to keep balanced	Designed for dynamic	Just insert a key
Clustering	Degrades (unbalanced)	Stays balanced	Stays balanced	Edge-of-cell misses
Best for	In-memory point sets, games	k-NN, higher dimensions	Geospatial DBs (PostGIS)	Sharding, key-value stores

A subtlety with geohash: because it linearizes 2D into 1D, two points can be physically close yet land in different prefix buckets (the classic 'edge of the cell' problem). Real systems query the cell plus its eight neighbors to compensate. Quadtrees, k-d trees, and R-trees don't have this artifact because they prune in the plane directly.

Where quadtrees run in the wild#

Anywhere a system asks 'what's near this point?' or 'what's in this region?' over many 2D items, a quadtree or one of its relatives is probably involved.

Games & graphics — collision detection (only test objects in the same/neighboring cells, not every pair), view-frustum culling (skip whole regions off-screen), and spatial hashing for particle systems. Octrees do the 3D version.
Image compression — region quadtrees collapse uniform pixel areas into single nodes, compactly representing large flat regions; the same idea drives mesh refinement in simulations.
Map tiling — web maps (Google Maps, Mapbox, OpenStreetMap) serve square tiles addressed by a quadtree-style z/x/y scheme; zooming in descends one level and quadruples the tiles.
Geospatial databases — PostGIS and most spatial DBs use R-trees (a quadtree relative) for ST_Intersects/ST_DWithin; Elasticsearch's geo queries use a BKD/geohash grid; MongoDB uses S2 cells for its 2dsphere index.
Ride-hailing & 'near me' — Uber's H3 and Google's S2 partition the globe into cells so 'find drivers within 2km' becomes 'scan a handful of nearby cells', and the cell IDs double as shard keys.
Scientific computing — N-body and particle simulations use quadtrees/octrees (Barnes–Hut) to approximate far-away groups of bodies as a single mass, turning O(n²) force computation into O(n log n).

Common misconceptions & gotchas#

What makes a quadtree slow?

Clustered or coincident points. The tree's shape mirrors the data, so when many points pile into a small area the tree becomes a deep, thin, unbalanced chain — descent and queries drift toward O(n), and the box-pruning test stops eliminating much. Without a depth cap, coincident points can even make subdivision (and memory) unbounded. Well-spread data, a depth cap, and a sensible leaf capacity keep it fast.

Quadtree vs. k-d tree — when does each win?

Quadtrees split space into four equal quadrants regardless of the data, giving fixed, simple cell geometry that's easy to build and reason about, but no balance guarantee. k-d trees split on the median point one axis at a time, guaranteeing O(log n) balanced depth and generalizing cleanly to higher dimensions (great for k-nearest-neighbor), at the cost of irregular cells and trickier incremental updates. Pick quadtree for simple 2D point work; pick k-d tree for balance and higher dimensions.

How does a range query prune?

At each node it does a cheap test: does this cell's bounding box intersect the query region (rectangle or circle)? If not, the entire subtree is skipped — none of its points can match — so a single test discards potentially millions of points. If yes, it recurses into the four children. Only leaves whose boxes overlap the query are scanned point-by-point. The pruning, not the tree itself, is where the speedup comes from.

What about 3D?

Use an octree — the same idea with eight children (split each of the three axes), one per corner of a cube. Octrees power 3D collision detection, voxel worlds, point-cloud indexing, and graphics culling. The general k-dimensional analogue is the k-d tree, which scales to dimensions where 2^k children would be far too many.

QuizAn app stores live driver locations in a single global quadtree. 90% of drivers are in three dense downtown cores, and they move constantly. Queries near downtown are slow and inserts thrash. What's the best fix?

Remove the depth cap so dense cells can subdivide further
Shard space (e.g. geohash/S2/H3) so each region has its own small tree, and rebuild hot trees periodically instead of churning splits
Switch to a single 1D B-tree on latitude
Increase leaf capacity to 1,000,000 so nothing ever splits

Show answer

Shard space (e.g. geohash/S2/H3) so each region has its own small tree, and rebuild hot trees periodically instead of churning splits — The problems are clustering (downtown), churn (moving points), and global scope. Sharding space distributes the load and keeps each tree small, and periodic rebuilds avoid split/merge thrash from constantly-moving points. Removing the depth cap makes degenerate depth worse; a 1D B-tree can't do 2D nearness; an enormous capacity turns every query into a linear scan of a giant bucket.

In an interview#

Lead with the problem: nearby search over many geo-points needs a 2D spatial index, not a 1D B-tree — and explain why one dimension can't capture two-dimensional nearness. A quadtree recursively subdivides space into NW/NE/SW/SE quadrants, holds points in leaf buckets, splits a cell when it overflows, and prunes non-overlapping quadrants on query to get roughly O(log n) lookups. Name where it fits: ride-hailing driver matching, 'restaurants near me', collision detection, map tiling.

Show you know the failure mode: it doesn't self-balance, so clustered or coincident points degrade it to O(n) — mention the depth cap and leaf buckets as the fix. Then compare alternatives: k-d trees (balanced, higher dimensions, k-NN), R-trees (balanced, index rectangles, geospatial databases like PostGIS), and geohash/S2/H3 (linearize 2D to 1D for sharding and prefix scans, with the edge-of-cell caveat). For 3D, say octree.

Then open the simulator: INSERT points (or click the map) and watch dense cells subdivide, then QUERY a radius and watch whole quadrants get pruned while only the cells near your circle are scanned.

References & further reading#

References

Finkel & Bentley — Quad Trees: A Data Structure for Retrieval on Composite Keys (1974) — the original paper that introduced and named the quadtree
Samet — The Quadtree and Related Hierarchical Data Structures (ACM Computing Surveys, 1984) — the definitive survey; Samet's books are the canonical reference on spatial data structures
Guttman — R-trees: A Dynamic Index Structure for Spatial Searching (1984) — the balanced, rectangle-indexing relative used in most geospatial databases
PostGIS Workshop — Spatial Indexing — how R-tree spatial indexes are used in a production geospatial database
Uber H3 — hexagonal global grid system — production grid for 'near me' queries and spatial sharding
Google S2 Geometry — quadtree-on-a-sphere using Hilbert-curve cell IDs
Wikipedia — Quadtree — accessible overview of point, PR, and region quadtrees with pseudocode

Ready to try it?

The simulator is a real, deterministic implementation — pick a scenario and step through it, scrubbing the timeline forward and backward through every change.

Try these in the simulator

Insert points, then a radius query →Scatter ten points so the dense cluster forces cells to subdivide, then run a radius query that prunes the cells it can't possibly touch.A dense cluster subdivides →Drop several points into one tight corner and watch a single cell split into quadrants again and again as it overflows its capacity.A radius query prunes cells →With points spread across the map, a small radius query visits only the cells near the target and skips the rest — that pruning is what makes it sub-linear.

Open the Quadtree simulator →

Up nextLoad Balancer

← Back to the learning path