The Event Loop · System Internals

The big picture#

TL;DR: the 30-second version

The problem is concurrency, not throughput: a server must hold tens of thousands of connections open at once, most of them idle, without spending a whole OS thread on each. That's the C10k problem.
The event loop is one thread running a tight cycle: ask the OS which sockets are ready (non-blocking I/O + a readiness API — epoll/kqueue/IOCP), run the callback for each ready socket, then loop. This is the reactor pattern.
Cost is O(ready) work per tick, not O(connections): an idle socket costs a few kilobytes of kernel state, not a megabyte-sized thread stack. One thread can multiplex thousands of clients.
The catch: there is only one thread, so any CPU-bound or blocking call inside a callback stalls every other connection. The discipline is 'never block the loop' — offload heavy work to a thread/worker pool.
It's the model behind Node.js/libuv, nginx, Redis (single-threaded command execution), Netty, and Python's asyncio. The alternative — thread-per-request — is simpler to write but falls over under high connection counts.

Everything below expands on these points. Read the core sections top to bottom for the full mental model; the collapsible "Go deeper" boxes hold the advanced internals (loop phases, the proactor variant, libuv's thread pool) you can skip on a first pass and return to later.

[Figure: One thread, three repeating steps. Ready sockets (thousands, mostly idle) → epoll_wait() asks which fds are ready and wakes on readiness, returning e.g. [6, 8] → run callbacks, one per ready fd, O(ready).]

Start here: why not a thread per connection?#

The classic Apache-style model spawns a thread (or process) per connection. It's simple to reason about — each connection gets its own call stack, and you write plain top-to-bottom blocking code: read the request, do the work, write the response. But threads aren't free. Each one needs its own stack (commonly 1–8 MB of address space reserved), a kernel scheduling slot, and bookkeeping. Run ten thousand of them and you've reserved gigabytes of memory for stacks alone, and the scheduler now has ten thousand things to juggle.

The deeper cost is context switching. When a thread blocks on a slow socket, the OS parks it and switches to another: saving and restoring registers, switching address spaces when the next thread belongs to a different process, and polluting CPU caches. A handful of switches is nothing; tens of thousands per second is pure overhead, work the CPU does instead of serving requests. This is the wall Dan Kegel named in 1999 as the C10k problem: can a single server handle ten thousand concurrent clients? (Today the bar is often C10M.)

Most connections are idle most of the time. A typical connection spends almost all of its life waiting: for the next request, for a slow mobile client to send bytes, for a database round-trip, for nothing in particular. Dedicating a whole thread (megabytes of stack, a kernel scheduling slot) to mostly-idle work is wasteful. The event loop's whole premise is: don't pay for a thread per connection, pay for a thread per piece of actual, ready-to-run work.

Note the distinction this whole topic turns on: concurrency (managing many things in progress) is not the same as parallelism (doing many things at the same instant). A web server's bottleneck is usually concurrency — thousands of connections in flight, almost all waiting on I/O — not raw CPU parallelism. The event loop attacks the concurrency problem with a single thread; parallelism, if you need it, comes from running several loops.

The accept queue: where new clients wait#

When a client connects, it doesn't get handed straight to your application code. The kernel completes the TCP handshake and places the finished connection into a backlog queue — the listen backlog you configure when you call listen(socket, backlogSize) — and your application calls accept() to pull one connection off that queue.

If your app is slow to call accept() — busy servicing other clients, or stalled on a blocking call — the backlog fills up. Once it's full, the kernel refuses new connections outright (or drops the handshake so the client retries), even though, from the network's point of view, the new client did everything right.

Backlog overflow is a real production failure mode. A server that's overwhelmed doesn't usually fail mid-request; it fails by refusing new connections before your application code (or even your access logs) ever sees the client. On Linux, watch for 'SYNs to LISTEN sockets dropped' in netstat -s, and check that net.core.somaxconn is not capping the backlog you asked for. Monitoring accept-queue depth is a standard way to detect this before users report timeouts.
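To make the backlog concrete, here is a minimal sketch in Python (the function name `queue_then_accept` and the loopback address are illustrative assumptions, not part of the original text). It connects several clients before the server ever calls accept(), showing that the kernel completes the handshakes and parks the connections in the queue on its own.

```python
import socket

def queue_then_accept(n_clients: int, backlog: int = 16) -> int:
    """Connect n_clients BEFORE calling accept(), then drain the backlog."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(backlog)          # backlog = size of the kernel's queue
    addr = listener.getsockname()

    # Each connect() returns once the kernel finishes the TCP handshake.
    # The application has not called accept() yet; the connections wait
    # in the listen backlog.
    clients = [socket.create_connection(addr, timeout=2) for _ in range(n_clients)]

    listener.settimeout(2)
    accepted = []
    for _ in range(n_clients):
        conn, _peer = listener.accept()   # pull one connection off the queue
        accepted.append(conn)

    count = len(accepted)
    for sock in accepted + clients + [listener]:
        sock.close()
    return count

print(queue_then_accept(3))
```

The sketch stays well under the backlog size on purpose. What happens past the limit (refusal versus silent drop and client retry) depends on the operating system and its settings, so it is not demonstrated here.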

The mechanism: readiness, callbacks, repeat#

Once connections are accepted, something has to read their requests and write responses. Instead of a thread per connection, Redis, Node.js, and nginx run a single thread that never blocks on I/O. The trick has two halves. First, every socket is put in non-blocking mode, so a read() that has no data returns immediately with EWOULDBLOCK instead of parking the thread. Second, a readiness API lets the thread ask the OS, in one call, which of its thousands of sockets are ready right now.
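The first half of the trick can be seen in a few lines. This is a small sketch using Python's standard socket module; the helper name `read_without_blocking` is made up for illustration, and a socketpair stands in for a real client connection.

```python
import socket
from typing import Optional

def read_without_blocking(sock: socket.socket) -> Optional[bytes]:
    """Return whatever bytes are available, or None if there are none yet."""
    try:
        return sock.recv(4096)
    except BlockingIOError:
        # The OS reported EAGAIN/EWOULDBLOCK: nothing to read right now,
        # so we return to the caller instead of parking the thread.
        return None

# A connected pair of sockets stands in for a client and its server-side socket.
server_side, client_side = socket.socketpair()
server_side.setblocking(False)

first = read_without_blocking(server_side)    # no data has been sent yet
client_side.sendall(b"PING")
second = read_without_blocking(server_side)   # now there is data to read

server_side.close()
client_side.close()
print(first, second)
```

On its own this would force the thread to poll every socket in a loop, which is why the second half, a readiness API, matters.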

That readiness API is epoll on Linux, kqueue on BSD/macOS, and the portable-but-slow select/poll almost everywhere; Windows fills the same role with IOCP, which reports completed operations rather than readiness. epoll and kqueue are the scalable ones: you register interest in a set of file descriptors once, and each wait call returns only the descriptors that became ready, so the cost is O(ready), not O(registered). (select rescans the entire fd set every call, which is exactly why it doesn't scale to C10k.) The thread then loops:

epoll_wait() blocks until at least one registered socket is ready (or a timer is due) — this costs zero CPU while idle; the thread is asleep in the kernel.
The OS returns the list of ready file descriptors: new connections waiting in the accept backlog, established sockets with unread data, and sockets that are now writable.
The loop dispatches each event to its callback (its handler) — accept() a new connection, or read()/parse/process/write() a request — never blocking on any single socket.
Pending tasks queued by those callbacks (timers that came due, completed async work) are drained, then the loop goes back to epoll_wait() and sleeps again.
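The four steps above can be sketched with Python's standard selectors module, which wraps epoll or kqueue where available. This is an illustrative toy, not how any of the servers named here is implemented: the handler names (`on_accept`, `on_readable`), the uppercase "echo" behavior, and the hand-driven `tick` function are all assumptions made for the example, and timers and partial writes are left out.

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS

def on_readable(conn):
    data = conn.recv(4096)           # the socket was reported ready
    if data:
        conn.sendall(data.upper())   # toy "process and write"; tiny reply only
    else:                            # empty read: the peer closed
        sel.unregister(conn)
        conn.close()

def on_accept(listener):
    conn, _addr = listener.accept()  # pull one connection off the backlog
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, data=on_readable)

def tick(timeout=1.0):
    """One loop iteration: wait for readiness, then dispatch each callback."""
    ready = sel.select(timeout)      # sleeps in the kernel until something is ready
    for key, _mask in ready:
        key.data(key.fileobj)        # key.data holds the registered callback
    return len(ready)

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(16)
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ, data=on_accept)

# Drive the loop by hand: one client, three ticks.
client = socket.create_connection(listener.getsockname(), timeout=2)
tick()                       # listener is readable, so on_accept runs
client.sendall(b"ping")
tick()                       # connection is readable, so on_readable replies
reply = client.recv(4096)
client.close()
tick()                       # empty read: connection is unregistered and closed
print(reply)
```

A production loop would run `tick` forever and also register for writability instead of calling sendall directly, since a large reply to a slow client could otherwise fail on a non-blocking socket.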

This shape has a name: the reactor pattern (Schmidt). A 'synchronous event demultiplexer' (epoll) waits on many handles; when events arrive, a dispatcher routes each to the handler registered for it. The application registers callbacks and gives up control to the loop, which calls back when there's something to do: 'don't call us, we'll call you.' Node, nginx, libuv, Netty, and asyncio are all built around this pattern, with different ergonomics on top.

One thread, but still concurrent. From the outside, thousands of clients appear to be served "at once", but only one socket is actually being touched at any instant. Concurrency comes from rapidly interleaving many short operations, not from parallel execution. This only works if handlers never block; a single slow synchronous call stalls every other connection on that thread.

The payoff: a shared store with no locks#

Here's why the event loop model is more than just a clever way to avoid threads: because only one thread ever touches application state, that state needs no locking. Redis's entire dataset is a set of in-memory data structures that every client command reads and writes directly — no mutexes, no risk of one client seeing another's write half-applied, no lock contention sapping throughput as you add clients.

Concretely: a client writes SET foo bar into its socket. That command doesn't execute immediately — it just sits there until the event loop, on some future iteration, notices the socket is readable, reads the command, and runs it against the shared store. If two clients both have a command pending, the loop runs them one at a time, in whatever order it got to them — never both at once.
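A toy model makes the "one at a time" point tangible. The sketch below is not Redis code; the `store`, `pending`, `execute`, and `tick` names and the three supported commands are assumptions chosen for illustration. It shows that a read-modify-write such as INCR needs no lock when a single thread runs every command to completion.

```python
from collections import deque

store = {}          # the shared dataset; only the loop thread touches it
pending = deque()   # (client, command) pairs read from ready sockets

def execute(command):
    """Run one command to completion against the shared store."""
    op, key, *args = command
    if op == "SET":
        store[key] = args[0]
        return "OK"
    if op == "GET":
        return store.get(key)
    if op == "INCR":
        # Read-modify-write with no lock: nothing else can run in between.
        store[key] = int(store.get(key, 0)) + 1
        return store[key]
    raise ValueError(f"unknown command: {op}")

def tick():
    """Drain pending commands one at a time, in arrival order."""
    replies = []
    while pending:
        client, command = pending.popleft()
        replies.append((client, execute(command)))
    return replies

# Two clients, A and B, both have commands waiting on the same key.
pending.extend([
    ("A", ("SET", "hits", "10")),
    ("B", ("INCR", "hits")),
    ("A", ("INCR", "hits")),
    ("B", ("GET", "hits")),
])
replies = tick()
print(replies)
```

Neither increment is lost, because the second INCR cannot start until the first has written its result.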

Atomic almost by accident. People sometimes describe Redis commands as 'atomic' as if it were a special design feature. It's really a side effect of single-threaded execution: there's no way for two commands to interleave, because there's only ever one command running. The flip side: a single O(n) command (KEYS *, a big SORT, a Lua script that loops) holds the thread and blocks every other client until it finishes.

Inside Node's loop: phases and the microtask queue#

A real event loop isn't a single undifferentiated callback queue — it has structure, because different kinds of callbacks need different ordering. Node.js (via libuv) runs each iteration of the loop through a fixed sequence of phases, each with its own queue, and drains each queue before moving on.

libuv loop phases (one tick), in order:
timers: setTimeout / setInterval callbacks that are due now
pending callbacks: deferred I/O callbacks (e.g. TCP errors)
poll: wait for I/O and run I/O callbacks (most work happens here)
check: setImmediate callbacks
close: 'close' events (socket.on('close'))

Sitting beside these phases is the microtask queue — promise .then/.catch callbacks and queueMicrotask in Node, plus process.nextTick (which has even higher priority). Microtasks are not a phase; they are drained completely after every single callback, before the loop continues. That's why an awaited promise resolves before a setTimeout(…, 0) that was scheduled earlier: the promise continuation is a microtask, the timer is a macrotask in the next phase.

Go deeper: Macrotasks vs microtasks, the ordering gotcha

Schedule four things in this order: setTimeout(cb, 0), setImmediate(cb), Promise.resolve().then(cb), process.nextTick(cb). They do NOT run in source order. nextTick runs first (highest priority, drained before other microtasks), then the promise (microtask), then on the next loop turn the timer and immediate run in their respective phases. (When scheduled from the main module, which of those two fires first can vary from run to run; inside an I/O callback, setImmediate always comes first.) A microtask that keeps scheduling further microtasks can starve the loop entirely, because the macrotask phases never get a turn; that is the microtask equivalent of blocking the loop.
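The ordering rule can be modeled in a few lines. What follows is a toy simulation in Python, not Node or libuv code: the `ToyLoop` class and its queue names are invented for illustration, only the timers and check phases are modeled, and the timer is always treated as already due (real Node may order the timer and the immediate differently when they are scheduled from the main module).

```python
from collections import deque

class ToyLoop:
    """Toy model of Node-style ordering; not libuv's actual implementation."""

    def __init__(self):
        self.timers = deque()       # callbacks for the timers phase
        self.immediates = deque()   # callbacks for the check phase
        self.next_ticks = deque()   # models process.nextTick
        self.microtasks = deque()   # models promise continuations
        self.log = []

    def _drain_microtasks(self):
        # nextTick callbacks go first, then promise microtasks,
        # repeating until both queues are empty.
        while self.next_ticks or self.microtasks:
            while self.next_ticks:
                self.next_ticks.popleft()()
            while self.microtasks:
                self.microtasks.popleft()()

    def _run_phase(self, queue):
        # Snapshot the length so callbacks queued during the phase
        # wait for the next turn of the loop.
        for _ in range(len(queue)):
            queue.popleft()()
            self._drain_microtasks()   # drained after every single callback

    def run_one_turn(self):
        self._drain_microtasks()       # work queued by the main script
        self._run_phase(self.timers)
        self._run_phase(self.immediates)

loop = ToyLoop()
# Scheduled in the same source order as in the text above.
loop.timers.append(lambda: loop.log.append("setTimeout"))
loop.immediates.append(lambda: loop.log.append("setImmediate"))
loop.microtasks.append(lambda: loop.log.append("promise"))
loop.next_ticks.append(lambda: loop.log.append("nextTick"))
loop.run_one_turn()
print(loop.log)
```

The starvation case falls out of the same model: a microtask that appends another microtask every time it runs would keep `_drain_microtasks` spinning, and the timers and check phases would never be reached.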

The browser event loop is similar in spirit but simpler: one task queue (plus separate queues for things like rendering and animation frames) and a microtask checkpoint after each task and after each callback. The HTML spec defines it precisely; the practical rule is the same — microtasks before the next task, rendering only between tasks.

The cost model: O(ready) per tick#

The whole point of epoll/kqueue is that the cost of one loop iteration scales with the number of ready sockets, not the number of registered ones. With C connections of which R are ready this instant, a tick is O(R) — plus the callbacks themselves. Idle connections contribute essentially nothing: they sit in the kernel's interest set as a few bytes of state, never copied, never scanned. This is the precise sense in which one thread 'handles thousands of connections.'

Memory: O(C) but with a tiny constant: a socket buffer and a small struct per connection (kilobytes), versus a 1–8 MB thread stack per connection in the blocking model. That is the difference between 'thousands' and 'tens of thousands' of connections on one box.
CPU while idle: ~0. The thread sleeps in epoll_wait until the kernel wakes it. No busy-polling, no per-connection timers firing.
CPU per tick: O(R) for the readiness scan + the sum of callback work. As long as each callback is short and non-blocking, latency stays low across all C connections.
select/poll degrade this to O(C) per call because they re-examine every descriptor every time — the original reason they couldn't reach C10k and epoll/kqueue were invented.
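The memory comparison is simple arithmetic, worked here for ten thousand connections. The 1 and 8 MiB stack sizes come from the range quoted above; the 16 KiB of per-connection state for the event loop is an assumed round figure standing in for "kilobytes", not a measured one.

```python
MIB = 1024 * 1024
KIB = 1024

def thread_model_bytes(connections: int, stack_mib: int) -> int:
    """Address space reserved for stacks with one thread per connection."""
    return connections * stack_mib * MIB

def event_loop_bytes(connections: int, per_conn_kib: int) -> int:
    """Per-connection state held when one loop thread serves everything."""
    return connections * per_conn_kib * KIB

CONNECTIONS = 10_000
low = thread_model_bytes(CONNECTIONS, stack_mib=1)
high = thread_model_bytes(CONNECTIONS, stack_mib=8)
loop_total = event_loop_bytes(CONNECTIONS, per_conn_kib=16)  # assumed figure

print(f"thread stacks: {low / 1024**3:.1f} to {high / 1024**3:.1f} GiB reserved")
print(f"event loop:    {loop_total / MIB:.0f} MiB")
```

Stack space is reserved address space rather than resident memory, so the real footprint of the threaded model is lower than the reservation, but the gap of roughly two orders of magnitude is the point.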

The one term that wrecks the model: CPU-bound work. Because there is a single thread, the time a callback spends computing is time no other connection is served. Hash a large payload, parse a huge JSON document, run a tight loop: for those milliseconds the loop is frozen and every other client waits. The model assumes each callback is O(small). Violate that and your p99 latency explodes even though average load looks fine.
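The effect is easy to reproduce with Python's asyncio, used here as a stand-in for any single-threaded loop. In this sketch a 10 ms heartbeat plays the role of "every other client", and `time.sleep` stands in for CPU-bound or blocking work; the function names and the 0.2 s duration are assumptions for the example.

```python
import asyncio
import time

async def max_heartbeat_gap(handler) -> float:
    """Run handler next to a 10 ms heartbeat; return the longest gap seen."""
    gaps = []

    async def heartbeat():
        last = time.perf_counter()
        while True:
            await asyncio.sleep(0.01)
            now = time.perf_counter()
            gaps.append(now - last)
            last = now

    beat = asyncio.create_task(heartbeat())
    await asyncio.sleep(0.05)   # let the heartbeat establish a rhythm
    await handler()
    await asyncio.sleep(0.05)   # let the heartbeat observe the aftermath
    beat.cancel()
    return max(gaps)

async def blocking_handler():
    time.sleep(0.2)             # holds the loop thread for the whole 200 ms

async def cooperative_handler():
    await asyncio.sleep(0.2)    # yields to the loop while waiting

blocked_gap = asyncio.run(max_heartbeat_gap(blocking_handler))
smooth_gap = asyncio.run(max_heartbeat_gap(cooperative_handler))
print(f"blocking handler:    worst heartbeat gap {blocked_gap * 1000:.0f} ms")
print(f"cooperative handler: worst heartbeat gap {smooth_gap * 1000:.0f} ms")
```

Both handlers take the same wall-clock time; only the blocking one makes the heartbeat (and so every other connection) wait for it.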

Variants: reactor vs proactor, thread pools, multiple loops#

Reactor (readiness-based) — the OS tells you a socket is ready, then YOU do the read/write. epoll, kqueue, select. This is the classic event loop (nginx, Redis, Node's network I/O).
Proactor (completion-based) — you start an operation and the OS performs the whole read/write into your buffer, then tells you it's DONE. Windows IOCP and Linux io_uring work this way. libuv emulates a proactor-style API on top of epoll so Node code looks uniform across platforms.
Multiple event loops — one loop uses one core. To use all cores you run N loops: nginx forks worker processes, Node uses the cluster module or worker_threads, and each typically accepts on the same listening socket.
SO_REUSEPORT — lets several processes each bind the same port with their own listening socket; the kernel load-balances incoming connections across them, avoiding a single shared accept lock (the 'thundering herd' on accept).
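As a rough illustration of the SO_REUSEPORT item, the sketch below binds two listening sockets to the same port. It assumes a platform that exposes `socket.SO_REUSEPORT` (Linux and the BSDs do; Windows does not), and it creates both sockets in one process only to keep the example short, whereas a real deployment would have each worker process create its own. The helper name `reuseport_listener` is hypothetical.

```python
import socket

def reuseport_listener(port: int) -> socket.socket:
    """Create a listening socket that allows other sockets to share its port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(), and on every socket that shares the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(128)
    return sock

first = reuseport_listener(0)          # port 0: let the OS pick a free port
port = first.getsockname()[1]
second = reuseport_listener(port)      # same port, separate accept queue

same_port = first.getsockname()[1] == second.getsockname()[1]
print(same_port)

first.close()
second.close()
```

Without the option, the second bind() would fail with "address already in use"; with it, each listener gets its own queue and the kernel decides which one receives each new connection.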

Go deeper: How a single-threaded runtime does blocking work (libuv's thread pool)

Not everything has a non-blocking syscall. File system I/O on Linux historically could not be done in a truly non-blocking way through epoll, and DNS resolution via getaddrinfo() is a blocking C call. So libuv keeps a small thread pool (default 4 threads, set by UV_THREADPOOL_SIZE) and runs those operations there. The worker thread blocks; when it finishes, it posts the result back to the loop, which fires your callback on the main thread. Your JavaScript still runs single-threaded — only the blocking syscall was offloaded.

This is also the right pattern for your own CPU-bound work: don't run it on the loop. Push it to a worker_threads pool (Node), a separate process, or a queue. The event loop stays responsive; the heavy computation happens off to the side and reports back via a callback. io_uring is now changing the file-I/O story — it offers genuine async file operations — and libuv has been adopting it where available, reducing reliance on the thread pool.
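The same offloading pattern can be sketched with asyncio's `run_in_executor`, which plays a role loosely analogous to libuv's pool. This is an analogy rather than libuv's implementation: `blocking_lookup` is a hypothetical function, `time.sleep` stands in for a blocking syscall such as getaddrinfo(), and the pool size of 4 is chosen only to mirror the libuv default mentioned above.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_lookup(name: str) -> str:
    time.sleep(0.2)                     # stand-in for a blocking syscall
    return f"resolved:{name}"

async def main():
    loop = asyncio.get_running_loop()
    beats = 0

    async def heartbeat():
        # Counts how often the loop gets to run while the lookup is pending.
        nonlocal beats
        while True:
            await asyncio.sleep(0.01)
            beats += 1

    beat = asyncio.create_task(heartbeat())
    with ThreadPoolExecutor(max_workers=4) as pool:
        # The worker thread blocks; the loop thread keeps servicing callbacks.
        result = await loop.run_in_executor(pool, blocking_lookup, "example.org")
    beat.cancel()
    return result, beats

result, beats = asyncio.run(main())
print(result, "heartbeats while waiting:", beats)
```

The heartbeat keeps firing during the 200 ms lookup, which is the whole point: the wait happened on a pool thread, and only the completion was delivered back to the loop.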

Strengths and limits of the event-loop model#

Scales to huge connection counts — memory per idle connection is just a small struct, not a thread stack, so tens of thousands of idle clients cost almost nothing.
No locking for shared in-memory state — since only one thread ever touches application data at a time, there's no need for mutexes around request handling (a big part of why Redis is simple and fast, with no lock contention as clients grow).
Vulnerable to one slow handler — any CPU-bound or blocking call inside a callback stalls every other connection on that thread. Real servers offload such work to a thread/worker pool and keep the event loop itself non-blocking.
Doesn't use multiple cores by itself — a single event loop runs on one core; scaling to multiple cores means running several event-loop processes (Node's cluster, nginx's worker processes) and load-balancing across them.
Async code is harder to write — control flow is inverted into callbacks/promises/async-await; errors must be threaded through manually, and a forgotten await or unhandled rejection is a silent bug. Blocking code reads top-to-bottom; event-loop code does not.

Predict: Your Node service is at 5% CPU but request latency suddenly spikes to seconds for ALL clients at once. What's the most likely cause?

Hint: Low CPU rules out 'too much work overall.' What can one request do to all the others on a single thread?

One handler is blocking the event loop — a synchronous CPU-bound call (JSON.parse of a huge body, a sync crypto/hash, a tight loop, a blocking fs call) or a microtask that keeps rescheduling itself. While that callback runs, the single thread can't touch any other socket, so every client's latency spikes together even though average CPU looks idle. The fix is to move the heavy work off the loop (worker thread, stream/chunk it, or precompute).

How event loops fail in production#

Blocking the loop — the #1 failure. A synchronous CPU-bound call or accidental blocking syscall (sync fs, a synchronous deadlock, a regex with catastrophic backtracking) freezes every connection at once. Symptom: latency for all clients spikes together while CPU may look low.
No backpressure / unbounded queues — if work arrives faster than the loop drains it, internal queues (pending writes, accepted-but-unprocessed requests, an in-memory job queue) grow without bound, memory climbs, and latency degrades until the process is OOM-killed. Healthy systems push back: pause reads, reject, or shed load.
Starvation — a hot path that always has work ready can monopolize the loop so lower-priority callbacks (or whole phases) never run. The microtask version: a promise chain that keeps scheduling microtasks starves the macrotask phases entirely.
Slow-consumer / write stalls — a client that reads responses slowly fills the socket's send buffer; if you ignore the 'writable' signal and keep buffering, memory grows per slow client. Respect drain/backpressure on writes.
Callback hell & error handling — deeply nested callbacks obscure control flow, and an error thrown in an async callback won't be caught by a surrounding try/catch (the stack has unwound). Unhandled promise rejections and missing error listeners on streams/sockets crash or silently drop.
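For the backpressure item above, one common remedy is a bounded queue that makes the producer wait when the consumer falls behind. The sketch below uses asyncio.Queue; the `run_pipeline` function, the job counts, and the 1 ms consumer delay are illustrative assumptions.

```python
import asyncio

async def run_pipeline(n_jobs: int, maxsize: int):
    """Fast producer, slow consumer, bounded queue in between."""
    queue = asyncio.Queue(maxsize=maxsize)
    max_depth = 0
    done = []

    async def producer():
        nonlocal max_depth
        for job in range(n_jobs):
            # Suspends while the queue is full: this is the backpressure.
            await queue.put(job)
            max_depth = max(max_depth, queue.qsize())

    async def consumer():
        while len(done) < n_jobs:
            job = await queue.get()
            await asyncio.sleep(0.001)   # the consumer is the slow side
            done.append(job)

    await asyncio.gather(producer(), consumer())
    return max_depth, done

max_depth, done = asyncio.run(run_pipeline(n_jobs=20, maxsize=4))
print("deepest the queue ever got:", max_depth)
```

With an unbounded queue the producer would enqueue all twenty jobs immediately and memory would track the arrival rate. Here the depth is capped at `maxsize`, and the alternative to waiting is `put_nowait`, which raises `asyncio.QueueFull` so the caller can reject or shed the work instead.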

Detecting a blocked loop. Measure event-loop lag: schedule a timer for T ms and check how late it actually fires; the delay is how long the loop was busy not servicing I/O. Node exposes perf_hooks.monitorEventLoopDelay; healthy services keep p99 lag in single-digit milliseconds and alert when it climbs.
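The lag measurement described above can be sketched in a few lines. This version uses Python's asyncio rather than Node's perf_hooks, and `measure_lag` is a hypothetical helper; a `time.sleep` call simulates a handler that blocks the loop.

```python
import asyncio
import time

async def measure_lag(interval: float = 0.01) -> float:
    """Sleep for interval seconds; return how late the wake-up was."""
    start = time.perf_counter()
    await asyncio.sleep(interval)
    return max(0.0, time.perf_counter() - start - interval)

async def main():
    idle_lag = await measure_lag()            # nothing else is running

    probe = asyncio.create_task(measure_lag())
    await asyncio.sleep(0)                    # let the probe arm its timer
    time.sleep(0.1)                           # simulate a blocking handler
    busy_lag = await probe                    # timer was due long ago
    return idle_lag, busy_lag

idle_lag, busy_lag = asyncio.run(main())
print(f"idle lag: {idle_lag * 1000:.1f} ms, lag while blocked: {busy_lag * 1000:.1f} ms")
```

A monitoring version would run the probe repeatedly and record percentiles, alerting on the tail rather than on a single sample.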

Event loop vs the alternatives#

	Event loop	Thread per request	Thread pool	Multi-process
Concurrency ceiling	Very high (10k–1M conns/thread)	Low — bound by thread count/RAM	Medium — bound by pool size	High — N× a single process
Memory per idle conn	Tiny (a small struct)	Large (1–8 MB stack)	Large per active worker	Tiny per conn, ×N processes
CPU-bound work	Bad — blocks everyone; must offload	Good — OS preempts each thread	Good — bounded parallelism	Good — true parallelism
Uses many cores	No (one loop = one core)	Yes	Yes	Yes
Shared-state locking	None needed (single thread)	Locks/mutexes required	Locks/mutexes required	No shared memory (IPC instead)
Code complexity	Higher — async/callbacks	Lowest — linear blocking code	Medium	Medium — IPC, coordination

The honest summary: event loops win decisively on I/O-bound concurrency and memory; thread-per-request wins on simplicity and on CPU-bound work where preemption matters. Most real systems combine them — an event loop for the network front, a thread/process pool behind it for heavy work — and run several loops to use all cores.

Where event loops run in the wild#

Once you recognize the shape — non-blocking sockets, a readiness API, a callback dispatch loop — you see it across the whole infrastructure stack. The differences are mostly the language ergonomics layered on top.

Node.js / libuv — JavaScript's single-threaded runtime; libuv provides the cross-platform loop (epoll/kqueue/IOCP), the phases, and the thread pool for fs/DNS. The canonical 'event loop' most engineers mean today.
nginx — event-driven worker processes (one per core), each an epoll/kqueue loop handling thousands of connections. Its non-blocking architecture is why it displaced Apache's prefork model for high-concurrency serving and reverse proxying.
Redis — single-threaded command execution on an event loop (its own ae library over epoll/kqueue). That's the source of its atomicity and simplicity. Redis 6+ added multi-threaded I/O for reading/writing sockets, but command execution stays single-threaded.
Netty (JVM) — the async networking framework under gRPC-Java, Cassandra's transport, Elasticsearch, and more: an event-loop ('EventLoopGroup') reactor over Java NIO selectors.
Python asyncio — the standard-library event loop; async/await coroutines over a selector. Frameworks like uvloop (libuv-backed) and FastAPI/Starlette build on it.

nginx vs Apache prefork: the canonical contrast. Apache's classic prefork MPM dedicates a process to each connection. That is simple, but memory and context-switching cap concurrency, and slow clients tie up workers (the Slowloris weakness). nginx serves the same load with a handful of event-loop workers and flat memory as connections climb. It's the C10k argument made concrete in two widely deployed servers.

Common misconceptions & gotchas#

If it's single-threaded, how is it scalable?

Because the bottleneck for a web server is concurrency (many connections, almost all waiting on I/O), not CPU parallelism. One thread that never blocks can interleave thousands of short I/O operations, spending CPU only on work that's actually ready. It scales on the axis that matters — connection count — while a thread-per-connection design hits memory and context-switch limits long before. For CPU parallelism you run more loops.

What exactly 'blocks the event loop'?

Anything synchronous that takes non-trivial time on the loop thread: a CPU-heavy computation (hashing, big JSON.parse, a tight loop), a synchronous/blocking syscall (sync fs, a blocking DB driver), catastrophic-backtracking regex, or a microtask that keeps rescheduling itself. While it runs, no other connection is served. The fix is to keep callbacks short and offload heavy work to a thread pool or another process.

epoll vs select — why does it matter?

select/poll pass the entire set of file descriptors to the kernel on every call and scan all of them — O(n) per wait, with a hard cap (FD_SETSIZE) on select. epoll (Linux) and kqueue (BSD/macOS) register interest once and return only the descriptors that became ready — O(ready). At ten thousand connections that's the difference between rescanning 10k fds every loop and touching only the handful that are active. It's the core reason epoll/kqueue exist.

If Node is single-threaded, how does it do file I/O without blocking?

libuv runs blocking operations (file system I/O, DNS getaddrinfo) on a small background thread pool (default 4 threads). The worker thread blocks on the syscall; when done, it posts the result back to the loop, which calls your callback on the main thread. Your JavaScript still runs single-threaded — only the blocking syscall happened off-loop. (io_uring is starting to make file I/O truly async, reducing the need for the pool.)

Does single-threaded mean Redis can only use one core?

For command execution, historically yes — and that's deliberate (no locks, atomic commands). You scale Redis across cores by running multiple instances (sharding) or using Redis Cluster. Redis 6 added multi-threaded network I/O (parsing/replying on extra threads) but kept the data-structure operations on one thread.

Quiz: You add a synchronous bcrypt hash (≈80 ms) inside a Node HTTP handler. Under load, what happens?

Only the request being hashed is slow; others are unaffected because Node is async
Throughput drops to ~12 req/s and ALL clients see latency spikes, because the loop is blocked for 80 ms per request
Node spawns a new thread per request to run the hash in parallel
Nothing — bcrypt releases the event loop while it computes

Answer:

Throughput drops to ~12 req/s and ALL clients see latency spikes, because the loop is blocked for 80 ms per request — A synchronous 80 ms CPU call blocks the single loop thread for those 80 ms, during which no other connection is serviced. Serialized, that caps throughput near 1000/80 ≈ 12 req/s and every concurrent client's latency balloons. The fix is the async/threadpool variant (bcrypt's async API, which offloads to libuv's pool) or a dedicated worker — never run heavy CPU work synchronously on the loop.

In an interview#

Lead with the trade-off: thread-per-connection is simple but doesn't scale past a few thousand connections (the C10k problem — thread stacks and context switches dominate); a single-threaded event loop multiplexes many sockets on one thread via a readiness API (epoll/kqueue), paying for work done, not connections held open. Name the pattern (reactor) and the systems (Redis, Node.js/libuv, nginx).

Then show depth: the loop is non-blocking sockets + epoll_wait + callback dispatch, O(ready) per tick; idle connections are nearly free. The punchline for shared state: single-threaded execution needs no locking — that's why Redis commands are atomic without special machinery. Always volunteer the failure mode: a blocking or CPU-heavy call inside a callback stalls every client on that thread, so heavy work goes to a thread pool (and you scale cores by running multiple loops). If asked about Node specifics, mention loop phases (timers/poll/check) and that microtasks/promises drain after each callback.

Then try it in the simulator: CONNECT a few clients past the backlog limit and watch one get refused, SET and GET keys, then TICK the event loop and watch it accept connections and run commands one client at a time against the same shared store.

References & further reading#

Dan Kegel — The C10K problem — the 1999 essay that framed the whole question
Douglas Schmidt — Reactor: An Object Behavioral Pattern for Demultiplexing and Dispatching Handles for Synchronous Events — the canonical reactor-pattern paper (POSA)
libuv — Design overview — the loop, the I/O backends, and the thread pool behind Node
Node.js — The event loop, timers, and process.nextTick() — the official walkthrough of the loop phases
Linux man pages — epoll(7) — the scalable readiness API (and edge vs level triggering)
FreeBSD man pages — kqueue(2) — the BSD/macOS equivalent of epoll
nginx — Connection processing methods — epoll/kqueue selection in a production event-driven server
Python — asyncio event loop — the standard-library reactor for Python
