
Client ↔ Server

The request/response loop that almost everything online is built on.

Before any database, cache, or distributed protocol, there's one pattern underneath it all: a client sends a request, and a server sends back a response. A 'server' isn't special hardware — it's just a program that sits waiting for requests and answers them. Understanding this one round-trip is the foundation for everything else in the path.

The big picture

TL;DR: the 30-second version
  • Almost everything you do online is one pattern repeated: a client asks (a request), a server answers (a response). That there-and-back is a round-trip.
  • A server isn't special hardware — it's just a program running a loop: wait for a request, do the work, send a reply, repeat.
  • The cost that matters most is latency — how long one round-trip takes. Fewer round-trips means a faster app, which is why so much design effort goes into avoiding extra ones.
  • The pattern is simple, but it has a sharp edge: a request can get no answer at all (server down, network dropped). The client can't tell 'slow' from 'never' — and that single fact is the seed of nearly every hard distributed-systems topic.

Everything below expands on these four points. Read the core sections top to bottom for the full mental model; the "Go deeper" sections hold the advanced material (what's on the wire, API styles, statelessness), which you can skip on a first pass and come back to later.

[Figure: a client (browser) sends "GET /home" to a server (a loop), which replies "200 OK + page". Caption: One round-trip: client asks, server answers.]

Start here: why this pattern exists

Imagine two computers that both have useful things — one holds your photos, the other wants to show them. They could each try to reach into the other's memory directly, but that would be chaos: no order, no permissions, no way to know who is allowed to do what. Software needs a simple, predictable way for one program to ask another for something across a network.

The client-server pattern is that way. One side takes the role of asking (the client) and the other takes the role of answering (the server). The roles are fixed for a given conversation: the client always speaks first, the server only ever replies. That tiny rule removes the chaos — there is always exactly one party in charge of starting, and one party in charge of the data.

Why centralize the answer in a 'server'?
Putting the data and the rules in one place (the server) means there's a single source of truth. Your bank balance lives on the bank's server, not copied across every phone that ever checked it. Clients stay simple and disposable; the server holds the state everyone agrees on. This is the trade we'll keep coming back to: central control and simplicity, paid for with the server becoming a thing everyone depends on.

Client, server, request, response

A client is whatever initiates a conversation — your browser, a mobile app, or another service. A server is a program, running on some machine, that listens for incoming requests and replies to them. The client speaks first (the request); the server answers (the response). The server never calls you out of the blue — it only ever responds.

  • Request: 'GET /home' — the client asking for something, addressed to the server.
  • Response: '200 OK' plus the page — the server's answer, sent back to the client.
  • Round-trip: the there-and-back of one request + its response. Latency is how long that takes.
A server is just a waiting program
It runs a loop: wait for a request, do the work, send a reply, repeat. Scaling, replication, and all the hard parts later are about running many of these and keeping them coordinated, but each one is still just request in, response out.
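
To make the loop concrete, here is a minimal sketch using Python's standard socket module. The names handle, serve, ask, and demo are made up for this illustration, the "work" is just upper-casing the message, and the loop stops after a fixed number of requests so the example terminates; a real server would loop indefinitely and handle errors.

```python
import socket
import threading


def handle(request: str) -> str:
    """Do the work: here, just return the request in upper case."""
    return request.upper()


def serve(listener: socket.socket, max_requests: int) -> None:
    """The server loop: wait for a request, do the work, send a reply, repeat."""
    for _ in range(max_requests):                    # a real server would loop forever
        conn, _addr = listener.accept()              # wait for a client to connect
        with conn:
            request = conn.recv(1024).decode()       # read the (short) request
            conn.sendall(handle(request).encode())   # do the work and reply


def ask(port: int, message: str) -> str:
    """The client side: speak first, then wait for the answer."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall(message.encode())
        return conn.recv(1024).decode()


def demo() -> list[str]:
    """Run the server loop on a background thread and send it two requests."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))                  # port 0: let the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve, args=(listener, 2))
    server.start()
    replies = [ask(port, "hello"), ask(port, "again")]
    server.join()
    listener.close()
    return replies
```

Calling demo() runs two round-trips against the loop and returns the two replies. Note that the server never speaks first: it only acts after accept() hands it a client.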

A request isn't magic — it's just a small message with a few well-defined parts. So is a response. If you've ever opened your browser's developer tools and watched the Network tab, you've seen exactly these pieces fly back and forth.

  1. A request carries: a method (what action — GET to read, POST to create, PUT/PATCH to update, DELETE to remove), a path/address (which resource, like /home or /users/42), headers (extra info — who's asking, what format they want, credentials), and an optional body (data being sent, e.g. a form or JSON).
  2. The server reads the request, does the work (look something up, save something, run some logic), and decides on an answer.
  3. A response carries: a status code (a 3-digit verdict — 200 OK, 404 Not Found, 500 Server Error), headers (format, caching hints, cookies), and an optional body (the actual content — HTML, JSON, an image).
  4. The client reads the status code first to know if it worked, then uses the body.
Status codes in one breath
The first digit tells the whole story: 2xx = it worked; 3xx = go look somewhere else (redirect); 4xx = you (the client) made a mistake (bad request, not allowed, not found); 5xx = the server broke. '404' and '500' are just the most famous members of those last two families.
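
As a small illustration of the first-digit rule, the sketch below maps a status code to its family. The function name status_family is made up here, and the 1xx (informational) family is included for completeness even though the text above does not cover it.

```python
def status_family(code: int) -> str:
    """Classify an HTTP status code by its first digit."""
    families = {
        1: "informational",
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return families[code // 100]   # integer division keeps only the first digit
```

A client would typically branch on the family first (did it work at all?) and only then look at the specific code or the body.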
Go deeper: what's actually on the wire (and the connection underneath)

On the web, the request/response messages are HTTP. A raw HTTP/1.1 request is plain text: a first line like 'GET /home HTTP/1.1', then a list of 'Header: value' lines, a blank line, then the optional body. The response mirrors it: 'HTTP/1.1 200 OK', headers, blank line, body. It's deliberately human-readable, which is part of why the web won.
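
The plain-text layout described above can be sketched in a few lines. This is a simplified illustration, not a full HTTP implementation: build_request and parse_response are hypothetical helper names, header names are treated as case-sensitive, and the parser assumes the status line includes a reason phrase. Lines in HTTP/1.1 are separated by CRLF ("\r\n").

```python
def build_request(method: str, path: str, headers: dict[str, str], body: str = "") -> str:
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_response(raw: str) -> tuple[int, dict[str, str], str]:
    """Split a raw HTTP/1.1 response into (status code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")             # the blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)   # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(code), headers, body
```

For example, build_request("GET", "/home", {"Host": "example.com"}) produces the request line, one header line, and the terminating blank line, with an empty body.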

But HTTP rides on top of a connection. Before the first byte of an HTTP request, the two machines usually set up a TCP connection (a handshake of a few packets) and, for HTTPS, a TLS handshake on top of that to encrypt it. Those handshakes are themselves round-trips — which is why connection setup has a real cost, and why reusing a connection for many requests (keep-alive, HTTP/2 multiplexing) is such a common optimization. The mechanism is layered: TCP gives a reliable byte pipe, TLS encrypts it, HTTP gives it request/response structure.

The cost model: latency, round-trips, throughput

There's no Big-O for 'send a request' the way there is for a sort — the dominant cost isn't computation, it's the journey. Two numbers describe almost everything about client-server performance: latency and throughput.

  • Latency — how long a single round-trip takes, from the moment the client sends the request to the moment the full response arrives. It's dominated by physical distance (the speed of light is a hard limit), connection setup, and how long the server takes to do the work. Same-datacenter latency is often under a millisecond; across the world it can be 100–300 ms before the server even starts working.
  • Throughput — how many requests the server can handle per second. A server is a loop; throughput is how fast it can go around that loop, multiplied by how many copies of the loop are running.
  • Connection setup cost — the first request to a server pays for the TCP (and usually TLS) handshake, each a round-trip of its own. Reusing the connection amortizes that across many requests.
Why fewer round-trips is the whole game
If one round-trip costs 100 ms, then a screen that needs 10 sequential requests costs a full second of waiting, even if every server is instant. This is why apps batch requests, cache responses, and try to get everything in one round-trip. You can't make light faster, so you make fewer trips. Almost every later performance topic (caching, CDNs, connection pooling, request coalescing) is a different way to remove round-trips.
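
The arithmetic above fits in one line of code. The sketch below is a deliberately crude model that counts only travel time and assumes server work is free; total_wait_ms and its parameters are names invented for this illustration, and setup_round_trips stands in for connection setup such as the TCP and TLS handshakes.

```python
def total_wait_ms(rtt_ms: float, sequential_requests: int, setup_round_trips: int = 0) -> float:
    """Time spent purely on travel, ignoring the time the server spends working.

    setup_round_trips models connection setup paid once, before the first
    request on a new connection.
    """
    return rtt_ms * (setup_round_trips + sequential_requests)
```

With a 100 ms round-trip, total_wait_ms(100, 10) gives 1000 ms for ten sequential requests, while batching them into one request gives total_wait_ms(100, 1), which is 100 ms. Adding two setup round-trips to a single request, total_wait_ms(100, 1, setup_round_trips=2), gives 300 ms, which is why reusing an open connection pays off.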
Predict: A page makes 5 requests to a server 100 ms away. If it sends them one after another (each waits for the last), it takes ~500 ms. If it sends all 5 at once over one connection, roughly how long?

Hint: What are you actually waiting on — the work, or the travel time? Can the travel overlap?

Answer: Roughly 100 ms, about one round-trip rather than five. The requests travel in parallel and the responses come back together, so you pay the ~100 ms of travel once instead of stacking it five times (assuming the server handles them concurrently and the connection is already open). This is exactly why 'parallelize and batch' beats 'one at a time': the bottleneck is the round-trip, and parallel round-trips overlap.
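
The overlap can be observed with a small simulation. In this sketch a "request" is just an asyncio.sleep standing in for the round-trip; no network is involved, and the round-trip is scaled down to 20 ms (an arbitrary choice) so the example runs quickly. The function names are made up for the illustration.

```python
import asyncio
import time

ROUND_TRIP_S = 0.02  # simulated round-trip time (assumed; scaled down from 100 ms)


async def fetch(name: str) -> str:
    """Stand-in for one request: the only cost is the simulated round-trip."""
    await asyncio.sleep(ROUND_TRIP_S)
    return f"response for {name}"


async def one_after_another(names: list[str]) -> list[str]:
    """Sequential: each request waits for the previous one to finish."""
    return [await fetch(name) for name in names]


async def all_at_once(names: list[str]) -> list[str]:
    """Concurrent: all requests are in flight together, so the waits overlap."""
    return list(await asyncio.gather(*(fetch(name) for name in names)))


def timed(coro) -> tuple[list[str], float]:
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

Timing both versions on five names should show the sequential run taking about five simulated round-trips and the concurrent run taking about one, with identical results in the same order.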

Ways to shape the conversation

Plain request/response is the foundation, but the conversation can take different shapes depending on what you need. These are all still 'client asks, server answers' underneath — they just bend the rules about timing, direction, and structure. Treat this as a tour; the deep dive has the details.

  • Request/response (the default) — one ask, one answer, done. Simple and stateless-friendly. Most of the web works this way.
  • Streaming / Server-Sent Events (SSE) — the client asks once, and the server keeps sending data over time (a live score feed, a log tail). One request, many response chunks.
  • WebSockets — after an initial HTTP handshake, the connection 'upgrades' to a two-way pipe: now the server can push to the client and vice versa, without a new request each time. This is how chat and live collaboration work.
  • Synchronous vs asynchronous — synchronous: the client waits for the answer before doing anything else. Asynchronous: the client fires the request and gets on with other work, handling the answer whenever it arrives. Same messages, different waiting behavior.
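
To illustrate the "one request, many response chunks" shape of Server-Sent Events, the sketch below formats messages the way the text/event-stream format expects: each payload line is prefixed with "data: ", and a blank line ends the event. Only the formatting is shown; sse_events is a hypothetical helper, and actually sending the chunks over an open HTTP response is left out.

```python
from typing import Iterable, Iterator


def sse_events(messages: Iterable[str]) -> Iterator[str]:
    """Format each message as one Server-Sent Events chunk.

    Every line of the payload gets a "data: " prefix, and an empty line
    marks the end of the event.
    """
    for message in messages:
        lines = message.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"
```

Because this is a generator, a server could write each chunk to the connection as soon as it is produced, which is what lets the client see updates over time from a single request.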
Go deeper: REST vs RPC/gRPC vs GraphQL, and stateless vs stateful

Once you have request/response, you need a style for how requests are named and shaped. REST treats everything as a resource with an address and uses HTTP methods as the verbs: GET /users/42 reads user 42, DELETE /users/42 removes it. It's resource-centric, cache-friendly, and the default for public web APIs.

RPC (Remote Procedure Call) flips the mental model: instead of 'fetch this resource', you 'call this function on the server' — getUser(42) — as if it were local. gRPC is the popular modern RPC framework: it uses Protocol Buffers (a compact binary format) over HTTP/2, supports streaming, and generates client/server code from a schema. It's fast and strongly typed, which makes it a favorite for service-to-service calls inside a backend, though it's less human-readable than REST.
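
The difference in mental model shows up in how a server dispatches a request. The sketch below handles the same two operations in both styles against a made-up in-memory USERS table. It is an in-process toy: there is no network, no Protocol Buffers, and no real gRPC involved, and every name in it is hypothetical.

```python
from typing import Optional

USERS = {42: {"id": 42, "name": "Ada"}}   # hypothetical in-memory data


# REST style: the request names a resource (the path) and a verb (the method).
def handle_rest(method: str, path: str) -> tuple[int, Optional[dict]]:
    parts = path.strip("/").split("/")            # "/users/42" -> ["users", "42"]
    if len(parts) != 2 or parts[0] != "users" or not parts[1].isdigit():
        return 404, None
    user_id = int(parts[1])
    if method == "GET":
        user = USERS.get(user_id)
        return (200, user) if user is not None else (404, None)
    if method == "DELETE":
        removed = USERS.pop(user_id, None)
        return (204, None) if removed is not None else (404, None)
    return 405, None                              # method not allowed


# RPC style: the request names a function and its arguments.
def get_user(user_id: int) -> Optional[dict]:
    return USERS.get(user_id)


def delete_user(user_id: int) -> bool:
    return USERS.pop(user_id, None) is not None


PROCEDURES = {"getUser": get_user, "deleteUser": delete_user}


def handle_rpc(procedure: str, *args):
    """Look up the named procedure and call it, as if it were local."""
    return PROCEDURES[procedure](*args)
```

In the REST version the outcome is expressed through standard status codes that any HTTP client or cache understands; in the RPC version the caller just gets a return value, which feels like local code but leans on a shared definition of the procedures.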

GraphQL takes a third angle: the client sends one query describing exactly the fields it wants, possibly spanning what would be several REST endpoints, and the server returns precisely that shape. It solves over- and under-fetching (no more 'this endpoint gives me 20 fields, I needed 2'), at the cost of more complex server-side execution and caching.

Cutting across all of these is stateless vs stateful. A stateless server treats each request as self-contained — it remembers nothing between requests, so any copy of the server can handle any request. That's what makes horizontal scaling and load balancing easy (a later topic). A stateful server keeps per-client context (an in-memory session, an open transaction), which can be simpler to program but pins a client to a specific server and complicates scaling and failover. The common compromise: keep servers stateless and push the state into a shared store (a database, a cache, or a signed cookie/token the client carries).
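
One way to picture the "signed token the client carries" option is the sketch below, which uses an HMAC from Python's standard library. It is a simplified illustration rather than a production token format: SECRET_KEY, issue_token, and read_token are invented names, there is no expiry, and the payload is signed but not encrypted, so the client can read it.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET_KEY = b"demo-secret"  # hypothetical; a real key would be random and kept private


def sign(payload: bytes) -> str:
    """Compute an HMAC-SHA256 signature over the payload."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(session: dict) -> str:
    """Pack the session into a token the client sends back on every request."""
    payload = base64.urlsafe_b64encode(json.dumps(session, sort_keys=True).encode())
    return payload.decode() + "." + sign(payload)


def read_token(token: str) -> Optional[dict]:
    """Return the session if the signature checks out, else None."""
    payload_text, _, signature = token.rpartition(".")
    payload = payload_text.encode()
    if not hmac.compare_digest(sign(payload).encode(), signature.encode()):
        return None                       # tampered with, or signed with another key
    return json.loads(base64.urlsafe_b64decode(payload))
```

Any copy of the server that knows the key can verify the token without looking anything up, which is what keeps the servers themselves stateless.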

What client-server buys, and what it costs

The pattern wins because it's simple and centralized. There's one place that holds the truth, one place to secure, one place to update. Clients can be thin and disposable. But every one of those strengths has a shadow.

  • Strength — simplicity: a fixed asker/answerer split is easy to reason about, debug, and secure.
  • Strength — central control: one source of truth, one place to enforce rules and roll out changes.
  • Cost — the server is a bottleneck: every client depends on it, so it limits total throughput. When demand exceeds one server, you need more of them (scaling), copies of the data (replication), and something to spread requests across them (load balancing) — each its own later topic.
  • Cost — the server is a single point of failure (SPOF): if the one server everyone depends on goes down, everyone is down. Removing that SPOF is exactly what replication and failover exist for.
  • Cost — distance and round-trips: clients and servers are far apart, so latency is unavoidable and must be designed around (caching, CDNs, fewer round-trips).
The whole rest of the curriculum, in one sentence
Take 'one server holds the truth and everyone asks it', then ask 'what if that server can't keep up, or falls over?', and you've just motivated load balancing, replication, caching, consensus, and most of distributed systems. Client-server is the simple base case those topics are all trying to scale and harden.

Failure: a request with no answer

The request travels across a network to reach the server, and the response travels back. If the server is down — or the network drops the message — the client gets no reply. It can't tell the difference between 'slow' and 'never', so after a timeout it gives up and the request fails.

That inability to distinguish 'slow' from 'never' is the heart of the matter. The client sent a request and heard nothing. Did the server never receive it? Receive it but die before replying? Reply, but the reply got lost on the way back? From the client's side, all three look identical: silence. This is called partial failure — part of the system worked, part didn't, and you can't be sure which.

  • Timeout — the client can't wait forever, so it sets a deadline. If no answer arrives in time, it declares failure. Too short and you give up on healthy-but-slow servers; too long and a dead server hangs your whole app.
  • Retry — the obvious response to a failed request is to send it again. Often that works (the blip was temporary). But it's dangerous: if the first request actually succeeded and only the reply was lost, retrying does the action twice.
  • Idempotency — the fix for safe retries. An idempotent request can be sent any number of times with the same end result (reading a page, or 'set balance to 100'). A non-idempotent one ('add 100 to the balance') is unsafe to repeat. Designing requests to be idempotent — or tagging them with a unique key the server can de-duplicate — is what makes retries trustworthy.
This is why distributed systems are hard
A request with no answer is the seed of almost every later topic: timeouts, retries, circuit breakers, replication, consensus. Two computers can never be perfectly sure of each other's state across an unreliable network, so the whole field is about making good decisions despite that uncertainty. For now, just internalize the one fact: 'send a request' can simply get nothing back, and you often can't know why.
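
The retry and idempotency-key ideas above can be sketched without a real network. In this simulation the server does the work and then "loses" a configurable number of replies, which the client experiences as an exception standing in for a timeout. BankServer, ReplyLost, and add_with_retries are all hypothetical names, and a real server would persist the seen keys rather than hold them in memory.

```python
import uuid


class ReplyLost(Exception):
    """The request reached the server, but the client never saw the response."""


class BankServer:
    def __init__(self, balance: int, replies_to_drop: int = 0) -> None:
        self.balance = balance
        self.replies_to_drop = replies_to_drop    # simulates an unreliable network
        self.seen: dict[str, int] = {}            # idempotency key -> saved result

    def add(self, amount: int, key: str) -> int:
        if key not in self.seen:                  # first time for this key: do the work
            self.balance += amount
            self.seen[key] = self.balance
        # A repeated key skips the work and falls through to the saved result.
        if self.replies_to_drop > 0:
            self.replies_to_drop -= 1
            raise ReplyLost()                     # the work is done, the reply vanishes
        return self.seen[key]


def add_with_retries(server: BankServer, amount: int, max_attempts: int = 3) -> int:
    key = str(uuid.uuid4())                       # one key shared by every attempt
    for _ in range(max_attempts):
        try:
            return server.add(amount, key)
        except ReplyLost:
            continue                              # the client only sees silence, so it retries
    raise TimeoutError(f"no reply after {max_attempts} attempts")
```

With BankServer(balance=0, replies_to_drop=1), add_with_retries(server, 100) retries once and the balance ends at 100, not 200, because the second attempt carries the same key. Without the key check, the same retry would have applied 'add 100' twice.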

Comparisons at a glance

Two comparisons worth holding in your head: client-server versus its main alternative (peer-to-peer), and the three common API styles versus each other.

Aspect | Client-server | Peer-to-peer (P2P)
Roles | Fixed: clients ask, server answers | Symmetric: every node is both client and server
Source of truth | Central: the server | Distributed across peers
Simplicity | Simple to build, secure, and reason about | Complex: discovery, trust, coordination
Bottleneck / SPOF | The server limits scale and can take everyone down | No single bottleneck; resilient to one node failing
Examples | The web, mobile apps + APIs, databases, email | BitTorrent, blockchains, Kademlia DHTs

Aspect | REST | RPC / gRPC | GraphQL
Mental model | Resources at URLs + HTTP verbs | Call a function on the server | Ask for exactly the fields you want
Format | Usually JSON over HTTP | Often binary (Protobuf) over HTTP/2 | JSON, single flexible endpoint
Best at | Public APIs, caching, simplicity | Fast internal service-to-service calls | Avoiding over-/under-fetching for rich UIs
Watch out | Many round-trips / over-fetching | Less human-readable, tooling-heavy | Server-side complexity, harder caching

Where the pattern shows up

Once you see 'one side asks, one side answers', you spot it everywhere. The most important instance is the web itself.

  • HTTP and the web — your browser (client) sends an HTTP request to a web server, which responds with HTML, CSS, JS, images. Every page load is a flurry of these round-trips. The methods (GET/POST/PUT/DELETE) and status codes (200/404/500) you met above are HTTP's, standardized so any client and any server can understand each other.
  • Mobile apps → APIs — the app on your phone is a client; it talks to backend servers over HTTP APIs (usually REST or gRPC) to fetch your feed, post a message, or sync data. The app is just a nicer-looking browser for one specific server.
  • Databases — when your code runs a query, it's a client of the database server. The query is the request; the rows are the response. Postgres, MySQL, Redis all speak their own client-server protocol over a connection.
  • SSH and email — SSH is a client-server protocol for remote shells; your terminal is the client, sshd is the server. Email clients fetch mail from servers via IMAP/POP and send via SMTP — different protocols, same asker/answerer shape.
The same shape, layered
A single user action often nests the pattern: your browser asks a web server, which is itself a client to a database server, which may be a client to a storage service. Client and server are roles, not fixed identities: a program can be a server to the things above it and a client to the things below.

Common questions & gotchas

What exactly IS a server, then?

A program that runs a loop: wait for an incoming request, do some work, send back a response, repeat — forever. That's it. The word also gets used for the machine the program runs on, but the useful definition is the program. Any computer can run one, including your laptop. 'Server' is a role, not a kind of hardware.

What happens when there's no response?

The client waits up to a timeout, then declares the request failed. Crucially, it can't tell why — server down, request lost on the way there, or reply lost on the way back all look identical (silence). That uncertainty (partial failure) is why we need timeouts, careful retries, and idempotency to retry safely.

REST vs RPC — what's the real difference?

It's the mental model. REST says 'address a resource and act on it with a verb' (GET /users/42). RPC says 'call a function on the server' (getUser(42)). REST leans on HTTP and URLs and is great for public, cacheable APIs; RPC (especially gRPC) feels like calling local code and is great for fast internal service-to-service calls. Both are still request/response underneath.

Is the client ever a server too?

Yes — constantly. Client and server are roles per-conversation, not permanent labels. A web server is a client to its database. A backend service is a server to the mobile app but a client to three other services. The same process can play both roles at once. (In peer-to-peer systems, every node is deliberately both.)

Quiz: A client sends a 'transfer $100' request, gets no response, and retries. The money moves twice. What was the root cause, and what would have prevented it?

  1. The server was too slow; a longer timeout fixes it
  2. The request wasn't idempotent; making it idempotent (or de-duplicating by a unique key) prevents the double-apply
  3. The network was down; nothing can be done
  4. The client should never retry any request
Answer: option 2. The request wasn't idempotent; making it idempotent (or de-duplicating by a unique key) prevents the double-apply.

The first request actually succeeded (only its response was lost), so the retry applied the transfer a second time. A longer timeout wouldn't help (the work was already done). The fix is idempotency: design the operation so repeating it has the same effect, or attach a unique idempotency key the server uses to recognize and ignore the duplicate. Retries aren't the problem; unsafe retries are.

In an interview

Lead with the shape: client-server is one side that asks (the client, which always initiates) and one side that answers (the server, just a program in a wait-do-reply loop). One ask plus one answer is a round-trip, and its duration is latency. Most online systems are this pattern repeated.

Then show you know the cost model and the failure model — that's what separates a beginner answer from a strong one. Cost: latency is dominated by round-trips and distance, so good designs minimize round-trips (batching, caching). Failure: a request can get no answer, and the client can't tell 'slow' from 'never' (partial failure) — hence timeouts, retries, and idempotency to retry safely.

Close by connecting it forward: the server is a bottleneck and a single point of failure, which is exactly what load balancing, replication, and consensus exist to address. If asked about API styles, contrast REST (resources + HTTP verbs), RPC/gRPC (call a remote function, fast and binary, internal services), and GraphQL (ask for exactly the fields you need). Mention stateless servers as the thing that makes horizontal scaling easy.
