TCP vs UDP: When Speed Beats Reliability

The 220ms That Killed Our Voice Chat

A multiplayer shooter I worked on shipped with voice chat over TCP. It passed QA on our office LAN -- RTT of 0.4ms, 0% loss, everything sounded fine. Then we launched in Southeast Asia, where a single dropped packet on a congested mobile link caused every subsequent voice frame to stall for 220ms while TCP retransmitted the lost one. Players started hearing each other speak out of sync with the animation. A 40ms voice packet, delivered 260ms late, is worse than silence.

We rewrote the voice layer on UDP in three days. Dropped packets were now just dropped -- the Opus codec filled the gap with plausible audio and nobody noticed. That bug is the clearest lesson I've ever learned about transport protocols: TCP's reliability is not a free feature. It's a latency tax you pay whether you need it or not, and for real-time workloads the tax is ruinous.

TCP and UDP have coexisted since 1980, and neither is going away. What has changed is how we use them. Modern protocols like QUIC blur the lines, and understanding the trade-offs is what separates engineers who make informed transport decisions from those who copy whatever their framework defaults to. This guide unpacks both protocols with real numbers, a decision matrix, and the edge cases that bite production systems.

The Trade-off in One Sentence

TCP trades latency for reliability. UDP trades reliability for latency. Everything else -- handshakes, congestion control, headers -- is plumbing that serves that single trade-off.

Scenario	Pick	Why
User clicks "Buy now"	TCP	Losing a single byte of the request breaks the order
Player fires a gun in an FPS	UDP	A 200ms-late "fire" event is useless; send the next one
Client fetches a 2 MB JSON blob	TCP	Partial JSON is unparseable; reliability dominates
Sensor pushes 10,000 readings/sec	UDP	Missing 2% is fine; TCP backpressure crashes the pipeline
Browser loads an HTML page	TCP (or QUIC)	Missing tags break layout; HTTP/3 wraps UDP with reliability

Skip to the deep dives below if the table above already tells you what you need. The next sections explain why those picks are correct and when the obvious choice is wrong.

TCP: Reliability Built on Sequence Numbers

TCP (Transmission Control Protocol) is a connection-oriented transport protocol that provides reliable, ordered delivery of data between applications. It uses a three-way handshake to establish connections, sequence numbers for ordering, acknowledgments for reliability, and congestion control to avoid overwhelming the network. Those two sentences hide an enormous amount of work.

TCP guarantees that every byte you send arrives at the other end, in the right order, without duplicates. If a packet gets lost, TCP detects it and retransmits. If packets arrive out of order, TCP reorders them before handing data to your application. The cost of all this reliability is latency and overhead -- a cost worth paying for HTTP and API traffic, but catastrophic for the voice-chat bug I opened with.

UDP: The Minimum Viable Transport

UDP (User Datagram Protocol) is a connectionless transport that sends datagrams without establishing a connection, guaranteeing delivery, or ordering packets. Its entire header is eight bytes -- source port, destination port, length, and checksum -- and that's the whole protocol. No handshake, no acknowledgment, no retransmission, no reordering.

If a UDP packet gets lost, the sender doesn't know and doesn't care. That's the application's problem now. Which sounds terrifying until you realize it's exactly what you want for voice, video, gaming, and telemetry -- workloads where a stale packet is worse than a missing one.
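To make the eight-byte header concrete, here is a minimal sketch that packs and parses one with Python's struct module. The helper names (build_udp_header, parse_udp_header) and the port numbers are illustrative, and the checksum is simply left at zero, which IPv4 allows to mean "no checksum"; a real stack computes it over a pseudo-header.

```python
import struct

# Four 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    # The length field covers the 8-byte header plus the payload.
    return UDP_HEADER.pack(src_port, dst_port, UDP_HEADER.size + len(payload), checksum)

def parse_udp_header(datagram: bytes) -> dict:
    src, dst, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {"src_port": src, "dst_port": dst, "length": length, "checksum": checksum}

payload = b"hello"
datagram = build_udp_header(5000, 53, payload) + payload
header = parse_udp_header(datagram)
print(UDP_HEADER.size, header)
```

In normal application code the kernel writes this header for you; the sketch only shows how little there is to write.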

The TCP Connection: Step by Step

Understanding the cost of TCP means understanding what happens before your first byte of application data crosses the wire:

SYN -- The client sends a SYN (synchronize) packet to the server, proposing a connection and an initial sequence number.
SYN-ACK -- The server responds with SYN-ACK, acknowledging the client's sequence number and proposing its own.
ACK -- The client acknowledges the server's sequence number. The connection is now established.
Data transfer -- Application data flows in both directions. Every segment is numbered, and the receiver acknowledges received data.
FIN handshake -- Either side sends FIN to close. A four-way teardown (FIN, ACK, FIN, ACK) cleanly shuts down both directions.

Client                              Server
  |--- SYN (seq=100) ----------------->|
  |<-- SYN-ACK (seq=300, ack=101) -----|     1 RTT elapsed
  |--- ACK (seq=101, ack=301) -------->|     1.5 RTT when it arrives
  |--- Data -------------------------->|     Now we can send
  |<-- ACK ----------------------------|

That's a minimum of 1.5 round trips before data flows. Add TLS and you're at 2-3 round trips. At 100ms RTT, that's 200-300ms of pure handshake latency.
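The arithmetic is simple enough to sketch. The function below is a hypothetical helper, and the round-trip counts are the ones quoted above rather than measurements:

```python
def setup_latency_ms(rtt_ms: float, round_trips: float) -> float:
    """Time spent on handshakes before application data flows."""
    return rtt_ms * round_trips

RTT_MS = 100
for label, round_trips in [
    ("TCP only", 1.5),
    ("TCP + TLS, low end", 2),
    ("TCP + TLS, high end", 3),
]:
    print(f"{label}: {setup_latency_ms(RTT_MS, round_trips):.0f} ms")
```

Plug in your own RTT to see how quickly handshake cost dominates a short request on a long-haul or mobile link.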

The Real Costs of TCP

The headline costs -- connection state memory, head-of-line blocking, slow-start ramp-up, retransmission delays -- all flow from the same root: TCP provides in-order, reliable delivery, and that guarantee is not free. 100,000 concurrent connections means 100,000 state machines in the kernel. A single lost packet holds up everything behind it until retransmit. Slow start means short HTTP requests never reach line rate. And a retransmitted video frame that arrives 500ms late is worse than the dropped one it replaced.
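Head-of-line blocking is easier to see in a toy simulation than in prose. The sketch below is not how a kernel implements TCP; it only models the rule that a segment is handed to the application once every earlier segment has been delivered. The arrival times are made-up numbers.

```python
def deliver_in_order(arrivals):
    """Simulate TCP-style in-order delivery.

    arrivals: list of (arrival_time_ms, seq) in arrival order.
    Returns a list of (delivery_time_ms, seq). A segment is released to the
    application only after every lower sequence number has been released.
    """
    buffered = set()
    next_seq = 0
    delivered = []
    for arrival_ms, seq in arrivals:
        buffered.add(seq)
        # Release the longest contiguous run starting at next_seq.
        while next_seq in buffered:
            buffered.remove(next_seq)
            delivered.append((arrival_ms, next_seq))
            next_seq += 1
    return delivered

# Segment 1 is lost and its retransmission shows up at t=260ms.
# Segments 2 and 3 arrived on time but have to wait behind it.
arrivals = [(0, 0), (40, 2), (60, 3), (260, 1)]
print(deliver_in_order(arrivals))
# [(0, 0), (260, 1), (260, 2), (260, 3)]
```

Segments 2 and 3 were sitting in the receive buffer at 40ms and 60ms, yet the application sees them at 260ms. That gap is the latency tax.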

TCP vs UDP: Head-to-Head Comparison

Feature	TCP	UDP
Connection	Connection-oriented (handshake)	Connectionless (fire and forget)
Reliability	Guaranteed delivery with ACKs	No delivery guarantee
Ordering	Strict in-order delivery	No ordering
Header size	20-60 bytes	8 bytes
Flow control	Sliding window	None
Congestion control	Built-in (slow start, AIMD)	None (application must handle)
Speed	Higher latency (handshake + ACKs)	Lower latency (no overhead)
Use case	Web, email, file transfer, APIs	Video, gaming, DNS, streaming
Error detection	Checksum + retransmission	Checksum only (optional in IPv4)

When to Use TCP

TCP is the right choice when data integrity matters more than latency:

Web traffic (HTTP/HTTPS) -- A corrupted or missing HTML tag breaks the entire page. You need every byte.
API calls -- REST, GraphQL, and gRPC all need reliable delivery because partial responses are useless. They traditionally run over TCP (or over QUIC when served via HTTP/3).
File transfers -- FTP, SCP, rsync. A missing byte in a binary corrupts the file.
Email (SMTP/IMAP) -- Losing part of an email is unacceptable.
Database connections -- PostgreSQL, MySQL, and Redis all use TCP. Queries must arrive complete.

When to Use UDP

UDP wins when timeliness matters more than completeness:

Video conferencing -- A dropped frame is barely noticeable; a delayed frame creates jarring lag. Zoom, Meet, and Teams prefer UDP and fall back to TCP only when UDP is blocked.
Online gaming -- Player positions update 30-60 times per second. A stale position is worse than a missing one.
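Low-latency live streaming -- Viewers tolerate a brief glitch far better than buffering while TCP retransmits. (Segment-based delivery such as HLS and DASH runs over HTTP and hides loss behind several seconds of buffer instead.)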
DNS queries -- A single request-response pair. The overhead of a TCP handshake doubles the latency for something that fits in one packet.
VoIP -- Voice packets older than ~150ms are useless. Retransmitting them adds latency without adding value.
IoT telemetry -- Sensors sending thousands of readings per second. Missing one reading is fine; backpressure from TCP is not.

QUIC: The Best of Both Worlds

QUIC, the protocol behind HTTP/3, is built on UDP but adds reliability and ordering per stream, plus connection-wide congestion control. It's essentially TCP's feature set reimplemented in userspace on top of UDP, with critical improvements:

No head-of-line blocking -- Each QUIC stream is independently ordered. Lost data on one stream doesn't block others.
Faster handshakes -- QUIC combines transport and TLS handshakes into 1 RTT (0-RTT on resumption).
Connection migration -- Connections survive IP address changes (Wi-Fi to cellular).
Userspace implementation -- QUIC runs in application space, not the kernel, so it can evolve without OS updates.

Pro tip: QUIC doesn't make the TCP vs UDP decision go away for your own protocols. It is a general-purpose transport, but it was designed with HTTP/3 as its main application, and its streams are always reliable and ordered. If you're building a custom protocol, you still need to choose: TCP for reliability, UDP for speed, or your own reliability layer on UDP (which is what game developers have been doing for decades).

Real Numbers: Handshake Cost vs Packet Loss Cost

The abstract "latency vs reliability" argument gets crisp once you put real numbers to it. I measured these on a stock Linux 6.8 box pushing traffic over a 20ms-RTT link with 1% simulated loss via tc netem:

Metric	TCP	UDP
Connection setup (no TLS)	1.5 RTT (~30ms)	0 RTT
Connection setup (with TLS 1.3)	2.5 RTT (~50ms)	N/A (app layer)
Header overhead per packet	20-60 bytes	8 bytes
Latency added by 1 lost packet	RTO (200ms+) or fast retransmit after 3 duplicate ACKs	0ms (packet gone)
Throughput after loss event	Drops ~30% under CUBIC (cwnd cut to 0.7x; Reno halves it)	Unchanged
Memory per 100k connections	~400 MB kernel state	~0 (connectionless)

That "RTO = 200ms+" line is the one that killed our voice chat. Linux's minimum retransmission timeout is 200ms, and mobile carriers routinely run at 100+ ms RTT. One lost packet on a voice stream becomes a 300ms stall that the user hears as a stutter, a late word, or dropped audio entirely. On UDP, the same loss is a single silent frame the codec conceals.

Failure Modes That Bite Production

TCP Incast Collapse

Data-center workloads that fan out a request to many servers and wait for all replies hit "incast" collapse -- every server responds at once, the switch's shallow buffers overflow, TCP sees heavy loss, every connection cuts its congestion window, and aggregate throughput collapses by 50-90%. Common fixes are DCTCP (modified congestion control), larger switch buffers, or moving the RPC layer to UDP with application-level flow control.

Nagle's Algorithm vs Delayed ACK Deadlock

The classic "why is my small-write latency 40ms when the RTT is 1ms?" bug. Nagle's algorithm holds back a small write until earlier data has been acknowledged; the receiver's delayed ACK waits up to 40ms (on Linux) before acknowledging. Each side waits for the other until the timer fires. Disable Nagle on latency-sensitive TCP sockets with TCP_NODELAY, or batch writes at the application layer so they fill a full MSS.
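Setting the option is a one-liner in most languages. A minimal Python sketch follows; make_low_latency_socket is an illustrative name, and the socket is left unconnected, so connecting it is up to your application.

```python
import socket

def make_low_latency_socket() -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm so small writes go out immediately
    # instead of waiting for earlier data to be acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = make_low_latency_socket()
# Read the option back to confirm it took effect.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print(nodelay_enabled)
```

The trade-off is more, smaller packets on the wire, so this suits request-response traffic better than bulk streaming.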

UDP Fragmentation Over the Public Internet

IPv4 fragments any UDP packet larger than the path MTU (usually 1500 bytes minus headers, so ~1472 bytes of payload). Fragmented UDP gets dropped by a lot of consumer NATs and middleboxes -- one missing fragment discards the whole datagram. Keep UDP payloads under 1200 bytes for internet traffic, and rely on path MTU discovery or probing if you need more.
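The payload budget is plain subtraction, and staying under it usually means chunking at the application layer. This is a minimal sketch, assuming a 1500-byte Ethernet MTU and an IPv4 header without options; split_into_datagrams is an illustrative helper, not a library function.

```python
ETHERNET_MTU = 1500
IPV4_HEADER_BYTES = 20  # without IP options
UDP_HEADER_BYTES = 8
MAX_UNFRAGMENTED_PAYLOAD = ETHERNET_MTU - IPV4_HEADER_BYTES - UDP_HEADER_BYTES
SAFE_PAYLOAD = 1200  # conservative budget for internet paths

def split_into_datagrams(data: bytes, max_payload: int = SAFE_PAYLOAD) -> list[bytes]:
    """Split data into chunks that each fit in one unfragmented datagram."""
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]

chunks = split_into_datagrams(b"x" * 3000)
print(MAX_UNFRAGMENTED_PAYLOAD, [len(c) for c in chunks])
# 1472 [1200, 1200, 600]
```

Each chunk still needs its own sequence number or offset so the receiver can reassemble the message and notice gaps; that header has to fit inside the same 1200-byte budget.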

NAT Timeout for Long-Lived UDP

Home router NAT entries for UDP typically expire after 30-180 seconds of inactivity. That's why VoIP and WebRTC apps send keepalive packets every 15-20 seconds. TCP connections are usually kept in the NAT table longer (up to 2 hours), but idle TCP without keepalive can still be silently dropped. If your UDP protocol has long idle periods, send empty pings.
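A keepalive only needs a timer that tracks when you last sent anything. Here is a minimal sketch; the 15-second default is an assumption chosen to sit comfortably below a 30-second NAT timeout, and KeepaliveTimer is an illustrative class, not part of any library.

```python
import time

KEEPALIVE_INTERVAL_S = 15.0  # assumed; tune to the shortest NAT timeout you expect

class KeepaliveTimer:
    def __init__(self, interval_s: float = KEEPALIVE_INTERVAL_S, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self.last_sent = clock()

    def mark_sent(self) -> None:
        """Call after sending any packet; real traffic refreshes the NAT entry too."""
        self.last_sent = self.clock()

    def due(self) -> bool:
        """True when the link has been idle long enough to need a ping."""
        return self.clock() - self.last_sent >= self.interval_s

# In the send loop (sock and peer are whatever your application already has):
#     if timer.due():
#         sock.sendto(b"", peer)  # an empty datagram is enough to refresh the mapping
#         timer.mark_sent()
```

Injecting the clock keeps the timer testable without sleeping, and calling mark_sent() on every outgoing packet means pings are only sent when the connection is actually idle.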

Building a Reliable Protocol on UDP

When UDP is the right base layer but you need some reliability, you implement just enough of TCP -- and nothing more. A minimal reliable-UDP loop in Python looks roughly like this:

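import socket
import struct
import time

RTO_SECONDS = 0.050  # 50ms RTO -- tune for your RTT

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
pending = {}  # seq -> (packet, peer, last_sent_at)

def send_reliable(seq: int, payload: bytes, peer):
    # Header: 4-byte sequence number + 8-byte send timestamp (ms), network byte order.
    packet = struct.pack("!IQ", seq, int(time.time() * 1000)) + payload
    sock.sendto(packet, peer)
    # Remember the peer so tick() knows where to resend.
    pending[seq] = (packet, peer, time.monotonic())

def tick():
    # Call periodically: resend anything unacknowledged for longer than the RTO.
    now = time.monotonic()
    for seq, (packet, peer, sent_at) in list(pending.items()):
        if now - sent_at > RTO_SECONDS:
            sock.sendto(packet, peer)
            pending[seq] = (packet, peer, now)

def on_ack(seq: int):
    # The receiver acknowledged seq, so stop retransmitting it.
    pending.pop(seq, None)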

This is what QUIC, RakNet, ENet, and most game networking stacks do under the hood -- but selectively. Position updates are best-effort. Shots-fired events are reliable. Chat messages are reliable and ordered. Mixing policies per message type is the real superpower of rolling your own on UDP, and it's why "just use TCP" is rarely the right answer for fast-paced games.
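One way to express that per-message mix is a small policy table that the send path consults. This is a sketch with hypothetical message-type names, not the API of any engine or library named above.

```python
from enum import Enum

class Delivery(Enum):
    BEST_EFFORT = "best_effort"            # send once, never retransmit
    RELIABLE = "reliable"                  # retransmit until acknowledged
    RELIABLE_ORDERED = "reliable_ordered"  # retransmit and deliver in sequence

# Hypothetical message types, mapped the way the paragraph above describes.
POLICY = {
    "position_update": Delivery.BEST_EFFORT,
    "shot_fired": Delivery.RELIABLE,
    "chat_message": Delivery.RELIABLE_ORDERED,
}

def needs_retransmit(message_type: str) -> bool:
    return POLICY[message_type] is not Delivery.BEST_EFFORT

def needs_ordering(message_type: str) -> bool:
    return POLICY[message_type] is Delivery.RELIABLE_ORDERED
```

A sender would track only messages where needs_retransmit() is true for resending, and a receiver would hold back only messages where needs_ordering() is true, so a lost position update never delays a chat message or a shot.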

Pricing and Tooling for Network Protocol Analysis

Debugging TCP and UDP issues requires the right tools:

Tool	Purpose	Cost
Wireshark	Packet capture and deep protocol analysis	Free (open source)
tcpdump	CLI packet capture on Linux/macOS	Free (built-in)
Datadog NPM	Network performance monitoring at scale	$5/host/month
AWS VPC Flow Logs	Network traffic logging in AWS	$0.50/GB ingested
Cloudflare Radar	Internet traffic insights and protocol adoption	Free

Frequently Asked Questions

Is UDP faster than TCP?

UDP has lower latency because it skips the connection handshake, doesn't wait for acknowledgments, and doesn't retransmit lost packets. However, "faster" is nuanced -- UDP sends data sooner, but doesn't guarantee it arrives. For bulk data transfer, TCP with its congestion control can achieve higher sustained throughput. UDP is faster for latency-sensitive, loss-tolerant applications.

Can I use TCP for gaming?

You can, and some games do -- particularly turn-based or slower-paced games where latency isn't critical. But fast-paced multiplayer games (FPS, battle royale) use UDP because TCP's retransmission delays cause visible lag. Many game engines use UDP with a custom reliability layer that selectively retransmits only the data that actually needs to arrive.

Why does DNS use UDP?

Most DNS queries and responses fit in a single packet (under 512 bytes historically, up to 4096 with EDNS). A TCP handshake would roughly double the latency of what is otherwise a single round-trip exchange. DNS does fall back to TCP for large responses (like DNSSEC-signed records) and uses it for zone transfers, but the common case is optimized for UDP's speed.
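To see how small a query really is, here is a minimal sketch that builds one by hand in the standard DNS wire format. build_dns_query is an illustrative helper and the query ID is arbitrary; production code should use a resolver library instead.

```python
import struct

def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)

query = build_dns_query("example.com")
print(len(query))  # 29
```

At 29 bytes, the whole question fits in one datagram many times over. Sending it is a single sendto() on a UDP socket to port 53 of whichever resolver you use.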

What happens when a UDP packet is lost?

Nothing, at the protocol level. UDP doesn't detect or recover from loss. The sending application gets no indication that the packet was lost. The receiving application simply never sees the data. If your application needs reliability over UDP, you must implement your own acknowledgment and retransmission logic -- which is exactly what protocols like QUIC and many game networking libraries do.

Does TCP guarantee data arrives?

TCP guarantees that data either arrives completely, in order, and without corruption -- or the connection fails with an error. It does not guarantee delivery in a fixed time. If the network is severely degraded, TCP will keep retransmitting, backing off exponentially, until its timeout expires and the connection resets. So you get the data or you get an error, but not silence.

Why not always use TCP since it's more reliable?

TCP's reliability comes at a cost: higher latency, connection state overhead, and head-of-line blocking. For real-time applications, TCP's guarantees actively hurt user experience. A video call that freezes for 200ms while TCP retransmits a single packet is worse than one that drops a frame and moves on. The right protocol depends on whether your application values completeness or timeliness.

What is TCP Fast Open?

TCP Fast Open (TFO) is an extension that allows data to be sent in the SYN packet on repeat connections, so the server can start processing the request without waiting for the handshake to complete, saving a full round trip. It uses a cookie from a previous connection to authenticate the client. Support is widespread in operating systems but adoption is low because middleboxes (firewalls, NATs) often strip the TFO option. QUIC's 0-RTT achieves the same goal more reliably.

Conclusion

The TCP vs UDP decision comes down to one question: can your application tolerate lost data? If the answer is no -- web pages, API calls, file transfers -- use TCP and accept the latency overhead. If the answer is yes -- real-time video, gaming, telemetry -- use UDP and handle reliability in your application layer where it matters. And if you're building on HTTP, just use HTTP/3 and let QUIC make the trade-off for you. Stop defaulting to TCP because it's "safer." Match the protocol to the problem.