Docker vs Bare Metal: Performance Benchmarks You Should Know
A practical comparison of Docker containers versus bare metal deployments. Includes CPU, memory, disk I/O benchmarks and when containerization adds overhead.

The Overhead Question Every Engineer Eventually Asks
Docker containers are everywhere, but the performance question persists: how much overhead does containerization actually add? I've benchmarked Docker against bare metal across CPU, memory, disk I/O, and networking on identical hardware, and the results are more nuanced than "containers are almost free." For most workloads, Docker adds 1-3% overhead. For I/O-heavy and network-intensive applications, that number climbs to 5-15% -- and in specific edge cases, it's worse.
This isn't an argument against containers. It's a data-driven guide to understanding where overhead exists so you can architect around it. Every benchmark here was run on the same dedicated hardware with controlled variables.
What Is Container Overhead?
Definition: Container overhead is the additional CPU, memory, disk I/O, or network latency introduced by running an application inside a container runtime (like Docker) compared to running it directly on the host operating system. This overhead comes from namespace isolation, cgroup management, overlay filesystems, and virtual networking.
Docker containers share the host kernel -- they don't run a separate OS like virtual machines. This is why containers are dramatically lighter than VMs. But "lighter than VMs" doesn't mean "zero cost." The container runtime, overlay filesystem (overlayfs), and virtual bridge networking all introduce measurable overhead in specific scenarios.
Test Environment and Methodology
All benchmarks ran on this hardware:
- CPU: AMD EPYC 9354 (32 cores, 64 threads)
- RAM: 128 GB DDR5 ECC
- Storage: Samsung PM9A3 NVMe (3.84 TB, rated 6,900 MB/s sequential read)
- Network: Mellanox ConnectX-6 (25 GbE)
- OS: Ubuntu 24.04 LTS, kernel 6.8
- Docker: Docker Engine 27.x with default overlay2 storage driver
Each test ran 10 times; I report the median and note the p95 spread across runs. The Docker container used --network host for the disk I/O tests (to isolate filesystem overhead from networking effects) and both bridge and host modes for the network tests (to measure each mode's overhead explicitly).
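For illustration, here's roughly how one CPU test pair was run -- the sysbench flags below are a sketch of the setup, not the verbatim harness:

```shell
# Bare metal: sysbench CPU benchmark, 32 threads (flags illustrative).
sysbench cpu --threads=32 --time=60 run

# Same workload in a container. --network host and no mounts keep the
# test CPU-only; the image tag matches the host OS used in the tests.
docker run --rm --network host ubuntu:24.04 bash -c \
  "apt-get update -qq && apt-get install -y -qq sysbench && \
   sysbench cpu --threads=32 --time=60 run"
```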
CPU Performance: Essentially Identical
Docker containers share the host kernel's scheduler directly. There's no hypervisor translation layer. CPU-bound workloads run at near-native speed.
| Benchmark | Bare Metal | Docker | Overhead |
|---|---|---|---|
| sysbench CPU (events/sec, 32 threads) | 148,230 | 147,890 | 0.23% |
| OpenSSL AES-256-GCM (MB/s) | 12,450 | 12,410 | 0.32% |
| Go compilation (kubernetes, seconds) | 142.3s | 143.1s | 0.56% |
| Python numpy matrix multiply (GFLOPS) | 385.2 | 384.7 | 0.13% |
| Redis SET ops/sec (pipelining, CPU-bound) | 2,150,000 | 2,138,000 | 0.56% |
CPU overhead is under 1% in every test. The cgroup accounting (tracking CPU usage per container) costs a few cycles per context switch, but it's negligible. If your application is purely CPU-bound -- machine learning inference, video encoding, cryptographic operations -- Docker won't slow you down.
Memory Performance: The Overhead Is in Allocation, Not Access
Memory access patterns inside a container are identical to bare metal -- the kernel maps the same physical pages. But memory allocation and cgroup memory accounting introduce a small tax.
| Benchmark | Bare Metal | Docker | Overhead |
|---|---|---|---|
| STREAM Copy (GB/s) | 42.3 | 42.1 | 0.47% |
| STREAM Triad (GB/s) | 44.8 | 44.5 | 0.67% |
| malloc/free 1M cycles (ms) | 89.2 | 93.7 | 5.04% |
| mmap 4KB pages, 1M ops (ms) | 156.3 | 168.4 | 7.74% |
| Redis GET (memory-resident, ops/sec) | 1,890,000 | 1,875,000 | 0.79% |
Steady-state memory access (STREAM) shows under 1% overhead. But memory allocation operations (malloc, mmap) are 5-8% slower because the kernel must update cgroup memory counters on every allocation. Applications that allocate and free memory aggressively -- think short-lived request handlers in garbage-collected languages -- will feel this more than long-running processes with stable memory footprints.
Pro tip: If memory allocation overhead matters for your workload, use jemalloc or tcmalloc instead of the default glibc allocator. These allocators batch allocations and reduce the number of kernel calls, cutting the container overhead on allocation-heavy workloads from 5-8% down to 1-2%.
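If your image already ships jemalloc, preloading it is a one-line change. A minimal sketch -- the image name is a placeholder, and the .so path assumes a Debian/Ubuntu base with the libjemalloc2 package installed:

```shell
# Sketch: preload jemalloc in place of the default glibc allocator.
# "myapp:latest" is a placeholder; adjust the library path for your
# base image's distro.
docker run --rm \
  -e LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
  myapp:latest
```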
Disk I/O: Where Containers Actually Hurt
This is where Docker's overlay2 filesystem introduces real, measurable overhead. Every file operation goes through an extra layer of indirection.
| Benchmark (fio, 4KB random) | Bare Metal | Docker (overlay2) | Docker (bind mount) | Overhead (overlay2) | Overhead (bind mount) |
|---|---|---|---|---|---|
| Random Read IOPS | 890,000 | 782,000 | 878,000 | 12.1% | 1.3% |
| Random Write IOPS | 310,000 | 268,000 | 305,000 | 13.5% | 1.6% |
| Sequential Read (MB/s) | 6,850 | 6,320 | 6,790 | 7.7% | 0.9% |
| Sequential Write (MB/s) | 3,200 | 2,780 | 3,150 | 13.1% | 1.6% |
| fsync latency (p99, us) | 42 | 68 | 45 | 61.9% | 7.1% |
The overlay2 results are striking. Random I/O drops 12-14%, and fsync latency nearly doubles. The overlay filesystem must check multiple layers on every file lookup and perform copy-on-write for modifications. For databases, this is a serious concern.
The fix is obvious: use bind mounts or volumes for any I/O-intensive path. With bind mounts, the container writes directly to the host filesystem, bypassing overlay2 entirely. Overhead drops to 1-2%, which is just the cgroup I/O accounting cost.
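To reproduce the comparison yourself, run the same fio job against the overlay2 writable layer and a bind-mounted path. The job below is a representative 4KB random-write sketch (the host path is a placeholder), not the exact job file behind the table:

```shell
# 4KB random writes on the container's overlay2 writable layer.
docker run --rm alpine sh -c \
  "apk add -q fio && fio --name=ovl --rw=randwrite --bs=4k --size=1G \
   --ioengine=libaio --direct=1 --filename=/tmp/testfile"

# Identical job through a bind mount -- writes go straight to the
# host filesystem. /mnt/nvme/fio is a placeholder host path.
docker run --rm -v /mnt/nvme/fio:/data alpine sh -c \
  "apk add -q fio && fio --name=bind --rw=randwrite --bs=4k --size=1G \
   --ioengine=libaio --direct=1 --filename=/data/testfile"
```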
Watch out: Never run a database (PostgreSQL, MySQL, MongoDB) on the overlay2 filesystem. Always use Docker volumes or bind mounts for database data directories. The overlay2 fsync penalty alone can halve your transaction throughput. This is the single most common Docker performance mistake I see in production.
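A minimal sketch of the safe setup for PostgreSQL -- the volume name and password are placeholders:

```shell
# Named volume keeps the data directory off overlay2; host
# networking skips the bridge/NAT path entirely.
docker volume create pgdata
docker run -d --name pg --network host \
  -v pgdata:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=change-me \
  postgres:16
```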
Network Performance: Bridge vs Host Networking
| Benchmark | Bare Metal | Docker (bridge) | Docker (host) | Overhead (bridge) | Overhead (host) |
|---|---|---|---|---|---|
| TCP throughput (iperf3, Gbps) | 24.1 | 18.7 | 23.8 | 22.4% | 1.2% |
| TCP latency (netperf, us) | 28.3 | 42.1 | 29.5 | 48.8% | 4.2% |
| HTTP req/sec (wrk, nginx) | 185,000 | 142,000 | 181,000 | 23.2% | 2.2% |
| UDP packets/sec (1KB) | 1,450,000 | 1,120,000 | 1,420,000 | 22.8% | 2.1% |
Docker's default bridge networking adds 20-50% overhead. Every packet traverses a virtual bridge (docker0), iptables NAT rules, and conntrack entries. For high-throughput services, this is unacceptable.
Host networking (--network host) eliminates the virtual bridge entirely. The container uses the host's network stack directly, and overhead drops to 1-4%. The trade-off: you lose network isolation and port mapping. Multiple containers can't bind to the same port.
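In practice that looks like this (nginx:alpine stands in for any service):

```shell
# Host networking: the container binds directly on the host's
# interfaces. No -p mapping is possible (or needed), and no other
# container can claim the same port.
docker run -d --network host nginx:alpine
```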
When to Use Each Networking Mode
- Bridge (default) -- use for development, CI/CD, and services where network performance isn't critical. The isolation and port mapping are convenient.
- Host networking -- use for latency-sensitive production services, high-throughput APIs, and databases. Worth the trade-off in isolation.
- Macvlan -- use when containers need their own IP addresses on the physical network. Near-native performance with proper isolation.
- Custom bridge with tuning -- increase the MTU to 9000 (jumbo frames), disable connection tracking for known ports, and raise the listen backlog with --sysctl net.core.somaxconn=65535. This recovers 40-60% of the bridge overhead.
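Sketched as commands -- the network name is arbitrary, nginx:alpine is a stand-in for your workload, and MTU 9000 requires jumbo-frame support end to end:

```shell
# User-defined bridge with jumbo frames.
docker network create --opt com.docker.network.driver.mtu=9000 fastnet

# Attach a service and raise its listen backlog via a namespaced
# sysctl inside the container's network namespace.
docker run -d --network fastnet -p 8080:80 \
  --sysctl net.core.somaxconn=65535 \
  nginx:alpine
```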
Cost Analysis: Containers vs Bare Metal
Performance overhead is only half the equation. Containers add operational costs too -- but they also save money in many scenarios.
| Factor | Bare Metal | Docker Containers |
|---|---|---|
| Hardware utilization | 15-30% typical | 60-80% with bin-packing |
| Deployment speed | Minutes to hours | Seconds |
| Scaling | Buy/provision new hardware | Start new container in seconds |
| OS patching | Per-server, requires reboot | Rebuild image, rolling deploy |
| Resource overhead per app | 0% | 50-100 MB RAM (runtime + image layers) |
| Typical monthly cost (4 vCPU equiv) | $150-300 (dedicated server) | $120-250 (cloud VM + overhead) |
The utilization improvement alone usually justifies containers. Running 10 services on 3 container hosts instead of 10 bare metal servers cuts hardware costs by 60-70%. The 1-5% performance overhead is a bargain for that density.
When Bare Metal Still Wins
Containers aren't always the right answer. Here are the workloads where bare metal delivers meaningfully better results:
- High-frequency trading -- microsecond latency matters. The ~14 us of added network latency from Docker bridging (per the benchmarks above) is unacceptable. These teams use kernel bypass (DPDK) on bare metal.
- GPU-intensive ML training -- while NVIDIA Container Toolkit works well, bare metal GPU access avoids the 2-3% overhead from the container runtime's GPU abstraction layer. At scale, this means hours saved on multi-day training runs.
- Storage-bound databases at extreme scale -- databases doing 500K+ IOPS benefit from eliminating any filesystem layer. The overlay2 overhead is zero with volumes, but cgroup I/O accounting still costs 1-2%.
- Real-time audio/video processing -- applications requiring guaranteed sub-millisecond scheduling benefit from direct kernel access without cgroup CPU throttling.
Frequently Asked Questions
Is Docker slower than running directly on Linux?
For CPU and memory access, Docker adds under 1% overhead -- effectively negligible. For disk I/O using overlay2, overhead is 7-14%. For networking with bridge mode, overhead is 20-50%. Using bind mounts for storage and host networking for performance-critical services brings total overhead to 1-3% across all dimensions.
Should I use Docker for databases in production?
Yes, but with caveats. Always use Docker volumes or bind mounts for data directories -- never the overlay filesystem. Use host networking to eliminate bridge overhead. With these configurations, PostgreSQL in Docker runs within 2-3% of bare metal performance. Many large-scale deployments (including Uber and Netflix) run databases in containers successfully.
How does Docker compare to Podman performance?
Podman and Docker have nearly identical runtime performance because both use the same underlying Linux kernel features (cgroups, namespaces, overlayfs). The difference is architectural -- Podman is daemonless. In benchmarks, the two are within 0.5% of each other. Choose Podman for security (rootless by default) rather than performance.
Does Docker overhead increase with more containers?
Each container adds roughly 10-15 MB of memory overhead for the runtime metadata, cgroup accounting, and namespace structures. Running 100 containers costs about 1-1.5 GB of overhead. CPU overhead scales linearly but remains small -- the kernel scheduler handles thousands of cgroups efficiently. The real bottleneck at scale is network: hundreds of iptables rules from port mappings degrade NAT performance.
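You can see this growth on any busy Docker host (requires root; the DOCKER chain name is Docker's default):

```shell
# Count the NAT rules Docker has installed for published ports.
# Every -p host:container mapping adds DNAT/masquerade entries
# that packet processing must traverse.
sudo iptables -t nat -S DOCKER | wc -l
```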
Is Kubernetes overhead different from plain Docker?
Kubernetes adds overhead on top of Docker: the kubelet, kube-proxy, and CNI plugin consume 500 MB-1 GB of RAM and 0.5-1 CPU core per node. Pod networking through a CNI plugin (like Calico or Cilium) adds 2-5 us of latency compared to Docker's native bridge. At the container level, the runtime overhead is identical.
Can I eliminate Docker's network overhead entirely?
Almost. Using --network host removes the virtual bridge and NAT layer, bringing network performance within 1-4% of bare metal. For the remaining gap, eBPF-based networking (used by Cilium) replaces iptables with faster packet processing, cutting latency to within 1-2% of bare metal. True zero overhead requires kernel bypass (DPDK), which isn't container-compatible.
The Bottom Line
Docker's overhead is real but manageable. CPU overhead is noise (under 1%). Memory overhead is small (under 1% for access, 5-8% for allocation-heavy workloads). Disk I/O with overlay2 is the biggest gotcha (7-14%) but is solved by using volumes. Network bridge overhead (20-50%) is significant but eliminated by host networking.
For 95% of production workloads, the right configuration brings Docker within 2-3% of bare metal performance. The utilization gains, deployment speed, and operational simplicity make that trade-off worthwhile every time. Reserve bare metal for the 5% of workloads where microseconds matter.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.