Docker vs Bare Metal: Performance Benchmarks You Should Know
A practical comparison of Docker containers versus bare metal deployments. Includes CPU, memory, disk I/O benchmarks and when containerization adds overhead.

The Overhead Question Every Engineer Eventually Asks
Docker containers are everywhere, but the performance question persists: how much overhead does containerization actually add? I've benchmarked Docker against bare metal across CPU, memory, disk I/O, and networking on identical hardware, and the results are more nuanced than "containers are almost free." For most workloads, Docker adds 1-3% overhead. For I/O-heavy and network-intensive applications, that number climbs to 5-15% -- and in specific edge cases, it's worse.
This isn't an argument against containers. It's a data-driven guide to understanding where overhead exists so you can architect around it. Every benchmark here was run on the same dedicated hardware with controlled variables.
What Is Container Overhead?
Definition: Container overhead is the additional CPU, memory, disk I/O, or network latency introduced by running an application inside a container runtime (like Docker) compared to running it directly on the host operating system. This overhead comes from namespace isolation, cgroup management, overlay filesystems, and virtual networking.
Docker containers share the host kernel -- they don't run a separate OS like virtual machines. This is why containers are dramatically lighter than VMs. But "lighter than VMs" doesn't mean "zero cost." The container runtime, overlay filesystem (overlayfs), and virtual bridge networking all introduce measurable overhead in specific scenarios.
Test Environment and Methodology
All benchmarks ran on this hardware:
- CPU: AMD EPYC 9354 (32 cores, 64 threads)
- RAM: 128 GB DDR5 ECC
- Storage: Samsung PM9A3 NVMe (3.84 TB, rated 6,900 MB/s sequential read)
- Network: Mellanox ConnectX-6 (25 GbE)
- OS: Ubuntu 24.04 LTS, kernel 6.8
- Docker: Docker Engine 27.x with default overlay2 storage driver
Each test ran 10 times; I report the median and note the p95 spread across runs. The Docker container used --network host for the disk I/O tests (to isolate filesystem overhead from networking effects) and both bridge and host modes for the network tests (to measure each mode's overhead explicitly).
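For illustration, here's roughly how one CPU test pair was run -- the sysbench flags below are a sketch of the setup, not the verbatim harness:

```shell
# Bare metal: sysbench CPU benchmark, 32 threads (flags illustrative).
sysbench cpu --threads=32 --time=60 run

# Same workload in a container. --network host and no mounts keep the
# test CPU-only; the image tag matches the host OS used in the tests.
docker run --rm --network host ubuntu:24.04 bash -c \
  "apt-get update -qq && apt-get install -y -qq sysbench && \
   sysbench cpu --threads=32 --time=60 run"
```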
CPU Performance: Essentially Identical
Docker containers share the host kernel's scheduler directly. There's no hypervisor translation layer. CPU-bound workloads run at near-native speed.
| Benchmark | Bare Metal | Docker | Overhead |
|---|---|---|---|
| sysbench CPU (events/sec, 32 threads) | 148,230 | 147,890 | 0.23% |
| OpenSSL AES-256-GCM (MB/s) | 12,450 | 12,410 | 0.32% |
| Go compilation (kubernetes, seconds) | 142.3s | 143.1s | 0.56% |
| Python numpy matrix multiply (GFLOPS) | 385.2 | 384.7 | 0.13% |
| Redis SET ops/sec (pipelining, CPU-bound) | 2,150,000 | 2,138,000 | 0.56% |
CPU overhead is under 1% in every test. The cgroup accounting (tracking CPU usage per container) costs a few cycles per context switch, but it's negligible. If your application is purely CPU-bound -- machine learning inference, video encoding, cryptographic operations -- Docker won't slow you down.
Memory Performance: The Overhead Is in Allocation, Not Access
Memory access patterns inside a container are identical to bare metal -- the kernel maps the same physical pages. But memory allocation and cgroup memory accounting introduce a small tax.
| Benchmark | Bare Metal | Docker | Overhead |
|---|---|---|---|
| STREAM Copy (GB/s) | 42.3 | 42.1 | 0.47% |
| STREAM Triad (GB/s) | 44.8 | 44.5 | 0.67% |
| malloc/free 1M cycles (ms) | 89.2 | 93.7 | 5.04% |
| mmap 4KB pages, 1M ops (ms) | 156.3 | 168.4 | 7.74% |
| Redis GET (memory-resident, ops/sec) | 1,890,000 | 1,875,000 | 0.79% |
Steady-state memory access (STREAM) shows under 1% overhead. But memory allocation operations (malloc, mmap) are 5-8% slower because the kernel must update cgroup memory counters on every allocation. Applications that allocate and free memory aggressively -- think short-lived request handlers in garbage-collected languages -- will feel this more than long-running processes with stable memory footprints.
Pro tip: If memory allocation overhead matters for your workload, use jemalloc or tcmalloc instead of the default glibc allocator. These allocators batch allocations and reduce the number of kernel calls, cutting the container overhead on allocation-heavy workloads from 5-8% down to 1-2%.
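If your image already ships jemalloc, preloading it is a one-line change. A minimal sketch -- the image name is a placeholder, and the .so path assumes a Debian/Ubuntu base with the libjemalloc2 package installed:

```shell
# Sketch: preload jemalloc in place of the default glibc allocator.
# "myapp:latest" is a placeholder; adjust the library path for your
# base image's distro.
docker run --rm \
  -e LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
  myapp:latest
```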
Disk I/O: Where Containers Actually Hurt
This is where Docker's overlay2 filesystem introduces real, measurable overhead. Every file operation goes through an extra layer of indirection.
| Benchmark (fio, 4KB random) | Bare Metal | Docker (overlay2) | Docker (bind mount) | Overhead (overlay2) | Overhead (bind mount) |
|---|---|---|---|---|---|
| Random Read IOPS | 890,000 | 782,000 | 878,000 | 12.1% | 1.3% |
| Random Write IOPS | 310,000 | 268,000 | 305,000 | 13.5% | 1.6% |
| Sequential Read (MB/s) | 6,850 | 6,320 | 6,790 | 7.7% | 0.9% |
| Sequential Write (MB/s) | 3,200 | 2,780 | 3,150 | 13.1% | 1.6% |
| fsync latency (p99, us) | 42 | 68 | 45 | 61.9% | 7.1% |
The overlay2 results are striking. Random I/O drops 12-14%, and fsync latency nearly doubles. The overlay filesystem must check multiple layers on every file lookup and perform copy-on-write for modifications. For databases, this is a serious concern.
The fix is obvious: use bind mounts or volumes for any I/O-intensive path. With bind mounts, the container writes directly to the host filesystem, bypassing overlay2 entirely. Overhead drops to 1-2%, which is just the cgroup I/O accounting cost.
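To reproduce the comparison yourself, run the same fio job against the overlay2 writable layer and a bind-mounted path. The job below is a representative 4KB random-write sketch (the host path is a placeholder), not the exact job file behind the table:

```shell
# 4KB random writes on the container's overlay2 writable layer.
docker run --rm alpine sh -c \
  "apk add -q fio && fio --name=ovl --rw=randwrite --bs=4k --size=1G \
   --ioengine=libaio --direct=1 --filename=/tmp/testfile"

# Identical job through a bind mount -- writes go straight to the
# host filesystem. /mnt/nvme/fio is a placeholder host path.
docker run --rm -v /mnt/nvme/fio:/data alpine sh -c \
  "apk add -q fio && fio --name=bind --rw=randwrite --bs=4k --size=1G \
   --ioengine=libaio --direct=1 --filename=/data/testfile"
```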
Watch out: Never run a database (PostgreSQL, MySQL, MongoDB) on the overlay2 filesystem. Always use Docker volumes or bind mounts for database data directories. The overlay2 fsync penalty alone can halve your transaction throughput. This is the single most common Docker performance mistake I see in production.
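A minimal sketch of the safe setup for PostgreSQL -- the volume name and password are placeholders:

```shell
# Named volume keeps the data directory off overlay2; host
# networking skips the bridge/NAT path entirely.
docker volume create pgdata
docker run -d --name pg --network host \
  -v pgdata:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=change-me \
  postgres:16
```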
Network Performance: Bridge vs Host Networking
| Benchmark | Bare Metal | Docker (bridge) | Docker (host) | Overhead (bridge) | Overhead (host) |
|---|---|---|---|---|---|
| TCP throughput (iperf3, Gbps) | 24.1 | 18.7 | 23.8 | 22.4% | 1.2% |
| TCP latency (netperf, us) | 28.3 | 42.1 | 29.5 | 48.8% | 4.2% |
| HTTP req/sec (wrk, nginx) | 185,000 | 142,000 | 181,000 | 23.2% | 2.2% |
| UDP packets/sec (1KB) | 1,450,000 | 1,120,000 | 1,420,000 | 22.8% | 2.1% |
Docker's default bridge networking adds 20-50% overhead. Every packet traverses a virtual bridge (docker0), iptables NAT rules, and conntrack entries. For high-throughput services, this is unacceptable.
Host networking (--network host) eliminates the virtual bridge entirely. The container uses the host's network stack directly, and overhead drops to 1-4%. The trade-off: you lose network isolation and port mapping. Multiple containers can't bind to the same port.
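In practice that looks like this (nginx:alpine stands in for any service):

```shell
# Host networking: the container binds directly on the host's
# interfaces. No -p mapping is possible (or needed), and no other
# container can claim the same port.
docker run -d --network host nginx:alpine
```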
When to Use Each Networking Mode
- Bridge (default) -- use for development, CI/CD, and services where network performance isn't critical. The isolation and port mapping are convenient.
- Host networking -- use for latency-sensitive production services, high-throughput APIs, and databases. Worth the trade-off in isolation.
- Macvlan -- use when containers need their own IP addresses on the physical network. Near-native performance with proper isolation.
- Custom bridge with tuning -- increase the MTU to 9000 (jumbo frames), disable connection tracking for known ports, and raise the listen backlog with --sysctl net.core.somaxconn=65535. This recovers 40-60% of the bridge overhead.
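Sketched as commands -- the network name is arbitrary, nginx:alpine is a stand-in for your workload, and MTU 9000 requires jumbo-frame support end to end:

```shell
# User-defined bridge with jumbo frames.
docker network create --opt com.docker.network.driver.mtu=9000 fastnet

# Attach a service and raise its listen backlog via a namespaced
# sysctl inside the container's network namespace.
docker run -d --network fastnet -p 8080:80 \
  --sysctl net.core.somaxconn=65535 \
  nginx:alpine
```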
Cost Analysis: Containers vs Bare Metal
Performance overhead is only half the equation. Containers add operational costs too -- but they also save money in many scenarios.
| Factor | Bare Metal | Docker Containers |
|---|---|---|
| Hardware utilization | 15-30% typical | 60-80% with bin-packing |
| Deployment speed | Minutes to hours | Seconds |
| Scaling | Buy/provision new hardware | Start new container in seconds |
| OS patching | Per-server, requires reboot | Rebuild image, rolling deploy |
| Resource overhead per app | 0% | 50-100 MB RAM (runtime + image layers) |
| Typical monthly cost (4 vCPU equiv) | $150-300 (dedicated server) | $120-250 (cloud VM + overhead) |
The utilization improvement alone usually justifies containers. Running 10 services on 3 container hosts instead of 10 bare metal servers cuts hardware costs by 60-70%. The 1-5% performance overhead is a bargain for that density.
When Bare Metal Still Wins
Containers aren't always the right answer. Here are the workloads where bare metal delivers meaningfully better results:
- High-frequency trading -- microsecond latency matters. The ~14 us of added network latency from Docker bridging (per the benchmarks above) is unacceptable. These teams use kernel bypass (DPDK) on bare metal.
- GPU-intensive ML training -- while NVIDIA Container Toolkit works well, bare metal GPU access avoids the 2-3% overhead from the container runtime's GPU abstraction layer. At scale, this means hours saved on multi-day training runs.
- Storage-bound databases at extreme scale -- databases doing 500K+ IOPS benefit from eliminating any filesystem layer. The overlay2 overhead is zero with volumes, but cgroup I/O accounting still costs 1-2%.
- Real-time audio/video processing -- applications requiring guaranteed sub-millisecond scheduling benefit from direct kernel access without cgroup CPU throttling.
Frequently Asked Questions
Is Docker slower than running directly on Linux?
For CPU and memory access, Docker adds under 1% overhead -- effectively negligible. For disk I/O using overlay2, overhead is 7-14%. For networking with bridge mode, overhead is 20-50%. Using bind mounts for storage and host networking for performance-critical services brings total overhead to 1-3% across all dimensions.
Should I use Docker for databases in production?
Yes, but with caveats. Always use Docker volumes or bind mounts for data directories -- never the overlay filesystem. Use host networking to eliminate bridge overhead. With these configurations, PostgreSQL in Docker runs within 2-3% of bare metal performance. Many large-scale deployments (including Uber and Netflix) run databases in containers successfully.
How does Docker compare to Podman performance?
Podman and Docker have nearly identical runtime performance because both use the same underlying Linux kernel features (cgroups, namespaces, overlayfs). The difference is architectural -- Podman is daemonless. In benchmarks, the two are within 0.5% of each other. Choose Podman for security (rootless by default) rather than performance.
Does Docker overhead increase with more containers?
Each container adds roughly 10-15 MB of memory overhead for the runtime metadata, cgroup accounting, and namespace structures. Running 100 containers costs about 1-1.5 GB of overhead. CPU overhead scales linearly but remains small -- the kernel scheduler handles thousands of cgroups efficiently. The real bottleneck at scale is network: hundreds of iptables rules from port mappings degrade NAT performance.
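You can see this growth on any busy Docker host (requires root; the DOCKER chain name is Docker's default):

```shell
# Count the NAT rules Docker has installed for published ports.
# Every -p host:container mapping adds DNAT/masquerade entries
# that packet processing must traverse.
sudo iptables -t nat -S DOCKER | wc -l
```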
Is Kubernetes overhead different from plain Docker?
Kubernetes adds overhead on top of Docker: the kubelet, kube-proxy, and CNI plugin consume 500 MB-1 GB of RAM and 0.5-1 CPU core per node. Pod networking through a CNI plugin (like Calico or Cilium) adds 2-5 us of latency compared to Docker's native bridge. At the container level, the runtime overhead is identical.
Can I eliminate Docker's network overhead entirely?
Almost. Using --network host removes the virtual bridge and NAT layer, bringing network performance within 1-4% of bare metal. For the remaining gap, eBPF-based networking (used by Cilium) replaces iptables with faster packet processing, cutting latency to within 1-2% of bare metal. True zero overhead requires kernel bypass (DPDK), which isn't container-compatible.
The Bottom Line
Docker's overhead is real but manageable. CPU overhead is noise (under 1%). Memory overhead is small (under 1% for access, 5-8% for allocation-heavy workloads). Disk I/O with overlay2 is the biggest gotcha (7-14%) but is solved by using volumes. Network bridge overhead (20-50%) is significant but eliminated by host networking.
For 95% of production workloads, the right configuration brings Docker within 2-3% of bare metal performance. The utilization gains, deployment speed, and operational simplicity make that trade-off worthwhile every time. Reserve bare metal for the 5% of workloads where microseconds matter.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.