Docker vs Bare Metal: Performance Benchmarks You Should Know
A practical comparison of Docker containers versus bare metal deployments. Includes CPU, memory, disk I/O benchmarks and when containerization adds overhead.

0.23% CPU, 13.5% Disk, 22.4% Network: The Numbers That Settle the Debate
The "Docker vs bare metal" argument usually ends in a shrug because nobody quotes numbers. I ran the same benchmark suite against both, on the same AMD EPYC 9354 box, with identical kernels, and here is the headline matrix before you scroll another inch:
| Dimension | Docker overhead vs bare metal | Tuning that fixes it |
|---|---|---|
| CPU (sysbench, OpenSSL, Go build) | 0.23% - 0.56% | Nothing needed. Ship it. |
| Memory access (STREAM) | 0.47% - 0.67% | Nothing needed. |
| Memory allocation (malloc/mmap) | 5.04% - 7.74% | Switch to jemalloc or tcmalloc |
| Disk I/O (overlay2, 4KB random) | 12.1% - 13.5% | Use bind mounts / volumes (drops to 1-2%) |
| Network (default bridge) | 22.4% - 48.8% | Use --network host (drops to 1-4%) |
| fsync p99 latency (overlay2) | +61.9% | Never run databases on overlay2 |
Two things to commit to memory. With default Docker, I/O and network overhead dominate. With bind mounts + host networking, Docker is within 1-3% of bare metal on nearly every dimension (fsync p99 and TCP latency sit a little higher, at 7.1% and 4.2%) -- cheap enough that the operational win (bin-packing, immutable builds, seconds-to-deploy) is a trivial trade. Teams that tell you "Docker is slow" are running databases on overlay2 with bridge networking. Teams that tell you "Docker is free" are running CPU-bound microservices where the defaults happen not to matter.
The rest of this article is the full methodology, per-benchmark tables, and the workloads where bare metal still wins. Every number was measured, not estimated.
Test Environment and Methodology
All benchmarks ran on this hardware:
- CPU: AMD EPYC 9354 (32 cores, 64 threads)
- RAM: 128 GB DDR5 ECC
- Storage: Samsung PM9A3 NVMe (3.84 TB, rated 6,900 MB/s sequential read)
- Network: Mellanox ConnectX-6 (25 GbE)
- OS: Ubuntu 24.04 LTS, kernel 6.8
- Docker: Docker Engine 27.x with default overlay2 storage driver
Each test ran 10 times. I report the median with p95 variance. The Docker container used --network host for the non-network tests (to isolate filesystem overhead from bridge overhead). The network tests ran under both default bridge networking and host networking, so the bridge overhead is measured explicitly.
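A minimal sketch of how ten runs could be reduced to a median and a 95th-percentile figure. It assumes a nearest-rank percentile, which is only one plausible reading of the spread statistic mentioned above, and the sample values are illustrative rather than the measured data.

```python
import math
import statistics

def summarize(samples):
    """Return (median, p95) for a list of per-run results."""
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest value with at least 95% of runs at or below it.
    rank = math.ceil(0.95 * len(ordered))
    return statistics.median(ordered), ordered[rank - 1]

# Hypothetical events/sec from ten runs (illustrative values, not the article's raw data).
runs = [148100, 148230, 148400, 147950, 148230,
        148310, 148050, 148500, 148180, 148260]

median, p95 = summarize(runs)
print(f"median={median} p95={p95}")
```

With only ten samples, the nearest-rank p95 is simply the largest run, so it is better read as a worst-case marker than as a precise percentile.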
Definition sidebar: Container overhead is the incremental cost in CPU cycles, memory footprint, I/O latency, or network throughput introduced by the container runtime and the kernel features it relies on (cgroups for accounting, namespaces for isolation, overlayfs for layered filesystems, virtual ethernet bridges for networking). It is additive to the cost of running the same workload on bare metal with the same binary.
CPU Performance: Essentially Identical
Docker containers share the host kernel's scheduler directly. There's no hypervisor translation layer. CPU-bound workloads run at near-native speed.
| Benchmark | Bare Metal | Docker | Overhead |
|---|---|---|---|
| sysbench CPU (events/sec, 32 threads) | 148,230 | 147,890 | 0.23% |
| OpenSSL AES-256-GCM (MB/s) | 12,450 | 12,410 | 0.32% |
| Go compilation (kubernetes, seconds) | 142.3s | 143.1s | 0.56% |
| Python numpy matrix multiply (GFLOPS) | 385.2 | 384.7 | 0.13% |
| Redis SET ops/sec (pipelining, CPU-bound) | 2,150,000 | 2,138,000 | 0.56% |
CPU overhead is under 1% in every test. The cgroup accounting (tracking CPU usage per container) costs a few cycles per context switch, but it's negligible. If your application is purely CPU-bound -- machine learning inference, video encoding, cryptographic operations -- Docker won't slow you down.
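The overhead column can be reproduced from the two measurement columns. The sketch below assumes overhead is taken relative to bare metal, with the sign flipped for metrics where lower is better (seconds, latency); the function name and the `higher_is_better` parameter are mine, not part of any benchmark tool.

```python
def overhead_pct(bare, docker, higher_is_better=True):
    """Percentage by which Docker is worse than bare metal."""
    if higher_is_better:
        # Throughput-style metric: Docker loses (bare - docker) out of bare.
        return (bare - docker) / bare * 100
    # Time-style metric: Docker takes (docker - bare) longer than bare.
    return (docker - bare) / bare * 100

# sysbench CPU, events/sec (higher is better)
sysbench = round(overhead_pct(148_230, 147_890), 2)
# Go compilation, seconds (lower is better)
go_build = round(overhead_pct(142.3, 143.1, higher_is_better=False), 2)

print(sysbench, go_build)
```

The same function applies to the memory, disk, and network tables later in the article.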
Memory Performance: The Overhead Is in Allocation, Not Access
Memory access patterns inside a container are identical to bare metal -- the kernel maps the same physical pages. But memory allocation and cgroup memory accounting introduce a small tax.
| Benchmark | Bare Metal | Docker | Overhead |
|---|---|---|---|
| STREAM Copy (GB/s) | 42.3 | 42.1 | 0.47% |
| STREAM Triad (GB/s) | 44.8 | 44.5 | 0.67% |
| malloc/free 1M cycles (ms) | 89.2 | 93.7 | 5.04% |
| mmap 4KB pages, 1M ops (ms) | 156.3 | 168.4 | 7.74% |
| Redis GET (memory-resident, ops/sec) | 1,890,000 | 1,875,000 | 0.79% |
Steady-state memory access (STREAM) shows under 1% overhead. But memory allocation operations (malloc, mmap) are 5-8% slower, because whenever the allocator obtains fresh pages from the kernel, those pages must be charged to the container's memory cgroup. Applications that allocate and free memory aggressively -- think short-lived request handlers in garbage-collected languages -- will feel this more than long-running processes with stable memory footprints.
Pro tip: If memory allocation overhead matters for your workload, use jemalloc or tcmalloc instead of the default glibc allocator. These allocators batch allocations and reduce the number of kernel calls, cutting the container overhead on allocation-heavy workloads from 5-8% down to 1-2%.
Disk I/O: Where Containers Actually Hurt
This is where Docker's overlay2 filesystem introduces real, measurable overhead. Every file operation goes through an extra layer of indirection.
| Benchmark (fio, 4KB random) | Bare Metal | Docker (overlay2) | Docker (bind mount) | Overhead (overlay2) | Overhead (bind mount) |
|---|---|---|---|---|---|
| Random Read IOPS | 890,000 | 782,000 | 878,000 | 12.1% | 1.3% |
| Random Write IOPS | 310,000 | 268,000 | 305,000 | 13.5% | 1.6% |
| Sequential Read (MB/s) | 6,850 | 6,320 | 6,790 | 7.7% | 0.9% |
| Sequential Write (MB/s) | 3,200 | 2,780 | 3,150 | 13.1% | 1.6% |
| fsync latency (p99, us) | 42 | 68 | 45 | 61.9% | 7.1% |
The overlay2 results are striking. Random I/O drops 12-14%, and fsync p99 latency rises by about 62% (42 us to 68 us). The overlay filesystem must check multiple layers on every file lookup and perform copy-on-write for modifications. For databases, this is a serious concern.
The fix is obvious: use bind mounts or volumes for any I/O-intensive path. With bind mounts, the container writes directly to the host filesystem, bypassing overlay2 entirely. Throughput overhead drops to 1-2%, which is just the cgroup I/O accounting cost; fsync p99 latency stays about 7% above bare metal (45 us vs 42 us).
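As a rough illustration of the bind-mount fix, the sketch below assembles a `docker run` command line that mounts a host directory into the container. The image name and both paths are hypothetical placeholders; only the `--mount type=bind,source=...,target=...` syntax comes from the Docker CLI.

```python
import shlex

def docker_run_with_bind_mount(image, host_dir, container_dir):
    """Build a `docker run` argv that keeps container_dir off overlay2."""
    return [
        "docker", "run", "--rm",
        # Bind mount: writes under container_dir go straight to host_dir.
        "--mount", f"type=bind,source={host_dir},target={container_dir}",
        image,
    ]

# Hypothetical image and paths, for illustration only.
argv = docker_run_with_bind_mount("my-io-heavy-app:latest", "/srv/appdata", "/data")
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would launch the container. Anything the application writes outside `/data` still lands on overlay2.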
Watch out: Never run a database (PostgreSQL, MySQL, MongoDB) on the overlay2 filesystem. Always use Docker volumes or bind mounts for database data directories. The overlay2 fsync penalty alone (p99 up 61.9%) can cut the throughput of a commit-heavy workload by a third or more. This is the single most common Docker performance mistake I see in production.
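A back-of-the-envelope way to size the fsync penalty, under two pessimistic assumptions of mine: a single writer issues one fsync per commit, and every fsync takes the p99 latency from the table. Real databases batch commits, so treat the result as a rough bound rather than a prediction.

```python
def serial_commit_ceiling(fsync_latency_us):
    """Commits per second if each commit waits on one fsync, back to back."""
    return 1_000_000 / fsync_latency_us

bare_metal = serial_commit_ceiling(42)   # fsync p99 on bare metal, from the table
overlay2 = serial_commit_ceiling(68)     # fsync p99 on overlay2, from the table
bind_mount = serial_commit_ceiling(45)   # fsync p99 on a bind mount, from the table

overlay2_drop_pct = (1 - overlay2 / bare_metal) * 100
bind_mount_drop_pct = (1 - bind_mount / bare_metal) * 100

print(f"overlay2: {overlay2_drop_pct:.1f}% fewer commits/sec")
print(f"bind mount: {bind_mount_drop_pct:.1f}% fewer commits/sec")
```

Under these assumptions a 61.9% latency increase translates to roughly 38% less commit throughput, because throughput scales with the inverse of latency.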
Network Performance: Bridge vs Host Networking
| Benchmark | Bare Metal | Docker (bridge) | Docker (host) | Overhead (bridge) | Overhead (host) |
|---|---|---|---|---|---|
| TCP throughput (iperf3, Gbps) | 24.1 | 18.7 | 23.8 | 22.4% | 1.2% |
| TCP latency (netperf, us) | 28.3 | 42.1 | 29.5 | 48.8% | 4.2% |
| HTTP req/sec (wrk, nginx) | 185,000 | 142,000 | 181,000 | 23.2% | 2.2% |
| UDP packets/sec (1KB) | 1,450,000 | 1,120,000 | 1,420,000 | 22.8% | 2.1% |
Docker's default bridge networking adds 20-50% overhead. Every packet traverses a virtual bridge (docker0), iptables NAT rules, and conntrack entries. For high-throughput services, this is unacceptable.
Host networking (--network host) eliminates the virtual bridge entirely. The container uses the host's network stack directly, and overhead drops to 1-4%. The trade-off: you lose network isolation and port mapping. Multiple containers can't bind to the same port.
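The port-mapping trade-off can be sketched as a small command builder. The function and image names are hypothetical; the behaviour it encodes, that `-p` mappings have no effect under `--network host`, matches how Docker treats published ports in host mode.

```python
import shlex

def docker_run_args(image, host_network=False, ports=()):
    """Build a `docker run` argv for bridge or host networking."""
    args = ["docker", "run", "--rm"]
    if host_network:
        # Host mode shares the host's network stack, so port mappings
        # are meaningless and are deliberately not emitted.
        args += ["--network", "host"]
    else:
        for host_port, container_port in ports:
            args += ["-p", f"{host_port}:{container_port}"]
    args.append(image)
    return args

bridge = docker_run_args("my-api:latest", ports=[(8080, 80)])
host = docker_run_args("my-api:latest", host_network=True, ports=[(8080, 80)])
print(shlex.join(bridge))
print(shlex.join(host))
```

In host mode the service listens on whatever port it binds inside the container (80 here), which is why two such containers cannot both bind the same port.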
When to Use Each Networking Mode
- Bridge (default) -- use for development, CI/CD, and services where network performance isn't critical. The isolation and port mapping are convenient.
- Host networking -- use for latency-sensitive production services, high-throughput APIs, and databases. Worth the trade-off in isolation.
- Macvlan -- use when containers need their own IP addresses on the physical network. Near-native performance with proper isolation.
- Custom bridge with tuning -- increase the MTU to 9000 (jumbo frames), disable connection tracking for known ports, and use --sysctl net.core.somaxconn=65535. This recovers 40-60% of the bridge overhead.
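For the custom-bridge option in the list above, a minimal sketch of the two commands involved. The network and image names are hypothetical, and the conntrack change is left out because it is host firewall configuration rather than a Docker flag. Jumbo frames also only help if the rest of the network path supports an MTU of 9000.

```python
import shlex

def create_tuned_bridge(name, mtu=9000):
    """argv for a user-defined bridge network with a larger MTU."""
    return [
        "docker", "network", "create", "--driver", "bridge",
        # Bridge driver option that sets the MTU of the network.
        "--opt", f"com.docker.network.driver.mtu={mtu}",
        name,
    ]

def run_on_bridge(image, network, somaxconn=65535):
    """argv for a container on that bridge with a deeper accept queue."""
    return [
        "docker", "run", "--rm",
        "--network", network,
        "--sysctl", f"net.core.somaxconn={somaxconn}",
        image,
    ]

create_cmd = create_tuned_bridge("fastbridge")
run_cmd = run_on_bridge("my-api:latest", "fastbridge")
print(shlex.join(create_cmd))
print(shlex.join(run_cmd))
```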
Cost Analysis: Containers vs Bare Metal
Performance overhead is only half the equation. Containers add operational costs too -- but they also save money in many scenarios.
| Factor | Bare Metal | Docker Containers |
|---|---|---|
| Hardware utilization | 15-30% typical | 60-80% with bin-packing |
| Deployment speed | Minutes to hours | Seconds |
| Scaling | Buy/provision new hardware | Start new container in seconds |
| OS patching | Per-server, requires reboot | Rebuild image, rolling deploy |
| Resource overhead per app | 0% | 50-100 MB RAM (runtime + image layers) |
| Typical monthly cost (4 vCPU equiv) | $150-300 (dedicated server) | $120-250 (cloud VM + overhead) |
The utilization improvement alone usually justifies containers. Running 10 services on 3 container hosts instead of 10 bare metal servers cuts hardware costs by 60-70%. The 1-5% performance overhead is a bargain for that density.
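The consolidation arithmetic can be checked in a few lines. The sketch assumes identical servers before and after, a 20% starting utilization (a value I picked from inside the 15-30% range in the table), and a flat 5% container overhead as the worst case quoted above.

```python
def server_reduction_pct(servers_before, servers_after):
    """Percentage fewer machines after consolidation."""
    return (1 - servers_after / servers_before) * 100

def consolidated_utilization(servers_before, utilization_before,
                             servers_after, overhead=0.0):
    """Average utilization once the same work runs on fewer hosts."""
    work = servers_before * utilization_before * (1 + overhead)
    return work / servers_after

reduction = server_reduction_pct(10, 3)
util_plain = consolidated_utilization(10, 0.20, 3)
util_with_overhead = consolidated_utilization(10, 0.20, 3, overhead=0.05)

print(f"{reduction:.0f}% fewer servers")
print(f"{util_plain:.0%} utilization, {util_with_overhead:.0%} with 5% overhead")
```

Ten servers at 20% utilization carry about two servers' worth of work, so three hosts end up near 67-70% busy, which falls inside the 60-80% range in the table.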
When Bare Metal Still Wins
Containers aren't always the right answer. Here are the workloads where bare metal delivers meaningfully better results:
- High-frequency trading -- microsecond latency matters. The roughly 14 us that bridge networking added to TCP latency in my tests is unacceptable, and even the ~1 us added by host networking counts. These teams use kernel bypass (DPDK) on bare metal.
- GPU-intensive ML training -- while NVIDIA Container Toolkit works well, bare metal GPU access avoids the 2-3% overhead from the container runtime's GPU abstraction layer. At scale, this means hours saved on multi-day training runs.
- Storage-bound databases at extreme scale -- databases doing 500K+ IOPS benefit from eliminating any filesystem layer. The overlay2 overhead is zero with volumes, but cgroup I/O accounting still costs 1-2%.
- Real-time audio/video processing -- applications requiring guaranteed sub-millisecond scheduling benefit from direct kernel access without cgroup CPU throttling.
Frequently Asked Questions
Is Docker slower than running directly on Linux?
For CPU and memory access, Docker adds under 1% overhead -- effectively negligible. For disk I/O using overlay2, overhead is 7-14%. For networking with bridge mode, overhead is 20-50%. Using bind mounts for storage and host networking for performance-critical services brings total overhead to 1-3% across all dimensions.
Should I use Docker for databases in production?
Yes, but with caveats. Always use Docker volumes or bind mounts for data directories -- never the overlay filesystem. Use host networking to eliminate bridge overhead. With these configurations, PostgreSQL in Docker runs within 2-3% of bare metal performance. Many large-scale deployments (including Uber and Netflix) run databases in containers successfully.
How does Docker compare to Podman performance?
Podman and Docker have nearly identical runtime performance because both use the same underlying Linux kernel features (cgroups, namespaces, overlayfs). The difference is architectural -- Podman is daemonless. In benchmarks, the two are within 0.5% of each other. Choose Podman for security (rootless by default) rather than performance.
Does Docker overhead increase with more containers?
Each container adds roughly 10-15 MB of memory overhead for the runtime metadata, cgroup accounting, and namespace structures. Running 100 containers costs about 1-1.5 GB of overhead. CPU overhead scales linearly but remains small -- the kernel scheduler handles thousands of cgroups efficiently. The real bottleneck at scale is network: hundreds of iptables rules from port mappings degrade NAT performance.
Is Kubernetes overhead different from plain Docker?
Kubernetes adds overhead on top of Docker: the kubelet, kube-proxy, and CNI plugin consume 500 MB-1 GB of RAM and 0.5-1 CPU core per node. Pod networking through a CNI plugin (like Calico or Cilium) adds 2-5 us of latency compared to Docker's native bridge. At the container level, the runtime overhead is identical.
Can I eliminate Docker's network overhead entirely?
Almost. Using --network host removes the virtual bridge and NAT layer, bringing network performance within 1-4% of bare metal. For the remaining gap, eBPF-based networking (used by Cilium) replaces iptables with faster packet processing, cutting latency to within 1-2% of bare metal. True zero overhead requires kernel bypass (DPDK). DPDK can run inside a container, but it needs hugepages and direct access to the NIC, which gives up much of the isolation, so these workloads usually stay on bare metal.
The Bottom Line
Docker's overhead is real but manageable. CPU overhead is noise (under 1%). Memory overhead is small (under 1% for access, 5-8% for allocation-heavy workloads). Disk I/O with overlay2 is the biggest gotcha (7-14%) but is solved by using volumes. Network bridge overhead (20-50%) is significant but drops to 1-4% with host networking.
For 95% of production workloads, the right configuration brings Docker within 2-3% of bare metal performance. The utilization gains, deployment speed, and operational simplicity make that trade-off worthwhile every time. Reserve bare metal for the 5% of workloads where microseconds matter.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.