Load Balancing Algorithms: Round Robin, Least Connections, and More
A practical guide to load balancing algorithms -- round robin, least connections, IP hash, consistent hashing, and power of two choices -- with nginx and HAProxy configurations.

Your Load Balancer Is Making Thousands of Decisions Per Second
Every request hitting your production infrastructure passes through a load balancing algorithm that decides which backend server handles it. Pick the wrong algorithm and you get uneven load distribution, hot spots, cache misses, and degraded performance under pressure. The default round robin works fine for many setups, but the moment your servers aren't identical or your traffic patterns have affinity requirements, you need to understand your options.
I've seen teams run into capacity problems that weren't actually about capacity -- they were about distribution. One server at 95% CPU while three others idled at 20%. The fix wasn't more servers. It was changing one line in an nginx config.
What Is Load Balancing?
Definition: Load balancing is the process of distributing incoming network traffic across multiple servers to ensure no single server is overwhelmed. A load balancer sits between clients and backend servers, using an algorithm to select which server handles each request, while providing health checks, failover, and optionally SSL termination.
Load balancers operate at two layers: Layer 4 (TCP/UDP -- routing based on IP and port) and Layer 7 (HTTP -- routing based on URL path, headers, cookies). The algorithm choice matters at both layers, but Layer 7 gives you more information to make smarter decisions.
Round Robin
The simplest algorithm: requests go to servers in order, cycling through the list. Server A, Server B, Server C, Server A, Server B, Server C.
```nginx
# nginx default - round robin
upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
```
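In code, round robin is nothing more than a cycling counter. A minimal Python sketch, with server addresses mirroring the config above (purely illustrative):

```python
from itertools import cycle

# Addresses mirror the nginx config above; purely illustrative.
servers = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
rotation = cycle(servers)

def pick_server() -> str:
    """Return the next server in strict rotation: A, B, C, A, B, C, ..."""
    return next(rotation)

first_six = [pick_server() for _ in range(6)]
# Each server receives exactly two of the first six requests.
```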
When Round Robin Works
- All servers have identical hardware and capacity.
- Requests are roughly uniform in cost (similar processing time).
- No session affinity requirements.
- Stateless application servers.
When Round Robin Breaks
If Server A is a 4-core machine and Server B is an 8-core machine, round robin sends them equal traffic. Server A saturates while Server B coasts. Similarly, if some requests take 10ms and others take 5 seconds, round robin can stack expensive requests on one server by bad luck.
Weighted Round Robin
Assign each server a weight proportional to its capacity. A server with weight 3 gets three times as many requests as a server with weight 1.
```nginx
upstream backend {
    server 10.0.0.1:8080 weight=3; # 8-core, 32GB
    server 10.0.0.2:8080 weight=1; # 4-core, 16GB
    server 10.0.0.3:8080 weight=2; # 4-core, 32GB
}
```
This handles heterogeneous hardware but still assumes requests are uniform in cost. You're manually tuning weights, which means re-tuning whenever you change server specs.
Least Connections
Route each new request to the server with the fewest active connections. This naturally adapts to differences in both server capacity and request cost -- faster servers complete requests sooner and accumulate fewer active connections.
```nginx
upstream backend {
    least_conn;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
```
Why Least Connections Is Often the Best Default
- It adapts to servers of different speeds without manual weight tuning.
- It handles variable request durations -- servers processing slow requests naturally receive fewer new ones.
- It responds to real-time load, not a preconfigured assumption about capacity.
- It gracefully handles a slow server (e.g., garbage collection pause) by routing traffic away from it while it's stalled.
Pro tip: If you're unsure which algorithm to use, start with least connections. It handles more edge cases than round robin with zero additional configuration. Switch to something more specialized only when you have a specific reason.
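The selection rule itself is just a minimum over live connection counts. A toy Python model (counts tracked in memory, addresses illustrative) shows how a stalled server stops receiving traffic:

```python
# Toy in-memory model of least connections; addresses illustrative.
active = {"10.0.0.1:8080": 0, "10.0.0.2:8080": 0, "10.0.0.3:8080": 0}

def pick_server() -> str:
    """Route to the server with the fewest in-flight connections."""
    server = min(active, key=active.get)
    active[server] += 1  # connection opens
    return server

def finish(server: str) -> None:
    active[server] -= 1  # connection closes

# A server stalled with 5 in-flight requests (e.g. a GC pause) stops
# receiving traffic until it drains -- no weight tuning required.
active["10.0.0.1:8080"] = 5
next_two = [pick_server() for _ in range(2)]
# Both go to the idle servers, not the stalled one.
```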
IP Hash
Compute a hash of the client's IP address and use it to consistently route that client to the same server. This provides session affinity (sticky sessions) without cookies or application-level session stores.
```nginx
upstream backend {
    ip_hash;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
```
IP Hash Trade-offs
- Pro: Clients consistently hit the same server, enabling server-local caches and in-memory sessions.
- Con: Corporate NATs and shared IPs can funnel thousands of users to one server. A large office behind a single NAT IP creates a hot spot.
- Con: Adding or removing servers rehashes most clients to different servers, invalidating their sessions and caches.
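The rehashing problem in the last point is easy to demonstrate. A toy Python illustration with synthetic client IPs, using md5 as a stand-in for the balancer's internal hash:

```python
import hashlib

def bucket(ip: str, n_servers: int) -> int:
    # Stable hash of the client IP, reduced modulo the server count.
    # md5 stands in for the balancer's internal hash; illustrative only.
    return int(hashlib.md5(ip.encode()).hexdigest(), 16) % n_servers

# 10,000 synthetic client IPs.
ips = [f"10.{a}.{b}.{c}" for a in range(10) for b in range(10) for c in range(100)]

moved = sum(1 for ip in ips if bucket(ip, 3) != bucket(ip, 4))
# With plain modulo hashing, about three quarters of clients land on a
# different server after growing from 3 to 4 backends.
```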
Consistent Hashing
A more sophisticated version of IP hash that minimizes redistribution when servers are added or removed. Servers are placed on a hash ring, and each request is mapped to the nearest server on the ring. When a server is added, only a fraction (roughly 1/n) of requests move.
Consistent hashing is essential for caching layers. If you're load-balancing across Memcached or Redis instances, you want the same keys to hit the same servers. Regular hashing redistributes nearly all keys when a server changes; consistent hashing moves only the minimum necessary.
```nginx
# open-source nginx 1.7.2+ (ketama-style consistent hashing)
upstream cache_backend {
    hash $request_uri consistent;
    server 10.0.0.1:11211;
    server 10.0.0.2:11211;
    server 10.0.0.3:11211;
}
```
Random with Two Choices (Power of Two)
Pick two servers at random, then send the request to whichever has fewer active connections. This surprisingly simple algorithm produces near-optimal load distribution with minimal overhead.
The math behind it is compelling: assigning n requests to n servers uniformly at random leaves the busiest server roughly log n / log log n requests above the average, while picking the less loaded of two random choices shrinks that gap to roughly log log n -- an exponential improvement from a trivial change. This is known as the "power of two choices" principle.
Pro tip: Random with two choices is excellent for large-scale distributed systems where maintaining a global connection count is expensive. Each load balancer only needs to probe two servers, not track the state of all of them. Envoy's least-request load balancer uses it, as do several cloud-native service meshes.
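The selection rule can be sketched in a few lines of Python, with an illustrative in-memory connection table and hypothetical server names (no request completions are modeled, to keep the sketch short):

```python
import random

random.seed(42)  # deterministic for the illustration

# Power-of-two-choices sketch with an illustrative in-memory table.
active = {f"server-{i}": 0 for i in range(10)}

def pick_server() -> str:
    a, b = random.sample(list(active), 2)  # two random probes
    winner = a if active[a] <= active[b] else b
    winner_count = active[winner]
    active[winner] = winner_count + 1
    return winner

for _ in range(10_000):
    pick_server()

# 10,000 picks across 10 servers stay tightly clustered around the
# mean of 1,000 per server.
spread = max(active.values()) - min(active.values())
```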
Algorithm Comparison
| Algorithm | Best For | Weakness | State Needed |
|---|---|---|---|
| Round Robin | Identical servers, uniform requests | Ignores server capacity and load | Counter only |
| Weighted Round Robin | Heterogeneous hardware | Manual tuning, ignores request cost | Counter + weights |
| Least Connections | Variable request durations | Requires connection tracking | Active connection counts |
| IP Hash | Session affinity | NAT hot spots, rehashing on changes | None (stateless hash) |
| Consistent Hashing | Caching layers | More complex implementation | Hash ring |
| Random Two Choices | Large distributed systems | Slightly less optimal than least-conn | Two random probes |
Health Checks
No algorithm matters if you're routing to dead servers. Health checks come in two flavors:
Passive Health Checks
The load balancer monitors responses from normal traffic. If a server returns errors or times out, it's marked unhealthy after a threshold. nginx does this by default:
```nginx
upstream backend {
    server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
}
```
After 3 failures within 30 seconds, the server is removed from rotation for 30 seconds.
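The marking logic can be modeled in a few lines. This toy Python model mimics the max_fails/fail_timeout behavior described above; it is a simplification for illustration, not nginx's actual implementation:

```python
# Toy model of nginx-style passive health marking (max_fails /
# fail_timeout); a simplification, not nginx's actual code.
MAX_FAILS, FAIL_TIMEOUT = 3, 30.0

class Backend:
    def __init__(self):
        self.fails = 0
        self.window_start = 0.0
        self.down_until = 0.0

    def healthy(self, now: float) -> bool:
        return now >= self.down_until

    def record_failure(self, now: float) -> None:
        # Start a fresh failure window if the old one has expired.
        if now - self.window_start > FAIL_TIMEOUT:
            self.fails, self.window_start = 0, now
        self.fails += 1
        if self.fails >= MAX_FAILS:
            # Pull the server from rotation for FAIL_TIMEOUT seconds.
            self.down_until = now + FAIL_TIMEOUT
            self.fails = 0

b = Backend()
for t in (1.0, 2.0, 3.0):  # three failures within 30 seconds
    b.record_failure(t)
# Out of rotation until t = 33; healthy again afterwards.
```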
Active Health Checks
The load balancer sends periodic probe requests to a health endpoint. This detects failures before real traffic hits them. Active checks are available in nginx Plus, HAProxy, and cloud load balancers:
```
# HAProxy active health check
backend app_servers
    option httpchk GET /health
    http-check expect status 200
    server app1 10.0.0.1:8080 check inter 5s fall 3 rise 2
    server app2 10.0.0.2:8080 check inter 5s fall 3 rise 2
```
Load Balancer Cost Comparison
| Solution | Type | Algorithms | Cost |
|---|---|---|---|
| nginx (open source) | Self-hosted | Round robin, weighted, least conn, IP hash | Free |
| nginx Plus | Self-hosted | All above + least time, consistent hash, active checks | $2,500/year |
| HAProxy | Self-hosted | Round robin, least conn, source hash, URI hash, random | Free / Enterprise from $1,995/yr |
| AWS ALB | Managed | Round robin, least outstanding requests | $0.0225/hr + $0.008/LCU-hr |
| AWS NLB | Managed | Flow hash (5-tuple) | $0.0225/hr + $0.006/NLCU-hr |
| Cloudflare LB | Managed | Round robin, weighted, least conn, geo | $5/mo + $0.50/500k DNS queries |
| GCP Cloud LB | Managed | Round robin, least conn, consistent hash | $0.025/hr + $0.008/GB |
Frequently Asked Questions
What is the default load balancing algorithm in nginx?
nginx defaults to weighted round robin. Without explicit weights, all servers have weight 1, making it plain round robin. This is fine for identical servers handling uniform traffic. If your servers differ in capacity or your requests vary significantly in processing time, switch to least_conn for better distribution without manual tuning.
When should I use sticky sessions instead of stateless load balancing?
Sticky sessions (via IP hash or cookie-based affinity) make sense when migrating legacy applications that store session state in server memory. For new applications, avoid sticky sessions entirely. Store session data in Redis or a database, keep your application servers stateless, and use least connections. Sticky sessions create hot spots and complicate deployments since you can't drain servers cleanly.
What is the difference between Layer 4 and Layer 7 load balancing?
Layer 4 load balancers route based on TCP/UDP information: source IP, destination IP, and port numbers. They're fast because they don't inspect packet contents. Layer 7 load balancers understand HTTP and can route based on URL path, headers, cookies, or request body. Layer 7 is more flexible but adds processing overhead. Use Layer 4 for raw throughput (databases, TCP services) and Layer 7 for HTTP applications where you need content-based routing.
How does consistent hashing help with caching?
Consistent hashing maps cache keys to servers on a hash ring so the same key always goes to the same server. Without it, adding or removing a cache server redistributes most keys, causing a cache stampede as every server refills its cache simultaneously. With consistent hashing, only 1/n of keys move when a server changes. This is critical for Memcached clusters and CDN origin shields.
Can I combine multiple load balancing algorithms?
Yes. A common pattern is geographic routing at the DNS or global load balancer level (route users to the nearest region) combined with least connections within each region. Another pattern: consistent hashing for cacheable requests (same URL hits same server) with least connections for API endpoints. HAProxy and Envoy support complex routing rules that mix algorithms based on request attributes.
How many backend servers before load balancing matters?
Load balancing matters from 2 servers onward -- that's the point where you need traffic distribution and failover. The algorithm choice becomes more critical as you scale. At 2-5 servers, the difference between round robin and least connections is minor. At 20+ servers with heterogeneous hardware, the algorithm meaningfully affects tail latency and resource utilization. Health checks matter at any scale.
What is the power of two choices?
The power of two choices is a load balancing strategy where you pick two servers at random and route to the less loaded one. Despite its simplicity, it achieves near-optimal distribution. Mathematically, pure random assignment leaves the busiest server roughly log n / log log n above the mean, while picking the less loaded of two random choices reduces that gap to roughly log log n. Envoy's least-request load balancer uses this technique: by default it samples two random hosts and routes to the one with fewer active requests.
Conclusion
Start with least connections unless you have a specific reason not to -- it handles heterogeneous servers and variable request costs without configuration. Use consistent hashing when caching is involved. Use IP hash only as a stopgap for session affinity while you migrate to externalized session storage. Configure health checks regardless of algorithm -- routing to unhealthy servers negates any algorithmic advantage. And run load tests before production to verify your algorithm choice actually distributes load the way you expect.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.