
Load Balancing Algorithms: Round Robin, Least Connections, and More

A practical guide to load balancing algorithms -- round robin, least connections, IP hash, consistent hashing, and power of two choices -- with nginx and HAProxy configurations.

Abhishek Patel · 10 min read


Your Load Balancer Is Making Thousands of Decisions Per Second

Every request hitting your production infrastructure passes through a load balancing algorithm that decides which backend server handles it. Pick the wrong algorithm and you get uneven load distribution, hot spots, cache misses, and degraded performance under pressure. The default round robin works fine for many setups, but the moment your servers aren't identical or your traffic patterns have affinity requirements, you need to understand your options.

I've seen teams run into capacity problems that weren't actually about capacity -- they were about distribution. One server at 95% CPU while three others idled at 20%. The fix wasn't more servers. It was changing one line in an nginx config.

What Is Load Balancing?

Definition: Load balancing is the process of distributing incoming network traffic across multiple servers to ensure no single server is overwhelmed. A load balancer sits between clients and backend servers, using an algorithm to select which server handles each request, while providing health checks, failover, and optionally SSL termination.

Load balancers operate at two layers: Layer 4 (TCP/UDP -- routing based on IP and port) and Layer 7 (HTTP -- routing based on URL path, headers, cookies). The algorithm choice matters at both layers, but Layer 7 gives you more information to make smarter decisions.

Round Robin

The simplest algorithm: requests go to servers in order, cycling through the list. Server A, Server B, Server C, Server A, Server B, Server C.

# nginx default - round robin
upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}

When Round Robin Works

  • All servers have identical hardware and capacity.
  • Requests are roughly uniform in cost (similar processing time).
  • No session affinity requirements.
  • Stateless application servers.

When Round Robin Breaks

If Server A is a 4-core machine and Server B is an 8-core machine, round robin sends them equal traffic. Server A saturates while Server B coasts. Similarly, if some requests take 10ms and others take 5 seconds, round robin can stack expensive requests on one server by bad luck.
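
To see the failure mode concretely, here's a toy simulation (all numbers are made up for illustration): two servers receive equal round-robin traffic, but one can only process half as many requests per tick, so its backlog grows without bound while the other keeps up.

```python
def simulate_rr(capacities, requests_per_tick, ticks):
    """Each tick, spread requests round-robin across servers, then let each
    server complete up to its capacity. Returns the final queue depths."""
    queues = [0] * len(capacities)
    rr = 0
    for _ in range(ticks):
        for _ in range(requests_per_tick):
            queues[rr] += 1                      # round robin: next server in order
            rr = (rr + 1) % len(queues)
        for i, cap in enumerate(capacities):
            queues[i] = max(0, queues[i] - cap)  # server drains up to its capacity
    return queues

# Server 0 handles 8 requests/tick, server 1 only 4; 10 arrive per tick.
# Round robin gives each 5/tick: server 0 keeps up, server 1 falls
# behind by 1 request every tick.
print(simulate_rr([8, 4], 10, 100))  # [0, 100]
```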

Weighted Round Robin

Assign each server a weight proportional to its capacity. A server with weight 3 gets three times as many requests as a server with weight 1.

upstream backend {
    server 10.0.0.1:8080 weight=3;  # 8-core, 32GB
    server 10.0.0.2:8080 weight=1;  # 4-core, 16GB
    server 10.0.0.3:8080 weight=2;  # 4-core, 32GB
}

This handles heterogeneous hardware but still assumes requests are uniform in cost. You're manually tuning weights, which means re-tuning whenever you change server specs.
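
In code, the idea can be sketched as "smooth" weighted round robin, the variant nginx implements internally so picks are interleaved (A C A B A C rather than A A A B C C). The addresses and weights below mirror the config above; this is an illustrative sketch, not nginx's source.

```python
def smooth_wrr(weights, n):
    """Smooth weighted round robin: on each pick, every server's current
    weight grows by its configured weight; the highest current weight
    wins and is reduced by the total, which interleaves the picks."""
    current = {s: 0 for s in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for s in current:
            current[s] += weights[s]
        best = max(current, key=current.get)
        current[best] -= total
        picks.append(best)
    return picks

weights = {"10.0.0.1": 3, "10.0.0.2": 1, "10.0.0.3": 2}
picks = smooth_wrr(weights, 6)
print(picks)  # over 6 picks: 10.0.0.1 three times, 10.0.0.3 twice, 10.0.0.2 once
```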

Least Connections

Route each new request to the server with the fewest active connections. This naturally adapts to differences in both server capacity and request cost -- faster servers complete requests sooner and accumulate fewer active connections.

upstream backend {
    least_conn;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}

Why Least Connections Is Often the Best Default

  1. It adapts to servers of different speeds without manual weight tuning.
  2. It handles variable request durations -- servers processing slow requests naturally receive fewer new ones.
  3. It responds to real-time load, not a preconfigured assumption about capacity.
  4. It gracefully handles a slow server (e.g., garbage collection pause) by routing traffic away from it while it's stalled.
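
The selection itself reduces to a few lines. The connection counts below are a hypothetical snapshot; a real balancer increments and decrements them as requests start and finish.

```python
import random

def least_conn(active):
    """Return the server with the fewest active connections,
    breaking ties randomly so concurrent picks don't pile up."""
    fewest = min(active.values())
    candidates = [s for s, n in active.items() if n == fewest]
    return random.choice(candidates)

# Hypothetical snapshot: app2 is least loaded, so it gets the next request.
active = {"app1": 12, "app2": 3, "app3": 7}
print(least_conn(active))  # app2
```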

Pro tip: If you're unsure which algorithm to use, start with least connections. It handles more edge cases than round robin with zero additional configuration. Switch to something more specialized only when you have a specific reason.

IP Hash

Compute a hash of the client's IP address and use it to consistently route that client to the same server. This provides session affinity (sticky sessions) without cookies or application-level session stores.

upstream backend {
    ip_hash;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}

IP Hash Trade-offs

  • Pro: Clients consistently hit the same server, enabling server-local caches and in-memory sessions.
  • Con: Corporate NATs and shared IPs can funnel thousands of users to one server. A large office behind a single NAT IP creates a hot spot.
  • Con: Adding or removing servers rehashes most clients to different servers, invalidating their sessions and caches.
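
Both cons fall out of the naive hash-modulo construction. This toy model (not nginx's actual hash, which for IPv4 uses the first three octets) shows the mapping is stable for a fixed pool, yet most clients remap when the pool size changes:

```python
import hashlib

def ip_hash(ip, servers):
    """Stable hash of the client IP, modulo pool size."""
    h = int(hashlib.md5(ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

servers3 = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
servers4 = servers3 + ["10.0.0.4"]

# Same client always lands on the same server for a fixed pool...
assert ip_hash("203.0.113.7", servers3) == ip_hash("203.0.113.7", servers3)

# ...but growing the pool from 3 to 4 remaps roughly 3 in 4 clients,
# because h % 3 and h % 4 rarely agree.
clients = [f"198.51.100.{i}" for i in range(100)]
moved = sum(ip_hash(c, servers3) != ip_hash(c, servers4) for c in clients)
print(f"{moved}/100 clients remapped")
```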

Consistent Hashing

A more sophisticated version of IP hash that minimizes redistribution when servers are added or removed. Servers are placed on a hash ring, and each request is mapped to the nearest server on the ring. When a server is added, only a fraction (roughly 1/n) of requests move.

Consistent hashing is essential for caching layers. If you're load-balancing across Memcached or Redis instances, you want the same keys to hit the same servers. Regular hashing redistributes nearly all keys when a server changes; consistent hashing moves only the minimum necessary.
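
A minimal hash ring with virtual nodes illustrates the property. The server names and replica count here are arbitrary, and production implementations (e.g. ketama) differ in details:

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent hash ring: each server occupies many points
    (virtual nodes) on the ring; a key maps to the next point clockwise."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self.ring = []  # sorted list of (point, server)
        for s in servers:
            self.add(s)

    def _hash(self, data):
        return int(hashlib.md5(data.encode()).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self.ring = [(p, s) for p, s in self.ring if s != server]

    def get(self, key):
        i = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get(k) for k in keys}
ring.add("10.0.0.4")
moved = sum(before[k] != ring.get(k) for k in keys)
print(f"{moved}/1000 keys moved after adding a fourth server")  # roughly 1/4
```

Removing the fourth server again restores the original ring exactly, so every key maps back to the server it had before, which is precisely why caches survive membership changes.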

# open-source nginx 1.7.2+ (ngx_http_upstream_hash_module)
upstream cache_backend {
    hash $request_uri consistent;
    server 10.0.0.1:11211;
    server 10.0.0.2:11211;
    server 10.0.0.3:11211;
}

Random with Two Choices (Power of Two)

Pick two servers at random, then send the request to whichever has fewer active connections. This surprisingly simple algorithm produces near-optimal load distribution with minimal overhead.

The math behind it is compelling: throwing n requests at n servers purely at random, the busiest server ends up with roughly log n / log log n requests. Picking the less loaded of two random choices drops that to roughly log log n -- an exponential improvement from a trivial change. This is known as the "power of two choices" principle.
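
A balls-into-bins simulation makes the gap visible. The numbers below are illustrative, and real balancers compare active connections rather than cumulative totals:

```python
import random

def max_load(n_balls, n_bins, choices):
    """Throw n_balls into n_bins; each ball probes `choices` random bins
    and lands in the least-loaded probe. Return the fullest bin's count."""
    bins = [0] * n_bins
    for _ in range(n_balls):
        probes = random.sample(range(n_bins), choices)
        bins[min(probes, key=lambda b: bins[b])] += 1
    return max(bins)

random.seed(1)  # deterministic demo
one = max_load(10_000, 100, choices=1)  # pure random: a noticeable hot spot
two = max_load(10_000, 100, choices=2)  # two choices: near the 100-request average
print(f"busiest server: {one} (one choice) vs {two} (two choices)")
```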

Pro tip: Random with two choices is excellent for large-scale distributed systems where maintaining a global connection count is expensive. Each load balancer only needs to probe two servers, not track the state of all of them. It's used in Envoy proxy and several cloud-native service meshes.

Algorithm Comparison

| Algorithm | Best For | Weakness | State Needed |
| --- | --- | --- | --- |
| Round Robin | Identical servers, uniform requests | Ignores server capacity and load | Counter only |
| Weighted Round Robin | Heterogeneous hardware | Manual tuning, ignores request cost | Counter + weights |
| Least Connections | Variable request durations | Requires connection tracking | Active connection counts |
| IP Hash | Session affinity | NAT hot spots, rehashing on changes | None (stateless hash) |
| Consistent Hashing | Caching layers | More complex implementation | Hash ring |
| Random Two Choices | Large distributed systems | Slightly less optimal than least-conn | Two random probes |

Health Checks

No algorithm matters if you're routing to dead servers. Health checks come in two flavors:

Passive Health Checks

The load balancer monitors responses from normal traffic. If a server returns errors or times out, it's marked unhealthy after a threshold. nginx does this by default:

upstream backend {
    server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
}

After 3 failures within 30 seconds, the server is removed from rotation for 30 seconds.

Active Health Checks

The load balancer sends periodic probe requests to a health endpoint. This detects failures before real traffic hits them. Active checks are available in nginx Plus, HAProxy, and cloud load balancers:

# HAProxy active health check
backend app_servers
    option httpchk GET /health
    http-check expect status 200
    server app1 10.0.0.1:8080 check inter 5s fall 3 rise 2
    server app2 10.0.0.2:8080 check inter 5s fall 3 rise 2

Load Balancer Cost Comparison

| Solution | Type | Algorithms | Cost |
| --- | --- | --- | --- |
| nginx (open source) | Self-hosted | Round robin, weighted, least conn, IP hash, consistent hash | Free |
| nginx Plus | Self-hosted | All above + least time, active health checks | $2,500/year |
| HAProxy | Self-hosted | Round robin, least conn, source hash, URI hash, random | Free / Enterprise from $1,995/yr |
| AWS ALB | Managed | Round robin, least outstanding requests | $0.0225/hr + $0.008/LCU-hr |
| AWS NLB | Managed | Flow hash (5-tuple) | $0.0225/hr + $0.006/NLCU-hr |
| Cloudflare LB | Managed | Round robin, weighted, least conn, geo | $5/mo + $0.50/500k DNS queries |
| GCP Cloud LB | Managed | Round robin, least conn, consistent hash | $0.025/hr + $0.008/GB |

Frequently Asked Questions

What is the default load balancing algorithm in nginx?

nginx defaults to weighted round robin. Without explicit weights, all servers have weight 1, making it plain round robin. This is fine for identical servers handling uniform traffic. If your servers differ in capacity or your requests vary significantly in processing time, switch to least_conn for better distribution without manual tuning.

When should I use sticky sessions instead of stateless load balancing?

Sticky sessions (via IP hash or cookie-based affinity) make sense when migrating legacy applications that store session state in server memory. For new applications, avoid sticky sessions entirely. Store session data in Redis or a database, keep your application servers stateless, and use least connections. Sticky sessions create hot spots and complicate deployments since you can't drain servers cleanly.

What is the difference between Layer 4 and Layer 7 load balancing?

Layer 4 load balancers route based on TCP/UDP information: source IP, destination IP, and port numbers. They're fast because they don't inspect packet contents. Layer 7 load balancers understand HTTP and can route based on URL path, headers, cookies, or request body. Layer 7 is more flexible but adds processing overhead. Use Layer 4 for raw throughput (databases, TCP services) and Layer 7 for HTTP applications where you need content-based routing.

How does consistent hashing help with caching?

Consistent hashing maps cache keys to servers on a hash ring so the same key always goes to the same server. Without it, adding or removing a cache server redistributes most keys, causing a cache stampede as every server refills its cache simultaneously. With consistent hashing, only 1/n of keys move when a server changes. This is critical for Memcached clusters and CDN origin shields.

Can I combine multiple load balancing algorithms?

Yes. A common pattern is geographic routing at the DNS or global load balancer level (route users to the nearest region) combined with least connections within each region. Another pattern: consistent hashing for cacheable requests (same URL hits same server) with least connections for API endpoints. HAProxy and Envoy support complex routing rules that mix algorithms based on request attributes.

How many backend servers before load balancing matters?

Load balancing matters from 2 servers onward -- that's the point where you need traffic distribution and failover. The algorithm choice becomes more critical as you scale. At 2-5 servers, the difference between round robin and least connections is minor. At 20+ servers with heterogeneous hardware, the algorithm meaningfully affects tail latency and resource utilization. Health checks matter at any scale.

What is the power of two choices?

The power of two choices is a load balancing strategy where you pick two servers at random and route to the less loaded one. Despite its simplicity, it achieves near-optimal distribution: with n requests spread over n servers, pure random selection leaves the busiest server with roughly log n / log log n requests, while two random choices reduce that to roughly log log n. Envoy's least-request load balancer uses power of two choices by default.

Conclusion

Start with least connections unless you have a specific reason not to -- it handles heterogeneous servers and variable request costs without configuration. Use consistent hashing when caching is involved. Use IP hash only as a stopgap for session affinity while you migrate to externalized session storage. Configure health checks regardless of algorithm -- routing to unhealthy servers negates any algorithmic advantage. And run load tests before production to verify your algorithm choice actually distributes load the way you expect.


Written by

Abhishek Patel

Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.
