Blue-Green Deployments: Zero-Downtime Releases Explained
Blue-green deployments eliminate downtime by running two identical environments and switching traffic instantly. Learn the pattern with Kubernetes, AWS CodeDeploy, and Nginx, plus how to handle the tricky part: database migrations.

What You're Here For -- Pick Your Section
Blue-green deployments are a release pattern: two identical production environments (blue = current, green = new), a single traffic switch between them, and a rollback path that is literally one command. That is the whole idea. Almost every question readers bring to this page is about one of five specific pieces, so here is the jump table.
| If you're here because... | Jump to |
|---|---|
| You need the 30-second definition and a comparison with canary / rolling | Blue-Green vs Canary vs Rolling vs In-Place |
| You want the step-by-step release sequence | How Blue-Green Works |
| Your app has a relational database and you are worried about migrations | The Database Problem |
| You are implementing on Kubernetes, Nginx, or AWS CodeDeploy | the three implementation sections further down |
| You are trying to justify the 2x infra cost to a skeptical manager | Cost of Running Two Environments |
If you want the full walkthrough -- including the session-stickiness, long-running-request, and cache-warming pitfalls that bit me on three separate migrations -- read straight through. Everything is structured in the order you would actually execute a real blue-green rollout.
Blue-Green vs Canary vs Rolling vs In-Place
| Strategy | Traffic Split | Rollback Speed | Infrastructure Cost | Risk Level |
|---|---|---|---|---|
| Blue-Green | 100% switch | Seconds | 2x (two full environments) | Low |
| Canary | Gradual % | Seconds | 1x + small canary | Very low |
| Rolling Update | Instance-by-instance | Minutes | 1x + buffer | Medium |
| In-Place | N/A (replace) | Minutes to hours | 1x | High |
How Blue-Green Deployments Work: Step by Step
Here's the exact sequence for a blue-green release:
1. Provision the green environment -- spin up an identical copy of your production infrastructure (servers, containers, load balancers).
2. Deploy the new version to green -- install your application code, run backward-compatible migrations if needed, warm caches.
3. Run smoke tests against green -- hit health endpoints, run integration tests, and verify critical user flows against the green environment directly (not through the public load balancer).
4. Switch the load balancer -- change target groups, flip the reverse proxy, or update DNS to point at the green environment. DNS is the slowest of the three: resolvers and clients cache records until the TTL expires, so the cutover is gradual rather than instant.
5. Monitor closely -- watch error rates, latency, and business metrics for the first 15-30 minutes.
6. Keep blue warm -- don't tear down blue immediately. Keep it running for a rollback window (typically 1-24 hours).
7. Tear down or recycle blue -- once you're confident, the old blue environment becomes the next green for your subsequent deployment.
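The sequence above can be sketched as a small orchestration function. This is a minimal illustration, not a deployment tool: the four callables (`deploy_green`, `smoke_test_green`, `switch_traffic`, `green_is_healthy`) are hypothetical hooks you would wire to your own platform, and `health_checks` is an assumed parameter standing in for the monitoring window.

```python
from typing import Callable


def blue_green_release(
    deploy_green: Callable[[], None],
    smoke_test_green: Callable[[], bool],
    switch_traffic: Callable[[str], None],
    green_is_healthy: Callable[[], bool],
    health_checks: int = 3,
) -> str:
    """Run one release and return the environment left serving traffic."""
    deploy_green()                    # provision green and deploy the new version
    if not smoke_test_green():        # test green directly, before any switch
        return "blue"                 # traffic never moved, blue keeps serving
    switch_traffic("green")           # flip the load balancer
    for _ in range(health_checks):    # monitor after the switch
        if not green_is_healthy():
            switch_traffic("blue")    # rollback works because blue was kept warm
            return "blue"
    return "green"                    # blue can now be torn down or recycled
```

The point of the structure is that rollback is the same operation as the switch, just with the other argument. A real monitoring loop would sleep between checks and read error rates or latency rather than a single boolean.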
The Database Problem
Blue-green deployments sound clean until you remember that both environments share the same database. This is where most teams get tripped up, and it's the single hardest part of implementing this strategy correctly.
Schema Migrations in Blue-Green
You can't just run ALTER TABLE and flip traffic. If green's migration renames or drops something blue still reads, or adds a required column that blue never writes, switching back to blue will break. The solution is expand-and-contract migrations:
1. Expand: Add the new column (nullable or with a default) while blue is still serving traffic. Both blue and green can work with the expanded schema.
2. Deploy green: Green writes to the new column. Blue ignores it.
3. Switch to green: All traffic now uses the new column.
4. Contract: In a subsequent release, remove the old column that's no longer needed, or tighten constraints (such as NOT NULL) on the new one.
-- Step 1: Expand (run while blue is active)
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE;
-- Step 2: Green deployment writes to email_verified
-- Step 3: Switch traffic to green
-- Step 4: Contract (next release cycle)
-- Only after blue is fully decommissioned
ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;
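To see why the expand step is safe for both versions, here is a small sketch using Python's built-in `sqlite3` as a stand-in for the production database. The `email` column and the `blue_insert` / `green_insert` helpers are assumptions made for the demo, and the column is declared `INTEGER DEFAULT 0` instead of `BOOLEAN DEFAULT FALSE` only to stay portable across SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")

# Expand: add the column with a default so writers that predate it keep working.
conn.execute("ALTER TABLE users ADD COLUMN email_verified INTEGER DEFAULT 0")


def blue_insert(email):
    # Blue was written before the column existed and never mentions it.
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))


def green_insert(email, verified):
    # Green knows about the new column and writes it explicitly.
    conn.execute(
        "INSERT INTO users (email, email_verified) VALUES (?, ?)",
        (email, int(verified)),
    )


blue_insert("a@example.com")
green_insert("b@example.com", True)

rows = conn.execute("SELECT email, email_verified FROM users ORDER BY id").fetchall()
print(rows)  # [('a@example.com', 0), ('b@example.com', 1)]
```

Blue's rows pick up the default, so a rollback to blue after the switch keeps working. One caveat: old code that relies on `SELECT *` and positional columns can still be surprised by an added column, so naming columns explicitly is the safer habit.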
Watch out: Never run destructive migrations (dropping columns, renaming tables) during the expand phase. If you need to roll back to blue, those changes will break the old version. Destructive changes always happen in the contract phase, after the rollback window has closed.
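One way to enforce this rule is a pre-merge check that flags statements belonging in the contract phase. The sketch below is a deliberately naive illustration: it splits on semicolons and matches a few keyword patterns, so it would miss or misreport statements inside comments, string literals, or vendor-specific syntax. The function name and the `legacy_flag` column in the usage example are hypothetical.

```python
import re
from typing import List

# Patterns for changes an old application version may not survive.
_CONTRACT_ONLY = re.compile(
    r"\bDROP\s+(TABLE|COLUMN)\b"
    r"|\bRENAME\s+(TO|COLUMN)\b"
    r"|\bSET\s+NOT\s+NULL\b",
    re.IGNORECASE,
)


def contract_only_statements(sql: str) -> List[str]:
    """Return the statements that should wait for the contract phase."""
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    return [s for s in statements if _CONTRACT_ONLY.search(s)]


migration = """
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE;
ALTER TABLE users DROP COLUMN legacy_flag;
"""
flagged = contract_only_statements(migration)
```

Here `flagged` contains only the `DROP COLUMN` statement. A check like this could fail the build for any migration shipped in the same release as the code that stops using the column.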
Implementing Blue-Green With Nginx
The simplest blue-green setup uses Nginx as a reverse proxy with two upstream blocks. Switching is a config reload -- no downtime.
# /etc/nginx/conf.d/app.conf
upstream blue {
    server 10.0.1.10:3000;
    server 10.0.1.11:3000;
}

upstream green {
    server 10.0.2.10:3000;
    server 10.0.2.11:3000;
}

server {
    listen 80;
    server_name app.example.com;

    location / {
        # Switch by changing this single line
        proxy_pass http://green;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
To switch from green back to blue, change proxy_pass http://green; to proxy_pass http://blue;, validate the file with nginx -t, and run nginx -s reload. Nginx reloads gracefully -- old worker processes finish their in-flight requests against the previous upstream while new workers send new connections to the new one.
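If you script the switch, the edit itself is a one-line text substitution. The following is a minimal sketch, assuming the config contains exactly one `proxy_pass` line pointing at `blue` or `green`; `switch_upstream` is a hypothetical helper, and it only transforms text, leaving file writes and the reload to the caller.

```python
import re

# Matches the single proxy_pass directive that selects the live upstream.
_PROXY_PASS = re.compile(r"(proxy_pass\s+http://)(blue|green)(;)")


def switch_upstream(conf_text: str, target: str) -> str:
    """Return conf_text with proxy_pass pointed at the target upstream."""
    if target not in ("blue", "green"):
        raise ValueError(f"unknown upstream: {target!r}")
    new_text, count = _PROXY_PASS.subn(rf"\g<1>{target}\g<3>", conf_text)
    if count != 1:
        # Refuse to guess when the file does not look the way we expect.
        raise ValueError(f"expected exactly one proxy_pass line, found {count}")
    return new_text
```

Failing loudly when the match count is not exactly one is intentional: a silent no-op during a rollback is worse than an error. After writing the result back, the usual sequence is `nginx -t` followed by `nginx -s reload`.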
Blue-Green on Kubernetes
Kubernetes makes blue-green straightforward using Services and label selectors. You deploy both versions simultaneously and switch the Service selector.
# blue deployment (current)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: app
          image: myapp:1.0.0
---
# green deployment (new)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: app
          image: myapp:1.1.0
---
# Service -- switch by changing the version selector
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: green  # flip to "blue" to rollback
  ports:
    - port: 80
      targetPort: 3000
The switch is a single kubectl patch command:
# Switch to green
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'
# Rollback to blue
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
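When the switch runs from a deploy script rather than a terminal, building the patch with a JSON library avoids shell-quoting mistakes. A small sketch, with `selector_patch` and `switch_command` as hypothetical helper names; it only builds the argument list and does not call `kubectl` itself.

```python
import json
from typing import List


def selector_patch(color: str) -> str:
    """Build the JSON patch body that points the Service at one color."""
    if color not in ("blue", "green"):
        raise ValueError(f"unknown color: {color!r}")
    return json.dumps({"spec": {"selector": {"version": color}}})


def switch_command(service: str, color: str) -> List[str]:
    """Return the kubectl invocation as an argument list (no shell involved)."""
    return ["kubectl", "patch", "service", service, "-p", selector_patch(color)]


cmd = switch_command("myapp", "green")
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would perform the switch. Validating the color up front matters because a selector that matches no pods leaves the Service with no endpoints.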
Blue-Green With AWS CodeDeploy
AWS CodeDeploy has native blue-green support for both EC2 instances and ECS services. For ECS, it works with Application Load Balancers to shift traffic between two target groups.
# appspec.yml for ECS blue-green
# (123456789012 is a placeholder account ID)
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:us-east-1:123456789012:task-definition/myapp:2"
        LoadBalancerInfo:
          ContainerName: "myapp"
          ContainerPort: 3000
Hooks:
  - BeforeAllowTraffic: "LambdaFunctionToValidateGreen"
  - AfterAllowTraffic: "LambdaFunctionToRunSmokeTests"
CodeDeploy handles the target group switching, runs your validation hooks, and supports automatic rollback if CloudWatch alarms trigger during the deployment. It's the most fully managed option if you're already in the AWS ecosystem.
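A validation hook is a Lambda function that runs its checks and then reports the result back to CodeDeploy. The sketch below shows one possible shape for a hook like `LambdaFunctionToValidateGreen`. The health check, the `GREEN_HEALTH_URL` environment variable, and the helper names are assumptions for illustration; the reporting call is boto3's `put_lifecycle_event_hook_execution_status`, and the client is passed in so the logic can be exercised without AWS.

```python
import os
import urllib.request
from typing import Callable


def check_green() -> bool:
    """Hypothetical check: green's health endpoint answers with HTTP 200."""
    url = os.environ["GREEN_HEALTH_URL"]  # assumed variable, set on the function
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


def report_hook_result(event: dict, codedeploy_client, green_ok: Callable[[], bool]) -> str:
    """Run the check and tell CodeDeploy whether the deployment may continue."""
    status = "Succeeded" if green_ok() else "Failed"
    codedeploy_client.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return status


def handler(event, context):
    import boto3  # imported lazily; provided by the AWS Lambda Python runtime
    return report_hook_result(event, boto3.client("codedeploy"), check_green)
```

Reporting `Failed` stops the deployment before production traffic shifts, which is what makes the hook useful as a gate rather than just a log line.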
Cost of Running Two Environments
The elephant in the room: blue-green requires double the infrastructure during deployments. Here's what that actually costs in practice:
| Approach | Extra Cost | When It Makes Sense |
|---|---|---|
| Always-on dual environments | ~2x compute | Mission-critical apps with frequent deploys (multiple per day) |
| Spin up green on demand | Compute only during deploy window | Apps that deploy weekly or less frequently |
| Kubernetes label switching | 2x pods during deploy, then scale down | Container-based workloads with auto-scaling |
| Serverless (Lambda/Cloud Functions) | Nearly zero (pay per invocation) | Serverless architectures -- blue-green is essentially free |
Pro tip: You don't have to keep both environments running permanently. Spin up the green environment 30 minutes before deploy, run your tests, switch traffic, then keep blue for your rollback window (1-4 hours is usually enough). With infrastructure-as-code, provisioning a fresh environment should be fast and repeatable.
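The difference between always-on and on-demand is easy to put in numbers for a budget conversation. A back-of-the-envelope sketch: the $10/hour environment cost and four deploys per month are made-up inputs, 730 is the approximate number of hours in a month, and the function names are illustrative.

```python
def on_demand_extra_cost(
    hourly_env_cost: float,
    deploys_per_month: int,
    warmup_hours: float = 0.5,
    rollback_window_hours: float = 4.0,
) -> float:
    """Extra monthly spend when the second environment exists only around deploys."""
    hours_per_deploy = warmup_hours + rollback_window_hours
    return hourly_env_cost * hours_per_deploy * deploys_per_month


def always_on_extra_cost(hourly_env_cost: float, hours_per_month: float = 730.0) -> float:
    """Extra monthly spend when the second environment never shuts down."""
    return hourly_env_cost * hours_per_month


on_demand = on_demand_extra_cost(hourly_env_cost=10.0, deploys_per_month=4)
always_on = always_on_extra_cost(hourly_env_cost=10.0)
print(on_demand, always_on)  # 180.0 7300.0
```

With these inputs the on-demand approach adds about 2.5% of what a permanent second environment would cost. The gap narrows as deploy frequency rises, which is why teams deploying several times a day often keep both environments running.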
Common Pitfalls and How to Avoid Them
Session Stickiness
If your app stores sessions in memory, switching from blue to green logs out every user. The fix: externalize sessions to Redis or a database, or use JWTs. This isn't just a blue-green requirement -- it's a prerequisite for any multi-instance deployment.
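The idea behind the token approach is that any instance holding the shared secret can validate a session without having created it. The sketch below is a simplified HMAC-signed token built from the standard library, not a real JWT implementation: it has no expiry, no header, and no algorithm negotiation, and the function names are illustrative. In production you would use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def issue_token(secret: bytes, claims: dict) -> str:
    """Encode the claims and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_token(secret: bytes, token: str) -> Optional[dict]:
    """Return the claims if the signature matches, otherwise None."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


SECRET = b"shared-by-blue-and-green"      # hypothetical shared secret
token = issue_token(SECRET, {"user_id": 42})   # issued while blue was live
claims = verify_token(SECRET, token)           # verified by green after the switch
```

Because no server-side state is involved, the switch is invisible to logged-in users as long as both environments are configured with the same secret.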
Long-Running Requests
When you switch the load balancer, in-flight requests to blue need to complete. Most load balancers handle this with connection draining -- new requests go to green while existing connections to blue are allowed to finish. Make sure connection draining is enabled and set to an appropriate timeout (30-60 seconds for most web apps).
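Connection draining boils down to two things: stop sending new requests, then wait for the in-flight count to reach zero or a timeout to expire. Load balancers implement this for you; the sketch below only illustrates the mechanism, with `InFlightTracker` as a hypothetical class name.

```python
import threading


class InFlightTracker:
    """Counts requests in progress and lets a caller wait for them to finish."""

    def __init__(self) -> None:
        self._count = 0
        self._cond = threading.Condition()

    def begin(self) -> None:
        # Call when a request starts.
        with self._cond:
            self._count += 1

    def end(self) -> None:
        # Call when a request finishes; wakes any waiting drain().
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def drain(self, timeout: float) -> bool:
        """Block until nothing is in flight. Returns False if the timeout hit first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)
```

A `False` return is the case to plan for: it means requests were still running when the timeout expired, and anything you shut down at that point gets cut off. That is why the timeout should exceed your slowest legitimate request.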
Cache Invalidation
If green uses a different cache key scheme or has different cached data, you might see stale data or cache misses after switching. Warm the green environment's caches before switching traffic, or use a shared cache layer (Redis, Memcached) that both environments access.
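With a shared cache, one way to keep the two versions from reading each other's entries is to namespace keys by release and pre-populate green's namespace before the switch. A minimal sketch, using a plain dict as a stand-in for Redis or Memcached; `cache_key`, `warm_cache`, and the release strings are illustrative assumptions.

```python
from typing import Callable, Dict, Iterable


def cache_key(release: str, key: str) -> str:
    """Prefix keys with the release so blue and green never collide."""
    return f"{release}:{key}"


def warm_cache(
    cache: Dict[str, object],
    release: str,
    keys: Iterable[str],
    load: Callable[[str], object],
) -> int:
    """Load missing entries for one release. Returns how many were added."""
    warmed = 0
    for key in keys:
        namespaced = cache_key(release, key)
        if namespaced not in cache:
            cache[namespaced] = load(key)
            warmed += 1
    return warmed


shared = {"1.0.0:user:1": "cached-by-blue"}
added = warm_cache(shared, "1.1.0", ["user:1", "user:2"], lambda k: f"fresh-{k}")
```

Blue's entries stay untouched, so a rollback lands on a cache that is still warm. The trade-off is memory: for the length of the rollback window the cache holds two copies of the hot set.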
Frequently Asked Questions
What is a blue-green deployment in simple terms?
A blue-green deployment runs two identical production environments side by side. One (blue) handles live traffic while the other (green) gets the new version deployed to it. Once the new version is verified, traffic switches from blue to green instantly. If anything goes wrong, traffic switches back to blue in seconds. It eliminates downtime during releases.
How is blue-green different from a canary deployment?
Blue-green switches 100% of traffic at once -- all users move to the new version simultaneously. Canary deployments gradually shift traffic, starting with a small percentage (1-5%) and increasing over time. Blue-green is simpler to implement but riskier since all users are affected at once. Canary is safer but requires more sophisticated traffic management.
Do blue-green deployments require double the infrastructure cost?
Only during the deployment window. You need both environments running simultaneously while deploying and during your rollback window. Many teams spin up the green environment on demand and tear down the old one after the rollback window closes, making the extra cost proportional to deployment frequency rather than a permanent 2x multiplier.
How do you handle database migrations with blue-green?
Use expand-and-contract migrations. First, apply backward-compatible schema changes (add nullable columns, create new tables) while blue is active. Deploy green to use the new schema. After switching and closing the rollback window, apply destructive changes (drop old columns) in the next release. Never run breaking schema changes during the switch.
What tools support blue-green deployments?
Common options include AWS CodeDeploy, Kubernetes (via Service selectors or Argo Rollouts), Nginx and HAProxy (upstream switching), AWS Elastic Beanstalk (environment URL swaps), Azure App Service (deployment slots), and Cloud Foundry. Most modern container orchestrators support blue-green either natively or through extensions.
How long should I keep the old environment running after switching?
Keep the old (blue) environment running for your rollback window -- typically 1-4 hours for most applications. High-traffic applications or those with complex data flows might extend this to 24 hours. The rollback window should be long enough to catch issues that only surface under sustained production load, like memory leaks or slow database query degradation.
Can I do blue-green deployments with a managed database like RDS?
Yes, but both environments share the same database instance. This is actually the standard approach -- you don't duplicate the database. The critical requirement is that schema migrations must be backward-compatible so both the old and new application versions can work with the same database schema simultaneously during the transition period.
Conclusion
Blue-green deployments trade infrastructure cost for deployment safety. The pattern is straightforward: two environments, one load balancer switch, instant rollback capability. The hard part isn't the switching -- it's the database migrations and the discipline to make every schema change backward-compatible.
If you're deploying to Kubernetes, start with the Service selector approach. It's the lowest-effort path and costs nothing extra beyond the temporary doubled pod count. If you're on AWS, CodeDeploy handles the orchestration and adds automated rollback via CloudWatch alarms. Either way, the investment pays for itself the first time you catch a bad deploy and roll back in seconds instead of scrambling for 45 minutes.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.