Understanding Availability Zones and Regions: A Practical Guide
Learn what Availability Zones and regions are physically, how to design for AZ redundancy, which services are zone-scoped vs region-scoped, and what SLA documents actually guarantee.

The Physical Foundation of Cloud Reliability
Availability Zones (AZs) and regions are the building blocks of every cloud architecture decision you'll make. When AWS promises 99.99% uptime for a service, that guarantee assumes you've deployed across multiple AZs. When your application goes down because "the data center had an issue," it usually means you put everything in a single AZ. Understanding what AZs and regions actually are -- physically and logically -- changes how you design infrastructure.
This isn't abstract theory. Every production outage I've investigated in the past five years had a root cause tied to AZ or region architecture: a database in one AZ with no replica, a load balancer not spanning enough zones, or a region-scoped service dependency that took everything down. This guide covers the physical reality, the design patterns, and the failure modes you need to plan for.
What Are Availability Zones?
Definition: An Availability Zone is one or more discrete data centers within a cloud region, each with independent power, cooling, and networking. AZs within a region are connected by high-bandwidth, low-latency links and are designed so that a failure in one AZ does not cascade to others.
What Are Regions?
A region is a geographic area containing multiple Availability Zones. AWS has 30+ regions worldwide. GCP calls them regions with zones. Azure calls them regions with Availability Zones (added later in Azure's evolution). Each region is completely independent -- a region-level outage in us-east-1 doesn't affect eu-west-1.
The Physical Reality
Cloud providers are deliberately vague about the physical details, but here's what we know:
- Each AZ consists of one or more data centers (AWS has confirmed some AZs have multiple buildings)
- AZs within a region are typically 10-100 km apart -- far enough for independent failure domains but close enough for low-latency replication
- Inter-AZ latency is usually 1-2ms round trip
- Each AZ has independent power feeds, often from different substations or power grids
- AZs have independent network connections to the internet and to AWS's backbone
Pro tip: AZ names are randomized per AWS account. Your `us-east-1a` is not the same physical facility as another account's `us-east-1a`. AWS does this to distribute load evenly. Use AZ IDs (like `use1-az1`) when coordinating across accounts.
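To see the name-to-ID mapping in your own account, the `describe-availability-zones` call returns both fields. A usage sketch, assuming the AWS CLI is installed and configured with credentials:

```shell
# List AZ names alongside their account-independent AZ IDs
aws ec2 describe-availability-zones \
  --region us-east-1 \
  --query 'AvailabilityZones[].[ZoneName, ZoneId]' \
  --output table
```

Two accounts comparing this output will usually see the same AZ IDs mapped to different AZ names.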
Cross-Cloud Comparison
| Concept | AWS | GCP | Azure |
|---|---|---|---|
| Geographic unit | Region | Region | Region (Geography) |
| Fault domain | Availability Zone | Zone | Availability Zone |
| AZs per region | 2-6 (typically 3) | 3 (typically) | 3 (in supported regions) |
| Total regions | 30+ | 40+ | 60+ |
| AZ-level SLA | No (region-level with multi-AZ) | No (region-level with multi-zone) | Yes (per-zone SLA for VMs) |
GCP's zones are functionally identical to AWS AZs. Azure added Availability Zones later, so some older Azure regions don't have them -- check before assuming zone redundancy is available.
Designing for AZ Redundancy: Step by Step
Step 1: Deploy Compute Across at Least Two AZs
Every Auto Scaling Group, ECS service, or Kubernetes deployment should span at least two AZs (three is better). If one AZ fails, the surviving AZs handle the load. Your ASG should have enough capacity in N-1 AZs to serve your full traffic.
```hcl
# Terraform: ASG spanning three AZs
resource "aws_autoscaling_group" "app" {
  min_size         = 3
  max_size         = 12
  desired_capacity = 6

  vpc_zone_identifier = [
    aws_subnet.private_a.id,
    aws_subnet.private_b.id,
    aws_subnet.private_c.id,
  ]
}
```
Step 2: Use Multi-AZ Databases
RDS Multi-AZ deploys a synchronous standby replica in a different AZ. If the primary fails, RDS automatically promotes the standby -- typically in 60-120 seconds. Aurora goes further with up to 15 read replicas across three AZs and failover in under 30 seconds.
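In Terraform, Multi-AZ is a single flag on the DB instance. A hedged sketch -- the identifier, engine, and sizes are illustrative, not from the original:

```hcl
# Terraform: RDS instance with a synchronous standby in another AZ
resource "aws_db_instance" "app" {
  identifier        = "app-db"
  engine            = "postgres"
  instance_class    = "db.r6g.large"
  allocated_storage = 100
  multi_az          = true  # standby in a different AZ, automatic failover
}
```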
Step 3: Distribute Load Balancers
Application Load Balancers (ALBs) automatically distribute across the AZs you enable. Enable all AZs in your VPC. Cross-zone load balancing is on by default for ALBs, meaning each AZ's targets receive traffic regardless of which AZ the client connects to.
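A Terraform sketch of an ALB enabled in three AZs; the subnet names are illustrative and mirror the ASG example above (ALBs typically sit in public subnets):

```hcl
# Terraform: ALB spanning three AZs via one subnet per zone
resource "aws_lb" "app" {
  name               = "app-alb"
  load_balancer_type = "application"
  subnets = [
    aws_subnet.public_a.id,
    aws_subnet.public_b.id,
    aws_subnet.public_c.id,
  ]
}
```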
Step 4: Replicate Data Stores
ElastiCache Redis can run in cluster mode with replicas across AZs. DynamoDB replicates across three AZs automatically (it's built in). EFS is multi-AZ by default. EBS volumes are AZ-scoped -- if you need cross-AZ data availability, use EFS, S3, or database replication.
Watch out: EBS volumes exist in a single AZ. If that AZ fails, the volume is inaccessible until the AZ recovers. Never rely on a single EBS volume for critical data without snapshots or cross-AZ replication.
Zone-Scoped vs Region-Scoped Services
Understanding which AWS services are zone-scoped vs region-scoped determines your blast radius:
| Zone-Scoped (affected by AZ failure) | Region-Scoped (survives AZ failure) |
|---|---|
| EC2 instances | S3 |
| EBS volumes | DynamoDB |
| NAT Gateways | Lambda |
| ElastiCache nodes | SQS |
| RDS single-AZ instances | ALB (multi-AZ) |
| Subnets | Route 53 |
Region-scoped services replicate across AZs internally. When you write an object to S3, it's replicated across at least three AZs before the write is acknowledged. DynamoDB does the same. These services survive AZ failures transparently.
Failure Modes and SLAs
AZ Failure
Partial or complete AZ failures happen roughly once per year across all AWS regions. They're usually partial -- a power issue affecting some racks, a network partition isolating some hosts. Designing for multi-AZ means these events cause degraded performance (fewer healthy targets) rather than full outages.
Region Failure
Full region failures are extremely rare -- a handful of incidents in AWS's history. However, region-wide service degradation (like the us-east-1 events in 2017 and 2020) affects multiple services simultaneously. If your SLA requires resilience against region failure, you need a multi-region architecture.
Understanding SLA Documents
AWS SLAs are carefully worded. Key points:
- EC2 SLA: 99.99% at the region level for instances deployed across two or more AZs. Individual instances carry only the weaker instance-level SLA (99.5%).
- RDS Multi-AZ SLA: 99.95%. Single-AZ RDS: no SLA.
- S3 SLA: 99.9% availability, 99.999999999% (11 nines) durability.
- SLA credits are billing credits, not refunds. A 10% credit on your EC2 bill doesn't cover the revenue lost from a 4-hour outage.
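The difference between these numbers is easier to feel as allowed downtime per month. A quick calculation (assuming a 30-day month):

```python
def monthly_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Downtime per month permitted by a given availability SLA."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)

for sla in (99.99, 99.95, 99.9):
    print(f"{sla}% -> {monthly_downtime_minutes(sla):.1f} min/month")
# 99.99% -> 4.3 min/month
# 99.95% -> 21.6 min/month
# 99.9%  -> 43.2 min/month
```

A service can breach its SLA for the month with a single deploy gone wrong, which is why the credit terms matter less than the architecture.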
Cost of Multi-AZ Deployment
Multi-AZ adds cost, but it's modest compared to multi-region:
- RDS Multi-AZ: roughly 2x the cost of single-AZ (you're running two instances)
- Cross-AZ data transfer: $0.01/GB each direction -- monitor this for high-throughput workloads
- NAT Gateways per AZ: $32/month each -- three AZs means $96/month just for NAT
- Additional compute capacity: running N+1 instances across AZs for redundancy
For most workloads, multi-AZ adds 20-40% to your infrastructure cost. It's almost always worth it for production -- the cost of downtime exceeds the cost of redundancy.
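Cross-AZ transfer is the cost item most likely to surprise you, because it is billed in both directions. A back-of-the-envelope sketch using the $0.01/GB-each-way rate quoted above (check current pricing for your region):

```python
def cross_az_transfer_cost(gb_per_month: float, rate_per_gb: float = 0.01) -> float:
    """Monthly cross-AZ transfer cost: billed out of one AZ and into the other."""
    return gb_per_month * rate_per_gb * 2

# Example: 10 TB/month of replication traffic between AZs
print(f"${cross_az_transfer_cost(10_000):.2f}/month")  # $200.00/month
```

For chatty replication workloads (Kafka, Cassandra, database read replicas) this line item can rival the compute bill, which is why rack-aware or zone-aware client routing exists.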
Frequently Asked Questions
How many Availability Zones should I use?
Use at least two AZs for any production workload. Three AZs is the standard recommendation because it provides redundancy even if one AZ fails and another is under maintenance. Using three AZs also enables quorum-based systems (like etcd in Kubernetes) to tolerate a single AZ failure while maintaining consensus.
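The quorum arithmetic behind the three-AZ recommendation is simple majority math -- with one voting member per AZ:

```python
def tolerated_failures(members: int) -> int:
    """A quorum system with N members needs a majority (N // 2 + 1) to stay
    available, so it tolerates N minus that majority in failures."""
    return members - (members // 2 + 1)

# One etcd member per AZ:
print(tolerated_failures(3))  # 1 -> survives one AZ failure
print(tolerated_failures(2))  # 0 -> two AZs give no quorum tolerance
```

This is why two-AZ etcd or ZooKeeper clusters are a trap: losing either AZ loses the majority, so the second AZ adds cost without adding consensus fault tolerance.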
Do all AWS services support multi-AZ?
Most managed services either support multi-AZ explicitly (RDS, ElastiCache) or are region-scoped and handle it internally (S3, DynamoDB, Lambda, SQS). Some services like EBS and EC2 are inherently zone-scoped. Always check the service documentation for multi-AZ support before assuming it's there.
What is the latency between Availability Zones?
Inter-AZ latency within an AWS region is typically 1-2 milliseconds round trip. This is low enough for synchronous database replication and most application communication. Cross-region latency is much higher -- typically 20-200ms depending on geographic distance -- which is why multi-region adds complexity to data consistency.
Can an entire AWS region go down?
Full region outages are extremely rare but have occurred. More commonly, specific services within a region degrade -- for example, the 2020 Kinesis outage in us-east-1 cascaded to affect Cognito, CloudWatch, and Lambda. Design critical paths to avoid dependencies on multiple services in a single region if region resilience matters to you.
What is the difference between an AZ and a data center?
An AZ is a logical concept that maps to one or more physical data centers. AWS has confirmed that some AZs contain multiple data center buildings connected by dedicated fiber. The key distinction is that an AZ is designed as a single failure domain -- all buildings in one AZ share fate for power, networking, and physical events.
How do I check which AZ a resource is in?
EC2 instances show their AZ in the console and CLI. Use aws ec2 describe-instances and check the Placement.AvailabilityZone field. For cross-account coordination, use AZ IDs instead of AZ names since names are randomized per account. The aws ec2 describe-availability-zones command shows the mapping between names and IDs.
Multi-AZ Is Your Baseline
Single-AZ deployment is acceptable for development and testing. For anything that serves users or processes data you care about, deploy across at least two AZs -- preferably three. Use RDS Multi-AZ or Aurora for databases. Span your ASGs and ECS services across all available zones. Monitor cross-AZ data transfer costs but don't let them discourage multi-AZ design. The availability gains are worth the modest cost premium, and every cloud provider SLA assumes you've done this work.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.