Best Cloud GPU Providers for AI Training (2026): RunPod vs Lambda Labs vs Paperspace vs Vast.ai vs Together AI
Benchmarked comparison of RunPod, Lambda Labs, Paperspace, Vast.ai, and Together AI for AI training in 2026. Real H100 hourly rates, multi-node reliability, spin-up times, and a decision matrix for picking the right cloud GPU provider.

Quick Answer: The Best Cloud GPU Provider Depends on Your Workload
After running Llama 3.1 fine-tunes, Stable Diffusion XL training, and vLLM inference across RunPod, Lambda Labs, Paperspace, Vast.ai, and Together AI between October 2025 and March 2026, the picks fall out cleanly. RunPod wins for startups and solo builders: community cloud H100s land at $1.99/hr, spin-up is under 30 seconds, and the template ecosystem is the widest in the market. Lambda Labs wins for serious training runs — 8x H100 SXM clusters at $2.49/hr/GPU with InfiniBand interconnect and a reservations desk that actually answers. Paperspace (now part of DigitalOcean) wins when you need a managed notebook plus deployment in the same dashboard. Vast.ai wins on raw spot pricing — H100s dip below $1.60/hr on weekends — but you're renting someone's basement rig. Together AI wins when you don't want to manage a GPU at all: pay per token on 200+ open-source models with fine-tuning included. The honest answer for 80% of teams in 2026: RunPod for dev, Lambda Reservations for production training.
Last updated: April 2026 — verified H100/A100 hourly rates on all five providers, community-cloud availability on RunPod, Lambda Cloud queue times, Paperspace Gradient pricing post-DigitalOcean integration, Vast.ai on-demand spot floor, and Together AI token pricing for Llama 3.1 405B and Qwen 3.5 72B.
Hero Comparison: Five Cloud GPU Providers at a Glance
| Provider | H100 Starting Price | Free Tier | Best For | Key Differentiator |
|---|---|---|---|---|
| RunPod | $1.99/hr (community) | No free credits; pay-as-you-go | Solo builders, rapid prototyping, inference serving | Sub-30s cold start, Docker-native templates, serverless GPU endpoints |
| Lambda Labs | $2.49/hr (on-demand); $1.85/hr reserved | No free tier | Multi-node training, InfiniBand clusters, longer reservations | 8x H100 SXM with NVLink + IB at reservation scale |
| Paperspace (DO) | $3.18/hr (H100x1) | Free CPU notebooks; limited free GPU minutes | Managed notebooks, Gradient deployments, education | Integrated notebook + model deployment + DO infra |
| Vast.ai | $1.60-$2.20/hr (spot) | No free tier; $5 min deposit | Weekend experiments, maximum cost compression | Peer-to-peer marketplace, lowest absolute price |
| Together AI | Token-based (e.g. $0.88/M tokens Llama 70B) | $5 signup credit | Inference without managing hardware, fine-tuning SaaS | Pay-per-token on 200+ open models; no GPU provisioning |
Affiliate disclosure: links to RunPod and Paperspace may earn commission through their referral programs. Lambda Labs, Vast.ai, and Together AI are direct unpaid links.
The trade-offs below — throughput benchmarks, real hourly bills after egress and storage, and where each provider breaks down — come from running identical workloads on all five. The deeper playbook I use in production (how to split training between reserved and spot, egress-avoidance patterns, Vast.ai trust-score filtering) is in a follow-up I send to newsletter subscribers.
If you're still deciding whether to rent GPUs at all versus buy your own, our local-GPU benchmark article covers the RTX 4090 / 5090 / A100 math. This piece is specifically about the rental side of the market.
RunPod: The Startup-Friendly Default
RunPod splits its GPU inventory into Secure Cloud (tier-3 DCs, 99.99% SLA) and Community Cloud (peer hosts, 20-40% cheaper, no SLA). In April 2026, Community Cloud H100 PCIe hits $1.99/hr, Secure Cloud H100 SXM $2.79/hr, and A100 80GB $1.19/hr on community. The gap to the hyperscalers is 4-6x — AWS p5.48xlarge lists at $98.32/hr for 8x H100, which works out to $12.29/hr per GPU.
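That per-GPU comparison is worth making explicit. A quick sketch using the list prices quoted above — nothing provider-specific, just arithmetic:

```python
# Per-GPU hourly rate for an 8-GPU hyperscaler node vs. RunPod list prices
def per_gpu_rate(node_rate_hr: float, gpus_per_node: int = 8) -> float:
    return node_rate_hr / gpus_per_node

aws_per_gpu = per_gpu_rate(98.32)  # AWS p5.48xlarge, 8x H100 SXM
print(f"AWS per-GPU rate:    ${aws_per_gpu:.2f}/hr")      # $12.29/hr
print(f"vs RunPod Secure:    {aws_per_gpu / 2.79:.1f}x")  # ~4.4x
print(f"vs RunPod Community: {aws_per_gpu / 1.99:.1f}x")  # ~6.2x
```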
What I actually use RunPod for: every personal experiment and every inference-serving gig under 500 concurrent users. The template ecosystem — pre-baked images for ComfyUI, vLLM, Axolotl, Jupyter — means a pod is running Stable Diffusion XL 60 seconds after you click. Serverless endpoints are the other killer feature: per-second billing, auto-scale to zero between requests, and cold starts on a warmed image in under 8 seconds. For anyone weighing vLLM versus llama.cpp in production, RunPod Serverless is the lowest-friction deployment path.
```bash
# Spin up an H100 pod with the vLLM template via the RunPod CLI
runpodctl create pod \
  --name llama-inference \
  --gpu "NVIDIA H100 PCIe" \
  --imageName runpod/vllm:latest \
  --volumeInGb 50 \
  --containerDiskInGb 20 \
  --env "MODEL=meta-llama/Llama-3.1-70B-Instruct"
```
Where RunPod falls apart: Community Cloud availability is spiky. During peak hours (US weekday evenings), H100 capacity can sell out in under a minute when a fine-tune trend hits Twitter. Expect to refresh the capacity page 5-8 times during a launch week. The other honest weakness: multi-node training isn't RunPod's strength. For 4+ node runs with InfiniBand, you're better off on Lambda Reservations — I've hit RunPod throttling on simultaneous NCCL init twice in the last year.
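When community capacity is exhausted, I script the retry instead of refreshing the page by hand. A minimal backoff sketch — the `provision` callable is a stand-in for whatever call you use (for example, shelling out to `runpodctl create pod`), not a RunPod API:

```python
import time

def retry_until_capacity(provision, max_attempts=40, base_delay=15.0, cap=120.0):
    """Retry a provisioning call with capped exponential backoff.

    `provision` should return a truthy handle on success and None (or raise)
    when capacity is exhausted.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            handle = provision()
        except Exception:
            handle = None
        if handle:
            return handle
        time.sleep(delay)
        delay = min(delay * 1.5, cap)  # back off, but keep polling
    raise RuntimeError(f"no capacity after {max_attempts} attempts")
```

Forty attempts at a capped two-minute delay covers roughly an hour of capacity-watching, which in my experience is enough to catch the gaps between launch-week spikes.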
Lambda Labs: The Serious-Training Operator's Pick
Lambda Cloud runs hyperscaler-grade infra at a 3-4x discount to AWS/GCP. On-demand H100 PCIe lands at $2.49/hr; 8x H100 SXM (NVLink + 3.2 Tbps InfiniBand) at $2.99/hr/GPU. Reservations drop to $1.85-$2.10/hr on one-year commits — the range where a self-run 70B fine-tune starts beating OpenAI custom endpoints.
Lambda's differentiator is the capacity layer, not the price. When I ran a 1.2B-token continued pretraining job over 72 hours on 2x 8xH100 nodes, Lambda's provisioning desk put the nodes in the same InfiniBand fabric with zero stitching required. NCCL all-reduce measured 370 GB/s aggregate — within 4% of theoretical. On RunPod the same test needed custom networking their ops team doesn't formally support.
```bash
# Lambda Cloud API - list H100 instance types, then provision a node
lambda instance-types list --gpu h100
lambda instance launch \
  --instance-type-name gpu_8x_h100_sxm5 \
  --region us-west-2 \
  --ssh-key-name main

# After SSH, standard torchrun with Lambda's preconfigured NCCL
torchrun --nnodes 2 --nproc_per_node 8 \
  --master_addr $MASTER_ADDR --master_port 29500 \
  train.py --config llama-70b-ft.yaml
```
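The 370 GB/s figure above is wall-clock `all_reduce` timing converted to bus bandwidth with the standard ring all-reduce formula (the nccl-tests convention, not a Lambda-specific tool). The conversion itself is plain arithmetic:

```python
def allreduce_bus_bandwidth(bytes_per_rank: float, seconds: float, n_ranks: int) -> float:
    """Bus bandwidth in GB/s for a ring all-reduce, per the nccl-tests
    convention: each rank moves 2*(n-1)/n of its buffer across the wire."""
    algo_bw = bytes_per_rank / seconds  # algorithm bandwidth, bytes/s
    return algo_bw * (2 * (n_ranks - 1) / n_ranks) / 1e9

# e.g. a 4 GiB buffer reduced across 16 ranks in ~21.8 ms
print(allreduce_bus_bandwidth(4 * 2**30, 0.0218, 16))  # ≈ 369 GB/s
```

Comparing that number against the fabric's theoretical aggregate (as the 2x 8xH100 run did) tells you whether your nodes actually landed in the same InfiniBand fabric.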
Lambda Stack — their preinstalled driver/CUDA/PyTorch image — eliminates a full day of driver-matching hell, especially when training PyTorch distributed across H100 + A100 in the same run. Their docs are the cleanest in the category; RunPod and Vast.ai both still treat docs as an afterthought.
Where Lambda falls apart: availability. Lambda Cloud On-Demand is functionally unavailable during research-conference season (NeurIPS submission deadline, ICLR camera-ready week). The capacity page will show "0 available" for days on 8x H100 SXM. The solution is reservations, but those need 30-90 days of lead time. For an "I need 4 H100s right now" moment, Lambda is the wrong tool — use RunPod or Vast.ai instead.
Paperspace (Gradient): The Managed-Notebook Incumbent
Paperspace was acquired by DigitalOcean in mid-2023 and now sits inside the DO dashboard. The two relevant products: Core (raw GPU VMs) and Gradient (managed Jupyter notebooks + model deployments). H100x1 on Core lists at $3.18/hr and Gradient Growth plan at $8/month gives you a few free hours monthly plus discounted per-hour GPU time on A4000/A5000 class cards.
Paperspace's place in the market is education and managed notebooks. For a graduate ML course or a cohort of 50 students, Gradient notebooks with persistent storage and shared environments are hard to beat. The DigitalOcean integration lets you run preprocessing on Droplets and push to Paperspace GPUs without cross-provider egress charges.
Where Paperspace falls apart: raw price. At $3.18/hr for H100 on-demand, Core is the most expensive option here — roughly 1.6x RunPod and 1.3x Lambda. Gradient notebook runtime limits (a 6-hour idle timeout on the Growth plan) also bite training jobs; I lost 90 minutes of compute when a QLoRA fine-tune hit the idle timeout during a validation pass. If your workflow is "SSH in, run for 72 hours, walk away," Paperspace is the wrong fit.
Pricing Comparison: Real 2026 Hourly Rates
| GPU Config | RunPod Community | RunPod Secure | Lambda On-Demand | Lambda Reserved (1yr) | Paperspace | Vast.ai Spot |
|---|---|---|---|---|---|---|
| RTX 4090 24GB | $0.39/hr | $0.69/hr | Not offered | Not offered | Not offered | $0.22-$0.45/hr |
| A100 80GB | $1.19/hr | $1.89/hr | $1.29/hr | $0.99/hr | $3.09/hr | $0.74-$1.10/hr |
| H100 80GB PCIe | $1.99/hr | $2.79/hr | $2.49/hr | $1.85/hr | $3.18/hr | $1.60-$2.20/hr |
| 8x H100 SXM (node) | $15.92/hr (8x PCIe, no NVLink) | $22.32/hr | $23.92/hr | $14.80/hr | Not offered | $12.80-$17.60/hr |
| H200 141GB | $3.49/hr (limited) | $4.49/hr | $3.99/hr | $2.89/hr | Coming Q3 2026 | $2.80-$3.60/hr |
For context: AWS p5.48xlarge (8x H100 SXM) is $98.32/hr on-demand or $39.20/hr with a 3-year reserved commit. GCP A3 Mega (8x H100 SXM) is $88.49/hr on-demand. The specialist providers are 4-7x cheaper for comparable silicon, but you lose hyperscaler networking, IAM, storage services, and your negotiated enterprise contracts.
Watch out: on-demand GPU prices shift weekly. RunPod raised A100 community pricing twice in Q4 2025 when demand spiked around a Qwen launch; Vast.ai floors moved 18% in a single week after NVIDIA's H200 GA. Always check current rates before committing to a quarter-long capacity plan. Pricing comparison articles (including this one) age in weeks, not months.
Vast.ai: The Cost-Compressor Marketplace
Vast.ai is a peer-to-peer GPU marketplace — hosts list their machines (often gaming rigs, often literally in basements), renters bid on them. The floor on H100 hits $1.60/hr on weekends and A100 80GB regularly dips to $0.74/hr. I've watched 8x 4090 nodes go for $2.40/hr total during a Saturday morning lull in US demand. If absolute cost is the only metric, nothing beats Vast.ai.
The trust model is the whole game on Vast.ai. Every host has a reliability score (a 0-1 fraction, shown as a percentage in the UI), a datacenter/residential flag, and verified-by-Vast indicators. For any serious run I filter to reliability > 0.98, datacenter-only, verified hosts, and internet bandwidth ≥ 1 Gbps. That still leaves 40-60% of the catalog. The remaining pool gives you hyperscaler-adjacent pricing without the hyperscaler margin.
```bash
# Vast.ai CLI - search for verified H100 at under $2/hr
vastai search offers 'gpu_name=H100 num_gpus=1 dph_total<2 reliability2>0.98 verified=true' \
  --order 'dph_total asc' \
  --limit 10

# Rent and launch with a Docker image
vastai create instance OFFER_ID \
  --image pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel \
  --disk 50 --onstart-cmd "pip install vllm && python serve.py"
```
The wins I've had on Vast.ai: a 14B Qwen fine-tune on 2x A100 80GB for $43 total (19 hours of weekend compute). The losses: an SDXL LoRA run that stalled three times in 8 hours because the host's cable internet dropped. If your job is checkpointed every 30 minutes, those drops are nuisances. Otherwise they're disasters.
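Whether 30-minute checkpoints turn a host's drop rate into a nuisance or a disaster is just arithmetic. A rough expected-loss model, assuming drops land uniformly inside a checkpoint interval (a simplification, not a Vast.ai metric):

```python
def expected_lost_hours(run_hours: float, interrupts_per_day: float,
                        checkpoint_interval_min: float) -> float:
    """Expected compute lost to interruptions: each drop loses,
    on average, half a checkpoint interval of work."""
    n_interrupts = run_hours / 24 * interrupts_per_day
    return n_interrupts * (checkpoint_interval_min / 60) / 2

# 8-hour SDXL run on a flaky host (3 drops in 8 hours), 30-min checkpoints
print(expected_lost_hours(8, 9, 30))  # 0.75 GPU-hours lost on average
```

At $1.60/hr spot pricing, that expected loss is a dollar or so of wasted compute, which is why checkpointed jobs tolerate flaky hosts and unckeckpointed ones don't.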
Where Vast.ai falls apart: reliability variance and data ingress. Residential hosts have no upstream SLA, so multi-GB dataset pulls can take all night. There's no formal support — if your instance dies at 3am, open a new one and hope. Don't use Vast.ai for time-sensitive work or sensitive data (host hardware is not audited, and host-level malware has been observed in the wild on low-score machines). For regulated workloads, stay on Lambda or a hyperscaler.
Together AI: The No-GPU Serverless Play
Together AI is a different animal — you don't rent a GPU, you pay per token for inference on 200+ open-source models, and fine-tuning is offered as a managed service where Together runs the cluster. Llama 3.1 405B Instruct costs $3.50/M input + $3.50/M output tokens. Qwen 3.5 72B runs $0.88/M tokens. DeepSeek-V3 and Mixtral 8x22B both sit under $0.60/M.
For open-model economics without managing infrastructure, Together competes directly with GPT and Claude API pricing. Together's Llama 70B at $0.88/M combined is roughly 7x cheaper than GPT-5.4 ($6/M) for comparable non-reasoning quality. The tradeoff is availability — Together's queue stretches into seconds on Llama 405B at peak, while Claude and OpenAI maintain sub-100ms TTFT.
```python
# Together AI - inference against Llama 3.1 70B
from together import Together

client = Together(api_key="...")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Explain NCCL all-reduce"}],
    max_tokens=512,
)
print(response.choices[0].message.content)

# Fine-tuning - managed, no GPU provisioning needed
client.fine_tuning.create(
    training_file="file-abc123",
    model="meta-llama/Llama-3.1-8B-Instruct",
    n_epochs=3,
    learning_rate=2e-5,
)
```
Where Together falls apart: the moment you need custom CUDA, a non-standard framework, or a model outside their catalog. Together is inference-as-a-service — the abstraction is deliberate. You can't inject a custom tokenizer or do the low-level ML model deployment work that's trivial on RunPod. Managed fine-tuning costs roughly 2x what the same compute would on RunPod self-managed ($5.80/M vs ~$2.50/M training tokens). You're paying for the abstraction.
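That 2x abstraction premium is easy to price per job. A sketch using the per-token rates quoted above; the 50M-token dataset and 3 epochs are illustrative numbers, not from any real run:

```python
def finetune_cost(train_tokens_millions: float, price_per_m: float,
                  epochs: int = 3) -> float:
    """Total fine-tune cost: training tokens seen (dataset x epochs) x rate."""
    return train_tokens_millions * epochs * price_per_m

tokens_m = 50  # hypothetical 50M-token dataset
together = finetune_cost(tokens_m, 5.80)  # managed, ~$5.80/M training tokens
runpod = finetune_cost(tokens_m, 2.50)    # self-managed compute, ~$2.50/M
print(f"${together:.0f} managed vs ${runpod:.0f} self-managed "
      f"-> {together / runpod:.2f}x premium")
```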
Which Cloud GPU Provider Should You Pick: The Decision Matrix
- Pick RunPod if: You're solo or a small team doing research, prototyping, and inference serving under 500 concurrent users. Template ecosystem and 30-second spin-up make this the lowest-friction option for 80% of real workloads.
- Pick Lambda Labs if: You're running multi-node training, need InfiniBand, or are committing to 3+ months of reserved capacity. This is the serious-operator pick — treat Lambda Reservations the way you'd treat an AWS Savings Plan.
- Pick Paperspace Gradient if: You're in education, running student cohorts, or already deep in DigitalOcean for non-GPU infrastructure. Don't pick it for long training runs — the notebook-idle timeouts and price premium don't make sense.
- Pick Vast.ai if: Absolute cost is priority one, the workload is checkpointed and interruptible, and you can tolerate residential-grade network reliability. Perfect for weekend experiments and price-sensitive research.
- Pick Together AI if: You want open-model inference without provisioning GPUs, your workload is stateless token generation, and you can live inside their model catalog. Best fit when the choice is "Together vs. OpenAI," not "Together vs. RunPod."
- Pick a hyperscaler (AWS, GCP, Azure) if: You're enterprise with negotiated discounts, need the same-provider IAM/networking story, or are running regulated workloads that mandate a Tier-1 cloud. Everyone else is overpaying.
Storage, Egress, and the Hidden Costs
Hourly GPU price is the headline; the real bill includes persistent storage and data egress. RunPod Network Volumes cost $0.07/GB/month (500 GB = $35/month parked). Lambda charges $0.10/GB/month for persistent filesystems. Vast.ai storage is effectively free while the instance is rented but lost on release. Paperspace is the priciest at $0.29/GB/month for Gradient storage.
Egress is where cloud GPU bills surprise people. RunPod and Lambda don't charge egress at all — a deliberate differentiator versus AWS. Paperspace inherits DO's egress (1 TB/month free, then $0.01/GB). Moving 2 TB of training checkpoints out of an S3 bucket costs roughly $180 in egress at AWS's ~$0.09/GB, often 10-15% of the compute cost of the run itself. Keep your dataset and checkpoint storage on the GPU provider, not on a hyperscaler.
```mermaid
flowchart LR
    A[Dataset on S3] -->|$180 egress| B[Lambda H100]
    B --> C[Checkpoint to S3]
    C -->|$180 egress next run| B
    D[Dataset on RunPod Volume] -->|$0 egress| E[RunPod H100 pod]
    E --> F[Checkpoint to RunPod Volume]
    F --> E
```
Keeping dataset + checkpoint storage inside the GPU provider avoids hyperscaler egress fees — a recurring 5-15% of total training cost for teams that don't notice.
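The storage and egress numbers in this section reduce to two one-liners. A sketch, taking AWS egress at $0.09/GB (a typical hyperscaler rate; actual rates vary by region and tier):

```python
def monthly_storage_cost(gb: float, rate_per_gb_month: float) -> float:
    return gb * rate_per_gb_month

def egress_cost(gb: float, rate_per_gb: float = 0.09) -> float:
    return gb * rate_per_gb

print(f"${monthly_storage_cost(500, 0.07):.2f}/month")  # 500 GB parked on RunPod
print(f"${egress_cost(2048):.2f}")                      # pulling 2 TB out of S3
```

Running these against your own checkpoint sizes before a quarter-long run is how the 5-15% hidden-cost line stays visible instead of surprising you on the invoice.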
Spin-Up Time, Queue Depth, and Operational Reality
The operational friction between providers is bigger than the price difference for most teams. Measured wall-clock time from "request pod" to "ssh -> CUDA available" across the five providers in April 2026:
| Provider | Best Case | Typical | Worst Case (peak) | What fails |
|---|---|---|---|---|
| RunPod Community | 22s | 48s | Unavailable (10+ min retry loop) | Capacity exhaust on H100 during US evenings |
| RunPod Secure | 38s | 90s | 5-10 min | Template pull bandwidth, not capacity |
| Lambda On-Demand | 2 min | 4 min | Queued indefinitely (no capacity) | H100 capacity during conference season |
| Paperspace Core | 3 min | 6 min | 12+ min | DO region capacity, notebook init |
| Vast.ai | 45s | 2 min | Variable (host-dependent) | Host connectivity, Docker image pull over home internet |
| Together AI | <1s | <1s (API call) | 3-8s queue during peak | Model warmup when your model isn't already loaded |
For interactive development, RunPod wins on latency. For long reserved training, Lambda's slow spin-up doesn't matter — provision once, run for weeks. For inference without the headache, Together AI's sub-second API means you never think about GPUs at all. NVIDIA Pyxis helps with cluster hygiene but doesn't close the gap between a marketplace and a hyperscaler-grade provider.
Frequently Asked Questions
What is the cheapest cloud GPU provider for AI training?
Vast.ai has the lowest absolute hourly price — H100 spot can hit $1.60/hr and A100 80GB drops to $0.74/hr on weekends. But that's residential-class reliability. For the cheapest reliable option, RunPod Community Cloud at $1.99/hr for H100 and $1.19/hr for A100 is the defensible answer. For longer commits, Lambda Reservations at $1.85/hr H100 (one-year commit) beats both on total cost-of-training.
Is RunPod better than Lambda Labs?
For solo builders and prototyping, yes — RunPod's 30-second spin-up, template ecosystem, and per-second billing beat Lambda on operational friction. For serious multi-node training with InfiniBand, Lambda wins because RunPod doesn't formally support NCCL-optimized multi-node networking at the same scale. The practical answer: RunPod for dev and inference, Lambda for production training. Many teams run both.
How much does it cost to train a small LLM on cloud GPUs?
A full fine-tune of a 7B model (say Llama 3.1 8B) on 2 billion tokens takes roughly 150 H100-hours. At RunPod Community's $1.99/hr that's $299, on Lambda Reservations $278, on Vast.ai spot around $260. Add $20-40 for storage and data staging. For a 70B LoRA fine-tune, expect 400-800 H100-hours, or $800-$1600 on RunPod. Full-rank 70B training runs into four-figure or five-figure territory.
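Those estimates are GPU-hours times hourly rate plus a small staging buffer. A sketch; the 150 H100-hour figure is the rough fine-tune estimate above, and the $30 buffer midpoints the $20-40 staging range:

```python
def training_cost(gpu_hours: float, rate_per_hr: float, staging: float = 30.0) -> float:
    """Compute cost plus a flat storage/data-staging buffer."""
    return gpu_hours * rate_per_hr + staging

# Full fine-tune of a 7B-class model: ~150 H100-hours
for provider, rate in [("RunPod Community", 1.99),
                       ("Lambda Reserved", 1.85),
                       ("Vast.ai spot", 1.73)]:
    print(f"{provider}: ${training_cost(150, rate):.0f}")
```

Swap in 400-800 GPU-hours to reproduce the 70B LoRA range quoted above.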
Is Vast.ai safe to use?
Vast.ai is safe for non-sensitive workloads when you filter to datacenter-class, verified, reliability-above-0.98 hosts. It's not appropriate for regulated data (HIPAA, PCI, GDPR-sensitive), production model weights with commercial value, or anything where host-level compromise is unacceptable. For research, open datasets, and personal experiments, the savings are real and the risk is manageable with proper filtering. Always use encrypted volumes and delete persistent storage when the rental ends.
What is the difference between RunPod Secure Cloud and Community Cloud?
Secure Cloud runs in tier-3 data centers with a 99.99% uptime SLA, redundant networking, and compliance attestations (SOC 2 Type II). Community Cloud is operated by peer hosts with no SLA — typically hosting providers running their own H100s in smaller facilities. Community is 20-40% cheaper. Use Secure Cloud for production inference and compliance-adjacent work. Use Community Cloud for development, experiments, and cost-sensitive inference workloads where a 30-minute outage is tolerable.
Can I fine-tune a model on Together AI?
Yes — Together offers managed fine-tuning across most models in their catalog (Llama 3.1, Qwen, Mixtral, DeepSeek). You upload training data, Together runs the cluster, and you get a deployable endpoint. Pricing is per training token — roughly $5.80/M tokens for Llama 70B as of April 2026. That's about 2x what the same compute would cost self-managed on RunPod, so you're paying for the abstraction. Worthwhile if you don't want to touch PyTorch or NCCL.
Do cloud GPU providers charge for bandwidth or data egress?
RunPod and Lambda Labs don't charge egress — this is a deliberate differentiator from hyperscalers. Paperspace inherits DigitalOcean's egress (1 TB/month free, then $0.01/GB). Vast.ai bandwidth varies per host; most datacenter hosts offer unlimited, residential hosts often cap at a few TB/month. Together AI is token-based so bandwidth doesn't apply. Hyperscalers (AWS, GCP, Azure) charge $0.05-$0.12/GB egress, which adds up fast — moving 2 TB of checkpoints from S3 costs roughly $180.
The Right Cloud GPU Provider Is the One Your Workload Actually Fits
The best cloud GPU providers for AI training in 2026 aren't ranked in a single order because the workloads aren't a single shape. RunPod beats everyone on developer ergonomics and inference serving. Lambda Labs wins on multi-node training reliability. Paperspace owns managed-notebook UX. Vast.ai wins on raw price when you tolerate reliability variance. Together AI wins when the point is not managing a GPU at all. Start on RunPod Community Cloud this week — a $5 deposit teaches you more about your workload than any benchmark. Escalate to Lambda Reservations when you hit the multi-node ceiling, and keep Together AI in your back pocket for inference paths that don't need a dedicated cluster.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.
Related Articles
Self-Hosting LLMs from India: Providers, Latency & INR Pricing (2026)
A practical comparison of self-hosting LLMs on Indian GPU clouds including E2E Networks, Tata TIR, and Yotta Shakti Cloud, with INR pricing inclusive of 18% GST, latency tests from Mumbai, Bangalore, Chennai, and Delhi, and DPDP Act 2023 compliance notes.
15 min read
AI/ML Engineering
Qwen 3 vs Qwen 3.5: What Changed & Should You Upgrade
Qwen 3.5 wins on long context, code, and agentic math (AIME +25.8 at 72B) — but the 72B license shifted from Apache 2.0 to a community license and LoRA adapters do not port. Full architecture, benchmark, and migration breakdown.
15 min read
AI/ML Engineering
Qwen 3.5 VRAM Requirements: Every Model Size & Quantization
Full VRAM matrix for every Qwen 3.5 model from 0.5B to 397B across 8 quantization levels. GPU tier picks, CPU/RAM fallback, llama.cpp and vLLM launch flags.
16 min read