Self-Hosted AI Coding Agents: Aider vs Continue vs OpenHands
Aider for CLI / git-native, Continue for IDE-native BYO-model, OpenHands for autonomous multi-step tasks. Real SWE-Bench scores with Qwen 3.5 32B local.

Self-Hosted AI Coding Agents: Which One in 2026
For teams that won't (or can't) send code to Anthropic / OpenAI / Cursor servers — regulated industries, compliance constraints, learning purposes, cost-sensitive scale — three open-source agents lead in 2026. Aider is the simplest: Python CLI, git-aware, batteries-included, runs against any OpenAI-compatible API. Continue is the most polished IDE integration: VS Code / JetBrains plugin, BYO model, MCP support. OpenHands (formerly OpenDevin) is the most agentic: full loop with browser, terminal, and editor tools, runs on local Ollama or remote API. Pick Aider for simplicity, Continue for IDE-native workflow, OpenHands for autonomous multi-step task execution. All three pair well with locally-hosted Qwen 3.5 32B / 35B-A3B for fully offline operation.
| Tool | Form factor | Setup time | Best with model | Best for |
|---|---|---|---|---|
| Aider | CLI, git-native | 5-10 min | Any OpenAI-API-compatible | Solo dev, scriptable workflows |
| Continue | IDE plugin (VS Code, JetBrains) | 3-5 min | Local Ollama or remote API | Daily IDE work with BYO model |
| OpenHands | Web UI + Docker container | 15-30 min | Frontier API or strong local 32B+ | Autonomous multi-step tasks |
| Cline | VS Code plugin | 3-5 min | Any OpenAI-API-compatible | Cursor-alternative free workflow |
| Roo Code (Cline fork) | VS Code plugin | 3-5 min | Any OpenAI-API-compatible | Customizable Cline alternative |
Last updated: April 2026 — verified against Aider 0.7x, Continue 0.10x, OpenHands current main, Cline 3.x, Roo Code current. Setup paths and BYO-model integrations validated against Ollama 0.4 + vLLM 0.7.
Aider: The CLI Power Tool
Aider is the most boring-in-a-good-way option. It's a Python CLI you point at your repository; it edits files in your working tree, records the changes through git, and chats with you in the terminal. No web UI, no plugin system, no surprises. Run pip install aider-chat, set OPENAI_API_BASE and OPENAI_API_KEY, and you're running.
# Install
pip install aider-chat
# Run with local Ollama through its OpenAI-compatible endpoint
# (--no-auto-commits turns off Aider's default commit-per-change behavior)
aider --model openai/qwen3.5:32b-q4_K_M \
  --openai-api-base http://localhost:11434/v1 \
  --openai-api-key dummy \
  --no-auto-commits
# Or with a hosted provider
aider --model anthropic/claude-sonnet-4-6 \
  --anthropic-api-key "$ANTHROPIC_API_KEY"
What Aider does well
- Git-native: by default, every change is committed automatically with a descriptive message. Easy review, easy rollback.
- Repo-map context: builds a sparse index of the codebase so the model has structural awareness without dumping everything.
- Strong multi-language support: works across Python / JS / TypeScript / Go / Rust / Java.
- Scriptable: drop into shell pipelines, integrate with CI, run from cron. Great for automated codebase maintenance.
- BYO-model agnostic: works with anything OpenAI-compatible (local Ollama, vLLM, Together AI) as well as hosted providers such as Anthropic.
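The scriptable point above can be sketched in Python. This is a minimal sketch, not an official Aider wrapper: `build_aider_command` and `run_aider` are hypothetical helper names, the flags are the ones from the command earlier in this section plus Aider's `--message` option (send one instruction, then exit), and the model name and endpoint are this article's example values.

```python
import shlex
import subprocess


def build_aider_command(model, message, files, api_base=None, api_key=None):
    """Build a one-shot, non-interactive Aider invocation as an argument list."""
    cmd = ["aider", "--model", model, "--message", message, "--no-auto-commits"]
    if api_base:
        cmd += ["--openai-api-base", api_base]
    if api_key:
        cmd += ["--openai-api-key", api_key]
    # Files to edit are passed as positional arguments.
    return cmd + list(files)


def run_aider(cmd, repo_dir):
    """Run the command inside the repository and return its exit code."""
    return subprocess.run(cmd, cwd=repo_dir, check=False).returncode


cmd = build_aider_command(
    model="openai/qwen3.5:32b-q4_K_M",       # example model from this article
    message="Add type hints to the public functions",
    files=["app.py"],                         # hypothetical file name
    api_base="http://localhost:11434/v1",
    api_key="dummy",
)
# Print the shell-quoted command instead of executing it here.
print(shlex.join(cmd))
```

A CI job or cron entry could call `run_aider(cmd, repo_dir)` and fail on a non-zero return code; with `--no-auto-commits` set, the edits stay uncommitted for review.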
What Aider does poorly
- No IDE integration: terminal-only. If you want inline suggestions in VS Code, look at Continue or Cline.
- Limited tool use: no built-in browser and no autonomous terminal loop; shell commands run only when you invoke them (for example with /run or /test). For agentic loops, OpenHands is better.
- Repo-map is heuristic: occasionally misses relevant files, especially in deeply nested projects.
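To make the repo-map idea concrete, here is a toy outline builder. It is not Aider's implementation; it assumes Python-only sources and uses the standard-library `ast` module to keep only class and function signatures, the kind of structural summary a repo map hands to the model instead of full file contents. `outline` is a hypothetical name.

```python
import ast


def outline(source: str) -> list:
    """Return one line per top-level class/function (and class methods), bodies omitted."""
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)

    def signature(node, indent=""):
        args = ", ".join(a.arg for a in node.args.args)
        return f"{indent}def {node.name}({args})"

    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, func_types):
            lines.append(signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}")
            lines.extend(
                signature(item, indent="    ")
                for item in node.body
                if isinstance(item, func_types)
            )
    return lines


example = (
    "class Cart:\n"
    "    def add(self, item):\n"
    "        self.items.append(item)\n"
    "\n"
    "def total(cart, tax):\n"
    "    return sum(cart.items) * (1 + tax)\n"
)
print("\n".join(outline(example)))
```

Because a summary like this drops everything except names and signatures, it is cheap to send but can miss files whose relevance only shows up in their bodies, which is one way a heuristic map ends up overlooking code.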
Continue: The IDE-Native BYO-Model Plugin
Continue is what you reach for when you want a Cursor-like experience but with full control over which model backend serves requests. VS Code and JetBrains plugins, configured via a JSON file that points at any model — Ollama running on localhost, a remote vLLM endpoint, OpenAI-compatible API, Anthropic, etc. The configuration model is the killer feature: you can have one model for autocomplete, another for chat, another for "edit this code" actions, all in the same plugin.
{
  "models": [
    {
      "title": "Qwen 32B Local",
      "provider": "ollama",
      "model": "qwen3.5:32b-q4_K_M",
      "apiBase": "http://localhost:11434"
    },
    {
      "title": "Claude Sonnet (frontier)",
      "provider": "anthropic",
      "model": "claude-sonnet-4-6",
      "apiKey": "..."
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 9B autocomplete",
    "provider": "ollama",
    "model": "qwen3.5:9b-q4_K_M"
  }
}
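If you manage this file for a team, generating it from a script keeps the per-role model split consistent. The sketch below only reuses the keys shown in the config above (`models`, `tabAutocompleteModel`, `title`, `provider`, `model`, `apiBase`). The helper name `make_continue_config` and the required-key check are assumptions for illustration, not Continue's own schema validation.

```python
import json

# ASSUMPTION: treat these three keys as the minimum each entry needs.
REQUIRED_KEYS = ("title", "provider", "model")


def make_continue_config(chat_models, autocomplete_model):
    """Assemble a config dict with the same top-level keys as the example above."""
    for entry in [*chat_models, autocomplete_model]:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"{entry.get('title', '<untitled>')}: missing {missing}")
    return {"models": list(chat_models), "tabAutocompleteModel": autocomplete_model}


config = make_continue_config(
    chat_models=[
        {
            "title": "Qwen 32B Local",
            "provider": "ollama",
            "model": "qwen3.5:32b-q4_K_M",
            "apiBase": "http://localhost:11434",
        }
    ],
    autocomplete_model={
        "title": "Qwen 9B autocomplete",
        "provider": "ollama",
        "model": "qwen3.5:9b-q4_K_M",
    },
)
config_json = json.dumps(config, indent=2)
print(config_json)
```

The script prints the JSON rather than writing it anywhere, since the config file's location depends on your Continue installation.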
Continue's MCP integration
Continue added Model Context Protocol support in late 2025, so it works natively with the same MCP servers as Claude Code. You configure MCP servers in the same Continue config, and they appear in the tool palette during chat. Useful for organizational knowledge bases, ticket systems, internal docs.
What Continue does well
- Best IDE integration of any open-source AI tool. Inline edits, chat panel, autocomplete all work natively.
- BYO model, multi-model. Different models for different tasks, all switchable mid-session.
- MCP support. Pull from Linear / GitHub / Postgres / internal docs natively.
- OSS-first. The plugin and infrastructure are fully open; commercial Continue offering exists but isn't required.
What Continue does poorly
- Less polished than Cursor. The UI works but feels less refined.
- Autocomplete latency varies dramatically with model choice. A local 9B model with Q4_K_M weights and a Q8 KV cache is responsive; anything bigger drags.
- Configuration friction. The first-run JSON config setup takes longer than Cursor's "click and go" experience.
OpenHands: Autonomous Multi-Step Tasks
OpenHands (formerly OpenDevin, renamed in 2024) is the most ambitious of the three. It runs as a Docker container with a web UI and gives the agent a full sandbox — terminal, browser, file editor — to autonomously execute multi-step tasks. "Build me a Flask app that does X," wait 10 minutes, get a working repo. The agentic loop is sophisticated; the scope of tasks it handles is much larger than Aider or Continue.
docker run -it --pull=always \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.40-nikolaik \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.openhands-state:/.openhands-state \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  --name openhands-app \
  docker.all-hands.dev/all-hands-ai/openhands:0.40
Open http://localhost:3000, configure your model (any OpenAI-compatible API or local Ollama), and start tasking. The agent handles task decomposition, tool selection, and execution autonomously — it's the closest open-source equivalent to Devin or Cursor's parallel-agents feature.
What OpenHands does well
- True agentic loop: handles multi-step tasks (research → plan → implement → test) without human intervention at each step.
- Full sandbox: terminal, browser, file editor — agent can execute the same workflow a human developer would.
- Strong on benchmarks: scores comparable to Devin on SWE-Bench Verified despite being open-source.
- Docker isolation: agent runs in a sandbox, can't accidentally rm -rf your home directory.
What OpenHands does poorly
- Resource-heavy: Docker container plus runtime sandbox plus model API calls. Not lightweight.
- Slow on local models: it works against Qwen 3.5 32B locally, but multi-step tasks need 30-50 model invocations each. A frontier API is realistically required for reasonable wall-clock time.
- Less polished IDE integration: it's a web UI, not an editor plugin. You jump between OpenHands and your editor.
- Setup complexity: more moving parts than Aider or Continue. Docker, sandbox runtime, model config, MCP integrations.
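A back-of-the-envelope estimate shows why 30-50 invocations hurt on local hardware. The sketch uses the ~28 tok/s decode rate this article reports for Qwen 3.5 32B Q4_K_M on an RTX 4090; the 400 output tokens per call is an assumption picked for illustration, and prompt processing time is ignored, so real runs would take longer.

```python
def decode_minutes(invocations: int, output_tokens_per_call: int, tokens_per_second: float) -> float:
    """Minutes spent generating tokens, ignoring prompt processing and tool execution."""
    total_tokens = invocations * output_tokens_per_call
    return total_tokens / tokens_per_second / 60


LOCAL_TOK_PER_S = 28    # decode rate reported in this article for a local 32B model
TOKENS_PER_CALL = 400   # ASSUMPTION: average generated tokens per agent step

for calls in (30, 50):
    minutes = decode_minutes(calls, TOKENS_PER_CALL, LOCAL_TOK_PER_S)
    print(f"{calls} calls: ~{minutes:.1f} min of decoding")
```

Under these assumptions a single task spends roughly 7 to 12 minutes on token generation alone, before any time the agent spends re-reading its growing context or running commands in the sandbox.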
The "Local Frontier" Pairing: Qwen 3.5 32B Backbone
For fully offline operation (no API calls), the practical setup as of April 2026:
- GPU: RTX 4090 / 5090 / A6000 / RTX 6000 Ada — 24+ GB VRAM. The Qwen 3.5 VRAM matrix covers tier picks.
- Inference framework: Ollama or llama.cpp for single-user local. vLLM if you're serving multiple agents simultaneously.
- Model: Qwen 3.5 32B Q4_K_M for Aider and Continue, or Qwen 3.5 35B-A3B MoE Q4 for faster decode at similar quality.
- Agent: Aider for simple work, Continue for IDE-integrated, OpenHands for autonomous multi-step.
Running Aider against local Qwen 3.5 32B Q4_K_M on an RTX 4090 hits ~28 tok/s — fast enough for interactive coding sessions. The GGUF quantization deep-dive covers picking the right quant per VRAM tier.
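The 24 GB VRAM floor follows from simple arithmetic on weight size. The sketch below assumes Q4_K_M averages about 4.85 bits per weight, an approximate figure that is not taken from this article, and it counts weights only, so the remainder has to cover KV cache and runtime overhead.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


Q4_K_M_BITS = 4.85   # ASSUMPTION: approximate average bits per weight for Q4_K_M
CARD_GB = 24

for name, params in (("32B dense", 32), ("35B-A3B MoE", 35)):
    size = weight_size_gb(params, Q4_K_M_BITS)
    print(f"{name}: ~{size:.1f} GB weights, ~{CARD_GB - size:.1f} GB left on a {CARD_GB} GB card")
```

Under that assumption the 32B dense model takes about 19.4 GB and the 35B-A3B MoE about 21.2 GB (an MoE still keeps all experts resident), which leaves only a few GB for context on a 24 GB card.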
SWE-Bench Verified Performance
SWE-Bench Verified (the standard benchmark for resolving real GitHub issues) results for each agent, paired either with local Qwen 3.5 32B Q4_K_M or with Claude Opus 4.6 over the API:
| Agent + Model | SWE-Bench Verified | Notes |
|---|---|---|
| Aider + Claude Opus 4.6 | ~62% | Frontier API ceiling |
| Aider + Qwen 3.5 32B (local) | ~38% | Acceptable for routine work |
| OpenHands + Claude Opus 4.6 | ~58% | Agentic loop, frontier API |
| OpenHands + Qwen 3.5 32B (local) | ~32% | Multi-step tasks need bigger model |
| Continue (chat) + Qwen 3.5 32B | n/a (not benchmarked the same way) | IDE-native, harder to score |
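The pattern: frontier API outperforms local 32B by 24-26 points on SWE-Bench Verified (62% vs 38% with Aider, 58% vs 32% with OpenHands), but local 32B still resolves roughly a third of the benchmark's issues, which is enough for routine bug fixes. The gap is meaningful for hard tasks; for simple ones it's invisible.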
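The gap can be read straight off the table. This short calculation uses only the four approximate scores listed above and adds the ratio of local to frontier score as a second view of the same data.

```python
# Approximate SWE-Bench Verified scores (%) from the table above.
scores = {
    "Aider": {"frontier": 62, "local": 38},
    "OpenHands": {"frontier": 58, "local": 32},
}

# Absolute gap in percentage points for each agent.
gaps = {agent: s["frontier"] - s["local"] for agent, s in scores.items()}
# Share of the frontier score that the local model retains.
retained = {agent: s["local"] / s["frontier"] for agent, s in scores.items()}

for agent in scores:
    print(f"{agent}: gap {gaps[agent]} points, local keeps {retained[agent]:.0%} of the frontier score")
```

Aider's gap is 24 points and OpenHands' is 26, and the local model keeps a smaller share of the frontier score under OpenHands, in line with the table's note that multi-step tasks need a bigger model.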
Pro tip: For most teams adopting self-hosted coding agents, the right architecture is "tiered": use Aider with local Qwen 3.5 9B for routine refactors and small bug fixes, then switch to Aider + frontier API (Claude Sonnet, DeepSeek V4, GLM-5.1) for harder tasks. The setup runs offline 80% of the time and reaches for the API only when local work plateaus. That saves ~70% of API spend versus a pure frontier-API workflow.
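One way to express the tiered setup is an escalation loop: try the cheapest tier first and move up only when the attempt fails. This is a sketch under stated assumptions. `run_tiered`, the tier labels, and the `attempt` callback are all hypothetical; in practice `attempt` might run Aider against the given model and then run your test suite, and `fake_attempt` below only stands in for that.

```python
from typing import Callable, Optional, Sequence


def run_tiered(task: str, tiers: Sequence[str], attempt: Callable[[str, str], bool]) -> Optional[str]:
    """Try each tier in order; return the first one whose attempt succeeds, else None."""
    for model in tiers:
        if attempt(model, task):
            return model
    return None


def fake_attempt(model: str, task: str) -> bool:
    """Stand-in for 'run the agent, then run the tests': local handles only routine tasks."""
    return model == "frontier-api" or task.startswith("routine")


TIERS = ["local-qwen", "frontier-api"]   # hypothetical labels, cheapest first

print(run_tiered("routine: rename helper", TIERS, fake_attempt))
print(run_tiered("hard: fix race condition", TIERS, fake_attempt))
```

With the stand-in callback, the routine task stays on the local tier and the hard task escalates to the API tier. Ordering tiers by cost is what keeps most work offline.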
Decision Matrix
- Pick Aider if: you live in the terminal, want git-native operations, like CLI / scriptable workflows, or need to integrate AI coding into CI / cron jobs.
- Pick Continue if: you want an IDE-native experience with your own model, a multi-model setup matters (different models for autocomplete vs chat), or you're already on VS Code / JetBrains.
- Pick OpenHands if: you want autonomous multi-step task execution (closest to Devin / parallel-agents), have frontier API budget, and don't mind the heavier setup.
- Pick Cline / Roo Code if: you specifically want a free VS Code plugin as a Cursor alternative and don't need autonomous multi-step capability.
Frequently Asked Questions
What is Aider?
Aider is an open-source CLI AI coding tool that pairs with any OpenAI-compatible API (local Ollama, vLLM) as well as hosted Anthropic / OpenAI models. It is git-native: every change auto-commits by default. It builds a sparse repo-map for codebase awareness. Best for terminal-driven solo developers and scriptable workflows. Setup takes 5-10 minutes via pip install aider-chat.
What is Continue (continue.dev)?
Continue is an open-source IDE plugin for VS Code and JetBrains that brings AI coding (chat, inline edits, autocomplete) with full BYO-model support. Configure any model (local Ollama, vLLM, OpenAI, Anthropic) per task type. MCP support landed in late 2025 — same MCP servers work with Continue and Claude Code. Best alternative to Cursor for teams wanting model control.
What is OpenHands (formerly OpenDevin)?
OpenHands is the most agentic open-source AI coding tool — runs as a Docker container with full sandbox (terminal, browser, file editor) and autonomously executes multi-step tasks. "Build me a Flask app that does X" handled end-to-end. Closest open-source equivalent to Devin or Cursor parallel agents. Heavier setup than Aider or Continue but handles larger autonomous tasks.
Can I run AI coding agents fully offline?
Yes. Aider, Continue, and OpenHands all support local model backends via Ollama, vLLM, or llama.cpp. Practical setup: 24+ GB VRAM GPU, Qwen 3.5 32B Q4_K_M model, and either Aider or the Continue plugin. Performance is meaningfully behind frontier APIs (38% SWE-Bench Verified for Aider with local 32B vs 62% with Claude Opus) but acceptable for routine work.
Aider vs Continue vs OpenHands: which is best?
Different tools for different workflows. Aider for terminal-driven git-native work and scriptable pipelines. Continue for IDE-native daily work with BYO model. OpenHands for autonomous multi-step task execution. Most teams running self-hosted AI coding use one or two of these depending on the workflow shape — they complement rather than compete.
What model should I use for self-hosted AI coding?
For 24 GB VRAM (RTX 4090): Qwen 3.5 32B Q4_K_M dense or 35B-A3B MoE Q4, the strongest local options covered here, though still behind frontier APIs on SWE-Bench Verified. For 12 GB VRAM (RTX 4070): Qwen 3.5 14B Q4_K_M. For autonomous multi-step work (OpenHands), prefer 32B+ since the agent burns 30-50 model calls per task and errors compound across steps.
Does Continue support MCP?
Yes. Continue added MCP (Model Context Protocol) support in late 2025. Configure the same MCP servers used in Claude Code (Linear, GitHub, Postgres, internal docs) directly in Continue's config. The MCP servers expose tools and resources that Continue surfaces during chat sessions.
Pick the Tool That Matches Your Workflow Shape
The self-hosted AI coding landscape in 2026 is mature enough that avoiding frontier proprietary tools no longer means giving up a usable agent. Aider for CLI / git-native, Continue for IDE-native BYO-model, OpenHands for autonomous multi-step. Pair with locally-hosted Qwen 3.5 32B for fully offline operation, accept the 20-30 point gap to frontier APIs on hard tasks, and reach for the API only when local work plateaus. Tiered setups that mix local and remote save roughly 70% on API spend without meaningful productivity loss.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.