AI/ML Engineering

Practical AI and machine learning engineering. LLM inference, tokenization, RAG pipelines, model deployment, vector databases, and the infrastructure behind modern AI applications.

54 articles

Eval-Driven Development for LLM Apps: A Practical Workflow

TDD doesn't work on non-deterministic LLM outputs. Eval-driven development is the analog: paired prompt-expectation sets, scored by rules or an LLM judge, run as a regression suite on every change. Covers tools (promptfoo, Braintrust, OpenAI Evals), a real customer-support example, and what evals catch that code review misses.

12 min read

Can You Run LLMs Without GPU? CPU Benchmarks & Reality Check

A deep dive into running large language models on CPUs. Includes performance benchmarks, limitations, and optimization strategies.

10 min read

AI Agent Frameworks Compared: LangGraph vs CrewAI vs AutoGen (2026)

A practitioner comparison of LangGraph, CrewAI, and AutoGen: benchmarks on research, code-generation, and data-analysis agents, with code examples, token-efficiency figures, and production guidance.

14 min read

What Is MCP (Model Context Protocol)? A Developer's Guide

A practical guide to the Model Context Protocol (MCP), Anthropic's open standard for connecting AI models to external tools and data. Covers architecture, TypeScript server examples, and security best practices.

14 min read

Best GPU for Running LLMs Locally (2026): VRAM, Performance & Value Guide

VRAM requirements, tokens/sec benchmarks, and cost-per-token analysis for every GPU worth considering in 2026. Covers RTX 4060 through H100, multi-GPU setups, Apple Silicon, and budget tiers from $260 to $22,000.

16 min read

MiniMax M2.7 Self-Evolving Agents: What "Self-Evolving" Actually Means

The marketing says self-evolving; the engineering says adapter tuning, adaptive tool selection, and memory consolidation on a static base model. An honest breakdown of what M2.7 ships, when it wins, and when it doesn't.

10 min read

LLM API Pricing Compared (2026): OpenAI vs Anthropic vs Google vs Open Source

Per-token pricing, caching credits, batch discounts, and hidden costs across OpenAI, Anthropic, Google, and open-source LLM providers. Includes four real workload simulations and cost optimization strategies.

13 min read

Gemini 3.1 Pro for Developers: When It Beats Opus 4.7

Gemini 3.1 Pro tops the LM Council April 2026 leaderboard on GPQA Diamond and ARC-AGI-2 at 50% lower cost, but Opus 4.7 still leads on coding. An honest, task-by-task decision guide.

10 min read

Kimi K2.6 for Coding: The Cost-Performance Sweet Spot

Moonshot's Kimi K2.6 hits ~74% on SWE-Bench Pro at $0.30 per typical run, 17-25x cheaper than Opus 4.7. Covers real benchmarks, where it falls short, and the two-tier routing pattern teams use in production.

9 min read