AI/ML Engineering

Practical AI and machine learning engineering. LLM inference, tokenization, RAG pipelines, model deployment, vector databases, and the infrastructure behind modern AI applications.

6 articles

AI Observability: How to Monitor and Debug LLM Applications

A practical guide to monitoring LLM applications in production -- input/output logging, cost tracking, quality metrics, and a comparison of LangSmith, Langfuse, and Arize.

10 min read

Deploying ML Models in Production: From Notebook to Kubernetes

End-to-end guide to deploying ML models -- from ONNX export and FastAPI serving to Kubernetes GPU workloads, canary deployments, and Prometheus monitoring.

9 min read

Fine-Tuning vs Prompt Engineering: Choosing the Right Approach

A practical guide to choosing between prompt engineering and fine-tuning for LLMs -- techniques, costs, LoRA/QLoRA, and a decision framework for production systems.

10 min read

Vector Databases: What They Are, How They Work, and When You Need One

A practical guide to vector databases -- how embeddings and ANN algorithms work, and an honest comparison of Pinecone, Weaviate, Qdrant, and pgvector.

9 min read

RAG Explained: Building AI Applications That Know Your Data

A practical guide to building Retrieval-Augmented Generation pipelines -- from document chunking and embedding to hybrid retrieval and evaluation metrics.

9 min read

How LLM Inference Works: Tokens, Context Windows, and KV Cache

Language models process tokens, not words. Learn how BPE tokenization works, what the context window really is, and how the KV cache speeds up generation -- with real pricing comparisons across OpenAI, Anthropic, and Google.

12 min read

