What Is Platform Engineering? And How It Differs from DevOps
Platform engineering builds Internal Developer Platforms (IDPs) -- self-service layers that abstract infrastructure complexity. Learn the five pillars, golden paths, tooling (Backstage, Crossplane, ArgoCD), anti-patterns, and when your org needs a dedicated platform team.
Infrastructure engineer with 10+ years building production systems on AWS, GCP,…

Thursday, 4:47 PM
A new engineer on the payments team opens a Slack ticket in #infra-help: "Need a Postgres 16 database for a new reconciliation service. Staging and prod. Can you set it up?" She started three weeks ago. She has already shipped the service locally. She needs a database. The infra on-call glances at the ticket, adds a "will pick up Monday" reaction, and goes back to debugging a flaky Kubernetes upgrade.
Monday morning a queue of 14 tickets like hers is waiting. Each one needs 15-40 minutes of work: a Terraform module, some IAM glue, a manual secret rotation into Vault, an entry in the monitoring spreadsheet, and a ticket update. By Wednesday the on-call has cleared eight. By Friday the payments engineer pings again -- the reconciliation service was supposed to ship yesterday. She has been sitting on a fully written app for a week, waiting on infrastructure.
Multiply that scene by 20 teams in a 300-person engineering org and you get the single biggest productivity drag I see at growth-stage companies: a small central infra team has become the bottleneck on every deploy, every new environment, and every new service. DevOps promised to break the wall between dev and ops. In practice it left a concrete one with a Jira queue hanging on it.
Platform engineering is the discipline that fixes this. You stop answering tickets and start shipping self-service: an Internal Developer Platform (IDP) that abstracts Kubernetes, Terraform, and cloud IAM behind a handful of paved-road workflows developers can use without asking you. That payments engineer should be able to create a database by filling in a form in Backstage, clicking submit, and having provisioned credentials in her service's environment within 10 minutes. No Slack ticket, no "will pick up Monday", no waiting.
Platform Engineering vs DevOps: What Actually Changed
DevOps and platform engineering are not competing philosophies. Platform engineering is what happens when DevOps principles mature at scale. But the operational model is fundamentally different.
| Dimension | DevOps | Platform Engineering |
|---|---|---|
| Core idea | Break silos between dev and ops | Build a self-service platform for developers |
| Who handles infra? | Every team (shared responsibility) | Platform team builds; dev teams consume |
| Developer experience | "You build it, you run it" | "You build it, we make running it easy" |
| Cognitive load | High -- devs must learn infra tooling | Low -- complexity is abstracted away |
| Scaling model | Embed SREs/DevOps in each team | Central platform team serves all teams |
| Self-service | Varies (often ticket-based in practice) | First-class requirement -- everything is self-serve |
| Standardization | Encouraged but not enforced | Golden paths enforce sensible defaults |
| Works best at | Small to mid-size teams (5-50 engineers) | 50+ engineers, multiple teams/services |
DevOps told every developer to care about infrastructure. The intention was good, but the result at many organizations was that developers became part-time, mediocre infrastructure operators. Platform engineering acknowledges this reality and centralizes infrastructure expertise into a dedicated team that builds tools the rest of the organization consumes.
Watch out: Platform engineering is not "DevOps rebranded." If your platform team is just a renamed ops team that still processes tickets manually, you've changed the name without changing the model. The litmus test: can a developer deploy a new service to production in under 30 minutes without opening a ticket? If not, you don't have a platform -- you have a help desk.
The Five Pillars of an Internal Developer Platform
Every effective IDP has five layers, whether you build it explicitly or it emerges organically. Understanding these layers helps you decide where to invest first.
1. Developer Portal (Service Catalog)
A single pane of glass where developers discover services, view documentation, check ownership, and trigger workflows. Backstage (by Spotify) is the dominant open-source option. Port and Cortex are managed alternatives. The portal is the front door to your platform -- if developers don't use it, the rest of the platform doesn't matter.
2. Self-Service Infrastructure Provisioning
Developers request infrastructure (databases, caches, queues, storage buckets) through templates or APIs, and the platform provisions it automatically. Crossplane, Terraform modules behind a service catalog, and Pulumi Automation API are the common implementation choices. The key is that developers fill in a form or a YAML template -- not write raw Terraform.
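To make "fill in a form, not raw Terraform" concrete, here is a minimal Python sketch of the translation layer that can sit behind such a form: it validates a four-field request and expands it into the variables a shared Terraform module might take. The module itself, the size-to-instance-class mapping, and the defaults are hypothetical; the variable names are borrowed from the `aws_db_instance` resource arguments for readability. The point is that platform-owned defaults (encryption, backups, multi-AZ in prod) live in one place instead of in every team's code.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical size tiers; the instance classes are illustrative only.
SIZE_TIERS = {
    "small": {"instance_class": "db.t4g.small", "storage_gb": 20},
    "medium": {"instance_class": "db.m6g.large", "storage_gb": 100},
    "large": {"instance_class": "db.r6g.xlarge", "storage_gb": 500},
}
NAME_RE = re.compile(r"^[a-z][a-z0-9-]{2,29}$")


@dataclass(frozen=True)
class DatabaseRequest:
    service: str
    team: str
    size: str  # "small" | "medium" | "large"
    env: str   # "staging" | "prod"


def render_tfvars(req: DatabaseRequest) -> dict:
    """Expand a small developer request into the full variable set of a
    (hypothetical) shared Terraform module. Developers never see these defaults."""
    if not NAME_RE.fullmatch(req.service):
        raise ValueError(f"invalid service name: {req.service!r}")
    if req.size not in SIZE_TIERS:
        raise ValueError(f"unknown size {req.size!r}; choose from {sorted(SIZE_TIERS)}")
    if req.env not in ("staging", "prod"):
        raise ValueError(f"unknown env {req.env!r}")
    tier = SIZE_TIERS[req.size]
    return {
        "identifier": f"{req.service}-{req.env}",
        "instance_class": tier["instance_class"],
        "allocated_storage": tier["storage_gb"],
        "engine_version": "16",
        "storage_encrypted": True,          # encryption at rest, always
        "multi_az": req.env == "prod",      # multi-AZ only where it matters
        "backup_retention_period": 14,
        "tags": {"team": req.team, "service": req.service, "managed-by": "platform"},
    }


if __name__ == "__main__":
    req = DatabaseRequest(service="reconciliation", team="payments", size="small", env="prod")
    print(json.dumps(render_tfvars(req), indent=2))
```

A pipeline could write that JSON to a `.tfvars.json` file and pass it to `terraform apply -var-file=...`, so the developer's entire interface is the four fields of `DatabaseRequest`.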
3. Deployment and Delivery
A standardized deployment pipeline that works the same way for every service. ArgoCD and Flux handle GitOps-based continuous delivery to Kubernetes. The platform team owns the pipeline templates; application teams configure what to deploy, not how to deploy it. Push to main, the golden path pipeline handles the rest.
4. Observability and Developer Experience
Built-in logging, metrics, tracing, and alerting that work out of the box for any service deployed through the platform. Developers shouldn't need to configure Prometheus scrape targets, set up Grafana dashboards, or wire up distributed tracing manually. The platform provides sensible defaults, and developers customize when needed.
5. Security and Governance
Policy enforcement, secret management, RBAC, and compliance checks baked into the platform. OPA (Open Policy Agent) or Kyverno enforce guardrails. Vault manages secrets. The platform ensures every service meets security baselines without requiring developers to think about it.
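Guardrails like these are usually written as OPA (Rego) or Kyverno policies. As a language-neutral illustration of what such rules check, here is a plain-Python sketch (not OPA or Kyverno syntax) that inspects a Kubernetes Deployment manifest, loaded as a dict, against four hypothetical baseline rules: an owning `team` label, a pinned image, a memory limit, and no privileged containers.

```python
def check_deployment(manifest: dict) -> list:
    """Return human-readable guardrail violations for a Deployment manifest.
    An empty list means the manifest meets the (hypothetical) baseline."""
    violations = []
    labels = manifest.get("metadata", {}).get("labels", {})
    if "team" not in labels:
        violations.append("metadata.labels.team is required (ownership)")

    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for c in pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        # Inspect only the last path segment so a registry port
        # ("registry.local:5000/app") is not mistaken for a tag.
        last = c.get("image", "").rsplit("/", 1)[-1]
        if "@" not in last and (":" not in last or last.endswith(":latest")):
            violations.append(f"{name}: pin the image to a non-latest tag or a digest")
        if "memory" not in c.get("resources", {}).get("limits", {}):
            violations.append(f"{name}: resources.limits.memory is required")
        if c.get("securityContext", {}).get("privileged", False):
            violations.append(f"{name}: privileged containers are not allowed")
    return violations
```

Returning every violation at once, rather than failing on the first, is the developer-friendly choice: one CI run tells the team everything they need to fix.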
Golden Paths: The Core Design Pattern
A golden path is an opinionated, well-supported workflow for a common task. It's not a constraint -- it's a recommendation backed by tooling, documentation, and automation. Developers can deviate, but the golden path is so much easier that most choose to follow it.
Example golden paths in a mature platform:
- Create a new microservice -- scaffold from a template in Backstage, get a GitHub repo with CI/CD, Dockerfile, Helm chart, and observability pre-configured in 5 minutes.
- Provision a PostgreSQL database -- submit a request through the portal specifying size and region. Crossplane provisions a managed database, creates credentials in Vault, and injects them into the service's environment.
- Deploy to production -- merge to main. ArgoCD syncs the Helm chart to the production cluster. A progressive-delivery controller such as Argo Rollouts or Flagger runs a canary and rolls back automatically on an error-rate spike.
- Add a new API endpoint -- follow the service template's API convention, add the route, push. The platform auto-generates OpenAPI docs and registers the endpoint in the service catalog.
The golden path philosophy is critical: you offer the paved road, not a walled garden. Developers who need to go off-road can, but they take on the maintenance burden. This balance between flexibility and standardization is what separates a good platform from a bureaucratic bottleneck.
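The "automatic rollback on an error-rate spike" step in the production golden path boils down to a comparison like the one below. This is a simplified Python sketch of the decision a canary analysis makes, with hypothetical thresholds (at least 500 canary requests before judging, an absolute 1% error-rate floor, and no more than 2x the stable version's error rate). Real controllers such as Argo Rollouts or Flagger express this as analysis or metric checks against a metrics backend such as Prometheus, not as Python.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request and error counts for one version over the same time window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline: WindowStats, canary: WindowStats,
                   max_ratio: float = 2.0, abs_floor: float = 0.01,
                   min_requests: int = 500) -> str:
    """Return 'wait', 'rollback', or 'promote' for the current analysis window."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge either way
    # Roll back only if the canary is both above an absolute floor and
    # meaningfully worse than the stable version; the floor avoids flapping
    # when the baseline error rate is near zero.
    if canary.error_rate > abs_floor and canary.error_rate > max_ratio * baseline.error_rate:
        return "rollback"
    return "promote"
```

Requiring both conditions is a deliberate choice: a pure ratio test would roll back a canary at 0.2% errors against a 0.05% baseline, which is usually noise.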
Platform Engineering Tool Landscape
| Layer | Open Source | Commercial | Purpose |
|---|---|---|---|
| Developer portal | Backstage | Port, Cortex, OpsLevel | Service catalog, docs, workflows |
| Infrastructure provisioning | Crossplane, Terraform | Humanitec, Massdriver | Self-service infra as code |
| Workload spec | Score | Humanitec Platform Orchestrator | Environment-agnostic workload definition |
| GitOps / CD | ArgoCD, Flux | Codefresh, Harness | Declarative continuous delivery |
| Policy engine | OPA, Kyverno | Styra DAS | Guardrails and compliance |
| Secret management | Vault, Sealed Secrets | Doppler, AWS Secrets Manager | Credential lifecycle |
| Observability | Prometheus, Grafana, Jaeger | Datadog, New Relic | Metrics, logs, traces |
Pro tip: Don't try to build all five layers at once. Start with the developer portal (Backstage) and one golden path (e.g., deploy a new service). Prove value with a single team, then expand. Platform teams that try to boil the ocean end up shipping nothing for 6 months and losing organizational trust.
Platform Engineering Anti-Patterns
I've seen platform initiatives fail more often than they succeed. Here are the patterns that kill them:
The Ivory Tower Platform
The platform team builds what they think developers need without talking to developers. Six months later, they ship a portal nobody asked for, and developers keep using their existing scripts. Fix: treat the platform as a product. Interview your users. Track adoption metrics. Iterate based on feedback, not assumptions.
The Mandatory Migration
Forcing all teams to migrate to the new platform simultaneously. This creates resentment and an avalanche of support requests the platform team can't handle. Fix: make the platform opt-in. Let early adopters prove value. Other teams will follow when they see the benefits.
The Ticket-Queue Platform
Renaming the ops team to "platform team" without changing the workflow. Developers still open tickets for everything. The platform team becomes a bottleneck wearing a new hat. Fix: self-service is non-negotiable. If a common request can't be fulfilled without human intervention, it's not a platform feature -- it's a todo item for the platform team.
Over-Abstraction
Hiding so much infrastructure detail that developers can't debug production issues. When the abstraction leaks (and it always does), developers are helpless because they've never seen what's underneath. Fix: make the abstractions transparent. Show developers what's being provisioned. Let them access raw Kubernetes resources when needed. The golden path should simplify, not obscure.
Platform Maturity by Organization Size
| Org Size | Platform Maturity | What to Focus On |
|---|---|---|
| 10-30 engineers | No dedicated platform needed | Shared CI/CD templates, basic IaC modules, wiki documentation. One senior engineer owns infra part-time. |
| 30-50 engineers | Lightweight platform | Standardize on one deploy pipeline. Templatize common infra (databases, caches). Start a shared Terraform module library. Consider Backstage for service catalog. |
| 50-150 engineers | Dedicated platform team (2-4 engineers) | Build golden paths for service creation and deployment. Self-service database provisioning. Developer portal. Standardized observability. |
| 150-500 engineers | Mature platform team (5-10 engineers) | Full IDP with portal, self-service infra, policy enforcement, multi-cluster support. Platform-as-a-product with roadmap and user research. |
| 500+ engineers | Platform organization | Multiple sub-teams owning different layers (compute, data, security, developer experience). Internal SLAs. Platform SDK and APIs. |
The inflection point is around 50 engineers. Below that, the overhead of building and maintaining a platform exceeds the productivity gains. Above that, the cost of not having a platform -- in ticket wait times, inconsistent deployments, and duplicated infrastructure work -- compounds with every team and service you add.
Getting Started: A 90-Day Platform Roadmap
If you're starting from zero, here's a practical 90-day plan:
- Weeks 1-2: Discovery -- Interview 10 developers. Ask them what wastes their time. Map the current deploy workflow end-to-end. Identify the top 3 pain points.
- Weeks 3-4: Quick wins -- Fix the most painful friction point. Usually it's deploy pipeline standardization or database provisioning. Ship something developers can use immediately.
- Weeks 5-8: Golden path MVP -- Build one golden path end-to-end. "Create a new service and deploy it to staging" is the classic starting point. Use Backstage software templates, ArgoCD, and a Helm chart template.
- Weeks 9-10: Onboard early adopters -- Get 2-3 teams using the golden path. Collect feedback. Fix sharp edges. Measure time-to-deploy before and after.
- Weeks 11-12: Iterate and plan -- Ship improvements based on feedback. Plan the next quarter's roadmap. Present adoption metrics and time savings to leadership.
Key metric: Track "time from git init to first production deploy." If that number drops from 2 weeks to 30 minutes, your platform is working. If it doesn't change, you're building the wrong thing.
A Concrete Golden Path: "Create a New Microservice"
Abstract pillars are easy to agree with; building a real golden path is harder. Here is a representative implementation of the most common first golden path, assembled from tools most platform teams already run.
The Developer's View
The engineer opens Backstage, picks "Create Service (Go)", fills a short form (service name, team, database required?), and clicks Create. Five minutes later she receives a Slack message: "Your repo is at github.com/org/pricing-svc, staging deploy is live at pricing-svc.staging.example.com, dashboards here."
Behind the Curtain: The Backstage Software Template
```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-microservice
  title: Create Go Microservice
  description: New Go service with CI/CD, Helm chart, dashboards, and optional DB
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service details
      required: [name, team, database]
      properties:
        name:
          type: string
          # 3-30 chars: a lowercase letter, then lowercase letters, digits, or hyphens
          pattern: '^[a-z][a-z0-9-]{2,29}$'
        team:
          type: string
          enum: [payments, catalog, search, platform]
        database:
          type: boolean
          title: Provision a Postgres database?
          # Explicit default so an untouched checkbox still satisfies "required"
          default: false
  steps:
    - id: scaffold
      name: Scaffold repo
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          team: ${{ parameters.team }}
    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        repoUrl: github.com?repo=${{ parameters.name }}&owner=org
        defaultBranch: main
        topics: ['service', '${{ parameters.team }}']
    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
    # Runs only when the developer ticked the database box.
    - id: database
      name: Provision database
      if: ${{ parameters.database }}
      # http:backstage:request is provided by Roadie's community HTTP request
      # scaffolder module, which must be installed in the Backstage backend.
      action: http:backstage:request
      input:
        method: POST
        path: /api/crossplane/postgres
        headers:
          content-type: application/json
        body:
          serviceName: ${{ parameters.name }}
          team: ${{ parameters.team }}
```
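The `name` pattern in the template's form accepts 3-30 characters: a lowercase letter followed by 2-29 lowercase letters, digits, or hyphens. A quick Python check of the same regex (this simple pattern behaves identically in Python's `re` and in the JSON Schema validator behind the form) makes the boundaries visible and exposes one sharp edge: the pattern accepts a trailing hyphen, which is not valid in a DNS label such as the `pricing-svc.staging.example.com` hostname this golden path hands out. The stricter variant is a suggestion, not part of the original template.

```python
import re

# Same pattern as the template's `name` parameter.
TEMPLATE_NAME = re.compile(r"^[a-z][a-z0-9-]{2,29}$")
# Suggested stricter variant: same 3-30 length bounds, but must end with a
# letter or digit, as DNS labels require.
DNS_SAFE_NAME = re.compile(r"^[a-z][a-z0-9-]{1,28}[a-z0-9]$")


def is_valid_service_name(name: str) -> bool:
    return TEMPLATE_NAME.fullmatch(name) is not None


def is_dns_safe_service_name(name: str) -> bool:
    return DNS_SAFE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["pricing-svc", "ab", "Pricing", "9lives", "pricing-", "a" * 30, "a" * 31]:
        print(f"{candidate!r:36} template={is_valid_service_name(candidate)} "
              f"dns_safe={is_dns_safe_service_name(candidate)}")
```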
The Crossplane XR Behind the "Provision a Database" Button
```yaml
apiVersion: database.platform.example.com/v1alpha1
kind: PostgresInstance
metadata:
  name: pricing-svc-prod
spec:
  size: small  # small | medium | large => db.t4g.small .. db.r6g.xlarge
  region: us-east-1
  backup:
    retentionDays: 14
  compositionRef:
    name: postgres-aws
  writeConnectionSecretToRef:
    name: pricing-svc-db
    namespace: pricing
```
The XR triggers a Crossplane Composition that provisions the RDS instance, the security group, the IAM role, the Secret with credentials, and a Kubernetes ExternalSecret that syncs the credentials into the target namespace. The developer never touches AWS -- she just picks "small" or "medium" and the platform encodes the sensible defaults (encryption at rest, automated backups, multi-AZ for prod).
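The Backstage template's last step POSTs `serviceName` and `team` to `/api/crossplane/postgres`, but what that endpoint does is not shown here. One plausible shape, sketched in Python, is a handler that renders the `PostgresInstance` XR above from the request and hands it to the cluster (for example via `kubectl apply -f -`, which accepts JSON as well as YAML). The function name, the `env` and `namespace` arguments, and the naming scheme (`<service>-<env>`, `<service>-db`) are assumptions inferred from the example manifest.

```python
import json

ALLOWED_SIZES = ("small", "medium", "large")
ALLOWED_ENVS = ("staging", "prod")


def postgres_xr(service: str, env: str, namespace: str,
                size: str = "small", region: str = "us-east-1") -> dict:
    """Render a PostgresInstance XR for one service and environment.
    Everything outside size/region is a platform-owned default."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {ALLOWED_SIZES}, got {size!r}")
    if env not in ALLOWED_ENVS:
        raise ValueError(f"env must be one of {ALLOWED_ENVS}, got {env!r}")
    return {
        "apiVersion": "database.platform.example.com/v1alpha1",
        "kind": "PostgresInstance",
        "metadata": {"name": f"{service}-{env}"},
        "spec": {
            "size": size,
            "region": region,
            "backup": {"retentionDays": 14},
            "compositionRef": {"name": "postgres-aws"},
            "writeConnectionSecretToRef": {"name": f"{service}-db", "namespace": namespace},
        },
    }


if __name__ == "__main__":
    # Reproduces the example manifest above, as JSON.
    print(json.dumps(postgres_xr("pricing-svc", "prod", namespace="pricing"), indent=2))
```

Keeping the developer-facing inputs to `size` and `region` is the whole design: retention, composition choice, and secret placement stay under platform control.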
Time Budget for the Whole Path
| Step | Target duration | Actual in practice |
|---|---|---|
| Backstage form to scaffolded repo | 60s | 40-90s |
| First CI pipeline green | 3 min | 2-5 min |
| ArgoCD sync to staging | 2 min | 90s-3 min |
| Database provisioning (RDS) | 8 min | 6-12 min |
| Total git-init to staging URL | 10 min | 8-15 min |
Prior to the platform, the same workflow at one client took between 4 and 11 business days depending on ticket queue depth.
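The per-step targets in the table add up to about 14 minutes, yet the total target is 10. That only works if database provisioning overlaps with CI and the ArgoCD sync, which is plausible because the template's final step merely triggers provisioning. The Python sketch below computes the critical path under an assumed dependency graph (CI after the template run, sync after CI, database after the template run); those dependencies are an assumption, not something the table states.

```python
# Target durations in minutes, taken from the table (60 s = 1 min).
TARGETS = {"scaffold": 1.0, "ci": 3.0, "argocd_sync": 2.0, "database": 8.0}

# Assumed dependency graph: a step starts once all its prerequisites finish.
DEPENDS_ON = {
    "scaffold": [],
    "ci": ["scaffold"],          # CI needs the published repo
    "argocd_sync": ["ci"],       # the sync deploys the image CI built
    "database": ["scaffold"],    # provisioning is triggered by the template run
}


def finish_times(durations: dict, depends_on: dict) -> dict:
    """Earliest finish time of every step, assuming unlimited parallelism."""
    finished = {}

    def finish(step: str) -> float:
        if step not in finished:
            start = max((finish(dep) for dep in depends_on[step]), default=0.0)
            finished[step] = start + durations[step]
        return finished[step]

    for step in durations:
        finish(step)
    return finished


if __name__ == "__main__":
    times = finish_times(TARGETS, DEPENDS_ON)
    print("sequential sum:", sum(TARGETS.values()), "min")
    print("critical path:", max(times.values()), "min")
```

Under these assumptions the database step sets the critical path, inside the 10-minute target. If your golden path waits for the database before starting CI, the honest target is the sequential sum instead.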
Metrics That Prove the Platform Is Working
Platform teams live or die by adoption. If developers route around your platform, you are building a cost center. Track these numbers monthly and share them publicly at engineering all-hands:
- Time from git init to first production deploy (p50 and p95). Target: less than 1 day p50.
- Percentage of services deployed via golden path. Target: greater than 80 percent within 18 months.
- Developer-initiated infra changes (provisioning, config) that required no ticket. Target: greater than 90 percent.
- Platform NPS from biannual developer survey. Healthy: greater than 30. Red flag: less than 10.
- Mean time to recover (MTTR) for platform-caused incidents. Target: less than 30 minutes.
- Platform team capacity spent on new capabilities vs unplanned toil. Target: greater than 60 percent on new work.
A signal I trust more than surveys: count the number of "can you help me..." DMs infrastructure engineers receive per week. If that number is rising quarter over quarter, your platform is losing ground. If it is falling, you are shipping the right abstractions.
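Several of these metrics take only a few lines of code once you export the raw data (deploy timestamps, a service inventory, survey answers). This Python sketch uses the nearest-rank percentile for the p50/p95 lead time and the standard Net Promoter Score definition (percentage of 9-10 answers minus percentage of 0-6 answers); the sample data and the `golden_path` field are hypothetical.

```python
import math


def percentile(values: list, pct: float):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def nps(scores: list) -> float:
    """Net Promoter Score from 0-10 answers: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("no responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def golden_path_share(services: list) -> float:
    """Percentage of services whose deploys go through the golden path."""
    return 100 * sum(1 for s in services if s["golden_path"]) / len(services)


if __name__ == "__main__":
    # Hypothetical data for one month.
    hours_to_first_deploy = [0.5, 2, 3, 6, 20, 30, 72]
    print("p50 hours:", percentile(hours_to_first_deploy, 50))
    print("p95 hours:", percentile(hours_to_first_deploy, 95))
    print("platform NPS:", nps([10, 9, 9, 8, 7, 6, 3, 10, 9, 5]))
    services = [
        {"name": "pricing-svc", "golden_path": True},
        {"name": "reconciliation", "golden_path": True},
        {"name": "search-api", "golden_path": True},
        {"name": "legacy-billing", "golden_path": False},
    ]
    print("golden path %:", golden_path_share(services))
```

Reporting p95 next to p50 matters here: a healthy median can hide the handful of teams still stuck for days in a ticket queue.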
Cost of a Platform Team vs Cost of Not Having One
Platform engineering is not free -- you are paying salaries for 3-10 engineers who could otherwise be shipping product. The case for it is economic: ticket queue time is also a cost, it just hides on the product side of the P&L.
| Line item | Typical value |
|---|---|
| Platform team (5 engineers, fully loaded) | $1.25M/year |
| Tooling (Backstage hosting, Crossplane, CI minutes) | $80k-$200k/year |
| Total platform cost | ~$1.33M-$1.45M/year |
| Time saved per developer per week (mature platform) | ~4-8 hours |
| At 200 developers, $150k loaded cost (~$75/hour): time saved | ~$3M-$6M/year |
| Plus reduced infrastructure cost (fewer orphaned environments, better density) | $200k-$500k/year |
The rough math: a 200-developer org with a five-person platform team breaks even in year two (the first year goes to building the platform and onboarding early teams, so savings start small) and is net positive from year three onward. Smaller orgs struggle to justify the investment; larger orgs struggle to justify not making it. The 50-engineer inflection point in the maturity table earlier in this article is not a coincidence: it is where the numbers start to pencil out.
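Because the dollar figures above are simple multiplication, it is worth rerunning them with your own headcount. This Python sketch makes the assumptions behind the arithmetic explicit: 2,000 working hours per year to turn a loaded annual cost into an hourly rate, and 50 working weeks per year. Both are assumptions; change them, and the inputs, to match your org.

```python
HOURS_PER_YEAR = 2000  # assumed working hours behind a "fully loaded" annual cost
WORK_WEEKS = 50        # assumed working weeks per year


def value_of_time_saved(devs: int, loaded_cost: float, hours_per_week: float) -> float:
    """Annual dollar value of developer hours the platform gives back."""
    hourly = loaded_cost / HOURS_PER_YEAR
    return devs * hours_per_week * WORK_WEEKS * hourly


def net_annual_benefit(devs: int, loaded_cost: float, hours_per_week: float,
                       infra_savings: float, platform_cost: float) -> float:
    """Time saved plus infrastructure savings, minus the cost of running the platform."""
    return value_of_time_saved(devs, loaded_cost, hours_per_week) + infra_savings - platform_cost


if __name__ == "__main__":
    for hours in (4, 8):
        print(f"{hours} h/week saved:", value_of_time_saved(200, 150_000, hours))
    print("net, conservative:", net_annual_benefit(200, 150_000, 4, 200_000, 1_500_000))
```

The "conservative" line plugs in the low end of the time-saved range, the low end of the infrastructure savings, and a round $1.5M platform cost, and it assumes a mature platform; in year one, `hours_per_week` is closer to zero for most teams, which is why breakeven lands later than the steady-state math suggests.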
Frequently Asked Questions
What is platform engineering in simple terms?
Platform engineering is the practice of building a self-service developer platform that lets application teams deploy code, provision infrastructure, and manage services without needing deep infrastructure expertise. Instead of every developer learning Kubernetes and Terraform, a dedicated platform team builds tools and workflows that abstract that complexity away. Think of it as building an internal PaaS tailored to your organization's specific needs.
Is platform engineering just DevOps with a new name?
No. DevOps is a cultural movement that encourages shared responsibility between development and operations. Platform engineering is a specific discipline that emerged from DevOps: it acknowledges that "everyone owns infrastructure" doesn't scale, and centralizes infrastructure expertise into a team that builds self-service tools. DevOps is the philosophy; platform engineering is one way to implement it at scale. You can practice DevOps without platform engineering, but platform engineering without DevOps principles (automation, collaboration, continuous improvement) will fail.
What is an Internal Developer Platform (IDP)?
An IDP is the product that a platform engineering team builds. It typically includes a developer portal (like Backstage), self-service infrastructure provisioning, standardized CI/CD pipelines, built-in observability, and security guardrails. The IDP is the interface between developers and infrastructure -- developers interact with the IDP, and the IDP translates their requests into infrastructure operations. It's not a single tool but a curated, integrated set of tools and workflows.
When should my organization invest in platform engineering?
The sweet spot is around 50 engineers and 5+ development teams. Below that, the overhead of building and maintaining a platform exceeds the productivity gains -- shared CI templates and Terraform modules are enough. Above 50 engineers, you'll notice that infrastructure requests become a bottleneck, deployment practices diverge across teams, and onboarding new developers takes weeks because they need to learn too many tools. Those are signals that a dedicated platform team will pay for itself.
What are golden paths in platform engineering?
Golden paths are pre-built, opinionated workflows for common developer tasks -- creating a new service, provisioning a database, deploying to production. They encode your organization's best practices into templates and automation. Developers follow the golden path for standard work and only deviate for genuinely unusual requirements. The key distinction is that golden paths are recommended, not mandatory. They succeed because they're easier than the alternative, not because they're enforced.
What is Backstage and why does every platform team use it?
Backstage is an open-source developer portal created by Spotify. It provides a service catalog (who owns what), software templates (scaffold new services from blueprints), TechDocs (documentation-as-code), and a plugin architecture for extending functionality. It became the de facto standard because it solves the service discovery problem that every organization hits at scale -- "who owns this service, where's the runbook, and how do I deploy it?" Alternatives include Port, Cortex, and OpsLevel, which offer managed versions with less operational overhead.
What tools do I need to start building an IDP?
Start small. You need three things: (1) a developer portal for service discovery -- Backstage is the standard choice; (2) a GitOps CD tool for standardized deployments -- ArgoCD if you're on Kubernetes; and (3) a way to templatize infrastructure provisioning -- Crossplane for Kubernetes-native, or shared Terraform modules behind a simple UI. Don't adopt Score, Humanitec, or other advanced orchestration tools until you've proven the basics work. Add observability (Prometheus + Grafana) and policy enforcement (OPA) in the second phase.
The Platform Is the Product
The biggest mindset shift in platform engineering is treating your platform as a product, not a project. Products have users, feedback loops, roadmaps, and success metrics. Projects have deadlines and end dates. A platform that ships once and stops iterating is a platform that developers will route around.
Measure adoption, not features shipped. Track developer satisfaction through regular surveys. Monitor "time to first deploy" for new services and "time to resolve infrastructure requests." If those numbers aren't improving quarter over quarter, your platform isn't delivering value -- it's delivering overhead. The organizations that get platform engineering right build platforms developers actually want to use. The ones that get it wrong build platforms developers are forced to use. The difference shows up in velocity, retention, and the number of angry messages in your #infrastructure Slack channel.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.