Prometheus and Grafana: Setting Up Your First Monitoring Stack
Deploy Prometheus and Grafana on Kubernetes using Helm. Learn the pull-based scrape model, PromQL essentials (rate, histogram_quantile, aggregation), Grafana dashboard design, recording rules, and Alertmanager routing.

Why Prometheus and Grafana Dominate Monitoring
If you're running anything in production -- containers, VMs, serverless functions -- you need metrics. And in 2026, Prometheus and Grafana remain the monitoring stack that most teams reach for first. Prometheus handles metrics collection and alerting. Grafana handles visualization. Together, they give you a monitoring foundation that scales from a single node to thousands of pods without vendor lock-in or per-host pricing surprises.
I've deployed this stack at companies ranging from five-person startups to large enterprises. The learning curve is real, but once you understand the scrape model and PromQL, you've got a system that's more flexible than any SaaS dashboard. Here's how to set it up properly from the start.
What Is Prometheus?
Definition: Prometheus is an open-source, pull-based monitoring system that scrapes metrics from instrumented targets at configurable intervals. It stores time-series data locally, provides a powerful query language called PromQL, and includes a built-in alerting engine. It's a CNCF graduated project.
The key insight behind Prometheus is the pull model. Instead of your applications pushing metrics to a central server, Prometheus reaches out and scrapes HTTP endpoints that expose metrics in a specific text format. This inversion matters: Prometheus knows what it's monitoring, can detect when a target goes down (because the scrape fails), and doesn't require your applications to know where the monitoring server lives.
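Outside Kubernetes, the same pull model is configured with a plain scrape config. Here is a minimal sketch; the job name and target address are placeholders:

```yaml
# prometheus.yml -- minimal static scrape config (sketch; target is a placeholder)
global:
  scrape_interval: 15s   # how often Prometheus pulls each target

scrape_configs:
  - job_name: my-app
    static_configs:
      - targets: ['10.0.1.5:8080']   # Prometheus GETs http://10.0.1.5:8080/metrics
```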
Setting Up Prometheus with Helm
The fastest path to a production-ready Prometheus installation on Kubernetes is the kube-prometheus-stack Helm chart. It bundles Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics into a single deployment.
Step-by-Step Deployment
- Add the Helm repository.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

- Create a values file for your environment.

# values-production.yaml
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3
          resources:
            requests:
              storage: 50Gi
    resources:
      requests:
        memory: 2Gi
        cpu: 500m
      limits:
        memory: 4Gi
grafana:
  adminPassword: ${GRAFANA_ADMIN_PASSWORD}
  persistence:
    enabled: true
    size: 10Gi
alertmanager:
  alertmanagerSpec:
    retention: 120h

- Install the chart.

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  -f values-production.yaml

- Verify the installation.

kubectl get pods -n monitoring
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring

- Access Grafana at http://localhost:3000. The kube-prometheus-stack ships with pre-built dashboards for Kubernetes cluster health, node metrics, and pod resource usage.
Pro tip: Don't skip the storage configuration. Prometheus without persistent volumes loses all historical data on pod restart. Use a storage class with good IOPS -- Prometheus writes heavily during compaction.
The Scrape Model Explained
Prometheus discovers targets through service discovery and periodically scrapes their /metrics endpoint. In Kubernetes, this happens automatically via annotations or ServiceMonitor custom resources.
# ServiceMonitor for a custom application
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: monitoring
spec:
  namespaceSelector:
    matchNames:
      - production
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
The interval: 15s setting means Prometheus hits this endpoint every 15 seconds. That's a good default for most services. Going lower (5s) increases storage and CPU usage. Going higher (60s) means you miss short-lived spikes.
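For completeness, the annotation-based alternative looks roughly like this on a Pod spec. These `prometheus.io/*` keys are a community convention honored by the default scrape config of the classic prometheus Helm chart, not something the Operator or Prometheus itself understands out of the box:

```yaml
# Pod annotations for annotation-based discovery (a common convention;
# requires a scrape config that relabels on these annotations)
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
```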
What a Scrape Target Looks Like
Your application's /metrics endpoint returns plain text in the Prometheus exposition format:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/api/users",status="200"} 14523
http_requests_total{method="POST",endpoint="/api/orders",status="201"} 892
http_requests_total{method="GET",endpoint="/api/users",status="500"} 37
# HELP http_request_duration_seconds Request latency histogram
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{endpoint="/api/users",le="0.01"} 11234
http_request_duration_seconds_bucket{endpoint="/api/users",le="0.05"} 13876
http_request_duration_seconds_bucket{endpoint="/api/users",le="0.1"} 14401
http_request_duration_seconds_bucket{endpoint="/api/users",le="+Inf"} 14523
http_request_duration_seconds_sum{endpoint="/api/users"} 312.47
http_request_duration_seconds_count{endpoint="/api/users"} 14523
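If your service is written in Python, the official prometheus_client library generates this format for you. A minimal sketch, using the same metric names and buckets as the sample above (the recorded request is illustrative):

```python
from prometheus_client import Counter, Histogram, generate_latest

# Counter with bounded label values (method, endpoint, status)
REQUESTS = Counter(
    "http_requests_total", "Total number of HTTP requests",
    ["method", "endpoint", "status"],
)

# Histogram with explicit buckets matching the exposition sample above
# (the client adds the +Inf bucket automatically)
LATENCY = Histogram(
    "http_request_duration_seconds", "Request latency histogram",
    ["endpoint"], buckets=[0.01, 0.05, 0.1],
)

# Record one illustrative request and its duration
REQUESTS.labels(method="GET", endpoint="/api/users", status="200").inc()
with LATENCY.labels(endpoint="/api/users").time():
    pass  # the timed work would go here

# Renders the text exposition format shown above;
# start_http_server(8080) would serve the same output at /metrics
print(generate_latest().decode())
```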
PromQL: The Query Language
PromQL is what makes Prometheus powerful -- and what trips up most newcomers. It's a functional query language designed specifically for time-series data. Here are the three patterns you'll use constantly.
Rate: Turning Counters into Useful Data
Counters only go up. A raw counter value like http_requests_total = 14523 isn't useful by itself. The rate() function calculates the per-second rate of increase over a time window:
# Requests per second over the last 5 minutes
rate(http_requests_total[5m])
# Error rate as a percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) * 100
Watch out: Never use rate() with a range shorter than twice your scrape interval. If you scrape every 15s, your minimum range should be 30s. Using rate(metric[15s]) with a 15s scrape interval produces unreliable results because you might only have one data point in the window.
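To build intuition for why a too-short window fails, here is the core idea of rate() in plain Python. This is a simplification: real PromQL rate() also detects counter resets and extrapolates to the edges of the range window:

```python
def simple_rate(samples):
    """Per-second increase over a window of (timestamp, value) counter samples.

    Simplified sketch: real rate() also handles counter resets and
    extrapolates to the window boundaries.
    """
    if len(samples) < 2:
        return None  # only one sample in the window -> no rate can be computed
    (t0, v0), (tn, vn) = samples[0], samples[-1]
    return (vn - v0) / (tn - t0)

# 5m window with 15s scrapes: plenty of points, stable answer
window = [(t, 14000 + 2 * t) for t in range(0, 300, 15)]  # counter grows 2/sec
print(simple_rate(window))  # 2.0

# 15s window with a 15s scrape interval: often just one sample lands in it
print(simple_rate([(0, 14000)]))  # None -- why rate(metric[15s]) misbehaves
```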
histogram_quantile: Latency Percentiles
The most common use of histograms is calculating latency percentiles:
# 99th percentile request duration
histogram_quantile(0.99,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)
# 95th percentile, broken down by endpoint
histogram_quantile(0.95,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
)
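Under the hood, histogram_quantile finds the bucket containing the target rank and linearly interpolates within it. A stdlib sketch of that idea, fed with the bucket counts from the /metrics example earlier (a simplification that skips PromQL's edge-case handling):

```python
def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (le, count) histogram buckets.

    Mirrors PromQL's approach: locate the bucket holding the q-th ranked
    observation, then linearly interpolate within that bucket's bounds.
    """
    buckets = sorted(buckets, key=lambda b: b[0])
    total = buckets[-1][1]          # the +Inf bucket holds the total count
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            if le == float("inf"):
                return lower_bound  # rank falls past the last finite bound
            # interpolate the position of `rank` inside this bucket
            return lower_bound + (le - lower_bound) * (rank - lower_count) / (count - lower_count)
        lower_bound, lower_count = le, count

# Buckets from the exposition example: le=0.01, 0.05, 0.1, +Inf
buckets = [(0.01, 11234), (0.05, 13876), (0.1, 14401), (float("inf"), 14523)]
print(histogram_quantile(0.95, buckets))  # ~0.049s: p95 lands in the 0.01-0.05 bucket
```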
Aggregation: Combining Time Series
# Total request rate across all pods
sum(rate(http_requests_total[5m]))
# Request rate per service
sum by (service) (rate(http_requests_total[5m]))
# Top 5 endpoints by error rate
topk(5, sum by (endpoint) (rate(http_requests_total{status=~"5.."}[5m])))
Grafana Dashboards That Actually Help
Grafana is only as good as the dashboards you build. I've seen teams with 50 dashboards where nobody looks at any of them. Here's the approach that works: build three dashboards and make them great.
| Dashboard | Purpose | Key Panels |
|---|---|---|
| Service Overview | RED metrics for all services | Request rate, error rate, p50/p95/p99 latency per service |
| Infrastructure | USE metrics for nodes and pods | CPU utilization, memory usage, disk I/O, network bandwidth |
| Alerts Overview | Current firing and pending alerts | Alert status table, recent alert history, silence management |
Pro tip: Use Grafana's template variables to make dashboards interactive. A single dropdown for namespace and service turns one dashboard into a view for every service in your cluster, instead of duplicating dashboards per team.
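With dashboard variables named namespace and service defined (typically populated via Grafana's label_values() variable query), every panel query just references them. A sketch of one such templated panel query:

```
# Panel query driven by dashboard variables (assumes variables named
# "namespace" and "service", e.g. populated with label_values(up, namespace))
sum by (endpoint) (
  rate(http_requests_total{namespace="$namespace", service="$service"}[5m])
)
```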
Recording Rules: Pre-Computing Expensive Queries
Some PromQL queries are expensive. If your Grafana dashboard computes histogram_quantile across thousands of time series every time someone loads the page, Prometheus will struggle. Recording rules pre-compute these expressions and store the results as new time series.
# recording-rules.yaml
groups:
  - name: service_slis
    interval: 30s
    rules:
      - record: service:http_request_duration_seconds:p99
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
          )
      - record: service:http_requests:error_rate_5m
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
          /
          sum(rate(http_requests_total[5m])) by (service)
      - record: service:http_requests:rate_5m
        expr: |
          sum(rate(http_requests_total[5m])) by (service)
Now your Grafana dashboards query service:http_requests:rate_5m instead of computing the aggregation on every load. The naming convention level:metric:operations is standard in the Prometheus community -- stick to it.
Alertmanager: Routing Alerts
Prometheus evaluates alert rules and fires them to Alertmanager. Alertmanager handles deduplication, grouping, silencing, and routing to the right channel (Slack, PagerDuty, email, webhooks).
# alertmanager-config.yaml
route:
  receiver: default-slack
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: pagerduty-critical
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: slack-warnings
receivers:
  - name: default-slack
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
  - name: pagerduty-critical
    pagerduty_configs:
      - service_key: ${PD_SERVICE_KEY}
  - name: slack-warnings
    slack_configs:
      - channel: '#alerts-warnings'
Cost Comparison: Self-Hosted vs. SaaS
Running your own Prometheus stack isn't free. Here's a realistic cost breakdown for a medium-sized deployment (50 nodes, 2M active time series):
| Approach | Monthly Cost (est.) | Operational Burden | Retention |
|---|---|---|---|
| Self-hosted kube-prometheus-stack | $200-500 (compute + storage) | High -- you manage upgrades, scaling, storage | Configurable (15d-1y+) |
| Grafana Cloud Free | $0 | None | 14d metrics, 10K series limit |
| Grafana Cloud Pro | $300-800 | Low | 13 months |
| Datadog Infrastructure | $1,150+ ($23/host) | None | 15 months |
| New Relic | $500-1,500 (data ingest based) | None | Varies by plan |
Self-hosting is cheapest at scale but requires dedicated SRE time. Grafana Cloud hits a sweet spot for teams that want the Prometheus/Grafana ecosystem without the operational burden. Datadog and New Relic are more expensive but provide all-in-one platforms with logs, traces, and APM bundled in.
Frequently Asked Questions
How much storage does Prometheus need?
A rough formula: bytes per sample (1-2 bytes after compression) multiplied by active time series multiplied by samples per day. For 1 million active series scraped every 15 seconds, expect roughly 10-15 GB per day after compaction. Retention of 15 days needs about 150-225 GB. Always over-provision by 30% -- cardinality explosions from bad labels can spike storage quickly.
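The formula in the answer above works out like this. A back-of-the-envelope sketch; the 2 bytes/sample figure and the 30% headroom are the assumptions stated in the text:

```python
# Back-of-the-envelope Prometheus storage estimate
active_series = 1_000_000
scrape_interval_s = 15
bytes_per_sample = 2          # post-compression, upper end of the 1-2 byte range
retention_days = 15
headroom = 1.30               # 30% over-provisioning for cardinality spikes

samples_per_day = active_series * (86_400 // scrape_interval_s)
gb_per_day = samples_per_day * bytes_per_sample / 1e9
total_gb = gb_per_day * retention_days * headroom

print(f"{gb_per_day:.1f} GB/day, {total_gb:.0f} GB for {retention_days}d retention")
```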
Should I use Prometheus or Thanos for long-term storage?
Prometheus itself is designed for short-to-medium retention (days to weeks). For long-term storage beyond 30 days, use Thanos or Cortex (now Mimir). Thanos adds a sidecar to Prometheus that uploads compacted blocks to object storage (S3, GCS). Grafana Mimir is the newer option and is simpler to operate. Either one gives you years of retention at object storage pricing.
What's the difference between Prometheus and InfluxDB?
Prometheus uses a pull model and PromQL, is optimized for reliability (works even when the network is partitioned), and is tightly integrated with Kubernetes. InfluxDB uses a push model and SQL-like queries, handles higher cardinality natively, and works well for IoT or custom metrics use cases. For Kubernetes monitoring, Prometheus is the clear winner due to ecosystem support.
How do I avoid high cardinality problems?
High cardinality occurs when a label has too many unique values -- like a user ID or request ID on a metric. This creates millions of time series and crashes Prometheus. The fix: never put unbounded values in metric labels. Use labels for bounded categories (HTTP method, status code, service name). Move high-cardinality data into logs or trace attributes instead.
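Cardinality is multiplicative across labels, which is why a single unbounded label is catastrophic. A quick sketch of the arithmetic (the per-label counts are illustrative):

```python
# Series count = product of unique values per label (illustrative numbers)
methods, statuses, endpoints = 5, 10, 50
bounded_series = methods * statuses * endpoints
print(bounded_series)       # 2500 series: fine

# Add a user_id label with 100k unique users and it multiplies everything
user_ids = 100_000
unbounded_series = bounded_series * user_ids
print(unbounded_series)     # 250000000 series: Prometheus falls over
```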
Can Prometheus monitor non-Kubernetes workloads?
Absolutely. Prometheus supports static target configuration, DNS-based discovery, Consul, EC2, and dozens of other service discovery mechanisms. For VMs, install node-exporter and point Prometheus at it. For cloud services, use exporters -- there are community exporters for AWS, GCP, databases, message queues, and nearly every popular service.
What is the kube-prometheus-stack and why should I use it?
The kube-prometheus-stack is a Helm chart that bundles Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics with pre-configured dashboards and alert rules. It uses the Prometheus Operator to manage configuration via Kubernetes custom resources (ServiceMonitors, PrometheusRules). It saves weeks of setup and gives you production-ready defaults out of the box.
Conclusion
Prometheus and Grafana aren't going anywhere. The ecosystem is mature, the community is massive, and the operational patterns are well-documented. Start with the kube-prometheus-stack Helm chart to get running in an hour. Learn rate(), histogram_quantile(), and sum by -- those three PromQL patterns cover 80% of what you'll write. Add recording rules early to keep dashboards fast. And route your alerts through Alertmanager so critical pages go to PagerDuty while warnings go to Slack.
The biggest mistake teams make isn't choosing the wrong tool -- it's deploying Prometheus and then never building useful dashboards or alert rules on top of it. The tool is only as good as the queries and alerts you write. Invest the time in PromQL, and it pays back every on-call shift.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.
Related Articles
How eBPF Is Changing Observability
eBPF enables kernel-level observability without application code changes. Learn how Cilium, Pixie, Falco, and bpftrace use eBPF for network monitoring, security, profiling, and tracing in production Kubernetes environments.
10 min read
Observability
Alerting Done Right: Reducing Noise and Writing Actionable Alerts
Most alerts are noise. Learn how to write actionable alerts by focusing on symptoms, implementing hysteresis, using multi-window burn rate alerting, and routing through Alertmanager. Includes a five-question checklist for every alert.
12 min read
Observability
SLOs, SLAs, and Error Budgets: Running Reliable Services
SLOs, SLAs, and error budgets turn reliability into a measurable resource. Learn how to choose SLIs, set realistic targets, calculate error budgets, and implement burn rate alerts with Prometheus.
11 min read