
Prometheus and Grafana: Setting Up Your First Monitoring Stack

Deploy Prometheus and Grafana on Kubernetes using Helm. Learn the pull-based scrape model, PromQL essentials (rate, histogram_quantile, aggregation), Grafana dashboard design, recording rules, and Alertmanager routing.

Abhishek Patel · 9 min read


Why Prometheus and Grafana Dominate Monitoring

If you're running anything in production -- containers, VMs, serverless functions -- you need metrics. And in 2026, Prometheus and Grafana remain the monitoring stack that most teams reach for first. Prometheus handles metrics collection and alerting. Grafana handles visualization. Together, they give you a monitoring foundation that scales from a single node to thousands of pods without vendor lock-in or per-host pricing surprises.

I've deployed this stack at companies ranging from five-person startups to large enterprises. The learning curve is real, but once you understand the scrape model and PromQL, you've got a system that's more flexible than any SaaS dashboard. Here's how to set it up properly from the start.

What Is Prometheus?

Definition: Prometheus is an open-source, pull-based monitoring system that scrapes metrics from instrumented targets at configurable intervals. It stores time-series data locally, provides a powerful query language called PromQL, and includes a built-in alerting engine. It's a CNCF graduated project.

The key insight behind Prometheus is the pull model. Instead of your applications pushing metrics to a central server, Prometheus reaches out and scrapes HTTP endpoints that expose metrics in a specific text format. This inversion matters: Prometheus knows what it's monitoring, can detect when a target goes down (because the scrape fails), and doesn't require your applications to know where the monitoring server lives.
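To make the pull model concrete, here is a minimal sketch of a scrape target using only the Python standard library. In a real service you would use an official client library such as prometheus_client; the counter name and the `version=0.0.4` content type mirror the classic Prometheus text exposition format.

```python
# Minimal /metrics endpoint in the Prometheus text exposition format,
# built on the standard library only (a real app would use prometheus_client).
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUEST_COUNT = {"GET /": 0}  # toy in-process counter


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        body = (
            "# HELP http_requests_total Total number of HTTP requests\n"
            "# TYPE http_requests_total counter\n"
            f'http_requests_total{{method="GET",endpoint="/"}} {REQUEST_COUNT["GET /"]}\n'
        ).encode()
        self.send_response(200)
        # version=0.0.4 is the classic Prometheus text-format content type
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def serve_and_scrape():
    """Start the server on a free port, scrape /metrics once, return the body."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with urllib.request.urlopen(
            f"http://127.0.0.1:{server.server_port}/metrics"
        ) as resp:
            return resp.read().decode()
    finally:
        server.shutdown()


payload = serve_and_scrape()
print(payload)
```

This is exactly what Prometheus does on every scrape: an HTTP GET against the target, with the response parsed as time series.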

Setting Up Prometheus with Helm

The fastest path to a production-ready Prometheus installation on Kubernetes is the kube-prometheus-stack Helm chart. It bundles Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics into a single deployment.

Step-by-Step Deployment

  1. Add the Helm repository.
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
  2. Create a values file for your environment.
    # values-production.yaml
    prometheus:
      prometheusSpec:
        retention: 15d
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: gp3
              resources:
                requests:
                  storage: 50Gi
        resources:
          requests:
            memory: 2Gi
            cpu: 500m
          limits:
            memory: 4Gi
    
    grafana:
      adminPassword: ${GRAFANA_ADMIN_PASSWORD}
      persistence:
        enabled: true
        size: 10Gi
    
    alertmanager:
      alertmanagerSpec:
        retention: 120h
  3. Install the chart.
    helm install monitoring prometheus-community/kube-prometheus-stack \
      --namespace monitoring \
      --create-namespace \
      -f values-production.yaml
  4. Verify the installation.
    kubectl get pods -n monitoring
    kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring
  5. Access Grafana at http://localhost:3000. The kube-prometheus-stack ships with pre-built dashboards for Kubernetes cluster health, node metrics, and pod resource usage.

Pro tip: Don't skip the storage configuration. Prometheus without persistent volumes loses all historical data on pod restart. Use a storage class with good IOPS -- Prometheus writes heavily during compaction.

The Scrape Model Explained

Prometheus discovers targets through service discovery and periodically scrapes their /metrics endpoint. In Kubernetes, this happens automatically via annotations or ServiceMonitor custom resources.

# ServiceMonitor for a custom application
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: monitoring
spec:
  namespaceSelector:
    matchNames:
      - production
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics

The interval: 15s setting means Prometheus hits this endpoint every 15 seconds. That's a good default for most services. Going lower (5s) increases storage and CPU usage. Going higher (60s) means you miss short-lived spikes.
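For plain Prometheus deployments without the Operator, discovery is often annotation-driven instead. A hedged sketch of the commonly used pod annotations is below; note that the `prometheus.io/*` keys are a community convention that only works if your scrape config's `kubernetes_sd_configs` relabeling looks for them, and the kube-prometheus-stack does not honor them by default.

```yaml
# Pod annotations for annotation-based discovery (convention, not a standard;
# requires a matching kubernetes_sd_configs + relabel_configs scrape job).
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
```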

What a Scrape Target Looks Like

Your application's /metrics endpoint returns plain text in the Prometheus exposition format:

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/api/users",status="200"} 14523
http_requests_total{method="POST",endpoint="/api/orders",status="201"} 892
http_requests_total{method="GET",endpoint="/api/users",status="500"} 37

# HELP http_request_duration_seconds Request latency histogram
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{endpoint="/api/users",le="0.01"} 11234
http_request_duration_seconds_bucket{endpoint="/api/users",le="0.05"} 13876
http_request_duration_seconds_bucket{endpoint="/api/users",le="0.1"} 14401
http_request_duration_seconds_bucket{endpoint="/api/users",le="+Inf"} 14523
http_request_duration_seconds_sum{endpoint="/api/users"} 312.47
http_request_duration_seconds_count{endpoint="/api/users"} 14523

PromQL: The Query Language

PromQL is what makes Prometheus powerful -- and what trips up most newcomers. It's a functional query language designed specifically for time-series data. Here are the three patterns you'll use constantly.

Rate: Turning Counters into Useful Data

Counters only go up. A raw counter value like http_requests_total = 14523 isn't useful by itself. The rate() function calculates the per-second rate of increase over a time window:

# Requests per second over the last 5 minutes
rate(http_requests_total[5m])

# Error rate as a percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) * 100

Watch out: Never use rate() with a range shorter than twice your scrape interval. If you scrape every 15s, your minimum range should be 30s. Using rate(metric[15s]) with a 15s scrape interval produces unreliable results because you might only have one data point in the window.
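The core of rate() can be sketched in a few lines: sum the increases between consecutive samples in the window, treat any decrease as a counter reset, and divide by the window length. This is a simplified illustration; real PromQL rate() additionally extrapolates the increase out to the exact window boundaries.

```python
def simple_rate(samples, window_seconds):
    """Per-second rate of increase of a counter over a window.

    samples: (timestamp, value) pairs inside the window, oldest first.
    A value decrease is treated as a counter reset (process restart),
    the same correction rate() and increase() apply.
    Simplified sketch: real rate() also extrapolates to window boundaries.
    """
    if len(samples) < 2:
        return None  # rate() needs at least two points in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            increase += cur  # reset: counter restarted from ~0
    return increase / window_seconds


# Five samples 15s apart, with a counter reset after the third sample
samples = [(0, 100), (15, 130), (30, 160), (45, 20), (60, 50)]
print(simple_rate(samples, 60))  # (30 + 30 + 20 + 30) / 60 ≈ 1.83 req/s
```

The reset handling is why you should never apply rate() to a gauge: a legitimately decreasing gauge looks like an endless stream of counter resets.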

histogram_quantile: Latency Percentiles

The most common use of histograms is calculating latency percentiles:

# 99th percentile request duration
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)

# 95th percentile, broken down by endpoint
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint)
)

Aggregation: Combining Time Series

# Total request rate across all pods
sum(rate(http_requests_total[5m]))

# Request rate per service
sum by (service) (rate(http_requests_total[5m]))

# Top 5 endpoints by error rate
topk(5, sum by (endpoint) (rate(http_requests_total{status=~"5.."}[5m])))

Grafana Dashboards That Actually Help

Grafana is only as good as the dashboards you build. I've seen teams with 50 dashboards where nobody looks at any of them. Here's the approach that works: build three dashboards and make them great.

| Dashboard | Purpose | Key Panels |
| --- | --- | --- |
| Service Overview | RED metrics for all services | Request rate, error rate, p50/p95/p99 latency per service |
| Infrastructure | USE metrics for nodes and pods | CPU utilization, memory usage, disk I/O, network bandwidth |
| Alerts Overview | Current firing and pending alerts | Alert status table, recent alert history, silence management |

Pro tip: Use Grafana's template variables to make dashboards interactive. A single dropdown for namespace and service turns one dashboard into a view for every service in your cluster, instead of duplicating dashboards per team.
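As a hedged sketch of what that looks like in practice (metric and label names are illustrative, assuming the kube-prometheus-stack default data source): define query-type variables with Grafana's `label_values()` helper, then reference them in panel queries.

```text
# Dashboard variables (Settings -> Variables -> Query type):
namespace:  label_values(kube_pod_info, namespace)
service:    label_values(up{namespace="$namespace"}, service)

# Panel query filtering on the current selections:
sum by (service) (rate(http_requests_total{namespace="$namespace", service="$service"}[5m]))
```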

Recording Rules: Pre-Computing Expensive Queries

Some PromQL queries are expensive. If your Grafana dashboard computes histogram_quantile across thousands of time series every time someone loads the page, Prometheus will struggle. Recording rules pre-compute these expressions and store the results as new time series.

# recording-rules.yaml
groups:
  - name: service_slis
    interval: 30s
    rules:
      - record: service:http_request_duration_seconds:p99
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
          )
      - record: service:http_requests:error_rate_5m
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
          /
          sum(rate(http_requests_total[5m])) by (service)
      - record: service:http_requests:rate_5m
        expr: |
          sum(rate(http_requests_total[5m])) by (service)

Now your Grafana dashboards query service:http_requests:rate_5m instead of computing the aggregation on every load. The naming convention level:metric:operations is standard in the Prometheus community -- stick to it.

Alertmanager: Routing Alerts

Prometheus evaluates alert rules and fires them to Alertmanager. Alertmanager handles deduplication, grouping, silencing, and routing to the right channel (Slack, PagerDuty, email, webhooks).

# alertmanager-config.yaml
route:
  receiver: default-slack
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: pagerduty-critical
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: slack-warnings

receivers:
  - name: default-slack
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
  - name: pagerduty-critical
    pagerduty_configs:
      - service_key: ${PD_SERVICE_KEY}
  - name: slack-warnings
    slack_configs:
      - channel: '#alerts-warnings'

Cost Comparison: Self-Hosted vs. SaaS

Running your own Prometheus stack isn't free. Here's a realistic cost breakdown for a medium-sized deployment (50 nodes, 2M active time series):

| Approach | Monthly Cost (est.) | Operational Burden | Retention |
| --- | --- | --- | --- |
| Self-hosted kube-prometheus-stack | $200-500 (compute + storage) | High -- you manage upgrades, scaling, storage | Configurable (15d-1y+) |
| Grafana Cloud Free | $0 | None | 14d metrics, 10K series limit |
| Grafana Cloud Pro | $300-800 | Low | 13 months |
| Datadog Infrastructure | $1,150+ ($23/host) | None | 15 months |
| New Relic | $500-1,500 (data ingest based) | None | Varies by plan |

Self-hosting is cheapest at scale but requires dedicated SRE time. Grafana Cloud hits a sweet spot for teams that want the Prometheus/Grafana ecosystem without the operational burden. Datadog and New Relic are more expensive but provide all-in-one platforms with logs, traces, and APM bundled in.

Frequently Asked Questions

How much storage does Prometheus need?

A rough formula: bytes per sample (1-2 bytes after compression) multiplied by active time series multiplied by samples per series per day (86,400 divided by the scrape interval in seconds). For 1 million active series scraped every 15 seconds, expect roughly 10-15 GB per day after compaction. Retention of 15 days needs about 150-225 GB. Always over-provision by around 30% -- cardinality explosions from bad labels can spike storage quickly.
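That arithmetic is easy to get wrong under pressure, so here is the formula as a small helper (the 2 bytes/sample and 30% headroom defaults are the assumptions from the paragraph above, not measured values):

```python
def prometheus_storage_gib(active_series, scrape_interval_s, retention_days,
                           bytes_per_sample=2.0, headroom=1.3):
    """Back-of-the-envelope Prometheus disk estimate in GiB.

    bytes_per_sample: ~1-2 after compression.
    headroom: 1.3 adds the 30% over-provisioning buffer for cardinality surprises.
    """
    samples_per_day = active_series * (86_400 / scrape_interval_s)
    total_bytes = samples_per_day * retention_days * bytes_per_sample * headroom
    return total_bytes / 2**30


# 1M active series, 15s scrape interval, 15d retention (the example above)
print(round(prometheus_storage_gib(1_000_000, 15, 15), 1))
```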

Should I use Prometheus or Thanos for long-term storage?

Prometheus itself is designed for short-to-medium retention (days to weeks). For long-term storage beyond 30 days, use Thanos or Grafana Mimir (the successor to Cortex). Thanos adds a sidecar to Prometheus that uploads compacted blocks to object storage (S3, GCS). Mimir is the newer option and is simpler to operate. Either one gives you years of retention at object storage pricing.

What's the difference between Prometheus and InfluxDB?

Prometheus uses a pull model and PromQL, is optimized for reliability (works even when the network is partitioned), and is tightly integrated with Kubernetes. InfluxDB uses a push model and SQL-like queries, handles higher cardinality natively, and works well for IoT or custom metrics use cases. For Kubernetes monitoring, Prometheus is the clear winner due to ecosystem support.

How do I avoid high cardinality problems?

High cardinality occurs when a label has too many unique values -- like a user ID or request ID on a metric. This creates millions of time series and crashes Prometheus. The fix: never put unbounded values in metric labels. Use labels for bounded categories (HTTP method, status code, service name). Move high-cardinality data into logs or trace attributes instead.
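The multiplication is what makes this dangerous: each label's unique-value count multiplies the series total. A quick sketch (label names and counts are illustrative):

```python
from math import prod


def series_count(label_cardinalities):
    """Active series for one metric = product of each label's unique-value count."""
    return prod(label_cardinalities.values())


# Bounded labels: a few hundred series, totally fine
bounded = {"method": 4, "status": 6, "endpoint": 25}

# Adding one unbounded label (e.g. a user ID) multiplies everything
unbounded = {**bounded, "user_id": 50_000}

print(series_count(bounded))    # 600 series
print(series_count(unbounded))  # 30,000,000 series -- a cardinality explosion
```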

Can Prometheus monitor non-Kubernetes workloads?

Absolutely. Prometheus supports static target configuration, DNS-based discovery, Consul, EC2, and dozens of other service discovery mechanisms. For VMs, install node-exporter and point Prometheus at it. For cloud services, use exporters -- there are community exporters for AWS, GCP, databases, message queues, and nearly every popular service.
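As a hedged sketch of the simplest non-Kubernetes setup (hostnames are illustrative): a static scrape job in prometheus.yml pointed at node-exporter's default port.

```yaml
# prometheus.yml -- static targets for VMs running node-exporter
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - vm-01.internal:9100   # node-exporter's default port
          - vm-02.internal:9100
        labels:
          env: production
```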

What is the kube-prometheus-stack and why should I use it?

The kube-prometheus-stack is a Helm chart that bundles Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics with pre-configured dashboards and alert rules. It uses the Prometheus Operator to manage configuration via Kubernetes custom resources (ServiceMonitors, PrometheusRules). It saves weeks of setup and gives you production-ready defaults out of the box.

Conclusion

Prometheus and Grafana aren't going anywhere. The ecosystem is mature, the community is massive, and the operational patterns are well-documented. Start with the kube-prometheus-stack Helm chart to get running in an hour. Learn rate(), histogram_quantile(), and sum by -- those three PromQL patterns cover 80% of what you'll write. Add recording rules early to keep dashboards fast. And route your alerts through Alertmanager so critical pages go to PagerDuty while warnings go to Slack.

The biggest mistake teams make isn't choosing the wrong tool -- it's deploying Prometheus and then never building useful dashboards or alert rules on top of it. The tool is only as good as the queries and alerts you write. Invest the time in PromQL, and it pays back every on-call shift.

Written by

Abhishek Patel

Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.
