OpenTelemetry Distributed Tracing: Instrument Once (2026)

How Distributed Tracing Got to OpenTelemetry: A Short, Honest Timeline

Distributed tracing did not start with OpenTelemetry. It started at Google in 2010 with the Dapper paper, which described how Google stitched together request traces across thousands of RPCs. Everything since has been the open-source world chasing that idea through a succession of projects and standards -- each one claiming to be the last.

2012 -- Zipkin (Twitter). First widely used open-source tracer. Introduced the B3 header format that is still in legacy systems a decade later. Each vendor wrote its own client library.
2016 -- OpenTracing. CNCF-hosted. Tried to standardize the tracing API so you could swap backends by changing one dependency. Traction was uneven; auto-instrumentation stayed fragmented.
2017 -- OpenCensus (Google). Covered tracing and metrics, shipped strong auto-instrumentation for Go and Java, but overlapped so much with OpenTracing that teams had to pick sides.
2019 -- OpenTelemetry announced. The two projects merged under CNCF specifically to stop the schism. Spec-first approach, clean SDK contracts, and a new wire protocol (OTLP) designed from scratch.
2021 -- Tracing SDK reaches GA. Vendors start shipping OTLP endpoints alongside their proprietary ones.
2023 -- Metrics spec GA. Prometheus remote-write compatibility lands; OTel becomes a plausible default for metrics too.
Late 2024 -- Logs GA. The final pillar lands. For the first time, traces, metrics, and logs share a single SDK and a single exporter.
2026 (today). Every major backend -- Datadog, New Relic, Grafana Tempo, Honeycomb, AWS X-Ray -- accepts OTLP natively. OpenTracing and OpenCensus are archived. Instrumentation choice is no longer coupled to backend choice.

That history matters because it explains what OTel is and is not. OpenTelemetry is not a backend -- it does not store traces, it does not render flamegraphs. It is the plumbing layer the industry finally agreed on: an SDK, a spec, and a protocol (OTLP) that moves traces, metrics, and logs from your application to whatever backend you point it at. The rest of this guide is about using that plumbing well.

The OTel Data Model

Understanding OTel starts with understanding its core concepts: traces, spans, context propagation, and baggage.

Traces and Spans

A trace represents the full journey of a request through your system. It's a directed acyclic graph of spans. Each span represents a unit of work -- an HTTP request, a database query, a message publish. Spans have:

Trace ID: A globally unique identifier shared by all spans in the trace
Span ID: Unique to this span
Parent Span ID: Links this span to its caller
Name: A human-readable operation name (e.g., GET /api/users)
Start/End time: Duration of the operation
Status: OK, ERROR, or UNSET
Attributes: Key-value metadata (HTTP method, status code, database statement)
Events: Timestamped annotations within the span (e.g., exception details)

Trace: abc123
  |
  +-- Span: API Gateway (parent)
       |   method: GET, path: /api/orders/42, status: 200, duration: 145ms
       |
       +-- Span: Auth Service
       |       duration: 12ms, cache_hit: true
       |
       +-- Span: Order Service
            |   duration: 128ms
            |
            +-- Span: PostgreSQL Query
            |       db.statement: SELECT * FROM orders WHERE id = 42
            |       duration: 23ms
            |
            +-- Span: Redis Cache Set
                    duration: 2ms
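
As a rough illustration of how the parent links produce the tree above, the sketch below rebuilds the hierarchy from a flat list of spans. The `SpanRecord` shape, the `renderTrace` helper, and the span IDs (`a1`, `b1`, ...) are made up for this example; they are a simplified subset of the fields listed earlier, not the SDK's own types.

```typescript
// Simplified span record: an illustrative subset of the fields listed above,
// not the OTel SDK's actual Span type.
interface SpanRecord {
  traceId: string;
  spanId: string;
  parentSpanId?: string; // undefined for the root span
  name: string;
  durationMs: number;
}

// Render one trace as an indented tree by following parentSpanId links.
function renderTrace(spans: SpanRecord[]): string[] {
  // Group spans by the ID of their parent.
  const childrenOf = new Map<string | undefined, SpanRecord[]>();
  for (const span of spans) {
    const siblings = childrenOf.get(span.parentSpanId) ?? [];
    siblings.push(span);
    childrenOf.set(span.parentSpanId, siblings);
  }

  const lines: string[] = [];
  const visit = (parentId: string | undefined, depth: number): void => {
    for (const span of childrenOf.get(parentId) ?? []) {
      lines.push(`${"  ".repeat(depth)}${span.name} (${span.durationMs}ms)`);
      visit(span.spanId, depth + 1);
    }
  };
  visit(undefined, 0); // start from the root span (no parent)
  return lines;
}

// The trace from the diagram, flattened the way a backend would receive it.
const exampleSpans: SpanRecord[] = [
  { traceId: "abc123", spanId: "a1", name: "API Gateway", durationMs: 145 },
  { traceId: "abc123", spanId: "b1", parentSpanId: "a1", name: "Auth Service", durationMs: 12 },
  { traceId: "abc123", spanId: "c1", parentSpanId: "a1", name: "Order Service", durationMs: 128 },
  { traceId: "abc123", spanId: "d1", parentSpanId: "c1", name: "PostgreSQL Query", durationMs: 23 },
  { traceId: "abc123", spanId: "e1", parentSpanId: "c1", name: "Redis Cache Set", durationMs: 2 },
];
const tree = renderTrace(exampleSpans);
```

Spans arrive at a backend independently and in no guaranteed order, which is why the shared trace ID and the parent span ID are the only things a backend needs to reassemble the picture.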

Context Propagation

Context propagation is how OTel carries trace context across service boundaries. When Service A calls Service B over HTTP, the trace ID and parent span ID are injected into HTTP headers. Service B extracts them and continues the same trace. OTel ships propagators for several header formats; the two you are most likely to encounter are:

Format	Header	Status
W3C Trace Context	`traceparent`, `tracestate`	W3C standard, default in OTel
B3 (Zipkin)	`X-B3-TraceId`, `X-B3-SpanId`	Legacy, still used in older systems

W3C Trace Context is the default and what you should use unless you're interoperating with legacy Zipkin-instrumented services.
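
To make the header concrete, here is a minimal sketch of parsing and formatting a version-00 `traceparent` value (`version-traceid-parentid-flags`). The `TraceParent` type and both function names are invented for this example; in a real service the SDK's propagator does this for you, and this sketch deliberately ignores `tracestate` and future header versions.

```typescript
// Parsed form of a W3C traceparent header (version 00 only).
interface TraceParent {
  traceId: string;  // 32 lowercase hex characters
  spanId: string;   // 16 lowercase hex characters (the caller's span ID)
  sampled: boolean; // lowest bit of the trace-flags byte
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns null for anything that is not a valid version-00 header.
function parseTraceParent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero IDs are explicitly invalid in the W3C spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Serializes the context back into the header value a client would send.
function formatTraceParent(ctx: TraceParent): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// Example header taken from the W3C Trace Context specification.
const incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const parsed = parseTraceParent(incoming);
const roundTripped = parsed ? formatTraceParent(parsed) : null;
```

Service B would use `parsed.traceId` as its own trace ID and `parsed.spanId` as the parent of the first span it creates.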

Baggage

Baggage is metadata that propagates across all services in a trace -- things like user ID, tenant ID, or feature flags. Unlike span attributes (which stay on one span), baggage travels through the entire request chain. Use it sparingly; every baggage entry adds bytes to every cross-service call.
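
The byte cost is easy to see once baggage is written out as a header. The sketch below encodes entries in the style of the W3C `baggage` header (comma-separated `key=value` pairs with percent-encoded values). `encodeBaggage`, `decodeBaggage`, and the sample keys are hypothetical names for illustration; the sketch assumes keys are already valid header tokens and ignores optional `;property` metadata.

```typescript
// Serialize entries as a baggage-style header value.
// Assumes keys are plain tokens that need no encoding.
function encodeBaggage(entries: Record<string, string>): string {
  return Object.entries(entries)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join(",");
}

// Parse a baggage-style header value back into key-value pairs.
function decodeBaggage(header: string): Record<string, string> {
  const entries: Record<string, string> = {};
  for (const member of header.split(",")) {
    const [pair] = member.split(";"); // drop optional ";property" metadata
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip malformed members
    const key = pair.slice(0, eq).trim();
    entries[key] = decodeURIComponent(pair.slice(eq + 1).trim());
  }
  return entries;
}

// Hypothetical entries a gateway might attach to a request.
const baggageHeader = encodeBaggage({ "tenant.id": "acme", "user.tier": "gold plus" });
// Every downstream call in the trace carries this many extra header bytes.
const extraBytesPerCall = baggageHeader.length;
const decoded = decodeBaggage(baggageHeader);
```

Two small entries already add a few dozen bytes to every hop, which is why baggage is best reserved for values that several downstream services genuinely need.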

Auto-Instrumentation vs. Manual Instrumentation

OTel offers two instrumentation approaches, and you'll typically use both.

Auto-instrumentation hooks into common libraries (HTTP clients, database drivers, message queues) and creates spans automatically. In Node.js, this means a single setup call instruments Express, pg, ioredis, and dozens of other libraries without changing application code.

Manual instrumentation lets you create custom spans for business logic that auto-instrumentation can't capture -- like tracking the duration of a machine learning inference call or a batch processing step.

Node.js Express Tutorial

Here's a complete setup for a Node.js Express application with auto-instrumentation:

Install the packages.

npm install @opentelemetry/api \
  @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/exporter-metrics-otlp-http \
  @opentelemetry/sdk-metrics

Create the instrumentation file. This must be loaded before your application code.

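// tracing.ts
// Loaded before the application so instrumentation can patch libraries as they are imported.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';

const sdk = new NodeSDK({
  serviceName: 'order-service',
  // Traces go to the collector's OTLP/HTTP receiver.
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4318/v1/traces',
  }),
  // Metrics are pushed to the collector every 15 seconds.
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: 'http://otel-collector:4318/v1/metrics',
    }),
    exportIntervalMillis: 15000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Filesystem spans are very noisy, so turn that instrumentation off.
      '@opentelemetry/instrumentation-fs': { enabled: false },
    }),
  ],
});

sdk.start();

// Flush buffered telemetry before the process exits.
process.on('SIGTERM', () => {
  sdk
    .shutdown()
    .catch((error) => console.error('OTel SDK shutdown failed', error))
    .finally(() => process.exit(0));
});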

Load it before your app.

# Compiled JavaScript:
node --require ./tracing.js ./app.js
# Or run the TypeScript sources directly with ts-node:
node --require ts-node/register --require ./tracing.ts ./app.ts

Add manual spans for business logic.

import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');

// validateOrder and chargePayment are your own application functions.
async function processOrder(orderId: string) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);

      await validateOrder(orderId);
      span.addEvent('order.validated');

      const charged = await chargePayment(orderId);
      span.addEvent('payment.charged');

      span.setStatus({ code: SpanStatusCode.OK });
      return charged;
    } catch (error) {
      // Mark the span as failed and attach the exception as a span event.
      span.setStatus({ code: SpanStatusCode.ERROR, message: String(error) });
      span.recordException(error as Error);
      throw error;
    } finally {
      // Always end the span, on success and on failure.
      span.end();
    }
  });
}

Pro tip: Always call span.end() in a finally block. A span that is never ended is never exported, so forgetting it causes memory leaks and leaves gaps in your traces that are difficult to debug.
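
If you find yourself repeating that try/catch/finally shape, one option is to wrap it in a helper. The sketch below is one possible way to do that, not an API the OTel SDK provides: `withSpan`, `SpanLike`, and `TracerLike` are names invented here, and the two local interfaces are a hand-written subset of what the example above uses, kept local so the snippet stands on its own. In a real service you would use the `Span` and `Tracer` types and `SpanStatusCode` from `@opentelemetry/api` instead.

```typescript
// Minimal structural stand-ins for the span and tracer used above (assumptions
// for illustration; not the real '@opentelemetry/api' types).
interface SpanLike {
  setStatus(status: { code: number; message?: string }): unknown;
  recordException(error: Error): void;
  end(): void;
}
interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

const STATUS_OK = 1;    // numeric value of SpanStatusCode.OK
const STATUS_ERROR = 2; // numeric value of SpanStatusCode.ERROR

// Runs `work` inside a span and guarantees the span is ended exactly once,
// whether `work` resolves or rejects.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  work: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      const result = await work(span);
      span.setStatus({ code: STATUS_OK });
      return result;
    } catch (error) {
      span.setStatus({ code: STATUS_ERROR, message: String(error) });
      span.recordException(error instanceof Error ? error : new Error(String(error)));
      throw error; // the caller still sees the original failure
    } finally {
      span.end();
    }
  });
}
```

With a helper like this, a function such as processOrder shrinks to the business logic plus a single `withSpan(tracer, 'processOrder', async (span) => { ... })` call, and there is only one place where a missing `span.end()` could hide.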

The OTel Collector Pipeline

The OpenTelemetry Collector is a vendor-agnostic proxy that sits between your applications and your backend. It receives telemetry, processes it (batching, filtering, sampling, enrichment), and exports it to one or more destinations.

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  attributes:
    actions:
      - key: environment
        value: production
        action: upsert
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: error-traces
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-traces
        type: latency
        latency: { threshold_ms: 1000 }
      - name: sample-rest
        type: probabilistic
        probabilistic: { sampling_percentage: 10 }

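exporters:
  otlphttp/tempo:
    endpoint: http://tempo:4318
  otlphttp/honeycomb:
    endpoint: https://api.honeycomb.io
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}
  # Tempo stores traces only, so metrics need their own destination.
  # Placeholder endpoint: point this at whichever OTLP-capable metrics backend you run.
  otlphttp/metrics:
    endpoint: http://metrics-backend:4318

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter goes first and batch goes last, so sampling sees
      # complete traces before spans are batched for export.
      processors: [memory_limiter, tail_sampling, attributes, batch]
      exporters: [otlphttp/tempo, otlphttp/honeycomb]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/metrics]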

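The collector is the right place for tail-based sampling. Your applications send 100% of spans to the collector, and the collector decides which traces to keep. This gives you all error traces, all slow traces, and a random sample of everything else -- without changing application code. One caveat: the decision is made per trace, so every span of a given trace has to reach the same collector instance. If you run several collectors, route spans by trace ID in front of the sampling tier.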
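
To show what the three policies amount to, here is a simplified model of the keep-or-drop decision for one finished trace. It is a sketch of the idea, not the collector's implementation: `shouldKeepTrace`, `FinishedSpan`, and `TailSamplingConfig` are invented names, and the random draw is injected as a parameter so the behavior is easy to test.

```typescript
// What the sampler needs to know about each span once the trace is complete.
interface FinishedSpan {
  status: "OK" | "ERROR" | "UNSET";
  startMs: number;
  endMs: number;
}

interface TailSamplingConfig {
  latencyThresholdMs: number; // plays the role of threshold_ms in the config
  samplingPercentage: number; // plays the role of sampling_percentage
}

// A trace is kept if ANY policy says keep: error, slow, or the random sample.
function shouldKeepTrace(
  spans: FinishedSpan[],
  config: TailSamplingConfig,
  random: () => number = Math.random,
): boolean {
  if (spans.length === 0) return false;

  // Policy 1: keep every trace that contains an error span.
  if (spans.some((span) => span.status === "ERROR")) return true;

  // Policy 2: keep traces whose end-to-end duration exceeds the threshold.
  const start = Math.min(...spans.map((span) => span.startMs));
  const end = Math.max(...spans.map((span) => span.endMs));
  if (end - start > config.latencyThresholdMs) return true;

  // Policy 3: keep a percentage of everything else.
  return random() * 100 < config.samplingPercentage;
}

const samplingConfig: TailSamplingConfig = { latencyThresholdMs: 1000, samplingPercentage: 10 };
const fastOk: FinishedSpan[] = [{ status: "OK", startMs: 0, endMs: 145 }];
const slowOk: FinishedSpan[] = [{ status: "OK", startMs: 0, endMs: 1500 }];
const fastError: FinishedSpan[] = [{ status: "ERROR", startMs: 0, endMs: 20 }];
```

The decision needs the whole trace, which is why the processor waits (`decision_wait: 10s` in the config) before deciding, and why it cannot be made inside an individual service.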

Backend Comparison

Once telemetry leaves the collector, it needs a backend for storage and querying. Here's how the major options compare:

Backend	Type	Pricing Model	Best For
Jaeger	Open source, self-hosted	Infrastructure cost only	Teams already running Elasticsearch or Cassandra
Grafana Tempo	Open source, self-hosted or Cloud	Free self-hosted; Cloud from $0	Teams in the Grafana ecosystem wanting object-storage-backed traces
Honeycomb	SaaS	Event-based ($0.20/1M events)	Teams prioritizing query power and high-cardinality exploration
Datadog APM	SaaS	Per host ($31/host/mo) + ingestion	Teams wanting an all-in-one platform with logs, metrics, and traces
AWS X-Ray	SaaS	Per trace ($5/1M traces)	AWS-native shops wanting minimal operational overhead

Watch out: Trace storage costs can surprise you. A busy service handling 10,000 requests/second with 5 spans per trace produces about 4.3 billion spans (864 million traces) per day. Without sampling, the per-event and per-trace prices in the table above put that at tens of thousands of dollars per month or more on a SaaS backend. Always implement sampling in the collector.
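
The arithmetic behind that warning is worth running with your own numbers. The sketch below uses the request rate and span count from the paragraph above and the $0.20 per 1M events figure from the comparison table; the function names and the 30-day month are assumptions made for this example, and real invoices depend on each vendor's current pricing and free tiers.

```typescript
const SECONDS_PER_DAY = 86_400;

// Spans produced per day at a steady request rate.
function spansPerDay(requestsPerSecond: number, spansPerTrace: number): number {
  return requestsPerSecond * spansPerTrace * SECONDS_PER_DAY;
}

// Monthly cost for a backend that bills per million units (events, spans, or traces).
function monthlyCostUsd(unitsPerDay: number, pricePerMillionUsd: number, days = 30): number {
  return ((unitsPerDay * days) / 1_000_000) * pricePerMillionUsd;
}

// 10,000 req/s x 5 spans x 86,400 s = 4.32 billion spans per day.
const dailySpans = spansPerDay(10_000, 5);

// At $0.20 per 1M events with no sampling: roughly $25,920 per month.
const unsampledCost = monthlyCostUsd(dailySpans, 0.2);

// Keeping roughly 10% of spans: roughly $2,592 per month.
const sampledCost = monthlyCostUsd(dailySpans * 0.1, 0.2);
```

The 10% figure is a lower bound on what a tail-sampling setup keeps, since error and slow traces are retained on top of the random sample.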

How to Choose a Tracing Backend

The decision tree is simpler than vendors want you to think:

Already using Grafana? Use Tempo. It integrates natively with Grafana, stores traces in object storage (cheap), and accepts OTLP directly.
Need powerful ad-hoc querying? Use Honeycomb. Its query engine handles high-cardinality data better than anyone else, and the BubbleUp feature surfaces anomalies automatically.
Want everything in one platform? Datadog or New Relic. You'll pay more, but you get logs, metrics, traces, profiling, and error tracking under one roof.
Running on AWS and want simplicity? X-Ray works fine for basic tracing needs and requires zero infrastructure management.
Want full control and have Elasticsearch? Jaeger is battle-tested and free. But operating Elasticsearch at scale is its own project.

Frequently Asked Questions

What is the difference between OpenTelemetry and OpenTracing?

OpenTracing was the original CNCF tracing standard. OpenCensus was Google's competing project for metrics and tracing. OpenTelemetry merged both projects into a single unified standard. OpenTracing and OpenCensus are deprecated -- all development has moved to OpenTelemetry. If you're on OpenTracing, OTel provides compatibility shims to migrate incrementally.

Does OpenTelemetry add latency to my application?

The overhead is measurable but small. Auto-instrumentation typically adds on the order of 1-3% to request latency, depending on how many spans each request creates. For most services, this is negligible. If you're building ultra-low-latency systems (sub-millisecond), benchmark carefully and disable instrumentations you don't need. The biggest performance concern is usually the exporter -- export spans in batches (the SDK's batch span processor) so your application never blocks on network calls.

Can I use OpenTelemetry with Datadog or New Relic?

Yes. Both Datadog and New Relic accept OTLP data natively. You can instrument with OpenTelemetry SDKs and export directly to their OTLP endpoints or route through the OTel Collector. This gives you vendor-neutral instrumentation while using a commercial backend. If you decide to switch vendors later, you only change the exporter configuration -- no application code changes.

What is tail-based sampling and when should I use it?

Tail-based sampling makes sampling decisions after a trace completes, rather than at the beginning. This lets you keep 100% of error traces and slow traces while sampling routine traffic. Use it when you want high fidelity for anomalies without the cost of storing everything. The OTel Collector supports tail-based sampling natively -- configure it in the processor pipeline.

How do I correlate traces with logs?

Inject the trace ID and span ID into your log context. Most OTel SDKs provide log bridge integrations that do this automatically. In your structured logs, include trace_id and span_id fields. Grafana can then link from a log entry directly to the corresponding trace in Tempo, and vice versa. This correlation is what turns separate pillars into a unified observability experience.
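
If your logger has no OTel integration, the manual version is small. The sketch below shows the shape of a structured log line that carries both IDs; `formatLogLine` and `ActiveTraceContext` are hypothetical names, and the IDs are passed in explicitly to keep the example self-contained. In a real service they would typically come from the active span's context, for example via `trace.getActiveSpan()?.spanContext()` in `@opentelemetry/api`.

```typescript
// The two identifiers needed to jump from a log line to its trace.
interface ActiveTraceContext {
  traceId: string;
  spanId: string;
}

// Builds one JSON log line; trace fields are added only when a trace is active.
function formatLogLine(
  level: string,
  message: string,
  ctx?: ActiveTraceContext,
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    ...(ctx ? { trace_id: ctx.traceId, span_id: ctx.spanId } : {}),
  });
}

// Example IDs reused from the W3C Trace Context specification.
const logLine = formatLogLine(
  "error",
  "payment declined",
  { traceId: "4bf92f3577b34da6a3ce929d0e0e4736", spanId: "00f067aa0ba902b7" },
  new Date("2026-01-01T00:00:00.000Z"),
);
```

Keeping the field names consistent across services (here `trace_id` and `span_id`) is what lets the log backend turn them into links.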

Is OpenTelemetry ready for production?

The tracing SDK and API are stable (GA) across all major languages. Metrics reached GA in 2023. Logs reached GA in late 2024. The Collector is production-ready and used at massive scale. The only area still evolving rapidly is profiling support, which is experimental. For traces and metrics, OTel is production-ready and already runs in production at thousands of companies.

Conclusion

OpenTelemetry has won the instrumentation layer. The days of vendor-specific SDKs are over. Instrument once with OTel, export to any backend, and switch vendors with a configuration change -- not a code change.

Start with auto-instrumentation to get traces flowing immediately. Add manual spans for business-critical operations. Deploy the OTel Collector for sampling, enrichment, and multi-destination export. And choose your backend based on your team's existing stack and query needs, not on vendor sales pitches. The beauty of OTel is that your backend decision is no longer permanent -- you can always change it later.
