Feature Flags: Decoupling Deployment from Release

Feature flags let you deploy code to production without exposing it to users. Learn the different flag types, build a flag system from scratch, compare LaunchDarkly vs Unleash, and manage the technical debt that comes with flag sprawl.

Abhishek Patel · 10 min read

Deployment Is Not Release

Most teams treat deployment and release as the same event. You merge to main, CI builds and ships, and users immediately see the new feature. This coupling is the source of most deployment anxiety. Feature flags break this coupling by letting you deploy code to production without exposing it to users. You ship the code dark, then flip a flag to release it -- on your schedule, to your chosen audience, with an instant kill switch.

This isn't a new concept, but the tooling has matured significantly. What started as simple if-else toggles has evolved into a category of infrastructure with sophisticated targeting, analytics, and lifecycle management.

What Are Feature Flags?

Definition: Feature flags (also called feature toggles or feature switches) are conditional statements in code that control whether a feature is visible or active, without deploying new code. They decouple deployment from release, enabling teams to ship code continuously while controlling feature exposure independently.

Types of Feature Flags

Not all flags serve the same purpose, and treating them all the same leads to technical debt. Here are the four categories you'll encounter:

| Type | Purpose | Lifespan | Example |
| --- | --- | --- | --- |
| Release flags | Control feature rollout | Days to weeks | Enable new checkout flow for 10% of users |
| Experiment flags | A/B testing | Weeks to months | Test two pricing page layouts |
| Ops flags | Operational control | Permanent | Circuit breaker for a third-party API |
| Permission flags | Entitlement gating | Permanent | Premium features for paid users |

Release flags should be short-lived and removed after full rollout. Experiment flags live until the experiment concludes and a winner is chosen. Ops and permission flags are long-lived by design. The problems start when release flags stick around for months because nobody cleaned them up.

Building Feature Flags From Scratch

Before reaching for a third-party service, it's worth understanding what a minimal feature flag system looks like. Here's a practical implementation:

Step 1: Define the Flag Store

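// flags.ts
export interface FlagDefinition {
  key: string;
  enabled: boolean;            // master switch; false turns the flag off for everyone
  rolloutPercentage?: number;  // 0-100
  allowedUserIds?: string[];
  allowedGroups?: string[];
}

// Exported so the evaluator module can import it
export const flags: Record<string, FlagDefinition> = {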
  'new-checkout': {
    key: 'new-checkout',
    enabled: true,
    rolloutPercentage: 10,
  },
  'dark-mode': {
    key: 'dark-mode',
    enabled: true,
    allowedGroups: ['beta-testers'],
  },
  'maintenance-mode': {
    key: 'maintenance-mode',
    enabled: false,
  },
};

Step 2: Build the Evaluation Logic

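// evaluator.ts
import { createHash } from 'crypto';
import { flags } from './flags'; // flags.ts must export the flag map

interface UserContext {
  userId: string;
  groups?: string[];
}

export function isEnabled(flagKey: string, user?: UserContext): boolean {
  const flag = flags[flagKey];
  // Unknown or disabled flags are always off
  if (!flag || !flag.enabled) return false;

  // Check user allowlist
  if (user && flag.allowedUserIds?.includes(user.userId)) return true;

  // Check group membership
  const userGroups = user?.groups ?? [];
  if (flag.allowedGroups?.some(g => userGroups.includes(g))) return true;

  // Percentage rollout: hash flag key + user ID into a stable bucket (0-99).
  // MD5 is used for even distribution, not for security.
  if (flag.rolloutPercentage !== undefined && user?.userId) {
    const hash = createHash('md5')
      .update(flagKey + user.userId)
      .digest('hex');
    const bucket = parseInt(hash.substring(0, 8), 16) % 100;
    return bucket < flag.rolloutPercentage;
  }

  // No rule matched. Only a flag with no targeting rules at all is on for
  // everyone; a flag with allowlist or group rules stays off for users who
  // match none of them.
  const hasTargeting =
    flag.allowedUserIds !== undefined ||
    flag.allowedGroups !== undefined ||
    flag.rolloutPercentage !== undefined;
  return !hasTargeting;
}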

Step 3: Use It in Application Code

// In your route handler or component
if (isEnabled('new-checkout', { userId: req.user.id, groups: req.user.groups })) {
  return renderNewCheckout();
} else {
  return renderLegacyCheckout();
}

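Pro tip: Use deterministic hashing of the flag key combined with the user ID for percentage rollouts. The same user always lands in the same bucket, so they get the same experience on every page load instead of switching between old and new versions. Because a user is included whenever their bucket falls below the threshold, raising the rollout from 10% to 20% adds new users without changing the experience for the original 10%. Including the flag key in the hash also means different flags roll out to different subsets of users.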
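
To make that property concrete, here is a minimal sketch that reuses the bucketing scheme from Step 2. The helper names `bucketFor` and `inRollout` and the generated user IDs are illustrative assumptions, not part of any SDK.

```typescript
import { createHash } from 'crypto';

// Map (flagKey, userId) to a stable bucket in the range 0-99.
// MD5 is used only for its even distribution, not for security.
function bucketFor(flagKey: string, userId: string): number {
  const hash = createHash('md5').update(flagKey + userId).digest('hex');
  return parseInt(hash.substring(0, 8), 16) % 100;
}

// A user is in the rollout when their bucket falls below the threshold.
function inRollout(flagKey: string, userId: string, percentage: number): boolean {
  return bucketFor(flagKey, userId) < percentage;
}

// Hypothetical user IDs, for illustration only.
const userIds = Array.from({ length: 1000 }, (_, i) => `user-${i}`);

const at10 = userIds.filter(id => inRollout('new-checkout', id, 10));
const at20 = userIds.filter(id => inRollout('new-checkout', id, 20));

// Everyone enabled at 10% is still enabled at 20%.
const stillEnabled = at10.every(id => at20.indexOf(id) !== -1);
console.log({ at10: at10.length, at20: at20.length, stillEnabled });
```

The cohort sizes will only be approximately 10% and 20% of the sample, since the split depends on how the hashes happen to fall.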

LaunchDarkly vs Unleash vs Open-Source Alternatives

Building your own flag system works for simple use cases, but it breaks down when you need audit trails, gradual rollouts with metrics, multi-environment support, and a management UI. Here's how the major options compare:

| Feature | LaunchDarkly | Unleash | Flagsmith | DIY |
| --- | --- | --- | --- | --- |
| Pricing | From $10/seat/month | Free (OSS) / Pro from $80/month | Free (OSS) / Cloud from $45/month | Engineering time |
| SDKs | 25+ languages | 15+ languages | 15+ languages | Build your own |
| Targeting | Advanced (segments, rules, prereqs) | Good (strategies, constraints) | Good (segments, rules) | Basic |
| A/B testing | Built-in with Experimentation add-on | Via metrics API | Built-in | Not included |
| Audit log | Full history | Event log | Audit log | Build your own |
| Evaluation speed | <1 ms (local cache) | <1 ms (local cache) | <1 ms (local cache) | Depends on implementation |
| Self-hosted | No (SaaS only) | Yes | Yes | Yes |

My recommendation: If you're a startup or small team, start with Unleash's open-source edition. It covers 90% of what you need, it's self-hosted so there's no per-seat cost, and it has solid SDK support. If you're at a larger company that needs enterprise features (SSO, approval workflows, detailed audit trails), LaunchDarkly is the gold standard but you'll pay for it.

Targeting Rules and Segmentation

Simple boolean flags are just the beginning. Real feature flag systems support targeting rules that determine which users see which variation:

Common Targeting Strategies

  1. Percentage rollout -- show the feature to X% of users, consistently hashed so the same user always sees the same variation.
  2. User allowlist -- explicitly enable for specific user IDs (useful for internal testing in production).
  3. Group/segment targeting -- enable for users matching specific attributes (plan tier, region, company, beta program membership).
  4. Environment targeting -- different flag states per environment (enabled in staging, disabled in production).
  5. Time-based targeting -- schedule a flag to enable at a specific date/time (product launches, marketing campaigns).

These strategies are usually combined in a single flag definition. The illustrative configuration below serves fixed variations to two segments and splits everyone else by weight:

{
  "key": "new-pricing-page",
  "variations": [
    { "value": "control", "name": "Original pricing" },
    { "value": "variant-a", "name": "Simplified tiers" },
    { "value": "variant-b", "name": "Usage-based pricing" }
  ],
  "rules": [
    {
      "if": { "attribute": "email", "op": "endsWith", "value": "@ourcompany.com" },
      "serve": "variant-a"
    },
    {
      "if": { "attribute": "plan", "op": "in", "value": ["enterprise"] },
      "serve": "control"
    }
  ],
  "fallthrough": {
    "rollout": [
      { "variation": "control", "weight": 80 },
      { "variation": "variant-a", "weight": 10 },
      { "variation": "variant-b", "weight": 10 }
    ]
  }
}
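
A definition like the one above needs an evaluator that checks rules in order and falls back to the weighted rollout. The following is a rough sketch under stated assumptions: the type names, the `evaluate` function, and the two supported operators (`endsWith`, `in`) are invented here to match the example, and the `variations` metadata is omitted. It also assumes a non-empty rollout with positive integer weights.

```typescript
import { createHash } from 'crypto';

type Attributes = Record<string, string>;

interface Condition {
  attribute: string;
  op: 'endsWith' | 'in';
  value: string | string[];
}
interface Rule { if: Condition; serve: string; }
interface WeightedVariation { variation: string; weight: number; }
interface MultivariateFlag {
  key: string;
  rules: Rule[];
  fallthrough: { rollout: WeightedVariation[] };
}

function matches(cond: Condition, attrs: Attributes): boolean {
  const actual = attrs[cond.attribute];
  if (actual === undefined) return false;
  switch (cond.op) {
    case 'endsWith':
      return typeof cond.value === 'string' && actual.endsWith(cond.value);
    case 'in':
      return Array.isArray(cond.value) && cond.value.indexOf(actual) !== -1;
    default:
      return false;
  }
}

function evaluate(flag: MultivariateFlag, userId: string, attrs: Attributes): string {
  // Rules are checked in order; the first match wins.
  for (const rule of flag.rules) {
    if (matches(rule.if, attrs)) return rule.serve;
  }
  // No rule matched: pick a variation from the weighted rollout,
  // using a stable per-user bucket so assignments do not change between calls.
  const rollout = flag.fallthrough.rollout;
  const total = rollout.reduce((sum, r) => sum + r.weight, 0);
  const hash = createHash('md5').update(flag.key + userId).digest('hex');
  const bucket = parseInt(hash.substring(0, 8), 16) % total;
  let upper = 0;
  for (const { variation, weight } of rollout) {
    upper += weight;
    if (bucket < upper) return variation;
  }
  // Not reached when weights are positive; kept as a defensive default.
  return rollout[0].variation;
}

const pricingFlag: MultivariateFlag = {
  key: 'new-pricing-page',
  rules: [
    { if: { attribute: 'email', op: 'endsWith', value: '@ourcompany.com' }, serve: 'variant-a' },
    { if: { attribute: 'plan', op: 'in', value: ['enterprise'] }, serve: 'control' },
  ],
  fallthrough: {
    rollout: [
      { variation: 'control', weight: 80 },
      { variation: 'variant-a', weight: 10 },
      { variation: 'variant-b', weight: 10 },
    ],
  },
};

console.log(evaluate(pricingFlag, 'u1', { email: 'dev@ourcompany.com' })); // variant-a
console.log(evaluate(pricingFlag, 'u2', { plan: 'enterprise' }));          // control
```

Rule order matters in this sketch: an internal user on an enterprise plan matches the email rule first and is served variant-a.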

The Technical Debt Problem

Feature flags create technical debt by design. Every boolean flag adds a conditional branch to your code, and each independent flag doubles the number of possible flag states: n flags give 2^n combinations, so ten flags means 1,024 theoretical combinations. This isn't hypothetical -- I've seen production bugs caused by two flags interacting in ways nobody anticipated.
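
The growth is easy to verify by enumerating the states. This is a small sketch with a hypothetical helper, `allFlagCombinations`, which could also feed a test harness for a handful of interacting flags. It relies on 32-bit bit operations, so it only suits small flag counts.

```typescript
// Enumerate every on/off assignment for a set of boolean flags.
function allFlagCombinations(keys: string[]): Record<string, boolean>[] {
  const combos: Record<string, boolean>[] = [];
  const total = 2 ** keys.length;
  for (let mask = 0; mask < total; mask++) {
    const combo: Record<string, boolean> = {};
    keys.forEach((key, i) => {
      // Bit i of the mask decides whether flag i is on in this combination.
      combo[key] = (mask & (1 << i)) !== 0;
    });
    combos.push(combo);
  }
  return combos;
}

// Hypothetical flag keys, for illustration only.
const tenFlags = Array.from({ length: 10 }, (_, i) => `flag-${i}`);
console.log(allFlagCombinations(tenFlags).length); // 1024
```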

Signs of Flag Sprawl

  • Flags that have been "temporary" for six months
  • Nobody knows who owns a flag or whether it's safe to remove
  • Bugs that only reproduce with specific flag combinations
  • New engineers are afraid to touch flagged code because they don't understand all the variations

How to Manage Flag Lifecycle

  1. Assign an owner and expiration date to every release flag when it's created. Put it in the flag metadata.
  2. Set up automated alerts for flags past their expiration date. Unleash and LaunchDarkly both support this.
  3. Remove flags aggressively. Once a feature is fully rolled out (100% for 2+ weeks with no issues), remove the flag and dead code path in the next sprint.
  4. Track flag count as a metric. If your active flag count keeps growing, your cleanup process is broken.
  5. Run flag linting in CI. Write a linter that detects references to flags that have been removed from your flag service, catching dead code automatically.
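
Steps 1, 2, and 5 can be sketched in a few lines. Everything here is an assumption for illustration: the `FlagMetadata` shape, the owners and dates, and a lint check that simply scans source text for `isEnabled('...')` calls as used in the earlier steps. A real setup would read the registry from your flag service.

```typescript
interface FlagMetadata {
  key: string;
  kind: 'release' | 'experiment' | 'ops' | 'permission';
  owner: string;
  expiresAt?: string; // ISO date; omitted for long-lived ops and permission flags
}

// Flags whose expiration date has passed and that should trigger an alert.
function findExpiredFlags(registry: FlagMetadata[], now: Date): FlagMetadata[] {
  return registry.filter(
    f => f.expiresAt !== undefined && new Date(f.expiresAt).getTime() < now.getTime(),
  );
}

// Flag keys referenced in source text that no longer exist in the registry.
function findUnknownFlagReferences(source: string, registry: FlagMetadata[]): string[] {
  const known = new Set(registry.map(f => f.key));
  const pattern = /isEnabled\(\s*['"]([^'"]+)['"]/g;
  const unknown = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    if (!known.has(match[1])) unknown.add(match[1]);
  }
  return Array.from(unknown);
}

// Hypothetical registry entries.
const registry: FlagMetadata[] = [
  { key: 'new-checkout', kind: 'release', owner: 'payments-team', expiresAt: '2025-01-15' },
  { key: 'pricing-test', kind: 'experiment', owner: 'growth-team', expiresAt: '2025-12-01' },
  { key: 'maintenance-mode', kind: 'ops', owner: 'platform-team' },
];

const sampleSource = `
  if (isEnabled('new-checkout', user)) { renderNewCheckout(); }
  if (isEnabled("old-banner", user)) { renderBanner(); }
`;

console.log(findExpiredFlags(registry, new Date('2025-06-01')).map(f => f.key));
console.log(findUnknownFlagReferences(sampleSource, registry));
```

A regex scan like this misses flag keys passed through variables or constants, so treat it as a starting point for a CI check.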

Watch out: Never nest feature flags. If flag A controls a feature that contains code gated by flag B, you've created a combinatorial testing nightmare. If you find yourself nesting flags, it's a sign that you need to break the feature into smaller, independently releasable pieces.

Feature Flags in CI/CD Pipelines

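Feature flags integrate into your CI/CD workflow at multiple points. The most valuable one is the test stage: run the suite against several flag configurations so that both code paths stay covered. In this example the matrix passes a FLAG_OVERRIDE variable that your test setup is expected to interpret: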

# GitHub Actions example
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        flag-state: [all-on, all-off, production]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
        env:
          FLAG_OVERRIDE: ${{ matrix.flag-state }}
          # all-on: test that new code works
          # all-off: test that old code still works
          # production: test current production state

Testing with flags in both states catches regressions in both the old and new code paths. This is critical -- when you eventually remove the flag, you need confidence that the winning path works independently.
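
The workflow only sets an environment variable, so the application or test setup has to act on it. One possible way to honor `FLAG_OVERRIDE` is sketched below. The names `parseOverride` and `withOverride` and the stand-in evaluator are hypothetical.

```typescript
type FlagOverride = 'all-on' | 'all-off' | 'production';

// Parse the FLAG_OVERRIDE value set by the CI matrix.
// Missing or unknown values fall back to 'production' (normal evaluation).
function parseOverride(raw: string | undefined): FlagOverride {
  return raw === 'all-on' || raw === 'all-off' ? raw : 'production';
}

// Wrap a normal evaluator so tests can force every flag on or off.
function withOverride(
  evaluateFlag: (flagKey: string) => boolean,
  override: FlagOverride,
): (flagKey: string) => boolean {
  if (override === 'all-on') return () => true;
  if (override === 'all-off') return () => false;
  return evaluateFlag;
}

// Hypothetical stand-in for the real evaluator and its production state.
const productionState: Record<string, boolean> = {
  'new-checkout': true,
  'dark-mode': false,
};
const evaluateNormally = (flagKey: string): boolean => productionState[flagKey] ?? false;

const isEnabledInTests = withOverride(
  evaluateNormally,
  parseOverride(process.env.FLAG_OVERRIDE),
);
console.log(isEnabledInTests('dark-mode'));
```

Note that forcing every flag on also enables ops flags such as a maintenance mode, so a real harness may want to limit the override to release flags.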

Frequently Asked Questions

What is the difference between a feature flag and a feature toggle?

They're the same thing -- different names for the same concept. "Feature flag" is more common in North American companies, while "feature toggle" is the term used in the widely cited articles on Martin Fowler's site. Some teams use "toggle" for simple on/off switches and "flag" for flags with targeting rules or multiple variations, but this distinction isn't standardized.

Do feature flags add latency to my application?

Negligibly, if implemented correctly. Server-side SDKs for the major platforms (LaunchDarkly, Unleash, Flagsmith) can cache flag definitions locally and evaluate them in memory, typically in under 1 millisecond, while syncing with the server in the background. In that mode you're not making a network call on every flag check. Check your SDK's defaults, though: some SDKs or modes evaluate remotely unless local evaluation is enabled. A naive implementation that queries a database on every evaluation would add latency, but production-grade systems avoid that.

How many feature flags should a team have active at once?

There's no universal number, but a healthy range for a team of 5-10 engineers is 10-30 active flags. More than 50 active flags is a red flag (pun intended) that cleanup isn't happening. Track the ratio of flags created to flags removed each month. If creation consistently outpaces removal, you have a process problem that needs attention.

Should feature flags be stored in code or in an external service?

Short-lived release flags belong in an external service where they can be toggled without a deployment. Long-lived ops flags (circuit breakers, kill switches) can live in code or config files if you need them to work even when the flag service is down. The hybrid approach works well: use an external service for targeting and rollout control, but hardcode critical operational flags as fallbacks.

How do feature flags work with trunk-based development?

Feature flags are what make trunk-based development practical. Instead of long-lived feature branches, developers commit directly to main with new functionality hidden behind flags. The code ships to production continuously but isn't visible to users until the flag is enabled. This eliminates merge conflicts from long-lived branches and enables true continuous delivery.

What happens if my feature flag service goes down?

All major SDKs cache the last-known flag state in memory, so a running application continues working with the most recently fetched configuration. Some SDKs also support bootstrapping from a local file as a backup, which matters for processes that start or restart during an outage and have no cache yet; without it they fall back to whatever default values the application supplies. New flag changes won't propagate until the service recovers, but existing behavior is preserved.
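
The fallback order described here (cached values first, then hardcoded defaults) can be sketched as a thin wrapper. This is not any vendor's SDK: `ResilientFlagClient`, its `fetchFlags` callback, and the flag values are assumptions made for illustration.

```typescript
type FlagValues = Record<string, boolean>;

class ResilientFlagClient {
  private cache: FlagValues | null = null;
  private readonly fetchFlags: () => Promise<FlagValues>;
  private readonly defaults: FlagValues;

  constructor(fetchFlags: () => Promise<FlagValues>, defaults: FlagValues) {
    this.fetchFlags = fetchFlags;
    this.defaults = defaults;
  }

  // Refresh the cache; on failure keep the last-known values.
  async refresh(): Promise<boolean> {
    try {
      this.cache = await this.fetchFlags();
      return true;
    } catch {
      return false;
    }
  }

  // Evaluate from memory only: cache first, then hardcoded defaults, then off.
  isEnabled(flagKey: string): boolean {
    return this.cache?.[flagKey] ?? this.defaults[flagKey] ?? false;
  }
}

async function demo(): Promise<void> {
  let serviceUp = true;
  const client = new ResilientFlagClient(
    async () => {
      if (!serviceUp) throw new Error('flag service unavailable');
      return { 'new-checkout': true };
    },
    { 'maintenance-mode': false }, // hardcoded operational fallback
  );

  await client.refresh();  // succeeds and fills the cache
  serviceUp = false;
  await client.refresh();  // fails, the cache is preserved
  console.log(client.isEnabled('new-checkout')); // true
}

demo().catch(console.error);
```

In a long-running service, `refresh` would typically run on a timer or be driven by a streaming connection, so flag checks never wait on the network.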

Can feature flags be used for A/B testing?

Yes, and this is one of their strongest use cases. Multi-variate flags assign users to different variations (control, variant A, variant B) using deterministic hashing, so each user stays in the same variation for the duration of the experiment. You measure business metrics per variation and pick the winner. LaunchDarkly's Experimentation add-on and Flagsmith both offer built-in statistical analysis. For basic A/B tests, a percentage rollout flag with analytics tracking works fine.

Conclusion

Feature flags are the bridge between continuous deployment and controlled releases. They let you ship code daily while releasing features on a product schedule. The technical implementation is straightforward -- it's the discipline around lifecycle management that separates teams that benefit from flags from teams that drown in them.

Start with release flags on your next feature. Deploy the code dark, validate in production with internal users, then roll out to 10%, 50%, 100%. Remove the flag within two weeks of full rollout. Get that cycle working before adding experiment flags or complex targeting rules. The tooling only helps if the habits are in place.

Written by

Abhishek Patel

Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.
