Feature flags and trunk-based development

Trunk-based development without feature flags collapses back into long-lived branches within a quarter. Flags are how you ship unfinished code without users seeing it. This guide covers the patterns, the types of flags, and how to keep flag debt from eating your config.

In one paragraph

Feature flags decouple "code is in main" from "users see this feature." That decoupling is what lets trunk-based development keep branches short even when features take weeks to build. The hard part is not adding flags; it is removing them. Most teams accumulate flag debt because the work to retire a flag falls off the backlog the moment the feature is live.

Why trunk-based development needs flags

Without flags, an engineer working on a multi-week feature has two options. Either they hold the work locally on a long-lived branch (the failure mode trunk-based development was designed to fix), or they refuse to merge anything until everything is done (same problem, different appearance).

With flags, a third option appears: merge each piece of the feature to main as soon as it is reviewable, behind a flag that is off in production. The branch closes in hours. The feature ships in weeks. The two timelines stop fighting each other.
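A minimal sketch of what "merged behind a flag that is off" can look like in code. The `FlagStore` class, the `new_billing_flow` flag name, and the invoice functions are all hypothetical, chosen only for illustration; a real flag library will have its own API.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store; a real system would load this from config."""
    enabled: set[str] = field(default_factory=set)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so unfinished code stays dark.
        return name in self.enabled


def legacy_invoice_total(amount_cents: int) -> int:
    """Existing code path that users see today."""
    return amount_cents


def new_invoice_total(amount_cents: int) -> int:
    """Unfinished path, already merged to main but hidden behind the flag."""
    tax = amount_cents * 20 // 100  # illustrative flat 20% tax, not a real rule
    return amount_cents + tax


def invoice_total(flags: FlagStore, amount_cents: int) -> int:
    # The flag check is the only place where old and new code meet.
    if flags.is_enabled("new_billing_flow"):
        return new_invoice_total(amount_cents)
    return legacy_invoice_total(amount_cents)
```

With an empty `FlagStore()` (the production default in this sketch), `invoice_total` keeps returning the legacy result, so each partial merge is invisible to users until someone turns the flag on.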

A typical rollout

Code merged (flag OFF) → Internal users (flag ON) → 1% of traffic → 10% of traffic → 100% of traffic → Flag deleted

Code merges with the flag off. Internal users flip it on. Production traffic ramps in stages until 100%. The flag gets deleted. Total time: typically two to six weeks.

The flag is off when the code lands. Engineers and internal users flip it on for themselves to dogfood. Once the feature is reasonably stable, the flag rolls out to a small fraction of production traffic, then a larger fraction, then all of it. When the rollout is complete and rollback is no longer plausible, the flag and the old code path get deleted.

That last step is the one teams skip.
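The staged ramp described above is often implemented with deterministic bucketing: hash the flag name and user ID, then compare the result to the current rollout percentage. The sketch below is one way to do it, not how any particular flag service works; the function names and the `internal_users` parameter are assumptions for illustration.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..9999."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_enabled(
    flag_name: str,
    user_id: str,
    percent: float,
    internal_users: frozenset[str] = frozenset(),
) -> bool:
    """Return True if the flag is on for this user at the given rollout percentage."""
    # Internal users dogfood the feature before any production traffic sees it.
    if user_id in internal_users:
        return True
    # Each percent covers 100 buckets, so 1% means buckets 0..99 are enabled.
    return bucket(flag_name, user_id) < percent * 100
```

Because a user's bucket never changes, anyone who saw the feature at 1% still sees it at 10% and at 100%. Including the flag name in the hash keeps the same users from always landing in the early cohort of every rollout.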

The four kinds of flags

Pete Hodgson's feature toggle taxonomy on martinfowler.com is the standard reference. Four kinds, with very different lifetimes and very different retirement strategies.

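| Type              | Purpose                          | Lifetime        | Retirement                  |
|-------------------|----------------------------------|-----------------|-----------------------------|
| Release toggle    | Hide unfinished work             | Days to weeks   | Delete after rollout        |
| Experiment toggle | A/B test variants                | Weeks to months | Delete when experiment ends |
| Ops toggle        | Kill switches, circuit breakers  | Long-lived      | Stays as ops control        |
| Permission toggle | Per-customer entitlements        | Indefinite      | Becomes permanent config    |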

Trunk-based development uses the first kind heavily and the second kind regularly. The third and fourth are valid but separate concerns, and they have to be tracked separately or they pollute the codebase with what looks like flag debt.
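One way to track the four kinds separately is to record the kind on every flag and let tooling decide which ones are expected to disappear. The sketch below encodes the table above; the names `FlagKind`, `TEMPORARY_KINDS`, and `counts_as_flag_debt` are hypothetical.

```python
from enum import Enum


class FlagKind(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPS = "ops"
    PERMISSION = "permission"


# Kinds whose retirement strategy in the table is "delete".
TEMPORARY_KINDS = frozenset({FlagKind.RELEASE, FlagKind.EXPERIMENT})


def counts_as_flag_debt(kind: FlagKind, work_finished: bool) -> bool:
    """A flag is debt only if it is a temporary kind whose rollout or experiment is over."""
    return kind in TEMPORARY_KINDS and work_finished
```

Under this rule a years-old kill switch never shows up in a flag debt report, while a release toggle that has reached 100% does.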

Flag debt and how to avoid it

Release toggles are supposed to be temporary. In practice, the work to retire one falls off the backlog the moment the feature is live. The flag stays. New code gets written behind if (newBillingFlow) branches. Old code paths stay alive forever in case anyone ever flips the flag back. The codebase grows two implementations of every feature with a flag deciding which one is real.

The patterns that prevent it:

  • Expiry dates on every release flag. The flag library tracks when each flag was created. Anything past 90 days fails CI or shows up on a weekly report. Treat the flag as a TODO with a deadline.
  • Retirement PR opens at the same time as the rollout PR. When the flag goes to 100%, the PR that deletes the flag is already drafted, sitting in review.
  • One owner per flag. If nobody owns the flag, nobody retires it. The owner is on the hook for cleanup, same as any code.
  • Categorize at creation. Release vs experiment vs ops vs permission goes into the flag definition. Release flags get the deletion clock. Permission flags do not.
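The expiry, ownership, and categorization rules above can be enforced with a small check that runs in CI or feeds a weekly report. This is a sketch under assumptions: the `FlagDefinition` fields and the `check` function are hypothetical, and a real setup would load the definitions from wherever the flags are declared.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# The 90-day limit comes from the policy described above.
MAX_RELEASE_FLAG_AGE = timedelta(days=90)


@dataclass(frozen=True)
class FlagDefinition:
    name: str
    kind: str      # "release", "experiment", "ops", or "permission"
    owner: str     # empty string means nobody owns the flag
    created: date


def check(flags: list[FlagDefinition], today: date) -> list[str]:
    """Return one message per violation; an empty list means the check passes."""
    problems = []
    for flag in flags:
        if not flag.owner:
            problems.append(f"{flag.name}: no owner")
        # Only release flags get the deletion clock.
        if flag.kind == "release" and today - flag.created > MAX_RELEASE_FLAG_AGE:
            age = (today - flag.created).days
            owner = flag.owner or "nobody"
            problems.append(f"{flag.name}: release flag is {age} days old (owner: {owner})")
    return problems
```

A CI job could call `check(...)` with `date.today()` and exit non-zero when the list is not empty. A weekly report could print the same messages without failing the build.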

Tooling, briefly

Three buckets, depending on team size and tolerance for managed services.

  • In-house, config-file based. A YAML or JSON file in the repo, read at startup, with simple targeting. Free. Works for small teams. Updates require a deploy.
  • In-house, runtime-evaluated. A small service that reads from a database and evaluates rules. Targeting by user, percentage, environment. Slightly more work, no deploy needed to flip a flag.
  • Managed (LaunchDarkly, Statsig, GrowthBook, Unleash). Targeting, analytics, audit logs, kill switches. Real money once the team grows.
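For the first bucket, a config-file setup can be very small. The sketch below assumes a hypothetical JSON layout in which each flag has a `default` and optional per-environment overrides. The layout, the flag names, and the `load_flags` function are illustrative and do not follow any standard format.

```python
import json


def load_flags(text: str, environment: str) -> dict[str, bool]:
    """Parse a JSON flag document and resolve every flag for one environment."""
    raw = json.loads(text)
    resolved = {}
    for name, spec in raw.items():
        overrides = spec.get("environments", {})
        # An environment override wins; otherwise fall back to the default (off).
        resolved[name] = bool(overrides.get(environment, spec.get("default", False)))
    return resolved


# Example file contents; in practice this would be read from the repo at startup.
EXAMPLE = """
{
  "new_billing_flow": {"default": false, "environments": {"staging": true}},
  "dark_mode": {"default": true}
}
"""
```

Here `load_flags(EXAMPLE, "production")` leaves `new_billing_flow` off while `load_flags(EXAMPLE, "staging")` turns it on. Because the file is read once at startup, changing a value requires a deploy, which is the trade-off noted above.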

The choice matters less than the discipline around retirement. A team with a homegrown flag file and a strict 90-day expiry policy ends up with cleaner code than a team using LaunchDarkly with no retirement process.

Flags ship code dark. The merge queue keeps main green.

Trunk-based development needs both. Mergify is the merge queue piece, and it works with whichever feature flag system you already use.