RSpec randomized order is the messenger, not the bug

Why a green spec on --order defined fails on --order random with a different seed every CI run, how to bisect to the dependent pair, and the cleanup hooks that fix the underlying coupling.

A Rails app has 4,000 RSpec examples. The suite has been green for months. Today’s CI run fails on OrderProcessor#calculates_tax, a spec that has not changed in two years. Tomorrow’s run fails on a different spec. Next week’s run is green again. Welcome to randomized order surfacing implicit dependencies.

We see this pattern often enough on Mergify Test Insights that it earned its own slot in our flaky RSpec catalog. The cause is hidden coupling between specs. Random order is the diagnostic, not the disease.

What you see

Many Rails projects turn on random ordering in .rspec:

--order random

Most teams keep it on. RSpec picks a new seed each run (Randomized with seed 12345) and shuffles the spec order for that run. A spec that mutates a class variable, registers a Sidekiq worker, or stubs a constant without resetting it leaves that mutation for whatever spec runs next. With a different seed, the dependent pair runs in a different order, and the failure jumps to a new spec.

# spec/models/user_spec.rb
RSpec.describe User do
  it "registers the welcome callback" do
    User.register_callback(:welcome) { |u| WelcomeMailer.deliver(u) }
    expect(User.callbacks).to include(:welcome)
  end
end

# spec/models/order_spec.rb (runs after user_spec.rb on this seed)
RSpec.describe Order do
  it "creates the order without notifications" do
    allow(WelcomeMailer).to receive(:deliver) # spy on the mailer so have_received works
    user = create(:user) # User.callbacks still contains :welcome from the previous spec
    Order.create!(user: user)
    expect(WelcomeMailer).not_to have_received(:deliver) # fails
  end
end

The user spec leaves the callback registered. The order spec creates a user, the callback fires, the mailer gets called, and the assertion fails. The order spec did nothing wrong. The user spec did nothing visibly wrong either: its assertion passed.
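The same coupling can be modeled in plain Ruby, without Rails or RSpec. In this sketch, CallbackRegistry, DELIVERIES, and run_in_order are hypothetical stand-ins for User.callbacks, the mailer, and one CI run. The point is only that the same two checks pass or fail depending on which one runs first.

```ruby
# Hypothetical stand-in for User's class-level callback registry.
class CallbackRegistry
  @callbacks = {}

  class << self
    attr_reader :callbacks

    def register(name, &block)
      @callbacks[name] = block
    end

    # Fires every registered callback, as creating a user would.
    def fire(record)
      @callbacks.each_value { |cb| cb.call(record) }
    end
  end
end

# Records "sent" mail instead of delivering it.
DELIVERIES = []

# "User spec": registers a callback and never removes it.
user_spec = lambda do
  CallbackRegistry.register(:welcome) { |u| DELIVERIES << u }
  CallbackRegistry.callbacks.key?(:welcome)
end

# "Order spec": creates a user and expects no mail to go out.
order_spec = lambda do
  DELIVERIES.clear
  CallbackRegistry.fire("alice")
  DELIVERIES.empty?
end

# One simulated CI run: fresh process state, then the specs in the given order.
def run_in_order(specs)
  CallbackRegistry.callbacks.clear
  specs.map(&:call)
end

run_in_order([order_spec, user_spec]) # => [true, true]
run_in_order([user_spec, order_spec]) # => [true, false]
```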

Reproduce with the failing seed

The first thing the failure log tells you is the seed:

Randomized with seed 47291

Re-run with that seed locally:

bundle exec rspec --seed 47291

Now the failure is deterministic. You can stop here and start guessing, or you can let RSpec narrow the dependency to its minimal pair:

bundle exec rspec --seed 47291 --bisect

--bisect runs the spec in increasingly smaller subsets to find the minimal set of specs that, run together in order, reproduces the failure. For a 4,000-spec suite, this typically takes 5-15 minutes and produces output like:

The minimal reproduction command is:
  rspec ./spec/models/user_spec.rb[1:1] ./spec/models/order_spec.rb[1:1] --seed 47291

Now you know exactly which two specs are coupled.
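At its core, --bisect searches for a minimal subset of the examples that ran before the failure. The following is a simplified sketch of that search, not RSpec's actual implementation. The names minimize and the reproduces block are hypothetical. In the real tool, the block corresponds to running the subset plus the failing spec with the same seed and checking whether it still fails.

```ruby
# Simplified sketch of --bisect's goal, not RSpec's implementation.
# `candidates` are the examples that ran before the failing one, in seed order.
# The block stands in for "run this subset plus the failing spec with the same
# seed" and returns true if the failure still reproduces.
def minimize(candidates, &reproduces)
  raise ArgumentError, "the full set does not reproduce" unless reproduces.call(candidates)

  current = candidates.dup

  # Phase 1: keep halving while one half alone still reproduces the failure.
  while current.size >= 2
    mid = current.size / 2
    left = current[0...mid]
    right = current[mid..]
    if reproduces.call(left)
      current = left
    elsif reproduces.call(right)
      current = right
    else
      break # the culprits are split across both halves
    end
  end

  # Phase 2: drop examples one at a time if the failure survives without them.
  current.dup.each do |example|
    trial = current - [example]
    current = trial if reproduces.call(trial)
  end
  current
end

# Hypothetical coupling: the target fails only if spec_7 and spec_15 ran first.
examples = (1..20).map { |i| "spec_#{i}" }
minimize(examples) { |subset| subset.include?("spec_7") && subset.include?("spec_15") }
# => ["spec_7", "spec_15"]
```

Halving first keeps the number of runs roughly logarithmic when a single spec is to blame. The one-at-a-time pass handles couplings that span both halves, at the cost of one run per remaining example. Every call to the block is a real spec run, which is why bisecting a large suite takes minutes rather than seconds.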

The naive fix and why it is incomplete

Two common workarounds, neither of which scales:

Pin the seed:

--order random
--seed 1

The suite runs in the same order every time. The flake disappears because the dependent pair stops running in the failing order. The hidden coupling is still there, undetected. Because the shuffle is derived from the seed and each example's ID, a later change that adds, moves, or renames a spec can put the pair back in the failing order without anyone touching either spec.
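To see why a pinned seed replays the same order, consider how seeded ordering works. As far as we know, RSpec 3's random ordering sorts items by a hash of the seed combined with each item's ID, applied separately to top-level groups and to the examples inside each group. The sketch below is a simplified, single-level version that substitutes SHA-256 from Ruby's standard library for RSpec's internal hash. The name seeded_order is hypothetical.

```ruby
require "digest"

# Sketch: order example IDs by a hash of the seed plus each ID.
# SHA-256 is a stand-in; RSpec uses its own internal hash function.
def seeded_order(ids, seed)
  ids.sort_by { |id| Digest::SHA256.hexdigest("#{seed}#{id}") }
end

ids = [
  "./spec/models/user_spec.rb[1:1]",
  "./spec/models/order_spec.rb[1:1]",
  "./spec/models/invoice_spec.rb[1:1]"
]

# A pinned seed replays the same order on every run.
seeded_order(ids, 47291) == seeded_order(ids, 47291) # => true

# The pair's relative order depends only on the seed and the two IDs,
# so adding an unrelated spec does not change it.
pair = ids.first(2)
with_extra = seeded_order(ids + ["./spec/models/refund_spec.rb[1:1]"], 47291)
(with_extra & pair) == seeded_order(pair, 47291) # => true
```

Under this scheme, changing a spec's ID (by renaming its file or inserting an example above it in the same group) changes its sort key, which is how a pinned suite can drift back into the failing order.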

Switch back to defined order:

RSpec.configure do |config|
  config.order = :defined
end

Or per example group, via metadata:

RSpec.describe Order, order: :defined do
  # ...
end

Minitest ships the same escape hatch under an intentionally hostile name, i_suck_and_my_tests_are_order_dependent!, and the message applies equally here: the only honest reason to turn randomization off is that you know the suite has a coupling bug you are not ready to fix. Both workarounds buy time. Neither prevents the next dependent pair from forming.

The fix that holds

Reset whatever the spec mutated. How you reset it depends on the kind of state:

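Class-level callbacks, Sidekiq test queues, registered listeners:

require "sidekiq/testing" # defines Sidekiq::Worker.clear_all for the fake queues

RSpec.configure do |config|
  config.before(:each) { User.callbacks.clear }
  config.before(:each) { Sidekiq::Worker.clear_all } # drops jobs enqueued by earlier specs
end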
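As global state accumulates, maintaining one before(:each) hook per registry becomes tedious. One possible pattern is to have each piece of global state register its own reset and to call them all from a single hook. The sketch below is in plain Ruby. GlobalState, Listeners, and run_with_reset are hypothetical names. run_with_reset only mimics what config.before(:each) { GlobalState.reset! } would do in spec_helper.rb.

```ruby
# Hypothetical registry: each piece of process-global state registers how to reset itself.
module GlobalState
  RESETS = []

  def self.on_reset(&block)
    RESETS << block
  end

  # In a real suite, call this from config.before(:each).
  def self.reset!
    RESETS.each(&:call)
  end
end

# Hypothetical class-level listener list, standing in for User.callbacks.
module Listeners
  @list = []

  class << self
    attr_reader :list
  end
end
GlobalState.on_reset { Listeners.list.clear }

# Minimal runner that mimics a before(:each) hook around every spec.
def run_with_reset(specs)
  specs.map do |spec|
    GlobalState.reset!
    spec.call
  end
end

registers = -> { Listeners.list << :welcome; Listeners.list == [:welcome] }
expects_none = -> { Listeners.list.empty? }

run_with_reset([registers, expects_none]) # => [true, true]
run_with_reset([expects_none, registers]) # => [true, true]
```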

If the callback set is part of the production contract, a per-spec clear is wrong (you would be deleting real callbacks). Capture and restore around the mutating spec:

it "registers the welcome callback" do
  original = User.callbacks.dup
  User.register_callback(:welcome) { |u| WelcomeMailer.deliver(u) }
  expect(User.callbacks).to include(:welcome)
ensure
  User.callbacks.replace(original)
end
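If several specs need this capture-and-restore pattern, it can be factored into a helper. Here is a minimal sketch. The name preserving is hypothetical, and because dup makes a shallow copy, the helper protects only the collection itself, not the objects inside it.

```ruby
# Hypothetical helper: snapshot a mutable collection (Array or Hash), run the
# block, and restore the snapshot even if the block raises.
def preserving(collection)
  snapshot = collection.dup
  yield
ensure
  collection.replace(snapshot) if snapshot
end

callbacks = { audit: :log }
begin
  preserving(callbacks) do
    callbacks[:welcome] = :mail
    raise "assertion failed"
  end
rescue RuntimeError
  # The spec failed, but the registry was still restored.
end
callbacks == { audit: :log } # => true
```

In a spec, this wraps the mutating lines: preserving(User.callbacks) { ... }.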

Constant stubs:

Use stub_const from RSpec mocks. It auto-reverts at the end of the example:

it "discounts in the test environment" do
  stub_const("Pricing::DISCOUNT", 0.5)
  expect(Pricing.for(:pro)).to eq(49)
end

Never reassign a constant directly in a spec. Ruby will warn and the change persists for the entire process.
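A constant stub must, at minimum, swap in a value and then put the original back. This is a simplified sketch of that core idea, not RSpec's implementation: stub_const also handles undefined constants and nested names passed as strings, which this sketch does not. The names with_const and Pricing are hypothetical. The original constant is removed before const_set, which avoids Ruby's "already initialized constant" warning.

```ruby
# Swap a constant for the duration of a block, then restore it.
def with_const(mod, name, value)
  original = mod.const_get(name, false) # raises NameError if undefined
  mod.send(:remove_const, name)         # remove first so const_set does not warn
  begin
    mod.const_set(name, value)
    yield
  ensure
    mod.send(:remove_const, name) if mod.const_defined?(name, false)
    mod.const_set(name, original)
  end
end

module Pricing
  DISCOUNT = 0.1
end

with_const(Pricing, :DISCOUNT, 0.5) { Pricing::DISCOUNT } # => 0.5
Pricing::DISCOUNT # => 0.1
```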

Time mutations:

travel_to from ActiveSupport::Testing::TimeHelpers auto-reverts when used with a block. Outside a block, pair every travel_to with travel_back (Rails-style, not Timecop’s Timecop.return).

RSpec.configure do |config|
  config.include ActiveSupport::Testing::TimeHelpers
  config.after(:each) { travel_back }
end

Belt and braces: even if you forget the block form, the global after(:each) catches it.
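The difference between the two forms comes down to where the reset happens. The sketch below uses a tiny fake clock in plain Ruby. FakeClock is hypothetical and much simpler than ActiveSupport's TimeHelpers (which, for example, also handles nested travel), but it shows why only the block form cleans up after itself.

```ruby
# Hypothetical fake clock; real specs would use ActiveSupport's TimeHelpers.
class FakeClock
  @frozen_at = nil

  class << self
    def now
      @frozen_at || Time.now
    end

    # Block form restores automatically; non-block form leaves time frozen
    # until travel_back is called.
    def travel_to(time)
      @frozen_at = time
      return unless block_given?

      begin
        yield
      ensure
        travel_back
      end
    end

    def travel_back
      @frozen_at = nil
    end

    def frozen?
      !@frozen_at.nil?
    end
  end
end

new_year = Time.utc(2030, 1, 1)
FakeClock.travel_to(new_year) { FakeClock.now == new_year } # => true
FakeClock.frozen? # => false: the block form cleaned up

FakeClock.travel_to(new_year) # non-block form
FakeClock.frozen? # => true: this would leak into the next spec
FakeClock.travel_back # what the global after(:each) hook does
FakeClock.frozen? # => false
```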

How Mergify catches this before you ship

Random-order failures are easy to dismiss as “weird flake, retry it.” A retry loop usually wins because the next seed picks a different order. The team learns to ignore the failure category, which is exactly when a real regression slips through.

Test Insights records the seed for every CI run and tracks failures by their seed range. When a spec fails consistently under a specific subset of seeds and passes everywhere else, the dashboard surfaces the ordering signature: “OrderProcessor fails with seeds 40000-50000 only when UserSpec runs first.” You get the pair without running --bisect yourself.
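Test Insights' internals are not described here, so the following is not Mergify's method. It is a deliberately naive heuristic, written in plain Ruby, that shows why recording the order of every run is enough to point at a suspect. Run and likely_polluters are hypothetical names. On real, noisy CI history, the strict all/none rule would need to become a statistical threshold.

```ruby
# Each recorded run: the order specs ran in, and whether the target failed.
Run = Struct.new(:order, :target_failed)

# A spec that ran before the target in every failing run, and in no passing
# run, is a likely polluter.
def likely_polluters(runs, target)
  failing, passing = runs.partition(&:target_failed)
  return [] if failing.empty?

  ran_before = lambda do |run, spec|
    i = run.order.index(spec)
    t = run.order.index(target)
    i && t && i < t
  end

  candidates = runs.flat_map(&:order).uniq - [target]
  candidates.select do |spec|
    failing.all? { |r| ran_before.call(r, spec) } &&
      passing.none? { |r| ran_before.call(r, spec) }
  end
end

runs = [
  Run.new(%w[UserSpec OrderSpec InvoiceSpec], true),
  Run.new(%w[InvoiceSpec OrderSpec UserSpec], false),
  Run.new(%w[OrderSpec UserSpec InvoiceSpec], false),
  Run.new(%w[InvoiceSpec UserSpec OrderSpec], true)
]
likely_polluters(runs, "OrderSpec") # => ["UserSpec"]
```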

Quarantine kicks in once the pattern is clear, so the merge queue keeps moving while you write the missing reset hook.

Want to know which of your specs have hidden coupling without running --bisect for an hour? Point Mergify at your repo. Native gem: rspec-mergify. One Gemfile line and you’re set.

More patterns like this

Order dependencies are one of the eight patterns in the flaky-tests-in-RSpec guide. The others are variants of the same theme, state that crosses specs because cleanup did not run when expected: database_cleaner strategy mismatches under JS-driven Capybara, lazy let versus let! surprises, Mocha stubs that were never verified, and travel_to without travel_back. One bug class, many faces.

Random order makes them findable. Without it, the same coupling sits in the suite for years until a refactor exposes it as a regression.
