RSpec randomized order is the messenger, not the bug
Why a green spec on --order defined fails on --order random with a different seed every CI run, how to bisect to the dependent pair, and the cleanup hooks that fix the underlying coupling.
A Rails app has 4,000 RSpec examples. The suite has been green for months. Today’s CI run fails on OrderProcessor#calculates_tax, a spec that has not changed in two years. Tomorrow’s run fails on a different spec. Next week’s run is green again. This is randomized order surfacing implicit dependencies between specs.
We see this pattern often enough on Mergify Test Insights that it earned its own slot in our flaky RSpec catalog. The cause is hidden coupling between specs. Random order is the diagnostic, not the disease.
What you see
Many Rails projects turn on random ordering, either with config.order = :random in spec/spec_helper.rb or with this line in .rspec:
--order random
Most teams keep it on. RSpec picks a new seed each run (Randomized with seed 12345) and shuffles the spec order for that run. A spec that mutates a class variable, registers a Sidekiq worker, or stubs a constant without resetting it leaves that mutation for whatever spec runs next. With a different seed, the dependent pair runs in a different order, and the failure jumps to a new spec.
# spec/models/user_spec.rb
RSpec.describe User do
  it "registers the welcome callback" do
    User.register_callback(:welcome) { |u| WelcomeMailer.deliver(u) }
    expect(User.callbacks).to include(:welcome)
  end
end
# spec/models/order_spec.rb (runs after user_spec.rb on this seed)
RSpec.describe Order do
  it "creates the order without notifications" do
    allow(WelcomeMailer).to receive(:deliver) # spy, so have_received can be asserted
    user = create(:user) # User.callbacks still contains :welcome from the previous spec
    order = Order.create!(user: user)
    expect(WelcomeMailer).not_to have_received(:deliver) # fails
  end
end
The user spec leaves the callback registered. The order spec creates a user, the callback fires, the mailer gets called, and the assertion fails. The order spec did nothing wrong. The user spec did nothing visibly wrong either: its assertion passed.
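The leak can be reproduced without Rails or RSpec. The sketch below is a plain-Ruby simulation, not RSpec's scheduler: FakeUser, SPECS, and run_in_order are hypothetical names, and the two lambdas stand in for the examples above. It shows the same two examples passing in one order and failing in the other.

```ruby
# Plain-Ruby simulation of the coupling above. FakeUser, SPECS, and
# run_in_order are illustrative names, not part of RSpec or Rails.
class FakeUser
  @callbacks = {}
  @deliveries = []

  class << self
    attr_reader :callbacks, :deliveries

    def register_callback(name, &block)
      callbacks[name] = block
    end

    # Creating a user fires every registered callback, as the app would.
    def create
      user = new
      callbacks.each_value { |cb| cb.call(user) }
      user
    end
  end
end

SPECS = {
  user_spec: lambda do
    FakeUser.register_callback(:welcome) { |u| FakeUser.deliveries << u }
    FakeUser.callbacks.key?(:welcome) # passes, but leaves the callback behind
  end,
  order_spec: lambda do
    FakeUser.create
    FakeUser.deliveries.empty? # false if user_spec already ran
  end
}.freeze

# Runs the named specs in the given order from a clean slate and
# returns a hash of spec name => passed?
def run_in_order(names)
  FakeUser.callbacks.clear
  FakeUser.deliveries.clear
  names.to_h { |name| [name, SPECS.fetch(name).call] }
end

def report(result)
  result.map { |name, ok| "#{name}: #{ok ? 'pass' : 'FAIL'}" }.join(", ")
end

puts report(run_in_order(%i[order_spec user_spec])) # order_spec: pass, user_spec: pass
puts report(run_in_order(%i[user_spec order_spec])) # user_spec: pass, order_spec: FAIL
```

Neither lambda changes between the two runs. Only the order does, which is why the failure appears to move around with the seed.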
Reproduce with the failing seed
The first thing the failure log tells you is the seed:
Randomized with seed 47291
Re-run with that seed locally:
bundle exec rspec --seed 47291
Now the failure is deterministic. You can stop here and start guessing, or you can let RSpec narrow the dependency to its minimal pair:
bundle exec rspec --seed 47291 --bisect
--bisect re-runs the suite in progressively smaller subsets to find the minimal set of examples that, run together in that order, still reproduces the failure. For a 4,000-spec suite, this typically takes 5-15 minutes and produces output like:
The minimal reproduction command is:
rspec ./spec/models/user_spec.rb[1:1] ./spec/models/order_spec.rb[1:1] --seed 47291
Now you know exactly which two specs are coupled.
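To make the narrowing idea concrete, here is a simplified bisection in plain Ruby. It is a sketch for illustration only and not RSpec's actual --bisect implementation. minimal_culprits and the fails callable are hypothetical: fails stands for "run these examples in order, then the failing example, and report whether the failure reproduces". The sketch assumes the failure reproduces with the full candidate list and that removing an irrelevant example never hides it.

```ruby
# Simplified bisection, for illustration only. `fails` is a hypothetical
# callable: it receives a list of examples and returns true when running
# them (in order) before the failing example still reproduces the failure.
# Precondition: fails.call(candidates) is true.
def minimal_culprits(candidates, fails)
  culprits = candidates.dup
  chunk = [culprits.size / 2, 1].max

  loop do
    start = 0
    while start < culprits.size
      # Try the run again with one chunk of examples left out.
      trial = culprits[0...start] + (culprits[(start + chunk)..] || [])
      if fails.call(trial)
        culprits = trial # still fails without the chunk, so drop it
      else
        start += chunk   # the chunk is needed; keep it and move on
      end
    end
    break if chunk == 1

    chunk /= 2
  end

  culprits
end

suite = (1..4000).map { |i| "spec_#{i}" }
leaks_state = ->(examples) { examples.include?("spec_1234") }

p minimal_culprits(suite, leaks_state) # prints ["spec_1234"]
```

Each halving discards the part of the suite that does not matter, so the number of runs grows far more slowly than the number of examples.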
The naive fix and why it is incomplete
Two common workarounds, neither of which scales:
Pin the seed:
--order random
--seed 1
The suite runs in the same order every time. The flake disappears because the dependent pair stops running in the failing order. The hidden coupling is still there, undetected, until someone removes a spec that happens to sit between the pair and breaks the implicit ordering.
Force defined order:
RSpec.configure do |config|
  config.order = :defined
end
Or per-context, with metadata:
RSpec.describe Order, order: :defined do
  # ...
end
Minitest names its equivalent i_suck_and_my_tests_are_order_dependent!, and the hostile name makes the point: the only honest reason to opt out of random order is that you know the suite has a coupling bug you are not ready to fix. Both workarounds buy time. Neither prevents the next dependent pair from forming.
The fix that holds
Reset whatever was mutated. The pattern depends on what was mutated:
Class-level callbacks, Sidekiq workers, registered listeners:
require "sidekiq/testing" # defines Sidekiq::Worker.clear_all
RSpec.configure do |config|
  config.before(:each) { User.callbacks.clear }
  config.before(:each) { Sidekiq::Worker.clear_all } # empties the fake job queues
end
If the callback set is part of the production contract, a per-spec clear is wrong (you would be deleting real callbacks). Capture and restore around the mutating spec:
it "registers the welcome callback" do
  original = User.callbacks.dup
  User.register_callback(:welcome) { |u| WelcomeMailer.deliver(u) }
  expect(User.callbacks).to include(:welcome)
ensure
  User.callbacks.replace(original)
end
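If several specs mutate the same registry, the capture-and-restore step can be pulled into a helper. The following is a minimal sketch in plain Ruby; with_restored is a hypothetical name, not an RSpec API, and the callbacks hash stands in for User.callbacks.

```ruby
# Hypothetical helper (not part of RSpec): snapshot a mutable collection,
# yield, then restore the snapshot even if the block raises.
def with_restored(collection)
  snapshot = collection.dup
  begin
    yield
  ensure
    collection.replace(snapshot)
  end
end

callbacks = { audit: -> {} } # stands in for User.callbacks

with_restored(callbacks) do
  callbacks[:welcome] = -> {}
  # assertions about :welcome would go here
end

p callbacks.keys # prints [:audit]
```

In a spec file the same helper could wrap example.run inside an around hook. Note that dup is a shallow copy, so nested objects mutated in place are not restored.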
Constant stubs:
Use stub_const from RSpec mocks. It auto-reverts at the end of the example:
it "discounts in the test environment" do
  stub_const("Pricing::DISCOUNT", 0.5)
  expect(Pricing.for(:pro)).to eq(49)
end
Never reassign a constant directly in a spec. Ruby will warn and the change persists for the entire process.
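The difference between a scoped stub and a direct reassignment comes down to whether the original value is put back. The plain-Ruby sketch below illustrates that idea only. It is not how rspec-mocks implements stub_const, with_const is a hypothetical helper, and the 0.1 default for Pricing::DISCOUNT is an assumed value.

```ruby
# Plain-Ruby illustration; with_const is a hypothetical helper and the
# 0.1 default is an assumption made for this example.
module Pricing
  DISCOUNT = 0.1
end

# Swap a constant for the duration of the block, then put the original
# back even if the block raises.
def with_const(mod, name, value)
  original = mod.const_get(name, false)
  mod.send(:remove_const, name)
  mod.const_set(name, value)
  begin
    yield
  ensure
    mod.send(:remove_const, name)
    mod.const_set(name, original)
  end
end

with_const(Pricing, :DISCOUNT, 0.5) do
  p Pricing::DISCOUNT # prints 0.5
end

p Pricing::DISCOUNT # prints 0.1
```

A bare Pricing::DISCOUNT = 0.5 in a spec has no second half: nothing restores 0.1, so every later example in the process sees the stubbed value.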
Time mutations:
travel_to from ActiveSupport::Testing::TimeHelpers auto-reverts when used with a block. Outside a block, pair every travel_to with travel_back (Rails-style, not Timecop’s Timecop.return).
RSpec.configure do |config|
  config.include ActiveSupport::Testing::TimeHelpers
  config.after(:each) { travel_back }
end
Belt and braces: even if you forget the block form, the global after(:each) catches it.
How Mergify catches this before you ship
Random-order failures are easy to dismiss as “weird flake, retry it.” A retry loop usually wins because the next seed picks a different order. The team learns to ignore the failure category, which is exactly when a real regression slips through.
Test Insights records the seed for every CI run and tracks failures by their seed range. When a spec fails consistently under a specific subset of seeds and passes everywhere else, the dashboard surfaces the ordering signature: “OrderProcessor fails with seeds 40000-50000 only when UserSpec runs first.” You get the pair without running --bisect yourself.
Quarantine kicks in once the pattern is clear, so the merge queue keeps moving while you write the missing reset hook.
Want to know which of your specs have hidden coupling without running --bisect for an hour? Point Mergify at your repo. Native gem: rspec-mergify. One Gemfile line and you’re set.
More patterns like this
Order dependencies are one of the eight patterns in the flaky-tests-in-RSpec guide. The others are variants of the same theme: state that crosses specs because the cleanup did not run when expected. database_cleaner strategy mismatches under JS-driven Capybara, lazy let versus let! surprises, Mocha stubs that forgot to verify, travel_to without travel_back. One bug class, many faces.
Random order makes them findable. Without it, the same coupling sits in the suite for years until a refactor exposes it as a regression.