Run the new thing alongside the old thing, compare, then switch.
You're replacing a critical algorithm or data path and need high confidence that the new version produces correct results before switching. This is the technique for situations where "we'll test it in staging" isn't good enough — you need production traffic to validate correctness.
The key difference from Feature Flags is that here, both code paths run simultaneously and their outputs are compared, rather than just toggling between them. Use this technique when:

- You're replacing a critical algorithm or data path and can't afford silent regressions.
- Staging tests can't reproduce the variety of inputs production will throw at the new code.
- You need evidence of output parity, not just passing tests, before cutting over.
How this looks in your git history:
The team is rewriting a pricing engine that calculates order totals including tax, discounts, and shipping. The current implementation has known rounding issues caused by floating-point arithmetic — multiplying floats for tax and discount calculations can produce values like $10.9999999 instead of $11.00. The engine is "correct enough" today, but the team needs to ship a new implementation that fixes the rounding without introducing regressions on the thousands of other pricing scenarios in production.
We'll run both engines in parallel, compare their outputs, and switch only after confirming parity.
Create a PricingResult type and a comparePricingResults() utility that takes results from two pricing functions, normalizes for acceptable differences (rounding to 2 decimal places), and logs divergences with full context.
Wire the harness into the existing call site: run the old engine, call comparePricingResults() with a placeholder second argument, and return the old result. No user-visible behavior changes — the harness fires and logs, but has no second engine to compare against yet.
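A minimal sketch of what this first commit might add. The `PricingResult` field names and the logging call are assumptions; a real harness would ship divergences to the observability platform rather than `console.warn`:

```typescript
// Hypothetical result shape; field names are assumptions.
interface PricingResult {
  subtotal: number;
  tax: number;
  discount: number;
  shipping: number;
  total: number;
}

// Normalize acceptable differences by rounding to 2 decimal places.
const round2 = (n: number): number => Math.round(n * 100) / 100;

function comparePricingResults(
  oldResult: PricingResult,
  newResult: PricingResult | null,
  context: Record<string, unknown> = {},
): void {
  if (newResult === null) return; // placeholder phase: no second engine yet

  const fields = ["subtotal", "tax", "discount", "shipping", "total"] as const;
  const divergences = fields.filter(
    (field) => round2(oldResult[field]) !== round2(newResult[field]),
  );

  if (divergences.length > 0) {
    // Stand-in for the observability platform: log with full context.
    console.warn("pricing divergence", { divergences, oldResult, newResult, ...context });
  }
}
```

Passing `null` as the second argument is the placeholder phase described above: the harness is wired in and callable, but there is nothing to compare against yet.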
Create calculatePricingV2() with corrected integer-cents math instead of floating-point multiplication. The old engine multiplies floats (price * taxRate), which causes rounding drift. The new engine converts to cents first, performs integer arithmetic, then rounds once at the end.
This is dark code — not yet called from anywhere. The diff adds ~45 lines of new pricing logic alongside the existing function, with no changes to the call site.
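A sketch of the integer-cents approach this commit introduces, simplified to return just the order total. The `Order` fields and rate conventions are assumptions:

```typescript
// Hypothetical inputs; field names and rate conventions are assumptions.
interface Order {
  unitPrice: number;    // dollars, e.g. 9.99
  quantity: number;
  taxRate: number;      // e.g. 0.07 for 7%
  discountRate: number; // e.g. 0.10 for 10% off
  shippingFee: number;  // dollars
}

// New engine: convert to integer cents first, do the arithmetic on
// integers, and round once at each explicitly chosen point.
function calculatePricingV2(order: Order): number {
  const unitCents = Math.round(order.unitPrice * 100);
  const subtotalCents = unitCents * order.quantity;
  const discountCents = Math.round(subtotalCents * order.discountRate);
  const taxedBase = subtotalCents - discountCents;
  const taxCents = Math.round(taxedBase * order.taxRate);
  const shippingCents = Math.round(order.shippingFee * 100);
  const totalCents = taxedBase + taxCents + shippingCents;
  return totalCents / 100; // back to dollars for the existing interface
}
```

Rounding happens once per step, at an explicit `Math.round`, instead of drifting silently across chained float multiplications like `price * taxRate`.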
Update the order call site to run both calculatePricing() and calculatePricingV2() on every request. Pass both results to comparePricingResults(), which logs any differences to the observability platform.
The old engine's result is still returned to the user — the new engine runs but its output is discarded. Engineers can now monitor the divergence logs to build confidence before the switch.
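The parallel-phase call site might look like this sketch (the engine stubs and the order shape are illustrative assumptions):

```typescript
// Stubs standing in for the functions described above (assumptions, for illustration).
const calculatePricing = (order: { id: string; amount: number }): number =>
  order.amount * 1.1; // old engine: float math, prone to rounding drift
const calculatePricingV2 = (order: { id: string; amount: number }): number =>
  Math.round(order.amount * 110) / 100; // new engine: integer-cents math
const comparePricingResults = (oldR: number, newR: number, ctx: object): void => {
  if (Math.round(oldR * 100) !== Math.round(newR * 100)) {
    console.warn("pricing divergence", { oldR, newR, ...ctx });
  }
};

// The call site during the parallel phase: both engines run on every
// request, but only the old engine's result reaches the user.
function getOrderTotal(order: { id: string; amount: number }): number {
  const oldResult = calculatePricing(order);

  // Fire-and-forget: the comparison runs asynchronously, off the user's path.
  void Promise.resolve().then(() => {
    try {
      const newResult = calculatePricingV2(order);
      comparePricingResults(oldResult, newResult, { orderId: order.id });
    } catch (err) {
      console.warn("pricingV2 threw", err); // V2 must never break the user path
    }
  });

  return oldResult; // user-visible behavior unchanged
}
```

The try/catch matters: a crash in the unproven engine must degrade to a log line, never to a failed order.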
After observing zero divergence in production logs for one week, flip the call site to return calculatePricingV2()'s result. The old engine still runs for one more cycle as a comparison safety net — but users now see the new engine's output.
This is the payoff: a small, focused diff. The confidence comes from the data, not from the size of the change.
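The flip itself is a small diff: the sketch below simply swaps which result is returned and which engine runs in the background (stubs are illustrative assumptions):

```typescript
// Stubs for illustration (assumptions); the real functions are described above.
const calculatePricing = (order: { id: string; amount: number }): number =>
  order.amount * 1.1; // old engine, now only a safety net
const calculatePricingV2 = (order: { id: string; amount: number }): number =>
  Math.round(order.amount * 110) / 100; // new engine, now the source of truth
const comparePricingResults = (oldR: number, newR: number, ctx: object): void => {
  if (Math.round(oldR * 100) !== Math.round(newR * 100)) {
    console.warn("pricing divergence", { oldR, newR, ...ctx });
  }
};

// After the flip: V2's result is returned; the old engine still runs
// asynchronously as a comparison safety net for one more cycle.
function getOrderTotal(order: { id: string; amount: number }): number {
  const newResult = calculatePricingV2(order);

  void Promise.resolve().then(() => {
    const oldResult = calculatePricing(order);
    comparePricingResults(oldResult, newResult, { orderId: order.id });
  });

  return newResult; // users now see the new engine's output
}
```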
Delete calculatePricing(), delete comparePricingResults(), and simplify the order call site to a single direct call. The parallel infrastructure is gone. The new engine runs alone.
The diff shows the call site going from 12 lines back to 3. The two deleted files (src/lib/pricingComparison.ts and the dead calculatePricing() function in pricing.ts) are removed entirely. Clean final state.
Not accounting for the performance impact of running both paths. If each pricing calculation takes 50ms and you're running both, your latency just doubled. One mitigation is fire-and-forget (void Promise.resolve().then(...)): the second engine runs asynchronously so users don't wait for it. Consider whether your use case can tolerate this approach, or whether you need to run comparisons on a sampled subset of requests rather than 100%.
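If even fire-and-forget costs too much CPU, sampling is one way to bound the overhead. A sketch, with SAMPLE_RATE and the helper name as assumptions:

```typescript
// Compare only a fraction of requests to bound the extra CPU cost.
const SAMPLE_RATE = 0.05; // compare 5% of requests (an assumed rate)

// rng is injectable so the decision is testable; defaults to Math.random.
function shouldCompare(rng: () => number = Math.random): boolean {
  return rng() < SAMPLE_RATE;
}

// At the call site, the fire-and-forget comparison is simply gated:
//   if (shouldCompare()) {
//     void Promise.resolve().then(() => { /* run V2 and compare */ });
//   }
```

Sampling trades statistical coverage for cost: divergences that affect 1-in-a-million requests take longer to surface, so pick a rate that matches how long you plan to run the parallel phase.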
Ignoring non-deterministic differences in comparison logic. Timestamps, random IDs, and floating-point rounding will cause false positives in your divergence logs. Your comparison harness must normalize these out. In the pricing example, we compare in cents (integers) rather than dollars (floats) to eliminate float-comparison noise. If your outputs include generated IDs or timestamps, strip them before comparing.
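A normalization sketch along these lines, with hypothetical field names: drop the per-run fields and compare money as integer cents:

```typescript
// Hypothetical raw output; field names are assumptions.
interface RawResult {
  total: number;       // dollars (float)
  generatedAt: string; // timestamp: legitimately differs between runs
  quoteId: string;     // random ID: legitimately differs between runs
}

// Keep only the fields that must agree, in a float-safe representation.
function normalize(r: RawResult): { totalCents: number } {
  return { totalCents: Math.round(r.total * 100) };
}

function resultsMatch(a: RawResult, b: RawResult): boolean {
  return normalize(a).totalCents === normalize(b).totalCents;
}
```

With this normalization, $10.9999999 and $11.00 both become 1100 cents and compare equal, while a real one-cent divergence still surfaces.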
Running the parallel comparison for too long. Set a specific timeline before you start — "one week of zero divergence, then we switch" — and commit to it. Indefinite parallel running wastes CPU, creates maintenance burden, and gives engineers false comfort from the safety net. The parallel phase should have a defined end condition and a hard deadline. If you can't get to zero divergence, that's a signal to investigate the new implementation, not to keep the parallel phase running forever.