CI Pipeline Flaky Tests
Flaky tests degrade developer confidence and slow down pipelines. This guide explains how to detect flaky tests, quarantine or fix them, and prevent recurrence.
What are flaky tests
Flaky tests are tests that pass on some runs and fail on others even though neither the code under test nor the test itself has changed. They usually indicate that the test suite is coupled to timing, shared resources, or the environment.
Why this problem happens
- Race conditions in tests
- Test dependencies on external services without proper mocking
- Resource contention on shared runners or test clusters
- Non-deterministic inputs such as timing or randomness
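To illustrate the last cause, here is a minimal sketch of a test whose random input is made repeatable by passing in a seeded generator. The function `pick_sample` and the seed value are hypothetical names chosen for this example, not part of any particular project.

```python
import random


def pick_sample(items, k, rng):
    """Return k distinct items chosen by the supplied random generator."""
    return rng.sample(items, k)


def test_pick_sample_is_repeatable():
    # Two generators built from the same seed yield the same sequence,
    # so the outcome no longer varies from run to run.
    first = pick_sample(list(range(100)), 5, random.Random(1234))
    second = pick_sample(list(range(100)), 5, random.Random(1234))
    assert first == second
    assert len(set(first)) == 5


test_pick_sample_is_repeatable()
```

Passing the generator as an argument, rather than calling the module-level `random` functions, keeps the seed under the test's control and avoids interference from other tests that also use randomness.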
How engineers debug this
- Identify candidate flaky tests by scanning historical job results for inconsistent pass/fail patterns.
- Re-run failing tests in isolation to confirm flakiness.
- Add diagnostic logging and snapshot state when failures occur.
- Quarantine flaky tests, or add retries with caution, while the real fix is developed.
- Fix root causes: remove timing dependencies, use deterministic seeds, and mock external services.
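The first step above, scanning historical results, can be sketched as follows. This assumes results are available as `(test_name, commit_sha, passed)` tuples; that record shape and the sample data are assumptions for illustration, and a real CI export will likely need adapting. A test is treated as a candidate when the same commit produced both a pass and a fail.

```python
from collections import defaultdict


def find_flaky_candidates(results):
    """Return sorted test names that both passed and failed on one commit.

    results: iterable of (test_name, commit_sha, passed) tuples.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in results:
        outcomes[(test_name, commit_sha)].add(bool(passed))
    # Mixed outcomes on identical code suggest flakiness, not a regression.
    flaky = {name for (name, _sha), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


# Hypothetical history: test_login is inconsistent on one commit, while
# test_checkout changes outcome only between different commits.
history = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),
    ("test_checkout", "abc123", True),
    ("test_checkout", "def456", False),
]
print(find_flaky_candidates(history))  # prints ['test_login']
```

Grouping by commit separates genuine regressions, where a test starts failing after a change, from tests that disagree with themselves on identical code.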
Best practices
- Tag flaky tests and run them separately from critical fast-path suites.
- Maintain a quarantine dashboard to track flake rates and remediation progress.
- Prefer small, deterministic unit tests and test doubles for external systems.
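As a rough sketch of the last practice, the test below replaces an external service client with a test double from the standard library's `unittest.mock`. The function `fetch_display_name` and the client method `get_profile` are hypothetical names invented for this example.

```python
from unittest.mock import Mock


def fetch_display_name(client, user_id):
    """Return the user's name in upper case, using an injected client."""
    profile = client.get_profile(user_id)
    return profile["name"].upper()


def test_fetch_display_name_uses_double():
    # The Mock stands in for the real service, so the test needs no network
    # access and always sees the same response.
    client = Mock()
    client.get_profile.return_value = {"name": "ada"}
    assert fetch_display_name(client, 42) == "ADA"
    client.get_profile.assert_called_once_with(42)


test_fetch_display_name_uses_double()
```

Injecting the client as a parameter is one design choice among several; patching a module-level client would also work, but it makes the dependency less visible in the test.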
Tools that help
CI systems and OctoLaunch can surface flaky test trends grouped by test name and commit range. OctoLaunch helps correlate flakiness spikes with recent merges and environment changes.
FAQ
- Q: Should I automatically retry flaky tests?
- A: Use retries to reduce noise, but only as a temporary measure while fixing the underlying issue.
- Q: How can I find flaky tests in large suites?
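- A: Aggregate historical test results and look for tests whose pass rate on unchanged code is neither 0% nor 100%, since mixed outcomes on the same commit are the clearest sign of flakiness.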
- Q: Is a flaky test always low priority?
- A: No—some flaky tests exercise critical paths and deserve immediate attention.
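For the retry question above, one possible shape is a bounded retry helper that reports how many attempts were needed, so that a pass on a later attempt is recorded as a flake rather than silently hidden. This is a sketch under assumptions: `run_with_retries` and `sometimes_fails` are hypothetical names, and most test runners provide their own retry mechanisms that would be preferable in practice.

```python
def run_with_retries(test_fn, max_attempts=3):
    """Run test_fn until it passes or the attempts run out.

    Returns (passed, attempts). A pass with attempts > 1 is a flake signal.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            test_fn()
            return True, attempt
        except AssertionError:
            continue
    return False, max_attempts


# Hypothetical test that fails on its first call and passes afterwards.
calls = {"count": 0}


def sometimes_fails():
    calls["count"] += 1
    assert calls["count"] > 1


passed, attempts = run_with_retries(sometimes_fails)
flaky = passed and attempts > 1
print(passed, attempts, flaky)  # prints True 2 True
```

Returning the attempt count lets the pipeline stay green while still feeding flake data to whatever tracking is in place.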