How to Detect Bad Deployments

Detecting a bad deployment quickly reduces user impact and shortens the triage window. This guide outlines signals and processes engineers use to detect regressions after releases.

What is a bad deployment?

A bad deployment is a release that introduces functional regressions, performance regressions, or availability issues relative to the baseline.

Why this problem happens

  • Missing pre/post-release baselines
  • Insufficient monitoring coverage for critical user paths
  • Slow incident detection and alerting thresholds set too high

How engineers debug this

  1. Compare metrics for key user journeys pre- and post-deploy.
  2. Inspect error rates and latency percentiles around the deploy timestamp.
  3. Check for new exceptions or logged stack traces after the release.
  4. Validate feature flags and configuration applied at deploy time.
  5. If confirmed, roll back or perform a targeted fix and re-observe.
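Steps 1 and 2 above can be sketched in code. The following is a minimal, illustrative comparison of pre- and post-deploy windows; the input schema (dicts with "errors", "requests", and "latencies_ms" keys) and the thresholds are assumptions you would replace with your own monitoring data and SLO-derived limits.

```python
from statistics import quantiles

def percentile(samples, p):
    """Return the p-th percentile of a list of latency samples (ms)."""
    cuts = quantiles(sorted(samples), n=100)  # 99 cut points
    return cuts[min(int(p), 99) - 1]

def detect_regression(pre, post, error_delta=0.01, latency_ratio=1.5):
    """Compare a pre-deploy window to a post-deploy window.

    `pre` and `post` are dicts with "errors", "requests", and
    "latencies_ms" keys (a hypothetical schema). Returns a list of
    human-readable regression reasons; empty means no regression found.
    """
    pre_err = pre["errors"] / pre["requests"]
    post_err = post["errors"] / post["requests"]
    p95_pre = percentile(pre["latencies_ms"], 95)
    p95_post = percentile(post["latencies_ms"], 95)

    reasons = []
    if post_err - pre_err > error_delta:
        reasons.append(f"error rate {pre_err:.2%} -> {post_err:.2%}")
    if p95_post > latency_ratio * p95_pre:
        reasons.append(f"p95 latency {p95_pre:.0f}ms -> {p95_post:.0f}ms")
    return reasons
```

In practice you would pull both windows from your metrics backend aligned to the deploy timestamp, and treat a non-empty result as a trigger for step 5 (roll back or targeted fix), not as automatic proof of causation.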

Best practices

  • Capture metric snapshots before deploying.
  • Automate smoke tests post-deploy that exercise critical flows.
  • Use progressive rollouts to limit exposure.
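The first practice, capturing a metric snapshot before deploying, can be as simple as persisting a few key numbers per endpoint. This sketch assumes a `query_metric(endpoint, name)` callable standing in for your monitoring system's API; the metric names and file format are illustrative, not a standard.

```python
import json
import time

def capture_snapshot(query_metric, endpoints, path="pre_deploy_snapshot.json"):
    """Record baseline metrics for key endpoints before a deploy.

    `query_metric` is a placeholder for a call into your monitoring
    backend. The snapshot is written to disk so the post-deploy
    comparison has a fixed reference point.
    """
    snapshot = {
        "taken_at": time.time(),
        "metrics": {
            ep: {
                "error_rate": query_metric(ep, "error_rate"),
                "p95_ms": query_metric(ep, "latency_p95_ms"),
                "rps": query_metric(ep, "requests_per_second"),
            }
            for ep in endpoints
        },
    }
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot
```

Running this as the first step of your deploy pipeline gives every release a baseline to diff against, which also makes the post-deploy smoke-test results easier to interpret.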

Tools that help

OctoLaunch compares pre/post deploy signals and surfaces deployments that most likely caused observable regressions. It points engineers to the relevant metrics, traces, and logs aligned with the release window.

FAQ

  • Q: How long after a deploy should I attribute an issue to the release?
    • A: Start with a short window (minutes) and expand it to cover a few deployment propagation periods, depending on your architecture.
  • Q: Can monitoring generate false positives for regressions?
    • A: Yes—validate metric anomalies with traces and user reproduction before rolling back.
  • Q: What baseline metrics should I capture?
    • A: Error rate, latency p95/p99, saturation (CPU/memory), and request volume for key endpoints.
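The first FAQ answer above suggests widening the attribution window over time. One way to structure that is a schedule of observation windows after the deploy timestamp; the doubling schedule below is an illustrative choice, not a standard.

```python
from datetime import datetime, timedelta

def attribution_windows(deploy_time, initial_minutes=10, periods=3):
    """Yield (start, end) observation windows after `deploy_time`.

    The first window is short; each subsequent window doubles in width
    to cover slower propagation effects (caches, rolling restarts).
    The doubling factor is an assumption, tune it to your architecture.
    """
    width = timedelta(minutes=initial_minutes)
    start = deploy_time
    for _ in range(periods):
        yield (start, start + width)
        start = start + width
        width *= 2
```

Checking each window in turn lets you attribute fast-failing regressions quickly while still catching issues that only surface after full propagation.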
