Essentials to capture

  • Title: Summarize the customer/user impact.
  • Severity: Choose P1, P2, or P3 depending on whether you are seeing an outage, a degradation, or a backlog.
  • Impacted services: List the affected products, APIs, or teams.
  • Responder: Assign a primary owner immediately.
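
As a rough illustration, the four essentials above could be modelled as a small record that reports which fields are still empty. This is only a sketch: the names IncidentReport, Severity, and missing_fields are hypothetical and do not correspond to any particular tool's API.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # The three levels named in the guide; what each level means is
    # defined by your own severity policy.
    P1 = "P1"
    P2 = "P2"
    P3 = "P3"


@dataclass
class IncidentReport:
    # Hypothetical field names mirroring the essentials listed above.
    title: str
    severity: Severity
    impacted_services: list[str]
    responder: str

    def missing_fields(self) -> list[str]:
        """Return the names of essentials that are still empty."""
        missing = []
        if not self.title.strip():
            missing.append("title")
        if not self.impacted_services:
            missing.append("impacted_services")
        if not self.responder.strip():
            missing.append("responder")
        return missing


report = IncidentReport(
    title="Checkout API returns errors for all customers",
    severity=Severity.P1,
    impacted_services=["checkout-api"],
    responder="",  # no primary owner assigned yet
)
print(report.missing_fields())
```

A check like this can run before the report is saved, so an incident is never filed without a primary owner.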

Capture the timeline

Record a timestamp for each of the following events so that MTTA and MTTR can be measured accurately:

  1. Detection: When the alert first fired or the issue was observed.
  2. Acknowledgement: When a human responded.
  3. Mitigation: When the emergency fix was applied.
  4. Resolution: When the service returned to expected behavior.
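
To show why these timestamps matter, here is a minimal sketch of how MTTA and MTTR could be derived from them. It assumes MTTA is the mean of (acknowledgement minus detection) and MTTR is the mean of (resolution minus detection); some teams measure MTTR to mitigation instead, so adjust to your own definition. The dictionary keys and sample incidents are invented for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_minutes(durations: list[timedelta]) -> float:
    """Average a list of durations and return the result in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


# Two made-up incidents with the timeline fields described above.
incidents = [
    {
        "detected": datetime(2024, 1, 10, 10, 0),
        "acknowledged": datetime(2024, 1, 10, 10, 4),
        "mitigated": datetime(2024, 1, 10, 10, 30),
        "resolved": datetime(2024, 1, 10, 11, 0),
    },
    {
        "detected": datetime(2024, 1, 12, 14, 0),
        "acknowledged": datetime(2024, 1, 12, 14, 10),
        "mitigated": datetime(2024, 1, 12, 14, 25),
        "resolved": datetime(2024, 1, 12, 14, 40),
    },
]

# Assumption: MTTA runs from detection to acknowledgement.
mtta = mean_minutes([i["acknowledged"] - i["detected"] for i in incidents])
# Assumption: MTTR runs from detection to resolution.
mttr = mean_minutes([i["resolved"] - i["detected"] for i in incidents])

print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")
# prints "MTTA: 7.0 min, MTTR: 50.0 min"
```

If any of the four timestamps is missing or guessed after the fact, both averages drift, which is why it helps to capture each one while the incident is still in progress.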

Context & evidence

  • Link dashboards, runbooks, alerts, or pull requests that show what happened.
  • Note what diagnostics were run (logs, traces, queries) so future responders start where you left off.
  • If the incident was discussed in other tools (Slack, PagerDuty), paste the relevant snippet or a short summary.

Collaborate & escalate

  • Use the "@mention" field to loop in owners, product managers, or leadership depending on severity.
  • Update the status (investigating, mitigated, monitoring) so other observers know the current state.
  • Escalate per your on-call policy when the usual responders are overloaded.
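
One way to keep observers informed is to store each status change with a timestamp and a short note, so the current state is always the latest entry. The sketch below assumes only the three statuses named above; StatusLog and its methods are hypothetical names, not part of any specific product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Status(Enum):
    INVESTIGATING = "investigating"
    MITIGATED = "mitigated"
    MONITORING = "monitoring"


@dataclass
class StatusLog:
    # Each entry is (timestamp, status, note), oldest first.
    updates: list = field(default_factory=list)

    def update(self, status: Status, note: str,
               at: Optional[datetime] = None) -> None:
        """Append a status change; defaults to the current UTC time."""
        when = at if at is not None else datetime.now(timezone.utc)
        self.updates.append((when, status, note))

    def current(self) -> Optional[Status]:
        """Return the most recent status, or None if nothing is logged."""
        return self.updates[-1][1] if self.updates else None


log = StatusLog()
log.update(Status.INVESTIGATING, "Error rate rising on checkout-api")
log.update(Status.MITIGATED, "Rolled back the latest deploy")
print(log.current().value)
```

Keeping the full history, rather than overwriting a single status field, also gives the timeline section extra timestamps for free.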