Response philosophy

Incident management in Sizemotion keeps teams calm, aligned, and learning. Treat incidents like ceremonies, not fire drills, so you can restore service while preserving psychological safety.

  • Clarity: Everyone knows what the incident is, who owns the response, and where status updates appear.
  • Respect: Accountability comes without blame; we give equal attention to systems and to the people who run them.
  • Reflection: Every incident becomes fuel for learning, not just firefighting.

Logging basics

Capture incidents directly from On-Call → Incidents. Aim for enough detail to explain impact, but keep entries concise so responders can act fast.

  • Title: Short summary plus the impacted service (e.g., “Payments API latency spikes”).
  • Impact: Users, services, or business outcomes that suffered.
  • Timeline: Detection, acknowledgement, mitigation, and resolution timestamps.
  • Postmortem link: Attach the follow-up document used for the review.
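
As a rough illustration of the fields listed above, here is a minimal Python sketch of an incident entry. The class name, field names, and sample values are hypothetical and chosen for this example only; they are not a Sizemotion schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class IncidentEntry:
    """Hypothetical record mirroring the fields above (not a Sizemotion schema)."""
    title: str
    impact: str
    detected_at: datetime
    acknowledged_at: Optional[datetime] = None
    mitigated_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None
    postmortem_link: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of optional fields that are still empty."""
        optional = ("acknowledged_at", "mitigated_at", "resolved_at", "postmortem_link")
        return [name for name in optional if getattr(self, name) is None]

    def time_to_resolve(self) -> Optional[timedelta]:
        """Detection-to-resolution duration, or None while the incident is open."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.detected_at


# Illustrative values only.
entry = IncidentEntry(
    title="Payments API latency spikes",
    impact="Checkout requests timing out for some users",
    detected_at=datetime(2024, 1, 15, 9, 0),
    acknowledged_at=datetime(2024, 1, 15, 9, 4),
    mitigated_at=datetime(2024, 1, 15, 9, 30),
    resolved_at=datetime(2024, 1, 15, 10, 15),
)
print(entry.missing_fields())   # ['postmortem_link']
print(entry.time_to_resolve())  # 1:15:00
```

Keeping the timeline as four separate timestamps makes it easy to see at a glance which stage an open incident has reached and which details still need to be filled in.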

Triage + severity

Severity tags (P1/P2/P3) determine who gets notified and what cadence applies. Automations pre-fill severity, but responders can adjust it as information arrives.

  • P1: Major outage; leadership is notified immediately.
  • P2: Significant degradation with a defined mitigation plan.
  • P3: Localized issue, deferred to future backlog grooming.
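
The way a severity tag drives notification and cadence, and the way a responder's adjustment takes precedence over a pre-filled value, could be sketched as follows. This is an assumption-laden example: the function names, the audience labels other than leadership for P1, and every cadence number are placeholders, not Sizemotion defaults.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Severity(Enum):
    P1 = "P1"
    P2 = "P2"
    P3 = "P3"


@dataclass(frozen=True)
class Policy:
    notify: Tuple[str, ...]
    update_every_minutes: Optional[int]  # None means no fixed update cadence


# Placeholder policy table: audiences and intervals are illustrative assumptions.
POLICIES = {
    Severity.P1: Policy(notify=("on-call", "leadership"), update_every_minutes=15),
    Severity.P2: Policy(notify=("on-call",), update_every_minutes=60),
    Severity.P3: Policy(notify=(), update_every_minutes=None),
}


def effective_severity(prefilled: Severity, override: Optional[Severity] = None) -> Severity:
    """A responder's adjustment wins over the automation's pre-filled value."""
    return override if override is not None else prefilled


def policy_for(prefilled: Severity, override: Optional[Severity] = None) -> Policy:
    """Look up the notification policy for the severity currently in effect."""
    return POLICIES[effective_severity(prefilled, override)]


# An automation pre-fills P3; a responder raises it to P1 as information arrives.
print(policy_for(Severity.P3, override=Severity.P1).notify)  # ('on-call', 'leadership')
```

Resolving the severity in one place means every downstream rule reads the same value, so an adjustment made mid-incident changes notifications and cadence together.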

Post-incident handbook

Run lightweight, focused reviews to close the learning loop:

  1. Summarize the incident, timeline, and fixes.
  2. Share what slowed the response down and what helped.
  3. Define one or two action items, each with an owner.
  4. Link actions back to OKRs or reliability goals.
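
If reviews are recorded in a structured form, the four steps above can double as a completeness check. The sketch below assumes hypothetical `Review` and `ActionItem` classes invented for this example; nothing here reflects a built-in Sizemotion feature.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionItem:
    description: str
    owner: str
    goal: str  # OKR or reliability goal this action supports


@dataclass
class Review:
    summary: str
    slowed_us_down: List[str] = field(default_factory=list)
    helped: List[str] = field(default_factory=list)
    actions: List[ActionItem] = field(default_factory=list)

    def problems(self) -> List[str]:
        """List the gaps that keep this review from matching the four steps."""
        issues = []
        if not self.summary.strip():
            issues.append("summary is empty")
        if not 1 <= len(self.actions) <= 2:
            issues.append("expected one or two action items")
        for item in self.actions:
            if not item.owner.strip():
                issues.append(f"action '{item.description}' has no owner")
            if not item.goal.strip():
                issues.append(f"action '{item.description}' is not linked to a goal")
        return issues


# Illustrative values only: the single action item is missing its owner.
review = Review(
    summary="Payments API latency spikes: timeline and fixes",
    actions=[ActionItem("Add latency alert", owner="", goal="Payments reliability")],
)
print(review.problems())  # ["action 'Add latency alert' has no owner"]
```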

Next steps