Response philosophy
Incident management in Sizemotion keeps teams calm, aligned, and learning. Treat incidents like ceremonies, not fire drills, so you can restore service while preserving psychological safety.
- Clarity: Everyone knows what the incident is, who owns the response, and where status updates appear.
- Respect: Accountability comes without blame; the focus stays on systems and people equally.
- Reflection: Every incident becomes fuel for learning, not just firefighting.
Logging basics
Capture incidents directly from On-Call → Incidents. Aim for enough detail to explain impact, but keep entries concise so responders can act fast.
- Titles: Short summary + impacted service (e.g., “Payments API latency spikes”).
- Impact: Users, services, or business outcomes that suffered.
- Timeline: Detection, acknowledgement, mitigation, and resolution timestamps.
- Postmortem link: Attach the follow-up document so reviewers can find it.
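If you mirror these fields in your own tooling, the entry can be sketched as a small record. This is a minimal illustration, not Sizemotion's actual schema or API: the class name, field names, and sample values below are all assumptions chosen to match the four fields listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IncidentRecord:
    """Hypothetical incident entry; field names are illustrative only."""

    title: str                                   # short summary + impacted service
    impact: str                                  # users, services, or outcomes that suffered
    detected_at: datetime                        # timeline: detection
    acknowledged_at: Optional[datetime] = None   # timeline: acknowledgement
    mitigated_at: Optional[datetime] = None      # timeline: mitigation
    resolved_at: Optional[datetime] = None       # timeline: resolution
    postmortem_link: Optional[str] = None        # follow-up document, once it exists

    def time_to_acknowledge(self) -> Optional[timedelta]:
        """Elapsed time from detection to acknowledgement, if acknowledged."""
        if self.acknowledged_at is None:
            return None
        return self.acknowledged_at - self.detected_at

    def time_to_resolve(self) -> Optional[timedelta]:
        """Elapsed time from detection to resolution, if resolved."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.detected_at


# Sample values are made up for illustration.
incident = IncidentRecord(
    title="Payments API latency spikes",
    impact="Checkout requests slowed for some users",
    detected_at=datetime(2024, 1, 15, 9, 0),
    acknowledged_at=datetime(2024, 1, 15, 9, 4),
)
```

Keeping the four timestamps as separate optional fields lets a responder log the incident at detection and fill in the rest as the response progresses.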
Triage + severity
Severity tags (P1/P2/P3) determine who gets notified and what cadence applies. Automations pre-fill severity, but responders can adjust it as information arrives.
- P1: Major outage with leadership notified instantly.
- P2: Significant degradation with a defined mitigation plan.
- P3: A localized issue for future backlog grooming.
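One way to picture how a severity tag drives notifications and cadence is a simple lookup table. The sketch below is an assumption-heavy illustration, not how Sizemotion implements routing: the audience names, the reading of "cadence" as a status-update interval, and the minute values are all placeholders you would replace with your own policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Severity(Enum):
    P1 = "P1"  # major outage
    P2 = "P2"  # significant degradation
    P3 = "P3"  # localized issue


@dataclass(frozen=True)
class ResponsePolicy:
    """Hypothetical policy: who is notified and how often updates go out."""

    notify: Tuple[str, ...]
    cadence_minutes: Optional[int]  # None means no live update cadence


# Placeholder values; only "leadership is notified for P1" comes from the text above.
POLICIES: Dict[Severity, ResponsePolicy] = {
    Severity.P1: ResponsePolicy(notify=("on-call", "leadership"), cadence_minutes=30),
    Severity.P2: ResponsePolicy(notify=("on-call",), cadence_minutes=120),
    Severity.P3: ResponsePolicy(notify=(), cadence_minutes=None),  # handled in backlog grooming
}


def policy_for(tag: str) -> ResponsePolicy:
    """Look up the policy for a tag such as 'P1' (case-insensitive)."""
    return POLICIES[Severity(tag.strip().upper())]
```

Because the policy is looked up from the current tag each time, adjusting severity mid-incident simply changes which policy applies from that point on.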
Post-incident handbook
Run lightweight, focused reviews to close the learning loop:
- Summarize the incident, timeline, and fixes.
- Share what slowed the response down and what helped.
- Define one or two action items, each with an owner.
- Link actions back to OKRs or reliability goals.
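If you track reviews in your own scripts, the checklist above can be expressed as a quick completeness check. This is only a sketch under stated assumptions: `Review`, `ActionItem`, and `review_gaps` are hypothetical names, not part of Sizemotion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionItem:
    description: str
    owner: str
    linked_goal: str  # the OKR or reliability goal this action supports


@dataclass
class Review:
    summary: str                  # incident, timeline, and fixes
    slowed_us_down: List[str]
    helped: List[str]
    actions: List[ActionItem] = field(default_factory=list)


def review_gaps(review: Review) -> List[str]:
    """Return a list of checklist items the review has not yet satisfied."""
    gaps: List[str] = []
    if not review.summary.strip():
        gaps.append("missing summary")
    if not 1 <= len(review.actions) <= 2:
        gaps.append("expected one or two action items")
    for action in review.actions:
        if not action.owner.strip():
            gaps.append(f"action '{action.description}' has no owner")
        if not action.linked_goal.strip():
            gaps.append(f"action '{action.description}' is not linked to a goal")
    return gaps
```

An empty list means the review covers every point in the checklist; anything else tells the facilitator what to fill in before closing the loop.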
Next steps
- Create and manage on-call schedules to keep your rotations healthy.
- Report an incident step-by-step so nothing gets missed.
- Understand incident capabilities like assignment, severity, and status controls.
- Conduct post-mortems to learn from incidents and prevent recurrence.