Overview

Connect Prometheus AlertManager to Sizemotion to automatically create incidents when your alerts fire. AlertManager's webhook receiver will send alert groups to your team's webhook URL, creating incidents with full alert context including labels, annotations, and alert status.

Key Benefits:
  • Native AlertManager webhook format - zero custom configuration
  • Automatic severity mapping from Prometheus labels
  • Route different alert groups to different teams
  • Idempotency prevents duplicate incidents for the same alert
  • Only fires on "firing" status - ignores "resolved" alerts

Prerequisites

Before configuring the integration:

  • API Token - Generate from Admin Settings → API Tokens
  • Team ID - Available in Team Settings → Team Information
  • Prometheus & AlertManager - Configured with alert rules

Setup Steps

Step 1: Get Your Webhook URL

Each team has a unique webhook URL for routing alerts:

  1. Navigate to Team Settings in your workspace
  2. Locate the Team Information section
  3. Note your Team ID (e.g., 5)
  4. Your webhook URL format:
    https://sizemotion.com/api/v1/incidents/webhook/prometheus/{TEAM_ID}
Multiple Teams? Create separate AlertManager receivers for each team using routing rules to ensure proper alert routing.
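
If you want to verify the webhook URL and token before wiring up AlertManager, you can build a request by hand in the same shape AlertManager sends. A minimal Python sketch (the team ID, token, and payload values are placeholders; the payload follows AlertManager's standard webhook format, version "4"):

```python
import json

# Minimal AlertManager-style webhook payload (version "4") for manually
# exercising the endpoint. All values below are placeholders.
payload = {
    "version": "4",
    "groupKey": '{}:{alertname="TestAlert"}',
    "status": "firing",
    "receiver": "sizemotion-backend",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "TestAlert", "severity": "warning", "team": "backend"},
            "annotations": {"summary": "Manual webhook test"},
        }
    ],
}

def build_request(team_id: int, token: str):
    """Return (url, headers, body) for a manual POST to the team webhook."""
    url = f"https://sizemotion.com/api/v1/incidents/webhook/prometheus/{team_id}"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return url, headers, json.dumps(payload).encode()

url, headers, body = build_request(5, "YOUR_API_TOKEN_HERE")
# Once the token is real, send it with urllib.request (or curl), e.g.:
#   import urllib.request
#   urllib.request.urlopen(urllib.request.Request(url, body, headers, method="POST"))
```

A 201 response with an incident ID confirms the token and team ID are valid.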

Step 2: Create API Token

  1. Go to Admin Settings → API Tokens
  2. Click Create New Token
  3. Name it descriptively (e.g., "Prometheus Production")
  4. Save the token securely - it won't be shown again!

Step 3: Configure AlertManager Webhook

Add a webhook receiver to your AlertManager configuration (alertmanager.yml):

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'sizemotion-backend'
  routes:
    # Route critical alerts to backend team
    - match:
        team: backend
      receiver: 'sizemotion-backend'
    
    # Route frontend alerts
    - match:
        team: frontend
      receiver: 'sizemotion-frontend'

receivers:
  - name: 'sizemotion-backend'
    webhook_configs:
      - url: 'https://sizemotion.com/api/v1/incidents/webhook/prometheus/5'
        send_resolved: false
        http_config:
          authorization:
            type: Bearer
            credentials: 'YOUR_API_TOKEN_HERE'
  
  - name: 'sizemotion-frontend'
    webhook_configs:
      - url: 'https://sizemotion.com/api/v1/incidents/webhook/prometheus/8'
        send_resolved: false
        http_config:
          authorization:
            type: Bearer
            credentials: 'YOUR_API_TOKEN_HERE'
💡 Pro Tip: Set send_resolved: false to avoid creating incidents for resolved alerts. Sizemotion automatically filters out resolved alerts anyway, but this saves bandwidth.

Step 4: Add Labels to Your Prometheus Rules

Ensure your Prometheus alert rules include severity and team labels:


groups:
  - name: backend-alerts
    rules:
      - alert: HighCPUUsage
        expr: rate(process_cpu_seconds_total[1m]) > 0.9
        for: 5m
        labels:
          severity: critical
          team: backend
          service: api
        annotations:
          summary: "High CPU usage detected on {{ $labels.instance }}"
          description: "CPU usage is above 90% for more than 5 minutes."

Step 5: Test the Integration

Trigger a test alert or use amtool to send a test webhook:

amtool alert add TestAlert \
  severity="warning" \
  team="backend" \
  --alertmanager.url=http://localhost:9093

Alert Routing to Teams

Team-Based Routing

Use AlertManager routing rules to direct alerts to different teams based on labels:

route:
  receiver: 'default'
  routes:
    # Backend team - critical and high severity
    - match_re:
        team: backend
        severity: critical|error|high
      receiver: 'sizemotion-backend'
    
    # Frontend team - all severities
    - match:
        team: frontend
      receiver: 'sizemotion-frontend'
    
    # DevOps team - infrastructure alerts
    - match_re:
        alertname: ^(Node|Disk|Memory).*
      receiver: 'sizemotion-devops'

Severity Mapping

Prometheus severity labels automatically map to incident severities:

  Prometheus Label        Incident Severity   Description
  ----------------------  ------------------  ----------------------------------------
  critical, page          Sev 1               Service down / requires immediate action
  error, high, warning    Sev 2               Major degradation / needs attention
  info, medium, low       Sev 3               Minor issues / default severity
  debug                   Sev 4               Informational only
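
The table above can be expressed as a simple lookup. This is an illustrative sketch of the mapping logic, not Sizemotion's actual implementation; the function and dictionary names are hypothetical:

```python
# Label-to-severity lookup mirroring the mapping table (illustrative only).
SEVERITY_MAP = {
    "critical": "Sev 1", "page": "Sev 1",
    "error": "Sev 2", "high": "Sev 2", "warning": "Sev 2",
    "info": "Sev 3", "medium": "Sev 3", "low": "Sev 3",
    "debug": "Sev 4",
}

def map_severity(labels: dict) -> str:
    """Map a Prometheus 'severity' (or 'priority') label to an incident severity.

    Missing or unrecognized values fall back to Sev 3, the default.
    """
    value = labels.get("severity") or labels.get("priority") or ""
    return SEVERITY_MAP.get(value.lower(), "Sev 3")
```

For example, map_severity({"severity": "critical"}) yields "Sev 1", while an alert with no severity label at all lands at the Sev 3 default.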

Common Scenarios

Kubernetes Pod Alerts (Backend Team)


groups:
  - name: kubernetes-pods
    rules:
      - alert: PodCrashLooping
        expr: increase(kube_pod_container_status_restarts_total[15m]) > 0
        for: 5m
        labels:
          severity: critical
          team: backend
          namespace: "{{ $labels.namespace }}"
        annotations:
          summary: "Pod {{ $labels.pod }} is crash looping"
          description: "Pod has restarted {{ $value }} times in the last 15 minutes"

Receiver: sizemotion-backend
Team ID: 5

Infrastructure Alerts (DevOps Team)


groups:
  - name: node-alerts
    rules:
      - alert: NodeDiskPressure
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.1
        for: 10m
        labels:
          severity: warning
          team: devops
          cluster: production
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"
          description: "Less than 10% disk space remaining on {{ $labels.device }}"

Receiver: sizemotion-devops
Team ID: 12

Application Performance (Frontend Team)


groups:
  - name: frontend-performance
    rules:
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 10m
        labels:
          severity: high
          team: frontend
          service: web-app
        annotations:
          summary: "95th percentile response time is high"
          description: "95th percentile response time: {{ $value }}s"

Receiver: sizemotion-frontend
Team ID: 8

Troubleshooting

Webhook Not Creating Incidents

Check these common issues:

  • Authorization Header: Verify the API token is correct with "Bearer" type in http_config
  • Team ID: Ensure the team ID in the URL matches your actual team
  • API Token Permissions: Token must have incidents:create permission
  • URL Format: Should be /api/v1/incidents/webhook/prometheus/{team_id}
  • Receiver Name: Verify the receiver name in routing rules matches the receivers section

Duplicate Incidents

Duplicate prevention is automatic using alert fingerprints:

  • Same alert firing multiple times creates only one incident
  • Each unique alert (different labels) creates a separate incident
  • Fingerprints are based on alert labels and name
  • Resolved alerts are ignored - no incident updates yet
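
The deduplication idea can be sketched as a stable hash over the full sorted label set: identical label sets always produce the same fingerprint, so a re-firing alert maps back to its existing incident. This sketch illustrates the concept only; it is not Sizemotion's or AlertManager's exact fingerprint algorithm:

```python
import hashlib

def alert_fingerprint(labels: dict) -> str:
    """Stable fingerprint over the sorted label set (illustrative only).

    Uses unit/record separator bytes so that label keys and values
    cannot collide when concatenated.
    """
    canonical = "\x1f".join(f"{k}\x1e{v}" for k, v in sorted(labels.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()

a = {"alertname": "HighCPUUsage", "team": "backend", "instance": "api-1"}
b = dict(a)                      # same alert firing again -> same incident
c = {**a, "instance": "api-2"}   # different label set -> separate incident

assert alert_fingerprint(a) == alert_fingerprint(b)
assert alert_fingerprint(a) != alert_fingerprint(c)
```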

Missing Alert Data

If incident details are incomplete:

  • Ensure your Prometheus rules include severity labels
  • Add summary and description annotations for better incident descriptions
  • Check that AlertManager is sending the full webhook payload
  • Review the incident metadata - all Prometheus labels and annotations are preserved

Testing AlertManager Configuration

Validate your configuration before deploying:

# Check AlertManager configuration
amtool check-config alertmanager.yml

# Test routing rules
amtool config routes test \
  --config.file=alertmanager.yml \
  --tree \
  severity=critical team=backend

# View current AlertManager status
amtool --alertmanager.url=http://localhost:9093 alert query

Severity Not Mapping Correctly

Ensure your Prometheus rules use recognized severity labels:

  • Use severity: critical (not level: critical)
  • Check for typos in severity values (e.g., "critcal" vs "critical")
  • Default severity is Sev 3 when no severity label is present
  • Both severity and priority labels are supported

Getting Help

If issues persist:

  • Check Admin Settings → API Tokens for recent activity
  • Review AlertManager logs: kubectl logs -n monitoring alertmanager-0
  • See API Documentation for technical details
  • Contact your workspace administrator
✅ Integration Complete! Your Prometheus alerts will now automatically create incidents with full context. Configure routing rules to direct different alert types to the appropriate teams.

Next Steps