Webhooks

Complete guide to developing webhook integrations for Overwatch, covering both incoming and outgoing webhook patterns.

Webhooks enable real-time communication between Overwatch and external services. The platform supports bidirectional webhook integration:

Incoming Webhooks

Receive alerts from monitoring platforms

  • Datadog, New Relic, PagerDuty, Grafana
  • Automatic incident creation
  • Background processing
  • Signature validation

Outgoing Webhooks

Send notifications to external systems

  • Custom HTTP endpoints
  • Incident status updates
  • Procedure execution results
  • Configurable retry logic

Incoming webhook flow:

External Service → Webhook Endpoint → Signature Validation
  → Background Task Processing
  → Incident Creation/Update
  → Notification Distribution

Key Components:

  • Fast Response: Return 200 OK immediately (< 100ms)
  • Background Processing: Heavy lifting happens asynchronously
  • Signature Validation: HMAC-based webhook authenticity verification
  • Rate Limiting: Protection against webhook flooding
  • Idempotency: Handle duplicate webhook deliveries gracefully
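
The idempotency requirement above can be sketched as a small de-duplication store. This is a minimal in-memory illustration, not Overwatch's implementation: the `DeliveryDeduplicator` class, its `(integration_id, alert_id)` key, and the one-hour TTL are all assumptions, and a multi-process deployment would need a shared store instead.

```python
import time
from typing import Dict, Optional, Tuple


class DeliveryDeduplicator:
    """Remember recently seen deliveries so retries are processed only once.

    Hypothetical helper: the (integration_id, alert_id) key and the TTL are
    illustrative assumptions, not part of the documented API.
    """

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl_seconds = ttl_seconds
        self._seen: Dict[Tuple[str, str], float] = {}

    def is_duplicate(self, integration_id: str, alert_id: str,
                     now: Optional[float] = None) -> bool:
        """Return True if this delivery was already recorded within the TTL."""
        now = time.monotonic() if now is None else now
        # Drop expired entries so the store does not grow without bound.
        self._seen = {k: t for k, t in self._seen.items()
                      if now - t < self.ttl_seconds}
        key = (integration_id, alert_id)
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

A handler would call `is_duplicate(...)` before queuing background work and still return 200 OK for duplicates, so the sender stops retrying.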

Outgoing webhook flow:

Event Trigger → Webhook Configuration Lookup
  → Template Rendering → HTTP Request
  → Retry Logic (exponential backoff)
  → Success/Failure Logging

Key Features:

  • Template System: Dynamic payload generation with variables
  • Retry Mechanism: Automatic retry with exponential backoff
  • Circuit Breaker: Auto-disable failing webhooks after threshold
  • Audit Trail: Complete request/response logging

All incoming webhooks follow a consistent URL structure:

POST /api/v1/webhooks/{platform}/{integration_id}

Supported Platforms (paths shown relative to /api/v1):

  • Datadog: /webhooks/datadog/{integration_id}
  • New Relic: /webhooks/newrelic/{integration_id}
  • PagerDuty: /webhooks/pagerduty/{integration_id}
  • Grafana: /webhooks/grafana/{integration_id}
  • Elasticsearch: /webhooks/elasticsearch/{integration_id}
  • SigNoz: /webhooks/signoz/{integration_id}
  • Prometheus: /webhooks/prometheus/{integration_id}

Outgoing webhooks are configured per integration:

{
  "webhook_config": {
    "url": "https://external-system.com/api/notifications",
    "method": "POST",
    "headers": {
      "Authorization": "Bearer {{api_token}}",
      "Content-Type": "application/json"
    },
    "events": ["incident.created", "incident.resolved"],
    "template": {
      "incident_id": "{{incident.id}}",
      "title": "{{incident.title}}",
      "severity": "{{incident.severity}}",
      "status": "{{incident.status}}"
    }
  }
}
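
To illustrate how `{{incident.id}}`-style placeholders in a template might be resolved, here is a minimal sketch. It assumes variables are dotted paths into a nested context dictionary; the `render_template` function and the sample incident values are hypothetical, and the platform's actual template engine may behave differently (for example, in how it treats missing variables).

```python
import re
from typing import Any

_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def _lookup(context: dict, dotted_path: str) -> Any:
    """Resolve a dotted path such as 'incident.id' against nested dicts."""
    value: Any = context
    for part in dotted_path.split("."):
        value = value[part]  # Fails loudly (KeyError) if a variable is missing
    return value


def render_template(template: Any, context: dict) -> Any:
    """Recursively substitute {{variable}} placeholders in a template."""
    if isinstance(template, dict):
        return {key: render_template(val, context) for key, val in template.items()}
    if isinstance(template, list):
        return [render_template(item, context) for item in template]
    if isinstance(template, str):
        return _PLACEHOLDER.sub(lambda m: str(_lookup(context, m.group(1))), template)
    return template  # Numbers, booleans, None pass through unchanged


# Sample values for illustration only.
context = {"incident": {"id": "INC-42", "title": "High CPU Usage",
                        "severity": "critical", "status": "open"}}
template = {"incident_id": "{{incident.id}}", "title": "{{incident.title}}"}
rendered = render_template(template, context)
```

The same function can render header values such as `"Bearer {{api_token}}"` when the context contains an `api_token` entry.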

Example Datadog request headers:

Content-Type: application/json
X-Datadog-Signature: sha256=abc123...

Payload:

{
  "id": "1234567890",
  "alert_type": "error",
  "alert_name": "High CPU Usage",
  "alert_status": "triggered",
  "severity": "critical",
  "message": "CPU usage above 90%",
  "host": "web-server-01",
  "tags": ["env:production", "service:api"],
  "snapshot": "https://p.datadoghq.com/snapshot/...",
  "link": "https://app.datadoghq.com/monitors/..."
}

Success Response:

{
  "success": true,
  "message": "Webhook received successfully",
  "data": {
    "integration_id": "uuid",
    "alert_id": "external-alert-id",
    "processing_status": "queued"
  }
}

Error Response:

{
  "error": {
    "code": "WEBHOOK_VALIDATION_FAILED",
    "message": "Webhook signature validation failed",
    "details": {
      "expected_signature": "sha256=...",
      "received_signature": "sha256=..."
    }
  }
}

Always validate webhook signatures to ensure authenticity:

import hashlib
import hmac


def validate_webhook_signature(
    payload: bytes,
    signature: str,
    secret: str
) -> bool:
    """Validate HMAC signature for webhook."""
    # Compute the HMAC-SHA256 of the raw request body with the shared secret
    expected_signature = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking information through timing
    return hmac.compare_digest(
        f"sha256={expected_signature}",
        signature
    )
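
When testing locally it helps to produce a signature that the validator above will accept. The sketch below uses the same `sha256=<hexdigest>` scheme; the `sign_payload` name and the sample secret and body are assumptions for illustration, and it assumes the signature is computed over the raw request body.

```python
import hashlib
import hmac
import json


def sign_payload(payload: bytes, secret: str) -> str:
    """Build a 'sha256=<hexdigest>' signature header value for a raw body."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


# Sample values for illustration only.
secret = "test-secret"
body = json.dumps({"id": "test-alert-123", "alert_name": "Test Alert"}).encode()
signature = sign_payload(body, secret)

# The signature must cover the exact bytes that are sent: even a single
# extra byte (or a different JSON serialization) changes the digest.
tampered = body + b" "
```

Sending `body` with the header value `signature` should then pass validation, while `tampered` would be rejected.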

Implement rate limiting to prevent abuse:

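from fastapi import APIRouter, Request
from slowapi import Limiter
from slowapi.util import get_remote_address

router = APIRouter()
# Rate-limit per client IP. The application must also set
# `app.state.limiter = limiter` and register a RateLimitExceeded handler.
limiter = Limiter(key_func=get_remote_address)


@router.post("/webhooks/custom/{integration_id}")
@limiter.limit("100/minute")  # slowapi needs the `request` parameter below
async def custom_webhook(
    integration_id: str,
    request: Request
):
    # Process webhook
    pass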
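
To make the meaning of a `100/minute` limit concrete, here is a minimal fixed-window counter using only the standard library. It is a sketch of the general technique, not how any particular rate-limiting library works internally; the `FixedWindowLimiter` name and the choice to key by integration are assumptions.

```python
import time
from typing import Dict, List, Optional


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each key."""

    def __init__(self, limit: int = 100, window_seconds: float = 60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        # key -> [window_start, request_count]
        self._windows: Dict[str, List[float]] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Return True if the request fits in the current window."""
        now = time.monotonic() if now is None else now
        window = self._windows.get(key)
        if window is None or now - window[0] >= self.window_seconds:
            # First request for this key, or the previous window expired.
            window = self._windows[key] = [now, 0]
        if window[1] >= self.limit:
            return False  # Caller would answer 429 Too Many Requests
        window[1] += 1
        return True
```

A fixed window is simple but allows short bursts at window boundaries; sliding-window or token-bucket schemes smooth that out at the cost of more bookkeeping.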

Restrict webhook access to known IP ranges:

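import ipaddress

from fastapi import Request

ALLOWED_IPS = [
    "10.0.0.0/8",        # Internal network
    "172.16.0.0/12",     # VPN network
    "3.5.140.0/22",      # Datadog US
    "162.247.240.0/22",  # Datadog EU
]
# Parse the CIDR ranges once instead of on every request
ALLOWED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in ALLOWED_IPS]


def validate_source_ip(request: Request) -> bool:
    """Validate webhook source IP."""
    if request.client is None:  # No client address available
        return False
    client_ip = ipaddress.ip_address(request.client.host)
    return any(client_ip in network for network in ALLOWED_NETWORKS)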

Implement exponential backoff for outgoing webhooks:

import asyncio

import httpx


async def send_webhook_with_retry(
    url: str,
    payload: dict,
    max_retries: int = 3
):
    """Send webhook with exponential backoff retry."""
    # httpx.post() is synchronous; async code needs an AsyncClient
    async with httpx.AsyncClient(timeout=30) as client:
        for attempt in range(max_retries):
            try:
                response = await client.post(url, json=payload)
                response.raise_for_status()
                return response
            except httpx.HTTPError:
                if attempt == max_retries - 1:
                    raise  # Out of attempts: surface the last error
                # Exponential backoff: 2^attempt seconds
                await asyncio.sleep(2 ** attempt)
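
The delays produced by the `2 ** attempt` rule can be worked out explicitly. The sketch below also shows two common refinements, a maximum delay and random jitter; these are assumptions added for illustration and not documented Overwatch behavior, as is the `backoff_delay` helper itself.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: bool = False) -> float:
    """Delay in seconds before the retry that follows `attempt` (0-based)."""
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        # "Full jitter": pick a random delay up to the exponential value so
        # many senders retrying at once do not hit the endpoint in lockstep.
        delay = random.uniform(0, delay)
    return delay


# With max_retries = 3, a sleep happens only after the first two failures.
schedule = [backoff_delay(attempt) for attempt in range(2)]
```

Without a cap, the delay doubles indefinitely as `max_retries` grows, so a ceiling such as 60 seconds keeps the worst-case wait bounded.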

Auto-disable webhooks after repeated failures:

class WebhookCircuitBreaker:
    """Circuit breaker for failing webhooks."""

    def __init__(self, failure_threshold: int = 10):
        self.failure_threshold = failure_threshold
        self.failure_count = 0
        self.is_open = False

    async def call(self, func, *args, **kwargs):
        """Execute webhook with circuit breaker."""
        if self.is_open:
            raise RuntimeError("Circuit breaker is open")
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.is_open = True
                # Disable webhook integration
                await self.disable_webhook()
            raise
        self.failure_count = 0  # Reset on success
        return result

    async def disable_webhook(self):
        """Placeholder: mark the integration as disabled (application-specific)."""

Use ngrok or Cloudflare Tunnel for local webhook testing:

# Start ngrok tunnel
ngrok http 8000
# Your webhook URL becomes:
#   https://abcd1234.ngrok.io/api/v1/webhooks/datadog/{integration_id}
# Configure this URL in external service (Datadog, PagerDuty, etc.)

Test webhooks with curl:

# Test Datadog webhook
curl -X POST http://localhost:8000/api/v1/webhooks/datadog/{integration_id} \
  -H "Content-Type: application/json" \
  -H "X-Datadog-Signature: sha256=test" \
  -d '{
    "id": "test-alert-123",
    "alert_name": "Test Alert",
    "alert_type": "error",
    "severity": "high",
    "message": "This is a test alert"
  }'

# Test New Relic webhook
curl -X POST http://localhost:8000/api/v1/webhooks/newrelic/{integration_id} \
  -H "Content-Type: application/json" \
  -d '{
    "issueId": "TEST-123",
    "state": "OPEN",
    "title": "Test Incident",
    "priority": "CRITICAL"
  }'

Write integration tests for webhooks:

import pytest
from httpx import AsyncClient


@pytest.mark.asyncio
async def test_datadog_webhook(client: AsyncClient, integration_id: str):
    """Test Datadog webhook processing."""
    payload = {
        "id": "test-123",
        "alert_name": "Test Alert",
        "severity": "high"
    }
    response = await client.post(
        f"/api/v1/webhooks/datadog/{integration_id}",
        json=payload
    )
    assert response.status_code == 200
    assert response.json()["success"] is True

All webhook processing is logged with correlation IDs:

logger.info(
    "Webhook received",
    extra={
        "integration_id": integration_id,
        "platform": "datadog",
        "alert_id": payload.get("id"),
        "correlation_id": request.headers.get("X-Request-ID")
    }
)
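
Senders do not always include an `X-Request-ID` header, in which case the logged correlation ID would be empty. One possible approach, sketched here with a hypothetical `get_correlation_id` helper, is to fall back to a freshly generated UUID; whether Overwatch does this is an assumption.

```python
import logging
import uuid
from typing import Mapping

logger = logging.getLogger("webhooks")


def get_correlation_id(headers: Mapping[str, str]) -> str:
    """Return the caller's X-Request-ID, or mint a new ID if none was sent."""
    # Note: a plain dict lookup is case-sensitive, whereas framework header
    # objects (for example Starlette's) usually match names case-insensitively.
    return headers.get("X-Request-ID") or str(uuid.uuid4())


# Sample header value for illustration only.
correlation_id = get_correlation_id({"X-Request-ID": "req-123"})
logger.info("Webhook received", extra={"correlation_id": correlation_id})
```

Passing the same ID to the background task keeps the request log and the processing log linked.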

Track webhook performance metrics:

from prometheus_client import Counter, Histogram

# Count of received webhooks, labeled by source platform and outcome
webhook_requests = Counter(
    'webhook_requests_total',
    'Total webhook requests',
    ['platform', 'status']
)
# Distribution of processing time per platform
webhook_latency = Histogram(
    'webhook_processing_seconds',
    'Webhook processing time',
    ['platform']
)

Monitor webhook endpoint health:

@router.get("/webhooks/health")
async def webhook_health():
    """Health check for webhook endpoints."""
    return {
        "status": "healthy",
        "endpoints": {
            "datadog": "operational",
            "newrelic": "operational",
            "pagerduty": "operational"
        }
    }

For webhook development questions, contact support@overwatch-observability.com.
