Incoming Webhooks
Receive alerts from monitoring platforms
- Datadog, New Relic, PagerDuty, Grafana
- Automatic incident creation
- Background processing
- Signature validation
Complete guide to developing webhook integrations for Overwatch, covering both incoming and outgoing webhook patterns.
Webhooks enable real-time communication between Overwatch and external services. The platform supports bidirectional webhook integration:
- Incoming Webhooks: receive alerts from monitoring platforms
- Outgoing Webhooks: send notifications to external systems
External Service → Webhook Endpoint → Signature Validation
        ↓
Background Task Processing
        ↓
Incident Creation/Update
        ↓
Notification Distribution

Key Components:
Event Trigger → Webhook Configuration Lookup
        ↓
Template Rendering → HTTP Request
        ↓
Retry Logic (exponential backoff)
        ↓
Success/Failure Logging

Key Features:
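As a rough illustration of the Template Rendering step in the outgoing flow, the sketch below fills placeholders of the form `{{incident.id}}` from a nested context dictionary. The names `render_template` and `_lookup` and the regex-based substitution are assumptions made for this example; the source does not say which template engine Overwatch uses.

```python
import re
from typing import Any

# Matches {{ dotted.path }} with optional surrounding whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def _lookup(context: dict, dotted_path: str) -> Any:
    """Walk a dotted path such as 'incident.id' through nested dicts."""
    value: Any = context
    for key in dotted_path.split("."):
        value = value[key]  # KeyError signals an unknown placeholder
    return value


def render_template(template: Any, context: dict) -> Any:
    """Recursively replace placeholders in strings, dicts, and lists."""
    if isinstance(template, str):
        return _PLACEHOLDER.sub(
            lambda match: str(_lookup(context, match.group(1))),
            template,
        )
    if isinstance(template, dict):
        return {key: render_template(val, context) for key, val in template.items()}
    if isinstance(template, list):
        return [render_template(item, context) for item in template]
    return template  # numbers, booleans, None pass through unchanged


if __name__ == "__main__":
    context = {"incident": {"id": "INC-1", "title": "High CPU Usage"}}
    body = {"incident_id": "{{incident.id}}", "title": "{{incident.title}}"}
    print(render_template(body, context))
```

Raising `KeyError` on an unknown placeholder is a deliberate choice in this sketch: a misconfigured template fails loudly instead of sending a half-rendered payload.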
All incoming webhooks follow a consistent URL structure:

POST /api/v1/webhooks/{platform}/{integration_id}

Supported Platforms:

- /webhooks/datadog/{integration_id}
- /webhooks/newrelic/{integration_id}
- /webhooks/pagerduty/{integration_id}
- /webhooks/grafana/{integration_id}
- /webhooks/elasticsearch/{integration_id}
- /webhooks/signoz/{integration_id}
- /webhooks/prometheus/{integration_id}

Outgoing webhooks are configured per integration:
{
  "webhook_config": {
    "url": "https://external-system.com/api/notifications",
    "method": "POST",
    "headers": {
      "Authorization": "Bearer {{api_token}}",
      "Content-Type": "application/json"
    },
    "events": ["incident.created", "incident.resolved"],
    "template": {
      "incident_id": "{{incident.id}}",
      "title": "{{incident.title}}",
      "severity": "{{incident.severity}}",
      "status": "{{incident.status}}"
    }
  }
}

Headers:
Content-Type: application/json
X-Datadog-Signature: sha256=abc123...

Payload:

{
  "id": "1234567890",
  "alert_type": "error",
  "alert_name": "High CPU Usage",
  "alert_status": "triggered",
  "severity": "critical",
  "message": "CPU usage above 90%",
  "host": "web-server-01",
  "tags": ["env:production", "service:api"],
  "snapshot": "https://p.datadoghq.com/snapshot/...",
  "link": "https://app.datadoghq.com/monitors/..."
}

Headers:
Content-Type: application/json
X-NewRelic-Signature: sha256=def456...

Payload:

{
  "issueId": "NRINCIDENT-123456",
  "state": "OPEN",
  "title": "Error rate above threshold",
  "priority": "CRITICAL",
  "createdAt": 1696768800000,
  "policyName": "Production APM Policy",
  "conditionName": "Error Rate",
  "violationChartUrl": "https://one.newrelic.com/..."
}

Headers:
Content-Type: application/json
X-PagerDuty-Signature: v1=ghi789...

Payload:

{
  "messages": [
    {
      "event": "incident.triggered",
      "incident": {
        "id": "PT4KHLK",
        "incident_number": 234,
        "title": "Database connection timeout",
        "status": "triggered",
        "urgency": "high",
        "html_url": "https://company.pagerduty.com/incidents/...",
        "service": {
          "id": "PEYSGVA",
          "name": "Production Database"
        }
      }
    }
  ]
}

Success Response:
{
  "success": true,
  "message": "Webhook received successfully",
  "data": {
    "integration_id": "uuid",
    "alert_id": "external-alert-id",
    "processing_status": "queued"
  }
}

Error Response:
{
  "error": {
    "code": "WEBHOOK_VALIDATION_FAILED",
    "message": "Webhook signature validation failed",
    "details": {
      "expected_signature": "sha256=...",
      "received_signature": "sha256=..."
    }
  }
}

Always validate webhook signatures to ensure authenticity:
import hmac
import hashlib


def validate_webhook_signature(
    payload: bytes,
    signature: str,
    secret: str
) -> bool:
    """Validate HMAC signature for webhook."""
    expected_signature = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison avoids leaking timing information
    return hmac.compare_digest(
        f"sha256={expected_signature}",
        signature
    )

Implement rate limiting to prevent abuse:
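from fastapi import APIRouter, Request
from slowapi import Limiter
from slowapi.util import get_remote_address

router = APIRouter()
# The @limiter.limit("100/minute") syntax is slowapi's; the application
# must also set app.state.limiter = limiter for the limit to be enforced.
limiter = Limiter(key_func=get_remote_address)


@router.post("/webhooks/custom/{integration_id}")
@limiter.limit("100/minute")
async def custom_webhook(
    integration_id: str,
    request: Request  # slowapi requires the Request argument
):
    # Process webhook
    pass

Restrict webhook access to known IP ranges: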
import ipaddress

from fastapi import Request

ALLOWED_IPS = [
    "10.0.0.0/8",        # Internal network
    "172.16.0.0/12",     # VPN network
    "3.5.140.0/22",      # Datadog US
    "162.247.240.0/22"   # Datadog EU
]


def validate_source_ip(request: Request) -> bool:
    """Validate webhook source IP."""
    if request.client is None:  # No client address available
        return False
    client_ip = ipaddress.ip_address(request.client.host)
    return any(
        client_ip in ipaddress.ip_network(allowed)
        for allowed in ALLOWED_IPS
    )

Implement exponential backoff for outgoing webhooks:
import asyncio

import httpx


async def send_webhook_with_retry(
    url: str,
    payload: dict,
    max_retries: int = 3
):
    """Send webhook with exponential backoff retry."""
    # httpx.post is synchronous, so an AsyncClient is used for awaitable requests
    async with httpx.AsyncClient(timeout=30) as client:
        for attempt in range(max_retries):
            try:
                response = await client.post(url, json=payload)
                response.raise_for_status()
                return response
            except httpx.HTTPError:
                if attempt == max_retries - 1:
                    raise

                # Exponential backoff: 2^attempt seconds
                await asyncio.sleep(2 ** attempt)

Auto-disable webhooks after repeated failures:
class WebhookCircuitBreaker:
    """Circuit breaker for failing webhooks."""

    def __init__(self, failure_threshold: int = 10):
        self.failure_threshold = failure_threshold
        self.failure_count = 0
        self.is_open = False

    async def call(self, func, *args, **kwargs):
        """Execute webhook with circuit breaker."""
        if self.is_open:
            raise Exception("Circuit breaker is open")

        try:
            result = await func(*args, **kwargs)
            self.failure_count = 0  # Reset on success
            return result
        except Exception:
            self.failure_count += 1

            if self.failure_count >= self.failure_threshold:
                self.is_open = True
                # Disable webhook integration
                await self.disable_webhook()

            raise

    async def disable_webhook(self):
        """Placeholder: mark the webhook integration as disabled."""
        # The persistence logic is not shown in this guide.

Use ngrok or Cloudflare Tunnel for local webhook testing:
# Start ngrok tunnel
ngrok http 8000

# Your webhook URL becomes:
https://abcd1234.ngrok.io/api/v1/webhooks/datadog/{integration_id}

# Configure this URL in external service (Datadog, PagerDuty, etc.)

Test webhooks with curl:
# Test Datadog webhook
curl -X POST http://localhost:8000/api/v1/webhooks/datadog/{integration_id} \
  -H "Content-Type: application/json" \
  -H "X-Datadog-Signature: sha256=test" \
  -d '{
    "id": "test-alert-123",
    "alert_name": "Test Alert",
    "alert_type": "error",
    "severity": "high",
    "message": "This is a test alert"
  }'

# Test New Relic webhook
curl -X POST http://localhost:8000/api/v1/webhooks/newrelic/{integration_id} \
  -H "Content-Type: application/json" \
  -d '{
    "issueId": "TEST-123",
    "state": "OPEN",
    "title": "Test Incident",
    "priority": "CRITICAL"
  }'

Write integration tests for webhooks:
import pytest
from httpx import AsyncClient


@pytest.mark.asyncio
async def test_datadog_webhook(client: AsyncClient, integration_id: str):
    """Test Datadog webhook processing."""
    payload = {
        "id": "test-123",
        "alert_name": "Test Alert",
        "severity": "high"
    }

    response = await client.post(
        f"/api/v1/webhooks/datadog/{integration_id}",
        json=payload
    )

    assert response.status_code == 200
    assert response.json()["success"] is True

All webhook processing is logged with correlation IDs:
logger.info(
    "Webhook received",
    extra={
        "integration_id": integration_id,
        "platform": "datadog",
        "alert_id": payload.get("id"),
        "correlation_id": request.headers.get("X-Request-ID")
    }
)

Track webhook performance metrics:
from prometheus_client import Counter, Histogram

webhook_requests = Counter(
    'webhook_requests_total',
    'Total webhook requests',
    ['platform', 'status']
)

webhook_latency = Histogram(
    'webhook_processing_seconds',
    'Webhook processing time',
    ['platform']
)

Monitor webhook endpoint health:
@router.get("/webhooks/health")
async def webhook_health():
    """Health check for webhook endpoints."""
    return {
        "status": "healthy",
        "endpoints": {
            "datadog": "operational",
            "newrelic": "operational",
            "pagerduty": "operational"
        }
    }

For webhook development questions, contact support@overwatch-observability.com.
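One detail from the sample headers is worth a closer look: the Datadog and New Relic examples use a `sha256=` signature prefix, while the PagerDuty example uses `v1=`, and the validation helper in this guide only builds the `sha256=` form. The sketch below shows one way a prefix-aware check might look. The `SIGNATURE_PREFIXES` mapping and the `sign` and `verify_signature` names are hypothetical, and the sketch assumes every platform signs the raw request body with HMAC-SHA256, which the source does not confirm.

```python
import hashlib
import hmac

# Hypothetical mapping, taken from the sample headers in this guide.
SIGNATURE_PREFIXES = {
    "datadog": "sha256=",
    "newrelic": "sha256=",
    "pagerduty": "v1=",
}


def sign(payload: bytes, secret: str, platform: str) -> str:
    """Build the expected header value for a platform (assumed HMAC-SHA256)."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return SIGNATURE_PREFIXES[platform] + digest


def verify_signature(
    payload: bytes,
    header_value: str,
    secret: str,
    platform: str,
) -> bool:
    """Return True only if the header matches the platform's expected signature."""
    prefix = SIGNATURE_PREFIXES.get(platform)
    if prefix is None or not header_value.startswith(prefix):
        return False
    expected = sign(payload, secret, platform)
    # Compare as bytes so non-ASCII header values cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), header_value.encode())


if __name__ == "__main__":
    body = b'{"id": "test-alert-123"}'
    header = sign(body, "example-secret", "pagerduty")
    print(verify_signature(body, header, "example-secret", "pagerduty"))
```

Rejecting unknown platforms and mismatched prefixes before computing the HMAC keeps the failure path simple, and the final comparison stays constant-time.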