Seamless Integration with Datadog Monitoring Platform
The Overwatch Chrome Extension provides comprehensive integration with Datadog, enabling automatic context extraction from alerts, metrics, logs, and dashboards.
Alert Detection
Automatic alert detection and context extraction
Monitor and SLO status extraction
Alert correlation and impact analysis
Metric Correlation
Time series data extraction
Metric correlation and analysis
Dashboard context capture
Log Aggregation
Log message extraction
Error pattern recognition
Stack trace capture
Dashboard Integration
Dashboard screenshot capture
Widget and panel extraction
Time range and filter context
Automatic Detection For :
app.datadoghq.com (US1)
app.datadoghq.eu (EU)
us3.datadoghq.com (US3)
us5.datadoghq.com (US5)
Custom Datadog instances (configure in settings)
Navigate to Datadog
Open any Datadog page (app.datadoghq.com)
Extension automatically detects platform
Grant Permissions
Click Overwatch extension icon
Select “Enable Datadog Integration”
Grant required permissions when prompted
Configure Settings
Enable auto-detect alerts: ✅ Recommended
Enable metric extraction: ✅ Recommended
Enable log extraction: ✅ Recommended
Enable screenshot capture: ❌ Optional (performance impact)
Verify Detection
Navigate to a Datadog alert
Extension icon should show active status
Context should be automatically extracted
Datadog-Specific Settings :
"domains" : [ " app.datadoghq.com " , " app.datadoghq.eu " ],
"screenshot_on_alert" : false ,
"notification_types" : [ " alerts " , " metrics " ],
"max_metric_points" : 1000
Alert Context :
Alert ID and name
Current status (OK, Alert, Warning, No Data)
Severity level
Trigger condition and threshold
Affected tags and hosts
Alert message and details
Example Extracted Data :
" alert_name " : " High Error Rate - API " ,
" message " : " Error rate above 5% for API endpoints " ,
" tags " : [ " env:production " , " service:api " ],
" time_range " : " last_15_minutes "
" affected_hosts " : [ " api-1 " , " api-2 " , " api-3 " ],
" dashboard_url " : " https://app.datadoghq.com/dashboard/... "
Automatic Workflow :
Alert Triggered
Datadog alert appears on dashboard or notification
Extension detects alert context automatically
Context Extraction
Alert details extracted (name, severity, status)
Related metrics and logs captured
Affected infrastructure identified
Incident Creation
Click extension icon
Review pre-populated incident form
Click “Create Incident” with full context
AI Suggestions
Overwatch suggests relevant procedures
Similar incidents displayed
Turn-by-turn guidance provided
What Gets Extracted from Dashboards :
Dashboard name and URL
Time range (from/to)
Visible widgets and their queries
Current metric values
Threshold configurations
Example Dashboard Context :
" id " : " dashboard-abc-123 " ,
" title " : " API Performance Dashboard " ,
" url " : " https://app.datadoghq.com/dashboard/abc-123 " ,
" query " : " sum:api.requests{env:production} " ,
Extracted from Monitor Pages :
Monitor configuration
Query and formula
Threshold settings (warning/critical)
Notification settings
Historical trigger data
What Gets Extracted :
Log messages and timestamps
Log levels (ERROR, WARN, INFO, DEBUG)
Service and source tags
Stack traces (if present)
Related logs in time window
Example Log Context :
" timestamp " : " 2025-10-15T10:25:00Z " ,
" service " : " api-service " ,
" message " : " DatabaseTimeoutException: Connection timeout after 30 seconds " ,
" tags " : [ " env:production " , " host:api-1 " ],
" error.kind " : " DatabaseTimeoutException " ,
" most_common " : " DatabaseTimeoutException "
Using with Log Explorer :
Open Log Explorer
Navigate to Logs → Explorer
Apply filters for your investigation
Review Logs
Identify relevant error patterns
Note timestamps and affected services
Report Problem
Click Overwatch extension icon
Or use Ctrl+Shift+R shortcut
Log context automatically extracted
Get Solutions
AI analyzes log patterns
Similar errors identified
Solutions suggested with steps
Scenario : High error rate alert triggered
Receive Alert
Datadog alert notification received
Navigate to alert details page
Extension detects alert context
Extract Context
Click Overwatch extension icon
Review extracted alert data
Verify affected services and metrics
Create Incident
Click “Create Incident” in overlay
Incident pre-populated with alert context
Tags automatically assigned
Get Guidance
AI suggests relevant procedures
View similar past incidents
Follow turn-by-turn resolution steps
Track Resolution
Execute suggested procedures
Monitor metrics for improvement
Document successful resolution
Scenario : Performance degradation observed on dashboard
Identify Issue
Reviewing API performance dashboard
Notice response time spike
Correlation with error rate increase
Report Problem
Press Ctrl+Shift+R to open report form
Describe: “API response time increased 10x in last hour”
Dashboard context automatically captured
Receive Analysis
AI analyzes metrics and trends
Identifies potential root causes
Suggests investigation procedures
Follow Guidance
Execute diagnostic procedures
Check database connections
Review recent deployments
Resolve and Learn
Identify root cause (connection pool exhaustion)
Apply fix and verify
Solution captured for future use
Scenario : Investigating complex issue across multiple services
Review Metrics
Open composite dashboard
Multiple services showing issues
Time correlation observed
Capture Context
Click extension icon on dashboard
Review extracted metrics from all widgets
Note time ranges and correlations
Create Detailed Report
Use “Report Problem” feature
Describe multi-service impact
Extension includes all dashboard context
Get Coordinated Response
AI identifies service dependencies
Suggests investigation order
Provides steps for each service
Execute Coordinated Fix
Follow service-specific procedures
Monitor cross-service metrics
Verify cascade resolution
Use Descriptive Alert Names : Helps with solution matching
Tag Consistently : Tags improve context extraction
Create Focused Dashboards : Better metric correlation
Use Log Patterns : Enable pattern recognition
Document Resolutions : Build organizational knowledge
Disable Screenshot Capture : Reduces overhead
Limit Log Extraction : Set reasonable max_log_lines
Configure Time Windows : Shorter windows = faster extraction
Use Selective Detection : Enable only needed features
Verify Extracted Context : Always review before creating incident
Add Manual Context : Supplement automated extraction
Update Tags : Keep Datadog tags current and accurate
Maintain Dashboards : Well-organized dashboards extract better
Symptoms : Extension icon grayed out on Datadog pages
Solutions :
Verify URL matches configured domains
Check if Datadog integration enabled in settings
Refresh page and wait for full load
Clear browser cache and reload
Check browser console for errors
Symptoms : Missing metrics or logs in extracted data
Solutions :
Wait for page to fully load before extraction
Verify extraction limits in settings
Check if data is visible on Datadog page
Try manual refresh and re-extract
Enable debug mode to see what’s extracted
Symptoms : Slow page load or browser lag on Datadog
Solutions :
Disable screenshot capture in settings
Reduce max_log_lines and max_metric_points
Disable auto-extraction and extract manually
Clear browser cache
Check for browser extension conflicts
Symptoms : “Unable to access Datadog data” errors
Solutions :
Verify Overwatch connection status
Check Datadog session is active
Re-authenticate with Datadog if needed
Verify browser permissions granted
Contact administrator for access issues
For Datadog-specific issues, contact support@overwatch-observability.com .
Related Documentation :