Common Workflows

Learn common patterns for using the Overwatch Chrome Extension effectively.

Alert Response Workflow

Scenario: Production alert triggered in monitoring platform

Alert Detection
- Monitoring platform triggers alert
- Navigate to alert details page
- Extension automatically detects context
Review Context
- Click extension icon to review extracted data
- Verify affected services and metrics
- Check severity and impact scope
Create Incident
- Click “Create Incident” in overlay
- Review pre-populated incident details
- Add any additional context
- Submit incident creation
Get AI Guidance
- Review suggested procedures
- Check similar past incidents
- Select most relevant solution
Execute Resolution
- Follow turn-by-turn guidance
- Monitor progress in real time
- Document resolution steps
Verify Fix
- Confirm metrics return to normal
- Mark incident as resolved
- Capture learnings for team
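
The "Confirm metrics return to normal" check in the Verify Fix step can be sketched as a small helper. This is only an illustration, not part of the extension: the function name, the threshold, and the rule of requiring several consecutive healthy samples are all assumptions chosen for the example.

```python
from typing import Sequence


def metric_recovered(
    samples: Sequence[float],
    threshold: float,
    required_consecutive: int = 5,  # assumed value, tune per alert
) -> bool:
    """Return True if the newest `required_consecutive` samples are all
    at or below `threshold` (hypothetical recovery rule)."""
    if required_consecutive <= 0:
        raise ValueError("required_consecutive must be positive")
    if len(samples) < required_consecutive:
        # Not enough data yet to call the metric recovered.
        return False
    recent = samples[-required_consecutive:]
    return all(value <= threshold for value in recent)


# Example: CPU percentage samples, oldest first, with an alert threshold of 80.
cpu_samples = [95, 90, 70, 60, 55, 50, 52]
recovered = metric_recovered(cpu_samples, threshold=80)
```

Requiring several consecutive healthy samples, rather than a single one, reduces the chance of resolving an incident during a brief dip.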

Scenario: Investigating issue without formal alert

Identify Problem
- Notice anomaly in dashboard
- Observe performance degradation
- Detect error patterns in logs
Report Problem
- Press Ctrl+Shift+R or click extension icon
- Select “Report Problem”
- Describe the issue in detail (minimum 50 characters)
Provide Context
- Extension auto-extracts dashboard context
- Add manual details about observations
- Select problem type and urgency
Get Solutions
- AI analyzes problem and context
- Solutions ranked by confidence
- Turn-by-turn guidance displayed
Follow Guidance
- Execute suggested troubleshooting steps
- Monitor results after each step
- Report success or failure
Document Outcome
- Mark solution as successful/unsuccessful
- Add notes about what worked
- Solution captured for future use
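
Two rules in the steps above lend themselves to a short sketch: the 50-character minimum for problem descriptions and the ranking of solutions by confidence. The `Solution` class, the 0.0 to 1.0 confidence scale, and the choice to ignore surrounding whitespace are assumptions for illustration; the extension's actual data model is not documented here.

```python
from dataclasses import dataclass
from typing import List

MIN_DESCRIPTION_CHARS = 50  # minimum stated in the Report Problem step


@dataclass
class Solution:
    """Hypothetical shape of a suggested solution."""
    title: str
    confidence: float  # assumed to range from 0.0 to 1.0


def description_is_long_enough(description: str) -> bool:
    # Assumption: leading and trailing whitespace does not count.
    return len(description.strip()) >= MIN_DESCRIPTION_CHARS


def rank_by_confidence(solutions: List[Solution]) -> List[Solution]:
    # Highest confidence first; sorted() is stable, so ties keep input order.
    return sorted(solutions, key=lambda s: s.confidence, reverse=True)


suggestions = [
    Solution("Restart worker pool", 0.62),
    Solution("Roll back last deploy", 0.91),
    Solution("Increase connection limit", 0.40),
]
ranked_titles = [s.title for s in rank_by_confidence(suggestions)]
```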

Scenario: Investigating performance degradation

Detect Degradation
- Review performance dashboards
- Identify affected metrics
- Note time range of issue
Capture Baseline
- Document normal performance metrics
- Note current degraded metrics
- Calculate deviation percentage
Report with Context
- Use on-demand reporting feature
- Describe performance issue with metrics
- Include baseline vs current comparison
Analyze Patterns
- AI identifies potential root causes
- Review suggested diagnostic procedures
- Check for recent changes
Execute Diagnostics
- Follow diagnostic procedures
- Collect additional metrics
- Narrow down root cause
Apply Fix
- Implement recommended solution
- Monitor performance recovery
- Verify return to baseline
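
The "Calculate deviation percentage" and "Verify return to baseline" steps come down to simple arithmetic. A minimal sketch follows, assuming deviation is measured relative to the baseline value and that a 5% tolerance counts as recovered; both function names and the tolerance are illustrative choices, not values defined by the extension.

```python
def deviation_percent(baseline: float, current: float) -> float:
    """Percentage change of `current` relative to `baseline`.

    Positive means the metric rose above baseline, negative means it fell.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a percentage")
    return (current - baseline) / abs(baseline) * 100.0


def back_to_baseline(
    baseline: float,
    current: float,
    tolerance_percent: float = 5.0,  # assumed tolerance
) -> bool:
    """True if `current` is within `tolerance_percent` of `baseline`."""
    return abs(deviation_percent(baseline, current)) <= tolerance_percent


# Example: p95 latency normally 200 ms, currently 350 ms.
degraded = deviation_percent(200.0, 350.0)  # 75.0
```

Including both the baseline and the computed deviation in the problem description gives the analysis a concrete comparison rather than a vague "it feels slow".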

Scenario: Issue affecting multiple monitoring platforms

Initial Detection
- Alert in primary monitoring platform
- Check other platforms for correlation
Gather Context from Each Platform
- Use extension on each platform
- Extract context from Datadog, New Relic, etc.
- Note timing and correlation
Create Unified Incident
- Create a single incident from the primary platform
- Add context from other platforms manually
- Document cross-platform correlation
Coordinated Investigation
- AI analyzes multi-platform context
- Identifies service dependencies
- Suggests investigation order
Execute Coordinated Fix
- Follow service-specific procedures
- Monitor all platforms during fix
- Verify resolution across platforms
Document Cascade
- Capture how issue propagated
- Document resolution sequence
- Share learnings with team
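
The "Note timing and correlation" step can be approximated by checking which alerts on other platforms fired close in time to the primary alert. The sketch below assumes a hypothetical `PlatformAlert` record and a five-minute window; neither comes from the extension itself, and real correlation would usually also consider service dependencies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class PlatformAlert:
    """Hypothetical record noted manually from each platform."""
    platform: str
    service: str
    fired_at: datetime


def correlated_alerts(
    primary: PlatformAlert,
    others: List[PlatformAlert],
    window: timedelta = timedelta(minutes=5),  # assumed window
) -> List[PlatformAlert]:
    """Alerts that fired within `window` of the primary alert, oldest first."""
    nearby = [a for a in others if abs(a.fired_at - primary.fired_at) <= window]
    return sorted(nearby, key=lambda a: a.fired_at)


primary = PlatformAlert("Datadog", "checkout-api", datetime(2024, 1, 10, 14, 0))
others = [
    PlatformAlert("New Relic", "payments-db", datetime(2024, 1, 10, 13, 58)),
    PlatformAlert("New Relic", "search", datetime(2024, 1, 10, 12, 30)),
    PlatformAlert("Datadog", "gateway", datetime(2024, 1, 10, 14, 3)),
]
related_services = [a.service for a in correlated_alerts(primary, others)]
```

Sorting the related alerts by time gives a first approximation of how the issue propagated, which is useful input for the Document Cascade step.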

Scenario: Investigating application errors

Error Detection
- Error appears in monitoring platform
- Review error message and stack trace
- Check error frequency and affected users
Extract Error Context
- Use extension to capture error details
- Include stack trace and error message
- Note affected services and endpoints
Report or Create Incident
- Use on-demand reporting for investigation
- Or create formal incident if critical
- Include full error context
Analyze Root Cause
- AI analyzes error patterns
- Suggests potential code issues
- Links to similar past errors
Fix and Deploy
- Follow suggested code fixes
- Test fix in staging
- Deploy to production
Monitor Resolution
- Verify error frequency decreases
- Check affected user metrics
- Confirm fix is effective
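
For the Monitor Resolution step, "Verify error frequency decreases" can be made concrete by comparing error rates before and after the deploy. This is a sketch under stated assumptions: the function names are hypothetical, and the 50% minimum reduction is an arbitrary example threshold rather than a value the extension prescribes.

```python
def error_rate(error_count: int, request_count: int) -> float:
    """Fraction of requests that failed."""
    if request_count <= 0:
        raise ValueError("request_count must be positive")
    return error_count / request_count


def fix_is_effective(
    before_errors: int,
    before_requests: int,
    after_errors: int,
    after_requests: int,
    min_reduction: float = 0.5,  # assumed: require at least a 50% drop
) -> bool:
    """True if the error rate fell by at least `min_reduction` after the fix."""
    before = error_rate(before_errors, before_requests)
    after = error_rate(after_errors, after_requests)
    if before == 0:
        # No errors before the fix: effective only if there are still none.
        return after == 0
    reduction = (before - after) / before
    return reduction >= min_reduction


# Example: 120 errors in 10,000 requests before, 6 in 10,000 after.
effective = fix_is_effective(120, 10_000, 6, 10_000)
```

Comparing rates rather than raw counts avoids a false "fixed" signal when traffic simply dropped after the deploy.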
