Analytics & Reporting

Overwatch collects data from every incident, procedure execution, and AI interaction. The Analytics section of the dashboard turns this data into actionable metrics that help you identify trends, measure team performance, and optimize costs.

Access analytics from the Analytics tab in the left sidebar of the Overwatch dashboard.

Mean time to resolution (MTTR) measures the average time between incident creation and resolution. The dashboard breaks MTTR down by:

  • Severity level: Track whether critical incidents are resolved faster than lower-severity ones
  • Service: Identify which services have the longest resolution times
  • Time period: Compare MTTR week-over-week or month-over-month to measure improvement

A declining MTTR trend indicates that your team is getting faster at diagnosing and fixing problems. A spike in MTTR for a specific service may signal growing technical debt or insufficient runbook coverage.

Tip: Filter MTTR by incidents that used the AI chat versus those resolved without it. This comparison shows the measurable impact of AI-assisted diagnosis on your resolution speed.
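The MTTR-by-severity breakdown can be sketched in a few lines. This is an illustrative computation over hypothetical exported incident records; the field names (`severity`, `created_at`, `resolved_at`) are assumptions, not Overwatch's actual export schema.

```python
from datetime import datetime

# Hypothetical incident records, e.g. from a CSV export of the dashboard.
# Field names are assumptions, not Overwatch's actual schema.
incidents = [
    {"severity": "Critical", "created_at": "2024-05-01T10:00:00", "resolved_at": "2024-05-01T10:45:00"},
    {"severity": "Critical", "created_at": "2024-05-02T09:00:00", "resolved_at": "2024-05-02T10:15:00"},
    {"severity": "Low",      "created_at": "2024-05-01T08:00:00", "resolved_at": "2024-05-02T08:00:00"},
]

def mttr_by_severity(incidents):
    """Average resolution time (in minutes) grouped by severity level."""
    durations = {}
    for inc in incidents:
        created = datetime.fromisoformat(inc["created_at"])
        resolved = datetime.fromisoformat(inc["resolved_at"])
        minutes = (resolved - created).total_seconds() / 60
        durations.setdefault(inc["severity"], []).append(minutes)
    return {sev: sum(v) / len(v) for sev, v in durations.items()}

print(mttr_by_severity(incidents))
# Critical: (45 + 75) / 2 = 60.0 minutes; Low: 1440.0 minutes
```

The same grouping works for the per-service and per-time-period breakdowns by swapping the grouping key.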

The severity distribution chart shows the proportion of incidents at each severity level (Critical, High, Medium, Low) over a selected time range. Use this to:

  • Validate that your alert thresholds are well-calibrated (too many Critical incidents may indicate thresholds that are too sensitive)
  • Track whether severity distribution shifts after infrastructure changes
  • Identify periods with unusual spikes in high-severity incidents
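The proportions behind the severity distribution chart reduce to a simple frequency count. A minimal sketch, using made-up severity labels from a sample of incidents:

```python
from collections import Counter

def severity_distribution(severities):
    """Proportion of incidents at each severity level."""
    counts = Counter(severities)
    total = sum(counts.values())
    return {sev: count / total for sev, count in counts.items()}

# Illustrative sample: one week of incident severities
sample = ["Critical", "High", "High", "Medium", "Low", "Low", "Low", "Low"]
print(severity_distribution(sample))
# {'Critical': 0.125, 'High': 0.25, 'Medium': 0.125, 'Low': 0.5}
```

A distribution skewed heavily toward Critical, as noted above, often points at overly sensitive alert thresholds rather than a genuinely worse week.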

The resolution trends view plots incident volume and resolution rates over time. Key metrics include:

  • Incidents opened per day/week: Volume trend line
  • Incidents resolved per day/week: Closure rate
  • Open incident backlog: Running count of unresolved incidents
  • Resolution rate: Percentage of incidents resolved within SLA targets
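Two of these metrics are worth making precise. The open backlog is the running difference between incidents opened and resolved, and the resolution rate is the share of resolved incidents that met the SLA target. A sketch under those definitions (the SLA value and counts are illustrative):

```python
def backlog_series(opened, resolved):
    """Running open-incident backlog from daily opened/resolved counts."""
    backlog, series = 0, []
    for o, r in zip(opened, resolved):
        backlog += o - r
        series.append(backlog)
    return series

def resolution_rate(resolution_minutes, sla_minutes):
    """Share of resolved incidents that met the SLA target."""
    within = sum(1 for m in resolution_minutes if m <= sla_minutes)
    return within / len(resolution_minutes)

print(backlog_series([5, 3, 4], [2, 4, 4]))      # [3, 2, 2]
print(resolution_rate([30, 90, 120, 400], 120))  # 0.75
```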

Each team member’s profile page includes:

  • Incidents resolved: Total count and breakdown by severity
  • Average resolution time: Personal MTTR compared to team average
  • Procedures executed: Count and success rate
  • Active incidents: Current assignments

The team dashboard aggregates individual metrics into a team-level view:

  • Workload distribution: How incidents are distributed across team members
  • Response time: Average time from incident creation to first response
  • Collaboration frequency: Number of incidents with multiple contributors

Note: Team metrics are intended for identifying process improvements, not for individual performance evaluation. Use them to find bottlenecks and redistribute workload, not to rank team members.
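One simple way to spot a workload bottleneck, in the process-improvement spirit of the note above, is to compare the busiest member's load to the team average. This is a hypothetical heuristic, not a metric Overwatch computes:

```python
def workload_imbalance(assignments):
    """Ratio of the busiest member's incident load to the team average.
    A ratio well above 1.0 suggests workload should be redistributed."""
    counts = list(assignments.values())
    average = sum(counts) / len(counts)
    return max(counts) / average

loads = {"alice": 12, "bob": 4, "carol": 8}  # hypothetical assignment counts
print(workload_imbalance(loads))  # 12 / 8 = 1.5
```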

Overwatch routes AI queries through a tiered model system. The cost dashboard tracks spending across all models and highlights optimization opportunities.

| Metric | Description |
| --- | --- |
| Total spend | Cumulative cost for the billing period |
| Cost per model | Breakdown by model tier (Nova Micro, Haiku, Sonnet, Opus) |
| Queries per model | How many requests each tier handled |
| Average cost per query | Cost efficiency by model tier |
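Average cost per query is just per-tier spend divided by per-tier query volume. A sketch with invented numbers (the dollar amounts and counts are not real Overwatch pricing):

```python
# Hypothetical per-tier totals; tier names match the table above.
spend   = {"Nova Micro": 1.20, "Haiku": 4.50, "Sonnet": 30.00, "Opus": 24.00}
queries = {"Nova Micro": 4000, "Haiku": 3000, "Sonnet": 1200,  "Opus": 160}

def avg_cost_per_query(spend, queries):
    """Cost efficiency by model tier: dollars spent per query handled."""
    return {tier: spend[tier] / queries[tier] for tier in spend}

print(avg_cost_per_query(spend, queries))
# In this example Opus handles the fewest queries but costs $0.15 each,
# two orders of magnitude more per query than the lowest tier.
```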

The semantic cache intercepts repeated or similar queries and returns cached responses instead of making a new LLM call. The dashboard shows:

  • Cache hit rate: Percentage of queries served from cache
  • Estimated savings: Dollar amount saved by cache hits
  • Cache size: Number of cached query-response pairs
  • Top cached queries: Most frequently served cached responses

A healthy cache hit rate is typically between 30% and 50%. If your cache hit rate is below 20%, your team may be encountering mostly novel incidents, or the cache TTL may need adjustment.
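The hit rate and savings figures relate as follows. This sketch assumes each cache hit avoids exactly one LLM call at the average per-query cost, which is a simplification of however Overwatch actually estimates savings:

```python
def cache_stats(hits, misses, avg_llm_cost):
    """Hit rate and estimated dollars saved by serving from cache.
    Assumes each hit avoids one LLM call at the average per-query cost."""
    total = hits + misses
    hit_rate = hits / total
    return hit_rate, hits * avg_llm_cost

rate, saved = cache_stats(hits=420, misses=780, avg_llm_cost=0.012)
print(f"hit rate: {rate:.0%}, estimated savings: ${saved:.2f}")
# 420 / 1200 = 35% hit rate, within the healthy 30-50% range
```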

Set monthly spending limits under Settings > Organization > AI Budget. The dashboard tracks progress against your budget with alerts at configurable thresholds:

  • 25%: Informational notification
  • 50%: Email to organization admins
  • 75%: Dashboard warning banner
  • 100%: Hard cutoff (AI features pause until the next billing cycle or the limit is raised)
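The threshold logic above amounts to checking spend as a fraction of budget against the configured levels. A minimal sketch; the alert labels are illustrative and the thresholds mirror the defaults listed, not Overwatch's actual notification code:

```python
# Checked highest-first so the most severe crossed threshold wins.
THRESHOLDS = [
    (1.00, "hard cutoff"),
    (0.75, "dashboard banner"),
    (0.50, "admin email"),
    (0.25, "info notification"),
]

def budget_alert(spend, budget):
    """Return the highest alert tier crossed, or None if under 25%."""
    fraction = spend / budget
    for level, action in THRESHOLDS:
        if fraction >= level:
            return action
    return None

print(budget_alert(spend=380.0, budget=500.0))  # 76% -> "dashboard banner"
```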

Tip: Review the per-model breakdown monthly. If most queries are routed to higher-cost tiers (Sonnet, Opus), check whether your incident complexity genuinely requires those tiers, or if query refinement could allow lower-tier models to handle more cases.

Track the outcome of every procedure execution:

  • Success rate by procedure: Percentage of executions that completed without failures
  • Failure breakdown: Which steps fail most often and why
  • Approval bottlenecks: Average time spent waiting for approval gates

Procedures with success rates below 80% should be reviewed and updated. Common causes of failure include outdated commands, missing prerequisites, and ambiguous instructions.
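The 80% review threshold can be applied mechanically to execution outcomes. A sketch over hypothetical procedure names and results (True meaning the execution completed without failures):

```python
def procedures_needing_review(executions, threshold=0.8):
    """Flag procedures whose success rate falls below the review threshold."""
    flagged = {}
    for name, results in executions.items():
        rate = sum(results) / len(results)
        if rate < threshold:
            flagged[name] = rate
    return flagged

# Hypothetical execution outcomes: True = completed without failures.
runs = {
    "restart-db":   [True, True, True, True, False],  # 0.8 -> meets threshold
    "rotate-certs": [True, False, False, True],       # 0.5 -> needs review
}
print(procedures_needing_review(runs))  # {'rotate-certs': 0.5}
```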

Compare actual execution times against estimated durations:

  • Average execution time: How long procedures take in practice
  • Time per step: Identify steps that consistently take longer than estimated
  • Trend over time: Whether execution times improve as the team gains familiarity

Use this data to refine estimated durations in your procedures and identify steps that could be automated or simplified.
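Identifying steps that consistently overrun their estimates can be sketched as a simple tolerance check. The 20% tolerance and the step data are illustrative assumptions:

```python
def slow_steps(steps, tolerance=1.2):
    """Steps whose average actual duration exceeds the estimate by >20%."""
    return [name for name, estimated, actual in steps if actual > estimated * tolerance]

# (step name, estimated minutes, average actual minutes) - illustrative data
steps = [
    ("drain traffic",   5, 4.5),
    ("apply migration", 10, 14.0),  # 40% over estimate
    ("verify health",   3, 3.2),
]
print(slow_steps(steps))  # ['apply migration']
```

Steps that surface here repeatedly are natural candidates for the automation or simplification mentioned above.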

Analytics are most valuable when they drive action. Here is a practical review cycle:

  1. Weekly: Check MTTR trends and open incident backlog. Investigate any upward spikes.
  2. Bi-weekly: Review procedure success rates. Update or retire procedures that fail frequently.
  3. Monthly: Analyze LLM cost breakdown and caching efficiency. Adjust budget thresholds if needed.
  4. Quarterly: Assess team workload distribution and severity trends. Use findings to plan training, hiring, or infrastructure investment.

Export analytics data for use in external reports, presentations, or BI tools:

  1. Navigate to the analytics view you want to export
  2. Set the desired date range and filters
  3. Click Export in the top-right corner
  4. Choose a format:
    • CSV: Raw data for spreadsheet analysis or BI tool import
    • PDF: Formatted report with charts, suitable for stakeholder distribution

Exports include all data points visible in the current view, respecting any active filters.

Note: Exported data is scoped to your organization. It does not include data from other tenants. Exports are logged in the audit trail for compliance purposes.