Analytics & Reporting

Overwatch collects data from every incident, procedure execution, and AI interaction. The Analytics section of the dashboard turns this data into actionable metrics that help you identify trends, measure team performance, and optimize costs.

Access analytics from the Analytics tab in the left sidebar of the Overwatch dashboard.

Mean time to resolution (MTTR) measures the average time between incident creation and resolution. The dashboard breaks MTTR down by:

  • Severity level: Track whether critical incidents are resolved faster than lower-severity ones
  • Service: Identify which services have the longest resolution times
  • Time period: Compare MTTR week-over-week or month-over-month to measure improvement

A declining MTTR trend indicates that your team is getting faster at diagnosing and fixing problems. A spike in MTTR for a specific service may signal growing technical debt or insufficient runbook coverage.

Tip: Filter MTTR by incidents that used the AI chat versus those resolved without it. This comparison shows the measurable impact of AI-assisted diagnosis on your resolution speed.
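The MTTR-by-severity breakdown can be sketched in a few lines. This is an illustrative computation over hypothetical exported incident records; the field names (`severity`, `created_at`, `resolved_at`) are assumptions, not Overwatch's actual export schema.

```python
from datetime import datetime

# Hypothetical incident records, e.g. from a CSV export of the dashboard.
# Field names are assumptions, not Overwatch's actual schema.
incidents = [
    {"severity": "Critical", "created_at": "2024-05-01T10:00:00", "resolved_at": "2024-05-01T10:45:00"},
    {"severity": "Critical", "created_at": "2024-05-02T09:00:00", "resolved_at": "2024-05-02T10:15:00"},
    {"severity": "Low",      "created_at": "2024-05-01T08:00:00", "resolved_at": "2024-05-02T08:00:00"},
]

def mttr_by_severity(incidents):
    """Average resolution time (in minutes) grouped by severity level."""
    durations = {}
    for inc in incidents:
        created = datetime.fromisoformat(inc["created_at"])
        resolved = datetime.fromisoformat(inc["resolved_at"])
        minutes = (resolved - created).total_seconds() / 60
        durations.setdefault(inc["severity"], []).append(minutes)
    return {sev: sum(v) / len(v) for sev, v in durations.items()}

print(mttr_by_severity(incidents))
# Critical: (45 + 75) / 2 = 60.0 minutes; Low: 1440.0 minutes
```

The same grouping works for the per-service and per-time-period breakdowns by swapping the grouping key.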

The severity distribution chart shows the proportion of incidents at each severity level (Critical, High, Medium, Low) over a selected time range. Use this to:

  • Validate that your alert thresholds are well-calibrated (too many Critical incidents may indicate thresholds that are too sensitive)
  • Track whether severity distribution shifts after infrastructure changes
  • Identify periods with unusual spikes in high-severity incidents
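The proportions behind the severity distribution chart reduce to a simple frequency count. A minimal sketch, using made-up severity labels from a sample of incidents:

```python
from collections import Counter

def severity_distribution(severities):
    """Proportion of incidents at each severity level."""
    counts = Counter(severities)
    total = sum(counts.values())
    return {sev: count / total for sev, count in counts.items()}

# Illustrative sample: one week of incident severities
sample = ["Critical", "High", "High", "Medium", "Low", "Low", "Low", "Low"]
print(severity_distribution(sample))
# {'Critical': 0.125, 'High': 0.25, 'Medium': 0.125, 'Low': 0.5}
```

A distribution skewed heavily toward Critical, as noted above, often points at overly sensitive alert thresholds rather than a genuinely worse week.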

The resolution trends view plots incident volume and resolution rates over time. Key metrics include:

  • Incidents opened per day/week: Volume trend line
  • Incidents resolved per day/week: Closure rate
  • Open incident backlog: Running count of unresolved incidents
  • Resolution rate: Percentage of incidents resolved within SLA targets
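Two of these metrics are worth making precise. The open backlog is the running difference between incidents opened and resolved, and the resolution rate is the share of resolved incidents that met the SLA target. A sketch under those definitions (the SLA value and counts are illustrative):

```python
def backlog_series(opened, resolved):
    """Running open-incident backlog from daily opened/resolved counts."""
    backlog, series = 0, []
    for o, r in zip(opened, resolved):
        backlog += o - r
        series.append(backlog)
    return series

def resolution_rate(resolution_minutes, sla_minutes):
    """Share of resolved incidents that met the SLA target."""
    within = sum(1 for m in resolution_minutes if m <= sla_minutes)
    return within / len(resolution_minutes)

print(backlog_series([5, 3, 4], [2, 4, 4]))      # [3, 2, 2]
print(resolution_rate([30, 90, 120, 400], 120))  # 0.75
```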

Each team member’s profile page includes:

  • Incidents resolved: Total count and breakdown by severity
  • Average resolution time: Personal MTTR compared to team average
  • Procedures executed: Count and success rate
  • Active incidents: Current assignments

The team dashboard aggregates individual metrics into a team-level view:

  • Workload distribution: How incidents are distributed across team members
  • Response time: Average time from incident creation to first response
  • Collaboration frequency: Number of incidents with multiple contributors

Note: Team metrics are intended for identifying process improvements, not for individual performance evaluation. Use them to find bottlenecks and redistribute workload, not to rank team members.
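One simple way to spot a workload bottleneck, in the process-improvement spirit of the note above, is to compare the busiest member's load to the team average. This is a hypothetical heuristic, not a metric Overwatch computes:

```python
def workload_imbalance(assignments):
    """Ratio of the busiest member's incident load to the team average.
    A ratio well above 1.0 suggests workload should be redistributed."""
    counts = list(assignments.values())
    average = sum(counts) / len(counts)
    return max(counts) / average

loads = {"alice": 12, "bob": 4, "carol": 8}  # hypothetical assignment counts
print(workload_imbalance(loads))  # 12 / 8 = 1.5
```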

Overwatch routes AI queries through a tiered model system. The cost dashboard tracks spending across all models and highlights optimization opportunities.

| Metric | Description |
| --- | --- |
| Total spend | Cumulative cost for the billing period |
| Cost per model | Breakdown by model tier (Nova Micro, Haiku, Sonnet, Opus) |
| Queries per model | How many requests each tier handled |
| Average cost per query | Cost efficiency by model tier |
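Average cost per query is just per-tier spend divided by per-tier query volume. A sketch with invented numbers (the dollar amounts and counts are not real Overwatch pricing):

```python
# Hypothetical per-tier totals; tier names match the table above.
spend   = {"Nova Micro": 1.20, "Haiku": 4.50, "Sonnet": 30.00, "Opus": 24.00}
queries = {"Nova Micro": 4000, "Haiku": 3000, "Sonnet": 1200,  "Opus": 160}

def avg_cost_per_query(spend, queries):
    """Cost efficiency by model tier: dollars spent per query handled."""
    return {tier: spend[tier] / queries[tier] for tier in spend}

print(avg_cost_per_query(spend, queries))
# In this example Opus handles the fewest queries but costs $0.15 each,
# two orders of magnitude more per query than the lowest tier.
```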

The semantic cache intercepts repeated or similar queries and returns cached responses instead of making a new LLM call. The dashboard shows:

  • Cache hit rate: Percentage of queries served from cache
  • Estimated savings: Dollar amount saved by cache hits
  • Cache size: Number of cached query-response pairs
  • Top cached queries: Most frequently served cached responses

A healthy cache hit rate is typically between 30% and 50%. If your cache hit rate is below 20%, your team may be encountering mostly novel incidents, or the cache TTL may need adjustment.
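The hit rate and savings figures relate as follows. This sketch assumes each cache hit avoids exactly one LLM call at the average per-query cost, which is a simplification of however Overwatch actually estimates savings:

```python
def cache_stats(hits, misses, avg_llm_cost):
    """Hit rate and estimated dollars saved by serving from cache.
    Assumes each hit avoids one LLM call at the average per-query cost."""
    total = hits + misses
    hit_rate = hits / total
    return hit_rate, hits * avg_llm_cost

rate, saved = cache_stats(hits=420, misses=780, avg_llm_cost=0.012)
print(f"hit rate: {rate:.0%}, estimated savings: ${saved:.2f}")
# 420 / 1200 = 35% hit rate, within the healthy 30-50% range
```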

Set monthly spending limits under Settings > Organization > AI Budget. The dashboard tracks progress against your budget with alerts at configurable thresholds:

  • 25%: Informational notification
  • 50%: Email to organization admins
  • 75%: Dashboard warning banner
  • 100%: Hard cutoff (AI features pause until the next billing cycle or the limit is raised)
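The threshold logic above amounts to checking spend as a fraction of budget against the configured levels. A minimal sketch; the alert labels are illustrative and the thresholds mirror the defaults listed, not Overwatch's actual notification code:

```python
# Checked highest-first so the most severe crossed threshold wins.
THRESHOLDS = [
    (1.00, "hard cutoff"),
    (0.75, "dashboard banner"),
    (0.50, "admin email"),
    (0.25, "info notification"),
]

def budget_alert(spend, budget):
    """Return the highest alert tier crossed, or None if under 25%."""
    fraction = spend / budget
    for level, action in THRESHOLDS:
        if fraction >= level:
            return action
    return None

print(budget_alert(spend=380.0, budget=500.0))  # 76% -> "dashboard banner"
```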

Tip: Review the per-model breakdown monthly. If most queries are routed to higher-cost tiers (Sonnet, Opus), check whether your incident complexity genuinely requires those tiers, or if query refinement could allow lower-tier models to handle more cases.

Track the outcome of every procedure execution:

  • Success rate by procedure: Percentage of executions that completed without failures
  • Failure breakdown: Which steps fail most often and why
  • Approval bottlenecks: Average time spent waiting for approval gates

Procedures with success rates below 80% should be reviewed and updated. Common causes of failure include outdated commands, missing prerequisites, and ambiguous instructions.
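The 80% review threshold can be applied mechanically to execution outcomes. A sketch over hypothetical procedure names and results (True meaning the execution completed without failures):

```python
def procedures_needing_review(executions, threshold=0.8):
    """Flag procedures whose success rate falls below the review threshold."""
    flagged = {}
    for name, results in executions.items():
        rate = sum(results) / len(results)
        if rate < threshold:
            flagged[name] = rate
    return flagged

# Hypothetical execution outcomes: True = completed without failures.
runs = {
    "restart-db":   [True, True, True, True, False],  # 0.8 -> meets threshold
    "rotate-certs": [True, False, False, True],       # 0.5 -> needs review
}
print(procedures_needing_review(runs))  # {'rotate-certs': 0.5}
```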

Compare actual execution times against estimated durations:

  • Average execution time: How long procedures take in practice
  • Time per step: Identify steps that consistently take longer than estimated
  • Trend over time: Whether execution times improve as the team gains familiarity

Use this data to refine estimated durations in your procedures and identify steps that could be automated or simplified.
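Identifying steps that consistently overrun their estimates can be sketched as a simple tolerance check. The 20% tolerance and the step data are illustrative assumptions:

```python
def slow_steps(steps, tolerance=1.2):
    """Steps whose average actual duration exceeds the estimate by >20%."""
    return [name for name, estimated, actual in steps if actual > estimated * tolerance]

# (step name, estimated minutes, average actual minutes) - illustrative data
steps = [
    ("drain traffic",   5, 4.5),
    ("apply migration", 10, 14.0),  # 40% over estimate
    ("verify health",   3, 3.2),
]
print(slow_steps(steps))  # ['apply migration']
```

Steps that surface here repeatedly are natural candidates for the automation or simplification mentioned above.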

Analytics are most valuable when they drive action. Here is a practical review cycle:

  1. Weekly: Check MTTR trends and open incident backlog. Investigate any upward spikes.
  2. Bi-weekly: Review procedure success rates. Update or retire procedures that fail frequently.
  3. Monthly: Analyze LLM cost breakdown and caching efficiency. Adjust budget thresholds if needed.
  4. Quarterly: Assess team workload distribution and severity trends. Use findings to plan training, hiring, or infrastructure investment.

Export analytics data for use in external reports, presentations, or BI tools:

  1. Navigate to the analytics view you want to export
  2. Set the desired date range and filters
  3. Click Export in the top-right corner
  4. Choose a format:
    • CSV: Raw data for spreadsheet analysis or BI tool import
    • PDF: Formatted report with charts, suitable for stakeholder distribution

Exports include all data points visible in the current view, respecting any active filters.

Note: Exported data is scoped to your organization. It does not include data from other tenants. Exports are logged in the audit trail for compliance purposes.