Skip to content

Welcome to Overwatch

Conversational AI that diagnoses incidents, executes commands, and resolves problems — right from your monitoring dashboard

Overwatch is an AI-powered incident resolution platform that provides conversational, turn-by-turn guidance for DevOps teams. It integrates directly into your monitoring tools through a Chrome extension, connects to your infrastructure through a local CLI agent, and uses intelligent model routing to keep costs low while delivering expert-level incident diagnosis.

AI Chat Interface

Conversational incident diagnosis powered by AWS Bedrock with 5-tier model routing — from quick triage to deep root-cause analysis

Chrome Extension

Auto-detects alerts on Datadog, Grafana, New Relic, PagerDuty, Prometheus, and more — opens an AI chat panel right in your monitoring dashboard

Helper CLI

Optional local agent that executes approved diagnostic commands (kubectl, aws, docker, gh) with rate limiting and safety controls

Smart Cost Optimization

5-tier model routing (Nova Micro through Opus) with semantic caching reduces AI costs 30-50% while maintaining response quality

  1. An alert fires on your monitoring platform (Datadog, Grafana, PagerDuty, etc.)
  2. The Chrome extension detects it and extracts alert context automatically
  3. You open the AI chat panel and describe what you’re seeing
  4. Overwatch diagnoses the issue using alert data, service registry context, and AI analysis
  5. The Helper CLI executes commands locally (with your approval) to gather diagnostic data
  6. You iterate until the incident is resolved, building organizational knowledge along the way
  • AI-Powered Chat: Multi-turn conversational diagnosis with context injection from alerts and service registry
  • Chrome Extension v3: Manifest V3, side-panel AI chat, alert auto-detection across 8+ platforms
  • Helper CLI Module: Local command execution agent with allowlist-based security (macOS, Linux, Windows)
  • Service Registry: Map monitoring alerts to GitHub repos and deploy targets across any cloud platform
  • Blast Radius Analysis: Visualize the scope of incidents across services and alert sources
  • Semantic Search: 3-layer search (Customer knowledge, Public knowledge, LLM) with vector caching
  • Procedure Management: Create, execute, and track incident resolution runbooks
  • LLM Cost Management: Per-organization quotas, model tier routing, and semantic caching
  • Real-Time Collaboration: WebSocket-based multi-user incident rooms
  • REST API: Complete API with interactive Swagger documentation

15-Minute Quickstart

Install the extension, connect a monitoring platform, and run your first AI chat session

Start Tutorial →

Chrome Extension

Install and configure the extension for your monitoring platforms

Extension Guide →

AI Chat Guide

Learn how to get the most out of conversational incident diagnosis

Chat Guide →

  • Observability: Datadog, New Relic, Grafana, Prometheus, Elasticsearch, SigNoz, AWS CloudWatch
  • Incident Management: PagerDuty
  • Communication: Slack
  • Deploy Targets: Railway, AWS ECS, Kubernetes, GCP Cloud Run, Azure, Vercel, Fly.io
  • CLI Tools: kubectl, aws, gh, docker, terraform, helm, and more via Helper CLI
  • Google Chrome (for the extension)
  • Modern web browser for the dashboard (Chrome, Firefox, Safari, Edge)
  • macOS, Linux, or Windows (for the Helper CLI)
  • Documentation: Browse the sidebar for comprehensive guides
  • Email Support: support@overwatch-observability.com
  • API Documentation: Interactive Swagger UI at /docs on your Overwatch instance

Ready to get started? Follow the Quickstart Guide to have Overwatch running in 15 minutes.