Overwatch Platform Overview
Version 2.0 | Last Updated: February 2026
Overwatch is an AI-powered incident resolution platform that helps DevOps teams diagnose and resolve incidents through conversational AI, directly integrated into the monitoring tools you already use.
What is Overwatch?
Overwatch combines three core components:
- Chrome Extension: Detects alerts on your monitoring dashboards and opens an AI chat panel for real-time diagnosis
- AI Chat: Conversational interface powered by AWS Bedrock with 5-tier model routing for cost-optimized incident analysis
- Helper CLI: Optional local agent that executes approved diagnostic commands on your infrastructure
Together, these provide a complete loop: detect an alert, diagnose it with AI, execute commands to gather data, and iterate until the problem is resolved.
Key Features
AI Chat Interface
- Conversational Diagnosis: Describe a problem in plain language and get step-by-step guidance
- Alert Context Injection: Alert data is automatically fed to the AI before analysis begins
- Multi-Turn Sessions: Linked to specific incidents for full conversation history
- Command Suggestions: AI suggests diagnostic commands that the Helper CLI can execute locally
Chrome Extension (v3)
- Alert Auto-Detection: Content scripts watch your monitoring dashboards for active alerts
- Side-Panel AI Chat: Open Overwatch’s AI chat directly from any monitoring platform (Ctrl+Shift+I / Cmd+Shift+I)
- Network Interception: Captures monitoring platform API responses for enriched context
- 8+ Platform Support: Datadog, Grafana, New Relic, PagerDuty, Prometheus, SigNoz, Elasticsearch, CloudWatch
Helper CLI Module
- Local Command Execution: Run kubectl, aws, docker, gh, and other CLI tools with AI guidance
- Environment Auto-Detection: Discovers your Kubernetes context, AWS profile, Docker setup, and installed tools
- Security Controls: Allowlist-based command validation, rate limiting (15 cmd/min), and audit logging
- Cross-Platform: macOS (ARM/x86), Linux (ARM/x86), Windows
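As an illustration of how the security controls above could fit together, here is a minimal Python sketch of an allowlist check combined with a sliding-window rate limit and an in-memory audit log. The Helper CLI itself is described as Rust-based, so this is not its implementation; the `CommandGate` class, the contents of `ALLOWED_BINARIES`, and the audit log format are all assumptions made for the example. Only the 15 commands per minute figure comes from the feature list.

```python
import shlex
import time
from collections import deque

# Hypothetical allowlist; the real Helper CLI's list is not documented here.
ALLOWED_BINARIES = {"kubectl", "aws", "docker", "gh"}
MAX_COMMANDS_PER_MINUTE = 15  # limit quoted in the feature list


class CommandGate:
    """Checks commands against an allowlist and a 60-second sliding window."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._accepted = deque()  # timestamps of recently accepted commands
        self.audit_log = []       # (timestamp, command, decision) tuples

    def check(self, command: str) -> bool:
        now = self._clock()
        # Forget accepted commands that have left the 60-second window.
        while self._accepted and now - self._accepted[0] >= 60:
            self._accepted.popleft()

        try:
            argv = shlex.split(command)
        except ValueError:  # e.g. unbalanced quotes
            argv = []

        if not argv or argv[0] not in ALLOWED_BINARIES:
            decision = "rejected: not allowlisted"
        elif len(self._accepted) >= MAX_COMMANDS_PER_MINUTE:
            decision = "rejected: rate limited"
        else:
            self._accepted.append(now)
            decision = "accepted"

        # Every decision is recorded, including rejections.
        self.audit_log.append((now, command, decision))
        return decision == "accepted"
```

Injecting the clock keeps the limiter testable without sleeping; a production gate would likely also validate subcommands and arguments, not only the binary name.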
Service Registry
- Alert-to-Service Mapping: Map monitoring alerts to GitHub repos and deploy targets
- Multi-Cloud Support: Railway, AWS ECS, Kubernetes, GCP Cloud Run, Azure, Vercel, Fly.io
- AI Context Enrichment: Service registry data is injected into chat prompts so the AI knows your infrastructure
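To make the alert-to-service mapping concrete, the sketch below resolves an alert to a registry entry and renders the result as prompt context. The `ServiceEntry` fields, the `service` key on the alert, the example repository names, and the prompt layout are hypothetical; the actual registry schema is not described in this overview.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    github_repo: str
    deploy_target: str


# Hypothetical registry keyed by the service tag carried on an alert.
REGISTRY = {
    "checkout-api": ServiceEntry("checkout-api", "example-org/checkout-api", "AWS ECS"),
    "web-frontend": ServiceEntry("web-frontend", "example-org/web-frontend", "Vercel"),
}


def resolve_service(alert: dict) -> Optional[ServiceEntry]:
    """Look up the registry entry for an alert, or None if it is unmapped."""
    return REGISTRY.get(alert.get("service", ""))


def build_prompt_context(alert: dict) -> str:
    """Render alert and registry data as plain text for a chat prompt."""
    entry = resolve_service(alert)
    lines = [f"Alert: {alert.get('title', 'unknown')}"]
    if entry is not None:
        lines.append(f"Service: {entry.name}")
        lines.append(f"Repository: {entry.github_repo}")
        lines.append(f"Deploy target: {entry.deploy_target}")
    else:
        lines.append("Service: not found in registry")
    return "\n".join(lines)
```

For example, `build_prompt_context({"title": "High 5xx rate", "service": "checkout-api"})` yields four lines naming the alert, the service, its repository, and its deploy target.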
Smart Cost Optimization
- 5-Tier Model Routing: Nova Micro → Haiku → Sonnet → Opus → Weaviate fallback
- Semantic Caching: Reduces AI costs 30-50% by caching similar queries
- Organization Quotas: Per-org budget controls with admin overrides
- Per-Message Tracking: Decimal-precision cost tracking for every AI interaction
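The sketch below shows one way semantic caching and decimal-precision cost tracking could interact: a query whose embedding is close enough to a cached one is answered from the cache at zero cost, and every interaction appends a `Decimal` cost to a ledger. The `SemanticCache` class, the 0.95 similarity threshold, the `call_model` callback, and the idea that embeddings arrive as plain lists of floats are assumptions for illustration, not the platform's documented behavior.

```python
import math
from decimal import Decimal
from typing import Callable, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.95):  # threshold is an assumed value
        self.threshold = threshold
        self.entries: List[Tuple[List[float], str]] = []

    def lookup(self, embedding: List[float]) -> Optional[str]:
        """Return the cached answer for the most similar query, if close enough."""
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best is not None and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None

    def store(self, embedding: List[float], answer: str) -> None:
        self.entries.append((embedding, answer))


def answer(
    embedding: List[float],
    cache: SemanticCache,
    ledger: List[Decimal],
    call_model: Callable[[], Tuple[str, Decimal]],
) -> str:
    """Serve from cache when possible; otherwise call the model and record its cost."""
    cached = cache.lookup(embedding)
    if cached is not None:
        ledger.append(Decimal("0"))  # cache hits are tracked as zero-cost messages
        return cached
    text, cost = call_model()
    cache.store(embedding, text)
    ledger.append(cost)  # Decimal avoids float rounding drift when summing costs
    return text
```

Summing the ledger with `sum(ledger, Decimal("0"))` gives an exact total, which is the main reason to prefer `Decimal` over `float` for per-message billing.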
Integration Ecosystem
Monitoring Platforms:
- Datadog, New Relic, Grafana, PagerDuty
- Prometheus, Elasticsearch, SigNoz, AWS CloudWatch
Communication: Slack webhooks and notifications
API-First Design:
- REST API with interactive Swagger documentation
- WebSocket API for real-time collaboration
- Webhook support for external notifications
Collaboration & Workflow
- Real-Time Updates: WebSocket-powered multi-user incident rooms
- Multi-Tenant Architecture: Organization-level data isolation with RBAC
- Role-Based Access: Engineer, Manager, Admin, and Viewer roles
- Procedure Management: Executable runbooks with step tracking and approval gates
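A rough sketch of a runbook with step tracking and approval gates follows. The four role names come from the list above, but which roles may approve a gated step is not stated in this overview, so `APPROVER_ROLES` is an assumption, as are the `Step` and `Procedure` classes themselves.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: only these roles may approve gated steps.
APPROVER_ROLES = {"Manager", "Admin"}


@dataclass
class Step:
    description: str
    requires_approval: bool = False
    approved: bool = False
    done: bool = False


@dataclass
class Procedure:
    steps: List[Step]

    def next_step(self) -> Optional[Step]:
        """Return the first step that has not been completed yet."""
        return next((s for s in self.steps if not s.done), None)

    def approve_next(self, role: str) -> None:
        """Approve the pending step if it is gated and the role is allowed."""
        step = self.next_step()
        if step is None or not step.requires_approval:
            raise ValueError("no step is waiting for approval")
        if role not in APPROVER_ROLES:
            raise PermissionError(f"role {role!r} cannot approve steps")
        step.approved = True

    def complete_next(self) -> Step:
        """Mark the pending step done, refusing if its gate is still closed."""
        step = self.next_step()
        if step is None:
            raise ValueError("procedure already finished")
        if step.requires_approval and not step.approved:
            raise PermissionError("step is waiting for approval")
        step.done = True
        return step
```

Steps are strictly ordered here, so an unapproved gate blocks everything after it; a real runbook engine might allow parallel or skippable steps.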
Analytics & Monitoring
- Incident Analytics: MTTR tracking, severity trends, team performance
- LLM Cost Monitoring: Per-model cost breakdown, caching savings, budget alerts
- Procedure Analytics: Execution success rates and optimization insights
Architecture
Core Components
- Frontend: Next.js 15 dashboard with React 18 and TypeScript
- Backend: FastAPI async API with service-layer architecture
- Data Layer: PostgreSQL (relational), Redis (cache), Weaviate (vector search)
- Chrome Extension: Manifest V3 with side-panel chat interface
- Helper CLI: Rust-based local command execution agent
Search Architecture
The platform uses a progressive search strategy:
- Layer 1 (Customer): Organization-specific historical solutions
- Layer 2 (Public): Community knowledge base via Weaviate vector database
- Layer 3 (LLM): AI-generated solutions via AWS Bedrock with semantic caching
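The three layers can be read as a fallthrough chain: each layer is tried in order and the first one that returns a result wins. The sketch below models each layer as a plain callable; the `progressive_search` function and the layer labels are hypothetical, and in a real deployment the callables would wrap the organization's history, a Weaviate query, and a Bedrock call respectively.

```python
from typing import Callable, Optional, Sequence, Tuple

# A layer is a label plus a function that returns a solution or None.
Layer = Tuple[str, Callable[[str], Optional[str]]]


def progressive_search(query: str, layers: Sequence[Layer]) -> Tuple[str, Optional[str]]:
    """Try each layer in order and return (layer_name, result) for the first hit."""
    for name, search in layers:
        result = search(query)
        if result is not None:
            return name, result
    return "none", None


# Stand-in data sources for the example; real layers would query external systems.
customer_history = {"disk full on db-1": "Rotate WAL archives (org runbook)"}
public_knowledge = {"oomkilled pod": "Raise the container memory limit"}

LAYERS: Sequence[Layer] = [
    ("customer", customer_history.get),
    ("public", public_knowledge.get),
    ("llm", lambda q: f"generated answer for: {q}"),
]
```

Ordering the layers from cheapest to most expensive means the LLM is only consulted when neither the organization's own history nor the shared knowledge base has an answer.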
Multi-Tenant Isolation
- All database models scoped by organization_id
- Queries automatically filtered by organization context
- RBAC enforced at the service layer
- Zero cross-organization data visibility
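One common way to get this kind of isolation is to make the organization scope part of the data-access object, so callers cannot issue an unscoped query. The in-memory sketch below illustrates the idea; the `Incident` model and `ScopedIncidentRepo` class are hypothetical stand-ins, and the real backend presumably applies the same filter at the database query level.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Incident:
    id: int
    organization_id: str
    title: str


class ScopedIncidentRepo:
    """Data access bound to one organization; every read is filtered by it."""

    def __init__(self, rows: List[Incident], organization_id: str):
        self._rows = rows
        self._organization_id = organization_id

    def list_incidents(self) -> List[Incident]:
        return [r for r in self._rows if r.organization_id == self._organization_id]

    def get(self, incident_id: int) -> Optional[Incident]:
        # Goes through the scoped list, so another org's ID behaves as "not found".
        for row in self.list_incidents():
            if row.id == incident_id:
                return row
        return None
```

Returning `None` for another organization's incident, instead of a permission error, avoids confirming that the record exists at all.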
Use Cases
Incident Response Teams
- Real-time alert detection and AI-powered diagnosis
- Helper CLI for hands-on infrastructure debugging
- Procedure-guided resolution workflows
Site Reliability Engineers
- Service registry maps alerts to infrastructure components
- Blast radius analysis shows incident impact scope
- Analytics for MTTR trends and team performance
Engineering Managers
- LLM cost management and quota controls
- Standardized procedures across teams
- Compliance-ready audit trails
What’s Next?
For New Users
- Quickstart Guide: Get running in 15 minutes
- Key Concepts: Core platform terminology
- AI Chat Guide: Learn conversational incident diagnosis
For Administrators
- Organization Setup: Configure your organization
- Service Registry: Map alerts to infrastructure
- LLM Cost Management: Control AI spending
For Developers
- API Documentation: Explore the REST API
- Webhooks: Receive events from Overwatch
- API Examples: Code samples in multiple languages
For support, contact support@overwatch-observability.com or see the Troubleshooting Guide.