
AI Chat

Overwatch’s AI Chat is a conversational interface that helps you diagnose and resolve incidents through multi-turn dialogue. Rather than searching through runbooks or escalating to senior engineers, you describe what is happening and the AI guides you step-by-step toward resolution — suggesting commands, analyzing output, and refining its diagnosis as new information comes in.

The primary way to use AI Chat is through the Chrome extension’s side panel.

Open the side panel:

Platform        Shortcut
Windows/Linux   Ctrl+Shift+I
macOS           Cmd+Shift+I

You can also click the Overwatch extension icon in your browser toolbar and select Open Chat Panel.

Each chat conversation is linked to a specific incident. You can start a new conversation from an existing incident in the Overwatch dashboard, or allow the Chrome extension to create one automatically when it detects an alert.

The AI Chat follows an iterative loop of diagnosis, command execution, and refinement. Here is how a typical session progresses.

  1. Describe the problem or let the extension detect it

    You can type a description of what you are seeing, paste an error message, or let the Chrome extension auto-extract alert context from your monitoring platform. The more specific you are, the better the initial diagnosis.

  2. AI analyzes available context

    The AI combines multiple data sources to form its analysis:

    • Alert payload data extracted by the extension
    • Service registry context (which repo, deploy target, and team owns the service)
    • Your organization’s historical incidents and resolutions
    • The public knowledge base of known solutions
  3. AI suggests diagnostic commands

    If the Helper CLI is connected, the AI may suggest commands such as kubectl get pods, docker logs, or aws ecs describe-services to gather additional diagnostic information.

  4. You approve and Helper executes

    Commands are never run automatically. You review each suggestion and approve it. The Helper CLI executes the command locally on your machine and streams the output back to the chat.

  5. AI refines its diagnosis

    With the command output in hand, the AI narrows down the root cause and suggests the next step — another diagnostic command, a configuration change, a restart, or a code fix.

  6. Repeat until resolved

    The loop continues as many turns as needed. Each piece of new information helps the AI refine its understanding. When the issue is resolved, the conversation and resolution are captured for future reference.
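The six steps above form a simple loop, which can be sketched in a few lines of Python. The names here (`ChatSession`, `run_session`, the resolution check) are illustrative only, not part of the Overwatch API:

```python
from dataclasses import dataclass, field

@dataclass
class ChatSession:
    """Illustrative model of the diagnose/execute/refine loop."""
    context: list = field(default_factory=list)  # alert payload, registry data, outputs
    resolved: bool = False

def run_session(session, diagnose, approve, execute):
    """Repeat until resolved: the AI suggests a step, the user approves it,
    the Helper runs it locally, and the output refines the next diagnosis."""
    while not session.resolved:
        suggestion = diagnose(session.context)       # AI proposes the next step
        if suggestion is None:                       # nothing left to try
            break
        if not approve(suggestion):                  # commands never run automatically
            session.context.append(("declined", suggestion))
            continue
        output = execute(suggestion)                 # Helper CLI runs it on your machine
        session.context.append((suggestion, output))
        session.resolved = "resolved" in output      # toy resolution check
    return session.context
```

The key property the sketch preserves is step 4: `execute` is only ever reached after `approve` returns true.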

Overwatch uses a 5-tier model routing system powered by AWS Bedrock. The system automatically selects the appropriate model based on incident complexity, your organization’s remaining budget, and the nature of your query.

Tier   Model               Best For                                                                    Relative Cost
1      Amazon Nova Micro   Quick triage, status checks, simple questions                               Lowest
2      Claude Haiku        Fast responses, minor incidents, straightforward diagnostics                Low
3      Claude Sonnet       Balanced quality and cost — the default for most conversations              Medium
4      Claude Opus         Complex root-cause analysis, multi-service failures, architectural issues   High
5      Weaviate fallback   Knowledge base search when a known solution already exists                  Minimal

The model router scores each query based on several factors:

  • Severity level of the linked incident
  • Technical complexity (infrastructure breadth, number of services involved)
  • Error density (volume and variety of error signals)
  • Specialty domain (AWS, Kubernetes, and database issues may route to models with stronger performance in those areas)
  • Budget remaining for the billing period

All five tiers are active: the router selects the model that best matches query complexity, up to and including Opus for the most challenging problems.
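As a rough illustration of how a router might combine these factors, here is a sketch in Python. The weights, thresholds, and budget dampening below are invented for the example; Overwatch's actual scoring is internal:

```python
def route_model(severity, complexity, error_density, budget_frac, kb_hit):
    """Pick a (tier, model) pair from illustrative scoring inputs.

    severity, complexity, error_density: 0.0-1.0 scores for the query
    budget_frac: fraction of the monthly budget remaining
    kb_hit: True if a known solution already exists in the knowledge base
    """
    if kb_hit:
        return 5, "Weaviate fallback"          # cheapest path: reuse a known answer
    score = 0.4 * severity + 0.4 * complexity + 0.2 * error_density
    score *= min(1.0, budget_frac + 0.25)      # push toward cheaper tiers as budget runs low
    if score < 0.25:
        return 1, "Amazon Nova Micro"
    if score < 0.45:
        return 2, "Claude Haiku"
    if score < 0.75:
        return 3, "Claude Sonnet"
    return 4, "Claude Opus"
```

Note how a nearly exhausted budget demotes even a severe incident to a cheaper tier, matching the behavior described under quota management below.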

Each organization has a configurable AI usage quota that controls spending on LLM inference.

Key quota concepts:

  • Per-message cost tracking — Every AI response has an associated cost based on the model tier and token count. These costs are visible to administrators.
  • Monthly budget allocation — Administrators set a monthly budget ceiling. The model router adjusts its behavior as spending approaches the limit.
  • Usage dashboard — Administrators can monitor spending trends, per-user usage, and model tier distribution.
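The quota mechanics can be illustrated with a small tracker. The per-tier rates here are made-up numbers for the sketch, not Overwatch's pricing:

```python
class QuotaTracker:
    """Illustrative per-message cost tracking against a monthly budget."""

    # Hypothetical cost per 1K tokens by model tier (not real pricing).
    RATE_PER_1K = {1: 0.0004, 2: 0.001, 3: 0.003, 4: 0.015, 5: 0.0}

    def __init__(self, monthly_budget):
        self.budget = monthly_budget
        self.spent = 0.0

    def record(self, tier, tokens):
        """Charge one AI response based on its tier and token count."""
        cost = self.RATE_PER_1K[tier] * tokens / 1000
        self.spent += cost
        return cost

    def remaining_fraction(self):
        """Budget headroom the model router can consult when picking a tier."""
        return max(0.0, 1.0 - self.spent / self.budget)
```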

For detailed information on configuring quotas and monitoring costs, see the LLM Cost Management guide.

The quality of AI responses depends directly on the information you provide. Follow these guidelines to get the most accurate and actionable diagnosis.

Include Error Messages

Paste the exact error text, including timestamps and error codes. Paraphrasing errors loses critical detail that the AI uses for pattern matching.

Name the Service

Specify the affected service, application, or component by name. This allows the AI to pull context from the service registry, including the owning team and deploy target.

Describe Recent Changes

Mention any recent deployments, configuration changes, or infrastructure modifications. A large percentage of incidents correlate with recent changes.

Share Metrics and Logs

Include relevant numbers — error rates, latency percentiles, CPU or memory usage, request counts. Quantitative data helps the AI distinguish between symptoms and root causes.

Effective prompt:

Our payment-service in production started returning 502 errors about 20 minutes ago. Error rate jumped from 0.1% to 12%. We deployed v2.4.1 an hour ago. The logs show “connection refused” from the downstream billing-api. No infrastructure changes.

Less effective prompt:

Payments are broken.

The first prompt gives the AI specific service names, error codes, timing, a recent deployment to investigate, and a downstream dependency to check. The second prompt forces the AI to ask clarifying questions before it can begin diagnosis.

To reduce costs and improve response times, Overwatch caches AI responses and reuses them when a semantically similar query is asked.

How it works:

  • When you send a message, the system generates an embedding (a vector representation) of your query.
  • If a cached response exists with a similarity score above the threshold (0.95), the cached answer is returned instead of calling the LLM.
  • Cache entries have a 30-day time-to-live and track hit counts for analytics.
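A minimal sketch of this lookup, assuming a plain cosine-similarity comparison against stored embeddings. The 0.95 threshold and 30-day TTL come from the description above; the data layout and function names are illustrative:

```python
import math
import time

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

THRESHOLD = 0.95          # similarity cutoff described above
TTL = 30 * 24 * 3600      # 30-day time-to-live, in seconds

def lookup(cache, query_vec, now=None):
    """Return the best cached answer above the threshold, or None on a miss."""
    now = now or time.time()
    best = None
    for entry in cache:   # entry: {"vec": [...], "answer": str, "ts": float, "hits": int}
        if now - entry["ts"] > TTL:
            continue                       # expired entries are skipped
        sim = cosine(query_vec, entry["vec"])
        if sim >= THRESHOLD and (best is None or sim > best[0]):
            best = (sim, entry)
    if best:
        best[1]["hits"] += 1               # hit counts tracked for analytics
        return best[1]["answer"]
    return None                            # miss: fall through to the live LLM
```

A production system would use a vector index rather than a linear scan, but the threshold-and-TTL logic is the same.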

What this means for you:

  • Common questions about well-known issues resolve faster (cached responses return in milliseconds).
  • Your organization saves 30-50% on AI inference costs over time.
  • The cache is scoped to your organization — other organizations cannot see or access your cached responses.
  • Unique or novel queries always go to the live LLM for a fresh response.

Overwatch applies multiple layers of protection to the AI Chat system.

Prompt injection detection

All user input is screened for prompt injection patterns before being sent to the LLM. Attempts to manipulate the AI’s system instructions or extract internal configuration are blocked and logged.
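As a toy illustration of pattern-based screening, a detector might look like the sketch below. The patterns are invented examples; Overwatch's production detector is not documented here and is certainly more sophisticated:

```python
import re

# Invented example patterns; a real detector would use many more signals.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"reveal (your )?(system prompt|internal configuration)",
    r"you are now",
]

def screen_input(text):
    """Return True if the message looks like a prompt-injection attempt."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)
```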

Organization-scoped data isolation

Every chat conversation, cached response, and knowledge base entry is scoped to your organization. There is no cross-organization data leakage. Multi-tenant isolation is enforced at the database, cache, and vector search layers.

No training on customer data

Your conversations and incident data are never used to train or fine-tune AI models. Data flows to AWS Bedrock for inference only and is not retained by the model provider.

Encrypted in transit

All communication between the Chrome extension, the Overwatch backend, and AWS Bedrock uses TLS 1.3 encryption.

Common troubleshooting steps:

If the chat panel is not loading or responding:

  1. Verify you are logged into the Overwatch extension (check for your avatar in the side panel).
  2. Confirm the WebSocket connection indicator shows green in the dashboard.
  3. Check that your organization’s AI quota has not been exhausted.
  4. Refresh the side panel by closing and reopening it (Ctrl+Shift+I / Cmd+Shift+I).

If responses are vague or off-target:

  • Add more context to your messages — specific service names, error messages, and metrics.
  • Confirm the incident is linked to the correct service in the service registry.
  • Check whether the conversation has drifted from the original topic. Starting a new chat session can help.

If suggested commands are not executing:

  • Verify the Helper CLI is installed and running (overwatch-helper status).
  • Confirm the Helper is authenticated with your Overwatch instance.
  • Check that the command you expect is on the Helper’s allowlist. See the Helper CLI documentation for supported commands.

If you suspect you are getting a stale cached answer:

  • Add unique, situation-specific detail to your query to bypass the semantic cache.
  • Administrators can clear the organization cache from the settings panel if needed.

Last updated: February 2026