Glossary
A comprehensive reference of terminology used across the Overwatch platform, documentation, and API.
Alert
A notification from a monitoring platform (such as Datadog, PagerDuty, or Prometheus) indicating that a threshold has been breached or an anomaly has been detected. Alerts arrive in Overwatch through webhook integrations and are parsed into a standardized format for analysis and incident creation.
Alert Parser
A backend service module that transforms a vendor-specific alert payload into Overwatch’s normalized alert schema. Each supported monitoring platform has its own parser (for example, datadog, prometheus, pagerduty). Parsers extract severity, source, timestamps, and metadata regardless of the originating platform’s data format.
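A minimal sketch of the per-platform parser idea. The `NormalizedAlert` fields and the payload keys (`priority`, `date`, `labels`, `starts_at`) are assumptions made for illustration; they are not the real vendor payload formats or Overwatch's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class NormalizedAlert:
    # Hypothetical normalized schema; the field names are assumptions.
    source: str
    severity: str
    timestamp: float
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_datadog(payload: Dict[str, Any]) -> NormalizedAlert:
    # Illustrative keys only, not the real Datadog webhook format.
    return NormalizedAlert(
        source="datadog",
        severity=str(payload.get("priority", "unknown")).lower(),
        timestamp=float(payload["date"]),
        metadata={"title": payload.get("title", "")},
    )


def parse_prometheus(payload: Dict[str, Any]) -> NormalizedAlert:
    # Illustrative keys only, not the real Alertmanager format.
    labels = payload.get("labels", {})
    return NormalizedAlert(
        source="prometheus",
        severity=str(labels.get("severity", "unknown")).lower(),
        timestamp=float(payload["starts_at"]),
        metadata={"alertname": labels.get("alertname", "")},
    )


# One parser per platform, looked up by platform name.
PARSERS: Dict[str, Callable[[Dict[str, Any]], NormalizedAlert]] = {
    "datadog": parse_datadog,
    "prometheus": parse_prometheus,
}


def parse_alert(platform: str, payload: Dict[str, Any]) -> NormalizedAlert:
    """Dispatch to the parser registered for `platform`."""
    try:
        parser = PARSERS[platform]
    except KeyError:
        raise ValueError(f"no parser registered for {platform!r}") from None
    return parser(payload)
```

Keeping the parsers in a lookup table means a new platform only needs one new function and one new dictionary entry.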
Blast Radius
A measure of how many services, teams, or customers are potentially affected by an incident. Overwatch calculates blast radius by analyzing service dependencies and alert correlation patterns. It is surfaced in the incident detail view to help responders prioritize their efforts.
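The dependency half of this calculation can be pictured as a graph traversal. The sketch below is not Overwatch's algorithm; it assumes a hypothetical map from each service to the services that depend on it, and it ignores alert correlation entirely.

```python
from collections import deque
from typing import Dict, List, Set


def blast_radius(dependents: Dict[str, List[str]], failing: str) -> Set[str]:
    """Return every service that transitively depends on `failing`."""
    affected: Set[str] = set()
    queue = deque([failing])
    while queue:
        service = queue.popleft()
        for downstream in dependents.get(service, []):
            # Skip services already counted, and the failing service itself.
            if downstream not in affected and downstream != failing:
                affected.add(downstream)
                queue.append(downstream)
    return affected


# Hypothetical dependency map: key is depended on by each listed service.
DEPENDENTS = {
    "postgres": ["payments", "accounts"],
    "payments": ["checkout"],
    "accounts": ["checkout", "admin-ui"],
}

affected_services = blast_radius(DEPENDENTS, "postgres")
```

In this toy graph, a failure in `postgres` reaches four services, two of them only indirectly.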
Chrome Extension
The Overwatch browser extension for Google Chrome (and Chromium-based browsers). It provides a side panel that sits over monitoring platforms such as Datadog, enabling AI-powered chat, incident reporting, on-demand diagnostics, and contextual search without leaving the monitoring interface. See also Side Panel.
Correlation
The process of linking related alerts, metrics, and traces to a single incident. Overwatch uses temporal proximity, service dependency graphs, and semantic similarity to determine which signals belong together. Correlated data reduces alert fatigue and provides responders with a unified view of an issue.
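Of the three signals named above, temporal proximity is the simplest to illustrate. The sketch below groups alerts whose timestamps fall within a window of the previous alert; the 300-second default and the `(alert_id, timestamp)` tuple shape are assumptions, and the dependency and semantic signals are left out.

```python
from typing import List, Optional, Tuple


def group_by_time(
    alerts: List[Tuple[str, float]], window: float = 300.0
) -> List[List[str]]:
    """Group alert IDs so consecutive alerts in a group are <= `window` seconds apart."""
    groups: List[List[str]] = []
    last_ts: Optional[float] = None
    for alert_id, ts in sorted(alerts, key=lambda a: a[1]):
        # Start a new group when the gap to the previous alert is too large.
        if last_ts is None or ts - last_ts > window:
            groups.append([])
        groups[-1].append(alert_id)
        last_ts = ts
    return groups


groups = group_by_time([("a", 0), ("c", 1000), ("b", 100), ("d", 1200)])
```

Here `a` and `b` land in one group and `c` and `d` in another, because the 900-second gap between `b` and `c` exceeds the window.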
Helper CLI
A lightweight command-line binary that runs on a user’s local machine. It connects to the Overwatch backend over a secure WebSocket, receives commands from AI chat sessions, and executes approved operations locally (such as running kubectl commands or querying logs). The Helper CLI is optional and operates under the user’s existing system permissions.
Incident
The core entity in Overwatch. An incident represents an ongoing or resolved service disruption and contains a title, severity level, description, timeline of events, associated alerts, AI-generated analysis, and resolution steps. Incidents progress through statuses such as active, investigating, mitigated, and resolved.
Integration
A configured connection between Overwatch and an external service. Integrations fall into two categories: inbound (monitoring platforms that send alerts to Overwatch via webhooks) and outbound (services that Overwatch notifies, such as Slack or PagerDuty). Each integration is scoped to an organization and stores its credentials securely.
LLM
Large Language Model. Overwatch uses LLMs (hosted through AWS Bedrock) to generate incident analysis, suggest resolution steps, power semantic search, and drive the AI chat interface. See also Model Tier.
Model Tier
Overwatch routes LLM requests through a tiered model selection system to optimize cost and latency. Simpler queries use smaller, faster models (such as Amazon Nova Micro or Claude Haiku), while complex analysis escalates to larger models (such as Claude Sonnet or Claude Opus). The tier is selected automatically based on query complexity, but administrators can configure cost ceilings per organization.
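A rough sketch of tiered routing under stated assumptions: the 0-to-1 complexity score, the thresholds, the relative cost units, and the reading of a cost ceiling as a per-request cap are all invented for illustration. Only the model family names come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_complexity: float  # highest complexity score this tier handles (assumed 0..1 scale)
    relative_cost: int     # made-up cost unit, for ordering only


# Ordered from cheapest to most capable; thresholds and costs are assumptions.
TIERS: List[Tier] = [
    Tier("nova-micro", 0.25, 1),
    Tier("claude-haiku", 0.5, 2),
    Tier("claude-sonnet", 0.8, 5),
    Tier("claude-opus", 1.0, 10),
]


def select_tier(complexity: float, cost_ceiling: Optional[int] = None) -> Tier:
    """Pick the cheapest tier that can handle `complexity` without exceeding the ceiling."""
    allowed = [t for t in TIERS if cost_ceiling is None or t.relative_cost <= cost_ceiling]
    if not allowed:
        raise ValueError("cost ceiling excludes every tier")
    for tier in allowed:
        if complexity <= tier.max_complexity:
            return tier
    # Nothing under the ceiling is rated for this query: use the most capable allowed tier.
    return allowed[-1]
```

With these numbers, a query scored 0.9 goes to `claude-opus` by default but falls back to `claude-sonnet` when the ceiling is 5.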
MTTR
Mean Time To Resolution. A key performance metric that measures the average elapsed time between when an incident is created and when it is marked as resolved. Overwatch tracks MTTR per team, per service, and per severity level in the Analytics dashboard.
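As a worked example of the definition, the sketch below averages the created-to-resolved durations of a set of incidents. Representing an incident as a `(created, resolved)` pair and skipping unresolved incidents are assumptions of this example.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def mttr(
    incidents: List[Tuple[datetime, Optional[datetime]]]
) -> Optional[timedelta]:
    """Average time from creation to resolution, ignoring unresolved incidents."""
    durations = [
        resolved - created
        for created, resolved in incidents
        if resolved is not None
    ]
    if not durations:
        return None  # no resolved incidents, so the metric is undefined
    return sum(durations, timedelta()) / len(durations)


start = datetime(2024, 1, 1, 12, 0)
sample = [
    (start, start + timedelta(minutes=30)),
    (start, start + timedelta(minutes=90)),
    (start, None),  # still open, excluded from the average
]
average = mttr(sample)
```

The two resolved incidents took 30 and 90 minutes, so `average` is one hour. Computing the metric per team, service, or severity level amounts to filtering the list before calling the function.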
Multi-Tenant
Overwatch’s data isolation architecture. Every database query, API response, and vector search is scoped to the requesting user’s organization. Users in one organization cannot view, search, or modify data belonging to another organization.
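One way to picture this scoping is a data-access layer whose every method requires an organization ID. The in-memory store below is purely illustrative; the class and field names are hypothetical and say nothing about how Overwatch stores data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IncidentRow:
    # Hypothetical record shape for this example.
    id: int
    org_id: str
    title: str


class IncidentStore:
    """In-memory stand-in for a database table; every read takes an org_id."""

    def __init__(self, rows: List[IncidentRow]) -> None:
        self._rows = list(rows)

    def list_for(self, org_id: str) -> List[IncidentRow]:
        # The filter is applied inside the store, so callers cannot forget it.
        return [row for row in self._rows if row.org_id == org_id]

    def get(self, org_id: str, incident_id: int) -> Optional[IncidentRow]:
        # A row owned by another organization looks the same as a missing row.
        for row in self._rows:
            if row.id == incident_id and row.org_id == org_id:
                return row
        return None


store = IncidentStore([
    IncidentRow(1, "org-a", "Checkout latency"),
    IncidentRow(2, "org-b", "Login failures"),
])
```

Here `store.get("org-a", 2)` returns `None` even though incident 2 exists, because it belongs to `org-b`.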
Organization
The top-level account boundary in Overwatch. An organization has its own users, integrations, incidents, procedures, subscription plan, and vector search namespace. Every authenticated request is scoped to exactly one organization, derived from the user’s JWT or API key.
Procedure
A structured, step-by-step operational runbook stored in Overwatch. Procedures define the actions required to diagnose or resolve a class of incident (for example, “Restart the payment service” or “Investigate database connection pool exhaustion”). Procedures can be executed manually or triggered by automation, and each execution is tracked with step-level status.
RBAC
Role-Based Access Control. Overwatch assigns each user a role (such as owner, admin, engineer, or viewer) that determines which actions they can perform. Permissions follow the pattern resource:action (for example, incidents:create or procedures:execute). API keys can also be scoped with specific permission sets.
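A small sketch of a `resource:action` check. The role names and the two example permissions come from the entry above; which permissions each role holds, and the `incidents:read` and `procedures:read` strings, are assumptions for illustration.

```python
from typing import Dict, Set

# Hypothetical role-to-permission mapping; the real assignments may differ.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"incidents:read", "procedures:read"},
    "engineer": {
        "incidents:read",
        "incidents:create",
        "procedures:read",
        "procedures:execute",
    },
}


def can(role: str, resource: str, action: str) -> bool:
    """Return True if `role` holds the permission `resource:action`."""
    return f"{resource}:{action}" in ROLE_PERMISSIONS.get(role, set())


engineer_can_execute = can("engineer", "procedures", "execute")
viewer_can_create = can("viewer", "incidents", "create")
```

An unknown role resolves to an empty permission set, so the check denies by default.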
Resolution
The outcome record attached to a resolved incident. A resolution includes the root cause description, the steps taken to remediate the issue, and optionally a link to the procedure that was executed. Resolutions are indexed in the vector database so that Overwatch can suggest similar fixes for future incidents.
Runbook
See Procedure. The terms are used interchangeably. “Runbook” is the industry-standard term; “Procedure” is the entity name in the Overwatch data model and API.
Semantic Cache
A caching layer that stores LLM responses keyed by the semantic meaning of the input query rather than its exact text. When a new query is semantically similar to a previously answered one (above a configurable similarity threshold), Overwatch returns the cached response instead of invoking the LLM again. This reduces latency and LLM costs.
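The threshold lookup can be sketched with cosine similarity over query embeddings. This example assumes the caller already has embedding vectors (the toy two-dimensional vectors stand in for real ones), uses a linear scan rather than a vector database, and picks 0.9 as an arbitrary threshold.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: Sequence[float], response: str) -> None:
        self._entries.append((list(embedding), response))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the closest cached response if it clears the threshold, else None."""
        best_score, best_response = 0.0, None
        for cached, response in self._entries:
            score = cosine(embedding, cached)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "Restart the connection pool.")
hit = cache.get([0.99, 0.05])   # nearly the same direction as the cached query
miss = cache.get([0.0, 1.0])    # orthogonal to the cached query
```

A `None` result is the signal to call the LLM and then `put` the fresh response.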
Semantic Search
A search method that matches queries based on meaning rather than keyword overlap. Overwatch converts text into vector embeddings and stores them in Weaviate. When a user searches for “database connection timeout,” semantic search also returns results about “DB pool exhaustion” and “PostgreSQL max_connections exceeded” because they share conceptual similarity.
Service Registry
A catalog of the services, applications, and infrastructure components that an organization operates. The service registry is managed in the Administrator settings and is used to enrich incident context, calculate blast radius, and route alerts to the correct on-call team.
Session
In the context of AI chat, a session represents a single conversational thread between a user and the Overwatch LLM. Sessions maintain context (including the current incident, related alerts, and previous messages) so that follow-up questions build on prior answers. Sessions can be closed and reopened, and their transcripts are stored for audit purposes.
Side Panel
The Chrome extension’s primary interface. It renders as a panel on the right-hand side of the browser window (using the Chrome Side Panel API) and provides AI chat, incident search, contextual diagnostics, and on-demand reporting while the user remains on their monitoring platform.
Vector Database
A specialized database optimized for storing and querying high-dimensional vector embeddings. Overwatch uses a vector database to power semantic search, semantic caching, and similarity-based incident correlation. See also Weaviate.
Weaviate
The vector database used by Overwatch. Weaviate stores embeddings for incidents, resolutions, procedures, and knowledge base articles. It supports hybrid search (combining vector similarity with keyword filtering) and is deployed as a managed cloud instance. Each organization’s data is isolated into its own namespace.
WebSocket
A persistent, bidirectional communication channel between the Overwatch frontend (or Chrome extension) and the backend. WebSocket connections deliver real-time updates for incident status changes, new alerts, AI chat messages, and procedure execution progress without requiring the client to poll the API.
Webhook
An HTTP callback that delivers data from one system to another when an event occurs. In Overwatch, inbound webhooks receive alert payloads from monitoring platforms (for example, Datadog sends a POST to /api/v1/webhooks/datadog when an alert fires). Outbound webhooks send notifications from Overwatch to external systems when incidents are created or resolved.
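To make the inbound example concrete, the sketch below builds the kind of POST request a monitoring platform would send. Only the `/api/v1/webhooks/datadog` path comes from the entry above; the host `overwatch.example.com`, the JSON body, and the absence of authentication headers are assumptions.

```python
import json
import urllib.request
from typing import Any, Dict

WEBHOOK_PATH = "/api/v1/webhooks/datadog"


def build_webhook_request(
    base_url: str, payload: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the inbound Datadog webhook path."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + WEBHOOK_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and illustrative payload; neither is a real endpoint or format.
req = build_webhook_request("https://overwatch.example.com", {"title": "CPU high"})

# urllib.request.urlopen(req) would deliver it; it is left out so the sketch
# has no network side effects.
```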