
Search Features

Overwatch provides intelligent, AI-powered semantic search to help you find solutions faster. Unlike traditional keyword search, semantic search understands the meaning and context of your queries, delivering relevant results even when exact words don’t match.

Semantic search uses vector embeddings and machine learning to understand the meaning of your query, not just match keywords. This means you can:

  • Describe problems naturally: “database keeps timing out” finds solutions even if they mention “connection failures” or “query performance”
  • Find conceptually similar content: Search for “API errors” and get results about HTTP failures, timeout issues, and connection problems
  • Get context-aware results: Search considers your incident history, tech stack, and environment

The search system uses a sophisticated 3-layer architecture that progressively searches for solutions:

Layer 1 (Customer-Specific) → Layer 2 (Public Knowledge) → Layer 3 (LLM-Generated)

Each successive layer draws on broader sources, and its results carry a correspondingly lower confidence range.
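The layered lookup can be pictured as a simple cascade. The sketch below is illustrative only. `layered_search`, `Result`, and the three stub layer functions are hypothetical names. It also assumes a layer counts as a "hit" when it returns any results, which this page does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    title: str
    confidence: float
    layer: int  # 1 = customer-specific, 2 = public knowledge, 3 = LLM-generated


# A layer is any function that takes a query and returns zero or more results.
SearchLayer = Callable[[str], List[Result]]


def layered_search(query: str, layers: List[SearchLayer]) -> List[Result]:
    """Try each layer in order and return the first non-empty result set."""
    for search in layers:
        results = search(query)
        if results:
            # Highest-confidence results first within the layer that answered.
            return sorted(results, key=lambda r: r.confidence, reverse=True)
    return []  # nothing found in any layer


# Hypothetical stubs: Layer 1 is not enabled yet (Phase 2), Layer 2 has a match.
def customer_layer(query: str) -> List[Result]:
    return []


def public_layer(query: str) -> List[Result]:
    return [Result("Increase connection pool size", 0.82, 2)]


def llm_layer(query: str) -> List[Result]:
    return [Result("Generated: check database load", 0.65, 3)]


hits = layered_search("database keeps timing out", [customer_layer, public_layer, llm_layer])
```

Because Layer 2 returns a result, `llm_layer` is never called. Stopping early is also what keeps LLM generation limited to queries that the earlier layers cannot answer.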

Access search from anywhere in the platform:

From Dashboard

Dashboard → Search icon (magnifying glass) in header

Keyboard Shortcut

  • Windows/Linux: Ctrl + K
  • Mac: Cmd + K

From Any Page

The global search bar is always available in the top navigation.

  1. Enter your query: Describe your problem in natural language
  2. Review results: Results appear instantly with confidence scores
  3. Filter if needed: Use filters to narrow by type, date, or tags
  4. Select a solution: Click a result to view its details
  5. Execute or adapt: Use the solution directly or adapt it to your situation

Good Search Queries

✅ "database connection timeout issues in production"
✅ "kubernetes pod keeps crashing with OOM error"
✅ "nginx returning 502 gateway errors"
✅ "redis memory usage high how to fix"
✅ "procedures for restarting microservices safely"

Less Effective Searches

❌ "error" (too vague)
❌ "fix" (no context)
❌ "DB" (use full words for better results)
❌ "it doesn't work" (be specific about what's failing)

Search understands natural language, so describe your problem as you would to a colleague:

Examples

  • “How do I restart a database without downtime?”
  • “API is slow, need to investigate performance”
  • “Memory leak in Node.js application”
  • “Rollback last deployment to production”

The system finds conceptually related content, not just exact matches:

Query: “application crashes on startup”

Matches:

  • Procedures mentioning “service fails to initialize”
  • Incidents about “boot loop errors”
  • Solutions discussing “startup configuration issues”

Why? These are semantically related concepts even with different wording.

Search spans all platform content:

| Content Type | What’s Searched |
|---|---|
| Incidents | Titles, descriptions, comments, resolution notes |
| Procedures | Names, descriptions, step instructions, outcomes |
| Executions | Notes, observations, troubleshooting steps |
| Comments | All discussion threads and annotations |

Results are ranked by multiple factors:

  1. Semantic relevance: How closely the meaning matches your query
  2. Success rate: Procedures with higher success rates rank higher
  3. Recency: More recent solutions get a boost
  4. Your history: Content you’ve used successfully ranks higher
  5. Organization context: Matches from your tech stack prioritized
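One way to combine these five factors is a weighted sum. This is a hedged sketch. The weights, the 30-day recency half-life, and the `Candidate` fields are all assumptions for illustration, because the page lists the factors but does not document the actual formula.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights; the page lists the factors but not how they are combined.
WEIGHTS = {"relevance": 0.5, "success": 0.2, "recency": 0.1, "history": 0.1, "stack": 0.1}


@dataclass
class Candidate:
    title: str
    relevance: float      # semantic similarity to the query, 0..1
    success_rate: float   # 0..1; 0.0 for content without a success rate
    age_days: float       # days since the solution was last used or updated
    used_by_you: bool     # you have used it successfully before
    matches_stack: bool   # tagged with a technology from your organization's stack


def recency_boost(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for brand-new content, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def rank_score(c: Candidate) -> float:
    return (
        WEIGHTS["relevance"] * c.relevance
        + WEIGHTS["success"] * c.success_rate
        + WEIGHTS["recency"] * recency_boost(c.age_days)
        + WEIGHTS["history"] * float(c.used_by_you)
        + WEIGHTS["stack"] * float(c.matches_stack)
    )


def rank(candidates: List[Candidate]) -> List[Candidate]:
    return sorted(candidates, key=rank_score, reverse=True)


generic = Candidate("Generic database restart guide", 0.80, 0.50, 200, False, False)
tailored = Candidate("Restart Postgres replica", 0.78, 0.90, 5, True, True)
ranked = rank([generic, tailored])
```

`tailored` has slightly lower semantic relevance than `generic`. It still ranks first because its success rate, recency, history, and stack match outweigh that small gap.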

Overwatch implements an intelligent 3-layer search system that provides progressively broader solutions with appropriate confidence levels.

Layer 1: Customer-Specific

Status: Phase 2 (Coming Soon)

What It Is: Your organization’s private knowledge base of historical solutions.

Characteristics:

  • Highest confidence (0.8 - 1.0)
  • Organization-specific patterns and fixes
  • Learns from your team’s successful resolutions
  • Includes your runbooks and procedures
  • Adapts to your tech stack and infrastructure

When It Activates:

  • Always checked first (when enabled in Phase 2)
  • Provides matches from your incident history
  • Uses your organization’s terminology and processes

Layer 2: Public Knowledge

Status: Active (Current Phase 1)

What It Is: A community-driven knowledge base powered by the Weaviate vector database.

Characteristics:

  • Good confidence (0.7 - 0.9)
  • Best practices and proven solutions
  • Continuously updated with community contributions
  • Multiple namespaces for organized content
  • Enhanced metadata for context

Content Sources:

  • Community-contributed solutions
  • Validated procedures from multiple organizations
  • Best practices and standard approaches
  • Integration-specific troubleshooting guides

Search Strategy:

1. Try organization-specific namespace
2. Fall back to "enhanced-solutions" namespace
3. Fall back to "public-solutions" namespace
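The namespace fallback order can be sketched as follows. `query_namespace` is a hypothetical callable that stands in for a Weaviate query, and no Weaviate client calls are shown. The `org-<id>` naming for the organization namespace is also an assumption. Only "enhanced-solutions" and "public-solutions" come from this page.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: (namespace, query) -> list of matching solution titles.
QueryFn = Callable[[str, str], List[str]]


def search_with_fallback(
    query: str, org_id: str, query_namespace: QueryFn
) -> Tuple[Optional[str], List[str]]:
    """Return (namespace, hits) from the first namespace that yields any hits."""
    namespaces = [
        f"org-{org_id}",       # 1. organization-specific (naming is an assumption)
        "enhanced-solutions",  # 2. enhanced solutions
        "public-solutions",    # 3. general public solutions
    ]
    for namespace in namespaces:
        hits = query_namespace(namespace, query)
        if hits:
            return namespace, hits
    return None, []


# In-memory stand-in for the vector store: this organization has no namespace yet.
FAKE_INDEX = {
    "enhanced-solutions": ["Tune nginx proxy_read_timeout"],
    "public-solutions": ["Generic 502 troubleshooting checklist"],
}


def fake_query(namespace: str, query: str) -> List[str]:
    return FAKE_INDEX.get(namespace, [])


namespace, hits = search_with_fallback("nginx returning 502", "acme", fake_query)
```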

Layer 3: LLM-Generated

Status: Available (With Cost Controls)

What It Is: AI-generated solutions via AWS Bedrock when no existing solution is found.

Characteristics:

  • Moderate confidence (0.6 - 0.8)
  • Dynamic confidence based on context quality
  • Generated on-demand for novel problems
  • Cost-optimized with semantic caching
  • Multiple LLM providers for different complexity levels

LLM Providers:

| Provider | Use Case | Cost | Speed |
|---|---|---|---|
| Nova Lite | Simple, straightforward incidents | Lowest | Fastest |
| Claude Sonnet 4 | Standard production use, real-time alerts | Medium | Fast |
| Claude Opus 4 | Complex incidents, root cause analysis | Highest | Slower |
| Nova Pro | Multimodal analysis (with images/charts) | Medium | Fast |
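Automatic selection among the models in the table above might look like the sketch below. The complexity heuristic in `estimate_complexity` is hypothetical. The returned strings are the display names from the table, not AWS Bedrock model IDs.

```python
from enum import Enum


class Complexity(Enum):
    SIMPLE = 1
    STANDARD = 2
    COMPLEX = 3


def estimate_complexity(affected_components: int, needs_root_cause: bool) -> Complexity:
    """Hypothetical heuristic; the platform's real classifier is not documented."""
    if needs_root_cause or affected_components > 3:
        return Complexity.COMPLEX
    if affected_components > 1:
        return Complexity.STANDARD
    return Complexity.SIMPLE


def choose_model(complexity: Complexity, has_images: bool = False) -> str:
    """Pick the lowest-cost model from the table that fits the request."""
    if has_images:
        return "Nova Pro"  # the only multimodal option in the table
    if complexity is Complexity.SIMPLE:
        return "Nova Lite"
    if complexity is Complexity.STANDARD:
        return "Claude Sonnet 4"
    return "Claude Opus 4"
```

For example, a single-service alert with no root-cause request maps to `Complexity.SIMPLE` and therefore to Nova Lite, the cheapest row in the table.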

When It Activates:

  • No results found in Layer 1 or Layer 2
  • Novel incidents without historical precedent
  • Complex multi-component failures
  • When explicitly requested with LLM-specific search

Cost Optimization:

  • Semantic caching: 30-50% cost reduction by caching similar queries
  • Provider selection: Automatically chooses cheapest appropriate model
  • Budget controls: Monthly limits and alerts to prevent overspend
  • Cache hits are free: A cached response incurs no new LLM charge
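A semantic cache differs from an ordinary cache because a lookup can succeed when a new query's embedding is close enough to a cached one, even if the text is not identical. This sketch assumes a cosine-similarity cutoff of 0.95 and uses tiny hand-written vectors in place of real embeddings. Both are illustrative choices, not documented settings.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold  # assumed similarity cutoff for a cache hit
        self.entries: List[Tuple[Sequence[float], str]] = []
        self.hits = 0
        self.misses = 0

    def get(self, query_vec: Sequence[float]) -> Optional[str]:
        """Return the cached response for the most similar past query, if close enough."""
        best_response, best_sim = None, -1.0
        for vec, response in self.entries:
            sim = cosine(query_vec, vec)
            if sim > best_sim:
                best_response, best_sim = response, sim
        if best_response is not None and best_sim >= self.threshold:
            self.hits += 1  # no LLM call needed
            return best_response
        self.misses += 1  # caller pays for an LLM call, then calls put()
        return None

    def put(self, query_vec: Sequence[float], response: str) -> None:
        self.entries.append((query_vec, response))


cache = SemanticCache()
cache.put([0.90, 0.10, 0.00], "Raise Redis maxmemory and set an eviction policy")
reworded = cache.get([0.88, 0.12, 0.01])   # near-duplicate wording: hit
unrelated = cache.get([0.00, 0.20, 0.90])  # different problem: miss
```

A linear scan keeps the sketch short. A production cache would more likely use the same vector index as search.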

Every search result includes a confidence score to help you evaluate its relevance:

| Score Range | Meaning | Typical Layer | Recommendation |
|---|---|---|---|
| 0.9 - 1.0 | Exact match | Layer 1 | Strongly recommended: follow directly |
| 0.8 - 0.89 | Very high confidence | Layer 1 | Recommended: likely to resolve the issue |
| 0.7 - 0.79 | Good match | Layer 2 | Recommended: adapt to your environment |
| 0.6 - 0.69 | Moderate match | Layer 2/3 | Consider carefully: may need customization |
| 0.5 - 0.59 | Possible match | Layer 3 | Use caution: generated solution |
| < 0.5 | Low confidence | Layer 3 | Review thoroughly: may not apply |
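The bands in the table above partition the score range from top to bottom, as this short sketch shows. It treats each range's lower bound as inclusive, so a value such as 0.895, which falls between two listed ranges, lands in the lower band. That boundary handling is an assumption.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score to the band names in the confidence table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence scores range from 0 to 1")
    # (inclusive lower bound, band name), checked from highest to lowest.
    bands = [
        (0.9, "Exact match"),
        (0.8, "Very high confidence"),
        (0.7, "Good match"),
        (0.6, "Moderate match"),
        (0.5, "Possible match"),
    ]
    for lower, name in bands:
        if score >= lower:
            return name
    return "Low confidence"
```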

Each search result displays:

Header

  • Title and brief description
  • Confidence score with visual indicator
  • Source layer (Customer/Public/LLM)
  • Result type (Incident/Procedure/Solution)

Content Preview

  • First few lines of content
  • Relevant keywords highlighted
  • Success rate (for procedures)
  • Last used date

Metadata

  • Tags and categories
  • Estimated time to execute
  • Complexity level
  • Prerequisites required

Available actions depend on result type:

For Procedures

  • View full procedure
  • Execute immediately
  • Copy to clipboard
  • Save to favorites
  • Share with team

For Incidents

  • View incident details
  • See resolution steps
  • Copy resolution notes
  • Link to current incident
  • View related incidents

For LLM Solutions

  • View generated steps
  • Execute as procedure
  • Provide feedback
  • Report issues
  • Request regeneration

Refine search results using filters:

Content Type

  • Incidents
  • Procedures
  • Executions
  • Solutions

Time Range

  • Last 24 hours
  • Last 7 days
  • Last 30 days
  • Last 90 days
  • Custom range

Confidence

  • High confidence only (≥ 0.8)
  • Medium+ (≥ 0.7)
  • All results

Source Layer

  • Customer-specific (Layer 1)
  • Public knowledge (Layer 2)
  • LLM-generated (Layer 3)

Tags

  • Filter by any incident/procedure tags
  • Combine multiple tags
  • Exclude specific tags

Success Rate (for procedures)

  • High success (≥ 80%)
  • Medium success (≥ 60%)
  • All procedures

Combine filters for precise results:

Example 1: Recent High-Confidence Solutions

Time Range: Last 7 days
Confidence: ≥ 0.8
Content Type: Procedures
Success Rate: ≥ 80%

Example 2: Database Issues

Tags: database, postgresql
Content Type: All
Confidence: ≥ 0.7
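The two examples above translate into a single filter predicate. This sketch uses hypothetical `Item` and `Filters` types. It assumes that multiple tags are combined with AND, which the page does not state. As the Success Rate filter description says, it applies the success-rate check to procedures only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional, Set


@dataclass
class Item:
    kind: str                             # "incident", "procedure", "execution", or "solution"
    confidence: float
    created: datetime
    tags: Set[str]
    success_rate: Optional[float] = None  # procedures only


@dataclass
class Filters:
    kinds: Optional[Set[str]] = None      # None means "All" content types
    max_age: Optional[timedelta] = None   # None means any time range
    min_confidence: float = 0.0
    required_tags: Set[str] = field(default_factory=set)  # assumed AND semantics
    excluded_tags: Set[str] = field(default_factory=set)
    min_success_rate: Optional[float] = None


def matches(item: Item, f: Filters, now: datetime) -> bool:
    """Return True if the item passes every active filter."""
    if f.kinds is not None and item.kind not in f.kinds:
        return False
    if f.max_age is not None and now - item.created > f.max_age:
        return False
    if item.confidence < f.min_confidence:
        return False
    if not f.required_tags <= item.tags:
        return False
    if item.tags & f.excluded_tags:
        return False
    if f.min_success_rate is not None and item.kind == "procedure":
        if item.success_rate is None or item.success_rate < f.min_success_rate:
            return False
    return True


now = datetime(2025, 10, 15, tzinfo=timezone.utc)
recent_proc = Item("procedure", 0.86, now - timedelta(days=2), {"database"}, 0.92)
stale_proc = Item("procedure", 0.91, now - timedelta(days=30), {"database"}, 0.95)
pg_incident = Item("incident", 0.75, now - timedelta(days=100), {"database", "postgresql"})

example_1 = Filters(kinds={"procedure"}, max_age=timedelta(days=7),
                    min_confidence=0.8, min_success_rate=0.8)
example_2 = Filters(required_tags={"database", "postgresql"}, min_confidence=0.7)

items = [recent_proc, stale_proc, pg_incident]
recent_high = [i for i in items if matches(i, example_1, now)]
database_hits = [i for i in items if matches(i, example_2, now)]
```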

Be Specific

✅ "PostgreSQL connection pool exhausted in production API"
❌ "database problem"

Include Context

✅ "kubernetes pod OOM killed in staging namespace"
❌ "pod crashed"

Mention Technology

✅ "nginx 502 error after deployment to AWS ECS"
❌ "gateway error"

Describe Symptoms

✅ "Redis memory usage 90% causing slow response times"
❌ "Redis issue"

While semantic search understands natural language, you can use operators for precise control:

AND (implicit)

"database timeout production"
(finds results containing all these concepts)

OR

"postgresql OR mysql connection issues"
(finds either database type)

Exclude with Minus

"kubernetes crash -memory"
(excludes memory-related issues)

Exact Phrase with Quotes

"connection refused" nginx
(exact phrase + semantic match)
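A query that uses these operators can be split into its parts before searching. The parser below is a hypothetical sketch of the syntax described above (implicit AND, `OR`, leading `-`, and double-quoted phrases), not Overwatch's actual parser. It assumes plain terms and `OR` groups go to semantic matching, while phrases and exclusions are checked literally against result text. In `"database timeout production"` the quotes only mark the example, while in `"connection refused" nginx` they are the phrase operator.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParsedQuery:
    terms: List[str] = field(default_factory=list)         # implicit AND, matched semantically
    any_of: List[List[str]] = field(default_factory=list)  # "a OR b" groups
    phrases: List[str] = field(default_factory=list)       # "quoted phrase", matched literally
    excluded: List[str] = field(default_factory=list)      # -term, must not appear


# Either a double-quoted phrase (group 1) or a run of non-space characters (group 2).
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query: str) -> ParsedQuery:
    pq = ParsedQuery()
    tokens = TOKEN.findall(query)  # list of (phrase, word) tuples; one side is ""
    i = 0
    while i < len(tokens):
        phrase, word = tokens[i]
        if phrase:
            pq.phrases.append(phrase)
        elif word.startswith("-") and len(word) > 1:
            pq.excluded.append(word[1:])
        elif i + 2 < len(tokens) and tokens[i + 1] == ("", "OR"):
            group = [word]
            i += 1
            # Consume "OR x" pairs, e.g. "postgresql OR mysql OR mariadb".
            while i + 1 < len(tokens) and tokens[i] == ("", "OR"):
                group.append(tokens[i + 1][0] or tokens[i + 1][1])
                i += 2
            pq.any_of.append(group)
            continue
        else:
            pq.terms.append(word)
        i += 1
    return pq


def passes_literal_checks(pq: ParsedQuery, text: str) -> bool:
    """Phrases must appear verbatim and excluded terms must not (case-insensitive)."""
    lowered = text.lower()
    return all(p.lower() in lowered for p in pq.phrases) and not any(
        x.lower() in lowered for x in pq.excluded
    )


q1 = parse_query("postgresql OR mysql connection issues")
q2 = parse_query("kubernetes crash -memory")
q3 = parse_query('"connection refused" nginx')
```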

Find solutions similar to a specific incident:

From Incident Detail Page

Incident Detail → "Find Similar" button → Results appear

This searches using the incident’s full context, not just its title.

The system provides proactive suggestions in these scenarios:

When Creating Incidents

  • Suggests similar incidents as you type
  • Recommends relevant procedures based on description
  • Highlights recent similar issues
  • Shows if issue has known solutions

When Viewing Incidents

  • Automatically searches for solutions
  • Displays top 5 most relevant results
  • Updates as you add more details
  • Learns from successful resolutions

During Procedure Execution

  • Suggests next best steps
  • Recommends troubleshooting procedures
  • Identifies common failure points
  • Links to relevant documentation

When you view an incident, AI suggestions appear in the sidebar:

Suggested Procedures

  • Procedures with high success rates for similar issues
  • Ranked by relevance and past success
  • Shows estimated resolution time
  • Includes difficulty level

Similar Incidents

  • Incidents with matching symptoms
  • Filtered by successful resolutions
  • Shows resolution time and steps
  • Links to full incident details

Related Solutions

  • Solutions from public knowledge base
  • LLM-generated guidance (if enabled)
  • External documentation links
  • Best practices articles

Suggestions improve with available context:

From Chrome Extension

  • Alert details from monitoring platform
  • Error messages and stack traces
  • Affected services and metrics
  • Monitoring dashboard links

From Incident Details

  • Technology stack information
  • Environment (production/staging)
  • Affected components
  • Previous similar incidents

From Team History

  • Your team’s successful resolution patterns
  • Procedures your organization uses frequently
  • Tech stack and infrastructure details
  • Team expertise and preferences

Content Transformation

  1. All content (incidents, procedures, comments) converted to vector embeddings
  2. Embeddings capture semantic meaning in 1536-dimensional space
  3. Similar meanings cluster together in vector space
  4. Search queries converted to same embedding space

Similarity Matching

  • Cosine similarity measures semantic closeness
  • Threshold of 0.7 minimum for quality results
  • Multiple namespace fallback for broader coverage
  • Real-time re-ranking based on success rates
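The matching step can be sketched in a few lines. It computes cosine similarity between the query embedding and each stored embedding, drops matches below the 0.7 threshold stated above, and adds a re-ranking bonus for higher success rates. The 3-dimensional vectors stand in for real 1536-dimensional embeddings, and the additive `boost` of 0.1 is an assumption. The resulting score is only a sort key, not the confidence value shown in the UI.

```python
import math
from typing import Dict, List, Sequence, Tuple

MIN_SIMILARITY = 0.7  # minimum threshold stated above


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(
    query_vec: Sequence[float],
    docs: Dict[str, Sequence[float]],
    success_rates: Dict[str, float],
    boost: float = 0.1,
) -> List[Tuple[str, float]]:
    hits = []
    for doc_id, vec in docs.items():
        similarity = cosine_similarity(query_vec, vec)
        if similarity < MIN_SIMILARITY:
            continue  # below the quality threshold
        # Re-rank: add a small bonus proportional to the procedure's success rate.
        hits.append((doc_id, similarity + boost * success_rates.get(doc_id, 0.0)))
    return sorted(hits, key=lambda h: h[1], reverse=True)


docs = {
    "restart-postgres": [0.90, 0.30, 0.10],
    "tune-pool": [0.85, 0.35, 0.20],
    "rotate-certs": [0.00, 0.10, 0.99],
}
success = {"restart-postgres": 0.60, "tune-pool": 0.95}
results = semantic_search([0.88, 0.32, 0.15], docs, success)
```

Both database procedures are almost equally similar to the query, so the higher success rate of `tune-pool` moves it to the top, and `rotate-certs` is filtered out by the threshold.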

The search system improves over time:

Pattern Recognition

  • Tracks which results users select
  • Learns which procedures succeed
  • Identifies common resolution paths
  • Adjusts ranking based on success

Feedback Loop

  • Successful resolutions boost similar content
  • Failed procedures get confidence reduction
  • User feedback affects future rankings
  • Team patterns influence organization results
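A simple model of this loop keeps success and failure counts per procedure and turns them into a ranking multiplier. The Laplace smoothing and the 0.5 to 1.5 multiplier range are assumptions for illustration, not documented platform behavior.

```python
from dataclasses import dataclass


@dataclass
class FeedbackStats:
    successes: int = 0
    failures: int = 0

    def record(self, succeeded: bool) -> None:
        """Record the outcome of one execution of this procedure."""
        if succeeded:
            self.successes += 1
        else:
            self.failures += 1

    def multiplier(self) -> float:
        """Smoothed success rate (starts at 0.5) shifted into a 0.5..1.5 multiplier."""
        rate = (self.successes + 1) / (self.successes + self.failures + 2)
        return 0.5 + rate


def adjusted_confidence(base: float, stats: FeedbackStats) -> float:
    """Boost or reduce a base confidence, capped at 1.0."""
    return min(1.0, base * stats.multiplier())


reliable, flaky = FeedbackStats(), FeedbackStats()
for _ in range(3):
    reliable.record(True)
    flaky.record(False)
```

With no feedback the multiplier is exactly 1.0, so new content is neither boosted nor penalized until outcomes are recorded.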

Solution Capture (Layer 3)

  • 15% of LLM solutions automatically captured
  • High-confidence solutions (≥ 0.8) promoted to Layer 2
  • Creates learning loop for continuous improvement
  • Your successful resolutions help entire community
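Deterministic hash-based sampling is one way to implement a fixed capture rate, because the same solution is always either in or out of the sample. This sketch is an assumption on two counts. The hashing approach is not documented, and the rule that only captured solutions are eligible for promotion is a guess.

```python
import hashlib

CAPTURE_RATE = 0.15        # "15% of LLM solutions automatically captured"
PROMOTION_THRESHOLD = 0.8  # "High-confidence solutions (≥ 0.8) promoted to Layer 2"


def is_captured(solution_id: str, rate: float = CAPTURE_RATE) -> bool:
    """Deterministic sampling: hash the ID to a number in [0, 1) and compare with the rate."""
    digest = hashlib.sha256(solution_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # 48 bits keeps the value below 1.0
    return bucket < rate


def should_promote(solution_id: str, confidence: float) -> bool:
    """Assumed rule: only captured solutions at or above the threshold reach Layer 2."""
    return is_captured(solution_id) and confidence >= PROMOTION_THRESHOLD


captured_share = sum(is_captured(f"sol-{i}") for i in range(10_000)) / 10_000
```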

Organization Isolation

  • Your data never shared with other organizations
  • Layer 1 completely private to your organization
  • Layer 2 public knowledge is opt-in only
  • LLM processing uses anonymization

Data Processing

  • All vector processing within platform infrastructure
  • No external training on your private data
  • Anonymization for LLM calls
  • Compliance with data residency requirements

1. Start Broad, Refine Narrow

First search: "database performance issues"
Refine with: "postgresql slow queries production"
Narrow to: "postgresql query planner index selection"

2. Use Natural Language

✅ "How do I restart nginx without dropping connections?"
✅ "Why is my pod constantly restarting?"
✅ "Best way to rollback a Kubernetes deployment"

3. Include Error Messages

✅ "Connection refused error when connecting to Redis"
✅ "ECONNRESET socket hang up in Node.js API"
✅ "HTTP 503 service unavailable from load balancer"

4. Mention Your Stack

✅ "Docker container memory limit exceeded"
✅ "AWS ECS task health check failing"
✅ "Kubernetes ingress SSL certificate error"

5. Review Multiple Results

Don’t just click the first result; review the top 3-5 results to find the best fit for your situation.

Infrastructure Issues

"kubernetes pod crashloopbackoff"
"AWS EC2 instance high CPU usage"
"nginx upstream timeout"
"docker container out of memory"

Application Errors

"Node.js application memory leak"
"Python API returning 500 errors"
"Java heap space error"
"database connection pool exhausted"

Deployment Problems

"rollback failed deployment"
"blue-green deployment stuck"
"cannot update kubernetes deployment"
"terraform apply failed"

Performance Issues

"slow database queries"
"high API latency"
"Redis memory usage increasing"
"load balancer slow response"

Security and Access

"unauthorized access to API"
"certificate expired"
"authentication failing"
"RBAC permission denied"

Provide Rich Context

  • Detailed incident descriptions improve suggestions
  • Include error messages verbatim
  • Mention what you’ve already tried
  • Specify environment and tech stack

Use Consistent Terminology

  • Standardize tag names across team
  • Use full technology names (not abbreviations)
  • Consistent service naming
  • Standard severity classifications

Document Resolutions

  • Write detailed resolution notes
  • Document root cause clearly
  • Include prevention measures
  • Add relevant tags for searchability

Leverage Chrome Extension

  • Automatic context extraction
  • Alert details captured automatically
  • Monitoring platform integration
  • Faster incident creation with better data

Provide Feedback

  • Mark helpful suggestions
  • Report irrelevant results
  • Share successful resolutions
  • Update procedures with learnings

Master these shortcuts for faster search:

| Shortcut | Action |
|---|---|
| Ctrl/Cmd + K | Open global search |
| Esc | Close search modal |
| ↑ / ↓ | Navigate results |
| Enter | Select result |
| Ctrl/Cmd + Enter | Open in new tab |
| Tab | Move to filters |
| / | Focus search (from anywhere) |

No Results Found

Cause: Query too specific or unusual issue

Solutions:

  1. Broaden your search terms
  2. Try different phrasing
  3. Remove specific details (version numbers, IDs)
  4. Check spelling of technical terms
  5. Wait for Layer 3 LLM generation (if enabled)

Only Low-Confidence Results

Cause: Novel issue or insufficient context

Solutions:

  1. Add more details to search query
  2. Include error messages or symptoms
  3. Try related technology terms
  4. Review LLM-generated solutions (Layer 3)
  5. Create new procedure from scratch

Results Not Relevant

Cause: Semantic mismatch or wrong context

Solutions:

  1. Use more specific technical terms
  2. Add exact error messages
  3. Include technology stack details
  4. Use filters to narrow results
  5. Try exact phrase matching with quotes

Search is Slow

Cause: Large result set or system load

Solutions:

  1. Add filters to narrow scope
  2. Use more specific queries
  3. Check system status dashboard
  4. Try again during off-peak hours
  5. Contact admin if persistent

Budget Exceeded Error

Message: “LLM usage blocked: Monthly cost hard limit reached”

Solution: Wait until next month or contact your admin to increase the budget.
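The hard limit behaves like a guard that is checked before each LLM call. This sketch assumes the limit is enforced per calendar month and that alerts fire at 80% of the limit. The class and field names are hypothetical, and only the error message text comes from this page.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    pass


@dataclass
class MonthlyBudget:
    hard_limit_usd: float
    alert_ratio: float = 0.8  # assumed: alert once 80% of the limit is spent
    spent_usd: float = 0.0
    month: str = ""           # "YYYY-MM" of the period being tracked

    def charge(self, cost_usd: float, month: str) -> bool:
        """Record an LLM call's cost; return True once the alert threshold is reached.

        Raises BudgetExceeded instead of recording a charge that would pass the hard limit.
        """
        if month != self.month:
            self.month, self.spent_usd = month, 0.0  # a new month starts a fresh budget
        if self.spent_usd + cost_usd > self.hard_limit_usd:
            raise BudgetExceeded("LLM usage blocked: Monthly cost hard limit reached")
        self.spent_usd += cost_usd
        return self.spent_usd >= self.alert_ratio * self.hard_limit_usd


budget = MonthlyBudget(hard_limit_usd=100.0)
first_alert = budget.charge(50.0, "2025-10")   # 50% spent: no alert
second_alert = budget.charge(35.0, "2025-10")  # 85% spent: alert
try:
    budget.charge(20.0, "2025-10")             # would reach 105 USD
    blocked = False
except BudgetExceeded:
    blocked = True
```

The month reset is why waiting for the next month clears the error without any admin action.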

LLM Timeout

Message: “LLM request timed out after 30s”

Solution: Try again. A timeout may indicate high AWS Bedrock load.

Rate Limited

Message: “ThrottlingException from AWS Bedrock”

Solution: The system automatically retries with a cheaper model (Nova Lite).
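The retry behavior for timeouts and throttling can be sketched as a fallback loop. `ThrottlingError` and `LLMTimeout` are hypothetical stand-ins for Bedrock's `ThrottlingException` and the 30 s timeout, and `invoke` is any callable that calls a model. The retry count and backoff are assumptions. Enforcing the 30 s limit itself is left to the caller's client configuration and is not shown.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


class ThrottlingError(Exception):
    """Stand-in for Bedrock's ThrottlingException."""


class LLMTimeout(Exception):
    """Stand-in for a request that exceeded the 30 s limit."""


def generate_with_fallback(
    prompt: str,
    invoke: Callable[[str, str], str],
    models: Sequence[str],
    timeout_retries: int = 1,
    backoff_s: float = 1.0,
) -> Tuple[str, str]:
    """Try models in order; return (model, answer) from the first that succeeds."""
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(timeout_retries + 1):
            try:
                return model, invoke(model, prompt)
            except ThrottlingError as err:
                last_error = err
                break  # throttled: skip further retries and fall back to the next model
            except LLMTimeout as err:
                last_error = err
                if attempt < timeout_retries:
                    time.sleep(backoff_s * 2 ** attempt)  # timed out: back off, retry same model
    raise RuntimeError("all models failed") from last_error


def fake_invoke(model: str, prompt: str) -> str:
    # Hypothetical model call that is throttled on the first-choice model.
    if model == "Claude Sonnet 4":
        raise ThrottlingError("ThrottlingException from AWS Bedrock")
    return f"[{model}] suggested steps for: {prompt}"


model, answer = generate_with_fallback(
    "nginx 502 after deploy", fake_invoke, ["Claude Sonnet 4", "Nova Lite"]
)
```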

If search issues persist:

  1. Check System Status: Dashboard → Status page for service health
  2. Review Logs: Your query history may help your admin diagnose the issue
  3. Contact Admin: Report persistent issues with example queries
  4. Submit Feedback: Help improve search by reporting problems

Do’s

  • ✅ Use detailed, natural language descriptions
  • ✅ Include specific error messages
  • ✅ Mention technology stack and environment
  • ✅ Review top 3-5 results before selecting
  • ✅ Provide feedback on result quality
  • ✅ Use filters to refine large result sets
  • ✅ Save successful searches as favorites

Don’ts

  • ❌ Use single word queries
  • ❌ Rely only on top result
  • ❌ Ignore confidence scores
  • ❌ Skip context in queries
  • ❌ Use abbreviations without full terms
  • ❌ Forget to document successful resolutions

Administrators can improve search performance:

  1. Standardize Tags: Create taxonomy for consistent tagging
  2. Document Resolutions: Require detailed resolution notes
  3. Create Templates: Standard incident templates improve matching
  4. Train Team: Ensure team understands semantic search benefits
  5. Monitor Usage: Review search analytics to identify gaps
  6. Capture Knowledge: Promote successful procedures to knowledge base

Getting Help

  • In-App Help: Press the ? key to view keyboard shortcuts
  • Search Tips: Hover over search box for quick tips
  • Troubleshooting: See Common Issues
  • Support: Contact your system administrator

Last updated: October 2025