Search Features

Overwatch provides intelligent, AI-powered semantic search to help you find solutions faster. Unlike traditional keyword search, semantic search understands the meaning and context of your queries, delivering relevant results even when exact words don’t match.

Understanding Semantic Search

What is Semantic Search?

Semantic search uses vector embeddings and machine learning to understand the meaning of your query, not just match keywords. This means you can:

Describe problems naturally: “database keeps timing out” finds solutions even if they mention “connection failures” or “query performance”
Find conceptually similar content: Search for “API errors” and get results about HTTP failures, timeout issues, and connection problems
Get context-aware results: Search considers your incident history, tech stack, and environment

How It Works

The search system uses a 3-layer architecture that searches for solutions in order:

Layer 1 (Customer-Specific) → Layer 2 (Public Knowledge) → Layer 3 (LLM-Generated)

Each successive layer draws on a broader pool of knowledge, and its results carry a correspondingly lower confidence range.
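
As a rough illustration of the layer order, the sketch below tries each layer in turn and returns the first non-empty result set. The `Result` type, the stub layer functions, and the stop-at-first-hit rule are assumptions made for this example, not the platform's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    title: str
    confidence: float
    layer: str


# A layer is any function that maps a query to a list of results (assumed shape).
Layer = Callable[[str], List[Result]]


def layered_search(query: str, layers: List[Layer]) -> List[Result]:
    """Query each layer in order and return the first non-empty result set."""
    for layer in layers:
        results = layer(query)
        if results:
            # Highest-confidence results first.
            return sorted(results, key=lambda r: r.confidence, reverse=True)
    return []


# Stub layers standing in for the real backends (hypothetical).
def customer_layer(query: str) -> List[Result]:
    return []  # Layer 1: nothing stored in this toy example


def public_layer(query: str) -> List[Result]:
    if "nginx" in query:
        return [Result("Fix nginx 502 after deploy", 0.82, "public")]
    return []


def llm_layer(query: str) -> List[Result]:
    return [Result(f"Generated steps for: {query}", 0.65, "llm")]


LAYERS = [customer_layer, public_layer, llm_layer]
print(layered_search("nginx returning 502 gateway errors", LAYERS)[0].layer)
print(layered_search("novel multi-component failure", LAYERS)[0].layer)
```

In this toy setup the first query is answered by the public layer and the second falls through to the LLM layer.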

Using Search

Search Interface

Access search from anywhere in the platform:

From Dashboard

Dashboard → Search icon (magnifying glass) in header

Keyboard Shortcut

Windows/Linux: Ctrl + K
Mac: Cmd + K

From Any Page

The global search bar is always available in the top navigation.

Search Workflow

Enter your query: Describe your problem in natural language
Review results: Results appear instantly with confidence scores
Filter if needed: Use filters to narrow by type, date, or tags
Select solution: Click on result to view details
Execute or adapt: Use the solution directly or adapt to your situation

Example Searches

Good Search Queries

✅ "database connection timeout issues in production"
✅ "kubernetes pod keeps crashing with OOM error"
✅ "nginx returning 502 gateway errors"
✅ "redis memory usage high how to fix"
✅ "procedures for restarting microservices safely"

Less Effective Searches

❌ "error" (too vague)
❌ "fix" (no context)
❌ "DB" (use full words for better results)
❌ "it doesn't work" (be specific about what's failing)

Search Capabilities

Natural Language Queries

Search understands natural language, so describe your problem as you would to a colleague:

Examples

“How do I restart a database without downtime?”
“API is slow, need to investigate performance”
“Memory leak in Node.js application”
“Rollback last deployment to production”

Semantic Matching

The system finds conceptually related content, not just exact matches:

Query: “application crashes on startup”

Matches:

Procedures mentioning “service fails to initialize”
Incidents about “boot loop errors”
Solutions discussing “startup configuration issues”

Why? These are semantically related concepts even with different wording.

Cross-Content Search

Search spans all platform content:

Content Type	What’s Searched
Incidents	Titles, descriptions, comments, resolution notes
Procedures	Names, descriptions, step instructions, outcomes
Executions	Notes, observations, troubleshooting steps
Comments	All discussion threads and annotations

Contextual Ranking

Results are ranked by multiple factors:

Semantic relevance: How closely the meaning matches your query
Success rate: Procedures with higher success rates rank higher
Recency: More recent solutions get a boost
Your history: Content you’ve used successfully ranks higher
Organization context: Matches from your tech stack prioritized
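
One way to picture how these factors might combine is a weighted sum. The sketch below is only an illustration: the weights, the 30-day recency decay, and the `Candidate` fields are all assumptions, since the actual ranking formula is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    relevance: float      # semantic similarity to the query, 0..1
    success_rate: float   # fraction of successful executions, 0..1
    age_days: float       # days since the solution was last used
    used_by_you: bool     # you have used it successfully before
    matches_stack: bool   # matches your organization's tech stack


# Hypothetical weights; they sum to 1 so the score stays in 0..1.
WEIGHTS = {"relevance": 0.5, "success": 0.2, "recency": 0.1,
           "history": 0.1, "stack": 0.1}


def rank_score(c: Candidate) -> float:
    """Blend the five ranking factors into a single score."""
    recency = 1.0 / (1.0 + c.age_days / 30.0)  # decays as content ages
    return (WEIGHTS["relevance"] * c.relevance
            + WEIGHTS["success"] * c.success_rate
            + WEIGHTS["recency"] * recency
            + WEIGHTS["history"] * float(c.used_by_you)
            + WEIGHTS["stack"] * float(c.matches_stack))


candidates = [
    Candidate("Old generic fix", 0.80, 0.60, 365, False, False),
    Candidate("Recent team runbook", 0.78, 0.95, 3, True, True),
]
ranked = sorted(candidates, key=rank_score, reverse=True)
print([c.title for c in ranked])
```

Here the slightly less relevant runbook ranks first because it is recent, has a high success rate, and matches the user's history and stack.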

3-Layer Search Architecture

Overwatch implements an intelligent 3-layer search system that provides progressively broader solutions with appropriate confidence levels.

Layer 1: Customer-Specific Solutions

Status: Phase 2 (Coming Soon)

What It Is: Your organization’s private knowledge base of historical solutions.

Characteristics:

Highest confidence (0.8 - 1.0)
Organization-specific patterns and fixes
Learns from your team’s successful resolutions
Includes your runbooks and procedures
Adapts to your tech stack and infrastructure

When It Activates:

Always checked first (when enabled in Phase 2)
Provides matches from your incident history
Uses your organization’s terminology and processes

Layer 2: Public Knowledge Base

Status: Active (Current Phase 1)

What It Is: Community-driven knowledge base powered by Weaviate vector database.

Characteristics:

Good confidence (0.7 - 0.9)
Best practices and proven solutions
Continuously updated with community contributions
Multiple namespaces for organized content
Enhanced metadata for context

Content Sources:

Community-contributed solutions
Validated procedures from multiple organizations
Best practices and standard approaches
Integration-specific troubleshooting guides

Search Strategy:

1. Try organization-specific namespace
2. Fall back to "enhanced-solutions" namespace
3. Fall back to "public-solutions" namespace
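
The fallback order above can be sketched as a simple loop. The `search_fn` callable and the `org-acme` namespace name are hypothetical stand-ins; the real vector store query is not shown in this guide.

```python
from typing import Callable, Dict, List, Optional, Tuple

# search_fn(namespace, query) -> list of hits; a stand-in for the vector store query.
SearchFn = Callable[[str, str], List[str]]


def search_with_fallback(query: str, org_namespace: str,
                         search_fn: SearchFn) -> Tuple[Optional[str], List[str]]:
    """Try each namespace in order and stop at the first one with hits."""
    for namespace in (org_namespace, "enhanced-solutions", "public-solutions"):
        hits = search_fn(namespace, query)
        if hits:
            return namespace, hits
    return None, []


# Toy in-memory store used only to exercise the fallback order.
STORE: Dict[str, List[str]] = {
    "enhanced-solutions": ["Tune PostgreSQL connection pool size"],
    "public-solutions": ["Restart the database service"],
}


def fake_search(namespace: str, query: str) -> List[str]:
    return STORE.get(namespace, [])


namespace, hits = search_with_fallback("connection pool exhausted", "org-acme", fake_search)
print(namespace, hits)
```

Because the toy store has no organization-specific entries, the search falls back to the "enhanced-solutions" namespace and stops there.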

Layer 3: LLM-Generated Solutions

Status: Available (With Cost Controls)

What It Is: AI-generated solutions via AWS Bedrock, used when no existing solutions are found.

Characteristics:

Moderate confidence (0.6 - 0.8)
Dynamic confidence based on context quality
Generated on-demand for novel problems
Cost-optimized with semantic caching
Multiple LLM providers for different complexity levels

LLM Providers:

Provider	Use Case	Cost	Speed
Nova Lite	Simple, straightforward incidents	Lowest	Fastest
Claude Sonnet 4	Standard production use, real-time alerts	Medium	Fast
Claude Opus 4	Complex incidents, root cause analysis	Highest	Slower
Nova Pro	Multimodal analysis (with images/charts)	Medium	Fast
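
The provider table can be read as a small selection rule. The sketch below encodes it under the assumption that each incident carries a complexity label ("simple", "standard", or "complex") and an image flag; both inputs are hypothetical, and the platform's real selection logic may differ.

```python
def choose_provider(complexity: str, has_images: bool = False) -> str:
    """Pick a provider following the table above.

    `complexity` is a hypothetical label: "simple", "standard", or "complex".
    """
    if has_images:
        return "Nova Pro"          # multimodal analysis
    if complexity == "simple":
        return "Nova Lite"         # lowest cost, fastest
    if complexity == "complex":
        return "Claude Opus 4"     # root cause analysis, highest cost
    return "Claude Sonnet 4"       # default for standard production use


print(choose_provider("simple"))
print(choose_provider("complex"))
print(choose_provider("standard", has_images=True))
```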

When It Activates:

No results found in Layer 1 or Layer 2
Novel incidents without historical precedent
Complex multi-component failures
When explicitly requested with LLM-specific search

Cost Optimization:

Semantic caching: 30-50% cost reduction by caching similar queries
Provider selection: Automatically chooses cheapest appropriate model
Budget controls: Monthly limits and alerts to prevent overspend
Cache hits are free: Cached responses cost nothing
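
To make the idea of semantic caching concrete, here is a minimal sketch: a response is reused when a new query's embedding is close enough to one already cached. The `SemanticCache` class, the 0.95 similarity cutoff, and the 2-dimensional toy embeddings are assumptions for illustration only.

```python
import math
from typing import List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Reuse a cached response when a query embedding is similar enough."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold  # assumed cutoff, not a documented value
        self._entries: List[Tuple[List[float], str]] = []

    def get(self, embedding: List[float]) -> Optional[str]:
        best, best_sim = None, self.threshold
        for vec, response in self._entries:
            sim = cosine(embedding, vec)
            if sim >= best_sim:
                best, best_sim = response, sim
        return best  # None means a cache miss, so the LLM would be called

    def put(self, embedding: List[float], response: str) -> None:
        self._entries.append((embedding, response))


cache = SemanticCache()
cache.put([1.0, 0.0], "Restart the connection pool")
print(cache.get([0.99, 0.05]))  # near-duplicate query: served from cache
print(cache.get([0.0, 1.0]))    # unrelated query: cache miss
```

A hit returns the stored response without another LLM call, which is where the cost saving comes from.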

Understanding Search Results

Confidence Scores

Every search result includes a confidence score to help you evaluate its relevance:

Score Range	Meaning	Typical Layer	Recommendation
0.9 - 1.0	Exact match	Layer 1	Strongly recommended - follow directly
0.8 - 0.89	Very high confidence	Layer 1	Recommended - likely to resolve issue
0.7 - 0.79	Good match	Layer 2	Recommended - adapt to your environment
0.6 - 0.69	Moderate match	Layer 2/3	Consider carefully - may need customization
0.5 - 0.59	Possible match	Layer 3	Use caution - generated solution
< 0.5	Low confidence	Layer 3	Review thoroughly - may not apply
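
If you consume confidence scores programmatically (for example from an export), the table can be expressed as a lookup. This is a sketch only: the function name is hypothetical, and it treats each band's lower bound as inclusive so that scores such as 0.895 do not fall between rows.

```python
def recommendation(score: float) -> str:
    """Map a confidence score to the recommendation text from the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence score must be between 0 and 1")
    bands = [
        (0.9, "Strongly recommended - follow directly"),
        (0.8, "Recommended - likely to resolve issue"),
        (0.7, "Recommended - adapt to your environment"),
        (0.6, "Consider carefully - may need customization"),
        (0.5, "Use caution - generated solution"),
    ]
    for lower_bound, advice in bands:
        if score >= lower_bound:
            return advice
    return "Review thoroughly - may not apply"


print(recommendation(0.95))
print(recommendation(0.72))
print(recommendation(0.30))
```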

Result Information

Each search result displays:

Header

Title and brief description
Confidence score with visual indicator
Source layer (Customer/Public/LLM)
Result type (Incident/Procedure/Solution)

Content Preview

First few lines of content
Relevant keywords highlighted
Success rate (for procedures)
Last used date

Metadata

Tags and categories
Estimated time to execute
Complexity level
Prerequisites required

Result Actions

Available actions depend on result type:

For Procedures

View full procedure
Execute immediately
Copy to clipboard
Save to favorites
Share with team

For Incidents

View incident details
See resolution steps
Copy resolution notes
Link to current incident
View related incidents

For LLM Solutions

View generated steps
Execute as procedure
Provide feedback
Report issues
Request regeneration

Filtering Results

Available Filters

Refine search results using filters:

Content Type

Incidents
Procedures
Executions
Solutions

Time Range

Last 24 hours
Last 7 days
Last 30 days
Last 90 days
Custom range

Confidence

High confidence only (≥ 0.8)
Medium+ (≥ 0.7)
All results

Source Layer

Customer-specific (Layer 1)
Public knowledge (Layer 2)
LLM-generated (Layer 3)

Tags

Filter by any incident/procedure tags
Combine multiple tags
Exclude specific tags

Success Rate (for procedures)

High success (≥ 80%)
Medium success (≥ 60%)
All procedures

Filter Combinations

Combine filters for precise results:

Example 1: Recent High-Confidence Solutions

Time Range: Last 7 days
Confidence: ≥ 0.8
Content Type: Procedures
Success Rate: ≥ 80%

Example 2: Database Issues

Tags: database, postgresql
Content Type: All
Confidence: ≥ 0.7
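
The two filter combinations above amount to applying several independent conditions to the same result list. The sketch below shows that logic with hypothetical field names, and it assumes that multiple tags must all be present; the actual filter API is not documented here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class SearchResult:
    title: str
    content_type: str                      # e.g. "incident" or "procedure"
    confidence: float
    age_days: int
    tags: FrozenSet[str] = frozenset()
    success_rate: Optional[float] = None   # only set for procedures


def apply_filters(results: List[SearchResult],
                  content_type: Optional[str] = None,
                  min_confidence: float = 0.0,
                  max_age_days: Optional[int] = None,
                  required_tags: FrozenSet[str] = frozenset(),
                  min_success_rate: Optional[float] = None) -> List[SearchResult]:
    """Keep only the results that satisfy every filter that is set."""
    kept = []
    for r in results:
        if content_type is not None and r.content_type != content_type:
            continue
        if r.confidence < min_confidence:
            continue
        if max_age_days is not None and r.age_days > max_age_days:
            continue
        if not required_tags <= r.tags:  # every required tag must be present
            continue
        if min_success_rate is not None and (
                r.success_rate is None or r.success_rate < min_success_rate):
            continue
        kept.append(r)
    return kept


results = [
    SearchResult("Rotate TLS certs", "procedure", 0.91, 2, frozenset({"security"}), 0.95),
    SearchResult("Old DB failover", "procedure", 0.88, 40, frozenset({"database"}), 0.90),
    SearchResult("Pool exhaustion incident", "incident", 0.75, 5,
                 frozenset({"database", "postgresql"})),
]
example_1 = apply_filters(results, content_type="procedure", min_confidence=0.8,
                          max_age_days=7, min_success_rate=0.8)
example_2 = apply_filters(results, min_confidence=0.7,
                          required_tags=frozenset({"database", "postgresql"}))
print([r.title for r in example_1], [r.title for r in example_2])
```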

Advanced Search Techniques

Query Formulation Tips

Be Specific

✅ "PostgreSQL connection pool exhausted in production API"
❌ "database problem"

Include Context

✅ "kubernetes pod OOM killed in staging namespace"
❌ "pod crashed"

Mention Technology

✅ "nginx 502 error after deployment to AWS ECS"
❌ "gateway error"

Describe Symptoms

✅ "Redis memory usage 90% causing slow response times"
❌ "Redis issue"

Query Operators

While semantic search understands natural language, you can use operators for precise control:

AND (implicit)

"database timeout production"
(finds results containing all these concepts)

OR

"postgresql OR mysql connection issues"
(finds either database type)

Exclude with Minus

"kubernetes crash -memory"
(excludes memory-related issues)

Exact Phrase with Quotes

"connection refused" nginx
(exact phrase + semantic match)
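
To show how these operators could be separated from the free-text part of a query, here is a small parser sketch. It is not the platform's parser: the `ParsedQuery` structure and the rule that `OR` joins the words on either side of it are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParsedQuery:
    phrases: List[str] = field(default_factory=list)   # exact phrases in quotes
    excluded: List[str] = field(default_factory=list)  # terms prefixed with "-"
    any_of: List[str] = field(default_factory=list)    # alternatives joined by OR
    terms: List[str] = field(default_factory=list)     # everything else (implicit AND)


def parse_query(query: str) -> ParsedQuery:
    parsed = ParsedQuery()
    # Pull out quoted phrases first, then tokenize what is left.
    parsed.phrases = re.findall(r'"([^"]+)"', query)
    tokens = re.sub(r'"[^"]+"', " ", query).split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "OR" and 0 < i < len(tokens) - 1:
            # Move the word before OR (if it was a plain term) into the alternatives.
            if parsed.terms and parsed.terms[-1] == tokens[i - 1]:
                parsed.any_of.append(parsed.terms.pop())
            parsed.any_of.append(tokens[i + 1])
            i += 2
            continue
        if tok.startswith("-") and len(tok) > 1:
            parsed.excluded.append(tok[1:])
        else:
            parsed.terms.append(tok)
        i += 1
    return parsed


print(parse_query('postgresql OR mysql connection issues'))
print(parse_query('kubernetes crash -memory'))
print(parse_query('"connection refused" nginx'))
```

The remaining `terms` would still go through semantic matching; only the phrases, exclusions, and alternatives act as hard constraints in this sketch.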

Search by Example

Find solutions similar to a specific incident:

From Incident Detail Page

Incident Detail → "Find Similar" button → Results appear

This searches using the incident’s full context, not just title.

AI-Powered Suggestions

Automatic Suggestions

The system provides proactive suggestions in these scenarios:

When Creating Incidents

Suggests similar incidents as you type
Recommends relevant procedures based on description
Highlights recent similar issues
Shows if issue has known solutions

When Viewing Incidents

Automatically searches for solutions
Displays top 5 most relevant results
Updates as you add more details
Learns from successful resolutions

During Procedure Execution

Suggests next best steps
Recommends troubleshooting procedures
Identifies common failure points
Links to relevant documentation

Incident Resolution Suggestions

When viewing an incident, AI suggestions appear in the sidebar:

Suggested Procedures

Procedures with high success rates for similar issues
Ranked by relevance and past success
Shows estimated resolution time
Includes difficulty level

Similar Incidents

Incidents with matching symptoms
Filtered by successful resolutions
Shows resolution time and steps
Links to full incident details

Related Solutions

Solutions from public knowledge base
LLM-generated guidance (if enabled)
External documentation links
Best practices articles

Contextual Intelligence

Suggestions improve with available context:

From Chrome Extension

Alert details from monitoring platform
Error messages and stack traces
Affected services and metrics
Monitoring dashboard links

From Incident Details

Technology stack information
Environment (production/staging)
Affected components
Previous similar incidents

From Team History

Your team’s successful resolution patterns
Procedures your organization uses frequently
Tech stack and infrastructure details
Team expertise and preferences

Vector Search Technology

How Embeddings Work

Content Transformation

All content (incidents, procedures, comments) is converted to vector embeddings
Embeddings capture semantic meaning in 1536-dimensional space
Similar meanings cluster together in vector space
Search queries are converted to the same embedding space

Similarity Matching

Cosine similarity measures semantic closeness
Minimum similarity threshold of 0.7 for quality results
Multiple namespace fallback for broader coverage
Real-time re-ranking based on success rates
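
As a worked illustration of the similarity threshold, the block below computes cosine similarity for toy 2-dimensional vectors. Real embeddings have 1536 dimensions, and the vectors here are invented for the example.

```latex
% Cosine similarity between a query embedding q and a document embedding d
\[
\operatorname{sim}(q, d) = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}
\]
% Toy example: all three vectors have length 1
\[
q = (1, 0), \quad d_1 = (0.8, 0.6), \quad d_2 = (0.6, 0.8)
\]
\[
\operatorname{sim}(q, d_1) = \frac{0.8}{1 \cdot 1} = 0.8 \ge 0.7 \quad \text{(kept)}
\]
\[
\operatorname{sim}(q, d_2) = \frac{0.6}{1 \cdot 1} = 0.6 < 0.7 \quad \text{(filtered out)}
\]
```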

Continuous Learning

The search system improves over time:

Pattern Recognition

Tracks which results users select
Learns which procedures succeed
Identifies common resolution paths
Adjusts ranking based on success

Feedback Loop

Successful resolutions boost similar content
Failed procedures get confidence reduction
User feedback affects future rankings
Team patterns influence organization results

Solution Capture (Layer 3)

15% of LLM solutions automatically captured
High-confidence solutions (≥ 0.8) promoted to Layer 2
Creates learning loop for continuous improvement
Your successful resolutions help entire community

Privacy and Security

Organization Isolation

Your data is never shared with other organizations
Layer 1 completely private to your organization
Layer 2 public knowledge is opt-in only
LLM processing uses anonymization

Data Processing

All vector processing within platform infrastructure
No external training on your private data
Anonymization for LLM calls
Compliance with data residency requirements

Best Practices

Effective Search Strategies

1. Start Broad, Refine Narrow

First search: "database performance issues"
Refine with: "postgresql slow queries production"
Narrow to: "postgresql query planner index selection"

2. Use Natural Language

✅ "How do I restart nginx without dropping connections?"
✅ "Why is my pod constantly restarting?"
✅ "Best way to rollback a Kubernetes deployment"

3. Include Error Messages

✅ "Connection refused error when connecting to Redis"
✅ "ECONNRESET socket hang up in Node.js API"
✅ "HTTP 503 service unavailable from load balancer"

4. Mention Your Stack

✅ "Docker container memory limit exceeded"
✅ "AWS ECS task health check failing"
✅ "Kubernetes ingress SSL certificate error"

5. Review Multiple Results

Don’t just click the first result. Review the top 3-5 results to find the best fit for your situation.

Query Examples by Scenario

Infrastructure Issues

"kubernetes pod crashloopbackoff"
"AWS EC2 instance high CPU usage"
"nginx upstream timeout"
"docker container out of memory"

Application Errors

"Node.js application memory leak"
"Python API returning 500 errors"
"Java heap space error"
"database connection pool exhausted"

Deployment Problems

"rollback failed deployment"
"blue-green deployment stuck"
"cannot update kubernetes deployment"
"terraform apply failed"

Performance Issues

"slow database queries"
"high API latency"
"Redis memory usage increasing"
"load balancer slow response"

Security and Access

"unauthorized access to API"
"certificate expired"
"authentication failing"
"RBAC permission denied"

Maximizing AI Benefits

Provide Rich Context

Detailed incident descriptions improve suggestions
Include error messages verbatim
Mention what you’ve already tried
Specify environment and tech stack

Use Consistent Terminology

Standardize tag names across team
Use full technology names (not abbreviations)
Consistent service naming
Standard severity classifications

Document Resolutions

Write detailed resolution notes
Document root cause clearly
Include prevention measures
Add relevant tags for searchability

Leverage Chrome Extension

Automatic context extraction
Alert details captured automatically
Monitoring platform integration
Faster incident creation with better data

Provide Feedback

Mark helpful suggestions
Report irrelevant results
Share successful resolutions
Update procedures with learnings

Keyboard Shortcuts

Master these shortcuts for faster search:

Shortcut	Action
`Ctrl/Cmd + K`	Open global search
`Esc`	Close search modal
`↑` `↓`	Navigate results
`Enter`	Select result
`Ctrl/Cmd + Enter`	Open in new tab
`Tab`	Move to filters
`/`	Focus search (from anywhere)

Troubleshooting

Common Issues

No Results Found

Cause: Query too specific or unusual issue

Solutions:

Broaden your search terms
Try different phrasing
Remove specific details (version numbers, IDs)
Check spelling of technical terms
Wait for Layer 3 LLM generation (if enabled)

Only Low-Confidence Results

Cause: Novel issue or insufficient context

Solutions:

Add more details to search query
Include error messages or symptoms
Try related technology terms
Review LLM-generated solutions (Layer 3)
Create new procedure from scratch

Results Not Relevant

Cause: Semantic mismatch or wrong context

Solutions:

Use more specific technical terms
Add exact error messages
Include technology stack details
Use filters to narrow results
Try exact phrase matching with quotes

Search is Slow

Cause: Large result set or system load

Solutions:

Add filters to narrow scope
Use more specific queries
Check system status dashboard
Try again during off-peak hours
Contact admin if persistent

Layer 3 LLM Issues

Budget Exceeded Error

Message: “LLM usage blocked: Monthly cost hard limit reached”

Solution: Wait until next month or contact admin to increase budget

LLM Timeout

Message: “LLM request timed out after 30s”

Solution: Try again - may indicate high AWS Bedrock load

Rate Limited

Message: “ThrottlingException from AWS Bedrock”

Solution: System automatically retries with cheaper model (Nova Lite)
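
The automatic retry described above follows a common fallback pattern, sketched below. `ThrottlingError`, `call_model`, and the choice of Claude Sonnet 4 as the primary model are hypothetical stand-ins; this is not the AWS Bedrock SDK or the platform's actual retry code.

```python
from typing import Callable


class ThrottlingError(Exception):
    """Stand-in for a provider throttling error (hypothetical)."""


def generate_with_fallback(prompt: str,
                           call_model: Callable[[str, str], str],
                           primary: str = "Claude Sonnet 4",
                           fallback: str = "Nova Lite") -> str:
    """Call the primary model; if it is throttled, retry once with the cheaper one."""
    try:
        return call_model(primary, prompt)
    except ThrottlingError:
        return call_model(fallback, prompt)


# Fake model call that throttles everything except the fallback model.
def fake_call(model: str, prompt: str) -> str:
    if model != "Nova Lite":
        raise ThrottlingError(f"{model} is throttled")
    return f"[{model}] steps for: {prompt}"


print(generate_with_fallback("redis memory usage high", fake_call))
```

If the fallback call is also throttled, the error propagates to the caller in this sketch, so a real system would likely add backoff or a user-facing message.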

Getting Help

If search issues persist:

Check System Status: Dashboard → Status page for service health
Review Logs: Your recent queries can help your admin diagnose the problem
Contact Admin: Report persistent issues with example queries
Submit Feedback: Help improve search by reporting problems

Search Performance Tips

For Best Results

Do’s

✅ Use detailed, natural language descriptions
✅ Include specific error messages
✅ Mention technology stack and environment
✅ Review top 3-5 results before selecting
✅ Provide feedback on result quality
✅ Use filters to refine large result sets
✅ Save successful searches as favorites

Don’ts

❌ Use single word queries
❌ Rely only on top result
❌ Ignore confidence scores
❌ Skip context in queries
❌ Use abbreviations without full terms
❌ Forget to document successful resolutions

Optimizing Search for Your Team

Administrators can improve search performance:

Standardize Tags: Create taxonomy for consistent tagging
Document Resolutions: Require detailed resolution notes
Create Templates: Standard incident templates improve matching
Train Team: Ensure team understands semantic search benefits
Monitor Usage: Review search analytics to identify gaps
Capture Knowledge: Promote successful procedures to knowledge base

Next Steps

Incident Management - Create and manage incidents with search integration
Procedure Management - Execute procedures found through search
Analytics Dashboard - Track search usage and effectiveness
Chrome Extension - Automatic context for better search

Need Help?

In-App Help: Press ? key for keyboard shortcuts
Search Tips: Hover over search box for quick tips
Troubleshooting: See Common Issues
Support: Contact your system administrator

Last updated: October 2025