Team Collaboration

Overwatch provides real-time collaboration features so your team can investigate and resolve incidents together. Changes sync across connected browsers in real time over WebSocket connections, and structured roles keep incident response organized.

Real-Time Updates

Every action taken on an incident is broadcast to all team members viewing that incident. This includes:

Status changes (New, In Progress, Resolved, Closed)
Severity updates
Assignee changes
New comments and @mentions
Procedure execution progress
AI chat activity from the Chrome extension

A green connection indicator in the bottom-left corner of the dashboard confirms your WebSocket connection is active. If the connection drops, Overwatch reconnects automatically and syncs any changes that occurred while you were disconnected.
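
The reconnect-and-sync behavior described above can be pictured with a minimal sketch. Overwatch's actual reconnect policy and wire format are not documented here, so everything in this example is an assumption made for illustration: the exponential backoff schedule, the `seq` field on events, and the function names `backoff_delays` and `missed_events`.

```python
import random


def backoff_delays(base=1.0, cap=30.0, attempts=6, jitter=0.0, rng=None):
    """Yield reconnect delays in seconds, doubling each attempt up to `cap`.

    Hypothetical policy: the real client's schedule may differ.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        # Optional jitter spreads out reconnects when many clients drop at once.
        yield delay + rng.uniform(0, jitter * delay)


def missed_events(event_log, last_seen_seq):
    """Return events newer than the last one the client saw, in order.

    Assumes each event carries a monotonically increasing `seq` number.
    """
    newer = (event for event in event_log if event["seq"] > last_seen_seq)
    return sorted(newer, key=lambda event: event["seq"])
```

Under these assumptions, a client that last saw event 1 would request everything after it on reconnect, for example `missed_events(log, last_seen_seq=1)`, and apply the results in sequence order.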

Tip: Open the incident detail page in a separate browser tab during active response. This gives you a persistent view of team activity while you work in your monitoring dashboards and terminal.

Comments and @Mentions

Adding Comments

Every incident has a threaded comment section. Use comments to:

Share findings from your investigation
Post command output or log snippets
Document decisions and their rationale
Record actions taken outside of Overwatch

To add a comment, open the incident detail page, scroll to the Activity section, and type in the comment field.

@Mentions

Tag team members with @username to send them a direct notification. Mentions work in comments and incident descriptions. The mentioned user receives:

An in-app notification badge
An email notification (if configured in their profile settings)
A highlight on the comment in the activity feed

Use @mentions to request help, assign follow-up tasks, or draw someone’s attention to a specific finding.

@jane.smith Found the root cause in the payment-service logs.
Connection pool exhaustion after the v3.2 deploy. Can you check
the staging environment for the same issue?
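
As a rough illustration of how mentions like the one above could be detected in comment text, here is a short sketch. The username pattern (letters, digits, dots, underscores, and hyphens) and the `extract_mentions` helper are assumptions for this example, not Overwatch's actual parsing rules.

```python
import re

# Assumed username shape: starts and ends with a letter or digit, with
# dots, underscores, or hyphens allowed in between (e.g. "jane.smith").
# The lookbehind skips email addresses such as ops@example.com.
MENTION_RE = re.compile(
    r"(?<![\w@])@([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)"
)


def extract_mentions(text):
    """Return unique mentioned usernames in order of first appearance."""
    seen = []
    for name in MENTION_RE.findall(text):
        if name not in seen:
            seen.append(name)
    return seen
```

Each returned username would then map to one notification, so mentioning the same person twice in a comment notifies them once.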

Incident Roles

During a major incident, clear roles prevent duplicate effort and communication gaps. Assign these roles from the incident detail page under Response Team.

Incident Commander

The incident commander owns the overall response. Responsibilities include:

Coordinating team members and assigning tasks
Making decisions about escalation and communication
Tracking progress and maintaining the incident timeline
Deciding when the incident is resolved

Assign the incident commander role to a senior engineer or team lead who has authority to make operational decisions.

Diagnostician

The diagnostician focuses on technical investigation. Responsibilities include:

Running diagnostic commands through the AI chat and Helper CLI
Analyzing logs, metrics, and traces
Identifying the root cause
Proposing and validating fixes

One or more engineers can share the diagnostician role. They post findings in the comment thread as they investigate.

Communicator

The communicator manages stakeholder updates. Responsibilities include:

Posting status updates to external channels (Slack, status page, email)
Responding to questions from non-technical stakeholders
Maintaining a customer-facing timeline of the incident
Preparing the initial post-incident summary

Note: For smaller teams, one person often fills multiple roles. The structure is a guideline, not a rigid requirement. The goal is to ensure that investigation, coordination, and communication all happen without gaps.
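
One way to check that no responsibility is left uncovered is to look for roles with nobody assigned. The sketch below assumes a simple mapping from role name to usernames. The role keys and the `uncovered_roles` helper are hypothetical and do not reflect how Overwatch stores the Response Team.

```python
# Hypothetical role identifiers for the three roles described above.
REQUIRED_ROLES = ("incident_commander", "diagnostician", "communicator")


def uncovered_roles(assignments):
    """Return the required roles that have no one assigned.

    `assignments` maps a role name to a list of usernames. The same
    person may appear under several roles, as on a small team.
    """
    return [role for role in REQUIRED_ROLES if not assignments.get(role)]
```

For a two-person team where `alice.wu` is both commander and diagnostician and nobody handles communication, the function returns `["communicator"]`, which flags the gap.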

Multi-User Incident Rooms

When multiple team members open the same incident, they enter a shared collaboration space. The incident detail page shows:

Active users: Avatars of team members currently viewing the incident
Live activity feed: Comments, status changes, and actions appear in real time
Typing indicators: See when a teammate is composing a comment
Procedure execution: Watch procedure steps complete as another team member runs them

This shared view eliminates the need to constantly ask “where are we?” during incident response. Everyone sees the same current state.
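
Conceptually, an incident room is a set of viewers plus a broadcast to each of them. The in-memory sketch below illustrates that idea only. The `IncidentRooms` class and its methods are assumptions for this example and say nothing about how Overwatch implements rooms on the server.

```python
from collections import defaultdict


class IncidentRooms:
    """In-memory registry of who is viewing which incident (illustrative)."""

    def __init__(self):
        self._viewers = defaultdict(set)   # incident_id -> usernames
        self._inboxes = defaultdict(list)  # username -> delivered events

    def join(self, incident_id, user):
        self._viewers[incident_id].add(user)

    def leave(self, incident_id, user):
        self._viewers[incident_id].discard(user)

    def active_users(self, incident_id):
        """Usernames currently viewing the incident, sorted for display."""
        return sorted(self._viewers[incident_id])

    def broadcast(self, incident_id, event):
        """Deliver an event to everyone currently viewing the incident."""
        for user in self._viewers[incident_id]:
            self._inboxes[user].append(event)

    def inbox(self, user):
        return list(self._inboxes[user])
```

In this model, a user who leaves the room stops receiving events, which is why a reconnecting client needs to catch up on changes it missed.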

Handoff Procedures

Incidents that span shift boundaries or require expertise from another team need structured handoffs. Follow this process:

Preparing for Handoff

Update the incident description with the current state of investigation
Add a comment summarizing what has been tried, what worked, and what remains
List any open questions or blocked tasks
Attach relevant log snippets, screenshots, or command output to the incident

Executing the Handoff

@mention the incoming team member in a comment with a handoff summary
Reassign the incident to the new owner
If roles are assigned, transfer the incident commander role
Have the incoming team member acknowledge the handoff with a comment confirming they have context

Example Handoff Comment

@bob.chen Handing off checkout-service incident. Current state:

- Root cause identified: connection pool exhaustion after v3.2 deploy
- Temporary mitigation applied: scaled to 8 replicas
- Remaining: Roll back v3.2 or deploy v3.2.1 hotfix (PR #482 is open)
- Monitoring: Error rate is stable at 0.3% with the extra replicas

The hotfix needs QA sign-off before deploying to production. @alice.wu
has the QA context.
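
Teams that hand off often may want a consistent shape for these comments. The sketch below renders the opening of a handoff comment like the one above from structured fields. The `format_handoff` helper is hypothetical, and the result is plain text to paste into the comment field rather than anything posted through an Overwatch API.

```python
def format_handoff(recipient, incident, state):
    """Render a handoff comment.

    `state` is an ordered mapping of labels (e.g. "Remaining") to
    one-line descriptions; each entry becomes one bullet.
    """
    lines = [
        f"@{recipient} Handing off {incident} incident. Current state:",
        "",
    ]
    lines += [f"- {label}: {detail}" for label, detail in state.items()]
    return "\n".join(lines)


comment = format_handoff(
    "bob.chen",
    "checkout-service",
    {
        "Root cause identified": "connection pool exhaustion after v3.2 deploy",
        "Temporary mitigation applied": "scaled to 8 replicas",
        "Remaining": "Roll back v3.2 or deploy v3.2.1 hotfix (PR #482 is open)",
        "Monitoring": "Error rate is stable at 0.3% with the extra replicas",
    },
)
```

Free-form context, such as who holds the QA sign-off, still belongs in a closing paragraph written by hand.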

Best Practices for Incident Communication

Lead with facts: State what you know, not what you assume. “Error rate is 5.2%” is more useful than “the service seems slow.”
Timestamp your findings: Include when you observed something, especially in fast-moving incidents. “At 14:32 UTC, pod checkout-7b9f4 entered CrashLoopBackOff.”
Separate observation from hypothesis: Post raw findings as comments, and label theories clearly. “Hypothesis: the new DB connection string is missing the SSL parameter.”
Update frequently: Short, frequent updates are better than long, delayed summaries. Post when you start investigating a path, not just when you finish.
Use threads for deep dives: Keep the main comment stream high-level. Use reply threads for detailed log analysis or lengthy command output.
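
The timestamping and hypothesis-labeling practices above are easy to apply consistently with a small helper. This is a sketch only: `format_finding` is a hypothetical function, and the date used in the usage note is an arbitrary placeholder.

```python
from datetime import datetime, timezone


def format_finding(text, observed_at, hypothesis=False):
    """Prefix a finding with its UTC observation time.

    `observed_at` must be a timezone-aware datetime. Hypotheses are
    labeled so readers can tell them apart from raw observations.
    """
    stamp = observed_at.astimezone(timezone.utc).strftime("%H:%M UTC")
    if hypothesis:
        return f"Hypothesis (as of {stamp}): {text}"
    return f"At {stamp}, {text}"
```

For example, calling `format_finding("pod checkout-7b9f4 entered CrashLoopBackOff.", datetime(2024, 1, 1, 14, 32, tzinfo=timezone.utc))` produces a comment that starts with "At 14:32 UTC,".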

Next Steps

Incident Response Workflow — See collaboration in the context of a full incident
Creating Procedures — Build runbooks that multiple team members can execute
Analytics & Reporting — Measure team performance and identify improvement areas