
Team Collaboration

Overwatch provides real-time collaboration features so your team can investigate and resolve incidents together. All changes sync instantly across connected browsers using WebSocket connections, and structured roles keep incident response organized.

Every action taken on an incident is broadcast to all team members viewing that incident. This includes:

  • Status changes (New, In Progress, Resolved, Closed)
  • Severity updates
  • Assignee changes
  • New comments and @mentions
  • Procedure execution progress
  • AI chat activity from the Chrome extension
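
Conceptually, this is a publish/subscribe fan-out keyed by incident: each event is delivered to every connection currently viewing that incident and to nobody else. The Python sketch below models that in memory. It assumes each subscriber callback stands in for one browser's WebSocket connection. `IncidentHub`, `IncidentEvent`, the event kinds, and IDs such as `INC-1` are hypothetical and do not describe Overwatch's actual server implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, Dict


@dataclass
class IncidentEvent:
    incident_id: str
    kind: str  # hypothetical kinds, e.g. "status", "severity", "assignee", "comment"
    payload: dict = field(default_factory=dict)


# A subscriber stands in for one browser's WebSocket connection.
Subscriber = Callable[[IncidentEvent], None]


class IncidentHub:
    """Hypothetical in-memory hub that fans each event out to everyone viewing that incident."""

    def __init__(self) -> None:
        # incident_id -> {connection_id: send callback}
        self._viewers: DefaultDict[str, Dict[str, Subscriber]] = defaultdict(dict)

    def join(self, incident_id: str, conn_id: str, send: Subscriber) -> None:
        self._viewers[incident_id][conn_id] = send

    def leave(self, incident_id: str, conn_id: str) -> None:
        self._viewers[incident_id].pop(conn_id, None)

    def publish(self, event: IncidentEvent) -> int:
        """Send the event to every viewer of its incident and return the number of recipients."""
        recipients = list(self._viewers[event.incident_id].values())
        for send in recipients:
            send(event)
        return len(recipients)


# Usage: two people watch INC-1, a third watches a different incident.
hub = IncidentHub()
inbox_alice, inbox_bob, inbox_carol = [], [], []
hub.join("INC-1", "alice", inbox_alice.append)
hub.join("INC-1", "bob", inbox_bob.append)
hub.join("INC-2", "carol", inbox_carol.append)
delivered = hub.publish(IncidentEvent("INC-1", "status", {"to": "In Progress"}))
```

Keying viewers by incident keeps the broadcast scoped: a status change on one incident never reaches people looking at another.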

A green connection indicator in the bottom-left corner of the dashboard confirms your WebSocket connection is active. If the connection drops, Overwatch reconnects automatically and syncs any changes that occurred while you were disconnected.
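
The exact resync mechanism isn't specified here. A common approach is for the server to number every change, for the client to remember the last number it applied, and, after reconnecting with capped exponential backoff, to request only the newer events. The sketch below assumes that design; `EventLog`, `Client`, and `backoff_delays` are hypothetical names, and the delay values are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Event = Tuple[int, str]  # (sequence number, description)


@dataclass
class EventLog:
    """Hypothetical server-side log that stamps each incident change with an increasing sequence number."""
    events: List[Event] = field(default_factory=list)

    def append(self, description: str) -> int:
        seq = len(self.events) + 1
        self.events.append((seq, description))
        return seq

    def since(self, seq: int) -> List[Event]:
        """Everything newer than the given sequence number."""
        return [e for e in self.events if e[0] > seq]


def backoff_delays(base: float = 0.5, cap: float = 30.0, attempts: int = 6) -> List[float]:
    """Exponential reconnect delays in seconds, capped so a long outage never means a long wait."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


@dataclass
class Client:
    """Hypothetical browser-side state: the last sequence number it has applied."""
    last_seq: int = 0
    applied: List[str] = field(default_factory=list)

    def apply(self, events: List[Event]) -> None:
        for seq, description in events:
            if seq > self.last_seq:  # skip anything already applied
                self.applied.append(description)
                self.last_seq = seq

    def resync(self, log: EventLog) -> int:
        """After reconnecting, request only the events missed while offline."""
        missed = log.since(self.last_seq)
        self.apply(missed)
        return len(missed)


# Usage: one change arrives live, two happen while the connection is down.
log = EventLog()
client = Client()
log.append("status changed")
client.apply(log.since(client.last_seq))
log.append("severity changed")  # client is disconnected here
log.append("comment added")
replayed = client.resync(log)
```

Replaying by sequence number means a reconnecting browser never re-downloads the whole incident and never applies the same change twice.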

Tip: Open the incident detail page in a separate browser tab during active response. This gives you a persistent view of team activity while you work in your monitoring dashboards and terminal.

Every incident has a threaded comment section. Use comments to:

  • Share findings from your investigation
  • Post command output or log snippets
  • Document decisions and their rationale
  • Record actions taken outside of Overwatch

To add a comment, open the incident detail page, scroll to the Activity section, and type in the comment field.

Tag team members with @username to send them a direct notification. Mentions work in comments and incident descriptions. The mentioned user receives:

  • An in-app notification badge
  • An email notification (if configured in their profile settings)
  • A highlight on the comment in the activity feed

Use @mentions to request help, assign follow-up tasks, or bring someone’s attention to a specific finding. For example:

@jane.smith Found the root cause in the payment-service logs.
Connection pool exhaustion after the v3.2 deploy. Can you check
the staging environment for the same issue?
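
To see how mentions map to notifications, here is a minimal sketch that pulls usernames out of a comment and picks delivery channels following the list above: an in-app badge and a feed highlight always, email only if the user opted in. The username pattern, the channel names, and the `email_opt_in` preference map are assumptions for illustration, not Overwatch's actual parsing rules or settings.

```python
import re
from typing import Dict, List

# Assumed username syntax: letters, digits, dots, underscores, and hyphens, not ending in "." or "-".
# (?<!\w) skips an "@" preceded by a word character, so "ops@example.com" is not a mention.
MENTION_RE = re.compile(r"(?<!\w)@([A-Za-z0-9_](?:[A-Za-z0-9._-]*[A-Za-z0-9_])?)")


def extract_mentions(text: str) -> List[str]:
    """Return each mentioned username once, in order of first appearance."""
    return list(dict.fromkeys(MENTION_RE.findall(text)))


def notification_channels(username: str, email_opt_in: Dict[str, bool]) -> List[str]:
    """In-app badge and feed highlight always; email only if the user enabled it in their profile."""
    channels = ["in_app_badge", "feed_highlight"]
    if email_opt_in.get(username, False):
        channels.append("email")
    return channels


# Usage with the example comment above (the email preference is hypothetical).
comment = (
    "@jane.smith Found the root cause in the payment-service logs.\n"
    "Connection pool exhaustion after the v3.2 deploy. Can you check\n"
    "the staging environment for the same issue?"
)
for user in extract_mentions(comment):
    print(user, notification_channels(user, {"jane.smith": True}))
```

Excluding a trailing "." from usernames lets a mention end a sentence ("thanks @alice.wu.") without breaking the match.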

During a major incident, clear roles prevent duplicate effort and communication gaps. The three response roles are incident commander, diagnostician, and communicator. Assign them from the incident detail page under Response Team.

The incident commander owns the overall response. Responsibilities include:

  • Coordinating team members and assigning tasks
  • Making decisions about escalation and communication
  • Tracking progress and maintaining the incident timeline
  • Deciding when the incident is resolved

Assign the incident commander role to a senior engineer or team lead who has authority to make operational decisions.

The diagnostician focuses on technical investigation. Responsibilities include:

  • Running diagnostic commands through the AI chat and Helper CLI
  • Analyzing logs, metrics, and traces
  • Identifying the root cause
  • Proposing and validating fixes

One or more engineers can share the diagnostician role. They post findings in the comment thread as they investigate.

The communicator manages stakeholder updates. Responsibilities include:

  • Posting status updates to external channels (Slack, status page, email)
  • Responding to questions from non-technical stakeholders
  • Maintaining a customer-facing timeline of the incident
  • Preparing the initial post-incident summary

Note: For smaller teams, one person often fills multiple roles. The structure is a guideline, not a rigid requirement. The goal is to ensure that investigation, coordination, and communication all happen without gaps.
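
The role structure, including one person holding several roles, can be modeled as a small data structure that also reports uncovered roles. This is a hypothetical sketch, not Overwatch's Response Team schema: it assumes a single incident commander and a single communicator, with any number of diagnosticians.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ResponseTeam:
    """Hypothetical model of an incident's Response Team, not Overwatch's actual schema."""
    commander: Optional[str] = None     # assumed: a single incident commander
    communicator: Optional[str] = None  # assumed: a single communicator
    diagnosticians: Set[str] = field(default_factory=set)  # one or more engineers may share this role

    def transfer_commander(self, new_commander: str) -> Optional[str]:
        """Give the commander role to someone else (e.g. at a shift handoff); return the previous holder."""
        previous, self.commander = self.commander, new_commander
        return previous

    def roles_of(self, user: str) -> List[str]:
        """On small teams one person may hold several roles, so this can return more than one."""
        held = []
        if self.commander == user:
            held.append("commander")
        if user in self.diagnosticians:
            held.append("diagnostician")
        if self.communicator == user:
            held.append("communicator")
        return held

    def gaps(self) -> List[str]:
        """Roles nobody covers yet, i.e. coordination, investigation, or communication gaps."""
        missing = []
        if self.commander is None:
            missing.append("commander")
        if not self.diagnosticians:
            missing.append("diagnostician")
        if self.communicator is None:
            missing.append("communicator")
        return missing


# Usage: a two-person response where each engineer covers two roles.
team = ResponseTeam(commander="jane.smith")
team.diagnosticians.update({"jane.smith", "bob.chen"})
before = team.gaps()  # ["communicator"]: nobody is handling stakeholder updates yet
team.communicator = "bob.chen"
```

Checking for gaps, rather than requiring one person per role, matches the guidance that the structure is a guideline: what matters is that every function is covered.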

When multiple team members open the same incident, they enter a shared collaboration space. The incident detail page shows:

  • Active users: Avatars of team members currently viewing the incident
  • Live activity feed: Comments, status changes, and actions appear in real time
  • Typing indicators: See when a teammate is composing a comment
  • Procedure execution: Watch procedure steps complete as another team member runs them

This shared view eliminates the need to constantly ask “where are we?” during incident response. Everyone sees the same current state.
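
One way to drive the active-user avatars and typing indicators is with timestamps that expire: a user stays "active" while heartbeats keep arriving and "typing" only briefly after their last keystroke. The sketch below assumes clients send periodic heartbeats and keystroke pings; the `Presence` class and both timeout values are hypothetical, not Overwatch settings. Times are passed in explicitly (for example from `time.monotonic()`) so the logic is easy to test.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Presence:
    """Hypothetical presence tracker for one incident page; times are in seconds."""
    viewer_timeout: float = 30.0  # assumed: drop a viewer after 30 s without a heartbeat
    typing_timeout: float = 5.0   # assumed: clear the typing indicator 5 s after the last keystroke
    _last_seen: Dict[str, float] = field(default_factory=dict, init=False)
    _last_typed: Dict[str, float] = field(default_factory=dict, init=False)

    def heartbeat(self, user: str, now: float) -> None:
        self._last_seen[user] = now

    def keystroke(self, user: str, now: float) -> None:
        self._last_typed[user] = now
        self._last_seen[user] = now  # typing also proves the user is present

    def comment_posted(self, user: str) -> None:
        self._last_typed.pop(user, None)  # posting ends the typing indicator immediately

    def active_users(self, now: float) -> List[str]:
        return sorted(u for u, t in self._last_seen.items() if now - t <= self.viewer_timeout)

    def typing_users(self, now: float) -> List[str]:
        return sorted(u for u, t in self._last_typed.items() if now - t <= self.typing_timeout)


# Usage: alice only has the page open; bob starts writing a comment.
p = Presence()
p.heartbeat("alice.wu", now=0.0)
p.keystroke("bob.chen", now=10.0)
```

Expiring state on a timeout means a closed tab or dropped connection clears itself without needing an explicit "left" message.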

Incidents that span shift boundaries or require expertise from another team need structured handoffs. Follow this process:

  1. Update the incident description with the current state of the investigation
  2. Add a comment summarizing what has been tried, what worked, and what remains
  3. List any open questions or blocked tasks
  4. Attach relevant log snippets, screenshots, or command output to the incident
  5. @mention the incoming team member in a comment with a handoff summary
  6. Reassign the incident to the new owner
  7. If roles are assigned, transfer the incident commander role
  8. The incoming team member acknowledges the handoff with a comment confirming they have context

A handoff comment following this process might look like this:

@bob.chen Handing off checkout-service incident. Current state:
- Root cause identified: connection pool exhaustion after v3.2 deploy
- Temporary mitigation applied: scaled to 8 replicas
- Remaining: Roll back v3.2 or deploy v3.2.1 hotfix (PR #482 is open)
- Monitoring: Error rate is stable at 0.3% with the extra replicas
The hotfix needs QA sign-off before deploying to production. @alice.wu
has the QA context.
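
If your team writes many handoffs, a small helper can keep them consistent with the checklist. The sketch below is hypothetical (the `Handoff` class is not part of Overwatch or the Helper CLI); it only assembles a comment shaped like the example above from structured fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Handoff:
    """Hypothetical structured handoff; fields mirror the handoff checklist."""
    to_user: str          # incoming owner, without the leading "@"
    incident: str
    root_cause: str = ""  # may still be unknown at handoff time
    mitigations: List[str] = field(default_factory=list)     # what has been tried or applied
    remaining: List[str] = field(default_factory=list)       # what still needs doing
    open_questions: List[str] = field(default_factory=list)  # open questions or blocked tasks

    def to_comment(self) -> str:
        """Render the handoff as a comment that @mentions the incoming owner."""
        lines = [f"@{self.to_user} Handing off {self.incident} incident. Current state:"]
        lines.append(f"- Root cause: {self.root_cause or 'not yet identified'}")
        lines += [f"- Mitigation: {m}" for m in self.mitigations]
        lines += [f"- Remaining: {r}" for r in self.remaining]
        lines += [f"- Open question: {q}" for q in self.open_questions]
        return "\n".join(lines)


# Usage: the checkout-service handoff from the example above.
handoff = Handoff(
    to_user="bob.chen",
    incident="checkout-service",
    root_cause="connection pool exhaustion after v3.2 deploy",
    mitigations=["scaled to 8 replicas"],
    remaining=["roll back v3.2 or deploy the v3.2.1 hotfix (PR #482 is open)"],
    open_questions=["hotfix needs QA sign-off; @alice.wu has the QA context"],
)
print(handoff.to_comment())
```

Rendering "not yet identified" instead of omitting the root-cause line makes an unknown cause explicit to the incoming owner.
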

When posting updates during an incident, follow these practices:

  • Lead with facts: State what you know, not what you assume. “Error rate is 5.2%” is more useful than “the service seems slow.”
  • Timestamp your findings: Include when you observed something, especially in fast-moving incidents. “At 14:32 UTC, pod checkout-7b9f4 entered CrashLoopBackOff.”
  • Separate observation from hypothesis: Post raw findings as comments, and label theories clearly. “Hypothesis: the new DB connection string is missing the SSL parameter.”
  • Update frequently: Short, frequent updates are better than long, delayed summaries. Post when you start investigating a path, not just when you finish.
  • Use threads for deep dives: Keep the main comment stream high-level. Use reply threads for detailed log analysis or lengthy command output.
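
The timestamping and labeling practices are easy to apply consistently with a tiny formatter. The sketch below is hypothetical: the `[HH:MM UTC] Label: text` layout and the `format_finding` name are assumptions, not an Overwatch convention, and the dates in the usage are arbitrary. It converts any timezone-aware time to UTC so teammates in different zones read the same clock.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Kind(Enum):
    OBSERVATION = "Observation"
    HYPOTHESIS = "Hypothesis"


def format_finding(text: str, kind: Kind, at: datetime) -> str:
    """Return '[HH:MM UTC] Label: text' so every post carries a time and says what kind of claim it is."""
    if at.tzinfo is None:
        # A naive datetime would be interpreted in the machine's local zone; refuse to guess.
        raise ValueError("pass a timezone-aware datetime")
    stamp = at.astimezone(timezone.utc).strftime("%H:%M UTC")
    return f"[{stamp}] {kind.value}: {text}"


# Usage: an observation recorded in UTC and a hypothesis recorded on a UTC+1 clock.
seen = format_finding(
    "pod checkout-7b9f4 entered CrashLoopBackOff.",
    Kind.OBSERVATION,
    datetime(2024, 1, 1, 14, 32, tzinfo=timezone.utc),
)
theory = format_finding(
    "the new DB connection string is missing the SSL parameter.",
    Kind.HYPOTHESIS,
    datetime(2024, 1, 1, 15, 40, tzinfo=timezone(timedelta(hours=1))),
)
```

Making the label a required argument forces the author to decide, at posting time, whether a line is something they saw or something they suspect.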