Team Collaboration
Overwatch provides real-time collaboration features so your team can investigate and resolve incidents together. All changes sync instantly across connected browsers over WebSocket connections, and structured roles keep incident response organized.
Real-Time Updates
Every action taken on an incident is broadcast to all team members viewing that incident. This includes:
- Status changes (New, In Progress, Resolved, Closed)
- Severity updates
- Assignee changes
- New comments and @mentions
- Procedure execution progress
- AI chat activity from the Chrome extension
A green connection indicator in the bottom-left corner of the dashboard confirms your WebSocket connection is active. If the connection drops, Overwatch reconnects automatically and syncs any changes that occurred while you were disconnected.
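Overwatch handles reconnection and catch-up for you, and this page does not describe how it does so internally. As a rough illustration of the general idea, the sketch below shows one common pattern: the client remembers the last event it saw and, after reconnecting, asks for everything newer. The class names, the `seq` field, and the event kinds are all assumptions made for the example, not part of Overwatch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IncidentEvent:
    seq: int      # hypothetical, monotonically increasing sequence number
    kind: str     # e.g. "status_change" or "comment" (illustrative names)
    payload: str


@dataclass
class EventLog:
    """Illustrative server-side record of everything that happened on one incident."""
    events: list[IncidentEvent] = field(default_factory=list)

    def append(self, kind: str, payload: str) -> IncidentEvent:
        event = IncidentEvent(seq=len(self.events) + 1, kind=kind, payload=payload)
        self.events.append(event)
        return event

    def since(self, last_seq: int) -> list[IncidentEvent]:
        """Return the events a client has not seen yet."""
        return [e for e in self.events if e.seq > last_seq]


@dataclass
class Client:
    """Illustrative browser-side state for one viewer of the incident."""
    last_seq: int = 0
    seen: list[IncidentEvent] = field(default_factory=list)

    def receive(self, event: IncidentEvent) -> None:
        self.seen.append(event)
        self.last_seq = event.seq

    def resync(self, log: EventLog) -> int:
        """After reconnecting, replay missed events and return how many were replayed."""
        missed = log.since(self.last_seq)
        for event in missed:
            self.receive(event)
        return len(missed)


log = EventLog()
client = Client()

# The client is connected and receives the first event live.
client.receive(log.append("status_change", "New -> In Progress"))

# The connection drops; two more events happen on the incident.
log.append("comment", "Found the root cause")
log.append("severity_update", "Severity raised")

# The connection comes back and the client catches up.
replayed = client.resync(log)
```

Tracking a single sequence number keeps the catch-up request small and makes replay idempotent: resyncing twice in a row replays nothing the second time.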
Tip: Open the incident detail page in a separate browser tab during active response. This gives you a persistent view of team activity while you work in your monitoring dashboards and terminal.
Comments and @Mentions
Adding Comments
Every incident has a threaded comment section. Use comments to:
- Share findings from your investigation
- Post command output or log snippets
- Document decisions and their rationale
- Record actions taken outside of Overwatch
To add a comment, open the incident detail page, scroll to the Activity section, and type in the comment field.
@Mentions
Tag team members with @username to send them a direct notification. Mentions work in comments and incident descriptions. The mentioned user receives:
- An in-app notification badge
- An email notification (if configured in their profile settings)
- A highlight on the comment in the activity feed
Use @mentions to request help, assign follow-up tasks, or bring someone’s attention to a specific finding.
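If you post comments from a script or bot, it can help to check which users a message will notify before sending it. The following is a minimal sketch of pulling @username tokens out of comment text. The username format (letters, digits, underscores, with dots or hyphens between segments) is an assumption based on the examples on this page, and `extract_mentions` is a hypothetical helper, not an Overwatch API.

```python
from __future__ import annotations

import re

# Assumed username format: segments of letters, digits, or underscores,
# joined by dots or hyphens (for example "jane.smith").
# The lookbehind skips email addresses such as ops@example.com.
MENTION_PATTERN = re.compile(r"(?<![\w.])@([A-Za-z0-9_]+(?:[.-][A-Za-z0-9_]+)*)")


def extract_mentions(comment: str) -> list[str]:
    """Return mentioned usernames, de-duplicated, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in MENTION_PATTERN.finditer(comment):
        seen.setdefault(match.group(1), None)
    return list(seen)


mentions = extract_mentions(
    "@jane.smith Found the root cause in the payment-service logs. "
    "Can you check the staging environment for the same issue?"
)
```

Because the pattern requires characters after each dot or hyphen, sentence punctuation directly after a mention (as in "ask @alice.wu.") is not treated as part of the username.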
@jane.smith Found the root cause in the payment-service logs.
Connection pool exhaustion after the v3.2 deploy. Can you check
the staging environment for the same issue?
Incident Roles
During a major incident, clear roles prevent duplicate effort and communication gaps. Assign these roles from the incident detail page under Response Team.
Incident Commander
The incident commander owns the overall response. Responsibilities include:
- Coordinating team members and assigning tasks
- Making decisions about escalation and communication
- Tracking progress and maintaining the incident timeline
- Deciding when the incident is resolved
Assign the incident commander role to a senior engineer or team lead who has authority to make operational decisions.
Diagnostician
The diagnostician focuses on technical investigation. Responsibilities include:
- Running diagnostic commands through the AI chat and Helper CLI
- Analyzing logs, metrics, and traces
- Identifying the root cause
- Proposing and validating fixes
One or more engineers can share the diagnostician role. They post findings in the comment thread as they investigate.
Communicator
The communicator manages stakeholder updates. Responsibilities include:
- Posting status updates to external channels (Slack, status page, email)
- Responding to questions from non-technical stakeholders
- Maintaining a customer-facing timeline of the incident
- Preparing the initial post-incident summary
Note: For smaller teams, one person often fills multiple roles. The structure is a guideline, not a rigid requirement. The goal is to ensure that investigation, coordination, and communication all happen without gaps.
Multi-User Incident Rooms
When multiple team members open the same incident, they enter a shared collaboration space. The incident detail page shows:
- Active users: Avatars of team members currently viewing the incident
- Live activity feed: Comments, status changes, and actions appear in real time
- Typing indicators: See when a teammate is composing a comment
- Procedure execution: Watch procedure steps complete as another team member runs them
This shared view eliminates the need to constantly ask “where are we?” during incident response. Everyone sees the same current state.
Handoff Procedures
Incidents that span shift boundaries or require expertise from another team need structured handoffs. Follow this process:
Preparing for Handoff
- Update the incident description with the current state of the investigation
- Add a comment summarizing what has been tried, what worked, and what remains
- List any open questions or blocked tasks
- Attach relevant log snippets, screenshots, or command output to the incident
Executing the Handoff
- @mention the incoming team member in a comment with a handoff summary
- Reassign the incident to the new owner
- If roles are assigned, transfer the Incident Commander role
- The incoming team member acknowledges the handoff with a comment confirming they have context
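Teams that hand off often may want a consistent template for the summary comment. The sketch below is one way to assemble it from structured fields and to refuse to produce a summary when a required field is blank. The `Handoff` class, its field names, and `render_handoff` are hypothetical helpers for illustration; Overwatch does not require this layout.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class Handoff:
    recipient: str    # username of the incoming owner, without the leading "@"
    incident: str     # short incident name
    root_cause: str
    mitigation: str
    remaining: str
    monitoring: str
    notes: str = ""   # optional free-form context, appended as a final line


def render_handoff(handoff: Handoff) -> str:
    """Build the handoff comment text, rejecting summaries with blank required fields."""
    missing = [
        f.name
        for f in fields(handoff)
        if f.name != "notes" and not getattr(handoff, f.name).strip()
    ]
    if missing:
        raise ValueError(f"handoff summary is incomplete: {', '.join(missing)}")

    lines = [
        f"@{handoff.recipient} Handing off {handoff.incident} incident. Current state:",
        f"- Root cause identified: {handoff.root_cause}",
        f"- Temporary mitigation applied: {handoff.mitigation}",
        f"- Remaining: {handoff.remaining}",
        f"- Monitoring: {handoff.monitoring}",
    ]
    if handoff.notes.strip():
        lines.append(handoff.notes.strip())
    return "\n".join(lines)


comment = render_handoff(
    Handoff(
        recipient="bob.chen",
        incident="checkout-service",
        root_cause="connection pool exhaustion after v3.2 deploy",
        mitigation="scaled to 8 replicas",
        remaining="Roll back v3.2 or deploy v3.2.1 hotfix (PR #482 is open)",
        monitoring="Error rate is stable at 0.3% with the extra replicas",
    )
)
print(comment)
```

Failing on a blank field is a deliberate choice here: an incomplete handoff is easier to fix before it is posted than after the outgoing engineer has logged off.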
Example Handoff Comment
@bob.chen Handing off checkout-service incident. Current state:
- Root cause identified: connection pool exhaustion after v3.2 deploy
- Temporary mitigation applied: scaled to 8 replicas
- Remaining: Roll back v3.2 or deploy v3.2.1 hotfix (PR #482 is open)
- Monitoring: Error rate is stable at 0.3% with the extra replicas
The hotfix needs QA sign-off before deploying to production. @alice.wu has the QA context.
Best Practices for Incident Communication
- Lead with facts: State what you know, not what you assume. “Error rate is 5.2%” is more useful than “the service seems slow.”
- Timestamp your findings: Include when you observed something, especially in fast-moving incidents. “At 14:32 UTC, pod checkout-7b9f4 entered CrashLoopBackOff.”
- Separate observation from hypothesis: Post raw findings as comments, and label theories clearly. “Hypothesis: the new DB connection string is missing the SSL parameter.”
- Update frequently: Short, frequent updates are better than long, delayed summaries. Post when you start investigating a path, not just when you finish.
- Use threads for deep dives: Keep the main comment stream high-level. Use reply threads for detailed log analysis or lengthy command output.
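For findings posted by scripts or copied from tooling, a small helper can apply the timestamp and hypothesis conventions above consistently. This is a minimal sketch: `format_finding` is a hypothetical helper, the date in the usage example is arbitrary, and the exact wording of the prefixes is only one possible convention.

```python
from datetime import datetime, timedelta, timezone


def format_finding(text: str, observed_at: datetime, hypothesis: bool = False) -> str:
    """Prefix a finding with its observation time in UTC, labeling theories explicitly."""
    if observed_at.tzinfo is None:
        # A naive datetime is ambiguous across time zones, so reject it.
        raise ValueError("observed_at must be timezone-aware")
    stamp = observed_at.astimezone(timezone.utc).strftime("%H:%M UTC")
    if hypothesis:
        return f"Hypothesis (as of {stamp}): {text}"
    return f"At {stamp}, {text}"


# A local observation time (UTC-5 here) is converted to UTC before formatting.
observed = datetime(2024, 1, 1, 9, 32, tzinfo=timezone(timedelta(hours=-5)))
print(format_finding("pod checkout-7b9f4 entered CrashLoopBackOff.", observed))
print(format_finding("the new DB connection string is missing the SSL parameter.", observed, hypothesis=True))
```

Converting everything to UTC avoids confusion when responders in different time zones read the same thread.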
Next Steps
- Incident Response Workflow: See collaboration in the context of a full incident
- Creating Procedures: Build runbooks that multiple team members can execute
- Analytics & Reporting: Measure team performance and identify improvement areas