Procedure Management
Procedures in Overwatch are executable runbooks that provide step-by-step guidance for common operational tasks. They ensure consistency, capture organizational knowledge, and accelerate incident resolution through structured workflows.
Understanding Procedures
Procedures transform operational knowledge into repeatable, trackable processes:
Key Capabilities
- Step-by-step instructions with detailed guidance at each stage
- Real-time execution tracking with progress monitoring and time tracking
- Variable substitution for dynamic content and customization
- Approval workflows for sensitive operations requiring authorization
- Execution history with complete audit trails and outcome tracking
Benefits
- Reduce resolution time through standardized processes
- Ensure consistency across team members and incidents
- Capture and share tribal knowledge systematically
- Track execution success rates and identify optimization opportunities
- Enable new team members to resolve issues confidently
Creating Procedures
Quick Create
There are multiple ways to create procedures:
From Dashboard
    Dashboard → "Create Procedure" button
From Procedures Page
    Procedures → "New Procedure" button
From Template Library
    Procedures → Templates → Select Template → Customize
Procedure Configuration
When creating a procedure, configure the following settings:
Basic Information
- Name: Clear, actionable procedure name that describes the task
  - ✅ Good: “Restart PostgreSQL Database Service”
  - ❌ Bad: “Database procedure”
- Description: Detailed purpose and use case description including:
  - When to use this procedure
  - Expected outcomes
  - Prerequisites or requirements
- Category: Organization taxonomy for grouping similar procedures
  - Examples: “Database”, “Network”, “Application”, “Security”, “Infrastructure”
- Tags: Keywords for search and organization
  - Use consistent terminology across procedures
  - Include technology names (e.g., “postgresql”, “kubernetes”)
Workflow Settings
- Approval Required: Whether procedure execution requires manager approval
  - Use for sensitive operations (database restarts, production deployments)
  - Approval requests sent to designated approvers
  - Execution blocked until approval granted
- Estimated Duration: Expected execution time
  - Helps with planning and prioritization
  - Used for resource allocation
  - Compared against actual execution time for optimization
- Timeout: Maximum allowed execution time before automatic failure
- Rollback Capability: Whether procedure can be rolled back if it fails
Visibility and Access
- Organization-wide: Available to all team members in your organization
- Team-specific: Limited to specific teams
- Private: Only visible to creator and designated collaborators
Step Definition
Procedures consist of ordered steps that guide execution:
Step Structure
    {
      "steps": [
        {
          "title": "Verify Database Connection",
          "description": "Check database connectivity before proceeding with restart",
          "type": "manual",
          "estimated_duration": "2 minutes",
          "approval_required": false,
          "verification": "Database responds to ping and basic queries",
          "error_handling": "If connection fails, check network and database logs"
        },
        {
          "title": "Stop Application Services",
          "description": "Gracefully stop all services connecting to the database",
          "type": "manual",
          "estimated_duration": "3 minutes",
          "approval_required": false,
          "commands": [
            "systemctl stop api-service",
            "systemctl stop worker-service"
          ]
        },
        {
          "title": "Restart Database Service",
          "description": "Perform the database service restart with confirmation",
          "type": "manual",
          "estimated_duration": "5 minutes",
          "approval_required": true,
          "commands": [
            "sudo systemctl restart postgresql"
          ],
          "verification": "Check systemctl status postgresql shows active (running)"
        },
        {
          "title": "Verify Database Health",
          "description": "Confirm database is responding and all services are healthy",
          "type": "manual",
          "estimated_duration": "3 minutes",
          "approval_required": false,
          "verification": "Run health check queries and verify normal response times"
        },
        {
          "title": "Restart Application Services",
          "description": "Bring application services back online",
          "type": "manual",
          "estimated_duration": "3 minutes",
          "approval_required": false,
          "commands": [
            "systemctl start api-service",
            "systemctl start worker-service"
          ]
        }
      ]
    }
Step Types
- manual: Requires human execution and confirmation
- automated: Can be executed automatically (future capability)
- verification: Check-only steps to confirm system state
- decision: Conditional steps based on previous outcomes
Step Components
| Field | Required | Description |
|---|---|---|
| title | Yes | Short, descriptive step name |
| description | Yes | Detailed instructions for executing the step |
| type | Yes | Step type (manual, automated, verification, decision) |
| estimated_duration | No | Expected time to complete step |
| approval_required | No | Whether step requires approval before execution |
| commands | No | Shell commands or API calls to execute |
| verification | No | How to verify step succeeded |
| error_handling | No | What to do if step fails |
| rollback_steps | No | Steps to undo this operation if needed |
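A step definition can be checked against the table above before it is saved. The following is a minimal sketch, not part of Overwatch: `validate_step` and `duration_minutes` are hypothetical helper names, and the duration parser assumes the "N minutes" format used in the example steps.

```python
import re

# Required fields and step types as listed in the Step Components table.
REQUIRED_FIELDS = ("title", "description", "type")
STEP_TYPES = {"manual", "automated", "verification", "decision"}


def validate_step(step: dict) -> list:
    """Return a list of problems found in one step definition (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not step.get(field):
            problems.append(f"missing required field: {field}")
    step_type = step.get("type")
    if step_type and step_type not in STEP_TYPES:
        problems.append(f"unknown step type: {step_type}")
    return problems


def duration_minutes(text: str) -> int:
    """Parse a duration written like "2 minutes" (assumed format)."""
    match = re.fullmatch(r"(\d+) minutes?", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration format: {text!r}")
    return int(match.group(1))


steps = [
    {"title": "Verify Database Connection",
     "description": "Check database connectivity before proceeding with restart",
     "type": "manual", "estimated_duration": "2 minutes"},
    # The description is deliberately left out to trigger a validation problem.
    {"title": "Restart Database Service", "type": "manual",
     "estimated_duration": "5 minutes"},
]

for number, step in enumerate(steps, start=1):
    for problem in validate_step(step):
        print(f"step {number}: {problem}")

# Sum the per-step estimates to get a procedure-level estimate.
total = sum(duration_minutes(s["estimated_duration"])
            for s in steps if "estimated_duration" in s)
print(f"total estimated duration: {total} minutes")
```

Running the sketch reports the missing description on step 2 and a total estimate of 7 minutes.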
Advanced Step Features
Variable Substitution
Use variables for dynamic content that changes per execution:
    {
      "title": "Connect to Server",
      "description": "SSH to {{server_hostname}} on port {{ssh_port}}",
      "commands": [
        "ssh -p {{ssh_port}} admin@{{server_hostname}}"
      ]
    }
Variables Defined
- During procedure creation: Define variable names and default values
- During execution: Users provide actual values for the execution context
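To make the substitution concrete, here is a rough sketch of how `{{name}}` placeholders could be filled in at execution time. It is an illustration only: `render` is a hypothetical function, the hostname is made up, and how Overwatch itself performs substitution is not described in this guide.

```python
import re
from typing import Optional

# Matches placeholders of the form {{variable_name}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, values: dict, defaults: Optional[dict] = None) -> str:
    """Replace each {{name}} with the supplied value, falling back to its default."""
    merged = {**(defaults or {}), **values}

    def substitute(match):
        name = match.group(1)
        if name not in merged:
            raise KeyError(f"no value provided for variable: {name}")
        return str(merged[name])

    return PLACEHOLDER.sub(substitute, template)


# Defaults come from procedure creation; values come from the person executing.
defaults = {"ssh_port": 22}
values = {"server_hostname": "db-01.example.com"}  # hypothetical host

command = render("ssh -p {{ssh_port}} admin@{{server_hostname}}", values, defaults)
print(command)  # ssh -p 22 admin@db-01.example.com
```

Raising an error for a placeholder without a value, rather than leaving it in the command, keeps an unresolved `{{name}}` from reaching a shell.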
Conditional Steps
Define steps that only execute under certain conditions:
    {
      "title": "Rollback Deployment",
      "description": "Revert to previous version",
      "condition": "{{deployment_success}} == false",
      "type": "manual"
    }
Parallel Step Groups
Group steps that can be executed in parallel for efficiency:
    {
      "parallel_group": [
        {
          "title": "Check API Health",
          "description": "Verify API endpoints responding"
        },
        {
          "title": "Check Database Health",
          "description": "Verify database queries working"
        },
        {
          "title": "Check Cache Health",
          "description": "Verify Redis cache responding"
        }
      ]
    }
Executing Procedures
Starting Execution
From Procedure Detail Page
    Procedures → Select Procedure → "Execute Procedure" button
From Incident Page
    Incident Detail → AI Suggestions → Select Procedure → "Execute"
Execution Setup
- Review procedure overview and estimated duration
- Provide values for any required variables
- Review approval requirements and approvers
- Confirm execution start
- Execution begins and tracking starts
Step-by-Step Execution
Execution Interface
The execution interface provides real-time guidance:
Current Step Display
- Step number and title prominently displayed
- Detailed instructions and commands
- Verification criteria clearly stated
- Error handling guidance visible
- Estimated time remaining for current step
Progress Tracking
- Visual progress bar showing overall completion
- Steps completed vs total steps
- Elapsed time and estimated remaining time
- Step-by-step completion history
Execution Actions
- Complete Step: Mark current step as successfully completed
- Add Notes: Add observations or context for this step
- Report Issue: Flag problems or deviations from expected behavior
- Request Help: Notify team members for assistance
- Pause Execution: Temporarily pause for investigation or break
Approval Gates
Some steps require approval before execution:
Approval Request Flow
1. Executor reaches step requiring approval
2. System sends approval request to designated approvers
3. Execution pauses and displays “Awaiting Approval” status
4. Approver receives notification (email, Slack, in-app)
5. Approver reviews step details and context
6. Approver approves or rejects with reason
7. Execution resumes (if approved) or stops (if rejected)
Approval Notifications
- In-app notifications with direct link to approval request
- Email notifications with execution context
- Slack/Teams messages (if integration enabled)
- Mobile push notifications (if mobile app available)
Who Can Approve
- Designated approvers configured per procedure
- Managers with appropriate permissions
- On-call incident commanders (for critical incidents)
- Never the same person executing the procedure
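The approval rules above can be modeled as a small state machine. This is a sketch under stated assumptions rather than Overwatch's implementation: the `ApprovalGate` class, its status strings, and the user names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApprovalGate:
    """Hypothetical model of one approval request; not an Overwatch API."""
    executor: str
    approvers: frozenset
    status: str = "awaiting_approval"
    decided_by: Optional[str] = None
    reason: str = ""

    def decide(self, approver: str, approved: bool, reason: str = "") -> str:
        if self.status != "awaiting_approval":
            raise ValueError("a decision has already been recorded")
        # The executor may never approve their own execution.
        if approver == self.executor:
            raise PermissionError("executors cannot approve their own execution")
        if approver not in self.approvers:
            raise PermissionError(f"{approver} is not a designated approver")
        # A rejection must carry a reason, as in the approval request flow.
        if not approved and not reason:
            raise ValueError("a rejection requires a reason")
        self.status = "approved" if approved else "rejected"
        self.decided_by = approver
        self.reason = reason
        return self.status


gate = ApprovalGate(executor="alice", approvers=frozenset({"bob", "carol"}))
print(gate.status)                         # awaiting_approval
print(gate.decide("bob", approved=True))   # approved
```

Execution would resume only when `status` is `"approved"`; a `"rejected"` gate stops the run.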
Real-time Monitoring
Live Execution Updates
Execution progress updates in real time over a WebSocket connection:
For Executors
- Step completion confirmation
- Approval status updates
- Help requests acknowledged
- Team observations appear live
For Observers
- Team members can observe execution progress without interfering
- See which step is currently being executed
- View executor’s notes and observations in real-time
- Monitor execution time and progress
- Receive notifications when execution completes
Collaboration During Execution
- Add comments visible to executor in real-time
- @mention executor for urgent information
- Share relevant links or documentation
- Provide guidance without interrupting workflow
Execution Completion
Successful Completion
- All steps completed successfully
- Mark execution as successful
- Add final notes and observations
- Document any deviations from procedure
- Review execution summary and metrics
Execution Summary Includes
- Total execution time vs estimated time
- Each step completion time
- Notes and observations from all steps
- Any deviations or issues encountered
- Verification results
- Approvals received
Failed Execution
- Identify which step failed and why
- Mark execution as failed
- Document the failure reason in detail
- Note any rollback actions taken
- Optionally create incident for investigation
Paused Execution
- Execution can be resumed later from pause point
- Executor or managers can pause execution
- Paused executions appear in “In Progress” list
- Automatic timeout after 24 hours without activity
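The 24-hour inactivity rule amounts to a timestamp comparison. The sketch below assumes the last activity time is stored as a timezone-aware timestamp; `has_timed_out` is a hypothetical helper, and whether the boundary itself counts as timed out is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Paused executions time out after 24 hours without activity.
PAUSE_TIMEOUT = timedelta(hours=24)


def has_timed_out(last_activity: datetime, now: datetime) -> bool:
    """Return True once 24 hours have passed since the last recorded activity."""
    return now - last_activity >= PAUSE_TIMEOUT


paused_at = datetime(2025, 10, 1, 9, 0, tzinfo=timezone.utc)
print(has_timed_out(paused_at, paused_at + timedelta(hours=23)))  # False
print(has_timed_out(paused_at, paused_at + timedelta(hours=25)))  # True
```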
Procedure Templates
The template library provides standardized procedures for common scenarios:
Using Templates
Access Template Library
    Procedures → Templates → Browse by Category
Template to Procedure Flow
- Browse templates by category or search
- Select relevant template
- Review template steps and configuration
- Click “Use This Template”
- Customize for your environment
- Save as new procedure in your organization
Template Customization
- Modify step descriptions for your environment
- Add or remove steps as needed
- Adjust estimated durations based on your systems
- Configure approval requirements
- Add organization-specific verification steps
Template Categories
Common template categories available:
| Category | Example Templates | Use Cases |
|---|---|---|
| Database Operations | PostgreSQL restart, MySQL backup/restore, MongoDB replication setup | Database maintenance, backup operations, failover procedures |
| Network Troubleshooting | Network connectivity diagnosis, DNS resolution issues, firewall rule updates | Network incidents, connectivity problems, security updates |
| Application Deployment | Blue-green deployment, canary release, rollback procedure | Release management, deployment automation, incident recovery |
| Security Response | Incident response, access revocation, credential rotation | Security incidents, compliance requirements, access management |
| Infrastructure Management | Server provisioning, container orchestration, cloud resource scaling | Infrastructure operations, capacity management, cloud operations |
| Kubernetes Operations | Pod restart, deployment scaling, persistent volume recovery | Container orchestration, cloud-native operations |
| Monitoring & Alerting | Alert configuration, dashboard creation, metric correlation | Observability setup, monitoring improvements |
| Backup & Recovery | Database backup verification, disaster recovery test, snapshot creation | Data protection, business continuity, compliance |
Creating Your Own Templates
Share successful procedures as templates for the community:
Template Creation (Manager/Admin role required)
    Procedures → Select Procedure → "Save as Template"
Template Best Practices
- Generalize environment-specific details using variables
- Include comprehensive error handling guidance
- Add verification steps for each critical operation
- Document prerequisites and dependencies
- Provide rollback steps where applicable
- Use clear, unambiguous language
- Test template with different users before publishing
Template Contribution
- Successfully executed procedures become template candidates
- High success rate procedures prioritized for templates
- Community voting on template quality and usefulness
- Template improvements based on execution feedback
Execution History and Analytics
Individual Execution History
Track all executions of a procedure:
Access Execution History
    Procedures → Select Procedure → "Execution History" tab
Execution Record Includes
- Execution date and time
- Executor name and role
- Execution duration (actual vs estimated)
- Success or failure status
- Notes and observations
- Deviations from standard procedure
- Linked incidents (if executed during incident response)
- Approvals received (who approved, when)
Filtering Options
- By date range
- By executor
- By success/failure status
- By execution duration
- By related incidents
Procedure Performance Metrics
Individual Procedure Metrics
For each procedure, track:
Success Rate
- Total executions
- Successful executions
- Failed executions
- Success rate percentage
- Trends over time
Execution Time
- Average execution time
- Minimum and maximum execution times
- Comparison to estimated duration
- Trends showing improvement or degradation
Usage Statistics
- Total number of executions
- Executions per month/week
- Most frequent executors
- Peak usage times
Failure Analysis
- Common failure points (which steps)
- Failure reasons and patterns
- Time to recover from failures
- Improvement opportunities
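As a worked example of these metrics, the sketch below derives them from a list of execution records. The record shape (`status`, `minutes`, `failed_step`) and the `summarize` function are assumptions made for illustration, and the sample numbers are invented; the average here includes failed runs, which is also an assumption.

```python
from collections import Counter
from statistics import mean


def summarize(executions: list, estimated_minutes: float) -> dict:
    """Compute success rate, execution time, and failure-point metrics."""
    if not executions:
        raise ValueError("at least one execution is required")
    durations = [e["minutes"] for e in executions]
    successes = sum(1 for e in executions if e["status"] == "success")
    # Count which steps failures occurred at, most frequent first.
    failure_points = Counter(
        e["failed_step"] for e in executions if e["status"] == "failed"
    )
    return {
        "total": len(executions),
        "successful": successes,
        "failed": len(executions) - successes,
        "success_rate": round(100 * successes / len(executions), 1),
        "avg_minutes": mean(durations),
        "min_minutes": min(durations),
        "max_minutes": max(durations),
        "avg_vs_estimate": mean(durations) - estimated_minutes,
        "failure_points": failure_points.most_common(),
    }


# Invented sample data for one procedure with a 16-minute estimate.
history = [
    {"status": "success", "minutes": 14, "failed_step": None},
    {"status": "success", "minutes": 18, "failed_step": None},
    {"status": "failed", "minutes": 25, "failed_step": "Restart Database Service"},
    {"status": "success", "minutes": 15, "failed_step": None},
]

report = summarize(history, estimated_minutes=16)
print(report["success_rate"])    # 75.0
print(report["failure_points"])  # [('Restart Database Service', 1)]
```

Tracking the same report per week or month gives the trend lines mentioned above.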
Team Analytics
Access team-wide procedure metrics:
    Dashboard → Analytics → Procedures
Key Team Metrics
Most Executed Procedures
- Ranking by execution frequency
- Success rates for each
- Total time saved through automation
- Optimization opportunities
Team Performance
- Average execution time by team member
- Success rates by team member
- Procedures executed per person
- Training needs identification
Efficiency Metrics
- Average time saved vs manual resolution
- Execution time trends (improving vs degrading)
- Approval bottlenecks
- Step-level time analysis
Procedure Effectiveness
- Which procedures consistently succeed
- Which procedures need revision
- Where steps frequently fail
- Approval delays and patterns
Using Analytics for Improvement
Identify Optimization Opportunities
- Procedures with high failure rates need revision
- Steps that take longer than estimated need adjustment
- Frequently executed procedures benefit from automation
- Approval bottlenecks indicate process issues
Continuous Improvement Process
- Review procedure analytics monthly
- Identify top 5 most-executed procedures
- Analyze execution patterns and failure points
- Update procedures based on execution feedback
- Test improvements and measure impact
- Share successful improvements as templates
Knowledge Capture
- Execution notes become searchable knowledge
- Successful executions improve AI suggestions
- Team expertise captured in execution history
- New procedures created from recurring manual steps
Best Practices
Writing Effective Procedures
Clear Step Descriptions
- Each step should be unambiguous and actionable
- Use imperative language (“Restart the service”, not “The service should be restarted”)
- Include specific commands or API calls where applicable
- Provide expected output or success criteria
✅ Good Example
    Title: Restart Nginx Web Server
    Description: Execute 'sudo systemctl restart nginx' and verify service is active
    Verification: Run 'systemctl status nginx' - should show "active (running)"
❌ Bad Example
    Title: Fix Nginx
    Description: Make nginx work again
    Verification: Check if it works
Appropriate Granularity
Balance detail with usability:
Too Granular (avoid)
    1. Open terminal
    2. Type 'cd'
    3. Press Enter
    4. Type '/var/log'
    5. Press Enter
Appropriate Granularity (preferred)
    1. Navigate to log directory: cd /var/log
    2. Check recent error logs: tail -100 application.log
Include Verification Steps
Every critical operation needs verification:
    {
      "title": "Update DNS Records",
      "description": "Update A record to point to new IP address",
      "commands": ["aws route53 change-resource-record-sets ..."],
      "verification": "Run 'dig domain.com' and confirm IP address is updated",
      "expected_result": "New IP address appears in dig output within 5 minutes"
    }
Error Handling Guidance
Anticipate common errors and provide guidance:
    {
      "title": "Database Connection Test",
      "description": "Verify database connectivity",
      "commands": ["psql -h localhost -U postgres -c 'SELECT 1'"],
      "error_handling": {
        "Connection refused": "Check if PostgreSQL is running: systemctl status postgresql",
        "Authentication failed": "Verify credentials in .pgpass file",
        "Timeout": "Check network connectivity and firewall rules"
      }
    }
Regular Testing
Maintain procedure accuracy:
- Execute procedures regularly (at least quarterly)
- Update procedures when systems change
- Version control procedure changes
- Archive obsolete procedures rather than deleting
- Test after major infrastructure changes
Execution Best Practices
Pre-Execution Preparation
Before starting execution:
- Read Completely First: Review all steps before starting
- Understand Prerequisites: Ensure all requirements are met
- Check Timing: Verify this is an appropriate time to execute
- Notify Stakeholders: Inform relevant parties of planned execution
- Prepare Rollback: Ensure rollback capability if available
During Execution
- Follow Order: Execute steps in the defined sequence unless instructed otherwise
- Document Deviations: Note any deviations or unexpected results immediately
- Add Context: Include relevant context in step notes
- Verify Each Step: Confirm success before proceeding to next step
- Communicate Issues: Report problems immediately; don’t silently work around them
Post-Execution Actions
- Complete Documentation: Add comprehensive final notes
- Report Discrepancies: Document any differences from expected behavior
- Suggest Improvements: Recommend procedure updates based on experience
- Update Related Incidents: Link execution to incident if applicable
- Share Learnings: Communicate insights to team
Collaboration Guidelines
Effective Teamwork During Execution
For Executors
- Update notes regularly during execution
- Request help when needed; don’t struggle silently
- Respond to team questions and guidance
- Document deviations immediately
- Share screen if remote collaboration needed
For Observers
- Watch without interrupting unless critical
- Provide guidance through comments, not interruptions
- Share relevant documentation or links
- Offer help proactively if executor seems stuck
- Learn from execution for future reference
Communication Standards
- Use professional, clear language
- Document decisions and rationale
- Keep the execution focused on task completion
- Avoid side conversations during execution
- Save detailed discussions for post-execution review
Procedure Maintenance
Regular Review Cycle
Establish a review schedule:
Monthly Review
- Top 10 most-executed procedures
- Procedures with success rate < 90%
- Procedures with execution time variance > 25%
- New procedures added in past month
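The two numeric thresholds in the monthly review can be applied mechanically. In the sketch below, "execution time variance" is interpreted as the relative gap between average and estimated duration, which is an assumption; the field names, the `review_reasons` helper, and the sample figures are hypothetical.

```python
def review_reasons(procedure: dict) -> list:
    """Return the monthly-review criteria that a procedure trips, if any."""
    success_rate = 100 * procedure["successful"] / procedure["executions"]
    # Assumed meaning of "variance": percentage gap between average and estimate.
    deviation = (100 * abs(procedure["avg_minutes"] - procedure["estimated_minutes"])
                 / procedure["estimated_minutes"])
    reasons = []
    if success_rate < 90:
        reasons.append("success rate below 90%")
    if deviation > 25:
        reasons.append("execution time off estimate by more than 25%")
    return reasons


# Invented sample data for three procedures.
procedures = [
    {"name": "Restart PostgreSQL Database Service", "executions": 20,
     "successful": 17, "avg_minutes": 17, "estimated_minutes": 16},
    {"name": "Rotate API Credentials", "executions": 10,
     "successful": 10, "avg_minutes": 30, "estimated_minutes": 20},
    {"name": "Scale Deployment", "executions": 8,
     "successful": 8, "avg_minutes": 10, "estimated_minutes": 10},
]

for procedure in procedures:
    reasons = review_reasons(procedure)
    if reasons:
        print(f"{procedure['name']}: {', '.join(reasons)}")
```

With this data, the first procedure is flagged for its 85% success rate, the second for running 50% over its estimate, and the third is not flagged.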
Quarterly Review
- All active procedures
- Template library updates
- Category reorganization if needed
- Archive obsolete procedures
After Major Changes
- Review all procedures affected by infrastructure changes
- Test procedures in new environment
- Update estimated durations
- Revise verification steps if system behavior changed
Version Control
Track procedure changes systematically:
- Document reason for each change
- Test updated procedure before publishing
- Notify teams of significant procedure updates
- Maintain version history for rollback
- Archive major versions for reference
Procedure Variables
Use variables to make procedures reusable across different contexts:
Defining Variables
Variable Syntax
    {{variable_name}}
Variable Definition
    {
      "variables": [
        {
          "name": "server_hostname",
          "description": "Hostname of the server to connect to",
          "type": "string",
          "required": true,
          "default": "production-server-01"
        },
        {
          "name": "ssh_port",
          "description": "SSH port number",
          "type": "integer",
          "required": false,
          "default": 22
        },
        {
          "name": "environment",
          "description": "Environment name",
          "type": "enum",
          "required": true,
          "options": ["development", "staging", "production"]
        }
      ]
    }
Variable Types
| Type | Description | Example |
|---|---|---|
| string | Text value | "production-server-01" |
| integer | Whole number | 22, 8080, 3306 |
| boolean | True/false | true, false |
| enum | Limited set of options | ["dev", "staging", "prod"] |
| json | JSON object or array | {"key": "value"} |
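The type table suggests a straightforward check of execution-time values against their definitions. The following is a sketch only: `check_value` and `resolve` are hypothetical helpers, and the actual validation Overwatch performs is not documented here.

```python
def check_value(definition: dict, value) -> bool:
    """Check one value against the variable types listed in the table."""
    kind = definition["type"]
    if kind == "string":
        return isinstance(value, str)
    if kind == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "enum":
        return value in definition.get("options", [])
    if kind == "json":
        return isinstance(value, (dict, list))
    raise ValueError(f"unknown variable type: {kind}")


def resolve(definitions: list, provided: dict) -> dict:
    """Merge provided values with defaults and validate the result."""
    resolved = {}
    for definition in definitions:
        name = definition["name"]
        if name in provided:
            value = provided[name]
        elif "default" in definition:
            value = definition["default"]
        elif definition.get("required"):
            raise ValueError(f"missing required variable: {name}")
        else:
            continue
        if not check_value(definition, value):
            raise TypeError(f"{name}: expected {definition['type']}, got {value!r}")
        resolved[name] = value
    return resolved


# The same three definitions as in the Variable Definition example.
definitions = [
    {"name": "server_hostname", "type": "string", "required": True,
     "default": "production-server-01"},
    {"name": "ssh_port", "type": "integer", "required": False, "default": 22},
    {"name": "environment", "type": "enum", "required": True,
     "options": ["development", "staging", "production"]},
]

print(resolve(definitions, {"environment": "staging"}))
```

Because `environment` is required and has no default, omitting it raises an error, while the other two variables fall back to their defaults.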
Using Variables
In Step Descriptions
    {
      "title": "Connect to {{environment}} Server",
      "description": "SSH to {{server_hostname}} on port {{ssh_port}}",
      "commands": [
        "ssh -p {{ssh_port}} admin@{{server_hostname}}"
      ]
    }
Execution-Time Variable Input
- Start procedure execution
- System prompts for variable values
- Provide values for all required variables
- Review pre-filled commands with actual values
- Confirm and start execution
Troubleshooting
Common Issues
Can’t Execute Procedure
- Verify you have Engineer role or higher
- Check procedure is published (not draft)
- Ensure all required variables have values
- Verify procedure is active (not archived)
- Contact administrator if organization quota exceeded
Approval Request Not Received
- Check approver’s email and notification settings
- Verify approver has Manager role or higher
- Confirm approver is in the correct team
- Check spam folder for email notifications
- Use in-app notification if email delayed
Execution Stuck on Step
- Verify step is not waiting for approval
- Check if help was requested (may pause execution)
- Ensure WebSocket connection active (green indicator)
- Try refreshing the page to reconnect
- Contact support if issue persists
Can’t Create Procedure
- Verify you have Manager role or higher
- Check organization procedure limit
- Ensure required fields are completed
- Validate JSON syntax in step definitions
- Contact administrator for quota increase
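For the JSON syntax check in particular, a step definition can be validated locally before it is pasted into the editor. This sketch uses Python's standard `json` module; `check_json` is a hypothetical helper name.

```python
import json
from typing import Optional


def check_json(text: str) -> Optional[str]:
    """Return None if the text is valid JSON, else a description of the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


valid = '{"title": "Restart Database Service", "type": "manual"}'
broken = '{"title": "Restart Database Service", "type": "manual",}'  # trailing comma

print(check_json(valid))   # None
print(check_json(broken))  # reports the position of the syntax error
```

Trailing commas and curly quotes copied from formatted documents are common causes of this failure.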
Template Not Available
- Verify you’re browsing the correct category
- Check search terms for typos
- Ensure templates are published for your organization
- Contact administrator to request specific templates
Next Steps
- Search Features - Use AI-powered search to find relevant procedures
- Procedure Creation Workflow - Detailed guide for creating procedures
- Analytics Dashboard - Track procedure performance and team metrics
- Integration Setup - Connect observability platforms for context
Need Help?
- In-App Help: Press the ? key for keyboard shortcuts and contextual help
- Execution Guidance: Use the help button during execution to request team assistance
- Troubleshooting: See Common Issues for detailed solutions
- Support: Contact your system administrator for organization-specific questions
Last updated: October 2025