Procedure Management

Procedures in Overwatch are executable runbooks that provide step-by-step guidance for common operational tasks. They ensure consistency, capture organizational knowledge, and accelerate incident resolution through structured workflows.

Procedures transform operational knowledge into repeatable, trackable processes:

Key Capabilities

  • Step-by-step instructions with detailed guidance at each stage
  • Real-time execution tracking with progress monitoring and time tracking
  • Variable substitution for dynamic content and customization
  • Approval workflows for sensitive operations requiring authorization
  • Execution history with complete audit trails and outcome tracking

Benefits

  • Reduce resolution time through standardized processes
  • Ensure consistency across team members and incidents
  • Capture and share tribal knowledge systematically
  • Track execution success rates and identify optimization opportunities
  • Enable new team members to resolve issues confidently

There are three ways to create a procedure:

From Dashboard

Dashboard → "Create Procedure" button

From Procedures Page

Procedures → "New Procedure" button

From Template Library

Procedures → Templates → Select Template → Customize

When creating a procedure, configure the following settings:

Basic Information

  • Name: Clear, actionable procedure name that describes the task
    • ✅ Good: “Restart PostgreSQL Database Service”
    • ❌ Bad: “Database procedure”
  • Description: Detailed explanation of the procedure's purpose, including:
    • When to use this procedure
    • Expected outcomes
    • Prerequisites or requirements
  • Category: Organization taxonomy for grouping similar procedures
    • Examples: “Database”, “Network”, “Application”, “Security”, “Infrastructure”
  • Tags: Keywords for search and organization
    • Use consistent terminology across procedures
    • Include technology names (e.g., “postgresql”, “kubernetes”)

Workflow Settings

  • Approval Required: Whether procedure execution requires manager approval
    • Use for sensitive operations (database restarts, production deployments)
    • Approval requests sent to designated approvers
    • Execution blocked until approval granted
  • Estimated Duration: Expected execution time
    • Helps with planning and prioritization
    • Used for resource allocation
    • Compared against actual execution time for optimization
  • Timeout: Maximum allowed execution time before automatic failure
  • Rollback Capability: Whether procedure can be rolled back if it fails

Visibility and Access

  • Organization-wide: Available to all team members in your organization
  • Team-specific: Limited to specific teams
  • Private: Only visible to creator and designated collaborators

Procedures consist of ordered steps that guide execution:

Step Structure

{
  "steps": [
    {
      "title": "Verify Database Connection",
      "description": "Check database connectivity before proceeding with restart",
      "type": "verification",
      "estimated_duration": "2 minutes",
      "approval_required": false,
      "verification": "Database responds to ping and basic queries",
      "error_handling": "If connection fails, check network and database logs"
    },
    {
      "title": "Stop Application Services",
      "description": "Gracefully stop all services connecting to the database",
      "type": "manual",
      "estimated_duration": "3 minutes",
      "approval_required": false,
      "commands": [
        "sudo systemctl stop api-service",
        "sudo systemctl stop worker-service"
      ]
    },
    {
      "title": "Restart Database Service",
      "description": "Perform the database service restart with confirmation",
      "type": "manual",
      "estimated_duration": "5 minutes",
      "approval_required": true,
      "commands": [
        "sudo systemctl restart postgresql"
      ],
      "verification": "Run 'systemctl status postgresql' and confirm it shows active (running)"
    },
    {
      "title": "Verify Database Health",
      "description": "Confirm the database accepts connections and responds normally before bringing applications back online",
      "type": "verification",
      "estimated_duration": "3 minutes",
      "approval_required": false,
      "verification": "Run health check queries and verify normal response times"
    },
    {
      "title": "Restart Application Services",
      "description": "Bring application services back online",
      "type": "manual",
      "estimated_duration": "3 minutes",
      "approval_required": false,
      "commands": [
        "sudo systemctl start api-service",
        "sudo systemctl start worker-service"
      ]
    }
  ]
}

Step Types

  • manual: Requires human execution and confirmation
  • automated: Can be executed automatically (future capability)
  • verification: Check-only steps to confirm system state
  • decision: Conditional steps based on previous outcomes

Step Components

| Field | Required | Description |
|---|---|---|
| title | Yes | Short, descriptive step name |
| description | Yes | Detailed instructions for executing the step |
| type | Yes | Step type (manual, automated, verification, decision) |
| estimated_duration | No | Expected time to complete step |
| approval_required | No | Whether step requires approval before execution |
| commands | No | Shell commands or API calls to execute |
| verification | No | How to verify step succeeded |
| error_handling | No | What to do if step fails |
| rollback_steps | No | Steps to undo this operation if needed |
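Because `title`, `description`, and `type` are required, a step definition can be checked before the procedure is saved. The sketch below is a hypothetical client-side check, not Overwatch's own validator: `validate_step` and `validate_procedure` are illustrative names, and only the fields and step types documented on this page are checked.

```python
from typing import Any, Dict, List

# Step types and required fields as documented on this page.
STEP_TYPES = {"manual", "automated", "verification", "decision"}
REQUIRED_FIELDS = ("title", "description", "type")


def validate_step(step: Dict[str, Any]) -> List[str]:
    """Return the problems found in one step definition (empty list if valid)."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = step.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing required field: {name}")
    step_type = step.get("type")
    if isinstance(step_type, str) and step_type.strip() and step_type not in STEP_TYPES:
        problems.append(f"unknown step type: {step_type}")
    if not isinstance(step.get("approval_required", False), bool):
        problems.append("approval_required must be true or false")
    commands = step.get("commands", [])
    if not isinstance(commands, list) or not all(isinstance(c, str) for c in commands):
        problems.append("commands must be a list of strings")
    return problems


def validate_procedure(procedure: Dict[str, Any]) -> Dict[int, List[str]]:
    """Map each invalid step's 1-based position to its list of problems."""
    report = {}
    for position, step in enumerate(procedure.get("steps", []), start=1):
        problems = validate_step(step)
        if problems:
            report[position] = problems
    return report
```

An empty report from `validate_procedure` means every step passed these basic checks; it says nothing about whether the commands themselves are correct.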

Variable Substitution

Use variables for dynamic content that changes per execution:

{
  "title": "Connect to Server",
  "description": "SSH to {{server_hostname}} on port {{ssh_port}}",
  "commands": [
    "ssh -p {{ssh_port}} admin@{{server_hostname}}"
  ]
}

Where Variables Are Defined

  • During procedure creation: Define variable names and default values
  • During execution: Users provide actual values for the execution context
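Substitution itself is plain text replacement. The Python sketch below shows one way to fill `{{name}}` placeholders in a step; `render` and `render_step` are hypothetical helpers, defaults are assumed to be merged into `values` beforehand, and refusing to render when a value is missing is an assumed behavior that keeps a half-filled command from reaching the executor.

```python
import re
from typing import Any, Dict

# Matches {{variable_name}}; whitespace inside the braces is tolerated.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(text: str, values: Dict[str, Any]) -> str:
    """Replace each {{name}} in text with str(values[name]).

    Raises KeyError naming every placeholder that has no value.
    """
    missing = sorted({m.group(1) for m in PLACEHOLDER.finditer(text)} - set(values))
    if missing:
        raise KeyError(f"no value for: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), text)


def render_step(step: Dict[str, Any], values: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of step with placeholders filled in its text fields and commands."""
    rendered = dict(step)  # the original step definition is left untouched
    for key in ("title", "description", "verification"):
        if isinstance(step.get(key), str):
            rendered[key] = render(step[key], values)
    if "commands" in step:
        rendered["commands"] = [render(cmd, values) for cmd in step["commands"]]
    return rendered
```

With `server_hostname = "db-01"` and `ssh_port = 2222`, the command in the example above renders as `ssh -p 2222 admin@db-01`.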

Conditional Steps

Define steps that only execute under certain conditions:

{
  "title": "Rollback Deployment",
  "description": "Revert to previous version",
  "condition": "{{deployment_success}} == false",
  "type": "manual"
}
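The page does not define the condition grammar. Assuming conditions are simple `==`/`!=` comparisons evaluated after variable substitution (so `{{deployment_success}} == false` becomes, for example, `false == false`), a runner could decide whether to show a conditional step as sketched below. `should_run` is a hypothetical helper; it deliberately parses the expression instead of calling `eval()`, because condition text is built from user-supplied values.

```python
import re
from typing import Optional

# Hypothetical grammar: "<left> == <right>" or "<left> != <right>", evaluated
# after {{...}} placeholders have been replaced with literal values.
CONDITION = re.compile(r"^\s*(.+?)\s*(==|!=)\s*(.+?)\s*$")


def _normalize(token: str) -> str:
    """Strip optional quotes and compare true/false case-insensitively."""
    token = token.strip().strip("'\"")
    return token.lower() if token.lower() in ("true", "false") else token


def should_run(condition: Optional[str]) -> bool:
    """Return True if a step with this (already substituted) condition applies."""
    if condition is None:  # steps without a condition always run
        return True
    match = CONDITION.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    left, operator, right = match.groups()
    equal = _normalize(left) == _normalize(right)
    return equal if operator == "==" else not equal
```

Rejecting anything outside this small grammar with a `ValueError` surfaces a malformed condition when the procedure is authored instead of silently skipping a rollback step during an incident.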

Parallel Step Groups

Group steps that can be executed in parallel for efficiency:

{
  "parallel_group": [
    {
      "title": "Check API Health",
      "description": "Verify API endpoints responding"
    },
    {
      "title": "Check Database Health",
      "description": "Verify database queries working"
    },
    {
      "title": "Check Cache Health",
      "description": "Verify Redis cache responding"
    }
  ]
}
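Parallel groups pay off when member steps are independent of each other. Steps are currently manual and automated execution is listed as a future capability, so the sketch below is hypothetical: it runs stand-in check functions concurrently with Python's standard `concurrent.futures` and treats the group as successful only when every member passes. `run_parallel_group` and `group_passed` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_parallel_group(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run independent checks concurrently and collect pass/fail per step title.

    A check that raises is recorded as a failure instead of aborting the group.
    """
    def safe(check: Callable[[], bool]) -> bool:
        try:
            return bool(check())
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {title: pool.submit(safe, check) for title, check in checks.items()}
        return {title: future.result() for title, future in futures.items()}


def group_passed(results: Dict[str, bool]) -> bool:
    """The group as a whole succeeds only when every member step succeeded."""
    return all(results.values())
```

Recording each member's result separately, rather than stopping at the first failure, matches how the group is presented: the executor sees which of the API, database, and cache checks failed.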

A procedure can be executed from two places:

From Procedure Detail Page

Procedures → Select Procedure → "Execute Procedure" button

From Incident Page

Incident Detail → AI Suggestions → Select Procedure → "Execute"

Execution Setup

  1. Review procedure overview and estimated duration
  2. Provide values for any required variables
  3. Review approval requirements and approvers
  4. Confirm execution start
  5. Execution begins and tracking starts

Execution Interface

The execution interface provides real-time guidance:

Current Step Display

  • Step number and title prominently displayed
  • Detailed instructions and commands
  • Verification criteria clearly stated
  • Error handling guidance visible
  • Estimated time remaining for current step

Progress Tracking

  • Visual progress bar showing overall completion
  • Steps completed vs total steps
  • Elapsed time and estimated remaining time
  • Step-by-step completion history

Execution Actions

  • Complete Step: Mark current step as successfully completed
  • Add Notes: Add observations or context for this step
  • Report Issue: Flag problems or deviations from expected behavior
  • Request Help: Notify team members for assistance
  • Pause Execution: Temporarily pause for investigation or break

Some steps require approval before execution:

Approval Request Flow

  1. Executor reaches step requiring approval
  2. System sends approval request to designated approvers
  3. Execution pauses and displays “Awaiting Approval” status
  4. Approver receives notification (email, Slack, in-app)
  5. Approver reviews step details and context
  6. Approver approves or rejects with reason
  7. Execution resumes (if approved) or stops (if rejected)

Approval Notifications

  • In-app notifications with direct link to approval request
  • Email notifications with execution context
  • Slack/Teams messages (if integration enabled)
  • Mobile push notifications (if mobile app available)

Who Can Approve

  • Designated approvers configured per procedure
  • Managers with appropriate permissions
  • On-call incident commanders (for critical incidents)
  • Never the executor: the person running the procedure cannot approve their own request
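The approval flow above is essentially a small state machine with one invariant: the executor can never approve their own request. A minimal Python sketch follows; `ApprovalRequest` is a hypothetical model that reduces "who can approve" to an explicit set of designated approvers (manager permissions and on-call incident commanders are left out), and it assumes a reason is mandatory only when rejecting.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class ApprovalState(Enum):
    AWAITING = "awaiting_approval"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ApprovalRequest:
    """Hypothetical model of one step-level approval request."""
    step_title: str
    executor: str
    approvers: Set[str]  # designated approvers for this procedure
    state: ApprovalState = ApprovalState.AWAITING
    decided_by: Optional[str] = None
    reason: Optional[str] = None

    def decide(self, user: str, approve: bool, reason: str = "") -> None:
        if self.state is not ApprovalState.AWAITING:
            raise ValueError("request already decided")
        if user == self.executor:
            raise PermissionError("executors cannot approve their own steps")
        if user not in self.approvers:
            raise PermissionError(f"{user} is not a designated approver")
        if not approve and not reason:
            raise ValueError("a rejection must include a reason")
        self.state = ApprovalState.APPROVED if approve else ApprovalState.REJECTED
        self.decided_by, self.reason = user, reason or None

    @property
    def execution_may_resume(self) -> bool:
        """Execution resumes only after approval; a rejection stops it."""
        return self.state is ApprovalState.APPROVED
```

The executor check runs before the approver-list check, so the rule holds even when the executor also appears among the designated approvers.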

Live Execution Updates

Execution progress updates in real time over a WebSocket connection:

For Executors

  • Step completion confirmation
  • Approval status updates
  • Help requests acknowledged
  • Team observations appear live

For Observers

  • Team members can observe execution progress without interfering
  • See which step is currently being executed
  • View executor’s notes and observations in real-time
  • Monitor execution time and progress
  • Receive notifications when execution completes

Collaboration During Execution

  • Add comments visible to executor in real-time
  • @mention executor for urgent information
  • Share relevant links or documentation
  • Provide guidance without interrupting workflow

Successful Completion

  1. Complete all steps successfully
  2. Mark execution as successful
  3. Add final notes and observations
  4. Document any deviations from procedure
  5. Review execution summary and metrics

Execution Summary Includes

  • Total execution time vs estimated time
  • Each step completion time
  • Notes and observations from all steps
  • Any deviations or issues encountered
  • Verification results
  • Approvals received
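The "total execution time vs estimated time" comparison reduces to summing durations. The page shows estimates as free-text strings such as "2 minutes"; the parser below assumes that `<number> <second|minute|hour>` shape, and `summarize` is a hypothetical helper rather than an Overwatch API.

```python
import re
from datetime import timedelta
from typing import Any, Dict, List

# Assumed format for estimated_duration: "<number> <second|minute|hour>[s]".
_DURATION = re.compile(r"^\s*(\d+)\s*(second|minute|hour)s?\s*$", re.IGNORECASE)


def parse_duration(text: str) -> timedelta:
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    amount, unit = int(match.group(1)), match.group(2).lower()
    return timedelta(**{unit + "s": amount})


def summarize(steps: List[Dict[str, Any]], actual: List[timedelta]) -> Dict[str, Any]:
    """Compare per-step actual times with the steps' estimated_duration values."""
    if not steps or len(steps) != len(actual):
        raise ValueError("need one actual duration per step")
    estimated = sum((parse_duration(s["estimated_duration"]) for s in steps), timedelta())
    total = sum(actual, timedelta())
    slowest = max(range(len(steps)), key=lambda i: actual[i])
    return {
        "estimated": estimated,
        "actual": total,
        "over_by": total - estimated,  # negative when the run finished early
        "slowest_step": steps[slowest]["title"],
    }
```

For the PostgreSQL restart example earlier on this page, the step estimates sum to 16 minutes.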

Failed Execution

  1. Identify which step failed and why
  2. Mark execution as failed
  3. Document the failure reason in detail
  4. Note any rollback actions taken
  5. Optionally create incident for investigation

Paused Execution

  • Execution can be resumed later from pause point
  • Executor or managers can pause execution
  • Paused executions appear in “In Progress” list
  • Automatic timeout after 24 hours without activity
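The 24-hour rule is a comparison against the last recorded activity. This sketch assumes timezone-aware UTC timestamps and treats exactly 24 hours of inactivity as expired; both are assumptions, and `pause_deadline` and `pause_expired` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Inactivity window for paused executions, as documented above.
PAUSE_TIMEOUT = timedelta(hours=24)


def pause_deadline(last_activity: datetime) -> datetime:
    """When a paused execution will time out if nobody touches it."""
    return last_activity + PAUSE_TIMEOUT


def pause_expired(last_activity: datetime, now: Optional[datetime] = None) -> bool:
    """True once 24 hours have passed since the last activity (UTC-aware times)."""
    now = now if now is not None else datetime.now(timezone.utc)
    return now >= pause_deadline(last_activity)
```

Showing `pause_deadline` in the "In Progress" list would tell a returning executor how long they have to resume.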

The template library provides standardized procedures for common scenarios:

Access Template Library

Procedures → Templates → Browse by Category

Template to Procedure Flow

  1. Browse templates by category or search
  2. Select relevant template
  3. Review template steps and configuration
  4. Click “Use This Template”
  5. Customize for your environment
  6. Save as new procedure in your organization

Template Customization

  • Modify step descriptions for your environment
  • Add or remove steps as needed
  • Adjust estimated durations based on your systems
  • Configure approval requirements
  • Add organization-specific verification steps

Common template categories available:

| Category | Example Templates | Use Cases |
|---|---|---|
| Database Operations | PostgreSQL restart, MySQL backup/restore, MongoDB replication setup | Database maintenance, backup operations, failover procedures |
| Network Troubleshooting | Network connectivity diagnosis, DNS resolution issues, firewall rule updates | Network incidents, connectivity problems, security updates |
| Application Deployment | Blue-green deployment, canary release, rollback procedure | Release management, deployment automation, incident recovery |
| Security Response | Incident response, access revocation, credential rotation | Security incidents, compliance requirements, access management |
| Infrastructure Management | Server provisioning, container orchestration, cloud resource scaling | Infrastructure operations, capacity management, cloud operations |
| Kubernetes Operations | Pod restart, deployment scaling, persistent volume recovery | Container orchestration, cloud-native operations |
| Monitoring & Alerting | Alert configuration, dashboard creation, metric correlation | Observability setup, monitoring improvements |
| Backup & Recovery | Database backup verification, disaster recovery test, snapshot creation | Data protection, business continuity, compliance |

Share successful procedures as templates for the community:

Template Creation (Manager/Admin role required)

Procedures → Select Procedure → "Save as Template"

Template Best Practices

  • Generalize environment-specific details using variables
  • Include comprehensive error handling guidance
  • Add verification steps for each critical operation
  • Document prerequisites and dependencies
  • Provide rollback steps where applicable
  • Use clear, unambiguous language
  • Test template with different users before publishing

Template Contribution

  • Successfully executed procedures become template candidates
  • High success rate procedures prioritized for templates
  • Community voting on template quality and usefulness
  • Template improvements based on execution feedback

Track all executions of a procedure:

Access Execution History

Procedures → Select Procedure → "Execution History" tab

Execution Record Includes

  • Execution date and time
  • Executor name and role
  • Execution duration (actual vs estimated)
  • Success or failure status
  • Notes and observations
  • Deviations from standard procedure
  • Linked incidents (if executed during incident response)
  • Approvals received (who approved, when)

Filtering Options

  • By date range
  • By executor
  • By success/failure status
  • By execution duration
  • By related incidents

Individual Procedure Metrics

For each procedure, track:

Success Rate

  • Total executions
  • Successful executions
  • Failed executions
  • Success rate percentage
  • Trends over time

Execution Time

  • Average execution time
  • Minimum and maximum execution times
  • Comparison to estimated duration
  • Trends showing improvement or degradation
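The success-rate and execution-time figures are simple aggregates over execution records. The sketch assumes a hypothetical `Execution` record holding an outcome and an actual duration in minutes; Overwatch's own record format is not documented here, and trends over time are left out.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Dict, List


@dataclass
class Execution:
    """Hypothetical execution record: outcome plus actual duration in minutes."""
    succeeded: bool
    minutes: float


def procedure_metrics(runs: List[Execution], estimated_minutes: float) -> Dict[str, Any]:
    """Success-rate and execution-time metrics for one procedure."""
    if not runs:
        return {"total": 0}
    times = [r.minutes for r in runs]
    successes = sum(r.succeeded for r in runs)
    return {
        "total": len(runs),
        "successful": successes,
        "failed": len(runs) - successes,
        "success_rate_pct": round(100 * successes / len(runs), 1),
        "avg_minutes": mean(times),
        "min_minutes": min(times),
        "max_minutes": max(times),
        # Positive means runs take longer than the estimate on average.
        "avg_vs_estimate_minutes": mean(times) - estimated_minutes,
    }
```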

Usage Statistics

  • Total number of executions
  • Executions per month/week
  • Most frequent executors
  • Peak usage times

Failure Analysis

  • Common failure points (which steps)
  • Failure reasons and patterns
  • Time to recover from failures
  • Improvement opportunities
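Finding common failure points amounts to counting where failed runs stopped. A sketch, assuming each failed execution records the title of the step it failed at (or `None` when the failure is not tied to a step); `common_failure_points` is a hypothetical helper built on Python's standard `collections.Counter`.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def common_failure_points(
    failed_steps: Iterable[Optional[str]], top: int = 3
) -> List[Tuple[str, int]]:
    """Rank step titles by how many failed executions stopped at them."""
    counts = Counter(step for step in failed_steps if step is not None)
    return counts.most_common(top)
```

A step that dominates this ranking is usually the first candidate for clearer instructions, better error handling guidance, or an extra verification step.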

Access team-wide procedure metrics:

Dashboard → Analytics → Procedures

Key Team Metrics

Most Executed Procedures

  • Ranking by execution frequency
  • Success rates for each
  • Total time saved through automation
  • Optimization opportunities

Team Performance

  • Average execution time by team member
  • Success rates by team member
  • Procedures executed per person
  • Training needs identification

Efficiency Metrics

  • Average time saved vs manual resolution
  • Execution time trends (improving vs degrading)
  • Approval bottlenecks
  • Step-level time analysis

Procedure Effectiveness

  • Which procedures consistently succeed
  • Which procedures need revision
  • Where steps frequently fail
  • Approval delays and patterns

Identify Optimization Opportunities

  • Procedures with high failure rates need revision
  • Steps that take longer than estimated need adjustment
  • Frequently executed procedures benefit from automation
  • Approval bottlenecks indicate process issues

Continuous Improvement Process

  1. Review procedure analytics monthly
  2. Identify top 5 most-executed procedures
  3. Analyze execution patterns and failure points
  4. Update procedures based on execution feedback
  5. Test improvements and measure impact
  6. Share successful improvements as templates

Knowledge Capture

  • Execution notes become searchable knowledge
  • Successful executions improve AI suggestions
  • Team expertise captured in execution history
  • New procedures created from recurring manual steps

Clear Step Descriptions

  • Each step should be unambiguous and actionable
  • Use imperative language (“Restart the service”, not “The service should be restarted”)
  • Include specific commands or API calls where applicable
  • Provide expected output or success criteria

Good Example

Title: Restart Nginx Web Server
Description: Execute 'sudo systemctl restart nginx' and verify service is active
Verification: Run 'systemctl status nginx' - should show "active (running)"

Bad Example

Title: Fix Nginx
Description: Make nginx work again
Verification: Check if it works

Appropriate Granularity

Balance detail with usability:

Too Granular (avoid)

1. Open terminal
2. Type 'cd'
3. Press Enter
4. Type '/var/log'
5. Press Enter

Appropriate Granularity (preferred)

1. Navigate to log directory: cd /var/log
2. Check recent error logs: tail -100 application.log

Include Verification Steps

Every critical operation needs verification:

{
  "title": "Update DNS Records",
  "description": "Update A record to point to new IP address",
  "commands": ["aws route53 change-resource-record-sets ..."],
  "verification": "Run 'dig domain.com' and confirm IP address is updated",
  "expected_result": "New IP address appears in dig output within 5 minutes"
}
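Verifications with a time allowance, such as "within 5 minutes" above, are easier to follow as a bounded retry loop than as a single check. The `wait_until` helper below is a hypothetical sketch. `dns_points_to` uses Python's standard `socket.gethostbyname`, which returns a single IPv4 address from the local resolver, so its answer reflects that resolver and any caches in front of it.

```python
import socket
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float = 10.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Re-run a verification check until it passes or timeout_s elapses.

    Returns True as soon as check() passes, False once the deadline is reached.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(interval_s, remaining))


def dns_points_to(hostname: str, expected_ip: str) -> bool:
    """True if the local resolver currently returns expected_ip for hostname."""
    try:
        return socket.gethostbyname(hostname) == expected_ip
    except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
        return False
```

For the DNS step above, `wait_until(lambda: dns_points_to("domain.com", "203.0.113.10"), timeout_s=300)` would keep checking for up to five minutes; `203.0.113.10` is a documentation address standing in for the new IP.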

Error Handling Guidance

Anticipate common errors and provide guidance:

{
  "title": "Database Connection Test",
  "description": "Verify database connectivity",
  "commands": ["psql -h localhost -U postgres -c 'SELECT 1'"],
  "error_handling": {
    "Connection refused": "Check if PostgreSQL is running: systemctl status postgresql",
    "Authentication failed": "Verify credentials in .pgpass file",
    "Timeout": "Check network connectivity and firewall rules"
  }
}
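When `error_handling` maps error text to advice, as in this example, the matching entry can be looked up from the failed command's output. A hypothetical sketch, assuming case-insensitive substring matching where the first matching key wins; it also accepts the plain-string form used in the step structure example earlier.

```python
from typing import Dict, Optional, Union


def guidance_for(error_output: str,
                 error_handling: Union[str, Dict[str, str]]) -> Optional[str]:
    """Pick the advice whose key appears in the command's error output.

    A plain string is returned as-is (one piece of advice for any failure).
    Returns None when no key matches, so the UI can fall back to "Request Help".
    """
    if isinstance(error_handling, str):
        return error_handling
    lowered = error_output.lower()
    for pattern, advice in error_handling.items():
        if pattern.lower() in lowered:
            return advice
    return None
```

Keys should therefore be short, distinctive fragments of the real error text; with this matching rule, "Authentication failed" also matches a message such as "password authentication failed for user".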

Regular Testing

Maintain procedure accuracy:

  • Execute procedures regularly (at least quarterly)
  • Update procedures when systems change
  • Version control procedure changes
  • Archive obsolete procedures rather than deleting
  • Test after major infrastructure changes

Pre-Execution Preparation

Before starting execution:

  1. Read Completely First: Review all steps before starting
  2. Understand Prerequisites: Ensure all requirements are met
  3. Check Timing: Verify this is an appropriate time to execute
  4. Notify Stakeholders: Inform relevant parties of planned execution
  5. Prepare Rollback: Confirm the rollback steps are available and understood before starting

During Execution

  1. Follow Order: Execute steps in the defined sequence unless instructed otherwise
  2. Document Deviations: Note any deviations or unexpected results immediately
  3. Add Context: Include relevant context in step notes
  4. Verify Each Step: Confirm success before proceeding to next step
  5. Communicate Issues: Report problems immediately rather than silently working around them

Post-Execution Actions

  1. Complete Documentation: Add comprehensive final notes
  2. Report Discrepancies: Document any differences from expected behavior
  3. Suggest Improvements: Recommend procedure updates based on experience
  4. Update Related Incidents: Link execution to incident if applicable
  5. Share Learnings: Communicate insights to team

Effective Teamwork During Execution

For Executors

  • Update notes regularly during execution
  • Request help when needed rather than struggling silently
  • Respond to team questions and guidance
  • Document deviations immediately
  • Share screen if remote collaboration needed

For Observers

  • Watch without interrupting unless critical
  • Provide guidance through comments, not interruptions
  • Share relevant documentation or links
  • Offer help proactively if executor seems stuck
  • Learn from execution for future reference

Communication Standards

  • Use professional, clear language
  • Document decisions and rationale
  • Keep the conversation focused on completing the task
  • Avoid side conversations during execution
  • Save detailed discussions for post-execution review

Regular Review Cycle

Establish a review schedule:

Monthly Review

  • Top 10 most-executed procedures
  • Procedures with success rate < 90%
  • Procedures with execution time variance > 25%
  • New procedures added in past month

Quarterly Review

  • All active procedures
  • Template library updates
  • Category reorganization if needed
  • Archive obsolete procedures

After Major Changes

  • Review all procedures affected by infrastructure changes
  • Test procedures in new environment
  • Update estimated durations
  • Revise verification steps if system behavior changed

Version Control

Track procedure changes systematically:

  • Document reason for each change
  • Test updated procedure before publishing
  • Notify teams of significant procedure updates
  • Maintain version history for rollback
  • Archive major versions for reference

Use variables to make procedures reusable across different contexts:

Variable Syntax

{{variable_name}}

Variable Definition

{
  "variables": [
    {
      "name": "server_hostname",
      "description": "Hostname of the server to connect to",
      "type": "string",
      "required": true,
      "default": "production-server-01"
    },
    {
      "name": "ssh_port",
      "description": "SSH port number",
      "type": "integer",
      "required": false,
      "default": 22
    },
    {
      "name": "environment",
      "description": "Environment name",
      "type": "enum",
      "required": true,
      "options": ["development", "staging", "production"]
    }
  ]
}

Variable Types

| Type | Description | Example |
|---|---|---|
| string | Text value | "production-server-01" |
| integer | Whole number | 22, 8080, 3306 |
| boolean | True/false | true, false |
| enum | Limited set of options | ["dev", "staging", "prod"] |
| json | JSON object or array | {"key": "value"} |
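At execution time the values a user enters arrive as text, so each one has to be checked against its declared type before substitution. The sketch below shows one way to do that for the five types in the table, using Python's standard `json` module for `json` values. `coerce` and `resolve_variables` are hypothetical helpers, and accepting "true"/"false" case-insensitively for booleans is an assumption.

```python
import json
from typing import Any, Dict, List


def coerce(var: Dict[str, Any], raw: Any) -> Any:
    """Convert one supplied value to the variable's declared type."""
    kind, name = var["type"], var["name"]
    if kind == "string":
        return str(raw)
    if kind == "integer":
        if isinstance(raw, int) and not isinstance(raw, bool):
            return raw
        if isinstance(raw, str) and raw.strip().lstrip("-").isdecimal():
            return int(raw)
        raise ValueError(f"{name}: expected a whole number")
    if kind == "boolean":
        if isinstance(raw, bool):
            return raw
        text = str(raw).strip().lower()
        if text in ("true", "false"):
            return text == "true"
        raise ValueError(f"{name}: expected true or false")
    if kind == "enum":
        if raw not in var["options"]:
            raise ValueError(f"{name}: must be one of {var['options']}")
        return raw
    if kind == "json":
        return json.loads(raw) if isinstance(raw, str) else raw
    raise ValueError(f"{name}: unknown type {kind!r}")


def resolve_variables(definitions: List[Dict[str, Any]],
                      provided: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user input with defaults, enforce `required`, and coerce types."""
    values, missing = {}, []
    for var in definitions:
        name = var["name"]
        if name in provided:
            values[name] = coerce(var, provided[name])
        elif "default" in var:
            values[name] = coerce(var, var["default"])
        elif var.get("required", False):
            missing.append(name)
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    return values
```

With the definitions above, supplying only `environment = "staging"` and `ssh_port = "2222"` yields the default hostname, the integer port 2222, and the chosen environment; omitting `environment` fails before execution starts.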

Using Variables in Steps

{
  "title": "Connect to {{environment}} Server",
  "description": "SSH to {{server_hostname}} on port {{ssh_port}}",
  "commands": [
    "ssh -p {{ssh_port}} admin@{{server_hostname}}"
  ]
}

Execution-Time Variable Input

  1. Start procedure execution
  2. System prompts for variable values
  3. Provide values for all required variables
  4. Review pre-filled commands with actual values
  5. Confirm and start execution

Can’t Execute Procedure

  • Verify you have Engineer role or higher
  • Check procedure is published (not draft)
  • Ensure all required variables have values
  • Verify procedure is active (not archived)
  • Contact administrator if organization quota exceeded

Approval Request Not Received

  • Check approver’s email and notification settings
  • Verify approver has Manager role or higher
  • Confirm approver is in the correct team
  • Check spam folder for email notifications
  • Use in-app notification if email delayed

Execution Stuck on Step

  • Verify step is not waiting for approval
  • Check if help was requested (may pause execution)
  • Ensure the WebSocket connection is active (green indicator)
  • Try refreshing the page to reconnect
  • Contact support if issue persists

Can’t Create Procedure

  • Verify you have Manager role or higher
  • Check organization procedure limit
  • Ensure required fields are completed
  • Validate JSON syntax in step definitions
  • Contact administrator for quota increase

Template Not Available

  • Verify you’re browsing the correct category
  • Check search terms for typos
  • Ensure templates are published for your organization
  • Contact administrator to request specific templates

Getting Help

  • In-App Help: Press the ? key for keyboard shortcuts and contextual help
  • Execution Guidance: Use the help button during execution to request team assistance
  • Troubleshooting: See Common Issues for detailed solutions
  • Support: Contact your system administrator for organization-specific questions

Last updated: October 2025