
Documentation

Complete guide to installing, configuring, and using RunbookAI for AI-powered incident response.

Installation

Install RunbookAI with a single command:

Terminal
curl -fsSL https://userunbook.ai/install.sh | bash

After installation, restart your shell:

Terminal
exec $SHELL -l

What the install script does

The script installs Bun (if needed), clones the repo to ~/.runbook, builds the CLI, and adds it to your PATH.

Manual Installation

Alternatively, you can install manually:

Terminal
# Clone the repository
git clone https://github.com/Runbook-Agent/RunbookAI.git
cd RunbookAI

# Install dependencies with Bun (recommended)
bun install

# Build the CLI
bun run build

Requirements

Configuration

Initialize your configuration with the setup wizard:

Terminal
runbook init

Demo output (abridged):

Output
$ runbook init
═══════════════════════════════════════════
 Runbook Setup Wizard
═══════════════════════════════════════════
Step 1: Choose your AI provider
Step 2: Enter your API key
...
 Setup Complete!
Configuration complete! Your settings have been saved to .runbook/services.yaml

Full Configuration Reference

Here's a complete configuration example with all available options:

.runbook/config.yaml
llm:
  provider: anthropic
  model: claude-sonnet-4-20250514

providers:
  aws:
    enabled: true
    regions: [us-east-1, us-west-2]
  kubernetes:
    enabled: true

incident:
  pagerduty:
    enabled: true
    apiKey: ${PAGERDUTY_API_KEY}
  opsgenie:
    enabled: false
    apiKey: ${OPSGENIE_API_KEY}
  slack:
    enabled: false
    botToken: ${SLACK_BOT_TOKEN}
    appToken: ${SLACK_APP_TOKEN}
    signingSecret: ${SLACK_SIGNING_SECRET}
    events:
      enabled: false
      mode: socket
      port: 3001
      alertChannels: [C01234567]
      allowedUsers: [U01234567]
      requireThreadedMentions: true

knowledge:
  sources:
    - type: filesystem
      path: .runbook/runbooks/
      watch: true

    # Confluence Cloud/Server
    - type: confluence
      baseUrl: https://mycompany.atlassian.net
      spaceKey: SRE
      labels: [runbook, postmortem]
      auth:
        email: ${CONFLUENCE_EMAIL}
        apiToken: ${CONFLUENCE_API_TOKEN}

    # Google Drive (requires OAuth)
    - type: google_drive
      folderIds: ['your-folder-id']
      clientId: ${GOOGLE_CLIENT_ID}
      clientSecret: ${GOOGLE_CLIENT_SECRET}
      refreshToken: ${GOOGLE_REFRESH_TOKEN}
      includeSubfolders: true
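
The configuration above refers to secrets through `${VAR}` placeholders. How RunbookAI resolves them is not described here, so the following is only a minimal sketch of one way to check, before starting the CLI, that every referenced environment variable is set. The helper names `find_missing_vars` and `expand_placeholders` are hypothetical and not part of RunbookAI.

```python
import os
import re

# Matches ${NAME} placeholders as written in the config example.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_missing_vars(config_text, env=None):
    """Return the sorted names of ${VAR} placeholders with no value in env."""
    env = os.environ if env is None else env
    names = set(PLACEHOLDER.findall(config_text))
    return sorted(name for name in names if not env.get(name))


def expand_placeholders(config_text, env=None):
    """Substitute each ${VAR} with its value; raise if any variable is unset."""
    env = os.environ if env is None else env
    missing = find_missing_vars(config_text, env)
    if missing:
        raise KeyError("unset environment variables: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: env[m.group(1)], config_text)


# Hypothetical two-line excerpt of a config file, checked against a fake env.
sample = "apiKey: ${PAGERDUTY_API_KEY}\nbotToken: ${SLACK_BOT_TOKEN}\n"
print(find_missing_vars(sample, env={"PAGERDUTY_API_KEY": "pd-example"}))
```

Running a check like this in a shell profile or CI step surfaces a missing token early instead of during an incident.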

Your First Query

Test your installation by running a simple infrastructure query:

Terminal
# Ask about your infrastructure
runbook ask "What EC2 instances are running in prod?"

# Check Kubernetes cluster status
runbook ask "Show cluster status and any warning events"

# Get a status overview
runbook status

Tip

RunbookAI runs read-only queries by default. Any mutation requires explicit approval, and a rollback command is provided alongside it.
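
To illustrate the idea of an approval gate, here is a minimal sketch, not RunbookAI's actual implementation. The `Action` type, the `decide` function, and the rule that a mutation without a rollback command is refused are all assumptions made for this example. The function only returns a decision; it does not execute any command.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """A hypothetical unit of work the agent wants to run."""
    command: str
    mutating: bool = False
    rollback: Optional[str] = None  # command that would undo the mutation


def decide(action: Action, approve: Callable[[Action], bool]) -> str:
    """Return a decision string: read-only actions pass, mutations are gated."""
    if not action.mutating:
        return "run"
    if action.rollback is None:
        # Assumption: a change that cannot be undone is never offered.
        return "refused: no rollback command"
    return "run" if approve(action) else "skipped: not approved"


def ask_operator(action: Action) -> bool:
    """Interactive approver that shows the rollback command before asking."""
    print(f"Mutation: {action.command}")
    print(f"Rollback: {action.rollback}")
    return input("Approve? [y/N] ").strip().lower() == "y"


read_only = Action("aws ec2 describe-instances")
scale_up = Action(
    command="kubectl scale deployment checkout-api --replicas=6",
    mutating=True,
    rollback="kubectl scale deployment checkout-api --replicas=3",
)
print(decide(read_only, approve=lambda a: False))  # no approval needed
print(decide(scale_up, approve=lambda a: False))   # operator declined
```

Passing `ask_operator` as the `approve` callback would turn the gate into an interactive prompt.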

Commands

runbook ask

Ask questions about your infrastructure in natural language. The agent will query AWS, Kubernetes, and your knowledge base to provide answers.

Terminal
runbook ask "What's the status of the checkout-api service?"
runbook ask "Show me RDS instances with high CPU"
runbook ask "Who owns the payments service?"
runbook ask "List pods with restart loops in the last hour"

runbook investigate

Perform a hypothesis-driven investigation of a PagerDuty or OpsGenie incident. The agent uses a structured approach to identify root causes.

Terminal
# Basic investigation
runbook investigate PD-12345

# Investigation with auto-remediation
runbook investigate PD-12345 --auto-remediate

The investigation workflow:

  1. Gather Context - Pulls incident details, recent deployments, and relevant runbooks
  2. Form Hypotheses - Creates ranked hypotheses based on symptoms and organizational knowledge
  3. Test Hypotheses - Runs targeted queries against infrastructure to gather evidence
  4. Branch or Prune - Branches deeper on strong evidence, prunes dead ends
  5. Root Cause - Identifies root cause with confidence level
  6. Remediation - Suggests remediation steps with approval gates for mutations
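
Steps 2 through 5 of the workflow can be sketched as a best-first search over hypotheses. This is an illustration under stated assumptions, not RunbookAI's algorithm: the `Hypothesis` type, the evidence score in [0, 1] returned by `test`, and the `strong` and `weak` thresholds are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Hypothesis:
    statement: str
    prior: float  # initial ranking score in [0, 1]
    children: List["Hypothesis"] = field(default_factory=list)  # refinements


def investigate(
    hypotheses: List[Hypothesis],
    test: Callable[[Hypothesis], float],
    strong: float = 0.7,
    weak: float = 0.3,
) -> Optional[Tuple[str, float]]:
    """Return (root cause, confidence), or None if every branch is pruned."""
    # heapq is a min-heap, so scores are negated; the counter breaks ties.
    queue = [(-h.prior, i, h) for i, h in enumerate(hypotheses)]
    heapq.heapify(queue)
    counter = len(queue)
    best = None
    while queue:
        _, _, hyp = heapq.heappop(queue)
        evidence = test(hyp)  # step 3: gather evidence and score it
        if evidence < weak:
            continue  # step 4: prune a dead end
        if evidence >= strong and hyp.children:
            # Step 4: branch deeper, weighting children by the parent's evidence.
            for child in hyp.children:
                heapq.heappush(queue, (-child.prior * evidence, counter, child))
                counter += 1
            continue
        if best is None or evidence > best[1]:
            best = (hyp.statement, evidence)  # step 5: candidate root cause
    return best


pool = Hypothesis("Redis connection pool exhausted", prior=0.8)
redis = Hypothesis("Redis is unhealthy", prior=0.6, children=[pool])
deploy = Hypothesis("Bad deployment", prior=0.5)
scores = {
    "Redis is unhealthy": 0.9,
    "Redis connection pool exhausted": 0.85,
    "Bad deployment": 0.1,
}
print(investigate([redis, deploy], test=lambda h: scores[h.statement]))
```

In this toy run the broad Redis hypothesis has strong evidence, so the search descends to its refinement and reports that as the root cause, while the deployment hypothesis is pruned.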

runbook knowledge

Manage the knowledge base that powers contextual understanding:

Terminal
# Sync knowledge from all configured sources
runbook knowledge sync

# Search the knowledge base
runbook knowledge search "redis connection timeout"

# Authenticate with Google Drive
runbook knowledge auth google

runbook status

Get a quick overview of your infrastructure health across all configured providers.

Terminal
runbook status

runbook slack-gateway

Start the Slack gateway, which responds to @runbookAI mentions in alert channels:

Terminal
# Local development (Socket Mode)
runbook slack-gateway --mode socket

# Production (HTTP Events API)
runbook slack-gateway --mode http --port 3001

Integrations

AWS

RunbookAI provides read-only access to AWS services including EC2, RDS, ECS, EKS, Lambda, CloudWatch, and more.

.runbook/config.yaml
providers:
  aws:
    enabled: true
    regions: [us-east-1, us-west-2, eu-west-1]

Ensure your AWS credentials are configured via:

Available AWS Operations

Kubernetes

Query Kubernetes clusters through your configured kubeconfig, using read-only operations:

.runbook/config.yaml
providers:
  kubernetes:
    enabled: true

Available Kubernetes Operations

PagerDuty

Integrate with PagerDuty to pull incident context during investigations:

.runbook/config.yaml
incident:
  pagerduty:
    enabled: true
    apiKey: ${PAGERDUTY_API_KEY}

Create a read-only API key in PagerDuty under Configuration → API Access Keys.

OpsGenie

Integrate with OpsGenie for alert and incident context:

.runbook/config.yaml
incident:
  opsgenie:
    enabled: true
    apiKey: ${OPSGENIE_API_KEY}

Slack

Enable the Slack gateway to respond to @runbookAI mentions in alert channels:

.runbook/config.yaml
incident:
  slack:
    enabled: true
    botToken: ${SLACK_BOT_TOKEN}
    appToken: ${SLACK_APP_TOKEN}
    signingSecret: ${SLACK_SIGNING_SECRET}
    events:
      enabled: true
      mode: socket          # 'socket' for dev, 'http' for production
      port: 3001
      alertChannels: [C01234567]
      allowedUsers: [U01234567]
      requireThreadedMentions: true

See the Slack Gateway Guide for detailed setup instructions.
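
As a rough illustration of how the `alertChannels`, `allowedUsers`, and `requireThreadedMentions` settings might combine, the sketch below filters an incoming Slack `app_mention` event. The exact semantics of each setting are an assumption here (in particular, that `requireThreadedMentions` means the mention must be inside a thread), and `SlackEventsConfig` and `should_handle` are hypothetical names. The event fields `type`, `channel`, `user`, and `thread_ts` come from Slack's Events API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlackEventsConfig:
    """Mirrors the keys under incident.slack.events in the config example."""
    alert_channels: List[str]
    allowed_users: List[str]
    require_threaded_mentions: bool = True


def should_handle(event: dict, cfg: SlackEventsConfig) -> bool:
    """Decide whether a Slack event passes the configured filters."""
    if event.get("type") != "app_mention":
        return False
    if event.get("channel") not in cfg.alert_channels:
        return False
    if event.get("user") not in cfg.allowed_users:
        return False
    # Slack includes thread_ts only on messages that are part of a thread.
    if cfg.require_threaded_mentions and "thread_ts" not in event:
        return False
    return True


cfg = SlackEventsConfig(alert_channels=["C01234567"], allowed_users=["U01234567"])
threaded = {
    "type": "app_mention",
    "channel": "C01234567",
    "user": "U01234567",
    "ts": "1700000001.000200",
    "thread_ts": "1700000000.000100",
}
top_level = {k: v for k, v in threaded.items() if k != "thread_ts"}
print(should_handle(threaded, cfg), should_handle(top_level, cfg))
```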

Knowledge Sources

Filesystem

Sync runbooks and documentation from local Markdown files:

.runbook/config.yaml
knowledge:
  sources:
    - type: filesystem
      path: .runbook/runbooks/
      watch: true    # Auto-sync on file changes

Confluence

Sync runbooks from Confluence Cloud or Server:

.runbook/config.yaml
knowledge:
  sources:
    - type: confluence
      baseUrl: https://mycompany.atlassian.net
      spaceKey: SRE
      labels: [runbook, postmortem]
      auth:
        email: ${CONFLUENCE_EMAIL}
        apiToken: ${CONFLUENCE_API_TOKEN}

Google Drive

Sync documents from Google Drive folders:

.runbook/config.yaml
knowledge:
  sources:
    - type: google_drive
      folderIds: ['your-folder-id']
      clientId: ${GOOGLE_CLIENT_ID}
      clientSecret: ${GOOGLE_CLIENT_SECRET}
      refreshToken: ${GOOGLE_REFRESH_TOKEN}
      includeSubfolders: true

Run the OAuth authentication flow:

Terminal
# Set up OAuth credentials first
export GOOGLE_CLIENT_ID=your-client-id
export GOOGLE_CLIENT_SECRET=your-client-secret

# Run authentication flow
runbook knowledge auth google

Skills

Skills are step-by-step workflows that can be executed with approval gates. RunbookAI includes built-in skills and supports custom skills.

Built-in Skills

Writing Runbooks

Create Markdown files with frontmatter to help RunbookAI understand your runbooks:

.runbook/runbooks/redis-connection.md
---
type: runbook
services: [checkout-api, cart-service]
symptoms:
  - "Redis connection timeout"
  - "Connection pool exhausted"
severity: sev2
---

# Redis Connection Exhaustion

## Symptoms
- Connection timeouts in checkout-api logs
- Connection pool exhausted errors
- Increased latency in checkout flow

## Quick Diagnosis
1. Check Redis connection count: `redis-cli info clients`
2. Check client memory usage: `redis-cli info memory`
3. Review recent traffic patterns in CloudWatch

## Mitigation Steps
1. Scale Redis cluster (requires approval)
2. Increase connection pool limit in application config
3. Enable connection queuing in the Redis client or connection proxy
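
To show how frontmatter like the example above could be used to match a runbook to an alert, here is a small self-contained sketch. It is not RunbookAI's parser: a real implementation would presumably use a full YAML library, whereas `parse_simple_frontmatter` below is a hypothetical helper that handles only the three shapes seen in the example (scalars, inline lists, and `- item` block lists). The substring matching in `matches` is also an assumption.

```python
def split_frontmatter(text):
    """Split a Markdown document into (frontmatter, body) at the --- lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # no closing delimiter: treat everything as body


def parse_simple_frontmatter(frontmatter):
    """Parse scalars, inline lists [a, b], and '- item' block lists only."""
    data, current_key = {}, None
    for raw in frontmatter.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and current_key is not None:
            data[current_key].append(line[2:].strip().strip('"'))
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value == "":
            data[key] = []  # a block list follows on later lines
            current_key = key
        elif value.startswith("[") and value.endswith("]"):
            data[key] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
            current_key = None
        else:
            data[key] = value.strip('"')
            current_key = None
    return data


def matches(metadata, alert_text):
    """True if any listed symptom appears in the alert text, ignoring case."""
    text = alert_text.lower()
    return any(s.lower() in text for s in metadata.get("symptoms", []))


doc = """---
type: runbook
services: [checkout-api, cart-service]
symptoms:
  - "Redis connection timeout"
  - "Connection pool exhausted"
severity: sev2
---

# Redis Connection Exhaustion
"""
front, body = split_frontmatter(doc)
meta = parse_simple_frontmatter(front)
print(meta["services"], matches(meta, "ERROR redis connection timeout after 5s"))
```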

Frontmatter Fields

Evaluation

RunbookAI includes evaluation tooling to benchmark investigation accuracy against datasets:

Terminal
# Run evaluation with RCAEval fixtures
npm run eval:investigate -- \
  --fixtures examples/evals/rcaeval-fixtures.generated.json \
  --out .runbook/evals/rcaeval-report.json

# Run all benchmark adapters
npm run eval:all -- \
  --out-dir .runbook/evals/all-benchmarks \
  --rcaeval-input examples/evals/rcaeval-input.sample.json \
  --tracerca-input examples/evals/tracerca-input.sample.json

See the Investigation Evaluation Guide for detailed documentation on benchmarking.
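
For intuition about what "investigation accuracy" can mean, the sketch below computes a simple top-1 accuracy over evaluation cases. The report schema produced by the commands above is not documented here, so the `expected` and `predicted` fields and the `top1_accuracy` function are hypothetical; the actual tooling may score results differently.

```python
def top1_accuracy(cases):
    """Fraction of cases whose predicted root cause equals the expected one."""
    if not cases:
        return 0.0
    hits = sum(
        1
        for c in cases
        if c["predicted"].strip().lower() == c["expected"].strip().lower()
    )
    return hits / len(cases)


# Hypothetical evaluation cases, not taken from any real benchmark.
cases = [
    {"expected": "redis pool exhausted", "predicted": "Redis pool exhausted"},
    {"expected": "bad deployment", "predicted": "dns failure"},
]
print(top1_accuracy(cases))
```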

Need Help?

If you have questions or run into issues, please open an issue on GitHub or join the discussions.