Automate incident investigation with hypothesis-driven AI. RunbookAI diagnoses issues, executes runbooks, and suggests remediations - so your team can focus on shipping, not firefighting.
curl -fsSL https://userunbook.ai/install.sh | bash
Unlike traditional monitoring that alerts on symptoms, RunbookAI thinks like your best engineer - forming hypotheses, gathering evidence, and systematically narrowing down to the root cause.
Pulls incident details from PagerDuty/OpsGenie, queries recent deployments, and searches your knowledge base for relevant runbooks and past incidents.
Creates ranked hypotheses based on symptoms, organizational knowledge, and infrastructure patterns. Each hypothesis gets a probability score.
Runs targeted queries against CloudWatch, Kubernetes, and your infrastructure. Branches deeper on strong evidence, prunes dead ends quickly.
Identifies root cause with a confidence level, suggests prioritized remediation steps, and can auto-execute approved runbooks with safety gates.
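As a rough illustration of the loop described above (rank hypotheses, test them, branch on strong evidence, prune dead ends), here is a minimal Python sketch. It is not RunbookAI's actual implementation: the `Hypothesis` type, the `update_score` rule, and the `prune_below` / `branch_above` thresholds are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """A candidate root cause with a probability-like score (hypothetical type)."""
    name: str
    score: float
    children: list = field(default_factory=list)  # narrower sub-hypotheses


def update_score(prior: float, supports: bool, strength: float = 0.8) -> float:
    """Bayes-style update. `strength` is an assumed probability of seeing
    supporting evidence when the hypothesis is true (and of seeing
    contradicting evidence when it is false)."""
    likelihood_true = strength if supports else 1.0 - strength
    likelihood_false = 1.0 - likelihood_true
    numerator = likelihood_true * prior
    return numerator / (numerator + likelihood_false * (1.0 - prior))


def investigate(hypotheses, evidence, prune_below=0.15, branch_above=0.7):
    """Return surviving hypotheses, highest score first.

    `evidence` maps a hypothesis name to True/False: whether a targeted
    query supported it. Hypotheses without evidence keep their prior.
    """
    survivors = []
    frontier = sorted(hypotheses, key=lambda h: h.score, reverse=True)
    while frontier:
        h = frontier.pop(0)
        if h.name in evidence:
            h.score = update_score(h.score, evidence[h.name])
        if h.score < prune_below:
            continue  # dead end: drop it, and never visit its children
        survivors.append(h)
        if h.score >= branch_above:
            frontier.extend(h.children)  # strong evidence: go one level deeper
    return sorted(survivors, key=lambda h: h.score, reverse=True)
```

In this sketch a hypothesis is only expanded into sub-hypotheses once its score crosses `branch_above`, which keeps the number of infrastructure queries proportional to how promising a lead looks.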
Everything you need for real incident response - from investigation to remediation, with safety built in.
Indexes your runbooks, post-mortems, and architecture docs from Confluence, Google Drive, or local files. Learns from your organization's tribal knowledge and past incidents.
runbook knowledge sync
Every mutation requires explicit approval and is paired with a rollback command. Full audit trail of every action, hypothesis, and decision.
First-class AWS and Kubernetes support. Query EC2, RDS, CloudWatch, ECS, EKS, pods, deployments, and more.
PagerDuty and OpsGenie integration out of the box. Pull incident context, update status, and post findings automatically.
Built-in and custom skills executed step-by-step with approval hooks. Extend with your own workflows and automations.
Mention @runbookAI in your Slack alert channels. Socket Mode for local development, Events API for production.
Deep integration with Claude Code for contextual knowledge during coding sessions. Auto-inject relevant runbooks, known issues, and postmortems based on what you're discussing. MCP server for on-demand queries.
runbook integrations claude enable
Automatically generates runbook updates and postmortems from investigations. Your knowledge base grows with every incident.
See exactly what the agent is thinking. Evidence trails, confidence scores, and clear reasoning for every hypothesis.
Save and resume investigation state across sessions. Never lose context when switching between incidents.
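To make the save-and-resume idea concrete, the sketch below checkpoints an investigation to a JSON file and loads it back. The `InvestigationState` fields, the one-file-per-incident layout, and the function names are assumptions for illustration only; the format RunbookAI actually uses is not described here.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class InvestigationState:
    """Hypothetical snapshot of an in-progress investigation."""
    incident_id: str
    hypotheses: dict = field(default_factory=dict)  # name -> score
    evidence: list = field(default_factory=list)    # free-text findings


def save_state(state: InvestigationState, directory: Path) -> Path:
    """Write the state to <directory>/<incident_id>.json."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{state.incident_id}.json"
    tmp = directory / f"{state.incident_id}.json.tmp"
    # Write to a temp file first, then rename, so a crash mid-write
    # does not leave a truncated checkpoint behind.
    tmp.write_text(json.dumps(asdict(state), indent=2))
    tmp.replace(path)
    return path


def load_state(incident_id: str, directory: Path) -> Optional[InvestigationState]:
    """Return the saved state, or None if no checkpoint exists."""
    path = directory / f"{incident_id}.json"
    if not path.exists():
        return None
    return InvestigationState(**json.loads(path.read_text()))
```

Keeping one checkpoint per incident ID is what would let an on-call engineer switch between incidents without the two investigations overwriting each other.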
From investigating production incidents to answering infrastructure questions - automate the repetitive parts of on-call.
The agent pulls incident context, searches for similar past issues, forms hypotheses, and gathers evidence from your infrastructure automatically.
Ask "What EC2 instances are running in prod?", "Show me pods with restart loops", or "Who owns the payments service?" and get instant answers.
Skills are step-by-step workflows loaded from your runbooks. The agent executes them with approval gates for any mutations.
Get cluster status, list deployments, check node health, view recent events, find resource hogs - all with natural language.
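The approval-gate behaviour mentioned above (read-only steps run freely, mutations wait for a human, every action is logged) might look roughly like the following Python sketch. The `Step` type, the `run_skill` function, and the audit record shape are hypothetical and not RunbookAI's API. The command strings are ordinary `kubectl` invocations used as sample data, and the `execute` callback is injected so nothing is actually run here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One step of a skill (hypothetical type)."""
    description: str
    command: str
    mutates: bool = False           # True if the step changes infrastructure
    rollback: Optional[str] = None  # command that undoes the step


def run_skill(steps, approve: Callable[[Step], bool], execute: Callable[[str], None]):
    """Run steps in order and return an audit trail as a list of dicts.

    Mutating steps must carry a rollback command and be approved;
    otherwise the skill stops before executing them.
    """
    audit = []
    for step in steps:
        record = {"step": step.description, "command": step.command}
        if step.mutates:
            if step.rollback is None:
                audit.append({**record, "status": "blocked_no_rollback"})
                break
            record["rollback"] = step.rollback
            if not approve(step):
                audit.append({**record, "status": "rejected"})
                break
        execute(step.command)
        audit.append({**record, "status": "executed"})
    return audit


# Example skill: inspect first, then restart with a rollback attached.
RESTART_SKILL = [
    Step("List pods", "kubectl get pods -n payments"),
    Step(
        "Restart the deployment",
        "kubectl rollout restart deployment/payments-api -n payments",
        mutates=True,
        rollback="kubectl rollout undo deployment/payments-api -n payments",
    ),
]
```

Passing `approve` and `execute` in as callbacks keeps the gate logic testable: in a dry run, `approve` can always return `False` and `execute` can simply record the command.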
Out-of-the-box integrations with your existing infrastructure, incident management, and knowledge systems.
Get started with RunbookAI in just a few commands. No complex setup required.
Join SRE teams using AI to investigate incidents faster and more consistently. Open source, self-hosted, no vendor lock-in.