RunbookAI investigates production incidents like your best on-call engineer: forms hypotheses, gathers evidence, and drives to root cause with recommended fixes. Built for production ops with approval gates, audit trails, and cross-system context.
The demo walks through the full workflow: ranked hypotheses with confidence scores, full evidence traces, and remediation steps your team can approve in channel.
npx @runbook-agent/runbook demo
Works with your stack
Forms hypotheses, gathers evidence, finds root cause automatically.
Step-by-step execution with approval gates for every mutation.
Natural language queries across AWS, Kubernetes, and CloudWatch.
Indexes runbooks, postmortems, and architecture docs automatically.
Slack, PagerDuty, OpsGenie, Claude Code — all connected.
Every mutation requires approval. Full audit trail. Always.
Self-host a shared knowledge server. One deployment, every engineer connected.
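The approval gate and audit trail described above can be pictured with a small sketch. This is not RunbookAI's implementation. The `ApprovalGate` and `AuditEntry` names, the `ask` callback, and the example actions are all hypothetical, chosen only to show the pattern: no mutation runs without an explicit approval, and every decision is recorded whether it was approved or denied.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class AuditEntry:
    """One record per requested mutation, approved or not."""
    timestamp: str
    action: str
    approved: bool
    approver: str


@dataclass
class ApprovalGate:
    # Hypothetical callback: returns True when a human approves the action.
    # In practice this might be a chat prompt; here it is any callable.
    ask: Callable[[str], bool]
    approver: str = "on-call"
    audit_log: List[AuditEntry] = field(default_factory=list)

    def run(self, action: str, mutate: Callable[[], str]) -> str:
        """Ask for approval, log the decision, then mutate only if approved."""
        approved = self.ask(action)
        self.audit_log.append(
            AuditEntry(
                timestamp=datetime.now(timezone.utc).isoformat(),
                action=action,
                approved=approved,
                approver=self.approver,
            )
        )
        if not approved:
            return "skipped"
        return mutate()


# Illustrative policy: approve restarts, deny everything else.
gate = ApprovalGate(ask=lambda action: action.startswith("restart"))
result = gate.run("restart checkout-api", lambda: "restarted")
denied = gate.run("delete pvc data-0", lambda: "deleted")
```

The decision is written to the log before the mutation runs, so a denied or failed action still leaves a trace.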
An incident fires from PagerDuty, OpsGenie, or a Slack mention
Forms ranked hypotheses from symptoms and organizational knowledge
Runs targeted queries against your infrastructure for evidence
Delivers root cause with confidence scores and remediation steps
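As a rough illustration of the steps above, the sketch below ranks hypotheses by reweighting an initial plausibility with gathered evidence and normalizing the result into confidence scores. The scoring rule is an assumption made for illustration, a simple prior-times-likelihood reweighting, and is not a description of how RunbookAI scores hypotheses. The `Hypothesis` type, the `rank` function, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hypothesis:
    description: str
    prior: float  # initial plausibility from symptoms and past incidents


def rank(
    hypotheses: List[Hypothesis], evidence: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (description, confidence) pairs, highest confidence first.

    `evidence` maps a hypothesis description to a multiplier: above 1.0
    when the gathered evidence supports it, below 1.0 when it contradicts it.
    Hypotheses with no evidence keep their prior weight.
    """
    scores = {
        h.description: h.prior * evidence.get(h.description, 1.0)
        for h in hypotheses
    }
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one hypothesis needs a positive score")
    # Normalize so the confidences sum to 1.
    return sorted(
        ((desc, score / total) for desc, score in scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Made-up example values, for illustration only.
candidates = [
    Hypothesis("db saturation", prior=0.5),
    Hypothesis("bad deploy", prior=0.3),
    Hypothesis("cert expiry", prior=0.2),
]
ranked = rank(candidates, {"bad deploy": 3.0, "db saturation": 0.5})
top_cause, top_confidence = ranked[0]
```

In this example the evidence moves "bad deploy" ahead of "db saturation" even though it started with a lower prior.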
Surface relevant runbooks, known issues, and postmortems inside Claude Code sessions — so operational context is already there when you're debugging.
runbook integrations claude enable
Tell us your role, infra setup, top incident pain, and how to reach you. We use this to prioritize roadmap work.
Open intake form
Follow an opinionated 3-step setup sequence with exact commands and links to the right docs sections.
Open 3-step onboarding
We are onboarding a small number of teams running production workloads on AWS and Kubernetes. Share a few details and we will follow up with a focused setup session.
Run a full incident investigation in minutes, then point it at your own stack. Open source. Self-hosted. No vendor lock-in.