Ship AI features you can put your name on.
The templates, evals, and launch gates AI PMs use to turn working demos into production calls they own.
Vibe-coding gets you to a demo fast. This playbook helps you make the harder call: whether that demo should become a product, where humans need to stay in the loop, what evidence would make it safe to ship, and when to say "not yet."
Built for PMs working with LLMs, agents, copilots, RAG, and workflow automation.
If useful, a GitHub star helps me know this is worth maintaining.
New here? Start with "A week with the AI PM Playbook", a walkthrough of one PM using these artifacts on an actual product, from opportunity brief to roadmap review.
I have an idea
Before you vibe code -> Opportunity brief -> AI PRD -> Eval plan -> Launch gate
Use this path when the problem is still fuzzy and you need to decide whether AI is worth building at all.
I already built a prototype and now I'm nervous
Error analysis -> Eval plan -> PRD risk table -> Observability plan -> Launch gate
Use this path when the demo works, but you do not yet know whether the product is safe, measurable, affordable, or ready for users.
- Before you vibe code — answer 8 questions before building
- AI opportunity brief — decide if AI is worth pursuing and align user, AI job, human control, evals, risk, and cost
- AI PRD — define what the AI does, its quality bar, risks, and what happens when it fails
- Eval plan — define "good" before trusting model output
- Human review workflow — decide who validates, corrects, escalates, or blocks AI output before it matters
- Launch gate checklist — make a go/no-go call for pilot, production, or scale
- Healthcare intake example — see what a "do not launch" recommendation looks like
The full playbook has the operating model, evidence hierarchy, readiness scoring, and decision framework.
"Do not launch" is not a failure state. It is a product decision when the evidence says the blast radius is larger than the team's ability to measure, review, roll back, or operate the AI safely.
Stop or hold when evals are missing, human review is undefined, agent rollback is impossible, data permissioning is unclear, cost exceeds the business case, or legal/security review has not happened for a high-risk workflow. A convincing LLM demo is not evidence that the product can act safely in the real workflow. Use the Launch Gates guide to make that call with evidence.
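The stop-or-hold conditions above can be read as a checklist. The sketch below is one hypothetical way to encode them, not the playbook's own tooling: the `GateEvidence` fields, the `launch_blockers` and `gate_decision` names, and the returned strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateEvidence:
    """Hypothetical record of the evidence a PM brings to a launch gate."""
    evals_defined: bool
    human_review_defined: bool
    rollback_possible: bool
    data_permissions_clear: bool
    cost_within_business_case: bool
    high_risk_workflow: bool
    legal_security_reviewed: bool


def launch_blockers(e: GateEvidence) -> List[str]:
    """Return every stop-or-hold condition that currently applies."""
    blockers = []
    if not e.evals_defined:
        blockers.append("evals are missing")
    if not e.human_review_defined:
        blockers.append("human review is undefined")
    if not e.rollback_possible:
        blockers.append("rollback is impossible")
    if not e.data_permissions_clear:
        blockers.append("data permissioning is unclear")
    if not e.cost_within_business_case:
        blockers.append("cost exceeds the business case")
    # Legal/security review only blocks when the workflow is high-risk.
    if e.high_risk_workflow and not e.legal_security_reviewed:
        blockers.append("legal/security review has not happened")
    return blockers


def gate_decision(e: GateEvidence) -> str:
    """Hold if any blocker applies; otherwise move on to the full gate review."""
    return "hold" if launch_blockers(e) else "proceed to launch gate review"
```

An empty blocker list does not mean "ship". In this sketch it only means none of the listed hold conditions apply, so the full launch gate review can go ahead.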
- PMs shipping AI features from prototype to production
- Founders deciding which AI workflows are worth building
- Product leaders reviewing whether an AI roadmap is credible
- Engineering, design, and legal partners who want clearer AI product artifacts
This is not a prompt pack or a strategy deck. There are no starter apps.
Most of these jobs didn't exist three years ago. Each one has a template.
| Skill | What it means | Artifact |
|---|---|---|
| Opportunity assessment | Decide whether AI is worth pursuing and align user, AI job, human control, evals, risk, and cost | Opportunity Brief |
| AI job definition | Specify what the AI does, its constraints, and its fallback behavior | AI PRD |
| Eval design | Define "good" before trusting model output | Eval Plan |
| Risk management | Identify what can go wrong, how bad it would be, and what to do about it | PRD risk table + Launch Gate |
| Human-in-the-loop design | Decide who validates, corrects, escalates, or blocks AI output before it matters | Review Workflow |
| Unit economics | Cost per workflow and margin impact at scale | Cost Model |
| Launch gating | Go/no-go calls using evidence | Launch Gate Checklist |
| Observability | Monitor quality, drift, and cost in production | Observability Plan |
| Post-launch review | Compare what actually happened with what was expected | Observability Plan |
| Optional handoff and operations | Build handoff, meeting review, and prompt change control | Optional templates |
Twelve guides on the parts of AI product management where most teams get stuck.
| Guide | What it covers |
|---|---|
| Before You Vibe Code | Eight questions to answer before turning an AI idea into a demo |
| Walkthrough | A week with the playbook: one PM, one product, five artifacts |
| Eval Design | Building evals that catch real failures, including the ones you miss in demos |
| Agentic Products | How to spec agents vs. chatbots vs. copilots |
| Operating AI Products | Human review, safety, observability, and cost discipline after the demo works |
| Launch Gates | How to say "do not launch" with evidence |
| Prompt Craft | Treating prompts as product surfaces |
| Bad to Good AI PRD | Turning a vague AI assistant brief into a buildable PRD slice |
| Error Analysis | Reading traces, labeling failures, and deciding which evals are worth automating |
| Artifact Flow Map | Which artifact comes when, who owns it, and what decision it unlocks |
| Agent PM Starter Pack | Tool boundaries, autonomy, rollback, trajectory evals, cost ceilings, and handoff |
| AI-Native PM Loop | Build small PM agents, trace behavior, create evals from traces, and improve safely |
Three worked examples. Each one includes an opportunity brief, PRD, eval plan, launch gate assessment, and a scored readiness recommendation. The customer support example also includes a week-2 post-launch review to show the operating loop after pilot launch.
| Case study | Risk | Recommendation |
|---|---|---|
| Customer Support Copilot | Medium | Pilot after blockers resolved |
| Sales Call CRM Assistant | Medium | Pilot after blockers resolved |
| Healthcare Intake Assistant | High | Prototype only |
The examples are synthetic but realistic. They show how the artifacts reason through tradeoffs rather than filling in blanks.
Use these artifacts to answer common AI PM interview questions with concrete examples.
| Interview question | Where to point |
|---|---|
| How do you decide if an AI feature is worth building? | Opportunity Brief + Healthcare Intake opportunity |
| How do you define quality for LLM output? | Eval Plan + Customer Support eval |
| How do you handle hallucination risk? | AI PRD risk table + Customer Support launch gate |
| How do you decide not to launch? | Launch Gates guide + Healthcare Intake launch gate |
| How do you operate after launch? | Observability Plan + Week-2 post-launch review |
ai-pm-playbook.md # Full playbook: operating model, scoring, gates
templates/ # 7 core PM artifacts plus 3 optional templates
docs/ # 12 reference guides (including walkthrough)
examples/ # 3 scored case studies, plus one post-launch review example
schema/ # JSON schema for readiness assessments
GRIT covers the engineering side: how AI-assisted code gets specified, tested, and reviewed. This playbook covers the product side: what gets built, why, and when it is ready.
