Fable came out and then disappeared a couple of days later. In that short time, I managed to run a lot of repository reviews with it, and was really happy with the results. I’ve found and fixed around 400 bugs and 400+ other issues (performance, simplifications, modernizations) across dozens of my projects, making several hundred PRs with a nearly 100% merge rate on decided PRs.
Repo code review with Claude Fable
Disclaimer
I am a core maintainer on most of these projects, and I had permission before starting from the few that I’m not a core maintainer on. Don’t do this to someone without making sure first that they are interested in it!
This is a followup to my earlier post on starting with agentic AI. I’ve been accepted into the Claude Code OSS program, so I have access to Claude Max for six months. For OSS work, that’s a ton of compute; even with the 2x multiplier on Fable, I can’t reasonably hit the limit. So I decided to try a simple prompt with Fable:
Review this project for bugs, performance, simplifications, and modernizations
I was shocked at the results. It found dozens of bugs, lots of performance fixes, and other nice cleanups on every project I tried it on. On scikit-build-core, which I wrote from scratch, it found 4 serious bugs (mostly unreleased), 11 smaller ones, a large batch of tiny ones, and 8 simplification opportunities.
For most of these, I followed it up with a prompt like this:
Put this into an issue, then open up draft PRs for these, use Sonnet or Opus
based on the task complexity. Group several into one PR when it makes sense. The
PRs should reference the issue.
(Actually, usually I /copy the response, then tell it to reference that new
issue when making PRs. I always add my AI text below disclaimer).
This makes a batch of PRs. Note I didn’t have to ask it to use subagents or worktrees; it figured that out on its own (originally I was adding instructions like that). The grouping is a matter of taste; on some repos I didn’t allow grouping, and sometimes I guided it. I very rarely had to skip a suggestion, basically only when it recommended bumping the Python floor.
I have ~/.claude/CLAUDE.md that looks like this:
If you make a commit, follow conventional commits and add a trailer:
`Assisted-by: <harness>:<model>`, where `<harness>` is the current agent harness
(like ClaudeCode), and `<model>` is the AI model (Like claude-opus-4.8). You
don't need to add a coauthored-by Claude when you have this.
Prefix PR descriptions and comments on PRs with the line ":robot: _AI text
below_ :robot:" to indicate you are an agent speaking on a user's behalf.
That’s critical to ensure proper commit trailers and keep Claude from pretending it’s me. I have similar things for OpenCode, Pi, etc.
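To spot-check that an agent actually followed those instructions, a small script can inspect commit messages. This check is not part of the setup described above; it is a minimal sketch in which the function name `check_commit_message` and both regular expressions are assumptions, and the header pattern is only a loose approximation of the conventional commits format.

```python
import re

# Loose approximation of a conventional-commit header: type, optional scope,
# optional "!", then ": description". Not a full implementation of the spec.
HEADER_RE = re.compile(r"^[a-z]+(\([^)]+\))?!?: \S.*$")

# Trailer in the form requested by the CLAUDE.md above:
# Assisted-by: <harness>:<model>
TRAILER_RE = re.compile(r"^Assisted-by: (?P<harness>[^:\s]+):(?P<model>\S+)$")


def check_commit_message(message: str) -> list:
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    lines = message.strip().splitlines()
    if not lines or not HEADER_RE.match(lines[0]):
        problems.append("first line is not a conventional-commit header")
    if not any(TRAILER_RE.match(line.strip()) for line in lines[1:]):
        problems.append("missing Assisted-by: <harness>:<model> trailer")
    return problems


# Hypothetical example messages, made up for illustration.
good = "fix: handle empty session names\n\nAssisted-by: ClaudeCode:claude-opus-4.8\n"
bad = "Fixed a bug"

print(check_commit_message(good))
print(check_commit_message(bad))
```

Something like this could run over the commits on a PR branch before merging; how you collect the messages is left out here on purpose.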
I sometimes needed Claude to babysit the PRs; that’s as simple as going back into the conversation and asking it to check CI on all the PRs; it will keep fixing things until the CI goes green. I’m used to thinking of one-PR-at-a-time, but you can just as easily ask “rebase all my PRs” in a repo.
The merge rate on the PRs it opened has been nearly 100%. Due to the grouping, occasionally there were very minor removals, so if it was per-feature, I’d guess it was around 95% success rate or maybe even higher.
After Fable was taken down, I did a few more with Opus; it’s not as impressive, but it can still find some easy issues and it’s still very careful to avoid false positives (Opus 4.8 is supposed to be 4x less likely to introduce bugs than 4.7; I think that’s mostly due to a system prompt change causing it to be paranoid with testing).
The runs
Here are most of the ones I’ve done at the time of writing:
| Issue | Repo | PRs | Total |
|---|---|---|---|
| #4085 | awkward | 22/2/0 | 24 |
| #759 | beautifulhugo | 3/15/0 | 18 |
| #1143 | boost-histogram | 0/8/0 | 8 |
| #1097 | build (Opus 4.8) | 0/9/2 | 11 |
| #163 | check-sdist | 1/8/0 | 9 |
| #2908 | cibuildwheel | 0/2/0 | 2 |
| #2885 | cibuildwheel (4.0 pre-release Opus) | 0/2/0 | 2 |
| #2854 | cibuildwheel (Kimi-K2.6) | 0/9/0 | 9 |
| #1357 | CLI11 | 11/1/0 | 12 |
| #581 | decaylanguage | 9/0/0 | 9 |
| #86 | flake8-errmsg (Opus 4.8) | 0/4/0 | 4 |
| #690 | hist | 0/5/0 | 5 |
| #159 | histoprint | 3/0/0 | 3 |
| #1132 | iminuit | 7/1/0 | 8 |
| #23 | jekyll-indico (Opus 4.8) | 1/5/0 | 6 |
| #731 | mplhep | 6/1/0 | 7 |
| #1102 | nox | 0/10/0 | 10 |
| #1239 | packaging | 9/5/1 | 15 |
| #772 | particle | 0/7/0 | 7 |
| #820 | plumbum | 4/1/0 | 5 |
| #805 | plumbum (Opus 4.8) | 0/1/0 | 1 |
| #6084 | pybind11 | 5/0/0 | 5 |
| #2706 | pyhf | 10/0/0 | 10 |
| #230 | pyhs3 | 1/6/0 | 7 |
| #378 | pylhe | 4/1/1 | 6 |
| #6288 | pyodide (generic) | 7/0/0 | 7 |
| #6278 | pyodide (JS FFI only) | 3/2/0 | 5 |
| #376 | pyodide-build | 5/3/2 | 10 |
| #307 | pyproject-metadata | 2/7/0 | 9 |
| #144 | ragged | 6/0/0 | 6 |
| #398 | repo-review | 0/4/0 | 4 |
| #1317 | scikit-build-core | 0/6/0 | 6 |
| #549 | scikit-hep (Opus) | 3/0/0 | 3 |
| #228 | scikit-hep-testdata (Opus) | 5/0/0 | 5 |
| #241 | uhi | 0/4/0 | 4 |
| #233 | uproot-browser | 0/7/0 | 7 |
| #1646 | uproot5 | 14/1/0 | 15 |
| #317 | validate-pyproject | 0/4/0 | 4 |
| #711 | vector | 9/1/0 | 10 |
PRs column format: Open/Merged/Closed
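The Open/Merged/Closed cells can be turned into totals and a merge rate on decided PRs (merged divided by merged plus closed). The sketch below is only an illustration: the `PRCounts` class and `parse_counts` helper are hypothetical names, and it uses just four rows copied from the table rather than the full data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PRCounts:
    still_open: int
    merged: int
    closed: int

    @property
    def total(self) -> int:
        return self.still_open + self.merged + self.closed

    @property
    def merge_rate(self) -> Optional[float]:
        # Only merged and closed PRs are "decided"; open ones are ignored.
        decided = self.merged + self.closed
        return self.merged / decided if decided else None


def parse_counts(cell: str) -> PRCounts:
    """Parse a table cell in Open/Merged/Closed form, such as '9/5/1'."""
    still_open, merged, closed = (int(part) for part in cell.split("/"))
    return PRCounts(still_open, merged, closed)


# A few rows taken from the table above.
rows = {"packaging": "9/5/1", "pylhe": "4/1/1", "nox": "0/10/0", "pyhf": "10/0/0"}

for repo, cell in rows.items():
    counts = parse_counts(cell)
    rate = counts.merge_rate
    shown = "undecided" if rate is None else f"{rate:.0%}"
    print(f"{repo}: total={counts.total}, merge rate={shown}")
```

A repo where every PR is still open has no decided PRs, so the sketch reports that separately instead of dividing by zero.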
If you have access to AI and a repository you maintain, I highly, highly recommend trying this. With Opus 4.8+ or Fable, a simple prompt is all you need. I’ve done similar things with OpenCode and Kimi K2.6, but the “someone liked the finding enough to work on it” rate was much lower, around 70%. It wasn’t high enough to want to auto-generate PRs. With open models and OpenCode or Pi, you should probably add more instructions about verifying all findings. I have not tried this with models I’m billed per token for (GPT and Gemini), since these searches are a bit pricey - a Fable review would have cost around $20-$60 at per-token rates. Generating the fixes isn’t that bad, especially if you can use the simpler models - the runner model will check the subagents’ work.
Specific Examples (Bonus)
- Made model building go from 118 seconds to 73 seconds - and this is in a project I have never worked on, a friend requested a review!
- nox: a fully non-ASCII session name would wipe the whole .nox dir, deleting every other session!
- CLI11: ignore_case() and ignore_underscore() each worked alone, but together didn’t! (100% coverage didn’t catch that)
- Lots of cruft from removed support platforms, like a Python 2 header injected by pybind11 - that also caused error line numbers to be off by one!
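The non-ASCII session name finding is a good example of a bug class that is easy to miss. The sketch below is a hypothetical illustration of how such a bug can arise, not nox’s actual code: the function names and the `session-` hash fallback are assumptions. If a name is reduced to ASCII before being used as a folder name, a fully non-ASCII name becomes an empty string, and joining an empty string onto the root directory gives back the root itself.

```python
import hashlib
import re
import unicodedata
from pathlib import Path


def slugify(name: str) -> str:
    """Reduce a name to ASCII word characters and dashes (may return "")."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    cleaned = re.sub(r"[^\w\s-]", "", ascii_name).strip().lower()
    return re.sub(r"[-\s]+", "-", cleaned)


def naive_env_dir(root: Path, session_name: str) -> Path:
    # Bug: an empty slug makes this return `root` itself, so "recreate this
    # session's folder" would mean "delete the whole root folder".
    return root / slugify(session_name)


def safer_env_dir(root: Path, session_name: str) -> Path:
    slug = slugify(session_name)
    if not slug:
        # Hypothetical fix: fall back to a short hash of the original name
        # so every session still gets its own folder.
        digest = hashlib.sha256(session_name.encode("utf-8")).hexdigest()[:8]
        slug = "session-" + digest
    return root / slug


root = Path(".nox")
print(naive_env_dir(root, "tests"))
print(naive_env_dir(root, "テスト") == root)
print(safer_env_dir(root, "テスト") == root)
```

Line coverage alone would not flag this, since the buggy path runs the same lines as the working one; only the input differs.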
Almost every review (see above) had great findings in it, feel free to browse. Those are just a few quick ones. I hope this AI age means we’ll have rock solid stable software; focus on code quality with AI instead of pumping out new features (unless you are building an AI harness, the dev speed on those is scary, and I guess you have to keep up).
A few other Fable uses
I didn’t just use Fable for these, I did a few other things:
- Finally managed to rewrite repo-review’s webapp to run in a web worker (also using Claude Desktop, so it could view the errors directly, helped a lot!)
- Replaced scikit-build’s backend with scikit-build-core - and found 8 bugs in the setuptools plugin for scikit-build-core along the way! It was like “hey, I fixed these, do you want a PR for that too?”
- Reworked scikit-build-core’s test suite to save 40% wall clock time while still keeping the same coverage by reducing duplication (and more).
- Tried to do a major refactor of cibuildwheel; Fable outperformed Opus here, but still wasn’t merge-ready directly from the AI.
I have other things I want to try, I hope it comes back (and is available via subscription)!
My experience with Claude Code (Bonus)
I didn’t like the look of Claude Code at first, but I found that using
ccpowerline really helped. You can run npx ccpowerline, select the parts you
want, adjust the options (I went full-width), and then install it into Claude
Code. Here’s mine (config doesn’t seem to be sharable):
⎇ modernize | (+0,-0) Context: [██░░░░░░░░░░░░░░] 130k/1000k (13%) Model: Opus 4.8 | Thinking: high
cwd: /tmp/example Cost: $8.21 | Session: 13.0% | Weekly: 17.0%
This adds a lot of context I need while working. I can see what folder I’m in, branch, changes, current context (admittedly less important on a 1M model than most of the open source models), and info about usage.
Things I like:
- Model performance is good for Opus 4.8 and Fable, the models are really paranoid and test everything
- Starts up fast - maybe the fastest startup time for a harness I’ve used (pi without plugins doesn’t count - Copilot CLI startup time is awful)
- Subagents work really well (also workflows)
- I just discovered /copy N, which copies the Nth response - yes! Would have been a major gripe if missing.
- ccstatusline is great (asking Claude to write its own status line is not)
- /review is good (but so are other harnesses’ versions)
- Generally handles gh well
- Lots of nice isolation options like /sandbox
- /remote-control works well until my computer sleeps, not tied to GitHub repo like Copilot CLI’s version
Things I don’t like:
- No standards: have to symlink CLAUDE.md to AGENTS.md, make symlinks to skills
- Can’t press ctrl-p, like in OpenCode, to open a command palette. Sometimes I want to switch models while typing a prompt.
- Worst /diff implementation (OpenCode’s is great, copilot is fine)
- /branch then /rewind feels a lot worse than other implementations, bad session naming when you do it, too
- /memory gets triggered a lot, I pretty much never want anything that’s not in AGENTS.md.
- If you upgrade from Pro to Max, you have to logout then login (happened to another RSE, wasted about half a month of Max!)
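The CLAUDE.md symlink workaround from the first item can be scripted. This is a minimal sketch under a stated assumption: AGENTS.md is the real file and CLAUDE.md is the link pointing at it (the list does not spell out the direction). The helper name `link_claude_to_agents` is hypothetical, and the demo runs in a temporary directory rather than a real repository.

```python
import tempfile
from pathlib import Path


def link_claude_to_agents(repo: Path) -> Path:
    """Make repo/CLAUDE.md a relative symlink to repo/AGENTS.md."""
    agents = repo / "AGENTS.md"
    claude = repo / "CLAUDE.md"
    if not agents.is_file():
        raise FileNotFoundError(f"{agents} does not exist")
    if claude.is_symlink() or claude.exists():
        raise FileExistsError(f"{claude} already exists; not overwriting it")
    # Relative target, so the link still resolves if the checkout is moved.
    claude.symlink_to("AGENTS.md")
    return claude


# Demo in a throwaway directory with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "AGENTS.md").write_text("Follow conventional commits.\n")
    link = link_claude_to_agents(repo)
    print(link.is_symlink())
    print(link.read_text() == (repo / "AGENTS.md").read_text())
```

The helper refuses to overwrite an existing CLAUDE.md, so a repository that already has a hand-written one is left alone. Creating symlinks may need extra privileges on Windows.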
Things I’m neutral on or haven’t tried
- I thought I’d love hooks, but they can get Claude stuck (autofix something Claude didn’t touch, Claude undoes it, repeats).
- /goal and /loop sound great, but haven’t had to use them, generally I’m doing several things and don’t mind a bit of hands-on.
- /voice sounds interesting, I just don’t have interest in using it.
- /radio - ummm, is this why CC is 200MB+?
I’ve been enjoying being able to run 3-4 at the same time, often running several subagents, and have been fixing bugs up to 6 years old that I haven’t had the time or patience for. Having it monitor CI until it turns green, and rebase, etc. is fantastic!
AI usage disclaimer: All text was written by me. AI was used to help make the table (GLM-5.1 and Opus 4.8), review the post, and set up formatting in a couple of spots.