Interested in working together?
Interested in working together?
Engineers treat prompts like throwaway text — no version control, no test suite, no regression detection. A prompt that worked last week silently degrades after a model update, and there's no tooling to catch it.
Sole creator. Designed the command architecture, scoring rubric, cost benchmarking model, and plugin distribution strategy. Open-sourced on GitHub.
Five user-invoked commands (/evaluate, /improve, /regression, /compare, /run) plus two auto-invoked skills (prompt-quality-check, golden-dataset-builder). All output persists to .promptops/ locally — evaluations, improved versions, regression reports, and cost comparisons. The $ARGUMENTS pattern with behavioral overrides prevents Claude from executing prompts instead of analyzing them.
Distributed as a Claude Code plugin — installable via terminal or Mac app upload. All state stored locally in .promptops/ per project. No auth, no backend, no infrastructure for v1.
Closes the gap between writing prompts and testing them. Engineers get a repeatable evaluate → improve → verify → run loop without leaving their terminal.
I Built a Prompt Testing Plugin for Claude Code — Here's How
PromptOps treats prompts like tested code — evaluate, improve, regression-test, and run, all without leaving your terminal.
The Agentic Frontier Is Not a Single Race
Anthropic builds the engine. Cognition drives the car. But who defines the finish line — and what happens when your AI loses the plot halfway there?