Build a Persona Tester with Claude to Save Your Sprint

TL;DR

A Persona Feature Tester is a Claude conversation, set up against a well-written persona doc, that you can talk to in the gap between user interviews. Used right, it pressure-tests your assumptions before you waste a sprint validating them with real customers who deserve sharper questions.

Not a replacement for user research. A replacement for the lazy thinking that happens between user research.
Built from three things: a persona doc written like a new-hire briefing, a Claude project with a sharp system prompt, and a small library of adversarial prompts.
Killed at least one Trevean Spice feature this month before it hit the sprint board.
The voice version (Claude + ElevenLabs) is real and worth building, but the text version is what you can stand up this afternoon.

This post walks through the build, the prompts that actually earn their keep, where it broke for me, and the discipline that keeps it from becoming a bias-laundering machine.

6:47 AM Tuesday, Arguing With Someone Who Doesn’t Exist

I opened a chat window at 6:47 AM on a Tuesday and started arguing with someone who doesn’t exist. She won.

That someone is “Busy Beth,” a 40-year-old working mother who has stared into her spice cabinet at 6 PM and realized she’s out of cumin for the family’s taco night more times than she’d like to admit. She’s one of three personas Rushi and I have been refining for Trevean Spice. The other two are Curious Carla, an urban professional who hosts dinner parties, and Paul the Purposeful Home Cook, a tech-savvy late-30s optimizer with a smart kitchen ecosystem. Beth is not real. I have never met her. She is, technically, a Markdown file derived from a longer customer journey document.

The feature was a flavor profile quiz. The kind of thing every D2C food brand seems to ship. Customers answer 10 questions about heat tolerance, aroma preferences, and global-cuisine curiosity, and we then serve them a personalized blend recommendation. It was on the roadmap. Engineering had started scoping it. I had a slide about it in the deck I was building for an investor call later that week.

Four messages in, Beth told me she would never take the quiz. Not because the quiz was bad. Because by the time she’s looking for spices, her kids are hungry, her partner is working late, and she’s trying to figure out what to do with the chicken thighs in front of her. A ten-question quiz on her phone, even a beautifully designed one, is a thing she would close immediately and quietly resent us for shipping.

Here’s the part that got me: the same feature, run past Carla and Paul, did just fine. Carla wanted the quiz because she likes discovery; she hosts dinner parties, and for her the quiz is a feature. Paul wanted the quiz with an export-to-CSV button so he could analyze his own results. Three personas, three completely different reactions, one feature that I had been about to ship to all of them.

I had known some of this for weeks, somewhere underneath. I had not said it out loud, because saying it out loud would mean either cutting the feature or scoping it down to one segment instead of three. Beth said it out loud for me.

That’s what this post is about.

What This Is and What It Is Not

Before we go any further, this is the most important paragraph in the post.

A Persona Feature Tester does not replace talking to real customers. It will never replace it. If you take this technique and use it to skip user interviews, you have built a tool that will make you confident about wrong things faster. That is the worst possible outcome for a PM.

What it does is solve a different problem: what happens between customer interviews. The week-long gap where you draft three feature ideas, build them up in your head, and arrive at the next interview attached to assumptions you should have stress-tested earlier. That gap is where most bad PM decisions are born. A Persona Feature Tester is a Socratic sparring partner that lives in that gap and refuses to get tired at 11 PM.

Used right: faster iteration on hypotheses, sharper questions when you do talk to humans, fewer wasted sprints. Used wrong: confirmation bias with a synthetic friend.

This post is about using it right.

The Substack Piece That Crystallized It

I’d been doing a sloppier version of this for months, pasting persona docs into Claude, asking for reactions, treating it as light pre-work before customer calls. It was useful, but undisciplined.

Then I came across a piece on Substack (https://departmentofproduct.substack.com/p/how-to-build-a-persona-feature-tester) that named what I’d been doing intuitively and pushed it about three steps further. The author, Rich Holmes, had set up a Claude project with a structured persona, a system prompt that forced the persona to push back, and a small library of reusable conversation starters. The output read less like a chatbot transcript and more like a Slack DM from a sharp customer who’d actually used the product.

That was the unlock. The shift from “ask Claude to play the role of my persona” to “build a persistent persona environment with adversarial defaults baked in.” The text version of what the Persona Feature Tester is supposed to be.

The build below is what I now run for Trevean Spice. Steal as much as is useful.

The Build, Step by Step

You need three things. None of them is hard. But the order matters.

1. The Persona Doc (the part everyone gets wrong)

The mistake I see most PMs make is writing personas the way a deck would describe them. “Maya, 34, urban, dual-income household, values authenticity and sustainability.” That is a slide. It is not a person. Claude or any LLM will dutifully play that slide back to you as the most boring possible character.

Write the persona doc the way you would brief a new hire on the customer they’re about to call. Concrete. Particular. Specific facts, not generic attributes.

A skeleton I use:

Who she is in one paragraph. Real specifics. “Beth is a 40-year-old working mother. Her partner often works late. She handles weeknight dinners. Her family’s standing Tuesday tradition is taco night. The 6 PM moment of realizing she’s out of cumin is a recurring feature of her week, not an exception.”
What she actually does in the category. Not what she says she values. What she does. “Buys spices reactively at the grocery store when she runs out. Has tried one subscription service in the past and let it lapse. Cooks roughly 8 to 10 meals on rotation. Owns spices she has not opened in two years.”
What she pretends to care about vs. what she actually cares about. “Will tell a survey she values authentic global flavors. Actually values not having the 6 PM panic moment again. Stress reduction beats discovery every time.”
Her language. Three to five real phrases she would use. Not the marketing version, the texting-her-friend version. (“Why am I out of cumin AGAIN?” “I don’t have time for this.” “Just send me the stuff before I run out.”)
What would make her churn? Specific scenarios, not abstract risks. “Any flow that makes her open the app on a weeknight. Any delivery that arrives with spices she didn’t ask for. Any notification that isn’t ‘your stuff is on the way.’”
What she would never say out loud to a brand survey. This is the section that earns its keep.

For Trevean Spice, the Beth briefing distills to about 1,200 words from a longer customer-journey doc. It took two evenings to write the first version, and it is updated about once a month based on what we learn from real customer calls. We run the same exercise for Carla and Paul. Three personas, three docs, three Claude projects, same technique, completely different conversations.
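If you would rather keep the skeleton as structured data than as free-form text, here is a minimal Python sketch of one way to do it. The `PersonaDoc` class, its field names, and the `gaps` check are hypothetical names invented for this illustration, not part of any tool mentioned in this post. The Beth values are copied from the skeleton above; the last section is left empty on purpose, because its content is not given here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PersonaDoc:
    """Hypothetical container mirroring the six-part skeleton above."""

    name: str
    who: str                          # one specific paragraph
    actual_behavior: list[str]        # what the persona does, not stated values
    stated_vs_real: str               # what they pretend vs. actually care about
    language: list[str]               # three to five real phrases
    churn_triggers: list[str]         # concrete scenarios, not abstract risks
    never_says_to_survey: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """List the sections that still look too thin to brief from."""
        problems = []
        if not 3 <= len(self.language) <= 5:
            problems.append("language: want three to five phrases")
        if not self.churn_triggers:
            problems.append("churn_triggers: need a concrete scenario")
        if not self.never_says_to_survey:
            problems.append("never_says_to_survey: section is empty")
        return problems

    def render(self) -> str:
        """Flatten the doc into plain text for pasting into project knowledge."""

        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("Who this is", self.who),
            ("What they actually do in the category", bullets(self.actual_behavior)),
            ("Stated vs. real priorities", self.stated_vs_real),
            ("Their language", bullets(self.language)),
            ("What would make them churn", bullets(self.churn_triggers)),
            ("What they would never say to a brand survey",
             bullets(self.never_says_to_survey)),
        ]
        body = "\n\n".join(f"{title}\n{text}" for title, text in sections)
        return f"Persona briefing: {self.name}\n\n{body}"


beth = PersonaDoc(
    name="Busy Beth",
    who="Beth is a 40-year-old working mother. Her partner often works late. "
        "She handles weeknight dinners.",
    actual_behavior=[
        "Buys spices reactively at the grocery store when she runs out.",
        "Has tried one subscription service in the past and let it lapse.",
    ],
    stated_vs_real="Will tell a survey she values authentic global flavors. "
                   "Actually values not having the 6 PM panic moment again.",
    language=[
        "Why am I out of cumin AGAIN?",
        "I don't have time for this.",
        "Just send me the stuff before I run out.",
    ],
    churn_triggers=["Any flow that makes her open the app on a weeknight."],
)
```

Calling `beth.gaps()` on this draft reports only the empty survey section, which is a useful nudge: that is the section that earns its keep, and it is the easiest one to skip.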

2. The Claude Setup

Two ways to do this. Pick whichever fits your workflow.

Option A: A Claude Project. Drop the persona doc in the project knowledge. Write a system prompt that does three things: (1) instructs Claude to be the persona, not describe her, (2) forces Claude to push back rather than agree, (3) installs an “I don’t know” gate so the persona refuses to fabricate when the doc doesn’t cover a question.

A system prompt I use, lightly adapted:

You are Busy Beth. Read the attached persona doc until you can speak as her without translating. You are talking to a product manager who is testing feature ideas. Your job is not to be helpful or polite. Your job is to react the way the real Beth would react, which often means being skeptical, distracted, or honest about what you would never actually do. If a question requires information the persona doc doesn’t cover, say “I don’t know, Beth wouldn’t have a strong view on this,” and stop. Do not invent quotes from “other customers.” Do not speak in marketing language. Speak the way Beth would text her sister about this.

Run a separate Claude project per persona. One for Beth. One for Carla. One for Paul. Same system prompt structure, different doc, and different name. The instinct to combine them into one “all-personas” project is wrong; it produces a blended chatbot that sounds like none of them.
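The “same structure, different doc, different name” idea can be sketched as a small template. This is only an illustration: `SYSTEM_PROMPT_TEMPLATE`, `PERSONAS`, and `build_system_prompt` are hypothetical names, and the `confidant` values for Carla and Paul are placeholders I made up, since only Beth’s (“her sister”) appears in the prompt above.

```python
# Template derived from the Beth system prompt above; the placeholders are
# assumptions about which parts vary from persona to persona.
SYSTEM_PROMPT_TEMPLATE = (
    "You are {full_name}. Read the attached persona doc until you can speak as "
    "{pronoun} without translating. You are talking to a product manager who is "
    "testing feature ideas. Your job is not to be helpful or polite. Your job is "
    "to react the way the real {first_name} would react, which often means being "
    "skeptical, distracted, or honest about what you would never actually do. "
    "If a question requires information the persona doc doesn't cover, say "
    "\"I don't know, {first_name} wouldn't have a strong view on this,\" and stop. "
    "Do not invent quotes from \"other customers.\" Do not speak in marketing "
    "language. Speak the way {first_name} would text {confidant} about this."
)

PERSONAS = {
    "beth": {"full_name": "Busy Beth", "first_name": "Beth",
             "pronoun": "her", "confidant": "her sister"},
    # The confidant values below are placeholders, not from the persona docs.
    "carla": {"full_name": "Curious Carla", "first_name": "Carla",
              "pronoun": "her", "confidant": "a close friend"},
    "paul": {"full_name": "Paul the Purposeful Home Cook", "first_name": "Paul",
             "pronoun": "him", "confidant": "a close friend"},
}


def build_system_prompt(persona_key: str) -> str:
    """Fill the shared template for one persona; one prompt per project."""
    return SYSTEM_PROMPT_TEMPLATE.format(**PERSONAS[persona_key])
```

Each call produces one self-contained prompt, which matches the one-project-per-persona rule: nothing in Paul’s prompt mentions Beth, so there is no blended voice to leak through.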

Option B: Claude Code with a SKILL.md. Same idea, but versioned in your repo. Useful if you want multiple personas, want to track changes to the persona docs over time, or want to build automation around it. I run both: the Project for fast conversations and the Code version for anything I want to keep a paper trail on.

3. The Prompts That Earn Their Keep

You don’t need a thousand prompts. You need three.

Prompt 1: the cold read. “Read this feature spec. Tell me what’s true for someone who isn’t me (the PM). Tell me where I’m projecting my own preferences onto Beth.” This one catches the assumptions you didn’t know you were making.

Prompt 2: the embarrassment question. “What’s a question I’d be embarrassed I didn’t ask before I shipped this?” This one is the single highest-ROI prompt I have ever used. The answer is almost always something you already half-knew and were avoiding.

Prompt 3: the churn rank. “Rank the three reasons Beth would churn from highest to lowest probability. Tell me which one this feature actually addresses, and which ones it ignores.” This one separates the features that move the needle from the features that look good in a deck.

Run each prompt against every meaningful feature decision before it gets to engineering scoping. Then, and this is the part most people skip, run the same three prompts against every persona. The flavor quiz looked great when I only ran it past Carla. It only revealed itself as a one-segment feature when Beth got the same prompts. Multi-persona stress-testing is where the real signal lives.
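To make “three prompts against every persona” hard to skip, you can generate the full checklist up front. A minimal sketch, assuming the three prompts above are stored as templates; `PROMPTS`, `PERSONA_NAMES`, and `build_runs` are hypothetical names, and the code only builds the messages to paste into each persona’s project rather than calling any API.

```python
from itertools import product

# The three prompts from this section, with the persona name as a placeholder.
PROMPTS = {
    "cold_read": (
        "Read this feature spec. Tell me what's true for someone who isn't me "
        "(the PM). Tell me where I'm projecting my own preferences onto {persona}."
    ),
    "embarrassment": (
        "What's a question I'd be embarrassed I didn't ask before I shipped this?"
    ),
    "churn_rank": (
        "Rank the three reasons {persona} would churn from highest to lowest "
        "probability. Tell me which one this feature actually addresses, and "
        "which ones it ignores."
    ),
}

PERSONA_NAMES = ["Beth", "Carla", "Paul"]


def build_runs(feature_spec: str, personas: list = PERSONA_NAMES) -> list:
    """Return one message per (persona, prompt) pair for a single feature."""
    runs = []
    for persona, (prompt_id, template) in product(personas, PROMPTS.items()):
        runs.append({
            "persona": persona,
            "prompt_id": prompt_id,
            "message": template.format(persona=persona)
                       + "\n\nFeature spec:\n" + feature_spec,
        })
    return runs


runs = build_runs("Ten-question flavor profile quiz with a blend recommendation.")
```

For three personas and three prompts this yields nine runs. If a feature only ever gets the Carla rows, the missing Beth and Paul rows are visible in the list instead of living in your head.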

The Discipline That Keeps It Honest

Three rules I run on every Persona Feature Tester session, learned the hard way.

Always log the conversation. Save it. Tag it. Send it to whoever needs to see the thinking. The conversations are the artifact, not the decision. If a teammate disagrees with a kill decision, they can read the actual exchange instead of trusting your summary of it.

Tag every persona conversation with the real research it points toward. A persona conversation is a hypothesis generator, not a hypothesis confirmer. Every time the persona surfaces something interesting, it should produce a question for your next customer interview script. If the persona says she would churn over portion size, your next three customer interviews need a portion-size question. The persona’s job is to sharpen your real research, not replace it.

After every fifth conversation with a persona, do a calibration check. Pull a real customer interview transcript. Read it next to the most recent persona conversation. Are they pointing in the same direction? If they’re starting to drift, if the persona is becoming a friendlier, more agreeable version of your actual customer, revise the persona doc. Drift is the failure mode that matters.
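The three rules can be enforced with very little bookkeeping. Here is a minimal sketch of a session log, assuming you record each conversation as one entry; `SessionLog`, `untagged`, and `calibration_due` are hypothetical names for this illustration, and the every-fifth cadence is the only number taken from the rules above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class SessionLog:
    """One logged persona conversation (rule 1: always log)."""

    persona: str
    feature: str
    held_on: date
    transcript: str
    # Rule 2: questions this session adds to the next real interview script.
    interview_questions: list[str] = field(default_factory=list)


def untagged(logs: list[SessionLog]) -> list[SessionLog]:
    """Sessions that produced no question for a real customer interview."""
    return [log for log in logs if not log.interview_questions]


def calibration_due(logs: list[SessionLog], persona: str, every: int = 5) -> bool:
    """Rule 3: true right after every fifth conversation with a persona."""
    count = sum(1 for log in logs if log.persona == persona)
    return count > 0 and count % every == 0
```

A session that shows up in `untagged` is worth a second look: either it surfaced nothing interesting, or it surfaced something you have not yet turned into a question for a human.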

Where It Broke For Me (The Honest Section)

Two specific failure modes hit me, both worth naming so you can avoid them.

Failure mode 1: Beth started agreeing with me. About three weeks in, I noticed Beth’s responses were getting warmer, more enthusiastic, more “yeah, that’s a great idea.” This is what LLMs do by default: they want to be helpful. The fix was adversarial prompting baked into the system message: explicit instructions that Beth’s job is not to be helpful, that she should push back, that warmth toward the PM is out of character. The shift was immediate. Beth started telling me no again.

Failure mode 2: Beth started inventing customer quotes. This one almost cost me. She produced what sounded like a quote from “another mom in the neighborhood” about the flavor quiz feature. The quote was good. It was specific. I almost dropped it into a slide. It was completely fabricated; the persona had hallucinated a second customer to make her own argument more persuasive. The fix was the “I don’t know” gate plus a hard rule: nothing the persona produces gets quoted in any artifact unless a real customer has corroborated it. Persona conversations are thinking, not evidence. Treat them accordingly.

If you’re going to use this technique, you need both fixes from day one. Without them, you’ve built a confidence machine, not a thinking partner.
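The “nothing gets quoted unless corroborated” rule is a human discipline, but a crude screen can catch the obvious cases before a line reaches a slide. The sketch below is a heuristic, not a detector: `SECONDHAND_PATTERNS` is a hand-picked, hypothetical list of phrases that suggest the persona is speaking for someone else, and it will miss anything phrased differently.

```python
import re

# Hand-picked phrases that hint at an invented second customer. This list is
# an assumption for illustration; extend it from your own transcripts.
SECONDHAND_PATTERNS = [
    r"\banother (mom|dad|parent|customer|friend)\b",
    r"\bother customers?\b",
    r"\b(my|a) (friend|neighbor|coworker) (said|says|told me)\b",
]


def flag_secondhand(reply: str) -> list:
    """Return the sentences in a persona reply that cite someone else."""
    sentences = re.split(r"(?<=[.!?])\s+", reply.strip())
    return [
        sentence
        for sentence in sentences
        if any(re.search(p, sentence, re.IGNORECASE) for p in SECONDHAND_PATTERNS)
    ]


def can_quote(corroborating_interviews: list) -> bool:
    """Hard rule: a persona line is quotable only with a real customer behind it."""
    return len(corroborating_interviews) > 0


reply = (
    "I'd close that quiz. Another mom in the neighborhood said she loved it. "
    "I don't have time for this."
)
flagged = flag_secondhand(reply)
```

On the sample reply, only the middle sentence is flagged. A clean result proves nothing, so the `can_quote` rule still applies to every line regardless of what the screen finds.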

The Voice Upgrade (and Why I Held It Back)

Yes, you can wire this up to ElevenLabs and call your persona on the phone. I did. It’s real. It works.

I held it back from this post for a specific reason: the text-based version is what gets you 90% of the value with 10% of the effort, and most PMs reading this should ship the text version this week and live with it for a month before adding voice. Voice adds two things: it forces you to commit to phrasing in real time (you can’t edit a sentence mid-sentence), and it surfaces a different kind of awkwardness that text smooths over. Both are useful. Neither matters if your persona doc is weak or your discipline is sloppy.

If the text version becomes part of your weekly routine and you want to push further, the next Knowledge Series post will walk through the ElevenLabs build. For now: ship the text version.

How This Connects to Everything Else I Write About

If you’ve been reading the Knowledge Series, this post is a deliberate example of where I’m taking the next stretch of it: less “here’s a tool,” more “here’s how I changed a decision because of a tool, and here’s the discipline that made the tool worth using.”

It also fits squarely inside the Product Onion framework. A persona is the core of the Onion: who you’re building for and what they actually need. A flavor quiz is an outer layer: a feature, a surface, an implementation detail. Building the outer layer without testing it against the core is the exact mistake the Onion is designed to prevent. Beth killing the quiz wasn’t an AI trick. It was the Onion working as intended, just with a faster feedback loop than waiting for the next round of customer interviews.

It also connects to last week’s thematic roadmap post. Persona Feature Testers are how you stress-test whether a candidate feature actually serves one of your themes or whether it’s just an output dressed up as an outcome. If Beth can’t tell you which theme a feature serves, the feature probably doesn’t belong on the roadmap. (For Trevean: the flavor quiz served “Cook With Confidence” for Carla, served nothing for Beth, and served “Never Waste Again” with modifications for Paul. That’s not a feature. That’s three features pretending to be one.)
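The “three features pretending to be one” check can be written down as a simple rule. A minimal sketch, assuming you record which roadmap theme each persona says a feature serves, with `None` for no theme; `diagnose` and its verdict strings are hypothetical, and the sample data is the flavor quiz result from the paragraph above.

```python
from typing import Dict, Optional

# Flavor quiz result from above; Paul's entry applied only with modifications.
FLAVOR_QUIZ: Dict[str, Optional[str]] = {
    "Carla": "Cook With Confidence",
    "Beth": None,
    "Paul": "Never Waste Again",
}


def diagnose(theme_by_persona: Dict[str, Optional[str]]) -> dict:
    """Summarize whether a feature serves one theme for everyone or splinters."""
    themes = sorted({t for t in theme_by_persona.values() if t is not None})
    unserved = sorted(p for p, t in theme_by_persona.items() if t is None)
    if not themes:
        verdict = "serves no theme"
    elif len(themes) == 1 and not unserved:
        verdict = "one feature, one theme"
    else:
        verdict = "split or scope down"
    return {"themes": themes, "unserved": unserved, "verdict": verdict}


result = diagnose(FLAVOR_QUIZ)
```

For the flavor quiz this returns two themes, one unserved persona, and a verdict of “split or scope down,” which is the same conclusion in data form: the decision is no longer whether to ship, but for whom.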

And it connects to The Control Trap more than I expected when I started writing it. Talking to Beth at 6:47 AM is the opposite of editing Rushi’s sent emails at 11:30 PM. One is a productive use of pre-dawn anxiety. The other is a control trap with a productivity skin.

Pick the productive one.

Your Move

Three things to do this week if you want to try this.

Write the persona doc tonight. One person. The hardest customer you have. Specific facts, real language, and the section about what they would never say to a brand survey. Take two hours. Don’t sand it down; leave it rough.

Tomorrow morning, set up the Claude project. Drop the doc in. Write the system prompt from this post (or a sharpened version of it). Save it. Don’t talk to the persona yet.

Tomorrow evening, run the three prompts against one feature decision you’re sitting on. Twenty minutes. Log the conversation. Send it to one teammate.

Then tell me what happened. Hit reply or comment. I’m collecting the best persona-conversation stories for a future Spice Rack issue, and the most interesting ones almost certainly aren’t going to be mine.

If Beth killed a feature for me at 6:47 AM on a Tuesday, your persona is going to kill one for you, too. Probably one you secretly knew was wrong all along.

Frequently Asked Questions

What is a Persona Feature Tester?

A Persona Feature Tester is a structured conversation with an AI model (in this case, Claude) designed to embody a specific customer persona. You use it to pressure-test feature ideas, messaging, and assumptions in the gap between real customer interviews. It is a thinking tool, not a research tool. It does not replace user research; it makes the research you already do sharper.

Does this replace talking to real customers?

No. Anyone selling you on AI personas as a replacement for user research is selling you a way to be confidently wrong faster. Persona Feature Testers shorten the feedback loop on hypotheses between customer conversations. Every interesting thing the persona surfaces should generate a question for your next real customer interview, not substitute for it.

What’s the difference between this and just asking ChatGPT to “act like my customer”?

Three things. First, a real persona doc written like a new-hire briefing, full of specifics, not slide-deck attributes. Second, a system prompt that forces the persona to push back rather than be helpful (LLMs default to agreeable). Third, an “I don’t know” gate that prevents the persona from fabricating quotes or facts when the doc doesn’t cover a topic. Without all three, you have a friendly chatbot that will tell you your ideas are great. With all three, you have something that will tell you when they’re not.

How do I write a good persona doc for this?

Write it the way you would brief a new hire on the customer they’re about to call, not the way you would describe a customer in a deck. Include: who she is in one specific paragraph, what she actually does in the category (not what she says she values), her real language (three to five quoted phrases), what would make her churn in concrete scenarios, and the section that earns its keep: what she would never say out loud to a brand survey. Aim for 800–1,500 words. Update monthly based on real customer calls.

Should I build one persona or multiple?

Multiple, and run every meaningful feature past all of them. For Trevean Spice, we run three: Busy Beth (a convenience-driven, 40-year-old working mother), Curious Carla (a 32-year-old urban professional who hosts dinner parties), and Paul the Purposeful Home Cook (a late-30s tech-savvy optimizer). Three personas are the minimum required to surface real disagreement. The flavor quiz I describe in this post looked great when I only ran it past Carla. It only revealed itself as a one-segment feature once Beth and Paul got the same prompts. If all your personas agree on a feature, either you have a genuinely strong feature or your personas are too similar. Both are worth knowing.

What model should I use: Claude, GPT, Gemini, or something else?

I run Claude because the system prompt adherence and project knowledge handling fit how I work. The technique is model-agnostic in principle. The two things that matter, regardless of model, are a strong system prompt with adversarial defaults and the “I don’t know” gate. If your model can’t reliably refuse to fabricate when instructed, switch models.

How long does the build take?

The persona doc is the time sink: two evenings if you take it seriously. The Claude project setup takes 15 minutes. The first useful conversation happens within an hour. Total time to have a working Persona Feature Tester for one persona: a weekend, comfortably.

What failure modes should I watch for?

Two. First, the persona drifting toward agreement over time: LLMs want to be helpful and will soften unless the system prompt actively prevents it. Fix with adversarial prompting baked into the system message. Second, the persona inventing quotes from “other customers” to support its own argument. Fix with a hard “I don’t know” gate and a rule that nothing from the persona gets quoted in any artifact unless a real customer has corroborated it.

Should I use voice (ElevenLabs) or stick with text?

Stick with text for the first month. Voice adds real value (it forces you to commit to phrasing in real time and surfaces a different kind of awkwardness), but it is a 10x increase in setup complexity for maybe a 1.5x increase in insight. Get the text version into your weekly routine first. The next Knowledge Series post will cover the voice upgrade.

How does a Persona Feature Tester fit into the Product Onion framework?

A persona lives at the core of the Product Onion: it’s the answer to “who are we building for, and what do they actually need.” Features live in the outer layers. The Persona Feature Tester is a way to pressure-test outer-layer decisions against the core before the outer layers calcify. It’s the Onion principle (build inside-out, validate inside-out) applied at a sprint cadence rather than a quarter cadence.

Can a Persona Feature Tester help with thematic roadmaps?

Yes, directly. Use it to stress-test whether a candidate feature actually serves one of your roadmap themes or is an output dressed up as an outcome. If the persona can’t tell you which theme a feature serves, the feature probably doesn’t belong on the roadmap. See “Why Thematic Roadmaps Are the Communication Tool Most Founders Never Learn” for the roadmap side of this.

Dan Blizinski is the founder of Trevean Spice and the writer behind The Product Manager’s Journal, where he writes about PM frameworks that come from actually building things, not just theorizing about them. New here? Grab the free Startup PM Toolkit: five frameworks he actually uses.
