Story Point Estimation: How We Sized Our Backlog Without Endless Debates

If you’ve ever sat through a three-hour backlog grooming session that devolved into debates about whether a ticket is a 5 or an 8, congratulations — you’ve experienced the special circle of product management purgatory we like to call “estimation hell.”

At Spice Sage, we’ve been there. As we built our direct-to-consumer spice subscription platform, we struggled with the same estimation challenges that plague product teams everywhere. But over the past year, we’ve refined our process into something that’s not just bearable, but actually valuable. Here’s how we did it.

The Problem with Traditional Estimation

Before diving into our approach, let’s acknowledge the elephant in the room: estimation is hard, and often inaccurate. In our early days, our estimation sessions looked something like this:

* Engineer A: “This is simple, probably a 3.”

* Engineer B: “But have you considered the integration with the inventory system? That’s complex.”

* PM: “Let’s discuss the details a bit more…”

* [40 minutes later]

* Designer: “Wait, are we accounting for the mobile view?”

* [Everyone groans]

Sound familiar?

We found ourselves with three core problems:

1. **Marathon meetings** that drained team energy

2. **Inconsistent estimates** between teams and sprints

3. **Too much focus on hours**, which never matched reality anyway

Shifting Our Mindset: Relative Sizing

The breakthrough came when we stopped thinking about time and started thinking about complexity and risk. Story points aren’t about hours or days — they’re about comparing the relative effort and complexity of work items.

We began asking different questions:

* “Compared to this reference story, is this bigger or smaller?”

* “What makes this more complex than that one?”

* “What uncertainties exist here that don’t exist there?”

This shift was subtle but powerful. Instead of debating whether something would take 4 hours or 6 hours (which nobody could actually predict with accuracy), we were evaluating comparative complexity, which teams could assess more reliably.

Our Modified Fibonacci Scale

We adopted a modified Fibonacci sequence (1, 2, 3, 5, 8, 13, 21) as our estimation scale. The increasing gaps between numbers forced us to make meaningful distinctions rather than quibbling over small differences.

Here’s how we define each value at Spice Sage:

| Points | What It Means | Example |
|--------|---------------|---------|
| 1 | Trivial change, well-understood | Text change on landing page |
| 2 | Simple task, clear path | Adding a field to an existing form |
| 3 | Standard feature work | Adding filter to recipe search |
| 5 | Complex but contained feature | Subscription pause functionality |
| 8 | Complex work with some unknowns | Spice inventory tracking system |
| 13 | Very complex, significant unknowns | NFC tag integration for spice containers |
| 21 | So large it should be broken down | Initial AR feature set |

The beauty of this scale is that it acknowledges the exponential nature of complexity and uncertainty. The difference between a 1 and a 2 is much smaller than the difference between a 13 and a 21.
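A scale like this is easy to encode so that tooling can reject off-scale estimates (a "4" or a "10"). Here's a minimal sketch; the names and the validation function are illustrative, not part of Spice Sage's actual tooling:

```python
# The modified Fibonacci scale as a lookup table.
# Descriptions come from the table above; the code is a sketch.
POINT_SCALE = {
    1: "Trivial change, well-understood",
    2: "Simple task, clear path",
    3: "Standard feature work",
    5: "Complex but contained feature",
    8: "Complex work with some unknowns",
    13: "Very complex, significant unknowns",
    21: "So large it should be broken down",
}

def validate_estimate(points: int) -> int:
    """Reject any estimate that isn't on the agreed scale."""
    if points not in POINT_SCALE:
        raise ValueError(f"{points} is not on the scale: {sorted(POINT_SCALE)}")
    return points
```

Enforcing the scale at the tooling level keeps the "meaningful distinctions" property intact: nobody can split the difference with an in-between number.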

Our Step-by-Step Process

Here’s the estimation workflow that finally worked for us:

1. Preparation is Everything

Before estimation, our product managers ensure that:

* Each story has clear acceptance criteria

* Any design assets are attached

* Technical concerns are documented

* Dependencies are identified

This prep work is critical. Unclear stories lead to endless debates or, worse, misleading estimates.

2. Reference Stories for Each Point Value

We maintain a library of “reference stories” — previously completed work that serves as a benchmark for each point value. When estimating, we compare new items to these references.

For example:

* “Remember the subscription email notification feature? That was a 3. Is this more or less complex than that?”

These references create consistent benchmarks and speed up the process dramatically, especially for new team members.

3. Silent Voting with Planning Poker

To avoid anchoring bias (where the first opinion influences everyone else), we use a simultaneous reveal approach:

1. The PM reads the story and answers initial questions

2. Everyone silently decides on their estimate

3. All team members reveal their estimates simultaneously

4. If there’s significant disagreement, we discuss briefly

5. We re-vote if necessary

This prevents the loudest or most senior voices from dominating the conversation.
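The reveal step can be sketched in a few lines. The "significant disagreement" threshold below (votes spanning more than two adjacent scale values) is our own illustrative choice, not a standard:

```python
SCALE = [1, 2, 3, 5, 8, 13, 21]

def reveal_round(votes: dict[str, int]) -> tuple[bool, list[int]]:
    """Simultaneous reveal: return (consensus reached?, distinct estimates).

    'Consensus' here means all votes land on one scale value or two
    adjacent ones -- an illustrative threshold for when to skip the
    follow-up discussion.
    """
    distinct = sorted(set(votes.values()))
    indices = [SCALE.index(v) for v in distinct]
    consensus = len(distinct) == 1 or (
        len(distinct) == 2 and indices[1] - indices[0] == 1
    )
    return consensus, distinct
```

So `reveal_round({"ann": 3, "ben": 5})` counts as close enough to settle, while `reveal_round({"ann": 2, "ben": 8})` flags a discussion round.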

4. The Three-Vote Maximum Rule

Our most important rule: No story gets more than three voting rounds. If after three rounds we still don’t have consensus, we go with the higher estimate and move on.

Why? Because the time saved by moving on is worth more than any precision further debate might add. This single rule saved us countless hours.
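The rule itself is mechanical enough to write down. A sketch, assuming each round's votes are recorded in order:

```python
def settle_estimate(vote_rounds: list[list[int]]) -> int:
    """Apply the three-vote maximum rule (illustrative sketch).

    vote_rounds holds each voting round's estimates, in order.
    If any of the first three rounds is unanimous, that value wins;
    after three rounds without consensus, take the highest estimate
    from the final round and move on.
    """
    considered = vote_rounds[:3]
    for votes in considered:
        if len(set(votes)) == 1:
            return votes[0]
    return max(considered[-1])
```

Taking the *higher* estimate on a deadlock is deliberate: persistent disagreement usually signals hidden complexity, so erring upward is the safer default.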

5. Split Stories That Generate Debate

If a story consistently generates extensive debate, it’s often a sign that it should be split. Rather than forcing consensus, we ask:

“Can we break this into smaller, clearer pieces?”

Often the answer is yes, and those smaller stories are much easier to estimate.

The Results: Accurate Team Velocity

After implementing these changes, something interesting happened. While individual story estimates were still often “wrong” (some 3s took longer than some 5s), our overall sprint velocity became remarkably consistent.

This is the key insight: the goal of estimation isn’t to perfectly predict each story, but to have enough relative accuracy that the team’s overall capacity becomes predictable.

Once we had 3-4 sprints of data using this consistent approach, our velocity stabilized around 45 points per sprint. This gave our product managers and stakeholders the predictability they needed for roadmap planning, without requiring perfect individual estimates.
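Checking whether velocity has actually stabilized takes only the completed points per sprint. A minimal sketch using the standard library (the sample numbers are invented for illustration):

```python
from statistics import mean, pstdev

def velocity_summary(sprint_points: list[float]) -> tuple[float, float]:
    """Return (average points per sprint, population std deviation).

    A small spread relative to the mean is what 'remarkably
    consistent velocity' looks like in the numbers.
    """
    return mean(sprint_points), pstdev(sprint_points)
```

For example, `velocity_summary([43, 46, 44, 47])` gives an average of 45.0 with a spread under two points, which is stable enough to plan a roadmap around.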

Beyond Technical Tasks: Design and Research

We don’t just estimate development work. Our approach extends to design tasks and research work as well, with the same principles applied:

* Design Tasks: Sized based on complexity, number of screens, interaction details, and number of user flows involved

* Research Tasks: Sized based on methodology complexity, participant recruitment challenges, and analysis time

This holistic approach ensures our entire product creation process has consistent measurement.

When to Revisit Estimates

We found that estimates should be revisited in two specific circumstances:

1. When significant new information emerges that changes our understanding of the work

2. When a story remains in the backlog for more than two months (requirements and context drift over time)

Otherwise, we trust our initial estimates and move forward.
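The second trigger, the two-month staleness rule, is simple enough to automate as a backlog check. A sketch (the 60-day cutoff approximates "two months"; the function name is ours):

```python
from datetime import date, timedelta

def needs_reestimate(estimated_on: date, today: date,
                     max_age: timedelta = timedelta(days=60)) -> bool:
    """Flag a story whose estimate is older than ~two months."""
    return today - estimated_on > max_age
```

Run against the backlog periodically, this surfaces stories whose requirements and context may have drifted since they were last sized.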

Lessons for Your Team

If you’re looking to improve your estimation process, here are our key takeaways:

1. **Focus on relative sizing**, not hours

2. **Create reference stories** for each point value

3. **Limit discussion time** with firm rules like our three-vote maximum

4. **Prepare stories thoroughly** before estimation sessions

5. **Track velocity** over multiple sprints to validate your approach

6. **Be consistent** with your scale and process

The magic happens when you stop viewing estimation as a prediction tool and start seeing it as a relative sizing exercise that helps your team communicate about complexity and risk.

What’s Next for Us

As we continue to refine our process, we’re experimenting with:

* **Team-specific reference stories** to account for domain expertise differences

* **Confidence ratings** alongside estimates to flag high-uncertainty work

* **Automated historical analysis** to identify patterns in our estimation accuracy

What estimation challenges is your team facing? Have you found innovative ways to make the process more efficient? We’d love to hear about your experiences in the comments.

*This blog post is part of our ongoing series on product management practices at Spice Sage. Last week, we discussed “Bridging Requirements and Design: Why Technical Architecture Reviews Matter.” Stay tuned for next week’s exploration of “User Story Mapping for Subscription Experiences.”*
