The point of story points

Their practical uses and limitations

By Sidu Ponnappa

“A story point is a metric used in agile project management and development to determine (or estimate) the difficulty of implementing a given user story.”

Some teams count these story points in time — days, or even hours.

Some teams count these story points in complexity — multiples of a very simple story, like “Accept a param via an API and store it in DB”.

But whether they measure time, or they measure complexity, most teams tend to derive little value from them.

Story points are a valuable partial solution to a Truly Hard Problem, and a powerful software engineering and product execution tool. Yet, more often than not they are misunderstood and reduced to mere ritual.

What exactly are story points used for? Let’s approach this from the top.

How long will this take to ship?

Non-trivial software projects are complex systems. Meaningful predictions are hard, and constantly hindered by emergent properties. The rapid rate of increase of software entropy doesn’t help.

At GO-JEK, our crazy, often double-digit week-on-week growth makes sustained execution really hard. Increasing predictability in execution is a real, pressing problem for us.

There is a small handful of techniques we use that make software projects somewhat predictable. All involve compromises and all require rigour in execution.

  1. Minimise assumptions. Replace detailed long term plans with rapid iterations that validate hypotheses based on real world feedback.
  2. Define ‘completion’ as software reaching the customer.
  3. Scope completion is never the target.
  4. Prioritisation accuracy is a key metric.
  5. Per cycle productivity is a key metric.
  6. Estimation accuracy is a key metric.
  7. Accept that fine grained complexity estimation of software by a human is always wildly inaccurate.
  8. Ensure liquidity when converting execution complexity estimates to time estimates to compensate for volatility.

Story points are a tool to improve estimation accuracy, and to help convert complexity to timelines.

What does this mean, practically?

All the theory is fine, but what does a practical execution template for a team look like? Here’s what I use as a basic execution framework myself.

  1. Short execution iterations. I tend to prefer one week. ‘Done’ means in production.
  2. Categorize User Stories by coarse complexity into Trivial, Small, Medium and Large. This must be done by the developers who will build the software and not anyone else.
  3. Definition of Trivial: It’s done and in production before we finish talking about it. Think one line changes.
  4. Definition of Small: It’ll be quick. Not more than a day, probably.
  5. Definition of Medium: More than a small, but will fit within the iteration for sure.
  6. Definition of Large: Will take more than an iteration.
  7. Always split all Large stories into Trivial, Small and Medium stories. Never execute on them as is, as they are beyond the detailed planning horizon.
  8. Assign a numeric value to each category of complexity. I tend to do Trivial:0, Small:1, Medium:3, Large:5. This gives you complexity in “story points.” Remember, Story Points are a currency, and have no unit. They exist simply to introduce liquidity into the conversion of complexity to time.
  9. The sum of the points of completed stories in an iteration gives you a (relative) measure of productivity of the team. This is often called velocity. Because it’s a relative measure without a unit, using it as a target or comparing the velocity of two teams is a (*cough*) pointless activity. Fixing bugs doesn’t add to velocity for obvious reasons.
  10. Track the actual time taken for each story in days. Over a few iterations, you should be able to calculate the average conversion rate of one point of complexity into days.
  11. The team should steadily strive to improve their estimates so they converge with actuals. A consistent ±20% accuracy is pretty awesome.
  12. The team should steadily strive to increase their velocity iteration to iteration. Velocity is directly correlated to quality. Better tools, improved skills and more automation are typical ways to improve quality, and so improve velocity.
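
As a rough illustration of steps 8 to 10, here is a minimal Python sketch. The point values are the ones from the list above; the `Story` class, its field names, the helper functions and the sample stories are all hypothetical, invented purely for this example. Treat it as one possible way to keep the books, not a prescribed implementation.

```python
from dataclasses import dataclass

# Point values per complexity category, as suggested in step 8.
POINTS = {"trivial": 0, "small": 1, "medium": 3, "large": 5}


@dataclass
class Story:
    title: str           # hypothetical field names, for illustration only
    category: str        # one of the POINTS keys
    actual_days: float   # time actually taken, tracked as in step 10
    done: bool = True    # 'done' means in production


def velocity(stories):
    """Sum of the points of completed stories in one iteration (step 9)."""
    return sum(POINTS[s.category] for s in stories if s.done)


def days_per_point(stories):
    """Average conversion rate of one point into days (step 10).

    Returns None when no points were completed, since no rate can be derived.
    """
    done = [s for s in stories if s.done]
    points = sum(POINTS[s.category] for s in done)
    if points == 0:
        return None
    return sum(s.actual_days for s in done) / points


# Made-up sample iteration.
iteration = [
    Story("Accept a param via an API and store it", "small", 1.0),
    Story("Checkout flow", "medium", 3.5),
    Story("Fix label typo", "trivial", 0.5),
    Story("Search filters", "medium", 2.5, done=False),  # not in production yet
]

print(velocity(iteration))        # 4
print(days_per_point(iteration))  # 1.25
```

Note that the unfinished story contributes nothing to velocity, and that a rate from a single iteration is only a starting point; it becomes useful once it is averaged over several iterations.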

A measure of predictability

Volatility in the velocity of a team will give you clear, timely warning of emergent behaviour that is affecting productivity. This can be anything from a change in team strength, to a change of tech stack, change in quality of requirements or a change in the defect rate.

Sudden volatility should trigger a root cause analysis by the team. We may not always be able to control these factors, but it’s crucially important to know what they are.
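
One way to make “sudden volatility” concrete is to watch the spread of recent velocities relative to their average. The sketch below uses the coefficient of variation (standard deviation divided by mean); that choice of statistic, the 25% threshold, the function names and the sample numbers are all my assumptions for illustration, and a team would need to tune them against its own history.

```python
from statistics import mean, pstdev

# Hypothetical threshold: flag when velocity varies by more than 25% of its mean.
VOLATILITY_THRESHOLD = 0.25


def velocity_volatility(velocities):
    """Coefficient of variation (std deviation / mean) of recent velocities."""
    avg = mean(velocities)  # raises StatisticsError on an empty list
    if avg == 0:
        return 0.0
    return pstdev(velocities) / avg


def needs_root_cause_analysis(velocities, threshold=VOLATILITY_THRESHOLD):
    """True when recent velocity is volatile enough to warrant a closer look."""
    return velocity_volatility(velocities) > threshold


steady = [10, 11, 9, 10]     # made-up velocities for four iterations
disrupted = [10, 11, 9, 4]   # same team, with one sharp drop

print(needs_root_cause_analysis(steady))     # False
print(needs_root_cause_analysis(disrupted))  # True
```

The check only says that something changed. Working out whether it was team strength, the tech stack, requirement quality or the defect rate is still up to the team.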

Timelines for milestones are recomputed iteration to iteration by converting the total points in scope to time using actual time taken. This is not precise, but because it is based on actual data, it is far more accurate than just… guessing. The analogy here is the valuation of a company — we compute the value of a point the same way that the market approaches valuing the shares of a company. “Mark to market” of points, so to speak, should ideally happen every iteration.
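
The per-iteration recomputation can be sketched in a few lines of Python. The function name, the shape of the `history` list and the numbers are hypothetical; the only idea taken from the text is that remaining points are converted to time using the actual time taken so far.

```python
def projected_days(points_in_scope, history):
    """Convert points still in scope to days using actuals so far.

    `history` is a list of (points_completed, days_spent) tuples,
    one per finished iteration.
    """
    total_points = sum(points for points, _ in history)
    total_days = sum(days for _, days in history)
    if total_points == 0:
        raise ValueError("no completed points yet, so no conversion rate")
    return points_in_scope * total_days / total_points


# Made-up actuals for three iterations: 30 points took 33 days of work.
history = [(8, 10.0), (10, 11.0), (12, 12.0)]
print(projected_days(40, history))  # 44.0

# "Mark to market": after the next iteration, append the new actuals
# and recompute against whatever is still in scope.
history.append((9, 12.0))
print(projected_days(40 - 9, history))
```

Because the rate is recomputed from all actuals each time, a slow iteration immediately pushes the projection out rather than being discovered at the milestone.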

Tactical control is exerted over timelines by managing the number of points in scope, or of course, adjusting the timelines themselves.
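
The scope lever can be expressed as simple arithmetic: given a fixed date and the current conversion rate, how many points fit, and how many have to come out of scope? The sketch below is a hypothetical helper with made-up numbers, assuming the days-per-point rate has already been derived from actuals.

```python
import math


def points_that_fit(days_available, days_per_point):
    """Largest whole number of points deliverable in the time available."""
    if days_per_point <= 0:
        raise ValueError("days_per_point must be positive")
    return math.floor(days_available / days_per_point)


def points_to_cut(points_in_scope, days_available, days_per_point):
    """Points that must leave scope to hit the date (0 if everything fits)."""
    return max(0, points_in_scope - points_that_fit(days_available, days_per_point))


# Made-up example: 40 points in scope, 30 days left, 1.25 days per point.
print(points_that_fit(30, 1.25))    # 24
print(points_to_cut(40, 30, 1.25))  # 16
```

If cutting 16 points is not acceptable, the other lever is the date itself: the same rate puts 40 points at 50 days.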

In Summary

Story points are a currency. They have no unit. We use concepts from economics to make aspects of software projects, which are complex systems, somewhat tractable.

They exist for two reasons:

  • They are indicative of, and help optimise, productivity
  • They allow conversion of estimates of complexity to estimates of time based on data rather than feelings

They are useless (or being misused) if:

  • They are illiquid, i.e. nobody trusts the story points enough to treat them as a meaningful currency backed by the output of the team
  • They aren’t backed by sufficient real world data — think secondary market valuations (scarce, coarse data with occasional mark-to-market) vs primary market valuations (lots of fine grained data with continuous mark-to-market)
  • They have a unit: time (hours, days) and effort (LoC) are both examples of this kind of mistake