"How much did this feature cost us?" – the question I could never answer precisely
It's one of those questions that sounds simple until you actually try to answer it.
A CPO asks in a roadmap review: "What did the onboarding flow redesign cost us, total?" You know the sprint it shipped in, you know roughly how many engineers touched it, and you have a vague sense it ran long. But the actual number (hours spent, at what rate, against what estimate, including the two bugs it generated two sprints later) isn't sitting anywhere you can pull up in thirty seconds.
For most of my career as a Product Manager, I handled this question in one of a few ways: I didn't calculate it at all, I used a cost-per-story-point ratio, or I asked the engineering lead for a rough estimate and multiplied it by 1.5. When none of those felt rigorous enough, I estimated backward: sprint count times team size times an average day rate. It was defensible in the room and, in any precise sense, made up: not all team members work on a single feature for the entire sprint, and that method erases every nuance.
The problem was that the actual data existed in four different tools that had never been asked to talk to each other: Jira had the tickets, GitHub had the commits, our HR system had the rates, and a spreadsheet somewhere had the original estimate. Nobody had connected them into a single answer to a single question, because nobody had needed to badly enough, until someone did.
Why "story points × sprint velocity" was not good enough
Story points were supposed to solve this. The logic was clean: normalize effort into abstract units, track velocity, and use historical ratios to forecast cost. It works well enough for planning. It falls apart for accountability.
The first problem is that story points measure perceived complexity at estimation time, not actual effort at delivery time. A three-point ticket that took eleven hours because the underlying module was a mess doesn't show up as anomalous in a velocity chart; the sprint closes on time, the story points land, and the hours disappear into the aggregate.
The second problem is that story points don't convert to money. A CPO or CFO asking about feature ROI is asking a financial question. "We spent 34 story points" is not an answer they can do anything with. "We spent approximately $18,000 in engineering time, against an estimate of $11,000, and here's where the gap came from": that's a conversation.
The third problem is that story points measure input, not outcome. They track effort going in, not value coming out. A feature that consumed 60 hours and drives daily active usage is a different investment than one that consumed 80 hours and gets opened twice a month. Story points can't tell you which is which.
What I needed wasn't a better estimation unit, but a way to connect planned effort to actual time, actual time to actual cost, and actual cost to actual delivery output across every feature, every sprint, consistently enough to be useful.
What "feature cost" actually means when you break it down
Before building anything, I had to get specific about what I was actually trying to measure, because "feature cost" turns out to mean at least three different things depending on who's asking.
For engineering, it means time: how many hours went into design, development, review, testing, and the unplanned work that followed.
For finance, it means money: those hours translated into loaded labor cost, ideally broken down by seniority and role, because a senior engineer hour and a junior engineer hour are not the same line item.
For product, it means value: what did we get for that investment? Did it move the metric it was built to move? Are we still maintaining it, and at what ongoing cost?
A complete picture of feature cost requires all three. In practice, most teams have fragments of each and no systematic way to connect them.
The version I built first, and the one I'd recommend starting with, focuses on the first two: actual hours against estimates, translated into approximate cost using loaded rates. It's not a full ROI analysis, and it doesn't require a data warehouse. But it produces numbers specific enough to change how decisions get made.
How we built a working system with the engineering team
Getting the engineering team aligned was more practical than political: worklog culture was already in place, so the conversation was about granularity and purpose rather than buy-in. We landed on a simple convention: every entry references a feature tag and a work type – implementation, review, rework, or unplanned fix. Enough context to answer real questions; not enough to feel like surveillance.
The case for it came down to visibility, the kind that benefits both sides. When cost is invisible, scope creep is invisible too: engineers absorb the consequences of a feature growing from two weeks to six without leadership noticing. Making that visible protects the team as much as it informs the business.
The practical setup had three components:
- Worklog discipline. Not surveillance-level granularity, but enough context to connect hours to features rather than just tickets: every worklog entry references a feature tag, not just a ticket number. This sounds minor. The difference in what you can query afterward is significant.
- Estimate capture at the right level. We moved estimates from the ticket level to the feature level, with tickets rolling up rather than standing alone. A feature estimate is a number leadership can relate to, while an individual ticket estimate is noise at that altitude.
- Cost translation. We used loaded day rates by role, not individual salaries, but role-level averages agreed on with finance, to convert hours into approximate cost. Approximate is fine. The goal is order-of-magnitude clarity, not payroll accounting.
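Mechanically, those three pieces roll up into one small computation. Here's a sketch of that rollup with hypothetical feature tags, roles, rates, and hours (none of these figures come from a real team):

```python
from collections import defaultdict

# Role-level loaded day rates, agreed with finance (illustrative numbers)
DAY_RATES = {"senior_eng": 800.0, "mid_eng": 560.0, "junior_eng": 400.0}
HOURS_PER_DAY = 8

# Worklog entries carry a feature tag and a work type, not just a ticket
worklogs = [
    {"feature": "onboarding-redesign", "role": "senior_eng", "hours": 14, "type": "implementation"},
    {"feature": "onboarding-redesign", "role": "mid_eng", "hours": 22, "type": "implementation"},
    {"feature": "onboarding-redesign", "role": "senior_eng", "hours": 6, "type": "rework"},
    {"feature": "notif-digest", "role": "junior_eng", "hours": 10, "type": "unplanned_fix"},
]

# Estimates live at the feature level; tickets roll up to them
estimates = {"onboarding-redesign": 30, "notif-digest": 8}

def feature_costs(worklogs, estimates):
    """Roll hours up per feature and translate them into approximate cost."""
    hours = defaultdict(float)
    cost = defaultdict(float)
    for w in worklogs:
        rate_per_hour = DAY_RATES[w["role"]] / HOURS_PER_DAY
        hours[w["feature"]] += w["hours"]
        cost[w["feature"]] += w["hours"] * rate_per_hour
    report = {}
    for feature, h in hours.items():
        est = estimates.get(feature)
        report[feature] = {
            "actual_hours": h,
            "approx_cost": round(cost[feature], 2),
            "estimate_hours": est,
            "overrun_pct": round(100 * (h / est - 1), 1) if est else None,
        }
    return report

print(feature_costs(worklogs, estimates))
```

Order-of-magnitude clarity is the point: the role-level rates are deliberately coarse, and the output is a cost and an overrun percentage per feature, not payroll accounting.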
Once those three pieces were in place, connecting them to Enji's Project Margins view gave us what we'd been missing: a comprehensive picture of estimated cost versus actual cost, per feature, updated as work progressed. Not at the end of the sprint, but during it.
PM Agent filled in the narrative layer. Instead of manually reconstructing why a feature overran, I could ask directly and get an answer that pulls from worklogs, commit activity, and calendar data, including context that would otherwise stay invisible, like unplanned incidents and capacity lost outside the sprint plan.
When a stakeholder opens Project Margins and still asks me what it means
Cost visibility creates a new problem: explaining it to people who didn't ask for it and aren't sure what to do with it.
The first time I put a Project Margins view in front of a CPO, the reaction wasn't "Great, now I know" but "What am I looking at, and is this bad?" The data was accurate; Enji had pulled everything together correctly, but the context wasn't there.
What I learned: cost data without narrative is just anxiety-inducing. Leadership doesn't need to know that a feature overran by 40%; they need to know why it overran, whether that reason is systemic or situational, and what (if anything) should change as a result.
My job shifted.
I stopped presenting cost data as information and started presenting it as context for a recommendation. "The authentication module came in at 140% of estimate. Half of that is explained by a scope change we agreed to in week two. The other half is a pattern we're seeing across anything that touches the legacy session layer; that's worth a separate conversation about whether we address it proactively or keep absorbing it reactively."
That's a different conversation from "we went over budget." It's also the one that actually gets resources allocated, priorities shifted, or architectural decisions escalated.
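The decomposition behind that kind of statement is simple arithmetic, but making it explicit keeps the conversation honest. A sketch with hypothetical hours (echoing, but not taken from, the authentication-module example), splitting an overrun into agreed scope change versus execution variance:

```python
def attribute_overrun(estimate_h, actual_h, scope_change_h):
    """Split an overrun into agreed scope changes vs execution variance.

    scope_change_h: hours added by scope changes that were explicitly
    agreed mid-flight. Whatever remains is execution variance, which is
    the part worth a systemic-vs-situational conversation.
    """
    overrun_h = actual_h - estimate_h
    execution_h = overrun_h - scope_change_h
    return {
        "overrun_pct": round(100 * actual_h / estimate_h),
        "scope_change_h": scope_change_h,
        "execution_h": execution_h,
    }

# Hypothetical: 100h estimate, 140h actual, 20h of the gap agreed in week two
print(attribute_overrun(100, 140, 20))
```

If the scope-change hours explain the whole gap, the estimate was fine and the process worked; if execution hours dominate across several features touching the same component, that's the pattern worth escalating.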
How cost visibility changed the way I prioritize the roadmap
The most unexpected effect of making feature costs visible was what it did to prioritization, not in a dramatic way, but in a quiet, persistent way that added up over time.
- Before: I prioritized based on perceived business value and team capacity. Both of those inputs were soft: business value was usually an argument, not a measurement; capacity was usually a gut check, not a forecast.
- After: I had a third input, which was historical cost accuracy by feature type. And it turned out to be more informative than either of the first two.
We discovered that certain categories of features consistently overran estimates by significant margins. Anything touching our notification infrastructure took roughly twice as long as estimated, almost without exception. Anything involving third-party API integrations had high variance, sometimes on estimate, sometimes wildly over, with no reliable predictor.
That information changed how I approached both planning and prioritization. For example, notification features got a standing 2× buffer as a starting point, which helped, but as a blanket rule, it created distortions, over-allocating capacity to simple tasks and leaving complex ones underestimated. It worked better as a prompt to look closer at each item than as a rule to apply blindly. Features with significant third-party API dependencies got scoped more conservatively and staged more carefully, because the variance was real and we couldn't predict the direction.
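The category-level pattern is easy to compute once estimates and actuals sit in one place. A sketch with invented historical data, separating mean overrun (the notification-style "roughly twice as long, almost without exception") from spread (the API-integration-style unpredictability):

```python
from statistics import mean, pstdev

# Historical (estimate_h, actual_h) pairs per feature category (illustrative)
history = {
    "notifications": [(20, 41), (12, 25), (30, 58), (8, 17)],
    "third_party_api": [(16, 15), (24, 52), (10, 11), (20, 44)],
    "core_ui": [(14, 15), (22, 24), (18, 17)],
}

def overrun_profile(history):
    """Mean overrun ratio and its spread, per feature category."""
    profile = {}
    for category, pairs in history.items():
        ratios = [actual / est for est, actual in pairs]
        profile[category] = {
            "mean_ratio": round(mean(ratios), 2),
            "spread": round(pstdev(ratios), 2),  # high spread = unpredictable
        }
    return profile

print(overrun_profile(history))
```

The two numbers suggest different responses: a high mean with low spread supports a standing buffer (applied with judgment, not blindly); a high spread with any mean supports conservative scoping and careful staging, because no buffer size is reliably right.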
None of this required a new process: only having the data in a form that made the pattern visible. That's what consistent cost tracking, over enough sprints, eventually produces. In practice, Project Margins is where I check it: the pattern becomes impossible to ignore when it's sitting in the same view every week, updated without any manual assembly on my end.
Lessons learned, and gaps still open
Two things I'd do differently:
- Start with the data layer, not the reporting layer. We built a cost dashboard before we'd established reliable worklog discipline, and the first version was full of gaps that undermined trust in the numbers before we'd had a chance to demonstrate their value. The right order is data first, reporting second, and decisions third.
- Prioritize consistency over precision. We spent too long debating the right loaded rate, the right way to handle overhead allocation, and the right granularity for feature tagging. A perfect calculation methodology is less valuable than a consistent methodology that you actually use. Approximate and consistent beats precise and inconsistent every time.
Those are the things I'd change. What I haven't yet solved is a different category, gaps that I suspect most teams hit eventually.
The first is connecting feature cost to feature value in a way that's systematic rather than anecdotal. I can tell you what a feature costs to build; I cannot always tell you, in the same dashboard, what it returns. Usage data lives somewhere else, and revenue attribution is a separate conversation with a separate team. The left side of the ROI calculation is cleaner than the right side.
The second is the ongoing maintenance cost of features post-launch. Build cost is tractable. The engineering time a feature consumes over the two years after it ships (bug fixes, performance work, dependency updates, support investigations) is harder to attribute and rarely tracked at all. That's a real cost that almost never appears in feature budgets, and I haven't found a clean way to surface it yet.
What a Product Manager needs from engineering to make this work
This doesn't work if the PM builds it alone; the data lives in engineering, the habits that produce useful data live in engineering, and the buy-in that makes those habits sustainable has to come from engineering leadership.
One thing to state before getting into specifics: the quality of what Project Margins surfaces is directly proportional to the quality of the data going in. Clean worklogs, consistent tagging, and honest re-estimation produce reliable cost visibility. Gaps and shortcuts in the data produce gaps and shortcuts in the output, and those tend to surface at the worst possible moment, like a leadership review where someone asks a follow-up question you can't answer.
What I needed, specifically:
- An agreed feature taxonomy. A shared list of feature tags that both product and engineering use consistently, so worklog entries and ticket work roll up to the same buckets. Without this, you're aggregating apples and office chairs.
- Worklog context, not just hours. Hours logged against a ticket number tell you the time. Hours logged with a note about what the work actually was (implementation, debugging, rework, review) tell you something about where the time went. The second is more useful. It also requires slightly more friction in the logging habit, which means engineering leadership has to actively support it.
- Honest re-estimation when scope changes. If a feature grows mid-sprint and the original estimate doesn't get updated, the cost data shows a false overrun that has nothing to do with execution quality. We established a convention: any scope change above a certain threshold triggers a re-estimate and a note. Small overhead, significant improvement in data quality.
- Access to rate information at the role level. Not individual salaries: role-level loaded rates, agreed on with finance and kept current. Cost data that runs on outdated rates produces accurate-looking numbers that are quietly wrong.
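One lightweight way to hold the line on these inputs is to validate worklog entries at ingestion and flag the ones that can't be costed. A sketch with assumed field names and an illustrative taxonomy (nothing here reflects a real tool's schema):

```python
# Agreed feature taxonomy, work types, and roles with current loaded rates
# (all illustrative; in practice these come from product, engineering, finance)
FEATURE_TAGS = {"onboarding-redesign", "notif-digest", "auth-module"}
WORK_TYPES = {"implementation", "review", "rework", "unplanned_fix"}
RATED_ROLES = {"senior_eng", "mid_eng", "junior_eng"}

def validate_worklog(entry):
    """Return a list of data-quality problems; an empty list means usable."""
    problems = []
    if entry.get("feature") not in FEATURE_TAGS:
        problems.append(f"unknown feature tag: {entry.get('feature')!r}")
    if entry.get("type") not in WORK_TYPES:
        problems.append("missing or unknown work type")
    if entry.get("role") not in RATED_ROLES:
        problems.append("role has no agreed loaded rate")
    if not isinstance(entry.get("hours"), (int, float)) or entry.get("hours", 0) <= 0:
        problems.append("hours must be a positive number")
    return problems

# A ticket-only entry fails the bar; a properly tagged entry passes
print(validate_worklog({"ticket": "PROJ-123", "hours": 4}))
print(validate_worklog({"feature": "auth-module", "role": "mid_eng",
                        "hours": 4, "type": "review"}))
```

The point isn't the code; it's that "hours to features, not just tickets" becomes enforceable the moment entries are checked against the shared taxonomy instead of trusted by convention.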
After those inputs are in place, the output (consistent, believable feature cost data that supports real decisions) is achievable without a dedicated analytics team or custom data infrastructure. It's a process question more than a technology question, and while it's not simple, it's manageable: the steps are known, the inputs are defined, and the result is predictable if the discipline holds.
The question "How much did this feature cost us?" should have a real answer. It can; it just requires deciding that the answer matters enough to build the habit of capturing it.
