Created: March 22, 2026

Anastasiia Rebrova, Project Manager

Technical Debt Is a Business Problem: How to Quantify, Visualize, and Communicate It to Leadership

Why I stopped trusting "more tickets closed" as proof of team performance

For a long time, my weekly report to leadership looked mostly clean. Velocity was acceptable, tickets were closing, and sprint goals were met, or close enough that the delta felt like a normal margin. A day here, two days there. Nothing that warranted an escalation.

Then we missed a release by three weeks. Not because the team stopped working (they were working harder than ever). We missed it because we were moving through code that had become genuinely dangerous to touch. A refactor that should have taken two days took nine. A bug fix introduced two new bugs. The signal was there, but it didn't reach my reports. There was nothing to flag: the tickets kept closing.

That's when I started questioning what I was actually measuring. Technical debt was the missing variable (or at least the part of it I could have measured), actively shaping our delivery capacity without showing up anywhere in my reports. The cost of technical debt was embedded in every inflated estimate, every "simple fix" that took a week, and every engineer who left out of sheer frustration with the code.

Recognizing that was step one. Getting leadership to care about it was a different problem entirely: these are people whose mental model of the project runs on timelines and budget lines, not code quality.

What actually broke when we added AI assistants to our delivery process

We adopted AI coding assistants (I won't name the specific tools because the pattern is universal), expecting velocity gains. We got them initially: engineers were shipping faster, code generation helped with boilerplate, and reviews moved more quickly.

What nobody anticipated was the debt acceleration. AI assistants are extremely good at producing working code quickly. But they are not inherently good at producing code that fits coherently into an existing architecture, respects established patterns, or avoids duplicating logic that already exists somewhere in a 200k-line codebase.

Within a few weeks of the first releases, small inconsistencies started surfacing in review, although nothing was blocking, and nothing had failed a test. Duplication crept in gradually, and pattern variants multiplied across sprints. By the time a new engineer asked which version was canonical, there were four of them, all technically working, none of them authoritative.

The technical debt management challenge had shifted. Before AI assistants, debt accumulated at a pace that matched human writing speed; after, it could accumulate at generation speed, and the tooling we had for tracking it hadn't changed.

Our code review process was the first thing that buckled. Reviewers were now gatekeeping volume they hadn't designed for; the reviews that caught real architectural problems started getting compressed because there were simply more reviews to process. We were optimizing for throughput and inadvertently penalizing depth.

The lesson: AI assistance without delivery intelligence is an accelerator with no governor. You need visibility infrastructure that scales at the same rate as your generation capability.

The metrics I now watch first (and why classic DORA wasn't enough)

I still track DORA (DevOps Research and Assessment) metrics. Deployment frequency, lead time for changes, change failure rate, and time to restore: these are legitimate health indicators, and I'm not dismissing them. But they measure the pipeline, not the payload. They tell you how your delivery machinery is running, not what it is accumulating inside the walls.

The technical debt metrics I now watch first:

  • Estimation accuracy over time. Not per sprint, but trended across quarters. When a team consistently underestimates, it's rarely because they're bad at estimation; it's because the codebase is more resistant than the tickets suggest.
  • Rework rate. What percentage of closed work comes back? A ticket reopened two sprints later is often a symptom of a fix that couldn't be done properly because of the surrounding code. Rework that clusters around specific modules is diagnostic: those modules are where debt is concentrated.
  • Cycle time variance by domain. Average cycle time is nearly useless. The variance, particularly which areas of the codebase show consistently high variance, tells you where the friction lives. Predictable cycle time means clean code. Wildly variable cycle time means something is wrong structurally.
  • Bug-to-feature ratio. When maintenance work starts crowding out development, it shows up here first. The threshold depends on where you are: for a young product or a codebase under active growth, a ratio trending above 20% is already a signal worth escalating. For mature systems, you have slightly more runway, but anything consistently above 30-35% should trigger a conversation with leadership, not a sprint retrospective.
  • Review depth signals. How many comments per PR, and more importantly, are they architectural or cosmetic? A high comment volume on cosmetic issues means reviewers are spending energy on surface problems because the structural ones are too expensive to address in a normal review cycle.

DORA tells you how fast the water is flowing, while these metrics tell you how much sediment is accumulating in the pipe.
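These signals are computable from fairly coarse ticket data. Below is a minimal sketch in Python; the `Ticket` shape and its field names are illustrative stand-ins for whatever your tracker exports, not any tool's real schema.

```python
from dataclasses import dataclass
from statistics import pstdev, mean

# Hypothetical ticket shape; field names are illustrative, not a real tracker schema.
@dataclass
class Ticket:
    module: str          # codebase area the work touched
    kind: str            # "bug" or "feature"
    estimate_h: float    # original estimate, hours
    actual_h: float      # logged time, hours
    reopened: bool       # came back after being closed

def debt_signals(tickets):
    """Compute the coarse debt signals discussed above from a ticket list."""
    total = len(tickets)
    bugs = sum(1 for t in tickets if t.kind == "bug")
    rework = sum(1 for t in tickets if t.reopened)
    # Spread of actual time per module: consistently high spread marks friction hotspots.
    by_module = {}
    for t in tickets:
        by_module.setdefault(t.module, []).append(t.actual_h)
    spread = {m: pstdev(v) for m, v in by_module.items() if len(v) > 1}
    return {
        "bug_to_feature_ratio": bugs / total,
        "rework_rate": rework / total,
        # Mean actual/estimate ratio; trend this across quarters, not sprints.
        "estimation_drift": mean(t.actual_h / t.estimate_h for t in tickets),
        "cycle_time_spread_by_module": spread,
    }
```

None of these numbers is meaningful in isolation for a single sprint; the value is in the trend lines and in which modules keep showing up.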

Where agentic AI really helps an engineering manager day to day

I was skeptical of "agentic" as a category for a long time. My first attempts didn't help: I'd prompt a model with a project question, get a confident-sounding answer that missed half the context, and go back to pulling data manually. It felt like a faster way to produce confident answers to the wrong question.

What changed my view was experiencing the specific problem that agents solve for engineering managers: the synthesis bottleneck.

My job requires me to hold context across Jira, GitHub, Slack, our calendar, and a dozen ongoing conversations simultaneously. Before I can make any meaningful decision, whether to escalate a risk, approve a scope change, push back on a deadline, or flag a resource problem, I need to reconstruct the current state of the world from fragments scattered across those tools. That reconstruction was consuming hours of my week, and it was the part of my job that added the least value. It was mechanical work dressed up as management work: time-consuming, but not analytical.

Agentic AI handles the synthesis. I can ask, "What's the current state of the authentication module refactor, and what are the blockers?" and receive an answer that pulls from Jira tickets, recent commits, PR comments, and standup logs, assembled in seconds, in plain language.

The practical impact on technical debt management is significant. Debt is invisible partly because nobody has time to look. When synthesis is cheap, you can ask the questions you were previously skipping because the answer would have taken too long to construct. You start catching things earlier. The cost of technical debt stops compounding silently.

What agentic AI doesn't replace is judgment. I still decide what the information means, what to prioritize, and what to communicate to leadership. The agent gives me the raw material. The engineering management is still mine.

How we plugged Enji into our existing tool chaos

"Tool chaos" is the accurate description for how most engineering organizations actually operate, and ours was no exception. The integration question was pragmatic: we couldn't rip and replace, and we didn't want to add another tool that required its own maintenance. We needed something that read from what we already had and made it coherent.

Enji connected to Jira and GitHub first: those were the critical path. The setup was straightforward. Within the first week, the Worklogs feature was pulling actual time data against planned estimates in a way our previous tooling hadn't done at the right granularity. We could see, per feature, per developer, per sprint, where time was going versus where we'd predicted it would go.

The Project Margins view showed the overspend clearly. Connecting it to accumulated technical debt still required my own read of the codebase, but for the first time, I had the numbers to start that conversation. I had a view that connected delivery activity to cost, not in an accounting sense, but in a "this feature was estimated at 40 hours and took 90, and here's the pattern across the last quarter" sense. That's not a velocity number. That's a business number – the kind I can put in front of a CFO.

PM Agent changed my morning routine; instead of opening five tabs and spending thirty minutes reconstructing what happened yesterday, I ask a question and get a briefing. The first time it correctly identified that two separate issues were likely related to the same underlying module problem (before I'd made that connection myself), I understood why Enji positions itself around "delivery intelligence" rather than "engineering metrics." The distinction isn't marketing; it reflects a real difference in what the tool surfaces. The same logic applies to the AI Activity Dashboard, which works at the team layer: spotting declining collaboration patterns and engagement signals before they show up as delivery problems.

What my workflow looks like with AI agents and delivery intelligence

The process isn't complicated. With the right tooling, it runs on the margins of normal work — not in addition to it:

  • PM Agent briefing on the prior sprint: checking for estimation drift, unexpected ticket movement, and bug-to-feature ratio shifts; context already assembled before standup, not reconstructed during it.
  • Outlier analysis: identifying tickets that took significantly longer than estimated and grouping them by codebase area; patterns across the same module are signals, not coincidences.
  • Leadership prep: pulling budget data from Project Margins and delivery narrative from worklog summaries; data assembly is automated, so the time goes to thinking rather than compiling.
  • Recurring debt signal report: configured once as a PM Agent periodic task, delivered by email automatically. What moved, what it likely means, what's worth watching.

That last point matters more than it might look: every successful conversation I've had with leadership about addressing technical debt was built on a paper trail. You can't ask for investment based on a feeling. With this setup, the paper trail generates itself.
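The outlier analysis in the second step is mechanical enough to script yourself, even without a platform doing it for you. A sketch, assuming `tickets` are dicts with illustrative keys ("area", "estimate_h", "actual_h") rather than any tracker's real export format:

```python
from collections import defaultdict

def overrun_hotspots(tickets, threshold=1.5):
    """Group tickets that blew past their estimate by codebase area.

    `threshold` is the actual/estimate ratio beyond which a ticket counts
    as an outlier; 1.5 is an arbitrary illustrative default.
    """
    hotspots = defaultdict(list)
    for t in tickets:
        if t["estimate_h"] > 0 and t["actual_h"] / t["estimate_h"] > threshold:
            hotspots[t["area"]].append(t)
    # Areas with repeated overruns first: those are signals, not coincidences.
    return sorted(hotspots.items(), key=lambda kv: len(kv[1]), reverse=True)
```

A single overrun in an area is noise; three in the same module across two sprints is where I start reading code.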

Trade-offs, failure modes, and things I would not automate (yet)

What I've learned to be cautious about:

Conflicting data across tools
When Jira says a ticket is closed and GitHub shows the branch was not merged, you have a data conflict, not an insight. It happens more than the demos suggest. Cross-tool synthesis is directionally reliable, but I've learned to verify the details before anything surfaces in a stakeholder report.
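That particular cross-check is cheap to automate before anything reaches a stakeholder report. A sketch with deliberately simplified input shapes (tuples and a set, not real Jira or GitHub API payloads):

```python
def find_status_conflicts(jira_issues, merged_branches):
    """Flag issues the tracker calls done whose branch never merged.

    `jira_issues`: (key, status, branch) tuples; `merged_branches`: a set of
    branch names. Both shapes are illustrative stand-ins for real API data.
    """
    return [
        key
        for key, status, branch in jira_issues
        if status == "Done" and branch not in merged_branches
    ]
```

Anything this check returns gets verified by a human before it appears in a report, not explained away by the synthesis layer.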

Hallucination in complex queries
The more context a query requires, the higher the risk of a confident-sounding answer that quietly skips something important. I now routinely spot-check outputs against source data. In my experience, the error rate drops as your data becomes more structured and consistent, but it doesn't disappear. A hallucinated status on a key feature, delivered confidently in a leadership update, is the kind of mistake that's hard to walk back.

Patterns are visible; context is not
AI systems are excellent at identifying patterns. They are not good at knowing that the engineer whose velocity dropped this sprint just had a family emergency, or that the PR review times are up because we deliberately slowed down after a production incident. Context that lives outside the tools breaks pattern-matching. I treat performance signals as prompts for a conversation, not conclusions.

Useful estimates, until someone treats them as facts
PM Agent can model the consequences of adding scope; when I ask whether a new item is realistic given the current workload, it pulls from Jira and GitHub to produce a probabilistic read on deadline risk, surfacing blockers and trade-offs I'd otherwise reconstruct manually. But I've seen teams treat those estimates as authoritative, which they aren't. They're only as good as the historical data behind them. I use them as starting points, not endpoints.
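I don't know what model PM Agent actually runs under the hood, but the underlying idea of a probabilistic deadline read can be illustrated with a toy Monte Carlo: resample historical sprint velocity and count how often the remaining scope doesn't fit. Every input here is an assumption.

```python
import random

def deadline_risk(remaining_points, past_sprint_velocities, sprints_left,
                  trials=10_000, seed=0):
    """Toy Monte Carlo estimate of deadline risk.

    Resamples per-sprint throughput from `past_sprint_velocities` and returns
    the fraction of trials where the remaining scope doesn't fit. Only as
    good as the historical data behind it, which is exactly the caveat above.
    """
    rng = random.Random(seed)  # fixed seed for reproducible reads
    misses = 0
    for _ in range(trials):
        done = sum(rng.choice(past_sprint_velocities) for _ in range(sprints_left))
        if done < remaining_points:
            misses += 1
    return misses / trials
```

The output is a starting point for a conversation about trade-offs, not a commitment; that's how I use the real tool's estimates too.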

Accurate drafts, but the relationship is still yours
I've experimented with letting AI agents draft status updates. The drafts are usually accurate and occasionally quite good. But the relationship between an engineering manager and their stakeholders is built partly on the texture of how you communicate: tone, framing, and what you choose to flag versus hold back. I haven't found a way to delegate that authentically, and I'm not sure I should.

The numbers are automatable; the conversation is not
The quantification and visualization can be automated. The conversation about what to do cannot. That still requires a human who understands the organization, the politics, the risk tolerance, and the right moment to make the ask.

What I would do differently if I had to roll out AI in my teams again

I would start with the measurement infrastructure, not the generation tools.

We did it backwards; we gave engineers AI coding assistants first, saw the velocity uptick, reported it upward, and only later had to reckon with the debt the velocity numbers had hidden. If I were doing it again, I would establish a delivery intelligence platform (something like Enji) before, or at least alongside, the AI generation tools. You need to see what's happening to your codebase at the same time as you start accelerating what goes into it.

I would also be explicit with the team about what's being measured and why. Engineers are smart; they'll optimize for whatever metric they think you're watching. If they think you're watching closed tickets, they'll close tickets. If they understand that you're watching estimation accuracy, rework rate, and cycle time variance, and that those metrics represent code health, not individual performance, the culture around debt shifts. People start flagging problems instead of shipping around them.

I would have made the cost of technical debt visible in financial terms earlier because that's what it is: a financial problem. Debt in your codebase means missed deadlines, expensive engineers pulled off roadmap work to fight fires, and features that take three times longer than they should. That's lost revenue and wasted payroll. "Last quarter, we burned through an estimated $n in engineering hours on work that debt made three times harder than it needed to be" is a budget conversation, and budget conversations get action.
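The arithmetic behind that budget line is simple enough to keep honest. A deliberately blunt sketch: every hour past the original estimate is counted as friction cost, which overstates some tickets and understates others, but gives a defensible order-of-magnitude number; `blended_rate_per_hour` is whatever figure your finance team already uses.

```python
def debt_cost_estimate(tickets, blended_rate_per_hour):
    """Rough dollar cost of estimate overruns attributable to friction.

    `tickets` are dicts with illustrative "estimate_h" / "actual_h" keys.
    Crude on purpose: the goal is a lower bound a CFO can't dismiss,
    not exact accounting.
    """
    excess_hours = sum(
        max(0.0, t["actual_h"] - t["estimate_h"]) for t in tickets
    )
    return excess_hours * blended_rate_per_hour
```

Run that over a quarter's worth of tickets and "estimated $n in engineering hours" stops being a feeling and becomes a line item.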

Enji's Project Margins feature is the closest I've found to making that second conversation routine rather than exceptional, because it keeps the cost data current without requiring manual assembly. The hardest part of technical debt communication has always been showing up to a leadership meeting with numbers that people can't dispute. When the numbers are there every week, the conversation changes.
