The Canary in the Fintech Mine
In February 2024, Klarna CEO Sebastian Siemiatkowski stood before investors and declared a new era. The Swedish buy-now-pay-later giant had deployed an OpenAI-powered assistant that, within its first month, handled 2.3 million customer conversations, the equivalent of the work of 700 full-time support agents.1 The chatbot resolved issues in an average of under two minutes, compared to eleven minutes for human agents. Wall Street cheered. The narrative wrote itself: AI had made a category of worker economically redundant.
By May 2025, Siemiatkowski was telling Bloomberg something else entirely. “We focused too much on efficiency and cost,” he admitted. “The result was lower quality, and that’s not sustainable.”2 Customer satisfaction scores on complex interactions had deteriorated. Repeat contact rates—customers forced to call back about the same unresolved issue—had climbed. The chatbot, which excelled at password resets and order status queries, had no reliable mechanism for handling the emotionally charged, legally nuanced, or contextually ambiguous interactions that constitute the 20 percent of volume responsible for the majority of brand damage. By September 2025, Klarna was reassigning internal employees to customer support and actively recruiting human agents again.3
The Klarna reversal is not an isolated anecdote. It is the most publicly documented instance of a pattern now replicated across hundreds of enterprise deployments. Forrester Research’s 2026 Future of Work report found that 55 percent of employers regret replacing human workers with AI for knowledge work functions.4 Gartner, surveying 350 global executives in Q3 2025, found that roughly 80 percent of organizations piloting autonomous technologies reported workforce reductions—and that those reductions showed no statistically meaningful correlation with improved ROI.5
The question enterprise leaders should be asking in mid-2026 is not “Can AI replace developers?” The answer to that question is already in the data, and it is nuanced. The question is: “What does AI-first workforce strategy actually cost across the full lifecycle of an engineering organization?” That accounting is considerably more uncomfortable than the line items on an AI vendor invoice.
The Cost You Model Is Not the Cost You Pay
The business case for replacing developers with AI rests on a deceptively simple comparison. A mid-market software engineer costs between $130,000 and $163,000 in base salary, or roughly $182,000 to $229,000 fully loaded with benefits, payroll taxes, equipment, real estate, and management overhead.6 GitHub Copilot Enterprise, the most widely deployed AI coding assistant, lists at $39 per user per month—$468 annually—against a GitHub Enterprise Cloud requirement of $21 per user per month, bringing the effective minimum to $60 per seat, or $720 per year.7
The arithmetic looks decisive. At those figures, a single developer costs roughly 250 to 320 times what a Copilot seat costs. Ten developers cost $1.8 to $2.3 million fully loaded. Ten Copilot seats cost $7,200. The spreadsheet practically completes itself, and for a board presentation at 10 a.m., it is entirely sufficient.
It is also deeply misleading.
The subscription fee for an AI coding assistant is, by most independent estimates, between 6 and 15 percent of the actual annual cost of operating that tool in a production engineering environment.8 The remaining 85 to 94 percent arrives in categories that rarely appear in a vendor’s ROI calculator: token overages and credit exhaustion, extended code review cycles, technical debt remediation, security triage, debugging time, and the carrying cost of the institutional knowledge that departed with the developers who were replaced.
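The comparison above can be reproduced in a few lines. The sketch below is illustrative arithmetic only: every input is a figure quoted in the text, the function and variable names are hypothetical, and the "implied total" assumes the $720 seat price is the entire subscription line that the 6 to 15 percent estimate refers to.

```python
# Illustrative arithmetic only; all inputs are figures quoted in the text.

SEAT_COST_PER_YEAR = (39 + 21) * 12      # Copilot Enterprise + Enterprise Cloud = $720
DEV_FULLY_LOADED = (182_000, 229_000)    # low and high fully loaded developer cost
SUBSCRIPTION_SHARE = (0.06, 0.15)        # subscription as a share of true annual cost


def naive_cost_ratio(dev_cost: float, seat_cost: float) -> float:
    """Developer cost divided by seat cost, as in the spreadsheet comparison."""
    return dev_cost / seat_cost


def implied_total_cost(seat_cost: float, share: float) -> float:
    """Annual total implied if the seat fee is only `share` of the real cost."""
    return seat_cost / share


ratios = [round(naive_cost_ratio(c, SEAT_COST_PER_YEAR)) for c in DEV_FULLY_LOADED]
totals = sorted(round(implied_total_cost(SEAT_COST_PER_YEAR, s)) for s in SUBSCRIPTION_SHARE)

print(f"Naive developer-to-seat ratio: {ratios[0]}x to {ratios[1]}x")
print(f"Implied total cost per seat: ${totals[0]:,} to ${totals[1]:,} per year")
```

Under those assumptions the naive ratio is 253x to 318x, while the implied operating cost is $4,800 to $12,000 per seat per year, a different order of magnitude from the $720 that anchors the spreadsheet.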
Software Improvement Group’s State of Software 2026, drawn from analysis of more than 30,000 enterprise systems encompassing over 400 billion lines of code, documented the macro-level consequence of this accounting gap. SIG found that AI-generated code carries roughly double the security risk violations of human-written code, and that for a team of 50 developers, AI token spend now averages the equivalent cost of nearly one additional full-time developer.9 That is before factoring the secondary costs that compound downstream.
The Core Premise: AI tools do not replace developer cost. They restructure it, front-loading productivity gains while back-loading technical debt, security remediation, and governance overhead. Organizations that model only the front-loaded gains discover the back-loaded costs in their quarterly incident reviews.
How the Hidden Costs Compound
Code Volume Without Code Quality
The headline productivity statistic for AI coding tools is speed. GitHub’s research indicates that developers complete isolated coding tasks up to 55 percent faster with Copilot active.10 Accenture’s randomized controlled trial across its own engineering teams found an 8.69 percent increase in pull requests per developer and an 84 percent improvement in successful builds.11 These are real numbers from real deployments.
What the same research shows—and what most vendor-facing narratives omit—is that individual coding speed is not the binding constraint in enterprise software delivery. Code review, testing, integration, security scanning, and deployment are. And AI assistance on the generation side compounds the pressure on every downstream stage.
Analysis of 10,000-plus developers across enterprise deployments found that teams with high AI adoption merged 98 percent more pull requests but experienced a 91 percent increase in review time.12 DORA’s 2024–2025 reports confirm the pattern: AI adoption correlates with higher throughput but lower delivery stability, with pull requests per developer increasing 20 percent while incidents per pull request rose 23.5 percent.13
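The two DORA-attributed figures compound. As a rough sketch, assuming both changes apply to the same population of developers and combine multiplicatively (an assumption made here, not a claim from the reports), the change in incidents per developer can be estimated as follows. The function name is hypothetical.

```python
# Figures quoted in the text: +20% pull requests per developer,
# +23.5% incidents per pull request.
PR_GROWTH = 0.20
INCIDENT_RATE_GROWTH = 0.235


def compound_change(*changes: float) -> float:
    """Combine relative changes multiplicatively and return the net change."""
    factor = 1.0
    for change in changes:
        factor *= 1.0 + change
    return factor - 1.0


incidents_per_developer_change = compound_change(PR_GROWTH, INCIDENT_RATE_GROWTH)
print(f"Incidents per developer: {incidents_per_developer_change:+.1%}")
```

On that assumption, 1.20 × 1.235 gives roughly 48 percent more incidents per developer, which is larger than either headline number on its own.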
The most rigorous independent data point remains a randomized controlled trial published by METR in July 2025. Sixteen experienced open-source developers completed 246 real-world tasks under controlled conditions, with and without AI tools. The result ran counter to the vendor studies: tasks took 19 percent longer when developers used AI tools than when they worked without them. The developers themselves believed they were 20 percent faster, a perceptual inversion that speaks directly to why self-reported productivity surveys and controlled experiments diverge so consistently.14
A synthesis of six independent research efforts measuring organizational delivery velocity, as opposed to individual task speed, converges on roughly a 10 percent improvement in total output when controlling for review overhead and downstream remediation.15 Not 55 percent. Not 40 percent. Ten.

The Technical Debt Accelerant
The more consequential long-term cost is architectural. AI coding tools generate code that passes tests and compiles cleanly. They do not generate code that ages gracefully.
Software Improvement Group identified a fundamental constraint: productivity gains from AI coding tools diminish significantly once a codebase exceeds 100,000 lines, because large language models struggle to understand complex software architecture in full context.9 The result is a pattern SIG describes as generation outrunning governance: AI produces volume faster than human engineering oversight can assess quality, producing what researchers at SonarSource and others have termed “comprehension debt”—code that works in the moment but cannot be understood, extended, or debugged by the next engineer who encounters it.16
An ArXiv study published in March 2026 analyzed 6,275 public GitHub repositories containing 304,362 verified AI-authored commits and found that unresolved technical debt climbed from a few hundred issues in early 2025 to more than 110,000 surviving issues by February 2026—accumulated across codebases where AI-generated code had been merged without sufficient human architectural review.17
SIG’s benchmark puts an economic figure on this: reducing code-level technical debt saves an estimated €870,000 per system per year in developer time.9 The corollary is that allowing technical debt to accumulate at the rate enabled by ungoverned AI generation creates that liability at scale. Deloitte’s 2026 Global Technology Leadership Study estimates that technical debt already consumes between 21 and 40 percent of IT spending at most large enterprises.18 AI-assisted development, without corresponding governance, accelerates that consumption.
“When code generation outruns governance, technical debt accumulates faster, security exposure widens, and the systems a business depends on become harder to maintain and evolve.”
— Luc Brandts, CEO, Software Improvement Group, State of Software 2026
The Security Tax
The security cost of AI-generated code is now the subject of systematic, multi-source, multi-methodology research, and the findings are consistent enough to constitute a settled empirical pattern.
Veracode’s 2025 GenAI Code Security Report, the largest systematic study of its kind, tested more than 100 large language models across 80 coding tasks in four programming languages and found that AI-generated code contains 2.74 times more vulnerabilities than human-written code.19 The industry-wide security pass rate for AI-generated code stands at approximately 55 percent—meaning nearly half of AI-generated code samples introduce OWASP Top 10 vulnerabilities—and that rate has not materially improved across multiple testing cycles from 2025 through early 2026 despite vendor claims to the contrary.20

Apiiro’s research across Fortune 50 enterprises found that AI-assisted developers produce commits at three to four times the rate of their peers but introduce security findings at ten times the rate—creating a security debt that accumulates faster than organizations can remediate it.21 In specific vulnerability categories, the differential is more severe: 322 percent more privilege escalation paths, 153 percent more design flaws, and a 40 percent increase in secrets exposure in AI-generated code versus human-written equivalents.22
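The Apiiro multipliers imply a per-commit effect as well as a volume effect. The sketch below divides one quoted multiplier by the other; it assumes both are measured against the same peer baseline, and the names are hypothetical.

```python
# Multipliers quoted in the text (AI-assisted developers relative to peers).
COMMIT_RATE_MULTIPLIERS = (3.0, 4.0)
SECURITY_FINDINGS_MULTIPLIER = 10.0


def findings_per_commit_multiplier(findings_multiplier: float,
                                   commit_multiplier: float) -> float:
    """Relative security findings per commit, AI-assisted versus peers."""
    return findings_multiplier / commit_multiplier


per_commit = sorted(
    findings_per_commit_multiplier(SECURITY_FINDINGS_MULTIPLIER, m)
    for m in COMMIT_RATE_MULTIPLIERS
)
print(f"Findings per commit: {per_commit[0]:.1f}x to {per_commit[1]:.1f}x")
```

Read this way, each AI-assisted commit carries roughly 2.5 to 3.3 times the security findings of a peer commit, so the tenfold increase is not explained by volume alone.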
Georgia Tech’s Vibe Security Radar, launched in May 2025 and tracking CVEs directly attributable to AI coding tools, documented 35 such vulnerabilities in a single month—March 2026—up from 6 in January, with researchers estimating that the true industry-wide count is five to ten times higher than formally attributed incidents.23
The “vibe coding” pattern—where developers accept AI-generated code at high rates without architectural review—compounds each of these figures. GitGuardian’s State of Secrets Sprawl 2026 found that AI-assisted commits expose secrets at 3.2 percent, compared with 1.5 percent for human-only commits.24 Java creator James Gosling, commenting on the enterprise application of these tools, was direct: “As soon as your project gets even slightly complicated, they pretty much always blow their brains out. Vibe coding is not ready for the enterprise because in the enterprise, software has to work every time.”25
The security cost is not hypothetical. It is measurable, it is recurring, and it does not appear on the AI vendor invoice.
The Tribal Knowledge Problem
There is a fourth cost category that resists easy quantification but may be the most consequential for organizations that executed rapid, large-scale developer reductions. When an experienced engineer leaves, they carry with them a working model of how the system actually behaves—why a particular service was built the way it was, which edge case broke production two years ago and what the workaround is, which data migration assumption is embedded in a module that cannot be changed without understanding a business rule from a client relationship that predates current management.
This is tribal knowledge. Research from workforce analytics firm Workplace Intelligence suggests that 42 percent of the expertise a senior employee applies on the job is known only to that individual and cannot easily be replicated by a replacement.26 Knowledge loss during turnover can cost an organization up to 213 percent of the departing individual’s salary when accounting for recruitment, onboarding, productivity ramp, and the errors made by successors operating without institutional context.27
AI does not possess tribal knowledge. It cannot. A large language model trained on public code and fine-tuned on a company’s codebase will reproduce patterns; it will not reproduce judgment. Gartner’s analysis of AI agent production failures found that 60 percent trace back to context quality issues—missing or stale context that makes technically accurate outputs operationally wrong.28 This is the tribal knowledge gap, measured at the system level.
The consequence for organizations that executed significant developer reductions in 2024 and 2025 is that they are now discovering the cost of that context loss in production—in the form of incidents that experienced engineers would have anticipated, architectural decisions that violate constraints no one remembered to document, and debugging cycles that consume multiples of the time they would have required when the original authors were available.
Impact on the Enterprise
For a CFO evaluating AI-driven workforce strategy in the second half of 2026, the data resolves to a set of conclusions that are uncomfortable but not ambiguous.
AI tools reduce certain categories of developer labor but do not eliminate the need for senior engineering judgment. The work most reliably displaced is junior-tier: boilerplate generation, basic testing, documentation, simple integrations. The work that AI amplifies rather than replaces is architectural decision-making, code review, security assessment, system design, and domain-specific problem-solving that requires organizational context. This means AI-first workforce strategies disproportionately eliminate the cheapest engineering labor while creating additional demand for the most expensive.
The fully loaded per-engineer annual cost of AI coding tools—subscription, token overages, training, extended review overhead, and debt remediation—ranges from $3,720 to $9,000, against a median base subscription of under $720.29 That figure grows sharply with agentic adoption: SIG found agentic coding tasks can consume up to 1,000 times more tokens than standard code chat, with cost implications already producing documented billing surprises—including one enterprise spending $500 million in a single month on AI services.30
The productivity gains are real but bounded. Forrester’s Total Economic Impact study for GitHub Enterprise Cloud—commissioned by GitHub—documents 376 percent three-year ROI for a composite enterprise with 5,000 developers and dedicated enablement teams.31 Independent research suggests organizational delivery velocity gains closer to 10 percent for typical enterprise environments. The gap between those figures reflects the difference between an idealized composite and the median enterprise, which operates with weaker governance and higher inherited technical debt.

The Governance Gap: 91 percent of enterprises lack a mature AI governance framework for their software development practices (McKinsey). This is not a technology gap. It is a management gap, and it is where the difference between measurable AI ROI and AI-accelerated technical debt is determined.
Finally, the workforce reduction math requires a second-order accounting that most layoff business cases omit. Forrester’s model predicts that half of AI-attributed layoffs will result in rehiring, frequently at lower salaries into less stable arrangements—a pattern that creates a dual cost: the original severance and rehiring overhead, plus the productivity loss incurred while institutional knowledge reconstitutes itself in a reduced-experience workforce. One in three employers who rehired for AI-eliminated roles reported spending more on restaffing than they saved from the initial layoffs.4
Gartner’s language on this point is precise: “Workforce reductions may create budget room, but they do not create return.”32
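The second-order accounting described above can be sketched as an expected-value calculation. This is a deliberately simplified model, not a method taken from Forrester or Gartner: it uses the 50 percent rehire likelihood and the 213 percent turnover-cost ceiling quoted in the text, treats that ceiling as the cost of a reversal, and ignores AI tooling cost entirely. The class name, field names, and the severance figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReductionCase:
    base_salary: float                     # annual base salary of the eliminated role
    fully_loaded_cost: float               # annual fully loaded cost of the role
    severance: float                       # one-time severance (hypothetical input)
    rehire_probability: float = 0.5        # share of AI-attributed layoffs expected to reverse
    knowledge_loss_multiple: float = 2.13  # upper-bound turnover cost as a multiple of salary

    def expected_reversal_cost(self) -> float:
        """Probability-weighted cost of having to rebuild the role."""
        return self.rehire_probability * self.knowledge_loss_multiple * self.base_salary

    def expected_first_year_saving(self) -> float:
        """Headline saving minus severance and expected reversal cost."""
        return self.fully_loaded_cost - self.severance - self.expected_reversal_cost()


# Salary figures are the low end of the ranges quoted earlier; severance is a placeholder.
case = ReductionCase(base_salary=130_000, fully_loaded_cost=182_000, severance=30_000)
print(f"Headline saving:          ${case.fully_loaded_cost:,.0f}")
print(f"Expected reversal cost:   ${case.expected_reversal_cost():,.0f}")
print(f"Expected first-year saving: ${case.expected_first_year_saving():,.0f}")
```

With these inputs the $182,000 headline saving shrinks to roughly $13,550 in expectation, before any tooling, review, or incident cost is counted. The value of the exercise is less the specific number than forcing the reversal term into the business case at all.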
Vendor ROI Claims: Ceilings Dressed as Floors
Every major AI coding tool vendor—GitHub Copilot, Google Gemini Code Assist, Amazon Q Developer, Cursor, Tabnine—publishes ROI research. The pattern across all of them is identical: productivity figures are drawn from vendor-commissioned studies, measured at the individual task level, and presented without controlling for downstream review overhead, technical debt accumulation, or security remediation cost. None of them publish organizational delivery velocity data. None of them publish change failure rates before and after adoption. The metrics they choose to surface are the metrics that look best in a procurement conversation, and the metrics they omit are precisely those that independent research shows deteriorating under high AI adoption.
This is not a criticism unique to any single vendor. It is a structural feature of how the AI tooling market operates in 2026. When GitHub’s commissioned Forrester study reports 376 percent three-year ROI, it is measuring a composite organization with 5,000 developers, dedicated enablement teams of seven to twelve full-time equivalents, and mature DevSecOps practices.31 When Google publishes internal data on Gemini Code Assist adoption across its own engineering teams, it is measuring an environment with world-class infrastructure and review processes. These figures are not fabricated, but they describe ceilings, not medians, and the gap between the two is where most enterprise deployments actually land.
Pricing model complexity compounds the problem. GitHub Copilot’s June 2026 transition from per-seat to token-based AI Credits pricing means that agentic usage—the mode vendors are actively promoting as the next frontier—can escalate per-developer monthly costs from $19–$39 to $750 or more under heavy usage patterns.33 Amazon Q Developer carries similar consumption-based escalation risk under agentic workflows. Cursor’s usage tiers introduce overage charges that enterprise procurement teams accustomed to flat per-seat SaaS pricing are not yet modeling correctly. In each case, the subscription price that anchors the initial business case bears little relationship to what a fully adopted, agentic deployment actually costs at scale.
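To see what that escalation means for a budget line, the sketch below annualizes the quoted per-developer figures for a team. The team size of 50 is an illustrative assumption, as is the worst-case premise that every developer reaches the heavy-usage figure; the function name is hypothetical.

```python
SEAT_PRICE_PER_MONTH = (19, 39)   # flat per-seat prices quoted in the text
HEAVY_AGENTIC_PER_MONTH = 750     # heavy agentic usage figure quoted in the text
TEAM_SIZE = 50                    # assumption: illustrative team size


def annual_team_cost(monthly_per_developer: float, team_size: int) -> float:
    """Annualized cost for the whole team at a given per-developer monthly rate."""
    return monthly_per_developer * 12 * team_size


flat_high = annual_team_cost(SEAT_PRICE_PER_MONTH[1], TEAM_SIZE)
agentic = annual_team_cost(HEAVY_AGENTIC_PER_MONTH, TEAM_SIZE)
escalation = sorted(HEAVY_AGENTIC_PER_MONTH / p for p in SEAT_PRICE_PER_MONTH)

print(f"Flat pricing at $39/seat: ${flat_high:,.0f} per year")
print(f"Heavy agentic usage:      ${agentic:,.0f} per year")
print(f"Escalation factor:        {escalation[0]:.0f}x to {escalation[1]:.0f}x")
```

For this hypothetical team the line item moves from $23,400 to $450,000 a year, a factor of roughly 19 to 39 depending on the starting seat price.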
The signal enterprises should watch is not what vendors claim in ROI studies. It is what they are investing in. Every major vendor is accelerating agentic capabilities—autonomous agents that write, test, review, and deploy code with minimal human intervention. That investment direction reflects a market reality: the vendors understand that the governance gap documented in this article is their next sales cycle. They are building the tools to close it. The enterprise that waits for vendor-provided governance solutions rather than building its own framework now is outsourcing a management decision to the people with the strongest financial interest in a particular answer.

Cost Governance for the AI-Augmented Engineering Organization
The conclusion that emerges from this body of evidence is not that AI tools lack value in software engineering. They have demonstrated value, and the organizations that ignore them will face genuine competitive disadvantage. The conclusion is that the value is realized through augmentation, not substitution—and that realizing it requires governance infrastructure that most organizations have not yet built.
The following framework is organized around the decisions that CFOs and CTOs face in the second half of 2026.
1. Build a true TCO model before the next renewal cycle.
Subscription cost is not your AI cost. Account for per-seat licensing, platform dependencies, token overages under realistic usage patterns, onboarding time (two to four engineering hours per seat at fully loaded rates), extended review overhead, and quarterly technical debt remediation. Independently tracked deployments put this total at $3,720 to $9,000 per engineer annually—model your own number against your actual usage and compare it honestly to what your contracts project.
2. Separate individual productivity metrics from organizational delivery metrics.
Track AI-assisted task completion speed and developer satisfaction—these will look good. Also track change failure rate, time to restore service, mean time between incidents on AI-generated versus human-written code, and PR review cycle length. If individual metrics improve while delivery stability degrades, you have a review and validation bottleneck, not an AI productivity problem.
3. Define the human-AI boundary before deployment, not after the first incident.
The Klarna failure occurred because the boundary between AI-handled and human-handled interactions was defined by cost metrics rather than quality requirements. For software engineering, the equivalent boundary is architectural: AI generates, humans review, approve, and own. Every AI-assisted commit should have a named human author who has read the code and accepts accountability for its behavior in production.
4. Treat technical debt as a financial liability, not a technical problem.
SIG found that systematic remediation of architectural debt yields median returns of 437 percent over 24 months with a 6.2-month break-even.34 If AI tooling accelerates debt accumulation without commensurate governance investment, you are borrowing productivity from future engineering budgets at a high interest rate. Require a board-ready technical debt register, updated quarterly, with AI-generated debt flagged as a distinct line item.
5. Govern AI-generated code security as a distinct risk category.
Forty-five percent of AI-generated code contains security vulnerabilities requiring remediation before production deployment.19 If AI doubles your commit volume and security review capacity remains flat, your security posture deteriorates even if per-line vulnerability rates hold constant. Budget security review proportional to output, not to historical baselines.
6. Model the full cost of institutional knowledge loss before authorizing developer reductions.
Require any AI-replacement business case to include estimated knowledge loss cost (213 percent of the departing engineer’s salary is the upper-bound benchmark), expected rehiring cost if the reduction reverses within 18 months (Forrester’s model predicts that half will), and the incident carrying cost attributable to context gaps during the transition. If those numbers do not appear in the business case, the business case is incomplete.
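The TCO model in the first item of the framework can start as a simple roll-up of the components it lists. The sketch below is one possible shape, not a standard model: the class and field names are hypothetical, the 2,080-hour working year is an assumption, and the overage, review, and remediation values in the example are placeholders to be replaced with measured data.

```python
from dataclasses import dataclass

WORK_HOURS_PER_YEAR = 2_080  # assumption: 40 hours x 52 weeks


@dataclass
class SeatTco:
    licensing: float              # AI tool seat fee per year
    platform: float               # required platform subscription per year
    token_overages: float         # measured overage or credit spend per year
    onboarding_hours: float       # the text suggests two to four hours per seat
    engineer_fully_loaded: float  # annual fully loaded cost of the engineer
    review_overhead: float        # added review time, priced per year
    debt_remediation: float       # sum of quarterly remediation budgets

    def onboarding_cost(self) -> float:
        """Onboarding hours priced at the engineer's fully loaded hourly rate."""
        hourly_rate = self.engineer_fully_loaded / WORK_HOURS_PER_YEAR
        return self.onboarding_hours * hourly_rate

    def total(self) -> float:
        """Annual per-engineer total across all modeled components."""
        return (self.licensing + self.platform + self.token_overages
                + self.onboarding_cost() + self.review_overhead
                + self.debt_remediation)


example = SeatTco(
    licensing=39 * 12,              # from the text
    platform=21 * 12,               # from the text
    token_overages=1_200,           # placeholder
    onboarding_hours=4,             # upper end of the range in the text
    engineer_fully_loaded=208_000,  # placeholder within the range in the text
    review_overhead=1_500,          # placeholder
    debt_remediation=1_000,         # placeholder
)
print(f"Onboarding: ${example.onboarding_cost():,.0f}")
print(f"Total per engineer per year: ${example.total():,.0f}")
```

Keeping each component as a separate field makes it easy to compare the modeled total against what the contract projects, line by line, at renewal time.
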
| Cost Category | What Most Models Include | What They Miss | Benchmark (per engineer/yr) |
|---|---|---|---|
| AI Tool Licensing | Subscription seat fee | Token overages, platform dependencies, credit exhaustion under agentic use | $720–$2,400 modeled; $3,720–$9,000 actual TCO |
| Developer Productivity | Individual task speed (+55% vendor) | Review overhead (+91%), delivery velocity (actual +10%), METR RCT (−19% on complex tasks) | Net org gain: ~10% |
| Technical Debt | Rarely modeled at procurement | €870K/system/yr carrying cost; 2× accumulation rate under ungoverned AI generation | 21–40% of IT spend (Deloitte 2026) |
| Security Remediation | Existing SAST/DAST budget | 2.74× vulnerability rate in AI code; 10× security finding rate at AI-assisted commit velocity | +$150–$300/engineer/mo (regulated industries) |
| Workforce Transition | Severance (one-time) | Rehire cost if reversed (213% salary), knowledge loss during gap, incident cost from context absence | 1 in 3 employers spent more restaffing than they saved |
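The second item of the framework, separating individual productivity from organizational delivery, reduces to a comparison that can be automated. The sketch below is a minimal illustration with hypothetical names and made-up sample values; a real implementation would read these metrics from delivery tooling and apply thresholds rather than strict comparisons.

```python
from dataclasses import dataclass


@dataclass
class MetricsSnapshot:
    task_completion_hours: float  # individual metric: lower is better
    change_failure_rate: float    # organizational metric: lower is better
    time_to_restore_hours: float  # organizational metric: lower is better
    review_cycle_hours: float     # organizational metric: lower is better


def diagnose(before: MetricsSnapshot, after: MetricsSnapshot) -> str:
    """Classify an adoption period by comparing individual and delivery metrics."""
    individual_improved = after.task_completion_hours < before.task_completion_hours
    stability_degraded = (
        after.change_failure_rate > before.change_failure_rate
        or after.time_to_restore_hours > before.time_to_restore_hours
        or after.review_cycle_hours > before.review_cycle_hours
    )
    if individual_improved and stability_degraded:
        return "review-and-validation bottleneck"
    if individual_improved:
        return "gains holding"
    return "no individual gain"


# Sample values are invented for illustration only.
before = MetricsSnapshot(10.0, 0.10, 4.0, 12.0)
after = MetricsSnapshot(7.0, 0.14, 4.0, 20.0)
print(diagnose(before, after))
```

The first branch is the case the framework warns about: tasks finish faster while failure rate or review time worsens, which points at review capacity rather than at the tool.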
Govern AI Proactively
Eighteen months of enterprise AI adoption have produced a data set sufficiently large and diverse to resolve the developer-versus-AI cost question with meaningful confidence. AI tools are not cheaper than developers. They are an additional cost layer that, when governed well, produces net productivity and quality gains that justify the investment. When governed poorly—which describes the majority of current enterprise deployments—they accelerate technical debt, increase security exposure, and create a false economy that reverses on contact with production reality.
The organizations that are generating measurable, durable value from AI in software engineering share a common characteristic: they treated AI tooling as an amplifier of engineering capability, not a substitute for it. They staffed governance, measured delivery holistically, maintained senior engineering judgment in the loop, and tracked the full lifecycle cost of the code their tools generated. The organizations that are now quietly rehiring, paying down AI-generated technical debt, and managing security incidents they cannot explain to their auditors made the opposite choice, and they made it because their business cases modeled the subscription fee and stopped there.
The developer is not too expensive. The ungoverned AI deployment is.
References
1. Klarna, “Klarna AI assistant handles two-thirds of customer service chats,” Klarna Press Release, February 2024.
2. Sebastian Siemiatkowski, interview with Bloomberg, May 8, 2025; as reported in “Klarna Reverses Course on AI Customer Support,” FinTech Weekly, May 12, 2025.
3. “Klarna reassigns workers to customer support after AI quality concerns,” Business Insider, September 2025; “Klarna again recruiting humans for customer service after AI push,” CX Dive, September 2025.
4. “Companies Rehire Workers After AI Replacements Fail,” The Washington Times, March 10, 2026.
5. Gartner, “Gartner Says Autonomous Business and AI Layoffs May Create Budget Room, but Do Not Deliver Returns,” Gartner Newsroom, May 5, 2026.
6. Bureau of Labor Statistics, Occupational Employment and Wage Statistics, May 2024 (median $133,080); Glassdoor, Software Engineer salary data, June 2026 (average $150,207); fully loaded estimate applies standard 40% employer burden factor.
7. GetDX, “AI Coding Assistant Pricing and ROI Guide 2026,” June 2026.
8. Fordel Studios, “What AI Coding Assistants Actually Cost Per Engineer (Nobody Tells You This),” April 2026. Quote: “The subscription fee is 6–15% of the real cost.”
9. Software Improvement Group, State of Software 2026, June 2026.
10. GitHub / Sida Peng et al., “The Impact of AI on Developer Productivity,” ACM CHI, 2023; updated enterprise research, GitHub Blog, 2024.
11. Accenture, randomized controlled trial results reported in Second Talent, “GitHub Copilot Statistics & Adoption Trends,” October 2025.
12. AI Coding Productivity Paradox analysis, philippdubach.com, March 2026, synthesizing six independent research efforts.
13. Google DORA, State of AI-Assisted Software Development, 2024–2025.
14. METR, randomized controlled trial of AI-assisted development, experienced open-source developers, July 2025; as cited in Uvik Software, “AI Coding Assistant Stats 2026,” May 2026.
15. Philipp Dubach, “93% Adoption, 10% Gains,” philippdubach.com, March 2026.
16. SonarSource, landmark study of LLM coding security, August 2025; Justin Hamade, “True Cost of AI-Generated Code: A Strategic Analysis of Comprehension Debt,” October 2025.
17. ArXiv, “Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild,” March 30, 2026.
18. Deloitte, 2026 Global Technology Leadership Study; as cited in Software Improvement Group, “The Next Wave of Technical Debt Is Architectural,” May 2026.
19. Veracode, 2025 GenAI Code Security Report, 2025.
20. Veracode, “Despite Claims, AI Models Are Still Failing Security,” March 2026 update; as cited in Cloud Security Alliance AI Safety Initiative, “Vibe Coding’s Security Debt,” April 4, 2026.
21. Cloud Security Alliance AI Safety Initiative, “Vibe Coding’s Security Debt: The AI-Generated CVE Surge,” April 4, 2026, citing Apiiro enterprise research.
22. SoftwareSeni, “AI-Generated Code Security Risks: Why Vulnerabilities Increase 2.74x,” February 2026, citing Apiiro Fortune 50 enterprise analysis.
23. Georgia Tech Systems Software and Security Lab, Vibe Security Radar, 2026; as cited in RTSLABS, “Enterprise Vibe Coding: Governance & Security Guide for 2026,” June 2026.
24. GitGuardian, State of Secrets Sprawl 2026; as cited in RTSLABS, “Enterprise Vibe Coding,” June 2026.
25. James Gosling, quoted in Darryl K. Taft, “Vibe Coding Fails Enterprise Reality Check,” The New Stack, September 2025.
26. Workplace Intelligence, 2025 research on knowledge retention; as cited in Glean, “The Ultimate Guide to Choosing AI Knowledge Management Tools,” October 2025.
27. Glean, “The Ultimate Guide to Choosing AI Knowledge Management Tools,” October 2025.
28. Gartner, Top Strategic Technology Trends for 2025: Agentic AI; as cited in Atlan, “Tribal Knowledge: Definition, Five Types and AI-Era Risks,” May 2026.
29. Exceeds.ai, “Cost-Benefit Analysis of AI Coding Assistants for Leaders,” April 2026; Fordel Studios, “What AI Coding Assistants Actually Cost Per Engineer,” April 2026.
30. Vaasblock, “Enterprise AI Spending ROI Crisis 2026: $2.59 Trillion and One $500M Bill,” June 2026, citing Axios investigation May 28, 2026.
31. Forrester Consulting, The Total Economic Impact of GitHub Enterprise Cloud, commissioned by GitHub, July 2025.
32. Helen Poitevin, Distinguished VP Analyst, Gartner, quoted in Gartner Newsroom press release, May 5, 2026.
33. GetDX, “AI Coding Assistant Pricing and ROI Guide 2026,” June 2026, citing Troy Gray DX customer data.
34. Software Improvement Group, “The Next Wave of Technical Debt Is Architectural, and AI Is Accelerating It,” citing peer-reviewed study published April 2026 on architectural debt remediation returns.