The ROI of AI: The Risk You're Not Pricing

The missing variable in AI ROI calculations
The statistics on AI project failure have become familiar enough to feel like settled fact. Gartner predicted that through 2025, at least 30% of generative AI projects would be abandoned after proof of concept. Various industry surveys put the failure rate for enterprise AI initiatives at 80% or higher. The pattern is real—organisations are pouring resources into AI and frequently not seeing the returns they projected.
The standard explanations focus on execution problems. The data wasn't ready. The use case wasn't well-defined. The organisation lacked technical talent. Leadership didn't provide adequate sponsorship. These explanations aren't wrong, and any experienced AI practitioner can point to projects that failed for exactly these reasons.
But how exactly are we measuring success and failure in the first place? What if the problem isn't primarily about execution, but about the math we're using to justify these projects?
The calculation everyone is running
The typical AI business case follows a familiar structure. You project the benefits—efficiency gains, cost reductions, revenue uplift, competitive differentiation. You estimate the costs—infrastructure, talent, vendor fees, integration work, ongoing maintenance. Subtract costs from benefits, and you get your expected ROI. If the number looks good, the project gets funded.
What's absent from this calculation is any systematic accounting for risk.
The implicit assumption is that risks either won't materialise or can be handled reactively when they do. This is not how we evaluate other major capital investments. Construction projects carry contingency budgets. Financial instruments are priced for risk. Pharmaceutical development accounts for trial failure rates at each stage. In those domains, risk is a first-class variable in investment decisions—not a footnote that gets waved away.
It turns out that AI is being treated as a software project when it should be treated as a risk-bearing asset. And when you leave a significant variable out of your calculations, you shouldn't be surprised when your predictions don't match reality.
The "just add AI" fallacy
Part of the problem is a mental model of AI implementation that dramatically understates its complexity. The model goes something like this: call an API, get predictions back, integrate them into your product, ship the feature. The demo worked beautifully, so production should be straightforward.
The reality is that the model itself—the thing that gets all the attention—represents perhaps 10% of what you're actually building. The other 90% consists of data pipelines to feed the model, monitoring infrastructure to detect when it's failing, retraining workflows for when performance degrades, evaluation frameworks to measure whether it's working, fallback mechanisms for when it doesn't, and versioning systems to track what's running where.
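To put rough numbers on that split, a back-of-the-envelope tally might look like the sketch below. The engineer-day figures are hypothetical, chosen only to make the arithmetic concrete; the component names come straight from the list above.

```python
# Back-of-the-envelope effort split for a production AI feature.
# The engineer-day figures are hypothetical, for illustration only.
effort_days = {
    "model development": 20,          # the part that gets the attention
    "data pipelines": 40,
    "monitoring infrastructure": 30,
    "retraining workflows": 25,
    "evaluation frameworks": 25,
    "fallback mechanisms": 20,
    "versioning and deployment": 20,
}

total = sum(effort_days.values())
model_share = effort_days["model development"] / total
print(f"Total: {total} engineer-days")
print(f"Model itself: {model_share:.0%}; everything around it: {1 - model_share:.0%}")
```

With these assumed figures the model accounts for roughly a tenth of the effort, which is the point of the exercise: the exact numbers vary by project, the shape of the split rarely does.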
Google's researchers documented this pattern back in 2015 in a paper called "Hidden Technical Debt in Machine Learning Systems." The core insight remains underappreciated a decade later: the bread-and-butter work of keeping an AI system running in production dwarfs the glamorous work of building the model in the first place.
The skills required to maintain AI systems over time are genuinely different from traditional software engineering. Models degrade as the world changes around them. Data distributions drift. Edge cases that never appeared in training show up in production. The maintenance burden isn't a one-time cost but an ongoing operational commitment—and organisations that staff AI projects like traditional software projects discover this mismatch the hard way.
What we mean when we say "risk"
If we're going to account for risk in AI investment decisions, we need to be specific about what kinds of risk we're talking about.
Technical risks are often the proximate cause of failure. Model performance in production rarely matches benchmark performance, because benchmarks are clean and production environments aren't. Data quality issues that seemed manageable in the proof of concept become blocking problems at scale. Integration with existing systems turns out to be harder than anticipated—AI isn't a microservice you slot into an architecture without consequences. And there's vendor dependency: the model you built your system around can be changed, deprecated, or repriced in ways that break your business case entirely.
Operational risks follow from the fact that AI systems fail differently than traditional software. A conventional bug produces an error message or crashes. An AI system can fail silently, producing outputs that look plausible but are wrong. The confident hallucination is a failure mode that traditional software doesn't have, and it requires different monitoring approaches to detect. Many organisations lack the observability infrastructure to know when their AI systems are underperforming—which means they discover failures through downstream effects rather than proactive detection.
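One monitoring approach that helps here is comparing the distribution of model outputs in production against a reference window from when the system was known to be healthy. Below is a minimal sketch using a population stability index over logged confidence scores; the score distributions and the 0.2 threshold are illustrative assumptions, not a prescription.

```python
import numpy as np

def population_stability_index(reference, production, bins=10):
    """Crude drift signal: how far the production score distribution
    has shifted from a healthy reference window. Higher means more drift."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    prod_pct = np.histogram(production, bins=edges)[0] / len(production)
    # Guard against empty buckets before taking logs.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    prod_pct = np.clip(prod_pct, 1e-6, None)
    return float(np.sum((prod_pct - ref_pct) * np.log(prod_pct / ref_pct)))

# Hypothetical confidence scores: the reference window from launch,
# and this week's production traffic, which has quietly drifted.
rng = np.random.default_rng(0)
reference_scores = rng.beta(8, 2, size=5000)
production_scores = rng.beta(5, 3, size=5000)

psi = population_stability_index(reference_scores, production_scores)
print(f"PSI = {psi:.2f}")
if psi > 0.2:  # common rule-of-thumb threshold; tune it for your system
    print("Output distribution has shifted; investigate before users notice.")
```

The specific statistic matters less than having one at all: a silent failure only stays silent if nothing is watching the outputs.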
Regulatory risks are increasing as AI-specific legislation comes into force. The bulk of the EU AI Act's obligations apply from August 2026, with penalties of up to 7% of global annual turnover for the most serious violations. High-risk AI systems require conformity assessments, technical documentation, and ongoing monitoring. Most organisations have not priced this regulatory exposure into their AI business cases—which means they're carrying risk they haven't accounted for.
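Pricing that exposure doesn't require sophistication: even a crude expected-cost estimate puts a number where there was previously a blank. The turnover, probability, and penalty figures below are hypothetical.

```python
# Hypothetical figures for illustration only.
global_annual_turnover = 500_000_000   # EUR
max_penalty_fraction = 0.07            # up to 7% of turnover for the most serious violations
p_serious_violation = 0.02             # your own estimate of the chance of a serious breach

expected_regulatory_cost = p_serious_violation * max_penalty_fraction * global_annual_turnover
print(f"Expected regulatory exposure: EUR {expected_regulatory_cost:,.0f}")
# With these assumptions: EUR 700,000, a line item most AI business cases currently omit.
```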
Reputational risks deserve separate attention because AI failures attract disproportionate public interest. A biased hiring algorithm, a chatbot that produces harmful content, a recommendation system that surfaces inappropriate material—these generate headlines in ways that traditional software bugs typically don't. One high-profile AI failure can cost more in brand damage and customer trust than the entire projected benefit of the initiative.
The compounding effect is what makes AI risk particularly hard to manage. A technical failure triggers an operational incident. The incident attracts regulatory scrutiny. The investigation generates press coverage. The coverage damages customer trust. What started as a model performance problem becomes a multi-dimensional crisis. These risks don't just add together—they amplify each other.
A different way to do the math
The concept of risk-adjusted returns isn't novel. In finance, it's standard practice to evaluate investments not just by expected returns but by risk-adjusted returns. A high-expected-return investment with high volatility may be worse than a moderate-expected-return investment with lower volatility. The Sharpe ratio exists precisely to make this comparison rigorous.
The same logic applies to AI investments. A project with high expected ROI but high risk variance may actually have negative risk-adjusted ROI once you account for the probability and impact of failure scenarios. Conversely, a project with moderate expected ROI but well-understood and mitigated risks may be the better investment—even though its headline number is less impressive.
Calculating risk-adjusted ROI forces you to be explicit about what could go wrong. You identify the risk scenarios, estimate their probability, estimate their impact if they materialise, and incorporate those weighted costs into your calculation. This is harder than ignoring risk, which is precisely why it's valuable.
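Here is a minimal sketch of what that calculation can look like. The scenario names, probabilities, and figures are hypothetical placeholders; the structure of the calculation is the point.

```python
# Risk-adjusted ROI sketch. All figures are hypothetical placeholders.
expected_benefit = 2_000_000   # projected annual value of the AI feature
build_and_run_cost = 900_000   # infrastructure, talent, vendors, maintenance

# Each risk scenario: (probability of materialising, cost if it does).
risk_scenarios = {
    "production accuracy below business threshold": (0.30, 800_000),
    "integration overrun":                          (0.40, 300_000),
    "vendor model change breaks workflow":          (0.15, 400_000),
    "regulatory remediation required":              (0.10, 600_000),
    "reputational incident":                        (0.05, 1_500_000),
}

risk_weighted_cost = sum(p * impact for p, impact in risk_scenarios.values())

naive_roi = (expected_benefit - build_and_run_cost) / build_and_run_cost
adjusted_roi = (expected_benefit - build_and_run_cost - risk_weighted_cost) / (
    build_and_run_cost + risk_weighted_cost
)

print(f"Risk-weighted cost of failure scenarios: {risk_weighted_cost:,.0f}")
print(f"Naive ROI:         {naive_roi:.0%}")
print(f"Risk-adjusted ROI: {adjusted_roi:.0%}")
```

How you fold the risk-weighted cost into the denominator is a modelling choice, but the direction of the adjustment is what matters: with these assumed figures, a project that looks like a 122% return on the naive calculation shrinks to well under half of that once failure scenarios carry their weighted costs.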
Two outcomes, both of them wins
When you apply rigorous risk quantification to an AI initiative, you get one of two outcomes.
The first is that you kill the project early. Risk-adjusted ROI comes back negative—the expected benefits don't justify the risk-weighted costs. This feels like failure, but it's actually success: you've avoided sunk costs, freed resources for better opportunities, and learned something about your risk appetite. The failure happened on paper, where it's cheap, rather than in production, where it's expensive.
The second is that you derisk the project and proceed. Risk quantification identifies specific failure modes. You implement mitigations that reduce either the probability of those failures or their impact if they occur. The mitigations have costs, but they improve your risk-adjusted ROI by narrowing the variance of outcomes. You proceed with eyes open, and your actual ROI converges toward your expected ROI because you've addressed the factors that cause projects to underperform.
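Continuing in the same spirit of hypothetical numbers, the effect of a single mitigation on the business case can be made explicit: it is worth paying for when the reduction in risk-weighted cost exceeds its price.

```python
# Effect of one mitigation, using hypothetical figures.
# Before: 30% chance that production accuracy misses the business threshold, costing 800k.
risk_before = 0.30 * 800_000                      # 240,000 risk-weighted cost

# Mitigation: a pre-launch evaluation harness on production-like data, costing 60k,
# assumed to cut the probability of that failure mode to 10%.
mitigation_cost = 60_000
risk_after = 0.10 * 800_000                       # 80,000 risk-weighted cost

net_improvement = (risk_before - risk_after) - mitigation_cost
print(f"Net improvement to the risk-adjusted business case: {net_improvement:,.0f}")
# With these assumptions: 100,000 in expectation, before the system ever ships.
```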
The only losing move is proceeding without quantification—discovering risks as they materialise and absorbing costs that could have been avoided.
What this looks like in practice
For technical risks, it means setting realistic performance expectations based on production conditions rather than benchmark results. It means mapping integration complexity before committing to timelines. It means estimating the ongoing maintenance burden rather than treating deployment as the finish line.
For operational risks, it means analysing failure modes specific to AI systems and designing monitoring that can detect them. It means identifying skill gaps and either filling them or acknowledging them as risks. It means building incident response playbooks that account for the ways AI fails differently.
For regulatory risks, it means mapping which obligations apply to your specific systems and what the penalty exposure is for non-compliance. It means understanding documentation requirements before you've accumulated months of technical debt.
The output is a business case where risk costs appear as explicit line items rather than omissions. The conversation about whether to proceed becomes a conversation about whether the risk-adjusted ROI justifies the investment—which is the right conversation to have.
Looking forward
As AI regulation matures and the EU AI Act reaches full applicability, the cost of unquantified regulatory risk increases. As AI ambitions grow and organisations attempt more complex implementations, technical and operational risks grow with them. The environment is becoming less forgiving of risk-blind investment.
The "90% failure rate" narrative isn't a fact about AI as a technology. It's a fact about how organisations have been making AI investment decisions. The technology works. The implementations fail because the business cases that justified them were incomplete.
Will the next wave of AI investment be governed by better math? I don't know. But the organisations that build risk-adjusted thinking into their AI governance now—rather than after the next expensive lesson—will have a structural advantage as the market matures. That much seems clear.