
The GenAI Divide: Why 95% of Enterprise AI Projects Fail — A Reality Check from MIT NANDA, McKinsey, and Stanford HAI

2026-05-13 · Ryuta Hamamoto

MIT NANDA reports that 95% of enterprise GenAI investments deliver zero ROI, McKinsey says only 6% of companies are AI high performers, and Stanford HAI's 2026 Index puts GenAI adoption at 53% globally. Three flagship reports agree on what is now called the "GenAI Divide." This article unpacks what the divide actually is and lays out a Japan-focused framework — informed by WARP's implementation work — for landing on the winning 6% side.


Hello, this is Hamamoto from TIMEWELL. There are two numbers that come up in almost every customer conversation I have lately: 95% and 6%. The first is the failure rate MIT reported for enterprise GenAI investments. The second is the share of companies McKinsey classifies as "AI high performers." Three flagship reports — landing within months of each other — converged on these numbers, and inside our circles "GenAI Divide" has quietly become an everyday phrase.

The phrase means exactly what it sounds like: a divide created by generative AI. It is the widening gap between companies that have successfully landed AI and those that have not, between teams stuck trying to build everything in-house and teams getting unstuck by working with the right external partners. It is rare for MIT NANDA, McKinsey, and Stanford HAI to reach such similar conclusions at the same time. "Let's wait and see" is no longer a defensible posture. Working day-to-day on WARP engagements, I have watched some companies fall into the divide and others slip past it. Today I want to be honest about what separates the two, by walking through the primary sources.


What three reports agree on: the shape of the GenAI Divide

Let's start with MIT NANDA's "The GenAI Divide: State of AI in Business 2025" [1]. The headline number is now famous — out of roughly 30 to 40 billion dollars in enterprise GenAI investment, 95% has not generated business returns. The methodology behind it is unusually robust: 150 executive interviews, 350 employee surveys, and an analysis of 300 publicly disclosed enterprise AI deployments. Quantitative and qualitative data are stitched together in a way that is hard to dismiss.

The detail people often miss is that "95% failure" does not mean "the AI didn't work." Technically, the models and tools are running. They simply do not move the P&L. The MIT research team calls this state a "learning gap." Most enterprise GenAI systems do not accumulate feedback, do not adapt to the business context they live in, and do not improve over time. So they look operational from a distance, yet they never become a true company capability. That is MIT's core argument.
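
To see what a missing learning loop actually means, consider that the fix is usually nothing more exotic than a log pairing each AI output with what a human did to it afterwards. Here is a minimal sketch in Python of that feedback loop; the FeedbackStore, its fields, and the acceptance-rate metric are my own illustration, not anything specified in the MIT report.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FeedbackRecord:
    """One AI output paired with what a human actually did to it afterwards."""
    task: str            # e.g. "contract_review" (illustrative task label)
    model_output: str
    human_final: str     # the version that actually shipped
    accepted_as_is: bool
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class FeedbackStore:
    """In-memory stand-in for the feedback log most deployments never build."""
    def __init__(self) -> None:
        self.records: list[FeedbackRecord] = []

    def log(self, record: FeedbackRecord) -> None:
        self.records.append(record)

    def acceptance_rate(self, task: str) -> float:
        """Share of outputs used unchanged, a crude signal of whether the system is improving."""
        relevant = [r for r in self.records if r.task == task]
        if not relevant:
            return 0.0
        return sum(r.accepted_as_is for r in relevant) / len(relevant)
```

Without even this minimal loop, "operational from a distance" is exactly what you get: the model runs, but nothing it produces feeds back into how it is prompted, routed, or retrained.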

Stanford HAI's "2026 AI Index Report" looks at the same landscape from another angle [2]. Generative AI reached 53% of the global population in three years, outpacing the diffusion curves of personal computing and the public internet. Organizational adoption sits at 88%, and 70% of organizations now use GenAI in at least one business function. The pace of adoption is no longer a question. Yet agentic AI in production stays in the single-digit percentages across most functions. Widely used, but not deeply used. That is what Stanford observes.

McKinsey's "The State of AI: How organizations are rewiring to capture value" layers a financial lens on top [3]. Only 6% of companies see GenAI contributing more than 5% to EBIT. Another 39% sense some contribution but mostly below 5%, and more than half see no meaningful EBIT impact at all. The world is uniformly experiencing a situation where adoption climbs and yet very few organizations turn it into profit.

When you line up the three reports, the data sources and the framings differ, but the conclusions almost overlap. The barrier to adoption is gone. The gap that has opened up sits downstream — in the work of becoming an organization that can actually generate outcomes after deployment. The reason "GenAI Divide" is starting to gain currency in Japan, too, is that the phrase fits what people are seeing on the ground.



The four root causes of 95% failure — MIT's findings meet what we see on the ground

So why are so many companies unable to escape the zero-ROI zone? The MIT report points to four structural causes [1]. They map very closely to the failure patterns I have watched play out, so let me work through each with examples from the field.

The first is the limits of in-house build. According to MIT, AI deployments run with specialized external vendors succeed 67% of the time, while in-house builds succeed only 33%. The gap is a factor of two. In Japan, renue's 2026 report has done a clean job mapping the "PoC fatigue" pattern that follows from over-investing in internal pilots that never make it into production [4]. I once worked alongside a company that put three or four of its strongest engineers on an internal build. The evaluation criteria stayed vague, the team could not get the system into production, and after six months the project quietly collapsed. The gap between running an AI model and embedding it into business operations is deeper than most executives expect.

The second is the spread of "shadow AI." MIT reports that only about 40% of employees use the AI tools their company has officially licensed, while 90% use consumer-grade tools like ChatGPT in their daily work [1]. While the executive dashboard still shows pilots on track, the front line has long since moved on, often pasting confidential information into chatbots along the way. The gap between the project tracker leadership is reading and the reality on the ground is enormous at a surprising number of companies.

The third is misallocated investment. MIT finds that more than half of enterprise AI budgets are concentrated on the visible, outward-facing functions like sales and marketing. Yet the ROI tends to show up in back-office automation — contract review, procurement, risk management, and other unglamorous areas [1]. NTT DATA's analysis of manufacturing PoCs makes the same point, framing it not as a "precision problem" but as a "structural mismatch" [5]. Pointing AI at "where cost concentrates" instead of "where audiences applaud" is a basic management instinct, but in practice fewer companies do it than you would think.

The fourth is fragile workflows and weak organizational scaffolding. No matter how accurate the AI itself is, if the surrounding workflow, permission model, and data governance are not in place, the deployment breaks within weeks. Most of the failure cases I have seen stalled exactly here. The technology team builds a strong model, the business team refuses to change the approval flow, and the company ends up with double work — AI drafts a document, then a human rewrites every line. When METI's AI Business Operator Guidelines version 1.2 explicitly required human-in-the-loop oversight for autonomous AI agents, this risk of ungoverned execution was clearly part of the motivation [6].
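
The human-in-the-loop requirement is easy to state and easy to skip, so here is a hedged sketch of the core pattern: an approval gate that routes high-impact agent actions to a person before anything executes. The AgentAction shape, the impact score, and the 0.5 threshold are hypothetical choices of mine, not taken from the METI guidelines.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentAction:
    name: str      # e.g. "send_purchase_order" (hypothetical action name)
    impact: float  # 0.0 (trivial) .. 1.0 (irreversible), scored upstream
    payload: dict

def execute_with_oversight(
    action: AgentAction,
    run: Callable[[AgentAction], None],
    ask_human: Callable[[AgentAction], bool],
    impact_threshold: float = 0.5,
) -> bool:
    """Run low-impact actions directly; route the rest to a human approver first.

    Returns True if the action was executed, False if a human rejected it.
    """
    if action.impact >= impact_threshold and not ask_human(action):
        return False  # rejected: stop here, never execute silently
    run(action)
    return True
```

The design point is that the gate sits in the execution path itself, not in a policy document: an agent physically cannot take a high-impact action without a recorded human decision.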

What strikes me when I look at these four root causes together is that none of them are about model performance. You can buy the smartest model on the market, but if the organization around it is not configured, you cannot draw out its capability. MIT's conclusion, distilled, is this: the bottleneck is not the model — it is the learning process and the organization.


What the winning 6% are quietly doing right

Now let's flip the lens. The traits that McKinsey, Stanford HAI, and BCG all flag for the "winning side" are strangely consistent. The deciding factors are not surface-level tool choices or budget size — they are mundane, organizational, and operational.

The first is doing workflow redesign for real. McKinsey compared 25 organizational variables and concluded that "fundamental workflow redesign" correlates most strongly with EBIT impact [3]. Yet only 21% of GenAI-using organizations are actually doing it. Most companies place generative AI on top of an unchanged process and call it a day. That only buys you a linear improvement. The winning 6%, by contrast, are reconstructing the steps of the work itself around the assumption that GenAI exists.

The second is CEOs owning governance. McKinsey reports that EBIT contribution from GenAI is greatest in organizations where the CEO directly oversees AI governance [3]. The pattern is especially pronounced in companies above $500 million in revenue. Yet only 28% of companies actually have the CEO accountable for AI governance. BCG's "AI Radar 2026" lands in the same place — 62% of Trailblazer CEOs expect to materially redesign their organizational structure within four years [7]. When AI is parked as "the CIO's project" or "the CDO's project," cross-functional redesign never gets unlocked. That is a heavy point for Japanese companies in particular.

The third is running reskilling and role redesign simultaneously. According to BCG, Trailblazer companies invest roughly 60% of their AI budget into reskilling and rehiring. Pragmatist companies sit at 27% and Followers at 24%, a gap of roughly two and a half times [7]. The downstream result is that 70% of employees at Trailblazer companies have been reskilled for AI, versus 41% at Pragmatists and 35% at Followers. With BCG separately projecting that 50 to 55% of US jobs will be reconfigured by AI within the next two to three years [8], reskilling is no longer a training-budget line item. It is a strategic decision in its own right.

The fourth is using external partners well. MIT's "67% external versus 33% internal" success rates are not a case for outsourcing everything. The companies actually winning combine internal teams with external partners and design knowledge transfer into the contract from day one. Anthropic's announcement in May 2026 that it was launching an enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs is symbolic of this shift [9]. As CFO Krishna Rao put it, enterprise demand for Claude has outgrown what a single delivery model can support — even the frontier model labs are spinning up separate organizations dedicated to delivering implementations into the field.

All four points are about people, organization, and operations. None of them are about model selection. That is where the essence of the GenAI Divide sits.


"Call WARP before in-house gets stuck" — a three-month reset framework

This section gets a little more TIMEWELL-specific. Through WARP — our AI consulting service — we run a framework with customers that addresses these four failure patterns concretely. The full service overview lives on the WARP consulting page, but the core idea is a three-month reset. Rather than spending a year designing the ideal future state, we deliver a realistic restart in three months.

Month one is dedicated to a full inventory. We surface shadow AI usage, the wreckage of past pilots, the actual workflows, the location of data, the skills of people, and the KPIs being used. We make the "learning gap" MIT describes visible in the customer's own business context. Many customers tell us this is the first time they have ever articulated their AI situation in this much detail. Which is another way of saying: many companies have spent three years pushing AI forward without doing this groundwork.

Month two is about focus — choosing where to point AI. Because investment is almost always over-concentrated in sales and marketing, we start by identifying back-office and specialist domains where cost is concentrated and decision rules can be made explicit. In the engagements we run, the recurring candidates are contract review, export controls, knowledge search, internal help-desk traffic, and monthly close operations. In parallel, we redesign the workflow and data governance so that AI has solid ground to operate on. The human-in-the-loop design METI's Guidelines 1.2 calls for is built in at this stage [6].

Month three is implementation and education running in parallel. We typically anchor the build on ZEROCK, TIMEWELL's internal knowledge platform, combining GraphRAG with task-specific agents. Data stays on Japan-domestic servers, and operational know-how accumulates as structured knowledge. At the same time, we run reskilling workshops for front-line staff. The reason BCG argues for putting 60% of the AI budget into reskilling is simple: skip this step and the technology investment ends up spinning in place because no one can actually use it [7].
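
For readers who want a feel for what "GraphRAG combined with task-specific agents" means in practice, here is a deliberately simplified sketch, not ZEROCK's actual implementation: pull the subgraph of knowledge around the entities in a query, then hand that grounded context to the agent that owns the task. Every node, relation, and task name below is illustrative.

```python
import networkx as nx

def graphrag_context(kg: nx.Graph, query_entities: list[str], hops: int = 1) -> list[str]:
    """Collect facts within `hops` of the query entities: the 'graph' part of GraphRAG."""
    facts: list[str] = []
    for entity in query_entities:
        if entity not in kg:
            continue
        neighborhood = nx.ego_graph(kg, entity, radius=hops)
        for u, v, data in neighborhood.edges(data=True):
            facts.append(f"{u} -[{data.get('relation', 'related_to')}]-> {v}")
    return facts

def answer(kg: nx.Graph, task: str, query: str, entities: list[str]) -> str:
    """Route a grounded prompt to a task-specific agent (here, just an instruction prefix)."""
    agents = {  # task-specific instructions; the tasks mirror the candidates above
        "contract_review": "Flag clauses that deviate from our standard terms.",
        "export_control": "Check the listed items against export restrictions.",
    }
    context = "\n".join(graphrag_context(kg, entities))
    # A real build would send this to an LLM; the sketch stops at the grounded prompt.
    return f"[{task}] {agents.get(task, 'Answer using only the context.')}\nContext:\n{context}\nQ: {query}"

kg = nx.Graph()
kg.add_edge("Acme Corp", "Standard NDA v3", relation="uses_template")
print(answer(kg, "contract_review", "Does the Acme draft match our template?", ["Acme Corp"]))
```

The point of the graph step is that the agent sees relationships, not just similar-looking text chunks, which is what lets operational know-how accumulate as structured knowledge rather than a pile of documents.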

"Call WARP before in-house gets stuck" is something we say to each other a lot. It is not just a slogan — it is MIT's "33% internal versus 67% external" statistic translated into how we operate. Not full outsourcing, not full in-house, but a defined period of external partnership that bakes knowledge transfer into the design. Because our goal is to leave customers self-sufficient at the three-month mark, we deliver consulting, implementation, and training as one engagement. If this resonates, please feel free to reach out via /contact?product=warp.


What partnership-based adoption looks like — the other divide WARP keeps seeing

I want to surface one more thing from the field. The GenAI Divide is not only an outward-facing split between companies; it is also an inward-facing split inside a single company.

I recently worked with a mid-sized firm where leadership was confident they were "not behind on AI." They had multiple GenAI tool contracts and several active pilots. But once we did interviews on the ground, the picture changed: sales were paying for ChatGPT out of their own pockets, engineering was on GitHub Copilot, the corporate functions had nothing they could use, and the governance team had written a policy that nobody had read. The phenomenon MIT calls "shadow AI" was unfolding inside the same company in different forms in each department.

The right move in that situation is not tighter governance. The harder you clamp down, the deeper the front line goes underground. In WARP engagements, we treat shadow AI not as an enemy but as a signal — a set of hints the front line has already discovered about where the work needs help — and we organize them as candidates for legitimate deployment. METI's guidelines, too, are moving from outright prohibition toward "manage the risk while using it." Nikkei Cross Tech's coverage of the 1.2 revision made clear that the central concern is designing in human judgment alongside autonomous AI decisions [10]. The mental shift required is from "ban" to "design."
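
Treating shadow AI as a signal can start with plain counting. Here is a toy sketch with made-up survey rows: tally where unsanctioned usage clusters, because each cluster marks a workflow the front line has already voted needs help.

```python
from collections import Counter

# Hypothetical interview rows: (department, tool, task it was used for)
shadow_usage = [
    ("sales", "ChatGPT", "proposal drafting"),
    ("sales", "ChatGPT", "meeting summaries"),
    ("engineering", "GitHub Copilot", "code review prep"),
    ("legal", "ChatGPT", "contract summarization"),
    ("sales", "ChatGPT", "proposal drafting"),
]

def deployment_candidates(usage, min_count: int = 2):
    """Rank (department, task) pairs by how often the front line already reaches for AI there."""
    counts = Counter((dept, task) for dept, _tool, task in usage)
    return [(f"{dept}: {task}", n) for (dept, task), n in counts.most_common() if n >= min_count]

print(deployment_candidates(shadow_usage))  # [('sales: proposal drafting', 2)]
```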

The other pattern I feel strongly about is that Japanese companies tend to separate reskilling and workflow redesign. Training sits with HR, workflow redesign sits with the business units, AI deployment sits with IT, and nobody truly owns the whole. BCG's point is that AI Trailblazer companies put all three around the same table. McKinsey says the same in different words: only when the CEO becomes the ultimate owner of governance can the CFO, CTO, and CHRO finally have a horizontal conversation about it [3]. WARP's role, for three months, is to be the external party that holds that "same table" together.

I want to be straightforward. We cannot solve every problem at every company. Where there is strong internal conviction and a strong internal champion, many customers do just fine deploying products like ZEROCK or BASE on their own. Where there is no internal champion, where the organization is exhausted from PoC fatigue, where leadership and the front line are living in entirely different realities — that is where WARP earns its keep. If you are weighing "which side are we on," a 30-minute conversation is the place to start. Even just using it as time to map the MIT and McKinsey numbers onto your own reality should be worth the slot.


Don't get stuck in-house — the next step with WARP

The GenAI Divide is no longer a question of "is it coming." The split is already open. The question is which side of it your company will land on. MIT NANDA's 95%, McKinsey's 6%, Stanford HAI's 53%, BCG's 70% — these numbers all describe the same reality from different angles. The metric that will define corporate value going forward is not "have you adopted AI" but "have you changed the P&L and the organization with AI."

WARP exists to accompany that decision and that implementation. We are not a company that sells AI models or a company that sells tools. We are a team that takes responsibility all the way to where AI actually bites into your operations and your organization. Full service detail is on the WARP consulting page, and the inquiry form is at /contact?product=warp. The first 30-minute consultation is free. Use it as the first step in translating MIT's and McKinsey's numbers into your own reality.


References

[1] MIT NANDA, "The GenAI Divide: State of AI in Business 2025" https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf

[2] Stanford HAI, "The 2026 AI Index Report" https://hai.stanford.edu/ai-index/2026-ai-index-report (PDF: https://hai.stanford.edu/assets/files/ai_index_report_2026.pdf)

[3] McKinsey & Company, "The state of AI: How organizations are rewiring to capture value" https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai

[4] renue, "Twelve GenAI PoC Failure Patterns and Seven Principles for Production Migration 2026" (Japanese) https://renue.co.jp/posts/generative-ai-poc-failure-patterns-production-transition-2026

[5] NTT DATA DATA INSIGHT, "Why Manufacturing AI Stalls at PoC — Not a 'Precision Problem' but a 'Structural Mismatch'" (Japanese) https://www.nttdata.com/jp/ja/trends/data-insight/2026/0501/

[6] METI and MIC, "AI Business Operator Guidelines (Version 1.2)" (March 31, 2026) (Japanese) https://www.meti.go.jp/shingikai/mono_info_service/ai_shakai_jisso/20260331_report.html

[7] BCG, "AI Transformation Is a Workforce Transformation" (2026) https://www.bcg.com/publications/2026/ai-transformation-is-a-workforce-transformation

[8] BCG, "AI Will Reshape More Jobs Than It Replaces" (2026) https://www.bcg.com/publications/2026/ai-will-reshape-more-jobs-than-it-replaces

[9] Anthropic, "Enterprise AI Services Company" (May 2026) https://www.anthropic.com/news/enterprise-ai-services-company

[10] Nikkei Cross Tech, "Government Revises AI Business Operator Guidelines to Require Human Judgment in Autonomous AI Execution" (Japanese) https://xtech.nikkei.com/atcl/nxt/column/18/00001/11580/

[11] Fortune, "MIT report: 95% of generative AI pilots at companies are failing" https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/
