Hello, this is Hamamoto from TIMEWELL.
The debate over "should we invest in generative AI" feels stale to me. The real question is quantitative: how much do you pay for which model, where do you deploy it, and when does the money come back. McKinsey estimates the annual economic value of generative AI at USD 2.6 to 4.4 trillion[^1], and Goldman Sachs projects a 7% lift to global GDP over a decade[^2]. Meanwhile, MIT NANDA's July 2025 study found that, despite USD 30-40 billion in enterprise investment, 95% of organizations have seen no measurable P&L impact[^3]. The gap between expectation and execution is wider than for any technology I have watched in twenty years.
This is the fourth installment in our deep-dive AI model comparison series. This time, we use disclosed numbers from Klarna, Microsoft, Salesforce, KPMG, JPMorgan and others to break ROI down into computable units. By the end, you should be able to see the conditions under which your AI investment lands on the winning 5% side.
How to Calculate Generative AI ROI: Time, Cost, and Revenue as Three Axes
ROI debates almost always go off the rails right here. "Productivity went up" will not survive a CFO review. Sit down next to your finance lead and split the calculation across three axes.
The first is time savings. GitHub's research found that developers using Copilot completed the same task in 1 hour 11 minutes, versus 2 hours 41 minutes without it - a 55% reduction (95% CI 21-89%, p=0.0017)[^4]. Forrester's TEI study of Microsoft 365 Copilot reports an average of 9 hours saved per knowledge worker per month[^5]. If 1,000 white-collar staff at JPY 5,000 per hour each save 9 hours a month, that is JPY 540 million in equivalent labor cost per year. Only at this point does "productivity" start speaking the language of the boardroom.
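Writing the calculation above as a small function lets finance plug in its own headcount and rates. This is a minimal sketch: the inputs are the illustrative figures from the paragraph above, and the optional `utilization` factor (the share of freed-up hours actually redeployed to productive work) is our own assumption, not something the Forrester study reports.

```python
def annual_time_value(headcount: int, hourly_cost: float,
                      hours_saved_per_month: float,
                      utilization: float = 1.0) -> float:
    """Annual labor-cost equivalent of time saved by an AI tool.

    utilization is a hypothetical haircut: the fraction of freed-up hours
    that is actually redeployed to productive work (1.0 = all of it).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return headcount * hourly_cost * hours_saved_per_month * 12 * utilization


# Article example: 1,000 staff, JPY 5,000/hour, 9 hours saved per month
print(f"JPY {annual_time_value(1_000, 5_000, 9):,.0f}")       # JPY 540,000,000
# Same case if only half of the saved time is redeployed
print(f"JPY {annual_time_value(1_000, 5_000, 9, 0.5):,.0f}")  # JPY 270,000,000
```

A finance lead will often push for a `utilization` below 1.0, since minutes saved in scattered fragments across the day rarely convert one-for-one into output.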
The second is cost reduction. This shows up most clearly in functions where labor cost is already visible, such as accounts payable or customer service. Industry surveys on invoice automation report per-document cost dropping from USD 12-30 to USD 1-5, with first-year ROI of 200-600%[^6]. In customer support, ticket cost has compressed from USD 8-15 with humans to USD 0.5-0.7 with AI in published cases[^7].
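Per-unit figures only become a budget line once they are multiplied by volume. The sketch below assumes a hypothetical 100,000 invoices a year and a hypothetical USD 300,000 first-year cost (licenses plus integration); only the per-invoice costs are taken from the ranges above. ROI here means net benefit divided by cost, which is one common definition among several.

```python
def annual_unit_savings(volume: int, cost_before: float, cost_after: float) -> float:
    """Annual savings from lowering the cost of each processed unit."""
    return volume * (cost_before - cost_after)


def simple_roi(benefit: float, cost: float) -> float:
    """Net return divided by cost: 2.0 means 200%."""
    return (benefit - cost) / cost


# Hypothetical inputs: 100,000 invoices/year, USD 15 -> USD 3 per invoice
savings = annual_unit_savings(100_000, 15.0, 3.0)  # USD 1,200,000
first_year_cost = 300_000.0                        # hypothetical total cost
print(f"savings USD {savings:,.0f}, ROI {simple_roi(savings, first_year_cost):.0%}")
# savings USD 1,200,000, ROI 300%
```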
The third is revenue contribution. Forrester's TEI noted a 2.5 percentage point lift in win rate after Microsoft 365 Copilot deployment, equivalent to USD 14.8 million in net profit[^5]. BCG's banking case study showed a 40% reduction in false positives for fraud detection and a 20% drop in KYC cost[^8]. Defensive automation routinely turns into offensive revenue.
If you are talking NPV and IRR, three years is the realistic evaluation horizon. AI value typically takes 18-24 months to ramp fully after deployment, and a 12-month cutoff will kill projects that lose money in year one. Stress-test with discount rates of 8-15% and present pessimistic, base, and optimistic scenarios - that is what tends to clear committees today[^9]. When I model these in the field, I stack the value across three layers: labor cost reduction (40-60%), operational efficiency (25-35%), and strategic impact (15-25%).
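To make the three-year view concrete, here is a minimal NPV sketch. The cash flows are hypothetical (an up-front investment at year 0, then net benefits that ramp over years 1-3, roughly in line with the 18-24 month ramp described above); only the three-year horizon, the 8-15% discount range, and the three-scenario structure come from the text. IRR is not computed here.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value; cash_flows[0] occurs today (year 0), undiscounted."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical cash flows in JPY millions: year-0 investment, then yearly net benefit.
# Year 1 is at or near break-even because value is still ramping up.
scenarios = {
    "pessimistic": [-100, -20, 40, 80],
    "base":        [-100, -10, 70, 120],
    "optimistic":  [-100, 10, 100, 160],
}

for name, flows in scenarios.items():
    row = ", ".join(f"{r:.0%}: {npv(r, flows):7.1f}" for r in (0.08, 0.10, 0.15))
    print(f"{name:<12} {row}")
```

With these made-up numbers the base case stays positive across the range while the pessimistic case turns negative at 15%, which is exactly the kind of sensitivity a committee wants to see. Cutting the same flows off after year 1 shows a loss in every scenario, which is why a 12-month evaluation window is so dangerous.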
ROI by Major Model: How to Use Claude, GPT, and Gemini in Combination
Let us drop into the specific models. Pricing is based on disclosed information as of April 2026, expressed as input/output cost per million tokens.
Anthropic's Claude Sonnet 4.5 is USD 3 / USD 15[^10]. OpenAI's GPT-5.5 is USD 5 / USD 30, with the Pro tier jumping all the way to USD 30 / USD 180[^11]. Google's Gemini 2.5 Pro is the cheapest at USD 1.25 / USD 10 for 200K tokens or below[^12]. On surface unit price alone, Gemini wins outright - but stopping there will lead you to the wrong decision.
Effective unit cost is governed by three factors: prompt caching (Anthropic offers up to 90% off, Google is comparable), the Batch API (50% off on OpenAI and Anthropic), and the number of tokens needed to complete the same task. OpenAI has stated that GPT-5.5 returns equivalent answers using 40% fewer tokens than the previous generation, so the headline 100% price increase translates to roughly 20% in real terms[^11]. On coding tasks, Claude Opus 4.5 scores 80.9% on SWE-bench Verified, ahead of GPT-5.1 and Gemini 3 Pro[^13]. Higher accuracy means fewer retries, which means fewer tokens consumed.
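The interplay of list price, caching, batching, and token efficiency is easy to model. This is a minimal sketch under a simplified billing model we are assuming: cache-hit input tokens are billed at 10% of the input price (the 90% discount cited above), cache-write surcharges are ignored, and batching halves the whole bill with the two discounts stacking. Real provider rules differ in detail, and the workload size is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


def job_cost(price: Price, input_tokens: int, output_tokens: int,
             cache_hit_ratio: float = 0.0, cache_discount: float = 0.9,
             batch: bool = False) -> float:
    """Approximate USD cost of a workload under simplified billing rules.

    Assumptions: cache-hit input tokens cost (1 - cache_discount) of the normal
    input price, cache-write surcharges are ignored, and batch mode halves the
    entire bill (i.e. the two discounts stack).
    """
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    cost = (uncached + cached * (1 - cache_discount)) * price.input_per_m / 1e6
    cost += output_tokens * price.output_per_m / 1e6
    return cost * (0.5 if batch else 1.0)


sonnet = Price(input_per_m=3.0, output_per_m=15.0)  # Claude Sonnet 4.5 list price
# Hypothetical monthly workload: 200M input tokens, 20M output tokens
IN_TOK, OUT_TOK = 200_000_000, 20_000_000
print(f"list price:     USD {job_cost(sonnet, IN_TOK, OUT_TOK):,.0f}")                   # USD 900
print(f"80% cache hits: USD {job_cost(sonnet, IN_TOK, OUT_TOK, 0.8):,.0f}")              # USD 468
print(f"cache + batch:  USD {job_cost(sonnet, IN_TOK, OUT_TOK, 0.8, batch=True):,.0f}")  # USD 234

# Token efficiency: list price doubles, but each task needs 40% fewer tokens
relative_cost = 2.0 * (1 - 0.40)
print(f"real cost change: {relative_cost - 1:+.0%}")  # +20%
```

In this hypothetical workload, caching plus batching brings the bill to about 26% of list price. How far a real workload gets depends mostly on how much of each prompt is a reusable prefix and how much of the traffic can tolerate batch latency.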
In my hands-on experience, Gemini wins on unit price for general document generation and RAG, while Claude wins on per-hour economics for code generation and long-running agent workflows that require sustained reasoning. GPT-5.5 today is chosen mostly for its ChatGPT Enterprise integration and surrounding ecosystem; on pure API ROI it sits in the middle of the pack. Anthropic's own research reports that its engineers use Claude in about 60% of their work, with a productivity gain of roughly 50%[^14].
That said, debating ROI on the model level alone feels dated. Anthropic says it has more than 500 customers paying over USD 1 million per year, and IG Group has publicly stated that it recouped its investment within three months of deployment[^15]. Novo Nordisk reports that authoring productivity for 300-page clinical study reports - of which they previously produced an average of 2.3 per year - has roughly doubled in some cases. TELUS has rolled out a Claude-based internal platform to 57,000 employees[^15]. Model selection should really be decided on total economics, including vendor lock-in, domestic data sovereignty, and SLAs.
ROI by Use Case: P&L Sensitivity Across Coding, Customer Support, Marketing, and Accounting
When you switch the horizontal axis from "model" to "function," the ROI landscape changes dramatically.
Coding is the honor student of generative AI ROI. GitHub Copilot frequently breaks even within 3-6 months[^4]. Accenture saw a 15% lift in PR merge rate, and Cognizant has reached the point where 30% of its internal code is machine-written, with case studies showing 30% productivity gains at an insurance company and 50% in software development at a telecom[^16]. NTT Data applied generative AI to 500 projects in fiscal 2025, achieving 20% productivity improvement across the development process and targeting 40% by fiscal 2027[^17]. Code is mechanically verifiable, which makes it inherently well-suited to AI.
Klarna is the canonical customer support story. Its AI assistant handles two thirds of customer service chats, with response time reduced 82%. Customer service cost per transaction dropped from USD 0.32 in Q1 2023 to USD 0.19 in Q1 2025, a 40% compression, and quarterly customer service cost fell from USD 57 million to USD 51 million[^18]. Klarna has separately put its AI-driven savings at USD 10 million a year, a figure that had grown to USD 60 million by Q3 2025 - equivalent, by its own measure, to 853 full-time employees[^18]. On Salesforce Agentforce, publisher Wiley clocked 213% ROI and energy company Engie completes 83% of user interactions with AI alone[^19].
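Whether AI deflection pays off depends on what share of contacts it resolves without a human. The sketch below uses the per-ticket ranges quoted in the cost-reduction section (human USD 8-15, AI USD 0.5-0.7) and a two-thirds AI share echoing Klarna's figure; the specific unit costs picked from those ranges and the monthly volume are our own illustrative choices.

```python
def blended_cost_per_contact(ai_share: float, ai_cost: float, human_cost: float) -> float:
    """Average cost per contact when AI resolves ai_share of contacts end to end.

    Simplification: contacts escalated from AI to a human are treated as
    pure human contacts, so the cost of the failed AI attempt is ignored.
    """
    if not 0.0 <= ai_share <= 1.0:
        raise ValueError("ai_share must be between 0 and 1")
    return ai_share * ai_cost + (1 - ai_share) * human_cost


human_only = blended_cost_per_contact(0.0, 0.6, 11.5)
with_ai = blended_cost_per_contact(2 / 3, 0.6, 11.5)
monthly_contacts = 50_000  # hypothetical volume
print(f"per contact: USD {human_only:.2f} -> USD {with_ai:.2f}")
print(f"monthly saving: USD {(human_only - with_ai) * monthly_contacts:,.0f}")
```

Because the human cost dominates, the result is far more sensitive to the AI resolution share than to the AI unit price, which is also why quality problems that push contacts back to humans (as in the Klarna follow-up) erode the savings quickly.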
Marketing and accounting differ in character, but both produce highly visible savings. AI in marketing has cut the time to produce a 1,500-character article from 8-10 hours to under 2 hours[^20], and marketers report saving more than 5 hours per week on average. BCG research has documented banks where personalized offers delivered triple the return of legacy approaches[^8]. In accounting, AP automation drops per-invoice cost from USD 12-30 to USD 1-5, with one cited case at a USD 200 million manufacturer generating USD 6.2 million in first-year value at an 18x ROI[^6].
If I had to pick a sequence among these four functions, my recommendation is to start with accounting, then customer support, then coding, then marketing. The reason is simple: that is the order from most to least defensible in terms of ROI certainty and baseline measurability. Marketing has the highest upside but the most variability, which makes it a poor candidate for the first persuasion exercise.
Disclosed ROI from Major Enterprises: Klarna, Microsoft, Salesforce, KPMG, JPMorgan
Setting aside the textbook discussion, here are the cases where actual numbers are in IR materials and official releases. These are usable as evidence in board meetings.
Klarna, as noted, reduced costs by USD 10 million per year through AI (USD 6 million in marketing, USD 4 million in customer service), with savings expanding to USD 60 million by Q3 2025[^18]. CEO Sebastian Siemiatkowski has publicly described it as "AI-first replacement of 853 jobs." That said, a Bloomberg report in May 2025 noted that the company partially restored human customer service due to quality concerns - the optimal split between AI and humans is still being calibrated[^21].
Microsoft has published three Forrester TEI studies. For mid-market customers, three-year ROI runs 132-353%; for large enterprises, 116%, with NPV of USD 19.7 million for a 25,000-person organization[^5]. British Columbia Investment Corporation saved over 2,300 hours during pilot alone, with 84% of users reporting 10-20% productivity improvement. Commercial Bank of Dubai is freeing up 39,000 hours annually. Microsoft Copilot drew complaints right after launch, but on a three-year horizon the numbers are solid.
Salesforce announced that Agentforce has delivered cumulative customer cost savings of more than USD 100 million[^22]. Wiley's 213% ROI, 1-800Accountant resolving 70% of inquiries autonomously during tax season, Engie's 83% auto-resolution rate, Hero FinCorp closing loan approvals in 30 minutes - the data spans industries. Salesforce itself saved 35,000 hours through internal deployment.
KPMG's 2024 GenAI Executive Survey reports that 57% of leaders said ROI exceeded expectations, 93% felt their competitive position improved, and respondents plan an average of USD 114 million in additional investment over the next year[^23]. In audit, sampling has shifted from 5-10% to full-population analysis, cutting work-paper preparation time by 35% and fundamentally changing the approach to audit risk.
JPMorgan invests USD 18 billion annually in IT/AI and has nearly half its workforce using generative AI daily[^24]. Its LLM Suite has been deployed to 200,000 people, compressing five-page pitch deck generation to 30 seconds. COiN saves 360,000 hours per year on contract review. McKinsey estimates USD 700 billion in cost reduction potential across the banking sector overall, but expects much of it to be competed away to customers - implying a brutal calculus where firms that do not race to capture it disappear from the market.
Two more examples from Japan. Sumitomo Corporation has achieved JPY 1.2 billion in annual cost reduction from a company-wide rollout of Microsoft 365 Copilot, and Panasonic Connect has shaved 186,000 hours of working time in a single year[^25]. The PwC survey shows 13% of Japanese firms reporting that benefits "greatly exceeded expectations," compared with 51% in the US and 50% in the UK. That is low by international standards, but the winners are real[^26]. The gap is not technology; it is the willingness to redesign the business process.
Common Failure Patterns: Stuck at PoC, Data Gaps, Organizational Resistance
What stands out as I assemble this data is that the gap between winners and losers is decided not by technology selection but by operational design. The MIT NANDA finding that 95% of organizations saw zero return[^3] also means - reading it the other way - that 5% are reliably winning. Three patterns separate the losers.
First, getting stuck at PoC. Gartner warned in 2024 that "by the end of 2025, at least 30% of generative AI projects will be abandoned after proof of concept"[^27], but more recent surveys suggest the actual figure has reached 50%. S&P Global reports that 42% of companies abandoned the majority of their AI projects (up sharply from 17% in 2024)[^26]. PoCs should not end at "we tried it"; from day one, set production KPIs and an integration plan together. As MIT also notes, the approach of buying from specialist vendors and combining capabilities succeeds 67% of the time, while in-house builds succeed at one third that rate[^3]. This is where the in-house-build religion needs to give way.
Second, data gaps. Informatica's research found that 43% of implementations cited data quality as the top obstacle, and a separate industry survey reports that 85% of AI projects fail on "data quality"[^28]. Generative AI does not generate ROI from training data; it generates it from being connected to your operational data. Whether you choose RAG or fine-tuning, if you do not invest in the underlying knowledge base, even a high-performing model is unusable.
Third, organizational resistance. AI projects with C-level sponsorship succeed 84% of the time, versus 23% without[^29]. IBM also reports that 60% of companies have not set KPIs[^29]. Before lamenting that "the front line will not use it," ask whether you have the conviction to rewire performance reviews, business processes, and training.
In my experience, 90% of failing companies treat "tool deployment = done." Generative AI is not something you simply buy; it is a tool for redesigning how work is done. If you do not decide before deployment which parts of the business to tear down and rebuild, and how, all you are left with is JPY 30 million in annual license fees on the books. Most Japanese companies are still trapped in this loop[^26].
Six Levers to Maximize ROI
To close, here are six levers TIMEWELL actually uses in consulting engagements, distilled from the research and field experience above.
First, rigorous baseline measurement. Measure pre-deployment cost, cycle time, error rate, and CSAT for at least one month. Without comparability, you cannot talk ROI. As BCG points out, top-ROI companies are "value-driven," not technology-driven[^8].
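A baseline is only useful if the post-deployment numbers are compared metric by metric in a consistent direction. A minimal sketch, assuming the four metrics named above and hypothetical values for a one-month baseline and a post-deployment month; the sign convention (positive always means "better") is our own choice.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    cost_per_case: float     # currency units per case
    cycle_time_hours: float  # hours from intake to completion
    error_rate: float        # fraction of cases with errors
    csat: float              # e.g. average score on a 1-5 scale


def improvement(before: Baseline, after: Baseline) -> dict[str, float]:
    """Relative improvement per metric; positive always means 'better'.

    Cost, cycle time and error rate are lower-is-better; CSAT is higher-is-better.
    """
    return {
        "cost_per_case": (before.cost_per_case - after.cost_per_case) / before.cost_per_case,
        "cycle_time_hours": (before.cycle_time_hours - after.cycle_time_hours) / before.cycle_time_hours,
        "error_rate": (before.error_rate - after.error_rate) / before.error_rate,
        "csat": (after.csat - before.csat) / before.csat,
    }


# Hypothetical one-month baseline vs. the first full month after deployment
before = Baseline(cost_per_case=2_000, cycle_time_hours=48, error_rate=0.05, csat=3.8)
after = Baseline(cost_per_case=1_400, cycle_time_hours=30, error_rate=0.04, csat=4.0)
for metric, delta in improvement(before, after).items():
    print(f"{metric:<17} {delta:+.1%}")
```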
Second, use case selection. McKinsey's analysis concentrates 75% of economic value in four domains: customer operations, marketing & sales, software engineering, and R&D[^1]. Do not spread thin; concentrate where it works.
Third, treat models as a portfolio. Use Claude, GPT, and Gemini differentiated by workload, then compress effective unit cost 50-95% with prompt caching and batching. Single-model lock-in is economically irrational.
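A model portfolio can start life as a plain routing table. In this minimal sketch, the workload-to-model mapping mirrors the hands-on split described earlier (Gemini for documents and RAG, Claude for coding and agents) and the prices are the list prices cited above; the workload names, token volumes, and the "Claude Sonnet 4.5 for everything" comparison are hypothetical. It deliberately ignores accuracy and retry differences, which, as noted earlier, can outweigh unit price.

```python
# USD per million tokens (input, output), list prices cited above.
# The Gemini 2.5 Pro price assumes prompts of 200K tokens or less.
PRICES = {
    "gemini-2.5-pro": (1.25, 10.0),
    "claude-sonnet-4.5": (3.0, 15.0),
}

# Hypothetical routing table: workload -> model
ROUTES = {
    "document_drafting": "gemini-2.5-pro",
    "rag_qa": "gemini-2.5-pro",
    "code_generation": "claude-sonnet-4.5",
    "agent_workflow": "claude-sonnet-4.5",
}

# Hypothetical monthly volumes in millions of tokens: (input, output)
VOLUMES = {
    "document_drafting": (40, 10),
    "rag_qa": (120, 8),
    "code_generation": (40, 12),
    "agent_workflow": (60, 6),
}


def monthly_cost(routes: dict[str, str]) -> float:
    """List-price monthly cost in USD for a given routing table."""
    total = 0.0
    for workload, (m_in, m_out) in VOLUMES.items():
        p_in, p_out = PRICES[routes[workload]]
        total += m_in * p_in + m_out * p_out
    return total


single = {w: "claude-sonnet-4.5" for w in VOLUMES}
print(f"routed:       USD {monthly_cost(ROUTES):,.0f}")  # USD 950
print(f"single model: USD {monthly_cost(single):,.0f}")  # USD 1,320
```

Keeping the routing in data rather than code also makes it cheap to re-point a workload when prices or benchmark results change, which is the practical hedge against lock-in.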
Fourth, build the knowledge base. As MIT notes, in-house AI succeeds at one third the rate of vendored solutions[^3]. The flip side: structuring operational knowledge so AI can consume it - the "data foundation" - is rightly built in-house. This is where TIMEWELL's ZEROCK comes in. We designed it as a foundation that uses GraphRAG and AWS domestic servers to safely connect enterprise-sensitive information to AI.
Fifth, refit the organization and incentive system. Build using AI into performance criteria, and design a mechanism to reinvest the time you save. What Sumitomo Corporation and Klarna are actually doing comes down to this.
Sixth, NPV governance over a three-year horizon. Evaluating at 12 months and giving up is the same as removing generative AI from the option set. Calculate three-year NPV at roughly 10% discount, and operate with quarterly assumption reviews. BCG's observation that "companies that scale share four tactics: focus on value, embed in transformation, partner actively, expand in sequence" is on the mark[^8].
We offer a service called WARP that supports companies end-to-end, from AI strategy design through KPI design, model selection, PoC, production rollout, and organizational redesign. What I have come to believe over the past two years is this: before you invest in AI, pick a partner who is willing to share both the conviction and the plan to redesign the business with you. AI alone does not produce ROI. People and processes do.
For related themes, see Management in the Age of AI Agents for governance perspectives, AI Agent Operations KPI for monitoring design, and AI-Driven Business Model Transformation for thinking about revenue contribution. Reading them together will give you a more three-dimensional view.
References
[^1]: McKinsey, "The economic potential of generative AI: The next productivity frontier", https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier
[^2]: Goldman Sachs, "Generative AI could raise global GDP by 7%", https://www.goldmansachs.com/insights/articles/generative-ai-could-raise-global-gdp-by-7-percent
[^3]: MIT NANDA, "The GenAI Divide: State of AI in Business 2025" / Fortune coverage, https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/
[^4]: GitHub Blog, "Research: quantifying GitHub Copilot's impact on developer productivity and happiness", https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/
[^5]: Microsoft / Forrester, "The Total Economic Impact of Microsoft 365 Copilot", https://www.microsoft.com/en-us/microsoft-365/blog/2024/10/17/microsoft-365-copilot-drove-up-to-353-roi-for-small-and-medium-businesses-new-study/
[^6]: Artsyl, "Invoice Processing Automation: 2025 ROI Formula Guide", https://www.artsyltech.com/blog/invoice-processing-automation-guide
[^7]: Pylon, "AI Ticket Deflection", https://www.usepylon.com/blog/ai-ticket-deflection-reduce-support-volume-2025
[^8]: BCG, "The Widening AI Value Gap - Build for the Future 2025", https://www.bcg.com/publications/2025/are-you-generating-value-from-ai-the-widening-gap
[^9]: Writer, "AI ROI calculator: From generative to agentic AI success in 2025", https://writer.com/blog/roi-for-generative-ai/
[^10]: Anthropic, "Claude API Pricing", https://platform.claude.com/docs/en/about-claude/pricing
[^11]: OpenAI, "API Pricing" / The Decoder, https://openai.com/api/pricing/
[^12]: Google AI for Developers, "Gemini Developer API pricing", https://ai.google.dev/gemini-api/docs/pricing
[^13]: Vellum AI, "Claude Opus 4.5 Benchmarks", https://www.vellum.ai/blog/claude-opus-4-5-benchmarks
[^14]: Anthropic, "Models overview / How enterprises are driving AI transformation with Claude", https://anthropic.com/news/driving-ai-transformation-with-claude
[^15]: Anthropic, "Customer Stories", https://claude.com/customers
[^16]: Cognizant, "Generative AI Services PEAK Matrix Assessment 2025", https://www.cognizant.com/us/en/recognitions/artificial-intelligence-and-generative-ai-services-peak-matrix-assessment-2025
[^17]: NTT Data, "2025 Trends: Where Generative-AI-Powered Software Development Stands", https://www.nttdata.com/jp/ja/trends/data-insight/2025/1201/
[^18]: Customer Experience Dive, "Klarna credits AI for slashing customer service costs", https://www.customerexperiencedive.com/news/klarna-ai-slash-customer-service-costs/748647/
[^19]: Salesforce, "Agentforce Customer Stories / Metrics", https://www.salesforce.com/agentforce/customer-stories/
[^20]: CoSchedule, "State of AI in Marketing Report 2025", https://coschedule.com/ai-marketing-statistics
[^21]: Bloomberg, "Klarna Turns From AI to Real Person Customer Service", https://www.bloomberg.com/news/articles/2025-05-08/klarna-turns-from-ai-to-real-person-customer-service
[^22]: Salesforce, "Agentforce Metrics: Real Impact & Results", https://www.salesforce.com/agentforce/metrics/
[^23]: KPMG, "AI Quarterly Pulse Survey / 2024 GenAI Executive Survey", https://kpmg.com/us/en/articles/2025/ai-quarterly-pulse-survey.html
[^24]: CNBC, "JPMorgan Chase's blueprint to become the world's first fully AI-powered megabank", https://www.cnbc.com/2025/09/30/jpmorgan-chase-fully-ai-connected-megabank.html
[^25]: Taskhub / Corporate IR, "[2025 Edition] Generative AI Adoption in Japan: Status, Challenges, and 11 Cases", https://taskhub.jp/useful/generative-ai-adoption-status/
[^26]: PwC Japan, "Generative AI Survey 2025 Spring: Five-Country Comparison", https://www.pwc.com/jp/ja/knowledge/thoughtleadership/generative-ai-survey2025.html
[^27]: Gartner, "Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After Proof of Concept By End of 2025", https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025
[^28]: Informatica, "The Surprising Reason Most AI Projects Fail", https://www.informatica.com/blogs/the-surprising-reason-most-ai-projects-fail-and-how-to-avoid-it-at-your-enterprise.html
[^29]: IBM, "How to maximize AI ROI in 2026", https://www.ibm.com/think/insights/ai-roi
![Generative AI ROI Comparison [2026 Edition] | A Quantitative Analysis of the AI Models and Tools Enterprises Should Choose Based on Return on Investment](/images/columns/generative-ai-roi-enterprise-comparison-2026/cover.png)