
Stanford AI Index Report 2023 (Part 2): R&D Trends, LLM Scale, and the Race Between Nations

2026-01-21 Hamamoto

Part 2 of the Stanford AI Index Report 2023, covering the R&D chapter: how AI research publications grew from 200K to 500K, why China leads in volume but the US leads in citation quality, the dramatic shift from academia to industry, and how LLM training costs have escalated from $50K to $8M.


This is Hamamoto from TIMEWELL.

Stanford AI Index 2023, Part 2: The R&D Landscape

This is the second part of my summary of the Stanford AI Index Report 2023. Part 1 covered the overview, investment trends, and societal impact findings. This part focuses on the R&D chapter—where the research is actually happening, who is doing it, and how fast capabilities are scaling.

AI Research Publications: From 200K to 500K

One of the clearest signals in the report is the raw volume of AI research. Global AI-related publications grew from roughly 200,000 per year in 2010 to over 500,000 by 2021. That's a 2.5x increase in just over a decade.
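The 2.5x headline figure can also be read as a compound annual growth rate. A quick sketch (the calculation is mine; only the start and end figures come from the report):

```python
# Implied annual growth rate behind the rise in AI publications:
# roughly 200K/year in 2010 to over 500K/year in 2021 (per the report).
start, end = 200_000, 500_000
years = 2021 - 2010  # 11 years

cagr = (end / start) ** (1 / years) - 1
print(f"Total growth: {end / start:.1f}x")
print(f"Implied compound annual growth rate: {cagr:.1%}")
```

An 8-9% annual growth rate, sustained for over a decade, is what a 2.5x total increase works out to.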

China vs. the United States

China now leads the world in AI publication volume by a significant margin. But volume and influence are not the same thing. When measured by citation counts—a proxy for how much a paper actually shapes subsequent research—the United States remains the clear leader.

This distinction matters. A country can publish many papers without producing the foundational research that others build on. The US academic and industry ecosystem continues to generate the work that defines the field, even as Chinese institutions produce more papers overall.

Europe's position

The EU sits between China and the US in publication volume, but trails both in the highest-impact work.


The Shift from Academia to Industry

One of the most significant structural changes the report documents is the migration of cutting-edge AI research from universities to industry labs.

In 2022, industry produced 32 notable machine learning models. Academia produced 3.

This inversion was not the norm a decade ago. Universities were the primary source of AI breakthroughs for most of the field's history. The shift reflects a simple reality: training frontier models now requires compute resources that only large technology companies can afford.

Implications for talent

This also means that the researchers capable of working on the most capable systems are increasingly concentrated inside a handful of companies—primarily in the United States. Academic labs are losing ground not just in compute, but in talent.

The Scale Escalation of Large Language Models

The report traces the evolution of LLM scale across four years:

Model   Year   Parameters    Estimated Training Cost
GPT-2   2019   1.5 billion   ~$50,000
GPT-3   2020   175 billion   ~$4.6 million
PaLM    2022   540 billion   ~$8 million
The parameter count grew 360x. The training cost grew 160x. Both trends show no sign of slowing.

What this means in practice: the barrier to training a frontier model is no longer just technical—it's financial. This is concentrating capability development among a small number of well-capitalized organizations.
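The multiples quoted above follow directly from the table's figures. A minimal sketch recomputing them (parameter counts and cost estimates are the report's; the arithmetic is mine):

```python
# Growth multiples from GPT-2 (2019) to PaLM (2022), using the
# parameter counts and estimated training costs in the table above.
models = {
    "GPT-2": {"year": 2019, "params": 1.5e9, "cost": 50_000},
    "GPT-3": {"year": 2020, "params": 175e9, "cost": 4_600_000},
    "PaLM":  {"year": 2022, "params": 540e9, "cost": 8_000_000},
}

param_growth = models["PaLM"]["params"] / models["GPT-2"]["params"]
cost_growth = models["PaLM"]["cost"] / models["GPT-2"]["cost"]
print(f"Parameter growth, GPT-2 -> PaLM: {param_growth:.0f}x")  # 360x
print(f"Training cost growth, GPT-2 -> PaLM: {cost_growth:.0f}x")  # 160x
```

Note that cost grew more slowly than parameter count, reflecting efficiency gains in training, but both multiples are large enough to price out all but the best-capitalized organizations.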

Benchmark Saturation: Progress Without a Ruler

One of the subtler findings in the R&D chapter is the problem of benchmark saturation. As AI systems improve, they eventually reach near-perfect scores on standard evaluation benchmarks. When that happens, the benchmark stops being useful—you can't measure further progress with a ruler that everyone has already maxed out.

This has happened repeatedly:

  • ImageNet (image classification): Near-perfect human-level performance achieved
  • SuperGLUE (language understanding): Exceeded human baselines
  • Several reading comprehension benchmarks: Similarly saturated

Researchers respond by creating harder benchmarks. But this is an ongoing arms race, and it makes cross-year comparisons difficult. A model that scores 95% in 2023 on a new benchmark may be dramatically more capable than a model that scored 95% in 2020 on an older one—but the numbers look the same.
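One way to see why near-ceiling scores mislead is to compare models by remaining error rate rather than accuracy. A small illustration (the accuracy figures here are hypothetical, chosen only to show the effect):

```python
# Why saturated benchmarks hide capability gaps: compare models by how
# much they shrink the *error rate*, not by raw accuracy points.
def error_reduction(acc_old: float, acc_new: float) -> float:
    """Factor by which the remaining error rate shrinks between two accuracies."""
    return (1 - acc_old) / (1 - acc_new)

# A jump from 95% to 99% reads as "+4 points" but is a 5x error reduction.
print(f"95% -> 99%: {error_reduction(0.95, 0.99):.1f}x fewer errors")

# And two identical 95% scores on benchmarks of different difficulty say
# nothing about the gap between the models: the score has stopped measuring.
```

This is the "ruler" problem in miniature: once everyone is within a few points of the ceiling, accuracy deltas compress real differences, and only a harder benchmark restores resolution.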

The Cryptocurrency Index Parallel

The report introduces a "Crypto Index" as an analogy for thinking about AI development speed. Just as cryptocurrency indices track volatility and adoption curves across different digital assets, the AI Index attempts to measure AI progress across multiple dimensions simultaneously.

This framing reflects a broader methodological challenge: AI is not one thing, and measuring "AI progress" as a single number is inherently misleading. The Stanford team acknowledges this and presents the index as a tool for directional orientation rather than precise measurement.

Key Takeaways

  • AI research volume has grown 2.5x in a decade, with China leading in quantity and the US in influence
  • Industry now produces 10x more notable ML models than academia
  • LLM training costs grew from $50K to $8M in four years
  • Benchmark saturation is a real methodological problem that understates true capability gains
  • No single index can capture AI progress—directional trends matter more than point-in-time scores

TIMEWELL AI Consulting

TIMEWELL supports business transformation in the AI agent era.

Our Services

  • ZEROCK: High-security AI agent running on domestic servers
  • TIMEWELL Base: AI-native event management platform
  • WARP: AI talent development program
