
Stanford AI Index Report 2023 (Part 2): R&D Trends, LLM Scale, and the Race Between Nations

2026-01-21 Hamamoto

Part 2 of the Stanford AI Index Report 2023, covering the R&D chapter: how AI research publications grew from 200K to 500K, why China leads in volume but the US leads in citation quality, the dramatic shift from academia to industry, and how LLM training costs have escalated from $50K to $8M.


This is Hamamoto from TIMEWELL.

Stanford AI Index 2023, Part 2: The R&D Landscape

This is the second part of my summary of the Stanford AI Index Report 2023. Part 1 covered the overview, investment trends, and societal impact findings. This part focuses on the R&D chapter—where the research is actually happening, who is doing it, and how fast capabilities are scaling.

AI Research Publications: From 200K to 500K

One of the clearest signals in the report is the raw volume of AI research. Global AI-related publications grew from roughly 200,000 per year in 2010 to over 500,000 by 2021. That's a 2.5x increase in just over a decade.
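The 2.5x headline figure can also be read as a compound annual growth rate. A quick sketch (the calculation is mine; only the start and end figures come from the report):

```python
# Implied annual growth rate behind the rise in AI publications:
# roughly 200K/year in 2010 to over 500K/year in 2021 (per the report).
start, end = 200_000, 500_000
years = 2021 - 2010  # 11 years

cagr = (end / start) ** (1 / years) - 1
print(f"Total growth: {end / start:.1f}x")
print(f"Implied compound annual growth rate: {cagr:.1%}")
```

An 8-9% annual growth rate, sustained for over a decade, is what a 2.5x total increase works out to.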

China vs. the United States

China now leads the world in AI publication volume by a significant margin. But volume and influence are not the same thing. When measured by citation counts—a proxy for how much a paper actually shapes subsequent research—the United States remains the clear leader.

This distinction matters. A country can publish many papers without producing the foundational research that others build on. The US academic and industry ecosystem continues to generate the work that defines the field, even as Chinese institutions produce more papers overall.

Europe's position

The EU sits between China and the US in publication volume, but trails both in the highest-impact work.


The Shift from Academia to Industry

One of the most significant structural changes the report documents is the migration of cutting-edge AI research from universities to industry labs.

In 2022, industry produced 32 notable machine learning models. Academia produced 3.

This inversion was not the norm a decade ago. Universities were the primary source of AI breakthroughs for most of the field's history. The shift reflects a simple reality: training frontier models now requires compute resources that only large technology companies can afford.

Implications for talent

This also means that the researchers capable of working on the most capable systems are increasingly concentrated inside a handful of companies—primarily in the United States. Academic labs are losing ground not just in compute, but in talent.

The Scale Escalation of Large Language Models

The report traces the evolution of LLM scale across four years:

Model   Year   Parameters    Estimated Training Cost
GPT-2   2019   1.5 billion   ~$50,000
GPT-3   2020   175 billion   ~$4.6 million
PaLM    2022   540 billion   ~$8 million
The parameter count grew 360x. The training cost grew 160x. Both trends show no sign of slowing.

What this means in practice: the barrier to training a frontier model is no longer just technical—it's financial. This is concentrating capability development among a small number of well-capitalized organizations.
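The multiples quoted above follow directly from the table's figures. A minimal sketch recomputing them (parameter counts and cost estimates are the report's; the arithmetic is mine):

```python
# Growth multiples from GPT-2 (2019) to PaLM (2022), using the
# parameter counts and estimated training costs in the table above.
models = {
    "GPT-2": {"year": 2019, "params": 1.5e9, "cost": 50_000},
    "GPT-3": {"year": 2020, "params": 175e9, "cost": 4_600_000},
    "PaLM":  {"year": 2022, "params": 540e9, "cost": 8_000_000},
}

param_growth = models["PaLM"]["params"] / models["GPT-2"]["params"]
cost_growth = models["PaLM"]["cost"] / models["GPT-2"]["cost"]
print(f"Parameter growth, GPT-2 -> PaLM: {param_growth:.0f}x")  # 360x
print(f"Training cost growth, GPT-2 -> PaLM: {cost_growth:.0f}x")  # 160x
```

Note that cost grew more slowly than parameter count, reflecting efficiency gains in training, but both multiples are large enough to price out all but the best-capitalized organizations.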

Benchmark Saturation: Progress Without a Ruler

One of the subtler findings in the R&D chapter is the problem of benchmark saturation. As AI systems improve, they eventually reach near-perfect scores on standard evaluation benchmarks. When that happens, the benchmark stops being useful—you can't measure further progress with a ruler that everyone has already maxed out.

This has happened repeatedly:

  • ImageNet (image classification): Near-perfect human-level performance achieved
  • SuperGLUE (language understanding): Exceeded human baselines
  • Several reading comprehension benchmarks: Similarly saturated

Researchers respond by creating harder benchmarks. But this is an ongoing arms race, and it makes cross-year comparisons difficult. A model that scores 95% in 2023 on a new benchmark may be dramatically more capable than a model that scored 95% in 2020 on an older one—but the numbers look the same.
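One way to see why near-ceiling scores mislead is to compare models by remaining error rate rather than accuracy. A small illustration (the accuracy figures here are hypothetical, chosen only to show the effect):

```python
# Why saturated benchmarks hide capability gaps: compare models by how
# much they shrink the *error rate*, not by raw accuracy points.
def error_reduction(acc_old: float, acc_new: float) -> float:
    """Factor by which the remaining error rate shrinks between two accuracies."""
    return (1 - acc_old) / (1 - acc_new)

# A jump from 95% to 99% reads as "+4 points" but is a 5x error reduction.
print(f"95% -> 99%: {error_reduction(0.95, 0.99):.1f}x fewer errors")

# And two identical 95% scores on benchmarks of different difficulty say
# nothing about the gap between the models: the score has stopped measuring.
```

This is the "ruler" problem in miniature: once everyone is within a few points of the ceiling, accuracy deltas compress real differences, and only a harder benchmark restores resolution.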

The Cryptocurrency Index Parallel

The report introduces a "Crypto Index" as an analogy for thinking about AI development speed. Just as cryptocurrency indices track volatility and adoption curves across different digital assets, the AI Index attempts to measure AI progress across multiple dimensions simultaneously.

This framing reflects a broader methodological challenge: AI is not one thing, and measuring "AI progress" as a single number is inherently misleading. The Stanford team acknowledges this and presents the index as a tool for directional orientation rather than precise measurement.

Key Takeaways

  • AI research volume has grown 2.5x in a decade, with China leading in quantity and the US in influence
  • Industry now produces 10x more notable ML models than academia
  • LLM training costs grew from $50K to $8M in four years
  • Benchmark saturation is a real methodological problem that understates true capability gains
  • No single index can capture AI progress—directional trends matter more than point-in-time scores

TIMEWELL AI Consulting

TIMEWELL supports business transformation in the AI agent era.

Our Services

  • ZEROCK: High-security AI agent running on domestic servers
  • TIMEWELL Base: AI-native event management platform
  • WARP: AI talent development program
