Gemini 2.5 Pro Deep Think: IMO Gold Medal, 34.8% on HLE, and Multi-Agent Reasoning Explained

2026-01-21, Hamamoto

A complete guide to Gemini 2.5 Pro Deep Think — benchmark results (34.8% on Humanity's Last Exam, gold medal at IMO 2025), parallel thinking architecture, multi-agent design, comparison with o3 and Claude Opus 4.5, pricing, and business applications.

This is Hamamoto from TIMEWELL.

Gemini 2.5 Pro Deep Think, generally available since August 1, 2025, posted the top reported score on Humanity's Last Exam at release: 34.8% without tools, 9.4 points ahead of Grok 4 (25.4%) and 14.5 points ahead of o3 (20.3%). A research version of the model also achieved gold medal-level performance at the International Mathematical Olympiad 2025. This article covers the architecture behind these results, the benchmark details, a comparison with competing models, and where Deep Think fits in practical business workflows.

Current Status at a Glance

Item | Details
Model name | Gemini 2.5 Pro Deep Think
General availability | August 1, 2025
Architecture | Sparse Mixture-of-Experts Transformer
Max input tokens | 1 million
Max output tokens | 192,000
Humanity's Last Exam | 34.8% (no tools)
IMO 2025 | Gold medal level (research version)
Pricing | Google AI Ultra, $250/month
Key features | Parallel thinking, multi-agent

Deep Think: What Parallel Thinking Actually Means

How Standard AI Reasons vs. Deep Think

Standard AI models reason sequentially — following a single chain of thought from start to finish, evaluating one approach at a time.

Deep Think operates differently:

  • Generates multiple candidate approaches simultaneously
  • Evaluates different reasoning paths in parallel
  • Refines and merges approaches over time
  • Selects the optimal final answer from the parallel search

The analogy: a standard model is one person thinking through a problem step by step. Deep Think is multiple people working the problem simultaneously and comparing notes.
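
The generate, evaluate, refine, and select loop described above can be illustrated with a toy problem. The following is a minimal sketch, not Google's implementation: the function names (`refine`, `score`, `run_path`, `parallel_think`) and the square-root task are assumptions chosen only to make the idea runnable. Each "reasoning path" starts from a different guess, is refined independently, and the best-scoring result is selected. The merging of paths mentioned above is not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor


def refine(guess: float, target: float) -> float:
    """One refinement step (Newton's method for x * x = target)."""
    return 0.5 * (guess + target / guess)


def score(guess: float, target: float) -> float:
    """Lower is better: distance of guess * guess from the target."""
    return abs(guess * guess - target)


def run_path(start: float, target: float, rounds: int) -> float:
    """A single hypothetical reasoning path: start somewhere, refine a few times."""
    guess = start
    for _ in range(rounds):
        guess = refine(guess, target)
    return guess


def parallel_think(target: float, starts: list[float], rounds: int = 3) -> float:
    """Run every path concurrently, then keep the best-scoring candidate."""
    with ThreadPoolExecutor(max_workers=len(starts)) as pool:
        candidates = list(pool.map(lambda s: run_path(s, target, rounds), starts))
    return min(candidates, key=lambda c: score(c, target))


# Three paths with different starting points search for the square root of 2.
best = parallel_think(2.0, starts=[0.5, 1.0, 10.0])
print(best)
```

With the same number of refinement rounds, the paths end at different distances from the answer, which is why selecting across paths can beat any single fixed starting point.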

Multi-Agent Architecture

Gemini 2.5 Pro Deep Think is the first multi-agent model Google has released publicly.

How multi-agent differs from single-agent:

  • A single question spawns multiple AI agent instances
  • Each agent works on the problem independently and in parallel
  • Results are compared and consolidated
  • Higher computational cost than single-agent reasoning
  • Higher output quality on complex problems
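
The spawn, work independently, and consolidate pattern in the list above can be sketched as follows. This is an illustration under stated assumptions, not a description of how Deep Think consolidates results internally: the agents are stand-in functions rather than real model calls, and majority voting is just one simple consolidation rule. The returned call count makes the cost trade-off visible, since every extra agent is one more model invocation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical agent type: takes a question, returns an answer string.
Agent = Callable[[str], str]


def consolidate(answers: list[str]) -> str:
    """Majority vote; ties go to the answer that appeared first."""
    return Counter(answers).most_common(1)[0][0]


def ask_agents(question: str, agents: list[Agent]) -> tuple[str, int]:
    """Send one question to every agent in parallel and consolidate the results."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))
    # The second value is the number of agent calls, a rough proxy for compute cost.
    return consolidate(answers), len(answers)


# Stand-in agents: two agree, one disagrees.
agents = [lambda q: "42", lambda q: "41", lambda q: "42"]
answer, calls = ask_agents("What is 6 * 7?", agents)
print(answer, calls)
```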

Best suited for:

  • Iterative design and development tasks
  • Scientific and mathematical research
  • Complex competitive programming problems
  • Business analysis requiring multiple analytical angles

Benchmark Results

Humanity's Last Exam (HLE)

HLE is a benchmark covering difficult problems across mathematics, humanities, and sciences — designed to test the outer limits of current AI reasoning capability.

Model | Score (no tools)
Gemini 2.5 Pro Deep Think | 34.8%
xAI Grok 4 | 25.4%
OpenAI o3 | 20.3%

Google describes this as the current state-of-the-art performance.
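
To put the gap in concrete terms, the short calculation below derives the absolute lead (in percentage points) and the relative lead from the three scores in the table. The helper name `lead_over` is an assumption introduced for this example.

```python
# HLE scores (no tools) as reported in the table above.
hle_scores = {
    "Gemini 2.5 Pro Deep Think": 34.8,
    "xAI Grok 4": 25.4,
    "OpenAI o3": 20.3,
}


def lead_over(leader: str, other: str, scores: dict[str, float]) -> tuple[float, float]:
    """Return (gap in percentage points, relative lead in percent)."""
    gap = scores[leader] - scores[other]
    relative = gap / scores[other] * 100
    return round(gap, 1), round(relative, 1)


print(lead_over("Gemini 2.5 Pro Deep Think", "xAI Grok 4", hle_scores))  # (9.4, 37.0)
print(lead_over("Gemini 2.5 Pro Deep Think", "OpenAI o3", hle_scores))   # (14.5, 71.4)
```

Even the top score leaves roughly two thirds of the benchmark unanswered, so the lead is large in relative terms but the benchmark is far from saturated.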

International Mathematical Olympiad (IMO) 2025

  • Research version: Gold medal level
  • Generally available version: Bronze level (multi-hour reasoning processes removed for practical responsiveness)

The generally available model trades the most computationally intensive reasoning steps for more practical response times. Google has said it shared the gold-medal research version with a small group of mathematicians and academics rather than releasing it broadly.

Additional Benchmarks

Benchmark | Result
2025 USAMO | Highest score (mathematics)
LiveCodeBench v6 | Highest score (competitive programming)
MMMU | 84.0% (multimodal reasoning)

Technical Specifications

Item | Specification
Architecture | Sparse Mixture-of-Experts Transformer
Input modalities | Text, images, audio
Max input tokens | 1,000,000
Max output tokens | 192,000
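
As a practical use of the limits above, the sketch below checks whether a request is likely to fit before it is sent. The two limits come from the table; the four-characters-per-token ratio is a rough assumption for English text, not the model's real tokenizer, and the function names are hypothetical.

```python
MAX_INPUT_TOKENS = 1_000_000   # from the specification table
MAX_OUTPUT_TOKENS = 192_000    # from the specification table
CHARS_PER_TOKEN = 4            # assumption: crude average for English text


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_limits(prompt: str, requested_output_tokens: int) -> bool:
    """True if both the estimated input and the requested output are within limits."""
    return (
        estimate_tokens(prompt) <= MAX_INPUT_TOKENS
        and requested_output_tokens <= MAX_OUTPUT_TOKENS
    )


print(fits_limits("Summarize this contract.", 2_000))
```

For real workloads, a tokenizer-based count is more reliable than a character heuristic, especially for non-English text.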

Safety testing results:

  • Content safety: improved over Gemini 2.5 Pro
  • Tone objectivity: improved
  • Note: slightly higher tendency to refuse benign requests; prompt adjustment may be needed

Pricing and Access

Plan | Price | Deep Think access
Google AI Ultra | $250/month | Available
Standard Gemini | Free and up | Limited

Access steps:

  1. Open the Gemini app (Web, Android, iOS)
  2. Select "Gemini 2.5 Pro" from the model dropdown
  3. Toggle "Deep Think" on in the prompt bar

Note: daily prompt limits apply.

API access: Available through Vertex AI and Google AI Studio. Higher computational cost means API usage is priced accordingly. Best used for complex tasks where the quality improvement justifies the cost.

Evolution: Then vs. Now

Item | Feb 2024 (Gemini 1.0 Ultra) | Jan 2026 (Gemini 2.5 Pro Deep Think)
Reasoning approach | Single-pass | Parallel thinking + multi-agent
HLE score | Not measured | 34.8% (top score)
IMO | Not entered | Gold medal level
Max input tokens | 128K | 1M
Max output tokens | 8K | 192K
Multimodal | Limited | Text, images, audio
Price | Gemini Advanced $20/month | Ultra $250/month

Competitor Comparison

vs. OpenAI o3

Item | Gemini 2.5 Pro Deep Think | OpenAI o3
HLE | 34.8% | 20.3%
IMO | Gold medal level | Not disclosed
Reasoning approach | Multi-agent | Single-agent
Max input tokens | 1M | 200K
Max output tokens | 192K | 100K
Pricing | Ultra $250/month | Pro $200/month
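
The subscription prices in the comparison above are easier to weigh on an annual basis. This is a simple arithmetic sketch using only the two monthly prices from the table; it assumes a single seat and ignores taxes, discounts, and any usage-based API charges.

```python
def annual_cost(monthly_usd: int) -> int:
    """Twelve months at a flat monthly price."""
    return monthly_usd * 12


ultra = annual_cost(250)   # Ultra, per the table
pro = annual_cost(200)     # OpenAI Pro, per the table
premium = ultra - pro
premium_pct = premium / pro * 100

print(ultra, pro, premium, premium_pct)  # 3000 2400 600 25.0
```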

vs. Claude Opus 4.5

Item | Gemini 2.5 Pro Deep Think | Claude Opus 4.5
Strength | Math and scientific reasoning | Long-running tasks, code generation
Architecture | Multi-agent | Extended thinking
Max input tokens | 1M | 200K
Multimodal | Text, images, audio | Text, images
Ecosystem | Google Workspace | Claude Code

When to Use Each

Gemini 2.5 Pro Deep Think is the better choice for:

  • Complex mathematical and scientific problems
  • Tasks requiring multi-angle analysis
  • Competitive programming-level code generation
  • Google Workspace integration workflows

Other models may be better for:

  • Long-running autonomous tasks (Claude Opus 4.5)
  • General-purpose conversation (GPT-5.2)
  • Cost-efficiency priority (Gemini 2.5 Flash)
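
The guidance in the two lists above can be captured as a small lookup for teams that route requests between models. This is a sketch only: the category keys are hypothetical labels invented for this example, and the mapping simply restates the article's recommendations rather than any measured routing policy.

```python
# Hypothetical task categories mapped to the recommendations listed above.
RECOMMENDATIONS = {
    "math_science": "Gemini 2.5 Pro Deep Think",
    "multi_angle_analysis": "Gemini 2.5 Pro Deep Think",
    "competitive_programming": "Gemini 2.5 Pro Deep Think",
    "workspace_integration": "Gemini 2.5 Pro Deep Think",
    "long_running_autonomous": "Claude Opus 4.5",
    "general_conversation": "GPT-5.2",
    "cost_sensitive": "Gemini 2.5 Flash",
}


def recommend_model(task_category: str) -> str:
    """Return the suggested model, or raise ValueError for an unknown category."""
    try:
        return RECOMMENDATIONS[task_category]
    except KeyError:
        raise ValueError(f"Unknown task category: {task_category!r}") from None


print(recommend_model("math_science"))
print(recommend_model("cost_sensitive"))
```

Failing loudly on unknown categories is a deliberate choice here: silently defaulting to the most expensive model would hide routing mistakes.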

Google Workspace Integration

Gemini 2.5 Pro integrates across the Google Workspace suite:

  • Gmail: AI-assisted email drafting and replies
  • Google Docs: Document summarization, generation, editing
  • Google Sheets: Data analysis and formula generation
  • Google Slides: Automatic presentation generation
  • Google Meet: Meeting summarization and action item extraction

Business Use Cases for Deep Think

  1. Complex analysis reports: Multi-angle financial data analysis
  2. Technical design: Evaluating multiple architectural options
  3. Strategic planning: Competitive analysis and strategy option evaluation
  4. R&D: Scientific hypothesis validation

Adoption Considerations

Advantages

  • Highest-tier reasoning: top reported scores on HLE and LiveCodeBench v6, plus a gold medal-level IMO 2025 result (research version)
  • Multi-agent flexibility: multiple analytical perspectives in a single query
  • Google ecosystem integration: seamless Workspace and NotebookLM connectivity
  • Massive context window: 1M input tokens for complex, long-context tasks

Limitations

  • Cost: Ultra at $250/month is more expensive than competing plans such as OpenAI's Pro plan at $200/month
  • Response speed: Deep Think takes longer than standard generation; unsuitable for real-time applications
  • Over-refusal: Slightly elevated tendency to refuse benign requests; prompt engineering may be required

Summary

Gemini 2.5 Pro Deep Think posted the top reported reasoning benchmark results at its release:

  • Humanity's Last Exam: 34.8%, ahead of Grok 4 (25.4%) and o3 (20.3%)
  • IMO 2025: gold medal level (research version)
  • LiveCodeBench v6 and 2025 USAMO: top scores
  • Architecture: parallel thinking across multiple simultaneous reasoning paths
  • Multi-agent: multiple AI agents working the same problem concurrently
  • Context: 1M input tokens, 192K output tokens
  • Available on Google AI Ultra at $250/month
  • Deep Google Workspace integration

From Gemini 1.0 Ultra in February 2024 to Deep Think's general availability in August 2025, Google moved to the front of the reasoning AI competition through architectural changes. For tasks that require analyzing complex problems from multiple angles (mathematics, science, competitive programming), Gemini 2.5 Pro Deep Think is one of the strongest available options.
