OpenAI Announces Next-Generation AI Models O3 and O3 Mini

2026-01-21 | Hamamoto

On December 21 (around 3:00 AM Japan time), on the final day of their 12-day new features and models event, OpenAI announced the next-generation AI models O3 and O3 Mini. These models exceed the performance of the preceding O1 model, delivering impressive results in programming and mathematics.


This is Hamamoto from TIMEWELL.

OpenAI Announces O3 and O3 Mini on the Final Day of Its 12-Day Event

On December 21 (around 3:00 AM Japan time), on the final day of its 12-day new features and models announcement event, OpenAI unveiled the next-generation AI models O3 and O3 Mini. These models significantly exceed the performance of the preceding O1 model and have achieved remarkable results in programming and mathematics. OpenAI hopes these models will mark the dawn of a new era in artificial intelligence.

This article provides a detailed breakdown of the performance of the next-generation AI models O3 and O3 Mini.

Contents:
- O3 and O3 Mini's Remarkable Performance
- Record-Breaking Results on the ARC AGI Benchmark
- What Comes Next
- Summary


O3 and O3 Mini's Remarkable Performance

The next-generation AI models O3 and O3 Mini have delivered stunning results across a variety of benchmark tests, particularly in programming and mathematics — far surpassing the performance of the predecessor O1 model.

Strong Performance on Software-Style Benchmarks

On the SWE-bench Verified benchmark, which consists of real-world software engineering tasks, O3 achieved 71.7% accuracy — an improvement of 22.8 percentage points over O1, demonstrating a significant leap in software engineering capability. In competitive programming, it reached a Codeforces Elo rating of 2,727, surpassing the 2,665 held by OpenAI's Chief Scientist and demonstrating advanced coding ability. In mathematics, O3 achieved a 96.7% correct answer rate on the AIME, a qualifying exam for the USA Mathematical Olympiad, far exceeding O1's 83.3%.

Furthermore, O3 scored over 25% on Epoch AI's FrontierMath benchmark, currently considered the most difficult mathematics benchmark. This is an impressive result, given that no other AI model had exceeded 2%.

O3 Mini likewise performed impressively, matching or exceeding O1 Mini at a significantly lower cost. In both programming and mathematics, O3 Mini outperformed O1 Mini.

Record-Breaking Results on the ARC AGI Benchmark

O3 also set a new record on the ARC AGI benchmark — a test that AI models had long struggled with. ARC AGI is a benchmark designed to measure progress toward AGI (Artificial General Intelligence); its tasks are easy for humans but difficult for AI. Until now, humans had averaged around 85% correct, while the best AI scores hovered around 30%.

O3 Surpasses Human-Level Performance on ARC AGI

On the ARC AGI private test set, O3 achieved 75.7% accuracy under low-compute settings, placing first on the public leaderboard. Under high-compute settings, it reached 87.5% accuracy — nearly three times the best previous AI scores, and above the human average of 85%.

This marks the first time an AI model has achieved human-level performance on ARC AGI. Greg Kamradt of the ARC Prize Foundation stated that this result is an important milestone toward AGI and expressed anticipation for further collaboration with OpenAI.

What Comes Next

O3 and O3 Mini are not yet publicly available. OpenAI is conducting internal safety testing and providing access to external researchers to verify safety before proceeding with a broader release.

However, early access is available for safety and security researchers. By filling out an application form on OpenAI's website, interested parties can participate in safety testing of O3 and O3 Mini and be among the first to evaluate these next-generation models. (Applications were accepted through January 10.)

Public Release Timeline

OpenAI has announced a plan to release O3 Mini to the general public at the end of January, with O3 to follow shortly after. However, the release schedule is subject to change depending on the results of safety testing.

OpenAI has also published a report on a new safety technology called "Deliberative Alignment." Traditional safety approaches train models by showing examples of safe and unsafe prompts to learn the boundary between acceptable and unacceptable content. However, this new technique leverages the model's reasoning capabilities to more accurately judge the safety of prompts, enabling a better tradeoff between safety and performance — paving the way for AI models that are both safer and more capable.

Summary

The next-generation AI models O3 and O3 Mini announced by OpenAI have demonstrated remarkable performance in programming and mathematics, achieving human-level performance on the ARC AGI benchmark — a potential harbinger of a new era in artificial intelligence.

OpenAI is taking careful measures to ensure safety, conducting both internal testing and external researcher evaluations. The company is also working on the new safety technology "Deliberative Alignment," aiming to realize AI models that are both safer and more capable.

Public Launch Dates Subject to Safety Results

O3 Mini is expected to be released publicly at the end of January and O3 shortly after, though timing may change depending on safety test outcomes. These efforts by OpenAI represent an important step in advancing artificial intelligence while ensuring its safety.

Reference: OpenAI official website, "Day 12 — o3 preview & call for safety researchers"
