This is Hamamoto from TIMEWELL.
OpenAI Announces O3 and O3 Mini on the Final Day of Its 12-Day Event
On December 21 (around 3:00 AM Japan time), the final day of its 12-day series of feature and model announcements, OpenAI unveiled its next-generation AI models, O3 and O3 Mini. According to the results OpenAI presented, the models significantly outperform their predecessor, O1, particularly in programming and mathematics. OpenAI hopes they will mark the start of a new era in artificial intelligence.
This article provides a detailed breakdown of the performance of the next-generation AI models O3 and O3 Mini.
O3 and O3 Mini's Remarkable Performance
O3 and O3 Mini posted strong results across a range of benchmark tests, particularly in programming and mathematics, far surpassing their predecessor, O1.
Strong Performance on Software-Style Benchmarks
On SWE-bench Verified, a benchmark made up of real-world software engineering tasks, O3 achieved 71.7% accuracy, an improvement of 22.8 percentage points over O1. In competitive programming, it reached a Codeforces Elo rating of 2,727, above the 2,665 rating of OpenAI's Chief Scientist. In mathematics, O3 answered 96.7% of questions correctly on the 2024 American Invitational Mathematics Examination (AIME), a qualifying exam for the USA Mathematical Olympiad, compared with 83.3% for O1.
Furthermore, O3 scored over 25% on Epoch AI's FrontierMath benchmark, currently considered the most difficult mathematics benchmark. This is an impressive result given that other AI models had scored below 2%.
O3 Mini also performed well, matching or exceeding O1 Mini in both programming and mathematics at significantly lower cost.
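To put the headline numbers above in context, the short Python sketch below works through the arithmetic. It is only an illustration under stated assumptions: the O1 baseline on SWE-bench Verified is inferred from the 22.8-point gain rather than quoted directly, the function names are hypothetical, and the win-probability calculation uses the standard Elo expected-score formula, which may differ from how Codeforces computes ratings internally.

```python
def implied_baseline(new_score: float, gain_points: float) -> float:
    """Baseline score implied by a new score and a percentage-point gain."""
    return round(new_score - gain_points, 1)


def relative_gain(new_score: float, baseline: float) -> float:
    """Improvement expressed as a fraction of the baseline score."""
    return (new_score - baseline) / baseline


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (assumed formula)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# SWE-bench Verified: 71.7% for O3, 22.8 points above O1 (figures from the article).
o1_swe = implied_baseline(71.7, 22.8)
print(f"Implied O1 SWE-bench score: {o1_swe}%")
print(f"Relative gain over O1: {relative_gain(71.7, o1_swe):.1%}")

# Mathematics exam: 96.7% for O3 versus 83.3% for O1.
print(f"Math exam gain: {round(96.7 - 83.3, 1)} percentage points")

# Codeforces: 2,727 (O3) versus 2,665 (OpenAI's Chief Scientist).
print(f"Elo expected score, 2727 vs 2665: {elo_expected_score(2727, 2665):.2f}")
```

A percentage-point gain and a relative gain answer different questions, which is why the sketch reports both. The Elo calculation likewise suggests that a 62-point rating gap is a modest edge rather than a decisive one.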
Record-Breaking Results on the ARC AGI Benchmark
O3 also set a new record on ARC AGI, a benchmark on which AI models have long struggled. ARC AGI is designed to measure progress toward AGI (Artificial General Intelligence) using tasks that are easy for humans but difficult for AI. Until now, humans had averaged around 85% correct, while the best AI scores hovered around 30%.
O3 Surpasses Human-Level Performance on ARC AGI
On the ARC AGI semi-private evaluation set, O3 achieved 75.7% accuracy under low-compute settings, placing first on the public leaderboard. Under high-compute settings it reached 87.5%, roughly three times the score of previous models and above the human average of 85%.
This marks the first time an AI model has reached human-level performance on ARC AGI. Greg Kamradt, president of the ARC Prize Foundation, called the result an important milestone toward AGI and said he looked forward to further collaboration with OpenAI.
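The comparison with the human average and with earlier models can be checked with a few lines of Python. This is a minimal sketch, not an official scoring method: the 85% human average and the roughly 30% prior best are the approximate figures quoted in this article, and the names `HUMAN_AVERAGE`, `PRIOR_BEST`, and `compare` are hypothetical.

```python
HUMAN_AVERAGE = 85.0  # human average quoted in the article (percent)
PRIOR_BEST = 30.0     # approximate best earlier AI score quoted in the article

O3_SCORES = {"low-compute": 75.7, "high-compute": 87.5}


def compare(score: float) -> dict:
    """Gap to the human average (points) and multiple of the prior best."""
    return {
        "vs_human_points": round(score - HUMAN_AVERAGE, 1),
        "multiple_of_prior": round(score / PRIOR_BEST, 2),
    }


for setting, score in O3_SCORES.items():
    result = compare(score)
    print(f"{setting}: {score}% -> {result}")
```

Because the prior-best figure is only approximate, the resulting multiple should be read as a rough indication. Only the high-compute setting clears the human average.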
What Comes Next
O3 and O3 Mini are not yet publicly available. OpenAI is conducting internal safety testing and giving external researchers access to verify safety before a broader release.
For now, early access is limited to safety and security researchers. By filling out an application form on OpenAI's website, they could take part in safety testing of O3 and O3 Mini and be among the first to evaluate these next-generation models. (Applications were accepted through January 10.)
Public Release Timeline
OpenAI has announced a plan to release O3 Mini to the general public at the end of January, with O3 to follow shortly after. However, the release schedule is subject to change depending on the results of safety testing.
OpenAI has also published a report on a new safety technique called "Deliberative Alignment." Conventional safety training shows a model examples of safe and unsafe prompts so that it learns the boundary between acceptable and unacceptable content. The new technique instead uses the model's reasoning capabilities to judge the safety of a prompt more accurately, which improves the tradeoff between safety and performance and points toward AI models that are both safer and more capable.
Summary
O3 and O3 Mini, the next-generation AI models announced by OpenAI, have demonstrated remarkable performance in programming and mathematics and reached human-level performance on the ARC AGI benchmark, a potential harbinger of a new era in artificial intelligence.
OpenAI is taking careful measures to ensure safety, conducting both internal testing and external researcher evaluations. The company is also working on the new safety technology "Deliberative Alignment," aiming to realize AI models that are both safer and more capable.
Public Launch Dates Subject to Safety Results
O3 Mini is expected to be released publicly at the end of January and O3 shortly after, though timing may change depending on safety test outcomes. These efforts by OpenAI represent an important step in advancing artificial intelligence while ensuring its safety.
Reference: OpenAI Official HP "Day 12 — o3 preview & call for safety researchers"
