Stripe's 'Minions' and the Reality of Autonomous AI Agents

2026-02-24 | 濱本 隆太
Tags: AI, AI Agents, Automation, Productivity, WARP

A comprehensive look at Stripe's autonomous coding agent "Minions" — which generates over 1,000 pull requests per week without human intervention — alongside the state of AI agent development at Google, Microsoft, OpenAI, Anthropic, and leading startups.

Hello, I'm Hamamoto from TIMEWELL. Today's topic is autonomous AI agents.

Over the past year, the software development landscape has changed completely. The era in which AI merely completes your code is already ending. What is emerging now are agents: a developer gives an instruction, and the AI independently plans the tasks, writes the code, runs the tests, and fixes the failures.

A symbolic moment in this shift came in February 2026. Stripe, the payments platform giant, published details of its internally operated autonomous coding agent, "Minions," on its blog. More than 1,000 pull requests per week are written entirely by AI, with no human writing any of the code. That fact shocked me, and many other engineers.

So I became curious: what about everyone else? I investigated everything from major players like Google, Microsoft, Amazon, OpenAI, and Anthropic to startups like Cognition AI. It's quite long, but I'd love for you to stay with me.

Stripe's "Minions" and the Reality of Autonomous Agents

The title of the blog post Stripe published on February 9, 2026, was "Minions: Stripe's one-shot, end-to-end coding agents."

Minions operates completely autonomously. A developer simply writes "fix this bug" in a Slack thread, and the agent starts working on its own, eventually submitting a pull request that has passed all CI tests. In the meantime, humans do nothing. At Stripe this is an everyday scene, which is astonishing.

In numbers: more than 1,000 pull requests per week are generated by Minions and merged. Humans do the final review, but the code itself is 100% AI-produced. Engineers can start multiple Minions at once to process small tasks in parallel, and they even use them to clear out, in one go, the small issues that pile up during on-call duty.

Why Build In-House?

There are plenty of excellent AI coding tools in the world — Cursor, Claude Code, and others. Why did Stripe build their own agent?

The reason is simple: Stripe's codebase is too specialized.

Stripe's system consists of hundreds of millions of lines of code. Most of the backend is written in Ruby (not Ruby on Rails) and typed with Sorbet, which is already an unusual stack. On top of that, there are mountains of internal-only libraries that external LLMs have never seen in training.

Then there is the fact that Stripe processes over a trillion dollars in payments annually. In an environment where a single bug could affect businesses worldwide, it could not simply hand its code to general-purpose AI tools. The code also carries complex dependencies on financial institutions and must satisfy regulatory and compliance requirements in each country.

Stripe's design philosophy is clear: "What's good for humans is good for LLMs." Minions gets the same tools and environments that human engineers use. Stripe solved the problem by integrating the agent closely with the developer productivity tooling it has invested in over the years.

How Minions Works

Three cleverly designed mechanisms support Minions' autonomous task execution.

First, isolated development environments called "devboxes." The same environment that Stripe engineers normally use starts in 10 seconds, with code and services pre-loaded, and it is completely isolated from production and the internet. As a result, the agent can safely execute code without asking a human for permission.

Second, agent loops and tool integration. At the core is a fork of "goose," the open-source agent developed by Block. Stripe customized it to combine the LLM's creative reasoning with deterministic steps such as git operations and test execution. Here's the interesting part: through Model Context Protocol (MCP), Minions connects to "Toolshed," a foundation that exposes over 400 internal and external tools as callable APIs. This lets Minions read internal documentation, check build status, and make situational judgments just as a human engineer would.
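To make the idea of a central tool catalog concrete, here is a minimal sketch of a registry that an agent loop could call into. It is not Stripe's code and it does not implement the MCP wire protocol; `ToolRegistry`, `register`, `list_tools`, `call`, and the two example tools are hypothetical names chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolRegistry:
    """Hypothetical stand-in for a central tool catalog (not Stripe's Toolshed)."""
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)
    descriptions: Dict[str, str] = field(default_factory=dict)

    def register(self, name: str, description: str, fn: Callable[..., str]) -> None:
        # Each tool is a plain function plus a description the model can read.
        self.tools[name] = fn
        self.descriptions[name] = description

    def list_tools(self) -> List[str]:
        # What the agent sees when it decides which tool to call.
        return [f"{name}: {desc}" for name, desc in sorted(self.descriptions.items())]

    def call(self, name: str, **kwargs) -> str:
        if name not in self.tools:
            # Return the problem as text so the agent can react to it.
            return f"error: unknown tool '{name}'"
        return self.tools[name](**kwargs)


# Two illustrative tools; real ones would query internal systems.
registry = ToolRegistry()
registry.register("build_status", "Report CI status for a branch",
                  lambda branch: f"{branch}: passing")
registry.register("search_docs", "Search internal documentation",
                  lambda query: f"no results for '{query}'")

print(registry.list_tools())
print(registry.call("build_status", branch="main"))
```

The point of the design is that the agent sees one uniform interface no matter how many tools sit behind it, so adding a tool does not require changing the agent loop.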

Third, feedback loops for self-correction. Minions aims for "one-shot completion," but it also has mechanisms for recovering from failure. When code is pushed, fast local tests that complete within 5 seconds run first. Once these pass, CI runs the most relevant tests out of a suite of more than 3 million. If a failing test has an automatic fix available, that fix is applied immediately. If that is still not enough, the failure information is fed back to Minions, which fixes the code and tries again. CI runs at most twice, however. This efficiency rests on the shift-left philosophy: return feedback as early in the development process as possible.
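The control flow described above can be sketched roughly as follows. This is a simplified illustration, not Stripe's implementation: `run_with_feedback`, `CheckResult`, and the four callables are hypothetical, and the only number taken from the article is the cap of two CI runs. As a further simplification, a failed local check is handed to the agent once and not re-run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

MAX_CI_RUNS = 2  # the article states that CI runs at most twice


@dataclass
class CheckResult:
    passed: bool
    failures: List[str] = field(default_factory=list)


def run_with_feedback(
    code: str,
    local_checks: Callable[[str], CheckResult],
    ci: Callable[[str], CheckResult],
    autofix: Callable[[str, List[str]], Tuple[str, List[str]]],
    agent_fix: Callable[[str, List[str]], str],
) -> Tuple[bool, str, int]:
    """Return (success, final_code, ci_runs_used)."""
    # Shift left: run the cheap local checks before spending a CI run.
    local = local_checks(code)
    if not local.passed:
        code = agent_fix(code, local.failures)

    ci_runs = 0
    while ci_runs < MAX_CI_RUNS:
        result = ci(code)
        ci_runs += 1
        if result.passed:
            return True, code, ci_runs
        if ci_runs == MAX_CI_RUNS:
            break  # budget exhausted; a human takes over from here
        # Deterministic autofixes first; whatever remains goes to the agent.
        code, remaining = autofix(code, result.failures)
        if remaining:
            code = agent_fix(code, remaining)
    return False, code, ci_runs


# Toy stand-ins so the sketch runs end to end.
def toy_local(code: str) -> CheckResult:
    return CheckResult(passed=True)


def toy_ci(code: str) -> CheckResult:
    if "bug" in code:
        return CheckResult(passed=False, failures=["test_refund"])
    return CheckResult(passed=True)


def toy_autofix(code: str, failures: List[str]) -> Tuple[str, List[str]]:
    return code, failures  # nothing is auto-fixable in this toy example


def toy_agent_fix(code: str, failures: List[str]) -> str:
    return code.replace("bug", "fix")


ok, final_code, ci_runs = run_with_feedback(
    "charge(); bug", toy_local, toy_ci, toy_autofix, toy_agent_fix
)
print(ok, final_code, ci_runs)
```

Capping the number of CI runs keeps a stuck agent from burning compute indefinitely; when the budget runs out, the function reports failure and leaves the decision to a person.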

Honestly, I was impressed that such a refined system has been built and is operated internally. Minions is not just a code generation tool; it is an engineering system that operates autonomously under the enormous pressure of $1 trillion in annual payments.

The Frontier of Autonomous AI Agent Development: Where Companies Are Competing

The future that Minions points to does not belong to Stripe alone. Major tech companies are investing heavily in autonomous AI agents to seize leadership in the next generation of software development. Approaches vary, but the direction is shared: free developers from repetitive tasks so they can focus on more creative problem-solving.

The Giants Who Control the Development Platform

Companies that have long dominated the software development ecosystem are attempting to redefine the developer experience itself by deeply integrating AI agents into IDEs and cloud platforms.

GitHub Copilot Workspace shows how serious Microsoft is. This feature, released as a technical preview in April 2024, goes far beyond traditional code completion. Write "I want to add this feature" in natural language on a GitHub Issue, and Copilot Workspace autonomously handles spec definition, file identification, coding, test execution, and pull request creation. The agent operates in a sandbox environment, and developers can inspect and modify the plan and the execution process at any time. AI is evolving from a "completion tool" into a "team member," and Microsoft's strategy of delivering this experience on GitHub, the center of development, is quite clever.

Amazon Q Developer's greatest weapon is its AWS integration. Originally offered as a code completion feature under the name CodeWhisperer, it has evolved into a broader developer assistant. Give it a natural-language instruction for an AWS-specific task, such as "write a Lambda function that uploads images to S3," and it generates the code. It also handles vulnerability scanning, code optimization suggestions, and Q&A about AWS best practices. Amazon reportedly already uses thousands of AI agents internally, and feeding that experience back into the product is a significant strength.

Google DeepMind's approach is slightly different. AlphaCode 2, announced in December 2023, outperformed 85% of human participants in competitive programming, an extreme problem-solving domain. Built on Gemini, it uses a search mechanism that generates a large number of code candidates and then filters, clusters, and scores them. AlphaEvolve, announced in May 2025, goes further: it is an "evolutionary coding agent" in which AI discovers and evolves algorithms itself. It improved on the best-known algorithm for 4x4 complex matrix multiplication, a record that had stood for years. The ambition is to push the frontier of intellectual production itself, not just near-term development efficiency.
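The generate, filter, cluster, and score pipeline can be illustrated with a toy example. This is a sketch of the general idea only, not DeepMind's system: the candidates here are hand-written lambdas standing in for model-generated programs, `select_candidate` and `safe_run` are hypothetical names, and cluster size is used as a stand-in for a learned scoring model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Program = Callable[[int], int]  # stand-in for a generated program


def safe_run(program: Program, x: int) -> Optional[int]:
    # A crashing candidate simply produces no output.
    try:
        return program(x)
    except Exception:
        return None


def select_candidate(
    candidates: Sequence[Program],
    examples: Sequence[Tuple[int, int]],
    probe_inputs: Sequence[int],
) -> Optional[Program]:
    # 1. Filter: keep only candidates that pass the known examples.
    survivors = [
        p for p in candidates
        if all(safe_run(p, x) == y for x, y in examples)
    ]
    if not survivors:
        return None

    # 2. Cluster: candidates that behave identically on the probe inputs
    #    are treated as the same underlying solution.
    clusters: Dict[Tuple[Optional[int], ...], List[Program]] = defaultdict(list)
    for p in survivors:
        signature = tuple(safe_run(p, x) for x in probe_inputs)
        clusters[signature].append(p)

    # 3. Score: here, the largest cluster wins (assumption: agreement among
    #    independent samples is a signal of correctness).
    best_cluster = max(clusters.values(), key=len)
    return best_cluster[0]


# Task: square a number. The only known example is 2 -> 4.
candidates = [
    lambda x: x * x,   # correct
    lambda x: x ** 2,  # correct, written differently
    lambda x: x + 2,   # passes the example by coincidence
    lambda x: x * 3,   # fails the example
]
chosen = select_candidate(candidates, examples=[(2, 4)], probe_inputs=[3, 5])
print(chosen(7))
```

Filtering alone would let the coincidental `x + 2` through; clustering on extra inputs is what separates it from the two candidates that agree with each other.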

Next-Generation Coding Partners from LLM Developers

The companies that build the LLMs behind ChatGPT and Claude are themselves aiming for more advanced, general-purpose coding agents that draw on their most cutting-edge models.

Codex App, announced by OpenAI in February 2026, is a native application for macOS. Beyond conversing with a cloud AI, it can run multiple AI agents simultaneously in a local environment and coordinate them on long-running tasks. Equipped with GPT-5.2-Codex, it supports the full software lifecycle from design to maintenance. Coding-adjacent tasks such as generating UI code from Figma designs, updating tasks in the project management tool Linear, and deploying to cloud services can be executed as "skills." One reported case involved developing an entire racing game from a single prompt, using over 7 million tokens. Since release, total Codex usage has doubled, with over 1 million developers using it in the past month.

Anthropic's Claude Code has built a unique position. It can understand an entire codebase and autonomously execute tasks across multiple files and tools. What personally shocked me is that approximately 90% of Claude Code's own code is written by Claude Code itself. AI developing AI: "self-reproduction" has already begun. Technically, it can integrate with Jira and Slack via Model Context Protocol (MCP), which Stripe also adopted, and agent behavior can be customized per project through a project-level instruction file called CLAUDE.md. Netflix has also been reported to use it for bug fixes.

Cognition AI's Devin appeared suddenly in 2024 and caused a stir in the industry. Billed boldly as "the world's first autonomous AI software engineer," it takes ambiguous requirements, plans tasks on its own, reads documentation to learn unfamiliar technologies and APIs, and handles everything from coding to deployment end-to-end. Adoption at Goldman Sachs and a partnership with the major IT consulting firm Cognizant have been announced, and it is advancing rapidly into enterprise use. A 2025 performance review reported a 4x improvement in problem-solving speed and a 2x improvement in resource efficiency. A merge rate of 67% for the pull requests it creates has also been published.

As an aside, the "Confucius Code Agent (CCA)," announced by Meta and Harvard University in January 2026, is also interesting. Rather than relying solely on a powerful model, it emphasizes the architectural design of the agent itself. Its distinguishing feature is a design that clearly separates three aspects: "Agent Experience," which manages the information handled by the AI model; "User Experience," for the humans monitoring it; and "Developer Experience," for improving the agent. It achieved a 54.3% solution rate on the SWE-Bench-Pro benchmark.

Practical Agents Born from the Field

Major tech companies aren't the only protagonists. Distinctive agents built in response to needs from day-to-day development work are also emerging.

Cursor's Cursor Agent proposes a design philosophy built on a multi-agent architecture. Rather than relying on a single powerful agent, a "Planner" plans and divides the overall development task, while many "Workers" execute the individual coding tasks in parallel. This division of roles enables efficient development without conflicts between agents, even on large-scale projects. More than 20,000 developers at Salesforce use Cursor, with over 90% of them actively using it. Experiments in building a web browser (over 1 million lines) and a Windows 7 emulator (over 1.2 million lines) using only AI agents have been reported, and the scalability is striking.
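A Planner-Worker split can be sketched in a few lines. This is an illustration of the pattern, not Cursor's implementation: `plan`, `work`, and `run` are hypothetical, and the choice to give each worker its own file is an assumption about one simple way to avoid conflicts between agents.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def plan(task: str, files: List[str]) -> List[Dict[str, str]]:
    """Hypothetical planner: one subtask per file, so no two workers
    ever edit the same file."""
    return [{"file": f, "instruction": f"{task} in {f}"} for f in files]


def work(subtask: Dict[str, str]) -> str:
    """Hypothetical worker: a real one would call a model and edit the file."""
    return f"done: {subtask['instruction']}"


def run(task: str, files: List[str], max_workers: int = 4) -> List[str]:
    subtasks = plan(task, files)
    # Workers run in parallel; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, subtasks))


results = run("rename foo to bar", ["a.py", "b.py", "c.py"])
for line in results:
    print(line)
```

Because the planner owns the decomposition, workers need no coordination with each other, which is what lets the number of workers grow with the size of the project.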

Block's goose is open-source autonomous agent software, also known as the project Stripe forked. Its philosophy is that developers stay in full control of the agent, and because it runs in a local environment, there is no need to send confidential code outside the company. You can connect your preferred LLM and integrate your own tools and APIs, so it is freely extensible. Within Block, work on a Google Docs extension that had dragged on for months was reportedly completed in 30 minutes.

Uber's AutoCover is an agent specialized for one specific challenge: test code generation. Writing test code is essential for quality assurance, but for developers it is time-consuming, repetitive work. With multiple sub-agents coordinating, AutoCover autonomously handles everything from creating test templates to generating code, executing it, and fixing failures. It has cumulatively saved 21,000 developer hours and is used by 5,000 engineers, which is proof that agents specialized in a specific domain can generate clear ROI.

The Wave of Autonomy Beyond Coding

AI agent applications aren't limited to coding.

Salesforce's Agentforce, announced in September 2024, is an autonomous agent that automates business processes in sales, service, and marketing. Drawing on the vast customer data integrated in Data Cloud, it autonomously responds to inquiries and suggests products to prospects. Because it can be customized with low-code tools, frontline staff can create agents suited to their own work. Wiley, an adopter, reported an improvement of over 40% in case resolution rates. Salesforce set a goal of "operating 1 billion agents by end of 2025."

Apple also incorporated "agentic coding" functionality into Xcode 26.3 in February 2026. Rather than building the agent itself, Apple chose to integrate external agents such as Anthropic's Claude Agent and OpenAI's Codex into Xcode via MCP. Apple, known for its self-sufficiency, chose an open protocol and collaboration with third parties for agents. That choice reflects how fast this technology area is changing and how broad its evolution is, too broad for any one company to cover alone.

Here's a summary of each company's developments:

| Company | Agent | Features | Key Results |
|---|---|---|---|
| Stripe | Minions | Fully autonomous, one-shot completion | 1,000+ PRs auto-generated weekly |
| Microsoft/GitHub | Copilot Workspace | Autonomous execution from Issue to PR | In technical preview |
| Amazon | Q Developer | Close AWS service integration | Thousands of AI agents used internally |
| Google DeepMind | AlphaCode 2 / AlphaEvolve | Algorithm discovery and evolution | Outperformed 85% of competitive programming participants |
| OpenAI | Codex App | macOS native, multi-agent coordination | 1M+ developers in the past month |
| Anthropic | Claude Code | Self-generation capability, MCP integration | About 90% of own code self-generated |
| Cognition AI | Devin | Autonomous AI software engineer | 67% PR merge rate |
| Meta | Confucius Code Agent | Three-layer architecture design | 54.3% on SWE-Bench-Pro |
| Cursor | Cursor Agent | Planner-Worker multi-agent | 20,000+ users at Salesforce |
| Block | goose | Open source, local execution | Development time cut from months to 30 minutes |
| Uber | AutoCover | Specialized in test code generation | 21,000 developer hours saved |
| Salesforce | Agentforce | Business process automation | Case resolution rate improved by 40%+ |
| Apple | Xcode agentic coding | External agent integration via MCP | Released February 2026 |

What Engineers Will Need Going Forward

Looking across these developments, from Stripe's Minions to the other companies' agents, common themes emerge.

First, autonomous task execution: not just generating code, but completing everything from understanding the Issue to planning, implementation, testing, and fixing without human intervention. This is the core of the next-generation agent.

Essential for this is integration with external tools. As Stripe's Toolshed and Anthropic's MCP symbolize, agents no longer operate in isolation. By connecting to existing infrastructure such as version control, CI/CD, project management, and internal APIs, they become capable of work that holds up in real business. The other essential is the feedback loop: agents immediately receive test failures and linter errors and fix the code themselves. I believe this fast iteration is what raises AI agent quality to human levels.

What happens as a result? As Uber's 21,000 saved hours show, routine and repetitive work is being taken over by AI. Engineers can focus on upstream judgments such as "what should be built" and "what design is optimal," and on more difficult technical challenges.

At the same time, our role changes. Code-writing skill alone is not enough. Engineers will need the ability to define problems and give accurate instructions to AI agents, the ability to evaluate AI-generated code and architecture and make the final call, and the management ability to work with AI as an excellent teammate.

Turning "We Want to Do This Too" into Reality Together

We have looked at the front line of the change now under way in software development, from Stripe's Minions to each company's efforts. From giants like Microsoft and Google to newcomers like Cognition AI, companies are competing to lead the new paradigm of "autonomous AI agents."

Among those who've read this far, many are probably thinking: "I want to use AI agents at our company too" but "I don't know where to start."

I understand that feeling. Tech giants on Stripe's scale can form dedicated teams and build in-house, but most companies cannot. Tool selection, integration with internal data, sorting out security requirements, adoption in the field: there is so much to consider that taking the first step is hard.

TIMEWELL works on exactly these challenges every day.

Our WARP consulting service comprehensively supports AI agent adoption, from strategy development through implementation. Specialists who previously worked on DX and data strategy at major companies analyze your business workflows and codebase and work with you to identify "where to put AI agents for the most impact." Drawing on knowledge that is updated monthly, we accompany you from PoC (proof of concept) through production operation.

The enterprise AI platform ZEROCK provides a foundation for AI agents to safely reference internal knowledge. Just as Stripe opened internal tools to agents via "Toolshed," we can build a mechanism for AI to use your company's information assets on domestic AWS servers.

Autonomous AI agents are not a threat — they're an opportunity. The future where you're freed from boring repetitive work and can focus on more human, creative work is right around the corner.

If you are wondering "How would it work for us?" or thinking "I'd just like to hear about it first," please feel free to consult with us. Let's ride this new wave together.


References

  • Alistair Gray. (2026, February 9). Minions: Stripe's one-shot, end-to-end coding agents. Stripe Dot Dev Blog.
  • Amazon Web Services. Amazon Q Developer.
  • GitHub. GitHub Copilot.
  • Google DeepMind. (2025, May 14). AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms.
  • OpenAI. (2026, February 2). Introducing the Codex app.
  • Anthropic. Claude Code Docs.
  • Cognition AI. Cognition.
  • Cursor. Cursor.
  • Block. goose.
  • Salesforce. Agentforce.
  • Apple. (2026, February 3). Xcode 26.3 unlocks the power of agentic coding. Apple Newsroom.
  • Uber. Agentic AI Solutions.
  • DevOps.com. (2026, January 12). Meta Introduces Confucius Code Agent.
