
Three Generative AI Breakthroughs: How Veo3, ChatGPT, and Eleven v3 Are Evolving Creative Work and Revenue Models

2026-01-21 Hamamoto



From TIMEWELL

This is Hamamoto from TIMEWELL.


Generative AI Enters a New Era

The rapid evolution and spread of AI technology is bringing major transformation not only to creative fields but to the very front lines of business and marketing. In particular, the arrival of Google DeepMind's video generation technology "Veo3" has become a turning point for the AI video industry. Within just one week, AI-generated video content was flooding social media — a clear signal that a new era has arrived. Add to this OpenAI's advanced voice interaction update for ChatGPT and the release of Eleven Labs' emotionally rich voice generation model "v3," and the possibilities for AI-powered storytelling and branding have expanded dramatically.

This article explores how these latest AI technologies are evolving, what real-world use cases already exist, and what benefits they offer to companies and creators.

  • The Impact of Veo3: The Revolutionary Model That Simultaneously Generates Video and Audio
  • The Era of "Talking to AI" — ChatGPT's Voice Capability Revolution
  • Eleven Labs v3 Voice Generation and the Reality of Consumer AI Revenue Growth
  • Summary

The Impact of Veo3: The Revolutionary Model That Simultaneously Generates Video and Audio

Google DeepMind's latest video generation model "Veo3" introduces capabilities that set it apart from previous AI technology. Going beyond prior static image generation and short clip creation, this model can generate both audio and video simultaneously from text — enabling realistic dialogue scenes and dynamic scene transitions based on the user's specific scenario inputs. For example, a street corner interview scene or characters having a conversation can be generated from a single text prompt, making expressions that were difficult in conventional video production and seamless storytelling accessible to anyone.

What makes Veo3 stand out is that it generates audio simultaneously with the video. Users no longer need a separate voice synthesis tool — a single prompt creates content with integrated visuals and dialogue. For example, a specific scene like "a street-style interview where a man asks a woman 'Which dating app do you use?' and she responds 'What do you mean by that?'" can be created all at once — allowing users to enjoy dynamic storytelling while maintaining visual and audio coherence.

This technology is a natural evolution building on the success of the earlier Veo2 model. Veo2 had already earned recognition for high-quality video generation, with improvements in physical movement and character consistency within scenes. Veo3 takes that further by adding audio generation, producing dialogue and situations that feel as if they could actually exist. For now, however, there is a constraint: clips are limited to 8 seconds. Creating longer videos requires connecting multiple 8-second clips, so keeping each character's appearance, personality, and other established details consistent from clip to clip becomes important for overall coherence.
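The clip-stitching workflow described above can be sketched in a few lines of Python. This is a minimal planning sketch, not a Veo3 client: it makes no API calls, and the names `plan_clips`, `ClipPlan`, and the "character sheet" convention of repeating one fixed character description in every prompt are hypothetical, introduced only for illustration. The only figure taken from the article is the 8-second clip limit.

```python
import math
from dataclasses import dataclass

CLIP_SECONDS = 8  # per-clip limit stated in the article


@dataclass
class ClipPlan:
    index: int    # position of the clip in the final video
    start_s: int  # start time within the final video, in seconds
    end_s: int    # end time within the final video, in seconds
    prompt: str   # text prompt to submit for this clip


def plan_clips(character_sheet: str, beats: list[str], target_seconds: int) -> list[ClipPlan]:
    """Split a target runtime into clips of at most 8 seconds, one story beat per clip.

    The same character sheet is prepended to every prompt (an assumed
    convention) so each clip restates appearance and personality.
    """
    n_clips = math.ceil(target_seconds / CLIP_SECONDS)
    if len(beats) < n_clips:
        raise ValueError(f"need {n_clips} beats, got {len(beats)}")
    plans = []
    for i in range(n_clips):
        start = i * CLIP_SECONDS
        end = min(start + CLIP_SECONDS, target_seconds)
        prompt = f"{character_sheet}\nScene {i + 1}: {beats[i]}"
        plans.append(ClipPlan(i, start, end, prompt))
    return plans


# Hypothetical example: a 30-second street interview built from four clips.
sheet = "Host: man in his 30s, navy jacket, calm voice."
beats = [
    "Host greets a passerby.",
    "He asks which dating app she uses.",
    "She answers with a question of her own.",
    "Both laugh and wave goodbye.",
]
for clip in plan_clips(sheet, beats, target_seconds=30):
    print(f"clip {clip.index}: {clip.start_s}-{clip.end_s}s")
```

Repeating the character description in every prompt is one plausible way to reduce drift between clips. Whether it is sufficient depends on the model and would need to be checked against actual output.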

Veo3 was initially offered exclusively through Google's AI Ultra plan, which costs approximately $250 per month (~¥36,000). More recently, API access has expanded and consumer video generation platforms have entered the space, diversifying the ways it can be used. For small businesses and individual creators, low-cost plans at around $10 per month (~¥1,450) have appeared, and pay-per-use pricing of about 75 cents (~¥109) per second of generated video has also been introduced. Anyone can now generate video content at low cost and use it for social media or marketing.

For companies, adopting Veo3 is not only an opportunity to lower market entry barriers dramatically — it's a major opportunity to improve content production efficiency. Previously, video production required high technical skill or specialist creators. But this technological innovation means that companies and agencies with a good idea can produce diverse video content in a short time. Promotional videos and brand stories can be generated automatically in high quality audio and visual form from a text scenario alone, without the need for traditional shooting or editing. This allows marketing cost reduction and rapid campaign execution — a significant competitive advantage.

There are challenges, however. The 8-second clip constraint is one. Creating long-form videos requires connecting short clips, and maintaining character consistency and narrative continuity through that process is a key consideration. In audio generation too, when specific characters, accents, or expressive dialogue are needed, the model may fall short on material it has not learned. Further model updates and supplementary tools are anticipated to address these challenges, and on the enterprise side, phased adoption and operational ingenuity will be required.

In this way, Veo3 is not merely a technological innovation — it holds the potential to revolutionize corporate video marketing strategy and content production processes. New forms of "faceless" channels are rapidly emerging, and changes to the existing paradigm of personal brands and YouTubers are anticipated. Concretely, individuals who don't show their faces can now distribute video through AI-generated virtual characters — opening up new business models across the entire content industry.


The Era of "Talking to AI" — ChatGPT's Voice Capability Revolution

OpenAI's ChatGPT has long been used across business settings and daily conversation for its advanced text generation capability. But the latest update to its voice interaction feature has expanded that potential even further. Early ChatGPT was limited to text-based responses, so the introduction of a voice interface was eagerly anticipated as a leap in convenience for users. This update moves beyond "basic voice generation" to deliver an "advanced voice mode" with human-like dynamism in delivery, pacing, and intonation — dramatically improving the realism of dialogue.

ChatGPT's new voice mode adapts its emotional expression to the flow of conversation while achieving natural intonation and speech patterns. For example, questions are voiced with a questioning intonation rather than read flatly, giving a stronger impression than before of conversing with a real person. In actual demonstrations, small "um" and "uh" sounds were inserted during dialogue with ChatGPT, and rather than sounding awkward, these contributed to a natural conversational rhythm.

ChatGPT's voice capability has also evolved amid competition with other AI voice models and open-source voice synthesis tools. While other AI voice models had already incorporated improvements pursuing human-like qualities, OpenAI has been advancing improvements cautiously, informed by user feedback and market needs. Some users had previously found the artificial "perfection" of AI voices to feel unnatural — but this update deliberately introduces human-like "imperfection," creating more realistic dialogue.

ChatGPT's voice interaction capability does more than smooth conversational flow. It also opens up applications in corporate customer support, online education, personal assistants, and other operational settings. Specifically, it could serve as an AI operator available 24 hours a day, 365 days a year in place of phone and video calls, both reducing corporate operational costs and improving service quality for users. Major companies like Google and Apple are also announcing new AI features, with Apple's real-time translation and Genmoji cited as examples, and competition is intensifying.

Behind all this is a rapidly growing need from consumers and businesses for more advanced, realistic AI experiences. Companies are making large investments in technological innovation to establish competitive advantage in this market. ChatGPT's advanced voice mode is just one example. As user experience improves, communication that was previously limited to text is expected to develop into richer, multimodal forms that combine voice and visuals.

Furthermore, the evolution of conversation through the latest voice technology holds the potential to transcend international language barriers. Scenarios where different languages switch naturally within a single conversation, with mutual interpretation possible, are now conceivable. In demonstrations, ChatGPT was seen switching between multiple languages to respond flexibly to users' questions in real time — raising hopes for resolving the communication challenges global companies face.

In this way, ChatGPT's evolved voice interaction capability is positioned not merely as a feature extension but as a strategic tool directly connected to the innovation of corporate service delivery and global communication. Riding the wave of technological innovation, the adoption of voice interfaces in business settings is advancing rapidly — and going forward, the potential will only expand further, both as internal employee tools and as personalized support systems for customers. From the efforts of OpenAI and its competitors alike, it's becoming clear that the shape of future business communication is on the verge of significant transformation.


Eleven Labs v3 Voice Generation and the Reality of Consumer AI Revenue Growth

Eleven Labs' latest voice generation model "v3" represents next-generation technology in the AI voice market, with particularly notable advances for storytelling, advertising, and marketing. The model goes beyond the limitations of previous text-to-speech technology and is attracting attention for its ability to express emotion, intonation, accents, and even subtle nuances in speech. Previously, users either recorded their own voice or depended on external audio editing tools. Eleven v3's greatest strength is that tags and prompt instructions in the script can produce natural pauses, emotional expression, and changes in tone in a single generation pass. In actual demonstrations, a promotional video for a frozen yogurt brand, done in a milk-churning style, naturally and realistically reproduced each character's vocal tone, emotional variation, and even sound effects at scene transitions, making a strong impression on viewers.
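To make the tagging idea concrete, here is a minimal sketch of how a tagged script might be assembled before being sent to a voice model. It only builds a string and calls no Eleven Labs API. The square-bracket tag syntax, the specific tag names, and the `Line` and `render_script` helpers are all assumptions made for illustration, so check the vendor's documentation for the tags the model actually supports.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    speaker: str  # character name, used only as a label in the script
    text: str     # the words to be spoken
    tags: list[str] = field(default_factory=list)  # delivery cues (hypothetical names)


def render_script(lines: list[Line]) -> str:
    """Render dialogue as one text block with inline delivery tags.

    Each tag is written in square brackets before the spoken text
    (an assumed convention, not a confirmed specification).
    """
    rendered = []
    for line in lines:
        prefix = "".join(f"[{tag}] " for tag in line.tags)
        rendered.append(f"{line.speaker}: {prefix}{line.text}")
    return "\n".join(rendered)


# Hypothetical two-line ad read with per-line delivery cues.
script = render_script([
    Line("Narrator", "Fresh frozen yogurt, churned this morning.", ["warm"]),
    Line("Customer", "Wait, this is actually yogurt?", ["surprised", "laughs"]),
])
print(script)
```

Keeping the cues as structured data rather than hand-typed brackets makes it easier to vary the delivery of one line and regenerate only that part of the audio.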

This technological innovation has a significant impact on creative fields and marketing strategy. The improved new voice generation capability allows creators to produce content suited to more diverse situations than before, with diverse modes of expression being explored in advertising campaigns, promotional videos, and corporate branding strategies. Projects that previously required hiring voice actors, for example, can now achieve cost savings and speed improvements using Eleven v3. Furthermore, this technology can be applied not just to narration and dialogue within video but to automatic response systems in product description and customer support — making it increasingly essential to comprehensive corporate marketing strategy.

In the consumer AI product market, recent rapid revenue growth is attracting attention. The AI business was previously B2B-centric, but subscription-based B2C models are spreading and monetization of AI creative tools is advancing. For example, the 12-month ARR (annual recurring revenue) of consumer AI companies has reached approximately $4.2 million (~¥600 million), with top companies recording $8.7 million (~¥1.26 billion). Even at a relatively high average price of approximately $22 per month (~¥3,200), users show strong willingness to pay, and paid conversion rates are also rising.
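As a rough sense of scale, the ARR and price figures above can be combined to estimate how many paying subscribers they imply. This back-of-the-envelope sketch assumes all revenue comes from monthly subscriptions at the average price, and it ignores churn, annual discounts, and add-on revenue. The function name `subscribers_for_arr` is hypothetical.

```python
import math


def subscribers_for_arr(arr_usd: float, monthly_price_usd: float) -> int:
    """Smallest subscriber count whose yearly subscription revenue reaches arr_usd.

    Assumes every subscriber pays monthly_price_usd for all 12 months.
    """
    yearly_revenue_per_subscriber = monthly_price_usd * 12
    return math.ceil(arr_usd / yearly_revenue_per_subscriber)


# Figures quoted in the article; the single-price model is an assumption.
typical = subscribers_for_arr(4_200_000, 22)  # 15910
top = subscribers_for_arr(8_700_000, 22)      # 32955
print(typical, top)
```

Under these assumptions, the quoted revenue levels correspond to roughly 16,000 and 33,000 paying users respectively.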

Behind this revenue growth is the rapid evolution of AI technology and its ability to solve problems users actually face. For example, building a customer base, which previously took companies years, can now happen far faster with AI tools, contributing directly to branding and market expansion. In creative domains particularly, image generation, video editing, voice generation, and other creative processes are being automated by AI, improving efficiency. As a result, companies can produce large volumes of high-quality content with limited resources, which is directly linked to revenue growth.

Furthermore, voice generation technology like Eleven v3 is extremely useful not only for traditional "emotionally rich dialogue" and "seamless voice conversion" but as a new creative tool for building personal brands and marketing strategies for small and mid-sized businesses. For example, an individual can use AI to quickly and cheaply generate video for their brand image and product promotion, or marketing materials combining logos and product photos with audio — and it's expected that more and more companies will leverage these tools going forward.

In this way, evolved AI voice technology like Eleven v3, along with the rapid revenue growth of consumer AI, is not merely a technological innovation — it will have a major impact on future market competition and the nature of corporate strategy. By maximizing the potential of new creative tools, companies can refresh their brands and pursue further market expansion. As flexible approaches unconstrained by conventional concepts are required, Eleven v3's emergence as a success story for the consumer AI market as a whole will undoubtedly continue to attract attention.


Summary

This article has explored the latest AI video generation model Veo3, ChatGPT's advanced voice interaction update, the emotionally rich Eleven v3 voice generation model, and the rapid revenue growth in consumer AI that accompanies them. In each domain, the practical use cases these technologies enable and their usefulness for corporate strategy have become clear, and the potential for AI use in business settings is very large.

First, Google DeepMind's Veo3 has revolutionized the conventional video production process by simultaneously generating video and audio from a single text prompt. Even at just 8 seconds per clip, the ability to fuse visuals and audio while maintaining scene construction and character consistency is driving dramatic efficiency gains in advertising and promotional use. This has enabled companies to produce large quantities of diverse content quickly and at low cost without the traditional need for shooting and editing, bringing revolutionary change to branding strategy.

ChatGPT's evolution has also expanded beyond a mere text generation tool into the realm of voice dialogue, and its naturalness and human-like conversational expression suggest that its use in customer support, personal assistants, and multilingual communication tools will accelerate going forward. Voice-based AI dialogue is bringing revolutionary changes to user experience compared to text-based services, becoming an important tool for companies to deepen relationships with customers.

Furthermore, the emergence of Eleven v3 has made possible the generation of emotionally expressive voice content beyond the bounds of previous voice generation. This is creating an environment where companies can generate more personalized voice content at low cost for marketing, promotion, and internal communication tools. As noted above, consumer AI companies — unlike the previously B2B-centric business model — are achieving rapid revenue growth by directly capturing subscription income from users. The success factors of rising subscription prices, improved conversion rates from free to paid users, and revenue expansion through optional features are emblematic of the improving value of AI products in the market as a whole.

Finally, these technological innovations are not limited to individual tools and services. They are driving major transformation across the creative industry and in corporate strategy as a whole. Business professionals need to accurately grasp the evolution of these AI technologies and the new market opportunities they bring, and to consider how to apply them to their own marketing strategy and product development. As AI technology becomes increasingly practical and user-friendly, the environment in which companies can benefit from it will continue to develop. Companies that actively adopt these technologies can expect both to transform their existing creative processes and to improve business performance and competitive advantage in the market.

Reference: https://www.youtube.com/watch?v=fySodSi4aUU
