Sesame AI: $307M Raised, Maya/Miles Voice Assistants, and the Quest to Cross the Uncanny Valley of Voice

2026-01-21 Hamamoto

Sesame AI — founded by Oculus co-founder Brendan Iribe — has raised $307M, gone viral with its Maya and Miles voice assistants (1M+ users, 5M+ minutes of conversation), and open-sourced its CSM-1B voice generation model. This article explains what makes Sesame's "Voice Presence" approach different from conventional AI voice, and what it means for enterprise deployment.

Hello, I'm Hamamoto from TIMEWELL.

"Is this really AI?" — that was the reaction from many people who tried Sesame's demo when it launched in February 2025. Within weeks, over one million users had tested it, generating more than five million minutes of conversation. What was different wasn't the information the AI provided — it was how it sounded while providing it.


What Is Sesame AI?

Sesame is a voice AI and smart glasses startup focused on one specific problem: making AI voice feel like talking to a person rather than a machine.

Company overview:

  • Founded: 2022
  • Founders: Brendan Iribe (Oculus co-founder, former CEO) and Ankit Kumar (former CTO, Ubiquity6)
  • Total funding: $307.6M
  • Key products: Maya, Miles voice assistants; CSM-1B open-source model

Funding history:

Period | Round | Amount | Key Investors
2023 | Seed | Undisclosed | a16z, Spark Capital, Matrix Partners
October 2025 | Series B | $250M | Sequoia, Spark Capital

Sequoia published a blog post titled "A New Era of Voice" explaining their investment thesis — which gives a sense of how seriously they're treating this category.


Maya and Miles: What Made Them Go Viral

The February 2025 demo introduced two voice assistant characters:

  • Maya: warm, friendly, empathetic conversational style
  • Miles: more measured, witty, intellectually oriented style

Users can choose based on preference. The distinction isn't just personality labeling — the voice qualities, pacing, and tonal choices are genuinely different.

What Separates Sesame from Siri, Alexa, and Google Assistant

Traditional AI voice assistants use a two-step pipeline:

[LLM] → text → [TTS engine] → audio

The text-to-speech conversion step is where expressiveness is lost. The TTS engine generates audio from text, but the text itself doesn't carry tonal information — when to pause, when to inflect, what emotional register to use. The result sounds like a system reading text aloud.

Sesame's approach:

[Conversational model] → audio directly (with rhythm, emotion, expressiveness)

The end-to-end generation means the model produces the audio itself, not a text representation that gets converted. Nuance that would be lost in transcription is preserved.
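
The difference between the two pipeline shapes can be sketched in a few lines of Python. This is a toy illustration only, not Sesame's implementation: `Utterance`, `cascaded_pipeline`, and `end_to_end_pipeline` are hypothetical names, and the "audio" is just a labelled dictionary. The point it shows is that once a reply passes through a text-only interface, prosodic hints such as emotion and pause placement have nowhere to travel.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """A reply plus the prosodic intent behind it (hypothetical structure)."""
    words: str
    emotion: str = "neutral"
    pauses_after: list = field(default_factory=list)  # word indices followed by a pause


def cascaded_pipeline(reply: Utterance) -> dict:
    """[LLM] -> text -> [TTS]: only the plain string crosses the boundary."""
    text_only = reply.words  # emotion and pauses are dropped at this step
    return {"words": text_only, "emotion": "neutral", "pauses_after": []}


def end_to_end_pipeline(reply: Utterance) -> dict:
    """Single model -> audio: prosodic intent stays attached to the output."""
    return {
        "words": reply.words,
        "emotion": reply.emotion,
        "pauses_after": list(reply.pauses_after),
    }


reply = Utterance("Oh, that is great news", emotion="joy", pauses_after=[0])
cascaded = cascaded_pipeline(reply)
direct = end_to_end_pipeline(reply)
print(cascaded["emotion"], direct["emotion"])
```

Both functions return the same words; only the second keeps the emotion label and the pause after "Oh".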

Sesame calls the result Voice Presence — a concept combining:

  • Emotional expression through tone (joy, surprise, empathy)
  • Natural timing and pauses matching human conversation rhythm
  • Context-aware tonal adjustment
  • Wit and humor in appropriate situations

The research paper supporting this approach is titled "Crossing the Uncanny Valley of Voice" — directly addressing why AI voice has felt slightly wrong, and what it takes to fix it.


CSM-1B: Open-Source Voice Generation

In March 2025, Sesame released CSM-1B (Conversational Speech Model, 1 billion parameters) under the Apache 2.0 license.

Technical specifications:

  • 1B parameters
  • Apache 2.0 license (commercial use permitted)
  • Output format: RVQ audio codes
  • Input: text and audio
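
"RVQ" in the output format refers to residual vector quantization, a scheme widely used by neural audio codecs: each stage quantizes whatever the previous stage failed to capture, so one audio frame becomes a short list of codebook indices. The following is a toy sketch in plain Python with made-up two-dimensional codebooks (`CODEBOOKS` and the helper names are assumptions for illustration); a real codec learns far larger codebooks and operates on learned audio features.

```python
def nearest(codebook, vector):
    """Return the index of the codebook entry closest to `vector`."""
    def sq_dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vector))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(codebooks, vector):
    """Quantize `vector` stage by stage, each stage coding the leftover residual."""
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks, codes):
    """Reconstruct by summing the selected entry from every stage."""
    dim = len(codebooks[0][0])
    out = [0.0] * dim
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Made-up codebooks: a coarse stage followed by a finer one.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]],
]

frame = [1.2, 0.3]
codes = rvq_encode(CODEBOOKS, frame)
approx = rvq_decode(CODEBOOKS, codes)
print(codes, approx)
```

Here the frame [1.2, 0.3] is encoded as the codes [1, 3] and decoded as [1.25, 0.25]: the first stage picks the coarse entry [1.0, 0.0], and the second stage refines the remaining residual.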

The open-source release serves several purposes: it accelerates voice AI research, builds a developer ecosystem around Sesame's approach, and establishes credibility in the research community. Releasing model weights is also a familiar way for AI companies to draw developer attention.


The Smart Glasses Vision

Sesame isn't positioning itself as a voice assistant company in isolation. The end goal is an ambient AI interface — a lightweight, all-day wearable that operates primarily through voice.

Brendan Iribe's Oculus experience is relevant here. Building the Oculus Rift required solving miniaturization, weight distribution, battery life, and comfort challenges at consumer scale. Those problems have direct analogies in smart glasses development.

The case for voice as the primary interface for wearables:

  1. Hands-free — no screen interaction required
  2. Low friction — faster than typing, natural conversational flow
  3. Wearable-native — smart glasses and earbuds need voice as primary input

This is where Sesame's long-term strategy diverges from pure software voice AI companies. The combination of voice AI technology with hardware development capability — building both the software and the device — mirrors the vertical integration strategy that made Apple effective.


Current Limitations and Challenges

Task execution uncertainty: LLMs are strong at conversation but less reliable at executing precise actions. Error recovery in voice interfaces is harder than in visual interfaces.

Privacy: Always-on listening raises legitimate concerns about data handling. Public space use creates additional complexity.

Context limitations without vision: Voice-only interfaces lack the visual context that clarifies ambiguous requests. Multimodal integration (camera + voice) addresses this but adds hardware complexity.

Security: Prompt injection via voice — using audio to manipulate AI behavior — is an active area of security research.


Competitive Landscape

Company | Approach | Differentiator
Sesame | End-to-end voice generation, Voice Presence | Natural expressiveness, smart glasses roadmap
ElevenLabs | TTS platform, voice cloning | Multi-language, voice marketplace
OpenAI | GPT Advanced Voice Mode | Integrated with leading LLM
Amazon | Alexa | Smart home integration
Apple | Siri | Device ecosystem
Google | Google Assistant | Search and services integration

Sesame's differentiation is technical (end-to-end generation vs. two-step pipeline), hardware-oriented (smart glasses as destination), and focused (conversation quality specifically, rather than breadth of features).


Enterprise Applications

Voice AI in business contexts has specific use cases where the quality improvement Sesame offers matters:

Customer support: Human-sounding AI that handles the emotional dimensions of service conversations more naturally than robotic-sounding alternatives.

Internal assistants: Hands-free information retrieval and task logging during work sessions, without requiring screen interaction.

Sales support: Voice-based CRM data entry, pre-meeting briefings, follow-up management.

Training: Conversation practice, role-playing scenarios, language learning support.

Considerations for enterprise deployment:

  • Identify use cases where voice specifically outperforms text interfaces
  • Establish clear data handling policies for voice recordings
  • Plan integration with existing systems through API
  • Build error handling for voice-specific failure modes
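
As one illustration of the last point, a voice front end can decide whether to act, confirm, or re-prompt based on how confident the speech recognizer was and how risky the requested action is. The sketch below is a hypothetical policy, not part of any Sesame API: the thresholds, the `HIGH_RISK_ACTIONS` set, and the action names are all assumptions chosen for the example.

```python
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    CONFIRM = "confirm"    # read the request back and wait for a yes/no
    REPROMPT = "reprompt"  # ask the user to repeat


# Hypothetical values; tune against real transcripts before relying on them.
REPROMPT_BELOW = 0.50
CONFIRM_BELOW = 0.85
HIGH_RISK_ACTIONS = {"send_payment", "delete_record"}


def decide(action: str, asr_confidence: float) -> Decision:
    """Pick a handling strategy for one recognized voice command."""
    if not 0.0 <= asr_confidence <= 1.0:
        raise ValueError("asr_confidence must be between 0 and 1")
    if asr_confidence < REPROMPT_BELOW:
        return Decision.REPROMPT
    if action in HIGH_RISK_ACTIONS or asr_confidence < CONFIRM_BELOW:
        return Decision.CONFIRM
    return Decision.EXECUTE


print(decide("log_call_notes", 0.93))
print(decide("send_payment", 0.97))
print(decide("log_call_notes", 0.40))
```

The design choice is that high-risk actions always get a spoken confirmation, even when recognition confidence is high, because a misheard command is harder to notice and undo by voice than on a screen.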

TIMEWELL supports evaluation and implementation of voice AI technology through WARP consulting services, and ZEROCK provides enterprise AI infrastructure that enables integration with tools including voice AI systems.


Summary

Key facts about Sesame AI:

  • Founded by Oculus co-founder Brendan Iribe and AR/VR expert Ankit Kumar
  • Raised $307.6M total (Series B: $250M from Sequoia and Spark Capital, October 2025)
  • Maya and Miles attracted 1M+ users and 5M+ minutes of conversation within weeks of launch
  • CSM-1B open-sourced under Apache 2.0 (commercial use permitted)
  • "Voice Presence" concept — emotional intelligence + natural timing + contextual awareness
  • Long-term goal: AI smart glasses as ambient interface

The broader signal: voice as an AI interaction modality is moving from "functional but robotic" toward genuinely natural. Companies that start deploying voice AI now will have built up practical experience by the time that quality gap closes.
