Microsoft AI’s Copilot showcases MAI-Voice-1 for expressive speech and MAI-1-preview for intelligent text, marking its first in-house models. Image Source: ChatGPT-5

Microsoft AI Introduces MAI-Voice-1 and MAI-1-Preview Foundation Model

Key Takeaways: Microsoft AI’s New In-House Models

  • Microsoft AI (MAI) introduced two in-house models: MAI-Voice-1 for natural speech generation and MAI-1-preview, its first foundation model.

  • MAI-Voice-1 delivers high-fidelity, expressive audio at remarkable speed — generating a full minute of audio in less than a second on a single GPU.

  • The speech model is already powering Copilot Daily and Podcasts, and is available for experimentation in Copilot Labs.

  • MAI-1-preview, a mixture-of-experts foundation model, was trained on ~15,000 NVIDIA H100 GPUs and is now being tested publicly on LMArena.

  • The foundation model will gradually roll out into Copilot text use cases, with API access for trusted testers also available.


Microsoft AI’s Mission: AI for Everyone

Microsoft AI (MAI) describes its mission as creating AI that empowers every person and organization. The company envisions AI as a supportive, reliable companion — a gateway to knowledge and a set of capabilities that adapt to people’s unique needs.

To realize that vision, MAI has assembled a dedicated team and large-scale infrastructure to build purpose-built models. This week marks the preview of its first two in-house systems designed to move that mission forward.

MAI-Voice-1: Expressive, High-Speed Speech Generation

The first release is MAI-Voice-1, a speech generation model designed to produce natural, expressive, high-fidelity audio across both single- and multi-speaker scenarios.

  • Performance: MAI-Voice-1 can generate a full minute of audio in under a second on a single GPU, making it one of the most efficient speech systems available today.

  • Deployment: It is already powering Copilot Daily and Podcasts, bringing more natural audio to those features.

  • Experimentation: The model is also available in Copilot Labs, where users can test demos like storytelling experiences or guided meditations created from a simple prompt.

By making voice faster and more expressive, MAI-Voice-1 positions speech as the interface of the future for AI companions. You can try it here.

MAI-1-Preview: Foundation Model Trained on 15,000 GPUs

The second milestone is MAI-1-preview, the company’s first foundation model trained end-to-end in-house. It follows a mixture-of-experts architecture and was pre-trained and post-trained on ~15,000 NVIDIA H100 GPUs.

  • Evaluation: The model is undergoing public testing on LMArena, a community platform for model evaluation.

  • Use Cases: It is designed to handle instruction following and helpful everyday responses, with rollout into select Copilot text features planned over the coming weeks.

  • Access: In addition to LMArena, the model is available to trusted testers through API access, allowing Microsoft AI to collect targeted feedback. You can apply for access here.

This represents the start of MAI’s strategy to deliver improved in-house foundation models, while also leveraging partner models and open-source innovations to ensure the best outcomes across its products.

Looking Ahead: Specialized Models for Diverse Use Cases

Microsoft AI emphasizes that these two models are only the first step in a broader strategy. Beyond foundation systems, the company intends to orchestrate a range of specialized models tuned for specific user intents and contexts.

This approach is designed to unlock greater value for customers, ensuring that Copilot and other Microsoft products can adapt to the millions of unique interactions they support daily.

Q&A: Microsoft AI’s MAI-Voice-1 and MAI-1-Preview Models

Q: What models did Microsoft AI just release?
A: Microsoft AI (MAI) announced MAI-Voice-1, an expressive speech generation model, and MAI-1-preview, its first foundation model.

Q: What makes MAI-Voice-1 unique?
A: MAI-Voice-1 generates natural, high-fidelity audio at remarkable speed, producing a minute of speech in under a second on a single GPU.

Q: Where is MAI-Voice-1 available?
A: It already powers Copilot Daily and Podcasts, and can be tested through Copilot Labs, where demos showcase storytelling and guided meditations.

Q: How was MAI-1-preview trained?
A: MAI-1-preview is a mixture-of-experts foundation model trained on ~15,000 NVIDIA H100 GPUs, designed for instruction following and helpful responses.

Q: How can developers test MAI-1-preview?
A: It is live on LMArena for public evaluation, with additional access available through API testing for trusted users.

What This Means: Microsoft AI’s Expanding Role in Model Development

The introduction of MAI-Voice-1 and MAI-1-preview signals Microsoft AI’s commitment to building its own core models, while still leveraging partnerships and open-source innovation. For many observers, this also suggests a step toward greater independence from OpenAI’s products, as Microsoft invests in developing its own large-scale systems for the future.

For users, this means more expressive, human-like interactions through voice, and more capable, responsive models for text-based use cases in Copilot. For the industry, it reflects Microsoft’s strategy of combining general-purpose foundation models with specialized systems to meet diverse user needs.

Above all, Microsoft AI is positioning itself for the long term — where voice becomes the defining interface for AI companions and foundation models become the backbone of trusted, applied AI. These early in-house releases mark the start of a larger portfolio designed to bring reliable, independent AI into everyday life.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.
