Midjourney Launches First Video Generation Model

Image Source: ChatGPT-4o
Midjourney has released Version 1 of its video generation model, expanding its platform beyond static images to include motion. The new feature, called “Image-to-Video,” allows users to animate images directly within Midjourney’s interface—a key step toward the company's broader vision of real-time, interactive simulations.
A Step Toward Real-Time, Open-World AI
Midjourney believes the inevitable destination of this technology is AI models capable of generating real-time, open-world simulations—dynamic environments where users can move, interact, and explore freely.
According to the company, this release is part of a longer-term roadmap that includes:
Image models for static visuals
Video models to animate those visuals
3D models for movement through space
Real-time models for immediate interaction
The company sees these components as the building blocks of AI systems capable of generating fully immersive, interactive environments on demand. Over the next year, Midjourney plans to develop and release each piece individually, gradually combining them into a unified system. While early use may be costly, the company believes broad accessibility will follow sooner than expected.
How the New Video Tool Works
The new “Image-to-Video” workflow builds on Midjourney’s existing image generation system. After creating an image, users can now press an “Animate” button to bring it to life. There are two animation modes:
Automatic Mode: Automatically generates a motion prompt, creating a general sense of movement with minimal user input.
Manual Mode: Allows users to describe how the image should move, giving more creative control over the animation.
Users can also choose between:
Low Motion: Ideal for subtle scenes with slow subject movement and a mostly static camera. The tradeoff is that, in some cases, the result may appear nearly motionless.
High Motion: Best for dynamic scenes where both the subject and camera are in motion, though this setting may introduce more visual errors.
Videos are created in 5-second clips, and users can extend each video up to four times for a total of 20 seconds.
Support for External Images and Cost Structure
Midjourney’s video tool isn’t limited to images generated within its platform. Users can upload outside images, mark them as a “start frame,” and then apply motion prompts to animate them.
At launch, video generation is available only on the web and comes at a premium relative to image work. A video job costs roughly 8 times more than an image job and produces four 5-second clips. Midjourney estimates that each clip costs about as much as an image upscale, working out to roughly one image’s worth of cost per second of video, about 25 times cheaper than existing market rates (see the rough arithmetic sketch below). Video access varies by subscription tier:
Basic Plan ($10/month): Entry-level access to the video tool.
Pro Plan ($60/month) and Mega Plan ($120/month): Include unlimited video generation in the slower “Relax” mode.
The company notes that pricing may change as usage data comes in and infrastructure costs are better understood.
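As a rough check of that arithmetic, the sketch below normalizes everything to “image jobs.” It assumes a standard image job returns the usual grid of four images; actual GPU-minute pricing isn’t public, so the figures are illustrative only, not Midjourney’s official math.

```python
# Back-of-the-envelope check of Midjourney's stated video pricing.
# Costs are normalized so that one image job = 1 unit; real GPU-minute
# pricing is not public, so these numbers are illustrative only.

IMAGE_JOB_COST = 1.0                 # one image job, normalized
IMAGES_PER_JOB = 4                   # an image job returns a 2x2 grid of four images (assumption)
VIDEO_JOB_COST = 8 * IMAGE_JOB_COST  # a video job runs ~8x the cost of an image job (per the article)
CLIPS_PER_JOB = 4                    # each video job produces four clips
SECONDS_PER_CLIP = 5                 # each clip is 5 seconds long

total_seconds = CLIPS_PER_JOB * SECONDS_PER_CLIP       # 20 seconds of footage per job
cost_per_clip = VIDEO_JOB_COST / CLIPS_PER_JOB         # 2.0 image jobs per 5-second clip
cost_per_second = VIDEO_JOB_COST / total_seconds       # 0.4 image jobs per second
images_per_second = cost_per_second * IMAGES_PER_JOB   # ~1.6 single-image equivalents per second

print(f"Cost per clip:   {cost_per_clip:.1f} image jobs")
print(f"Cost per second: {cost_per_second:.1f} image jobs "
      f"(~{images_per_second:.1f} single images)")
```

Under those assumptions, a second of video lands in the neighborhood of one to two single-image generations, which is consistent with Midjourney’s “about one image’s worth of cost per second” framing.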
What This Means
Midjourney’s first video model marks a shift from static visuals to motion-based content creation—an important milestone toward building real-time, explorable virtual environments. While the tool is still evolving, its affordability and accessibility could help democratize animation and reshape how people engage with visual storytelling.
With this leap into motion, Midjourney now enters the same conversation as established players like Google Veo and Runway’s Gen‑4.
Google Veo 3 sets a high bar with cinematic visuals and integrated audio, automatically generating sound effects and dialogue alongside high-resolution, 8‑second clips. However, users note that it still struggles with multi-scene storytelling and precise spatial control.
Runway Gen‑4, building on its Gen‑3 Alpha, offers pro-level consistency across characters, styles, and camera movement, and is already widely used in creative industries—from independent film to mainstream TV and music videos.
By contrast, Midjourney’s V1 is still early stage: it focuses on short 5-second clips at lower resolution (around 480p), with occasionally uneven motion. Yet it delivers the distinctive, highly controllable aesthetic the Midjourney image engine is known for. Reddit users have highlighted its “cinematic camera movement” and strong potential for B-roll sequences, while noting room for smoother long-form dynamics.
What sets Midjourney apart, for now, is accessibility. The tool runs in the same familiar web workflow as its image generator, and it’s available for as little as $10/month. That’s significantly more affordable than Veo’s $249 Ultra tier or Runway’s enterprise-level pricing.
Bottom line: Midjourney isn’t aiming to beat Veo or Runway on technical depth—yet. Instead, it offers a creative-first motion tool that's instantly available and familiar to its large user base. As V1 evolves, it could become a bridge between static art and dynamic storytelling—an accessible middle ground where creative control meets motion.
By breaking complex systems into usable parts—starting with images, now video—Midjourney is gradually building toward a more interactive and immersive future for AI-generated media.
Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.