
Midjourney Launches First Video Generation Model

A digital animation interface displays an AI-generated portrait of a young woman, divided vertically down the center. On the left side, the image is static, with the woman’s hair still and the background softly lit. On the right, her hair flows outward as if caught in motion, with swirling clouds suggesting dynamic movement. The interface is dark-themed and modern, featuring controls on the right labeled “Motion” and “Style,” both set to mid-range. A play button and timeline appear below the image, emphasizing the video editing context and the transition from still image to animated sequence.

Image Source: ChatGPT-4o


Midjourney has released Version 1 of its video generation model, expanding its platform beyond static images to include motion. The new feature, called “Image-to-Video,” allows users to animate images directly within Midjourney’s interface—a key step toward the company's broader vision of real-time, interactive simulations.

A stylized video frame generated by Midjourney’s V1 Video Model features a bald, androgynous figure in a flowing, pleated cream-colored robe, standing against a windswept desert landscape. Behind them, long translucent fabric streams dramatically through the air, suggesting motion. The lighting is soft and cinematic, with a surreal, high-fashion aesthetic. The image was shared in a promotional tweet announcing the model’s launch and $10/month pricing.

Image Source: Midjourney X Post

A Step Toward Real-Time, Open-World AI

Midjourney believes the inevitable destination of this technology is AI models capable of generating real-time, open-world simulations—dynamic environments where users can move, interact, and explore freely.

According to the company, this release is part of a longer-term roadmap that includes:

  • Image models for static visuals

  • Video models to animate those visuals

  • 3D models for movement through space

  • Real-time models for immediate interaction

The company sees these components as the building blocks of AI systems capable of generating fully immersive, interactive environments on demand. Over the next year, Midjourney plans to develop and release each piece individually, gradually combining them into a unified system. While early use may be costly, the company believes broad accessibility will follow sooner than expected.
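To make the "building blocks" framing concrete, here is a minimal, purely conceptual Python sketch: each roadmap stage is modeled as a function from one kind of asset to the next, and the unified system is simply their composition. None of these names correspond to a real Midjourney API; everything here is invented for illustration.

```python
from dataclasses import dataclass

# Conceptual only: each roadmap stage transforms one kind of asset into
# the next, and the "unified system" is the composition of all four.
@dataclass
class Asset:
    kind: str          # "image", "video", "scene", or "simulation"
    description: str

def image_model(prompt: str) -> Asset:
    return Asset("image", f"still visual of {prompt!r}")

def video_model(image: Asset) -> Asset:
    return Asset("video", f"animated {image.description}")

def spatial_model(video: Asset) -> Asset:
    return Asset("scene", f"3D space built from {video.description}")

def realtime_model(scene: Asset) -> Asset:
    return Asset("simulation", f"interactive {scene.description}")

# Compose the stages in roadmap order: image -> video -> 3D -> real time.
world = realtime_model(spatial_model(video_model(image_model("windswept desert"))))
print(world.kind)  # -> simulation
```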

How the New Video Tool Works

The new “Image-to-Video” workflow builds on Midjourney’s existing image generation system. After creating an image, users can now press an “Animate” button to bring it to life. There are two animation modes:

  • Automatic Mode: Automatically generates a motion prompt, creating a general sense of movement with minimal user input.

  • Manual Mode: Allows users to describe how the image should move, giving more creative control over the animation.

Users can also choose between:

  • Low Motion: Ideal for subtle scenes with slow subject movement and a mostly static camera. The tradeoff is that, in some cases, the result may appear nearly motionless.

  • High Motion: Best for dynamic scenes where both the subject and camera are in motion, though this setting may introduce more visual errors.

Videos are created in 5-second clips, and users can extend each video up to four times for a total of 20 seconds.
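Midjourney exposes these options through its web interface rather than a public API, so the following Python sketch is purely illustrative: it models the choices described above (animation mode, motion setting, extensions) as a small data structure. Every name is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

CLIP_SECONDS = 5     # each generated clip is 5 seconds long
MAX_EXTENSIONS = 4   # a video can be extended up to four times
MAX_SECONDS = 20     # stated total length after all four extensions

@dataclass
class VideoJob:
    start_frame: str                      # source image to animate
    mode: str = "automatic"               # "automatic" writes the motion prompt itself
    motion_prompt: Optional[str] = None   # required only when mode == "manual"
    motion: str = "low"                   # "low" = subtle; "high" = dynamic but error-prone
    extensions: int = 0                   # 0..MAX_EXTENSIONS

    def duration_seconds(self) -> float:
        if not 0 <= self.extensions <= MAX_EXTENSIONS:
            raise ValueError("a video can be extended at most four times")
        if self.mode == "manual" and not self.motion_prompt:
            raise ValueError("manual mode needs a motion prompt")
        # The article gives only the endpoints (5 s initial, 20 s maximum),
        # so per-extension length is interpolated rather than documented.
        per_extension = (MAX_SECONDS - CLIP_SECONDS) / MAX_EXTENSIONS
        return CLIP_SECONDS + self.extensions * per_extension

job = VideoJob(start_frame="portrait.png", mode="manual",
               motion_prompt="hair flows outward as clouds swirl",
               motion="high", extensions=MAX_EXTENSIONS)
print(job.duration_seconds())  # -> 20.0
```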

Support for External Images and Cost Structure

Midjourney’s video tool isn’t limited to images generated within its platform. Users can upload outside images, mark them as a “start frame,” and then apply motion prompts to animate them.
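Continuing the hypothetical VideoJob sketch above, the external-image workflow amounts to marking an uploaded file as the start frame and attaching a motion prompt (file name and prompt are invented for illustration):

```python
# An uploaded image (not generated by Midjourney) marked as the start frame.
external = VideoJob(start_frame="uploads/holiday_photo.jpg",
                    mode="manual",
                    motion_prompt="camera pans right as waves roll onto the beach",
                    motion="low")
print(f"{external.start_frame} -> {external.duration_seconds():.0f}s clip")  # 5s clip
```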

At launch, video generation is available only on the web and comes at a premium. A video job costs roughly eight times as much as an image job and produces four 5-second clips. By Midjourney's estimate, that makes a video about as expensive as an image upscale, or roughly one image's worth of cost per second of footage, which the company says is 25 times lower than existing market rates. Launch pricing is tiered as follows:

  • Basic Plan ($10/month): Entry-level access to the video tool.

  • Pro Plan ($60/month) and Mega Plan ($120/month): Include unlimited video generation in the slower “Relax” mode.

The company notes that pricing may change as usage data comes in and infrastructure costs are better understood.
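Since absolute prices per GPU-minute aren't public, the figures above can be sanity-checked in relative "image job" units. This is only a back-of-envelope sketch of the article's arithmetic, not official pricing math:

```python
IMAGE_JOB = 1.0                     # baseline: cost of one image job
VIDEO_JOB = 8 * IMAGE_JOB           # "roughly eight times" an image job
CLIPS, SECONDS_PER_CLIP = 4, 5      # four 5-second clips per video job

footage = CLIPS * SECONDS_PER_CLIP  # 20 seconds of video per job
per_second = VIDEO_JOB / footage    # cost per second, in image-job units
per_image = IMAGE_JOB / 4           # a standard image job renders four images

print(per_second)                   # 0.4 image jobs per second of video
print(per_second / per_image)       # 1.6 single images' worth per second
```

Depending on whether you count per job or per rendered image, that works out to between 0.4 and 1.6 images' worth of cost per second, which lands in the ballpark of the company's "about one image's cost per second" framing.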

What This Means

Midjourney’s first video model marks a shift from static visuals to motion-based content creation—an important milestone toward building real-time, explorable virtual environments. While the tool is still evolving, its affordability and accessibility could help democratize animation and reshape how people engage with visual storytelling.

With this leap into motion, Midjourney now enters the same conversation as established players like Google Veo and Runway’s Gen‑4.

Google Veo 3 sets a high bar with cinematic visuals and integrated audio, automatically generating sound effects and dialogue alongside high-resolution, 8‑second clips. However, users note that it still struggles with multi-scene storytelling and precise spatial control.

Runway Gen‑4, building on its Gen‑3 Alpha, offers pro-level consistency across characters, styles, and camera movement, and is already widely used in creative industries—from independent film to mainstream TV and music videos.

By contrast, Midjourney’s V1 is still early-stage: it focuses on short 5‑second clips at lower resolution (around 480p), with occasionally uneven motion. Yet it delivers the distinctive aesthetic and fine-grained stylistic control the Midjourney image engine is known for. Reddit users have highlighted its “cinematic camera movement” and strong potential for B‑roll sequences, while noting room for smoother long‑form motion.

What sets Midjourney apart—for now—is accessibility. The tool runs on the same familiar web and Discord workflows as its image generator, and it's available for as little as $10/month. That’s significantly more affordable than Veo’s $249 Ultra tier or Runway’s enterprise-level pricing.

Bottom line: Midjourney isn’t aiming to beat Veo or Runway on technical depth—yet. Instead, it offers a creative-first motion tool that's instantly available and familiar to its large user base. As V1 evolves, it could become a bridge between static art and dynamic storytelling—an accessible middle ground where creative control meets motion.

By breaking complex systems into usable parts—starting with images, now video—Midjourney is gradually building toward a more interactive and immersive future for AI-generated media.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.