ziqihuangg/Awesome-From-Video-Generation-to-World-Model


Awesome From Video Generation to World Model

The field of video generation is undergoing a paradigm shift: from generating realistic and appealing visuals to constructing world models that can simulate interactive, navigable environments. These models are not just visual tools; they serve as testbeds for training and evaluating intelligent agents such as robots, autonomous vehicles, and virtual avatars. A central goal is to enable agents to perceive, act, and plan within generated video scenarios as if they were interacting with the real world. We compile key works that push video generation toward actionable world modeling, focusing on physical plausibility and on the capacity of agents to navigate, manipulate, and learn from these synthetic environments.

Overview

This repository currently contains the paper list for "Video Generation towards World Model".

What You'll Find Here

We hope to support the research and industrial communities by systematically collecting and organizing influential works that drive progress in video generation for world modeling.

News 🔥

Updates

This repository is updated periodically. If you have suggestions for additional resources, updates on methodologies, or fixes for expiring links, please feel free to do any of the following:

  • Raise an Issue.
  • Nominate awesome related works with Pull Requests.
  • For other queries, email both Ziqi (ZIQI002 at e dot ntu dot edu dot sg) and Jingtong (yuejingtong137 at gmail dot com).

Table of Contents

1. Generation 1: Faithfulness - Accurate Simulation of the Real World

1.1 Video Foundation Model

Date Venue Acronym Paper Project Repo@GitHub
2025-03-04 Arxiv Helios Helios: Real Real-Time Long Video Generation Model Website Code
2024-12-30 Arxiv LTX-Video LTX-Video: Realtime Video Latent Diffusion Website Code
2024-12-12 Arxiv Owl-1 Owl-1: Omni World Model for Consistent Long Video Generation Code
2024-12-10 Arxiv STIV STIV: Scalable Text and Image Conditioned Video Generation
2024-09-24 JT-CV Website
2024-09 Hailuo AI Website
2024-06-06 VideoTetris VideoTetris: Towards Compositional Text-to-Video Generation Website Code
2024-02-22 Snap Video Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Website
2024-01-23 Arxiv Lumiere Lumiere: A Space-Time Diffusion Model for Video Generation Website
2024-01-17 CVPR24 VideoCrafter2 VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models Website Code
2024-01-09 Arxiv MagicVideo-V2 MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation Website
2024-01-05 TMLR25 Latte Latte: Latent Diffusion Transformer for Video Generation Website Code
2023-12-07 Arxiv HiGen Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation Website Code
2023-11-25 Arxiv SVD Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets Code
2023-11-07 Arxiv I2VGen-XL I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Code
2023-10-31 ICLR24 SEINE SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction Website Code
2023-10-30 Arxiv VideoCrafter1 VideoCrafter1: Open Diffusion Models for High-Quality Video Generation Website Code
2023-10-18 ECCV24 DynamiCrafter DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors Website Code
2023-10-09 ICLR24 MAGVIT-v2 Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
2023-09-27 IJCV24 Show-1 Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Website Code
2023-09-26 IJCV24 LaVie LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models Website Code
2023-09-01 Arxiv VideoGen VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation Website
2023-08-12 Arxiv ModelScope ModelScope Text-to-Video Technical Report Website
2023-07-10 ICLR24 AnimateDiff AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning Website Code
2023-06-29 Arxiv Pika Website
2023-06-07 Gen-2 Website
2023-02 Gen-1 Gen-1: The Next Step Forward for Generative AI Website
2022-12-10 CVPR23 MAGVIT MAGVIT: Masked Generative Video Transformer Website Code
2022-11-20 Arxiv MagicVideo MagicVideo: Efficient Video Generation With Latent Diffusion Models Website
2022-10-05 Arxiv Imagen Video Imagen Video: High Definition Video Generation with Diffusion Models Website
2022-09-29 Arxiv Make-A-Video Make-A-Video: Text-to-Video Generation without Text-Video Data
2022-05-29 ICLR23 CogVideo CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers Code

1.2 Other Video Generation Model

1.2.1 GAN Based Video Generation

1.2.2 U-Net Based Video Generation

1.2.3 DiT Based Video Generation

1.2.4 Autoregressive Based Video Generation

1.3 Conditioned World Model

1.3.1 Conditioned World Model in General Scene

Geometry Condition

3D Condition

Physics Condition

Trajectory Navigation

Camera Motion Navigation

Instruction Navigation

Action Navigation

1.3.2 Conditioned World Model in Robotics

Action Navigation

Instruction Navigation

Goal Navigation

Hybrid Navigation

1.3.3 Conditioned World Model in Autonomous Driving

Layout Condition

Instruction Navigation

Action Navigation

Hybrid Navigation

Other Navigation

1.3.4 Conditioned World Model in Gaming

Controller Navigation

Action Navigation

2. Generation 2: Interactiveness - Controllability and Interactive Dynamics

2.1 High-quality World Foundation Model

Date Venue Acronym Paper Project Repo@GitHub
2026-03-04 Arxiv Helios Helios: Real Real-Time Long Video Generation Model Website Code
2026-02-02 Arxiv Causal Forcing Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation Website Code
2026-01-24 Arxiv SkyReels-V3 SkyReels-V3 Technique Report Website Code
2025-12-23 Arxiv SemanticGen SemanticGen: Video Generation in Semantic Space Website
2025-12-18 Arxiv Kling-Omni Kling-Omni Technical Report Website
2025-12-16 Arxiv MemFlow MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives Website
2025-06-18 Hailuo 02 Website
2025-06-10 Arxiv Seedance 1.0 Seedance 1.0: Exploring the Boundaries of Video Generation Models Website
2025-06-09 Arxiv Self Forcing Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Website Code
2025-05-19 Arxiv MAGI-1 MAGI-1: Autoregressive Video Generation at Scale Website Code
2025-05 Veo 3 Veo 3: AI Video Generation with Realistic Sound Website
2025-04-17 Arxiv SkyReels-V2 SkyReels-V2: Infinite-length Film Generative Model Website Code
2025-04-07 Nova Reel Website
2025-03-31 Gen-4 Website
2025-03-26 Arxiv Wan 2.1 Wan: Open and Advanced Large-Scale Video Generative Models Website Code
2025-03-13 Step-Video-T2V Website
2025-03-12 Arxiv Open-Sora2.0 Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k Website Code
2025-02-11 Arxiv Magic 1-For-1 Magic 1-For-1: Generating One Minute Video Clips within One Minute Website Code
2025-01-21 MiracleVision Website
2025-01-15 Arxiv RepVideo RepVideo: Rethinking Cross-Layer Representation for Video Generation Website Code
2025-01-14 Arxiv Vchitect-2.0 Vchitect-2.0: Parallel transformer for scaling up video diffusion models Website Code
2025-01-07 Arxiv Cosmos Cosmos World Foundation Model Platform for Physical AI Website Code
2024-12-29 Arxiv Open-Sora Open-Sora: Democratizing Efficient Video Production for All Website Code
2024-12-10 CVPR25 CausVid From Slow Bidirectional to Fast Autoregressive Video Diffusion Models Website Code
2024-12-03 Arxiv HunyuanVideo HunyuanVideo: A Systematic Framework For Large Video Generative Models Website Code
2024-11-28 Arxiv Open-Sora Plan Open-Sora Plan: Open-Source Large Video Generation Model Code
2024-10-22 Mochi-1 Website Code
2024-08-12 ICLR25 CogVideoX CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Code
2024-07-08 Arxiv Mira MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions Website Code
2024-06-17 Gen-3 Website
2024-06-13 Luma Website
2024-06-06 Kling Website
2024-05-29 Arxiv EasyAnimate EasyAnimate: A High-Performance Long Video Generation Method Based on Transformer Architecture Website Code
2024-05-09 Jimeng Website
2024-05-07 Arxiv Vidu Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models Website
2024-02-15 Sora Video generation models as world simulators Website

World Model Regulation Methods

Stabilization:

Inference-time Physics Alignment:

Efficiency:

Planning Optimization:

Long Video Generation Methods

2.2 Video Generation as World Model in General Scenes

2.2.1 Geometry Condition Prior World Model

2.2.2 3D Condition Prior World Model

2.2.3 Physical Prior World Model

2.2.4 Audio Driven World Model

2.2.5 Trajectory Navigation World Model

2.2.6 Camera Motion Navigation World Model

2.2.7 Instruction Navigation World Model

2.2.8 Action Navigation World Model

2.3 Video Generation as World Model in Robotics

2.3.1 Action Navigation World Model

2.3.2 Instruction Navigation World Model

2.3.3 Goal Navigation World Model

2.3.4 Hybrid Navigation World Model

2.3.5 Real-time Interactive World Model

2.4 Video Generation as World Model in Autonomous Driving

2.4.1 Layout Prior World Model

2.4.2 Instruction Navigation World Model

2.4.3 Trajectory Navigation World Model

2.4.4 Action Navigation World Model

2.4.5 Hybrid Navigation World Model

2.4.6 Other Navigation World Model

2.5 Video Generation as World Model in Gaming

2.5.1 Controller Navigation World Model

2.5.2 Action Navigation World Model

2.5.3 Hybrid Navigation World Model

3. Generation 3: Planning - Modeling the Future Evolution of Complex Systems

Date Venue Acronym Paper Project Repo@GitHub
2026-01-28 Arxiv LingBot-World Advancing Open-source World Models Website Code
2025-12-26 Arxiv Yume1.5 Yume1.5: A Text-Controlled Interactive World Generation Model Website Code
2025-12-16 Arxiv WorldPlay WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Website Code
2025-12-09 ICLR26 Astra Astra: General Interactive World Model with Autoregressive Denoising Website Code
2025-07-17 MirageLSD MirageLSD: The First Live-Stream Diffusion AI Video Model Website
2025-06-11 Arxiv V-JEPA 2 V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Website Code
2024-12-04 CVPR25 NWM Navigation World Models Website Code

For Robotics:

Note: Action and goal navigation for robotics.

4. Generation 4: Counterfactual and Outlier Modeling

4.1 Macroscopic Scale World Model

4.2 Mesoscopic Scale World Model

4.3 Microscopic Scale World Model

5. Evaluation and Datasets

5.1 Evaluation Metrics of Video Generation

5.2 Evaluation Metrics of World Model

5.3 Datasets

6. Study and Rethinking

6.1 Survey

6.2 Position & Perspective

7. Downstream Tasks for World Modeling

7.1 World Models as Data Generators

7.2 World Models as Reasoning Proxy

8. World Models for Other Applications

8.1 World Models for Medicine

Citation

If you find this paper useful, please consider citing:

@article{yue2025video,
  title={Simulating the World Model with Artificial Intelligence: A Roadmap},
  author={Yue, Jingtong and Huang, Ziqi and Chen, Zhaoxi and Wang, Xintao and Wan, Pengfei and Liu, Ziwei},
  journal={arXiv preprint arXiv:2511.08585},
  year={2025}
}

About

A list of works on video generation towards world model
