Ying Yang1, Zhengyao Lv1,2, Tianlin Pan1,3, Haofan Wang4, Binxin Yang5, Hubery Yin6, Chen Li6, Ziwei Liu6, Chenyang Si1
1PRLab, Nanjing University · 2The University of Hong Kong · 3University of Chinese Academy of Sciences · 4LibLib.ai · 5WeChat, Tencent Inc. · 6Nanyang Technological University
If the embedded YouTube player is blocked in your region/network, please visit the project page for mirrored videos.
- Paper released
- Project page released
- Demo videos released
- Code released
Long-horizon interactive video generation often suffers from spatial drift and scene collapse.
We find that a major source of instability is error accumulation, especially within the same scene: small drifts accumulate under the same viewpoint, eventually leading to the collapse of the entire scene.
This differs from the commonly discussed error accumulation caused by train–inference mismatch: we find that scene collapse is driven largely by accumulation within the same scene.
Small drifts may seem negligible at first, but as they accumulate repeatedly under the same viewpoint, they grow and eventually cause severe scene collapse.
StableWorld addresses this issue at the root by continuously filtering out degraded frames while retaining geometrically consistent ones, preventing drift from compounding over time.
StableWorld is a simple yet effective Dynamic Frame Eviction Mechanism that is model-agnostic and can be plugged into different interactive generation frameworks (e.g., Matrix-Game, Open-Oasis, Hunyuan-GameCraft) to improve stability, temporal consistency, and generalization.
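To make the idea of frame eviction concrete, here is a minimal sketch of one way such a mechanism could be wired around a context window. It is not the released implementation. The function names (`update_window`, `cosine_similarity`), the `threshold` and `max_len` parameters, and the use of cosine similarity as a stand-in for a geometric consistency check are all assumptions made for illustration. Frames are represented as plain feature vectors.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # hypothetical stand-in for a frame or its feature vector


def cosine_similarity(a: Frame, b: Frame) -> float:
    """Placeholder consistency score; a real system would use a geometric check."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def update_window(
    window: List[Frame],
    new_frame: Frame,
    score_fn: Callable[[Frame, Frame], float] = cosine_similarity,
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
    max_len: int = 4,        # assumed context budget
) -> List[Frame]:
    """Return the next context window after adding `new_frame`.

    The earliest frame is treated as the clean reference and the newest frame
    is always kept. Intermediate frames that score below `threshold` against
    the reference are treated as degraded and evicted.
    """
    if max_len < 2:
        raise ValueError("max_len must be at least 2 (reference + newest frame)")
    if not window:
        return [new_frame]

    reference = window[0]
    # Keep only intermediate frames that stay consistent with the reference.
    middle = [f for f in window[1:] if score_fn(reference, f) >= threshold]

    # If the window is still over budget, drop the oldest intermediate frames.
    overflow = len(middle) + 2 - max_len
    if overflow > 0:
        middle = middle[overflow:]

    return [reference] + middle + [new_frame]


if __name__ == "__main__":
    ref, ok, drifted, new = [1.0, 0.0], [0.99, 0.05], [0.5, 0.8], [1.0, 0.1]
    print(update_window([ref, ok, drifted], new))
```

In this sketch the drifted frame is dropped before it can condition later generations, which is the intuition behind preventing drift from compounding. Deciding when to refresh the reference frame (for example, after the viewpoint changes) is left out here.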
We provide extensive interactive demonstrations across multiple frameworks:
- Matrix-Game 2.0
- Open-Oasis
- Hunyuan-GameCraft
- Ultra-long video generation (thousands of frames)
- Self-Forcing (autoregressive video)
Please see the project page for full videos and side-by-side comparisons:
👉 https://sd-world.github.io/
If you find this work helpful, please consider citing:
@article{stableworld2026,
title={StableWorld: Towards Stable and Consistent Long Interactive Video Generation},
author={Ying Yang and Zhengyao Lv and Tianlin Pan and Haofan Wang and Binxin Yang and Hubery Yin and Chen Li and Ziwei Liu and Chenyang Si},
journal={arXiv preprint arXiv:2601.15281},
year={2026}
}

