By Zhongqian Fu, Tianyi Zhao, Kai Han, Hang Zhou, Xinghao Chen and Yunhe Wang. [arXiv](https://arxiv.org/abs/2602.02110)
This project evaluates how the Dino-WM world model behaves at inference time under quantization. The code builds on the official Dino-WM implementation and integrates several post-training quantization (PTQ) methods to reproduce the core conclusions of the accompanying paper.
This project is built upon the official Dino-WM repository:
👉 https://github.com/gaoyuezhou/dino_wm.git
Please ensure you have the complete environment and dependencies to run the original Dino-WM planning code.
Please strictly follow the instructions in the official Dino-WM repository for the following steps:
- Python / CUDA environment setup
- Dependency installation
- Wall / PushT dataset download and preparation
Before proceeding with this README, please ensure that you can run the original floating-point (FP) planning inference code without modifications.
All commands in this document use placeholders. Please replace them with actual values before running the scripts:
| Placeholder | Description |
|---|---|
| `<PROJECT_ROOT>` | Root directory of the project |
| `<DATASET_DIR>` | Root directory of the dataset |
| `<GPU_ID>` | The GPU ID you want to use |
cd <PROJECT_ROOT>
mkdir -p plan_outputs
export DATASET_DIR=<DATASET_DIR>

`plan.py` runs the floating-point planning baseline without any quantization. It is the reference against which performance degradation under each quantization configuration is measured. Reference: the Dino-WM repository.
# PushT
python plan.py --config-name plan_pusht.yaml model_name=pusht
# Wall
python plan.py --config-name plan_wall.yaml model_name=wall

`plan_act.py` collects activation statistics during the iterative planning process of the world model and generates the scale parameters required by SmoothQuant.
# Wall
CUDA_VISIBLE_DEVICES=<GPU_ID> python plan_act.py --config-name plan_wall.yaml model_name=wall_single tag=fp sta_scale=True n_evals=50 planner.max_iter=2 planner.sub_planner.opt_steps=10 scale_tag=iter2_opt10_eval50
# PushT
CUDA_VISIBLE_DEVICES=<GPU_ID> python plan_act.py --config-name plan_pusht.yaml model_name=pusht tag=fp sta_scale=True n_evals=50 planner.max_iter=2 planner.sub_planner.opt_steps=30 scale_tag=iter2_opt30_eval50

The following scripts evaluate the planning performance of Dino-WM under different quantization methods and bit-width configurations. The examples below use the Wall dataset.
# Group size
export W_GROUP_SIZE=-1
# Or
export W_GROUP_SIZE=128

Script: plan_quant_omse_rtn.py
CUDA_VISIBLE_DEVICES=<GPU_ID> python -u plan_quant_omse_rtn.py --config-name plan_wall.yaml model_name=wall_single quant=True quant_encoder=True predictor_wbit=8 predictor_abit=8 encoder_wbit=8 encoder_abit=8 w_quant_method="minmax" a_quant_method="minmax" calib_mode_a="layer_wise" quant_iter=2 tag=RTN_quant_Pw8a8_Ew8a8_per_tensor_iter2 2>&1 | tee -a plan_outputs/logfile_plan_wall_RTN.txt

CUDA_VISIBLE_DEVICES=<GPU_ID> python -u plan_quant_omse_rtn.py --config-name plan_wall.yaml model_name=wall_single quant=True quant_encoder=True predictor_wbit=8 predictor_abit=8 encoder_wbit=8 encoder_abit=8 w_quant_method="minmax" a_quant_method="minmax" calib_mode_a="token_wise" quant_iter=2 tag=RTN_quant_Pw8a8_Ew8a8_per_token_iter2 2>&1 | tee -a plan_outputs/logfile_plan_wall_RTN.txt

Script: plan_quant_omse_rtn.py
CUDA_VISIBLE_DEVICES=<GPU_ID> python -u plan_quant_omse_rtn.py --config-name plan_wall.yaml model_name=wall_single quant=True quant_encoder=True predictor_wbit=8 predictor_abit=8 encoder_wbit=8 encoder_abit=8 w_quant_method="omse" a_quant_method="minmax" calib_mode_a="layer_wise" quant_iter=2 tag=OMSE_quant_Pw8a8_Ew8a8_per_tensor_iter2 2>&1 | tee -a plan_outputs/logfile_plan_wall_OMSE.txt

Script: plan_quant_smooth.py
CUDA_VISIBLE_DEVICES=<GPU_ID> python -u plan_quant_smooth.py --config-name plan_wall.yaml model_name=wall_single quant=True quant_encoder=True predictor_wbit=8 predictor_abit=8 encoder_wbit=8 encoder_abit=8 w_quant_method="minmax" a_quant_method="minmax" calib_mode_a="layer_wise" quant_iter=2 scale_tag=iter2_opt10_eval50 tag=smooth_quant_Pw8a8_Ew8a8_per_tensor_iter2 2>&1 | tee -a plan_outputs/logfile_plan_wall_smoothquant.txt

Script: plan_quant_omniquant.py
CUDA_VISIBLE_DEVICES=<GPU_ID> python -u plan_quant_omniquant.py --config-name plan_wall.yaml model_name=wall_single quant=True quant_encoder=True predictor_wbit=8 predictor_abit=8 encoder_wbit=8 encoder_abit=8 w_quant_method="omniquant" a_quant_method="omniquant" calib_mode_a="layer_wise" quant_iter=2 scale_tag=iter2_opt10_eval50 tag=omni_quant_Pw8a8_Ew8a8_per_tensor_iter2 2>&1 | tee -a plan_outputs/logfile_plan_wall_omniquant.txt

Script: plan_quant_awq.py
CUDA_VISIBLE_DEVICES=<GPU_ID> python -u plan_quant_awq.py --config-name plan_wall.yaml model_name=wall_single quant=True quant_encoder=True predictor_wbit=8 predictor_abit=16 encoder_wbit=8 encoder_abit=16 w_quant_method="awq" a_quant_method="minmax" quant_iter=2 scale_tag=iter2_opt10_eval50 tag=awq_quant_Pw8a16_Ew8a16_iter2 2>&1 | tee -a plan_outputs/logfile_plan_wall_awq.txt

| Parameter | Description |
|---|---|
| `predictor_wbit` / `encoder_wbit` | Weight quantization bit-width |
| `predictor_abit` / `encoder_abit` | Activation quantization bit-width |
| `w_quant_method` | Weight quantization method |
| `a_quant_method` | Activation quantization method |
| `quant_iter` | Quantization calibration iterations |
| `scale_tag` | Tag identifying the activation scales produced by `plan_act.py` (for SmoothQuant) |
| `planner.max_iter` | Outer loop iterations of the planner |
| `planner.sub_planner.opt_steps` | Optimization steps for the sub-planner |
| `n_evals` | Number of evaluation rounds |
| `calib_mode_a` | Activation quantization granularity: `"layer_wise"` (default) or `"token_wise"` |
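To make the two `calib_mode_a` options concrete, the sketch below contrasts them using plain min-max quantization. It is an illustration only, not code from this repository. It assumes asymmetric min-max quantization, and it assumes that `"layer_wise"` means one range shared by the whole activation tensor while `"token_wise"` means one range per token (the `per_tensor` and `per_token` tags in the example commands suggest this reading). The names `minmax_fake_quant`, `quantize_activations`, and `max_abs_error` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are channels


def minmax_fake_quant(values: List[float], n_bits: int) -> List[float]:
    """Asymmetric min-max quantization followed by dequantization."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    if hi == lo:  # constant input: nothing to quantize
        return list(values)
    scale = (hi - lo) / levels
    out = []
    for v in values:
        q = round((v - lo) / scale)      # integer code
        q = max(0, min(levels, q))       # clamp to [0, levels]
        out.append(q * scale + lo)       # map back to floating point
    return out


def quantize_activations(x: Matrix, n_bits: int, mode: str = "layer_wise") -> Matrix:
    """Fake-quantize activations with one range per tensor or per token.

    Assumption: "layer_wise" shares a single min/max over the whole tensor,
    "token_wise" computes a separate min/max for every row (token).
    """
    if mode == "layer_wise":
        flat = [v for row in x for v in row]
        deq = minmax_fake_quant(flat, n_bits)
        width = len(x[0])
        return [deq[i * width:(i + 1) * width] for i in range(len(x))]
    if mode == "token_wise":
        return [minmax_fake_quant(row, n_bits) for row in x]
    raise ValueError(f"unknown mode: {mode}")


def max_abs_error(a: List[float], b: List[float]) -> float:
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(u - v) for u, v in zip(a, b))


# Toy input: the second token has much larger values than the first.
x = [[0.1, -0.2, 0.3], [10.0, -20.0, 30.0]]
for mode in ("layer_wise", "token_wise"):
    x_q = quantize_activations(x, n_bits=8, mode=mode)
    print(mode, "error on first token:", max_abs_error(x[0], x_q[0]))
```

In this toy input the second token carries large values, so a shared range spends most of the 8-bit grid on it and the first token is reconstructed less accurately than with per-token ranges.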
| Script | Function |
|---|---|
| `plan.py` | Floating-point inference (FP baseline) |
| `plan_act.py` | Activation statistics (for SmoothQuant) |
| `plan_quant_omse_rtn.py` | RTN / OMSE |
| `plan_quant_smooth.py` | SmoothQuant |
| `plan_quant_omniquant.py` | OmniQuant |
| `plan_quant_awq.py` | AWQ |
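As background for the `plan_act.py` and `plan_quant_smooth.py` entries, the sketch below shows how per-channel activation statistics can be turned into smoothing scales with the formula published in the SmoothQuant paper, `s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)`. Whether the scripts in this repository compute their scales exactly this way is an assumption. The function names, the `alpha=0.5` value, and the `eps` guard are illustrative choices and are not taken from the repository.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def channel_abs_max(x: Matrix) -> List[float]:
    """Per-channel (column) maximum absolute value over all tokens (rows)."""
    return [max(abs(row[j]) for row in x) for j in range(len(x[0]))]


def smooth_scales(act_absmax: List[float], weight: Matrix,
                  alpha: float = 0.5, eps: float = 1e-5) -> List[float]:
    """SmoothQuant-style scale per input channel.

    weight has shape [in_channels][out_features], so row j multiplies
    input channel j in y = x @ weight.
    """
    scales = []
    for j, a_max in enumerate(act_absmax):
        w_max = max(abs(w) for w in weight[j])
        # eps avoids division by zero for dead channels (illustrative guard)
        scales.append(max(a_max, eps) ** alpha / max(w_max, eps) ** (1.0 - alpha))
    return scales


def apply_smoothing(x: Matrix, weight: Matrix,
                    scales: List[float]) -> Tuple[Matrix, Matrix]:
    """Divide activations by the scales and multiply weight rows by them."""
    x_s = [[v / s for v, s in zip(row, scales)] for row in x]
    w_s = [[w * s for w in row] for row, s in zip(weight, scales)]
    return x_s, w_s


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, used only to check the output is unchanged."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Toy example: channel 1 of the activations is an outlier channel.
x = [[1.0, 100.0], [-2.0, 50.0]]
w = [[0.5, -0.5], [0.1, 0.2]]
scales = smooth_scales(channel_abs_max(x), w)
x_s, w_s = apply_smoothing(x, w, scales)
print("activation abs-max before:", channel_abs_max(x))
print("activation abs-max after: ", channel_abs_max(x_s))
```

The product `x_s @ w_s` equals `x @ w` up to floating-point rounding, so smoothing changes only how the quantization difficulty is split between activations and weights. The statistics gathered during planning play the role of `channel_abs_max(x)` in this sketch.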
We thank the authors of the following codebases: DINO-WM, SmoothQuant, AWQ, OmniQuant, FQ-ViT.
@misc{fu2026empiricalstudyworldmodel,
title={An Empirical Study of World Model Quantization},
author={Zhongqian Fu and Tianyi Zhao and Kai Han and Hang Zhou and Xinghao Chen and Yunhe Wang},
year={2026},
eprint={2602.02110},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2602.02110},
}