This is the official repository for the paper Variational Reasoning for Language Models.
The repository currently includes data processing, training pipelines, and an evaluation suite. It's initialized from LLaMA-Factory and SkyThought.
This work uses two separate environments for training and evaluation.
You can install them as follows:
# Environment for training with LLaMA-Factory
conda create -n llama_factory python=3.10 -y
conda activate llama_factory
cd LLaMA-Factory
pip install -e ".[torch,metrics,deepspeed,vllm,wandb]" --no-build-isolation
conda deactivate
# Environment for evaluation and verification with SkyThought
conda create -n skythought python=3.10 -y
conda activate skythought
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
cd SkyThought
pip install -e .
cd ..
pip install timeout-decorator==0.5.0
conda deactivate
You can directly evaluate the provided final reasoning models. Please refer to SkyThought/variational_reasoning/eval/eval.sh; a sketch of a typical invocation follows the notes below.
Notes:
- SkyThought requires vllm==0.7.0 by default. For Qwen3-based models, please upgrade to vllm==0.8.4.
- For 32B models, set tensor_parallel_size=4.
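Putting the notes together, an evaluation run might look like the following (a sketch only; whether tensor_parallel_size is set inside eval.sh or passed to it is not specified here, and the vllm upgrade applies only to Qwen3-based models):
conda activate skythought
# Upgrade only for Qwen3-based models; the default SkyThought environment pins vllm==0.7.0.
pip install vllm==0.8.4
# For 32B models, set tensor_parallel_size=4 (see the notes above).
bash SkyThought/variational_reasoning/eval/eval.sh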
We provide scripts to reproduce the experiments. For example, training with Qwen3-4B-Base is included; other experiments can be reproduced by changing the model and dataset paths. All training scripts assume 2 nodes (2 x 8 H100 GPUs). If you use a different number of nodes, adjust gradient_accumulation_steps in LLaMA-Factory/examples/variational_reasoning/*.yaml accordingly to keep the effective batch size constant.
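As a rough illustration (the per-device batch size and default gradient_accumulation_steps below are assumptions; check the actual *.yaml): the effective batch size is num_nodes * gpus_per_node * per_device_train_batch_size * gradient_accumulation_steps. If the 2-node configs used per_device_train_batch_size=1 and gradient_accumulation_steps=8, the effective batch size would be 2 * 8 * 1 * 8 = 128, so a single 8-GPU node would need gradient_accumulation_steps=16 to keep 1 * 8 * 1 * 16 = 128.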
For some training steps (e.g., 1/2/9 below), you must precompute the value constant_loss_normalizer used in LLaMA-Factory/examples/variational_reasoning/*.yaml:
cd LLaMA-Factory
python -m variational_reasoning.data_process.compute_normalizer \
--dataset_name ${dataset_name} \
--base_model_name ${base_model_name}
cd ..
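The computed normalizer is then used as the constant_loss_normalizer value in the corresponding YAML under LLaMA-Factory/examples/variational_reasoning/ (this assumes the script reports the value rather than writing it into the config itself; check its output).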
1. Train the initial reasoning model
Please refer to LLaMA-Factory/variational_reasoning/train/train_initial_reasoning_model.
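The launch scripts for this and the other training steps (2 and 9) live in the referenced directories. For orientation only, a single-node LLaMA-Factory run of one of the provided configs might look like the sketch below (the YAML filename is a placeholder, not an actual file in the repo):
conda activate llama_factory
cd LLaMA-Factory
# <your_config>.yaml is a placeholder; use the actual config under examples/variational_reasoning/.
FORCE_TORCHRUN=1 llamafactory-cli train examples/variational_reasoning/<your_config>.yaml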
2. Train the variational posterior
Please refer to LLaMA-Factory/variational_reasoning/train/train_variational_posterior.
3. Sample from the variational posterior
Run 8 times with seeds 0–7:
python variational_reasoning/sample_from_variational_posterior.py \
--prompt_template B \
--model_name "zhouxiangxin/Variational-Posterior-PB-4B" \
--seed ${seed}
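A simple way to carry out the 8 runs is a sequential loop (a sketch; it assumes only the seed changes between runs and that sequential execution is acceptable):
for seed in $(seq 0 7); do
  python variational_reasoning/sample_from_variational_posterior.py \
    --prompt_template B \
    --model_name "zhouxiangxin/Variational-Posterior-PB-4B" \
    --seed ${seed}
done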
4. Estimate log-probabilities under the initial reasoning model
Run 8 times with split_idx=0–7:
deepspeed --num_gpus 8 --master_port=`bash get_free_port.sh` \
variational_reasoning/estimate_logp_initial_reasoning_model.py \
--posterior_name "zhouxiangxin/Variational-Posterior-PB-4B" \
--initial_reasoning_model "zhouxiangxin/Initial-Reasoning-4B" \
--split_idx ${split_idx}
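The split_idx sweep can be scripted the same way; the pattern below also applies to the later split_idx and dataset_idx steps (a sketch assuming the eight shards are processed one after another on a single 8-GPU node):
for split_idx in $(seq 0 7); do
  deepspeed --num_gpus 8 --master_port=`bash get_free_port.sh` \
    variational_reasoning/estimate_logp_initial_reasoning_model.py \
    --posterior_name "zhouxiangxin/Variational-Posterior-PB-4B" \
    --initial_reasoning_model "zhouxiangxin/Initial-Reasoning-4B" \
    --split_idx ${split_idx}
done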
5. Estimate log-probabilities under the variational posterior
Run 8 times with split_idx=0–7:
deepspeed --num_gpus 8 --master_port=`bash get_free_port.sh` \
variational_reasoning/estimate_logp_variational_posterior.py \
--posterior_name "zhouxiangxin/Variational-Posterior-PB-4B" \
--split_idx ${split_idx}
6. Sample from the initial reasoning model $\pi_{\theta_0}$ (Optional, required by accuracy-based estimator)
Run 8 times with split_idx=0–7:
python variational_reasoning/sample_from_initial_reasoning_model.py \
--posterior_name "zhouxiangxin/Variational-Posterior-PB-4B" \
--initial_reasoning_model "zhouxiangxin/Initial-Reasoning-4B" \
--split_idx ${split_idx}
7. Verify the sampled answers
Run 8 times with dataset_idx=0–7:
python -m variational_reasoning.verify.verify_parallel --dataset_idx ${dataset_idx}
8. Build the training data
Choose one of the estimators:
# Option 1: use an estimator of \pi(Y|x,z) based on the geometric mean of token likelihoods
python variational_reasoning/build_data_GML.py
# Option 2: use an estimator of \pi(Y|x,z) based on accuracy
python variational_reasoning/build_data_Acc.py
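For orientation, the geometric-mean estimator corresponds to averaging the answer's token log-likelihoods under the reasoning model (a sketch in generic notation; the exact normalization used in build_data_GML.py may differ):

$$\hat{\pi}_{\mathrm{GML}}(Y \mid x, z) = \exp\!\left(\frac{1}{|Y|} \sum_{t=1}^{|Y|} \log \pi(y_t \mid x, z, y_{<t})\right)$$

The accuracy-based estimator instead relies on the correctness of the answers sampled in step 6 and verified in step 7.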
9. Train the final reasoning model
Please refer to LLaMA-Factory/variational_reasoning/train/train_variational_posterior.
If you find this code useful, please consider citing our paper:
@article{zhou2025variationalreasoninglanguagemodels,
title={Variational Reasoning for Language Models},
author={Xiangxin Zhou and Zichen Liu and Haonan Wang and Chao Du and Min Lin and Chongxuan Li and Liang Wang and Tianyu Pang},
journal={arXiv preprint arXiv:2509.22637},
year={2025}
}