Your current environment
The output of python collect_env.py
==============================
System Info
==============================
OS : Tencent tlinux 2.6 (x86_64)
GCC version : (GCC) 11.4.0
Clang version : 18.1.8 (Red Hat 18.1.8-1.module+el8.10.0+703+ec7b33ba)
CMake version : version 3.31.6
Libc version : glibc-2.28
==============================
PyTorch Info
==============================
PyTorch version : 2.8.0+cu128
Is debug build : False
CUDA used to build PyTorch : 12.8
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.10.18 (main, Jun 5 2025, 13:14:17) [GCC 11.2.0] (64-bit runtime)
Python platform : Linux-5.4.241-1-tlinux4-0017.7-x86_64-with-glibc2.28
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : 12.8.93
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration :
GPU 0: NVIDIA H20
GPU 1: NVIDIA H20
GPU 2: NVIDIA H20
GPU 3: NVIDIA H20
GPU 4: NVIDIA H20
GPU 5: NVIDIA H20
GPU 6: NVIDIA H20
GPU 7: NVIDIA H20
Nvidia driver version : 535.161.08
cuDNN version : Probably one of the following:
/usr/lib64/libcudnn.so.8.9.7
/usr/lib64/libcudnn_adv_infer.so.8.9.7
/usr/lib64/libcudnn_adv_train.so.8.9.7
/usr/lib64/libcudnn_cnn_infer.so.8.9.7
/usr/lib64/libcudnn_cnn_train.so.8.9.7
/usr/lib64/libcudnn_ops_infer.so.8.9.7
/usr/lib64/libcudnn_ops_train.so.8.9.7
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 384
On-line CPU(s) list: 0-383
Thread(s) per core: 2
Core(s) per socket: 96
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
BIOS Vendor ID: Advanced Micro Devices, Inc.
CPU family: 25
Model: 17
Model name: AMD EPYC 9K84 96-Core Processor
BIOS Model name: AMD EPYC 9K84 96-Core Processor
Stepping: 1
CPU MHz: 3699.843
CPU max MHz: 2600.0000
CPU min MHz: 1500.0000
BogoMIPS: 5199.99
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 32768K
NUMA node0 CPU(s): 0-95,192-287
NUMA node1 CPU(s): 96-191,288-383
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d
==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.3.1.post1
[pip3] mypy_extensions==1.1.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.14.1
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-ml-py==13.580.82
[pip3] nvidia-nccl-cu12==2.27.3
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] pynvml==13.0.1
[pip3] pyzmq==27.1.0
[pip3] sentence-transformers==5.1.1
[pip3] torch==2.8.0
[pip3] torchaudio==2.8.0
[pip3] torchvision==0.23.0
[pip3] transformers==4.56.2
[pip3] transformers-stream-generator==0.0.5
[pip3] triton==3.4.0
[conda] flashinfer-python 0.3.1.post1 pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.8.4.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.8.90 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.8.93 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.8.90 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.10.2.21 pypi_0 pypi
[conda] nvidia-cudnn-frontend 1.14.1 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.3.3.83 pypi_0 pypi
[conda] nvidia-cufile-cu12 1.13.1.3 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.9.90 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.7.3.90 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.5.8.93 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.7.1 pypi_0 pypi
[conda] nvidia-ml-py 13.580.82 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.27.3 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.8.93 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.8.90 pypi_0 pypi
[conda] pynvml 13.0.1 pypi_0 pypi
[conda] pyzmq 27.1.0 pypi_0 pypi
[conda] sentence-transformers 5.1.1 pypi_0 pypi
[conda] torch 2.8.0 pypi_0 pypi
[conda] torchaudio 2.8.0 pypi_0 pypi
[conda] torchvision 0.23.0 pypi_0 pypi
[conda] transformers 4.56.2 pypi_0 pypi
[conda] transformers-stream-generator 0.0.5 pypi_0 pypi
[conda] triton 3.4.0 pypi_0 pypi
==============================
vLLM Info
==============================
ROCM Version : Could not collect
vLLM Version : 0.10.2
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled
🐛 Describe the bug
I hit an assertion error while using lmms-eval to evaluate a model that is initialized via vLLM. Could you please advise what this assertion might be related to, and where I should start debugging?
The assertion error:
AssertionError: Failed to apply prompt replacement for mm_items['image'][1]
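From the traceback below, vLLM's multimodal processor cannot find a placeholder in the tokenized prompt for the second image (mm_items['image'][1]). As a first sanity check I counted placeholder tokens against the images being passed. This is my own debugging sketch, assuming VisionThink inherits Qwen2.5-VL's <|image_pad|> placeholder; count_image_placeholders is a hypothetical helper, not lmms-eval or vLLM code:
from transformers import AutoTokenizer

MODEL_DIR = "/root/save_models/opensource/VisionThink/VisionThink-Efficient"
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, trust_remote_code=True)
IMAGE_PAD_ID = tokenizer.convert_tokens_to_ids("<|image_pad|>")  # assumed placeholder token

def count_image_placeholders(prompt_token_ids):
    # Count image placeholder tokens in an already-tokenized prompt; this
    # should equal the number of images sent in multi_modal_data.
    return sum(1 for t in prompt_token_ids if t == IMAGE_PAD_ID)
If the count is 1 while two images are in flight after the resize tool call, that would explain why the replacement fails exactly at index [1].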
Click to expand: full run script
MODEL_DIR="/root/save_models/opensource/VisionThink/VisionThink-Efficient"
MODEL_NAME="VisionThink-Efficient"
MODEL_CLASS="visionthink_vllm_tool"
GPU_LIST="0,1,2,3" # The model has 28 attention heads, so tensor parallelism only works with 4 GPUs, not 8 (28 is not divisible by 8)
LOG_SAMPLES_SUFFIX="vllm" # Specify a suffix for the log_samples file name.
TASKS="mmbench_en_test,mmmu_val,pope,mme,mathvista_testmini,mathverse_testmini_vision_only,mmvet"
# Count the number of GPUs in GPU_LIST
TENSOR_PARALLEL_SIZE=$(echo $GPU_LIST | awk -F',' '{print NF}')
LMMS_EVAL_DATASET_CACHE="/root/dataset/opensource/lmms_eval"
VLLM_CACHE_ROOT="/root/save_models/vllm_cache"
HF_TOKEN="REMOVED"
PROJECT_DIR="/root/opensource/lmms-eval"
OUTPUT_PATH="$PROJECT_DIR/lmms_eval_outputs/${MODEL_NAME}"
CACHE_PATH="$PROJECT_DIR/lmms_eval_cache/sqlite_cache_"
# Environment variable setup
export HF_HOME="$LMMS_EVAL_DATASET_CACHE"
export VLLM_CACHE_ROOT="$VLLM_CACHE_ROOT"
export VLLM_WORKER_MULTIPROC_METHOD="spawn"
export HF_TOKEN="$HF_TOKEN"
CUDA_VISIBLE_DEVICES=$GPU_LIST python -m lmms_eval \
--model "$MODEL_CLASS" \
--model_args "model_version=${MODEL_DIR},tensor_parallel_size=${TENSOR_PARALLEL_SIZE},\
trust_remote_code=True,max_images=2,prompt=tool_call,enable_tool_call=True,\
downsample_image=True,max_token=40960" \
--tasks "${TASKS}" \
--batch_size 1024 \
--log_samples \
--log_samples_suffix "$LOG_SAMPLES_SUFFIX" \
--use_cache "$CACHE_PATH" \
--cache_requests "true" \
--output_path "$OUTPUT_PATH" \
--verbosity DEBUG \
--seed 42
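To isolate the problem from lmms-eval, I also tried to reproduce the failure directly against vLLM's public API. This is a hedged sketch: it assumes the cause is a placeholder/image count mismatch after the resize tool call, which may not be the actual bug, and the prompt string and dummy images below are mine, not what lmms-eval builds:
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

MODEL_DIR = "/root/save_models/opensource/VisionThink/VisionThink-Efficient"
tok = AutoTokenizer.from_pretrained(MODEL_DIR, trust_remote_code=True)

# One image placeholder in the prompt ...
prompt = "<|vision_start|><|image_pad|><|vision_end|>Describe the image."
# ... but two images in the request (original + resized, as after a tool call).
images = [Image.new("RGB", (448, 448)), Image.new("RGB", (224, 224))]

llm = LLM(model=MODEL_DIR, trust_remote_code=True, tensor_parallel_size=4)
llm.generate(
    TokensPrompt(
        prompt_token_ids=tok(prompt).input_ids,
        multi_modal_data={"image": images},
    ),
    SamplingParams(max_tokens=16),
)
# If my theory is right, this should raise:
# AssertionError: Failed to apply prompt replacement for mm_items['image'][1]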
Click to expand: full error log
2025-09-26 19:17:39 | INFO | lmms_eval.models.chat.visionthink_vllm_tool:generate_until:401 - [Round #0 Rollout Tool Call Trigger] For THIS round, ids 32 need to apply function tool using: {'tool_type': 'resize', 'error_info': None} ...
2025-09-26 19:17:39 | INFO | lmms_eval.models.chat.visionthink_vllm_tool:generate_until:412 - [Round #0 Rollout END] For NEXT round, We hava 238 trajs to complete ...
2025-09-26 19:17:39 | INFO | lmms_eval.models.chat.visionthink_vllm_tool:generate_until:328 - [Round #1 Rollout START] For THIS round, We hava 238 trajs to complete ...
2025-09-26 19:18:22 | INFO | lmms_eval.models.chat.visionthink_vllm_tool:generate_until:405 - [Round #1 Rollout Tool Call Trigger] No ids need to apply function tool for this round.
2025-09-26 19:18:22 | INFO | lmms_eval.models.chat.visionthink_vllm_tool:generate_until:412 - [Round #1 Rollout END] For NEXT round, We hava 0 trajs to complete ...
Model Responding: 40%|███████████████████████████▋ | 9216/22946 [03:23<08:23, 27.26it/s]
2025-09-26 19:18:34 | INFO | lmms_eval.models.chat.visionthink_vllm_tool:generate_until:328 - [Round #0 Rollout START] For THIS round, We hava 1024 trajs to complete ...
Traceback (most recent call last):
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/root/opensource/lmms-eval/lmms_eval/__main__.py", line 539, in <module>
cli_evaluate()
File "/root/opensource/lmms-eval/lmms_eval/__main__.py", line 366, in cli_evaluate
raise e
File "/root/opensource/lmms-eval/lmms_eval/__main__.py", line 347, in cli_evaluate
results, samples = cli_evaluate_single(args)
File "/root/opensource/lmms-eval/lmms_eval/__main__.py", line 474, in cli_evaluate_single
results = evaluator.simple_evaluate(
File "/root/opensource/lmms-eval/lmms_eval/utils.py", line 536, in _wrapper
return fn(*args, **kwargs)
File "/root/opensource/lmms-eval/lmms_eval/evaluator.py", line 268, in simple_evaluate
results = evaluate(
File "/root/opensource/lmms-eval/lmms_eval/utils.py", line 536, in _wrapper
return fn(*args, **kwargs)
File "/root/opensource/lmms-eval/lmms_eval/evaluator.py", line 501, in evaluate
resps = getattr(lm, reqtype)(cloned_reqs) # Choiszt run generate until
File "/root/opensource/lmms-eval/lmms_eval/models/chat/visionthink_vllm_tool.py", line 331, in generate_until
outputs = self.inference_engine.generate(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 388, in generate
self._validate_and_add_requests(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 1501, in _validate_and_add_requests [12/8503]
self._add_request(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 1519, in _add_request
self.llm_engine.add_request(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 213, in add_request
prompt_str, request = self.processor.process_inputs(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/v1/engine/processor.py", line 365, in process_inputs
processed_inputs: ProcessorInputs = self.input_preprocessor.preprocess(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/inputs/preprocess.py", line 919, in reprocess
return self._process_decoder_only_prompt(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/inputs/preprocess.py", line 866, in_process_decoder_only_prompt
prompt_comps = self._prompt_to_llm_inputs(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/inputs/preprocess.py", line 540, in_prompt_to_llm_inputs
return self._process_tokens(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/inputs/preprocess.py", line 398, in_process_tokens
inputs = self._process_multimodal(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/inputs/preprocess.py", line 278, in_process_multimodal
mm_input = mm_processor.apply(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/multimodal/processing.py", line 181$, in apply
prompt_ids, prompt, mm_placeholders = self._maybe_apply_prompt_updates(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/multimodal/processing.py", line 176$, in _maybe_apply_prompt_updates
) = self._apply_prompt_updates(
File "/root/miniconda3/envs/psp-lmms-eval/lib/python3.10/site-packages/vllm/multimodal/processing.py", line 169$, in _apply_prompt_updates
assert update_idx is not None, (
AssertionError: Failed to apply prompt replacement for mm_items['image'][1]
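For reference, my rough reading of the assertion site (a simplified sketch of what I believe vllm/multimodal/processing.py checks, not vLLM's actual implementation; find_match stands in for the real placeholder-matching logic):
def apply_prompt_updates_sketch(prompt_token_ids, mm_items, find_match):
    # Each multimodal item must be matched to a placeholder in the prompt;
    # if no match is found for an item, update_idx stays None and the
    # assertion fires with that item's modality and index.
    for modality, items in mm_items.items():
        for item_idx in range(len(items)):
            update_idx = find_match(prompt_token_ids, modality, item_idx)
            assert update_idx is not None, (
                f"Failed to apply prompt replacement for "
                f"mm_items['{modality}'][{item_idx}]"
            )
            # ... the real code then replaces the placeholder at update_idx ...
So the question is why, in the second batch's round #0, one request ends up with more image items than matchable placeholders.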
The conda environment I used:
Click to expand: full environment
Package Version Editable project location
--------------------------------- ------------- ----------------------------------------------------------------------
accelerate 1.10.1
aiohappyeyeballs 2.6.1
aiohttp 3.12.15
aiosignal 1.4.0
annotated-types 0.7.0
antlr4-python3-runtime 4.7.2
anyio 4.11.0
astor 0.8.1
async-timeout 5.0.1
attrs 25.3.0
av 15.1.0
black 25.9.0
blake3 1.0.6
cachetools 6.1.0
cbor2 5.7.0
certifi 2025.8.3
cffi 2.0.0
cfgv 3.4.0
chardet 5.2.0
charset-normalizer 3.4.3
click 8.3.0
cloudpickle 3.1.1
colorama 0.4.6
compressed-tensors 0.11.0
cupy-cuda12x 13.6.0
DataProperty 1.1.0
datasets 4.1.0
decord 0.6.0
depyf 0.19.0
dill 0.4.0
diskcache 5.6.3
distlib 0.4.0
distro 1.9.0
dnspython 2.8.0
einops 0.8.1
email-validator 2.3.0
et_xmlfile 2.0.0
evaluate 0.4.6
exceptiongroup 1.3.0
fastapi 0.117.1
fastapi-cli 0.0.13
fastapi-cloud-cli 0.2.1
fastrlock 0.8.3
filelock 3.19.1
flashinfer-python 0.3.1.post1
frozendict 2.4.6
frozenlist 1.7.0
fsspec 2025.9.0
ftfy 6.3.1
gguf 0.17.1
gitdb 4.0.12
GitPython 3.1.45
h11 0.16.0
hf_transfer 0.1.9
hf-xet 1.1.10
httpcore 1.0.9
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.35.1
identify 2.6.14
idna 3.10
interegular 0.3.3
isort 6.0.1
Jinja2 3.1.6
jiter 0.11.0
joblib 1.5.2
jsonlines 4.0.0
jsonschema 4.25.1
jsonschema-specifications 2025.9.1
lark 1.2.2
latex2sympy2 1.9.1
Levenshtein 0.27.1
llguidance 0.7.30
llvmlite 0.44.0
lm-format-enforcer 0.11.3
lmms_eval 0.4.0
loguru 0.7.3
lxml 6.0.2
markdown-it-py 4.0.0
MarkupSafe 3.0.2
mbstrdecoder 1.1.4
mdurl 0.1.2
mistral_common 1.8.5
mpmath 1.3.0
msgpack 1.1.1
msgspec 0.19.0
multidict 6.6.4
multiprocess 0.70.16
mypy_extensions 1.1.0
networkx 3.4.2
ninja 1.13.0
nltk 3.9.1
nodeenv 1.9.1
numba 0.61.2
numexpr 2.13.0
numpy 1.26.4
nvidia-cublas-cu12 12.8.4.1
nvidia-cuda-cupti-cu12 12.8.90
nvidia-cuda-nvrtc-cu12 12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12 9.10.2.21
nvidia-cudnn-frontend 1.14.1
nvidia-cufft-cu12 11.3.3.83
nvidia-cufile-cu12 1.13.1.3
nvidia-curand-cu12 10.3.9.90
nvidia-cusolver-cu12 11.7.3.90
nvidia-cusparse-cu12 12.5.8.93
nvidia-cusparselt-cu12 0.7.1
nvidia-ml-py 13.580.82
nvidia-nccl-cu12 2.27.3
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvtx-cu12 12.8.90
openai 1.109.1
openai-harmony 0.0.4
opencv-python-headless 4.11.0.86
openpyxl 3.1.5
outlines_core 0.2.11
packaging 25.0
pandas 2.3.2
partial-json-parser 0.2.1.1.post6
pathspec 0.12.1
pathvalidate 3.3.1
peft 0.17.1
pillow 11.3.0
pip 25.2
platformdirs 4.4.0
portalocker 3.2.0
pre_commit 4.3.0
prometheus_client 0.23.1
prometheus-fastapi-instrumentator 7.1.0
propcache 0.3.2
protobuf 6.32.1
psutil 7.1.0
py-cpuinfo 9.0.0
pyarrow 21.0.0
pybase64 1.4.2
pybind11 3.0.1
pycocoevalcap 1.2
pycocotools 2.0.10
pycountry 24.6.1
pycparser 2.23
pydantic 2.11.9
pydantic_core 2.33.2
pydantic-extra-types 2.10.5
Pygments 2.19.2
pynvml 13.0.1
pytablewriter 1.2.1
python-dateutil 2.9.0.post0
python-dotenv 1.1.1
python-json-logger 3.3.0
python-multipart 0.0.20
pytokens 0.1.10
pytz 2025.2
PyYAML 6.0.3
pyzmq 27.1.0
RapidFuzz 3.14.1
ray 2.49.2
referencing 0.36.2
regex 2025.9.18
requests 2.32.5
rfc3986 1.5.0
rich 14.1.0
rich-toolkit 0.15.1
rignore 0.6.4
rpds-py 0.27.1
sacrebleu 2.5.1
safetensors 0.6.2
scikit-learn 1.7.2
scipy 1.15.3
sentence-transformers 5.1.1
sentencepiece 0.2.1
sentry-sdk 2.39.0
setproctitle 1.3.7
setuptools 78.1.1
shellingham 1.5.4
six 1.17.0
smmap 5.0.2
sniffio 1.3.1
soundfile 0.13.1
soxr 1.0.0
sqlitedict 2.1.0
starlette 0.48.0
sympy 1.14.0
tabledata 1.3.4
tabulate 0.9.0
tcolorpy 0.1.7
tenacity 8.3.0
threadpoolctl 3.6.0
tiktoken 0.11.0
timm 1.0.20
tokenizers 0.22.1
tomli 2.2.1
torch 2.8.0
torchaudio 2.8.0
torchvision 0.23.0
tqdm 4.67.1
tqdm-multiprocess 0.0.11
transformers 4.56.2
transformers-stream-generator 0.0.5
triton 3.4.0
typepy 1.3.4
typer 0.19.2
typing_extensions 4.15.0
typing-inspection 0.4.1
tzdata 2025.2
urllib3 2.5.0
uvicorn 0.37.0
uvloop 0.21.0
virtualenv 20.34.0
vllm 0.10.2
wandb 0.22.0
watchfiles 1.1.0
wcwidth 0.2.14
websockets 15.0.1
wheel 0.45.1
xformers 0.0.32.post1
xgrammar 0.1.23
xxhash 3.5.0
yarl 1.20.1
yt-dlp 2025.9.23
zss 1.2.0
zstandard 0.25.0