GHSA-PQ5C-RJHQ-QP7P
Vulnerability from GitHub – Published: 2026-04-03 21:51 – Updated: 2026-04-06 23:20
Summary
The VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py:51-62 splits video/jpeg data URLs on commas to extract individual JPEG frames, but does not enforce a frame count limit. The num_frames parameter (default: 32), which the load_bytes() code path enforces at lines 47-48, is completely bypassed in the video/jpeg base64 path. An attacker can send a single API request containing thousands of comma-separated base64-encoded JPEG frames, causing the server to decode every frame into memory and crash with an out-of-memory (OOM) error.
Details
Vulnerable code
```python
# video.py:51-62
def load_base64(self, media_type: str, data: str) -> tuple[npt.NDArray, dict[str, Any]]:
    if media_type.lower() == "video/jpeg":
        load_frame = partial(self.image_io.load_base64, "image/jpeg")
        return np.stack(
            [np.asarray(load_frame(frame_data)) for frame_data in data.split(",")]
            #                                                     ^^^^^^^^^^
            # Unbounded split: no frame count limit
        ), {}
    return self.load_bytes(base64.b64decode(data))
```
The load_bytes() path (lines 47-48) properly delegates to a video loader that respects self.num_frames (default 32). The load_base64("video/jpeg", ...) path bypasses this limit entirely: data.split(",") produces an unbounded list, and every element is decoded into its own numpy array.
video/jpeg is part of vLLM's public API
video/jpeg is a vLLM-specific MIME type, not IANA-registered. However, it is part of the public API surface:
- encode_video_url() at vllm/multimodal/utils.py:96-108 generates data:video/jpeg;base64,... URLs
- Official test suites at tests/entrypoints/openai/test_video.py:62 and tests/entrypoints/test_chat_utils.py:153 both use this format
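To make the format concrete, the sketch below assembles a data:video/jpeg;base64,... URL in the shape the advisory describes: each JPEG frame is base64-encoded on its own and the encoded frames are joined with commas. The helper name make_video_jpeg_url is hypothetical; this is not vLLM's encode_video_url() implementation, and the frame bytes are placeholders rather than real JPEGs.

```python
import base64


def make_video_jpeg_url(jpeg_frames: list[bytes]) -> str:
    """Build a video/jpeg data URL (hypothetical helper, not vLLM's code).

    Each frame is base64-encoded separately, then the encoded frames are
    joined with commas. The standard base64 alphabet never contains a comma,
    so the comma works as an unambiguous frame separator.
    """
    encoded = [base64.b64encode(frame).decode("ascii") for frame in jpeg_frames]
    return "data:video/jpeg;base64," + ",".join(encoded)


# Placeholder payloads stand in for real JPEG bytes.
url = make_video_jpeg_url([b"a", b"b"])
print(url)  # data:video/jpeg;base64,YQ==,Yg==
```

Nothing in this format caps the number of comma-separated segments, so the frame count is entirely under the client's control.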
Memory amplification
Each JPEG frame decodes to a full numpy array. For 640x480 RGB images, each frame is ~921 KB decoded (640 × 480 × 3 bytes), so 5000 frames occupy ~4.6 GB. np.stack() then allocates a second array of the same size while the per-frame arrays are still alive, roughly doubling peak usage to ~9.2 GB. The compressed JPEG payload is small (~100 KB for 5000 frames) but decompresses to gigabytes.
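The arithmetic above can be checked with a few lines of Python. The helper names are hypothetical, and the estimate assumes 8-bit RGB frames (one byte per channel) while ignoring interpreter and array-object overhead and the input buffers, so it is a rough estimate rather than a measurement.

```python
def decoded_frame_bytes(width: int, height: int, channels: int = 3) -> int:
    """Size of one decoded uint8 frame: one byte per channel per pixel."""
    return width * height * channels


def estimated_peak_bytes(num_frames: int, width: int, height: int, channels: int = 3) -> int:
    """Rough peak: the list of per-frame arrays plus the np.stack output.

    Assumes uint8 pixels; ignores Python/NumPy object overhead and the
    base64/JPEG input buffers.
    """
    all_frames = num_frames * decoded_frame_bytes(width, height, channels)
    # The individual frame arrays and the stacked copy coexist during np.stack.
    return 2 * all_frames


frame = decoded_frame_bytes(640, 480)
print(frame)                                       # 921600 bytes (~921 KB)
print(5000 * frame / 1e9)                          # 4.608 GB of decoded frames
print(estimated_peak_bytes(5000, 640, 480) / 1e9)  # 9.216 GB at the np.stack peak
```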
Data flow
```
POST /v1/chat/completions
  → chat_utils.py:1434  video_url type → mm_parser.parse_video()
  → chat_utils.py:872   parse_video() → self._connector.fetch_video()
  → connector.py:295    fetch_video() → load_from_url(url, self.video_io)
  → connector.py:91     _load_data_url(): url_spec.path.split(",", 1)
      → media_type = "video/jpeg"
      → data = "<frame1>,<frame2>,...,<frame10000>"
  → connector.py:100    media_io.load_base64("video/jpeg", data)
  → video.py:54         data.split(",")  ← UNBOUNDED
  → video.py:55-57      all frames decoded into numpy arrays
  → video.py:56         np.stack([...])  ← massive combined array → OOM
```
connector.py:91 uses split(",", 1), which splits only on the first comma. All remaining commas stay in data and are later split by video.py:54.
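The two-stage split can be reproduced with plain string operations. This is a simplified stand-in for _load_data_url() and load_base64(), not vLLM code: the function names are hypothetical, the input is assumed to be the part of the URL after "data:", and real URL parsing is skipped. It only shows why the first split leaves every frame separator inside data.

```python
def split_data_url(path: str) -> tuple[str, str]:
    """Stage 1 (cf. connector.py:91): split only on the FIRST comma."""
    header, data = path.split(",", 1)
    media_type = header.split(";", 1)[0]  # "video/jpeg;base64" -> "video/jpeg"
    return media_type, data


def split_frames(data: str) -> list[str]:
    """Stage 2 (cf. video.py:54): split on EVERY remaining comma, unbounded."""
    return data.split(",")


path = "video/jpeg;base64,QUFB,QkJC,Q0ND"  # three placeholder "frames"
media_type, data = split_data_url(path)
print(media_type)               # video/jpeg
print(data)                     # QUFB,QkJC,Q0ND
print(len(split_frames(data)))  # 3
```

The frame count returned by the second stage grows linearly with the number of commas the client sends; nothing between the two stages checks it.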
Comparison with existing protections
| Code Path | Frame Limit | File |
|---|---|---|
| load_bytes() (binary video) | Yes: num_frames (default 32) | video.py:46-49 |
| load_base64("video/jpeg", ...) | No: unlimited data.split(",") | video.py:51-62 |
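One way to close the gap is to enforce a frame limit before any frame is decoded. The sketch below is illustrative only: it is not necessarily how the upstream fix in PR 38636 works, split_frames_bounded and max_frames are hypothetical names, and rejecting oversized input (rather than, say, truncating or subsampling it) is a design choice assumed here.

```python
def split_frames_bounded(data: str, max_frames: int = 32) -> list[str]:
    """Split comma-separated base64 frames, refusing more than max_frames.

    str.split with maxsplit=max_frames returns at most max_frames + 1 pieces,
    so the check runs without building an arbitrarily long list and before
    any frame is base64-decoded or JPEG-decoded.
    """
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    frames = data.split(",", max_frames)
    if len(frames) > max_frames:
        raise ValueError(
            f"video/jpeg payload has more than {max_frames} frames; refusing to decode"
        )
    return frames


print(len(split_frames_bounded(",".join(["QUFB"] * 32))))  # 32
try:
    split_frames_bounded(",".join(["QUFB"] * 10000))
except ValueError as exc:
    print(exc)  # video/jpeg payload has more than 32 frames; refusing to decode
```

Using maxsplit keeps the work proportional to the limit rather than to the attacker-supplied frame count; the limit itself would naturally mirror num_frames so both code paths behave consistently.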
{
"affected": [
{
"package": {
"ecosystem": "PyPI",
"name": "vllm"
},
"ranges": [
{
"events": [
{
"introduced": "0.7.0"
},
{
"fixed": "0.19.0"
}
],
"type": "ECOSYSTEM"
}
]
}
],
"aliases": [
"CVE-2026-34755"
],
"database_specific": {
"cwe_ids": [
"CWE-770"
],
"github_reviewed": true,
"github_reviewed_at": "2026-04-03T21:51:35Z",
"nvd_published_at": "2026-04-06T16:16:36Z",
"severity": "MODERATE"
},
"details": "## Summary\n\nThe `VideoMediaIO.load_base64()` method at `vllm/multimodal/media/video.py:51-62` splits `video/jpeg` data URLs by comma to extract individual JPEG frames, but does not enforce a frame count limit. The `num_frames` parameter (default: 32), which is enforced by the `load_bytes()` code path at line 47-48, is completely bypassed in the `video/jpeg` base64 path. An attacker can send a single API request containing thousands of comma-separated base64-encoded JPEG frames, causing the server to decode all frames into memory and crash with OOM.\n\n## Details\n\n### Vulnerable code\n\n```python\n# video.py:51-62\ndef load_base64(self, media_type: str, data: str) -\u003e tuple[npt.NDArray, dict[str, Any]]:\n if media_type.lower() == \"video/jpeg\":\n load_frame = partial(self.image_io.load_base64, \"image/jpeg\")\n return np.stack(\n [np.asarray(load_frame(frame_data)) for frame_data in data.split(\",\")]\n # ^^^^^^^^^^\n # Unbounded split \u2014 no frame count limit\n ), {}\n return self.load_bytes(base64.b64decode(data))\n```\n\nThe `load_bytes()` path (line 47-48) properly delegates to a video loader that respects `self.num_frames` (default 32). The `load_base64(\"video/jpeg\", ...)` path bypasses this limit entirely \u2014 `data.split(\",\")` produces an unbounded list and every frame is decoded into a numpy array.\n\n### video/jpeg is part of vLLM\u0027s public API\n\n`video/jpeg` is a vLLM-specific MIME type, not IANA-registered. However it is part of the public API surface:\n\n- `encode_video_url()` at `vllm/multimodal/utils.py:96-108` generates `data:video/jpeg;base64,...` URLs\n- Official test suites at `tests/entrypoints/openai/test_video.py:62` and `tests/entrypoints/test_chat_utils.py:153` both use this format\n\n### Memory amplification\n\nEach JPEG frame decodes to a full numpy array. For 640x480 RGB images, each frame is ~921 KB decoded. 5000 frames = ~4.6 GB. `np.stack()` then creates an additional copy. 
The compressed JPEG payload is small (~100 KB for 5000 frames) but decompresses to gigabytes.\n\n### Data flow\n\n```\nPOST /v1/chat/completions\n \u2192 chat_utils.py:1434 video_url type \u2192 mm_parser.parse_video()\n \u2192 chat_utils.py:872 parse_video() \u2192 self._connector.fetch_video()\n \u2192 connector.py:295 fetch_video() \u2192 load_from_url(url, self.video_io)\n \u2192 connector.py:91 _load_data_url(): url_spec.path.split(\",\", 1)\n \u2192 media_type = \"video/jpeg\"\n \u2192 data = \"\u003cframe1\u003e,\u003cframe2\u003e,...,\u003cframe10000\u003e\"\n \u2192 connector.py:100 media_io.load_base64(\"video/jpeg\", data)\n \u2192 video.py:54 data.split(\",\") \u2190 UNBOUNDED\n \u2192 video.py:55-57 all frames decoded into numpy arrays\n \u2192 video.py:56 np.stack([...]) \u2190 massive combined array \u2192 OOM\n```\n\n`connector.py:91` uses `split(\",\", 1)` which splits on only the first comma. All remaining commas stay in `data` and are later split by `video.py:54`.\n\n### Comparison with existing protections\n\n| Code Path | Frame Limit | File |\n|-----------|-------------|------|\n| `load_bytes()` (binary video) | Yes \u2014 `num_frames` (default 32) | video.py:46-49 |\n| `load_base64(\"video/jpeg\", ...)` | No \u2014 unlimited `data.split(\",\")` | video.py:51-62 |",
"id": "GHSA-pq5c-rjhq-qp7p",
"modified": "2026-04-06T23:20:56Z",
"published": "2026-04-03T21:51:35Z",
"references": [
{
"type": "WEB",
"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-pq5c-rjhq-qp7p"
},
{
"type": "ADVISORY",
"url": "https://nvd.nist.gov/vuln/detail/CVE-2026-34755"
},
{
"type": "WEB",
"url": "https://github.com/vllm-project/vllm/pull/38636"
},
{
"type": "WEB",
"url": "https://github.com/vllm-project/vllm/commit/58ee61422169ce17e08248f8efa1e9df434fe395"
},
{
"type": "PACKAGE",
"url": "https://github.com/vllm-project/vllm"
}
],
"schema_version": "1.4.0",
"severity": [
{
"score": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
"type": "CVSS_V3"
}
],
"summary": "vLLM: Denial of Service via Unbounded Frame Count in video/jpeg Base64 Processing"
}