Vulnerability-Lookup

GHSA-3MWP-WVH9-7528

Vulnerability from github – Published: 2026-04-03 15:35 – Updated: 2026-04-06 23:40

Summary

vLLM: Unauthenticated OOM Denial of Service via Unbounded `n` Parameter in OpenAI API Server

Details

Summary

A Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This completely blocks the Python asyncio event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue.

Details

The root cause of this vulnerability lies in the missing upper bound checks across the request parsing and asynchronous scheduling layers:

Protocol Layer: In vllm/entrypoints/openai/chat_completion/protocol.py, the n parameter is defined simply as an integer without any pydantic.Field constraints for an upper bound.

class ChatCompletionRequest(OpenAIBaseModel):
    # Ordered by official OpenAI API documentation
    # https://platform.openai.com/docs/api/reference/chat/create
    messages: list[ChatCompletionMessageParam]
    model: str | None = None
    frequency_penalty: float | None = 0.0
    logit_bias: dict[str, float] | None = None
    logprobs: bool | None = False
    top_logprobs: int | None = 0
    max_tokens: int | None = Field(
        default=None,
        deprecated="max_tokens is deprecated in favor of "
        "the max_completion_tokens field",
    )
    max_completion_tokens: int | None = None
    n: int | None = 1
    presence_penalty: float | None = 0.0

SamplingParams Layer (Incomplete Validation): When the API request is converted to internal SamplingParams in vllm/sampling_params.py, the _verify_args method only checks the lower bound (self.n < 1), entirely omitting an upper bounds check.

    def _verify_args(self) -> None:
        if not isinstance(self.n, int):
            raise ValueError(f"n must be an int, but is of type {type(self.n)}")
        if self.n < 1:
            raise ValueError(f"n must be at least 1, got {self.n}.")

Engine Layer (The OOM Trigger): When the malicious request reaches the core engine (vllm/v1/engine/async_llm.py), the engine attempts to fan out the request n times to generate identical independent sequences within a synchronous loop.

        # Fan out child requests (for n>1).
        parent_request = ParentRequest(request)
        for idx in range(parent_params.n):
            request_id, child_params = parent_request.get_child_info(idx)
            child_request = request if idx == parent_params.n - 1 else copy(request)
            child_request.request_id = request_id
            child_request.sampling_params = child_params
            await self._add_request(
                child_request, prompt_text, parent_request, idx, queue
            )
        return queue

Because Python's asyncio runs on a single thread and event loop, this monolithic for-loop monopolizes the CPU thread. The server stops responding to all other connections (including liveness probes). Simultaneously, the memory allocator is overwhelmed by cloning millions of request object instances via copy(request), driving the host's Resident Set Size (RSS) up by gigabytes per second until the OS OOM-killer terminates the vLLM process.

Impact

Vulnerability Type: Resource Exhaustion / Denial of Service

Impacted Parties: - Any individual or organization hosting a public-facing vLLM API server (vllm.entrypoints.openai.api_server), which happens to be the primary entrypoint for OpenAI-compatible setups. - SaaS / AI-as-a-Service platforms acting as reverse proxies sitting in front of vLLM without strict HTTP body payload validation or rate limitations.

Because this vulnerability exploits the control plane rather than the data plane, an unauthenticated remote attacker can achieve a high success rate in taking down production inference hosts with a single HTTP request. This effectively circumvents any hardware-level capacity planning and conventional bandwidth stress limitations.

Severity

6.5 (Medium)


                  
                    CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Show details on source website

JSON

To clipboard

{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "vllm"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0.1.0"
            },
            {
              "fixed": "0.19.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2026-34756"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-770"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2026-04-03T15:35:48Z",
    "nvd_published_at": "2026-04-06T16:16:36Z",
    "severity": "MODERATE"
  },
  "details": "### Summary\nA Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the `n` parameter in the `ChatCompletionRequest` and `CompletionRequest` Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large `n` value. This completely blocks the Python `asyncio` event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue.\n\n### Details\nThe root cause of this vulnerability lies in the missing upper bound checks across the request parsing and asynchronous scheduling layers:\n\n1. **Protocol Layer:**\n   In `vllm/entrypoints/openai/chat_completion/protocol.py`, the `n` parameter is defined simply as an integer without any `pydantic.Field` constraints for an upper bound.\n```python\nclass ChatCompletionRequest(OpenAIBaseModel):\n    # Ordered by official OpenAI API documentation\n    # https://platform.openai.com/docs/api/reference/chat/create\n    messages: list[ChatCompletionMessageParam]\n    model: str | None = None\n    frequency_penalty: float | None = 0.0\n    logit_bias: dict[str, float] | None = None\n    logprobs: bool | None = False\n    top_logprobs: int | None = 0\n    max_tokens: int | None = Field(\n        default=None,\n        deprecated=\"max_tokens is deprecated in favor of \"\n        \"the max_completion_tokens field\",\n    )\n    max_completion_tokens: int | None = None\n    n: int | None = 1\n    presence_penalty: float | None = 0.0\n```\n\n1. **SamplingParams Layer (Incomplete Validation):**\n   When the API request is converted to internal `SamplingParams` in `vllm/sampling_params.py`, the `_verify_args` method only checks the lower bound (`self.n \u003c 1`), entirely omitting an upper bounds check.\n```python\n    def _verify_args(self) -\u003e None:\n        if not isinstance(self.n, int):\n            raise ValueError(f\"n must be an int, but is of type {type(self.n)}\")\n        if self.n \u003c 1:\n            raise ValueError(f\"n must be at least 1, got {self.n}.\")\n```\n\n1. **Engine Layer (The OOM Trigger):**\n   When the malicious request reaches the core engine (`vllm/v1/engine/async_llm.py`), the engine attempts to fan out the request `n` times to generate identical independent sequences within a synchronous loop.\n```python\n        # Fan out child requests (for n\u003e1).\n        parent_request = ParentRequest(request)\n        for idx in range(parent_params.n):\n            request_id, child_params = parent_request.get_child_info(idx)\n            child_request = request if idx == parent_params.n - 1 else copy(request)\n            child_request.request_id = request_id\n            child_request.sampling_params = child_params\n            await self._add_request(\n                child_request, prompt_text, parent_request, idx, queue\n            )\n        return queue\n```\n   Because Python\u0027s `asyncio` runs on a single thread and event loop, this monolithic `for`-loop monopolizes the CPU thread. The server stops responding to all other connections (including liveness probes). Simultaneously, the memory allocator is overwhelmed by cloning millions of request object instances via `copy(request)`, driving the host\u0027s Resident Set Size (RSS) up by gigabytes per second until the OS `OOM-killer` terminates the vLLM process.\n\n### Impact\n**Vulnerability Type:** Resource Exhaustion / Denial of Service\n\n**Impacted Parties:**\n- Any individual or organization hosting a public-facing vLLM API server (`vllm.entrypoints.openai.api_server`), which happens to be the primary entrypoint for OpenAI-compatible setups.\n- SaaS / AI-as-a-Service platforms acting as reverse proxies sitting in front of vLLM without strict HTTP body payload validation or rate limitations.\n\nBecause this vulnerability exploits the control plane rather than the data plane, an unauthenticated remote attacker can achieve a high success rate in taking down production inference hosts with a single HTTP request. This effectively circumvents any hardware-level capacity planning and conventional bandwidth stress limitations.",
  "id": "GHSA-3mwp-wvh9-7528",
  "modified": "2026-04-06T23:40:16Z",
  "published": "2026-04-03T15:35:48Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-3mwp-wvh9-7528"
    },
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2026-34756"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/pull/37952"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/commit/b111f8a61f100fdca08706f41f29ef3548de7380"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/vllm-project/vllm"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
      "type": "CVSS_V3"
    }
  ],
  "summary": "vLLM: Unauthenticated OOM Denial of Service via Unbounded `n` Parameter in OpenAI API Server"
}

CVE-2026-34756 (GCVE-0-2026-34756)

Vulnerability from cvelistv5 – Published: 2026-04-06 15:40 – Updated: 2026-04-07 14:17

Title

vLLM Affected by Unauthenticated OOM Denial of Service via Unbounded `n` Parameter in OpenAI API Server

Summary

vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This completely blocks the Python asyncio event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue. This vulnerability is fixed in 0.19.0.

Severity

6.5 (Medium)


                        
                          CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

CWE

CWE-770 - Allocation of Resources Without Limits or Throttling

Assigner

GitHub_M

References

3 references

URL	Tags
https://github.com/vllm-project/vllm/security/adv…	x_refsource_CONFIRM
https://github.com/vllm-project/vllm/pull/37952	x_refsource_MISC
https://github.com/vllm-project/vllm/commit/b111f…	x_refsource_MISC

Impacted products

1 product

Vendor	Product	Version
vllm-project	vllm	Affected: >= 0.1.0, < 0.19.0

Show details on NVD website

JSON

To clipboard

{
  "containers": {
    "adp": [
      {
        "metrics": [
          {
            "other": {
              "content": {
                "id": "CVE-2026-34756",
                "options": [
                  {
                    "Exploitation": "none"
                  },
                  {
                    "Automatable": "no"
                  },
                  {
                    "Technical Impact": "partial"
                  }
                ],
                "role": "CISA Coordinator",
                "timestamp": "2026-04-07T14:16:25.517505Z",
                "version": "2.0.3"
              },
              "type": "ssvc"
            }
          }
        ],
        "providerMetadata": {
          "dateUpdated": "2026-04-07T14:17:12.597Z",
          "orgId": "134c704f-9b21-4f2e-91b3-4a467353bcc0",
          "shortName": "CISA-ADP"
        },
        "title": "CISA ADP Vulnrichment"
      }
    ],
    "cna": {
      "affected": [
        {
          "product": "vllm",
          "vendor": "vllm-project",
          "versions": [
            {
              "status": "affected",
              "version": "\u003e= 0.1.0, \u003c 0.19.0"
            }
          ]
        }
      ],
      "descriptions": [
        {
          "lang": "en",
          "value": "vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This completely blocks the Python asyncio event loop and causes immediate Out-Of-Memory crashes by allocating millions of request object copies in the heap before the request even reaches the scheduling queue. This vulnerability is fixed in 0.19.0."
        }
      ],
      "metrics": [
        {
          "cvssV3_1": {
            "attackComplexity": "LOW",
            "attackVector": "NETWORK",
            "availabilityImpact": "HIGH",
            "baseScore": 6.5,
            "baseSeverity": "MEDIUM",
            "confidentialityImpact": "NONE",
            "integrityImpact": "NONE",
            "privilegesRequired": "LOW",
            "scope": "UNCHANGED",
            "userInteraction": "NONE",
            "vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
            "version": "3.1"
          }
        }
      ],
      "problemTypes": [
        {
          "descriptions": [
            {
              "cweId": "CWE-770",
              "description": "CWE-770: Allocation of Resources Without Limits or Throttling",
              "lang": "en",
              "type": "CWE"
            }
          ]
        }
      ],
      "providerMetadata": {
        "dateUpdated": "2026-04-06T15:40:03.448Z",
        "orgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
        "shortName": "GitHub_M"
      },
      "references": [
        {
          "name": "https://github.com/vllm-project/vllm/security/advisories/GHSA-3mwp-wvh9-7528",
          "tags": [
            "x_refsource_CONFIRM"
          ],
          "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-3mwp-wvh9-7528"
        },
        {
          "name": "https://github.com/vllm-project/vllm/pull/37952",
          "tags": [
            "x_refsource_MISC"
          ],
          "url": "https://github.com/vllm-project/vllm/pull/37952"
        },
        {
          "name": "https://github.com/vllm-project/vllm/commit/b111f8a61f100fdca08706f41f29ef3548de7380",
          "tags": [
            "x_refsource_MISC"
          ],
          "url": "https://github.com/vllm-project/vllm/commit/b111f8a61f100fdca08706f41f29ef3548de7380"
        }
      ],
      "source": {
        "advisory": "GHSA-3mwp-wvh9-7528",
        "discovery": "UNKNOWN"
      },
      "title": "vLLM Affected by Unauthenticated OOM Denial of Service via Unbounded `n` Parameter in OpenAI API Server"
    }
  },
  "cveMetadata": {
    "assignerOrgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
    "assignerShortName": "GitHub_M",
    "cveId": "CVE-2026-34756",
    "datePublished": "2026-04-06T15:40:03.448Z",
    "dateReserved": "2026-03-30T19:17:10.225Z",
    "dateUpdated": "2026-04-07T14:17:12.597Z",
    "state": "PUBLISHED"
  },
  "dataType": "CVE_RECORD",
  "dataVersion": "5.2"
}

Sightings

Author	Source	Type	Date	Other

Nomenclature

Seen: The vulnerability was mentioned, discussed, or observed by the user.
Confirmed: The vulnerability has been validated from an analyst's perspective.
Published Proof of Concept: A public proof of concept is available for this vulnerability.
Exploited: The vulnerability was observed as exploited by the user who reported the sighting.
Patched: The vulnerability was observed as successfully patched by the user who reported the sighting.
Not exploited: The vulnerability was not observed as exploited by the user who reported the sighting.
Not confirmed: The user expressed doubt about the validity of the vulnerability.
Not patched: The vulnerability was not observed as successfully patched by the user who reported the sighting.

Detection rules are retrieved from Rulezet.

Action not permitted

GHSA-3MWP-WVH9-7528

Summary

Details

Impact

CVE-2026-34756 (GCVE-0-2026-34756)

Tags

Sightings

Nomenclature