GHSA-X6WF-F3PX-WCQX

Vulnerability from github – Published: 2026-04-22 20:17 – Updated: 2026-04-22 20:17
Summary
xmldom has XML node injection through unvalidated processing instruction serialization
Details

Summary

The package allows attacker-controlled processing instruction data to be serialized into XML without validating or neutralizing the PI-closing sequence ?>. As a result, an attacker can terminate the processing instruction early and inject arbitrary XML nodes into the serialized output.


Details

The issue is in the DOM construction and serialization flow for processing instruction nodes.

When createProcessingInstruction(target, data) is called, the supplied data string is stored directly on the node without validation. Later, when the document is serialized, the serializer writes PI nodes by concatenating <?, the target, a space, node.data, and ?> directly.

That behavior is unsafe because processing instructions are a syntax-sensitive context. The closing delimiter ?> terminates the PI. If attacker-controlled input contains ?>, the serializer does not preserve it as literal PI content. Instead, it emits output where the remainder of the payload is treated as live XML markup.
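
To make the failure mode concrete, the concatenation described above can be modeled in a few lines of plain JavaScript. This is a simplified stand-in and not the library's code: `serializePI` and `firstPIEnd` are hypothetical names used only for illustration.

```js
// Simplified model of the vulnerable serialization step (not xmldom's code).
function serializePI(target, data) {
  // Mirrors the concatenation: '<?' + target + ' ' + data + '?>'
  return '<?' + target + ' ' + data + '?>';
}

// An XML parser ends a processing instruction at the FIRST '?>' it encounters.
function firstPIEnd(xml) {
  return xml.indexOf('?>') + 2;
}

const out = serializePI('a', '?><z/><?q ');
const piAsParsed = out.slice(0, firstPIEnd(out));
const trailing = out.slice(firstPIEnd(out));

console.log(out);        // <?a ?><z/><?q ?>
console.log(piAsParsed); // <?a ?>
console.log(trailing);   // <z/><?q ?>
```

Everything in `trailing` sits outside the processing instruction the application meant to write, so a parser treats it as ordinary markup.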

The same class of vulnerability was previously addressed for CDATA sections (GHSA-wh4c-j3r5-mjhp / CVE-2026-34601), where ]]> in CDATA data was handled by splitting. The serializer applies no equivalent protection to processing instruction data.


Affected code

lib/dom.js — createProcessingInstruction (lines 2240–2246):

createProcessingInstruction: function (target, data) {
    var node = new ProcessingInstruction(PDC);
    node.ownerDocument = this;
    node.childNodes = new NodeList();
    node.nodeName = node.target = target;
    node.nodeValue = node.data = data;
    return node;
},

No validation is performed on data. Any string including ?> is stored as-is.

lib/dom.js — serializer PI case (line 2966):

case PROCESSING_INSTRUCTION_NODE:
    return buf.push('<?', node.target, ' ', node.data, '?>');

node.data is emitted verbatim. If it contains ?>, that sequence terminates the PI in the output stream and the remainder appears as active XML markup.

Contrast — CDATA (line 2945, patched):

case CDATA_SECTION_NODE:
    return buf.push(g.CDATA_START, node.data.replace(/]]>/g, ']]]]><![CDATA[>'), g.CDATA_END);
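
The CDATA fix works because a `]]>` inside the data can be split across two adjacent CDATA sections, which a parser then reads back as the original text. The sketch below applies the same `replace` call to a sample string and recovers the content with a simplified regular expression. The helper names are hypothetical, and the literal `<![CDATA[` and `]]>` delimiters are an assumption about what `g.CDATA_START` and `g.CDATA_END` hold.

```js
// Model of the patched CDATA serialization (delimiters assumed, not imported from xmldom).
const CDATA_START = '<![CDATA[';
const CDATA_END = ']]>';

function serializeCDATA(data) {
  // Same replacement as the patched serializer: split every ']]>' across two sections.
  return CDATA_START + data.replace(/]]>/g, ']]]]><![CDATA[>') + CDATA_END;
}

// Simplified reader: concatenate the contents of consecutive CDATA sections.
function readCDATA(xml) {
  const contents = [];
  for (const m of xml.matchAll(/<!\[CDATA\[([\s\S]*?)\]\]>/g)) {
    contents.push(m[1]);
  }
  return contents.join('');
}

const serialized = serializeCDATA('a]]>b');
console.log(serialized);            // <![CDATA[a]]]]><![CDATA[>b]]>
console.log(readCDATA(serialized)); // a]]>b
```

Processing instructions have no comparable trick: XML provides no way to represent `?>` inside PI data, so the only safe options are to reject the data or refuse to serialize it.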

PoC

Minimal (from @tlsbollei report, 2026-04-01)

const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom');

const doc = new DOMImplementation().createDocument(null, 'r', null);
doc.documentElement.appendChild(
    doc.createProcessingInstruction('a', '?><z/><?q ')
);
console.log(new XMLSerializer().serializeToString(doc));
// <r><?a ?><z/><?q ?></r>
//          ^^^^ injected <z/> element is active markup

With re-parse verification (from @tlsbollei report)

const assert = require('assert');
const { DOMParser, XMLSerializer } = require('@xmldom/xmldom');

const doc = new DOMParser().parseFromString('<r/>', 'application/xml');
doc.documentElement.appendChild(doc.createProcessingInstruction('a', '?><z/><?q '));
const xml = new XMLSerializer().serializeToString(doc);
assert.strictEqual(new DOMParser().parseFromString(xml, 'application/xml')
    .getElementsByTagName('z').length, 1); // passes — z is a real element

Impact

An application that uses the package to build XML from untrusted input can be made to emit attacker-controlled elements outside the intended PI boundary. That allows the attacker to alter the meaning and structure of generated XML documents.

In practice, this can affect any workflow that generates XML and then stores it, forwards it, signs it, or hands it to another parser. Realistic targets include XML-based configuration, policy documents, and message formats where downstream consumers trust the serialized structure.

As noted by @tlsbollei: this is the same delimiter-driven XML injection bug class previously addressed by GHSA-wh4c-j3r5-mjhp for createCDATASection(). Fixing CDATA while leaving PI creation and PI serialization unguarded leaves the same standards-constrained issue open for another node type.


Disclosure

This vulnerability was publicly disclosed at 2026-04-06T11:25:07Z via xmldom/xmldom#987, which was subsequently closed without being merged.


Fix Applied

⚠ Opt-in required. Protection is not automatic. Existing serialization calls remain vulnerable unless { requireWellFormed: true } is explicitly passed. Applications that pass untrusted data to createProcessingInstruction() or mutate PI nodes with untrusted input (via .data = or CharacterData mutation methods) should audit all serializeToString() call sites and add the option.

XMLSerializer.serializeToString() now accepts an options object as a second argument. When { requireWellFormed: true } is passed, the serializer throws InvalidStateError before emitting any ProcessingInstruction node whose .data contains ?>. This check applies regardless of how ?> entered the node — whether via createProcessingInstruction directly or a subsequent mutation (.data =, CharacterData methods).
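
One way to make the opt-in hard to forget is to route all serialization through a single helper that always passes the option. The sketch below is an assumption about how an application might do that. `serializeStrict` is a hypothetical name, and the serializer is passed in as a parameter so the helper works with any object exposing `serializeToString(node, options)`.

```js
// Hypothetical application-level helper: always enable the well-formedness guard.
function serializeStrict(serializer, node) {
  return serializer.serializeToString(node, { requireWellFormed: true });
}

// Usage with the patched package would look like this (not executed here):
//   const { XMLSerializer } = require('@xmldom/xmldom');
//   const xml = serializeStrict(new XMLSerializer(), doc);

// Stand-in serializer, for demonstration only: it records the options it received.
const received = [];
const fakeSerializer = {
  serializeToString(node, options) {
    received.push(options);
    return '<r/>';
  },
};

console.log(serializeStrict(fakeSerializer, {})); // <r/>
console.log(received[0]);                          // { requireWellFormed: true }
```

Auditing then reduces to finding direct `serializeToString()` calls that bypass the helper.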

On @xmldom/xmldom ≥ 0.9.10, the serializer additionally applies the full W3C DOM Parsing §3.2.1.7 checks when requireWellFormed: true:

  1. Target check: throws InvalidStateError if the PI target contains a : character or is an ASCII case-insensitive match for "xml".
  2. Data Char check: throws InvalidStateError if the PI data contains characters outside the XML Char production.
  3. Data sequence check: throws InvalidStateError if the PI data contains ?>.

On the 0.8.x LTS line (@xmldom/xmldom ≥ 0.8.13 and < 0.9.0), only the ?> data check (check 3) is applied. The target and XML Char checks are not included in the LTS fix.
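
The three checks can be approximated in application code, for example to reject bad input before it reaches the DOM or to cover the target and Char checks on the 0.8.x LTS line. The following is a sketch written from the descriptions above and is not the library's implementation. `checkProcessingInstruction` and `invalidState` are hypothetical helpers, and the sketch throws a plain `Error` with `name` set to `InvalidStateError` rather than a real `DOMException`.

```js
// XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
const XML_CHARS_ONLY = /^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]*$/u;

function invalidState(message) {
  const err = new Error(message);
  err.name = 'InvalidStateError';
  return err;
}

// Hypothetical pre-serialization check modeled on the three rules listed above.
function checkProcessingInstruction(target, data) {
  // 1. Target check: no ':' and not "xml" in any ASCII letter case.
  if (target.includes(':') || /^[xX][mM][lL]$/.test(target)) {
    throw invalidState('invalid PI target');
  }
  // 2. Data Char check: every code point must match the XML Char production.
  if (!XML_CHARS_ONLY.test(data)) {
    throw invalidState('PI data contains a character outside XML Char');
  }
  // 3. Data sequence check: the PI terminator must not appear in the data.
  if (data.includes('?>')) {
    throw invalidState('PI data contains "?>"');
  }
}

// A well-formed PI passes silently.
checkProcessingInstruction('xml-stylesheet', 'href="style.xsl" type="text/xsl"');

// Each of these violates one rule and is rejected.
for (const [target, data] of [['a:b', 'x'], ['XmL', 'x'], ['a', 'x\u0000y'], ['a', '?><z/><?q ']]) {
  try {
    checkProcessingInstruction(target, data);
    console.log('accepted', target);
  } catch (e) {
    console.log(e.name, '-', e.message);
  }
}
```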

PoC — fixed path

const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom');

const doc = new DOMImplementation().createDocument(null, 'r', null);
doc.documentElement.appendChild(doc.createProcessingInstruction('a', '?><z/><?q '));

// Default (unchanged): verbatim — injection present
const unsafe = new XMLSerializer().serializeToString(doc);
console.log(unsafe);
// <r><?a ?><z/><?q ?></r>

// Opt-in guard: throws InvalidStateError before serializing
try {
  new XMLSerializer().serializeToString(doc, { requireWellFormed: true });
} catch (e) {
  console.log(e.name, e.message);
  // InvalidStateError: The ProcessingInstruction data contains "?>"
}

The guard catches ?> regardless of when it was introduced:

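// Post-creation mutation: also caught at serialization time.
// Uses a fresh document so the mutated PI is the only offending node
// (DOMImplementation and XMLSerializer are imported as in the previous snippet).
const doc2 = new DOMImplementation().createDocument(null, 'r', null);
const pi = doc2.createProcessingInstruction('target', 'safe data');
doc2.documentElement.appendChild(pi);
pi.data = 'safe?><injected/>';
new XMLSerializer().serializeToString(doc2, { requireWellFormed: true });
// InvalidStateError: The ProcessingInstruction data contains "?>"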

Why the default stays verbatim

The W3C DOM Parsing and Serialization spec §3.2.1.3 defines a require well-formed flag whose default value is false. With the flag unset, the spec explicitly permits serializing PI data verbatim. This matches browser behavior: Chrome, Firefox, and Safari all emit ?> in PI data verbatim by default without error.

Unconditionally throwing would be a behavioral breaking change with no spec justification. The opt-in requireWellFormed: true flag allows applications that require injection safety to enable strict mode without breaking existing code.

Residual limitation

createProcessingInstruction(target, data) does not validate data at creation time. The WHATWG DOM spec (§4.5 step 2) mandates an InvalidCharacterError when data contains ?>; enforcing this check unconditionally at creation time is a breaking change and is deferred to a future breaking release.

When the default serialization path is used (without requireWellFormed: true), PI data containing ?> is still emitted verbatim. Applications that do not pass requireWellFormed: true remain exposed.
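
Until creation-time validation lands, an application can add its own check in front of `createProcessingInstruction`. The sketch below is one possible shape for such a guard and is not part of the package. `createCheckedPI` is a hypothetical name, and the document is passed in as a parameter.

```js
// Hypothetical guard: refuse '?>' before the node is ever created.
function createCheckedPI(doc, target, data) {
  if (String(data).includes('?>')) {
    throw new Error('processing instruction data must not contain "?>"');
  }
  return doc.createProcessingInstruction(target, data);
}

// Stand-in document for demonstration; with xmldom, pass a real Document instead.
const fakeDoc = {
  createProcessingInstruction(target, data) {
    return { target, data };
  },
};

console.log(createCheckedPI(fakeDoc, 'a', 'safe data')); // { target: 'a', data: 'safe data' }

try {
  createCheckedPI(fakeDoc, 'a', '?><z/><?q ');
} catch (e) {
  console.log(e.message); // processing instruction data must not contain "?>"
}
```

A creation-time guard does not cover later mutation through `.data =` or CharacterData methods, so it complements `requireWellFormed: true` at serialization time and does not replace it.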


{
  "affected": [
    {
      "package": {
        "ecosystem": "npm",
        "name": "@xmldom/xmldom"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "0.8.13"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "npm",
        "name": "@xmldom/xmldom"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0.9.0"
            },
            {
              "fixed": "0.9.10"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "npm",
        "name": "xmldom"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "last_affected": "0.6.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2026-41675"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-91"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2026-04-22T20:17:58Z",
    "nvd_published_at": null,
    "severity": "HIGH"
  },
  "details": "## Summary\n\nThe package allows attacker-controlled processing instruction data to be serialized into XML without validating or neutralizing the PI-closing sequence `?\u003e`. As a result, an attacker can terminate the processing instruction early and inject arbitrary XML nodes into the serialized output.\n\n---\n\n## Details\n\nThe issue is in the DOM construction and serialization flow for processing instruction nodes.\n\nWhen `createProcessingInstruction(target, data)` is called, the supplied `data` string is stored directly on the node without validation. Later, when the document is serialized, the serializer writes PI nodes by concatenating `\u003c?`, the target, a space, `node.data`, and `?\u003e` directly.\n\nThat behavior is unsafe because processing instructions are a syntax-sensitive context. The closing delimiter `?\u003e` terminates the PI. If attacker-controlled input contains `?\u003e`, the serializer does not preserve it as literal PI content. Instead, it emits output where the remainder of the payload is treated as live XML markup.\n\nThe same class of vulnerability was previously addressed for CDATA sections (GHSA-wh4c-j3r5-mjhp / CVE-2026-34601), where `]]\u003e` in CDATA data was handled by splitting. The serializer applies no equivalent protection to processing instruction data.\n\n---\n\n## Affected code\n\n**`lib/dom.js` \u2014 `createProcessingInstruction` (lines 2240\u20132246):**\n\n```js\ncreateProcessingInstruction: function (target, data) {\n    var node = new ProcessingInstruction(PDC);\n    node.ownerDocument = this;\n    node.childNodes = new NodeList();\n    node.nodeName = node.target = target;\n    node.nodeValue = node.data = data;\n    return node;\n},\n```\n\nNo validation is performed on `data`. 
Any string including `?\u003e` is stored as-is.\n\n**`lib/dom.js` \u2014 serializer PI case (line 2966):**\n\n```js\ncase PROCESSING_INSTRUCTION_NODE:\n    return buf.push(\u0027\u003c?\u0027, node.target, \u0027 \u0027, node.data, \u0027?\u003e\u0027);\n```\n\n`node.data` is emitted verbatim. If it contains `?\u003e`, that sequence terminates the PI in the output\nstream and the remainder appears as active XML markup.\n\n**Contrast \u2014 CDATA (line 2945, patched):**\n\n```js\ncase CDATA_SECTION_NODE:\n    return buf.push(g.CDATA_START, node.data.replace(/]]\u003e/g, \u0027]]]]\u003e\u003c![CDATA[\u003e\u0027), g.CDATA_END);\n```\n\n---\n\n## PoC\n\n### Minimal (from @tlsbollei report, 2026-04-01)\n\n```js\nconst { DOMImplementation, XMLSerializer } = require(\u0027@xmldom/xmldom\u0027);\n\nconst doc = new DOMImplementation().createDocument(null, \u0027r\u0027, null);\ndoc.documentElement.appendChild(\n    doc.createProcessingInstruction(\u0027a\u0027, \u0027?\u003e\u003cz/\u003e\u003c?q \u0027)\n);\nconsole.log(new XMLSerializer().serializeToString(doc));\n// \u003cr\u003e\u003c?a ?\u003e\u003cz/\u003e\u003c?q ?\u003e\u003c/r\u003e\n//          ^^^^ injected \u003cz/\u003e element is active markup\n```\n\n### With re-parse verification (from @tlsbollei report)\n\n```js\nconst assert = require(\u0027assert\u0027);\nconst { DOMParser, XMLSerializer } = require(\u0027@xmldom/xmldom\u0027);\n\nconst doc = new DOMParser().parseFromString(\u0027\u003cr/\u003e\u0027, \u0027application/xml\u0027);\ndoc.documentElement.appendChild(doc.createProcessingInstruction(\u0027a\u0027, \u0027?\u003e\u003cz/\u003e\u003c?q \u0027));\nconst xml = new XMLSerializer().serializeToString(doc);\nassert.strictEqual(new DOMParser().parseFromString(xml, \u0027application/xml\u0027)\n    .getElementsByTagName(\u0027z\u0027).length, 1); // passes \u2014 z is a real element\n```\n\n---\n\n## Impact\n\nAn application that uses the package to build XML from untrusted input can be made to emit 
attacker-controlled elements outside the intended PI boundary. That allows the attacker to alter the meaning and structure of generated XML documents.\n\nIn practice, this can affect any workflow that generates XML and then stores it, forwards it, signs it, or hands it to another parser. Realistic targets include XML-based configuration, policy documents, and message formats where downstream consumers trust the serialized structure.\n\nAs noted by @tlsbollei: this is the same delimiter-driven XML injection bug class previously addressed by GHSA-wh4c-j3r5-mjhp for `createCDATASection()`. Fixing CDATA while leaving PI creation and PI serialization unguarded leaves the same standards-constrained issue open for another node type.\n\n---\n\n## Disclosure\n\nThis vulnerability was publicly disclosed at 2026-04-06T11:25:07Z via\n[xmldom/xmldom#987](https://github.com/xmldom/xmldom/pull/987), which was subsequently closed\nwithout being merged.\n\n---\n\n## Fix Applied\n\n\u003e **\u26a0 Opt-in required.** Protection is not automatic. Existing serialization calls remain\n\u003e vulnerable unless `{ requireWellFormed: true }` is explicitly passed. Applications that pass\n\u003e untrusted data to `createProcessingInstruction()` or mutate PI nodes with untrusted input\n\u003e (via `.data =` or `CharacterData` mutation methods) should audit all `serializeToString()`\n\u003e call sites and add the option.\n\n`XMLSerializer.serializeToString()` now accepts an options object as a second argument. When `{ requireWellFormed: true }` is passed, the serializer throws `InvalidStateError` before emitting any ProcessingInstruction node whose `.data` contains `?\u003e`. 
This check applies regardless of how `?\u003e` entered the node \u2014 whether via `createProcessingInstruction` directly or a subsequent mutation (`.data =`, `CharacterData` methods).\n\nOn `@xmldom/xmldom` \u2265 0.9.10, the serializer additionally applies the full W3C DOM Parsing \u00a73.2.1.7 checks when `requireWellFormed: true`:\n\n1. **Target check**: throws `InvalidStateError` if the PI target contains a `:` character or is an ASCII case-insensitive match for `\"xml\"`.\n2. **Data Char check**: throws `InvalidStateError` if the PI data contains characters outside the XML Char production.\n3. **Data sequence check**: throws `InvalidStateError` if the PI data contains `?\u003e`.\n\nOn `@xmldom/xmldom` \u2265 0.8.13 (LTS), only the `?\u003e` data check (check 3) is applied. The target and XML Char checks are not included in the LTS fix.\n\n### PoC \u2014 fixed path\n\n```js\nconst { DOMImplementation, XMLSerializer } = require(\u0027@xmldom/xmldom\u0027);\n\nconst doc = new DOMImplementation().createDocument(null, \u0027r\u0027, null);\ndoc.documentElement.appendChild(doc.createProcessingInstruction(\u0027a\u0027, \u0027?\u003e\u003cz/\u003e\u003c?q \u0027));\n\n// Default (unchanged): verbatim \u2014 injection present\nconst unsafe = new XMLSerializer().serializeToString(doc);\nconsole.log(unsafe);\n// \u003cr\u003e\u003c?a ?\u003e\u003cz/\u003e\u003c?q ?\u003e\u003c/r\u003e\n\n// Opt-in guard: throws InvalidStateError before serializing\ntry {\n  new XMLSerializer().serializeToString(doc, { requireWellFormed: true });\n} catch (e) {\n  console.log(e.name, e.message);\n  // InvalidStateError: The ProcessingInstruction data contains \"?\u003e\"\n}\n```\n\nThe guard catches `?\u003e` regardless of when it was introduced:\n\n```js\n// Post-creation mutation: also caught at serialization time\nconst pi = doc.createProcessingInstruction(\u0027target\u0027, \u0027safe data\u0027);\ndoc.documentElement.appendChild(pi);\npi.data = 
\u0027safe?\u003e\u003cinjected/\u003e\u0027;\nnew XMLSerializer().serializeToString(doc, { requireWellFormed: true });\n// InvalidStateError: The ProcessingInstruction data contains \"?\u003e\"\n```\n\n### Why the default stays verbatim\n\nThe W3C DOM Parsing and Serialization spec \u00a73.2.1.3 defines a `require well-formed` flag whose **default value is `false`**. With the flag unset, the spec explicitly permits serializing PI data verbatim. This matches browser behavior: Chrome, Firefox, and Safari all emit `?\u003e` in PI data verbatim by default without error.\n\nUnconditionally throwing would be a behavioral breaking change with no spec justification. The opt-in `requireWellFormed: true` flag allows applications that require injection safety to enable strict mode without breaking existing code.\n\n### Residual limitation\n\n`createProcessingInstruction(target, data)` does not validate `data` at creation time. The WHATWG DOM spec (\u00a74.5 step 2) mandates an `InvalidCharacterError` when `data` contains `?\u003e`; enforcing this check unconditionally at creation time is a breaking change and is deferred to a future breaking release.\n\nWhen the default serialization path is used (without `requireWellFormed: true`), PI data containing `?\u003e` is still emitted verbatim. Applications that do not pass `requireWellFormed: true` remain exposed.",
  "id": "GHSA-x6wf-f3px-wcqx",
  "modified": "2026-04-22T20:17:58Z",
  "published": "2026-04-22T20:17:58Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/xmldom/xmldom/security/advisories/GHSA-x6wf-f3px-wcqx"
    },
    {
      "type": "WEB",
      "url": "https://github.com/xmldom/xmldom/commit/7207a4b0e0bcc228868075ed991665ef9f73b1c2"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/xmldom/xmldom"
    },
    {
      "type": "WEB",
      "url": "https://github.com/xmldom/xmldom/releases/tag/0.8.13"
    },
    {
      "type": "WEB",
      "url": "https://github.com/xmldom/xmldom/releases/tag/0.9.10"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:H/VA:N/SC:N/SI:N/SA:N",
      "type": "CVSS_V4"
    }
  ],
  "summary": "xmldom has XML node injection through unvalidated processing instruction serialization"
}
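
The `affected` ranges in the record above can be turned into a quick version check. The sketch below is an illustration that hard-codes the three ranges from this record and assumes plain `major.minor.patch` version strings without pre-release tags. `isAffected` and `compareVersions` are hypothetical helpers, not part of any advisory tooling.

```js
// Compare two plain x.y.z versions numerically; returns <0, 0, or >0.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Ranges copied from the record: "fixed" is exclusive, "last_affected" is inclusive.
const RANGES = {
  '@xmldom/xmldom': [
    { introduced: '0', fixed: '0.8.13' },
    { introduced: '0.9.0', fixed: '0.9.10' },
  ],
  xmldom: [{ introduced: '0', lastAffected: '0.6.0' }],
};

function isAffected(name, version) {
  return (RANGES[name] || []).some((r) => {
    if (compareVersions(version, r.introduced) < 0) return false;
    if (r.fixed !== undefined) return compareVersions(version, r.fixed) < 0;
    return compareVersions(version, r.lastAffected) <= 0;
  });
}

console.log(isAffected('@xmldom/xmldom', '0.8.12')); // true
console.log(isAffected('@xmldom/xmldom', '0.8.13')); // false
console.log(isAffected('@xmldom/xmldom', '0.9.9'));  // true
console.log(isAffected('xmldom', '0.6.0'));          // true
```

Being on a fixed version only makes the guard available. As the advisory states, protection still requires passing `{ requireWellFormed: true }` to `serializeToString()`.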

