Azure Synapse: Local Privilege Escalation Vulnerability in Spark

Published:

Sep 01, 2022

The story of a simple race condition leading to a Local Privilege Escalation, and how we discovered, in retrospect, that we crossed paths with another researcher and a previous Microsoft case.

The vulnerability

Azure Synapse Analytics is an analytics service that processes data using several types of runtimes: SQL pools, Apache Spark, Data Explorer, and various integration runtimes. Over the past year, the Orca Research Pod has disclosed a series of vulnerabilities and tenant separation issues in the Synapse service (some of our research is detailed in our SynLapse blog).

One way to process data is to set up a custom Apache Spark pool, which integrates with Python, .NET, Scala, and SQL notebooks, custom package management, and various development tools.

We ran a Python reverse shell in an Apache Spark pool. Rather than setting up our own Spark pool, we used a pool called “systemreservedpool-dataflow”, which is only used when debugging “Data flows” in Synapse.

Once we had a reverse shell over this reserved Spark pool, we started researching the VM to understand exactly what was going on. We found some interesting processes running, having to do with the pool autoscaling, resolving hosts, and performing batch jobs.

It is worth noting that we were running as “trusted-service-user”, but not as root. We couldn’t perform requests to the IMDS (169.254.169.254).

After running “sudo -l”, we came across one command that we, as “trusted-service-user”, could run as root:

trusted-service-user ALL= NOPASSWD: /usr/lib/notebookutils/bin/filesharemount.sh
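
When an audit turns up many such rules, it helps to pull out just the commands they allow. The snippet below is a minimal sketch of that step. The helper name extract_nopasswd_commands is our own, and the input is the rule quoted above rather than live “sudo -l” output, whose exact layout can differ.

```shell
#!/usr/bin/env bash
# extract_nopasswd_commands is a hypothetical helper: it reads sudoers-style
# lines on stdin and prints whatever follows "NOPASSWD:" on each matching line.
extract_nopasswd_commands() {
    sed -n 's/.*NOPASSWD:[[:space:]]*//p'
}

# Sample input: the rule quoted in the article, not live `sudo -l` output.
rule='trusted-service-user ALL= NOPASSWD: /usr/lib/notebookutils/bin/filesharemount.sh'

found=$(printf '%s\n' "$rule" | extract_nopasswd_commands)
echo "$found"
```

Every path printed this way is a script worth reading closely, since anything it does runs as root.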

The script is meant to let a low-privileged user (trusted-service-user) mount or unmount directories, because only root can actually use mount. The important part of the script looks like this:

mkdir -p "$mountPoint"

chown -R ${TRUSTED_SERVICE_USER}:${TRUSTED_SERVICE_USER} "$mountPoint"

# ….

mount -t cifs //"$account".file.core.windows.net/"$fileshare" "$mountPoint" -o vers=3.0,uid=$uid,gid=$gid,username="$account",password="$accountKey",serverino

It creates the mount point, which is supplied as a command-line argument, along with any missing parent directories, then makes us (trusted-service-user) its owner, and then mounts the share on it. The interesting thing here is that if we can get this script to accept any mount point we want, we can chown (change ownership of) any directory, including /usr/lib/notebookutils/bin/, which means we can edit filesharemount.sh itself and run anything as root.

Only one problem: it validates the mount point.

if [ -z ${mountPoint} ]; then
    >&2 echo "mount point must be provided."
    exit 1
else
    check_if_is_valid_mount_point_before_mount "$mountPoint"
fi

How?

check_if_is_valid_mount_point_before_mount() {
    mountPoint=${1}
 
    if [[ ! "$mountPoint" =~ ^/synfs/[^.]+ ]]; then
        >&2 echo "Only allow do mount operation under /synfs directory!"
        exit 1
    fi
 
    if [ -f "$mountPoint" ]; then
        >&2 echo "File path $mountPoint can't be used for mount."
        exit 1
    fi
 
    if [ -d "$mountPoint" ]; then
        # check folder if is empty
        if [ "$(ls -A $mountPoint)" ]; then
          >&2 echo "Can't mount to non-empty folder $mountPoint"
          exit 1
        fi
    fi
}

We reach the interesting chown only under three conditions:

  1. Mount point starts with /synfs.
  2. Mount point is not a path to an existing file.
  3. Mount point is either not an existing directory, or an empty directory.
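
The first condition is easy to try out in isolation. The snippet below is a small sketch that copies the regular expression from the script into a helper of our own (passes_prefix_check is a made-up name) and feeds it a few sample paths.

```shell
#!/usr/bin/env bash
# passes_prefix_check is a hypothetical helper that isolates the first test
# in check_if_is_valid_mount_point_before_mount. It returns 0 when the path
# matches the same regular expression the script uses.
passes_prefix_check() {
    [[ "$1" =~ ^/synfs/[^.]+ ]]
}

# Sample paths chosen for illustration only.
for candidate in /synfs/data /synfs/a.b /synfs/.hidden /synfs/ /etc/passwd; do
    if passes_prefix_check "$candidate"; then
        echo "accepted: $candidate"
    else
        echo "rejected: $candidate"
    fi
done
```

Because the pattern is not anchored at the end, it only requires at least one character other than a dot right after “/synfs/”. That is why a path like /synfs/a.b is accepted while /synfs/.hidden is not.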

Keen observers and researchers with some experience will immediately recognize the potential for directory traversal (/synfs/../what/ever/path/i/want), but this doesn’t work, because the mount point is obtained using this line:

mountPoint=$(readlink -m "$value")

We fully control $value.

Because readlink -m resolves the path, including any symlinks and “..” components, before the check runs, directory traversal should not be possible: the resolved path has to start with /synfs.
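
To see what that resolution does in practice, here is a small sketch that can be run on any machine with GNU coreutils. It works inside a throwaway temporary directory, and the “synfs” and “target” directory names are stand-ins rather than the real paths on the Spark VM.

```shell
#!/usr/bin/env bash
# Demonstrates how GNU `readlink -m` canonicalizes paths.
# Everything lives under a temporary directory; "synfs" is only a stand-in.

base=$(readlink -m "$(mktemp -d)")   # canonical form of the temp root
mkdir -p "$base/synfs" "$base/target"

# 1. A path that does not exist yet is returned unchanged.
missing=$(readlink -m "$base/synfs/newdir")

# 2. ".." components are collapsed, so a traversal attempt becomes visible.
traversal=$(readlink -m "$base/synfs/../target")

# 3. A symlink that already exists is followed to its destination.
ln -s "$base/target" "$base/synfs/link"
followed=$(readlink -m "$base/synfs/link")

printf '%s\n' "$missing" "$traversal" "$followed"
rm -rf "$base"
```

The second and third results both end up outside the stand-in synfs directory, so a prefix check on the resolved path would reject them. Only the nonexistent path comes back untouched, which is what matters for the rest of the story.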

After some quick thinking, Yanir Tsarimi, a teammate and fellow researcher, and I came up with an idea for a race.

Remember the order of actions:

    1. The path is resolved.
    2. The resolved path has to start with /synfs.
    3. The resolved path has to be either nonexistent or an empty directory.
    4. mkdir, and chown happen.

I want to get to stage 4 with the path pointing to “/usr/lib/notebookutils/bin/” so I can edit the bash script under it.

This means I have to interpose between stages 3 and 4.

The idea is as follows:

    0. I supply the mount point “/synfs/mysymlink”, under synfs, and currently nonexistent.

    1. The path is resolved to be the same, as it is nonexistent – “/synfs/mysymlink”.

    2. The resolved path has to start with /synfs – CHECK.

    3. The resolved path has to be either nonexistent or an empty directory – CHECK.

    3.5. The race: I create “/synfs/mysymlink” to be a symbolic link to “/usr/lib/notebookutils/bin”.

    4. mkdir, and chown happen to /usr/lib/notebookutils/bin.
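
The gap between the check and the use can be walked through safely and deterministically. The sketch below is not the exploit: it runs unprivileged inside a temporary directory, swaps the symlink in by hand instead of racing, and leaves out chown and mount entirely. The “synfs” and “protected” directories are stand-ins, and the validation is a simplified version of the script's checks.

```shell
#!/usr/bin/env bash
# Harmless, deterministic walk-through of the check-then-use gap.
# Everything happens inside a temp directory; "synfs" and "protected" are
# stand-ins, and the symlink is created by hand instead of by a real race.

root=$(readlink -m "$(mktemp -d)")
mkdir -p "$root/synfs" "$root/protected"
requested="$root/synfs/mysymlink"

# Steps 1-3: resolve and validate while the path does not exist yet.
resolved=$(readlink -m "$requested")
if [[ "$resolved" == "$root/synfs/"* && ! -e "$resolved" ]]; then
    check_result="passed"
else
    check_result="failed"
fi

# Step 3.5: the attacker's window. The validated name becomes a symlink.
ln -s "$root/protected" "$requested"

# Step 4: later operations reuse the already validated string.
mkdir -p "$resolved"                    # succeeds: the link points to a directory
touch "$resolved/marker"                # the file is created in the link target
landed_in=$(readlink -f "$resolved")    # where the validated path leads now

marker="missing"
if [ -e "$root/protected/marker" ]; then
    marker="in protected directory"
fi

echo "check: $check_result, path now leads to: $landed_in, marker: $marker"
rm -rf "$root"
```

The validated string never changes, but what it points to does. Any path-based operation that runs after the check is acting on a name the unprivileged user can still redirect.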

After writing two scripts, one running filesharemount.sh repeatedly and one creating and deleting a symbolic link under /synfs, I finally got this:

The first request to my server is from before exploiting the privilege escalation; the second one is from afterward.

I will not dive too deep into the findings on the VM, but I will mention that we were now able to access IMDS (the credentials themselves weren’t so interesting) and every file on that machine. Some files contained credentials, including a client certificate for a logging service used by Microsoft and a token (identified by my account) used for accessing internal Synapse Spark services and Web APIs, which didn’t lead to more than some interesting information disclosures.

The unplanned crossover

On June 13th, 2022, Tenable published a blog post about their findings in Azure Synapse.

To our surprise, one of their vulnerabilities was a local privilege escalation in the Apache Spark cluster feature inside Synapse. After reading their blog post, it became clear that we had found a bypass to the fix Microsoft deployed to mitigate the issue reported by Tenable, which they considered fully resolved.

It is worth mentioning that we dug deeper into the machines and how the Apache Spark cluster feature worked. This got us a few leads, and after we reported these to Microsoft, they followed up by categorizing this issue as an information disclosure in Synapse with a severity level of Important.

Disclosure timeline

This was a quick disclosure, and props to MSRC for that. It took about two weeks from initial contact to disabling the vulnerable feature.

    • June 1, 2022 – Orca discloses the vulnerability to MSRC.
    • June 2, 2022 – Microsoft’s first response.
    • June 18, 2022 – Microsoft disables the vulnerable API feature, deploying a hotfix.
    • July 21, 2022 – Microsoft fully deploys a patched version.

Tzah Pahima is a Cloud Security Researcher at Orca Security. Follow him on Twitter @TzahPahima