It’s hard to ignore how GenAI has already become an integral part of our daily lives, with people using LLMs (Large Language Models) and other generative models to write text, generate code, create images and videos, and even compose music.
AI is being used to enhance processes in many different industries. In the medical field, AI helps with disease diagnosis and treatment plan personalization. In the financial industry, it aids risk management and fraud detection. Virtual assistants with AI capabilities, like Siri and Alexa, enhance user experiences by carrying out tasks and responding to inquiries. AI is used in manufacturing to forecast maintenance requirements and optimize production processes. All things considered, AI advances productivity, precision, and creativity across many domains.
The data collected from scans by the Orca Cloud Security Platform also shows increasing usage of AI services in the cloud. More than half (56%) of organizations already use AI as part of their business workflows, either in the development pipeline or for managing related tasks such as business data analysis.
However, this rapid adoption of new technology comes with new risks. Some risks are similar to the ones that already exist in the cloud, and others are specific to AI usage. In this blog, we’ll discuss the five most common risks to AI models and how to protect against them.
What are the building blocks of AI?
Building a generic AI model requires highly available infrastructure and scalable computational power, and writing a model can be tedious work. These are exactly the areas where the cloud shows its value. The ability to scale resources up and down during the learning process, store large datasets, and expose endpoints in multiple regions is the main reason why the connection between cloud and AI is inevitable. In short, the first building block of an AI model is the cloud.
More specifically, AI is built from the following three components:
- Infrastructure – the model, which is essentially an algorithm that defines the learning process for the problem we want to solve
- Content – a dataset that teaches the algorithm about the problem, called training data. For example, if we want to use the model to predict paper towel usage in our office, we will provide data about past usage: how many packages were used each month, how many in summer, how many when it was raining, how many people worked at the office during that time, and so on. The more details provided, and the longer the time period the data covers, the better the model’s predictions will be.
By keeping some of this data aside as validation data, we can use it to validate the model after the training process. This allows us to make sure it learned what we expected it to learn and helps us make tweaks and improvements (see the sketch after this list).
- Endpoint, APIs or user interface – in some cases, we will want to make the model accessible to users. Just as with any other algorithm behind an application, we can do this for AI models too.
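To make the training/validation split concrete, here is a minimal, purely illustrative sketch in Python. The paper towel dataset is synthetic, and the feature names, the 80/20 split, and the choice of a linear model are assumptions for the example rather than a recommendation:

```python
# Illustrative sketch only: a toy "paper towel usage" dataset split into
# training and validation sets. Feature meanings and the 80/20 ratio are
# assumptions for the example.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=42)

# Hypothetical monthly features: people in the office, rainy days, summer flag.
X = rng.integers(low=0, high=60, size=(36, 3)).astype(float)
# Hypothetical target: paper towel packages used that month.
y = 0.5 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 2, size=36)

# Keep 20% of the data aside as validation data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Validate: did the model learn the relationship, or just memorize the examples?
print(f"Training R^2:   {model.score(X_train, y_train):.2f}")
print(f"Validation R^2: {model.score(X_val, y_val):.2f}")
```

The gap between the training and validation scores is what tells you whether the model actually learned the problem or merely memorized its examples.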
Types of AI Security Risks
AI security risks can be found in each of these key components:
The AI infrastructure
The infrastructure of an AI model includes its code, the access and permissions to the code, and its pipelines.
Risk #1: Exposing code to the internet
One of the most common misconfigurations we see in scans by the Orca Platform is organizations leaving their model code open to the internet: we found that nearly all organizations (94%) expose some of their model code.
In addition to potentially leading to code and model theft, this could also have other unwanted consequences, such as the installation of a cryptominer. After all, if an organization is already running that much computational power, why wouldn’t an attacker exploit it for profit?
Active campaigns are already hunting for these exposed AI model notebooks. In the Qubitstrike campaign, for example, attackers searched for publicly exposed Jupyter notebooks (which are also commonly used for AI training) and deployed cryptominers on them.
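As a starting point for spotting this kind of exposure yourself, here is a hedged sketch that probes a host for an unauthenticated Jupyter server. The hostname is a placeholder, the /api/status endpoint is assumed to behave as in standard Jupyter Server deployments (it normally requires a token), and you should only probe systems you are authorized to test:

```python
# Hedged sketch: probe a (hypothetical) host to see whether a Jupyter server
# answers unauthenticated requests. A locked-down server typically returns
# 403 or redirects to /login; a 200 with JSON suggests it is wide open.
import requests

HOST = "https://notebooks.example.internal:8888"  # placeholder, not a real endpoint

def jupyter_is_exposed(base_url: str) -> bool:
    try:
        resp = requests.get(f"{base_url}/api/status", timeout=5, allow_redirects=False)
    except requests.RequestException:
        return False  # unreachable from here, so not exposed to us
    # 200 plus a JSON body without any token means anonymous callers can use the API.
    return resp.status_code == 200 and resp.headers.get("Content-Type", "").startswith("application/json")

if __name__ == "__main__":
    print("Exposed!" if jupyter_is_exposed(HOST) else "Not reachable or requires auth")
```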
Risk #2: Vulnerabilities in AI packages
Another risk to the AI infrastructure comes through imports and dependencies. Even popular AI packages such as PyTorch and TensorFlow can introduce vulnerabilities into your code and infrastructure.
In this example, a chain of vulnerabilities that could lead to RCE (remote code execution) was found in TorchServe, a tool that helps scale PyTorch models. By chaining an SSRF with a management interface API that was exposed by default, an attacker could gain unauthorized access and upload a malicious model, severely affecting the model’s users.
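One lightweight way to keep an eye on this is to compare installed AI packages against the minimum versions your security advisories call for. The sketch below is illustrative only: the package names and version numbers are placeholders that you would replace with the output of a real scanner, such as pip-audit or your vulnerability management tooling.

```python
# Illustrative sketch: flag installed AI-related packages that fall below a
# minimum-version list. The versions here are placeholders, not an
# authoritative advisory list.
from importlib.metadata import version, PackageNotFoundError
from packaging.version import Version

MINIMUM_SAFE = {
    "torch": "2.2.0",        # placeholder version
    "torchserve": "0.9.0",   # placeholder version
    "tensorflow": "2.15.0",  # placeholder version
}

def check_dependencies(minimums: dict[str, str]) -> None:
    for package, minimum in minimums.items():
        try:
            installed = Version(version(package))
        except PackageNotFoundError:
            continue  # package not installed, nothing to flag
        if installed < Version(minimum):
            print(f"[!] {package} {installed} is below the required version {minimum}")

if __name__ == "__main__":
    check_dependencies(MINIMUM_SAFE)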
The AI content
Data storage is native to the cloud. According to Thales research, 60% of the world’s corporate data is already stored in the cloud, and that share keeps growing every year. In a world of big data and fast decisions, it’s safe to assume that some of this data serves as training (or validation) data for AI models.
Risk #3: Sensitive data in AI models
In our 2024 State of the Cloud Security report, we found that 58% of organizations store sensitive data in the cloud. If that data is used in AI training or validation, it could be leaked through the model. In fact, this actually happened with ChatGPT.
ChatGPT was trained on large datasets of public text from the internet. Some of this text could be somewhat sensitive, so the model was also trained not to expose secrets and sensitive data. But it turns out that with the right prompting, you could talk the model into giving you that sensitive data anyway.
The well-known story goes like this: a user requested Windows activation keys from ChatGPT, and at first the model refused to provide them. Then the user changed their approach. Instead of asking for the activation keys directly, they told a touching story about their late grandmother reading them activation keys to fall asleep, and asked for some more. And… voilà! ChatGPT provided the activation keys.
Why did ChatGPT provide the keys? ChatGPT has constraints on providing secrets, but it doesn’t have constraints on acting like someone’s grandmother. So if that other persona tells secrets as part of who they are, the prohibition on secrets is bypassed. There are numerous other tricks and hacks like this for bypassing LLM guardrails.
Yes, these keys may just have been example keys, but the point stands: if your model was trained on this data, it will spill it out in one way or another. So it’s important to ensure that absolutely no sensitive data is fed to your models.
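A simple first line of defense is to scan training and validation text for secret-like strings before it ever reaches a model. The sketch below uses a few well-known patterns purely as an illustration; it is not a substitute for a dedicated secret scanner:

```python
# Minimal sketch: scan training text for strings that look like secrets.
# The regexes cover a few common key formats and are illustrative, not exhaustive.
import re

SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Generic API key assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token)\b\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{16,}"
    ),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def find_secrets(text: str) -> list[tuple[str, str]]:
    findings = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group(0)))
    return findings

if __name__ == "__main__":
    sample = "user asked about api_key = 'sk_live_1234567890abcdef1234'"
    for label, value in find_secrets(sample):
        print(f"[!] {label}: {value}")
```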
Risk #4: Data leakage and tampering
Other risks to the model’s data are exfiltration, manipulation, and tampering. Any change to the model’s data changes the model itself, which is known as data poisoning, so this is a very critical risk.
This risk is mainly introduced when the data is not properly secured or is misconfigured to be publicly accessible. Below we’ve listed the potential consequences:
- Model theft – model theft doesn’t end with the theft of an algorithm. When an organization uses a ready-made algorithm, the trained model parameters and the data are what actually gets stolen in a real model theft.
- Adding false examples or wrong data to the training/validation dataset – this can degrade the accuracy of the model.
- Changing the training-validation ratio – this can affect the entire learning process and cause overfitting (a situation where the model memorizes the examples instead of learning the underlying problem).
- Model data deletion or ransom – another form of disruption is deleting the data and demanding a ransom for its return. For organizations that depend heavily on their models, this affects both the integrity and availability of the model.
All of the above could occur just by leaving a database or storage bucket public facing rather than setting it to private. In our 2024 State of the Cloud Security report, we found that 25% of organizations expose at least one database to the internet, and 73% have at least one exposed bucket. If this data were used for model training, the model would be exposed to the risks mentioned above.
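For AWS environments, one quick way to catch this particular misconfiguration is to check each bucket’s public access block settings. The sketch below is a minimal example using boto3; bucket ACL and policy checks, database exposure checks, and equivalents for other cloud providers would be needed for fuller coverage:

```python
# Hedged sketch using boto3: flag S3 buckets that do not fully block public access.
# Which buckets matter and how strict to be depends on your environment.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def bucket_allows_public_access(bucket: str) -> bool:
    try:
        config = s3.get_public_access_block(Bucket=bucket)["PublicAccessBlockConfiguration"]
    except ClientError as error:
        if error.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
            return True  # no public access block configured at all
        raise
    # If any of the four settings is disabled, the bucket may be publicly reachable.
    return not all(config.values())

if __name__ == "__main__":
    for bucket in s3.list_buckets()["Buckets"]:
        if bucket_allows_public_access(bucket["Name"]):
            print(f"[!] {bucket['Name']} does not fully block public access")
```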
The AI Users and APIs
Another concerning issue with AI models is API keys to the model’s interface or infrastructure being left unguarded.
Risk #5: Exposed AI API keys
We found that 17% of organizations have an unencrypted API key in their code repositories that provides access to their AI services, and some of these repositories are public. This can allow unwanted access to the model and its code, and it is basically a disaster waiting to happen.
For example, until recently, OpenAI keys didn’t have privilege levels, which meant that every access key created was an admin key with permissions for all actions. Even now, the default permission is still set to “All”, meaning a new secret key is granted full permissions. This could lead to sensitive data theft, resource abuse, and even account takeover, which makes keeping these keys safe extremely important.
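The simplest mitigation is to keep the key out of the codebase entirely. The sketch below reads an OpenAI-style key from an environment variable and falls back to AWS Secrets Manager; the secret name is a placeholder, and the same pattern works with any secret manager:

```python
# Sketch of keeping an AI service key out of the codebase: read it from an
# environment variable, or fall back to a secrets manager at runtime.
# The secret name below is a placeholder.
import os
import boto3

def get_openai_api_key(secret_name: str = "prod/openai/api-key") -> str:
    # Prefer the environment (injected at deploy time, never committed).
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    # Otherwise fetch it from the secret manager at runtime.
    client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=secret_name)["SecretString"]

# Usage: pass the key to your client at runtime instead of hardcoding it, e.g.
# client = OpenAI(api_key=get_openai_api_key())
```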
And of course, last but not least: just like any other application, once an API endpoint for an AI model is exposed, it is open to API attacks such as SSRF (as shown in this example), SQL injection, and more.
How to protect against AI risks
By following the best practices listed below, organizations can protect against AI risks:
- Exposure management – make sure your data and models are not exposed to the internet and are not publicly editable.
- Secure your code – encrypt your model code.
- Scan for secrets – scan your training and validation data for secrets.
- API key security – keep your API keys safe, and use a secret manager to manage their usage.
- Scan your code – map your imports and dependencies, and make sure they are reliable and don’t contain severe vulnerabilities.
AI Security Posture Management
To help organizations ensure the security and integrity of their AI models, Orca provides AI Security Posture Management capabilities. Security starts with visibility. With Orca, organizations get a complete view of all AI models that are deployed in their environment—both managed and unmanaged, including any shadow AI.
In addition, Orca continuously monitors and assesses your AI models, alerting you to any risks, such as vulnerabilities, exposed data and keys, and overly permissive identities. Automated and guided remediation options are available to quickly fix any identified issues. If you would like to learn more about how to secure your AI models, schedule a 1:1 demo with one of our experts.
About the Orca Cloud Security Platform
Orca offers a unified and comprehensive cloud security platform that identifies, prioritizes, and remediates security risks and compliance issues across AWS, Azure, Google Cloud, Oracle Cloud, Alibaba Cloud, and Kubernetes. Using its patented SideScanning™ technology, Orca detects vulnerabilities, misconfigurations, malware, lateral movement, data risks, API risks, overly permissive identities, and much more – without requiring an agent.