6 Cloud Security Risks Hiding Inside Your Cloud Estate
The Challenge with Traditional Network Security in the Public Cloud
If we consider what we’ve done from a security perspective in the past, it’s not that long of a list. First, we have workload agents. Agents are installed and maintained as another application that runs alongside your important workloads. They provide visibility into what’s running, flag rogue activity on the host, detect data that shouldn’t exist, and more.
The problem with agents is that you must install and maintain them just like any other important application in your estate. When you don’t, these workloads effectively become invisible and orphaned, which is a real challenge from a security perspective. Unsupervised workloads are a little bit like unsupervised children: they find ways to get in trouble.
Another approach that’s been used in the past has been the network scan. Network scanners probe resources externally looking for known configuration errors. And these two solutions have been the cornerstone of data center security for many years.
If you want to understand a physical machine’s security posture, for example, you are still necessarily using one or both of these technologies. And in fact, we still use them today, even in very modern public cloud environments. Where agents used to run on physical machines, they now run on virtual machines. And where network scanners used to be deployed into our physical networks, they now run as appliances in our virtual networks.
And then, in about the last five or six years, since adopting the public cloud, we’ve also added a third category of security applications: the CSPM, or Cloud Security Posture Manager. CSPMs monitor the cloud configuration itself: the roles those resources have, password policies for cloud accounts, that sort of thing.
The Cloud Security Paradigm: CSPM or CWPP
So just where does Orca fit in? We’ve already talked about the Gartner categorization of the CSPM, or cloud security posture manager. The CSPM reads the configuration of all the security controls and dials at your cloud of choice by API. It’s good at measuring a wide variety of the services you consume there, but at a relatively shallow depth. A typical CSPM test asks: is my S3 or other storage bucket open to the public, or is my EC2 security group permissive to the world?
There exists another category of tools, the CWPP, or cloud workload protection platform. Unlike the CSPM, the CWPP’s role is to understand which risks exist inside your workloads. Identifying operating system and software vulnerabilities is really CWPP priority number one. Most often, this is done by installing workload agents per asset or network scanners per network. The visibility here is all about hosts and not about the cloud control plane at all, which is the CSPM’s territory.
One of the largest disadvantages, though, and we’ll talk about this consistently throughout the rest of our presentation, is that per-asset integration with security agents is a real challenge in modern cloud environments. It also represents a real lack of context because that agent sits inside only that one particular workload. There’s a lack of understanding around what exists just outside that workload, like the roles that grant it permission or the security groups that control traffic in and out of the workload.
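As a rough illustration of the kind of broad-but-shallow test a CSPM runs, the sketch below evaluates the two checks just mentioned against configuration data. The dictionary layout and the function names are hypothetical, invented for this example; a real CSPM would read the equivalent settings from the cloud provider’s API.

```python
# Hypothetical, simplified configuration records. The field names are
# assumptions for illustration, not any cloud provider's real API schema.

def bucket_is_public(bucket: dict) -> bool:
    """Return True if the bucket's (simplified) settings allow public reads."""
    return bool(bucket.get("public_read", False))


def security_group_open_to_world(group: dict) -> bool:
    """Return True if any inbound rule allows traffic from any IPv4 address."""
    return any(rule.get("cidr") == "0.0.0.0/0" for rule in group.get("inbound", []))


buckets = [
    {"name": "logs", "public_read": False},
    {"name": "marketing-assets", "public_read": True},
]
groups = [
    {"name": "web", "inbound": [{"port": 443, "cidr": "0.0.0.0/0"}]},
    {"name": "db", "inbound": [{"port": 5432, "cidr": "10.0.0.0/16"}]},
]

public_buckets = [b["name"] for b in buckets if bucket_is_public(b)]
open_groups = [g["name"] for g in groups if security_group_open_to_world(g)]
print(public_buckets, open_groups)
```

Note that neither check looks inside a workload; that is the depth limitation described above.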
Orca Security: The Best of CSPM and CWPP
With Orca, we’re really combining the role of a CSPM and a CWPP and sharing the intelligence from one side to the other, truly the best of both worlds. Orca understands down to the data layer, whether that’s in your virtual machines, your containers, or your storage buckets. And so, when I look at the sorts of scenarios that Orca finds from a host-based risk perspective, at Orca, we really try to take our findings and drill down at least one level deeper: more than just an alert, but also a path forward in terms of resolution.
It’s great that the platform can tell me about these sorts of cloud security risks hiding in my environment; most can’t even get me that far. But what is the next question I might have? For example, if I find a rogue SSH key in a workload, the next thing I might want to know is, is there anything else in my environment that those keys provide access to? If I find malware, certainly in that case, I’d want to know, is the machine that I found it on accessible? Can I contain that damage? Is that resource even turned on?
That last question may seem a little bit silly when compared to traditional approaches to this problem, which rely on a resource being powered on. Orca can scan a workload while it’s off, no problem, thanks to this concept of SideScanning, something we’ll speak about in quite a bit of detail soon. And so, if that machine is powered off, for example, I can breathe a little bit easier knowing that I can go and quarantine that machine and deal with that malware problem. It’s not actively being pursued from the outside.
If I find cloud keys that a developer has embedded in a test script, for example, the next thing I might want to know is, are those keys for accounts that I control? And if so, for what users? Or is it something that someone spun up on a credit card to try and test something, for example?
I’ll need that information to go back and find the root cause and fix these problems. What if I misplaced PII, whether it’s in the form of email addresses, or social security numbers, or credit card numbers, anything like that?
The very next thing I might want to know is where I found that data. Is that repository or location public? Can it be viewed from the internet at large? Those are the questions I might have. And that’s what we try to focus on at Orca: detecting, absolutely, but then also asking, do I have the tools and the platform to understand where I am and ultimately solve the problem?
This notion of context then really is our best strategy to combat alert fatigue. And indeed, alert fatigue is our first cloud security risk. Workload coverage is number two. Vulnerabilities and malware are number three. Our fourth risk is misconfigurations, the category Gartner talks a lot about, saying that misconfigurations will be responsible for well over 95% of all the breach activity that happens on the public cloud. There’s a definite reason for that, and we’ll talk about it. Number five is lateral movement risk. And our last cloud security risk, number six, is authentication risk.
Cloud Security Risk #1: Alert Fatigue
We hear so much about alert fatigue, and it really boils down to two main problems. The first is that if we alert on every single software vulnerability in your cloud estate, in any sort of sizable environment you’re going to be generating tens of thousands of alerts. So there has to be a next-level filter, and probably several levels of filter, that separates the alerts that are important and actionable from those that really aren’t. It’s not that I don’t want to know about all those vulnerabilities, but which are the ones that I can and should do something about now, and which can wait?
As we said earlier, context is really the answer. What if these vulnerabilities have no fix, for example? What am I necessarily supposed to do with that information? Again, it’s information I want to be categorized and cataloged, but it’s not information that supersedes anything else that truly does need my immediate attention.
You’re never going to get to the bottom of that vulnerability list. Having something that can work outside of this traditional vulnerability or malware detection kind of silo and understand more context is really, really key here.
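To make the idea of a next-level filter concrete, here is a minimal sketch that splits findings into those worth acting on now and those that are simply cataloged. The `Finding` fields, the placeholder CVE identifiers, and the rule itself (a fix exists and the asset is exposed) are assumptions chosen for illustration, not a description of how any particular product prioritizes.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    cve_id: str             # placeholder identifiers, not real CVEs
    asset: str
    fix_available: bool     # is there a patch I could apply today?
    internet_exposed: bool  # can the vulnerable asset be reached from outside?


def triage(findings):
    """Split findings into 'act now' and 'catalog for later' lists.

    Assumed rule: act now only when a fix exists AND the asset is exposed.
    """
    act_now, catalog = [], []
    for finding in findings:
        if finding.fix_available and finding.internet_exposed:
            act_now.append(finding)
        else:
            catalog.append(finding)
    return act_now, catalog


findings = [
    Finding("CVE-EXAMPLE-0001", "web-1", fix_available=True, internet_exposed=True),
    Finding("CVE-EXAMPLE-0002", "web-1", fix_available=False, internet_exposed=True),
    Finding("CVE-EXAMPLE-0003", "batch-7", fix_available=True, internet_exposed=False),
]
act_now, catalog = triage(findings)
print([f.cve_id for f in act_now])
```

Nothing is discarded: the cataloged findings stay available, they just don’t compete for immediate attention.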
So, let’s say that somehow, we tackle this challenge of context and we find the risk strategy that works for us. The next problem is, does the alert have enough information in it for me to be able to solve the problem? Does the platform or feature that generates these alerts have this information in it, or do I have to navigate to the cloud console or even other third-party tools to address this? And I’ll give you a really good example.
We had a customer detect a new occurrence of malware that was not yet well understood. And so, throughout the process of better understanding the scenario, they utilized other features of the Orca platform to learn whether or not others were trying to brute force their way onto the host. So, there’s that capability within Orca.
And also, because Orca measures workload CPU usage and charts that out in a really easy to understand way, we could also tell that since that malware landed in that workload, the CPU had been busier from a computational perspective than before. We could see that trend very clearly. And so, it’s about having enough information, yes, in the alert, but then does the platform also have enough subsequent features and dashboards and visibility for me to put those pieces of the puzzle together and ultimately solve the problem?
IT security professionals also miss alerts because they find so many false positives. So, the trust in the accuracy wanes a little bit. And this happens when we risk alerts in a vacuum without an understanding of their environmental context. As we’ve spoken about before, many platforms generate alerts based entirely on the CVSS score for a vulnerability, which doesn’t make sense. And it adds to the problem of alert fatigue.
If a remotely exploitable vulnerability, for example, is found on an isolated workload with no inbound network access, should that be risked as critical and take my attention away from something else that is actionable? And that’s really the question.
We’ve talked a little bit about how Orca understands data on the host via this very powerful SideScanning approach that does not require an agent and in fact, requires nothing running in your cloud environment at all. That very same process also analyzes the host configuration, whether that host is a traditional instance or a containerized workload, and understands which services are running, what the local firewall configuration looks like, how applications are configured, and quite a bit more.
At the same time, Orca is reading metadata in a read-only fashion from your cloud provider via API that’s focused on the configuration of everything that’s just above that workload: concepts like security group coverage, virtual private network configuration, and identity and access roles and policies. Orca combines all of this information into what we call a context map. And Orca builds this context map by mashing up all that deep workload discovery detail with cloud context.
Orca will discover cloud assets, and in fact, Orca exposes this detail in the asset inventory of the service, which, if we have time, I’ll definitely focus on in the demo. It will then identify asset roles: which resources are configured to do what, and the permissions that they hold. Orca will identify connectivity: which networks are public-facing versus those that aren’t.
And that’s far beyond the very basic “Does my resource have a public IP address?” test. It’s all the way down to “Is my application configured to serve on a particular port?”, which VPCs have internet gateways configured, for example, and quite a bit more. And really, only then, once Orca has all of this context and information, does it risk all the alerts with that full contextual awareness.
So, let’s take an example and have a look at two workloads, Server 1 and Server 2, both running a web server that uses a vulnerable library, vulnerable to remote code execution risk in this case. And for the sake of example, let’s assume that the vulnerability is the very same vulnerability in both, so a context-less vulnerability scanner, like an agent, would simply report this vulnerability with its static CVSS score and both workloads would end up getting the very same score. After all, it’s the very same vulnerability, as we talked about.
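To show why a public IP address alone says little, the sketch below combines several layers of context before calling a service reachable. It is only an illustration: the field names are hypothetical, the addresses come from a documentation-only range, and the set of conditions is an assumption rather than the actual logic of any product.

```python
def is_internet_reachable(asset: dict, port: int) -> bool:
    """Treat a port as reachable only if every layer of context agrees.

    All field names are hypothetical stand-ins for cloud metadata
    (public IP, internet gateway, security group) and host-level
    detail (which ports the application actually listens on).
    """
    return bool(
        asset.get("public_ip") is not None
        and asset.get("vpc_has_internet_gateway", False)
        and port in asset.get("security_group_open_ports", set())
        and port in asset.get("listening_ports", set())
    )


api = {
    "public_ip": "192.0.2.10",          # documentation-only address range
    "vpc_has_internet_gateway": True,
    "security_group_open_ports": {443},
    "listening_ports": {443, 8080},
}
legacy = {
    "public_ip": "192.0.2.11",
    "vpc_has_internet_gateway": True,
    "security_group_open_ports": {22},  # 443 is not allowed in
    "listening_ports": {443},           # and nothing listens on 22
}

print(is_internet_reachable(api, 443), is_internet_reachable(legacy, 443))
```

Both assets would pass the basic public IP test, yet only one of them is actually serving traffic to the outside.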
Orca, on the other hand, deduces from the cloud configuration that the service on Server 1 is internet-facing, and therefore the risk level is an imminent compromise. Server 2 is not internet-facing and in fact, can’t be reached directly from anywhere but a particular host, and therefore the risk level is only hazardous or medium. But Orca doesn’t stop there. We also show you that the imminent compromise risk in Server 1 puts at risk databases that contain PII, as the vulnerable web server includes keys that facilitate lateral movement.
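The Server 1 and Server 2 comparison can be sketched in a few lines. The function below is a deliberately simplified assumption about how context might change a rating; the labels “imminent compromise” and “hazardous” come from the example above, while “informational”, the field names, and the sample score are invented for illustration.

```python
def contextual_risk(remotely_exploitable: bool, internet_facing: bool) -> str:
    """Rate the same vulnerability differently depending on exposure.

    Simplified, assumed rule set for illustration only.
    """
    if remotely_exploitable and internet_facing:
        return "imminent compromise"
    if remotely_exploitable:
        return "hazardous"
    return "informational"  # hypothetical lowest tier


# Same vulnerability and same static score on both servers (sample value).
servers = {
    "Server 1": {"static_score": 9.8, "internet_facing": True},
    "Server 2": {"static_score": 9.8, "internet_facing": False},
}

ratings = {
    name: contextual_risk(remotely_exploitable=True,
                          internet_facing=info["internet_facing"])
    for name, info in servers.items()
}
print(ratings)
```

A scanner that only reports the static score cannot tell these two servers apart; the exposure flag is what separates them.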
Cloud Security Risk #2: Workload Coverage
Our customers tell us that deploying agents universally is next to impossible, even in organizations with a dedicated function to do so. This is especially true for organizations that use the cloud across the entire organization. Each business unit has its own cloud and security maturity, and its own methods of managing configuration and ultimately risk. Getting workload agents installed across each org or discipline is problematic by design.
DevOps, or really any business unit that’s leveraging the cloud in some fashion, doesn’t want to be bogged down by security and risk. If they are, ultimately, they’re not hitting their goals and timelines. But of course, IT and security want to ensure every workload that debuts across every cloud is built with a base level of security hardening and best practices. So security tooling must be easy and hands-free, completely automatic, and transparent to the end users who are building and serving applications.
Cloud Security Risk #3: Software Vulnerabilities and Malware
We know that vulnerabilities, usually associated with poorly aging software that hasn’t been patched or updated in some time, are one of the primary attack vectors that those with bad intent leverage, which makes our continued vigilance so important.
If you’ve run any sort of vulnerability management program before, you’ll know the main problem with this strategy is the noise, so to speak. Thousands and thousands of vulnerabilities will exist in the average cloud deployment. So, do you want to know about every single one? How about those which are exposed to the outside? How about those that have fixes versus those that don’t? All of those have very different risk profiles.
Because Orca operates out of band, we can afford to spend a lot of resources, from a CPU and a disk I/O perspective, scanning your disks and looking for vulnerabilities and malware. Even advanced polymorphic malware that changes its characteristics each time it’s deployed and run is found by Orca, as we don’t have to worry about competing for the resources in your workload like agent deployments do.
Cloud Security Risk #4: Misconfiguration
As we employ more and more cloud services, we find that keeping the security controls associated with each one configured properly really becomes a challenge. Exceptions are made for access both at the network and permissions levels. And often, we fail to clean up those exceptions properly and so they persist in our network for way longer than we intend, and they create possibilities for those with bad intent.
We’ve gone from managing access on a pair or a handful of firewalls around our perimeter to literally hundreds and thousands of those controls in a micro-segmented environment. This affords us more capability from a security perspective, but it represents a significant management challenge. Do you have the visibility to understand when one of the hundreds of security controls that you utilize changes?
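One simple way to get that visibility is to compare snapshots of your rules over time. The sketch below assumes each rule is recorded as a hypothetical `(group, port, cidr)` tuple and reports what was added or removed between two snapshots; a real environment has many more rule attributes than this.

```python
def diff_rules(before: set, after: set) -> dict:
    """Report firewall rules added or removed between two snapshots.

    Each rule is a hypothetical (group_name, port, cidr) tuple.
    """
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


yesterday = {
    ("web", 443, "0.0.0.0/0"),
    ("db", 5432, "10.0.0.0/16"),
}
today = {
    ("web", 443, "0.0.0.0/0"),
    ("db", 5432, "0.0.0.0/0"),  # a temporary exception that opened the database up
}

drift = diff_rules(yesterday, today)
print(drift)
```

The database exception shows up as one rule removed and one added, which is exactly the kind of change that tends to persist longer than intended.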
Cloud Security Risk #5: Lateral or Adjacent Movement Risk
Lateral movement is the practice of attackers moving from resource to resource in search of their goal or the highest value data or system. They look for network information and credentials in a number of ways to help facilitate this movement.
And so, it’s critical for us not only to find those conventions that facilitate this lateral movement, like sloppy firewall controls, cloud credentials that provide access, or SSH keys that can be used to remote into systems that are closer to an attacker’s goal, but then to understand, with the help of context, just what and who they provide access to. Although micro-segmentation, as we’ve already talked about, has created more opportunities for tighter security controls, it also means we have a lot more firewalls to manage. And of course, that means the chances of us making mistakes in that management go way up.
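Answering the question of what a found key or credential ultimately provides access to is a graph reachability problem. The sketch below uses a breadth-first search over a hypothetical access graph, where an edge means that something found on one asset (an SSH key, a cloud credential, a permissive firewall rule) grants access to another. The asset names and edges are invented for illustration.

```python
from collections import deque


def reachable_from(access_graph: dict, start: str) -> set:
    """Return every asset an attacker could reach from `start`.

    `access_graph` maps an asset to the assets it can access directly.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in access_graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # report only what lies beyond the starting asset
    return seen


# Hypothetical environment: the web server holds a key for the bastion,
# and the bastion holds credentials for the PII database.
access_graph = {
    "web-server": ["bastion"],
    "bastion": ["pii-database"],
    "build-agent": [],
}

blast_radius = reachable_from(access_graph, "web-server")
print(sorted(blast_radius))
```

The web server never touches the database directly, but the search shows the database is still two hops away from a compromise of the web server.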
Cloud Security Risk #6: Authentication Risk
And then, lastly, risk number six is authentication risk. And this really has everything to do with being diligent on all things identity and access related. This means everything from crafting and enforcing proper password, access, and login policies, to ensuring our logging is complete and secure, to having a pre-determined policy for handling exceptions. We talked about how poor exception handling can drift those security controls very, very quickly. So, planning for those exceptions and how to handle them goes a long way.
At the host level, we have to ensure that user accounts are hardened as well: that no commonly used usernames and passwords are in use, that no old passwords that have been hacked and published are in use, and that no users external to our organization are configured to be able to access our resources on an ongoing basis.
Demo: See Orca Security in Action
With that, let’s pivot to an Orca demo and really try to tie some of these things we’ve talked about today together. Great. So welcome to your Orca Dashboard. The first thing we often talk about, because it’s in such stark contrast to agent-based, per-asset integration strategies, is that the installation of Orca is a one-and-done kind of activity. And not only that, it’s on rails, and if you’ve done it once, it probably takes about 90 seconds to do. The first time, it might take five minutes.
But it’s really all about logging into your cloud console of choice and following some really simple steps. And that’s regardless of whether it’s in AWS, Azure, or a GCP project. Really, what you’re doing is you’re defining a role that Orca can assume, and you’re assigning the privileges that that role has as part of that integration. And from that point on, Orca will see all of the workloads that you bring up, and it’ll regularly scan the ones that exist in your environment. And let’s take a look at the kind of findings that it produces.
We’ll navigate to the dashboard. And the dashboard is sort of your top five in each category. And it’s a really good way… I mean, typically, once you’ve used Orca for a little bit, you’ll have a lot fewer of the kinds of alerts that you see in our sample demo environment, which is obviously built to showcase many of these. And so, any occurrence of a new high-severity alert, which is indicated by this red line underneath each alert, would obviously be important and attention-worthy.
But you work your way through each of the major risk categories, just making sure that you’ve got all these handled, and they include malware, vulnerabilities, neglected assets, authentication, outdated resources (we talked about many of these in our time today), insecure configurations, data at risk, and lateral movement risk.
And so that’s your dashboard. Again, lots of stats in aggregate, really, really good for watching your security trend. But you’ll probably do most of your alert consumption in the alert interface.
You’ve got a lot of filters on the left-hand side in terms of being able to really focus on what’s important. If I wanted to see what’s already been compromised, I can do that. If I want to really just focus on that risky configuration that leads to a potential imminent compromise, which is kind of my favorite category to talk about in a demo perspective, I can do that.
And as you can see, many of the things we talked about show up here: users using weak host passwords, the occurrence of malware, resources that maybe aren’t public-facing, and, for example, an unpatched web service. This one is a really, really good example of an alert that is intelligent.
And you’ll notice when I click on it, Orca displays the attack vector, the path that somebody from the outside must take to get to this resource. You can see that this one is fronted by Cloudflare, so maybe there’s cached content being retrieved from there, and maybe not. It looks relatively directly accessible. That’s probably why this alert is scored the way it is.
But if we investigate it and go a little bit further, you can see that Orca has rolled up all of these CVE findings into this one single alert. We know that it’s this poorly aging version of Apache. If we simply update Apache, it will take care of all of these vulnerabilities. Context does mean what’s around the workload, but it also means context from a software perspective.
Orca is smart enough to know that, “Hey, we could raise all these alerts individually,” and there’s another portion of the product where you can absolutely see that information like that. Or we can raise this roll-up, this single aggregated alert that says, “Hey, if you update this version of Apache, you’re going to take care of all these CVE findings,” which is hugely powerful. And again, it’s how Orca stops the inundation of that security administrator with every single finding and really tries to be intelligent about this.
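The roll-up idea can be illustrated with a simple grouping step: findings that share the same asset, package, and fixed version collapse into one actionable alert. This is a sketch under assumptions, not Orca’s implementation; the field names, the placeholder CVE identifiers, and the version strings are all hypothetical.

```python
from collections import defaultdict


def roll_up(findings: list) -> list:
    """Group per-CVE findings into one alert per (asset, package, fix)."""
    grouped = defaultdict(list)
    for finding in findings:
        key = (finding["asset"], finding["package"], finding["fixed_in"])
        grouped[key].append(finding["cve"])
    return [
        {
            "asset": asset,
            "action": f"update {package} to {fixed_in}",
            "cves": sorted(cves),
        }
        for (asset, package, fixed_in), cves in grouped.items()
    ]


# Placeholder data: identifiers and versions are made up for the example.
findings = [
    {"asset": "web-1", "package": "apache", "fixed_in": "X.Y.Z", "cve": "CVE-EXAMPLE-0001"},
    {"asset": "web-1", "package": "apache", "fixed_in": "X.Y.Z", "cve": "CVE-EXAMPLE-0002"},
    {"asset": "web-1", "package": "apache", "fixed_in": "X.Y.Z", "cve": "CVE-EXAMPLE-0003"},
    {"asset": "web-1", "package": "openssl", "fixed_in": "A.B.C", "cve": "CVE-EXAMPLE-0004"},
]

alerts = roll_up(findings)
for alert in alerts:
    print(alert["action"], "resolves", len(alert["cves"]), "finding(s)")
```

Four findings become two alerts, each phrased as the single action that resolves everything underneath it.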
We talked a little bit about inventory. This is the portion of the product where you can come in. And if you’re multi-cloud, this makes a ton of sense. But even if you’re single cloud, this is the place where you can go and inventory sort of all of your network resources, right, your VPCs, what’s contained in them, which have internet gateways, which don’t, your route tables, really all that information, your storage images, right.
Part of that shifting left philosophy is, I don’t want to just scan my workloads in production, I want to scan them as I’m building them, before I ever deploy into production in the first place. So, we absolutely have support for that in the way of image template scanning (on Amazon, that template is called an AMI) or container registry scanning, as those containers sit in that registry. Pre-deployment, we can make sure they don’t have any of those vulnerabilities or risks hiding inside them.
And then there are many other sorts of conventions as well, including serverless. Orca has really started to apply all that same intelligence to serverless functions, looking for secrets inside those, for example. And we’ll have much more to come in that area of functionality as well.
Conclusion: Orca Detects Every Important Cloud Security Risk across Cloud Configurations and Workloads
I could speak forever about Orca, but I know our time is limited in our session today. Hopefully, that gives you kind of enough of a crash course around the kinds of things that Orca can find and how we could find them. We’ve talked a lot about how it’s a one-and-done installation method, no per-asset integration. And that’s a huge, huge change to really any other method of discovering host-based vulnerabilities and risks. So, for that, we’ll return back to our deck.
One of the things we’ve really, really tried to do at Orca is get as close to our customers as possible. We really don’t like designing any kind of feature or capability in a vacuum. And so, we’ve really dived deep with double-digit numbers of customers. Orca is about two years old, and so we’re really proud of the fact that we’ve got over 10 customer case studies at orca.security/reviews, with more coming all the time. It’s one of the things we really pride ourselves on. So please do look there for more case studies about how our customers find value.