Infrastructure as Code (IaC) means defining how servers and other infrastructure components are built by writing a description of them instead of configuring them interactively, step by step. It is practically a pillar of DevOps and cloud native approaches. However, while IaC can make deploying to the cloud easier, cheaper, and faster, it can propagate security problems just as easily if you’re not careful.

Cattle Not Pets – What is IaC?

You’ll often hear the phrase “cattle not pets” used to describe IaC and, I think, it’s an evocative description. The idea is that we’ve traditionally treated computers as pets; that is, we’ve had to take the time to build each one manually, installing and configuring what’s needed – an _imperative_ approach, if you will. Even if we have a checklist of what we want to do, we still have to go step-by-step and, if we’re configuring multiple hosts, we may very well end up with small differences from host to host. And it’s not just hosts, of course. We may also be building other parts of the infrastructure, such as networks and storage, by hand.

This is, obviously, not an approach that works well if we want to scale to hundreds or thousands of hosts and do so repeatedly. Thus, we come to the notion of treating these resources as cattle: creating them, identically for our purposes, and being able to do so repeatedly and dependably. Instead of building things step-by-step, we want to create a description of what the object should look like and have automation create and manage it for us. This is a _declarative_ approach. We write a declaration of what the host, the network, or another piece of infrastructure should look like, and our systems ensure that every instance of that object follows that declaration.
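
To make the declarative idea concrete, here is a minimal sketch in Python. It is not how any particular IaC tool works internally; `HostSpec` and `reconcile` are hypothetical names I made up for illustration. The point is only that we describe the desired state and let automation work out the steps needed to get there.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    """Hypothetical declaration of what a host should look like."""
    image: str
    packages: frozenset


def reconcile(desired: HostSpec, actual: HostSpec) -> list[str]:
    """Return the actions needed to make `actual` match `desired`."""
    actions = []
    if actual.image != desired.image:
        actions.append(f"rebuild from {desired.image}")
    # Anything declared but not present gets installed.
    for pkg in sorted(desired.packages - actual.packages):
        actions.append(f"install {pkg}")
    # Anything present but not declared gets removed.
    for pkg in sorted(actual.packages - desired.packages):
        actions.append(f"remove {pkg}")
    return actions


desired = HostSpec(image="ubuntu-22.04", packages=frozenset({"nginx", "openssl"}))
drifted = HostSpec(image="ubuntu-22.04", packages=frozenset({"nginx", "telnetd"}))

plan = reconcile(desired, drifted)
print(plan)  # ['install openssl', 'remove telnetd']
```

A host that already matches the declaration produces an empty plan, which is what lets the same declaration be applied over and over without side effects.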

There are many forms of these declarations. For example, I may want to define how my cloud services are built and deployed: Terraform, CloudFormation, Azure Resource Manager, and Google Cloud Deployment Manager templates all accomplish this goal. I may also want to define how other infrastructure is built and configured (Ansible), how workloads are deployed (Kubernetes YAML manifests and Docker Compose files), and how container images are built (Dockerfiles).

Keeping Our Cattle Safe

As I mentioned earlier, while declarative IaC is a very useful approach to scaling out our infrastructure and applications, it can also propagate security mistakes as easily as it can propagate anything else. What might’ve been a mistake I made a single time when I was building VMs by hand could now affect thousands of VMs or tens of thousands of containers. The impact might be even larger than that, given that I tend to copy something that worked once, whether it’s a Terraform template or a Kubernetes YAML manifest, and use it over and over.

Of course, with a platform like Orca Security, we can easily find these problems after they’re deployed to the cloud. We can open a Jira ticket, for example, to notify the right people and follow our existing process for fixing the misconfigurations. If we’ve managed to deploy a lot of workloads with the misconfiguration, though, this is rather inefficient. We can definitely do better.

How? By scanning our IaC artifacts as early as the moment they’re checked into source code repositories, so that we know about misconfigurations long before the artifacts are ever used to spin up resources.

Centralizing IaC Security Policies  

We’ll start by defining policies for the problems we’re concerned about in our artifacts. In this example, I have a Dockerfile that I use to build a demo app, so it fits into my “Demo Apps” policy. I’ve filtered down to just the controls that apply to Dockerfiles and, from here, we can change which controls apply in our policy. For example, while it’s considered best practice to use the Dockerfile instruction ADD to add files to an image instead of using cURL to get them, I might accept the risk of using cURL in my demo apps and, thus, not apply that particular control. In this way, security teams can centrally manage the policies being applied to our artifacts, ensuring that what is highlighted matches the desired outcomes.
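
Conceptually, a policy like this is a catalog of controls plus a list of the ones I’ve chosen not to apply. The sketch below is only an illustration of that idea: the `Control` and `Policy` classes and the control IDs are hypothetical and do not reflect Orca’s actual data model or control names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Control:
    control_id: str      # hypothetical identifier, not a real control name
    artifact_type: str   # e.g. "dockerfile" or "terraform"
    severity: str


@dataclass
class Policy:
    name: str
    disabled: set = field(default_factory=set)

    def active_controls(self, catalog, artifact_type):
        """Controls for one artifact type, minus those the policy opts out of."""
        return [
            c for c in catalog
            if c.artifact_type == artifact_type and c.control_id not in self.disabled
        ]


CATALOG = [
    Control("dockerfile-missing-user", "dockerfile", "high"),
    Control("dockerfile-curl-download", "dockerfile", "low"),
    Control("terraform-public-bucket", "terraform", "critical"),
]

# Accept the cURL risk for demo apps by disabling that one control.
demo_apps = Policy("Demo Apps", disabled={"dockerfile-curl-download"})
active = demo_apps.active_controls(CATALOG, "dockerfile")
print([c.control_id for c in active])  # ['dockerfile-missing-user']
```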

Now that we’ve defined our policy, we want to apply it to the source repo where I build my demo app. That way, any time I change the Dockerfile (or any other IaC file), I get guidance on where I have security issues and how to address them before they introduce unacceptable risk in deployment. In this case, I also have a threshold in the policy that fails the scan, wherever it is run, if any high or critical severity issues are found. (This is somewhat controversial: I know orgs that never want to fail a PR or build, and I know other orgs that are quite keen to do so. You have the flexibility to implement IaC scanning in a way that meets your organization’s needs.)
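
The threshold logic itself is simple, and a rough sketch helps show where the “fail or don’t fail” choice lives. This is my own illustration, not the scanner’s implementation; the severity names, the `should_block` and `exit_code` functions, and the `enforce` switch are all assumptions.

```python
# Assumed severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def should_block(findings, threshold="high"):
    """True if any finding is at or above the threshold severity."""
    limit = SEVERITY_ORDER.index(threshold)
    return any(SEVERITY_ORDER.index(f["severity"]) >= limit for f in findings)


def exit_code(findings, threshold="high", enforce=True):
    """Non-zero fails the PR or build; enforce=False reports without failing."""
    return 1 if enforce and should_block(findings, threshold) else 0


sample_findings = [
    {"id": "dockerfile-missing-user", "severity": "high"},
    {"id": "dockerfile-curl-download", "severity": "low"},
]

print(exit_code(sample_findings))                 # 1: blocks on the high finding
print(exit_code(sample_findings, enforce=False))  # 0: report-only mode
```

An org that never wants to fail a build would run the equivalent of `enforce=False`; one that does would keep enforcement on and tune the threshold.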

Integrating Security into the CI/CD Pipeline

This scanning can easily be added to a CI/CD pipeline at build time or, as I’m doing here, to my source repo to scan artifacts that are pushed to the repo or that are in a pull request. I use GitHub for my own code so I’m using a GitHub action; however, under the hood, this uses a lightweight CLI that can just as easily be implemented in other platforms.

I have my action configured to run on every push; in other words, every time I add or change something in this source repo, it’ll get scanned against the policies I’ve applied. In this case, the action has run on an update to the repo and found a high severity issue that caused a failure, which is something that will definitely wake me up.

My Dockerfile doesn’t specify a USER. This definitely violates best practice for container images because, by default, containers created from the image will run as root. Based on the policy, I will have to fix this issue. There are also a number of less severe issues that I should resolve now that I’m aware of them.
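
To show the kind of check involved, here is a deliberately simplified sketch that looks for a USER instruction in the final stage of a Dockerfile. It is not the scanner’s actual rule. The function names are hypothetical, and it ignores line continuations, heredocs, and any USER inherited from the base image or an earlier stage.

```python
def final_user(dockerfile_text):
    """Return the last USER set in the final build stage, or None."""
    user = None
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split(None, 1)
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = parts[0].upper()
        if keyword == "FROM":
            user = None  # simplification: treat each stage as starting unset
        elif keyword == "USER" and len(parts) == 2:
            user = parts[1].strip()
    return user


def runs_as_root(dockerfile_text):
    """Flag Dockerfiles with no USER, or with USER explicitly set to root."""
    user = final_user(dockerfile_text)
    return user is None or user.split(":")[0] in {"root", "0"}


BAD = """FROM python:3.12-slim
COPY app.py /app/app.py
CMD ["python", "/app/app.py"]
"""

GOOD = """FROM python:3.12-slim
RUN useradd --create-home appuser
COPY app.py /app/app.py
USER appuser
CMD ["python", "/app/app.py"]
"""

print(runs_as_root(BAD))   # True
print(runs_as_root(GOOD))  # False
```

The fix in my case is the same as in the `GOOD` example: create a non-root user and switch to it before the container’s command runs.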

With GitHub (and other platforms that support SARIF, the Static Analysis Results Interchange Format), these results can also be automatically consumed and integrated into the user experience. Here, GitHub has integrated the results into the Security tab of my repo, giving me a seamless way to identify and address them. When I add the USER instruction to my Dockerfile in my next push, the new scan will automatically resolve that outstanding error in this list.
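
For readers who haven’t seen SARIF, it is a JSON format, and a finding like my missing USER maps onto it roughly as sketched below. The tool name, rule ID, and severity-to-level mapping here are assumptions for illustration and are not the output the scanner actually produces. The field names follow the SARIF 2.1.0 layout.

```python
import json

# Assumed mapping from scanner severities to SARIF result levels.
LEVELS = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def to_sarif(findings, tool_name="example-iac-scanner"):
    """Wrap a list of findings in a minimal SARIF 2.1.0 document."""
    results = []
    for f in findings:
        results.append({
            "ruleId": f["rule_id"],
            "level": LEVELS.get(f["severity"], "warning"),
            "message": {"text": f["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": f["path"]},
                    "region": {"startLine": f["line"]},
                }
            }],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


sarif_doc = to_sarif([{
    "rule_id": "dockerfile-missing-user",  # hypothetical rule ID
    "severity": "high",
    "message": "No USER instruction; containers will run as root.",
    "path": "Dockerfile",
    "line": 1,
}])
print(json.dumps(sarif_doc, indent=2))
```

Because the format is standardized, any platform that understands SARIF can show the file, line, and message for each finding without knowing anything about the scanner that produced it.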

Conclusion

Infrastructure as Code is one of the engines driving the DevOps and cloud native revolution. Adding Orca’s Shift-Left IaC scanning to the mix allows security teams to seamlessly identify and address potential risks in the code written to deploy infrastructure, without slowing down the process with manual interventions.