I’ve talked before about how security doesn’t have to be hard. In fact, if approached correctly, the basics can be easy.
Let’s start with the most basic of all: patching. I’ve compiled a list of articles, linked at the bottom of this page, and I’ve read several times that, surprisingly, one of the biggest security problems in IT today is simply that systems aren’t patched.
I mean, wow. Something that seems so simple, yet it’s not being done. Why?
Well, on closer inspection, it’s not as easy as it first appears. Let’s say you’ve got a datacenter with about 150 systems. How do you keep them all patched? Several places I’ve worked have a “Patch Day” (just like they have a “Root Password Change Day”). This literally means that on one particular day, or set of days, everyone manually patches all the systems.
There’s a semi-famous xkcd comic that shows, given how often you do a repetitive task and how much time automating it would shave off, how long you can spend on the automation before it stops being worth it. Take a look. You might be surprised. You can spend a lot of time building patching automation, and trust me, it’s worth it.
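The arithmetic behind that chart is simple enough to run against your own numbers. Below is a minimal sketch, assuming (as the comic does) a five-year horizon and that automation removes all of the hands-on time; the host count, minutes per host, and monthly cadence are hypothetical figures for illustration, not measurements.

```python
def automation_budget_hours(minutes_saved_per_run: float,
                            runs_per_year: float,
                            years: float = 5.0) -> float:
    """Hours you could spend building automation before it stops paying off.

    Total time saved over the horizon = minutes saved per run
    * runs per year * years, converted to hours.
    """
    return minutes_saved_per_run * runs_per_year * years / 60.0


# Hypothetical Patch Day: 150 hosts, 10 minutes of manual work each,
# patched once a month.
hosts = 150
minutes_per_host = 10
per_run = hosts * minutes_per_host  # 1500 minutes of manual work per Patch Day
budget = automation_budget_hours(per_run, runs_per_year=12)
print(f"Break-even budget: {budget:.0f} hours")  # prints: Break-even budget: 1500 hours
```

With those assumed numbers you could spend roughly 1,500 hours on patching automation before it cost more than it saved, which is far more than most teams would need.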
So let’s talk about how you would automate patching. There are three ways I can think of off the top of my head, in increasing order of elegance and innovation.
The Easy Way: Patch Via Cron
A pretty easy, and very common, way is to patch automatically using scheduled tasks of some sort. This method is commonly shown in tutorials and is fairly straightforward to set up. For Linux hosts, it’s typically just a cron job that runs “yum update” or “apt-get upgrade”. For Windows hosts (and I’m not a Windows guy, so forgive me here), you can enable the automatic updates built into the OS.
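Here is a minimal sketch of a script such a cron job could call, assuming a host that has apt-get, dnf, or yum on its PATH. The crontab line, the script path, and the function names are hypothetical; the package-manager commands and flags are the standard ones. Note that apt needs `apt-get update` to refresh its package index before `apt-get upgrade` has anything new to install.

```python
import os
import shutil
import subprocess
from typing import Callable, List, Optional

# Hypothetical /etc/crontab entry that would run this script as root
# at 02:30 every Sunday:
#   30 2 * * 0  root  /usr/bin/python3 /usr/local/sbin/autopatch.py
# (the script path is an assumption; the script would end by calling
# apply_patches()).


def patch_commands(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[List[str]]]:
    """Pick the unattended upgrade commands for the package manager on this host."""
    if which("apt-get"):
        # apt must refresh its package index before upgrading.
        return [["apt-get", "update"], ["apt-get", "-y", "upgrade"]]
    if which("dnf"):
        return [["dnf", "-y", "upgrade"]]
    if which("yum"):
        return [["yum", "-y", "update"]]
    return None  # unknown platform: do nothing rather than guess


def apply_patches(which=shutil.which, runner=subprocess.run) -> bool:
    """Run the upgrade commands in order; check=True raises on the first failure."""
    commands = patch_commands(which)
    if commands is None:
        return False
    # Tell debconf not to prompt for input during the upgrade.
    env = dict(os.environ, DEBIAN_FRONTEND="noninteractive")
    for cmd in commands:
        runner(cmd, check=True, env=env)
    return True
```

The `which` and `runner` parameters exist only so the logic can be exercised without touching a real system; in production the defaults call `shutil.which` and `subprocess.run`.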
Although this approach seems easy and straightforward, and 90% of the time it’s just fine, I have seen it break production. On many Linux distros, if you only patch from the distro’s official repositories, there’s usually no issue, because those patches are fairly thoroughly tested for interoperability. Problems arise when you’ve got a tricky configuration, or software that was installed from a non-distro repo or installed manually. You can try to mitigate this by excluding certain repos or packages from the yum update, but now you’re getting into system-specific configuration. If you’re going to get that involved, then you should look at something like…
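To see how quickly those exclusions become system-specific, here is a small sketch that builds a per-host `yum update` command. yum’s `--exclude` (which accepts glob patterns) and `--disablerepo` options are real; the host names, package patterns, and the `vendor-extras` repo ID are hypothetical placeholders.

```python
from typing import Dict, List

# Hypothetical per-host overrides: packages to hold back and
# third-party repos to skip during the automated update.
HOST_OVERRIDES: Dict[str, Dict[str, List[str]]] = {
    "db01": {"exclude": ["postgresql*"], "disablerepo": []},
    "app01": {"exclude": ["kernel*"], "disablerepo": ["vendor-extras"]},
}


def yum_update_command(host: str,
                       overrides: Dict[str, Dict[str, List[str]]] = HOST_OVERRIDES) -> List[str]:
    """Build the yum update command for one host, applying its exceptions."""
    cfg = overrides.get(host, {})
    cmd = ["yum", "-y", "update"]
    cmd += [f"--exclude={pkg}" for pkg in cfg.get("exclude", [])]
    cmd += [f"--disablerepo={repo}" for repo in cfg.get("disablerepo", [])]
    return cmd


print(yum_update_command("app01"))
# ['yum', '-y', 'update', '--exclude=kernel*', '--disablerepo=vendor-extras']
```

Every new exception lands in that table, and keeping it accurate across a fleet is exactly the kind of bookkeeping a configuration management tool is built for.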
The Modern Way: Configuration Management Tools (Puppet, Chef, Ansible)
The more modern approach, and the much more common one in large, professional datacenters, is to use a configuration management tool such as Puppet, Chef, or Ansible. If you’re using a tool like this, then you’re probably also using a centralized monitoring and control node, such as Chef Server or Ansible Tower.
This method lets you patch all your servers with one button press and monitor the deployment status across your fleet. More importantly, it lets you roll out patches to a development and testing fleet before production, so you can confirm that the patches don’t break anything before they reach production.
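A staged rollout can be as simple as running the same patch playbook against one inventory group at a time and stopping at the first failure. This sketch assumes Ansible with hypothetical inventory groups named `dev`, `test`, and `prod`, a hypothetical `patch.yml` playbook, and an `inventory.ini` file; `ansible-playbook -i … --limit …` is the real CLI form, and it exits non-zero when hosts fail or are unreachable.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical inventory groups, ordered from least to most critical.
STAGES: Sequence[str] = ("dev", "test", "prod")


def staged_rollout(
    playbook: str = "patch.yml",          # hypothetical playbook name
    inventory: str = "inventory.ini",     # hypothetical inventory file
    stages: Sequence[str] = STAGES,
    runner: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Patch one inventory group at a time; stop at the first failed run.

    Returns the stages that were patched successfully.
    """
    patched: List[str] = []
    for stage in stages:
        cmd = ["ansible-playbook", "-i", inventory, playbook, "--limit", stage]
        result = runner(cmd)
        if result.returncode != 0:
            print(f"patch run failed on '{stage}', halting before later stages")
            break
        patched.append(stage)
    return patched
```

A clean playbook exit only means the patches installed. In practice you would also run your functional tests against each stage before moving on to the next one.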
There’s a huge assumption built into this method: that you have a development/testing/staging environment. This is where the real benefits of automation quickly become apparent. If you are running systems in any sort of production state and cannot tolerate downtime for business reasons, then this is the first step you need to be moving towards.
Let’s say you run a consumer-facing website and want 100% service availability. You’ll probably start by designing high availability into your solution architecture, with clustering, failover, and so on. But just as important, you need a duplicate copy of your production environment running on duplicate hardware, so you can deploy and test changes. And I mean duplicate, as in exactly the same. You might have a second server where you deploy beta versions of your product for user testing, but unless that server is configured the same as your production hosts, you’re not really testing that a patch won’t break anything.
By having your production systems managed and configured by something like Ansible, you can now launch a duplicate of your server as a testing node and be confident that it’s nearly identical to your production server.
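“Nearly identical” is worth checking rather than assuming. One simple check is to diff installed package versions between a production host and its test twin. This sketch assumes you have already collected each host’s packages as a name-to-version mapping (for example with Ansible’s `package_facts` module or `rpm -qa`); the package data below is hypothetical.

```python
from typing import Dict, List, Tuple


def package_drift(prod: Dict[str, str],
                  staging: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (package, prod_version, staging_version) for every mismatch.

    A package installed on only one side is reported with '<absent>'.
    """
    drift = []
    for name in sorted(set(prod) | set(staging)):
        p = prod.get(name, "<absent>")
        s = staging.get(name, "<absent>")
        if p != s:
            drift.append((name, p, s))
    return drift


# Hypothetical inventories from a production host and its test twin.
prod_pkgs = {"nginx": "1.20.1", "openssl": "3.0.7"}
staging_pkgs = {"nginx": "1.20.1", "openssl": "3.0.8", "vim": "9.0"}
for row in package_drift(prod_pkgs, staging_pkgs):
    print(row)
# ('openssl', '3.0.7', '3.0.8')
# ('vim', '<absent>', '9.0')
```

An empty result means the two hosts match at the package level; anything else tells you your test node isn’t testing what production actually runs.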
So this is definitely better, but still requires a fair amount of manual work. As you’ve probably realized by now, I am a huge proponent of Business Velocity, which is not really achievable without a large amount of automation. Which is why I want to talk next about…
The Cutting-Edge Innovative Way: Continuous Integration For Systems
Ah, Continuous Integration and Continuous Delivery, affectionately known as CI/CD. This is the holy grail for teams building web-based software applications, and it’s possible your team is already doing it. Your in-house software product runs through a fully automated unit test suite, and might even be deployed automatically to a server for integration and systems testing.
But is your ENTIRE pipeline automated? Do you still have to manually patch the servers that the software is automatically deployed to? Are you testing that the server itself functions correctly? The custom software might be tested, but what about nginx or Apache or PostgreSQL?
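A first, very coarse server-level test is simply confirming that each service accepts connections after patching. This sketch uses Python’s standard `socket.create_connection`; the service map is hypothetical and assumes default ports (80 for nginx, 5432 for PostgreSQL) on the local host.

```python
import socket
from typing import Dict, Tuple

# Hypothetical service map for one web host: name -> (host, port).
SERVICES: Dict[str, Tuple[str, int]] = {
    "nginx": ("127.0.0.1", 80),
    "postgresql": ("127.0.0.1", 5432),
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def smoke_test(services: Dict[str, Tuple[str, int]] = SERVICES) -> Dict[str, bool]:
    """Check every service in the map and report pass/fail per service."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}
```

An open port only proves the daemon started. A fuller check would fetch a known page from nginx or run a trivial query against PostgreSQL, but even this catches a patch that leaves a service dead.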
If you’ve already built the automation in Ansible to roll out a server with a single command, then take the next step and roll that into the CI/CD pipeline. Since you’re probably using virtualization, have your nightly build terminate the existing VM, launch a new one from scratch, and do the full build, all the way from the OS to the running production software.
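The nightly rebuild is an ordered list of steps where any failure stops the run, so a broken build never gets treated as verified. This sketch assumes each step is wrapped in a function returning True on success; the step names and placeholder lambdas are hypothetical stand-ins for real calls to your hypervisor or cloud API and to your Ansible playbooks. The per-step timing report is the kind of summary a dashboard could display.

```python
import time
from typing import Callable, List, Tuple

# A step is a name plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def nightly_rebuild(steps: List[Step]) -> List[Tuple[str, bool, float]]:
    """Run the rebuild steps in order, stopping at the first failure.

    Returns one (step name, succeeded, seconds taken) row per step that ran.
    """
    report: List[Tuple[str, bool, float]] = []
    for name, action in steps:
        start = time.monotonic()
        ok = bool(action())
        report.append((name, ok, time.monotonic() - start))
        if not ok:
            break
    return report


# Hypothetical wiring: each placeholder stands in for a real call.
EXAMPLE_STEPS: List[Step] = [
    ("terminate-existing-vm", lambda: True),
    ("launch-fresh-vm", lambda: True),
    ("build-os-and-apply-all-patches", lambda: True),
    ("deploy-application", lambda: True),
    ("run-smoke-tests", lambda: True),
]
```

Starting from a fresh VM every night is the important design choice: the build proves that the automation alone, with every current patch applied, can produce a working server.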
And the key here, the thing we’ve been talking about from the beginning? That automated system build should perform a full patching run. Furthermore, you can now monitor security advisory feeds and make sure your automation is fixing known vulnerabilities as well. Then, every single night, you are automatically testing your entire enterprise, including both your in-house software and “shrink-wrapped” products, whether open source, such as your MediaWiki server, or commercial, such as your Jira server. This way, you are not just patching every day; you are also making sure that the patching doesn’t break anything.
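Checking the nightly build against advisory feeds comes down to comparing each installed version with the version that contains the fix. This sketch assumes the feed has already been reduced to a package-to-fixed-version mapping, and it uses a deliberately simplified comparison that only handles purely numeric dotted versions; real distro versions (epochs, release suffixes) should be compared with the distro’s own tooling, such as `dpkg --compare-versions` on Debian-family systems. The package data is hypothetical.

```python
from typing import Dict, List, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Simplified: only handles purely numeric dotted versions like '1.2.10'."""
    return tuple(int(part) for part in v.split("."))


def still_vulnerable(installed: Dict[str, str],
                     advisories: Dict[str, str]) -> List[str]:
    """Installed packages older than the version their advisory says is fixed."""
    return sorted(
        pkg for pkg, fixed in advisories.items()
        if pkg in installed and parse_version(installed[pkg]) < parse_version(fixed)
    )


# Hypothetical nightly-build inventory and advisory data.
installed = {"openssl": "3.0.7", "nginx": "1.24.0", "curl": "8.5.0"}
advisories = {"openssl": "3.0.8", "nginx": "1.22.1", "zlib": "1.3.1"}
print(still_vulnerable(installed, advisories))  # ['openssl']
```

If this list is non-empty after a full patching run, either the fix hasn’t reached your repositories yet or something is pinning the old version, and both are worth knowing about before morning.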
The next step, and the real game-changer, is verification: if you can automatically verify that everything in the nightly development build is functional, then you’ve got a single-button (or even fully automatic!) deployment of the latest patches and vulnerability fixes to your production environment. Your entire enterprise is now kept up to date with security fixes and patches, and fully verified. Your days are spent checking the morning dashboard to make sure things are humming along, and instead of running around patching everything, you can go do proactive things.
Now look back… this is actually a roadmap for your enterprise. You can’t build the full CI/CD pipeline without first having automated system builds in place. Even if you can’t get all the way to full CI/CD, moving along this maturity path provides many tangible benefits to your enterprise (for example, if you’ve got automated system builds, then you don’t need bare-metal backups, just data backups).