A great article from 2016 came up again in a recent conversation; it has surfaced a few times in my discussions about DevSecOps since it was first published. Justin Smith’s The Three R’s of Enterprise Security: Rotate, Repave, and Repair is a classic must-read. I just love the elegance of the alliteration and the simplicity of the message:
Rotate datacenter credentials every few minutes or hours. Repave every server and application in the datacenter every few hours from a known good state. Repair vulnerable operating systems and application stacks consistently within hours of patch availability.
The idea of rotating keys was eye-opening for me but makes a lot of sense. I have no idea how to do it, but I get it. Repaving reminds me of when I first heard that Netflix doesn’t have instances with uptimes of more than 36 hours. Netflix also brought us the Simian Army back in 2011, which turned operations on its head by intentionally killing live production instances. Then there is Repair, which is near and dear to our hearts. I was so inspired by this article that I felt compelled to expound on ‘repair’ with a simple alliteration of our own.
My next reaction was to realize this could get a bit silly, especially if it felt forced in any way, so I laughed it off and went back to work. It wasn’t until later that day that it came to me.
Reject known bad components from entering your SDLC.
Replace components that don’t meet governance standards.
Respond to zero-day vulnerabilities with immediate impact assessments.
So with OWASP A9 (Using Components with Known Vulnerabilities) a little over six years old now, I thought I’d build on Justin’s article with three R’s of my own.
"Reject and Respond" Approaches In The Past
When I reflect on my own career, I’m reminded that we had been doing at least these two things before the rise of SCA tools. In the early 2000s I worked for a Fortune 100 company that had a policy banning the use of open-source software. As someone who built that company’s software, I knew that lots of open source was, in fact, being used. The policy seemed to just push liability down to folks like me, but it clearly wasn’t changing behaviors. Back then the main concern was licensing, because few people truly understood open-source licenses and how they mixed. The fear of polluting IP with viral licenses paralyzed corporate compliance folks but didn’t deter developers from using open-source software anyway. In this case, there was a policy to reject these components but no process to enforce or monitor compliance.
My first experience with open source as a security issue came sometime before 2011 (I forget the date, and Google isn’t helping). Someone from InfoSec came to me and asked if I knew whether we were using a certain open-source library in any of our apps. I was able to answer definitively because we were running Maven site reports and using Sonar (now SonarQube), which also tracked libraries at the time. The bad news was that I only supported one business unit and had just a handful of apps under my management. Beyond those, the ability to respond to this event boiled down to running endpoint scanners looking for the file in question.
It wasn’t until my next gig, at a different Fortune 100 company, that I chaired an open-source governance board. Our goal was to enable developers to use open source by coming up with a process to ‘approve’ its use. Like most processes at the time, it was a ticket-based system that required developers to submit a request. That kicked off a workflow to get approvals, or rejections, from Legal, Security, and Enterprise Architecture. Initially, developers tried to embrace it, but it was slow and tedious, so few came back. It was never well adopted.
During this time period, from where I sat overseeing the enterprise build system, I can also tell you that developers weren’t patching (replacing) the components in their applications. As far as they were concerned, the components weren’t broken because no tests were failing. If they upgraded them at all, it was to get new features they needed, not because a patch was available.
The Future Is Now - and It Is Automated
In the past, we were limited to manual processes. There was no automated way to approve or reject the components developers were asking for, and manual processes like these often took weeks to get through.
Developers had no visibility into the components they were using and no test to provide feedback. Combine a lack of feedback with a slow approval process and you’ve got a recipe for sticking with what you have. Worse yet, developers simply stopped asking and used whatever they wanted, on the assumption that it is easier to get forgiveness than permission.
Security and Ops folks were limited to endpoint-scanning solutions to find out which files were on which servers. It’s the equivalent of assembly-line workers picking their own parts with no way to know if those parts had been recalled and no way to track which vehicles they went into. Nobody wants to buy a vehicle built from an unmanaged supply chain; just imagine what the J.D. Power initial quality rating might look like.
Today, thankfully, we can use automation to apply a set of defined policies, or rules, that inspect the components moving through our software factory and alert us to the ones that don’t measure up.
Reject
Our Firewall solution inspects new components entering your software supply chain in real time. Components that are too old, have security vulnerabilities, or carry open-source licenses that have been banned can be blocked and quarantined. Whether it’s on the developer’s desktop or in the build, there is immediate feedback. Sometimes there is an easy path forward, like simply getting a newer version of the same component. Sometimes you have to dig in a little and see what the suggested remediation has to say; many times there is a way to mitigate a security issue via a configuration change, and the quarantine can be ‘waived’ to allow the component in. Sometimes you simply have to question the component choice and look for a different one offering similar functionality.
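To make the idea concrete, here is a minimal sketch of what a ‘reject’ policy check might look like. This is purely illustrative: the Component class, the banned-license list, the age threshold, and the CVE data are assumptions made up for this example, not the actual Firewall policy model.

```python
# Illustrative sketch of a "reject" policy check; not a real product API.
from dataclasses import dataclass, field
from datetime import datetime, timedelta

BANNED_LICENSES = {"AGPL-3.0", "SSPL-1.0"}   # example policy: licenses we won't accept
MAX_AGE = timedelta(days=3 * 365)            # example policy: nothing older than ~3 years

@dataclass
class Component:
    name: str
    version: str
    license: str
    released: datetime
    known_cves: list = field(default_factory=list)  # populated from your vulnerability feed

def evaluate(component: Component) -> tuple[bool, list]:
    """Return (allowed, reasons). A failing component would be quarantined."""
    reasons = []
    if component.known_cves:
        reasons.append(f"known vulnerabilities: {', '.join(component.known_cves)}")
    if component.license in BANNED_LICENSES:
        reasons.append(f"banned license: {component.license}")
    if datetime.now() - component.released > MAX_AGE:
        reasons.append("component is too old")
    return (not reasons, reasons)

# Example request: this one would be blocked at the repository and the developer
# would see the reasons immediately in their build output.
allowed, reasons = evaluate(Component(
    name="struts2-core", version="2.3.5", license="Apache-2.0",
    released=datetime(2012, 12, 1), known_cves=["CVE-2017-5638"],
))
print("ALLOWED" if allowed else "QUARANTINED", reasons)
```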
This type of automation completely removes the need for manual approval processes and has been shown to save tens of thousands of hours of work for the folks doing the approvals. That’s not counting the time previously lost while those tickets just sat in queues.
Replace
Empowering developers to see which components are failing policy checks in the tools they're already using is a game-changer. Be it in their IDE or as an automated test that runs during builds, this timely feedback gets to the heart of lean.
If you’re asking yourself how those components got in there when we have the ability to automatically reject them, let me point out that this is a temporal problem. Security issues, in particular, are discovered every day. Components that passed the approval process in the past enter our systems only to become an issue sometime later. At Sonatype we like to say code ages like milk, not like wine.
In his article on enterprise security, Justin Smith suggests that you patch your software as soon as a patch is available. I agree with this at the OS layer, but I would caution against it at the application layer given the new wave of poison-the-well attacks we’ve been seeing. I gave a talk on this at DevOps Days Rockies earlier this year. The summary is that attackers are injecting malicious software directly into the supply chain by stealing credentials or hijacking projects, and sometimes by simply writing a new malicious package and adding it as a dependency to popular projects. Definitely patch those OS components right away, and if you can do that within hours, as Justin Smith suggests, all the better.
When it comes to the application layer, I’ll share some advice I used to give my DevOps team: “...we upgrade for one of two reasons, either we’re running away from something, or we’re running toward something. We run away from bugs and security issues or we run toward new features we need.” That was the criterion we used for much of our tooling and projects.
By having a policy engine that can flag non-compliant components in the developers’ IDE and build process, you have the kind of feedback loop needed to make this work. Now there is a failing test and a reason to look into upgrading. Oftentimes a patch is already available and the fix is quick and simple. Over years of watching, I’ve come to realize that most vulnerability announcements are made after a patch has been released. The important thing is that if you can patch within hours, you’ll be ahead of the new norm of attacks happening within days.
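To illustrate the temporal point, here is a minimal sketch of a build-time gate that re-checks every component against today’s vulnerability data on every build, so a component that was clean yesterday still gets flagged today. The feed, coordinates, and function names are all made up for illustration; in practice the lookup would come from your SCA tool or an advisory service.

```python
# Illustrative build gate: re-evaluate the app's components on every build.
import sys

def current_advisories(name: str, version: str) -> list[str]:
    """Stand-in for a lookup against today's vulnerability data (illustrative only)."""
    feed = {("jackson-databind", "2.9.8"): ["CVE-2019-12384"]}   # example data
    return feed.get((name, version), [])

def gate(manifest: list[tuple[str, str]]) -> int:
    """A clean result yesterday means nothing today, so check everything again."""
    failures = []
    for name, version in manifest:
        cves = current_advisories(name, version)
        if cves:
            failures.append(f"{name} {version}: {', '.join(cves)} (replace or waive)")
    for line in failures:
        print("POLICY FAILURE:", line)
    return 1 if failures else 0   # non-zero exit fails the build, giving the "failing test"

if __name__ == "__main__":
    sys.exit(gate([("jackson-databind", "2.9.8"), ("guava", "32.1.2")]))
```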
Respond
If you don’t keep track of which components are in which applications, your ability to respond to a zero-day announcement tends to be limited to what I mentioned earlier: endpoint scanning. That can be a slow and tedious process, since identifying the file in question is just the start. You still have to track down which app lives on that server and whether the file and the app have anything to do with each other. This can take days or weeks. I can imagine the twice-daily meetings looking for status updates on the impact assessment InfoSec is being asked for.
Conversely, with a managed software supply chain you have the same capabilities auto manufacturers have when they issue a recall. They don’t ask everyone to bring in their car so they can check whether it’s affected (endpoint scanning). Instead, they know which VINs have the recalled parts and they send a letter to those owners asking them to bring their cars in. With policies in place and a tool continuously monitoring for zero-days, every affected application will alert at the same time, sending emails to the appropriate stakeholders. InfoSec immediately understands the impact and can begin to think about prioritization. Most importantly, no twice-daily status meetings!
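Here is a small sketch of that ‘recall lookup’ idea: a reverse index from component coordinates to the applications that contain them, built from each application’s bill of materials. The application names and BOM data are invented for the example; the point is that the impact assessment becomes a lookup rather than a scan.

```python
# Illustrative "recall lookup": map a vulnerable component to the apps that contain it.
from collections import defaultdict

# app -> list of (component, version), as recorded at build time (example data)
boms = {
    "billing-service": [("log4j-core", "2.14.1"), ("guava", "31.1")],
    "storefront":      [("log4j-core", "2.17.1"), ("jackson-databind", "2.13.4")],
    "reporting-batch": [("log4j-core", "2.14.1")],
}

# Invert it once: (component, version) -> apps, the software equivalent of a VIN index.
index = defaultdict(list)
for app, components in boms.items():
    for coordinate in components:
        index[coordinate].append(app)

def impact(component: str, affected_versions: set) -> list[str]:
    """Given a zero-day announcement, return every application that needs attention."""
    return sorted({app
                   for (name, version), apps in index.items()
                   for app in apps
                   if name == component and version in affected_versions})

# e.g. a Log4Shell-style announcement: the answer is immediate, no endpoint scan needed.
print(impact("log4j-core", {"2.14.1"}))   # -> ['billing-service', 'reporting-batch']
```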
At Sonatype, many of our customers have been affected by various zero-day announcements, some of which have led to massive breaches. Companies without managed software supply chains ended up making the headlines, in a bad way. Our customers were able to respond quickly, and with confidence, to avoid becoming another victim.