In our hectic cloud-based world, devops (the mixing of infrastructure operations with software development) has become the standard way we build and run high-scale sites, from IaaS to SaaS. There are lessons to be learned from how we got here, especially because devops isn’t very security-friendly.
Here’s how we got to this sorry state, from the perspective of someone who started working on cloud infrastructure in 1998. I’ve run both dev and ops functions in multiple cloud environments and launched two early cloud computing services. I also ran the Web & Internet Engineering program for the University of California for five years, where I designed and taught courses that introduced coders to the basics of infrastructure operations and infrastructure people to software architecture.
When I studied for my BS in Computer Information Systems, we had these nicely structured roles where software architects designed code that was implemented by developers, run through QA, and handed off to operations (aka sysadmins). If the code wasn’t stable, operations rejected it. They had the same level of influence as a product manager, who could also reject code that didn’t meet functional requirements.
But in the real world, there is a hierarchy of geek cred. As the cloud was forming in the days of ASP, MSP, and dotcom, if you were a CCIE or maybe an Oracle DBA, you could name your price for an operational role. You ruled the operational roost. On the other hand, if you were a Java programmer, you could also name your price and you were the top dog of software developers. A skeptic would be tempted to think that Cisco’s CLI-only interface and Java’s needless complexity were both designed to fuel employment for highly paid geeks, who would in turn recommend buying insanely overpriced Cisco network gear and giant Sun servers to run inefficient Java code. But soon, the big salaries meant there were enough CCIEs to go around, and Java developers far outnumbered operations experts to the extent that it was hard to sort Java developers from baristas.
As the Internet drove demand for very short software release cycles, coders took shortcuts in their coding and operations people rejected the code. After all, it was the operations people who woke up at 2 AM when the site was down. It was the developers who rolled in at 10 AM to write code between slices of company-supplied pizza. Both dev and ops felt pressure from executives desperate to launch new features quickly.
I can’t tell you how many meetings I sat through where operations people yelled at developers for writing inefficient, latency-intolerant, generally sloppy code that would take down servers if it was deployed at scale. We began to insist that developers write code on the same platform it would be deployed on. We made them use WAN simulators so they could see how slowly the code ran over the Internet versus Ethernet. Soon, the surplus of Java developers came to the conclusion that it was simpler to fix their own bugs on the fly than it was to write tight code.
Devops was a seemingly elegant solution because it met the business need for short software release cycles. It removed the pesky operations release decision that was formerly in place to filter out flaky code. It also held developers accountable by making them carry a pager and wake up if their code was flaky. We started talking about graceful degradation and fixing on the fly.
There’s one little problem. I think it’s genetic. It’s politically incorrect. Software developers generally make poor operations people. Sysadmin (ops) people generally make poor developers. Developers build stuff and like shiny new things. Sysadmins like stable systems that stay up because they don’t change all the time. Sure, there’s a lot in common between dev and ops: a loathing of GUIs, a love for t-shirts, curious hygiene habits, Red Bull, etc., but they are wired differently. (I will not say which of these traits I share with dev and ops, but only mention that I studied computer science and computer information systems…)
You could make the argument that developers are good enough at ops (or vice versa) that the system still works. It does when site availability is your foremost goal. It’s easy to fix a little coding mistake that lowered system performance for an hour. It’s simply not possible to repair the damage done by a similar little coding mistake that allowed your customer database to be stolen. You can’t fix security on the fly. You have to do it right the first time and you have to follow tight operational procedures if you want to remain compliant with regulations.
The problem with dev and ops is that both of them will take security shortcuts if necessary to meet their goals. Dev will put getting the new feature out the door on time before anything else. Ops will prioritize keeping the site up and running. That’s why most IT departments today still have a security function separate and apart from development and operations. It’s also why cloud providers who blindly subscribe to the devops philosophy will likely have less secure environments. Well-run, highly secure environments will, by nature, have slower release cycles than fly-by-the-seat-of-your-pants, fix-it-as-you-go environments. And that’s OK.
The good thing about the cloud is that you can have fewer ops people because your cloud provider will have its own infrastructure operations team. But it doesn’t mean you should put your developers in charge of operations. By all means, your developers and your operations lead should be sitting in the same room and have drinks together regularly, but they should not be the same person and they should have equal power in the organization. It’s fine if neither of them likes the security architect, but the architect also needs an equal seat at the table.
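One way to picture the "equal seat at the table" idea is a release gate that will not ship until dev, ops, and security have each signed off, and that refuses to let one person sign for two roles. What follows is only a minimal sketch to make the point concrete, not a description of any real tool: the `Release` class, the role names, and the people are all made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical role names; the only requirement taken from the text is that
# dev, ops, and security each get an equal say.
REQUIRED_ROLES = ("dev", "ops", "security")


@dataclass
class Release:
    name: str
    # Maps role -> the person who signed off for that role.
    approvals: dict = field(default_factory=dict)

    def approve(self, role: str, person: str) -> None:
        if role not in REQUIRED_ROLES:
            raise ValueError(f"unknown role: {role}")
        # Dev, ops, and security must not be the same person.
        for other_role, other_person in self.approvals.items():
            if other_person == person and other_role != role:
                raise ValueError(f"{person} already signed off as {other_role}")
        self.approvals[role] = person

    def missing_roles(self) -> list:
        return [r for r in REQUIRED_ROLES if r not in self.approvals]

    def can_ship(self) -> bool:
        # Every role holds a veto: one missing sign-off blocks the release.
        return not self.missing_roles()


release = Release("checkout-v2")
release.approve("dev", "alice")
release.approve("ops", "bob")
print(release.can_ship())       # False: security has not signed off yet
release.approve("security", "carol")
print(release.can_ship())       # True
```

The design choice worth noticing is that no role can override another. A release like this will move more slowly than one a single developer can push alone, which is exactly the trade-off argued for above.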
To do otherwise is to put your company data at risk. All the high-end security software in the world won’t save you from a combination of poorly written code and sloppy operations. It’s time to get software development, security, and operations right the first time.
(Thanks to Ted Dziuba, co-founder of Milo.com, and his awesome blog entry titled “Devops is a Poorly Executed Scam,” which prompted me to think this way.)
[Ed. note: Trend Micro would like to know what you think about this. We enthusiastically invite your comments and we will read every one of them.]