"If it ain't broke" - Antifragile and Information Security
I assume you've all heard the expression, "If it ain't broke, don't fix it." Apparently, that's not only wrong, but it's a really dangerous way to think, says Nassim Taleb in his book Antifragile. In short, Taleb points out that certain things gain from disorder. I was introduced to this concept through a brilliant piece on Jez Humble's blog which summarizes Taleb’s ideas pretty clearly.
I don't want to restate Humble's points and screw something up, so I urge you to read his work yourself. What I would like to do is get you to start thinking about what "antifragile" means as applied to our craft of information security. My piece on enterprise resiliency [Sept '12] stumbled close to Taleb's definition of antifragile, but while I was striving for resilience, it's clear that even that is a near-miss, as we say. It turns out that what I was describing as resilient is actually quite susceptible to black swan events: events that are deemed improbable but are cataclysmic, and that lead to big blow-ups when they occur. I believe a lot of this can (and should) be applied to Information Security because in our world, just like in Humble's and Taleb's, a black swan event can mean the end of our enterprise.
There's more than meets the eye here at first read though. And Taleb's 544-page book will require a good thorough read before I feel qualified to analyze antifragile against the IT Security backdrop. But it's clear from Humble's post that we are going to have a hard time applying antifragile to security in the near future.
In practice, the Netflix "Simian Army" concept (think Chaos Monkey) is a fantastic idea, but applying it to security is proving harder than I realized when I first presented this concept to my colleagues. With Chaos Monkey you're looking at components along the delivery and operations chain, and using chaos as a way to identify weakness in structure and operation. In security, our events are different. At one end are the normal packets bouncing off our firewalls, which we're hopefully not counting as 'events', and at the other extreme is a full-on breach. Which do you simulate in your chaos model for security? You can't just simulate a hack or breach without causing major chaos, and simulating in a non-production environment is unrealistic (and probably too costly).
What I was advocating in my piece from September '12 is mock security drills, but it's a lot easier said than done, and hardly something that is fully automated since humans are required. I don't have the answer to this yet but I'm definitely working on it...
When it comes to security organizations that I've worked with or for, they tend to lean heavily towards fragile, and even those that have seen the effects of being fragile evolve and cling to being robust/resilient as a reaction. Look at it this way: what's the first thing that's typically blamed in any outage? The security device, usually the firewall. Most calls I was on when doing operations a number of years back started out this way, and colleagues tell me it hasn't changed much in recent years. Arguably this is the same reason that it takes a presidential executive order (or an equivalent thereof) to get a Web application firewall (WAF) in-line and in block mode, and heaven help you if you drop a legitimate transaction on the floor because the development organization forgot to alert you to some structural change in the code ... been there, done that. Heck, look at how security is treated post-outage! If it actually was the security device's fault, you'd be lucky if it didn't get ejected from the packet-stream, which I suppose is a natural reaction.
Fragile systems are all around us in security. Robust (I would use the word rigid) systems are all over the place too. Antifragile is something I've yet to encounter, even in the most mature organizations, mainly because I believe that the patterns simply don't exist to follow.
Antifragile systems flourish in volatility - but I don't believe we [security professionals] have platforms that follow this line of thinking. I'd love for someone to point me to a security platform that thrives on volatility and to some degree, chaos, for advancement and capability if such a platform exists!
Think about your own organization: how would you implement antifragile security there?
Edit {1/15/13 @ 11:42am CST}
I realize there is some confusion on what I meant by the reference to security monkey (part of the Simian Army) above, so here's a quick clarification. I'm referencing this post (http://techblog.netflix.com/2011/07/netflix-simian
"Security Monkey is an extension of Conformity Monkey. It finds security violations or vulnerabilities, such as improperly configured AWS security groups, and terminates the offending instances. It also ensures that all our SSL and DRM certificates are valid and are not coming up for renewal."
The link I'm making between the Simian Army (arguably the first such example of a formalized antifragile implementation in software) and antifragile with respect to security is that while security monkey looks for violations and vulnerabilities, these are pre-defined somewhere, and pattern-based - arguably what we would build is more related to Chaos Monkey than Conformity Monkey. Jez brings in the notion of the black swan event - which I don't believe the security monkey will account for (we end up with some confirmation bias here, as we find bugs we know about, but then can easily fall into a false sense of security leading to a 'blind spot').
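To make the "pre-defined, pattern-based" point concrete, here is a minimal Python sketch of a conformity-style checker. Everything in it is an assumption made up for illustration: the `SecurityGroupRule` and `Certificate` shapes, the list of sensitive ports, and the 30-day renewal window. It is not Netflix's implementation, only a toy that shows how such a checker finds what it was told to look for and nothing else.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SecurityGroupRule:
    group: str
    port: int
    cidr: str  # source range allowed to connect


@dataclass(frozen=True)
class Certificate:
    name: str
    expires: date


# Hypothetical policy: ports that must never be open to the whole internet.
SENSITIVE_PORTS = {22, 3306, 3389}
RENEWAL_WINDOW = timedelta(days=30)  # assumed renewal policy


def find_violations(rules, certs, today):
    """Return human-readable violations for the known-bad patterns only."""
    violations = []
    for r in rules:
        if r.cidr == "0.0.0.0/0" and r.port in SENSITIVE_PORTS:
            violations.append(f"{r.group}: port {r.port} open to the world")
    for c in certs:
        if c.expires <= today:
            violations.append(f"{c.name}: certificate expired")
        elif c.expires - today <= RENEWAL_WINDOW:
            violations.append(f"{c.name}: certificate due for renewal")
    return violations


today = date(2013, 1, 15)
rules = [
    SecurityGroupRule("web", 443, "0.0.0.0/0"),      # public HTTPS, allowed
    SecurityGroupRule("db", 3306, "0.0.0.0/0"),      # matches a known-bad pattern
    SecurityGroupRule("cache", 11211, "0.0.0.0/0"),  # risky, but no rule covers it
]
certs = [
    Certificate("api", date(2013, 2, 1)),
    Certificate("www", date(2014, 1, 1)),
]

violations = find_violations(rules, certs, today)
for v in violations:
    print(v)
```

The `cache` rule passes silently because no pattern describes it. That is the blind spot in miniature: the checker's report looks clean for anything nobody thought to write a rule for.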
So how do we create a system that improves security through antifragile means? The security monkey is a fantastic start, but I believe it's just that... a start, since ideally we would just set security monkey off to randomly change security policy at different layers, turn off defenses at various levels to identify weaknesses in our layered security approach - but that can easily lead to a self-inflicted breach. This probably sounds pretty insane too.
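One way to explore the idea without risking a self-inflicted breach is to run the chaos against a model of the defenses rather than the defenses themselves. The Python sketch below is a toy under stated assumptions: the layer names, the attack types, and the mapping of which layer blocks which attack are all hypothetical. It randomly "turns off" one layer at a time and reports which attacks would then get through.

```python
import random

# Hypothetical model: each defensive layer blocks a set of attack types.
LAYERS = {
    "firewall": {"port_scan"},
    "waf": {"sql_injection", "xss"},
    "input_validation": {"sql_injection"},
    "antivirus": {"malware_upload"},
}
ATTACKS = {"port_scan", "sql_injection", "xss", "malware_upload"}


def leaks_without(disabled_layer):
    """Attack types that no remaining layer blocks once one layer is off."""
    active = [name for name in LAYERS if name != disabled_layer]
    blocked = set().union(*(LAYERS[name] for name in active))
    return ATTACKS - blocked


def chaos_round(rng):
    """Disable one randomly chosen layer in the model and report the leaks."""
    victim = rng.choice(sorted(LAYERS))
    return victim, leaks_without(victim)


rng = random.Random(2013)  # seeded so a run can be repeated
for _ in range(5):
    victim, leaks = chaos_round(rng)
    status = ", ".join(sorted(leaks)) if leaks else "nothing (another layer covers it)"
    print(f"disabled {victim}: gets through -> {status}")
```

In this toy, switching off `input_validation` exposes nothing because the `waf` layer overlaps with it, while switching off `waf` lets `xss` through. A model like this only finds gaps in what has been modeled, so it does not answer the black swan problem; it just shows where the layered approach has a single point of failure.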
Penetration testing is a great start, but misses the point of the Simian Army: full automation. Again, I'm not 100% clear what the outcome will be, or whether we can ever achieve the state I'm interested in given the restrictions around security, but it certainly is interesting.
---------------------------------------------
*black swan photo courtesy of: http://enchantedwhispers.deviantart.com/art/Black-
Boris (jadedsecurity), thanks for taking the time to leave a comment, although I think you're largely missing the point of the post, I'll address yours here:
- Re: "chaos monkey, security monkey" - If you've ever read up on what Chaos Monkey is and how it behaves, comparing what security monkey will be to penetration testing is ridiculous. The Simian Army is about full automation, without human intervention, at random intervals, to test for fragile parts of the infrastructure. Applying that same thinking to the concept of security monkey would, in my opinion, carry high costs, since you're looking at replicating infrastructure, having simulated outages, etc. The initial cost would be astronomical given that many enterprises' 'security' posture is weak. I'm hoping someone from Netflix can shed further light since they're the keepers of the Simian Army... {I've included a clarification in the original blog post above to further discuss this point. The current security monkey is merely an agent of conformity rather than chaos, which is where my thinking diverges. Apologies if that wasn't clear.}
- Re: Patterns - please consider the context of the article, as I'm discussing antifragile as applied to security - and those patterns most definitely do not exist today, to my knowledge. I've asked Jez Humble and Gene Kim to collaborate on applying this concept to Information Security - so if someone has already done this, please point me to them so I can contribute if possible. Jez points out in his blog post that attempts to 'manage risk' (as you've commented) result in 'theater' ( "This a great explanation of how many attempts to manage risk actually result in risk management theatre – giving the appearance of effective risk management while actually making the system (and the organization) extremely fragile to unexpected events." ). Furthermore, I know of no (and this could just be me) machine-based learning systems widely applied to enterprise security. Anomaly-based intrusion detection is really not applicable to this conversation... as far as I can tell.
- Re: "blame the firewall" - unfortunately, there is still a good bit of this around, as anecdotal evidence from colleagues and customers demonstrates. While we should be long past blaming the security apparatus (say, the firewall, for simplicity), we are not universally at that stage yet.
Thanks for reading, commenting!
/Raf
@jadedsecurity
In addition to Raf's clarification which talks about Netflix's automated security testing, you say, "You can't just simulate a hack or breach without causing major chaos, and simulating in a non-production environment is unrealistic (and probably too costly)"
I'd like to point you to Amazon (which is of course SOX and PCI-DSS compliant, and which I think counts as "mature"). In my original blog entry I reference this paper from the ACM: http://queue.acm.org/detail.cfm?id=2371297 in which Jesse Robbins of Amazon describes the Game Days they ran to test their recovery processes thus:
"In the exercises I designed, we used real failures. We would literally power off a facility—without notice—and then let the systems fail naturally and allowed the people to follow their processes wherever they led. In one of those exercises, we actually drew on my fire-service background to concoct a simulated fire. I wrote out the timing to the minute for that according to when we would expect certain things to happen as part of a full-scale fire response. Then, posing as some of the facilities guys, we called people in operations to update them on what was happening.
"My view is that you want to make these drills seem as real as possible in order to expose those special systems and backdoor boxes that some system administrators have been holding onto in case of emergency. Not everything in these exercises can be simulated, and once you start powering down machines or breaking core software components, the problems that surface are real. Still, it's important to make it clear that the "disaster" at the core of the exercise is merely simulated so people on the periphery don't freak out. Otherwise, what happens in the course of that exercise ought to feel just as real as possible."
I don't know if they had people try and break in - I'll try and find out - but I don't see that it would cause any more "major chaos" than the Game Day exercise discussed here, and so I can't agree that this is a valid objection.
I don't have specific metrics, but "many" infrastructure exploitations are caused by poor configuration management and the inappropriate (or missing!) application of security controls. Overly-permissive host- and network-firewalls, poorly written code that is vulnerable to XSS, etc.
This could be caused by lack of knowledge/training on the part of the system/application owner, lack of updates to same in order to keep up with attackers (they always get better, never worse), or a mistake in the code or configuration management / orchestration system. In any case, we should assume that system owners want to be made aware when a system is vulnerable to compromise or in violation of a security/compliance policy.
As the system owner (in the case of IaaS) or application owner (in the case of PaaS), you don't have any control over exploits/threats. But you do have control over your vulnerability and response to such threats. Security Monkey could be focused on detecting overly-risky or missing security controls, and notifying the system/application owner of these risks. For a system/data that is deemed very critical, Security Monkey could possibly even take action and shut down, quarantine, prevent the launch of the system, or otherwise apply a mitigating control to protect the system/application/data.
The goal should be to automate security testing of systems so that owners are aware of the implications of: (1) risky changes they're making to the infrastructure, and (2) changes attackers are making to Bad Guy tools. Ideally Security Monkey is part of an overall Security Awareness program, layers of belts and suspenders.
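The notify-or-act behavior described above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the `System` record, the `critical` flag, and the two-way `Action` choice are hypothetical names invented here, and a real tool would need a much richer policy than "critical means quarantine".

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    NOTIFY = "notify owner"
    QUARANTINE = "quarantine system"


@dataclass
class System:
    name: str
    owner: str
    critical: bool  # assumed flag set by whoever classifies the data
    missing_controls: list = field(default_factory=list)


def decide(system):
    """Pick a response: nothing if compliant, otherwise notify or quarantine."""
    if not system.missing_controls:
        return None
    return Action.QUARANTINE if system.critical else Action.NOTIFY


def build_report(systems):
    """List (system, owner, action, findings) for every non-compliant system."""
    report = []
    for s in systems:
        action = decide(s)
        if action is not None:
            report.append((s.name, s.owner, action, list(s.missing_controls)))
    return report


systems = [
    System("blog", "web-team", critical=False, missing_controls=["host firewall"]),
    System("payments", "billing-team", critical=True, missing_controls=["input validation"]),
    System("wiki", "it-team", critical=False),
]

report = build_report(systems)
for name, owner, action, findings in report:
    print(f"{name} ({owner}): {action.value} - missing {', '.join(findings)}")
```

Keeping the decision in one small function makes the policy easy to review and change, which matters if owners are expected to trust an automated tool that can shut their systems down.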








