Following the Wh1t3 Rabbit - Practical Enterprise Security

Enterprise security organizations often find themselves caught between the ever-changing needs of an agile business and the ever-present, ever-evolving threats to that business. At the same time, all too often we security professionals get caught up in "shiny object syndrome," which leads us to spend poorly, allocate resources unwisely, and generally decouple from the organization we're chartered to defend. Knowing how to defend begins with knowing what you'll be defending, why it is worth defending, and who you'll be defending it from... and therein lies the trick. This blog takes the issue of enterprise security head-on, challenging outdated thinking and bringing a pragmatic, business-aligned, beyond-the-tools perspective... so follow the Wh1t3 Rabbit, and remember that tools alone don't solve problems; strategic thinkers are the key.

Rafal (Principal, Strategic Security Services)

Hybrid Analysis - The Answer to Static Code Analysis Shortcomings

    Given my previous article and the buzz it generated (both for and against the ideas I set forth)... I needed to hurry up and write the follow-on to "Static Code Analysis Failures".  I've had so many conversations with people about Hybrid Analysis and why static code analysis fails that I've come to realize a few things, and I want to share more of the basis for my mindset...

  • Definition: Most of the folks I've spoken to (from developer to security "expert") have a misguided idea of what "static code analysis" means.  With that in mind, here is the definition from Wikipedia...

Static code analysis is the analysis of computer software that is performed without actually executing programs built from that software (analysis performed on executing programs is known as dynamic analysis). In most cases the analysis is performed on some version of the source code and in the other cases some form of the object code. The term is usually applied to the analysis performed by an automated tool, with human analysis being called program understanding or program comprehension. 

  • Concept: The whole reason we (security professionals) have pushed to get a security tool into the "development" part of the SDLC is that it's easiest to fix defects while the code is still being worked on and built.  We all know patching after release is a recipe for disaster - no one will argue otherwise - so the idea in itself was sound.
  • Evolution: As security marches on, "static code analysis" has moved forward as well.  The traditional white-box testing strategy has gone from sending your code to a human being who would read it line by line and provide an analysis, to programs that are essentially smart enough to build the code, trace "source-to-sink", and construct data-flow models and attack-vector simulations.  I contend that we have now moved on to the next step of that evolution: building the code and doing a "source-to-sink" trace with simulated attack scenarios is no longer good enough. (More on this in a minute.)
  • Comprehension: It seems a few folks have missed the point of what I was saying.  I'm not arguing that this type of testing (white-box, source analysis, whatever you call it) shouldn't be done.  Quite the opposite, my friends - "code analysis" is vital to the success of any good security SDLC.
  • Solutions: Everyone quickly jumped on my case to defend whatever tools their company makes or uses... but I think I did my best not to attack any particular tool or vendor... hrmmm...

    OK, so now let's talk about what the answer to this whole mess of analyzing web-borne code is going to be.  How will we, as security professionals, continue to be relevant to developers and to the roots of the Software Development Lifecycle?  I tell you that the future lies in what I can only call "Hybrid Analysis".  This isn't a term I coined; in fact, it can be attributed to the developers of a certain software tool which is now built and distributed by the HP ASC group (to which I belong).  Without turning this into a marketing pitch, I want to explain why the leap I made in the bullet point titled "Evolution" above is valid, and why you should care.

    I love the quote I put in my first article, so I think it's worth repeating: "Machines do not execute source code, they execute machine code"... absolutely true.  So when people worked only with source code, we can now understand why they only got the answers part of the time.  Yes, I fully realize that most modern source-scanner (or "white-box scanner") tools don't just grep through code, as someone suggested in a blog post that quoted me.  In fact, I'll go out on a limb and say that grepping through code for "vulnerabilities" is about as effective as anti-virus and IDS (both pattern-based recognition)... there are so many permutations of the bad and malicious that those tools are inherently broken.  Anyway - to get back on track, I want to talk about this term "hybrid analysis".
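
    To show why naive pattern matching fails in both directions, here is a small hypothetical C snippet (invented purely for illustration, not taken from any scanner's rule set): a text search for a "dangerous" function name flags a call that is actually guarded, and completely misses the same unsafe operation when it hides behind a hand-rolled wrapper.

    /*
     * Hypothetical illustration of why grep-style pattern matching fails
     * in both directions.  Not from any real scanner's rule set.
     */
    #include <string.h>

    #define MAX_NAME 32

    /* A text search for "strcpy" flags this, yet the copy is bounded by
     * the length check (assuming dst really holds MAX_NAME bytes). */
    void set_name_checked(char dst[MAX_NAME], const char *src)
    {
        if (strlen(src) < MAX_NAME)
            strcpy(dst, src);
    }

    /* The same unsafe copy, hand-rolled: no "strcpy" token to match on. */
    static void raw_copy(char *dst, const char *src)
    {
        while ((*dst++ = *src++) != '\0')
            ;
    }

    /* Not flagged by the pattern, yet it overflows for any long input. */
    void set_name_unchecked(char dst[MAX_NAME], const char *src)
    {
        raw_copy(dst, src);
    }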

    What is Hybrid Analysis?  (And more importantly, why do you care?)  Hybrid analysis is the culmination of, and what I feel is the inevitable cross between, white-box and black-box testing.  I'm not sure whether "gray-box" is the proper term or not, so I'll keep using Hybrid Analysis for the sake of not mixing things up.  Hybrid analysis analyzes the byte-code that a compiler generates and builds your standard "source-to-sink" data-flow model, but then moves beyond the limitations of that approach by taking that "knowledge" of the application and feeding it seamlessly into a (modified) black-box scanner built into the solution.
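
    To make the idea concrete, here is a minimal sketch in C (using libcurl) of the confirmation half of that loop.  This is emphatically not the internals of any particular product; the struct, the function names, and the probe marker are all invented for illustration.  The point is simply that a candidate finding from the static pass becomes an actual request against the running application, and only findings whose evidence comes back are reported.

    /*
     * Conceptual sketch only: NOT the internals of any particular product.
     * A candidate source-to-sink finding from the static pass is replayed
     * against the running application, and only findings whose evidence
     * actually comes back are reported.  Requires libcurl; every name here
     * is invented for illustration.
     */
    #include <curl/curl.h>
    #include <stdio.h>
    #include <string.h>

    struct finding {
        const char *url;      /* entry point identified by the static pass     */
        const char *payload;  /* probe built from the traced source-to-sink    */
        const char *marker;   /* evidence expected in the response if it works */
    };

    static char   response[65536];
    static size_t response_len;

    /* libcurl write callback: accumulate the HTTP response body. */
    static size_t capture(char *data, size_t size, size_t nmemb, void *userdata)
    {
        size_t n = size * nmemb;
        (void)userdata;
        if (response_len + n < sizeof(response)) {
            memcpy(response + response_len, data, n);
            response_len += n;
            response[response_len] = '\0';
        }
        return n;   /* tell libcurl the data was consumed */
    }

    /* Replay one candidate finding against the live application. */
    static int confirm(const struct finding *f)
    {
        char probe_url[2048];
        CURL *curl = curl_easy_init();
        if (curl == NULL)
            return 0;

        /* e.g. http://bank.example/search?q= plus payload (hypothetical target) */
        snprintf(probe_url, sizeof(probe_url), "%s%s", f->url, f->payload);

        response_len = 0;
        curl_easy_setopt(curl, CURLOPT_URL, probe_url);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, capture);
        curl_easy_perform(curl);
        curl_easy_cleanup(curl);

        /* Exploitable only if the probe's marker shows up in the response. */
        return strstr(response, f->marker) != NULL;
    }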

    Picture it: you have the blueprints of the bank vault, so you can work out all the ways to theoretically break into the vault, then you hand those theoretical attack plans to a grunt who takes your information and actually tries each of those attack vectors, with multiple permutations and attack parameters, to make the score and break in.  That, my friends, is the proverbial Holy Grail of application security testing.  Data modeling will only get you so far before you have to actually try the attack to make sure it really works.  Now, given the many parameters involved - external libraries, compiler behavior, and so on - that influence the way code actually behaves, it's conceivable that you could accomplish the feat I've described without the hybrid analysis approach (the modified black-box scanner)... but then I think you're looking at an incredibly complex, almost AI-driven analysis tool, and I simply don't believe that technology exists or will exist.  I'm the kind of person who has to see something being broken before I'll believe it's real.  Give me the proof.

    If you take the hybrid approach, the proof looks you in the face each and every time.  As a nice side effect, you can virtually forget false positives!  Source code scanners are infamous (notorious, even!) for generating a lot of false positives.  This has always been one of the many reasons developers argue against using these tools.  But what if you could offer your developers a way to eliminate those false positives with little or no human intervention or double-checking?  If you're still skeptical, I'll be happy to have someone from our team demonstrate how this type of approach works - yes, we have it exclusively in our toolset.

    So let's recap.  Here are all the ways that Hybrid Analysis will save the world (joking):

  • Reaches well beyond modern "source-to-sink" data-flow modeling for vulnerability detection
  • Addresses 3rd party components by reflection (analyzing byte-code or IL of DLLs, JARs, etc)
  • Provides a real validation of the theoretical attack scenarios that the above step generates
  • Virtually eliminates false-positives!  This is a nice side-effect of testing using the Hybrid Analysis method

Static Code Analysis Failures

Static code analysis failures are costing enterprises money and reputation.

White-box security testing is inherently a flawed proposition for many reasons - but it all comes down to a very simple concept:

  Machines do not execute source code, they execute machine code (compiled code). --Paul Anderson (GrammaTech)

  If you think this through for a minute, you realize there are a few specific reasons why the above statement fundamentally changes the way people look at white-box testing, and why it is a losing proposition.  Let's analyze this in the context of a web application project for a mythical online bank.  The use case here is a bank with an online presence (the application currently being analyzed) which will be integrated with a series of existing legacy applications, partners, and external 3rd party components.  Given this information, let's analyze why white-box analysis (or static source-code analysis) is doomed to fail this project with respect to security.

  • Compiler Optimizers Break Things - Think of it this way: compilers are designed to make machine code from your source code.  The compiler's goal (in most cases) is to produce machine code that is optimized and extremely fast-executing, but not necessarily secure.  Oftentimes, security measures that people build into source code can be removed by compiler optimizers, most often without our knowledge.  These optimizations can undo the very security features that developers consciously insert into their code.  Consider the following example:
    • Developer is paranoid about data-persistence in memory space, and wants to be doubly-sure that variables are expired and destroyed
    • Developer writes a routine whereby the variable will have a null value written to it before the memory is freed
    • Compiler optimizer sees this as a double-work scenario, and removes the null-value portion and simply opts to free memory
    • A potential security vulnerability is created with variable persistence in freed memory space

This example demonstrates precisely how a security vulnerability can be introduced in spite of the developer's best efforts to write secure code.  Standard static-code analysis tools that "scan code" at the static-file level will fail to catch this vulnerability.  Quite simply, static code analysis fails if it is not supplemented with dynamic analysis.
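
Here is a minimal C sketch of the scenario described above.  It is a hypothetical illustration, not output from any particular scanner or compiler: the memset() is the developer's scrubbing routine, and an optimizing compiler is free to treat it as a dead store because the buffer is never read again before free().  Whether a given compiler actually removes it depends on the compiler and its flags; this shows the pattern, not a guarantee.

    /*
     * Hypothetical illustration (not taken from any real codebase).
     * The developer scrubs a secret before freeing its buffer; an
     * optimizing compiler, seeing no further reads of that buffer,
     * may treat the memset() as a dead store and drop it.
     */
    #include <stdlib.h>
    #include <string.h>

    void handle_secret(const char *secret)
    {
        size_t len = strlen(secret) + 1;
        char *copy = malloc(len);
        if (copy == NULL)
            return;
        memcpy(copy, secret, len);

        /* ... authenticate / use the secret ... */

        /* Developer's intent: wipe the secret before releasing the memory. */
        memset(copy, 0, len);   /* may be removed by the optimizer as a dead store */
        free(copy);             /* the secret can persist in freed memory          */
    }

This gap is exactly why C11 added memset_s() and why platforms provide calls like explicit_bzero(), which are guaranteed not to be optimized away.  A tool that only ever looks at the source file sees the scrub and is satisfied; the compiled binary may tell a different story.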

  • 3rd Party Library Integrations - There is another threat to developing and scanning static code in a white-box fashion.  Inevitably, 3rd party libraries are used to supply features or functionality that are not natively provided by the local development effort.  After all, no one re-invents the whole wheel every time - we build what we cannot reuse from someone else's work, then use publicly available 3rd party libraries to fill in the functionality and features that have already been written and (hopefully) tested.  White-box testing (or static code analysis) will absolutely fail to find flaws when it comes to pulling in 3rd party libraries.  By the nature of this type of issue, 3rd party libraries rarely provide you with source that can be scanned and checked for weaknesses affecting your application.  What you're left with is someone else's code (in machine-compiled format!) interacting with your application.  Would you trust that model?
  • Static Code Analysis Rarely Understands Data-Flow Modeling (Data Tracing) - If you're scanning your application with a source-code-only analysis tool, you're not only going to miss things that will almost certainly come back to haunt you - you may also be over-working yourself without a real purpose.  Consider the following example to illustrate my point.  Before I get into it, allow me to explain the idea of "data-flow modeling" for those who are not familiar with it.  Data-flow modeling seeks to understand how data moves through your application, not just how the application code is written.  After all, that's the whole point of the application: to work with data.  Vulnerabilities lie in manipulating data either to or from the end users or the server(s).  Data-flow modeling maps out the data in your application from its instantiation (maybe when the user types it in) to its resting state (maybe when it's finally written to a database, or handed off to another application or service for additional work).  That being said, consider a web application that has 1,000 forms across 100 pages, written in the language of your choice and built to be AJAX-driven.  While each page does nothing individually to validate user input (the data source), all variables (data) are filtered through a central validation module deep within the application logic.  A standard source-code analysis tool (I have evaluated this and can honestly say this is a real use case, but will not mention the tool) will flag each and every input that is not validated within the page as vulnerable to hundreds of vulnerabilities, ranging from XSS (Cross-Site Scripting) to SQL Injection and other attack types.  What you are left with is a very lengthy report with hundreds of critical and high vulnerabilities that you now obviously must address... unless you do some dynamic analysis on the code and realize that *none* of those theoretical vulnerabilities are exploitable, because the application filters all data through the central validator/scrubber (a sketch of this pattern follows below).
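
Here is a minimal, hypothetical sketch of that central-validator pattern, reduced to plain C so it fits in a few lines.  A real web application would spread this across many pages or AJAX endpoints, and every name below is invented for illustration.  A tool that scans render_account_page() file by file flags the query as injectable; following the data flow through dispatch_request() shows that every parameter is scrubbed first.

    /*
     * Hypothetical sketch of the "central validator" pattern.
     * All names are invented for illustration.
     */
    #include <stdio.h>

    /* Central scrubber: every request parameter is filtered here, once. */
    static void sanitize(char *value)
    {
        for (char *p = value; *p != '\0'; ++p) {
            if (*p == '\'' || *p == ';' || *p == '-')
                *p = '_';   /* crude neutralization, purely for illustration */
        }
    }

    /* Page handler: scanned file by file, this looks like SQL injection. */
    static void render_account_page(const char *account_id)
    {
        char query[256];
        snprintf(query, sizeof(query),
                 "SELECT * FROM accounts WHERE id = '%s'", account_id);
        /* ... execute query, render page ... */
        puts(query);
    }

    /* Dispatcher: the only path into the handler runs through sanitize(),
     * which is why the per-page findings are not actually exploitable here. */
    void dispatch_request(char *raw_account_id)
    {
        sanitize(raw_account_id);
        render_account_page(raw_account_id);
    }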

So, there you have it.  Static code analysis is inherently doomed to fail.  White-box testing of source code alone is flawed.  The sky is falling, global warming will kill us all.  In my next installment of this column, I'll give you what you need to know to avoid failing in your security initiatives at the development stage of the SDLC - remember, knowing is half the battle.

 Stay tuned!

If this information disturbs you and you would like to talk about it, please don't hesitate to email me directly.  I am not a sensationalist, and I pride myself on presenting practical, realistically attainable solutions to real-world problems.  Thanks for reading.
