Digging into ASP.NET RegEx Validators


RegEx Validators are handy for implementing Whitelist input validation (our DevInspect product has a library of a hundred or so) so it pays to see what they  actually do under the covers. The following code is from the class
System.Web.UI.WebControls.RegularExpressionValidator which implements the RegEx Validator in ASP.NET. I'm looking at version 2.0 of the .NET framework. The EvaluteIsValid() function is what actually determines if an input matches the allowed regex:



protected override bool EvaluateIsValid()
{
string controlValidationValue
= base.GetControlValidationValue(base.ControlToValidate);

if ((controlValidationValue == null)
|| (controlValidationValue.Trim().Length == 0))
{
return true;
}
try
{
Match match = Regex.Match(controlValidationValue,
this.ValidationExpression);
return ((match.Success && (match.Index == 0))
&& (match.Length == controlValidationValue.Length));
}
catch
{
return true;
}
}

Lets look at the first if statement closely. The conditional is a logical OR, so we can treat each part as individual if statements. The code says "if the control value is null, the input passes this validator." The control value can only be null if the control doesn't exist, and chances are you would have had a compile error. Either way, I don't really care about this statement.



The second part of the if statement is interesting. If the trimmed value of the input has no length, the input passes validation. This is a little weird. Trim() removes all leading and trailing whitespace from a string. Things like tab, carriage return, linefeed, etc. The full list is in the .NET String Class documentation. So input consisting of nothing but whitespace passes any ASP.NET RegEx Validator, regardless of the regex. This is counter-intuitive. If I set a RegEx validator to use + whitespace-only input should not get through. More interesting is that characters like vertical tab, CR, LF, and form feed are often interpreted as delimiters, so input with nothing but whitespace could in fact do some weird things in a backend app.


Moving along, we see that the input is matched against the supplied patterned. If a match is found and that match begins with the first character of the input and the length of the matched text is the same as the length of the input, then the input passes the validator. Essentially, this is saying that the entire input must match the regex defined. This is good and bad. Its good because I often see developers define a whitelist and forget to use the the "start position" and "end position" characters (^ and $ respectively). Consider an simplified regex for valid email address +@+\.{3}. (This RegEx excludes valid email addresses and is just to illustrate the point). This RegEx matches email addresses like billy@hp.com. It also matches email address like <script>alert("XSS")</script>billy@hp.com because the regex doesn't force that the entire input must match the regex pattern, but merely some subset of the input matches. The RegEx should be ^+@+\.{3}$ where ^ and $ force the entire input to match.


I often see developers make this mistake and in a way Microsoft is solving the problem by saying "Developers are stupid we will also match the entire string so they don't have to." Ok, I like erring on the side of caution, but this prohibits advanced users from using $ and ^ appropriately. Microsoft completely usurps the developer and removes the choice from their hands. You have to use a CustomValidator class to implement RegExs which utilize just ^ or $. However I can live with this because it undoubtedly saves the Internet for silly mistakes.


A final thing that caught my eye was the try ... catch ... block. If the Regex.Match() call throws an exception, the validator returns true indicting the input is safe. This means in event of an error, the validator fails open instead of failing closed! Deciding when applications/appliances/software/hardware/structures should fail open or fail closed is way beyond the scope of this post and the answer is almost always circumstantial based on the individual situations. Quick, should firewalls fail open or closed? Fail open? Well then an attacker knocks out your firewalls and its open seasons on the FTP servers and Samba shares inside your organization. Fail closed? Thats a nifty DoS you built into your network infrastructure now isn't it? when should input validation fail open or fail closed? Again depend, but my gut tells me it should fail closed more often than it fails open.


Ignoring how it should fail is moot if you can't make it fail. How can we make Regex.Match() throw an exception? Null strings would do it, but a brief glance at the code paths in ASP.NET seems to prevent that from occuring. Invalid RegEx syntax would do it, but you get compile-time problems or an immediate runtime error during the Page_Load event, long perform input is validated. So RegEx Validators in ASP.NET fail open if there is an exception, but I can't seem to get it to exception in an exploitable way to sidestep input validation. Perhaps someone else can take a swing at it :-)

Comments
(anon) | ‎12-27-2007 12:48 AM
How to validate for <SCRIPT>, <style> tags, Javascript alert, confirm etc.. so that they won't spoil the look and feel of any page.<BR> <BR> Thanx in Advance.</SCRIPT>
Leave a Comment

We encourage you to share your comments on this post. Comments are moderated and will be reviewed
and posted as promptly as possible during regular business hours

To ensure your comment is published, be sure to follow the Community Guidelines.

Be sure to enter a unique name. You can't reuse a name that's already in use.
Be sure to enter a unique email address. You can't reuse an email address that's already in use.
Type the characters you see in the picture above.Type the words you hear.
Search
Showing results for 
Search instead for 
Do you mean 
About the Author
Featured


Follow Us
The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.