Regular expressions aren’t just for brain surgeons
We’ve got (or will have) quite a lot of validation in Pushhit, and because we’re trying to de-couple Microsoft from ASP.NET wherever possible :) we’re not using any of the standard validation controls. This is mainly because those controls are generally a bit useless and output some nasty markup; they also don’t integrate well with our FLASH (our standard way of outputting error messages). Anyway, moving on… it’s kinda forced me to understand regular expressions better.
It seems that regular expressions are one of those things that every developer knows are invaluable, but they’re a bit of a bugger to get to grips with, and as there’s generally an alternative, it’s easy to avoid learning them. I know I’ve avoided using them at all costs in the past.
Anyway, with a copy of RegEx Buddy my understanding is now much better. Let me share a few key things I’ve worked out that confused me previously…
^ – this character marks the start of the string. If I start my regular expression with this, it means I start looking at the beginning of the string. For validation you therefore probably always want this.
$ – this marks the end of the string, so if I’m validating an email address I’ll want this on the end to reject trailing white space or any other rubbish after it.
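To see what the two anchors actually buy you, here is a minimal sketch in C#. The digits pattern and the class and method names are made up for illustration; they are not from the Pushhit code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // No anchors: succeeds if digits appear anywhere in the input.
    public static bool ContainsDigits(string input)
    {
        return Regex.IsMatch(input, "[0-9]+");
    }

    // ^ and $ pin the match to both ends, so the whole input must be digits.
    public static bool IsOnlyDigits(string input)
    {
        return Regex.IsMatch(input, "^[0-9]+$");
    }

    public static void Main()
    {
        Console.WriteLine(ContainsDigits("abc123def")); // True
        Console.WriteLine(IsOnlyDigits("abc123def"));   // False
        Console.WriteLine(IsOnlyDigits("123"));         // True
    }
}
```

One thing worth knowing: in .NET, $ also matches just before a final newline, so an input of "123\n" would still pass IsOnlyDigits. If that matters for your validation, \z is the stricter "very end of the string" anchor.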
Loops! – this is what got me the most. The repeat count (the quantifier) comes after the thing it repeats, and outside any brackets or grouping.
Here’s a very simple example. I wanted to remove all of the white space from the end of a string. This was for some textarea input that I wanted to make sure didn’t have any blank lines after it. The regular expression simply looks like this:
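A quick way to convince yourself of this is to put the same + after a single character, after a bracketed group, and after a set of characters. This is only an illustrative sketch; the patterns and names below are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // "ab+": the + only repeats the b, so this is an a followed by one or more b.
    public static bool MatchesAbPlus(string input)
    {
        return Regex.IsMatch(input, "^ab+$");
    }

    // "(ab)+": the + sits outside the group, so the whole "ab" repeats.
    public static bool MatchesGroupPlus(string input)
    {
        return Regex.IsMatch(input, "^(ab)+$");
    }

    // "[ab]+": the + sits outside the square brackets, so any mix of a and b repeats.
    public static bool MatchesSetPlus(string input)
    {
        return Regex.IsMatch(input, "^[ab]+$");
    }

    public static void Main()
    {
        Console.WriteLine(MatchesAbPlus("abbb"));    // True
        Console.WriteLine(MatchesAbPlus("abab"));    // False
        Console.WriteLine(MatchesGroupPlus("abab")); // True
        Console.WriteLine(MatchesGroupPlus("abbb")); // False
        Console.WriteLine(MatchesSetPlus("bbaab"));  // True
    }
}
```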
[\s]*$
If you know about regular expressions this is about as easy as you can get, but if you don’t it almost looks frightening. Let me explain what it means…
Square brackets [] – these define a set of characters, any one of which can match at that position. In this example the set contains just one thing:
\s – this represents any whitespace character: spaces, tabs or line breaks. (With only \s inside, the brackets are optional; \s*$ means exactly the same.)
* – this is what I was saying about the looping. It’s telling the regular expression engine to match the thing just before it (here the bracketed set, but it could equally be a capturing group, a named capture, etc.) zero or more times.
$ – as mentioned, this is saying “Look for the previous stuff, but it has to be at the end of the string”.
Using .NET, by importing System.Text.RegularExpressions it’s easy to use the expression by doing:
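Something along these lines, as a minimal sketch in C#. The class and method names are made up for illustration; Regex.Replace is the standard call for swapping whatever the pattern matches with a replacement string, here an empty one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingWhitespace
{
    // Replaces any run of whitespace at the end of the input with nothing.
    public static string Trim(string input)
    {
        return Regex.Replace(input, @"[\s]*$", "");
    }

    public static void Main()
    {
        // Spaces, a Windows line break and an extra blank line after the text.
        string cleaned = Trim("some text  \r\n\n");
        Console.WriteLine("[" + cleaned + "]"); // [some text]
    }
}
```

The @ in front of the string is a verbatim string literal, which saves doubling up the backslash in \s. Whitespace in the middle of the text is left alone because the $ forces the match to run right up to the end.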
Here’s another one I did recently (if any regexp vets can help me improve this feel free to comment).
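"^([A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{1,6} *, *)*$"
I won’t explain this one because I’d rather have a beer ;) but basically it checks that a string is a comma-delimited list of emails (allowing spaces around the commas if provided). Two things to watch: [A-Z] only covers upper case, so it needs to be run with case-insensitive matching, and as written every address, including the last one, has to be followed by a comma (an empty string also passes).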
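For anyone who wants a list where the commas only sit between addresses, one possible variation is sketched below. It reuses the single-address part of the pattern above unchanged and only rearranges the repetition; the class and method names are hypothetical, and this is one way of doing it rather than a definitive email validator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailListValidator
{
    // The single-address part of the pattern from the post, unchanged.
    private const string Address = @"[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{1,6}";

    // One address, then zero or more repeats of "comma plus another address".
    // IgnoreCase is needed because the character sets only list A-Z.
    private static readonly Regex ListPattern = new Regex(
        "^" + Address + "( *, *" + Address + ")*$",
        RegexOptions.IgnoreCase);

    public static bool IsValidList(string input)
    {
        return ListPattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidList("a@b.com"));           // True
        Console.WriteLine(IsValidList("a@b.com , c@d.org")); // True
        Console.WriteLine(IsValidList("a@b.com,"));          // False
        Console.WriteLine(IsValidList(""));                  // False
    }
}
```

Putting the first address outside the repeated group is what removes the need for a trailing comma, and it also means an empty string no longer passes.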
The most important thing for me learning regular expressions has been having a tool to easily run, debug and explain my expressions. RegEx Buddy does all that and more and it’s well worth the small license cost.
I can’t claim to be an expert with regular expressions yet, but I would now say I’m not frightened to use them :) If you’re having trouble with something, feel free to email me [taholder(AT)gmail.com]. I can’t guarantee I can help, but it’s good exercise for the brain to try!
