Archive for the ‘Regular Expressions’ Category

Working with Regular Expressions

May 11th, 2010

Just a quick note: to help write your regular expressions, I highly recommend The Regulator for developing, tweaking, and testing them before they go into your code. It is available on SourceForge here:

http://sourceforge.net/projects/regulator/

One caveat: sometimes the config file gets corrupted and the program won’t load anymore. Navigate to the directory, delete the config file, then restart and you’re good to go.

Another tool you’ll need in your arsenal is a good reference, and the best I have found can be located here:

http://www.regular-expressions.info/

Keep on coding

Regular Expressions: Forward and Backward Lookups

May 11th, 2010

Several years ago I read up on regular expression syntax and was instantly hooked, because it made parsing text just so darn simple. But one regex feature I never actually had a need for was forward and backward lookups (usually called lookahead and lookbehind), much less both of them within the same regular expression… that is, until now.

Problem:

Write a regular expression to pull out everything not within {brackets}.

Ex. Input:

{ed ut perspiciatis}test{unde omnis iste} foo {natus error sit voluptatem}

This should pull out “test” and “foo” (as well as the space before “foo” and the space after it).

I would like to point out that I could not simply replace everything in brackets, since I was modifying the string to be in a particular format (in my case, another programming language). With that said, it does occur to me (quite suddenly, I might add) that it would have been simpler, and for that matter faster, to use a simple regular expression such as ({.*?}) and then use the Index and Length properties of each captured group to pull out everything else. Yet for some reason I got caught up in the theoretical enjoyment of doing it the hard way :)
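The simpler approach mentioned above can be sketched roughly as follows. This is a minimal illustration in Python rather than the .NET API the post is written against: the function name text_outside_braces is made up for the example, and m.start() / m.end() play the role of the index and length of each match.

```python
import re

def text_outside_braces(s):
    """Collect the pieces of s that are not inside {...} (hypothetical helper)."""
    pieces = []
    pos = 0  # end of the previous bracketed span
    for m in re.finditer(r"\{.*?\}", s):
        if m.start() > pos:
            # Everything between the previous span and this one is "outside".
            pieces.append(s[pos:m.start()])
        pos = m.end()
    if pos < len(s):
        # Trailing text after the last bracketed span.
        pieces.append(s[pos:])
    return pieces

sample = "{ed ut perspiciatis}test{unde omnis iste} foo {natus error sit voluptatem}"
print(text_outside_braces(sample))  # ['test', ' foo ']
```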

And so here is the hard way:

(?<!{[^}]*?)(?<=^|\s|})(?<Text>[^{][^}]*?)(?<TrailingChar>\s|$|{)

A couple of cool things here. First, named groups, which come in handy when you’ve got a lot of groups. Why count up the index if you can just name it? I also feel this makes the regex easier to understand and work with in case you have to revisit it in the future.

Second, the regular expression above required me to use .Replace(…., new MatchEvaluator(customFunc)). This is another cool feature I had never used before, and it kept things quite simple and neat in the code.
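To give a feel for named groups combined with a replacement callback, here is a small sketch in Python, where re.sub accepts a function in much the same spirit as a .NET MatchEvaluator. It is not the post's code: the pattern is a simplified stand-in (the original expression depends on variable-length lookbehind, which Python's re module does not support), and the names shout_outside_braces, Braced, and Text are invented for the example.

```python
import re

# Simplified stand-in pattern: either a whole {...} span or a run of
# text outside braces, each captured under its own name.
PATTERN = re.compile(r"(?P<Braced>\{[^}]*\})|(?P<Text>[^{]+)")

def shout_outside_braces(s):
    """Upper-case text outside {...}; leave bracketed text alone (hypothetical helper)."""
    def evaluator(m):
        # The callback receives each match and returns its replacement.
        if m.group("Text") is not None:
            return m.group("Text").upper()
        return m.group("Braced")
    return PATTERN.sub(evaluator, s)

sample = "{ed ut perspiciatis}test{unde omnis iste} foo {natus error sit voluptatem}"
print(shout_outside_braces(sample))
# {ed ut perspiciatis}TEST{unde omnis iste} FOO {natus error sit voluptatem}
```

Looking groups up by name keeps the callback readable even if more groups are added to the pattern later.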

Anyway, just a note about what Forward and Backward Lookups are and how they work:

Basically, a lookup is just that: it looks for a match either to the left (backward) or to the right (forward) of the current position, but since it’s only “looking”, it doesn’t consume any characters, so the text it checks is not included in the resulting match.

To look forward, use (?=[a-z]{1}); to look backward, use (?<=[a-z]{1}). You can negate these like so: (?![a-z]{1}) and (?<![a-z]{1}). Obviously, you can replace [a-z]{1} with whatever you need.
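As a quick illustration, in most regex flavors a forward lookup is written (?=…) or (?!…) and a backward lookup is written (?<=…) or (?<!…). The sketch below uses Python's re module, which shares this syntax with .NET for fixed-width lookups; the sample string is made up for the example.

```python
import re

s = "a1 b2 3c"

# Positive lookahead: a digit that is followed by a letter.
followed_by_letter = re.findall(r"\d(?=[a-z])", s)       # ['3']

# Positive lookbehind: a digit that is preceded by a letter.
preceded_by_letter = re.findall(r"(?<=[a-z])\d", s)      # ['1', '2']

# Negative lookahead: a digit that is NOT followed by a letter.
not_followed_by_letter = re.findall(r"\d(?![a-z])", s)   # ['1', '2']

# Negative lookbehind: a digit that is NOT preceded by a letter.
not_preceded_by_letter = re.findall(r"(?<![a-z])\d", s)  # ['3']

print(followed_by_letter, preceded_by_letter,
      not_followed_by_letter, not_preceded_by_letter)
```

In every case only the digit ends up in the result, because the letter is checked but never becomes part of the match.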

Let this serve as a reminder to both myself and you: if things seem overly complicated, it’s probably because you’re making them that way. But hey, this is fun, so who cares?