To kick off this blog, I think I'll start with something a little wacky. Krister Hedfors has created a package called inrex, which implements a bunch of regular expression "operators" ("inrex" is short for "infix regular expressions"). Here's how regular expressions are normally handled in Python:
>>> import re
>>> match = re.match(r'(\w+) (\d+)', 'asd 123')
>>> if match is not None:
...     print 'word is', match.group(1)
...     print 'digit is', match.group(2)
...
word is asd
digit is 123
>>> match = re.match(r'(?P<word>\w+) (?P<digit>\d+)', 'asd 123')
>>> if match is not None:
...     print 'word is', match.group('word')
...     print 'digit is', match.group('digit')
...
word is asd
digit is 123
>>> re.findall(r'\d+', 'asd 123 qwe 456')
['123', '456']
>>> re.split(r'\d+', 'asd 123 qwe 456')
['asd ', ' qwe ', '']
>>> re.split(r'\d+', 'asd 123 qwe 456', maxsplit=1)
['asd ', ' qwe 456']
Note that we need one statement to obtain the match object and a second statement to examine it. That's pretty standard Python, but it gets a little annoying sometimes. Here's how the same results are achieved with inrex:
>>> from inrex import match, search, split, findall, finditer
>>>
>>> if 'asd 123' |match| r'(\w+) (\d+)':
...     print 'word is', match[1]
...     print 'digit is', match[2]
...
word is asd
digit is 123
>>> if 'asd 123' |match| r'(?P<word>\w+) (?P<digit>\d+)':
...     print 'word is', match['word']
...     print 'digit is', match['digit']
...
word is asd
digit is 123
>>> 'asd 123 qwe 456' |findall| r'\d+'
['123', '456']
>>> 'asd 123 qwe 456' |split| r'\d+'
['asd ', ' qwe ', '']
>>> 'asd 123 qwe 456' |split(maxsplit=1)| r'\d+'
['asd ', ' qwe 456']
Working with the match object is clearly much easier. The limitation is that it only works for an immediate result: unlike the object returned by the standard re.match, the inrex match object is a singleton, so you can only work with one result at a time. For simple cases (the most common ones), a singleton match object suffices.
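To make the singleton limitation concrete, here's a minimal sketch of how such an operator could behave. The InfixMatch class is a hypothetical stand-in written for this post, not inrex's actual source; it only assumes that the left operand is captured first and the pattern applied second, with the result stored on the one shared object.

```python
import re


class InfixMatch(object):
    """Hypothetical stand-in for an inrex-style `match` operator (not inrex's code)."""

    def __init__(self):
        self._text = None
        self._result = None  # the single, shared match result

    def __ror__(self, text):
        # Evaluated first: 'text' | match  -> remember the left operand.
        self._text = text
        return self

    def __or__(self, pattern):
        # Evaluated second: (...) | r'pattern'  -> run the regex, store the result.
        self._result = re.match(pattern, self._text)
        return self._result is not None

    def __getitem__(self, key):
        # match[1] or match['name'] reads from the most recent stored result.
        return self._result.group(key)


match = InfixMatch()

if 'asd 123' |match| r'(\w+) (\d+)':
    first_word = match[1]
    # A second match overwrites the shared result ...
    if 'qwe 456' |match| r'(\w+) (\d+)':
        second_word = match[1]
    # ... so the outer result is no longer reachable here.
    word_after_inner = match[1]
```

After the inner match runs, word_after_inner holds the word from the second string rather than the first, which is exactly the "one result at a time" restriction.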
I've not seen the syntax for defining new infix operators before -- I assume it's using the recipe from http://code.activestate.com/recipes/384122/?
Is it thread-safe?
Wow, awesome. Thanks for the tip!
From what I can see it's definitely not thread-safe.
You can’t define arbitrary infix operators in Python, but if you make a custom class, you can define how +, -, *, |, >>, ~, etc. work for that class using the __magic__ and __rmagic__ methods.
Just do a dir() on an int to learn more.
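As a rough illustration of that point, the sketch below builds a generic a |op| b form out of __ror__ and __or__. The Infix class and the startswith operator are names made up for this example; whether inrex is built exactly this way isn't confirmed here.

```python
class Infix(object):
    """Wrap a two-argument function so it can be written as  a |op| b."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # a | op  -> bind the left operand, return an operator waiting for b.
        return Infix(lambda right: self.func(left, right))

    def __or__(self, right):
        # (a | op) | b  -> call the bound function with the right operand.
        return self.func(right)


# `startswith` is a hypothetical operator name chosen for this example.
startswith = Infix(lambda text, prefix: text.startswith(prefix))

result_true = 'asd 123' |startswith| 'asd'
result_false = 'asd 123' |startswith| '123'
```

This works because | is left-associative, so a |op| b is parsed as (a | op) | b, and because str doesn't define | itself, Python falls back to the operator object's __ror__.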
@Marius and @Richard
Thread safety could be added by using threading.local for the result.
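Here's a small sketch of what that suggestion might look like. ThreadLocalMatch is a hypothetical class for illustration only, not a patch to inrex; it simply keeps the pending text and the last result on a threading.local so each thread sees its own state.

```python
import re
import threading


class ThreadLocalMatch(object):
    """Hypothetical infix match operator that stores its state per thread."""

    def __init__(self):
        self._state = threading.local()

    def __ror__(self, text):
        # 'text' | match  -> remember the left operand for this thread only.
        self._state.text = text
        return self

    def __or__(self, pattern):
        # (...) | r'pattern'  -> store the result for this thread only.
        self._state.result = re.match(pattern, self._state.text)
        return self._state.result is not None

    def __getitem__(self, key):
        return self._state.result.group(key)


match = ThreadLocalMatch()
seen = {}


def worker(name, text):
    # Each thread matches its own text and reads back its own result.
    if text |match| r'(\w+) (\d+)':
        seen[name] = match[1]


threads = [
    threading.Thread(target=worker, args=('t1', 'asd 123')),
    threading.Thread(target=worker, args=('t2', 'qwe 456')),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The shared match name is still a single object, but the result it exposes depends on the calling thread. Nested matches within one thread would still overwrite each other.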
I was inspired by the overloaded operators here, and made a library to make such definitions easier. Find it at http://42017203.blogspot.com/2010/03/minioperators-adding-new-operators-to.html.
I wrote the inrex module. Please check out the brand new 'sqldict' module too!