tag:blogger.com,1999:blog-58749121120647145062024-03-14T02:51:28.370-07:00Bit Of CheeseHighlighting interesting packages in the Cheese Shop (aka PyPI)Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.comBlogger12125tag:blogger.com,1999:blog-5874912112064714506.post-67051342043494702622013-05-07T14:39:00.000-07:002013-05-08T16:57:56.387-07:00Local PyPI Options<div dir="ltr" style="text-align: left;" trbidi="on">
Having a central package repository has helped the Python community immensely through sharing reusable code. There are a few issues that arise when you start depending on such a resource though, and to solve them you may need to:<br />
<ol style="text-align: left;">
<li>make your installs resilient against Internet/PyPI issues,</li>
<li>speed up your installs significantly (after the first one),</li>
<li>prevent problems installing packages that are removed from distribution by the author,</li>
<li>allow installation of packages from within a firewalled environment where the host performing the installation does not have Internet access, and</li>
<li>allow hosting and installation of private packages.</li>
</ol>
<div>
All while imposing as little overhead on the package users as possible (i.e. maintenance of a system performing the above should be either low or none).</div>
<div>
<br /></div>
<div>
Searching for "PyPI" on the package repository is somewhat daunting (the page of results seems to go on forever). Having done a bit of a survey of the top hits there seems to be only a few packages that are relevant to the above requirements (presented here in the order that the PyPI search ranks them):</div>
<div>
<ul style="text-align: left;">
<li><a href="https://pypi.python.org/pypi/Flask-Pypi-Proxy">Flask-Pypi-Proxy</a> - A semi-proxy that supports private package upload. Its dependencies are quite hefty and it does not mirror packages locally.</li>
<li><a href="https://pypi.python.org/pypi/pyramidpypi">pyramidpypi</a> - "This is a very simple pypi-like server written with the pyramid web framework." Pyramid is a very hefty dependency for such a simple server and it only supports private package upload.</li>
<li><a href="https://pypi.python.org/pypi/simplepypi">simplepypi</a> - a very simple local repository allowing upload of packages and installation of them.</li>
<li><a href="https://pypi.python.org/pypi/yopypi">yopypi</a> - is a "load balancer" which punts requests to a mirror automatically when the primary PyPI is unavailable.</li>
<li><a href="https://pypi.python.org/pypi/djangopypi">djangopypi</a> / <a href="https://pypi.python.org/pypi/djangopypi2">djangopypi2</a> - are both PyPI servers acting as local repositories with the same user interface as the real thing. No proxying, though there is a manual tool <a href="https://pypi.python.org/pypi/infi.pypi_manager">infi.pypi_manager</a> which may be used to mirror packages to a local djangopypi.</li>
<li><a href="https://pypi.python.org/pypi/inupypi">inupypi</a> - appears to also be a "load balancer".</li>
<li><a href="https://pypi.python.org/pypi/mypypi">mypypi</a> - another local PyPI server using Zope 3.</li>
<li><a href="https://pypi.python.org/pypi/pypiserver">pypiserver</a> - serves files out of local directories or redirects to the real server if not found. Handles upload of private packages. No proxying for missing packages, though it does have a facility for updating packages which are already in the local directories.</li>
<li><a href="https://pypi.python.org/pypi/pyshop">pyshop</a> - another private repository implementation with access controls built in. It also performs caching proxy of packages not present locally. Hefty dependencies (Pyramid but also an SQL database).</li>
<li><a href="https://pypi.python.org/pypi/spynepi">spynepi</a> - a proxying server with local storage which also handles local upload of private packages! In Twisted. Using "spyne" which is some RPC mechanism and I don't know what it's got to do with PyPI serving. Hefty dependencies.</li>
<li><a href="https://pypi.python.org/pypi/chishop">chishop</a> - <i>another</i> simple local repository with upload written in Django.</li>
<li><a href="https://pypi.python.org/pypi/ClueReleaseManager">ClueReleaseManager</a> - yet another local repository though with full meta-data support and what appears to be proxying of PyPI meta-data, but not files.</li>
<li><a href="https://pypi.python.org/pypi/pyroxy">pyroxy</a> - a proxying index server which can serve local files (but without local caching of proxied files).</li>
<li><a href="https://pypi.python.org/pypi/scrambled">scrambled</a> - a very simple server of local files (point it at a directory and run).</li>
<li><a href="https://pypi.python.org/pypi/devpi-server">devpi-server</a> - a transparent caching proxy with local storage of the files accessed. Uses a redis database, which is an additional dependency that is a problem in my deployment scenario.</li>
<li><a href="https://pypi.python.org/pypi/collective.eggproxy">collective.eggproxy</a> - implements caching proxy but has hefty dependencies. Also seems to be very fetch-happy, retrieving eggs I don't actually need.</li>
</ul>
<div>
A lot of the implementations above have a bunch of user controls built into them. And there's an awful lot of "simple PyPI in framework X" implementations. Most of the "proxy" solutions (save pyshop, devpi-server and collective.eggproxy) required manual download of the package files, or they just proxied their requests through to the Internet with no local file storage for speed/resilience. Those others had dependencies that prevented me easily installing them into my target environment.<br />
<br />
So none of them fit the bill, and none appeared to be easily modifiable to do what I want. So, I wrote my own: <a href="https://pypi.python.org/pypi/proxypypi">proxypypi</a> :-)<br />
<br />
When <b>proxypypi</b> is asked about a package it doesn't know, it automatically goes off and fetches the file download list for the package, rewriting all references (PyPI and external) so they appear to be local. On request of one of those now-local package files it performs a background fetch of the file contents and serves up the new file data to the pip request (thus keeping that request alive despite its very short timeout duration).</div>
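The link rewriting described above can be sketched roughly as follows. This is only an illustration and not proxypypi's actual code: the <code>/d/</code> prefix, the <code>rewrite_links</code> name and the idea of embedding the percent-encoded remote URL in the local path are all assumptions made for the example.<br />

```python
import re
from urllib.parse import quote

# Hypothetical prefix under which the local proxy serves package files.
LOCAL_PREFIX = "/d/"


def rewrite_links(index_html, local_prefix=LOCAL_PREFIX):
    """Point every absolute href in a file list page at the local proxy."""
    def localise(match):
        remote_url = match.group(2)
        # Keep the remote URL (percent-encoded) in the local path so the
        # proxy knows where to fetch the file from on first request.
        return (match.group(1) + local_prefix
                + quote(remote_url, safe="") + match.group(3))
    # Only absolute http(s) links are rewritten; relative links are left alone.
    return re.sub(r'(href=")(https?://[^"]+)(")', localise, index_html)


page = '<a href="https://example.com/packages/demo-1.0.tar.gz">demo-1.0.tar.gz</a>'
print(rewrite_links(page))
```

Because the rewritten link still carries the original location, a proxy built this way would not need a separate lookup table to find the upstream file when pip first asks for it.<br />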
</div>
</div>
Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com12tag:blogger.com,1999:blog-5874912112064714506.post-20977447377057438432012-02-26T20:41:00.002-08:002012-02-26T20:46:58.232-08:00A couple of new modules for messing about with objects<div dir="ltr" style="text-align: left;" trbidi="on">
<b><a href="http://pypi.python.org/pypi/python-blueprint/">blueprint</a></b> - a neat tool/library that allows data objects to be used as bases for new data objects. And other cool stuff. "Think of it as prototypal inheritance for Python! " I see a lot of potential in video games with procedural content.<br />
<b><a href="http://pypi.python.org/pypi/objectifier">objectifier</a></b> - in a similar (messing about with objects) vein, here we create objects from dictionaries.</div>Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com0tag:blogger.com,1999:blog-5874912112064714506.post-89908221261180358652012-02-12T18:53:00.000-08:002012-02-12T18:53:46.972-08:00reStructuredText to ...<div dir="ltr" style="text-align: left;" trbidi="on">
Convert <a href="http://docutils.sourceforge.net/rst.html">reStructuredText</a> files to...<br />
<br />
<b><a href="http://pypi.python.org/pypi/rst2blogger">rst2blogger</a></b> - HTML to post on <a href="http://blogger.com/">blogger.com</a> blogs. Hmm, thanks Doug :-)<br />
<b><a href="http://pypi.python.org/pypi/rst2hatena">rst2hatena</a></b> - posts for <a href="http://www.hatena.com/">Hatena's</a> Diary service.<br />
<b><a href="http://pypi.python.org/pypi/rst2marsedit">rst2marsedit</a></b> - HTML that can be used with the <a href="http://www.red-sweater.com/marsedit/">MarsEdit</a> blogging tool.<br />
<b><a href="http://pypi.python.org/pypi/rst2atom">rst2atom</a></b> - skip the whole "blog" thing and send your rst directly to <a href="http://www.atomenabled.org/">XML ATOM 1.0</a> feed readers.<br />
<b><a href="http://pypi.python.org/pypi/blohg">blohg</a></b> - blog posts (stored in <a href="http://mercurial.selenic.com/">Mercurial</a>.)<br />
<br />
<b><a href="http://pypi.python.org/pypi/rst2beamer">rst2beamer</a></b> - the <a href="https://bitbucket.org/rivanvx/beamer">Beamer LaTeX</a> document class for presentations.<br />
<b><a href="http://pypi.python.org/pypi/slides">slides</a></b> - the <a href="https://bitbucket.org/rivanvx/beamer">Beamer LaTeX</a> document class for presentations.<br />
<b><a href="http://pypi.python.org/pypi/rst2odp">rst2odp</a></b> - odp files for <a href="http://www.openoffice.org/product/impress.html">OpenOffice Impress</a>. Probably nicer than using the GUI ;-)<br />
<a href="http://pypi.python.org/pypi/rst2slides"><b>rst2slides</b></a> - an HTML5 slideshow.<br />
<b><a href="http://pypi.python.org/pypi/bruce">bruce</a></b> - an interactive OpenGL slideshow.<br />
<a href="http://pypi.python.org/pypi/StarScream"><b>StarScream</b></a> - a DHTML slideshow.<br />
<b><a href="http://pypi.python.org/pypi/landslide">landslide</a></b> - an HTML slideshow.<br />
<br />
<b><a href="http://pypi.python.org/pypi/handcrank">handcrank</a></b> - a static website.<br />
<b><a href="http://pypi.python.org/pypi/soho">soho</a></b> - a static website.<br />
<b><a href="http://pypi.python.org/pypi/flask-rst">flask-rst</a></b> - a static website.<br />
<b><a href="http://pypi.python.org/pypi/rest2web">rest2web</a></b> - a static website.<br />
<b><a href="http://pypi.python.org/pypi/cyrax">cyrax</a></b> - a static website.<br />
<br />
<b><a href="http://pypi.python.org/pypi/Sphinx">sphinx</a></b> - your project's documentation. And then some*.<br />
<br />
<b><a href="http://pypi.python.org/pypi/rst2pdf">rst2pdf</a></b> - PDF using <a href="http://pypi.python.org/pypi/reportlab">ReportLab</a>.<br />
<b><a href="http://pypi.python.org/pypi/rst2texinfo">rst2texinfo</a></b> - <a href="http://www.gnu.org/software/texinfo/">texinfo</a> (<span style="background-color: white; text-align: -webkit-auto;">the official documentation format of the </span><a href="http://www.gnu.org/" style="background-color: white; text-align: -webkit-auto;">GNU project</a><span style="background-color: white; text-align: -webkit-auto;">.)</span><br />
<span style="background-color: white; text-align: -webkit-auto;"><b><a href="http://pypi.python.org/pypi/rst2xaml">rst2xaml</a></b> - </span><a href="http://msdn.microsoft.com/en-us/library/aa970909.aspx">XAML</a> for WPF and Silverlight / Moonlight.<br />
<b><a href="http://pypi.python.org/pypi/rstex">rstex</a></b> - a more powerful version of the built-in LaTeX support docutils provides (inline math, equations, references, and raw latex.)<br />
<a href="http://pypi.python.org/pypi/epubmaker"><b>epubmaker</b></a> - <span style="background-color: white; font-family: Arial, Verdana, Geneva, 'Bitstream Vera Sans', Helvetica, sans-serif; font-size: 15px; line-height: 21px; text-align: -webkit-auto;"><a href="http://en.wikipedia.org/wiki/EPUB">EPUB</a>.</span><br />
<br />
<a href="http://pypi.python.org/pypi/restxsl" style="font-weight: bold;">restxsl</a> - XML using XSLT. Oh, yes.<br />
<br />
Sorry if your tool isn't on this list. I tried :-)<br />
<br />
<br />
<br />
* there's a whole other blog post listing Sphinx extensions... later... :-)</div>Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com6tag:blogger.com,1999:blog-5874912112064714506.post-32556857057456505302012-02-08T19:04:00.000-08:002012-02-08T19:04:47.079-08:00A couple of useful tools<div dir="ltr" style="text-align: left;" trbidi="on">
<b><a href="http://pypi.python.org/pypi/logging_tree">logging_tree</a></b> - introspect and display the logger tree inside the standard library's "logging" package. This could be an invaluable tool to discover what's really going on in your application's logging - and in particular perhaps why logging isn't working how you think it should.<br />
<b><a href="http://pypi.python.org/pypi/hgtools">hgtools</a></b> - adds support for Mercurial in setuptools, both for the basics like listing the files under revision control (so find_packages and include_package_data can do their work without needing explicit listings of files in MANIFEST.in) but also supporting pulling the version number from the repository tag so it doesn't have to be duplicated. The git equivalent appears to be <a href="http://pypi.python.org/pypi/setuptools-git">setuptools-git</a> (formerly known as gitlsfiles.)</div>Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com0tag:blogger.com,1999:blog-5874912112064714506.post-12574166419539956602012-02-05T22:07:00.000-08:002012-02-05T22:09:20.608-08:00A few more random bits<div dir="ltr" style="text-align: left;" trbidi="on">
<b><a href="http://pypi.python.org/pypi/localshop">localshop</a></b> - "really, really alpha" but promising local PyPI mirror / private repository. Yes, <i>another</i> one. This one might just be the one to meet <i>my</i> specific requirements though...<br />
<br />
<b><a href="http://pypi.python.org/pypi/pytagcloud">pytagcloud</a></b> - is one to watch: make tag clouds as PNG images or HTML. Usage is a bit fiddly at the moment and I couldn't replicate the results they got. I think the key is having a good tag (interesting word) extractor. This bit of code might come in handy when experimenting with it:<br />
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">re</span>
<span class="kn">from</span> <span class="nn">roundup.backends.indexer_common</span> <span class="kn">import</span> <span class="n">STOPWORDS</span>
<span class="kn">import</span> <span class="nn">requests</span><span class="o">,</span> <span class="nn">collections</span><span class="o">,</span> <span class="nn">bs4</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">'http://www.python.org/about/'</span><span class="p">)</span><span class="o">.</span><span class="n">text</span>
<span class="n">text</span> <span class="o">=</span> <span class="n">bs4</span><span class="o">.</span><span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">soup</span><span class="p">)</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">'div'</span><span class="p">,</span> <span class="nb">id</span><span class="o">=</span><span class="s">'content-body'</span><span class="p">)</span><span class="o">.</span><span class="n">get_text</span><span class="p">()</span>
<span class="n">counts</span> <span class="o">=</span> <span class="n">collections</span><span class="o">.</span><span class="n">defaultdict</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
<span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'\W+'</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">word</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">STOPWORDS</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">word</span><span class="p">)</span><span class="o">></span><span class="mi">2</span><span class="p">:</span>
        <span class="n">counts</span><span class="p">[</span><span class="n">word</span><span class="o">.</span><span class="n">lower</span><span class="p">()]</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">words</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">((</span><span class="n">count</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span> <span class="k">for</span> <span class="n">word</span><span class="p">,</span> <span class="n">count</span> <span class="ow">in</span> <span class="n">counts</span><span class="o">.</span><span class="n">items</span><span class="p">())</span>
<span class="n">tags</span> <span class="o">=</span> <span class="p">[(</span><span class="n">word</span><span class="p">,</span> <span class="n">count</span><span class="p">)</span> <span class="k">for</span> <span class="n">count</span><span class="p">,</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">words</span><span class="p">[</span><span class="o">-</span><span class="mi">30</span><span class="p">:]]</span>
<span class="kn">from</span> <span class="nn">pytagcloud</span> <span class="kn">import</span> <span class="n">make_tags</span><span class="p">,</span> <span class="n">create_tag_image</span>
<span class="n">create_tag_image</span><span class="p">(</span><span class="n">make_tags</span><span class="p">(</span><span class="n">tags</span><span class="p">),</span> <span class="s">'cloud.png'</span><span class="p">)</span>
</pre>
</div>
Sadly it doesn't quite work for me. I suspect something might be up with my pygame/platform's TTF support. I also had to add a Font object cache to stop it blowing up on my system (git pull request submitted :-)<br />
<br />
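The tag extraction step in the snippet above can also be sketched with only the standard library, which is handy when roundup, requests and bs4 aren't installed. The tiny <code>STOPWORDS</code> set and the <code>top_tags</code> helper here are made up for the illustration; a real extractor would want a much fuller stop word list.<br />

```python
import re
from collections import Counter

# Hypothetical, tiny stop word set; a real extractor would use a fuller list.
STOPWORDS = {"the", "and", "for", "with", "that", "this"}


def top_tags(text, limit=30):
    """Return (word, count) pairs for the most frequent interesting words."""
    words = (w.lower() for w in re.split(r"\W+", text))
    # Skip very short words and stop words, then count what is left.
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return counts.most_common(limit)


sample = ("Python is powerful and fast. Python plays well with others "
          "and Python runs everywhere.")
print(top_tags(sample, limit=3))
```

The resulting list of (word, count) pairs has the same shape as the <code>tags</code> list built in the snippet above.<br />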
<b><a href="http://pypi.python.org/pypi/slumber/">slumber</a></b> - call web RESTful (HTTP) APIs from Python code. Supports JSON, and YAML (with <a href="http://pypi.python.org/pypi/PyYAML">pyyaml</a> installed) and is built on top of the awesome <a href="http://pypi.python.org/pypi/requests">requests</a>. While looking at slumber I picked up this tip for validating and pretty-printing JSON:<br />
<div class="highlight">
<pre>$ echo '{"json":"obj"}' | python -m json.tool
{
    "json": "obj"
}
</pre>
</div>
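The same validate-and-pretty-print trick works from inside Python using the standard <code>json</code> module. A minimal sketch (the <code>pretty</code> helper name is just for this example):<br />

```python
import json


def pretty(raw):
    """Validate a JSON string and return it pretty-printed."""
    # json.loads raises ValueError (json.JSONDecodeError) on invalid input.
    return json.dumps(json.loads(raw), indent=4)


print(pretty('{"json":"obj"}'))
```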
</div>Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com2tag:blogger.com,1999:blog-5874912112064714506.post-4556412435429548862012-02-02T19:30:00.000-08:002012-02-03T00:35:34.158-08:00Another small sampler to finish the week<div dir="ltr" style="text-align: left;" trbidi="on">
<i>OMG</i> it's <b><a href="http://pypi.python.org/pypi/beautifulsoup4">beautifulsoup4</a></b> - BeautifulSoup for Python 3! Beware: this release involves <a href="http://www.crummy.com/software/BeautifulSoup/doc/#porting-code-to-bs4">API changes</a>, amongst other things.<br />
<b><a href="http://pypi.python.org/pypi/heightfield">heightfield</a></b> is a neat toy that generates 256x256 heightfields using <a href="http://www.lighthouse3d.com/opengl/terrain/index.php3?particle">particle deposition</a>.<br />
<b><a href="http://pypi.python.org/pypi/pager">pager</a></b> - "page output to the screen, read keys and get console dimensions."</div>Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com0tag:blogger.com,1999:blog-5874912112064714506.post-61297769451195788922012-02-01T21:53:00.000-08:002012-02-01T21:53:48.112-08:00Another sampler<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
I do love a module that has a nice, simple purpose and a direct, to the point name :-)</div>
<div>
<br /></div>
<b><a href="http://pypi.python.org/pypi/rfc6266">rfc6266</a></b> - parse and generate Content-Disposition headers as per <a href="http://tools.ietf.org/html/rfc6266">RFC 6266</a><br />
<b><a href="http://pypi.python.org/pypi/walkdir">walkdir</a></b> - making it easier to use os.walk() to walk directories with filtering, depth limiting, flattening and handling of symlink loops to boot!<br />
<b><a href="http://pypi.python.org/pypi/times">times</a></b> - "everything sucks about our mechanisms to represent absolute moments in time, but the least worst one of all is UTC." Indeed. In a style similar to the explicit bytes/unicode objects in Python 3 with encodings explicitly dealt with at input and output, this library encourages times to be UTC internally, with timezones only ever dealt with at input and output time.</div>Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com0tag:blogger.com,1999:blog-5874912112064714506.post-6597698096052369952012-01-31T18:00:00.000-08:002012-01-31T18:41:28.508-08:00Some new packages that look interesting<div dir="ltr" style="text-align: left;" trbidi="on">
<b style="color: #333333; font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 14px; line-height: 18px; text-align: -webkit-auto;"><a href="http://pypi.python.org/pypi/d">d</a></b><span style="background-color: white; color: #333333; font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 14px; line-height: 18px; text-align: -webkit-auto;"> - "If you have a small project and want to quickly write some docs that don't look like ass." It uses Markdown, pygments and pyquery and keeps things simple. It has none of the capabilities, nor the complexities, of </span><a href="http://pypi.python.org/pypi/Sphinx" style="font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 14px; line-height: 18px; text-align: -webkit-auto;">Sphinx</a><span style="background-color: white; color: #333333; font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 14px; line-height: 18px; text-align: -webkit-auto;">.</span><br />
<span style="background-color: white; color: #333333; font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 14px; line-height: 18px; text-align: -webkit-auto;"><b><a href="http://pypi.python.org/pypi/success">success</a></b> - "</span><span style="color: #333333; font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, sans-serif;"><span style="font-size: 14px; line-height: 18px;">import success" (waiting for the module to be uploaded though ;-)</span></span><br />
<span style="color: #333333; font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 14px; line-height: 18px;">
<b><a href="http://pypi.python.org/pypi/airy">Airy</a></b> - a new Web application development framework. "</span><span style="background-color: white; color: #3e4349; font-family: Arial, sans-serif; font-size: 14px; line-height: 21px; text-align: -webkit-auto;">Contrast to most currently available frameworks, Airy doesn’t use the standard notion of HTTP requests and pages. Instead, it makes use of WebSockets and provides a set of tools to let you focus on the interface, not content delivery."</span><br />
<span style="color: #333333; font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 14px; line-height: 18px;"><b><a href="http://pypi.python.org/pypi/timerange">timerange</a></b> - g</span><span style="color: #333333; font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, sans-serif;"><span style="font-size: 14px; line-height: 18px;">enerates list of dates in various formats.</span></span><br />
<span style="color: #333333; font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, sans-serif;"><span style="font-size: 14px; line-height: 18px;"><b><a href="http://pypi.python.org/pypi/pbs">pbs</a></b> - "PBS is a unique subprocess wrapper that maps your system programs to Python functions dynamically." An example given is:</span></span><br />
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">pbs</span> <span class="kn">import</span> <span class="n">ifconfig</span>
<span class="k">print</span> <span class="n">ifconfig</span><span class="p">(</span><span class="s">"eth0"</span><span class="p">)</span>
</pre>
</div>
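To give a feel for how "mapping system programs to Python functions dynamically" can work, here is a rough sketch built on <code>__getattr__</code> and <code>subprocess</code>. It is not how pbs itself is implemented; the <code>Commands</code> class is invented for this example, and it assumes Python 3.7+ and an <code>echo</code> program on the PATH.<br />

```python
import subprocess


class Commands:
    """Expose system programs as attributes that behave like functions."""

    def __getattr__(self, program):
        # Only called for names not found normally, so every unknown
        # attribute is treated as the name of a program on the PATH.
        def run(*args):
            result = subprocess.run(
                [program, *args],
                check=True,           # raise CalledProcessError on failure
                capture_output=True,  # collect stdout/stderr
                text=True,            # decode output to str
            )
            return result.stdout
        return run


sh = Commands()
print(sh.echo("hello"), end="")
```

Looking the program up at attribute access time is what makes the mapping dynamic: nothing has to be declared in advance for each command.<br />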
</div>Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com4tag:blogger.com,1999:blog-5874912112064714506.post-76742857306202838152010-05-06T17:51:00.000-07:002010-05-06T21:57:52.867-07:00Animating fishes for progress bars<a href="http://pypi.python.org/pypi/fish/">Animated fish progress bars</a><br />
<br />
'nuff said<br />
<br />
<b>Update: </b>I added my own take, <a href="http://pypi.python.org/pypi/worm">worm</a> :-)Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com3tag:blogger.com,1999:blog-5874912112064714506.post-74531953174705879042010-04-15T18:08:00.000-07:002010-04-15T21:44:51.457-07:00dbapiext.py - making dealing with SQL easier in Python<p>Sorry there was no post last week - I’ve been a little busy organising <a class="reference external" href="http://pycon-au.org/">PyCon Australia</a>.</p><p>This week I’d like to share <a class="reference external" href="http://furius.ca/pubcode/pub/antiorm/lib/python/dbapiext.py.html">dbapiext.py</a> by Martin Blais with you. It provides a really nice interface for making dealing with SQL easier in Python.</p><div class="section" id="introduction"><h1>Introduction</h1><p>In the Python world we have a standard specification of how database interfaces should look and we call it PEP 249, or more commonly <a class="reference external" href="http://www.python.org/dev/peps/pep-0249/">DB-API 2.0</a>. One of the things about DB-API 2.0 is that it’s a little loose in some areas - the stated goal of the PEP is “to <em>encourage similarity</em> between the Python modules that are used to access databases” (my emphasis) rather than to dictate a precise API.</p><p>In practical terms what this means is that you can’t just swap DB-API 2.0 modules like <a class="reference external" href="http://www.zope.org/Members/matt/dco2">DCOracle2</a> for <a class="reference external" href="http://cx-oracle.sourceforge.net/">cx_Oracle</a>, even for simple operations and even though they both run on top of Oracle. The core reason for this is that they use different parameter specifications. 
For example, a query in DCOracle2 might look like:</p><div class="highlight-python"><div class="highlight"><pre><span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'SELECT * FROM customers WHERE name=:1'</span><span class="p">,</span> <span class="s">'Richard Jones'</span><span class="p">)</span>
</pre></div></div><p>the same query in cx_Oracle might look like:</p><div class="highlight-python"><div class="highlight"><pre><span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'SELECT * FROM customers WHERE name=:name'</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'Richard Jones'</span><span class="p">)</span>
</pre></div></div><p>Of course one solution is to not use parameters and format the arguments directly into the SQL. But that’s just <a class="reference external" href="http://en.wikipedia.org/wiki/SQL_injection">inviting disaster</a>.</p><p>The dbapiext module attempts to normalise these interfaces and add some new convenience features. At its simplest it allows one to translate the above two statements into a single form:</p><div class="highlight-python"><div class="highlight"><pre><span class="n">execute_f</span><span class="p">(</span><span class="n">cursor</span><span class="p">,</span> <span class="s">'SELECT * FROM customers WHERE name=%S'</span><span class="p">,</span> <span class="s">'Richard Jones'</span><span class="p">)</span>
</pre></div></div><p>Where the “%S” indicates that the value being passed should be escaped and quoted as appropriate. If you don’t wish for the value to be treated as such you may use “%s” which formats the value directly into the SQL. This will work on top of either backend, given the following definition of execute_f:</p><div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">dbapiext</span> <span class="kn">import</span> <span class="n">execute_f</span>
<span class="kn">import</span> <span class="nn">functools</span>
<span class="k">if</span> <span class="n">using_cx_Oracle</span><span class="p">:</span>
    <span class="n">execute_f</span> <span class="o">=</span> <span class="n">functools</span><span class="o">.</span><span class="n">partial</span><span class="p">(</span><span class="n">execute_f</span><span class="p">,</span> <span class="n">paramstyle</span><span class="o">=</span><span class="s">'named'</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
    <span class="n">execute_f</span> <span class="o">=</span> <span class="n">functools</span><span class="o">.</span><span class="n">partial</span><span class="p">(</span><span class="n">execute_f</span><span class="p">,</span> <span class="n">paramstyle</span><span class="o">=</span><span class="s">'numeric'</span><span class="p">)</span>
</pre></div></div></div><div class="section" id="other-capabilities"><h1>Other Capabilities</h1><p>The query parsing and argument handling is quite flexible. You can mix positional arguments and keyword arguments - and refer to the keyword arguments by name in the SQL, even if the underlying database connection implementation doesn’t offer the facility:</p><div class="highlight-python"><div class="highlight"><pre><span class="n">execute_f</span><span class="p">(</span><span class="n">cursor</span><span class="p">,</span> <span class="s">'''SELECT * FROM account WHERE</span>
<span class="s"> account.active = %S AND (</span>
<span class="s"> account.number = %(number)S OR</span>
<span class="s"> account.number = (</span>
<span class="s"> SELECT account FROM mobile_account WHERE</span>
<span class="s"> mobile_account.msisdn = %(number)S</span>
<span class="s"> ))'''</span><span class="p">,</span> <span class="n">activated</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="n">number</span><span class="p">)</span>
</pre></div></div><p>In this situation the “%S” SQL parameter will use the fixed argument “activated” and the “%(number)S” SQL parameter will use the keyword argument “number”. Pretty cool.</p><p>dbapiext doesn’t stop there - it introduces a bunch of other really neat extensions. How many times has your code included something like this?</p><div class="highlight-python"><div class="highlight"><pre><span class="c"># given some variable list of data to update</span>
<span class="n">data</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'Richard'</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="s">'medium'</span><span class="p">,</span> <span class="n">alignment</span><span class="o">=</span><span class="s">'neutral'</span><span class="p">)</span>
<span class="c"># figure the SQL and values argument</span>
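<span class="n">columns</span> <span class="o">=</span> <span class="s">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s">'</span><span class="si">%s</span><span class="s">=:</span><span class="si">%d</span><span class="s">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>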
<span class="n">sql</span> <span class="o">=</span> <span class="s">'UPDATE person set </span><span class="si">%s</span><span class="s">'</span><span class="o">%</span><span class="n">columns</span>
<span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="n">data</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
<span class="c"># update the information in a table</span>
<span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="n">sql</span><span class="p">,</span> <span class="n">values</span><span class="p">)</span>
</pre></div></div><p>With dbapiext you can do the much more pythonic:</p><div class="highlight-python"><div class="highlight"><pre><span class="c"># given some variable list of data to update</span>
<span class="n">data</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'Richard'</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="s">'medium'</span><span class="p">,</span> <span class="n">alignment</span><span class="o">=</span><span class="s">'neutral'</span><span class="p">)</span>
<span class="n">execute_f</span><span class="p">(</span><span class="n">cursor</span><span class="p">,</span> <span class="s">'UPDATE person %S'</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
</pre></div></div><p>The “%S” argument here renders the dictionary in the form suitable for the UPDATE statement. If your SQL was a SELECT instead you may use “%A” which joins the dictionary items with “AND” instead:</p><div class="highlight-python"><div class="highlight"><pre><span class="c"># given some variable list of data to match</span>
<span class="n">data</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">'Richard'</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="s">'medium'</span><span class="p">,</span> <span class="n">alignment</span><span class="o">=</span><span class="s">'neutral'</span><span class="p">)</span>
<span class="n">execute_f</span><span class="p">(</span><span class="n">cursor</span><span class="p">,</span> <span class="s">'SELECT * FROM person WHERE %A'</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
</pre></div></div><p>Note that “data” could also be a list of pairs instead of a dictionary. How convenient is that!</p></div>Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com4tag:blogger.com,1999:blog-5874912112064714506.post-89700508283796067572010-04-01T01:41:00.000-07:002010-04-01T04:55:35.622-07:00cython - easier optimisations than writing C<p>For this Bit Of Cheese I thought I’d present a bit of an example using <a href="http://www.cython.org/">cython</a> since I just learned how to use it myself for <a href="http://pyweek.org/">PyWeek</a>.</p><p>For the unaware, cython is a neat little extension to Python that makes it much easier to write C-optimised modules for your Python code, including using 3rd-party C libraries.</p><p>For my <a href="http://pyweek.org/e/toto/">PyWeek entry</a> I needed to generate a lot of cubic <a href="http://en.wikipedia.org/wiki/B%C3%A9zier_curve">bézier curves</a>. I also wanted to experiment with mutating them - which basically means re-generating them many times over. The core curve generation code therefore had to be as fast as I could possibly make it.</p><p>The code looks something like this in Python:</p><div class="highlight-python"><div class="highlight"><pre><span class="k">def</span> <span class="nf">generate</span><span class="p">(</span><span class="n">p1x</span><span class="p">,</span> <span class="n">p1y</span><span class="p">,</span> <span class="n">p2x</span><span class="p">,</span> <span class="n">p2y</span><span class="p">,</span> <span class="n">cp1x</span><span class="p">,</span> <span class="n">cp1y</span><span class="p">,</span> <span class="n">cp2x</span><span class="p">,</span>
        <span class="n">cp2y</span><span class="p">,</span> <span class="n">step</span><span class="p">):</span>
    <span class="sd">'''Given the two points p1 and p2 and their control</span>
<span class="sd">    points cp1 and cp2 generate a cubic bezier curve with</span>
<span class="sd">    steps of "step".</span>
<span class="sd">    Return the list of (x, y) points on the curve.</span>
<span class="sd">    '''</span>
    <span class="n">l</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="c"># generate the cubic bezier points</span>
    <span class="n">x1</span> <span class="o">=</span> <span class="n">cp1x</span> <span class="o">*</span> <span class="mf">3</span><span class="p">;</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">cp2x</span> <span class="o">*</span> <span class="mf">3</span>
    <span class="n">y1</span> <span class="o">=</span> <span class="n">cp1y</span> <span class="o">*</span> <span class="mf">3</span><span class="p">;</span> <span class="n">y2</span> <span class="o">=</span> <span class="n">cp2y</span> <span class="o">*</span> <span class="mf">3</span>
    <span class="n">t</span> <span class="o">=</span> <span class="mf">0</span>
    <span class="k">while</span> <span class="n">t</span> <span class="o"><=</span> <span class="p">(</span><span class="mf">1</span> <span class="o">+</span> <span class="n">step</span><span class="p">):</span>
        <span class="n">a</span> <span class="o">=</span> <span class="n">t</span><span class="p">;</span> <span class="n">a2</span> <span class="o">=</span> <span class="n">a</span><span class="o">**</span><span class="mf">2</span><span class="p">;</span> <span class="n">a3</span> <span class="o">=</span> <span class="n">a</span><span class="o">**</span><span class="mf">3</span>
        <span class="n">b</span> <span class="o">=</span> <span class="mf">1</span> <span class="o">-</span> <span class="n">t</span><span class="p">;</span> <span class="n">b2</span> <span class="o">=</span> <span class="n">b</span><span class="o">**</span><span class="mf">2</span><span class="p">;</span> <span class="n">b3</span> <span class="o">=</span> <span class="n">b</span><span class="o">**</span><span class="mf">3</span>
        <span class="n">px</span> <span class="o">=</span> <span class="n">p1x</span><span class="o">*</span><span class="n">b3</span> <span class="o">+</span> <span class="n">x1</span><span class="o">*</span><span class="n">b2</span><span class="o">*</span><span class="n">a</span> <span class="o">+</span> <span class="n">x2</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">a2</span> <span class="o">+</span> <span class="n">p2x</span><span class="o">*</span><span class="n">a3</span>
        <span class="n">py</span> <span class="o">=</span> <span class="n">p1y</span><span class="o">*</span><span class="n">b3</span> <span class="o">+</span> <span class="n">y1</span><span class="o">*</span><span class="n">b2</span><span class="o">*</span><span class="n">a</span> <span class="o">+</span> <span class="n">y2</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">a2</span> <span class="o">+</span> <span class="n">p2y</span><span class="o">*</span><span class="n">a3</span>
        <span class="n">l</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">px</span><span class="p">,</span> <span class="n">py</span><span class="p">))</span>
        <span class="n">t</span> <span class="o">+=</span> <span class="n">step</span>
    <span class="k">return</span> <span class="n">l</span>
<span class="k">def</span> <span class="nf">speed_test</span><span class="p">():</span>
    <span class="n">generate</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span> <span class="mf">0</span><span class="p">,</span> <span class="mf">5</span><span class="p">,</span> <span class="mf">0</span><span class="p">,</span> <span class="mf">0</span><span class="p">,</span> <span class="mf">1</span><span class="p">,</span> <span class="mf">5</span><span class="p">,</span> <span class="o">-</span><span class="mf">1</span><span class="p">,</span> <span class="o">.</span><span class="mf">0001</span><span class="p">)</span>
</pre></div></div><p>For generating a curve with quite small steps on my MacBook the result is about 17.5 milliseconds per curve:</p><div class="highlight-sh"><div class="highlight"><pre><span class="nv">$ </span>python -m timeit -s <span class="s1">'import curve'</span> <span class="s1">'curve.speed_test()'</span>
100 loops, best of 3: 17.5 msec per loop
</pre></div></div><p>The first thing I tried was simply copying my <tt class="docutils literal"><span class="pre">curve.py</span></tt> to <tt class="docutils literal"><span class="pre">_curve.pyx</span></tt> (the <tt class="docutils literal"><span class="pre">.pyx</span></tt> suffix marks it as a cython module) and adding some code to the end of the original <tt class="docutils literal"><span class="pre">curve.py</span></tt> to use the cython version:</p><div class="highlight-python"><div class="highlight"><pre><span class="k">try</span><span class="p">:</span>
    <span class="kn">import</span> <span class="nn">pyximport</span><span class="p">;</span> <span class="n">pyximport</span><span class="o">.</span><span class="n">install</span><span class="p">()</span>
    <span class="kn">import</span> <span class="nn">_curve</span>
    <span class="n">generate</span> <span class="o">=</span> <span class="n">_curve</span><span class="o">.</span><span class="n">generate</span>
<span class="k">except</span> <span class="ne">Exception</span><span class="p">,</span> <span class="n">e</span><span class="p">:</span>
    <span class="k">print</span> <span class="s">'_curve not available:'</span><span class="p">,</span> <span class="n">e</span>
    <span class="k">pass</span>
</pre></div></div><p>This is only one way to generate cython modules. The other is to use code in your <tt class="docutils literal"><span class="pre">setup.py</span></tt> file – but for this project I don’t actually have one. The <tt class="docutils literal"><span class="pre">pyximport</span></tt> module will compile the “pyrex” format cython files with the <tt class="docutils literal"><span class="pre">.pyx</span></tt> suffix on import. It detects changes to the original file and recompiles, so it’s <em>incredibly</em> convenient!</p><p>Re-running the test after that change I found the result was surprisingly similar:</p><div class="highlight-sh"><div class="highlight"><pre><span class="nv">$ </span>python -m timeit -s <span class="s1">'import curve'</span> <span class="s1">'curve.speed_test()'</span>
100 loops, best of 3: 17.5 msec per loop
</pre></div></div><p>Hmm. Without any type information cython can only compile the same dynamic Python object operations to C, so there is little to gain yet.</p><p>OK, well, cython accepts a bunch of type hints, starting with the types of the function arguments. These are just C types that I can add to the Python function signature.</p><p>Let’s try those to start with. Note that Python floats are actually C doubles:</p><div class="highlight-python"><pre><span class="k">def</span> <span class="nf">generate</span><span class="p">(</span><span class="n">double</span> <span class="n">p1x</span><span class="p">,</span> <span class="n">double</span> <span class="n">p1y</span><span class="p">,</span> <span class="n">double</span> <span class="n">p2x</span><span class="p">,</span>
<span class="n">double</span> <span class="n">p2y</span><span class="p">,</span> <span class="n">double</span> <span class="n">cp1x</span><span class="p">,</span> <span class="n">double</span> <span class="n">cp1y</span><span class="p">,</span>
<span class="n">double</span> <span class="n">cp2x</span><span class="p">,</span> <span class="n">double</span> <span class="n">cp2y</span><span class="p">,</span> <span class="n">double</span> <span class="n">step</span><span class="p">):</span></pre></div><p>This saves me a couple of milliseconds:</p><div class="highlight-sh"><div class="highlight"><pre><span class="nv">$ </span>python -m timeit -s <span class="s1">'import curve'</span> <span class="s1">'curve.speed_test()'</span>
100 loops, best of 3: 15.5 msec per loop
</pre></div></div><p>So let’s declare <strong>all</strong> of my variable types (except the Python list):</p><div class="highlight-python"><pre><span class="k">cdef</span> <span class="kt">double</span> <span class="nf">x1</span><span class="p">,</span> <span class="nf">x2</span><span class="p">,</span> <span class="nf">y1</span><span class="p">,</span> <span class="nf">y2</span><span class="p">,</span> <span class="nf">t</span>
<span class="k">cdef</span> <span class="kt">double</span> <span class="nf">a</span><span class="p">,</span> <span class="nf">a2</span><span class="p">,</span> <span class="nf">a3</span><span class="p">,</span> <span class="nf">b</span><span class="p">,</span> <span class="nf">b2</span><span class="p">,</span> <span class="nf">b3</span><span class="p">,</span> <span class="nf">px</span><span class="p">,</span> <span class="nf">py</span></pre></div><p>This change bought me the improvement I was after - the function is now <strong>way faster</strong> (roughly 8 times) than the original Python version:</p><div class="highlight-sh"><div class="highlight"><pre><span class="nv">$ </span>python -m timeit -s <span class="s1">'import curve'</span> <span class="s1">'curve.speed_test()'</span>
100 loops, best of 3: 2.08 msec per loop
</pre></div></div><p>Cool eh?</p><p>The final code in <tt class="docutils literal"><span class="pre">_curve.pyx</span></tt> looks like this:</p><div class="highlight-python"><pre><span class="k">def</span> <span class="nf">generate</span><span class="p">(</span><span class="n">double</span> <span class="n">p1x</span><span class="p">,</span> <span class="n">double</span> <span class="n">p1y</span><span class="p">,</span> <span class="n">double</span> <span class="n">p2x</span><span class="p">,</span>
        <span class="n">double</span> <span class="n">p2y</span><span class="p">,</span> <span class="n">double</span> <span class="n">cp1x</span><span class="p">,</span> <span class="n">double</span> <span class="n">cp1y</span><span class="p">,</span>
        <span class="n">double</span> <span class="n">cp2x</span><span class="p">,</span> <span class="n">double</span> <span class="n">cp2y</span><span class="p">,</span> <span class="n">double</span> <span class="n">step</span><span class="p">):</span>
    <span class="sd">'''Given the two points p1 and p2 and their control</span>
<span class="sd">    points cp1 and cp2 generate a cubic bezier curve</span>
<span class="sd">    with steps of "step".</span>
<span class="sd">    Return the list of (x, y) points on the curve.</span>
<span class="sd">    '''</span>
    <span class="k">cdef</span> <span class="kt">double</span> <span class="nf">x1</span><span class="p">,</span> <span class="nf">x2</span><span class="p">,</span> <span class="nf">y1</span><span class="p">,</span> <span class="nf">y2</span><span class="p">,</span> <span class="nf">t</span>
    <span class="k">cdef</span> <span class="kt">double</span> <span class="nf">a</span><span class="p">,</span> <span class="nf">a2</span><span class="p">,</span> <span class="nf">a3</span><span class="p">,</span> <span class="nf">b</span><span class="p">,</span> <span class="nf">b2</span><span class="p">,</span> <span class="nf">b3</span><span class="p">,</span> <span class="nf">px</span><span class="p">,</span> <span class="nf">py</span>
    <span class="n">l</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="c"># generate the cubic bezier points</span>
    <span class="n">x1</span> <span class="o">=</span> <span class="n">cp1x</span> <span class="o">*</span> <span class="mf">3</span><span class="p">;</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">cp2x</span> <span class="o">*</span> <span class="mf">3</span>
    <span class="n">y1</span> <span class="o">=</span> <span class="n">cp1y</span> <span class="o">*</span> <span class="mf">3</span><span class="p">;</span> <span class="n">y2</span> <span class="o">=</span> <span class="n">cp2y</span> <span class="o">*</span> <span class="mf">3</span>
    <span class="n">t</span> <span class="o">=</span> <span class="mf">0</span>
    <span class="k">while</span> <span class="n">t</span> <span class="o"><=</span> <span class="p">(</span><span class="mf">1</span> <span class="o">+</span> <span class="n">step</span><span class="p">):</span>
        <span class="n">a</span> <span class="o">=</span> <span class="n">t</span><span class="p">;</span> <span class="n">a2</span> <span class="o">=</span> <span class="n">a</span><span class="o">**</span><span class="mf">2</span><span class="p">;</span> <span class="n">a3</span> <span class="o">=</span> <span class="n">a</span><span class="o">**</span><span class="mf">3</span>
        <span class="n">b</span> <span class="o">=</span> <span class="mf">1</span> <span class="o">-</span> <span class="n">t</span><span class="p">;</span> <span class="n">b2</span> <span class="o">=</span> <span class="n">b</span><span class="o">**</span><span class="mf">2</span><span class="p">;</span> <span class="n">b3</span> <span class="o">=</span> <span class="n">b</span><span class="o">**</span><span class="mf">3</span>
        <span class="n">px</span> <span class="o">=</span> <span class="n">p1x</span><span class="o">*</span><span class="n">b3</span> <span class="o">+</span> <span class="n">x1</span><span class="o">*</span><span class="n">b2</span><span class="o">*</span><span class="n">a</span> <span class="o">+</span> <span class="n">x2</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">a2</span> <span class="o">+</span> <span class="n">p2x</span><span class="o">*</span><span class="n">a3</span>
        <span class="n">py</span> <span class="o">=</span> <span class="n">p1y</span><span class="o">*</span><span class="n">b3</span> <span class="o">+</span> <span class="n">y1</span><span class="o">*</span><span class="n">b2</span><span class="o">*</span><span class="n">a</span> <span class="o">+</span> <span class="n">y2</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">a2</span> <span class="o">+</span> <span class="n">p2y</span><span class="o">*</span><span class="n">a3</span>
        <span class="n">l</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">px</span><span class="p">,</span> <span class="n">py</span><span class="p">))</span>
        <span class="n">t</span> <span class="o">+=</span> <span class="n">step</span>
    <span class="k">return</span> <span class="n">l</span></pre></div><p>The important thing to note here is that the bulk of the function is untouched, pure Python. cython does all the smarts now that it knows enough about the types of the variables. Clever cython!</p><p>OK, now the downside. It took me a lot of trial-and-error and banging my head against the cython docs to even do something as simple as that.</p><p>One of my early mistakes was to declare the function "cdef" like I saw in all the examples. This results in a function that's not actually exported to Python from the module though. It needs to remain a standard "def" (or, if it's to be used from both C and Python, it may be a "cpdef" function). This stumble cost me quite a large amount of time.</p><p>Eventually I found what I needed in a section marked “<a href="http://docs.cython.org/src/userguide/">Old Cython Users Guide</a>” which, though broken in places, still contains the most useful information for a newbie like me.</p><p>ps. apologies for the horizontally smooshed code - I've only just found a wider template. Future posts won't be quite as restricted.</p>Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com2tag:blogger.com,1999:blog-5874912112064714506.post-69908510586326950962010-03-25T17:02:00.000-07:002010-03-25T17:02:43.635-07:00Regular expression ... expressions!To kick off this blog I think I'll start with something a little wacky. Krister Hedfors has created a package called <a href="http://pypi.python.org/pypi/inrex/">inrex</a> which implements a bunch of regular expression "operators" ("inrex" is short for "infix regular expressions"). Here's how regular expressions are normally handled in Python:<br />
<div class="highlight"><pre><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">re</span>
<span class="gp">>>> </span><span class="n">match</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s">r'(\w+) (\d+)'</span><span class="p">,</span> <span class="s">'asd 123'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">if</span> <span class="n">match</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="gp">... </span>    <span class="k">print</span> <span class="s">'word is'</span><span class="p">,</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mf">1</span><span class="p">)</span>
<span class="gp">... </span>    <span class="k">print</span> <span class="s">'digit is'</span><span class="p">,</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mf">2</span><span class="p">)</span>
<span class="gp">... </span>
<span class="go">word is asd</span>
<span class="go">digit is 123</span>
<span class="gp">>>> </span><span class="n">match</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s">r'(?P<word>\w+) (?P<digit>\d+)'</span><span class="p">,</span> <span class="s">'asd 123'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="k">if</span> <span class="n">match</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="gp">... </span>    <span class="k">print</span> <span class="s">'word is'</span><span class="p">,</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s">'word'</span><span class="p">)</span>
<span class="gp">... </span>    <span class="k">print</span> <span class="s">'digit is'</span><span class="p">,</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s">'digit'</span><span class="p">)</span>
<span class="gp">... </span>
<span class="go">word is asd</span>
<span class="go">digit is 123</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r'\d+'</span><span class="p">,</span> <span class="s">'asd 123 qwe 456'</span><span class="p">)</span>
<span class="go">['123', '456']</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">r'\d+'</span><span class="p">,</span> <span class="s">'asd 123 qwe 456'</span><span class="p">)</span>
<span class="go">['asd ', ' qwe ', '']</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">r'\d+'</span><span class="p">,</span> <span class="s">'asd 123 qwe 456'</span><span class="p">,</span> <span class="n">maxsplit</span><span class="o">=</span><span class="mf">1</span><span class="p">)</span>
<span class="go">['asd ', ' qwe 456']</span></pre></div>Note that we need one statement to obtain the match object and a second statement to examine it. Pretty standard Python, but a little annoying sometimes. Here's how the same results are achieved in inrex:<br />
<div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">inrex</span> <span class="kn">import</span> <span class="n">match</span><span class="p">,</span> <span class="n">search</span><span class="p">,</span> <span class="n">split</span><span class="p">,</span> <span class="n">findall</span><span class="p">,</span> <span class="n">finditer</span>
<span class="gp">>>> </span>
<span class="gp">>>> </span><span class="k">if</span> <span class="s">'asd 123'</span> <span class="o">|</span><span class="n">match</span><span class="o">|</span> <span class="s">r'(\w+) (\d+)'</span><span class="p">:</span>
<span class="gp">... </span>    <span class="k">print</span> <span class="s">'word is'</span><span class="p">,</span> <span class="n">match</span><span class="p">[</span><span class="mf">1</span><span class="p">]</span>
<span class="gp">... </span>    <span class="k">print</span> <span class="s">'digit is'</span><span class="p">,</span> <span class="n">match</span><span class="p">[</span><span class="mf">2</span><span class="p">]</span>
<span class="gp">... </span>
<span class="go">word is asd</span>
<span class="go">digit is 123</span>
<span class="gp">>>> </span><span class="k">if</span> <span class="s">'asd 123'</span> <span class="o">|</span><span class="n">match</span><span class="o">|</span> <span class="s">r'(?P<word>\w+) (?P<digit>\d+)'</span><span class="p">:</span>
<span class="gp">... </span>    <span class="k">print</span> <span class="s">'word is'</span><span class="p">,</span> <span class="n">match</span><span class="p">[</span><span class="s">'word'</span><span class="p">]</span>
<span class="gp">... </span>    <span class="k">print</span> <span class="s">'digit is'</span><span class="p">,</span> <span class="n">match</span><span class="p">[</span><span class="s">'digit'</span><span class="p">]</span>
<span class="gp">... </span>
<span class="go">word is asd</span>
<span class="go">digit is 123</span>
<span class="gp">>>> </span><span class="s">'asd 123 qwe 456'</span> <span class="o">|</span><span class="n">findall</span><span class="o">|</span> <span class="s">r'\d+'</span>
<span class="go">['123', '456']</span>
<span class="gp">>>> </span><span class="s">'asd 123 qwe 456'</span> <span class="o">|</span><span class="n">split</span><span class="o">|</span> <span class="s">r'\d+'</span>
<span class="go">['asd ', ' qwe ', '']</span>
<span class="gp">>>> </span><span class="s">'asd 123 qwe 456'</span> <span class="o">|</span><span class="n">split</span><span class="p">(</span><span class="n">maxsplit</span><span class="o">=</span><span class="mf">1</span><span class="p">)</span><span class="o">|</span> <span class="s">r'\d+'</span>
<span class="go">['asd ', ' qwe 456']</span>
</pre></div>Working with the match object is clearly much easier. There is a limitation, though: it only works for an immediate result. Unlike the match object returned by the standard re.match, the inrex match object is a singleton, so you can only work with one result at a time. For simple cases (the most common) a singleton match object will suffice.Richard Joneshttp://www.blogger.com/profile/04600262656208358816noreply@blogger.com9
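<p>For the curious, here is a rough sketch of how an infix operator like the <tt class="docutils literal"><span class="pre">|match|</span></tt> from the inrex post above could be built, and why a shared match object only holds one result at a time. This is not inrex's actual code: the <tt class="docutils literal"><span class="pre">MatchOperator</span></tt> class and its attributes are hypothetical names made up for illustration, and the sketch is written for Python 3 (hence <tt class="docutils literal"><span class="pre">print()</span></tt> as a function). It relies only on Python's reflected-operator hooks <tt class="docutils literal"><span class="pre">__ror__</span></tt> and <tt class="docutils literal"><span class="pre">__or__</span></tt>.</p>

```python
import re


class MatchOperator(object):
    """Hypothetical infix operator, used as: text |match| pattern."""

    def __init__(self):
        self._text = None
        self._result = None

    def __ror__(self, text):
        # `text | match`: str defines no `|`, so Python falls back to
        # match.__ror__(text). Remember the text and return ourselves.
        self._text = text
        return self

    def __or__(self, pattern):
        # `match | pattern`: run the real regular expression and keep the
        # result on the shared object so it can be inspected afterwards.
        self._result = re.match(pattern, self._text)
        return self._result is not None

    def __getitem__(self, key):
        # match[1] or match['name'] delegates to the stored match object.
        return self._result.group(key)


# One module-level instance shared by every caller: a singleton.
match = MatchOperator()

if 'asd 123' |match| r'(\w+) (\d+)':
    print('word is', match[1])
    print('digit is', match[2])

# A second use overwrites the stored result, which is the limitation
# described in the post: only the most recent match is available.
'qwe 456' |match| r'(\w+) (\d+)'
print('word is now', match[1])
```

<p>Because the result lives on the one shared object, nesting two matches or keeping an earlier result around means copying the values out first.</p>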