Saturday, January 24, 2009

PyMOTW feed moved to Google hosting

All Feedburner feeds need to move to Google accounts by the end of February, and I've gone ahead and made the switch. I don't expect any interruption in service, but you never know. According to the FAQ, the old URLs should redirect to the new location, so if you're already a subscriber you shouldn't need to do anything.

For the record, the canonical feeds for this blog are http://feeds.doughellmann.com/DougHellmann for all posts and http://feeds.doughellmann.com/PyMOTW for just the PyMOTW posts. Those URLs have not changed, so if that's how you're subscribed you should be good to go.

Email me or post a comment here if you have trouble.

Friday, January 23, 2009

Python Magazine for January 2009



The January 2009 issue of Python Magazine is available for download now.

This month's cover feature is Creating a Collection Manager with Elixir, by Gaëtan de Menten. You have heard about SQLAlchemy, but never found the time to actually try it? Or maybe you have heard about the add-on to SQLAlchemy called Elixir, but don't really know what it is useful for. Here is your chance to see them in action.

JC Cruz continues his series of articles on using Python to create applications for Mac OS X. In the last article, you learned how to edit data with the NSTableView. This month, in Table Drag and Drop with PyObjC, you will learn the mechanics behind drag-and-drop and how to support it in a table.

If you've ever done extensive development in Django, you'll know the issues involved with changing schemas. In Django Migrations with South, Andrew Godwin introduces South, a migrations library for Django, and shows how it helps to solve many of the problems you face as your Django project matures.

This month we are co-publishing a special-guest column by Ivo Jansch, a regular contributor to our sister publication php|architect. Requirements Gathering in the Enterprise is a look at the processes used in large enterprise software development.

Jesse Noller reflects on the new challenges of test engineering in the software world in his column, And Now For Something Completely Different: The Changing Face of Test Engineering.

Interacting with the file system can often be a frustrating test for new programmers. This month Mark Mruss introduces the os module and some of the more helpful functions it contains to help ease some of that frustration in Welcome to Python: Working with Files and Directories.

Beginning this month, we are proud to present the column, Ask the Pragmatic Testers, written by Titus Brown and Grig Gheorghiu. Both Titus and Grig have a wealth of development experience, and we're excited to add them as regular contributors to the magazine. Many people think that automated testing is about making sure your software works right. They're not wrong, but in Why test? It's about complexity. Titus and Grig remind us that there are deeper benefits -- in particular, managing the complexity of the build and development environments.

This month Steve Holden ponders on the Zen of Python, sometimes with less relevance than might be expected.

And in What does Python 3.0 mean for you?, I talk about the reactions to the long-awaited release of Python 3.0, which came on December 3, 2008. This is a big step forward in the evolution of Python, as it provides an opportunity for the core developers to introduce backwards-incompatible changes to the language and libraries and break free of some past design decisions that have been deemed misguided or short-sighted.

Monday, January 19, 2009

PyMOTW is now available in German

Ralf Schönian is translating the PyMOTW series into German. He is posting the articles on his web site as he translates them.

Ralf is an active member of the pyCologne user group in Germany and author of pyVoc, the open source English/German vocabulary trainer.

Thanks, Ralf!

Sunday, January 18, 2009

Converting from Make to Paver

As I briefly mentioned in an earlier post, I recently moved the PyMOTW build from make to Kevin Dangoor's Python-based build tool Paver. I had been wanting to try Paver out for a while, especially since seeing Kevin's presentation at PyWorks in November. As a long-time Unix/Linux user, I didn't have any particular problems with make, but Paver looked intriguing. PyMOTW is one of the few projects I have with a significant build process (beyond simply creating the source tarball), so it seemed like a good candidate for experimentation.

Concepts:

The basic concepts with Paver and make are the same. Make "targets" correspond roughly to Paver "tasks". Paver places less emphasis on file modification time-stamps, though, so tasks are all essentially ".PHONY" targets. As with make, Paver keeps track of which dependencies are executed so they are not repeated while building any one target.

Tasks are implemented as simple Python functions. Paver starts by loading pavement.py, and tasks can be defined inline there or you can import code from elsewhere if needed. According to Kevin, once the main engine settles down enough to reach a 1.0 release, he doesn't anticipate a lot of active development on the core. Recipes for extending Paver can be added easily through external modules which would be distributed separately.

Building a Source Distribution:

The most important target from the old PyMOTW Makefile was "package". It ran sphinx-build to create the HTML version of the documentation then produced a versioned source distribution with distutils. The whole thing was bundled up and dropped on my computer desktop, ready to be uploaded to my web site.

package: clean html
	rm -f setup.py
	$(MAKE) setup.py
	rm -f MANIFEST MANIFEST.in
	$(MAKE) MANIFEST.in
	python setup.py sdist --force-manifest
	mv dist/*.gz ~/Desktop/


Paver sits on top of distutils, so one of the pre-defined tasks it has built-in is "sdist" (similar to python setup.py sdist, for producing source distributions of Python apps or libraries). In my case, I extended the task definition to perform some pre-requisite tasks and move the tarball out of the build directory onto the desktop of my computer, to make it easier to upload to my web site.

Let's look at the task definition:

@task
@needs(['generate_setup', 'minilib', 'html_clean',
        'setuptools.command.sdist'
        ])
def sdist():
    """Create a source distribution.
    """
    # Copy the output file to the desktop
    dist_files = path('dist').glob('*.tar.gz')
    dest_dir = path(options.sdist.outdir).expanduser()
    for f in dist_files:
        f.move(dest_dir)
    return


The @task decorator registers the function as a task, tying it in to the list of options available from the command line. The docstring for the function is included in the help output (paver help or paver --help-commands).

$ paver help
---> help
Paver 0.8.1

Usage: paver [global options] [option.name=value] task [task options] [task...]

Run 'paver help [section]' to see the following sections of info:

options global command line options
setup available distutils/setuptools tasks
tasks all tasks that have been imported by your pavement

'paver help taskname' will display details for a task.

Tasks defined in your pavement:
blog - Generate the blog post version of the HTML for the current module
html - Build HTML documentation using Sphinx
html_clean - Remove sphinx output directories before building the HTML
installwebsite - Rebuild and copy website files to the remote server
pdf - Generate the PDF book
sdist - Create a source distribution
webhtml - Generate HTML files for website
website - Create local copy of website files
webtemplatebase - Import the latest version of the web page template from the source


To run the task, pass the name as an argument to paver on the command line:

$ paver sdist


Prerequisites:

The @needs decorator specifies the prerequisites for a task, listed in order and identified by name. Paver prerequisites correspond to make dependencies, and are all run before the task, as you would expect.
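The way prerequisites are resolved can be sketched with a toy registry. This is an illustration only, written for Python 3, and it is not Paver's implementation: the `run()` function and the `_tasks` and `_needs` dictionaries are hypothetical names, and the three tasks are empty stand-ins.

```python
# Illustrative only: a toy registry, not Paver's implementation.
_tasks = {}      # task name -> function
_needs = {}      # task name -> list of prerequisite names


def task(func):
    """Register func as a task under its own name."""
    _tasks[func.__name__] = func
    return func


def needs(names):
    """Record the prerequisites for the decorated function."""
    def decorator(func):
        _needs[func.__name__] = list(names)
        return func
    return decorator


def run(name, done=None, log=None):
    """Run prerequisites in order, then the task, each at most once."""
    done = set() if done is None else done
    log = [] if log is None else log
    if name in done:
        return log
    for prerequisite in _needs.get(name, []):
        run(prerequisite, done, log)
    _tasks[name]()
    done.add(name)
    log.append(name)
    return log


@task
def html():
    pass


@task
@needs(['html'])
def html_clean():
    pass


@task
@needs(['html', 'html_clean'])
def sdist():
    pass


order = run('sdist')
print(order)
```

Even though "html" is listed as a prerequisite twice (once directly and once through "html_clean"), the `done` set ensures it runs only once.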

In the Makefile, before building a source distribution I always ran the "clean" and "html" targets, too. That meant I had a fresh copy of the HTML version of PyMOTW, generated by sphinx. The next step was to build setup.py using a simple input template file processed by sed (so I didn't have to remember to edit the version and download URL every time).

Paver provides a task to generate a setup.py ("generate_setup"), so I no longer need to mess around with templates on my own. The "minilib" task writes a ZIP archive with enough of Paver to support installation through the usual python setup.py install or easy_install PyMOTW incantations.

Notice that "setuptools.command.sdist" is the fully qualified name of the task being redefined locally. That means in this case the standard work for sdist (producing the source distribution) is run prior to invoking my override function.

I've defined an "html_clean" task in pavement.py to take the place of the old make targets "clean" and "html":

@task
def html_clean():
    """Remove sphinx output directories before building the HTML.
    """
    remake_directories(options.sphinx.doctrees, options.html.outdir)
    call_task('html')
    return


remake_directories() is a simple Python function I've written to remove the directories passed to it and then recreate them, empty. It is the equivalent of rm -r followed by mkdir. It's not strictly necessary, but I'm paranoid about old versions of renamed files ending up in my generated output, so I always start with an empty directory.
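The body of remake_directories() is not shown in this post. A minimal sketch of a function with the behavior described above, written for Python 3 and using only the standard library (an assumption, since the real version may use Paver's path objects), might look like this:

```python
import os
import shutil
import tempfile


def remake_directories(*dirnames):
    """Remove each directory (if present) and recreate it empty."""
    for dirname in dirnames:
        if os.path.isdir(dirname):
            shutil.rmtree(dirname)
        os.makedirs(dirname)


# Demonstrate with a scratch directory holding a stale file.
scratch = tempfile.mkdtemp()
outdir = os.path.join(scratch, 'html')
os.makedirs(outdir)
with open(os.path.join(outdir, 'stale.html'), 'w') as f:
    f.write('old output')

remake_directories(outdir)
leftover = os.listdir(outdir)   # the stale file should be gone
print(leftover)
shutil.rmtree(scratch)
```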

Working with Files:

Paver's standard library includes Jason Orendorff's path library for working with directories and files. Using the library, paths are objects with methods (instead of just strings). Methods include finding the directory name for a file, getting the contents of a directory, removing a file, etc. -- the sorts of things you want to do with files and directories. One especially nice feature is the / operator, which works like os.path.join. It is simple to construct a path using components in separate variables, joining them with /.

The sdist function is responsible for copying the packaged source distribution to my desktop. It starts by using path's globbing support to build a list of the .tar.gz files created by setuptools.command.sdist. (There should only be one file, but predicting the name is more difficult than just using globbing.) The destination directory is configured through the options Bunch (a dictionary-like object with attribute lookup for the keys). Since the value of the option might include ~, I expand it before using it as the destination directory for the file move operation.
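For readers without Paver installed, the same join-with-/ and globbing ideas can be tried with pathlib from the Python 3 standard library. This is only an analogy under that assumption: pathlib is a different library from the path module bundled with Paver, and the tarball name and directory layout below are made up for the demonstration.

```python
import shutil
import tempfile
from pathlib import Path

# Scratch layout standing in for the build tree and the desktop.
root = Path(tempfile.mkdtemp())
dist = root / 'dist'            # "/" joins components, like os.path.join
desktop = root / 'Desktop'
dist.mkdir()
desktop.mkdir()
(dist / 'PyMOTW-1.0.tar.gz').write_text('placeholder')  # hypothetical name

# Glob for the tarball instead of predicting its exact name.
for tarball in list(dist.glob('*.tar.gz')):
    shutil.move(str(tarball), str(desktop / tarball.name))

moved = sorted(p.name for p in desktop.iterdir())
remaining = list(dist.iterdir())
print(moved, remaining)
shutil.rmtree(str(root))
```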

Options:

Make is usually configured via the shell environment and variables within the Makefile itself. Paver uses a collection of Bunch objects organized into a hierarchical namespace. The options can be set to static literal values, computed at runtime using normal Python expressions, or overridden from the command line.

Each task can define its own section of the namespace with options. Some underlying recipes (especially the distutils and sphinx integration) depend on a specific structure, documented with the relevant task documentation. For example, these are the settings I use when running sphinx:

    sphinx = Bunch(
        sourcedir='PyMOTW',
        docroot = '.',
        builder = 'html',
        doctrees='sphinx/doctrees',
        confdir = 'sphinx',
    ),


Tasks access the options using dot-notation starting with the root of the namespace. For example, options.sphinx.doctrees.
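The idea of a dictionary-like object with attribute lookup can be sketched in a few lines. The class below is a stand-in written for Python 3, named `SimpleBunch` to make clear it is not Paver's own Bunch class, and it only covers basic lookup and assignment.

```python
class SimpleBunch(dict):
    """Dictionary whose keys can also be read and set as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


# Nesting the objects gives a hierarchical namespace.
options = SimpleBunch(
    sphinx=SimpleBunch(
        sourcedir='PyMOTW',
        doctrees='sphinx/doctrees',
    ),
)

print(options.sphinx.doctrees)           # attribute-style lookup
print(options['sphinx']['sourcedir'])    # still works as a dictionary
options.sphinx.builder = 'html'          # attribute-style assignment
```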

Running Shell Commands:

Even with the power of Python as a programming language, sometimes it is necessary to shell-out to run an external program. Paver makes that very easy. sh() wraps Python's standard library module subprocess to make it easier to work with for the sorts of use cases commonly found in a build system. Simply pass a string containing the shell command you want run, and optionally include the capture argument to have it return the output text (useful for commands like svnversion). sh() takes care of running the command, or in dry-run mode printing the command it would have run.

For example, the last step of building the PyMOTW PDF requires running a target included in the Makefile generated by Sphinx.

    latex_dir = path(options.pdf.builddir) / 'latex'
    sh('cd %s; make' % latex_dir)
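To give a feel for what a wrapper like sh() does, here is a rough sketch built on subprocess and written for Python 3. It is not Paver's code. In particular, the `dry_run` parameter is an assumption made for the demonstration (Paver takes its dry-run setting from its own options), and the example assumes a POSIX shell where `echo` is available.

```python
import subprocess


def sh(command, capture=False, dry_run=False):
    """Run command through the shell, or just print it in dry-run mode."""
    if dry_run:
        print(command)
        return None
    if capture:
        # Return the command's standard output as text.
        return subprocess.check_output(command, shell=True).decode()
    subprocess.check_call(command, shell=True)
    return None


captured = sh('echo hello', capture=True)
print(captured)
skipped = sh('exit 1', dry_run=True)   # printed, never executed
```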


Sphinx and Cog Integration:

Paver also includes built-in support for Sphinx. The standard integration with Sphinx supports producing HTML output. You can configure many of the Sphinx options you would normally put in a conf.py file directly through Paver's pavement.py. I had to override the way Sphinx is run by default, because I want to produce 3 different versions of HTML output (using different templates) and the PDF, but simpler projects won't have to do much more than set up the location of the input files.

In addition to Sphinx, Paver integrates Ned Batchelder's Cog, a templating/macro definition tool that lets you generate part of your documentation on the fly from arbitrary Python code. I've done some work to have cog run the PyMOTW examples and insert the output into the rst file before passing it to Sphinx to be converted to HTML or PDF. The process is complicated enough to warrant its own post, though, so that will have to wait for another day.

Conclusions:

Paver is a useful alternative to make, especially for Python-based packages. The default integration with distutils makes it very easy to get started. Build environments requiring a lot of external shell calls may find Makefiles easier to deal with. In my case, I was able to fold a couple of small Python scripts into the pavement.py file, so I eliminated a few separate tools.

It's hard to say whether a pavement file is "simpler" than a Makefile. Task definitions do not tend to be shorter than make targets, but the verbosity is an artifact of Python (function definitions and decorators, etc.) rather than anything inherent in the way Paver is designed.

A typical Paver configuration file is likely to be more portable than a Makefile, so that may be something to take into account. With file operations easily accessible in a portable library, it should be easy to set up your pavement.py to work on any OS.

For the complete pavement.py file used by PyMOTW, grab the latest release from the web site.

PyMOTW: compileall

compileall – Byte-compile Source Files

Purpose: Convert source files to byte-compiled version.
Python Version: 1.4

The compileall module finds Python source files and compiles them to the byte-code representation, saving the results in .pyc or .pyo files.

Compiling One Directory

compile_dir() is used to recursively scan a directory and byte-compile the files within it.

import compileall

compileall.compile_dir('examples')

By default, all of the subdirectories are scanned to a depth of 10. When using a version control system such as Subversion, this can lead to unnecessary scanning, as seen here:

$ python compileall_compile_dir.py
Listing examples ...
Listing examples/.svn ...
Listing examples/.svn/prop-base ...
Listing examples/.svn/props ...
Listing examples/.svn/text-base ...
Listing examples/.svn/tmp ...
Listing examples/.svn/tmp/prop-base ...
Listing examples/.svn/tmp/props ...
Listing examples/.svn/tmp/text-base ...
Compiling examples/a.py ...
Listing examples/subdir ...
Listing examples/subdir/.svn ...
Listing examples/subdir/.svn/prop-base ...
Listing examples/subdir/.svn/props ...
Listing examples/subdir/.svn/text-base ...
Listing examples/subdir/.svn/tmp ...
Listing examples/subdir/.svn/tmp/prop-base ...
Listing examples/subdir/.svn/tmp/props ...
Listing examples/subdir/.svn/tmp/text-base ...
Compiling examples/subdir/b.py ...

To filter directories out, use the rx argument to provide a regular expression to match the names to exclude.

import compileall
import re

compileall.compile_dir('examples',
    rx=re.compile(r'/\.svn'))

$ python compileall_exclude_dirs.py
Listing examples ...
Compiling examples/a.py ...
Listing examples/subdir ...
Compiling examples/subdir/b.py ...

The maxlevels argument controls the depth of recursion. For example, to avoid recursion entirely pass 0.

import compileall
import re

compileall.compile_dir('examples',
    maxlevels=0,
    rx=re.compile(r'/\.svn'))

$ python compileall_recursion_depth.py
Listing examples ...
Compiling examples/a.py ...
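The effect of rx can also be checked programmatically. The sketch below is written for Python 3 (where byte-code is stored under __pycache__ rather than next to the source, unlike the Python 2 behavior shown above) and builds its own throwaway directory tree. The file names are made up for the demonstration, and the pattern accepts either path separator so it is not tied to one platform.

```python
import compileall
import importlib.util
import os
import re
import shutil
import tempfile

# Build a small tree: examples/a.py plus a file inside a .svn directory.
root = tempfile.mkdtemp()
examples = os.path.join(root, 'examples')
svn_dir = os.path.join(examples, '.svn')
os.makedirs(svn_dir)
wanted = os.path.join(examples, 'a.py')
skipped = os.path.join(svn_dir, 'hidden.py')
for filename in (wanted, skipped):
    with open(filename, 'w') as f:
        f.write('x = 1\n')

# rx is searched for in the full path of each candidate file.
compileall.compile_dir(examples, rx=re.compile(r'[/\\]\.svn'), quiet=1)

# Python 3 stores byte-code under __pycache__; ask importlib where.
wanted_compiled = os.path.exists(importlib.util.cache_from_source(wanted))
skipped_compiled = os.path.exists(importlib.util.cache_from_source(skipped))
print(wanted_compiled, skipped_compiled)
shutil.rmtree(root)
```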

Compiling sys.path

All of the Python source files found in sys.path can be compiled with a single call to compile_path().

import compileall
import sys

sys.path[:] = ['examples', 'notthere']
print 'sys.path =', sys.path
compileall.compile_path()

This example replaces the default contents of sys.path to avoid permission errors while running the script, but still illustrates the default behavior. Note that the maxlevels value defaults to 0.

$ python compileall_path.py
sys.path = ['examples', 'notthere']
Listing examples ...
Compiling examples/a.py ...
Listing notthere ...
Can't list notthere

From the Command Line

It is also possible to invoke compileall from the command line, as you might when integrating it with a build system via a Makefile. For example:

$ python -m compileall -h
option -h not recognized
usage: python compileall.py [-l] [-f] [-q] [-d destdir] [-x regexp] [directory ...]
-l: don't recurse down
-f: force rebuild even if timestamps are up-to-date
-q: quiet operation
-d destdir: purported directory name for error messages
if no directory arguments, -l sys.path is assumed
-x regexp: skip files matching the regular expression regexp
the regexp is search for in the full path of the file

To recreate the example above, skipping .svn directories, one would run:

$ python -m compileall -x '/\.svn' examples
Listing examples ...
Compiling examples/a.py ...
Listing examples/subdir ...
Compiling examples/subdir/b.py ...

See also

compileall
The standard library documentation for this module.

PyMOTW Home

Sunday, January 11, 2009

PyWorks Wrap-up | Import This!

The first annual PyWorks conference has just wrapped up, and it was by all accounts a big success. We had a great time socializing, learning about new tools, and catching up on the progress made by established projects. The Atlanta weather was unusually wet, but that didn't stop some of us from heading offsite to attend the local Python user group meeting, too. All in all, it was a fun and productive three days.

This article was originally published by Python Magazine in December of 2008 as part of a series of columns I wrote as Editor in Chief for Python Magazine under the title "Import This!".

Read more

Monday, January 5, 2009

January meeting of PyATL

Via Brandon Rhodes:

This month's Python Atlanta meeting is this Thursday, January 8th:

And, I have exciting news - the chairman of the Python Software Foundation himself, Steve Holden, will be our main speaker! He will kick off our new year by giving us his own State of the Union address: in "The State of the Python Community", he will talk about the Python community, its strengths, and its weaknesses. He will not only answer your questions about how to stay connected to the wider Python community, but will be asking *you* questions about how the community can be more accessible and serve you better!

Our other talk should also be great: ifPeople founder Christopher Johnson will answer the question "Why People Choose Plone" by talking about how his customers benefit from Plone, Python's flagship CMS (content management system). He will discuss how it integrates with other web-enabled services like Salesforce, and what it is like to install and theme a Plone site for the first time.

If you are very interested in Plone, note that the Atlanta Plone group meets at the ifPeople offices this Wednesday at 5:30pm for beer, slide presentations, and talk about their favorite CMS!

Feel free to either meet up with us early at the Howell Mill Figo Pasta at 6pm (email Brandon if you're coming so that he can get a head count), or just show up at the main meeting at 7:30pm at the GTRI Food Processing Technology Building for the presentations. Here are the meeting details, where you can RSVP and get directions.

See you on Thursday!

[Updated to fix formatting, sorry for the aggregator spam.]

Sunday, January 4, 2009

PyMOTW: bz2

bz2 – bzip2 compression

Purpose: bzip2 compression
Python Version: 2.3 and later

The bz2 module is an interface for the bzip2 library, used to compress data for storage or transmission. There are three APIs provided:

  • “one shot” compression/decompression functions for operating on a blob of data
  • iterative compression/decompression objects for working with streams of data
  • a file-like class that supports reading and writing as with an uncompressed file

One-shot Operations in Memory

The simplest way to work with bz2 requires holding all of the data to be compressed or decompressed in memory, and then using compress() and decompress().

import bz2
import binascii

original_data = 'This is the original text.'
print 'Original :', len(original_data), original_data

compressed = bz2.compress(original_data)
print 'Compressed :', len(compressed), binascii.hexlify(compressed)

decompressed = bz2.decompress(compressed)
print 'Decompressed :', len(decompressed), decompressed

$ python bz2_memory.py
Original : 26 This is the original text.
Compressed : 62 425a683931415926535916be35a600000293804001040022e59c402000314c000111e93d434da223028cf9e73148cae0a0d6ed7f17724538509016be35a6
Decompressed : 26 This is the original text.

Notice that for short text, the compressed version can be significantly longer. While the actual results depend on the input data, for short bits of text it is interesting to observe the compression overhead.

import bz2

original_data = 'This is the original text.'

fmt = '%15s %15s'
print fmt % ('len(data)', 'len(compressed)')
print fmt % ('-' * 15, '-' * 15)

for i in xrange(20):
    data = original_data * i
    compressed = bz2.compress(data)
    print fmt % (len(data), len(compressed)), '*' if len(data) < len(compressed) else ''

$ python bz2_lengths.py
      len(data) len(compressed)
--------------- ---------------
              0              14 *
             26              62 *
             52              68 *
             78              70
            104              72
            130              77
            156              77
            182              73
            208              75
            234              80
            260              80
            286              81
            312              80
            338              81
            364              81
            390              76
            416              78
            442              84
            468              84
            494              87

Working with Streams

The in-memory approach is not practical for real-world use cases, since you rarely want to hold both the entire compressed and uncompressed data sets in memory at the same time. The alternative is to use BZ2Compressor and BZ2Decompressor objects to work with streams of data, so that the entire data set does not have to fit into memory.

The simple server below responds to requests consisting of filenames by writing a compressed version of the file to the socket used to communicate with the client. It has some artificial chunking in place to illustrate the buffering behavior that happens when the data passed to compress() or decompress() doesn’t result in a complete block of compressed or uncompressed output.

Warning

This server has obvious security implications. Do not run it on a server on the
open internet or in any environment where security might be an issue.

from __future__ import with_statement
import bz2
import logging
import SocketServer
import binascii

BLOCK_SIZE = 32

class Bz2RequestHandler(SocketServer.BaseRequestHandler):

    logger = logging.getLogger('Server')

    def handle(self):
        compressor = bz2.BZ2Compressor()

        # Find out what file the client wants
        filename = self.request.recv(1024)
        self.logger.debug('client asked for: "%s"', filename)

        # Send chunks of the file as they are compressed
        with open(filename, 'rb') as input:
            while True:
                block = input.read(BLOCK_SIZE)
                if not block:
                    break
                self.logger.debug('RAW "%s"', block)
                compressed = compressor.compress(block)
                if compressed:
                    self.logger.debug('SENDING "%s"', binascii.hexlify(compressed))
                    self.request.send(compressed)
                else:
                    self.logger.debug('BUFFERING')

        # Send any data being buffered by the compressor
        remaining = compressor.flush()
        while remaining:
            to_send = remaining[:BLOCK_SIZE]
            remaining = remaining[BLOCK_SIZE:]
            self.logger.debug('FLUSHING "%s"', binascii.hexlify(to_send))
            self.request.send(to_send)
        return


if __name__ == '__main__':
    import socket
    import threading
    from cStringIO import StringIO

    logging.basicConfig(level=logging.DEBUG,
                        format='%(name)s: %(message)s',
                        )
    logger = logging.getLogger('Client')

    # Set up a server, running in a separate thread
    address = ('localhost', 0) # let the kernel give us a port
    server = SocketServer.TCPServer(address, Bz2RequestHandler)
    ip, port = server.server_address # find out what port we were given

    t = threading.Thread(target=server.serve_forever)
    t.setDaemon(True)
    t.start()

    # Connect to the server
    logger.info('Contacting server on %s:%s', ip, port)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((ip, port))

    # Ask for a file
    requested_file = 'lorem.txt'
    logger.debug('sending filename: "%s"', requested_file)
    len_sent = s.send(requested_file)

    # Receive a response
    buffer = StringIO()
    decompressor = bz2.BZ2Decompressor()
    while True:
        response = s.recv(BLOCK_SIZE)
        if not response:
            break
        logger.debug('READ "%s"', binascii.hexlify(response))

        # Include any unconsumed data when feeding the decompressor.
        decompressed = decompressor.decompress(response)
        if decompressed:
            logger.debug('DECOMPRESSED "%s"', decompressed)
            buffer.write(decompressed)
        else:
            logger.debug('BUFFERING')

    full_response = buffer.getvalue()
    lorem = open('lorem.txt', 'rt').read()
    logger.debug('response matches file contents: %s', full_response == lorem)

    # Clean up
    s.close()
    server.socket.close()

$ python bz2_server.py
Client: Contacting server on 127.0.0.1:54092
Client: sending filename: "lorem.txt"
Server: client asked for: "lorem.txt"
Server: RAW "Lorem ipsum dolor sit amet, cons"
Server: BUFFERING
Server: RAW "ectetuer adipiscing elit. Donec
"
Server: BUFFERING
Server: RAW "egestas, enim et consectetuer ul"
Server: BUFFERING
Server: RAW "lamcorper, lectus ligula rutrum "
Server: BUFFERING
Server: RAW "leo, a
elementum elit tortor eu "
Server: BUFFERING
Server: RAW "quam. Duis tincidunt nisi ut ant"
Server: BUFFERING
Server: RAW "e. Nulla
facilisi. Sed tristique"
Server: BUFFERING
Server: RAW " eros eu libero. Pellentesque ve"
Server: BUFFERING
Server: RAW "l arcu. Vivamus
purus orci, iacu"
Server: BUFFERING
Server: RAW "lis ac, suscipit sit amet, pulvi"
Server: BUFFERING
Server: RAW "nar eu,
lacus. Praesent placerat"
Server: BUFFERING
Server: RAW " tortor sed nisl. Nunc blandit d"
Server: BUFFERING
Server: RAW "iam egestas
dui. Pellentesque ha"
Server: BUFFERING
Server: RAW "bitant morbi tristique senectus "
Server: BUFFERING
Server: RAW "et netus et
malesuada fames ac t"
Server: BUFFERING
Server: RAW "urpis egestas. Aliquam viverra f"
Server: BUFFERING
Server: RAW "ringilla
leo. Nulla feugiat augu"
Server: BUFFERING
Server: RAW "e eleifend nulla. Vivamus mauris"
Server: BUFFERING
Server: RAW ". Vivamus sed
mauris in nibh pla"
Server: BUFFERING
Server: RAW "cerat egestas. Suspendisse poten"
Server: BUFFERING
Server: RAW "ti. Mauris massa. Ut
eget velit "
Server: BUFFERING
Server: RAW "auctor tortor blandit sollicitud"
Server: BUFFERING
Server: RAW "in. Suspendisse imperdiet
justo."
Server: BUFFERING
Server: RAW "
"
Server: BUFFERING
Server: FLUSHING "425a68393141592653590fd264ff00004357800010400524074b003ff7ff0040"
Server: FLUSHING "01dd936c1834269926d4d13d232640341a986935343534f5000018d311846980"
Client: READ "425a68393141592653590fd264ff00004357800010400524074b003ff7ff0040"
Server: FLUSHING "0001299084530d35434f51ea1ea13fce3df02cb7cde200b67bb8fca353727a30"
Client: BUFFERING
Server: FLUSHING "fe67cdcdd2307c455a3964fad491e9350de1a66b9458a40876613e7575a9d2de"
Client: READ "01dd936c1834269926d4d13d232640341a986935343534f5000018d311846980"
Server: FLUSHING "db28ab492d5893b99616ebae68b8a61294a48ba5d0a6c428f59ad9eb72e0c40f"
Client: BUFFERING
Server: FLUSHING "f449c4f64c35ad8a27caa2bbd9e35214df63183393aa35919a4f1573615c6ae3"
Client: READ "0001299084530d35434f51ea1ea13fce3df02cb7cde200b67bb8fca353727a30"
Server: FLUSHING "611f18917467ad690abb4cb67a3a5f1fd36c2511d105836a0fed317be03702ba"
Client: BUFFERING
Server: FLUSHING "394984c68a595d1cc2f5219a1ada69b6d6863cf5bd925f36626046d68c3a9921"
Client: READ "fe67cdcdd2307c455a3964fad491e9350de1a66b9458a40876613e7575a9d2de"
Server: FLUSHING "3103445c9d2438d03b5a675dfdc74e3bed98e8b72dec76c923afa395eb5ce61b"
Client: BUFFERING
Server: FLUSHING "50cfc0ccaaa726b293a50edc28b551261dd09a24aba682972bc75f1fae4c4765"
Client: READ "db28ab492d5893b99616ebae68b8a61294a48ba5d0a6c428f59ad9eb72e0c40f"
Server: FLUSHING "f3b7eeea36e771e577350970dab4baf07750ccf96494df9e63a9454b7133be1d"
Client: BUFFERING
Server: FLUSHING "ee330da50a869eea59f73319b18959262860897dafdc965ac4b79944c4cc3341"
Client: READ "f449c4f64c35ad8a27caa2bbd9e35214df63183393aa35919a4f1573615c6ae3"
Server: FLUSHING "5b23816d45912c8860f40ea930646fc8adbc48040cbb6cd4fc222f8c66d58256"
Client: BUFFERING
Server: FLUSHING "d508d8eb4f43986b9203e13f8bb9229c284807e9327f80"
Client: READ "611f18917467ad690abb4cb67a3a5f1fd36c2511d105836a0fed317be03702ba"
Client: BUFFERING
Client: READ "394984c68a595d1cc2f5219a1ada69b6d6863cf5bd925f36626046d68c3a9921"
Client: BUFFERING
Client: READ "3103445c9d2438d03b5a675dfdc74e3bed98e8b72dec76c923afa395eb5ce61b"
Client: BUFFERING
Client: READ "50cfc0ccaaa726b293a50edc28b551261dd09a24aba682972bc75f1fae4c4765"
Client: BUFFERING
Client: READ "f3b7eeea36e771e577350970dab4baf07750ccf96494df9e63a9454b7133be1d"
Client: BUFFERING
Client: READ "ee330da50a869eea59f73319b18959262860897dafdc965ac4b79944c4cc3341"
Client: BUFFERING
Client: READ "5b23816d45912c8860f40ea930646fc8adbc48040cbb6cd4fc222f8c66d58256"
Client: BUFFERING
Client: READ "d508d8eb4f43986b9203e13f8bb9229c284807e9327f80"
Client: DECOMPRESSED "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec
egestas, enim et consectetuer ullamcorper, lectus ligula rutrum leo, a
elementum elit tortor eu quam. Duis tincidunt nisi ut ante. Nulla
facilisi. Sed tristique eros eu libero. Pellentesque vel arcu. Vivamus
purus orci, iaculis ac, suscipit sit amet, pulvinar eu,
lacus. Praesent placerat tortor sed nisl. Nunc blandit diam egestas
dui. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Aliquam viverra fringilla
leo. Nulla feugiat augue eleifend nulla. Vivamus mauris. Vivamus sed
mauris in nibh placerat egestas. Suspendisse potenti. Mauris massa. Ut
eget velit auctor tortor blandit sollicitudin. Suspendisse imperdiet
justo.
"
Client: response matches file contents: True
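The same buffering behavior can be observed without any sockets. The following is a minimal sketch, assuming Python 3 (so the data is bytes rather than the Python 2 strings used above) and a made-up input string in place of lorem.txt.

```python
import bz2

BLOCK_SIZE = 32
original = b'Lorem ipsum dolor sit amet, consectetuer adipiscing elit. ' * 20

# Compress block by block; compress() may return b'' while it buffers.
compressor = bz2.BZ2Compressor()
chunks = []
for start in range(0, len(original), BLOCK_SIZE):
    chunk = compressor.compress(original[start:start + BLOCK_SIZE])
    if chunk:
        chunks.append(chunk)
chunks.append(compressor.flush())   # emit whatever is still buffered
compressed = b''.join(chunks)

# Feed the compressed bytes back in small pieces.
decompressor = bz2.BZ2Decompressor()
restored = b''
for start in range(0, len(compressed), BLOCK_SIZE):
    restored += decompressor.decompress(compressed[start:start + BLOCK_SIZE])

print(len(original), len(compressed), restored == original)
```

Calling flush() is what finishes the stream. Without it, the bytes still held by the compressor are never emitted and the decompressor cannot reach the end of the data.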

Mixed Content Streams

The BZ2Decompressor class can also be used in situations where compressed and uncompressed data is mixed together. After decompressing all of the data, the unused_data attribute contains any data not used.

import bz2

lorem = open('lorem.txt', 'rt').read()
compressed = bz2.compress(lorem)
combined = compressed + lorem

decompressor = bz2.BZ2Decompressor()
decompressed = decompressor.decompress(combined)

print 'Decompressed matches lorem:', decompressed == lorem
print 'Unused data matches lorem :', decompressor.unused_data == lorem

$ python bz2_mixed.py
Decompressed matches lorem: True
Unused data matches lorem : True
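For comparison, here is roughly the same experiment as a Python 3 sketch. It uses short made-up byte strings instead of lorem.txt, so it can be run without any input files.

```python
import bz2

payload = b'This is the original text.'
trailer = b'plain bytes after the compressed stream'
combined = bz2.compress(payload) + trailer

decompressor = bz2.BZ2Decompressor()
decompressed = decompressor.decompress(combined)

# Everything after the end of the bzip2 stream lands in unused_data.
print(decompressed == payload)
print(decompressor.unused_data == trailer)
```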

Writing Compressed Files

The BZ2File class can be used to write to and read from bzip2-compressed files using the usual methods for writing and reading data. To write data into a compressed file, open the file with mode 'w'.

import bz2
import os

output = bz2.BZ2File('example.txt.bz2', 'wb')
try:
    output.write('Contents of the example file go here.\n')
finally:
    output.close()

os.system('ls -l example.txt.bz2')
os.system('file example.txt.bz2')

$ python bz2_file_write.py
-rw-r--r-- 1 dhellmann dhellmann 74 Dec 24 10:44 example.txt.bz2
example.txt.bz2: bzip2 compressed data, block size = 900k

Different compression levels can be used by passing a compresslevel argument. Valid values range from 1 to 9, inclusive. Lower values are faster and result in less compression. Higher values are slower and compress more, up to a point.

import bz2
import os

data = open('lorem.txt', 'r').read() * 1024
print 'Input contains %d bytes' % len(data)

for i in xrange(1, 10):
    filename = 'compress-level-%s.bz2' % i
    output = bz2.BZ2File(filename, 'wb', compresslevel=i)
    try:
        output.write(data)
    finally:
        output.close()
    os.system('cksum %s' % filename)

The center column of numbers in the output of the script is the size in bytes of the files produced. For this input data, higher compression levels do not always pay off in decreased storage space: levels 4 and 5 produce files of the same size, as do levels 8 and 9. Results will vary with the input.

$ python bz2_file_compresslevel.py
3018243926 8771 compress-level-1.bz2
1942389165 4949 compress-level-2.bz2
2596054176 3708 compress-level-3.bz2
1491394456 2705 compress-level-4.bz2
1425874420 2705 compress-level-5.bz2
2232840816 2574 compress-level-6.bz2
447681641 2394 compress-level-7.bz2
3699654768 1137 compress-level-8.bz2
3103658384 1137 compress-level-9.bz2
Input contains 754688 bytes
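
The same comparison can be made without writing files or shelling out to cksum. This is a rough sketch for Python 3 that compresses in memory with bz2.compress() and records the size at each level. The sample data is a hypothetical stand-in for lorem.txt, so the sizes it prints will not match the numbers above.

```python
import bz2

# Stand-in for the repeated lorem.txt contents (assumption).
data = b'Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n' * 1024
print('Input contains %d bytes' % len(data))

# Map each compression level to the size of the compressed result.
sizes = {}
for level in range(1, 10):
    sizes[level] = len(bz2.compress(data, compresslevel=level))

for level, size in sorted(sizes.items()):
    print('level %d: %6d bytes' % (level, size))
```

Measuring on a sample of the real data before picking a level is usually more informative than assuming that 9 is always best.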

A BZ2File instance also includes a writelines() method that can be used to write a sequence of strings.

import bz2
import itertools
import os

output = bz2.BZ2File('example_lines.txt.bz2', 'wb')
try:
    output.writelines(itertools.repeat('The same line, over and over.\n', 10))
finally:
    output.close()

os.system('bzcat example_lines.txt.bz2')
$ python bz2_file_writelines.py
The same line, over and over.
The same line, over and over.
The same line, over and over.
The same line, over and over.
The same line, over and over.
The same line, over and over.
The same line, over and over.
The same line, over and over.
The same line, over and over.
The same line, over and over.

Reading Compressed Files

To read data back from previously compressed files, simply open the file with mode 'r'.

import bz2

input_file = bz2.BZ2File('example.txt.bz2', 'rb')
try:
    print input_file.read()
finally:
    input_file.close()

This example reads the file written by bz2_file_write.py from the previous section.

$ python bz2_file_read.py
Contents of the example file go here.

While reading a file, it is also possible to seek and read only part of the data.

import bz2

input_file = bz2.BZ2File('example.txt.bz2', 'rb')
try:
    print 'Entire file:'
    all_data = input_file.read()
    print all_data

    expected = all_data[5:15]

    # rewind to beginning
    input_file.seek(0)

    # move ahead 5 bytes
    input_file.seek(5)
    print 'Starting at position 5 for 10 bytes:'
    partial = input_file.read(10)
    print partial

    print
    print expected == partial
finally:
    input_file.close()

The seek() position is relative to the uncompressed data, so the caller does not even need to know that the data file is compressed.

$ python bz2_file_seek.py
Entire file:
Contents of the example file go here.

Starting at position 5 for 10 bytes:
nts of the

True
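
For readers on Python 3, here is a hedged sketch of the same seek-and-read pattern using bz2.open() and the with statement instead of try/finally. It writes its own copy of the example file into a temporary directory, so the path and the payload variable are assumptions made for the sketch rather than part of the scripts above.

```python
import bz2
import os
import tempfile

payload = b'Contents of the example file go here.\n'

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.txt.bz2')

    # Write the compressed file; the context manager closes it.
    with bz2.open(path, 'wb') as output:
        output.write(payload)

    with bz2.open(path, 'rb') as input_file:
        # Offsets refer to positions in the uncompressed data.
        input_file.seek(5)
        partial = input_file.read(10)
        position = input_file.tell()

print('Partial read matches slice:', partial == payload[5:15])
print('Position after read:', position)
```

As with BZ2File in the example above, tell() reports the position in the uncompressed data, not in the compressed file on disk.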

See also

bz2
The standard library documentation for this module.
bzip2.org
The home page for bzip2.
zlib
The zlib module for GNU zip compression.


Chinese translation of PyMOTW

I was contacted yesterday about a Chinese translation of PyMOTW. Junjie Cai (蔡俊杰) and Yan Sheng (盛艳) have started a Google Code project called PyMOTWCN (http://code.google.com/p/pymotwcn/) and posted the completed translations at http://www.vbarter.cn/pymotw/.

My holidays were busier than anticipated, so I didn't have a chance to continue my research into DVCS tools and hosting. I plan to work on that this month and come to a conclusion soon. I'm currently leaning towards bitbucket.org and Mercurial, but I haven't really done the level of investigation into git that I wanted, so I can't say that's a final decision.

Updated to correct Junjie and Yan's names.

Saturday, January 3, 2009

To Build or To Buy? | Import This!

Recently a friend sent me a link to an old post on Rick Copeland's blog titled Three Reasons Why You Shouldn't Write Your Own Web Framework. Although Rick talks specifically about open source web frameworks, his comments raised the more general question of how we should draw the line when deciding whether to build a new anything versus "buying" or using an existing project. The answer is not always clear, because it depends on a lot of factors.

This article was originally published by Python Magazine in October of 2008 as part of a series of columns I wrote as Editor in Chief for Python Magazine under the title "Import This!".

Read more

Friday, January 2, 2009

New Year's meme: What are the oldest files in your home directory?

Following up on Brandon's meme:

The Rules:

Celebrate the new year with a blog post discussing the oldest files that are still sitting somewhere beneath your home directory! The procedure is simple:

1. Run the following script in your home directory. (You might want to use less to read the output.)
2. Ignore files whose date does not reflect your own activity.
3. List the oldest files in a blog post and discuss!

#!/usr/bin/env python
"""Print last-modified times of files beneath '.', oldest first."""
import os, os.path, time
paths = ( os.path.join(b,f)
          for (b,ds,fs) in os.walk('.')
          for f in fs )
for mtime, path in sorted( (os.lstat(p).st_mtime, p)
                           for p in paths ):
    print time.strftime("%Y-%m-%d", time.localtime(mtime)), path


Only include files whose last-modified time is a date on which you really touched the file. The file's time should neither result from an error (a few files beneath my own home directory have an incorrect date of 1970-01-01), nor from unpacking someone else's archive that has old files inside of it.

But there is no requirement that the actual content of each file you list be your own. Whether you wrote the file yourself long ago, or downloaded it from some ancient and forgotten FTP site, you have a story to share!

My Files:

After weeding out old files extracted from Python source archives for various versions of Python (used while writing my PyMOTW series), I found an old Pascal program for doing payroll at a small company I used to work for while I was an undergrad:


1992-03-12 ./Documents/1995/College/SEPA/PAYROLL.PRG/PAYROLL.PAS
1992-03-12 ./Documents/1995/College/SEPA/PAYROLL.PRG/WWE.PAS


I also found some files containing a paper about SEDPAK, a sequence stratigraphy simulation program I worked on as part of the Stratmod Group. Based on the new web site, it looks like they've made some major enhancements since I left school, lo these many years ago.


1992-12-10 ./Documents/1995/College/Sedpak/POHLMAN2
1992-12-10 ./Documents/1995/College/Sedpak/POHLMANR.EPO


From around the same era, I found the help manual for Mayday!, the help desk app we built in my software engineering course. It was pretty fancy for its time (the all-caps 8.3 naming convention should clue you in to its relative age). A Motif GUI and curses UI let the admins check new trouble tickets and resolve them when fixed. Incoming reports came from email processed by an impressive batch script.


1993-01-26 ./Documents/1995/College/SCHOOL/640/MAYDAY/APPLY.DOC


Thanks for instigating a trip down memory lane, Brandon!

Thursday, January 1, 2009

Releasing Software | Import This!

"Release early, release often." How often is enough and does it count if you just commit?

I've always been interested in development processes and how they impact the software being created. I have worked in shops with a range of formal processes, and after some experimentation with different methodologies, I find myself most comfortable somewhere in the middle of the spectrum. I like to have enough controls in place to understand how forward progress is made and prevent regressions, but also want a certain degree of flexibility to bend the rules when the situation calls for it. While no software is ever complete, I want the ability to put a stake in the ground and say "This version is done", for some definition of "done".

This article was originally published by Python Magazine in September of 2008 as part of a series of columns I wrote as Editor in Chief for Python Magazine under the title "Import This!".

Read more