Sunday, June 28, 2009

PyMOTW: pyclbr

pyclbr – Python class browser support

Purpose:Implements an API suitable for use in a source code editor for making a class browser.
Python Version:1.4 and later

pyclbr can scan Python source to find classes and stand-alone functions. The information about class, method, and function names and line numbers is gathered using tokenize without importing the code.

The examples below use this source file as input:

"""Example source for pyclbr.
"""

class Base(object):
    """This is the base class.
    """

    def method1(self):
        return

class Sub1(Base):
    """This is the first subclass.
    """

class Sub2(Base):
    """This is the second subclass.
    """

class Mixin:
    """A mixin class.
    """

    def method2(self):
        return

class MixinUser(Sub2, Mixin):
    """Overrides method1 and method2
    """

    def method1(self):
        return

    def method2(self):
        return

    def method3(self):
        return

def my_function():
    """Stand-alone function.
    """
    return

Scanning for Classes

There are two public functions exposed by pyclbr. readmodule() takes the name of the module as an argument and returns a mapping of class names to Class objects containing the meta-data about the class source.

import pyclbr
import os
from operator import itemgetter

def show_class(name, class_data):
    print 'Class:', name
    print '\tFile: {0} [{1}]'.format(os.path.basename(class_data.file), class_data.lineno)
    show_super_classes(name, class_data)
    show_methods(name, class_data)
    print
    return

def show_methods(class_name, class_data):
    for name, lineno in sorted(class_data.methods.items(), key=itemgetter(1)):
        print '\tMethod: {0} [{1}]'.format(name, lineno)
    return

def show_super_classes(name, class_data):
    super_class_names = []
    for super_class in class_data.super:
        if super_class == 'object':
            continue
        if isinstance(super_class, basestring):
            super_class_names.append(super_class)
        else:
            super_class_names.append(super_class.name)
    if super_class_names:
        print '\tSuper classes:', super_class_names
    return

example_data = pyclbr.readmodule('pyclbr_example')

for name, class_data in sorted(example_data.items(), key=lambda x:x[1].lineno):
    show_class(name, class_data)

The meta-data for the class includes the file and line number where it is defined, as well as the names of super classes. The methods of the class are saved as a mapping between method name and line number. The output below shows the classes and methods listed in order based on their line number in the source file.

$ python pyclbr_readmodule.py
Class: Base
File: pyclbr_example.py [10]
Method: method1 [14]

Class: Sub1
File: pyclbr_example.py [17]
Super classes: ['Base']

Class: Sub2
File: pyclbr_example.py [21]
Super classes: ['Base']

Class: Mixin
File: pyclbr_example.py [25]
Method: method2 [29]

Class: MixinUser
File: pyclbr_example.py [32]
Super classes: ['Sub2', 'Mixin']
Method: method1 [36]
Method: method2 [39]
Method: method3 [42]
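
The listing above is written for Python 2. As a rough sketch of the same scan under Python 3 (an addition here, not part of the original article), the example below writes a cut-down module to a temporary directory and passes that directory as the path argument to readmodule(). The module name pyclbr_sketch_classes and the helper scan_classes() are made up for illustration.

```python
import os
import pyclbr
import tempfile

# A cut-down version of the example source (hypothetical, for illustration).
SOURCE = '''\
class Base(object):
    def method1(self):
        return

class Sub1(Base):
    pass

def my_function():
    return
'''


def scan_classes(source, module_name='pyclbr_sketch_classes'):
    """Write source to a temporary module and scan it with readmodule()."""
    with tempfile.TemporaryDirectory() as tmpdir:
        filename = os.path.join(tmpdir, module_name + '.py')
        with open(filename, 'w') as f:
            f.write(source)
        # The second argument lists extra directories to search
        # before sys.path; the module is read, not imported.
        return pyclbr.readmodule(module_name, [tmpdir])


classes = scan_classes(SOURCE)
for name, data in sorted(classes.items(), key=lambda item: item[1].lineno):
    # Entries in data.super are Class objects when the base class was
    # found in the scanned source, and plain strings otherwise.
    supers = [s if isinstance(s, str) else s.name for s in data.super]
    print('Class: {} [{}] supers={} methods={}'.format(
        name, data.lineno, supers, sorted(data.methods.items())))
```

Note that my_function does not appear in the result, since readmodule() reports classes only.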

Scanning for Functions

The other public function in pyclbr is readmodule_ex(). It does everything that readmodule() does, and adds functions to the result set.

import pyclbr
import os
from operator import itemgetter

example_data = pyclbr.readmodule_ex('pyclbr_example')

for name, data in sorted(example_data.items(), key=lambda x:x[1].lineno):
    if isinstance(data, pyclbr.Function):
        print 'Function: {0} [{1}]'.format(name, data.lineno)

Each Function object has properties much like the Class object.

$ python pyclbr_readmodule_ex.py
Function: my_function [45]
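
A comparable Python 3 sketch for readmodule_ex() follows. It is an illustration rather than the article's own script: the module name pyclbr_sketch_functions and the shortened source are assumptions, and the source is written to a temporary directory so the example stands alone.

```python
import os
import pyclbr
import tempfile

SOURCE = '''\
class Base(object):
    def method1(self):
        return

def my_function():
    return
'''

module_name = 'pyclbr_sketch_functions'  # hypothetical module name
with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, module_name + '.py'), 'w') as f:
        f.write(SOURCE)
    example_data = pyclbr.readmodule_ex(module_name, [tmpdir])

# Keep only the top-level functions, ordered by line number.
functions = sorted(
    (data.lineno, name)
    for name, data in example_data.items()
    if isinstance(data, pyclbr.Function)
)
for lineno, name in functions:
    print('Function: {} [{}]'.format(name, lineno))
```

The mapping also contains the Base class, which is why the isinstance() check is needed to pick out the functions.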

See also

pyclbr
The standard library documentation for this module.
inspect
The inspect module can discover more meta-data about classes and functions, but requires importing the code.
tokenize
The tokenize module parses Python source code into tokens.

PyMOTW Home

The canonical version of this article

Sunday, June 21, 2009

PyMOTW: robotparser


robotparser – Internet spider access control

Purpose:Parse robots.txt file used to control Internet spiders
Python Version:2.1.3 and later

robotparser implements a parser for the robots.txt file format, including a simple function for checking if a given user agent can access a resource. It is intended for use in well-behaved spiders or other crawler applications that need to either be throttled or otherwise restricted.

Note

The robotparser module has been renamed urllib.robotparser in Python 3.0.
Existing code using robotparser can be updated using 2to3.

robots.txt

The robots.txt file format is a simple text-based access control system for computer programs that automatically access web resources (“spiders”, “crawlers”, etc.). The file is made up of records that specify the user agent identifier for the program followed by a list of URLs (or URL prefixes) the agent may not access.

This is the robots.txt file for http://www.doughellmann.com/:

User-agent: *
Disallow: /admin/
Disallow: /downloads/
Disallow: /media/
Disallow: /static/
Disallow: /codehosting/

It prevents access to some of the expensive parts of my site that would overload the server if a search engine tried to index them. For a more complete set of examples, refer to The Web Robots Page.

Simple Example

Using the data above, a simple crawler can test whether it is allowed to download a page using the RobotFileParser's can_fetch() method.

import robotparser
import urlparse

AGENT_NAME = 'PyMOTW'
URL_BASE = 'http://www.doughellmann.com/'
parser = robotparser.RobotFileParser()
parser.set_url(urlparse.urljoin(URL_BASE, 'robots.txt'))
parser.read()

PATHS = [
    '/',
    '/PyMOTW/',
    '/admin/',
    '/downloads/PyMOTW-1.92.tar.gz',
    ]

for path in PATHS:
    print '%6s : %s' % (parser.can_fetch(AGENT_NAME, path), path)
    url = urlparse.urljoin(URL_BASE, path)
    print '%6s : %s' % (parser.can_fetch(AGENT_NAME, url), url)
    print

The URL argument to can_fetch() can be a path relative to the root of the site, or a full URL.

$ python robotparser_simple.py
True : /
True : http://www.doughellmann.com/

True : /PyMOTW/
True : http://www.doughellmann.com/PyMOTW/

False : /admin/
False : http://www.doughellmann.com/admin/

False : /downloads/PyMOTW-1.92.tar.gz
False : http://www.doughellmann.com/downloads/PyMOTW-1.92.tar.gz
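
For readers on Python 3, where the module lives at urllib.robotparser, the check can be sketched roughly as follows. To keep the example independent of the network, it feeds the rules quoted earlier to parse() instead of calling read(); that substitution is a choice made for this sketch, not what the script above does.

```python
from urllib import robotparser
from urllib.parse import urljoin

AGENT_NAME = 'PyMOTW'
URL_BASE = 'http://www.doughellmann.com/'

# The rules quoted earlier, supplied directly so no download is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /downloads/
Disallow: /media/
Disallow: /static/
Disallow: /codehosting/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

results = {}
for path in ['/', '/PyMOTW/', '/admin/', '/downloads/PyMOTW-1.92.tar.gz']:
    url = urljoin(URL_BASE, path)
    # can_fetch() accepts either the site-relative path or the full URL.
    results[path] = (parser.can_fetch(AGENT_NAME, path),
                     parser.can_fetch(AGENT_NAME, url))
    print('{!s:>6} : {}'.format(results[path][0], path))
    print('{!s:>6} : {}'.format(results[path][1], url))
```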

Long-lived Spiders

An application that takes a long time to process the resources it downloads or that is throttled to pause between downloads may want to check for new robots.txt files periodically based on the age of the content it has downloaded already. The age is not managed automatically, but there are convenience methods to make tracking it easier.

import robotparser
import time
import urlparse

AGENT_NAME = 'PyMOTW'
parser = robotparser.RobotFileParser()
# Using the local copy
parser.set_url('robots.txt')
parser.read()
parser.modified()

PATHS = [
    '/',
    '/PyMOTW/',
    '/admin/',
    '/downloads/PyMOTW-1.92.tar.gz',
    ]

for n, path in enumerate(PATHS):
    print
    age = int(time.time() - parser.mtime())
    print 'age:', age,
    if age > 1:
        print 're-reading robots.txt'
        parser.read()
        parser.modified()
    else:
        print
    print '%6s : %s' % (parser.can_fetch(AGENT_NAME, path), path)
    # Simulate a delay in processing
    time.sleep(1)

This extreme example downloads a new robots.txt file if the one it has is more than 1 second old.

$ python robotparser_longlived.py

age: 0
True : /

age: 1
True : /PyMOTW/

age: 2 re-reading robots.txt
False : /admin/

age: 1
False : /downloads/PyMOTW-1.92.tar.gz

A “nicer” version of the long-lived application might request the modification time for the file before downloading the entire thing. On the other hand, robots.txt files are usually fairly small, so it isn’t that much more expensive to just grab the entire document again.
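
The age test itself can be pulled into a small helper. The sketch below uses Python 3's urllib.robotparser; is_stale(), max_age, and the now parameter are hypothetical names introduced for illustration, and the rules are parsed from a list so the example does not need a server.

```python
import time
from urllib import robotparser

ROBOTS_TXT = ['User-agent: *', 'Disallow: /admin/']


def is_stale(parser, max_age, now=None):
    """Return True when the rules were marked fresh more than max_age seconds ago."""
    if now is None:
        now = time.time()
    return (now - parser.mtime()) > max_age


parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT)
parser.modified()  # record the time the rules were loaded

# Checked immediately, the rules are still fresh.
fresh_check = is_stale(parser, max_age=60)
# Pretend two minutes have passed since the rules were loaded.
later_check = is_stale(parser, max_age=60, now=parser.mtime() + 120)
print(fresh_check, later_check)
```

Passing now explicitly makes the helper easy to exercise without sleeping; a real spider would leave it unset and re-read the file whenever is_stale() returns True.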

See also

robotparser
The standard library documentation for this module.
The Web Robots Page
Description of robots.txt format.


Friday, June 19, 2009

Exception Handling Techniques

Last week when a colleague was reviewing some code I had written, he had a few questions about an unusual exception handling technique I was using. I had been fixing a bug in which the cleanup code from one error was generating another error, and the second exception masked the first. I went through a few contortions before I came up with a solution that allowed the cleanup code to fail but still preserved the original error and traceback for the user.

After our conversation, I decided to write up a quick example to share. I prefaced the description with some basic exception handling information to put it in context, and I posted the resulting article on my site.
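
The article itself is not reproduced in this post. As a rough illustration of the problem described, here is one way to keep a failing cleanup step from masking the original error in Python 3. It is a sketch under that assumption, not necessarily the technique from the article, and run_with_cleanup() and the two failing helpers are hypothetical names.

```python
import logging

logger = logging.getLogger(__name__)


def run_with_cleanup(work, cleanup):
    """Run work(), then cleanup(); a failing cleanup never hides a work() error."""
    try:
        work()
    except Exception:
        try:
            cleanup()
        except Exception:
            # Record the secondary failure instead of letting it propagate.
            logger.exception('cleanup failed while handling an error')
        # Bare raise re-raises the exception from work(), traceback intact.
        raise
    else:
        cleanup()


def failing_work():
    raise ValueError('original problem')


def failing_cleanup():
    raise RuntimeError('cleanup problem')


try:
    run_with_cleanup(failing_work, failing_cleanup)
except ValueError as err:
    caught = err
    print('caught:', err)
```

The caller sees the ValueError from the work step, while the RuntimeError from the cleanup step is only logged.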

Sunday, June 14, 2009

PyMOTW: gettext

gettext – Message Catalogs

Purpose:Message catalog API for internationalization.
Python Version:2.1.3 and later

The gettext module provides an all-Python implementation compatible with the GNU gettext library for message translation and catalog management. The tools available with the Python source distribution enable you to extract messages from your source, build a message catalog containing translations, and use that message catalog to print an appropriate message for the user at runtime.

Message catalogs can be used to provide internationalized interfaces for your program, showing messages in a language appropriate to the user. They can also be used for other message customizations, including “skinning” an interface for different wrappers or partners.

Note

Although the standard library documentation says everything you need is included with
Python, I found that pygettext.py refused to extract messages wrapped in the
ungettext call, even when I used what seemed to be the appropriate command line
options. I ended up installing the GNU gettext tools from source and using
xgettext instead. YMMV.

Translation Workflow Overview

The process for setting up and using translations includes five steps:

  1. Mark up literal strings in your code that contain messages to translate.

    Start by identifying the messages within your program source that need to be translated,
    and marking the literal strings so the extraction program can find them.

  2. Extract the messages.

    After you have identified the translatable strings in your program source, use
    xgettext to pull the strings out and create a .pot file, or translation
    template. The template is a text file with copies of all of the strings you identified
    and placeholders for their translations.

  3. Translate the messages.

    Give a copy of the .pot file to the translator, changing the extension to .po. The
    .po file is an editable source file used as input for the compilation step. The
    translator should update the header text in the file and provide translations for all of
    the strings.

  4. “Compile” the message catalog from the translation.

    When the translator gives you back the completed .po file, compile the text file to
    the binary catalog format using msgfmt. The binary format is used by the runtime
    catalog lookup code.

  5. Load and activate the appropriate message catalog at runtime.

    The final step is to add a few lines to your application to configure and load the message
    catalog and install the translation function. There are a couple of ways to do that, with
    associated trade-offs, and each is covered below.

Let’s go through those steps in a little more detail, starting with the modifications you need to make to your code.

Creating Message Catalogs from Source Code

gettext works by looking up literal strings embedded in your program in a database of translations, and pulling out the appropriate translated string. There are several variations of the functions for accessing the catalog, depending on whether you are working with Unicode strings or not. The usual pattern is to bind the lookup function you want to use to the name _ so that your code is not cluttered with lots of calls to functions with longer names.

The message extraction program, xgettext, looks for messages embedded in calls to the catalog lookup functions. It understands different source languages, and uses an appropriate parser for each. If you use aliases for the lookup functions or need to add extra functions, you can give xgettext the names of additional symbols to consider when extracting messages.

Here’s a simple script with a single message ready to be translated:

import gettext

# Set up message catalog access
t = gettext.translation('gettext_example', 'locale', fallback=True)
_ = t.ugettext

print _('This message is in the script.')

In this case I am using the Unicode version of the lookup function, ugettext(). The text "This message is in the script." is the message to be substituted from the catalog. I’ve enabled fallback mode, so if we run the script without a message catalog, the in-lined message is printed:

$ python gettext_example.py
This message is in the script.
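
In Python 3 all strings are Unicode and ugettext() no longer exists, so the same setup binds _ to gettext() instead. A minimal sketch, assuming a made-up domain named gettext_sketch for which no catalog is installed:

```python
import gettext

# 'gettext_sketch' is a hypothetical domain with no catalog installed.
t = gettext.translation('gettext_sketch', 'locale', fallback=True)
_ = t.gettext  # Python 3 strings are Unicode, so there is no ugettext()

message = _('This message is in the script.')
print(message)
```

Because fallback is enabled and no catalog is found, the lookup returns the in-line message unchanged.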

The next step is to extract the message(s) and create the .pot file, using xgettext.

$ xgettext -d gettext_example -o gettext_example.pot gettext_example.py

The output file produced looks like:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2009-06-14 11:39-0400\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: gettext_example.py:16
msgid "This message is in the script."
msgstr ""

Message catalogs are installed into directories organized by domain and language. The domain is usually a unique value like your application name. In this case, I used gettext_example. The language value is provided by the user’s environment at runtime, through one of the environment variables LANGUAGE, LC_ALL, LC_MESSAGES, or LANG, depending on their configuration and platform. My language is set to en_US so that’s what I’ll be using in all of the examples below.

Now that we have the template, the next step is to create the required directory structure and copy the template in to the right spot. I’m going to use the locale directory inside the PyMOTW source tree as the root of my message catalog directory, but you would typically want to use a directory accessible system-wide. The full path to the catalog input source is $localedir/$language/LC_MESSAGES/$domain.po, and the actual catalog has the filename extension .mo.
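
The path layout can be expressed as a small helper. This is only a sketch of the naming convention described above; catalog_path() is a hypothetical function and is not part of the gettext module.

```python
import os


def catalog_path(localedir, language, domain, extension='mo'):
    """Build $localedir/$language/LC_MESSAGES/$domain.<extension>."""
    return os.path.join(localedir, language, 'LC_MESSAGES',
                        '{}.{}'.format(domain, extension))


# The editable input file and the compiled catalog sit side by side.
po_file = catalog_path('locale', 'en_US', 'gettext_example', 'po')
mo_file = catalog_path('locale', 'en_US', 'gettext_example')
print(po_file)
print(mo_file)
```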

For my configuration, I need to copy gettext_example.pot to locale/en_US/LC_MESSAGES/gettext_example.po and edit it to change the values in the header and add my alternate messages. The result looks like:

# Messages from gettext_example.py.
# Copyright (C) 2009 Doug Hellmann
# Doug Hellmann <doug.hellmann@gmail.com>, 2009.
#
msgid ""
msgstr ""
"Project-Id-Version: PyMOTW 1.92\n"
"Report-Msgid-Bugs-To: Doug Hellmann <doug.hellmann@gmail.com>\n"
"POT-Creation-Date: 2009-06-07 10:31+EDT\n"
"PO-Revision-Date: 2009-06-07 10:31+EDT\n"
"Last-Translator: Doug Hellmann <doug.hellmann@gmail.com>\n"
"Language-Team: US English <doug.hellmann@gmail.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"


#: gettext_example.py:16
msgid "This message is in the script."
msgstr "This message is in the en_US catalog."


The catalog is built from the .po file using msgfmt:

$ cd locale/en_US/LC_MESSAGES/; msgfmt -o gettext_example.mo gettext_example.po

And now when we run the script, the message from the catalog is printed instead of the in-line string:

$ python gettext_example.py
This message is in the en_US catalog.

Finding Message Catalogs at Runtime

As described above, the locale directory containing the message catalogs is organized based on the language with catalogs named for the domain of the program. Different operating systems define their own default value, but gettext does not know all of these defaults. The default locale directory is sys.prefix + '/share/locale', but most of the time it is safer for you to always explicitly give a localedir value than to depend on any default behavior.

The language portion of the path is taken from one of several environment variables that can be used to configure localization features (LANGUAGE, LC_ALL, LC_MESSAGES, and LANG). The first variable found to be set is used. Multiple languages can be selected by separating the values with a colon (:). We can illustrate how that works by creating a second message catalog and running a few experiments.

$ cd locale/en_CA/LC_MESSAGES/; msgfmt -o gettext_example.mo gettext_example.po
$ python gettext_find.py
Catalogs: ['locale/en_US/LC_MESSAGES/gettext_example.mo']
$ LANGUAGE=en_CA python gettext_find.py
Catalogs: ['locale/en_CA/LC_MESSAGES/gettext_example.mo']
$ LANGUAGE=en_CA:en_US python gettext_find.py
Catalogs: ['locale/en_CA/LC_MESSAGES/gettext_example.mo', 'locale/en_US/LC_MESSAGES/gettext_example.mo']
$ LANGUAGE=en_US:en_CA python gettext_find.py
Catalogs: ['locale/en_US/LC_MESSAGES/gettext_example.mo', 'locale/en_CA/LC_MESSAGES/gettext_example.mo']
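
The source of gettext_find.py is not shown here. A minimal Python 3 sketch of what such a script might do, assuming it relies on gettext.find() with all=True, is below. The helper find_catalogs() is hypothetical, and the example builds a throwaway locale tree and passes the languages explicitly so it does not depend on the environment.

```python
import gettext
import os
import tempfile


def find_catalogs(domain, localedir, languages=None):
    """Return every catalog file gettext would consider, in search order.

    When languages is None, gettext.find() reads LANGUAGE, LC_ALL,
    LC_MESSAGES, and LANG from the environment.
    """
    return gettext.find(domain, localedir, languages=languages, all=True)


# Build a throwaway locale tree. find() only checks that the .mo files
# exist, so empty placeholder files are enough for this demonstration.
with tempfile.TemporaryDirectory() as localedir:
    for language in ('en_US', 'en_CA'):
        directory = os.path.join(localedir, language, 'LC_MESSAGES')
        os.makedirs(directory)
        open(os.path.join(directory, 'gettext_example.mo'), 'wb').close()

    catalogs = find_catalogs('gettext_example', localedir,
                             languages=['en_CA', 'en_US'])
    found = [os.path.relpath(path, localedir) for path in catalogs]

print('Catalogs:', found)
```

The order of the returned list follows the order of the requested languages, which matches the behavior in the shell session above.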

Although find() shows the complete list of catalogs, only the first one in the sequence is actually loaded for message lookups.

$ python gettext_example.py
This message is in the en_US catalog.
$ LANGUAGE=en_CA python gettext_example.py
This message is in the en_CA catalog.
$ LANGUAGE=en_CA:en_US python gettext_example.py
This message is in the en_CA catalog.
$ LANGUAGE=en_US:en_CA python gettext_example.py
This message is in the en_US catalog.

Plural Values

While simple message substitution will handle most of your translation needs, one of the special cases handled explicitly by gettext is pluralization. Depending on the language, the difference between the singular and plural forms of a message may vary only by the ending of a single word, or the entire sentence structure may be different. There may also be different forms depending on the level of plurality. To make managing plurals easier (and possible), there is a separate set of functions for asking for the plural form of a message.

from gettext import translation
import sys

t = translation('gettext_plural', 'locale', fallback=True)
num = int(sys.argv[1])
msg = t.ungettext('%(num)d means singular.', '%(num)d means plural.', num)

# Still need to add the values to the message ourselves.
print msg % {'num':num}

$ xgettext -L Python -d gettext_plural -o gettext_plural.pot gettext_plural.py

Since there are alternate forms to be translated, the replacements are listed in an array. Using an array allows translations for languages with multiple plural forms (Polish, for example, has different forms indicating the relative quantity).

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2009-06-14 11:39-0400\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\n"

#: gettext_plural.py:15
#, python-format
msgid "%(num)d means singular."
msgid_plural "%(num)d means plural."
msgstr[0] ""
msgstr[1] ""

In addition to filling in the translation strings, you will also need to describe the way plurals are formed so the library knows how to index into the array for any given count value. The line "Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\n" includes two values to replace manually. nplurals is an integer indicating the size of the array (the number of translations used) and plural is a C language expression for converting the incoming quantity to an index in the array when looking up the translation. The literal string n is replaced with the quantity passed to ungettext().

For example, English includes two plural forms. A quantity of 0 is treated as plural (“0 bananas”). The Plural-Forms entry should look like:

Plural-Forms: nplurals=2; plural=n != 1;

The singular translation would then go in position 0, and the plural translation in position 1.

# Messages from gettext_plural.py
# Copyright (C) 2009 Doug Hellmann
# This file is distributed under the same license as the PyMOTW package.
# Doug Hellmann <doug.hellmann@gmail.com>, 2009.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PyMOTW 1.92\n"
"Report-Msgid-Bugs-To: Doug Hellmann <doug.hellmann@gmail.com>\n"
"POT-Creation-Date: 2009-06-14 09:29-0400\n"
"PO-Revision-Date: 2009-06-14 09:29-0400\n"
"Last-Translator: Doug Hellmann <doug.hellmann@gmail.com>\n"
"Language-Team: en_US <doug.hellmann@gmail.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=n != 1;"

#: gettext_plural.py:15
#, python-format
msgid "%(num)d means singular."
msgid_plural "%(num)d means plural."
msgstr[0] "In en_US, %(num)d is singular."
msgstr[1] "In en_US, %(num)d is plural."

Running the test script a few times after the catalog is compiled shows how different values of n are converted to indexes for the translation strings.

$ cd locale/en_US/LC_MESSAGES/; msgfmt -o gettext_plural.mo gettext_plural.po
$ python gettext_plural.py 0
In en_US, 0 is plural.
$ python gettext_plural.py 1
In en_US, 1 is singular.
$ python gettext_plural.py 2
In en_US, 2 is plural.
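
To make the index calculation concrete, the sketch below restates two Plural-Forms expressions as Python functions. The English rule is the one given above; the Polish rule is the three-form expression commonly listed for that language and is an addition here, not something taken from this article. Both function names are hypothetical. The loop also calls ngettext() in Python 3 with fallback enabled and a made-up domain, gettext_plural_sketch, that has no catalog.

```python
import gettext


def english_plural_index(n):
    """Python equivalent of the C expression 'n != 1'."""
    return int(n != 1)


def polish_plural_index(n):
    """Python equivalent of the three-form Polish rule (nplurals=3)."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# With fallback=True and no catalog installed, ngettext() simply picks
# the singular string when n == 1 and the plural string otherwise.
t = gettext.translation('gettext_plural_sketch', 'locale', fallback=True)

for n in (0, 1, 2, 5, 22):
    msg = t.ngettext('%(num)d means singular.', '%(num)d means plural.', n)
    print(english_plural_index(n), polish_plural_index(n), msg % {'num': n})
```

With a real catalog, the library evaluates the catalog's own Plural-Forms expression and uses the result to choose among the msgstr entries.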

Application vs. Module Localization

The scope of your translation effort defines how you install and use the gettext functions in your code.

Application Localization

For application-wide translations, it would be acceptable to install a function like ungettext() globally using the __builtins__ namespace because you have control over the top-level of the application’s code.

import gettext
gettext.install('gettext_example', 'locale', unicode=True, names=['ngettext'])

print _('This message is in the script.')

The install() function binds gettext() to the name _() in the __builtins__ namespace. It also adds ngettext() and other functions listed in names. If unicode is true, the Unicode versions of the functions are used instead of the default ASCII versions.

Module Localization

For a library, or individual module, modifying __builtins__ is not a good idea because you don’t know what conflicts you might introduce with an application global value. You can import or re-bind the names of translation functions by hand at the top of your module.

import gettext
t = gettext.translation('gettext_example', 'locale', fallback=True)
_ = t.ugettext
ngettext = t.ungettext

print _('This message is in the script.')
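
For comparison, here is a rough Python 3 rendering of both approaches. Python 3's install() has no unicode argument, and the lookup methods are gettext() and ngettext(). The domain gettext_sketch is hypothetical and has no catalog, so both lookups fall back to the in-line message.

```python
import builtins
import gettext

# Application style: install() adds _() and, via names, ngettext() to the
# builtins namespace. The domain 'gettext_sketch' is hypothetical and has
# no catalog, so install() falls back to returning messages unchanged.
gettext.install('gettext_sketch', 'locale', names=['ngettext'])
app_message = builtins._('This message is in the script.')

# Module style: bind the lookup functions to module-level names instead.
t = gettext.translation('gettext_sketch', 'locale', fallback=True)
_ = t.gettext
ngettext = t.ngettext
module_message = _('This message is in the script.')

print(app_message)
print(module_message)
```

The module-level bindings shadow the installed names only inside the module that defines them, which is the point of the second approach.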

See also

gettext
The standard library documentation for this module.
GNU gettext
The message catalog formats, API, etc. for this module are all based on the original gettext package from GNU. The catalog file formats are compatible, and the command line scripts have similar options (if not identical). The GNU gettext manual has a detailed description of the file formats and describes GNU versions of the tools for working with them.
Internationalizing Python
A paper by Martin von Löwis about techniques for internationalization of Python applications.
Django Internationalization
Another good source of information on using gettext, including real-life examples.


installing GNU gettext for use with Python on OS X

I've been working on my blog post about Python's gettext module for the past couple of mornings, and ran into a snag. The documentation claims that the Python source distribution includes all the tools you'll need, but when I got to the point where I wanted to write examples of internationalizing plural strings, pygettext.py wasn't working.

It worked great for extracting individual string messages up to that point, but refused to extract messages wrapped in the ungettext() call, even when I used what seemed to be the appropriate command line options. I ended up installing the GNU gettext tools and using xgettext instead. Installation took me longer than I expected, so I'm documenting the process I went through here.

Fink or MacPorts: Not so much.

Since OS X doesn't ship with a version of gettext by default, and there doesn't seem to be one in the Xcode package (Apple has their own internationalization tools), I needed to find a copy elsewhere.

Ages ago I had installed fink as an "easy" way to grab copies of these sorts of utilities. However, it seems that somewhere along the line the version of fink I have stopped working (probably due to upgrading to 10.5; I haven't tried using fink or FinkCommander directly in some time). After wiping and reinstalling, I was pleased to find a slightly old version (0.14.5) of gettext installed as part of the default set of packages. Unfortunately, xgettext wasn't included in the package at all.

Next, I grabbed a copy of MacPorts, a competitor to fink. Although I've been warned off of MacPorts by a few people I trust, others have had no problems with it. Installation was fairly easy, and as with fink it installed everything to its own directory tree (under /opt/local). Once I had the port program installed, the next step was to run:

$ sudo port install gettext


It downloaded several dependencies, patched the source, compiled everything, and installed it. Voila! Well, not so much.

Even though the most current version of gettext (0.17) was installed, and the documentation clearly described the included Python language support, the binary refused to recognize any language other than C.

Scratch that.

Compiling from Source: Partial Success

Since MacPorts had to download the package and compile it anyway, I decided to go ahead and do that on my own. I was a little wary because I wasn't exactly sure what port was doing in its "patching" step, but I thought I would give it a try anyway. I snagged the most recent tarball from the gettext site and ran the usual

$ ./configure
$ make


That gave me a binary for xgettext inside the source directory, and testing it against my Python source yielded the results I wanted. A simple .pot file was extracted with the original message strings and placeholders for singular and plural translations.

Next, I thought I'd get clever and install the results into the virtualenv I use for working on PyMOTW. Re-configuring with my --prefix set to $VIRTUAL_ENV, rebuilding, then running make install copied the binaries and a bunch of associated data files right where I expected them to be. And the binary only recognized the C language.

After a little more fighting, I did manage to get it working by installing with a prefix of $VIRTUAL_ENV/gettext and adding $VIRTUAL_ENV/gettext/bin to my PATH. I'm not sure if the problem was solved by clearing out an older, bad version of xgettext from elsewhere in my path or if using $VIRTUAL_ENV as the prefix somehow confused the install script.

Conclusion: I think I understand why internationalization is frequently the last feature dealt with in a project.

Thursday, June 11, 2009

Book Review: The Economics of Iterative Development



The Economics of Iterative Software Development, by Walker Royce, Kurt Bittner, and Mike Perrow, covers techniques for achieving more predictable results with development projects.

Disclaimer: I received a review copy of this book through the PyATL Book Club.

The goal of the book is to encourage adoption of iterative development processes, rather than the old-fashioned waterfall model (does anyone really still use that model?). In the authors' view, iterative processes, where one builds a rough version of a product and continually refines it over time, yield better results in terms of predictable schedules and meeting real requirements. This isn't a new idea, and they make reference to agile processes such as XP and Scrum, along with RUP. But the book isn't necessarily a guide to a specific set of processes. It's framed as more of an argument in favor of the entire class of techniques represented by modern development methodologies.

The authors clearly have experience with development processes and with evaluating and improving the performance of teams. Royce is a VP from IBM's Rational Services group and a contributor to RUP. Bittner is a CTO at Ivar Jacobson Consulting and also contributed to RUP as well as jazz.net. Perrow is a writer within the Rational group at IBM. Their experience shows in the authoritative tone the book takes when presenting best practices.

"day-to-day decisions are dominated by value judgements, cost trade-offs, ..."


They start from the premise that software development is less of an engineering practice and more like a creative endeavor such as producing a film. Only 1/3 of movies deliver on time and within budget, for example. The comparison resonates with me since I have always viewed development work as more creative than mechanical manufacturing or engineering, although I was never quite sure other types of engineering were as non-creative as they are portrayed.

The more I think about it though, the more I am inclined to see managing software development as managing invention, which is even farther from typical engineering than movie production. With software, no two products or projects are exactly the same, so the "best practices" we learn have to be adapted for every situation.

They go on to describe a generalized history of approaches to software development processes. In the '60s and '70s the attitude was "craftsmanship", with lots of customization of tools and processes. In the '80s and '90s the trend was towards an engineering approach, but it still had a lot of innovation with new technology and techniques. Recent techniques have paid more attention to risk, taking advantage of automation.

Write code. Less of it. Mostly high-level languages.


The authors propose a couple of primary ways to reduce risk in development. The first is using component-based and service-oriented architectures. By isolating parts of the implementation from one another and connecting them through established interfaces, you can iterate over different parts, improving each as needed. There is also an emphasis on reducing the amount of "human generated" source code, either through high-level languages, off-the-shelf components, or code generation (unsurprisingly, they specifically mention UML-based code generators).

This was about the point in the book where I realized that it was missing the material needed to back up the assertions and claims being made. With a name like "The Economics of Iterative Software Development," I expected to find more statistics and supporting research material than is provided. I don't disagree with any of the authors' conclusions (indeed, they're hardly new insights for anyone who has read a couple of books on agile methodologies). The problem is that if I were trying to use this book to convince someone who did not already accept the premise, I wouldn't have any basis for an argument.

This lack of background was particularly evident in chapter 7, where they talk about ways to "accelerate cultural change" to iterative development by choosing a high-profile project instead of easing into it with a pilot project. Their rationale is that the people assigned to work on high-profile projects are typically the better performers already, and if you get their buy-in, they will make the change work because of their dedication. The trick is getting the buy-in in the first place, of course.

I found the book interesting, well written, and worth reading. It doesn't quite stand on its own, though, if you're looking for ammunition to change your boss' mind about process. If you have already decided to go with an iterative process, it will reinforce your decision and provide guidance to make it work (particularly in the appendix). But it didn't live up to my expectations, based on the title.

Updated: Check out my notes for this book on readernaut.com.

Using Readernaut for taking notes on books

Over the past couple of months I've been working on reviving the PyATL book club. We received our first batch of books from Addison-Wesley/Pearson Education in time to distribute them at our meeting last month. I'll be posting my review soon, but before I do that I wanted to share my experiences with Nathan Borror's excellent http://readernaut.com/.

The site is easy to use. Once you have an account, just use the search field to find a book, either already in the Readernaut catalog or at Amazon. Add it to your shelf, then record your progress and notes as you read. In true web 2.0 fashion, you can connect with other readers and follow their progress (indeed, I may be coming late to the party on this one, because after I signed up I found a bunch of other Pythonistas already using the site to post notes and reviews about the books they're reading).

After a bit of experimentation, I'm hooked. I used the site to take notes about my first book club book, and am finding the results very helpful while preparing the summary for my review. I can record notes, quotes, and remarks as separate types of comments. All include page numbers, so as I build the review I don't even have to go hunting through the book for references.

Besides posting my own material, I've found a couple of interesting-looking titles among the books others are reading. As though I needed help finding more books to read.

If you take your reading seriously, check out http://readernaut.com/ as a convenient way to keep up with your notes.

Friday, June 5, 2009

new project: sphinxcontrib-paverutils

Kevin Dangoor's Paver includes basic integration for Sphinx, the excellent document production toolkit from Georg Brandl. As I have written before, however, the default integration didn't quite meet my needs for producing different forms of output from the same inputs.

Georg has opened the sphinxcontrib repository on BitBucket for developers who want to collaborate on providing unofficial extensions to Sphinx, so I decided to go ahead and package up the alternate integration I use and release it in case someone else finds it helpful. The result is sphinxcontrib.paverutils.

Wednesday, June 3, 2009

python-authors mailing list

At PyCon this year, a group of authors and editors met to start building the writing sub-community for people interested in Python-related topics appearing online and in print. One of the suggestions that came out of the meeting was to establish a new SIG mailing list, hosted on python.org. I'm pleased to announce that the python-authors list has been established and is ready to host traffic!

If you have any experience or interest in writing, editing, or reviewing technical writing such as blog posts, magazine articles, or books, this list is for you. We'll be trading tips, looking for collaborators for new projects, and just generally talking shop.

I hope you'll join us!

Updated 5 June 2009: Unfortunately I've had to set the list subscription to moderated to avoid spammers. The list itself is still unmoderated, and I will try to approve requests for new subscriptions as quickly as possible.