Sunday, February 24, 2008

PyMOTW: imp

The imp module exposes the implementation of Python's import statement.

Module: imp
Purpose: Interface to module import mechanism.
Python Version: At least 2.2.1

Description:

The imp module includes functions that expose part of the underlying implementation of Python's import mechanism for loading code in packages and modules. It is one access point to importing modules dynamically, and useful in some cases where you don't know the name of the module you need to import when you write your code (e.g., for plugins or extensions to an application).

Example Package:

The examples below use a package called "example" with __init__.py:

print 'Importing example package'


and a module called submodule containing:

print 'Importing submodule'


Watch for the output of the print statements in the sample output when the package or module is imported.

Module Types:

Python supports several styles of modules. Each requires its own handling when opening the module and adding it to the namespace. The file suffixes recognized for each type, along with the mode used to open the file, can be listed with the get_suffixes() function.

import imp

module_types = { imp.PY_SOURCE: 'source',
                 imp.PY_COMPILED: 'compiled',
                 imp.C_EXTENSION: 'extension',
                 imp.PY_RESOURCE: 'resource',
                 imp.PKG_DIRECTORY: 'package',
                 }

def main():
    fmt = '%10s %10s %10s'
    print fmt % ('Extension', 'Mode', 'Type')
    print '-' * 32
    for extension, mode, module_type in imp.get_suffixes():
        print fmt % (extension, mode, module_types[module_type])

if __name__ == '__main__':
    main()


get_suffixes() returns a sequence of tuples containing the file extension, mode to use for opening the file, and a type code from a constant defined in the module. This table is incomplete, because some of the importable module or package types do not correspond to single files.


$ python imp_get_suffixes.py
 Extension       Mode       Type
--------------------------------
       .so         rb  extension
 module.so         rb  extension
       .py          U     source
      .pyc         rb   compiled


Finding Modules:

The first step to loading a module is finding it. The find_module() function scans the import search path looking for a package or module with the given name. It returns an open file handle (if appropriate for the type), the filename where the module was found, and a "description" (a tuple such as those returned by get_suffixes()).

import imp
from imp_get_suffixes import module_types

print 'Package:'
f, filename, description = imp.find_module('example')
print module_types[description[2]], filename
print

print 'Sub-module:'
f, filename, description = imp.find_module('submodule', ['./example'])
print module_types[description[2]], filename
if f: f.close()


find_module() does not pay attention to dotted package names (example.submodule), so the caller has to take care to pass the correct path for any nested modules. That means that when importing the submodule from the package, you need to give a path that points to the package directory for find_module() to locate the module you're looking for.


$ python imp_find_module.py
Package:
package /Users/dhellmann/Documents/PyMOTW/in_progress/imp/example

Sub-module:
source ./example/submodule.py


If find_module() cannot locate the module, it raises an ImportError.

import imp

try:
    imp.find_module('no_such_module')
except ImportError, err:
    print 'ImportError:', err



$ python imp_find_module_error.py
ImportError: No module named no_such_module


Loading Modules:

Once you have found the module, use load_module() to actually import it. load_module() takes the full dotted path module name and the values returned by find_module() (the open file handle, filename, and description tuple).

import imp

f, filename, description = imp.find_module('example')
example_package = imp.load_module('example', f, filename, description)
print 'Package:', example_package

f, filename, description = imp.find_module('submodule',
                                           example_package.__path__)
try:
    submodule = imp.load_module('example.submodule', f, filename, description)
    print 'Sub-module:', submodule
finally:
    f.close()


load_module() creates a new module object with the name given, loads the code for it, and adds it to sys.modules.


$ python imp_load_module.py
Importing example package
Package: <module 'example' from '/Users/dhellmann/Documents/PyMOTW/in_progress/imp/example/__init__.pyc'>
Importing submodule
Sub-module: <module 'example.submodule' from '/Users/dhellmann/Documents/PyMOTW/in_progress/imp/example/submodule.py'>


If you call load_module() for a module which has already been imported, the effect is like calling reload() on the existing module object.

import imp
import sys

for i in range(2):
    print i,
    try:
        m = sys.modules['example']
    except KeyError:
        print '(not in sys.modules)',
    else:
        print '(have in sys.modules)',
    f, filename, description = imp.find_module('example')
    example_package = imp.load_module('example', f, filename, description)


Instead of creating a new module, the contents of the existing module are simply replaced.


$ python imp_load_module_reload.py
0 (not in sys.modules) Importing example package
1 (have in sys.modules) Importing example package


References:

PEP 302: New Import Hooks
PEP 369: Post import hooks
Python Module of the Week Home
Download Sample Code




Thursday, February 21, 2008

February issue of Python Magazine now available

The February issue of Python Magazine is available for download now.

This month's cover story by Chad Cooper is Mapping point locations with Python and Microsoft Live Search Maps. It's all about visualizing your database of points on a map using their web service.

Arkadiusz Wahlig's piece, Extending Skype using Python, tells you the basics you need to write a Skype bot to add features to the popular chat and VOIP client.

In Using Python with SOAP to create a CLI for JIRA, Matthew Doar introduces SOAPpy and discusses how he has used it to create a command line tool for working with the JIRA bug tracker.

And our very own Brian Jones writes about common data modeling pitfalls this month in Identifying Data Badness.

Mark Mruss discusses documenting your code with docstrings in his latest Welcome to Python column.

Steve Holden's column about using Python's DB API meshes nicely with Brian's article.

And last, in my column this month I explore two more tools you should have in your toolbox: virtualenv and IPython.

So head over to the site and grab your copy today. As always, feedback is welcome via our contact page.

Sunday, February 17, 2008

PyMOTW: pkgutil

Alter the search path for a specific package using pkgutil.

Module: pkgutil
Purpose: Add to the module search path for a specific package to combine separate directories into a single package.
Python Version: 2.3 and later

Description:

The pkgutil module provides a single function, extend_path(), that is used to modify the search path for modules in a given package to include other directories in sys.path. This is useful for overriding installed versions of packages with development versions, or for combining OS-specific and shared modules into a single package namespace.

The most common way to call extend_path() is by adding these two lines to the __init__.py inside the package:

import pkgutil
__path__ = pkgutil.extend_path(__path__, __name__)


extend_path() returns a new module search path for the package that also includes any directory on sys.path containing a subdirectory with the package name. An example will make that clearer. Set up a package called demopkg1 with empty files:

$ find demopkg1 -name '*.py'
demopkg1/__init__.py
demopkg1/shared.py


Now create a directory structure like:

$ find extension -name '*.py'
extension/__init__.py
extension/demopkg1/__init__.py
extension/demopkg1/not_shared.py


Again, all of the files can be empty.

Now go back to demopkg1/__init__.py and edit it to contain:

import pkgutil
import pprint

print 'demopkg1.__path__ before:'
pprint.pprint(__path__)
print

__path__ = pkgutil.extend_path(__path__, __name__)

print 'demopkg1.__path__ after:'
pprint.pprint(__path__)
print


This shows what the search path is before and after it is modified, to illustrate the difference.

Now a simple test program to import the package:

import demopkg1
print 'demopkg1:', demopkg1.__file__

try:
    import demopkg1.shared
except Exception, err:
    print 'demopkg1.shared: Not found (%s)' % err
else:
    print 'demopkg1.shared:', demopkg1.shared.__file__

try:
    import demopkg1.not_shared
except Exception, err:
    print 'demopkg1.not_shared: Not found (%s)' % err
else:
    print 'demopkg1.not_shared:', demopkg1.not_shared.__file__


When this test program is run directly from the command line, the not_shared module is not found.


$ python pkgutil_extend_path.py
demopkg1.__path__ before:
['/Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1']

demopkg1.__path__ after:
['/Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1']

demopkg1: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1/__init__.pyc
demopkg1.shared: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1/shared.pyc
demopkg1.not_shared: Not found (No module named not_shared)


However, if we add "extension" to the PYTHONPATH and run it again, we see different results:

$ export PYTHONPATH=extension
$ python pkgutil_extend_path.py
demopkg1.__path__ before:
['/Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1']

demopkg1.__path__ after:
['/Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1',
'/Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/extension/demopkg1']

demopkg1: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1/__init__.pyc
demopkg1.shared: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1/shared.pyc
demopkg1.not_shared: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/extension/demopkg1/not_shared.py


The version of demopkg1 inside the extension directory has been added to the search path, so the not_shared module is found there.

Extending the path in this manner is useful for combining OS-specific versions of packages with common packages, especially if the OS-specific versions include C extension modules.
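You can also try extend_path() without changing PYTHONPATH. The sketch below builds the two directory trees from the demopkg1 example inside a temporary directory and puts both on sys.path directly. The package name splitpkg_demo and the write() helper are hypothetical, invented for this illustration. It assumes Python 2.6 or later, because it uses the with statement.

```python
import os
import shutil
import sys
import tempfile

# Hypothetical package name, picked so it will not clash with anything installed.
PKG = 'splitpkg_demo'


def write(path, text):
    """Hypothetical helper: create a file containing text."""
    with open(path, 'w') as f:
        f.write(text)


base = tempfile.mkdtemp()
main_root = os.path.join(base, 'main')       # plays the role of the current directory
ext_root = os.path.join(base, 'extension')   # plays the role of "extension"
main_pkg = os.path.join(main_root, PKG)
ext_pkg = os.path.join(ext_root, PKG)
try:
    os.makedirs(main_pkg)
    os.makedirs(ext_pkg)
    # Only the first __init__.py found is executed, so only it calls extend_path().
    write(os.path.join(main_pkg, '__init__.py'),
          'import pkgutil\n'
          '__path__ = pkgutil.extend_path(__path__, __name__)\n')
    write(os.path.join(main_pkg, 'shared.py'), '')
    write(os.path.join(ext_pkg, '__init__.py'), '')
    write(os.path.join(ext_pkg, 'not_shared.py'), '')

    # Roughly what running from main_root with PYTHONPATH=extension does.
    sys.path[:0] = [main_root, ext_root]

    pkg = __import__(PKG)
    __import__(PKG + '.shared')
    __import__(PKG + '.not_shared')

    found_path = list(pkg.__path__)
    shared_file = sys.modules[PKG + '.shared'].__file__
    not_shared_file = sys.modules[PKG + '.not_shared'].__file__
finally:
    # Undo the sys.path change and remove the scratch trees.
    sys.path[:] = [p for p in sys.path if p not in (main_root, ext_root)]
    shutil.rmtree(base)

print(found_path)
print(not_shared_file)
```

As with demopkg1, shared comes from the first directory and not_shared is only found because extend_path() appended the second one.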

Developing with pkgutil:

As I develop enhancements to my own projects, I commonly find that I need to test changes to an installed package. I don't want to replace the installed copy with my development version, since it is not necessarily correct (yet) and other tools on my system may depend on the installed package. I could configure a completely separate copy of the package in a development environment using something like virtualenv, but if I only need to modify one file, that could be overkill. Another option is to use pkgutil to modify the module search path for modules that belong to the package I'm working on. In this case, however, I need to reverse the path, since I want the development version to override the installed version.

Suppose the package looks like this:

$ find demopkg2 -name '*.py'
demopkg2/__init__.py
demopkg2/overloaded.py


The function I'm working on is in demopkg2/overloaded.py. The installed version looks like:

def func():
    print 'This is the installed version of func().'


And demopkg2/__init__.py contains:

import pkgutil
import pprint

__path__ = pkgutil.extend_path(__path__, __name__)
__path__.reverse()


Note the use of reverse() there to ensure that any directories added to the search path are scanned before the default location.

With another simple test program, I can run the function:

import demopkg2
print 'demopkg2:', demopkg2.__file__

import demopkg2.overloaded
print 'demopkg2.overloaded:', demopkg2.overloaded.__file__

print
demopkg2.overloaded.func()


First, without any special path treatment:

$ python pkgutil_devel.py
demopkg2: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg2/__init__.py
demopkg2.overloaded: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg2/overloaded.py

This is the installed version of func().


And now I can set up a development directory like this:

$ find develop -name '*.py'
develop/demopkg2/__init__.py
develop/demopkg2/overloaded.py


And replace the overloaded module contents:

def func():
    print 'This is the development version of func().'


Now, when the test program is run with the develop directory in the search path, the overloaded module from the development directory is found and used.

$ export PYTHONPATH=develop 
$ python pkgutil_devel.py

demopkg2: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg2/__init__.pyc
demopkg2.overloaded: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/develop/demopkg2/overloaded.pyc

This is the development version of func().
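The same override can also be sketched as one self-contained script. Instead of relying on PYTHONPATH, it creates the installed and develop trees in a temporary directory. The package name overridepkg_demo and the write() helper are hypothetical. The reverse() call is the one from demopkg2/__init__.py above. It assumes Python 2.6 or later.

```python
import os
import shutil
import sys
import tempfile

PKG = 'overridepkg_demo'  # hypothetical package name for this sketch

INIT_SOURCE = (
    'import pkgutil\n'
    '__path__ = pkgutil.extend_path(__path__, __name__)\n'
    '__path__.reverse()\n'
)


def write(path, text):
    """Hypothetical helper: create a file containing text."""
    with open(path, 'w') as f:
        f.write(text)


base = tempfile.mkdtemp()
installed_root = os.path.join(base, 'installed')
develop_root = os.path.join(base, 'develop')
try:
    for root, label in [(installed_root, 'installed'),
                        (develop_root, 'development')]:
        pkg_dir = os.path.join(root, PKG)
        os.makedirs(pkg_dir)
        write(os.path.join(pkg_dir, '__init__.py'), INIT_SOURCE)
        # func() returns its message so the caller can check which copy ran.
        write(os.path.join(pkg_dir, 'overloaded.py'),
              'def func():\n'
              '    return %r\n' % ('This is the %s version of func().' % label))

    # The installed copy is found first, as when the test program sits next
    # to it; develop is searched afterwards, like PYTHONPATH=develop.
    sys.path[:0] = [installed_root, develop_root]

    __import__(PKG + '.overloaded')
    package_path = list(sys.modules[PKG].__path__)
    overloaded = sys.modules[PKG + '.overloaded']
    message = overloaded.func()
finally:
    sys.path[:] = [p for p in sys.path
                   if p not in (installed_root, develop_root)]
    shutil.rmtree(base)

print(package_path)
print(message)
```

After reverse(), the develop copy of the package directory is first in __path__, so overloaded is loaded from there.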


Using .pkg files:

The first example illustrated how to extend the search path using extra directories included in the PYTHONPATH. It is also possible to add to the search path using *.pkg files containing directory names. PKG files are similar to the PTH files used by the site module. They can contain directory names, one per line, to be added to the search path for the package.

Another way to structure the OS-specific portions of the application from the first example is to have a directory for each operating system, and use a .pkg file to extend the search path.

This example uses the same demopkg1 files, and also includes the following files:

$ find os_* -type f
os_one/demopkg1/__init__.py
os_one/demopkg1/not_shared.py
os_one/demopkg1.pkg
os_two/demopkg1/__init__.py
os_two/demopkg1/not_shared.py
os_two/demopkg1.pkg


The PKG files are named demopkg1.pkg to match the package we are extending. They both contain:

demopkg


This demo program shows the version of the module being imported:

import demopkg1
print 'demopkg1:', demopkg1.__file__

import demopkg1.shared
print 'demopkg1.shared:', demopkg1.shared.__file__

import demopkg1.not_shared
print 'demopkg1.not_shared:', demopkg1.not_shared.__file__


A simple run script can be used to switch between the two packages:

export PYTHONPATH=os_${1}
echo "PYTHONPATH=$PYTHONPATH"
echo

python pkgutil_os_specific.py


And when run with "one" or "two" as the arguments, the path is adjusted appropriately:

$ ./with_os.sh one
PYTHONPATH=os_one

demopkg1.__path__ before:
['/Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1']

demopkg1.__path__ after:
['/Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1',
'/Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/os_one/demopkg1',
'demopkg']

demopkg1: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1/__init__.pyc
demopkg1.shared: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1/shared.pyc
demopkg1.not_shared: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/os_one/demopkg1/not_shared.pyc

$ ./with_os.sh two
PYTHONPATH=os_two

demopkg1.__path__ before:
['/Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1']

demopkg1.__path__ after:
['/Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1',
'/Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/os_two/demopkg1',
'demopkg']

demopkg1: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1/__init__.pyc
demopkg1.shared: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/demopkg1/shared.pyc
demopkg1.not_shared: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/os_two/demopkg1/not_shared.pyc


Of course, PKG files can appear anywhere in the normal search path, so a single PKG file in the current working directory could also be used to include a development tree.
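Here is a minimal sketch of the .pkg mechanism on its own, again built in a temporary directory. The names pkgfile_demo, os_one, extra, WHERE, and the write() helper are hypothetical. extend_path() reads the .pkg file one line at a time, skips blank lines and lines starting with '#', and appends each remaining entry without checking that the directory exists. That is also why the relative entry 'demopkg' appears unchanged in the output above.

```python
import os
import shutil
import sys
import tempfile

PKG = 'pkgfile_demo'  # hypothetical package name for this sketch


def write(path, text):
    """Hypothetical helper: create a file containing text."""
    with open(path, 'w') as f:
        f.write(text)


base = tempfile.mkdtemp()
pkg_root = os.path.join(base, 'main')         # holds the package itself
pkg_file_root = os.path.join(base, 'os_one')  # holds only pkgfile_demo.pkg
extra_dir = os.path.join(base, 'extra')       # directory listed in the .pkg file
try:
    os.makedirs(os.path.join(pkg_root, PKG))
    write(os.path.join(pkg_root, PKG, '__init__.py'),
          'import pkgutil\n'
          '__path__ = pkgutil.extend_path(__path__, __name__)\n')

    # One directory per line; the comment and the blank line are skipped.
    os.makedirs(pkg_file_root)
    write(os.path.join(pkg_file_root, PKG + '.pkg'),
          '# extra directories for %s\n\n%s\n' % (PKG, extra_dir))

    os.makedirs(extra_dir)
    write(os.path.join(extra_dir, 'not_shared.py'), "WHERE = 'extra'\n")

    sys.path[:0] = [pkg_root, pkg_file_root]

    pkg = __import__(PKG)
    found_path = list(pkg.__path__)
    __import__(PKG + '.not_shared')
    where = sys.modules[PKG + '.not_shared'].WHERE
finally:
    sys.path[:] = [p for p in sys.path if p not in (pkg_root, pkg_file_root)]
    shutil.rmtree(base)

print(found_path)
print(where)
```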

Nested Packages:

For nested packages, it is only necessary to modify the path of the top-level package. For example, with a directory structure like:

$ find nested -name '*.py'
nested/__init__.py
nested/second/__init__.py
nested/second/deep.py
nested/shallow.py


Where nested/__init__.py contains:

import pkgutil

__path__ = pkgutil.extend_path(__path__, __name__)
__path__.reverse()


And a development tree like:

$ find develop/nested -name '*.py'
develop/nested/__init__.py
develop/nested/second/__init__.py
develop/nested/second/deep.py
develop/nested/shallow.py


Both the shallow and deep modules contain a simple function that prints a message indicating whether it comes from the installed or the development version.

Again, we need a simple test program:

import nested

import nested.shallow
print 'nested.shallow:', nested.shallow.__file__
nested.shallow.func()

print
import nested.second.deep
print 'nested.second.deep:', nested.second.deep.__file__
nested.second.deep.func()


When pkgutil_nested.py is run without any special path considerations, we see the installed version of both modules:


$ python pkgutil_nested.py
nested.shallow: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/nested/shallow.pyc
This func() comes from the installed version of nested.shallow

nested.second.deep: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/nested/second/deep.pyc
This func() comes from the installed version of nested.second.deep


And when the develop directory is added to the path, we see the development version of both functions:

$ PYTHONPATH=develop python pkgutil_nested.py 
nested.shallow: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/develop/nested/shallow.pyc
This func() comes from the development version of nested.shallow

nested.second.deep: /Users/dhellmann/Documents/PyMOTW/in_progress/pkgutil/develop/nested/second/deep.pyc
This func() comes from the development version of nested.second.deep


References:

The site module and PTH files
Python Module of the Week Home
Download Sample Code






Updated 18 Feb to correct typo, thanks lpc.

Wednesday, February 13, 2008

Testing Python Linters

Based on recommendations via comments on an earlier post, my March column is a survey of a few different "lint" programs for Python (it looks like I'll stick to PyChecker, pylint, and PyFlakes for now). I need some sample code to run through all 3 programs so I can compare the output reports. I have a few ideas for common "mistakes" to include, but I'm looking for other suggestions.

So, what kinds of things are these tools good at finding, and where do they need more work? Are there any false positives I should make sure to include?

Sunday, February 10, 2008

Python Bug Day

From A. M. Kuchling via the python-dev mailing list:

After the success of January's bug day, which closed 37 issues, let's
have another one this month! Here's the brief announcement:

Python Bug Day: Saturday, February 23 2008. Meet in the #python-dev
IRC channel on irc.freenode.net and help improve Python. For more
information, see http://wiki.python.org/moin/PythonBugDay.

--amk

PyMOTW: tempfile

Securely generate temporary files and directories with the tempfile module.

Module: tempfile
Purpose: Create temporary filesystem resources.
Python Version: Since 1.4 with major security revisions in 2.3

Description:

Many programs need to create files to write intermediate data. Creating files with unique names securely, so they cannot be guessed by someone wanting to break the application, is challenging. The tempfile module provides several functions for creating filesystem resources securely. TemporaryFile() opens and returns an unnamed file, NamedTemporaryFile() opens and returns a named file, and mkdtemp() creates a temporary directory and returns its name.

TemporaryFile:

If your application needs a temporary file to store data, but does not need to share that file with other programs, the best option for creating the file is the TemporaryFile() function. It creates a file, and on platforms where it is possible, unlinks it immediately. This makes it impossible for another program to find or open the file, since there is no reference to it in the filesystem table. The file created by TemporaryFile() is removed automatically when it is closed.

import os
import tempfile

print 'Building a file name yourself:'
filename = '/tmp/guess_my_name.%s.txt' % os.getpid()
temp = open(filename, 'w+b')
try:
    print 'temp:', temp
    print 'temp.name:', temp.name
finally:
    temp.close()
    # Clean up the temporary file yourself
    os.remove(filename)

print
print 'TemporaryFile:'
temp = tempfile.TemporaryFile()
try:
    print 'temp:', temp
    print 'temp.name:', temp.name
finally:
    # Automatically cleans up the file
    temp.close()


This example illustrates the difference between creating a temporary file with a common pattern for making up a name and using the TemporaryFile() function. Notice that the file returned by TemporaryFile() has no real name: its name attribute is only the placeholder <fdopen>.


$ python tempfile_TemporaryFile.py
Building a file name yourself:
temp: <open file '/tmp/guess_my_name.7297.txt', mode 'w+b' at 0x5c338>
temp.name: /tmp/guess_my_name.7297.txt

TemporaryFile:
temp: <open file '<fdopen>', mode 'w+b' at 0x5c410>
temp.name: <fdopen>


By default, the file handle is created with mode 'w+b' so it behaves consistently on all platforms and your program can write to it and read from it.

import os
import tempfile

temp = tempfile.TemporaryFile()
try:
    temp.write('Some data')
    temp.seek(0)

    print temp.read()
finally:
    temp.close()


After writing, you have to rewind the file handle using seek() in order to read the data back from it.


$ python tempfile_TemporaryFile_binary.py
Some data


If you want the file to work in text mode, pass mode='w+t' when you create it:

import tempfile

f = tempfile.TemporaryFile(mode='w+t')
try:
    f.writelines(['first\n', 'second\n'])
    f.seek(0)

    for line in f:
        print line.rstrip()
finally:
    f.close()


The file handle treats the data as text:


$ python tempfile_TemporaryFile_text.py
first
second


NamedTemporaryFile:

There are situations, however, where having a named temporary file is important. If your application spans multiple processes, or even hosts, naming the file is the simplest way to pass it between parts of the application. The NamedTemporaryFile() function creates a file with a name, accessed from the name attribute.

import os
import tempfile

temp = tempfile.NamedTemporaryFile()
try:
    print 'temp:', temp
    print 'temp.name:', temp.name
finally:
    # Automatically cleans up the file
    temp.close()
print 'Exists after close:', os.path.exists(temp.name)


Even though the file is named, it is still removed after the handle is closed.


$ python tempfile_NamedTemporaryFile.py
temp: <open file '<fdopen>', mode 'w+b' at 0x5c338>
temp.name: /var/folders/9R/9R1t+tR02Raxzk+F71Q50U+++Uw/-Tmp-/tmplBKZMv
Exists after close: False
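When the point of a named file is to hand it to another part of the program, a second handle can open it by name while the first one is still open. The sketch below assumes a POSIX system. The tempfile documentation notes that on Windows the name cannot be used to open the file a second time while NamedTemporaryFile still has it open. The flush() call matters because data written through the first handle may still be sitting in its buffer.

```python
import tempfile

temp = tempfile.NamedTemporaryFile()
try:
    temp.write(b'passed along by name\n')
    temp.flush()  # push buffered bytes to the OS before anyone else reads

    # Assumes POSIX: a second, independent handle opened by name.
    reader = open(temp.name, 'rb')
    try:
        contents = reader.read()
    finally:
        reader.close()
finally:
    temp.close()  # the file is still deleted when the original handle closes

print(contents)
```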


mkdtemp:

If you need several temporary files, it may be more convenient to create a single temporary directory and then open all of the files in that directory. To create a temporary directory, use mkdtemp().

import os
import tempfile

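directory_name = tempfile.mkdtemp()
print directory_name
# Clean up the directory yourself; os.rmdir() removes just this
# (empty) directory, while os.removedirs() would also try to prune
# any parent directories left empty
os.rmdir(directory_name)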


Since the directory is not "opened" per se, you have to remove it yourself when you are done with it.


$ python tempfile_mkdtemp.py
/var/folders/9R/9R1t+tR02Raxzk+F71Q50U+++Uw/-Tmp-/tmp0OsHPg
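The next sketch shows the "several files in one directory" case. The file names are hypothetical. Once the directory has files in it, os.rmdir() will refuse to remove it, so this version uses shutil.rmtree() to delete the files and the directory together.

```python
import os
import shutil
import tempfile

work_dir = tempfile.mkdtemp(prefix='several_files_')
try:
    # Hypothetical file names; anything unique within work_dir will do.
    for name in ('input.txt', 'partial.txt', 'output.txt'):
        f = open(os.path.join(work_dir, name), 'w')
        try:
            f.write('data for %s\n' % name)
        finally:
            f.close()
    created = sorted(os.listdir(work_dir))
finally:
    # The directory is not empty, so remove it and its contents together.
    shutil.rmtree(work_dir)

print(created)  # ['input.txt', 'output.txt', 'partial.txt']
```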



Predicting Names:

For debugging purposes, it is useful to be able to include some indication of the origin of the temporary files. While obviously less secure than strictly anonymous temporary files, including a predictable portion in the name lets you find the file to examine it while your program is using it. All of the functions described so far take three arguments to allow you to control the filenames to some degree. Names are generated using the formula:

dir + prefix + random + suffix


where all of the values except random can be passed as arguments to TemporaryFile(), NamedTemporaryFile(), and mkdtemp(). For example:

import tempfile

temp = tempfile.NamedTemporaryFile(suffix='_suffix',
                                   prefix='prefix_',
                                   dir='/tmp',
                                   )
try:
    print 'temp:', temp
    print 'temp.name:', temp.name
finally:
    temp.close()


The prefix and suffix arguments are combined with a random string of characters to build the file name, and the dir argument is taken as-is and used as the location of the new file.


$ python tempfile_NamedTemporaryFile_args.py
temp: <open file '<fdopen>', mode 'w+b' at 0x5c338>
temp.name: /tmp/prefix_zy-7H3_suffix
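To check the dir + prefix + random + suffix formula directly, you can split the generated name back into its parts. This sketch uses a fresh directory from mkdtemp() instead of '/tmp', so it does not depend on that path existing. The length of the random portion is an implementation detail that has differed between Python versions, so the sketch only checks that the random part is non-empty.

```python
import os
import tempfile

target_dir = tempfile.mkdtemp()  # any writable directory would do here
temp = tempfile.NamedTemporaryFile(prefix='prefix_', suffix='_suffix',
                                   dir=target_dir)
try:
    directory, base_name = os.path.split(temp.name)
    # Whatever sits between the prefix and the suffix is the random part.
    random_part = base_name[len('prefix_'):-len('_suffix')]
finally:
    temp.close()     # also deletes the file
os.rmdir(target_dir)  # now empty, so rmdir() is enough

print(directory == target_dir)
print(base_name)
```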


Temporary File Location:

If you don't specify an explicit destination using the dir argument, the actual path used for the temporary files will vary based on your platform and settings. The tempfile module includes 2 functions for querying the settings being used at runtime:

import tempfile

print 'gettempdir():', tempfile.gettempdir()
print 'gettempprefix():', tempfile.gettempprefix()


gettempdir() returns the default directory that will hold all of the temporary files and gettempprefix() returns the string prefix for new file and directory names.


$ python tempfile_settings.py
gettempdir(): /var/folders/9R/9R1t+tR02Raxzk+F71Q50U+++Uw/-Tmp-
gettempprefix(): tmp


The value returned by gettempdir() is set based on a straightforward algorithm of looking through a list of locations for the first place the current process can create a file. From the library documentation:

Python searches a standard list of directories and sets tempdir to the first one which the calling user can create files in. The list is:

1. The directory named by the TMPDIR environment variable.

2. The directory named by the TEMP environment variable.

3. The directory named by the TMP environment variable.

4. A platform-specific location:

* On RiscOS, the directory named by the Wimp$ScrapDir environment variable.

* On Windows, the directories C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order.

* On all other platforms, the directories /tmp, /var/tmp, and /usr/tmp, in that order.

5. As a last resort, the current working directory.



If your program needs to use a global location for all temporary files that you need to set explicitly but do not want to set through one of these environment variables, you can set tempfile.tempdir directly.

import tempfile

tempfile.tempdir = '/I/changed/this/path'
print 'gettempdir():', tempfile.gettempdir()



$ python tempfile_tempdir.py
gettempdir(): /I/changed/this/path
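Because tempfile.tempdir is a module-wide setting, it can be useful to change it only for a limited stretch of code and then put it back. In this sketch, a directory from mkdtemp() stands in for a real site-specific location (an assumption for the illustration). The sketch shows that mkstemp() with no dir argument follows the override.

```python
import os
import shutil
import tempfile

original_tempdir = tempfile.tempdir  # may still be None if gettempdir() has not run
scratch_area = tempfile.mkdtemp()    # stand-in for a site-specific location
try:
    tempfile.tempdir = scratch_area
    reported = tempfile.gettempdir()
    handle, path = tempfile.mkstemp()  # no dir argument, so tempdir is used
    os.close(handle)
    created_inside = os.path.dirname(path) == scratch_area
finally:
    tempfile.tempdir = original_tempdir  # restore the module-wide default
    shutil.rmtree(scratch_area)

print(reported)
print(created_inside)
```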


References:

Python Module of the Week Home
Download Sample Code




Wednesday, February 6, 2008

AstronomyPictureOfTheDay 3.0

I finally fixed one of the lingering "issues" I've had since migrating from Tiger to Leopard last month. Something in the Automator actions or other behavior changed in my AstronomyPictureOfTheDay app and every morning I have been greeted with 2 Safari windows. One is the usual window showing the description of the image downloaded that night, and the other is the local copy of the image. I don't need to see the latter, and it seemed a little sloppy so I wanted to fix it.

After some fiddling around with "View Results" actions I determined that the local filename was being added to the output chain by "Set the Desktop Picture". So where before I had a chain of actions that only sent the remote URL to the final "Show Web page" action, now I needed to filter the data. Throwing a variable into the mix did not solve the problem. I'm not sure how the variable is supposed to work -- it seems like it should replace the output instead of extending it. So I took the brute force option and inserted a shell script to filter the URLs before opening them.

So if you've been running an earlier version of AstronomyPictureOfTheDay under Leopard, grab the new one and give it a try. As usual, email me with any problems.

Sunday, February 3, 2008

PyMOTW: string

Although most of the functions it used to contain have moved to methods of string and unicode objects, the string module still contains several useful items.

Module: string
Purpose: Contains useful constants and classes for working with text.
Python Version: 2.5

Description:

The string module dates from the earliest versions of Python. In version 2.0, many of the functions previously implemented only in the module were moved to methods of string objects. Legacy versions of those functions are still available, but their use is deprecated and they will be dropped in Python 3.0. The string module still contains several useful constants and classes for working with string and unicode objects, and this discussion will concentrate on them.

Constants:

The constants in the string module can be used to specify categories of characters such as ascii_letters and digits. Some of the constants are locale-dependent, such as lowercase, so the value changes to reflect the language settings of the user. Others, such as hexdigits, do not change when the locale changes.

import string

for n in dir(string):
    if n.startswith('_'):
        continue
    v = getattr(string, n)
    if isinstance(v, basestring):
        print '%s=%s' % (n, repr(v))
        print


Most of the names for the constants are self-explanatory.


$ python string_constants.py
ascii_letters='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

ascii_lowercase='abcdefghijklmnopqrstuvwxyz'

ascii_uppercase='ABCDEFGHIJKLMNOPQRSTUVWXYZ'

digits='0123456789'

hexdigits='0123456789abcdefABCDEF'

letters='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

lowercase='abcdefghijklmnopqrstuvwxyz'

octdigits='01234567'

printable='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

punctuation='!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

uppercase='ABCDEFGHIJKLMNOPQRSTUVWXYZ'

whitespace='\t\n\x0b\x0c\r '
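Since the constants are just strings, they work well as character sets for simple membership tests. The helpers is_hex() and strip_punctuation() in this sketch are hypothetical. They use only hexdigits and punctuation, which do not change with the locale.

```python
import string


def is_hex(text):
    """Return True if text is non-empty and made only of hex digits."""
    # hexdigits holds 0-9, a-f, and A-F
    return bool(text) and all(c in string.hexdigits for c in text)


def strip_punctuation(text):
    """Return text with every character from string.punctuation removed."""
    return ''.join(c for c in text if c not in string.punctuation)


print(is_hex('DeadBeef'))               # True
print(is_hex('0x1f'))                   # False, because 'x' is not a hex digit
print(strip_punctuation("Don't panic!"))  # Dont panic
```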



Functions:

Two functions are not moving out of the string module. The first, capwords(), capitalizes all of the words in a string.

import string

s = 'The quick brown fox jumped over the lazy dog.'

print s
print string.capwords(s)


The results are the same as if you called split(), capitalized the words in the resulting list, then called join() to combine the results.


$ python string_capwords.py
The quick brown fox jumped over the lazy dog.
The Quick Brown Fox Jumped Over The Lazy Dog.
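The split/capitalize/join description can be checked directly. capwords_by_hand() below is a hypothetical helper written for comparison. Because capitalize() lowercases the rest of each word and join() puts back a single space, mixed case and runs of whitespace are normalized as a side effect.

```python
import string


def capwords_by_hand(s):
    # split() on whitespace, capitalize() each word, then join with one space
    return ' '.join(word.capitalize() for word in s.split())


s = 'The quick brown fox jumped over the lazy dog.'
print(capwords_by_hand(s) == string.capwords(s))  # True
print(string.capwords('hELLO   wORLD'))           # Hello World
```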


The other function creates translation tables that can be used with the translate() method to change one set of characters to another.

import string

leet = string.maketrans('abegiloprstz', '463611092572')

s = 'The quick brown fox jumped over the lazy dog.'

print s
print s.translate(leet)


In this example, some letters are replaced by their l33t number alternatives.


$ python string_maketrans.py
The quick brown fox jumped over the lazy dog.
Th3 qu1ck 620wn f0x jum93d 0v32 7h3 142y d06.


Templates:

String templates were added in Python 2.4 as part of PEP 292 and are intended as an alternative to the built-in interpolation syntax. With string.Template interpolation, variables are identified by name prefixed with $ (e.g., '$var') or, if necessary to set them off from surrounding text, they can also be wrapped with curly braces (e.g., '${var}').

This example compares a simple template with a similar string interpolation setup:

import string

values = { 'var':'foo' }

t = string.Template("""
$var
$$
${var}iable
""")

print 'TEMPLATE:', t.substitute(values)

s = """
%(var)s
%%
%(var)siable
"""

print 'INTERPOLATION:', s % values


As you see, in both cases the trigger character ($ or %) is escaped by repeating it.


$ python string_template.py
TEMPLATE:
foo
$
fooiable

INTERPOLATION:
foo
%
fooiable


One key difference between templates and standard string interpolation is that the type of the arguments is not taken into account. The values are converted to strings and the strings are inserted. No formatting options are available. For example, there is no way to control the number of digits used to represent a floating point value.
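Here is a short comparison of that difference. The template inserts the plain str() of the float, while interpolation can apply a format. A workaround, sketched here as one option rather than a feature of Template, is to format the value yourself before substituting it.

```python
import string

values = {'price': 1.0 / 3}

t = string.Template('Template: $price')
print(t.substitute(values))  # the float is simply converted with str()

s = 'Interpolation: %(price).2f'
print(s % values)  # Interpolation: 0.33

# Pre-format the value, then hand the resulting string to the template.
print(t.substitute(price='%.2f' % values['price']))  # Template: 0.33
```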

A benefit, though, is that by using the safe_substitute() method, it is possible to avoid exceptions if not all of the values needed by the template are provided as arguments.

import string

values = { 'var':'foo' }

t = string.Template("$var is here but $missing is not provided")

try:
    print 'TEMPLATE:', t.substitute(values)
except KeyError, err:
    print 'ERROR:', str(err)

print 'TEMPLATE:', t.safe_substitute(values)


Since there is no value for missing in the values dictionary, a KeyError is raised by substitute(). Instead of raising the error, safe_substitute() catches it and leaves the variable expression alone in the text.


$ python string_template_missing.py
TEMPLATE: ERROR: 'missing'
TEMPLATE: foo is here but $missing is not provided


Advanced Templates:

If the default syntax for string.Template is not to your liking, you can change the behavior by adjusting the regular expression patterns it uses to find the variable names in the template body. A simple way to do that is to change the delimiter and idpattern class attributes.

import string

class MyTemplate(string.Template):
    delimiter = '%'
    idpattern = '[a-z]+_[a-z]+'

t = MyTemplate('%% %with_underscore %notunderscored')
d = { 'with_underscore':'replaced',
      'notunderscored':'not replaced',
      }

print t.safe_substitute(d)


In this example, variable ids must include an underscore somewhere in the middle, so %notunderscored is not replaced by anything.


$ python string_template_advanced.py
% replaced %notunderscored


For more complex changes, you can override the pattern attribute and define an entirely new regular expression. The pattern provided must contain 4 named groups for capturing the escaped delimiter, the named variable, a braced version of the variable name, and invalid delimiter patterns.

Let's look at the default pattern:

import string

t = string.Template('$var')
print t.pattern.pattern


Since t.pattern is a compiled regular expression, we have to access its pattern attribute to see the actual string.


$ python string_template_defaultpattern.py

\$(?:
(?P<escaped>\$) | # Escape sequence of two delimiters
(?P<named>[_a-z][_a-z0-9]*) | # delimiter and a Python identifier
{(?P<braced>[_a-z][_a-z0-9]*)} | # delimiter and a braced identifier
(?P<invalid>) # Other ill-formed delimiter exprs
)



If we wanted to create a new type of template using, for example, {{var}} as the variable syntax, we could use a pattern like this:

import re
import string

class MyTemplate(string.Template):
    delimiter = '{{'
    pattern = re.compile(r'''
    \{\{(?:
    (?P<escaped>\{\{)|
    (?P<named>[_a-z][_a-z0-9]*)\}\}|
    (?P<braced>[_a-z][_a-z0-9]*)\}\}|
    (?P<invalid>)
    )
    ''', re.VERBOSE | re.DOTALL)

t = MyTemplate('''
{{{{
{{var}}
''')

print 'MATCHES:', t.pattern.findall(t.template)
print 'SUBSTITUTED:', t.safe_substitute(var='replacement')


Notice that we still have to provide both the named and braced patterns, even though they are the same. Here's the output:


$ python string_template_newsyntax.py
MATCHES: [('{{', '', '', ''), ('', 'var', '', '')]
SUBSTITUTED:
{{
replacement



Deprecated Functions:

For information on the deprecated functions moved to the string and unicode classes, refer to String Methods in the manual.

References:

Leet
PEP 292: Simpler String Substitutions
Python Module of the Week Home
Download Sample Code

