Sunday, November 29, 2009

PyMOTW: plistlib - Manipulate OS X property list files

plistlib – Manipulate OS X property list files

Purpose: Read and write OS X property list files
Python Version: 2.6

plistlib provides an interface for working with property list files
used under OS X. plist files are typically XML, sometimes compressed.
They are used by the operating system and applications to store
preferences or other configuration settings. The contents are usually
structured as a dictionary containing key-value pairs of basic
built-in types (unicode strings, integers, dates, etc.). Values can
also be nested data structures such as other dictionaries or lists.
Binary data, or strings with control characters, can be encoded using
the data type.

Reading plist Files

OS X applications such as iCal use plist files to store meta-data
about objects they manage. For example, iCal stores the definitions
of all of your calendars as a series of plist files in the Library
directory.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>AlarmFilter</key>
<true/>
<key>AlarmsDisabled</key>
<false/>
<key>AttachmentFilter</key>
<true/>
<key>AutoRefresh</key>
<true/>
<key>Checked</key>
<integer>1</integer>
<key>Color</key>
<string>#808000FF</string>
<key>Enabled</key>
<true/>
<key>Key</key>
<string>4221BCE5-1017-4EE4-B7FF-311A846C600D</string>
<key>NeedsForcedUpdate</key>
<false/>
<key>NeedsRefresh</key>
<true/>
<key>Order</key>
<integer>25</integer>
<key>RefreshDate</key>
<date>2009-11-29T16:31:53Z</date>
<key>RefreshInterval</key>
<integer>3600</integer>
<key>SubscriptionTitle</key>
<string>Athens, GA Weather - By Weather Underground</string>
<key>SubscriptionURL</key>
<string>http://ical.wunderground.com/auto/ical/GA/Athens.ics?units=both</string>
<key>TaskFilter</key>
<true/>
<key>Title</key>
<string>Athens, GA Weather - By Weather Underground</string>
<key>Type</key>
<string>Subscription</string>
</dict>
</plist>

This sample script finds the calendar definitions, reads
them, and prints the titles of any calendars being displayed by iCal
(having the property Checked set to a true value).

import plistlib
import os
import glob

calendar_root = os.path.expanduser('~/Library/Calendars')
calendar_directories = (
    glob.glob(os.path.join(calendar_root, '*.caldav', '*.calendar')) +
    glob.glob(os.path.join(calendar_root, '*.calendar'))
    )

for dirname in calendar_directories:
    info_filename = os.path.join(dirname, 'Info.plist')
    if os.path.isfile(info_filename):
        info = plistlib.readPlist(info_filename)
        if info.get('Checked'):
            print info['Title']

The type of the Checked property is defined by the plist file, so
our script does not need to convert the string to an integer.

$ python plistlib_checked_calendars.py
Doug Hellmann
Tasks
Vacation Schedule
EarthSeasons
US Holidays
Athens, GA Weather - By Weather Underground
Birthdays
Georgia Bulldogs Calendar (NCAA Football)
Home
Meetup: Django
Meetup: Python
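
The listing above targets Python 2.6. As a rough sketch of the same idea under Python 3 (an assumption, not part of the original script), plistlib.loads() parses a plist held in memory. The SAMPLE document here is a trimmed-down, hypothetical stand-in for a real Info.plist file.

```python
import plistlib

# Hypothetical, trimmed-down stand-in for a calendar's Info.plist.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Checked</key>
    <integer>1</integer>
    <key>Enabled</key>
    <true/>
    <key>Title</key>
    <string>Athens, GA Weather - By Weather Underground</string>
</dict>
</plist>
"""

# loads() returns a plain dict; the value types come from the plist tags.
info = plistlib.loads(SAMPLE)

if info.get('Checked'):
    print(info['Title'])
print(type(info['Checked']).__name__, type(info['Enabled']).__name__)
```

Because the integer and boolean tags are converted while parsing, the test on Checked needs no explicit type conversion.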

Writing plist Files

If you want to use plist files to save your own settings, use
writePlist() to serialize the data and write it to the filesystem.

import plistlib
import datetime
import tempfile

d = { 'an_int':2,
      'a_bool':False,
      'the_float':5.9,
      'simple_string':'This string has no special characters.',
      'xml_string':'<element attr="value">This string includes XML markup &nbsp;</element>',
      'nested_list':['a', 'b', 'c'],
      'nested_dict':{ 'key':'value' },
      'timestamp':datetime.datetime.now(),
      }

output_file = tempfile.NamedTemporaryFile()
try:
    plistlib.writePlist(d, output_file)
    output_file.seek(0)
    print output_file.read()
finally:
    output_file.close()

The first argument is the data structure to write out, and the second
is an open file handle or the name of a file.

$ python plistlib_write_plist.py
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>a_bool</key>
<false/>
<key>an_int</key>
<integer>2</integer>
<key>nested_dict</key>
<dict>
<key>key</key>
<string>value</string>
</dict>
<key>nested_list</key>
<array>
<string>a</string>
<string>b</string>
<string>c</string>
</array>
<key>simple_string</key>
<string>This string has no special characters.</string>
<key>the_float</key>
<real>5.9000000000000004</real>
<key>timestamp</key>
<date>2009-11-29T12:09:35Z</date>
<key>xml_string</key>
<string>&lt;element attr="value"&gt;This string includes XML markup &amp;nbsp;&lt;/element&gt;</string>
</dict>
</plist>
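
For readers on Python 3, where writePlist() is no longer available in recent releases, a minimal sketch of the same serialization might use plistlib.dumps() and plistlib.loads(). The settings dictionary and the fixed timestamp are illustrative values, not taken from the script above.

```python
import datetime
import plistlib

# Illustrative settings; a fixed timestamp keeps the round trip repeatable.
settings = {
    'an_int': 2,
    'a_bool': False,
    'the_float': 5.9,
    'nested_list': ['a', 'b', 'c'],
    'nested_dict': {'key': 'value'},
    'timestamp': datetime.datetime(2009, 11, 29, 12, 9, 35),
}

# dumps() returns the XML plist as bytes instead of writing to a file.
xml_bytes = plistlib.dumps(settings)
print(xml_bytes.decode('utf-8'))

# Reading the bytes back should reproduce the original structure.
restored = plistlib.loads(xml_bytes)
print(restored == settings)
```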


Binary Property Data

Serializing binary data, or strings that may include control characters,
in a plist runs into the usual challenges of an XML format. To work
around them, plist files can store binary data in base64 format if the
object is wrapped with a Data instance.

import plistlib

d = { 'binary_data':plistlib.Data('This data has an embedded null. \0'),
}

print plistlib.writePlistToString(d)

This example uses the ToString version of the write function to create
an in-memory string instead of writing to a file.

$ python plistlib_binary_write.py
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>binary_data</key>
<data>
VGhpcyBkYXRhIGhhcyBhbiBlbWJlZGRlZCBudWxsLiAA
</data>
</dict>
</plist>

Binary data is automatically converted to a Data instance when
read.

import plistlib
import pprint

DATA = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>binary_data</key>
<data>
VGhpcyBkYXRhIGhhcyBhbiBlbWJlZGRlZCBudWxsLiAA
</data>
</dict>
</plist>
"""

d = plistlib.readPlistFromString(DATA)

print repr(d['binary_data'].data)

The data attribute of the object contains the decoded data.

$ python plistlib_binary_read.py
'This data has an embedded null. \x00'
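
Under Python 3 (an assumption; the examples above use the 2.6 API), a bytes value can play the role of the Data wrapper: plistlib.dumps() writes it as a base64 data element and plistlib.loads() returns bytes again. A small sketch:

```python
import plistlib

payload = b'This data has an embedded null. \0'

# bytes values are serialized inside a <data> element as base64 text.
xml_bytes = plistlib.dumps({'binary_data': payload})
print(xml_bytes.decode('utf-8'))

# Reading the plist back decodes the base64 text to the original bytes.
restored = plistlib.loads(xml_bytes)
print(repr(restored['binary_data']))
```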

See also

plistlib
The standard library documentation for this module.
plist manual page
Documentation of the plist file format.
Weather Underground
Free weather information, including ICS and RSS feeds.
Convert plist between XML and Binary formats
Some plist files are stored in a binary format instead of XML
because the binary format is faster to parse using Apple’s
libraries. Python’s plistlib module does not handle the
binary format, so you may need to convert binary files to XML
using plutil before reading them.


Monday, November 23, 2009

PyMOTW: sys, Part 7: Modules and Imports

Modules and Imports

Most Python programs end up as a combination of several modules with a main application importing them. Whether
you are using the features of the standard library, or organizing your own code in separate files to make it
easier to maintain, understanding and managing the dependencies for your program is an important aspect of
development. sys includes information about the modules available to your application, either as built-ins
or after being imported. It also defines hooks for overriding the standard import behavior for special cases.

Imported Modules

sys.modules is a dictionary mapping the names of imported modules to the module object holding the code.

import sys
import textwrap

names = sorted(sys.modules.keys())
name_text = ', '.join(names)

print textwrap.fill(name_text)

The contents of sys.modules change as new modules are imported.

$ python sys_modules.py
UserDict, __builtin__, __main__, _abcoll, _codecs, _sre, _warnings,
abc, codecs, copy_reg, encodings, encodings.__builtin__,
encodings.aliases, encodings.codecs, encodings.encodings,
encodings.utf_8, errno, exceptions, genericpath, linecache, new, os,
os.path, posix, posixpath, re, signal, site, sphinxcontrib,
sre_compile, sre_constants, sre_parse, stat, string, strop, sys,
textwrap, types, warnings, zipimport
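
As a small illustration of that mapping (written with Python 3 syntax, which is an assumption since the listings here target 2.6), the object stored under a name in sys.modules is the same object bound by the import statement. The is_imported() helper is a hypothetical name used only for this sketch.

```python
import sys
import textwrap


def is_imported(name):
    """Return True if a module with this name has already been imported."""
    return name in sys.modules


# The dictionary value is the module object itself, not a copy.
print(is_imported('textwrap'))
print(sys.modules['textwrap'] is textwrap)
print(is_imported('no_such_module_pymotw'))
```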

Built-in Modules

The Python interpreter can be compiled with some C modules built right in, so you don’t need to distribute them
as separate shared libraries. These modules don’t appear in the list of imported modules managed in
sys.modules because they weren’t technically imported. The only way to find the available built-in modules is
through sys.builtin_module_names.

import sys

for name in sys.builtin_module_names:
    print name

Note

Your results may vary, especially if you have built a custom version of the interpreter.
This script was run using a copy of the interpreter installed from the standard python.org
installer for the platform.

$ python sys_builtins.py
__builtin__
__main__
_ast
_codecs
_sre
_symtable
_warnings
errno
exceptions
gc
imp
marshal
posix
pwd
signal
sys
thread
xxsubtype
zipimport
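
A quick membership test against sys.builtin_module_names can tell whether a given module was compiled into the interpreter. The sketch below uses Python 3 syntax (an assumption), and is_builtin() is a hypothetical helper name.

```python
import sys


def is_builtin(name):
    """Return True if the module is compiled into this interpreter."""
    return name in sys.builtin_module_names


# sys is compiled in; textwrap is a pure-Python module loaded from a file.
for name in ['sys', 'textwrap']:
    print(name, is_builtin(name))
```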

See also

Build instructions
Instructions for building Python, from the README distributed with the source.

Import Path

The search path for modules is managed as a Python list saved in sys.path. The default contents of the path
include the directory of the script used to start the application and the current working directory.

import sys

for d in sys.path:
    print d

As you can see here, the first directory in the search path is the home for the sample script itself. That is
followed by a series of platform-specific paths where compiled extension modules (written in C) might be
installed, and then the global site-packages directory is listed last.

$ python sys_path_show.py
/Users/dhellmann/Documents/PyMOTW/src/PyMOTW/sys
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-darwin
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-tk
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac/lib-scriptpackages
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages

The import search path list can be modified before starting the interpreter by setting the shell variable
PYTHONPATH to a colon-separated list of directories.

$ PYTHONPATH=/my/private/site-packages:/my/shared/site-packages python sys_path_show.py
/Users/dhellmann/Documents/PyMOTW/src/PyMOTW/sys
/my/private/site-packages
/my/shared/site-packages
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-darwin
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-tk
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac/lib-scriptpackages
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages
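
The same experiment can be driven from Python instead of the shell. This is a sketch under a few assumptions: Python 3 syntax, a temporary directory standing in for /my/private/site-packages, and a child interpreter started through subprocess with a modified copy of the environment.

```python
import os
import subprocess
import sys
import tempfile

# Temporary directory standing in for a private site-packages directory.
extra_dir = tempfile.mkdtemp()

# Copy the environment and set PYTHONPATH for the child process only.
env = dict(os.environ)
env['PYTHONPATH'] = extra_dir

# The child interpreter prints its own sys.path, one entry per line.
output = subprocess.check_output(
    [sys.executable, '-c', 'import sys\nfor d in sys.path:\n    print(d)'],
    env=env,
)
child_path = output.decode().splitlines()

for d in child_path:
    print(d)
```

The parent's sys.path is unaffected because PYTHONPATH is only read when an interpreter starts.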

A program can also modify its path by adding elements to sys.path directly.

import sys
import os

base_dir = os.path.dirname(__file__) or '.'
print 'Base directory:', base_dir

# Insert the package_dir_a directory at the front of the path.
package_dir_a = os.path.join(base_dir, 'package_dir_a')
sys.path.insert(0, package_dir_a)

# Import the example module
import example
print 'Imported example from:', example.__file__
print '\t', example.DATA

# Make package_dir_b the first directory in the search path
package_dir_b = os.path.join(base_dir, 'package_dir_b')
sys.path.insert(0, package_dir_b)

# Reload the module to get the other version
reload(example)
print 'Reloaded example from:', example.__file__
print '\t', example.DATA

$ python sys_path_modify.py
Base directory: .
Imported example from: ./package_dir_a/example.pyc
This is example A
Reloaded example from: ./package_dir_b/example.pyc
This is example B
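
The example above relies on two directories that already exist next to the script. A self-contained sketch of the same technique, assuming Python 3 (where reload() lives in importlib) and using a hypothetical module name reload_demo written into temporary directories, could look like this:

```python
import importlib
import os
import sys
import tempfile


def make_package_dir(root, dirname, text):
    """Create a directory holding reload_demo.py with a DATA constant."""
    path = os.path.join(root, dirname)
    os.mkdir(path)
    with open(os.path.join(path, 'reload_demo.py'), 'w') as f:
        f.write('DATA = %r\n' % text)
    return path


root = tempfile.mkdtemp()
dir_a = make_package_dir(root, 'package_dir_a', 'This is example A')
dir_b = make_package_dir(root, 'package_dir_b', 'This is example B')

# Put directory A at the front of the path and import the module.
sys.path.insert(0, dir_a)
importlib.invalidate_caches()
reload_demo = importlib.import_module('reload_demo')
first = reload_demo.DATA
print('Imported from:', reload_demo.__file__)

# Put directory B in front, then reload to pick up the other version.
sys.path.insert(0, dir_b)
importlib.invalidate_caches()
reload_demo = importlib.reload(reload_demo)
second = reload_demo.DATA
print('Reloaded from:', reload_demo.__file__)
```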

Custom Importers

Modifying the search path lets you control how standard Python modules are found, but what if you need to import
code from somewhere other than the usual .py or .pyc files on the filesystem? PEP 302 solves this
problem by introducing the idea of import hooks that let you trap an attempt to find a module on the search
path and take alternative measures to load the code from somewhere else or apply pre-processing to it.

Finders

Custom importers are implemented in two separate phases. The finder is responsible for locating a module and
providing a loader to manage the actual import. Adding a custom module finder is as simple as appending a
factory to the sys.path_hooks list. On import, each part of the path is given to a finder until one claims
support (by not raising ImportError). That finder is then responsible for searching data storage represented by
its path entry for named modules.

import sys

class NoisyImportFinder(object):

    PATH_TRIGGER = 'NoisyImportFinder_PATH_TRIGGER'

    def __init__(self, path_entry):
        print 'Checking NoisyImportFinder support for %s' % path_entry
        if path_entry != self.PATH_TRIGGER:
            print 'NoisyImportFinder does not work for %s' % path_entry
            raise ImportError()
        return

    def find_module(self, fullname, path=None):
        print 'NoisyImportFinder looking for "%s"' % fullname
        return None

sys.path_hooks.append(NoisyImportFinder)

sys.path.insert(0, NoisyImportFinder.PATH_TRIGGER)

try:
    import target_module
except Exception, e:
    print 'Import failed:', e

This example illustrates how the finders are instantiated and queried. The NoisyImportFinder raises ImportError
when instantiated with a path entry that does not match its special trigger value, which is obviously not a real
path on the filesystem. This test prevents the NoisyImportFinder from breaking imports of real modules.

$ python sys_path_hooks_noisy.py
Checking NoisyImportFinder support for NoisyImportFinder_PATH_TRIGGER
NoisyImportFinder looking for "target_module"
Checking NoisyImportFinder support for /Users/dhellmann/Documents/PyMOTW/src/PyMOTW/sys
NoisyImportFinder does not work for /Users/dhellmann/Documents/PyMOTW/src/PyMOTW/sys
Import failed: No module named target_module
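
In Python 3 the path hook mechanism is still available, but finders are asked for a module through find_spec() rather than find_module(). The following is only a sketch of how the noisy finder might be adapted under that assumption; RecordingFinder and the module name being imported are hypothetical.

```python
import importlib
import sys


class RecordingFinder:
    """Path-entry finder that records lookups and never finds anything."""

    PATH_TRIGGER = 'RecordingFinder_PATH_TRIGGER'
    seen = []

    def __init__(self, path_entry):
        # Refuse every path entry except the special trigger value.
        if path_entry != self.PATH_TRIGGER:
            raise ImportError()

    def find_spec(self, fullname, target=None):
        self.seen.append(fullname)
        return None  # tell the import system to keep looking


sys.path_hooks.append(RecordingFinder)
sys.path.insert(0, RecordingFinder.PATH_TRIGGER)

failure = None
try:
    importlib.import_module('target_module_that_does_not_exist')
except ImportError as err:
    failure = str(err)
finally:
    # Undo the changes so later imports are not affected.
    sys.path.remove(RecordingFinder.PATH_TRIGGER)
    sys.path_hooks.remove(RecordingFinder)
    sys.path_importer_cache.pop(RecordingFinder.PATH_TRIGGER, None)

print('Finder was asked for:', RecordingFinder.seen)
print('Import failed:', failure)
```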

Importing from a Shelve

When the finder locates a module, it is responsible for returning a loader capable of importing that module.
This example illustrates a custom importer that saves its module contents in a database created by shelve.

The first step is to create a script to populate the shelf with a package containing a sub-module and
sub-package.

import sys
import shelve
import os

filename = '/tmp/pymotw_import_example.shelve'
if os.path.exists(filename):
    os.unlink(filename)
db = shelve.open(filename)
try:
    db['data:README'] = """
==============
package README
==============

This is the README for ``package``.
"""
    db['package.__init__'] = """
print 'package imported'
message = 'This message is in package.__init__'
"""
    db['package.module1'] = """
print 'package.module1 imported'
message = 'This message is in package.module1'
"""
    db['package.subpackage.__init__'] = """
print 'package.subpackage imported'
message = 'This message is in package.subpackage.__init__'
"""
    db['package.subpackage.module2'] = """
print 'package.subpackage.module2 imported'
message = 'This message is in package.subpackage.module2'
"""
    print 'Created %s with:' % filename
    for key in sorted(db.keys()):
        print '\t', key
finally:
    db.close()

A real packaging script would read the contents from the filesystem, but using hard-coded values is sufficient
for a simple example like this.

$ python sys_shelve_importer_create.py
Created /tmp/pymotw_import_example.shelve with:
data:README
package.__init__
package.module1
package.subpackage.__init__
package.subpackage.module2

Next, we need to provide finder and loader classes that know how to look in a shelf for the source of a module or
package:

import contextlib
import imp
import os
import shelve
import sys


@contextlib.contextmanager
def shelve_context(filename, flag='r'):
    """Context manager to make shelves work with 'with' statement."""
    db = shelve.open(filename, flag)
    try:
        yield db
    finally:
        db.close()


def _mk_init_name(fullname):
    """Return the name of the __init__ module for a given package name."""
    if fullname.endswith('.__init__'):
        return fullname
    return fullname + '.__init__'


def _get_key_name(fullname, db):
    """Look in an open shelf for fullname or fullname.__init__, return the name found."""
    if fullname in db:
        return fullname
    init_name = _mk_init_name(fullname)
    if init_name in db:
        return init_name
    return None


class ShelveFinder(object):
    """Find modules collected in a shelve archive."""

    def __init__(self, path_entry):
        if not os.path.isfile(path_entry):
            raise ImportError
        try:
            # Test the path_entry to see if it is a valid shelf
            with shelve_context(path_entry):
                pass
        except Exception, e:
            raise ImportError(str(e))
        else:
            print 'new shelf added to import path:', path_entry
            self.path_entry = path_entry
        return

    def __str__(self):
        return '<%s for "%s">' % (self.__class__.__name__, self.path_entry)

    def find_module(self, fullname, path=None):
        path = path or self.path_entry
        print 'looking for "%s" in %s ...' % (fullname, path),
        with shelve_context(path) as db:
            key_name = _get_key_name(fullname, db)
            if key_name:
                print 'found it as %s' % key_name
                return ShelveLoader(path)
        print 'not found'
        return None


class ShelveLoader(object):
    """Load source for modules from shelve databases."""

    def __init__(self, path_entry):
        self.path_entry = path_entry
        return

    def _get_filename(self, fullname):
        return '<%s "%s"[%s]>' % (self.__class__.__name__, self.path_entry, fullname)

    def get_source(self, fullname):
        print 'loading source for "%s" from shelf' % fullname
        try:
            with shelve_context(self.path_entry) as db:
                key_name = _get_key_name(fullname, db)
                if key_name:
                    return db[key_name]
                raise ImportError('could not find source for %s' % fullname)
        except Exception, e:
            print 'could not load source:', e
            raise ImportError(str(e))

    def get_code(self, fullname):
        source = self.get_source(fullname)
        print 'compiling code for "%s"' % fullname
        return compile(source, self._get_filename(fullname), 'exec', dont_inherit=True)

    def get_data(self, path):
        print 'looking for data for "%s"' % path
        if not path.startswith(self.path_entry):
            raise IOError
        path = path[len(self.path_entry)+1:]
        key_name = 'data:' + path
        try:
            with shelve_context(self.path_entry) as db:
                return db[key_name]
        except Exception, e:
            # Convert all errors to IOError
            raise IOError

    def is_package(self, fullname):
        init_name = _mk_init_name(fullname)
        with shelve_context(self.path_entry) as db:
            return init_name in db

    def load_module(self, fullname):
        source = self.get_source(fullname)

        if fullname in sys.modules:
            print 'reusing existing module from previous import of "%s"' % fullname
            mod = sys.modules[fullname]
        else:
            print 'creating a new module object for "%s"' % fullname
            mod = sys.modules.setdefault(fullname, imp.new_module(fullname))

        # Set a few properties required by PEP 302
        mod.__file__ = self._get_filename(fullname)
        mod.__name__ = fullname
        mod.__path__ = self.path_entry
        mod.__loader__ = self
        mod.__package__ = '.'.join(fullname.split('.')[:-1])

        if self.is_package(fullname):
            print 'adding path for package'
            # Set __path__ for packages
            # so we can find the sub-modules.
            mod.__path__ = [ self.path_entry ]
        else:
            print 'imported as regular module'

        print 'execing source...'
        exec source in mod.__dict__
        print 'done'
        return mod

Now we can use ShelveFinder and ShelveLoader to import code from a shelf. For example, importing the
package created above:

import sys
import sys_shelve_importer

def show_module_details(module):
    print ' message:', module.message
    print ' __name__:', module.__name__
    print ' __package__:', module.__package__
    print ' __file__:', module.__file__
    print ' __path__:', module.__path__
    print ' __loader__:', module.__loader__

filename = '/tmp/pymotw_import_example.shelve'
sys.path_hooks.append(sys_shelve_importer.ShelveFinder)
sys.path.insert(0, filename)

print 'Import of "package":'
import package

print
print 'Examine package details:'
show_module_details(package)

print
print 'Global settings:'
print 'sys.modules entry:', sys.modules['package']

The shelf is added to the import path the first time an import occurs after the path is modified. The finder
recognizes the shelf and returns a loader, which is used for all imports from that shelf. The initial
package-level import creates a new module object and then execs the source loaded from the shelf, using the new
module as the namespace so that names defined in the source are preserved as module-level attributes.

$ python sys_shelve_importer_package.py
Import of "package":
new shelf added to import path: /tmp/pymotw_import_example.shelve
looking for "package" in /tmp/pymotw_import_example.shelve ... found it as package.__init__
loading source for "package" from shelf
creating a new module object for "package"
adding path for package
execing source...
package imported
done

Examine package details:
message: This message is in package.__init__
__name__: package
__package__:
__file__: <ShelveLoader "/tmp/pymotw_import_example.shelve"[package]>
__path__: ['/tmp/pymotw_import_example.shelve']
__loader__: <sys_shelve_importer.ShelveLoader object at 0xddf50>

Global settings:
sys.modules entry: <module 'package' from '<ShelveLoader "/tmp/pymotw_import_example.shelve"[package]>'>
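
For comparison, here is a rough sketch of the same finder and loader split using the importlib API found in Python 3, where finders return a module spec and loaders implement exec_module(). To stay self-contained it swaps the shelf for an in-memory dictionary, so SOURCES, PATH_ENTRY, DictFinder, DictLoader, and the demo_pkg package are all hypothetical names, not part of sys_shelve_importer.

```python
import importlib
import importlib.abc
import importlib.util
import sys

# In-memory stand-in for the shelf: module name -> source text.
SOURCES = {
    'demo_pkg.__init__': "message = 'This message is in demo_pkg.__init__'\n",
    'demo_pkg.module1': "message = 'This message is in demo_pkg.module1'\n",
}
PATH_ENTRY = 'DictImporter_PATH_ENTRY'


def _get_key_name(fullname):
    """Return the key holding source for fullname, or None."""
    if fullname in SOURCES:
        return fullname
    init_name = fullname + '.__init__'
    if init_name in SOURCES:
        return init_name
    return None


class DictLoader(importlib.abc.Loader):
    """Execute source text stored under one key of SOURCES."""

    def __init__(self, key_name):
        self.key_name = key_name

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        filename = '<DictLoader %s>' % self.key_name
        code = compile(SOURCES[self.key_name], filename, 'exec')
        exec(code, module.__dict__)


class DictFinder:
    """Path-entry finder that only supports PATH_ENTRY."""

    def __init__(self, path_entry):
        if path_entry != PATH_ENTRY:
            raise ImportError()

    def find_spec(self, fullname, target=None):
        key_name = _get_key_name(fullname)
        if key_name is None:
            return None
        is_package = key_name.endswith('.__init__')
        spec = importlib.util.spec_from_loader(
            fullname, DictLoader(key_name), is_package=is_package)
        if is_package:
            # Sub-modules are searched for in the same "archive".
            spec.submodule_search_locations = [PATH_ENTRY]
        return spec


sys.path_hooks.append(DictFinder)
sys.path.insert(0, PATH_ENTRY)

module1 = importlib.import_module('demo_pkg.module1')
package = sys.modules['demo_pkg']
print(package.message)
print(module1.message)
print(module1.__package__, package.__path__)
```

Importing the sub-module first triggers the import of its parent package, just as in the shelve-based example.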

Packages

The loading of other modules and sub-packages proceeds in the same way.

import sys
import sys_shelve_importer

def show_module_details(module):
    print ' message:', module.message
    print ' __name__:', module.__name__
    print ' __package__:', module.__package__
    print ' __file__:', module.__file__
    print ' __path__:', module.__path__
    print ' __loader__:', module.__loader__

filename = '/tmp/pymotw_import_example.shelve'
sys.path_hooks.append(sys_shelve_importer.ShelveFinder)
sys.path.insert(0, filename)

print
print 'Import of "package.module1":'
import package.module1

print
print 'Examine package.module1 details:'
show_module_details(package.module1)

print
print 'Import of "package.subpackage.module2":'
import package.subpackage.module2

print
print 'Examine package.subpackage.module2 details:'
show_module_details(package.subpackage.module2)

$ python sys_shelve_importer_module.py

Import of "package.module1":
new shelf added to import path: /tmp/pymotw_import_example.shelve
looking for "package" in /tmp/pymotw_import_example.shelve ... found it as package.__init__
loading source for "package" from shelf
creating a new module object for "package"
adding path for package
execing source...
package imported
done
looking for "package.module1" in /tmp/pymotw_import_example.shelve ... found it as package.module1
loading source for "package.module1" from shelf
creating a new module object for "package.module1"
imported as regular module
execing source...
package.module1 imported
done

Examine package.module1 details:
message: This message is in package.module1
__name__: package.module1
__package__: package
__file__: <ShelveLoader "/tmp/pymotw_import_example.shelve"[package.module1]>
__path__: /tmp/pymotw_import_example.shelve
__loader__: <sys_shelve_importer.ShelveLoader object at 0xe31b0>

Import of "package.subpackage.module2":
looking for "package.subpackage" in /tmp/pymotw_import_example.shelve ... found it as package.subpackage.__init__
loading source for "package.subpackage" from shelf
creating a new module object for "package.subpackage"
adding path for package
execing source...
package.subpackage imported
done
looking for "package.subpackage.module2" in /tmp/pymotw_import_example.shelve ... found it as package.subpackage.module2
loading source for "package.subpackage.module2" from shelf
creating a new module object for "package.subpackage.module2"
imported as regular module
execing source...
package.subpackage.module2 imported
done

Examine package.subpackage.module2 details:
message: This message is in package.subpackage.module2
__name__: package.subpackage.module2
__package__: package.subpackage
__file__: <ShelveLoader "/tmp/pymotw_import_example.shelve"[package.subpackage.module2]>
__path__: /tmp/pymotw_import_example.shelve
__loader__: <sys_shelve_importer.ShelveLoader object at 0xe32d0>

Reloading

Reloading a module is handled slightly differently. Instead of creating a new module object, the existing module
is re-used.

import sys
import sys_shelve_importer

filename = '/tmp/pymotw_import_example.shelve'
sys.path_hooks.append(sys_shelve_importer.ShelveFinder)
sys.path.insert(0, filename)

print 'First import of "package":'
import package

print
print 'Reloading "package":'
reload(package)

By re-using the same object, existing references to the module are preserved even if class or function
definitions are modified by the reload.

$ python sys_shelve_importer_reload.py
First import of "package":
new shelf added to import path: /tmp/pymotw_import_example.shelve
looking for "package" in /tmp/pymotw_import_example.shelve ... found it as package.__init__
loading source for "package" from shelf
creating a new module object for "package"
adding path for package
execing source...
package imported
done

Reloading "package":
looking for "package" in /tmp/pymotw_import_example.shelve ... found it as package.__init__
loading source for "package" from shelf
reusing existing module from previous import of "package"
adding path for package
execing source...
package imported
done

Import Errors

When a module cannot be imported, ImportError is raised.

import sys
import sys_shelve_importer

filename = '/tmp/pymotw_import_example.shelve'
sys.path_hooks.append(sys_shelve_importer.ShelveFinder)
sys.path.insert(0, filename)

try:
    import package.module3
except ImportError, e:
    print 'Failed to import:', e

$ python sys_shelve_importer_missing.py
new shelf added to import path: /tmp/pymotw_import_example.shelve
looking for "package" in /tmp/pymotw_import_example.shelve ... found it as package.__init__
loading source for "package" from shelf
creating a new module object for "package"
adding path for package
execing source...
package imported
done
looking for "package.module3" in /tmp/pymotw_import_example.shelve ... not found
Failed to import: No module named module3


Package Data

In addition to defining the API for loading executable Python code, PEP 302 defines an optional API for retrieving
package data, intended for distributing data files, documentation, and other non-code resources used by a
package. By implementing get_data(), a loader can allow calling applications to support retrieval of data
associated with the package without considering how the package is actually installed (especially without
assuming that the package is stored as files on a filesystem).

import sys
import sys_shelve_importer
import os

filename = '/tmp/pymotw_import_example.shelve'
sys.path_hooks.append(sys_shelve_importer.ShelveFinder)
sys.path.insert(0, filename)

import package

readme_path = os.path.join(package.__path__[0], 'README')

readme = package.__loader__.get_data(readme_path)
print readme

foo_path = os.path.join(package.__path__[0], 'foo')
foo = package.__loader__.get_data(foo_path)
print foo

get_data() takes a path based on the module or package that owns the data, and returns the contents of the
resource “file” as a string, or raises IOError if the resource does not exist.

$ python sys_shelve_importer_get_data.py
new shelf added to import path: /tmp/pymotw_import_example.shelve
looking for "package" in /tmp/pymotw_import_example.shelve ... found it as package.__init__
loading source for "package" from shelf
creating a new module object for "package"
adding path for package
execing source...
package imported
done
looking for data for "/tmp/pymotw_import_example.shelve/README"

==============
package README
==============

This is the README for ``package``.

looking for data for "/tmp/pymotw_import_example.shelve/foo"
Traceback (most recent call last):
File "sys_shelve_importer_get_data.py", line 26, in <module>
foo = package.__loader__.get_data(foo_path)
File "/Users/dhellmann/Documents/PyMOTW/src/PyMOTW/sys/sys_shelve_importer.py", line 114, in get_data
raise IOError
IOError
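
The contract of get_data() can be shown in isolation. This is a minimal sketch assuming Python 3 syntax and a plain dictionary in place of the shelf; ResourceLoader and its resources argument are hypothetical names, and only the optional data API is implemented.

```python
class ResourceLoader(object):
    """Toy loader exposing only the optional get_data() API."""

    def __init__(self, path_entry, resources):
        self.path_entry = path_entry
        self.resources = resources

    def get_data(self, path):
        # Only paths under this loader's path entry are recognized.
        if not path.startswith(self.path_entry):
            raise IOError('%s is not under %s' % (path, self.path_entry))
        key_name = 'data:' + path[len(self.path_entry) + 1:]
        try:
            return self.resources[key_name]
        except KeyError:
            # Convert lookup failures to IOError, as in the loader above.
            raise IOError('no resource named %s' % key_name)


PATH_ENTRY = '/tmp/example_archive'
loader = ResourceLoader(PATH_ENTRY, {'data:README': 'This is the README.'})

readme = loader.get_data(PATH_ENTRY + '/README')
print(readme)

missing = None
try:
    loader.get_data(PATH_ENTRY + '/foo')
except IOError as err:
    missing = str(err)
print('Missing resource:', missing)
```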

Importer Cache

Searching through all of the hooks each time a module is imported can become expensive. To save time,
sys.path_importer_cache is maintained as a mapping between a path entry and the importer (finder) that can
use the value to find modules.

import sys
import pprint

print 'PATH:',
pprint.pprint(sys.path)
print
print 'IMPORTERS:'
pprint.pprint(sys.path_importer_cache)

A cache value of None means to use the default filesystem loader. Each missing directory is associated with
an imp.NullImporter instance, since modules cannot be imported from directories that do not exist. In the
example output below, several zipimport.zipimporter instances are used to manage EGG files found on the path.

$ python sys_path_importer_cache.py
PATH:['/Users/dhellmann/Documents/PyMOTW/src/PyMOTW/sys',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/site-packages/pip-0.3.1-py2.6.egg',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/site-packages/Paver-1.0.1-py2.6.egg',
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python26.zip',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/plat-darwin',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/plat-mac',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/plat-mac/lib-scriptpackages',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/lib-tk',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/lib-old',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/lib-dynload',
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6',
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-darwin',
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-tk',
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac',
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac/lib-scriptpackages',
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/site-packages',
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages']

IMPORTERS:
{'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6': None,
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/lib-tk': None,
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-darwin': None,
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac': None,
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plat-mac/lib-scriptpackages': None,
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages': None,
'/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg': <zipimporter object "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg">,
'/Users/dhellmann/.virtualenvs/pymotw/bin/../lib/python2.6/': None,
'/Users/dhellmann/.virtualenvs/pymotw/bin/../lib/python26.zip': <imp.NullImporter object at 0x16028>,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6': None,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/encodings': None,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/lib-dynload': None,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/lib-old': <imp.NullImporter object at 0x164a8>,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/lib-tk': <imp.NullImporter object at 0x164a0>,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/plat-darwin': <imp.NullImporter object at 0x16488>,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/plat-mac': <imp.NullImporter object at 0x16490>,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/plat-mac/lib-scriptpackages': <imp.NullImporter object at 0x16498>,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/site-packages': None,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/site-packages/Paver-1.0.1-py2.6.egg': None,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/site-packages/pip-0.3.1-py2.6.egg': None,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg': <zipimporter object "/Users/dhellmann/.virtualenvs/pymotw/lib/python2.6/site-packages/setuptools-0.6c9-py2.6.egg">,
'/Users/dhellmann/.virtualenvs/pymotw/lib/python26.zip': <imp.NullImporter object at 0x16480>,
'/Users/dhellmann/Documents/PyMOTW/src/PyMOTW/sys': None,
'sys_path_importer_cache.py': <imp.NullImporter object at 0x164b0>}

Meta Path

sys.meta_path further extends the sources of potential imports by allowing a finder to be searched
before the regular sys.path is scanned. The API for a finder on the meta-path is the same as for a regular
path finder. The difference is that the meta-finder is not limited to a single entry in sys.path; it can
search anywhere at all.

import sys
import sys_shelve_importer
import imp


class NoisyMetaImportFinder(object):

    def __init__(self, prefix):
        print 'Creating NoisyMetaImportFinder for %s' % prefix
        self.prefix = prefix
        return

    def find_module(self, fullname, path=None):
        print 'NoisyMetaImportFinder looking for "%s" with path "%s"' % (fullname, path)
        name_parts = fullname.split('.')
        if name_parts and name_parts[0] == self.prefix:
            print ' ... found prefix, returning loader'
            return NoisyMetaImportLoader(path)
        else:
            print ' ... not the right prefix, cannot load'
        return None


class NoisyMetaImportLoader(object):

    def __init__(self, path_entry):
        self.path_entry = path_entry
        return

    def load_module(self, fullname):
        print 'loading %s' % fullname
        if fullname in sys.modules:
            mod = sys.modules[fullname]
        else:
            mod = sys.modules.setdefault(fullname, imp.new_module(fullname))

        # Set a few properties required by PEP 302
        mod.__file__ = fullname
        mod.__name__ = fullname
        # always looks like a package
        mod.__path__ = [ 'path-entry-goes-here' ]
        mod.__loader__ = self
        mod.__package__ = '.'.join(fullname.split('.')[:-1])

        return mod


# Install the meta-path finder
sys.meta_path.append(NoisyMetaImportFinder('foo'))

# Import some modules that are "found" by the meta-path finder
print
import foo

print
import foo.bar

# Import a module that is not found
print
try:
    import bar
except ImportError, e:
    pass

Each finder on the meta-path is interrogated before sys.path is searched, so there is always an opportunity to have a central importer load modules without explicitly modifying sys.path. Once the module is “found”, the loader API works in the same way as for regular loaders (although this example is truncated for simplicity).

$ python sys_meta_path.py
Creating NoisyMetaImportFinder for foo

NoisyMetaImportFinder looking for "foo" with path "None"
... found prefix, returning loader
loading foo

NoisyMetaImportFinder looking for "foo.bar" with path "['path-entry-goes-here']"
... found prefix, returning loader
loading foo.bar

NoisyMetaImportFinder looking for "bar" with path "None"
... not the right prefix, cannot load
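
For comparison, the same prefix-based idea can be sketched with the importlib API. This is a minimal sketch, assuming Python 3.4 or later rather than the Python 2.6 used for the example above; PrefixFinder and the noisy_demo package name are hypothetical names chosen for illustration.

```python
import importlib.abc
import importlib.util
import sys


class PrefixFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder and loader that claims every module under one prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def find_spec(self, fullname, path=None, target=None):
        if fullname.split('.')[0] != self.prefix:
            # Not ours: returning None lets the remaining finders try.
            return None
        # is_package=True gives the module a __path__, so submodule
        # imports such as noisy_demo.child are routed back to this finder.
        return importlib.util.spec_from_loader(fullname, self, is_package=True)

    def create_module(self, spec):
        # None asks the import machinery to create a default module object.
        return None

    def exec_module(self, module):
        # A real loader would execute code here; this one only leaves a marker.
        module.loaded_by = type(self).__name__


finder = PrefixFinder('noisy_demo')
sys.meta_path.append(finder)

import noisy_demo.child

print(noisy_demo.child.loaded_by)  # prints: PrefixFinder
```

Because the finder is appended to the end of sys.meta_path, the standard finders are consulted first and this one only sees names they could not resolve.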

See also

PEP 302
Import Hooks
imp
The imp module provides tools used by importers.
zipimport
Implements importing Python modules from inside ZIP archives.
The Quick Guide to Python Eggs
PEAK documentation for working with EGGs.
Import this, that, and the other thing: custom importers
Brett Cannon’s PyCon 2010 presentation.
Python 3 stdlib module “importlib”
Python 3.x includes abstract base classes that make it easier to create custom importers.

PyMOTW Home

The canonical version of this article

Saturday, November 21, 2009

Automatically back up thumb drives on your Mac

I have a couple of different thumb drives that I use as portable working devices. The data on them is important, so I wanted to back them up. Today I worked out how to copy the contents of the USB drive to a folder on my hard drive every time the USB drive is inserted into the computer.

The two technologies I used to accomplish this are Folder Actions and AppleScript. The first step is to use Script Editor to save the script below to ~/Library/Scripts/Folder Action Scripts/SyncThumbDrive.scpt.


property DEST_DIR : "Documents:ThumbDrives:" (* inside $HOME *)

on handle_new_volume(volume)

    log "Detected new volume " & volume

    tell application "Finder"
        -- figure out source for copying data
        set sourcePath to the quoted form of the POSIX path of volume
        log "Source path:" & sourcePath

        -- work out the destination
        set sourceVol to the name of volume
        set myHome to path to home folder as string
        set destFolderName to (myHome & DEST_DIR & volume)

        if exists folder destFolderName then
            set destPath to POSIX path of destFolderName
            log "Destination path:" & destPath
            -- quote the destination too, so volume names with spaces survive the shell
            do shell script "rsync -a " & sourcePath & " " & quoted form of destPath
            beep
        end if
    end tell

end handle_new_volume

on adding folder items to this_folder after receiving these_items
    repeat with aItem in these_items
        my handle_new_volume(aItem)
    end repeat
end adding folder items to


Next, enable folder actions using the "Folder Actions Setup" application. Add the script above as an action on the "/Volumes" folder. Once you have done that, any time a file or directory is added to "/Volumes" the script will be invoked. Since a new entry is added for each volume mounted automatically, this amounts to triggering the script every time a volume is mounted.

[Screenshot: Folder Actions Setup]


The script looks for a destination directory "~/Documents/ThumbDrives/$VOLUME", where $VOLUME is the name of the drive inserted. You need to create the directory before inserting the drive because the script treats the presence of the directory as confirmation that it should copy the files on the thumb drive over to the hard drive.

After you create the destination directory, insert the thumb drive. When the backup is complete, the computer will play your configured alert sound.

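Because rsync runs without the --delete option, no files are ever deleted from the backup. Files you remove from the thumb drive stay on the hard drive, and files you remove from the hard drive re-appear on the next sync until you also remove them from the thumb drive.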
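
The decision the script makes (back up only when the destination directory already exists) can also be sketched in Python. This is an illustration under assumptions, not part of the Folder Action itself: backup_command and run_backup are hypothetical helper names, and rsync is assumed to be on the PATH.

```python
import os
import subprocess


def backup_command(volume_path, backup_root):
    """Return the rsync argument list for a volume, or None to skip it."""
    name = os.path.basename(os.path.normpath(volume_path))
    dest = os.path.join(backup_root, name)
    if not os.path.isdir(dest):
        # A missing destination directory means "do not back up this drive".
        return None
    # Trailing slashes make rsync copy the contents of the source into dest.
    # There is no --delete flag, so nothing is removed from the backup.
    return ['rsync', '-a', os.path.join(volume_path, ''), os.path.join(dest, '')]


def run_backup(volume_path, backup_root):
    """Run rsync for the volume if a destination exists; report whether it ran."""
    command = backup_command(volume_path, backup_root)
    if command is None:
        return False
    # Passing a list avoids shell quoting problems with spaces in names.
    subprocess.check_call(command)
    return True
```

A call such as run_backup('/Volumes/MYDRIVE', os.path.expanduser('~/Documents/ThumbDrives')) would mirror what the AppleScript does for a drive with the hypothetical name MYDRIVE.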

Sunday, November 15, 2009

PyMOTW: sys, Part 6: Low-level Thread Support

Low-level Thread Support

sys includes low-level functions for controlling and debugging thread behavior.

Check Interval

Python uses a form of cooperative multitasking in its thread implementation. At a fixed interval, bytecode execution is paused and the interpreter checks if any signal handlers need to be executed. During the same interval check, the global interpreter lock is also released by the current thread and then reacquired, giving other threads an opportunity to take over execution by grabbing the lock first.

The default check interval is 100 bytecodes and the current value can always be retrieved with sys.getcheckinterval(). Changing the interval with sys.setcheckinterval() may have an impact on the performance of your application, depending on the nature of the operations being performed.

import sys
import threading
from Queue import Queue
import time

def show_thread(q, extraByteCodes):
    for i in range(5):
        for j in range(extraByteCodes):
            pass
        q.put(threading.current_thread().name)
    return

def run_threads(prefix, interval, extraByteCodes):
    print '%(prefix)s interval = %(interval)s with %(extraByteCodes)s extra operations' % locals()
    sys.setcheckinterval(interval)
    q = Queue()
    threads = [ threading.Thread(target=show_thread, name='%s T%s' % (prefix, i),
                                 args=(q, extraByteCodes)
                                 )
                for i in range(3)
              ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    while not q.empty():
        print q.get()
    print
    return

run_threads('Default', interval=10, extraByteCodes=1000)
run_threads('Custom', interval=10, extraByteCodes=0)

When the check interval is smaller than the number of bytecodes in a thread, the interpreter may give another thread control so that it runs for a while. This is illustrated in the first set of output, where the check interval is set to 10 (despite the "Default" label) and 1000 extra loop iterations are performed for each step through the i loop.

On the other hand, when the check interval is greater than the number of bytecodes being executed by a thread that doesn’t release control for another reason, the thread will finish its work before the interval comes up. This is illustrated by the order of the name values in the queue in the second example.

$ python sys_checkinterval.py
Default interval = 10 with 1000 extra operations
Default T0
Default T0
Default T0
Default T1
Default T2
Default T2
Default T0
Default T1
Default T2
Default T0
Default T1
Default T2
Default T1
Default T2
Default T1

Custom interval = 10 with 0 extra operations
Custom T0
Custom T0
Custom T0
Custom T0
Custom T0
Custom T1
Custom T1
Custom T1
Custom T1
Custom T1
Custom T2
Custom T2
Custom T2
Custom T2
Custom T2
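
Because the setting is global to the interpreter, it is safer to restore it after an experiment. The sketch below shows one way to do that. It assumes Python 3.2 or later, where the bytecode-count check interval was replaced by a time-based switch interval (sys.getswitchinterval() and sys.setswitchinterval(), measured in seconds); under Python 2.6 the same pattern would use sys.getcheckinterval() and sys.setcheckinterval(). The name temporary_switch_interval is hypothetical.

```python
import contextlib
import sys


@contextlib.contextmanager
def temporary_switch_interval(seconds):
    """Set the thread switch interval inside a with block, then restore it."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield previous
    finally:
        # Always put the old value back, even if the block raised.
        sys.setswitchinterval(previous)


original = sys.getswitchinterval()
with temporary_switch_interval(0.001):
    inside = sys.getswitchinterval()
restored = sys.getswitchinterval()
print(original, inside, restored)
```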

Modifying the check interval is not as clearly useful as it might seem. Many other factors may control the context switching behavior of Python’s threads. For example, if a thread performs I/O, it releases the GIL and may therefore allow another thread to take over execution.

import sys
import threading
from Queue import Queue
import time

def show_thread(q, extraByteCodes):
    for i in range(5):
        for j in range(extraByteCodes):
            pass
        #q.put(threading.current_thread().name)
        print threading.current_thread().name
    return

def run_threads(prefix, interval, extraByteCodes):
    print '%(prefix)s interval = %(interval)s with %(extraByteCodes)s extra operations' % locals()
    sys.setcheckinterval(interval)
    q = Queue()
    threads = [ threading.Thread(target=show_thread, name='%s T%s' % (prefix, i),
                                 args=(q, extraByteCodes)
                                 )
                for i in range(3)
              ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    while not q.empty():
        print q.get()
    print
    return

run_threads('Default', interval=100, extraByteCodes=1000)
run_threads('Custom', interval=10, extraByteCodes=0)

This example is modified from the first so that the thread prints directly to sys.stdout instead of appending to a queue. The output is much less predictable.

$ python sys_checkinterval_io.py
Default interval = 100 with 1000 extra operations
Default T0
Default T1
Default T1Default T2

Default T0Default T2

Default T2
Default T2
Default T1
Default T2
Default T1
Default T1
Default T0
Default T0
Default T0

Custom interval = 10 with 0 extra operations
Custom T0
Custom T0
Custom T0
Custom T0
Custom T0
Custom T1
Custom T1
Custom T1
Custom T1
Custom T2
Custom T2
Custom T2
Custom T1Custom T2

Custom T2

See also

dis
Disassembling your Python code with the dis module is one way to count bytecodes.
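
As a rough illustration of counting bytecodes, the sketch below lists the instructions compiled for a small loop. It assumes Python 3.4 or later, where dis.get_instructions() is available (under Python 2.6, dis.dis() prints a similar listing), and busy_loop is a hypothetical stand-in for the inner loop used in the examples above.

```python
import dis


def busy_loop(count):
    for j in range(count):
        pass


# This is a static count: the instructions in the compiled function body,
# not the number executed at run time (the loop body repeats `count` times).
instructions = list(dis.get_instructions(busy_loop))
print('%d instructions in busy_loop' % len(instructions))
for instruction in instructions:
    print(instruction.offset, instruction.opname)
```

The exact instructions and their number vary between interpreter versions, so the count is only a guide when choosing an interval.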

Debugging

Identifying deadlocks can be one of the most difficult aspects of working with threads. sys._current_frames() can help by showing exactly where a thread is stopped.

 1  #!/usr/bin/env python
 2  # encoding: utf-8
 3
 4  import sys
 5  import threading
 6  import time
 7
 8  io_lock = threading.Lock()
 9  blocker = threading.Lock()
10
11  def block(i):
12      t = threading.current_thread()
13      with io_lock:
14          print '%s with ident %s going to sleep' % (t.name, t.ident)
15      if i:
16          blocker.acquire() # acquired but never released
17          time.sleep(0.2)
18      with io_lock:
19          print t.name, 'finishing'
20      return
21
22  # Create and start several threads that "block"
23  threads = [ threading.Thread(target=block, args=(i,)) for i in range(3) ]
24  for t in threads:
25      t.setDaemon(True)
26      t.start()
27
28  # Map the threads from their identifier to the thread object
29  threads_by_ident = dict((t.ident, t) for t in threads)
30
31  # Show where each thread is "blocked"
32  time.sleep(0.01)
33  with io_lock:
34      for ident, frame in sys._current_frames().items():
35          t = threads_by_ident.get(ident)
36          if not t:
37              # Main thread
38              continue
39          print t.name, 'stopped in', frame.f_code.co_name,
40          print 'at line', frame.f_lineno, 'of', frame.f_code.co_filename
41
42  # Let the threads finish
43  # for t in threads:
44  #     t.join()

The dictionary returned by sys._current_frames() is keyed on the thread identifier, rather than its name. We have to do a little work to map those identifiers back to the thread object we created.

Since Thread-1 does not sleep, it finishes before we check its status. Since it is no longer active, it does not appear in the output. Thread-2 acquires the lock “blocker”, then sleeps for a short period. Meanwhile Thread-3 tries to acquire blocker but cannot because Thread-2 already has it. The output therefore shows Thread-3 stopped at the blocker.acquire() call on line 16 and Thread-2 stopped in time.sleep() on line 17.

$ python sys_current_frames.py
Thread-1 with ident -1341648896 going to sleep
Thread-1 finishing
Thread-2 with ident -1341648896 going to sleep
Thread-3 with ident -1341116416 going to sleep
Thread-3 stopped in block at line 16 of sys_current_frames.py
Thread-2 stopped in block at line 17 of sys_current_frames.py
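
The same technique extends to printing a full stack trace for each thread rather than a single line number. The following is a minimal sketch written for Python 3; dump_thread_stacks and wait_for_release are hypothetical names, and threading.enumerate() is used in place of a hand-built list of threads.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a dict mapping thread names to formatted stack traces."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    stacks = {}
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, 'unknown-%s' % ident)
        stacks[name] = ''.join(traceback.format_stack(frame))
    return stacks


ready = threading.Event()
release = threading.Event()


def wait_for_release():
    ready.set()      # tell the main thread this function is running
    release.wait()   # block here until the main thread lets us go


worker = threading.Thread(target=wait_for_release, name='waiter')
worker.daemon = True
worker.start()
ready.wait()

# Capture the stacks while the worker is still inside wait_for_release().
stacks = dump_thread_stacks()
print(stacks['waiter'])

release.set()
worker.join()
```

Using an event to signal readiness avoids guessing at a sleep duration before inspecting the frames.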

See also

threading
The threading module includes classes for creating Python threads.
Queue
The Queue module provides a thread-safe implementation of a FIFO data structure.
Python Threads and the Global Interpreter Lock
Jesse Noller’s article from the December 2007 issue of Python Magazine.
Inside the Python GIL
Presentation by David Beazley describing thread implementation and performance issues, including how the check interval and GIL are related.
