Monday, July 27, 2009

PyMOTW: Text Processing Tools

Text Processing Tools

The string class is the most obvious text processing tool available to Python programmers, but there are plenty of other tools in the standard library to make text manipulation simpler.

string module

Old-style code will use functions from the string module, instead of methods of string objects. There is an equivalent method for each function from the module, and use of the functions is deprecated for new code.

Newer code may use a string.Template as a simple way to parameterize strings beyond the features of the string or unicode classes. While not as feature-rich as templates defined by many of the web frameworks or extension modules available on PyPI, string.Template is a good middle ground for user-modifiable templates where dynamic values need to be inserted into otherwise static text.
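A minimal sketch of string.Template in use follows. The template text and the placeholder names are made up for illustration. substitute() insists on a value for every placeholder. safe_substitute() leaves unknown placeholders in place, which is friendlier for user-edited templates.

```python
from string import Template

# Hypothetical template text; $name and ${count} are placeholders.
greeting = Template('Hello, $name. You have ${count} new messages.')

# substitute() requires a value for every placeholder (KeyError otherwise).
full = greeting.substitute(name='Doug', count=3)

# safe_substitute() leaves missing placeholders untouched instead of raising.
partial = greeting.safe_substitute(name='Doug')

print(full)
print(partial)
```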

Text Input

Reading from a file is easy enough, but if you’re writing a line-by-line filter, the fileinput module is even easier. The fileinput API calls for you to iterate over the object returned by input(), processing each line in turn. It takes file names from the command line arguments, or falls back to reading directly from sys.stdin when none are given. The result is a flexible tool your users can run directly on a file or as part of a pipeline.
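Here is a small sketch of such a filter. The function name number_lines() and its output format are hypothetical. It uses the fileinput.FileInput class rather than the module-level input() so that the state is local to one call. filename() and filelineno() report where each line came from.

```python
import fileinput


def number_lines(paths=None):
    """Yield each input line prefixed with its file name and line number.

    paths=None lets fileinput read file names from sys.argv[1:],
    falling back to sys.stdin when no names are given.
    """
    finput = fileinput.FileInput(files=paths)
    try:
        for line in finput:
            yield '%s:%d: %s' % (finput.filename(),
                                 finput.filelineno(),
                                 line.rstrip('\n'))
    finally:
        finput.close()
```

A filter script would loop with `for text in number_lines(): print(text)`. The same script then works when given file names on the command line and when it is placed at the end of a pipeline.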

Text Output

The textwrap module includes tools for formatting text from paragraphs by limiting the width of output, adding indentation, and inserting line breaks to wrap lines consistently.
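The sketch below uses an arbitrary sample sentence and a width of 30. fill() limits the width and adds indentation. dedent() removes the leading whitespace common to all lines, which is handy for triple-quoted strings embedded in indented code.

```python
import textwrap

sample = ('The textwrap module can be used to format text for output in '
          'situations where pretty-printing is desired.')

# Wrap to at most 30 columns, indenting every line by two spaces.
wrapped = textwrap.fill(sample, width=30,
                        initial_indent='  ', subsequent_indent='  ')
print(wrapped)

# Strip the indentation shared by every line of an embedded block.
block = textwrap.dedent('''\
    first line
      indented line
    ''')
```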

Comparing Values

The standard library includes two modules related to comparing text values beyond the built-in equality and sort comparison supported by string objects. re provides a complete regular expression library, implemented largely in C for performance. Regular expressions are well-suited for finding substrings within a larger data set, comparing strings against a pattern (rather than another fixed string), and light parsing.
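Each of those three uses can be sketched briefly. The `key=value` format and the date pattern are hypothetical examples. findall() locates substrings, and match() compares a whole string against a pattern.

```python
import re

# A hypothetical "key=value" format; \w+ matches letters, digits and underscore.
PAIR = re.compile(r'(\w+)=(\w+)')


def parse_pairs(text):
    """Light parsing: collect every key=value pair found anywhere in text."""
    return dict(PAIR.findall(text))


def looks_like_date(text):
    """Compare text against a pattern instead of a fixed string."""
    return re.match(r'^\d{4}-\d{2}-\d{2}$', text) is not None
```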

difflib, on the other hand, shows the actual differences between sequences of text in terms of the parts added, removed, or changed. The output of the comparison functions in difflib can be used to give users more detailed feedback about where changes occur between two inputs, how a document has changed over time, etc.
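A short sketch with two made-up three-line inputs follows. unified_diff() produces the familiar patch-style output. SequenceMatcher.ratio() gives a similarity score between 0 and 1.

```python
import difflib

before = ['line one', 'line two', 'line three']
after = ['line one', 'line 2', 'line three']

# lineterm='' because the input lines carry no trailing newlines.
diff = list(difflib.unified_diff(before, after,
                                 fromfile='before', tofile='after',
                                 lineterm=''))
print('\n'.join(diff))

# Rough similarity between two strings, e.g. for "did you mean" hints.
ratio = difflib.SequenceMatcher(None, 'line two', 'line 2').ratio()
```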

PyMOTW Home

The canonical version of this article

Thursday, July 23, 2009

Italian translation of PyMOTW

Roberto Pauletto is working on an Italian translation of the Module of the Week series. Roberto creates Windows applications with C# by day, and tinkers with Linux and Python at home. He has recently moved to Python from Perl for all of his system-administration scripting.

Thanks, Roberto!

Sunday, July 19, 2009

PyMOTW: urllib2 - Library for opening URLs.

urllib2 – Library for opening URLs.

Purpose: A library for opening URLs that can be extended by defining custom protocol handlers.
Python Version: 2.1

The urllib2 module provides an updated API for using internet resources identified by URLs. It is designed to be extended by individual applications to support new protocols or add variations to existing protocols (such as handling HTTP basic authentication).

HTTP GET

Note

The test server for these examples is in BaseHTTPServer_GET.py, from the
PyMOTW examples for BaseHTTPServer. Start the server in one
terminal window, then run these examples in another.

As with urllib, an HTTP GET operation is the simplest use of urllib2. Simply pass the
URL to urlopen() to get a “file-like” handle to the remote data.

import urllib2

response = urllib2.urlopen('http://localhost:8080/')
print 'RESPONSE:', response
print 'URL :', response.geturl()

headers = response.info()
print 'DATE :', headers['date']
print 'HEADERS :'
print '---------'
print headers

data = response.read()
print 'LENGTH :', len(data)
print 'DATA :'
print '---------'
print data

The example server accepts the incoming values and formats a plain text response
to send back. The return value from urlopen() gives access to the headers from
the HTTP server through the info() method, and the data for the remote
resource via methods like read() and readlines().

$ python urllib2_urlopen.py
RESPONSE: <addinfourl at 11940488 whose fp = <socket._fileobject object at 0xb573f0>>
URL : http://localhost:8080/
DATE : Sun, 19 Jul 2009 14:01:31 GMT
HEADERS :
---------
Server: BaseHTTP/0.3 Python/2.6.2
Date: Sun, 19 Jul 2009 14:01:31 GMT

LENGTH : 349
DATA :
---------
CLIENT VALUES:
client_address=('127.0.0.1', 55836) (localhost)
command=GET
path=/
real path=/
query=
request_version=HTTP/1.1

SERVER VALUES:
server_version=BaseHTTP/0.3
sys_version=Python/2.6.2
protocol_version=HTTP/1.0

HEADERS RECEIVED:
accept-encoding=identity
connection=close
host=localhost:8080
user-agent=Python-urllib/2.6

The file-like object returned by urlopen() is iterable:

import urllib2

response = urllib2.urlopen('http://localhost:8080/')
for line in response:
    print line.rstrip()

This example strips the trailing newlines and carriage returns before printing the output.

$ python urllib2_urlopen_iterator.py
CLIENT VALUES:
client_address=('127.0.0.1', 55840) (localhost)
command=GET
path=/
real path=/
query=
request_version=HTTP/1.1

SERVER VALUES:
server_version=BaseHTTP/0.3
sys_version=Python/2.6.2
protocol_version=HTTP/1.0

HEADERS RECEIVED:
accept-encoding=identity
connection=close
host=localhost:8080
user-agent=Python-urllib/2.6

Encoding Arguments

Arguments can be passed to the server by encoding them with urllib.urlencode() and
appending them to the URL.

import urllib
import urllib2

query_args = { 'q':'query string', 'foo':'bar' }
encoded_args = urllib.urlencode(query_args)
print 'Encoded:', encoded_args

url = 'http://localhost:8080/?' + encoded_args
print urllib2.urlopen(url).read()

The list of client values returned in the example output contains the encoded query
arguments.

$ python urllib2_http_get_args.py
Encoded: q=query+string&foo=bar
CLIENT VALUES:
client_address=('127.0.0.1', 55849) (localhost)
command=GET
path=/?q=query+string&foo=bar
real path=/
query=q=query+string&foo=bar
request_version=HTTP/1.1

SERVER VALUES:
server_version=BaseHTTP/0.3
sys_version=Python/2.6.2
protocol_version=HTTP/1.0

HEADERS RECEIVED:
accept-encoding=identity
connection=close
host=localhost:8080
user-agent=Python-urllib/2.6
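urlencode() also accepts a sequence of pairs, which fixes the argument order. A plain dict guarantees no order on older Pythons. The doseq flag expands list values into repeated arguments. The try/except import is an assumption added so the sketch runs on both Python 2 (urllib, urlparse) and Python 3 (urllib.parse).

```python
try:
    from urllib.parse import urlencode, parse_qs   # Python 3
except ImportError:
    from urllib import urlencode                   # Python 2
    from urlparse import parse_qs

# A list of pairs keeps the argument order fixed.
ordered = urlencode([('q', 'query string'), ('foo', 'bar')])

# doseq=True turns a list value into repeated arguments.
repeated = urlencode({'tag': ['python', 'http']}, doseq=True)

url = 'http://localhost:8080/?' + ordered

# parse_qs() reverses the encoding, returning a list of values per name.
decoded = parse_qs(ordered)
```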

HTTP POST

Note

The test server for these examples is in BaseHTTPServer_POST.py, from the
PyMOTW examples for the BaseHTTPServer. Start the server in one
terminal window, then run these examples in another.

To POST form-encoded data to the remote server instead of using GET, pass the encoded
query arguments as the data argument to urlopen().

import urllib
import urllib2

query_args = { 'q':'query string', 'foo':'bar' }
encoded_args = urllib.urlencode(query_args)
url = 'http://localhost:8080/'
print urllib2.urlopen(url, encoded_args).read()

The server can decode the form data and access the individual values by name.

$ python urllib2_urlopen_post.py
Client: ('127.0.0.1', 55943)
User-agent: Python-urllib/2.6
Path: /
Form data:
q=query string
foo=bar
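The data can also be given to a Request when it is constructed, and the request method follows from whether data is present. This sketch only builds Request objects and never opens a connection. The dual import is an assumption so it runs on Python 2 (urllib2) and Python 3 (urllib.request).

```python
try:
    from urllib.request import Request            # Python 3
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import Request                   # Python 2
    from urllib import urlencode

encoded = urlencode([('q', 'query string'), ('foo', 'bar')])

get_request = Request('http://localhost:8080/')
# encode() yields bytes on Python 3 (required for sending) and is harmless on Python 2.
post_request = Request('http://localhost:8080/', data=encoded.encode('ascii'))

print(get_request.get_method())
print(post_request.get_method())
```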

Working with Requests Directly

urlopen() is a convenience function that hides some of the details of how the request is made
and handled for you. For more precise control, you may want to instantiate and use a Request
object directly.

Adding Outgoing Headers

As the examples above illustrate, the default User-agent header value is made up of the
constant Python-urllib, followed by the Python interpreter version. If you are creating
an application that will access other people’s web resources, it is a courtesy to include
real user agent information in your requests, so they can identify the source of the hits
more easily. Using a custom agent also allows them to control crawlers using a robots.txt
file (see robotparser).

import urllib2

request = urllib2.Request('http://localhost:8080/')
request.add_header('User-agent', 'PyMOTW (http://www.doughellmann.com/PyMOTW/)')

response = urllib2.urlopen(request)
data = response.read()
print data

After creating a Request object, it is easy to use add_header() to set the user agent
value before opening the request. The last line of the output shows our custom
value.

$ python urllib2_request_header.py
CLIENT VALUES:
client_address=('127.0.0.1', 55876) (localhost)
command=GET
path=/
real path=/
query=
request_version=HTTP/1.1

SERVER VALUES:
server_version=BaseHTTP/0.3
sys_version=Python/2.6.2
protocol_version=HTTP/1.0

HEADERS RECEIVED:
accept-encoding=identity
connection=close
host=localhost:8080
user-agent=PyMOTW (http://www.doughellmann.com/PyMOTW/)


Posting Form Data

You can set the outgoing data on the Request to post it to the server.

import urllib
import urllib2

query_args = { 'q':'query string', 'foo':'bar' }

request = urllib2.Request('http://localhost:8080/')
print 'Request method before data:', request.get_method()

request.add_data(urllib.urlencode(query_args))
print 'Request method after data :', request.get_method()
request.add_header('User-agent', 'PyMOTW (http://www.doughellmann.com/PyMOTW/)')

print
print 'OUTGOING DATA:'
print request.get_data()

print
print 'SERVER RESPONSE:'
print urllib2.urlopen(request).read()

The HTTP method used by the Request changes from GET to POST after the data is added.

$ python urllib2_request_post.py
Request method before data: GET
Request method after data : POST

OUTGOING DATA:
q=query+string&foo=bar

SERVER RESPONSE:
Client: ('127.0.0.1', 56044)
User-agent: PyMOTW (http://www.doughellmann.com/PyMOTW/)
Path: /
Form data:
q=query string
foo=bar

Note

Although the method is add_data(), its effect is not cumulative. Each call
replaces the previous data.

Uploading Files

Encoding files for upload requires a little more work than simple forms. A complete MIME
message needs to be constructed in the body of the request, so that the server can
distinguish incoming form fields from uploaded files.

import itertools
import mimetools
import mimetypes
from cStringIO import StringIO
import urllib
import urllib2

class MultiPartForm(object):
    """Accumulate the data to be used when posting a form."""

    def __init__(self):
        self.form_fields = []
        self.files = []
        self.boundary = mimetools.choose_boundary()
        return

    def get_content_type(self):
        return 'multipart/form-data; boundary=%s' % self.boundary

    def add_field(self, name, value):
        """Add a simple field to the form data."""
        self.form_fields.append((name, value))
        return

    def add_file(self, fieldname, filename, fileHandle, mimetype=None):
        """Add a file to be uploaded."""
        body = fileHandle.read()
        if mimetype is None:
            mimetype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
        self.files.append((fieldname, filename, mimetype, body))
        return

    def __str__(self):
        """Return a string representing the form data, including attached files."""
        # Build a list of lists, each containing "lines" of the
        # request.  Each part is separated by a boundary string.
        # Once the list is built, return a string where each
        # line is separated by '\r\n'.
        parts = []
        part_boundary = '--' + self.boundary

        # Add the form fields
        parts.extend(
            [ part_boundary,
              'Content-Disposition: form-data; name="%s"' % name,
              '',
              value,
            ]
            for name, value in self.form_fields
            )

        # Add the files to upload
        parts.extend(
            [ part_boundary,
              'Content-Disposition: file; name="%s"; filename="%s"' % \
                 (field_name, filename),
              'Content-Type: %s' % content_type,
              '',
              body,
            ]
            for field_name, filename, content_type, body in self.files
            )

        # Flatten the list and add closing boundary marker,
        # then return CR+LF separated data
        flattened = list(itertools.chain(*parts))
        flattened.append('--' + self.boundary + '--')
        flattened.append('')
        return '\r\n'.join(flattened)

if __name__ == '__main__':
    # Create the form with simple fields
    form = MultiPartForm()
    form.add_field('firstname', 'Doug')
    form.add_field('lastname', 'Hellmann')

    # Add a fake file
    form.add_file('biography', 'bio.txt',
                  fileHandle=StringIO('Python developer and blogger.'))

    # Build the request
    request = urllib2.Request('http://localhost:8080/')
    request.add_header('User-agent', 'PyMOTW (http://www.doughellmann.com/PyMOTW/)')
    body = str(form)
    request.add_header('Content-type', form.get_content_type())
    request.add_header('Content-length', len(body))
    request.add_data(body)

    print
    print 'OUTGOING DATA:'
    print request.get_data()

    print
    print 'SERVER RESPONSE:'
    print urllib2.urlopen(request).read()

The MultiPartForm class can represent an arbitrary form as a multi-part MIME message
with attached files.

$ python urllib2_upload_files.py

OUTGOING DATA:
--192.168.1.17.527.30074.1248020372.206.1
Content-Disposition: form-data; name="firstname"

Doug
--192.168.1.17.527.30074.1248020372.206.1
Content-Disposition: form-data; name="lastname"

Hellmann
--192.168.1.17.527.30074.1248020372.206.1
Content-Disposition: file; name="biography"; filename="bio.txt"
Content-Type: text/plain

Python developer and blogger.
--192.168.1.17.527.30074.1248020372.206.1--


SERVER RESPONSE:
Client: ('127.0.0.1', 57126)
User-agent: PyMOTW (http://www.doughellmann.com/PyMOTW/)
Path: /
Form data:
lastname=Hellmann
Uploaded biography as "bio.txt" (29 bytes)
firstname=Doug
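For unit tests it can help to build the same kind of body with a fixed boundary, so the output is predictable. The function encode_multipart() below is a hypothetical, stripped-down functional variant of MultiPartForm, not part of any library. Unlike the class above, it labels every part form-data, the disposition type RFC 7578 specifies for multipart/form-data. On Python 3 the body would need to be encoded to bytes before sending.

```python
import mimetypes


def encode_multipart(fields, files, boundary):
    """Return (content_type, body) for a multipart/form-data request.

    fields:   list of (name, value) strings
    files:    list of (fieldname, filename, content) strings
    boundary: assumed not to appear anywhere in the values
    """
    lines = []
    for name, value in fields:
        lines.extend([
            '--' + boundary,
            'Content-Disposition: form-data; name="%s"' % name,
            '',
            value,
        ])
    for fieldname, filename, content in files:
        ctype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
        lines.extend([
            '--' + boundary,
            'Content-Disposition: form-data; name="%s"; filename="%s"'
            % (fieldname, filename),
            'Content-Type: %s' % ctype,
            '',
            content,
        ])
    # Closing boundary marker, then a trailing CR+LF.
    lines.append('--' + boundary + '--')
    lines.append('')
    return 'multipart/form-data; boundary=%s' % boundary, '\r\n'.join(lines)
```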

Custom Protocol Handlers

urllib2 has built-in support for HTTP(S), FTP, and local file access. If you need to add
support for other URL types, you can register your own protocol handler to be invoked as
needed. For example, to support URLs pointing to arbitrary files on remote NFS servers
without requiring your users to mount the path manually, you would create a class derived
from BaseHandler with a method nfs_open().

The protocol open method takes a single argument, the Request instance, and it should return
an object with a read() method that can be used to read the data, an info() method to return
the response headers, and geturl() to return the actual URL of the file being read. A simple
way to achieve that is to create an instance of urllib.addinfourl, passing the open file
handle, headers, and URL into the constructor.

import mimetypes
import os
import tempfile
import urllib
import urllib2

class NFSFile(file):
    def __init__(self, tempdir, filename):
        self.tempdir = tempdir
        file.__init__(self, filename, 'rb')
    def close(self):
        print
        print 'NFSFile:'
        print ' unmounting %s' % self.tempdir
        print ' when %s is closed' % os.path.basename(self.name)
        return file.close(self)

class FauxNFSHandler(urllib2.BaseHandler):

    def __init__(self, tempdir):
        self.tempdir = tempdir

    def nfs_open(self, req):
        url = req.get_selector()
        directory_name, file_name = os.path.split(url)
        server_name = req.get_host()
        print
        print 'FauxNFSHandler simulating mount:'
        print ' Remote path: %s' % directory_name
        print ' Server : %s' % server_name
        # Use the directory the handler was configured with, not a global
        print ' Local path : %s' % self.tempdir
        print ' File name : %s' % file_name
        local_file = os.path.join(self.tempdir, file_name)
        fp = NFSFile(self.tempdir, local_file)
        content_type = mimetypes.guess_type(file_name)[0] or 'application/octet-stream'
        stats = os.stat(local_file)
        size = stats.st_size
        headers = { 'Content-type': content_type,
                    'Content-length': size,
                  }
        return urllib.addinfourl(fp, headers, req.get_full_url())

if __name__ == '__main__':
    tempdir = tempfile.mkdtemp()
    try:
        # Populate the temporary file for the simulation
        with open(os.path.join(tempdir, 'file.txt'), 'wt') as f:
            f.write('Contents of file.txt')

        # Construct an opener with our NFS handler
        # and register it as the default opener.
        opener = urllib2.build_opener(FauxNFSHandler(tempdir))
        urllib2.install_opener(opener)

        # Open the file through a URL.
        response = urllib2.urlopen('nfs://remote_server/path/to/the/file.txt')
        print
        print 'READ CONTENTS:', response.read()
        print 'URL :', response.geturl()
        print 'HEADERS:'
        for name, value in sorted(response.info().items()):
            print ' %-15s = %s' % (name, value)
        response.close()
    finally:
        os.remove(os.path.join(tempdir, 'file.txt'))
        os.removedirs(tempdir)

The FauxNFSHandler and NFSFile classes print messages to illustrate where a real
implementation would add mount and unmount calls. Since this is just a simulation,
FauxNFSHandler is primed with the name of a temporary directory where it should look for all
of its files.

$ python urllib2_nfs_handler.py

FauxNFSHandler simulating mount:
Remote path: /path/to/the
Server : remote_server
Local path : /var/folders/9R/9R1t+tR02Raxzk+F71Q50U+++Uw/-Tmp-/tmppv5Efn
File name : file.txt

READ CONTENTS: Contents of file.txt
URL : nfs://remote_server/path/to/the/file.txt
HEADERS:
Content-length = 20
Content-type = text/plain

NFSFile:
unmounting /var/folders/9R/9R1t+tR02Raxzk+F71Q50U+++Uw/-Tmp-/tmppv5Efn
when file.txt is closed
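The same mechanism works without any files at all. The `echo:` scheme and the EchoHandler class below are hypothetical and exist only for illustration: the response body is simply the text after the colon. The handler is found through its `echo_open` method name, just as nfs_open() serves `nfs:` URLs. Calling the opener directly instead of install_opener() leaves the global urlopen() untouched. The dual imports are an assumption so the sketch runs on Python 2 (urllib2, urllib) and Python 3 (urllib.request, urllib.response).

```python
from email.message import Message
from io import BytesIO

try:
    from urllib.request import BaseHandler, build_opener   # Python 3
    from urllib.response import addinfourl
except ImportError:
    from urllib2 import BaseHandler, build_opener          # Python 2
    from urllib import addinfourl


class EchoHandler(BaseHandler):
    """Hypothetical handler for 'echo:' URLs; the body is the text after the colon."""

    def echo_open(self, req):
        url = req.get_full_url()
        data = url.split(':', 1)[1].encode('utf-8')
        headers = Message()
        headers['Content-type'] = 'text/plain'
        headers['Content-length'] = str(len(data))
        # addinfourl supplies read(), info() and geturl() for the caller.
        return addinfourl(BytesIO(data), headers, url)


opener = build_opener(EchoHandler())
response = opener.open('echo:hello')
body = response.read()
print(body)
print(response.geturl())
```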

See also

urllib2
The standard library documentation for this module.
urllib
Original URL handling library.
urlparse
Work with the URL string itself.
urllib2 – The Missing Manual
Michael Foord’s write-up on using urllib2.
Upload Scripts
Example scripts from Michael Foord that illustrate how to upload a file
using HTTP and then receive the data on the server.
HTTP client to POST using multipart/form-data
Python cookbook recipe showing how to encode and post data, including files,
over HTTP.
Form content types
W3C specification for posting files or large amounts of data via HTTP forms.
mimetypes
Map filenames to mimetype.
mimetools
Tools for parsing MIME messages.

PyMOTW Home

The canonical version of this article

Monday, July 13, 2009

Suggesting PyMOTW topics via Skribit

I've added a Skribit widget to the PyMOTW home page and to my blog to get some feedback from readers about what modules or general topics I should cover next. Add your suggestion to the list, or vote for one of the existing suggestions, to let me know what you want to read about. I'll monitor the suggestions and use them to decide which module(s) to start researching for my next article.

No account is necessary to participate, so let me hear from you today!

Updated 18 Oct 2009: I've turned off anonymous suggestions because I'm getting too many vague ideas without any way to communicate back with the submitter.

Sunday, July 12, 2009

PyMOTW: File Access

File Access

Python’s standard library includes a large range of tools for working with files, filenames, and file contents.

Filenames

The first step in working with files is to get the name of the file so you can operate on it. Python represents filenames as simple strings, but os.path provides tools for building them from standard, platform-independent components. List the contents of a directory with listdir() from os, or use glob to build a list of filenames from a pattern. Finer-grained filtering of filenames is possible with fnmatch.
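The two approaches can be compared in a brief sketch. The helper names are hypothetical. fnmatch filters the bare names returned by listdir(), while glob returns full paths built by joining the directory and the pattern.

```python
import fnmatch
import glob
import os


def matching_names(directory, pattern):
    """Sorted file names in directory that match a shell-style pattern."""
    return sorted(name for name in os.listdir(directory)
                  if fnmatch.fnmatch(name, pattern))


def matching_paths(directory, pattern):
    """Same idea with glob, which returns paths including the directory."""
    return sorted(glob.glob(os.path.join(directory, pattern)))
```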


Meta-data

Once you know the name of the file, you may want to check other characteristics such as permissions or the file size using os.stat() and the constants in stat.
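A minimal sketch of pulling a few of those characteristics out of os.stat() follows. The function name describe() and the dict keys are hypothetical. The S_ISREG(), S_ISDIR() and S_IMODE() helpers and the S_IRUSR flag come from stat.

```python
import os
import stat


def describe(path):
    """Summarize a few fields from os.stat() using the helpers in stat."""
    info = os.stat(path)
    return {
        'size': info.st_size,                           # in bytes
        'is_file': stat.S_ISREG(info.st_mode),
        'is_dir': stat.S_ISDIR(info.st_mode),
        'permissions': oct(stat.S_IMODE(info.st_mode)),
        'owner_can_read': bool(info.st_mode & stat.S_IRUSR),
    }
```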

Reading Files

If you’re writing a filter application that processes text input line by line, fileinput provides an easy framework to get started. The fileinput API calls for you to iterate over the object returned by input(), processing each line in turn. It takes file names from the command line arguments, or falls back to reading directly from sys.stdin when none are given. The result is a flexible tool your users can run directly on a file or as part of a pipeline.

If your app needs random access to files, linecache makes it easy to read lines by their line number. The contents of the file are maintained in a cache, so be careful of memory consumption.
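In this sketch the wrapper name read_line() is hypothetical. linecache.getline() numbers lines starting at 1 and returns an empty string rather than raising an error when the line does not exist.

```python
import linecache


def read_line(path, number):
    """Return line `number` (1-based) of path without its newline, or None."""
    line = linecache.getline(path, number)
    # getline() returns '' for missing lines instead of raising an error.
    return line.rstrip('\n') if line else None
```

Calling linecache.clearcache() releases the cached contents once they are no longer needed.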

Temporary Files

For cases where you need to create scratch files to hold data temporarily, or before moving it to a permanent location, tempfile will be very useful. It provides classes to create temporary files and directories safely and securely. Names are guaranteed not to collide, and include random components so they are not easily guessable.
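The sketch below shows one way to combine the two tempfile helpers. The function scratch_roundtrip() is hypothetical. It creates a private directory with mkdtemp(), writes a scratch file created securely by mkstemp(), reads the file back, and removes everything with shutil.rmtree().

```python
import os
import shutil
import tempfile


def scratch_roundtrip(text):
    """Write text to a scratch file in a private temp directory and read it back.

    Returns (contents, directory); the directory is removed before returning.
    """
    directory = tempfile.mkdtemp(prefix='scratch-')
    try:
        # mkstemp() creates and opens the file, returning an OS-level handle.
        handle, path = tempfile.mkstemp(dir=directory, suffix='.txt')
        with os.fdopen(handle, 'w') as f:
            f.write(text)
        with open(path) as f:
            return f.read(), directory
    finally:
        shutil.rmtree(directory)
```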

Files and Directories

Frequently you need to work on a file as a whole, without worrying about what is in it. The shutil module includes high-level file operations such as copying files and directories, setting permissions, etc.
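A self-contained sketch of a few of those shutil operations follows. The function copy_demo() and the file names are made up. copy2() copies a file along with metadata such as its modification time. copytree() copies a whole directory, and rmtree() deletes one.

```python
import os
import shutil
import tempfile


def copy_demo():
    """Copy a file and a directory tree inside a temp dir, then clean up."""
    base = tempfile.mkdtemp()
    try:
        source = os.path.join(base, 'source')
        os.mkdir(source)
        with open(os.path.join(source, 'data.txt'), 'w') as f:
            f.write('example')

        # copy2() copies the contents plus metadata such as modification time.
        shutil.copy2(os.path.join(source, 'data.txt'),
                     os.path.join(base, 'data-copy.txt'))

        # By default copytree() requires that the destination not exist yet.
        shutil.copytree(source, os.path.join(base, 'backup'))

        return (sorted(os.listdir(base)),
                sorted(os.listdir(os.path.join(base, 'backup'))))
    finally:
        # rmtree() removes the directory and everything under it.
        shutil.rmtree(base)
```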

PyMOTW Home

The canonical version of this article

Monday, July 6, 2009

Book Review: IronPython in Action



IronPython in Action by Michael Foord and Christian Muirhead covers the version of Python built to run on Microsoft's CLR and explains how to use it with the .NET framework.

Disclaimer: I received a review copy of this book from Manning through the PyATL Book Club.

There are two target audiences for this book: experienced Python developers wanting to learn .NET, and experienced .NET developers wanting to learn Python. Both groups will find plenty of interesting material and learn a lot. After some relatively basic introductory chapters, the authors dig right in building a complex GUI application, and then implementing a web interface for the same desktop application.

Along the way they introduce topics such as different programming models in Python, navigating the MSDN documentation (very important for understanding the scope of features available in .NET), packaging your app for distribution under Windows, data persistence, XML parsing, design patterns, automated testing, system administration, relational databases, and two separate GUI libraries.

All of the code is clear, concise, and useful -- there are no fluffy, throw-away code snippets that fall short in the real world. While many of the examples given are specific to IronPython or .NET, the techniques being illustrated are definitely not.

I recommend this book for any Windows developer interested in learning about Python, and for Python developers looking into deploying an application under Windows. If you don't fall into either of those groups, I can still recommend that you pick up a copy for some excellent advice on general programming topics and the solid example code.

Sunday, July 5, 2009

New command line interface to PyMOTW

The 1.95 release of PyMOTW includes a command line interface to access the documentation for a module.

The package can be installed via easy_install or pip:

$ pip install PyMOTW
Downloading/unpacking PyMOTW
Downloading PyMOTW-1.95.tar.gz (2.2Mb): 2.2Mb downloaded
Running setup.py egg_info for package PyMOTW
warning: no files found matching 'ChangeLog'
warning: no files found matching '*.py' under directory 'sphinx/templates'
no previously-included directories found matching 'utils'
Installing collected packages: PyMOTW
Running setup.py install for PyMOTW
changing mode of build/scripts-2.6/motw from 644 to 755
warning: no files found matching 'ChangeLog'
warning: no files found matching '*.py' under directory 'sphinx/templates'
no previously-included directories found matching 'utils'
changing mode of /Users/dhellmann/.virtualenvs/testpymotw/bin/motw to 755
Successfully installed PyMOTW


and then to use the command line interface, run motw.

$ motw -h
Usage: motw [options]

Options:
-h, --help show this help message and exit
-t, --text Print plain-text version of help to stdout
-w, --web Open HTML version of help from web
--html Open HTML version of help from installed file


For example, motw abc opens the local version of this week's article. You can also use the "-w" option to go to my web site instead of reading the local version, so you always have the latest version of an article.

PyMOTW: abc - Abstract Base Classes

abc – Abstract Base Classes

Purpose: Define and use abstract base classes for API checks in your code.
Python Version: 2.6

Why use Abstract Base Classes?

Abstract base classes are a form of interface checking stricter than individual hasattr() checks for particular methods. By defining an abstract base class, you can define a common API for a set of subclasses. This capability is especially useful in situations where a third party is going to provide implementations, such as with plugins to an application, but it can also help when working on a large team or with a large code base where keeping all classes in your head at the same time is difficult or impossible.

How ABCs Work

abc works by marking methods of the base class as abstract, and then registering concrete classes as implementations of the abstract base. If your code requires a particular API, you can use issubclass() or isinstance() to check an object against the abstract class.

Let’s start by defining an abstract base class to represent the API of a set of plugins for saving and loading data.

import abc

class PluginBase(object):
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def load(self, input):
        """Retrieve data from the input source and return an object."""
        return

    @abc.abstractmethod
    def save(self, output, data):
        """Save the data object to the output."""
        return
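The `__metaclass__` class attribute is a Python 2 convention. Python 3 silently ignores it, so a class written that way is not abstract at all. The following sketch is the same PluginBase written for Python 3, where the metaclass is passed as a keyword argument. Subclassing abc.ABC, available since Python 3.4, is an equivalent shortcut.

```python
import abc


# Python 3 passes the metaclass as a keyword argument.
class PluginBase(metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def load(self, input):
        """Retrieve data from the input source and return an object."""

    @abc.abstractmethod
    def save(self, output, data):
        """Save the data object to the output."""
```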

Registering a Concrete Class

There are two ways to indicate that a concrete class implements an abstract API: register the class with the ABC, or subclass directly from the ABC.

import abc
from abc_base import PluginBase

class RegisteredImplementation(object):

    def load(self, input):
        return input.read()

    def save(self, output, data):
        return output.write(data)

PluginBase.register(RegisteredImplementation)

if __name__ == '__main__':
    print 'Subclass:', issubclass(RegisteredImplementation, PluginBase)
    print 'Instance:', isinstance(RegisteredImplementation(), PluginBase)

In this example the RegisteredImplementation is not derived from PluginBase, but is registered as implementing the PluginBase API.

$ python abc_register.py
Subclass: True
Instance: True
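Since Python 3.3, register() returns the class it registers, so it can be used as a class decorator. The sketch below is a Python 3 version of the registration example written that way. As before, a registered class is not derived from the base. It also does not appear in `__subclasses__()`, and ABCMeta does not check it for the abstract methods.

```python
import abc


class PluginBase(abc.ABC):

    @abc.abstractmethod
    def load(self, input):
        """Retrieve data from the input source and return an object."""

    @abc.abstractmethod
    def save(self, output, data):
        """Save the data object to the output."""


# register() returns its argument (Python 3.3+), so it works as a decorator.
@PluginBase.register
class RegisteredImplementation(object):

    def load(self, input):
        return input.read()

    def save(self, output, data):
        return output.write(data)


print('Subclass:', issubclass(RegisteredImplementation, PluginBase))
print('Instance:', isinstance(RegisteredImplementation(), PluginBase))
```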

Implementation Through Subclassing

By subclassing directly from the base, we can avoid the need to register the class explicitly.

import abc
from abc_base import PluginBase

class SubclassImplementation(PluginBase):

    def load(self, input):
        return input.read()

    def save(self, output, data):
        return output.write(data)

if __name__ == '__main__':
    print 'Subclass:', issubclass(SubclassImplementation, PluginBase)
    print 'Instance:', isinstance(SubclassImplementation(), PluginBase)

In this case the normal Python class management is used to recognize SubclassImplementation as implementing the abstract PluginBase.

$ python abc_subclass.py
Subclass: True
Instance: True

A side effect of using direct subclassing is that you can find all of the implementations of your plugin by asking the base class for the list of known classes derived from it (this is not an abc feature; all classes can do this).

import abc
from abc_base import PluginBase
import abc_subclass
import abc_register

for sc in PluginBase.__subclasses__():
    print sc.__name__

Notice that even though abc_register is imported, RegisteredImplementation is not among the list of subclasses because it is not actually derived from the base.

$ python abc_find_subclasses.py
SubclassImplementation

Dr. André Roberge has described using this capability to discover plugins by importing all of the modules in a directory dynamically and then looking at the subclass list to find the implementation classes.
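One caveat for that kind of discovery is that `__subclasses__()` reports only immediate children. A plugin derived from another plugin is missed unless the hierarchy is walked recursively. The sketch below is written for Python 3, and the helper all_subclasses() and the plugin classes in it are hypothetical.

```python
import abc


class PluginBase(abc.ABC):

    @abc.abstractmethod
    def load(self, input):
        """Retrieve data from the input source and return an object."""


def all_subclasses(cls):
    """Return every class derived from cls, directly or indirectly."""
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


# Hypothetical plugins: one direct subclass and one derived from it.
class TextPlugin(PluginBase):
    def load(self, input):
        return input.read()


class UpperTextPlugin(TextPlugin):
    def load(self, input):
        return super().load(input).upper()


print([c.__name__ for c in all_subclasses(PluginBase)])
```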

Incomplete Implementations

Another benefit of subclassing directly from your abstract base class is that the subclass cannot be instantiated unless it fully implements the abstract portion of the API. This can keep half-baked implementations from triggering unexpected errors at runtime.

import abc
from abc_base import PluginBase

class IncompleteImplementation(PluginBase):

    def save(self, output, data):
        return output.write(data)

PluginBase.register(IncompleteImplementation)

if __name__ == '__main__':
    print 'Subclass:', issubclass(IncompleteImplementation, PluginBase)
    print 'Instance:', isinstance(IncompleteImplementation(), PluginBase)

$ python abc_incomplete.py
Subclass: True
Instance:
Traceback (most recent call last):
File "abc_incomplete.py", line 22, in <module>
print 'Instance:', isinstance(IncompleteImplementation(), PluginBase)
TypeError: Can't instantiate abstract class IncompleteImplementation with abstract methods load
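ABCMeta records the names that are still abstract in a class's `__abstractmethods__` attribute. Checking that set can report what an implementation is missing before anything tries to instantiate it. The sketch is written for Python 3 (abc.ABC) but mirrors the classes above.

```python
import abc


class PluginBase(abc.ABC):

    @abc.abstractmethod
    def load(self, input):
        """Retrieve data from the input source and return an object."""

    @abc.abstractmethod
    def save(self, output, data):
        """Save the data object to the output."""


class IncompleteImplementation(PluginBase):

    def save(self, output, data):
        return output.write(data)


# The set of abstract names left unimplemented.
missing = sorted(IncompleteImplementation.__abstractmethods__)
print('Missing:', missing)
```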

Concrete Methods in ABCs

Although a concrete class must provide implementations of all abstract methods, the abstract base class can also provide an implementation that can be invoked via super(). This lets you reuse common logic by placing it in the base class, but force subclasses to provide an overriding method with (potentially) custom logic.

import abc
from cStringIO import StringIO

class ABCWithConcreteImplementation(object):
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def retrieve_values(self, input):
        print 'base class reading data'
        return input.read()

class ConcreteOverride(ABCWithConcreteImplementation):

    def retrieve_values(self, input):
        base_data = super(ConcreteOverride, self).retrieve_values(input)
        print 'subclass sorting data'
        response = sorted(base_data.splitlines())
        return response

input = StringIO("""line one
line two
line three
""")

reader = ConcreteOverride()
print reader.retrieve_values(input)
print

Since ABCWithConcreteImplementation is an abstract base class, it isn’t possible to instantiate it to use it directly. Subclasses must provide an override for retrieve_values(), and in this case the concrete class sorts the data before returning it.

$ python abc_concrete_method.py
base class reading data
subclass sorting data
['line one', 'line three', 'line two']

Abstract Properties

If your API specification includes attributes in addition to methods, you can require the attributes in concrete classes by defining them with @abstractproperty.

import abc

class Base(object):
    __metaclass__ = abc.ABCMeta

    @abc.abstractproperty
    def value(self):
        return 'Should never get here'


class Implementation(Base):

    @property
    def value(self):
        return 'concrete property'


try:
    b = Base()
    print 'Base.value:', b.value
except Exception, err:
    print 'ERROR:', str(err)

i = Implementation()
print 'Implementation.value:', i.value

The Base class in the example cannot be instantiated because it has only an abstract version of the property getter method.

$ python abc_abstractproperty.py
ERROR: Can't instantiate abstract class Base with abstract methods value
Implementation.value: concrete property

You can also define abstract read/write properties.

import abc

class Base(object):
    __metaclass__ = abc.ABCMeta

    def value_getter(self):
        return 'Should never see this'

    def value_setter(self, newvalue):
        return

    value = abc.abstractproperty(value_getter, value_setter)


class PartialImplementation(Base):

    @abc.abstractproperty
    def value(self):
        return 'Read-only'


class Implementation(Base):

    _value = 'Default value'

    def value_getter(self):
        return self._value

    def value_setter(self, newvalue):
        self._value = newvalue

    value = property(value_getter, value_setter)


try:
    b = Base()
    print 'Base.value:', b.value
except Exception, err:
    print 'ERROR:', str(err)

try:
    p = PartialImplementation()
    print 'PartialImplementation.value:', p.value
except Exception, err:
    print 'ERROR:', str(err)

i = Implementation()
print 'Implementation.value:', i.value

i.value = 'New value'
print 'Changed value:', i.value

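PartialImplementation cannot be instantiated either, but not because its property is read-only. Its override of value is itself declared with @abc.abstractproperty, so the class still has an abstract method named value. Only a concrete property, such as the read/write one in Implementation, satisfies the abstract base.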

$ python abc_abstractproperty_rw.py
ERROR: Can't instantiate abstract class Base with abstract methods value
ERROR: Can't instantiate abstract class PartialImplementation with abstract methods value
Implementation.value: Default value
Changed value: New value

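The decorator syntax needs more care with read/write properties. @value.setter returns a new property bound to the name of the decorated function. Naming that function value_setter therefore creates a second property called value_setter, and value itself stays read-only, in the abstract base and in the concrete class alike.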

import abc

class Base(object):
    __metaclass__ = abc.ABCMeta

    @abc.abstractproperty
    def value(self):
        return 'Should never see this'

    @value.setter
    def value_setter(self, newvalue):
        return


class Implementation(Base):

    _value = 'Default value'

    @property
    def value(self):
        return self._value

    @value.setter
    def value_setter(self, newvalue):
        self._value = newvalue


i = Implementation()
print 'Implementation.value:', i.value

i.value = 'New value'
print 'Changed value:', i.value

Because the setter was attached to the separate value_setter property, Implementation.value has no setter, and the caller cannot set the property value.

$ python abc_abstractproperty_rw_deco.py
Implementation.value: Default value
Traceback (most recent call last):
File "abc_abstractproperty_rw_deco.py", line 40, in <module>
i.value = 'New value'
AttributeError: can't set attribute
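abc.abstractproperty has been deprecated since Python 3.3. The replacement is to stack @property on top of @abc.abstractmethod. The following is a Python 3 sketch of the read/write example in that style. Both setters reuse the name value, so each one extends the same property instead of creating a new one.

```python
import abc


class Base(abc.ABC):

    @property
    @abc.abstractmethod
    def value(self):
        return 'Should never see this'

    # Reusing the name "value" attaches the setter to the same property.
    @value.setter
    @abc.abstractmethod
    def value(self, newvalue):
        return


class Implementation(Base):

    _value = 'Default value'

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, newvalue):
        self._value = newvalue


i = Implementation()
print('Implementation.value:', i.value)
i.value = 'New value'
print('Changed value:', i.value)
```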

Collection Types

The collections module defines several abstract base classes related to container (and containable) types.

General container classes:

  • Container
  • Sized

Iterator and Sequence classes:

  • Iterable
  • Iterator
  • Sequence
  • MutableSequence

Unique values:

  • Hashable
  • Set
  • MutableSet

Mappings:

  • Mapping
  • MutableMapping
  • MappingView
  • KeysView
  • ItemsView
  • ValuesView

Miscellaneous:

  • Callable

In addition to serving as detailed real-world examples of abstract base classes, Python’s built-in types are automatically registered to these classes when you import collections. This means you can safely use isinstance() to check parameters in your code to ensure that they support the API you need. The base classes can also be used to define your own collection types, since many of them provide concrete implementations of the internals and only need a few methods overridden. Refer to the standard library docs for collections for more details.
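Here is a sketch of both uses. It is written for Python 3, where these classes are imported from collections.abc (3.3+) rather than directly from collections as in Python 2.6. The class ReadOnlyConfig is hypothetical. It defines only `__getitem__`, `__iter__` and `__len__`. Mapping fills in keys(), items(), values(), get(), `__contains__` and `__eq__`.

```python
from collections.abc import Mapping, Sequence   # Python 3.3+


class ReadOnlyConfig(Mapping):
    """Hypothetical read-only mapping built on the Mapping ABC."""

    def __init__(self, **values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


config = ReadOnlyConfig(host='localhost', port=8080)
print(config.get('port'), 'missing' in config, sorted(config.keys()))

# Built-in types are registered with these ABCs, so isinstance() checks work.
print(isinstance({}, Mapping), isinstance([], Sequence), isinstance('', Sequence))
```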

See also

abc
The standard library documentation for this module.
PEP 3119
Introducing Abstract Base Classes
collections
The collections module includes abstract base classes for several collection types.
collections
The standard library documentation for collections.
PEP 3141
A Type Hierarchy for Numbers
Wikipedia: Strategy Pattern
Description and examples of the strategy pattern.
Plugins and monkeypatching
PyCon 2009 presentation by Dr. André Roberge

PyMOTW Home

The canonical version of this article

Saturday, July 4, 2009

Book Review: Hello, World!



Hello, World! Computer Programming for Kids and Other Beginners by Warren and Carter Sande is an introduction to programming in general (and Python specifically) aimed at pre-teens or young teens.

Disclaimer: I received a review copy of this book from Manning through the PyATL Book Club.

Although the book is designed for a young audience, it is not condescending, as many kids' books tend to be, so it remains readable by adults who need a very basic text on how computer programs work. And by "basic" I mean from the ground up. The book covers using an editor to create and modify program files, numbers, strings, variables, branching, and looping. It doesn't stop with basic topics, though. By the mid-point of the book, the authors have built up to the point where introducing PyGame and graphics programming isn't a stretch, and by the end of the book they have covered the GUI, animation, and sound techniques needed to create two simple computer games.

The writing style is clear and friendly without coming off as cutesy. Each chapter is relatively short, with review questions at the end in the style of a textbook (the answer guide is available in the appendix). There is a liberal use of sidebars to break up longer sections or highlight related digressions. And the authors also don't shy away from showing "broken" versions of programs as they evolve, which teaches the reader how to understand error messages and debug problems -- an extremely important skill for a programmer.

I recommend checking out Hello, World! if you have a young person in your life who is interested in learning about programming. Writing the book was a father/son project, and reading it together seems like a fun parent/child activity for the summer.