Tuesday, January 29, 2008
Monday, January 28, 2008
ProctorTicket
I have been meaning to post about this for a while, and am just getting around to it. As part of the GHOP contest, Zachary Voase created an add-on for Proctor that loads test failures into Trac as tickets.
ProctorTicket is an application, written in Python, which allows you to read a parseable proctor output log, or run a proctor test, and import all generated errors as Trac tickets. Tickets are the equivalent of issues, tasks or bugs in other software project management systems. ProctorTicket will determine the test class each error belongs to and group them accordingly. In addition, ProctorTicket can tell when a test has already been carried out, and it will not add the issue. If a test has been carried out before but the results vary, then it will automatically update the tickets to reflect this.
Check it out over at the ProctorTicket project site.
I'm looking forward to installing this at work so I can stop opening tickets for test failures manually. :-)
Sunday, January 27, 2008
PyMOTW: os.path
Use os.path for platform-independent manipulation of file names.
Module: os.path
Purpose: Parse, build, test, and otherwise work on file names and paths.
Python Version: 1.4 and later
Description:
Writing code to work with files on multiple platforms is easy using the functions included in the os.path module. Even programs not intended to be ported between platforms should use os.path to make parsing path names reliable.
Parsing Paths:
The first set of functions in os.path can be used to parse strings representing filenames into their component parts. It is important to realize that these functions do not depend on the paths actually existing. They operate solely on the strings.
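A quick way to see that the parsing functions only look at the string is to hand them a path that does not exist. This is a minimal sketch written for Python 3 (the listings in this post are Python 2). It uses posixpath, the POSIX implementation behind os.path, so the results are the same on any platform; the path and variable names are made up for illustration.

```python
import os.path
import posixpath  # POSIX flavor of os.path, so results match on any OS

made_up = '/no/such/directory/report.txt'  # hypothetical path, never created

# The file system is only consulted by functions such as exists().
exists = os.path.exists(made_up)

# The parsing functions work on the text alone.
head, tail = posixpath.split(made_up)
base, ext = posixpath.splitext(made_up)

print('exists  :', exists)
print('split   :', (head, tail))
print('splitext:', (base, ext))
```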
Path parsing depends on a few variables defined in the os module:
os.sep - The separator between portions of the path (e.g., "/").
os.extsep - The separator between a filename and the file "extension" (e.g., ".").
os.pardir - The path component that means traverse the directory tree up one level (e.g., "..").
os.curdir - The path component that refers to the current directory (e.g., ".").
split() breaks the path into two separate parts and returns them as a tuple. The second element is the last component of the path, and the first element is everything that comes before it.
import os.path
for path in [ '/one/two/three',
              '/one/two/three/',
              '/',
              '.',
              '']:
    print '"%s" : "%s"' % (path, os.path.split(path))
$ python ospath_split.py
"/one/two/three" : "('/one/two', 'three')"
"/one/two/three/" : "('/one/two/three', '')"
"/" : "('/', '')"
"." : "('', '.')"
"" : "('', '')"
basename() returns a value equivalent to the second part of the split() value.
import os.path
for path in [ '/one/two/three',
              '/one/two/three/',
              '/',
              '.',
              '']:
    print '"%s" : "%s"' % (path, os.path.basename(path))
$ python ospath_basename.py
"/one/two/three" : "three"
"/one/two/three/" : ""
"/" : ""
"." : "."
"" : ""
dirname() returns the first part of the split path:
import os.path
for path in [ '/one/two/three',
              '/one/two/three/',
              '/',
              '.',
              '']:
    print '"%s" : "%s"' % (path, os.path.dirname(path))
$ python ospath_dirname.py
"/one/two/three" : "/one/two"
"/one/two/three/" : "/one/two/three"
"/" : "/"
"." : ""
"" : ""
splitext() works like split() but divides the path on the extension separator, rather than the directory names.
import os.path
for path in [ 'filename.txt', 'filename', '/path/to/filename.txt', '/', '' ]:
    print '"%s" :' % path, os.path.splitext(path)
$ python ospath_splitext.py
"filename.txt" : ('filename', '.txt')
"filename" : ('filename', '')
"/path/to/filename.txt" : ('/path/to/filename', '.txt')
"/" : ('/', '')
"" : ('', '')
commonprefix() takes a list of paths as an argument and returns a single string that represents a common prefix present in all of the paths. The value may represent a path that does not actually exist, and the path separator is not included in the consideration, so the prefix might not stop on a separator boundary.
import os.path
paths = ['/one/two/three/four',
         '/one/two/threefold',
         '/one/two/three/',
         ]
print paths
print os.path.commonprefix(paths)
$ python ospath_commonprefix.py
['/one/two/three/four', '/one/two/threefold', '/one/two/three/']
/one/two/three
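If you need a prefix that does stop on a separator boundary, newer Python versions offer os.path.commonpath() (added in Python 3.5, so it was not available when this post was written). The sketch below, for Python 3, compares the two on the same list of paths. It uses posixpath so the result does not depend on the platform running it.

```python
import posixpath

paths = ['/one/two/three/four',
         '/one/two/threefold',
         '/one/two/three/']

# Character-by-character comparison: may stop in the middle of a name.
char_prefix = posixpath.commonprefix(paths)

# Component-by-component comparison (Python 3.5 and later).
dir_prefix = posixpath.commonpath(paths)

print('commonprefix:', char_prefix)
print('commonpath  :', dir_prefix)
```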
Building Paths:
Besides taking existing paths apart, you will frequently need to build paths from other strings.
To combine several path components into a single value, use join():
import os.path
for parts in [ ('one', 'two', 'three'),
               ('/', 'one', 'two', 'three'),
               ('/one', '/two', '/three'),
               ]:
    print parts, ':', os.path.join(*parts)
$ python ospath_join.py
('one', 'two', 'three') : one/two/three
('/', 'one', 'two', 'three') : /one/two/three
('/one', '/two', '/three') : /three
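The last line of that output surprises people: when a component is itself an absolute path, join() discards everything collected before it. The following Python 3 sketch shows the reset and one possible way around it. Stripping the leading separator is my own workaround for illustration, not something the module does for you.

```python
import posixpath  # POSIX flavor of os.path, so results match on any OS

# Relative components are simply appended, separated by "/".
relative = posixpath.join('one', 'two', 'three')

# A component that starts with "/" is absolute, so join() throws away
# everything that came before it.
reset = posixpath.join('/one', '/two', '/three')

# Stripping the leading separator first keeps all of the components.
parts = ['/one', '/two', '/three']
kept = posixpath.join('/', *[p.lstrip('/') for p in parts])

print(relative)
print(reset)
print(kept)
```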
It's also easy to work with paths that include "variable" components that can be expanded automatically. For example, expanduser() converts the tilde (~) character to a user's home directory.
import os.path
for user in [ '', 'dhellmann', 'postgres' ]:
    lookup = '~' + user
    print lookup, ':', os.path.expanduser(lookup)
$ python ospath_expanduser.py
~ : /Users/dhellmann
~dhellmann : /Users/dhellmann
~postgres : /var/empty
expandvars() is more general, and expands any shell environment variables present in the path.
import os.path
import os
os.environ['MYVAR'] = 'VALUE'
print os.path.expandvars('/path/to/$MYVAR')
$ python ospath_expandvars.py
/path/to/VALUE
Normalizing Paths:
Paths assembled from separate strings using join() or with embedded variables might end up with extra separators or relative path components. Use normpath() to clean them up:
import os.path
for path in [ 'one//two//three',
              'one/./two/./three',
              'one/../one/two/three',
              ]:
    print path, ':', os.path.normpath(path)
$ python ospath_normpath.py
one//two//three : one/two/three
one/./two/./three : one/two/three
one/../one/two/three : one/two/three
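In practice the messy path usually comes out of join() rather than being typed in by hand. Here is a small Python 3 sketch of that combination. The two input strings are invented for the example.

```python
import posixpath  # POSIX flavor of os.path, so results match on any OS

base = 'one//two/'           # hypothetical value with a doubled separator
relative = '../two/./three'  # hypothetical relative reference

raw = posixpath.join(base, relative)
clean = posixpath.normpath(raw)

print('joined    :', raw)
print('normalized:', clean)
```

Like the parsing functions, normpath() works on the string only, so it collapses ".." without checking whether a symbolic link is involved.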
To convert a relative path to a complete absolute filename, use abspath().
import os.path
for path in [ '.', '..', './one/two/three', '../one/two/three']:
    print '"%s" : "%s"' % (path, os.path.abspath(path))
$ python ospath_abspath.py
"." : "/Users/dhellmann/Documents/PyMOTW/in_progress/ospath"
".." : "/Users/dhellmann/Documents/PyMOTW/in_progress"
"./one/two/three" : "/Users/dhellmann/Documents/PyMOTW/in_progress/ospath/one/two/three"
"../one/two/three" : "/Users/dhellmann/Documents/PyMOTW/in_progress/one/two/three"
File Times:
Besides working with paths, os.path also includes some functions for retrieving file properties, which can be more convenient than calling os.stat():
import os.path
import time
print 'File :', __file__
print 'Access time :', time.ctime(os.path.getatime(__file__))
print 'Modified time:', time.ctime(os.path.getmtime(__file__))
print 'Change time :', time.ctime(os.path.getctime(__file__))
print 'Size :', os.path.getsize(__file__)
$ python ospath_properties.py
File : /Users/dhellmann/Documents/PyMOTW/in_progress/ospath/ospath_properties.py
Access time : Sun Jan 27 15:40:20 2008
Modified time: Sun Jan 27 15:39:06 2008
Change time : Sun Jan 27 15:39:06 2008
Size : 478
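To make the connection with os.stat() concrete, this Python 3 sketch checks that the convenience functions report the same values as the corresponding fields of the stat result. It writes a small scratch file with tempfile so it does not depend on __file__. The file contents and variable names are arbitrary.

```python
import os
import os.path
import tempfile

# Create a small scratch file to inspect.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'hello')
    filename = f.name

info = os.stat(filename)

# Each os.path helper returns one field of the stat result.
size = os.path.getsize(filename)
same_size = size == info.st_size
same_mtime = os.path.getmtime(filename) == info.st_mtime
same_ctime = os.path.getctime(filename) == info.st_ctime

print('size :', size)
print('match:', same_size, same_mtime, same_ctime)

os.remove(filename)
```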
Testing Files:
When your program encounters a path name, it often needs to know whether the path refers to a file or directory. If you are working on a platform that supports it, you may need to know if the path refers to a symbolic link or mount point. You will also want to test whether the path exists or not.
os.path provides functions to test all of these conditions.
import os.path
for file in [ __file__, os.path.dirname(__file__), '/', './broken_link']:
    print 'File :', file
    print 'Absolute :', os.path.isabs(file)
    print 'Is File? :', os.path.isfile(file)
    print 'Is Dir? :', os.path.isdir(file)
    print 'Is Link? :', os.path.islink(file)
    print 'Mountpoint? :', os.path.ismount(file)
    print 'Exists? :', os.path.exists(file)
    print 'Link Exists?:', os.path.lexists(file)
$ ln -s /does/not/exist broken_link
$ python ospath_tests.py
File : /Users/dhellmann/Documents/PyMOTW/in_progress/ospath/ospath_tests.py
Absolute : True
Is File? : True
Is Dir? : False
Is Link? : False
Mountpoint? : False
Exists? : True
Link Exists?: True
File : /Users/dhellmann/Documents/PyMOTW/in_progress/ospath
Absolute : True
Is File? : False
Is Dir? : True
Is Link? : False
Mountpoint? : False
Exists? : True
Link Exists?: True
File : /
Absolute : True
Is File? : False
Is Dir? : True
Is Link? : False
Mountpoint? : True
Exists? : True
Link Exists?: True
File : ./broken_link
Absolute : False
Is File? : False
Is Dir? : False
Is Link? : True
Mountpoint? : False
Exists? : False
Link Exists?: True
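The broken link is the interesting case: exists() follows the link and reports on its target, while lexists() reports on the link itself. The Python 3 sketch below recreates that situation in a temporary directory instead of using ln -s. It assumes a platform and account that can create symbolic links. The file names are made up.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'missing')    # never created
link = os.path.join(workdir, 'broken_link')
os.symlink(target, link)  # needs a platform that supports symlinks

checks = {
    'islink': os.path.islink(link),
    'exists': os.path.exists(link),    # follows the link to its target
    'lexists': os.path.lexists(link),  # looks at the link itself
}
print(checks)

os.remove(link)
os.rmdir(workdir)
```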
Traversing a Directory Tree:
os.path.walk() traverses all of the directories in a tree and calls a function you provide, passing the directory name and the names of the contents of that directory. This example produces a recursive directory listing, ignoring .svn directories.
import os.path
import pprint
def visit(arg, dirname, names):
    print dirname, arg
    for name in names:
        subname = os.path.join(dirname, name)
        if os.path.isdir(subname):
            print ' %s/' % name
        else:
            print ' %s' % name
    # Do not recurse into .svn directory
    if '.svn' in names:
        names.remove('.svn')
os.path.walk('..', visit, '(User data)')
$ python ospath_walk.py
.. (User data)
.svn/
ospath/
../ospath (User data)
.svn/
__init__.py
ospath_abspath.py
ospath_basename.py
ospath_commonprefix.py
ospath_dirname.py
ospath_expanduser.py
ospath_expandvars.py
ospath_join.py
ospath_normpath.py
ospath_properties.py
ospath_split.py
ospath_splitext.py
ospath_tests.py
ospath_walk.py
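os.path.walk() was removed in Python 3, where the generator os.walk() does the same job. The sketch below is a Python 3 take on the idea using os.walk(). It builds a tiny throwaway tree so it can run anywhere, and it prunes .svn by editing the list of subdirectories in place, much as visit() edits names. The directory and file names are invented.

```python
import os
import shutil
import tempfile

# Build a tiny throwaway tree: root/.svn/entries and root/pkg/module.py
root = tempfile.mkdtemp()
for sub, name in [('.svn', 'entries'), ('pkg', 'module.py')]:
    os.mkdir(os.path.join(root, sub))
    open(os.path.join(root, sub, name), 'w').close()

visited = []
for dirname, subdirs, files in os.walk(root):
    # Removing a name from subdirs in place stops os.walk() from
    # descending into that directory.
    if '.svn' in subdirs:
        subdirs.remove('.svn')
    visited.append(os.path.relpath(dirname, root))
    print(dirname, sorted(subdirs), sorted(files))

shutil.rmtree(root)
```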
References:
Python Module of the Week Home
Download Sample Code
PyCon 2008

Registration for PyCon 2008 is now open. I'm signed up and ready to go. I haven't been for a few years, so I'm pretty excited to be going this year. I guess the first time I went to a Python conference was IPC8 back in 2000. We were snowed into the hotel that weekend by a freak storm in DC, and barely made it out of town before the next icing hit by taking an earlier flight than we had planned on Sunday. I missed IPC 9, but went to 10 out in Long Beach (the weather there was a lot nicer). Both conferences were fun and informative, and I'm glad to have a chance to attend again this year.
I'm not exactly sure what the difference is between IPC (International Python Conference?) and PyCon, except that PyCon is billed as a "community conference." I guess that has to do with how it is organized. I know a lot of people have been putting in a LOT of time and effort to set the whole thing up this year.
Conference Schedule:
Looking over the schedule, I see a few of the talks I had hoped to attend are going to overlap. Having to choose between two fascinating presentations at a conference is a real First World problem, though, so I guess I can't complain. I'm glad there are plans to podcast the sessions again this year, so I'll at least be able to listen to any I can't attend.
Here's a list of the sessions I'm considering for my schedule:
Buffer interface in Py3K (Travis E Oliphant)
I have no idea what that is, but I obviously want to find out about any new APIs in Py3K.
MPI Cluster Programming with Python and Amazon EC2 (Peter Skomoroch) OR How Import Does Its Thing (Mr. Brett Cannon)
I'm interested in the clustering stuff, but I also have an idea for using memcached to cache imported modules, so the "Import" session would be interesting.
Running a Successful Usergroup (Jeff Rush) OR Applying expert system technology to code reuse with Pyke (Bruce Frederiksen)
PyATL is running fairly smoothly, but the book club seems to have petered out over the holidays. I've been really busy this month, and we haven't been able to spur interest again. Maybe Jeff has some ideas. On the other hand, expert systems!
Dogtail: Taking your applications for a walk (Mr. Ramakrishna Reddy Yekulla) OR PyTriton: building a petabyte storage system (Jonathan Ellis)
Testing or big storage? Both would be useful for me at work.
Rich UI Webapps with TurboGears 2 and Dojo (Mr. Kevin Dangoor)
I don't count myself as a TurboGears fan, but it should be interesting to see Dangoor talk about it.
Jython on the Joint Strike Fighter (Mr. George F Rice) OR Tahoe: A Robust Distributed Secure Filesystem (Brian Warner) OR The State of Django (Adrian Holovaty)
I do count myself as a Django user, so it will be tough to choose between Holovaty's talk and the other two. I think I would probably end up in the Tahoe talk if I don't go to the one on Django.
Like Switching on the Light: Managing an Elastic Compute Cluster with Python (George Belotsky, Heath Johns) OR Django: Under the Hood (Marty Alchin)
The first talk there sounds related to what we do at Racemi.
I noticed that there are several "large computing" talks this year (cluster/compute cloud/big filesystem). Is that a trend?
Conference Goals:
Aside from attending sessions, the other thing I want to come away from the weekend with is a couple of new author commitments for the magazine. Whether from presenters or attendees, I'm going to be scouting for subjects and authors. If you have any interest at all in writing, but just aren't sure how to start, look me up at the conference and we can talk (or, of course, email me and we can talk now: doug dot hellmann at pythonmagazine dot com).
So, are you going to PyCon? What sessions are you attending and what do you hope to get out of it? Tutorials? Sprints?
Saturday, January 26, 2008
January issue of Python Magazine now available for download
The January issue is live and ready for download from http://pymag.phparch.com/c/issue/view/66.
This month the cover article from Alex Martelli gives advice about how and when to use regular expressions for parsing text, and when to use other techniques instead.
Eugen Wintersberger has an excellent introduction to using ctypes. I've been looking forward to reviewing his article for a while now, so I'm glad to finally see it in print.
Dan Felts returns with an article about controlling the Nessus security scanner using Python to talk to it over the network.
In his Welcome to Python column this month Mark Mruss covers everything you need to know about iterators and generators.
And Steve Holden waxes philosophical about how, or whether, we should be marketing Python the language and platform.
I have two pieces in the mag this month. First, an article about treating your command line programs as objects by using CommandLineApp to define their options and argument processing. And my column starts the series I've mentioned on this blog previously by covering the wealth of testing tools available in Python.
It's another good issue, and we're already well into the work of producing the February edition as I write this. So go download the new issue and send your feedback to us via the pythonmagazine.com web site or in #pymag on irc.freenode.net.
Sunday, January 20, 2008
PyMOTW: hashlib
Generate cryptographically secure hashes with hashlib.
Module: hashlib
Purpose: Cryptographic hashes and message digests
Python Version: 2.5
Description:
The hashlib module deprecates the separate md5 and sha modules and makes their API consistent. To work with a specific hash algorithm, use the appropriate constructor function to create a hash object. Then you can use the same API to interact with the hash no matter what algorithm is being used.
Since hashlib is "backed" by OpenSSL, all of the algorithms provided by that library should be available, including:
md5()
sha1()
sha224()
sha256()
sha384()
sha512()
MD5 Example:
To calculate the MD5 digest for a block of data (here an ASCII string), create the hash, add the data, and compute the digest.
import hashlib
from hashlib_data import lorem
h = hashlib.md5()
h.update(lorem)
print h.hexdigest()
This example uses the hexdigest() method instead of digest() because the output is formatted to be printed. If a binary digest value is acceptable, you can use digest().
$ python hashlib_md5.py
c3abe541f361b1bfbbcfecbf53aad1fb
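The two methods return the same value in different forms, which is easy to confirm. This is a minimal Python 3 sketch, so the input has to be bytes rather than a text string. It hashes the short string "abc" instead of the lorem text from hashlib_data, which is not reproduced in this post.

```python
import binascii
import hashlib

h = hashlib.md5()
h.update(b'abc')  # Python 3 hashes bytes, not text strings

raw = h.digest()      # 16 raw bytes for MD5
text = h.hexdigest()  # the same value as 32 hexadecimal characters

# Hex-encoding the binary digest gives the hexdigest() string.
hex_of_raw = binascii.hexlify(raw).decode('ascii')

print(len(raw), len(text))
print(text)
```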
SHA1 Example:
A SHA1 digest for the same data would be calculated in much the same way:
import hashlib
from hashlib_data import lorem
h = hashlib.sha1()
h.update(lorem)
print h.hexdigest()
Of course, the digest value is different because of the different algorithm.
$ python hashlib_sha1.py
ac2a96a4237886637d5352d606d7a7b6d7ad2f29
new():
Sometimes it is more convenient to refer to the algorithm by name in a string rather than by using the constructor function directly. It is useful, for example, to be able to store the hash type in a configuration file. In those cases, use the new() function directly to create a new hash calculator.
import hashlib
import sys
try:
    hash_name = sys.argv[1]
except IndexError:
    print 'Specify the hash name as the first argument.'
else:
    try:
        data = sys.argv[2]
    except IndexError:
        from hashlib_data import lorem as data
    h = hashlib.new(hash_name)
    h.update(data)
    print h.hexdigest()
When run with a variety of arguments:
$ python hashlib_new.py sha1
ac2a96a4237886637d5352d606d7a7b6d7ad2f29
$ python hashlib_new.py sha256
88b7404fc192fcdb9bb1dba1ad118aa1ccd580e9faa110d12b4d63988cf20332
$ python hashlib_new.py sha512
f58c6935ef9d5a94d296207ee4a7d9bba411539d8677482b7e9d60e4b7137f68d25f9747cab62fe752ec5ed1e5b2fa4cdbc8c9203267f995a5d17e4408dccdb4
$ python hashlib_new.py md5
c3abe541f361b1bfbbcfecbf53aad1fb
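When the name really does come from a configuration file, it may be misspelled or unsupported, and new() raises ValueError in that case. The following Python 3 sketch wraps the lookup in a small helper. digest_for() and configured_name are hypothetical names, and the input is the short byte string b'abc' rather than the lorem text.

```python
import hashlib

def digest_for(hash_name, data):
    """Return the hex digest of data using the named algorithm."""
    try:
        h = hashlib.new(hash_name)
    except ValueError:
        # hashlib.new() raises ValueError for names it does not know.
        raise ValueError('Unsupported hash type: %r' % hash_name)
    h.update(data)
    return h.hexdigest()

configured_name = 'sha256'  # imagine this was read from a config file

by_name = digest_for(configured_name, b'abc')
by_constructor = hashlib.sha256(b'abc').hexdigest()

print(by_name)
print(by_name == by_constructor)
```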
Calling update() more than once:
The update() method of the hash calculators can be called repeatedly. Each time, the digest is updated based on the additional text fed in. This can be much more efficient than reading an entire file into memory, for example.
import hashlib
from hashlib_data import lorem
h = hashlib.md5()
h.update(lorem)
all_at_once = h.hexdigest()
def chunkize(size, text):
    "Return parts of the text in size-based increments."
    start = 0
    while start < len(text):
        chunk = text[start:start+size]
        yield chunk
        start += size
    return
h = hashlib.md5()
for chunk in chunkize(64, lorem):
    h.update(chunk)
line_by_line = h.hexdigest()
print 'All at once :', all_at_once
print 'Line by line:', line_by_line
print 'Same :', (all_at_once == line_by_line)
This example is a little contrived because it works with such a small amount of text, but it illustrates how you could incrementally update a digest as data is read or otherwise produced.
$ python hashlib_update.py
All at once : c3abe541f361b1bfbbcfecbf53aad1fb
Line by line: c3abe541f361b1bfbbcfecbf53aad1fb
Same : True
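The same idea applied to a real file might look like the sketch below, written for Python 3. digest_of_file() and its block_size parameter are hypothetical names of my own, and the scratch file exists only so the example can check itself.

```python
import hashlib
import os
import tempfile

def digest_of_file(filename, hash_name='md5', block_size=64 * 1024):
    """Hash a file without reading it into memory all at once."""
    h = hashlib.new(hash_name)
    with open(filename, 'rb') as f:
        # iter() with a sentinel calls f.read(block_size) until it
        # returns b'' at end of file.
        for block in iter(lambda: f.read(block_size), b''):
            h.update(block)
    return h.hexdigest()

# Write some sample data to a scratch file.
data = b'0123456789' * 1000
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    filename = f.name

chunked = digest_of_file(filename, block_size=64)
all_at_once = hashlib.md5(data).hexdigest()
print('Same:', chunked == all_at_once)

os.remove(filename)
```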
References:
Voidspace: IronPython and Hashlib
hmac module
PyMOTW: hmac
Python Module of the Week Home
Download Sample Code
Django with PostgreSQL on Mac OS X Leopard
Previously, I discussed the steps I went through to get PostgreSQL working on Tiger. This weekend I upgraded my system to a new MacBook Pro running Leopard.
PPC -> x86:
Although the Migration Assistant copied the version of PostgreSQL I had previously installed to the new machine, the results didn't work because the service would not start correctly. I ended up reinstalling using the Unified Installer from PostgreSQL for Mac, and the server still wouldn't start. I deleted the old database and re-initialized it (thanks to hints in some instructions from Russell Brooks) and that took care of the problem. I'm not sure if there was any way for me to convert the data, but I didn't have anything important in the database that I can't re-create, so it's fine to start from scratch.
Xcode:
The next step was to install Xcode 3. That was easy, with the package installer from Apple.
psycopg:
And then I went back to battle with my old nemesis psycopg. I should probably have taken Steve Holden's advice in the comments on my earlier post and just used psycopg2 instead. I didn't because that would have meant upgrading my production server, too, since the whole point of using PostgreSQL instead of SQLite is that the back-end adapters in Django produce different SQL for the same QuerySet.
To configure psycopg, I had to set LDFLAGS to include the directory with the crt1.10.5.o file. It's installed in what looks like a standard library directory for the Xcode gcc, but ld couldn't find it during the "create an executable" test.
Then when running make I had the same problem I had seen under PPC:
$ make
gcc -DNDEBUG -g -O3 -I/Users/dhellmann/Devel/AthensDocket/bin/../include/python2.5
-I/Users/dhellmann/Devel/AthensDocket/bin/../lib/python2.5/config -DPACKAGE_NAME=\"psycopg\"
-DPACKAGE_TARNAME=\"psycopg\" -DPACKAGE_VERSION=\"1.1.21\" -DPACKAGE_STRING=\"psycopg\ 1.1.21\"
-DPACKAGE_BUGREPORT=\"psycopg@lists.initd.org\" -DHAVE_LIBCRYPTO=1 -DHAVE_ASPRINTF=1
-I/Library/PostgreSQL8/include -I/Library/PostgreSQL8/include/postgresql/server
-I../egenix-mx-base-3.0.0/mx/DateTime/mxDateTime -DHAVE_PQFREENOTIFY -DNDEBUG -D_REENTRANT
-D_GNU_SOURCE -DPOSTGRESQL_MAJOR=8 -DPOSTGRESQL_MINOR=2 -c ././module.c -o ./module.o
In file included from ././module.c:33:
././module.h:30:20: error: Python.h: No such file or directory
I thought this was related to using virtualenv, but it didn't work correctly outside of the virtualenv setting this time (for some reason, it did on the PPC laptop). It turns out the error message was correct and configure/gcc just couldn't tell where the Python headers were. The configure command that let me compile was:
$ CPPFLAGS=-I/Library/Frameworks/Python.framework/Headers \
> LDFLAGS=-L/Developer/SDKs/MacOSX10.5.sdk/usr/lib \
> ./configure --with-postgres-libraries=/Library/PostgreSQL8/lib \
> --with-postgres-includes=/Library/PostgreSQL8/include \
> --with-mxdatetime-includes=../egenix-mx-base-3.0.0/mx/DateTime/mxDateTime
Then, oddly enough, when I did my make install step, the psycopg.so was copied to $VIRTUAL_ENV/bin instead of $VIRTUAL_ENV/lib/python2.5/site-packages. That was easy enough to solve by moving the file manually, and then I was able to import the psycopg module.
So, after about an hour, I'm back to being able to develop with Django and PostgreSQL on OS X Leopard. Maybe now I can start enjoying some of the new features!
Saturday, January 19, 2008
Python development tools you can't live without
I'm working on a series of columns for Python Magazine in which I will be talking about development tools. The first "episode" appears in the January 2008 issue and covers testing tools like frameworks and runners. I have several more columns plotted out, but want to make sure I cover the topics well and don't miss out on mentioning a new or small tool just because I haven't heard about it myself.
So, I'm looking for feedback about the tools you just can't (or don't want to) live without. Tell me all about your development environment, editor, shell, debugger, IDE, version control system, etc. What libraries (in addition to the standard library) do you use on a regular basis? Is there any one thing that you would identify as being so necessary for your Python work that if it was taken away you'd have to give up and use another language? (And "the interpreter" is already on the list, thanks.)
If you're building a new development tool you think I should look at for the series, post a link in a comment here or tag it with pymagdifferent on del.icio.us and I'll review it. I can't promise that everything will make it into the columns right away, but it will eventually. Unusual or unique entries are more likely to be covered sooner.
It seems like everyone and their brother is building an editor, so if that's your game at least tell me why you think yours is better/different from all the others. There are some problems with text editing that are not easily solved, so if you have a killer feature make sure you point that out. That goes double for templating languages -- highlight your special strengths.
Sunday, January 13, 2008
PyMOTW: threading
The threading module lets you run multiple operations concurrently in the same process space.
Module: threading
Purpose: Builds on the thread module to more easily manage several threads of execution.
Python Version: since 1.5.2 (some of these examples require 2.5 because they use the with statement)
Description:
The threading module builds on the low-level features of the thread module to make working with threads even easier and more pythonic.
Thread objects:
The simplest way to use a thread is to instantiate it with a target function and call start() to let it begin working.
import threading
def worker():
    """thread worker function"""
    print 'Worker'
    return
threads = []
for i in range(5):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()
The output is, unsurprisingly, five lines with "Worker" on each:
$ python threading_simple.py
Worker
Worker
Worker
Worker
Worker
It is useful to be able to spawn a thread and pass it arguments to tell it what work to do. For example, in PyMOTW: Queue, I created a simple program to illustrate how to download enclosures from RSS/Atom feeds. Each downloader thread needed to know where to find the URLs, and the Queue instance was passed as an argument when the thread was created. Here, we'll just pass the thread a number so the output is a little more interesting in the second example.
import threading
def worker(num):
    """thread worker function"""
    print 'Worker:', num
    return
threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()
The integer argument is now included in the message printed by each thread:
$ python threading_simpleargs.py
Worker: 0
Worker: 1
Worker: 2
Worker: 3
Worker: 4
Determining the current thread:
Using arguments to identify or name the thread is cumbersome and unnecessary. Each Thread instance has a name with a default value that you can change as the thread is created. Naming threads is useful if you have a server process with multiple service threads handling different operations.
import threading
import time
def worker():
    print threading.currentThread().getName(), 'Starting'
    time.sleep(2)
    print threading.currentThread().getName(), 'Exiting'
def my_service():
    print threading.currentThread().getName(), 'Starting'
    time.sleep(3)
    print threading.currentThread().getName(), 'Exiting'
t = threading.Thread(name='my_service', target=my_service)
w = threading.Thread(name='worker', target=worker)
w2 = threading.Thread(target=worker) # use default name
w.start()
w2.start()
t.start()
The debug output includes the name of the current thread on each line. The lines with "Thread-1" in the thread name column correspond to the unnamed thread w2.
$ python threading_names.py
worker Starting
Thread-1 Starting
my_service Starting
worker Exiting
Thread-1 Exiting
my_service Exiting
Of course, in most programs you won't use print to debug. The logging module supports embedding the thread name in every log message using the formatter code %(threadName)s. Including thread names in log messages makes it easier to trace those messages back to their source.
import logging
import threading
import time
logging.basicConfig(level=logging.DEBUG,
                    format='[%(levelname)s] (%(threadName)-10s) %(message)s',
                    )
def worker():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')
def my_service():
    logging.debug('Starting')
    time.sleep(3)
    logging.debug('Exiting')
t = threading.Thread(name='my_service', target=my_service)
w = threading.Thread(name='worker', target=worker)
w2 = threading.Thread(target=worker) # use default name
w.start()
w2.start()
t.start()
The output from the format string above looks like:
$ python threading_names_log.py
[DEBUG] (worker ) Starting
[DEBUG] (Thread-1 ) Starting
[DEBUG] (my_service) Starting
[DEBUG] (worker ) Exiting
[DEBUG] (Thread-1 ) Exiting
[DEBUG] (my_service) Exiting
Daemon vs. Non-Daemon Threads:
Up until this point, I have been assuming that the main program does not exit until all threads have completed their work. Sometimes you will want to spawn a thread as a "daemon" that runs without blocking the main program from exiting. Using daemon threads is useful for services where there may not be an easy way to interrupt the thread or where letting the thread die in the middle of its work does not lose or corrupt data (for example, a thread that generates "heart beats" for a service monitoring tool). To mark a thread as a daemon, call its setDaemon() method with a boolean argument. The default is for threads to not be daemons, so passing True turns the daemon mode on.
import threading
import time
def daemon():
    print 'Starting:', threading.currentThread().getName()
    time.sleep(2)
    print 'Exiting :', threading.currentThread().getName()
d = threading.Thread(name='daemon', target=daemon)
d.setDaemon(True)
def non_daemon():
    print 'Starting:', threading.currentThread().getName()
    print 'Exiting :', threading.currentThread().getName()
t = threading.Thread(name='non-daemon', target=non_daemon)
d.start()
t.start()
Notice that the output does not include the "Exiting" message from the daemon thread, since all of the non-daemon threads (including the main thread) exit before the daemon thread wakes up from its 2 second sleep.
$ python threading_daemon.py
Starting: daemon
Starting: non-daemon
Exiting : non-daemon
To wait until the daemon thread has completed its work, use the join() method.
import threading
import time
def daemon():
    print 'Starting:', threading.currentThread().getName()
    time.sleep(2)
    print 'Exiting :', threading.currentThread().getName()
d = threading.Thread(name='daemon', target=daemon)
d.setDaemon(True)
def non_daemon():
    print 'Starting:', threading.currentThread().getName()
    print 'Exiting :', threading.currentThread().getName()
t = threading.Thread(name='non-daemon', target=non_daemon)
d.start()
t.start()
d.join()
t.join()
Since we wait for the daemon thread to exit using join(), we do see its "Exiting" message.
$ python threading_daemon_join.py
Starting: daemon
Starting: non-daemon
Exiting : non-daemon
Exiting : daemon
By default, join() blocks indefinitely. It is also possible to pass a timeout argument (a float representing the number of seconds to wait for the thread to become inactive). If the thread does not complete within the timeout period, join() returns anyway.
import threading
import time
def daemon():
    print 'Starting:', threading.currentThread().getName()
    time.sleep(2)
    print 'Exiting :', threading.currentThread().getName()
d = threading.Thread(name='daemon', target=daemon)
d.setDaemon(True)
def non_daemon():
    print 'Starting:', threading.currentThread().getName()
    print 'Exiting :', threading.currentThread().getName()
t = threading.Thread(name='non-daemon', target=non_daemon)
d.start()
t.start()
d.join(1)
print 'd.isAlive()', d.isAlive()
t.join()
Since the timeout passed is less than the amount of time the daemon thread sleeps, the thread is still "alive" after join() returns.
$ python threading_daemon_join_timeout.py
Starting: daemon
Starting: non-daemon
Exiting : non-daemon
d.isAlive() True
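For readers on Python 3, the same experiment needs slightly different spellings: the daemon flag can be passed to the constructor, and isAlive() was removed in Python 3.9 in favor of is_alive(). This is a minimal sketch with much shorter sleeps than the listing above so it finishes quickly. slow() is a stand-in worker of my own.

```python
import threading
import time

def slow():
    """Stand-in worker that just sleeps for a while."""
    time.sleep(0.5)

d = threading.Thread(name='daemon', target=slow, daemon=True)
d.start()

d.join(0.1)  # give up waiting after 0.1 seconds
alive_after_timeout = d.is_alive()

d.join()     # now wait for as long as it takes
alive_after_join = d.is_alive()

print('after timed join:', alive_after_timeout)
print('after full join :', alive_after_join)
```

join() returns None whether or not the timeout expired, so checking is_alive() afterwards is the only way to tell which happened.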
Using enumerate() to wait for all running threads:
It is not necessary to retain an explicit handle to all of the daemon threads you start in order to ensure they have completed before exiting the main process. threading.enumerate() returns a list of active Thread instances. The list includes the current thread, and since joining the current thread is not allowed (it introduces a deadlock situation), we must check before joining.
import random
import threading
import time
def worker():
    """thread worker function"""
    t = threading.currentThread()
    pause = random.randint(1,5)
    print 'Starting:', t.getName(), 'sleeping', pause
    time.sleep(pause)
    print 'Ending :', t.getName()
    return
for i in range(3):
    t = threading.Thread(target=worker)
    t.setDaemon(True)
    t.start()
main_thread = threading.currentThread()
for t in threading.enumerate():
    if t is main_thread:
        continue
    print 'Joining :', t.getName()
    t.join()
Since the worker is sleeping for a random amount of time, your output may vary. It should look something like this, though:
$ python threading_enumerate.py
Starting: Thread-1 sleeping 2
Starting: Thread-2 sleeping 5
Starting: Thread-3 sleeping 2
Joining : Thread-1
Ending : Thread-1
Joining : Thread-3
Ending : Thread-3
Joining : Thread-2
Ending : Thread-2
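In a larger program, enumerate() may also return threads started by libraries or tools that never exit, and joining those would hang. One possible refinement, sketched here for Python 3, is to name your own threads with a common prefix and join only those. The 'worker-' prefix and the short fixed pauses are my own choices for the example.

```python
import threading
import time

finished = []

def worker(pause):
    time.sleep(pause)
    finished.append(threading.current_thread().name)

# Start the threads without keeping any references to them.
for i, pause in enumerate((0.05, 0.15, 0.1)):
    threading.Thread(name='worker-%d' % i, target=worker,
                     args=(pause,), daemon=True).start()

current = threading.current_thread()
for t in threading.enumerate():
    # Skip ourselves (joining would deadlock) and any thread that is
    # not one of ours.
    if t is current or not t.name.startswith('worker-'):
        continue
    t.join()

print(sorted(finished))
```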
Creating your own Thread class:
When you start a Thread, it does some basic setup and then calls its run() method, which calls the target function passed to the constructor. If you want to create your own type of thread, you can subclass from Thread and override run() to do whatever you want.
import threading

class MyThread(threading.Thread):

    def run(self):
        print 'MyThread:', self.getName()
        return

for i in range(5):
    t = MyThread()
    t.start()
$ python threading_subclass.py
MyThread: Thread-1
MyThread: Thread-2
MyThread: Thread-3
MyThread: Thread-4
MyThread: Thread-5
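A subclass that overrides run() bypasses the target argument, so any input it needs has to be stored on the instance instead. The sketch below shows one way to do that, assuming Python 3. The class name SquareThread and its number and result attributes are hypothetical.

```python
import threading


class SquareThread(threading.Thread):
    """Hypothetical Thread subclass that carries its own input and result."""

    def __init__(self, number):
        # Thread.__init__() must run before the thread can be started.
        threading.Thread.__init__(self)
        self.number = number
        self.result = None

    def run(self):
        # run() executes in the new thread; keep the answer on the instance.
        self.result = self.number * self.number


threads = [SquareThread(n) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait so that result is populated before it is read

results = [t.result for t in threads]
print(results)
```

Reading result only after join() avoids looking at a value the thread has not computed yet.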
Starting a task in a thread with a Timer:
One example of a reason to subclass Thread is provided by Timer, also included in threading. A Timer lets you start the work of your thread after a delay, and cancel the operation at any point within that time period.
import threading
import time

def delayed():
    print 'Worker running', threading.currentThread().getName()
    return

t1 = threading.Timer(3, delayed)
t1.setName('t1')
t2 = threading.Timer(3, delayed)
t2.setName('t2')

print 'Starting timers'
t1.start()
t2.start()

print 'Waiting before canceling', t2.getName()
time.sleep(2)
print 'Canceling', t2.getName()
t2.cancel()
print 'Main thread done'
Notice that the second timer is never run, and the first timer appears to run after the rest of the main program is done. Since it is not a daemon thread, the interpreter waits for it before exiting, so we do not have to join() it explicitly to block waiting for it.
$ python threading_timer.py
Starting timers
Waiting before canceling t2
Canceling t2
Main thread done
Worker running t1
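Timer also accepts args and kwargs to pass to the delayed function, and because a Timer is a Thread it can be joined like any other. The following sketch, written for Python 3 with shortened delays, records which timers actually fired. The calls list and the labels are hypothetical names used only for this illustration.

```python
import threading

calls = []


def delayed(label):
    calls.append(label)  # record which timer actually fired


t1 = threading.Timer(0.2, delayed, args=('t1',))
t2 = threading.Timer(0.2, delayed, args=('t2',))
t1.start()
t2.start()

t2.cancel()  # canceled before the delay elapses, so delayed('t2') never runs
t1.join()    # waits for the delay to pass and delayed('t1') to finish
t2.join()    # returns promptly because the canceled timer exits early

print(calls)
```

Canceling only has an effect while the timer is still waiting; once the function has started running, cancel() does not interrupt it.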
Signaling between threads with Event objects:
Although the point of using multiple threads is to spin separate operations off to run more or less simultaneously, there are times when it is important to be able to synchronize the operations in two or more threads. A simple way to communicate between threads is using Event objects. An Event manages an internal flag that users can either set() or clear(). Other users can wait() for the flag to be set(), effectively blocking progress until allowed to continue. You can also think of an Event as a traffic light.
import logging
import threading
import time

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s (%(threadName)-10s) %(message)s',
                    )

def wait_for_event(e):
    """Wait for the event to be set before doing anything"""
    logging.debug('wait_for_event starting')
    e.wait()
    logging.debug('e.isSet()->%s', e.isSet())

def wait_for_event_timeout(e, t):
    """Wait t seconds and then timeout"""
    logging.debug('wait_for_event_timeout starting')
    e.wait(t)
    logging.debug('e.isSet()->%s', e.isSet())

e = threading.Event()
t1 = threading.Thread(name='block',
                      target=wait_for_event,
                      args=(e,))
t1.start()

t2 = threading.Thread(name='non-block',
                      target=wait_for_event_timeout,
                      args=(e, 2))
t2.start()

logging.debug('Waiting before calling Event.set()')
time.sleep(3)
e.set()
logging.debug('Event is set')
In this case, the non-block thread times out before the Event is set.
$ python threading_event.py
2008-01-13 13:25:02,514 (block ) wait_for_event starting
2008-01-13 13:25:02,525 (non-block ) wait_for_event_timeout starting
2008-01-13 13:25:02,536 (MainThread) Waiting before calling Event.set()
2008-01-13 13:25:04,526 (non-block ) e.isSet()->False
2008-01-13 13:25:05,563 (MainThread) Event is set
2008-01-13 13:25:05,564 (block ) e.isSet()->True
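In newer versions of Python (2.7, and 3.1 and later), wait() returns the value of the internal flag, so a caller can tell a timeout from a real signal without a separate isSet() call. A minimal sketch, assuming Python 3; the variable names are hypothetical, and a Timer stands in for whatever thread would normally call set().

```python
import threading

e = threading.Event()

# Nobody has called set() yet, so this wait() times out and returns False.
timed_out = not e.wait(0.1)
print('timed out:', timed_out)

# Arrange for another thread to set the flag shortly.
setter = threading.Timer(0.1, e.set)
setter.start()

# wait() returns True as soon as set() is called, well before the 5 seconds.
was_set = e.wait(5)
setter.join()
print('was set:', was_set)
```

On the older interpreters where wait() always returns None, checking isSet() after the call, as the example above does, remains the way to distinguish the two outcomes.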
Controlling access to resources with Lock:
In addition to synchronizing the operations of threads, it is also important to be able to control access to shared resources to prevent corruption or missed data. Python's built-in data structures (lists, dictionaries, etc.) are thread-safe as a side-effect of having atomic byte-codes for manipulating them (so the GIL is not released in the middle of an update). Your own data structures implemented in Python, and updates to simpler types like integers and floats, don't have that protection. To guard against simultaneous access to an object, use a Lock object.
import logging
import random
import threading
import time

logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-10s) %(message)s',
                    )

class Counter(object):
    def __init__(self, start=0):
        self.lock = threading.Lock()
        self.value = start
    def increment(self):
        self.lock.acquire()
        try:
            logging.debug('Acquired lock')
            self.value = self.value + 1
        finally:
            self.lock.release()

def worker(c):
    for i in range(3):
        pause = random.random()
        logging.debug('Sleeping %0.02f', pause)
        time.sleep(pause)
        c.increment()
    logging.debug('Done')

counter = Counter()
for i in range(5):
    t = threading.Thread(target=worker, args=(counter,))
    t.start()
logging.debug('Waiting for worker threa