Sunday, August 31, 2008

PyMOTW: profile, cProfile, pstats

The profile and cProfile modules provide APIs for collecting and analyzing statistics about how Python source consumes processor resources.

Module: profile
Purpose: Performance analysis of Python programs
Python Version: 1.4 and later, these examples are for Python 2.5

run():

The most basic starting point in the profile module is run(). It takes a string statement as its argument, and creates a report of the time spent in each function called while running the statement.

import profile

def fib(n):
    # from http://en.literateprograms.org/Fibonacci_numbers_(Python)
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fib_seq(n):
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

print 'RAW'
print '=' * 80
profile.run('print fib_seq(20); print')


This recursive version of a Fibonacci sequence calculator is especially useful for demonstrating the profiler because we can improve its performance so much. The standard report format shows a summary and then details for each function executed.


$ python profile_fibonacci_raw.py
RAW
================================================================================
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

         57356 function calls (66 primitive calls) in 0.746 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       21    0.000    0.000    0.000    0.000 :0(append)
       20    0.000    0.000    0.000    0.000 :0(extend)
        1    0.001    0.001    0.001    0.001 :0(setprofile)
        1    0.000    0.000    0.744    0.744 <string>:1(<module>)
        1    0.000    0.000    0.746    0.746 profile:0(print fib_seq(20); print)
        0    0.000             0.000          profile:0(profiler)
 57291/21    0.743    0.000    0.743    0.035 profile_fibonacci_raw.py:13(fib)
     21/1    0.001    0.000    0.744    0.744 profile_fibonacci_raw.py:22(fib_seq)


As you can see, it takes 57356 separate function calls and 3/4 of a second to run. Since there are only 66 *primitive* calls, we know that the vast majority of those 57k calls were recursive. The details about where time was spent are broken out by function in the listing showing the number of calls, total time spent in the function, time per call (tottime/ncalls), cumulative time spent in a function, and the ratio of cumulative time to primitive calls.
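The introduction mentions cProfile, but the examples here only use profile. As a minimal sketch, assuming Python 3 (the examples in this post target Python 2.5), the same kind of report can be collected with cProfile and formatted with pstats. The simplified fib() and the list-comprehension fib_seq() below are stand-ins for the functions above, not the original code.

```python
import cProfile
import io
import pstats


def fib(n):
    # Deliberately naive recursion, mirroring the example above.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def fib_seq(n):
    # Simplified stand-in for the recursive fib_seq() in the post.
    return [fib(i) for i in range(n + 1)]


# Profile a single call without building a statement string.
profiler = cProfile.Profile()
result = profiler.runcall(fib_seq, 15)

# Send the report to a string buffer instead of stdout.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.strip_dirs().sort_stats('cumulative').print_stats()

print(result)
print(report.getvalue())
```

Because fib() recurses, the first line of the report should list both the total and the primitive call counts, just as the profile output above does.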

Not surprisingly, most of the time here is spent calling fib() repeatedly. We can add a memoize decorator to reduce the number of recursive calls and have a big impact on the performance of this function.

import profile

class memoize:
    # from http://avinashv.net/2008/04/python-decorators-syntactic-sugar/
    def __init__(self, function):
        self.function = function
        self.memoized = {}

    def __call__(self, *args):
        try:
            return self.memoized[args]
        except KeyError:
            self.memoized[args] = self.function(*args)
            return self.memoized[args]

@memoize
def fib(n):
    # from http://en.literateprograms.org/Fibonacci_numbers_(Python)
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fib_seq(n):
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    return seq

if __name__ == '__main__':
    print 'MEMOIZED'
    print '=' * 80
    profile.run('print fib_seq(20); print')


By remembering the Fibonacci value at each level we can avoid most of the recursion and drop down to 145 calls that only take 0.003 seconds. Also notice that the ncalls count for fib() shows that it *never* recurses.


$ python profile_fibonacci_memoized.py
MEMOIZED
================================================================================
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

         145 function calls (87 primitive calls) in 0.003 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       21    0.000    0.000    0.000    0.000 :0(append)
       20    0.000    0.000    0.000    0.000 :0(extend)
        1    0.001    0.001    0.001    0.001 :0(setprofile)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
        1    0.000    0.000    0.003    0.003 profile:0(print fib_seq(20); print)
        0    0.000             0.000          profile:0(profiler)
    59/21    0.001    0.000    0.001    0.000 profile_fibonacci_memoized.py:19(__call__)
       21    0.000    0.000    0.001    0.000 profile_fibonacci_memoized.py:26(fib)
     21/1    0.001    0.000    0.002    0.002 profile_fibonacci_memoized.py:36(fib_seq)
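To see the effect of memoization without running the profiler at all, you can simply count how often the function body runs. This is a rough sketch, assuming Python 3 and using functools.lru_cache in place of the memoize class above; the call_counts dictionary and the function names are made up for the illustration.

```python
from functools import lru_cache

# Hypothetical counters, used only to show how often each body executes.
call_counts = {'plain': 0, 'cached': 0}


def fib_plain(n):
    call_counts['plain'] += 1
    if n < 2:
        return n
    return fib_plain(n - 1) + fib_plain(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    # The body only runs on a cache miss, so this counts real work.
    call_counts['cached'] += 1
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)


plain_result = fib_plain(20)
cached_result = fib_cached(20)

print(plain_result, cached_result)
print(call_counts)
```

The cached version runs its body once per distinct argument, which is the same effect the ncalls column shows for fib() in the memoized report.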


runctx():

Of course, it's not always easy to construct the expression to pass to run(). Sometimes it is easier to build a simple expression and run it in a context with globals and locals, using runctx().

import profile
from profile_fibonacci_memoized import fib, fib_seq

if __name__ == '__main__':
    profile.runctx('print fib_seq(n); print', globals(), {'n':20})


In this example, the value of "n" is passed through the local variable context instead of being embedded directly in the statement passed to runctx().


$ python profile_runctx.py
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

         145 function calls (87 primitive calls) in 0.003 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       21    0.000    0.000    0.000    0.000 :0(append)
       20    0.000    0.000    0.000    0.000 :0(extend)
        1    0.001    0.001    0.001    0.001 :0(setprofile)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
        1    0.000    0.000    0.003    0.003 profile:0(print fib_seq(n); print)
        0    0.000             0.000          profile:0(profiler)
    59/21    0.001    0.000    0.001    0.000 profile_fibonacci_memoized.py:19(__call__)
       21    0.000    0.000    0.001    0.000 profile_fibonacci_memoized.py:26(fib)
     21/1    0.001    0.000    0.002    0.002 profile_fibonacci_memoized.py:36(fib_seq)
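The same idea carries over to cProfile.runctx(). The sketch below assumes Python 3 and passes explicit dictionaries rather than globals(), so it is clear exactly which names the profiled statement can see. The iterative fib_seq() and the runctx.stats file name are placeholders for this example only.

```python
import cProfile
import io
import os
import pstats
import tempfile


def fib_seq(n):
    # Iterative stand-in for the fib_seq() used in the post.
    seq = [0, 1]
    while len(seq) <= n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n + 1]


# The statement sees only these names; 'n' arrives via the locals dict.
namespace_globals = {'fib_seq': fib_seq}
namespace_locals = {'n': 20}

report = io.StringIO()
with tempfile.TemporaryDirectory() as tmpdir:
    stats_path = os.path.join(tmpdir, 'runctx.stats')
    # With a filename, the raw data is saved instead of printing a report.
    cProfile.runctx('result = fib_seq(n)',
                    namespace_globals, namespace_locals,
                    filename=stats_path)
    pstats.Stats(stats_path, stream=report).strip_dirs().print_stats()

# Assignments made by the statement land in the locals dictionary.
print(namespace_locals['result'])
print(report.getvalue())
```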


pstats: Saving and Working With Statistics:

If the standard report is not formatted the way you need it to be, both run() and runctx() take a filename argument to save the raw data to a file instead of printing the text report. The Stats class from the pstats module knows how to read the file and can be used to manipulate the data.

For example, to run several iterations of the same test and combine the results, you might do something like this:

import profile
import pstats
from profile_fibonacci_memoized import fib, fib_seq

# Create 5 sets of stats
filenames = []
for i in range(5):
    filename = 'profile_stats_%d.stats' % i
    profile.run('print %d, fib_seq(20)' % i, filename)
    filenames.append(filename)

# Read all 5 stats files into a single object
stats = pstats.Stats(filenames[0])
for filename in filenames[1:]:
    stats.add(filename)

# Clean up filenames for the report
stats.strip_dirs()

# Sort the statistics by the cumulative time spent in the function
stats.sort_stats('cumulative')

stats.print_stats()


The output report is sorted in descending order of cumulative time spent in the function and the directory names are removed from the printed filenames to conserve horizontal space.


$ python profile_stats.py
0 [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
1 [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
2 [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
3 [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
4 [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
Sun Aug 31 11:29:36 2008 profile_stats_0.stats
Sun Aug 31 11:29:36 2008 profile_stats_1.stats
Sun Aug 31 11:29:36 2008 profile_stats_2.stats
Sun Aug 31 11:29:36 2008 profile_stats_3.stats
Sun Aug 31 11:29:36 2008 profile_stats_4.stats

         489 function calls (351 primitive calls) in 0.008 CPU seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.000    0.000    0.007    0.001 <string>:1(<module>)
    105/5    0.004    0.000    0.007    0.001 profile_fibonacci_memoized.py:36(fib_seq)
        1    0.000    0.000    0.003    0.003 profile:0(print 0, fib_seq(20))
  143/105    0.001    0.000    0.002    0.000 profile_fibonacci_memoized.py:19(__call__)
        1    0.000    0.000    0.001    0.001 profile:0(print 4, fib_seq(20))
        1    0.000    0.000    0.001    0.001 profile:0(print 1, fib_seq(20))
        1    0.000    0.000    0.001    0.001 profile:0(print 2, fib_seq(20))
        1    0.000    0.000    0.001    0.001 profile:0(print 3, fib_seq(20))
       21    0.000    0.000    0.001    0.000 profile_fibonacci_memoized.py:26(fib)
      100    0.001    0.000    0.001    0.000 :0(extend)
      105    0.001    0.000    0.001    0.000 :0(append)
        5    0.001    0.000    0.001    0.000 :0(setprofile)
        0    0.000             0.000          profile:0(profiler)
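For comparison, here is a small sketch of the same save-and-combine workflow, assuming Python 3 and cProfile. It writes the stats files into a temporary directory so nothing is left behind; the run_N.stats names and the one-line fib() are invented for the example.

```python
import cProfile
import io
import os
import pstats
import tempfile


def fib(n):
    # Naive recursion so each run produces plenty of calls to count.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


report = io.StringIO()
with tempfile.TemporaryDirectory() as tmpdir:
    paths = []
    for i in range(3):
        profiler = cProfile.Profile()
        profiler.runcall(fib, 10)
        path = os.path.join(tmpdir, 'run_%d.stats' % i)
        profiler.dump_stats(path)  # save raw data instead of printing
        paths.append(path)

    # Stats accepts several files at once and merges their counts.
    combined = pstats.Stats(*paths, stream=report)
    combined.strip_dirs().sort_stats('cumulative').print_stats()

print(report.getvalue())
```

Each run calls fib() 177 times with one primitive call, so the merged row for fib() should show the counts from all three runs added together.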


Limiting Report Contents:

Since we are studying the performance of fib() and fib_seq(), we can also restrict the output report to only include those functions using a regular expression to match the filename:lineno(function) values we want.

import profile
import pstats
from profile_fibonacci_memoized import fib, fib_seq

# Read all 5 stats files into a single object
stats = pstats.Stats('profile_stats_0.stats')
for i in range(1, 5):
    stats.add('profile_stats_%d.stats' % i)
stats.strip_dirs()
stats.sort_stats('cumulative')

# limit output to lines with "(fib" in them
stats.print_stats(r'\(fib')


The regular expression includes a literal left paren (() to match against the function name portion of the location value.


$ python profile_stats_restricted.py
Sun Aug 31 11:29:36 2008 profile_stats_0.stats
Sun Aug 31 11:29:36 2008 profile_stats_1.stats
Sun Aug 31 11:29:36 2008 profile_stats_2.stats
Sun Aug 31 11:29:36 2008 profile_stats_3.stats
Sun Aug 31 11:29:36 2008 profile_stats_4.stats

         489 function calls (351 primitive calls) in 0.008 CPU seconds

   Ordered by: cumulative time
   List reduced from 13 to 2 due to restriction <'\\(fib'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    105/5    0.004    0.000    0.007    0.001 profile_fibonacci_memoized.py:36(fib_seq)
       21    0.000    0.000    0.001    0.000 profile_fibonacci_memoized.py:26(fib)
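Restrictions can also be combined. Besides a regular expression, print_stats() accepts an integer (a maximum number of lines) or a float between 0 and 1 (a fraction of the list), and applies them in the order given. The following is a minimal sketch, assuming Python 3 and cProfile, with local copies of fib() and fib_seq() standing in for the imported ones.

```python
import cProfile
import io
import pstats


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq


profiler = cProfile.Profile()
profiler.runcall(fib_seq, 10)

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.strip_dirs().sort_stats('cumulative')

# First keep only entries matching "(fib", then keep just the top line.
stats.print_stats(r'\(fib', 1)

print(report.getvalue())
```

Since the list is sorted by cumulative time before the restrictions are applied, the single remaining row should be the one for fib_seq().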


Caller / Callee Graphs:

Stats also includes methods for printing the callers and callees of functions.

import profile
import pstats
from profile_fibonacci_memoized import fib, fib_seq

# Read all 5 stats files into a single object
stats = pstats.Stats('profile_stats_0.stats')
for i in range(1, 5):
    stats.add('profile_stats_%d.stats' % i)
stats.strip_dirs()
stats.sort_stats('cumulative')

print 'INCOMING CALLERS:'
stats.print_callers(r'\(fib')

print 'OUTGOING CALLEES:'
stats.print_callees(r'\(fib')


The arguments to print_callers() and print_callees() work the same as the restriction arguments to print_stats(). The output shows the caller, callee, and cumulative time.


$ python profile_stats_callers.py
INCOMING CALLERS:
   Ordered by: cumulative time
   List reduced from 13 to 2 due to restriction <'\\(fib'>

Function                                   was called by...
profile_fibonacci_memoized.py:36(fib_seq)  <- <string>:1(<module>)(5)    0.007
                                              profile_fibonacci_memoized.py:36(fib_seq)(100)    0.007
profile_fibonacci_memoized.py:26(fib)      <- profile_fibonacci_memoized.py:19(__call__)(21)    0.002


OUTGOING CALLEES:
   Ordered by: cumulative time
   List reduced from 13 to 2 due to restriction <'\\(fib'>

Function                                   called...
profile_fibonacci_memoized.py:36(fib_seq)  -> :0(append)(105)    0.001
                                              :0(extend)(100)    0.001
                                              profile_fibonacci_memoized.py:19(__call__)(105)    0.002
                                              profile_fibonacci_memoized.py:36(fib_seq)(100)    0.007
profile_fibonacci_memoized.py:26(fib)      -> profile_fibonacci_memoized.py:19(__call__)(38)    0.002


References:

Python Module of the Week Home
Download Sample Code
profile and cProfile
pstats

The Fibonacci example implementation came from Fibonacci numbers (Python) - LiteratePrograms.

The memoize decorator came from Python Decorators: Syntactic Sugar | avinash.vora.

History:

Updated 1 September 2008 to add content of profile_runctx.py.




Friday, August 29, 2008

Python Magazine for August 2008



The August 2008 issue of Python Magazine is available for download now.

The cover story this month is from Greg Pinero and talks about http://www.utilitymill.com/. If you don't know about Utility Mill, you should check it out. I don't think there's an easier way to create a web utility. Greg explains some of the security and performance issues he encountered creating the site and walks you through posting your own app.

In Drawing Presentable Trees, Bill Mill describes the evolution of tree rendering approaches, with plenty of graph traversal algorithms for us comp sci geeks.

Juri Pakaste explains how to make your code more loosely coupled in Using Dependency Injection in Python. I'm looking forward to applying these ideas in some code I'm working on right now to make it easier to test.

And wrapping up the feature list this month is Paul McGuire with Advanced Pyparsing: Implementing a JSON Parser Using Results Names. Paul shows how to parse complex structures and extract only the useful information without twisting yourself into knots.

Jesse Noller's column recounts some of his experiences getting into Python core development. He gives a brief tour of the code base and offers tips to avoid some of the issues he ran into.

Mark Mruss looks at the wxWidgets and wxPython libraries for creating GUI applications. And Steve Holden introduces the Board of Directors of the Python Software Foundation and talks about the role of the PSF.

Grab your copy today, and as always direct questions or comments to "doug dot hellmann at pythonmagazine dot com".

Thursday, August 28, 2008

PyWorks 2008 Conference Schedule



Brian has posted the schedule for PyWorks 2008. He has put together a good mix of topics, including the cross-over with the php|works side of the conference. The presenter list includes some familiar names, and some new ones, too. Now if I could only figure out how to be in 2 places at the same time, so I could listen to Kevin Dangoor talk about ZODB and Jacob Taylor's presentation on artificial intelligence. Maybe Guido will let me borrow the time machine if I ask nicely.

Join us in Atlanta on Nov 12-14, 2008 for the first annual PyWorks conference.

Sunday, August 17, 2008

PyMOTW: signal

Receive notification of asynchronous system events with the signal module.

Module: signal
Purpose: Handle asynchronous events.
Python Version: 1.4 and later

Description:

Programming with Unix signal handlers is a non-trivial endeavor. This is an introduction, and does not include all of the details you may need to use signals successfully on every platform. There is some degree of standardization across versions of Unix, but there is also some variation, so consult documentation for your OS if you run into trouble.

Signals are a means of notifying your program of an event, and having it handled asynchronously. They can be generated by the system itself, or sent from one process to another. Since signals interrupt the regular flow of your program, it is possible that some operations (especially I/O) may produce an error if a signal is received in the middle.

Signals are identified by integers and are defined in the operating system C headers. Python defines the signals appropriate for the platform as symbols in the signal module. For the examples below, I will mostly use SIGINT, SIGUSR1, and SIGUSR2. All three are typically defined for all Unix and Unix-like systems.

Receiving Signals:

As with other forms of event-based programming, signals are received by establishing a callback function, called a signal handler, that is invoked when the signal occurs. The arguments to your signal handler are the signal number and the stack frame from the point in your program that was interrupted by the signal.

import signal
import os
import time

def receive_signal(signum, stack):
    print 'Received:', signum

signal.signal(signal.SIGUSR1, receive_signal)
signal.signal(signal.SIGUSR2, receive_signal)

print 'My PID is:', os.getpid()

while True:
    print 'Waiting...'
    time.sleep(3)


This relatively simple example script loops indefinitely, pausing for a few seconds each time. When a signal comes in, the sleep call is interrupted and the signal handler receive_signal() prints the signal number. When the signal handler returns, the loop continues.

To send signals to the running program, I use the command line program kill. To produce the output below, I ran signal_signal.py in one window, then kill -USR1 $pid, kill -USR2 $pid, and kill -INT $pid in another.

$ python signal_signal.py 
My PID is: 71387
Waiting...
Waiting...
Waiting...
Received: 30
Waiting...
Waiting...
Received: 31
Waiting...
Waiting...
Traceback (most recent call last):
  File "signal_signal.py", line 25, in <module>
    time.sleep(3)
KeyboardInterrupt
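If you would rather not juggle two terminal windows, the process can send the signal to itself. This is a minimal sketch, assuming Python 3 on a Unix-like system where SIGUSR1 exists; the received list and the two-second deadline are arbitrary choices for the illustration.

```python
import os
import signal
import time

received = []


def receive_signal(signum, frame):
    # Record the symbolic name rather than the platform-specific number.
    received.append(signal.Signals(signum).name)


# signal.signal() returns the handler that was previously installed.
previous = signal.signal(signal.SIGUSR1, receive_signal)

# Send the signal to this process instead of using the kill command.
os.kill(os.getpid(), signal.SIGUSR1)

# Give the interpreter a moment to run the Python-level handler.
deadline = time.monotonic() + 2.0
while not received and time.monotonic() < deadline:
    time.sleep(0.01)

# Put the original handler back so later code is unaffected.
signal.signal(signal.SIGUSR1, previous)

print(received)
```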


getsignal():

To see what signal handlers are registered for a signal, use getsignal(). Pass the signal number as argument. The return value is the registered handler, or one of the special values signal.SIG_IGN (if the signal is being ignored), signal.SIG_DFL (if the default behavior is being used), or None (if the existing signal handler was registered from C, rather than Python).

import signal
import pprint

def alarm_received(n, stack):
    return

signal.signal(signal.SIGALRM, alarm_received)

signals_to_names = {}
for n in dir(signal):
    if n.startswith('SIG') and not n.startswith('SIG_'):
        signals_to_names[getattr(signal, n)] = n

for s in xrange(1, signal.NSIG):
    name = signals_to_names[s]
    handler = signal.getsignal(s)
    if handler is signal.SIG_DFL:
        handler = 'SIG_DFL'
    elif handler is signal.SIG_IGN:
        handler = 'SIG_IGN'
    print '%-10s (%2d):' % (name, s), handler


Again, since each OS may have different signals defined, the output you see from running this on other systems may vary. This is from OS X:


$ python signal_getsignal.py
SIGHUP ( 1): SIG_DFL
SIGINT ( 2): <built-in function default_int_handler>
SIGQUIT ( 3): SIG_DFL
SIGILL ( 4): SIG_DFL
SIGTRAP ( 5): SIG_DFL
SIGIOT ( 6): SIG_DFL
SIGEMT ( 7): SIG_DFL
SIGFPE ( 8): SIG_DFL
SIGKILL ( 9): None
SIGBUS (10): SIG_DFL
SIGSEGV (11): SIG_DFL
SIGSYS (12): SIG_DFL
SIGPIPE (13): SIG_IGN
SIGALRM (14): <function alarm_received at 0x7c3f0>
SIGTERM (15): SIG_DFL
SIGURG (16): SIG_DFL
SIGSTOP (17): None
SIGTSTP (18): SIG_DFL
SIGCONT (19): SIG_DFL
SIGCHLD (20): SIG_DFL
SIGTTIN (21): SIG_DFL
SIGTTOU (22): SIG_DFL
SIGIO (23): SIG_DFL
SIGXCPU (24): SIG_DFL
SIGXFSZ (25): SIG_IGN
SIGVTALRM (26): SIG_DFL
SIGPROF (27): SIG_DFL
SIGWINCH (28): SIG_DFL
SIGINFO (29): SIG_DFL
SIGUSR1 (30): SIG_DFL
SIGUSR2 (31): SIG_DFL
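On newer interpreters the name lookup table is not needed. As a sketch, assuming Python 3.5 or later on a Unix-like system, the signal.Signals enumeration already maps numbers to names. The describe() helper and the handlers dictionary are made up for this example.

```python
import signal


def alarm_received(signum, frame):
    pass


previous = signal.signal(signal.SIGALRM, alarm_received)


def describe(handler):
    # Hypothetical helper that turns a getsignal() result into a label.
    if handler is None:
        return 'None'
    if handler == signal.SIG_DFL:
        return 'SIG_DFL'
    if handler == signal.SIG_IGN:
        return 'SIG_IGN'
    return getattr(handler, '__name__', repr(handler))


handlers = {}
for sig in signal.Signals:
    handlers[sig.name] = describe(signal.getsignal(sig))

# Restore whatever was installed before the example ran.
signal.signal(signal.SIGALRM, previous)

for name in sorted(handlers):
    print('%-10s %s' % (name, handlers[name]))
```

As with the script above, the set of signals and their default handlers depends on the platform.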


Sending Signals:

The function for sending signals is os.kill(). Its use is covered in the PyMOTW article covering the os module.

Alarms:

Alarms are a somewhat special sort of signal, where your program asks the OS to notify it after some period of time has elapsed. As the standard module documentation points out, this is useful for avoiding blocking indefinitely on an I/O operation or other system call.

import signal
import time

def receive_alarm(signum, stack):
    print 'Alarm :', time.ctime()

# Call receive_alarm in 2 seconds
signal.signal(signal.SIGALRM, receive_alarm)
signal.alarm(2)

print 'Before:', time.ctime()
time.sleep(4)
print 'After :', time.ctime()


In this example, the call to sleep() does not last the full 4 seconds.


$ python signal_alarm.py
Before: Sun Aug 17 10:51:09 2008
Alarm : Sun Aug 17 10:51:11 2008
After : Sun Aug 17 10:51:11 2008
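One way to turn an alarm into a timeout is to raise an exception from the handler. The sketch below assumes Python 3 on a Unix-like system. Note that on Python 3.5 and later, time.sleep() resumes for the remaining time when a handler simply returns, so raising is what actually breaks out of the blocking call here. OperationTimedOut is a hypothetical exception defined only for this example, and sleep() stands in for a real blocking operation.

```python
import signal
import time


class OperationTimedOut(Exception):
    """Hypothetical exception used only in this sketch."""


def raise_timeout(signum, frame):
    raise OperationTimedOut('alarm fired')


previous = signal.signal(signal.SIGALRM, raise_timeout)
start = time.monotonic()
signal.alarm(1)  # ask for SIGALRM in one second
try:
    time.sleep(5)  # stand-in for a call that might block too long
    timed_out = False
except OperationTimedOut:
    timed_out = True
finally:
    signal.alarm(0)  # cancel any pending alarm
    signal.signal(signal.SIGALRM, previous)

elapsed = time.monotonic() - start
print('timed out:', timed_out)
print('elapsed: %.1f seconds' % elapsed)
```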


Ignoring Signals:

To ignore a signal, register SIG_IGN as the handler. This script replaces the default handler for SIGINT with SIG_IGN, and registers a handler for SIGUSR1. Then it uses signal.pause() to wait for a signal to be received.

import signal
import os
import time

def do_exit(sig, stack):
    raise SystemExit('Exiting')

signal.signal(signal.SIGINT, signal.SIG_IGN)
signal.signal(signal.SIGUSR1, do_exit)

print 'My PID:', os.getpid()

signal.pause()


Normally SIGINT (the signal sent to your program by the terminal when you hit Ctrl-C) raises a KeyboardInterrupt. In this case, we ignore SIGINT and raise SystemExit when we see SIGUSR1. Each ^C represents an attempt to use Ctrl-C to kill the script from the terminal. Using kill -USR1 72598 from another terminal eventually causes the script to exit.

$ python signal_ignore.py 
My PID: 72598
^C^C^C^CExiting
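The effect of SIG_IGN can also be checked from inside a single script. This is a small sketch, assuming Python 3 on a Unix-like system: it ignores SIGINT, sends that signal to itself, and then restores the previous handler. The survived flag is just a name chosen for the illustration.

```python
import os
import signal
import time

# Ignore SIGINT and remember what was installed before.
previous = signal.signal(signal.SIGINT, signal.SIG_IGN)
try:
    # Equivalent to pressing Ctrl-C, but sent by the process itself.
    os.kill(os.getpid(), signal.SIGINT)
    time.sleep(0.1)
    survived = True
except KeyboardInterrupt:
    survived = False
finally:
    # Restore normal Ctrl-C behavior.
    signal.signal(signal.SIGINT, previous)

print('survived SIGINT:', survived)
```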


Signals and Threads:

Signals and threads don't generally mix well. Only the main thread of a process will receive signals, so it is not generally useful to try to use them in threads. The following example sets up a signal handler, waits for the signal in one thread, and sends the signal from another.

import signal
import threading
import os
import time

def signal_handler(num, stack):
    print 'Received signal %d in %s' % (num, threading.currentThread())

signal.signal(signal.SIGUSR1, signal_handler)

def wait_for_signal():
    print 'Waiting for signal in', threading.currentThread()
    signal.pause()
    print 'Done waiting'

# Start a thread that will not receive the signal
receiver = threading.Thread(target=wait_for_signal, name='receiver')
receiver.start()
time.sleep(0.1)

def send_signal():
    print 'Sending signal in', threading.currentThread()
    os.kill(os.getpid(), signal.SIGUSR1)

sender = threading.Thread(target=send_signal, name='sender')
sender.start()
sender.join()

# Wait for the thread to see the signal (not going to happen!)
print 'Waiting for', receiver
signal.alarm(2)
receiver.join()


Notice that the signal handlers were all registered in the main thread. This is a requirement of the signal module implementation for Python, regardless of underlying platform support for mixing threads and signals. Although the receiver thread calls signal.pause(), it does not receive the signal. The signal.alarm(2) call near the end of the example prevents an infinite block, since the receiver thread will never exit.

$ python signal_threads.py 
Waiting for signal in <Thread(receiver, started)>
Sending signal in <Thread(sender, started)>
Received signal 30 in <_MainThread(MainThread, started)>
Waiting for <Thread(receiver, started)>
Alarm clock
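The main-thread requirement is easy to confirm. In this sketch, which assumes Python 3, a worker thread tries to register a handler and the signal module refuses with a ValueError. The errors list and the thread name are invented for the example.

```python
import signal
import threading

errors = []


def handler(signum, frame):
    pass


def register_in_thread():
    try:
        # Registering a handler outside the main thread is not allowed.
        signal.signal(signal.SIGINT, handler)
    except ValueError as err:
        errors.append(str(err))


worker = threading.Thread(target=register_in_thread, name='worker')
worker.start()
worker.join()

print(errors)
```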


Although alarms can be set in threads, they are also received by the main thread.

import signal
import time
import threading

def signal_handler(num, stack):
    print time.ctime(), 'Alarm in', threading.currentThread()

signal.signal(signal.SIGALRM, signal_handler)

def use_alarm():
    print time.ctime(), 'Setting alarm in', threading.currentThread()
    signal.alarm(1)
    print time.ctime(), 'Sleeping in', threading.currentThread()
    time.sleep(3)
    print time.ctime(), 'Done with sleep'

# Start a thread that sets the alarm but will not receive it
alarm_thread = threading.Thread(target=use_alarm, name='alarm_thread')
alarm_thread.start()
time.sleep(0.1)

# Wait for the thread to finish; the alarm is delivered to the main thread
print time.ctime(), 'Waiting for', alarm_thread
alarm_thread.join()

print time.ctime(), 'Exiting normally'


Notice that the alarm does not abort the sleep() call in use_alarm().


$ python signal_threads_alarm.py
Sun Aug 17 12:06:00 2008 Setting alarm in <Thread(alarm_thread, started)>
Sun Aug 17 12:06:00 2008 Sleeping in <Thread(alarm_thread, started)>
Sun Aug 17 12:06:00 2008 Waiting for <Thread(alarm_thread, started)>
Sun Aug 17 12:06:03 2008 Done with sleep
Sun Aug 17 12:06:03 2008 Alarm in <_MainThread(MainThread, started)>
Sun Aug 17 12:06:03 2008 Exiting normally


References:

Python Module of the Week Home
Download Sample Code






[Updated 19 Aug to point to release 1.66.1, which includes a fix for the logic bug pointed out by Ernesto in comments.]

Python Documentation Power-User Tip

If you find yourself referencing the Python standard library documentation a lot while you're programming, you should set up a keyword bookmark in Firefox. I haven't seen this feature talked about very much, so maybe everyone just knows about it, but I find it saves me a ton of time so I wanted to share.

Keyword Bookmarks:

Keyword bookmarks are just like regular bookmarks, but have a short identifying word associated with them. Instead of hunting through your bookmark list, you can just type the word into the Firefox URL field at the top of your window.

Here's a regular bookmark to the module index of the standard library:

[Screenshot: standard bookmark]

If I add "modules" to the keywords field, like this:

[Screenshot: keyword bookmark]

then when I type "modules" into the URL field, Firefox takes me to http://docs.python.org/lib/modindex.html. No more hunting around in my bookmarks!


Smart Keyword:

Adding the keyword is only the first step. It's also easy to set up a smart keyword (a keyword bookmark that takes an argument) and then provide that argument when you use the keyword. It's almost like having a command line for the web right in your browser. Here's how you do it:

1. Bookmark a sample page, such as http://docs.python.org/lib/module-compiler.html.

2. Edit the properties for the bookmark.

3. Add a keyword, such as "pydoc".

4. Replace "compiler" with "%s":

[Screenshot: keyword bookmark with argument]

5. Save the changes.

Now when you type something like "pydoc compiler" in the URL bar, the browser will go directly to the doc page for that module.

Quicksilver:

If you are on a Mac, Firefox keyword bookmarks also work with Quicksilver.

Regular keyword bookmarks show up in Quicksilver searches, so you can type Cmd-Space, "modules", Return and Firefox opens the module index. If you use the "pydoc" keyword, Quicksilver will prompt you for the argument before launching the browser. So using the bookmark we created above, a documentation lookup is:

Cmd-Space, "pydoc"

[Screenshot: Quicksilver keyword bookmark]

Return, "compiler",

[Screenshot: Quicksilver keyword bookmark argument]

Return, and wait for the new browser window to show the documentation.

Sunday, August 10, 2008

Python Module of the Week, meet reST and Sphinx

Release 1.65 of the Python Module of the Week includes HTML documentation created with Sphinx from reStructuredText versions of all of the posts so far. The documentation is included in the source download, or you can browse it online from the PyMOTW home page.

I have been a long time StructuredText user, using it with Zope and some home-grown tools to produce DocBook output for dead-tree documentation. When reST was being created, I dismissed it as comparatively ugly and overly complicated. The quality of the toolset surrounding it now makes it a very attractive alternative for generating documentation, especially if you need to produce different versions or multiple output formats. After seeing the power and features it has that regular ST doesn't, I'm a convert.

A big "Thank you!" goes out to John Benediktsson for doing the original HTML-to-reST conversion. It would have taken me ages to do it myself, so if it was left up to me alone it probably never would have been done.

The new versions of the docs online on the PyMOTW home page are in a form that should be easier to browse than my blog archives. There is still some work to be done to make the content consistent, mostly due to my writing style and approach evolving over time. I'll be tackling those updates bit by bit, and converting the 2-3 modules that haven't been converted at all yet. I am releasing what I have now because I wanted to finish the major portion of the migration this weekend.

The Sphinx development team also deserves a big "Thank you!" from me. I've been using Sphinx at my day job to produce some documentation, and found that I liked it enough that it was the obvious choice for the PyMOTW site conversion. Incredibly, the Django base template I use for the rest of my site worked the first time with the Jinja template engine used by Sphinx. I had to clean up some styles, but the template didn't bomb out and produced good HTML the first time.

Sunday, August 3, 2008

PyMOTW: webbrowser

Use the webbrowser module to display web pages to your users.

Module: webbrowser
Purpose: Open web pages in a browser.
Python Version: 2.1.3 and later

Description:

The webbrowser module includes functions to open URLs in interactive browser applications. The module includes a registry of available browsers, in case multiple options are available on the system. It can also be controlled with the BROWSER environment variable.

Simple Example:

To open a page in the browser, use the open() function.

import webbrowser

webbrowser.open('http://docs.python.org/lib/module-webbrowser.html')


The URL is opened in a window and that window is raised to the top of the window stack. The docs say that an existing window will be reused, if possible. On my Mac, with Firefox, a new window was always created. YMMV.

Windows vs. Tabs:

If you always want a new window used, use open_new().

import webbrowser

webbrowser.open_new('http://docs.python.org/lib/module-webbrowser.html')


If you would rather create a new tab, use open_new_tab() instead.

Using a specific browser:

If for some reason your application needs to use a specific browser, you can access the set of registered browser controllers using the get() function. The browser controller has methods to open(), open_new(), and open_new_tab(). This example forces the use of the lynx browser:

import webbrowser

b = webbrowser.get('lynx')
b.open('http://docs.python.org/lib/module-webbrowser.html')


Refer to the module documentation for a list of available browser types.
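To experiment with the controller interface without launching a real browser, you can register a controller of your own. This is a sketch, assuming Python 3; RecordingBrowser and the 'recording' name are hypothetical and exist only to show which arguments open(), open_new(), and open_new_tab() pass along.

```python
import webbrowser


class RecordingBrowser(webbrowser.BaseBrowser):
    """Hypothetical controller that records URLs instead of opening them."""

    def __init__(self, name='recording'):
        super().__init__(name)
        self.opened = []

    def open(self, url, new=0, autoraise=True):
        # new is 0 for open(), 1 for open_new(), 2 for open_new_tab().
        self.opened.append((url, new))
        return True


recorder = RecordingBrowser()

# Register a ready-made instance, so no constructor is needed.
webbrowser.register('recording', None, recorder)

controller = webbrowser.get('recording')
url = 'http://docs.python.org/'
controller.open(url)
controller.open_new(url)
controller.open_new_tab(url)

print(recorder.opened)
```

The same get() call is what the module uses for names listed in the BROWSER variable, so a controller registered this way can be looked up by name like the built-in ones.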

BROWSER variable:

Users can control the module from outside your application by setting the BROWSER environment variable to a sequence of browser names or commands. The value should consist of a series of browser names separated by os.pathsep. If the name includes %s, the name is interpreted as a literal command and executed directly with the %s replaced by the URL. Otherwise, the name is passed to get() to obtain a controller object from the registry.

For example, this command opens the web page in lynx, assuming it is available, no matter what other browsers are registered.

$ BROWSER=lynx python webbrowser_open.py 


If none of the names in BROWSER work, webbrowser falls back to its default behavior.

Command Line Interface:

All of the features of the webbrowser module are available via the command line as well as from within your Python program.

$ python -m webbrowser   
Usage: /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/webbrowser.py [-n | -t] url
-n: open new window
-t: open new tab


References:

Python Module of the Week Home
Download Sample Code




Saturday, August 2, 2008

need help with sphinx and LaTeX

Dear Lazy Web,

I've started using sphinx to produce some documentation at work. The HTML output looks good, and I have the templating system figured out so I can change it to look the way we want. We also want to produce a PDF, and that's where I'm stuck. It looks like I need to go through LaTeX, then convert that to PDF. I'm a complete neophyte when it comes to TeX, though, so I'm not even sure where to start.

When I search for things like "converting TeX to PDF", I find some old posts (c. 2005) about how some tools use bitmapped fonts and "look terrible" or vague instructions like "Convert your TeX file to dvi in the usual way." I don't have a usual way, yet, though so that doesn't help me.

Can someone suggest a useful reference manual or starting point for me for using LaTeX under Linux?

I don't actually care about LaTeX, so if there's some other way to get nice looking PDF output from a Sphinx document tree, that information would be helpful, too.

Thanks in advance!