Sunday, August 30, 2009

PyMOTW: decimal - Fixed and floating point math

decimal – Fixed and floating point math

Purpose:Decimal arithmetic using fixed and floating point numbers
Python Version:2.4 and later

The decimal module implements fixed and floating point arithmetic using the model familiar to most people, rather than the IEEE floating point version implemented by most computer hardware. A Decimal instance can represent any number exactly, round up or down, and apply a limit to the number of significant digits.

Decimal

Decimal values are represented as instances of the Decimal class. The constructor takes as argument an integer, or a string. Floating point numbers must be converted to a string before being used to create a Decimal, letting the caller explicitly deal with the number of digits for values that cannot be expressed exactly using hardware floating point representations.

import decimal

fmt = '{0:<20} {1:<20}'
print fmt.format('Input', 'Output')
print fmt.format('-' * 20, '-' * 20)

# Integer
print fmt.format(5, decimal.Decimal(5))

# String
print fmt.format('3.14', decimal.Decimal('3.14'))

# Float
print fmt.format(repr(0.1), decimal.Decimal(str(0.1)))

Notice that the floating point value of 0.1 is not represented as an exact value, so the representation as a float is different from the Decimal value.

$ python decimal_create.py
Input Output
-------------------- --------------------
5 5
3.14 3.14
0.10000000000000001 0.1

Less conveniently, Decimals can also be created from tuples containing a sign flag (0 for positive, 1 for negative), a tuple of digits, and an integer exponent.

import decimal

# Tuple
t = (1, (1, 1), -2)
print 'Input :', t
print 'Decimal:', decimal.Decimal(t)
$ python decimal_tuple.py
Input : (1, (1, 1), -2)
Decimal: -0.11

Arithmetic

Decimal overloads the simple arithmetic operators so once you have a value you can manipulate it in much the same way as the built-in numeric types.

import decimal

a = decimal.Decimal('5.1')
b = decimal.Decimal('3.14')
c = 4
d = 3.14

print 'a =', a
print 'b =', b
print 'c =', c
print 'd =', d
print

print 'a + b =', a + b
print 'a - b =', a - b
print 'a * b =', a * b
print 'a / b =', a / b
print

print 'a + c =', a + c
print 'a - c =', a - c
print 'a * c =', a * c
print 'a / c =', a / c
print

print 'a + d =',
try:
print a + d
except TypeError, e:
print e

Decimal operators also accept integer arguments, but floating point values must be converted to Decimal instances.

$ python decimal_operators.py
a = 5.1
b = 3.14
c = 4
d = 3.14

a + b = 8.24
a - b = 1.96
a * b = 16.014
a / b = 1.624203821656050955414012739

a + c = 9.1
a - c = 1.1
a * c = 20.4
a / c = 1.275

a + d = unsupported operand type(s) for +: 'Decimal' and 'float'

Logarithms

Beyond basic arithmetic, Decimal includes methods to find the base 10 and natural logarithms.

import decimal

d = decimal.Decimal(100)
print 'd :', d
print 'log10 :', d.log10()
print 'ln :', d.ln()
$ python decimal_log.py
d : 100
log10 : 2
ln : 4.605170185988091368035982909

Special Values

In addition to the expected numerical values, Decimal can represent several special values, including positive and negative values for infinity, “not a number”, and zero.

import decimal

for value in [ 'Infinity', 'NaN', '0' ]:
print decimal.Decimal(value), decimal.Decimal('-' + value)
print

# Math with infinity
print 'Infinity + 1:', (decimal.Decimal('Infinity') + 1)
print '-Infinity + 1:', (decimal.Decimal('-Infinity') + 1)

# Print comparing NaN
print decimal.Decimal('NaN') == decimal.Decimal('Infinity')
print decimal.Decimal('NaN') != decimal.Decimal(1)

Adding to infinite values returns another infinite value. Comparing for equality with NaN always returns False and comparing for inequality always returns true. Comparing for sort order against NaN is undefined and results in an error.

$ python decimal_special.py
Infinity -Infinity
NaN -NaN
0 -0

Infinity + 1: Infinity
-Infinity + 1: -Infinity
False
True

Context

So far all of the examples have used the default behaviors of the decimal module. It is possible to override settings such as the precision maintained, how rounding is performed, error handling, etc. All of these settings are maintained via a context. Contexts can be applied for all Decimal instances in a thread or locally within a small code region.

Current Context

To retrieve the current global context, use getcontext().

import decimal

print decimal.getcontext()
$ python decimal_getcontext.py
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999, capitals=1, flags=[], traps=[Overflow, InvalidOperation, DivisionByZero])

Precision

The prec attribute of the context controls the precision maintained for new values created as a result of arithmetic. Literal values are maintained as described.

import decimal

d = decimal.Decimal('0.123456')

for i in range(4):
decimal.getcontext().prec = i
print i, ':', d, d * 1
$ python decimal_precision.py
0 : 0.123456 0
1 : 0.123456 0.1
2 : 0.123456 0.12
3 : 0.123456 0.123

Rounding

There are several options for rounding to keep values within the desired precision.

ROUND_CEILING
Always round upwards towards infinity.
ROUND_DOWN
Always round toward zero.
ROUND_FLOOR
Always round down towards negative infinity.
ROUND_HALF_DOWN
Rounds away from zero if the last significant digit is greater than or equal to 5, otherwise toward zero.
ROUND_HALF_EVEN
Like ROUND_HALF_DOWN except that if the value is 5 then the preceding digit is examined. Even values cause the result to be rounded down and odd digits cause the result to be rounded up.
ROUND_HALF_UP
Like ROUND_HALF_DOWN except if the last significant digit is 5 the value is rounded away from zero.
ROUND_UP
Round away from zero.
ROUND_05UP
Round away from zero if the last digit is 0 or 5, otherwise towards zero.
import decimal

context = decimal.getcontext()

ROUNDING_MODES = [
'ROUND_CEILING',
'ROUND_DOWN',
'ROUND_FLOOR',
'ROUND_HALF_DOWN',
'ROUND_HALF_EVEN',
'ROUND_HALF_UP',
'ROUND_UP',
'ROUND_05UP',
]

header_fmt = '{0:20} {1:^10} {2:^10} {3:^10}'

print 'POSITIVES:'
print

print header_fmt.format(' ', '1/8 (1)', '1/8 (2)', '1/8 (3)')
print header_fmt.format(' ', '-' * 10, '-' * 10, '-' * 10)
for rounding_mode in ROUNDING_MODES:
print '{0:20}'.format(rounding_mode),
for precision in [ 1, 2, 3 ]:
context.prec = precision
context.rounding = getattr(decimal, rounding_mode)
value = decimal.Decimal(1) / decimal.Decimal(8)
print '{0:<10}'.format(value),
print

print
print 'NEGATIVES:'

print header_fmt.format(' ', '-1/8 (1)', '-1/8 (2)', '-1/8 (3)')
print header_fmt.format(' ', '-' * 10, '-' * 10, '-' * 10)
for rounding_mode in ROUNDING_MODES:
print '{0:20}'.format(rounding_mode),
for precision in [ 1, 2, 3 ]:
context.prec = precision
context.rounding = getattr(decimal, rounding_mode)
value = decimal.Decimal(-1) / decimal.Decimal(8)
print '{0:<10}'.format(value),
print
$ python decimal_rounding.py
POSITIVES:

1/8 (1) 1/8 (2) 1/8 (3)
---------- ---------- ----------
ROUND_CEILING 0.2 0.13 0.125
ROUND_DOWN 0.1 0.12 0.125
ROUND_FLOOR 0.1 0.12 0.125
ROUND_HALF_DOWN 0.1 0.12 0.125
ROUND_HALF_EVEN 0.1 0.12 0.125
ROUND_HALF_UP 0.1 0.13 0.125
ROUND_UP 0.2 0.13 0.125
ROUND_05UP 0.1 0.12 0.125

NEGATIVES:
-1/8 (1) -1/8 (2) -1/8 (3)
---------- ---------- ----------
ROUND_CEILING -0.1 -0.12 -0.125
ROUND_DOWN -0.1 -0.12 -0.125
ROUND_FLOOR -0.2 -0.13 -0.125
ROUND_HALF_DOWN -0.1 -0.12 -0.125
ROUND_HALF_EVEN -0.1 -0.12 -0.125
ROUND_HALF_UP -0.1 -0.13 -0.125
ROUND_UP -0.2 -0.13 -0.125
ROUND_05UP -0.1 -0.12 -0.125

Local Context

Using Python 2.5 or later you can also apply the context to a subset of your code using a with statement and context manager.

import decimal

with decimal.localcontext() as c:
c.prec = 2
print 'Local precision:', c.prec
print '3.14 / 3 =', (decimal.Decimal('3.14') / 3)

print
print 'Default precision:', decimal.getcontext().prec
print '3.14 / 3 =', (decimal.Decimal('3.14') / 3)
$ python decimal_context_manager.py
Local precision: 2
3.14 / 3 = 1.0

Default precision: 28
3.14 / 3 = 1.046666666666666666666666667

Per-Instance Context

Contexts can be used to construct Decimal instances, applying the precision and rounding arguments to the conversion from the input type. This lets your application select the precision of constant values separately from the precision of user data.

import decimal

# Set up a context with limited precision
c = decimal.getcontext().copy()
c.prec = 3

# Create our constant
pi = c.create_decimal('3.1415')

# The constant value is rounded off
print 'PI:', pi

# The result of using the constant uses the global context
print 'RESULT:', decimal.Decimal('2.01') * pi
$ python decimal_instance_context.py
PI: 3.14
RESULT: 6.3114

Threads

The “global” context is actually thread-local, so each thread can potentially be configured using different values.

import decimal
import threading
from Queue import Queue

class Multiplier(threading.Thread):
def __init__(self, a, b, prec, q):
self.a = a
self.b = b
self.prec = prec
self.q = q
threading.Thread.__init__(self)
def run(self):
c = decimal.getcontext().copy()
c.prec = self.prec
decimal.setcontext(c)
self.q.put( (self.prec, a * b) )
return

a = decimal.Decimal('3.14')
b = decimal.Decimal('1.234')
q = Queue()
threads = [ Multiplier(a, b, i, q) for i in range(1, 6) ]
for t in threads:
t.start()

for t in threads:
t.join()

for i in range(5):
prec, value = q.get()
print prec, '\t', value
$ python decimal_thread_context.py
1 4
2 3.9
3 3.87
4 3.875
5 3.8748

See also

decimal
The standard library documentation for this module.
Wikipedia: Floating Point
Article on floating point representations and arithmetic.

PyMOTW Home

The canonical version of this article

Sunday, August 23, 2009

Book Review: Python Essential Reference, Fourth Edition



Disclosure: I received a copy of this book for free from Addison-Wesley as part of the PyATL Book Club.

I have a copy of the first edition of the Python Essential Reference that I picked up at IPC 8 back in 2000. It's largely out of date by now, given that it covered Python 1.5.2. But at the time it was one of the few books I always kept close at hand for easy reference. Over time my reference habits evolved away from paper references in favor of online materials. Today I cleared a little space on my desk for the fourth edition of PER by David Beazley, updated to cover Python 2.6 and 3.0.

Pound for pound:

Just a little space, mind you, because the book is quite compact (717 pages in 6" x 9" x 1", easily portable in a backpack or briefcase). This book, diminutive though it may be, has more information of direct use to Python programmers than many of the War and Peace-sized tomes you'll find elsewhere. If David keeps adding material at this rate, I'm going to need a magnifying glass for the next edition.

The book is organized into three main sections: Language, Library, and Extending and Embedding. There is a comprehensive index and the chapter sequence places related information close together. You will not find yourself flipping back and forth between an early "prose" chapter to a later "reference" section.

Language:

The language section can serve as a reference guide for Python, though I think the first chapter title "Tutorial" is a little optimistic based on the brevity. To be fair, the preface states right up front that the book is not intended to be an introductory text.

This is not a book for learning Python. It is a book for writing Python.


Library:

The coverage of the standard library is where PER really shines. I have a certain amount of interest in documenting the Python standard library myself, so I was especially keen to review the material here. I found it up to date, clearly explained, and detailed. There is not a lot of sample code, but it is not entirely devoid of examples. In most cases, the prose descriptions are sufficient and eliminating code samples let David maintain a readable style without adding filler material.

I thought I had internalized most of this material long ago, but I learned a few things by re-reading it.


As the title implies, this is not an exhaustive reference guide. It covers the essential information that will be useful to the most readers. As a result, some of the modules are covered in less depth than others. However, I tend to agree with where focus is placed. For example, much more space is given to working with sqlite3 and databases in general than some of the more esoteric modules like dis. The ast module doesn't appear at all.

Extending and Embedding:

The Extending and Embedding section is one area where plenty of example code is provided. Three techniques for creating extension modules are covered: hand coding, ctypes, and SWIG (no surprise, since SWIG is popular and was written by the author). Examples and commentary are provided for all three approaches.

Going the other direction, embedding an interpreter in another application, is also explained. All of the functions from the Python library useful to someone trying to make their application scriptable are listed and described, with some basic examples showing how to communicate between the interpreter and your main application.

Recommendation:

Due to the reference style, this should not be your first Python book. It should absolutely be your second.

PyMOTW: dis - Python Bytecode Disassembler

dis – Python Bytecode Disassembler

Purpose:Convert code objects to a human-readable representation of the bytecodes for analysis.
Python Version:1.4 and later

The dis module includes functions for working with Python bytecode by “disassembling” it into a more human-readable form. Reviewing the bytecodes being executed by the interpreter is a good way to hand-tune tight loops and perform other kinds of optimizations. It is also useful for finding race conditions in multi-threaded applications, since you can estimate where in your code thread control may switch.

Basic Disassembly

The function dis.dis() prints the disassembled representation of a Python code source (module, class, method, function, or code object). We can disassemble a module such as:

1
2
3
4
#!/usr/bin/env python
# encoding: utf-8

my_dict = { 'a':1 }

by running dis from the command line. The output is organized into columns with the original source line number, the instruction “address” within the code object, the opcode name, and any arguments passed to the opcode.

$ python -m dis dis_simple.py
4 0 BUILD_MAP 1
3 LOAD_CONST 0 (1)
6 LOAD_CONST 1 ('a')
9 STORE_MAP
10 STORE_NAME 0 (my_dict)
13 LOAD_CONST 2 (None)
16 RETURN_VALUE

In this case, the source translates to 5 different operations to create and populate the dictionary, then save the results to a local variable. Since the Python interpreter is stack-based, the first steps are to put the constants onto the stack in the correct order with LOAD_CONST, and then use STORE_MAP to pop off the new key and value to be added to the dictionary. The resulting object is bound to the name “my_dict” with STORE_NAME.

Disassembling Functions

Unfortunately, disassembling the entire module does not recurse into functions automatically. For example, if we disassemble this module:

 1
2
3
4
5
6
7
8
9
10
#!/usr/bin/env python
# encoding: utf-8

def f(*args):
nargs = len(args)
print nargs, args

if __name__ == '__main__':
import dis
dis.dis(f)

the results show loading the code object onto the stack and then turning it into a function (LOAD_CONST, MAKE_FUNCTION), but not the body of the function.

$ python -m dis dis_function.py
4 0 LOAD_CONST 0 (<code object f at 0xd2218, file "dis_function.py", line 4>)
3 MAKE_FUNCTION 0
6 STORE_NAME 0 (f)

8 9 LOAD_NAME 1 (__name__)
12 LOAD_CONST 1 ('__main__')
15 COMPARE_OP 2 (==)
18 JUMP_IF_FALSE 29 (to 50)
21 POP_TOP

9 22 LOAD_CONST 2 (-1)
25 LOAD_CONST 3 (None)
28 IMPORT_NAME 2 (dis)
31 STORE_NAME 2 (dis)

10 34 LOAD_NAME 2 (dis)
37 LOAD_ATTR 2 (dis)
40 LOAD_NAME 0 (f)
43 CALL_FUNCTION 1
46 POP_TOP
47 JUMP_FORWARD 1 (to 51)
>> 50 POP_TOP
>> 51 LOAD_CONST 3 (None)
54 RETURN_VALUE

To see inside the function, we need to pass it to dis.dis().

$ python dis_function.py
5 0 LOAD_GLOBAL 0 (len)
3 LOAD_FAST 0 (args)
6 CALL_FUNCTION 1
9 STORE_FAST 1 (nargs)

6 12 LOAD_FAST 1 (nargs)
15 PRINT_ITEM
16 LOAD_FAST 0 (args)
19 PRINT_ITEM
20 PRINT_NEWLINE
21 LOAD_CONST 0 (None)
24 RETURN_VALUE


Classes

You can also pass classes to dis, in which case all of the methods are disassembled in turn.

 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
#!/usr/bin/env python
# encoding: utf-8

import dis

class MyObject(object):
"""Example for dis."""

CLASS_ATTRIBUTE = 'some value'

def __init__(self, name):
self.name = name

def __str__(self):
return 'MyObject(%s)' % self.name

dis.dis(MyObject)
$ python dis_class.py
Disassembly of __init__:
12 0 LOAD_FAST 1 (name)
3 LOAD_FAST 0 (self)
6 STORE_ATTR 0 (name)
9 LOAD_CONST 0 (None)
12 RETURN_VALUE

Disassembly of __str__:
15 0 LOAD_CONST 1 ('MyObject(%s)')
3 LOAD_FAST 0 (self)
6 LOAD_ATTR 0 (name)
9 BINARY_MODULO
10 RETURN_VALUE

Using Disassembly to Debug

Sometimes when debugging an exception it can be useful to see which bytecode caused a problem. There are a couple of ways to disassemble the code around an error.

The first is by using dis.dis() in the interactive interpreter to report about the last exception. If no argument is passed to dis, then it looks for an exception and shows the disassembly of the most top of the stack that caused it.

$ python
Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39)
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> j = 4
>>> i = i + 4
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'i' is not defined
>>> dis.distb()
1 --> 0 LOAD_NAME 0 (i)
3 LOAD_CONST 0 (4)
6 BINARY_ADD
7 STORE_NAME 0 (i)
10 LOAD_CONST 1 (None)
13 RETURN_VALUE
>>>

Notice the --> indicating the opcode that caused the error. There is no “i” variable defined, so the value associated with the name can’t be loaded onto the stack.

Within your code you can also print the information about an active traceback by passing it to dis.distb() directly. In this example, there is a DivideByZero exception, but since the formula has two divisions it isn’t clear which part is zero.

 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
#!/usr/bin/env python
# encoding: utf-8

i = 1
j = 0
k = 3

# ... many lines removed ...

try:
result = k * (i / j) + (i / k)
except:
import dis
import sys
exc_type, exc_value, exc_tb = sys.exc_info()
dis.distb(exc_tb)

The bad value is easy to spot when it is loaded onto the stack in the disassembled version. The bad operation is highlighted with the -->, and we just need to look up a few lines higher to find where i‘s 0 value was pushed onto the stack.

$ python dis_traceback.py
4 0 LOAD_CONST 0 (1)
3 STORE_NAME 0 (i)

5 6 LOAD_CONST 1 (0)
9 STORE_NAME 1 (j)

6 12 LOAD_CONST 2 (3)
15 STORE_NAME 2 (k)

10 18 SETUP_EXCEPT 26 (to 47)

11 21 LOAD_NAME 2 (k)
24 LOAD_NAME 0 (i)
27 LOAD_NAME 1 (j)
--> 30 BINARY_DIVIDE
31 BINARY_MULTIPLY
32 LOAD_NAME 0 (i)
35 LOAD_NAME 2 (k)
38 BINARY_DIVIDE
39 BINARY_ADD
40 STORE_NAME 3 (result)
43 POP_BLOCK
44 JUMP_FORWARD 65 (to 112)

12 >> 47 POP_TOP
48 POP_TOP
49 POP_TOP

13 50 LOAD_CONST 3 (-1)
53 LOAD_CONST 4 (None)
56 IMPORT_NAME 4 (dis)
59 STORE_NAME 4 (dis)

14 62 LOAD_CONST 3 (-1)
65 LOAD_CONST 4 (None)
68 IMPORT_NAME 5 (sys)
71 STORE_NAME 5 (sys)

15 74 LOAD_NAME 5 (sys)
77 LOAD_ATTR 6 (exc_info)
80 CALL_FUNCTION 0
83 UNPACK_SEQUENCE 3
86 STORE_NAME 7 (exc_type)
89 STORE_NAME 8 (exc_value)
92 STORE_NAME 9 (exc_tb)

16 95 LOAD_NAME 4 (dis)
98 LOAD_ATTR 10 (distb)
101 LOAD_NAME 9 (exc_tb)
104 CALL_FUNCTION 1
107 POP_TOP
108 JUMP_FORWARD 1 (to 112)
111 END_FINALLY
>> 112 LOAD_CONST 4 (None)
115 RETURN_VALUE


Performance Analysis of Loops

Aside from debugging errors, dis can also help you identify performance issues in your code. Examining the disassembled code is especially useful with tight loops where the number of exposed Python instructions is low but they translate to an inefficient set of bytecodes. We can see how the disassembly is helpful by examining a few different implementations of a class, Dictionary, that reads a list of words and groups them by their first letter.

First, the test driver application:

import dis
import sys
import timeit

module_name = sys.argv[1]
module = __import__(module_name)
Dictionary = module.Dictionary

dis.dis(Dictionary.load_data)
print
t = timeit.Timer(
'd = Dictionary(words)',
"""from %(module_name)s import Dictionary
words = [l.strip() for l in open('/usr/share/dict/words', 'rt')]
""" % locals()
)
iterations = 10
print 'TIME: %0.4f' % (t.timeit(iterations)/iterations)

We can use dis_test_loop.py to run each incarnation of the Dictionary class.

A straightforward implementation of Dictionary might look something like:

 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
#!/usr/bin/env python
# encoding: utf-8

class Dictionary(object):

def __init__(self, words):
self.by_letter = {}
self.load_data(words)

def load_data(self, words):
for word in words:
try:
self.by_letter[word[0]].append(word)
except KeyError:
self.by_letter[word[0]] = [word]

The output shows this version taking 0.1074 seconds to load the 234936 words in my copy of /usr/share/dict/words on OS X. That’s not too bad, but as you can see from the disassembly below, the loop is doing more work than it needs to. As it enters the loop in opcode 13, it sets up an exception context (SETUP_EXCEPT). Then it takes 6 opcodes to find self.by_letter[word[0]] before appending word to the list. If there is an exception because word[0] isn’t in the dictionary yet, the exception handler does all of the same work to determine word[0] (3 opcodes) and sets self.by_letter[word[0]] to a new list containing the word.

$ python dis_test_loop.py dis_slow_loop
11 0 SETUP_LOOP 84 (to 87)
3 LOAD_FAST 1 (words)
6 GET_ITER
>> 7 FOR_ITER 76 (to 86)
10 STORE_FAST 2 (word)

12 13 SETUP_EXCEPT 28 (to 44)

13 16 LOAD_FAST 0 (self)
19 LOAD_ATTR 0 (by_letter)
22 LOAD_FAST 2 (word)
25 LOAD_CONST 1 (0)
28 BINARY_SUBSCR
29 BINARY_SUBSCR
30 LOAD_ATTR 1 (append)
33 LOAD_FAST 2 (word)
36 CALL_FUNCTION 1
39 POP_TOP
40 POP_BLOCK
41 JUMP_ABSOLUTE 7

14 >> 44 DUP_TOP
45 LOAD_GLOBAL 2 (KeyError)
48 COMPARE_OP 10 (exception match)
51 JUMP_IF_FALSE 27 (to 81)
54 POP_TOP
55 POP_TOP
56 POP_TOP
57 POP_TOP

15 58 LOAD_FAST 2 (word)
61 BUILD_LIST 1
64 LOAD_FAST 0 (self)
67 LOAD_ATTR 0 (by_letter)
70 LOAD_FAST 2 (word)
73 LOAD_CONST 1 (0)
76 BINARY_SUBSCR
77 STORE_SUBSCR
78 JUMP_ABSOLUTE 7
>> 81 POP_TOP
82 END_FINALLY
83 JUMP_ABSOLUTE 7
>> 86 POP_BLOCK
>> 87 LOAD_CONST 0 (None)
90 RETURN_VALUE

TIME: 0.1074

One technique to eliminate the exception setup is to pre-populate self.by_letter with one list for each letter of the alphabet. That means we should always find the list we want for the new word, and can just do the lookup and save the value.

 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
#!/usr/bin/env python
# encoding: utf-8

import string

class Dictionary(object):

def __init__(self, words):
self.by_letter = dict( (letter, [])
for letter in string.letters)
self.load_data(words)

def load_data(self, words):
for word in words:
self.by_letter[word[0]].append(word)

The change cuts the number of opcodes in half, but only shaves the time down to 0.0984 seconds. Obviously the exception handling had some overhead, but not a huge amount.

$ python dis_test_loop.py dis_faster_loop
14 0 SETUP_LOOP 38 (to 41)
3 LOAD_FAST 1 (words)
6 GET_ITER
>> 7 FOR_ITER 30 (to 40)
10 STORE_FAST 2 (word)

15 13 LOAD_FAST 0 (self)
16 LOAD_ATTR 0 (by_letter)
19 LOAD_FAST 2 (word)
22 LOAD_CONST 1 (0)
25 BINARY_SUBSCR
26 BINARY_SUBSCR
27 LOAD_ATTR 1 (append)
30 LOAD_FAST 2 (word)
33 CALL_FUNCTION 1
36 POP_TOP
37 JUMP_ABSOLUTE 7
>> 40 POP_BLOCK
>> 41 LOAD_CONST 0 (None)
44 RETURN_VALUE

TIME: 0.0984

We can further improve the performance by moving the lookup for self.by_letter outside of the loop (the value doesn’t change, after all).

 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
#!/usr/bin/env python
# encoding: utf-8

import collections

class Dictionary(object):

def __init__(self, words):
self.by_letter = collections.defaultdict(list)
self.load_data(words)

def load_data(self, words):
by_letter = self.by_letter
for word in words:
by_letter[word[0]].append(word)

Opcodes 0-6 now find the value of self.by_letter and save it as a local variable by_letter. Using a local variable only takes a single opcode, instead of 2 (statement 22 uses LOAD_FAST to place the dictionary onto the stack). After this change, the run time is down to 0.0842 seconds.

$ python dis_test_loop.py dis_fastest_loop
13 0 LOAD_FAST 0 (self)
3 LOAD_ATTR 0 (by_letter)
6 STORE_FAST 2 (by_letter)

14 9 SETUP_LOOP 35 (to 47)
12 LOAD_FAST 1 (words)
15 GET_ITER
>> 16 FOR_ITER 27 (to 46)
19 STORE_FAST 3 (word)

15 22 LOAD_FAST 2 (by_letter)
25 LOAD_FAST 3 (word)
28 LOAD_CONST 1 (0)
31 BINARY_SUBSCR
32 BINARY_SUBSCR
33 LOAD_ATTR 1 (append)
36 LOAD_FAST 3 (word)
39 CALL_FUNCTION 1
42 POP_TOP
43 JUMP_ABSOLUTE 16
>> 46 POP_BLOCK
>> 47 LOAD_CONST 0 (None)
50 RETURN_VALUE

TIME: 0.0842

A further optimization, suggested by Brandon Rhodes, is to eliminate the Python version of the for loop entirely. If we use groupby() from itertools to arrange the input, the iteration is moved to C. We can do this safely because we know the inputs are already sorted. If you didn’t know they were sorted you would need to sort them first.

 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#!/usr/bin/env python
# encoding: utf-8

import operator
import itertools

class Dictionary(object):

def __init__(self, words):
self.by_letter = {}
self.load_data(words)

def load_data(self, words):
# Arrange by letter
grouped = itertools.groupby(words, key=operator.itemgetter(0))
# Save arranged sets of words
self.by_letter = dict((group[0][0], group) for group in grouped)

The itertools version takes only 0.0543 seconds to run, just over half of the original time.

$ python dis_test_loop.py dis_eliminate_loop
15 0 LOAD_GLOBAL 0 (itertools)
3 LOAD_ATTR 1 (groupby)
6 LOAD_FAST 1 (words)
9 LOAD_CONST 1 ('key')
12 LOAD_GLOBAL 2 (operator)
15 LOAD_ATTR 3 (itemgetter)
18 LOAD_CONST 2 (0)
21 CALL_FUNCTION 1
24 CALL_FUNCTION 257
27 STORE_FAST 2 (grouped)

17 30 LOAD_GLOBAL 4 (dict)
33 LOAD_CONST 3 (<code object <genexpr> at 0x7e7b8, file "/Users/dhellmann/Documents/PyMOTW/dis/PyMOTW/dis/dis_eliminate_loop.py", line 17>)
36 MAKE_FUNCTION 0
39 LOAD_FAST 2 (grouped)
42 GET_ITER
43 CALL_FUNCTION 1
46 CALL_FUNCTION 1
49 LOAD_FAST 0 (self)
52 STORE_ATTR 5 (by_letter)
55 LOAD_CONST 0 (None)
58 RETURN_VALUE

TIME: 0.0543

Compiler Optimizations

Disassembling compiled source also exposes some of the optimizations made by the compiler. For example, literal expressions are folded during compilation, when possible.

 1
2
3
4
5
6
7
8
9
10
11
12
#!/usr/bin/env python
# encoding: utf-8

# Folded
i = 1 + 2
f = 3.4 * 5.6
s = 'Hello,' + ' World!'

# Not folded
I = i * 3 * 4
F = f / 2 / 3
S = s + '\n' + 'Fantastic!'

The expressions on lines 5-7 can be computed at compilation time and collapsed into single LOAD_CONST instructions because nothing in the expression can change the way the operation is performed. That isn’t true about lines 10-12. Because a variable is involved in those expressions, and the variable might refer to an object that overloads the operator involved, the evaluation has to be delayed to runtime.

$ python -m dis dis_constant_folding.py
5 0 LOAD_CONST 11 (3)
3 STORE_NAME 0 (i)

6 6 LOAD_CONST 12 (19.039999999999999)
9 STORE_NAME 1 (f)

7 12 LOAD_CONST 13 ('Hello, World!')
15 STORE_NAME 2 (s)

10 18 LOAD_NAME 0 (i)
21 LOAD_CONST 6 (3)
24 BINARY_MULTIPLY
25 LOAD_CONST 7 (4)
28 BINARY_MULTIPLY
29 STORE_NAME 3 (I)

11 32 LOAD_NAME 1 (f)
35 LOAD_CONST 1 (2)
38 BINARY_DIVIDE
39 LOAD_CONST 6 (3)
42 BINARY_DIVIDE
43 STORE_NAME 4 (F)

12 46 LOAD_NAME 2 (s)
49 LOAD_CONST 8 ('\n')
52 BINARY_ADD
53 LOAD_CONST 9 ('Fantastic!')
56 BINARY_ADD
57 STORE_NAME 5 (S)
60 LOAD_CONST 10 (None)
63 RETURN_VALUE

See also

dis
The standard library documentation for this module, including the list of bytecode instructions.
Python Essential Reference, 4th Edition, David M. Beazley
http://www.informit.com/store/product.aspx?isbn=0672329786
thomas.apestaart.org “Python Disassembly”
A short discussion of the difference between storing values in a dictionary between Python 2.5 and 2.6.
Why is looping over range() in Python faster than using a while loop?
A discussion on StackOverflow.com comparing 2 looping examples via their disassembled bytecodes.
Decorator for binding constants at compile time
Python Cookbook recipe by Raymond Hettinger and Skip Montanaro with a function decorator that re-writes the bytecodes for a function to insert global constants to avoid runtime name lookups.

PyMOTW Home

The canonical version of this article

Thursday, August 20, 2009

looking for a loop optimization example

I'm working on a PyMOTW post about the dis module. As one of the examples, I'd like to show how to use dis to reduce the number of opcodes processed in a loop as an optimization technique. I'm looking for some "naive" code that looks OK, but doesn't perform especially well on large datasets.

So far I have a class that reads /usr/share/dict/words and organizes the words by their first letter:

#!/usr/bin/env python
# encoding: utf-8

import timeit
import dis

class Dictionary(object):
def __init__(self, filename='/usr/share/dict/words'):
self.by_letter = {}
self.load_file(filename)
def load_file(self, filename):
f = open(filename, 'rt')
try:
words = [l.strip() for l in f.readlines()]
for word in words:
try:
self.by_letter[word[0]].append(word)
except KeyError:
self.by_letter[word[0]] = [word]
finally:
f.close()

if __name__ == '__main__':
print 'DISASSEMBLY:'
dis.dis(Dictionary)
print
t = timeit.Timer('d = Dictionary()', 'from dis_slow_loop import Dictionary')
iterations = 2
print 'TIME: %0.4f' % (t.timeit(iterations)/iterations)


I've optimized it a bit to get this:

#!/usr/bin/env python
# encoding: utf-8

import collections
import dis
import timeit

class Dictionary(object):
def __init__(self, filename='/usr/share/dict/words'):
self.by_letter = collections.defaultdict(list)
self.load_file(filename)
def load_file(self, filename):
by_letter = self.by_letter
with open(filename, 'rt') as f:
for line in f:
word = line.strip()
by_letter[word[0]].append(word)

if __name__ == '__main__':
print 'DISASSEMBLY:'
dis.dis(Dictionary)
print
t = timeit.Timer('d = Dictionary()', 'from dis_faster_loop import Dictionary')
iterations = 2
print 'TIME: %0.4f' % (t.timeit(iterations)/iterations)


The first example gives me a run time of 0.2712 sec and the second is 0.1998.

I'm looking for either alternative loops or a way to increase that spread without making the "slow" version obviously obtuse (no sleeps, unnecessary sub-loops, etc.).

Monday, August 10, 2009

Book Review: The Practice of Programming



The Practice of Programming by Brian Kernighan and Rob Pike left me a little disappointed. If I had read it at a time closer to when it was originally published in 1999, I may have come away liking the book better. There's nothing wrong with the advice, and it reads well, but I don't think the examples are standing the test of time.

It is also possible that I'm just not a good representative of the target audience. The book focuses a lot on C, C++, and Java, with most of the examples in C. By 1999 I had moved over to mostly Python development and wasn't doing the sort of low-level coding in C that may have made their examples more appealing.

This book is about the practice of programming -- how to write programs for real.


The sections that talk about the the social aspects of programming are still valuable advice. For example, The code style issues and recommendations in chapter 1 can apply to any language, and basically boil down to write simple code meant to be read by another human being. I agree with their premise that clear code is also frequently the most optimal for runtime performance.

Chapter two talks about data structures and algorithms and covers a simple implementation of quicksort. The explanation is good, but on the other hand, does anyone still need to write their own sort functions? Do we really not have any other algorithms that are complex enough to be interesting, common enough to be worth reading about, and understood well enough that we can analyze them? I found that Beautiful Code (O'Reilly) had more interesting examples.

The Markov chain generator in chapter three did remind me of some code I wrote in 1998 (in Python) to find phrases in a Framemaker document that might be worth adding to the index. Maybe I should recreate some of that code for Sphinx.

I skimmed over a lot of the performance section since it seemed focused on low level C code and I try to avoid the sorts of situations where I would need to implement my own version of strstr() just to create a spam filter. This was another example of the text being dated.

I think the only real disagreement I had with anything they said was in the section on debugging and error handling, where the authors suggest saving exceptions for "truly exceptional" situations and don't think failing to open a file qualifies. I see two problems with this position: First, failing to raise an exception means there are two error reporting channels that need to be handled separately. Second, low-level code would have to check for all error conditions instead of allowing the exception to bubble up. Moving all error reporting to exceptions means that low-level code is more streamlined (and easier to read) because it doesn't have to consider error handling unless it can work around the problem (creating a missing file, for example). Maybe the advice is based on bad exception support in C++/Java? Or maybe their selected example is just not good. Searching for a substring may have been a better choice for an example, since the missing value isn't actually an error.

If you're new to professional programming, this book might be useful. If you have some real-world experience under your belt, you will find the same advice elsewhere in a more modern form. This book does talk about "how to write programs for real", just not the sorts of programs I have ended up writing.

Sunday, August 9, 2009

100th PyMOTW

This week's article on pydoc is the 100th PyMOTW episode. Whew!

To coincide with the coverage of pydoc, I have also enhanced the command line script motw that shows the module of the week article for a given module. The plain text output is now generated using the docutils rst-to-text converter (instead of using the rst source as plain text). The pydoc pager is used to display the text, so long articles no longer scroll off of the screen right away.

There is also a motw() function available as a built-in after you import PyMOTW. That means you can use motw(module_name) from within the interactive interpreter just like from the command line.

Enjoy!

PyMOTW: pydoc - Online help for Python modules

pydoc – Online help for Python modules

Purpose:Generates help for Python modules and classes from the code.
Python Version:2.1 and later

The pydoc module imports a Python module and uses the contents to generate help
text at runtime. The output includes docstrings for any objects that have them,
and all of the documentable contents of the module are described.

Plain Text Help

Running:

$ pydoc atexit

Produces plaintext help on the console, using your pager if one is configured.

HTML Help

You can also cause pydoc to generate HTML output, either writing a static file to
a local directory or starting a web server to browse documentation online.

$ pydoc -w atexit

Creates atexit.html in the current directory.

$ pydoc -p 5000

Starts a web server listening at http://localhost:5000/. The server generates
documentation as you browse through the available modules.

Interactive Help

pydoc also adds a function help() to the __builtins__ so you can access
the same information from the Python interpreter prompt.

$ python
Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39)
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> help('atexit')
Help on module atexit:

NAME
atexit

FILE
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/atexit.py
...

Examples

See also

pydoc
The standard library documentation for this module.
motw-cli
Accessing the Module of the Week articles from the command line.
motw-interactive
Accessing the Module of the Week articles from the interactive interpreter.

PyMOTW Home

The canonical version of this article