Sunday, December 30, 2007

Django with PostgreSQL on Mac OS X Tiger

Most of the time, when I have worked on django projects, I've used the SQLite backend for development on my PowerBook and deployed to PostgreSQL on my Linux server. For the project I'm working on right now, though, that turned into a problem when some queries that ran fine on my dev system didn't work at all on the production box. Apparently the backend code responsible for assembling the SQL query strings was producing different text for SQLite and PostgreSQL. To avoid similar issues in the future, I set out to install PostgreSQL on my laptop today.

Installing PostgreSQL itself turned out to be very easy indeed. I downloaded the universal installer from Andy Satori's "PostgreSQL for Mac" site. Some of the GUI clients included don't work because I'm on a PPC PowerBook instead of an x86 MacBook or MacBook Pro, but that's OK. I can use the CLI tools, which work fine.

The next thing I needed to do was set up psycopg. That turned out to be a bit of an issue, since initd.org is having some sort of server problem on their site. I was eventually able to download the tarball with the sources for psycopg 1.1.21.

In order to compile psycopg, I also needed mxDateTime from eGenix. They offer several pre-compiled packages, but none would install for me. Working from the source for 3.0.0, I was able to compile it myself via "python setup.py install" into my virtualenv sandbox.

Back in the psycopg build directory, I was able to use these instructions, but had to hack around a bit to get the mxDateTime headers in a place that matched the psycopg build expectations. I tried several variations of path names into the mx source tree, but eventually gave up and copied them all to one directory:

$ cd egenix-mx-base-3.0.0
$ mkdir include
$ find . -name '*.h' -exec cp {} `pwd`/include/ \;


I then tried to configure psycopg with:

$ ./configure --with-postgres-libraries=/Library/PostgreSQL8/lib --with-postgres-includes=/Library/PostgreSQL8/include --with-mxdatetime-includes=../egenix-mx-base-3.0.0/include/


That failed to find the Python.h header until I ran configure outside of my virtualenv environment, using the copy of Python 2.5 I had installed ages ago from python.org. Obviously your path to the mx includes may vary, but that installer package for the PostgreSQL server will put everything in /Library/PostgreSQL8.

Once I had configure running, I ran make (still outside of my virtualenv). The build succeeded, and then I went over to the shell running my virtual environment to install from there (via a simple "make install").

The end result of all of that was PostgreSQL 8.2.5 installed globally, and the mx 3.0.0 and psycopg 1.1.21 packages installed only in my virtual environment.

After a quick createdb and an edit to my settings.py file, I was able to sync up my dev server against the new database and get back to work. I suspect, but can't verify, that I would have had fewer issues if I were on an x86 Mac of some sort or running Leopard, since many of these packages seem to have moved on ahead of my platform. The whole thing took just over an hour, most of which was me fumbling around trying to find compatible versions of the source for the various pieces, since it has been so long since I've compiled any of this stuff.
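For reference, the settings.py edit is only a few lines. This is a sketch for the Django releases of this period (0.96 and trunk), which configured the database through DATABASE_* settings and used the engine name 'postgresql' for the psycopg 1 backend. The database name and user below are hypothetical placeholders.

```python
# settings.py (fragment): pre-1.0 style database settings.
# 'postgresql' selects the psycopg 1 backend; psycopg2 would use
# 'postgresql_psycopg2' instead.
DATABASE_ENGINE = 'postgresql'
DATABASE_NAME = 'myproject_dev'   # hypothetical: the name given to createdb
DATABASE_USER = 'myuser'          # hypothetical
DATABASE_PASSWORD = ''
DATABASE_HOST = ''                # empty string means localhost
DATABASE_PORT = ''                # empty string means the default port
```

With those settings in place, python manage.py syncdb creates the tables in the new database.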

Python Magazine "wish list" updated

Brian has posted our current wish list over on his blog. I won't reproduce the entire thing here, but please go look over the list and see if there is a topic you know something about.

We realize that not everyone who is interested in writing is necessarily an established writer. We will provide help and advice if you're a new author and not familiar with writing for a magazine. I've been programming professionally for 11-12 years now, but I'm still fairly new to publishing, so I know that that first article can be the hardest. But you won't gain the experience if you don't start somewhere. Brian refutes other reasons people don't like to write over here. Let me add to his arguments that it might just turn out to be a lot easier than you think.

As Brian points out in his wish-list post, there are quite a few big projects out there that our readers are interested in hearing about but that haven't contributed articles yet. If you're a project maintainer who has no time to write, that's ok! Consider encouraging someone else from your user or developer community to write an article about the project instead. If you find yourself using a particular package or tool frequently, but don't contribute directly to that project, that's not a problem. There is no requirement that articles come from the original developer of a package. It's always good to have contributions from different perspectives.

The submission process is very easy and low pressure (ask any of our published authors). Neither Brian nor I want to make things harder than they have to be, so we don't. And just in case it isn't clear: we will pay you for your effort if you write an article. So check out the list and consider picking up a little extra cash after the holidays by writing for us.

[Updated with link to Brian's "Why you should write" post.]

PyMOTW: mmap

Map files directly to memory using mmap.

Module: mmap
Purpose: Memory-map files instead of reading the contents directly.
Python Version: 2.1 and later

Description:

Use the mmap() function to create a memory-mapped file. There are differences in the arguments and behaviors for mmap() between Unix and Windows, which are not discussed below. For more details, refer to the library documentation.

The first argument is a file descriptor, either from the fileno() method of a file object or from os.open(). Since you opened the file before calling mmap(), you are responsible for closing it.

The second argument to mmap() is a size in bytes for the portion of the file to map. If the value is 0, the entire file is mapped. You cannot create a zero-length mapping under Windows. Also under Windows, if the size is larger than the current size of the file, the file is extended.

An optional keyword argument, access, is supported by both platforms. Use ACCESS_READ for read-only access, ACCESS_WRITE for write-through (assignments to the memory go directly to the file), or ACCESS_COPY for copy-on-write (assignments to memory are not written to the file).
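A quick way to see the difference between the three access modes is to try modifying each kind of mapping and then look at the file on disk. This sketch is written for Python 3, where mapped data is bytes rather than str; the temporary file and its contents are made up for the demonstration.

```python
import mmap
import os
import tempfile

ORIGINAL = b'Lorem ipsum dolor sit amet'

fd, path = tempfile.mkstemp()
os.close(fd)

results = {}
for name, mode in [('read', mmap.ACCESS_READ),
                   ('write', mmap.ACCESS_WRITE),
                   ('copy', mmap.ACCESS_COPY)]:
    # Start each trial from the same file contents.
    with open(path, 'wb') as f:
        f.write(ORIGINAL)

    with open(path, 'r+b') as f:
        m = mmap.mmap(f.fileno(), 0, access=mode)
        try:
            m[:5] = b'LOREM'        # try to change the mapped bytes
            outcome = 'modified'
        except TypeError:
            outcome = 'read-only'   # ACCESS_READ refuses assignment
        m.flush()                   # only ACCESS_WRITE pushes changes to disk
        m.close()

    with open(path, 'rb') as f:
        on_disk = f.read(5)
    results[name] = (outcome, on_disk)
    print(name, outcome, on_disk)

os.remove(path)
```

Both ACCESS_WRITE and ACCESS_COPY accept the assignment, but only the write-through mapping changes the file.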

File and String API:

Memory-mapped files can be treated as mutable strings or file-like objects, depending on your need. A mapped file supports the expected file API methods, such as close(), flush(), read(), readline(), seek(), tell(), and write(). It also supports the string API, with features such as slicing and methods like find().

Sample Data:

All of the examples use the text file lorem.txt, containing a bit of Lorem Ipsum. For reference, the text of the file is:

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec
egestas, enim et consectetuer ullamcorper, lectus ligula rutrum leo, a
elementum elit tortor eu quam. Duis tincidunt nisi ut ante. Nulla
facilisi. Sed tristique eros eu libero. Pellentesque vel arcu. Vivamus
purus orci, iaculis ac, suscipit sit amet, pulvinar eu,
lacus. Praesent placerat tortor sed nisl. Nunc blandit diam egestas
dui. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Aliquam viverra fringilla
leo. Nulla feugiat augue eleifend nulla. Vivamus mauris. Vivamus sed
mauris in nibh placerat egestas. Suspendisse potenti. Mauris massa. Ut
eget velit auctor tortor blandit sollicitudin. Suspendisse imperdiet
justo.


Reading:

To map a file for read-only access, make sure to pass access=mmap.ACCESS_READ:

import mmap

f = open('lorem.txt', 'r')
try:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        print 'First 10 bytes via read :', m.read(10)
        print 'First 10 bytes via slice:', m[:10]
        print '2nd 10 bytes via read :', m.read(10)
    finally:
        m.close()
finally:
    f.close()


In this example, even though the call to read() advances the file pointer, the slice operation still gives us the same first 10 bytes, because slicing addresses the data by absolute position and neither uses nor moves the file pointer. The second call to read() therefore picks up where the first one stopped and returns the next 10 bytes in the file.

$ python mmap_read.py
First 10 bytes via read : Lorem ipsu
First 10 bytes via slice: Lorem ipsu
2nd 10 bytes via read : m dolor si
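To confirm that slicing leaves the file pointer alone, you can check tell() around each operation. This is a sketch for Python 3 (so the data is bytes), using a throwaway temporary file in place of lorem.txt.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b'Lorem ipsum dolor sit amet, consectetuer adipiscing elit.')
os.close(fd)

with open(path, 'rb') as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = m.read(10)
    after_read = m.tell()      # read() advanced the pointer to 10
    sliced = m[:10]
    after_slice = m.tell()     # slicing did not move it
    second = m.read(10)        # continues from offset 10
    m.close()
os.remove(path)

print(first, after_read, sliced, after_slice, second)
```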


Writing:

If you need to write to the memory-mapped file, start by opening it for reading and writing with mode 'r+' (not 'w', which would truncate it) before mapping it. Then use any of the API methods that change the data (write(), assignment to a slice, etc.).

Here's an example using the default access mode of ACCESS_WRITE and assigning to a slice to modify part of a line in place:

import mmap
import shutil

# Copy the example file
shutil.copyfile('lorem.txt', 'lorem_copy.txt')

word = 'consectetuer'
reversed = word[::-1]
print 'Looking for :', word
print 'Replacing with :', reversed

f = open('lorem_copy.txt', 'r+')
try:
    m = mmap.mmap(f.fileno(), 0)
    try:
        print 'Before:', m.readline().rstrip()
        m.seek(0) # rewind

        loc = m.find(word)
        m[loc:loc+len(word)] = reversed
        m.flush()

        m.seek(0) # rewind
        print 'After :', m.readline().rstrip()
    finally:
        m.close()
finally:
    f.close()


As you can see here, the word consectetuer is replaced with its reverse in the middle of the first line:

$ python mmap_write_slice.py
Looking for : consectetuer
Replacing with : reutetcesnoc
Before: Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec
After : Lorem ipsum dolor sit amet, reutetcesnoc adipiscing elit. Donec
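The slice assignment works here because the reversed word is exactly as long as the original; assignment cannot grow or shrink a mapping. This sketch, written for Python 3 against a temporary file holding the first line, does the same replacement with the write() method instead, then shows what happens when the lengths differ.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b'Lorem ipsum dolor sit amet, consectetuer adipiscing elit.\n')
os.close(fd)

word = b'consectetuer'
with open(path, 'r+b') as f:
    m = mmap.mmap(f.fileno(), 0)       # default: writable, changes reach the file
    loc = m.find(word)
    m.seek(loc)
    m.write(word[::-1])                # overwrite in place, same length

    try:
        m[loc:loc + len(word)] = b'short'   # wrong size for the slice
        resized = True
    except IndexError:
        resized = False                # mmap slices cannot change size
    m.flush()
    m.close()

with open(path, 'rb') as f:
    line = f.read().rstrip()
os.remove(path)
print(line, resized)
```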


ACCESS_COPY Mode:

Using the ACCESS_COPY mode does not write changes to the file on disk.

import mmap
import shutil

# Start from a fresh copy of the example file
shutil.copyfile('lorem.txt', 'lorem_copy.txt')

word = 'consectetuer'
reversed = word[::-1]

f = open('lorem_copy.txt', 'r+')
try:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    try:
        print 'Memory Before:', m.readline().rstrip()
        print 'File Before :', f.readline().rstrip()
        print

        m.seek(0) # rewind
        loc = m.find(word)
        m[loc:loc+len(word)] = reversed

        m.seek(0) # rewind
        print 'Memory After :', m.readline().rstrip()

        f.seek(0)
        print 'File After :', f.readline().rstrip()

    finally:
        m.close()
finally:
    f.close()


Note, in this example, that it was necessary to rewind the file handle separately from the mmap handle.

$ python mmap_write_copy.py 
Memory Before: Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec
File Before : Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec

Memory After : Lorem ipsum dolor sit amet, reutetcesnoc adipiscing elit. Donec
File After : Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec


Regular Expressions:

Since a memory mapped file can act like a string, you can use it with other modules that operate on strings, such as regular expressions. This example finds all of the sentences with "nulla" in them.

import mmap
import re

pattern = re.compile(r'(\.\W+)?([^.]?nulla[^.]*?\.)',
                     re.DOTALL | re.IGNORECASE | re.MULTILINE)

f = open('lorem.txt', 'r')
try:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        for match in pattern.findall(m):
            print match[1].replace('\n', ' ')
    finally:
        m.close()
finally:
    f.close()


Since the pattern includes two groups, the return value from findall() is a sequence of tuples. The print statement pulls out the sentence match and replaces newlines with spaces so the result prints on a single line.

$ python mmap_regex.py
Nulla facilisi.
Nulla feugiat augue eleifend nulla.
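Under Python 3 the mapped data is bytes, so the pattern must be a bytes pattern as well. Because the regular expression runs directly against the mapping, each match can also report where in the file it was found, which is handy for seeking back to that spot. A sketch, assuming a temporary file holding a fragment of the sample text:

```python
import mmap
import os
import re
import tempfile

TEXT = (b'Duis tincidunt nisi ut ante. Nulla\n'
        b'facilisi. Sed tristique eros eu libero.\n')

fd, path = tempfile.mkstemp()
os.write(fd, TEXT)
os.close(fd)

# Bytes pattern: a str pattern cannot be applied to an mmap object.
pattern = re.compile(rb'nulla[^.]*\.', re.IGNORECASE)

found = []
with open(path, 'rb') as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    for match in pattern.finditer(m):
        # match.start() is a byte offset into the file
        found.append((match.start(), match.group().replace(b'\n', b' ')))
    m.close()
os.remove(path)
print(found)
```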


References:

effbot.org - The mmap module
Python Module of the Week Home
Download Sample Code




Using raw SQL in django

For a django project I'm working on, I need to run a simple query with COUNT and GROUP BY statements to collect some summary data. I tried working with the model API, and eventually fell back to using raw SQL instead.

Here's the query I wanted to run:

SELECT
(year/10)*10 as decade,
count(*) as case_count
FROM
docket_case
GROUP BY
decade


One problem I encountered was that there was no way (that I could find) to do a GROUP BY. I found one reference to a group_by() method, but I don't think the patch has been submitted or approved.

I spent some time trying to work out the right way to do it using the model API before realizing I was banging my head against the wrong brick wall. I was using values() and extra(select='...') together, which should have clued me in right away. Well, that and the fact that the data I was retrieving wasn't one of my model classes.

Using a tip from page 325 of the new django book (see my earlier post for a review), I decided to just go ahead and run the query directly and work with the result set. The results need to go into a template, so I wanted to have dictionaries instead of the tuples I get from cursor.fetchall(). This is what I came up with:

from itertools import izip
from django.db import connection

def query_to_dicts(query_string, *query_args):
    """Run a simple query and produce a generator
    that returns the results as a bunch of dictionaries
    with keys for the column values selected.
    """
    cursor = connection.cursor()
    cursor.execute(query_string, query_args)
    # The first item of each description entry is the column name
    col_names = [desc[0] for desc in cursor.description]
    while True:
        row = cursor.fetchone()
        if row is None:
            break
        row_dict = dict(izip(col_names, row))
        yield row_dict


You call it like this:

results = query_to_dicts("""
    SELECT
        (year/10)*10 as decade,
        count(*) as case_count
    FROM
        docket_case
    GROUP BY
        decade
    """)


or, with arguments (this query counts the cases in each year of a particular decade):

results = query_to_dicts("""
    SELECT
        year,
        count(*) as case_count
    FROM
        docket_case
    WHERE
        year/10 = %s
    GROUP BY
        year
    """, decade/10)


I don't expect a large result set, but I figured this was an excuse to experiment with generators and iterators. I'm pretty happy with the results, but surprised I had to write something like this myself. I didn't see a cursor method to fetch the rows as dictionaries instead of tuples, and the search I ran didn't return anything that looked useful.

Am I missing something?
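The generator only relies on the DB-API basics (execute(), fetchone(), and description), so the same idea works with any DB-API cursor. Here is a sketch that swaps Django's connection for Python 3's built-in sqlite3 module with an in-memory database so it can run on its own. The cursor is passed in as an argument (a change from the original signature), sqlite3 uses ? rather than %s as its placeholder, and the sample rows are made up.

```python
import sqlite3

def query_to_dicts(cursor, query_string, *query_args):
    """Yield each result row as a dict keyed by column name."""
    cursor.execute(query_string, query_args)
    col_names = [desc[0] for desc in cursor.description]
    for row in iter(cursor.fetchone, None):   # stop when fetchone() returns None
        yield dict(zip(col_names, row))

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE docket_case (id INTEGER PRIMARY KEY, year INTEGER)')
cur.executemany('INSERT INTO docket_case (year) VALUES (?)',
                [(1995,), (1999,), (2001,)])   # hypothetical sample rows

# Integer division groups the years by decade.
decades = list(query_to_dicts(cur, """
    SELECT (year/10)*10 AS decade, count(*) AS case_count
    FROM docket_case
    GROUP BY decade
    ORDER BY decade
"""))
print(decades)

years = list(query_to_dicts(cur, """
    SELECT year, count(*) AS case_count
    FROM docket_case
    WHERE year/10 = ?
    GROUP BY year
    ORDER BY year
""", 199))
print(years)
```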

[Updated 31 Dec: One correction: The method is extra(), the argument is select. I was adding a count field using extra(), and then asking for the value with values(). There does not seem to be a way to structure the query I wanted using the models API methods.]

Saturday, December 29, 2007

Book Review: The Definitive Guide to Django


I'm working on a new web-based project, and continuing the process of learning django, so I was very pleased to receive my copy of The Definitive Guide to Django by Adrian Holovaty and Jacob Kaplan-Moss in the mail recently.

Contents:

The book is divided into 3 sections. The first, including chapters 1-8, covers introductory material such as setting up a project, using the template system and database layer, etc. This was familiar material after several readings of the tutorials on the django web site, but it has been cleaned up and organized nicely in the book.

The second section (chapters 9-20) covers "subframeworks" and digs deeper into topics which, while documented online, I've found to be more difficult to "discover". The chapters on generic views, syndication and site-map generation, and caching were particularly helpful. I also appreciated the advice on production deployment in chapter 20. Some of the mystery has been removed from these topics, and I learned about features I didn't even realize existed.

The last section, consisting of 8 separate appendixes, is a reference manual for the various layers of django. All of the topics covered in the main part of the book are included with more concise descriptions of methods and more complete listings for functions or methods not discussed earlier.

My Review:

I'm glad I bought the book. It presents much of the same material you can find online, but having it available in book form made it easier for me to read without being distracted by trying the material at the same time. I like to read, absorb, then try when I'm learning about a new technology, and I find it much easier to read and absorb away from the computer where there is no temptation to try writing code before I'm really ready.

The writing is easy to read, but not dumbed down. Between the holidays and the writing style of the book, I was able to blaze through the whole thing in about a week's time (I admit to skimming parts of the reference section during that initial reading). There is a good mix of factual information and best practices tips with arguments backing up the opinions. You don't have to agree with the suggestions, but you are more informed after reading them.

I've been using the appendixes as a handy reference while working on the templates and database queries for my project, and it has made development quite a bit easier. The online references for django are quite good, but flipping back and forth in the physical book is actually quicker in a lot of cases.

Monday, December 24, 2007

Python Magazine for December available for download now

The December issue of Python Magazine went live this morning. If you're a subscriber, you can download your personalized, DRM-free PDF via your account page right away.

Contents:

The cover story this month is Python Threads and the Global Interpreter Lock, a detailed analysis of threading performance under different types of load by Jesse Noller. Jesse's article is chock-full of benchmarks and background material that illustrates when the GIL is, and isn't, an issue.

Python On the Go: Using Python on Mobile Platforms, by Saša Dimitrijević, includes references for all sorts of development tools for getting started hacking your phone or PDA with Python.

In Python Powered Accessibility, Steve Lee explains how to use Python and GNOME accessibility toolkits to make it easier for people with disabilities to use your desktop applications.

John Berninger returns this month with Using Python to Manage RPMs, an introduction to using RPM from within Python programs with a focus on security and intrusion detection.

In his column this month, Mark Mruss creates some basic GUI apps with PyQt. Steve Holden shows how to use RSS, del.icio.us, and MochiKit to keep fresh content on your home page. Brian Jones examines the increased adoption of Python over the past few years, and I talk about the PSF's involvement in the Google Highly Open Participation contest.

Write for us!

We've just finished the work on this issue, which of course means it's time to start the next. If you have a topic you'd like to see covered, post a comment here or head over to our web site and tell us all about it. If you have an idea for an article, use the "write for us" link. Don't be shy! We'll help you develop your idea into a full article and make sure your prose looks as good as your code.

And, as usual, if there is something you think I should cover in my column, shoot me an email, post a comment here, or tag the link with pymagdifferent on del.icio.us.

Saturday, December 22, 2007

This Week in Django - Podcast

If you haven't already seen it, check out Michael Trier's new podcast "This Week in Django". The first episode was good, so I'm looking forward to listening to the second.

Subscribed!

PyMOTW: zipimport

The zipimport module can be used to import and run Python code found inside ZIP archives.

Module: zipimport
Purpose: Load Python code from inside ZIP archives.
Python Version: 2.3 and later

Description:

The zipimport module implements the zipimporter class, which can be used to find and load Python modules inside ZIP archives. The zipimporter supports the "import hooks" API specified in PEP 302; this is how Python Eggs work.

You probably won't need to use the zipimport module directly, since it is possible to import directly from a ZIP archive as long as that archive appears in your sys.path. However, it is interesting to see the features available.
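To illustrate that simpler route, here is a sketch written for Python 3; the archive and the module name zip_demo_module are hypothetical and created on the fly with zipfile. Adding the archive itself to sys.path is enough for an ordinary import to find the module inside it.

```python
import importlib
import os
import sys
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, 'demo_archive.zip')

# Build an archive holding a single, hypothetical source module.
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('zip_demo_module.py', "GREETING = 'hello from the zip'\n")

sys.path.insert(0, archive)          # the archive acts like a directory
try:
    module = importlib.import_module('zip_demo_module')
finally:
    sys.path.remove(archive)

print(module.GREETING)
print(module.__file__)               # a path inside the archive
```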

Example:

For the examples this week, I'll reuse some of the code from last week's discussion of zipfile to create an example ZIP archive containing some Python modules. If you are experimenting with the sample code on your system, run zipimport_make_example.py before any of the rest of the examples. It will create a ZIP archive containing all of the modules in the zipimport example directory, along with some test data needed for the code below.

Finding a Module:

Given the full name of a module, find_module() will try to locate that module inside the ZIP archive.

import zipimport

importer = zipimport.zipimporter('zipimport_example.zip')

for module_name in [ 'zipimport_find_module', 'not_there' ]:
    print module_name, ':', importer.find_module(module_name)


If the module is found, the zipimporter instance is returned. Otherwise, None is returned.

$ python zipimport_find_module.py
zipimport_find_module : <zipimporter object "zipimport_example.zip">
not_there : None


Accessing Code:

The get_code() method loads the code object for a module from the archive.

import zipimport

importer = zipimport.zipimporter('zipimport_example.zip')
code = importer.get_code('zipimport_get_code')
print code


The code object is not the same as a module object.

$ python zipimport_get_code.py
<code object <module> at 0x57530, file "./zipimport_get_code.py", line 28>


To load the code as a usable module, use load_module() instead.

import zipimport

importer = zipimport.zipimporter('zipimport_example.zip')
module = importer.load_module('zipimport_get_code')
print 'Name :', module.__name__
print 'Loader :', module.__loader__
print 'Code :', module.code


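Loading the module executes its code, so the print statement in zipimport_get_code produces the first line of output. The result is a module object, as though the code had been loaded from a regular import: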

$ python zipimport_load_module.py
<code object <module> at 0x57968, file "./zipimport_get_code.py", line 28>
Name : zipimport_get_code
Loader : <zipimporter object "zipimport_example.zip">
Code : <code object <module> at 0x57968, file "./zipimport_get_code.py", line 28>


Source:

As with the inspect module, it is possible to retrieve the source code for a module from the ZIP archive, if the archive includes the source. In the case of the example, only zipimport_get_source.py is added to zipimport_example.zip (the rest of the modules are just the .pyc files).

import zipimport

importer = zipimport.zipimporter('zipimport_example.zip')
for module_name in ['zipimport_get_code', 'zipimport_get_source']:
    source = importer.get_source(module_name)
    print '=' * 80
    print module_name
    print '=' * 80
    print source
    print


If the source for a module is not available, get_source() returns None.

$ python zipimport_get_source.py
================================================================================
zipimport_get_code
================================================================================
None

================================================================================
zipimport_get_source
================================================================================
#!/usr/bin/env python
#
# ... some lines omitted for brevity ...
#
import zipimport

importer = zipimport.zipimporter('zipimport_example.zip')
source = importer.get_source('zipimport_get_code')
print source


Packages:

To determine if a name refers to a package instead of a regular module, use is_package().

import zipimport

importer = zipimport.zipimporter('zipimport_example.zip')
for name in ['zipimport_is_package', 'example_package']:
    print name, importer.is_package(name)


In this case, zipimport_is_package came from a module and the example_package is, well, a package.

$ python zipimport_is_package.py
zipimport_is_package False
example_package True
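The same queries work against an archive built on the fly. This sketch, written for Python 3 with hypothetical module and package names, creates one plain module and one package, then asks a zipimporter about each. Because this archive holds source files rather than only .pyc files, get_source() returns the module text instead of None.

```python
import os
import tempfile
import zipfile
import zipimport

tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, 'pkg_example.zip')

with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('plain_module.py', 'VALUE = 1\n')
    # An __init__.py inside a directory marks demo_package as a package
    zf.writestr('demo_package/__init__.py', '# package marker\n')

importer = zipimport.zipimporter(archive)
for name in ['plain_module', 'demo_package']:
    print(name, importer.is_package(name))

print(repr(importer.get_source('plain_module')))
```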


Data:

There are times when source modules or packages need to be distributed with non-code data. Images, configuration files, default data, and test fixtures are just a few examples of this. Frequently, the module's __file__ attribute is used to find these data files relative to where the code is installed.

For example, with a normal module you might do something like:

import os
import example_package
data_filename = os.path.join(os.path.dirname(example_package.__file__),
                             'README.txt')
print data_filename, ':'
print open(data_filename, 'rt').read()


The output will look something like this, with the path changed based on where the PyMOTW sample code is on your filesystem.

$ python zipimport_get_data_nozip.py
/Users/dhellmann/Documents/PyMOTW/in_progress/zipimport/example_package/README.txt :
This file represents sample data which could be embedded in the ZIP
archive. You could include a configuration file, images, or any other
sort of non-code data.


If the example_package is imported from the ZIP archive instead of the filesystem, that method does not work:

import sys
sys.path.insert(0, 'zipimport_example.zip')

import os
import example_package
print example_package.__file__
data_filename = os.path.join(os.path.dirname(example_package.__file__),
                             'README.txt')
print data_filename, ':'
print open(data_filename, 'rt').read()


The __file__ of the package refers to a path inside the ZIP archive, not a directory on the filesystem, so we cannot just build up the path to the README.txt file and open it.

$ python zipimport_get_data_zip.py
zipimport_example.zip/example_package/__init__.pyc
zipimport_example.zip/example_package/README.txt :
Traceback (most recent call last):
  File "zipimport_get_data_zip.py", line 41, in <module>
    print open(data_filename, 'rt').read()
IOError: [Errno 20] Not a directory: 'zipimport_example.zip/example_package/README.txt'


Instead, we need to use the get_data() method. We can access the zipimporter instance that loaded the module through the __loader__ attribute of the imported module:

import sys
sys.path.insert(0, 'zipimport_example.zip')

import os
import example_package
print example_package.__file__
print example_package.__loader__.get_data('example_package/README.txt')


$ python zipimport_get_data.py
zipimport_example.zip/example_package/__init__.pyc
This file represents sample data which could be embedded in the ZIP
archive. You could include a configuration file, images, or any other
sort of non-code data.


Keep in mind, though, that __loader__ is not set for modules that are not imported via zipimport.
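Python 3's zipimporter still provides get_data(), now returning bytes, and it can be used without importing anything from the archive. A sketch, assuming a throwaway archive built on the fly with a hypothetical package named demo_pkg:

```python
import os
import tempfile
import zipfile
import zipimport

tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, 'data_example.zip')

with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('demo_pkg/__init__.py', '')
    zf.writestr('demo_pkg/README.txt', 'Sample data stored in the archive.\n')

importer = zipimport.zipimporter(archive)
# The name is relative to the root of the archive.
data = importer.get_data('demo_pkg/README.txt')
print(data.decode('utf-8'))

try:
    importer.get_data('demo_pkg/missing.txt')
    missing_found = True
except OSError:          # members that are not in the archive raise OSError
    missing_found = False
```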

References:

PEP 273 - Import Modules from ZIP Archives
PEP 302 - New Import Hooks
Python Eggs
inspect module
PyMOTW: inspect
Python Module of the Week Home
Download Sample Code




Monday, December 17, 2007

Adam Gomaa explains why I prefer Django to Turbo Gears

I've built a couple of small web projects using Turbo Gears and Django. I found the Django experience much nicer, in the end, and am sticking with it for now. My main beef with Turbo Gears is that to figure out how to do anything I had to visit 3-4 different web sites to read documentation for the different pieces. The larger problem, as Adam Gomaa points out in his post about Pylons and Django, is that Turbo Gears lacked conceptual integrity.

"The 'best tool for the job' almost always consists of 'the most easily maintained tool for the job,' which itself correlates strongly with 'what comes out the box.' The fewer pluggable, add-on, additional-setup-required components there are, the better. Two templating languages is not a feature, it's a mistake. Four? That's a nightmare."


That nails it pretty succinctly. I don't love working with Django. They are slow to release new versions (even compatible versions) and I've recently had to start tracking the trunk in order to build the site I'm working on now (ugh). But at least all of the documentation is in one place, and the various layers work together consistently.

new statistics package for python: python-statlib

One of the tasks for the Google Highly Open Participation contest was to combine several existing statistics modules into a single package to make it easier to install them. That work is done, and now available from python-statlib or via easy_install statlib.

So that's a whole new package created as a result of the contest. I can't get over how much work these students are doing!

Sunday, December 16, 2007

two new releases

I'm working on another django project and need some features that are only available in the development head, so I had to update my server from 0.96 to an svn checkout (I wish there were more frequent formal releases of django; they need a release manager). As a result, I spent a while today updating a couple of my applications to be compatible with the head version.

So, there are new (formally tested and released :-) versions of django-links and codehosting available.

PyMOTW: zipfile

The zipfile module can be used to manipulate ZIP archive files.

Module: zipfile
Purpose: Read and write ZIP archive files.
Python Version: 1.6 and later

Limitations:

The zipfile module does not support ZIP files with appended comments, or multi-disk ZIP files. It does support ZIP files larger than 4 GB that use the ZIP64 extensions.

Testing ZIP Files:

The is_zipfile() function returns a boolean indicating whether or not the filename passed as an argument refers to a valid ZIP file.

import zipfile

for filename in [ 'README.txt', 'example.zip',
                  'bad_example.zip', 'notthere.zip' ]:
    print '%20s %s' % (filename, zipfile.is_zipfile(filename))


Notice that if the file does not exist, is_zipfile() returns False.

$ python zipfile_is_zipfile.py 
          README.txt False
         example.zip True
     bad_example.zip False
        notthere.zip False


Reading Meta-data from a ZIP Archive:

Use the ZipFile class to work directly with a ZIP archive. It supports methods for reading data about existing archives as well as modifying the archives by adding additional files.

To read the names of the files in an existing archive, use namelist():

import zipfile

zf = zipfile.ZipFile('example.zip', 'r')
print zf.namelist()


The return value is a list of strings with the names of the archive contents:

$ python zipfile_namelist.py 
['README.txt']


The list of names is only part of the information available from the archive, though. To access all of the meta-data about the ZIP contents, use the infolist() or getinfo() methods.

import datetime
import zipfile

def print_info(archive_name):
    zf = zipfile.ZipFile(archive_name)
    for info in zf.infolist():
        print info.filename
        print '\tComment:\t', info.comment
        print '\tModified:\t', datetime.datetime(*info.date_time)
        print '\tSystem:\t\t', info.create_system, '(0 = Windows, 3 = Unix)'
        print '\tZIP version:\t', info.create_version
        print '\tCompressed:\t', info.compress_size, 'bytes'
        print '\tUncompressed:\t', info.file_size, 'bytes'
        print

if __name__ == '__main__':
    print_info('example.zip')


There are additional fields other than those printed here, but deciphering the values into anything useful requires careful reading of the PKZIP Application Note with the ZIP file specification.

$ python zipfile_infolist.py 
README.txt
Comment:
Modified: 2007-12-16 10:08:52
System: 3 (0 = Windows, 3 = Unix)
ZIP version: 23
Compressed: 63 bytes
Uncompressed: 75 bytes


If you know in advance the name of the archive member, you can retrieve its ZipInfo object with getinfo().

import zipfile

zf = zipfile.ZipFile('example.zip')
for filename in [ 'README.txt', 'notthere.txt' ]:
    try:
        info = zf.getinfo(filename)
    except KeyError:
        print 'ERROR: Did not find %s in zip file' % filename
    else:
        print '%s is %d bytes' % (info.filename, info.file_size)


If the archive member is not present, getinfo() raises a KeyError.

$ python zipfile_getinfo.py 
README.txt is 75 bytes
ERROR: Did not find notthere.txt in zip file


Extracting Archived Files From a ZIP Archive:

To access the data from an archive member, use the read() method, passing the member's name.

import zipfile

zf = zipfile.ZipFile('example.zip')
for filename in [ 'README.txt', 'notthere.txt' ]:
    try:
        data = zf.read(filename)
    except KeyError:
        print 'ERROR: Did not find %s in zip file' % filename
    else:
        print filename, ':'
        print repr(data)
    print


The data is automatically decompressed for you, if necessary.

$ python zipfile_read.py 
README.txt :
'The examples for the zipfile module use this file and example.zip as data.\n'

ERROR: Did not find notthere.txt in zip file


Creating New Archives:

To create a new archive, simply instantiate the ZipFile with a mode of 'w'. Any existing file is truncated and a new archive is started. To add files, use the write() method.

from zipfile_infolist import print_info
import zipfile

print 'creating archive'
zf = zipfile.ZipFile('zipfile_write.zip', mode='w')
try:
    print 'adding README.txt'
    zf.write('README.txt')
finally:
    print 'closing'
    zf.close()

print
print_info('zipfile_write.zip')


By default, the contents of the archive are not compressed:

$ python zipfile_write.py
creating archive
adding README.txt
closing

README.txt
    Comment:
    Modified:       2007-12-16 10:08:50
    System:         3 (0 = Windows, 3 = Unix)
    ZIP version:    20
    Compressed:     75 bytes
    Uncompressed:   75 bytes
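The try/finally pattern above guarantees that close() runs. Starting with Python 2.7, ZipFile is also a context manager, so a with statement does the same job. The sketch below is written for Python 3 and uses a temporary directory and a generated README.txt in place of the real file.

```python
import os
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
source = os.path.join(tmpdir, 'README.txt')
with open(source, 'w') as f:
    f.write('The examples for the zipfile module use this file.\n')

archive = os.path.join(tmpdir, 'with_statement.zip')
# Leaving the with block closes the archive, even if write() raises.
with zipfile.ZipFile(archive, mode='w') as zf:
    zf.write(source, arcname='README.txt')

with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)
```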


To add compression, the zlib module is required. If zlib is available, you can set the compression mode for individual files or for the archive as a whole using zipfile.ZIP_DEFLATED. The default compression mode is zipfile.ZIP_STORED.

from zipfile_infolist import print_info
import zipfile
try:
    import zlib
    compression = zipfile.ZIP_DEFLATED
except ImportError:
    # zlib is not available, so fall back to storing without compression
    compression = zipfile.ZIP_STORED

modes = { zipfile.ZIP_DEFLATED: 'deflated',
          zipfile.ZIP_STORED:   'stored',
          }

print 'creating archive'
zf = zipfile.ZipFile('zipfile_write_compression.zip', mode='w')
try:
    print 'adding README.txt with compression mode', modes[compression]
    zf.write('README.txt', compress_type=compression)
finally:
    print 'closing'
    zf.close()

print
print_info('zipfile_write_compression.zip')


This time the archive member is compressed:

$ python zipfile_write_compression.py
creating archive
adding README.txt with compression mode deflated
closing

README.txt
    Comment:
    Modified:       2007-12-16 10:08:50
    System:         3 (0 = Windows, 3 = Unix)
    ZIP version:    20
    Compressed:     63 bytes
    Uncompressed:   75 bytes
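The compression mode can be set for the whole archive with the compression argument to ZipFile. It can then be overridden for a single member with compress_type on write(), or on writestr() in Python 2.7 and later. The sketch below, written for Python 3 with made-up member names, uses both together.

```python
import io
import zipfile

payload = b'repetitive text ' * 100  # highly repetitive, so it compresses well

buf = io.BytesIO()
# Archive-wide default: deflate every member unless told otherwise.
with zipfile.ZipFile(buf, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('deflated.txt', payload)
    # Per-member override: store this one without compression.
    zf.writestr('stored.txt', payload, compress_type=zipfile.ZIP_STORED)

results = {}
with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        results[info.filename] = (info.compress_type, info.compress_size)
        print(info.filename, info.compress_type, info.compress_size, info.file_size)
```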


Using Alternate Archive Member Names:

It is easy to add a file to an archive using a name other than the original file name, by passing the arcname argument to write().

from zipfile_infolist import print_info
import zipfile

zf = zipfile.ZipFile('zipfile_write_arcname.zip', mode='w')
try:
    zf.write('README.txt', arcname='NOT_README.txt')
finally:
    zf.close()
print_info('zipfile_write_arcname.zip')


There is no sign of the original filename in the archive:

$ python zipfile_write_arcname.py
NOT_README.txt
    Comment:
    Modified:       2007-12-16 10:08:50
    System:         3 (0 = Windows, 3 = Unix)
    ZIP version:    20
    Compressed:     75 bytes
    Uncompressed:   75 bytes
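arcname is also handy when archiving a directory tree. Making each name relative to the top directory keeps the archive from recording absolute paths from the machine it was built on. The sketch below walks a temporary directory with os.walk(). The directory layout and the zip_tree() helper are hypothetical, and the code is written for Python 3.

```python
import os
import tempfile
import zipfile

def zip_tree(top, archive_name):
    """Add every file under top, named relative to top (hypothetical helper)."""
    with zipfile.ZipFile(archive_name, mode='w') as zf:
        for dirpath, dirnames, filenames in os.walk(top):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                # relpath drops the leading directory; ZIP names use '/'.
                arcname = os.path.relpath(full_path, top).replace(os.sep, '/')
                zf.write(full_path, arcname=arcname)

# Build a small tree: top/a.txt and top/sub/b.txt
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, 'sub'))
for name in ('a.txt', os.path.join('sub', 'b.txt')):
    with open(os.path.join(top, name), 'w') as f:
        f.write(name + '\n')

# Put the archive outside the tree so it is not walked.
archive_name = os.path.join(tempfile.mkdtemp(), 'tree.zip')
zip_tree(top, archive_name)
with zipfile.ZipFile(archive_name) as zf:
    names = sorted(zf.namelist())
print(names)
```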


Writing Data from Sources Other Than Files:

Sometimes it is necessary to write to a ZIP archive using data that did not come from an existing file. Rather than writing the data to a file, then adding that file to the ZIP archive, you can use the writestr() method to add a string of bytes to the archive directly.

from zipfile_infolist import print_info
import zipfile

msg = 'This data did not exist in a file before being added to the ZIP file'
zf = zipfile.ZipFile('zipfile_writestr.zip',
                     mode='w',
                     compression=zipfile.ZIP_DEFLATED,
                     )
try:
    zf.writestr('from_string.txt', msg)
finally:
    zf.close()

print_info('zipfile_writestr.zip')

zf = zipfile.ZipFile('zipfile_writestr.zip', 'r')
print zf.read('from_string.txt')


In this case, I used the compression argument to ZipFile to compress the data, since writestr() in this version of Python does not accept a compress_type argument.

$ python zipfile_writestr.py
from_string.txt
    Comment:
    Modified:       2007-12-16 11:38:14
    System:         3 (0 = Windows, 3 = Unix)
    ZIP version:    20
    Compressed:     62 bytes
    Uncompressed:   68 bytes

This data did not exist in a file before being added to the ZIP file
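writestr() pairs naturally with an archive held entirely in memory. ZipFile accepts a file-like object in place of a file name, so the finished archive can be kept as a bytes value, for example to send over a network, without touching the disk. The sketch below uses io.BytesIO with a made-up message and is written for Python 3, where a str passed to writestr() is encoded as UTF-8.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('from_string.txt', 'This data never touched the disk')

archive_bytes = buf.getvalue()  # the complete archive as a bytes object
print(len(archive_bytes), 'bytes')

# Reading it back only needs another file-like wrapper around the bytes.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
    text = zf.read('from_string.txt').decode('utf-8')
print(text)
```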


Writing with a ZipInfo Instance:

By default, the modification date is computed for you when you add a file or string to the archive. When using writestr(), you can also pass a ZipInfo instance to define that and other metadata yourself.

import time
import zipfile
from zipfile_infolist import print_info

msg = 'This data did not exist in a file before being added to the ZIP file'
zf = zipfile.ZipFile('zipfile_writestr_zipinfo.zip',
                     mode='w',
                     )
try:
    # ZipInfo expects a 6-tuple: (year, month, day, hour, minute, second)
    info = zipfile.ZipInfo('from_string.txt',
                           date_time=time.localtime(time.time())[:6],
                           )
    info.compress_type = zipfile.ZIP_DEFLATED
    info.comment = 'Remarks go here'
    info.create_system = 0
    zf.writestr(info, msg)
finally:
    zf.close()

print_info('zipfile_writestr_zipinfo.zip')


In this example, I set the modification time to the current time, compress the data, set create_system to a false value (0, meaning Windows, even though the archive was created on a Unix system), and add a comment.

$ python zipfile_writestr_zipinfo.py
from_string.txt
    Comment:        Remarks go here
    Modified:       2007-12-16 11:44:14
    System:         0 (0 = Windows, 3 = Unix)
    ZIP version:    20
    Compressed:     62 bytes
    Uncompressed:   68 bytes
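The same approach works for attributes such as file permissions. For members whose create_system is 3 (Unix), Info-ZIP and most Unix tools store the Unix mode bits in the high 16 bits of external_attr. The ZIP specification itself leaves that field host-dependent, so treat this as an assumption about how other tools will interpret the value. The script name run_me.sh and the fixed timestamp are made up, and the code is written for Python 3.

```python
import io
import stat
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w') as zf:
    info = zipfile.ZipInfo('run_me.sh', date_time=(2007, 12, 16, 11, 44, 14))
    info.create_system = 3  # Unix
    # Assumed Info-ZIP convention: regular file, rwxr-xr-x, in the high 16 bits.
    info.external_attr = (stat.S_IFREG | 0o755) << 16
    zf.writestr(info, b'#!/bin/sh\necho hello\n')

with zipfile.ZipFile(buf) as zf:
    mode = zf.getinfo('run_me.sh').external_attr >> 16
print(oct(stat.S_IMODE(mode)), stat.S_ISREG(mode))
```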


Appending to Files:

In addition to creating new archives, it is possible to append to an existing archive or add an archive at the end of an existing file (such as a .exe file for a self-extracting archive). To open a file to append to it, use mode 'a'.

from zipfile_infolist import print_info
import zipfile

print 'creating archive'
zf = zipfile.ZipFile('zipfile_append.zip', mode='w')
try:
    zf.write('README.txt')
finally:
    zf.close()

print
print_info('zipfile_append.zip')

print 'appending to the archive'
zf = zipfile.ZipFile('zipfile_append.zip', mode='a')
try:
    zf.write('README.txt', arcname='README2.txt')
finally:
    zf.close()

print
print_info('zipfile_append.zip')


The resulting archive ends up with 2 members:

$ python zipfile_append.py
creating archive

README.txt
    Comment:
    Modified:       2007-12-16 10:08:50
    System:         3 (0 = Windows, 3 = Unix)
    ZIP version:    20
    Compressed:     75 bytes
    Uncompressed:   75 bytes

appending to the archive

README.txt
    Comment:
    Modified:       2007-12-16 10:08:50
    System:         3 (0 = Windows, 3 = Unix)
    ZIP version:    20
    Compressed:     75 bytes
    Uncompressed:   75 bytes

README2.txt
    Comment:
    Modified:       2007-12-16 10:08:50
    System:         3 (0 = Windows, 3 = Unix)
    ZIP version:    20
    Compressed:     75 bytes
    Uncompressed:   75 bytes
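Mode 'a' can also add an archive to the end of a file that is not a ZIP archive at all, which is how self-extracting executables are laid out. The sketch below demonstrates this with a small shell-script stub in a temporary directory standing in for a real .exe. The stub contents and file names are made up, and the code is written for Python 3.

```python
import os
import tempfile
import zipfile

path = os.path.join(tempfile.mkdtemp(), 'stub_with_archive.bin')
# Stand-in for an executable stub; any non-ZIP bytes work here.
with open(path, 'wb') as f:
    f.write(b'#!/bin/sh\necho "this is not a zip file"\nexit 0\n')

# Mode 'a' on a file that is not a ZIP archive appends a new archive at the end.
with zipfile.ZipFile(path, mode='a') as zf:
    zf.writestr('payload.txt', b'data appended after the stub\n')

print(zipfile.is_zipfile(path))
with zipfile.ZipFile(path) as zf:
    payload = zf.read('payload.txt')
with open(path, 'rb') as f:
    prefix = f.read(9)  # the original stub bytes are still at the front
```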


Python ZIP Archives:

Since version 2.3, Python has been able to import modules from inside ZIP archives if those archives appear in sys.path. The zipfile.PyZipFile class can be used to construct an archive suitable for use in this way. Its extra method, writepy(), scans a directory for .py files and adds the corresponding .pyo or .pyc file to the archive. If neither compiled form exists, a .pyc file is created and added.

import sys
import zipfile

if __name__ == '__main__':
    zf = zipfile.PyZipFile('zipfile_pyzipfile.zip', mode='w')
    try:
        zf.debug = 3
        print 'Adding python files'
        zf.writepy('.')
    finally:
        zf.close()
    for name in zf.namelist():
        print name

    print
    sys.path.insert(0, 'zipfile_pyzipfile.zip')
    import zipfile_pyzipfile
    print 'Imported from:', zipfile_pyzipfile.__file__


When I set the debug attribute of the PyZipFile to 3, verbose debugging is enabled and you can watch as it compiles each .py file it finds.

$ python zipfile_pyzipfile.py
Adding python files
Adding package in . as .
Compiling ./__init__.py
Adding ./__init__.pyc
Compiling ./zipfile_append.py
Adding ./zipfile_append.pyc
Compiling ./zipfile_getinfo.py
Adding ./zipfile_getinfo.pyc
Compiling ./zipfile_infolist.py
Adding ./zipfile_infolist.pyc
Compiling ./zipfile_is_zipfile.py
Adding ./zipfile_is_zipfile.pyc
Compiling ./zipfile_namelist.py
Adding ./zipfile_namelist.pyc
Compiling ./zipfile_printdir.py
Adding ./zipfile_printdir.pyc
Compiling ./zipfile_pyzipfile.py
Adding ./zipfile_pyzipfile.pyc
Compiling ./zipfile_read.py
Adding ./zipfile_read.pyc
Compiling ./zipfile_write.py
Adding ./zipfile_write.pyc
Compiling ./zipfile_write_arcname.py
Adding ./zipfile_write_arcname.pyc
Compiling ./zipfile_write_compression.py
Adding ./zipfile_write_compression.pyc
Compiling ./zipfile_writestr.py
Adding ./zipfile_writestr.pyc
Compiling ./zipfile_writestr_zipinfo.py
Adding ./zipfile_writestr_zipinfo.pyc
__init__.pyc
zipfile_append.pyc
zipfile_getinfo.pyc
zipfile_infolist.pyc
zipfile_is_zipfile.pyc
zipfile_namelist.pyc
zipfile_printdir.pyc
zipfile_pyzipfile.pyc
zipfile_read.pyc
zipfile_write.pyc
zipfile_write_arcname.pyc
zipfile_write_compression.pyc
zipfile_writestr.pyc
zipfile_writestr_zipinfo.pyc

Imported from: zipfile_pyzipfile.zip/zipfile_pyzipfile.pyc
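For a self-contained version of the same idea, the sketch below writes a one-line module to a temporary directory, adds it with PyZipFile.writepy(), and imports it from the archive through sys.path. The module name zip_import_demo is hypothetical. The sketch assumes Python 3, where writepy() stores the compiled file in the archive under the plain name zip_import_demo.pyc rather than in a __pycache__ directory.

```python
import os
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'zip_import_demo.py')
with open(source, 'w') as f:
    f.write('ANSWER = 42\n')

archive = os.path.join(workdir, 'modules.zip')
with zipfile.PyZipFile(archive, mode='w') as zf:
    zf.writepy(source)  # compiles the module and adds the .pyc
    names = zf.namelist()
print(names)

# The source directory is not on sys.path, so the import must come from the zip.
sys.path.insert(0, archive)
import zip_import_demo
print('Imported from:', zip_import_demo.__file__)
```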


References: