PyMOTW: StringIO and cStringIO
Module: StringIO and cStringIO
Purpose: Work with text buffers using a file-like API
Python Version: StringIO: 1.4, cStringIO: 1.5
Description:
The StringIO class provides a convenient means of working with text in memory using the file API (read, write, etc.). There are two separate implementations. The cStringIO module is written in C for speed, while the StringIO module is written in Python for portability. Using cStringIO to build large strings can offer performance savings over some other string concatenation techniques.
Example:
Here are some standard, simple examples of using StringIO buffers:
#!/usr/bin/env python
"""Simple examples with StringIO module
"""
# Find the best implementation available on this platform
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO
# Writing to a buffer
output = StringIO()
output.write('This goes into the buffer. ')
print >>output, 'And so does this.'
# Retrieve the value written
print output.getvalue()
output.close() # discard buffer memory
# Initialize a read buffer
input = StringIO('Initial value for read buffer')
# Read from the buffer
print input.read()
This example uses read(), but of course the readline() and readlines() methods are also available. The StringIO class also provides a seek() method so it is possible to jump around in a buffer while reading, which can be useful for rewinding if you are using some sort of look-ahead parsing algorithm.
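As a rough illustration of the rewinding idea, here is a minimal sketch that peeks at the next line and then seeks back to it. The sample text and variable names are made up for this example, and the extra fallback to io.StringIO is an assumption added so the snippet also runs on Python 3, where the StringIO and cStringIO modules are not available.

```python
# Assumption: fall back to io.StringIO so the sketch also runs on
# Python 3, where the StringIO and cStringIO modules no longer exist.
try:
    from cStringIO import StringIO
except ImportError:
    try:
        from StringIO import StringIO
    except ImportError:
        from io import StringIO

# Hypothetical sample data for the read buffer
buf = StringIO('first line\nsecond line\nthird line\n')

first = buf.readline()    # consume the first line
mark = buf.tell()         # remember where the second line starts
peeked = buf.readline()   # look ahead at the next line
buf.seek(mark)            # rewind so the peeked line can be read again
rest = buf.readlines()    # remaining lines, including the peeked one
```

Saving the position with tell() before the look-ahead, instead of computing an offset by hand, keeps the rewind correct regardless of how long the peeked line turns out to be.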
Real-world applications of StringIO include a web application stack where various parts of the stack may add text to the response, or testing the output generated by parts of a program which typically write to a file.
The application we are building at work includes a shell scripting interface in the form of several command-line programs. Some of these programs are responsible for pulling data from the database and dumping it on the console (either to show the user, or so the text can serve as input to another command). The commands share a set of formatter plugins to produce a text representation of an object in a variety of ways (XML, bash syntax, human-readable, etc.). Since the formatters normally write to standard output, testing the results would be a little tricky without the StringIO module. Using StringIO to intercept the output of the formatter gives us an easy way to collect the output in memory to compare against expected results.
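The testing approach might look something like the sketch below. It is not the actual code from our application: format_as_bash, capture_output, and the sample record are hypothetical names invented for illustration, and the sketch assumes the formatter accepts an optional output stream that defaults to standard output. The fallback to io.StringIO is also an assumption, added so the snippet runs on Python 3.

```python
import sys

# Assumption: fall back to io.StringIO so the sketch also runs on
# Python 3, where the StringIO and cStringIO modules no longer exist.
try:
    from cStringIO import StringIO
except ImportError:
    try:
        from StringIO import StringIO
    except ImportError:
        from io import StringIO


def format_as_bash(record, stream=None):
    """Hypothetical formatter: write NAME='value' lines to a stream."""
    if stream is None:
        stream = sys.stdout  # normal behavior: print to the console
    for key in sorted(record):
        stream.write("%s='%s'\n" % (key.upper(), record[key]))


def capture_output(formatter, record):
    """Run a formatter against an in-memory buffer and return the text."""
    buf = StringIO()
    formatter(record, stream=buf)
    return buf.getvalue()


# Compare the captured text against the expected result, as a test would
captured = capture_output(format_as_bash, {'host': 'db1', 'port': 5432})
expected = "HOST='db1'\nPORT='5432'\n"
assert captured == expected
```

If a formatter cannot take a stream argument, temporarily assigning a StringIO instance to sys.stdout and restoring the original afterwards is another way to collect the same text.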
References:
- The StringIO module ::: www.effbot.org
- Efficient String Concatenation in Python
- Python Module of the Week
Updated 5/20/2007 with technorati tags.
Updated 9/5/2007 with minor formatting changes.

2 comments:
Be careful when switching from StringIO to cStringIO. They're not exactly the same, and I've been bitten by the difference. As it says on the cStringIO page you linked to at the top:
"Unlike the memory files implemented by the StringIO module, those provided by this module are not able to accept Unicode strings that cannot be encoded as plain ASCII strings."
And you are, of course, using Unicode strings for all your data, right? ;)
Good point, Blake, and thanks for the tip. I'm not 100% Unicode-enabled yet. It sounds like I'll have some habits to break before I make it all the way. :-)