Sunday, April 20, 2008

tools for literate programming with Python?

I've had an idea kicking around in my head for a couple of days. It's one of those things I just can't seem to let go of, but I don't really have time to build it right now. I'm hoping someone out there has already had the same idea and written something that works sort of like what I want. Failing that, maybe someone looking for a project to start will like this idea.

I've seen discussions in the past of something called "literate programming". My understanding of the gist of the idea is that you write prose and code together in a file, then use a set of tools to split them apart as part of a build process. The benefit is supposed to be exceptionally well-documented code, since you're essentially writing the documentation first and the code later. I never really bought into that idea; it seemed like a lot of extra overhead. Unless you're writing libraries to be shared by other developers, there's just not a need for so much documentation.

On the other hand, if what you're doing is writing about code, then it seems like a great idea. I've been doing a lot of writing about code lately, so I want to see if I can improve my tools. I'm specifically thinking of this for working on my Python Module of the Week series, but it could be useful in other areas as well. The way I blog now means that I have to make sure I regenerate all examples before posting, in case I've edited any source files along the way. So my cycle goes: read docs, write code, write description (pasting in code and examples), change code, fix pasted code and examples, repeat until done.

A quick Google search turned up a few tools I need to look into, but I'm not sure what I envision is literate programming as originally defined.

I want a tool that lets me write prose and code in the same file, then extract the code for use separately, but also run the code and re-process the input file. The idea would be to edit a single file, mark the sections that are code, mark an output area for each code block. I would then use a tool to extract and run the code with the output inserted back into the original file (replacing any output from previous runs, of course). Ideally the source file would be HTML or something close, since I have to convert to that for posting anyway. I would rather not have to learn some random new markup language that I could only use with this one tool, but something like Markdown or reST would be ok.
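To make the idea concrete, here is a minimal sketch of the extract, run, and re-insert cycle. Everything specific in it is an assumption for illustration: the `<pre class="code">` and `<pre class="output">` markers, the function names `run_snippet` and `refresh_outputs`, and the use of a regular expression instead of a real HTML parser. It is not an existing tool, just one way the pieces could fit together.

```python
import html
import re
import subprocess
import sys

# Hypothetical markup: a <pre class="code"> block followed directly by a
# <pre class="output"> block that receives whatever the code prints.
# The (?:(?!</pre>).)* pattern keeps a match from running past the end
# of the code block when no output block follows it.
BLOCK = re.compile(
    r'(?P<head><pre class="code">(?P<code>(?:(?!</pre>).)*)</pre>\s*'
    r'<pre class="output">)'
    r'(?:(?!</pre>).)*'
    r'(?P<tail></pre>)',
    re.DOTALL,
)


def run_snippet(source):
    """Run source in a fresh interpreter and return stdout plus stderr."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout + result.stderr


def refresh_outputs(document):
    """Return document with every output block regenerated from its code."""
    def replace(match):
        # Code stored in HTML is escaped, so undo that before running it
        # and escape the captured output again before putting it back.
        code = html.unescape(match.group("code"))
        output = html.escape(run_snippet(code))
        return match.group("head") + output + match.group("tail")

    return BLOCK.sub(replace, document)
```

Reading the source file, passing its text through `refresh_outputs()`, and writing the result back would replace any output left over from a previous run. Running each snippet in a separate interpreter keeps one example's state from leaking into the next, at the cost of some startup time per block.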

It doesn't seem like it would be that complicated to write a tool to do what I want using a library like BeautifulSoup to find the source blocks and output destinations. And I could set up my own trigger in TextMate to run it, so I wouldn't have to change editors. Before I spend a bunch of time implementing something I could just download, though, I thought it would be prudent to ask a couple of Dear Lazy Web questions:

1. Does something like what I describe exist?

2. If not, what literate programming tools for Python do you recommend? I may use them as inspiration for a design.

Thanks in advance for any suggestions.

12 comments:

Anonymous said...

This sounds a lot like Leo, which is a programmer's editor which also happens to be written in Python. You basically write text organized like an outline. Some chunks of text can be code, others just text. You can save the code blocks in a whole outline so that it is in a .py file which Python can execute, or you can execute code blocks directly in Leo which allows you to extend the functionality of the editor quite easily once you get the hang of a few internal objects. Leo also supports formal literate programming as defined years ago, but it doesn't force you to work that way if you don't want to.

I especially like the fact that you can clone an entire chunk of the outline without actually making copies of the contents, in other words like a pointer. This lets you reassemble code blocks from a document and put them in the order that the code requires. But you still have only one true copy so the document and .py file stay in sync.

Have a look at http://webpages.charter.net/edreamleo/front.html for more info.

jtauber said...

A few years ago I wrote about versioned literate programs as ways of writing tutorials but I never implemented a system. See Versioned Literate Aspect-Oriented Programming from 2004 and Revisiting Versioned Literate Programming from 2005.

Zed A. Shaw said...

Well, I'm working on Idiopidae which does the inverse: prose and code are in separate files and it merges them together for the final output.

It doesn't care what the text is, you just tell it how to format things (html, latex, whatever pygments supports).

It wouldn't be hard to give it an exec action that would take code you put in, run it, and then replace the output with the results.

Let me hack a bit tonight and see what I can come up with.

Project at http://www.zedshaw.com/projects/idiopidae/

Chris Arndt said...

Hi Doug,

incidentally, we will mentor a Google Summer of Code project whose goal is to create a tool that seems very much like what you are looking for. To get some ideas about what we (the TurboGears team) are planning, have a look at the project idea [1] and Mark Ramm's blog posts [2] [3] on this topic. More detailed plans for the project will surely emerge soon, when the GSoC projects get underway.

[1] http://tinyurl.com/6yavqm
[2] http://tinyurl.com/67h6vt
[3] http://tinyurl.com/568zcf

Martin said...

It's not a language, but it's a simple tool to extract verbatim blocks from files written in docutils: rst-literals (furius.ca/pubcode).

kib said...

Hi Doug,

have you ever tried PyLit?

http://pylit.berlios.de/

Doug Hellmann said...

@anonymous - I noticed Leo, and it looks interesting, but it also looks like it wants me to learn a special markup. Do I have that wrong?

@jtauber - Thanks for the links, I'll check those out.

@zed - It sounds like Idiopidae is almost the opposite of what I want, but it could work for me. I'll definitely give it a look.

@Chris - Let me know if you need a tester for the GSoC project. :-)

@Martin - rst-literals looks like something I could use to create the tool I want, but I'm still hoping someone else has already written it for me. :-)

@kib - I haven't tried any of the existing literate programming tools, yet. PyLit looks like a good candidate. Does it have the feature to let me include the output of my program in the text?

ToddB said...

I tried Leo as an experiment about a year ago. It works extremely well. I was able to put my unit tests and code in the same Leo file. I created a button that let me generate and run unit tests separately. I was also able to create docs simultaneously using rst, really nice. There isn't really any markup; it uses cweb/noweb internally to combine the files. The Leo file is actually XML. There are a few tags/symbols for things like indentation, language for syntax coloring, and comments. It's very well documented, and the primary developer is amazing about answering questions and adding requested features. You should definitely look at it; imo, it's the most powerful editor ever.

Eric said...

The doctest module can do most of what you request, except for automatically updating the original file.

doctest.testfile() treats an entire file like a docstring, running any lines marked by ">>>" and checking that the output matches the next lines in the file. Unfortunately, it does not seem to accept any other method of marking a code block; however, each code block may be indented more than surrounding text as long as it ends with a blank line.

doctest.script_from_examples() can generate a Python script from the text of a doctest file, stuffing the expected output and other paragraphs into comments.
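A short sketch of both calls, for illustration only. The article text, the temporary file name, and the helper `check_article` are made up for this example; `doctest.testfile()` and `doctest.script_from_examples()` are the standard library functions mentioned above.

```python
import doctest
import os
import tempfile

# Hypothetical article text: ordinary prose with one indented example.
ARTICLE = """\
The sorted builtin returns a new list.

    >>> sorted([3, 1, 2])
    [1, 2, 3]

That is all.
"""


def check_article(text):
    """Write text to a temporary file and run it through doctest.testfile()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "article.txt")
        with open(path, "w") as f:
            f.write(text)
        # module_relative=False tells doctest that path is a plain OS path
        # rather than a path relative to the calling module.
        return doctest.testfile(path, module_relative=False)


# The result is a named tuple with the fields failed and attempted.
results = check_article(ARTICLE)

# script_from_examples() takes the text itself, not a file name. The
# examples become code and the surrounding prose becomes comments.
script = doctest.script_from_examples(ARTICLE)
```

If the output pasted into the article has gone stale, `results.failed` is non-zero and doctest prints a report showing the expected and actual output, which flags the examples that need regenerating even though the file is not rewritten.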

Doug Hellmann said...

@ToddB - The "tags/symbols" Leo uses are what I didn't want to have to learn. I have to keep up with too many markup formats already, so I'm trying not to add another one. It doesn't seem like a special markup should be needed, really. I still do need to give it another look, but I like TextMate and emacs so I'm looking for something stand-alone that would work from either of them.

@Eric - Using doctest might work for examples that aren't long blocks of code, but I structure most of the examples as real modules so it's easy for readers to download them and play around.

hsm said...

I think you need a clearer idea of what Literate Programming is. Try starting with Knuth; it is his idea after all. Look at web and cweb. They won't work for your needs, but they will give you an idea of what LP means. Fast forward to your current notion. Try to separate out wishful thinking and user expectations of magic. If what is left over still seems worth it, then pick and try until you find something you can live with. LP requires discipline, so don't expect a low slope learning curve or everything to happen in a day and a half. Good luck. BTW I've been using LP since cweb, although I switch around as my needs are multi-lingual.

Doug Hellmann said...

@hsm - I'm not really clear what you mean about "wishful thinking". Is it wishful to ask if something I want already exists so I don't have to write it myself? Isn't that how lots of software projects start?

To be clear, I'm definitely not trying to do Literate Programming. In fact, I'm pretty sure what I want is not what Knuth meant for LP to be, and I'm entirely comfortable with that. LP is not what I need, but it's similar enough to what I want that I thought one or more LP tools might be useful.

I haven't made any progress on this, though, since I've been too busy with my existing projects to start a new one (another reason I hoped someone else had already done what I was describing).