Sunday, January 28, 2007

testing regular expressions

I discovered Christof Hoeke's retest program today. This is a very slick use of Python's standard library HTTP server module to package an AJAX app for interactively testing out regular expressions. I used to have a Tkinter app that did something similar, but Christof's is much lighter weight.
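
The general shape of such a tool can be sketched with nothing but the standard library. This is not Christof's code, just a minimal illustration; the `/test` endpoint, its `pattern` and `text` parameters, and the names `run_regex`, `RegexHandler`, and `serve` are all assumptions made for the example.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def run_regex(pattern, text):
    """Return a JSON-friendly summary of every match of pattern in text."""
    try:
        compiled = re.compile(pattern)
    except re.error as err:
        # Report a bad pattern to the page instead of failing the request.
        return {"ok": False, "error": str(err), "matches": []}
    matches = [
        {"span": list(m.span()), "text": m.group(0), "groups": list(m.groups())}
        for m in compiled.finditer(text)
    ]
    return {"ok": True, "error": None, "matches": matches}


class RegexHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical endpoint: /test?pattern=...&text=...
        query = parse_qs(urlparse(self.path).query)
        pattern = query.get("pattern", [""])[0]
        text = query.get("text", [""])[0]
        body = json.dumps(run_regex(pattern, text)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RegexHandler).serve_forever()
```

Calling `serve()` starts the server; a page in the browser would then request `/test` from JavaScript and display the returned matches.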

Now I need to figure out how to package it to run as an app when I double click on it in the Finder, instead of opening the .py file in an editor.

CastSampler.com monitoring feeds

On the plane back from Phoenix this week, I implemented some changes to the way CastSampler.com republishes feeds for the sites a user subscribes to. The user page used to link directly to the original feed so it would be easy to copy it to a regular RSS reader to keep up to date on new shows. That link has been replaced with a "monitor" feed which uses the original description and title for each item, but replaces the link with a new URL that causes the show to be added to your CastSampler queue. The user page still links to the original home page for the feed, so I think I am doing enough as far as attribution and advertisement. Any author information included in the original feed is also passed through to the monitor feed. The OPML file generated for a user's feeds links to these "monitor" feeds instead of the original source, too.
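
The link rewriting can be sketched roughly as follows. The post does not describe the real CastSampler URL scheme, so `QUEUE_URL`, the `feed` and `item` query parameters, and the function name are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real CastSampler URL scheme is not given in the post.
QUEUE_URL = "http://example.com/queue/add"


def make_monitor_item(item, feed_id):
    """Copy a feed item, replacing its link with an add-to-queue URL."""
    monitor = dict(item)  # title, description, and author pass through unchanged
    query = urlencode({"feed": feed_id, "item": item["link"]})
    monitor["link"] = QUEUE_URL + "?" + query
    return monitor
```

Because the item is copied before the link is replaced, the original feed data stays available for the attribution links on the user page.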

The goal of these changes is to make it easy to use a feed-reader such as Bloglines or Google Reader to monitor podcasts from CastSampler. To add an episode to your queue, just click the link in the monitor feed to be directed to the appropriate CastSampler.com page.

By the way, how cool is it to be able to develop a web app on my PowerBook while I'm on a plane? What an age to be alive.

Saturday, January 27, 2007

Adium ChatMonitor

We recently set up our own Jabber server at work. For a short time we had been using an IRC server, but decided for a variety of non-technical reasons to switch to Jabber. The benefit is that I now only have to run one chat client (Adium). The downside is that I miss a feature of Colloquy: a special notification event for when I was mentioned by name.

I searched for a while, but didn't find any way to add such a notification to Adium. I finally hacked something together using an AppleScript triggered for every incoming message. I'm sure there must be a better way to achieve the same results, but this works. Eventually I should learn more about how to develop true OS X apps using Objective-C, and then I can create a real plugin to do the same thing.

Nothing new under the Sun

Or should I say IBM?

It turns out IBM Alphaworks already has a data visualization project called Many Eyes that can render network diagrams as I described in my earlier post. The demos look impressive.

Their UI for adding data requires you to upload from a separate source, which makes the social aspect of my idea more difficult to implement. Perhaps Many Eyes can be used as the visualization front-end for a site that collects the data. Any data uploaded to Many Eyes becomes publicly visible, but that's not an issue since the original site would have similar rules.

The network visualization from Many Eyes is more limited than what I would want to see in a full-featured tool, though. It would be very useful to be able to see the types of relationships (using different colors for edges, etc.). They also point out that since the rendering is done in the browser, it may not be well suited for large data sets or "for networks in which a lot of nodes have a large number of neighbors".

Originally spotted on Boing Boing.

Sunday, January 14, 2007

Visualizing People and Relationships

While I'm thinking about digraphs and visualization, I want to describe another idea for a website I have been mulling over. It would offer a way to see the relationships between people using a digraph rendering engine.

There would be a central organizing theme for a given rendering. It might be the current political scandal, an emergency response plan, a corporate organizational chart, or any other theme by which people are related to each other. Each theme would have a rendering of the current members and their relationships, as a digraph. Users could add people (nodes) and relationships (edges). Relationships could have supporting documentation in the form of URLs (useful for scandal tracking).

The UI would not need to be very complicated. To add information, you just need a simple form with 2 fields for node names, a description of the relationship, and optional URLs to supporting documentation. You could get fancy with auto-completion of the node names, but that's just a detail. Editing a node/edge uses a similarly simple form. Each theme page would also have an RSS feed, of course, of changes.
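
As a rough sketch of the data model behind that form: each theme holds a list of directed edges, and the graph can be written out as Graphviz DOT text for rendering. The class and function names are made up for illustration, and the DOT output assumes names and descriptions contain no double quotes.

```python
class Theme:
    """One organizing theme: people are nodes, relationships are directed edges."""

    def __init__(self, name):
        self.name = name
        self.edges = []  # (source, target, description, urls)

    def add_relationship(self, source, target, description, urls=()):
        # Mirrors the form: two node names, a description, optional URLs.
        self.edges.append((source, target, description, tuple(urls)))

    def people(self):
        names = set()
        for source, target, _, _ in self.edges:
            names.update((source, target))
        return sorted(names)

    def to_dot(self):
        """Render the theme as Graphviz DOT text (no escaping of quotes)."""
        lines = ['digraph "%s" {' % self.name]
        for source, target, description, _ in self.edges:
            lines.append('  "%s" -> "%s" [label="%s"];' % (source, target, description))
        lines.append("}")
        return "\n".join(lines)


def themes_for(person, themes):
    """Alternate view: the names of the themes a person appears in."""
    return [t.name for t in themes if person in t.people()]
```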

It would also be useful to be able to see the themes a node was involved in, as an alternate view. So an individual lawmaker might show up in a theme for a campaign and a general legislative topic.

As with any social site, suppressing malicious input might be tricky. Using the Wikipedia model of allowing anyone to edit anything, flag content as suspicious, and block edits to prevent flame-type wars might be enough.

All of the graphs should be available as image files. The question is, are they rendered on the fly or on some regular basis? That would depend on how expensive the rendering is. Obviously they only need to be re-rendered after a change, so we want to cache the output files.
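
One way to get "re-render only after a change" almost for free is to name each cached image after a hash of the graph description, so an unchanged graph always maps to a file that already exists. This is only a sketch: `render` is a hypothetical callable that turns the graph text into image bytes (for example, a wrapper around Graphviz), and the `.png` extension is an assumption.

```python
import hashlib
import os


def cached_render(dot_text, cache_dir, render):
    """Return the path of the rendered image, rendering only on a cache miss."""
    # The file name depends only on the graph text, so any edit to the
    # graph produces a new name and therefore a fresh rendering.
    digest = hashlib.sha1(dot_text.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, digest + ".png")
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as out:
            out.write(render(dot_text))  # render is a caller-supplied function
    return path
```

Old files are never deleted in this sketch, so a real site would also need some cleanup policy.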

PyUGraph

I am continuing to migrate my old project repositories from CVS to svn. In the process, today, I found some old code I wrote in 2001 (or earlier) to generate input files for daVinci, an old digraph visualization package. It turns out that daVinci has been renamed to uGraph, so when I released the code I updated the module name.

There are now other, possibly better, graph visualization tools available. NetworkX looks very promising. It uses Graphviz, which produces some really nice output. But, daVinci was the first tool I used for doing relationship analysis. I used it to analyze calls between functions in some nasty Perl code I was maintaining. I also used it to analyze the module linkage dependencies in a large C toolkit library, with the idea that we would split the big library up into several smaller .so files for release. And there have been several one-off projects along the way, too. Unfortunately, I seem to have lost most of that code.

Spam Irony

In my spam research today, I came across this link to a blog post discussing POPFile, a POP3 spam filtering tool. I've seen the tool before, and I'm not even sure why I bothered to read the post, but I'm glad I did. This bit from the end caught my eye:

"Steve Shaw is the developer of PopUpMaster Pro, which allows you to add unblockable popups to your web site quickly and easily, specifically designed to sign up subscribers to your list, and fast."

It's good to see that the marketers are not immune to the problem.

Object-Relational Mappers

My friend Steve and I have spent some time discussing object-relational mapping recently, partially initiated by his comments on the ORM features in Django.

"For some reason I've never quite understood, there seems to be an inherent fear of SQL in the web development community, and over the years there have been many efforts to hide the SQL completely (or in the case of Zope, encourage the use of a custom object database instead of a relational database). Personally I'm wary of any form of object relational mapping which works automatically. What I do want is a nice abstraction layer (sometimes called the data access object pattern), so that the code working with objects doesn't know that the objects are actually stored in a relational database."

I tend to agree. I'm confused by the intense need to create a new way to express a relational database schema in Python, Ruby, or any other language. The DDL is a perfectly legitimate way to express the schema of the database. Why not just use it?

We use an ORM like that at work. The whole thing was written several years ago before the SQLObject and SQLAlchemy ORMs were available, of course, or we would be using one of them. The database connection layer scans the database table definitions when it connects for the first time. The base class for persistent objects uses that information to discover the names and types of attributes for classes. We do it all at runtime now, though we have discussed caching the information somehow for production servers (maybe using a pickle or even writing out a Python module during the build by parsing the DDL itself). Scanning the tables doesn't take as long as you would think, though, so it hasn't become a hot-spot for performance tuning. Yet.
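
The runtime-scanning approach can be illustrated with SQLite. This is not the code we use at work, just a minimal sketch of the idea; `scan_columns`, `Persistent`, and `Person` are names invented for the example, and `PRAGMA table_info` is SQLite-specific (other databases expose the same information through their own catalog tables).

```python
import sqlite3


def scan_columns(connection, table):
    """Return {column_name: declared_type} for one table."""
    # The table name is interpolated directly, so it must come from
    # trusted code, never from user input.
    rows = connection.execute("PRAGMA table_info(%s)" % table).fetchall()
    return {row[1]: row[2] for row in rows}


class Persistent:
    """Base class whose attribute names come from the table definition."""

    table = None

    @classmethod
    def columns(cls, connection):
        # Scan once per class and remember the result on the class itself.
        if cls.__dict__.get("_columns") is None:
            cls._columns = scan_columns(connection, cls.table)
        return cls._columns

    def __init__(self, connection, **values):
        for name in self.columns(connection):
            setattr(self, name, values.get(name))


class Person(Persistent):
    table = "person"
```

The class-level cache is also the natural place to plug in a pickled or pre-generated schema later on.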

Steve suggested a slightly different design. Use DDL to define the schema, then convert the schema to base classes (one per table) with a code generator. Then subclass from the auto-generated tables to add "business logic". I'm not sure how well that would work, but it sounds like an interesting idea. If the generated code is going to support querying for and returning related objects, how does it know to use the subclass to create instances instead of the generated class?

I do like the automatic handling of queries for related objects, and the system used by Django is particularly elegant. Two features I especially like are:
  1. The QuerySet isn't resolved and executed until you start indexing into it.
  2. Modifying a QuerySet by applying a filter actually creates a new QuerySet.
This means passing QuerySet instances around is inexpensive, and callers do not have to worry about call-by-reference objects being modified unexpectedly. I need to study SQLAlchemy again, to see how it handles query result sets.
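
Those two properties are easy to see in a toy version. The class below is not Django's implementation, only an illustration of the pattern; `LazyQuery` and the `fetch` callable (which stands in for actually running the SQL) are assumptions made for the example.

```python
class LazyQuery:
    """Toy lazy, immutable query object (an illustration, not Django's code)."""

    def __init__(self, fetch, filters=()):
        self._fetch = fetch             # callable that actually runs the query
        self._filters = tuple(filters)  # accumulated (field, value) conditions
        self._cache = None              # results, once they have been fetched

    def filter(self, **conditions):
        # Return a new object; the original is left unchanged.
        extra = tuple(sorted(conditions.items()))
        return LazyQuery(self._fetch, self._filters + extra)

    def _results(self):
        # The query runs at most once, the first time results are needed.
        if self._cache is None:
            self._cache = list(self._fetch(dict(self._filters)))
        return self._cache

    def __getitem__(self, index):
        return self._results()[index]

    def __iter__(self):
        return iter(self._results())
```

Building and filtering a `LazyQuery` never calls `fetch`; only indexing or iterating does.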

Saturday, January 6, 2007

Blog location change

I've decided to take advantage of the new Blogger feature "Custom Domains" and move my blog under my own domain. This is a much more attractive feature than the older FTP publishing since Blogger still hosts the content for me.

If all goes well, it should be transparent and all of the old URLs should redirect to the new domain.

Friday, January 5, 2007

Entrepreneurial Debt Waivers

The company I work for came out of the Advanced Technology Development Center at Georgia Tech, which is an incubator for small companies run by the university. Among other resources, ATDC provides nice facilities with shared conference and break rooms but private office and lab space. There were a lot of companies in the incubator, at various levels of maturity. There were regular get-togethers and plenty of opportunity to exchange ideas with people down the hall. Our company has since graduated, but the time we spent there meant we didn't have to worry about a lot of little details that come up with a business.

Ed Kohler writes about an idea for VC firms over at Technology Evangelist. The basic idea is to grab students as they graduate (probably before) and set them up so they have no loan debt and a reasonable salary in exchange for a stake in whatever idea they happen to be working on. Kohler's idea takes the normal incubator model, like ATDC, one (or more) steps further by suggesting paying off student loans and covering housing rent as well as office space.

It seems natural to extend this even further and combine the 2 systems. Why not buy an office/apartment building? Offer a variety of apartment sizes to accommodate married and single people, etc. Provide office space, a food court, the works. Some of the space could even be rented to companies that are not part of the VC fund. The point is to pull all of it together into one place to keep the energy level high and make it an attractive place to be in addition to sharing whatever resources can be shared between companies.

Maybe the whole thing is done by renting out floors of a multi-use building that someone else owns under a single lease, then subletting the space (instead of buying the building outright). I tend to think in terms of high-rises because I work in Atlanta. In other areas, you might want a campus or office park. I'm sure there are a lot of ways to structure it.

In the end what you get is a startup "factory" that churns out ideas on a (hopefully) regular basis. You bring a new crop of people in each semester when they graduate. Start small with each new company. Each "startup" is owned by a holding company in the beginning. If it starts to look promising, spin it off to its own company as needed. If an idea isn't panning out, kill it and either move the people to another project or cut them loose and let them try it on their own.

Maybe multiple VC firms would work together to fund the thing - I don't know how well the politics of that would work, but I'm not a VC.

Hmm. This all sounds a lot like the old research labs from before everyone wanted to be out on their own...

page rank

A few months ago when I googled myself, I came up with a variety of random old posts to forums or mailing lists. Most of the information was stale. After a couple of weeks of having this blog online, and just a few days of having my personal site online, those have hit the top of the search results list for "doug hellmann". Somehow that's satisfying.

Tuesday, January 2, 2007

Coder's Block

Logan Koester posted some tips for overcoming Coder's Block.

I get blocked, once in a while, too. In those situations, it almost always comes with the feeling that the problem I am trying to solve is too big. That, in turn, usually stems from not having thought about the problem enough, rather than the other way around.

The development staff at my company is pretty small, so we are all involved in each new feature from "front to back", as it were. I like to start by thinking about the user interaction aspect of the problem. It doesn't make sense to start with the back-end design until you know what the front-end is supposed to do, right? So I think about what operations the user needs to perform, then what inputs are needed to handle them. From there I can work out how many of those inputs should be stored for re-use.

I like to draw diagrams, since I find they are easier to re-assimilate when I come back to a problem after some time. So I may sketch out a few UI screens, or draw a few boxes and arrows to understand the relationships between objects (I use a sort of pidgin UML for that). I also make lists of attributes I might need for classes, since those map to the database schema.

There are plenty of good tools for making such sketches on the computer, but I guess I'm Old School. I find that sitting down with a pen and paper, away from the computer, helps clarify my thoughts. Since I don't have my text editor, the temptation to write code is reduced and I can concentrate on the big picture. And once I have the big picture worked out, the way forward is usually clear.

Monday, January 1, 2007

backing up a blog

Since Blogger doesn't support exporting the contents of a blog without hacking around and republishing it, I decided to throw together a little application to handle the backup based on the feed.

The resulting Python script should work with any feed type, since I used the feedparser module to process the feed, but I have only tested it with this blog's Atom feed.
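
The core of a feed-based backup is small. The following is not the script itself, only a minimal sketch of the approach; `slugify`, `save_entries`, and `backup_feed` are names invented here, and the one-HTML-file-per-entry layout is an assumption.

```python
import os
import re


def slugify(title):
    """Turn an entry title into a safe file name."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def save_entries(entries, directory):
    """Write one HTML file per entry and return the paths written."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for entry in entries:
        title = entry.get("title", "")
        body = entry.get("summary", "")
        # Entries that share a title overwrite each other in this sketch.
        path = os.path.join(directory, slugify(title) + ".html")
        with open(path, "w", encoding="utf-8") as out:
            out.write("<h1>%s</h1>\n%s\n" % (title, body))
        paths.append(path)
    return paths


def backup_feed(url, directory):
    """Parse the feed at url and save every entry it contains."""
    # feedparser is third-party; importing it here keeps the helpers
    # above usable (and testable) without it installed.
    import feedparser
    return save_entries(feedparser.parse(url).entries, directory)
```

Because feedparser normalizes RSS and Atom into the same entry structure, `save_entries` does not need to know which feed type it was given.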

If you are interested, check it out.