Showing posts with label BlogBackup. Show all posts

Saturday, February 10, 2007

How NOT to Back Up a Blogger Blog

Over at the Google Operating System blog, they offer a way to "back up" your blog. It is essentially a manual hack: load the entire blog into a single page in a web browser, then save the resulting HTML. A similar technique is offered for saving the contents of your XML feed.

There are a few problems with this technique:

  1. It depends on knowing how many posts are in the blog, up front.
  2. The steps and tools given are manual.
  3. Comments are handled separately.
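An automated tool does not need to know the post count up front; it can keep requesting pages of the feed until an empty page comes back. Below is a minimal sketch of that idea, not BlogBackup's actual code. It assumes Blogger's feeds accept the start-index and max-results query parameters, and fetch_page is a hypothetical caller-supplied function that downloads one page and returns its entries.

```python
from typing import Callable, Iterator, List
from urllib.parse import urlencode


def iter_all_entries(
    feed_url: str,
    fetch_page: Callable[[str], List[dict]],
    page_size: int = 25,
) -> Iterator[dict]:
    """Yield every entry in a paged feed without knowing the post count.

    fetch_page is a hypothetical function supplied by the caller: it
    downloads one page of the feed and returns that page's entries.
    """
    start = 1  # assumption: start-index is 1-based, as in Blogger's feeds
    while True:
        query = urlencode({"start-index": start, "max-results": page_size})
        sep = "&" if "?" in feed_url else "?"
        entries = fetch_page(f"{feed_url}{sep}{query}")
        if not entries:
            break  # an empty page means we have walked past the last post
        yield from entries
        start += len(entries)
```

Because the loop stops on the first empty page, the same procedure works whether the blog has ten posts or ten thousand.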
A backup needs to be automated. If I have to remember to do something by hand, it isn't going to be done on a regular basis. I want to keep adding to my blog without tracking how many posts there are or tweaking a backup procedure that depends on knowing the blog's contents up front. I want comments saved automatically along with each post, not in one big lump. And if I need to import the data into a database, I want the backup format to be easy to parse.

What to do?

Enter BlogBackup, the unimaginatively named, fully automatic backup software for your blog. Just point the command line tool at your blog feed and at a directory where the backup output should go. It will automatically perform a full backup, including:
  1. Every blog post is saved to a separate file in an easily parsable format, including all of the meta-data provided by the feed (categories, tags, publish dates, author, etc.).
  2. Comments are saved in separate directories, organized around the post with which they are associated. Comments also include all of their meta-data.
  3. The content of blog posts and comments is copied to a separate text file for easy indexing by desktop search tools such as Spotlight.
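The layout described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not BlogBackup's real output format: posts are plain dicts (similar in spirit to feedparser entries) with hypothetical "id", "title", and "content" keys, metadata is written as JSON, and the file and directory names are made up for the example.

```python
import html
import json
import os
import re
from typing import Dict, List


def safe_name(entry_id: str) -> str:
    """Turn a feed entry id (often a URL or tag: URI) into a file name."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", entry_id).strip("_")


def save_post(entry: Dict, comments: List[Dict], out_dir: str) -> str:
    """Write one post, its comments, and a plain-text copy under out_dir.

    Hypothetical layout:
      out_dir/<post>.json             the post with all of its metadata
      out_dir/<post>.txt              title and body text for desktop search
      out_dir/<post>-comments/N.json  one file per comment
    """
    name = safe_name(entry["id"])
    os.makedirs(out_dir, exist_ok=True)

    # Full metadata as JSON, so it is easy to parse and load elsewhere later.
    with open(os.path.join(out_dir, name + ".json"), "w", encoding="utf-8") as f:
        json.dump(entry, f, indent=2, ensure_ascii=False)

    # Plain-text copy for indexers such as Spotlight; the tag stripping is crude.
    body = html.unescape(re.sub(r"<[^>]+>", " ", entry.get("content", "")))
    with open(os.path.join(out_dir, name + ".txt"), "w", encoding="utf-8") as f:
        f.write(entry.get("title", "") + "\n\n" + body + "\n")

    # Comments go in a directory named after the post they belong to.
    comment_dir = os.path.join(out_dir, name + "-comments")
    os.makedirs(comment_dir, exist_ok=True)
    for i, comment in enumerate(comments, start=1):
        with open(os.path.join(comment_dir, f"{i}.json"), "w", encoding="utf-8") as f:
            json.dump(comment, f, indent=2, ensure_ascii=False)
    return name
```

Keying every file on the entry id means a re-run overwrites the same files instead of piling up duplicates.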
Since the tool is a command line program, it is easy to automate with cron or a similar scheduling tool. Since it is fully automatic and reads the feed itself, you do not need to reconfigure it as your blog grows. And the data is stored in a format that is easy to parse and load into another database.
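As an example of loading a backup into another database, here is a minimal sketch using Python's built-in sqlite3 module. It assumes, hypothetically, that each post was saved as a JSON file with "id", "title", and "content" keys in the top level of the backup directory; the actual BlogBackup file format is not spelled out here, so adjust the parsing to match.

```python
import glob
import json
import os
import sqlite3


def load_posts(backup_dir: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Load every top-level *.json post file in backup_dir into SQLite.

    Assumes (hypothetically) that each file holds one JSON object with
    at least an "id" key, plus optional "title" and "content".
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (id TEXT PRIMARY KEY, title TEXT, content TEXT)"
    )
    # The pattern does not descend into subdirectories, so comment files are skipped.
    for path in sorted(glob.glob(os.path.join(backup_dir, "*.json"))):
        with open(path, encoding="utf-8") as f:
            post = json.load(f)
        # INSERT OR REPLACE keeps repeated (e.g. cron-driven) loads idempotent.
        conn.execute(
            "INSERT OR REPLACE INTO posts (id, title, content) VALUES (?, ?, ?)",
            (post["id"], post.get("title", ""), post.get("content", "")),
        )
    conn.commit()
    return conn
```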

So, go forth and automate.

Sunday, February 4, 2007

Better Blogger backups

I have enhanced the blog backup script I wrote a while back to automatically find and include comments feeds, so comments are now archived along with the original feed data. Because the script recognizes a "comments" feed by looking for "comments" in its URL, it may only work with blogger.com. Still, it does what I need for now.
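The URL heuristic described above might look something like this sketch, which uses the standard library's ElementTree instead of feedparser. It scans the link elements of one Atom entry and keeps feed links whose URL contains "comments". Restricting the match to the application/atom+xml type is an extra assumption added here so that HTML comment pages are not fetched as feeds.

```python
import xml.etree.ElementTree as ET
from typing import List

ATOM = "{http://www.w3.org/2005/Atom}"


def comment_feed_urls(entry: ET.Element) -> List[str]:
    """Return Atom feed links on an <entry> whose URL contains "comments".

    Like the heuristic in the post, this depends on the word "comments"
    appearing in the URL, which fits Blogger but may miss other hosts.
    """
    urls = []
    for link in entry.findall(ATOM + "link"):
        href = link.get("href", "")
        # Keep only links that point at a feed, not at an HTML page.
        if "comments" in href and link.get("type") == "application/atom+xml":
            urls.append(href)
    return urls
```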

Monday, January 1, 2007

backing up a blog

Since Blogger doesn't support exporting the contents of a blog without hacking around and republishing it, I decided to throw together a little application that handles the backup using the blog's feed.

The resulting Python script should work with any feed type, since I used the feedparser module to process the feed, but I have only tested it with this blog's Atom feed.
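Part of what makes a feed library useful here is that it hides the differences between feed formats. The sketch below is a deliberately small, standard-library stand-in for that idea (it is not feedparser's API, which does far more): it reads either an Atom or an RSS 2.0 document and returns entries in one common shape. The dictionary keys are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"


def normalize_entries(feed_xml: str) -> List[Dict[str, str]]:
    """Return entries from an Atom or RSS 2.0 document in one common shape."""
    root = ET.fromstring(feed_xml)
    entries = []
    if root.tag == ATOM + "feed":
        # Atom: namespaced <entry> elements directly under <feed>.
        for e in root.findall(ATOM + "entry"):
            entries.append({
                "id": e.findtext(ATOM + "id", ""),
                "title": e.findtext(ATOM + "title", ""),
                "published": e.findtext(ATOM + "published", ""),
                "content": e.findtext(ATOM + "content", ""),
            })
    elif root.tag == "rss":
        # RSS 2.0: un-namespaced <item> elements inside <channel>.
        for item in root.iter("item"):
            entries.append({
                "id": item.findtext("guid", ""),
                "title": item.findtext("title", ""),
                "published": item.findtext("pubDate", ""),
                "content": item.findtext("description", ""),
            })
    else:
        raise ValueError(f"unsupported feed root: {root.tag}")
    return entries
```

Code downstream of a function like this only ever sees one dictionary shape, which is why a backup script built on such a layer should not care which feed format a blog publishes.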

If you are interested, check it out.