Showing posts with label blogging. Show all posts

Sunday, April 13, 2008

Shell history, jigs, & subversion

Everyone else is showing theirs, so here's mine:

$ history|awk '{a[$2]++} END{for(i in a){printf "%5d\t%s\n",a[i],i}}'|sort -rn|head
162 svn
99 ls
80 rtop
69 sudo
63 cd
55 dotest
51 workon
23 make
21 close_branch
21 cl2svn
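
The same tally can be sketched in Python, for anyone who would rather not decode the awk. This is only an illustration: the function name count_commands is made up here, and it assumes input lines shaped like the output of bash's history command (an entry number followed by the command).

```python
from collections import Counter


def count_commands(history_lines, top=10):
    """Count the first word of each command line, most frequent first.

    Assumes each line looks like bash `history` output:
    an entry number, then the command and its arguments.
    """
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        if len(fields) >= 2:
            # fields[0] is the history number, fields[1] is the command name
            counts[fields[1]] += 1
    return counts.most_common(top)


def format_counts(pairs):
    """Format (command, count) pairs the way the awk printf does."""
    return ["%5d\t%s" % (count, command) for command, count in pairs]
```

Feeding it the lines from a saved copy of the history output and printing the result of format_counts should give a list similar to the one above.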


Software Jigs:

Does it say anything in particular about me that half of those commands are aliases or scripts I or my co-workers have created to wrap up other tools?

rtop - is a bash alias to change directory to the top of my sandbox. I have an environment variable pointing there, too, but I guess I don't like typing $.

dotest - is an alias to run tests with our tracing module turned on, preserving the output in the same log file each time. We have a very verbose trace module that prints function inputs and outputs as our program executes. It is superior to logging for low-level debugging, but entirely unsuitable for production use (it's easy to turn on and off).

workon - is a shell function that swaps out different sandboxes so I can work on multiple branches on the same system. Our test framework requires an installed version of the whole system, unfortunately, and I don't like to mix patches from multiple branches by copying files into the install tree. Running workon rearranges symlinks so I can replace the install tree with the build tree from my sandbox of choice. Shell functions are an under-appreciated implementation technique for something that has to operate on the current environment (workon changes directory to the new sandbox) but is more complicated than what would fit in an alias.
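
The symlink step on its own can be sketched in a few lines of Python. This is not the workon function, which isn't shown here. The name point_install_at and both path arguments are hypothetical, and the sketch assumes the install location is a single symlink. It also cannot change the directory of the calling shell, which is exactly why the real thing has to be a shell function.

```python
import os


def point_install_at(install_link, build_tree):
    """Repoint the symlink at install_link so it refers to build_tree.

    Both arguments are hypothetical paths used for illustration.
    """
    tmp_link = install_link + ".tmp"
    if os.path.lexists(tmp_link):
        # Clean up a leftover temporary link from an interrupted run.
        os.remove(tmp_link)
    os.symlink(build_tree, tmp_link)
    # Renaming the new link over the old one swaps it in a single step
    # on POSIX systems, so the install path never disappears.
    os.replace(tmp_link, install_link)
```

Building the new link under a temporary name and renaming it avoids a window where the install path does not exist.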

close_branch - is a bash script that takes a short branch name and deletes the branch and any "rebase" branches based on it using the long URL. We have a whole set of little scripts like this that we've written in house.

cl2svn - finds changes in ChangeLog files in my svn sandbox, extracts the new messages, and produces a single (sorted) output list formatted nicely to show up in trac. We use ChangeLog files and trac commit messages as part of the documentation for our code review process, so having everything formatted nicely is important. I used to do this by hand, but after one particularly large changeset I came up with this Python app to do the work for me.
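
A rough sketch of the extraction step, not the actual cl2svn code: it assumes the changes come from the text that "svn diff" prints for the ChangeLog files, and the function name new_changelog_text is invented for the example. The trac formatting is left out because the post doesn't describe it.

```python
def new_changelog_text(diff_text):
    """Collect the added lines for each file in `svn diff` output.

    Returns a list of (path, added_lines) pairs sorted by path.
    Assumes the "Index: <path>" headers that svn diff emits.
    """
    messages = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("Index: "):
            current = line[len("Index: "):]
            messages[current] = []
        elif current and line.startswith("+") and not line.startswith("+++"):
            # Drop the leading "+" that marks an added line. The "+++"
            # check skips the file header (and, as a known limitation,
            # any added line whose own text starts with "++").
            messages[current].append(line[1:])
    return sorted(messages.items())
```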

Wrapping Subversion:

I mentioned close_branch as a subversion wrapper. There's a make_branch script, too, to save from making typos in long URLs.

Another shell function, mksbox, finds a free sandbox in my pool and switches it to use a particular branch. Our build tree is pretty large, so it is way more efficient to just keep a bunch of sandboxes around and switch them to point to different branches with "svn switch" instead of checking out a full copy every time.

My favorite, though, is merge_branch, which figures out the start point of a svn branch and merges all of the changes from that branch into the current sandbox. I'm a little surprised that make_branch and merge_branch didn't show up higher in the list, but they're in the top 20.
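
One common way to find where a branch started is to run "svn log -q --stop-on-copy" against the branch URL and take the oldest revision it reports. The sketch below parses that output. It is an assumption about how such a script could work, not the merge_branch code itself, and branch_start_revision is a made-up name.

```python
import re

# Matches the "r1234 | user | date" lines printed by `svn log`.
REVISION_LINE = re.compile(r"^r(\d+) \|")


def branch_start_revision(log_text):
    """Return the oldest revision number found in `svn log` output."""
    revisions = [
        int(match.group(1))
        for match in map(REVISION_LINE.match, log_text.splitlines())
        if match
    ]
    if not revisions:
        raise ValueError("no revisions found in log output")
    return min(revisions)
```

With the start revision in hand, the merge is something like "svn merge -r START:HEAD BRANCH_URL ." run from the top of the sandbox.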

We wrote these wrapper scripts a couple of years ago, when we switched from CVS to svn. We had similar tools for CVS, but branching worked differently and we didn't use branches as often then. Now every ticket gets its own branch, so managing branches is a daily operation. A typical development cycle for me looks something like this:


$ make_branch 6583 # that's a trac ticket number
$ mksbox 6583 # automatically does a workon for that sandbox
$ dcctl restart # restart our daemon services to pick up the sandbox change
# add feature or remove bug
# update ChangeLog files
$ cl2svn | tee changes.txt
$ svn commit -F changes.txt
# request code review for changeset
$ prepare4commit.sh # switch current sandbox to trunk & merge in the branch
$ docommit # commit, using the first line of changes.txt for log message
$ close_branch 6583 # clean up after myself


When we switched off of CVS, we had some particular needs that weren't met by svn directly (especially the way we do code reviews). There are a whole host of tools for wrapping svn out there now. sv-subversion looks interesting, but I haven't tried it. If our code didn't make assumptions about the install path, we could probably just use DivmodCombinator, which looks like it has a lot of the features we've rolled ourselves, but the inertia for changing now is pretty high, and the benefits aren't great enough.

Sunday, April 6, 2008

One year of "The Python Module of the Week"

It's a bit passé to recognize blogging anniversaries, but as it's my first I'm going to do a little navel gazing retrospecting anyway. :-)

I just realized this afternoon that I had missed celebrating the first anniversary of PyMOTW by a few weeks. I started the series as an excuse to force myself to write something once a week. At the time, it seemed like a somewhat lame idea and I wasn't sure I would keep it up. There are any number of reference guides for the standard library out there. Sitting down to read through one isn't that exciting, though, so I thought writing example code with all of the modules would be a way to force myself to actually study the modules I didn't use on a regular basis.

The first real post, from 25 March 2007, covered the fileinput module. It wasn't until several posts into the series that I started collecting and releasing the code through PyPI, so the version number for the source package is only up to 1.48 even though I've done more than 52 weeks' worth of modules. (The os module took 4 weeks, so I haven't done as many modules as weeks of posts.)

My Writing Process:

My process for creating the posts has changed substantially over the last year. The first few posts were posted through the web form on blogger.com. They consisted of a lot of hand-edited HTML combined with output from the web version of pygments (used to highlight the syntax in the code examples). I also used to write the prose for each post first, and the code samples later.

Now, I have the entire process reversed. I work through all of the code examples before writing any prose. The code comes more quickly, and I can revise and refactor it so the examples work together without having to go back and edit the rest of the text. Once I have the code finished, I use a combination of shortcuts I've built for TextMate and MarsEdit to assemble the post and write the prose portions. It takes me a lot less time to create a single post now that I've refined the workflow. The post on the operator module from today, for example, only took a couple of hours (with interruptions). It is a little skimpy on prose, though.

Future Plans:

Since the beginning, I've had a fair number of comments (online and off) from people who tell me that the posts have been personally useful to them. I appreciate that sort of feedback, and it motivates me to keep going. I'm running out of the "simple" modules, and as I've also started working on Python Magazine over the past year, I don't actually have the same amount of free time any more. Having a bit of extra motivation will spur me to pick up some of the bigger modules like email and elementtree.

At the rate I'm going, I'm not going to finish the whole library before Python 3.0 comes out, and the current plans call for some modules to be removed, deprecated APIs to be dropped, and other sorts of changes. The rules say some modules can even be renamed. When that settles down and there is an actual release, I'll probably stop writing about 2.x and pick up with 3.0. I haven't decided yet, though.

Python Module of the Week Home


Wednesday, November 14, 2007

new version of LinkingToMe

Version 0.3 of LinkingToMe remembers the link history and highlights new links since the previous run in bold.

Sunday, November 4, 2007

See who is linking to you

Google's Webmaster Tools site provides a reporting feature to let you see who is linking to you. Unfortunately, the report is organized the opposite way from how I want to read it: it lists the remote links for each of your local pages, and I want to see all of the local pages linked from a remote site grouped together. That helps me recognize trends and identify people who might be blogging about what I write here.

Luckily, in addition to the interactive report on the tools web site, you can download the data in a CSV file to be manipulated in any way you want. I put together a little script to produce an HTML file which shows what sites link to me, and the target links on my site. For example, I was a bit surprised to discover several links on an Israeli site. Here's a segment of the output of LinkingToMe:



There are a lot of links on del.icio.us to the PyMOTW articles, and while that's cool it isn't very useful in this report. I included an option to filter out sites by their hostname, and included several bookmarking sites as defaults.
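
The regrouping and filtering can be sketched roughly as follows. This is not the LinkingToMe source. The column order of the CSV (local page first, linking page second), the function names, and the default ignore list are all assumptions made for the example.

```python
import csv
from collections import defaultdict
from urllib.parse import urlparse

# Assumed default; the script's real list of bookmarking sites is not shown.
DEFAULT_IGNORED_HOSTS = frozenset(["del.icio.us"])


def read_rows(path):
    """Load the downloaded CSV file as a list of rows."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def group_by_remote_site(rows, ignored_hosts=DEFAULT_IGNORED_HOSTS):
    """Map each remote hostname to the sorted local pages it links to.

    Assumes row[0] is the local page and row[1] is the remote page.
    Rows without a parsable remote URL (such as a header) are skipped.
    """
    sites = defaultdict(set)
    for row in rows:
        if len(row) < 2:
            continue
        local_page, remote_page = row[0], row[1]
        host = urlparse(remote_page).hostname
        if host is None or host in ignored_hosts:
            continue
        sites[host].add(local_page)
    return {host: sorted(pages) for host, pages in sorted(sites.items())}
```

Keying on the remote hostname is what flips the report around: one entry per linking site, with every local page it points to gathered underneath.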

So far I'm not doing any other processing on the data (such as downloading the titles on those remote pages). Perhaps when I have a little more time I'll enhance the script.

[Corrected country of origin for whatsup.co.il.]

Sunday, May 13, 2007

blogs and preservation

Ms. PyMOTW sent me a link to a survey that the UNC-Chapel Hill School of Information & Library Science is conducting, called "Blogger Perceptions on Digital Preservation". If you blog, you might want to go participate. They ask thoughtful questions, and it only takes 5-10 minutes.

MarsEdit Test

This is a test publishing through MarsEdit.

Monday, May 7, 2007

clip-to-blog

I've noticed Jeremy Wagstaff using this service for a while, so I thought I would give it a try, too.
clipped from addons.mozilla.org

With Clipmarks, you can clip the best parts of web pages. Whether it's a paragraph, sentence, image or video, you can capture just the pieces you want without having to bookmark the entire page.



Updated to fix the embedded HTML which Clipmarks apparently doesn't like.

Sunday, May 6, 2007

ironic enough?


I can't decide if I should order one of these t-shirts from Alexa or not.

Saturday, March 31, 2007

Testing pygments

This is a test post to experiment with the code highlighting output from pygments.org (as recommended by a couple of commenters on my previous post). Pygments produces HTML with CSS-based styling, so I have added a bunch of new styles to my blogger template. And I am including as a sample the same Python code posted earlier with the alternative syntax highlighting tool.

    def main(self, *m3ufilenames):

        self.startRSS()
        self.generateChannelInfo()

        for line in fileinput.input(m3ufilenames):
            mp3filename = line.strip()
            if not mp3filename or mp3filename.startswith('#'):
                continue
            self.generateItem(mp3filename)

        self.endRSS()

        return 0


So, let me know what you think of the 2 methods, and which looks better.

Sunday, March 25, 2007

Converting Python source to HTML

For my PyMOTW series, I have found that I want to convert a lot of Python source code to HTML. In a perfect world it would be easy for me to produce pretty XML/HTML and use CSS, but it is not obvious how to use CSS from Blogger. Instead, I am using a CLI app based on this ASPN recipe which produces HTML snippets that I can paste directly into a new blog post. The output HTML is more verbose than I wanted, but I like the fact that it has no external dependencies.

If you have any alternatives, I would appreciate hearing about them.

Saturday, February 10, 2007

How NOT to Backup a Blogger Blog

Over at the Google Operating System blog, they offer a way to "backup" your blog. It is mostly a manual hack to load the entire blog into one page in a web browser, then save the resulting HTML, though a similar technique is offered for saving the contents of your XML feed.

There are a few problems with this technique:

  1. It depends on knowing how many posts are in the blog, up front.
  2. The steps and tools given are manual.
  3. Comments are handled separately.

A backup needs to be automated. If I have to remember to do something by hand, it isn't going to be done on a regular basis. I want to add to my blog without worrying about how many posts there are and tweaking some backup procedure that depends on knowing all about the content of the blog up front. I want comments saved automatically along with each post, not in one big lump. And if I need to import the data into a database, I want the backup format to support parsing the data easily.

What to do?

Enter BlogBackup, the unimaginatively named, fully automatic, backup software for your blog. Just point the command line tool at your blog feed and a directory where the backup output should go. It will automatically perform a full backup, including:

  1. Every blog post is saved to a separate file in an easily parsable format, including all of the meta-data provided by the feed (categories, tags, publish dates, author, etc.).
  2. Comments are saved in separate directories, organized around the post with which they are associated. Comments also include all of their meta-data.
  3. The content of blog posts and comments is copied to a separate text file for easy indexing by desktop search tools such as Spotlight.

Since the tool is a command line program, it is easy to automate with cron or a similar scheduling tool. Since it is fully automatic and reads the feed itself, you do not need to reconfigure it as your blog grows. And the data is stored in a format which makes it easy to parse to load into another database of some sort.
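
The one-file-per-post idea can be sketched with nothing but the standard library. This is not the BlogBackup code. It assumes the feed is Atom and that its text has already been fetched, the name save_entries and the file naming scheme are invented for the example, and comments and the plain-text copies are left out.

```python
import os
import xml.etree.ElementTree as ET

# Namespace prefix for Atom elements as ElementTree spells them.
ATOM = "{http://www.w3.org/2005/Atom}"


def save_entries(feed_xml, output_dir):
    """Write each Atom entry in feed_xml to its own file in output_dir.

    Returns the list of paths written. The entry-NNNN.xml naming is
    only an illustration.
    """
    os.makedirs(output_dir, exist_ok=True)
    root = ET.fromstring(feed_xml)
    saved = []
    for number, entry in enumerate(root.iter(ATOM + "entry"), 1):
        path = os.path.join(output_dir, "entry-%04d.xml" % number)
        # Each entry keeps all of its child elements (dates, author,
        # categories), so the feed's meta-data is preserved as-is.
        ET.ElementTree(entry).write(path, encoding="utf-8", xml_declaration=True)
        saved.append(path)
    return saved
```

Because the function walks whatever entries the feed contains, nothing needs to be told how many posts there are.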

So, go forth and automate.

Sunday, February 4, 2007

Better blogger backups

I have enhanced the blog backup script I wrote a while back to automatically find and include comments feeds, so comments are now archived along with the original feed data. The means for recognizing "comments" feeds may make the script work only with blogger.com, since it depends on having "comments" in the URL, but it does what I need for now.

Saturday, January 6, 2007

Blog location change

I've decided to take advantage of the new Blogger feature "Custom Domains" and move my blog under my own domain. This is a much more attractive feature than the older ftp publishing since Blogger still hosts the content for me.

If all goes well, it should be transparent and all of the old URLs should redirect to the new domain.