tag:blogger.com,1999:blog-5440028356946346379.post7186452976159160099..comments2008-05-18T20:11:12.144-04:00Comments on Doug Hellmann: PyMOTW: heapqDoug Hellmannhttp://www.blogger.com/profile/01892352754222143463noreply@blogger.comBlogger7125tag:blogger.com,1999:blog-5440028356946346379.post-81809684998679237502008-05-18T20:11:00.000-04:002008-05-18T20:11:00.000-04:00Thanks for the tip - I didn't know about that.So I...Thanks for the tip - I didn't know about that.<BR/><BR/>So I looked into their source code and found out that the heapq module stores everything in the opposite order from what I do (root node is smallest element - not the largest one). So when I did nlargest, it simply sorted the whole list!<BR/><BR/>nsmallest, which was "equivalent" to what I was doing, works. <BR/><BR/>I also replaced my function with a nonrecursive one - no difference. So at the end: With psyco my version is much faster than heapq, but without it is much slower...Beetle B.noreply@blogger.comtag:blogger.com,1999:blog-5440028356946346379.post-15218033904887690012008-05-17T16:26:00.000-04:002008-05-17T16:26:00.000-04:00I'm sure a non-recursive key function would have a...I'm sure a non-recursive key function would have an impact.<BR/><BR/>As far as where the heapq source is, try this:<BR/><BR/>$ python<BR/>>>> import heapq<BR/>>>> print heapq.__file__Doug Hellmannhttp://www.blogger.com/profile/01892352754222143463noreply@blogger.comtag:blogger.com,1999:blog-5440028356946346379.post-55607939847568041252008-05-17T16:23:00.000-04:002008-05-17T16:23:00.000-04:00Actually, in a sense, I compared psyco on both. ps...Actually, in a sense, I compared psyco on both. psyco had little impact on the heapq function. <BR/><BR/>I used the algorithms given in the Cormen Intro to Algorithms book. <BR/><BR/>I suspect the reason I got a good speedup with psyco is that a key function I wrote is recursive. I'll try writing a nonrecursive one and see what difference that makes. <BR/><BR/>But in a more general sense, I would think trying to find the top 10 numbers among 10 million of them using heaps should be significantly faster than using sort. In Python, using heapq, it really wasn't much faster (perhaps a speedup of 2 at most) - which made me wonder if there's some inefficiency in the heapq code. <BR/><BR/>Do you know where on my system (Linux) I can find the source for heapq? I honestly don't have a clue.Beetle B.noreply@blogger.comtag:blogger.com,1999:blog-5440028356946346379.post-64438726372288659862008-05-17T16:00:00.000-04:002008-05-17T16:00:00.000-04:00Beetle B., the standard library documentation does...Beetle B., the standard library documentation doesn't talk a lot about performance of heappush vs. heapify, but I wouldn't be surprised to find heapify more efficient.<BR/><BR/>I'm also not surprised to find the code optimzied with psyco runs faster than native Python. That's the point, right? :-)Doug Hellmannhttp://www.blogger.com/profile/01892352754222143463noreply@blogger.comtag:blogger.com,1999:blog-5440028356946346379.post-27008718973856537552008-05-17T14:17:00.000-04:002008-05-17T14:17:00.000-04:00Thanks for highlighting the module. 1. I haven't l...Thanks for highlighting the module. <BR/><BR/>1. I haven't looked at the source code, but usually heappush is slower (O(n*logn) vs O(n) for heapify). It'd be worth experimenting, but I suspect that for large enough n, it would be faster to read the elements into a list, and heapify it rather than heappush one by one. <BR/><BR/>2. Regarding nlargest(), here's a relevant blog <A HREF="http://wordaligned.org/articles/top-ten-tags" REL="nofollow">post</A>. The gist of it is that if you want to get the top n elements, you can either sort the whole list and read them off, or construct a heap and keep popping the largest elements. The assertion is that the latter approach is faster (as you need not sort the whole list, and creating a heap is O(n)).<BR/><BR/>So I decided to try it out. I created a random list of 1 million integers. I then wrote my own (very simple) heap functions - I did not know at the time of the heapq module - and timed it.<BR/><BR/>Then I found out about the heapq module and used nlargest(). I was quite surprised to see that my routine was 2-3 times faster (<B>if</B> I used psyco) than the heapq function. <BR/><BR/>I can send the code over if you want to compare...Beetle B.noreply@blogger.comtag:blogger.com,1999:blog-5440028356946346379.post-84106445765499020002008-05-11T16:26:00.000-04:002008-05-11T16:26:00.000-04:00Thanks, István! By the way, the code to print the...Thanks, István! By the way, the code to print the heap is included in the example source tarball, even though I don't show it here in this post.Doug Hellmannhttp://www.blogger.com/profile/01892352754222143463noreply@blogger.comtag:blogger.com,1999:blog-5440028356946346379.post-16445404272332782482008-05-11T16:20:00.000-04:002008-05-11T16:20:00.000-04:00What a great tutorial! Showing how the tree change...What a great tutorial! Showing how the tree changes after pushing values onto the heap are fantastic. <BR/><BR/>I'm putting "LazyWeb" on notice ... from now on there is no excuse for not understanding how heaps work ;-)István Alberthttp://www.blogger.com/profile/17320536844472897403noreply@blogger.com