Sunday, July 13, 2008
Wednesday, May 28, 2008
Book Review: Einstein: His Life and Universe, by Walter Isaacson
Mrs. PyMOTW gave me a copy of Einstein: His Life and Universe by Walter Isaacson for Christmas last year, and I've finally managed to find time to read it. If you are interested in history, science, and Einstein in particular, I highly recommend the book.
General Notes:
It took a couple of weeks of reading in the evenings, but that was mostly short-burst sessions; the prose flows very smoothly. Isaacson is a good storyteller and has created an engaging view of Einstein as a man and as a scientist.
The book is organized in a semi-chronological way, with some overlapping sections focusing on different aspects of the same time period. This allows Isaacson to tell all of the stories clearly, yet stitch them together by referring back to earlier quotes and events. The end result is a coherent narrative that exposes the personal side of Einstein as much as his professional or public sides. I found this writing style easy to follow and quite effective.
Politics:
Einstein was more politically active than I realized; I learned about his strong ethical nature, and especially his activism against war and racism. His rejection of tribalism and nationalism, along with the regimented militarism of Germany's schools at the time, led him to become a pacifist, and then eventually to support World War II as a fight against fascism. While he had some socialist beliefs, he also rejected the communism practiced in the Soviet Union, since it oppressed the people there. He said, "Any government is evil if it carries within it the tendency to deteriorate into tyranny". After he settled in Princeton, he repeatedly said that he would not live in a country where people lacked the freedom of speech and thought.
From an early age, Einstein supported the establishment of a strong global government as a way to prevent war. After the development of nuclear weapons, he felt even more strongly that a true transnational governing body should have control over such destructive power.
Science:
Of course no biography of Einstein would be complete without descriptions of his major scientific contributions. It is clear that Isaacson enjoyed researching the scientific side of his subject as much as, or more than, his personal life. He uses many of Einstein's own thought experiments to describe the work in terms that are easy for a non-physicist to understand. Although true understanding requires complex mathematics, this book does not.
Other Links:
There has been quite a bit of attention on Einstein lately. Here are a few links to items I found during the time I was reading the book.
Einstein Letter on God Sells for $404,000
Einstein's God (Speaking of Faith from American Public Media)
Einstein's Ethics (Speaking of Faith from American Public Media)
Saturday, December 29, 2007
Book Review: The Definitive Guide to Django

I'm working on a new web-based project, and continuing the process of learning django, so I was very pleased to receive my copy of The Definitive Guide to Django by Adrian Holovaty and Jacob Kaplan-Moss in the mail recently.
Contents:
The book is divided into 3 sections. The first, including chapters 1-8, covers introductory material such as setting up a project, using the template system and database layer, etc. This was familiar material after several readings of the tutorials on the django web site, but it has been cleaned up and organized nicely in the book.
The second section (chapters 9-20) covers "subframeworks" and digs deeper into topics which, while documented online, I've found to be more difficult to "discover". The chapters on generic views, syndication and site-map generation, and caching were particularly helpful. I also appreciated the advice on production deployment in chapter 20. Some of the mystery has been removed from these topics, and I learned about features I didn't even realize existed.
The last section, consisting of 8 separate appendixes, is a reference manual for the various layers of django. All of the topics covered in the main part of the book are included with more concise descriptions of methods and more complete listings for functions or methods not discussed earlier.
My Review:
I'm glad I bought the book. It presents much of the same material you can find online, but having it available in book form made it easier for me to read without being distracted by trying the material at the same time. I like to read, absorb, then try when I'm learning about a new technology, and I find it much easier to read and absorb away from the computer where there is no temptation to try writing code before I'm really ready.
The writing is easy to read, but not dumbed down. Between the holidays and the writing style of the book, I was able to blaze through the whole thing in about a week's time (I admit to skimming parts of the reference section during that initial reading). There is a good mix of factual information and best practices tips with arguments backing up the opinions. You don't have to agree with the suggestions, but you are more informed after reading them.
I've been using the appendixes as a handy reference while working on the templates and database queries for my project, and it has made development quite a bit easier. The online references for django are quite good, but flipping back and forth in the physical book is actually quicker in a lot of cases.
Tuesday, November 13, 2007
Needed: SQL/Database design book recommendation
I need a book to teach someone about basic database design. They don't need relational algebra or calculus, and they don't have to be an expert about highly optimized storage, indexing, or anything like that. They just need some basic normalization, column type selection, and query help for what should be a pretty simple database.
They took a college class on RDBMSes, but the class and accompanying book were both terrible. The book from my class is great, but is more complex than what they want and need. I'm aware of the Dummies and Idiot series books, but I would prefer to avoid those, if possible.
I'd rather not give them something tied to a specific tool (since they haven't selected the tool they are going to use), but as long as the tool is not Access it's ok if the book is vendor-specific. We'll probably end up using PostgreSQL or SQLite for the actual database, but won't be doing anything that should require special features provided by either of those databases.
Does anyone have any recommendations?
Sunday, November 11, 2007
Book Review: Programming Collective Intelligence
The latest book I've been reading as part of the Atlanta Python Users' Group Book Club is Programming Collective Intelligence by Toby Segaran.
Disclosure: My copy of the book was provided free, as part of O'Reilly Media's support for the book club.
My Impressions:
I have to admit, I was a little concerned when I picked up Programming Collective Intelligence that my rusty math skills would be a hindrance to really understanding the material. But all of the statistics or linear algebra needed (not a lot) are explained quite clearly in context (something my college professors could never seem to manage). It did take me longer than it usually does to read a book of this size because this one is crammed full of great material. It has a high information density, but is still a pleasant read. Ending each chapter with a list of exercises you can use to explore the topics presented earlier in more depth was a nice touch.
While the source code is not always as clear as the prose (mostly due to variable name choices), it is presented with plenty of clear, descriptive text. Most of the chapters build the source along with your own knowledge, rather than presenting a large complete program after a lengthy description. In fact, many of the inline examples are created using the Python interpreter command line, making it easy to work along with the text and experiment with the data on your own.
I definitely recommend this book. The algorithms covered are fascinating, and I'm already considering how I can use the optimization techniques to solve a sticky problem we've been trying to address at work.
Book Summary:
The first chapter introduces collective intelligence (combining input from a large group of people to achieve insight) and machine learning (adaptive algorithms which can be trained to perform a task more accurately or make predictions).
Chapter 2 dives right in to building a recommendation engine. The first small example program finds users with similar tastes in movies. This example is used to explore different ways to calculate similarity between data points, and how to use those values to rank other users who have critiqued movies based on how similar they are to you. Critics with taste similar to your own can be used to find a recommendation for a movie you have not seen. These ideas are expanded in a larger example which recommends links from del.icio.us. This is the first of many real example programs throughout the book which use Web 2.0 APIs to pull data from public sites. Chapter 2 closes with a discussion of the pros and cons of user-based vs. item-based filtering and when each is appropriate.
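To make the similarity idea concrete, here is a minimal sketch of one common measure, a Euclidean-distance score, used to rank critics. It is not the book's code; the ratings data and the function names are made up for illustration.

```python
from math import sqrt

# Hypothetical ratings: critic -> {movie: score}. Not data from the book.
ratings = {
    "ann": {"Alien": 5.0, "Brazil": 3.0, "Casablanca": 4.0},
    "bob": {"Alien": 5.0, "Brazil": 3.0, "Casablanca": 4.0, "Dune": 2.0},
    "cat": {"Alien": 1.0, "Brazil": 5.0, "Dune": 4.5},
}


def euclidean_similarity(prefs, a, b):
    """Return 0.0 with no shared items, else a score in (0, 1]."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    distance = sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    # Identical ratings give distance 0 and therefore a score of 1.0.
    return 1.0 / (1.0 + distance)


def most_similar(prefs, person):
    """Rank every other critic by similarity to `person`, best first."""
    scores = [(euclidean_similarity(prefs, person, other), other)
              for other in prefs if other != person]
    return sorted(scores, reverse=True)
```

Calling most_similar(ratings, "ann") puts "bob" first, since the two agree on every movie they have both rated. The movies that the top-ranked critics liked and "ann" has not seen would be the candidates for a recommendation.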
In chapter 3, the similarity calculations developed in chapter 2 are used to build data clustering algorithms (hierarchical, column, and K-Means). The first example groups blogs based on the words which appear in the posts on that blog. The example works through the entire process of breaking the input into words to be counted, all the way to visualization of the clustering results. Sample code for drawing dendrograms using PIL is included. Next, an example using Zebo.com discovers clusters in the preferences people have (Zebo lets users post lists of things they want).
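As a rough illustration of the K-Means idea mentioned above, the sketch below alternates between assigning points to the nearest centroid and moving each centroid to the mean of its members. It is a simplified version that assumes 2-D points and caller-supplied starting centroids (a real run would usually pick them at random); none of the names come from the book.

```python
def kmeans(points, centroids, iterations=10):
    """Cluster 2-D points around the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                         for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters
```

For example, kmeans([(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)], [(0, 0), (10, 10)]) separates the three points near the origin from the three near (8, 8).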
Chapter 4 discusses the challenges experienced when building a full-text search engine. The example code starts out a little confusing because it stubs in the whole API instead of "evolving" the class throughout the chapter. But the discussion is clear, and once the code is complete it makes sense. The discussion of PageRank has especially good examples. Chapter 4 also introduces a simple neural network implementation and shows how to train it to include the click counts for search results in their rankings. The neural network code might have been clearer if it had used a functional programming style, but that might just be a personal preference. In general, the implementation is very straightforward and it should be possible to use it for other purposes. This was my first exposure to neural networks, and they strike me as surprisingly simple for something with such an exotic sounding name.
Chapter 5 covers "stochastic optimization" techniques for selecting the best result from several options in a set. Random searching, hill climbing, simulated annealing, and genetic algorithms are covered. The discussion also includes the limitations of optimization as an approach. Once the basic techniques are explained, the sample flight scheduling application is converted to use live data from Kayak.com.
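Of the techniques listed, hill climbing is the easiest to sketch. The version below is a minimal illustration, assuming solutions are lists of integers within a fixed range and that a lower cost is better; the cost function is a made-up stand-in rather than the flight scheduling example.

```python
def hill_climb(cost, start, lower, upper):
    """Move to the cheapest neighboring solution until none is cheaper."""
    current = list(start)
    while True:
        # Neighbors differ from the current solution by +/-1 in one position.
        neighbors = []
        for i in range(len(current)):
            for step in (-1, 1):
                value = current[i] + step
                if lower <= value <= upper:
                    neighbors.append(current[:i] + [value] + current[i + 1:])
        best = min(neighbors, key=cost)
        if cost(best) >= cost(current):
            return current  # a local minimum, not necessarily the global one
        current = best


# Hypothetical cost function with its single minimum at [3, 7].
def example_cost(solution):
    return (solution[0] - 3) ** 2 + (solution[1] - 7) ** 2
```

Because the search only ever moves downhill, it can get stuck in a local minimum on a bumpier cost surface. That limitation is the usual motivation for simulated annealing and genetic algorithms, which sometimes accept worse solutions in order to escape.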
In chapter 6, various algorithms for classifying documents are covered. A naive Bayesian spam filter is used to examine the challenges of breaking documents up into classifiable "features". There is good coverage of the techniques for limiting false classifications using separate thresholds and a description of how to combine the probabilities for each feature to calculate the probability of the source document belonging in one category or another. The Fisher method, used by SpamBayes, is also discussed. Once the classifier is complete, an example program for filtering blog feeds is built with it. The code samples in chapter 6 start to suffer from abbreviated symbol names, but once you figure out the abbreviations the rest of the structure of the code makes sense.
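The core of a naive Bayesian classifier fits in a few lines. This is a simplified sketch, not the book's implementation: it treats each distinct lowercase word as a feature, uses add-one smoothing instead of the weighting and thresholds described in the chapter, and all class and method names are invented here.

```python
class NaiveBayes:
    def __init__(self):
        self.category_counts = {}  # category -> number of training documents
        self.feature_counts = {}   # (word, category) -> documents containing word

    @staticmethod
    def features(text):
        """Break a document into features: its distinct lowercase words."""
        return set(text.lower().split())

    def train(self, text, category):
        self.category_counts[category] = self.category_counts.get(category, 0) + 1
        for word in self.features(text):
            key = (word, category)
            self.feature_counts[key] = self.feature_counts.get(key, 0) + 1

    def word_probability(self, word, category):
        # Add-one smoothing so an unseen word does not zero out the product.
        count = self.feature_counts.get((word, category), 0)
        return (count + 1) / (self.category_counts[category] + 2)

    def score(self, text, category):
        """Prior for the category times the probability of each feature."""
        prior = self.category_counts[category] / sum(self.category_counts.values())
        likelihood = 1.0
        for word in self.features(text):
            likelihood *= self.word_probability(word, category)
        return prior * likelihood

    def classify(self, text):
        return max(self.category_counts, key=lambda c: self.score(text, c))
```

After training on a handful of labeled messages, classify() returns the category with the highest combined score. A production filter would also want a threshold so that borderline messages are not marked as spam.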
The material in chapter 7, Modeling with Decision Trees, reminded me of an expert systems class I had in college. In class, we had to build our decision trees by hand but chapter 7 shows how to "train" a tree from input data with known outcomes. The material covers methods for splitting the tree into sets based on Gini impurity or Entropy, and then building a tree recursively by repeatedly splitting sets until no more information is gained by having separate nodes in the decision tree. Again, once an example program is built with a simple dataset, the program is enhanced by introducing a web 2.0 site which can provide similar data. In this case, real estate price information from Zillow.com is used. To illustrate how the same decision tree code can be used with completely different types of data, a hotornot predictor is built with data from hotornot.com.
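The two split measures are short enough to write out. The sketch below uses the standard definitions of Gini impurity and entropy, plus a helper that scores one candidate split; the row layout (outcome in the last column) and the function names are assumptions made for this example.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """Chance that two random draws from `labels` disagree."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def entropy(labels):
    """Average number of bits needed to describe one label."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column, value, measure=entropy):
    """Score splitting `rows` on rows[column] == value.

    Each row is a tuple whose last element is the known outcome.
    """
    matches = [r[-1] for r in rows if r[column] == value]
    others = [r[-1] for r in rows if r[column] != value]
    if not matches or not others:
        return 0.0  # the split separates nothing
    weight = len(matches) / len(rows)
    before = measure([r[-1] for r in rows])
    return before - weight * measure(matches) - (1 - weight) * measure(others)
```

A tree builder would try every column and value, keep the split with the highest gain, and recurse on each half until the best available gain is zero.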
Chapter 8 leaves the realm of strict classification and introduces tools for building models that predict the price of an item from multiple variables. As with the earlier chapters, several techniques are presented and their pros and cons are covered in detail. There are plenty of graphs to illustrate the importance of selecting the right number of neighbors for the k-nearest neighbors calculation, for example. This chapter also discusses optimizing the scale of data from heterogeneous variables, and weighting different variables based on how much they affect the outcome. The real world dataset for chapter 8 comes from eBay pricing data.
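A bare-bones k-nearest neighbors estimate looks something like the sketch below. It simply averages the prices of the k closest items and offers a helper for rescaling variables; it leaves out the distance weighting the chapter discusses, and the data layout and names are assumptions for illustration.

```python
from math import sqrt


def knn_estimate(data, query, k=3):
    """Average the prices of the k items nearest to `query`.

    `data` is a list of (features, price) pairs.
    """
    def distance(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    nearest = sorted(data, key=lambda item: distance(item[0], query))[:k]
    return sum(price for _, price in nearest) / len(nearest)


def rescale(data, scale):
    """Multiply each feature by a per-variable factor.

    A factor of 0 removes a variable from the distance calculation entirely.
    """
    return [([x * s for x, s in zip(features, scale)], price)
            for features, price in data]
```

Choosing k is the trade-off the graphs in the chapter illustrate: too few neighbors and the estimate follows noise, too many and it drifts toward the overall average.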
Chapter 9 returns to classification, and covers tools for classifying data where the division of the data can be expressed as a function of 2 or more variables. The problems with decision tree and basic linear classification are discussed in the context of a dating site match-making application. This segues into a discussion of kernel methods for dealing with non-linear classification. The input data for the match-maker uses the distance apart the 2 parties live, as calculated using the Yahoo! Maps API. Although support vector machines are discussed in theory, the actual code for working with them is written using the open source LIBSVM library due to the intense computational requirements. At the end of the chapter the completed match-maker is turned loose on Facebook data to predict "friends".
While the earlier chapters have focused on placing data into categories, chapter 10 covers techniques for discovering categories within the data itself. The first example presented is a tool for finding themes among news items in RSS feeds. The feature extraction technique discussed, non-negative matrix factorization, is implemented using the NumPy libraries for matrix math. The second example uses Yahoo! Finance APIs to examine trading volume for various stocks, looking for relationships.
Chapter 11 introduces a few techniques for genetic programming, evolving applications through trials and mutations. The sample code includes classes to represent programs as trees of data which are easier to mutate than raw text source would be. The chapter explains how to measure the success of any one program tree, apply random changes through mutation and crossover, then evolve a set of programs by identifying and retaining those which are becoming more successful at reaching the desired outcome. The importance of diversity for keeping the result set from reaching a local maximum is stressed. The sample programs include a formula building tool and a player for a simple game.
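To show why a tree is easier to mutate than source text, here is a small sketch that represents a program as nested tuples, evaluates it, scores it against known cases, and mutates it. The book uses classes for this; the tuple layout, the two operators, and every name below are simplifications invented for this example, and crossover is left out.

```python
import random

# A program is a nested tuple: ("add", left, right), ("mul", left, right),
# ("param", index), or ("const", value).
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def evaluate(tree, params):
    """Run the program tree against a list of input parameters."""
    kind = tree[0]
    if kind == "param":
        return params[tree[1]]
    if kind == "const":
        return tree[1]
    return OPS[kind](evaluate(tree[1], params), evaluate(tree[2], params))


def fitness(tree, cases):
    """Sum of absolute errors over (params, expected) cases; lower is better."""
    return sum(abs(evaluate(tree, params) - expected)
               for params, expected in cases)


def mutate(tree, rng, rate=0.2):
    """Return a copy in which some subtrees are replaced by random constants."""
    if rng.random() < rate:
        return ("const", rng.randint(0, 10))
    if tree[0] in OPS:
        return (tree[0], mutate(tree[1], rng, rate), mutate(tree[2], rng, rate))
    return tree


# x * x + 1, written as a tree.
square_plus_one = ("add", ("mul", ("param", 0), ("param", 0)), ("const", 1))
mutant = mutate(square_plus_one, random.Random())
```

An evolution loop would score a population with fitness(), keep the best trees, and fill the next generation with mutated and crossed-over copies of them.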
Chapter 12 wraps up the book with a summary of all of the algorithms and techniques presented earlier, and is intended to serve as a reference. There is less code, but all of the examples use new data so the prose is not just a repetition of what has already been seen. There are additional diagrams to help explain the techniques; in particular, the neural network details are expanded.
Web 2.0 APIs/Sites:
All of the examples throughout the book use either simple flat files for input, or a Web 2.0 API of some sort.
- del.icio.us
- MovieLens/GroupLens
- RSS/Atom
- Zebo
- Kayak
- Akismet
- Zillow
- hotornot.com
- eBay
- Yahoo! Maps
- Yahoo! Finance
Open Source Libraries:
Many of the example programs use open source libraries to process or retrieve the data. All of those libraries are listed, with instructions for retrieving and installing them, in Appendix A.
- Python Imaging Library (PIL)
- BeautifulSoup
- SQLite
- pysqlite
- LIBSVM
- feedparser
- NumPy
- matplotlib
- pydelicious
Technorati Tags:
PyATL, python, books
Sunday, November 4, 2007
Programming Collective Intelligence
I'm reading Programming Collective Intelligence for the PyATL book club this month. I've only started, but am already finding it fascinating. If you've read it, come join the discussion over at our google group.
[Updated Nov 14 - My review is available here.]
Tuesday, September 18, 2007
PyATL Book Club on O'Reilly
Marsee Henon from O'Reilly recently interviewed a couple of us about the Atlanta-area Python Users' Group Book Club. She asked some good questions, and although I'm not entirely comfortable with the group being characterized as an "O'Reilly" book club, O'Reilly does offer us a lot of support (esp. free books) so I guess I shouldn't complain.
If you're interested in participating in a technical book club, you don't have to live in, or even near, Atlanta to join ours. Head over to http://pyatl.org/bookclub and join the Google Group we have set up for our discussions.
Sunday, September 9, 2007
Book Review: RESTful Web Services
As part of the Atlanta Python Users' Group Book Club, I received a copy of RESTful Web Services written by Leonard Richardson and Sam Ruby, and published by O'Reilly Media. When we started the book club, this was the first book I suggested we read. I had previously studied some material on various blogs discussing REST, but I wanted a clear description and more specific examples. The book provided both, and I highly recommend reading it before planning your next web development project.
Overview
Unlike many such books, RESTful does not depend on a single programming language for examples. Much of the code is Ruby, but Python and Java make up a respectable proportion of the material as well. Since I was primarily interested in the design principles and "theory", I did not try to run any of the sample code myself. Others in the book club have, so check the forum for more details if you are interested in that aspect of the book.
The outline of the book follows a well thought-out progression of topics from basic "programmable web" concepts to in-depth discussion of Roy Fielding's Representational State Transfer (REST) ideas and then Resource Oriented Architecture (ROA), a natural extension of REST. Intermediate chapters include discussions of best-of-breed tools for web development and copious example code.
Outline
Chapter 1 is a foundation chapter for the remainder of the book. It describes how the HTTP protocol works and breaks down the different architectural styles discussed in the remaining chapters (REST, RPC, and REST-RPC hybrid). The theme of this chapter, and perhaps the entire book, is that "the human web is on the programmable web". If something is on the web, it is a service.
Chapter 2 introduces the concepts necessary to implement clients using web services. The easily digestible example code (in several languages) implements a client for the del.icio.us bookmarking service. Bookmarks are an excellent choice for an example program, since the information being managed is straightforward and everyone understands the concept, even if they have never used del.icio.us directly. Chapter 2 also includes recommendations for client tools and libraries for common languages. Basic HTTP access, JSON parsers, XML parsers (including details about DOM, SAX, and pull-style parsers and when each is appropriate), and WADL libraries are discussed, with best-of-breed options presented for each language.
In chapter 3, the authors use Amazon's S3 service design to point out features of the REST architecture which make it different from RPC-style APIs. The complexity of the examples increases to match the requirements of the service, including advanced authentication techniques.
Resource Oriented Architecture, introduced in Chapter 4 and discussed in an extended design example used through chapters 5 and 6, is perhaps the most interesting part of the book. ROA is a set of design principles which encourage you to think about your service in a specific way to enable REST APIs. The principles are:
- Descriptive URIs: URIs should convey meaning.
- Addressability: Expose all information via URLs.
- Statelessness: The client maintains the application state so the server does not have to.
- Representations: Resources can have multiple representations, based on format, level of detail, language, and other criteria.
- Connectedness: Link between related resources explicitly within the representations, so the client does not have to know how to build URLs.
- Uniform Interface: Use the HTTP methods (GET, PUT, POST, DELETE) as designed.
To illustrate these principles, in chapters 5 and 6 the authors build a web mapping service, similar to Google Maps. This detailed example also serves as a way to introduce their ROA development process.
1. Identify the data set to be managed by the service.
2. Split up that data into resources.
3. Name each resource with a URI.
4. Expose a subset of the uniform interface for each resource, depending on what makes sense and what features are to be supported.
5. Design representations to be passed from client to server.
6. Design representations to be passed from server to client.
7. Include links to other resources.
8. Consider a typical course of events, to ensure completeness.
9. Consider error conditions, to identify the HTTP response codes to be used.
Chapter 7 includes the implementation of a bookmarking service similar to del.icio.us. The sample code uses Ruby extensively, and it was a little more advanced than what I was prepared to absorb without a Ruby primer. One important point made in the prose of the chapter is that code frameworks may constrain your design by making certain choices for implementation easier or harder.
Chapter 8 is a summary of the REST and ROA principles discussed in the earlier chapters, and is an excellent reference once you've finished reading the whole book. It is also suitable as a "Cliff's Notes" version of the material, if you don't have time to read everything. If you want to review the book before reading it, go to the book store and take a look at this chapter.
While chapter 2 covered client implementation techniques, chapter 9 is a survey of tools and aspects of web service implementations in different languages. It covers topics such as XHTML, HTML5, microformats, Atom, JSON, RDF, control flow patterns, and WSDL.
In chapter 10, the authors give an extensive comparison of ROA and "Big Web Services" to argue that ROA is simpler, requires fewer tools, and can even be more interoperable.
Chapter 11 is the requisite "How to use this with AJAX" chapter.
And the book wraps up in chapter 12 with a discussion of frameworks for doing RESTful development in multiple languages. The coverage of Django includes a dispatcher that decides how to handle the request based on the HTTP method invoked.
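The idea of dispatching on the HTTP method can be sketched without any framework. The code below is not the book's Django example; it is a plain-Python illustration with a made-up in-memory resource, where each supported method is simply a method of the same name on the resource object.

```python
class BookmarkResource:
    """Hypothetical resource holding one bookmark in memory."""

    def __init__(self):
        self.url = None

    def GET(self, body=None):
        if self.url is None:
            return 404, "not found"
        return 200, self.url

    def PUT(self, body=None):
        self.url = body
        return 200, "stored"

    def DELETE(self, body=None):
        self.url = None
        return 200, "deleted"


def dispatch(resource, method, body=None):
    """Call the handler named after the HTTP method, or report 405."""
    name = method.upper()
    if name not in ("GET", "PUT", "POST", "DELETE"):
        return 405, "method not allowed"
    handler = getattr(resource, name, None)
    if handler is None:
        # The resource exposes only a subset of the uniform interface.
        return 405, "method not allowed"
    return handler(body)
```

Leaving POST off the resource is deliberate: a resource exposes only the part of the uniform interface that makes sense for it, and everything else gets a 405 response.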
Conclusion
Before reading "RESTful Web Services", I had a somewhat cloudy notion of REST and how to apply it. The book clarified both. It also offered an invaluable concrete process to follow when implementing a web service using REST and ROA principles. I expect my copy to see a lot of use and become dog-eared as I refer back to it frequently.
PyATL Book Club
The Atlanta Python Users' Group runs an online book club. We encourage all Atlanta area Python developers to check the schedule on PyATL.org and come down to a meeting. Anyone is free to join and participate. For more reviews by members of the book club, check out the forums or our Reviews List.
Technorati Tags:
django, PyATL, REST
Thursday, May 10, 2007
CherryPy Essentials
A little over a week ago I received a review copy of Sylvain Hellegouarch's new book, "CherryPy Essentials", published through Packt Publishing. The timing couldn't have been better, since we have begun investigating Python web application frameworks at work for a new project. From previous work I have done with TurboGears, I knew that CherryPy was a contender, so I was definitely interested to see what version 3 had to offer. Sylvain's book is a good starting point for the information I wanted.
Review
"CherryPy Essentials" is a fairly short book (251 pages), especially given the breadth of topics it covers. It could easily have been 2-3 times as long, if the author was wordy or repetitive, but the concise writing style means that there is a lot of good information packed into this short volume.
The outline is fairly typical for tech books:
- What is this thing and why do I care? - chapter 1
- How do I get it? - chapter 2
- What can it do? - chapters 3-4
- Build an example app - chapters 5-7
- Advanced topics - chapters 8-10
The real substance of the book begins with chapter 3, which gives an overview of CherryPy. It includes a moderately sized introductory application which lets different authors post notes on a page. The sample code includes embedded comments and is used as a basis for a brief description of how CherryPy decides what to do when an HTTP request is received.
CherryPy comes with a host of modules to make building your application easier. The coverage given these modules in the "Library" section of chapter 3 probably does not do them justice. The author clearly states that his intent is not to create a reference guide, but I would have liked to see this section pulled out into its own chapter and expanded, possibly combined with the Tools list in chapter 4.
The more in-depth discussion of CherryPy in chapter 4 includes instructions for running multiple HTTP servers; various mechanisms for dispatching URLs to Python functions; serving static content; hooking into the core to add your own middleware (or "Tools" in CherryPy-parlance); and WSGI support. The same chapter also includes a description and concise example of how to use each Tool provided in the CherryPy core distribution, following a format not unlike the one I use for my Python Module of the Week series.
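One way to picture dispatching URLs to Python functions is as a walk down a tree of objects, with each path segment naming an attribute. The sketch below is plain Python written to illustrate that idea, not CherryPy code: the resolve() function, the classes, and the "404 Not Found" string are all invented here, and only the convention of marking handlers with an exposed attribute is borrowed from CherryPy.

```python
class Notes:
    def index(self):
        return "all notes"
    index.exposed = True

    def show(self, note_id):
        return "note " + note_id
    show.exposed = True


class Root:
    notes = Notes()

    def index(self):
        return "home"
    index.exposed = True


def resolve(root, path):
    """Walk path segments down the object tree.

    Segments left over once a handler is found are passed to it as
    positional arguments. No argument-count checking is done.
    """
    node = root
    segments = [s for s in path.split("/") if s]
    while (segments and not segments[0].startswith("_")
           and hasattr(node, segments[0])):
        node = getattr(node, segments.pop(0))
    if not callable(node):
        if segments:
            return "404 Not Found"  # an unknown name under a container object
        node = getattr(node, "index", None)  # default handler for the object
    if node is None or not getattr(node, "exposed", False):
        return "404 Not Found"
    return node(*segments)
```

With these classes, "/" resolves to Root.index, "/notes/" to Notes.index, and "/notes/show/42" calls Notes.show("42").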
Chapters 5-7 discuss the design and development of a photo blog application, which is small enough for the reader to follow but large enough to delve into the details of how to build a real application with CherryPy. The presentation begins with the data model, then covers web services and user interface topics.
Some of chapter 5, which discusses working with databases, is unfocused and includes sections on topics such as backgrounds on database types and object-relational mapping libraries not actually used in the example applications. This material could have been eliminated without serious loss. It is interesting, but it detracts a bit from the overall coherence of this book. Once he selects the Dejavu ORM, the discussion refocuses and covers the mechanics of using that library to store and retrieve data.
Chapter 6 provides an excellent discussion of web services, REST, URL design, and the Atom publishing protocol. These were perhaps the most interesting sections in the book. The author clearly has a great deal of experience to share on these topics. I hope there is another book coming soon with a greater examination of these topics.
The coverage of presentation layer topics in chapter 7 begins with a brief history of HTML, up to the development of DHTML. Then the Kid templating language is introduced (thankfully without a comprehensive survey of all available Python templating languages :-). The UI for the example application is fairly simple, so only basic Kid features are really covered, but that is enough. The more complex DHTML work is handled by MochiKit, introduced here and used more extensively in chapter 8, which is devoted to Ajax.
The last 2 chapters of the book cover topics too frequently left out of other books: Testing and Deployment. The chapter on testing presents several tools which integrate well with CherryPy for automated testing of different aspects of your application, including webtest for unit testing, FunkLoad for load testing, and Selenium for UI testing. I had never seen these tools before, and the descriptions and examples were enough to make me add them to my "to be researched" list.
The final chapter covers deployment options for moving a CherryPy app into production. Options for using Apache, lighttpd, WSGI, and SSL are covered, but no definitive "best practice" is suggested. I suppose the final choice should be made on more variables than could really be covered, but I would have liked to see clear guidelines for making a decision about which configuration to use in different situations.
Third-party Tools
Any good, modern, open source project does not stand alone, and CherryPy is no exception. "CherryPy Essentials" makes it clear that integrating with third-party tools is an important part of the design for CherryPy. Tools covered include:
- Dejavu - database access/ORM
- Kid - HTML templating
- MochiKit - JavaScript/Ajax
- FunkLoad - load testing
- Selenium - UI functional testing
Other tools are also mentioned, but covered in less detail.
Conclusion
"CherryPy Essentials" packs a surprising amount of information into a small space. The coverage is not exceptionally deep in any one area, but is fairly complete. The sample code is consistent and easy to read. The book is full of useful and interesting information and, while it occasionally suffers from disjointed flow, I can definitely recommend it to any Python programmer interested in the future of CherryPy development, and web technologies in general.
Special thanks to Ms. PyMOTW for proofreading this post for me.
Thursday, December 21, 2006
ATM for Books
The idea of a machine that can print a copy of every book ever written makes me think I need to build some more shelves.
