Over the past year, I've seen a number of posts related to computer science programs. Is there enough funding? Are there enough students in computer science? Are they taking the right classes? Etc. I've seen a lot of good advice. Recently Joel Spolsky's Programmer's Bookshelf came to light again, and I especially enjoyed Paul Graham's What You'll Wish You'd Known.
I often find myself wanting to chime in with my own thoughts on such matters, but every time I start, it only leads to a desire for more personal reflection and introspection; knee-jerk reactions come too easily. Sigh. I think I'm going to simply ramble on the subject. If anything good comes of it, perhaps I can repackage it later in a more cohesive form. So here goes: reflections on computer science past, what went right, and what I wish had gone differently.
Statistics. My computer science program had some pretty substantial undergraduate requirements in statistics. The courses seemed like bothersome hurdles when I took them; the ideas came easily enough, but I found little of it more than marginally interesting.
In retrospect, however, all those statistics courses have proven well worth the effort. Sometimes I joke about statistics being the devil I know. I've applied my knowledge of statistics at three jobs so far: in biotechnology, in the design of statistics software [duh], and finally in computer graphics.
Today, statistics is more important than ever. In some areas, statistical methods appear to be slowly but surely overtaking many of the more classical approaches. Newer areas such as bioinformatics are driving all sorts of new research in statistics.
Borrowing an expression from Paul Graham, a good education in statistics is one means of staying "upwind," because the knowledge is applicable in so many contexts. (In fact, Graham famously used Bayesian statistics to attack the problem of spam.)
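To give a flavor of the Bayesian approach (this is my own illustrative sketch, not Graham's actual filter; the training messages, add-one smoothing, and uniform prior are all assumptions made for the example): a naive Bayes classifier treats each word of a message as independent evidence and combines per-word likelihoods into a posterior probability that the message is spam.

```python
import math
from collections import Counter

def train(spam_docs, ham_docs):
    """Count word occurrences in each class of training messages."""
    spam_counts = Counter(w for d in spam_docs for w in d.split())
    ham_counts = Counter(w for d in ham_docs for w in d.split())
    return spam_counts, ham_counts

def spam_probability(message, spam_counts, ham_counts):
    """Return P(spam | words) under the naive independence assumption,
    with add-one (Laplace) smoothing and a uniform 50/50 prior."""
    vocab = set(spam_counts) | set(ham_counts)
    n_spam = sum(spam_counts.values())
    n_ham = sum(ham_counts.values())
    log_spam = log_ham = 0.0  # uniform prior: the log(0.5) terms cancel
    for w in message.split():
        log_spam += math.log((spam_counts[w] + 1) / (n_spam + len(vocab)))
        log_ham += math.log((ham_counts[w] + 1) / (n_ham + len(vocab)))
    # convert the log-likelihood difference back into a probability
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

spam_counts, ham_counts = train(
    ["buy cheap pills now", "cheap money offer"],
    ["meeting at noon", "lunch with the team"])
print(spam_probability("cheap pills", spam_counts, ham_counts))
```

On this toy corpus, "cheap pills" scores well above 0.5 while "meeting at noon" scores well below it; a real filter differs mainly in the size of the corpus and the care taken with tokenization.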
What should you learn? I recall Probability and Statistics by DeGroot and Schervish being a good introduction to the subject. A quick Google search reveals its use at Princeton, Berkeley, Columbia, MIT, etc.
Some authors have been putting entire books online in the form of PDF files. (My thanks to the authors.) The ability to peruse these books in their entirety has resulted in two of my purchases so far this year. One is David MacKay's Information Theory, Inference, and Learning Algorithms, which I consider an outstanding textbook. (The previous link is to the online version of the book.) It does such a wonderful job of pulling the subjects together that it really deserves a post all by itself.
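As a small taste of the book's territory (my own toy sketch, not an example from the text): Shannon entropy measures the average information content of a distribution, H(X) = -Σ p log2 p, so a fair coin carries exactly one bit per flip and a fair four-sided die exactly two.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum(p * log2(p)).
    Zero-probability outcomes contribute nothing and are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))              # fair coin: 1.0 bit
print(entropy([0.25, 0.25, 0.25, 0.25]))  # fair 4-sided die: 2.0 bits
```

Skewed distributions score lower, which is exactly the intuition behind compression: predictable sources need fewer bits.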
Another statistical tool I highly recommend adding to your toolbox is a solid understanding of linear regression analysis. This should entail a course or two in linear algebra. (Check out Gilbert Strang's excellent lectures on video.) I also recommend Empirical Model-Building and Response Surfaces by Box and Draper. Knowing how to build good empirical models is a valuable skill that doesn't seem to be taught sufficiently in many engineering fields, where such skills can translate into great improvements in processes and products. If you dig deeply enough into this area, I also highly recommend Weisberg's Applied Linear Regression.
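To make the regression idea concrete, here is a minimal sketch of ordinary least squares for a single predictor (my own toy example, not drawn from any of these books), using the familiar closed-form solution slope = cov(x, y) / var(x):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.
    Returns (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sample covariance of (x, y) over sample variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Noiseless data lying on y = 1 + 2x recovers those coefficients exactly.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```

With more predictors the same idea generalizes to the normal equations of linear algebra, which is where a course or two with Strang pays off.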