Ten Challenges

Stevey's Drunken Blog Rants™

A while back I recommended ten books I thought were fairly good/essential reading for programmers. They're all fairly basic material; the only one that's even moderately challenging is the Algorithm Design Manual. And that one's not too bad.

I thought I'd talk about some other books I'm in the process of reading (and in some cases, struggling through). I haven't finished most of them yet, but they're all exceptional.

These are books that are important to me. Not in the Lewis Carroll or Herman Melville sense; they're not cherished fictional works, or even fictional works that are just thick enough to prop up the couch. For the most part they're technical books. But each of them is a book that I return to regularly as I try to figure out — well, how stuff "works".

There are more than ten, obviously, but I decided to cap this list at ten books just to have a shot at finishing this essay before the end of the year. (Note: it's now Jan 22nd 2005, so I didn't make it. Close, though!)

Without further ado...

#1: Gödel, Escher, Bach: An Eternal Golden BraidDouglas R. Hofstadter

This is one of the most beautiful books ever written, and it's the top of my list. Sure, it's incredibly famous and won the Pulitzer Prize the year it was written. That's not why you want to read it, though. It's a work of art, one written for everyone, but especially for programmers like you and me. (If you aren't a programmer, make sure to read about book #10 on the list.)

One interesting tidbit about this book is that few people ever finish it. Make no mistake: they want to finish it, and they try. But depending on your tolerance for mind-bending recursive paradoxes, you're likely to give up anywhere from a third of the way to halfway through. I did. I finally managed to read about 2/3 of it over the holidays, and I'm looking forward to finishing it... so I suppose the situation hasn't really changed. I'm still not done.

Even some of my college profs weren't able to finish it. In fact, I only know one person who has. He works at Amazon, and it's not who you think it is (you probably don't even know him). He's a smart dude. But then, he was a smart dude back when he was a grad student writing the backend for the compiler that we were struggling with writing the frontend for in our compilers class. Very smart.

I love this book. Everyone loves this book. We just can't finish it! It's too much, too rich. It's like trying to eat a molten chocolate fudge cake the size of a minivan. I've seen TV shows where people were in ice-cream eating competitions, and they'd eat ice cream until they had to yak in a bucket, which isn't unlike reading this book, and you could see they were still eyeing the ice cream afterwards, thinking: "I suppose I could give it another go. The Romans did it that way, no?"

The book is pure genius, and a lot of fun. Not only is it one of the best books on intelligence (whether real or artificial) ever written, the book itself is a self-referential fugue in a structural pun on Lewis Carroll. It really defies description. You just have to read it, or even a third of it.

I feel like I'm getting closer and closer to describing this book without ever getting there. If you know what I mean.

#2: Structure and Interpretation of Computer Programs — Harold Abelson, Gerald Sussman

This book, often known by its initials (SICP), takes the coveted #2 spot on my list. Thanks to Andrew W for mentioning it when commenting on my Ten Great Books entry.

After Andrew's comment, I quickly went and devoured the book like so much enchanted Turkish Delight. I couldn't get enough of it. I already owned the book, and I'd tried reading it, but unfortunately I couldn't get past its combination of pomposity and bad jokes. Imagine a really arrogant, condescending professor who thinks he's hip, and he makes up stories about students named Louis Reasoner or Ima Pseudonym or whatever. I just couldn't stomach it.

But once you get past the tone, it's a great book. Great, as in: "It changed the way many schools taught computer science." That kind of great. It's used for the intro-to-programming course at MIT, and also (I think) Berkeley. Probably other schools as well.

I distinctly remember when I heard the University of Washington was changing their intro-to-programming course again, back in, oh, what was it, 1998? When they switched to Java. They realized that Ada wasn't exactly a cutting-edge programming language anymore, what with Babbage being dead for over a century now. (Actually, Ada *was* pretty cool, but don't tell anyone I said that.)

In any case, I remember hearing that one of the languages under consideration for the intro to CS course at UW was Lisp. "Lisp!!??" I thought. In fact I think I may have even been verbally and visibly outraged by the idea. What the heck would they teach newbies Lisp for?

I know the answer to that question now, and I'm still basically opposed to using Lisp for the intro-to-CS course at any school, including MIT. I interview people all the time who took the course, suffered through it, and never could figure out why the heck they were learning such an esoteric language. New CS students just aren't ready for Lisp.

Most people are never ready for Lisp, just like they're never ready for math, or philosophy, or any of the human endeavors that will be remembered and cherished by aliens who study our culture after we blow ourselves up. I sometimes wonder, though: if O'Reilly published a "Lisp Cookbook", with some giant hideous monster on it (a Yeti, I'd hope, or possibly a pterodactyl), how many people would start taking it seriously?

This book isn't an O'Reilly book, though. It's from MIT press, like most of the books in this list, so it's For Smart People Only. Er, in this case, Smart People who Don't Mind Being Condescended To.

This book covers... well, everything. Just like Gödel, Escher, Bach. I finally learned how Huffman encoding works (it's pretty cool!), and the simplicity of Scheme's metacircular interpreter is just staggering. The picture language is a thing of beauty, even if MIT's founder isn't. The book is stuffed with examples of programs that would be hundreds of lines, if not pages, of Java or C++ code, all written in a tiny amount of Scheme.

Definitely one of my favorites. Not for new programmers, maybe, but if you're more like me and Montgomery Burns ("It's not how old you are on parchment, it's how you feel in the humours!"), then I think you'll really enjoy it.

#3: Digital TypographyDonald Knuth

This is a Knuth book. For most programmers, that means: "It's by this old Comp Sci guy that everyone raves about but nobody has actually read." Am I right, or am I right?

Donald Knuth, it turns out, is one of the best writers in computer science. He's funny. OK, maybe it's just me; maybe I laugh at the jokes of a mathematician, when everyone else would be sighing in disgust. But I really get a kick out of his books. When I start reading Knuth, I know I'm in good hands.

For the recruiting-minded among you, you can check out his resume, or Curriculum Vitae, at his home page at Stanford. I actually find the idea of Knuth putting his resume online fairly amusing. I can just imagine the interview:

Donald: Hi, I'm Donald Knuth, and I was wondering if you have any job openin—

Stevey: Aaaahh!!! <faints>

Donald: Aren't you going to ask me any technical questions?

Donald is the best. Someone the other day said they felt he was a bit on the arrogant side. Well, (a) I've never noticed that myself, although I haven't finished his The Art of Computer Programming yet (I got it twice for Christmas, if you can believe that. I'm so predictable!), and (b) regardless, I think Donald has a right to be arrogant. When you obsolete the printing press, you earn that right, fair and square. You're even allowed to wear a t-shirt that reads: "I'm Donald Knuth and You're Not". It's in the Rules.

But why this book in particular? He's published a ton of stuff. I'll be honest: I picked this one in part because it's one of the only Knuth books that I've come close to finishing. But I've been gradually working my way through quite a few of his books, and this one is special. Extra-special, even.

For one thing, it's self-referential in a way that even Hofstadter rarely used; it refers constantly to its own print layout. And it's a beautiful book. After all, it was written entirely using Knuth's own TeX program. What's TeX? It's a deliberate plain-text transliteration of TEX: Donald's program (written in Pascal, believe it or not) for digital typesetting, which allows you to specify how you want your printed material rendered on a printed page, whether it's online or on a piece of paper.

For another, the book is, well, charming. Donald's writing for a 1970's audience, give or take a decade, and this stuff was breaking new ground, at least for the time. You have to appreciate the fact that when he published his first two volumes of the Art of Computer Programming, he had to use this giant, steaming-mercury metal monstrosity that might have been more at home in a Stephen King novel in order to print it nicely in book format.

Donald thought: "Hey, I'm a mathematician and a programmer; fonts and typesetting seem like they might be algorithmically tractable, so lemme see if I can write a program that will replace the printing press." He could, and he did — although it took him about 10x longer than his initial estimate, as do all the best software programs.

Digital Typography is a set of essays and papers spanning the entire journey, from his realization that you could do fonts and typesetting with computers, up through relatively recent releases of TEX, which is still the premier typesetting system worldwide. Incidentally, TEX is up to version 3.14159. Starting at version 3, they changed numbering schemes, and each new release adds another digit of π. For many years Donald has offered a small monetary reward to anyone who finds a bug in TEX; he even sends you a check, although they're rarely cashed, since a check from Knuth for finding a bug in TEX is far more valuable as a collectible.

Notice how the previous paragraph's formatting is all screwed up from the TeX logos, which I faked with a subscripted 'E'? TeX doesn't have that problem, of course. Browsers sure suck, don't they? Hmmmm... Maybe Microsoft and Mozilla ought to try to catch up with early-1970s typesetting and rendering technology, eh?

Anyway, you should read this book, even if you have no interest in typesetting. It'll give you a new appreciation for a lot of things that you probably take for granted. And it's just plain fun to read. Strongly recommended, along with all his other books.

#4: Programming Language PragmaticsMichael Scott

I think this book (PLP) is the best general introduction to programming languages you could find. It's "sort of" a language-design book, "sort of" a compilers book, "sort of" a lot of things. It covers dozens of languages in a very practical way, discussing the trade-offs and choices that language designers have made. The book never claims any one language is "better" than another, which is refreshing. It just shows what's easy and what's hard in each of them.

The book is organized more or less like a textbook on compilers. It starts with tokenizing, moves on to syntactic analysis, then semantic analysis, runtime organization, code generation, and (some) optimization. It's a good way to get a detailed feel for how compilers work without actually writing one. (Note that there's no substitute for really writing one, but reading this book is the next best thing.)

Unlike most compilers books, PLP talks a lot about why you might make particular decisions in designing a language. A big problem with many technical manuals (for any sort of programming activity) is that they give you a huge list of options, but they don't give you any guidelines. This book is about pragmatics, which the author feels is on equal footing with syntax and semantics as one of the central design principles in studying and designing programming languages. It's a very practical book.

It's also very well-written. Very, very, very. Unlike many of the books in this list, it's down-to-earth and written for ordinary mortals like you and me. It's also exceptionally well-organized, indexed, and cross-referenced, making it easy to jump around. I find myself reading it in a very nonlinear way, jumping from section to section as I follow the references. It has plenty of well-constructed and helpful diagrams, programming exercises at the ends of each chapter, and a zillion references to further reading.

The book basically covers a superset of all the features you might find in a modern programming language. So reading the book will make it a lot easier to pick up new programming languages, because you'll already be familiar with the cutting-edge concepts like pattern matching, lazy evaluation, parametric polymorphism, multi-methods, and so on. And you'll know how they're implemented, so you'll be able to figure out their relative performance trade-offs.

Great book. Can't say enough good things about it. You can search inside it on Amazon, so I encourage you to take a peek at the layout and writing style. I think you'll find it very approachable. My copy's getting dog-eared.

#5: The Essentials of Programming LanguagesFriedman, Wand, Haynes.

I just got this book from Amazon. It's from the MIT press, of course, like many of the books in this list.

Daniel Friedman and Matthew Felliesen are two of my Heroes. No, I've never met them, any more than I've met Heracles or Kurt Vonnegut Jr., but they're Heroes nonetheless, and virtually everything they've written, between the two of them, has been something I've wanted to have — well, bronzed, I guess. Or etched in stone. Whatever you do to make a physical copy of a book immortal.

This book is a real treasure, although one that's not well-understood by our customer reviewers. I'd ignore most of the reviews. (Unless maybe you think "witchkingofangmar" is a reliable reviewer. Jeez.)

The book builds up essentially all of the fundamental concepts you need in order to understand programming languages (which, I might add, is a Good Thing if you use them at all), implementing them as an increasingly complex (but never overly difficult) series of Scheme interpreters.

I stumbled on this book inadvertently, as a reference in a paper that described the algorithm that converts a list comprehension into a series of map and filter statements. List comprehensions, along with map, filter and their ilk, are really useful abstractions. It's worthwhile for you to figure them out, including how they're implemented by the compiler or runtime.

An interview candidate this morning actually explained to me why he thought they were valuable. He pointed out (rightly, I think) that when you're working on a particular business problem, you don't want to be worrying about whether your loop counters start at zero, and end at or below the ending index. You just want to perform some operation (e.g. getting the cost, or item details, or whatever) for every element in a list. Map does exactly that.

List comprehensions operate at a slightly higher level than Map; they're almost like a query language (in the SQL or XPath sense) for your data structures. Very useful. You say what you want to look for, possibly performing computations on the results as you acquire them, and it all happens "magically" under the hood. Well, I'd hope it's not really magic: you shouldn't use an abstraction unless you understand more or less how it's implemented, or you can very easily run into performance issues and not know how to debug them.

That's where this book comes in. It fills a role very similar to Programming Language Pragmatics, but with a very different exposition. PLP is more comprehensive, but might seem a bit daunting. This book, in contrast, covers an important subset of the most common language features, but it shows just how easy they are to implement, at least in Scheme. I think it's a great way to play with the algorithms yourself, without worrying too much about those loop counters.

I would not recommend this book unless you're already pretty familiar with Lisp or Scheme. This is a pretty advanced text.

#6: Types and Programming Languages, Benjamin C. Pierce

There's a guy on my team who actually used this book as an auxiliary text for one of his CS courses at the University of Washington. I can't tell you how cool that is.

Well, actually I can: it's really cool. This guy is a Smart Dude, sort of on par with the Smart Dude I mentioned earlier who wrote our compilers backend and finished Gödel, Escher Bach back when he was "just" our roommate in college. This guy is scary-smart. Like, I study nightly just in case he'll ask me some horribly complicated programming problem. He doesn't know that though; don't tell him or it may go to his head, and he'll want to be an "architect", and we'll never see any work out of him again.

This is the hardest book on the list, and I've made the least progress into it so far. However, I think it's one of the most important. It's important because type systems are a critically important part of programming languages — the way you design them, and the way you use them as a programmer. The book talks about various important type systems, including those of C++ and Java, and what sorts of problems and guarantees you can expect from them.

It spends a lot of time covering the ML type system, which is one of the strongest, most "elegant" type systems out there, and you'll find more and more books talking about it. And more and more languages are starting to use it. It's considered one of the best type systems in the world, because it gives you a lot of flexibility and Perl-like expressiveness, while maintaining strong compile-time guarantees that you're not assigning the wrong type of data to a value. It's basically the best of the strong-typing and dynamic-typing worlds, because you don't have to declare most of your type tags.

Interestingly, most programmers don't ever think much about types. I ask people in interviews sometimes: "what is a data type?" Often they struggle for a long time before coming up with anything even remotely approximating an answer, even a wrong one. Most people have never thought about it. At least they're savvy enough to realize they don't want to use the word "type" (or a synonym) in their definition.

I find it a bit odd when people don't know how to tell me what a datatype is, because structuring your data types is one of the most fundamental parts of creating a program. It's certainly the heart of "object-oriented design" and design patterns. What classes do you create, and how do they relate and/or communicate with each other? What methods and operations do they provide, and what access control do you place on those operations? What data types do they accept as arguments? How do you marshal a type across a wire?

These kinds of questions are all tied closely to the type-system that your programming language provides you. Your choices affect just about every quality measurement of your system: readability, maintainability, robustness, runtime performance, static build times, and so on.

Even if you don't want to follow the mathematical analysis in this book, it would be a good idea to read through it just for the results. I.e. it's not as important to follow the proofs as it is to understand what they're proving.

Another solid treatment of data types is chapter 7 in Programming Language Pragmatics. PLP's discussion far less mathematical in nature, and, well, more pragmatic, as you might expect. You might start with that book and then move on to this one if you want more detail.

Even though it takes a fairly formal approach, it's a pretty darn readable book, and is worth taking a look at.

#7: The Seasoned SchemerDaniel P. Friedman, Matthias Felleisen

The Seasoned Schemer is a continuation (of course) of its prequel, The Little Schemer, and it, too, features happy elephants on the cover and at the beginning of every chapter.

The authors have another book, quite an amusing one, written in the same question-and-answer style, called A Little Java, A Few Patterns. I have it on my home bookshelf, which is the shelf you don't get to see but which contains the books that I Really Care About.

The "Little Java" book is amusing because it tries to introduce exactly the same programming concepts and abstractions as The Little Schemer and The Little MLer, except it tries to do it in Java, which gives it roughly the flavor of Ken Thompson's Turing Award Speech, in which he says of his Quine competition: "FORTRAN was the language of choice for the same reason that three-legged races are popular." (Good speech, by the way.)

The Seasoned Schemer will be a real bear for you if you skipped the last chapter in The Little Schemer. I did, and had to go back and work through it carefully. It builds on the work they did to derive the Y-combinator, an important function that bootstraps recursion.

This book is aptly named. Tough book. Well worth it, though. I'll talk a bit more about why you might want to work your way through it when I talk about book #8, below.

#8: The Scheme Programming LanguageR. Kent Dybvig

Yes, another Lisp book. It's also another MIT Press book — they're sweeping the list today, with well over half the entries. I'm putting this book in the list because, once again, it's one that I haven't quite finished but I revisit it often. And it gave me some tremendous insights into the languages I use day-to-day: C, Java, Python, Ruby, and so on.

I'm sure many of you are rolling your eyes in disgust, since we all know Lisp isn't very practical. If Tim O'Reilly hasn't published a book on it yet, then it must not be. Plus, MIT folks have something of a track record for failing our interviews, because they tend not to know C++ very well. (That last sentence pains me more than you know.) So why all my focus on Lisp books?

The reason Scheme (a small, pure dialect of Lisp) gets so much attention in comp-sci programs is that it's an incredibly small language, but it packs a really big whallop. It's an easy language to learn — many professors say they can teach the basics of Scheme in a single lecture, and students can immediately start writing programs in it. But it's also an easy language to implement.

That's a really important point. Sure, it's also reasonably easy to learn C, but writing a C compiler is a complex task. Writing a C++ compiler is incredibly hard, and writing a Perl interpreter is actually impossible, because there's no spec for it, and (like C++) it's basically un-parseable.

But students regularly write Scheme interpreters as a single project in their undergraduate programming language course. And Scheme programs are usually shorter than their equivalents in the more complex languages. Why we go to so much effort to learn languages that are more complex and less expressive with the same performance is beyond me. It's probably all those scary parentheses.

As evidence supporting my point about ease of implementation, here's an example of a small but fairly solid Scheme interpreter written in Java. (The author, Peter Norvig, also the author of at least one book on Artificial Intelligence, is now the Director of Search Quality at Google. Ha! They'll never be successful now. He probably doesn't even know C++. He'd never have made it past Amazon's brutal hiring bar.)

On the JScheme home page, Peter says that he planned on spending no more than 20 hours total writing the interpreter, which included the time spent learning Java. He says he succeeded, and looking at the code (there's not much of it!) I think I believe him.

OK, so Scheme is a small language, and it's easy to write an interpreter for it. Easier than it is for almost any other language, even BASIC. And it's pretty easy to learn. So what's the big deal about it, then?

The big deal is that you can write complex Scheme programs that are far smaller than their equivalents in Java or C++. Scheme may be a small language, but the constructs it provides are amazingly powerful. The SICP book, for instance, provides a complete implementation of Huffman encoding and decoding in about a page of code. And the Dyvbig book (this entry) provides Scheme code that does some equally amazing things, at least for the size of the code.

Is code compression a good thing? After reading a bunch of Paul Graham essays, I was under the impression that he felt compression was the Holy Grail of language (and program) design. I wasn't sure I agreed with him, because obviously at some point you'll get diminishing returns on readability; in fact you might as well just gzip the code if compression is the most important criterion. However, searching for the essay above, I typed "paul graham compression" into Google and got this link back, in which someone says exactly the same thing, and they agree that there's some sort of "best semantic compression" that you can aim for. And it's definitely not gzipping the code.

On the other hand, I still think Scheme is fairly "dense" (as in, almost unreadable). It's easy enough to write, but following the program is nowhere near as easy as following, say, C++. I'm not sure how much of that is due to my relative unfamiliarity with the language, and how much of it is from the language itself, e.g. lacking type tags and interface signatures. Data types really help, and in Scheme, there isn't a standard way to create data types from which an automated tool can produce javadoc-like documentation.

Anyway, I could talk all day about the pros and cons of Lisp, of course, but I'm sure you don't want to hear it. I'm not going to argue that you should write all your code in Scheme, or any of it, for that matter. It may have some small practical use as the Free Software Foundation's (i.e. GNU's) new scripting language, Guile Scheme. In that capacity, it'll help you with scripting and automating any of your GNU software (e.g. gcc, gdb, emacs, gimp, etc.), especially as more and more GNU tools start incorporating Guile support. But it's not likely to be useful to you in many other places.

Like I said in the intro, this list isn't about books that are immediately relevant and practical to you in your everyday work as a programmer.

If nothing else, Scheme has given me a great deal of insight into how more popular languages work. Languages like Python, Ruby and Groovy are all beginning to incorporate Lisp features (e.g. first-class continuations, first-class closures, maybe even macros at some point), and it helps to understand them in the simplest context of Scheme before trying to use them in more complex languages.

This is a hard book, though, so don't try to read it in a day.

#9: How to Design ProgramsFelleisen, Findler, Flatt, Krishnamurthi

This is a fantastic book. If you're a programmer, you should definitely not read it.

Yep. Should not read it.

This book is for non-programmers. If you're not a programmer, and you have ever thought about getting into programming, this might just be the way to do it. It's not going to be easy, but you'll be learning programming in its purest form. This book was written for people who don't yet know how to program.

Programmers shouldn't read it because they fall into one of two categories:

    1. Programmers who know Lisp (or Scheme) already. They don't need this book.

    2. Programmers who don't know Lisp or Scheme already. They won't understand this book, because they believe programming is all about getting a job, and you only need one language to do that. If you're a programmer already, but you don't know Lisp or Scheme, definitely stay away from this book.

The thesis of the book is that everyone should learn how to program computers. The authors consider it to be one of the essential skills of an educated person. It's the Fourth 'R', maybe — a new addition to Readin', Ritin', and Rithmetic.

This book uses a programming language called Scheme, which is one of the simplest, most powerful languages on the planet. Programming in Scheme is like speaking eloquent English when all your friends are stuck with one-syllable words (that would be Java, or Perl, or C++). Of course, many useful words are only one syllable, with some especially useful ones being four letters long, so people using Java/Perl/C++ are more than equipped to respond to the challenges of the day. But to say things that are beautiful, you need Scheme, or a similar language.

The book really is targeted at people who've never written a program before. I wonder how effective it is? If you're a non-programmer, and you want to give this book a try, I would love to know how it works for you. There's a lot of description and very little code; they try really hard to explain things from the ground up, and they assume little or no computer knowledge on your part.

On the other hand, it's hard to remember what it was like not to know anything about programming when you're writing a book like this. So maybe it's awful. If YOU are a person at Amazon who would like to try learning how to program computers, and you get the book and are having trouble with the setup or whatever, let me know and I will be happy to help you.

#10: Purely Functional Data Structures

I haven't gotten very far into this book, primarily because I've been burying myself in a bunch of books that are (*gasp*) not in this list. That's what I get for counting on my fingers, and limiting my blog list to 10 books.

Here's the deal: for this one book, rather than talk about it, I'll talk about why it's in my to-read list.

JG and I have both, for the past year or so, been on an escalating and increasingly desperate search to find a (much) cleaner solution to our distributed computing challenges at Amazon. We meet and talk about it at least once a week, where we compare notes on our research.

So far, the research indicates that all people who like J2EE should be strapped to a rocketship and blasted out into space. But it's just our preliminary findings, and we're not quite ready to publish yet.

JG and I believe that there's a solution to the problem out there, buried somewhere in the mass of literature about programming languages, operations research, distributed systems, and other disciplines. Unfortunately, few of the researchers have access to systems as gigantic as ours, so they publish theoretical results that need to be sifted and understood before we can apply them to our problem domain. That's our labor.

For one thing, it's abundantly clear that it's time for us to move up to a higher level of abstraction. The company can't deal effectively with our hundred million lines of code, a number that only increases as we pile on more OOP and Design Patterns crap auto-generated by IDEs. I swear these Eclipse people can't see the trees for the forest; everyone wants to be an architect, which they seem to feel is synonymous with blasting out hundreds of files for every individual computation that needs to occur. I will blog about this at some point, but I can't let myself get sidetracked today. This essay has been sitting unfinished for months, and I want to get it published.

The level of abstraction we need will be a language abstraction. I don't know if it's going to be the addition of some Amazon-specific minilanguages that handle pattern-matching on clusters and networks of server nodes, or if it's going to be a full-fledged networking language along the lines of Erlang or Gambit Scheme. But Object-Oriented interfaces are failing us, and we need to turn today's network into a computer that we can start programming directly, as if it's a single machine. Language abstraction, backed by solid mathematical and theoretical principles, is the only thing that can stop the madness of our code growth.

Lest this sound like insane ranting, I should point out that JG and I, along with maybe 18 other Amazonians, all nearing the six-year mark here, started out at a company called Geoworks, where we all spent 5 to 10 years writing an entire operating system and applications suite in 80x86 assembly language. Everyone thought this was a great idea until we went out of business. Programmers have a lemming-like attachment to the systems and processes they're currently invested in. It's fairly easy to see the situation playing out again, this time at a higher level of abstraction: the EJB and C++ folks think that with their fancy OO interfaces, they're working in the very aether of pure thought, when in fact what they're doing is much more akin to assembly language.

You could argue with me, but you'd have to explain the constant problems with our systems availability. I encourage you to go do an impromptu survey: ask a bunch of directors and senior engineers why our systems availability has gotten so bad. I did this over a year ago, surveying about 30 top people, and the answer I got back was always the same: complexity. Our systems have become intractably complex.

The whole point of my writing that blog about John von Neumann was that he was the kind of person who would see intractable complexity coming, and find entirely new theoretical models that would make it tractable. In fact, as George Dyson pointed out in his recent talk, von Neumann actually published a preliminary theory on creating reliable systems from unreliable components. Then he began his work on cellular automata and other types of parallel systems, because he knew they were going to be the only way to do truly large-scale computing.

Here we are, stuck on poorly-understood networks of serial computers, trying desperately to pile on more code in the hopes that it'll somehow fix things. JG and I believe there is a way out for us. It's somewhere in the domain of parallel computing — the servers are parallel nodes, so this isn't exactly a huge intellectual leap. We've been studying the available research, and all roads lead to the same set of conclusions, one of which is that Functional Programming is going to be a necessity in this new world. It's a foregone conclusion.

And that, in a roundabout way, brings me to this book by Chris Okasaki. It is absolutely unique. It's the world's first textbook on purely functional data structures — i.e., data structures with no side-effects. I'm not going to explain in this blog why this is such an important topic for Amazon and distributed computing in general, but I will point you to the book in the hopes that you are also interested in finding a solution.


These books might seem hard, but this whole effort is actually about making things easier for programmers. J2EE might seem easy; C++ might seem, well, ah, er, tractable, anyway. Programming might seem fun right now, and the woes of our systems might seem to be a force of nature that one merely lives with — irascible pagan gods controlling our systems, and all we can do is offer up more code in hopeful sacrifices. But it's not the way things have to be, and languages 20 years from now will be so dramatically different (and more powerful) than the cruft you're using today, it'll seem as if you really were programming in assembly. Or worse.

But the answer is out there, and we want to find it, or at least notice when someone else finds it, so we can start applying it here.

(Published Jan 23, 2005)


Cool — thanks for pointing out "Purely Functional Data Structures" in particular.

Have you looked at Concepts, Techniques, and Models of Computer Programming by Peter Van Roy, Seif Haridi?

This is next on my personal list, due to the strong recommendations at


Posted by: Chris N. at January 25, 2005 04:30 PM

I bought and read GEB when I first moved up here and started at Amazon. I read it all the way through, too. That's what being in an apartment by yourself, with no TV or Internet access will do to you...


Posted by: Jon M. at January 26, 2005 01:00 AM

I also read GEB all the way through, back in my undergraduate days. I have it on my shelf to reread, but it joins a stack of other books, so it will probably be some little while yet before I get around to rereading it.

Posted by: Timothy K. at March 28, 2005 10:41 PM