In my spare time for the past year or so, I've conducted a long-running experiment to study a bunch of different languages, hoping to find at least one that makes me more productive. Like, a lot more. I wound up tinkering with at least 20 different languages, and I delved quite deeply into some of them.
This doesn't happen very often, I think. Most people don't go grubbing around looking at other languages unless they're brand-new developers looking for an easy way to start building things, or they've been "displaced" from their existing language through some major event.
I've been displaced from my home language twice in my career. Assembly language turned out to be too hard to use outside of Geoworks, so I had to pick another one. I chose Java, and worked with it for about seven years. I also chose Perl, because Java and Perl are good (and bad) at different things. They were the two most heavily marketed languages at the time, so they're what I settled on.
I was displaced from both Java and Perl when I hit a wall with both of them, and realized I was spending more time working around problems with the languages than actually building things. I don't think most people ever notice this, though, or if they do, it doesn't bother them at all.
Anyway, I figured this was a good time to pause and talk about what I've learned.
My officemate Ron says my blogs are too long, and he's right, so I'm going to try to keep them shorter. I don't know if I've succeeded with this blog. At least I touch on a lot of subjects briefly, rather than blathering on too long about any one of them.
What I've learned about languages
People usually have a "favorite language", typically the one they've spent the most time with. Languages form communities, or "camps", and they stick together and bash on other languages. That's not because people are bad. It's an emergent behavior that stems from the interaction of a few simple natural phenomena:
Programming languages are very hard to master.
All languages are theoretically capable of doing the same things.
Learning a language moulds your thinking to that language.
Languages need a lot of libraries to be useful on the job.
In combination, these things produce the emergent behaviors of language communities and language religion.
Why are there a bunch of languages, then, with new ones appearing all the time? That's an inevitability as well, being another emergent behavior that results from the interaction of a few other simple phenomena:
Many frequently recurring problems that are relatively hard to express are best served by mini-languages.
Programming languages are collections of mini-languages that were deemed to be generally useful by the language designer(s). Everything else goes in libraries, which are more awkward to use.
New mini-languages appear constantly, as people recognize that they're saying the same things over and over, and as new problem domains appear.
It's hard to change or evolve languages after they've built up large code bases and user communities.
So new languages appear once a year or so, often as experimental extensions to (or revisions of) existing languages. A few examples should suffice:
Stackless Python is a version of Python that uses continuation-passing style.
AspectJ is an extension to Java that adds metaprogramming "hooks" to allow cleaner separation of concerns.
Gambit Scheme is a version of Scheme that incorporates the distributed, lightweight, message-passing concurrency constructs of Erlang.
OCaml is an enhanced version of CAML that supports object-oriented programming features.
C++ is a backwards-compatible extension to C that supports OOP features and template metaprogramming.
Languages are always combinations of elements from other languages. A designer might create a language for any number of reasons:
To "clean up" a language that the designer likes, but feels is critically flawed in some ways.
To synthesize the designer's favorite features from two or more existing languages.
To extend an existing language to add new capabilities the designer felt were missing.
To throw a public tantrum over a lost lawsuit.
Or just to experiment with making a language, because it's fun in its own right.
Languages are fad-driven. Marketing is the most important component of making a language popular. A language's technical merits don't matter much to most programmers, any more than a soft drink's flavor matters when people choose Coke, Pepsi, or RC Cola.
Every five years or so, a major marketing effort pushes some new language into the limelight, and a bunch of people start using it. Usually they're new developers. New developers are the only ones who really matter, at least for marketing purposes. Existing developers don't need or want to change; the more experienced they are, the more locked in they are to their current language, and it becomes a lifelong commitment for most people.
So the key to marketing a language is to claim that it's easy to learn. Languages that don't do this will fail to gain widespread popularity.
A language doesn't actually have to be easy to learn. It just has to be marketed and perceived that way.
The only way to get existing, committed developers to switch to your language is to make yours essentially a superset of the other language, so they feel their knowledge and skills transfer over. Java tried to do this, but wound up being too different, so most of Java's growth has been through acquisition of new developers.
Languages have to offer (or at least claim to have) blazing performance. This is because most programmers never learn how performance optimization really works, or how to apply it properly, so they need the language to cover up for their educational deficiencies.
The choices made by a language designer have a large impact on the language's long-term ability to survive and adapt. Languages that are consistent, powerful, flexible, and general-purpose tend to last a long time. Languages that provide lots of elaborate, domain-specific solutions for today's problems tend to be useless tomorrow.
That's pretty much what I've learned about languages in general.
Here are some observations I've had about different languages...
C is important to know. It's the most expressive language anyone has come up with that still maps more or less directly to assembly-language operations on sequential Von Neumann machines, which comprise most of the computers in the world today. It's ubiquitous, timeless, well-studied, pretty fast, and portable.
C doesn't scale well for doing application development. It was designed for systems programming and remains best for that kind of work.
Observation: systems programmers look down on application programmers. The pop-culture belief is that systems programming (kernels, drivers, real-time OSes, etc.) is harder, possibly because people equate application programming with just laying out the UI. I've done both (or all three, if you count doing a bunch of tedious UI layouts in various frameworks), and app-level programming is harder than systems programming.
C++ is a huge, complex language that's almost a proper superset of C. It adds object orientation, template metaprogramming, and a variety of smaller but still useful features (e.g. reference parameters). C++'s most useful contribution by far was the double-slash single-line comment.
I actually have a fair bit to say about C++, so I'll leave it for a separate essay. The very short takeaway is that I feel C++ has many strengths, but is a problematic and ultimately poor choice for large-scale application development, which includes most of the kinds of development we do at Amazon.
If you are a huge fan of the C/C++ programming models, I'd suggest taking a look at the D language. It's a new language designed to be what C++ should have been the first time around. It looks and feels very similar, and gives you all that cool low-level pointer access that makes you feel all manly inside. I have no idea whether D is production-quality yet, but it appears to be nearing spec-completion, and it has some compilers and other tools available. Worth a look.
Java is a decent compromise language. It's a turn-key solution for cross-platform development. It has lots of strengths and weaknesses, most of which you're probably familiar with.
I've had lots of interesting insights and realizations about Java lately, but probably the most interesting one is that Java code-base size grows superlinearly as a function of your feature set. Java has almost no abstraction facilities, so refactoring Java code almost always makes it larger (by line count, token count, class count, or just about any other measure of code "size").
Refactoring isn't the only place this happens. Trying to evolve a Java system to meet new requirements also involves piling on more code. So does bridging two Java systems. There's no way to avoid huge amounts of copy and paste. Java's version of "once and only once" appears to be: "N times and only N+1 times."
This weird property of Java is driving the Java world towards controlling their code with automation: refactoring tools, for instance, and code generators. There has never been such a massive drive towards code automation, but then I don't think the world has ever tried building such complex systems in a language that requires code that grows quadratically with the feature set.
It's also driving Java programmers to do their processing in other languages — XML, for instance, is generally less verbose than Java, if you can believe that. If you don't, take a look at Apache Jelly for proof.
I've already written one essay about Perl. I loved it for a long time, and don't like it anymore.
Python is a pretty strong offering. Very strong, in fact. It's a good language. As many people have pointed out, one great strategy for doing application or services development is to write your code in Python, then rewrite the performance-critical parts (which you can't predict in advance) in C. To be sure, you could substitute almost any decent language for "Python" in that pattern and get equally good results.
Python is really easy to learn. You get used to the whitespace thing. You can get used to anything, trust me. It also has a bunch of libraries, tools, other language flavors (e.g. Jython and Stackless Python), and other goodies. Some of the best Wiki implementations out there are written in Python. So are some of the best web server frameworks, source-code viewers, and many other apps.
Python's surprisingly expressive. You can generally write Python code that's as compact as Ruby for small systems, and Lisp for large ones. A lot of this has to do with its list comprehensions and its syntactic whitespace, both of which are done better in Haskell, but are still pretty good in Python.
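For instance (a toy example of my own, not from any real project), a list comprehension collapses a whole filter-and-transform loop into a single line:

```python
# Squares of the even numbers under 10, in one line.
squares_of_evens = [n * n for n in range(10) if n % 2 == 0]
print(squares_of_evens)  # [0, 4, 16, 36, 64]

# The equivalent explicit loop: same result, three times the code.
result = []
for n in range(10):
    if n % 2 == 0:
        result.append(n * n)
```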
I say it's "surprisingly" expressive because it has some irksome annoyances. For instance, it has no ternary operator, nor in fact any reasonable way to express an if-then-else on a single line. Lambda functions can only be one expression, which is pretty lame. The metaprogramming system is pretty verbose, although at least Python has a metaprogramming system. And the scoping rules are fairly messed up (even now, after 2.3 fixed a lot of them), so you often have to create classes for things that should have been inner closures or lambdas. Despite all these things, you can still write pretty darn compact, readable code.
Python's got some historical annoyances that will never be fixed. One of them is Guido. (Heh.) A fairly significant one, left over from the days when Python had no scoping at all except global and local (no lexical, class/instance, or dynamic scoping) is the fact that you have to prefix any object member or method access with "self". There are others. But in spite of these problems, it's still a pretty darn good language. Really. I'm just nitpicking.
Python's still struggling to evolve from its roots as a scripting language, and it lacks many features from more mature languages. For instance, there's no native-code compiler, and no way to declare type annotations. The foreign-function interface is reasonable, but Ruby's is far cleaner. But I think all these problems will be addressed in time.
Ruby's an exceptionally good language. It has some of the same maturity problems as Python: no native-code compiler, no type annotations (i.e. you can't declare a parameter, variable, or function return value as an "int"), and an immature multithreading system. Other than those weaknesses, all of which will be addressed in the fullness of time, Ruby's pretty incredible.
Ruby's got a lot of features that will be compelling to programmers coming from other languages. It's got strong OOP support, fairly strong functional-programming support, lots of Perl-like string and system facilities, and an amazingly broad set of other features for such a relatively simple syntax. Ruby's proof that you can do more with less: you don't need a byzantine syntax like that of C++ or Perl to be a powerful, flexible language.
I use Ruby quite a bit. It's still the language I reach for first when I've got some small problem to solve, or something to prototype.
Smalltalk was a great language. Maybe it still is; I'm not sure — I hear Squeak is gaining a lot of popularity. Smalltalk's one of the most expressive, powerful languages of all time.
There were a few reasons it never really took hold. One is that it was a bit too slow in the 80s on average PC hardware. It was ahead of the hardware curve, and had to wait to catch up.
Another reason is that the whole thing lives inside an "image": a sort of portable OS. This caused a bunch of problems, not least of which is that it prevented Smalltalk from ever taking hold in niche domains — e.g. as an embedded language like Tcl or Lua, or a scripting language like Python or Perl.
Eric Merritt points out that it also lacked a free implementation for many years (in fact, Squeak, which is relatively recent, may be the first reasonable free implementation), which may have been the most serious obstacle to its adoption.
By the time the hardware had more or less caught up, the Java marketing machine came along and totally broadsided the Smalltalk folks, who had been waiting patiently for their day in the sun.
I think languages that wait patiently for their day in the sun will never see it. Something else always comes along and steals their glory. Language marketing is everything, and being a "new" language has a certain marketing appeal. I can't see any other explanation for Java's success, given that it's a pretty weak language compared to Smalltalk.
I'm not sure I'd use Smalltalk for anything real these days, even though it's a lot of fun, and I miss many of its features.
Lisp doesn't actually exist as a programming language. It's a family of languages.
Lisp itself is quite beautiful; it's more of a discovery than an invention, and people continue to rediscover it. Finding Lisp is a site that documents this process (the one I went through myself) pretty well.
Unfortunately, once you reach the conclusion that it's the One True Language, rah rah and all that, you're stuck with the nasty reality of having to choose from the existing Lisps, all of which exhibit a certain amount of Suckage. This appears to be mandatory for all programming languages, at least so far.
I keep trying hard to like Common Lisp, but it's got soooo many problems. They took poor, beautiful Lisp and turned it into this C++-like monster.
If you want to do Lisp development today, and start getting things done with it, without worrying about compatibility issues, or library support, or performance problems, then you have to use Common Lisp. It's a goliath. It's huge, and has tons of libraries, and tools, and documentation, and you can do anything with it. It has powerful, state-of-the-art compilers and garbage collectors and profilers and all that happy stuff. For any problem you can imagine needing to solve at Amazon, there is a solution available in Common Lisp.
But boy is it ugly. I wouldn't go so far as to chew my arm off to avoid waking it up, but I might lie there for an hour or so wondering whether I could simply make a run for it.
I probably wouldn't have a clue how ugly it is unless I'd also looked closely at languages like Scheme, Haskell, and (to some extent) ML. Maybe ignorance is bliss.
However, it's still Lisp, with all the attendant technical advantages. If Java code bases grow superlinearly with the functionality, and Perl/Python/Ruby grow roughly linearly, then Lisp grows sublinearly. It's very compressible, although you'd never really notice, since compressing it doesn't need to obfuscate it. Of course there are programmers in every language who lack anything that could even remotely be considered "taste", and those people can do a real number on Lisp, just as with any other language. A Number Two, if you know what I mean. But unlike some languages, Common Lisp doesn't force you into it.
Scheme is a small, beautiful dialect of Lisp. If you want to use it, you have to choose from at least a dozen competing implementations, some commercial, some free. Scheme (www.schemers.org) has gradually been evolving towards a language with a fairly good-sized standard library. I've done a lot of digging, and you can do an awful lot of stuff in Scheme that you'd consider "essential" for using a real programming language: database access, XML support, service calls, all that stuff.
However, Scheme is not a turn-key solution, not by a long shot. If you want to use Scheme for a project, you'll wind up working harder to find libraries and make them work with your version of Scheme than you would if you were to use a more mainstream language like Java, Perl, Python, or Common Lisp.
I think that unlike most languages, which have a narrow window in which to be successful, Scheme may actually become popular at some point in the distant future. Hard to say. Definitely not there now, though, so if you're going to try to do a project in it (on your own, of course — I think you'd be a fool to use it at Amazon), you've got your work cut out for you.
ML, SML, OCaml
The ML family is pretty neat. How neat, exactly, is a matter of some debate. ML is basically Lisp with a super-strong (fanatically strong, in fact) type system, plus a few goodies like a better module system and some cool pattern-matching facilities.
The super-strong typing means ML compilers produce insanely fast code, optimized directly for modern hardware. On average, OCaml produces code that's as fast or faster than C++, and this isn't because the OCaml compiler writers are smarter than the C++ compiler writers. It's strictly a function of the fact that the OCaml compiler knows a lot more about what's going on in your program than a C++ compiler can possibly know.
The super-strong type system also makes ML and friends something of a pain in the ass to work with. I suppose you get used to it after a while. The one upside is that before your program ever runs, you've eliminated a vast number of bugs that you wouldn't have spotted until runtime.
However, as with all type systems, passing the compiler checks doesn't mean you've eliminated all the bugs, so it's not clear that you've really gained all that much, unless the type system is exactly as expressive as you need it to be, and no less. ML's is pretty flexible, but has some limitations that I find rather onerous.
The two most popular dialects of ML appear to be Standard ML (SML) and OCaml (Objective-CAML). OCaml seems to generate faster code, supports more features, and has more active research going on. SML has been around longer, looks a bit cleaner, and has more books available.
I'd like to work more with OCaml. Its sheer speed makes it continually worth looking at.
Haskell is really, really cool. I think it has years to go before it even comes close to being "production quality", but Microsoft is pumping an awful lot of money into Haskell research, so it may happen sooner than I'm guessing.
I have four books on Haskell, believe it or not. I'd never have guessed that there are so many books on it. None of them are that great, oddly enough, although the Hudak book is pretty decent. Language books always try to teach you the language as if you've never programmed before; they hate to mention other languages or make direct comparisons. And of course direct comparisons are exactly what you need if you want to pick a language up quickly as an experienced programmer.
Anyway, Haskell's main claim to fame is that it's a lazy language. That has a very specific meaning: unlike in every other language you've probably seen, the arguments to a function are not evaluated before calling the function. They're evaluated "on demand". This lets you do, for instance, infinite sequences, and read from them as needed, like streams. You'd be surprised at how many places this is useful.
Haskell is actually a lot like ML, feature-wise: it has pattern matching, currying, type inference, a very similar type system, Lisp-like lists and operators, and so on. But for some reason, Haskell looks and feels a lot cleaner and less restrictive than ML.
This is in part because it uses syntactic whitespace — any language that lets you omit semicolons will look very clean. Haskell takes it pretty far, and uses "layouts", which I suppose are similar to the layout managers you see in GUIs. If you format (i.e. indent) your code a certain way, it imparts certain semantic meaning to the code. The nice thing about Haskell is that, unlike Python, you don't have to code this way; they did it to free you up, not restrict you arbitrarily.
I haven't written anything significant in Haskell — in fact, my only working Haskell program appears to be the solution I did for the first ADJ challenge. It's a language I'd love to get to know better.
Incidentally, if you start looking at Haskell, and you get to Monads (Haskell's way of doing I/O), and you think they're really confusing, you're not alone. I think the easiest way to think of it is that they're pretending that an IO stream is actually just an infinite list of bytes or records, and you just read or write the list as needed. "Monad" is widely regarded as a bad name for what they're doing. Try not to let it scare you.
Alas: Erlang is very high on my list, but I still haven't looked at it yet. I'll probably start with Gambit Scheme, and then go see if that helps me understand Erlang a bit better.
Languages often seem to come in pairs or triples. Java, C# and C++, Haskell/SML/OCaml, Common Lisp/Elisp/Scheme, Erlang and Gambit Scheme, Perl/Python/Ruby, and so on. They're little groups of languages that have very similar flavors. I find that it helps you understand a "flavor" better if you try learning more than one language in that flavor. You'll see patterns that might otherwise have escaped your notice.
Anyway, Erlang's on the list. I'll admit: I've only barely looked at it. But its message-passing model (or something like it) is really important, and as far as I can tell, beats multi-threading hands down as a concurrency model. I still have a lot of reading to do about this, though, since continuation-passing is (allegedly) even better. Sigh. So much to learn, so little time. But gosh, just think how much time it'll save me when I actually try doing something useful! Assuming I haven't totally forgotten how to program by then.
There are other languages (lots of them) that I haven't looked at. I'll keep an eye on them. It's a good habit to be in, as it makes you better at whatever language you happen to be using at the moment.
If I've learned ONE thing that's more important than everything else, it's this: languages DO matter. Don't let anyone tell you otherwise. Heck, if it weren't for words like "grubbing", the world would be a less beautiful place. Well, maybe "grubbing" is a bad example, but you get the idea.
I'm curious to hear people's comments on the languages I've discussed, or just general reactions. Is this stuff interesting? Should I go back to writing about video games?
Feel free to comment. I'm genuinely curious. I've been leaning towards writing more about algorithms and less about languages, but that could be even less interesting, for all I know. Let me know what you think!
(Published Feb 16, 2005)
This is a pretty interesting summary, overall. I was surprised not to see any mention of Objective C, until I realized that the language is practically invisible outside of the OS X world. You mentioned mini-languages so much at first that I was half-expecting a Paul Graham style trumpeting of Lisp when you reached it in your list. Imagine my surprise when it turned out to be a fairly reasonable glance that matched my own experience: "Gosh, the idea of Lisp is just gee-whiz cool. But damn, the implementations of it suck in countless ways."
I still do most of my work in less than a half-dozen languages, but it's always interesting to look at more. So far my experience with new languages has been along the same lines as the kid poking at roadkill with a stick, trying to figure out if it really is dead. There is a lot I'd like to do, but I've got the pressure of limited free time staring me down. In the meantime, all my roadkill poking seems to be making me more proficient at the languages I do know, especially Ruby.
Oh, and there *is* a Smalltalk-based embeddable scripting language available on the OS X platform (FScript). It's got more users than I would expect, but it's still not exactly taking the world by storm. Not even the relatively small world of OS X. The developers of GNU Smalltalk have also worked pretty hard to make it accessible to people accustomed to file-based development. It's nice, and definitely worth taking a closer look at if you don't feel like starting up a Squeak VM just so you can learn about the language.
Haskell ... oh man, Haskell looks so cool. I downloaded a PDF tutorial for it months ago, and haven't had a chance to look at it since. Well, I suppose I've had chances, but I haven't actually *remembered* during those vital free hours.
Oh, REBOL (http://www.rebol.com/, http://rebol.org for a lot of good sample scripts, http://coolnamehere.com/geekery/rebol/ for a couple of very low-level introductory articles for my niece) is another interesting niche language. It works from the philosophy that all computer programming should involve creating a dialect of your main language that fits into the problem domain. It's a really intriguing idea, and I would love to see them do more. Unfortunately, the developers have been stuck in a beta loop for about four years now, with no sign of releasing a new official stable release that more cautious people can use.
Too bad I'm buried deep in learning why folks hate obidos so much.
Posted by: Brian W. at February 17, 2005 07:10 PM
"On average, OCaml produces code that's as fast or faster than C++, and this isn't because the OCaml compiler writers are smarter than the C++ compiler writers. It's strictly a function of the fact that the OCaml compiler knows a lot more about what's going on in your program than a C++ compiler can possibly know."
The OCaml compiler was designed to be a very aggressive optimizing compiler. And high-performance ML/OCaml code is often precisely engineered to expose optimization opportunities to the compiler: making all recursion tail-recursive, using continuation-passing style, clever use of call/cc, and so on. Rewriting your code so the compiler can optimize away the inefficiencies of functional languages is pretty hard for nontrivial programs. So while the compiler writers of ML/OCaml may not be smarter than their C++ counterparts, the people who write high-performance ML programs probably are :).
Not taking anything away from ML, I personally loved using it in school, but I'm not gonna try writing high performance apps in it.
Your argument that compiled ML/OCaml is faster than C++ because of type information doesn't hold up in light of languages like C, which has almost no type system and yet outperforms just about anything outside of assembly. But that's a bigger topic.
Posted by: Mark at February 17, 2005 07:36 PM
Mark, thanks for the corrections/clarifications on ML/OCaml. I agree with all of them.
I think writing high-performance programs is a challenge in any language, especially once they reach a certain size. Heck, much of the challenge is to know when to worry about performance and when not to.
I'm not sure if it's harder to write efficient code in ML than in C++ — it's tricky to do an apples-to-apples comparison unless you're equally proficient in both, and I have far less experience with ML than with C++.
I think every language has a set of recurring idioms and patterns that lend themselves to better-performing code. Many of the patterns I see in ML are equally valid for Lisp, Scheme, Haskell, Erlang, or any other functional language. I suspect that with enough experience, these idioms become second nature.
I've been very interested in figuring out ways to apply functional programming ideas in Java and C++. Plenty of people out there have the same interest, as it can often yield more modular, robust code. Josh Bloch recommends it highly in "Effective Java". Google has a number of papers (e.g. http://labs.google.com/papers/mapreduce.html) that rave about it as well.
Of course, once you start using the functional style in your C++ or Java, many ML-style optimizations become directly applicable. So I'd argue that learning how to write high-performance ML programs isn't a *total* waste of effort.
(I'm sure you weren't arguing that. I'm belaboring the point just in case any Alert Readers think there's no value in looking at functional languages if you're not going to use them in production.)
Posted by: Steve Yegge at February 17, 2005 11:46 PM
Brian: thanks for the feedback. I do like Objective-C, and I shouldn't have left it out. I've never really understood how C++ managed to eclipse Objective-C, but I suspect it all comes down to marketing.
There's another very promising new language that probably should have made my summary list: Groovy. I'm nearing completion of a project to implement a small but nontrivial GUI-based application in five different JVM languages: Java, JRuby, Jython, Kawa Scheme, and Groovy. So far, Groovy appears to be the "best" of the bunch. I'll do a blog on it in a few weeks, comparing and contrasting them.
Paul Graham likes Lisp a lot, no question. But he probably wouldn't have sequestered himself away to work on Arc for a hundred years (if I understand his plan correctly) if the existing Lisps weren't all problematic. I'd love to use one of them, but so far it's been a real struggle.
I still plan on devoting a fair amount of time to learning the ins and outs of Common Lisp, and I'll try to write some sort of significant application in it at some point. If nothing else, knowing Common Lisp enables you to do some pretty amazing things with Emacs.
Thanks for the pointers to F-Script and REBOL. Incidentally, I tried hard to get Gnu Smalltalk working on my Windows/Cygwin box and failed miserably. Oh well.
Alas, I think I've only been averaging about half an hour a day on language experimentation, so it's a slow-going process. Fun, though.
Posted by: Steve Yegge at February 18, 2005 12:03 AM
Any chance you've read this interview with Alan Kay yet? I was reading it over the weekend and was reminded about the things you've been writing about for the last couple of months.
Posted by: Daniel C. at February 22, 2005 02:02 AM
Yeah Dan, you're about the 30th person to send me that link. Thanks though!
It's an interesting article. People seem to think that Alan Kay is really bitter, and that he feels the world has done him an injustice, or whatever. I didn't get that impression when I read the article, though. I thought everything he said made a lot of sense.
I suspect the people who think this of him are the kinds of people who say: "you're just jealous because C++ proved to be way cooler than your language". That's the same kind of person that tells a Mac user: "you're just jealous because Windows proved to be way cooler than the Mac."
It all comes down to your personal definitions of "jealous" and "cooler". :-)
Posted by: Steve Yegge at February 25, 2005 03:41 AM