ocaml

I'm learning a relatively new programming language called OCaml, or Objective Caml. All the Ph.D. candidates have been raving about it for the past few years, and a few of the finest programmers I've ever known have cited it as their language of choice. OCaml programmers and programming teams have been winning and placing in challenging programming competitions. Microsoft Research has created a version of it (called F#) for .NET. So I thought I'd take a look.

OCaml is a very high-level language, similar perhaps to Lisp or Prolog, but with strong type-checking that helps your programs run correctly the first time. Despite that, it appears to be as fast as or faster than C++ in any number of language shootouts. I've verified this myself, for small programs, by solving the same problems in OCaml and C++, and it appears (at least initially) to be true.
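To give a feel for the kind of type-checking I mean, here's a minimal sketch. The shape type and the area function are hypothetical names made up for this example. If one of the cases were left out of the match, the compiler would warn that the pattern matching is not exhaustive, before the program ever ran.

```ocaml
(* Hypothetical example: a variant type and a function over it. *)
type shape =
  | Circle of float          (* radius *)
  | Rect of float * float    (* width, height *)

(* No annotations needed: the compiler infers area : shape -> float
   and checks that every constructor of shape is handled. *)
let area s =
  match s with
  | Circle r -> 3.14159 *. r *. r
  | Rect (w, h) -> w *. h

let () = Printf.printf "%.1f\n" (area (Rect (2.0, 3.0)))
```

Note the separate operators for floating-point arithmetic (*. rather than *): OCaml won't silently mix ints and floats.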

How is that possible, you ask? Well, that's complicated, and it involves a lot of proofs. The nutshell version is that the ML language family has an elegant type-system that allows the compiler to deduce things about your program that a C++ compiler cannot possibly deduce about a C++ program. Some ML compilers can, for example, turn much of your heap-allocated memory into stack-managed memory. They can also use more compact representations for your data types, eliminate or inline virtual method calls, turn tail recursion into iteration, and take advantage of immutability and side-effect-free functional programming to do internal parallelism, object pooling, and assembly-language-level optimizations that a C++ compiler has no hope of achieving, since it doesn't really know what the hell you're doing.
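Two of those points, type deduction and tail recursion, are easy to show in a few lines. This is only a sketch with a hypothetical function name (sum_to), and it assumes the standard OCaml compiler, which treats a call in tail position as a jump rather than a new stack frame.

```ocaml
(* sum_to is a hypothetical name. No type annotations are written;
   the compiler infers sum_to : int -> int on its own. *)
let sum_to n =
  (* The recursive call to loop is the last thing loop does, so it
     is a tail call and runs in constant stack space, like a loop. *)
  let rec loop acc i =
    if i > n then acc
    else loop (acc + i) (i + 1)
  in
  loop 0 1

(* A million iterations without overflowing the stack. *)
let () = Printf.printf "%d\n" (sum_to 1_000_000)
```

The recursion reads like the mathematical definition, but what actually runs is an ordinary loop.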

In the past year, I've studied and written code in many languages, including (but not limited to) Perl, Python, Ruby, Emacs-Lisp, Scheme, Common Lisp, Pascal, JavaScript, C#, Java, Objective-C, Haskell, Prolog, C, C++, XSLT, Standard ML, OCaml, and others. I've been on a quest to find the Right Language(s) for my own software development, ever since about 2 years ago when I found that I'd pretty much hit a wall with Java, after having written about 600,000 lines of Java code. I got to the point where I knew exactly how long it would take me to write and debug any given project in Java, typing at full speed - and it was too long.

As much as I like Java (the platform), I think Java (the language) is unpleasantly verbose and inexpressive - it's "uncompressable" beyond a certain point. For X amount of functionality, you're going to have Y lines of code, with no way around it. And for large enough systems, the sheer bulk of the code begins to be problematic by itself. Making big changes can require editing thousands of files. Understanding the program flow requires reading through a bunch of infrastructural boilerplate that there's no way to eliminate.

AspectJ may help with this significantly, but it's still (IMO) an inelegant hack: a precompiler that adds metaprogramming facilities to a language that doesn't support them natively. It's a brilliant move by Gregor Kiczales, in some ways - he's realized the unwashed masses of programmers will never graduate to a real language, and that Java is here to stay, so he's implemented something remotely resembling aspects of his CLOS system as a Java preprocessor, and added branding and marketing in the form of new terminology (e.g. "crosscutting", "aspects") for an idea that's been around for 20 or 30 years. Kudos. But I'm not going to use it unless it becomes part of the Java language standard. As it stands, I don't think it's well-integrated enough.

However, most of the other languages I've used make me unhappy as well. OCaml looks like the first one that might have everything I want and perform well. It has Win32 bindings, so you can write native Windows applications. (This statement is initially a bit of a mind-bender for people who associate "native" with assembly, C, and C++, but if you think about it, there's absolutely no reason you couldn't write a "native" Windows application in any language for which you wrote an appropriate compiler.) It works wonderfully on Linux and Cygwin, which are the only two Unixes I care about anymore. It supports functional, imperative, object-oriented, and logic programming all in the same language, and you can intermix the styles freely. It has great Emacs support. It comes with an interactive interpreter that you can use for incremental development, Lisp-style. There are books about it available from your favorite bookstore - at least about SML, which is very similar to OCaml.
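As a rough sketch of what intermixing styles looks like, here are three of the styles named above (functional, imperative, and object-oriented) side by side in one file. All of the names (total, total_imperative, counter) are hypothetical and exist only for this example.

```ocaml
(* Functional: a pure fold over an immutable list. *)
let total xs = List.fold_left (+) 0 xs

(* Imperative: a mutable reference updated in a for loop. *)
let total_imperative arr =
  let sum = ref 0 in
  for i = 0 to Array.length arr - 1 do
    sum := !sum + arr.(i)
  done;
  !sum

(* Object-oriented: a class with mutable state and methods. *)
class counter = object
  val mutable count = 0
  method add n = count <- count + n
  method get = count
end

let () =
  let c = new counter in
  (* A method can be passed around like any other function. *)
  List.iter c#add [1; 2; 3];
  Printf.printf "%d %d %d\n"
    (total [1; 2; 3]) (total_imperative [|1; 2; 3|]) c#get
```

All three compute the same sum, and they can call each other freely, since they live in the same type system.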

OCaml has threads, exceptions, call-with-continuation, calling conventions to and from C, a rich standard library with collections, networking, I/O, graphics, a complete interface to the Unix programming API, and a powerful module system that blows Java's packages away. It has interfaces and bindings for Oracle, MySQL, PostgreSQL, Berkeley DB, CORBA, COM, XML-RPC, SOAP, XML, Perl-compatible regular expressions... the list goes on. You name it, it's there.
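To make the module-system claim a little more concrete, here's a small sketch of a functor, which is a module parameterized by another module. The names ORDERED, MakeMax, IntMax, and StringMax are hypothetical, invented for this example; only String, List.fold_left, and compare come from the standard library. It also shows an exception being declared and raised.

```ocaml
(* A signature: any module that has a type t and a comparison on it. *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* A functor: takes any ORDERED module and builds a new module from it. *)
module MakeMax (O : ORDERED) = struct
  exception Empty

  (* Returns the largest element according to O.compare,
     or raises Empty when given an empty list. *)
  let max_of = function
    | [] -> raise Empty
    | x :: rest ->
      List.fold_left
        (fun best y -> if O.compare y best > 0 then y else best)
        x rest
end

(* Instantiate it twice: once with an inline module, once with String. *)
module IntMax = MakeMax (struct type t = int let compare = compare end)
module StringMax = MakeMax (String)

let () =
  Printf.printf "%d %s\n"
    (IntMax.max_of [3; 9; 2]) (StringMax.max_of ["b"; "a"])
```

The compiler checks each instantiation against the ORDERED signature, so passing a module without a compare function is a compile-time error rather than a runtime surprise.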

OCaml has the potential to make me happy as a programmer, finally. We'll see.

(Published sometime in June 2004)

8/4/2005: I wound up not using OCaml much after that. I do still like it, and I'd enjoy programming in it. I just found myself migrating towards Lisp. However, OCaml's mix of expressiveness and performance is astonishing, and I'd like to work with it again at some point.

If you liked this short article, check out my follow-up, More Ocaml.