
A Quick Tour of Ruby

Ruby used to annoy me simply by existing. I first heard about Ruby years ago, in maybe 1997 or 1998, and folks said it was kind of like Perl, but "cleaner", whatever that meant. Ruby fans back then seemed like a tiny minority of rebels and fringe separatists.

Ruby irked me primarily because we already had Perl, which was working just fine thank you very much. And if for some strange reason you didn't like Perl, we had Python. If Perl fans were dog owners, and Python fans were cat owners, then Ruby fans seemed like ferret owners. They could go on and on about how much they adored their beady-eyed albino stretch-limo rats, and how cute they were, but we all knew they were just looking for attention. Nobody really wants a pet rat. (Ferret owners will correct me and say they're not rodents; they're more closely related to weasels and skunks. As if that helps.) Regardless, I didn't want to have anything to do with Ruby.

Last year, though, I was looking at a bunch of different languages in the hopes of finding one to replace Perl for small- to medium-sized tasks. One day my magic Perl dust had worn off rather suddenly, and I'd joined the growing ranks of people who were beginning to notice the emperor was a wee bit underdressed. But all the alternatives to Perl looked pretty bad themselves, and I started judging languages by how far I'd get into the reference manual before throwing it across the room.

I eventually picked up a Ruby book -- I think it might have been the O'Reilly "Ruby in a Nutshell" by Ruby's author, Yukihiro "Matz" Matsumoto. To my lasting surprise, I made it all the way through, occasionally even pounding the table and saying "Now THAT is the right way to do that! Finally!" (I know. Loony.)

Within about 3 days, I was more comfortable with Ruby than seven years had made me with Perl. I was still accidentally sticking "my" in front of variable declarations, typing "sub" instead of "def", little things like that. But I was already starting to think in Ruby, and suddenly I had all this extra space in my brain. You have to learn Perl by memorizing, but you learn Ruby by understanding.

Within a month or two, I'd totally given up on Perl. My Ruby programs were shorter, clearer, and more fun to write. Everything I'd actually liked about Perl was there in Ruby, and I realized there was a lot of crufty old legacy baggage in Perl that it was never going to lose.

Nowadays I can't look at a Perl program without snickering; there's so much suffocating boilerplate junk in there. And I feel for folks who still have to write in it. Going from Perl to Ruby is as big a step in expressive power as going from C++ to Java.

I still like dogs and cats, but I'm starting to think ferrets might not be so bad.

Experimenting With Ruby

There's no way to cover a language decently in a short article like this one, so I'll just highlight a few things I particularly like about Ruby. If it's not enough to convince you to take a look, no worries. It'll still be around next year.

The place to start getting to know Ruby is the "irb" interactive Ruby session. It comes with the standard Ruby distribution. I usually run it inside an Emacs buffer with M-x run-ruby. To do this, you need to download inf-ruby.el and ruby-mode.el from the Ruby website, and put them in your Emacs load path somewhere. There are some instructions on the site. If you don't want to run it in Emacs, start up a Unix shell and type "irb" to get a prompt. There's an Eclipse plugin for editing Ruby source code, but you still have to run irb in a shell.

Ruby's interactive session is an example of a REPL, or "Read Eval Print Loop". Most popular languages these days have a REPL of some sort:

- Common Lisp, Scheme, Ruby, Tcl, Haskell and OCaml all ship with very good ones as part of the language distribution.
- Python's is also good but a bit annoying, since you have to indent every line manually when entering multi-line expressions.
- Perl has an OK-ish one. Just type "perl -d" to enter the Perl debugger, followed by Ctrl-d for the program, and then don't use the "my" modifier on your variables. And don't hit Ctrl-C Ctrl-D or it spins into an unrecoverable infinite loop. (Perl lovers, I'm sure, will be happy to have this new set of weird things to memorize.)
- For JavaScript your best bet is to use Rhino, Mozilla's implementation of JavaScript in Java. It comes with an interpreter shell.
- For Java, you have lots of options, since most of the languages above have been "ported" to the JVM. The most popular REPL for Java is probably BeanShell, which provides a language that's very close to the Java language.
- There's even a REPL for C and C++ called CINT. Go figure.

REPLs are amazingly useful. Once you get used to them, you'll never want to code without one again. They're a bit like an interactive debugger - you can evaluate expressions, set variables, call functions, define classes, and generally just tinker with things safely. If you're not sure what a particular line of code will do, just enter it into the REPL and see.

Some REPLs are friendlier than others. Perl's is definitely the worst, but it's really more of a debugger than a REPL. Ruby's is very friendly. Just type "irb" to enter it, "quit" to leave it, and enter expressions or statements at the prompt to have them evaluated.

Using IRB

In Ruby, all data types are first-class objects, so you can use irb to ask any object what its methods are. And that's pretty useful. For instance, if you want to see the methods available on arrays, you type:

irb(main):009:0> [].methods

and the interpreter responds with:

=> ["send", "rindex", "reject", "reject!", "[]=", "flatten", "pop", "<<", "&", ...]

The names of all the methods on an array object come back in an arbitrary order. I've truncated the list above, but the list actually had 118 methods in it, which I figured out by typing:

irb(main):010:0> [].methods.size

=> 118

The "methods" method (available on all Ruby objects, since it's defined in the base class, Object) returns the method names in an array, so you can easily get a sorted list back instead. Here we get the first 11 sorted method names from the Array class:

irb(main):015:0> [].methods.sort[0..10]

=> ["&", "*", "+", "-", "<<", "<=>", "==", "===", "=~", "[]", "[]="]

Ruby lets you omit parens if you're invoking a method with no arguments, so a no-arg method invocation looks just like a field access, and you can chain them: foo.bar.baw.baz.
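To make the paren-free chaining concrete, here's a minimal sketch. The sample data and variable names are made up for illustration, and it assumes a reasonably modern Ruby (1.9 or later), where reflection calls like "methods" return symbols rather than the strings you see in the 1.8 transcripts above.

```ruby
# Hypothetical sample data, just for illustration.
words = ["banana", "Apple", "cherry"]

# No-arg calls need no parens, so a chain reads like a series of field accesses.
first_shouted = words.sort.first.upcase

# The fully parenthesized form is exactly the same chain of method calls.
same_thing = words.sort().first().upcase()

# Reflection: ask the array whether it has a given method.
# (Assumes Ruby 1.9+, which reports method names as symbols; 1.8 used strings.)
can_sort = words.methods.include?(:sort)

puts first_shouted   # APPLE (uppercase letters sort before lowercase ones)
puts can_sort        # true
```

Whether a name in the chain is a stored attribute or a computed method is invisible to the caller, which is part of why the chains stay readable.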

Working with Collections

We work with collections all the time. It's hard to imagine a useful program that doesn't do at least some work with them. Lists, stacks, queues, hashtables, trees, graphs, sets, bit vectors, tuples, multimaps, result sets -- these are all collection types. Even Strings are collections of characters, although most languages make it inconvenient to treat them that way, as last month's String Permutations problem demonstrated.

So one of the most important features of a language is how easy and consistent it makes working with collections. Regardless of the other merits of the language, this one aspect probably has the biggest impact on your day-to-day programming. To a large extent, it determines how pleasant it is to work in the language.

To illustrate, let's pick a random simple problem. How about if we write a program that will print out all the words in the dictionary starting with the letter 'Q' (case-insensitive), grouped by increasing length, and sorted alphabetically within each group. So the output would look something like this:

    Words of length 3:
     qua
     quo
    Words of length 4:
     quad
     quay
     quip
     quit
     quiz
    Words of length 5:
     Qatar
     Quinn
     Quito
     quack
     ...
    Words of length 15:
     quantifications

It's a pretty straightforward problem, so you'd hope that the code would also be straightforward. And in Ruby, it is:

    #!/usr/bin/ruby

    words = []
    IO.readlines("/usr/share/dict/words").each do |word|
      words << word.chomp if word.downcase[0] == ?q
    end

    max = -1
    words.sort_by{|a| [a.length, a]}.each do |word|
      if word.length > max
        max = word.length
        puts "Words of length #{max}:"
      end
      puts " #{word}"
    end

It's a very short program, but it's especially short because Ruby lets you specify an array of fields to sort on in your collection. That array is the "[a.length, a]" in the sort_by block above. It specifies that the "words" array should be sorted first on word length, then on the word itself (which defaults, as in Java, to alphabetical order).
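Here's a small sketch of why an array works as a sort key. The word list is a made-up sample rather than the real dictionary file. Ruby compares arrays element by element, so the second field only matters when the first one ties.

```ruby
# Hypothetical sample words, not the real dictionary file.
words = %w[quiz qua Quinn quad quo]

# Sort by length first, then by the word itself when lengths tie.
sorted = words.sort_by { |w| [w.length, w] }

# The comparison sort_by relies on: arrays compare element by element.
length_wins = ([3, "quo"] <=> [4, "quad"])    # lengths differ, so 3 < 4 decides
tie_broken  = ([4, "quad"] <=> [4, "quiz"])   # lengths tie, so the strings decide

p sorted        # ["qua", "quo", "quad", "quiz", "Quinn"]
p length_wins   # -1
p tie_broken    # -1
```

Adding a third sort field is just a matter of adding a third element to the key array.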

A corresponding Java program would be about 4 times longer:

    import java.io.*;
    import java.util.*;

    public class QWords {
        public static void main(String[] args) throws IOException {
            List<String> dictionary = new ArrayList<String>();
            BufferedReader in = new BufferedReader(new FileReader("/usr/share/dict/words"));
            String line;
            while ((line = in.readLine()) != null) {
                dictionary.add(line.trim());
            }
            in.close();

            // sort strings by their length, then alphabetically
            Comparator<String> c = new Comparator<String>() {
                public int compare(String s1, String s2) {
                    int diff = s1.length() - s2.length();
                    return (diff != 0) ? diff : s1.compareTo(s2);
                }
            };

            // keep only the Q words; the TreeSet orders them as they're added
            Set<String> sorter = new TreeSet<String>(c);
            for (String word : dictionary) {
                if (word.toLowerCase().charAt(0) == 'q') {
                    sorter.add(word);
                }
            }

            // print a header each time the word length increases
            int max = -1;
            for (String word : sorter) {
                if (word.length() > max) {
                    max = word.length();
                    System.out.println("Words of length " + max + ":");
                }
                System.out.println(" " + word);
            }
        }
    }

Actually a real Java program would be even longer, because you'd refactor the code clumps above into their own methods. And both the Java and Ruby versions would be approximately doubled in length after you "servicized" them, creating classes and exception handlers and documentation and so on. But the Ruby version will always be a lot shorter. And the longer the program gets, the bigger the difference will be.

Why is the Java code so long? All in all, Java is a relatively pleasant language to work with, and it has a lot to recommend it. I'm a Java fan, no question. But as I'm writing Java code, I keep wondering why it's taking me so long. Fred Brooks has established that the time it takes to write a program depends mostly on its length. Also, the number of bugs in a system is proportional to the size of the code base. So I'd obviously prefer shorter programs. What does Ruby give me that Java doesn't?

Breaking it down, the major differences that came into play in this example are:

1. Java doesn't provide a utility method for opening a text file, reading or processing its lines, and closing the file. This is something you do pretty frequently.
2. Java's collections don't support higher-order functions very conveniently or consistently. The fact that TreeSet can take a Comparator is the only higher-order function we were able to take advantage of here. (The Comparator is a sorting function, which the TreeSet takes as a constructor parameter.)
3. Java doesn't have any shortcuts for sorting. You can't say "mylist.sort()" in Java; it requires more cumbersome infrastructure to specify what you want to do.

And some minor differences contribute as well:

1. Java doesn't import two of the most critical packages for you by default: java.util and java.io. We use these all the time.
2. Java doesn't provide syntax for indexing collections like arrays (e.g. "mythings[2]"), nor does it provide operators for adding things to the collection. I.e., Java collections don't have first-class language status; they're all done via an OOP interface.
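For comparison, here's a rough sketch of the Ruby side of the first and third major differences. It assumes Ruby 2.1 or later (for the block form of "Tempfile.create"), and the temp file is only a stand-in for the dictionary so the example is self-contained.

```ruby
require "tempfile"

q_words = nil

# A temp file stands in for /usr/share/dict/words (hypothetical contents).
Tempfile.create("words") do |file|
  file.write("quad\napple\nQatar\nqua\n")
  file.flush

  # File.foreach opens the file, hands over each line, and closes it afterwards.
  q_words = File.foreach(file.path)
                .map(&:chomp)
                .select { |w| w.downcase.start_with?("q") }
end

# Sorting needs no extra infrastructure: strings already know how to compare.
p q_words.sort   # ["Qatar", "qua", "quad"]
```

The block forms take care of closing (and, for the temp file, deleting) the file, so there's no explicit cleanup to forget.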

One difference that didn't surface in this example is that collection methods in Ruby typically return collections. E.g. if you combine two lists with "a_list + b_list", the return value in Ruby is a new list with all the elements of A and B. In Java, a_list.addAll(b_list) modifies a_list and gives you back a boolean telling you whether the list changed. This has two important implications: you can't usually chain collection operations in Java the way you can in Ruby (e.g. "mylist.sort.uniq.grep(/foo/)"), and it's inconvenient to program in the functional style, which I'll cover more below.
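A quick sketch of that kind of chain, using a made-up list. Each step returns a new array, so the next call has something to work on and the original is left alone.

```ruby
# Hypothetical sample list.
mylist = ["foobar", "baz", "afoo", "foobar", "qux"]

# sort returns a sorted copy, uniq drops the duplicate, grep keeps the matches.
matches = mylist.sort.uniq.grep(/foo/)

p matches   # ["afoo", "foobar"]
p mylist    # ["foobar", "baz", "afoo", "foobar", "qux"] (unchanged)
```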

Anyway, I mentioned earlier that Ruby's Array class has 118 methods, at least in the 1.8.1 version I'm running on my laptop. That's a lot of methods! Ruby's collection classes have a surprisingly rich set of operations, including a bunch that I really wish they had by default in Java collections. For instance, Ruby has defined a bunch of useful operators on arrays, including:

- a + b # returns new array containing elements of a and b
- a - b # returns new array with b's elements removed from a
- a << b # modifies a, appending b as a single element
- a[expr] # indexes into a with value of expr

and several others. Note that "+" has a destructive counterpart, a.concat(b), which does the same job except that it modifies a, whereas "+" returns a new array. ("<<" also modifies a, but it appends b itself as one element, so if b is an array you end up with a nested array.)
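Here's a short sketch of these operators side by side, with made-up values. The thing to watch is which calls leave "a" alone and which ones change it.

```ruby
a = [1, 2, 3]
b = [3, 4]

sum  = a + b            # new array with the elements of both
diff = a - b            # new array: a without anything that appears in b
untouched = a.dup       # snapshot: a is still [1, 2, 3] at this point

a << 9                  # modifies a, appending the single object 9
a.concat(b)             # modifies a, appending each element of b
second = a[1]           # indexing

nested = [1, 2] << [3, 4]   # << appends its argument as ONE element

p sum        # [1, 2, 3, 3, 4]
p diff       # [1, 2]
p untouched  # [1, 2, 3]
p a          # [1, 2, 3, 9, 3, 4]
p second     # 2
p nested     # [1, 2, [3, 4]]
```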

The fact that there are side-effect and non-side-effect versions of the same operators in Ruby is important, so I'll talk about it a bit.

Functional Programming

In C++, Java, Python, and many other languages, collection operations usually work via side-effect, modifying their target. You might naturally assume that this is the best design, for performance reasons: returning a copy of a collection sounds expensive. However, the asymptotic complexity is usually the same -- e.g. "a + b" runs in O(n) time, where n is the number of elements in a and b. The in-place version, "a.concat(b)", still has to copy all of b's elements, and in the worst case it has to reallocate and copy a as well, so it's still O(n); it just allocates (slightly) fewer objects.

So returning copies is unlikely to produce detectable performance differences, and when it does, your profiler can show you the bottlenecks (usually very few) that need to be tuned.

There's a style of programming called functional programming that tries hard to avoid side-effects, since this eliminates large classes of bugs. Functions with no side-effects are idempotent, making them a convenient building block that you can use without worrying about their effect on state or concurrency.
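Here's a minimal sketch of the difference. The method names are hypothetical, and it assumes Ruby 1.9 or later, where modifying a frozen array raises a RuntimeError (or its subclass FrozenError in newer versions). Freezing the input is just a way to prove which function leaves its argument alone.

```ruby
# Side-effect-free: builds and returns a new array, never touching its argument.
def doubled(numbers)
  numbers.map { |n| n * 2 }
end

# Side-effecting: rewrites the caller's array in place.
def double!(numbers)
  numbers.map! { |n| n * 2 }
end

original = [1, 2, 3].freeze   # any attempt to modify it now raises an error

result = doubled(original)    # works fine on frozen data

mutation_blocked = false
begin
  double!(original)
rescue RuntimeError           # FrozenError is a RuntimeError subclass in newer Rubies
  mutation_blocked = true
end

p result             # [2, 4, 6]
p original           # [1, 2, 3]
p mutation_blocked   # true
```

You can hand "doubled" any array, shared or not, without thinking about who else holds a reference to it.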

This idea pops up in a lot of Java literature. Josh Bloch's "Effective Java" hits on it in several places. Item 13 ("Favor immutability") covers it in some detail. Josh says (emphasis his):

It is known as the functional approach because methods return the result of applying a function to their operand without modifying it. Contrast this to the more common procedural approach in which methods apply a procedure to their operand causing its state to change.

Josh then goes on to say the functional approach may seem unnatural at first, but he spends the next six pages (!) explaining its many advantages.

Some other great books on Java also recommend using the functional programming style. Concurrent Programming in Java, for instance, describes various schemes for isolation, containment and immutability, all to avoid the need for explicit thread synchronization, which is notoriously fragile and error-prone.

And if you want to see the idea of functional Java taken to an almost comical extreme, you might try A Little Java, A Few Patterns by Matthias Felleisen and Daniel P. Friedman, the authors of The Little Schemer and The Little MLer, two famous books on the functional programming style.

I own all these books and can't recommend them enough. They'll make you a better programmer no matter what language you use.

Ruby is OO/Functional

Object-oriented programming, procedural programming, and functional programming are three very different styles for designing and writing programs. But they don't need to be mutually exclusive, and often they're not. Some functional languages (Lisp, OCaml) provide object-oriented programming features, because OOP is a really useful way of modeling many problem domains.

How do you decide whether a language is "functional" or "object oriented?" Is it even a reasonable distinction to make? I think it is, because if you want to program in a particular style, you'll want to use a language that makes that style convenient. E.g. you wouldn't really want to do object-oriented programming in ANSI C.

Ruby makes it easy to program in the functional style recommended by Josh Bloch, Doug Lea and other recognized design gurus. This is one of the reasons I find Ruby such a pleasure to work with compared to its peer languages, Perl and Python.

For instance, if I fire up an interactive Ruby shell, an interactive Python shell, and an interactive Perl "shell" (via "perl -d" followed by ctrl-d for the Perl program), observe what happens when I sort an array...

Python:

    >>> a = [3, 1, -1, 4, 0, 7]
    >>> a.sort()
    >>> a
    [-1, 0, 1, 3, 4, 7]

So in Python, the sort() function is destructive, modifying the list.

Perl:

    DB<1> @a = [3, 1, -1, 4, 0, 7];
    DB<2> p @a;
    ARRAY(0x104bfc2c)
    DB<3> $a = (3, 1, -1, 4, 0, 7);
    DB<4> p $a;
    7
    DB<5> eat flaming death
    Can't locate object method "eat" via package "flaming" (perhaps you forgot to load "flaming"?) at (eval 27)[/usr/lib/perl5/5.8.2/.perl5db.pl:618] line 2.
    DB<6> @a = (3, 1, -1, 4, 0, 7);
    DB<7> p @a;
    31-1407
    DB<8> p join(', ', @a);
    3, 1, -1, 4, 0, 7
    DB<9> sort @a;
    DB<10> p @a;
    31-1407
    DB<11> p join(', ', sort @a);
    -1, 0, 1, 3, 4, 7
    DB<12> p @a;
    31-1407

So in Perl, after a bit of friendly interactive experimentation to remember the right combination of symbols for declaring an array (as opposed to a reference to an array, or an array evaluated in a scalar context, or a reference to an array evaluated in a list context), we see that the built-in sort function is functional-style, and returns a copy without modifying the array. The same goes for Perl's other collection functions (map, grep, reverse, and the like). So Perl is a functional-friendly language. Perl's overall friendliness is still a matter of some debate.

Ruby:

    irb(main):001:0> a = [3, 1, -1, 4, 0, 7]
    => [3, 1, -1, 4, 0, 7]
    irb(main):002:0> a.sort
    => [-1, 0, 1, 3, 4, 7]
    irb(main):003:0> a
    => [3, 1, -1, 4, 0, 7]

In Ruby the sort function is non-destructive. The same goes for most of Ruby's collection functions, so Ruby is functional-friendly.

However, Ruby also provides destructive versions of the functions if you want to modify the array in-place, using the Scheme language convention of appending a "!" to the method name for the destructive version:

irb(main):007:0> [].methods.sort.grep(/!/)

=> ["collect!", "compact!", "flatten!", "map!", "reject!", "reverse!", "slice!", "sort!", "uniq!"]

Nice! (Note: Ruby's functions borrowed to make life easier for Perl programmers, notably push/pop/shift/unshift, don't follow the "!" naming convention, but they do modify the array as in Perl.)
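A small sketch of the two flavors, plus one wrinkle that's easy to trip over: some of the "!" methods return nil when they had nothing to change, so they don't always chain the way their non-destructive twins do.

```ruby
a = [3, 1, -1, 4, 0, 7]

sorted = a.sort        # returns a sorted copy
before = a.dup         # a itself is still in its original order here

a.sort!                # sorts a in place
a.push(9)              # no "!" in the name, but it modifies a anyway
last = a.pop           # same story: removes and returns the last element

# uniq! returns nil when there were no duplicates to remove.
nothing_to_do = [1, 2, 3].uniq!

p sorted          # [-1, 0, 1, 3, 4, 7]
p before          # [3, 1, -1, 4, 0, 7]
p a               # [-1, 0, 1, 3, 4, 7]
p last            # 9
p nothing_to_do   # nil
```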

Ruby is also fully Object-Oriented, and unlike in Python and Perl, the OO features aren't just bolted onto the side; they're designed into Ruby from the ground up. It supports inheritance, mixins, composition, public/private/protected visibility, static members, reflection, and all the other things you'd expect.
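To give a flavor of a few of these features together, here's a sketch using a made-up "Version" class. It mixes in the standard Comparable module, keeps a helper method protected, and uses a bit of reflection at the end.

```ruby
# Hypothetical example class, for illustration only.
class Version
  include Comparable   # mixin: supplies <, <=, ==, >, >= on top of <=>

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # The one method Comparable needs from us.
  def <=>(other)
    key <=> other.key
  end

  def to_s
    "#{@major}.#{@minor}"
  end

  protected

  # Protected: other Version instances may call this, outside code may not.
  def key
    [@major, @minor]
  end
end

older = Version.new(1, 8)
newer = Version.new(1, 9)

is_older = older < newer             # provided by the Comparable mixin
latest   = [newer, older].max.to_s   # max uses <=> as well
mixed_in = Version.include?(Comparable)   # reflection on the class

puts is_older   # true
puts latest     # 1.9
puts mixed_in   # true
```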

Also, Ruby's support for static methods isn't totally busted like Java's. In Java, static methods are just global functions stuck in a class's namespace, and don't have parity with instance methods in terms of support for overriding, interfaces, polymorphism, etc. Ruby's static methods and members work the way you wanted them to in Java.
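Here's a rough sketch of what that parity looks like. The class names are hypothetical. In Ruby a "static" method is a method on the class object itself, so a subclass inherits it and can override it, and calls dispatch polymorphically.

```ruby
# Hypothetical classes, for illustration only.
class Shape
  # Class ("static") method that calls another class method on self.
  def self.describe
    "a #{label}"
  end

  def self.label
    "generic shape"
  end
end

class Circle < Shape
  # Overrides the inherited class method.
  def self.label
    "circle"
  end
end

base_text = Shape.describe    # self is Shape, so Shape.label is used
sub_text  = Circle.describe   # inherited method, but self is Circle

puts base_text   # a generic shape
puts sub_text    # a circle
```

The equivalent Java static method would be resolved at compile time against the declaring class, so the subclass version would never be picked up this way.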

Ruby also supports a large number of OO features not present in Java, including metaclasses, mutable classes, singletons, operator overloading, delegation and forwarding, message passing, enumerating all instances of a class at runtime, and more. Ruby's OO support is more powerful, flexible, and consistent than Java's.
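A few of these fit in one short sketch. The "Point" class is made up for illustration. It shows operator overloading, reopening a class to add a method after the fact, a singleton method on a single object, and sending a message by name.

```ruby
# Hypothetical example class.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Operator overloading: "+" is just a method name.
  def +(other)
    Point.new(x + other.x, y + other.y)
  end
end

# Mutable classes: reopen Point later and add a method to it.
class Point
  def to_s
    "(#{x}, #{y})"
  end
end

sum = Point.new(1, 2) + Point.new(3, 4)

# Singleton method: defined on this one object, not on the class.
def sum.shout
  "#{self}!"
end

# Message passing: a method call is a message you can send by name.
via_send = sum.send(:to_s)
others_can_shout = Point.new(0, 0).respond_to?(:shout)

puts sum.shout          # (4, 6)!
puts via_send           # (4, 6)
puts others_can_shout   # false
```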

Ruby does provide you with the ability to write "scripts" without defining any classes. But it's just a trick to make procedural-style programming easy when you need it: all your top-level statements and functions are being assembled into an anonymous wrapper object that you can reference with "self":

irb(main):008:0> self.methods

=> ["irb_workspaces", "send", "kill", "irb_quit", "irb_popws", ...]

Oooohhh, now that is pretty cool.
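You can poke at that wrapper object from a plain script too. This is a sketch assuming the code runs at the top level of a script file (irb sets things up a little differently). The "greet" method is a made-up example.

```ruby
# A top-level method definition, with no class in sight.
def greet(name)
  "hello, #{name}"
end

puts self.inspect   # the top-level object; prints "main" when run as a script
puts self.class     # its class is plain old Object

# The "function" is really a private method, so ask respond_to? to include those.
reachable = self.respond_to?(:greet, true)

# And since it's a method, you can send it as a message like any other.
greeting = self.send(:greet, "ferret")

puts reachable   # true
puts greeting    # hello, ferret
```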

So I think Ruby can genuinely make the claim: "Ruby works the way you do!" And unlike in Perl, it's not just a marketing gimmick.

When would I use Ruby?

For starters, anywhere I'd think of using Awk, Bash, Python, or Perl, and I'm not required to use one of those languages, then Ruby's a good option.

Ruby's threads scare me a little, for two reasons:

1. They're "green threads", implemented and managed by the Ruby interpreter, which means they're more fragile than native threads. E.g. a single deadlock brings the whole process to a halt (although for what it's worth, you can't do much about a deadlock in Java, either, since you can't kill threads safely or reliably.) A more serious problem is that a thread that blocks on an OS call will block the other threads too.
   Any large system is likely to be multi-threaded. You can get by just fine with multi-processing (Perl folks do this all the time; Ruby threads may be scary, but if you use Perl threads you'll be accounted a suicide), and Ruby folks would argue that you shouldn't build giant systems in the first place, but things are rarely that simple in practice.
2. I don't feel the semantics of Ruby's threads are well-documented, at least compared with pthreads or Java threads. There are a lot of details in the specification for Java threads, and when I don't see that kind of detail in Ruby's thread documentation, I worry that nobody really understands how they interact with all of Ruby's other language features.

So I'd be reeeeally leery about building a significant multi-threaded system in Ruby. I'd be comfortable implementing concurrency using multiple OS processes, but that's a bit cumbersome if you're used to threads.
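For what it's worth, here's roughly what the multi-process approach looks like. It's a minimal sketch that assumes a Unix-like system where "fork" is available. The "work" (squaring a number) is a placeholder, and each child reports its result back to the parent over a pipe.

```ruby
# Assumes a Unix-like OS; fork is not available on every platform.
jobs = [1, 2, 3]

children = jobs.map do |n|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write((n * n).to_s)   # the child's placeholder "work"
    writer.close
  end
  writer.close                   # the parent keeps only the read end
  [pid, reader]
end

results = children.map do |pid, reader|
  value = reader.read.to_i       # blocks until the child closes its end
  reader.close
  Process.wait(pid)              # reap the child process
  value
end

p results   # [1, 4, 9]
```

You can see the cumbersome part: everything that crosses a process boundary has to be serialized by hand, where threads would just share the objects.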

I'd definitely consider using Ruby as an embedded interpreter, e.g. if I needed an expression language. Ruby's syntax is pretty much exactly what I've seen people come up with for their home-grown expression languages; I figure you might as well save yourself the effort and just embed Ruby in your C/C++/Java application, since it's trivial to do. (You'd use JRuby for embedding in Java.)

Lastly, I'm still undecided on static typing vs. dynamic typing: in particular, whether I'd feel comfortable building a large system in a non-statically-typed language. I've seen people do it, of course -- IMDB is a good example of a large, working system written in Perl.

But I'm dubious about it. Not because I think compiler type-checks are the last word in runtime correctness -- no, you still need unit tests (which we rarely write, but it's not like we can exactly say we're "getting away with it", either). It's because type tags are extra documentation, and automatically checkable documentation at that. They can help with code-analysis tools, API-doc generators, and (possibly) human readability. In large systems, I think having the type tags may be a big win.

But I've never written a large system in a latently-typed language, so I don't know for sure! If you have, I'd love to hear how your experience went.

In the meantime, we have tons of little automation and embedding tasks that Ruby is perfectly suitable for. So I hope you get a chance to play with it a bit. I think you'll find that Ruby takes a lot of the tedium out of programming, and you'll have fun with it.

(Published Oct 26, 2004)
