Posts‎ > ‎

Java 8, Haskell, FP and Monoids

posted Mar 1, 2014, 9:12 AM by Renato Athaydes   [ updated Mar 1, 2014, 2:02 PM ]
I've been writing some code using Java 8 to get familiar with it. As any Java developer should know, Java 8 is just about to be released (if there's no more delays, it will come out in March 2014), so there's no more time to wait to get our hands dirty!

Java 8 brings a volume of changes when compared to Java 7 never before seen between single major releases. Lambda expressions, mixins, streams and its rich, fluent API, a new Date Time API, to mention but a few novelties.

Clearly, Java 8 focuses on bringing Functional Programming (FP) to Java programmers. But what exactly is FP, why does everyone seem to be so excited about it these days, and can Java programmers really benefit from it (or is it just a stunt to desperatly try to stop the flock from seeking greener pastures)?

In order to find out, I decided to (once again) dive deep into FP with another language, a language which is widely accepted as one of the main "pure" FP languages out there: Haskell.

As in my previous, failed tries, it took me several days to be able to write my own, simple programs to do things like calculate someone's BMI (Body-Mass Index). That's why I gave up previously! Even the beautiful illustrations in the Learn you a Haskell for great good tutorial, reminescent of pre-school (which made me feel even more like a 8-year old trying to learn maths all again) didn't seem to help... but that's quite understandable, after all, I am trying to "think" using a very different paradigm than what I am used to. It felt the same when I was at university and tried to write quicksort in Scheme and Prolog for an assignment (hard times).

But the thing is: after you learn your basic Haskell and are confident writing list comprehensions and let expressions, you notice that, actually, you've just got started with real FP! You've just seen the tip of the iceberg, and are still far, very far from understanding how after you've mastered monads, monoids, applicators and the likes, you won't even need comprehensions.

For example, in an early chapter about IO in Learn you a Haskell, you get to write code like this:

myAction :: IO String  
myAction = do
    a <- getLine  
    b <- getLine  
    return $ a ++ b

That's the basic way of doing IO in Haskell that even looks a little "imperative", courtesy of the do notation (but beware, do may be considered harmful)... but then, later, you learn that pros would probably have used an Applicative functor to write the same thing as a one-liner:

myAction :: IO String  
myAction = (++) <$> getLine <*> getLine

In fact, it seems to me that FP is all about encapsulating and abstracting functionality in such a way that it becomes possible to write almost anything as a one-liner. As long as you know how to use applicative functors, of course.

But I have to confess I am not yet comfortable with basic functors, let alone Applicatives, so for now, let's talk about something I found simpler and actually quite useful: Monoids.

Fear not the Monoids


Monoids are a mathematical concept (but fear not, they are really simple!) which describe any operator (represented here as *) which obey the Monoid rules:
  • identity law: there exists an element e such that, for any element a:
    • e * a == a * e == a
  • associative law: for all a, b and c:
    • a * (b * c) == (a * b) * c
If you just skipped the above after seeing the formulas, do not worry, it's even easier to understand with an example.

Suppose you need to know how many requests your web servers have received, in total, in the last few seconds. Each webserver is queried once a second and returns how many requests it handled since it was last queried. So you end up with something like this:

Web Server A:
 Time (s)  
 12
3
4
 Requests 83
-
6

Web Server B:
 Time (s)  
 12
3
4
 Requests 25
3
-

Sometimes, your server may return an empty value or just not respond, for whatever reason. How do you "aggregate" the data so you know the total number so far?
Pretty simple, you might say: just add up all the values in the table, replacing missing values with 0!

Well, you've just described something that clearly fits into the concept of a Monoid: a monoid where the operator is sum and identity is 0.
But the nice thing is that this is a very common pattern that can be applied in many other interesting situations.

A few other examples of "things" that can be Monoids are:
  • Product ( operator : multiplication, identity: 1 )
  • String ( operator: append, identity: "" )
  • List ( operator: append, identity: [] )

A perhaps less intuitive example is Boolean, which can be made Monoid in 2 different ways:

  • All ( operator: &&, identity: True )
  • Any ( operator: ||, identity: False )


Monoids in code

Nothing explains how this can be useful better than actual code. So, let's use a Monoid to solve a really common problem: to sort some items based on a number of attributes.

Problem statement:

Given several instances of Person in a list, sort them by, in this order - firstName, lastName, age, id.

Haskell solution using Data.Monoid

First of all, let's see how Monoids are defined in Haskell:

class Monoid m where  
    mempty :: m  
    mappend :: m -> m -> m  
    mconcat :: [m] -> m  
    mconcat = foldr mappend mempty

A Monoid is a type class in the Data.Monoid module in Haskell (not to be confused with classes in Java and similar languages, which are very different things).
To make some type m an instance of Monoid, all one needs to do is declare an instance of Monoid for that type providing implementations for mempty and mappend (optionally, also for mconcat).

For example, Lists are made Monoids through the following declaration:

instance Monoid [a] where  
    mempty = []  
    mappend = (++) 

Notice how concise this is (see my comparison between Haskell, a strongly-typed language, with Groovy, which is dynamically-typed and does away with all it can from the Java syntax, to see how Haskell can be extremely concise in all sorts of situations)!

It matches exactly what we had discussed previously: a Monoid must have an identity element (mempty) and an associative operator (mappend).
++ is the append operator in Haskell, and [] is the empty list.

Apparently, mappend is kind of a misnomer... it makes sense for lists and Strings, where the operator is actually the append operation, but not for Product, for example! FP folks seem to enjoy giving things bad names, though, so it is important to note that Java 8 does not seem to have just copied all of the names they use in FP land, which is pretty good.

Now that we know what a Monoid looks like in practice, let's declare a data constructor for Person and see how we can use Monoids:

data Person = Person { pid :: Integer, fname :: String, lname :: String, age :: Int }
              deriving (Show, Eq)

We derive instances of Show and Eq for Person, which means that Haskell will automatically be able to print a person, or show it, in Haskell parlance, and tell when they are equal (by comparing each field of the persons being compared, in the order they are declared, which comes in really handy here)!

We could also have derived an instance of Ord, so that Haskell would "know" how to order, or sort, people... but the automagic would only work if we had placed the fields in the order in which we want people to be sorted, but that is not the case (at least we got Eq for free!).

Now, luckily, all of the fields in Person are instances of Ord! That means that we can compare them easily and get an Ordering as a result. An Ordering in Haskell is defined as follows:

data Ordering = LT | EQ | GT

LT =  Less Than
EQ = Equal
GT = Greater Than

This is similar to an enum in Java. When you make a comparison in Haskell using the compare function, you get one of the enumerated values of Ordering as a result: GT, EQ, or LT.

The great thing about Ordering is that they are also Monoids!

The definition of Ordering as a Monoid looks like this:

instance Monoid Ordering where  
    mempty = EQ  
    LT `mappend` _ = LT  
    EQ `mappend` y = y  
    GT `mappend` _ = GT 

In Haskell, _ means that we don't care what the argument is. So, in the definition of mappend, if the first argument is LT, the result is always LT. If it is EQ, the result is the second argument, and if it is GT, the result is GT.

Now, finally, we can see how to use Monoids:

personCompare :: Person -> Person -> Ordering
personCompare p1 p2 = (fname p1 `compare` fname p2) `mappend`
                      (lname p1 `compare` lname p2) `mappend`
                      (age p1   `compare` age p2)   `mappend`
                      (pid p1   `compare` pid p2)

Now, we can test our function:

import Data.List

let p1 = Person 1 "a" "b" 20
let p2 = Person 2 "a" "a" 20
let p3 = Person 3 "b" "a" 15
let p4 = Person 4 "b" "a" 12

sortBy personCompare [p1, p2, p3, p4]

The last function call results in:

[ Person {pid = 2, fname = "a", lname = "a", age = 20},
  Person {pid = 1, fname = "a", lname = "b", age = 20},
  Person {pid = 4, fname = "b", lname = "a", age = 12},
  Person {pid = 3, fname = "b", lname = "a", age = 15} ]

As expected!

Can you see how the problem of aggregating data points coming from different sources (as in the web servers example above) has the exact same structure as the problem of comparing Persons by their fields? The common structure is captured in the concept of Monoids. This is the kind of idea that is at the heart of computer science, and FP brings it to life beautifully.

Java solutions - Java 7 and 8

In Java, everything will be more verbose!! That's for sure, but we can still achieve the same result... except that Java, even Java 8, doesn't have Monoids (as far as I know, at least). Maybe the Oracle devs couldn't get past the first 3 or 4 chapters in Learn you a Haskell?

But that's ok, because we can easily write that!

Here's a Monoid definition in Java 8:

public interface Monoid<M> {

    public M operator( M m1, M m2 );

    public M identity();

    default M applyOn( Stream<M> ms ) {
        return ms.reduce( identity(), this::operator );
    }

}

We also don't have Ordering in Java :(, so let's define that as well!
public enum Ordering {

    LT( -1 ), GT( 1 ), EQ( 0 );

    public final int intValue;

    private Ordering( int intValue ) {
        this.intValue = intValue;
    }

}

The intValue comes in handy when we want to convert the result of Java's Comparator::compareTo method to an Ordering:

public static Ordering asOrdering( int intValue ) {
        return intValue == 0 ?
                EQ :
                intValue > 0 ?
                        GT :
                        LT;
}

public static <T> Ordering asOrdering( Comparable<T> c1, T c2 ) {
        return asOrdering( c1.compareTo( c2 ) );
}

We must also make Ordering an "instance" of Monoid before we can unleash its Monoid powers:

public class OrderingMonoid implements Monoid<Ordering> {

    @Override
    public Ordering operator( Ordering o1, Ordering o2 ) {
        switch ( o1 ) {
            case LT:
                return LT;
            case EQ:
                return o2;
            default:
                return GT;
        }
    }

    @Override
    public Ordering identity() {
        return EQ;
    }

}

Now, we can define the Person class (no getters so it will fit in one blog post):

public static class Person {
        final int id;
        final String fname;
        final String lname;
        final int age;

        public Person( int id, String fname, String lname, int age ) {
            this.id = id;
            this.fname = fname;
            this.lname = lname;
            this.age = age;
        }

        @Override
        public String toString() {
            return "(" + id + "," + fname + "," + lname + "," + age + ")";
        }

}

We define a list of Persons called people, then sort it first using the traditional Java 7 and earlier approach:

people.sort( new Comparator<Person>() {
            @Override
            public int compare( Person p1, Person p2 ) {
                int comparison = p1.fname.compareTo( p2.fname );
                if ( comparison == 0 )
                    comparison = p1.lname.compareTo( p2.lname );
                if ( comparison == 0 )
                    comparison = Integer.valueOf( p1.age ).compareTo( p2.age );
                if ( comparison == 0 )
                    comparison = Integer.valueOf( p1.id ).compareTo( p2.id );
                return comparison;
            }
} );

This is pretty ugly... when compared to Haskell, it does look almost insane!

Alright, I know... Java can do better! Even Java 7, with some cleverness... but this is what your average time-constrained dev would come up with.

If you don't think this is too bad, sorry but let's look again at the Haskell solution:

personCompare p1 p2 = (fname p1 `compare` fname p2) `mappend`
                      (lname p1 `compare` lname p2) `mappend`
                      (age p1   `compare` age p2)   `mappend`
                      (pid p1   `compare` pid p2)

It should be our mission to get as close to this as possible! Practically no noise, everything you see is actually necessary (and nothing is explicitly hidden from view either).

But let's now see what we can achieve with Java 8 and our own Monoid and Ordering classes:

OrderingMonoid orderingM = new OrderingMonoid();
people.sort( ( p1, p2 ) -> orderingM.applyOn( Stream.of(
                asOrdering( p1.fname, p2.fname ),
                asOrdering( p1.lname, p2.lname ),
                asOrdering( p1.age, p2.age ),
                asOrdering( p1.id, p2.id )
) ).intValue );

Now wait, this is pretty awesome Java 8 code! Just as good (if not as concise, still), in my modest opinion, as the Haskell version! Also, it is essentially a one-liner, because although the important code was split into around 5 lines for readability, it is actually a single expression... just like in Haskell! So I feel that I have accomplished something, really.

This is what code should look like. Imperative, Functional or whatever....

Conclusion

It turns out that monoids can really make our code (even our Java code) more concise and much more readable. It did take quite some work to get it all in place, but once it's there, it's highly re-usable! And in any case, you don't have to write any of this, I have done that for you and you are welcome to use it as much as you want.

See my GitHub project functions4j where you will find Monoid, Ordering, and, I hope, much more advanced stuff in the future... you can also find the Java examples above, and many others!

I am trying to learn the other stuff that FP people like to talk about, like Monads and Functors, and hope to keep adding the stuff that I find useful to functions4j so that us, mere mortals working with Java, can also benefit from that. After all, once explained properly, I am sure that almost anything is, deep inside, so simple that even I and the average Joe dev can understand :)




Comments