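dxdy? Because f(x)f(y) is the most “random” (“chaotic”) joint distribution with marginals f(x) and f(y): among all joint densities with those marginals, it has the largest entropy. I(X;Y) measures the KL divergence between f(x, y) and f(x)f(y) (a distance-like quantity, though not a true metric, since it is not symmetric), and reveals how much uncertainty is lost due to the interaction of x and y in f(x, y).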
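To make the claim concrete, here is a minimal sketch for the discrete case, with sums in place of the integrals. The 2x2 table `joint` and all function names are hypothetical, chosen only for illustration. It computes I(X;Y) as the KL divergence between the joint table and the product of its marginals, and compares that with the entropy gap between the product distribution and the joint.

```python
import math

def marginals(joint):
    """Return the row (X) and column (Y) marginals of a joint probability table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def entropy(probs):
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) in bits, computed as KL(p(x, y) || p(x)p(y))."""
    px, py = marginals(joint)
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                total += pxy * math.log2(pxy / (px[i] * py[j]))
    return total

# Hypothetical example: two correlated binary variables.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

px, py = marginals(joint)
# Product of the marginals: same marginals as `joint`, but no interaction.
product = [[a * b for b in py] for a in px]

h_joint = entropy([p for row in joint for p in row])
h_product = entropy([p for row in product for p in row])
mi = mutual_information(joint)

print("H(product) - H(joint):", h_product - h_joint)
print("I(X;Y) as KL divergence:", mi)
```

The two printed values agree up to floating-point error: the uncertainty lost in going from the independent product to the actual joint is exactly I(X;Y). Passing `product` itself to `mutual_information` gives zero, since no interaction is left to measure.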