Entropy, KL Divergence and Mutual Information

  1. Entropy changes with the choice of parameterization.
    1. A random variable X has PDF f(x). Now say Y = 2X; then the PDF of Y is g(y) = f(y/2)/2, i.e. f(x)/2 at y = 2x. Visually the distribution is stretched horizontally, and thus the “height” at each point on the x-axis decreases.
    2. Then H(Y) = −∫ g(y) log g(y) dy = −∫ (f(x)/2) log(f(x)/2) · 2dx = H(X) + log 2, substituting y = 2x and dy = 2dx. (A numerical check is sketched right after this item.)
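A minimal numerical sketch of the H(Y) = H(X) + log 2 relation, assuming X ~ N(0, 1) as the example density (my choice; the notes don't fix a particular f). It approximates −∫ f log f with a Riemann sum on a fine grid:

```python
import numpy as np
from scipy.stats import norm

# Check H(Y) = H(X) + log 2 for Y = 2X, taking X ~ N(0, 1) so that Y ~ N(0, 4).
# Differential entropy -∫ f log f is approximated by a Riemann sum on a fine grid.
def differential_entropy(pdf, grid):
    p = pdf(grid)
    dx = grid[1] - grid[0]
    mask = p > 0                      # treat 0 * log 0 as 0
    return -np.sum(p[mask] * np.log(p[mask])) * dx

grid = np.linspace(-20.0, 20.0, 400_001)
h_x = differential_entropy(norm(0, 1).pdf, grid)   # H(X), X ~ N(0, 1)
h_y = differential_entropy(norm(0, 2).pdf, grid)   # H(Y), Y = 2X ~ N(0, 2^2)

print(f"H(X) = {h_x:.4f}  H(Y) = {h_y:.4f}  diff = {h_y - h_x:.4f}  log 2 = {np.log(2):.4f}")
```

For the standard normal the analytic values are H(X) = ½ log(2πe) ≈ 1.4189 nats and H(Y) ≈ 2.1121 nats, so the difference is log 2 ≈ 0.6931, matching the derivation above.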
  2. Why is mutual information defined as I(X;Y) = ∫∫ f(x, y) log [f(x, y) / (f(x)f(y))] dx dy?
    1. Because f(x)f(y) is the most “random” (“chaotic”) distribution with marginals f(x) and f(y). I(X;Y) measures the distance (KL divergence) between f(x, y) and f(x)f(y), and reveals how much uncertainty is lost due to the interaction of x and y in f(x, y). (A grid-based check is sketched below.)
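A quick grid-based sketch of this KL-divergence view, assuming a bivariate Gaussian with correlation ρ as the example joint (my choice; not from the notes). It evaluates ∫∫ f(x, y) log [f(x, y) / (f(x)f(y))] dx dy by brute force and compares the result with the known closed form −½ log(1 − ρ²):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# I(X;Y) as KL(f(x,y) || f(x)f(y)) on a grid, for a bivariate Gaussian
# with unit variances and correlation rho; closed form is -0.5*log(1 - rho^2).
rho = 0.7
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

xs = np.linspace(-6.0, 6.0, 601)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs, indexing="ij")

f_xy = joint.pdf(np.dstack([X, Y]))        # joint density f(x, y)
f_x_f_y = norm.pdf(X) * norm.pdf(Y)        # product of the marginals

mask = f_xy > 0                            # treat 0 * log 0 as 0
mi = np.sum(f_xy[mask] * np.log(f_xy[mask] / f_x_f_y[mask])) * dx * dx

print(f"grid estimate = {mi:.4f}, closed form = {-0.5 * np.log(1 - rho**2):.4f}")
```

With ρ = 0 the joint equals f(x)f(y) and the integral is 0; as |ρ| → 1 the joint collapses onto a line and its divergence from the independent product grows without bound.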