Entropy, KL Divergence and Mutual Information
- Differential entropy changes with the choice of parameterization.
- A random variable X has PDF f(x). Now say Y = 2X; then the PDF of Y is g(y) = f(y/2) ⁄ 2. Visually the distribution is stretched horizontally by a factor of 2, so the “height” at each point must shrink by the same factor to keep the total area equal to 1.
- Then H(Y) = − ∫ g(y) log g(y) dy = − ∫ f(x) log(f(x) ⁄ 2) dx = H(X) + log 2 (substituting y = 2x, so g(y) = f(x) ⁄ 2 and dy = 2 dx).
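The identity above can be checked numerically. The sketch below (an illustration, not from the notes) takes X to be a standard Gaussian, forms the density of Y = 2X via g(y) = f(y/2)/2, and estimates both differential entropies with a midpoint Riemann sum; the gap should come out close to log 2.

```python
import math

# Differential entropy of a density f on [lo, hi], via a midpoint Riemann sum.
def entropy(f, lo, hi, n=200_000):
    dx = (hi - lo) / n
    h = 0.0
    for i in range(n):
        p = f(lo + (i + 0.5) * dx)
        if p > 0:
            h -= p * math.log(p) * dx
    return h

# X ~ N(0, 1); Y = 2X has density g(y) = f(y/2) / 2.
f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
g = lambda y: f(y / 2) / 2

h_x = entropy(f, -10, 10)
h_y = entropy(g, -20, 20)
print(h_y - h_x)  # ≈ log 2 ≈ 0.6931
```

Stretching the density horizontally by a factor a adds log a to the entropy, so unlike discrete entropy, differential entropy depends on the scale of the coordinate.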
- Why is mutual information defined as I(X;Y) = ∫∫ f(x, y) log [f(x, y) ⁄ (f(x)f(y))] dx dy? Because f(x)f(y) is the most “random” (“chaotic”) joint distribution with marginals f(x) and f(y). I(X;Y) measures the distance (KL divergence) between f(x, y) and f(x)f(y), and reveals how much uncertainty is lost due to the interaction of x and y in f(x, y).
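In the discrete case the same definition is a plain double sum, which makes the “distance from independence” reading concrete. A minimal sketch (the joint tables are made-up examples): compute I(X;Y) directly as KL(p(x,y) ‖ p(x)p(y)); a dependent joint gives a positive value, while a product-of-marginals joint gives exactly zero.

```python
import math

# Mutual information of a discrete joint distribution p[x][y], in nats,
# computed directly as KL(p(x,y) || p(x)p(y)).
def mutual_information(p):
    px = [sum(row) for row in p]            # marginal of x (row sums)
    py = [sum(col) for col in zip(*p)]      # marginal of y (column sums)
    mi = 0.0
    for i, row in enumerate(p):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

# Dependent pair: x and y tend to agree, so knowing x reduces uncertainty in y.
dep = [[0.4, 0.1],
       [0.1, 0.4]]
# Independent pair with the same marginals: p(x, y) = p(x)p(y) everywhere.
ind = [[0.25, 0.25],
       [0.25, 0.25]]

print(mutual_information(dep))  # > 0: joint differs from product of marginals
print(mutual_information(ind))  # 0.0: no interaction between x and y
```

Since KL divergence is nonnegative and is zero only when the two distributions coincide, I(X;Y) ≥ 0 with equality exactly when X and Y are independent.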