Entropy, KL Divergence and Mutual Information

  1. Entropy changes with the choice of parameterization.
    1. A random variable X has PDF f(x). Now say Y = 2X; then the PDF of Y is g(y) = f(y/2)/2, i.e. f(x)/2 at y = 2x. Visually the distribution is stretched horizontally, and thus the “height” at each point on the x-axis decreases.
    2. Then H(Y) = −∫ g(y) log g(y) dy = −∫ (f(x)/2) log(f(x)/2) · 2dx = H(X) + log 2, substituting y = 2x and dy = 2dx. (A numerical check is sketched right after this item.)
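A minimal numerical sketch of the H(Y) = H(X) + log 2 relation, assuming X ~ N(0, 1) as the example density (my choice; the notes don't fix a particular f). It approximates −∫ f log f with a Riemann sum on a fine grid:

```python
import numpy as np
from scipy.stats import norm

# Check H(Y) = H(X) + log 2 for Y = 2X, taking X ~ N(0, 1) so that Y ~ N(0, 4).
# Differential entropy -∫ f log f is approximated by a Riemann sum on a fine grid.
def differential_entropy(pdf, grid):
    p = pdf(grid)
    dx = grid[1] - grid[0]
    mask = p > 0                      # treat 0 * log 0 as 0
    return -np.sum(p[mask] * np.log(p[mask])) * dx

grid = np.linspace(-20.0, 20.0, 400_001)
h_x = differential_entropy(norm(0, 1).pdf, grid)   # H(X), X ~ N(0, 1)
h_y = differential_entropy(norm(0, 2).pdf, grid)   # H(Y), Y = 2X ~ N(0, 2^2)

print(f"H(X) = {h_x:.4f}  H(Y) = {h_y:.4f}  diff = {h_y - h_x:.4f}  log 2 = {np.log(2):.4f}")
```

For the standard normal the analytic values are H(X) = ½ log(2πe) ≈ 1.4189 nats and H(Y) ≈ 2.1121 nats, so the difference is log 2 ≈ 0.6931, matching the derivation above.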
  2. Why is mutual information defined as I(X;Y) = ∫∫ f(x, y) log [f(x, y) / (f(x)f(y))] dx dy?
    1. Because f(x)f(y) is the most “random” (“chaotic”) distribution with marginals f(x) and f(y). I(X;Y) measures the distance (KL divergence) between f(x, y) and f(x)f(y), and reveals how much uncertainty is lost due to the interaction of x and y in f(x, y). (A grid-based check is sketched below.)
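A quick grid-based sketch of this KL-divergence view, assuming a bivariate Gaussian with correlation ρ as the example joint (my choice; not from the notes). It evaluates ∫∫ f(x, y) log [f(x, y) / (f(x)f(y))] dx dy by brute force and compares the result with the known closed form −½ log(1 − ρ²):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# I(X;Y) as KL(f(x,y) || f(x)f(y)) on a grid, for a bivariate Gaussian
# with unit variances and correlation rho; closed form is -0.5*log(1 - rho^2).
rho = 0.7
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

xs = np.linspace(-6.0, 6.0, 601)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs, indexing="ij")

f_xy = joint.pdf(np.dstack([X, Y]))        # joint density f(x, y)
f_x_f_y = norm.pdf(X) * norm.pdf(Y)        # product of the marginals

mask = f_xy > 0                            # treat 0 * log 0 as 0
mi = np.sum(f_xy[mask] * np.log(f_xy[mask] / f_x_f_y[mask])) * dx * dx

print(f"grid estimate = {mi:.4f}, closed form = {-0.5 * np.log(1 - rho**2):.4f}")
```

With ρ = 0 the joint equals f(x)f(y) and the integral is 0; as |ρ| → 1 the joint collapses onto a line and its divergence from the independent product grows without bound.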