FID, or Fréchet Inception Distance, is a metric used to evaluate the quality of images generated by generative models, like GANs (Generative Adversarial Networks). It measures how similar the generated images are to real images in terms of visual quality and diversity.
Here's how FID works:
Feature Extraction: Images (both real and generated) are passed through a pretrained Inception network (usually InceptionV3), and the activations of its final average-pooling layer are taken as features (a 2048-dimensional vector per image in the standard setup). These features are high-level representations of the images.
Statistical Comparison: FID compares the statistical properties (mean and covariance) of the extracted features for the real and generated images. The idea is to measure how closely the feature distributions of generated images match those of real images.
Mathematical Calculation: Each set of features is modeled as a multivariate Gaussian, and the FID is the squared Fréchet distance between the two Gaussians (for Gaussians this coincides with the squared Wasserstein-2 distance):
$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right)$$
$\mu_r$, $\mu_g$: Mean feature vectors for real and generated images.
$\Sigma_r$, $\Sigma_g$: Covariance matrices for real and generated images.
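The statistics and distance steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `feature_statistics` and `frechet_distance` are hypothetical, and random vectors stand in for real Inception features. Instead of an explicit matrix square root (implementations often use a routine such as `scipy.linalg.sqrtm`), it uses the fact that the trace of (Σr Σg)^{1/2} equals the sum of the square roots of the eigenvalues of Σr Σg.

```python
import numpy as np


def feature_statistics(features):
    """Return the mean vector and covariance matrix of an (n_samples, dim) array."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)  # columns are feature dimensions
    return mu, sigma


def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """Squared Fréchet distance between N(mu_r, sigma_r) and N(mu_g, sigma_g)."""
    diff = mu_r - mu_g
    # Tr((Σr Σg)^{1/2}) = sum of square roots of the eigenvalues of Σr Σg.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    # Drop tiny imaginary parts and clip small negatives caused by round-off.
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)


# Assumption: synthetic 8-dimensional "features" replace Inception activations.
rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(2000, 8))
fake_feats = rng.normal(0.5, 1.0, size=(2000, 8))

stats_real = feature_statistics(real_feats)
stats_fake = feature_statistics(fake_feats)

fid_same = frechet_distance(*stats_real, *stats_real)
fid_shifted = frechet_distance(*stats_real, *stats_fake)
print(fid_same, fid_shifted)
```

Comparing a feature set with itself should give a value at or very near zero, while the shifted set should give a clearly larger value. With real data, `real_feats` and `fake_feats` would be the Inception activations from the feature extraction step.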
Interpretation:
Lower FID indicates better quality and diversity in the generated images (closer to real images).
Higher FID suggests the generated images are less realistic or diverse.
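To see why the score reacts to both realism and diversity, it can help to look at the one-dimensional case, used here purely as an illustration. With scalar features, the covariances reduce to variances $\sigma_r^2$ and $\sigma_g^2$, and $(\sigma_r^2 \sigma_g^2)^{1/2} = \sigma_r \sigma_g$, so the formula collapses to:

$$\mathrm{FID}_{1\mathrm{D}} = (\mu_r - \mu_g)^2 + \sigma_r^2 + \sigma_g^2 - 2\sigma_r\sigma_g = (\mu_r - \mu_g)^2 + (\sigma_r - \sigma_g)^2$$

The first term grows when the generated features are centered in the wrong place, and the second grows when their spread differs from that of the real features, for example when a generator produces too little variety and $\sigma_g$ is much smaller than $\sigma_r$.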
Applications:
Image Generation: Evaluating GANs or diffusion models.
Style Transfer: Measuring how well the style transfer retains image quality.
Data Augmentation: Comparing augmented datasets with the original ones.
Since FID evaluates both quality and diversity, it's widely used in research and development of image synthesis models.