Multi-site data pooling is increasingly used in medical imaging to increase sample size, diversify subject cohorts, and improve statistical power. However, combining data from different sites introduces non-biological variability, particularly in MRI, arising from differences in scanner vendors, acquisition protocols, field strengths, and software or hardware upgrades. This variability can degrade the performance of machine learning (ML) and deep learning (DL) models trained on multi-site data.
To address these challenges, harmonization methods have been developed, ranging from traditional image processing techniques to advanced data-driven approaches. Traditional methods normalize raw image intensities to a predefined range, while more recent methods harmonize either pre-extracted features or entire 2D or 3D images. Notably, generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), flow-based models, and diffusion models have shown superior performance in multi-site MRI harmonization. This tutorial covers foundational and state-of-the-art harmonization approaches, introduces relevant datasets and toolboxes, and discusses evaluation metrics for assessing harmonization quality.
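As a minimal sketch of the "normalize raw intensities to a predefined range" idea, assuming each scan is already loaded as a NumPy array, the hypothetical helper `rescale_intensity` below clips a volume to a percentile window (1st to 99th percentile, an assumed default) and maps it linearly to [0, 1]. It is an illustration of the general technique, not a specific method from this tutorial.

```python
import numpy as np


def rescale_intensity(volume, lower_pct=1.0, upper_pct=99.0,
                      out_range=(0.0, 1.0), mask=None):
    """Hypothetical helper: percentile-clip a volume, then rescale linearly.

    The percentile window and output range are illustrative assumptions.
    If `mask` (a boolean array with the volume's shape) is given, the
    percentiles are computed only over the masked voxels, e.g. brain tissue.
    """
    values = volume[mask] if mask is not None else volume.ravel()
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    if hi <= lo:
        raise ValueError("degenerate intensity window; cannot rescale")
    # Suppress outliers (e.g. bright artifacts) before the linear mapping.
    clipped = np.clip(volume, lo, hi)
    a, b = out_range
    return a + (clipped - lo) * (b - a) / (hi - lo)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scan_a = rng.normal(100.0, 20.0, size=(8, 8, 8))
    # Simulate a second site whose scanner applies a global gain and offset.
    scan_b = 3.0 * scan_a + 50.0
    diff = np.abs(rescale_intensity(scan_a) - rescale_intensity(scan_b)).max()
    print("max difference after rescaling:", diff)
```

Because percentiles shift and scale with any positive linear transform, this removes global gain and offset differences between sites, but not nonlinear contrast differences, which is part of the motivation for the data-driven methods above.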