Data preprocessing for variant discovery