Comparing Apples and Apples: Experimentation with and Benchmarking of Hyperparameter Tuning