Benchmark Datasets for Reproducible Data-Driven Science in Solar Physics

Many recent, important advances in the broad fields of data-driven knowledge discovery and data mining stem from the ever-increasing complexity and realism of machine learning algorithms, together with the parallel improvement in computing power that can accommodate this algorithmic complexity. However, there is an increasingly endorsed view (e.g., [1], [2]) that many major, non-incremental breakthroughs in data-driven discovery are constrained by the unavailability of high-quality datasets, which limits current machine learning capability. While solar physics research is still strongly driven by model-based approaches, we note a rapidly growing number of machine learning-based studies, specifically at the solar end of space-weather forecasting. Well-prepared, readily available and extensive benchmark datasets can significantly benefit the research performed by this steadily growing community of solar data scientists. This working group will discuss and provide clear recommendations on the important aspects of the creation, maintenance, storage, dissemination and application of solar benchmark datasets.

Founding working group members: Rafal A. Angryk (GSU/Computer Science), Berkay Aydin (GSU/Computer Science), Manolis K. Georgoulis (GSU/Physics and Astronomy), Petrus C. Martens (GSU/Physics and Astronomy)

[1] https://www.kdnuggets.com/2016/05/datasets-over-algorithms.html

[2] https://www.kdnuggets.com/2015/06/machine-learning-more-data-better-algo...