Research

Most Recent Research

Advancing Solar Flare Prediction using Deep Learning with Active Region Patches

Berkay Aydin, Anli Ji, Nigar Khasayeva, Rafal A. Angryk, Petrus Martens, Manolis K. Georgoulis

In this paper, we present a novel approach to solar flare prediction using shape-based features of magnetograms from active region (AR) patches across the entire solar disk, spanning -90° to +90° in longitude. Our methodology involves developing three deep learning models—ResNet34, MobileNet, and MobileViT—focused on predicting ≥M-class flares. We evaluate the models' performance across varying solar longitudes to assess their robustness and efficacy. The primary contributions of this work are twofold: (i) we introduce a novel capability for solar flare prediction that spans the entire solar disk, allowing for the prediction of flares in all AR patches, and (ii) we demonstrate the performance of our models in predicting flares, particularly in near-limb regions (±60° to ±90°), which are traditionally challenging areas for AR-based predictions. This advancement in AR-based modeling opens up new possibilities for more reliable solar flare predictions, enhancing space weather forecasting systems.
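To assess robustness across longitudes, predictions can be scored separately for central and near-limb patches. The sketch below is a hypothetical illustration, not the authors' evaluation code. It groups binary predictions by each patch's central longitude, treating |longitude| ≥ 60° as near-limb as in the text, and computes the True Skill Statistic (TSS) per group. The function names and the two-bin scheme are assumptions.

```python
from collections import defaultdict


def tss(y_true, y_pred):
    """True Skill Statistic: hit rate minus false-alarm rate (binary labels 0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        return float("nan")  # undefined when one class is absent from the bin
    return tp / (tp + fn) - fp / (fp + tn)


def region_of(longitude_deg):
    """Hypothetical two-bin scheme: near-limb if |longitude| >= 60 degrees."""
    return "near-limb" if abs(longitude_deg) >= 60 else "central"


def tss_by_region(longitudes, y_true, y_pred):
    """Group predictions by longitude bin and compute TSS for each bin."""
    groups = defaultdict(lambda: ([], []))
    for lon, t, p in zip(longitudes, y_true, y_pred):
        truths, preds = groups[region_of(lon)]
        truths.append(t)
        preds.append(p)
    return {name: tss(t, p) for name, (t, p) in groups.items()}
```

For example, `tss_by_region([-80, -30, 10, 70, 85, 0], [1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 0])` gives a TSS of 1.0 for the near-limb bin and 0.0 for the central bin.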

Available here

The process flow diagram of the data processing pipeline used in this work. It shows a sequential pipeline for creating JPEG images from magnetogram rasters and the corresponding bitmaps, along with a data augmentation pipeline that depends on the label of each magnetogram patch. Boxes colored in green collectively define our entire dataset.

Sliding Window Multivariate Time Series Classification and Ranking

Anli Ji, Chetraj Pandey, Berkay Aydin

Recently, the synergy of physics-based feature engineering and data-intensive methods, including machine learning and deep learning, has ushered in a new era in space weather analysis and forecasting, specifically in solar flare prediction. These approaches play a pivotal role in understanding the complex mechanisms that lead to solar flares, with a primary focus on forecasting these events and mitigating the risks they pose to our planet. While current methodologies have made substantial advances, they have limitations, and a particularly glaring one is the neglect of the temporal evolution of the active regions from which solar flares originate. This oversight limits the ability of these methods to capture the intricate relationships among the high-dimensional features of active regions, constraining their practical utility. Our study has two key objectives: developing interpretable classifiers for multivariate time series data, and introducing a novel feature ranking method based on sliding window-based sub-interval ranking. The central contribution of our work is bridging the gap between the complex, less interpretable "black-box" models typically used for high-dimensional data and the identification of relevant sub-intervals within multivariate time series data, with a specific emphasis on solar flare forecasting.
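Sliding-window sub-interval ranking can be illustrated with a toy scorer. The sketch below is a hypothetical illustration, not the method from the paper. It slides a fixed-width window over each feature's time series and summarizes each window by its mean. It then scores every (feature, window) pair with a Fisher-style separability ratio between flaring and non-flaring instances. The scoring function, the window summary, and all names are assumptions.

```python
from statistics import mean, pstdev


def separability(pos, neg):
    """Fisher-style score: gap between class means over the summed spreads."""
    spread = pstdev(pos) + pstdev(neg)
    return abs(mean(pos) - mean(neg)) / (spread + 1e-9)


def rank_subintervals(X, y, width, step=1):
    """Rank (score, feature, window start) triples, best first.

    X: list of instances; each instance is a list of per-feature time series
       of equal length. y: binary labels (1 = flaring, 0 = non-flaring).
    """
    n_features, length = len(X[0]), len(X[0][0])
    ranking = []
    for f in range(n_features):
        for start in range(0, length - width + 1, step):
            # Summarize each instance's window by its mean value.
            summaries = [mean(inst[f][start:start + width]) for inst in X]
            pos = [s for s, label in zip(summaries, y) if label == 1]
            neg = [s for s, label in zip(summaries, y) if label == 0]
            ranking.append((separability(pos, neg), f, start))
    ranking.sort(key=lambda r: r[0], reverse=True)
    return ranking
```

The top entries point to the features and time windows where the two classes differ most. An interpretable classifier could then be trained on only those sub-intervals.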

Available here

Exploratory Analysis of Magnetic Polarity Inversion Line Metadata and Eruptive Characteristics of Solar Active Regions

Berkay Aydin, Anli Ji, Nigar Khasayeva, Rafal A. Angryk, Petrus Martens, Manolis K. Georgoulis

We have developed a novel, open-source, GPU-accelerated toolkit for detecting solar magnetic polarity inversion lines (MPILs), the boundaries separating regions of opposite magnetic polarity. MPILs play a vital role in predicting solar instabilities such as flares and eruptive events. The toolkit efficiently generates multi-resolution MPILs, and we provide a large-scale, publicly available MPIL dataset that spans nearly all of Solar Cycle 24, from May 2010 to March 2019. The dataset is created from line-of-sight (LoS) magnetograms in the Solar Dynamics Observatory's Helioseismic and Magnetic Imager Active Region Patches (HARP) data series and covers 4,090 HARPs. It includes six types of MPIL-related binary masks: detected MPILs, Regions of Polarity Inversion (RoPI), Positive Polarity Regions, Negative Polarity Regions, Unsigned Polarity Regions, and the convex hull of MPILs. In addition, we provide structured metadata in the form of multivariate time series, extracted from these masks, to support various space weather forecasting and analytics tasks. Our MPIL detection process combines morphological operations on magnetic field data with the computational power of GPUs. This integration enhances the precision and reliability of MPIL detection and also significantly improves computational efficiency. We envision that this expanded MPIL dataset will advance space weather research, specifically the analysis of MPIL structure, evolution, and role in solar eruptions, complementing existing datasets used in space weather forecasting.
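The core morphological idea can be sketched in a few lines. This hypothetical NumPy sketch is not the toolkit's GPU code. It thresholds strong positive and negative LoS field pixels, dilates each polarity mask, and takes their intersection as a Region of Polarity Inversion (RoPI). The 100 G threshold and the dilation radius are illustrative assumptions. A thinning step such as `skimage.morphology.skeletonize` could then reduce the RoPI toward a line-like MPIL.

```python
import numpy as np


def dilate(mask, radius):
    """Binary dilation with a (2r+1) x (2r+1) square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    # OR together every shifted copy of the mask within the square window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def polarity_masks(bz, threshold=100.0, radius=2):
    """Polarity masks and RoPI from a LoS field map (threshold in gauss is hypothetical)."""
    positive = bz > threshold
    negative = bz < -threshold
    ropi = dilate(positive, radius) & dilate(negative, radius)
    return {
        "positive": positive,
        "negative": negative,
        "unsigned": positive | negative,
        "ropi": ropi,
    }
```

On a map whose left half is strongly positive and right half strongly negative, the RoPI is the band of columns straddling the boundary. Its width grows with the dilation radius.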

Available here

A Modular Approach to Building Solar Energetic Particle Event Forecasting Systems

Anli Ji, Akhil Arya, Dustin Kempton, Rafal Angryk, Manolis K. Georgoulis, Berkay Aydin

Unlike common predictions that focus on the occurrence of an event, an All-Clear forecast puts more emphasis on predicting the absence of an event. Such forecasts, while usually not addressed directly, can be crucial in operational environments. We have developed an All-Clear SEP event prediction system that uses active region-based prediction methods together with active region scenarios (i.e., location and complexity). Within our All-Clear forecast system, signals are generated only on request, as binary predictions of YES or NO indicating “All Clear” or “Not All Clear”, respectively. These signals indicate whether any event may occur in the next prediction window, in our case the next 24 hours.
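Here is a minimal sketch of how a YES/NO signal could be derived from per-region forecasts. It assumes that each active region already has an event probability for the next 24-hour window and that a single probability threshold decides the signal. The `RegionForecast` type, the 0.5 threshold, and the decision rule are hypothetical. The actual system also accounts for active region scenarios such as location and complexity, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class RegionForecast:
    region_id: str            # illustrative identifier, e.g. a HARP number
    event_probability: float  # predicted probability of an event in the next 24 hours


def all_clear_signal(forecasts, threshold=0.5):
    """Return "NO" (Not All Clear) if any region reaches the threshold, else "YES".

    With no active regions on the disk, the signal is "YES" (All Clear).
    """
    not_clear = any(f.event_probability >= threshold for f in forecasts)
    return "NO" if not_clear else "YES"
```

Raising the threshold trades missed events for fewer false alarms. This is the balance an operational All-Clear forecast has to tune.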

Available here

Figure. An illustration of the prediction workflow for a hypothetical set of active regions and their multivariate time series (MVTS) parameters, derived from near-real-time (NRT) magnetogram patches.

We established four space weather event forecasting modules, corresponding to flare prediction (FP), eruptive flare prediction (ERP), CME speed prediction, and the full-disk aggregation methodology. The modules are implemented as loosely coupled microservices that do not communicate with each other directly. Our system design follows a modular approach for flexibility, maintainability, and extensibility. It can be configured to use various data access mechanisms outside the confines of our system, such as file storage or database systems.
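Loose coupling with configurable data access can be illustrated with a structural interface. A module depends only on a `load` method, so file storage and a database-backed store become interchangeable. The sketch below is a hypothetical illustration, not the system's code. `DataSource`, the JSON file layout, and the placeholder `flare_prediction_module` are assumptions. The microservice boundaries themselves (separate processes communicating over a network) are not shown.

```python
import json
from pathlib import Path
from typing import Protocol


class DataSource(Protocol):
    """Anything that can return the input records stored under a key."""

    def load(self, key: str) -> list[dict]: ...


class FileDataSource:
    """Reads <root>/<key>.json (hypothetical file layout)."""

    def __init__(self, root: str):
        self.root = Path(root)

    def load(self, key: str) -> list[dict]:
        with open(self.root / f"{key}.json", encoding="utf-8") as fh:
            return json.load(fh)


class InMemoryDataSource:
    """Stand-in for a database-backed source; also handy in tests."""

    def __init__(self, records: dict[str, list[dict]]):
        self.records = records

    def load(self, key: str) -> list[dict]:
        return self.records.get(key, [])


def flare_prediction_module(source: DataSource, key: str) -> dict:
    """Placeholder FP module: it depends only on the DataSource interface."""
    records = source.load(key)
    # A real module would run its model here; this one just counts regions.
    return {"key": key, "n_regions": len(records)}
```

You can swap storage backends by passing a different source object, and the module code does not change.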

Towards Coupling Full-disk and Active Region-based Flare Prediction for Operational Space Weather Forecasting

Chetraj Pandey, Anli Ji, Rafal A. Angryk, Manolis Georgoulis, Berkay Aydin

We present a set of new heuristic approaches to train and deploy an operational solar flare prediction system for ≥M1.0-class flares with two prediction modes: full-disk and active region-based. In full-disk mode, predictions are made on full-disk line-of-sight magnetograms using deep learning models. In active region-based mode, predictions are issued for each active region individually using multivariate time series data instances. A meta-model combines the outputs of the individual active region forecasts and the full-disk predictor into a final full-disk prediction. We used an equally weighted average of the two base learners’ flare probabilities as our baseline meta-learner. We then improved on both base learners by training a logistic regression model as the meta-learner.
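The two meta-learners can be sketched in plain Python under stated assumptions. First, per-region probabilities are aggregated into one full-disk probability as 1 - prod(1 - p_i). This formula assumes that regions flare independently, and the study's own aggregation may differ. The baseline then averages the two base learners with equal weights. Finally, the logistic regression meta-learner is fitted here by plain batch gradient descent rather than with any particular library. All names are hypothetical.

```python
import math


def aggregate_regions(ar_probs):
    """Full-disk probability from per-AR probabilities, assuming independent regions."""
    prob_none = 1.0
    for p in ar_probs:
        prob_none *= 1.0 - p
    return 1.0 - prob_none


def average_meta(p_fulldisk, p_ar):
    """Baseline meta-learner: equally weighted average of the two base learners."""
    return 0.5 * (p_fulldisk + p_ar)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    Each row of X is [p_fulldisk, p_ar]; y holds 0/1 flare labels.
    """
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_logistic(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Each training row pairs the two base learners' probabilities for one prediction time with the observed full-disk label. The learned weights let the meta-learner trust one base learner more than the other, which an equal-weight average cannot do.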

Available here

The major findings of this study are: (i) we successfully coupled two heterogeneous flare prediction models, trained with different datasets and model architectures, to predict a full-disk flare probability for the next 24 hours; (ii) our proposed ensembling model, i.e., logistic regression, improves on the predictive performance of the two base learners and the baseline meta-learner, as measured by two widely used metrics, the True Skill Statistic (TSS) and the Heidke Skill Score (HSS); and (iii) our analysis suggests that the logistic regression-based ensemble improves on the full-disk model (base learner) by ∼9% in terms of TSS and ∼10% in terms of HSS. Similarly, it improves on the AR-based model (base learner) by ∼17% and ∼20% in terms of TSS and HSS, respectively. Finally, compared to the baseline meta-model, it improves TSS by ∼10% and HSS by ∼15%.
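For reference, both metrics follow from a binary confusion matrix. TSS = TP/(TP+FN) - FP/(FP+TN). In the form commonly used in flare forecasting, HSS = 2(TP·TN - FN·FP) / [(TP+FN)(FN+TN) + (TP+FP)(FP+TN)]. The function below is a minimal sketch, and its name is illustrative.

```python
def skill_scores(tp, fn, fp, tn):
    """Return (TSS, HSS) for a binary confusion matrix."""
    # TSS: probability of detection minus probability of false detection.
    tss = tp / (tp + fn) - fp / (fp + tn)
    # HSS: skill relative to a random forecast with the same marginals.
    hss = 2 * (tp * tn - fn * fp) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    )
    return tss, hss
```

For example, `skill_scores(40, 10, 20, 130)` yields a TSS of about 0.667 and an HSS of 0.625.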

Figure. A timeline diagram to present the problem formulation of our deep learning-based full-disk flare prediction model using bi-daily observations of full-disk line-of-sight magnetograms and prediction window of 24 hours considered to label the magnetogram instances.

Deep Neural Networks based Solar Flare Prediction using Compressed Full-disk Line-of-sight Magnetograms

Chetraj Pandey; Rafal A. Angryk; Berkay Aydin

We selected three prediction modes: two binary modes, predicting the occurrence of ≥M1.0- and ≥C4.0-class flares, and one multi-class mode, predicting the occurrence of <C4.0, [≥C4.0, <M1.0], and ≥M1.0 flares within the next 24 hours. We perform our experiments in all three modes using three well-known pre-trained CNN models: AlexNet, VGG16, and ResNet34. For this, we collected compressed 8-bit images derived from full-disk line-of-sight magnetograms provided by the Helioseismic and Magnetic Imager (HMI) instrument onboard the Solar Dynamics Observatory (SDO). We trained our models using data-augmented oversampling to address the class imbalance. We followed a time-segmented cross-validation strategy to assess predictive performance, and used the true skill statistic (TSS) and Heidke skill score (HSS) as evaluation metrics.
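Labeling a magnetogram for the three modes can be sketched as follows under several assumptions. A flare counts if its peak time falls within the 24 hours after the observation. GOES classes are converted to peak 1-8 Angstrom X-ray flux (A=1e-8, B=1e-7, C=1e-6, M=1e-5, X=1e-4 W/m²), and the strongest flare in the window sets the label. The function names, the mode keys, and the use of peak time are hypothetical choices, not the paper's documented procedure.

```python
from datetime import datetime, timedelta

# Lower bound of each GOES class letter in W/m^2 (1-8 Angstrom band).
CLASS_BASE = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}


def peak_flux(goes_class):
    """Convert a GOES class string such as 'M1.0' or 'C4.5' to peak flux."""
    return CLASS_BASE[goes_class[0].upper()] * float(goes_class[1:])


def label_instance(obs_time, flares, mode, window=timedelta(hours=24)):
    """Label one magnetogram by the strongest flare peaking in (obs_time, obs_time + window].

    flares: iterable of (peak_time, goes_class) tuples.
    mode: "M1" (binary >=M1.0), "C4" (binary >=C4.0) or "multi" (3 classes).
    """
    fluxes = [peak_flux(c) for t, c in flares if obs_time < t <= obs_time + window]
    strongest = max(fluxes, default=0.0)  # 0.0 when no flare occurs
    if mode == "M1":
        return int(strongest >= peak_flux("M1.0"))
    if mode == "C4":
        return int(strongest >= peak_flux("C4.0"))
    if mode == "multi":
        if strongest >= peak_flux("M1.0"):
            return 2  # >=M1.0
        if strongest >= peak_flux("C4.0"):
            return 1  # [>=C4.0, <M1.0]
        return 0      # <C4.0, including no flare
    raise ValueError(f"unknown mode: {mode}")
```

A C5.2 flare five hours after the observation, for instance, yields labels 0, 1, and 1 in the ≥M1.0, ≥C4.0, and multi-class modes, respectively.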

Available here

Figure. An overview of the three deep learning architectures we use, (a) AlexNet-, (b) VGG16-, and (c) ResNet34-based models, for both binary and multi-class flare prediction. The models produce a set of probabilities determined by the prediction mode.

The major results of this study are: (1) we implemented an efficient and effective full-disk flare predictor for operational forecasting using compressed images of solar magnetograms; (2) our candidate model for multi-class flare prediction achieves an average TSS of 0.36 and an average HSS of 0.31; and (3) for binary prediction, we achieve an average TSS of 0.47 and HSS of 0.46 in ≥C4.0 mode, and an average TSS of 0.55 and HSS of 0.43 in ≥M1.0 mode.

We followed two time-segmented cross-validation strategies, chronological and non-chronological, to understand the predictive skill of our models. We also trained our models using data augmentation and oversampling to address the class imbalance. We used the true skill statistic (TSS) and Heidke skill score (HSS) as metrics to compare and evaluate them. Our experimental evaluation suggests that the sampling strategies involved heavily influence the training of a flare prediction model, owing to the imbalanced nature of the datasets. It also suggests that predicting ≥M1.0-class flares is more challenging than predicting ≥C1.0-class flares.
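One way to build time-segmented folds is to keep whole calendar months together, so that near-duplicate observations cannot straddle the training and test sets. In this hypothetical sketch, the chronological variant assigns contiguous runs of months to each fold. The non-chronological variant deals months to folds round-robin. Monthly segments and both assignment rules are assumptions, not the study's exact partitions.

```python
from datetime import datetime


def month_key(ts):
    return (ts.year, ts.month)


def time_segmented_folds(timestamps, k, chronological=True):
    """Assign each instance to one of k folds without splitting a calendar month.

    chronological=True: folds are contiguous, roughly equal runs of months.
    chronological=False: months are dealt to folds round-robin (interleaved).
    Returns a list of fold indices, one per timestamp.
    """
    months = sorted({month_key(t) for t in timestamps})
    if chronological:
        fold_of = {m: i * k // len(months) for i, m in enumerate(months)}
    else:
        fold_of = {m: i % k for i, m in enumerate(months)}
    return [fold_of[month_key(t)] for t in timestamps]
```

For fold `f`, you train on the instances whose fold index differs from `f` and test on the rest. Every observation from a given month therefore lands on one side only.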

Available here

Multiscale IoU: A Metric for Evaluation of Salient Object Detection with Fine Structures

A. Ahmadzadeh, D. J. Kempton, Y. Chen, R. A. Angryk

Available at IEEE and arXiv

A Framework for Local Outlier Detection from Spatio-Temporal Trajectory Datasets

X Cai, B Aydin, A Ji, R Angryk

We develop an interpretable, clustering-based technique to detect local outliers in multi-type trajectory datasets using the spatial and temporal attributes of moving objects. This local outlier detection involves three phases. First, we apply a temporal partition to divide each raw trajectory into multiple trajectory segments, and we extract trajectory features from the spatial and temporal attributes of each segment. Second, we generate template features of trajectory segments by applying a clustering schema. Lastly, we compute the abnormal score, a novel dissimilarity measure that quantifies the disparity between query and template trajectory segments in terms of trajectory features. We then determine the local outliers from the distribution of abnormal scores.
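The three phases can be sketched under heavy simplifications. This hypothetical illustration uses the following stand-ins:

- fixed-duration temporal partitioning;
- two toy features (mean speed and net displacement);
- a small k-means, naively seeded with the first k feature vectors, for the template features;
- the distance to the nearest template in place of the paper's abnormal score, with a mean-plus-two-standard-deviations cutoff.

None of these choices is the published method, and all names are hypothetical.

```python
import math
from statistics import mean, pstdev


def segment(track, duration):
    """Phase 1: split a time-sorted list of (t, x, y) points into fixed-duration pieces."""
    segments, current, start = [], [], track[0][0]
    for point in track:
        if point[0] - start >= duration and len(current) > 1:
            segments.append(current)
            current, start = [], point[0]
        current.append(point)
    if len(current) > 1:  # drop a trailing single-point remnant
        segments.append(current)
    return segments


def features(seg):
    """Toy features: mean speed and net displacement (times must strictly increase)."""
    speeds = [math.dist(a[1:], b[1:]) / (b[0] - a[0]) for a, b in zip(seg, seg[1:])]
    return (mean(speeds), math.dist(seg[0][1:], seg[-1][1:]))


def kmeans(points, k, iters=20):
    """Phase 2: tiny k-means; the first k points seed the centroids."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [tuple(map(mean, zip(*c))) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


def local_outliers(feature_vectors, k=2, z=2.0):
    """Phase 3: score = distance to nearest template; flag scores above mean + z*std."""
    templates = kmeans(feature_vectors, k)
    scores = [min(math.dist(f, c) for c in templates) for f in feature_vectors]
    cutoff = mean(scores) + z * pstdev(scores)
    return [i for i, s in enumerate(scores) if s > cutoff], scores
```

Because the score is computed against templates learned from the data itself, a segment counts as an outlier only relative to the typical behavior in its dataset. This is what makes the detection local.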

To demonstrate the effectiveness of our method, we conducted three case studies on real-life spatio-temporal trajectory datasets from the solar astroinformatics domain: solar active regions, coronal mass ejections, and polarity inversion lines (PILs). Our experimental results show that our local outlier detection approach can effectively discover erroneous reports from the reporting module and abnormal phenomena in various spatio-temporal trajectory datasets.

How to Train Your Flare Prediction Model: Revisiting Robust Sampling of Rare Events

A. Ahmadzadeh, B. Aydin, M. Georgoulis, D. J. Kempton, S. S. Mahajan, and R. A. Angryk

We have been working on a case study of solar flare forecasting by means of metadata feature time series. We treat it as a problem marked by prominent class imbalance and temporal coherence. We take full advantage of pre-flare time series in solar active regions, which is made possible by the Space Weather Analytics for Solar Flare benchmark dataset, known as SWAN-SF. This benchmark dataset is a partitioned collection of multivariate time series of active region properties, comprising 4,075 regions and spanning over 9 years of the Solar Dynamics Observatory (SDO) period of operations.

Twelve consecutive time series slices for the parameter Total Unsigned Current Helicity (TOTUSJH) corresponding to an M1.0-class flare associated with NOAA AR 11875 (HARP 3291). Each slice spans 12 hours of observations at a 12-minute cadence.

We illustrate the general concept of temporal coherence (figure above), which arises from the continuity required in time series forecasting. We also show that overlooking this effect can spuriously inflate a model's performance.
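The leakage behind temporal coherence is easy to reproduce. Slices cut from the same region with a stride shorter than the window overlap heavily, so a random instance-level split places near-copies in both the training and test sets. The hypothetical sketch below cuts slices and then splits by region, so that all slices of a region stay on one side. It illustrates the principle and is not the SWAN-SF partitioning. The function names and the 25% test fraction are assumptions.

```python
import random


def make_slices(series, window, stride):
    """Cut one region's series into fixed-length slices; stride < window means overlap."""
    return [series[i:i + window] for i in range(0, len(series) - window + 1, stride)]


def split_by_region(instances, test_fraction=0.25, seed=0):
    """Keep every slice of a region on the same side of the train/test split.

    instances: list of (region_id, slice) pairs.
    """
    regions = sorted({rid for rid, _ in instances})
    random.Random(seed).shuffle(regions)
    n_test = max(1, round(len(regions) * test_fraction))
    test_regions = set(regions[:n_test])
    train = [inst for inst in instances if inst[0] not in test_regions]
    test = [inst for inst in instances if inst[0] in test_regions]
    return train, test
```

With 12-minute cadence data, `window=60` corresponds to the 12-hour slices in the figure (720 minutes / 12 minutes = 60 samples).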

We further address another well-known challenge in rare event prediction, namely the class-imbalance issue. The SWAN-SF is an appropriate dataset for this, with a 60:1 imbalance ratio for GOES M- and X-class flares and an 800:1 ratio for X-class flares, relative to flare-quiet instances (figure on the right). We revisit the main remedies for these challenges and present several experiments to illustrate the exact impact that each of these remedies may have on performance.
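Two classic remedies are random undersampling of the majority class and random oversampling of the minority class. The sketch below is a generic illustration with hypothetical names, not the paper's experimental setup. As a common precaution, such rebalancing is applied to the training data only, so that evaluation still reflects the true imbalance.

```python
import random


def rebalance(instances, labels, method="undersample", seed=0):
    """Return a class-balanced, shuffled copy of a binary dataset.

    "undersample": keep a random subset of the majority class.
    "oversample": repeat random minority-class instances (with replacement).
    """
    rng = random.Random(seed)
    pos = [x for x, y in zip(instances, labels) if y == 1]
    neg = [x for x, y in zip(instances, labels) if y == 0]
    minority_label = 1 if len(pos) <= len(neg) else 0
    minority, majority = (pos, neg) if minority_label == 1 else (neg, pos)
    if method == "undersample":
        majority = rng.sample(majority, len(minority))
    elif method == "oversample":
        minority = minority + rng.choices(minority, k=len(majority) - len(minority))
    else:
        raise ValueError(f"unknown method: {method}")
    data = [(x, minority_label) for x in minority]
    data += [(x, 1 - minority_label) for x in majority]
    rng.shuffle(data)
    return [x for x, _ in data], [y for _, y in data]
```

At a 60:1 ratio, undersampling discards most of the flare-quiet data. Oversampling instead repeats each flaring instance many times, which risks overfitting to those few events.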

This study is published in The Astrophysical Journal Supplement Series, here. A pre-print is also available here.

All-Clear Flare Prediction Using Interval-based Time Series Classifiers

Anli Ji, Berkay Aydin, Manolis K. Georgoulis, Rafal Angryk

An all-clear flare prediction is a type of solar flare forecasting that aims to predict relatively small flares and flare-quiet regions. It focuses on forecasting the non-flaring class precisely, rather than simply giving a binary or probabilistic estimate of whether a flare will occur. While many flare prediction studies do not address this problem directly, all-clear predictions can be useful in operational contexts. However, in all-clear predictions, finding the right balance between avoiding false negatives (misses) and reducing false positives (false alarms) is often challenging.

We put more emphasis on predicting non-flaring instances with high precision while still maintaining valuable predictive results. Our study focuses on training and testing a set of interval-based time series classifiers called Time Series Forest (TSF). These classifiers are used toward building an all-clear flare prediction system based on multivariate time series data. An ensemble schema and an overview of the system are shown in the figure.
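The original TSF formulation summarizes random intervals by their mean, standard deviation, and slope. The sketch below covers only this feature extraction step and omits the tree ensemble trained on the features. The majority vote, which combines one classifier per magnetic field parameter, is a hypothetical stand-in for the homogeneous ensemble. All names and defaults are illustrative.

```python
import random
from statistics import mean, pstdev


def slope(values):
    """Least-squares slope of values against their index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_bar, y_bar = (n - 1) / 2, mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def random_intervals(length, n_intervals, min_len=3, seed=0):
    """Draw random [start, end) intervals of at least min_len points."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_intervals):
        start = rng.randrange(0, length - min_len + 1)
        end = rng.randrange(start + min_len, length + 1)
        intervals.append((start, end))
    return intervals


def interval_features(series, intervals):
    """Mean, standard deviation, and slope of each interval, concatenated."""
    feats = []
    for start, end in intervals:
        window = series[start:end]
        feats.extend([mean(window), pstdev(window), slope(window)])
    return feats


def majority_vote(predictions):
    """Combine 0/1 outputs of per-parameter classifiers; ties go to 0 (hypothetical)."""
    return int(2 * sum(predictions) > len(predictions))
```

In a full TSF, each tree draws its own set of intervals, so the forest as a whole looks at many different parts of the series.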

Schematic overview of the homogeneous ensemble pipeline

Our research is built around three branches: data collection, predictive model building and evaluation, and comparison of our time series classification models with baselines on our benchmark datasets. Our results show that the time series classifiers outperform the baselines in terms of skill scores, precision, and recall. Tuning model hyperparameters can improve them further for more precise all-clear forecasts.

This study is published in IEEE Big Data 2020, and is accessible here, and also on arXiv here.

Multivariate time series dataset for space weather data analytics

Rafal A. Angryk, Petrus C. Martens, Berkay Aydin, Dustin Kempton, Sushant S. Mahajan, Sunitha Basodi, Azim Ahmadzadeh, Xumin Cai, Soukaina Filali Boubrahimi, Shah Muhammad Hamdi, Michael A. Schuh & Manolis K. Georgoulis

We introduce and make openly accessible a comprehensive, multivariate time series (MVTS) dataset extracted from solar photospheric vector magnetograms in Space weather HMI Active Region Patch (SHARP) series. Our dataset also includes a cross-checked NOAA solar flare catalog that immediately facilitates solar flare prediction efforts. We discuss methods used for data collection, cleaning and pre-processing of the solar active region and flare data, and we further describe a novel data integration and sampling methodology.

Our dataset covers 4,098 MVTS data collections from active regions occurring between May 2010 and December 2018, includes 51 flare-predictive parameters, and integrates over 10,000 flare reports. The immediate tasks enabled by the disseminated dataset include optimization of solar flare prediction and detailed investigation of elusive flare predictors or precursors. Both operational (research-to-operations) and basic research (operations-to-research) benefits may follow in the future.

This study is published in Scientific Data, Nature, and is publicly accessible here.

Overview of our 4-step flare data enhancement and cross-checking procedure, along with the enhancements gained after each step (brief explanations also provided). Cross-checking with secondary flare data sources (SSW Latest Events and Hinode-XRT) results in three sets of flare reports: (1) primary-verified, where at least one secondary source verifies the location of the primary flare report (from GOES); (2) secondary-verified, where the GOES-reported location could not be verified but the SSW- and XRT-reported locations agree; and (3) non-verified, where the flare location cannot be verified from any of the three data sources.
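The three-way outcome of the cross-checking step can be expressed as a small decision rule. In this hypothetical sketch, each source reports a (longitude, latitude) location in degrees, or None when a report is absent. Two reports agree if they lie within a fixed tolerance. The 5° tolerance and the planar distance are illustrative assumptions, not the procedure's actual matching criteria.

```python
import math


def agree(loc_a, loc_b, tol_deg=5.0):
    """Two (lon, lat) locations agree if both exist and lie within tol_deg."""
    if loc_a is None or loc_b is None:
        return False
    return math.dist(loc_a, loc_b) <= tol_deg


def verification_status(goes, ssw, xrt, tol_deg=5.0):
    """Classify one flare report using the three-way rule described above."""
    if agree(goes, ssw, tol_deg) or agree(goes, xrt, tol_deg):
        return "primary-verified"
    if agree(ssw, xrt, tol_deg):
        return "secondary-verified"
    return "non-verified"
```

The rule checks GOES first, so a report counts as secondary-verified only when GOES cannot be confirmed but the two secondary sources agree with each other.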

Access our research archive here
