Each multivariate time series has 60 timesteps, and each timestep has 33 parameters:
TOTUSJH, TOTBSQ, TOTPOT, TOTUSJZ, ABSNJZH, SAVNCPP, USFLUX, TOTFZ, MEANPOT, EPSZ, SHRGT45,
MEANSHR, MEANGAM, MEANGBT, MEANGBZ, MEANGBH, MEANJZH, TOTFY, MEANJZD, MEANALP, TOTFX, EPSY,
EPSX, R_VALUE, RBZ_VALUE, RBT_VALUE, RBP_VALUE, FDIM, BZ_FDIM, BT_FDIM, BP_FDIM, PIL_LEN, XR_MAX.
The plot shows the distribution of solar flare time series data in partition 1 across 5 classes. The per-class frequencies are listed below:
Class Q: 63400,
Class B: 6010,
Class C: 6531,
Class M: 1157,
Class X: 172.
The plot shows the distribution of solar flare time series data in partition 1 across 2 classes. The per-class frequencies are listed below:
Positive class : 1329,
Class M: 1157; Class X: 172.
Negative class: 75941,
Class Q: 63400; Class B: 6010; Class C: 6531.
Partition 1 has 77270 samples. The pet dataset samples 1000 out of these 77270 using a climatology-preserving strategy: 500 flares and 500 no-flares are kept. The sampling ratios are listed below:
Positive class (M and X): 500/1329 ≈ 0.376.
M_after_sampling = 1157 * 0.376 ≈ 435.
X_after_sampling = 500 - 435 = 65.
Negative class (Q, B, and C): 500/75941 ≈ 0.0066.
B_after_sampling = 6010 * 0.0066 ≈ 40.
C_after_sampling = 6531 * 0.0066 ≈ 43.
Q_after_sampling = 500 - 40 - 43 = 417.
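The sampling arithmetic above can be written out as a short Python sketch (the class counts are those of partition 1; rounding the non-dominant classes by the group ratio and assigning the remainder to the dominant class of each group is an assumption that reproduces the listed numbers):

```python
# Class counts in partition 1 (from the tables above).
counts = {"Q": 63400, "B": 6010, "C": 6531, "M": 1157, "X": 172}
n_pos = n_neg = 500  # 500 flares and 500 no-flares out of 1000

pos_total = counts["M"] + counts["X"]                # 1329
neg_total = counts["Q"] + counts["B"] + counts["C"]  # 75941

# Round the smaller classes by the group ratio, then assign the
# remainder to the dominant class so each group sums exactly to 500.
m_sampled = round(counts["M"] * n_pos / pos_total)   # -> 435
x_sampled = n_pos - m_sampled                        # -> 65
b_sampled = round(counts["B"] * n_neg / neg_total)   # -> 40
c_sampled = round(counts["C"] * n_neg / neg_total)   # -> 43
q_sampled = n_neg - b_sampled - c_sampled            # -> 417
```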
Ref: Ahmadzadeh, Azim, et al. "Challenges with extreme class-imbalance and temporal coherence: A study on solar flare data." 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019.
The plot shows the distribution of solar flare time series data across 5 classes after sampling. The count within each class is listed below:
Class Q: 417,
Class B: 40,
Class C: 43,
Class M: 435,
Class X: 65.
The plot shows the distribution of solar flare time series data across 2 classes after sampling. The count within each class is listed below:
Positive class : 500,
Class M: 435; Class X: 65.
Negative class: 500,
Class Q: 417; Class B: 40; Class C: 43.
Ref: https://bitbucket.org/gsudmlab/yang_3861/src/master/src/preprocessing/
A min_max_normalizer for processing multivariate time series data with shape [n, timesteps, num_features], e.g. [1000, 60, 33].
|- n: the total number of records in a MVTS dataset.
|- timesteps: the total steps in a single time sequence, e.g. 60.
|- num_features: the number of features of one time step.
NOTE that this is a global normalization method: each value of a given feature is scaled across all n * timesteps values of that feature.
Meanwhile, the scaler information is saved so the normalization can be inverted (undone) later; each feature has one global scaler.
(** This method is extended from sklearn's preprocessing package; see `sklearn.preprocessing.MinMaxScaler`.)
Example: ( Data format description: 3-d array, [n, timesteps, num_features] )
data = [ [ [ 1 2 3]
[ 0 10 4] ]
[ [-1 18 2]
[ 4 1 1] ] ]
data_norm = [ [ [-0.2 -0.88235294 0.33333333]
[-0.6 0.05882353 1. ] ]
[ [-1. 1. -0.33333333]
[ 1. -1. -1. ] ] ]
We can see that each value is scaled across the different multivariate time series.
e.g. the first feature of all MVTS: [[1, 0], [-1, 4]] --> [[-0.2, -0.6], [-1, 1]].
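A minimal sketch of such a global normalizer, built on `sklearn.preprocessing.MinMaxScaler` with `feature_range=(-1, 1)` to match the example values above (an illustrative stand-in, not the repository's implementation):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def min_max_normalize(data, feature_range=(-1, 1)):
    """Globally scale each feature across all n * timesteps values.

    data: array of shape [n, timesteps, num_features].
    Returns the scaled array and the fitted scaler (kept for inversion).
    """
    n, timesteps, num_features = data.shape
    flat = data.reshape(n * timesteps, num_features)  # one column per feature
    scaler = MinMaxScaler(feature_range=feature_range)
    return scaler.fit_transform(flat).reshape(data.shape), scaler

data = np.array([[[1, 2, 3], [0, 10, 4]],
                 [[-1, 18, 2], [4, 1, 1]]], dtype=float)
data_norm, scaler = min_max_normalize(data)
# Undo the normalization with the saved scaler:
data_back = scaler.inverse_transform(
    data_norm.reshape(-1, data.shape[2])).reshape(data.shape)
```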
Introduction:
A Gramian Angular Field is an image obtained from a time series, representing temporal correlations between pairs of time points.
Two methods are available: Gramian Angular Summation Field and Gramian Angular Difference Field.
Usage in python:
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
from pyts.image import GramianAngularField

# Transform the time series into Gramian Angular Fields
gasf = GramianAngularField(image_size=24, method='summation')
X_gasf = gasf.fit_transform(X)
gadf = GramianAngularField(image_size=24, method='difference')
X_gadf = gadf.fit_transform(X)
# Show the images for the first time series
fig = plt.figure(figsize=(8, 4))
grid = ImageGrid(fig, 111, nrows_ncols=(1, 2), axes_pad=0.15, share_all=True,
                 cbar_location="right", cbar_mode="single", cbar_size="7%", cbar_pad=0.3)
images = [X_gasf[0], X_gadf[0]]
titles = ['Summation', 'Difference']
for image, title, ax in zip(images, titles, grid):
    im = ax.imshow(image, cmap='rainbow', origin='lower')
    ax.set_title(title, fontdict={'fontsize': 12})
    ax.cax.colorbar(im)
    ax.cax.toggle_label(True)
plt.suptitle('Gramian Angular Fields', y=0.98, fontsize=16)
plt.show()
An example of GAF with different settings of image_size:
The figure above shows GAFs generated with different sizes. A smaller size means fewer bins are used to compute the GAF, and therefore less computation. Choosing an appropriate size is thus a trade-off between representation quality and computational cost. In this case, sizes between 10 and 50 give good representations since the series is a simple line, but ratios of 0.25 and 0.5 are more suitable for more complex cases.
Usage in python:
from src.data.data_reader import DataReader
from src.data.ts_imaging import TSImaging
from pyts.image import MarkovTransitionField, GramianAngularField
from src.plotting.plotter import Plotter
# Read MVTS and corresponding labels.
X = DataReader().read_npy('.' + PATH_TO_PET_NORM_TOP_5_FEATURE_DATA)
y = DataReader().read_npy('.' + PATH_TO_PET_LABEL)
# GAF transformation, saved in image format.
tsi = TSImaging()
plotter = Plotter()
for i in range(len(X)):  # mvts index
    gaf_summ = tsi.transform_mvts(X[i], GramianAngularField, image_size=28)
    gaf_diff = tsi.transform_mvts(X[i], GramianAngularField, image_size=28, method='difference')
    label = y[i]
    for j in range(len(TOP_5_FEATURES)):  # param index
        plotter.save_gaf_with_image(gaf_summ[j], gaf_diff[j], label, i, j)
A workflow for processing and generating the GAF image dataset:
Observations:
(1) The GAF summation images are symmetric about the diagonal running from lower left to upper right (the matrix's main diagonal when displayed with origin='lower').
(2) The GAF difference images are antisymmetric about that diagonal: values mirrored across it have opposite signs.
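Both observations follow from the standard GAF definitions, GASF[i, j] = cos(φ_i + φ_j) and GADF[i, j] = sin(φ_i - φ_j), where φ is the arccos of the series rescaled to [-1, 1]. A minimal numpy sketch (not the pyts implementation) makes them easy to verify:

```python
import numpy as np

def gaf(x, method="summation"):
    """Gramian Angular Field of a 1-D series (minimal sketch)."""
    x = np.asarray(x, dtype=float)
    # Rescale to [-1, 1], then map each value to an angle in [0, pi].
    x_scaled = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x_scaled, -1, 1))
    if method == "summation":
        return np.cos(phi[:, None] + phi[None, :])  # GASF: symmetric
    return np.sin(phi[:, None] - phi[None, :])      # GADF: antisymmetric

x = np.sin(np.linspace(0, 4 * np.pi, 60))  # toy 60-step series
gasf_img = gaf(x, "summation")
gadf_img = gaf(x, "difference")
# GASF equals its transpose; GADF equals the negative of its transpose.
```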
GAF summation images dataset examples:
GAF images transformed from the corresponding MVTS with the summation method. Each image in the table above is obtained from a univariate time series, e.g. of shape (60, 1). Each multivariate time series produces 5 images when the top-5 features are used. In this way, we can generate an image dataset for all MVTS.
GAF difference images dataset examples:
GAF images transformed from the corresponding MVTS with the difference method. Each image in the table above is obtained from a univariate time series, e.g. of shape (60, 1). Each multivariate time series produces 5 images when the top-5 features are used. In this way, we can generate an image dataset for all MVTS.
(1) How many images:
In total, 5000 images are generated by GASF and 5000 by GADF; 5000 = 1000 MVTS × 5 features.
(2) distribution of classes:
Negative class:
Class Q: 417,
Class B: 40,
Class C: 43.
Positive class:
Class M: 435,
Class X: 65.
(3) image sizes: 28 * 28 pixels.
(4) RGB or grayscale: both grayscale (recommended) and RGB versions are included.
(5) image type: *.jpg files.
(6) data architecture: images that can be read via PIL (Python Imaging Library).
ref: https://bitbucket.org/gsudmlab/yang_3861/src/master/data/gaf_dataset/
|- grayscale version: image_gaf_diff and image_gaf_summ directories.
|- RGB version: image_gaf_diff_rgb and image_gaf_summ_rgb directories.
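A sketch of how one 28 × 28 GAF matrix might be written as a grayscale *.jpg and read back via PIL; `save_gaf_as_jpg` is a hypothetical helper, not the repository's `plotter.save_gaf_with_image`:

```python
import numpy as np
from PIL import Image

def save_gaf_as_jpg(gaf_img, path):
    """Rescale a GAF matrix from [-1, 1] to [0, 255] and save as grayscale JPEG."""
    pixels = np.round((gaf_img + 1.0) / 2.0 * 255.0).astype(np.uint8)
    Image.fromarray(pixels, mode="L").save(path, format="JPEG")

# Round-trip a toy 28 x 28 field.
angles = np.linspace(0, np.pi, 28)
toy = np.cos(angles[:, None] + angles[None, :])  # values in [-1, 1]
save_gaf_as_jpg(toy, "gaf_example.jpg")
loaded = Image.open("gaf_example.jpg")  # 28 x 28, mode "L" (grayscale)
```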
Introduction:
A model extended from LeNet-5 for image classification.
CNN experiments applying different numbers of features:
|- Top-1 features: 'TOTUSJH'.
|- Top-2 features: 'TOTUSJH' and 'TOTBSQ'.
|- Top-3 features: 'TOTUSJH', 'TOTBSQ' and 'TOTPOT'.
|- Top-5 features: 'TOTUSJH', 'TOTBSQ', 'TOTPOT', 'TOTUSJZ' and 'ABSNJZH'.
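Slicing the MVTS array down to a top-k feature subset can be sketched as below, assuming the last axis follows the ranked parameter order listed above; `take_features` is a hypothetical helper:

```python
import numpy as np

# Assumed: the last axis follows the ranked parameter order (first 5 of 33 shown).
FEATURE_NAMES = ["TOTUSJH", "TOTBSQ", "TOTPOT", "TOTUSJZ", "ABSNJZH"]

def take_features(mvts, names, wanted):
    """Slice an [n, timesteps, num_features] array down to the named features."""
    idx = [names.index(w) for w in wanted]
    return mvts[:, :, idx]

rng = np.random.default_rng(0)
X = rng.random((1000, 60, len(FEATURE_NAMES)))  # stand-in MVTS dataset
X_top3 = take_features(X, FEATURE_NAMES, ["TOTUSJH", "TOTBSQ", "TOTPOT"])
```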