Understanding the functions of the visual system has been a major goal of neuroscience for many years. However, the relationship between spontaneous brain activity and visual saliency in natural stimuli has yet to be elucidated. In this study, we developed an optimized machine learning-based decoding model to explore possible relationships between electroencephalography (EEG) characteristics and visual saliency. Optimal features were extracted from the EEG signals and from saliency maps computed with an unsupervised saliency model [Tavakoli and Laaksonen, 2017]. Subsequently, various unsupervised feature selection/extraction techniques were examined in combination with different supervised regression models. The robustness of the presented model was verified by means of ten-fold or nested cross-validation procedures, and promising results were achieved in the reconstruction of saliency features from the selected EEG characteristics. By successfully demonstrating that EEG characteristics can predict the real-time saliency distribution in natural videos, we suggest the feasibility of quantifying visual content by measuring brain activity (EEG signals) in real environments, which would facilitate the understanding of cortical involvement in the processing of natural visual stimuli and the development of applications motivated by human visual processing.
Our paper:
Z. Liang, Y. Hamada, S. Oba, and S. Ishii, Characterization of electroencephalography signals for estimating saliency features in videos, Neural Networks, 105, pp. 52-64, 2018.
The data acquisition environment is shown below.
The selected channel locations were near the visual cortices and are highlighted by red circles, as in:
S. Jirayucharoensak, S. Pan-Ngum, and P. Israsena, EEG-Based Emotion Recognition Using Deep Learning Network with Principal Component Based Covariate Shift Adaptation, The Scientific World Journal, vol. 2014, Article ID 627892, pp. 1-10, 2014.
The numbers of components obtained under different percentages of retained variance for the various feature extraction techniques are presented below.
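As a minimal illustration of how such component counts can be obtained, the sketch below uses scikit-learn's PCA on a placeholder feature matrix; the array name `eeg_features` and its dimensions are assumptions for demonstration only, not the paper's actual data or code.

```python
# Minimal sketch: count how many principal components are needed to retain
# a given percentage of variance. `eeg_features` is a hypothetical
# (n_samples, n_features) placeholder array.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
eeg_features = rng.standard_normal((500, 64))  # placeholder EEG feature matrix

pca = PCA().fit(eeg_features)
cum_var = np.cumsum(pca.explained_variance_ratio_)

for ratio in (0.80, 0.85, 0.90, 0.95, 0.99):
    # smallest number of leading components whose cumulative variance >= ratio
    n_components = int(np.searchsorted(cum_var, ratio) + 1)
    print(f"{ratio:.0%} variance retained -> {n_components} components")
```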
We also examined the robustness of KPCA+KRR when different ratios of training and test (validation) data were used.
For each video, we selected the first ψ fraction of the frames as the training data and the remaining (1−ψ) fraction as the test data. We trained the KPCA+KRR model with all the training data from the 10 videos (training dataset) and verified the prediction performance on all the test data from the 10 videos (test dataset).
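The sketch below illustrates this ψ-based split and a KPCA+KRR pipeline with scikit-learn. The per-video arrays `X` (EEG features per frame) and `y` (saliency features per frame), the helper names, and the kernel settings are assumptions for illustration, not the exact configuration used in the paper.

```python
# Minimal sketch (not the paper's code): split each video's frames by psi,
# pool the training/test parts across videos, and build a KPCA+KRR pipeline.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import KernelPCA
from sklearn.kernel_ridge import KernelRidge

def split_by_psi(X, y, psi=0.8):
    """Use the first psi fraction of frames for training, the rest for testing."""
    n_train = int(len(X) * psi)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

def build_datasets(videos, psi=0.8):
    """Pool the training/test parts of all videos into single datasets.
    `videos` is a list of (X, y) pairs, one pair per video."""
    tr_X, tr_y, te_X, te_y = [], [], [], []
    for X, y in videos:
        Xtr, ytr, Xte, yte = split_by_psi(X, y, psi)
        tr_X.append(Xtr); tr_y.append(ytr)
        te_X.append(Xte); te_y.append(yte)
    return (np.vstack(tr_X), np.concatenate(tr_y),
            np.vstack(te_X), np.concatenate(te_y))

# KPCA for unsupervised feature extraction, KRR for regression
# (n_components, kernels, and alpha here are illustrative values).
model = make_pipeline(KernelPCA(n_components=20, kernel="rbf"),
                      KernelRidge(kernel="rbf", alpha=1.0))
```

With these pieces, `model.fit(X_train, y_train)` followed by `model.predict(X_test)` would yield predicted time-series saliency features for the pooled test frames.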
To better understand the prediction performance, we varied ψ from 0.3 to 0.9; the results on the prediction of the time-series saliency features Pw are reported below.
As expected, a larger amount of training data was associated with higher prediction accuracy. When the training dataset comprised more than 50% of the data, the prediction performance exceeded 70% accuracy in the estimation of the saliency features. When more than 80% of the data were used for training, a reasonably high test accuracy (correlation coefficient > 0.85) was obtained. These promising results reveal that the saliency fluctuations across frames within a single video and across different videos can be robustly and accurately estimated from the EEG characteristics.
Moreover, the figure below shows the prediction results on the test data of each video for ψ=0.8. These results verify that the predicted time-series saliency features Pw on the test data were fairly close to the actual features extracted from the real test time series (unseen by the regression model), and reproduced temporal behaviors such as rises and falls well, especially in videos 1, 2, 3, 6, and 8.
However, we also observed a consistent gap between the predictions and the ground truth on video 7. One possible reason is that the levels of the extracted saliency features in the training (80%) and test (20%) parts were quite different, and this difference strongly affected the prediction results. In future work, more videos with various saliency levels will be studied, and this effect on saliency estimation from EEG signals will be explored further.
We examined the prediction performance of KPCA+KRR, in terms of the correlation coefficient and NMSE, for each of the 10 videos in a 10-fold cross-validation procedure.
The results are reported below, using feature extraction with 95% of the variance retained. In the 10-fold cross-validation results on individual videos, most of the correlation coefficients were above 0.80 with p-value < 0.05, and the median of the obtained NMSEs was above 0.35. This performance highlights that our KPCA+KRR reproduced the dynamic nature of the target saliency features well, and the per-video performance suggests the applicability of KPCA+KRR for predicting/reproducing visual features in videos from EEG signals.
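For reference, the sketch below shows one way the two evaluation metrics could be computed for a fold's predictions; the exact NMSE normalization used in the paper may differ from the variance-based normalization assumed here.

```python
# Minimal sketch of the two evaluation metrics (assumed definitions).
import numpy as np
from scipy.stats import pearsonr

def nmse(y_true, y_pred):
    """Mean squared error normalized by the variance of the ground truth
    (one common convention; the paper's normalization may differ)."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

def evaluate(y_true, y_pred):
    """Return Pearson correlation, its p-value, and NMSE for one fold."""
    r, p_value = pearsonr(y_true, y_pred)
    return r, p_value, nmse(y_true, y_pred)
```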
Parameter tuning was conducted in a nested CV manner; more details of the procedure are presented below.
Two loops were involved in the nested CV: an inner loop for tuning parameters and selecting the best parameter setting, and an outer loop for evaluating the model performance using the parameters selected in the inner loop. Algorithm 1 illustrates the nested CV procedure.
The whole dataset was first randomly divided into ten folds, with nine folds for training and one fold for testing. Then, based on the nine training folds, the hyperparameters were optimized by 9-fold CV, so that the hyperparameters achieving the best validation performance were selected. Subsequently, the model with the selected hyperparameters was evaluated on the remaining test fold. By repeating this procedure while changing the test fold, we cross-validated the whole process of hyperparameter determination and obtained an unbiased estimate of the best performance.
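A minimal sketch of this nested CV loop with scikit-learn is given below. The hyperparameter grid, the KPCA+KRR settings, and the pooled arrays `X` and `y` (EEG features and saliency targets) are assumptions for illustration, not the grid or data layout used in the paper.

```python
# Minimal sketch of nested cross-validation: an inner 9-fold grid search for
# hyperparameter selection inside an outer 10-fold evaluation loop.
import numpy as np
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.decomposition import KernelPCA
from sklearn.kernel_ridge import KernelRidge

pipe = Pipeline([("kpca", KernelPCA(kernel="rbf")),
                 ("krr", KernelRidge(kernel="rbf"))])
param_grid = {"kpca__n_components": [10, 20, 40],     # illustrative grid
              "krr__alpha": [0.01, 0.1, 1.0],
              "krr__gamma": [0.01, 0.1, 1.0]}

outer_cv = KFold(n_splits=10, shuffle=True, random_state=0)
outer_scores = []
for train_idx, test_idx in outer_cv.split(X):          # outer loop: evaluation
    # inner loop: 9-fold CV on the training folds to pick hyperparameters
    search = GridSearchCV(pipe, param_grid, cv=9,
                          scoring="neg_mean_squared_error")
    search.fit(X[train_idx], y[train_idx])
    # evaluate the refit best model on the held-out test fold
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

print("unbiased performance estimate (negative MSE):", np.mean(outer_scores))
```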
We examined which EEG electrodes were frequently selected during feature selection in the 10-fold cross-validation procedure; a sketch of how such a selection rate could be computed follows the references below. The figure below shows a brain heatmap representing the selection rate of the EEG electrodes; the highest selection rate is colored red, while the lowest is colored green. The results show that the brain regions surrounding P3 and P1, and surrounding P4 and P6, are important for selective attention (visual saliency). This finding is in line with existing knowledge: during visual processing, most coherences occur among occipital and parietal electrodes [Colby and Goldberg, 1999; Smith et al., 2012], and the neural responses in those visual areas play a critical role in attentional control [Kanwisher and Wojciulik, 2000; Imamoglu et al., 2012].
C.L. Colby and M.E. Goldberg, Space and attention in parietal cortex, Annual Review of Neuroscience, vol. 22, pp. 319-349, 1999.
M.L. Smith, F. Gosselin, and P.G. Schyns, Measuring internal representations from behavioral and brain data, Current Biology, vol. 22(3), pp. 191-196, 2012.
N. Kanwisher and E. Wojciulik, Visual attention: insights from brain imaging, Nature Reviews Neuroscience, vol. 1, pp. 91-100, 2000.
F. Imamoglu, T. Kahnt, C. Koch, and J.D. Haynes, Changes in functional connectivity support conscious object recognition, NeuroImage, vol. 63(4), pp. 1909-1917, 2012.
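As referenced above, the sketch below shows one way an electrode selection rate across CV folds could be computed. The channel list and `selected_per_fold` (the electrodes chosen by feature selection in each fold) are hypothetical stand-ins, not the paper's actual selection output.

```python
# Minimal sketch: per-electrode selection rate across the 10 CV folds.
from collections import Counter

channel_names = ["P1", "P3", "P4", "P6", "O1", "O2", "Oz", "POz"]  # example subset
selected_per_fold = [
    ["P3", "P1", "P4"], ["P3", "P4", "P6"], ["P1", "P3", "O1"],  # ... one list per fold
]

counts = Counter(ch for fold in selected_per_fold for ch in fold)
n_folds = len(selected_per_fold)
selection_rate = {ch: counts.get(ch, 0) / n_folds for ch in channel_names}

# Values lie in [0, 1]; such rates would then be mapped onto a scalp heatmap.
print(selection_rate)
```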