Comparative Analysis of Classifier Performance and Generalization using Saccade Related Potentials
by
Header photo by Kevin Ku, via Pexels.com
GitHub Repository: https://github.com/wachsmanadam/flerp-modeling
In solving visual detection problems using computer algorithms, the recently-flourishing Computer Vision field faces a constant problem: data scientists must act as interpreters between human understanding and algorithms' understanding of visual data. In classification problems, data labels are the primary tool for ensuring algorithmic solutions sufficiently match human performance; labeling attempts to impart the human understanding of the distinction between different stimuli. However, increasingly complex and generalized visual processing problems necessitate increasingly large quantities of data with increasingly complex labeling schemes (or reformulations of the problem that narrow the category space)--for example, the problem of discriminating video data of real-world actions that may fit multiple different categories at different points in time (Kliper-Gross et al. 2011). Meanwhile, the field of Brain-Computer Interface (BCI) research takes a different approach toward uniting human understanding and machine learning. Rather than training algorithms to react to sensor data in a way that matches human behavior, the BCI approach directly integrates with the computational power of the human nervous system by training algorithms to react to different patterns of human signals. This exchanges the problems of labeling data and engineering algorithms to achieve human-like results for the problems of accurately collecting human signal data and labeling state over time (Wolpaw et al. 2020). It also sacrifices the theoretical possibility of full automation, but it still presents great possibilities for augmenting human performance, especially given neuroscience research suggesting that brain signals precede even conscious awareness of the intention to move (Matsuhashi & Hallett, 2008).
Building off previous BCI research, this project is a direct follow-up to Brouwer et al.'s 2017 study of eye tracking and electroencephalogram (EEG) signatures during a visual search task performed under low and high cognitive load conditions. As part of their study, they used a Linear Support Vector Machine (SVM) classifier to distinguish target-related from non-target-related patterns of eye motion and EEG signals. The current project attempts to expand on this by comparing that performance against classifier models that have been demonstrated to be more robust to varying levels of signal noise: Naive Bayes, Logistic Regression, and Random Forest (Atla et al. 2011). Should any of these models prove consistently more accurate, it could be helpful for classification problems using similar signals. Additionally, this project tests whether these models succeed or fail to generalize when trained exclusively on data from a single cognitive load condition. Whether generalization degrades more in one direction (low-to-high or high-to-low cognitive load) than the other would have crucial implications for applications that attempt to determine and react to human biosignals in real time.
A growing body of research has started to apply machine learning methodology to classify human states using biosignals. In a 2014 study by Wang, Nie, and Lu, EEG data were input into an SVM model to classify positive and negative emotional states elicited by movie clips, achieving an average classification accuracy of 91.77% after applying dimensionality reduction. Another study aimed to predict cognitive load and driving situation at a given time by training Random Forest and SVM models on eye tracking and car sensor data during real-life driving (Yoshida et al. 2014). These predictions, which centered on the state of the steering wheel, achieved above-chance accuracy and recall during most intervals during turns. The pervasiveness of SVM models was curious, however. One key study of machine learning classifiers, which injected increasingly large amounts of artificial noise into training datasets, found that SVM models only outperformed Decision Tree models in the case of moderate (30-40%) noise and multiclass targets, while both Decision Trees and Logistic Regression classifiers achieved consistently higher accuracy than SVM except under the highest (50%) noise condition, at which point all of them underperformed next to the Naive Bayes classifier (Atla et al. 2011). Given the amount of noise that is often present in EEG signal data, it appeared worthwhile to compare a similar set of models using the Brouwer et al. (2017) biosignal data to see whether the patterns in the Atla study hold.
How effectively do Naive Bayes, Logistic Regression, and Random Forest classifiers distinguish task target fixations from non-target fixations compared to the Linear Support Vector Machine (SVM) model used in the original study?
How does training using only high or low cognitive load data affect the classifiers' ability to generalize to the other cognitive load condition? Specifically, is generalization worse in one direction than in the other?
This project will utilize the Netherlands Organisation for Applied Scientific Research's (TNO) Fixation-Locked Event-Related Potential (FLERP) experiment dataset, available from the United States Army's Cognition and Neuroergonomics Collaborative Technology Alliance (CANCTA) data repository. These data are available upon valid request for approved use from dev.cancta.net/C3DS/db_login.php.
Per the data description prepared by DCS Corporation (n.d.), the dataset consists of experiment data from twenty-one participants. It includes event data from the experiment task, such as whether a given stimulus was a target or a distractor (categorical); gaze and pupil data recorded at 60Hz (numerical); EEG signals recorded at 512Hz (numerical); and electrooculogram (EOG) signals recorded at 512Hz (numerical).
Figure 1: Example of the task screen in the Brouwer et al. (2017) study, during the presentation of a distractor.
The experimental task consisted of a monitoring task and an auditory math task. In the low cognitive load condition, participants only performed the monitoring task, which required watching fifteen locations arranged in a grid on a computer monitor. The locations were initially all left blank as "####". Then, the locations would individually be revealed for one second as shown in Figure 1, surrounded by a white border and displaying whether the location contained a target ("#FA#") or a distractor ("#OK#"). After presentation, the location's status would be hidden for the remainder of the trial block. Participants were instructed to remember all locations containing targets. Once all fifteen locations had been revealed, they were replaced by empty boxes, and participants were instructed to click on the boxes corresponding to all the locations that had presented "#FA#". Once finished, the participant would click an OK button to continue. Every trial block had two to four "#FA#" targets, and locations were chosen pseudo-randomly such that no two "#FA#" targets would appear next to one another.
In the high cognitive load condition, participants would perform the monitoring task as above while simultaneously performing a math task. The math task consisted of an aurally presented sum and/or subtraction of six numbers between 6 and 12. The first number was presented one second after the start of the monitoring task, followed by another number every 2660 ms. At the end of the monitoring task, participants would click the boxes corresponding to "#FA#" locations as above. After that, they would be prompted to type their answer to the math task, and would receive feedback showing the correct answer following an incorrect submission.
Full details are available from Brouwer et al. (2017) or the experiment's section in DCS Corporation's (n.d.) SANDR description document.
The project utilized archived classifier input data files (path: TNO_FLERP/Additional Data/InputForClass) stored as Matlab structs containing metadata from each trial. The EEG inputs and the derived eye tracking statistic inputs are stored separately, and the eye tracking data are additionally split into two files per subject (before and after a mid-experiment break). The data were imported using scipy's io module (Virtanen et al. 2020) and integrated into classes that extract all information from the structs into more usable forms, primarily numpy arrays (Harris et al. 2020) and pandas DataFrames (Reback et al. 2021; McKinney 2010). Descriptions of the data and trial tags are available in ioclasses.py in the project repository, and are also summarized in Table 1.
Table 1: The input classes that parse the .mat data files along with important attributes and descriptions. This is not an exhaustive list of attributes. Some dimensions of tabular data are summarized within < > for brevity.
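For illustration, a minimal sketch of the import step follows. The file name and struct field access below are hypothetical; the real loading logic (including the exact field names) lives in ioclasses.py.

import scipy.io as sio

# Load one subject's archived classifier input struct. The file name here is
# purely illustrative; actual files follow the InputForClass layout noted above.
mat = sio.loadmat('InputForClass/subject01_eeg.mat', squeeze_me=True,
                  struct_as_record=False)

# Ignore Matlab bookkeeping entries ('__header__', '__version__', '__globals__')
# and keep the struct(s) holding the trial metadata and signal arrays.
struct_names = [k for k in mat.keys() if not k.startswith('__')]
data = mat[struct_names[0]]

# With struct_as_record=False, struct fields are exposed as attributes, which
# makes it straightforward to pull them into numpy arrays or pandas DataFrames.
print([f for f in dir(data) if not f.startswith('_')])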
Two subjects' data were dropped prior to further analysis:
Subject 9, due to a discrepancy in the number of trials listed in the eyetracking data versus the EEG data. Specifically, within the eyetracking data, trials with missing data were still listed as rows with missing column values, whereas the EEG data omitted some rows entirely. As a result, the eyetracking data had more total rows than the EEG data for all subjects--except Subject 9.
Subject 20, because there was only data from a single session for this subject. Without knowing further information about why this was the case, it seemed prudent to preemptively drop this subject's data as well.
Data from 19 subjects remain.
Each of the remaining subjects had approximately 2640 total trials of data. The two behavioral data splits of interest were the target-distractor split and the high-low cognitive load condition split. The ratio of target trials to distractor trials is approximately 1:3--about 510 target stimulus trials per subject. The split between high and low cognitive load trials is approximately 1:1 for all subjects.
For basic validation, a Chi-Squared test of similarity (p < 0.05) was performed between the distributions of correct and incorrect responses for each cognitive load condition. These tests revealed that, for every subject, response accuracy was significantly lower under the high load condition than under the low load condition. This held true even when considering only accuracy in identifying targets. Cross-subject mean overall accuracy under low load was 0.983 with a standard deviation of 0.017; under high load it was 0.913 with a standard deviation of 0.046.
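As a brief illustration of this validation test, the sketch below uses made-up response counts; in practice the counts are tallied from each subject's behavioral data.

from scipy.stats import chi2_contingency

# Hypothetical per-subject counts of correct and incorrect responses
# under each cognitive load condition.
low_load = [1290, 22]     # [correct, incorrect] under low load
high_load = [1205, 115]   # [correct, incorrect] under high load

chi2, p, dof, expected = chi2_contingency([low_load, high_load])
if p < 0.05:
    print(f"Response distributions differ between load conditions (chi2={chi2:.2f}, p={p:.4g})")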
Inspired by exploratory data analysis that compared inter-electrode correlations during tasks between controls and alcoholics (Klymentiev 2019), mean inter-electrode correlations across all subjects were visualized as correlation matrices. As shown in Figure 2, this approach failed to reveal any obvious differences between cognitive load conditions. Additionally, the areas of greatest correlation largely correspond to neighboring electrodes.
Figure 2: Electrode activity correlations averaged across subjects. From left to right, inter-electrode correlations across all trials, high cognitive load trials, and low cognitive load trials. Visually, there are no obvious differences between the correlations.
To assess the similarity in electrode activity between target and non-target trials, a more sophisticated statistical testing approach was chosen over the visualization approach. Once again, the inter-electrode correlations for all 496 unique pairings of electrodes were calculated for all trials. These arrays of correlations were split into two groups by target and distractor trials. Because the correlations do not even approximately follow a normal distribution, the Wilcoxon Signed-Rank Test (p < 0.01), a nonparametric statistical test, was used to test whether the correlation patterns differed between target and distractor trials. This test normally requires an equal number of values from each condition; because distractor trials outnumber target trials, that requirement was fulfilled by randomly sampling from each category with replacement (n = 1200). This method was repeated for every subject to see which pairs of electrodes showed statistically significant differences in correlation (and therefore activity) between the two types of stimuli. Finally, every significant electrode pair label for every subject was split and collapsed into a set of single-electrode labels, and these per-subject sets were intersected with one another. By this method, it was determined that 19 of the 32 scalp electrodes demonstrated at least some significant stimulus-related activity changes across all 19 subjects. These electrodes are pictured in Figure 3. The fact that many of these significant electrodes are concentrated around the occipital region of the brain lends some credibility to the results, because this region contains the visual cortex (Rehman & Al Khalili 2019). These results could potentially help reduce the number of EEG features used for model training in similar research.
Figure 3: All electrodes that, across all subjects, were part of at least one pair of electrodes that demonstrated statistically-significant stimulus-related changes in correlation. In other words, these electrodes consistently showed stimulus-related changes in their relationship to at least one other electrode across all subjects.
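The following is a rough sketch of that per-subject electrode-pair procedure; the array shapes and function name are assumptions made for illustration, and the actual implementation is in scripts/EDA.py.

import numpy as np
from itertools import combinations
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

def significant_electrode_pairs(eeg, is_target, n_resample=1200, alpha=0.01):
    # eeg: (n_trials, 32, n_samples) raw per-trial EEG; is_target: boolean per trial.
    # Returns the electrode index pairs whose per-trial correlations differ
    # significantly between target and distractor trials.
    n_trials, n_channels, _ = eeg.shape
    significant = set()
    for i, j in combinations(range(n_channels), 2):   # 496 unique pairs for 32 channels
        # Correlation between the two channels' samples within each trial
        corrs = np.array([np.corrcoef(eeg[t, i], eeg[t, j])[0, 1] for t in range(n_trials)])
        # Resample with replacement so both groups have equal size (n = 1200)
        target_corrs = rng.choice(corrs[is_target], size=n_resample, replace=True)
        distractor_corrs = rng.choice(corrs[~is_target], size=n_resample, replace=True)
        _, p = wilcoxon(target_corrs, distractor_corrs)
        if p < alpha:
            significant.add((i, j))
    return significant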
To assess how the eye fixation features differ between target and distractor presentations, the Wilcoxon Signed-Rank test (p <= 0.05) was employed again. Trials with missing data were excluded, and the target-distractor imbalance was addressed using random sampling with replacement as above (n = 1200). The features tested were fixation_duration, median_pupilsize_ontarget, max_pupilsize_ontarget, and deltat (see Table 1). The selected alternative hypotheses were as follows: for target stimuli, fixation_duration would be greater, median pupil size would be greater, max pupil size would be greater, and deltat would be smaller (faster reaction). The full results of the tests are presented in Table 2. Overall, the null hypothesis was rejected for fixation_duration across all subjects. Results for the other fields were less consistent: 2 of 19 subjects had significantly greater median pupil size for target stimuli, 4 of 19 had significantly greater maximum pupil size, and 6 of 19 had significantly lower deltat. Notably, only one of the subjects with significant median pupil size results also had significant maximum pupil size results, which casts further doubt on the hypothesis that stimulus type consistently increased pupil size. In summary, the only strong conclusion from these tests is that subjects fixated on targets significantly longer than on distractors.
Implementation of these analyses can be found within scripts/EDA.py and within some of the methods contained in ioclasses.py.
Table 2: The full results of the Wilcoxon Signed-Rank Tests for the eye fixation features. A * in the column following each p-value column indicates a statistically significant test result.
For model training, the featureset included the raw EEG samples plus the alpha and theta powers from each of the 32 scalp electrodes, as well as the fixation duration, median pupil size, maximum pupil size, and delay between presentation and fixation. The raw EEG sample data were flattened by appending the 32 sets of 256 samples to one another, end to end, to create a fully two-dimensional featureset. The last column is the stimulus category presented on each trial, with targets represented by 1 and distractors by 0. Overall, this resulted in feature arrays of 8260 columns, plus the target class column.
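A short sketch of that layout follows; the array names and the randomly generated values are placeholders used only to show the concatenation and the resulting column count.

import numpy as np

n_trials = 2000                                       # placeholder trial count
eeg_samples = np.random.randn(n_trials, 32, 256)      # 32 channels x 256 samples per trial
alpha_power = np.random.rand(n_trials, 32)
theta_power = np.random.rand(n_trials, 32)
eye_features = np.random.rand(n_trials, 4)            # duration, median/max pupil size, deltat
labels = np.random.randint(0, 2, size=(n_trials, 1))  # 1 = target, 0 = distractor

# Flatten each trial's EEG from (32, 256) to a single 8192-long row, then append
# the remaining features: 8192 + 32 + 32 + 4 = 8260 columns plus the class column.
features = np.hstack([eeg_samples.reshape(n_trials, -1), alpha_power, theta_power, eye_features])
full_array = np.hstack([features, labels])
print(features.shape, full_array.shape)               # (2000, 8260) (2000, 8261)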
To deal with the missing trials in the EEG data when integrating it with the complete eye tracking data, the metadata from each were reindexed according to session number, trial block number, and fixation/saccade number. This index was applied to all the training features as well. Using this common index, an inner join was performed between the eyetracking metadata DataFrame and the EEG metadata DataFrame, implicitly dropping all incomplete rows. The metadata were further filtered on the criterion of having at least five pupillometry samples. Finally, to address target class imbalance, distractor trials were randomly downsampled to match the number of target stimulus trials.
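A minimal sketch of the join and downsampling steps follows; the column names are hypothetical stand-ins, and the actual logic lives in integrationclasses.py.

import pandas as pd

def merge_and_balance(eye_meta: pd.DataFrame, eeg_meta: pd.DataFrame) -> pd.DataFrame:
    # Column names ('session', 'block', 'fixation_num', 'n_pupil_samples',
    # 'is_target') are illustrative stand-ins for the real metadata fields.
    index_cols = ['session', 'block', 'fixation_num']
    eye = eye_meta.set_index(index_cols)
    eeg = eeg_meta.set_index(index_cols)

    # The inner join keeps only trials present in both sources, implicitly
    # dropping the rows that the EEG data omitted.
    merged = eye.join(eeg, how='inner', rsuffix='_eeg')
    merged = merged[merged['n_pupil_samples'] >= 5]    # pupillometry sample filter

    # Downsample distractor trials to match the number of target trials.
    targets = merged[merged['is_target'] == 1]
    distractors = merged[merged['is_target'] == 0].sample(n=len(targets), random_state=42)
    return pd.concat([targets, distractors])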
Due to the wide range of numerical values, training features were standardized using scikit-learn's StandardScaler object (Pedregosa et al. 2011). The raw EEG sample sets were reduced to a scale from 0.0 to 1.0 on a per-trial, per-channel basis, effectively representing the highest potential for a given electrode on a given trial as a value close to 1.0 and the lowest as a value near 0.0. All other numeric features were scaled relative to their entire column of the featureset, also on a range from 0.0 to 1.0. This preserves the temporal qualities of the EEG signals (e.g., at what points in the one-second interval did the electrical potential peak and trough?) while also preserving some representation of absolute electrical potentials relative to other trials in the form of the alpha and theta power features.
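The snippet below sketches that scaling scheme under the assumption of simple min-max scaling to the 0.0-1.0 range; the repository's actual scaler setup (noted below) may differ in detail.

import numpy as np

def scale_features(eeg_samples: np.ndarray, other_features: np.ndarray):
    # eeg_samples: (n_trials, 32, 256); other_features: (n_trials, n_columns).
    # Per-trial, per-channel scaling: each channel's 256 samples are rescaled so
    # that its own peak within the trial maps near 1.0 and its trough near 0.0.
    channel_min = eeg_samples.min(axis=2, keepdims=True)
    channel_max = eeg_samples.max(axis=2, keepdims=True)
    eeg_scaled = (eeg_samples - channel_min) / (channel_max - channel_min)

    # Column-wise scaling for the remaining features, relative to all trials.
    col_min = other_features.min(axis=0)
    col_max = other_features.max(axis=0)
    other_scaled = (other_features - col_min) / (col_max - col_min)
    return eeg_scaled, other_scaled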
Implementation can be found in the integrationclasses.py and modeltesters.py files.
All four classifiers were built using the scikit-learn package's implementations (Pedregosa et al. 2011) and validation tools. Specifically, testing used the LinearSVC, GaussianNB, LogisticRegression, and RandomForestClassifier objects. Each subject dataset was split 80:20 into training and testing sets. Select hyperparameters were tuned per subject for the Linear SVM, Logistic Regression, and Random Forest models via scikit-learn's GridSearchCV tool, which automatically validates all combinations of the passed hyperparameters using 5-fold cross-validation (a sketch of this setup follows the parameter lists below). The Naive Bayes classifier implementation's only parameters were priors and variance smoothing, so these were left at default values. The hyperparameters and value selections tested for each model follow:
LinearSVC: {'penalty': ['l1', 'l2'], 'loss': ['hinge', 'squared_hinge'], 'C': [0.1, 1.0, 10.0, 100.0], 'max_iter': [500]}
LogisticRegression: {'penalty': ['l1', 'l2', 'elasticnet'], 'C': [0.1, 1.0, 10.0, 100.0], 'solver': ['sag', 'saga'], 'l1_ratio' (elasticnet penalty only): [0.25, 0.5, 0.75]}
RandomForest: {'criterion': ['gini', 'entropy'], 'n_estimators': [10, 100, 500, 1000], 'max_depth': [2, 5, 7, 10, 20], 'max_features': [0.01, 0.05, 0.1, 0.25, 0.5, 0.8], 'bootstrap': [True, False]}
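To make the tuning step concrete, here is a sketch of a single subject's grid search using the LinearSVC grid above. The placeholder data, the f1 scoring choice, and the error handling are assumptions; error_score=np.nan simply skips unsupported penalty/loss combinations (such as l1 with hinge) instead of raising.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholder data standing in for one subject's scaled feature array and labels.
X = np.random.rand(400, 8260)
y = np.random.randint(0, 2, size=400)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {'penalty': ['l1', 'l2'], 'loss': ['hinge', 'squared_hinge'],
              'C': [0.1, 1.0, 10.0, 100.0], 'max_iter': [500]}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(LinearSVC(), param_grid, cv=5, scoring='f1', error_score=np.nan)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)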
After many hours of training, testing, and validation, the parameters chosen by the GridSearchCV object for the five highest testing f1 scores were used to settle on fixed parameters for the models. Discrete parameters like the penalization norm were chosen by simple majority, while an in-between value was selected for numerical parameters if no single value appeared in at least four of the five GridSearch results. The fixed parameters follow below:
LinearSVC: {'penalty': ['l2'], 'loss': ['hinge'], 'C': [0.25], 'max_iter': [500]}
LogisticRegression: {'penalty': ['elasticnet'], 'C': [0.1], 'solver': ['saga'], 'l1_ratio': [0.5]}
RandomForest: {'criterion': ['gini'], 'n_estimators': [1000], 'max_depth': [20], 'max_features': [0.25], 'bootstrap': [True]}
Using these parameters, the models were trained and tested once again for final comparison.
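For concreteness, the fixed configurations above map onto scikit-learn objects roughly as follows (with the single-element lists unwrapped to scalar arguments; GaussianNB keeps its defaults):

from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

# Final fixed configurations derived from the grid search results above.
final_models = {
    'linear_svm': LinearSVC(penalty='l2', loss='hinge', C=0.25, max_iter=500),
    'logistic_regression': LogisticRegression(penalty='elasticnet', C=0.1,
                                              solver='saga', l1_ratio=0.5),
    'random_forest': RandomForestClassifier(criterion='gini', n_estimators=1000,
                                            max_depth=20, max_features=0.25,
                                            bootstrap=True),
    'naive_bayes': GaussianNB(),
}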
Using the above hyperparameters, an instance of each classifier was trained per subject using only data from one cognitive load condition and subsequently tested using only data from the other condition. Class imbalance was again addressed by random downsampling of distractor stimulus trials, resulting in training and testing sets of between 350 and 550 trials. One subject had an unusually low number of trials for unknown reasons, but the results for their data did not meaningfully impact overall metrics.
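A condensed sketch of that crossover procedure for one subject follows; it assumes a boolean high_load mask alongside the feature matrix X and labels y (names are illustrative, and the class-balancing step is omitted for brevity).

from sklearn.base import clone
from sklearn.metrics import accuracy_score, f1_score

def crossover_scores(models, X, y, high_load):
    # models: dict of configured estimators (e.g. final_models above).
    # high_load: boolean array marking the high cognitive load trials.
    splits = {'high_to_low': (high_load, ~high_load),
              'low_to_high': (~high_load, high_load)}
    results = {}
    for name, model in models.items():
        results[name] = {}
        for direction, (train_mask, test_mask) in splits.items():
            fitted = clone(model).fit(X[train_mask], y[train_mask])
            predictions = fitted.predict(X[test_mask])
            results[name][direction] = (accuracy_score(y[test_mask], predictions),
                                        f1_score(y[test_mask], predictions))
    return results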
Models were compared on three primary bases: accuracy, f1 score, and the number of subjects for which the model's accuracy was above chance (p < 0.05), as determined by scipy's binomial test implementation (Virtanen et al. 2020). These statistics are summarized in Table 3. Additionally, the per-subject accuracy and f1 scores are visualized in Figure 4.
Figure 4: On the left, a plot of the accuracy score of each model per subject; on the right, a plot of the f1 score of each model per subject.
Table 3: Summary statistics of model performance across subjects. "# of above chance accuracy" refers to the number of subjects, out of the 19, for which the given model classified stimuli at an accuracy level statistically higher than random chance (p < 0.05) as determined by binomial test.
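As a brief illustration of the above-chance check for one subject and model, the counts below are hypothetical; scipy.stats.binomtest is the current name for the binomial test (replacing the older binom_test function).

from scipy.stats import binomtest

n_correct, n_test = 270, 450   # hypothetical correct classifications out of test trials

# With balanced classes, random guessing corresponds to a success probability of 0.5.
result = binomtest(n_correct, n_test, p=0.5, alternative='greater')
print(result.pvalue, result.pvalue < 0.05)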
Performance was again evaluated according to testing accuracy, f1 score, and the number of subjects for which the classifier performed above chance according to the binomial test. Additionally, the significance of differences in accuracy and f1 score between the "high-to-low" and "low-to-high" models was evaluated by the Wilcoxon Signed-Rank test, with the alternative hypothesis set to assess whether the "high-to-low" models' performance was significantly greater than the "low-to-high" models'.
Performance between classifiers, regardless of training set, demonstrated results similar to the above: Random Forest performed most accurately, followed closely by Logistic Regression, then Linear SVM, and finally Gaussian Naive Bayes. Figure 5 and Figure 6 show per-subject testing accuracy and f1 scores for each classifier type. In terms of generalization between cognitive load conditions, although the summed differences in accuracy and f1 score were positive (indicating higher "high-to-low" scores overall) for all classifiers except the Naive Bayes f1 score, only the f1 score of the Random Forest classifier was significantly higher (p = 0.0006). All results are shown in Table 4.
Figure 5: Subject-level accuracy, subdivided by classifier type, for each training-testing set pairing.
Figure 6: Subject-level f1 scores, subdivided by classifier type, for each training-testing set pairing.
Table 4: Metrics for each model and each training-testing condition (high-to-low and low-to-high), with an indication of whether the difference between conditions in accuracy and f1 score was statistically significant according to the Wilcoxon Signed-Rank test. P values are only included for significant results.
Regarding the first hypothesis, a clear hierarchy emerges from the results of both the direct model comparison and the cognitive load crossover comparison. Random Forest proved to be the most robust and accurate, followed by Logistic Regression, then Support Vector Machine, and finally Naive Bayes. These results partly align with the 2011 study by Atla et al., which found Decision Tree classifiers to be the most accurate models for binary classification problems with 40% artificial signal noise and below, followed closely by Logistic Regression and then Support Vector Machines. The results for Subject 10 in particular also seem to align with their finding that Naive Bayes models are more robust than other models on highly noisy data.
As for the second hypothesis, while the Random Forest classifier's f1 scores were significantly higher when trained on the high cognitive load condition and tested on the low cognitive load condition, the difference in accuracy did not reach statistical significance (p = 0.113). Combined with the fact that none of the other models showed significant differences in accuracy based on the training and testing cognitive load conditions, these results do not support the hypothesis that biosignals produced under high cognitive load carry any more information about the importance of a gaze target than those produced under low cognitive load.
Based on these results, Random Forest classifiers are recommended over Support Vector Machines for modeling similar datasets that utilize high-quality EEG signals and eye tracking statistics. Furthermore, because the results failed to find a significant advantage to modeling with the high cognitive load data over the low cognitive load data, the most important features for distinguishing important visual stimuli from unimportant ones do not appear to be strongly impacted by additional task load. There therefore does not seem to be a need to impose high cognitive load on subjects when training a model to learn what they are looking at from their biosignal data.
This study faced several limitations. First, at the dataset stage, while the parent research paper utilized EOG signal data as classifier features, those signals were not present in the Matlab struct files, and due to time constraints they could not be integrated into the data used in these model comparisons. Second, at the model training stage, the Support Vector Machine models frequently failed to converge before hitting the cap on training iterations; due to computational limitations, an optimal maximum number of iterations that avoided overfitting could not be adequately tested in the hyperparameter grid search. Finally, although downsampling was chosen to avoid suboptimal model training due to target class imbalance, other approaches such as class weighting might make better use of the data.
Future research using this data should explore reducing the number of EEG features. For example, dropping the electrodes that were not found to have any correlation differences across the target classes could improve accuracy. A proper assessment of the time domain EEG features (i.e. the 256 saccade-locked samples for each channel) against frequency domain features (i.e. the alpha and theta power) would also be a useful line of inquiry. Finally, further model comparison for other target classes, such as correctly recalled targets against incorrect recalls, would help validate the classifier hierarchy that emerged from this study.
saccade - Rapid eye movement from one part of the visual field to the other
fixation - A period of eye movement that is constrained to a relatively small part of the visual field
cognitive load - The amount of information that the subject must concentrate on during task completion
Electroencephalography (EEG) - Recording of voltage changes on the scalp using an array of electrodes, used as a relatively low-latency, noninvasive means of recording brain activity
Electrooculography (EOG) - Recording of voltage changes around the eyes, often recorded simultaneously to remove electric noise caused by eye motion from the EEG signal during data analysis
trial - Within the context of the Brouwer study, a single stimulus presentation (revealing whether a location on the screen is a target or distractor)
trial block - A series of 15 stimulus presentations (revealing locations)
distractor - A stimulus that the subject should disregard during completion of the experimental task. In the Brouwer study, this refers to all screen locations revealed to have the letters #OK# during a trial block.
Chi-Squared test of similarity - A statistical test which compares two distributions of categorical data. It tests the null hypothesis that the two distributions of values do not differ from one another at a level above random chance. Notably, this test cannot be used on its own to determine direction of effect for statistically significant differences.
Wilcoxon Signed-Rank Test - A statistical test of the null hypothesis that the differences between a series of paired values are centered on zero. The principal difference between this and other similar tests (for example, the paired-samples T-test) is that it does not rely on assumptions about the distribution of values (e.g. that the sample values are approximately normally distributed).
downsampling - Deliberately discarding observations from a dataset to reduce their number. This is one way to avert the effects of class imbalance when training a classification model.
f1 score - The harmonic mean of a model's precision and recall. It favors models with a high ratio of true positives to false positives and of true positives to false negatives. It does not directly consider true negatives, however.
binomial test - A statistical test which assesses whether the number of successes observed is significantly greater than a specified probability level. For binary classification problems, that probability can be set to 0.5 to represent random choice and then be used to formally determine whether a trained model is performing above random chance levels.
high-to-low, low-to-high - Shorthand for models trained on the low cognitive load condition data and tested on the high cognitive load condition data (low-to-high/l2h) vs. those trained on the high cognitive load condition data and tested on the low cognitive load condition data (high-to-low/h2l)
Atla, A., Tada, R., Sheng, V., & Singireddy, N. (2011). Sensitivity of different machine learning algorithms to noise. Journal of Computing Sciences in Colleges, 26(5), 96-103.
Brouwer, A. M., Hogervorst, M. A., Oudejans, B., Ries, A. J., & Touryan, J. (2017). EEG and eye tracking signatures of target encoding during structured visual search. Frontiers in human neuroscience, 11, 264.
DCS Corporation. (n.d.). Army Research Laboratory (ARL) Standardized Annotated Neurophysiological Data Repository (SANDR) Data Description. Combat Capabilities Development Command Army Research Laboratory Human Research and Engineering Directorate. v2.1.2. Retrieved from https://dev.cancta.net/C3DS/SANDR%20Data%20Description%20v2.1.2.pdf
Harris, C.R., Millman, K.J., van der Walt, S.J. et al. (2020). Array programming with NumPy. Nature 585, 357-362. DOI: 10.1038/s41586-020-2649-2.
Kliper-Gross, O., Hassner, T., & Wolf, L. (2011). The action similarity labeling challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 615-621.
Klymentiev, R. (2019). EEG Data Analysis, Alcoholic vs. Control Groups. Kaggle. Retrieved from https://www.kaggle.com/ruslankl/eeg-data-analysis#4.-Whole-EEG-Data-Set-Analysis
Matsuhashi, M., & Hallett, M. (2008). The timing of the conscious intention to move. European Journal of Neuroscience, 28(11), 2344-2351.
McKinney, W. (2010, June). Data structures for statistical computing in python. In Proceedings of the 9th Python in Science Conference (Vol. 445, pp. 51-56).
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 2825-2830.
Jeff Reback, Wes McKinney, jbrockmendel, Joris Van den Bossche, Tom Augspurger, Phillip Cloud, Simon Hawkins, gfyoung, Sinhrks, Matthew Roeschke, Adam Klein, Terji Petersen, Jeff Tratner, Chang She, William Ayd, Shahar Naveh, patrick, Marc Garcia, Jeremy Schendel, … h-vetinari. (2021). pandas-dev/pandas: Pandas 1.2.4 (v1.2.4). Zenodo. https://doi.org/10.5281/zenodo.4681666
Rehman, A., & Al Khalili, Y. (2019). Neuroanatomy, Occipital Lobe. StatPearls. StatPearls Publishing. Available from: https://www.ncbi.nlm.nih.gov/books/NBK544320/
Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, CJ Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E.A. Quintero, Charles R Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, & SciPy 1.0 Contributors. (2020) SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17(3), 261-272.
Wang, X. W., Nie, D., & Lu, B. L. (2014). Emotional state classification from EEG data using machine learning approach. Neurocomputing, 129, 94-106.
Wolpaw, J. R., Millán, J. D. R., & Ramsey, N. F. (2020). Brain-computer interfaces: Definitions and principles. Handbook of clinical neurology, 168, 15-23.
Yoshida, Y., Ohwada, H., Mizoguchi, F., & Iwasaki, H. (2014). Classifying cognitive load and driving situation with machine learning. International Journal of Machine Learning and Computing, 4(3), 210.