AI in neuro-biomedical science

Analyzing electroencephalographic (EEG) time series can be challenging, especially with deep neural networks, due to the large variability among human subjects and often small datasets. To address these challenges, various strategies, such as self-supervised learning, have been suggested, but they typically rely on extensive empirical datasets. Inspired by recent advances in computer vision, we propose a pretraining task termed "frequency pretraining" to pretrain a neural network for sleep staging by predicting the frequency content of randomly generated synthetic time series. Our experiments demonstrate that our method surpasses fully supervised learning in scenarios with limited data and few subjects, and matches its performance in regimes with many subjects. Furthermore, our results underline the relevance of frequency information for sleep stage scoring, while also demonstrating that deep neural networks utilize information beyond frequencies to enhance sleep staging performance, which is consistent with previous research. We anticipate that our approach will be advantageous across a broad spectrum of applications where EEG data is limited or derived from a small number of subjects, including the domain of brain-computer interfaces.
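
To make the pretext task concrete, it can be framed as multi-label classification over frequency bins of purely synthetic signals. The sketch below is a minimal, hypothetical version of this idea; the sampling rate, bin layout, and network architecture are illustrative assumptions rather than the configuration used in the paper.

```python
import numpy as np
import torch
import torch.nn as nn

FS = 100           # sampling rate in Hz (assumed for illustration)
N_BINS = 20        # 1-Hz frequency bins to predict (assumed)
SEG_LEN = 30 * FS  # one 30-second segment, as in sleep staging

def make_synthetic_example(rng):
    """Generate a random sinusoid mixture and a multi-hot label marking
    which frequency bins are present in the signal."""
    label = (rng.random(N_BINS) < 0.3).astype(np.float32)
    t = np.arange(SEG_LEN) / FS
    x = np.zeros(SEG_LEN, dtype=np.float32)
    for b in np.flatnonzero(label):
        freq = b + rng.random()                  # frequency inside bin b
        phase = rng.random() * 2 * np.pi
        x += np.sin(2 * np.pi * freq * t + phase).astype(np.float32)
    x += 0.1 * rng.standard_normal(SEG_LEN).astype(np.float32)  # noise
    return x, label

# Small 1-D CNN; after pretraining, the trunk would be reused for staging.
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=16, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, N_BINS),
)
criterion = nn.BCEWithLogitsLoss()  # multi-label frequency prediction
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

rng = np.random.default_rng(0)
for step in range(100):  # pretraining uses no recorded EEG at all
    xs, ys = zip(*(make_synthetic_example(rng) for _ in range(8)))
    x = torch.from_numpy(np.stack(xs)).unsqueeze(1)  # (batch, 1, time)
    y = torch.from_numpy(np.stack(ys))
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```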

A novel dual-stream time-frequency contrastive pretext tasks framework for sleep stage classification [arXiv, Github]

Self-supervised learning addresses a challenge encountered by many supervised methods, i.e. the requirement of large amounts of annotated data. This challenge is particularly pronounced in fields such as electroencephalography (EEG) research. Self-supervised learning instead utilizes pseudo-labels, generated by pretext tasks, to obtain a rich and meaningful data representation. In this study, we introduce a dual-stream pretext task architecture that operates in both the time and frequency domains. In particular, we examine the incorporation of the novel Frequency Similarity (FS) pretext task into two existing pretext tasks, Relative Positioning (RP) and Temporal Shuffling (TS). We assess the accuracy of these models on the Physionet Challenge 2018 (PC18) dataset in the context of the downstream task of sleep stage classification. The inclusion of FS resulted in a notable improvement in downstream task accuracy: 1.28 percent on RP and 2.02 percent on TS. Furthermore, when visualizing the learned embeddings using Uniform Manifold Approximation and Projection (UMAP), distinct clusters emerge, indicating that the learned representations carry meaningful information.
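
To make the pseudo-labeling concrete, the sketch below shows how RP labels are commonly derived from the temporal distance between two windows, together with one plausible reading of the FS task based on spectral distance; the FS distance measure and threshold are assumptions, not the paper's exact definition.

```python
import numpy as np

FS_HZ = 100        # sampling rate, assumed for illustration
WIN = 30 * FS_HZ   # 30-second windows

def rp_label(i, j, tau_pos=2, tau_neg=10):
    """Relative Positioning pseudo-label for window indices i and j:
    1 if the windows are close in time, 0 if far apart, None otherwise
    (ambiguous pairs are discarded)."""
    d = abs(i - j)
    if d <= tau_pos:
        return 1
    if d >= tau_neg:
        return 0
    return None

def fs_label(x1, x2, threshold=0.1):
    """Hypothetical Frequency Similarity pseudo-label: 1 if the two
    windows have similar normalized power spectra, else 0."""
    p1 = np.abs(np.fft.rfft(x1)) ** 2
    p2 = np.abs(np.fft.rfft(x2)) ** 2
    p1, p2 = p1 / p1.sum(), p2 / p2.sum()
    return int(np.abs(p1 - p2).sum() < threshold)

# Example: cut a (fake) recording into windows and label a sampled pair.
rng = np.random.default_rng(0)
recording = rng.standard_normal(600 * FS_HZ)  # 10 minutes of noise
windows = recording[: (len(recording) // WIN) * WIN].reshape(-1, WIN)
i, j = rng.integers(len(windows), size=2)
print("RP:", rp_label(i, j), "FS:", fs_label(windows[i], windows[j]))
```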

Enhancing brain decoding using attention augmented deep neural networks [ESANN, Github]

Neuroimaging techniques have been shown to be valuable for studying brain activity. This paper uses Magnetoencephalography (MEG) data, provided by the Human Connectome Project (HCP), and different deep learning models to perform brain decoding. Specifically, we investigate to what extent one can infer the task performed by a subject from their MEG data. In order to capture the most relevant features of the signals, self- and global attention are incorporated into our models. The obtained results show that the inclusion of attention improves the performance and generalization of the models across subjects.
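
A minimal sketch of what adding self-attention to such a decoder could look like follows; the channel count, feature extractor, and pooling are illustrative assumptions, not the models evaluated in the paper.

```python
import torch
import torch.nn as nn

N_CHANNELS = 248  # number of MEG sensors (assumed for illustration)
N_CLASSES = 4     # task conditions to decode (assumed)

class AttentionDecoder(nn.Module):
    """Convolutional feature extractor followed by self-attention over
    time steps, pooled into a single task prediction."""
    def __init__(self, d_model=64):
        super().__init__()
        self.conv = nn.Conv1d(N_CHANNELS, d_model, kernel_size=7, padding=3)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4,
                                          batch_first=True)
        self.head = nn.Linear(d_model, N_CLASSES)

    def forward(self, x):                   # x: (batch, channels, time)
        h = self.conv(x).transpose(1, 2)    # (batch, time, d_model)
        h, _ = self.attn(h, h, h)           # self-attention over time
        return self.head(h.mean(dim=1))     # global average pooling

model = AttentionDecoder()
meg = torch.randn(2, N_CHANNELS, 500)  # 2 trials, 500 time samples
print(model(meg).shape)                # torch.Size([2, 4])
```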

BAST: binaural audio spectrogram transformer for binaural sound localization [arXiv, Github]

Accurate sound localization in reverberant environments is essential for human auditory perception. Recently, Convolutional Neural Networks (CNNs) have been used to model the binaural human auditory pathway. However, CNNs struggle to capture global acoustic features. To address this issue, we propose a novel end-to-end Binaural Audio Spectrogram Transformer (BAST) model to predict the sound azimuth in both anechoic and reverberant environments. Two modes of implementation, i.e. BAST-SP and BAST-NSP, corresponding to the BAST model with shared and non-shared parameters respectively, are explored. Our model with subtraction interaural integration and a hybrid loss achieves an angular distance of 1.29 degrees and a Mean Square Error of 1e-3 across all azimuths, significantly surpassing the CNN-based model. An exploratory analysis of BAST's performance across the left and right hemifields and across anechoic and reverberant environments demonstrates its generalization ability as well as the feasibility of binaural Transformers for sound localization. Furthermore, an analysis of the attention maps is provided to give additional insight into the localization process in a natural reverberant environment.
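
The subtraction interaural integration can be sketched as below, with encoder weights shared between the two ears as in the BAST-SP variant; the encoder depth, embedding size, and the (cos, sin) azimuth encoding are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BinauralSubtractionModel(nn.Module):
    """Two spectrogram encoder streams (shared weights, as in BAST-SP)
    whose embeddings are integrated by subtraction before the azimuth
    is predicted as a 2-D direction vector (cos, sin)."""
    def __init__(self, n_mels=64, d_model=128):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 2)   # (cos azimuth, sin azimuth)

    def encode(self, spec):                 # spec: (batch, time, n_mels)
        return self.encoder(self.proj(spec)).mean(dim=1)

    def forward(self, left, right):
        h = self.encode(left) - self.encode(right)  # subtraction step
        return self.head(h)

model = BinauralSubtractionModel()
left = torch.randn(2, 100, 64)   # left-ear log-mel spectrograms
right = torch.randn(2, 100, 64)  # right-ear log-mel spectrograms
print(model(left, right).shape)  # torch.Size([2, 2])
```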

Exploring automatic liver tumor segmentation using deep learning [IJCNN 2021]

The segmentation of liver tumors is crucial for diagnosis, treatment planning, and treatment evaluation. Because of the drawbacks of manual segmentation, automatic segmentation has recently gained considerable attention. In this work, we explore various deep learning based approaches to automatic liver tumor segmentation, using data from the Liver Tumor Segmentation challenge (LiTS). In particular, the models considered here are UNet-based architectures. In addition, we investigate the influence of incorporating extra elements into the pipeline, such as attention mechanisms, model ensembling, test-time inference, and an additional model to reject false positives, on the final performance. The obtained results show that the 3D-UNet architecture, together with ensemble learning methods, produces more accurate predictions than the other examined approaches.
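
As an illustration of the ensembling step, the sketch below averages per-voxel tumor probabilities from several segmentation networks and thresholds the mean; the stand-in models and threshold are hypothetical, not the trained 3D-UNets from the paper.

```python
import torch
import torch.nn as nn

def ensemble_segment(models, volume, threshold=0.5):
    """Average the per-voxel tumor probabilities of several trained
    networks and threshold the mean into a binary mask."""
    probs = []
    with torch.no_grad():
        for m in models:
            m.eval()
            probs.append(torch.sigmoid(m(volume)))   # (B, 1, D, H, W)
    mean_prob = torch.stack(probs).mean(dim=0)
    return mean_prob > threshold

# Stand-in "models": any 3-D networks with a one-channel output work here.
models = [nn.Conv3d(1, 1, kernel_size=3, padding=1) for _ in range(3)]
ct = torch.randn(1, 1, 32, 64, 64)  # a small CT sub-volume
mask = ensemble_segment(models, ct)
print(mask.shape, mask.dtype)       # torch.Size([1, 1, 32, 64, 64]) bool
```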

Towards biologically plausible learning in neural networks [IEEE-SSCI 2021]

Artificial neural networks are inspired by the information processing performed by neural circuits in biology. While existing models are sufficient to solve many real-world tasks, they are far from reaching the potential of biological neural networks. These models are oversimplifications of their biological counterparts, omitting key features such as the spiking nature of their units or the locality of learning, among others. In this work, we first provide a short review of the most recent theories on biologically plausible learning and learning in Spiking Neural Networks. Then, taking a step towards brain-inspired deep learning, we introduce a novel biologically plausible learning method. This approach achieves learning using only information local to each synapse, spiking units, and unidirectional synaptic connections. We also propose a local solution to the credit assignment problem based on target propagation. Finally, we evaluate our approach on three different tasks, i.e. Boolean problems, image autoencoding, and handwritten digit recognition.
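
The flavor of target-propagation-based local learning can be conveyed with a toy, rate-based sketch: every weight update below depends only on activations and targets available at its own layer, and feedback flows through separate unidirectional weights. This is an assumption-laden simplification; the paper's actual method uses spiking units and differs in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def tanh(x): return np.tanh(x)
def dtanh(x): return 1.0 - np.tanh(x) ** 2

# Two-layer network x -> h = tanh(W1 x) -> out = tanh(W2 h), plus a
# learned inverse g(out) = tanh(V out) used to propagate targets.
W1 = rng.standard_normal((16, 4)) * 0.5
W2 = rng.standard_normal((1, 16)) * 0.5
V = rng.standard_normal((16, 1)) * 0.5
lr = 0.05

# Toy task: predict the product of the first two inputs in {-1, 1}.
X = rng.choice([-1.0, 1.0], size=(256, 4))
Y = (X[:, 0] * X[:, 1]).reshape(-1, 1)

for epoch in range(200):
    for x, y in zip(X, Y):
        x = x.reshape(-1, 1); y = y.reshape(-1, 1)
        a1 = W1 @ x; h = tanh(a1)            # forward pass
        a2 = W2 @ h; out = tanh(a2)
        out_t = out + 0.5 * (y - out)        # output-layer target
        # Hidden-layer target via the learned inverse, corrected for
        # g's reconstruction error (difference target propagation).
        h_t = tanh(V @ out_t) + (h - tanh(V @ out))
        # Each update uses only locally available quantities.
        W2 += lr * ((out_t - out) * dtanh(a2)) @ h.T
        W1 += lr * ((h_t - h) * dtanh(a1)) @ x.T
        V += lr * ((h - tanh(V @ out)) * dtanh(V @ out)) @ out.T
```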

Goal-driven, neurobiological-inspired convolutional neural network models of human spatial hearing [Neurocomputing 2022]

The human brain effortlessly solves the complex computational task of sound localization using a mixture of spatial cues. How the brain performs this task in naturalistic listening environments (e.g. with reverberation) is not well understood. In the present paper, we build on the success of deep neural networks at solving complex and high-dimensional problems [1] to develop goal-driven, neurobiological-inspired convolutional neural network (CNN) models of human spatial hearing. After training, we visualize and quantify feature representations in intermediate layers to gain insights into the representational mechanisms underlying sound location encoding in CNNs. Our results show that neurobiological-inspired CNN models trained on real-life sounds spatialized with human binaural hearing characteristics can accurately predict sound location in the horizontal plane. CNN localization acuity across the azimuth resembles human sound localization acuity, but CNN models outperform humans for sources located behind the listener. Training models with different objective functions - that is, minimizing either Euclidean or angular distance - modulates localization acuity in distinct ways. Moreover, different implementations of binaural integration result in unique patterns of localization errors that resemble behavioral observations in humans. Finally, feature representations reveal a gradient of spatial selectivity across network layers, starting with broad spatial representations in early layers and progressing to sparse, highly selective spatial representations in deeper layers. In sum, our results show that neurobiological-inspired CNNs are a valid approach to modeling human spatial hearing. This work paves the way for future studies combining neural network models with empirical measurements of neural activity to unravel the complex computational mechanisms underlying neural sound location encoding in the human auditory pathway.
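
The two training objectives can be written down directly; in the sketch below the network output is assumed to be a 2-D direction vector (cos az, sin az), which is one possible encoding rather than the paper's exact target representation.

```python
import torch

def euclidean_loss(pred, target):
    """Mean Euclidean distance between predicted and true direction
    vectors pointing at the sound source."""
    return (pred - target).norm(dim=-1).mean()

def angular_loss(pred, target, eps=1e-7):
    """Mean angular distance in radians between predicted and true
    directions; insensitive to the length of the prediction."""
    cos = torch.nn.functional.cosine_similarity(pred, target, dim=-1)
    return torch.acos(cos.clamp(-1 + eps, 1 - eps)).mean()

# Directions in the horizontal plane encoded as (cos az, sin az).
az_true = torch.tensor([0.0, 1.5708])  # azimuths 0° and 90°, in radians
target = torch.stack([az_true.cos(), az_true.sin()], dim=-1)
pred = torch.tensor([[0.9, 0.1], [0.2, 0.7]])
print(euclidean_loss(pred, target), angular_loss(pred, target))
```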