Nikolaos Dionelis, PhD

My contact information:

nik.dionelis.2022@gmail.com

My CV can be found in: CV of Nikolaos Dionelis

Dr Nikolaos Dionelis, Post-Doc, PhD, Masters MEng

Postdoctoral Research Associate in Machine Learning

School of Engineering, University of Edinburgh

Alexander Graham Bell building, King’s Buildings,

The University of Edinburgh,

EH9 3JL, Edinburgh, UK

Email: Nikolaos.Dionelis@ed.ac.uk

Mobile: +44 (0) 7873286106

I am a deep learning enthusiast working at the intersection of machine learning, computer vision, speech and audio, and signal processing.

I am a Postdoctoral Research Associate in Machine Learning at the University of Edinburgh in the Department of Electronics and Electrical Engineering, Digital Communications Research Institute, School of Engineering.

As a four-year Postdoctoral Research Associate in Machine Learning, I work within the University Defence Research Collaboration (UDRC) in Signal Processing. I work within Work Package (WP) 3.1 entitled "Robust Generative Neural Networks", and more information about the UDRC research project can be found in (UDRC EPSRC), (WP3.1 UDRC), and (WPs UDRC).

Research Topics: Deep learning, Anomaly detection/ Out-of-Distribution (OoD) detection; Deep generative models, Generative Adversarial Networks (GAN); Discriminative models, Classification; Semi- and self-supervised learning, Contrastive learning; Few-shot anomaly detection, Low- and few-shot learning; Robust learning with label noise; Automated solutions using machine learning and signal processing techniques.

I am currently conducting research on deep learning for these research topics with Prof. Sotirios A. Tsaftaris and Mehrdad Yaghoobi.

I have a Masters MEng degree and a PhD degree, both from Imperial College London. My PhD advisor was Mike Brookes, and I was within the Speech and Audio Processing (SAP) Group, Electrical and Electronic Engineering (EEE) Department, also with Prof. Patrick A. Naylor.

UDRC WP3.1 Robust Generative Neural Networks: The Principal Investigator (PI) of the UDRC research project is Prof. Mike Davies (Prof. Davies). The Academics in UDRC WP3.1 are Prof. Sotirios A. Tsaftaris (Prof. Tsaftaris) and Mehrdad Yaghoobi (Assistant Prof. Yaghoobi).

Masters MEng and PhD degrees from Imperial College London: During my Masters MEng degree and my PhD degree at Imperial College, I collaborated with Mike Brookes (Prof. Brookes (Reader)) and Prof. Patrick A. Naylor (Prof. Naylor).

Research Experience: I have experience in deep learning, computer vision, speech and audio, machine learning, and signal processing, and good coding experience in Python.

My main research work and my paper publications:

"Boundary of Distribution Support Generator (BDSG): Sample Generation on the Boundary": We propose an invertible-residual-network-based model, the Boundary of Distribution Support Generator (BDSG). We use the recently developed Invertible Residual Network (IResNet) and Residual Flow (ResFlow), for density estimation. These models have not yet been used for anomaly detection. We leverage IResNet and ResFlow for Out-of-Distribution (OoD) detection and for sample generation on the data distribution boundary using a loss function that forces the samples to lie on the boundary. The BDSG addresses non-convex support, disjoint components, and multimodal data distributions. Evaluation results on synthetic data and data from multimodal distributions, such as CIFAR-10, demonstrate competitive performance compared to methods from the literature.

Research topics: Anomaly detection/ OoD detection, Explicit deep generative models, Maximum likelihood optimisation, Flow-based invertible generative models, Invertible Residual Networks, Sample generation on the data distribution boundary, Learned distribution boundary, Likelihood, Probability density distribution

The proposed methodology and our contributions: Sample generation on the data distribution boundary; Development of the Boundary of Distribution Support Generator (BDSG) model; BDSG, IResNet, and ResFlow for anomaly/ OoD detection; Evaluation of BDSG

"Tail of Distribution GAN (TailGAN): Generative-Adversarial-Network-Based Boundary Formation": Generative Adversarial Networks (GAN) are a powerful methodology and can be used for unsupervised anomaly detection, where current techniques have limitations such as the accurate detection of anomalies near the tail of a distribution. We develop a GAN-based tail formation model for anomaly detection, the Tail of distribution GAN (TailGAN), to generate samples on the tail of the data distribution and detect anomalies near the support boundary. Using GANs that learn the probability of the underlying data distribution has advantages in improving the anomaly detection methodology by allowing us to devise a generator for boundary samples, and use this model to characterise anomalies. We evaluate TailGAN for identifying Out-of-Distribution (OoD) data, and its performance evaluated on MNIST, CIFAR-10, Baggage X-Ray, and OoD data shows competitiveness compared to methods from the literature.

Research topics: Anomaly/ OoD detection, Implicit deep generative models, GANs, Adversarial training, Sample generation on the boundary of the underlying distribution of the data, Probability distribution metrics, Anomaly score/ OoD score

The proposed methodology and our contributions: Generate samples on the boundary of the support of the normal class distribution; Use implicit deep generative models for anomaly detection/ OoD detection; GAN; Development of TailGAN

"OMASGAN: Out-of-Distribution Minimum Anomaly Score GAN for Anomaly Detection": Generative models trained in an unsupervised manner may set high likelihood and low reconstruction loss to Out-of-Distribution (OoD) samples. This leads to failures to detect anomalies, overall decreasing Anomaly Detection (AD) performance. In addition, AD models underperform due to the rarity of anomalies. To address these limitations, we develop the OoD Minimum Anomaly Score GAN (OMASGAN) which performs retraining by including the proposed minimum-anomaly-score OoD samples. These OoD samples are generated on the boundary of the support of the normal class data distribution in a proposed self-supervised learning manner. The evaluation of OMASGAN on image data using the leave-one-out method shows that it achieves an improvement of at least 0.24 and 0.07 points in AUROC on average on the MNIST and CIFAR-10 datasets, respectively, over other benchmark models for AD.

Research topics: Anomaly detection/ OoD detection, Implicit deep generative models, GANs, Learned negative data, Sample generation on the data distribution boundary, Minimum anomaly score OoD samples, Self-supervised learning, Model retraining, Contrastive negative training, Leave-one-out evaluation

The proposed methodology and our contributions: Perform self-supervised learning; Improve upon unsupervised learning; Model retraining; Improve performance by including the minimum anomaly score OoD samples in the training process; Learned negative data; Perform active negative sampling and training; Development of the OMASGAN model; Probability metrics; Evaluation of OMASGAN
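
As an illustration of the evaluation metric only, the following is a minimal Python sketch of how an AUROC averaged over leave-one-out runs could be computed from anomaly scores. It assumes that each run holds out one class and yields one array of scores for normal test samples and one for abnormal test samples. The function names `auroc` and `leave_one_out_auroc` are hypothetical and are not taken from the OMASGAN code.

```python
import numpy as np

def auroc(scores_normal, scores_anomalous):
    """AUROC as the probability that an anomalous sample receives a higher
    anomaly score than a normal sample (ties count as one half)."""
    normal = np.asarray(scores_normal, dtype=float)
    anomalous = np.asarray(scores_anomalous, dtype=float)
    # Pairwise comparison of every anomalous score with every normal score.
    diff = anomalous[:, None] - normal[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (anomalous.size * normal.size))

def leave_one_out_auroc(scores_by_class):
    """scores_by_class maps a held-out class label to a pair
    (normal_scores, anomalous_scores) obtained in that run (assumed layout).
    Returns the per-class AUROC values and their mean."""
    per_class = {k: auroc(n, a) for k, (n, a) in scores_by_class.items()}
    return per_class, float(np.mean(list(per_class.values())))
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of normal and abnormal samples.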

"Few-Shot Adaptive Detection of Objects of Concern Using Generative Models with Negative Retraining": Detecting objects which we are interested in, Objects of Concern (OoC), is nowadays attracting attention. In aviation and transport, it is important to robustly detect OoC in security images. OoC are rare, differ from typical samples, and may be unknown during training. To address such limitations, we propose the negative REtraining with Few-shots Generative Adversarial Network (REFGAN) for detecting OoC. REFGAN aims at automatically identifying OoC by learning from Objects of No Concern (OoNC) and OoC. Our methodology comprises learning a prior using OoNC, and few-shot model adaptation using the Few-Shot OoC (FSOoC). The evaluation of REFGAN on the Baggage SIXray dataset shows that when FSOoC are used, our model outperforms the prior, and outperforms recent baselines by approximately 6.3% in mean values. REFGAN can detect OoC, and its evaluation on SIXray and CIFAR-10 shows robustness against the number of few-shot samples of OoC.

Research topics: Objects of Concern detection, Rarity of Objects of Concern, Negative REtraining with Few-shots GAN (REFGAN), Model retraining, Contrastive negative training, Robustness

The proposed methodology and our contributions: Perform Objects of Concern detection and classification; Include the learned data distribution boundary in training; Retraining; Include few-shot data samples of Objects of Concern in the training process; Develop the novel negative REtraining with Few-shots GAN (REFGAN) model

"FROB: Few-Shot Robust Model for Classification and Out-of-Distribution Detection": Classification and Out-of-Distribution (OoD) detection in the few-shot setting remain challenging aims, but are important for devising critical systems in security where samples are limited. OoD detection requires that classifiers are aware of when they do not know and avoid setting high confidence to OoD samples. To address such limitations, we propose the Few-shot ROBust (FROB) model with its key contributions being (a) the joint classification and few-shot OoD detection, (b) the sample generation on the boundary of the support of the normal class distribution, and (c) the incorporation of the learned distribution boundary as OoD data for contrastive negative training. FROB finds the boundary of the normal class distribution, and uses it to improve few-shot OoD detection. By including the learned boundary, FROB reduces the threshold linked to the model’s few-shot robustness in the number of few-shots, and maintains the OoD performance approximately constant, independent of the number of few-shots. The low- and few-shot robustness evaluation of FROB on different datasets and on One-Class Classification (OCC) data shows that FROB achieves competitive performance and outperforms baselines in terms of robustness to the OoD few-shot sample population and variability.

Research topics: Anomaly/ OoD detection, Joint classification and OoD detection, Few-shot anomaly detection, Discriminative models, Prediction confidence, Labeled data, Class labels, Sample generation on the data distribution boundary, Self-supervised learning, Retraining by including the learned negative data
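
To make the idea of prediction confidence concrete, here is a minimal Python sketch of joint classification and OoD rejection based on the maximum softmax probability. This is a generic confidence-thresholding baseline and not the FROB model. The function name `classify_or_reject`, the threshold value, and the use of `-1` as the OoD label are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def classify_or_reject(logits, threshold=0.5):
    """Return (labels, confidence). A sample whose maximum class probability
    falls below the threshold is labelled -1, i.e. flagged as OoD."""
    probs = softmax(np.asarray(logits, dtype=float))
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    labels = np.where(confidence >= threshold, labels, -1)
    return labels, confidence
```

A classifier that is overconfident on OoD inputs defeats this rule, which is the failure mode that training with boundary samples as negative data aims to reduce.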

"CTR: Contrastive Training Recognition Classifier for Few-Shot Open-World Recognition": AI-enabled systems in security, autonomous systems, safety, and healthcare do not only need to effectively detect Out-of-Distribution (OoD) samples, but also to recognise Objects of Concern (OoC), e.g. multiple thorax diseases, efficiently with few-shots. Detecting OoD samples is crucial, because reporting an out-of-domain input as abnormal is better than falsely classifying it. Data samples, during inference, are not confined to a finite labelled set, and thus closed-set approaches are limiting, as they misclassify OoD inputs, and they may assign them high prediction confidence. Furthermore, although anomaly detection is possible, recognising new OoC fast using only few-shot samples remains challenging. There is a lack of methods for joint anomaly detection and few-shot OoC classification. Our contribution is the development of a framework for simultaneous few-shot OoC detection and classification and anomaly detection in the unknown previously unseen, in the wild, environment, which is known as Open-World Recognition (OWR). We propose a novel methodology, the data distribution boundary Contrastive Training Recognition (CTR) classifier for few-shot OWR. CTR outperforms recent baselines in several settings, including on the SVHN, CIFAR-FS, and BSCD-FS ChestX and ISIC image datasets.

Research topics: Anomaly detection, Open-Set classification, Open-World recognition, Dynamic setting with known and new classes, Model retraining, Low- and few-shot learning, Contrastive negative retraining, Learn new classes fast with few-shots, Data distribution boundary Contrastive Training Recognition classifier

"Literature Review of Methods for Anomaly Detection": Deep generative models and Generative Adversarial Networks (GANs) can be used for anomaly detection. To perform anomaly detection, generative models can be used for unsupervised or semi-supervised learning. Unsupervised anomaly detection is examined because the anomaly is not known in advance, before inference. GANs are state-of-the-art generative models that are based on a distribution metric and can succeed convergence related to distribution metrics. GANs are used for anomaly detection because they are based on distribution metrics, such as the Jensen-Shannon Divergence (JSD) and the Wasserstein distance. Convergence can be achieved using loss functions that are related to probability metrics. GANs are probability distribution learners and can achieve better convergence than other generative models. The aim of this report is to examine the main components of GAN frameworks, random variables, models, algorithms, cost functions, distribution metrics and evaluation metrics. This report reviews the state-of-the-art deep learning based methods for anomaly detection and categorises them based on the type of model used and the criteria of detection.

Research topics: Implicit deep generative models, GANs, Distribution metrics, Anomaly detection/ OoD detection, Anomaly/ OoD score, Objective loss functions, Evaluation metrics
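
The two distribution metrics named in this review can be illustrated with a short Python sketch. It computes the Jensen-Shannon Divergence between two discrete distributions and the Wasserstein-1 distance between two equally sized one-dimensional samples. The function names are hypothetical, and GAN training estimates these quantities with neural networks rather than in closed form as done here.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions; terms with p = 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 KL(p || m) + 0.5 KL(q || m), with m = (p + q) / 2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two 1-D empirical samples of equal size:
    the mean absolute difference between the sorted samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.mean(np.abs(x - y)))
```

For distributions with disjoint support, the JSD saturates at log 2 regardless of how far apart they are, whereas the Wasserstein distance keeps growing with the separation.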

"Methodologies for Improved Robustness for Few-Shot Anomaly Detection and Learning": In this work, our main aim is to extend the work on verifiable robust anomaly detection/ Out-of-Distribution (OoD) detection to include incremental few-shot adaptation and the simultaneous identification of classes. Our target is to combine Verifiable Robustness with Confidence and Uncertainty Estimation, Few-Shot Learning, Multi-Class Classification, and Generative Adversarial Networks (GAN). The main criteria for including paper publications in this Literature Review are: (1) Robustness Verification for OoD Detection and Multi-Class Classification, and (2) Few-Shot Learning. The main areas of interest are (a) Robustness Verification, Confidence and Uncertainty Estimation, (b) Anomaly/ OoD Detection, (c) Multi-Class Classification, and (d) Low- and Few-Shot Learning.

Research topics: Anomaly/ OoD detection, Discriminative models, Deep Neural Network (DNN) classifiers, Convolutional Neural Network (CNN) classifiers, Residual Network (ResNet) classifiers, Prediction confidence, Uncertainty calibration, Open-Set recognition, Simultaneous multi-class classification and anomaly/ OoD detection, Low- and few-shot learning, Robustness

"Anomaly Detection/ OoD Detection and Incremental Learning: Alleviate Catastrophic Forgetting": In this work, we conduct research on state-of-the-art methodologies in deep learning and computer vision. In this research, we analyse several papers on incremental learning, Open-World classification, few-shot learning, and anomaly detection/ Out-of-Distribution (OoD) detection. Incremental learning, class-incremental training, training with limited data, and few-shot learning are performed. In our work, we have developed anomaly/ OoD detection and multi-class classification methodologies using both deep generative models and discriminative classifier models, few-shot anomaly detection/ OoD detection techniques, and low- and few-shot learning methods. In our research, we have also performed both many-sample and few-shot class-incremental learning.

Research topics: Anomaly detection/ OoD detection, Incremental learning, Effectively alleviate catastrophic forgetting, Few-shot classification and learning, Discriminative models, DNN classifiers (CNN and ResNet), Prediction confidence, Open-Set recognition, Open-World classification, Realistic dynamic setting

"Anomaly Detection in Images, on Temporal Data, and on Multimodal Data": The main aims of this report are to identify discernible limitations and gaps in the current literature of methods for anomaly detection in high-dimensional spaces. In spite of impressive recent progress in anomaly detection ushered in by the use of deep generative models such as Generative Adversarial Networks (GANs), Invertible Generative Models (IGMs), Variational Auto-encoders (VAEs), and Auto-encoders (AEs), performance, robustness, and stability remain formidable challenges that are unlikely to be overcome solely by the current state-of-the-art methodologies. The current literature of methods for anomaly detection uses different inference me- chanisms and training methodologies with respect to the data, the data labels and classes, the modelling of the temporal dimension of the data, and the modelling of the multimodality of the data.

Research topics: Anomaly/ OoD detection, Image data, Videos, Temporal data, Multimodal data, Deep generative models, Implicit generative models, GANs, Model retraining, Deep generative models knowing what they do not know, Overconfidence

"Research on Anomaly Detection in Videos Using Optical Flow Networks": In this research work, we examine Robust Video Prediction for Anomaly Detection. The accurate identification of abnormal events in scenes is important in video surveillance applications. For effective anomaly detection in videos, time variation is key as we have time-static cases and time dynamics. Optical Flow models can be used for anomaly detection in videos.

Research topics: Anomaly detection/ OoD detection, Video data, Image frames, Optical Flow networks, Accurate scene understanding, Video next frame prediction, Temporal dimension and time variation, Definition of anomaly, Time-static anomalies including vans and cars on pavements, Time dynamics, Time-dynamic anomalies including people running on pavements

"Few-Shot Anomaly Detection/ Mitosis Detection in Medical Imaging": In this research, we propose a deep learning based mitosis detection framework for mitotic cells identification in breast cancer histopathological images. Mitosis is an indicator of breast cancer and its detection is challenging. We aim at developing an algorithm for few-shot mitosis detection in healthcare to support histopathologists. Due to the rarity of mitosis data, the main aim of this research is to perform few-shot mitosis detection with improved mitosis detection performance, mitosis generalisation, enhanced mitosis detection accuracy and precision (as well as recall and F1 score), and robustness to the abnormal mitosis few-shot sample population and variability.

Research topics: Anomaly/ OoD detection, Medical imaging, Healthcare, Mitosis detection, Breast cancer joint detection and classification, Few-shot anomaly detection, Discriminative models, Multi-class classification, DNN classifiers including both CNN and ResNet models, High class imbalance, Generalisation performance, Robustness to low-shot population and variability

"BSS: Baggage Security Screening Model for Low-Shot Learning of Baggage X-Ray Images": (Work in progress, To be provided)

Research topics: Anomaly detection, Baggage security screening, Baggage X-ray images, Low-shot classification and learning

"Deep Learning and Generative Adersarial Networks (GANs) for Speech and Audio": Deep generative models, including Generative Adversarial Networks (GANs), can be used for speech enhancement and for speech and audio applications.

Research topics: Deep generative models, Implicit generative models, GANs, Speech and audio, Speech enhancement

"Phase-Aware Single-Channel Speech Enhancement With Modulation-Domain Kalman Filtering": We present a speech enhancement algorithm that performs modulation-domain Kalman filtering to track the speech phase using circular statistics, along with the spectral log-amplitudes of speech and noise. In our algorithm, the speech phase posterior is used to create an enhanced speech phase spectrum for the signal reconstruction of speech. The Kalman filter prediction step separately models the temporal inter-frame correlation of the speech and noise spectral log-amplitudes and of the speech phase, while the Kalman filter update step models their nonlinear relations under the assumption that speech and noise add in the complex short-time Fourier transform domain. Our phase-sensitive enhancement algorithm is evaluated with speech quality and intelligibility metrics, using a variety of noise types over a range of SNRs. Instrumental measures predict that tracking the speech log-spectrum and phase with modulation-domain Kalman filtering leads to consistent improvements in speech quality, over both conventional enhancement methods and other algorithms that perform modulation-domain Kalman filtering.

Research topics: Speech enhancement, Nonlinear modulation-domain Kalman filtering, Signal processing, Estimation and tracking, Dynamic setting, Adaptive algorithm, Speech phase, Phase-sensitive speech enhancement, Speech and noise signals, Log-amplitude spectral domain, Speech quality, Intelligibility
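
The prediction and update steps mentioned above can be illustrated with a minimal scalar Kalman filter in Python. This sketch assumes a linear Gaussian random-walk model with additive observation noise, which is far simpler than the nonlinear, phase-aware update used in the paper. The function names and the random-walk model are assumptions made for illustration only.

```python
def kalman_predict(mean, var, process_var):
    """Prediction step for a random-walk state x_t = x_{t-1} + w_t,
    where w_t has variance process_var."""
    return mean, var + process_var

def kalman_update(pred_mean, pred_var, observation, obs_var):
    """Update step for an observation y_t = x_t + v_t,
    where v_t has variance obs_var."""
    gain = pred_var / (pred_var + obs_var)
    mean = pred_mean + gain * (observation - pred_mean)
    var = (1.0 - gain) * pred_var
    return mean, var

def track(observations, init_mean, init_var, process_var, obs_var):
    """Run the filter over a sequence, e.g. one frequency bin across frames,
    and return the posterior mean after each observation."""
    mean, var = init_mean, init_var
    estimates = []
    for y in observations:
        mean, var = kalman_predict(mean, var, process_var)
        mean, var = kalman_update(mean, var, y, obs_var)
        estimates.append(mean)
    return estimates
```

In a modulation-domain setting, one such tracker would run along the time frames of each frequency bin, so the prediction step captures the inter-frame correlation.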

"Modulation-Domain Kalman Filtering for Monaural Blind Speech Denoising and Dereverberation": We describe a speech enhancement algorithm based on modulation-domain Kalman filtering to blindly track the time-frequency log-magnitude spectra of speech and reverberation. We propose an adaptive algorithm that performs joint denoising and dereverberation, while accounting for the inter-frame speech dynamics, by estimating the posterior distribution of the speech log-magnitude spectrum given the log-magnitude spectrum of the noisy reverberant speech. The Kalman filter update step models the non-linear relations between the speech, noise, and reverberation log spectra. The Kalman filtering algorithm uses a signal model that takes into account the reverberation parameters of the reverberation time and the direct-to-reverberant energy ratio (DRR). The proposed algorithm is evaluated in terms of speech quality, speech intelligibility, and dereverberation performance for a range of reverberation parameters and reverberant speech to noise ratios, in different noises, and is also compared to competing denoising and dereverberation techniques. Experimental results using noisy reverberant speech signals demonstrate the effectiveness of our enhancement algorithm.

Research topics: Dereverberation, Joint noise suppression and speech dereverberation, Realistic noisy reverberant speech settings, Modulation-domain Kalman filtering for speech enhancement, Speech and audio, Nonlinear Kalman filter update step, Estimation and tracking of the reverberation parameters, Reverberation time (T60) and DRR, Speech quality, Intelligibility

"Modulation-domain Kalman filtering for single-channel speech enhancement, denoising and dereverberation": We perform robust single-channel speech enhancement, joint noise suppression and dereverberation, using modulation-domain Kalman filtering to blindly and adaptively track the time-frequency log-magnitude spectra of speech and reverberation. Several adaptive phase-sensitive speech enhancement algorithms based on modulation domain Kalman filtering are designed, described, implemented and evaluated that perform blind joint denoising and dereverberation, while accounting for the inter-frame speech dynamics by estimating the posterior distribution of the speech log-magnitude spectrum given the log-magnitude spectrum of the noisy reverberant speech. The Kalman-filter-based enhancement algorithms, dependent on the signal model and the tracked quantities, are evaluated in terms of speech quality, intelligibility, and dereverberation performance for a range of reverberation parameters and SNRs, in different noise types and reverberant conditions, and are compared to competing denoising and dereverberation techniques. Experimental results and instrumental measures indicate that our algorithms enhance the speech quality of the degraded noisy reverberant signals and outperform other algorithms over a range of SNRs for various noise types and reverberant conditions.

Research topics: Signal processing, Nonlinear modulation-domain Kalman filtering, Speech enhancement, Dereverberation, Noise suppression, Inter-frame speech dynamics, Estimation and tracking, Phase-sensitive models, Improve speech quality

"Modulation-domain speech enhancement using a Kalman filter with a Bayesian update of speech and noise in the log-spectral domain": We present a novel Bayesian estimator that performs log-spectrum estimation of both speech and noise, and is used as a Bayesian Kalman filter update step for single-channel speech enhancement in the modulation domain. We use Kalman filtering in the log-power spectral domain rather than in the amplitude or power spectral domains. In our proposed Bayesian Kalman filter update step, we define the posterior distribution of the clean speech and noise log-power spectra as a two-dimensional multivariate Gaussian distribution. We utilise a Kalman filter observation constraint surface in the three-dimensional space, where the third dimension is the phase factor. We report the evaluation results of our proposed phase-sensitive log-spectrum Kalman filter by comparing them with the results obtained by traditional noise suppression techniques and by an alternative existing Kalman filtering technique that assumes additivity of noise and clean speech in the power spectral domain.

Research topics: Speech enhancement, Denoising, Modulation-domain Kalman filtering, Nonlinear Kalman filter update step, Speech phase, Phase factor, Signal model that takes into account that speech and noise add in the complex Short-Time Fourier Transform (STFT) domain, Posterior distribution of noise and clean speech in the log-amplitude spectral domain, Imposition of the observation constraint, Evaluation in terms of speech quality

"Speech Enhancement Using Kalman Filtering in the Logarithmic Bark Power Spectral Domain": We present a phase-sensitive speech enhancement algorithm based on a Kalman filter estimator that tracks speech and noise in the log Bark power spectral domain. With modulation-domain Kalman filtering, the algorithm tracks the speech spectral log-power using perceptually-motivated Bark bands. By combining STFT bins into Bark bands, the number of frequency components is reduced. The Kalman filter prediction step separately models the inter-frame relations of the speech and noise spectral log-powers and the Kalman filter update step models the nonlinear relations between the speech and noise spectral log-powers using the phase factor in Bark bands. The algorithm is evaluated in terms of speech quality with different algorithm configurations compared on various noise types. Experimental results show that tracking speech in the log Bark power spectral domain, taking into account the temporal dynamics of each subband envelope, is beneficial.

Research topics: Modulation-domain Kalman filtering for speech enhancement, Bark frequency bands, Mel frequency bands, Log-amplitude spectral domain, Noise suppression, Adaptive algorithm, Real-world noise, Babble noise, Estimation and tracking, Modelling of the time dynamics of speech, Speech phase, Phase factor, Additivity of speech and noise in the complex STFT domain, Evaluation in terms of speech quality including PESQ
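
The reduction in the number of frequency components obtained by combining STFT bins into Bark bands can be sketched in Python as follows. The sketch assumes the Zwicker and Terhardt approximation of the Bark scale and non-overlapping bands of one Bark each, which may differ from the band definition used in the paper. The function names are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the Bark scale (assumed here)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def pool_to_bark_bands(power_spectrum, sample_rate):
    """Sum one-sided STFT power bins (last axis) into one-Bark-wide bands.
    Returns the band powers and their values in dB."""
    power = np.asarray(power_spectrum, dtype=float)
    n_bins = power.shape[-1]
    # Centre frequency of each one-sided STFT bin, from 0 Hz to Nyquist.
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    band_index = np.floor(hz_to_bark(freqs)).astype(int)
    n_bands = int(band_index.max()) + 1
    band_power = np.zeros(power.shape[:-1] + (n_bands,))
    for b in range(n_bands):
        band_power[..., b] = power[..., band_index == b].sum(axis=-1)
    # A small floor avoids taking the logarithm of zero in empty bands.
    log_band_power = 10.0 * np.log10(np.maximum(band_power, 1e-12))
    return band_power, log_band_power
```

For a 16 kHz signal with 257 one-sided STFT bins, this pooling leaves a few tens of bands per frame, so a tracker runs on far fewer components than in the STFT domain.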

"Speech enhancement using modulation-domain Kalman filtering with active speech level normalized log-spectrum global priors": We describe a single-channel speech enhancement algorithm that is based on modulation-domain Kalman filtering that tracks the inter-frame time evolution of the speech log-power spectrum in combination with the long-term average speech log-spectrum. We use offline-trained log-power spectrum global priors incorporated in the Kalman filter prediction and update steps for enhancing noise suppression. We train and utilise Gaussian mixture model priors for speech in the log-spectral domain that are normalized with respect to the active speech level. The Kalman filter update step uses the log-power spectrum global priors together with the local priors obtained from the Kalman filter prediction step. The log-spectrum Kalman filtering algorithm, which uses the phase factor distribution and improves the modeling of the modulation features, is evaluated in terms of speech quality. Different algorithm configurations, dependent on whether global priors and/or Kalman filter noise tracking are employed, are compared in various noise types.

Research topics: Speech enhancement, Nonlinear modulation-domain Kalman filtering, Active speech level estimation, Tracking of speech and noise including the estimate of the active speech level, Global speech priors, Speech phase, Phase factor, Development of phase-sensitive algorithm, Speech quality

"On Single-Channel Speech Enhancement and On Non-Linear Modulation-Domain Kalman Filtering": This research work focuses on algorithms that perform single-channel speech enhancement. We use modulation-domain Kalman filtering for speech enhancement, i.e. noise suppression and dereverberation. Non-linear modulation-domain Kalman filtering can be applied for both noise and late reverberation suppression. Various model-based speech enhancement algorithms that perform modulation-domain Kalman filtering are designed, implemented, and tested. Our proposed model-based speech enhancement algorithm estimates and tracks the speech phase to improve performance.

Research topics: Modulation-domain Kalman filtering, Nonlinear Kalman filter update step, Single-channel speech enhancement, Dereverberation, Late reverberation suppression, Denoising, Joint noise suppression and speech dereverberation, T60 and DRR, Real-world signal model, Speech phase, Phase factor, Phase-aware algorithm, Estimation and tracking in the log-power spectral domain, Speech quality and intelligibility evaluation

"Active speech level estimation in noisy signals with quadrature noise suppression": We present a noise-robust algorithm for estimating the active level of speech, which is the average speech power during intervals of speech activity. We use the clean speech phase to remove the quadrature noise component from the short-time power spectrum of the noisy speech, as well as SNR-dependent techniques to improve the estimation. The pitch of voiced speech frames is determined using a noise-robust pitch tracker. The speech level is estimated from the energy of the pitch harmonics using the harmonic summation principle. Our method is evaluated using a range of noise signals and gives consistently lower errors than previous methods and the ITU-T P.56 algorithm.

Research topics: Speech and audio, Active speech level estimation, Noise suppression, Robustness to various different noise types, Low SNR, Adaptive denoising, Non-stationary noise including babble noise, Intermittent and transient noise, Pitch and harmonics of speech, Pitch estimation, Global speech priors
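
The definition of the active speech level given above, the average speech power during intervals of speech activity, can be written as a short Python sketch. It assumes that a speech activity mask is already available, and it implements neither the ITU-T P.56 algorithm nor the pitch-harmonic method of the paper. The function names are hypothetical.

```python
import numpy as np

def long_term_level_db(signal):
    """Average power over the whole signal, in dB."""
    x = np.asarray(signal, dtype=float)
    return float(10.0 * np.log10(np.mean(x ** 2)))

def active_speech_level_db(signal, active_mask):
    """Average power over the samples marked as active speech, in dB.
    active_mask is a boolean array with the same shape as the signal."""
    x = np.asarray(signal, dtype=float)
    mask = np.asarray(active_mask, dtype=bool)
    if x.shape != mask.shape:
        raise ValueError("signal and active_mask must have the same shape")
    if not mask.any():
        raise ValueError("active_mask marks no active samples")
    return float(10.0 * np.log10(np.mean(x[mask] ** 2)))
```

For a signal that is active half of the time, the active level is about 3 dB above the long-term level, which is why the two quantities should not be used interchangeably for level normalisation.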

"Adaptive power spectrum estimation of non-stationary acoustic noise": We propose an efficient technique to adaptively estimate the power spectrum of non-stationary noise considering the distinction between background noise and transient noise when a noisy speech signal is given. Our method aims at continuously tracking and estimating the noise power in each frequency bin of every frame, even when speech is present. Our noise estimation techniques are evaluated with respect to their accuracy and their tracking delay efficiency. Our noise estimation algorithms are also integrated into speech enhancement systems. Both speech quality and intelligibility measures are examined.

Research topics: Speech enhancement, Noise suppression, Denoising, Non-stationary noise, Transient noise, Intermittent noise, Real-world noise including babble noise, Adaptive noise estimation and tracking, Joint noise and clean speech tracking, Log-amplitude STFT spectral domain, Estimation error and accuracy, Both speech quality and speech intelligibility

"Analysis of the PPG and the ECG Signals": This report provides an overview of the design and the implementation of a system that estimates the respiratory rate (RR) using the PPG and the ECG signals. The system uses two algorithms to estimate the RR from the PPG signal. Furthermore, the system uses two algorithms to estimate the RR from the ECG signal. These algorithms are based on the QRS and on the RR interval of the ECG signal. The system combines the four algorithms by taking the mean value of the Kalman Filter implementation of every algorithm. Each algorithm is improved by using a KF to predict the next RR measurement based on the two previous and on the current RR estimates. The RMS error of the entire system is 2.0 breaths per minute (bpm).

Research topics: Signal processing, Kalman filter algorithm, Adaptive real-time algorithm, ECG and PPG signals, Estimation and tracking, Estimation of the respiratory rate, Improved performance, Evaluation in terms of accuracy and the RMS error
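
The fusion and evaluation steps described above can be sketched in Python. The sketch assumes that the Kalman-filtered respiratory rate estimates of the individual algorithms are already available as equally long sequences, together with a reference respiratory rate. The function names and the array layout are hypothetical.

```python
import numpy as np

def fuse_estimates(estimates):
    """Combine per-algorithm estimates of shape (n_algorithms, n_frames)
    by taking their mean at every frame."""
    return np.mean(np.asarray(estimates, dtype=float), axis=0)

def rms_error(estimated, reference):
    """Root-mean-square error between estimated and reference rates,
    in the same unit as the inputs (breaths per minute here)."""
    e = np.asarray(estimated, dtype=float)
    r = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((e - r) ** 2)))
```

Averaging reduces the error most when the errors of the individual algorithms are weakly correlated, which is one motivation for combining PPG-based and ECG-based estimates.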