CURRICULUM VITAE

Alexei V. Ivanov

Address: Bay Area, CA, USA; Trento, Italy

tel.: +1-408-475-5125, +39-346-612-3914 

e-mail: alexei_v_ivanov@ieee.org

Executive Summary:

(http://sites.google.com/site/alexeivivanov/)

Alexei V. Ivanov is an engineer and computer scientist. He received his PhD in Theoretical Foundations of Computer Science in 2004 from Belarussian State University of Informatics and Radioelectronics. He also holds an MSc degree in Applied Mathematics and Physics from Moscow Institute of Physics and Technology. He has work experience in both academia (University of Trento, Moscow Institute of Physics and Technology) and industry (Educational Testing Service (ETS); Pearson Knowledge Technologies; Speech Technology Center, Moscow; Lernout & Hauspie Speech Products NV, Belgium).

Alexei has broad experience in machine learning for speech and natural language processing systems. His current research interests include contextualized adaptive conversational machines, speech characterization technology, and the integration of para-linguistic knowledge into natural speech recognition and interpretation.

In 2011-12 he led a research group that won the Interspeech'2012 Para-linguistic Challenge in recognizing personality from speech. In 2013-15 he was involved in several projects concentrating on hardware acceleration; that effort made speech recognition decoding more than 50 times faster on SIMD devices. During his time at ETS, he was in charge of building the online speech recognizer for interaction with remote non-native speakers of English. That system achieved parity with human transcription accuracy. Later, at Knowles, Alexei led the development of an extremely small-footprint (~3 Mb), hardware-accelerated, unlimited-vocabulary speech recognizer that consumes 75mW and achieves state-of-the-art transcription accuracy.

Skills:

- AI/ML: PyTorch, TensorFlow, TensorFlow Lite, TensorFlow Lite Micro.

- Software development: C/C++, Perl, JavaScript, Java, Python.

- SIMD device programming: various GPUs/DSPs using CUDA, OpenCL, DPC++, C.

- ML model distillation/porting to constrained execution environments: reducing the memory and computational footprint of ML models for efficient execution; quantization and algorithm porting to resource-constrained “edge-processing” execution targets.

- Rapid software prototyping: Git, GitHub. Experience working with open-source packages such as Kaldi, k2, Lhotse, Icefall, spaCy, Hugging Face, Weka, OpenFst, SRILM, IRSTLM and others.

- Distributed High-Performance Computation: AWS, Azure, GCP.

- Algorithm prototyping: Matlab, Simulink, Octave, Maple, Mathematica.

- Big Data Visualization: Matlab, dot, gnuplot.

- Software Optimization: Valgrind, gprof, GDB, cuda-memcheck, NVIDIA nvprof.

- DB: MySQL, PostgreSQL.

- Experiment Design: problem definition, baseline composition, error analysis, improvement synthesis, experimental validation.

- Scientific Reporting: experience writing papers and articles, composing technical presentations.

Education:

2004 PhD Degree in “Theoretical Foundations of Computer Science” from Belarussian State University of Informatics & Radioelectronics (BSUIR), Minsk, Belarus

1995 MSc Degree in "Applied Mathematics & Physics" from Moscow Institute of Physics & Technology (MIPT), Moscow

Experience:

2021-present Uniphore Inc., 1001 Page Mill Road, Palo Alto, California

Sr. Principal Research Scientist

Developed interactive context-aware ML tools for human-human conversation analysis;

Supervised a team doing experimental design, data analysis and interpretation;

Coordinated joint research programs with external partners.

2020-2021 McD Tech Labs, 2440 W El Camino Real, Mountain View, California

Sr. Core Technology Engineer

Developed ML technology to support an automated conversational agent;

Worked to enable handling of code-switching and multilinguality in a conversation;

Built a prototype to dynamically characterize human psycho-physiological states.

2017-2020 Knowles Intelligent Audio, 331 Fairchild Dr, Mountain View, California

Principal Engineer for Speech Recognition Algorithms

Developed ML and data mining technology for commercial applications;

Designed a modular distributed speech processing platform;

Designed, implemented and supervised the local high-performance distributed compute infrastructure for ML;

Designed an edge-based, unlimited-vocabulary, low-latency speech recognition and understanding system with state-of-the-art accuracy. It runs locally on the Knowles Audio Processor IA8508, consuming ~75mW of power while continuously transcribing up to 4 independent spoken channels. The system can serve as cloud-independent voice transcription that supports free-speech interactive voice interfaces, or as a voice interface to consumer electronic devices with dynamic (“on-the-fly”) definition of the active vocabulary in written form.

Developed ML models for unlimited-vocabulary edge-based ASR: US English (broad regional accent support); Mandarin Chinese (regional accent independent).

2014-2016 Educational Testing Service (ETS), San Francisco, California

Senior Research Scientist

Developed automated spontaneous speech recognition, characterization and understanding for automated dialogue agents in language skill assessment applications.

2013-2014 Fondazione Bruno Kessler // Pervoice SPA, Trento, Italy

Senior Research Scientist

Developed the automated speech recognition and characterization components of highly efficient media-monitoring systems.

Developed an emotion recognition system for call center performance analysis.

Developed a production-grade GPU-accelerated ASR engine.

2012-2013 Pearson Knowledge Technologies, Menlo Park, California

Senior Research Scientist

Developed the automated speech recognition and characterization technology for machines built to score spoken language tests.

Developed the in-house speaker verification engine for the language testing environment.

Using my method of speech and speaker characterization, we participated in and won the Interspeech'2012 worldwide Speaker Personality Recognition Challenge, whose task was to predict human experts' judgments of a speaker's apparent personality profile (OCEAN traits).

2008-2012 Department of Information Engineering and Computer Science, University of Trento (Povo), Trento, Italy

Marie Curie Research Fellow in “ADAMACH” project // PostDoc Researcher in “LiveMemories” project

Studied and developed the automated speech recognition aspects of the adaptive conversational machines.

Studied automated language acquisition.

Explored para-linguistic and prosodic aspects of natural speech communication.

Studied segmental methods of speech characterization (recognition of emotion, predicting speaker personality profile, etc.).

Developed an open-source large-vocabulary speech recognition system for English, Italian and Russian.

Tutored PhD students.

2005-2008 Moscow Institute of Physics and Technology (State University), Moscow

Assistant and Instructor at the Radioelectronics department. Research Scientist

Participated in the educational process: lectures and laboratory classes.

Prepared a lecture course “Fundamentals of Speech Recognition”.

Prepared a lecture course “Scientific computation with graphical accelerator hardware (GPGPU)”.

Prepared an MIPT lecture course “Speech recognition by human and machine”.

Studied speech and audio perception from an information-theoretic perspective.

Developed a GPGPU accelerator for the acoustic model training process.

Developed the telephone (PSTN & GSM) Phonetic Key Word Spotting Engine.

2004-2006 Speech Technology Center, Moscow

Research Scientist, Leading Specialist in Large Vocabulary (1.5M Word Forms) Speech Recognition Engine (Russian Language)

Studied and developed feature extraction algorithms to increase robustness to environmental interference.

Studied and developed decoding algorithms: linguistically constrained hypothesis generation and pruning, fast search in the hypothesis space.

Developed the DSP/FPGA hardware accelerators for speech recognition applications.

2002-2004 Belarussian State University of Informatics & Radioelectronics (BSUIR), Minsk, Belarus

Research Engineer at the Computer Science department

Studied Anthropo-Neuromorphic feature extraction algorithms;

Developed a practical non-linear feature extraction algorithm capable of efficiently handling a broad range of environmental interference.

2000-2001 Lernout & Hauspie Speech Products NV, Wemmel, Belgium

Research Engineer at the corporate R&D Center

Studied and developed robust feature extraction algorithms;

Developed small-footprint methods of speech recognition for embedded systems;

Maintained the Lernout & Hauspie small-footprint recognition platform ASR300.

Language Proficiency

Russian & Belarussian - native;

English - fluent (Jan. 1999, TOEFL Computer Based Test score 260 out of 300);

French, German, Italian - read and translate;

Dutch - translate with a dictionary.

Government Sponsored Projects:

"ADAMACH" - research into adaptive conversational machines;

"LiveMemories" - web-integration of individual multimedia experiences;

"EUBridge" - streaming transcription and translation services;

"WikiVoice" - computationally efficient web-based speech and language tools.

Membership in Professional Organizations:

International Speech Communication Association (ISCA);

Institute of Electrical and Electronics Engineers (IEEE): Signal Processing & Information Theory Societies;

Association for Computing Machinery (ACM);

Audio Engineering Society (AES);

Corresponding Electronic Associate Member of Acoustical Society of America (ASA).