Haoran Wei 魏浩然

Staff Research Engineer, Rivian and Volkswagen Group Technologies

PhD in Electrical Engineering, University of Texas at Dallas

Email: haoranwei@rivianvw.tech

Research interests

Speech and Music Processing
Real-Time Image and Video Processing
Pattern Recognition and Deep Learning
Lightweight Deep Learning Models on Smartphones
Sensor Fusion

Useful Links

Google Scholar

ResearchGate

Zhihu (知乎)

WeChat (微信)

Work Experience:

Senior Engineer, Machine Learning Research @ Samsung Research America (Sep 2022 - Now) Reported to: Vijendra Raj Apsingekar & Srinivas Ponakala

Samsung 'Hi Bixby' and 'Bixby' Wakeup Project (Keywords Verifier, word-level). The new Bixby Wakeup Verifier model was deployed on Samsung phones worldwide (supporting 11 accents) in May 2023, and our team was granted the MPS Star Award. It is a global model developed by our North American Bixby team. Compared with the old version, the new model is 3 times smaller, produces half as many false wakeups, and has a 10% higher correct-acceptance rate. The improvements come from training speed, model size, data cleaning, data augmentation, training data selection, and model training strategy.

After one month of deployment in the US market, the model's effect was confirmed on real user data. With the old version, 64% of user data were false wakeups and 34% were actual wakeups, so every actual wakeup came with 1.88 (64/34) false wakeups. With the new version, only 37% of user data are false wakeups and 61% are actual wakeups, so every actual wakeup comes with 0.61 (37/61) false wakeups. Assuming users keep the same Bixby usage frequency, false wakeups in the actual US market have dropped by a factor of more than 3 (1.88/0.61).
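The arithmetic behind these figures can be reproduced with a short sketch. The function name below is hypothetical and only illustrates the calculation; the four percentages are the ones quoted above.

```python
def false_wakeups_per_actual(false_pct: float, actual_pct: float) -> float:
    """Return how many false wakeups occur for each actual wakeup."""
    return false_pct / actual_pct


# Percentages of user data reported for the old and new verifier models.
old_ratio = false_wakeups_per_actual(64, 34)
new_ratio = false_wakeups_per_actual(37, 61)

# Assuming the same usage frequency, this is the reduction factor.
reduction = old_ratio / new_ratio

print(f"old: {old_ratio:.2f}, new: {new_ratio:.2f}, reduction: {reduction:.2f}x")
```

Dividing the unrounded ratios gives a factor slightly above the 1.88/0.61 quotient, which uses rounded values; both are above 3.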

Samsung 'Hi Bixby' and 'Bixby' Wakeup Project (Keywords Detector, frame-level).

Senior Algorithm Engineer @ Visla, a startup working on video editing and communication (Mar 2021 - Aug 2022) Reported to: Huipin Zhang 张慧品

1. Explore new features

1.1 Music Remix: analyze the structure of a music track, then remix it to a target length so naturally that listeners cannot hear where the pieces were joined.

1.2 BGM Ducking (audio gain adjustment): automatically adjust the music volume when speech tracks are present, so that the music gives way to the voice.

1.3 VAD (Voice Activity Detection) + Wiener Filter-Based Audio Denoising: VAD finds the speech and non-speech parts of the audio, and Wiener filter-based denoising produces clean speech.

1.5 Split Speech: detect silence and split the audio sent to ASR, so that ASR results are returned faster.

1.6 Forced Alignment: align the transcript with the speech from a Zoom video to reduce ASR usage. (For the occasional failure where no result is returned, extra logic is needed, such as a rough alignment estimated from the speaking rate.)

1.7 Optimal Play Speed Recommendation: automatically recommend a faster playback speed for videos with slow speech. A module that detects abnormal ASR results is included.

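1.8 Refine AWS ASR results with an adaptive aggressiveness threshold. This post-processing addresses: 1) extra words inserted in high-noise scenes, using automatic adjustment of processing strength, VAD result overlay, safe cut points, and ASR result smoothing; 2) inaccurate word durations in high-noise scenes, using automatic adjustment of processing strength, audio features, and prior knowledge of pronunciation duration. Possible drawback: isolated, short filler words are removed from the ASR results, so the later Remove Filler Words feature cannot find them.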

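1.9 Remove Filler Words and Pauses: automatically remove filler words and pauses. Seamless audio cutting is achieved mainly by adding fade-in and fade-out effects to prevent audible clicks.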

1.10 Fast audio waveform plotting

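1.11 BGM Recommendation: background music recommendation. (A module for adjusting recommendation probabilities according to user preference is reserved, but the probability update still needs to be implemented.)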

1.12 Music Beat Detection: find the beats, downbeats, and BPM (beats per minute) of background music. (Not yet applied to the product.)

1.13 Video Scene Split: modify Bohan’s code to split an input video into scenes. (Not yet applied to the product; detection is poor for fades and animated transitions and still needs improvement.)

1.14 Deep learning-based speech enhancement and other signal processing solutions for improving audio recording quality. (Exploration stage.)

1.15 Face Detection and Recognition: extend Qimeng’s work. (Exploration stage; the use cases need to be defined first to guide the choice of technology.)

2. Deploy the above features on AWS servers.

3. Algorithm performance enhancement: improve computation time and memory efficiency.

4. Market exploration: exploring the market is very important for a startup, especially at this stage (before release) for Visla.

Quarter Star Award (quarterly best employee award) [link]

Research Intern @ 快手 Kwai US R&D Center (May 2020 - Aug 2020) Mentor: Fei Tao 陶斐

Worked on multimodal signal processing and video generation.

Software Engineer Intern @ OPPO US Research Center (Jan 2020 - May 2020) Mentor: Chiu Man Ho 何朝文 & Yuan Lin 林袁

1. Worked on audio-visual speech enhancement and lip reading.

2. Contributed to an OCR (optical character recognition) project, responsible for the character detection part. Converted a PyTorch-based model to a Keras-based model, and then to a TF-Lite model that can be deployed on smartphones.

3. Compared various lightweight deep learning models (MobileNet/ShuffleNet/GhostNet) as well as pruning and quantization methods.

Research Assistant and Teaching Assistant @ University of Texas at Dallas (Aug 2017 - Dec 2020) Advisor: Nasser Kehtarnavaz

Worked on real-time video processing, speech processing, sensor fusion, and deep learning on mobile devices.

Research Assistant @ Shanghai Normal University (Aug 2014 - Jun 2017) Advisor: Yanhua Long

Worked on speech processing, voice activity detection, and speaker identification.

Project Assistant Intern @ Shanghai Industrial μTechnology Research Institute (May 2014 - Sep 2014) Mentor: Wenjie Yu 俞文杰

Academic Activities:

Associate Editor of [Journal of Real-Time Image Processing] (IF 3.0)

Chief Guest Editor of [Biomimetics] (IF 4.5) special issue on [Bioinspired Artificial Intelligence Applications]

Chief Guest Editor of [Sustainability] (IF 3.9) special issue on [Artificial Intelligence Applications for Sustainable Urban Living]

Guest Editor of [Applied Sciences] (IF 2.5), special issue on [Advanced Technologies and Applications of Emotion Recognition]

Technical Committee Member of conference ICBDT2021 [Link], ICBDT2022 and ICBDT2023

Reviewer for [IEEE Sensors Journal], [Remote Sensing], [Real Time Image Processing], [Diagnostics], [Sensors], [Journal of Personalized Medicine], [PLOS ONE], [Computers in Biology and Medicine], [Applied Sciences], [Electronics], [Entropy], [Signal, Image and Video Processing], [Journal of Intelligent & Fuzzy Systems], [Journal of Pharmaceutical Research International], and the ICASSP and ICBDT conferences

Thesis Papers:

Haoran Wei. Deep Learning Solutions for Continuous Action Recognition Using Fusion of Inertial and Video Sensing and for Far Field Video Surveillance, PhD Dissertation, University of Texas at Dallas, 2019. [slides] [Paper]
Haoran Wei. Deep Learning-Based Far Field Video Surveillance and Continuous Action Recognition by Fusion of Inertial and Video Sensing, PhD Proposal Exam, University of Texas at Dallas, 2019. [slides]
Haoran Wei. Real-time adaptive noise reduction for hearing devices, PhD Qualifying Exam, University of Texas at Dallas, 2017. [slides]
魏浩然. 基于统计模型的语音端点检测. 硕士学位论文，上海师范大学，2017. (Haoran Wei. Statistical model based voice activity detection. Master's Thesis, Shanghai Normal University, 2017.) [paper in Chinese] [slides in Chinese]

Patents:

P2. Haoran Wei, Srinivasa Rao Ponakala, Neha Barde, Gowtham Srinivasan, Patrick Hegarty, Taeyeon Ki, Kenneth Y. Cho, Vijendra Raj Apsingekar. Consistency Check for Large Language Model Continuous Conversations. Samsung Research America.

P1. Haoran Wei, Gowtham Srinivasan, Srinivasa Rao Ponakala, Vijendra Raj Apsingekar. Bixby Wakeup with Auto Enrollment and on-Device Training. Samsung Research America.

P2. 魏浩然, 龙艳花, 冯志民, 叶宏, 茅红伟. 一种基于位置信息的语音端点检测方法. 中国专利:申请状态(申请号:201710624269.0).

P1. 龙艳花, 叶宏, 魏浩然. 采用声纹和语音识别进行个性化电视语音唤醒的方法. 中国专利:申请状态(申请号:201410840544.9).

Journal Papers:

J23. Haoran Wei, Fei Tao, Zhenghua Huang, Yanhua Long. Bioinspired Artificial Intelligence Applications 2023[J]. Biomimetics 9 (2), pp: 80, 2024. [Open Access Link]

J22. Yifan Zhou, Yanhua Long, Haoran Wei. Acoustic-Sensing-Based Attribute-Driven Imbalanced Compensation for Anomalous Sound Detection without Machine Identity[J]. Sensors, 23(21), pp: 8984, 2023. [Open Access Link]

J21. Yuchao Chang, Wen Chen, Jun Li, Jianpo Liu, Haoran Wei, Zhendong Wang and Naofal Al-Dhahir. "Collaborative Multi-BS Power Management for Dense Radio Access Network using Deep Reinforcement Learning." IEEE Transactions on Green Communications and Networking, 2023.

J20. Yue Cao, Yegeng Sun, Runtian Zheng, Qing Wang, Xue Li, Haoran Wei, Likai Wang, Zhongfang Li, Fagang Wang, and Ning Han. "Biomass-derived carbon material as efficient electrocatalysts for the oxygen reduction reaction." Biomass and Bioenergy 168 (2023): 106676.

J19. Guolei Liu, Xue Li, Qing Wang, Kuizhao Sun, Chuping Lee, Yue Cao, Weimeng Si, Haoran Wei, Zhongfang Li, and Fagang Wang. "The biomass of pig-blood-derived carbon as a novel electrode material for hydrogen peroxide electrochemical sensing." Catalysts 12, no. 11 (2022): 1438.

J18. Haoran Wei, Zhendong Wang, Yuchao Chang and Zhenghua Huang. Introducing the Special Issue on Artificial Intelligence Applications for Sustainable Urban Living[J], Sustainability, 14(20), pp.13631, 2022 [Open Access Link]

J17. Zhendong Wang, Haoran Wei, Jianda Wang, Xiaoming Zeng and Yuchao Chang. Security Issues and Solutions for Connected and Autonomous Vehicles in a Sustainable City: A Survey[J], Sustainability, 14(19), pp.12409, 2022 [Open Access Link]

J16. Yuhong He, Tao Zeng, Ye Xiong, Jialu Li and Haoran Wei. Deep Learning Based Frequency-Aware Single Image Deraining by Extracting Knowledge from Rain and Background[J], Machine Learning and Knowledge Extraction, 4(3), pp. 738-752, 2022 [Open Access Link]

J15. 巩霞, 姚泽炜, 魏浩然. 利用人工智能技术检测中国民族乐器[J]. 山东理工大学学报(社会科学版), 1, pp: - , 2022.

J14. Linqiang Wei, Yanhua Long, Haoran Wei, Yijie Li. New Acoustic Features for Synthetic and Replay Spoofing Attack Detection[J]. Symmetry, 14(2), pp: 274, 2022. [Open Access Link]

J13. Long Peng, Aiwen Jiang, Haoran Wei, Bo Liu and Mingwen Wang. Ensemble Single Image Deraining Network via Progressive Structural Boosting Constraints[J]. Signal Processing: Image Communication, 99, pp: 116460, 2021. [Link]

J12. Haidi Zhu, Haoran Wei, Baoqing Li, Xiaobing Yuan, Nasser Kehtarnavaz. A review of video object detection: datasets, metrics and methods[J]. Applied Sciences, 10(21), pp: 7834, 2020. [Open Access Link]

J11. Haidi Zhu, Haoran Wei, Baoqing Li, Xiaobing Yuan, Nasser Kehtarnavaz. Real-Time Moving Object Detection in High Resolution Video Sensing[J]. Sensors, 20(12), pp: 3591, 2020. [Open Access Link]

J10. Haoran Wei, Pranav Chopada, Nasser Kehtarnavaz. C-MHAD: Continuous Multimodal Human Action Dataset of Simultaneous Video and Inertial Sensing[J]. Sensors, 20(10), pp: 2905, 2020. [Open Access Link]

J9. Haoran Wei, Nasser Kehtarnavaz. Simultaneous utilization of inertial and video sensing for action detection and recognition in continuous action streams[J]. IEEE Sensors Journal, 20(11), pp: 6055-6063, 2020. [IEEE link]

J8. Haoran Wei, Roozbeh Jafari, Nasser Kehtarnavaz. Fusion of Video and Inertial Sensing for Deep Learning–Based Human Action Recognition[J]. Sensors, 19(17), pp: 3680, 2019. [Open Access Link]

J7. Haoran Wei, Nasser Kehtarnavaz. Semi-Supervised Faster RCNN-based Person Detection and Load Classification for Far Field Video Surveillance[J]. Machine Learning and Knowledge Extraction, 1(3), pp: 756-767, 2019. [Open Access Link]

J6. Y Zhang, Y Long, X Shen, H Wei, M Yang, H Ye, H Mao. Articulatory movement features for short-duration text-dependent speaker verification[J]. International Journal of Speech Technology, 20(1), pp:1-7, 2017. [pdf]

J5. 周雷, 龙艳花, 魏浩然. 一种新型的与文本相关的说话人识别方法研究[J]. 上海师范大学学报(自然科学版), 46(2), pp:224-230, 2017.

J4. 张艳, 倪继锋, 魏浩然,等. 基于DNN位点的选择和验证[J]. 计算机仿真, 34(7),pp:335-338, 2017.

J3. Wei H, Long Y, Mao H. Improvements on self-adaptive voice activity detector for telephone data[J]. International Journal of Speech Technology, 19(3), pp:1-8, 2016. [pdf]

J2. 魏浩然, 倪继锋, 龙艳花. 基于STM32F103的手写绘图板设计[J]. 上海师范大学学报(自然科学版), 45(5), pp:543-547, 2016.

J1. 魏浩然, 李传江, 翁志明,等. 小型被动式双轴太阳能跟踪装置的设计与应用[J]. 电子制作, 2012(12), pp:97-98,2012.

Conference Papers:

C12. Haoran Wei, Shilin Wang, Yanhua Long. Personalized Speech Enhancement without User Enrollment for Real-World Audio Replay Scenarios, ICASSP, Hyderabad, India, 2025.

C11. Ziling Huang, Haixin Guan, Haoran Wei, Yanhua Long. SEF-PNet: Speaker Encoder-Free Personalized Speech Enhancement with Local and Global Contexts Aggregation. ICASSP, Hyderabad, India, 2025.

C10. Li Li, Yijie Li, Dongxing Xu, Haoran Wei, Yanhua Long. ACCENT-SPECIFIC VECTOR QUANTIZATION FOR JOINT UNSUPERVISED AND SUPERVISED TRAINING IN ACCENT ROBUST SPEECH RECOGNITION. 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, 2024.

C9. Xuefei Wang, Yanhua Long, Yijie Li, Haoran Wei. Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition. 24th Conference of the International Speech Communication Association (INTERSPEECH), Dublin, Ireland, 2023. [paper]

C8. Li Li, Dongxing Xu, Haoran Wei, Yanhua Long. Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system. 24th Conference of the International Speech Communication Association (INTERSPEECH), Dublin, Ireland, 2023. [paper]

C7. Xiaoxiao Wu, Dongxing Xu, Haoran Wei, Yanhua Long. Few-shot continual learning with weight alignment and positive enhancement for bioacoustic event detection[C]. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023.

C6. Xia Gong, Yuxiang Zhu, Haidi Zhu, Haoran Wei. ChMusic: A Traditional Chinese Music Dataset for Evaluation of Instrument Recognition [C]. ACM 4th International Conference on Big Data Technologies (ICBDT), Zibo, China, 2021. [ACM Link]

C5. Xia Gong, Yan Lu, Haoran Wei. Continuous Human Action Detection Based on Wearable Inertial Data[C]. ACM 4th International Conference on Big Data Technologies (ICBDT), Zibo, China, 2021. [ACM Link]

C4. Haoran Wei, Fei Tao, Runze Su, Sen Yang, Ji Liu. Ensemble Chinese End-to-End Spoken Language Understanding for Abnormal Event Detection from audio stream [C]. ACM 4th International Conference on Big Data Technologies (ICBDT), Zibo, China, 2021. [ACM Link]

C3. Haoran Wei, Abhishek Sehgal and Nasser Kehtarnavaz. A Deep Learning-Based Smartphone App for Real-Time Detection of Retinal Abnormalities in Fundus Images[C]. SPIE conference of Real-Time Image Processing and Deep Learning, Baltimore, 2019. [pdf] [slides] [video] [code]

C2. Haoran Wei, Nasser Kehtarnavaz. Determining Number of Speakers from Single Microphone Speech Signals by Multi-Label Convolutional Neural Network[C]. The 44th Annual Conference of the IEEE Industrial Electronics Society (IECON), Washington, D.C., 2018. [pdf] [slides]

C1. Haoran Wei, Matthew Laszewski and Nasser Kehtarnavaz. Deep Learning-Based Person Detection and Classification for Far Field Video Surveillance[C]. The 13th IEEE Dallas Circuits and Systems Conference (DCAS), Dallas, 2018. [pdf] [slides]

Pre-print Papers:

Pre1. Runze Su, Fei Tao, Xudong Liu, Haoran Wei, Xiaorong Mei, Zhiyao Duan, Lei Yuan, Ji Liu, Yuying Xie. Themes Inferred Audio-visual Correspondence Learning[C]. Submitted to The 23rd ACM International Conference on Multimodal Interaction [arxiv]
