Publications

miniStreamer: Enhancing Small Conformer with Chunked-Context Masking for Streaming ASR Applications on the Edge

Haris Gulzar, Monikka Roslianna Busto, Takeharu Eda, Katsutoshi Itoyama, Kazuhiro Nakadai

Abstract:

Real-time applications of Automatic Speech Recognition (ASR) on user devices at the edge require streaming processing. The Conformer model has achieved state-of-the-art performance in non-streaming ASR. Conventional approaches achieve streaming ASR with Conformer using causal operations, but these incur a quadratic increase in computational cost as the utterance length grows. In this work, we propose a chunked-context masking approach to perform streaming ASR with Conformer, which bounds the computational cost to a constant value instead of letting it grow quadratically. Our approach allows self-attention in the Conformer encoder to attend to limited past information in the form of chunked context. It achieves performance close to that of the full-context causal Conformer-Transducer while significantly reducing computational cost and maintaining a low Real Time Factor (RTF), a highly desirable trait for resource-constrained, low-power edge devices.
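To illustrate the masking idea, the following is a minimal NumPy sketch (not the paper's implementation) of a chunked-context attention mask; the chunk size and the number of retained past chunks are illustrative assumptions.

import numpy as np

def chunked_context_mask(seq_len, chunk_size, num_left_chunks):
    # Boolean (seq_len, seq_len) mask; True means frame i may attend to frame j.
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        cur_chunk = i // chunk_size
        start = max(0, (cur_chunk - num_left_chunks) * chunk_size)  # limited past context
        end = min((cur_chunk + 1) * chunk_size, seq_len)            # frames in the same chunk see each other
        mask[i, start:end] = True
    return mask

# Example: 8 frames, chunks of 2 frames, 1 past chunk of context.
print(chunked_context_mask(8, 2, 1).astype(int))

Because each query frame attends only to its own chunk plus a fixed number of past chunks, the attended window stays bounded and the attention cost per frame no longer grows with utterance length.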

Keywords: Machine Learning, Speech AI, Streaming ASR

Published in: INTERSPEECH 2023

Peer-reviewed Conference Paper | Status: Published, link

CASE: CNN Acceleration for Speech-Classification in Edge-Computing  

Haris Gulzar, Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai, Hideharu Amano

Abstract:

High performance of Machine Learning algorithms has enabled numerous speech-interface applications in our daily life, but most frameworks rely on computationally expensive algorithms deployed on cloud servers as speech recognition engines. With the recent surge in the number of IoT devices, a robust and scalable solution for enabling AI applications on IoT devices is needed in the form of edge computing. In this paper, we propose using a System-on-Chip (SoC) powered edge-computing device as an accelerator for speech command classification with a Convolutional Neural Network (CNN). Different aspects affecting CNN performance are explored, and an efficient, lightweight model named CASENet is proposed that achieves state-of-the-art performance with a significantly smaller number of parameters and operations. Efficient extraction of useful features from the audio signal helps maintain high accuracy with 6X fewer parameters, making CASENet the smallest CNN among similarly performing networks. The model's lightweight design achieves 96.45% validation accuracy with 14X fewer operations, which makes it ideal for low-power IoT and edge devices. A CNN accelerator is designed and deployed on the FPGA fabric of the SoC-equipped edge server device. The hardware accelerator improves speech-command inference latency by a factor of 6.7X compared to a standard implementation. Memory, computational cost, and latency are the most important metrics when selecting a model for deployment on edge-computing devices, and CASENet together with the accelerator satisfies all of these requirements.
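For context, here is a hypothetical PyTorch sketch of a small speech-command CNN; it is not CASENet, and the layer sizes and MFCC input shape are assumptions, but it shows how parameter count, a key edge-deployment metric, can be kept low with few channels and global pooling.

import torch.nn as nn

class TinySpeechCNN(nn.Module):
    def __init__(self, num_classes=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),          # global pooling keeps the classifier tiny
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):                     # x: (batch, 1, n_mfcc, time)
        return self.classifier(self.features(x).flatten(1))

model = TinySpeechCNN()
print(sum(p.numel() for p in model.parameters()))  # parameter count drives memory footprint on the edge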

Keywords: Machine Learning, Speech-Classification, Edge-Computing, Internet of Things (IoT)

Published in: IEEE Cloud Summit 2021

Peer-reviewed Conference Paper | Status: Published, link

Fig. 1: Deploying Machine-Learning Models on SoC based Edge Devices

A Multi-Access Edge Computing Solution with Distributed Sound Source Localization for IoT Networks 

Haris Gulzar, Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Abstract:

In this paper, we present a flexible edge-computing approach for distributed sound source localization and tracking that uses a custom-built Multi-Access Edge Computing (MEC) device for Internet of Things (IoT) applications. An embedded device fitted with a multichannel microphone array is modelled as an IoT node that records real-time sound signals, performs computation according to its resource capability, and then communicates either the audio signals, partially computed results, or final sound source localization (SSL) results to the MEC device for further computation. We present a framework for deploying HARK (Honda Research Institute Japan Audition for Robots with Kyoto University) to perform SSL across multiple devices by exploiting the PYNQ platform of the MEC device. Together, the IoT node and the MEC device offer a lightweight, flexible environment for distributed SSL in which the communication payload is reduced from 25 KB to 0.1 KB per frame, a property highly desired in IoT applications.
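As a rough illustration of the payload-reduction design choice (the message fields and frame dimensions below are assumptions, not the paper's exact format), compare the size of a raw multichannel audio frame with that of a compact SSL result:

import json
import numpy as np

channels, frame_len = 8, 512                               # assumed microphone-array frame
raw_frame = np.zeros((channels, frame_len), dtype=np.float32)

ssl_result = {"frame": 1024, "azimuth_deg": 37.5,          # compact result forwarded instead of audio
              "elevation_deg": 12.0, "power": 28.4}
result_bytes = len(json.dumps(ssl_result).encode("utf-8"))

print(f"raw audio frame: {raw_frame.nbytes / 1024:.1f} KB, SSL result: {result_bytes} B")

Sending only the localization result keeps per-frame traffic in the range of tens to hundreds of bytes rather than tens of kilobytes, which is the kind of reduction reported above.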

Keywords: Edge Computing, Internet of Things (IoT), Sound Source Localization, MQTT

Published in: 21st Society of Instrument and Control Engineers System Integration (SICE-SI2020)

Best Paper Presentation Award | Conference Paper | Status: Published in SI2020 Archive

Fig. 1: Distributed Computing framework between IoT and Edge Device

Auditory Awareness with Sound Source Localization on Edge Devices for IoT Applications

Haris Gulzar, Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Abstract:

In this paper, we propose a sound source localization solution for 3D environments with a specific focus on edge computing for IoT applications. The proposed method integrates real-time sound signal processing, edge computing, and deployment of the model in an IoT network. Sound signal processing is performed on the IoT edge device, and sound source localization results are transmitted to a remote server using a lightweight communication protocol. Shifting computation from the cloud to the edge device not only resolves the problems of cloud overloading and scalability; transmitting minimal information through a lightweight communication protocol also paves the way toward sustainable sound signal processing solutions with enhanced data privacy, minimal bandwidth utilization, and improved overall latency.
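A minimal sketch of how such a compact localization result could be published over MQTT using the paho-mqtt client (1.x API assumed; the broker address, topic, and message fields are placeholders, not the paper's configuration):

import json
import paho.mqtt.client as mqtt

client = mqtt.Client()                                    # paho-mqtt 1.x constructor
client.connect("edge-broker.local", 1883)                 # placeholder broker address
result = {"azimuth_deg": 37.5, "elevation_deg": 12.0, "power": 28.4}
client.publish("hark/ssl/node1", json.dumps(result))      # small JSON payload per frame
client.disconnect()

Publishing only the localization estimate keeps bandwidth usage minimal and leaves the raw audio on the device, which is where the privacy and latency benefits mentioned above come from.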

Keywords: Internet of Things (IoT), HRI-JP Audition for Robots with Kyoto University (HARK), Sound Source Localization, MQTT

Published in: 38th Annual Conference of the Robotics Society of Japan (RSJ2020)

Conference Paper | Status: Published in RSJ2020 Archive

Fig. 1: Sound source localization for HARK IoT node