Efficient Deep Learning for Environmental Sound Classification: A Compact Lightweight Model

Environmental Sound Classification (ESC) plays a crucial role in applications ranging from smart cities and forest monitoring to surveillance and context-aware IoT systems. Convolutional Neural Networks (CNNs) have recently emerged as the dominant paradigm, surpassing traditional approaches on ESC tasks. However, these performance gains often come at the cost of increased network depth, model complexity, and model size, which limits the use of such models in many practical applications.

To address these challenges, we present a novel hybrid deep learning architecture that combines CNNs, Kolmogorov-Arnold Networks (KANs), and advanced pooling strategies, namely Sparse Salient Region Pooling (SSRP) and Principal Component Analysis (PCA) pooling. Our methodology follows a progressive enhancement strategy: starting from a baseline CNN model, we integrate KAN layers for richer functional representation, introduce SSRP to better capture salient regions, and replace conventional pooling with PCA pooling for dimensionality reduction.
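
The abstract does not define SSRP precisely, so the following is only a minimal sketch of the general idea of pooling over salient regions, not the authors' formulation. It assumes a feature map of shape (channels, frequency, time), averages over frequency, scores sliding time windows by their mean activation, and keeps the mean of the highest-scoring windows per channel. The function name `salient_region_pool` and the parameters `window` and `top_k` are hypothetical.

```python
import numpy as np

def salient_region_pool(feature_map, window=4, top_k=3):
    """Pool a (channels, freq, time) feature map to a (channels,) vector.

    Illustrative assumption: a "salient region" is a sliding time window
    with a high mean activation after averaging over frequency.
    """
    _, _, n_frames = feature_map.shape
    window = min(window, n_frames)
    # Collapse the frequency axis: (channels, time).
    per_frame = feature_map.mean(axis=1)
    # Sliding-window means along time, computed from cumulative sums.
    csum = np.cumsum(np.pad(per_frame, ((0, 0), (1, 0))), axis=1)
    window_means = (csum[:, window:] - csum[:, :-window]) / window
    # Keep the top_k largest window means per channel and average them.
    k = min(top_k, window_means.shape[1])
    top = np.sort(window_means, axis=1)[:, -k:]
    return top.mean(axis=1)
```

Compared with global average pooling, a short event that occupies only a few frames is not diluted by the silent remainder of the clip, which is the intuition behind pooling over salient regions.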

To ensure robust feature learning, we adopt a multi-stage preprocessing pipeline consisting of waveform-level augmentations (pitch shifting, time stretching, Gaussian noise addition, and random gain), log-mel spectrogram feature extraction, and feature-level augmentations (time masking, frequency masking, and mixup). The resulting model strikes a strong balance between performance and compactness and outperforms existing state-of-the-art approaches. The results highlight the potential of hybrid neural designs that fuse convolutional front-ends, operator-based back-ends, and adaptive pooling mechanisms for environmental sound classification.
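
The feature-level augmentations named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the mask widths, the zero fill value, and the mixup parameter `alpha` are all hypothetical choices, and the input is assumed to be a log-mel spectrogram of shape (n_mels, n_frames).

```python
import numpy as np

def mask_spectrogram(spec, rng, max_freq_width=8, max_time_width=16):
    """Zero out one random frequency band and one random time span.

    spec: array of shape (n_mels, n_frames). The input is not modified.
    """
    out = spec.copy()
    n_mels, n_frames = out.shape
    # Frequency mask: a band of 0..max_freq_width consecutive mel bins.
    f = int(rng.integers(0, min(max_freq_width, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = 0.0
    # Time mask: a span of 0..max_time_width consecutive frames.
    t = int(rng.integers(0, min(max_time_width, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out

def mixup(x, y, rng, alpha=0.2):
    """Blend each example with a randomly chosen partner from the batch.

    x: (batch, ...) features; y: (batch, n_classes) one-hot labels.
    Returns the mixed features, mixed labels, and the mixing weight.
    """
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(x.shape[0])
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam
```

A typical use would apply `mask_spectrogram` to each training example and `mixup` to each batch, with a seeded generator such as `np.random.default_rng(0)` for reproducibility.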