Mamba offers competitive accuracy with superior computational efficiency compared to state-of-the-art transformer models. While this advantage makes Mamba particularly promising for resource-constrained edge devices, no hardware acceleration framework is currently optimized for deploying it in such environments. We present eMamba, a comprehensive end-to-end hardware acceleration framework explicitly designed for deploying Mamba models on edge platforms. We quantize the entire eMamba pipeline and implement it on an AMD ZCU102 FPGA and as an ASIC in GlobalFoundries (GF) 22 nm technology.
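The abstract above does not detail eMamba's internals, but the core computation any Mamba accelerator must handle is the selective state-space recurrence. The sketch below is a minimal, unoptimized illustration of that recurrence (not eMamba's actual implementation); all variable names and shapes are assumptions chosen for clarity.

```python
import numpy as np

def selective_scan(u, delta, A, B, C):
    """Minimal selective state-space scan (illustrative, not eMamba's kernel).

    u:     (L, D)  input sequence of length L with D channels
    delta: (L, D)  input-dependent step sizes
    A:     (D, N)  state-transition parameters (N = state size)
    B, C:  (L, N)  input-dependent input/output projections

    Per step: x_t = exp(delta_t * A) * x_{t-1} + (delta_t * u_t) * B_t
              y_t = x_t @ C_t
    """
    L, D = u.shape
    N = A.shape[1]
    x = np.zeros((D, N))          # hidden state, one N-vector per channel
    y = np.empty((L, D))
    for t in range(L):
        dA = np.exp(delta[t][:, None] * A)                 # discretized transition
        dBu = (delta[t] * u[t])[:, None] * B[t][None, :]   # discretized input term
        x = dA * x + dBu                                   # state update
        y[t] = x @ C[t]                                    # readout
    return y
```

A hardware framework like eMamba would replace this sequential float loop with a quantized, parallelized datapath, but the recurrence it realizes is the same.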
Traditional point cloud detection models based on LiDAR or 4D radar data are accurate but computationally intensive, making them ill-suited for low-power edge environments. To address these shortcomings, we propose EdgePillars, which combines a fast, simple voxel-based encoder with the low-latency backbone of pillar-based models.
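The pillar-based approach mentioned above groups a 3D point cloud into vertical columns ("pillars") on a 2D ground-plane grid, which a 2D backbone can then process cheaply. A minimal sketch of that grouping step follows; the function name, grid bounds, and capacity limit are illustrative assumptions, not EdgePillars' actual parameters.

```python
import numpy as np

def pillarize(points, bounds=(0.0, 0.0, 40.0, 40.0), cell=0.5, max_pts=32):
    """Group (x, y, z, intensity) points into vertical pillars.

    bounds:  (x_min, y_min, x_max, y_max) of the ground-plane grid (assumed)
    cell:    pillar edge length in meters (assumed)
    max_pts: points kept per pillar, capping per-pillar compute (assumed)
    Returns a dict mapping (ix, iy) grid indices -> (k, 4) point array.
    """
    x_min, y_min, x_max, y_max = bounds
    pillars = {}
    for p in points:
        x, y = p[0], p[1]
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # discard points outside the detection range
        key = (int((x - x_min) // cell), int((y - y_min) // cell))
        bucket = pillars.setdefault(key, [])
        if len(bucket) < max_pts:
            bucket.append(p)
    return {k: np.asarray(v) for k, v in pillars.items()}
```

Because pillars collapse the vertical axis, the downstream backbone runs standard 2D convolutions instead of costly 3D ones, which is the source of the low latency the abstract refers to.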
Current large language models (LLMs) run on servers equipped with high-performance GPUs. However, this requires sending user data to the server, which raises security concerns and incurs high costs. To address these issues, we are researching hardware architectures and software techniques that enable efficient on-device LLM inference.