Key Features
🖥️ Multiple Topologies: Tested on YOLO Tiny, OCR (DeepBenchConv), and a modified MobileNet
⚙️ Configurable Array Sizes: From compact 12×12 arrays to large 256×256 arrays
🔄 Dataflow Comparison: Input Stationary (IS), Output Stationary (OS), and Weight Stationary (WS)
📊 Performance Metrics: Overall utilization, total cycles, and mapping efficiency
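The sweep implied by the features above (array sizes × dataflows) can be sketched as a small experiment grid. This is an illustrative sketch only; the names and structure are assumptions, not the project's actual harness:

```python
# Hypothetical sweep over the array sizes and dataflows listed above.
ARRAY_DIMS = (12, 32, 64, 256)   # square systolic array sizes evaluated
DATAFLOWS = ("IS", "OS", "WS")   # Input / Output / Weight Stationary

# One configuration per (array size, dataflow) pair, run per workload.
configs = [
    {"array": (dim, dim), "dataflow": df}
    for dim in ARRAY_DIMS
    for df in DATAFLOWS
]
# 4 array sizes x 3 dataflows = 12 configurations per workload
```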
Research Contributions
Identified 32×32 Output Stationary as optimal for YOLO Tiny due to its balance of utilization and cycle count
Demonstrated 64×64 Output Stationary as best for MobileNet, aligning with depth-wise separable convolution characteristics
Conducted a sensitivity study across filter sizes (1×1 to 7×7), showing weak correlation with utilization in depth-wise separable convolutions
Technical Achievements
Achieved ~20% higher utilization with a 32×32 OS array than with larger arrays on YOLO Tiny
Verified trade-offs between parallelism, memory bandwidth, and resource efficiency
Established Output Stationary as the most effective dataflow across multiple workloads
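The utilization gap between compact and oversized arrays follows directly from tiling: when a layer's output tile is smaller than the array, most PEs sit idle. A minimal first-order sketch of this effect, using a standard tiling estimate rather than the project's actual simulator (the layer shape below is illustrative):

```python
import math

def os_utilization(m, n, rows, cols):
    """First-order PE utilization when an M x N output is tiled
    onto a rows x cols output-stationary array."""
    folds = math.ceil(m / rows) * math.ceil(n / cols)  # passes needed to cover the output
    return (m * n) / (folds * rows * cols)

# Illustrative 100 x 64 output layer:
#   32x32 array:  4 x 2 = 8 folds -> ~78% utilization
#   256x256 array: 1 fold, but most PEs idle -> ~10% utilization
```

This is why a compact array can beat a much larger one on small layers: the larger array finishes in fewer folds but wastes the majority of its PEs on every pass.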
Applications
AI Accelerators: Custom hardware design for efficient CNN execution
Edge AI Systems: Optimized architectures for low-power devices
Hardware-Software Co-Design: Exploring DNN mapping strategies on systolic arrays
Impact and Recognition
This project provided insights into area-efficient systolic array configurations, demonstrating that compact architectures can deliver high utilization without requiring arbitrarily large arrays. The results highlight 32×32 Output Stationary as a strong default for lightweight neural networks, with 64×64 OS better suited to MobileNet-style workloads.