(Co-Author)
Developed three successively optimized hardware variants of SamplerZ, the discrete Gaussian sampler critical to FALCON signature generation, using architectural, algorithmic, and datapath-level innovations.
Introduced an exponential approximation based on Estrin's scheme, IP-based constant-latency exponentiation, and parallel rejection sampling to shorten the critical path and improve throughput.
Achieved a 71% reduction in sampling latency and a 46% reduction in end-to-end signature generation latency over the prior state of the art, along with a 48% reduction in area-time product on Xilinx Zynq UltraScale FPGAs.
(First-Author)
Evaluated ChatGPT-o4-mini and other LLMs on generating HLS-compatible C++ implementations of SamplerZ, the discrete Gaussian sampler in FALCON, optimized for latency and area-delay product.
On FPGA, the LLM-generated SamplerZ came within 4% in latency and 30% in area of the state-of-the-art hand-coded RTL baseline, while also introducing an optimization not explored in existing implementations.
(First-Author)
Designed and evaluated four multiplier architectures (Baseline, Tiling, Comba, and Karatsuba) tailored for unified 64-bit integer and 53-bit floating-point operations in FALCON.
On FPGA (Xilinx Virtex-7), Karatsuba achieved the best area efficiency (19.2% better than baseline), while Tiling delivered the highest energy efficiency (35.9% improvement).
On ASIC (SkyWater 130 nm), Karatsuba remained area-optimal, but Comba unexpectedly outperformed the others in energy efficiency (51.5% over Karatsuba, 22.8% over Baseline).