Publications

Year 2025

Pre-Print

(Co-Author)

Outrunning the Millennium FALCON: Speed Records for FALCON on FPGAs

Developed three successively optimized hardware variants of SamplerZ, the discrete Gaussian sampler critical to FALCON signature generation, using architectural, algorithmic, and datapath-level innovations.
Introduced an exponential approximation based on Estrin’s scheme, IP-based constant-latency exponentiation, and parallel rejection sampling to shrink the critical path and improve throughput.
Achieved a 71% reduction in sampling latency and a 46% reduction in end-to-end signature generation latency over the prior state of the art, along with a 48% reduction in area-time product on Xilinx Zynq UltraScale FPGAs.
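
For readers unfamiliar with Estrin’s scheme, the sketch below shows the general idea in software: a degree-7 polynomial is split into independent sub-expressions that can be evaluated in parallel, which shortens the dependency chain compared with Horner's rule. This is only an illustration. The function names are hypothetical, and the Taylor coefficients of exp(x) are an assumed stand-in, not the fixed-point constants or datapath used in the paper.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Illustrative coefficients only: Taylor series of exp(x), c[k] = 1/k!.
// (Assumption for this sketch; not the constants used in FALCON hardware.)
std::array<double, 8> taylor_exp_coeffs() {
    std::array<double, 8> c{};
    double inv_fact = 1.0;
    for (int k = 0; k < 8; ++k) {
        c[k] = inv_fact;
        inv_fact /= (k + 1);
    }
    return c;
}

// Horner's rule: seven multiply-adds, each depending on the previous one.
double horner7(const std::array<double, 8>& c, double x) {
    double acc = c[7];
    for (int k = 6; k >= 0; --k) {
        acc = acc * x + c[k];
    }
    return acc;
}

// Estrin's scheme: same polynomial, but the four pair terms are mutually
// independent, as are the two quad terms, so the dependency depth is
// about three multiply-add levels instead of seven.
double estrin7(const std::array<double, 8>& c, double x) {
    const double x2 = x * x;
    const double x4 = x2 * x2;

    // Level 1: four independent linear terms.
    const double p01 = c[0] + c[1] * x;
    const double p23 = c[2] + c[3] * x;
    const double p45 = c[4] + c[5] * x;
    const double p67 = c[6] + c[7] * x;

    // Level 2: two independent combinations.
    const double lo = p01 + p23 * x2;
    const double hi = p45 + p67 * x2;

    // Level 3: final combination.
    return lo + hi * x4;
}
```

Calling `estrin7(taylor_exp_coeffs(), x)` and `horner7(taylor_exp_coeffs(), x)` should agree up to floating-point rounding; the difference that matters in hardware is the shorter chain of dependent operations.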

VLSI-SoC 2025

(First-Author)

Deus Ex LLMs: AI vs Humans in Post-Quantum Cryptographic Hardware Code Generation

Evaluated ChatGPT-o4-mini and other LLMs at generating HLS-compatible C++ implementations of a discrete Gaussian sampler (SamplerZ in FALCON), optimizing for latency and area-delay product.
On FPGA, the LLM-generated SamplerZ came within 4% of the latency and 30% of the area of the state-of-the-art hand-coded RTL baseline, and also introduced an optimization not explored in existing implementations.

LightSEC 2025

(First-Author)

A Comparison of Unified Multiplier Designs for the FALCON Post-Quantum Digital Signature

Designed and evaluated four multiplier architectures (Baseline, Tiling, Comba, and Karatsuba) tailored for unified 64-bit integer and 53-bit floating-point operations in FALCON.
On FPGA (Xilinx Virtex-7), Karatsuba achieved the best area efficiency (19.2% better than baseline), while Tiling delivered the highest energy efficiency (35.9% improvement).
On ASIC (SkyWater 130 nm), Karatsuba remained area-optimal, but Comba unexpectedly outperformed the others in energy efficiency (51.5% better than Karatsuba, 22.8% better than baseline).
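
As background on the Karatsuba approach mentioned above, the following is a minimal software sketch of one Karatsuba level: the operands are split in half, and three half-width multiplications replace the four a schoolbook multiplier would need. It is scaled down to 32-bit operands so every intermediate fits in a `uint64_t`. The function name is hypothetical, and the sketch does not attempt to model the unified 64-bit integer and 53-bit floating-point datapath evaluated in the paper.

```cpp
#include <cassert>
#include <cstdint>

// One level of Karatsuba for a 32x32 -> 64-bit unsigned product.
// Illustrative only; a hardware design would map these steps to
// parallel multiplier blocks rather than sequential statements.
std::uint64_t karatsuba32(std::uint32_t a, std::uint32_t b) {
    // Split each operand into 16-bit halves: a = a1 * 2^16 + a0.
    const std::uint64_t a0 = a & 0xFFFFu;
    const std::uint64_t a1 = a >> 16;
    const std::uint64_t b0 = b & 0xFFFFu;
    const std::uint64_t b1 = b >> 16;

    // Three multiplications instead of four.
    const std::uint64_t z0 = a0 * b0;                // low  x low
    const std::uint64_t z2 = a1 * b1;                // high x high
    const std::uint64_t zs = (a0 + a1) * (b0 + b1);  // 17-bit x 17-bit

    // Middle term a0*b1 + a1*b0, recovered by subtraction.
    const std::uint64_t z1 = zs - z0 - z2;

    // Recombine: z2 * 2^32 + z1 * 2^16 + z0.
    return (z2 << 32) + (z1 << 16) + z0;
}
```

For any pair of inputs, the result should equal `static_cast<std::uint64_t>(a) * b`; the trade-off is one fewer multiplication at the cost of extra additions and subtractions.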
