FALCON is a NIST-selected post-quantum digital signature scheme whose performance bottleneck is the SamplerZ subroutine for discrete Gaussian sampling. We present a throughput-optimized, full hardware implementation of SamplerZ that introduces several architectural and algorithmic innovations to accelerate signature generation. Our design incorporates a datapath-aware floating-point arithmetic pipeline that balances latency against resource utilization. We introduce a novel polynomial evaluator based on Estrin's scheme to accelerate the exponential approximation, and we implement a constant-latency BerExp routine using a floating-point exponentiation IP core, which eliminates the critical-path logic associated with fixed-point decomposition. We further optimize rejection handling through parallel sampling loops, full loop unrolling, and a speed-optimized flooring circuit, which together enable high-throughput discrete Gaussian sampling. The resulting FPGA implementations of SamplerZ reduce sampling latency by 55% to 71%, which translates into a 36% to 46% reduction in overall FALCON signature generation latency compared with the current state of the art. Our design also reduces the Area-Time Product (ATP) of SamplerZ by up to 48%, setting a new benchmark for high-throughput, efficient discrete Gaussian sampling and advancing the practical deployment of post-quantum lattice-based signatures in high-performance cryptographic hardware.
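To illustrate why Estrin's scheme suits a pipelined polynomial evaluator, the following software sketch contrasts it with Horner's rule. It is not the hardware design described here: the function names are hypothetical, and the coefficients are a truncated Taylor series for exp(-x) chosen only for illustration, not the constants used in FALCON's exponential approximation.

```python
import math


def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with one long chain of multiply-adds."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial with Estrin's scheme.

    Each pass combines adjacent terms as (low + high * power), then squares
    the power. All combinations within one pass are independent of each
    other, so hardware could compute them in parallel.
    """
    terms = list(coeffs)
    power = x
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0.0)  # pad so every term has a partner
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power = power * power
    return terms[0] if terms else 0.0


# Illustrative coefficients only (assumption): degree-12 Taylor series of exp(-x).
COEFFS = [(-1) ** k / math.factorial(k) for k in range(13)]

if __name__ == "__main__":
    x = 0.5
    print(horner(COEFFS, x), estrin(COEFFS, x), math.exp(-x))
```

For a polynomial with n coefficients, Horner's rule needs n - 1 dependent multiply-add steps, while Estrin's scheme needs roughly log2(n) dependent levels at the cost of extra multipliers for the repeated squarings. That trade of area for a shorter dependency chain is the general reason the scheme is attractive when latency matters.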