RIFFA is a Reusable Integration Framework for FPGA Accelerators. It connects IP cores on an FPGA with user software running on a commodity workstation to provide high-bandwidth, low-latency synchronization and communication. The framework requires a PCIe-enabled workstation and an FPGA on a board with a PCIe connector. RIFFA provides communication and synchronization capabilities with a standard interface for both software and hardware. It consists of Verilog and VHDL source, software libraries, and a device driver. Our paper on RIFFA has been published in the 20th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines.
RIFFA 1.0 has been deprecated in favor of RIFFA 2.0. RIFFA 2.0 brings numerous enhancements, support for additional languages and operating systems, and wider FPGA platform support. It can also achieve the maximum transfer rate over the PCIe links we tested.
FPGAs are very flexible devices and can be used to accelerate applications. However, communication and synchronization between software running on a traditional CPU and FPGA IP cores is not a standard feature from most FPGA vendors. The connectivity that is offered between workstations and FPGAs stops at the hardware protocol level. As a result, application acceleration using FPGAs frequently requires building considerable infrastructure for basic functionality before any application logic can be implemented. Most FPGA applications are therefore designed as standalone systems. Integrated designs require even more framework code unrelated to the application, and that code would most likely not be reusable. We hope to remedy this situation with RIFFA.
With RIFFA, software engineers are presented with a convenient user-space library API that hides the low-level communication details. Software can send data to and receive data from FPGA IP cores by writing only a few lines of code.
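To give a feel for what "a few lines of code" can look like, here is a minimal sketch in Python. It does not use the RIFFA library: `MockFpga`, `send`, and `recv` are hypothetical names invented for this illustration, and the IP core is emulated by an ordinary Python function so the example runs without any hardware.

```python
from collections import deque


class MockFpga:
    """Hypothetical in-memory stand-in for an FPGA handle; not the RIFFA API."""

    def __init__(self, core):
        self._core = core        # function emulating the IP core's computation
        self._results = deque()  # results waiting to be received by software

    def send(self, data):
        """Hand a buffer to the emulated IP core; returns the number of bytes sent."""
        self._results.append(self._core(bytes(data)))
        return len(data)

    def recv(self):
        """Return the next result produced by the emulated IP core."""
        if not self._results:
            raise RuntimeError("no data available")
        return self._results.popleft()


# Application code: the "few lines" a user would write.
fpga = MockFpga(core=lambda buf: buf[::-1])  # toy core that reverses the bytes
fpga.send(b"abcd")
reply = fpga.recv()
```

The point of the sketch is the shape of the application code at the bottom: open a handle, send a buffer, receive a result. Everything about buses, drivers, and transfer mechanics stays behind the library boundary.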
FPGA designers are provided with a simplified interface for communicating with software applications. This interface hides from the IP core the timing and protocol details of communicating over the various buses.
The communications model is based on direct memory access (DMA) transfers and interrupt/doorbell signaling. This achieves high bandwidth over the PCIe link. Whenever possible, superfluous data copying and delays are removed to reduce latency.
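The general pattern of a shared buffer plus doorbell and interrupt signaling can be sketched in software. The example below is only an analogy under stated assumptions: `LoopbackDevice` and `offload` are hypothetical names, a `bytearray` plays the role of DMA-accessible memory, and `threading.Event` objects stand in for the doorbell and the interrupt. It is not how RIFFA's driver or hardware is implemented.

```python
import threading


class LoopbackDevice:
    """Hypothetical stand-in for an FPGA IP core; not part of RIFFA."""

    def __init__(self, buffer_size):
        # Shared buffer playing the role of DMA-accessible host memory.
        self.buffer = bytearray(buffer_size)
        self.length = 0
        self.doorbell = threading.Event()   # host -> device: "data is ready"
        self.interrupt = threading.Event()  # device -> host: "result is ready"
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Device side: sleep until the doorbell rings, then work in place.
        self.doorbell.wait()
        view = memoryview(self.buffer)[: self.length]  # no copy of the payload
        for i in range(len(view)):
            view[i] = (view[i] + 1) % 256  # toy "acceleration": increment each byte
        view.release()
        self.interrupt.set()


def offload(device, payload, timeout=1.0):
    """Place payload in the shared buffer, ring the doorbell, await the interrupt."""
    if len(payload) > len(device.buffer):
        raise ValueError("payload larger than buffer")
    device.buffer[: len(payload)] = payload
    device.length = len(payload)
    device.doorbell.set()
    if not device.interrupt.wait(timeout):
        raise TimeoutError("device did not signal completion")
    return bytes(device.buffer[: len(payload)])


result = offload(LoopbackDevice(16), b"\x00\x01\xff")
```

Two properties of the model show up here. The device operates on the shared buffer through a `memoryview` rather than a private copy, which mirrors the goal of avoiding superfluous data copying. The host blocks on an event instead of polling, which mirrors interrupt-driven completion.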