The way we work with digital electronics is changing rapidly, and open source tools and languages are becoming increasingly important. Many tools exist to cover the design, management and verification of digital chip designs, but how can they best be combined to make use of their individual strengths? This learning experiment tries to find out by designing an example with Google's XLS High Level Synthesis tool, verifying it with the popular Python-based cocotb framework, packaging it as a reusable component with FuseSoC, and using EDAlize to run the design through different synthesis and simulation flows. Let's roll!
If you want to reproduce my findings, the following setup, or a similar one, should get you there quicker:
$ uname -a -> Linux Ubuntu20 5.15.0-67-generic #74~20.04.1-Ubuntu SMP Wed Feb 22 14:52:34 UTC 2023 x86_64 GNU/Linux
The OS runs in a VirtualBox v7.0 VM with 14 cores of a 12th Gen Intel(R) Core(TM) i7-12700H and 12 GB of LPDDR5 SDRAM.
Ubuntu 20.04 was chosen because it satisfies the intersection of the dependency requirements of the various tools used in the experiments.
Tool versions used in the experiments:
XLS: Built from source
FuseSoC v2.1
cocotb v1.7.2
Icarus Verilog v11.0
Modelsim or QuestaSim v2022.3
Xilinx Vivado v2022.2
Python v3.8.10
XLS implements a High Level Synthesis (HLS) toolchain that produces synthesizable designs (Verilog and SystemVerilog). The toolchain comes from Google, is fully open source and is under active development. Below, we look at the issues faced while installing XLS and take a quick glance at the available HLS front ends. Then we see how XLS is used in this learning experiment.
The installation instructions do not mention installing C++20 support; without it, the installation is unsuccessful, with tests failing. The commands to install the required support are:
$ sudo apt install gcc-10 gcc-10-base gcc-10-doc g++-10
$ sudo apt install libstdc++-10-dev libstdc++-10-doc
Reference: Installing C++17 and C++20 on Ubuntu and Amazon Linux - Duccio Marco Gasparri
The XLS build uses Bazel, a somewhat unusual build system with a slightly steep learning curve. One of the issues faced was getting the build to complete without running out of memory. This was solved by capping the JVM heap and limiting Bazel to 10 jobs.
$ bazel --host_jvm_args=-Xmx10g --host_jvm_args=-Xms512m test --sandbox_debug --jobs=10 -c opt -- //xls/...
Currently, XLS supports two high level language front ends, namely DSLX and XLSCC.
XLSCC: The function is written in C++ and then compiled to a Verilog or SystemVerilog module.
DSLX (used in this blog): This is a Domain Specific Language in XLS (DSLX) that mimics Rust, with a focus on supporting a hardware-oriented, dataflow-based programming methodology. The intuition here is that dataflow DSLs are a better fit for describing hardware than languages designed assuming von Neumann style computation.
Building XLS using Bazel with both DSLX and XLSCC support took almost 6 hours on my machine.
The next task was to find a suitable project to learn the DSLX workflow and to go all the way to programming the generated RTL onto the FPGA. A strong focus was placed on using stable and actively developed open source toolchains as much as possible. As a result, the LED to believe - Blinky project was chosen as the candidate for experimentation. The goal is to implement the blinky functionality in DSLX and use the XLS toolchain to compile it to Verilog. From there, the flow needs to support Verilog simulation, cocotb simulation and bitstream generation for the FPGA. The FPGA board at hand is the Nexys A7 from Digilent with a Xilinx Artix-7 100T, but this work can easily be extended to all the boards supported in the LED to believe - Blinky project. For bitstream generation for the target platform, of course, we rely on the Xilinx Vivado toolchain. To package all these tools and EDA-lize them, the best tool is undoubtedly FuseSoC. It is well documented and currently under active development.
Note that the blinky project was chosen just as a proof of concept; it does not in any way show the true potential of XLS for building deep dataflow pipelines.
The complete project is available at https://bitbucket.org/prajithrg/xls_blinky. Now let's look at the different pieces that make up the puzzle!
The goal here is to write a sequential process in DSLX that mimics the behaviour of the standard blinky Verilog file (see below) while using a minimal set of DSLX constructs to build the module:
XLS has "procs" (communicating sequential processes) for modeling sequential and stateful modules in DSLX. A proc contains:
A config function that initializes constant proc state and spawns any other dependent/child procs needed for execution.
A next function which is an infinitely looping function that contains the actual logic to be executed by the sequential process.
Procs communicate with the outside world using channels. These are entities into which data can be sent and from which data can be received. Each channel has a send and a receive endpoint: data inserted into a channel by a send op can be pulled out by a recv op. In our toy blinky example, the only output needed is the q state that drives the LED, so we require only one output Boolean channel, qOut: chan<bool> out.
Let's go through blinky_xls.x and check the key lines to understand the flow:
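To make the send/recv idea concrete, here is a minimal Python sketch of a channel as a simple FIFO. It is only a mental model: the Channel class and its method names are made up for illustration and are not part of XLS or DSLX.

```python
from collections import deque


class Channel:
    """Toy FIFO standing in for a DSLX channel (illustrative, not the XLS API)."""

    def __init__(self):
        self._fifo = deque()

    def send(self, value):
        # A send op inserts data at the send endpoint.
        self._fifo.append(value)

    def recv(self):
        # A recv op pulls out the oldest value that was sent.
        return self._fifo.popleft()


q_out = Channel()  # plays the role of qOut: chan<bool> out
for q in (False, True, False):
    q_out.send(q)  # the proc side sends the LED state

# The outside world receives the values in the order they were sent.
received = [q_out.recv() for _ in range(3)]
```

In the generated hardware there is no software queue, of course; the channel becomes a set of data and handshake signals.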
pub proc blinky <CLK_FRQ:u32, CLK_FRQ_BITS:u32=addr_width(CLK_FRQ)> defines a parameterized proc blinky with CLK_FRQ as the first parameter and CLK_FRQ_BITS as the second parameter which is derived from the first using the addr_width function.
qOut: chan<bool> out; declares the blinky proc member channel similar to class data members in software. In DSLX, proc members are constant values, and are set by the output of the config function. Member values can be referred to inside the next function in the same way as locally-declared data.
config( qOut: chan<bool> out) is the function which configures the qOut member channel and returns the declared member.
next(tok: token, state: blinkyState<CLK_FRQ_BITS>) is the real body of the blinky sequential process. The next function maintains and evolves the blinkyState and is responsible for communicating with the outside world, as well. In our toy blinky example, we do not have any input channels coming into the Blinky proc from other procs. What we have is the BlinkyState structure which is the recurrent state to maintain in the blinky sequential process as in the always block in the Verilog implementation.
The contents of the next function are self-explanatory: countState and qState are updated each clock cycle based on the previous state.
At the end of the proc, we terminate with the result value, which is the new BlinkyState. This final value becomes the input state for the next iteration. This is how recurrent state is managed by procs: a state value is provided to the next function, and the result of that function is used as the next iteration's state input.
The XLS documentation about procs mentions that procs can have several state elements, but this feature was not working in the installed version of XLS. So, instead of separate state elements, a BlinkyState structure combining countState and qState was defined and used. This worked fine!
There are three tools in the XLS chain to convert DSLX to Verilog:
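The state recurrence can be illustrated with a small Python model. This is a sketch under assumptions, not the DSLX source: it assumes the usual blinky behaviour (count up to CLK_FRQ - 1, then toggle q and restart the counter) and assumes addr_width behaves like a ceiling log2. The names next_state and run are hypothetical.

```python
from typing import NamedTuple


def addr_width(n: int) -> int:
    # Assumption: behaves like ceil(log2(n)), the bits needed to count 0..n-1.
    return max(1, (n - 1).bit_length())


class BlinkyState(NamedTuple):
    count: int  # stands in for countState
    q: bool     # stands in for qState


def next_state(state: BlinkyState, clk_frq: int) -> BlinkyState:
    """One iteration of the proc: take the old state, return the new one."""
    if state.count == clk_frq - 1:
        # Assumed blinky behaviour: toggle the LED and restart the counter.
        return BlinkyState(count=0, q=not state.q)
    return BlinkyState(count=state.count + 1, q=state.q)


def run(clk_frq: int, cycles: int) -> list:
    state = BlinkyState(count=0, q=False)
    trace = []
    for _ in range(cycles):
        trace.append(state.q)
        state = next_state(state, clk_frq)  # result feeds the next iteration
    return trace
```

With a clock frequency of 100000, addr_width gives 17 bits, which matches the 100000_17 suffix that shows up in the generated function name later in this post.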
ir_converter_main: To convert the DSL file to Intermediate Representation: $ ir_converter_main --top=blinky xls/blinky.x > blinky.ir
opt_main: To optimize the IR: $ opt_main blinky.ir > blinky.opt.ir
One issue with the opt_main tool is that, without the correct top function name, it optimized the code based entirely on the main function. To make the tool optimize correctly, the correct top name must be passed as a parameter. For example, in our case, if the experiment's clk_freq_hz is 100 kHz, the top function name emitted by ir_converter_main is __blinky__main__blinky_0__100000_17_next. Run: $ opt_main --top=__blinky__main__blinky_0__100000_17_next blinky.ir > blinky.opt.ir
codegen_main: To generate Verilog from the optimized IR: $ codegen_main blinky.opt.ir --generator=pipeline --delay_model=sky130 --output_verilog_path=blinky.v --module_name=blinky --use_system_verilog=false --pipeline_stages=1 --reset=rst --flop_inputs=false --flop_outputs=false
Please refer to the codegen documentation for details about the various options.
The Verilog file generated by XLS codegen for a clock frequency of 100 kHz looks like this:
As seen above, the XLS code generator introduces some extra signals. As mentioned before, the idea behind procs is to build communicating sequential modules in a dataflow fashion. Consequently, the code generator adds a lightweight handshaking mechanism (ready and valid signals) to control back pressure in the dataflow between modules. As our toy blinky example has just a single block, there is no communication with the outside world to synchronize per se, and these signals are hard-wired in the Verilog wrapper, the blinky top file, as below:
The Verilog testbench is the same as the one from the LED to believe - Blinky project and is used to show the functional correctness of the Verilog generated by the XLS compilation.
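The three steps can be chained from Python roughly as follows. This is a sketch, not the script from the repository: the helper names are hypothetical, the flags are exactly the ones shown above, and the top-name pattern is only inferred from the single 100 kHz example, so other XLS versions may mangle names differently.

```python
import subprocess


def top_name(clk_frq: int, clk_frq_bits: int) -> str:
    # Mirrors the mangled name quoted above for 100 kHz; treat the pattern
    # as an observation rather than a documented XLS naming rule.
    return f"__blinky__main__blinky_0__{clk_frq}_{clk_frq_bits}_next"


def xls_steps(dslx_path: str, clk_frq: int, clk_frq_bits: int):
    """Return (argv, stdout_file) pairs; stdout_file is None for codegen."""
    return [
        (["ir_converter_main", "--top=blinky", dslx_path], "blinky.ir"),
        (["opt_main", f"--top={top_name(clk_frq, clk_frq_bits)}", "blinky.ir"],
         "blinky.opt.ir"),
        (["codegen_main", "blinky.opt.ir",
          "--generator=pipeline", "--delay_model=sky130",
          "--output_verilog_path=blinky.v", "--module_name=blinky",
          "--use_system_verilog=false", "--pipeline_stages=1",
          "--reset=rst", "--flop_inputs=false", "--flop_outputs=false"], None),
    ]


def run_steps(steps) -> None:
    # Needs the XLS binaries on PATH; defined here but not called.
    for argv, stdout_file in steps:
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            # Equivalent of the shell redirection "> file" used above.
            with open(stdout_file, "w") as out:
                subprocess.run(argv, stdout=out, check=True)


steps = xls_steps("xls/blinky.x", 100_000, 17)
```

Keeping the command construction separate from the execution makes it easy to print or inspect the commands before running them.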
cocotb is another very popular and actively developed open source verification framework, based on Python. The goal here was to check how easy it really is to verify hardware designs with cocotb in the same way as software. Converting the Verilog blinky test to a Python cocotb testbench was a breeze, and the documentation is well done and easy to follow. One point to note is parameter passing to cocotb: FuseSoC takes care of this by first setting the parameter in the Verilog file, after which the instantiated dut object can be used to read the clock frequency in the cocotb file as dut.clk_freq_hz.value.
The goal here is to use a build/package manager so that a single command automates all the above steps (target flows), i.e., converting DSLX to Verilog, running the different simulations or burning the bitstream onto the FPGA. This is where FuseSoC comes in: it is a popular build system for digital hardware (e.g. Verilog or VHDL designs) and a package manager for reusable blocks in hardware designs. FuseSoC meets all our requirements: running tests against multiple simulators, porting designs to new targets and creating compile-time or run-time configurations. FuseSoC is well documented and easy to get started with. The main steps taken to get started with FuseSoC are described below and, as always, all files described here are available in the repository.
A core file is the fundamental component in FuseSoC which configures and packages the xls_blinky project. It contains the various configurations for the filesets, the target flows (Verilog simulation, cocotb simulation, FPGA build) and the XLS to Verilog Generator (more about this below). More about writing core files can be found here.
As we saw before, FuseSoC core files list files that are natively used by the backend, such as Verilog files. DSLX files, however, need to be converted to Verilog first. For this, FuseSoC provides Generators, a mechanism to generate core files on the fly during the FuseSoC build flow. In our case, we have the xls_verilog_gen Python script, which takes as input the:
various tools for converting DSLX to verilog,
location of the XLS file,
top level module name, and
clock frequency.
I could not find a way to set compile-time parameter values in a DSLX file using the existing XLS ir_converter_main toolchain. As a workaround, the generator Python script finds and replaces the parameter in the XLS file using the sed utility.
Once the <PARAM_CLK_FRQ_HZ> parameter is supplied, the three steps of the XLS to Verilog conversion are performed and the resulting blinky.v Verilog file is written to the generated FuseSoC source folder.
The above generator is automatically called in the various FuseSoC target flows.
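The find-and-replace step described above can also be done in plain Python instead of calling sed. The sketch below assumes that the <PARAM_CLK_FRQ_HZ> token appears literally in the DSLX template; the function name, file names and the one-line template are hypothetical and only serve the illustration.

```python
import tempfile
from pathlib import Path

PLACEHOLDER = "<PARAM_CLK_FRQ_HZ>"


def specialize(template: Path, out: Path, clk_freq_hz: int) -> None:
    """Write a copy of the template with the placeholder replaced."""
    text = template.read_text()
    if PLACEHOLDER not in text:
        # Fail loudly instead of silently generating an unparameterized file.
        raise ValueError(f"{PLACEHOLDER} not found in {template}")
    out.write_text(text.replace(PLACEHOLDER, str(clk_freq_hz)))


with tempfile.TemporaryDirectory() as tmp:
    # Made-up single-line template; the real DSLX file is in the repository.
    template = Path(tmp) / "blinky_template.x"
    template.write_text("const CLK_FRQ = u32:<PARAM_CLK_FRQ_HZ>;\n")
    generated = Path(tmp) / "blinky.x"
    specialize(template, generated, 100_000)
    result = generated.read_text()
```

Writing to a separate output file keeps the template untouched, so the generator can be re-run with a different clock frequency.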
The next step is to add the xls_blinky project library to FuseSoC to register the cores. This generates a fusesoc.conf file inside the cloned xls_blinky repository directory.
$ fusesoc library add <xls_blinky repo path> --sync-type=local
$ fusesoc run --target=sim_verilog fusesoc:xlsVerilogEg:blinky
$ MODULE=blinky_cocotb fusesoc run --target=sim_cocotb fusesoc:xlsVerilogEg:blinky
The core file has the nexys_a7 target which performs the synthesis, placing and routing, and the bitstream generation of the blinky design. This could be extended to support any board listed in the LED to believe - Blinky project. The Nexys A7 board device constraints are set in the blinky.xdc file for the clock, LED and the reset. The command for running the FPGA build and burning the bitstream is as below:
$ fusesoc run --target=nexys_a7 fusesoc:xlsVerilogEg:blinky
It was fun to see the level of automation that FuseSoC makes possible: going from DSLX all the way to the bitstream and seeing the LED blink on the Nexys A7 board by running just a single FuseSoC command!
That's it folks!