Zynq and AXI introduction

The Xilinx Zynq SoC is the main core of the Red Pitaya board, so here we present the main ideas behind this chip.

The Zynq device is a hybrid platform that combines two ARM processors and one FPGA in a single chip, enabling high transfer rates inside the device. In the Xilinx nomenclature the processor side is called the Processing System (PS) and the FPGA side is called the Programmable Logic (PL).


It is well known that FPGAs can perform many computations in far less time than a processor, but using an FPGA to solve a complex decision-making problem can be really hard. So one idea that immediately pops up is to use the processor for the decision making and the FPGA for the required computations, so we get the best of both worlds in these hybrid chips.


In order to use the two processors and the FPGA jointly we must define a communication protocol between the systems. The common answer is the Advanced eXtensible Interface (AXI) protocol, which is the bus protocol supported by ARM. Xilinx adopted this bus protocol and, starting from the 7 series, AXI became a Xilinx standard (if you look at the Xilinx IP cores, most of them have some sort of AXI interface).


The image below shows the basic interfaces between the PL and the PS. You can see that the PS has master ports (MGP0, MGP1) to control slaves on the FPGA side. There are also the High Performance ports (HP0, HP1), which are mapped to the DRAM of the processors; we mainly use these ports to perform Direct Memory Access (DMA) to the RAM of the processor from the FPGA.

There is also an ACP port, which is mapped into the ARM processor cache memory, and the SGP ports.

Zynq Architecture Diagram (Mohammad Sadri)

Basic AXI

The AXI protocol is a standard that covers different types of operations but always follows a master-slave architecture, where a master sends a request and the slave responds accordingly. The three flavors of AXI are AXI full, AXI-lite and AXI-stream. We are going to look at them separately.

Be warned: this is truly the most basic description of the AXI protocol and is only intended as an introduction. If you need to develop a custom AXI block you should look at the documentation.


AXI-stream:

This is the most basic AXI flavor, and it is intended to be used as an interface between streaming blocks, for example an ADC and an FFT, so you have probably encountered this protocol in some Xilinx Simulink blocks. Because of these properties it is the default protocol used in the IP cores that don't talk to the PS side directly.

The protocol has three basic signals: tdata, tready and tvalid. The protocol is really simple: when the slave is ready to receive data it raises the tready signal; on the other side, when the master wants to send data it raises the tvalid signal and puts the data on tdata. When tready and tvalid are both high in the same clock cycle, the transaction occurs.

That implies that if you want to create an AXI-stream master you have to hold the data on the tdata port until you receive the tready flag and only then move on to the next sample, which leads easily to a FIFO implementation. To be fair, in the typical streaming interface that we use we have a valid sample from the ADCs in each clock cycle, so it is enough to set tvalid=1 after the sync pulse arrives and then keep it high.
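To make the handshake concrete, here is a small cycle-by-cycle software model of an AXI-stream master backed by a FIFO. It is only a sketch and not HDL: the function name simulate_axis, the arrival times and the sample labels are made up for illustration.

```python
from collections import deque

def simulate_axis(arrivals, tready_pattern):
    """Toy model of an AXI-stream master that buffers samples in a FIFO.

    arrivals: dict mapping a clock cycle to the sample produced in that cycle.
    tready_pattern: list of booleans, the slave's tready for each clock cycle.
    Returns a list of (cycle, tdata) tuples, one per completed transfer.
    """
    fifo = deque()
    transfers = []
    for cycle, tready in enumerate(tready_pattern):
        # Master side: tvalid is high whenever the FIFO holds data,
        # and tdata is the sample at the head of the FIFO.
        tvalid = len(fifo) > 0
        if tvalid and tready:
            # Both flags high in the same cycle: the transfer happens.
            transfers.append((cycle, fifo.popleft()))
        # Otherwise the head sample simply stays in place (tdata is held).
        # A sample produced in this cycle becomes visible on the next one.
        if cycle in arrivals:
            fifo.append(arrivals[cycle])
    return transfers

# Hypothetical scenario: no new sample in cycle 2, slave stalls in cycles 5 and 6.
arrivals = {0: "P0", 1: "P1", 3: "P2", 4: "P3"}
tready = [True, True, True, True, True, False, False, True, True]
for cycle, value in simulate_axis(arrivals, tready):
    print(cycle, value)
```

In this scenario tvalid drops in cycle 3 because the FIFO is empty, and P3 is held from cycle 5 until the slave raises tready again in cycle 7.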


In some IP cores there is also an extra optional signal, tlast, which is set to 1 on the last sample of a packet to tell the slave that the packet has been delivered. So, for example, an 8192-point FFT will raise tlast on the final frequency channel.
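As a sketch of how a receiver can use tlast, the following model groups a stream of transfers into packets. The generator fft_stream is a hypothetical stand-in for an FFT core (it emits channel indices, not real spectra), and a small frame size is used to keep the example short.

```python
def fft_stream(n_frames, n_channels):
    """Hypothetical source: yields (tdata, tlast) for each transfer.

    tdata is just (frame, channel); tlast is True on the last channel.
    """
    for frame in range(n_frames):
        for channel in range(n_channels):
            yield (frame, channel), channel == n_channels - 1

def split_packets(beats):
    """Group (tdata, tlast) transfers into packets, closing one on each tlast."""
    packets, current = [], []
    for tdata, tlast in beats:
        current.append(tdata)
        if tlast:
            packets.append(current)
            current = []
    return packets

packets = split_packets(fft_stream(n_frames=2, n_channels=8))
print(len(packets), [len(p) for p in packets])
```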


There are more optional signals, but these four are the most typical ones that you will encounter.

AXI stream example: note carefully that the master pulls down the tvalid flag in the middle, after the P1 value, and raises it again when it has the P2 value. Also see that the master holds the P3 value until the slave sets tready to 1, and only then continues with the next data sample. ( axi-ref )

AXI FULL:

The AXI full interface consists of five channels that enable burst write and burst read transactions independently, so we can write and read at the same time.

Each device has its own address range in a memory-mapped fashion, so the master can send commands to a slave by writing to the right address. Basically it is an enhanced memory access interface that allows multiple reads and writes in one command and handles errors, among other features.

The five channels of the AXI full interface are the read address channel, read data channel, write address channel, write data channel and write response channel. Each channel is composed of several signals that define the type of transaction.

Just to give a taste of the protocol, we are going to briefly describe the main signals of the write channels.


The main signals of the write address channel are:

  • AWADDR: Write address; the base address of the burst transfer.
  • AWLEN: Burst length; the number of transfers (or beats) in the burst write, encoded as the number of beats minus one.
  • AWSIZE: Size of each beat, encoded as log2 of the number of bytes (for example, 3'b010 for 32-bit transfers).
  • AWBURST: Type of the burst; the typical configuration is incremental, so we write the range from base_addr to base_addr + beats*bytes_per_beat.
  • AWVALID: Indicates that the address and control information is valid, just like tvalid in AXI stream.
  • AWREADY: The slave indicates that it is ready to receive a write request.
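As a worked example of the address arithmetic in an incremental burst, the sketch below lists the address of every beat. It assumes the encodings of the AXI specification (AWLEN holds the number of beats minus one, AWSIZE holds log2 of the bytes per beat) and a start address aligned to the beat size; the function name and the base address are made up.

```python
def incr_burst_addresses(awaddr, awlen, awsize):
    """Return the address of each beat of an incremental (INCR) burst.

    awaddr: base address, assumed aligned to the beat size.
    awlen:  encoded burst length (number of beats minus one).
    awsize: encoded beat size (log2 of the bytes per beat).
    """
    beats = awlen + 1
    bytes_per_beat = 1 << awsize
    return [awaddr + i * bytes_per_beat for i in range(beats)]

# Hypothetical burst: 4 beats of 32 bits starting at 0x40000000.
for addr in incr_burst_addresses(0x4000_0000, awlen=3, awsize=2):
    print(hex(addr))
```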


Next are the main signals of the write data channel:

  • WDATA: Data to be written.
  • WSTRB: Write strobe. It is a way to mask the data in WDATA, with one strobe bit per byte. For example, if we are transferring 32 bits per beat, the strobe is 4 bits wide and the value 4'b1010 means that we are enabling the write of the second and fourth bytes of the beat.
  • WLAST: High while the last beat is being transferred.
  • WVALID: The data in WDATA is valid.
  • WREADY: The slave is ready to receive data.
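The effect of the write strobe can be sketched in software as a per-byte merge between the word already stored in the slave and WDATA. The helper apply_wstrb is a hypothetical name, and byte 0 is taken to be the least significant byte of the word.

```python
def apply_wstrb(old_word, wdata, wstrb, n_bytes=4):
    """Merge wdata into old_word, byte by byte, where the strobe bit is set.

    Bit i of wstrb enables byte i, with byte 0 the least significant one.
    """
    result = old_word
    for byte in range(n_bytes):
        if (wstrb >> byte) & 1:
            mask = 0xFF << (8 * byte)
            result = (result & ~mask) | (wdata & mask)
    return result

# Strobe 4'b1010: only the second and fourth bytes are overwritten.
print(hex(apply_wstrb(0x11223344, 0xAABBCCDD, 0b1010)))
```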


And the main signals of the response channel are:

  • BRESP: The status of the transaction, for example fail, success, etc.
  • BVALID: The value in BRESP is valid (even if the transaction fails).
  • BREADY: The Master is ready to receive the response.
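For reference, BRESP is a 2-bit code, and the four values defined by the AXI specification can be written down as an enumeration. The helper write_succeeded is only an illustrative name for how a master might interpret the code.

```python
from enum import IntEnum

class Resp(IntEnum):
    """The four response codes carried on BRESP (and on RRESP for reads)."""
    OKAY = 0b00    # normal access succeeded
    EXOKAY = 0b01  # exclusive access succeeded
    SLVERR = 0b10  # the slave signalled an error
    DECERR = 0b11  # decode error: no slave at that address

def write_succeeded(bresp):
    """Return True when the 2-bit BRESP value reports a successful write."""
    return Resp(bresp) in (Resp.OKAY, Resp.EXOKAY)

print(Resp(0b10).name, write_succeeded(0b10))
```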


Those are only the main ones; you are invited to look at the documentation for a full description of each signal and to discover the others.

The read channels are similar to the write ones; for example, ARADDR encodes the base address to read and RDATA carries the data that was read.


AXI-INTERCONNECT:

The last main character of this story is the AXI interconnect. This module arbitrates the requests: its job is to route each request to the right receiver. It also has several features such as handling transaction errors, deciding the priority of the transactions and crossing between different clock domains. So it's where all the magic occurs.

Xilinx provides an implementation of the AXI interconnect, so if you make a custom design you only have to ensure that it talks AXI and connect it to the AXI interconnect, and then you are able to send and receive requests from the PS.
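The routing part of the interconnect can be pictured as a lookup in an address map. The sketch below is a software analogy only; the slave names, base addresses and sizes are invented, and in a real design they come from the address editor of your project.

```python
# Hypothetical address map: (slave name, base address, size in bytes).
ADDRESS_MAP = [
    ("gpio_regs", 0x4000_0000, 0x1000),
    ("dsp_core", 0x4001_0000, 0x1_0000),
]

def decode(addr, address_map=ADDRESS_MAP):
    """Return (slave name, offset inside the slave) for a given address.

    Returns None when no slave claims the address; a real interconnect
    typically answers such a request with a decode error response.
    """
    for name, base, size in address_map:
        if base <= addr < base + size:
            return name, addr - base
    return None

print(decode(0x4000_0004))
print(decode(0x5000_0000))
```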


Vivado block design example: on the left there is the Zynq PS connected as a master to the AXI interconnect, with two devices connected as slaves. In this configuration the PS can read from and write to the slaves.

AXI-lite:

This is also a memory-mapped protocol, but it does not support burst transactions. It is intended for registers that can be written; for example, if you want to write an enable or a reset from the PS you don't want to implement the AXI full protocol just to target one address. Instead you can use AXI-lite.

The blocks that handle AXI-lite should be connected to an AXI interconnect to be accessible to the master via the usual addressing.

In the CASPER toolflow you can think of AXI-lite as the form that the yellow registers of the Simulink interface take.
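Seen from the master, an AXI-lite slave behaves like a small bank of registers where each access touches exactly one word. The class below is a rough software picture of that idea, assuming 32-bit registers at 4-byte aligned offsets; the class name and the register layout are hypothetical.

```python
class AxiLiteRegs:
    """Toy model of an AXI-lite slave: a bank of 32-bit registers."""

    def __init__(self, n_regs):
        self.regs = [0] * n_regs

    def _index(self, offset):
        # One register per 4-byte word; reject unaligned or out-of-range offsets.
        if offset % 4 != 0 or not 0 <= offset < 4 * len(self.regs):
            raise ValueError(f"bad register offset: {offset:#x}")
        return offset // 4

    def write(self, offset, value):
        """Single write, no burst: one address, one 32-bit word."""
        self.regs[self._index(offset)] = value & 0xFFFFFFFF

    def read(self, offset):
        """Single read of one 32-bit word."""
        return self.regs[self._index(offset)]

# Hypothetical layout: offset 0x0 is an enable register, 0x4 a reset register.
regs = AxiLiteRegs(n_regs=4)
regs.write(0x0, 1)
print(regs.read(0x0), regs.read(0x4))
```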


About the Zynq PS

The Zynq PS consists of two ARM Cortex-A9 cores, among other components. There are two main ways to use the PS: the first is to use it in a bare-metal fashion and the second is to run an OS on the processors.

The Red Pitaya comes with a Linux image loaded on the SD card, but you can build your own Linux image to support custom devices made inside the PL. For example, we could write a driver to handle interrupts triggered by the PL into the PS at the kernel level. The most typical reason to build a custom bootable Linux image is to reserve physical memory addresses to use them with a DMA in the PL.

The advantage of running an OS on the PS is that you immediately have a bunch of drivers that come with Linux, like USB, Ethernet, etc., and you can install things with the package manager. For example, the Red Pitaya has Python installed and runs an HTTP server.

About installing things, keep one thing in mind: the PS has limited resources, so everything you run takes resources away from the critical tasks of the system.

When you run it in a bare-metal way you have to use the Xilinx Software Development Kit (XSDK), where you program the actual behavior of the system in C. It is literally like programming any microcontroller, so you have deeper control of what is being done inside the processor, but you also lose the nice interface to some peripherals, like the Ethernet connection.

The nice thing about programming in the bare-metal way is that you have the whole physical address space free to play with, and I personally found it nice to debug.

ROACH digression:

The ROACH uses the old Xilinx tools, so even though there are some IPs that support AXI, it is not the usual case.

The Virtex devices use the CoreConnect standard. So instead of the AXI bus the ROACH uses the On-chip Peripheral Bus (OPB) to connect the FPGA to the PowerPC, where the registers and BRAMs are mapped in a similar way to what AXI does.

If you are curious, go to a compiled model folder and take a look at the generated core_info.m file. This file lists all the devices accessible by the PPC and their addresses. Now get into the BORPH system using the default telnet connection (not the katcp one), upload the model and issue

kcpcmd read <addr> 0 4

Here you have to replace <addr> with one of the addresses in core_info; then you should be able to read the value stored at that address.

This kcpcmd is a custom command made by the katcp developers, but you can also access the device with a simple C program. As in all Linux-based systems, devices get mapped into some sort of file (e.g. tty); the FPGA gets mapped into the "/dev/roach/mem" file.

To access this file you can use the C function "mmap", which lets you map some portion of the file; for our purpose this portion is the memory that you are accessing. To build the executable you have to cross-compile using the command

powerpc-linux-gnu-gcc -static -o <name> <target_file.c>

The -static flag is important because the PowerPC doesn't have the typical libraries (if I remember right it doesn't have stdio).

To send the program to the PowerPC you can use the Linux command nc.


Maybe here you will discover the horror that one register isn't mapped into a single position in memory but takes several (0xFF if I am not wrong). If you change the first address of this part of the memory you should modify the value of the register in the FPGA. The remaining addresses are there because certain memory alignments have to be kept. So make your registers worth it!


A tip about the mmap function: the length to map should be a multiple of the page size (4096), or you could encounter the dreaded segmentation fault.

Another tip is to take care with how you name your CASPER yellow blocks; if you choose the names wisely you can end up with all your BRAMs next to each other, making it possible to read all of them with a single mmap call.

Sample code is attached at the bottom of the page for you to poke around.

The final comment about the ROACH is that the installation of the CASPER environment assumes that you use the Xilinx Embedded Edition. This edition (among other things) lets you use the Xilinx Platform Studio program, which is the tool intended for making the bus connections between a processor and the FPGA. You can launch this program by issuing:
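The page alignment rule can be tried out on a desktop machine before touching the hardware. The sketch below uses Python's mmap module on an ordinary temporary file instead of the real device file, so it only illustrates the arithmetic: the map starts at a page boundary, spans whole pages, and the word is picked at the remaining offset. The function names, the big-endian word format and the test value are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

def read_word(path, byte_offset, page_size=mmap.ALLOCATIONGRANULARITY):
    """Read one 32-bit big-endian word from a file through mmap.

    byte_offset is assumed to be 4-byte aligned. The mapping starts at the
    page boundary below byte_offset and covers exactly one page.
    """
    base = (byte_offset // page_size) * page_size  # round down to a page
    inner = byte_offset - base                     # position inside the page
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), page_size,
                       access=mmap.ACCESS_READ, offset=base) as m:
            return struct.unpack(">I", m[inner:inner + 4])[0]

def demo():
    """Build a two-page stand-in file, store a word in it and read it back."""
    page = mmap.ALLOCATIONGRANULARITY
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(bytes(2 * page))                  # two pages of zeros
        f.seek(page + 8)
        f.write(struct.pack(">I", 0xDEADBEEF))    # word in the second page
        path = f.name
    try:
        return read_word(path, page + 8)
    finally:
        os.remove(path)

print(hex(demo()))
```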

source /opt/Xilinx/14.7/ISE_DS/settings64.sh
xps

Now if you go to a compiled design you can find an XPS project and open it to take a look at the interfaces.