DE0-Nano | NiosII LCD driver

Here's how I hooked up my LCD display to the NiosII from the previous tutorial. Part of the reason I upgraded the memory to 32MB is to have enough headroom for frame buffers. The PSP screen is quite high-res (for a microcontroller) and has 24-bit color. To see what it can do I'd like to use the full color range. That requires 4 bytes per pixel (due to memory alignment; I'll get to that later), so a full frame takes 480*272*4 = 522240 bytes, or 510KB. Double-buffered drawing needs twice that.
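A quick sanity check of the frame-buffer numbers, written as plain C that runs anywhere. The names are made up for this sketch and are not part of the project:

```c
#include <stddef.h>

/* Panel geometry from the text: 480x272 pixels, each stored as one
 * 32-bit word (three color bytes plus one padding byte). */
enum { LCD_WIDTH = 480, LCD_HEIGHT = 272, BYTES_PER_PIXEL = 4 };

/* Bytes needed for one frame at the given geometry. */
static size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Bytes needed for n_frames full frames (2 for double buffering). */
static size_t buffers_bytes(size_t n_frames)
{
    return n_frames * frame_bytes(LCD_WIDTH, LCD_HEIGHT, BYTES_PER_PIXEL);
}
```

With these values one frame is 522240 bytes (exactly 510 KiB) and a double-buffered pair is 1044480 bytes. That fits comfortably inside the 32MB SDRAM.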

I'll go "quickly" through the steps of:
  • adding a Video Sync Generator core to the NiosII, which will interface with the screen and generate all the necessary signals
  • adding a 9 MHz video clock to the existing PLL component
  • adding a Scatter-Gather DMA Controller core that will feed the Sync Generator with data
  • adding a few bits and bobs to glue it all together
  • and then writing a few lines of code to test the screen


So the basis for this new extension is going to be the previously built NiosII with SDRAM project.
Copy the folder and rename it.
Then open the project from the new folder in QuartusII and fire up SOPC builder.

Scatter-Gather DMA Controller

  • Add the Scatter-Gather DMA Controller (Bridges and Adapters>DMA>Scatter-Gather DMA Controller).
    • change the Transfer mode to Memory To Stream.
    • leave everything else on defaults and click Finish.
  • rename it to sgdma.
    You'll see a lot of complaints about missing connections.
  • connect sgdma.descriptor_read to onchip_memory2.s1 by right-clicking the signal descriptor_read (listed under the sgdma component) and selecting sgdma.descriptor_read Connections>onchip_memory2.s1 from the pop-up menu
  • do the same for sgdma.descriptor_write (sgdma.descriptor_write Connections>onchip_memory2.s1)
  • connect sgdma.m_read to sdram.s1
    Ignore the error about the missing out connection for now; we'll connect that in the next step.
  • if not already connected automatically, connect sgdma.csr to cpu.data_master

Avalon-ST Dual Clock FIFO

To cross the two clock domains (50 MHz RAM, 9 MHz display pixel clock) and to smooth out potential memory access collisions between the DMA and the CPU, we add a dual-clock FIFO buffer.
  • Add an Avalon-ST Dual Clock FIFO (Memories and Memory Controllers>On-Chip>Avalon-ST Dual Clock FIFO)
    • change the Symbols per beat to 4
    • set the FIFO depth to a value of your liking. I set mine to an overkill of 512. The larger the buffer, the longer the data traffic jams it can ride out, but this is on-chip memory and it's quite limited.
    • enable Use packets
    • click Finish
  • rename the component to dc_fifo
  • connect sgdma.out to

Pixel Converter

Now, the Video Sync Generator expects to be fed triplets of color bytes: one 24-bit symbol per pixel. The DMA controller, however, reads data in 4-byte chunks, which means there is one unused byte in every 32-bit word.
The Pixel Converter core simply strips off that byte.
  • add a Pixel Converter component (Peripherals>Display>Pixel Converter (BGR0->BGR))
    • change the Source symbols per beat to 1
    • click Finish
  • and rename it to pixel_converter
  • connect dc_fifo.out to
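To make the byte dropping concrete, here is a small software model of what the Pixel Converter does in hardware. It is only an illustration: the function names are hypothetical, and the layout (blue in the lowest byte, padding in the highest) is my assumption for the little-endian Nios II, so check it against how your colors actually come out on the screen.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit channels into one 32-bit frame-buffer word.
 * Assumed layout: blue in bits 0..7, green in 8..15, red in 16..23,
 * bits 24..31 unused (the padding byte). */
static uint32_t pack_bgr0(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint32_t)b | ((uint32_t)g << 8) | ((uint32_t)r << 16);
}

/* Software model of BGR0->BGR: keep the three color bytes of every
 * 32-bit word and drop the padding byte, producing a packed
 * 3-bytes-per-pixel stream in out[] (must hold 3 * n_pixels bytes). */
static void bgr0_to_bgr(const uint32_t *in, size_t n_pixels, uint8_t *out)
{
    for (size_t i = 0; i < n_pixels; ++i) {
        out[3 * i + 0] = (uint8_t)(in[i] & 0xFFu);         /* blue  */
        out[3 * i + 1] = (uint8_t)((in[i] >> 8) & 0xFFu);  /* green */
        out[3 * i + 2] = (uint8_t)((in[i] >> 16) & 0xFFu); /* red   */
        /* bits 24..31 are discarded */
    }
}
```

For example, pack_bgr0(0x11, 0x22, 0x33) gives the word 0x00112233, which the model turns into the three bytes 0x33, 0x22, 0x11.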

Video Sync Generator

And finally we get to the Sync Generator which reads the data stream and presents the color signals to the display while also generating various sync signals.
  • add a Video Sync Generator (Peripherals>Display>Video Sync Generator)
    • change the Data Stream Bit Width to 24
    • Beats per Pixel to 1
    • Number of Columns to 480
    • Number of Rows to 272
    • Horizontal Blank Pixels to 43
    • Horizontal Front Porch Pixels to 2
    • Horizontal Sync Pulse Pixels to 41
    • leave the Horizontal Sync Pulse Polarity at 0
    • Vertical Blank Lines to 12
    • Vertical Front Porch Lines to 2
    • Vertical Sync Pulse Lines to 10
    • Vertical Sync Pulse Polarity to 0
    • Total Horizontal Scan Pixels to 525
    • and Total Vertical Scan Lines to 286
      (Most of these values can be found in Sharp's datasheet for the screen)
    • click Finish
  • rename it to video_sync_generator
  • connect pixel_converter.out to
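The two "Total" values are not independent of the others. With the numbers above, each total is simply active + blank + front porch (480 + 43 + 2 = 525 and 272 + 12 + 2 = 286). Here is a tiny C check of that bookkeeping. The struct is hypothetical and just mirrors the wizard fields, and the relation is read off these particular values:

```c
/* One axis of the Video Sync Generator settings (hypothetical struct). */
struct axis_timing {
    int active;       /* Number of Columns / Rows               */
    int blank;        /* Horizontal Blank Pixels / Vertical Blank Lines */
    int front_porch;  /* Front Porch Pixels / Lines             */
};

/* Total scan length of one axis: visible part plus blanking. */
static int total_scan(struct axis_timing t)
{
    return t.active + t.blank + t.front_porch;
}

/* Values entered in the wizard above. */
static const struct axis_timing H_TIMING = { 480, 43, 2 };
static const struct axis_timing V_TIMING = { 272, 12, 2 };
```

If you adapt these settings for a different panel, checking that the totals still add up is a cheap step before regenerating the core.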

Adding a video clock

The video sync generator and its inputs require a clock that governs the refresh rate of the output screen: a pixel clock. Including front porches and blank periods, we need to tick off 525*286 = 150150 pixels 60 times a second (50Hz works too, but lower rates will start to flicker). That is 525*286*60 = 9009000 Hz, a pixel clock of a little over 9 MHz. We have a lot of unused clocks left over in our PLL module, so let's use one for this.
  • select the shift_clock module and click Edit to open the wizard again
  • click Next on the next few screens until you get to tab 3) Output Clocks
  • leave clock c0 as it is (it's still driving the SDRAM) and click Next to get to clock c1
  • tick the checkbox Use this clock
  • select the Enter output frequency radio button and set the frequency to 9 MHz
  • leave everything else on defaults and click Finish (twice)
    Up top, back in SOPC builder in the Clock Settings box, our new 9 MHz clock signal should now appear as shift_clk_c1.
  • now we have to tell the pixel-clock-components to use this clock instead of the default clk_50
    • tell dc_fifo.out to use shift_clk_c1 by clicking on the Clock column next to it that still says clk_50.
      A selection list will pop up from which you can select the new clock source shift_clk_c1.
    • set also to shift_clk_c1
    • set to shift_clk_c1
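The pixel-clock arithmetic from the start of the video clock section, as a small sketch with hypothetical helper names. It also shows what an exact 9 MHz clock gives you: about 59.94 Hz instead of 60 Hz, close enough for this purpose.

```c
/* Pixel clock needed to scan total_h * total_v pixels refresh_hz times a second. */
static long pixel_clock_hz(long total_h, long total_v, long refresh_hz)
{
    return total_h * total_v * refresh_hz;
}

/* Refresh rate actually produced by a given pixel clock. */
static double refresh_rate_hz(double pixel_clock, long total_h, long total_v)
{
    return pixel_clock / (double)(total_h * total_v);
}
```

For 60 Hz this gives 9009000 Hz, and for 50 Hz 7507500 Hz.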

Wrapping up

Before we leave SOPC builder and generate the new core, let's switch the cpu back to using onchip_memory2 for its program memory. Compared to on-chip memory, the SDRAM is actually quite slow, and on top of servicing the display continuously we are also going to hammer it constantly with draw calls. So it's a good idea to run the program from somewhere else (if possible).
  • select the cpu and click Edit to open the wizard
  • set the Reset Vector back to onchip_memory2 with Offset 0x0
  • and set the Exception Vector to onchip_memory2 with Offset 0x20
  • click Finish
Our new core should now look like this:

  • click Generate to build the core and exit SOPC builder. Save the file when requested.
  • back in QuartusII, recompile the design

Updating the symbol and pin assignments

Now the following pin assignments obviously depend on how you connected the display to your DE0-Nano board. As an example I'll list the connections I used for my prototype board.
  • double click the mynios2.bdf file in the project navigator to open it in the schematic editor
  • to avoid creating duplicate pins, select all pins and wires around the symbol and delete them
  • right click the Nios symbol and select Update Symbol or Block (select any of the options presented)
    the new signals from the Video Sync Generator will appear in the symbol

  • right click the symbol again and select Generate Pins for Symbol Ports
  • Save the file and recompile
Next, just as before, assign the pins of the Video Sync Generator either with the Pin Planner or by editing the mynios2.qsf file (here are my assignments). Note that we are salvaging 2 pins from the LED port for the DISP signal and the backlight. Make sure to disconnect PIN_R8 from out_port_from_the_pio_led[7] and PIN_L3 from out_port_from_the_pio_led[6].

Nios Port                                   Screen Pin   Pin Assignment
shift_clk_c1_out                            CLK          D9
HD_from_the_video_sync_generator            HSYNC        B11
VD_from_the_video_sync_generator            VSYNC        D11
out_port_from_the_pio_led[6]                Backlight    B12
out_port_from_the_pio_led[7]                DISP         E10
RGB_OUT_from_the_video_sync_generator[0]    RED0         D2
RGB_OUT_from_the_video_sync_generator[1]    RED1         C3
RGB_OUT_from_the_video_sync_generator[2]    RED2         A2
RGB_OUT_from_the_video_sync_generator[3]    RED3         A3
RGB_OUT_from_the_video_sync_generator[4]    RED4         B3
RGB_OUT_from_the_video_sync_generator[5]    RED5         B4
RGB_OUT_from_the_video_sync_generator[6]    RED6         A4
RGB_OUT_from_the_video_sync_generator[7]    RED7         B5
RGB_OUT_from_the_video_sync_generator[8]    GREEN0       A5
RGB_OUT_from_the_video_sync_generator[9]    GREEN1       D5
RGB_OUT_from_the_video_sync_generator[10]   GREEN2       B6
RGB_OUT_from_the_video_sync_generator[11]   GREEN3       A6
RGB_OUT_from_the_video_sync_generator[12]   GREEN4       B7
RGB_OUT_from_the_video_sync_generator[13]   GREEN5       D6
RGB_OUT_from_the_video_sync_generator[14]   GREEN6       A7
RGB_OUT_from_the_video_sync_generator[15]   GREEN7       C6
RGB_OUT_from_the_video_sync_generator[16]   BLUE0        C8
RGB_OUT_from_the_video_sync_generator[17]   BLUE1        E6
RGB_OUT_from_the_video_sync_generator[18]   BLUE2        E7
RGB_OUT_from_the_video_sync_generator[19]   BLUE3        D8
RGB_OUT_from_the_video_sync_generator[20]   BLUE4        E8
RGB_OUT_from_the_video_sync_generator[21]   BLUE5        F8
RGB_OUT_from_the_video_sync_generator[22]   BLUE6        F9
RGB_OUT_from_the_video_sync_generator[23]   BLUE7        E9

  • recompile the design
  • and upload it to the board using the Programmer



  • find your project folder on disk and delete the contents of the software subfolder as well as the .metadata folder
  • then fire up NiosII Software Build Tools for Eclipse and switch the workspace to your project folder. The workspace should open up completely empty
  • select File > New > NiosII Application and BSP from Template
  • select your DE0_NANO_SOPC.sopcinfo file as target hardware and call the project display_test
    click Finish
    The hello_world project (display_test) and the BSP project are created for you
  • right click the BSP project and select NiosII > Bsp Editor...
    • in the Bsp Editor under the Main tab in Settings/Hal tick the box enable_reduced_device_drivers
    • also tick enable_small_c_library
    • go to the Linker Script tab and under Linker Section Mappings switch all sections (.bss, .heap, etc.) to onchip_memory2
    • click Generate and Exit to close the Editor
  • select Project > Build All
  • right click the display_test project and select Run As > NiosII Hardware
    the demo message should appear in the Nios Console
Hello from Nios II!


Now that we know that the core is functional we can initialize the DMA controller to send some data to the display.
We need to build a data structure in memory that tells the DMA controller which memory we want copied where. This structure is called a descriptor. Its length field is only 16 bits wide, so a single descriptor can transfer at most 65535 bytes (just under 64KB). That means we need to chain a bunch of them together to transfer the entire frame buffer (480*272*4 = 522240 bytes): seven descriptors of 0xfffc (65532) bytes, the largest multiple of 4 below 64KB, plus one of 0xf81c (63516) bytes.
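The descriptor arithmetic can be written down as a small helper. This is a hypothetical plain-C sketch, not part of the Altera HAL. It splits a buffer into lengths of at most 0xfffc bytes, which reproduces the 7 * 0xfffc + 0xf81c split:

```c
#include <stddef.h>

/* Largest descriptor length that is still a multiple of 4 and fits the
 * 16-bit length field (max 65535), so every chunk stays word-aligned. */
#define MAX_CHUNK 0xfffcu

/* Split total_bytes into descriptor-sized chunks. Writes up to max_chunks
 * lengths into sizes[] and returns how many chunks are needed (this can
 * exceed max_chunks if the array is too small). */
static size_t split_into_chunks(size_t total_bytes, unsigned sizes[], size_t max_chunks)
{
    size_t n = 0;
    while (total_bytes > 0) {
        unsigned len = total_bytes > MAX_CHUNK ? MAX_CHUNK : (unsigned)total_bytes;
        if (n < max_chunks)
            sizes[n] = len;
        ++n;
        total_bytes -= len;
    }
    return n;
}
```

For the 522240-byte frame this yields eight chunks: seven of 0xfffc bytes and a final one of 0xf81c bytes.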

Here's a shortened version of the code with comments:

#include "altera_avalon_sgdma.h"

// DMA descriptors
alt_sgdma_descriptor dmaDescA[8];
alt_sgdma_descriptor dmaDescEND;

// this subroutine initializes a chain of descriptors
void init_framebuffer(alt_sgdma_dev *dma)
{
    // 480*272 pixels * 4 bytes = 522240 bytes
    //   65532 (0xfffc) bytes * 7 = 458724
    //  +63516 (0xf81c) bytes

    // frame buffer A
    alt_u8* buff = (alt_u8*)frameBufferA;
    int i;
    for(i = 0; i < 8; ++i) {
        alt_u16 size = (i<7)?0xfffc:0xf81c;
                (i<7) ? (&dmaDescA[i+1]) : &dmaDescEND,
                size, 0, i==0, i==7, 0);
        buff += size;
    }
}

int main()
{
    // initialize the DMA and get a device handle
    alt_sgdma_dev *dma = alt_avalon_sgdma_open("/dev/sgdma");

    // now we just continuously copy framebufferA to the Video Sync Generator
    while(1) {
        // init the descriptors (the DMA clears their OWNED_BY_HW bits after each transfer)
        init_framebuffer(dma);

        // transfer
        alt_avalon_sgdma_do_sync_transfer(dma, dmaDescA);
    }
}

The complete code is here: hello_world.c
This will effectively display the contents of the (uninitialized) frame buffer memory on screen, which is usually colorful garbage. Almost there!

Interrupt Service Routine

Now, our cpu is not going to be very useful if we keep it busy transferring data to the display. That's what the DMA controller is for, and it can do this independently of the cpu. In fact, it can be set up to run indefinitely in a loop without any further cpu intervention. However, it can be very useful to be notified when a frame transfer has finished, for example to synchronize a program to the refresh rate or to switch the currently displayed memory to a different frame buffer.
This can be done with an interrupt service routine that's registered with the DMA controller and is called every time a descriptor chain has completed its transfer.
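As an example of what such a callback can be used for, here is a hypothetical sketch of double-buffering bookkeeping (plain C, with names I made up). The drawing code requests a swap when its back buffer is finished, and the end-of-frame callback performs the swap, so the displayed buffer only changes between frames. Actually pointing the descriptor chain at the new front buffer and restarting the DMA would also go into the HAL-based callback and is not shown here.

```c
#include <stdint.h>

/* State shared between the drawing code and the end-of-frame callback.
 * Fields touched by both are volatile because the callback runs in
 * interrupt context. */
typedef struct {
    uint32_t *front;             /* buffer the DMA is scanning out        */
    uint32_t *back;              /* buffer the CPU is drawing into        */
    volatile int swap_requested; /* set by drawing code, cleared below    */
    volatile unsigned frames;    /* completed frames, for vsync-style waits */
} frame_state;

/* Call from the end-of-chain callback, i.e. between two frames. */
static void on_frame_done(frame_state *s)
{
    if (s->swap_requested) {
        uint32_t *tmp = s->front;
        s->front = s->back;
        s->back = tmp;
        s->swap_requested = 0;
    }
    s->frames++;
}

/* Call from the drawing code once the back buffer is complete. */
static void request_swap(frame_state *s)
{
    s->swap_requested = 1;
}
```

Counting frames in the same callback also gives you a simple way to pace the drawing loop to the refresh rate: remember frames, then wait until it changes.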

Here's a shortened version of the program, slightly modified for asynchronous DMA transfer, with an interrupt service routine and some drawing routines.

#include <stdlib.h>
#include "system.h"
#include "altera_avalon_pio_regs.h"
#include "altera_avalon_sgdma.h"

// The interrupt service routine (actually a callback function called by the ISR)
void my_dma_callback(void *data)
{
    // set the OWNED_BY_HW bit in every descriptor again (the DMA clears it
    // once a descriptor has been processed) so the chain can be reused
    int i;
    for(i = 0; i < 8; ++i)
        dmaDescA[i].control |= 

    // trigger another transfer all over again
    alt_avalon_sgdma_do_async_transfer((alt_sgdma_dev*)data, dmaDescA);
}

// this subroutine initializes a chain of descriptors, registers the
// interrupt service routine and starts the first asynchronous transfer
void init_and_start_framebuffer(alt_sgdma_dev *dma)
{
    // 480*272 pixels * 4 bytes = 522240 bytes
    //   65532 (0xfffc) bytes * 7 = 458724
    //  +63516 (0xf81c) bytes

    // frame buffer A
    alt_u8* buff = (alt_u8*)frameBufferA;
    int i;
    for(i = 0; i < 8; ++i) {
        alt_u16 size = (i<7)?0xfffc:0xf81c;
                (i<7) ? (&dmaDescA[i+1]) : &dmaDescEND,
                size, 0, i==0, i==7, 0);
        buff += size;
    }

    // register our interrupt routine
            dma, my_dma_callback, 
    // initiate the transfer
    alt_avalon_sgdma_do_async_transfer(dma, dmaDescA);
}

// basic drawing routines --------------------------------------
//
// draw a pixel
static inline void setPix(const int x, const int y, const Color col, alt_u32* buffer)
{
    buffer[x + y * 480] = col.color32;
}

// bresenham line drawing
void line(alt_u32* buffer, int x0, int y0, int x1, int y1, Color color)
{
    ...
}
// fill the screen with a fractal (quite slow)
void MandelBrot(alt_u32* buffer)
{
    ...
}

int main()
{
    // switch on the backlight (pio_led bit 6)
    //
    IOWR_ALTERA_AVALON_PIO_DATA(PIO_LED_BASE, 64);

    // initialize the DMA and get a device handle
    //
    alt_sgdma_dev *dma = alt_avalon_sgdma_open("/dev/sgdma");

    // assert the DISP signal (pio_led bit 7), keeping the backlight on
    //
    IOWR_ALTERA_AVALON_PIO_DATA(PIO_LED_BASE, 128+64);

    // start the DMA
    init_and_start_framebuffer(dma);

    // now we actually have some time to draw something
    //
    Particle particles[2] = { {100, 0, -1, -1}, {0, 120, -1, -1} };
    Color col;
    int count;

    for(count = 1; ; count++) {

        // blink an led to see the program running (gonna be too fast to see)
        IOWR_ALTERA_AVALON_PIO_DATA(PIO_LED_BASE, (count & 0x0001)+128+64);

        // draw a line in a random color
        col.color8.r = rand()%256;
        col.color8.g = rand()%256;
        col.color8.b = rand()%256;
        line(frameBufferA, particles[0].x, particles[0].y,
                           particles[1].x, particles[1].y, col);

        // bounce the particles around
        int i;
        for(i = 0; i < 2; ++i) {
            if(particles[i].x == 0 || particles[i].x == 479) particles[i].vx *= -1;
            particles[i].x += particles[i].vx;
            if(particles[i].y == 0 || particles[i].y == 271) particles[i].vy *= -1;
            particles[i].y += particles[i].vy;
        }
    }

    return 0;
}

The complete source code is here.
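The line routine is elided in the listing above. For reference, a common integer Bresenham variant that handles all octants looks like the sketch below. It is not the author's code: it takes a plain 32-bit color instead of the Color type, and the 480x272 dimensions are hard-coded as an assumption.

```c
#include <stdint.h>
#include <stdlib.h>

#define FB_WIDTH  480   /* assumed frame-buffer width in pixels  */
#define FB_HEIGHT 272   /* assumed frame-buffer height in pixels */

/* Integer Bresenham line from (x0,y0) to (x1,y1), all octants, into a
 * 32-bit-per-pixel buffer. Off-screen pixels are skipped, not clipped. */
static void draw_line(uint32_t *buffer, int x0, int y0, int x1, int y1, uint32_t color)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;   /* combined error term for both axes */

    for (;;) {
        if (x0 >= 0 && x0 < FB_WIDTH && y0 >= 0 && y0 < FB_HEIGHT)
            buffer[x0 + y0 * FB_WIDTH] = color;
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }  /* step in y */
    }
}
```

Only integer additions and comparisons happen per pixel, which suits a soft CPU without a floating-point unit.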

That's it for now. Happy drawing.

Related Links

Scatter-Gather DMA Controller (and most other IP cores)