WS2812 DMA Ping-Pong mode

On this page I will try to explain how I got WS2812B control working with the DMA in ping-pong mode.

The WS2812B Timing:

First I will explain how sending data to the WS2812B works. I will always refer to the WS2812B unless noted; the WS2812 works the same way, just with different timing.

As you can see, the waveform needs a period of 1.25 us (800 kHz). Notice that to send a 0 or a 1 you only need to change the level of the wave between 0.4 us and 0.8 us; all the rest is the same.

So I need to set the output to 1 from 0 us to 0.4 us and to 0 from 0.8 us to 1.25 us. Between 0.4 us and 0.8 us the output is set to 1 or 0, depending on the data bit you want to send to the WS2812B.
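To make the three phases of a bit slot concrete, here is a minimal sketch in C that returns the line level at a given time inside a slot. The function name and the macro names are made up for this illustration; only the 0.4 us, 0.8 us and 1.25 us boundaries come from the text above.

```c
/* Phase boundaries inside one bit slot, in nanoseconds (values from the text). */
#define WS_T_DATA_START_NS  400u   /* end of the "always high" phase  */
#define WS_T_DATA_END_NS    800u   /* end of the "data" phase         */
#define WS_T_BIT_NS        1250u   /* full bit period (800 kHz)       */

/* Hypothetical helper: line level (0 or 1) at time t_ns for a given data bit.
 * Times past one period wrap around into the next slot. */
int ws_line_level(unsigned t_ns, int bit)
{
    t_ns %= WS_T_BIT_NS;
    if (t_ns < WS_T_DATA_START_NS)
        return 1;               /* first phase: always high      */
    if (t_ns < WS_T_DATA_END_NS)
        return bit ? 1 : 0;     /* middle phase: follows the bit */
    return 0;                   /* last phase: always low        */
}
```

A 1 bit is therefore high for 0.8 us and a 0 bit for 0.4 us, and the only difference between them is the middle phase.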

Using the DMA:

The Tiva is an ARM Cortex-M4 microcontroller and it has a DMA controller (the uDMA). This makes it possible to do automatic memory transfers started by peripheral triggers (and by software if you need). It can send bytes from memory to the GPIO data register, so you can control all 8 pins of a GPIO port with 1 transfer. This is also great if you don't want the processor to be always busy controlling the WS2812B, and it makes the timing more precise.

You can generate triggers with a GPIO input by configuring it to trigger the DMA, and you can choose which event triggers it; in my case I use both the rising edge and the falling edge.

If you want to know more about the DMA, visit the Tiva DMA page.

How to get the timing right?

Well, we can now send data from memory to the GPIO, but how do we get the timing right?

For that I used the PWM generator instead of a timer to generate the signal. Why the PWM generator instead of a simple timer? Well, the PWM generator has 2 comparators per output and can invert the output on each event. I simply set the PWM signal to invert at 0.4 us, 0.8 us and 1.25 us. With a GPIO DMA trigger on both edges, this makes the transfers happen at the right time.
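The numbers used in the example code (a period of 150 and comparator values of 48 and 96 for the WS2812B, 42 and 84 for the WS2812) follow from the 120 MHz PWM clock. The sketch below only shows that arithmetic; the function names are hypothetical, and how the tick counts map onto the comparator registers depends on the count direction of the generator.

```c
#include <stdint.h>

/* Hypothetical helper: PWM clock ticks in one period of out_hz. */
uint32_t pwm_period_ticks(uint32_t pwm_clk_hz, uint32_t out_hz)
{
    return pwm_clk_hz / out_hz;
}

/* Hypothetical helper: PWM clock ticks in t_ns nanoseconds, rounded to nearest.
 * The 64-bit intermediate avoids overflow for clocks in the 100 MHz range. */
uint32_t pwm_ticks_from_ns(uint32_t pwm_clk_hz, uint32_t t_ns)
{
    return (uint32_t)(((uint64_t)pwm_clk_hz * t_ns + 500000000u) / 1000000000u);
}
```

With a 120 MHz clock, 800 kHz gives 150 ticks per period, 0.4 us gives 48 ticks and 0.8 us gives 96 ticks. The WS2812 values of 0.35 us and 0.7 us give 42 and 84.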

The array always needs to look like this for each bit: 0xFF, data, 0x00. This raises the problem of memory usage, since it needs 24*3 bytes per LED, but remember that each byte already drives 8 outputs, so it's 24*3 bytes per 8 LEDs, 1 per output. There are more complex ways to avoid this extra memory usage, but I haven't gotten them working yet.
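As an illustration of this layout, here is a stand-alone sketch that fills a buffer with the 0xFF, data, 0x00 pattern and writes one colour byte for one output pin. It is a simplified version written for this explanation, not the UpdateData function from the example code, and all names in it are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define WS_BYTES_PER_BIT 3u                                     /* 0xFF, data, 0x00 */
#define WS_BITS_PER_LED  24u
#define WS_BYTES_PER_LED (WS_BYTES_PER_BIT * WS_BITS_PER_LED)   /* 72 */

/* Fill n_leds worth of slots with 0xFF, 0x00, 0x00 (all data bits zero). */
void ws_buf_init(uint8_t *buf, size_t n_leds)
{
    for (size_t i = 0; i < n_leds * WS_BYTES_PER_LED; i++)
        buf[i] = (i % WS_BYTES_PER_BIT == 0) ? 0xFF : 0x00;
}

/* Write one 8-bit colour value, MSB first, for one output pin (0..7).
 * color_index (0..2) is the position of the colour byte inside the
 * 24 bits of the LED. Only the bit of the chosen pin is changed. */
void ws_buf_set(uint8_t *buf, size_t led, unsigned color_index,
                unsigned output, uint8_t value)
{
    /* +1 skips the leading 0xFF byte and lands on the data byte. */
    size_t pos = led * WS_BYTES_PER_LED
               + color_index * 8u * WS_BYTES_PER_BIT + 1u;
    uint8_t mask = (uint8_t)(1u << output);

    for (unsigned bit = 0; bit < 8; bit++) {
        if (value & (0x80u >> bit))
            buf[pos] |= mask;
        else
            buf[pos] &= (uint8_t)~mask;
        pos += WS_BYTES_PER_BIT;
    }
}
```

Because every data byte carries one bit for each of the 8 pins, setting a value for pin 3 leaves the other 7 strips untouched.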

Why ping-pong mode transfers?

Unfortunately the Tiva DMA can only do 1024 item transfers per transfer set. At 72 bytes per LED, this means a maximum of 14 LEDs per transfer.

That means that at least every 1024 items the processor needs to set up the transfer again and re-enable the DMA channel. This is needed, of course, since we want more than 14 LEDs per output.
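The limit of 14 LEDs is just the 1024 item limit divided by the 72 bytes each LED needs. A small sketch of that arithmetic, with hypothetical names:

```c
#include <stdint.h>

#define UDMA_MAX_ITEMS   1024u   /* item limit per transfer set (from the text) */
#define WS_BYTES_PER_LED 72u     /* 24 bits * 3 bytes per bit                    */

/* Largest whole number of LEDs that fits in one transfer set. */
uint32_t ws_max_leds_per_transfer(void)
{
    return UDMA_MAX_ITEMS / WS_BYTES_PER_LED;
}

/* Number of transfer sets needed for a strip, rounding up. */
uint32_t ws_transfers_needed(uint32_t leds_per_output, uint32_t leds_per_transfer)
{
    return (leds_per_output + leds_per_transfer - 1u) / leds_per_transfer;
}
```

For example, 60 LEDs at 10 LEDs per transfer need 6 transfer sets, and 120 LEDs at 14 per transfer need 9.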

Since the timing is very important, I use ping-pong mode.

In basic mode you would need to set up the next transfer very quickly, but the processor could be busy and not do it in time. Ping-pong mode has 2 transfers set up. It alternates between them when one ends, so you have the whole transfer time to set up the next transfer.

It's something like:

Set up the primary transfer and the secondary (alternate) transfer. The primary transfer runs and then the DMA jumps to the secondary. If you want 1 more transfer to happen, set up the primary again while the secondary runs. When the secondary ends, the DMA jumps back to the primary. If you want yet 1 more, just set up the secondary while the primary runs.
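The sequence above can be sketched as a small host-side simulation. It models no hardware: it only tracks which chunk of the buffer is loaded in which of the two transfer slots and in which order they run. As a simplification, the finished slot is reloaded instantly, while on the real chip the reload happens in the interrupt while the other slot is already running. All names are hypothetical.

```c
#include <stddef.h>

enum { PRIMARY = 0, ALTERNATE = 1 };

/* Simulate ping-pong operation for total_chunks chunks.
 * order[i] receives the chunk index sent in step i and slot_used[i] the
 * slot that sent it. At most max_out steps are recorded.
 * Returns the number of chunks sent. */
size_t pingpong_simulate(size_t total_chunks, size_t *order,
                         int *slot_used, size_t max_out)
{
    long slot[2] = { -1, -1 };   /* chunk loaded in each slot, -1 = empty */
    size_t next = 0, sent = 0;
    int active = PRIMARY;

    /* Initial setup: load the primary and the alternate slot. */
    for (int s = 0; s < 2 && next < total_chunks; s++)
        slot[s] = (long)next++;

    while (slot[active] >= 0 && sent < max_out) {
        order[sent] = (size_t)slot[active];
        slot_used[sent] = active;
        sent++;

        /* "DMA done" handler: reload the slot that just finished,
         * or leave it empty when nothing remains to send. */
        slot[active] = (next < total_chunks) ? (long)next++ : -1;

        /* The controller switches to the other slot. */
        active = !active;
    }
    return sent;
}
```

With 6 chunks, the chunks go out in order 0 to 5 and the slots alternate primary, alternate, primary, and so on. The transfer stops by itself when the next slot has not been reloaded.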

The way I have the array set up for this is something like:

Transfer size = (24*3)*10= 720. 24*3 is 1 LED and 10 is the number of LEDs I want to control per transfer.

Multiply = 6. This is how many transfers I want to happen; it's 6, so I control 60 LEDs.

Each DMA transfer advances 720 bytes through the array, so together the 6 transfers send the data for 60 LEDs.
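Here is a short sketch of the nominal buffer layout for these numbers. The macro and function names are hypothetical. Note that the example code offsets some of its source pointers by one byte; this sketch only shows the plain 720-byte stepping described here.

```c
#include <stddef.h>

#define WS_BYTES_PER_LED  72u
#define LEDS_PER_TRANSFER 10u
#define TRANSFER_COUNT    6u
#define TRANSFER_SIZE     (WS_BYTES_PER_LED * LEDS_PER_TRANSFER)   /* 720  */
#define BUFFER_SIZE       (TRANSFER_SIZE * TRANSFER_COUNT)         /* 4320 */

/* Byte offset in the buffer where transfer number n starts. */
size_t ws_chunk_offset(size_t n)
{
    return n * TRANSFER_SIZE;
}

/* Which transfer carries the data of a given LED (0-based). */
size_t ws_chunk_of_led(size_t led)
{
    return led / LEDS_PER_TRANSFER;
}
```

LED 0 to LED 9 travel in transfer 0, LED 10 to LED 19 in transfer 1, and the last LED (59) in transfer 5, which starts at byte 3600.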

Now how to have 32 instead of 8 outputs?

Well, it's simple, right? We just add 3 more GPIO triggers and 3 more GPIO output ports, right? Well, more or less, yes.

The problem is that I had to stop the PWM generation right when the DMA transfer was done, or for some reason everything would go wrong. That was weird, since I expected the DMA channel to be disabled.

With just 1 channel working that wasn't a problem, but since I had 4 being triggered from the same PWM output it became one. Because the transfers have the same priority, you never know which one ends first, so I didn't know in which DMA done interrupt I should stop the PWM.

I tried stopping it in all of them, but that did not work every time; it caused some timing issues.

The solution for this was actually really simple: just add 40*3 bytes of 0x00 at the end of the array. This covers the 50 us reset time (40 bit periods of 1.25 us). This, plus stopping the PWM generator in each DMA done interrupt, did the trick.
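The 40*3 figure comes from dividing the 50 us reset time by the 1.25 us bit period and multiplying by the 3 bytes used per bit. A tiny sketch with hypothetical function names:

```c
#include <stdint.h>

/* Bit slots needed to cover reset_ns, rounding up. */
uint32_t ws_reset_slots(uint32_t reset_ns, uint32_t bit_ns)
{
    return (reset_ns + bit_ns - 1u) / bit_ns;
}

/* Padding bytes of 0x00 needed, at 3 buffer bytes per bit slot. */
uint32_t ws_reset_bytes(uint32_t reset_ns, uint32_t bit_ns)
{
    return ws_reset_slots(reset_ns, bit_ns) * 3u;
}
```

For 50000 ns and 1250 ns this gives 40 slots and 120 bytes, which matches the 40*3 used in the array.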

And now I have the smoothest transitions I ever had, with all the 32 outputs working like so:

Some bugs:

These bugs are the weirdest.

One of them is that if I have an array with around 60 members (I didn't fully test it, but something like 61, 65 or 70 does it too) it causes a Fault ISR when I call GPIOIntRegister. But if it's 20 members or 80 members it works just fine. And I'm just changing the array size, nothing else. Weird, right?

There's also another problem. I use 3 arrays to store a pattern: 1 for Red, another for Green and another for Blue. Well, the first 6 members of the Green array are always 0xFF no matter what I do! Just weird things all around.
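A possible lead on the Green array problem, offered as a guess and not as a confirmed diagnosis: in the example code, StorePatern fills each pattern array with six ramps, and each ramp loop runs from 0 to brightness inclusive. The sketch below counts how many entries that writes; the function names are hypothetical.

```c
#include <stdint.h>

/* Entries written by one ramp loop of the form
 * "for (i = 0; i <= brightness; i += step)" or its descending twin. */
uint32_t ramp_entries(uint32_t brightness, uint32_t step)
{
    return brightness / step + 1u;
}

/* Entries written in total by the six ramps of the pattern. */
uint32_t pattern_entries(uint32_t brightness, uint32_t step)
{
    return 6u * ramp_entries(brightness, step);
}
```

With brightness 250 and step 1 this is 6 * 251 = 1506 entries, which is 6 more than an array of 1500 entries can hold. If the linker places the Green array right after the Red one, the last 6 writes to Red would land in the first 6 members of Green. Whether that is what happens depends on the memory layout, so it is worth checking in the map file.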

Besides that it's all going well. I hope you liked reading this explanation. I'm always open to suggestions or any help.

I'll keep posting updates.

Here is an example code with 8 parallel outputs:

/*
 * This code was made using the free version of IAR Workbench.
 * It's meant to work with the TM4C1294XL LaunchPad.
 * I programmed it using TivaWare.
 *
 * This code was made to control WS2812B LED strips and is adaptable for WS2812.
 * There are 8 independent parallel outputs for a higher refresh rate.
 * I made one with 32 outputs, but to keep the functionality simple to understand
 * this code is only for 8 outputs.
 * The DMA is used to keep the processor as unoccupied as possible.
 *
 * It's simply the control of the WS2812B. There isn't any way implemented to control
 * them with UART or any other communication to update the LED states from something
 * like a computer. That will be added later.
 *
 * I have tested it with 120-LED strips on all 8 outputs, for a total of 960 LEDs,
 * with this code. It should be able to control many more than 120 per output.
 *
 ******************
 * How it works:
 ******************
 *
 * The DMA is used to transfer the data from memory to the GPIO state.
 * To transfer the data with the right timing, a PWM output at 800 kHz with
 * 2 comparators that invert the signal is fed to a GPIO pin to trigger the DMA.
 * The PWM signal is on PF0 and the GPIO trigger is on PF1.
 *
 * The 8 outputs need to be on the same GPIO module. In this example GPIO A is used.
 * On the LaunchPad it seems only the following GPIO ports have all 8 pins available:
 * GPIOK, GPIOA, GPIOD, GPIOM.
 *
 * The DMA is in ping-pong mode. This is due to the DMA 1024 item transfer limit.
 * The DMA can only transfer 1024 items at each transfer setup. If basic mode was
 * used, an interrupt to set up the transfer again would not be fast enough to keep
 * the timing. The ping-pong mode eliminates this problem: when a transfer ends it
 * starts another, and you can set up the next one with enough time.
 *
 * The memory to transfer to the GPIO needs to be like this: 0xFF, data, 0x00.
 * This is due to the shape needed for the WS2812 and how I have the DMA triggers
 * working. So for each 8 WS2812 you need 24*3 = 72 bytes.
 * Remember that you are controlling 8 ports, so it's in bytes, and you are already
 * controlling 1 WS2812 per pin by using 72 bytes.
 */

#include <stdint.h>
#include <stdbool.h>
#include "stdlib.h"
#include "inc/hw_ints.h"
#include "inc/hw_memmap.h"
#include "inc/hw_uart.h"
#include "inc/hw_gpio.h"
#include "inc/hw_pwm.h"
#include "inc/hw_types.h"
/* The driverlib .c files are included directly, so they are compiled as part of this file. */
#include "driverlib/interrupt.c"
#include "driverlib/sysctl.c"
#include "driverlib/timer.c"
#include "driverlib/udma.c"
#include "driverlib/gpio.c"
#include "driverlib/pwm.c"
#include "driverlib/interrupt.h"
#include "driverlib/pin_map.h"
#include "driverlib/rom.h"
#include "driverlib/rom_map.h"
#include "driverlib/sysctl.h"
#include "driverlib/uart.h"
#include "driverlib/udma.h"
#include "driverlib/pwm.h"
#include <string.h>

void GPIOPort1IntHandler(void);
void InituDMA(void);
void InitGPIO(void);
void InitPWM(void);
void SendData();
int8_t UpdateData(uint8_t value, uint8_t LED, uint8_t color, uint8_t output, uint8_t *array);
void StartLEDs();
void setup();
void Patern4();
void StorePatern();

/*
 * These are auxiliary arrays to save all 2-color combinations.
 * It can be more efficient, I was just lazy.
 * StorePatern writes 6 ramps of 251 entries each (1506 in total), so the arrays
 * hold 1506 entries; the original size of 1500 was overrun by 6 entries.
 */
static uint8_t PaternRed[1506];
static uint8_t PaternGreen[1506];
static uint8_t PaternBlue[1506];

//*****************************************************************************
//
// The control table used by the uDMA controller. This table must be aligned
// to a 1024 byte boundary.
//
//*****************************************************************************
#if defined(ewarm)
#pragma data_alignment=1024
uint8_t pui8ControlTable[1024];
#elif defined(ccs)
#pragma DATA_ALIGN(pui8ControlTable, 1024)
uint8_t pui8ControlTable[1024];
#else
uint8_t pui8ControlTable[1024] __attribute__ ((aligned(1024)));
#endif

/*
 * OUTPUTs, INPUTs and DMA definitions
 * Outputs tested (x means they are working)
 * PA: 0[x], 1[x], 2[x], 3[x], 4[x], 5[x], 6[x], 7[x]
 */
#define GPIO_DMA_PERIPH1 SYSCTL_PERIPH_GPIOF
#define GPIO_DMA_BASE1 GPIO_PORTF_BASE
#define GPIO_DMA_PIN1 GPIO_PIN_1
#define INT_DMATRIGGER1 INT_GPIOF
#define UDMA_CH_1 UDMA_CH15_GPIOF
#define GPIO_BASE_OUTPUT1 GPIO_PORTA_BASE
#define GPIO_PERIPH_OUTPUT1 SYSCTL_PERIPH_GPIOA
/*
 * End of OUTPUTs, INPUTs and DMA definitions
 */

//How many LEDs you want PER OUTPUT PER TRANSFER.
#define NumbLEDs 10
//This is how many times the DMA needs to repeat due to the 1024 transfer limit, or 14 LEDs
#define multiplica 6
//This is the size of each array. 24*3 means 1 WS2812.
#define WS2812_BUF_SIZE (72*NumbLEDs*multiplica)
//Due to ping-pong mode it needs to be half
#define countMax multiplica
//This is to set how many transfers per DMA cycle (it was just used for testing)
#define CycleSize (72*NumbLEDs)

volatile uint32_t g_ui32SysClock;

/*
 * These are the arrays for the outputs (only 1 in this 8-output version)
 * g_ui8TxBuf1 is for the set of OUTPUTs1
 * The extra 40*3 bytes are the 0x00 padding for the reset time.
 */
static uint8_t g_ui8TxBuf1[WS2812_BUF_SIZE+40*3];

/*
 * This will count how many DMA cycles happened per transfer.
 * Reset to start a new transfer.
 * There's 1 for each DMA channel.
 */
volatile uint8_t DMACycleCount1 = 0;

/*
 * Each DMA channel has its own interrupt to re-enable it or disable it
 * when needed after it's done.
 */
extern void GPIOPort1IntHandler(void)
{
    MAP_GPIOIntClear(GPIO_DMA_BASE1,GPIO_INT_DMA);
    DMACycleCount1++;
    // SysCtlDelay(2);

    /*
     * Check if all the necessary cycles are done.
     * If true then disable the interrupt and the PWM generation.
     */
    if( DMACycleCount1>=multiplica){
        MAP_GPIOIntDisable(GPIO_DMA_BASE1,GPIO_INT_DMA);
        /* if(DMACycleCount1 < multiplica && DMACycleCount2 < multiplica &&
           DMACycleCount3 < multiplica && DMACycleCount4 < multiplica)*/
        //PWMGenDisable(PWM0_BASE, PWM_GEN_0);
        MAP_PWMGenDisable(PWM0_BASE, PWM_GEN_0);
    }
    else{
        /* Reload the control structure that just finished with the next chunk. */
        if(DMACycleCount1%2 !=0){
            MAP_uDMAChannelTransferSet(UDMA_CH_1 | UDMA_PRI_SELECT,
                                       UDMA_MODE_PINGPONG,
                                       g_ui8TxBuf1+(CycleSize*(DMACycleCount1+1))-1,
                                       (void *)(GPIO_BASE_OUTPUT1 + 0x3FC),
                                       CycleSize);
        }
        else{
            MAP_uDMAChannelTransferSet(UDMA_CH_1 | UDMA_ALT_SELECT,
                                       UDMA_MODE_PINGPONG,
                                       g_ui8TxBuf1+(CycleSize*(DMACycleCount1+1))-1,
                                       (void *)(GPIO_BASE_OUTPUT1 + 0x3FC),
                                       CycleSize);
        }
    }
}

/*
 * Sets up each DMA channel.
 */
void InituDMA(void)
{
    MAP_SysCtlPeripheralDisable(SYSCTL_PERIPH_UDMA);
    MAP_SysCtlPeripheralReset(SYSCTL_PERIPH_UDMA);
    MAP_SysCtlPeripheralEnable(SYSCTL_PERIPH_UDMA);
    MAP_SysCtlDelay(10);

    MAP_uDMAEnable();
    MAP_uDMAControlBaseSet(pui8ControlTable);

    /*
     * This is for setting up the UDMA_CH_1 for GPIO_BASE_OUTPUT1
     */
    MAP_uDMAChannelAssign(UDMA_CH_1);
    MAP_uDMAChannelAttributeDisable(UDMA_CH_1,
                                    UDMA_ATTR_ALTSELECT | UDMA_ATTR_USEBURST |
                                    UDMA_ATTR_HIGH_PRIORITY | UDMA_ATTR_REQMASK);
    MAP_uDMAChannelControlSet(UDMA_CH_1 | UDMA_PRI_SELECT,
                              UDMA_SIZE_8 | UDMA_SRC_INC_8 |
                              UDMA_DST_INC_NONE | UDMA_ARB_1);
    MAP_uDMAChannelControlSet(UDMA_CH_1 | UDMA_ALT_SELECT,
                              UDMA_SIZE_8 | UDMA_SRC_INC_8 |
                              UDMA_DST_INC_NONE | UDMA_ARB_1);
    /*uDMAChannelTransferSet(UDMA_CH_1 | UDMA_PRI_SELECT,
                             UDMA_MODE_PINGPONG,
                             g_ui8TxBuf1, (void *)(GPIO_BASE_OUTPUT1 + 0x3FC),
                             CycleSize);
    uDMAChannelTransferSet(UDMA_CH_1 | UDMA_ALT_SELECT,
                           UDMA_MODE_BASIC,
                           g_ui8TxBuf1+(CycleSize-1), (void *)(GPIO_BASE_OUTPUT1 + 0x3FC),
                           CycleSize); */
    /*
     * End of UDMA_CH_1
     */

    //Enable the DMA channels
    // MAP_uDMAChannelEnable(UDMA_CH_1);
}

/*
 * GPIO DMA trigger setup
 * GPIO outputs setup
 */
void InitGPIO(void)
{
    /* Start of DMA trigger 1 setup */
    //SysCtlPeripheralDisable(GPIO_DMA_PERIPH1);
    //SysCtlPeripheralReset(GPIO_DMA_PERIPH1);
    //SysCtlPeripheralEnable(GPIO_DMA_PERIPH1);
    MAP_SysCtlDelay(3);
    MAP_GPIOPinTypeGPIOInput( GPIO_DMA_BASE1, GPIO_DMA_PIN1);
    MAP_GPIOIntTypeSet( GPIO_DMA_BASE1,GPIO_DMA_PIN1,GPIO_BOTH_EDGES);
    GPIOIntRegister( GPIO_DMA_BASE1,GPIOPort1IntHandler);
    MAP_GPIOIntClear( GPIO_DMA_BASE1,0x1FF);
    MAP_GPIODMATriggerEnable( GPIO_DMA_BASE1,GPIO_DMA_PIN1);
    MAP_GPIOIntEnable( GPIO_DMA_BASE1,GPIO_INT_DMA);
    MAP_IntEnable(INT_DMATRIGGER1);
    /*
     * End of DMA trigger 1 setup
     */

    /*
     *===================================================
     *
     * Output setups
     *
     *===================================================
     */

    /*
     * Start of GPIO_BASE_OUTPUT1 setup
     */
    MAP_SysCtlPeripheralDisable(GPIO_PERIPH_OUTPUT1);
    MAP_SysCtlPeripheralReset(GPIO_PERIPH_OUTPUT1);
    MAP_SysCtlPeripheralEnable(GPIO_PERIPH_OUTPUT1);
    SysCtlDelay(10);

    //The following GPIO ports have all the 8 pins on the LaunchPad:
    // GPIOK, GPIOA, GPIOD, GPIOM,
    //
    HWREG(GPIO_BASE_OUTPUT1 + GPIO_O_LOCK) = GPIO_LOCK_KEY;
    HWREG(GPIO_BASE_OUTPUT1 + GPIO_O_CR) |= 0x80;
    MAP_GPIOPinTypeGPIOOutput(GPIO_BASE_OUTPUT1, 0xFF);
    MAP_GPIOPinWrite(GPIO_BASE_OUTPUT1,0xFF,0x0);
    /*
     * End of GPIO_BASE_OUTPUT1 setup
     */
}

/*
 * Sets up a PWM signal with 800 kHz frequency at PF0.
 * The signal is inverted at CompA, CompB and 0
 * (HWREG(PWM0_BASE+PWM_O_0_GENA) = 0x444;)
 * For WS2812 the comparator values are 84 and 42
 * For WS2812B the comparator values are 96 and 48
 */
void InitPWM(void)
{
    MAP_SysCtlPeripheralDisable(SYSCTL_PERIPH_GPIOF);
    MAP_SysCtlPeripheralReset(SYSCTL_PERIPH_GPIOF);
    MAP_SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOF);
    MAP_SysCtlPeripheralDisable(SYSCTL_PERIPH_PWM0);
    MAP_SysCtlPeripheralReset(SYSCTL_PERIPH_PWM0);
    MAP_SysCtlPeripheralEnable(SYSCTL_PERIPH_PWM0);
    MAP_SysCtlDelay(3);

    //
    // Unlock the Pin PF0 and Set the Commit Bit
    //
    HWREG(GPIO_PORTF_BASE + GPIO_O_LOCK) = GPIO_LOCK_KEY;
    HWREG(GPIO_PORTF_BASE + GPIO_O_CR) |= 0x01;

    MAP_GPIOPinConfigure(GPIO_PF0_M0PWM0);

    //
    // Configure the PWM function for this pin.
    // Consult the data sheet to see which functions are allocated per pin.
    // TODO: change this to select the port/pin you are using.
    //
    MAP_GPIOPinTypePWM(GPIO_PORTF_BASE, GPIO_PIN_0);

    //
    // Set the PWM clock to the system clock.
    //
    MAP_PWMClockSet(PWM0_BASE, PWM_SYSCLK_DIV_1);

    //
    // Configure the PWM0 to count down without synchronization.
    //
    MAP_PWMGenConfigure(PWM0_BASE, PWM_GEN_0,
                        PWM_GEN_MODE_DOWN | PWM_GEN_MODE_NO_SYNC);

    //
    // Set the PWM period to 800KHz. To calculate the appropriate parameter
    // use the following equation: N = (1 / f) * SysClk. Where N is the
    // function parameter, f is the desired frequency, and SysClk is the
    // system clock frequency.
    // In this case you get: (1 / 800KHz) * 120MHz = 150 cycles. Note that
    // the maximum period you can set is 2^16.
    //
    MAP_PWMGenPeriodSet(PWM0_BASE, PWM_GEN_0, 150);

    //
    // Set the comparators. At 120 MHz:
    // 96 and 48 ticks (0.8 us and 0.4 us) for WS2812B
    // 84 and 42 ticks (0.7 us and 0.35 us) for WS2812
    //
    HWREG(PWM0_BASE+PWM_O_0_CMPA) = 96;
    HWREG(PWM0_BASE+PWM_O_0_CMPB) = 48;
    HWREG(PWM0_BASE+PWM_O_0_GENA) = 0x444;
    HWREG(PWM0_BASE+PWM_O_0_COUNT ) = 40;

    //
    // Enable the PWM0 output signal (PF0).
    //
    MAP_PWMOutputState(PWM0_BASE, PWM_OUT_0_BIT, true);
}

void StartLEDs()
{
    /* Fill the buffer with the 0xFF, data, 0x00 pattern, all data bits 0. */
    for(uint32_t ui32Index=0;ui32Index<(WS2812_BUF_SIZE);ui32Index++)
    {
        if((ui32Index%3) == 2 )
        {
            g_ui8TxBuf1[ui32Index] = 0x00;
        }
        else if((ui32Index%3) == 1 )
        {
            g_ui8TxBuf1[ui32Index] = 0x00;//rand()%256;
        }
        else if((ui32Index%3) == 0)
        {
            g_ui8TxBuf1[ui32Index] = 0xFF;
        }
    }

    /* Reset time padding: 40 bit periods of 0x00. */
    for(int i=0; i < 40*3; i++){
        g_ui8TxBuf1[WS2812_BUF_SIZE+i] = 0x00;
    }

    //
    // Enables the PWM generator block.
    //
    MAP_PWMGenEnable(PWM0_BASE, PWM_GEN_0);
    SendData();
}

/*
 * This re-enables and sets up a new DMA transfer.
 * It waits for the current DMA transfers to end.
 */
void SendData()
{
    while(DMACycleCount1 < multiplica){
    }
    //SysCtlDelay(2000);
    HWREG(GPIO_BASE_OUTPUT1 + (0xFF << 2))=0x00;
    MAP_uDMAChannelTransferSet(UDMA_CH_1 | UDMA_PRI_SELECT,
                               UDMA_MODE_PINGPONG,
                               g_ui8TxBuf1, (void *)(GPIO_BASE_OUTPUT1 + 0x3FC),
                               CycleSize);
    MAP_uDMAChannelTransferSet(UDMA_CH_1 | UDMA_ALT_SELECT,
                               UDMA_MODE_PINGPONG,
                               g_ui8TxBuf1+(CycleSize-1), (void *)(GPIO_BASE_OUTPUT1 + 0x3FC),
                               CycleSize);
    MAP_PWMGenDisable(PWM0_BASE, PWM_GEN_0);
    DMACycleCount1 = 0;
    HWREG(PWM0_BASE+PWM_O_0_COUNT ) = 47;
    MAP_uDMAChannelEnable(UDMA_CH_1);
    MAP_GPIOIntEnable(GPIO_DMA_BASE1,GPIO_INT_DMA);
    MAP_PWMGenEnable(PWM0_BASE, PWM_GEN_0);
}

/*
 * This is to help change the values more easily.
 * Returns 1 on success and -1 if LED, color or output is out of range.
 */
int8_t UpdateData(uint8_t value, uint8_t LED, uint8_t color, uint8_t output, uint8_t *array)
{
    /* Wait for the current transfer to end before touching the buffer. */
    while(DMACycleCount1 < multiplica){
    }

    if(LED >= NumbLEDs*multiplica)
        return -1;
    if(color > 2)
        return -1;
    if(output > 7)
        return -1;

    /* Position of the first data byte: skip the leading 0xFF. */
    uint32_t pos=1;
    uint8_t *temporary=array;
    pos = pos + (LED*72);
    pos = pos + (color*24);

    /* Split the value into bits, most significant bit first. */
    uint8_t temp[8];
    for (int i=7; i >=0; i--){
        temp[i] = 1 & value;
        value = value >> 1;
    }

    for(int i=0; i < 8; i++){
        if(temp[i] == 1 && i!=7){
            *(temporary+(pos+(3*i))) |= (1 << output);
        }
        else{
            *(temporary+(pos+(3*i))) &= ~(1 << output);
        }
    }
    return 1;
}

int main(void)
{
    //
    // Set the clocking to run at 120 MHz from the 25 MHz crystal,
    // using the PLL with the VCO at 480 MHz.
    //
    g_ui32SysClock = MAP_SysCtlClockFreqSet((SYSCTL_XTAL_25MHZ |
                                             SYSCTL_OSC_MAIN |
                                             SYSCTL_USE_PLL |
                                             SYSCTL_CFG_VCO_480), 120000000);

    InitPWM();
    InitGPIO();
    InituDMA();
    StartLEDs();
    SysCtlDelay(2000);
    Patern4();
    //Below there are just some pattern tests that I was having fun with
}

/*
 * A pattern that cycles the color combinations, starting from the center of the strip.
 * Just change the size of the strips in the defines.
 *
 * It's a bit hard to explain how I made this pattern, but all possible 2-color
 * combinations are stored, and this pattern starts the LEDs with a "color distance",
 * which is how far away the values are from each other.
 * There's an increment that you can use to change how "fast" the colors change.
 * That is only visual; in reality you just skip some intermediate values/combinations.
 */
void Patern4()
{
    StorePatern();
    static uint16_t values[120];
    uint16_t temp=0;
    for(int i=0; i< 120; i++){
        values[i] = temp;
        temp+=24; //This is the "color distance" between each LED
        if(temp>1500-1)
            temp=0;
    }

    while(1){
        for(int k=(NumbLEDs*multiplica/2)-1; k >=0 ; k--){
            if(values[k] >= 1500-1)
                values[k]=0;
            for(int p=0; p < 8; p++){
                UpdateData(PaternRed[values[k] ],k,1,p,g_ui8TxBuf1);
                UpdateData(PaternGreen[values[k] ],k,0,p,g_ui8TxBuf1);
                UpdateData(PaternBlue[values[k] ],k,2,p,g_ui8TxBuf1);
                UpdateData(PaternRed[values[k] ],(NumbLEDs*multiplica-1)-k,1,p,g_ui8TxBuf1);
                UpdateData(PaternGreen[values[k] ],(NumbLEDs*multiplica-1)-k,0,p,g_ui8TxBuf1);
                UpdateData(PaternBlue[values[k] ],(NumbLEDs*multiplica-1)-k,2,p,g_ui8TxBuf1);
            }
            /*
             * This is the increment that happens each cycle.
             * The bigger, the faster the colors change.
             */
            values[k]+=3;
        }
        SendData();
        while(DMACycleCount1 < multiplica){
        }
    }
}

/*
 * This stores all combinations of 2 colors with an RGB LED for later use.
 */
void StorePatern()
{
    uint8_t brightness=250;
    uint16_t k=0;
    uint8_t step=1;

    //blue rises
    for (uint8_t i=0; i <= brightness; i+=step){
        PaternRed[k] = brightness;
        PaternBlue[k] = i>>2;
        k++;
    }
    //red falls
    for (int i=brightness; i >= 0; i-=step){
        PaternRed[k] = i;
        PaternGreen[k] = 0;
        PaternBlue[k] = brightness>>2;
        k++;
    }
    //green rises
    for (int i=0; i <= brightness; i+=step){
        PaternRed[k] = 0;
        PaternGreen[k] = i;
        PaternBlue[k] = brightness>>2;
        k++;
    }
    //blue falls
    for (int i=brightness; i >= 0; i-=step){
        PaternRed[k] = 0;
        PaternGreen[k] = brightness;
        PaternBlue[k] = i>>2;
        k++;
    }
    //red rises
    for (int i=0; i <= brightness; i+=step){
        PaternRed[k] = i;
        PaternGreen[k] = brightness;
        PaternBlue[k] = 0;
        k++;
    }
    //green falls
    for (int i=brightness; i >= 0; i-=step){
        PaternRed[k] = brightness;
        PaternGreen[k] = i;
        PaternBlue[k] = 0;
        k++;
    }

    for (uint8_t i=0; i <= brightness; i+=step)
        PaternGreen[i] = 0;
}