Software UARTs

Why?

I think it's pretty obvious why you'd want one of these. The Bifferboard only has a single serial port, so it would be handy to use some GPIO pins to implement a second or third port, even if it only runs at low speed.

The Transmitter

Clock Speeds

The current PIT tick rate patch defines the PIT tick rate on the Bifferboard to be 1041816. An interesting thing happens when you divide this by 115200, the maximum speed of the on-board serial port: you get 9.04354 (near enough). Multiplying 115200 by 9 gives 1036800, so I wonder if this would give slightly more accurate time-keeping than the current patch. It's too much of a coincidence. RDC must surely be driving their on-board hardware UART from the PIT tick timer, otherwise why pick a non-standard frequency for the tick rate? This does suggest a reasonably high baud rate is possible with a software UART, so let's see.
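The arithmetic above can be sketched in a few lines of C. This is only an illustration: the helper names (pit_divider_for, pit_actual_rate, pit_rate_error_percent) are made up here, and the tick rate is simply the figure quoted from the patch.

```c
#include <stdint.h>

/* PIT input clock on the Bifferboard, as quoted from the tick rate patch. */
#define PIT_TICK_RATE 1041816u

/* Nearest integer PIT divider for a wanted interrupt rate in Hz.
   (Hypothetical helper name, not part of any existing code.) */
uint32_t pit_divider_for(uint32_t rate_hz)
{
    return (PIT_TICK_RATE + rate_hz / 2) / rate_hz;
}

/* Interrupt rate actually obtained from a given divider, in Hz. */
double pit_actual_rate(uint32_t divider)
{
    return (double)PIT_TICK_RATE / (double)divider;
}

/* Error of the achieved rate relative to the wanted one, in percent. */
double pit_rate_error_percent(uint32_t rate_hz)
{
    double actual = pit_actual_rate(pit_divider_for(rate_hz));
    return (actual - (double)rate_hz) * 100.0 / (double)rate_hz;
}
```

With these numbers pit_divider_for(115200) comes out as 9, and the rate you actually get is a little under 0.5% fast, which is well within what a UART receiver normally tolerates.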

Selecting Pins

This isn't as trivial as you might think. The serial transmit pin (GPIO8) is the obvious choice because it allows you to test out a software UART using a normal USB serial cable. Unfortunately, when you switch the pin to GPIO mode it no longer pulls high as the on-board UART would do. You have to provide some extra pull-up to help the receiving UART understand you're 'between characters', because without it the pin won't get above about 2.5 volts. That's enough for 3.3v TTL logic, but won't work with a TTL-level UART receiver. The pull-up resistor can be quite a high value, e.g. 20K, but you need something there.

Since the whole idea of adding another UART was so you can get >1 of the things, it makes sense to use a different pin in the long run, and the first that comes to mind is the LED output (GPIO16). This will work, but it also needs a pull-up. You may find you need a smaller value than for GPIO8; 2K seems to work OK. Once I realised I needed a 'non-standard' cable anyhow, I switched to using the LED output, and did all the experimentation using that. It's the 3rd pin along from the left on J2, and you can just about get a clip on it by prising the boards a little apart.

Connecting to the red LED control pin

GPIO caveats

In order to do this right you need to handle the GPIO properly, so it can coexist with other software that wants to change the 0xcf8 register. You change the GPIO values via 0xcfc, but 0xcf8 needs to be pointing at the GPIO data control register for the RDC chip. Other things, particularly the RDC watchdog module, may be trying to point this register somewhere else. This is really annoying. Of course, the answer is simple: write the 0xcf8 register as well as the 0xcfc one every time you write GPIO. That solves the problem, but port writes are expensive on x86. Not only do you lose the time, you also stop other processes from running, which is doubly annoying.

The second problem is that there is no output status register for the GPIO ports, and reading back the data value gives you the current state of the pins rather than the last written state. If anything is pulling a GPIO pin to a different state from the one you set, that's what you'll read back. So any software UART solution needs to be aware of the Linux GPIO subsystem to retrieve the 'last written' state of the GPIO pins, in order not to inadvertently change other pin states when updating the transmit pin.

Accessing Linux GPIO without any race conditions is no particular problem for a kernel module, but it's not something you can easily do in userspace using LXRT (not that I know of). I think this means that if you want to use LXRT, you either need to not be using GPIO pins for anything else, or use them only from the LXRT process and implement your own routines that do something like the Linux GPIO subsystem does: store the last written output value, then OR or AND against that prior to writing to the port.
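The 'remember the last written value' idea might look something like the sketch below. Everything here is an assumption made for illustration: the names shadow_gpio_set, shadow_gpio_last_written and gpio_port_write are invented, the bit positions are arbitrary, and the port write is just a variable standing in for the real 0xcf8/0xcfc sequence so that the code runs on any machine.

```c
#include <stdint.h>

/* Stand-in for the real GPIO data port. On hardware this would be the
   0xcf8/0xcfc write sequence; here it is only a variable. */
uint32_t fake_gpio_port;

void gpio_port_write(uint32_t value)
{
    fake_gpio_port = value;
}

/* Last value we wrote to the port. Reading the port back is no use,
   because it reports the pin levels, not what was last written. */
uint32_t gpio_shadow;

/* Set or clear the pins in 'mask' without disturbing the others. */
void shadow_gpio_set(uint32_t mask, int high)
{
    if (high)
        gpio_shadow |= mask;    /* OR in the bits to raise */
    else
        gpio_shadow &= ~mask;   /* AND out the bits to lower */
    gpio_port_write(gpio_shadow);
}

/* What we believe the output latch holds. */
uint32_t shadow_gpio_last_written(void)
{
    return gpio_shadow;
}
```

This only stays correct if every writer of the port goes through shadow_gpio_set, which is exactly the restriction described above.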

Anyhow, for the moment I'm only interested in what the hardware can do rather than the vagaries of implementing things under Linux. I'll deal with that some other time.

The clock source

Rather than use RTAI to get a clock frequency for the UART I decided to just program the 'bare metal', and hacked Biffboot to do what I wanted. Interrupts on the PC architecture involve setting up an IDT, a GDT, an interrupt vector table and so on, so they are not for the faint-hearted. If there is interest I will try to produce an 'open-source' version of the code that does that, but I based my code on James Molloy's excellent tutorial, and it works fine so long as you make sure you understand everything and do exactly what he says. Be prepared to get one tiny thing wrong and wonder why the hell it doesn't work for a few hours, or even evenings, but there's a certain sense of satisfaction in getting this right, not to mention it being a nice lesson in the history of the IBM PC!

Interrupt overhead

By experimentation, the fastest you can run timer interrupts on the Bifferboard to do GPIO seems to be about 147 kHz. Perhaps I could make this faster by switching to an assembler handler, but since the IO port operations are the bottleneck, this may not gain much. This is programming the PIT with the following values:

    outb(7, PIT_CH0);
    outb(0, PIT_CH0);

The first outb sets the LSB of the divider, the second sets the MSB. You can program lower values, but the overhead of handling interrupts and doing any port output means the period won't work out much different, although the jitter just might. Of course the interrupts can go faster if you don't actually write any GPIO values, but that's kind-of pointless for this application. I consider that if I can't see any waveform on a scope, I've no idea what the interrupts are doing, no idea what the jitter is, and no idea whether the waveform will be useful.
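For dividers that don't fit in one byte, the same two-write pattern can be wrapped up as in the sketch below. The names pit_set_divider and stub_outb are invented for illustration. stub_outb is a recording stand-in so the code can run on a desktop, its (value, port) argument order follows the snippet above, and 0x40 is the channel 0 data port on a standard PC PIT, which is assumed to apply here. It also assumes the PIT has already been put into low-byte/high-byte access mode.

```c
#include <stdint.h>

#define PIT_CH0 0x40   /* channel 0 data port on a standard PC PIT (assumed) */

/* Recording stub in place of a real port write. */
uint8_t pit_written[2];
int pit_write_count;

void stub_outb(uint8_t value, uint16_t port)
{
    if (port == PIT_CH0 && pit_write_count < 2)
        pit_written[pit_write_count++] = value;
}

/* Load a 16-bit divider: low byte first, then high byte. */
void pit_set_divider(uint16_t divider)
{
    stub_outb((uint8_t)(divider & 0xff), PIT_CH0);
    stub_outb((uint8_t)(divider >> 8), PIT_CH0);
}
```

Calling pit_set_divider(7) produces the same pair of writes as the two outb lines above: 7 followed by 0.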

The Software

Assuming you can get interrupts or some kind of periodic call from somewhere, you may be interested in my UART transmitter code, which looks like this:

    // States we can be in
    #define STATE_BIT0 0
    #define STATE_BIT1 1 // defines for bits 1-7 not used directly, only here to aid understanding
    #define STATE_BIT2 2
    #define STATE_BIT3 3
    #define STATE_BIT4 4
    #define STATE_BIT5 5
    #define STATE_BIT6 6
    #define STATE_BIT7 7
    #define STATE_STOP_BIT 8
    #define STATE_START_BIT 9
    #define STATE_IDLE 10

    static u8 state = STATE_IDLE;
    static u8 xmit_char = 0;
    static u32 tx_pin = GPIO_LED;

    // 256-byte buffer, deliberate wrap.
    static u8 tx_buffer[0x100];
    static u8 tx_buffer_head=0;
    static u8 tx_buffer_tail=0;

    // Call this function at the baud rate.
    void softuart_tx_update()
    {
        switch (state)
        {
        case STATE_IDLE: // check for characters in the buffer
            if (tx_buffer_head!=tx_buffer_tail)
            {
                xmit_char = tx_buffer[tx_buffer_head];
                tx_buffer_head++;
                state = STATE_START_BIT;
            }
            return;
        case STATE_START_BIT:
            gpio_set_value(tx_pin, 0); // pull low for start bit
            state = STATE_BIT0;
            return;
        case STATE_STOP_BIT:
            gpio_set_value(tx_pin, 1); // pull high for stop bit
            state = STATE_IDLE;
            return;
        default: // Must be transmitting data
            gpio_set_value(tx_pin, (xmit_char & (1<<state))? 1 : 0);
            state++; // onto the next bit, or stop bit.
        }
    }

    void softuart_tx_init(u32 pin)
    {
        if (pin) tx_pin = pin;
        state = STATE_IDLE;
        gpio_output(tx_pin, 1); // normally high
        tx_buffer_head = 0;
        tx_buffer_tail = 0;
    }

    void softuart_tx_put( u8 ch )
    {
        // Protect reading of the head, since the IRQ routine can change it
        asm volatile("cli");
        u8 head = tx_buffer_head;
        asm volatile("sti");

        // Only we change the tail, so that's OK to read
        u8 tail = tx_buffer_tail;
        tail++;
        if (tail==head) return; // if it would overflow the buffer, do nothing.

        // we own the tail, so we can use it to add the char
        tx_buffer[tx_buffer_tail] = ch;

        // go ahead and add to the xmit buffer.
        // protect the tail
        asm volatile("cli");
        tx_buffer_tail++;
        asm volatile("sti");
    }

    void softuart_tx_puts(const char* text )
    {
        while (*text)
        {
            softuart_tx_put(*text);
            text++;
        }
    }

There is not really much there. Remember that at 150MHz clock speed it's not like programming an AVR: the instructions execute quickly and it's only the IO port access that's slow, so it would be fine to do this in C++. In a kernel module or LXRT task the asm statements won't be necessary. The code implements a 255-byte transmit buffer (the array is 256 bytes, but one slot always stays empty so that 'full' can be told apart from 'empty'). Data to be written are added at the 'tail', and it transmits from the 'head'. If the transmit buffer is full the code silently drops the character.
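The wrap-around behaviour of the buffer can be tried out on its own, away from any interrupts. The sketch below uses the same head/tail scheme with 8-bit indices, but the names (ring_put, ring_get, ring_fill_count) are invented for illustration and the cli/sti protection is left out because nothing here runs in an interrupt handler.

```c
#include <stdint.h>

uint8_t ring[0x100];
uint8_t ring_head;   /* next slot to read  */
uint8_t ring_tail;   /* next slot to write */

/* Returns 1 if the byte was stored, 0 if the buffer was full. */
int ring_put(uint8_t ch)
{
    uint8_t next = (uint8_t)(ring_tail + 1);   /* wraps 255 -> 0 */
    if (next == ring_head)
        return 0;
    ring[ring_tail] = ch;
    ring_tail = next;
    return 1;
}

/* Returns 1 and fills *ch if a byte was available, else 0. */
int ring_get(uint8_t *ch)
{
    if (ring_head == ring_tail)
        return 0;
    *ch = ring[ring_head];
    ring_head++;                               /* wraps naturally */
    return 1;
}

/* Keep adding bytes until the buffer refuses one; return how many fitted. */
int ring_fill_count(void)
{
    int n = 0;
    while (ring_put((uint8_t)n))
        n++;
    return n;
}
```

Starting from an empty buffer, ring_fill_count() returns 255, not 256. Because the indices are 8-bit and the array has exactly 256 entries, no explicit modulo is needed.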

Maximum transmit speed

It turns out you can pump characters out of the GPIO at 115200 baud, although that's pretty close to the limit of what you can do in an x86 interrupt handler. Minicom gave the odd garbage character and the scope waveform was a little bit off but the pl2302 cable coped fine. The PIT timer divider value was 9 for this, very close to my experimental minimum of 7.

Scope capture of the transmitted letter 'H' (ascii 0x48) @115200 baud

NB: 'H' gets sent LSB first. After the start bit (the first low transition) come 0-0-0-1 (LS nibble 8) and then 0-0-1-0 (MS nibble 4). Bit 9 is the stop bit, which is always high. Checking the timing: according to the scope display (10 uS per division) 9 bits get sent in 78 uS. By calculation, at 115200 baud 9 bits should take (1000000/115200)*9 = 78.125 uS, so pretty much spot on.
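The bit order and the timing sum above can be checked with a couple of small functions. This is just a sketch for working things out on a desktop; uart_frame_bits and uart_bits_time_us are names made up for the purpose and are not part of the transmitter code.

```c
#include <stdint.h>

/* Fill bits[0..9] with the line levels for one 8N1 frame:
   start bit (low), eight data bits LSB first, stop bit (high). */
void uart_frame_bits(uint8_t ch, uint8_t bits[10])
{
    int i;
    bits[0] = 0;                                  /* start bit */
    for (i = 0; i < 8; i++)
        bits[1 + i] = (uint8_t)((ch >> i) & 1);   /* data, LSB first */
    bits[9] = 1;                                  /* stop bit */
}

/* Time taken by 'nbits' bit periods at 'baud', in microseconds. */
double uart_bits_time_us(unsigned nbits, unsigned baud)
{
    return (1000000.0 / (double)baud) * (double)nbits;
}
```

For 'H' (0x48) the frame comes out as 0, 0-0-0-1, 0-0-1-0, 1, matching the scope trace, and nine bit periods at 115200 baud work out to 78.125 uS.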

Maximum transceiver speed

I noticed some software UART code on the net needed a receiver clock speed 3x the baud rate, so I decided to see what happens when I increase the interrupt frequency 3x but add in my own frequency divider to only call my transmit routine every third interrupt. This would mirror more closely the situation in real-life where you have a transceiver setup. Even dropping to 57600 baud rate this drops the PIT interval to 6... "Danger Will Robinson!!!". Yup, with this configuration it falls apart and you get garbage characters. For this to work reliably you have to drop to 38400, and then it's back to reliable transmission.

Conclusion

We've managed a transmit-only UART at 115200, and the indication is that a transceiver can run at 38400. However, little work has been done on the effects of other tasks running in the background, the effects of interrupts from other devices (the network and/or USB), or DMA operations (network in particular). Since 38400 is not a commonly used baud rate I suspect I'll standardise on 9600 to give some leeway. That is plenty for AVR projects which don't use a crystal; in fact I think you normally have to drop to 4800 if you want reliable comms with an AVR running off its internal oscillator, which is just the kind of project this technique would be targeting.

The Receiver

Theory

The theory is that you sample at three times the bit rate to detect the transition for the start bit. Then, once you know you're in the start bit you wait for 1/3 the width of one bit, to ensure you're somewhere vaguely near the centre of that start bit. Then, you can sample all the rest of the bits, including the stop bit at the bit frequency. So you only need to sample at 3x the frequency when you're in the idle state - when you're actually receiving bytes it's similar to the transmitter.
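The sampling scheme can be tried out on a desktop by feeding it an array of line levels taken at 3x the bit rate. The sketch below is not the receiver itself, only a simulation of the idea, and decode_3x and encode_3x are names invented for it. In the simulation the falling edge lines up exactly with a sample, so the second sample of each bit is the middle one. On real hardware the edge falls somewhere between two samples, so the sampling point lands between one third and two thirds of the way through the bit.

```c
#include <stdint.h>
#include <stddef.h>

/* Decode one 8N1 character from line levels sampled at 3x the baud rate.
   Returns the byte, or -1 on a framing error or if the samples run out. */
int decode_3x(const uint8_t *samples, size_t count)
{
    size_t i = 0;
    unsigned frame = 0;   /* bit 0 = start, bits 1-8 = data, bit 9 = stop */
    int bit;

    /* Idle: look at every sample until the line goes low. */
    while (i < count && samples[i])
        i++;
    if (i >= count)
        return -1;

    /* One sample after the edge is roughly mid start bit; from there
       take every third sample, i.e. one per bit period. */
    i += 1;
    for (bit = 0; bit < 10; bit++, i += 3) {
        if (i >= count)
            return -1;
        if (samples[i])
            frame |= 1u << bit;
    }

    if ((frame & 1u) || !(frame & (1u << 9)))
        return -1;        /* start must be low and stop must be high */
    return (int)((frame >> 1) & 0xff);
}

/* Build the 3x oversampled line for one character, preceded by 'idle'
   high samples. 'out' must have room for idle + 30 entries.
   Returns the number of samples written. */
size_t encode_3x(uint8_t ch, size_t idle, uint8_t *out)
{
    size_t n = 0, k;
    int bit;

    for (k = 0; k < idle; k++)
        out[n++] = 1;
    for (bit = 0; bit < 10; bit++) {
        uint8_t level;
        if (bit == 0)
            level = 0;                                 /* start bit */
        else if (bit == 9)
            level = 1;                                 /* stop bit */
        else
            level = (uint8_t)((ch >> (bit - 1)) & 1);  /* data, LSB first */
        for (k = 0; k < 3; k++)
            out[n++] = level;
    }
    return n;
}
```

Encoding a character with encode_3x and passing the result to decode_3x gives the character back; forcing the stop bit samples low makes decode_3x report a framing error instead.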

Selecting Pins

This was a bit of a no-brainer. Since the LED was used for the output, it made sense to use the button for the input. There is no need for a pull-up resistor, since the pl2302 uart output has to pull high to conform to the 'standard' (is TTL rs232 a standard? Anyhow, you know what I mean).

The code

The code steals the character buffer code from the transmitter, just changing the cli/sti 'locks' a little. It shifts all the bits (start, data and stop) into a u16 variable, and when it thinks it has everything it verifies that the start and stop bits have the correct values, then copies the delimited data to the receive buffer.

    /*
    Stuff relating to the soft uart receiver. This should be polled at 3x the baud rate
    Copyright (c) Bifferos.com
    */

    #include "types.h"
    #include "stdio.h"
    #include "softuart_rx.h"
    #include "gpio.h"

    #define STATE_START_BIT 0
    #define STATE_BIT0 1 // defines for bits 0-7 not used directly, only here to aid understanding of state machine
    #define STATE_BIT1 2
    #define STATE_BIT2 3
    #define STATE_BIT3 4
    #define STATE_BIT4 5
    #define STATE_BIT5 6
    #define STATE_BIT6 7
    #define STATE_BIT7 8
    #define STATE_STOP_BIT 9
    #define STATE_IDLE 10

    // state for the state machine
    static u8 state = STATE_IDLE;

    // The pin to use for sampling
    static u32 rx_pin = GPIO_BUTTON;

    // 16-bit variable to hold the received bit pattern.
    // It's larger than 8 bits so it can hold the start and stop bits.
    static u16 rx_sample = 0;

    // 256-byte buffer, deliberate wrap.
    static u8 rx_buffer[0x100];
    static u8 rx_buffer_head=0;
    static u8 rx_buffer_tail=0;

    // Add the character to the receive buffer or just discard it if the buffer is full.
    void AddToBuffer(u8 ch)
    {
        u8 head = rx_buffer_head;
        u8 tail = rx_buffer_tail;
        tail++;
        if (tail==head) return; // if it would overflow the buffer, don't receive char

        // we own the tail, so we can use it to add the char
        rx_buffer[rx_buffer_tail] = ch;

        // go ahead and add to the receive buffer.
        rx_buffer_tail++;
    }

    static u8 freq_divider = 0;

    // Call this function at 3x the baud rate.
    void softuart_rx_update()
    {
        u8 val = 0;

        // If we are sampling the line for the start bit, check that first
        if (state == STATE_IDLE)
        {
            val = gpio_get_value(rx_pin);
            if (val) return; // it's still high, so keep waiting

            // Just went low, start sampling the signal on the next clock pulse
            state = STATE_START_BIT;
            rx_sample = 0;
            freq_divider = 2; // ensure the sampling kicks in on the very next call of this function
            return;
        }

        // divide the frequency by three, we're sampling now.
        if (freq_divider>=2)
        {
            freq_divider = 0; // continue on and attempt to receive a bit.
        }
        else
        {
            freq_divider++;
            return;
        }

        // Get the pin value, and store the sample
        val = gpio_get_value(rx_pin);
        if (val)
        {
            rx_sample |= (1<< state);
        }

        // Deal with the end-of-character state
        if (state == STATE_STOP_BIT)
        {
            // val == stop bit, that needs to be high
            // rx_sample lsb is the start bit, must be low.
            // If the start bit is not low or the stop bit is not high, ditch the sample: framing error.
            if (val && !(rx_sample & 1))
            {
                rx_sample >>= 1; // ditch the start bit, which is the LSB
                AddToBuffer(rx_sample & 0xff); // ditch the stop bit, now bit 8.
            }

            // strictly speaking we should not return to idle here, as a start bit before the end of the stop bit
            // would indicate a framing error. I think here we allow start bit before the end of the stop bit, but in practice this
            // is only going to cause problems if the transmitter is sending garbage anyhow.
            state = STATE_IDLE;
            return;
        }

        // Proceed to the next sampled bit.
        state++;
    }

    // Setup the receiver pin and character buffer
    void softuart_rx_init(u32 pin)
    {
        if (pin) rx_pin = pin;
        state = STATE_IDLE; // wait for a start bit (0 would mean STATE_START_BIT here)
        rx_sample = 0;
        gpio_input(rx_pin);
        rx_buffer_head = 0;
        rx_buffer_tail = 0;
    }

    // Check if there's a character in the buffer and if so return true, copying the character into ch.
    int softuart_rx_getch( u8* ch )
    {
        // we don't own the tail, so protect our read.
        asm volatile("cli");
        u8 tail = rx_buffer_tail;
        asm volatile("sti");

        // check for characters in the buffer. Only we change the head.
        if (rx_buffer_head==tail) return 0;

        // There's one there, get it
        *ch = rx_buffer[rx_buffer_head];

        // Only we change the head, but the ISR reads it, so inc must be atomic.
        asm volatile("cli");
        rx_buffer_head++;
        asm volatile("sti");

        return 1; // character to return.
    }

I kept the calling code simple for now. Here's the 'user' code. It just echoes back any received characters:

    while (1)
    {
        if (softuart_rx_getch(&input_key))
        {
            softuart_tx_put(input_key);
        }
    }

And here's how the interrupt handler calls the tx and rx routines, dividing the frequency for the transmitter but not for the receiver. It can all be sped up if we integrate the tx and rx side more, but I want to keep them distinct for now, as I can see many applications where you only want one or the other.

    static u8 freq_divider = 0;

    static void timer_callback(registers_t regs)
    {
        if (freq_divider>=2)
        {
            softuart_tx_update();
            freq_divider = 0;
        }
        else
        {
            freq_divider++;
        }
        softuart_rx_update();
    }

Crank it up

I wanted to try 57600, which shouldn't work, but it seems the transmitter functions correctly. The receiver gets some garbled characters; I guess it's just sampling too slowly to catch all the bits.

This is me typing 'the quick brown fox'.

We need to drop down to 38400 before the character echoing starts to work again.

Conclusion

Once the transmitter details were worked out the receiver turned out to be the easy part. This is probably because making the interrupts work correctly, and figuring out how far you can push the timing was all done as part of the transmitter work. I've used test clips, gingerly prising the boards apart to connect them to the pins, but there's no reason you couldn't make an adapter with some veroboard and connect to the extra UART that way.

Test subject with TX and RX on LED and button respectively. The top left clip is to get a 3.3v pull-up for the TX. The pull-up resistor is off-picture.