Random Number Generation

Many things in life are random. Noise, for instance, is random. When developing software, we may find it useful at times to introduce random elements. But how do we make random elements in our code? Can an algorithm be truly random and unpredictable? What if we sampled data from something that is random and unpredictable and use that to our advantage?

In this section, we will learn about two types of random number generation:

Let's start with using the ADC for random noise. If you've flashed the basic ADC example project from the Repositories website page, you may have noticed that the x value from your joystick doesn't stay at a single number. Rather, you'll see that it rapidly fluctuates between a few numbers, all of which are very close to each other in value. Why does this happen? Because of noise.

Noise is a weak but prominent force when it comes to electrical analysis. It is usually a nuisance that prevents us from getting exact, precise values, but on a larger scale, we can usually get values that are good enough for noise to not matter (after all, we're engineers, not mathematicians). At smaller scales, though, noise has a much stronger effect. Noise has a small amplitude and a high frequency. So how do you tell apart a small, high frequency signal from noise? It's a challenge that many electrical and computer engineers may encounter in their careers as we try to optimize technology to work at smaller scales.

These effects can be seen in your joystick measurements. Have you noticed that your joystick's x value only fluctuates between numbers that are each about +/-1 apart from another? There's a reason why noise isn't changing your numbers by orders of tens, hundreds, or even thousands! Let's think about how computers understand numbers, i.e., how bits are weighted in binary.

For an 8-bit number, the MSB represents 2^7 or 128 and the LSB represents 2^0 or 1. When we do analog-to-digital conversion, our MSB will represent the bulk of the analog signal, as it can effectively divide the resolution into two regions. For example, if our ADC could convert voltages between 0V and 5V, then numbers with an MSB of 1 would effectively fall in the "upper half" of the voltage range (in other words, the voltage measurement would be between 2.5V and 5V), whereas numbers with an MSB of 0 would fall in the "lower half" of the voltage range (in other words, the voltage measurement would be between 0V and 2.5V). The next bit afterward further divides the resolution regions in half. A number starting with 00 would represent some voltage between 0V and 1.25V, a number starting with 01 would represent some voltage between 1.25V and 2.5V, a number starting with 10 would represent some voltage between 2.5V and 3.75V, etc.

What does this mean? The MSB tends to be the most stable part of a voltage reading when converting from analog to digital. The MSB can't be easily affected by noise since the MSB represents a wide variety of voltages. What about on the other side, with the LSB? Using our example of an 8-bit converter with a voltage range of 5V, our voltage resolution would be: (5V - 0V) / 2^8 ≈ 0.0195 volts per digital increment. In other words, an analog-to-digital conversion resulting in 0b1111110 (0xFE) is only 0.0195 volts less than a digital measurement of 0b11111111 (0xFF). This means that the LSB has a meager range of 0.0195V, and that's a very small amplitude. What else has a small amplitude? Noise!

Since the LSB represents the smallest, finest part of a voltage measurement, it is the most prone to noise. This is why your ADC can't get a precise analog-to-digital measurement. In the above example, the ADC was 8-bits, but the ones on the MSP432 are 14-bit, which would make the voltage resolution even smaller! So what does that mean for us? Because the LSB is so heavily affected by noise, the LSB value fluctuates so much that it becomes random and unpredictable, which is exactly what we wanted to implement.

So we have something random and unpredictable. How do we use it? The general idea is that we want to sample the LSB of the ADC as much as we need to get a random number. Let's say you wanted to emulate a coin toss. You would need to randomly generate either a 0 or a 1, or in other words, one binary bit. Simply sampling the LSB alone would give us some random number, either 0 or 1. So how do we sample the LSB? With bitmasks! Recall how we can use bitmasks to affect a single bit. Using a bitwise AND operation with a bitmask for the LSB gives us the value of the noisy LSB (and since it's the LSB, we don't need any shifts).

Generating either a 0 or a 1 is handy, but what if we want to generate more numbers? Let's say numbers between 0 and 3, so a two-bit number. In that case, we have two ADC channels to use -- one for the joystick's x channel and another for the joystick's y channel. So if you sample one, sample the other to get an extra bit! Concatenate the bits (shift one and or it with the other), and you get a two-bit number. Well, what if we need a three-bit number or even a four-bit number? What if you need an entire eight-bit number?! Nothing stops you from resampling the ADC channels. Sample bits as many times as you need to get an appropriately-sized number. There are some programming techniques needed here to you have learned them all: you need to keep a static variable to keep the accumulated/concatenated random number; you need to keep shifting what is already accumulated to the left and bring in the new sampled bit by ORing it, etc. 

Despite all the above discussions, the LSB of the joystick is not truly random and it might have some bias. Do you want to make the bit extra random? Sample bits from the x and y channels separately, then XOR them together and use that result as your random number!

Creating truly random numbers is a very difficult task. In the US, the National Institute for Standards and Technology (NIST) is in charge of creating tests that check the  "quality" of a set of random numbers. You can learn more about them here:

https://csrc.nist.gov/projects/random-bit-generation/documentation-and-software


So to summarize:

An alternative to the above method is pseudo-random number generation by using C library functions, specifically rand(). This page from GeeksForGeeks goes over how to use the rand() function in the general case and is a good reference. This section will talk about the function as it applies to our boards.

Using rand() is actually very straightforward... at a glance. rand() is a function from the C standard library, so you will need to #include <stdlib.h> in any file that needs to use rand(). Afterward, simply calling rand() will return a number. If you call rand() a few times, you'll notice that each function call returns a different number. Hurray! We've got random numbers!

... But there's a caveat. Take a note of which numbers you generated and especially which order they were generated in. When you restart the program, you'll notice that the newly-generated numbers match the previous ones exactly. That's not good; that's not even random! And this is where the flaws of pseudo-random number generation start to show.

How rand() works is that the function pulls from a predetermined list of random numbers called "seeds". Every time rand() is called, it simply iterates through a list of numbers based on the current seed. By default, rand() uses the first seed.

So how do we change that? Another function in the C standard library is srand(), which selects the seed that rand() uses. Give srand() a number, and it will change the list that rand() uses, giving you different random numbers as a result.

... But there is yet another caveat to this. If you give srand() a number that doesn't change whenever you restart the program, then srand() will keep choosing the same seed. We're back to square one!

So how do we fix that? We have to give srand() a random value. Wait, that seems backwards! Using a random number to generate a random number? Preposterous! And... it kind of is. The standard practice in most software is to use the current time as the seed for srand(), and this works because time will always be different at the start of the program, even if it isn't necessarily random in and of itself. However, with our boards, getting the time library to work is rather troublesome. Instead, if you read the first section about the ADC, you'll know that ADC readings tend to get noisy and don't quite stay at the same value each time. So a fairly simple solution is to use an ADC reading from the joystick as the seed. If you read the first section in its entirety, you may even realize that this "random" seed selection can be randomized even further by XORing readings from the joystick's x channel with readings from the joystick's y channel.

There is one more thing worth mentioning, however. According to the TutorialsPoint documentation for rand(), the function will randomly generate a number between 0 and a constant RAND_MAX whose value is guaranteed to be at least 32,767. That's a huge range! If you just want to do a coin toss, you'll numbers that are way larger than 0 or 1 by using rand(). The RAND_MAX constant can also differ across different implementations, so it's not easy to tell what the maximum value is.

So how do we limit the range of rand() and only get numbers we care about? Use the modulus operator: %! What this operator does is divide the number by the divisor you specify, then return the remainder. So 10 % 3 would result in 1, as 10 / 3 would give a quotient of 3 with a remainder of 1. So if you know you need to generate a certain range of numbers, use that range as your divisor. Going back to the coin toss example, using a modulus of 2 would give you back either 0 or 1. Similarly, if you had four numbers to randomly choose from, a modulus of 4 would give you back either 0, 1, 2, or 3.

So to summarize: