Joseph's Blog About VHDL and WINLAB

Hello everyone,

Thanks for stopping by this blog. I will be posting about my experience and ongoing learning at WINLAB (Wireless Information Network Laboratory). For a good idea of what WINLAB is, do not take my word for it, take theirs: http://www.winlab.rutgers.edu/about/Index.html

I was fortunate enough to be in this internship with, and to have been told about it by, the now college freshman Nick Cooper.

I am not really great at coding, so this internship is a great chance to change that and to broaden my knowledge of computers from the gate and hardware level I am more comfortable with to software as well. The language I will be trying to learn is called VHDL (VHSIC (Very High Speed Integrated Circuit) Hardware Description Language). The program I'll be using to interpret this code is called Vivado, which is produced by the IC and FPGA giant Xilinx. (It is free to download, but you need a beefy system to run it: a minimum of 8GB of RAM and a good quad-core CPU.)

These rather demanding hardware specs posed an issue, as I am not going to buy a good laptop until I go off to college, so I only have a weak old dumpster-dive MacBook to run programs and code on. The solution to this problem came through a great form of remote computing called SSH port forwarding. SSH stands for Secure Shell, and it allows very secure remote access to any system that permits it, in this case the ORBIT Lab (Open-Access Research Testbed for Next-Generation Wireless Networks; I am not sure how that makes ORBIT).

The ORBIT Lab is hosted inside the WINLAB building, and at its full glory it boasted 400 cutting-edge machines in unheard-of form factors, all hanging from the ceiling with various radio antennas coming off of each one. Those standards were cutting edge in 2003, though, so that meant a single core at 1GHz and half a gigabyte of RAM. Since then the count has dropped to around 200 functional nodes, but for the most part every system has been upgraded to current-generation parts; some even sport six-core Xeons and other fine hardware.

Here is the page for the ORBIT Lab: http://www.orbit-lab.org/. While your web browser might cry about unsafe access, the certificate is just old, and the site is perfectly safe.

Back to SSH. The beauty of it is that while my trashbook cannot possibly run Vivado, the beefier nodes in ORBIT can run it easily. So with the power of SSH I can enter a node through a secure connection and then run Vivado on the node instead of on the MacBook.

The SSH gurus among the readers will note, though, that this alone would not work in actuality. Why? Because while SSH gives me control of the node, the only thing I see on my end is a terminal screen full of whatever commands I am using to control the node, plus anything the node tells me in text, not in an image.

The fix for this comes with a great second piece of technology called X11 forwarding. (It seems Matt is already onto these numbers being literally substandard... look how low the 1s are...) X11 forwarding works in parallel with SSH to allow a GUI, a usable interface, to appear on the accessing person's computer and thus allow full control through that. (With OpenSSH, this is typically requested by adding the -X flag when connecting, as in ssh -X user@node.)

So in other words, with X11 forwarding enabled, I can see a Vivado window on my screen and use it just like local Vivado, without overheating my MacBook or running out of RAM. The drawback is that you need a decent internet connection and there is always some lag, but when one is typing code, low FPS and poor response time are not huge issues.

Now that I have explained how I can use Vivado, and what VHDL is, let's go over some things you can do with VHDL.

As it is a hardware description language, it can describe hardware... Now I know that sounds like a real wow, but that is the beauty of it. I can describe a small breadboard system or I can describe a CPU, and I can be vague or go all the way down to the gate level. I can represent a transistor with an if statement, or, by changing the parameters, represent RAM reading and writing. All that power with surprisingly few rules.
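To give a flavor of that, here is a minimal sketch (the entity name and ports are made up for illustration, not taken from any project code) of a one-bit storage element, the seed of RAM, described with an if statement:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ONE_BIT_REG is
    Port (CLK, WE, D : in BIT;
          Q : out BIT);
end ONE_BIT_REG;

architecture Behavioral of ONE_BIT_REG is
begin
    process (CLK)
    begin
        if CLK'event and CLK = '1' then   -- rising clock edge
            if WE = '1' then              -- only store when write enable is high
                Q <= D;
            end if;
        end if;
    end process;
end Behavioral;

A few if statements like this and the simulator behaves like a memory cell; drop the clock and write enable and the same style can model much simpler switching behavior.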

After reading a few books and trying to think like a VHDL genius, I decided I was going to give myself a challenge by describing a 4-bit adder (this was about two weeks in or so). But to make it fun, I wanted to do it pure iSTEM style (iSTEM is a great class, you all must take it) and describe it at the gate level.

The code I wrote goes as follows:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ADDER is
    Port (A1, A2, A3, A4, B1, B2, B3, B4, CIN : in BIT;
          SUM1, SUM2, SUM3, SUM4, COUT : out BIT);
end ADDER;

architecture Behavioral of ADDER is
begin
    SUM1 <= CIN XOR (A1 XOR B1);
    SUM2 <= ((A1 AND B1) OR (CIN AND (A1 XOR B1))) XOR (A2 XOR B2);
    SUM3 <= ((((A1 AND B1) OR (CIN AND (A1 XOR B1))) AND (A2 XOR B2)) OR (A2 AND B2)) XOR (A3 XOR B3);
    SUM4 <= ((((((A1 AND B1) OR (CIN AND (A1 XOR B1))) AND (A2 XOR B2)) OR (A2 AND B2)) AND (A3 XOR B3)) OR (A3 AND B3)) XOR (A4 XOR B4);
    COUT <= (((((((A1 AND B1) OR (CIN AND (A1 XOR B1))) AND (A2 XOR B2)) OR (A2 AND B2)) AND (A3 XOR B3)) OR (A3 AND B3)) AND (A4 XOR B4)) OR (A4 AND B4);
end Behavioral;

First off, I know those expressions are a nested mess; I learned how to structure code so it is easier to read later on...

The first two lines pretty much set the standard for how I am going to describe the adder. I am telling the software to make everything in the IEEE STD_LOGIC_1164 package available to my code. You can use other libraries, but I never really found a situation where I needed to.

Then I name the entity (the thing I am making) ADDER (very original, I know) and say which ports are in the design and what the ports do. As this entire design is at the gate level, I did not unify all the As into one bus, since each bit is processed by different gates.

The five signal assignments between begin and end describe how the adder will work.

SUM1 (the lowest value bit of the 4-bit output, plus the one overflow bit) is simply a function of A1 XOR B1, with that result then XORed with CIN (the carry in, for when adders are daisy-chained). In gate terms: A1 and B1 go through an XOR gate, the result is fed to another XOR gate along with CIN, and the output of that gate is SUM1. So we can already see that if A1 and B1 are both high, we get a low out of the first gate, and then if CIN is high, the second gate sees a 1 and a 0, which gives a 1 out. Now, those unfamiliar with an adder might wonder how three 1s make a 1, and the answer is simple: in binary, 0, 1, 2, 3 in a 2-bit system are represented as 00, 01, 10, 11. Notice how both one (01) and three (11) end in a 1? That rightmost digit is SUM1, so SUM1 works; we simply need SUM2 to work so as to find out how the second 1 in three gets there.

SUM2, however, follows a more demanding path, as its result depends on part of the logic from the first stage. Looking at SUM2, we see that one of the calculations it deals with involves CIN, A1, and B1, but really those can be understood as the logic that produces the first carry out, COUT1. Because the second carry in, CIN2, is just COUT1, one can follow the logic path all the way back to where COUT1 was computed. This is where the code gets long fast. COUT1 is the result of A1 AND B1 going to an OR gate; the other input of this OR gate is fed the result of A1 XOR B1 being ANDed (now a verb) with CIN. This OR gate outputs what is now COUT1, and therefore CIN2, and this output gets computed on top of. For SUM2, we take A2 XOR B2 and XOR that with CIN2, or in other words with everything we just did to get COUT1. If that made no sense, I apologize; I assure you it made no sense to me just three years ago.
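To make that logic path easier to follow, here is the same adder logic rewritten as a sketch with named internal carry signals. This alternative architecture, and the C1 through C4 labels, are mine, not part of the original code, but the gates are identical:

architecture WithCarries of ADDER is
    signal C1, C2, C3, C4 : BIT;   -- the carry between each stage
begin
    C1 <= (A1 AND B1) OR (CIN AND (A1 XOR B1));   -- COUT1, which is also CIN2
    C2 <= (A2 AND B2) OR (C1 AND (A2 XOR B2));
    C3 <= (A3 AND B3) OR (C2 AND (A3 XOR B3));
    C4 <= (A4 AND B4) OR (C3 AND (A4 XOR B4));

    SUM1 <= CIN XOR (A1 XOR B1);
    SUM2 <= C1 XOR (A2 XOR B2);
    SUM3 <= C2 XOR (A3 XOR B3);
    SUM4 <= C3 XOR (A4 XOR B4);
    COUT <= C4;
end WithCarries;

Substituting the C1 line into the SUM2 line reproduces the long expression in the original code exactly.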

Going back to the missing-1 saga, we can now see that if A1, B1, and CIN are all ones, then SUM1 is high. For SUM2: since A1 and B1 are both high back in the first block, the XOR gate puts out a low, which is then ANDed with CIN. If either input of an AND gate is low, its output is low as well. So going into the OR gate where COUT1 emerges there is a low from that side, and from the other input a high, as A1 and B1 were both high, meaning the A1 AND B1 gate outputs a high. With a high and a low going into that OR gate, the output is high, as that is how an OR gate works.

Now there is a high going into CIN2, and A2 and B2 are both low here, as we are adding 1+1+1 via A1, B1, and CIN. The first XOR gate for A2 and B2 thus outputs a low, meaning the next gate along gets a high from CIN2 and a low from A2 XOR B2, which results in a high out of it as well, since it too is an XOR gate. So... SUM2 gives us a 1. A one might not seem exciting, but it is, because if we now look at SUM1 and SUM2 as a combined output, we get 11, which in binary is the 3 we were looking for!

Looking at SUM1 and SUM2 gives us a picture of how a two-bit binary full adder works. The nice part is that, since an adder consists of identical blocks chained together by the previous block's COUT, going to a 3-bit or 64-bit adder is pretty simple in the sense that it follows a very fixed pattern.
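To show just how fixed that pattern is, here is a hedged sketch (not code from the internship; the entity name, the generic, and the use of buses are made up) of an N-bit adder where the same full-adder stage is stamped out with a generate loop:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ADDER_N is
    generic (N : integer := 4);   -- change N for a wider adder
    Port (A, B : in BIT_VECTOR(N - 1 downto 0);
          CIN  : in BIT;
          SUM  : out BIT_VECTOR(N - 1 downto 0);
          COUT : out BIT);
end ADDER_N;

architecture RTL of ADDER_N is
    signal C : BIT_VECTOR(N downto 0);   -- carry chain, C(0) is the carry in
begin
    C(0) <= CIN;
    STAGES: for i in 0 to N - 1 generate
        SUM(i)   <= A(i) XOR B(i) XOR C(i);
        C(i + 1) <= (A(i) AND B(i)) OR (C(i) AND (A(i) XOR B(i)));
    end generate;
    COUT <= C(N);
end RTL;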

Where does Vivado come into play here? Great question. For starters, an interpreter is needed for the VHDL code to be tested, and Vivado also happens to have what is effectively an omniscient oscilloscope built in, which is a fantastic tool. Were I to run the above code without this tool and the system did not behave in the expected manner, I would be left with a correct input and an incorrect output, meaning the issue lies somewhere in the mess of code wires I created. On a breadboard it is possible to use a logic probe to find a problem, following signals through the system until something incorrect is noticed. With Vivado it is possible to simultaneously check every defined input and output listed in the code and watch them change relative to the signals going in over time. In other words, it is akin to having a logic probe on every single input and output, and being able to look backward or forward in time to see what is going on anywhere. What does this look like?

Those with good glasses, eyesight, or massive monitors will notice that every input and output listed in the code is displayed. It is important to know that Vivado does not make this display by default, nor does Vivado pick random numbers to feed into the system; that is all done in the testbench code, which is, for all intents and purposes, too in-depth and annoying for any blog. A good way to sum it up without getting stuck in the briar patch is that one is effectively saying how to feed signals into the system and defining how everything in the system goes together, signal-wise. On larger projects this can involve linking several batches of code for multiple parts, which becomes exceedingly tedious and alarmingly easy to botch.
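That said, for the curious, here is a minimal sketch of what a testbench for the adder might look like. This is not our actual testbench; the entity name, the 10 ns timing, and the stimulus values are my own, though the second test mirrors the 15 + 10 + 1 case discussed below:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ADDER_TB is
end ADDER_TB;

architecture Sim of ADDER_TB is
    signal A1, A2, A3, A4, B1, B2, B3, B4, CIN : BIT := '0';
    signal SUM1, SUM2, SUM3, SUM4, COUT : BIT;
begin
    -- hook every testbench signal to the matching port on the adder
    UUT: entity work.ADDER
        port map (A1, A2, A3, A4, B1, B2, B3, B4, CIN,
                  SUM1, SUM2, SUM3, SUM4, COUT);

    STIMULUS: process
    begin
        -- 1 + 1 + carry in: expect SUM2 SUM1 = 11 (three)
        A1 <= '1'; B1 <= '1'; CIN <= '1';
        wait for 10 ns;
        -- 15 + 10 + 1: expect COUT SUM4 SUM3 SUM2 SUM1 = 11010 (twenty-six)
        A1 <= '1'; A2 <= '1'; A3 <= '1'; A4 <= '1';
        B1 <= '0'; B2 <= '1'; B3 <= '0'; B4 <= '1';
        CIN <= '1';
        wait for 10 ns;
        wait;   -- halt the stimulus process
    end process;
end Sim;

The simulator then draws the waveform display from how these signals change over time.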

Tangents aside, we can see that when A1 and B1 are high, SUM1 is low and SUM2 is high, which is indeed the binary equivalent of two. Near the end, at the yellow line, the system faces a larger test: A1, A2, A3, and A4 are high, representing a total of 15 on the A side alone. On the B side, B2 and B4 are high, for a total of 10. CIN is also high, adding one more. If the adder still works, it should tell us the answer is twenty-six. Looking at the SUM lines and COUT (which in this case is overflow that would ideally be ignored, but is here read), we see COUT is high (16), SUM4 is high (8), and SUM2 is high (2). 16 + 8 + 2 does equal 26, so the adder works!

If you thought this was already too much, then you will be glad to know that there is no way I can possibly cover the processor that Nick Cooper and I built in such detail. Instead, I'll give summaries.

After I had figured out how to use SSH and Vivado through it (Nick was at WINLAB the year before as well and thus already knew it all), we set out to try to create a working processor described in VHDL. We arrived at this decision because we were both in iSTEM that school year and really enjoyed the logic-level breadboard work, and since a simple processor can be built on a breadboard, we figured making one in VHDL would be interesting.

Things start to get interesting the more one thinks about the project, something that became very apparent as time progressed. We started with the idea of a basic calculator-style processor that could multiply, divide, add, and subtract, and then, should time allow it, solve square roots and math with exponents. The initial proposition was a 4-bit integer-only system, which led to many questions. What is the use of such a system? If you want to add 7 and 8, then you are set, as it can handle a number up to 15. If you want to divide 15 by 3 you are also set, but what about 12 divided by 7? Or 4 times 4? The first would require a measure to round to the nearest integer, which is wildly inaccurate, considering that being off by half a number is bad enough, especially when you only have 15 numbers (or 30 if you incorporate a negative sign) in the first place.

The first solution we could think of was to make the system larger; after all, being off by a half when you have 510 numbers at your disposal is not as bad. Yet once again we ran into the "is this useful?" wall. The odds of ever wanting to know the square root of, say, 234, but only down to the nearest whole number, and not being able to simply reason it out on paper, are slim to none. There had to be a way to work with fractions of numbers, or better yet, decimals. The answer was literally all around us. The staff of WINLAB who are involved in the summer internships are always willing to take time to help out interns, and so, after a weekly progress meeting/presentation (another great aspect of the internship), both an undergraduate student and a supervisor suggested we look into the floating point system, which allows work with binary decimals and scientific notation, giving the ability to do precise rounding and more complicated math such as square roots (not with much ease, nor ease in general, but one can do them, which is the big deal).

It seems a brief overview of floating point is in order...

Image from Wikipedia's article on IEEE floating point, by Codekaizen.

Simply put, floating point is scientific notation in binary. However, there are some rules. There is an automatic "1." in front of any given floating point number's digits, and the first digit of the FP (floating point) number is the sign, +/- (0 is +, 1 is -). After that, a certain number of digits are reserved for the exponent value of the number (the base is already known, so it is not written). Following that is the significand, the digits after the "1." mentioned earlier. So for 001100000001, assuming the three digits after the first 0 (which indicates a positive number) are the exponent value, we would have the base raised to the 3rd power times the significand, which thankfully in this case is one. (This is technically wrong, as there is a binary offset value that allows negative exponents: the stored exponent value is compared against an offset to determine the actual exponent. But that just makes things more abstract and more annoying until it is needed.)
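As a concrete sketch, here is how those three fields could be sliced out of a 16-bit word in VHDL. The field widths follow IEEE 754 half precision (1 sign bit, 5 exponent bits, 10 significand bits), which is an assumption on my part for illustration, not necessarily the exact layout we used:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity FP_UNPACK is
    Port (FP   : in BIT_VECTOR(15 downto 0);
          SIGN : out BIT;
          EXPO : out BIT_VECTOR(4 downto 0);
          FRAC : out BIT_VECTOR(9 downto 0));
end FP_UNPACK;

architecture Behavioral of FP_UNPACK is
begin
    SIGN <= FP(15);             -- 0 is positive, 1 is negative
    EXPO <= FP(14 downto 10);   -- stored exponent (the offset is handled elsewhere)
    FRAC <= FP(9 downto 0);     -- significand digits after the implicit "1."
end Behavioral;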

How exactly does this work? Let us look at it as a comparison between working with scientific notation in the commonly used base-10 system and in binary.

If I want to write 124,500,000 in a more condensed notation, I can use scientific notation. This turns a moderately unwieldy number into a more manageable 1.245 X 10^8, which is useful because if we have another number, 2.456 X 10^8, adding them is as simple as adding the base numbers and keeping the exponent. Now one can argue that these numbers are not that unwieldy, or that I am preaching a pointless cause, and that would be fair enough, until we get to binary...

The problem with the simplicity of binary is that, because there are only two states for each place value, place values get used up fast. I can go from 1 to 9 with standard base-ten numbers in one place value. In binary, 1 through 9 is 0001 through 1001, which as we can see uses up four place values. 255 uses up three places in base ten but eight in binary. Being able to chop these numbers into a smaller number times a base raised to an exponent is thus pretty useful. On the flip side, the way binary place values work means that exponents are a breeze to work with.

Take the number 1111.0, or 15. If I want to divide it in two, I can simply shift the number to the right and thus have 111.1, or 7.5 (half of 15). Dividing by two again yields 11.11 (binary decimal places go 1/2, 1/4, 1/8, 1/16...), or 3.75 (a quarter of 15). Likewise, 1111 times two is 11110, which is 30; to multiply by two I have simply added a zero, or in other words a place value. Going to 1111 times 4 is as simple as adding another 0: 111100, which is 60. This all works because every place value in binary doubles relative to the one on its right as one goes further left.

Where does this relate to exponential notation? 150 X 10^2 is the same as 15 X 10^3, yes? That is for base-ten numbers. For base-two numbers (binary), the base is 2, which means that going back to 15 as 1111, we can write it as 1111 X 2^0. Now, when we decide to shift the number over to the right but conserve the overall value of 15, we know that we will get 111.1 X 2^?. Well, 7.5 times 2 is 15, so we know that the ? must be 1. In other words, to express a binary number as a smaller number with a larger exponent, one shifts the number to the right relative to the decimal point and then increases the exponent by however many places the number was shifted. The opposite is done when one wants a lower exponent but a larger number to work with.
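In VHDL, that divide-by-two shift is a one-liner. Here is a hedged sketch (entity name and width made up) that shifts an 8-bit value right by one place:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity HALVE is
    Port (D : in BIT_VECTOR(7 downto 0);
          Q : out BIT_VECTOR(7 downto 0));
end HALVE;

architecture Behavioral of HALVE is
begin
    -- move every bit one place to the right, filling the top with 0;
    -- with a binary point imagined in the middle, 1111.0000 (15) becomes 0111.1000 (7.5)
    Q <= '0' & D(7 downto 1);
end Behavioral;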

However, when shifting a number around, one is limited by the constraints of the system: while an infinite number of bits for a floating point number would be great, a given system can only take so many bits. If a number has to be shifted beyond the limits of the system, the excess digits on the end must be chopped off, resulting in a loss of precision. Your computer probably runs on a 64-bit system unless it is very old. This is where we had to again change our plans, to 16-bit floating point, as 8-bit floating point is not accurate enough (with eight bits there simply is not enough space for a decent range of exponents plus enough significand digits to allow some to be lost in calculations and still have a close enough answer).

So now, hopefully, floating point makes some sense. How does this help? Unfortunately, it just leads to another problem...

Multiplying with exponents is nice and easy, but with addition we cannot add unless the numbers are of equal exponential value. Now, some might argue that the addition might be better off done in plain binary with decimals and none of this exponential stuff, but as mentioned above, space becomes an issue, and having to convert from floating point to plain binary and back would greatly slow down any system. At least binary and floating point are friendly when it comes to exponents: in order to get two numbers that must be added to the same exponential value, one can shift one of them around until the exponents are equal. This was a real challenge...
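Here is a simulation-oriented sketch of that alignment step. It is not our adder core's actual code (the names, widths, and integer exponent ports are my own simplifications), but it shows the idea of shifting the smaller-exponent number's significand right until the exponents match:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity FP_ALIGN is
    Port (EXP_A, EXP_B : in INTEGER range 0 to 31;    -- stored exponents
          SIG_B        : in BIT_VECTOR(10 downto 0);  -- significand of the smaller-exponent number
          SIG_B_OUT    : out BIT_VECTOR(10 downto 0));
end FP_ALIGN;

architecture Behavioral of FP_ALIGN is
begin
    process (EXP_A, EXP_B, SIG_B)
        variable TMP : BIT_VECTOR(10 downto 0);
    begin
        TMP := SIG_B;
        if EXP_B < EXP_A then
            -- one right shift per step of exponent difference
            for i in 1 to EXP_A - EXP_B loop
                TMP := '0' & TMP(10 downto 1);
            end loop;
        end if;
        SIG_B_OUT <= TMP;
    end process;
end Behavioral;

The crucial detail is the loop bound: it stops after exactly EXP_A - EXP_B shifts, which is precisely what our early core failed to do, as the next waveform shows.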

Here one sees the return of the magic waveforms. This is a look at the adding core attempting to shift one number relative to another so that the two can be added. However, the system is not working quite right: it simply shifts until it runs out of space and then leaves us with an absolutely meaningless number. But it is a great picture, as one can actually see the significand move up the bus and thus see it shift, making a nice staircase pattern. As one will start to notice, these waveforms become rather interpretive at this level; the overall idea is more important than the actual numbers here. Neither Nick nor I know what this early adder core was trying to add, but we still know where it was failing. Ideally one could squint for hours and find the numbers fed in, but this picture does not cover all of that, only the shifting, meaning it is for demonstration only. In this case the addition core simply was unable to tell when to stop shifting the numbers and would go until it ran out of time. After a week of madness trying to solve this, and some adjustment to the code, the result was as follows.

Here the picture focuses on the exponent (which we had to add; before, the core only dealt with the significand, leaving the exponent up to the viewer to judge) and the significand, both of which have clearly shifted once and stopped. Thus this is a working adder core; later on it would be adjusted to take negative numbers and thus allow subtraction. The multiplication/division core was never quite finished due to time limits, so there are no pictures of it. The code for the adder alone was pushing 200 complex lines and thus is not going to be shown (also, I do not have it with me at the moment).

Once we had addition and subtraction sorted out, it was imperative that the memory, or RAM, be sorted out, as we wanted to be able to add to results or add multiple numbers, and thus be able to store numbers for later use. The best way to think of the RAM is as a sheet of paper on which you can write down information while solving a problem. Space is limited, so if you need more space you will want to erase parts that are not in use. For the sake of speed, one wants to use the first available spot on the paper and write there if it is free, or erase it and then write there. Of course, as the placement is Random, one needs to be able to know what is in a cell, which in the paper situation would involve assigning labels such as "part a of problem 1" and so on. This allows one to Access the cell they need from the Memory. (See what I did there?)
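To make the paper analogy concrete, here is a hedged sketch of a generic RAM in VHDL. This is not the RAM we built (the names, the four-cell depth, and the 16-bit width are made up); it just shows the write-on-command, read-by-address idea:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity TINY_RAM is
    Port (CLK  : in BIT;
          WE   : in BIT;                      -- write enable: the "pencil down" signal
          ADDR : in INTEGER range 0 to 3;     -- which spot on the paper
          DIN  : in BIT_VECTOR(15 downto 0);
          DOUT : out BIT_VECTOR(15 downto 0));
end TINY_RAM;

architecture Behavioral of TINY_RAM is
    type MEM_T is array (0 to 3) of BIT_VECTOR(15 downto 0);
    signal MEM : MEM_T;
begin
    process (CLK)
    begin
        if CLK'event and CLK = '1' then
            if WE = '1' then
                MEM(ADDR) <= DIN;   -- overwrite whatever was in the cell
            end if;
        end if;
    end process;
    DOUT <= MEM(ADDR);              -- the addressed cell is always readable
end Behavioral;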

Instead of trying to explain the RAM we built, which is already fuzzy in my head and would probably take three times the current length of this blog to do justice, I'll show another waveform...

If looking at these is starting to make you think this is art and you want a poster for your wall, you are not alone in that thinking. Seen here is a very compressed view of a single RAM cell. The orange Us represent an undefined area (neither 0 nor 1). In the course of the loop, one can see the orange lines near the bottom turn green on command of a control signal going high at 20ns (data is written and then stored, and the cycle repeats, as the time frame was long enough).