Lesson 12: The data memory

Because you initially chose a Harvard-type architecture, your processor does not yet have a data memory, although everything needed to integrate one is already in place. Once it is added, you can write programs that work on large data structures stored in this memory. The last feature we will implement together is function calling. As with the branch instructions, it relies on a data transfer, this time between the program counter and a return stack, which saves and restores the return address and carries the parameters passed to the function.

Required knowledge

Memory addressing, the notion of a stack, function call and return, and parameter passing.

Objectives

In this final lesson, you will instantiate a RAM component, which is quite simple with ISE. You can then change your FSM to support two new instructions, READ and WRITE. These two instructions use the Ram and Rdm registers, which have not been used so far.

Finally, to complete the S3 processor, you must make it able to manage function calls with or without parameter passing. Here again you will intervene at the register transfer level, on transfers towards the program counter. A stack will be instantiated that lets you push and pop the branch address during a call, as well as the data exchanged between the calling and called functions.

The memory function

ISE creates the memory symbol for you; you then simply place it in S3.sch.

Open the project built in lesson 10. Create a new source of type IP core and name it datamem.

Figure 128 datamem IP core creation

Once validated, choose the generator Basic Elements > Memory Elements > Distributed Memory Generator. Clicking Next and then Finish opens the IP specification screen.

Figure 129 IP generation

Simply choose a size of 2048 words of 16 bits in Single Port RAM mode, then click Generate to generate the IP.

Figure 130 datamem Specification

The symbol is now available. Instantiate it to the left of the Ram register. Rename the output buses of Ram and Rdm to ADRmem(15:0) and DATmem(15:0) respectively. Connect the clk input to the clk signal, the input d(15:0) to DATmem(15:0), and the input a(10:0) to ADRmem(10:0) via a Bus Tap. Place a wire on the we input and name it WE.

Figure 131 datamem Connection

The Rdm register can be loaded either from this data memory or from the data bus of the S3 processor. As you did for loading the CO register, use a mux2x16 whose output O(15:0) is connected to the input D(15:0) of the Rdm register. Connect input D0 to the spo output of the datamem component and input D1 to the data bus bus_data of the S3 processor. There are now two load commands for Rdm: the first is the B2Rdm signal, the second a new signal RE (Read Enable). Place an OR2 on these two signals; its output directly drives the CE of Rdm. Finally, the B2Rdm signal controls the mux2x16 through its S0 input, so the bus is selected when B2Rdm is active and the memory output otherwise.

Figure 132 The datamem memory fully connected
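The load path of Rdm described above can be checked with a small behavioral sketch. It is written in Python rather than drawn as a schematic, and the function name rdm_next is only an illustration; the signal names are the ones used in the text.

```python
def rdm_next(rdm, spo, bus_data, b2rdm, re):
    """Value held by Rdm after a rising clock edge (illustrative model).

    rdm      : current 16-bit content of Rdm
    spo      : word presented by datamem on its spo output
    bus_data : word present on the processor data bus
    b2rdm    : 1 when the FSM requests a bus to Rdm transfer
    re       : 1 when the FSM requests a memory read
    """
    ce = b2rdm | re                  # OR2 driving the CE input of Rdm
    d = bus_data if b2rdm else spo   # mux2x16: S0 = B2Rdm, D1 = bus, D0 = spo
    return (d & 0xFFFF) if ce else rdm
```

With neither command active the register keeps its value; RE alone loads the memory word, and B2Rdm loads the bus word.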

READ / WRITE instructions

The semantics of these two instructions are simple:

    • READ: Ram contains an address. READ triggers a read of datamem at this address; the result is stored in Rdm.
    • WRITE: Ram contains an address and Rdm contains data. WRITE triggers the write of this data to datamem at the specified address.

Since the memory is internal to the processor, these two instructions take only one processor cycle.
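To make the one-cycle behavior concrete, here is a minimal Python model of the two instructions. It assumes that the generated memory reads combinationally on spo and writes on the rising clock edge when WE is active; the class DataMem and the helper execute are hypothetical names used only for this sketch.

```python
class DataMem:
    """Behavioral model of a 2048 x 16-bit single-port RAM (assumed timing)."""

    DEPTH = 2048

    def __init__(self):
        self.words = [0] * self.DEPTH

    def spo(self, adrmem):
        # Only ADRmem(10:0) reaches the address input a(10:0).
        return self.words[adrmem & 0x7FF]

    def clock(self, adrmem, datmem, we):
        # Write on the rising edge when WE is active.
        if we:
            self.words[adrmem & 0x7FF] = datmem & 0xFFFF


def execute(mem, ram, rdm, instr):
    """Return the content of Rdm after one cycle of READ or WRITE."""
    if instr == "WRITE":
        mem.clock(ram, rdm, we=1)
        return rdm                 # Rdm is unchanged by a WRITE
    if instr == "READ":
        return mem.spo(ram)        # Rdm is loaded with the word read
    raise ValueError(instr)
```

For example, after execute(mem, 2, 0x00FF, "WRITE"), a READ with Ram equal to 2 returns 0x00FF. Because only 11 address bits are wired, an address such as 0x0802 reaches the same word.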

You will need to change your FSM!

As you have done from the beginning, add the output ports that will carry the control signals. Here there are two new signals, WE and RE. I propose to anticipate and add two more, push and pop. Also add the four associated internal signals WE_i, RE_i, push_i and pop_i.

In the process Next_output, initialize the four internal signals. Then add a branch in the IF of the state chargement to decode these two instructions. I suggest the hexadecimal codes 7000 for READ and 7100 for WRITE.
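The decoding step can be pictured with the following Python sketch. It is not the VHDL of fsm.vhd: it only mirrors the idea of giving the internal signals a default value and then setting one of them when a code is recognized. Matching the whole 16-bit word is an assumption of this sketch, and decode_mem is a hypothetical name.

```python
READ_CODE = 0x7000    # code suggested in the text for READ
WRITE_CODE = 0x7100   # code suggested in the text for WRITE


def decode_mem(ri):
    """Return the internal signals (WE_i, RE_i) for the instruction word ri."""
    we_i, re_i = 0, 0     # default values, as initialized in Next_output
    if ri == READ_CODE:
        re_i = 1
    elif ri == WRITE_CODE:
        we_i = 1
    return we_i, re_i
```

Any other instruction word leaves both signals at 0, so the memory and Rdm are not disturbed by the existing instructions.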

You still have to complete the process Synchro to drive these signals onto the output ports of your FSM.

Once the fsm.vhd file is saved, you can recreate the symbol associated with fsm.vhd and update it in S3.sch. Add four wires on the new ports WE, RE, push and pop and give them the same names. The other outputs do not change.

Figure 133 The fsm with the new ports

Update your assembler S3asm.bat to support these new instructions. Just add two lines for them, for example after the ALU codes.

A test program

I suggest you write a program that reads three 8-bit words from the switches, stores them in memory at addresses 0, 1 and 2, and then shows them one by one, in the same order, on the display.

With the assembler you can produce the testmem.coe file, place it in insmem, regenerate insmem, update it in S3, save S3.sch, generate the corresponding symbol and then update it in toplevel. But by now you know how to do all this... Now it is your turn to test this program on the card once toplevel.bit is produced, or to simulate it if you have not yet bought your card.

An autonomous stack

Usually the stack is stored in the data memory, and a special register serves as a stack pointer into a reserved area of this memory. A PUSH instruction then requires a transfer from this register to Ram, a transfer of the register holding the data to be pushed into Rdm, and then a WRITE. You can work out the corresponding actions for a POP. To keep single-cycle execution, I propose to build a mini-stack of 4 locations outside the data memory, with its own access instructions.

First you will build a stack pointer that can be incremented or decremented on the rising edge of the clock.

You already built a counter in lesson 4 (Figure 51). Here the counter must change only when an increment or decrement command is active, and it must remain synchronous with the clock: use an FDE flip-flop instead of an FD (read Symbol Info). At each rising edge of the clock, the stack pointer keeps its value when neither INC nor DEC is active, increases by one when INC is active, and decreases by one when DEC is active, wrapping around on 2 bits.

Create a new schematic type source and name it incdec2. Instantiate two FDE flip-flops. Create a wire clk with its I/O Marker and connect it to the C inputs of both flip-flops. Then place two wires inc and dec with their I/O Markers and feed them to an OR2 gate; the output of this gate drives the CE of both FDE flip-flops, enabling them when either signal is active. This already gives you the identity on Q0 and Q1 when neither inc nor dec is active. In both cases, inc or dec, the flip-flops are enabled by CE. Q0 is simply inverted in both cases: connect the Q output of the left flip-flop to its D input through an inverter INV. For Q1 you need an XOR2 gate on the outputs Q0 and Q1. Then connect the D input of the second flip-flop to the XOR2 result for an increment, or to its inverse for a decrement. This IF is built with two AND2, an INV and an OR2.

Place two wires Q0 and Q1 on the Q outputs of the respective flip-flops and place an I/O Marker on each of these wires.

Figure 134 2 bits synchronous Up / down counter

Once saved, you can generate the corresponding symbol.
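Before moving on, the gate network of incdec2 can be checked with a short Python model. The sketch assumes that the IF on the second flip-flop is wired as (inc AND xor) OR (dec AND NOT xor), which uses exactly two AND2, one INV and one OR2; the function name incdec2_next is an illustration.

```python
def incdec2_next(q1, q0, inc, dec):
    """Next state (q1, q0) of the 2-bit up/down counter after a rising edge."""
    ce = inc | dec                     # OR2 on the CE inputs of both FDE
    if not ce:
        return q1, q0                  # flip-flops hold their value
    d0 = q0 ^ 1                        # INV: Q0 toggles in both directions
    x = q0 ^ q1                        # XOR2 on Q0 and Q1
    d1 = (inc & x) | (dec & (x ^ 1))   # two AND2, one INV, one OR2
    return d1, d0
```

Stepping this function from each of the four states shows that inc gives the next value and dec the previous one, modulo 4. The case where inc and dec are active together is not meant to occur.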

You will now create a stack of depth 4. Create a new source of type schematic named stack. Place four 16-bit registers using the symbol already used for the processor registers: FD16CE. Connect a single I/O Marker named clk to all C inputs, and another I/O Marker named a(15:0) to all D(15:0) inputs.

The stack pointer always points to the next free location, so during an active push the write must go to the register selected by the value of the stack pointer. For this, use a D2_4E decoder from the Decoder library. Depending on the outputs Q0 and Q1 of the stack pointer incdec2, one and only one register is enabled. The decoder itself is active only when the push signal is active. Place three input wires: name Q0 the wire connected to A0, Q1 the one connected to A1, and push the one connected to E. Connect the four outputs D0 .. D3 to the CE of the corresponding registers.

Now place an instance of your incdec2 component under the decoder. Place three input wires: clk on clk, push on inc and pop on dec. Place two I/O Markers on push and pop. Pull two wires named Q0 and Q1 from the outputs. Here is what your construction should look like now.

Figure 135 Push on the stack

For the pop part, remember that the stack pointer points to the next empty cell of the stack. You must therefore select the output of one of the four registers while taking this offset of one into account. As for the ALU, build a tree of mux2x16 multiplexers; only three are needed here, and the last one produces the output b(15:0), on which you place an I/O Marker. The truth table for this selection tree is easy!

An inverter on Q0 and an XOR2 on Q0 and Q1 are enough. Use them to control the multiplexers of the first level and the multiplexer of the second level respectively.

Figure 136 The stack

You can save and generate the corresponding symbol.
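The behavior of the complete stack can be summarized by the Python sketch below. The class Stack4 and the function top_select are hypothetical names. The select bits are computed here as the pointer value minus one, feeding the inverted Q0 into the XOR; the schematic of Figure 136 may obtain the same selection differently, for example through the order of the multiplexer inputs.

```python
class Stack4:
    """Behavioral model of the 4-entry stack; sp points to the next free cell."""

    def __init__(self):
        self.regs = [0, 0, 0, 0]   # the four FD16CE registers
        self.sp = 0                # value of (Q1, Q0) from incdec2

    def b(self):
        # Output seen on b(15:0): the register just below the pointer.
        return self.regs[(self.sp - 1) % 4]

    def clock(self, a, push, pop):
        # Rising edge: on a push, D2_4E enables the register selected by sp
        # and the pointer moves up; on a pop the pointer moves down.
        if push:
            self.regs[self.sp] = a & 0xFFFF
            self.sp = (self.sp + 1) % 4
        elif pop:
            self.sp = (self.sp - 1) % 4


def top_select(q1, q0):
    """Select bits (s1, s0) of the multiplexer tree, equal to sp - 1 modulo 4."""
    s0 = q0 ^ 1
    s1 = q1 ^ s0
    return s1, s0
```

After pushing 0x11 and then 0x22, b() shows 0x22; one pop later it shows 0x11. The value on b(15:0) is read during the cycle of the pop, before the pointer is decremented on the clock edge.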

Integration into the S3 processor

The stack must receive its input data from the data bus, so it is connected after the last register (RI). It must also put data on this bus, so simply include it in the cascade of connecteur16. I suggest that you place a stack instance to the right of register RI. Delete the bus connecting the connecteur16 instances CCO and CRI. Place a new connecteur16 above your stack and name it Cpile. Connect Dout(15:0) of CCO to Din(15:0) of Cpile, then Dout(15:0) of Cpile to Din(15:0) of CRI: the cascade is rebuilt.

Connect the stack output b(15:0) to the input R(15:0) of the Cpile connector. The stack input a(15:0) is connected to the bus output Dout(15:0) of CRI. You still have to add a wire named clk on clk, a wire named push on push, and a wire named pop that connects the pop input of the stack to Cpile.

Figure 137 Integration of the stack

The last step is to add the new instructions that manage this stack. This is done in your FSM. You have already added the necessary ports and internal signals. What you still need to do is decode the instructions.

I propose to integrate three instructions: PUSH, POP and PUSHI. They follow the same coding as MOV and MVI:

    • PUSH: stores a register on the stack. I suggest the code x80s0, where s is the source register number (1..F).
    • POP: removes the top value of the stack and stores it in a register. If the destination register of the POP is CO, then, as for MOV, the next instruction is still executed before the jump takes effect (delayed branch). I suggest the code x900d, where d is the destination register number (1..F).
    • PUSHI: stores a 16-bit immediate on the stack. As for MVI, this immediate is built from the 8 bits RI(11:4) of the instruction, extended on the left with zeros. I suggest the code xAvv0, where vv is the least significant byte of the immediate.

In the process Next_output, simply decode the codes of these three instructions, validate the source or destination, and then assert the push or pop signal.
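The field layout of the three suggested codes can be illustrated with a Python sketch. It assumes that the instruction is recognized from its most significant hexadecimal digit only; the function name decode_stack and the dictionary keys are hypothetical and do not come from fsm.vhd.

```python
def decode_stack(ri):
    """Decode PUSH, POP and PUSHI from a 16-bit instruction word.

    Returns a dict of illustrative fields, or None for other opcodes.
    """
    opcode = (ri >> 12) & 0xF
    if opcode == 0x8:                  # PUSH  x80s0
        return {"op": "PUSH", "src": (ri >> 4) & 0xF, "push": 1, "pop": 0}
    if opcode == 0x9:                  # POP   x900d
        return {"op": "POP", "dest": ri & 0xF, "push": 0, "pop": 1}
    if opcode == 0xA:                  # PUSHI xAvv0
        imm = (ri >> 4) & 0xFF         # RI(11:4), zero-extended to 16 bits
        return {"op": "PUSHI", "imm": imm, "push": 1, "pop": 0}
    return None
```

For example, the word 0x8030 decodes as a PUSH of register 3, 0x9005 as a POP into register 5, and 0xA2A0 as a PUSHI of the immediate 0x002A.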

Finally save the file and regenerate the symbol associated with fsm.vhd. Update higher hierarchical levels (S3 and toplevel).

Votre processeur S3 est complet !!! (Your S3 processor is complete; in French in the text ;-)

You have to change your assembler to take advantage of these instructions easily. Here are three lines to add. PUSH and PUSHI, like all instructions, are always coded on 16 bits, but the simplicity of SED requires you to add the missing 4 least significant bits by hand. Any value works for these 4 bits because there is no destination register; the easiest is to put a zero on the right (see the following example for the syntax).

Here is a small program to test these new instructions. You have already written a program that calculates the sum of two 8-bit numbers entered in two steps on the switches. Here is add16.S3, a program that reads two 16-bit numbers in four steps and calculates their sum for display. With S3asm.bat, generate the file add16.coe and set it in insmem as the configuration file of this memory. After updating the higher hierarchical levels, you can re-synthesize the toplevel.bit file.

To test it, enter four 8-bit words separated by presses on the pause button. The result of the sum is then displayed.

This code executes the same sequence of instructions twice. You can test a new version, add16V2.S3, that uses the concept of a function. This function reads a 16-bit number and places it in register R1. By calling it twice and adding a few register transfers, you can perform the addition and display the result.

Assemble, store in insmem, update the hierarchy levels, synthesize and run on the card or simulate with Isim.

The last code that I propose passes the return value on the stack itself. Simply push this value with a PUSH just after the POP on CO; thanks to the delayed branch, this PUSH is still executed before the return takes effect. In the calling program, a POP then retrieves the data so you can do whatever you want with it.

Your turn

Question 1

Add a function that displays a 16-bit number passed to it on the stack. Use this function to display the result of the addition.

Question 2 (difficult)

Using the DEC and MUL functions that you developed in Lesson 9, and adding to the ALU and to your assembler a new function RDM (RanDoM) that returns a 16-bit random number (Linear Feedback Shift Register), write a program that calculates the number PI with a Monte Carlo method.