CSCI-3434 Fall 2024 - Allocator Component

In this component, you will implement a bottom-up local register allocator based on the algorithm in Ch. 13.3.2 of Engineering a Compiler. Your program will takes as input a sequence of branch-free operations, as specified in Component 1, and an integer k ≥ 3. Your program will produce as output an equivalent sequence of ILOC operations that uses at most k registers.

Do not underestimate this component. This is the hardest single assignment I give in any of my courses.

Specification

For the purposes of this component, an output block is equivalent to an input block when, for every input, they print same values in the same order; and each data-memory location defined in the execution of the input block has the same value in execution of the output block. It is permissible for output block to define additional memory locations that hold spilled values.

Your allocator MUST only perform register allocation. No other optimization is allowed. While you could execute some blocks and replace them with a series of output and store operations, this is forbidden. The output block MUST contain all of the arithmetic operations, in the same sequence, as the input block. The goal is to focus on the problems that arise in a local register allocator, not to capitalize on the simplified form of the ILOC test blocks. You MUST NOT use hashtables for this component.

Before submitting your program, test it thoroughly on all provided blocks.

Core Component: Naïve Allocator

You MUST extend tipc to support register allocation. As before, your input is a ingle basic block, consisting of instructions from the same subset of ILOC as The Lexer/Parser Component. You SHOULD use the lexer, parser, and intermediate representation you built in that component, so be sure to fix any parsing or lexing errors: they will count as failures for this component. You SHOULD update your intermediate representation to use a struct or class for each operand, so you can store the different registers allocated at different stages of the process. You SHOULD store the original index of each instruction, which MAY be the line number in the source file or MAY simply be a count of instructions in the source file.

The new component of tipc is the bottom-up local register allocator, described below and drawn from Figure 13.1, page 687, of Engineering a Compiler. This proceeds in two steps.

For each instruction, you MUST compute the next use of each operand. Along the way, you SHOULD rename the registers, giving a unique virtual register to each live range using Algorithm 1: Compute Last Use. This helps separate the definition of a register from a use of a register. Finally, you SHOULD compute the maximum number of concurrently live registers by keeping track of the number of valid entries in the table.
To allocate the registers, use the allocation algorithm detailed in Algorithm 2: Local Register Allocation. This will require you to spill some register values into memory.
- Note that each integer requires four bytes.
- Memory locations 0 through 32767 are user memory, and are reserved for use by ILOC input blocks.
- Locations starting at 32768 in data memory are reserved for the register allocator. Your allocator can safely store spilled values into these locations.
- You SHOULD add additional tables to enable GetPR() to be constant in the number of instructions: this means you SHOULD NOT iterate over any table indexed by virtual instructions.

You SHOULD use different fields to store the source register, virtual register, and allocated physical register of each operand, along with all necessary annotations like next use. The algorithm here use the fields SR, VR, PR, and NU for the source register, virtual register, physical register, and next use value. These algorithms assume each instruction has three operands. Modifying them to work with the full range of instructions is left as an exercise for the implementer.

As a reminder, a store r1 => r3 instruction does not define a register. The second operand, r3, is a use, even though it occurs on the right of a =>. You MUST to treat both operands as uses. You SHOULD store the second operand in the second column of your intermediate representation.

You MAY add annotations to your intermediate representation to note the original index, if an instruction is a spill or restore instruction, or other useful debugging information. If printed during pretty-printing, these annotations MUST be printed as comments.

Algorithms

Full Credit: Optimizations and Heuristics

Your final deliverable SHOULD improve upon this naïve algorithm. Your goal is to minimize the total number of cycles that the allocate code uses. You will optimize your allocator by avoiding unnecessary memory operations for certain values. There are three conditions that will result in a value in a register not needing to be stored to memory on a spill: it may be rematerializable, already spilled, or clean. Modify your allocator to take advantage of these kinds values and improve the resulting code.

A rematerializable value is the result of a loadI instruction. Instead of spilling and reloading a rematerializable value, the allocator SHOULD load the constant with another loadI instruction. This replaces 6 cycles of work with 1.
A respilled value is the result of a load instruction from the area of memory reserved for spilling. This value still resides in memory, and so does not need to be stored. It SHOULD be reloaded directly from the existing memory location. This replaces 6 cycles of work with 3.
A clean value is the result of some load instruction from user memory that meets two conditions. First, you MUST be able to determine the location it was loaded from. Second, you MUST verify that the location will not be modified (dirtied) by a store before the next use. Assume that every store dirties every clean value. When you can ensure both conditions, you SHOULD replace 6 cycles of work with 3. Be sure you do not sacrifice correctness for efficiency! You SHOULD keep your allocator linear (or n log(n)) in the number of instructions, and thus SHOULD ensure that all information you need to spill or allocate a line of code is already available.

Once you have optimized spills, you SHOULD reconsider how you choose which registers to spill. In the naïve algorithm, you always choose the register with the furthest next use. When all spills were equal, this was clearly a good heuristic. Now that clean spills are quicker than dirty spills, this is not always the case. In fact, determining the optimal choice of which registers to spill is NP-complete. Sometimes it will be better to always pick the cheapest, clean value. Other times it is better to pick the furthest value, even if it is dirty. You must develop a heuristic that gives you good results. Test your heuristic widely on the sample blocks, using a variety of number of registers.

While you MAY attempt to optimize memory usage of your spills, you will receive no credit for doing so. The critical metric is the run-time of the compiled code, and secondary metrics are the run-time and memory usage of the compiler. The memory usage of the compiled code is not important for this project.

Implementation Details

Interface

Your code MUST compile on the janus machines, using a Makefile in the project directory, to an executable named in that directory. Your program MUST implement the following command line arguments when specified by the short flags. You SHOULD support the long flags as well. You SHOULD use a library, such as getopt, to handle these flags. Your default for this version SHOULD be the -a flag. Aside from the default behavior, your code MUST be backwards-compatible with Component 1.

-a --alloc Run the allocator algorithm on the input block of code. By default, pretty-print the results.
-k num Use num registers when allocating the code.
-l --lexer Print a list of the tokens, one per line. You should print both the token type and the specific lexeme.
-p --pretty-print Print legal ILOC code. You do not have to preserve white-space or comments.
-t --table-print Print the intermediate representation in tabular form.
-h --help Print a help summary and exit.
-d --debug Print debugging information during execution. All debugging information SHOULD be printed as a comment.

If the -k flag is not provided, you SHOULD use a default value. You MAY add an optional parameter to -p, specifying if the pretty-printer prints source, virtual, or physical registers. If you do, you MUST update the -h flag accordingly. Unless the -h flag is used, the final argument MUST be filename, an input file containing an ILOC block. If the -a flag is provided, or if none of -p, -l, or -t are provided, then your output MUST be an equivalent block of ILOC code that uses the registers 0..k-1.

For the tabular output, you SHOULD format the output into well-spaced columns, using separators such as | between elements of your IR. All outputs MUST be printed to stdout. Error messages for invalid arguments or an invalid file SHOULD be reasonable and MUST be printed to stderr. You MAY forbid the combination of -l with other print flags. Error messages for invalid arguments or an invalid file SHOUDL be reasonable and printed to stderr.

You may include other command-line arguments or flags for your program, but SHOULD leave the -s flags free.

Infrastructure

The workspace repository has a directory blocks that contains a a number of sample blocks. Each block has a comment with a sample input and expected output when simulated (see below).

Further the following are available in the directory thcAllocator

A folder outputs/ that contains sample outputs for some blocks.
alloc_ref: A reference allocator to compare your results with.
alloc_o0: A reference allocator that does no optimization.

sim: A simulator. Running ./sim -h will display details, but the usual invocation is ./sim < file or ./tipc -p file | sim.
The -i nums flag lets you initialize memory. The first number is the starting location, the every number after that populates 4 bytes. The -r NUM flag allows you to set the number of registers.
TestAlloc: A script to test your implementation. The normal use is ./TestAlloc <allocator> <directory> <k>
TestTiming: A script to test the cycles used in the output of an allocator. The normal use is ./TestTiming <allocator> blocks/report/
test_opts: An even more basic script to test optimizations. Just run it on your allocator.

Lab Report

You MUST write a lab report for this component. The report SHOULD be 3-5 pages, typeset, and contain the following sections.

Problem Statement

Provide a brief description of the problem. Your report SHOULD make sense to someone from outside the class.

Methods

Provide a high-level description of the register allocation algorithm you implemented. You MAY use pseudo-code if it is helpful, but SHOULD NOT focus on describing the naïve algorithm. Instead, emphasize differences between your algorithm and the one described in the book. In specific, you SHOULD touch on:

The key design decisions you made when implementing your allocator. Do not include a detailed description of every field, class, and method. Only describe those that are important to your high-level algorithm.
Any significant tables that you used, which did not occur in the pseudocode above.
Any optimizations you implemented, and a high-level description of how they were implemented.
How your allocator chooses which register to spill.
Any approximations or heuristics your algorithm uses, and why they are reasonable.
The asymptotic complexity of your allocation algorithm, using big O notation.

Results

Discuss your experimental results and the conclusions you drew about the efficiency of your allocator. You MUST use tables or graphs to describe your results, and SHOULD choose whichever one is more informative to your reader. You SHOULD:

Compare the results of your allocator and the reference allocator. Measure the number of cycles reported by the ILOC simulator for both allocators on each report block, and each value of k from 3,4,8,12, and 16. Be sure to list the number of cycles used by the original code. This can be done with TestTiming.
Compare the running time of your allocator and the reference allocator. Measure the running time of both allocators on each of the timing blocks, for k=15. Each timing block has a successively larger number of lines. If your allocator is less efficient, consider which design decisions might account for the difference. Consider the consistency of the timing results and your complexity analysis in the Methods section. This can be done with the linux time command.
Discuss any blocks your allocator failed on, and the relevant values of k. This may be failing to run, producing a block that does not run on the simulator, or a block that generates the wrong output when run on the simulator.

Experience

Describe what you learned while implementing your allocator. This is the only sections where you SHOULD discuss the process behind your decisions, instead of the decisions themselves. You MAY discuss the following:

How did your plans change as your implementation progressed?
Did your choices for Component 1 impact this component?
If you had to start over, what would you do differently?
What advice would you give to other students attempting this component?

Rubric

While this project is not graded, you may use the following rubric to judge the importance of different aspects.

40 points - Correctness of the code produced by your allocator.
30 points - Implementing optimizations and improving heuristics.
- 5 points - Rematerialization optimization.
- 10 points - Already-spilled value optimization.
- 10 points - Clean value optimization.
- 5 points - Improving on the default spill heuristic.
25 points - Final report
5 points - Interface follows the specification.
5 points (bonus) - The relative ranking of your improved heuristic to the reference implementation and other students.