Full Credit: Optimizations and Heuristics
Your final deliverable SHOULD improve upon this naïve algorithm. Your goal is to minimize the total number of cycles that the allocate code uses. You will optimize your allocator by avoiding unnecessary memory operations for certain values. There are three conditions that will result in a value in a register not needing to be stored to memory on a spill: it may be rematerializable, already spilled, or clean. Modify your allocator to take advantage of these kinds values and improve the resulting code.
A rematerializable value is the result of a loadI instruction. Instead of spilling and reloading a rematerializable value, the allocator SHOULD load the constant with another loadI instruction. This replaces 6 cycles of work with 1.
A respilled value is the result of a load instruction from the area of memory reserved for spilling. This value still resides in memory, and so does not need to be stored. It SHOULD be reloaded directly from the existing memory location. This replaces 6 cycles of work with 3.
A clean value is the result of some load instruction from user memory that meets two conditions. First, you MUST be able to determine the location it was loaded from. Second, you MUST verify that the location will not be modified (dirtied) by a store before the next use. Assume that every store dirties every clean value. When you can ensure both conditions, you SHOULD replace 6 cycles of work with 3. Be sure you do not sacrifice correctness for efficiency! You SHOULD keep your allocator linear (or n log(n)) in the number of instructions, and thus SHOULD ensure that all information you need to spill or allocate a line of code is already available.
Once you have optimized spills, you SHOULD reconsider how you choose which registers to spill. In the naïve algorithm, you always choose the register with the furthest next use. When all spills were equal, this was clearly a good heuristic. Now that clean spills are quicker than dirty spills, this is not always the case. In fact, determining the optimal choice of which registers to spill is NP-complete. Sometimes it will be better to always pick the cheapest, clean value. Other times it is better to pick the furthest value, even if it is dirty. You must develop a heuristic that gives you good results. Test your heuristic widely on the sample blocks, using a variety of number of registers.
While you MAY attempt to optimize memory usage of your spills, you will receive no credit for doing so. The critical metric is the run-time of the compiled code, and secondary metrics are the run-time and memory usage of the compiler. The memory usage of the compiled code is not important for this project.
Your code MUST compile on the janus machines, using a Makefile in the project directory, to an executable named in that directory. Your program MUST implement the following command line arguments when specified by the short flags. You SHOULD support the long flags as well. You SHOULD use a library, such as getopt, to handle these flags. Your default for this version SHOULD be the -a flag. Aside from the default behavior, your code MUST be backwards-compatible with Component 1.
-a --alloc Run the allocator algorithm on the input block of code. By default, pretty-print the results.
-k num Use num registers when allocating the code.
-l --lexer Print a list of the tokens, one per line. You should print both the token type and the specific lexeme.
-p --pretty-print Print legal ILOC code. You do not have to preserve white-space or comments.
-t --table-print Print the intermediate representation in tabular form.
-h --help Print a help summary and exit.
-d --debug Print debugging information during execution. All debugging information SHOULD be printed as a comment.
If the -k flag is not provided, you SHOULD use a default value. You MAY add an optional parameter to -p, specifying if the pretty-printer prints source, virtual, or physical registers. If you do, you MUST update the -h flag accordingly. Unless the -h flag is used, the final argument MUST be filename, an input file containing an ILOC block. If the -a flag is provided, or if none of -p, -l, or -t are provided, then your output MUST be an equivalent block of ILOC code that uses the registers 0..k-1.
For the tabular output, you SHOULD format the output into well-spaced columns, using separators such as | between elements of your IR. All outputs MUST be printed to stdout. Error messages for invalid arguments or an invalid file SHOULD be reasonable and MUST be printed to stderr. You MAY forbid the combination of -l with other print flags. Error messages for invalid arguments or an invalid file SHOUDL be reasonable and printed to stderr.
You may include other command-line arguments or flags for your program, but SHOULD leave the -s flags free.
The workspace repository has a directory blocks that contains a a number of sample blocks. Each block has a comment with a sample input and expected output when simulated (see below).
Further the following are available in the directory thcAllocator
A folder outputs/ that contains sample outputs for some blocks.
alloc_ref: A reference allocator to compare your results with.
alloc_o0: A reference allocator that does no optimization.
sim: A simulator. Running ./sim -h will display details, but the usual invocation is ./sim < file or ./tipc -p file | sim.
The -i nums flag lets you initialize memory. The first number is the starting location, the every number after that populates 4 bytes. The -r NUM flag allows you to set the number of registers.
TestAlloc: A script to test your implementation. The normal use is ./TestAlloc <allocator> <directory> <k>
TestTiming: A script to test the cycles used in the output of an allocator. The normal use is ./TestTiming <allocator> blocks/report/
test_opts: An even more basic script to test optimizations. Just run it on your allocator.
You MUST write a lab report for this component. The report SHOULD be 3-5 pages, typeset, and contain the following sections.
Provide a brief description of the problem. Your report SHOULD make sense to someone from outside the class.
Provide a high-level description of the register allocation algorithm you implemented. You MAY use pseudo-code if it is helpful, but SHOULD NOT focus on describing the naïve algorithm. Instead, emphasize differences between your algorithm and the one described in the book. In specific, you SHOULD touch on:
The key design decisions you made when implementing your allocator. Do not include a detailed description of every field, class, and method. Only describe those that are important to your high-level algorithm.
Any significant tables that you used, which did not occur in the pseudocode above.
Any optimizations you implemented, and a high-level description of how they were implemented.
How your allocator chooses which register to spill.
Any approximations or heuristics your algorithm uses, and why they are reasonable.
The asymptotic complexity of your allocation algorithm, using big O notation.
Discuss your experimental results and the conclusions you drew about the efficiency of your allocator. You MUST use tables or graphs to describe your results, and SHOULD choose whichever one is more informative to your reader. You SHOULD:
Compare the results of your allocator and the reference allocator. Measure the number of cycles reported by the ILOC simulator for both allocators on each report block, and each value of k from 3,4,8,12, and 16. Be sure to list the number of cycles used by the original code. This can be done with TestTiming.
Compare the running time of your allocator and the reference allocator. Measure the running time of both allocators on each of the timing blocks, for k=15. Each timing block has a successively larger number of lines. If your allocator is less efficient, consider which design decisions might account for the difference. Consider the consistency of the timing results and your complexity analysis in the Methods section. This can be done with the linux time command.
Discuss any blocks your allocator failed on, and the relevant values of k. This may be failing to run, producing a block that does not run on the simulator, or a block that generates the wrong output when run on the simulator.
Describe what you learned while implementing your allocator. This is the only sections where you SHOULD discuss the process behind your decisions, instead of the decisions themselves. You MAY discuss the following:
How did your plans change as your implementation progressed?
Did your choices for Component 1 impact this component?
If you had to start over, what would you do differently?
What advice would you give to other students attempting this component?
While this project is not graded, you may use the following rubric to judge the importance of different aspects.
40 points - Correctness of the code produced by your allocator.
30 points - Implementing optimizations and improving heuristics.
5 points - Rematerialization optimization.
10 points - Already-spilled value optimization.
10 points - Clean value optimization.
5 points - Improving on the default spill heuristic.
25 points - Final report
5 points - Interface follows the specification.
5 points (bonus) - The relative ranking of your improved heuristic to the reference implementation and other students.