Your assignment turn in will consist of a write up along with code pushed to Gitlab.
The purpose of this assignment is to (a) familiarize you with the code base that we will use for the other two assignments and (b) familiarize you with how programming languages are implemented. The code base is large and probably a bit intimidating. However, the problems do not require you to understand every line of the CaCL implementation. They do require that you touch a bit of most parts of the implementation, so you will have to understand the overall architecture of a language implementation. (This is intentional.)
Buggy Code - Gilbert wrote CaCL in a fever dream where he thought it would be cool if everyone could hack on some custom weirdo language for this course. There are some deliberate learning benefits to the way CaCL was designed. AND Gilbert put a lot of time into trying to add useful features and test the code. BUT there is a very high likelihood that the CaCL implementation has a lot of trivial and deep bugs hidden in it. Please report these bugs! We will take this into account when grading, so please do not stress out too much if you run into such bugs.
Create your GitLab repository fork of CaCL. Clone a local version of the code and confirm that you can get all the tests to pass. Congratulations!
Recall our discussion of let bindings from class. Specifically, please recall the formal semantics for let-bindings that we looked at
In this part of the assignment, you will be given a formal definition of an alternative let-binding semantics. We will call this new construct plet. It differs subtly from the standard let-binding
Doing this may require you to change the majority of files in the cacl/ directory. Hint: Look for everywhere in the source code that a let node in an IR is defined or a let case is handled in a pass. You should be able to get 90% of the way there by copying and pasting. Getting 100% of the way there will require you to think critically about which parts of the intepreter/compiler actually need to be modified.
You will find an existing test file tests/a1/test_plet.py. It contains two example tests that demonstrate some less obvious features of the pytest framework. However, please look at other tests in the tests/dev/ directory or at a pytest tutorial to get a better sense for how this testing framework works and can be used.
Create a test or pair of tests consisting of two programs that are identical except for substituting plet for let, and for which the result of evaluating these two programs is different.
Note: For credit here you must have one such distinguishing test or pair of tests. However, we recomend writing more tests to help ensure that your implementation is correct. If you changed compiler.py or c_target.py, make sure to write a compiler test (i.e. using assert_compile or check_compile_match). If you did not change those files, then don't worry. In general, it is important to make sure the interpreter and compiler do not diverge!
Write up an intuitive description in English of the difference between the two let-binding constructs. Which do you think is better? Can you give pros and cons for each choice?
If you inspect the CaCL implementation you will see that I have defined special IR nodes for And(...) and Or(...). However, if you look in cacl/builtins.py you will see that there are builtin functions (b_and ...) and (b_or ...). This seems awfully redundant. Is there a subtle difference, or is this a trick question???
Aha, so you think they're not equivalent. Can you prove it to me? We should be able to use the same technique from subproblem 1.2. Write a test that demonstrates the two constructs produce different results. Put this test inside a file named tests/a1/test_and_or.py
Aha, so you think they're equivalent. Can you prove it to me? Actually, you need to take CSE505 to learn how to do that. Ok, let's do something more practical. Make a proposal for how we should simplify the CaCL implementation by removing the IR nodes And(...) and Or(...). (Why is it generally better to remove IR nodes over builtins?) Make an English-language argument for why removing all of the code specific to And(...) and Or(...) will be safe and doesn't do anything fundamentally different than the builtins.
Your version of CaCL comes equipped with two for loop constructs:
(gen-vector (i in size) = body)
The first of these is kind of like a functional map operation. For each integer 0 ≤ i < size, we evaluate the expression body to a value of some type T. The whole gen-vector then evaluates to a {Vector T}, containing the results of each loop iteration.
(for (i in size) carry (x = init) = body)
The second form is more like a functional fold operation. To start, set the "carry variable" x to the result of evaluating init. Then, for each integer 0 ≤ i < size, set x to the result of evaluating body. The whole for then evaluates to the final value of x. (note: CaCL also supports executing for without a carry variable, but that detail is unimportant for the purposes of this problem)
In this problem, you will extend CaCL with a third basic looping construct that is sometimes known in functional programming by the name scan.
(scan-vector (i in size) carry (x = init) = body)
Scan will execute just like the for loop. However, the result of the overall scan-vector operation will be to produce a {Vector T} collecting all evaluations of body, rather than returning only the final value of x/body. (note: x, init, and body must all have type T)
For example, (scan-vector (i in 4) carry (x = 0) = (+ x (get-vec V i)) will produce a vector of length 4 containing all partial sums of vector V. If V = (vector 2 - 3 10 42), then the result of this scan-vector call will be (vector 2 -1 9 51).
If you wish, you may choose to write a formal semantics specification for scan-vector. You don't need to do this, but it may help you reflect if you are stuck or keep running into problems. If you choose to do this, please include it in the writeup so that we can better assess your understanding.
This is similar to sub-problem 1.1. You will need to make local modifications to a majority of the files in the cacl/ directory. Again, this is a kind of mash-up of already existing code. However, you will need to think critically about the correct way to reconcile or differ from the existing looping constructs.
In this testing sub-problem, you're not trying to prove that two things are different. Rather, we're asking you to make a good faith effort to test your implementation. We will keep secret some wrong implementations of scan-vector, so you will need to think critically about how to detect bugs in other people's implementations of scan-vector.
To get full credit for this sub-problem, you will need to include at least 3 tests in file tests/a1/test_scan_vector.py: tests for correct type-checking, tests for correct interpreted behavior and tests for correct compiled behavior. (Look up the @pytest.fixture named check_compile_match in tests/conftest.py and its use in tests/dev/ for a convenient way to test whether the interpreter and compiler agree.)
To make this interesting, we will also run your tests on other groups' implementations. You will get bonus points if your tests find bugs in other groups' implementations. We recommend trying to come up with at least 3 distinct tests of correct interpreted behavior (these usually require the most thought).
It's possible to do computations on multi-dimensional arrays using only vectors. (e.g. x[i][j] can be reduced to x[i*n_col + j]) However, it might be more convenient to have a proper tensor type. Design and add a Tensor type to CaCL, similar to Ref or Vector. We recommend something like {Tensor 2 Float} for a matrix or {Tensor 3 Float} for a 3-tensor.
You should provide tests and an explanation of your design in addition to your implementation.
This will take a substantial amount of time to do, so only attempt it if your group is able to quickly tackle the basic problems.
It would be very helpful to have some kind of builtin Map data-structure (like a Python dict or C++ Map). To keep this simple, let's assume we are only interested in maps that use Strings as keys. So something like {StringMap T} where T is an arbitrary type that you are mapping strings to.
As with bonus problem 4, you should provide tests and an explanation of your design in addition to your implementation.
This will take a substantial amount of time to do, so only attempt it if your group is able to quickly tackle the basic problems.
It would be cool if you could implement some program of real interest in CaCL. It would probably be useful to everyone on future assignments. Here are some ideas of programs that I'd like to write but haven't gotten around to. (None of them are too complicated)
Matrix Multiplication using vectors (or tensors)
Can you think of an interesting (numeric) computation using tree data structures?
Image Processing: how about a simple 3x3 blur
A particle simulation of some kind e.g. a gravitational simulation, or maybe a mass-spring system
Inverse Kinematics for a linear chain robot (maybe just 2d to start)
Other ideas are welcome; check with Gilbert if you're unsure
CaCL would be a lot more useful if we could easily move data between it and Python. In particular, we'd be able to use the existing Python support for plotting graphs or reading and writing data from and to all kinds of formats.
Can you design a foreign function bridge between CaCL and Python? In particular, it would be good if it's possible to use CaCL to compile C++ code and execute those sub-routines from Python.