Your assignment turn in will consist of a write up along with code pushed to Gitlab.
The purpose of this assignment is to force you to confront the concrete details and challenges of implementing reverse-mode automatic differentiation. The original intention was that you would only have to touch an isolated part of the compiler. That is no longer true...
(More) Buggy Code - Gilbert has implemented Forward-mode AD for CaCL. However, this implementation is likely buggy in both small and larger ways. Regardless, the modifications should give you some inspiration for how to get started implementing reverse-mode.
Learn from the Journey - Because there is less starter code than intended, this assignment will probably be harder than expected. That's ok. We will be lenient in grading. On the other hand, this is an excellent opportunity to learn by doing something that's really challenging. Consistent with this philosophy, please take the opportunity to have more design discussions on the EdStem boards. It is not generally ok to share code with other groups, but I strongly encourage you to have design discussions. When you have those discussions, please try to do them on the forum so everyone can participate more equally (and not just e.g. people who happen to be hanging around the PLSE lab)
Update your Gitlab repository by pulling from the reference repository. Confirm that you can run the existing tests in A2. If we are able to get additional tests and updates out in the middle of the assignment, please also confirm that you have pulled those and that the tests run ok.
Recall in class that we discussed a higher-order function that applies a function repeatedly to some argument (expressed in Python-esque pseudo-code below)
def dodo( f : A -> A, x : A ) -> A:
return f(f(x))
Recall that we can design a continuation-based reverse-mode AD system in which the adjoint derivative types are defined to obey the following equations. (Note that we introduce the type U as a parameter to resolve the question "what does the continuation return?")
Then our adjoint total derivative for a function f: A -> B can be defined in terms of the adjoint total differential of dtf: A -> B -> A. (Note that we assume here that the argument Dx : A separates into two components x : A and dtx : A -> U
What is the adjoint derivative D^T_U[dodo]? Give your answer using typed pseudo-code, similar to the original function.
Consider the following function that makes use of dodo.
def test( x : Float, r : Float ) -> Float:
let f = (fun (y) = y * (y - r))
return dodo(f, x) - r * r * r * r
Express the gradient of this function test as a program that invokes the adjoint derivative of dodo.
Explain how we could try to find an optimal value of the parameter r, s.t. (assuming a constant x) the value of test(x,r) is minimized.
(bonus, numeric methods) What about this test function is likely to behave poorly? What is a reformulation of this minimization problem (e.g. by modifying test or the optimization procedure) that will find the same answer but exhibit better numeric behavior?
Hint: the answers you give for this problem ought to be consistent with the answers you would get if you merely inlined all of the functions to eliminate higher-order functions altogether. This is a good way to sanity check your work.
Implement reverse-mode AD in CaCL, and provide tests.
Update: There is now substantially more code infrastructure to help you.
There is one test (insufficient) of reverse mode in tests/a2/test_norm2.py However, this test illustrates the use of reverse-mode syntax. Some of the functions it calls are defined in tests/a2/conftest.py such as (grad1) and (run-tape). You need to implement these functions.
In addition to implementing those functions, you will also need to implement reverse mode in file cacl/derive.py I have written what should be all of the necessary front-end and type-checking code already. (The typechecking code is nasty.) You can find definitions of a few built-in types at the bottom of cacl/types.py
Note that I have not provided you with extensive information about how we plan to test your implementation. You will find that the current tests do not adequately test forward mode. In particular, there is poor checking of (a) derivatives of Variant types and handling of (b) stateful mutating code.
Recall our discussion of property-based testing. What is at least one useful property of reverse-mode AD that is worth testing? There is a "right answer" here, and you should describe why it's the right answer in your writeup. You may additionally identify other properties to test, but you should at least identify the most important one.
Once you've identified this property, write code to test this property for at least all 3 of griewank_1, lighthouse, and norm2. You may find it helpful to factor out some redundant code in the process.
Note that I have not provided you with extensive information about how we plan to test your implementation. You will find that the current tests do not adequately test forward mode. In particular, there is poor checking of (a) derivatives of Variant types and handling of (b) stateful mutating code.
Note that I have not provided you with extensive information about how we plan to test your implementation. You will find that the current tests do not adequately test forward mode. In particular, there is poor checking of (a) derivatives of Variant types and handling of (b) stateful mutating code. I probably got one or more things wrong.
Try to write tests that do a better job of exercising this code. If you find bugs, try to fix those bugs! If the bugs are really terrible and you think everyone needs to know about them, you can always ask me for advice on whether to post it.