GSoC '21 - Halide

Rewrite Rules Evaluation - Evan Lee

Deliverable: A subset of the ~4000 auto-synthesized rewrite rules merged to Halide compiler

Final Code (pull request): https://github.com/halide/Halide/pull/6174

Brief Summary:

  • Experiments from this paper https://dl.acm.org/doi/pdf/10.1145/3428234 showed that adding 4127 autogenerated rewrite rules lowers Halide's peak memory usage by up to 50% in 195 use cases. However, adding all those rules is impractical due to increased code size and compile time. The goal of this project was to identify the rules that resulted in those performance wins.

  • The first step I took was to modify the Halide template matching code for rewrite rules to print out the rules that get matched in the pipeline. In the process I fixed a minor error associated with Halide debugging: https://github.com/halide/Halide/pull/6088

  • With the 4127 rewrite rules enabled, I ran all the apps https://github.com/halide/Halide/tree/master/apps and saved the rewrite rules that got triggered during the pipelines. Then, just with those rules, I ran the experiment https://github.com/halide/Halide/blob/super_simplify/apps/super_simplify/run_experiment.sh which measures various performance metrics including peak memory usage, compile time, etc. However, it had no meaningful performance improvements.

  • The next step I took was to extract the rewrite rules that get triggered when running the apps with random seeds generated by the experiment. This time, I was able to reap the performance gains very similar to what was achieved in the research paper. As a result, I was able to narrow down the rules to ~300 rules.

  • I took a divide & conquer approach to testing the ~300 rewrite rules. I tried numerous approaches, such as keeping on diving the subset by half, and using randomized selection algorithms. At the end, to our surprise, I was able to narrow down the rules to 11 rewrite rules, whose associative & commutative variants were added in this PR https://github.com/halide/Halide/pull/6174. With just these rules, Halide achieves 10% ~ 50% peak memory reductions in 192 cases in apps including camera_pipe, harris, nl_means, and stencil_chain, which is similar to the results (with 4127 rules) from the research paper.

I was able to finish the project early (1 month remaining), so I decided to take on a second project as a stretch-goal. In the project, I am working on a code generation tool which produces valid C++ code (if-else trees) for rewrite rules, with the aim of reducing compilation time for Halide. I am still working on the project, and the progress so far can be found here: https://github.com/halide/Halide/tree/rootjalex/gsoc_codegen/apps/codegen