This blog offers guidelines and tips for running linear and logistic regression in distributed settings with Apache Spark and TensorFlow. It was written by Alessandra Cabassi and Junyang Wang during the 2018 internship programme of The Alan Turing Institute. The project was sponsored by Cray Inc. and carried out on the Cray Urika-GX platform.
The first section of the blog, "Airline data", describes the data that we used for the tutorial. The other two sections, "Regression with TensorFlow" and "Regression with Apache Spark", contain blog posts covering different aspects of distributed regression in each framework.