Integrating R and julia for parallel processing in the cloud
The aim of the following (as yet unfinished) set of posts is to provide an introductory tutorial on using R and julia together, not only on your local machine but also in the cloud, where you can take advantage of parallel computing. This is a workflow I'm currently building for myself, so I'll explain why each step might be useful for your own projects.
As a statistician, I use R as my day-to-day language. It's what I'm comfortable with, and it (mostly) provides the functionality that a statistician or data scientist might need. In my current project, however, R is infeasible as the main language: it is simply too slow for the computationally intensive, iterative simulations I need to run, and I need to run many instances of them.
Julia is a relatively new language (version 1.0 was released in August 2018). It aims to provide "the speed of C with the dynamism of Ruby...with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab...as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell."
This is a pretty ambitious list of requirements, and I would not say version 1.0 fully meets it. Still, my experience is that julia is very fast and, for an R user with little programming background, fairly easy to pick up. If you'd like to learn more about julia, the learning pages of its website have a wealth of information and usually include a monthly 'intro' video that you can watch at any time. In my current work, using julia for the intensive, iterative simulations has made the project feasible: a single simulation run that can take over 20 minutes in R finishes in under 20 seconds in julia. However, I haven't made the full switch to julia. I also need to fit models such as generalised linear models with random effects to the simulation outputs, and julia's linear modelling packages are not quite there yet in terms of usability (though I'm sure they could be in a year or so).
With that in mind, I've chosen to use R and julia together, with the R package JuliaCall integrating the two languages. This post gives a quick demonstration of how to get started with this package.
This presentation walks you through how to run embarrassingly parallel problems in parallel, both on your local machine and in the cloud using Microsoft Azure. It is based on a tutorial by David Smith.