Working with projects, Git, GitHub and renv in R
Git is a distributed Version Control System (VCS) that is important if you work with a group of collaborators (also as a supervisor and PhD or MSc student) on the same script. It is also valuable when you work alone: it improves the quality of your workflow, as it forces you to document all changes, work with lists of issues to solve, etc. GitHub is the central platform where scripts of a project
To work with Git and GitHub in RStudio, you need to work with RStudio projects, not with stand-alone scripts, see this introduction. This is a good idea anyway, as it keeps everything for your project together. Your local project repository as described above works as the project folder. When opening RStudio, open the project that you want to work with, or make a new one.
Working in a project also allows you to use renv, which controls the versions of the libraries that you use in that project. When different users collaborate on a script, renv makes sure they all use the same versions of all libraries used in the script. This is critical for predictable behaviour of a script in the future, e.g. after publication, or when you return to a project to revise your manuscript. It also helps a lot when working together, e.g. with students who may not have the same library versions installed as you. So make sure that the RStudio project you work in is renv-enabled: this ensures that all collaborators, and also you yourself 5 years from now, work with exactly the same software (library versions).
Git for Windows and GitHub installation steps
Make a free GitHub account on https://github.com/ and give it a logical username, such as your full name (without spaces); this helps your later collaborators, see here for tips on user names. You can add multiple email addresses, but make sure to include at least your @rug.nl or @student.rug.nl email address (so you can be found and added to relevant repositories). When at another academic institution, use your email address from that institution, as it allows you to qualify for free GitHub Copilot use.
Make sure to upgrade to the latest R and RStudio version
Install Git for Windows, accepting all the default installation options
Or install Git for Mac, see here on how to do this; on a Mac you log in to GitHub with a token instead of a password
Accept all the default choices during the installation.
When using Windows, download and install RTools for Windows, making sure the version matches your R version (currently RTools 44), using the RTools installer as listed on the page with each version. RTools is needed for compiling packages from source.
When using Mac OS you need the following tools in order to compile R:
Xcode developer tools from Apple. Xcode can be obtained from Apple AppStore and the Xcode developer page. Older versions are available in the "more" section of the Developer pages (Apple developer account necessary). On modern macOS versions you can simply use
sudo xcode-select --install
which installs Xcode command line tools which are sufficient to build R (however, if you want to also build the R.app GUI you do need the full Xcode installation).
GNU Fortran compiler
R and some contributed packages require a Fortran compiler. Unfortunately Xcode doesn't contain a Fortran compiler, so you will have to install one. R 4.3.0 and higher use the universal GNU Fortran 12.2 compiler. You can download an installer package gfortran-12.2-universal.pkg (242MB) - for more details and other download options see R-macos GNU Fortran releases on GitHub.
Register your name and email address within Git on your computer (you can always change this) by doing this:
Open a Terminal window; you can do this within RStudio at the bottom next to the "Console" tab, or you can use a separate Terminal window in Windows
Then type the following commands at the prompt in the terminal window, or copy and paste them, pressing Enter after each command. Put in your name instead of Jane Doe and your email address (as registered with GitHub) instead of jane@example.com, and enter one command at a time
git config --global user.name "Jane Doe"
git config --global user.email "jane@example.com"
git config --global --list
Create, in your File Explorer or otherwise, a local folder in which you will put your projects for collaboration on scripts. Put it in a simple location that you can quickly find again, such as d:\_github
Open RStudio, and close all open scripts or projects through File / Close All (when only scripts are open) and File / Close Project (when a project is open)
Clone the repository that you have made in GitHub (or received as a collaborator), by:
- copying the link to the repository from your web browser
- in RStudio, selecting File / New Project / Version Control / Git and pasting the link to your repository, making sure that the project is placed in the local GitHub folder that you created earlier. Keep the name of the project the same as the name of the repository.
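The same clone can also be made from the Terminal tab instead of the RStudio dialog. The lines below are a minimal sketch, not part of the steps above: to keep them runnable without a network connection, an empty throwaway repository (the hypothetical remote.git) stands in for the GitHub link, and a temporary folder stands in for your local projects folder.

```shell
# Throwaway stand-ins (assumptions for illustration only):
# a temporary folder instead of d:\_github, and an empty bare repository instead of GitHub.
workdir="$(mktemp -d)"
git init --quiet --bare "$workdir/remote.git"
mkdir -p "$workdir/projects"

# Clone into the projects folder. With a real project, the first argument is the
# link copied from GitHub; the folder name is kept the same as the repository name.
git clone --quiet "$workdir/remote.git" "$workdir/projects/remote"

# Show which repository the local copy is linked to.
origin_url="$(git -C "$workdir/projects/remote" remote get-url origin)"
echo "$origin_url"
```

A folder cloned this way can afterwards be opened in RStudio as a project, for example through File / New Project / Existing Directory.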
Install the package renv to your general R installation in RStudio, for example using install.packages("renv")
In the RStudio console, type renv::restore()
You are now asked to install packages to your project-specific library. You may think you already installed these before.
But you now work with renv, which means that the versions of libraries in the script are kept in sync across all users working on the project. And the packages are installed with your project and not in your general R installation.
This first installation takes a little while, but is only needed when you create the project locally like this for the first time.
At the top of your script, make sure to put
renv::restore()
This restores (also in the future) all libraries to the versions registered in the lock file of the project (renv.lock), also if your collaborator installs an additional library
You can keep this line in your script; if all is in sync, you get the following message after running the restore command
'- The library is already synchronized with the lockfile.'
Now make a new script or change something in an existing script, for example by adding a comment line to the bottom
# this is a good script
And save your script.
In the top right pane in RStudio, select the tab 'Git', which shows with a blue M that your script file is modified.
Select 'Commit' in this pane
Check the checkbox in front of the file to which you are going to commit a change
In the Commit message box at the top right, type a short summary of why you made this change to the script. In this case that would be "Annotation added".
This commit message is obligatory, you cannot leave it empty.
Meanwhile, you see in green (new) and red (old) what you change in this script with this commit.
Now select "Commit" and your change is recorded in the local Git history of your project.
Finally select the green "Push" button, and your change is pushed to the main version of the script in the GitHub repository
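For reference, the Commit and Push buttons correspond to a handful of Git commands that can also be typed in the Terminal tab. The following is a minimal sketch under stated assumptions: the script name analysis.R is hypothetical, and an empty throwaway repository stands in for GitHub so that the lines run without a network connection or login.

```shell
# Throwaway sandbox (assumption for illustration): a bare repository plays the role of GitHub.
workdir="$(mktemp -d)"
git init --quiet --bare "$workdir/remote.git"
git clone --quiet "$workdir/remote.git" "$workdir/project"
cd "$workdir/project"
# Set name and email locally so the sketch does not depend on your global settings.
git config user.name "Jane Doe"
git config user.email "jane@example.com"

# Change a script (hypothetical file name) and save it.
echo "# this is a good script" >> analysis.R

git status --short                        # lists changed files, like the Git pane
git add analysis.R                        # corresponds to ticking the checkbox
git commit --quiet -m "Annotation added"  # corresponds to the Commit button
git push --quiet origin HEAD              # corresponds to the Push button

# The commit has now arrived in the stand-in remote.
pushed_message="$(git --git-dir="$workdir/remote.git" log -1 --all --format=%s)"
echo "$pushed_message"
```

In a project cloned from GitHub only the four commands from git status to git push are needed, and a plain git push is usually enough.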
→ Using a Mac and having a problem with push authentication: GitHub no longer accepts your account password as authentication for a push. Instead, you need to create a personal access token. When you push for the first time, you may be asked for a password; use this personal access token as your password (not your account password).
If you now view the script on GitHub, you will see the change there.
In a next editing session, you want to check first if there are any changes made in the script by your collaborators.
For this, open the project from your local project folder (or it may be opened automatically already)
Go to the Git pane at the top right, and select "Pull" to check if your local project is in sync with the repository. If not, files will be updated accordingly. Always do this first before you continue editing anything in the project.
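What Pull does can be tried out safely in a sandbox. The sketch below is an illustration under stated assumptions, not part of the workflow itself: a throwaway bare repository stands in for GitHub, the folders collaborator and mine play the two users, and analysis.R is a hypothetical script name.

```shell
# Throwaway sandbox (assumption for illustration): a bare repository plays the role of GitHub.
workdir="$(mktemp -d)"
git init --quiet --bare "$workdir/remote.git"

# The collaborator creates the first version of a script and pushes it.
git clone --quiet "$workdir/remote.git" "$workdir/collaborator"
cd "$workdir/collaborator"
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "First version"
git push --quiet origin HEAD

# You clone the project; at this point both copies are identical.
git clone --quiet "$workdir/remote.git" "$workdir/mine"

# Later the collaborator adds a line and pushes again.
cd "$workdir/collaborator"
echo "y <- 2" >> analysis.R
git commit --quiet -am "Add y"
git push --quiet origin HEAD

# At the start of your next session: pull first, then edit.
cd "$workdir/mine"
git pull --quiet
line_count="$(wc -l < analysis.R | tr -d ' ')"
last_message="$(git log -1 --format=%s)"
echo "$line_count lines, last change: $last_message"
```

In a real project only the git pull line is needed, run from within the project folder.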
Keep the renv::restore() command in your script to see if you work with the right versions of the libraries that you and your collaborators use in this project.
If you need to change to different versions of the libraries, you will be asked to restart your R session (in the menu Session / Restart R)
To see the last change and also the history of changes to a file, select "History" in the Git pane in RStudio. There you can see all changes to all scripts in the project, or filter for a particular script or other file (such as a figure).
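The same history can be inspected from the Terminal tab. This is a minimal sketch in a throwaway local repository; the file names analysis.R and notes.txt and the three commits are made up for illustration.

```shell
# Throwaway repository with three commits (all names are assumptions for illustration).
workdir="$(mktemp -d)"
git init --quiet "$workdir/project"
cd "$workdir/project"
git config user.name "Jane Doe"
git config user.email "jane@example.com"

echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "First version"
echo "# this is a good script" >> analysis.R
git commit --quiet -am "Annotation added"
echo "notes" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

git log --oneline                 # all changes in the project, newest first
git log --oneline -- analysis.R   # only the commits that touched this one file
git show --stat HEAD              # which files the last commit changed

# Number of commits that touched the script.
script_commits="$(git log --oneline -- analysis.R | wc -l | tr -d ' ')"
echo "$script_commits"
```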
This is not only important during collaboration, but also for yourself!
Further details on library version management with renv
When you work on different computers and/or with different users on the same script, it is important that all computers have the same version of the libraries installed that are used for the script.
This "freezes" the behavior of the script - it will always work exactly the same.
https://rstudio.github.io/renv/articles/collaborating.html
renv::init() initialises renv for the first time for a project, creating the lock file. Do not use this anymore after the project has been created
renv::snapshot() stores in the lock file which versions of which packages were installed and active during the development of a script. If you update a library while working with the project, re-run a snapshot. When other users then run restore, they are asked to update their versions of libraries. Take a snapshot only when really needed: when a function is broken in a library or you really need the functionality of the new version of the library. Otherwise stick with the version you first started with. Typically, as a user, you do not need to use this command.
By default, renv::snapshot() uses the type = "implicit" strategy, which only records packages that are installed in the project library and also used in the code of the project (for example through library() calls in a script). If a newly installed package is not yet used in any script, it won't be included.
Solution: Use renv::snapshot(type = "all") to make renv record all packages installed in the project library.
So if you have installed new libraries that no script uses yet, but you want to record all installed libraries in the project to the lock file anyway, use the "all" option
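renv::restore(clean=TRUE) checks if the currently installed versions of the libraries match those recorded in the lock file. If not, it installs the versions that are also used by your other collaborators in the project. With clean=TRUE, libraries that are not recorded in the lock file are also removed from the project library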
renv::status() checks the integrity of the renv setup of your project, flagging problems with libraries that are not synced or up to date
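One step that the list above leaves implicit: renv::snapshot() only changes the file renv.lock on your own computer, so collaborators receive the new library versions only after that file is committed and pushed like any other change. The sketch below shows the commit in a throwaway repository. The content of renv.lock here is a placeholder invented for illustration, not a real lock file; in a real project the file is written by renv::snapshot().

```shell
# Throwaway repository (assumption for illustration).
workdir="$(mktemp -d)"
git init --quiet "$workdir/project"
cd "$workdir/project"
git config user.name "Jane Doe"
git config user.email "jane@example.com"

# Placeholder standing in for the lock file; in a real project renv::snapshot() writes it.
printf '{ "placeholder": "written by renv::snapshot() in a real project" }\n' > renv.lock

# Commit the lock file so that it travels with the project.
git add renv.lock
git commit --quiet -m "Update lock file after renv::snapshot()"
# In a real project, follow this with: git push

# Confirm that the lock file is now tracked by Git.
tracked="$(git ls-files renv.lock)"
echo "$tracked"
```

After the next Pull, collaborators who run renv::restore() are then offered the versions recorded in the updated lock file.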