Git Tutorial
Getting Started with Git & GitHub for Project Collaboration
The total length of the videos in this section is approximately 30 minutes.
You can also view all the videos in this section at the YouTube playlist linked here.
This is our newest module, and our goal is to find out how effective this tutorial is and how to improve it. We would love to hear feedback.
This lecture was created by Donna Gan '20.
If you are completing this module as part of Wellesley's QAI Summer Program or STAT 260, note that you do not need to memorize or even practice every command that's included here. Depending on your previous experience, Git might seem familiar/logical or completely different from anything you've seen before. Take a look at the posted assignment to see that we will ask you to practice the commands from the first few videos but not the last few. If you are new to this platform, you should watch all of the videos, but don't spend hours trying to internalize every detail. Like anything, Git is best learned with practice.
Part 1: Introduction to Git
Most projects involve collaborating with others and repeatedly updating code files and other documents. Git is an open-source, distributed version control system. In other words, Git is a way to manage a project when multiple people are making repeated changes to the same files.
Why learn Git?
Track all changes in files and maintain the ability to revert back to previous versions
Synchronize code among collaborators
Test changes to the code without affecting the original copy (this is called "branching")
Downloading Git
On a Mac:
You may already have Git downloaded on your computer. So, the first step is to check.
Open the Terminal application (try searching your computer for "Terminal") and type
git --version
If a version number is printed to the screen, you already have Git. If not, you need to download Git.
You can download Git here. We suggest choosing the Xcode option.
On a PC:
If you are using a Windows operating system, the simplest way to run Git from the command line is through the application Git Bash, which is included when you install the Git for Windows package. Git Bash provides a Bash emulation for running Git from the command line in the Windows environment. In other words, you will use Git Bash to access Git.
PC users: click here only if you are following this tutorial on the Windows command line or PowerShell and already have Git installed there. Again, we recommend downloading Git Bash instead.
The commands described in the videos and text in this module assume that you are using Mac/Linux or Git Bash. If you are used to using the built-in Windows command line or PowerShell and prefer not to download Git Bash, you will have to make some substitutions. In the list below, the Linux/Mac command appears to the left of the equals sign, and the Windows/PowerShell substitute appears to the right.
pwd = cd
touch = echo or, in PowerShell, echo $null >> filename
For example: echo "some text" > filename
If you run just echo > filename in the Windows command line, the generated file will contain the message "ECHO is on."
open = just type the file name without any command and it'll open up, or, in PowerShell, substitute ii for open
cat = type
References
You might find it helpful to explore these links if you have questions as you go through the material in this module.
Here is the official documentation for the commands you will learn in this module (and other commands), just like the help pages in R.
Git is primarily used through the command-line interface. This link provides a useful introduction to some basic command-line commands.
This page is a guide to setting up and getting started with Git.
This page is a guide to common tasks in GitHub.
Learning new terminology
Throughout this tutorial, we will primarily be using Git to work with "Git repositories." A Git repository is a space to store files for a project. You may find that learning Git involves becoming familiar with new terminology, such as "repository". That's okay and is part of the learning process. Each of the videos below is followed by notes that are intended to help spell out some of the terminology and ideas that are introduced in the video.
A change in terminology
GitHub made an important change to its terminology just after these videos were recorded. The primary version of a repository is now called "main" instead of "master", as part of an effort to avoid language related to slavery. You can find discussion in various articles, like this one. You will still see "master" in this module's videos (and one screenshot from a video) because we have not yet rerecorded. However, we have substituted "main" for "master" in the text on this page. When you are experimenting with the commands you learn from the videos, be sure to use "main" instead of "master", because GitHub will not recognize "master" and will think that you are trying to create and name something new.
Question 1: What is Git, and why is it useful?
Show answer
Git is a distributed version control system that tracks changes to files over time. We can use Git to go back to a specific version of the tracked files and, in a collaborative setting, to coordinate parallel work among team members. Git is particularly useful for managing large-scale projects.
Part 2: Getting started with GitHub
GitHub is a platform for hosting and collaborating on Git repositories (version control + collaboration)
In addition to downloading Git, you need to create a GitHub account. You can create a GitHub account here: https://github.com/
GitHub stopped accepting a username and account password for authenticating Git operations on August 13, 2021. This page explains the change, and this page explains how to create a "personal access token" to use instead.
Once you have a GitHub account, you need to associate this account with the Git application that you downloaded. In Terminal (for a mac) or Git Bash (for a PC), use the following commands:
git config --global user.name "Your Name"
git config --global user.email "Your Email"
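To confirm what Git has recorded, the values can be read back with git config --get. The sketch below is a minimal illustration under stated assumptions: it works in a throwaway repository and omits --global so that it does not overwrite your real settings, and the name and the you@example.com address are placeholders.

```shell
set -e
# Throwaway repository so this sketch does not touch your global settings.
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Same commands as above, minus --global, so they apply to this repository only.
git config user.name "Your Name"
git config user.email "you@example.com"   # placeholder address

# Read back the identity that Git will record in each commit.
git config --get user.name
git config --get user.email
```

With --global, the same two git config --get commands show the identity used by every repository on your computer.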
The following videos assume that you already have Git installed on your computer, that you have already created an account on GitHub, and that the Git installation on your computer is associated with your GitHub account.
As an alternative to cloning an existing repository from GitHub, we could initialize a repository locally and upload to GitHub:
First create a new repository on GitHub and obtain the url. Then, locally, execute the following from the command line:
git init <repository name>
cd <repository name>
echo "# repository name" >> README.md
git add README.md
git commit -m "first commit"
git remote add origin <url>
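git branch -M main (renames the current branch to main, in case your Git installation named it master)
git push -u origin main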
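The sequence above can be tried without a GitHub account. The following is a minimal sketch, assuming that a local bare repository stands in for the empty repository you would create on GitHub; the DemoRepo name, the identity values, and the temporary paths are placeholders chosen for illustration.

```shell
set -e
work="$(mktemp -d)"

# Assumption: a local bare repository plays the role of <url> on GitHub.
git init -q --bare "$work/remote.git"
url="$work/remote.git"

git init -q "$work/DemoRepo"
cd "$work/DemoRepo"
git config user.name "Your Name"          # local identity for this sketch only
git config user.email "you@example.com"

echo "# DemoRepo" >> README.md
git add README.md
git commit -q -m "first commit"
git branch -M main                        # make sure the branch is named main
git remote add origin "$url"
git push -q -u origin main

# The stand-in remote now lists a main branch holding the first commit.
git ls-remote --heads origin
```

Because -u records origin/main as the upstream branch, later calls in this repository can be shortened to git push and git pull.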
You can find the link to the DemoProject repository under the WellesleyQAI account on GitHub:
See Part 2 video notes
Create a GitHub repository & add collaborators
Clone an existing repository from GitHub:
cd Desktop
git clone <url>
To see the local setup after cloning:
cd <repository name>
git remote -v (lists the remotes; "origin" is the default name that Git gives to the remote repository you cloned from)
Could also initialize repository locally:
cd Desktop
git init <repository name>
OR turn an existing directory into a git repository:
cd <directory name>
git init
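As a rough illustration of the difference between the two setups, the sketch below clones a repository and then initializes one in an existing directory. It assumes a local, empty bare repository as a stand-in for a GitHub url; the directory names are placeholders.

```shell
set -e
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"      # stand-in for a GitHub repository

# Workflow 1: clone. The clone already knows where it came from.
cd "$work"
git clone -q "$work/remote.git" DemoRepo   # on GitHub this would be: git clone <url>
cd DemoRepo
git remote -v                              # lists origin twice: (fetch) and (push)

# Workflow 2: turn an existing directory into a repository.
mkdir "$work/existing"
cd "$work/existing"
git init -q
git rev-parse --is-inside-work-tree        # prints: true
git remote -v                              # prints nothing: no remote is set yet
```

In the second workflow, git remote add origin <url> is what connects the new repository to GitHub.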
Question 2: What are the two common workflows of setting up the remote and local versions of the repositories mentioned in the video?
Show answer
You can either clone a repository from GitHub or take a local directory that is currently not under version control, turn it into a Git repository, and upload it to GitHub.
Part 3: Tracking edits with Git
Note: It’s generally a good idea to commit incremental changes (although perhaps not too frequently).
Part 3 video notes - make sure to read, as the key steps are reviewed here ("add" and "commit")
Create a new R file in the cloned (local) repository:
cd DemoRepo
touch demo.R
open demo.R
Make edits to demo.R: e.g. head(iris)
Visualize changes in the repository: cat demo.R / git status
Add files that you want to commit for version control purposes from your working directory to the staging area:
git add <filename> (specific file)
OR
git add . (everything that’s changed in the current directory)
Save a snapshot of the file (i.e. commit edits): git commit -m "message"
You can put whatever you want as a "message," usually something like "updated code to include new data" or "edited code to allow for color in graphics." The idea is that these labels can help you identify previous versions of a file.
Use git status to confirm that we have successfully captured a snapshot of the current state of the local working directory. We will go into detail about how to synchronize with the remote repository on GitHub in the next section.
If you notice that an extra file called .Rhistory is being created when you use GitHub to store R code files, you may find that you need to delete this extra file before using commit or push.
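The add and commit steps can be followed by checking git status after each one. This is a minimal sketch in a throwaway repository; the identity values are placeholders, and --short is used only to keep the status output to one line per file.

```shell
set -e
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Your Name"          # local identity for this sketch only
git config user.email "you@example.com"

echo 'head(iris)' > demo.R
git status --short        # ?? demo.R   (untracked: Git sees the file but is not tracking it)

git add demo.R
git status --short        # A  demo.R   (staged: will be part of the next commit)

git commit -q -m "add demo.R"
git status --short        # no output: nothing left to commit
git log --oneline         # one entry: <hash> add demo.R
```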
Question 3: How do we use Git to track edits for version control?
Show answer
Git uses a so-called two-step commit process: you would first use the git add command to add a file that you want to track to the staging area, and then from the staging area, use the git commit command to record a snapshot of the repository.
Part 4: Communicating with the remote repository
See Part 4 video notes
Configuration: added as a collaborator of the repository, log-in credentials stored, etc.
Two network commands: git push and git pull
Push local edits onto the remote repository: git push (refresh GitHub webpage to see that changes have indeed been reflected in the remote repository)
Make an edit on GitHub: e.g. #head(iris)
To synchronize with the most updated version of the file which is currently located on the remote: git pull (see that the local version is indeed updated)
The git pull command is a combination of the two commands git fetch (fetch content from the remote repository) and git merge (merge with the local copy).
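The push and pull round trip can be sketched locally. The example below makes two assumptions that are not part of the videos: a local bare repository stands in for the GitHub remote, and a second clone named collaborator stands in for an edit made on the GitHub website.

```shell
set -e
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"                       # stand-in for GitHub
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main # default branch: main

# Local repository: one commit, pushed to the remote.
git init -q "$work/local"
cd "$work/local"
git config user.name "Your Name"
git config user.email "you@example.com"
echo 'head(iris)' > demo.R
git add demo.R
git commit -q -m "add demo.R"
git branch -M main
git remote add origin "$work/remote.git"
git push -q -u origin main

# The collaborator's clone plays the role of an edit made on GitHub.
git clone -q "$work/remote.git" "$work/collaborator"
cd "$work/collaborator"
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
echo '#head(iris)' > demo.R
git commit -q -am "comment out head(iris)"
git push -q

# Back in the local repository: git pull fetches and merges in one step.
cd "$work/local"
git pull -q
cat demo.R                # prints: #head(iris)
```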
Question 4: What are the two Git network commands that are mentioned in the video and what are they used for?
Show answer
We use the git push command to upload local repository content to the remote repository and the git pull command to download content from the remote repository and update the local repository to match the remote copy.
Part 5: Collaboration -- Manage merge conflicts
To overwrite local version with remote (pull):
git pull (creates a merge conflict)
git checkout --theirs <filename>
git commit -am "resolve merge conflict"
git push
OR
git fetch --all (similar to git pull but without the merging)
git reset --hard origin/main (using hard reset to discard local changes to match origin/main)
git pull origin main
To overwrite remote version with local (push):
git pull (creates a merge conflict)
git checkout --ours <filename>
git commit -am "resolve merge conflict"
git push
OR
git push -f (force push local to overwrite the remote)
See Part 5 video notes
Conflicts occur when there are competing changes to the same file on the remote and in the local copy
Add and commit some edits locally: e.g. dim(iris)
Note: shortcut for adding and committing an already tracked file is: git commit -am "message"
Add and commit some edits on GitHub: e.g. summary(iris)
Now a conflict occurs, and both git push and git pull would fail
Reopen demo.R: HEAD points at local version and the remote version is shown with its commit hash; e.g. to keep both edits, delete the conflict markers
Resolve the merge conflict locally and synchronize with the remote:
git commit -am "resolve merge conflict"
git push
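The conflict scenario from these notes can be reproduced end to end. The following is a sketch under stated assumptions: a local bare repository stands in for GitHub, a second clone named other stands in for the edit made on the GitHub website, and the conflict is resolved by keeping the local side. Recent Git versions may refuse a plain git pull when the two histories have diverged and no preference is configured, so the sketch passes --no-rebase to ask for the fetch-and-merge behavior.

```shell
set -e
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"                       # stand-in for GitHub
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

git init -q "$work/local"
cd "$work/local"
git config user.name "Your Name"
git config user.email "you@example.com"
echo 'head(iris)' > demo.R
git add demo.R
git commit -q -m "add demo.R"
git branch -M main
git remote add origin "$work/remote.git"
git push -q -u origin main

# Competing edit on the remote side.
git clone -q "$work/remote.git" "$work/other"
cd "$work/other"
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
echo 'summary(iris)' > demo.R
git commit -q -am "use summary"
git push -q

# Competing edit to the same line in the local copy.
cd "$work/local"
echo 'dim(iris)' > demo.R
git commit -q -am "use dim"
git push -q 2>/dev/null || echo "push rejected: the remote has commits we lack"
git pull --no-rebase -q 2>/dev/null || echo "pull stopped: merge conflict"
cat demo.R                        # shows the <<<<<<<, =======, >>>>>>> markers

git checkout --ours demo.R        # keep the local side (--theirs keeps the remote side)
git commit -q -am "resolve merge conflict"
git push -q
cat demo.R                        # prints: dim(iris)
```

To keep both edits instead, open demo.R after the failed pull, delete the three marker lines by hand, and then run the same commit and push.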
Question 5: Why did a merge conflict occur in this scenario?
Show answer
In the video, edits to the same line of the same file were made both in the local and the remote copy. To synchronize the local file with the remote file, we need to decide which edits to keep and resolve the merge conflict.
Part 6: Collaboration -- Branching
Branching is a way to experiment with changes to the file without affecting the current saved version that collaborators can access.
See Part 6 video notes
Idea of branching: experimenting with some new features without messing up the original copy
To see what branch we have / we are on right now: git branch
Create and move into a new branch called feature: git checkout -b feature (shortcut for creating and switching to a branch)
Type git branch again to see that we are now on the feature branch
Make changes and commit: explore swiss dataset
To push the new branch to the remote repository: git push --set-upstream origin feature (need to specify the upstream branch when pushing)
The new branch is reflected on GitHub, switch to the new branch and compare changes
Create a pull request to propose the edits, merge pull request, and delete feature branch (result: only 1 branch left and new edits on the feature branch merged into the file on main)
To merge branches locally, first switch back to the main branch: git checkout main, then merge the feature branch into main: git merge feature
To delete a branch locally: git branch -D feature
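The local half of the branching workflow (without the push and the pull request) can be sketched as follows. The example assumes a throwaway repository with placeholder identity values, and it deletes the branch with -d rather than -D to show the safer variant.

```shell
set -e
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Your Name"
git config user.email "you@example.com"
echo 'head(iris)' > demo.R
git add demo.R
git commit -q -m "add demo.R"
git branch -M main

git checkout -q -b feature        # create the feature branch and switch to it
echo 'head(swiss)' >> demo.R
git commit -q -am "explore swiss dataset"

git checkout -q main
cat demo.R                        # prints only head(iris): main is untouched so far
git merge -q feature              # fast-forward, since main has no new commits
git branch -d feature             # -d refuses to delete unmerged work; -D forces it
git branch                        # prints: * main
```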
Question 6: When would we want to use the branching features in Git?
Show answer
When we want to experiment with a new feature without potentially messing up the current working version of our project, we can create a new branch, which allows us to work on a separate copy of our code. Once the new feature is implemented, we can merge it back into the main branch.
Part 7: More on version control (these videos are optional)
Question 7: If you go back to an earlier version, can you return to the current version again?
Show answer
Yes, using git checkout main
Question 8: Can you go back to an earlier version if you have saved but not committed your most recent work?
Show answer
Yes: git stash temporarily saves your most recent work, and git checkout main brings you back to the stored version.
Variations of the git reset command:
git reset --soft <commit-hash> (uncommit changes made since the target commit, i.e. leave the edits in the staging area)
(default) git reset --mixed <commit-hash> (uncommit and unstage changes made since the target commit, i.e. leave the edits in the working directory)
(use with caution) git reset --hard <commit-hash> (discard all changes made since the target commit from the working directory, staging area, and commit history)
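The three variations can be compared side by side by checking git status after each one. This is a minimal sketch in a throwaway repository; the commit messages and the target and tip variables are placeholders, and the history is restored between demonstrations by resetting to the saved tip hash.

```shell
set -e
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Your Name"
git config user.email "you@example.com"

echo 'head(iris)' > demo.R
git add demo.R
git commit -q -m "first"
target="$(git rev-parse HEAD)"    # the commit we will reset to
echo 'head(swiss)' >> demo.R
git commit -q -am "second"
tip="$(git rev-parse HEAD)"       # saved so the history can be restored

git reset -q --soft "$target"
git status --short                # M  demo.R   (the edit is still staged)
git reset -q --hard "$tip"        # restore both commits for the next demonstration

git reset -q --mixed "$target"
git status --short                #  M demo.R   (the edit is unstaged but still in the file)
git reset -q --hard "$tip"

git reset -q --hard "$target"
git status --short                # no output: the edit is gone from the file as well
git log --oneline                 # only the "first" commit remains
```

The restore step works only because the sketch saved the hash of the second commit beforehand, which is one reason to treat --hard with care.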
Note: While demonstrating the git revert command in the video, I should have input the command git log --oneline immediately after to further illustrate the idea that a new commit has been created in this case. Therefore if I had shown the output from git log --oneline, the following should appear at the very top of the commit history in addition to the existing commit d7991f9:
c527def (HEAD -> main) Revert "comment out swiss data"
Note: Again, the following output would show up if I were to input git log --oneline after demonstrating the git reset --hard command. Notice that all commits after the commit bbec100 are wiped out from commit history:
See Part 7 video notes
Show commit history: git log (type q to exit) OR git log --oneline (for a more concise version of commit history)
(Not mentioned in the video, but good to know) Could use git diff to compare branches/commits:
git diff <branch-name> <branch-name>
OR
git diff <commit-hash> <commit-hash>
Temporarily going back in history to check out a previous version: git checkout <commit-hash> and use git checkout main to switch back to the current state
If there are uncommitted changes in the working directory before we switch branches, we can stash them with git stash. Once back on the main branch, use git stash pop (which applies the stashed edits and drops the stored copy from the stash stack) to get the changes back, and then commit
Create a new commit that reverses the changes in a previous commit: git revert <commit-hash> --no-edit
Another way to undo changes is to uncommit/discard previous commits using git reset, for example:
git reset --hard <commit-hash>
To untrack a file in the working directory: git rm --cached <filename>
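Several of the commands in these notes can be chained into one short session. The sketch below is an illustration under stated assumptions rather than a transcript of the video: it uses a throwaway repository, placeholder commit messages, and a variable named first to hold the hash of the oldest commit.

```shell
set -e
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Your Name"
git config user.email "you@example.com"
echo 'head(iris)' > demo.R
git add demo.R
git commit -q -m "add iris"
git branch -M main
first="$(git rev-parse HEAD)"
echo 'head(swiss)' >> demo.R
git commit -q -am "add swiss"

# Uncommitted work: stash it, visit the first commit, come back, restore it.
echo 'dim(swiss)' >> demo.R
git stash -q
git checkout -q "$first"          # temporarily inspect the older snapshot
cat demo.R                        # prints: head(iris)
git checkout -q main
git stash pop -q
git commit -q -am "add dim"

# Undo the latest commit by adding a new commit that reverses it.
git revert --no-edit HEAD >/dev/null
git log --oneline                 # four entries; the newest is: Revert "add dim"

# Stop tracking demo.R while leaving the file on disk.
git rm -q --cached demo.R
git status --short                # D  demo.R (staged removal) and ?? demo.R (untracked)
```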
Question 9: What are some of the Git commands that you can invoke to restore/rollback to a previous state of the repository?
Show answer
1) git checkout <commit-hash> allows you to temporarily switch to and inspect an older version of your project without modifying the commit history.
2) git revert <commit-hash> creates a new commit that reverts the effect of a previous commit. It is recommended to use git revert if the particular commit that you wish to undo is public so that your collaborators could clearly see the changes introduced and all old commits that they may depend on are preserved.
3) git reset --hard <commit-hash> moves the branch back to the specified commit and removes all subsequent commits from the commit history. Proceed with caution, since this command rewrites commit history and discards uncommitted changes.
Extensions:
GitHub Desktop (offers the option to skip the command line and interact with Git/GitHub through a user interface): https://desktop.github.com/
Build your own website with GitHub Pages: https://pages.github.com/
Thanks for working through this tutorial!