Over the course of the semester, many labs will require the use of external software libraries. Instead of installing all of those libraries on your own computer, you can download a Docker image that already has them. A Docker container (a running copy of that image) is an isolated environment that you can start up and treat like a computer within your computer. The image contains all the data and Python libraries that you need for the semester. It is hosted on Docker Hub, so if I update it (for instance, to update a package to work with a lab), you will be able to easily pull the newest version from the cloud.
To get started, visit the Get Started with Docker page to download Docker Community Edition. You will need to log in to, or create, a free Docker account before you can download the software. After you've downloaded it, follow the instructions to install it.
Each time you start your container, you will need to mount one of your local folders to give the container access to read and write files in the "outside world" of your own computer. Since you'll be cloning git repositories ("repos") to your computer in order to complete the labs, it's a good idea to make one folder on your computer to serve as the parent directory for all of the lab repos, and to mount that whole directory.
There are two different images that you can use: one that just contains the installation for the coding parts of the assignments (harveymudd/cs159-student:fall2021) and one that also contains pandoc (harveymudd/cs159-student-pandoc:fall2021). The instructions below assume you'll want pandoc as well, but you can swap in whichever image you prefer.
For the Mac/Linux example below, my repos are at /home/xanda/Documents/nlp/.
To run Docker, use the following command (as a single line), replacing /home/xanda/Documents/nlp with the path to your own lab directory:
docker run -it -h docker -v /home/xanda/Documents/nlp:/home/student/nlp harveymudd/cs159-student-pandoc:fall2021 bash
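If you find yourself retyping that command, a small shell function can hold it for you. This is a minimal sketch, not part of the course materials: the name cs159_run is made up, and it assumes a bash-compatible shell on Mac or Linux with Docker installed. It converts the folder to an absolute path, because Docker treats a bare relative name such as nlp in -v as a named volume rather than a folder on your computer. It also quotes the path, so a folder name containing spaces won't split the -v argument.

```shell
# Hypothetical helper (not a course tool): start the class container with a
# host folder mounted at /home/student/nlp.
# Usage: cs159_run /path/to/your/nlp/folder
CS159_IMAGE="harveymudd/cs159-student-pandoc:fall2021"

cs159_run() {
    if [ -z "$1" ]; then
        echo "usage: cs159_run DIR" >&2
        return 1
    fi
    # Resolve to an absolute path; fail if the folder doesn't exist.
    host_dir=$(cd "$1" 2>/dev/null && pwd) || {
        echo "No such directory: $1" >&2
        return 1
    }
    # -v HOST:CONTAINER mounts host_dir inside the container.
    docker run -it -h docker -v "$host_dir:/home/student/nlp" "$CS159_IMAGE" bash
}
```

For example, after adding the function to your ~/.bashrc or ~/.zshrc, run cs159_run ~/Documents/nlp. Inside the container, ls /home/student/nlp should then list your lab repos.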
Docker for Windows can be tricky to set up: until recently, it did not support Windows 10 Home at all, and it can still cause some bumps on Windows. You can try the regular Docker Desktop for Windows instructions, or try setting it up with the Windows Subsystem for Linux 2 (WSL 2). (Note that updating from WSL 1 to WSL 2 may delete all the files stored inside your Linux distribution, so back things up!) I currently have this working with Ubuntu for Windows on Windows Enterprise and can try to help debug. If you'd like to avoid the headache, just use your CS login on the knuth server and work there; all the same support files and libraries should be present.
For the example below, my repos are at C:\Users\Xanda\My Documents\nlp.
Before running the command below, make sure Docker is running (that is, that the Docker Desktop icon appears in your taskbar). Then either open PowerShell as an administrator (right-click it in your Start menu and select "Run as administrator") or use your WSL distribution, e.g. Ubuntu, to run Docker commands. The built-in terminal in your editor (e.g. VS Code) may not work, because it lacks administrative privileges. My usual approach is to open Ubuntu for Windows, check out code using git, navigate to the top-level directory of the repository in the command line, and then run code . to open Visual Studio Code for editing anything in that directory. I then keep using the same command-line window to run Docker commands as needed.
Notice that in the command below, even though the folder shows up as “My Documents”, you can just call it “Documents” at the command line, which also avoids problems with spaces. Also notice that Windows lets you use Linux-style forward slashes to specify the path (type the command as a single line):
docker run -it -h docker -v C:/Users/Xanda/Documents/nlp:/home/student/nlp harveymudd/cs159-student-pandoc:fall2021 bash
You're welcome to put your top-level folder anywhere you want on your computer, but leave the container-side path after the last colon (/home/student/nlp) the same.
Although the container has both emacs and vi installed, you will probably find a more modern editor installed locally on your machine easier to use. Since the files you're editing live on your own computer (the container only sees them through the mounted folder), a local editor such as Atom or Visual Studio Code is likely a better choice. Both also have extensions for remote pair programming (Teletype for Atom, Visual Studio Live Share for VS Code). However, you're also welcome to just screen share for pair programming in group work, as long as you regularly switch who is typing.
If your installation does not go smoothly or you have additional questions, please let Prof. Xanda or a grutor know on Discord. If it makes sense to remotely access the knuth server instead, here are some (slightly old) docs to get you started.
For most labs, we encourage you to write your writeup in a Markdown file and compile it with the pandoc tool into a nice, LaTeX-styled PDF, without the pain of writing the LaTeX code yourself:
pandoc analysis.md -o analysis.pdf
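Here is an illustration of what such a file might look like. The title, author, and contents are placeholders, not a required template. The snippet writes a starter analysis.md in the current directory without overwriting an existing one. The block between the --- lines is pandoc's YAML metadata, which becomes the PDF's title and author, and inline math between dollar signs is typeset by LaTeX.

```shell
# Write a starter analysis.md in the current directory (placeholder content),
# but leave any existing file alone.
if [ -e analysis.md ]; then
    echo "analysis.md already exists; not overwriting it" >&2
else
    cat > analysis.md <<'EOF'
---
title: "Lab Analysis"
author: "Your Name"
---

# Results

Ordinary Markdown works as usual: *emphasis*, `inline code`, and lists.

- a first observation
- a second observation

Inline math goes between dollar signs, e.g. $P(w_i \mid w_{i-1})$.
EOF
fi
```

Then compile it with the pandoc command above.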
Wherever you run pandoc, you'll need both pandoc and LaTeX installed. The easiest thing to do is to use the Docker image that combines the NLP class files and pandoc, harveymudd/cs159-student-pandoc:fall2021. However, if you'd prefer to use the smaller student image (harveymudd/cs159-student:fall2021), you can instead use a separate Docker image just to run pandoc, as follows:
1. Outside of the class Docker container (i.e., in a regular terminal on your computer), pull down an image containing pandoc and LaTeX: docker pull pandoc/latex:latest
2. Navigate to the directory containing your .md file (let's say analysis.md).
3. Use the following Docker command to run pandoc on the file and save the output as a PDF:
docker run --rm -v "`pwd`":/data pandoc/latex:latest analysis.md -o analysis.pdf
(Note: PowerShell seems to struggle with the command above on Windows, likely because PowerShell treats the backtick as an escape character. If you have WSL set up with e.g. Ubuntu, running the command there may be easier. If you need help setting this up, reach out to Prof. Xanda.)
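The Docker-based pandoc command can also be wrapped in a shell function. This is a sketch under a few assumptions. The name md2pdf is made up. It expects a bash-compatible shell (Mac, Linux, or WSL rather than PowerShell). The Markdown file must be given relative to the current directory, since only that directory is mounted into the container at /data. The added --user option runs pandoc as your own user ID rather than as root, so on Linux the resulting PDF isn't owned by root.

```shell
# Hypothetical wrapper (not a course tool): compile a Markdown file in the
# current directory to PDF using the pandoc/latex image.
# Usage: md2pdf analysis.md   -> writes analysis.pdf
md2pdf() {
    md_file="$1"
    case "$md_file" in
        /*)
            echo "Give the file relative to the current directory" >&2
            return 1 ;;
    esac
    if [ ! -f "$md_file" ]; then
        echo "No such file: $md_file" >&2
        return 1
    fi
    # analysis.md -> analysis.pdf
    pdf_file="${md_file%.md}.pdf"
    # Mount the current directory at /data and run as your own user/group.
    docker run --rm -v "$(pwd)":/data --user "$(id -u):$(id -g)" \
        pandoc/latex:latest "$md_file" -o "$pdf_file"
}
```

For example, running md2pdf analysis.md in the folder containing analysis.md should produce analysis.pdf next to it.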
(Thanks to Prof. Bang and Prof. Talvitie for help updating the PAT part of this!)
If you've used GitHub before, you may be used to logging in with your username and password on the command line. Unfortunately, GitHub stopped accepting account passwords for Git operations in August 2021, so you might need to change how you authenticate!
You have a couple of options for how to handle this:
Use a personal access token (PAT). This is an alternative of sorts to a password that you can generate for your account for each machine you use. Unlike a password, you can have multiple of these for one account, each with different permissions for what it can do. To set this up, you can follow GitHub's instructions. You probably only need the "repo" level of access for your token. Then, when you are asked for your GitHub password, paste your token in instead!
If you use this approach, you probably won't want to type in your token for every single operation. Luckily, Git supports caching that information on your local machine so you can keep reusing it.
Mac: In the shell, enter git config --global credential.helper osxkeychain. (If you use a lab machine, it will forget everything when you log out, so you'll have to do this each time you log in to the lab machine again. You can just generate a new PAT for each session. Don't forget to delete the old one!)
Windows: In the shell, enter git config --global credential.helper wincred.
Linux: In the shell, enter git config --global credential.helper 'cache --timeout=10000000' to cache your token for ten million seconds (that's roughly 16.5 weeks, which should hopefully get you to the end of the semester without logging in again). The cache is kept in memory, so you'll need to enter the token again after restarting your computer.
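If you set up Git on several machines, the choice above can be scripted. This is a minimal sketch, assuming a POSIX shell. Git Bash on Windows reports a uname starting with MINGW or MSYS, and WSL reports Linux. The function name suggest_credential_helper is hypothetical. For reference, the Linux timeout works out to 10,000,000 s ÷ 604,800 s per week ≈ 16.5 weeks.

```shell
# Hypothetical helper: print the credential.helper value suggested above for
# the current OS. Pass a `uname -s` string as $1 to override detection.
suggest_credential_helper() {
    os="${1:-$(uname -s)}"
    case "$os" in
        Darwin)
            echo "osxkeychain" ;;
        MINGW*|MSYS*)
            # Git Bash on Windows reports e.g. MINGW64_NT-10.0-19045
            echo "wincred" ;;
        Linux)
            # Includes WSL; in-memory cache for ten million seconds
            echo "cache --timeout=10000000" ;;
        *)
            echo "Unrecognized OS: $os" >&2
            return 1 ;;
    esac
}

# Usage:
#   git config --global credential.helper "$(suggest_credential_helper)"
#   git config --global credential.helper   # prints the global value
```

The lab-machine caveat for Mac still applies: a fresh login means running the setup again.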
Use an SSH key. This is similar in that a special random secret authenticates your access to GitHub. But instead of having GitHub generate the random key and entering it on your computer, you generate the key on the computer where you're using GitHub and then tell GitHub to use that key to check that it's you.
To do this, you'll use the ssh-keygen command, for which GitHub has a helpful guide. It generates both a private key, which should never be sent to anyone, and a matching public key (with a .pub extension), which can safely be shared and which GitHub uses to verify that you hold the private key. Once you generate the key pair on the computer you want to log in from, log in to GitHub in your browser and add the public key to your account (GitHub also has instructions for this). You'll have to do this for each distinct computer you use to check out code; you should not copy the SSH private key between computers.
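The key-generation step might look like the following sketch. It assumes a POSIX shell with OpenSSH installed and uses the ed25519 key type that GitHub's guide uses. The function name new_github_key is hypothetical, and the default path ~/.ssh/id_ed25519 is simply ssh-keygen's usual default for that key type. ssh-keygen will prompt you for a passphrase.

```shell
# Hypothetical helper: create a new ed25519 SSH key pair for GitHub and print
# the public half. Usage: new_github_key EMAIL [KEY_FILE]
new_github_key() {
    email="$1"
    key_file="${2:-$HOME/.ssh/id_ed25519}"
    if [ -z "$email" ]; then
        echo "usage: new_github_key EMAIL [KEY_FILE]" >&2
        return 1
    fi
    # Never overwrite an existing key; it may already be in use elsewhere.
    if [ -e "$key_file" ]; then
        echo "Key already exists: $key_file" >&2
        return 1
    fi
    # Make sure the key's directory exists (private to you if newly created).
    mkdir -p -m 700 "$(dirname "$key_file")" || return 1
    # -t: key type, -C: comment stored in the key, -f: private key location.
    # ssh-keygen prompts for a passphrase and writes KEY_FILE and KEY_FILE.pub.
    ssh-keygen -t ed25519 -C "$email" -f "$key_file" || return 1
    # Only the .pub file is meant to be shared; paste it into GitHub.
    cat "$key_file.pub"
}
```

The printed line is the public key you add to your GitHub account. The private key file stays on this computer.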
Once you do this, you can check GitHub repositories out using SSH instead of HTTPS! When you copy the command to clone your repository from GitHub, select SSH in the menu and paste the text into a git clone command, e.g.:
git clone git@github.com:hmc-cs159-fall2021/[repository].git
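After adding the key, you can check that GitHub accepts it with ssh -T git@github.com, which should greet you by your GitHub username. If you already cloned a repository over HTTPS, you don't need to re-clone it: git remote set-url can point the existing clone at the SSH address instead. The sketch below converts a GitHub HTTPS URL into its SSH form. The function name github_https_to_ssh is hypothetical, and it assumes a POSIX shell.

```shell
# Hypothetical helper: turn https://github.com/OWNER/REPO(.git) into
# git@github.com:OWNER/REPO.git so an existing clone can switch to SSH.
github_https_to_ssh() {
    case "$1" in
        https://github.com/*) ;;
        *)
            echo "Not a GitHub HTTPS URL: $1" >&2
            return 1 ;;
    esac
    path="${1#https://github.com/}"   # OWNER/REPO or OWNER/REPO.git
    path="${path%.git}"               # drop any existing .git suffix
    printf 'git@github.com:%s.git\n' "$path"
}

# Usage, from inside an existing HTTPS clone:
#   git remote set-url origin "$(github_https_to_ssh "$(git remote get-url origin)")"
#   git remote -v    # confirm origin now uses git@github.com:...
```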