Large data advice

The dataset you will be working with is quite LARGE.

It is recommended that you create a small data set to test your procedure on. If it works there, you can then apply it to the large data set.

Some procedures can take a frustratingly long time to run on large data sets, so while you wait it will be comforting to know that your procedure works, because you already tested it on a smaller data set.

It is recommended that you take a random sample of rows from the original data set, but you might find other approaches useful.
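
As one possible way to take such a random sample, here is a minimal Python sketch. It assumes the data is a CSV file with a header row, which may not match your actual data. The function names (sample_rows, write_sample) and the file names in the usage note are hypothetical, chosen only for illustration. It uses reservoir sampling, so the large file is read one row at a time and never has to fit in memory.

```python
import csv
import random


def sample_rows(rows, k, seed=0):
    """Return up to k rows chosen uniformly at random from an iterable.

    Uses reservoir sampling, so only k rows are held in memory at once.
    If the iterable has fewer than k rows, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep this row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def write_sample(src_path, dst_path, k, seed=0):
    """Copy the header and k randomly chosen rows from one CSV to another.

    Assumes the first line of the source file is a header row.
    """
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        sample = sample_rows(reader, k, seed)
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(sample)
```

For example, write_sample("big.csv", "small.csv", 1000) would write a 1000-row test file, where both file names are placeholders for your own. The sampled rows are not guaranteed to stay in their original order, so sort them afterwards if your procedure depends on row order.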