Groups:
Group 1 (xena01-03) - Fordin, Sarah; Samoray, Nicholas; Viltoft, Jorgen
Group 2 (xena04-06) - Koeller, Jordan; Ang, Sam; Newton, Michael
Group 3 (xena07-09) - Burton, Craig; Yang, Mary; Herbert, Emily
Group 4 (xena10-12) - Burnett, Jesse; Usiri, Calvin; Holloway, Taylor
Group 5 (xena13-15) - Reyes, Miguel; Walker, Blair; Andres, Robbie
Group 6 (xena16-18) - Bomer, Dan; Chang, Stephen; Skogman, Brett; Taylor, Zachary
Group 7 (xena19-21) - Witecki, Ian; Croxton, John; Whitten, Marcus
Data Set:
The file /users/mlewis/CSCI3395-F17/data/ghcn-daily/2017.csv was pulled from the "by_year" directory at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/. That same directory also has files called ghcnd-stations.txt and ghcnd-countries.txt that you will need to answer the questions for this week. You will need to read some of the support files on the web site in order to get information on the format of the data files.
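As a starting point for reading those files, here is a minimal sketch in plain Scala (no Spark) of parsing one line from each. The column positions and field order are assumptions based on the GHCN-Daily readme files, so check them against the documentation before relying on them. The names `Station`, `Observation`, and `GhcnParse` are hypothetical and not part of any library.

```scala
// Sketch only: the layout below is assumed from the GHCN-Daily readme files; verify it first.
case class Station(id: String, lat: Double, lon: Double, elev: Double, state: String, name: String)
case class Observation(id: String, date: Int, element: String, value: Int)

object GhcnParse {
  // ghcnd-stations.txt is assumed to be fixed width (1-based, inclusive columns):
  // ID 1-11, LATITUDE 13-20, LONGITUDE 22-30, ELEVATION 32-37, STATE 39-40, NAME 42-71.
  def parseStation(line: String): Station = {
    val padded = line.padTo(71, ' ') // short lines are padded so substring never fails
    Station(
      padded.substring(0, 11),
      padded.substring(12, 20).trim.toDouble,
      padded.substring(21, 30).trim.toDouble,
      padded.substring(31, 37).trim.toDouble,
      padded.substring(38, 40).trim,
      padded.substring(41, 71).trim)
  }

  // Yearly files are assumed to be comma separated:
  // station ID, date as YYYYMMDD, element code (e.g. TMAX, PRCP), integer value, then flag fields.
  // The readme gives the units (TMAX and PRCP are stored in tenths of a unit).
  def parseObservation(line: String): Observation = {
    val p = line.split(",")
    Observation(p(0), p(1).toInt, p(2), p(3).toInt)
  }
}
```

With Spark, functions like these could be passed to `map` on the RDD returned by `sc.textFile`. The station ID used when trying this out should come from the real files; any made-up ID is only a placeholder.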
In Class Questions (done in groups):
1. How many stations are there in the state of Texas?
2. How many of those stations have reported some form of data in 2017?
3. What is the highest temperature reported anywhere this year? Where was it and when?
4. How many stations in the stations list haven't reported any data in 2017?
Before you leave class, one member of your group needs to send me an email with your group answers to these questions. Make sure the email also includes the names of all the group members who were present to work on this.
Between Class Questions (done alone):
All the code that you write to answer these questions should be put in a package called sparkrdd in the in-class repository. (Note that a package is just a subdirectory. Since all of your code is going in src/main/scala for sbt, the code for this should be in src/main/scala/sparkrdd.) You should also make a file called sparkrdd.md in the top level of your repository that includes a write-up with your answers to the questions and any requested plots. Once you do your last push of the files, send me an email with links to both files to let me know.
1. What is the maximum rainfall for any station in Texas during 2017? What station and when?
2. What is the maximum rainfall for any station in India during 2017? What station and when?
3. How many weather stations are there associated with San Antonio, TX?
4. How many of those have reported temperature data in 2017?
5. What is the largest daily increase in high temperature for San Antonio in this data file?
6. What is the correlation coefficient between high temperatures and rainfall for San Antonio? Note that you can only use values from the same date and station for the correlation.
7. Make a plot of temperatures over time for five different stations, each separated by at least 10 degrees in latitude. Make sure you tell me which stations you are using.
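
For question 6, the following is a minimal sketch in plain Scala of the Pearson correlation coefficient over values matched on the same station and date. The names `Stats`, `pairUp`, and `pearson` are hypothetical, and keying each value by a (station ID, date) pair is an assumption about how you organize the data, not a required approach.

```scala
object Stats {
  // Keep only the keys (station ID, date) that appear in both maps,
  // so each high temperature is paired with rainfall from the same station and day.
  def pairUp(tmax: Map[(String, Int), Double],
             prcp: Map[(String, Int), Double]): Seq[(Double, Double)] =
    tmax.toSeq.flatMap { case (key, t) => prcp.get(key).map(p => (t, p)) }

  // Pearson correlation: covariance divided by the product of the standard deviations.
  // The 1/n factors cancel, so sums of deviations are used directly.
  def pearson(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.size >= 2, "need at least two pairs")
    val n = pairs.size.toDouble
    val meanX = pairs.map(_._1).sum / n
    val meanY = pairs.map(_._2).sum / n
    val cov = pairs.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val varX = pairs.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
    val varY = pairs.map { case (_, y) => (y - meanY) * (y - meanY) }.sum
    cov / math.sqrt(varX * varY) // NaN if either series is constant
  }
}
```

In a Spark version, the pairing step would more likely be a `join` on two pair RDDs keyed the same way, with the sums computed by RDD aggregations instead of local collections.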