1. Choose one of the data sets listed above in the Activity section or one that you find on your own and give a brief description of it. What specifically were the types of data (text, sounds, transactions, etc.) included in the data set you chose?
I looked at a data set that had information on student debt. It showed a graph that plotted different schools as points. One axis was average tuition cost and the other was average debt after graduation. You could also filter colleges by size and change the year of the graph.
2. What new facts did you learn when exploring the data set? List at least 3 facts.
I saw that both public and private colleges are trending toward being more expensive to attend. Also, students are graduating with more debt every year. Finally, there seems to be only a slight difference in the amount of debt at graduation between public and private colleges.
3. Write a question you have about the data set you chose. Now, convert that question into a hypothesis (a statement) with your prediction about the data.
If a college has a high graduation rate, then its average student debt after graduation will be higher than that of a college with a lower graduation rate.
4. Identify at least one security and/or privacy concern that is associated with the data in the data set you chose.
This data could be used to predict how much debt someone has after graduating from a certain college, and that information could be used against them if it became known.
5. If your data set included a visualization, explain the purpose of the visualization. How would you change or improve the visualization? If it did not include a visualization, describe one that you think would be useful in understanding the data.
The data set included two visualizations: a map and a graph. Both worked well: the size of each point showed the size of the college, and a color code distinguished public from private colleges. To improve these visualizations, I would make it easier to pick out separate points, since they were all so close together that it was difficult to click on an individual point and get its data.