Description:
Preparing, analyzing, and visualizing large data sets is a central skill in engineering and computer science.
In this project you will work in groups and pull a dataset from the University of California, Irvine (UCI) Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php). Links to several specific datasets are included below.
Requirements:
Select a dataset from the UCI collection. Download the dataset and convert it to Google Sheets or Excel format. You will also need a .csv file of the dataset.
Repair or standardize the dataset as needed
Read the dataset description and determine a goal for your analysis. This goal may include:
Visualizing the data (reading the data and generating a graphic file)
Finding the max, min, or another statistically interesting trend in the data
Other observations noted in the data description files
Design any classes or data structures needed to work with the information
Implement a Java program that does the following:
Reads the data from a text file
Parses data into your classes and data structures
Visualizes the data in some way (use Princeton's StdDraw library, linked below)
Prints or displays your analysis
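The read, parse, and analyze steps above could be sketched roughly as follows. This is only a starting point under stated assumptions, not a required design: the class names DatasetSketch and Sample are hypothetical, the code assumes each row is a text label followed by comma-separated numbers, and the demo rows are made up for illustration rather than taken from a real dataset. Check your dataset's description file for its actual layout. Rows that are blank or contain a non-numeric value (such as a "?" placeholder) are skipped, which is one simple way to handle the repair step.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class DatasetSketch {

    /** One row of the dataset: a text label followed by numeric features (assumed layout). */
    static final class Sample {
        final String label;
        final double[] features;

        Sample(String label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Parses one CSV line; returns null for rows that cannot be used. */
    static Sample parseLine(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length < 2) {
            return null; // blank line or a row with no numeric fields
        }
        double[] features = new double[fields.length - 1];
        try {
            for (int i = 1; i < fields.length; i++) {
                features[i - 1] = Double.parseDouble(fields[i].trim());
            }
        } catch (NumberFormatException e) {
            return null; // e.g. a "?" placeholder for a missing value
        }
        return new Sample(fields[0].trim(), features);
    }

    /** Reads every usable row from the source, skipping rows parseLine rejects. */
    static List<Sample> readSamples(Reader source) {
        List<Sample> samples = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                Sample s = parseLine(line);
                if (s != null) {
                    samples.add(s);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return samples;
    }

    /** Returns {min, max, mean} of one feature column; assumes every row has that column. */
    static double[] columnStats(List<Sample> samples, int column) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (Sample s : samples) {
            double v = s.features[column];
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        return new double[] {min, max, sum / samples.size()};
    }

    public static void main(String[] args) {
        // Made-up demo rows; in the project, pass new java.io.FileReader("your-file.csv") instead.
        String demo = "T,2,8,3\nI,5,12,3\n\nD,4,?,6\nN,7,10,6\n";
        List<Sample> samples = readSamples(new StringReader(demo));
        double[] stats = columnStats(samples, 0);
        System.out.println("rows kept: " + samples.size());
        System.out.println("min=" + stats[0] + " max=" + stats[1] + " mean=" + stats[2]);
    }
}
```

Reading from a Reader rather than a file name keeps the parsing code easy to try on a small in-memory sample before pointing it at the full .csv file.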
Deliverables:
Google Slides deck with the following slides:
Description of and link to the dataset. What will you analyze or find?
Pipeline diagram showing the process and structure of your code
Slide showing your visualization and results
ZIP file with the source code for the project (or a GitHub link to a repository with your code)
Resources:
Link to UCI Archive: https://archive.ics.uci.edu/ml/index.php
Selected Datasets:
https://archive.ics.uci.edu/ml/datasets/Daily+and+Sports+Activities
https://archive.ics.uci.edu/ml/datasets/Letter+Recognition
https://archive.ics.uci.edu/ml/datasets/Pen-Based+Recognition+of+Handwritten+Digits
Princeton's Standard Library: https://introcs.cs.princeton.edu/java/stdlib/stdlib.jar
JavaDocs and Source code for stdlib: https://introcs.cs.princeton.edu/java/stdlib/
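For the visualization requirement, one possible approach is a bar chart of how often each label occurs. StdDraw's default canvas is the unit square, and its filledRectangle method takes a center point plus a half-width and half-height, so the sketch below computes bars in that form. The class name BarChartSketch, its helper methods, and the sample labels are hypothetical. The StdDraw call is left in a comment so the sketch runs without stdlib.jar on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BarChartSketch {

    /** Counts how often each label occurs, sorted alphabetically by label. */
    static Map<String, Integer> countLabels(List<String> labels) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Returns one {centerX, centerY, halfWidth, halfHeight} row per bar,
     * laid out left to right inside the unit square. The tallest bar
     * reaches the top of the square.
     */
    static double[][] barGeometry(int[] counts) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        double slot = 1.0 / counts.length; // horizontal space per bar
        double[][] bars = new double[counts.length][];
        for (int i = 0; i < counts.length; i++) {
            double height = (max == 0) ? 0.0 : (double) counts[i] / max;
            bars[i] = new double[] {(i + 0.5) * slot, height / 2, slot * 0.4, height / 2};
        }
        return bars;
    }

    public static void main(String[] args) {
        // Made-up labels; in the project these would come from the parsed dataset.
        Map<String, Integer> counts = countLabels(Arrays.asList("A", "B", "A", "C", "A", "B"));
        int[] values = counts.values().stream().mapToInt(Integer::intValue).toArray();
        for (double[] b : barGeometry(values)) {
            // With stdlib.jar on the classpath, each bar could be drawn with:
            // StdDraw.filledRectangle(b[0], b[1], b[2], b[3]);
            System.out.println(Arrays.toString(b));
        }
    }
}
```

Keeping the geometry separate from the drawing calls makes it possible to check the numbers by printing them before anything is drawn.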