Data Analysis
Data gathering, manipulation, analysis, and visualization
Visit the official UMSI website for the most up-to-date information on client-based courses
Information on the site you are currently visiting is no longer being updated as of Summer 2021
SI 370: Data Exploration
What is Data Exploration?
In SI 370: Data Exploration, undergraduate students get started with their own data acquisition and exploratory data analysis (EDA). Students in this course will learn basic concepts of information visualization and techniques of exploratory data analysis, using scripting, text parsing, structured query language, regular expressions, graphing, and clustering methods to explore data. Students will be able to make sense of and see patterns in otherwise intractable quantities of data.
Deliverables
What do clients receive for participating in this course?
Students will analyze the dataset and deliver insights identified through exploratory data analysis and a variety of related techniques
Client Eligibility
Who can participate?
Potential clients should meet the following criteria:
Able to provide one or more datasets that might offer insights into interesting questions posed by your organization
Data must not be too small (100-1000 rows is likely too small, though this depends on the questions that are relevant to the data)
Willing to allow students to identify trends, insights, and questions about the data
Projects
What kinds of projects are appropriate for the course?
Potential projects should meet the following criteria:
Be data-centric and revolve around large-scale datasets, with students working on problems of data manipulation, analysis, and visualization
Involve significant technical work, with corresponding amounts of programming and/or data analysis scripting
May be open-ended, without well-defined outcomes or paths to judging success
Desirable projects may include the following:
Apply tools of EDA in new situations, specifically techniques, methods, and workflows to: (a) maximize insight into a dataset; (b) uncover underlying structure; (c) extract important variables; (d) detect outliers and anomalies; (e) test underlying assumptions; (f) develop parsimonious models; and (g) determine optimal factor settings
Compute and visualize summary statistics of datasets
Combine the use of graphical aesthetics with data manipulation to visualize relationships between variables
Use factors to analyze categorical data and exploratory clustering analysis for unlabeled data
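As a rough illustration of a few of the techniques listed above (summary statistics, outlier detection, and grouping by a categorical factor), here is a minimal Python sketch. The sample records, the function names, and the 1.5 × IQR outlier threshold are assumptions made for this example; they are not taken from the course materials, and student projects may use different tools and methods.

```python
import statistics
from collections import defaultdict

# Hypothetical sample records: (category, value). Not real client data.
records = [
    ("north", 12.0), ("north", 14.5), ("north", 13.2),
    ("south", 11.8), ("south", 95.0), ("south", 12.9),
    ("east", 13.7), ("east", 12.4),
]

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def group_by_factor(rows):
    """Group numeric values by their categorical label."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return dict(groups)

all_values = [value for _, value in records]
overall = summarize(all_values)
outliers = iqr_outliers(all_values)
per_group = {name: summarize(vals)
             for name, vals in group_by_factor(records).items()}

print("Overall:", overall)
print("Possible outliers:", outliers)
for name, stats in per_group.items():
    print(name, stats)
```

A flagged value is only a candidate for closer inspection; whether it is an error or a genuine observation depends on the questions relevant to the data.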
What kinds of projects are NOT appropriate for the course?
Less desirable projects may include the following:
Work on mission-critical components or processes with critical dependencies on other projects
Projects that do not involve significant technical or programming work
How many projects are selected for this course?
Approximately 5-10*
* Due to variability in the number of enrolled students each year, this number is subject to change and should be treated as a rough estimate.
Participate
How do I become a client?
Potential clients should complete this brief form with their contact information and a short summary of their project idea. Our Client Engagement Team will review your submission and reach out to you within 3 business days with next steps.
What if I don't have a project right now, but I'm interested in future opportunities or want to learn more?
If you don't have a specific project in mind for the upcoming semester, but would like to stay informed about future opportunities to work with students through our client-based courses or other programs, complete this registration form to be added to our mailing list.
Timeline
This is a Fall semester course, which runs from August to December
June - August
Client submits project idea
Client Engagement Team (CET) reviews project idea and requests full project proposal
CET works with client to scope and refine proposal
Client sends sample of dataset to CET and faculty
August - September
Faculty choose proposals to present to students
Students choose their project
Client sends full dataset to CET
October - December
Students explore the data and finalize the anticipated scope and deliverables