Data Analysis

Data gathering, manipulation, analysis, and visualization

Visit the official UMSI website for the most up-to-date information on client-based courses.

Information on the site you are currently visiting has not been updated since Summer 2021.


SI 370: Data Exploration

What is Data Exploration?

In SI 370: Data Exploration, undergraduate students get started with their own data acquisition and exploratory data analysis (EDA). Students in this course will learn basic concepts of information visualization and techniques of exploratory data analysis, using scripting, text parsing, structured query language, regular expressions, graphing, and clustering methods to explore data. Students will be able to make sense of and see patterns in otherwise intractable quantities of data.

Deliverables

What do clients receive for participating in this course?

  • Students will analyze the dataset and deliver insights identified through exploratory data analysis, using a variety of techniques

Client Eligibility

Who can participate?

Potential clients should meet the following criteria:

  • Able to provide one or more datasets that might offer insights into questions of interest to your organization

  • Able to provide data that is not too small (100-1000 rows is likely too small, though this depends on the questions that are relevant to the data)

  • Willing to allow students to identify trends, insights, and questions about the data

Projects

What kinds of projects are appropriate for the course?

Potential projects should meet the following criteria:

  • Data-centric and revolve around large-scale datasets, with students working on problems of data manipulation, analysis, and visualization

  • Involve significant technical work, with corresponding amounts of programming and/or data analysis scripting

  • Projects without well-defined outcomes or paths to judging success

Desirable projects may include the following:

  • Apply tools of EDA in new situations, specifically techniques, methods, and workflows to: (a) maximize insight into a dataset; (b) uncover underlying structure; (c) extract important variables; (d) detect outliers and anomalies; (e) test underlying assumptions; (f) develop parsimonious models; and (g) determine optimal factor settings

  • Compute and visualize summary statistics of datasets

  • Combine the use of graphical aesthetics with data manipulation to visualize relationships between variables

  • Use factors to analyze categorical data, and exploratory clustering analysis to explore unlabeled data
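
As a rough illustration of two of the tasks listed above (computing summary statistics and detecting outliers), a minimal Python sketch might look like the following. The sample values, the function names, and the 1.5 x IQR outlier rule are assumptions chosen for illustration; they are not taken from the course materials.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    The k = 1.5 threshold is a common convention, used here as an assumption.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: daily visit counts with one suspicious spike.
daily_visits = [12, 15, 14, 13, 16, 15, 14, 98, 13, 15]

summary = summarize(daily_visits)
outliers = iqr_outliers(daily_visits)
print(summary)
print("Possible outliers:", outliers)
```

In a real project, students would more likely load the client's dataset with a data analysis library and pair these numbers with plots; the sketch only shows the kind of first-pass check that exploratory analysis starts from.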

What kinds of projects are NOT appropriate for the course?

Less desirable projects may include the following:

  • Work on mission-critical components or processes with critical dependencies on other projects

  • Projects that do not involve significant technical or programming work

How many projects are selected for this course?

  • Approximately 5-10*

* Due to variability in the number of enrolled students each year, this number is subject to change and should be treated as a rough estimate.

Participate

How do I become a client?

Potential clients should complete this brief form with their contact information and a short summary of their project idea. Our Client Engagement Team will review your submission and reach out to you within 3 business days with next steps.

What if I don't have a project right now, but I'm interested in future opportunities or want to learn more?

If you don't have a specific project in mind for the upcoming semester, but would like to stay informed about future opportunities to work with students through our client-based courses or other programs, complete this registration form to be added to our mailing list.

Timeline

This is a Fall semester course that runs from August to December.

June - August

  • Client submits project idea

  • Client Engagement Team (CET) reviews project idea and requests full project proposal

  • CET works with client to scope and refine proposal

  • Client sends sample of dataset to CET and faculty

August - September

  • Faculty choose proposals to present to students

  • Students choose their project

  • Client sends full dataset to CET

October - December

  • Students explore the data and finalize the anticipated scope and deliverables