What will be taught?
Introduction to Data Science with R will guide participants through the processes used to gain insights from data, regardless of whether the dataset is a small set of points or a Big Data application running in the cloud across multiple servers. The course will focus on using R to handle all four key phases of a Data Science project:
- Recording the data and process used: Creating a research notebook to help record methods, data, graphics and processes used in a form that can be executed and published.
  - Research notes: Capturing the rationale and the findings of research
  - Sharing the notebook: Publishing the data, analytical methods and results
- Manipulation of the dataset: Developing a dataset that is a reliable sample
  - Data acquisition: Loading raw data from databases, spreadsheets, data listings and the web into reliable data frames.
  - Data selection: Creating useful data subsets that can be subjected to analysis
  - Dataset development: Development of sharable collections of data
- Visualizations: Providing a means for seeing trends in the dataset
  - Data plots: A wide assortment of ways to plot data
  - GIS maps: Mapping information to its geo-location
  - Qualitative Text Analysis: Separating text into useful groupings to discover trends in the text
  - Word Clouds and Distribution Plots: Analysis of the frequency of occurrences
- Machine learning: Subjecting data to various statistical learning techniques provides opportunities to uncover useful trends in a dataset.
  - Classification: Being able to distinguish between groupings in the dataset.
  - Trend analysis: Use of ANOVA, regression and other techniques to establish a predictive model of the data.
  - Factor analysis: Determining how factors influence outcomes
  - Associations: Determining how components relate to and influence each other in a system