What will be taught?

Introduction to Data Science with R will guide participants through the processes used to gain insights from data - regardless of whether the dataset is a small set of points or a Big Data application running in the cloud across multiple servers. The course will focus on using R to handle all four key phases of a Data Science project:

    1. Recording the data and process used: Creating a research notebook to help record methods, data, graphics and processes used in a form that can be executed and published.
        • Research notes: Capturing the rationale and the findings of research
        • Sharing the notebook: Publishing the data, analytical methods and results
    2. Manipulation of the dataset: Developing a dataset that is a reliable sample
        • Data acquisition: Loading raw data from databases, spreadsheets, data listings and the web into reliable data frames.
        • Data selection: Creating useful data subsets that can be subjected to analysis
        • Dataset development: Development of sharable collections of data
    3. Visualizations: Providing a means for seeing trends in the dataset
        • Data plots: A wide assortment of ways to plot data
        • GIS maps: Mapping information to its geo-location
        • Qualitative text analysis: Separating text into useful groupings to discover trends in the text
        • Word clouds and distribution plots: Analysis of the frequency of occurrences
    4. Machine learning: Applying statistical learning techniques to uncover useful trends in a dataset.
        • Classification: Being able to distinguish between groupings in the dataset.
        • Trend analysis: Using ANOVA, regression and other techniques to establish a predictive model of the data.
        • Factor analysis: Determining how factors influence outcomes
        • Associations: Determining how components relate to and influence each other in a system