RyanGoh's Eportfolio

2. Predictive Modeling Fundamentals

Predictive Analytics (PA)

Finding a pattern in historical data so that an outcome or opportunity can be identified before it occurs.
PA is a form of supervised learning, which requires a target (i.e. the value to predict).
A supervised learning algorithm analyses the historical (i.e. training) data and produces an inferred function, which can be used to map new examples to predictions.
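
As a minimal sketch of the idea above, the code below fits a straight line to a few historical records and returns the inferred function, which is then applied to a new example. The function name `fit_line` and the data values are assumptions made up for illustration; a least-squares line is only one of many possible supervised learning algorithms.

```python
def fit_line(inputs, targets):
    """Learn target = intercept + slope * input by ordinary least squares.

    Returns the inferred function so it can be applied to new examples.
    """
    n = len(inputs)
    x_bar = sum(inputs) / n
    y_bar = sum(targets) / n
    sxx = sum((x - x_bar) ** 2 for x in inputs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(inputs, targets))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


# Hypothetical historical (training) data: each record has an input and a known target.
train_inputs = [1.0, 2.0, 3.0, 4.0]
train_targets = [3.0, 5.0, 7.0, 9.0]

# The inferred function produced from the training data.
predict = fit_line(train_inputs, train_targets)

# Map a new example (input only, target unknown) to a prediction.
new_prediction = predict(5.0)
print(new_prediction)
```

The training targets here follow target = 1 + 2 * input exactly, so the learned function reproduces that rule for the new input.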

Predictive Modeling Overview

Each training data record must include categorical or numeric input and target measurements
A predictive model is a concise representation of the association between the inputs and the target
A prediction is the output of the predictive model (i.e. a score) given a set of input measurements (i.e. the score data)

Modeling Essentials

Predict new cases (Three Prediction Types)
1. Decisions
  - A predictive model uses input measurements to make the best decision for each case
2. Rankings
  - A predictive model uses input measurements to optimally rank each case
3. Estimates
  - A predictive model uses input measurements to optimally estimate the target value
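
The three prediction types can be illustrated from a single set of model scores. This is a sketch only: the case names, the score values, and the 0.5 decision cutoff are all assumptions chosen for the example, not part of the notes above.

```python
# Hypothetical model output: estimated probability of the target event for each case.
scores = {"case_a": 0.82, "case_b": 0.35, "case_c": 0.61}

# 3. Estimates: the score itself is the estimate of the target value.
estimates = dict(scores)

# 2. Rankings: order the cases from highest to lowest score.
ranking = sorted(scores, key=scores.get, reverse=True)

# 1. Decisions: turn each score into an action using a cutoff (0.5 is an assumed value).
CUTOFF = 0.5
decisions = {case: ("act" if score >= CUTOFF else "skip") for case, score in scores.items()}

print(estimates)
print(ranking)
print(decisions)
```

Rankings only need the scores to be in the right order, while estimates need the score values themselves to be accurate, so a model can rank well and still estimate poorly.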
Select useful inputs
- Dimension Reduction: Redundancy & Irrelevancy
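
One simple way to screen inputs for irrelevancy and redundancy is with pairwise correlations, sketched below. The input names, data values, and both thresholds (0.3 and 0.95) are assumptions for illustration, and correlation only detects linear relationships, so this is a rough filter rather than a full dimension reduction method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists (non-constant inputs assumed)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: three candidate inputs and one target.
target = [1, 2, 3, 4, 5, 6]
inputs = {
    "spend": [2, 4, 6, 8, 10, 12],                  # related to the target
    "spend_copy": [1.0, 2.1, 2.9, 4.0, 5.1, 5.9],   # nearly duplicates "spend"
    "noise": [3, 1, 2, 2, 1, 3],                    # unrelated to the target
}

MIN_RELEVANCE = 0.3    # assumed cutoff: below this, an input is treated as irrelevant
MAX_REDUNDANCY = 0.95  # assumed cutoff: above this, two inputs are treated as redundant

# Irrelevancy: drop inputs that are barely correlated with the target.
relevant = [name for name, col in inputs.items()
            if abs(pearson(col, target)) >= MIN_RELEVANCE]

# Redundancy: keep an input only if it is not a near copy of one already kept.
selected = []
for name in relevant:
    if all(abs(pearson(inputs[name], inputs[kept])) < MAX_REDUNDANCY for kept in selected):
        selected.append(name)

print(selected)
```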
Optimize complexity
- Data Partitioning (Tune models with validation data)
  - Partition available data into training and validation datasets
  - The purpose of data split:
    - The model is fit on the training dataset, and that model’s performance is evaluated on the validation dataset
  - Select the simplest model with the highest validation assessment
- The full dataset consists of (Rule of Thumb):
  1. Dataset for Modeling
    - Train (70%) - Create (fit) the model / check whether it overfits the data
    - Validation (15%) - Validate the model performance
  2. Dataset to Assess Model
    - Test (15%) - Test & assess the model
  - The training dataset is larger than the validation dataset, and the test dataset is smaller than the modeling dataset
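
The 70/15/15 rule of thumb and the "simplest model with the highest validation assessment" rule can be sketched as follows. The function names, the seed, the 100 placeholder records, and the candidate models with their validation scores are all hypothetical values used only to show the mechanics.

```python
import random


def split_dataset(records, train_frac=0.70, valid_frac=0.15, seed=42):
    """Shuffle the records and partition them into train, validation, and test sets.

    Whatever is left after the train and validation shares becomes the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


def select_model(candidates):
    """Pick the highest validation score; break ties by choosing the lower complexity.

    Each candidate is a (name, complexity, validation_score) tuple.
    """
    return min(candidates, key=lambda c: (-c[2], c[1]))


# Hypothetical full dataset of 100 records.
records = list(range(100))
train, valid, test = split_dataset(records)
print(len(train), len(valid), len(test))

# Hypothetical candidates: models of increasing complexity scored on the validation set.
candidates = [
    ("model_small", 2, 0.81),
    ("model_medium", 5, 0.86),
    ("model_large", 9, 0.86),
]
best = select_model(candidates)
print(best[0])
```

The test set is not used while fitting or choosing between models; it is held back so the final assessment is made on data the selected model has never seen.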
