Finding patterns in historical data so that an outcome or opportunity can be identified before it occurs.
PA is a form of supervised learning, which requires a target (i.e. the value to predict)
A supervised learning algorithm analyses the historical (i.e. training) data and produces an inferred function, which can be used to map new examples to predictions
Predictive Modeling Overview
Each training data record must include categorical or numeric input and target measurements
A predictive model is a concise representation of the association between the inputs and the target
A prediction is the output of the predictive model (i.e. the score) given a set of input measurements (i.e. the score data)
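The relationship between training records, model and prediction can be sketched in a few lines. This is a minimal illustration only: the four training records and the choice of a least-squares line are assumptions made for the example, not something the notes prescribe.

```python
# Training data: each record pairs one numeric input with its target.
# These four records are made up for illustration.
training = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def fit_line(records):
    """Least-squares fit of target = intercept + slope * input."""
    n = len(records)
    mean_x = sum(x for x, _ in records) / n
    mean_y = sum(y for _, y in records) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in records)
    sxx = sum((x - mean_x) ** 2 for x, _ in records)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The returned function is the "concise representation" of the
    # input/target association: two numbers instead of all the records.
    return lambda x: intercept + slope * x


model = fit_line(training)   # the predictive model
score = model(5.0)           # the prediction for a new input (score data)
print(round(score, 2))
```

Here `model` plays the role of the inferred function and `score` is the prediction for an input that was not in the training data.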
Modeling Essentials
Predict new cases (Three Prediction Types)
Decisions
A predictive model uses input measurements to make the best decision for each case
Rankings
A predictive model uses input measurements to optimally rank each case
Estimates
A predictive model uses input measurements to optimally estimate the target value
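The three prediction types can all be derived from one scoring function, as the sketch below tries to show. The function `estimate`, its coefficients, the three cases and the 0.5 cutoff are all hypothetical values chosen for illustration.

```python
import math


def estimate(income, debt):
    """Hypothetical model: estimated probability of the target event."""
    z = 0.08 * income - 0.15 * debt - 1.0
    return 1.0 / (1.0 + math.exp(-z))


# Score data: input measurements (income, debt) for three new cases.
cases = {"A": (50, 10), "B": (20, 30), "C": (35, 5)}

# Estimates: the predicted target value itself for each case.
estimates = {name: estimate(*inputs) for name, inputs in cases.items()}

# Rankings: order the cases from highest to lowest estimate.
rankings = sorted(estimates, key=estimates.get, reverse=True)

# Decisions: turn each estimate into an action using a cutoff (assumed 0.5).
decisions = {
    name: ("accept" if p >= 0.5 else "reject") for name, p in estimates.items()
}

print(rankings)
print(decisions)
```

A decision only needs the estimate to fall on the right side of the cutoff, a ranking only needs the order to be right, and an estimate needs the value itself to be accurate.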
Select useful inputs
Dimension Reduction: remove redundant and irrelevant inputs
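One simple way to illustrate the two ideas is with correlations: an input that barely correlates with the target is a candidate for irrelevancy, and an input that correlates almost perfectly with an input already kept is a candidate for redundancy. This heuristic, the toy data and the thresholds 0.3 and 0.95 are assumptions made for this sketch; the notes do not name a specific method.

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Toy inputs (made up): x2 is an exact multiple of x1, x3 is unrelated noise.
inputs = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 12],
    "x3": [1, -1, -1, 1, 1, -1],
}
target = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]

# Irrelevancy: drop inputs with little linear association with the target.
relevant = [
    name for name, col in inputs.items() if abs(pearson(col, target)) >= 0.3
]

# Redundancy: keep an input only if it is not nearly collinear
# with an input that has already been selected.
selected = []
for name in relevant:
    if all(abs(pearson(inputs[name], inputs[kept])) < 0.95 for kept in selected):
        selected.append(name)

print(selected)
```

With this data `x3` is dropped as irrelevant and `x2` as redundant, leaving only `x1`. Correlation only captures linear association, so it is a rough screen rather than a complete selection method.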
Optimize complexity
Data Partitioning (Tune models with validation data)
Partition available data into training and validation datasets
The purpose of the data split:
The model is fit on the training dataset, and that model’s performance is evaluated on the validation dataset
Select the simplest model with the highest validation assessment
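The selection rule can be sketched as follows. Everything here is assumed for illustration: the toy data, the piecewise-constant (binned mean) model, the use of the number of bins as the complexity measure, and mean squared error as the assessment (lower error means a higher assessment).

```python
def fit_binned_mean(train, n_bins, lo=0.0, hi=10.0):
    """Piecewise-constant model: predict the mean training target per bin."""
    width = (hi - lo) / n_bins
    overall = sum(y for _, y in train) / len(train)
    sums, counts = [0.0] * n_bins, [0] * n_bins

    def bin_of(x):
        return min(int((x - lo) / width), n_bins - 1)

    for x, y in train:
        sums[bin_of(x)] += y
        counts[bin_of(x)] += 1
    # Empty bins fall back to the overall training mean.
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]


def mse(model, data):
    """Mean squared error of the model on (input, target) records."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Made-up training and validation records.
train = [(0.5, 1.0), (1.5, 1.2), (2.5, 2.9), (3.5, 3.1), (4.5, 5.0),
         (5.5, 5.2), (6.5, 6.8), (7.5, 7.1), (8.5, 9.0), (9.5, 9.3)]
valid = [(1.0, 1.1), (3.0, 3.0), (5.0, 5.1), (7.0, 7.0), (9.0, 9.1)]

candidates = [1, 2, 5, 10]  # number of bins, from simplest to most complex
models = {k: fit_binned_mean(train, k) for k in candidates}
train_error = {k: mse(models[k], train) for k in candidates}
valid_error = {k: mse(models[k], valid) for k in candidates}

# Lowest validation error wins; ties go to the smaller (simpler) model.
best = min(candidates, key=lambda k: (round(valid_error[k], 12), k))
print(best, valid_error[best])
```

The most complex candidate reproduces the training data exactly, so training error alone would always favour it. Judging on the validation data is what keeps the choice from drifting towards an overfitted model.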
As a rule of thumb, the full dataset is split into:
Dataset for Modeling
Train (70%) - Create (fit) the model; compare with validation performance to check for overfitting
Validation (15%) - Validate the model performance
Dataset to Assess Model
Test (15%) - Test & assess the model
The training dataset is larger than the validation dataset, and the test dataset is smaller than the modeling dataset (training plus validation)
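A 70/15/15 partition along these lines can be sketched with the standard library alone. The helper name `partition`, the fixed seed and the use of a simple random shuffle are assumptions for this example; other schemes (e.g. stratified or time-based splits) may suit a given dataset better.

```python
import random


def partition(records, train_frac=0.70, valid_frac=0.15, seed=42):
    """Shuffle the records and split them into train, validation and test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever remains (about 15%)
    return train, valid, test


records = list(range(100))  # stand-in for 100 data records
train, valid, test = partition(records)
print(len(train), len(valid), len(test))
```

Each record lands in exactly one of the three datasets, so the test data stays untouched while the model is fit and tuned.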