SPECIALIZATION ELECTIVE
Credit Hours: 3
Pre-requisite: Mathematics for Machine Learning
Synopsis
Feature engineering is the process of extracting features from raw data and transforming them into formats that are suitable for a machine learning model. Most of the time spent building a machine learning pipeline goes into feature engineering and data cleaning. This course provides the core knowledge and techniques needed for feature engineering. Feature engineering is a crucial step in the machine learning pipeline because the right features can make modeling easier and therefore enable the pipeline to produce higher-quality results. The topics covered in this course include basic feature engineering for numeric data; feature engineering for natural text; encoding techniques for categorical variables; principal component analysis (PCA); k-means as a featurization technique; and manual feature extraction techniques for images. In addition, the course explains how deep learning can be used as a feature extraction technique for images.
Course Content
Topic 1: The Machine Learning Pipeline
Topic 2: Feature Engineering for Numerical Data
Scalars, Vectors, and Spaces
Dealing with Counts (Binarization; Quantization or Binning)
Log Transformation (Log Transform in Action; Power Transform)
Feature Scaling or Normalization (Min-Max Scaling; Standardization; Normalization)
Interaction Features
Feature Selection
Topic 3: Feature Engineering for Textual Data
Bag of X (Bag-of-Words)
Bag-of-n-Grams
Filtering for Cleaner Features (Stopwords, Frequency-Based Filtering, Stemming)
Atoms of Meaning (Parsing and Tokenization; Collocation Extraction for Phrase Detection)
Feature Scaling (Standardization, Min-Max Scaling, Robust Scaling, Max Absolute Value, Mean Normalization, Unit Length Scaling)
Topic 4: Categorical Variables
Encoding Categorical Variables (One-hot Encoding; Dummy Coding; Effect Coding)
Dealing with Large Categorical Variables (Feature Hashing, Bin Counting)
Topic 5: Dimensionality Reduction
Intuition
Derivation
PCA in Action
Whitening and ZCA
Considerations and Limitations of PCA
Topic 6: Nonlinear Featurization via K-Means Model Stacking
k-Means Clustering
Clustering as Surface Tiling
k-Means Featurization for Classification
Topic 7: Automating the Featurizer: Image Feature Extraction and Deep Learning
The Simplest Image Features
Manual Feature Extraction (SIFT and HOG)
Learning Image Features with Deep Neural Networks
References
Alice Zheng and Amanda Casari. [2018]. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. 1st Edition, O'Reilly Media, Inc.
Max Kuhn and Kjell Johnson. [2019]. Feature Engineering and Selection: A Practical Approach for Predictive Models, CRC Press.
Prepared By
Ts. Dr. Yasmin Mohd Yacob / Assoc. Prof. Ts. Dr. Amiza Amir