PYTHON PROGRAMMING for Data Analytics
Introduction
Course Introduction
Course Curriculum Overview & Python 3
Installation Notes
Quick Note on Jupyter Notebook
Python Installation: Windows, Ubuntu
IDE Selection
Jupyter (IPython) Notebooks
Basic Command
Numbers
Strings
Print Formatting
Lists
Dictionaries
Tuples
Files
Sets and Booleans
Resources for More Basic Practice
Condition, Expression and Loop
Introduction to Python Statements
if, elif, and else Statements
for Loops
while Loops
range()
Exception Handling: try, except, finally
Errors and Exceptions Homework
Errors and Exceptions – Solutions
Function and Importing & Exporting
Introduction to functional programming
Function recipe
Function parameters and reuse
Recursive functions
Creating modules
Lambda
Import & Export
Text file
CSV
Excel
JSON
HTML
PLOT
Matplotlib
seaborn
NumPy Library (arrays and matrices)
NumPy
pandas Library (complete)
pandas
Regular Expression
Pattern Matching and Replacement
Date and Time and Missing Treatment
Date and Time
Data wrangling and cleaning
Missing treatment
Python & MySQL Connection
Web Scraping using Python
Predictive Analytics and Machine Learning using Python
Introduction to Analytics
• Evolution of Analytics
• Definition of Analytics
• Scope of analytics in different industries
Types of Analytics
• Descriptive Analysis
• Predictive Analysis
• Prescriptive Analysis
Concepts of Analytics
• Confirmatory & Exploratory Analysis
• Different Scales of Measurement: Nominal, Ordinal, Interval, Ratio
• Attribute and Variable concept
• Graphical Representation of Data
• Measures of Central Tendency: Mean, Median, Mode
• Measures of Dispersion: Range, Variance, Standard Deviation
• Measures of Location: Quartiles, Interquartile Range
• Outliers & Box Plot Graphs
Probability
• Concept of Probability
• Probability mass function
• Random Variables-Discrete and Continuous
• Binomial Distribution
• Poisson Distribution
• Normal Distribution
Sampling Theory
• Concept of sampling: Population and Sample
• Types of Sampling
• Probability sampling: Simple, Stratified, Systematic
• Non-probability sampling: Convenience, Judgmental
• Hypothesis Testing: Null and Alternative
• Type I error and Type II error
• Significance level
• Confidence Interval
Parametric Test
• Concept of Parametric test
• Z test
• T test
• Two independent sample T test
• Paired sample T test
Association between Variables
• Chi square Test for Independence
• Scatter Plot
• Correlation
• Partial Correlation
Analysis Of Variance (ANOVA)
• One-Way & Two-Way ANOVA
• Concept of Eigenvalue and Eigenvector
Introduction of Machine Learning
Introduction to Data Science and Artificial Intelligence
Introduction to Machine Learning
How Artificial Intelligence relates to Machine Learning
History of Machine Learning
Introduction of Basic Mathematical concepts used in Machine Learning
Vectors
• Vector Operations (addition, subtraction, multiplication)
• Sparse Vector
• Dense Vector
• Eigenvalues
• Eigenvectors
Machine Learning using Python Programming
Machine Learning Algorithms in Theory
Type of Machine Learning Algorithms
• Supervised
• Unsupervised
• Recommendation Systems
Supervised Learning
• Classification
Decision Trees
Naïve Bayes
Gaussian Naïve Bayes
Logistic Regression
Linear Discriminant Analysis
K-Nearest Neighbour (KNN)
Support Vector Machine (SVM)
• Regression
Linear Regression
Ridge Regression
LASSO Regression
ElasticNet Regression
Decision Tree Regressor
KNN Regressor
Support Vector Regressor
• Ensemble Algorithm
Boosting
Random Forest
Extra Trees
AdaBoost
Gradient Boosting Machine
Unsupervised Learning
Dimension Reduction using PCA
K-Means Clustering
Recommendation Systems
• Collaborative filtering
• Content based filtering
• Singular value decomposition
Text Mining and Introduction to NLP
Regular Expression
Bag-of-Words Model
Projects and Case Study
Recommender System (recommend movies to watch)
Titanic (EDA and ML)
SMS spam classification
Yelp reviews classification
Boston House price prediction
Iris data classification
Sonar data classification (mines vs. rocks)
Election data analysis
Cancer classification
Time series Analysis
• Fundamentals
What is Time Series Forecasting
Time Series as Supervised Learning
Load and Explore Time Series Data
Feature Engineering for Time Series
Time Series Visualization
Resampling and Interpolation
Power Transforms
Moving Average Smoothing
White Noise
Introduction to the Random Walk
Decompose Time Series Data
Stationarity in Time Series Data
Backtest Forecast Models
Persistence Model for Forecasting
Residual Forecast Errors
Introduction to the Box-Jenkins Method
Autoregression Models for Forecasting
Moving Average Models for Forecasting
ARIMA Model for Forecasting
Autocorrelation and Partial Autocorrelation
Grid Search ARIMA Model Hyperparameters
Save Models and Make Predictions
Projects
Project: Monthly Armed Robberies in Boston
Project: Annual Water Usage in Baltimore
Project: Monthly Sales of French Champagne
Project: Stock market analysis and Risk analysis
Project: Stock market prediction
Capstone Project: chosen by the candidate and monitored by the instructor through to completion.
Gradient Boosting Trees with XGBoost
1. XGBoost Basics
- Introduction to Gradient Boosting
- AdaBoost, the First Boosting Algorithm
- Introduction to XGBoost
- XGBoost Algorithm
- XGBoost Model in Python with scikit-learn
- Problem Description: Predict Onset of Diabetes
- Predictions with XGBoost Model
- Data Preparation for Gradient Boosting
- Evaluate Models With k-Fold Cross-Validation
- Plot XGBoost Decision Tree
2. XGBoost Advanced
- Save and Load Trained XGBoost Models
- XGBoost and Feature Selection
- Monitor Training Performance and Early Stopping
- Tune Multithreading Support for XGBoost
- Train XGBoost Models in the Cloud with Amazon Web Services
3. XGBoost Tuning
- Tune the Number and Size of Decision Trees with XGBoost
- Tune Learning Rate and Number of Trees with XGBoost
- Tuning Stochastic Gradient Boosting with XGBoost
Deep Learning for Natural Language Processing
1- Foundations
1.1 Natural Language Processing
2- Deep Learning
2.1 - Deep Learning for Natural Language
3- Develop Deep Learning Models with Keras
3.1 Keras Model Life-Cycle
3.2 Keras Functional Models
4- Data Preparation
4.1 How to Clean Text Manually and with NLTK
4.2 Metamorphosis by Franz Kafka
4.3 Tokenization and Cleaning with NLTK
4.4 Additional Text Cleaning
5- Prepare Text Data with scikit-learn
5.1 The Bag-of-Words Model
5.2 Word Counts with CountVectorizer
5.3 Word Frequencies with TfidfVectorizer
5.4 Hashing with HashingVectorizer
6- Prepare Text Data with Keras
6.1 Split Words with text_to_word_sequence
6.2 Encoding with one_hot
6.3 Hash Encoding with hashing_trick
6.4 Tokenizer API
7- Bag-of-Words
8- The Bag-of-Words Model
8.4 Example of the Bag-of-Words Model
8.5 Managing Vocabulary
8.6 Scoring Words
9- Prepare Movie Review Data for Sentiment Analysis
10- Project: Develop a Neural Bag-of-Words Model for Sentiment Analysis
11- Word Embeddings
11.1 How to Develop Word Embedding with Gensim
11.2 Gensim Python Library
11.3 Develop Word2Vec Embedding
11.4 Visualize Word Embedding
11.5 Load Google's Word2Vec Embedding
11.6 Load Stanford's GloVe Embedding
11.7 Learn and Load Word Embeddings in Keras
12- Text Classification
12.1 Neural Models for Document Classification
12.2 Word Embeddings + CNN = Text Classification
13- Project: Develop an Embedding + CNN Model for Sentiment Analysis
14- Project: Develop an n-gram CNN Model for Sentiment Analysis
15- Language Modeling
15.1 Neural Language Modeling
15.2 Statistical Language Modeling
15.3 Neural Language Models
16- Develop a Character-Based Neural Language Model
16.1 Sing a Song of Sixpence
16.2 Generate Text
16.3 Develop a Word-Based Neural Language Model
16.4 Jack and Jill Nursery Rhyme
16.5 Model 1: One-Word-In, One-Word-Out Sequences
16.6 Model 2: Line-by-Line Sequence
16.7 Model 3: Two-Words-In, One-Word-Out Sequence
17- Project: Develop a Neural Language Model for Text Generation
17.1 The Republic by Plato
18- Image Captioning
18.1 Neural Image Caption Generation
18.2 Describing an Image with Text
18.3 Encoder-Decoder Architecture
19- Neural Network Models for Caption Generation
19.1 Image Caption Generation
19.2 Load and Use a Pre-Trained Object Recognition Model
19.3 ImageNet
19.4 The Oxford VGG Models
19.5 Load the VGG Model in Keras
19.6 Develop a Simple Photo Classifier
20- BLEU Score
20.1 Bilingual Evaluation Understudy Score
20.2 Calculate BLEU Scores
20.3 Cumulative and Individual BLEU Scores
20.4 Prepare a Photo Caption Dataset for Modeling
21- Project: Develop a Neural Image Caption Generation Model
21.1 Machine Translation
21.2 Statistical Machine Translation
21.3 Neural Machine Translation
21.4 Encoder-Decoder Models for Neural Machine Translation
21.5 Encoder-Decoder Architecture for NMT
21.6 Sutskever NMT Model
21.7 Cho NMT Model
22- Configure Encoder-Decoder Models for Machine Translation
22.1 Encoder-Decoder Model for Neural Machine Translation
23- Project: Develop a Neural Machine Translation Model
German to English Translation Dataset
Projects
Sentiment Analysis
Twitter Data Analysis
Fake News Classifier
Topic Modelling
Major Projects
1-Chatbot using Deep Learning
2-Recommender System
3-Language Translation
4-Credit Risk Modelling
**Note:**
The syllabus is subject to adjustment based on the pace of the class and emerging developments in the field of data analytics and machine learning with Python.