EXPLORATORY DATA ANALYSIS
Class: III B.Tech, Semester: V, Sec: A&B, Year: 2025-26, Batch: 2023
Each Question carries 5 Marks
Explain the key differences between Exploratory Data Analysis (EDA), classical analysis, and Bayesian analysis. How does the objective of EDA diverge from the other two methods?
Describe the essential steps involved in a typical EDA process. For each step, provide a brief explanation of its purpose and an example of a technique used.
Differentiate between numerical data and categorical data. Provide an example for each and explain why understanding the data type is crucial before performing an analysis.
What are measurement scales in data analysis? List and briefly explain the four primary scales, providing an example of each and how they impact the type of statistical operations that can be performed.
Discuss the significance of EDA in the broader field of data science. Why is it considered a critical first step, and what risks might a data scientist face if this phase is skipped or performed inadequately?
For all problems, you'll need the following libraries. If you don't have them installed, you can use pip install pandas numpy matplotlib seaborn.
Python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Set a style for the plots
plt.style.use('ggplot')
Dataset: You have a small dictionary representing the sales of different fruits in a week.
Python
data = {
'Fruit': ['Apples', 'Bananas', 'Oranges', 'Grapes', 'Strawberries'],
'Sales': [150, 220, 180, 95, 110]
}
df_sales = pd.DataFrame(data)
Tasks:
Create a simple bar chart using matplotlib to visualize the sales of each fruit.
Add a title to the chart: "Weekly Fruit Sales".
Label the x-axis "Fruit" and the y-axis "Sales (units)".
Make the bars a single color, for example, a shade of green.
Based on the chart, identify which fruit had the highest sales.
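A minimal solution sketch for the tasks above, assuming the imports and df_sales defined earlier (the figure size and the particular green shade are arbitrary choices):
Python
# Bar chart of weekly fruit sales (one possible solution)
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(df_sales['Fruit'], df_sales['Sales'], color='seagreen')  # single green shade
ax.set_title('Weekly Fruit Sales')
ax.set_xlabel('Fruit')
ax.set_ylabel('Sales (units)')
plt.show()

# The fruit with the highest sales (Bananas, 220 units in this data)
print(df_sales.loc[df_sales['Sales'].idxmax(), 'Fruit'])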
Dataset: A hypothetical dataset of customer ratings for different movie genres.
Python
data = {
'MovieID': range(1, 26),
'Genre': ['Action', 'Comedy', 'Drama', 'Sci-Fi', 'Action', 'Comedy', 'Drama', 'Sci-Fi', 'Action', 'Comedy', 'Drama', 'Sci-Fi', 'Action', 'Comedy', 'Drama', 'Sci-Fi', 'Action', 'Comedy', 'Drama', 'Sci-Fi', 'Action', 'Comedy', 'Drama', 'Sci-Fi', 'Action'],
'Rating': [4.5, 3.8, 5.0, 4.2, 4.8, 4.0, 4.5, 4.1, 4.7, 3.9, 4.6, 4.3, 4.9, 3.7, 4.8, 4.0, 4.6, 4.1, 4.7, 4.2, 4.8, 3.8, 4.9, 4.4, 4.7]
}
df_movies = pd.DataFrame(data)
Tasks:
Calculate the average rating for each movie genre. You'll need to group the data by 'Genre' and then compute the mean of the 'Rating'.
Use seaborn.barplot to create a bar chart of the average ratings per genre.
Set the title to "Average Movie Ratings by Genre".
Label the axes appropriately: "Genre" and "Average Rating".
Order the bars in descending order of average rating.
Interpretation: Which genre has the highest average rating? How does seaborn.barplot handle the aggregation step?
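A minimal solution sketch for the tasks above. Note that seaborn.barplot aggregates for you, plotting the mean of 'Rating' per 'Genre' by default, so the explicit groupby mainly serves as a cross-check and supplies the bar order:
Python
# Average rating per genre, sorted descending for the bar order
avg_ratings = df_movies.groupby('Genre')['Rating'].mean().sort_values(ascending=False)
print(avg_ratings)

plt.figure(figsize=(8, 5))
ax = sns.barplot(x='Genre', y='Rating', data=df_movies,
                 order=avg_ratings.index,   # descending order of average rating
                 errorbar=None)             # on older seaborn versions use ci=None
ax.set_title('Average Movie Ratings by Genre')
ax.set_xlabel('Genre')
ax.set_ylabel('Average Rating')
plt.show()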
Dataset: A survey of a company's employees showing their satisfaction level by department.
Python
data = {
'Department': ['IT', 'HR', 'Sales', 'Marketing', 'IT', 'HR', 'Sales', 'Marketing', 'IT', 'HR', 'Sales', 'Marketing', 'IT', 'HR', 'Sales', 'Marketing', 'IT', 'HR', 'Sales', 'Marketing'],
'Satisfaction': ['Satisfied', 'Neutral', 'Satisfied', 'Dissatisfied', 'Neutral', 'Satisfied', 'Satisfied', 'Satisfied', 'Dissatisfied', 'Neutral', 'Satisfied', 'Neutral', 'Satisfied', 'Satisfied', 'Dissatisfied', 'Satisfied', 'Satisfied', 'Satisfied', 'Satisfied', 'Satisfied']
}
df_survey = pd.DataFrame(data)
Tasks:
First, you need to count the occurrences of each satisfaction level within each department. Use pd.crosstab or groupby().value_counts() to create a pivot table or a similar structure.
Create a stacked bar chart to visualize the distribution of satisfaction levels across the different departments. The x-axis should be 'Department', and the bars should be stacked by 'Satisfaction' level.
Add a clear title like "Employee Satisfaction by Department".
Ensure the legend is visible and correctly labeled.
Analysis: Which department has the highest proportion of 'Dissatisfied' employees? Which department appears to have the most satisfied employees overall?
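A minimal solution sketch for the tasks above, using pd.crosstab for the counts and pandas' own plotting (which wraps matplotlib) for the stacked bars:
Python
# Count satisfaction levels per department
counts = pd.crosstab(df_survey['Department'], df_survey['Satisfaction'])
print(counts)

# Stacked bar chart of the counts
ax = counts.plot(kind='bar', stacked=True, rot=0, figsize=(8, 5))
ax.set_title('Employee Satisfaction by Department')
ax.set_xlabel('Department')
ax.set_ylabel('Number of Employees')
ax.legend(title='Satisfaction')
plt.tight_layout()
plt.show()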
Dataset: Data on the population of the largest cities in a country.
Python
data = {
'City': ['Tokyo', 'Delhi', 'Shanghai', 'Sao Paulo', 'Mexico City', 'Cairo', 'Mumbai', 'Beijing'],
'Population_Millions': [37.4, 30.3, 27.8, 22.0, 21.6, 20.4, 20.2, 20.1]
}
df_cities = pd.DataFrame(data)
Tasks:
Create a horizontal bar chart to display the population of these cities. Horizontal bars are often better for long category names.
Sort the data by population in descending order before plotting. This makes the chart easier to read.
Add the exact population number (in millions) as a text label at the end of each bar. You'll need to loop through the bars (or use ax.text).
Set an appropriate title and axis labels.
Challenge: Create a seaborn plot, but add the labels using matplotlib's ax.text to show how these libraries can be combined.
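A minimal solution sketch for the tasks above, including the challenge of combining a seaborn plot with matplotlib's ax.text (the 0.3 offset and the widened x-limit are arbitrary spacing choices):
Python
# Sort by population (descending) before plotting
df_sorted = df_cities.sort_values('Population_Millions', ascending=False)

plt.figure(figsize=(8, 5))
ax = sns.barplot(x='Population_Millions', y='City', data=df_sorted, color='steelblue')
ax.set_title('Population of Major Cities')
ax.set_xlabel('Population (millions)')
ax.set_ylabel('City')
ax.set_xlim(0, df_sorted['Population_Millions'].max() * 1.15)  # leave room for labels

# Add the exact value at the end of each bar using matplotlib's ax.text
for i, value in enumerate(df_sorted['Population_Millions']):
    ax.text(value + 0.3, i, f'{value}', va='center')
plt.show()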
Dataset: Sales data for two different product categories (Electronics, Clothing) over three different quarters.
Python
data = {
'Quarter': ['Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3'],
'Category': ['Electronics', 'Electronics', 'Electronics', 'Clothing', 'Clothing', 'Clothing'],
'Sales_Units': [1200, 1500, 1800, 900, 1100, 1300]
}
df_quarterly_sales = pd.DataFrame(data)
Tasks:
Create a grouped bar chart using seaborn.catplot or seaborn.barplot.
The x-axis should represent the 'Quarter'.
The different product categories ('Electronics', 'Clothing') should be represented by separate, adjacent bars within each quarter group. This is the "grouped" part.
Add a title and axis labels.
Add a legend to distinguish between the two product categories.
Analysis: In which quarter was the performance gap between the two categories the largest? In which quarter did both categories perform best?
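A minimal solution sketch for the tasks above; the hue argument is what produces the grouped (side-by-side) bars within each quarter:
Python
plt.figure(figsize=(8, 5))
ax = sns.barplot(x='Quarter', y='Sales_Units', hue='Category', data=df_quarterly_sales)
ax.set_title('Quarterly Sales by Product Category')
ax.set_xlabel('Quarter')
ax.set_ylabel('Sales (units)')
ax.legend(title='Category')
plt.show()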
Pandas Exercises [W3Schools]
Exercise 1: Column Addition (Elementwise)
Problem: Create a DataFrame with two numeric columns and add them elementwise to create a new column.
Solution:
import pandas as pd
# Create a sample DataFrame
data = {
'Sales_Q1': [100, 150, 120, 180],
'Sales_Q2': [110, 160, 130, 190]
}
df = pd.DataFrame(data)
# Add 'Sales_Q1' and 'Sales_Q2' to get 'Total_Sales'
df['Total_Sales'] = df['Sales_Q1'] + df['Sales_Q2']
# Print the DataFrame
print("Exercise 1: Total Sales Column")
print(df)
Exercise 2: Column Subtraction and Percentage Calculation
Problem: Create a DataFrame with 'Initial_Price' and 'Discount_Amount'. Calculate the 'Final_Price' and then the 'Discount_Percentage'.
Solution:
import pandas as pd
# Create a sample DataFrame
data = {
'Item': ['A', 'B', 'C', 'D'],
'Initial_Price': [50, 120, 75, 200],
'Discount_Amount': [5, 12, 0, 40]
}
df = pd.DataFrame(data)
# Calculate Final_Price
df['Final_Price'] = df['Initial_Price'] - df['Discount_Amount']
# Calculate Discount_Percentage (handle division by zero if Initial_Price can be 0)
# Using .apply() with a lambda for conditional calculation
df['Discount_Percentage'] = df.apply(
    lambda row: (row['Discount_Amount'] / row['Initial_Price']) * 100 if row['Initial_Price'] > 0 else 0,
    axis=1
)
# Print the DataFrame
print("\nExercise 2: Price and Discount Calculations")
print(df)
Exercise 3: Scalar Multiplication
Problem: Create a DataFrame with a 'Quantity' column and multiply all values in it by a scalar (e.g., 2) to represent 'Double_Quantity'.
Solution:
import pandas as pd
# Create a sample DataFrame
data = {
'Product': ['Pen', 'Notebook', 'Eraser'],
'Quantity': [10, 25, 50]
}
df = pd.DataFrame(data)
# Multiply 'Quantity' by 2
df['Double_Quantity'] = df['Quantity'] * 2
# Print the DataFrame
print("\nExercise 3: Scalar Multiplication")
print(df)
Exercise 4: Applying a Custom Function to a Column (.apply())
Problem: Create a DataFrame with a 'Score' column. Use the .apply() method to create a new 'Status' column based on the 'Score': 'Pass' if score >= 70, 'Fail' otherwise.
Solution:
import pandas as pd
# Create a sample DataFrame
data = {
'Student': ['Alice', 'Bob', 'Charlie', 'David'],
'Score': [85, 60, 72, 90]
}
df = pd.DataFrame(data)
# Define a function to determine pass/fail status
def get_status(score):
    if score >= 70:
        return 'Pass'
    else:
        return 'Fail'
# Apply the function to the 'Score' column
df['Status'] = df['Score'].apply(get_status)
# Print the DataFrame
print("\nExercise 4: Applying a Custom Function (Pass/Fail)")
print(df)
Exercise 5: Applying a Lambda Function to a Column
Problem: Create a DataFrame with a 'Price' column. Use a lambda function with .apply() to calculate 'Price_with_Tax' (add 10% tax).
Solution:
import pandas as pd
# Create a sample DataFrame
data = {
'Item': ['TV', 'Phone', 'Tablet'],
'Price': [500, 800, 300]
}
df = pd.DataFrame(data)
# Calculate 'Price_with_Tax' using a lambda function
df['Price_with_Tax'] = df['Price'].apply(lambda x: x * 1.10)
# Print the DataFrame
print("\nExercise 5: Applying a Lambda Function (Price with Tax)")
print(df)
Exercise 6: Applying a Function Row-wise (.apply(axis=1))
Problem: Create a DataFrame with 'First_Name' and 'Last_Name' columns. Use .apply(axis=1) to combine them into a 'Full_Name' column.
Solution:
import pandas as pd
# Create a sample DataFrame
data = {
'First_Name': ['John', 'Jane', 'Peter'],
'Last_Name': ['Doe', 'Smith', 'Jones']
}
df = pd.DataFrame(data)
# Define a function to combine names
def combine_names(row):
    return f"{row['First_Name']} {row['Last_Name']}"
# Apply the function row-wise
df['Full_Name'] = df.apply(combine_names, axis=1)
# Print the DataFrame
print("\nExercise 6: Applying Function Row-wise (Full Name)")
print(df)
Exercise 7: Using map() for Value Replacement
Problem: Create a DataFrame with a 'Gender' column (e.g., 'M', 'F'). Use the .map() method to replace these abbreviations with full words ('Male', 'Female').
Solution:
import pandas as pd
# Create a sample DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
'Gender': ['F', 'M', 'M', 'F']
}
df = pd.DataFrame(data)
# Define a mapping dictionary
gender_map = {'M': 'Male', 'F': 'Female'}
# Apply the mapping to the 'Gender' column
df['Gender_Full'] = df['Gender'].map(gender_map)
# Print the DataFrame
print("\nExercise 7: Mapping Values (Gender)")
print(df)
Exercise 8: Using replace() for Multiple Value Replacements
Problem: Create a DataFrame with a 'Status' column containing values like 'Active', 'Inactive', 'Pending'. Use .replace() to change 'Active' to 'Online' and 'Inactive' to 'Offline'.
Solution:
import pandas as pd
# Create a sample DataFrame
data = {
'User': ['U1', 'U2', 'U3', 'U4'],
'Status': ['Active', 'Inactive', 'Pending', 'Active']
}
df = pd.DataFrame(data)
# Define values to replace and their new values
replacements = {'Active': 'Online', 'Inactive': 'Offline'}
# Apply the replacements to the 'Status' column
df['Status_Updated'] = df['Status'].replace(replacements)
# Print the DataFrame
print("\nExercise 8: Replacing Multiple Values (Status)")
print(df)
Exercise 9: Conditional Arithmetic using np.where()
Problem: Create a DataFrame with 'Units_Sold' and 'Price_Per_Unit'. If 'Units_Sold' is greater than 50, apply a 10% discount to the 'Price_Per_Unit' for calculating 'Revenue'. Otherwise, use the original 'Price_Per_Unit'.
Solution:
import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {
'Product': ['X', 'Y', 'Z', 'W'],
'Units_Sold': [60, 45, 70, 30],
'Price_Per_Unit': [10, 15, 8, 20]
}
df = pd.DataFrame(data)
# Calculate 'Revenue' with a conditional discount using np.where
df['Revenue'] = np.where(
    df['Units_Sold'] > 50,
    df['Units_Sold'] * (df['Price_Per_Unit'] * 0.90),  # 10% discount
    df['Units_Sold'] * df['Price_Per_Unit']            # No discount
)
# Print the DataFrame
print("\nExercise 9: Conditional Arithmetic (Revenue with Discount)")
print(df)
Exercise 10: Using clip() to Cap Values
Problem: Create a DataFrame with a 'Temperature' column. Ensure that all temperatures are within a certain range (e.g., between 10 and 30 degrees Celsius) by clipping values outside this range.
Solution:
import pandas as pd
# Create a sample DataFrame
data = {
'City': ['Delhi', 'Mumbai', 'Chennai', 'Kolkata', 'Bengaluru'],
'Temperature': [35, 28, 40, 32, 22] # in Celsius
}
df = pd.DataFrame(data)
# Define the lower and upper bounds for clipping
lower_bound = 10
upper_bound = 30
# Clip the 'Temperature' column
df['Temperature_Clipped'] = df['Temperature'].clip(lower=lower_bound, upper=upper_bound)
# Print the DataFrame
print("\nExercise 10: Clipping Values (Temperature)")
print(df)
In the context of data science, covariance is a crucial statistical measure that helps us understand the relationship between two variables within a dataset. While the fundamental definition remains the same (how two variables change together), its applications and interpretations in data science are specific and highly valuable.
1. Understanding Relationships between Features:
Feature Engineering: When building predictive models, data scientists often analyze the covariance between different features (variables) in their dataset.
Positive Covariance: If two features (e.g., "hours studied" and "exam score") have a positive covariance, it suggests they tend to increase or decrease together. This might indicate that one feature could be a good predictor of the other, or they might be capturing similar underlying information.
Negative Covariance: If two features (e.g., "price of a product" and "sales volume") have a negative covariance, it suggests an inverse relationship. As one increases, the other tends to decrease. This relationship can be vital for business decisions.
Near-Zero Covariance: A covariance close to zero suggests no strong linear relationship. While it doesn't rule out non-linear relationships, it indicates that a simple linear model might not capture how these variables interact.
2. Covariance Matrix:
For datasets with multiple features, a covariance matrix is a powerful tool. It's a square matrix where:
The diagonal elements represent the variance of each individual feature (covariance of a variable with itself).
The off-diagonal elements represent the covariance between each pair of features.
The covariance matrix provides a comprehensive overview of the linear relationships among all features in a dataset. It's symmetric, meaning Cov(X,Y)=Cov(Y,X).
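A quick illustration with pandas, using a small hypothetical two-feature dataset (the column names and values are invented for the example):
import pandas as pd

# Hypothetical data: hours studied vs. exam score
df = pd.DataFrame({
    'hours_studied': [2, 4, 6, 8, 10],
    'exam_score': [55, 62, 70, 78, 88]
})

print(df.cov())    # covariance matrix: variances on the diagonal, covariances off it
print(df.corr())   # normalized version (correlation), values between -1 and +1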
3. Dimensionality Reduction (e.g., Principal Component Analysis - PCA):
This is one of the most significant applications of covariance in data science. PCA is a technique used to reduce the number of features (dimensions) in a dataset while retaining as much variance (information) as possible.
How it works: PCA heavily relies on the covariance matrix. It identifies the principal components (new, uncorrelated variables) that capture the maximum variance in the data. These principal components are essentially the eigenvectors of the covariance matrix, and their corresponding eigenvalues represent the amount of variance explained by each component.
By projecting the data onto a smaller number of principal components, data scientists can simplify complex datasets, reduce computational costs, and often improve model performance by removing noise and multicollinearity.
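A rough sketch of this idea in NumPy (not a full PCA implementation; in practice you would use scikit-learn's PCA, which also handles centering and related details for you). The data matrix X here is random and purely illustrative:
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # hypothetical data: 100 samples, 3 features

X_centered = X - X.mean(axis=0)          # PCA operates on mean-centered data
cov = np.cov(X_centered, rowvar=False)   # 3x3 covariance matrix

# Eigen-decomposition of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigh: for symmetric matrices
order = np.argsort(eigenvalues)[::-1]             # largest explained variance first
components = eigenvectors[:, order]

# Project onto the top 2 principal components (dimensionality reduction 3 -> 2)
X_reduced = X_centered @ components[:, :2]
print(X_reduced.shape)   # (100, 2)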
4. Feature Selection: While correlation is often preferred for direct feature selection due to its normalized nature, understanding covariance can still inform decisions. If two features have very high covariance, it might suggest redundancy. Keeping both might lead to multicollinearity issues in some models, making it beneficial to select one or combine them.
5. Outlier Detection: Covariance (and related concepts like Mahalanobis distance, which uses the covariance matrix) can be used in some outlier detection techniques. Outliers can significantly influence covariance calculations, and unusual covariance patterns can sometimes highlight anomalous data points.
6. Basis for Correlation:
It's important to remember that correlation is a normalized version of covariance. While covariance's magnitude is hard to interpret because it's scale-dependent, correlation standardizes this relationship to a value between -1 and +1, making it universally interpretable for the strength and direction of a linear relationship.
In data science, you'll often see data scientists use correlation much more frequently than raw covariance, especially when trying to understand the strength of relationships. However, covariance is the underlying building block.
Limitations in Data Science:
Scale Dependence: As mentioned before, the numerical value of covariance depends on the units of the variables. This makes it difficult to compare covariances across different pairs of variables with different scales.
Only Linear Relationships: Covariance only captures linear relationships. If two variables have a strong non-linear relationship (e.g., parabolic), their covariance could still be zero or close to zero, misleading you into thinking there's no relationship.
Outlier Sensitivity: Covariance is sensitive to outliers. A few extreme data points can heavily skew the covariance value.
Array Creation:
Create a 1D NumPy array arr1 containing the integers from 0 to 9.
Create a 2D NumPy array arr2 (3x3) filled with all True values.
Create a 2D NumPy array arr3 (2x4) with all elements set to 5.
arr1 = np.arange(10)
print("arr1:", arr1)
arr2 = np.full((3, 3), True, dtype=bool)
print("arr2:\n", arr2)
arr3 = np.full((2, 4), 5)
print("arr3:\n", arr3)
Array Properties:
Given arr1 = np.array([1, 2, 3, 4, 5]), print its data type, shape, and number of dimensions.
arr1_prop = np.array([1, 2, 3, 4, 5])
print("Data type of arr1_prop:", arr1_prop.dtype)
print("Shape of arr1_prop:", arr1_prop.shape)
print("Number of dimensions of arr1_prop:", arr1_prop.ndim)
Indexing and Slicing:
Given arr = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90]), extract elements from index 2 to 6 (inclusive of 2, exclusive of 6).
Given arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), extract the element at row 1, column 2.
From the same arr, extract the first column.
arr_slice = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
extracted_slice = arr_slice[2:6]
print("Extracted slice:", extracted_slice)
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
element = arr_2d[1, 2] # Row 1, Column 2 (0-indexed)
print("Element at [1, 2]:", element)
first_column = arr_2d[:, 0]
print("First column:\n", first_column)
Basic Operations:
Create two arrays, a = np.array([1, 2, 3]) and b = np.array([4, 5, 6]). Perform element-wise addition, subtraction, multiplication, and division.
Calculate the sum of all elements in a.
Find the maximum value in b.
print("\n--- Problem 4 ---")
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print("a + b:", a + b)
print("a - b:", a - b)
print("a * b:", a * b)
print("a / b:", a / b)
print("Sum of a:", np.sum(a))
print("Max of b:", np.max(b))
Boolean Indexing:
Given arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), extract all odd numbers.
Replace all even numbers in arr with -1.
arr_bool = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
odd_numbers = arr_bool[arr_bool % 2 != 0]
print("Odd numbers:", odd_numbers)
arr_bool[arr_bool % 2 == 0] = -1
print("Array with even numbers replaced:", arr_bool)
Reshaping:
Given arr = np.arange(12), reshape it into a 4x3 2D array.
Reshape the 4x3 array back into a 1D array.
arr_reshape = np.arange(12)
reshaped_arr = arr_reshape.reshape(4, 3)
print("Reshaped (4x3):\n", reshaped_arr)
flattened_arr = reshaped_arr.reshape(-1) # or .flatten()
print("Flattened:", flattened_arr)
Concatenation and Splitting:
Create arr1 = np.array([1, 2, 3]) and arr2 = np.array([4, 5, 6]). Concatenate them horizontally.
Create arr3 = np.array([[1, 2], [3, 4]]) and arr4 = np.array([[5, 6], [7, 8]]). Concatenate them vertically.
Given arr = np.arange(9).reshape(3, 3), split it into three equal-sized sub-arrays horizontally.
arr1_cat = np.array([1, 2, 3])
arr2_cat = np.array([4, 5, 6])
horizontal_concat = np.hstack((arr1_cat, arr2_cat))
print("Horizontal concatenation:", horizontal_concat)
arr3_cat = np.array([[1, 2], [3, 4]])
arr4_cat = np.array([[5, 6], [7, 8]])
vertical_concat = np.vstack((arr3_cat, arr4_cat))
print("Vertical concatenation:\n", vertical_concat)
arr_split = np.arange(9).reshape(3, 3)
split_arrays = np.hsplit(arr_split, 3)
print("Original array for splitting:\n", arr_split)
print("Split arrays:")
for arr in split_arrays:
    print(arr)
Broadcasting:
Given arr = np.array([[1, 2, 3], [4, 5, 6]]), add 10 to every element without using a loop.
Given arr = np.array([[1, 2, 3], [4, 5, 6]]) and col = np.array([10, 20]), add col to each row of arr (think about reshaping col if necessary).
arr_broad = np.array([[1, 2, 3], [4, 5, 6]])
added_ten = arr_broad + 10
print("Array + 10:\n", added_ten)
col = np.array([10, 20])
# Reshape col to be a column vector for broadcasting across rows
added_col = arr_broad + col.reshape(-1, 1)
print("Array + column vector:\n", added_col)
Statistical Operations:
Given arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), calculate the mean of each column.
Calculate the standard deviation of the entire array.
Find the minimum value along axis 0 (columns).
arr_stats = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
column_means = np.mean(arr_stats, axis=0)
print("Mean of each column:", column_means)
std_dev = np.std(arr_stats)
print("Standard deviation of entire array:", std_dev)
min_along_axis0 = np.min(arr_stats, axis=0)
print("Minimum along axis 0 (columns):", min_along_axis0)
Unique Elements and Counts:
Given arr = np.array([1, 2, 1, 3, 2, 4, 5, 4, 1]), find the unique elements.
Count the occurrences of each unique element.
arr_unique = np.array([1, 2, 1, 3, 2, 4, 5, 4, 1])
unique_elements = np.unique(arr_unique)
print("Unique elements:", unique_elements)
unique_elements, counts = np.unique(arr_unique, return_counts=True)
print("Unique elements and their counts:", dict(zip(unique_elements, counts)))
Conditional Operations (where):
Given arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), replace all numbers greater than 5 with 100, and all other numbers with 0.
arr_cond = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
result_cond = np.where(arr_cond > 5, 100, 0)
print("Conditional replacement:", result_cond)
Saving and Loading:
Create an array data = np.arange(100).reshape(10, 10). Save this array to a file named 'my_array.npy'.
Load the 'my_array.npy' file back into a new variable loaded_data. Verify they are identical.
data_save = np.arange(100).reshape(10, 10)
file_name = 'my_array.npy'
np.save(file_name, data_save)
print(f"Array saved to {file_name}")
loaded_data = np.load(file_name)
print("Array loaded from file.")
print("Are original and loaded arrays identical?", np.array_equal(data_save, loaded_data))
Introduction
Creating your own dataset is crucial in many data science and machine learning projects. While numerous publicly available datasets exist, building your own allows you to tailor it to your specific needs and ensure its quality. This article explores the importance of custom datasets and provides a step-by-step guide to creating your own dataset in Python. It also discusses data augmentation and expansion techniques, tools and libraries for dataset creation, best practices for creating high-quality datasets, and ethical considerations in dataset creation.
Understanding the Importance of Custom Datasets: Custom datasets offer several advantages over pre-existing datasets.
They allow you to define the purpose and scope of your dataset according to your specific project requirements. This level of customization ensures that your dataset contains the relevant data needed to address your research questions or solve a particular problem.
Custom datasets provide you with control over the data collection process. You can choose the sources from which you gather data, ensuring its authenticity and relevance. This control also extends to the data cleaning and preprocessing steps, allowing you to tailor them to your needs.
Custom datasets enable you to address any class imbalance issues in pre-existing datasets. By collecting and labeling your own data, you can ensure a balanced distribution of classes, which is crucial for training accurate machine learning models.
Steps to Create Your Own Dataset in Python
Creating your own dataset involves several key steps. Let’s explore each step in detail:
Defining the Purpose and Scope of Your Dataset
Before gathering any data, it is essential to define the purpose and scope of your dataset clearly. Ask yourself what specific problem you are trying to solve or what research questions you are trying to answer. This clarity will guide you in determining the types of data you need to collect and the sources from which you should gather them.
Gathering and Preparing the Data
Once you have defined the purpose and scope of your dataset, you can start gathering the data. Depending on your project, you may collect data from various sources such as APIs, web scraping, or manual data entry. It is crucial to ensure the authenticity and integrity of the data during the collection process.
After gathering the data, you need to prepare it for further processing. This step involves converting the data into a suitable format for analysis, such as CSV or JSON. Additionally, you may need to perform initial data-cleaning tasks, such as removing duplicates or irrelevant data points.
Cleaning and Preprocessing the Data
Data cleaning and preprocessing are essential steps in dataset creation. This process involves handling missing data, dealing with outliers, and transforming the data into a suitable format for analysis. Python provides various libraries, such as Pandas and NumPy, that offer powerful data cleaning and preprocessing tools.
For example, if your dataset contains missing values, you can use the Pandas library to fill in those missing values with appropriate imputation techniques. Similarly, if your dataset contains outliers, you can use statistical methods to detect and handle them effectively.
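For instance, a minimal sketch of mean imputation with pandas (the 'age' column and its values are hypothetical):
import pandas as pd
import numpy as np

df = pd.DataFrame({'age': [25, np.nan, 32, np.nan, 41]})   # hypothetical column with gaps
df['age_filled'] = df['age'].fillna(df['age'].mean())      # replace NaN with the column mean
print(df)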
Organizing and Structuring the Dataset
To ensure the usability and maintainability of your dataset, it is crucial to organize and structure it properly. This step involves creating a clear folder structure, naming conventions, and file formats that facilitate easy access and understanding of the data.
For example, you can organize your dataset into separate folders for different classes or categories. Each file within these folders can represent a single data instance with a standardized naming convention that includes relevant information about the data.
Splitting the Dataset into Training and Testing Sets
Splitting your dataset into training and testing sets is essential to evaluate the performance of machine learning models. The training set is used to train the model, while the testing set assesses its performance on unseen data.
Python’s scikit-learn library provides convenient functions for splitting datasets into training and testing sets. For example, you can use the `train_test_split` function to divide your dataset into the desired proportions randomly.
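A minimal sketch of such a split (the synthetic data is only there to make the example self-contained):
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=4, random_state=42)  # toy data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)   # (80, 4) (20, 4)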
Handling Imbalanced Classes (if applicable)
If your dataset contains imbalanced classes, where some classes have significantly fewer instances than others, it is crucial to address this issue. Imbalanced classes can lead to biased models that perform poorly on underrepresented classes.
There are several techniques to handle imbalanced classes, such as oversampling, undersampling, or using advanced algorithms specifically designed for imbalanced datasets. Python libraries like imbalanced-learn implement these techniques and integrate easily into your dataset creation pipeline.
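For example, a minimal sketch of random oversampling with imbalanced-learn (assuming the imblearn package is installed; the synthetic imbalanced data is purely illustrative):
import numpy as np
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler

# Hypothetical imbalanced dataset: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=42)
print("Before:", np.bincount(y))

ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(X, y)   # minority-class samples are duplicated
print("After:", np.bincount(y_res))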
Techniques for Data Augmentation and Expansion
Data augmentation is a powerful technique used to increase the size and diversity of your dataset. It involves applying various transformations to the existing data, creating new instances that are still representative of the original data.
Image Data Augmentation
Image data augmentation is commonly used to improve model performance in computer vision tasks. Techniques such as rotation, flipping, scaling, and adding noise can be applied to images to create new variations of the original data.
Python libraries like OpenCV and imgaug provide various functions and methods for image data augmentation. For example, you can use the `rotate` function from the OpenCV library to rotate images by a specified angle.
import cv2
image = cv2.imread('image.jpg')
rotated_image = cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
Text Data Augmentation
Text data augmentation generates new text instances by applying various transformations to the existing text. Techniques such as synonym replacement, word insertion, and word deletion can create diverse variations of the original text.
Python libraries like NLTK and TextBlob provide functions and methods for text data augmentation. For example, you can use the `synsets` function from the NLTK library to find synonyms of words and replace them in the text.
from nltk.corpus import wordnet
def synonym_replacement(text):
    words = text.split()
    augmented_text = []
    for word in words:
        synonyms = wordnet.synsets(word)
        if synonyms:
            augmented_text.append(synonyms[0].lemmas()[0].name())
        else:
            augmented_text.append(word)
    return ' '.join(augmented_text)
original_text = "The quick brown fox jumps over the lazy dog."
augmented_text = synonym_replacement(original_text)
Audio Data Augmentation
In audio processing tasks, data augmentation techniques can be applied to audio signals to create new instances. Techniques such as time stretching, pitch shifting, and adding background noise can generate diverse variations of the original audio data.
Python libraries like Librosa and PyDub provide functions and methods for audio data augmentation. For example, you can use the `time_stretch` function from the Librosa library to stretch the duration of an audio signal.
import librosa
audio, sr = librosa.load('audio.wav')
stretched_audio = librosa.effects.time_stretch(audio, rate=1.2)
Video Data Augmentation
Video data augmentation involves applying transformations to video frames to create new instances. Techniques such as cropping, flipping, and adding visual effects can generate diverse variations of the original video data.
Python libraries like OpenCV and MoviePy provide functions and methods for video data augmentation. For example, you can use the `crop` function from the MoviePy library to crop a video frame.
from moviepy.editor import VideoFileClip
video = VideoFileClip('video.mp4')
cropped_video = video.crop(x1=100, y1=100, x2=500, y2=500)
Tools and Libraries for Dataset Creation in Python
Python offers several tools and libraries that can simplify the dataset-creation process. Let’s explore some of these tools and libraries:
Scikit-learn
Scikit-learn is a popular machine-learning library in Python that provides various functions and classes for dataset creation. It offers functions for generating synthetic datasets, splitting datasets into training and testing sets, and handling imbalanced classes.
For example, you can use the `make_classification` function from the `sklearn.datasets` module to generate a synthetic classification dataset.
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=42)
Hugging Face Datasets
Hugging Face Datasets is a Python library that provides a wide range of pre-existing datasets for natural language processing tasks. It also offers tools for creating custom datasets by combining and preprocessing existing datasets.
For example, you can use the `load_dataset` function from the `datasets` module to load a pre-existing dataset.
from datasets import load_dataset
dataset = load_dataset('imdb')
Kili Technology
Kili Technology is a data labeling platform that offers tools for creating and managing datasets for machine learning projects. It provides a user-friendly interface for labeling data and supports various data types, including text, images, and audio.
Using Kili Technology, you can easily create labeled datasets by inviting collaborators to annotate the data or by using their built-in annotation tools.
Other Python Libraries for Dataset Creation
Apart from the aforementioned tools and libraries, several other Python libraries can be useful for dataset creation. Some of these libraries include Pandas, NumPy, TensorFlow, and PyTorch. These libraries offer powerful data manipulation, preprocessing, and storage tools, making them essential for dataset creation.
Best Practices for Creating High-Quality Datasets
Creating high-quality datasets is crucial for obtaining accurate and reliable results in data science and machine learning projects. Here are some best practices to consider when creating your own dataset:
Ensuring Data Quality and Integrity
Data quality and integrity are paramount in dataset creation. Ensuring that the data you collect is accurate, complete, and representative of the real-world phenomenon you study is essential. This can be achieved by carefully selecting data sources, validating the data during the collection process, and performing thorough data cleaning and preprocessing.
Handling Missing Data
Missing data is a common issue in datasets and can significantly impact the performance of machine learning models. It is important to handle missing data appropriately by using imputation techniques or using advanced algorithms that can handle missing values.
Dealing with Outliers
Outliers are data points that deviate significantly from the rest of the data. They can disproportionately impact the results of data analysis and machine learning models. It is crucial to detect and handle outliers effectively by using statistical methods or considering the use of robust algorithms that are less sensitive to outliers.
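For example, a minimal sketch of IQR-based outlier detection with pandas (the 'value' column and its contents are hypothetical):
import pandas as pd

df = pd.DataFrame({'value': [10, 12, 11, 13, 12, 95, 11, 10]})   # 95 is an obvious outlier

q1, q3 = df['value'].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = df[(df['value'] < lower) | (df['value'] > upper)]
print(outliers)   # rows falling outside the IQR fences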
Balancing Class Distribution
If your dataset contains imbalanced classes, it is important to address this issue to prevent biased models. Techniques such as oversampling, undersampling, or using advanced algorithms specifically designed for imbalanced datasets can be used to balance the class distribution.
Documenting and Annotating the Dataset
Proper documentation and annotation of the dataset are essential for its usability and reproducibility. Documenting the data sources, collection methods, preprocessing steps, and any assumptions made during the dataset creation process ensures transparency and allows others to understand and reproduce your work.
Ethical Considerations in Dataset Creation
Dataset creation also involves ethical considerations that should not be overlooked. Here are some key ethical considerations to keep in mind:
Privacy and Anonymization
When collecting and using data, it is important to respect privacy and ensure the anonymity of individuals or entities involved. This can be achieved by removing or encrypting personally identifiable information (PII) from the dataset or obtaining proper consent from individuals.
Bias and Fairness
Bias in datasets can lead to biased models and unfair outcomes. It is crucial to identify and mitigate any biases present in the dataset, such as gender or racial biases. This can be done by carefully selecting data sources, diversifying the data collection process, and using fairness-aware algorithms.
Informed Consent and Data Usage Policies
Obtaining informed consent from individuals whose data is being collected is essential. Individuals should be fully informed about the purpose of data collection, how their data will be used, and any potential risks involved. Additionally, clear data usage policies should be established to ensure responsible and ethical use of the dataset.
Conclusion
Building your own dataset in Python allows you to customize the data according to your project requirements and ensure its quality. By following the steps outlined in this article, you can create a high-quality dataset that addresses your research questions or solves a specific problem.
The article also covered data augmentation and expansion techniques, tools and libraries for dataset creation, best practices for creating high-quality datasets, and the ethical considerations involved. With these insights, you are well-equipped to embark on your own dataset creation journey.
(Courtesy: Deep Sandhya Shukla, Senior Data Analyst at Incedo | LinkedIn)
Class: III B.Tech , Semester: V, CAY: 2025-26
Course Objectives: The main objectives of the course are to
Introduce the fundamentals of Exploratory Data Analysis.
Cover essential exploration techniques for understanding multivariate data by summarizing it through statistical and graphical methods.
Evaluate the models and select the best model.
Course Educational Objectives (CEOs)
To provide foundational knowledge in data science concepts, including the role and importance of Exploratory Data Analysis (EDA) in the data science lifecycle.
To equip students with practical skills in data visualization and exploration techniques using modern tools and libraries.
To develop the ability to preprocess and transform data effectively for analysis and modelling, ensuring data quality and readiness.
To enable students to apply statistical methods for summarizing and interpreting data, fostering analytical thinking.
To prepare students to build and evaluate predictive models, enabling them to solve real-world problems using data-driven approaches.
COURSE OUTCOMES (COs): At the end of the course, the student will be able to
CO1: Understand the Fundamentals of EDA and Data Science (Understand, L2)
CO2: Apply Visualization Techniques for Data Exploration (Apply, L3)
CO3: Perform Data Transformation and Preprocessing (Apply, L3)
CO4: Analyze Data Using Descriptive Statistics (Analyze, L4)
CO5: Develop and Evaluate Predictive Models (Apply, L3)
Textbook:
Reference Books:
1. Ronald K. Pearson, Exploratory Data Analysis Using R, CRC Press, 2020
2. Radhika Datar, Harish Garg, Hands-On Exploratory Data Analysis with R: Become an expert in exploratory data analysis using R packages, 1st Edition, Packt Publishing, 2019
Web References:
1. https://github.com/PacktPublishing/Hands-on-Exploratory-Data-Analysis-with-Python
2. https://www.analyticsvidhya.com/blog/2022/07/step-by-step-exploratory-dataanalysis-eda-using-python/#h-conclusion
3. https://github.com/PacktPublishing/Exploratory-Data-Analysis-with-Python-Cookbook
Exploratory Data Analysis Fundamentals: Understanding data science, the significance of EDA, Steps in EDA, making sense of data, Numerical data, Categorical data, Measurement scales, Comparing EDA with classical and Bayesian analysis, Software tools available for EDA, getting started with EDA.
Visual Aids for EDA: Technical requirements, Line chart, Bar charts, Scatter plot using seaborn, Polar chart, Histogram, Choosing the best chart. Case Study: EDA with Personal Email, Technical requirements, Loading the dataset, Data transformation, Data cleansing, Applying descriptive statistics, Data refactoring, Data analysis.
Data Transformation: Merging database-style data frames, concatenating along with an axis, merging on index, Reshaping and pivoting, Transformation techniques, handling missing data, Mathematical operations with NaN, Filling missing values, Discretization and binning, Outlier detection and filtering, Permutation and random sampling, Benefits of data transformation, Challenges.
Descriptive Statistics: Distribution function, Measures of central tendency, Measures of dispersion, Types of kurtoses, calculating percentiles, Quartiles, Grouping Datasets, Correlation, Understanding univariate, bivariate, multivariate analysis, Time Series Analysis
Unified machine learning workflow, Data pre-processing, Data preparation, Training sets and corpus creation, Model creation and training, Model evaluation, best model selection and evaluation, Model deployment
Case Study: EDA on Wine Quality Data Analysis
Geospatial EDA (Geo-EDA): Introduction, Data Types & Basic Visualizations
Geo-EDA: Spatial Queries, Aggregations & Advanced Mapping
Interactive Dashboards: Introduction to Plotly & Simple Dash Layouts
Interactive Dashboards: Callbacks, Interactivity & Deployment Basics
Advanced Outlier Detection: Statistical & Proximity-Based Methods
Advanced Outlier Detection: Model-Based Methods & Interpretation