Class: III B.Tech, Semester: V, CAY: 2025-26
Course Objectives: The main objectives of the course are to:
Introduce the fundamentals of Exploratory Data Analysis.
Cover essential exploration techniques for understanding multivariate data by summarizing it through statistical and graphical methods.
Evaluate models and select the best one.
Course Educational Objectives (CEOs)
To provide foundational knowledge in data science concepts, including the role and importance of Exploratory Data Analysis (EDA) in the data science lifecycle.
To equip students with practical skills in data visualization and exploration techniques using modern tools and libraries.
To develop the ability to preprocess and transform data effectively for analysis and modelling, ensuring data quality and readiness.
To enable students to apply statistical methods for summarizing and interpreting data, fostering analytical thinking.
To prepare students to build and evaluate predictive models, enabling them to solve real-world problems using data-driven approaches.
COURSE OUTCOMES (COs): At the end of the course, the student will be able to
CO1: Understand the Fundamentals of EDA and Data Science (Understand, L2)
CO2: Apply Visualization Techniques for Data Exploration (Apply, L3)
CO3: Perform Data Transformation and Preprocessing (Apply, L3)
CO4: Analyze Data Using Descriptive Statistics (Analyze, L4)
CO5: Develop and Evaluate Predictive Models (Apply, L3)
Textbook:
Reference Books:
1. Ronald K. Pearson, Exploratory Data Analysis Using R, CRC Press, 2020
2. Radhika Datar, Harish Garg, Hands-On Exploratory Data Analysis with R: Become an expert in exploratory data analysis using R packages, 1st Edition, Packt Publishing, 2019
Web References:
1. https://github.com/PacktPublishing/Hands-on-Exploratory-Data-Analysis-with-Python
2. https://www.analyticsvidhya.com/blog/2022/07/step-by-step-exploratory-data-analysis-eda-using-python/#h-conclusion
3. https://github.com/PacktPublishing/Exploratory-Data-Analysis-with-Python-Cookbook
Exploratory Data Analysis Fundamentals: Understanding data science, the significance of EDA, Steps in EDA, making sense of data, Numerical data, Categorical data, Measurement scales, Comparing EDA with classical and Bayesian analysis, Software tools available for EDA, getting started with EDA.
Visual Aids for EDA: Technical requirements, Line chart, Bar charts, Scatter plot using seaborn, Polar chart, Histogram, Choosing the best chart. Case Study: EDA with Personal Email, Technical requirements, Loading the dataset, Data transformation, Data cleansing, Applying descriptive statistics, Data refactoring, Data analysis.
Data Transformation: Merging database-style data frames, Concatenating along an axis, Merging on index, Reshaping and pivoting, Transformation techniques, Handling missing data, Mathematical operations with NaN, Filling missing values, Discretization and binning, Outlier detection and filtering, Permutation and random sampling, Benefits of data transformation, Challenges.
Descriptive Statistics: Distribution function, Measures of central tendency, Measures of dispersion, Types of kurtosis, Calculating percentiles, Quartiles, Grouping datasets, Correlation, Understanding univariate, bivariate, and multivariate analysis, Time series analysis.
Unified machine learning workflow, Data pre-processing, Data preparation, Training sets and corpus creation, Model creation and training, Model evaluation, best model selection and evaluation, Model deployment
Case Study: EDA on Wine Quality Data Analysis
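Illustration: The descriptive statistics topics listed above (measures of central tendency, measures of dispersion, quartiles) can be sketched in a few lines of Python. This is a minimal sketch using only the standard library statistics module; the function name describe and the sample values are hypothetical and are not taken from the course material.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    # Quartiles with the "inclusive" method treat the data as a full population
    # sample and interpolate linearly between neighbouring values.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),      # central tendency
        "median": statistics.median(values),  # central tendency
        "stdev": statistics.stdev(values),    # dispersion (sample std. dev.)
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,                       # dispersion (interquartile range)
    }


# Hypothetical sample data for illustration only.
sample = [2, 4, 4, 5, 7, 9, 11]
summary = describe(sample)
print(summary)
```

In practice the same summary would usually be produced with a data-frame library; the standard library is used here only to keep the sketch self-contained.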
Geospatial EDA (Geo-EDA): Introduction, Data Types & Basic Visualizations
Geo-EDA: Spatial Queries, Aggregations & Advanced Mapping
Interactive Dashboards: Introduction to Plotly & Simple Dash Layouts
Interactive Dashboards: Callbacks, Interactivity & Deployment Basics
Advanced Outlier Detection: Statistical & Proximity-Based Methods
Advanced Outlier Detection: Model-Based Methods & Interpretation
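Illustration: One common statistical method for outlier detection is the interquartile range (IQR) rule, which flags values lying more than k times the IQR outside the first and third quartiles. The sketch below assumes the conventional k = 1.5; the function name iqr_outliers and the readings are hypothetical, and the syllabus does not specify which statistical methods are covered.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr  # lower fence
    upper = q3 + k * iqr  # upper fence
    return [v for v in values if v < lower or v > upper]


# Hypothetical readings for illustration only.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
print(outliers)
```

The multiplier k is a tunable assumption: a larger value flags fewer points, and the rule works best for roughly symmetric, unimodal data.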