Chapter 2 of the DataCamp course Exploratory Data Analysis in Python
MS Word related video links: Setting up Heading formatting and numbering as well as tables of contents for Scientific reports, documents and Theses; Inserting Figures and Figure Legends in Microsoft Word; Adding sections BEFORE your table of contents without numbering (optional); Producing a Figure for publication containing microscopy images in MS Word (optional)
Prompt Engineering, Vibe coding
Types of prompts: ICC, zero-shot, n-shot, CoT (chain of thought)
LLMs are not deterministic models: Temperature, Tokens, Top p, Top k, context window, tool calling (Internet Search), Hallucination
Importance of Domain Knowledge and Meta Skills in the use of LLMs and vibe coding
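The effect of Temperature, Top k and Top p on the next-token choice can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how any particular LLM is implemented: the toy vocabulary, the scores, and all function names below are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose total probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def sample_token(vocab, dist, rng):
    """Draw one token at random according to the filtered distribution."""
    indices = list(dist)
    weights = [dist[i] for i in indices]
    return vocab[rng.choices(indices, weights=weights, k=1)[0]]


vocab = ["cat", "dog", "fish", "bird"]  # hypothetical toy vocabulary
logits = [2.0, 1.0, 0.5, 0.1]           # hypothetical model scores

cold = softmax_with_temperature(logits, temperature=0.5)  # more peaked
hot = softmax_with_temperature(logits, temperature=2.0)   # flatter

rng = random.Random()  # unseeded, so repeated runs can give different tokens
token = sample_token(vocab, top_k_filter(hot, k=2), rng)
print(token)
```

Because the token is drawn at random, running the last lines several times can print different words, which is one reason the same prompt can produce different answers.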
System and User Prompt
Definition of Vibe coding and practice with Google AI Studio
Sampling
Types of Sampling Methods (4.1) see this video https://www.youtube.com/watch?v=pTuj57uXWlk
Read the HTML file related to sampling that we discussed in class [download the file and double-click it to open it in your web browser, e.g. Chrome, Firefox, or Edge]
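For practice, three sampling designs commonly taught under this heading (simple random, systematic, and stratified) can be sketched in Python. The population of student IDs and the function names are hypothetical, and this is only one simple way to write each design.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 students, 75 in "CS" and 25 in "Math".
population = [{"id": i, "dept": "CS" if i % 4 else "Math"} for i in range(1, 101)]


def simple_random_sample(units, n):
    """Every unit has the same chance of being selected."""
    return rng.sample(units, n)


def systematic_sample(units, n):
    """Pick a random start, then take every k-th unit."""
    step = len(units) // n
    start = rng.randrange(step)
    return units[start::step][:n]


def stratified_sample(units, key, fraction):
    """Sample the same fraction independently inside each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, round(len(members) * fraction)))
    return chosen


srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, 10)
stratified = stratified_sample(population, "dept", 0.2)

print(len(srs), len(systematic), len(stratified))
```

The stratified sample keeps the 75:25 split of the population, whereas a simple random sample of the same size may not.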
Census, Nonresponse, and Undercoverage (4.2) https://www.youtube.com/watch?v=EZrP_av3cmA
The Normal Distribution and the 68-95-99.7 Rule (5.2) https://www.youtube.com/watch?v=mtbJbDwqWLE
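The 68-95-99.7 rule can be checked numerically. The sketch below uses the standard result that, for a normal distribution, the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)); the function name is hypothetical.

```python
import math


def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```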
Z-Scores, Standardization, and the Standard Normal Distribution (5.3) https://www.youtube.com/watch?v=2tuBREK_mgE
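A z-score is computed as z = (x - mean) / standard deviation. A minimal sketch with hypothetical exam scores, using the population standard deviation:

```python
import statistics

scores = [55, 60, 65, 70, 75, 80, 85]  # hypothetical exam scores

mean = statistics.mean(scores)   # 70
sd = statistics.pstdev(scores)   # population standard deviation, 10.0

# Standardize: how many standard deviations each score lies from the mean.
z_scores = [(x - mean) / sd for x in scores]
print(z_scores)  # [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
```

After standardization the values have mean 0 and standard deviation 1, which is what allows them to be compared against the standard normal distribution.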
Project: Analyzing Crime in Los Angeles - see the project description from here and press the continue project hyperlink. In the left window you will see various tabs for the project guides, resources, and solution. On the right (main window) you will see the project notebook, where you will run the code
Note:
- As discussed in class, this is part of your final term paper. Complete it yourself; it will also be part of your sessional assessment.
- Students with less than 70% attendance are not eligible to sit the final term exams and are also not eligible for sessional marks.
Data Science Work Flow
Defining the problem
Data Preparation and Ingestion - Extract, Transform, Load (ETL)
Data Preprocessing
One hot encoding
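One-hot encoding replaces a categorical column with one 0/1 column per category. The sketch below is written in plain Python to show the idea; the column values and the function name are hypothetical. In pandas, pandas.get_dummies performs a comparable transformation.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row holds a single 1 marking its category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


roles = ["bowler", "batsman", "all rounder", "bowler"]  # hypothetical column
categories, encoded = one_hot_encode(roles)

print(categories)  # ['all rounder', 'batsman', 'bowler']
for row in encoded:
    print(row)
```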
Data Modeling, Analysis and Machine Learning
Introduction to Supervised Machine Learning: Classification and Regression, Model evaluation metric (accuracy), dependent variable (DV), independent variable (IV)
Delivering the results
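Accuracy is the share of predictions that match the true labels. A minimal sketch with hypothetical labels and a hypothetical function name:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = ["spam", "ham", "spam", "ham", "ham"]  # hypothetical true labels (the DV)
y_pred = ["spam", "ham", "ham", "ham", "ham"]   # hypothetical model predictions

print(accuracy(y_true, y_pred))  # 4 of 5 correct -> 0.8
```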
Report writing
https://www.datacamp.com/blog/sports-analytics-euro-2024?src=data-ai-world-cup (TBD)
https://www.datacamp.com/code-along/create-claude-skills-for-data-tasks?src=data-ai-world-cup (TBD)
https://media.datacamp.com/cms/resources---create-claude-skills-for-data-tasks-v2.pdf
https://media.datacamp.com/cms/claude-skills-for-data-tasks.pdf
Report writing using MS Word by Dr James Clark King's College London: ehealth.kcl.ac.uk/sites/physiology/LST-MSOffice5.html
Inserting Figures and Figure Legends in Microsoft Word (must listen)
Setting up Heading formatting and numbering as well as tables of contents for Scientific reports, documents and Theses (must listen)
Adding sections BEFORE your table of contents without numbering
Producing a Figure for publication containing microscopy images in MS Word
Colab Notebooks For Python Basics:
Visit Colab and open a new notebook to practice the code we learn in class. Also visit the notebook discussed in class on 12 Feb 2026
Also visit these notebook links: notebook, NumPy Quick Tutorial, Pandas Quick Tutorial
Online Python resources:
Writing Efficient Python Code (to master Python concepts useful for AI, ML, EDA, and Python programming)
Exploratory Data Analysis
Introduction to Statistics (in addition to related YouTube videos one can learn theory from here)
Introduction to Statistics in Python (in addition to related YouTube channels one can learn from here)
Sampling in Python (in addition to related YouTube videos one can learn theory from here)
Hypothesis Testing in Python (It will complement the topic discussed in EDA related course)
Note: It is a must to attempt the related Practice in Python, Theory, and Projects available at DataCamp!
Theory related to Exploratory Data Analysis, Statistics (mean, median, mode, variance, standard deviation, range, quartiles, deciles, percentiles, correlation, outliers, Z score, confidence interval, hypothesis testing), Visualizations (pie chart, histogram, bar chart, box & whisker plot etc.), and Experimental Design can be learnt from various YouTube channels.
Our main channel to follow for related theory is Simple Learning Pro.
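Several of the statistics listed above can be computed with Python's built-in statistics module. This is a small sketch on hypothetical observations, not a replacement for the course exercises.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical observations

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "variance": statistics.variance(data),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(data),      # sample standard deviation
    "range": max(data) - min(data),
}
quartiles = statistics.quantiles(data, n=4)  # [Q1, Q2, Q3], requires Python 3.8+

for name, value in summary.items():
    print(name, value)
print("quartiles", quartiles)
```

The second quartile equals the median; the first and third quartiles depend on the interpolation method, so other tools (for example Excel) may report slightly different values.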
Semester Project 1: Scrape espncricinfo data (see the adjacent link) into a .csv file and answer the following using Python as well as Excel:
Who has won the most Man of the Series awards in Test cricket? Provide a list of the top 5 players.
List the names of the Pakistani and Indian players present in the list.
Which type of player has won the most awards (bowler, batsman, all-rounder)?
Which country has the most players in the top 10, top 20, top 30, ...
Assignment 1: Explore five job postings related to web scraping and prepare a Word document. Read those postings carefully and write a summary in your own words.
Assignment 2: Perform data preparation and preprocessing of the .csv file (or Excel file) downloaded from the Cricinfo website. You can make a small Excel file of your own and apply name and country splitting formulas to the player column.
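The name and country split can also be done in Python. The sketch below assumes, as a labeled guess about the downloaded file, that each player cell looks like "Name (COUNTRY)"; the sample cells and the function name are hypothetical, so adjust the pattern to the actual column.

```python
import re

# Assumed cell format: "Name (COUNTRY)", e.g. "A Player (PAK)".
PLAYER_PATTERN = re.compile(r"^\s*(?P<name>.+?)\s*\((?P<country>[^()]+)\)\s*$")


def split_player(cell):
    """Split a player cell into (name, country); country is None if no bracket is found."""
    match = PLAYER_PATTERN.match(cell)
    if match is None:
        return cell.strip(), None
    return match.group("name"), match.group("country").strip()


cells = ["A Player (PAK)", "B Player (IND)", "C Player"]  # hypothetical rows
for cell in cells:
    print(split_player(cell))
# ('A Player', 'PAK')
# ('B Player', 'IND')
# ('C Player', None)
```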
Activity: Read the following link and explore various projects: Link to KD-Nuggets' Ten project list for Data Scientist