Objective: Introduce participants to the fundamentals of data science and provide a strong foundation for data science tasks.
Morning Session:
Introduction to Data Science:
o Overview of Data Science: What it is, applications, and career paths
o Key concepts: Data collection, cleaning, exploration, visualization, machine learning, and deployment.
o Types of Data: Structured vs. unstructured data
Introduction to Python Libraries:
o NumPy: Array manipulation and basic operations.
o Pandas: Introduction to DataFrames; reading data from CSV/Excel files, SQL databases, and APIs.
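Example for this session: a minimal sketch of the NumPy and Pandas basics listed above. The sample CSV text, column names, and variable names are made up for illustration, and `io.StringIO` stands in for a real file path.

```python
import io

import numpy as np
import pandas as pd

# NumPy: create an array, reshape it, and apply element-wise operations.
values = np.arange(6)                # array([0, 1, 2, 3, 4, 5])
matrix = values.reshape(2, 3)        # 2 rows, 3 columns
doubled = matrix * 2                 # element-wise multiplication
column_means = matrix.mean(axis=0)   # mean of each column

# Pandas: read CSV text into a DataFrame. io.StringIO stands in for a file
# path here; the column names and values are invented for this example.
csv_text = """name,age,city
Asha,29,Pune
Ben,35,Leeds
Chen,41,Taipei
"""
df = pd.read_csv(io.StringIO(csv_text))

print(doubled)
print(column_means)
print(df.head())
print(df["age"].mean())
```

Pandas also provides `read_excel` and `read_sql` for the other sources named above; they return the same DataFrame type, so the rest of the workflow is unchanged.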
Afternoon Session:
Setting Up the Environment:
o Installation of Python, Jupyter Notebook, and libraries (NumPy, Pandas, Matplotlib).
o Intro to cloud platforms: Google Colab, Kaggle, etc.
Hands-on Exercise:
o Hands-on: Analyze a real-world dataset.
o Introduction to data loading, cleaning, and basic exploration using Pandas.
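Example for the hands-on exercise: one possible first pass over a dataset, covering loading, a basic cleaning step, and a simple summary. The `orders` data below is invented and stands in for whichever real-world dataset is chosen; dropping incomplete rows is only one of several reasonable cleaning choices.

```python
import io

import pandas as pd

# Made-up sample data standing in for a real-world CSV file.
raw = """order_id,region,amount
1,North,120.5
2,South,
3,North,80.0
4,East,45.25
"""
orders = pd.read_csv(io.StringIO(raw))

# Basic exploration: size, column types, and missing values per column.
print(orders.shape)
print(orders.dtypes)
missing_per_column = orders.isna().sum()
print(missing_per_column)

# Basic cleaning: drop rows where the amount is missing.
clean = orders.dropna(subset=["amount"])

# A first summary: total amount per region.
totals = clean.groupby("region")["amount"].sum()
print(totals)
```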
Objective: Teach participants how to explore and visualize data using Python libraries.
Morning Session:
Data Preprocessing and Cleaning:
o Handling missing data, duplicate rows, and outliers.
o Data transformation and normalization.
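Example for this topic: a small sketch of the four cleaning steps named above, applied in sequence. The data is invented, and the specific choices (median imputation, the 1.5 × IQR rule, min-max scaling) are common defaults assumed here for illustration, not the only valid options.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value, one duplicate row, and one outlier.
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 32, 38],
    "income": [40_000, 52_000, 48_000, 61_000, 52_000, 900_000],
})

# Missing data: fill the gap with the column median (one option among several).
df["age"] = df["age"].fillna(df["age"].median())

# Duplicates: keep the first occurrence of each identical row.
df = df.drop_duplicates().reset_index(drop=True)

# Outliers: keep rows whose income lies within 1.5 * IQR of the quartiles.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[in_range].reset_index(drop=True)

# Normalization: min-max scale income to the range [0, 1].
income = df["income"]
df["income_scaled"] = (income - income.min()) / (income.max() - income.min())
print(df)
```

The order matters in this sketch: the outlier is removed before scaling, because a single extreme value would otherwise compress every other scaled value toward zero.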
Exploratory Data Analysis (EDA):
o Importance of EDA in data science.
o Summarizing data: Descriptive statistics, correlation, and distributions.
Afternoon Session:
Data Visualization:
o Introduction to Matplotlib and Seaborn.
o Plotting histograms, bar charts, scatter plots, and heatmaps
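Example for this topic: a sketch that computes the summaries mentioned in the EDA topic (descriptive statistics and correlation) and draws three of the plot types listed above. The height/weight data is synthetic and assumed purely for illustration. Only Matplotlib is used so the sketch has fewer dependencies; Seaborn's `heatmap` can draw the same correlation matrix in a single call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): weight depends on height.
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=200)
weight = 0.9 * height - 90 + rng.normal(0, 5, size=200)
df = pd.DataFrame({"height": height, "weight": weight})

# Summaries: descriptive statistics and the correlation matrix.
summary = df.describe()
corr = df.corr()
print(summary)
print(corr)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Histogram: distribution of one variable.
axes[0].hist(df["height"], bins=20)
axes[0].set_title("Height distribution")

# Scatter plot: relationship between two variables.
axes[1].scatter(df["height"], df["weight"], s=10)
axes[1].set_title("Height vs. weight")

# Heatmap: the correlation matrix drawn as a colored grid.
image = axes[2].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[2].set_xticks(range(len(corr.columns)))
axes[2].set_xticklabels(corr.columns)
axes[2].set_yticks(range(len(corr.columns)))
axes[2].set_yticklabels(corr.columns)
axes[2].set_title("Correlation heatmap")
fig.colorbar(image, ax=axes[2])

fig.tight_layout()
```

In a notebook the figure displays inline; in a script, `fig.savefig("eda_plots.png")` writes it to disk (the file name is arbitrary).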
Hands-on Exercise:
o EDA and visualizing a real dataset using Pandas, Matplotlib, and Seaborn.
o Group task: Interpret results from visualizations and summarize findings.
Objective: Introduce participants to advanced machine learning techniques, along with model evaluation.
Morning Session:
• Decision Trees and Random Forests:
o Theory and practical applications.
o Hands-on: Predict loan approvals using Random Forests.
• Clustering Techniques:
o K-Means and Hierarchical Clustering.
o Hands-on: Segment customers from an e-commerce dataset.
• Unsupervised Learning:
o K-Means Clustering: How it works and when to use it.
o Dimensionality Reduction: Introduction to PCA (Principal Component Analysis).
• Model Evaluation:
o Cross-validation: K-fold cross-validation and its importance.
o Evaluation metrics for clustering and classification models.
Afternoon Session:
• Hands-on with Unsupervised Learning:
o Implement K-Means Clustering on a dataset.
o Apply PCA to reduce dimensions and visualize results.
• Model Tuning:
o Hyperparameter tuning using GridSearchCV and RandomizedSearchCV.
o Understanding the impact of overfitting and underfitting.
• Hands-on Exercise:
o Apply K-Means Clustering and PCA on a real-world dataset.
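Example for this day: a compact sketch of the main techniques named above, using scikit-learn: a Random Forest scored with 5-fold cross-validation, a small grid search, K-Means with a silhouette score, and PCA. The built-in Iris dataset is an assumption used as a stand-in for the loan and e-commerce datasets, and the parameter grid values are arbitrary illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler

# Iris is used only as a small built-in stand-in for a real-world dataset.
X, y = load_iris(return_X_y=True)

# Supervised: Random Forest evaluated with 5-fold cross-validation.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
cv_scores = cross_val_score(forest, X, y, cv=5)
print("Mean CV accuracy:", cv_scores.mean())

# Hyperparameter tuning: try every combination in a small grid,
# scoring each one with 5-fold cross-validation.
param_grid = {"n_estimators": [25, 50], "max_depth": [2, 4, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print("Best parameters:", search.best_params_)

# Unsupervised: K-Means on standardized features, scored without labels.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
print("Silhouette score:", silhouette_score(X_scaled, labels))

# Dimensionality reduction: project to 2 components, e.g. for a scatter plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

`RandomizedSearchCV` takes the same estimator and a parameter distribution, and samples a fixed number of combinations instead of trying all of them, which tends to be cheaper on larger grids.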
• Introduction to Deep Learning:
o Basic concepts of neural networks.
o Overview of Natural Language Processing (NLP) and time series analysis.
• Model Deployment:
o Introduction to model deployment concepts (overview of Flask, FastAPI, or cloud platforms).
o Building a simple web API to deploy a model.
Afternoon Session:
• Mini-Project:
o Participants work on a mini-project applying the concepts learned (e.g., analyzing a real dataset and creating a machine learning model).
o Presentations: Participants present their findings and models.
• Wrap-Up and Q&A:
o Recap of the week’s topics.
o Future learning paths and resources.
o Open Q&A session.
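Example for the model deployment topic: a minimal sketch of a prediction web API. To stay runnable without extra installs, it uses Python's standard-library `http.server` rather than Flask or FastAPI, which is a substitution made for this example. The `predict` function, its weights, the `/predict` path, and the JSON field names are all hypothetical stand-ins for a trained model and its interface.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


# Hypothetical stand-in for a trained model: a fixed linear scoring rule.
# In the workshop this would be a fitted model loaded from disk.
def predict(features):
    weights = [0.5, -0.25]
    score = sum(w * x for w, x in zip(weights, features))
    return {"score": score, "label": int(score > 0)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, e.g. {"features": [2.0, 1.0]}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["features"])).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for this demo


def call_api(port, features):
    # Client side: POST the features to the running server and decode the reply.
    request = Request(
        f"http://127.0.0.1:{port}/predict",
        data=json.dumps({"features": features}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(call_api(server.server_port, [2.0, 1.0]))
    server.shutdown()
    server.server_close()
```

The same shape carries over to Flask or FastAPI: a route receives JSON, passes the features to the model, and returns the prediction as JSON. Those frameworks add routing, validation, and production-ready serving that this sketch leaves out.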
Materials and Setup:
• Tools/Software Required: Python, Jupyter Notebook, Anaconda, and the libraries NumPy, Pandas, Matplotlib, Seaborn, and Scikit-Learn.
• Project Datasets: Use publicly available datasets from Hugging Face, Papers with Code, or Kaggle for hands-on exercises.
• Resources: A slide deck for each session, coding notebooks with examples, and access to learning platforms (e.g., Kaggle, GitHub).