Last updated: Feb 1, 2025
First of all, you need an environment (called an IDE, or Integrated Development Environment) to type and execute your code. We'll use Google Colab, and if you have a Gmail account, you already have access to it. Google Colaboratory is free, requires no setup, and is a great way to start with Python. It has limited storage space and may be slow at times, but it is sufficient for this short course. Check this link and see if it works for you: https://colab.research.google.com/drive/1HEkVymxOvXvE77epZl_D2oQa2KWOrWeD?usp=sharing. If you have a Chromebook, it will run on that too, as the only thing it needs is the Chrome browser. Hence Windows, Mac, Linux, Chromebooks, and cell phones / tablets of all types can be used, and if the link works, you are all set! Note that smaller devices like iPads may show you a reduced version of the site. In the settings of that page, there is an option called "Desktop site" or "Show Desktop site". Click on it to see the various options like File, Run, etc.
You may want to watch a short video to get comfortable with Google Colab: https://www.youtube.com/watch?v=inN8seMm7UI
7. Learn how neural networks function by coding them in a Google Sheet and learn what 'weights' or 'parameters' do:
Google Slides Neural Networks and the corresponding Google Sheet: Understanding Neural Networks
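To get a feel for what 'weights' do before opening the sheet, here is a minimal Python sketch of a single neuron. It is not taken from the slides or the Google Sheet; the function names, inputs, and weights are made up for illustration, and the sigmoid activation is an assumption.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical inputs and weights, chosen only for illustration.
inputs = [1.0, 2.0]

# Same inputs, different weights: the output changes because the
# weights decide how much each input counts.
print(neuron(inputs, weights=[0.5, -0.25], bias=0.0))  # weighted sum is 0
print(neuron(inputs, weights=[2.0, 1.0], bias=0.0))    # weighted sum is 4
```

Training a network amounts to adjusting these weights (the 'parameters') until the outputs match the known answers.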
8. Supervised Learning - Classification:
Numerical Data: Iris Flower using Decision Trees
Text Data:
Sentiment Analysis Version 1 and the corresponding data: movie_data_reduced.xlsx
Sentiment Analysis Version 2 and the corresponding data: movie_data_reduced.xlsx (It allows you to write a review and get a rating!)
Sentiment Analysis Version 3 is the same as above but uses the FULL data, which it reads from a Google sheet. Accuracy of almost 90%.
Sentiment Analysis Version 4 is the same as above but allows you to save, and then reload the ML model as a pickle file.
Sentiment Analysis Version 5 is the same as above but uses 5 sentiments rather than 2 (coming soon!)
Images: SVM for Digits (Optional)
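Saving and reloading a model as a pickle file (as in Sentiment Analysis Version 4) can be sketched with Python's built-in `pickle` module. This is not the notebook's code: the "model" here is a hypothetical dictionary of word scores standing in for a trained model, and the file name is made up.

```python
import os
import pickle
import tempfile

# A stand-in "model": word scores for a toy sentiment classifier.
# Hypothetical values; a real notebook would pickle a trained model object.
model = {"great": 1.0, "boring": -1.0, "fun": 0.5}

def predict(model, review):
    """Return 'positive' if the summed word scores exceed zero."""
    score = sum(model.get(word, 0.0) for word in review.lower().split())
    return "positive" if score > 0 else "negative"

# Save the model to a file...
path = os.path.join(tempfile.mkdtemp(), "sentiment_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# ...and load it back later, without retraining.
with open(path, "rb") as f:
    reloaded = pickle.load(f)

print(predict(reloaded, "A great and fun movie"))
```

Only load pickle files that you created yourself or that come from a source you trust, because loading a pickle can run arbitrary code.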
9. Supervised Learning - Regression:
Numerical Data - Weather Prediction
Optional Reading (External page): Linear Regression, https://medium.com/codex/the-derivation-of-the-linear-regression-coefficient-c801771a9322
10. Unsupervised Learning (Clustering):
Numerical and Categorical Data: KMeans Clustering for Iris Flower
Images: KMeans Clustering for Digits
11. Google Colab: Project 1 - California Housing.ipynb
12. Google Colab: Project 2 - MNIST Data.ipynb
These are not 'required' to learn the basics of data science, but they enhance your understanding of some of the topics discussed earlier. These Jupyter notebooks may also be used as 'seeds' or project ideas.
To process text with computers, we need to vectorize the text.
Here is a slide deck to show a few simple examples: Vectorizers Explained.
Here is a Python notebook that explains what vectorizing is and how it works, using two different vectorizing techniques: Vectorizers Explained.ipynb
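As a rough illustration of what "vectorizing" means, here is a small sketch written with only the Python standard library. The notebook's two techniques are not named on this page; this sketch assumes two common ones, word counts (bag of words) and TF-IDF. The sample sentences and function names are made up.

```python
import math
from collections import Counter

# Toy documents, made up for illustration.
docs = ["the movie was great", "the movie was boring", "great acting"]

tokenized = [d.split() for d in docs]
# The vocabulary is every distinct word, in alphabetical order.
vocab = sorted({w for doc in tokenized for w in doc})

def count_vector(tokens):
    """Bag of words: how many times each vocabulary word appears."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def tfidf_vector(tokens):
    """Textbook TF-IDF: term frequency times log(N / document frequency)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vec = []
    for w in vocab:
        tf = counts[w] / len(tokens)
        df = sum(1 for doc in tokenized if w in doc)
        vec.append(tf * math.log(n_docs / df))
    return vec

print(vocab)
print(count_vector(tokenized[0]))
print([round(x, 3) for x in tfidf_vector(tokenized[2])])
```

Each sentence becomes a list of numbers with one position per vocabulary word. TF-IDF gives rarer words (like "acting") more weight than words that appear in many documents. Library implementations often use smoothed variants of this formula, so their numbers may differ.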
You have seen sentiment analysis of movie reviews. What if you want to do something similar but don't have access to labeled data (text and its category)? Well, you can generate your own sentiments, using random numbers: Generate your own sentiments data.ipynb
Generate Music: This file does NOT run in Google Colab and requires a lot of setup. It is also not related to Machine Learning. Find instructions in the file itself: Music from Thornny.ipynb
Use your webcam to take your photo. Again, this is not ML, but it can be used in an application (say, face detection). The code is long and difficult: Lights, Camera, Action.ipynb
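The idea of generating labeled data with random numbers can be sketched as follows. This is not the notebook's code; the word lists, templates, and function name are hypothetical, and a label of 1 is assumed to mean positive and 0 negative.

```python
import random

random.seed(42)  # fixed seed so the output is repeatable

# Hypothetical word lists; replace them with words from your own domain.
POSITIVE = ["great", "wonderful", "fun", "moving", "brilliant"]
NEGATIVE = ["boring", "awful", "slow", "confusing", "dull"]
TEMPLATES = ["The movie was {}.", "I found the plot {}.", "Overall, quite {}."]

def make_review():
    """Pick a random label, then build a sentence that matches it."""
    label = random.choice([0, 1])  # 1 = positive, 0 = negative
    word = random.choice(POSITIVE if label == 1 else NEGATIVE)
    text = random.choice(TEMPLATES).format(word)
    return text, label

# Generate a small labeled dataset of (text, label) pairs.
data = [make_review() for _ in range(10)]
for text, label in data[:3]:
    print(label, text)
```

Data generated this way is far simpler than real reviews, so a model trained on it is useful for practice rather than for real predictions.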
More coming soon!
You may ask, what can I do to keep learning? The best thing you can do is to stay connected with the community via email newsletters and/or meetup groups. Some are listed here.
Free Resources
Free online courses from Harvard University (via edX):
Data Science: Machine Learning (8 weeks long), https://pll.harvard.edu/course/data-science-machine-learning
Machine Learning and AI with Python (6 weeks long), https://pll.harvard.edu/course/machine-learning-and-ai-python
Etc.
The following subscriptions are also FREE. Please do not pay!
Daily Dose of Data Science, https://blog.dailydoseofds.com/
Deeplearning AI, https://www.deeplearning.ai/
Machine Learning Mastery, https://machinelearningmastery.com/
Analytics Vidhya, https://www.analyticsvidhya.com/
FourthBrain, https://fourthbrain.ai/
Abacus AI, https://abacus.ai/
DataCamp, https://www.datacamp.com/
DataScienceSalon (AI & Data Weekly), https://roundtable.datascience.salon/
Hugging Face, https://huggingface.co/
Kaggle, https://www.kaggle.com/
KDNuggets, https://www.kdnuggets.com/
John Snow Labs, https://www.johnsnowlabs.com/
TLDR AI, https://tldr.tech/
AI by Hand (this one is a bit advanced, as it tries to teach all major AI models, using calculations done by hand to explain how each one works), https://aibyhand.substack.com/
To learn most of the above material, you don't need more than what you have already learned in high school. However, if you feel rusty, or did not learn all the basics earlier, here is a short guide to the key items you must learn.
Mathematics
Linear Algebra (Vectors, Matrices, etc): See units 1 and 2 of ‘Linear Algebra’ at the Khan Academy, https://www.khanacademy.org/math/linear-algebra
Calculus: See units 1, 2 and 4 of ‘Calculus 1’ at the Khan Academy, https://www.khanacademy.org/math/calculus-1
Probability and Statistics
Probability Theory, Random Variables, Distributions, Statistical Inference. You can’t learn all of this in a short period of time, but at least see units 1-7 and 9-10 of ‘Statistics and Probability’ at the Khan Academy, https://www.khanacademy.org/math/statistics-probability/probability-library
Programming
Introduction to Python Programming: See units 1-3 of ‘Intro to computer science - Python’ at the Khan Academy, https://www.khanacademy.org/computing/intro-to-python-fundamentals.
Note: If you have time, complete the above course (that means finish units 4-6 too).
Python for Data Science requires knowledge of libraries like Pandas, NumPy and Matplotlib. These are already covered in the Jupyter notebooks mentioned above, but here are some video links.
Pandas: This video covers Pandas and related topics, https://www.youtube.com/watch?v=mkv5mxYu0Wk.
NumPy: A video introduction, https://youtu.be/QUT1VHiLmmI?si=c_kjolRLCPSjLFIf
Matplotlib and Seaborn, https://youtu.be/a9UrKTVEeZA?si=VI_cfRt7Luk074Ml