For many data analysts and scientists who use Python, the vast majority of their work is done using pandas. This is likely because initial data exploration and preparation tend to take the most time. Some entire projects consist only of data exploration and have no machine learning component. Data scientists spend so much time on this stage that, as the saying goes, they spend 80% of their time cleaning the data and the other 20% complaining about cleaning the data.
In this course, we will cover the fundamentals of pandas as the main data-handling tool. We will also look at some matplotlib commands for data visualization and some scikit-learn functions and methods for unsupervised and supervised learning.
Length 2.5 Weeks (180 minutes per session)
Level Introductory
Language English
Course Type Self-paced during lecture hours
Data Handling with pandas
Data Visualization with matplotlib
Basic Data Analysis
Unsupervised Learning
Supervised Learning
Basic knowledge of Python syntax is necessary. If you have not previously used Python, it is highly recommended that you take the Skill-up Training class before taking this class.
Lesson 1: Data Handling
In this lesson, we will see some basic methods of handling data, including storing, reading, validating, and filling in missing data.
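As a rough preview of these steps, here is a minimal sketch in pandas. The table and its column names are made up for illustration, and the fill strategy (column mean for numbers, a placeholder for text) is only one possible choice, not necessarily the one used in the lesson.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cho", "Dee"],
        "age": [23.0, np.nan, 31.0, 27.0],
        "city": ["Tokyo", "Osaka", None, "Kyoto"],
    }
)

# Storing and reading: round-trip the table through CSV text held in memory.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# Validating: count the missing values in each column.
missing_counts = loaded.isna().sum()

# Filling: use the column mean for numbers and a placeholder for text.
filled = loaded.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())
filled["city"] = filled["city"].fillna("unknown")

print(missing_counts)
print(filled)
```

Writing to an in-memory buffer keeps the example self-contained; with a real file, a path string can be passed to `to_csv` and `read_csv` instead.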
Lesson 2: Data Visualization
In this lesson, we will learn some basic plotting methods using matplotlib. Labelling and annotating plots will also be covered, followed by a brief explanation of how to choose the right type of visual for different types of data.
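The sketch below gives a small taste of plotting, labelling, and annotating with matplotlib. The data are made up for illustration, and the output file name `visitors.png` is an arbitrary choice.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, made up for illustration.
days = [1, 2, 3, 4, 5]
visitors = [12, 18, 9, 25, 21]

fig, ax = plt.subplots()
ax.plot(days, visitors, marker="o", label="visitors")

# Labelling: title, axis labels, and legend.
ax.set_title("Daily visitors")
ax.set_xlabel("Day")
ax.set_ylabel("Number of visitors")
ax.legend()

# Annotating: point an arrow at the highest value.
peak = visitors.index(max(visitors))
ax.annotate(
    "peak",
    xy=(days[peak], visitors[peak]),
    xytext=(days[peak] - 1.5, visitors[peak] - 2),
    arrowprops={"arrowstyle": "->"},
)

fig.savefig("visitors.png")
```

In an interactive session or notebook, the `matplotlib.use("Agg")` line can be dropped and `plt.show()` used in place of `savefig`.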
Lesson 3: Data Analysis
In this lesson, we will try out some statistical methods to extract meaningful information from our data.
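The lesson description does not name specific methods, so the following is only a guess at the kind of summary involved: descriptive statistics, group means, and a correlation, all computed with pandas on made-up measurements.

```python
import pandas as pd

# Hypothetical measurements, made up for illustration.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b", "b"],
        "height": [150.0, 160.0, 170.0, 180.0, 190.0],
        "weight": [50.0, 55.0, 65.0, 72.0, 80.0],
    }
)

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = df[["height", "weight"]].describe()

# Mean of each numeric column within each group.
group_means = df.groupby("group")[["height", "weight"]].mean()

# Pearson correlation between the two measurements.
correlation = df["height"].corr(df["weight"])

print(summary)
print(group_means)
print(correlation)
```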
Lesson 4: Data Wrangling
In this lesson, we will extract further information from the data using some unsupervised machine learning methods, including SVD and factor analysis.
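To give a feel for what SVD does, here is a minimal sketch that uses `numpy.linalg.svd` directly rather than scikit-learn. That choice is an assumption made to keep the example small, and the data are synthetic: five features generated from two hidden factors plus a little noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, 5 features, built from 2 hidden factors plus small noise.
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 5))

# Center each feature, then factor the matrix as U * diag(s) * Vt.
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance carried by each singular direction.
explained = s**2 / np.sum(s**2)

# Project the data onto the two leading directions.
X_reduced = X_centered @ Vt[:2].T

print(explained)
print(X_reduced.shape)
```

Because the data were generated from two factors, the first two entries of `explained` should account for nearly all of the variance, which is why keeping two columns loses little information here.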
Lesson 5: Advanced Data Handling
In this lesson, we will see some advanced methods of data handling. We will also work with timestamp objects and data management.
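As an illustration of working with timestamps, the sketch below parses text into pandas timestamps, extracts the weekday, and totals a value per day. The log and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sales log, made up for illustration.
df = pd.DataFrame(
    {
        "time": [
            "2021-01-01 09:00",
            "2021-01-01 15:30",
            "2021-01-02 10:15",
            "2021-01-03 18:45",
        ],
        "sales": [10, 20, 15, 5],
    }
)

# Convert the text column into timestamp objects.
df["time"] = pd.to_datetime(df["time"])

# Timestamp columns expose date parts through the .dt accessor.
df["weekday"] = df["time"].dt.day_name()

# Resample to daily frequency and total the sales for each day.
daily_sales = df.set_index("time")["sales"].resample("D").sum()

print(df)
print(daily_sales)
```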
Grading for this class uses a point system.
Five in-class exercises: each session is worth a maximum of 4 points (20 points in total).
Five assignments: each assignment is worth a maximum of 16 points (80 points in total).
Submit each assignment on its respective Assignment page.
Penalties (point reduction) will be given for late submission, unsupportive behavior, etc.
Assignments can be submitted through the form provided on each assignment page. For communication and exercise submission, use the embedded chatroom in each lesson or the Chat main page.
Before sharing the link to your code, make sure the top-right corner of the code window shows "Draft Saved". In general, the code should be saved automatically. (Try making a small change in the code to generate another link.)
Adjust your window size so that the input and output windows are shown side by side, as in the following picture:
Assistant Professor
M. Samy Baladram
Teaching Assistant
Mike Zielewski
Teaching Assistant
Pratik Sahu
This site provides most of the information needed to solve the exercises and assignments. Some useful additional references are listed below.
Websites: