Pandas Guides
Welcome to our page on pandas guides! We hope you will find it useful. If you have no experience with pandas, we recommend that you read through the introduction guide to get a sense of how pandas DataFrame objects work. After looking through the introduction, look for a guide that contains the function you're interested in, based on the outlines we've provided on this page.
There is also a feedback section at the end of the page; we would greatly appreciate any criticism or suggestions for improvement. Thanks for reading!
Introduction to DataFrames
This is a guide for getting started with pandas, and more specifically, the DataFrame object that represents tables in pandas. Here is a list of topics this guide covers:
Importing data as a DataFrame
Viewing DataFrames
Indexing (accessing specific rows, like rows 1-15, in the DataFrame)
Accessing one column
Concept: each DataFrame column is a pandas Series object
Accessing two or more columns
Slicing based on column values (accessing rows based on column values, e.g., all rows with column_1>5)
Renaming columns
Exporting data
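The topics above can be sketched in a few lines. This is a minimal illustration, not the guide's own code: the CSV contents and the column names (name, column_1, column_2, score) are made up for the example, and an in-memory string stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real file on disk.
csv_text = """name,column_1,column_2
a,3,10
b,7,20
c,9,30
d,1,40
"""

# Importing data as a DataFrame (read_csv also accepts a file path).
df = pd.read_csv(io.StringIO(csv_text))

# Viewing: head(n) returns the first n rows.
print(df.head(2))

# Indexing rows by position: rows 1 and 2 (the end of an iloc slice is excluded).
middle_rows = df.iloc[1:3]

# Accessing one column returns a pandas Series.
one_column = df["column_1"]

# Accessing two or more columns returns a DataFrame.
two_columns = df[["name", "column_1"]]

# Slicing based on column values: all rows with column_1 > 5.
large_rows = df[df["column_1"] > 5]

# Renaming columns returns a new DataFrame; df itself is unchanged.
renamed = df.rename(columns={"column_2": "score"})

# Exporting: with no path argument, to_csv returns the CSV text as a string.
exported = renamed.to_csv(index=False)
```

Passing a file name to to_csv instead (for example "output.csv", a hypothetical path) writes the data to disk.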
Feature Transformations & Vectorized Operations
This guide will cover the basics of creating new columns by transforming existing columns in some way (in machine learning parlance, this is called "feature transformation"). Here is a list of what we'll cover:
Creating new columns
Deleting (dropping) columns
Vectorized operations: doing math on columns (adding, subtracting, multiplying, etc.)
Example: normalizing a column
Vectorized string operations
Example: getting the last letter of a word in each row
Example: getting the last digit of a number in each row
Other vectorized operations (link to a list of many additional functions, including rounding)
Advanced feature transformation (applying any function you wrote to a column of data)
Anonymous functions
Column transformations with arbitrary functions
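As a rough sketch of the operations listed above, the example below builds a small made-up DataFrame (the column names word and number are assumptions for illustration). Normalization here means min-max scaling, which is one common choice; the guide itself may use a different formula.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "word": ["apple", "kiwi", "plum"],
    "number": [10, 25, 40],
})

# Creating a new column with a vectorized operation (no explicit loop).
df["doubled"] = df["number"] * 2

# Example: min-max normalization, which rescales the column to the range 0 to 1.
col = df["number"]
df["normalized"] = (col - col.min()) / (col.max() - col.min())

# Vectorized string operation: the last letter of the word in each row.
df["last_letter"] = df["word"].str[-1]

# The last digit of the number in each row, using the modulo operator.
df["last_digit"] = df["number"] % 10

# Applying an arbitrary function to each value, written as an anonymous function.
df["word_length"] = df["word"].apply(lambda w: len(w))

# Deleting (dropping) a column; drop returns a new DataFrame.
df = df.drop(columns=["doubled"])

print(df)
```

Vectorized operations are usually preferable to apply when both are available, since apply calls the Python function once per row.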
Statistical functions
This guide will cover the statistical functions that can be used on both DataFrames and Series. Common metrics can be computed quickly with the following functions:
Max
Min
Median
Mean
Sum
Count
Mode
Quantile
Percent Change
Standard Deviation
Variance
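Each function in the list corresponds to a method that pandas provides on Series and DataFrames. The sketch below calls them on a small made-up Series and, for one case, on a made-up DataFrame to show the per-column behavior.

```python
import pandas as pd

# Made-up values for illustration.
s = pd.Series([2, 4, 4, 6, 8])

stats = {
    "max": s.max(),
    "min": s.min(),
    "median": s.median(),
    "mean": s.mean(),
    "sum": s.sum(),
    "count": s.count(),            # number of non-missing values
    "quantile_50": s.quantile(0.5),
    "std": s.std(),                # sample standard deviation (ddof=1 by default)
    "var": s.var(),                # sample variance (ddof=1 by default)
}
for name, value in stats.items():
    print(name, value)

# mode returns a Series, because several values can tie for most frequent.
modes = s.mode()

# pct_change gives the fractional change from the previous row; the first entry is NaN.
changes = s.pct_change()

# On a DataFrame, the same methods work column by column and return a Series.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
column_means = df.mean()
print(column_means)
```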
Groupby
This guide will cover a powerful function called groupby. Use groupby when you want to compute aggregate statistics (such as counts and sums) of one column of your data, grouped by the unique values in another column.
Here is what's covered in this guide:
Computing the total count of each unique value in a column
Example: number of rows for each Iris species
Computing the sum of a column, for each unique value in another column
Example: sum of Iris petal_widths for each Iris species
Aside: Plotting with groupby
Other groupby operations (mean, median, max, min, etc.)
If you're familiar with a database language like SQL, groupby in pandas works much like GROUP BY there. If not, don't worry; we'll explain how it works through many examples.
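A minimal sketch of the groupby pattern described above. The rows below are made up to resemble the Iris data and are not taken from the real dataset; the column names species and petal_width are assumptions.

```python
import pandas as pd

# Made-up rows in the style of the Iris dataset.
iris = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "virginica", "virginica"],
    "petal_width": [0.2, 0.4, 1.3, 2.0, 2.5],
})

# Count of rows for each unique value in the species column.
counts = iris.groupby("species").size()

# Sum of petal_width for each species.
width_sums = iris.groupby("species")["petal_width"].sum()

# Other aggregations follow the same pattern, for example the mean.
width_means = iris.groupby("species")["petal_width"].mean()

print(counts)
print(width_sums)
print(width_means)
```

In SQL terms, the second statement is roughly SELECT species, SUM(petal_width) FROM iris GROUP BY species.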
Join
This guide will cover a powerful function called join. Use join when you want to combine DataFrames, which may or may not share columns. This is especially useful for placing related data side by side, which you can then graph for comparison.
Here is what's covered in this guide:
Splitting DataFrames by column names
Example: only want the petal length and petal width of Iris species
Combining two DataFrames that do not share a column
Combining two DataFrames that have a common column
Example: Comparing the price of Apple and Tesla stock on the same date
Different ways to join DataFrames
Example: Wanting the price for both stocks for all the dates available even if the date is not included in one of the DataFrames
Example: Wanting the price for both stocks for only the dates that they have in common
Combining multiple DataFrames
Using statistical functions on the new DataFrame
Comparing values in a graph
If you're familiar with a database language like SQL, join in pandas works much like the different joins (inner, outer, left, right) there. If not, don't worry; we'll explain how it works through many examples.
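The stock example above can be sketched as follows. This uses DataFrame.merge, the pandas method for joining on a shared column; the guide itself may use a different call such as DataFrame.join. The dates and prices are invented for illustration and are not real market data.

```python
import pandas as pd

# Made-up prices; the two tables overlap on only two dates.
apple = pd.DataFrame({
    "date": ["2020-01-02", "2020-01-03", "2020-01-06"],
    "apple_price": [100.0, 101.0, 102.0],
})
tesla = pd.DataFrame({
    "date": ["2020-01-03", "2020-01-06", "2020-01-07"],
    "tesla_price": [200.0, 210.0, 220.0],
})

# Inner join: only the dates that both DataFrames have in common.
common_dates = apple.merge(tesla, on="date", how="inner")

# Outer join: every date from either DataFrame; missing prices become NaN.
all_dates = apple.merge(tesla, on="date", how="outer")

print(common_dates)
print(all_dates)
```

The how parameter also accepts "left" and "right", which keep every row of one DataFrame and only the matching rows of the other.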
Graphs
This guide will cover how to create graphs and the different kinds of graphs available. Graphs are a great way to present data to others. Practical uses include visualizing profits at different stores or breaking down your budget by the percentage that goes to each category. The kinds of graphs covered are:
Bar
Scatter
Histogram
Box
Line
Pie
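Each graph type above is available through the plot method in pandas, which draws with matplotlib (assumed here to be installed). The sketch below puts all six on one figure using made-up data; the budget categories and numbers are invented, and the Agg backend is chosen only so the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration.
budget = pd.DataFrame({
    "category": ["rent", "food", "fun"],
    "amount": [1200, 400, 150],
})
points = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 3, 6]})

# One figure with a 2x3 grid of axes, one per graph type.
fig, axes = plt.subplots(2, 3, figsize=(12, 7))

budget.plot(kind="bar", x="category", y="amount", ax=axes[0, 0], title="Bar")
points.plot(kind="scatter", x="x", y="y", ax=axes[0, 1], title="Scatter")
points["y"].plot(kind="hist", ax=axes[0, 2], title="Histogram")
points["y"].plot(kind="box", ax=axes[1, 0], title="Box")
points.plot(kind="line", x="x", y="y", ax=axes[1, 1], title="Line")
budget.set_index("category")["amount"].plot(kind="pie", ax=axes[1, 2], title="Pie")

fig.tight_layout()

# Render to an in-memory PNG; pass a file name to savefig to write to disk instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```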
Conclusion
After reading through these guides, you should have a greater understanding of DataFrames and Series and how to manipulate them with groupby and join using the pandas library. You will also know how to perform mathematical and statistical operations on these DataFrames. Finally, you will have the tools to create various graphs to display your data and the metrics you have calculated.
Additional Resources
Our guides cover many of the most useful pandas functions (based on our experiences as engineers). However, there are many more methods you might be interested in exploring. Please use the link below to access pandas' documentation, which we frequently referenced to ensure that our guides were accurate.