Pandas Guides

Welcome to our collection of pandas guides! If you have no experience with pandas, we recommend starting with the introduction guide to get a sense of how pandas DataFrame objects work. After that, use the outlines on this page to find the guide that covers the function you're interested in.

There is also a feedback section at the end of the page; we would greatly appreciate any criticism or suggestions for improvement. Thanks for reading!

Introduction to DataFrames

This is a guide for getting started with pandas, and more specifically, the DataFrame object that represents tables in pandas. Here is a list of topics this guide covers:

  • Importing data as a DataFrame

  • Viewing DataFrames

  • Indexing (accessing specific rows, like rows 1-15, in the DataFrame)

  • Accessing one column

    • Concept: each DataFrame column is a pandas Series object

  • Accessing two or more columns

  • Selecting rows based on column values (boolean indexing, e.g., all rows where column_1 > 5)

  • Renaming columns

  • Exporting data
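The operations in the introduction guide can be sketched in a few lines of pandas. This is a minimal example, assuming a small in-memory CSV instead of a real data file; the column names `column_1`, `column_2`, and `label` are hypothetical placeholders.

```python
import io

import pandas as pd

# Importing data: read_csv accepts a file path or any file-like object.
# An in-memory string stands in for a real CSV file here.
raw = io.StringIO(
    "column_1,column_2,label\n"
    "3,10,a\n"
    "7,20,b\n"
    "9,30,c\n"
    "1,40,d\n"
)
df = pd.read_csv(raw)

# Viewing: head(n) returns the first n rows (5 if n is omitted).
print(df.head(2))

# Indexing rows by position: iloc slices are zero-based and exclude the end,
# so iloc[0:2] selects the first two rows.
first_two = df.iloc[0:2]

# Accessing one column returns a pandas Series.
col = df["column_1"]
print(type(col).__name__)  # Series

# Accessing two or more columns: a list of names returns a DataFrame.
subset = df[["column_1", "label"]]

# Selecting rows based on column values (boolean indexing).
big = df[df["column_1"] > 5]

# Renaming columns: rename returns a new DataFrame and leaves df unchanged.
renamed = df.rename(columns={"column_1": "x", "column_2": "y"})

# Exporting: with no path, to_csv returns the CSV text as a string;
# pass a file name (e.g. "output.csv") to write a file instead.
csv_text = renamed.to_csv(index=False)
print(csv_text)
```

Because `iloc` counts rows from zero, rows 1-15 in the outline's example would be `df.iloc[0:15]`.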

Feature Transformations & Vectorized Operations

This guide will cover the basics of creating new columns by transforming existing columns (in machine learning parlance, this is called "feature transformation"). Here is a list of what we'll cover:

  • Creating new columns

  • Deleting (dropping) columns

  • Vectorized operations: doing math on columns (adding, subtracting, multiplying, etc.)

    • Example: normalizing a column

  • Vectorized string operations

    • Example: getting the last letter of a word in each row

    • Example: getting the last digit of a number in each row

  • Other vectorized operations (link to a list of many additional functions, including rounding)

  • Advanced feature transformation (applying any function you wrote to a column of data)

    • Anonymous functions

    • Column transformations with arbitrary functions
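Here is a minimal sketch of the transformations outlined above, assuming a small hypothetical DataFrame (`word`, `number`, and `value` are placeholder column names, and `size_label` is a hypothetical helper). The normalization shown is the z-score, one common choice among several.

```python
import pandas as pd

# Hypothetical example data; the column names are placeholders.
df = pd.DataFrame({
    "word": ["apple", "kiwi", "grape"],
    "number": [123, 45, 6],
    "value": [2.0, 4.0, 6.0],
})

# Creating a new column from an existing one (vectorized arithmetic).
df["value_times_2"] = df["value"] * 2

# Example: normalizing a column as a z-score (subtract the mean, divide by
# the sample standard deviation). No Python loop is needed.
df["value_norm"] = (df["value"] - df["value"].mean()) / df["value"].std()

# Vectorized string operations live under the .str accessor.
# Example: the last letter of each word.
df["last_letter"] = df["word"].str[-1]

# Example: the last digit of each number, two ways.
df["last_digit"] = df["number"] % 10                     # integer arithmetic
df["last_digit_str"] = df["number"].astype(str).str[-1]  # string slicing

# Other vectorized operations, e.g. rounding to two decimal places.
df["value_norm_rounded"] = df["value_norm"].round(2)

# Anonymous function (lambda) applied to every element of a column.
df["word_len_squared"] = df["word"].apply(lambda w: len(w) ** 2)


# A hypothetical helper you might write yourself, applied the same way.
def size_label(n):
    return "big" if n >= 100 else "small"


df["size"] = df["number"].apply(size_label)

# Deleting (dropping) a column; drop returns a new DataFrame.
df = df.drop(columns=["value_times_2"])
print(df)
```

Vectorized operations like `*`, `%`, and `.str[-1]` are generally much faster than `apply`, so `apply` is best kept for logic that has no vectorized equivalent.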

Statistical Functions

This guide will cover the statistical functions that can be used on both DataFrames and Series. Common metrics can be computed quickly with the following functions:

  • Max

  • Min

  • Median

  • Mean

  • Sum

  • Count

  • Mode

  • Quantile

  • Percent Change

  • Standard Deviation

  • Variance
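Each function listed above is a method on both Series and DataFrame. A minimal sketch, using a made-up Series of prices rather than real data:

```python
import pandas as pd

# Hypothetical prices; the numbers are made up for illustration.
s = pd.Series([10.0, 12.0, 12.0, 15.0, 11.0])

print(s.max())           # largest value
print(s.min())           # smallest value
print(s.median())        # middle value of the sorted data
print(s.mean())          # arithmetic mean
print(s.sum())           # total
print(s.count())         # number of non-missing values
print(s.mode())          # most frequent value(s), returned as a Series
print(s.quantile(0.25))  # 25th percentile (linear interpolation by default)
print(s.pct_change())    # fractional change from the previous element
print(s.std())           # sample standard deviation (ddof=1)
print(s.var())           # sample variance (ddof=1)

# The same methods work on a DataFrame, returning one result per column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 6, 8]})
print(df.mean())
```

`std()` and `var()` default to the sample versions (`ddof=1`); pass `ddof=0` for the population versions.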

Groupby

This guide will cover a powerful method called groupby. Use groupby when you want to compute aggregate statistics (such as counts and sums) of one column of your data, grouped by the unique values in another column.

Here is what's covered in this guide:

  • Computing the total count of each unique value in a column

    • Example: number of samples of each Iris species

  • Computing the sum of a column, for each unique value in another column

    • Example: sum of Iris petal_width values for each species

    • Aside: Plotting with groupby

  • Other groupby operations (mean, median, max, min, etc.)

If you're familiar with a database language like SQL, groupby in pandas works much like GROUP BY there. If not, don't worry; we'll explain how it works through many examples.
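A minimal groupby sketch, assuming a tiny hand-made table with `species` and `petal_width` columns in place of the full Iris dataset (the measurements are illustrative, not real). In SQL terms, the sum step corresponds roughly to `SELECT species, SUM(petal_width) FROM iris GROUP BY species`.

```python
import pandas as pd

# A tiny hand-made stand-in for the Iris data; the measurements are
# illustrative, not taken from the real dataset.
iris = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor",
                "virginica", "virginica", "virginica"],
    "petal_width": [0.2, 0.4, 1.3, 2.0, 1.8, 2.5],
})

# Count of rows for each unique species.
counts = iris.groupby("species").size()
print(counts)
# value_counts() gives the same counts, sorted by frequency.
print(iris["species"].value_counts())

# Sum of petal_width for each species.
sums = iris.groupby("species")["petal_width"].sum()
print(sums)

# Other aggregations follow the same pattern; agg computes several at once.
summary = iris.groupby("species")["petal_width"].agg(["mean", "median", "max", "min"])
print(summary)

# Aside: a grouped result is an ordinary Series, so it can be plotted
# directly, e.g. sums.plot(kind="bar") (requires matplotlib).
```

The result of each aggregation is indexed by the unique values of the grouping column, which is what makes it easy to plot or join with other data afterwards.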

Join

This guide will cover a powerful method called join. Use join when you want to combine DataFrames, whether or not they share columns. This is useful for comparing data side by side, which you can then graph to display related values together.

Here is what's covered in this guide:

  • Splitting DataFrames by column names

    • Example: keeping only the petal length and petal width columns of the Iris data

  • Combining two DataFrames that do not share a column

  • Combining two DataFrames that have a common column

    • Example: comparing the price of Apple and Tesla stock on the same date

  • Different ways to join DataFrames

    • Example: getting the prices of both stocks for every date in either DataFrame, even if a date is missing from one of them (outer join)

    • Example: getting the prices of both stocks only for the dates the two DataFrames have in common (inner join)

  • Combining multiple DataFrames

  • Using statistical functions on the new DataFrame

  • Comparing values in a graph

If you're familiar with a database language like SQL, joins in pandas work much like SQL's inner, outer, left, and right joins. If not, don't worry; we'll explain how they work through many examples.
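The joins outlined above can be sketched with `pd.merge`, which combines DataFrames on a shared column, `DataFrame.join`, which matches rows on the index, and `pd.concat`, which places DataFrames side by side. The stock prices and petal measurements below are made-up placeholders, not real data.

```python
import pandas as pd

# Hypothetical closing prices; dates and numbers are placeholders,
# not real market data.
apple = pd.DataFrame({
    "date": ["2024-01-02", "2024-01-03", "2024-01-04"],
    "price": [100.0, 101.0, 102.0],
})
tesla = pd.DataFrame({
    "date": ["2024-01-03", "2024-01-04", "2024-01-05"],
    "price": [200.0, 198.0, 205.0],
})

# Common column: merge on "date"; suffixes rename the two "price" columns.
# Inner join: only the dates present in both DataFrames.
inner = pd.merge(apple, tesla, on="date", how="inner",
                 suffixes=("_apple", "_tesla"))

# Outer join: every date from either DataFrame; missing prices become NaN.
outer = pd.merge(apple, tesla, on="date", how="outer",
                 suffixes=("_apple", "_tesla"))

# DataFrame.join matches rows on the index instead of a column.
joined = apple.set_index("date").join(
    tesla.set_index("date"), how="inner", lsuffix="_apple", rsuffix="_tesla"
)

# No shared column: concat with axis=1 places DataFrames side by side,
# aligning rows by index (illustrative values).
lengths = pd.DataFrame({"petal_length": [1.4, 4.7]})
widths = pd.DataFrame({"petal_width": [0.2, 1.4]})
petals = pd.concat([lengths, widths], axis=1)

# Statistical functions work on the combined result as usual.
print(inner[["price_apple", "price_tesla"]].mean())
print(outer)
```

`how` also accepts `"left"` and `"right"`, which keep every row from the left or right DataFrame respectively.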

Graphs

This guide will cover how to create graphs and the different kinds of graphs available. Graphs are a great way to present data to others; practical uses include visualizing profits at different stores or breaking down a budget by the percentage spent in each category. Some kinds of graphs you can create are:

  • Bar

  • Scatter

  • Histogram

  • Box

  • Line

  • Pie
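A minimal sketch of each chart type listed above, using `DataFrame.plot` and `Series.plot` with the `kind=` argument. It assumes matplotlib is installed (pandas uses it as its default plotting backend) and uses made-up store figures; the off-screen `Agg` backend and the in-memory PNG buffer are only there so the script runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical store data; the numbers are made up for illustration.
df = pd.DataFrame({
    "store": ["A", "B", "C", "D"],
    "profit": [120, 95, 150, 80],
    "customers": [30, 25, 41, 20],
})

# One figure with a 2x3 grid of axes; each plot is drawn on its own axes.
fig, axes = plt.subplots(2, 3, figsize=(12, 7))

df.plot(kind="bar", x="store", y="profit", ax=axes[0, 0], title="Bar")
df.plot(kind="scatter", x="customers", y="profit", ax=axes[0, 1], title="Scatter")
df["profit"].plot(kind="hist", bins=4, ax=axes[0, 2], title="Histogram")
df[["profit", "customers"]].plot(kind="box", ax=axes[1, 0], title="Box")
df["profit"].plot(kind="line", ax=axes[1, 1], title="Line")
# A pie chart plots one Series; its index supplies the slice labels.
df.set_index("store")["profit"].plot(
    kind="pie", ax=axes[1, 2], autopct="%1.0f%%", title="Pie"
)

fig.tight_layout()

# Save to an in-memory buffer; pass a file name such as "charts.png"
# to write to disk, or call plt.show() in an interactive session.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Passing `ax=` explicitly keeps each chart on its own axes; without it, consecutive `Series.plot` calls may draw onto the same axes.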

Conclusion

After reading through these guides, you should have a greater understanding of DataFrames and Series and of how to manipulate them with groupby and join using the pandas library. You will also have the knowledge necessary to perform mathematical and statistical operations on DataFrames. Finally, you will have the tools to create various graphs to display your data and the metrics you have calculated.

Additional Resources

Our guides cover many of the most useful pandas functions (based on our experiences as engineers). However, there are many more methods you might be interested in exploring. Use the link below to access the official pandas documentation, which we referenced frequently to make sure our guides were accurate.

Pandas Documentation


Feedback

If you'd like to give feedback on any of our guides, please do so with the Google Form embedded below.