AIMD GPDS Courses

2.4  Concatenate, Join, Merge

Expected completion time
Video   8m 22s
Interactive readings   5m

Comparing concat, join, & merge

concat

    • Pandas function

    • Combines two or more pandas objects vertically or horizontally

    • Aligns only on the index

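    • Raises an error when it has to realign an index that contains duplicate labels (for example, when combining horizontally objects whose indexes differ)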

    • Defaults to outer join with option for inner
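The duplicate-index behavior of concat can be sketched as follows. The two small tables and their numbers are made up for illustration and are not the course data. Stacking vertically keeps duplicate labels untouched, while combining horizontally forces pandas to realign the rows, which fails when a duplicated index has to be reindexed.

```python
import pandas as pd

# Toy tables (hypothetical values); "TSLA" appears twice in the first index.
left = pd.DataFrame({"Shares": [80, 50, 40]}, index=["AAPL", "TSLA", "TSLA"])
right = pd.DataFrame({"Shares": [30, 10]}, index=["AAPL", "WMT"])

# Vertical concatenation just stacks the rows; duplicate labels are kept.
stacked = pd.concat([left, right])

# Horizontal concatenation has to align both tables on the union of their
# indexes. Realigning the duplicated index of `left` is ambiguous, so
# pandas raises instead of guessing.
try:
    pd.concat([left, right], axis="columns")
    raised = False
except Exception as err:  # the exact exception class varies by pandas version
    raised = True
    print(type(err).__name__, err)
```

The broad `except Exception` is deliberate here, because the exception class raised for this case has changed between pandas releases.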



join

    • DataFrame method

    • Combines two or more pandas objects horizontally

    • Aligns the calling DataFrame's column(s) or index with the other objects' index (and not the columns)

    • Handles duplicate values on the joining columns/index by performing a cartesian product

    • Defaults to left join with options for inner, outer, and right



merge

    • DataFrame method

    • Combines exactly two DataFrames horizontally

    • Aligns the calling DataFrame's column(s)/index with the other DataFrame's column(s)/index

    • Handles duplicate values on the joining columns/index by performing a cartesian product

    • Defaults to inner join with options for left, outer, and right
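To make the cartesian product and the different default join types concrete, here is a minimal sketch with made-up tables (the values are hypothetical, not the course data). "AAPL" appears twice in each table, so every left "AAPL" row is paired with every right "AAPL" row.

```python
import pandas as pd

# Toy tables (hypothetical values) with a repeated key on both sides.
left = pd.DataFrame({"Symbol": ["AAPL", "AAPL", "TSLA"], "Shares": [80, 20, 50]})
right = pd.DataFrame({"Symbol": ["AAPL", "AAPL", "WMT"], "Low": [95, 90, 55]})

# merge defaults to an inner join on the shared column name "Symbol":
# 2 x 2 = 4 AAPL rows; TSLA and WMT have no match and are dropped.
inner = left.merge(right)

# join aligns on the index of the other object and defaults to a left join:
# the same 4 AAPL rows, plus the unmatched TSLA row with a missing Low.
left_joined = left.set_index("Symbol").join(right.set_index("Symbol"))

print(inner)
print(left_joined)
```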

Read the 2016, 2017, and 2018 stock data into a list of DataFrames using a loop instead of three separate calls to the read_csv function.
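One way this loop might look is sketched below. The file names (stocks_2016.csv and so on), the column layout, and the numbers are all assumptions made for illustration; the sketch first writes three tiny CSV files to a temporary folder so that it can run on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for the course files: tiny CSVs with made-up numbers.
toy_csv = {
    2016: "Symbol,Shares,Low,High\nAAPL,80,95,110\nTSLA,50,80,130\nWMT,40,55,70\n",
    2017: "Symbol,Shares,Low,High\nAAPL,50,120,140\nGE,100,30,40\nTSLA,100,100,300\n",
    2018: "Symbol,Shares,Low,High\nAAPL,40,135,170\nAMZN,8,900,1125\n",
}
data_dir = Path(tempfile.mkdtemp())
for year, text in toy_csv.items():
    (data_dir / f"stocks_{year}.csv").write_text(text)

# The loop itself: one read_csv call, repeated once per year.
stock_tables = []
for year in [2016, 2017, 2018]:
    table = pd.read_csv(data_dir / f"stocks_{year}.csv", index_col="Symbol")
    stock_tables.append(table)

# Unpack the list when the individual tables are needed by name.
stock_2016, stock_2017, stock_2018 = stock_tables
```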

Combining Vertically

The concat function is the only one of the three that can combine DataFrames vertically. Let's do this by passing it the list stock_tables.
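A minimal sketch of the vertical combination, using made-up tables in place of the course data (the make_table helper and all values are hypothetical). The keys parameter is optional; it labels each block of rows so the year is not lost after stacking.

```python
import pandas as pd

def make_table(rows):
    """Build a toy stock table indexed by Symbol (made-up numbers)."""
    table = pd.DataFrame.from_dict(rows, orient="index", columns=["Shares", "Low", "High"])
    return table.rename_axis("Symbol")

stock_2016 = make_table({"AAPL": [80, 95, 110], "TSLA": [50, 80, 130], "WMT": [40, 55, 70]})
stock_2017 = make_table({"AAPL": [50, 120, 140], "GE": [100, 30, 40], "TSLA": [100, 100, 300]})
stock_2018 = make_table({"AAPL": [40, 135, 170], "AMZN": [8, 900, 1125]})
stock_tables = [stock_2016, stock_2017, stock_2018]

# Default axis is the rows, so the tables are stacked on top of each other.
stacked = pd.concat(stock_tables)

# keys adds an outer index level that records which table each row came from.
labeled = pd.concat(stock_tables, keys=[2016, 2017, 2018], names=["Year", "Symbol"])

print(labeled)
```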

Combining Horizontally

The concat function can also combine DataFrames horizontally when its axis parameter is set to 'columns' (or 1). The join and merge methods can replicate this horizontal functionality of concat. Here we use the join method to combine the stock_2016 and stock_2017 DataFrames.

By default, the DataFrames align on their index. If any of the columns have the same names, then you must supply a value to the lsuffix or rsuffix parameters to distinguish them in the result.
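The following sketch compares a horizontal concat with the equivalent join on two made-up tables (the make_table helper and all values are hypothetical). Because join defaults to a left join, how="outer" is passed so that symbols found only in 2017 are kept, as they are with concat.

```python
import pandas as pd

def make_table(rows):
    """Build a toy stock table indexed by Symbol (made-up numbers)."""
    table = pd.DataFrame.from_dict(rows, orient="index", columns=["Shares", "Low", "High"])
    return table.rename_axis("Symbol")

stock_2016 = make_table({"AAPL": [80, 95, 110], "TSLA": [50, 80, 130], "WMT": [40, 55, 70]})
stock_2017 = make_table({"AAPL": [50, 120, 140], "GE": [100, 30, 40], "TSLA": [100, 100, 300]})

# concat along the columns; keys label each table's block of columns.
wide = pd.concat([stock_2016, stock_2017], axis="columns", keys=[2016, 2017])

# join aligns on the index; the suffixes tell the identically named columns apart.
joined = stock_2016.join(stock_2017, lsuffix="_2016", rsuffix="_2017", how="outer")

# Without a suffix, the overlapping column names are rejected.
try:
    stock_2016.join(stock_2017)
    overlap_error = None
except ValueError as err:
    overlap_error = err

print(joined)
```

Both results hold the same values, although the row order and the way the columns are labeled can differ between the two.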

To exactly replicate the output of the concat function, we can pass a list of DataFrames to the join method.
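Passing a list to join might look like the sketch below, again on made-up tables (the make_table helper and all values are hypothetical). The pandas documentation notes that lsuffix and rsuffix are not supported when a list is passed, so this sketch renames the columns with add_suffix beforehand; that renaming step is an assumption about how to keep the column names distinct.

```python
import pandas as pd

def make_table(rows):
    """Build a toy stock table indexed by Symbol (made-up numbers)."""
    table = pd.DataFrame.from_dict(rows, orient="index", columns=["Shares", "Low", "High"])
    return table.rename_axis("Symbol")

stock_2016 = make_table({"AAPL": [80, 95, 110], "TSLA": [50, 80, 130], "WMT": [40, 55, 70]})
stock_2017 = make_table({"AAPL": [50, 120, 140], "GE": [100, 30, 40], "TSLA": [100, 100, 300]})
stock_2018 = make_table({"AAPL": [40, 135, 170], "AMZN": [8, 900, 1125]})

# Make every column name unique before joining, since suffixes cannot be
# supplied when `other` is a list.
others = [stock_2017.add_suffix("_2017"), stock_2018.add_suffix("_2018")]

# An outer join keeps every symbol from all three years.
joined = stock_2016.add_suffix("_2016").join(others, how="outer")

print(joined)
```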

Combining Using Index

Now let's turn to merge, which, unlike concat and join, combines exactly two DataFrames. By default, merge attempts to align the values in the columns that have the same name in both DataFrames. However, you can have it align on the index instead by setting the boolean parameters left_index and right_index to True. Let's merge the 2016 and 2017 data together.
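A minimal sketch of an index-aligned merge, using made-up tables (the make_table helper and all values are hypothetical):

```python
import pandas as pd

def make_table(rows):
    """Build a toy stock table indexed by Symbol (made-up numbers)."""
    table = pd.DataFrame.from_dict(rows, orient="index", columns=["Shares", "Low", "High"])
    return table.rename_axis("Symbol")

stock_2016 = make_table({"AAPL": [80, 95, 110], "TSLA": [50, 80, 130], "WMT": [40, 55, 70]})
stock_2017 = make_table({"AAPL": [50, 120, 140], "GE": [100, 30, 40], "TSLA": [100, 100, 300]})

# Align on the index of both tables instead of on the shared column names.
merged = stock_2016.merge(stock_2017, left_index=True, right_index=True)

# The default inner join keeps only the symbols present in both years, and
# the default suffixes "_x" and "_y" mark the left and right columns.
print(merged)
```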

By default, merge uses an inner join and automatically supplies suffixes for identically named columns. Let's change to an outer join and then perform another outer join with the 2018 data to replicate the output of concat exactly.
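The two chained outer joins could be written as in the sketch below, on made-up tables (the make_table helper and all values are hypothetical). The year-based suffixes are an assumption chosen to make the final column names match a horizontal concat of renamed tables.

```python
import pandas as pd

def make_table(rows):
    """Build a toy stock table indexed by Symbol (made-up numbers)."""
    table = pd.DataFrame.from_dict(rows, orient="index", columns=["Shares", "Low", "High"])
    return table.rename_axis("Symbol")

stock_2016 = make_table({"AAPL": [80, 95, 110], "TSLA": [50, 80, 130], "WMT": [40, 55, 70]})
stock_2017 = make_table({"AAPL": [50, 120, 140], "GE": [100, 30, 40], "TSLA": [100, 100, 300]})
stock_2018 = make_table({"AAPL": [40, 135, 170], "AMZN": [8, 900, 1125]})

# First outer join: 2016 with 2017, with explicit suffixes for the overlap.
first = stock_2016.merge(
    stock_2017, left_index=True, right_index=True,
    how="outer", suffixes=("_2016", "_2017"),
)

# Second outer join: bring in 2018. Its columns are renamed first because
# they no longer clash with anything, so merge would not add a suffix.
merged = first.merge(
    stock_2018.add_suffix("_2018"), left_index=True, right_index=True, how="outer"
)

# The horizontal concat that the two merges are meant to replicate.
side_by_side = pd.concat(
    [stock_2016.add_suffix("_2016"), stock_2017.add_suffix("_2017"),
     stock_2018.add_suffix("_2018")],
    axis="columns",
)

# Row order may differ between the two results, so sort before comparing.
same = merged.sort_index().equals(side_by_side.sort_index())
print(same)
```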

©2023. All rights reserved.  Samy Baladram,
Graduate Program in Data Science - GSIS - Tohoku University