2.4 Concatenate, Join, Merge
⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺
concat (pandas function)
Combines two or more pandas objects vertically or horizontally
Aligns only on the index (or on the column labels when stacking vertically)
Raises an error when it has to align indexes that contain duplicate values
Defaults to an outer join with an option for inner
join (DataFrame method)
Combines two or more pandas objects horizontally
Aligns the calling DataFrame's column(s) or index with the other objects' index (and not the columns)
Handles duplicate values on the joining columns/index by performing a cartesian product
Defaults to left join with options for inner, outer, and right
merge (DataFrame method)
Combines exactly two DataFrames horizontally
Aligns the calling DataFrame's column(s)/index with the other DataFrame's column(s)/index
Handles duplicate values on the joining columns/index by performing a cartesian product
Defaults to inner join with options for left, outer, and right
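The default join types and the duplicate-key behavior listed above can be checked with a minimal sketch. It uses tiny made-up DataFrames, and the labels a, b, c, k and the column names x and y are arbitrary choices for illustration.

```python
import pandas as pd

# Two made-up tables; only the label "b" appears in both indexes
left = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"y": [3, 4]}, index=["b", "c"])

outer = pd.concat([left, right], axis="columns")  # concat default: outer, rows a, b, c
left_join = left.join(right)  # join default: left, rows a, b
inner = left.merge(right, left_index=True, right_index=True)  # merge default: inner, row b

# Duplicate join keys: every matching pair of rows is combined (a cartesian product)
dup_left = pd.DataFrame({"key": ["k", "k"], "x": [1, 2]})
dup_right = pd.DataFrame({"key": ["k", "k"], "y": [3, 4]})
product = dup_left.merge(dup_right, on="key")  # 2 x 2 = 4 rows

print(outer, left_join, inner, product, sep="\n\n")
```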
Let's read the 2016, 2017, and 2018 stock data into a list of DataFrames, using a loop instead of three separate calls to the read_csv function.
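Here is a minimal sketch of the loop. The file names (stocks_2016.csv and so on), the Symbol index column, and every value are hypothetical. The sketch writes small sample files to a temporary directory first so that it runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for the yearly stock files; names and values are made up
sample_csvs = {
    2016: "Symbol,Shares,Low,High\nAAPL,80,95,110\nTSLA,50,80,130\nWMT,40,55,70\n",
    2017: "Symbol,Shares,Low,High\nAAPL,50,120,140\nGE,100,30,40\nTSLA,100,100,300\n",
    2018: "Symbol,Shares,Low,High\nAAPL,40,135,170\nAMZN,8,900,1125\nTSLA,50,220,400\n",
}
data_dir = Path(tempfile.mkdtemp())
for year, text in sample_csvs.items():
    (data_dir / f"stocks_{year}.csv").write_text(text)

# One read_csv call inside a loop replaces three separate calls
years = [2016, 2017, 2018]
stock_tables = []
for year in years:
    path = data_dir / f"stocks_{year}.csv"
    stock_tables.append(pd.read_csv(path, index_col="Symbol"))

# Unpack the list so each year can also be referred to by name
stock_2016, stock_2017, stock_2018 = stock_tables
print(stock_2017)
```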
Of the three, the concat function is the only one that can combine DataFrames vertically. Let's do this by passing it the list stock_tables.
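The sketch below shows both directions of concat on small made-up tables. The helper make_table is hypothetical and exists only to build the sample data. The keys argument is an addition not mentioned in the lesson: it labels each piece with its year.

```python
import pandas as pd


def make_table(symbols, shares, low):
    """Hypothetical helper: build a small stock table indexed by ticker symbol."""
    return pd.DataFrame(
        {"Shares": shares, "Low": low},
        index=pd.Index(symbols, name="Symbol"),
    )


# Made-up stand-ins for the three yearly tables
stock_2016 = make_table(["AAPL", "TSLA", "WMT"], [80, 50, 40], [95, 80, 55])
stock_2017 = make_table(["AAPL", "GE", "TSLA"], [50, 100, 100], [120, 30, 100])
stock_2018 = make_table(["AAPL", "AMZN", "TSLA"], [40, 8, 50], [135, 900, 220])
stock_tables = [stock_2016, stock_2017, stock_2018]

# Vertical (the default axis): rows are stacked and columns are aligned by label.
# keys= adds an outer index level recording which year each row came from.
stacked = pd.concat(stock_tables, keys=[2016, 2017, 2018])
print(stacked)

# Horizontal (axis="columns"): rows are aligned on the index with an outer join
side_by_side = pd.concat(stock_tables, axis="columns", keys=[2016, 2017, 2018])
print(side_by_side)
```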
Passing axis='columns' makes concat combine DataFrames horizontally instead, and the join and merge methods can replicate this horizontal functionality. Here we use the join method to combine the stock_2016 and stock_2017 DataFrames.
By default, the DataFrames align on their index. If any of the columns have the same names, then you must supply a value to the lsuffix or rsuffix parameters to distinguish them in the result.
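A minimal sketch of this behavior follows, assuming two made-up yearly tables that share the column names Shares and Low. Without a suffix, join refuses the overlapping names. With suffixes, it aligns on the index and keeps every row of the calling DataFrame, which is the default left join.

```python
import pandas as pd

# Made-up stand-ins for two yearly tables with the same column names
stock_2016 = pd.DataFrame(
    {"Shares": [80, 50, 40], "Low": [95, 80, 55]},
    index=pd.Index(["AAPL", "TSLA", "WMT"], name="Symbol"),
)
stock_2017 = pd.DataFrame(
    {"Shares": [50, 100, 100], "Low": [120, 30, 100]},
    index=pd.Index(["AAPL", "GE", "TSLA"], name="Symbol"),
)

# Overlapping column names with no suffix make join raise a ValueError
try:
    stock_2016.join(stock_2017)
    raised = False
except ValueError as err:
    print("ValueError:", err)
    raised = True

# With suffixes, the default how="left" keeps stock_2016's rows:
# GE is dropped and WMT gets missing values for the 2017 columns
joined = stock_2016.join(stock_2017, lsuffix="_2016", rsuffix="_2017")
print(joined)
```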
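To replicate the output of the concat function, we can pass a list of DataFrames to the join method and set how='outer'. Because the lsuffix and rsuffix parameters are not supported when joining multiple DataFrames, any overlapping column names must be made unique beforehand, for example with the add_suffix method.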
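Here is a sketch of the list form, assuming made-up tables with a single Shares column. Each table is renamed with add_suffix before the join, and how="outer" matches concat's default. The concat version flattens its (year, column) labels so the two results can be compared directly. Both are sorted by index before comparing so that row order does not matter.

```python
import pandas as pd

# Made-up stand-ins for the three yearly tables
years = [2016, 2017, 2018]
symbols = [["AAPL", "TSLA", "WMT"], ["AAPL", "GE", "TSLA"], ["AAPL", "AMZN", "TSLA"]]
shares = [[80, 50, 40], [50, 100, 100], [40, 8, 50]]
stock_2016, stock_2017, stock_2018 = [
    pd.DataFrame({"Shares": s}, index=pd.Index(sym, name="Symbol"))
    for sym, s in zip(symbols, shares)
]

# Suffixes cannot be passed with a list, so make every column name unique first
others = [stock_2017.add_suffix("_2017"), stock_2018.add_suffix("_2018")]
stock_join = stock_2016.add_suffix("_2016").join(others, how="outer")

# The same data via concat: keys build (year, column) labels, flattened to "Shares_2016" etc.
stock_concat = pd.concat([stock_2016, stock_2017, stock_2018], axis="columns", keys=years)
stock_concat.columns = [f"{col}_{year}" for year, col in stock_concat.columns]

print(stock_join.sort_index().equals(stock_concat.sort_index()))
```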
Now let's turn to merge, which, unlike concat and join, combines exactly two DataFrames. By default, merge aligns on the values in the columns that have the same name in both DataFrames. You can instead have it align on the index by setting the boolean parameters left_index and right_index to True. Let's merge the 2016 and 2017 data.
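Here is a minimal sketch of an index-on-index merge, assuming the same kind of made-up yearly tables. With no how argument, merge performs an inner join and keeps only the symbols present in both tables. The overlapping column names receive merge's default suffixes, _x and _y.

```python
import pandas as pd

# Made-up stand-ins for two yearly tables
stock_2016 = pd.DataFrame(
    {"Shares": [80, 50, 40], "Low": [95, 80, 55]},
    index=pd.Index(["AAPL", "TSLA", "WMT"], name="Symbol"),
)
stock_2017 = pd.DataFrame(
    {"Shares": [50, 100, 100], "Low": [120, 30, 100]},
    index=pd.Index(["AAPL", "GE", "TSLA"], name="Symbol"),
)

# Align on both indexes instead of on shared column names
merged = stock_2016.merge(stock_2017, left_index=True, right_index=True)
print(merged)
```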
By default, merge performs an inner join and automatically appends the suffixes _x and _y to identically named columns. Let's change to an outer join and then perform another outer join with the 2018 data to replicate the output of concat.
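The two outer merges can be sketched as follows, again with made-up single-column tables. Passing suffixes in place of the default _x and _y, and calling add_suffix on the 2018 table, are our choices so that the column names match a flattened concat result. Both results are sorted by index before comparing so that row order does not matter.

```python
import pandas as pd

# Made-up stand-ins for the three yearly tables
years = [2016, 2017, 2018]
symbols = [["AAPL", "TSLA", "WMT"], ["AAPL", "GE", "TSLA"], ["AAPL", "AMZN", "TSLA"]]
shares = [[80, 50, 40], [50, 100, 100], [40, 8, 50]]
stock_2016, stock_2017, stock_2018 = [
    pd.DataFrame({"Shares": s}, index=pd.Index(sym, name="Symbol"))
    for sym, s in zip(symbols, shares)
]

# Step 1: outer merge of 2016 and 2017 on the index, with explicit suffixes
step1 = stock_2016.merge(
    stock_2017, left_index=True, right_index=True,
    how="outer", suffixes=("_2016", "_2017"),
)
# Step 2: no name clash remains, so suffix the 2018 columns by hand and merge again
stock_merge = step1.merge(
    stock_2018.add_suffix("_2018"), left_index=True, right_index=True, how="outer"
)

# Compare with horizontal concat (labels flattened to "Shares_2016" etc.)
stock_concat = pd.concat([stock_2016, stock_2017, stock_2018], axis="columns", keys=years)
stock_concat.columns = [f"{col}_{year}" for year, col in stock_concat.columns]
print(stock_merge.sort_index().equals(stock_concat.sort_index()))
```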