Grouping & Aggregation
Being able to extract specific pieces of information and recombine them allows for much deeper analysis. Below you will find examples where I have utilised these skills in my analysis.
Summary of Project:
In this project, I worked with data relating to the administrative wards of Greater London. The original data set contained a tremendous amount of information, which made it well suited to grouping and aggregation for drawing insights on specific topics.
A snapshot of the original data frame
As with any data set, I first cleaned and reformatted the data for further analysis. With this data set, the first thing I wanted to do was to split the 'Ward Name' series into separate 'Borough' and 'Ward' series.
This could be done by writing a function to extract each part, then creating the new series using the .apply() function along with those functions.
Finally, I wanted to move the newly created series to the left side of the data frame and remove the combined 'Ward Name' series.
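A minimal sketch of the split step, assuming each 'Ward Name' value has the form "Borough - Ward" with " - " as the separator. The sample rows are made up, and the helper names get_borough and get_ward are hypothetical, not taken from the original project.

```python
import pandas as pd

# Made-up sample; assumes 'Ward Name' holds "Borough - Ward" strings
df = pd.DataFrame({
    "Ward Name": ["Camden - Belsize", "Camden - Bloomsbury", "Hackney - Clissold"],
    "Population": [100, 150, 200],
})

def get_borough(ward_name):
    """Hypothetical helper: return the text before the first ' - '."""
    return ward_name.split(" - ", 1)[0]

def get_ward(ward_name):
    """Hypothetical helper: return the text after the first ' - ' (or the whole name if absent)."""
    parts = ward_name.split(" - ", 1)
    return parts[1] if len(parts) > 1 else parts[0]

# .apply() calls each function once per value in the series
df["Borough"] = df["Ward Name"].apply(get_borough)
df["Ward"] = df["Ward Name"].apply(get_ward)
print(df[["Borough", "Ward"]])
```

The same split can also be done without custom functions via `df["Ward Name"].str.split(" - ", n=1, expand=True)`, which returns both parts as columns at once.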
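One way to reorder and tidy the columns is sketched below, using a made-up frame that already contains the split series. It builds the new column order explicitly and then drops the combined column with .drop(); this is an illustration, not necessarily how the original project did it.

```python
import pandas as pd

# Made-up frame as it might look after the split step
df = pd.DataFrame({
    "Ward Name": ["Camden - Belsize", "Hackney - Clissold"],
    "Population": [100, 200],
    "Borough": ["Camden", "Hackney"],
    "Ward": ["Belsize", "Clissold"],
})

# New order: the split series first, then every other column in its original order
front = ["Borough", "Ward"]
rest = [col for col in df.columns if col not in front]
df = df[front + rest]

# Remove the combined column now that its parts have their own series
df = df.drop(columns=["Ward Name"])
print(df.columns.tolist())  # ['Borough', 'Ward', 'Population']
```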
Reformatted data frame - notice the Borough and Ward series are first
After making a copy of the reformatted data frame, the first metric I wanted to find was the total population by Borough.
This required grouping the rows that share the same 'Borough' value. To do this I used the .groupby() function on the 'Borough' series and summed the population within each group.
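A small sketch of grouping and summing, assuming the population is held in a column called 'Population' (the real column name in the ward data set may differ). The rows are made up for illustration.

```python
import pandas as pd

# Made-up ward-level rows; 'Population' is an assumed column name
wards = pd.DataFrame({
    "Borough": ["Camden", "Camden", "Hackney"],
    "Ward": ["Belsize", "Bloomsbury", "Clissold"],
    "Population": [100, 150, 200],
})

# Rows sharing a Borough value form one group; sum() adds up their populations
pop_by_borough = wards.groupby("Borough")["Population"].sum()
print(pop_by_borough)
```

With this sample, Camden's two wards combine to 250 and Hackney's single ward gives 200.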
Next, I wanted a list of Boroughs with the most Wards in them.
Again, I used the .groupby() function, but this time I also needed to sort the results in descending order using the .sort_values() function.
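A minimal sketch of counting wards per borough and sorting from most to fewest. The borough and ward values here are placeholders, not figures from the real data set.

```python
import pandas as pd

# Placeholder rows: one row per ward
wards = pd.DataFrame({
    "Borough": ["Camden", "Camden", "Camden", "Hackney", "Hackney", "Islington"],
    "Ward": ["A", "B", "C", "D", "E", "F"],
})

# Count the wards in each borough, then order from most to fewest
ward_counts = wards.groupby("Borough")["Ward"].count().sort_values(ascending=False)
print(ward_counts)
```

For a plain count like this, `wards["Borough"].value_counts()` gives the same numbers and already sorts them in descending order by default.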
Next I wanted to have a look at the car usage by Borough.
For this, I again used the .groupby() function, along with the .agg() function to apply several aggregations to the data at once.
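The sketch below shows .agg() computing several statistics per group in one call. The column name 'Cars per household' and its values are hypothetical stand-ins for whatever car-usage column the data set actually contains, and the choice of min, max and mean is only an example.

```python
import pandas as pd

# Hypothetical car-usage column with made-up values
wards = pd.DataFrame({
    "Borough": ["Camden", "Camden", "Hackney", "Hackney"],
    "Cars per household": [0.41, 0.52, 0.38, 0.47],
})

# .agg() applies each listed function to every group, one output column per function
car_stats = wards.groupby("Borough")["Cars per household"].agg(["min", "max", "mean"])
print(car_stats)
```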
Not happy with the lengthy decimals for the mean, I decided to round that series.
I saved this data frame as 'car_stats' to be used later.
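Rounding a single column can be sketched as follows, assuming a hypothetical 'Cars' count column whose group mean produces a long decimal. Only the 'mean' column is rounded, to two decimal places, and the result is kept under the name car_stats.

```python
import pandas as pd

# Hypothetical 'Cars' column with made-up values
wards = pd.DataFrame({
    "Borough": ["Camden", "Camden", "Camden", "Hackney", "Hackney"],
    "Cars": [1200, 1300, 1350, 900, 1000],
})

car_stats = wards.groupby("Borough")["Cars"].agg(["sum", "mean"])

# Camden's mean is 1283.333...; round just that column to two decimal places
car_stats["mean"] = car_stats["mean"].round(2)
print(car_stats)
```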
Finally, I aimed to create a data frame combining all the transport data from the original data set. Having already made a data frame with the car data, I just needed to make another data frame with the remaining transport information and then combine the two.
First, I created the new data frame from the remaining transport information available.
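A possible sketch of building the second frame, assuming the remaining transport columns are numeric ward-level values that are summarised per borough with a mean. The column names 'PTAL score' and 'Cycle to work (%)', and the choice of mean, are hypothetical.

```python
import pandas as pd

# Made-up ward rows; the non-car column names are hypothetical
wards = pd.DataFrame({
    "Borough": ["Camden", "Camden", "Hackney", "Hackney"],
    "Cars": [1200, 1300, 900, 1000],
    "PTAL score": [6.0, 5.0, 4.0, 3.0],
    "Cycle to work (%)": [8.0, 10.0, 12.0, 14.0],
})

# Hypothetical list of the transport columns not already covered by car_stats
other_transport_cols = ["PTAL score", "Cycle to work (%)"]

# Average each remaining transport column within each borough
transport = wards.groupby("Borough")[other_transport_cols].mean()
print(transport)
```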
The final steps were then to merge the car data with the newly made transport data and remove the unwanted series.
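Because both grouped frames are indexed by borough, they can be joined on that index with .merge(). The sketch below uses small made-up frames in place of car_stats and the transport frame, and treats the 'sum' column as the hypothetical "unwanted" series to drop.

```python
import pandas as pd

borough_index = pd.Index(["Camden", "Hackney"], name="Borough")

# Made-up stand-ins for the two grouped frames
car_stats = pd.DataFrame({"sum": [2500, 1900], "mean": [1250.0, 950.0]}, index=borough_index)
transport = pd.DataFrame(
    {"PTAL score": [5.5, 3.5], "Cycle to work (%)": [9.0, 13.0]}, index=borough_index
)

# Join on the shared Borough index; an inner join keeps boroughs present in both
all_transport = car_stats.merge(transport, left_index=True, right_index=True, how="inner")

# Drop a column not wanted in the final table (hypothetical choice)
all_transport = all_transport.drop(columns=["sum"])
print(all_transport)
```

For an index-on-index join like this, `car_stats.join(transport)` is a shorter equivalent, though it defaults to a left join rather than an inner one.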