Perform the following on the dataset:
(i): Pick a suitable dataset from a trustworthy source on a topic that interests you (for example: cricket, music, Bollywood, food, agriculture, football, etc.). The selected dataset should have at least 1000 rows and at least 5 columns, including at least 2 categorical and at least 3 numerical variables. Cite the link to the data source, and do not use data that is not intended for public use.
OR
You can use the same dataset that you used in the previous activity. (Make sure all the conditions for the dataset are fulfilled, and cite the link to the previous data source as well.)
(ii): Transfer the data from the source mentioned above into a Google Spreadsheet. You can either embed the spreadsheet directly into your Student Portfolio page or provide a link to access it.
Create another Google Spreadsheet named “Activity-3” and execute the following tasks within it.
Identify the following for your dataset and explain the reasoning behind your choice of measure:
Measures of central tendency (also show them with a visualization)
Measures of dispersion
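As a cross-check on the spreadsheet results, the measures above can be sketched in Python. This is only an illustration: the list `values` is a hypothetical numerical column, not data from the activity, and the population formulas are an assumption (a spreadsheet function that divides by n - 1 will give slightly different values).

```python
import statistics

# Hypothetical numerical column; replace with a column from your own dataset.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Measures of dispersion (population versions, dividing by n)
value_range = max(values) - min(values)
variance = statistics.pvariance(values)
std_dev = statistics.pstdev(values)

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
```

If the mean and median differ noticeably, the column is likely skewed, which is one reason to prefer the median as the typical value.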
Select any two categorical variables from your dataset and find the association between them. Plot a 100% stacked bar chart of the two variables and describe the relationship you observe in it.
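The numbers behind a 100% stacked bar chart are the row percentages of a cross-tabulation. A minimal sketch, assuming two hypothetical categorical columns (a player role and a country) that are not part of the activity's data:

```python
from collections import Counter

# Hypothetical (role, country) pairs; replace with two categorical columns
# from your own dataset.
rows = [
    ("Batsman", "India"), ("Batsman", "India"), ("Batsman", "India"),
    ("Batsman", "Australia"),
    ("Bowler", "India"),
    ("Bowler", "Australia"), ("Bowler", "Australia"), ("Bowler", "Australia"),
]

# Cross-tabulation: how often each combination occurs
counts = Counter(rows)

# Total number of rows for each value of the first variable
totals = Counter(role for role, _ in rows)

# Share of each second-variable value within each first-variable value;
# these are the segment heights of a 100% stacked bar chart.
percentages = {
    (role, country): 100 * n / totals[role]
    for (role, country), n in counts.items()
}

for (role, country), pct in sorted(percentages.items()):
    print(f"{role:8s} {country:10s} {pct:5.1f}%")
```

If the percentage split is roughly the same in every bar, the two variables show little association; clearly different splits, as in this made-up example, suggest that they are associated.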
Select any two numerical variables, say X1 and X2, from your dataset and draw a scatter plot of X2 against X1. Describe the relationship between the variables that you observe in the scatter plot.
Find the covariance and correlation between X1 and X2, and interpret the relationship between the variables based on the values obtained.
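The covariance and correlation calculation can be sketched as follows. The lists `x1` and `x2` are hypothetical values, and the sample covariance (dividing by n - 1) is an assumption; a spreadsheet function that divides by n gives a slightly smaller covariance, while the correlation is the same either way.

```python
import math

# Hypothetical numerical columns X1 and X2; replace with your own data.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]

n = len(x1)
mean_x1 = sum(x1) / n
mean_x2 = sum(x2) / n

# Sum of products of deviations from the means
cross = sum((a - mean_x1) * (b - mean_x2) for a, b in zip(x1, x2))

# Sums of squared deviations for each variable
ss_x1 = sum((a - mean_x1) ** 2 for a in x1)
ss_x2 = sum((b - mean_x2) ** 2 for b in x2)

# Sample covariance: sign shows direction, magnitude depends on the units
sample_cov = cross / (n - 1)

# Pearson correlation: unit-free, always between -1 and 1
correlation = cross / math.sqrt(ss_x1 * ss_x2)

print("covariance:", sample_cov)
print("correlation:", round(correlation, 4))
```

A positive covariance means the variables tend to increase together. The correlation additionally says how strong that linear relationship is: values near 1 or -1 indicate a strong relationship, and values near 0 a weak one.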
Note:
1. Please refer to the sample Google Spreadsheet for this activity.
2. When submitting the spreadsheet of your work, give view-only access to “Anyone with the link”.
Solution To The Activity👇
Sheet Link: Click here
Source Of Data: Click here
Extra Activity 3