Activity Overview:
Activity 1: Create a Google Site as a student portfolio with Statistics 1 as one of the pages inside it. Provide a short and relevant description about the course on this page.
Activity 2.1: Pick a suitable dataset on a topic of your own interest from a trustworthy source (for example: cricket, music, Bollywood, food, agriculture, football, etc.). The selected dataset should comprise at least 1000 rows and at least 5 columns, including both categorical (at least 2) and numerical (at least 3) variables. Cite the link to the data source, and refrain from using data that is not intended for public consumption.
Activity 2.2: Transfer the data from the source mentioned above into a Google Spreadsheet. You can either embed the spreadsheet directly into your Student Portfolio page or provide a link to access it.
Create another Google Spreadsheet and execute the following tasks within it.
1. Identify the following from your dataset and give the reasoning for each:
Categorical and numerical variables
Scales of measurement of the variables
Discrete and continuous variables
2. Visualize the dataset using suitable plots. Also, provide the reasons behind choosing the respective plots and interpret the obtained trends.
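As a rough aid for task 1 and the size requirements in Activity 2.1, the check could be sketched in Python as below. This is only an illustration, not part of the activity: the column names and rows are invented placeholders, and the rule "a column is numerical if every value parses as a number" is a simplifying assumption that would mislabel number-coded categories (for example, a season or jersey number).

```python
def is_number(value):
    """Return True if a spreadsheet cell (read as text) parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def classify_columns(rows):
    """Label each column 'numerical' if all its values parse as numbers, else 'categorical'."""
    return {
        col: "numerical" if all(is_number(row[col]) for row in rows) else "categorical"
        for col in rows[0].keys()
    }


def meets_requirements(rows, min_rows=1000, min_columns=5,
                       min_categorical=2, min_numerical=3):
    """Check the dataset against the thresholds stated in Activity 2.1."""
    kinds = list(classify_columns(rows).values())
    return (
        len(rows) >= min_rows
        and len(kinds) >= min_columns
        and kinds.count("categorical") >= min_categorical
        and kinds.count("numerical") >= min_numerical
    )


# Hypothetical rows, invented for illustration (not taken from the actual sheet).
sample_rows = [
    {"batsman": "Batsman A", "bowling_team": "Team X", "runs": "4", "balls_faced": "3", "strike_rate": "133.3"},
    {"batsman": "Batsman B", "bowling_team": "Team Y", "runs": "12", "balls_faced": "10", "strike_rate": "120.0"},
    {"batsman": "Batsman C", "bowling_team": "Team X", "runs": "0", "balls_faced": "2", "strike_rate": "0.0"},
]

column_kinds = classify_columns(sample_rows)
```

The reasoning behind each label still has to be written by hand; a check like this only flags columns worth a second look.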
Solution To The Activity👇
Bar Chart: Total Runs by Batsman
Reason for Choosing: This plot helps us quickly compare the total runs scored by each batsman in the dataset. Bar charts are effective for displaying and comparing discrete data categories.
Interpretation: We can see which batsmen are the top performers in terms of run-scoring. In this sample, Devon Conway is leading with 16 runs.
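The totals behind a bar chart like this one come from summing runs per batsman. A minimal Python sketch of that aggregation follows; the records are invented placeholders rather than values from the sheet, and the function name is hypothetical.

```python
from collections import defaultdict


def total_runs_by_batsman(records):
    """Sum runs per batsman and return (batsman, total) pairs, highest total first."""
    totals = defaultdict(int)
    for batsman, runs in records:
        totals[batsman] += runs
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (batsman, runs) records, invented for illustration.
sample_records = [
    ("Batsman A", 4),
    ("Batsman B", 1),
    ("Batsman A", 6),
    ("Batsman C", 2),
    ("Batsman B", 0),
]

ranking = total_runs_by_batsman(sample_records)
# ranking == [("Batsman A", 10), ("Batsman C", 2), ("Batsman B", 1)]
```

Each pair in the result corresponds to one bar, and the first pair is the leading run-scorer.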
Histogram: Distribution of Runs
Reason for Choosing: Histograms are ideal for visualizing the distribution of a single numerical variable. Here, it shows how often different run totals occur within the dataset.
Interpretation: This plot reveals that most batsmen scored low run totals, so the distribution is skewed to the right. In other words, high scores are comparatively rare.
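A histogram groups values into equal-width bins and counts how many fall into each. The sketch below shows that binning in Python, along with a rough skewness check; the run totals are invented for illustration, the bin width of 10 is an arbitrary choice, and "mean greater than median" is only a rule of thumb for right skew, not a formal test.

```python
from statistics import mean, median


def histogram_counts(values, bin_width):
    """Count values per bin; each key is the lower edge of a bin of the given width."""
    counts = {}
    for value in values:
        bin_start = (value // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical run totals, invented for illustration.
sample_runs = [0, 2, 5, 7, 8, 12, 14, 21, 35, 60]

bins = histogram_counts(sample_runs, bin_width=10)
# bins == {0: 5, 10: 2, 20: 1, 30: 1, 60: 1}

# A long right tail pulls the mean above the median.
looks_right_skewed = mean(sample_runs) > median(sample_runs)
```

In this made-up sample, half of the values land in the lowest bin and a single large score stretches the tail, which is the same shape described above.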
Scatter Plot: Runs vs. Balls Faced
Reason for Choosing: Scatter plots are excellent for showing the relationship between two numerical variables. Here, we can see if there's a correlation between the number of runs scored and the number of balls faced by each batsman.
Interpretation: The plot shows a slight positive trend, indicating that batsmen who face more balls tend to score more runs. However, there's also considerable variation, suggesting other factors like batting style and match conditions also play a role.
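The strength of a trend like this can be quantified with the Pearson correlation coefficient, which ranges from -1 to 1. The following Python sketch computes it from scratch; the data points are invented for illustration and do not come from the sheet, so the resulting value says nothing about the actual dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical balls faced and runs scored, invented for illustration.
sample_balls = [5, 10, 15, 20, 30]
sample_runs_scored = [3, 14, 10, 31, 28]

r = pearson_r(sample_balls, sample_runs_scored)
```

A value near 0 would indicate almost no linear relationship, while values closer to 1 indicate a stronger positive one. Scatter around the trend line keeps the coefficient below 1.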
Other Possible Graphs:
Sheet Link: Click here
Source Of Data: Click here