Choose one of the following two options for the dataset:
Use any dataset from the sheet for the experiment. (The sheet contains credit rating data for corporate bonds. Each row represents a bond and the rating assigned to it by a credit rating agency.)
Note: The dataset is synthetically generated and scaled.
Collect your own data comprising 1000 rows and 5 to 6 columns, with column data types similar to those of the data provided above.
Transfer the data from the source you chose above into a Google Spreadsheet, and state the name of the dataset you used (e.g. Dataset1) if it comes from the sheet. You can either embed the spreadsheet directly into your Student Portfolio page or provide a link to access it.
Create a Google Doc and complete tasks (3) and (4) in it.
(i) Select three numerical variables, such as X1, X2, and X3, from your dataset. Create scatterplots to visualize the relationships between these variable pairs. For instance, generate scatterplots for X1 vs. X2, X2 vs. X3, and so on.
Additionally, provide your insights regarding the observed relationships between these variable pairs based on the scatterplot visualizations.
(ii) Calculate the covariance for the chosen pairs of numerical variables, such as Cov(X1, X2), Cov(X2, X3), and Cov(X1, X3). Then, analyze and interpret the relationships between these variables based on the computed covariance values.
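If you want to cross-check the covariance values from your spreadsheet, the calculation can be sketched in Python as follows. This is only an illustration: the lists `x1`, `x2`, and `x3` hold made-up values standing in for three numerical columns, and `sample_covariance` is a hypothetical helper that uses the sample (n - 1) denominator. The population version divides by n instead.

```python
from statistics import mean


def sample_covariance(xs, ys):
    """Sample covariance of two equal-length columns (n - 1 denominator)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length columns with at least 2 rows")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Made-up values standing in for three numerical columns (assumption).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]  # rises as x1 rises
x3 = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as x1 rises

cov_12 = sample_covariance(x1, x2)
cov_23 = sample_covariance(x2, x3)
cov_13 = sample_covariance(x1, x3)

print("Cov(X1, X2) =", cov_12)
print("Cov(X2, X3) =", cov_23)
print("Cov(X1, X3) =", cov_13)
```

A positive covariance means the two variables tend to move in the same direction, and a negative one means they tend to move in opposite directions. The magnitude depends on the units of the variables, so it does not by itself measure how strong the relationship is.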
(i) Calculate the mean and standard deviation for the three chosen numerical variables. Afterward, provide an interpretation of the results to better understand the characteristics of these variables.
(ii) Using Chebyshev's inequality, formulate a statement that gives a suitable bound (both an upper bound and a lower bound). Then interpret this result to explain what the bound says about the variable's behavior and variability.
Example: Suppose it is known that the number of items produced in a factory during a week is a random variable with mean 50 and variance 25, so its standard deviation is 5. The interval from 40 to 60 lies within 2 standard deviations of the mean, so by Chebyshev's inequality the probability that this week's production will be between 40 and 60 is at least 1 - 1/2^2 = 0.75. In other words, you can be at least 75% sure that production will fall in that range.
Equivalently, the probability that this week's production will be at least 60 or at most 40 is at most 0.25.
Note: Please perform 4(ii) for the selected three numerical variables.
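The mean, standard deviation, and Chebyshev bound from task (4) can be sketched in Python as below. This is a rough illustration under stated assumptions: `chebyshev_interval` is a hypothetical helper, the list `x1` holds made-up values standing in for one numerical column, and the column is treated as the whole population (hence `pstdev`). The first call reproduces the factory example.

```python
import math
from statistics import mean, pstdev


def chebyshev_interval(mu, sigma, k):
    """Return (lower, upper, min_prob) with P(lower < X < upper) >= min_prob."""
    if k <= 1:
        raise ValueError("k must exceed 1 for a non-trivial bound")
    return mu - k * sigma, mu + k * sigma, 1 - 1 / k**2


# Factory example: mean 50, variance 25, k = 2 standard deviations.
lower, upper, min_prob = chebyshev_interval(50, math.sqrt(25), 2)
print(lower, upper, min_prob)

# Made-up values standing in for one numerical column (assumption).
x1 = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0, 30.0, 13.0]
mu, sigma = mean(x1), pstdev(x1)  # population standard deviation
lo_x1, hi_x1, p_x1 = chebyshev_interval(mu, sigma, 2)

# Share of observed values strictly inside the interval.
inside = sum(lo_x1 < v < hi_x1 for v in x1) / len(x1)
print(f"mean={mu:.2f}, sd={sigma:.2f}, interval=({lo_x1:.2f}, {hi_x1:.2f})")
print(f"guaranteed share >= {p_x1}, observed share = {inside}")
```

The observed share is usually well above the guaranteed minimum, because Chebyshev's inequality holds for any distribution and is therefore conservative. Repeat the calculation for each of the three selected variables.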
Please refer to the sample solution doc file for the above activity.
Google Document:
Sheet Link: Click Here
Google Doc Link: Click Here