This activity analyzes a synthetically generated, scaled dataset of credit ratings for various corporate bonds, focusing on three key numerical variables. The primary objectives are to visualize the relationships between the selected variables, calculate and interpret their covariances, determine means and standard deviations, and apply Chebyshev's inequality to bound how far the values of each variable can lie from their mean.
Probability Distribution :
A probability distribution is a mathematical function that describes the likelihood of the different possible outcomes of an experiment: it assigns probabilities to the outcomes in a sample space. Probability distributions are either discrete or continuous. A discrete distribution assigns probabilities to a finite or countably infinite set of outcomes, while a continuous distribution describes outcomes that can take any value in an interval.
Binomial Distribution :
The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials. Each trial has only two possible outcomes: success (with probability \( p \)) or failure (with probability \( 1 - p \)). The PMF (Probability Mass Function) of a binomial distribution is given by:
\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \]
where:
- \( n \) is the number of trials,
- \( k \) is the number of successes,
- \( p \) is the probability of success on a single trial,
- \( \binom{n}{k} \) is the binomial coefficient, which represents the number of ways to choose \( k \) successes out of \( n \) trials.
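The formula above can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name `binomial_pmf` and the values \( n = 10 \), \( p = 0.5 \) are illustrative assumptions, not taken from the dataset.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed from the formula above."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative values (assumed): 10 trials, success probability 0.5
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(pmf[5])    # probability of exactly 5 successes
print(sum(pmf))  # the probabilities over k = 0..n add up to 1
```

Summing the PMF over all \( k \) from 0 to \( n \) is a quick sanity check that the function really defines a probability distribution.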
Poisson Distribution :
The Poisson distribution is a discrete probability distribution that describes the number of events occurring within a fixed interval of time or space, assuming the events occur independently and at a constant average rate. It is particularly useful for modeling rare events. The distribution is characterized by a single parameter, \( \lambda \) (lambda), which represents the average number of events in the interval. The PMF of a Poisson distribution is given by:
\[ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \]
where:
- \( k \) is the number of events,
- \( \lambda \) is the average rate of event occurrence,
- \( e \) is the base of the natural logarithm.
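As a rough illustration, the Poisson PMF can be coded straight from the formula. The name `poisson_pmf` and the rate \( \lambda = 2 \) are assumptions chosen for the example; the sum is truncated at \( k = 49 \) because the remaining probability is negligible for this rate.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed from the formula above."""
    return lam**k * exp(-lam) / factorial(k)

# Illustrative rate (assumed): on average 2 events per interval
lam = 2.0
probs = [poisson_pmf(k, lam) for k in range(50)]

print(probs[0])  # probability of observing no events
print(sum(probs))  # close to 1; the tail beyond k = 49 is negligible
print(sum(k * pk for k, pk in enumerate(probs)))  # close to lam, the mean
```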
Normal Distribution :
The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is symmetric around its mean. It is characterized by its bell-shaped curve, where most of the observations cluster around the central peak and probabilities for values further from the mean taper off equally in both directions. The PDF (Probability Density Function) of a normal distribution is given by:
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \]
where:
- \( \mu \) is the mean,
- \( \sigma \) is the standard deviation,
- \( e \) is the base of the natural logarithm,
- \( \pi \) is the constant Pi (approximately 3.14159).
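The density above can be checked numerically. This sketch assumes a standard normal (\( \mu = 0 \), \( \sigma = 1 \)) purely for illustration, and approximates the area under the curve with a simple midpoint sum over \([-8, 8]\); the function name `normal_pdf` is hypothetical.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of Normal(mu, sigma) at x, following the formula above."""
    coeff = 1.0 / (sigma * sqrt(2 * pi))
    return coeff * exp(-((x - mu) ** 2) / (2 * sigma**2))

# Illustrative parameters (assumed): the standard normal distribution
mu, sigma = 0.0, 1.0

peak = normal_pdf(mu, mu, sigma)  # the density is highest at the mean
left, right = normal_pdf(-1.5, mu, sigma), normal_pdf(1.5, mu, sigma)

# Midpoint-rule approximation of the total area under the curve on [-8, 8]
step = 0.001
area = sum(normal_pdf(-8 + (i + 0.5) * step, mu, sigma) * step for i in range(16000))

print(peak, left, right, area)
```

The two values at \( \pm 1.5 \) agree, which reflects the symmetry around the mean, and the area is approximately 1, as required of any PDF.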
Exponential Distribution :
The exponential distribution is a continuous probability distribution that describes the time between events in a Poisson process. This distribution is often used to model the time until the next event, such as the time between arrivals of customers at a service center. The PDF of an exponential distribution is given by:
\[ f(x) = \lambda e^{-\lambda x} \]
for \( x \geq 0 \), where:
- \( \lambda \) is the rate parameter of the distribution; the mean of the distribution is \( 1/\lambda \).
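The relationship between the rate and the mean can be illustrated by simulation. The sketch below assumes a rate of \( \lambda = 0.5 \) and a fixed random seed, both chosen only for the example, and uses the standard library's `random.expovariate` to draw samples.

```python
import random
from math import exp

def exponential_pdf(x: float, lam: float) -> float:
    """Density of Exponential(lam) at x; zero for negative x."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

# Illustrative rate (assumed): 0.5 events per unit of time
lam = 0.5

# Draw waiting times with a seeded generator so the run is reproducible
rng = random.Random(0)
samples = [rng.expovariate(lam) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)

print(exponential_pdf(0.0, lam))  # the density at zero equals lam
print(sample_mean)                # should be close to 1 / lam
```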
Uniform Distribution :
The uniform distribution is a continuous probability distribution where all outcomes within a specified range are equally likely. This distribution is characterized by a constant PDF over the interval \([a, b]\). The PDF of a continuous uniform distribution is given by:
\[ f(x) = \frac{1}{b - a} \]
for \( a \leq x \leq b \) (and \( f(x) = 0 \) otherwise), where:
- \( a \) is the lower bound,
- \( b \) is the upper bound.
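A minimal sketch of the uniform density follows. The bounds \( a = 2 \) and \( b = 6 \) and the function name `uniform_pdf` are assumptions made for the example.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of Uniform(a, b) at x: constant inside [a, b], zero outside."""
    if b <= a:
        raise ValueError("upper bound b must be greater than lower bound a")
    return 1.0 / (b - a) if a <= x <= b else 0.0

# Illustrative bounds (assumed)
a, b = 2.0, 6.0

height = uniform_pdf(3.0, a, b)  # same value anywhere inside the interval
area = height * (b - a)          # rectangle: height times width

print(height, area)
print(uniform_pdf(7.0, a, b))    # outside the interval the density is 0
```

Because the density is a rectangle of height \( 1/(b - a) \) and width \( b - a \), its area is exactly 1.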
Probability Mass Function (PMF) :
The PMF is a function that gives the probability that a discrete random variable is exactly equal to some value. It provides a way to specify the distribution of a discrete random variable. For a discrete random variable \( X \), the PMF is denoted as \( P(X = x) \).
Cumulative Distribution Function (CDF) :
The CDF of a random variable \( X \) is a function that gives the probability that \( X \) is less than or equal to a certain value. For a random variable \( X \), the CDF is denoted as \( F(x) = P(X \leq x) \). It is used to describe the distribution of a random variable by providing the cumulative probability up to a certain value.
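For a discrete random variable, the CDF is the running sum of the PMF. The sketch below uses a fair six-sided die as an assumed example (it is not part of the bond dataset) and exact fractions to avoid rounding; the helper name `cdf_at` is hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

# Assumed example: a fair six-sided die
values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

# The CDF at each listed value is the cumulative sum of the PMF
cdf = list(accumulate(pmf))

def cdf_at(x):
    """F(x) = P(X <= x): add up the PMF over all values not exceeding x."""
    return sum(p for v, p in zip(values, pmf) if v <= x)

print(cdf_at(4))  # P(X <= 4)
print(cdf[-1])    # the CDF reaches 1 at the largest value
```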
Joint Distribution :
A joint distribution describes the probability distribution of two or more random variables. For two discrete random variables \( X \) and \( Y \), the joint PMF is denoted as \( P(X = x, Y = y) \). For continuous random variables, the joint PDF is denoted as \( f(x, y) \). The joint distribution provides information about the likelihood of different combinations of values for the variables.
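A small joint PMF can be stored as a table keyed by value pairs. The probabilities below are hypothetical numbers chosen for illustration; the sketch recovers the marginal distributions by summing over the other variable and checks whether the joint PMF factorizes, which is the condition for independence.

```python
from fractions import Fraction

# Hypothetical joint PMF of two binary variables X and Y: (x, y) -> P(X = x, Y = y)
joint = {
    (0, 0): Fraction(4, 10), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(3, 10),
}

# Marginals: sum the joint probabilities over the other variable
marginal_x, marginal_y = {}, {}
for (xv, yv), prob in joint.items():
    marginal_x[xv] = marginal_x.get(xv, 0) + prob
    marginal_y[yv] = marginal_y.get(yv, 0) + prob

# X and Y are independent only if every joint probability equals the product of marginals
independent = all(
    prob == marginal_x[xv] * marginal_y[yv] for (xv, yv), prob in joint.items()
)

print(marginal_x, marginal_y, independent)
```

In this made-up table the variables are not independent, since \( P(X = 0, Y = 0) \) differs from \( P(X = 0)\,P(Y = 0) \).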
The detailed analysis and visualizations are available in this Google Colab notebook:
https://colab.research.google.com/drive/1HHSUoxUInLEvpR-7_vrhMKarDnQ8tCVl?usp=sharing
In this analysis, scatterplots were generated to visualize the relationships between the three chosen numerical variables and to highlight potential correlations and trends. Covariances were calculated to quantify the direction and strength of these relationships. The mean and standard deviation of each variable were computed to summarize its central tendency and dispersion. Finally, Chebyshev's inequality was applied to bound the proportion of values that can lie more than a given number of standard deviations from the mean, a bound that holds without assuming any particular distribution.
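The numerical steps summarized above can be sketched as follows. This is not the notebook's code: the data are hypothetical synthetic values generated with a fixed seed, and the helper names `covariance` and `fraction_within` are assumptions. Chebyshev's inequality states that \( P(|X - \mu| \geq k\sigma) \leq 1/k^2 \), so at least \( 1 - 1/k^2 \) of the values must lie within \( k \) standard deviations of the mean.

```python
import random
from statistics import fmean, pstdev

# Hypothetical synthetic data standing in for two scaled variables
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(1000)]
y = [0.8 * xi + rng.gauss(0, 0.5) for xi in x]

def covariance(a, b):
    """Population covariance: mean of the products of deviations from the means."""
    ma, mb = fmean(a), fmean(b)
    return fmean((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

def fraction_within(data, k):
    """Share of values lying strictly less than k standard deviations from the mean."""
    mu, sigma = fmean(data), pstdev(data)
    return sum(abs(v - mu) < k * sigma for v in data) / len(data)

print("mean(x) =", fmean(x), "std(x) =", pstdev(x))
print("cov(x, y) =", covariance(x, y))  # positive: y tends to rise with x

# Compare the observed share with the Chebyshev lower bound 1 - 1/k^2
for k in (2, 3):
    print(k, fraction_within(x, k), ">=", 1 - 1 / k**2)
```

The observed shares are typically far above the Chebyshev bound, because the bound is deliberately conservative: it must hold for every distribution with a finite variance.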
Thanks for visiting :)