What do these three things have in common? What are some of these things?! You have questions, and I have answers.
Well, the OED has answers. It defines probability in general terms as "the extent to which something is probable; the likelihood of something happening or being the case." An example might be:
In the world of mathematics and statistics, probability has a more specific meaning (thanks again, OED): "The extent to which an event is likely to occur, measured by the ratio of the favorable cases to the whole number of cases possible." That definition might prompt a similar question, though one more along these lines:
For the purposes of this blog post, let's pretend I did not just get carried away and bring the Pacific Ocean into the equation (equation, see what I just did there?); after all, I'm a newbie data scientist. Let's keep things a bit more manageable for the time being.
But enough silliness. If anything, I've just shown that my Python skills, my French vocabulaire and my sense of humour are all ridiculous. Siméon Denis Poisson was a renowned and prodigious 19th-century French mathematician who, among many other contributions, introduced the distribution that bears his name: the Poisson distribution.
In inferential statistics, a distribution refers to, as our friends at 365DataScience.com note:
a function that shows the possible values for a variable and how often they occur... The distribution of an event consists not only of the input values that can be observed, but is made up of all possible values.
There are a whole whack of different probability distributions, each of them appropriate to different probability scenarios. Some of these distributions and the specifics of their appropriate scenarios will be the subject of future blog posts. For the moment, what you need to know is that the Poisson distribution is one that is appropriate for discrete random variables.
And what the heck are those, you may ask? It's not as convoluted as it sounds. A discrete random variable is one whose possible values can be counted off one at a time (0, 1, 2, 3 and so on), rather than falling anywhere along a continuous scale. Think of the number of pineapples falling from trees on a private island. Now about those pineapples...
Not so fast! You didn't think you could make it to the end of this post without some math coming at ya, did you? Let's take a moment to talk about what the Poisson distribution is used for, and how it is calculated mathematically.
We use the Poisson distribution to represent the number of events that may occur during a given window of time. For the first iteration of this blog post, I'm going to lift this technical description right out of my learning materials at General Assembly (because I don't want to screw anything up!):
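For reference, the Poisson probability mass function is usually written like this, where λ (lambda) is the average number of events in the window of time and k is the number of events we're asking about. This is the standard textbook form, not a quote from the GA materials. A handy consequence is that the mean and the variance of a Poisson variable are both equal to λ:

```latex
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots

\mathbb{E}[X] = \operatorname{Var}(X) = \lambda
```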
Whoa, I think I need a nap after all that math. But lucky for us, our trusted serpentine data-science language (um, Python) can crunch all those numbers (and Greek letters!) for us.
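To see Python doing that number-crunching, here's a minimal sketch that evaluates the Poisson formula, P(X = k) = λ^k e^(-λ) / k!, using only the standard library. The helper name `poisson_pmf` is my own (hypothetical), and the rate of 12 per hour is just an illustrative value. If you have SciPy installed, `scipy.stats.poisson.pmf(k, mu)` computes the same probabilities.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when lam events are expected on average."""
    if k < 0:
        return 0.0  # a count can never be negative
    return float(lam) ** k * math.exp(-lam) / math.factorial(k)


rate = 12  # illustrative average: 12 pineapples per hour

for k in (8, 12, 16):
    print(f"P(X = {k:2d}) = {poisson_pmf(k, rate):.4f}")

# Summed over every possible count, the probabilities should total 1.
# The terms shrink quickly, so the first 100 of them get us there in practice.
total = sum(poisson_pmf(k, rate) for k in range(100))
print(f"sum of P(X = k) for k = 0..99: {total:.6f}")
```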
The wise persons at GA also spell out in greater detail when the Poisson distribution is appropriate. Let's peep what they have to say. We can use it:
when the number of successes is a non-negative integer,
when events occur independently,
when the rate at which events occur is constant,
when two events cannot occur at exactly the same instant, and
when the probability of an event occurring in an interval is proportional to the length of the interval.
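One way to convince yourself that these conditions produce Poisson counts is to simulate them. The sketch below is my own illustration, not something from the GA materials: it generates events that happen independently at a constant rate (which means the waiting times between events follow an exponential distribution, so no two events land at exactly the same instant), then counts how many fall in each window. If everything lines up, the mean and variance of the hourly counts should both come out close to the rate, and halving the window length should roughly halve the average count. The rate, window length and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

rate_per_hour = 12.0  # constant event rate (illustrative value)
n_hours = 10_000      # length of the simulated observation window, in hours

# Independent events at a constant rate have exponentially distributed
# waiting times with mean 1 / rate. Draw twice as many gaps as the expected
# number of events so the arrivals almost surely cover the whole window.
n_gaps = int(2 * rate_per_hour * n_hours)
gaps = rng.exponential(scale=1 / rate_per_hour, size=n_gaps)
arrival_times = np.cumsum(gaps)
arrival_times = arrival_times[arrival_times < n_hours]

# Count the events that land in each one-hour window...
counts_per_hour = np.bincount(arrival_times.astype(int), minlength=n_hours)
# ...and in each half-hour window (an interval half as long).
counts_per_half_hour = np.bincount(
    (arrival_times * 2).astype(int), minlength=2 * n_hours
)

print("mean per hour:      ", counts_per_hour.mean())
print("variance per hour:  ", counts_per_hour.var())
print("mean per half hour: ", counts_per_half_hour.mean())
```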
So, oops, let's forget about those pineapples I wouldn't shut up about earlier. It is improbable, but possible, for two pineapples to fall at exactly the same moment. Oh, screw it, I love pineapples. Let's say that on my private island, no one has ever documented two pineapples falling from a tree in the exact same millisecond (work with me here).
Let's say that, on average, 12 pineapples hit the ground every hour (this is quite the earthquake!). And this activity continues for 300 hours (OK, maybe it is in fact a monsoon, so let's make that 300,000; I knew I never should have dropped all my umbrella factory stock during the COVID-19 scare of 2020!). Let's draw samples from the overall distribution and plot a histogram of that data, all with the help of our friends Python, NumPy, Matplotlib and SciPy (from whom this example is derived).
First we will import numpy and use it to draw samples from the Poisson distribution. Then we will plot the histogram using matplotlib:
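Here's a minimal sketch of those two steps, assuming the scenario above: an average of 12 pineapples per hour (λ = 12) observed over 300,000 hours. It uses NumPy's `Generator.poisson` (the older `np.random.poisson(lam, size)` works too); the seed, bin edges and axis labels are my own choices, not necessarily what the original SciPy example used.

```python
import matplotlib.pyplot as plt
import numpy as np

lam = 12           # average number of pineapples falling per hour
n_hours = 300_000  # number of simulated hours (one sample per hour)

# Draw one Poisson-distributed count for each simulated hour.
rng = np.random.default_rng(seed=0)
samples = rng.poisson(lam=lam, size=n_hours)

# Bin edges at half-integers, so each bar is centered on a whole-number count.
bins = np.arange(samples.min(), samples.max() + 2) - 0.5

plt.hist(samples, bins=bins, density=True, edgecolor="black")
plt.xlabel("Pineapples falling in one hour")
plt.ylabel("Proportion of hours")
plt.title(f"{n_hours:,} draws from a Poisson distribution (λ = {lam})")
plt.show()
```

Each bar sits over a whole number because a Poisson variable only takes integer values, and with λ = 12 the tallest bars should cluster around 11 and 12.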
And as Poisson himself would say, Voilà! I hope I got that right. If not, please go easy on me since this is my first encounter with Poisson since that disastrous trip to Quebec City in grade 9. Thank you for reading!