Mathematics and Science, Part II
Week 5
During the fifth week of class, we will continue to discuss the relationship between mathematics and science. In particular, we will continue and extend the activity begun during Week 4.
Gaussian (Normal) Distribution Activity
We will add a supplemental activity to last week's, involving the Gaussian distribution (often called the normal distribution or the bell curve).
The Gaussian distribution is very often used to describe random values of a variable whose distribution is unknown. This is because, under certain conditions, the sum or average of many independent random values approaches a Gaussian distribution; roughly, the number of values must be very large, and their mean and standard deviation must be finite. The mean is the average of the values, and the standard deviation is a measure of how spread out they are. It turns out that, for the Gaussian distribution, about 34% of the values lie between the mean and 1 standard deviation above the mean. For instance, suppose that the mean height of a population is 5 feet 7 inches. If the standard deviation is 2 inches, then about 34% of people should be between 5 feet 7 inches and 5 feet 9 inches tall, if the distribution is Gaussian.
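For those who would like to check the 34% figure, here is an optional minimal sketch in Python (not part of the Excel activity). It assumes an ideal Gaussian distribution; the helper name normal_cdf is our own, and it uses only the standard-library error function.

```python
import math

def normal_cdf(z):
    """Fraction of a Gaussian distribution lying below z standard deviations from the mean."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Fraction of values between the mean (z = 0) and one standard deviation above it (z = 1).
fraction_within_one_sd_above = normal_cdf(1.0) - normal_cdf(0.0)
print(round(fraction_within_one_sd_above, 4))  # prints 0.3413
```

In the height example, this is the fraction of people expected to be between 5 feet 7 inches and 5 feet 9 inches tall, provided that heights really do follow a Gaussian distribution.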
Our purpose here is not to understand this distribution in a detailed and quantitative way (this is properly done in a statistics-based course), but to gain an intuition about what the Gaussian distribution is and why it is used. Let us see how our dart experiment might be related to the Gaussian distribution. To do so, we will play the same game of darts, but score the throws a bit differently. First, we will increase the number of random points to 20. Create two new columns, one entitled "x" and the other entitled "y". In the cell just below the "x" title, add the following:
=RAND()*10
This will produce a random number between 0 and 10. Drag this downwards so that you have a total of 20 x values. Repeat this for y. Create a new column of r values, just as you did last week, by computing
=(x^2 + y^2)^0.5
Then, drag the bottom r cell downwards to produce all of the associated r values.
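The same x, y, and r columns can also be sketched outside Excel. The Python below is only an illustration, not part of the assignment: the function name throw_darts and the fixed seed are our own assumptions, and random.random() stands in for Excel's RAND().

```python
import math
import random

def throw_darts(n, seed=None):
    """Return n (x, y, r) tuples, with x and y uniform on [0, 10), like =RAND()*10."""
    rng = random.Random(seed)
    throws = []
    for _ in range(n):
        x = rng.random() * 10
        y = rng.random() * 10
        r = math.sqrt(x**2 + y**2)  # same calculation as =(x^2 + y^2)^0.5
        throws.append((x, y, r))
    return throws

# The seed is arbitrary; it only makes the illustration repeatable.
throws = throw_darts(20, seed=1)
for x, y, r in throws[:3]:
    print(f"x={x:.2f}  y={y:.2f}  r={r:.2f}")
```

Because x and y each run from 0 to 10, r can be as large as about 14.1, which is why the scoring below needs a category for large r.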
Now on to the scores! Create a new column in your Excel sheet entitled "Dart Scores". This will consist of a score for each r value, which will be defined by
r<3.5: 5 points
3.5≤r<6.5: 4 points
6.5≤r<9: 3 points
9≤r<11: 2 points
11≤r<12: 1 point
r≥12: 0 points
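As a cross-check, the scoring table above can be sketched as a small Python function. This is a hedged illustration only: the name dart_score is our own, and we assume that an r value falling exactly on a boundary receives the lower score.

```python
def dart_score(r):
    """Return the dart score for a distance r, following the table above."""
    # Assumption: a value exactly on a boundary (e.g. r = 3.5) gets the lower score.
    if r < 3.5:
        return 5
    if r < 6.5:
        return 4
    if r < 9:
        return 3
    if r < 11:
        return 2
    if r < 12:
        return 1
    return 0

for r in (1.0, 5.0, 8.0, 10.0, 11.5, 13.0):
    print(r, dart_score(r))
```

Each test is only reached if all of the earlier ones failed, so every r value lands in exactly one category.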
We can automatically score our r values by inserting the following Excel code into the top cell in this column, below the title:
=IF(D2<3.5,5,IF(D2<6.5,4,IF(D2<9,3,IF(D2<11,2,IF(D2<12,1,0)))))
Here, "D2" is the cell containing the first r value. Then, we can drag this cell downwards to produce all of the scores. Now, we will produce a running average of the scores. To do this, create a new column entitled "Score Running Average"; below the title, insert
=AVERAGE(E$2:E2)
Here, "E2" is the cell containing the first dart score. This assumes that your scores are in column E, beside the r values in column D; adjust the column letter to match your own sheet. We again drag this downwards, and now we have a running average; this means that each cell in this column is the average of all of the scores up to and including this point. Finally, make a histogram of the dart scores and answer the following questions:
Does your histogram seem to approximate a Gaussian distribution? Why or why not? Cite references which describe the shape of the Gaussian distribution.
Now, click on one of the x cells. Put your cursor in the cell, but don't type anything, and press Enter. This will regenerate the random numbers. Make a second histogram from the new scores, and save both plots so that you can compare them. Do they look different? If you produced them using the same random process, why would this be?
Now increase the number of r values to 1000 by highlighting the bottom row of x, y, r, and score values and dragging it downwards. Recreate your histogram. Does it look different from the first two? If so, why would it look different?
Can you estimate the standard deviation from your 1000-value plot by using qualitative arguments? If I were to throw one dart on the board randomly in this game, can you estimate the probability that I will get a score more than one standard deviation above the mean?
Suppose that you were to choose 20 random dart scores from your 1000-value data sample. Do you have confidence that these 20 randomly chosen points accurately represent the entire population? For example, compare the average of a 20-value sample to the average of the 1000-value sample. Are they similar? If not, why not?
It is often stated that the average human body temperature is 98.6 degrees F. Let's assume for a moment that this number is accurate. Your doctor takes your temperature and obtains 98.2 degrees F. Can you think of two reasons why comparing one measurement of your body temperature to the human average is misleading?
Do a little research on the average human body temperature. Is it indeed likely to be 98.6 degrees F? If not, what is a more accurate way of stating it? Provide citations!
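If you would like to cross-check your spreadsheet outside Excel, the running average and the comparison of a 20-score sample with all 1000 scores can be sketched in Python. This is an optional illustration under our own assumptions: the function names and the seed are hypothetical choices, boundary values take the lower score, and your numbers will differ from your Excel sheet because the random values differ.

```python
import math
import random

def dart_score(r):
    """Score a distance r; values exactly on a boundary take the lower score (assumption)."""
    for limit, points in ((3.5, 5), (6.5, 4), (9, 3), (11, 2), (12, 1)):
        if r < limit:
            return points
    return 0

def simulate_scores(n, rng):
    """Throw n random darts with x and y uniform on [0, 10) and return their scores."""
    scores = []
    for _ in range(n):
        x = rng.random() * 10
        y = rng.random() * 10
        scores.append(dart_score(math.hypot(x, y)))
    return scores

def running_average(values):
    """Return a list whose k-th entry is the average of the first k values."""
    averages = []
    total = 0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages

rng = random.Random(2)  # arbitrary seed, used only to make the run repeatable
scores = simulate_scores(1000, rng)
running = running_average(scores)
sample = rng.sample(scores, 20)  # 20 scores picked at random from the 1000

print("mean of all 1000 scores:  ", sum(scores) / len(scores))
print("mean of a 20-score sample:", sum(sample) / len(sample))
print("running average after 20 throws:  ", running[19])
print("running average after 1000 throws:", running[-1])
```

Changing the seed and rerunning plays the same role as regenerating the random numbers in Excel.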
Semester-long Projects
To expedite the completion of the projects, we will work on them during class.
Submit assignment
To sign in, you must input your CUNY credentials ("firstname.lastnameXX@login.cuny.edu", where "XX" are the last two digits of your student ID). You cannot use "qmail" credentials. If you get an error, please log out of your email/Office365 and then click on the link below.