WORK IN PROGRESS....not done yet
I was once told by an engineer that October was the driest month of the year. I’ve always wondered if that was true. The statement could mean that October:
Receives the least amount of precipitation per month, or
Has the least number of days with precipitation.
A review of 24 years of data from the weather station located near Morgantown, Kentucky showed that October was not the driest month of the year. While it is one of the driest months of the year:
The month with the lowest total precipitation was November, with a median of 3.38 inches, followed by September. Note there is not a statistically significant difference in total precipitation per month (p=0.24).
The month with the fewest days with precipitation was September, with a median of 7 days, followed by August, October and November (all with a median of 8 days). Note there is a statistically significant difference in the number of precipitation events per month (p=1.79e-05).
Note that for total precipitation per month there was an increasing trend for March (p=0.0046) at a rate of 0.14 inches per year.
Below is a detailed description of how the review was conducted.
The Data
The raw data was obtained from Climate.gov for the period of January 1, 2000 to December 31, 2023, resulting in 24 years of data. The station used was the Aberdeen, Kentucky station (Station ID USC00150012) (37.2318, -86.6866), which is located near the city of Morgantown, Kentucky. This station was selected because I have a job site near there at which we sample storm water.
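Before any plotting or testing, the daily records need to be rolled up into monthly totals. A minimal sketch of that step follows. It assumes the download has been reduced to pairs of (ISO date, precipitation in inches); that layout and the function name monthly_totals are illustrative assumptions, not the format of the actual export.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_records):
    """Sum daily precipitation (inches) into {(year, month): total}.

    daily_records: iterable of (ISO date string, inches) pairs.
    This layout is assumed for illustration; days with no value (None)
    are skipped rather than counted as zero.
    """
    totals = defaultdict(float)
    for day_text, inches in daily_records:
        if inches is None:
            continue
        day = date.fromisoformat(day_text)
        totals[(day.year, day.month)] += inches
    return dict(totals)


# Made-up sample values, only to show the shape of the input and output.
sample = [
    ("2000-10-01", 0.25),
    ("2000-10-02", None),
    ("2000-10-15", 1.10),
    ("2000-11-03", 0.40),
]
totals = monthly_totals(sample)
```

Grouping the resulting totals by calendar month gives one dataset per month (24 values each for a complete record), which is what the box plots and tests below operate on.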
The Review - Monthly Totals
First the monthly totals were reviewed. It is always best to graph the data. Notched box plots are a good way to look at a dataset all at once. From Figure 1 we can note that:
It appears the notches overlap. If the notches did not overlap, that would be evidence that the medians differed.
October does not appear to be the lowest. November's median is below October's.
There are some outliers, especially in April's dataset.
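The notch comparison can also be done numerically. The sketch below uses the common convention that a notch spans the median plus or minus 1.57 times the interquartile range divided by the square root of the sample size. Whether the plotting software behind Figure 1 uses exactly this convention and the same quartile method is an assumption, and the sample values are made up.

```python
import math
import statistics


def notch_interval(values):
    """Approximate 95% interval around the median, as drawn by box-plot notches."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


def notches_overlap(a, b):
    """True when the notch intervals of two datasets share any values."""
    low_a, high_a = notch_interval(a)
    low_b, high_b = notch_interval(b)
    return low_a <= high_b and low_b <= high_a


# Made-up monthly totals (inches) for two hypothetical months.
month_a = [2.1, 2.8, 3.0, 3.4, 3.9, 4.2, 4.8, 5.5, 6.1]
month_b = [2.5, 3.1, 3.3, 3.8, 4.0, 4.6, 5.2, 5.9, 7.0]
overlap = notches_overlap(month_a, month_b)
```

Overlapping notches are only an informal indication that the medians are similar, which is why a formal test is still applied later in the review.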
Outliers
Ed Gilroy, formerly a statistician at USGS, stated: “Treat outliers like children … correct them when necessary, but never throw them out.”
To review the outliers, data from some other nearby stations was reviewed for the same time periods. The suspicious values in the Aberdeen dataset are:
April 2011 - 15.66 inches.
Woodbury - 15.45 inches (Station USC00158824 - 4.34 miles Southeast)
Bowling Green - 12.87 inches (Station USC00150904 - 18.2 miles Southeast)
Bowling Green-State Police - 10.52 inches (Station USC00150906 - 24.5 miles Southeast)
Greenville - 14.34 inches (Station US1KYMU0001 - 33 miles West)
July 2016 - 13.21 inches.
Bowling Green - 11.30 inches (Station USC00150904 - 18.2 miles Southeast)
Bowling Green-Plum Springs - 12.69 inches (Station US1KYWR0024 - 22 miles Southeast)
Powderly - 14.18 inches (Station USC00156495 - 26 miles West)
April 2015 - 10.56 inches
Rochester Ferry - 8.75 inches (Station USC00156882 - 11.5 miles West)
Bowling Green - 9.55 inches (Station USC00150904 - 18.2 miles Southeast)
Powderly - 6.51 inches (Station USC00156495 - 26 miles West)
February 2019 - 10.43 inches
Bowling Green - 12.86 inches (Station USC00150904 - 18.2 miles Southeast)
Rochester Ferry - 10.59 inches (Station USC00156882 - 11.5 miles West)
May 2010 - 10.96 inches
Woodbury - 10.47 inches (Station USC00158824 - 4.34 miles Southeast)
Bowling Green - 13.78 inches (Station USC00150904 - 18.2 miles Southeast)
Greenville - 11.17 inches (Station US1KYMU0001 - 33 miles West)
So while the values are outside the expected range, they appear to be similar to those in surrounding areas. The one exception is the April 2015 value of 10.56 inches, but it was decided to include it in the review.
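One way to put a number on "similar to surrounding areas" is to compare each suspicious Aberdeen value with the median of the nearby station values listed above. This is only an illustrative sketch and not necessarily how the comparison was made in this review; the function name relative_difference is hypothetical.

```python
import statistics

# Monthly totals (inches) quoted above: Aberdeen value first, then nearby stations.
checks = {
    "April 2011": (15.66, [15.45, 12.87, 10.52, 14.34]),
    "July 2016": (13.21, [11.30, 12.69, 14.18]),
    "April 2015": (10.56, [8.75, 9.55, 6.51]),
    "February 2019": (10.43, [12.86, 10.59]),
    "May 2010": (10.96, [10.47, 13.78, 11.17]),
}


def relative_difference(value, neighbors):
    """Fractional difference between a station value and the neighbor median."""
    reference = statistics.median(neighbors)
    return (value - reference) / reference


differences = {
    month: relative_difference(value, neighbors)
    for month, (value, neighbors) in checks.items()
}
# Month whose Aberdeen value departs most from its neighbors.
largest = max(differences, key=lambda month: abs(differences[month]))
```

With the values listed above, April 2015 shows the largest departure, about 21% above the neighbor median, which fits it being singled out as the exception.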
Table 1 provides the summary statistics for the monthly precipitation totals, showing May as the month with the most precipitation and November as the month with the least.
While the notched box plots were very informative, we still need a formal statistical test. Here the starting point is ANOVA. The preferred method would be the Welch version, which extends Welch's t-test to more than two groups and allows unequal variances. The test has the following assumptions/requirements:
Data are normally distributed.
Data are not heavily skewed (coefficient of variation is less than or equal to 1.5).
Variances of the populations can be unequal.
No temporal trends in the data can be present.
No naturally-occurring spatial variability can be present.
Samples must be spatially and temporally independent.
Use of 8 to 10 measurements is recommended.
Source: ITRC GSMC Section 5.11.1 and Wikipedia Welch's t-test
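The skewness requirement in the list above is easy to check directly. The sketch below computes the coefficient of variation as the sample standard deviation divided by the mean and compares it with the 1.5 limit; the function names and the sample values are made up for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def heavily_skewed(values, limit=1.5):
    """True when the coefficient of variation exceeds the stated limit."""
    return coefficient_of_variation(values) > limit


# Made-up monthly totals (inches) for one hypothetical month.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cv = coefficient_of_variation(sample)
skewed = heavily_skewed(sample)
```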
First we will test for normality. This will be done with the Shapiro-Wilk and Shapiro-Francia tests. The Shapiro-Wilk test is the more appropriate method for small sample sizes (4 to 50 observations). The Shapiro-Francia test can be used for 5 to 5,000 observations. Table 1 provides a summary of the results. All but April appear to be from normal distributions.
Because one of the datasets does not appear to be from a normal distribution, the Kruskal-Wallis test was used. This is a non-parametric test and does not require a normal distribution. Here is what we learn from the results:
The resulting p-value of 0.24 (see the red box below) is above the threshold of 0.05, so we fail to reject the null hypothesis that there is no difference between the groups.
The effect size of 0.05 (see blue box below) indicates that the month has a medium effect on the monthly precipitation. Field, A (2013) proposed the following scale for effect size:
Small < 0.01
Medium 0.01 - 0.06
Large 0.06 to 0.14
Very large > 0.14
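For readers who want to see what the Kruskal-Wallis test computes, here is a minimal standard-library sketch. It ranks the pooled values (ties share the average rank), forms the H statistic with the usual tie correction, and reads the p-value from the chi-square approximation with k - 1 degrees of freedom. The effect size shown is rank epsilon-squared, H / (n - 1), which is one common choice; this review does not state which effect size formula its software used, so treat that as an assumption. The three groups are made-up values.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the usual correction for ties.

    Assumes the pooled values are not all identical.
    """
    pooled = sorted((value, g) for g, grp in enumerate(groups) for value in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += average_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    weighted = sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))


def chi2_sf(x, df):
    """Upper-tail chi-square probability for x > 0 and a positive integer df."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Step the shape parameter up by one until it reaches df / 2.
    while a < df / 2:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1
    return q


def epsilon_squared(h, n):
    """Rank epsilon-squared effect size for a Kruskal-Wallis H statistic."""
    return h / (n - 1)


# Made-up data: three groups of three observations.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
n_total = sum(len(g) for g in groups)
h = kruskal_wallis_h(groups)
p_value = chi2_sf(h, len(groups) - 1)
effect = epsilon_squared(h, n_total)
```

For the monthly review there are 12 groups, so the chi-square approximation uses 11 degrees of freedom.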
The Review - Daily Events
Next, the number of days per month with recorded precipitation greater than 0.1 inches was reviewed. The data source was the same as above. Note some dates were missing from the source. Months with more than one missing date were removed from the data set.
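The counting and filtering rule described above can be sketched as follows. The input layout of (ISO date, inches or None) pairs and the function name wet_days_per_month are assumptions for illustration, and the sample records are made up.

```python
import calendar
from collections import defaultdict
from datetime import date


def wet_days_per_month(daily_records, threshold=0.1, max_missing=1):
    """Count days with precipitation above `threshold` for each (year, month).

    daily_records: iterable of (ISO date string, inches or None) pairs
    (an assumed layout). Months missing more than `max_missing` days
    are dropped from the result.
    """
    observed = defaultdict(int)
    wet = defaultdict(int)
    for day_text, inches in daily_records:
        if inches is None:
            continue
        day = date.fromisoformat(day_text)
        key = (day.year, day.month)
        observed[key] += 1
        if inches > threshold:
            wet[key] += 1
    counts = {}
    for (year, month), n_observed in observed.items():
        days_in_month = calendar.monthrange(year, month)[1]
        if days_in_month - n_observed <= max_missing:
            counts[(year, month)] = wet[(year, month)]
    return counts


# Made-up records: February 2001 is missing one day, March 2001 has only ten days.
february = [(f"2001-02-{d:02d}", 0.5 if d in (3, 10, 17) else 0.0) for d in range(1, 28)]
march = [(f"2001-03-{d:02d}", 0.2) for d in range(1, 11)]
counts = wet_days_per_month(february + march)
```

In this example February is kept with three wet days, while March is dropped because more than one of its dates is missing.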
A review of the box plots in Figure 3 shows some of the notches do not overlap.
Table 3 provides the summary statistics for the number of days with precipitation per month, showing May as the month with the most and September as the month with the fewest.
As noted above, while the notched box plots were very informative, we still need a formal statistical test. Here the starting point is ANOVA. The preferred method would be the Welch version, which extends Welch's t-test to more than two groups and allows unequal variances. The test has the following assumptions:
Data are normally distributed.
Data are not heavily skewed (coefficient of variation is less than or equal to 1.5).
No naturally-occurring spatial variability can be present.
Samples must be spatially and temporally independent.
No temporal trends in the data can be present.
Use of 8 to 10 measurements is recommended.
First we will test for normality. This will be done with the Shapiro-Wilk and Shapiro-Francia tests. The Shapiro-Wilk test is the more appropriate method for small sample sizes (4 to 50 observations). The Shapiro-Francia test can be used for 5 to 5,000 observations. Table 4 provides a summary of the results. Not all months appear to be from normal distributions.
As some datasets appear not to be from a normal distribution, the Kruskal-Wallis test was used. This is a non-parametric test and does not require a normal distribution. Here is what we learn from the results:
The resulting p-value of 1.29e-04 is below the threshold of 0.05, so we reject the null hypothesis of no difference and conclude there is a difference between the groups.
The effect size of 0.13 indicates that the month has a large effect on the number of days with precipitation per month. Field, A (2013) proposed the following scale for effect size:
Small < 0.01
Medium 0.01 - 0.06
Large 0.06 to 0.14
Very large > 0.14
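The scale above can be written as a small lookup, which makes it easy to check the labels attached to the effect sizes reported in this review (0.05, 0.13 and 0.08). The scale as listed puts 0.06 in two bands; the sketch assigns it to "large", which is an assumption, and the function name effect_size_label is hypothetical.

```python
def effect_size_label(value):
    """Label an effect size using the scale listed above.

    The value 0.06 appears in two bands of the scale; here it is
    assigned to "large", which is an assumption.
    """
    if value < 0.01:
        return "small"
    if value < 0.06:
        return "medium"
    if value <= 0.14:
        return "large"
    return "very large"


# Effect sizes reported in this review.
labels = {value: effect_size_label(value) for value in (0.05, 0.13, 0.08)}
```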
Checking the Results
In an effort to evaluate whether the results from the Morgantown review were representative of a larger area, a review of the data for Dickson, Tennessee was conducted. This station is located 87 miles southwest of the station used in the Morgantown, Kentucky review above.
The Data
As with the Morgantown, Kentucky review, the data reviewed covered the period of January 1, 2000 to December 31, 2023, resulting in 24 years of data. The station used was the Dickson, Tennessee station (Station ID USC00402489) (36.0747, -87.3931).
The Review - Monthly Totals
From Figure 5 we can note that:
It appears some of the notches do not overlap, for example May and June. If the notches do not overlap, that would be evidence that the medians differ.
October does not appear to be the lowest. September's median is below October's.
There are some outliers. A review of other stations in the area showed similar results for the same time period, so these points were retained.
Table 5 provides the summary statistics for the monthly precipitation totals, showing May as the month with the most precipitation (same as the Morgantown review) and September as the month with the least (the Morgantown review showed November as the lowest and September as the second lowest).
The results of the normality tests displayed in Table 6 show that half of the months do not appear to be from a normal distribution. Based on this, the Kruskal-Wallis test was used to review the data.
The Kruskal-Wallis test results showed a p-value of 0.01, providing evidence that there are differences between the months. The effect size of 0.08 indicates the month has a large effect on the precipitation totals.
Daily Events
Next, the number of days per month with recorded precipitation greater than 0.1 inches was reviewed. Note some dates were missing from the source. Months with more than one missing date were removed from the data set.
A review of the box plots in Figure 3 shows some of the notches do not overlap.
References:
Field, A. (2024). Discovering Statistics Using IBM SPSS Statistics. Sixth Edition. Sage: London.