Surf Data 3 - Wave Height
Feb 10, 2021
Here is the third installment of the Surfing Data series, where I’m using data I’ve collected since 2017. This time, we are analyzing wave height! I have recorded this parameter since the end of 2017, but for this analysis I only worked with three full years of data: 2018-2020. In case you were wondering, that is 665 surf sessions!
Questions
With my dataset, I mainly wanted to answer three questions:
How does the wave height vary within a year?
How does the wave height vary by location?
Do my wave height records align with the actual wave height?
Biases
There are a few biases within my wave height data.
Subjectivity. My recording of the wave height is subjective, and different people will assign different heights to the same waves. I try to record the average height of the set waves that come through. This means that if a single 10-foot set rolled through, I may still label the day as 6-8ft.
Sampling. My data only covers the days and times I decided to surf. If the waves were flat or 20ft+, I probably did not surf, so no record was kept.
When recording the wave height, these two biases are inevitable.
My Data
Let’s look at some plots. First, I decided to plot all of my wave height data by year to compare the annual variability. The data were binned by month and averaged.
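Binning by month and averaging is simple to reproduce. The sketch below assumes each session is stored as a (date, label) pair with labels like "6-8ft", and it reduces a range to its midpoint; both the storage layout and the midpoint rule are assumptions for illustration, not necessarily how my spreadsheet does it. The sample sessions are made up.

```python
import re
from collections import defaultdict
from datetime import date
from statistics import mean


def label_to_feet(label: str) -> float:
    """Convert a height label such as '6-8ft' or '20ft+' to a single number in feet.

    Assumption for this sketch: a range is reduced to its midpoint.
    """
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", label)]
    if not numbers:
        raise ValueError(f"no height found in {label!r}")
    return sum(numbers) / len(numbers)


def monthly_means(sessions):
    """Return {(year, month): mean height in feet} for (date, label) pairs."""
    bins = defaultdict(list)
    for day, label in sessions:
        bins[(day.year, day.month)].append(label_to_feet(label))
    # Sort so the months come out in chronological order.
    return {key: mean(values) for key, values in sorted(bins.items())}


# Made-up example sessions, not real records.
sessions = [
    (date(2018, 1, 3), "6-8ft"),
    (date(2018, 1, 20), "4-5ft"),
    (date(2018, 7, 14), "2-3ft"),
]
print(monthly_means(sessions))  # {(2018, 1): 5.75, (2018, 7): 2.5}
```

Plotting one line per year from this dictionary gives the kind of year-over-year comparison described above.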
This graph brings me to another problem in the dataset: it is not location-specific. I surf a variety of waves, so only broad conclusions can be drawn. That being said, the vast majority of my sessions are in the Eastern Pacific.
The plot below compares wave height across locations, with each region shown in a different color. I only included the most visited spots (surfed at least 20 times) so that each average is based on a reasonable number of sessions. We can see that Ocean Beach in San Francisco (SF) has the largest average wave height (5.6 ft). Between San Luis Obispo (SLO) and San Diego (SD), SLO clearly has larger waves on average; more on this below.
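The "surfed at least 20 times" filter can be sketched as follows. This assumes records are (spot, height in feet) pairs, which is an illustrative layout rather than my actual file format, and the spot names and heights are invented.

```python
from collections import defaultdict
from statistics import mean

MIN_SESSIONS = 20  # threshold used in this post


def spot_averages(records, min_sessions=MIN_SESSIONS):
    """Average height (ft) per spot, keeping only spots surfed >= min_sessions times.

    `records` is an iterable of (spot_name, height_ft) pairs (assumed layout).
    Results are sorted from largest to smallest average height.
    """
    by_spot = defaultdict(list)
    for spot, height in records:
        by_spot[spot].append(height)
    kept = {spot: mean(h) for spot, h in by_spot.items() if len(h) >= min_sessions}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


# Made-up records: "Spot C" is dropped because it has only 5 sessions.
records = [("Spot A", 5.0)] * 25 + [("Spot B", 3.0)] * 30 + [("Spot C", 9.0)] * 5
print(spot_averages(records))  # [('Spot A', 5.0), ('Spot B', 3.0)]
```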
Buoy Data
I collected buoy data from Torrey Pines (San Diego) and Diablo Canyon (San Luis Obispo) to use as reference points for my own records.
Before looking at the chart, let’s understand how this data is collected. Out in the ocean, a buoy sits with many sensors attached to it. One of these sensors records the buoy’s vertical motion, which is used to calculate the ‘significant wave height’: the average (mean) height of the largest ⅓ of the waves detected by the sensor. Cool stuff, huh.
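The definition of significant wave height as the mean of the highest one-third of waves can be written in a few lines. This is only an illustration of the definition using invented wave heights; operational buoys usually estimate the value from the measured wave energy spectrum rather than by sorting individual waves.

```python
from statistics import mean


def significant_wave_height(heights):
    """Mean of the highest one-third of individual wave heights (H1/3)."""
    if len(heights) < 3:
        raise ValueError("need at least three waves to take the top third")
    ranked = sorted(heights, reverse=True)
    top_third = ranked[: len(ranked) // 3]
    return mean(top_third)


waves_ft = [2, 3, 3, 4, 4, 5, 6, 7, 9]  # made-up individual wave heights in feet
print(significant_wave_height(waves_ft))  # mean of 9, 7 and 6
```

Note that the result (about 7.3 ft) is well above the plain average of all nine waves, which is one reason buoy numbers can read higher than what a surfer calls "average".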
The chart above is a 3-year time series in which each point is the average (mean) of an entire month. The most striking difference between these two locations is size. San Diego has smaller waves throughout the entire year (we saw this in Fig. 2) because Point Conception and the Channel Islands shelter it from north swells. Otherwise, the two locations rise and fall roughly together, and we see the same seasonal pattern as in Fig. 1: winters have larger waves on average than summers.
My Data vs Buoy Data
To compare my data to the buoy data, I had to assign my data to a region. As a rough method, I labeled each month with the region I surfed most often that month. Most of my data is from San Luis Obispo, so I focused on that region’s comparison, since it has the largest sample. The breaks in my line are simply months in which most of my surfing was not in SLO.
In the plot above, there are noticeable differences between the two datasets, but the general trends seem to coincide. Given all the biases and sources of error, I was surprised that my data captured variations so similar to the buoy data!
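The "most common region per month" rule can be sketched like this. It assumes sessions are (year, month, region) tuples, which is a hypothetical layout, and months that another region dominated become None so they show up as gaps. The sessions and monthly heights are made up.

```python
from collections import Counter, defaultdict


def dominant_region_by_month(sessions):
    """Map (year, month) -> region surfed most often that month.

    `sessions` is an iterable of (year, month, region) tuples (assumed layout).
    Ties go to the region seen first, since Counter.most_common keeps
    first-encountered order for equal counts.
    """
    counts = defaultdict(Counter)
    for year, month, region in sessions:
        counts[(year, month)][region] += 1
    return {key: c.most_common(1)[0][0] for key, c in sorted(counts.items())}


def mask_to_region(means_by_month, dominant, region="SLO"):
    """Keep a month's mean only if `region` dominated it; otherwise None (a gap)."""
    return {
        key: (value if dominant.get(key) == region else None)
        for key, value in means_by_month.items()
    }


# Made-up sessions: January is mostly SLO, February is mostly SD.
sessions = [(2019, 1, "SLO"), (2019, 1, "SLO"), (2019, 1, "SD"),
            (2019, 2, "SD"), (2019, 2, "SD"), (2019, 2, "SLO")]
dominant = dominant_region_by_month(sessions)
print(dominant)  # {(2019, 1): 'SLO', (2019, 2): 'SD'}
print(mask_to_region({(2019, 1): 5.5, (2019, 2): 3.0}, dominant))
# {(2019, 1): 5.5, (2019, 2): None}
```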
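The comparison above is made by eye. If you wanted a single number for how closely two monthly series move together, one option (not part of this analysis) is a Pearson correlation over the months where both series have a value. The sketch below skips gap months marked None; the two series are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length series, skipping months with a gap (None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two overlapping points")
    x_vals, y_vals = zip(*pairs)
    mx, my = mean(x_vals), mean(y_vals)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sqrt(sum((x - mx) ** 2 for x in x_vals))
    sy = sqrt(sum((y - my) ** 2 for y in y_vals))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


mine = [6.0, 5.0, None, 3.0, 2.5]  # made-up monthly means with one gap month
buoy = [7.0, 6.5, 5.0, 4.0, 3.5]   # made-up buoy monthly means
print(round(pearson(mine, buoy), 3))
```

A value near 1 means the two series rise and fall together even if one is consistently larger, which matches the kind of agreement described above; it says nothing about whether the absolute heights match.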
Thank you for reading :)
-Garrett