Precision. I have been working on narrowing down my project, making figures that answer the general question I'm trying to answer: how do we optimize autonomous recording units in avian surveys? I am looking at how the following three parameters affect our ability to estimate bird occupancy: the number of point counts, the number of ARU recordings, and the number of sites. My goal is to understand how these parameters influence the posterior distribution of the parameters that relate to the probability of occupancy of a single species. By simulating data that represents a specific species, we can see how the model captures the relationship between bird occupancy and burn severity. Once the simulations are done, I can use the real data we collected to see if the simulations accurately represent the data we would collect for a real species.
Now, I just need to work on the simulations and write them up!
Writing. Writing in LaTeX is a lot of fun, but creating something cohesive is more challenging. I have been working towards a complete introduction and methods section, but that requires that I flesh out my questions. Here are the current questions I plan to answer in the paper:
What is the optimal combination of ARUs and PCs for obtaining significant slopes for covariates?
How does an ARU-only model compare to a PC-only model and a combined model?
What is a good measure of "information" for an avian occupancy model?
Can we add the information obtained from a PC-only model and an ARU-only model to obtain the information of a combined model?
How do we determine the "power" of this combined occupancy model, so that we know how many points and surveys to use in order to obtain significant results?
I have shifted my analysis from the variance of the posterior as a whole to the variance of the slopes that relate burn severity to occupancy. When people design studies, they care about being able to detect significant relationships between their covariates and their observations.
In terms of the simulation, I have been able to build a simulation, but the model is not able to recover the slopes for the covariates. Perhaps it has something to do with the link functions, or perhaps the model specification is incorrect, but I am honestly not certain why it is not working.
Overall, I'm in a weird place. On one hand, I have results that add important information to the scientific community. On the other, I need to bring them together in a way that can communicate what has been simmering in my brain.
Simulation. Don't we all live in a simulation? Well, we might, and simulations can help us understand how models behave. See, the purpose of models is to estimate parameters that explain patterns in data. For example, we might estimate a parameter that tells us the probability of a species occupying a site (psi). Then we can estimate how psi changes in response to burn severity (beta2). BTW, this is why the models we are using are called multilevel (or hierarchical)---the output of one sub-model feeds into another sub-model within the same overall model. Of course, in the real world, we don't know the true values of the parameters---we are trying to estimate them. That raises the question: how do we know if our model is working if we can't validate its estimates? That is where simulation comes in.
When we simulate data, we know the values of the parameters because we are literally setting them when we generate the data. So we can run the simulated data through the model and see how well the model recovers those parameters. And that's exactly what I've been doing---simulating data by reverse engineering the model. I think the results are pretty cool. Of course, it's just a simulation, but it should do a reasonable job of reflecting a range of species.
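Here's a minimal sketch of that simulate-then-recover loop in R, assuming a single-species occupancy model with a burn-severity covariate on psi. The true values (beta0, beta2, p_det) and sample sizes are placeholders I made up, not the values from my project:

```r
# Simulate occupancy data with a burn-severity effect; the goal is then to
# check whether the fitted model's posterior recovers beta2.
set.seed(7)
n_sites <- 100; n_visits <- 4
beta0 <- 0; beta2 <- 1.5; p_det <- 0.4          # "true" values we set ourselves
burn <- rnorm(n_sites)                          # standardized burn severity
psi  <- plogis(beta0 + beta2 * burn)            # occupancy probability (logit link)
z    <- rbinom(n_sites, 1, psi)                 # latent occupancy state per site
y    <- rbinom(n_sites, n_visits, z * p_det)    # detections across repeat visits
# Next step: fit the occupancy model to (y, burn) and see how close the
# posterior for beta2 lands to 1.5.
```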
The coolest thing that I found was that the variance of psi depended on the value of psi: psi values closer to the extremes (0 and 1) had higher posterior variances than psi values close to 0.5. This makes sense because the posterior is proportional to the likelihood (where the likelihood is the binomial distribution), the variance of the binomial distribution is npq (with q = 1 - p), and on the logit scale the information the data carry scales with npq---so as psi approaches 0 or 1, the data pin down the logit-scale parameters less precisely. I did some more tests and found some cool things, and they are mostly outlined in this Rmarkdown document.
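One way to see this for yourself is with a toy grid-posterior check on the logit scale, where the slope parameters live. This is my own stand-in, not the full occupancy model, and the flat prior on logit(psi) is an assumption:

```r
# Posterior sd of logit(psi) for different true psi values, using a simple
# grid approximation with a flat prior on the logit scale.
set.seed(1)
n <- 100
grid <- seq(-8, 8, length.out = 2000)            # grid over theta = logit(psi)
for (psi_true in c(0.1, 0.5, 0.9)) {
  y <- rbinom(1, n, psi_true)                    # simulated occupied-site count
  log_lik <- y * grid - n * log1p(exp(grid))     # binomial log-likelihood in theta
  post <- exp(log_lik - max(log_lik))
  post <- post / sum(post)
  mu <- sum(grid * post)
  cat(sprintf("true psi = %.1f -> posterior sd of logit(psi) = %.2f\n",
              psi_true, sqrt(sum((grid - mu)^2 * post))))
}
# Expect the sd to be smallest at psi = 0.5 and to grow toward 0.1 and 0.9.
```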
Statistics, statistics, statistics. That is where my mind has been the past month. And I love it. I've been reading the book Statistical Rethinking to help me understand what Bayesian statistics is all about.
My project is at an interesting stage. I feel that I have all of the technical and statistical knowledge I need to understand the samples from the posterior. Variance and entropy are the two main ways I can measure "information", and I can use these measures to compare the model with different amounts of data.
Theoretically, more data == less variance, as the posterior will concentrate around the mean. At some point, the variance should level off as the model becomes "saturated" with data.
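As a quick sanity check on that intuition, here's a toy conjugate Beta-Binomial stand-in for the occupancy model (my own simplification, not the real model), showing the posterior standard deviation of psi shrinking roughly like 1/sqrt(n) and flattening out as data accumulate:

```r
# Posterior sd of psi as the number of sites grows, with a Beta(1, 1) prior.
set.seed(42)
psi_true <- 0.4
n_sites  <- c(5, 10, 25, 50, 100, 200, 400)
post_sd <- sapply(n_sites, function(n) {
  y <- rbinom(1, n, psi_true)                 # occupied sites out of n
  a <- 1 + y; b <- 1 + n - y                  # conjugate Beta posterior
  sqrt(a * b / ((a + b)^2 * (a + b + 1)))     # sd of a Beta(a, b)
})
round(data.frame(n_sites, post_sd), 3)        # diminishing returns with more data
```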
Currently, I have written a script that can sample from the posterior and return the distribution of samples, but the model is not working very well. Once the model is improved, changing the amount of data it uses should produce a clearer change in the posterior.
Well, it's been a while. A Willow Warbler came to Rodeo Lagoon---the first time the species has been seen in the lower 48. This fall has been amazing, just incredible. So many rare birds. Okay, now onto the real MARC work.
I've been working on understanding the model and trying to figure out how to make it converge. I was able to get the script I mentioned below to work, but the outputs are not super useful since the model does not do a good job of estimating occupancy. Here are the two potential ways we could "fix" the model:
Restructure the data going into the model. The model does a very good job with the point count data, so it's just the ARU data that we need to work on. The lambda parameter in the Poisson distribution is not converging, so we need to figure out how to reorganize the input ARU data. We need to aggregate the ARU data (reducing the file size) so that the model has a clearer signal for the parameter it should be inferring. This will probably involve aggregating the ARU data every 60 seconds (taking the top logits across that time period) instead of every 2.5 seconds---see the sketch after this list.
Use a different model. We have had success using a negative binomial model, but its parameters are not as useful or as transferable to the real world. There is no probability-of-success parameter like there is in the other model.
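Here's the sketch of the aggregation idea from option 1. The column names (time_s, logit) and the fake scores are my own stand-ins for whatever the classifier actually outputs:

```r
# Collapse per-clip classifier logits (one score every 2.5 s) down to the
# top logit in each 60 s window.
set.seed(3)
scores <- data.frame(time_s = seq(0, 297.5, by = 2.5))  # 5 minutes of 2.5 s clips
scores$logit <- rnorm(nrow(scores), mean = -4)          # fake classifier logits

scores$window <- floor(scores$time_s / 60)              # 60 s window index
agg <- aggregate(logit ~ window, data = scores, FUN = max)
names(agg)[2] <- "top_logit"
agg   # 5 rows instead of 120 --- far less for the model to digest
```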
So, we'll have to see what our group comes up with to fix this pressing issue. Then, I'll be able to figure out the "information" contained within each of the data sets.
I have been working on the script that lets me run the model with different amounts of input data. It mostly works, although it won't yet allow me to use only 1 point count and 1 ARU recording; I should be able to fix that soon. There are many ways to measure the "information" in the model, and one option is a binary entropy function (which is related to the deviance). I need to meet with the people who made the model to add a calculated parameter that is the binary entropy. Here's a link to the outputs from the script that runs the model with different amounts of data.
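For reference, this is the binary entropy function I have in mind, computed in bits; applying it per posterior draw of psi is my assumption about where it would slot into the model:

```r
# Binary entropy H(p) = -p*log2(p) - (1 - p)*log2(1 - p), in bits.
binary_entropy <- function(p) {
  p <- pmin(pmax(p, 1e-12), 1 - 1e-12)   # guard against log(0)
  -p * log2(p) - (1 - p) * log2(1 - p)
}
binary_entropy(c(0.05, 0.5, 0.95))       # ~0.286 1.000 0.286
```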
Here are some cool birds I have recently seen:
Abbotts Lagoon - 10/8/22
Pt. Reyes National Seashore - 9/20/22
Wow, we're already back in school! September marks the beginning of the best birding season in Marin, and I've already seen two county birds: a Painted Bunting and a Ruddy Turnstone. Hopefully, the southeast winds will bring more vagrants in!
Now, turning back to my MARC project. During the first week of school, my mentors worked with some Google employees on a combined model that can estimate the probability of detection using both the ARU and the point count data. Using this model, I hope to make a function that runs the model with varying amounts of data, to see how well we could estimate birds' occupancy if we were to collect less data. This is super important, as it would let future researchers know how much ARU data and PC data they actually need to collect. It will also reveal whether ARU-only studies are missing anything, as I can run the model with just the ARU data. I'm looking forward to more big data analysis in R!
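That function might look something like this sketch, where fit_occupancy_model() and the data layout are placeholders, not the real model's API:

```r
# Hypothetical wrapper: refit the combined model while varying how many
# point-count visits and ARU recordings each site contributes.
run_with_subset <- function(data, n_pc, n_aru) {
  thinned <- data
  thinned$pc  <- lapply(data$pc,  function(x) head(x, n_pc))    # first n_pc visits per site
  thinned$aru <- lapply(data$aru, function(x) head(x, n_aru))   # first n_aru recordings per site
  fit_occupancy_model(thinned)   # placeholder for the real fitting call
}

# Sweep over candidate designs, including ARU-only (n_pc = 0)
designs <- expand.grid(n_pc = 0:6, n_aru = 0:10)
```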
Here's something else I did this summer, comparing ARU and PC data.
I just came back from my 7 days of camping. Use the above map to orient yourself on my adventures. Here I'll give an overview of what I did each day:
Day 1 (June 7) (2.75 mi.) -- I woke up at 4:15 AM and headed to the Flat Cluster. I conducted point counts with Jack Dumbacher (my mentor) and moved two ARUs. I felt very winded because of the elevation. I also had to relearn a few bird songs that I hadn't heard in over a year, including Hermit Warbler, Yellow-rumped Warbler, Green-tailed Towhee, and Fox Sparrow. We saw 2 Evening Grosbeaks, which are super cool birds.
Day 2 (June 8) (9.5 mi.) -- I woke up at 4:00 AM and headed to the Hayflat trail (near the Cody Cluster). I was again with Jack. The plan was to hike from the top of the valley into the canyon and then walk out to the Caples Canyon trail head, where we had left a car the night before. We got to the Hayflat trail head at 5:00 AM and walked to point 582 on Convict. We started the first point count around 6:00 AM and then walked to point 583. I was still feeling the elevation, so I had to go very slowly to 583, which is up a very steep hill. Because I was slowing down the point counts so much, my mentor decided that I should go move some ARUs while he did the point counts. After point 583, I went to 627 and moved an ARU to 628. Then I moved 625 to 626. While I was walking between the points, I saw a Northern Goshawk fly by, which is one of the most amazing hawks to see. I finally found my way back onto the trail and then moved 624 to 622. Both of those points are directly on the trail, which is nice. I got back to the car pretty late and was exhausted after walking 9.5 miles in 8.5 hours.
Day 3 (June 9) (2.5 mi.) -- I woke up at 4:30 AM (a late start!) and headed to the Cody Cluster. I went with Kristen, who works for The Institute for Bird Populations, and we conducted point counts at the 4 points. We then went down to Silver and did a point count at 1061, because Kristen was not able to do it the day before, as she got there after 9:15 AM, which is the latest a point count can be started.
Day 4 (June 10) (5 mi.) -- I woke up at 3:30 AM and headed to Martin's Meadow (bottom right of map). This place is an hour's drive away, which is why we had to wake up so early. I went with Kristen and Chris, who is a grad student at Humboldt State. We went about 1/2 a mile down an off-road trail and dropped off Kristen, as she had to start at 450, the furthest point from the parking lot. Chris and I went back and parked at the parking lot. We went to 543 and did the first point count together. Then, I split off and moved 3 ARUs while Chris and Kristen did their point counts. This allowed us to leave Martin's Meadow at 10 and get back to camp at a reasonable time after our sleep was cut short.
Day 5 (June 14) (9.5 mi.) -- I woke up at 4:00 AM and headed back to the Hayflat trail. This day was very similar to day 2, except that now I was doing the point counts by myself. I did two point counts on the Convict Cluster and then moved 3 ARUs on the way back. On one of the point counts, I saw 12 species, the most I had seen on a single point count the entire time. By this time, I had mastered most of the bird songs and could tell Hermit Warblers from Yellow-rumped Warblers, which took me multiple days to relearn.
Day 6 (June 15) (5 mi.) -- I woke up at 3:45 AM and headed back to Martin's Meadow. I was doing point counts at all of the southern points and Jack was doing point counts at all of the northern points. This area burned intensely, which made it very easy to walk through, as all of the brush had burned to ash. Over 95% of the trees burned in the Caldor Fire, meaning that in a few years walking will be difficult due to all of the trees that will fall. We planned to set up some of the ARUs that had come back to camp instead of moving the ARUs. Jack soon noticed that one of the ARUs he was setting up had the wrong settings and was recording at a lower bitrate, in mono (instead of stereo), and at a lower gain. This caused the file sizes to be significantly smaller. After the point counts, I got back to the car and checked the settings of all of the ARUs I had moved in the previous week. The app I was using thankfully saved the settings. It turned out that 4 recorders were improperly set up, and now we had to go fix them and change their settings. Not only did we have to correct their settings, but we also had to put ARUs back on the points where they had been recording with the incorrect settings, because most of those ARUs had already been to 2 points before we caught the error. In the afternoon, we created a plan to set up and fix as many ARUs as possible.
Day 7 (June 16) (9 mi.) -- I woke up at 6:00 AM, a late start because we were not conducting point counts. ARUs can be moved at any time, but point counts must be finished before 9:30. We headed up to Hayflat and I followed a similar route to day 5. I reprogrammed the ARU at 670 (that I had set up 2 days earlier) and then set up ARUs at 627 and 628. Because we got up so late, it was pretty hot on the way back to the trail head. I put up the final ARU of the study season, so now I'm just waiting for the data.
Very Burnt Forest
Chris Near Point 543
Jack at 5:15 AM
Fun fact: the reason I am writing on this particular ARU is that it was incorrectly set up (it was recording at a lower gain, a lower bitrate, and in mono instead of stereo) and I had to note when I changed the settings.
Great View From Where I Saw a Northern Goshawk!
A Northern Goshawk that flew right up to us!
School has just ended. This makes me sad. I love school. Thankfully, the boredom will only last a few days, as I am going to Eldorado National Forest on Monday, June 6 to help collect bird data! I am going to be joining my mentors in the field (without my parents), even better than school! From June 6-10 and June 13-17, I will be camping at China Flat Campground in Eldorado National Forest. Waking up before the break of dawn, I will hike a few miles to points scattered throughout the burned forest. Once I get to a point, I will count all of the birds that I see and hear for 10 minutes. In addition to conducting point counts, I will also be moving the autonomous recording units (ARUs). There are 80 points in the study area but only 30 ARUs, which means each ARU needs to record at up to 3 different points. Depending on the day, I will either move them while I am conducting point counts or move them in the afternoon when they are not recording (they only record in the mornings).
If I am not moving ARUs in the afternoon, I will be able to work on the study design of my MARC project with my mentors and other scientists who work for the US Forest Service.