Intern with the Monterey Bay Aquarium Research Institute
Goals for Summer 2020
My goal for this summer is to get real-world experience working with data. I also hope to further my technical skills, and I want my R and Python knowledge to increase tenfold. I also hope to develop good habits working a 9-5 job, as it will be good preparation for the real world. What I hope to learn from my mentor is how to analyze a problem and then design an experiment based on that analysis. I am going to be doing work similar to something already published, so recreating those results is something I hope to achieve. The skills that I hope to acquire are advanced statistical and programming knowledge, which I can translate into a job once the internship is complete.
The work ethic I hope to gain from this internship is to work hard every day that I am on the clock. I want to become a great data scientist in the future, and that has to start with good work habits here in this internship. I am going to be balancing studying for the GRE and doing some side projects as well, so I need to be prepared to work hard. I think that this internship is crucial because it will show me how I work in an actual job setting. I am going to note the areas in which I am struggling and try to fix them before I start my first data analyst job.
I also hope to gain some networking contacts through this internship and through the GEOPATHS program as a whole. Making the right connections in this program is going to set me up for the rest of my career. This internship aligns with what I want to study in graduate school, so doing well here will benefit me greatly. I am ready to spend a lot of time working on my project.
How COVID-19 Changed My Summer Plans
Before COVID-19 happened, I was stressing out about what I was going to do over the summer. My education had shifted to a data-analytics focus, and I wanted to improve those skills. When everything shut down, I didn’t know what to think. The one thing I did do was take advantage of the situation. I spent a lot of time improving my programming knowledge as well as my statistics knowledge. The next thing I knew, I received an email from GEOPATHS about possible internships, and one of the internship categories was ‘Data Analytics’. I couldn’t believe it! I actually had the opportunity to do something I was passionate about. I was assigned to work at the Monterey Bay Aquarium Research Institute (MBARI) on a data analysis project. So far I am enjoying every day of it. COVID-19 may have meant that the internship is not in person, but I am making the most of it. I have made friends with the other interns at MBARI and my programming skills are improving every day. I have made a lot of progress on my project in just one week.
My plans changed because of the COVID-19 shutdown, and I reshaped them so that I could advance my knowledge. I think that being indoors for so long allowed me to explore subjects that I might not have considered before. I also found out what I want to study in graduate school. By being resilient, I made the most of these dark times. I am now preparing for the GRE while doing a fascinating internship.
GeoScience Career Panel Reflection
I am very grateful to have had the opportunity to listen to many wonderful speakers during the career panel. I enjoyed listening to everyone’s story, since everyone had a unique journey to the position they are currently in. The story that resonated with me most was Kyle’s, since I also want to start my career as a data analyst/scientist. I love learning about the earth and the ways that our generation could help preserve it, and I want to do that using data. I haven’t met many people who took a route similar to mine and ended up as a data analyst, so I enjoyed listening to Kyle.
I was surprised at all of the different routes the panelists took to get to their current positions. It supports the idea that there is no single route to a given career. I would love to see examples of the work that the panelists produce in their occupations, as well as the different types of software they use daily. Since computers are becoming increasingly important, hearing about their experience with technology would also be beneficial.
Since I finished my undergraduate degree not too long ago, my plans are currently all over the place. Hearing about others’ experiences out in the field is beneficial because it shows me that I have a wide variety of options. I am going to try to expand the types of companies I apply to and not limit myself to careers in the Earth Sciences.
Final Reflection
This summer I aimed to improve my data analysis skills because I felt like I wasn’t progressing at a fast enough pace. Looking back at my paper ‘Goals for Summer 2020’, I feel like I had a good idea of what I wanted to do but wasn’t exactly sure how to achieve it. Shifting career goals is something that I had thought about for a long time, and I finally decided to do it.
One of the goals I had set for myself this summer was to improve my Python and R skills while I worked on my project. The project that I worked on this summer was titled “Characterization of the Phytoplankton Phenology in the Subarctic Pacific Ocean”. In this project, I compared two different methods of quantifying the time of year at which phytoplankton blooms begin in the Gulf of Alaska. I will describe this project in greater detail later in the reflection.
As soon as I started the project, I was introduced to some programming concepts in R that I was unfamiliar with. I was very excited because I was learning new concepts right from the start. As time went on this trend continued, and I gained a lot of experience working with real-world data. My R skills improved rapidly and I was happy with the progress that I was making. R wasn’t the only language that I wanted to improve upon. I also wanted to strengthen my Python skills, which were the weakest tool in my software toolbox. There were some tasks during my internship that I struggled to get working in R and that I knew could be done in Python, but I did not know exactly how. I spent about a week reading textbooks and watching videos until I finally had the skill set I needed to accomplish the task. I automated some file downloads from the internet using my new skills, which made the work that I was doing run about 10 times faster.
Overall, I gained this programming knowledge throughout the internship, which was the main goal I had set for myself this summer. As the program went on, my goals began to shift in priority. I discovered that although programming was still near the very top of my goals, my main goal had changed. I realized that to become a true data scientist, I would need to combine programming with analytical, writing, and presentation skills. Throughout this internship, these skills began to develop as I started to think of myself more as a data analyst than a statistical programmer. I have always been rather shy, but I worked on my presentation skills and was able to give some presentations I am proud of.
Another goal I had set for myself this summer was to get a feel for how I would like working 40 hours a week from 9-5. Honestly, at the beginning of the internship, this was something that I was dreading. I felt like I wouldn’t have enough time to do the things that I like to do. As the internship went on, I found myself enjoying this schedule, and I think there were a couple of reasons why. The first is that I spend many hours of the day doing data analysis of some sort anyway, and being able to work with real-world data motivates me even further. I found myself completing my tasks well before my mentor thought I would. The second reason is that since there was no homework after 5 pm, I could use that time to focus on my hobbies. I had felt drained because of school, but this internship helped me rekindle my passion for data analysis. I often found myself excited to show my mentor my results.
The last goal I had for myself this summer was to network. Before this internship, I had limited contacts in the real world. I needed to expand this network, especially with other data analysts. I gained confidence because I had to have conversations with multiple people every week during the summer. An unintended effect of this was that I was able to communicate more smoothly with others, specifically when I was reaching out to recruiters or other researchers. I combined these skills with my data analysis skills and checked off every box that I had set for myself this summer.
Something I wish I had learned at the beginning of the summer was how much work goes into setting up a project. My mentor and I often found ourselves changing the schema of our project when we encountered an issue. This happened multiple times, and it made me realize the importance of having multiple backup options in case something isn’t working correctly. I wish I had set aside time to do this when I began my project, because I know I could have been even more productive than I already was.
Research
This summer I worked on analyzing the phytoplankton phenology in the Subarctic Pacific Ocean. The main goal of this project was to see when phytoplankton blooms begin. It was previously thought that these blooms began at the beginning of spring. I was testing a hypothesis from the paper ‘Phytoplankton Phenology in the North Atlantic: Insights From Profiling Float Measurements’ (Yang et al., 2019).
When quantifying phytoplankton biomass, three variables are important for defining the beginning of the phytoplankton bloom. The first variable is the growth rate (μ), which is influenced by factors such as light and nutrients in the ocean. The second variable is the loss rate (l), which is influenced by factors such as grazing and depletion of nutrients in the water. The third variable is the specific accumulation rate (r), which is the growth rate minus the loss rate (r = μ - l). A phytoplankton bloom has initiated when the specific accumulation rate is greater than zero. The method I used to determine when the phytoplankton bloom was beginning was the monthly climatology of the data, which I will discuss a little later on.
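The relationship between the three variables can be illustrated with a short Python sketch. This is not the code used in the project; the function names and the rate values are hypothetical and chosen only to show the arithmetic. It assumes the simplest reading of the rule above: the bloom starts at the first time step where r = μ - l is greater than zero.

```python
def specific_accumulation_rate(growth, loss):
    """Return r = mu - l for paired growth and loss rates."""
    if len(growth) != len(loss):
        raise ValueError("growth and loss must have the same length")
    return [mu - l for mu, l in zip(growth, loss)]


def bloom_initiation_index(r_values):
    """Return the index of the first r value greater than zero, or None."""
    for index, r in enumerate(r_values):
        if r > 0:
            return index
    return None


# Hypothetical rates for four consecutive time steps (not project data).
growth_rate = [0.10, 0.12, 0.20, 0.35]  # mu
loss_rate = [0.15, 0.14, 0.18, 0.25]    # l

r = specific_accumulation_rate(growth_rate, loss_rate)
start = bloom_initiation_index(r)
print("r:", [round(value, 2) for value in r])
print("bloom initiates at time step index:", start)
```

In this made-up series, growth first exceeds loss at the third time step, so that is where the sketch places the start of the bloom.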
Our data was obtained from two different sources. The primary source of data was Biogeochemical (BGC) Argo floats. These are autonomous floats that are placed in the ocean. They record different metrics depending on the sensors attached to them. Some of these variables are pH, salinity, temperature, and light. All of these metrics play a role in quantifying the phytoplankton blooms, but each contributes its own piece of the overall project. Salinity and temperature determine density, while light is used to define the growth rate, μ. These metrics are sent to the Monterey Bay Aquarium Research Institute, where quality control takes place. This means the data is cleaned so that it can be more easily understood. These floats arguably capture the data better than other methods because they are directly in the ocean. This brings me to the second form of data used in this analysis: satellite imagery. Although this data is taken from the sky, it still produces accurate results when compared to the BGC Argo float data. Both types of data were plotted on the same axes and produced similar results.
From these two types of data, we were able to quantify the phytoplankton blooms, as well as the period in which they emerged. I grouped the data by date and took the median of all of the points for every date. I chose the median because there were some outliers in the data, and using it prevented those outliers from affecting the monthly climatology.
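The idea of a median-based monthly climatology can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the function name `monthly_climatology` and the sample values are hypothetical, and it assumes the observations are grouped by calendar month across all years.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median


def monthly_climatology(observations):
    """Group (date, value) pairs by calendar month and return the median per month."""
    by_month = defaultdict(list)
    for day, value in observations:
        by_month[day.month].append(value)
    return {month: median(values) for month, values in sorted(by_month.items())}


# Hypothetical observations from several years (not project data).
sample = [
    (date(2015, 3, 4), 0.02),
    (date(2016, 3, 10), 0.03),
    (date(2017, 3, 21), 0.90),  # an outlier
    (date(2015, 4, 2), 0.05),
    (date(2016, 4, 15), 0.07),
]

climatology = monthly_climatology(sample)
march_values = [value for day, value in sample if day.month == 3]
print("March median:", climatology[3])
print("March mean:", round(mean(march_values), 3))
```

With these made-up numbers, the single outlier pulls the March mean far above the typical values, while the March median stays at the middle observation. That is the reason for preferring the median here.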
The final figure shows that the bloom begins in March. This did not align with the hypothesis that I initially set out to prove. I believe one of the main reasons for this was the location where these measurements were taken. In the Yang et al. (2019) paper, the authors focused on the North Atlantic. They also mentioned that using their methods in different locations might produce different results.
Overall, the project was a good experience because I got to work with real-world data as well as contribute to a meaningful project. Even though the result was different from what I expected, a lot of the steps along the way produced great results. I was satisfied with my internship this summer.
Figure 1. Growth rate (μ) from three different BGC Argo floats in the Subarctic Pacific Ocean. The Argo floats took measurements from 2010 to 2020. One of these floats is still active in the Gulf of Alaska.
Figure 2. Monthly climatology of the BGC Argo float data. Growth rate, loss rate, and r are all plotted on the left axis, and C Phyto (phytoplankton carbon) is on the right axis. A phytoplankton bloom has initiated when r reaches a value greater than zero.