My MARC project is officially complete! I'm very happy with the results I got - I found some interesting correlations between house structure and increased mosquito entry rates, as well as a correlation between poverty and increased mosquito entry rates. My final paper is complete and I gave my final presentation yesterday! Unfortunately, the presentation was moved to Zoom because our school is experiencing an uptick in Covid cases at the moment. But even so, I was glad to be able to share my project, and the presentation went smoothly (aside from one of my methods slides being skipped). I'm sad to be done with my MARC project, but I've learned so much over the last two years! Teaching myself to use R has been both challenging and rewarding. I've learned so much about the scientific process and the research community, and I'm so excited to see how I use the skills I learned in MARC in the future!
Today is the first day of Senior Project. For the next three weeks, I have no classes and will instead be working only on finalizing my MARC project. My main priorities are to write the results and discussion sections of my paper, finalize my manuscript, and complete my project presentation, which I will give to the community in the last week of May. I am very excited to spend the next few weeks working exclusively on my MARC project, and am looking forward to having a completed paper by the end of the month!
In the last month, I've spent a lot of time working on the introduction and methods sections of my final paper. At this point, I have full drafts of both of these sections, as well as peer edits and suggestions on both. In the next few weeks, I plan on beginning to incorporate these edits, especially after I get feedback from Amy. Spring break starts tomorrow, and while I will be out of town for some of it, I hope that I can continue working on my project. One of my priorities now is to ensure that I can display my results through figures and graphs in the results section of my paper and in my final MARC presentation. I also need to begin applying statistical significance tests to my data, as this is an extremely important aspect of presenting my results. Thus far, everything I'm doing with R is aligning with what I expected, so hopefully that will continue to be the case.
One of the most significant things I've been focusing on in the last month is linear regressions. I feel that I now have a good understanding of how to use them, and after working through some initial challenges, I have started applying them to my data. I've also been communicating with my mentor, and she gave some advice and helped me think about how I'm directing my project. In the next couple of weeks, I hope to continue working on linear regressions. I want to try adding more variables to make my models more complex. I've also begun working on the introduction section for my final paper. I realized upon starting the outline that I don't have information about how the wealth index that is included in one of my datasets is calculated. I reached out to my mentor and she sent a paper that described two different wealth indices and how they were calculated; however, she was not sure which one was in the data she sent me and suggested that I might have to recalculate it. Unfortunately, the methods the paper outlined are somewhat difficult. The authors used PCA (principal component analysis), which appears to be quite complicated, so I'm not sure it's feasible to learn how to do the calculations by the end of the year. I'm planning on discussing this with Amy to see if she has any suggestions.
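As a rough illustration of how these two pieces can fit together, here is a minimal sketch in base R. It assumes a common approach for asset-based wealth indices, where the first principal component of a set of household asset indicators is used as the wealth score; whether either index in the paper mentioned above is built exactly this way is an assumption. All data and column names (`has_radio`, `improved_roof`, `mosquitoes`, and so on) are simulated and hypothetical, not taken from the project's datasets.

```r
# Simulated data: nothing here comes from the real study datasets.
set.seed(42)
n <- 200

# A hidden "true wealth" value drives ownership of four household assets.
true_wealth <- rnorm(n)
assets <- data.frame(
  has_radio     = as.integer(true_wealth + rnorm(n) > 0),
  has_bicycle   = as.integer(true_wealth + rnorm(n) > 0.3),
  has_mobile    = as.integer(true_wealth + rnorm(n) > -0.5),
  improved_roof = as.integer(true_wealth + rnorm(n) > 0.8)
)

# PCA on the standardized asset indicators; the first principal component
# is used as the wealth score (an assumed, simplified recipe).
pca <- prcomp(assets, center = TRUE, scale. = TRUE)
wealth_score <- pca$x[, 1]

# The sign of a principal component is arbitrary, so flip it if needed
# so that owning more assets means a higher score.
if (cor(wealth_score, rowSums(assets)) < 0) {
  wealth_score <- -wealth_score
}

# Made-up outcome: mosquito counts that fall as wealth rises.
mosquitoes <- 10 - 2 * true_wealth + rnorm(n, sd = 2)

# Simple linear regression of the outcome on the wealth score.
model <- lm(mosquitoes ~ wealth_score)
print(summary(model))
```

More predictors can be added to the formula in the same way, for example `mosquitoes ~ wealth_score + house_type`, if such a column exists in the data.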
This last month, I've made a lot of progress with R, as well as with Microsoft Excel. One thing that's been incredibly effective is using RStudio alongside R. This has helped a lot with being able to work with multiple datasets at once and has generally helped me visualize all of my data. I also decided to write all of the datasets I have into Excel, both as a backup and because I believe it may be helpful for certain aspects of my project - such as combining datasets, which I hope to begin working on in the next couple of weeks. I've also figured out how to make and format effective data dictionaries using the R packages haven, labelled, and sjlabelled. This has helped me understand the datasets, so I have a much clearer picture of what I need to do.
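To show the idea behind a data dictionary, here is a small sketch using only base R. It relies on the fact that haven stores each variable's description in a `label` attribute on the column; the labelled package offers ready-made helpers such as `var_label()` for reading these. The toy data frame, its column names, and the `make_dictionary` helper are all hypothetical and only for illustration.

```r
# Toy data frame standing in for a dataset imported with haven.
# haven keeps each variable's description in a "label" attribute.
df <- data.frame(
  hhid     = c(101, 102, 103),
  rooftype = c(1, 2, 1),
  nets     = c(0, 2, 1)
)
attr(df$hhid, "label")     <- "Household ID"
attr(df$rooftype, "label") <- "Main roof material"
attr(df$nets, "label")     <- "Number of bed nets owned"

# Hypothetical helper: one row per variable with its label and type.
make_dictionary <- function(data) {
  labels <- vapply(data, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
  types <- vapply(data, function(col) class(col)[1], character(1))
  data.frame(
    variable = names(data),
    label    = unname(labels),
    type     = unname(types),
    stringsAsFactors = FALSE
  )
}

dictionary <- make_dictionary(df)
print(dictionary)
```

A table like this can also be written out with `write.csv(dictionary, "dictionary.csv")` to keep next to the data.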
I've also decided that at some point this semester, I hope to return to the lab. It doesn't seem feasible to work on anything malaria-related in the MA lab, as the safety measures necessary for that would be very difficult for a high school lab. However, I believe that I can add onto my semester-long project from MARC EED, in which I used C. elegans as a model organism for Alzheimer's disease to investigate whether Rapamycin (also called Sirolimus), a drug normally used in organ transplants, would be a viable treatment for AD. Rapamycin is now in the clinical trial phase for AD treatment, so I hope to find something different for my own project. However, this lab work may not begin until February or later, as I need to continue focusing on my work with R. Lab work would also be a viable option for May, when MA seniors embark on independent projects of their choosing. All participants in the MARC program work on their MARC project during this month, so spending that time working in the lab would be a good use of my time for my senior project.
Over the past month, I've continued my work with R. I especially started focusing on ggplot2, which my mentor suggested to me specifically. ggplot2 is part of the tidyverse collection of packages, and it is useful for making graphs and visually displaying data. I've also been working on labels and learning about how they are helpful in understanding my data. One thing I've found to be especially helpful in the past week is switching to RStudio, as it makes it much easier to see and understand the variables in each dataset. Some of the datasets are quite large, with one having upwards of 500 variables (columns), so I need to find a better way of breaking them down or only using the parts of them that I need. One exceptionally helpful resource that I've found in the last month is a free online textbook called R for Data Science. It shares an author, Hadley Wickham, with Advanced R, and is slightly more relevant to the work that I want to do.
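For a sense of what a basic ggplot2 graph looks like, here is a minimal sketch. It assumes the ggplot2 package is installed, and the data (house types and mosquito counts) are made up for illustration rather than taken from the project.

```r
library(ggplot2)

# Made-up example data: not from the study.
set.seed(1)
houses <- data.frame(
  house_type = rep(c("traditional", "modern"), each = 30),
  mosquitoes = c(rpois(30, lambda = 8), rpois(30, lambda = 3))
)

# Box plot comparing mosquito counts between the two house types.
p <- ggplot(houses, aes(x = house_type, y = mosquitoes)) +
  geom_boxplot() +
  labs(
    x = "House type",
    y = "Mosquitoes caught per night",
    title = "Mosquito counts by house type (simulated data)"
  )

print(p)
```

Swapping `geom_boxplot()` for another geom, such as `geom_point()`, changes the type of graph while the rest of the code stays the same.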
After meeting with my mentor some time ago, I have a much better idea of what I need to be working on with R. Over the past several weeks, I've been using a number of online tutorials and resources to learn how to work with the datasets that my mentor sent to me. Among these are a Harvard course through Coursera, a DataCamp course, and Advanced R, an online textbook. My focus at the moment is working on subsetting and learning how to clean up the datasets I have so that I will be able to analyze them. I've hit a number of roadblocks while trying to work with the data, as R can be somewhat finicky. For example, it took a good amount of trial and error just to figure out how to read the datasets into R so that the data came through in a way that made sense. I've gotten help from a member of the IT staff at my school - although he doesn't have direct proficiency in R, he's been able to help by applying his knowledge of Python. The MARC program coordinator has also been exceptionally helpful, and I've also been using a number of online forums to find answers to specific questions. I plan on reaching out to my mentor again soon to make sure I'm on the right track and to see if she has any suggestions for other R specifics I should focus on learning.
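As an example of the kind of subsetting and cleanup described here, this is a small base R sketch. The data frame, its column names, and the use of -99 as a missing-value code are all assumptions made for illustration; real survey files may mark missing values differently.

```r
# Made-up raw data; -99 is assumed to be a "missing" code.
raw <- data.frame(
  hhid        = 1:6,
  district    = c("A", "A", "B", "B", "C", "C"),
  rooftype    = c(1, 2, -99, 1, 2, 2),
  mosquitoes  = c(4, NA, 7, 2, 9, 5),
  interviewer = c("x", "y", "x", "y", "x", "y")
)

# Keep only the columns needed for the analysis.
keep  <- c("hhid", "district", "rooftype", "mosquitoes")
small <- raw[, keep]

# Recode the assumed missing-value code as NA.
small$rooftype[small$rooftype == -99] <- NA

# Drop rows that have a missing value in any remaining column.
clean <- small[complete.cases(small), ]

# Subset rows by a condition: every district except "A".
district_b_c <- subset(clean, district != "A")

print(clean)
print(district_b_c)
```

The same column selection works on a very wide dataset, since only the names listed in `keep` are carried forward.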
We're a few weeks into the school year now, and I've fully jumped back into working on my project. I contacted my mentor last week and set up a Zoom meeting with her for tomorrow, when I plan on talking with her about specific factors associated with malaria epidemiology that I will be analyzing. I've also found a number of free online courses and tutorials that should assist me in learning how to use R (some suggested to me by my mentor and Stori), and I'm going to continue working on that so that I can start on my data analysis as soon as possible.
With the start of the school year, I'm getting back into the rhythm of working on my MARC project again. I have just finished setting my SMART goals for the month of September. This month, I hope to reach out to my mentor to reconnect with her after the summer, and I hope to make a plan for learning how to use R. I have some online courses and tutorials saved, as well as a few classmates who have some experience with R. With all of these resources, I think that by the end of the month, it is feasible for me to have a plan for learning R and for me to have a plan for how I will specifically analyze the data in at least one of the databases. Senior year is off to a great start and I'm so excited for MARC this year!
In the last several weeks, I've homed in on my research question and hypothesis, as well as my specific aims for the project. I've also looked through some tutorials on R. But most notably, I finished my poster and video for the MARC end-of-year presentation (poster here). I also got to see what my classmates have been doing in the last semester, and I'm so excited for everyone's projects.
Over the last quarter, my project has come a long way. I decided to focus my project on house conditions and the effect they have on malaria cases, and Dr. Conrad sent me several databases to use. I'm so excited to continue working on my project over the summer and next year. My biggest goals for the summer are to become proficient with R and to keep a dialogue open with Dr. Conrad so that I can jump right back into my project when senior year begins.
On another note, a huge thank you goes to Stori. Your patience and guidance over the past three years have been incredible. You've offered your time, your support, and your expertise, and you've been such an important part of my experience at MA. I will miss you next year, but I'm so excited to see what you do next! Wishing you the best of luck.
In the past week, I've spent a lot of time working with R. After a lot of research, I figured out how to read the .dta files that my mentor sent me into R. I also started looking at the Excel spreadsheets that Stori was able to format for me. I'm continuing to think about a research question, and I'll be looking more closely at how I can use R to look for correlations in the data. I also spent time working on my proposal and started on my poster for the poster showcase at the end of the year. In the next week, I will continue working on my poster and proposal, and I'll keep learning about R to hopefully become more proficient in using it.
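For reference, one way to read a Stata .dta file into R is the haven package's `read_dta()` function; this may or may not be the exact route taken here. The sketch below assumes haven is installed. So that it runs on its own, it first writes a tiny made-up .dta file to a temporary path and then reads it back; with a real file, only the `read_dta()` line and the file path would be needed.

```r
library(haven)

# Write a tiny, made-up Stata file to a temporary location
# so the example does not depend on any real dataset.
path <- tempfile(fileext = ".dta")
write_dta(data.frame(hhid = 1:3, nets = c(0, 2, 1)), path)

# Read the .dta file into R as a data frame (a tibble).
survey <- read_dta(path)

# Look at the structure and the first few rows.
str(survey)
print(head(survey))
```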
This week, I met with my mentor to go over the specifics of what I'll be researching for the next year. She gave me more information about what data she could help me access, and we decided that the best way to move forward is for me to look at data regarding wealth, house structure/type, and profession. She sent me the databases that I will be using, and in the next couple of weeks, I will start to look over the data preliminarily. I will also need to learn how to use R for data processing, and I plan on looking for online resources to help me. I also continued to work on my project proposal.