Methodology

Project Goals

The goal of this project was to help Dr. Polliana Leru and her team at Colentina Clinical Hospital and Carol Davila University of Medicine and Pharmacy organize data and identify correlations between pollen counts, pollution, climate, and lifestyle choices to combat the rise in seasonal allergies. To achieve this goal, our team obtained all current data, created a software tool for data analysis, analyzed correlations in the data and compared them to those found in other countries, examined future trends, and devised a strategy to educate the public about the rise in seasonal allergies. These steps are outlined in Figure 9. Our timeline to complete these activities can be found in Appendix B.

As mentioned in the Introduction, due to the global outbreak of COVID-19, we were unable to travel to Bucharest, and all work was done remotely in the United States. Fortunately, data could be sent electronically, so the only change in our methodology was to the implementation of the lifestyle survey, which is explained later in this chapter.

Figure 9: Project Goals, Specific Objectives and Associated Methods

Compile Data on Pollen, Allergies, Climate and Lifestyle Choices

Before we started any analysis, we needed to collect all necessary data. Throughout the project, we worked with data on pollen counts, meteorological conditions, chemical air pollutants, and public interest. Data about pollen counts and air pollutants was sent by our collaborators, and additional meteorological data was collected by the National Meteorological Agency and the National Institute for Research and Development in Optoelectronics (INOE). There is little existing data about the types of lifestyle choices that Dr. Leru wanted us to analyze, so we developed our own method to collect this through a survey. Additionally, we were unable to get respiratory health data from Romania during the pandemic. Instead, we collected data about the general Romanian public’s interest in pollen and allergies using Google Trends.


3.1.1 Obtaining Pollen Data

Dr. Leru and her collaborators have been collecting pollen data using the pollen trap located at Colentina Clinical Hospital since 2014. Our team obtained this data from Dr. Leru’s PhD students and made our data analysis tool compatible with their data formats. We received pollen data for 37 different species, including Ambrosia (ragweed), grasses and trees, consisting of a daily count of pollen particles per cubic meter of air (Leru et al., 2018). Dr. Leru informed us that the pollen trap failed to correctly capture Ambrosia pollen in 2015, so we removed the Ambrosia 2015 data to prevent it from inaccurately skewing our results.
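The removal of the unreliable 2015 Ambrosia readings can be illustrated with a minimal sketch. The record layout, column names, and counts below are hypothetical and are not taken from the collaborators' files; the idea is only that the affected values are marked as missing rather than left in as misleading counts.

```python
from datetime import date

# Hypothetical daily records; the column names and counts are made up.
records = [
    {"date": date(2014, 8, 20), "Ambrosia": 35.0, "Poaceae": 4.0},
    {"date": date(2015, 8, 20), "Ambrosia": 2.0, "Poaceae": 6.0},
    {"date": date(2016, 8, 20), "Ambrosia": 41.0, "Poaceae": 3.0},
]


def mask_species_year(rows, species, year):
    """Return copies of rows with `species` set to None for the given year."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if row["date"].year == year:
            row[species] = None  # mark as missing instead of keeping a bad count
        cleaned.append(row)
    return cleaned


cleaned = mask_species_year(records, "Ambrosia", 2015)
```

Marking the values as missing, instead of deleting the whole row, keeps the other species recorded on the same days available for analysis.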

3.1.2 Obtaining Climate and Chemical Air Pollution Data

Climate data was provided along with pollen data by Dr. Leru’s PhD students. It was collected from the National Meteorological Agency (Global Surface Archives, 2019) and consists of measurements of temperature, dew point, relative humidity, solar radiation, wind direction, wind speed, and precipitation amount, all of which have been previously shown to affect the spread of pollen. We also received additional Bucharest climate data from her collaborators at INOE, allowing us to fill in the missing data points in our data set for the first half of 2014 and the last half of 2019.

Dr. Leru’s collaborators also gave us the chemical pollution data that she has obtained from the National Air Quality Monitoring Network stations B3 and B1 since 2014, containing measurements of PM10 (particulate matter with a diameter of 10 microns or less), PM2.5, ozone, toluene, benzene, butadiene, carbon monoxide, sulfur dioxide, nitrogen oxide, nitrogen dioxide, and alternative forms of nitrogen oxide species (NOx). Figure 10 below shows the locations of all six air quality stations in Bucharest in relation to the pollen trap location.

https://drive.google.com/file/d/1Kxm9uGTg88zhtGOFPLGfcEkHNIDZBQ_5/view?usp=sharing

Figure 10: Air Quality Monitoring Stations in Bucharest (National Air Quality Monitoring Network)

3.1.3 Gathering Data on Romanian Lifestyle Choices

As mentioned in the Background, studies from other countries have found a correlation between lifestyle choices and seasonal allergies. Lifestyle choices were shown to affect the individual’s health directly and to affect the environment indirectly by increasing pollen production. However, public data about the general lifestyle practices of the Romanian people are lacking. Therefore, we designed a survey to begin collecting this data, which can continue to be distributed until a large enough data set is collected. To design an effective survey, we looked at similar studies done in Romania and other European countries and applied their methods.

3.1.4 Iteration of Survey Questions

Our original plan was to conduct a focus group of 20 participants from Colentina Clinical Hospital with various backgrounds and knowledge of allergies and medicine, including doctors, administrative staff, and patients, to revise the survey questions. However, because the project had to be carried out remotely, we instead sent the survey to Dr. Leru, who has experience distributing surveys and is an expert in allergies (Leru et al., 2015), as well as to a small group of local friends, family, and classmates, to gather feedback and sample responses for iterating on the questions. The final version of the survey questions can be found in Appendix A. We distributed our survey to a Facebook group for Romanian Ambrosia allergy sufferers to collect quantitative responses. Dr. Leru also distributed the survey to some of her patients.

3.1.5 Gathering Public Interest Data

Along with the lifestyle survey, we wanted to include data on the Romanian public’s interest in pollen allergies. We did this using Google Trends, a feature that shows statistics on Google searches in any given location around the world. A term that is searched frequently suggests that people are interested in learning more about it. We downloaded this data and used it as a proxy for the public’s level of allergy symptoms, on the assumption that the number of people searching for information about allergies and ragweed rises with the number of people affected by pollen.

Developing and Implementing a Data Analysis Tool

Dr. Leru and her team did not have a solid method for statistically analyzing their data, so we decided to develop a software tool that can analyze large sets of data and can be easily used by others with little statistics or computer science background. We then used this tool, termed the “Correlation Machine”, to analyze the data on pollen, pollution, climate, and public interest by calculating correlations and predicting future trends, and then compared the results to those from other countries.

3.2.1 Developing the Correlation Machine

To create a method for Dr. Leru and her team to easily analyze their data in the future, we designed a user interface (UI) with simple interactive features. We chose to code everything in Python due to its wide offering of libraries, widespread use, and comprehensive documentation. This allows our group and those who come after us to manage and use the program.

The data analysis features we implemented consist of compiling and managing data, calculating Spearman’s correlation coefficients, producing graphs, and generating lines of regression. We also created optional settings for the user to look at specific dates or seasons, calculate yearly or monthly averages, or extend a regression line to a future year. We chose these options based on the specific correlations Dr. Leru was interested in seeing. We also made the tool user friendly by programming it to display all correlations in a table, color code the numbers so that someone of any background can easily understand the results, and document the correlations by saving them to Excel. See the Results section 4.1 for a description of this final deliverable.
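As an illustration of the season and averaging options, the following is a minimal sketch using only the Python standard library. The function names, the month-based definition of a season, and the sample counts are assumptions made for this example and do not describe the Correlation Machine's actual code.

```python
from collections import defaultdict
from datetime import date


def in_season(daily, start_month, end_month):
    """Keep only days whose month lies in [start_month, end_month]."""
    return {d: v for d, v in daily.items() if start_month <= d.month <= end_month}


def monthly_averages(daily):
    """Average daily values per (year, month), skipping missing (None) days."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        if value is not None:
            buckets[(day.year, day.month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


# Hypothetical daily pollen counts keyed by date.
daily_counts = {
    date(2016, 8, 1): 10.0,
    date(2016, 8, 2): 30.0,
    date(2016, 9, 1): 50.0,
    date(2016, 12, 1): None,
}

season = in_season(daily_counts, 8, 9)  # August through September
averages = monthly_averages(season)
```

Yearly averages follow the same pattern with the grouping key reduced to the year alone.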

We ensured the Correlation Machine was easy for anyone to distribute and install by tightly packaging all of the code. The user does not need to download anything other than a single executable file that they can easily open to run the program whenever they want.

3.2.2 Organizing the Data for Analysis

First, we used the Correlation Machine to organize all of the data into a consistent form for easy interpretation. Since the input data can be in many different formats, we made the tool adaptable to many variations. We designed the tool to be capable of continually updating with new data without any modifications to the source file. We implemented this feature so that our collaborator can continue to collect data and expand the results over time. Additionally, since many different types of data, such as pollen or climate, may be input to the tool, it allows the user to manually distinguish pollen columns from the others and applies minimal processing to normalize all data.

We used these features to compile all of the data into one large “master” file with the dates shown on the leftmost side and each column representing a different category. Pollen categories were distinguished and the total pollen count per day was added to the data set.
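The layout of such a master table can be sketched as follows: several sources are merged by date, and a total pollen column is computed from the columns marked as pollen. The column names, the `POLLEN_COLUMNS` set, and the sample values are hypothetical and stand in for the real data formats.

```python
from datetime import date

# Which columns count as pollen; in this sketch the user lists them by hand.
POLLEN_COLUMNS = {"Ambrosia", "Poaceae"}


def build_master(*sources):
    """Merge several {date: {column: value}} sources into one table keyed by date."""
    master = {}
    for source in sources:
        for day, columns in source.items():
            master.setdefault(day, {}).update(columns)
    for row in master.values():
        counts = [row[c] for c in POLLEN_COLUMNS if row.get(c) is not None]
        # Leave the total missing when no pollen value exists for that day.
        row["Total pollen"] = sum(counts) if counts else None
    return dict(sorted(master.items()))  # dates in ascending order


# Hypothetical inputs from two different files.
pollen = {
    date(2016, 8, 20): {"Ambrosia": 41.0, "Poaceae": 3.0},
    date(2016, 8, 21): {"Ambrosia": 12.0, "Poaceae": None},
}
climate = {
    date(2016, 8, 20): {"Temperature": 27.5},
    date(2016, 8, 22): {"Temperature": 25.0},
}

master = build_master(pollen, climate)
```

Because rows are keyed by date, adding a new source later only extends the table and does not require changes to the earlier inputs.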

3.2.3 Analysis of Correlations

Once the data was compiled, we analyzed the correlations between pollen, climate, pollution, and public interest. We chose Spearman’s coefficient to calculate correlations because it is most commonly used in similar pollen and allergy studies as a way to statistically quantify relationships between variables. Spearman’s coefficient represents how well the data fits a monotonic function. A positive correlation means that, as the values of one variable increase, so do the values of the other variable, whereas a negative correlation means that one variable decreases as the other increases.

We used the Correlation Machine to calculate the coefficients and considered those greater than 0.4 or less than -0.4 to be significant (Akoglu, 2018). We also calculated a p-value for each coefficient, which is the probability of observing a correlation at least that strong if the two variables were actually unrelated. A p-value of less than 0.05 was considered statistically significant. We chose these boundaries based on standards used in the research community.
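For readers unfamiliar with the calculation, the sketch below computes Spearman's coefficient from first principles: each variable is converted to ranks (tied values share the average rank), and the ordinary Pearson correlation of the ranks is taken. The function names and sample values are illustrative assumptions, the 0.4 cutoff is the one stated above, and the p-value is not computed here; a statistics library such as SciPy (`scipy.stats.spearmanr`) returns both the coefficient and the p-value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def exceeds_threshold(rho, threshold=0.4):
    """True when rho is greater than 0.4 or less than -0.4."""
    return abs(rho) > threshold


# Hypothetical paired daily values (days with missing data already removed).
temperature = [18.0, 21.0, 25.0, 28.0, 30.0]
pollen_count = [5.0, 2.0, 40.0, 9.0, 120.0]

rho = spearman(temperature, pollen_count)
```

Because only the ranks matter, a single extreme pollen day affects the coefficient no more than any other day, which suits pollen counts that spike sharply in season.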

Correlation coefficients are a computational way to indicate cause for further investigation. While this does not prove causation between the variables, a strong correlation indicates the potential to predict future trends in the relationships between factors of climate, lifestyle choices, pollen levels, and seasonal allergies. We took into consideration the nature of the data to understand what the coefficients describe. We also consulted with our collaborator for her expertise in pollen allergies and used her guidance to conduct our analysis.

3.2.4 Graphing and Prospective Analysis

XY scatter plots have been used in other studies to represent the correlations of pollen symptoms to other factors; however, we designed our tool to display both an XY scatter plot and a box plot for any data set. A box plot with whiskers can convey a year’s worth of information at a glance and, unlike a yearly average, shows the spread of the data, providing a cleaner visual. After discussing with Dr. Leru, we determined that scatter plots were more relevant in this particular study for comparability with studies in other countries.

Another area where other studies are lacking is predictions of how the relationships evolve. We improved this method by adding regression as an additional level of analysis. Regression determines the relationship between variables using a single best-fit line and creates a predictive model of how they may interact in the future. We looked at three types of regression to analyze, since one may be more suited than another for any given relationship, and programmed the tool to be capable of plotting all three. We looked at linear regression, which uses a straight line, second-order polynomial, which uses a single-curved line, and LOESS, which shows a best-fit for non-linear trends. Since LOESS uses non-parametric functions for data smoothing, we could not use it to predict how the trend will continue in the far future. However, visual analysis of this type of trend was still important in understanding the relationship and predicting how the trendline might continue in the near future.
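The linear and second-order polynomial fits, and their extension to a future year, can be sketched with an ordinary least-squares fit. The code below is a standard-library illustration under stated assumptions: the function names and yearly totals are hypothetical, the years are centered on their mean for numerical stability, and the inputs must contain more distinct years than the polynomial degree. LOESS is not shown because it requires a dedicated smoothing routine.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit.

    Returns (x_mean, coeffs), where coeffs[k] multiplies (x - x_mean) ** k.
    """
    x_mean = sum(xs) / len(xs)
    cs = [x - x_mean for x in xs]  # centered x values
    n = degree + 1
    # Normal equations A c = b for the centered polynomial basis.
    a = [[sum(c ** (i + j) for c in cs) for j in range(n)] for i in range(n)]
    b = [sum(y * c ** i for c, y in zip(cs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return x_mean, coeffs


def predict(model, x):
    """Evaluate a fitted polynomial at x, which may lie beyond the observed range."""
    x_mean, coeffs = model
    return sum(c * (x - x_mean) ** k for k, c in enumerate(coeffs))


# Hypothetical yearly pollen totals.
years = [2014, 2015, 2016, 2017, 2018, 2019]
annual_totals = [100.0, 112.0, 118.0, 131.0, 139.0, 152.0]

linear = fit_polynomial(years, annual_totals, degree=1)
quadratic = fit_polynomial(years, annual_totals, degree=2)
linear_2025 = predict(linear, 2025)  # regression line extended to a future year
quadratic_2025 = predict(quadratic, 2025)
```

The two models can diverge quickly outside the observed years, which is one reason extrapolated values from a short series should be read as tentative.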

We completed a prospective analysis to predict future pollen and allergy trends based on our correlation and regression results, current climate and pollution patterns in Bucharest, and results from other studies. These regressions and projections were combined with contextual knowledge of how pollens interact with various factors, as well as how different factors may behave over time. Climate change illustrates why this context matters: if one were to fit a linear regression to the last three years’ weather, one might falsely conclude that average temperatures are dropping and therefore that the climate is cooling. One must distinguish between weather and climate to understand that three years of decreasing temperature does not indicate global cooling, as climate should be measured not over a matter of years but over decades or centuries. For this reason, we could make preliminary predictions, but confident projections cannot be made until much more data is gathered.

3.2.5 Comparing Results to Other Countries

Models of the relationships between factors are important because they provide us with the necessary information to draw conclusions. We compared the conclusions we drew to those of similar studies from the United States and other European countries. We chose the studies for comparison based on whether they used methods similar to ours, such as Spearman’s coefficients or linear or polynomial regressions. We also used studies with data sets spanning more than five years to see which trends they found to be consistent over longer periods of time. The purpose was to make connections between Bucharest and other countries regarding the rise in pollen allergies and to emphasize the need for more research on this problem in Romania.

3.2.6 Documentation

To facilitate the use of the Correlation Machine, we also created technical documentation, written as clearly and literally as possible. It provides technical information about the development of the tool, how the internal code works, possible ways to modify the source code, and how to remake the executable file once a modification is made to the source code. See Appendix D for technical documentation of the tool. We also made sure to leave clear comments in the code itself to briefly describe the goal of the individual lines or blocks of code to help future contributors understand, replicate, or update the source code. We publicly published all of our source code to GitHub (https://github.com/trschaeffer/PollenTool) for anyone who wants to use it or make modifications in the future, with special care taken to not share our collaborator’s data along with it.

We also created a User’s Guide with step-by-step instructions and a series of YouTube video tutorials to help anyone who has limited experience with statistics or computer science use the tool. These additional documents contain annotated photos to help guide the reader and have a flexible format to allow people who are not fluent in English to learn at a pace that is comfortable for them. We made the guide and video tutorials accessible through links in a “Help” tab within the tool itself. A link to the User’s Guide can be found in Appendix D.

To assess the effectiveness of our documentation, we evaluated how well we met the guidelines of Nielsen’s Heuristics and adjusted the functions of the tool to satisfy them as best we could. We also sent the tool along with the User’s Guide to Dr. Leru and our advisor Professor Addison to test. They assessed how easily an inexperienced user could install and use the tool to analyze a small data set based solely on the documentation. Using their feedback, we made adjustments to the appearance of the tool and wording of the guide to ensure our materials were user-friendly.

Informing and Educating the Romanian Public

In addition to analyzing the data, we wanted to inform and educate the public about pollen allergies and the related factors so that people can try to reduce the impact of these allergies on the population. Keeping in mind the cultural and social implications of Romania's forty-five-year history as a communist country, we believed that the most neutral and effective approach to raise awareness while avoiding any semblance of propaganda was to create a Facebook page to provide a social media presence for pollen allergy research. Additionally, we designed an informative pamphlet to be distributed to the public and a website to showcase all of our project work.

3.3.1 Facebook Page

The purpose of the Facebook page is to provide information to the public in an easy and accessible way, as well as gain more interest in research and encourage people to reach out if they have any questions. Our Facebook page provides the link to the lifestyle survey for our collaborators to collect more data. It contains posts about basic pollen and seasonal allergy information, updates on research, and has links to useful articles and news. We created this page and posted at least once a week as our project progressed to help form a strong informational basis for our visitors. We handed over the page as a deliverable at the end of the project.

3.3.2 Pamphlet

In addition to the Facebook page, we designed a pamphlet that presents, in a compact format, basic information about pollen and seasonal allergies on a global scale, common symptoms, and several prevention and treatment methods. The pamphlet can be distributed as physical or electronic copies. It includes a link to the Facebook page and our contact information. We revised the design based on feedback from our collaborator and advisors, making sure that all information was accurate and that no images infringed copyright laws. The final version was sent to Dr. Leru along with our other deliverables.

Desired Outcomes

The goal of this project was to assist Dr. Leru in addressing the pollen allergy increase in Bucharest. The first step was to analyze previously collected data regarding pollen, pollution, climate, and public interest. Then, we streamlined the analysis of future data and created an easy way to add new data to the pre-existing data set. Additionally, comparing our findings to those from other countries helped us to put Romania in the global context of pollen allergies in order to find ways to decrease their negative impact in Bucharest. Finally, we created materials to raise awareness about the rise in allergies to help further educate the Romanian public. We also created a website for our project, containing all the deliverables, as well as information about our group members. We hope to show that further support for allergy research is valuable in helping the Romanian people learn more about how pollen and seasonal allergies affect their region.