Important Documents

Below you will find copies or links to our most important supplemental documents.

Survey Questions (Appendix A)

Proposed Timeline (Appendix B)

Pamphlet (Appendix C)

Correlation Machine Technical Documentation (Appendix D)

The tool that we created is a data analysis program written in Python that determines the relationships between airborne pollen counts and environmental factors such as climate and air pollution. The full code can be found on GitHub: https://github.com/trschaeffer/PollenTool


We have also written a detailed User’s Guide including video tutorials so that anyone can learn how to use the tool. It can be accessed from this link:

https://docs.google.com/document/d/1AO7KPz4ybl1cILyjgckGVQ38X49zzYsnGo60tWddggA/edit?usp=sharing


D.1 Development of the Data Analysis Tool

The development of the pollen tool began as a segmented process in which the individual functions were written before being integrated into the user interface (UI). The three main tabs of the UI (Data Entry, Correlation, and Plotting) became three separate Python files, which are imported into the main UI file. The data entry file reads the data that the user inputs and merges it into the master Excel file using the openpyxl library. The correlation functions filter the data into a format that can be analyzed using Spearman’s correlation methods. The plotting functions process the data and generate plots, including various forms of regression, using the Python library Matplotlib.

D.2 Development of the User Interface

Our data analysis tool was built with robust logic so that it would be flexible and applicable to as many situations as possible. Below are some of the decisions that we made to ensure that the tool could be used for all of our collaborator’s data.

We have made data import flexible. The tool accepts .xls and .xlsx files, and the format in which data is stored in these files can vary. The user can either import a new data file and add it to the master data set, or load the master data set directly. The master data set consists of all of the data the user has imported in the past. This allows all of the different types of data, such as pollution, climate, and pollen, to be stored in one place for easy access by the tool. Since the daily pollen counts are currently collected manually, we have also created the option to enter individual data points into the master data set using the tool and to calculate the total pollen per day.

The tool is also able to calculate correlations and generate graphs based on various conditions the user can opt to use. The user may filter the data to include only certain dates, specify the pollen season, calculate monthly or yearly averages, and plot and extend regression lines into the future.

D.2.1 Data entry logic

One of the innovative features of our tool is that it can be continually updated with new data, which makes its analysis more accurate over time. The challenge lies in the variety of formatting styles, so our tool must be flexible enough to receive and understand these differences. We received files with three different formats and two different file types. While these different formats were initially a challenge, they allowed us to enhance the adaptability of our program. The spreadsheet that will contain all data, which we refer to as the “mastersheet”, uses the .xlsx file type with dates in mm/dd/yyyy format. When importing data to the mastersheet, the tool tries to identify the format of the data file by checking cells ‘A2’, ‘A3’, and ‘B1’ for a date, whether it is a written-out month or a formatted date. If written-out months are given, the year is extracted from ‘A1’. The program then uses the orientation of the dates to extract the category information and data from the file. With this, as well as several functions to merge and filter data by date and category, we were able to get all data into a single format and onto the mastersheet. The mastersheet is loaded and updated whenever a new file is merged into it.
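The format check described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names, the dictionary of cell values, and the mapping from each checked cell to an orientation are all assumptions made for the example.

```python
from datetime import date, datetime

MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")


def looks_like_date(value):
    """Return True for a date object, an mm/dd/yyyy string, or a month name."""
    if isinstance(value, (date, datetime)):
        return True
    if not isinstance(value, str):
        return False
    text = value.strip().lower()
    if text in MONTHS:
        return True
    try:
        datetime.strptime(text, "%m/%d/%Y")
        return True
    except ValueError:
        return False


def detect_orientation(cells):
    """Guess whether dates run down column A or across row 1.

    `cells` maps cell references such as "A2" to their values
    (a hypothetical stand-in for reading the worksheet).
    """
    # Assumption: a date in A2 or A3 means dates are listed down column A.
    if looks_like_date(cells.get("A2")) or looks_like_date(cells.get("A3")):
        return "dates_in_column"
    # Assumption: a date in B1 means dates are listed across the first row.
    if looks_like_date(cells.get("B1")):
        return "dates_in_row"
    return "unknown"
```

For example, `detect_orientation({"A1": 2019, "B1": "March"})` reports that the dates run across the first row, in which case the year would be taken from ‘A1’.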

D.2.2 Merging, filtering, and maintaining data integrity

A spreadsheet contains three key elements that our program must be able to handle: categories, dates, and data. These are extracted using the techniques above. Categories are stored as a list of length N, dates as a list of length M, and data as a list of N lists, each of length M.

The first operation performed on this dataset is to remove any categories and dates with no data in them. Any cells that are blank or contain “-”, “--”, or “nan” are considered empty and are converted to None. Empty categories are removed so that the user cannot select one for analysis, which would produce no results. Dates containing no data are also removed so that the program runs faster and file sizes do not grow unnecessarily. Whenever a date or category is deleted from its list, special care is taken to resize the main data list as well, so that its indexes continue to align with the indexes of the dates and categories. Each deletion also generates a string describing the action, which can be reported to the user.
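A minimal sketch of this cleanup step is shown below. It assumes the list layout described above (N category names, M dates, and N lists of M values); the function and variable names are illustrative and not taken from the tool itself.

```python
EMPTY_MARKERS = {"", "-", "--", "nan"}


def normalize(value):
    """Map blank-like cells to None, leaving real values untouched."""
    if value is None:
        return None
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return None
    if isinstance(value, str) and value.strip().lower() in EMPTY_MARKERS:
        return None
    return value


def drop_empty(categories, dates, data):
    """Remove categories and dates that hold no data, keeping indexes aligned.

    Returns the trimmed lists plus a log of messages describing each removal.
    """
    data = [[normalize(v) for v in row] for row in data]
    log = []

    # Drop whole categories (rows) that contain only None.
    keep_cat = [any(v is not None for v in row) for row in data]
    for name, keep in zip(categories, keep_cat):
        if not keep:
            log.append(f"Removed empty category: {name}")
    categories = [c for c, keep in zip(categories, keep_cat) if keep]
    data = [row for row, keep in zip(data, keep_cat) if keep]

    # Drop dates (columns) for which no remaining category has a value.
    keep_date = [any(row[j] is not None for row in data) for j in range(len(dates))]
    for day, keep in zip(dates, keep_date):
        if not keep:
            log.append(f"Removed empty date: {day}")
    dates = [d for d, keep in zip(dates, keep_date) if keep]
    data = [[v for v, keep in zip(row, keep_date) if keep] for row in data]

    return categories, dates, data, log
```

Filtering rows and columns with the same boolean masks is one simple way to guarantee that the data list is resized together with the category and date lists.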

The second main operation, performed after the user selects a file, is to merge this file into the master data set. This gives the user the ability to work with all available data at once. Matching categories are brought together, as are matching dates. To decide whether two categories match, the category with the shorter name is compared against the longer name truncated to the same length. Any data within matching categories is also merged, with the data from the new file overwriting the data from the mastersheet. This is done so that if there is a mistake in the mastersheet, one can simply fix the original spreadsheet and import it again, rather than manually editing both the mastersheet and the original file. “Total pollen” is a special category: new data always overwrites all existing data in it. This ensures that if a former pollen category is changed to a non-pollen category, none of the prior totals remain.
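The matching and overwrite rules can be sketched as follows. For brevity this example stores each category as a dictionary from date to value rather than as aligned lists, and all names are hypothetical; it illustrates the rules described above, not the tool's actual implementation.

```python
def names_match(a, b):
    """Compare the shorter name against the longer one cut to the same length."""
    n = min(len(a), len(b))
    return n > 0 and a[:n] == b[:n]


def merge_category(master, new, replace_all=False):
    """Merge two {date: value} mappings for a single category.

    Values from `new` overwrite those in `master`. With `replace_all`
    the master values are discarded entirely.
    """
    merged = {} if replace_all else dict(master)
    merged.update(new)
    return merged


def merge_into_master(master, new):
    """Merge {category: {date: value}} mappings; unmatched categories go last."""
    result = {name: dict(values) for name, values in master.items()}
    for new_name, new_values in new.items():
        match = next((m for m in result if names_match(m, new_name)), None)
        if match is None:
            # New categories are appended to the end of the category list.
            result[new_name] = dict(new_values)
        else:
            # "Total pollen" is always replaced wholesale by the new data.
            is_total = match.startswith("Total pollen")
            result[match] = merge_category(result[match], new_values,
                                           replace_all=is_total)
    return result
```

Because the new file always wins, correcting a value in the original spreadsheet and importing it again is enough to correct the master data set.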

If a new category must be added, it is simply placed at the end of the list of categories. While the categories could instead be ordered alphabetically, appending makes more sense for us because pollen data can be loaded into the mastersheet first, followed by other factors, so that the most important data always appears first in drop-down menus. Upon merging files, the interface prompts the user to indicate whether the new categories are pollen categories. This is done so that the total pollen per day can be calculated from the pollen counts of any given day. To mark a pollen category, a marker, “(p)”, is added to the end of its name. The marker is always filtered out before data is presented to the user. To view and edit which categories are pollen categories, the user can list all categories along with their pollen status. Dates are merged chronologically, and future dates are appended to the ends of the categories. Special care is taken throughout these processes to ensure that the data stays associated with the correct dates and categories.
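The marker handling and the chronological merge of dates might look like the sketch below. The helper names are hypothetical, and the sketch assumes that dates are stored as mm/dd/yyyy strings and that the marker is appended without a separating space.

```python
from datetime import datetime

POLLEN_MARKER = "(p)"


def is_pollen(name):
    """Return True if the stored category name carries the pollen marker."""
    return name.endswith(POLLEN_MARKER)


def mark_pollen(name):
    """Append the marker to a category name unless it is already present."""
    return name if is_pollen(name) else name + POLLEN_MARKER


def display_name(name):
    """Strip the marker so the user never sees it."""
    if is_pollen(name):
        return name[:-len(POLLEN_MARKER)].rstrip()
    return name


def merge_dates(master_dates, new_dates):
    """Return the union of two lists of mm/dd/yyyy strings in chronological order."""
    combined = set(master_dates) | set(new_dates)
    return sorted(combined, key=lambda d: datetime.strptime(d, "%m/%d/%Y"))
```

Sorting on the parsed date rather than on the raw string matters here, because mm/dd/yyyy strings do not sort chronologically across years.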

D.2.3 Special treatment of “Pollen Categories”

There are several differences in the way that data is represented between our collaborator and the other institutes whose data we used. When the pollen count is zero, the cell for that category is left blank. When weather data is not taken, the cell is left blank or filled with a ‘-’. This raises a question for the tool: how should a blank or ‘-’ be handled to keep the data honest? Our resolution was to distinguish between the treatment of pollen and non-pollen categories.

When the user merges a file, just before the new categories are appended to the end of the master data set, the user is given a checklist of the new categories and asked to check off which are pollen types. This begins the special treatment of the pollen categories. A ‘(p)’ is added to the end of the category’s name to mark it as a pollen category, and -10000 is written into every blank slot of its data. The -10000 is treated as zero by the UI and simply marks that the value was added artificially. This makes it possible to later remove the pollen designation from a category, along with any zeros the tool added to its data. A button in the Data Entry section of the UI allows the user to switch categories between pollen and non-pollen using the same dialog as before. Having the user perform this task also allows the creation of a total pollen category, which is the sum of all pollen categories. There is a separate button to calculate this category initially, and whenever a pollen category is modified, it is automatically recalculated.
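The sentinel scheme can be illustrated with a short sketch. The constant and function names are assumptions for the example; only the value -10000, the ‘(p)’ marker, and the rule that the sentinel counts as zero come from the description above.

```python
ARTIFICIAL_ZERO = -10000  # marks a zero that the tool added to a blank cell


def fill_blanks(values):
    """Replace blanks in a pollen category with the artificial-zero sentinel."""
    return [ARTIFICIAL_ZERO if v is None else v for v in values]


def strip_artificial(values):
    """Undo fill_blanks when a category stops being a pollen category."""
    return [None if v == ARTIFICIAL_ZERO else v for v in values]


def as_count(value):
    """Treat blanks and the sentinel as a count of zero."""
    if value is None or value == ARTIFICIAL_ZERO:
        return 0
    return value


def total_pollen(categories, data):
    """Sum all categories marked '(p)' for each date."""
    pollen_rows = [row for name, row in zip(categories, data)
                   if name.endswith("(p)")]
    if not pollen_rows:
        return []
    return [sum(as_count(v) for v in day) for day in zip(*pollen_rows)]
```

Using a sentinel instead of a literal 0 keeps measured zeros distinguishable from zeros the tool inserted, so the inserted ones can be removed again without touching real measurements.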

D.3 Correlation and graphing

The program creates an array for each variable using “numpy.array”, then uses the function “spearmanr”, available from the SciPy library in Python, to calculate Spearman’s correlation coefficients. We used the “polyfit” function of the NumPy library to create a linear regression for each relationship. Polynomial regression applies the same “polyfit” function with a higher degree. LOESS regression is done using the “lowess” function of the statsmodels library.
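As a rough illustration of the correlation and polynomial-fit calls named above, the sketch below wraps them in a single hypothetical helper. The wrapper name and its return values are assumptions; `numpy.array`, `numpy.polyfit`, and `scipy.stats.spearmanr` are the library functions the text refers to.

```python
import numpy as np
from scipy.stats import spearmanr


def correlate_and_fit(x_values, y_values, degree=1):
    """Return Spearman's rho, its p-value, and polynomial fit coefficients.

    `degree=1` gives a linear regression; a higher degree gives a
    polynomial regression using the same polyfit call.
    """
    x = np.array(x_values, dtype=float)
    y = np.array(y_values, dtype=float)
    rho, p_value = spearmanr(x, y)
    coeffs = np.polyfit(x, y, degree)  # highest-order coefficient first
    return rho, p_value, coeffs
```

Because Spearman's coefficient is rank-based, any strictly increasing relationship yields a coefficient of 1 even when it is far from linear, which is why the regression fit is reported separately.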

D.4 Advanced Data Analysis Settings

To make the data analysis functions of the tool more flexible, we implemented some optional advanced settings for organizing the data. Because many years of data collection produce a very large number of data points, the graphs can easily become messy and difficult to interpret. To simplify the analysis, the user has the option to graph monthly or yearly averages of the data. This option is also available for the correlation calculations, so the user can obtain Spearman’s correlation coefficients for the averaged data they wish to analyze. If the option is selected, then when the user requests a correlation calculation or a graph, the tool reads in the data and calculates averages of the data points. The tool goes through every date in the file, unless a specific date range is specified (see below), groups the dates and corresponding data points by time period, calculates the average for each group, and reports the averages in a chronological array used by the graphing function or correlation calculation. This makes it much easier to visualize and analyze the data over a period of several years. Simplifying the data in this way may also bring to light patterns in yearly or monthly meteorological or pollution trends that influence the pollen season.
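The grouping and averaging steps can be sketched as below. The function name, the mm/dd/yyyy date strings, and the choice to skip missing values are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def period_averages(dates, values, period="month"):
    """Average values per month or per year, returned in chronological order.

    Returns a list of period keys, (year, month) or (year,), and a
    matching list of averages. Missing values (None) are skipped.
    """
    groups = defaultdict(list)
    for day, value in zip(dates, values):
        if value is None:
            continue
        parsed = datetime.strptime(day, "%m/%d/%Y")
        if period == "month":
            key = (parsed.year, parsed.month)
        else:
            key = (parsed.year,)
        groups[key].append(value)
    keys = sorted(groups)
    return keys, [sum(groups[k]) / len(groups[k]) for k in keys]
```

For example, daily values of 2 and 4 in January 2019 and 10 in February 2019 become monthly averages of 3.0 and 10.0, which can then be passed to the graphing or correlation functions in place of the daily arrays.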

The user can also choose to correlate or graph only data that falls within the pollen season or within a certain date range. If these options are selected, only data within the specified range is included in the calculations or graphs. To do this, the tool reads in each date, determines whether it is within the specified range, builds arrays of only the corresponding data points, and then completes the calculations and graphing functions using these filtered arrays. The tool can process any combination of the optional settings. These features are useful because the user may want to look at only certain dates within the pollen season, or at a certain month, week, or year.
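A minimal sketch of the filtering step follows. It assumes mm/dd/yyyy date strings and a pollen season given as (month, day) pairs that does not wrap around the end of the year; the function names are hypothetical.

```python
from datetime import datetime


def parse_date(day):
    """Parse an mm/dd/yyyy string into a datetime."""
    return datetime.strptime(day, "%m/%d/%Y")


def filter_by_range(dates, x_values, y_values, start, end):
    """Keep only the points whose date lies between start and end, inclusive."""
    lo, hi = parse_date(start), parse_date(end)
    kept = [(d, x, y) for d, x, y in zip(dates, x_values, y_values)
            if lo <= parse_date(d) <= hi]
    if not kept:
        return [], [], []
    d, x, y = zip(*kept)
    return list(d), list(x), list(y)


def in_season(day, season_start, season_end):
    """Return True if the date falls inside the season in any year.

    `season_start` and `season_end` are (month, day) tuples.
    """
    parsed = parse_date(day)
    return season_start <= (parsed.month, parsed.day) <= season_end
```

Filtering both variables together in one pass keeps each pair of data points aligned with its date, so the filtered arrays can be handed directly to the correlation or plotting code.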

D.5 Delivery of the Data Analysis Tool

Python is not a language designed for delivering finished products to users with little technical experience. This shortcoming was exacerbated by our working remotely, which meant that our team could not be present to assist with installation or provide technical guidance. A user running a Python program is typically expected to install Python, install the dependencies from the terminal, and finally launch the file through the same terminal. As an alternative, we use the PyInstaller package, a tool that creates a single executable file with all dependencies, even Python itself, included. While it creates a large file that may take a while to load and to be scanned by antivirus software, this is the best option for our work, as the file can be run with a single click.

Getting PyInstaller to work properly was challenging. The process involved finding all of the missing “hidden” imports. The steps taken to get PyInstaller to work were documented in a text file so that the process can be repeated in the future. Unfortunately, we found PyInstaller to be extremely unreliable when moving from computer to computer, so results may vary when following our process to create a .exe.


