Beginnings/Prelude
The first signposts on my road to research appeared over the last year and the summer. Through AP Statistics and the Biotechnology Research class, I discovered a deep interest in statistical data analysis.
In AP Statistics, I learned the uses and importance of statistics in interpreting experimental results, and I applied those skills in the Biotechnology Research class, where I was in charge of statistical analysis.
I enjoyed this experience so much that I kept pursuing statistics over the summer, taking probability courses and reading further into the applications of statistics.
All of these road markers came into play when I began choosing my research topic. After looking into various topics, I found a project that I loved and that drew on all of my interests: designing a program to quantify statistical errors in molecular dynamics averages. The project is statistics-heavy, allowing me to continue my study of statistics, and it also ties statistics to computer science and biology, two subjects I enjoy.
I was initially hesitant to continue with this project. Further research into the topic revealed several possible roadblocks, chiefly my unfamiliarity with the terminology of molecular dynamics and of statistics. My resolve eventually solidified as I became more familiar with the terminology of my field, which cleared the fog along the path, and as I adjusted my research plan in light of the existing research.
The Winding Roads of Research
I began by researching both statistical tests and the terminology of Molecular Dynamics.
Learning more about Molecular Dynamics (MD) simulations, I found that assigning uncertainties to the computed results is vital, because it allows statistically reliable conclusions to be drawn.
This is why proper statistical programs are essential when analyzing MD simulations.
This winding road of research led to a clear destination: my first research objective of developing a user-friendly computer program to quantify statistical errors in molecular dynamics averages.
The program will be written in Python.
The Mann-Kendall test, a nonparametric trend test, will play a major role in testing for statistical errors.
I plan to adopt the set of tests suggested by Schiferl and Wallace in "Statistical errors in molecular dynamics averages," which uses four separate statistical tests:
A test for lack of trend in the X values, using a Mann-Kendall test
A test for lack of trend in the standard error S, using another Mann-Kendall test
A test for normality of the X values, using either a W test or a shape test
A test for lack of positive correlation in the X values, using a one-tailed von Neumann test
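As a first illustration of the trend test named above, the following is a minimal sketch of a Mann-Kendall test in Python. It is not the finished program: the function name `mann_kendall` and the example data are hypothetical, and the p-value relies on the usual normal approximation with a continuity correction, which is generally considered adequate only for roughly ten or more observations.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Hypothetical helper for illustration; uses the normal approximation.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S = (number of increasing pairs) - (number of decreasing pairs), i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the null hypothesis of no trend,
    # corrected for groups of tied values.
    var_s = n * (n - 1) * (2 * n + 5)
    for t in Counter(values).values():
        var_s -= t * (t - 1) * (2 * t + 5)
    var_s /= 18.0

    # Continuity-corrected standard normal statistic.
    if var_s == 0 or s == 0:
        z = 0.0
    elif s > 0:
        z = (s - 1) / math.sqrt(var_s)
    else:
        z = (s + 1) / math.sqrt(var_s)

    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p_value


if __name__ == "__main__":
    # Made-up data with an obvious upward drift, for demonstration only.
    drifting = [1.0, 1.2, 1.1, 1.5, 1.7, 1.6, 2.0, 2.3, 2.2, 2.6]
    s, z, p = mann_kendall(drifting)
    print(f"S = {s}, Z = {z:.3f}, p = {p:.4f}")
```

A small p-value suggests a trend, meaning the series fails the "lack of trend" check. The same function could be applied to the X values and to the standard errors S, since both of those tests are Mann-Kendall tests.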
An outline of the program being developed is given on the right:
Key words within this project include:
Molecular Dynamics (MD)
Statistical Sampling
Error Analysis
Mann-Kendall Statistical Test
Serial Correlation
Upon further research, new crossroads appeared before me, opening multiple subpaths to pursue. I found several alternative tests that have also been suggested, including the Wilcoxon-Mann-Whitney rank test and the recently introduced bootstrap rank Welch test for stochastic equality. This gives me the opportunity to implement each of these tests in my program, compare them against one another on different types of data, and further optimize the program.
Python logo 01 From Wikimedia Commons accessed 10 December 2022, <https://commons.wikimedia.org/wiki/File:Python_logo_01.svg>
Creative Commons Licensed
The Future Path
The next tangible step in my project is to develop the program, starting with the individual statistical tests and working up from there. I will first implement the Mann-Kendall test and check it against reference data.
I look forward to continuing this journey of research and tackling any roadblocks that come my way. Thank you to everyone who has supported me thus far in my project.