In summary, we trained a machine learning model that predicts the variant final determination from two numerical variables with 93.10% accuracy.
This is a promising start, but there is ample opportunity to train and expand our model further. For example, CF has other biological and clinical markers, such as sweat test results, pulmonary function test results, and pancreatic function. In clinical settings, sweat tests measure chloride concentration in newborns to help predict a CF diagnosis [1]. Pulmonary function tests give insight into a patient's respiratory function through readouts such as functional residual capacity (FRC), vital capacity (VC), slow vital capacity (SVC), expiratory reserve volume (ERV), and residual volume (RV) [2]. Factors in the blood, such as immunoreactive trypsinogen (IRT), can also be screened for [3]. Adding these factors to the training and test data may make the model more versatile, allowing it to predict patient diagnosis from information beyond genetic testing. Additionally, future work should use larger data sets that include non-CF patients.