For our final design, we developed a transformer neural network using the TensorFlow package, along with a random forest classifier to compare against the neural network's performance. We chose the random forest classifier over a random forest regressor, which was an alternative design we considered. Neural networks can learn much more complex interactions among the bacteria of the gut microbiome than a classical machine learning model can. In exchange, they are more demanding in terms of input format and architecture design. This means the network had to be built from scratch, but it gives us complete control over how the model works from start to end.

Each patient's data has two parts: an embedding vector for each bacterial sequence, and the CLR (centered log-ratio) transformed counts, which are used to scale those embeddings. Together these form a matrix of sequences per patient. This matrix is then passed through PCA for dimensionality reduction and a Multi-Head Attention layer. Multi-Head Attention is the core building block of modern transformer models such as ChatGPT, and it lets the model capture non-linear relationships between the sequences in its input. The output is fed into both a classifier and a regressor to predict the month. The regressor treats the month cyclically, since the end of the year is adjacent to the start of the year (December is next to January). The two predictions are then combined to arrive at a final prediction.
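A minimal NumPy sketch of three pieces of this pipeline, under stated assumptions: the CLR transform uses a pseudocount of 1 so that zero counts have a defined logarithm, each sequence's embedding row is assumed to be multiplied by its CLR value, and the cyclical regression target is assumed to be a sine/cosine encoding of the month (one common way to make a target wrap around). The `combine` function is hypothetical, since the exact way the classifier and regressor outputs are merged is not specified; PCA and the attention layer are left out.

```python
import numpy as np


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's taxon counts.

    Assumption: a pseudocount is added so zero counts have a defined log.
    """
    logs = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logs - logs.mean()


def patient_matrix(embeddings, counts):
    """Scale each sequence embedding by its CLR value (assumed scaling scheme).

    embeddings: (n_taxa, d) array, one embedding row per sequence.
    counts: (n_taxa,) raw counts for one patient.
    Returns an (n_taxa, d) matrix with one scaled row per sequence.
    """
    return np.asarray(embeddings, dtype=float) * clr(counts)[:, None]


def encode_month(month):
    """Map month 1..12 onto the unit circle so December sits next to January."""
    angle = 2 * np.pi * (np.asarray(month, dtype=float) - 1) / 12
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)


def decode_month(sin_cos):
    """Invert encode_month: return a (possibly fractional) month in [1, 13)."""
    sin_cos = np.asarray(sin_cos, dtype=float)
    angle = np.arctan2(sin_cos[..., 0], sin_cos[..., 1]) % (2 * np.pi)
    return 1 + 12 * angle / (2 * np.pi)


def combine(class_probs, reg_sin_cos, weight=0.5):
    """Hypothetical fusion of the two heads.

    The classifier's 12 month probabilities are turned into a point on the
    circle, averaged with the regressor's (sin, cos) output, then decoded.
    """
    months = np.arange(1, 13)
    clf_vec = np.asarray(class_probs, dtype=float) @ encode_month(months)
    mixed = weight * clf_vec + (1 - weight) * np.asarray(reg_sin_cos, dtype=float)
    return decode_month(mixed)
```

Averaging on the circle is what keeps the year boundary sensible: a classifier vote for January combined with a regressor output of December decodes to 12.5, halfway between the two months, whereas a plain average of 1 and 12 would give 6.5 (June/July).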
Our final results show that the model can predict the month to within roughly one month of the correct month, which indicates that the gut microbiome changes enough throughout the year for those changes to be detectable.
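Because months wrap around, predicting December for a January sample should count as one month off, not eleven. A minimal sketch of such a circular month error, assuming months are numbered 1 to 12 (fractional values allowed) and assuming "within roughly one month" refers to a mean absolute error of this kind; the exact evaluation metric is an assumption here.

```python
def circular_month_error(predicted, actual):
    """Shortest distance between two months on a 12-month cycle.

    Assumption: months are numbered 1..12; fractional months are allowed.
    """
    diff = abs(predicted - actual) % 12
    return min(diff, 12 - diff)


def mean_circular_error(preds, actuals):
    """Mean circular error over paired predictions and true months."""
    errors = [circular_month_error(p, a) for p, a in zip(preds, actuals)]
    return sum(errors) / len(errors)
```

For example, `circular_month_error(12, 1)` is 1 and `circular_month_error(1, 7)` is 6, the largest possible error on a 12-month cycle.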
We are still unsure what exactly about the different times of year leads to the different levels of gut microbe populations, but our current hypotheses are slight shifts in diet throughout the year, changes in weather, or different levels of sunlight.