Additional content for the reviewers
Friedman2017
BaranwalClark2022
Plots of probability distributions of relative abundances of bacteria in each dataset. For BaranwalClark2022, samples are grouped by community size.
Loss as computed by the smooth L1 and squared error functions. The smooth L1 parameter is varied between 0 (equivalent to the absolute loss) and 0.5.
The formulas are as follows:
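As a minimal sketch of the two losses, the code below assumes that the smooth L1 loss takes the common piecewise form (quadratic below a threshold, linear above it, as in PyTorch's SmoothL1Loss). The parameter name `beta` and this exact form are assumptions for illustration; they are not stated in the text above.

```python
def squared_error(pred, target):
    """Squared error for a single prediction."""
    diff = pred - target
    return diff * diff


def smooth_l1(pred, target, beta=0.5):
    """Smooth L1 loss for a single prediction (assumed piecewise form).

    beta is the assumed threshold parameter: the loss is quadratic for
    absolute errors below beta and linear above it.
    With beta = 0 it reduces to the absolute loss.
    """
    diff = abs(pred - target)
    if beta == 0 or diff >= beta:
        # Linear branch; the offset keeps the two branches continuous.
        return diff - 0.5 * beta
    # Quadratic branch for small errors.
    return 0.5 * diff * diff / beta
```

Under this assumed form, setting `beta=0` gives `smooth_l1(p, t, beta=0) == abs(p - t)`, which matches the statement that a parameter of 0 is equivalent to the absolute loss.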
Mean squared error (MSE) on the test set of GraphConv models trained with different loss functions. Models were fitted on 5 data splits, with 5 seeds; the mean and 95% confidence intervals are plotted.
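One way to summarise test MSE values over repeated runs is sketched below. It assumes a normal-approximation interval (mean plus or minus 1.96 standard errors); the method actually used for the plotted intervals is not stated here, and the function name `mean_ci95` is hypothetical.

```python
import math
import statistics


def mean_ci95(values):
    """Return (mean, lower, upper) for a 95% confidence interval.

    Assumption: normal approximation, i.e. mean +/- 1.96 * standard error.
    A bootstrap or t-based interval would give slightly different bounds.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # sample standard error
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width
```

Here `values` would hold one test MSE per fitted model (one per split and seed combination).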
Table describing the genome and annotation sizes in the real datasets. The average genome length and number (#) of annotations are given with the min and max in parentheses. The number of ubiquitous annotations corresponds to annotations that were detected in all genomes and were therefore removed. The number of non-ubiquitous annotations corresponds to the input length for models.
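The removal of ubiquitous annotations can be illustrated with a small sketch. It assumes the features are stored as a genome-by-annotation presence/absence matrix (a list of 0/1 rows); this representation and the function name `drop_ubiquitous` are assumptions for illustration only.

```python
def drop_ubiquitous(presence):
    """Drop annotations that are present in every genome.

    presence: list of rows, one per genome, each a list of 0/1 flags
              (one flag per annotation). Assumed layout, see lead-in.
    Returns the filtered matrix and the indices of the kept annotations.
    """
    n_annotations = len(presence[0])
    # Keep a column only if at least one genome lacks the annotation.
    keep = [
        j for j in range(n_annotations)
        if not all(row[j] for row in presence)
    ]
    filtered = [[row[j] for j in keep] for row in presence]
    return filtered, keep
```

With this layout, the number of kept columns, `len(keep)`, plays the role of the model input length described in the table.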
t-SNE plots of the feature spaces in the real datasets. Friedman2017: Pseudomonas species, Klebsiella aerogenes (Ea), and Serratia marcescens (Sm) are colored distinctly. BaranwalClark2022: bacteria are colored according to their taxonomic family. The three Actinobacteria clustering on the left belong to the Bifidobacterium genus while the others are Collinsella aerofaciens and Eggerthella lenta.
Figure 1: Addition of "grow to equilibrium" to specify that only the relative abundances at equilibrium are predicted.
Figure 2: Addition of pointers from the abundance at equilibrium to the targets in the "Simulate Lotka-Volterra dynamics".