Figure 3 shows the experimental results for three cell-level tasks: panels (a) and (b) evaluate Batch Effect Correction, panels (c) and (d) evaluate Multi-omics Data Integration, and panels (e)-(g) evaluate Cell-type Annotation. Our evaluation on these cell-level tasks shows that single-cell LLMs can reduce batch effects and annotate cell types.
Figure 3. Experimental results of single-cell LLMs and benchmarking methods for cell-level tasks. (a): Overall assessment of the raw data and of the data after batch effect correction by different methods. scGPT full denotes the scGPT model pre-trained on larger datasets than scGPT. (b): The effect of hyper-parameters, including Bins, Learning rate (LR), and Epoch, on scGPT training in the Batch Effect Correction task. (c): Results of different initial settings for the Multi-omics Data Integration task. (d): The effect of hyper-parameters, including loss weight, mask ratio, and epoch, on scGPT training in the Multi-omics Data Integration task. (e): Comparison among models in the Cell-type Annotation task. The scores on the left are the average accuracy of each model across datasets. (f): The effect of hyper-parameters, including LR, Epoch, and ECS, on scGPT training in the Cell-type Annotation task. LR and Epoch are hyper-parameters shared by Geneformer, scBERT, and scGPT. (g): Ablation tests of the loss function components for the Cell-type Annotation task. The red component is significant (p-value < 0.05).