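This project uses two complementary datasets from the LiTraj benchmark suite: a large, lower-accuracy BVSE-NEB set and a small, high-accuracy DFT-NEB set. The difference in scale and accuracy is what makes transfer learning possible: a model is pretrained on the large set and then fine-tuned on the small one.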
Size: 122,421 lithium vacancy migration hops
Method: BVSE-NEB (Bond Valence Site Energy + Nudged Elastic Band)
The BVSE-NEB dataset contains migration barriers computed with the empirical bond valence site energy (BVSE) model rather than first-principles calculations. BVSE is less accurate than DFT, but it is cheap enough to estimate barriers at this scale.
Key characteristics:
Each data point corresponds to a single lithium vacancy hop.
Structures are provided as supercells.
A special centroid atom (“X”) marks the diffusion bottleneck.
The dataset includes predefined train/validation/test splits.
Role in this project:
This dataset is used for large-scale pretraining, so that the model learns how local structure relates to migration barriers before it sees any DFT data.
Size: 1,681 lithium vacancy migration hops
Method: DFT-NEB (Density Functional Theory + Nudged Elastic Band)
The DFT-NEB dataset contains high-accuracy migration barriers computed with density functional theory.
Key characteristics:
Migration paths are fully relaxed using DFT-NEB.
Forces and energies are computed from first principles.
Structures include centroid representations for GNN input.
Role in this project:
This dataset serves as the high-fidelity benchmark. It is used for:
Fine-tuning pretrained models
Training a scratch baseline
Final evaluation
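The three roles above can be illustrated with a deliberately tiny sketch. It is not the project's GNN pipeline: the model is a one-variable linear fit, the data are synthetic stand-ins rather than LiTraj samples, and the names `fit`, `mae`, `low_fidelity`, `high_train`, and `high_test` are hypothetical. It only shows the order of operations: pretrain on plentiful low-fidelity labels, fine-tune on scarce high-fidelity labels, and compare against a scratch baseline.

```python
def fit(data, w=0.0, b=0.0, lr=0.1, epochs=20):
    """Full-batch gradient descent on mean squared error for y = w*x + b."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def mae(data, w, b):
    """Mean absolute error of the linear model on (x, y) pairs."""
    return sum(abs(w * x + b - y) for x, y in data) / len(data)


# Synthetic stand-ins (NOT LiTraj data): the low-fidelity labels are plentiful
# but systematically biased; the high-fidelity labels are exact but scarce.
low_fidelity = [(i / 199, 1.8 * (i / 199) + 0.6) for i in range(200)]
high_train = [(x, 2.0 * x + 0.5) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]
high_test = [(x, 2.0 * x + 0.5) for x in (0.2, 0.6, 0.8)]

# 1) Pretrain on the large low-fidelity set.
w_pre, b_pre = fit(low_fidelity, epochs=2000)

# 2) Fine-tune: start from the pretrained weights, short budget on high-fidelity data.
w_ft, b_ft = fit(high_train, w=w_pre, b=b_pre, epochs=20)

# 3) Scratch baseline: same data and budget, but default initialization.
w_sc, b_sc = fit(high_train, epochs=20)

# 4) Evaluate both models on the same held-out high-fidelity test set.
mae_finetuned = mae(high_test, w_ft, b_ft)
mae_scratch = mae(high_test, w_sc, b_sc)
```

In this toy setup the fine-tuned model reaches a lower test error than the scratch model under the same training budget. That is a property of the synthetic data chosen here, not a result reported by the project.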
To ensure fair evaluation:
DFT data was split into 80% training and 20% test.
The test set remained untouched during training.
Both scratch and fine-tuned models were evaluated on the same test set.
Because both models are scored on the same held-out hops, any difference in test error reflects the training strategy rather than the data.
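A minimal sketch of such a split is shown below. The source does not say how the 80/20 split was drawn, so the seeded random shuffle, the seed value, and the function name `split_indices` are assumptions made for illustration.

```python
import random


def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices once with a fixed seed and hold out a test subset."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG: reproducible, no global state
    n_test = round(n_samples * test_fraction)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


# 1,681 DFT-NEB hops -> 1,345 training hops and 336 held-out test hops.
train_idx, test_idx = split_indices(1681)
```

Fixing the seed and storing the resulting index lists means the scratch and fine-tuned runs can load exactly the same test hops, which is what the comparison depends on.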