Tools
Baseline systems
We provide code to train baseline systems for DSGS-to-German translation in a public GitHub repository. The codebase contains scripts to preprocess data, train models, translate and evaluate. The underlying sequence-to-sequence toolkit is Sockeye, which is based on PyTorch.
For questions or comments regarding the code, please open an issue on GitHub. Pull requests with contributions are also very welcome.
Data set loaders
We added our training corpora to the sign_language_datasets library. They can now be loaded as TensorFlow Datasets (TFDS) data sets. For example, provided that you have obtained Zenodo access tokens:
import tensorflow_datasets as tfds

# Importing this module registers the sign language datasets with TFDS
import sign_language_datasets.datasets
from sign_language_datasets.datasets.config import SignDatasetConfig

# Populate your access tokens
TOKENS = {
    "zenodo_focusnews_token": "TODO",
    "zenodo_srf_videos_token": "TODO",
    "zenodo_srf_poses_token": "TODO",
}

# Load only the annotations, and include paths to the video files
config = SignDatasetConfig(name="annotations", version="1.0.0", process_video=False)
wmtslt = tfds.load(name="wmtslt", builder_kwargs={"config": config, **TOKENS})
Example: loading training data as a TFDS data set
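Hard-coding access tokens in a script makes them easy to leak. As a minimal sketch, the tokens could instead be read from environment variables. The keyword names below are taken from the example above, but the environment variable names (the upper-cased keys) and the helper tokens_from_env are assumptions made for illustration and are not part of the sign_language_datasets library.

```python
import os

# Keyword names as used in the loading example; the matching environment
# variable names (upper-cased keys) are an assumption of this sketch.
TOKEN_KEYS = (
    "zenodo_focusnews_token",
    "zenodo_srf_videos_token",
    "zenodo_srf_poses_token",
)


def tokens_from_env(environ=None):
    """Hypothetical helper: collect the three tokens from environment variables.

    Raises ValueError if a token is unset, empty, or still the "TODO" placeholder.
    """
    environ = os.environ if environ is None else environ
    tokens = {}
    missing = []
    for key in TOKEN_KEYS:
        value = environ.get(key.upper(), "").strip()
        if not value or value == "TODO":
            missing.append(key.upper())
        else:
            tokens[key] = value
    if missing:
        raise ValueError("Missing Zenodo access tokens: " + ", ".join(missing))
    return tokens
```

With the variables exported in the shell, TOKENS = tokens_from_env() can replace the literal dictionary, and the call to tfds.load stays unchanged.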
See the README for further instructions and usage examples. For questions or comments regarding this loader, please open an issue on GitHub.