ESPnet: End-to-end Speech Processing Toolkit
ESPnet is an end-to-end speech processing toolkit focused mainly on speech recognition. It is a scalable research framework that supports reproducible recipes and provides a complete setup for speech recognition, audio foundation models, and other speech processing experiments.
VERSA: Versatile Evaluation of Speech and Audio
VERSA is a comprehensive toolkit for evaluating speech and audio quality. It provides seamless access to over 90 evaluation/profiling metrics with 10x variants, enabling researchers and developers to assess audio quality along multiple dimensions.