ReproGen has two tracks: a shared task in which teams repeat existing human evaluation studies with the aim of reproducing their results (Track A below), and an ‘unshared’ task in which teams attempt to reproduce their own prior human evaluation results (Track B).
A. Main Reproducibility Track: For a shared set of selected human evaluation studies, participants repeat one or more of the studies and attempt to reproduce their results, using published information plus additional information and resources provided by the original authors, and making common-sense assumptions where information is still incomplete.
B. RYO Track: Reproduce Your Own previous human evaluation results and report what happened. This track is run as an unshared task.
ReproGen website: https://reprogen.github.io/2021/
Anya Belz, Anastasia Shimorina, Shubham Agarwal, and Ehud Reiter. 2021. The ReproGen Shared Task on Reproducibility of Human Evaluations in NLG: Overview and Results. In Proceedings of the 14th International Conference on Natural Language Generation, pages 249–258, Aberdeen, Scotland, UK. Association for Computational Linguistics.
Track A
Christian Richter, Yanran Chen, and Steffen Eger. 2021. TUDA-Reproducibility @ ReproGen: Replicability of Human Evaluation of Text-to-Text and Concept-to-Text Generation. In Proceedings of the 14th International Conference on Natural Language Generation, pages 301–307, Aberdeen, Scotland, UK. Association for Computational Linguistics.
Simon Mille, Thiago Castro Ferreira, Anya Belz, and Brian Davis. 2021. Another PASS: A Reproduction Study of the Human Evaluation of a Football Report Generation System. In Proceedings of the 14th International Conference on Natural Language Generation, pages 286–292, Aberdeen, Scotland, UK. Association for Computational Linguistics.
Track B
Saad Mahamood. 2021. Reproducing a Comparison of Hedged and Non-hedged NLG Texts. In Proceedings of the 14th International Conference on Natural Language Generation, pages 282–285, Aberdeen, Scotland, UK. Association for Computational Linguistics.
Maja Popović and Anya Belz. 2021. A Reproduction Study of an Annotation-based Human Evaluation of MT Outputs. In Proceedings of the 14th International Conference on Natural Language Generation, pages 293–300, Aberdeen, Scotland, UK. Association for Computational Linguistics.