Jointly with Prof. Dr. Stefan Conrad from HHU Düsseldorf, I received a grant from the Federal Ministry of Research, Technology and Space to develop competencies in text analysis and to support other educational researchers in working with textual data (Grant number: 16DKWN139A - Studi-BUCH). The funding period was from October 2022 to September 2025.
The grant rested on three pillars: i) digitization of extensive textual and tabular data on the German higher education market; ii) research on how the German higher education landscape developed over the past 50 years and how these developments affected educational choices and (regional) labor markets, including the use of ML and NLP methods; iii) development of strong NLP skills and their transfer to the education and labor economics research community.
We organized three amazing interdisciplinary workshops at which we discussed papers applying various NLP techniques to textual data from sources such as curricula, textbooks, counseling guides, job vacancies, and vocational training regulations. In addition, we designed an NLP toolbox, available for download as a ready-to-use tool for gaining insights from your own textual data.
NLP4EDU is an application that enables researchers to use common natural language processing (NLP) methods for data analysis. It provides an easy-to-use interface for configuring and running various NLP tasks. No coding skills required.
Download the app here (redirects to GitHub), provided by Boris Thome and Michael Khal (both at HHU Düsseldorf).
May 22 & 23, 2023: Program - Keynotes: Anna Kerkhof (ifo & LMU) and Theresa Gessler (Uni Viadrina)
May 16 & 17, 2024: Program - Keynotes: Alessandra Casarico (Uni Bocconi & SI Lab) and Simon Wiederhold (IWH Halle)
July 9 & 10, 2025: Program - Keynotes: Felix Chopra (Frankfurt School of Finance & Management) and Frauke Peter (DZHW)
Publications
Peer-reviewed publications:
Thome, Boris, Friederike Hertweck, Serife Yasar, Lukas Jonas & Stefan Conrad, 2025. A dataset of study program availability in German higher education between 1971 and 1996. Scientific Data, 12, 1626.
Thome, Boris, Friederike Hertweck & Stefan Conrad, 2025. Predicting Perceived Text Complexity: The Role of Person-Related Features in Profile-Based Models. Journal of Educational Data Mining, 17(1), 276–307.
Thome, Boris, Friederike Hertweck, Lukas Jonas & Serife Yasar, 2024. Automated Extraction of Icon-based Tables. GI-Edition Lecture Notes in Informatics, pp. 2003–2005.
Thome, Boris, Friederike Hertweck & Stefan Conrad, 2024. Determining Perceived Text Complexity: An Evaluation of German Sentences Through Student Assessments. Proceedings of the Seventeenth International Conference on Educational Data Mining (EDM 2024), pp. 714–721.
Data and dataset descriptions:
Hertweck, Friederike, Lukas Jonas, Boris Thome & Serife Yasar, 2024. RWI-UNI-SUBJECTS: Complete records of all subjects across German HEIs (1971 - 1996). RWI-Micro. Version: 1. RWI – Leibniz Institute for Economic Research. Dataset. https://doi.org/10.7807/studi:buch:suf:v1.
Hertweck, Friederike, Lukas Jonas, Boris Thome & Serife Yasar, 2024. RWI-UNI-SUBJECTS: Complete records of all subjects across German HEIs (1971-1996), RWI Materialien.
Completed work under review
Hertweck, Friederike, Boris Thome & Valeria Ride, 2025. Enduring words, enduring worlds? The persistent language of study guides informing about teacher education in Germany since 1971. [Currently under review.]
Hertweck, Friederike & Serife Yasar, 2024. Effects of college openings on local youths. Ruhr Economic Paper No. 1075. [Currently under review.]
Remaining work in progress
The effect of computer science at HEIs on local labor markets (with Shihang Hou, Britta Jensen & Lukas Jonas)