Pre-Conference Workshops

Two consecutive pre-conference workshops will be held on Thursday, September 12th. The cost of participating in the workshops is $30. Seats are limited.

Recent years have seen large language models (LLMs) applied effectively in a variety of contexts, but their potential for fine-grained linguistic analysis remains under-explored. This pre-conference workshop focuses on leveraging LLMs for corpus linguistics, particularly for linguistic annotation and the use of specialized tools for detailed lexicogrammatical analysis. The workshop is aimed at researchers who are interested in training LLMs with supervised learning methods based on linguistic annotations, conducting reliability evaluations, and applying these models to large-scale data analysis.

Session Details

Morning Session (9:00 AM - 11:30 AM): Led by Hakyung Sung and Kristopher Kyle, this session will focus on the construction of high-quality manually annotated datasets. Participants will learn how to organize data formats, handle complex annotation instances, adjudicate difficult cases, evaluate annotator reliability, and document processes for systematic annotation records. The session will include hands-on activities involving features of lexicogrammatical complexity (e.g., Biber et al., 2021) to provide practical experience in annotation. Additionally, it will cover how these annotated datasets can be used for further training and evaluation of language models. 
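As a rough illustration of what evaluating annotator reliability can involve, the sketch below computes Cohen's kappa for two annotators who labeled the same items. This is only one common agreement measure and is not necessarily the one used in the session; the function name and the tag labels are hypothetical and chosen for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Observed agreement: proportion of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Both annotators used a single identical label throughout.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical tags assigned by two annotators to the same six tokens.
annotator_1 = ["nn", "vb", "nn", "jj", "vb", "nn"]
annotator_2 = ["nn", "vb", "jj", "jj", "vb", "nn"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In a spreadsheet workflow, the two lists would correspond to two annotators' columns exported for the same rows.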

Afternoon Session (1:00 PM - 4:00 PM): Led by Kristopher Kyle, Jesse Egbert, Doug Biber, and Randi Reppen, this session will introduce the LxGrTagger, an open-source linguistic analysis tool that provides automatic annotation of lexical and lexicogrammatical features from the Biber Tagger (Biber, 1988; Biber et al., 2021). The hands-on session will focus on using LxGrTagger to annotate corpora, fix-tagging problematic features, generating numeric feature counts from fix-tagged texts, and conducting a key-features analysis (Egbert & Biber, 2023).

Technical Requirements

Attendees should bring their laptops to the workshop. For the first session, we will use spreadsheet software (e.g., Google Sheets). For the second session, we will use the Python version of LxGrTagger, which has a simple interface. We will also provide annotated text files in case participants encounter technical difficulties and/or are not familiar with Python.