Workshop on Stylistic Variation
The concept of linguistic style as an object of study independent of the semantic content of language has a long history in descriptive linguistics, but outside of a handful of classic applications such as authorship attribution, it has never been a mainstream area of research in computational linguistics. The increasing amount of work involving stylistically heterogeneous texts (for instance social media or literature) means researchers are increasingly forced to address relevant issues, yet there exists no venue for discussion of shared issues across the many instantiations of stylistic difference, including those involving variables such as individual speaker, speaker demographics, target audience, genre, and language. The fragmentation of stylistics in NLP ultimately hinders the kind of real progress on relevant issues that has been apparent recently in core research areas such as semantics. We propose this workshop, therefore, with the goal of bringing together a diverse collection of researchers who encounter stylistic variation directly or indirectly in their work, identifying joint challenges and future directions.
From a traditional task-driven perspective, the general concept of stylistic variation might appear on the surface to be an overly broad and nebulous one. It is relatively easy, however, to find ways to tie together otherwise disparate threads. The stylistic dimension of formality, for example, can be related directly to any of the variables mentioned above: people express themselves more or less formally due to their background, their intended audience, the conventions of their language for the genre in question, or just as a matter of personal style. The associated variation in the phonological, lexical, syntactic, or discourse realization of a particular semantic content has important consequences for low-level tasks such as text normalization, POS tagging, and parsing, as well as downstream applications such as text simplification, sentiment analysis, information retrieval, or text generation. One of the overarching questions that motivates this workshop is to what extent it is possible or desirable to go beyond superficial, uninterpretable, task-specific stylistic features to deeper, broader, more systematic, and more psychologically plausible conceptualizations of stylistic variation, which would allow for more generalization beyond individual tasks.
Topics of interest include (but are not limited to):
- Evidence for or against targeted approaches to stylistic variation
- General methods for differentiating style from semantics/topic
- Interpretability of computational models of style
- Use of classic stylistic features (e.g. function words, POS n-grams) in classification
- Effects of stylistic variation on downstream tasks
- Authorship attribution
- Stylistic segmentation/intrinsic plagiarism detection
- Style in distributional vector space models (embeddings, etc.)
- Stylistic lexicon acquisition
- Text normalization
- Domain adaptation (across stylistically distinct domains)
- Modelling of demographics and personality
- Politeness and other linguistic manifestations of social power
- Quantification of genre differences
- Stylistically-informed sentiment analysis (e.g. sarcasm, hate speech)
- Readability, complexity, and simplification
- Learner language (e.g. fluency, use of collocations, stylistic appropriateness)
- Style-aware natural language generation
- Identifying trustworthiness and deception
- Literary stylistics (author and character profiling)
- Rhetoric (e.g. stylistic choice in political speeches)
- Stylistic features for diagnosis of mental illness
- Style in acoustic signals (e.g. speaker identification)
- The challenges of annotating style
Since style is often a secondary focus of relevant work in other areas of computational linguistics, in addition to standard long research papers we would like to solicit short papers which offer a thoughtful re-exploration of existing work relative to the stylistic interests of the workshop. Since the importance of style goes well beyond computational linguistics, we also welcome empirical perspectives from other disciplines, including sociolinguistics, corpus linguistics, psycholinguistics, education, political science, and the digital humanities.
Julian Brooke is a McKenzie Postdoctoral Fellow at the University of Melbourne. His PhD thesis was on lexical aspects of stylistic variation, including automated methods for stylistic lexicon acquisition, properties of style across genres and demographic groups, and robust native language identification. His other interests include sentiment analysis, multiword expressions, and applications of NLP in education and the digital humanities.
Thamar Solorio is Associate Professor in the Department of Computer Science at the University of Houston (UH). One of her main lines of research involves stylistic modeling of text in applications such as authorship attribution and profiling, and the detection of cyberpedophilia and strongly negative text. Her work on attribution is currently funded by a CAREER award from the National Science Foundation.
Moshe Koppel is in the Department of Computer Science at Bar-Ilan University. Much of his research has focused on the use of stylistic variation for authorship attribution, verification and profiling. He currently directs the DICTA project for applying methods developed in these areas to historical corpora.