This page is for students who are looking for research ideas that I feel confident would make for good projects. Unfortunately, I don't have the time to investigate all of the questions that I have, so I would love the opportunity to advise students in investigating any of the topics below. They are organized (somewhat) according to area of research, but note that some topics may fit into more than one category; I somewhat arbitrarily put each one wherever seemed best to me at the time.
Legal Linguistics
Generate a lexical frequency profile of titles of the US Code to determine the lexical difficulty of statutory language (a minimal sketch of the profiling step appears at the end of this section).
Generate a lexical frequency profile of warrants to determine whether people who are served warrants can understand them.
Comparing the results of using three types of corpora (legislative history, the Congressional Record, general language) in interpreting a statute.
Using topic modelling on a corpus of patents to find potential candidates for intellectual property infringement.
A corpus-based legal interpretive analysis of any word that is currently or has been contested in the US Constitution or code of federal statutes.
Automatic classification of subregisters within statutes using CorUSS
The use of shell nouns in contracts and statutes
Defining words for jurors using corpus-based materials and examples
Linguistic description of federal rules of civil and criminal procedure, regulations, or patents
Linguistic description of the use and misuse of "I'm sorry" laws
Test readers' comprehension of legal language using eye-tracking, reading comprehension tasks, or even highlighting tasks
Functional analysis of potentially problematic ambiguity in the use of modals in legal language (esp. shall)
Authorship analysis of the books of the Bible/Apocrypha
Eye-tracking of lay readers of contracts vs. lawyers
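For the lexical frequency profile ideas above, here is a minimal sketch in Python of what the core computation might look like. The file names and the banded word lists are placeholders; any frequency-banded list (e.g., BNC/COCA 1,000-word bands) could be dropped in.

import re
from collections import Counter

def load_band(path):
    # each band file is assumed to list one word per line
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def frequency_profile(text, bands):
    # bands: list of (label, word set) ordered from most to least frequent
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter()
    for tok in tokens:
        for label, words in bands:
            if tok in words:
                counts[label] += 1
                break
        else:
            counts["off-list"] += 1
    return {label: n / len(tokens) for label, n in counts.items()}

# hypothetical usage: profile one title of the US Code
bands = [(f"{i}k", load_band(f"band_{i}k.txt")) for i in (1, 2, 3)]
with open("usc_title_17.txt", encoding="utf-8") as f:
    print(frequency_profile(f.read(), bands))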
Corpus Linguistics
A keyword analysis of the standard works comparing the Old vs. New Testament, the Book of Mormon vs. the Bible, and/or authors within the Book of Mormon to each other in order to learn what these different books of scripture emphasize more (a keyness sketch appears at the end of this section).
Keyword analysis comparing the four gospels
An updated evaluation of collocational measures by correlating them against a psycholinguistic reaction-time dataset.
Designing an automated method for annotating nominalizations in English by using NOMLEX combined with rule-based and perhaps probabilistic tagging.
A multidimensional analysis of a language where no such analysis has been performed or where one has not been performed for a long time (e.g., Japanese, Korean, German).
Should collocates be counted across sentence and punctuation boundaries or not? Which approach correlates better with psycholinguistic norms?
Using key collocates for comparison of the use of words in different discourse communities
Bootstrapping to measure the stability of collocate and keyword lists
Evaluation of the accuracy of part-of-speech taggers under different conditions (e.g., speech vs. writing; edited vs. unedited)
A method for automatically detecting new themes in a corpus (maybe news, social media, general conference) by examining new, repeated, and dispersed items: the lexicon updates over time, the dispersal measure looks backward and emphasizes second/third uses and uses that cluster in time, and the method considers both words and n-grams.
Consideration of how manipulation of window size affects the nature of collocations: syntactic/phraseological (proximate) vs. semantic/lexical (wider window) vs. discourse (very wide window)
Methodological study considering the use of medians instead of means for measurement of lexical/lexico-grammatical variables in corpus studies
Automatic analysis of English compounds (using an algorithm that I've already developed)
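For the keyword analyses at the top of this section, here is a minimal sketch in Python of log-likelihood (G2) keyness between two corpora. The file names in the usage line are placeholders, and other keyness statistics could be swapped in.

import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def keywords(target_text, reference_text, top=25):
    # Dunning's log-likelihood (G2) keyness for each word in the target corpus
    a_counts = Counter(tokenize(target_text))
    b_counts = Counter(tokenize(reference_text))
    a_total, b_total = sum(a_counts.values()), sum(b_counts.values())
    results = []
    for word, a in a_counts.items():
        b = b_counts.get(word, 0)
        # expected frequencies under the null of equal relative frequency
        e1 = a_total * (a + b) / (a_total + b_total)
        e2 = b_total * (a + b) / (a_total + b_total)
        g2 = 2 * (a * math.log(a / e1) + (b * math.log(b / e2) if b else 0))
        if a / a_total > b / b_total:  # keep only positively key words
            results.append((g2, word))
    return sorted(results, reverse=True)[:top]

# hypothetical usage: Old vs. New Testament
# print(keywords(open("ot.txt").read(), open("nt.txt").read()))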
Language Assessment
Use the L2RC to perform a synthesis study examining the use of reliability statistics in applied linguistics.
Comparing vocabulary breadth tests using different item types to determine the extent to which scores can vary because of the type of knowledge being tested.
Creating a computer-adaptive self-assessment with a confidence-interval stopping criterion (see the sketch below).
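A minimal sketch in Python of the adaptive loop's building blocks under the Rasch model; the item bank, grid bounds, and CI criterion are all assumptions to be tuned.

import math

def rasch_p(theta, b):
    # probability of endorsing an item under the Rasch model
    return 1 / (1 + math.exp(-(theta - b)))

def estimate(responses):
    # crude grid-search ML estimate of ability; the bounded grid avoids the
    # divergence of the MLE for all-yes or all-no response patterns
    # responses: list of (item difficulty, score in {0, 1})
    grid = [x / 10 for x in range(-40, 41)]
    def log_lik(theta):
        return sum(math.log(rasch_p(theta, b) if score else 1 - rasch_p(theta, b))
                   for b, score in responses)
    theta = max(grid, key=log_lik)
    info = sum(rasch_p(theta, b) * (1 - rasch_p(theta, b)) for b, _ in responses)
    se = 1 / math.sqrt(info) if info else float("inf")
    return theta, se

def next_item(theta, bank, administered):
    # maximum-information selection: the unused item nearest the current ability
    return min((b for b in bank if b not in administered), key=lambda b: abs(b - theta))

def finished(se, half_width=0.5):
    # stop once the 95% CI (theta +/- 1.96 * SE) is narrower than the criterion
    return 1.96 * se <= half_width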
Second Language Studies
Use the L2RC to perform a corpus-based history of second language methods and theories (lexical MDA).
Use the L2RC to examine terminological drift in second language studies.
Key feature analysis comparing L1 and L2 student writings (where they are writing in response to the same set of prompts).
A situational description of the amount of second language input that foreign language learners at BYU receive in different conditions: just taking a foreign language class, living in foreign language housing, or on study abroad.
Correlation between prompt complexity and production complexity for writing and speaking tasks (a minimal correlation sketch appears at the end of this section).
Should words in the prompt be considered as indicators of the writer's complexity?
Using generative AI to create novel decodable readers (also could be used by elementary school children)
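For the prompt/production complexity idea above, here is a minimal sketch in Python using mean sentence length as a stand-in for whatever complexity measures are adopted. The file names are placeholders, and statistics.correlation requires Python 3.10+.

import re
import statistics

def mean_sentence_length(text):
    # a crude proxy for syntactic complexity: mean words per sentence
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return statistics.mean(len(re.findall(r"\w+", s)) for s in sentences)

# hypothetical data: parallel lists of prompts and the responses written to them
prompts = [open(f"prompt_{i}.txt").read() for i in range(1, 31)]
responses = [open(f"response_{i}.txt").read() for i in range(1, 31)]

r = statistics.correlation([mean_sentence_length(p) for p in prompts],
                           [mean_sentence_length(t) for t in responses])
print(f"Pearson r between prompt and production complexity: {r:.2f}")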
Register
Test register theory by designing experimental tasks where situational characteristics of language use are manipulated to see how participants change their language production under different conditions. For example, in a task where participants are asked to watch a video and report the events, how do their lexico-grammatical features change if they are limited to 50 vs. 100 vs. 500 vs. 1000 words, or have no limit at all? (A sketch of the normalized-rate comparison appears at the end of this section.)
Investigate the effect of register on reading fluency by using eye-tracking or read-aloud tasks to determine whether phrasally dense registers are more cognitively difficult than less phrasally dense registers.
Investigate the effect of register on writing fluency by looking at composing rate.
Comparison of lexico-grammatical features across speeches/talks in order to predict engagement/entertainment/memorability/meaningfulness.
Using app tracking data to detect what digital registers people engage with day-to-day
A survey of what registers a person uses in a day, examining the range/dispersion of registers across people (non-proportional) rather than the frequency and amount of time spent in each.
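For the word-limit experiment above, here is a minimal sketch in Python of the normalized-rate comparison. The feature, regex, and data structure are illustrative; a tagger would be needed for most lexico-grammatical features.

import re

def rate_per_1000(text, pattern):
    # normalization matters because the conditions produce texts of very different lengths
    tokens = re.findall(r"\w+", text.lower())
    hits = len(re.findall(pattern, text.lower()))
    return 1000 * hits / len(tokens)

FIRST_PERSON = r"\b(?:i|me|my|we|us|our)\b"  # one illustrative feature

# hypothetical data: {condition: [transcript, ...]} for the 50/100/500/1000/no-limit conditions
data = {"50": [], "100": [], "500": [], "1000": [], "no limit": []}
for condition, texts in data.items():
    rates = [rate_per_1000(t, FIRST_PERSON) for t in texts]
    if rates:
        print(condition, sum(rates) / len(rates))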
Vocabulary
Test whether function words are polysemous using vectors combined with MDA. Can vectors improve MDA?
Use a scalogram, IRT, or similar techniques to test the scalability, and thus the validity, of the levels of Bauer and Nation's (1993) word families in English and other languages.
Comparing lexical diversity measures against human judgments of lexical diversity in English when lexical diversity is well-defined vs. when it is not defined at all.
Using lexical diversity measures to predict lexical, writing, or overall language proficiency in languages other than English.
Examining whether concreteness or frequency is a better predictor of lexical processing ease
Kernel density in vocabulary lists
Test word list stability by using bootstrapping on existing word lists and correlating the results against knowledge-based lists (a bootstrapping sketch appears at the end of this section)
Using knowledge-based vocabulary lists to predict lexical sophistication/lexical proficiency/other constructs.
Bootstrapping to see if there are enough items in a vocabulary size test.
Create a website that houses vocabulary list resources
Using lexical prevalence measures that combine frequency and dispersion to predict word difficulty across datasets
Measuring word imageability by examining the image similarity of the top n results from a Google search and validating it against other word imageability datasets
Examine differences in lexical diversity divided by part-of-speech
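For the bootstrapping ideas above, here is a minimal sketch in Python of one way to quantify word list stability: resample the corpus documents with replacement and check how much of the original top-n list survives. The corpus loading is a placeholder.

import random
import re
from collections import Counter

def top_n(docs, n=100):
    counts = Counter(w for d in docs for w in re.findall(r"[a-z]+", d.lower()))
    return {w for w, _ in counts.most_common(n)}

def bootstrap_stability(docs, n=100, reps=1000):
    # mean proportion of the original top-n list recovered in each bootstrap resample
    original = top_n(docs, n)
    overlaps = []
    for _ in range(reps):
        resample = random.choices(docs, k=len(docs))  # documents drawn with replacement
        overlaps.append(len(original & top_n(resample, n)) / n)
    return sum(overlaps) / reps

# hypothetical usage: docs = [open(p).read() for p in corpus_paths]
# print(bootstrap_stability(docs))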
Grammar
A corpus-based study of language prescriptions and the extent to which they are being followed in edited registers (i.e., published fiction, academic articles, and news articles).
Investigating whether the use of various honorifics is declining among younger speakers of Korean.
The analysis of reduplicating ideophones/mimetics across registers of Korean (esp. fiction and other informal registers), identified using regex.
Correlating passive usage in news articles with perceived bias.
The expanding category of intensifiers in English (e.g., how 'literally' and 'physically' are entering the category)
Change in the use of "your guys'" as a possessive determiner
Reduplication in English n-grams with high collocation, captured with the regex \b(\w*)(\w{2,})\b \b(\w*\2)\b (e.g., hanky panky, hoity toity, pell mell, argy bargy, itsy bitsy, teeny weeny, regular degular; see the sketch below)
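As a starting point for the reduplication search, here is a minimal sketch in Python built around the regex above. The toy input and frequency ranking are illustrative; a collocation filter such as PMI would be the natural next step, since the pattern also matches accidental rhymes like "cat sat".

import re
from collections import Counter

# a word ending (2+ characters) that is repeated at the end of the following word
PATTERN = re.compile(r"\b(\w*)(\w{2,})\b \b(\w*\2)\b", re.IGNORECASE)

def reduplication_candidates(text):
    # returns candidate pairs with their frequencies; rank or filter afterwards
    return Counter(f"{m.group(1)}{m.group(2)} {m.group(3)}".lower()
                   for m in PATTERN.finditer(text))

print(reduplication_candidates("It was an itsy bitsy, teeny weeny thing.").most_common())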