Selected Publications
Books
Shermis, M.D., & Burstein, J. (2013). Handbook of Automated Essay Evaluation: Current Applications and Future Directions. New York: Routledge.
Shermis, M. D., & Burstein, J. (2003). Automated essay scoring: A cross-disciplinary perspective. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Book Chapters, Conference Papers, Journal Articles, & Research Reports
von Davier, A.A., Burstein, J. (2024). AI in the Assessment Ecosystem: A Human–Centered AI Perspective. In: Ilic, P., Casebourne, I., Wegerif, R. (eds) Artificial Intelligence in Education: The Intersection of Technology and Pedagogy. Intelligent Systems Reference Library, vol 261. Springer, Cham. https://doi.org/10.1007/978-3-031-71232-6_6
Burstein, J. & LaFlair, G. T. (2024). Where Assessment and Responsible AI Meet. To appear in the Special Issue In Honour of Carol A. Chapelle’s Contributions to Language Assessment and Learning: Language Teaching Research Quarterly. (Guest Editors: Christine Coombe, Tony Clark, and Hassan Mohebbi): https://doi.org/10.48550/arXiv.2411.02577.
Burstein, J. & Attali, Y. (2024). Automated Writing Evaluation. In Antony John Kunnan (Editor). The Concise Companion to Language Assessment. Wiley.
Burstein, J., LaFlair, G.T., Yancey, K., von Davier, A.A., & Dotan, R. (2024). Responsible AI for Test Equity and Quality: The Duolingo English Test as a Case Study: https://doi.org/10.48550/arXiv.2409.07476
Cardwell, R., Naismith, B., Burstein, J., Nydick, S., Goodwin, S., & Verardi, A. (2024). From Pen to Pixel: Rethinking English Language Proficiency Admissions Assessments in the Digital Era. CALICO Journal. https://doi.org/10.1558/cj.27104
Belzak, W.C.M., Naismith, B., Burstein, J. (2023). Ensuring Fairness of Human- and AI-Generated Test Items. In: Wang, N., Rebolledo-Mendez, G., Dimitrova, V., Matsuda, N., Santos, O.C. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky. AIED 2023. Communications in Computer and Information Science, vol 1831. Springer, Cham. https://doi.org/10.1007/978-3-031-36336-8_108
Yancey, K., LaFlair, G., Verardi, A., & Burstein, J. (2023). Rating Short L2 Essays on the CEFR Scale with GPT-4. To appear in Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA), ACL 2023, Toronto, Canada.
Naismith, B., Mulcaire, P., & Burstein, J. (2023). Automated Evaluation of Written Discourse Coherence Using GPT-4. To appear in Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA), ACL 2023, Toronto, Canada.
Burstein, J. (2023). Duolingo English Test: Responsible AI Standards: https://doi.org/10.46999/VCAE5025
McCaffrey, D., Burstein, J., et al. (2023). Making Sense of College Students’ Writing Achievement and Retention With Automated Writing Evaluation. In Yaneva, V. and von Davier, M. (Eds). Advancing Natural Language Processing in Educational Assessment. NCME Educational Measurement and Assessment Book Series. Taylor & Francis, 2023.
Langenfeld, T., Burstein, J., & von Davier, A. (2022). Digital-first Learning and Assessment Systems for the 21st Century. Frontiers in Education: Assessment, Testing and Applied Measurement.
Oddis, K., Burstein, J., McCaffrey, D., & Holtzman, S. (2022). A Framework for Analyzing Features of Writing Curriculum in Studies of Student Writing Achievement. The Journal of Writing Analytics, Volume 6, 2022: Incubating Writing Analytics Research in the Time of COVID-19: 95-141.
McCaffrey, D., Zhang, M., & Burstein, J. (2022). Across Performance Contexts: Using Automated Writing Evaluation to Explore Student Writing. The Journal of Writing Analytics, Volume 6, 2022: Incubating Writing Analytics Research in the Time of COVID-19: 167-199.
Burstein, J., LaFlair, G., Kunnan, A.J., & von Davier, A. (2022). A Theoretical Assessment Ecosystem for a Digital-First Assessment—The Duolingo English Test. Duolingo Research Report DRR-21-04: 1-32.
Cumming, A., Cho, Y., Burstein, J., Everson, P., & Kantor, R. (2021). Assessing academic writing. In Assessing Academic English for Higher Education Admissions (pp. 107-151). Routledge.
McCaffrey, D., Holtzman, S., Burstein, J., and Beigman Klebanov, B. (2021). What can we learn about college retention from student writing? In Companion Proceedings in the 11th International Conference on Learning Analytics & Knowledge (LAK21).
Ling, G., Elliot, N., Burstein, J. C., McCaffrey, D. F., MacArthur, C. A., & Holtzman, S. (2021). Writing motivation: A validation study of self-judgement and performance. Assessing Writing, 48.
Hazelton, L., Nastal, J., Elliot, N., Burstein, J. & McCaffrey, D. (2021). Formative automated writing evaluation: A standpoint theory of action. Journal of Response to Writing, 7(1), 3.
Burstein, J., Riordan, B., & McCaffrey, D. (2020). Expanding Automated Writing Evaluation. In Yan, D., Rupp, A. A., & Foltz, P. W. (Eds.). (2020). Handbook of automated scoring: Theory into practice. CRC Press.
Burstein, J. (2020). Natural Language Processing and the Literacy Challenge. In H. Jiao, & R. Lissitz (Eds). Applications of artificial intelligence to assessment. Charlotte, NC: Information Age Publishing.
Burstein, J., McCaffrey, D., Elliot, N., Beigman Klebanov, B., Molloy, H., Houghton, P. & Mladineo, Z. (2020). Exploring Writing Achievement and Genre in Postsecondary Writing. In Companion Proceedings in the 10th International Conference on Learning Analytics & Knowledge (LAK20), 53-55.
Burstein, J., McCaffrey, D., Beigman Klebanov, B., Ling, G. & Holtzman, S. (2019). Exploring Writing Analytics and Postsecondary Success Indicators. In Companion Proceedings 9th International Conference on Learning Analytics & Knowledge (LAK19), 213-214.
Burstein, J., Elliot, N., Beigman Klebanov, B., Madnani, N., Napolitano, D., Schwartz, M., Houghton, P. & Molloy, H. (2018). Writing Mentor: Writing Progress Using Self-Regulated Writing Support. Journal of Writing Analytics. Vol. 2: 285-313.
Madnani, N., Burstein, J., Elliot, N., Beigman Klebanov, B., Napolitano, D., Andreyev, S., and Schwartz, M. (2018). Writing Mentor: Self-Regulated Writing Feedback for Struggling Writers. In Proceedings of COLING (demos).
Burstein, J., McCaffrey, D., Beigman Klebanov, B., & Ling, G. (2017). Exploring Relationships between Writing and Broader Outcomes with Automated Writing Evaluation. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications (BEA), EMNLP 2017, Copenhagen, Denmark.
Burstein, J., Madnani, N., Sabatini, J., McCaffrey, D., Biggers, K., and Dreier, K. (2017). Generating Language Activities in Real-Time for English Learners using Language Muse. In Proc. Fourth Annual ACM Conference on Learning at Scale (Short Papers).
Madnani, N., Loukina, A., von Davier, A., Burstein, J. and Cahill, A. (2017). Building Better Open-source Tools to Support Fairness in Automated Scoring. In Proc. EACL Workshop on Ethics in Natural Language Processing. Valencia, Spain.
Klebanov, B. B., Burstein, J., Harackiewicz, J. M., Priniski, S. J., & Mulholland, M. (2017). Reflective Writing About the Utility Value of Science as a Tool for Increasing STEM Motivation and Retention–Can AI Help Scale Up? International Journal of Artificial Intelligence in Education, 27(4), 791-818.
Burstein, J., Beigman Klebanov, B., Elliot, N., & Molloy, H. (2016). A Left Turn: Automated Feedback & Activity Generation for Student Writers. To appear in the Proceedings of the 3rd Language Teaching, Language & Technology Workshop, co-located with Interspeech, San Francisco, CA.
Burstein, J., & Sabatini, J. (2016). The Language Muse Activity Palette: Technology for Promoting Improved Content Comprehension for English Language Learners. In Crossley, S.A. & McNamara, D.S. (Eds.), Adaptive Educational Technologies for Literacy Instruction. Taylor & Francis, Routledge: NY.
Madnani, N., Burstein, J., Sabatini, J., Biggers, K., & Andreyev, S. (2016). Language Muse™: Automated Linguistic Activity Generation for English Language Learners. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Berlin, Germany.
Beigman Klebanov, B., Burstein, J., Harackiewicz, J., Priniski, S., & Mulholland, M. (2016). Enhancing STEM Motivation through Personal and Communal Values: NLP for Assessment of Utility Value in Student Writing. In Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications (BEA), NAACL 2016, San Diego, CA, USA.
Burstein, J., Elliot, N., and Molloy, H. (2016). Informing Automated Writing Evaluation Using the Lens of Genre: Two Studies. In Special Issue: CALICO Journal 33.1, 2016 (Guest Editors: Volker Hegelheimer, Ahmet Dursun, Zhi Li).
Shermis, M.D., Burstein, J., Brew, C., Higgins, D., & Zechner, K. (2015). Recent innovations in machine scoring. In S. Lane, T. Haladyna, & M. Raymond (Eds.). Handbook of test development, Second Edition. New York, NY: Taylor & Francis/Routledge.
Shermis, M., Burstein, J., Elliot, N., Miel, S., and Foltz, P. (2015). Automated Writing Evaluation: A Growing Body of Knowledge. In the Handbook of Writing Research (Eds. C. MacArthur, S. Graham, and J. Fitzgerald): Guilford Press: NY.
Burstein, J., Shore, J., Sabatini, J., Moulder, B., Lentini, J., Biggers, K., and Holtzman, S. (2014). From Teacher Professional Development to the Classroom: How NLP Technology Can Enhance Teachers’ Linguistic Awareness to Support Curriculum Development for English Language Learners. Journal of Educational Computing Research, 51(1): 119-144.
Somasundaran, S., Burstein, J., and Chodorow, M. (2014). Lexical Chaining for Measuring Discourse Coherence Quality in Test-taker Essays. The 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland, August 23-29, 2014.
Beigman Klebanov, B., Madnani, N., Burstein, J., and Somasundaran, S. (2014). Content Importance Models for Scoring Writing From Sources. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Baltimore, MD. June 23-25, 2014.
Burstein, J., Sabatini, J., & Shore, J. (2014). Developing NLP Applications for Educational Problem Spaces. In Ruslan Mitkov (Ed.), Oxford Handbook of Computational Linguistics. New York: Oxford University Press. Currently published online: www.oxfordhandbooks.com.
Burstein, J., Tetreault, J., & Chodorow, M. (2013). Holistic discourse coherence annotation for noisy essay writing. Dialogue & Discourse, 4(2), 34-52.
Beigman Klebanov, B., Burstein, J., and Madnani, N. (2013). Sentiment Profiles of Multi-Word Expressions in Test-Taker Essays: The Case of Noun-Noun Compounds. ACM Transactions on Speech and Language Processing (TSLP), 10(3), 12.
Beigman Klebanov, B., Madnani, N., & Burstein, J. (2013). Using pivot-based paraphrasing and sentiment profiles to improve a subjectivity lexicon for essay data. Transactions of the Association for Computational Linguistics.
Burstein, J., Sabatini, J., Shore, J., Moulder, B., and Lentini, J. (2013). A User Study: Technology to Increase Teachers’ Linguistic Awareness to Improve Instructional Language Support for English Language Learners. In Proceedings of the Workshop for Improving Textual Accessibility in conjunction with the Annual Meeting of the North American Association for Computational Linguistics, Atlanta, Georgia, June 14, 2013.
Burstein, J., Tetreault, J., & Madnani, N. (2013). The E-rater® Automated Essay Scoring System. In Shermis, M.D., & Burstein, J. (Eds.), Handbook of Automated Essay Evaluation: Current Applications and Future Directions. New York: Routledge.
Burstein, J., Tetreault, J., Chodorow, M., Blanchard, D., & Andreyev, S. (2013). Automated Evaluation of Discourse Coherence Quality in Essay Writing. In Shermis, M.D., & Burstein, J. (Eds.), Handbook of Automated Essay Evaluation: Current Applications and Future Directions. New York: Routledge.
Burstein, J., Beigman Klebanov, B., Madnani, N., & Faulkner, A. (2013). Sentiment Analysis Detection for Essay Evaluation. In Shermis, M.D., & Burstein, J. (Eds.), Handbook of Automated Essay Evaluation: Current Applications and Future Directions. New York: Routledge.
Madnani, N., Burstein, J., Sabatini, J., and O’Reilly, T. (2013). Automated Scoring of a Summary-Writing Task Designed to Measure Reading Comprehension. In Proceedings of the North American Association for Computational Linguistics Eighth Workshop Using Innovative NLP for Building Educational Applications, Atlanta, Georgia, June 13, 2013.
Burstein, J., Shore, J., Sabatini, J., Moulder, B., Holtzman, S., & Pedersen, T. (2012). The Language Muse system: Linguistically focused instructional authoring. ETS RR-12-21. Princeton, NJ: ETS.
Burstein, J., Flor, M., Tetreault, J., Madnani, N., & Holtzman, S. (2012). Examining linguistic characteristics of paraphrase in test-taker summaries. ETS RR-12-18. Princeton, NJ: ETS.
Burstein, J. (2012). Fostering Best Practices in Writing Instruction and Assessment with E-rater®. In Norbert Elliot and Les Perelman (Eds.), Writing Assessment in the 21st Century: Essays in Honor of Edward M. White.
Beigman Klebanov, B., Burstein, J., Madnani, N., Faulkner, A., and Tetreault, J. (2012). Building Subjectivity Lexicon(s) From Scratch For Essay Data. In Alexander Gelbukh (Ed.), Springer Lecture Notes in Computer Science. Berlin: Springer-Verlag.
Burstein, J. (2012). Automated Essay Scoring and Evaluation. In Carol Chapelle (Ed.), The Encyclopedia of Applied Linguistics. Wiley Blackwell. Malden, MA.
Burstein, J., Tetreault, J. and Andreyev, S. (2010). Using Entity-Based Features to Model Coherence in Student Essays. Proceedings of the HLT/NAACL Annual Meeting, Los Angeles, June 2010.
Shermis, M. D., Burstein, J., Higgins, D., & Zechner, K. (2010). Automated essay scoring: Writing assessment and instruction. In E. Baker, B. McGaw & N. S. Petersen (Eds.), International Encyclopedia of Education (Vol. 4, pp. 20-26). Oxford, UK: Elsevier.
Burstein, J., Shore, J., Sabatini, J., Lee, Y., & Ventura, M. (2007). The automated text adaptation tool. In Demo Proceedings of the annual conference of the North American chapter of the Association for Computational Linguistics (NAACL-HLT 2007), Rochester, NY.
Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater v.2.0. Journal of Technology, Learning, and Assessment, 4(3).
Higgins, D., Burstein, J., and Attali, Y. (2006). Identifying Off-Topic Student Essays without Topic-Specific Training Data. In J. Burstein and C. Leacock (eds). Special Issue of Natural Language Engineering on Educational Applications Using NLP.
Shermis, M. D., Burstein, J., & Leacock, C. (2006). Applications of computers in assessment and analysis of writing. In C. A. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research. New York: Guilford Publications.
Burstein, J. and Higgins, D. (2005). Advanced Capabilities for Evaluating Student Writing: Detecting Off-Topic Essays Without Topic-Specific Training. Proceedings of the International Conference on Artificial Intelligence in Education, July 2005, Amsterdam, The Netherlands.
Burstein, J., Chodorow, M., and Leacock, C. (2004). Automated Essay Evaluation: The Criterion Online Service. AI Magazine, 25(3), 27-36.
Higgins, D., Burstein, J., Marcu, D., and Gentile, C. (2004). Evaluating Multiple Aspects of Coherence in Student Essays. Proceedings of the HLT/NAACL Annual Meeting, Boston, May 2004.
Burstein, J. and Wolska, M. (2003). Toward Evaluation of Writing Style: Overly Repetitious Word Use. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics. Budapest, Hungary.
Burstein, J., Marcu, D., and Knight, K. (2003). Finding the WRITE Stuff: Automatic Identification of Discourse Structure in Student Essays. In S. Harabagiu and F. Ciravegna (Eds.) Special Issue on Advances in Natural Language Processing, IEEE Intelligent Systems, Vol. 18, no. 1, pp. 32-39.
Burstein, J., Kukich, K., Wolff, S., Lu, C., Chodorow, M., Braden-Harder, L., and Harris, M. D. (1998). Automated Scoring Using A Hybrid Feature Identification Technique. In the Proceedings of the Annual Meeting of the Association for Computational Linguistics, August, 1998. Montreal, Canada.