Shane Bergsma's Homepage

Shane Bergsma

shane.a.bergsma@gmail.com

On Google Scholar

Publications:

2026

M Elhoushi, N Dey, AD Pretko, BC Zhang, G Gray, G Gosal, A Mahmoud, S Bergsma, J Hestness, Don't Drop Dropout: Optimizing Layer Sparsity for Efficient LLM Training and Inference, To appear in ICML 2026.
S Bergsma, BC Zhang, N Dey, S Muhammad, G Gosal, J Hestness, Scaling with Collapse: Efficient and Predictable Training of LLM Families, In ICLR 2026
S Bergsma, N Dey, J Hestness, Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs, In ICLR 2026

2025

E Goffinet, S Bergsma, A Sheinin, N Vassilieva, P Nakov, G Gosal, PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets, AI That Keeps Up: NeurIPS 2025 Workshop on Continual and Compatible Foundation Model Updates
S Bergsma, N Dey, G Gosal, G Gray, D Soboleva, J Hestness, Power Lines: Scaling Laws for Weight Decay and Batch Size in LLM Pre-training, In NeurIPS 2025
N Dey, BC Zhang, L Noci, M Li, B Bordelon, S Bergsma, C Pehlevan, B Hanin, J Hestness, Don't be lazy: CompleteP enables compute-efficient deep transformers, In NeurIPS 2025
S Bergsma, N Dey, G Gosal, G Gray, D Soboleva, J Hestness, Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs, In ICLR 2025

2024

G Gray, A Tiwari, S Bergsma, J Hestness, Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers, In NeurIPS 2024
E Singh, S Bergsma, N Dey, J Hestness, G Gray, Empirical Upper Bounds for Unstructured Sparsity in Compute-Efficient Language Modeling, Compression Workshop @ NeurIPS 2024
N Dey, S Bergsma, J Hestness, Sparse maximal update parameterization: A holistic approach to sparse training dynamics, In NeurIPS 2024

2023

S Bergsma, T Zeyl, L Guo, SutraNets: Sub-series Autoregressive Networks for Long-Sequence, Probabilistic Forecasting, In NeurIPS 2023

2022

S Bergsma, T Zeyl, JR Anaraki, L Guo, C2FAR: Coarse-to-Fine Autoregressive Networks for Precise Probabilistic Forecasting, In NeurIPS 2022 [slides] [code] [poster] [openreview]
SM Iqbal, H Li, S Bergsma, I Beschastnikh, AJ Hu, CoSpot: A Cooperative VM Allocation Framework for Increased Revenue from Spot Instances, In SoCC 2022

2021

S Bergsma, T Zeyl, A Senderovich, JC Beck, Generating Complex, Realistic Cloud Workloads using Recurrent Neural Networks, In SOSP 2021 [Preprint] [code]
X Ke, C Guo, S Ji, S Bergsma, Z Hu, L Guo, Fundy: A Scalable and Extensible Resource Manager for Cloud Resources, In IEEE Cloud 2021

2019

C Chen, X Ke, T Zeyl, K Du, S Sanjabi, S Bergsma, R Pournaghi, C Chen, Minimum makespan workflow scheduling for malleable jobs with precedence constraints and lifetime resource demands, In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)

2014

C. Beller, R. Knowles, C. Harman, S. Bergsma, M. Mitchell, B. Van Durme, I’m a Belieber: Social Roles via Self-identification and Conceptual Attributes, In Proc. ACL 2014 (Short Papers). [pdf] [bib]
S. Bergsma, R.L. Mandryk, G. McCalla, Learning to Measure Influence in a Scientific Social Network, In Advances in Artificial Intelligence (LNCS Volume 8436, pp 35-46) AI 2014. [pdf]

2013

S. Bergsma, B. Van Durme, Using Conceptual Class Attributes to Characterize Social Media Users, In Proc. ACL 2013. [pdf] [bib] [data]
M. Post, S. Bergsma, Explicit and Implicit Syntactic Features for Text Classification, In Proc. ACL 2013 (Short Papers). [pdf] [bib]
S. Bergsma, M. Dredze, B. Van Durme, T. Wilson, D. Yarowsky, Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter, In Proc. NAACL-HLT 2013. [pdf] [bib] [data]
M. Dredze, M. Paul, S. Bergsma, H. Tran, A Twitter Geolocation System with Applications to Public Health, In Proc. AAAI-13 Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI). [pdf] [bib] [code]
S. Bergsma, D. Yarowsky, Learning Domain-Specific, L1-Specific Measures of Word Readability, In Traitement Automatique des Langues Volume 54, Number 1. [pdf]

2012

S. Bergsma, M. Post, D. Yarowsky, Stylometric Analysis of Scientific Articles, In Proc. NAACL-HLT 2012. [pdf] [bib] [data]
S. Bergsma, P. McNamee, M. Bagdouri, C. Fink, T. Wilson, Language Identification for Creating Language-Specific Twitter Collections, In Proc. LSM 2012. [pdf] [bib] [code/data]

2011

S. Bergsma, B. Van Durme, Learning Bilingual Lexicons using the Visual Similarity of Labeled Web Images, In Proc. IJCAI 2011. [pdf] [bib] [slides] [poster] [code/data]
S. Bergsma, D. Yarowsky, K. Church, Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation, In Proc. ACL-HLT 2011. [pdf] [bib] [poster] [code/data]
C. Cherry, S. Bergsma, Joint Training of Dependency Parsing Filters through Latent Support Vector Machines, In Proc. ACL-HLT 2011 (Short Papers). [pdf] [bib] [code]
S. Bergsma, R. Goebel, Using Visual Information to Predict Lexical Preference, In Proc. RANLP 2011. [pdf] [bib] [slides] [code/data]
S. Bergsma, D. Yarowsky, NADA: A Robust System for Non-Referential Pronoun Detection, In Proc. DAARC 2011. [pdf] [bib] [code]

2010

S. Bergsma, E. Pitler, D. Lin, Creating Robust Supervised Classifiers via Web-Scale N-gram Data, In Proc. ACL 2010. [pdf] [bib] [slides] [data]
S. Bergsma, A. Bhargava, H. He, G. Kondrak, Predicting the Semantic Compositionality of Prefix Verbs, In Proc. EMNLP 2010. [pdf] [bib] [slides] [data]
S. Bergsma, C. Cherry, Fast and Accurate Arc Filtering for Dependency Parsing, In Proc. COLING 2010. [pdf] [bib] [slides] [code]
E. Pitler, S. Bergsma, D. Lin, K. Church, Using Web-scale N-grams to Improve Base NP Parsing Performance, In Proc. COLING 2010. [pdf] [bib] [slides]
S. Bergsma, D. Lin, D. Schuurmans, Improved Natural Language Learning via Variance-Regularization Support Vector Machines, In Proc. CoNLL 2010. [pdf] [bib] [poster]
D. Lin, K. Church, H. Ji, S. Sekine, D. Yarowsky, S. Bergsma, K. Patil, E. Pitler, R. Lathbury, V. Rao, K. Dalwani, S. Narsale, New Tools for Web-Scale N-grams, In Proc. LREC 2010. [pdf] [bib] [slides] [data]
S. Jiampojamarn, K. Dwyer, S. Bergsma, A. Bhargava, Q. Dou, M.Y. Kim, G. Kondrak, Transliteration Generation and Mining with Limited Training Resources, In Proc. NEWS 2010. [pdf] [bib]
R. Goebel, S. Bergsma, Y. Xu, C. Ringlstetter, M.Y. Kim, The Nature of Noise in Linguistic Corpora, In Proc. AND 2010. [pdf] [bib]
S. Bergsma, Large-Scale Semi-Supervised Learning for Natural Language Processing, PhD Thesis, University of Alberta. [pdf] [bib]

2009

S. Bergsma, D. Lin, R. Goebel, Web-Scale N-gram Models for Lexical Disambiguation, In Proc. IJCAI 2009. [pdf] [bib] [slides]
Q. Dou, S. Bergsma, S. Jiampojamarn, G. Kondrak, A Ranking Approach to Stress Prediction for Letter-to-Phoneme Conversion, In Proc. ACL-IJCNLP 2009. [pdf] [bib]
  - Nominated, ACL-IJCNLP Best Paper Award
S. Bergsma, D. Lin, R. Goebel, Glen, Glenda or Glendale: Unsupervised and Semi-supervised Learning of English Noun Gender, In Proc. CoNLL 2009. [pdf] [bib] [poster]

2008

S. Bergsma, D. Lin, R. Goebel, Distributional Identification of Non-Referential Pronouns, In Proc. ACL-HLT 2008. [pdf] [bib] [slides] [data]
S. Bergsma, D. Lin, R. Goebel, Discriminative Learning of Selectional Preference from Unlabeled Text, In Proc. EMNLP 2008. [pdf] [bib] [slides]

2007

S. Bergsma, G. Kondrak, Alignment-Based Discriminative String Similarity, In Proc. ACL 2007. [pdf] [bib] [slides] [data]
S. Bergsma, Q.I. Wang, Learning Noun Phrase Query Segmentation, In Proc. EMNLP-CoNLL 2007. [pdf] [bib] [slides] [data]
C. Pinchak, S. Bergsma, Automatic Answer Typing for How-Questions, In Proc. NAACL-HLT 2007. [pdf] [bib] [slides]
S. Bergsma, G. Kondrak, Multilingual Cognate Identification using Integer Linear Programming, In Proc. AMML 2007. [pdf] [bib]

2006

S. Bergsma, D. Lin, Bootstrapping Path-Based Pronoun Resolution, In Proc. COLING-ACL 2006. [pdf] [bib] [slides] [data]

2005

S. Bergsma, Automatic Acquisition of Gender Information for Anaphora Resolution, In Proc. Advances in Artificial Intelligence (LNCS 3501, © Springer Verlag). [pdf] [bib] [slides] [data]
  - Winner, AI'2005 Best Paper Award
C. Cherry, S. Bergsma, An Expectation Maximization Approach to Pronoun Resolution, In Proc. CoNLL 2005. [pdf] [bib]
S. Bergsma, Corpus-Based Learning for Pronominal Anaphora Resolution, M.Sc. Thesis, University of Alberta [pdf]

Teaching:

- CMPT 317, Introduction to Artificial Intelligence, University of Saskatchewan, Winter 2014.
- CS 600.405: Applications of Probabilistic Graphical Models in Language and Speech Processing, Johns Hopkins University, Spring 2011.

Presentation Materials:

Slides for most of my conference presentations are linked above with the corresponding publication. Here are the presentation materials for some recent invited talks and other presentations:
- Better Together: Large Monolingual, Bilingual and Multimodal Corpora in Natural Language Processing, 2011 talks at Cambridge University, University of Pennsylvania (intended for an NLP audience). Slides in [pptx] [ppt] [pdf].
- Simple, Effective, Robust Semi-Supervised Learning, Thanks To Google N-grams, Invited Talk at the RANLP 2011 Workshop on Robust Unsupervised and Semisupervised Methods in Natural Language Processing on September 15, in Hissar, Bulgaria. Slides in [pptx] [ppt] [pdf].
- Three kinds of web data that can help computers make better sense of human language, Fall 2011 talks at York University, University of Saskatchewan, Stony Brook University (intended for general Computer Science audience). Slides in [pptx] [ppt] [pdf].
- Coreference Resolution using Web-Scale Statistics, most recently a Fall 2011 lecture at Stony Brook University (intended for an NLP audience). Slides in [pptx] [ppt] [pdf].

JHU Research Workshops:

- 2009 Workshop on Unsupervised Acquisition of Lexical Knowledge from N-Grams

Code/Data:

- Software Projects:
  1. ArcFilter: An efficient program that vastly speeds up arc-based dependency parsing. It filters arcs from the dependency graph before parsing begins. Used in our recent COLING and ACL papers. [@GoogleCode]
  2. NADA: A robust program for detecting non-referential (a.k.a. pleonastic, expletive, dummy) pronouns. It takes tokenized English sentences as input and finds occurrences of the word 'it'. When an 'it' is found, the system outputs a probability for whether the 'it' is a referential instance, or instead a non-referential pronoun. Described in our DAARC 2011 paper. [@GoogleCode]
  3. Carmen: A Twitter Geolocation System. "Given a tweet, Carmen will return Location objects that represent a physical location. Carmen uses both coordinates and other information in a tweet to make geolocation decisions. It's not perfect, but this greatly increases the number of geolocated tweets over what Twitter provides." Described in our HIAI paper. [@GitHub]
  4. ngramtools: Tools for searching and lexical knowledge acquisition from Google N-grams [@GoogleCode]
- Generally Useful NLP Data:

  1. Noun Gender and Number Data for Coreference Resolution. My most widely-used data, one of the standard resources in the Closed Task for the CoNLL 2011 Shared Task on Modeling Unrestricted Coreference in OntoNotes. Your coreference system should probably make use of it too! [GenderData] - if that doesn't work, try here: [GenderData]
  2. First name, last name, and location clusters from Twitter: Large-scale data mined from Twitter communication patterns. [Clusters]
  3. Distributional Clustering of Phrases: A clustering of a huge number of phrases from Google N-grams. [Clusters]

- Training and Evaluation Code/Data:

  1. *Manually-Annotated Data for Language Identification in Twitter along with a Python-based language-ID system [Tweets]
  2. *Manually-Segmented Search Engine Queries and Feature Data. This query data has become a standard evaluation set for Information Retrieval research. [Queries]
  3. Annotated and processed ACL articles used in our work on Stylometric Analysis of Scientific Articles. [Labeled ACL Papers]
  4. Evaluation code and data for Learning Bilingual Lexicons from the visual similarity of Web Images. [Visual Lexicon Materials]
  5. Evaluation code and data for our Coordination Disambiguation project. [Coordination Materials]
  6. Evaluation code and data for our Visual Selectional Preference project. [Visual Selectional Preference Materials]
  7. Evaluation data for our Robust Supervised Classifiers project. [Robust Data]
  8. It-Bank: An online repository of labelled instances of the pronoun "it": [It-Bank]
  9. American National Corpus articles with Annotated Anaphora Resolutions: [Annotated Anaphora Data]
  10. Evaluation data used in our Alignment-Based Discriminative String Similarity project. [Cognates]
