This site provides the data for the Word Mapper app hosted by Quartz.


The original Word Mapper app lets users map the relative frequencies of the 10,000 most common words in an 8.9-billion-word corpus of 890 million geocoded tweets collected from across the contiguous United States between 11 October 2013 and 22 November 2014. The app was created by Jack Grieve, Andrea Nini, and Diansheng Guo for the Trees and Tweets project, funded by AHRC/ESRC/JISC/IMLS as part of Digging into Data Round 3. The Quartz version was designed by Nikhil Sonnad and maps the 97,246 words that occur at least 500 times in the corpus.


The four word-by-county regional data matrices used for Word Mapper are available for download here. The first matrix contains the relative frequencies, per billion words, of the 97,246 words measured across 3,075 counties; the other three contain the corresponding Getis-Ord Gi* z-scores, each calculated using a different nearest-neighbor spatial weights matrix.
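This page names the statistic but not the exact procedure, so the following is only a minimal Python sketch of the standard textbook form of the Getis-Ord Gi* z-score, applied to per-billion relative frequencies. It assumes planar (projected) county centroid coordinates, Euclidean distance, and binary k-nearest-neighbor weights in which each county also counts as its own neighbor. The function names, the coordinate input, and the value of k are hypothetical and are not taken from the Word Mapper pipeline.

```python
import numpy as np


def relative_frequency_per_billion(word_counts, total_counts):
    """Scale raw word counts to occurrences per billion words in each county."""
    word_counts = np.asarray(word_counts, dtype=float)
    total_counts = np.asarray(total_counts, dtype=float)
    return word_counts / total_counts * 1e9


def knn_weights(coords, k):
    """Binary k-nearest-neighbor weights matrix.

    Assumptions (hypothetical): Euclidean distance on projected centroids,
    and each location is included as its own neighbor.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances between all locations.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = np.zeros((n, n))
    for i in range(n):
        d = dist[i].copy()
        d[i] = np.inf  # exclude the focal location while searching for neighbors
        nearest = np.argsort(d, kind="stable")[:k]
        w[i, nearest] = 1.0
        w[i, i] = 1.0  # the "star" in Gi*: the focal location counts too
    return w


def getis_ord_gi_star(x, w):
    """Standard Getis-Ord Gi* z-score for every location, given values x and weights w."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)  # population standard deviation
    w_sum = w.sum(axis=1)  # sum_j w_ij
    w_sq_sum = (w ** 2).sum(axis=1)  # sum_j w_ij^2
    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Toy example: six hypothetical "counties" on a line, with a cluster of
# high values at one end.
coords = [(float(i), 0.0) for i in range(6)]
freqs = relative_frequency_per_billion([10, 10, 0, 0, 0, 0], [1e9] * 6)
gi = getis_ord_gi_star(freqs, knn_weights(coords, k=1))
print(np.round(gi, 3))
```

In the toy example the counties inside the high-value cluster get positive z-scores and those far from it get negative ones. With 3,075 counties the dense distance and weights matrices hold about 9.5 million entries each, which fits in memory; a spatial index such as a k-d tree is the usual choice for larger problems. Under this sketch, the three Gi* matrices described above would correspond to three different values of k, which this page does not specify.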

See the papers and talks below for more information. If you use the data in your research, please cite one or more of the papers. You can contact Jack Grieve at with any questions.


Jack Grieve, Andrea Nini and Diansheng Guo. 2016. Mapping lexical innovation on American social media. In review.

Andrea Nini, Carlo Corradini, Diansheng Guo and Jack Grieve. 2016. The application of growth curve modeling for the analysis of diachronic corpora. Forthcoming in Language Dynamics and Change.

Jack Grieve, Andrea Nini and Diansheng Guo. 2016. Analyzing lexical emergence in American English online. Forthcoming in English Language and Linguistics.

Martijn Wieling, Jack Grieve, Gosse Bouma, Josef Fruehwald, John Coleman and Mark Liberman. 2016. Variation and change in the use of hesitation markers in Germanic languages. Language Dynamics and Change 6: 199-234.

Yuan Huang, Diansheng Guo, Alice Kasakoff and Jack Grieve. 2016. Understanding US regional linguistic variation with Twitter data analysis. Computers, Environment and Urban Systems 54: 244-255.


Jack Grieve. 2016. Identifying and mapping the spread of new words. Invited presentation at BAULT 2016, University of Helsinki, December 2, 2016.

Jack Grieve. 2016. Functional variation drives regional variation. Invited presentation at University of Edinburgh, November 10, 2016.

Jack Grieve. 2016. Using big data to map language structure and use. Invited plenary at American Association for Corpus Linguistics 2016, Ames, Iowa, September 16, 2016.

Jacopo Rocchi, Andrea Nini, David Saad and Jack Grieve. 2016. Dynamics and equilibria in Twitter: Analyzing geographical lexical spread. Presented at IT Open Research Forum Workshop, London School of Economics, May 19, 2016.

Jack Grieve, Diansheng Guo, Alice Kasakoff and Andrea Nini. 2016. Trees and Tweets: Mining Billions to Understand Regional Linguistic Variation and Human Migration. Presented at Digging into Data Round 3 Conference, Glasgow, January 28, 2016.

Jack Grieve, Andrea Nini, Diansheng Guo and Alice Kasakoff. 2015. Using Social Media to Map Double Modals in Modern American English. Presented at New Ways of Analyzing Variation 44, University of Toronto, October 22-25, 2015.

Jack Grieve, Andrea Nini, Diansheng Guo and Alice Kasakoff. 2015. Big Data for the Analysis of Language Variation and Change. Presented at From Data to Evidence: Big Data, Rich Data, Uncharted Data, University of Helsinki, October 19, 2015.

Jack Grieve, Andrea Nini, Diansheng Guo and Alice Kasakoff. 2015. Recent Changes in Word Formation Strategies in American Social Media. Presented at Corpus Linguistics 2015, Lancaster University, July 22, 2015.

Jack Grieve. 2015. Tracking the Emergence of New Words Across Time and Space. Invited presentation at the Digital Science Speaker Series, London, May 26, 2015.

Jack Grieve. 2015. Corpus Linguistics for Regional Dialectology. Invited presentation at UCREL CRS, Lancaster University, UK, May 14, 2015.

Jack Grieve. 2015. Big Data for Lexical Research. Invited presentation at JISC's Digifest 2015, as part of the Big Data and the Dark Arts panel, Birmingham, UK, March 10, 2015.

Jack Grieve. 2015. Tracking the Emergence of New Words Across Time and Space. Invited presentation at the Digital History Seminar Series, Institute of Historical Research, School of Advanced Study, University of London, February 24, 2015.

Jack Grieve. 2015. Mapping Lexical Spread in American English. Presented at the American Dialect Society Annual Meeting, Portland, Oregon, January 8, 2015.

Jack Grieve, Diansheng Guo, Alice Kasakoff and Andrea Nini. 2014. Big-data Dialectology: Analyzing Lexical Spread in a Multi-billion Word Corpus of American English. Presented at American Association for Corpus Linguistics 2014, Flagstaff, Arizona, September 28, 2014.

Jack Grieve. 2014. Spatial and geostatistical analysis for regional dialectology. Presented at Methods in Dialectology XV, Groningen, Netherlands, August 11, 2014.

