Word Mapper

About

The original Word Mapper app lets users map the relative frequencies of the 10,000 most common words in an 8.9-billion-word corpus of 890 million geocoded Tweets collected from across the contiguous United States between 11 October 2013 and 22 November 2014. The app was created by Jack Grieve, Andrea Nini, and Diansheng Guo for the Trees and Tweets project, funded by AHRC/ESRC/JISC/IMLS as part of Digging into Data 3. The Quartz version was designed by Nikhil Sonnad and maps the 97,246 words that occur at least 500 times in the corpus.


Data

The four word-by-county regional data matrices used for Word Mapper are available for download below. The first matrix contains the relative frequencies (per billion words) of the 97,246 words across 3,075 counties. The other three contain the corresponding Getis-Ord Gi* z-scores, each calculated using a different nearest-neighbor spatial weights matrix (SWM) based on the 5, 20, or 50 nearest neighbors.

Relative Frequencies (per billion words)

Getis-Ord Gi* z-scores (5 nearest neighbors SWM)

Getis-Ord Gi* z-scores (20 nearest neighbors SWM)

Getis-Ord Gi* z-scores (50 nearest neighbors SWM)
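For readers unfamiliar with the measures in these files, the sketch below shows one way to compute them. It is a minimal illustration, not the project's actual pipeline. The helper names (`per_billion`, `knn_weights`, `getis_ord_gi_star`) are hypothetical. The sketch assumes binary k-nearest-neighbor weights in which each county also counts as its own neighbor, and it uses plain Euclidean distance between coordinates. The Gi* z-score follows the standard formula. For county i, it compares the weighted sum of one word's values over the county's neighborhood with what the global mean would predict, then scales by the global standard deviation. Large positive scores mark local hot spots for the word, and large negative scores mark cold spots.

```python
import math


def per_billion(word_counts, total_words):
    """Convert raw word counts for one county into frequencies per billion words."""
    return {word: count * 1e9 / total_words for word, count in word_counts.items()}


def knn_weights(coords, k):
    """Binary k-nearest-neighbor spatial weights matrix (list of rows).

    ASSUMPTION: each location is its own neighbor (w_ii = 1), as Gi* requires,
    plus its k nearest other locations. Plain Euclidean distance on (x, y)
    coordinates is used for simplicity.
    """
    n = len(coords)
    rows = []
    for i, (xi, yi) in enumerate(coords):
        # Rank every other location by distance from location i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        row = [0.0] * n
        row[i] = 1.0  # include the location itself (the "star" in Gi*)
        for j in others[:k]:
            row[j] = 1.0
        rows.append(row)
    return rows


def getis_ord_gi_star(values, weights):
    """Gi* z-score for each location, given one word's values across all locations."""
    n = len(values)
    mean = sum(values) / n
    # Global (population) standard deviation of the word's values.
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z_scores = []
    for row in weights:
        w_sum = sum(row)
        w_sq_sum = sum(w * w for w in row)
        # Observed local weighted sum minus its expectation under the global mean.
        numerator = sum(w * v for w, v in zip(row, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


# Toy example: two clusters of four hypothetical "counties" with made-up values.
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5), (5, 6), (6, 6)]
values = [10, 12, 11, 13, 1, 2, 1, 2]
z = getis_ord_gi_star(values, knn_weights(coords, k=3))
print([round(v, 2) for v in z])
# → [2.61, 2.61, 2.61, 2.61, -2.61, -2.61, -2.61, -2.61]
```

In the toy example, the four high-frequency locations form a hot spot and the four low-frequency ones form a cold spot. The statistic is undefined if every value is identical (zero standard deviation) or if k is at least n - 1 (zero denominator).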



References

For more information, see the papers below. Please cite them if you use the data in your research.

Jack Grieve, Andrea Nini and Diansheng Guo. 2017. Analyzing lexical emergence in American English online. English Language and Linguistics 21: 99-127.

Yuan Huang, Diansheng Guo, Alice Kasakoff and Jack Grieve. 2016. Understanding US regional linguistic variation with Twitter data analysis. Computers, Environment and Urban Systems 54: 244-255.