Firm Name Cleaning

This page contains code to clean and harmonize firm names, either for general databases of firm names or specifically for the firm names in the Burning Glass / Lightcast dataset of firm-level online vacancies.
Here is a link to the firm name cleaning code. With few alterations, this code can be applied to any dataset of firm names that needs to be cleaned or harmonized.
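For readers who want a feel for what rule-based firm name cleaning typically involves, here is a minimal sketch in Python. It is not the linked code: the function name `clean_firm_name`, the `LEGAL_SUFFIXES` list, and the specific rules (lowercasing, dropping punctuation, stripping trailing legal-form suffixes) are all assumptions chosen for illustration, and the linked code may apply different or additional steps.

```python
import re

# Hypothetical list of legal-form suffixes to strip from the end of a name.
# This list is an illustrative assumption, not taken from the linked code.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation", "co", "company",
    "llc", "ltd", "limited", "lp", "llp", "plc",
}


def clean_firm_name(raw: str) -> str:
    """Return a normalized version of a raw firm name (illustrative rules only)."""
    name = raw.lower()
    # Spell out ampersands so "A & B" and "A and B" agree.
    name = name.replace("&", " and ")
    # Drop periods and apostrophes without adding a space, so "l.l.c." becomes "llc".
    name = re.sub(r"[.']", "", name)
    # Turn every remaining run of non-alphanumeric (ASCII) characters into one space.
    name = re.sub(r"[^a-z0-9]+", " ", name)
    tokens = name.split()
    # Strip trailing legal-form suffixes, always keeping at least one token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


examples = ["Acme, Inc.", "ACME L.L.C.", "Johnson & Johnson", "Procter & Gamble Co., Inc."]
cleaned = [clean_firm_name(e) for e in examples]
```

Under these assumed rules, "Acme, Inc." and "ACME L.L.C." both reduce to the same string, which is the kind of harmonization the cleaning step aims for. Rules like these cannot catch misspellings or abbreviations, which is why a separate deduplication step can still be useful afterwards.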
Here is a link to a crosswalk (3.5 GB), which maps raw firm names in the Burning Glass / Lightcast data to cleaned and harmonized firm names. The crosswalk is built by first applying the cleaning code linked above and then running a machine learning deduplication algorithm, Dedupe. The procedure to train the algorithm and construct the crosswalk is described in Appendix A1.1 of "National Wage Setting" (by Jonathon Hazell, Christina Patterson, Heather Sarsons and Bledi Taska).
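As a rough illustration of how a raw-to-harmonized crosswalk can be used, the sketch below loads a small CSV into a dictionary and looks up raw names. The file layout is an assumption: the column names `raw_name` and `clean_name`, the CSV format, and the helper names `load_crosswalk` and `harmonize` are hypothetical and may not match the actual crosswalk file, so check its format before adapting this.

```python
import csv
import io


def load_crosswalk(handle, raw_col="raw_name", clean_col="clean_name"):
    """Read a CSV crosswalk into a dict mapping raw name -> harmonized name.

    The column names are hypothetical defaults, not the real file's headers.
    """
    reader = csv.DictReader(handle)
    return {row[raw_col]: row[clean_col] for row in reader}


def harmonize(names, crosswalk):
    """Map each raw name to its harmonized name; unmatched names become None."""
    return [crosswalk.get(name) for name in names]


# Toy in-memory stand-in for the crosswalk file (made-up rows for illustration).
toy_file = io.StringIO(
    "raw_name,clean_name\n"
    '"Acme, Inc.",acme\n'
    "ACME Corp,acme\n"
    "Globex LLC,globex\n"
)
crosswalk = load_crosswalk(toy_file)
result = harmonize(["ACME Corp", "Globex LLC", "Initech"], crosswalk)
```

Returning `None` for unmatched names makes it easy to count how many raw names the crosswalk does not cover. For a file of several gigabytes, holding the whole mapping in a dictionary may need a lot of memory, so filtering to the names you need while reading, or using a database join, may be more practical.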
This code was developed by Anna Stansbury, Gregor Schubert, Jonathon Hazell, Christina Patterson, Heather Sarsons and Bledi Taska. If you use it, please cite:
Schubert, G., Stansbury, A., & Taska, B. (2021). Employer Concentration and Outside Options.
Hazell, J., Patterson, C., Sarsons, H., & Taska, B. (2024). National Wage Setting.