This page lists datasets and code available in the group's internal shared directory at IU's high performance computing center. Members can access the directory via the ThinLinc client; the files are in the Slate project folder "PFWG21".
This dataset contains web-scraped information on properties listed for rent on Airbnb (2014-2019). It includes a property file (host identifier, property identifier, geolocation, ratings, size, amenities, etc.), a daily file (listed price by day and property ID), and a monthly performance report (host ID, property ID, days listed, days reserved, revenues, etc.). Code is provided to clean the files and link them with panel identifiers.
It is a nationwide dataset with daily pricing information.
o This data source comprises three datasets: a property dataset, a monthly dataset, and a daily dataset. The datasets are linked through the propertyid variable (i.e., a unique identifier for each listed property).
§ The property dataset contains the characteristics of each house (e.g., location, size, number of bedrooms).
§ The monthly dataset contains monthly revenue aggregates for each property.
§ The daily dataset contains daily pricing information for each property listed on Airbnb.
o Limitations: these data can only be accessed through RED. Moreover, the files are too large to open directly in standard statistical software; for example, the raw daily file is approximately 147 GB.
§ You can partition the data before running the analysis. The folder where the data are stored contains a README file explaining how to access and work with these files.
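As one way to follow the partitioning advice above, the sketch below streams a large CSV row by row and writes one smaller file per year, so no statistical package has to load the full file. It assumes, hypothetically, that the daily file is a CSV with a header row and a date column named "date" in YYYY-MM-DD format; the function name and output file names are illustrative, so check the README in the data folder for the actual layout.

```python
import csv
import os


def partition_by_year(src_path, out_dir, date_col="date"):
    """Stream a large CSV and write one CSV per year.

    Hypothetical layout: a header row and a date column formatted as
    YYYY-MM-DD. Rows are read one at a time, so memory use stays small
    regardless of the size of the source file.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # year -> open output file
    writers = {}  # year -> csv.writer for that file
    try:
        with open(src_path, newline="") as src:
            reader = csv.reader(src)
            header = next(reader)
            date_idx = header.index(date_col)
            for row in reader:
                if not row:  # skip blank lines
                    continue
                year = row[date_idx][:4]  # "2016-03-02" -> "2016"
                if year not in writers:
                    path = os.path.join(out_dir, f"daily_{year}.csv")
                    handles[year] = open(path, "w", newline="")
                    writers[year] = csv.writer(handles[year])
                    writers[year].writerow(header)  # repeat header in each part
                writers[year].writerow(row)
    finally:
        for f in handles.values():
            f.close()
    return sorted(handles)  # years that were written
```

For example, partition_by_year("daily.csv", "daily_by_year") would produce daily_2014.csv through daily_2019.csv for the 2014-2019 panel. Splitting by year keeps only a handful of output files open at once; each part keeps the propertyid column, so it can still be linked to the property and monthly files.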
Real estate transfer records. The data are available but have not been cleaned.
Raw national, state, and MSA/non-MSA data downloaded from the Bureau of Labor Statistics. Code for the state data (1997-2020) cleans and processes them into an occupation-by-state-by-year panel.
Individual personnel files retrieved from 46 states, covering 2009 to 2020 where available. Information varies by state but generally includes employee identifying information, position codes, department codes, salary, start/stop dates, compensation, etc. Code cleans the raw data and harmonizes it into an employee panel dataset.
Contact: John Stavick
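Because the personnel files vary by state, harmonizing them largely amounts to mapping each state's column names onto one common schema. A minimal sketch of that idea; the state column names, the COLUMN_MAP table, and the common field list are hypothetical placeholders, not the group's actual code.

```python
# Hypothetical per-state column names mapped onto a common schema.
COLUMN_MAP = {
    "IN": {"EMPLOYEE_NAME": "name", "JOB_TITLE": "position",
           "AGENCY": "department", "ANNUAL_SALARY": "salary"},
    "OH": {"Name": "name", "Job Title": "position",
           "Agency": "department", "Salary": "salary"},
}
COMMON_FIELDS = ["state", "year", "name", "position", "department", "salary"]


def harmonize(state, year, records):
    """Map raw records (dicts) from one state-year file onto COMMON_FIELDS."""
    mapping = COLUMN_MAP[state]
    out = []
    for rec in records:
        row = dict.fromkeys(COMMON_FIELDS)  # fields a state lacks stay None
        row["state"] = state
        row["year"] = year
        for src_col, dst_col in mapping.items():
            if src_col in rec:
                row[dst_col] = rec[src_col]
        # Salaries often arrive as text such as "$52,000.50".
        if isinstance(row["salary"], str) and row["salary"].strip():
            row["salary"] = float(row["salary"].replace("$", "").replace(",", ""))
        out.append(row)
    return out
```

Stacking the harmonized rows from every state and year then gives a long table that can be keyed by state, year, and an employee identifier to form the panel.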
The public-use microdata, with code to import the raw data and clean and compile it into a panel of local governments, 1967-2019.
Raw IPUMS CPS monthly data, with code to clean and stack them into panel form, 2015-2021.
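IPUMS CPS extracts include linking identifiers (CPSID for households, CPSIDP for persons) that connect the same respondent across monthly samples. A minimal sketch of turning stacked monthly rows into per-person histories, assuming rows are dicts keyed by the IPUMS variable names CPSIDP, YEAR, and MONTH; treating a zero or missing CPSIDP as unlinkable is an assumption to check against the IPUMS documentation.

```python
from collections import defaultdict


def build_person_panel(monthly_rows):
    """Group stacked monthly CPS rows into per-person histories.

    Assumes IPUMS-style keys CPSIDP, YEAR, and MONTH. Rows with a missing
    or zero CPSIDP are treated as unlinkable and skipped (an assumption).
    """
    panel = defaultdict(list)
    for row in monthly_rows:
        pid = row.get("CPSIDP")
        if not pid or str(pid) == "0":
            continue
        panel[str(pid)].append(row)  # store IDs as strings (they are long)
    for history in panel.values():
        history.sort(key=lambda r: (int(r["YEAR"]), int(r["MONTH"])))
    return dict(panel)
```

Each value in the returned dict is one person's observations in calendar order, which is the shape needed to compute month-to-month transitions.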
Assorted Indiana data on local governments by year pertaining to budgets and property tax caps.
Raw data from Historic Table 2 of the IRS Statistics of Income individual income tax statistics have been downloaded. Code imports and cleans the data for each year.
Through the IU Libraries we have access to IPREO, which provides pricing information on financial markets, in particular the municipal bond market.
· Note to users: In order to access IPREO you need to set up a new account at the library.
· Municipal Bond Data: the MSRC command allows you to do queries looking for bond-level data. The CUSIP code is the unique identifier for each bond.
o Variables: contains key bond pricing variables such as the price at issue, yield at issue, coupon rate, maturity, use of proceeds, type of bond (revenue bond or general obligation bond), and credit rating, among others. It also has detailed information on financial intermediation variables.
o Limitations: like Bloomberg, IPREO is very restrictive about data downloads. Data must be downloaded through several manual steps (i.e., tab by tab).
o Link:
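Because the CUSIP is the key used to join IPREO, Bloomberg, and WRDS data, it can be worth validating CUSIPs before merging or uploading them. The ninth character of a CUSIP is a check digit computed from the first eight with the standard modulus-10 "double-add-double" scheme (letters map to 10-35, and *, @, # to 36-38). A sketch of that check; the function names are illustrative.

```python
DIGITS = "0123456789"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SPECIAL = {"*": 36, "@": 37, "#": 38}


def cusip_check_digit(base8):
    """Return the check digit for the first 8 characters of a CUSIP."""
    total = 0
    for i, ch in enumerate(base8.upper()):
        if ch in DIGITS:
            v = int(ch)
        elif ch in LETTERS:
            v = LETTERS.index(ch) + 10  # A=10, B=11, ..., Z=35
        elif ch in SPECIAL:
            v = SPECIAL[ch]
        else:
            raise ValueError(f"invalid CUSIP character: {ch!r}")
        if i % 2 == 1:  # double every second character (2nd, 4th, ...)
            v *= 2
        total += v // 10 + v % 10  # add the digits of v
    return (10 - total % 10) % 10


def is_valid_cusip(cusip):
    """True if cusip has 9 characters and its last one matches the check digit."""
    cusip = cusip.strip().upper()
    if len(cusip) != 9 or cusip[8] not in DIGITS:
        return False
    try:
        return cusip_check_digit(cusip[:8]) == int(cusip[8])
    except ValueError:
        return False
```

For example, is_valid_cusip("037833100") returns True, while changing the last digit to 1 returns False. Running this over an exported CUSIP list catches transcription errors before they silently drop rows in a merge.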
Virginia's State Auditor of Public Accounts provides comparative reports of revenues and expenditures by function and object, as well as certain debt and capital information. We have raw data for 1997-2020, along with code to import and clean selected years and assign FIPS codes.
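Assigning FIPS codes typically means normalizing locality names and looking them up in a name-to-code table, keeping codes as zero-padded strings so leading zeros survive. A minimal sketch; the two-entry lookup table is an illustrative excerpt, and a complete table would come from the Census Bureau's FIPS reference files.

```python
VA_STATE_FIPS = "51"  # Virginia's state FIPS code

# Illustrative excerpt only; a complete locality -> county code table
# would come from the Census Bureau's FIPS reference files.
VA_COUNTY_FIPS = {
    "arlington county": "013",
    "fairfax county": "059",
}


def assign_fips(locality):
    """Return a 5-digit county FIPS string for a locality name, or None."""
    key = " ".join(locality.lower().split())  # normalize case and spacing
    county = VA_COUNTY_FIPS.get(key)
    return VA_STATE_FIPS + county if county else None
```

Returning None for unmatched names, rather than guessing, makes it easy to list the localities whose names need a manual fix.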
This is the Multi-Resolution Land Characteristics Consortium's National Land Cover Database (NLCD). Included are the 2016 land cover for the lower 48 states and land cover change from 2016 to 2021.
Map files downloaded from the U.S. Census Bureau.
The Bloomberg LP terminal can be accessed at the SPEA library, which has two terminals. You can log in with a guest account or ask the librarian for an individual account. Navigating the Bloomberg terminal can be challenging at first, but it covers almost every aspect of financial markets.
· Note to users: the Bloomberg terminal has an Excel add-in that must be installed before running Bloomberg. Each time you use the terminal, first verify that the add-in is installed (open Excel and check for a Bloomberg tab in the menu at the top). Because Bloomberg exports downloaded data into a spreadsheet, verify that Bloomberg and Excel are communicating correctly before attempting a download.
· Municipal Bond Data: the MSRC command allows you to do queries looking for bond-level data. The CUSIP code is the unique identifier for each bond.
o Variables: Contains key bond pricing variables like the price at issue, yield at issue, coupon rate, maturity, use of proceeds, type of bond (revenue bond, general obligation bond), credit rating, etc.
o Limitations: Bloomberg is very restrictive about data downloads and does not allow bulk downloads. Each user can download only a limited number of observations per session at the terminal (roughly 10,000 bonds).
§ Learning how to narrow down the query is essential for downloading data efficiently.
o Financial Intermediation: Bloomberg lacks detail on the financial intermediation variables like the underwriter, bond counsel, and financial advisors, as well as the intermediation fees charged by all these agents.
· Financial Statement Variables: the FA command allows you to explore financial statements from state governments, and some local governments as well.
o Variables: here you can find detail on financial statements (balance sheet and income statement) across time and download an excel file with historic data. This data source comes directly from the government’s audited financial statements.
§ Limitation: you can only access one government at a time, so bulk downloads are impractical. This function is most helpful for examining a specific government in depth.
· Pricing Data: with the HP or LP command you can access time series data on the prices of financial instruments such as stocks and bonds, for instance the yield on Treasury securities (i.e., sovereign bonds) and market indices (e.g., commodities, currencies, and interest rates).
§ To download this data you can build a spreadsheet within the Bloomberg terminal, or just display the data using the HP command and manually copy and paste it into a spreadsheet.
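Given the per-session download cap on Bloomberg (roughly 10,000 bonds), one practical approach is to deduplicate a long CUSIP list and split it into batches, downloading one batch per session. A small helper for that; the 10,000 default reflects the approximate limit and should be adjusted if the actual cap differs, and the function names are illustrative.

```python
def unique_in_order(cusips):
    """Normalize CUSIPs (trim, uppercase) and drop blanks and duplicates."""
    seen = set()
    out = []
    for c in cusips:
        key = c.strip().upper()
        if key and key not in seen:
            seen.add(key)
            out.append(key)
    return out


def batches(items, size=10_000):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

For example, 25,000 unique CUSIPs split into batches of 10,000, 10,000, and 5,000, i.e., three sessions at the terminal.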
Through IU we have access to WRDS, which hosts several databases. The one useful for municipal bond data is MSRB. The Municipal Securities Rulemaking Board (MSRB) collects and publishes information on municipal bond trades. This dataset is particularly useful for secondary market data (i.e., bond trades after issuance).
· Variables: has secondary market pricing data.
· Pros: allows bulk downloads; you can extract the complete dataset (memory permitting). Bonds are uniquely identified by CUSIP, so you can merge this dataset with bond and issuer characteristics from Bloomberg or IPREO.
o You can also restrict the download to a sample of bonds obtained from Bloomberg or IPREO: when building a query, you can upload a .txt file listing the CUSIPs of the bonds you want.
· Limitations: does not contain all bond/issuer information. For instance, this dataset does not give the issuer's name, only a description of the instrument.
o Ideally, you can retrieve such information if you have a list of bonds from Bloomberg or IPREO. In that case, merging by CUSIP allows linking bond and issuer characteristics to secondary market pricing data.
· Link to IU: https://libraries.indiana.edu/information-about-wrds-datasets
· Link to WRDS: https://wrds-www.wharton.upenn.edu/login/?next=/pages/get-data/msrb-municipal-securities-rulemaking-board/municipal-securities-transaction-data/
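To restrict a WRDS query to bonds obtained from Bloomberg or IPREO and then attach their characteristics, two small helpers are sketched below: one writes the CUSIP list to a text file for upload, the other left-joins trade rows to bond characteristics by CUSIP. One CUSIP per line is an assumed upload format, and the dict keys ("cusip", "price", "issuer") are hypothetical column names; check both against the WRDS query form and your actual exports.

```python
def write_cusip_list(cusips, path):
    """Write unique, normalized CUSIPs one per line (assumed upload format)."""
    unique = sorted({c.strip().upper() for c in cusips if c.strip()})
    with open(path, "w") as f:
        f.write("".join(c + "\n" for c in unique))
    return len(unique)


def merge_by_cusip(trades, bonds):
    """Left-join trade rows to bond characteristics on the 9-character CUSIP.

    trades and bonds are lists of dicts with a "cusip" key (hypothetical
    column name). Trade fields are never overwritten by bond fields.
    """
    by_cusip = {b["cusip"].strip().upper(): b for b in bonds}
    merged = []
    for t in trades:
        row = dict(t)
        info = by_cusip.get(t["cusip"].strip().upper(), {})
        for key, value in info.items():
            row.setdefault(key, value)  # keep the trade's own values on clashes
        merged.append(row)
    return merged
```

A left join keeps every trade even when no characteristics are found, so unmatched CUSIPs show up as rows without issuer fields instead of disappearing.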
S&P Capital IQ is commonly used in corporate finance. It has information on financial market variables, as well as firm-specific reports and statistics. For public finance, it has data on the S&P Municipal Bond Index (i.e., a set of state-debt indices), so it provides time series data on the price of state governments' debt.
Link to Bond data: https://www.capitaliq.com/CIQDotNet/FixedIncome/BarclayIndex/Index.aspx?companyId=332532024
· Note: you’ll need to create an account using your IU email.
· Some useful links: https://cdn.ihsmarkit.com/www/pdf/Markit-Pricing-Data-Municipal-bonds.pdf , https://cdn.ihsmarkit.com/www/pdf/Pricing-Reference-Data-brochure.pdf and https://www.spglobal.com/marketintelligence/en/mi/products/pricing-data-bonds.html
We have access to the compilation of financial statements for all subnational governments in the United States.
o Limitation: the data come as PDF files, which are very hard to compile and manage without text-processing code.
To access the library, go to https://acfrs.reason.org and log in using the username and password found on the PFWG Trello Board under "Project Resources" --> "Financial Statement Libraries".
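As an illustration of the text-processing code the PDF format calls for, the sketch below pulls "label amount" line items out of text that has already been extracted from a financial-statement PDF (the extraction step itself is not shown). The regular expression, the function name, and the sample labels are assumptions; real statements vary in layout and will need adjustments.

```python
import re

# A label made of letters and common punctuation, then an amount such as
# "1,234,567", "$ 1,234", or "(12,345)". The pattern is an assumption.
LINE_ITEM = re.compile(
    r"^(?P<label>[A-Za-z][A-Za-z ,'&/-]*?)\s+\$?\s*(?P<amount>\(?\d[\d,]*\)?)\s*$"
)


def parse_line_items(text):
    """Return {label: amount} for lines that look like 'label  amount'."""
    items = {}
    for line in text.splitlines():
        m = LINE_ITEM.match(line.strip())
        if not m:
            continue  # headers, page footers, wrapped text, etc.
        raw = m.group("amount")
        value = int(raw.strip("()").replace(",", ""))
        if raw.startswith("("):
            value = -value  # accounting convention: parentheses mean negative
        items[m.group("label").strip()] = value
    return items
```

Lines that do not end in a single amount (multi-column tables, page footers) are skipped, so in practice this would be a first pass whose unmatched lines get reviewed by hand.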