In 2005, India's central bank, the Reserve Bank of India (RBI), announced a bank expansion policy to promote financial inclusion in districts that had remained "underbanked". Banks were incentivized to open customer-facing branches in these underbanked districts. The underbanked status of a district was determined by its population-to-branches ratio: a district was called underbanked if its ratio exceeded the national average (see Young (2021) for a review of the policy framework). The resulting discontinuity at the national average can be used for causal inference, as a number of papers have done across contexts (see, for example, Young (2021), Khanna and Mukherjee (2023), Gupta and Sedai (2023), Cramer (2025)). In my research, I use this policy to explore spatial heterogeneity in job loss after demonetization (Nov 8, 2016) and its consequent impact on household resource sharing. Essentially, districts on either side of the cutoff, which were otherwise similar, differed in the number of bank branches they received due to this policy and, hence, differed in the economic severity of demonetization.
To use this design, we need to construct the running variable (the population-to-branches ratio) and the treatment status (whether the district was given "underbanked" status by the RBI). We use three main data sources:
Population data (Census 2001): The RBI relied on 2001 Census data for the population numbers. These data can be accessed from the Official Census Website, but that link is often down. Another reliable source for this information is the EFA website (maintained by Prof. Arun C. Mehta).
Number of Branches by District: The RBI publishes quarterly reports on the number of bank branches operating in a district. This data can be accessed under Statement 4A here.
Underbanked Districts: In its first Master Circular announcing the policy, dated Sep 08, 2005, the RBI published a list of districts that it called underbanked. A final list of these underbanked districts was published on Jul 01, 2006. Both lists can be found on the RBI website: go to the list of all Master Circulars here. Under Archives, open 2005-2006 (for the Sep 08, 2005 circular) or 2006-2007 (for the Jul 01, 2006 circular) to find the two relevant Master Circulars, both titled "Master Circular on Branch Authorisation": Sep 08, 2005 (see Annex III) and Jul 01, 2006 (see Annex V).
Note: It is okay to use the list from 2005 or 2006 since they are very similar - some papers listed above have used the data from 2005, while others have used 2006 data. One should be careful, however, to use the appropriate bank branches data: If using the 2005 underbanked list, use 2005 Q2 data on bank branches, and for 2006, use 2006 Q1 data.
Since the RBI used 2001 population data to generate its measure of population-to-branches, we must match the 2005/2006 district-branch data to the 2001 district boundaries. There isn't a reliable source on the Internet that has done this, to the best of my knowledge, so I have compiled an almost exhaustive mapping of districts that were formed after 2001 and matched them to their 2001 boundaries. For districts with multiple parent districts, one can form a "super district" (Young (2021)).
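To make the boundary-matching step concrete, here is a minimal sketch of how branch counts for current districts could be rolled up to 2001 boundaries. It is illustrative only and is not my actual merging code: the district names, the numbers, and the function names `build_groups` and `aggregate_to_2001` are all hypothetical. It assumes a crosswalk that lists, for each district formed after 2001, its parent district(s) in 2001; parents that share a child are pooled into one "super district".

```python
from collections import defaultdict


def build_groups(crosswalk):
    """Return {district_2001: group_label}; parents sharing a child are merged."""
    root = {}

    def find(x):
        root.setdefault(x, x)
        while root[x] != x:
            x = root[x]
        return x

    for parents in crosswalk.values():
        first = find(parents[0])
        for other in parents[1:]:
            root[find(other)] = first  # pool all parents of one child

    members = defaultdict(list)
    for d in list(root):
        members[find(d)].append(d)
    # Label each group by its sorted member names, e.g. "A+B"
    return {d: "+".join(sorted(ms)) for ms in members.values() for d in ms}


def aggregate_to_2001(branches_now, pop_2001, crosswalk):
    """Return {group_label: (population_2001, branches)} on 2001 boundaries."""
    # Districts that already existed in 2001 map to themselves
    full = {d: [d] for d in pop_2001}
    full.update(crosswalk)
    group_of = build_groups(full)

    pop = defaultdict(int)
    branches = defaultdict(int)
    for d, p in pop_2001.items():
        pop[group_of[d]] += p
    for d, b in branches_now.items():
        # Raises KeyError if a district is missing from the crosswalk
        branches[group_of[full[d][0]]] += b
    return {g: (pop[g], branches[g]) for g in pop}


# Hypothetical toy data: "C" was carved out of "A" and "B"; "E" out of "D"
pop_2001 = {"A": 1_000_000, "B": 500_000, "D": 800_000}
branches_now = {"A": 50, "B": 20, "C": 10, "D": 40, "E": 5}
crosswalk = {"C": ["A", "B"], "E": ["D"]}

merged = aggregate_to_2001(branches_now, pop_2001, crosswalk)
print(merged)
```

In this toy example, "A" and "B" are pooled into the super district "A+B" because the new district "C" drew territory from both, while "E" is simply folded back into "D".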
Once these datasets have been merged, we can set up the first stage, with Pr(Underbanked) as the dependent variable and the population-to-branches ratio as the running variable. The national average of the population-to-branches ratio comes out to 14,917 (or 6.7 branches per 100,000 people), which will be our cutoff. This is very similar to what the literature has found. For example, Khanna and Mukherjee (2023) use a cutoff of 6.6 branches per 100,000 people in their paper. We see that there is almost perfect compliance with the policy rule: Policy Compliance Figure.
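The construction of the running variable and a simple compliance check can be sketched as follows. This is a hedged illustration, not my actual code: the toy districts and the function names are hypothetical, and it assumes the national average is computed as total population divided by total branches (rather than, say, the mean of district-level ratios). A district is predicted to be underbanked when its ratio exceeds the cutoff, and compliance is the share of districts where that prediction matches the RBI list.

```python
def running_variable(districts):
    """districts: {name: (population, branches)} -> (ratios, cutoff)."""
    total_pop = sum(p for p, _ in districts.values())
    total_br = sum(b for _, b in districts.values())
    # ASSUMPTION: national average = total population / total branches
    cutoff = total_pop / total_br
    # Districts with zero branches are skipped to avoid division by zero
    ratios = {d: p / b for d, (p, b) in districts.items() if b > 0}
    return ratios, cutoff


def compliance(ratios, cutoff, underbanked_list):
    """Return per-district rows and the share where the rule matches the list."""
    rows = []
    for d, r in ratios.items():
        predicted = r > cutoff          # policy rule: ratio above the average
        actual = d in underbanked_list  # status in the RBI circular
        rows.append((d, r - cutoff, predicted, actual))
    share = sum(p == a for _, _, p, a in rows) / len(rows)
    return rows, share


def branches_per_100k(ratio):
    """Convert a population-to-branches ratio into branches per 100,000 people."""
    return 100_000 / ratio


# Hypothetical toy data on 2001 boundaries
districts = {"A+B": (1_500_000, 80), "D": (800_000, 45), "F": (600_000, 60)}
rbi_list = {"A+B", "D"}  # hypothetical underbanked list

ratios, cutoff = running_variable(districts)
rows, share = compliance(ratios, cutoff, rbi_list)
print(round(cutoff), share)
```

The second element of each row is the running variable centered at the cutoff, which is the form typically passed to a regression discontinuity estimator. The helper `branches_per_100k` makes it easy to compare a cutoff expressed as people per branch with one expressed as branches per 100,000 people.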
I will post my code for merging these datasets soon. In case you have questions about that, please feel free to email me.