Dr. Asad Zaman
Notes on the meeting of the Sub-Committee of the Governing Council on preparations for the 6th Population and Housing Census, held on 4th September 2013 at 11:00 AM in the Committee Room of PBS.
Summary of Main Idea of this Note: From the discussions at this meeting, it appeared that the current plans for the 6th Census may never be executed. Furthermore, if they are executed, the outcome is likely to be a very poor set of numbers containing many types of errors, which will create controversy and confusion. The goal of this note is to propose a low-cost alternative which will achieve the goals of the census with higher accuracy and at substantially lower cost.
OBSTACLES to carrying out the 6th Census: The following points were raised at the meeting:
1. Carrying out the current plans would require a huge investment of resources, including money, manpower, engagement of the army on a large scale, and a virtual shutdown of the nation for a couple of days. The current government is engaged in solving pressing problems, and carrying out a census is not high on its list of priorities. It is unlikely to commit to such a huge investment at any time in the near future.
2. Given current security problems, large areas of the country are hard to access and count, including portions of Karachi. Thus large inaccuracies, on the order of 6 to 7 percent, may be expected in the enumeration. Good training for the required staff of about 200,000 people is virtually impossible to achieve, which means even greater inaccuracies in the answers to the questionnaires to be administered.
To summarize, because of political conditions the census is unlikely to be carried out in its present planned shape. Furthermore, even if it were carried out, there are likely to be extremely large inaccuracies in the results. While 6% inaccuracies are tolerable in a small and inexpensive random sample of around 2000 people, they are completely unacceptable in the results of an extremely expensive, exhaustive enumeration of the country's total population of crores of people. Both of these points lead to the question of whether there is an alternative method which could be used to achieve the goals of the census.
A Feasible Alternative: Certain special features of the current situation in Pakistan make it possible to achieve the goals of the census at a substantially lower cost, and with substantially greater accuracy, using a radical change in design. The main idea is to start with NICs as providing a base count for the population. The statistical nature of the problem changes substantially if it is conceived of as one of UPDATING the NADRA database by adding all elements not already counted, instead of counting the entire population of Pakistan starting from scratch.
1. Use the NADRA database as the FRAME for the survey. This is much easier than devoting two initial days of the survey to creating a new frame, or using the obsolete earlier frame. Given that the survey assumes that every household being surveyed will have at least one member with an NIC, this methodology provides a frame at virtually no cost, compared to current plans. Designing a valid frame from which to take a random sample is often the more difficult part of a census or survey undertaking. Here we have one from the beginning, which is currently not being exploited.
2. With the NIC as the framework, the question being asked changes from “How to enumerate the entire population of Pakistan?” to “How to calculate the DIFFERENTIAL between the NADRA database and the ACTUAL population?” This requires different types of statistical techniques.
3. The first step would be to take a PILOT sample of about 2000 randomly selected NIC numbers. For each number selected in this initial sample, we would find out the household size, and ask other relevant, CAREFULLY selected questions. A crucial parameter that we will attempt to measure is the proportion of the population without NICs (mostly youth, but certain others as well). With careful design, we can even include households without any NIC holders. With only 2000 interviews, trained experts should be available and adequate security can be provided, so the whole process should be completed in a short time and at relatively low expense.
4. Proper statistical analysis of this pilot sample will allow us to calculate many parameters of great interest:
a. The distribution of the youth (those not listed in the NADRA database), and also the ratio of people with valid NICs to those who have not applied for one. Once we have this crucial parameter for the pilot sample, we can extrapolate to the whole population to get a preliminary estimate.
b. The VARIANCE of our estimate for this distribution. By comparing two estimates based on subsamples of size 1000 each (the two halves of the full 2000), we can calculate the variance as a function of the sample size. This will allow us to calculate the sample size needed to achieve any desired level of accuracy. Based on experience with similar surveys elsewhere, sample sizes of 60,000 can achieve accuracy within plus or minus 0.5%, which is substantially better than the 6% inaccuracy expected of the complete census.
c. Information on the validity, precision, and informativeness of the questionnaires administered to the pilot sample.
5. Based on the pilot sample, we will get an accurate estimate of the sample size required for the desired level of precision on ANY target quantity we wish to estimate. The primary target quantities are the proportion of the population covered by NADRA and the number of youth who are not covered by NADRA. But other target quantities, such as district-wise enumerations of key characteristics, may also be desirable. Experience indicates that the sample size required could be anywhere between 20,000 and 200,000, but certainly not more than that. This is a MUCH more manageable sample size than the complete enumeration of crores of people. It could be done with low visibility (required politically), on a low budget, and with complete training of all the enumerators required.
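The arithmetic behind points 4 and 5 can be illustrated with a minimal Python sketch. It is only a sketch under stated assumptions, not the proposed methodology: it treats the pilot as a simple random sample, ignores households without any NIC holder and any weighting for selection probabilities, and uses the standard normal-approximation formula for a proportion. The function names, the design_effect parameter, and all numbers in the example are hypothetical and chosen purely for illustration.

```python
import math


def estimate_population(base_count, members_with_nic, members_total):
    """Extrapolate total population from a register count and sample coverage.

    coverage is the share of surveyed household members who hold an NIC.
    Assumption: the sample is representative and needs no weighting.
    """
    coverage = members_with_nic / members_total
    return base_count / coverage


def margin_of_error(p, n, z=1.96, design_effect=1.0):
    """Approximate 95% margin of error for an estimated proportion p.

    design_effect > 1 inflates the variance to mimic a clustered design
    (a hypothetical adjustment; the real value must come from the pilot).
    """
    return z * math.sqrt(design_effect * p * (1.0 - p) / n)


def required_sample_size(p, margin, z=1.96, design_effect=1.0):
    """Approximate sample size needed for a given margin of error on p."""
    return math.ceil(design_effect * z ** 2 * p * (1.0 - p) / margin ** 2)


if __name__ == "__main__":
    # Illustrative numbers only, not real census or NADRA figures.
    print(estimate_population(1_000_000, 1_000, 2_000))
    print(round(margin_of_error(0.5, 2_000), 4))
    print(required_sample_size(0.5, 0.005))
    print(required_sample_size(0.5, 0.005, design_effect=1.5))
```

Under these simplifying assumptions and the worst case p = 0.5, a margin of plus or minus 0.5% needs roughly 38,400 interviews, and a hypothetical design effect of 1.5 raises this to roughly 57,600, which is of the same order as the 60,000 figure quoted in point 4(b). The actual design effect and variance would have to be estimated from the pilot sample itself.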
In summary, I believe it is possible to achieve the objectives of the complete-enumeration census with substantially lower cost and effort by using a random sampling methodology, with the NADRA database as the initial framework for the study. It will require re-thinking the entire plan, and an initial feasibility study by statistical experts which takes into account all the particular details of NADRA, and also the level (district/tehsil/thana) at which precision is desired. Even at a slow bureaucratic pace, a suitable initial plan for the pilot study could be ready for execution within four to six months. The budget required for the pilot study would be “peanuts.” Analysis of the results of the pilot study and design of the main study would require another four to six months. My guess is that we could have results within two years, if the plan is initiated soon.