Fallacy: The goal of research is to estimate a regression model.
TRUTH: Regression models are always a means to an end. We have some hypothesis about the real world and how it works. There are three main purposes of regression models:
Description: The model describes patterns, trends and correlations present in the data: inequality has been increasing or decreasing, X and Y are correlated or not, this correlation has been strengthening or weakening, etc.
Assessment of hypotheses: Economic hypotheses, like the idea that free trade leads to economic growth, can be assessed via regression models, which might support or disconfirm the hypothesis.
Policy decisions: Regression models can provide estimates of the impact of one variable on another, which can be useful for policy.
In all cases, some background theory is used to justify the regression model, and the data sheds light on that theory. There is only one exception to this rule.
EXCEPTIONAL CASE:
The exception is EXPLORATORY DATA ANALYSIS. Here we have no idea how things work in the world, and we run some regressions to explore; in this case the regression is used to generate interesting hypotheses. For example, if we find a negative correlation between inflation and unemployment, we might come up with the hypothesis that reducing unemployment leads to inflation.
A regression model can then be used to provide some support for this hypothesis. It can also be used to describe aspects of the real world that help us understand it, or to make policy decisions.
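The exploratory step described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the two series are synthetic, generated by a made-up rule with made-up numbers, and the helper name pearson_r is hypothetical. Nothing here is real inflation or unemployment data.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# ASSUMPTION: synthetic data, invented for illustration only.
unemployment = [rng.uniform(3.0, 10.0) for _ in range(40)]
inflation = [12.0 - 0.8 * u + rng.gauss(0.0, 1.0) for u in unemployment]

r = pearson_r(unemployment, inflation)
print(f"correlation between unemployment and inflation: {r:.2f}")
```

A negative value of r here only suggests a hypothesis worth investigating. It says nothing about which variable drives the other, and that question has to be settled with theory and other evidence.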
NORMAL CASE:
In the usual case, we have some background theory. The regression is ONLY ONE element of support for the background theory. One can and should bring in other evidence, historical and qualitative, to support (or disconfirm) the hypothesis being researched. The main point is that the regression model is a MEANS to a goal, and not the END or the GOAL. One cannot run a regression and consider the research finished. We now give some specific examples.
EXAMPLES:
SOLUTIONS
So how can we fix these problems?
First, understand that we don't start our research by saying: I am going to discover the effect of X on Y. We must start with a real-world problem. For example, I start by asking: how much of the government budget should be spent on higher education versus primary education? Note that this is a real practical problem. If we spend 1 million on education, should we put 500,000 into higher education and 500,000 into elementary schools, or split it differently?
To answer this question, we will need to run a regression that tells us the rates of return to education in higher education and in elementary education. This will be one part of the information needed to answer the question. Other parts will require reading the literature on education. The whole set of information needed to answer this question would be the subject of an MS thesis, which would review all the related literature. Regression will be just one part of the whole picture, not the entire focus of the thesis. For more detailed information, see How to choose a topic for MS Thesis Research.
Second, understand the limitations of data. This should be understood intuitively and directly, without doing statistics. Suppose I want to find out the effect of remittances on consumption. In my mind, I have the idea that remittances from abroad lead to luxury consumption, but not so much to investment. Or else that remittances are invested in land, leading to higher real estate prices. Can I find this out by using macro data? That is, I use annual data on remittances and annual data on luxury consumption versus investment, or on land prices. Will this tell us what we want to know? HIGHLY UNLIKELY. This is an observational study. Consumption and land prices have been subject to many influences. Everything has been changing over the past twenty years: wars, 9/11, Pakistan's atomic bomb, embargo, oil price hikes, etc. With only a small amount of data, it is hard to sort out these influences and make sure that the changes we see are due to remittances and not to some other factor.
ON THE OTHER HAND, if we look at micro-level household data in HIES or some such source, there are thousands of households. Some receive remittances and some don't. Looking at consumption patterns within these households, it should be possible to find out how remittances are spent, and to compare with households that receive none. So here, micro-level data should enable us to answer the question, which macro-level data cannot.
This type of reasoning must be done in ADVANCE of running regressions, which can easily give misleading answers. For another example of a case where the data is insufficient for the hypothesis being tested, but we can get results by CHANGING and INCREASING the data set, look at [link]
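The contrast between a short macro series and a large household survey can be made concrete with a small simulation. What follows is a rough sketch under stated assumptions, not an analysis of real data: the true effect of 0.5, the noise level, the sample sizes of 20 and 2000, and the function names are all invented for illustration. It also captures only the sample-size part of the argument; real annual data has further problems (trends, shocks that hit every series at once) that this sketch ignores.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slope(n_obs, true_effect, noise_sd, rng):
    """Generate one synthetic data set and return the estimated slope."""
    remittances = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    # The noise term lumps together every other influence on consumption.
    consumption = [true_effect * r + rng.gauss(0.0, noise_sd)
                   for r in remittances]
    return ols_slope(remittances, consumption)


def spread_of_estimates(n_obs, n_trials=200, true_effect=0.5,
                        noise_sd=3.0, seed=0):
    """Mean and standard deviation of the slope across repeated samples."""
    rng = random.Random(seed)
    slopes = [simulate_slope(n_obs, true_effect, noise_sd, rng)
              for _ in range(n_trials)]
    return statistics.fmean(slopes), statistics.stdev(slopes)


# ASSUMPTION: 20 annual observations versus 2000 surveyed households.
macro_mean, macro_sd = spread_of_estimates(n_obs=20)
micro_mean, micro_sd = spread_of_estimates(n_obs=2000)

print(f"macro (n=20):   mean slope {macro_mean:.2f}, spread {macro_sd:.2f}")
print(f"micro (n=2000): mean slope {micro_mean:.2f}, spread {micro_sd:.2f}")
```

With the same true effect in both cases, the estimates from 20 observations scatter widely around it, and a single regression could easily show the wrong sign. The estimates from 2000 observations cluster closely around the true value. This is the kind of check that can be reasoned through before any real regression is run.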