Population Distribution by Single Years of Age
Keith Greiner
August 8, 2020
This page contains two videos that animate the U.S. population distribution by single year of age from 1880 through 2010. The animation shows the amazing changes in our population over the 130 years since James Garfield was elected President. The population distribution helps us understand how human behavior can change as various population segments move from younger to older age groups. The videos each present 14 distribution graphs. Each graph presents the number of people on the y axis, and the age of those individuals on the x axis. For 1890 we see a simple pattern that has more young people than old. Basically, people are born, and then they die at a roughly linear rate. However, beginning with the 1920 Census, we see the beginnings of a pre-WWII baby deficiency that is most likely the result of economic downturns affecting each family's ability to pay the expenses involved in child rearing. After the end of WWII, and as seen in the 1950 Census, there is a dramatic increase in the number of people at the youngest ages. The increase is widely known as the post-WWII baby boom. The distribution for 1970 shows a peak in population for people born around 1962, which is about the same time that birth control pills became widely available.
The 1890 graph shows a deficiency of individuals under the age of 2. An analysis of the 1900 Census suggests that the 1890 data are in error. For the presentation, I did not modify the published data, even though the deficiency is likely the result of an error in the original data collection and tabulation.
The 1990 graph shows much more variation than other years. This appears to be the result of processes used by the Census to smooth some data in other years, but not in 1990. My presentations of the data are as published by the Census at the time the data were obtained, and have not been altered by me.
The 2010 graph includes an Excel polynomial regression trend line of order 4, along with its formula and an R2 value of 0.97279. That is an excellent value. Now that we can see the trend line, we can also see the portion of the curve that is below the line, indicating a baby deficiency, and the portion above the line, which indicates the baby boom.
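As a rough illustration of the trend-line step, the sketch below fits an order-4 polynomial by least squares, computes R2 as 1 minus the ratio of residual to total sum of squares, and then lists which ages fall below or above the fitted curve. The age counts are synthetic placeholders rather than Census figures, the helper names (polyfit, polyval, r_squared) are hypothetical, and Excel's internal computation may differ in detail.

```python
import math

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Normal equations: a[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

def polyval(coeffs, x):
    """Evaluate the polynomial at x (coefficients lowest power first)."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def r_squared(ys, fitted):
    """R2 = 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# SYNTHETIC data (assumption): millions of people at each age 0..100.
ages = list(range(101))
scaled = [age / 100 for age in ages]  # rescale to [0, 1] for numerical stability
counts = [4.2 - 3.8 * u ** 2 + 0.3 * math.sin(8 * u) for u in scaled]

coeffs = polyfit(scaled, counts, 4)
fitted = [polyval(coeffs, u) for u in scaled]
r2 = r_squared(counts, fitted)

# Ages where the observed count sits below or above the trend line
below = [age for age, y, f in zip(ages, counts, fitted) if y < f]
above = [age for age, y, f in zip(ages, counts, fitted) if y > f]
print(f"R2 = {r2:.5f}")
print(f"{len(below)} ages below the trend line, {len(above)} above")
```

With real single-year-of-age counts in place of the synthetic list, the `below` and `above` lists would mark the deficiency and boom cohorts relative to the fitted curve.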
After the 2010 slide, there is a presentation of the distribution of males and females on the same chart. Some people who publish these data will present an upper/lower chart or a left/right chart that is called a population pyramid. That type of chart makes it hard to compare the two distributions directly. The slide in this presentation instead puts the two distributions on the same chart and shows a remarkable pattern in the number of males vs. females by age.
In the early years of the graphs, notice there are peaks in population every 10 years, with minor peaks every five years. This is due to the practice of age-rounding in the Census data.
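One simple way to see age rounding in a table of counts is to compare the share of people reported at ages ending in 0 or 5 with the share a smooth distribution would give. The sketch below does that on a synthetic example; the baseline, the heaping factors, and the function names are all assumptions chosen for illustration, not values taken from the Census.

```python
# SYNTHETIC example (assumption): a smooth baseline for ages 20..69, with
# reported counts inflated at ages ending in 0 and 5 to mimic age rounding.
ages = list(range(20, 70))
baseline = {age: 1000 - 10 * age for age in ages}

def heaping_factor(age):
    """Hypothetical distortion applied to the true count at each age."""
    if age % 10 == 0:
        return 1.30   # strong pull toward ages ending in 0
    if age % 10 == 5:
        return 1.10   # weaker pull toward ages ending in 5
    return 0.95       # other ages lose people to the rounded ages

reported = {age: baseline[age] * heaping_factor(age) for age in ages}

def digit_share(counts, digit):
    """Fraction of all people whose age ends in the given digit."""
    total = sum(counts.values())
    return sum(n for age, n in counts.items() if age % 10 == digit) / total

for digit in (0, 5):
    print(
        f"ages ending in {digit}: "
        f"baseline share {digit_share(baseline, digit):.3f}, "
        f"reported share {digit_share(reported, digit):.3f}"
    )
```

A reported share well above the baseline share for a terminal digit is the numerical counterpart of the peaks visible in the early graphs.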
Each slide is shown for 3.0 seconds, with a 0.7-second transition, so if you think it is not moving, it is. The entire video takes about 1 minute 30 seconds.
The following link shows the historical distributions with the typical college-age segment (ages of 18 to 24) highlighted.
Notice how that segment increases and drops between the Census intervals.
The graphs of population distributions by single year of age had an important role in an analysis I completed between 1989 and 1993. That work resulted in an article published in the Winter 1994 edition of Chance magazine. Chance is a magazine of the American Statistical Association. One premise of the article was that there were concerns about the exponential population growth described by Paul Ehrlich in his 1968 book, The Population Bomb. Ehrlich’s projection model was shown to be inconsistent with the actual growth seen in the United States between 1890 and 1990. Below is an update on the growth charts. Here, I compared actual total population growth to Excel trend lines created under three assumptions: a linear trend line, an exponential trend line, and a 6th-degree polynomial trend line. The three graphs are shown below.
Graph 1 shows the actual U.S. population between 1890 and 2010, with an Excel linear regression trend line. Here, the R2 value is 0.97645, which is very good. Still, as you can see, the trend line does not track the red line of actual values perfectly. On the right end of the graph the trend line falls below the actual. That is to be expected when working with stochastic data. If the final part of the actual line accelerates, then the linear trend will be below it. If the trend line were to be extended, we could expect that the actual would likely be above it, at least until the actual drops below for a while.
Graph 2 takes us closer to Ehrlich's model. Here, we use the Excel option for an exponential trend line. The R2 value is larger, at 0.98848, but on that crucial right end of the trend, we see the trend line is accelerating faster than the original data. An extension of the trend line based on the exponential model will likely overestimate the total population.
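The contrast between Graphs 1 and 2 can be sketched in a few lines of Python. The population list holds approximate decennial census totals rounded to the nearest million, used here as a stand-in for the spreadsheet behind the graphs, so the results will not reproduce the R2 values quoted above. The exponential trend is obtained by fitting a straight line to the natural log of the population, which is the usual way an exponential trend line is computed; the function names are hypothetical.

```python
import math

# Approximate U.S. decennial census totals in millions, rounded (assumption:
# these stand in for the article's spreadsheet, which is not reproduced here).
years = list(range(1890, 2011, 10))
population = [63, 76, 92, 106, 123, 132, 151, 179, 203, 227, 249, 281, 309]

def linear_fit(xs, ys):
    """Ordinary least squares; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Linear trend: population = slope * year + intercept
lin_slope, lin_intercept = linear_fit(years, population)

def linear_trend(year):
    return lin_slope * year + lin_intercept

# Exponential trend: fit a straight line to ln(population), then exponentiate.
log_slope, log_intercept = linear_fit(years, [math.log(p) for p in population])

def exponential_trend(year):
    return math.exp(log_slope * year + log_intercept)

# Compare the two trends at the end of the data and when extended forward.
for year in (2010, 2030, 2050):
    print(year, round(linear_trend(year), 1), round(exponential_trend(year), 1))
```

With these rounded inputs, the 2010 linear value comes out below the census total and the 2010 exponential value above it, the same pattern described for the right end of Graphs 1 and 2, and the gap between the two trends widens as they are extended.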
The third graph shows the original data matched with an Excel polynomial trend line. This analysis has an R2 value of 0.99886, which is the highest value of all. We could conclude that this is wonderful. But what about projections? As it turns out, I have found that a polynomial trend projection can be just as problematic as linear and exponential ones. For the moment, I'll stay with that, pending some analysis to be conducted in the future.
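The caution about projections can be explored with a short experiment: fit polynomials of increasing degree to the same points, compare R2, and look at what each one predicts beyond the data. The sketch below uses approximate census totals rounded to millions (an assumption, not the spreadsheet behind the graphs) and hypothetical helper names. The 2050 projections are printed rather than interpreted, since they depend heavily on the inputs.

```python
# Approximate U.S. decennial census totals in millions, rounded (assumption).
years = list(range(1890, 2011, 10))
population = [63, 76, 92, 106, 123, 132, 151, 179, 203, 227, 249, 281, 309]

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

def polyval(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

def r_squared(ys, fitted):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def rescale(year):
    """Map 1890..2010 onto [-1, 1] so high powers stay well conditioned."""
    return (year - 1950) / 60

scaled = [rescale(y) for y in years]
r2_by_degree = {}
projection_2050 = {}
for degree in (1, 2, 6):
    coeffs = polyfit(scaled, population, degree)
    fitted = [polyval(coeffs, u) for u in scaled]
    r2_by_degree[degree] = r_squared(population, fitted)
    projection_2050[degree] = polyval(coeffs, rescale(2050))
    print(
        f"degree {degree}: R2 = {r2_by_degree[degree]:.5f}, "
        f"2050 projection = {projection_2050[degree]:.1f} million"
    )
```

On the same data, R2 cannot decrease as the polynomial degree rises, so the highest R2 by itself says little about how the curve behaves once it is extended past 2010.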
So, how do these three graphs inform the population trends? That could well be the subject of vigorous debate. For the time being, I'll suggest that the growth is still not at the exponential level proposed by Ehrlich. It may be increasing, but it is not yet exponential. That may be because of the leveling of the population distribution by single years of age for people born after 1962. Imagine that, ideally, the distribution in 2010 should look more like the distribution in 1890, where people are born and then die at roughly a linear rate. Well, at least that is an area of possible future inquiry.