Demog Blog

A prospective age Lexis surface? Convoluted Spielen

posted May 1, 2016, 11:36 AM by Tim Riffe   [ updated May 1, 2016, 11:51 AM ]

One could simply plot remaining life expectancy on a Lexis surface. That'd be the smart thing to do. Not what I'm going to do here, which is Sunday goofing off.

First let's swap out age with prospective age. A prospective age is the age where you hit some fixed level of remaining life expectancy. Since mortality changes depending on where and when you are, prospective ages change too. This is a concept that Warren Sanderson and Sergei Scherbov use a lot. Here's the main go-to paper that explains it all in a compelling way. It's the basis of demographers starting to say things like "60 is the new 50", etc. People had been saying that kind of thing already, and noticing active aged people more and more, but the lifetable gives a nice objective basis for sayings like that. The idea is to take some fixed remaining life expectancy, like 40 or whatever, and ask which age it belongs to. That usually requires some interpolation. So here's a calculator to do it (as well as a survival quantile calculator), and the image I want to explain (it's US females, by the way, HMD, as usual):

So, we have remaining life expectancy on the y-axis, and calendar year on the x-axis. The z-coordinate that is plotted is chronological age, which is represented both with color (darker blue means higher age) and with labelled black contour lines. All the contours are increasing, which just means that remaining life expectancy is increasing at all ages. Where the contours are steeper it increased faster. There is a light 5x5 grid for e(x) and year. Following any of the horizontals will tell you the age at which the given remaining life expectancy was hit. So, for example, follow the e(x) = 25 line. Around 1945 it hits age 50, and in the most recent year it hits 60. Ergo 60-year-olds today have the remaining life expectancy of 50-year-olds in 1945, hence "60 is the new 50". It's awesome to find quick increases, but this plot doesn't have many (many other populations of the world have had more dramatic gains than the USA, by the way).
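The lookup behind each horizontal line (which age carries a given remaining life expectancy?) can be sketched in a few lines. This is a Python sketch, not the linked R code: it uses plain linear interpolation instead of a spline, the function name `prospective_age` is made up for illustration, and the e(x) values are invented to echo the e(x) = 25 example rather than taken from the HMD.

```python
def prospective_age(ages, ex, target_ex):
    """Age at which remaining life expectancy equals target_ex.

    Assumes ex falls with age over the range supplied, and interpolates
    linearly between the two ages that bracket target_ex.
    """
    pairs = list(zip(ages, ex))
    for (a0, e0), (a1, e1) in zip(pairs, pairs[1:]):
        if e0 >= target_ex >= e1:
            if e0 == e1:
                return float(a0)
            return a0 + (e0 - target_ex) / (e0 - e1) * (a1 - a0)
    raise ValueError("target_ex is outside the range of ex")


# Invented e(x) values for two years (NOT HMD data)
ages = [40, 50, 60, 70]
ex_1945 = [33.5, 25.0, 17.5, 11.0]
ex_recent = [42.0, 33.0, 25.0, 17.0]

print(prospective_age(ages, ex_1945, 25.0))    # 50.0
print(prospective_age(ages, ex_recent, 25.0))  # 60.0
```

Running the same lookup for every year and every e(x) level on the grid would give the z-values of a surface like the one described above.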

Finally the yellow descending lines are birth cohorts, and their wavy pattern shows how we've not only flipped, but also irregularly distorted the Lexis diagram. These allow for more comparisons. Of course the surface uses period mortality, so there's another level of distortion in it that simply isn't revealed. One could use cohort data for either a restricted range of e(x) and/or age, or else switch to a country like Sweden, which has a long series of data and lets you do pretty much any formal demog you want.

Here's the code, including a survival quantile lookup function, also based on a spline. You could also repeat the same Lexis-ish transformation with survival quantiles on the y-axis... And you could plot e(x) with one contour plot, and age with another contour plot above that. Oh dear, the possibilities are endless.

(FYI, to see this code, you probably need to actually click on the link to this post. I think it doesn't show up at the top-level blogish part of this website.)

github gist

Decomposing the population pyramid à la Vaupel & Yashin (1987)

posted Mar 1, 2016, 10:09 AM by Tim Riffe   [ updated Mar 1, 2016, 10:10 AM ]

I'm a fan of Vaupel & Yashin (1987) Repeated resuscitation: How lifesaving alters life tables (this paper, possibly paywalled). Here is a JSTOR link to the same article.

Under some strict assumptions, they derive how hypothetical improvements in mortality rates can be translated to saved lives, and show that the process can be repeated indefinitely. Among the items derived is the composition of the stationary population by the number of times individuals have been saved. In the article, they decompose period l(x) schedules by making period comparisons. One thing that I'm always messing around with is the idea that cohort longevity has typically (always?) been better than that belonging to the period lifetables in which cohorts are born. For example, I was born in 1981, with a period e(0) of 70.81. Ya right! My cohort is going to outperform that by a mile! One could already say that the force of mortality that I've been winning against so far my whole life is a perturbation of that which was observed in the USA in 1981. In this way, the 1981 period mortality schedule is the one I'm comparing to. I'm now 34 (I think...), so I have 35 single-age values of cohort m(x) of my own that can be compared with the period m(x) from ages 0 to 34 in 1981. Make sense?

The difference in these two series of m(x) can translate into uppercase Lambda from equation 10 and onwards in the above-linked paper. I can also get l(x) from these period and cohort m(x) series for the sake of comparison, and this gives everything needed for eq 10 in the paper. Ergo, I can decompose my cohort into those whose lives have been saved 0,1,2,... times due to improvements in mortality since we were born. And likewise for all the other cohorts passing through the population pyramid this year.
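A rough Python sketch of that decomposition follows. It is not the linked R code. It assumes the Poisson-type expansion that I understand the paper to rest on: if Λ(x) is the cumulative difference between period and cohort rates up to age x, then the cohort survivorship is l_cohort(x) = l_period(x) · exp(Λ(x)), and the share saved exactly i times is l_period(x) · Λ(x)^i / i!. The function name and the m(x) values are invented, and m(x) is treated as a piecewise-constant hazard.

```python
import math


def saved_decomposition(m_period, m_cohort, max_saves=10):
    """Rows (one per age interval completed) of shares saved 0, 1, 2, ... times.

    Assumes cohort rates are never higher in cumulative terms than period
    rates, so that the cumulative improvement Lambda is non-negative.
    """
    rows = []
    cum_period = 0.0   # cumulative period hazard
    lam = 0.0          # cumulative rate improvement (period minus cohort)
    for mp, mc in zip(m_period, m_cohort):
        cum_period += mp
        lam += mp - mc
        l_period = math.exp(-cum_period)
        rows.append([l_period * lam ** i / math.factorial(i)
                     for i in range(max_saves + 1)])
    return rows


# Invented rates for three single ages (NOT HMD data)
m_period = [0.010, 0.002, 0.001]
m_cohort = [0.008, 0.001, 0.0005]

rows = saved_decomposition(m_period, m_cohort)
never_saved, saved_once = rows[-1][0], rows[-1][1]
print(never_saved, saved_once)
```

Summing a row over all i recovers the cohort survivorship at that age, which is a handy sanity check before scaling the shares up to the population counts in a pyramid.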

In order to break down the upper ages of the pyramid into how many times they've hypothetically been saved thus far, we need a mortality series stretching far enough back in time. The HMD contains a few such series. I'll take Sweden, because it's everyone's toy dataset for stuff like this: you just know it's going to work before you even start!

So, following my period-cohort comparison of mortality schedules to derive cumulative rate improvements, we get the following decomposition of the 2012 Swedish population pyramid:

The central off-white area represents those whose lives have 'never' been saved, and then each successive shade of purple increments the number of times you've been saved. Looks like most 80-year-old females (right side) have been saved at least once, for instance.

This is all hypothetical: it assumes that ongoing mortality risk is the same for those that were saved 0 or 1 or 2 or more times. Inclusion of frailty would change the whole picture. But then we've never seen a pyramid decomposed by a frailty distribution, because frailty distributions are hypothetical too, not really observable directly unless you make more assumptions to instrumentalize the notion.

Here's the code:

Vaupel & Yashin (1987) in R

Note that one of the data objects required to reproduce this is uploaded at the bottom of this post. You could download the cohort death rates from the HMD, but these are not available for cohorts with fewer than 30 observations, which we need. So I derived them straight from the HMD raw data, which is a bit extra work.

FAQ: The 1/3 and 2/3 in the HMD version 5 exposure formula

posted Feb 25, 2016, 11:55 AM by Tim Riffe   [ updated Feb 26, 2016, 1:47 AM ]

It doesn't take that much work to convince oneself that the HMD population exposure formula (version 5, soon to be incremented) makes sense (if you accept the assumptions). The assumptions are simple: Assume that deaths in the upper and lower Lexis triangles are uniformly distributed (and no migration). This is the formula:

Exposure = 1/2 * January 1st population  -  2/3 deaths in the upper triangle + 1/2 December 31st population + 1/3 deaths in lower triangle

It's easy to accept the (Jan+Dec) / 2 part of the formula, but a bit more difficult to wrap one's mind around the - 2/3 and the +1/3 parts for the triangles. 

If you look in Appendix E of the version 5 protocol, you'll get the calculus-based explanation:

Fair is fair, but it's not intuitive to that many people, so I'll show it numerically (you need to play along in R), and then give a non-rigorous geometric explanation, which is however intuitive (I think).

Next some code and an inspirational Lexis square to prove it to yourself: Jan 1 pop is P1, and Dec 31 pop is P2. Deaths in upper triangle are DU, and DL is the lower.

This Lexis square is a bit awkward to prove the 1/3, 2/3 thing graphically, so I'll put an equilateral one at the end, which might help. First note that all the deaths in DU should have been counted alive in P1. So, 1/2 * P1 would be an overestimate of the years those people lived in the triangle. So, we need to subtract some portion of DU to account for the years not lived in the triangle due to deaths. Under uniformity, the number of years they lived in the triangle on average is 1/3, so we need to subtract 2/3. We still need to show that 1/3 is the average. In the same way, the deaths in DL are not counted in P2, which means 1/2 * P2 is an underestimate, so we need to add some portion of DL, and the average years they lived in the triangle is also 1/3.

Here's R code to simulate this numerically, which I'll explain a bit here. First, the years lived by individuals passing through the square are diagonals, but this way of drawing the Lexis diagram stretches lifelines by sqrt(2) because they're shown as a hypotenuse... So that's why the image is awkward. Instead, from this image, imagine the life lived by those dying in the upper triangle as the distance in x from P1, whereas the life lived by those dying in the lower triangle is the distance lived in y from the lower bound of the square.

Follow along in the code to see that the average years lived in each triangle are 1/3, and that the rest therefore makes sense. 
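For readers without R at hand, here is a small Python simulation of the same idea (a sketch, not the linked R code). It scatters deaths uniformly over a unit Lexis square, splits them into upper and lower triangles, and measures life lived in the triangle the way described above: distance in x from P1 for the upper triangle, distance in y from the lower bound for the lower one. The function name, sample size and seed are arbitrary choices.

```python
import random


def mean_years_lived_in_triangles(n=200_000, seed=1):
    """Average time lived inside each Lexis triangle by those dying in it."""
    rng = random.Random(seed)
    upper, lower = [], []
    for _ in range(n):
        # x: time since January 1, y: age above the lower bound of the square
        x, y = rng.random(), rng.random()
        if y > x:
            upper.append(x)   # alive at P1, then lived x before dying
        else:
            lower.append(y)   # crossed the lower age bound, then lived y
    return sum(upper) / len(upper), sum(lower) / len(lower)


mean_upper, mean_lower = mean_years_lived_in_triangles()
print(mean_upper, mean_lower)   # both should land close to 1/3
```

Both averages settle near 1/3 as n grows, which is the only claim this sketch checks; it says nothing about migration or non-uniform deaths.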

Show it in R!

And here is a more visually appealing proof using the equilateral version of Lexis (annotation below)

If we constrain age, period, and cohort to use the same units we end up with equilateral Lexis triangles. Convince yourself that the orange circles are the center of gravity of each triangle (they are!). Now focus on the lower triangle. The lower line is 1 year long (as is the diagonal, because it's equilateral...). If I cut the lower line into thirds, we get the points a, b, c, etc. We know that the distance from a to b is 1/3, and ergo the distance from b to c is 1/3 too. Now notice that the points b, c, d form a new equilateral triangle. And so we know that the segment b,d is 1/3 long as well. Ah! And that's the one that lifelines are parallel to! And since the uniform average of the triangle is at d, we know that it took on average 1/3 of a year to get to d. Voila, we are now convinced about the 1/3 thing in the triangles.

move along, nothing to see here.

#PAA2016 where to find me

posted Feb 5, 2016, 6:51 AM by Tim Riffe   [ updated Feb 5, 2016, 6:53 AM ]

Here is my PAA 2016 program entry:

You can also find me on Wednesday, March 30 at the HMD side meeting in the morning, and at the dataviz workshop in the afternoon. I intend to sleep at night. Most paper submissions currently on the website will be updated before March 7 too.

dataviz workshop pre #PAA2016 @PRBdata @minnpop @MPIDRnews

posted Dec 11, 2015, 5:38 AM by Tim Riffe   [ updated Mar 3, 2016, 9:30 AM ]

The following announcement went out in the most recent issue of PAA Affairs:

PAA Data Visualization Workshop 2016: PAA attendees with an interest in data visualization are invited to attend a pre-PAA workshop to be held on Wednesday, March 30, 2:00-6:00 pm at the Population Reference Bureau office in Washington DC (close to the PAA venue and located at 1875 Connecticut Ave NW, Suite 520). This workshop will include a mixture of short presentations and lots of hands-on exercises with a special focus on visualizing demographic data (stocks, flows, intensities, etc.) in commonly used communication media, such as articles and presentations. All levels of experience are welcome. There is no participation fee, but space is limited. If you are interested in participating and/or would like to be on our listserv, please send an email to Audrey Dorélien and Tim Riffe. This workshop is supported by the Max Planck Institute for Demographic Research, the Minnesota Population Center, and the Population Reference Bureau.

Here's a flyer that Erica Nybro put together (graphic from yours truly, code here)

More program details are available below. I think that all levels of dataviz savviness could benefit from this workshop, and it's a short walk from the Marriott to PRB (or 1 subway stop):

Also, if you bring your PAA presentation, there's a good chance we can arrange for group critique of its visual characteristics. Likewise for yet-to-be-printed posters, but in that case, you'd need to arrange for quick printing yourself, which might cost more than your other options. Also, it'd be best to let us know well in advance (via that special email address, not our personal ones!), so we can have a chance to work it in. The venue probably holds 35-40 people. Organizers and host participants might end up being 6-8 of those, so space is limited. If you're interested, please email
paadataviz2016 (at)

Here's the agenda:

Welcome (10 min) Peter Goldstein

Session 1 (40 min)

Audrey Dorélien Visual Perception and Best Practices for Tables and Graphs

Jon Schwabish Visualizing and presenting data better

* break (10 min) *

Session 2 (40 min)

Jonas Schöley Guidelines for using color

Tim Riffe Visual communication in demography

* refreshments (20 min)*

Small group critique of participants’ active work (2hrs)

Participants are encouraged to send the organizers a current version of a figure or table with underlying data. We will make a selection to work on in small groups. These will be shared with all participants, as well as the workshop results. Selected participants will be asked to present their problem very briefly before starting group work. This section is then organized as follows:

  • Problem introductions (30 min)

  • Work through first set of problems (45 min)

  • Work through second set of problems (45 min)

And some commentary on the agenda: part I consists in wise people imparting wisdom. Audrey and Jon (pending) will give broad and general comments and Jonas will give a rigorous presentation on color (where we could all do a better job!). I'll lead up the snarky part II, which will consist in finding published figures or tables in demography journals and improving them. I'll try to keep the variety up! Finally, part III, which we hope will be the bulk of the workshop, will be hands-on. Participants that want to can email us with a project: a figure from a paper in progress that needs some help, an aspect of a poster for this very PAA that still hasn't been printed, a slide from this very PAA that still hasn't been finalized. You'd of course also need to provide the data. If there is a big response we'll have to do a selection, but we'll play it by ear. So you'd present what the goal is and what's been done so far (like 2 or 3 min maybe), and then we'd break into working groups, each led up by one of the presenters/organizers. I expect we can all learn from each other in these exercises. We'll just take it as far as we can in an hour. This is not software-specific, so we'll just make do with what we have. I think we'll get some awesome results!

Schoen's 'del', a very fine index if there ever was one

posted Aug 28, 2015, 5:11 AM by Tim Riffe   [ updated Aug 28, 2015, 5:12 AM ]

Published on the 45th anniversary of the article (August, 1970), with month-precision...

I've heard Andrew Noymer bring up Schoen's Δ (del) a couple times now in the context of 'good lifetable summary indicators that have been passed over, but without any obvious reason'. Usually it takes three to tip me over, but this time it was two mentions. Maybe because we're talking about demography tools. Well, it should have been one mention!

 Δ is just the geometric mean of a mortality rate schedule, and it has lots of neat properties that you can read about in the short linked article. One of them is that ratios of Δ for two populations (or sexes, or years, etc) can be interpreted in a straightforward way: if the ratio of male Δ to female Δ is 2, then male mortality rates are twice as high. This is not the case for life expectancy or the age standardized death rates. If you halve mortality rates, you don't double life expectancy, and so forth. Age standardized death rates are arbitrary due to the use of a standard (even if some standards are common practice...).
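The ratio property is easy to check numerically. The following Python sketch takes Δ at face value as the plain geometric mean of a schedule of age-specific rates, as described above; any details of age range or weighting in Schoen's article are not reproduced here, and the rates and the name `schoen_del` are invented for illustration.

```python
import math


def schoen_del(mx):
    """Geometric mean of a schedule of strictly positive death rates."""
    return math.exp(sum(math.log(m) for m in mx) / len(mx))


# Invented schedule (NOT HMD data); male rates set to exactly twice female
mx_female = [0.001, 0.002, 0.01, 0.05, 0.2]
mx_male = [2 * m for m in mx_female]

ratio = schoen_del(mx_male) / schoen_del(mx_female)
print(ratio)   # 2, up to floating point error
```

Because the geometric mean is multiplicative, scaling every rate by a constant scales Δ by that same constant, which is exactly what life expectancy does not do.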

So here are all Δ in the HMD (or their inverse, actually...), for both sexes (males blue, females red):

Talk about divergence!!!

If we want to study trends in divergence / convergence, and if we want to make group comparisons in mortality, there is a good argument that this is the measure we should be using. You can decompose differences in much the same way as we decompose differences in life expectancy, and so forth, partitioning the difference out to ages and causes. Just say'n.

Here's the code:


Tic tocs up, toc ticks down, relativized, averaged

posted Aug 21, 2015, 5:27 AM by Tim Riffe

If you haven't seen the Time Flies page yet, you ought to, it's super cool:

That viz (can you call it a dataviz? there's no data... concept viz?) got me thinking. The basic notion is that the meaning of a year gets relativized to the amount of time you've lived. As we grow older, the proportion of our life that a given year takes up is less and less. This is all because our reference period (lived life) is growing. Anyway, it's a theoretical optimum, rather than actual perception, but it coincides with the anecdotes people have about time flying faster all the time. I googled a bit and found that plenty of people actually study the perception of time as a function of age. That's awesome.

I then thought, if you knew how long you'd live, then you'd know how long you have left, and this could be the reference rather than years lived. Let's call this forward-looking relativized, as opposed to backward-looking relativized. Forward-looking relativization is symmetrical to backward-looking relativization. If you knew when you were going to die, you'd have both durations to relativize to. Then what would you think? How would this change the way you make decisions? Does running out of time make you savor? Does it make you slow down and notice details? At mid life, would we switch perspectives, always referring to the shorter segment of life (the one behind or the one in front)? Maybe we'd take an average? But which kind of mean? When you don't know, use the arithmetic mean! And here's how the directionally-averaged perception of a unit of time looks by years lived, years left, and lifespan (in an ATL diagram).

You'd of course need to weight the lifelines in there, possibly using d(x) from the lifetable, or some other population weights. Plenty of imagining to do in this direction. 
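One possible reading of the directional average, sketched in Python: take a year's weight looking backward as 1 / years lived, its weight looking forward as 1 / years left, and average the two arithmetically. This is my interpretation of the description above, not the code behind the figure, and the function name is made up.

```python
def relativized_year(years_lived, years_left):
    """Backward, forward and arithmetic-mean share that one year takes up.

    Both arguments must be positive; the shares are 1/years_lived and
    1/years_left, i.e. one year relative to each reference duration.
    """
    backward = 1.0 / years_lived
    forward = 1.0 / years_left
    return backward, forward, (backward + forward) / 2.0


# A lifespan of 80 seen from age 20 and from age 60
print(relativized_year(20, 60))
print(relativized_year(60, 20))
```

The averaged value is the same at ages 20 and 60 of an 80-year life, which is the symmetry between the two directions; swapping in a harmonic or geometric mean would change the shape of the surface.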

* In this case, the notions of fast and slow can easily flip, depending on what kind of mean you take ... In fact, even in the given arithmetic representation, you could swap slow and fast, and it'd still be legit. mental yoga.

HT #Lotka (1934)

posted Jul 29, 2015, 8:29 AM by Tim Riffe

Today's quote:

"... the question arises: would not a treatment of demographic problems that based itself on hypotheses in order to extract necessary conclusions be of doubtful practical value? We would be powerfully misled in viewing matters that way. The conditions that present themselves in an actual population are always excessively complicated. Whoever has failed to grasp clearly the necessary relations among the characteristics of a theoretical population subject to simple hypotheses, will certainly be unable to manage in the much more complicated relations that exist in a real population. If one has wavered in the attack on a simple problem, he will assuredly stumble in the face of very serious complications. It is for this reason that authors who profess little respect for the application of mathematical analysis to demographic problems are those who in their writings present us with horrible examples of the confusion that results from striving to resolve by an avalanche of words problems whose complexity imposes on us the use of the condensed language of mathematics." - A. Lotka (1934, 1939, both in French) This translation was from D. P. Smith and H. Rossert (1998)

True then, true today. HT to VCR for sending me to this manuscript.

*It turns out to be useful to have a large number of such retorts on hand for the kind of work I do.

mid-lab-talk summary

posted Jul 21, 2015, 5:58 AM by Tim Riffe

I plowed through a bunch of dizzying diagrams, then repeated the exercise with these words:

APC surface

two dimensional hacks

three views it gives

three views it lacks

A dimension of time

of life and of death

with cohorts defined

by their very last breath

topsy turvy

realigned in a mirror

in TPD time

death grows nearer

but to see variation

within spans of life

we need to cut time

with a different knife

you can squeeze out time

or slice right through it

the ATL plane

is a good way to view it

tic tocs up

toc ticks down

add’m together

and lifespans abound

a series of planes

all stacked in a row

gives six temporal views

of stocks or of flows

and as models go

there are more and less clever

we just need to see

these perspectives together

really it’s easier

than origami

just triangle slices

cut like salami

get six dimensions

for the price of three

it’s an aesthetic result

between you and me

this is not a tale

of statistical inference

ask and I’ll give you

complete indifference

but can it do tricks?

can we make any money?

-it tells party jokes

but you won’t find them funny

turns out there’s a hoax

a crime to expose

a detective adventure

that I can’t solve with prose

And then moved on to CSI Rostock: "The case of the mis-specified morbidity pattern"

Next stop, Prague!

6 dimensions of demographic time #demography

posted Jul 16, 2015, 5:06 AM by Tim Riffe   [ updated Jul 19, 2015, 11:12 AM ]

This post will be woefully short. Basically, you know how with APC you buy two and get the third for free? That is, you really only have two pieces of info with APC. Well, if you had three pieces of info you'd get SIX indices! The six indices are chronological age (A), period (P), birth cohort (C), thanatological age (T), death cohort (D), and lifespan (L). In short, you only need 3 pieces of information to build out a 3-d temporal space. For example, with 1) birth cohort, 2) lifespan, and 3) a position in time (period), we get chrono age, death cohort and thano age for free! Who doesn't like free things?! This is an ongoing project of mine to build out a 3-d Lexis-like space. The projection you see in this WebGL object follows the right angles that are all around in demography, whereas an isotropic (time proportions are same in all directions) version of the same space ends up being a tetrahedral-octahedral honeycomb (say what?!). This beast of a diagram was done using the rgl package in R, which lets you save to WebGL, which lets me save the thing so you can see it in a browser. But if you come hang out then we can make one out of Zometool.

This is a still-shot. Just click the image to go to the interactive (twirly) one, or visit:

I'm excited to present this (and a proper buildup to it) at a lab talk next week for the Population and Health Lab at the MPIDR. I'm also excited to present it at the upcoming EAPS "Changing patterns of mortality and morbidity : age-, time-, cause- and cohort-perspectives" workshop in Prague. By then, Jonas Schöley will have been working on an interactive Shiny App to view data that permit the use of such coordinates, and Pancho Villavicencio will be helping me dot the i's and cross the t's when it comes to describing the geometry of all this. Can you say "let's calculate some new kinds of rates!"? Demography rules.
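The "three pieces of info buy six indices" bookkeeping from this post can be written out as a tiny Python sketch. The function name is invented, time is taken in exact years, and the lifespan in the example is a made-up number.

```python
def demographic_time(birth_cohort, lifespan, period):
    """Derive all six indices from birth cohort (C), lifespan (L) and period (P)."""
    age = period - birth_cohort              # A = P - C
    death_cohort = birth_cohort + lifespan   # D = C + L
    thano_age = death_cohort - period        # T = D - P
    return {"A": age, "P": period, "C": birth_cohort,
            "T": thano_age, "D": death_cohort, "L": lifespan}


# Hypothetical person: born 1981, lifespan 85, observed in 2015
idx = demographic_time(1981, 85, 2015)
print(idx)
```

Note that A + T = L always holds, so any three indices that are not tied together by one of these identities pin down the other three.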
