Recent Entries

the joy of fertility
So I've been helping co-teach an R for demographers workshop back at the CED in Barcelona, and one of the sessions I was responsible for was about base plotting. As an exercise, I asked them to make a stacked plot similar to the Joy Division Unknown Pleasures album cover, but made of fertility curves from a data object I gave them. There are many such dataviz projects out there, like this, this, and this, just to list some that come to mind. This way they'd get some experience with using the primitive plotting element polygon(), would have to write a function in order to draw one for each fertility curve, and would need to set up some sort of iteration in order to do so. So, three R concepts covered in this exercise: plot device management, functions, and iteration. Not bad says I. Yes it's a lot for beginners, but that's why the exercises are done with the instructors present. The point isn't necessarily to get the final polished product, but to learn via trying rather than replicating code that 'just works'. So I hope it succeeded somehow there.
Anyway, later on I got a more comprehensive set of fertility curves from the Human Fertility Collection, regenerated the plot, then marked it up in Inkscape (for better labeling). Each fertility curve here is drawn at a spacing of .06 units of TFR, ergo the spacing is on scale with the data and you can treat the baselines as grid lines in this funny way. Really there's no sense trying to read values out of it. Curves for the different countries are sorted by TFR, with the lowest TFR at the top and the highest (in this set) on the bottom. You see what a variety of shapes fertility can take under similar values of TFR. I suppose that's what the plot is good at showing. Some curves are of similar shape and size, but at different locations (ages). Some are of different shape, but the same size. And so forth. Anyway, here's a vector pdf version of the plot: And the R code to produce this (prior to Inkscape markup) can be found here.
And here's an update, only a short while later, due to suggestions received on Twitter from Carl Schmertmann, Alison Taylor, Nikola Sander, and Ramon Bauer. Thanks all!
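For anyone who wants to try the exercise, here is a minimal sketch of the function-plus-iteration idea. It is written in Python rather than the R used in the workshop, it is not the code linked above, and it only computes polygon vertices (the thing you would hand to polygon() or any other fill routine). The toy_curve() schedules, the country labels "A", "B", "C", and the helper names are all made up for illustration; only the .06 spacing and the sort by TFR come from the post.

```python
import math

def ridge_polygon(ages, rates, baseline):
    """Closed polygon for one curve, lifted so its floor sits at `baseline`."""
    top = [(a, baseline + r) for a, r in zip(ages, rates)]
    # Close the shape by running back along the baseline, right to left.
    return top + [(ages[-1], baseline), (ages[0], baseline)]

def toy_curve(ages, peak_age, spread, tfr):
    """Hypothetical bell-shaped schedule whose single-age rates sum to `tfr`."""
    raw = [math.exp(-0.5 * ((a - peak_age) / spread) ** 2) for a in ages]
    total = sum(raw)
    return [tfr * v / total for v in raw]

ages = list(range(12, 56))
curves = {  # made-up stand-ins for real country schedules
    "A": toy_curve(ages, 27, 5, 1.3),
    "B": toy_curve(ages, 30, 4, 1.9),
    "C": toy_curve(ages, 25, 6, 2.6),
}

spacing = 0.06  # same units as the rates, so baselines double as grid lines

# Highest TFR gets the lowest baseline, lowest TFR ends up on top.
order = sorted(curves, key=lambda k: sum(curves[k]), reverse=True)

polygons = {}
for i, name in enumerate(order):
    polygons[name] = ridge_polygon(ages, curves[name], baseline=i * spacing)
```

Each entry of polygons is a list of (age, height) pairs that can be filled as one shape. Drawing them from the top baseline down lets the lower curves overlap the ones behind them.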

Posted Jul 11, 2017, 11:00 AM by Tim Riffe

a higher order time identity
Most demographers are familiar with the age-period-cohort identity. 'Identity' might not be the first thing that comes to mind when we think of APC. Instead we think of the Lexis diagram, or else the identification problem, which then leads back to the fact that these three measures form an identity. Any two of them will do, and you'll get the third via implication. Well, you can go bigger than that. Say we had period, cohort, start of employment, and retirement. By taking the pairwise differences between these four events, we'd end up with 6 further durations. Just as APC relate in a simple graph, the form of a triangle, these 4+6 time measures relate in the form of a denser graph with 5 vertices and 10 (4+6) edges. It's a complete graph, meaning all vertices are connected directly. Via Cayley's formula there are 125 ways to select 4 edges that connect all 5 vertices (i.e. 125 spanning trees). And coming back to the time identity, there are therefore 125 ways to start with 4 time measures (of these 10) and end up generating the remaining 6 (i.e. such that the remaining 6 are implied). Here's a mini poster showing them all: I think I'd try sorting these somehow, but need to figure out how. For now these are unordered 'solutions'.
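The count of 125 can be checked by brute force. The sketch below (Python, not the code behind the poster) enumerates every choice of 4 edges from the complete graph on 5 vertices and keeps the ones that form a spanning tree. The vertex labels are my own reading of the setup and should be treated as an assumption: four events plus a calendar origin, so that edges from the origin are dates and edges between events are durations.

```python
from itertools import combinations

# Assumed labelling: a calendar origin plus the four events in the example.
vertices = ["origin", "birth", "job_start", "retirement", "now"]

# Complete graph on 5 vertices: 4 dates (origin to event) + 6 durations.
all_edges = list(combinations(vertices, 2))

def is_spanning_tree(edges, nodes):
    """True if `edges` (len(nodes) - 1 of them) contain no cycle.

    With n - 1 edges on n nodes, being cycle-free is the same as being
    connected, so this is enough to identify a spanning tree.
    """
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # this edge would close a cycle
        parent[root_a] = root_b
    return True

trees = [e for e in combinations(all_edges, 4) if is_spanning_tree(e, vertices)]
n_trees = len(trees)  # Cayley's formula gives 5 ** (5 - 2)
```

Of the 210 possible 4-edge selections, the ones that fail are exactly those containing a cycle, for example a triangle plus a disjoint edge. Such a set touches every vertex but leaves at least one measure impossible to derive.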

Posted Jul 11, 2017, 7:47 AM by Tim Riffe

Demography, Uruguay, Futbol
So Victoria Prieto kindly invited me to give a lecture via Skype (because the class is in Uruguay) on applications of the Lexis diagram. And I must say that Skype quality just stinks for this kind of thing. They were able to get clear audio of me IFF we made it so that I was just doing screen share and their audio connection was turned off. We made it work somehow, but got lost in a sea of white noise during the Q&A a couple times. Victoria asked me to try to make them love Lexis. So, aside from trying to make a real case on the utility of the Lexis diagram, I also found some atypical applications here and there, and made my own silly (but fun!) example of lifelines in the Lexis diagram, based on futbol (I have to write it like that cuz football means US football to me, no matter how long I live in the EU) players on the Uruguay national team in all of Uruguay's appearances in World Cup and Copa America. Uruguay, you ask? It's here:
(By Connormah, Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=6913529)
Uruguay punches (or bites? mwahaha) above their weight class in the realm of futbol. It's a mostly urban country, and they have a demography master's program, and a PhD specialization in pop studies. And Victoria teaches there, ergo the Lexis lecture, and that's why the Skype hassle. OK, now back to the point of this post:
I just wanted to show that you can represent any population of durations on the Lexis diagram (or its analogues).
Here are the plots I showed them, without first telling them what it was. I asked them to guess the data:
1) guess what this is showing.
It's right skewed, between ages 15 and 40, possibly fertility?! It does look that way, doesn't it? This turns out to be the aggregate of all ages in all teams in all cups, standardized in 2.5-year-wide age groups. Meh. I didn't reveal the subject here yet.
2) Maybe this would tip them off? If I just say that the data in the above histogram came from the points in this Lexis diagram? (click to embiggen)
This image still didn't settle it. I guess x ticks every 20 years kind of obfuscates recognizing the particular years of the cups. Like, 1930 and 1950 I think Uruguay won, but there aren't any ticks there to indicate it. So then came the reveal, but since it was a one-way connection I'm not sure if it tanked or if they thought it was cool. But they did ask for a blog post on it with the code to get the data, etc. The truth is you could augment these data in so many ways. I think sports data is ripe for demographic analysis, though one might need to play with the definition of age, like this:
Same data, but the lifelines only connect the first and last cups played by each player. The lifelines are of course still aligned to age 0 (birth), but you could certainly realign them to make first cup be age zero, or last cup age omega... Further you could augment these to make the points games rather than simply being a team member, add other cups, regular season, and ... the pre-data territory: youth teams. The Lexis diagram isn't the limiting factor here, but the more detail that is added, the less useful lifelines become, and the more likely you are to gain insight from some kind of aggregation (perhaps on a metric rather than on team membership, and aligned to one or another definition of age). Then patterns will emerge that are otherwise invisible. Woot.
I hypothesized that you could use the Lexis diagrams to guess at the heroes of any given team. Whaaa? Take the lifelines of the most recent team, and follow the lines back in time (downward left) to sometime between ages 5 and 15, assuming that that's when futbol impressions are made in the most lasting way, then look up which players appeared in those cups. That's the population of likely homebrew heroes. Not entirely novel. Any player will just tell you their childhood references anyway.
But possibly more interesting than this subject matter was the mungery required to get these data. The data were scraped from Wikipedia entries, like this one. And indeed, using the code annotated here
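To make the realignment idea concrete, here is a small sketch in Python (not the scraping or plotting code mentioned in this post). The two players, their birth years, and their cup years are invented, and age is approximated as cup year minus birth year, ignoring birth dates within the year.

```python
# Made-up players: birth year and the cups they appeared in.
players = {
    "Player A": {"born": 1905, "cups": [1930, 1935]},
    "Player B": {"born": 1925, "cups": [1950, 1954, 1962]},
}

def lexis_points(born, cups):
    """(year, age) points on the usual Lexis diagram, age 0 at birth."""
    return [(year, year - born) for year in sorted(cups)]

def realign(cups, anchor="first"):
    """Time since first cup (>= 0) or time relative to last cup (<= 0)."""
    cups = sorted(cups)
    zero = cups[0] if anchor == "first" else cups[-1]
    return [year - zero for year in cups]

for name, p in players.items():
    pts = lexis_points(p["born"], p["cups"])
    # The lifeline segment in the plot joins only the first and last point.
    segment = (pts[0], pts[-1])
    print(name, segment, realign(p["cups"]), realign(p["cups"], "last"))
```

The same points can sit on any of the three clocks. Only the choice of zero changes, which is what makes the aggregate pattern depend on the definition of age.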
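The hero lookup is easy to sketch once rosters are keyed by cup year. The following Python snippet is only an illustration of that reasoning: the rosters, names, and birth year are invented, and age is again taken as cup year minus birth year.

```python
# Made-up rosters keyed by cup year.
rosters = {
    1966: ["Veteran X", "Veteran Y"],
    1970: ["Veteran Y", "Veteran Z"],
    1990: ["Veteran Z", "Newcomer W"],
}

def formative_cups(born, cup_years, lo=5, hi=15):
    """Cups held while someone born in `born` was between ages lo and hi."""
    return [y for y in sorted(cup_years) if lo <= y - born <= hi]

def candidate_heroes(born, rosters, lo=5, hi=15):
    """Everyone who appeared in a cup during the formative age window."""
    names = set()
    for year in formative_cups(born, rosters.keys(), lo, hi):
        names.update(rosters[year])
    return sorted(names)

# A current player born in 1958 was 8 in 1966 and 12 in 1970.
heroes = candidate_heroes(1958, rosters)
```

Widening or shifting the lo and hi bounds is the obvious knob to play with, since the 5 to 15 window is itself just a guess.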
Feel free to use it, expand it, make cool plots, and share them.
and here's the full presentation:
Thanks again Vicky!

Posted Jul 21, 2016, 2:15 AM by Tim Riffe

Many lifelines
Posted Jun 14, 2016, 1:15 PM by Tim Riffe

Reprojections of the Lexis surface
If we think of age and period as lat-long coordinates, then we can also borrow the analogy of map projections. There is a large but finite number of ways to project the Lexis surface as it is (rotations, reflections, isotropic or not, other kinds of time measures, etc), but there are an infinite number of ways to reproject any of the time measures used as axes, thereby reprojecting whatever data is being shown on the surface. I'll use the example of an age pattern. This is very much in line with the previous post, where age itself was reprojected, and it's a basic extension of the Sanderson-Scherbov method of standardizing age patterns by distorting age in one of them. Let's start with the following standard Lexis surface of fertility.
Behold 120+ years of Sweden: (HFD) There are plenty of stories one can tell here. Now let's use the trick from the previous post, and transform the y-axis. The trick to transforming the y-axis is that the y-coordinate for each x-coordinate must remain comparable with respect to the function that you use to transform. I'll take the example of survival quantiles, because for each year we have a period survival function, l(x) (HMD), which tells us the proportion surviving at each age x. If you invert this then it tells you the age at which a given proportion of the synthetic population remains alive. Ergo, the ages that correspond to quantiles. These change over time, as mortality jumps about and/or gradually improves. We want the quantiles themselves as the new y-axis (last post it was remaining life expectancy, but now using survival quantiles, just because). So, 1) for a given quantile and year, find the exact age that it corresponds to, 2) find the fertility rate that corresponds to that exact age (I use splines for both steps) 3) wash, rinse, repeat over the whole surface: voila:
Same data! OK, we know the story, reproductive ages are now almost completely survived through. But it wasn't always the case! But that's not the point of this post. The point is the idea of reprojecting one or more axes of the Lexis surface, and its variants. All you need are two time series of data that are structured on the same age-like index (lifespan, time-to-death, time since X, time until X, etc). Here I chose lifespan quantiles, but it could have easily been another (preferably monotonic, but not necessarily so) lifetable function (or a function of a lifetable function!). Nor must it be a lifetable function. In this case we could have used ANY age pattern to reproject. You could use fertility to reproject any aspect of mortality, for instance (but there might be more details to sort out).
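The three-step recipe above can be sketched in a few lines. This is a Python illustration and not the code in the repo mentioned below: it swaps the splines for plain linear interpolation to stay dependency-free, and every number in lx_by_year and fx_by_year is a made-up toy value on a coarse age grid, not HMD or HFD data.

```python
def interp(x, xs, ys):
    """Linear interpolation; xs strictly increasing, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])

ages = list(range(0, 101, 10))
lx_by_year = {  # toy period survival curves l(x), strictly decreasing
    1900: [1.0, 0.80, 0.77, 0.73, 0.68, 0.62, 0.54, 0.42, 0.25, 0.07, 0.0],
    2000: [1.0, 0.995, 0.99, 0.985, 0.975, 0.955, 0.91, 0.81, 0.60, 0.25, 0.0],
}
fert_ages = list(range(10, 51, 5))
fx_by_year = {  # toy age-specific fertility rates, zero at both ends
    1900: [0.0, 0.01, 0.10, 0.20, 0.19, 0.14, 0.07, 0.01, 0.0],
    2000: [0.0, 0.005, 0.04, 0.10, 0.11, 0.05, 0.01, 0.0005, 0.0],
}
quantiles = [0.99, 0.95, 0.9, 0.8]

def age_at_quantile(q, ages, lx):
    """Step 1: invert l(x). Reversing makes the survival values increasing."""
    return interp(q, lx[::-1], ages[::-1])

surface = {}
for year in lx_by_year:                                  # step 3: repeat
    for q in quantiles:
        x = age_at_quantile(q, ages, lx_by_year[year])   # step 1
        f = interp(x, fert_ages, fx_by_year[year])       # step 2
        surface[(year, q)] = (x, f)
```

In the toy data the 0.8 quantile falls at age 10 in 1900, before any fertility, while in 2000 it falls well past the reproductive ages. That is the same shift the reprojected surface shows. The inversion relies on l(x) being strictly monotonic, so a non-monotonic function would need a different lookup.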
Maybe I'll post another example sometime.
I presented this idea to some graphics pros yesterday at the Fraunhofer Institute in Rostock, and you can find the code in a repo for that presentation.

Posted May 27, 2016, 2:03 AM by Tim Riffe

