R-stuff

The importance of R: an introduction

"R is really important to the point that it’s hard to overvalue it"

Daryl Pregibon, research scientist at Google

I have been working with R for some years and I must confess that it is currently the only statistical software I use for my analyses. For those who do not (yet) know R, I promise that once you start using it and leave other software behind, you will find a number of built-in mechanisms for organizing data, running calculations, and creating graphical representations of data sets.

I am not a "fundamentalist" of this software, but I am still searching for a good reason to be "converted". I assure you that it is not a learning curve that stops me. Obviously this is not a web page meant to explain the capabilities of R, but I'd like to list some non-academic articles which could give you some reasons for moving to R.

Furthermore, in recent years I developed a specific R package for smoothing and forecasting mortality (and any Poisson-distributed data). It can be installed directly from within R. See subsection MortalitySmooth and the associated paper (Camarda 2012, JSS, 50, 1-24) for more information.

Some years ago, I published an article in which we proposed a novel methodology for smoothing densities with evident digit preference patterns (Camarda et al. 2008, Stat Model, 8(4), 385–401). In subsection Digit Preference Model, I provide simple code for implementing the suggested approach.

To speed up my own research, I also developed two other packages which are not freely accessible for copyright reasons. They collect mortality data from the Human Mortality Database and provide user-friendly functions for extracting specific subsets of these data. See subsections HMDdata and HMDdataLT for information and some examples.


Digit Preference Model

Here I offer the link to the R-codes for implementing the model presented in

Camarda, C. G., P. Eilers and J. Gampe (2008)

Modelling General Patterns of Digit Preference

Statistical Modelling 8, 385-401

(link)


Two codes are provided:

- functionsPCLML1.R : a collection of functions useful to estimate the model

- DPexample.R: an example with simulated data

I commented these files as much as possible and I used simple simulated data, but if something is not clear please let me know.
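To make the notion of digit preference concrete, here is a minimal sketch that simulates ages in which a share of the observations is heaped on multiples of 5. This is not the content of DPexample.R and it does not estimate the model of the paper: the heaping mechanism (rounding to the nearest multiple of 5), the 30% share and all object names are hypothetical choices made for illustration only.

```r
set.seed(3)
n <- 5000

## "true" ages, simulated and truncated to 0-99 (hypothetical distribution)
true_age <- pmin(pmax(round(rnorm(n, mean = 50, sd = 12)), 0), 99)

## assumption: 30% of the individuals report the nearest multiple of 5
heaped   <- runif(n) < 0.3
reported <- ifelse(heaped, 5 * round(true_age / 5), true_age)

## counts by single year of age, 0-100, before and after misreporting
counts_true <- tabulate(true_age + 1, nbins = 101)
counts_rep  <- tabulate(reported + 1, nbins = 101)

## share of ages ending in 0 or 5: true versus reported
share_true <- mean(true_age %% 5 == 0)
share_rep  <- mean(reported %% 5 == 0)
c(true = share_true, reported = share_rep)

## spikes at preferred digits are evident in the reported counts
plot(0:100, counts_rep, type = "h", xlab = "age", ylab = "counts")
lines(0:100, counts_true, col = 2)
```

The spikes at ages ending in 0 and 5 are the kind of pattern that the model aims to remove while estimating a smooth underlying density.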

HMDdata

HMDdata is a package I have compiled to speed up my work on mortality analysis. It does not aim to be as universal and fully tested as a CRAN package. It is rather a user-friendly tool for extracting 1x1 mortality data from the Human Mortality Database (HMD). More complete and general R code for similar purposes is available on Tim Riffe's web-site.

So, HMDdata is nothing complex and there are no models/statistics involved. It is the product of an afternoon's work, when I got tired of the tedious job of selecting specific data from the HMD in matrix format every time. Therefore I decided to include all HMD populations in an R package and to provide a user-friendly R function for extracting them. I also update my own database twice a year by including the latest years for populations already included, as well as newly added populations.

Unfortunately the package HMDdata can't be made publicly available because it includes a lot of information coming from the Human Mortality Database (HMD), and distributing the data would violate the HMD user agreement (paragraph 3: "Please do not pass your copy of these data to other users...."). Hence if you need it or would like to have a look, please write me an email: carlo-giovanni.camarda at ined.fr

Specifically, data for deaths, population, exposures and rates are available in 1x1 age-year intervals (HMDdata). Two functions are also provided:

  • selectHMDdata: select a specific dataset and create an HMDdata object

  • plot.HMDdata: simple plot of an HMDdata object

The manual can be found here; below are some examples and the instructions for installation.

Needless to say, any suggestion/comment/criticism is more than welcome. I hope you'll find it useful and easy to use. Otherwise have a nice day!


## EXAMPLE:

## load the package

library(HMDdata)

## check available populations

names(HMDdata)

## select Danish females deaths, ages 50-100, years 1950-2009

D <- selectHMDdata(country="Denmark",
                   data="Death",
                   sex="Females",
                   ages=50:100,
                   years=1950:2009)

## plot the data

plot(D)

D is a matrix where rows and columns are indexed by age and year, respectively.

Plotting an HMDdata object produces a shaded contour-map or a scatter plot for 2D and 1D datasets, respectively (log-scale in case of rates).
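Since the package itself cannot be distributed, the layout of such an object can be mimicked with a stand-in matrix. The sketch below uses simulated counts (not HMD data) and assumes that the row and column names carry the age and year labels; the object name D_toy is hypothetical and the code does not require HMDdata.

```r
ages  <- 50:100
years <- 1950:2009

set.seed(2)
## simulated death counts, NOT data from the Human Mortality Database
D_toy <- matrix(rpois(length(ages) * length(years), lambda = 100),
                nrow = length(ages), ncol = length(years),
                dimnames = list(as.character(ages), as.character(years)))

## rows are ages, columns are years
dim(D_toy)

## deaths at age 60 in 1980
D_toy["60", "1980"]

## deaths at ages 60-69 in 1980
d6069 <- D_toy[as.character(60:69), "1980"]

## total deaths by year (sum over ages)
tot_by_year <- colSums(D_toy)
head(tot_by_year)
```

Indexing by character labels rather than by position avoids off-by-one mistakes when the first age or year is not 1.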

## INSTALLATION:

## with Linux

setwd("~/your_path/")

install.packages(pkgs="HMDdata_1.0.tar.gz", repos=NULL, type="source")

## with Windows

setwd("~\\your_path\\")

install.packages(pkgs="HMDdata_1.0.zip", repos=NULL)

HMDdataLT

This package mirrors HMDdata. It is a user-friendly tool for collecting and extracting 1x1 life-table functions from the Human Mortality Database (HMD) in R. It works similarly to HMDdata and, likewise, it can't be made publicly available. You can simply send me an email to get it: carlo-giovanni.camarda at ined.fr

Life Expectancy Confidence Intervals

While working with relatively small sub-populations, I encountered the issue of assessing the uncertainty around their life expectancy. In other words, I faced the issue of constructing confidence intervals around life expectancy. Chiang (1984) already proposed a solution based on binomial assumptions for the probability of dying within the life table. Andreev and Shkolnikov (2010) presented a spreadsheet for calculating confidence limits for any life table.

Mimicking this last work, my contribution is to provide a simple R routine for building confidence intervals for life expectancy by bootstrapping life-table deaths under the binomial assumptions. As a by-product, you can also find an R function for constructing a life table from a series of deaths and exposures.
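The general idea can be sketched in a few lines of R. What follows is a simplified illustration on simulated data, not the code provided in the files of this section: the function names (ex_from_qx, boot_ex), the choice ax = 0.5 at all ages, the fixed death rate in the open age interval and the percentile intervals are all assumptions of the sketch.

```r
## Life expectancy at the first age of the vector, from death probabilities qx.
## ax = 0.5 at all ages and a fixed death rate in the open interval are
## simplifying assumptions of this sketch.
ex_from_qx <- function(qx, m_open, ax = 0.5) {
  n     <- length(qx)
  qx[n] <- 1                          # everybody dies in the last (open) interval
  lx    <- cumprod(c(1, 1 - qx[-n]))  # survivors, radix 1
  dx    <- lx * qx                    # life-table deaths
  Lx    <- lx - (1 - ax) * dx         # person-years lived
  Lx[n] <- lx[n] / m_open             # person-years in the open interval
  sum(Lx) / lx[1]
}

## Bootstrap: deaths at each age are redrawn from a binomial distribution
## with probability qx and Nx = Dx / qx trials (rounded to an integer).
boot_ex <- function(Dx, Ex, B = 1000, level = 0.95, ax = 0.5) {
  stopifnot(all(Dx > 0), all(Ex > 0)) # zero counts are not handled here
  mx <- Dx / Ex                       # observed death rates
  qx <- mx / (1 + (1 - ax) * mx)      # death probabilities
  n  <- length(qx)
  Nx <- round(Dx / qx)                # implied number of binomial trials
  ex_hat <- ex_from_qx(qx, m_open = mx[n], ax = ax)
  ex_sim <- replicate(B, {
    D_star <- rbinom(n, size = Nx, prob = qx)
    ex_from_qx(D_star / Nx, m_open = mx[n], ax = ax)
  })
  alpha <- 1 - level
  list(ex = ex_hat,
       ci = quantile(ex_sim, c(alpha / 2, 1 - alpha / 2), names = FALSE))
}

## toy example: simulated small population, ages 50-100 (not real data)
set.seed(4)
age <- 50:100
Ex  <- rep(1000, length(age))
Dx  <- round(Ex * 0.005 * exp(0.09 * (age - 50)))
res <- boot_ex(Dx, Ex, B = 500)
res$ex   # point estimate of life expectancy at age 50
res$ci   # 95% percentile interval
```

The smaller the exposures, the wider the interval should become, which is exactly the concern with small sub-populations.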

Two codes and three datasets for testing the method are provided:

- LifeTableFUN.R: a collection of functions for building any life table and for constructing confidence intervals for life expectancy for any user-defined confidence level, number of simulated life tables, age and sex.

- ConfidenceIntervalLifeExpectancy.R: a script running the previous routines on the 3 different datasets provided below.

- ExampleLT.csv, ExampleLT2.txt and ExampleLT3.txt: three datasets which can be used for testing the approach.

As always, I commented these files as much as possible and I used simple data, but if something is not clear please let me know.

MortalitySmooth

Under construction

Smooth Constrained Mortality Forecasting

This page offers the link to the R-codes for implementing the model presented in


Camarda, C. G. (2019)

Smooth Constrained Mortality Forecasting

Demographic Research. 41 (38), 1091-1130

DOI: 10.4054/DemRes.2019.41.38

Open access here


Prepared on 2019.09.03 using R version 3.6.1. These files are only slightly different from those published on the Demographic Research web-page.

Requirements in terms of R codes and packages are described in the preamble of each file.

Specifically, registration to the Human Mortality Database is required for running all presented examples. Using the R-package "HMDHFDplus", the code will prompt for the HMD user name and password.

The codes are extensively commented and object names follow, as much as possible, the notation presented in the publication, but if something is not clear please let me know.


Transition Coefficients

This page offers the link to the R-codes for implementing the model presented in

Camarda, C. G. (2013).

Estimating Transition Coefficients in Reconstructing Continuous Series of Mortality by Cause of Death.

Modicod. Kick-off Seminar, MPIDR, Rostock (Germany), April 2013.

(slides)


Afterwards, the estimation procedure of the model was modified, though the concept remained unchanged. Specifically, I reduced the number of transition coefficients that need to be estimated by incorporating the equality constraints into the regression framework. Additionally, I implemented a quadratic programming approach via the R function solve.QP (package quadprog) instead of the constrained linear model by lsei (package limSolve) as shown in the original slides.
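To illustrate these two ideas, here is a minimal sketch on simulated data. It is not the code in the files of this section: it assumes, purely for illustration, that the coefficients must be non-negative and sum to one, and the toy design and all object names are hypothetical. Part (a) builds the equality constraint into the regression by substitution, using base R only; part (b) solves the same least-squares problem with solve.QP and runs only if the package quadprog is installed.

```r
set.seed(1)
n <- 50
p <- 3
X <- matrix(runif(n * p), n, p)          # toy design matrix
beta_true <- c(0.6, 0.3, 0.1)            # non-negative, sums to one
y <- drop(X %*% beta_true) + rnorm(n, sd = 0.01)

## (a) equality constraint inside the regression:
##     b_p = 1 - sum(b_1, ..., b_{p-1}), so only p - 1 coefficients are free
Z      <- X[, -p, drop = FALSE] - X[, p] # reduced design
y_adj  <- y - X[, p]
b_free <- qr.solve(Z, y_adj)             # ordinary least squares
b_sub  <- c(b_free, 1 - sum(b_free))

## (b) quadratic programming: minimize ||y - X b||^2
##     subject to sum(b) = 1 (equality) and b >= 0 (inequalities)
b_qp <- NULL
if (requireNamespace("quadprog", quietly = TRUE)) {
  Amat <- cbind(rep(1, p), diag(p))      # one column per constraint
  bvec <- c(1, rep(0, p))
  fit  <- quadprog::solve.QP(Dmat = crossprod(X),
                             dvec = drop(crossprod(X, y)),
                             Amat = Amat, bvec = bvec,
                             meq = 1)    # first constraint is an equality
  b_qp <- fit$solution
}

round(b_sub, 3)
if (!is.null(b_qp)) round(b_qp, 3)
```

In this toy case the non-negativity constraints are not binding, so both routes should give practically the same estimates; the substitution in (a) alone does not guarantee non-negative coefficients, which is where quadratic programming becomes useful.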

Two codes are provided:


  • CfunCoD.R : function for building the transition matrix C as presented on slide 11

  • EstimCoefExample.R: an example with Russian data on digestive disease


I commented these files as much as possible and I used a simple dataset, but if something is not clear please let me know.

Please note that I also generalized the model by assuming that the transition coefficients change smoothly over age. This general approach was presented in


Camarda, C. G. (2014)

Reconstructing Mortality Series by Cause of Death: Two alternative approaches

In Kneib, T., Sobotka, F., Fahrenholz, J. and Irmer, H.: Proceedings of the 29th International Workshop on Statistical Modelling

Göttingen (Germany). 14-18 July, 2014. 69-74

(paper)

and the related R-code will be available (soon).