survival.analysis.fnc

Objetivo

Estimated nonparametric survival Kaplan-Meier models and Cox's regression (parametric) for estimating the survival function predicted by covariates user-defined variables.

KAPLAN-MEIER

We start from the database leukemia survival in order to estimate the survival curve using the Kaplan-Meier function survival.analysis.fnc.

head(leukemia)

time status x

1 9 1 Maintained

2 13 1 Maintained

3 13 0 Maintained

4 18 1 Maintained

5 23 1 Maintained

6 28 0 Maintained

As can be seen we have three variables: time measures the survival time of the study participants. Which some, still alive at the time of measurement (Censored Data status = 0) or died (status = 1) .

We wish first, assess the survival function from leukemia survival time.

survival.analysis.fnc(leukemia, time='time', status='status',

model='km')

#--------------------------------------------------------------

# ANALISIS DE SUPERVIVENCIA - KAPLAN-MEIER

#--------------------------------------------------------------

$modelo

Call: survfit(formula = eval(parse(text = p2)))

n events median 0.95LCL 0.95UCL

23 18 27 18 45

$modelo.km

records n.max n.start events median 0.95LCL 0.95UCL

23 23 23 18 27 18 45

$km.supervivencia

time n.risk n.event survival std.err lower 95% CI upper 95% CI

5 23 2 0.9130 0.0588 0.8049 1.000

8 21 2 0.8261 0.0790 0.6848 0.996

9 19 1 0.7826 0.0860 0.6310 0.971

12 18 1 0.7391 0.0916 0.5798 0.942

13 17 1 0.6957 0.0959 0.5309 0.912

18 14 1 0.6460 0.1011 0.4753 0.878

23 13 2 0.5466 0.1073 0.3721 0.803

27 11 1 0.4969 0.1084 0.3240 0.762

30 9 1 0.4417 0.1095 0.2717 0.718

31 8 1 0.3865 0.1089 0.2225 0.671

33 7 1 0.3313 0.1064 0.1765 0.622

34 6 1 0.2761 0.1020 0.1338 0.569

43 5 1 0.2208 0.0954 0.0947 0.515

45 4 1 0.1656 0.0860 0.0598 0.458

48 2 1 0.0828 0.0727 0.0148 0.462

We have 23 records and there have been 18 events (deaths). The median survival time were 27 months (unit of time), with a confidence interval between 18 and 45 months. The first patient survives 5 months and there are 23 so far at risk (n.risk = 23). At that time point the probability of survival is 95%.

The second patient survives until month 8 and with that included patient there are now 21 at risk. Survival at 8 months falls to 82%. To conclude, we can see that the last death occurs in the month 48 which represents a survival probability of 8% until this week.

In the database you can see that there is a variable that we have not used name x, which measures whether or not the patient have been kept in chemotherapy (Manteined vs Nonmanteined).

frequency.fnc(leukemia, variable='x')

Maintained Nonmaintained

11 12

We will request the survival table at every level of the variable x. We will use which.factor argument including the name or names of the factors on which we want to see the different curves.

survival.analysis.fnc(leukemia, time='time', status='status',

model='km', which.factor='x')

#--------------------------------------------------------------

# ANALISIS DE SUPERVIVENCIA - KAPLAN-MEIER

#--------------------------------------------------------------

$modelo.km

records n.max n.start events median 0.95LCL 0.95UCL

x=Maintained 11 11 11 7 31 18 NA

x=Nonmaintained 12 12 12 11 23 8 NA

$km.supervivencia

Call: survfit(formula = eval(parse(text = p2)))

x=Maintained

time n.risk n.event survival std.err lower 95% CI upper 95% CI

9 11 1 0.909 0.0867 0.7541 1.000

13 10 1 0.818 0.1163 0.6192 1.000

18 8 1 0.716 0.1397 0.4884 1.000

23 7 1 0.614 0.1526 0.3769 0.999

31 5 1 0.491 0.1642 0.2549 0.946

34 4 1 0.368 0.1627 0.1549 0.875

48 2 1 0.184 0.1535 0.0359 0.944

x=Nonmaintained

time n.risk n.event survival std.err lower 95% CI upper 95% CI

5 12 2 0.8333 0.1076 0.6470 1.000

8 10 2 0.6667 0.1361 0.4468 0.995

12 8 1 0.5833 0.1423 0.3616 0.941

23 6 1 0.4861 0.1481 0.2675 0.883

27 5 1 0.3889 0.1470 0.1854 0.816

30 4 1 0.2917 0.1387 0.1148 0.741

33 3 1 0.1944 0.1219 0.0569 0.664

43 2 1 0.0972 0.0919 0.0153 0.620

45 1 1 0.0000 NaN NA NA

$contraste.log.rank Call: survdiff(formula = eval(parse(text = p2))) N Observed Expected (O-E)^2/E (O-E)^2/V x=Maintained 11 7 10.69 1.27 3.4 x=Nonmaintained 12 11 7.31 1.86 3.4 Chisq= 3.4 on 1 degrees of freedom, p= 0.0653

We could request the graph does not contain the confidence intervals for the survival curve including the argument ci=F (TRUE default) (Figure 3).

survival.analysis.fnc(leukemia, time='time',

status='status', model='km', which.factor='x', ci=F)

Cox Regression

REGRESIÓN DE COX

En el análisis de la supervivencia, se conoce como regresión de Cox o también como modelo de los riesgos proporcionales, a una clase de modelos usados para modelar los riesgos que afectan a la supervivencia de una población de sujetos. El modelo de Cox expresa la función de riesgo instantáneo de muerte en función del tiempo t y de covariables X1...Xp (https://es.wikipedia.org/wiki/Regresión_de_Cox).

Utilizamos el modelo de riesgo proporcional de Cox, cuando deseamos poder modelar no sólo la relación entre la tasa de supervivencia y el tiempo, sino también la posible relación con diferentes variables obtenidas para cada sujeto. Se trata por tanto de calcular la tasa de mortalidad como una función del tiempo y de ciertas variables de pronóstico que llamaremos covariantes.

Utilizaremos la base de datos lung que contiene los datos de 228 enfermos de cáncer de pulmón medidos en múltiples variables. Deseamos modelar la probabilidad de supervivencia a partir de obviamente el paso de tiempo como de múltiples variables covariantes como la edad (age), calorías ingeridas (meal.cal), pérdida de peso en los últimos 6 meses ( wt.loss), etc.

head(lung)

inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss

1 3 306 2 74 1 1 90 100 1175 NA

2 3 455 2 68 1 0 90 90 1225 15

3 3 1010 1 56 1 0 90 90 NA 15

4 5 210 2 57 1 1 90 60 1150 11

5 1 883 2 60 1 0 100 90 NA 0

6 12 1022 1 74 1 1 50 80 513 0

Para ello utilizaremos la misma función solo que incluiremos en el argumento modelo el valor 'cox'. Las variables predictoras de la función basal de supervivencia las incluiremos en el argumento variables. Por defecto se llevará a cabo una estimación paso a paso del modelo. Si se desea el método simultáneo (estimar el modelo con todas las variables incluidas) deberemos añadir el argumento paso.a.paso=F.

survival.analysis.fnc(lung, time = 'time', status = 'status',

model='cox')

Hemos solicitado la regresión de Cox pero no hemos incluido ninguna variable covariante predictora en el modelo. Esta estimación es equivalente a la estimación de un modelo con el método Kaplan-Meier y así te lo hará saber proponiéndote la inclusión del argumento variables con el nombre o columnas de las variables que deseas valorar como predictoras de la función de supervivencia. En la siguiente llamada incluiremos todas las variables predictoras disponibles en la base de datos (columnas 4 a 10).

survival.analysis.fnc(lung, time = 'time', status = 'status',

model='cox', variables=4:10)

#------------------------------------------------------------------

# ANALISIS DE SUPERVIVENCIA - REGRESION DE COX - PASO A PASO

#------------------------------------------------------------------

$modelo.cox

Call:

coxph(formula = Surv(time, status) ~ sex + ph.ecog + ph.karno +

pat.karno + wt.loss, data = datos)

n= 168, number of events= 121

coef exp(coef) se(coef) z Pr(>|z|)

sex -0.556256 0.573352 0.198616 -2.801 0.00510 **

ph.ecog 0.738091 2.091939 0.225670 3.271 0.00107 **

ph.karno 0.020412 1.020621 0.011083 1.842 0.06551 .

pat.karno -0.012682 0.987398 0.007926 -1.600 0.10960

wt.loss -0.014601 0.985505 0.007702 -1.896 0.05799 .

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95

sex 0.5734 1.7441 0.3885 0.8462

ph.ecog 2.0919 0.4780 1.3442 3.2557

ph.karno 1.0206 0.9798 0.9987 1.0430

pat.karno 0.9874 1.0128 0.9722 1.0029

wt.loss 0.9855 1.0147 0.9707 1.0005

Concordance= 0.656 (se = 0.031 )

Rsquare= 0.151 (max possible= 0.998 )

Likelihood ratio test= 27.47 on 5 df, p=0.00004616

Wald test = 27.02 on 5 df, p=0.00005666

Score (logrank) test = 27.77 on 5 df, p=0.00004031

$Ho.riesgo.proporcional

rho chisq p

sex 0.1150 1.5117 0.219

ph.ecog 0.0152 0.0297 0.863

ph.karno 0.1643 2.3267 0.127

pat.karno 0.0568 0.4531 0.501

wt.loss 0.0574 0.5502 0.458

GLOBAL NA 7.9388 0.160

La tercera figura se ha obtenido incluyendo el argumento pivote con el nombre del factor o variable sobre la que deseamos ver la curva de supervivencia en el promedio del resto de las variables. Por defecto la función pivota sobre la primera variable del modelo.

survival.analysis.fnc(lung, time = 'time', status = 'status',

model='cox', variables=4:10, pivot='ph.ecog')

La ejecución de esta función genera además un número importante de gráficas que aquí se omite su salida por motivos de espacio pero que puedes consultar en este pdf. En ellas encontrarás:

Gráficas de violación de supuesto de riesgo proporcional para cada variable predictora.
Curva de supervivencia en el promedio de todas las covariantes.
Curva de supervivencia para la primera variable del modelo actuando como pivote (sexo en este ejemplo). Donde podemos observar la curva de supervivencia para cada nivel ( si la variable es un factor del pivote o en los percentiles 25 y 75 si la variable es numérica) en el promedio del resto de covariantes del modelo estimado.
Gráfica de valores palancas donde podemos observar la presencia de casos extremos que afectan al parámetro estimado.
Residuales para cada variable predictora o cavariante.