0.1 Background:

A 1991 Australian study by Caplehorn et al., compared retention in two methadone treatment clinics for heroin addicts. A patient’s survival time \((T)\) was determined as the time in days until the patient dropped out of the clinic or was censored at the end of the study clinic. The two clinics differed according to their overall treatment policies.

0.2 Variable codes:

  • I.D. of subject – ranging from 1 to 266
  • Clinic - 0 = Clinic 1 ; 1 = Clinic 2
  • Status - 0 = Censored ; 1 = endpoint
  • Survival time – in days
  • Pris_rec – Prison record (0 = none ; 1 = any)
  • Depression- 0=No; 1=Yes
  • LI- Lung Injury
  • Dose - Methadone dose (mg/day)

0.3 Reading data in R

setwd("C:\\Users\\rajesh.majumder\\Downloads\\Survival tutorial SJRI_15_07_23\\")
data= read.csv("Addict.csv")

0.4 View data

head(data,10)
##    idno clinic status time pris_rec dose LI Depression
## 1   104      0      0  713        0   50  7          1
## 2   128      0      0   72        1   40  7          0
## 3   137      0      0  563        0   70  8          0
## 4   165      0      0 1076        1   80  8          0
## 5   131      0      0  367        0   70  9          0
## 6   184      0      0  222        0   40  9          0
## 7   193      0      0  531        1   65  9          0
## 8   166      0      0    2        1   40 10          0
## 9   121      0      0  591        0   70 11          0
## 10  163      0      0  611        1   40 11          0

0.5 Data Structure:

str(data)
## 'data.frame':    238 obs. of  8 variables:
##  $ idno      : int  104 128 137 165 131 184 193 166 121 163 ...
##  $ clinic    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ status    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ time      : int  713 72 563 1076 367 222 531 2 591 611 ...
##  $ pris_rec  : int  0 1 0 1 0 0 1 1 0 1 ...
##  $ dose      : int  50 40 70 80 70 40 65 40 70 40 ...
##  $ LI        : int  7 7 8 8 9 9 9 10 11 11 ...
##  $ Depression: int  1 0 0 0 0 0 0 0 0 0 ...

0.6 Reformation of the working data:

## Re-labeling and converting to factor ##
##-- clinic --##
data$clinic= factor(data$clinic,levels=c(0,1),labels=c("Clinic1","Clinic2"))

##--status--##
data$status= factor(data$status,levels=c(0,1),labels=c("Censored","endpoint"))

##--pris_rec--##
data$pris_rec= factor(data$pris_rec,levels=c(0,1),labels=c("none","any"))

##-- Depression --##
data$Depression= factor(data$Depression,levels = c(0,1),labels = c("yes","no"))

0.7 Required Libraries:

library(survival)
library(survminer)
library(ggsurvfit)
library(survRM2)

0.8 visualization of the time distribution

  • Distribution of follow-up times is skewed, and may differ between censored patients and those with events. Follow-up times always positive.

Let’s draw the density plot(using Kernel density estimation technique) for Censored and Event.

library(ggplot2)      
 
# create density plot with desired colours
ggplot(data, aes(x=time,fill=status,color=status)) + geom_density(
                                         alpha=0.8)+facet_wrap(.~clinic)

0.9 Q(1). Calculate the retention probabilities using Kaplan Meier and plot the survival and hazard plots.

Now, Our interest is to estimate what is the retention probability at a certain time say \(t\), which is nothing but Survival probability at a certain time,\(S(t)\).

The survival function, is the probability an individual survives (or, the probability that the event of interest does not occur) up to and including time \(t\). It’s the probability that the event (e.g., death) hasn’t occured yet. It looks like this, where \(T\) is the time of death, and \(P(T>t)\) is the probability that the time of death is greater than some time \(t\).

There are various techniques to estimate this retention probability or survival probability, one of them is Product Limit estimation/Kaplan-Meier estimation.

In R the survfit() function under survival package; estimates median survival with 95% C.I and creates survival curves based on a formula.

## Estimated Median
data$status_code= (data$status=="endpoint")*1
f1=survfit(Surv(time, status_code)~1,data=data)
f1
## Call: survfit(formula = Surv(time, status_code) ~ 1, data = data)
## 
##        n events median 0.95LCL 0.95UCL
## [1,] 238    150    504     394     560
summary(f1)
## Call: survfit(formula = Surv(time, status_code) ~ 1, data = data)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     7    236       1    0.996 0.00423       0.9875        1.000
##    13    235       1    0.992 0.00597       0.9799        1.000
##    17    234       1    0.987 0.00729       0.9731        1.000
##    19    233       1    0.983 0.00840       0.9667        1.000
##    26    232       1    0.979 0.00937       0.9606        0.997
##    29    229       1    0.975 0.01026       0.9546        0.995
##    30    228       1    0.970 0.01107       0.9488        0.992
##    33    227       1    0.966 0.01182       0.9431        0.989
##    35    226       2    0.957 0.01317       0.9320        0.984
##    37    224       1    0.953 0.01379       0.9265        0.981
##    41    223       2    0.945 0.01493       0.9158        0.974
##    44    221       1    0.940 0.01546       0.9105        0.971
##    47    220       1    0.936 0.01597       0.9053        0.968
##    49    219       1    0.932 0.01646       0.9001        0.965
##    50    218       1    0.928 0.01693       0.8949        0.961
##    59    215       1    0.923 0.01739       0.8897        0.958
##    62    214       1    0.919 0.01784       0.8846        0.955
##    67    212       1    0.915 0.01827       0.8794        0.951
##    75    210       1    0.910 0.01870       0.8743        0.948
##    79    209       1    0.906 0.01911       0.8692        0.944
##    84    208       1    0.901 0.01951       0.8641        0.941
##    90    206       1    0.897 0.01990       0.8590        0.937
##    95    205       1    0.893 0.02028       0.8539        0.933
##    96    204       1    0.888 0.02064       0.8488        0.930
##   109    201       1    0.884 0.02101       0.8437        0.926
##   117    199       1    0.880 0.02137       0.8386        0.922
##   122    198       1    0.875 0.02172       0.8335        0.919
##   126    197       1    0.871 0.02206       0.8284        0.915
##   127    196       1    0.866 0.02239       0.8234        0.911
##   129    195       1    0.862 0.02271       0.8184        0.907
##   136    193       1    0.857 0.02302       0.8133        0.904
##   143    192       1    0.853 0.02333       0.8083        0.900
##   145    191       1    0.848 0.02363       0.8033        0.896
##   147    189       1    0.844 0.02393       0.7982        0.892
##   149    187       1    0.839 0.02423       0.7932        0.888
##   150    186       1    0.835 0.02451       0.7881        0.884
##   157    184       1    0.830 0.02480       0.7831        0.880
##   160    183       1    0.826 0.02507       0.7780        0.876
##   161    182       1    0.821 0.02534       0.7730        0.872
##   167    180       1    0.817 0.02561       0.7680        0.868
##   168    179       1    0.812 0.02587       0.7629        0.864
##   170    178       1    0.808 0.02612       0.7579        0.860
##   175    177       1    0.803 0.02637       0.7529        0.856
##   176    175       1    0.798 0.02662       0.7479        0.852
##   180    174       2    0.789 0.02709       0.7379        0.844
##   181    172       1    0.785 0.02732       0.7329        0.840
##   183    171       1    0.780 0.02754       0.7279        0.836
##   190    170       1    0.775 0.02776       0.7229        0.832
##   192    169       1    0.771 0.02797       0.7179        0.828
##   193    168       1    0.766 0.02818       0.7130        0.824
##   204    167       1    0.762 0.02838       0.7080        0.819
##   205    165       1    0.757 0.02858       0.7031        0.815
##   207    164       1    0.752 0.02878       0.6981        0.811
##   209    163       1    0.748 0.02897       0.6931        0.807
##   212    161       2    0.739 0.02934       0.6832        0.798
##   216    159       2    0.729 0.02970       0.6733        0.790
##   223    156       1    0.725 0.02988       0.6683        0.786
##   231    155       1    0.720 0.03005       0.6633        0.781
##   232    154       1    0.715 0.03021       0.6584        0.777
##   237    153       1    0.711 0.03037       0.6534        0.773
##   244    152       1    0.706 0.03053       0.6485        0.768
##   247    151       1    0.701 0.03069       0.6436        0.764
##   257    150       1    0.697 0.03084       0.6386        0.760
##   258    149       1    0.692 0.03098       0.6337        0.755
##   259    148       1    0.687 0.03112       0.6288        0.751
##   262    147       2    0.678 0.03139       0.6190        0.742
##   268    145       2    0.668 0.03165       0.6092        0.733
##   275    143       1    0.664 0.03177       0.6044        0.729
##   280    142       1    0.659 0.03189       0.5995        0.725
##   286    140       1    0.654 0.03201       0.5946        0.720
##   293    139       1    0.650 0.03212       0.5897        0.716
##   294    138       1    0.645 0.03223       0.5848        0.711
##   299    137       1    0.640 0.03234       0.5799        0.707
##   302    136       1    0.636 0.03244       0.5751        0.702
##   314    135       1    0.631 0.03254       0.5702        0.698
##   322    133       1    0.626 0.03264       0.5653        0.693
##   337    130       1    0.621 0.03274       0.5603        0.689
##   341    128       1    0.616 0.03284       0.5553        0.684
##   348    125       1    0.612 0.03295       0.5502        0.680
##   350    124       1    0.607 0.03305       0.5452        0.675
##   358    123       1    0.602 0.03315       0.5401        0.670
##   366    121       1    0.597 0.03325       0.5350        0.666
##   367    120       1    0.592 0.03334       0.5299        0.661
##   368    118       1    0.587 0.03343       0.5247        0.656
##   376    117       1    0.582 0.03352       0.5196        0.651
##   386    116       1    0.577 0.03360       0.5144        0.646
##   389    115       1    0.572 0.03368       0.5093        0.642
##   393    114       1    0.567 0.03376       0.5042        0.637
##   394    113       1    0.562 0.03383       0.4991        0.632
##   399    111       1    0.557 0.03390       0.4939        0.627
##   428    108       1    0.551 0.03398       0.4887        0.622
##   438    107       1    0.546 0.03405       0.4834        0.617
##   450    105       1    0.541 0.03412       0.4782        0.612
##   452    104       1    0.536 0.03419       0.4729        0.607
##   457    102       1    0.531 0.03425       0.4676        0.602
##   460    101       1    0.525 0.03431       0.4622        0.597
##   465     99       1    0.520 0.03437       0.4569        0.592
##   482     96       1    0.515 0.03444       0.4514        0.587
##   489     95       1    0.509 0.03450       0.4459        0.582
##   496     94       1    0.504 0.03456       0.4404        0.576
##   504     92       1    0.498 0.03461       0.4349        0.571
##   512     91       1    0.493 0.03466       0.4294        0.566
##   514     90       1    0.487 0.03471       0.4239        0.560
##   517     89       1    0.482 0.03475       0.4184        0.555
##   518     87       1    0.476 0.03479       0.4128        0.550
##   522     86       1    0.471 0.03482       0.4073        0.544
##   523     85       2    0.460 0.03487       0.3962        0.533
##   532     80       1    0.454 0.03490       0.3905        0.528
##   533     78       1    0.448 0.03494       0.3847        0.522
##   540     77       1    0.442 0.03497       0.3789        0.516
##   546     74       1    0.436 0.03500       0.3729        0.511
##   550     73       1    0.430 0.03503       0.3669        0.505
##   560     70       1    0.424 0.03506       0.3608        0.499
##   563     69       1    0.418 0.03509       0.3547        0.493
##   581     62       1    0.411 0.03517       0.3479        0.486
##   591     59       1    0.404 0.03525       0.3409        0.480
##   612     54       2    0.389 0.03550       0.3257        0.466
##   624     51       1    0.382 0.03562       0.3180        0.458
##   646     48       1    0.374 0.03575       0.3099        0.451
##   652     47       1    0.366 0.03587       0.3019        0.443
##   661     46       1    0.358 0.03596       0.2939        0.436
##   667     45       1    0.350 0.03603       0.2860        0.428
##   679     44       1    0.342 0.03608       0.2781        0.421
##   683     43       1    0.334 0.03610       0.2703        0.413
##   708     39       1    0.325 0.03618       0.2618        0.405
##   714     37       1    0.317 0.03626       0.2530        0.396
##   739     35       1    0.308 0.03633       0.2441        0.388
##   749     34       1    0.299 0.03637       0.2352        0.379
##   755     33       1    0.290 0.03638       0.2263        0.370
##   760     32       1    0.280 0.03635       0.2176        0.362
##   771     28       1    0.270 0.03641       0.2078        0.352
##   774     27       1    0.260 0.03641       0.1980        0.343
##   785     26       1    0.250 0.03636       0.1884        0.333
##   821     20       2    0.225 0.03679       0.1637        0.310
##   836     17       1    0.212 0.03693       0.1508        0.298
##   837     16       1    0.199 0.03693       0.1382        0.286
##   857     14       1    0.185 0.03692       0.1248        0.273
##   878     13       1    0.170 0.03671       0.1118        0.260
##   892     10       1    0.153 0.03679       0.0959        0.245
##   899      9       1    0.136 0.03644       0.0808        0.230

We can give the summary() function an option for what times we want to show in the results.

summary(f1,times = c(10,50,100,150,200,250,300,350,400))
## Call: survfit(formula = Surv(time, status_code) ~ 1, data = data)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    10    235       1    0.996 0.00423        0.988        1.000
##    50    218      16    0.928 0.01693        0.895        0.961
##   100    202       9    0.888 0.02064        0.849        0.930
##   150    186      12    0.835 0.02451        0.788        0.884
##   200    167      15    0.766 0.02818        0.713        0.824
##   250    150      14    0.701 0.03069        0.644        0.764
##   300    136      13    0.640 0.03234        0.580        0.707
##   350    124       7    0.607 0.03305        0.545        0.675
##   400    110      10    0.557 0.03390        0.494        0.627

Note that, by default R does not report mean survival. To get the mean survival:

print(f1,print.rmean=T)
## Call: survfit(formula = Surv(time, status_code) ~ 1, data = data)
## 
##        n events rmean* se(rmean) median 0.95LCL 0.95UCL
## [1,] 238    150    517      24.4    504     394     560
##     * restricted mean with upper limit =  1076

0.9.1 Kaplan-Meier Plots:

plot(f1,
     xlab = "Time",
     ylab="Retension Probability",
     main="Overall Survival Probability ptot",
     lwd=2)
box()

0.9.2 Cumulative Probability

##-- Simple Survival plot
ggsurvplot(f1, data = data, conf.int = T,legend="top")+xlab("Time")

#surv.median.line = "hv"

Here, in the above this is the overall survival curve. But where is the Median Survival value? To draw it…

ggsurvplot(f1, data = data, conf.int = T,legend="top",surv.median.line = "hv")+xlab("Time")

Alternatively,

survfit2(Surv(time, status_code) ~ 1, data = data) %>%
  ggsurvfit()+add_confidence_interval()+add_quantile(y_value = 0.5, color = "gray50", linewidth = 0.75)+
  labs(x="Time (Days)",y="Survival Probability")+
  add_risktable(risktable_stats = c("n.risk","cum.event"),stats_label = list(n.risk = "Number at Risk",cum.event="Number of Events"))

0.10 Q(2). Compare the retention probabilities between the two clinics using the appropriate test.

f2=survfit(Surv(time, status_code)~clinic,data=data)
print(f2,print.rmean = T)
## Call: survfit(formula = Surv(time, status_code) ~ clinic, data = data)
## 
##                  n events rmean* se(rmean) median 0.95LCL 0.95UCL
## clinic=Clinic1  75     28    732      50.5     NA     661      NA
## clinic=Clinic2 163    122    432      23.3    399     341     514
##     * restricted mean with upper limit =  1076
## Survival  Plot
ggsurvplot(f2, data = data,linetype = "strata",pval = TRUE, surv.median.line = "hv",conf.int = F,legend="top")+xlab("Years")

Similarly we can also estimate retention probabilities for other factors:

### For Prison record---:

##-- Median Estimate with 95% C.I
f2=survfit(Surv(time, status_code)~pris_rec,data=data)
f2
## Call: survfit(formula = Surv(time, status_code) ~ pris_rec, data = data)
## 
##                 n events median 0.95LCL 0.95UCL
## pris_rec=none 127     81    523     460     652
## pris_rec=any  111     69    394     268     563
ggsurvplot(f2, data = data,linetype = "strata",pval = TRUE, conf.int = F,surv.median.line = "hv",legend="top")+xlab("Years")

### For Dose---:

##-- Median Estimate with 95% C.I
f3=survfit(Surv(time, status_code)~dose,data=data)
f3
## Call: survfit(formula = Surv(time, status_code) ~ dose, data = data)
## 
##           n events median 0.95LCL 0.95UCL
## dose=20   1      1    127      NA      NA
## dose=30   2      2    332     183      NA
## dose=35   2      2    420     160      NA
## dose=40  30     18    223     117      NA
## dose=45  10      7    322      84      NA
## dose=50  27     15    460     216      NA
## dose=55  21     20    366     262     450
## dose=60  52     37    399     257     708
## dose=65  22     17    496     337      NA
## dose=70  24     14    563     314      NA
## dose=75   6      3    771     771      NA
## dose=80  35     12     NA     785      NA
## dose=90   2      1    280     280      NA
## dose=100  3      1     NA     216      NA
## dose=110  1      0     NA      NA      NA
### For LI record---:

##-- Median Estimate with 95% C.I/
f4=survfit(Surv(time, status_code)~LI,data=data)
f4
## Call: survfit(formula = Surv(time, status_code) ~ LI, data = data)
## 
##        n events median 0.95LCL 0.95UCL
## LI=7   2      0     NA      NA      NA
## LI=8   5      0     NA      NA      NA
## LI=9   4      0     NA      NA      NA
## LI=10  3      0     NA      NA      NA
## LI=11  4      0     NA      NA      NA
## LI=12  9      1     NA      NA      NA
## LI=13  9      0     NA      NA      NA
## LI=14 14      0     NA      NA      NA
## LI=15 16      4     NA     612      NA
## LI=16 19     12    280     207      NA
## LI=17 17      9    223     161      NA
## LI=18 18     14    394     358      NA
## LI=19 17     16    314     190     679
## LI=20 33     28    399     337     560
## LI=21 24     23    366     216     540
## LI=22 20     19    275     247     624
## LI=23 10     10    228     127      NA
## LI=24  4      4    318     175      NA
## LI=25  6      6    178      50      NA
## LI=26  4      4    280      79      NA
### For Depression---:

##-- Median Estimate with 95% C.I/
f5=survfit(Surv(time, status_code)~Depression,data=data)
f5
## Call: survfit(formula = Surv(time, status_code) ~ Depression, data = data)
## 
##                  n events median 0.95LCL 0.95UCL
## Depression=yes  97     30    785     714      NA
## Depression=no  141    120    322     259     452
ggsurvplot(f5, data = data,linetype = "strata",pval = TRUE, conf.int = F,surv.median.line = "hv",legend="top")+xlab("Years")

When there are more than 1 survival curves for number of groups \(k\;|;where\:\:k\geq2\), we need to perform a statistical significant test between those survival curves. As, \(S(t)\) is a probability function, so, Log Rank test statistic is approximately distributed as a chi-square test statistic with d.f. 1.

So the comparison of retention probabilities between Clinics:

0.10.1 Log Rank Test

test= survdiff(Surv(time,status_code)~clinic,data=data)
test
## Call:
## survdiff(formula = Surv(time, status_code) ~ clinic, data = data)
## 
##                  N Observed Expected (O-E)^2/E (O-E)^2/V
## clinic=Clinic1  75       28     59.2      16.4      28.1
## clinic=Clinic2 163      122     90.8      10.7      28.1
## 
##  Chisq= 28.1  on 1 degrees of freedom, p= 1e-07

The Chi-Squared test statistic is 28.1 with 1 degree of freedom and the corresponding p-value is <0.001. Since this p-value is less than 0.001, we reject the null hypothesis,i.e., \(H_0: \text{in terms of retention, there is no difference between two clinics.}\).

In other words, we have sufficient evidence to say that there is a statistically significant difference for retention between the Clinic 1 & Clinic 2.

To, extract the p-value from survdiff we use the following trick:

p.val= 1 - pchisq(test$chisq, length(test$n) - 1)

round(p.val,3)
## [1] 0

similarly, the log-rank test for other covariates:

test= survdiff(Surv(time,status_code)~pris_rec,data=data)
print("Prison record:--")
## [1] "Prison record:--"
test
## Call:
## survdiff(formula = Surv(time, status_code) ~ pris_rec, data = data)
## 
##                 N Observed Expected (O-E)^2/E (O-E)^2/V
## pris_rec=none 127       81     87.4     0.474      1.14
## pris_rec=any  111       69     62.6     0.663      1.14
## 
##  Chisq= 1.1  on 1 degrees of freedom, p= 0.3
test= survdiff(Surv(time,status_code)~dose,data=data)
print("Dose:--")
## [1] "Dose:--"
test
## Call:
## survdiff(formula = Surv(time, status_code) ~ dose, data = data)
## 
##           N Observed Expected (O-E)^2/E (O-E)^2/V
## dose=20   1        1    0.143    5.1221    5.1488
## dose=30   2        2    0.910    1.3073    1.3201
## dose=35   2        2    1.258    0.4382    0.4436
## dose=40  30       18   10.198    5.9682    6.4796
## dose=45  10        7    4.411    1.5202    1.5816
## dose=50  27       15   14.219    0.0429    0.0477
## dose=55  21       20    9.676   11.0153   11.9907
## dose=60  52       37   30.039    1.6133    2.0287
## dose=65  22       17   14.107    0.5933    0.6623
## dose=70  24       14   16.070    0.2667    0.3022
## dose=75   6        3    5.070    0.8453    0.8798
## dose=80  35       12   38.921   18.6209   27.5374
## dose=90   2        1    1.789    0.3478    0.3536
## dose=100  3        1    2.306    0.7396    0.7542
## dose=110  1        0    0.884    0.8843    0.8923
## 
##  Chisq= 53.4  on 14 degrees of freedom, p= 2e-06
test= survdiff(Surv(time,status_code)~LI,data=data)
print("LI:--")
## [1] "LI:--"
test
## Call:
## survdiff(formula = Surv(time, status_code) ~ LI, data = data)
## 
##        N Observed Expected (O-E)^2/E (O-E)^2/V
## LI=7   2        0     1.20  1.20e+00  1.22e+00
## LI=8   5        0     6.30  6.30e+00  6.77e+00
## LI=9   4        0     2.26  2.26e+00  2.31e+00
## LI=10  3        0     1.74  1.74e+00  1.77e+00
## LI=11  4        0     4.55  4.55e+00  4.73e+00
## LI=12  9        1     7.52  5.65e+00  6.11e+00
## LI=13  9        0     6.97  6.97e+00  7.34e+00
## LI=14 14        0    12.62  1.26e+01  1.39e+01
## LI=15 16        4    10.19  3.76e+00  4.09e+00
## LI=16 19       12     9.73  5.28e-01  5.70e-01
## LI=17 17        9     6.81  7.04e-01  7.42e-01
## LI=18 18       14    13.92  4.43e-04  4.94e-04
## LI=19 17       16    10.27  3.19e+00  3.44e+00
## LI=20 33       28    21.59  1.90e+00  2.23e+00
## LI=21 24       23    13.70  6.31e+00  7.01e+00
## LI=22 20       19    10.07  7.91e+00  8.53e+00
## LI=23 10       10     3.71  1.07e+01  1.10e+01
## LI=24  4        4     2.62  7.31e-01  7.49e-01
## LI=25  6        6     2.65  4.25e+00  4.37e+00
## LI=26  4        4     1.57  3.75e+00  3.81e+00
## 
##  Chisq= 87  on 19 degrees of freedom, p= 1e-10
test= survdiff(Surv(time,status_code)~Depression,data=data)
print("Depression:--")
## [1] "Depression:--"
test
## Call:
## survdiff(formula = Surv(time, status_code) ~ Depression, data = data)
## 
##                  N Observed Expected (O-E)^2/E (O-E)^2/V
## Depression=yes  97       30     69.7      22.6      42.6
## Depression=no  141      120     80.3      19.7      42.6
## 
##  Chisq= 42.6  on 1 degrees of freedom, p= 7e-11

0.11 Q(3). Check for the Proportional Hazards assumption for each of the independent variables using– (i)Log log survival plot, (ii)Plot Schoenfeld residuals

Kaplan-Meier curves are good for visualizing differences in survival between two categorical groups, but they don’t work well for assessing the effect of quantitative variables like age, gene expression, leukocyte count, etc. Cox PH regression can assess the effect of both categorical and continuous variables, and can model the effect of multiple variables at once.

Let,

Mod1=coxph(Surv(time,status_code)~Depression,data=data)
summary(Mod1)
## Call:
## coxph(formula = Surv(time, status_code) ~ Depression, data = data)
## 
##   n= 238, number of events= 150 
## 
##                coef exp(coef) se(coef)     z Pr(>|z|)    
## Depressionno 1.2532    3.5016   0.2045 6.128  8.9e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##              exp(coef) exp(-coef) lower .95 upper .95
## Depressionno     3.502     0.2856     2.345     5.228
## 
## Concordance= 0.633  (se = 0.021 )
## Likelihood ratio test= 46.26  on 1 df,   p=1e-11
## Wald test            = 37.55  on 1 df,   p=9e-10
## Score (logrank) test = 42.66  on 1 df,   p=7e-11

The exp(coef) column contains \(e^{\beta}\). This is the adjusted hazard ratio (in our case HR of clinic adjusted for dose =3.502) – the multiplicative effect of that variable on the hazard rate (for each unit increase in that variable). So, for the categorical variable like clinic, at a given instant in time, someone who was diagnosed in clinic 2 is 2.6 times as likely to get event of interest as someone who was diagnosed in clinic 1 adjusting for dose.

Note that, Likelihood ratio test, Wald test and Score test are the values of the test statistics for testing of hypothesis: \(H_0:\text{All}\:\:\beta_s \:\:=0\).

These give the evidence of overall model significance.

Note that the Concordance value is= 66%(0.66) which gives the model accuracy. Just like for logistic regression; AUC.

0.12 Testing proportional Hazards assumption : Log-log plot

plot(survfit(Surv(time, status_code) ~ Depression,data = data), col=c("black", "red"), fun="cloglog",
     xlab = "Time",ylab = "log(-log(S(t)))",main="Log-Log plot to check PH Assumption")

Note that, the two curves are not overlapped, that means the proportional Hazard assumption is hold.

0.13 Testing proportional Hazards assumption : Schoenfeld residuals

Mod2=coxph(Surv(time,status_code)~dose,data=data)
summary(Mod2)
## Call:
## coxph(formula = Surv(time, status_code) ~ dose, data = data)
## 
##   n= 238, number of events= 150 
## 
##           coef exp(coef)  se(coef)  z Pr(>|z|)    
## dose -0.035795  0.964838  0.005966 -6 1.98e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      exp(coef) exp(-coef) lower .95 upper .95
## dose    0.9648      1.036    0.9536    0.9762
## 
## Concordance= 0.65  (se = 0.025 )
## Likelihood ratio test= 36.78  on 1 df,   p=1e-09
## Wald test            = 36  on 1 df,   p=2e-09
## Score (logrank) test = 36.35  on 1 df,   p=2e-09
  • Here, \(H_0:\text{HAZARDs are proportional a.g.,}\:\:H_1:\text{HAZARDs are NOT proportional}\)

  • It will return test for each X and for overall model.

To test for the proportional-hazards (PH) assumption, type this:

cox.zph(Mod2)
##        chisq df    p
## dose   0.174  1 0.68
## GLOBAL 0.174  1 0.68
plot(cox.zph(Mod2))
abline(h=0,col="red")

Mod3= coxph(Surv(time,status_code)~dose+Depression,data=data)
summary(Mod3)
## Call:
## coxph(formula = Surv(time, status_code) ~ dose + Depression, 
##     data = data)
## 
##   n= 238, number of events= 150 
## 
##                   coef exp(coef)  se(coef)      z Pr(>|z|)    
## dose         -0.036286  0.964364  0.006278 -5.780 7.46e-09 ***
## Depressionno  1.224191  3.401415  0.204746  5.979 2.24e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##              exp(coef) exp(-coef) lower .95 upper .95
## dose            0.9644      1.037    0.9526    0.9763
## Depressionno    3.4014      0.294    2.2771    5.0809
## 
## Concordance= 0.71  (se = 0.022 )
## Likelihood ratio test= 80.62  on 2 df,   p=<2e-16
## Wald test            = 68.24  on 2 df,   p=2e-15
## Score (logrank) test = 73.88  on 2 df,   p=<2e-16
cox.zph(Mod3)
##             chisq df    p
## dose       0.1099  1 0.74
## Depression 0.0325  1 0.86
## GLOBAL     0.1411  2 0.93

0.14 Restricted Mean Survival:

Restricted mean survival time (RMST) is suggested as a novel alternative measure in survival analyses and may be useful when proportional hazards assumption cannot be made or when event rate is low.

Note that, previously we have seen that Median survival for the clinic 1 was not converged. So in this kind of scenario RMST are mostly used.

In R to do this RMST:

#library(survRM2)
RMST_Clinic=rmst2(time = data$time,status = data$status_code,arm = (data$clinic=="Clinic1")*1,tau = 905)
RMST_Clinic
## 
## The truncation time: tau = 905  was specified. 
## 
## Restricted Mean Survival Time (RMST) by arm 
##                 Est.     se lower .95 upper .95
## RMST (arm=1) 643.783 40.079   565.229   722.337
## RMST (arm=0) 429.332 22.581   385.073   473.591
## 
## 
## Restricted Mean Time Lost (RMTL) by arm 
##                 Est.     se lower .95 upper .95
## RMTL (arm=1) 261.217 40.079   182.663   339.771
## RMTL (arm=0) 475.668 22.581   431.409   519.927
## 
## 
## Between-group contrast 
##                         Est. lower .95 upper .95 p
## RMST (arm=1)-(arm=0) 214.451   124.287   304.615 0
## RMST (arm=1)/(arm=0)   1.499     1.278     1.759 0
## RMTL (arm=1)/(arm=0)   0.549     0.401     0.752 0