A 1991 Australian study by Caplehorn et al., compared retention in two methadone treatment clinics for heroin addicts. A patient’s survival time \((T)\) was determined as the time in days until the patient dropped out of the clinic or was censored at the end of the study clinic. The two clinics differed according to their overall treatment policies.
setwd("C:\\Users\\rajesh.majumder\\Downloads\\Survival tutorial SJRI_15_07_23\\")
data= read.csv("Addict.csv")
head(data,10)
## idno clinic status time pris_rec dose LI Depression
## 1 104 0 0 713 0 50 7 1
## 2 128 0 0 72 1 40 7 0
## 3 137 0 0 563 0 70 8 0
## 4 165 0 0 1076 1 80 8 0
## 5 131 0 0 367 0 70 9 0
## 6 184 0 0 222 0 40 9 0
## 7 193 0 0 531 1 65 9 0
## 8 166 0 0 2 1 40 10 0
## 9 121 0 0 591 0 70 11 0
## 10 163 0 0 611 1 40 11 0
str(data)
## 'data.frame': 238 obs. of 8 variables:
## $ idno : int 104 128 137 165 131 184 193 166 121 163 ...
## $ clinic : int 0 0 0 0 0 0 0 0 0 0 ...
## $ status : int 0 0 0 0 0 0 0 0 0 0 ...
## $ time : int 713 72 563 1076 367 222 531 2 591 611 ...
## $ pris_rec : int 0 1 0 1 0 0 1 1 0 1 ...
## $ dose : int 50 40 70 80 70 40 65 40 70 40 ...
## $ LI : int 7 7 8 8 9 9 9 10 11 11 ...
## $ Depression: int 1 0 0 0 0 0 0 0 0 0 ...
## Re-labeling and converting to factor ##
##-- clinic --##
data$clinic= factor(data$clinic,levels=c(0,1),labels=c("Clinic1","Clinic2"))
##--status--##
data$status= factor(data$status,levels=c(0,1),labels=c("Censored","endpoint"))
##--pris_rec--##
data$pris_rec= factor(data$pris_rec,levels=c(0,1),labels=c("none","any"))
##-- Depression --##
data$Depression= factor(data$Depression,levels = c(0,1),labels = c("yes","no"))
library(survival)
library(survminer)
library(ggsurvfit)
library(survRM2)
Let’s draw the density plot(using Kernel density estimation technique) for Censored and Event.
library(ggplot2)
# create density plot with desired colours
ggplot(data, aes(x=time,fill=status,color=status)) + geom_density(
alpha=0.8)+facet_wrap(.~clinic)
Now, Our interest is to estimate what is the retention probability at a certain time say \(t\), which is nothing but Survival probability at a certain time,\(S(t)\).
The survival function, is the probability an individual survives (or, the probability that the event of interest does not occur) up to and including time \(t\). It’s the probability that the event (e.g., death) hasn’t occured yet. It looks like this, where \(T\) is the time of death, and \(P(T>t)\) is the probability that the time of death is greater than some time \(t\).
There are various techniques to estimate this retention probability or survival probability, one of them is Product Limit estimation/Kaplan-Meier estimation.
In R the survfit()
function under survival
package; estimates median survival with 95% C.I and creates survival
curves based on a formula.
## Estimated Median
data$status_code= (data$status=="endpoint")*1
f1=survfit(Surv(time, status_code)~1,data=data)
f1
## Call: survfit(formula = Surv(time, status_code) ~ 1, data = data)
##
## n events median 0.95LCL 0.95UCL
## [1,] 238 150 504 394 560
summary(f1)
## Call: survfit(formula = Surv(time, status_code) ~ 1, data = data)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 7 236 1 0.996 0.00423 0.9875 1.000
## 13 235 1 0.992 0.00597 0.9799 1.000
## 17 234 1 0.987 0.00729 0.9731 1.000
## 19 233 1 0.983 0.00840 0.9667 1.000
## 26 232 1 0.979 0.00937 0.9606 0.997
## 29 229 1 0.975 0.01026 0.9546 0.995
## 30 228 1 0.970 0.01107 0.9488 0.992
## 33 227 1 0.966 0.01182 0.9431 0.989
## 35 226 2 0.957 0.01317 0.9320 0.984
## 37 224 1 0.953 0.01379 0.9265 0.981
## 41 223 2 0.945 0.01493 0.9158 0.974
## 44 221 1 0.940 0.01546 0.9105 0.971
## 47 220 1 0.936 0.01597 0.9053 0.968
## 49 219 1 0.932 0.01646 0.9001 0.965
## 50 218 1 0.928 0.01693 0.8949 0.961
## 59 215 1 0.923 0.01739 0.8897 0.958
## 62 214 1 0.919 0.01784 0.8846 0.955
## 67 212 1 0.915 0.01827 0.8794 0.951
## 75 210 1 0.910 0.01870 0.8743 0.948
## 79 209 1 0.906 0.01911 0.8692 0.944
## 84 208 1 0.901 0.01951 0.8641 0.941
## 90 206 1 0.897 0.01990 0.8590 0.937
## 95 205 1 0.893 0.02028 0.8539 0.933
## 96 204 1 0.888 0.02064 0.8488 0.930
## 109 201 1 0.884 0.02101 0.8437 0.926
## 117 199 1 0.880 0.02137 0.8386 0.922
## 122 198 1 0.875 0.02172 0.8335 0.919
## 126 197 1 0.871 0.02206 0.8284 0.915
## 127 196 1 0.866 0.02239 0.8234 0.911
## 129 195 1 0.862 0.02271 0.8184 0.907
## 136 193 1 0.857 0.02302 0.8133 0.904
## 143 192 1 0.853 0.02333 0.8083 0.900
## 145 191 1 0.848 0.02363 0.8033 0.896
## 147 189 1 0.844 0.02393 0.7982 0.892
## 149 187 1 0.839 0.02423 0.7932 0.888
## 150 186 1 0.835 0.02451 0.7881 0.884
## 157 184 1 0.830 0.02480 0.7831 0.880
## 160 183 1 0.826 0.02507 0.7780 0.876
## 161 182 1 0.821 0.02534 0.7730 0.872
## 167 180 1 0.817 0.02561 0.7680 0.868
## 168 179 1 0.812 0.02587 0.7629 0.864
## 170 178 1 0.808 0.02612 0.7579 0.860
## 175 177 1 0.803 0.02637 0.7529 0.856
## 176 175 1 0.798 0.02662 0.7479 0.852
## 180 174 2 0.789 0.02709 0.7379 0.844
## 181 172 1 0.785 0.02732 0.7329 0.840
## 183 171 1 0.780 0.02754 0.7279 0.836
## 190 170 1 0.775 0.02776 0.7229 0.832
## 192 169 1 0.771 0.02797 0.7179 0.828
## 193 168 1 0.766 0.02818 0.7130 0.824
## 204 167 1 0.762 0.02838 0.7080 0.819
## 205 165 1 0.757 0.02858 0.7031 0.815
## 207 164 1 0.752 0.02878 0.6981 0.811
## 209 163 1 0.748 0.02897 0.6931 0.807
## 212 161 2 0.739 0.02934 0.6832 0.798
## 216 159 2 0.729 0.02970 0.6733 0.790
## 223 156 1 0.725 0.02988 0.6683 0.786
## 231 155 1 0.720 0.03005 0.6633 0.781
## 232 154 1 0.715 0.03021 0.6584 0.777
## 237 153 1 0.711 0.03037 0.6534 0.773
## 244 152 1 0.706 0.03053 0.6485 0.768
## 247 151 1 0.701 0.03069 0.6436 0.764
## 257 150 1 0.697 0.03084 0.6386 0.760
## 258 149 1 0.692 0.03098 0.6337 0.755
## 259 148 1 0.687 0.03112 0.6288 0.751
## 262 147 2 0.678 0.03139 0.6190 0.742
## 268 145 2 0.668 0.03165 0.6092 0.733
## 275 143 1 0.664 0.03177 0.6044 0.729
## 280 142 1 0.659 0.03189 0.5995 0.725
## 286 140 1 0.654 0.03201 0.5946 0.720
## 293 139 1 0.650 0.03212 0.5897 0.716
## 294 138 1 0.645 0.03223 0.5848 0.711
## 299 137 1 0.640 0.03234 0.5799 0.707
## 302 136 1 0.636 0.03244 0.5751 0.702
## 314 135 1 0.631 0.03254 0.5702 0.698
## 322 133 1 0.626 0.03264 0.5653 0.693
## 337 130 1 0.621 0.03274 0.5603 0.689
## 341 128 1 0.616 0.03284 0.5553 0.684
## 348 125 1 0.612 0.03295 0.5502 0.680
## 350 124 1 0.607 0.03305 0.5452 0.675
## 358 123 1 0.602 0.03315 0.5401 0.670
## 366 121 1 0.597 0.03325 0.5350 0.666
## 367 120 1 0.592 0.03334 0.5299 0.661
## 368 118 1 0.587 0.03343 0.5247 0.656
## 376 117 1 0.582 0.03352 0.5196 0.651
## 386 116 1 0.577 0.03360 0.5144 0.646
## 389 115 1 0.572 0.03368 0.5093 0.642
## 393 114 1 0.567 0.03376 0.5042 0.637
## 394 113 1 0.562 0.03383 0.4991 0.632
## 399 111 1 0.557 0.03390 0.4939 0.627
## 428 108 1 0.551 0.03398 0.4887 0.622
## 438 107 1 0.546 0.03405 0.4834 0.617
## 450 105 1 0.541 0.03412 0.4782 0.612
## 452 104 1 0.536 0.03419 0.4729 0.607
## 457 102 1 0.531 0.03425 0.4676 0.602
## 460 101 1 0.525 0.03431 0.4622 0.597
## 465 99 1 0.520 0.03437 0.4569 0.592
## 482 96 1 0.515 0.03444 0.4514 0.587
## 489 95 1 0.509 0.03450 0.4459 0.582
## 496 94 1 0.504 0.03456 0.4404 0.576
## 504 92 1 0.498 0.03461 0.4349 0.571
## 512 91 1 0.493 0.03466 0.4294 0.566
## 514 90 1 0.487 0.03471 0.4239 0.560
## 517 89 1 0.482 0.03475 0.4184 0.555
## 518 87 1 0.476 0.03479 0.4128 0.550
## 522 86 1 0.471 0.03482 0.4073 0.544
## 523 85 2 0.460 0.03487 0.3962 0.533
## 532 80 1 0.454 0.03490 0.3905 0.528
## 533 78 1 0.448 0.03494 0.3847 0.522
## 540 77 1 0.442 0.03497 0.3789 0.516
## 546 74 1 0.436 0.03500 0.3729 0.511
## 550 73 1 0.430 0.03503 0.3669 0.505
## 560 70 1 0.424 0.03506 0.3608 0.499
## 563 69 1 0.418 0.03509 0.3547 0.493
## 581 62 1 0.411 0.03517 0.3479 0.486
## 591 59 1 0.404 0.03525 0.3409 0.480
## 612 54 2 0.389 0.03550 0.3257 0.466
## 624 51 1 0.382 0.03562 0.3180 0.458
## 646 48 1 0.374 0.03575 0.3099 0.451
## 652 47 1 0.366 0.03587 0.3019 0.443
## 661 46 1 0.358 0.03596 0.2939 0.436
## 667 45 1 0.350 0.03603 0.2860 0.428
## 679 44 1 0.342 0.03608 0.2781 0.421
## 683 43 1 0.334 0.03610 0.2703 0.413
## 708 39 1 0.325 0.03618 0.2618 0.405
## 714 37 1 0.317 0.03626 0.2530 0.396
## 739 35 1 0.308 0.03633 0.2441 0.388
## 749 34 1 0.299 0.03637 0.2352 0.379
## 755 33 1 0.290 0.03638 0.2263 0.370
## 760 32 1 0.280 0.03635 0.2176 0.362
## 771 28 1 0.270 0.03641 0.2078 0.352
## 774 27 1 0.260 0.03641 0.1980 0.343
## 785 26 1 0.250 0.03636 0.1884 0.333
## 821 20 2 0.225 0.03679 0.1637 0.310
## 836 17 1 0.212 0.03693 0.1508 0.298
## 837 16 1 0.199 0.03693 0.1382 0.286
## 857 14 1 0.185 0.03692 0.1248 0.273
## 878 13 1 0.170 0.03671 0.1118 0.260
## 892 10 1 0.153 0.03679 0.0959 0.245
## 899 9 1 0.136 0.03644 0.0808 0.230
We can give the summary()
function an option for what
times we want to show in the results.
summary(f1,times = c(10,50,100,150,200,250,300,350,400))
## Call: survfit(formula = Surv(time, status_code) ~ 1, data = data)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 10 235 1 0.996 0.00423 0.988 1.000
## 50 218 16 0.928 0.01693 0.895 0.961
## 100 202 9 0.888 0.02064 0.849 0.930
## 150 186 12 0.835 0.02451 0.788 0.884
## 200 167 15 0.766 0.02818 0.713 0.824
## 250 150 14 0.701 0.03069 0.644 0.764
## 300 136 13 0.640 0.03234 0.580 0.707
## 350 124 7 0.607 0.03305 0.545 0.675
## 400 110 10 0.557 0.03390 0.494 0.627
Note that, by default R does not report mean survival. To get the mean survival:
print(f1,print.rmean=T)
## Call: survfit(formula = Surv(time, status_code) ~ 1, data = data)
##
## n events rmean* se(rmean) median 0.95LCL 0.95UCL
## [1,] 238 150 517 24.4 504 394 560
## * restricted mean with upper limit = 1076
plot(f1,
xlab = "Time",
ylab="Retension Probability",
main="Overall Survival Probability ptot",
lwd=2)
box()
##-- Simple Survival plot
ggsurvplot(f1, data = data, conf.int = T,legend="top")+xlab("Time")
#surv.median.line = "hv"
Here, in the above this is the overall survival curve. But where is the Median Survival value? To draw it…
ggsurvplot(f1, data = data, conf.int = T,legend="top",surv.median.line = "hv")+xlab("Time")
Alternatively,
survfit2(Surv(time, status_code) ~ 1, data = data) %>%
ggsurvfit()+add_confidence_interval()+add_quantile(y_value = 0.5, color = "gray50", linewidth = 0.75)+
labs(x="Time (Days)",y="Survival Probability")+
add_risktable(risktable_stats = c("n.risk","cum.event"),stats_label = list(n.risk = "Number at Risk",cum.event="Number of Events"))
f2=survfit(Surv(time, status_code)~clinic,data=data)
print(f2,print.rmean = T)
## Call: survfit(formula = Surv(time, status_code) ~ clinic, data = data)
##
## n events rmean* se(rmean) median 0.95LCL 0.95UCL
## clinic=Clinic1 75 28 732 50.5 NA 661 NA
## clinic=Clinic2 163 122 432 23.3 399 341 514
## * restricted mean with upper limit = 1076
## Survival Plot
ggsurvplot(f2, data = data,linetype = "strata",pval = TRUE, surv.median.line = "hv",conf.int = F,legend="top")+xlab("Years")
Similarly we can also estimate retention probabilities for other factors:
### For Prison record---:
##-- Median Estimate with 95% C.I
f2=survfit(Surv(time, status_code)~pris_rec,data=data)
f2
## Call: survfit(formula = Surv(time, status_code) ~ pris_rec, data = data)
##
## n events median 0.95LCL 0.95UCL
## pris_rec=none 127 81 523 460 652
## pris_rec=any 111 69 394 268 563
ggsurvplot(f2, data = data,linetype = "strata",pval = TRUE, conf.int = F,surv.median.line = "hv",legend="top")+xlab("Years")
### For Dose---:
##-- Median Estimate with 95% C.I
f3=survfit(Surv(time, status_code)~dose,data=data)
f3
## Call: survfit(formula = Surv(time, status_code) ~ dose, data = data)
##
## n events median 0.95LCL 0.95UCL
## dose=20 1 1 127 NA NA
## dose=30 2 2 332 183 NA
## dose=35 2 2 420 160 NA
## dose=40 30 18 223 117 NA
## dose=45 10 7 322 84 NA
## dose=50 27 15 460 216 NA
## dose=55 21 20 366 262 450
## dose=60 52 37 399 257 708
## dose=65 22 17 496 337 NA
## dose=70 24 14 563 314 NA
## dose=75 6 3 771 771 NA
## dose=80 35 12 NA 785 NA
## dose=90 2 1 280 280 NA
## dose=100 3 1 NA 216 NA
## dose=110 1 0 NA NA NA
### For LI record---:
##-- Median Estimate with 95% C.I/
f4=survfit(Surv(time, status_code)~LI,data=data)
f4
## Call: survfit(formula = Surv(time, status_code) ~ LI, data = data)
##
## n events median 0.95LCL 0.95UCL
## LI=7 2 0 NA NA NA
## LI=8 5 0 NA NA NA
## LI=9 4 0 NA NA NA
## LI=10 3 0 NA NA NA
## LI=11 4 0 NA NA NA
## LI=12 9 1 NA NA NA
## LI=13 9 0 NA NA NA
## LI=14 14 0 NA NA NA
## LI=15 16 4 NA 612 NA
## LI=16 19 12 280 207 NA
## LI=17 17 9 223 161 NA
## LI=18 18 14 394 358 NA
## LI=19 17 16 314 190 679
## LI=20 33 28 399 337 560
## LI=21 24 23 366 216 540
## LI=22 20 19 275 247 624
## LI=23 10 10 228 127 NA
## LI=24 4 4 318 175 NA
## LI=25 6 6 178 50 NA
## LI=26 4 4 280 79 NA
### For Depression---:
##-- Median Estimate with 95% C.I/
f5=survfit(Surv(time, status_code)~Depression,data=data)
f5
## Call: survfit(formula = Surv(time, status_code) ~ Depression, data = data)
##
## n events median 0.95LCL 0.95UCL
## Depression=yes 97 30 785 714 NA
## Depression=no 141 120 322 259 452
ggsurvplot(f5, data = data,linetype = "strata",pval = TRUE, conf.int = F,surv.median.line = "hv",legend="top")+xlab("Years")
When there are more than 1 survival curves for number of groups \(k\;|;where\:\:k\geq2\), we need to perform a statistical significant test between those survival curves. As, \(S(t)\) is a probability function, so, Log Rank test statistic is approximately distributed as a chi-square test statistic with d.f. 1.
So the comparison of retention probabilities between Clinics:
test= survdiff(Surv(time,status_code)~clinic,data=data)
test
## Call:
## survdiff(formula = Surv(time, status_code) ~ clinic, data = data)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## clinic=Clinic1 75 28 59.2 16.4 28.1
## clinic=Clinic2 163 122 90.8 10.7 28.1
##
## Chisq= 28.1 on 1 degrees of freedom, p= 1e-07
The Chi-Squared test statistic is 28.1 with 1 degree of freedom and the corresponding p-value is <0.001. Since this p-value is less than 0.001, we reject the null hypothesis,i.e., \(H_0: \text{in terms of retention, there is no difference between two clinics.}\).
In other words, we have sufficient evidence to say that there is a statistically significant difference for retention between the Clinic 1 & Clinic 2.
To, extract the p-value from survdiff
we use the
following trick:
p.val= 1 - pchisq(test$chisq, length(test$n) - 1)
round(p.val,3)
## [1] 0
similarly, the log-rank test for other covariates:
test= survdiff(Surv(time,status_code)~pris_rec,data=data)
print("Prison record:--")
## [1] "Prison record:--"
test
## Call:
## survdiff(formula = Surv(time, status_code) ~ pris_rec, data = data)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## pris_rec=none 127 81 87.4 0.474 1.14
## pris_rec=any 111 69 62.6 0.663 1.14
##
## Chisq= 1.1 on 1 degrees of freedom, p= 0.3
test= survdiff(Surv(time,status_code)~dose,data=data)
print("Dose:--")
## [1] "Dose:--"
test
## Call:
## survdiff(formula = Surv(time, status_code) ~ dose, data = data)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## dose=20 1 1 0.143 5.1221 5.1488
## dose=30 2 2 0.910 1.3073 1.3201
## dose=35 2 2 1.258 0.4382 0.4436
## dose=40 30 18 10.198 5.9682 6.4796
## dose=45 10 7 4.411 1.5202 1.5816
## dose=50 27 15 14.219 0.0429 0.0477
## dose=55 21 20 9.676 11.0153 11.9907
## dose=60 52 37 30.039 1.6133 2.0287
## dose=65 22 17 14.107 0.5933 0.6623
## dose=70 24 14 16.070 0.2667 0.3022
## dose=75 6 3 5.070 0.8453 0.8798
## dose=80 35 12 38.921 18.6209 27.5374
## dose=90 2 1 1.789 0.3478 0.3536
## dose=100 3 1 2.306 0.7396 0.7542
## dose=110 1 0 0.884 0.8843 0.8923
##
## Chisq= 53.4 on 14 degrees of freedom, p= 2e-06
test= survdiff(Surv(time,status_code)~LI,data=data)
print("LI:--")
## [1] "LI:--"
test
## Call:
## survdiff(formula = Surv(time, status_code) ~ LI, data = data)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## LI=7 2 0 1.20 1.20e+00 1.22e+00
## LI=8 5 0 6.30 6.30e+00 6.77e+00
## LI=9 4 0 2.26 2.26e+00 2.31e+00
## LI=10 3 0 1.74 1.74e+00 1.77e+00
## LI=11 4 0 4.55 4.55e+00 4.73e+00
## LI=12 9 1 7.52 5.65e+00 6.11e+00
## LI=13 9 0 6.97 6.97e+00 7.34e+00
## LI=14 14 0 12.62 1.26e+01 1.39e+01
## LI=15 16 4 10.19 3.76e+00 4.09e+00
## LI=16 19 12 9.73 5.28e-01 5.70e-01
## LI=17 17 9 6.81 7.04e-01 7.42e-01
## LI=18 18 14 13.92 4.43e-04 4.94e-04
## LI=19 17 16 10.27 3.19e+00 3.44e+00
## LI=20 33 28 21.59 1.90e+00 2.23e+00
## LI=21 24 23 13.70 6.31e+00 7.01e+00
## LI=22 20 19 10.07 7.91e+00 8.53e+00
## LI=23 10 10 3.71 1.07e+01 1.10e+01
## LI=24 4 4 2.62 7.31e-01 7.49e-01
## LI=25 6 6 2.65 4.25e+00 4.37e+00
## LI=26 4 4 1.57 3.75e+00 3.81e+00
##
## Chisq= 87 on 19 degrees of freedom, p= 1e-10
test= survdiff(Surv(time,status_code)~Depression,data=data)
print("Depression:--")
## [1] "Depression:--"
test
## Call:
## survdiff(formula = Surv(time, status_code) ~ Depression, data = data)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## Depression=yes 97 30 69.7 22.6 42.6
## Depression=no 141 120 80.3 19.7 42.6
##
## Chisq= 42.6 on 1 degrees of freedom, p= 7e-11
Kaplan-Meier curves are good for visualizing differences in survival between two categorical groups, but they don’t work well for assessing the effect of quantitative variables like age, gene expression, leukocyte count, etc. Cox PH regression can assess the effect of both categorical and continuous variables, and can model the effect of multiple variables at once.
Let,
Mod1=coxph(Surv(time,status_code)~Depression,data=data)
summary(Mod1)
## Call:
## coxph(formula = Surv(time, status_code) ~ Depression, data = data)
##
## n= 238, number of events= 150
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Depressionno 1.2532 3.5016 0.2045 6.128 8.9e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Depressionno 3.502 0.2856 2.345 5.228
##
## Concordance= 0.633 (se = 0.021 )
## Likelihood ratio test= 46.26 on 1 df, p=1e-11
## Wald test = 37.55 on 1 df, p=9e-10
## Score (logrank) test = 42.66 on 1 df, p=7e-11
The exp(coef)
column contains \(e^{\beta}\). This is the adjusted
hazard ratio (in our case HR of clinic adjusted for
dose =3.502) – the multiplicative effect of that variable on the hazard
rate (for each unit increase in that variable). So, for the categorical
variable like clinic, at a given instant in time, someone who was
diagnosed in clinic 2 is 2.6 times as likely to get event of interest as
someone who was diagnosed in clinic 1 adjusting for dose.
Note that, Likelihood ratio test, Wald test and Score test are the values of the test statistics for testing of hypothesis: \(H_0:\text{All}\:\:\beta_s \:\:=0\).
These give the evidence of overall model significance.
Note that the Concordance value is= 66%(0.66) which gives the model accuracy. Just like for logistic regression; AUC.
plot(survfit(Surv(time, status_code) ~ Depression,data = data), col=c("black", "red"), fun="cloglog",
xlab = "Time",ylab = "log(-log(S(t)))",main="Log-Log plot to check PH Assumption")
Note that, the two curves are not overlapped, that means the proportional Hazard assumption is hold.
Mod2=coxph(Surv(time,status_code)~dose,data=data)
summary(Mod2)
## Call:
## coxph(formula = Surv(time, status_code) ~ dose, data = data)
##
## n= 238, number of events= 150
##
## coef exp(coef) se(coef) z Pr(>|z|)
## dose -0.035795 0.964838 0.005966 -6 1.98e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## dose 0.9648 1.036 0.9536 0.9762
##
## Concordance= 0.65 (se = 0.025 )
## Likelihood ratio test= 36.78 on 1 df, p=1e-09
## Wald test = 36 on 1 df, p=2e-09
## Score (logrank) test = 36.35 on 1 df, p=2e-09
Here, \(H_0:\text{HAZARDs are proportional a.g.,}\:\:H_1:\text{HAZARDs are NOT proportional}\)
It will return test for each X and for overall model.
To test for the proportional-hazards (PH) assumption, type this:
cox.zph(Mod2)
## chisq df p
## dose 0.174 1 0.68
## GLOBAL 0.174 1 0.68
plot(cox.zph(Mod2))
abline(h=0,col="red")
Mod3= coxph(Surv(time,status_code)~dose+Depression,data=data)
summary(Mod3)
## Call:
## coxph(formula = Surv(time, status_code) ~ dose + Depression,
## data = data)
##
## n= 238, number of events= 150
##
## coef exp(coef) se(coef) z Pr(>|z|)
## dose -0.036286 0.964364 0.006278 -5.780 7.46e-09 ***
## Depressionno 1.224191 3.401415 0.204746 5.979 2.24e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## dose 0.9644 1.037 0.9526 0.9763
## Depressionno 3.4014 0.294 2.2771 5.0809
##
## Concordance= 0.71 (se = 0.022 )
## Likelihood ratio test= 80.62 on 2 df, p=<2e-16
## Wald test = 68.24 on 2 df, p=2e-15
## Score (logrank) test = 73.88 on 2 df, p=<2e-16
cox.zph(Mod3)
## chisq df p
## dose 0.1099 1 0.74
## Depression 0.0325 1 0.86
## GLOBAL 0.1411 2 0.93
Restricted mean survival time (RMST) is suggested as a novel alternative measure in survival analyses and may be useful when proportional hazards assumption cannot be made or when event rate is low.
Note that, previously we have seen that Median survival for the clinic 1 was not converged. So in this kind of scenario RMST are mostly used.
In R to do this RMST:
#library(survRM2)
RMST_Clinic=rmst2(time = data$time,status = data$status_code,arm = (data$clinic=="Clinic1")*1,tau = 905)
RMST_Clinic
##
## The truncation time: tau = 905 was specified.
##
## Restricted Mean Survival Time (RMST) by arm
## Est. se lower .95 upper .95
## RMST (arm=1) 643.783 40.079 565.229 722.337
## RMST (arm=0) 429.332 22.581 385.073 473.591
##
##
## Restricted Mean Time Lost (RMTL) by arm
## Est. se lower .95 upper .95
## RMTL (arm=1) 261.217 40.079 182.663 339.771
## RMTL (arm=0) 475.668 22.581 431.409 519.927
##
##
## Between-group contrast
## Est. lower .95 upper .95 p
## RMST (arm=1)-(arm=0) 214.451 124.287 304.615 0
## RMST (arm=1)/(arm=0) 1.499 1.278 1.759 0
## RMTL (arm=1)/(arm=0) 0.549 0.401 0.752 0
To get better understanding about RMST, go click here
For more details: click here