This document is an accompanying file of the paper:
Plana-Ripoll et al. Exploring Comorbidity Within Mental Disorders Among a Danish National Population. JAMA Psychiatry, published online on Jan 16, 2019. doi:10.1001/jamapsychiatry.2018.3658. Link to the published paper hereOleguer Plana-Ripoll, Carsten Bøcker Pedersen, Yan Holtz, Michael Benros, Søren Dalsgaard, Peter De Jonge, Chun Chieh Fan, Louisa Degenhardt, Andrea Ganna, Aja Neergaard Greve, Jane Gunn, Kim Moesgaard Iburg, Lars Vedel Kessing, Brian K. Lee, Carmen C.W. Lim, Ole Mors, Merete Nordentoft, Anders Prior, Annelieke M. Roest, Sukanta Saha, Andrew Schork, James G. Scott, Kate M. Scott, Terry Stedman, Holger J. Sørensen, Thomas Werge, Harvey A. Whiteford, Thomas Munk Laursen, Esben Agerbo, Ron C. Kessler, Preben Bo Mortensen and John J. McGrath
In this paper, we estimated the association between 90 pairs of mental disorders. In this document, the R code for one specific pair is provided: the association between Mood Disorders (as a prior-disorder) and Neurotic Disorders (as a later-disorder). Follow the steps 1-7 below in order to replicate the analyses.
library(lubridate)
library(survival)
library(knitr)
library(multcomp)
library(gdata)
library(data.table)
library(mstate)
library(plyr)
load("../basics.RData")
There are two necessary datasets for the estimations: data_disease and land.
data_disease <- stam_diag[, c("pnr", "fdato", "fdatoD", "Byear", "kqn", "stat", "statd", "statdD", "statdA", "D00", "D10", "D20", "D30", "D41", "D51", "D61", "D70", "D81", "D91")]
nrow(data_disease)
## [1] 7387479
The Danish Civil Registration System was established on April 1, 1968, where all persons alive and living in Denmark were registered. After this date, all persons born or immigrating to Denmark were included. Dataset data_disease contains information on the subsample of this group who were born in Denmark between January 1, 1900 and December 31, 2015 (N=7387479). Information includes a unique personal identification number used to link information from several registers (pnr), birthdate (fdato), sex (kqn: “M” for males and “K” for females), legal status (stat: 1 for active citizens, 70 for disappeared, 80 for emigrated and 90 for dead) and date of disappearance, emigration or death (statd). In addition, information on the 10 mental disorders used in this publication were included from the Danish Psychiatric Central Research Register: Organic Disorders (D00), Substance Use disorders (D10), Schizophrenia and related disorders (D20), Mood Disorders (D30), Neurotic Disorders (D41), Eating Disorders (D51), Personality Disorders (D61), Intelectual disabilities (D70), Developmental Disorders (D81) and Behavioral Disorders (D91).
According to Danish regulations, it is not allowed to share personal data, so an invented case is used here to show the data structure. This invented person will be used throughout the whole code example. She was born on May 31, 1985 and emigrated from Denmark on March 13, 2015. She received a diagnosis of Substance Use disorders for the first time on October 15, 2005; Mood Disorders on December 1, 2010; Neurotic Disorders on November 10, 2014; and Eating Disorders on April 13, 2011.
pnr | fdato | fdatoD | Byear | kqn | stat | statd | statdD | statdA | D00 | D10 | D20 | D30 | D41 | D51 | D61 | D70 | D81 | D91 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
invented_person | 1985-05-31 | 1985.411 | 1985 | K | 80 | 2015-03-13 | 2015.195 | 29.784 | NA | 2005-10-15 | NA | 2010-12-01 | 2014-11-10 | 2011-04-13 | NA | NA | NA | NA |
Dataset land contains information on all persons in data_disease regarding periods of living in Denmark. Each period of Danish residence is defined with a starting date (tflytd) and ending date (fflytd)
Invented data for the same invented person above is used to show the data structure. This person was living in Denmark since birth (May 31, 1985) until emigration (March 13, 2015) except for a period that she moved abroad (from August 1, 1997 to January 5, 1998).
pnr | tflytd | fflytd |
---|---|---|
invented_person | 1985-05-31 | 1997-08-01 |
invented_person | 1998-01-05 | 2015-03-13 |
With these two datasets, we can set the study population, the specific follow-up time for each person, and information on the later-disorder (Neurotic Disorders) and all prior-disorders (all disorders except Neurotic Disorders).
The outcome of interest for this code example is Neurotic Disorders (variable D41). The possible minimum age of onset for this disorder is 5 years.
disease <- "D41"
minage <- 5
For every person, follow-up starts on age at January 1, 2000 or at 5 years, whichever occurs later.
data_disease$entryA <- pmax(2000-data_disease[, "fdatoD"], minage)
data_disease$entryD <- as.numeric(data_disease$fdatoD) + as.numeric(data_disease$entryA)
data_disease$entry <- as.Date(date_decimal(data_disease$entryD))
For every person, follow-up ends on December 31, 2016, death, emigration or disappearance, whichever occurs first. Follow-up for persons developing Neurotic Disorders will end on the date of onset of the disorder, but this information will be included later.
data_disease$exit <- pmin(as.Date("2016-12-31"), data_disease[, "statd"], na.rm = TRUE)
data_disease$exitD <- decimal_date(data_disease$exit)
data_disease$exitA <- data_disease$exitD - data_disease$fdatoD
Some persons died or emigrated before the follow-up started on January 1, 2000 (or age 5 years). These persons need to be excluded from the sample. An easy way to identify them is because their follow-up time finishes before it starts, so we keep in the sample only those whose follow-up starts before it finishes.
data_disease <- data_disease[data_disease$entryA < data_disease$exitA, ]
Some persons were living outside Denmark on the date of start of follow-up (variable entry). We need to keep only those residing in the country on that date. We use information from the dataset land to obtain this information.
data_disease_DK <- merge(data_disease, land, all.x = TRUE, all.y = FALSE)
data_disease_DK <- data_disease_DK[(!is.na(data_disease_DK$tflytd)) &
(data_disease_DK$tflytd <= data_disease_DK$entry) &
(data_disease_DK$fflytd > data_disease_DK$entry)
, ]
nrow(data_disease_DK)
## [1] 5699944
The sample now consists on 5699944 persons who were living in Denmark on the date of start of follow-up (January 1, 2000 or the date they turned 5 years, if it happened afterwards).
pnr | fdato | fdatoD | Byear | kqn | stat | statd | statdD | statdA | D00 | D10 | D20 | D30 | D41 | D51 | D61 | D70 | D81 | D91 | entryA | entryD | entry | exit | exitD | exitA | tflytd | fflytd |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
invented_person | 1985-05-31 | 1985.411 | 1985 | K | 80 | 2015-03-13 | 2015.195 | 29.784 | NA | 2005-10-15 | NA | 2010-12-01 | 2014-11-10 | 2011-04-13 | NA | NA | NA | NA | 14.589 | 2000 | 2000-01-01 | 2015-03-13 | 2015.195 | 29.78352 | 1998-01-05 | 2015-03-13 |
Information on the later-disorder is included at this stage. Variable onsetD contains the date of onset of the specific disorder, and variable onsetA contains the age at onset.
data_disease_DK[!is.na(data_disease_DK[, disease]), "onsetD"] <- decimal_date((data_disease_DK[!is.na(data_disease_DK[, disease]), disease] + 1))
data_disease_DK$onsetA <- data_disease_DK$onsetD - data_disease_DK$fdatoD
Follow-up now needs to finish on date of onset, for those who develop this disorder.
data_disease_DK$exit <- pmin(data_disease_DK$exit, (data_disease_DK[, disease] + 1), na.rm = TRUE)
data_disease_DK$exitD <- decimal_date(data_disease_DK$exit)
data_disease_DK$exitA <- data_disease_DK$exitD - data_disease_DK$fdatoD
Note that for someone developing Neurotic Disorders before the start of follow-up, now exit will occur before entry. These are the prevalent cases and will be identified later.
A variable fail can be created indicating the type of outcome at end of follow-up (0 = censored; 1 = onset of neurotic disorders;70 = disappearance; 80 = emigration; 90 = death).
data_disease_DK$fail <- (90 * (!is.na(data_disease_DK$statdA) & (data_disease_DK$statdA == data_disease_DK$exitA) & (data_disease_DK$stat == 90))) +
(80 * (!is.na(data_disease_DK$statdA) & (data_disease_DK$statdA == data_disease_DK$exitA) & (data_disease_DK$stat == 80))) +
(70 * (!is.na(data_disease_DK$statdA) & (data_disease_DK$statdA == data_disease_DK$exitA) & (data_disease_DK$stat == 70))) +
(01 * (!is.na(data_disease_DK$onsetA) & (data_disease_DK$onsetA == data_disease_DK$exitA)))
table(data_disease_DK$fail)
##
## 0 1 70 80 90 91
## 4497071 294575 1069 69417 837798 14
Note that some persons can have two outcomes on the same day (e.g. 91 = death and onset of Neurotic Disorders).
pnr | fdato | fdatoD | Byear | kqn | stat | statd | statdD | statdA | D00 | D10 | D20 | D30 | D41 | D51 | D61 | D70 | D81 | D91 | entryA | entryD | entry | exit | exitD | exitA | tflytd | fflytd | onsetD | onsetA | fail |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
invented_person | 1985-05-31 | 1985.411 | 1985 | K | 80 | 2015-03-13 | 2015.195 | 29.784 | NA | 2005-10-15 | NA | 2010-12-01 | 2014-11-10 | 2011-04-13 | NA | NA | NA | NA | 14.589 | 2000 | 2000-01-01 | 2014-11-11 | 2014.86 | 29.44927 | 1998-01-05 | 2015-03-13 | 2014.86 | 29.44927 | 1 |
The later-disorder of interest is Neurotic Disorders (variable D41). The other nine disorders (variables D00, D10, D20, D30, D51, D61, D70, D81, D91) will be treated as prior-disorders. For example, variable D30D contains the date of onset of Mood Disorders, and variable D30A contains the age of onset.
comorbidities <- c("D00", "D10", "D20", "D30", "D51", "D61", "D70", "D81", "D91")
comorbiditiesD <- paste(comorbidities,"D",sep="")
comorbiditiesA <- paste(comorbidities,"A",sep="")
for(i in 1:length(comorbidities)) {
data_disease_DK[!is.na(data_disease_DK[, comorbidities[i]]), comorbiditiesD[i]] <- decimal_date((data_disease_DK[!is.na(data_disease_DK[, comorbidities[i]]), comorbidities[i]] + 1))
data_disease_DK[, comorbiditiesA[i]] <- as.numeric(data_disease_DK[, comorbiditiesD[i]]) - as.numeric(data_disease_DK[, "fdatoD"])
}
Finally, it is important to identify the persons who developed Neurotic Disorders before the start of follow-up (prevalent cases) and consider only those at risk of developing Neurotic Disorders.
prevalent <- data_disease_DK[data_disease_DK$entryA >= data_disease_DK$exitA,
c("pnr", "fdatoD", "entry", "entryD", "entryA", "exitD", "exitA", "fail", "statdA", "kqn", "Byear", comorbiditiesA)]
at_risk <- data_disease_DK[data_disease_DK$entryA < data_disease_DK$exitA,
c("pnr", "fdatoD", "entry", "entryD", "entryA", "exitD", "exitA", "fail", "statdA", "kqn", "Byear", comorbiditiesA)]
nrow(prevalent); nrow(at_risk)
## [1] 89348
## [1] 5610596
From the total of 5699944 persons who were living in Denmark on the date of start of follow-up, 89348 had a previous diagnosis of Neurotic Disorders, and were excluded from the analysis as prevalent cases. The remaining 5610596 did not have a previous diagnosis of Neurotic Disorders and were at risk of becoming an incident case during the follow-up, and thus included in the analysis (this information is included in Table 1 in the publication).
pnr | fdatoD | entry | entryD | entryA | exitD | exitA | fail | statdA | kqn | Byear | D00A | D10A | D20A | D30A | D51A | D61A | D70A | D81A | D91A |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA |
For some persons, the prior-disorder and later-disorder occurred on the same day (ties). In a time-to-event setting, individuals are followed until the outcome (later-disorder), and these persons would not be considered to be exposed to the prior-disorder. This would underestimate all associations, given that concurrant pairs of disorders could never occur. To fix it, we broke some of these ties, as explained below.
exposures <- comorbidities
exposures
## [1] "D00" "D10" "D20" "D30" "D51" "D61" "D70" "D81" "D91"
This is the list of variables containing the nine possible prior-disorders.
temp <- at_risk
temp$outcome <- (temp$fail %in% c(1,71,81,91))*1
table(temp$outcome)
##
## 0 1
## 5405355 205241
In total, there are 205241 who develop the later-disorder during the follow-up (this information is included in Table 1 of the publication). These are the persons that we need to observe to identify ties (a prior-disorder with onset on the same day). We will do it for each of the nine prior-disorders:
for(i in 1:length(exposures)) {
# First we identify those with both the prior- and later-disorder
both_dx <- temp[!is.na(temp[, paste(exposures[i],"A",sep="")]) & temp$outcome==1, c("pnr", paste(exposures[i],"A",sep=""), "exitA")]
both_dx$diff <- both_dx[, "exitA"] - both_dx[, paste(exposures[i],"A",sep="")]
# Only if the two disorders happen within 5 years
both_dx <- both_dx[abs(both_dx$diff)<5, ]
# How many the prior-disorder occurs before or after the later-disorder
before <- dim(both_dx[both_dx$diff>0, ])[1]
after <- dim(both_dx[both_dx$diff<0, ])[1]
# prop_before is the proportion of persons who develop first the prior-disorder among those who
# develop the two disorders within 5 years (but not on the same day)
prop_before <- before/(before+after)
if(before==0) {
prop_before <- 0
}
# This proportion is used as a probability to break ties. Each tie will be broken with probability
# prop_before by moving the prior-disorder to one day earlier, following a random uniform distribution.
temp$random <- runif(n=dim(temp)[1])
temp[temp$random <= prop_before & !is.na(temp[, paste(exposures[i],"A",sep="")]) & temp$outcome==1 & temp[, paste(exposures[i],"A",sep="")]==temp$exitA, paste(exposures[i],"A",sep="")] <-
temp[temp$random <= prop_before & !is.na(temp[, paste(exposures[i],"A",sep="")]) & temp$outcome==1 & temp[, paste(exposures[i],"A",sep="")]==temp$exitA, paste(exposures[i],"A",sep="")] - (1/365.25)
temp <- temp[, names(temp) != "random"]
}
Data for the invented person do not change compared to previous step, because there are no ties.
pnr | fdatoD | entry | entryD | entryA | exitD | exitA | fail | statdA | kqn | Byear | D00A | D10A | D20A | D30A | D51A | D61A | D70A | D81A | D91A |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA |
We create new variables id, tstart, tstop and endpoint (equivalent to pnr, entryA, exitA, outcome), but that will change later.
temp <- tmerge(temp, temp, id = pnr, tstart = entryA, tstop = exitA)
temp <- tmerge(temp, temp, id = pnr, endpoint = event(exitA, outcome))
pnr | fdatoD | entry | entryD | entryA | exitD | exitA | fail | statdA | kqn | Byear | D00A | D10A | D20A | D30A | D51A | D61A | D70A | D81A | D91A | outcome | id | tstart | tstop | endpoint |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 14.589 | 29.44927 | 1 |
Each person with a diagnosis of Mood Disorders (D30) needs to be “unexposed” until the onset of the disorder, and “exposed” afterwards. In addition, we want to split the follow-up time further depending on time since exposure (6 months, 1, 2, 5, 10, 15, >15 years). For each person, we need to specify in which dates the follow-up time is going to be splitted.
# First on date of onset of Mood Disorders
db_exposures0 <- temp[!is.na(temp[, "D30A"]), c("pnr", "exitA", "D30A")]
db_exposures0 <- db_exposures0[db_exposures0[, "exitA"] >= db_exposures0[, "D30A"],]
# And then 6 months, 1 year, 2 years, etc. later.
db_exposures6m <- db_exposures0
db_exposures6m[, "D30A"] <- db_exposures0[, "D30A"] + (1/2)
db_exposures6m <- db_exposures6m[db_exposures6m[, "exitA"] >= db_exposures6m[, "D30A"],]
db_exposures1 <- db_exposures0
db_exposures1[, "D30A"] <- db_exposures0[, "D30A"] + 1
db_exposures1 <- db_exposures1[db_exposures1[, "exitA"] >= db_exposures1[, "D30A"],]
db_exposures2 <- db_exposures0
db_exposures2[, "D30A"] <- db_exposures0[, "D30A"] + 2
db_exposures2 <- db_exposures2[db_exposures2[, "exitA"] >= db_exposures2[, "D30A"],]
db_exposures5 <- db_exposures0
db_exposures5[, "D30A"] <- db_exposures0[, "D30A"] + 5
db_exposures5 <- db_exposures5[db_exposures5[, "exitA"] >= db_exposures5[, "D30A"],]
db_exposures10 <- db_exposures0
db_exposures10[, "D30A"] <- db_exposures0[, "D30A"] + 10
db_exposures10 <- db_exposures10[db_exposures10[, "exitA"] >= db_exposures10[, "D30A"],]
db_exposures15 <- db_exposures0
db_exposures15[, "D30A"] <- db_exposures0[, "D30A"] + 15
db_exposures15 <- db_exposures15[db_exposures15[, "exitA"] >= db_exposures15[, "D30A"],]
# Finally we combine all these times
db_exposures <- rbind(db_exposures0, db_exposures6m, db_exposures1, db_exposures2, db_exposures5, db_exposures10, db_exposures15)
db_exposures <- db_exposures[!duplicated(db_exposures), ]
names(db_exposures)[names(db_exposures)=="D30A"]<-"cut_points"
db_exposures <- db_exposures[, c("pnr", "cut_points")]
pnr | cut_points |
---|---|
invented_person | 25.50681 |
invented_person | 26.00681 |
invented_person | 26.50681 |
invented_person | 27.50681 |
The invented person developed Mood Disorders at age 25.5, that is when the first splitting takes place. Then it will be splitted again 6 months after and 1 and 2 years after. No more times will be needed, given that this specific person is followed until age 29.8.
We create two new variables: expo_D30 that is a binary variable saying whether the person has had a diagnosis of Mood Disorders, and expo_time_after_D30 that counts also the time after the onset.
temp <- tmerge(temp, db_exposures, id = pnr, expo_time_after_D30=cumtdc(cut_points))
temp$expo_D30 <- (temp$expo_time_after_D30 >= 1)
pnr | fdatoD | entry | entryD | entryA | exitD | exitA | fail | statdA | kqn | Byear | D00A | D10A | D20A | D30A | D51A | D61A | D70A | D81A | D91A | outcome | id | tstart | tstop | endpoint | expo_time_after_D30 | expo_D30 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 14.58900 | 25.50681 | 0 | 0 | FALSE |
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 25.50681 | 26.00681 | 0 | 1 | TRUE |
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 26.00681 | 26.50681 | 0 | 2 | TRUE |
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 26.50681 | 27.50681 | 0 | 3 | TRUE |
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 27.50681 | 29.44927 | 1 | 4 | TRUE |
Note that the invented person now has 5 different observations, where tstop of each observation finishes on tstart of the following one, and the two variables regarding exposure to Mood Disorders change in each observation.
There is one prior-disorder (Mood Disorders) and one later-disorder (Neurotic Disorders). The remaining eight disorders are used to adjust the models for mental health comorbidity. It is therefore necessary to replicate the previous step with the other 8 disorders.
adjustment <- exposures[exposures!="D30"]
adjustment
## [1] "D00" "D10" "D20" "D51" "D61" "D70" "D81" "D91"
for(i in 1:length(adjustment)) {
db_exposures0 <- temp[!is.na(temp[, paste(adjustment[i],"A",sep="")]), c("pnr", "exitA", "D30A", paste(adjustment[i],"A",sep=""))]
## We do not want to adjust for mediators (only for confounders), so we ignore mental disorders with onset after the prior-disorder
db_exposures0 <- db_exposures0[is.na(db_exposures0[, "D30A"]) |
(!is.na(db_exposures0[, "D30A"]) & db_exposures0[, paste(adjustment[i],"A",sep="")] < db_exposures0[, "D30A"]),]
db_exposures0 <- db_exposures0[db_exposures0[, "exitA"] >= db_exposures0[, paste(adjustment[i],"A",sep="")],]
db_exposures0 <- db_exposures0[!duplicated(db_exposures0), ]
names(db_exposures0)[names(db_exposures0)==paste(adjustment[i],"A",sep="")]<-"cut_points"
db_exposures0 <- db_exposures0[, c("pnr", "cut_points")]
temp <- tmerge(temp, db_exposures0, id = pnr, exposure=tdc(cut_points))
colnames(temp)[colnames(temp) == "exposure"] <- paste("adjust_D30_by_prev_", adjustment[i], sep="")
}
pnr | fdatoD | entry | entryD | entryA | exitD | exitA | fail | statdA | kqn | Byear | D00A | D10A | D20A | D30A | D51A | D61A | D70A | D81A | D91A | outcome | id | tstart | tstop | endpoint | expo_time_after_D30 | expo_D30 | adjust_D30_by_prevD00 | adjust_D30_by_prev_D10 | adjust_D30_by_prevD20 | adjust_D30_by_prevD51 | adjust_D30_by_prevD61 | adjust_D30_by_prevD70 | adjust_D30_by_prevD81 | adjust_D30_by_prevD91 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 14.58900 | 20.37804 | 0 | 0 | FALSE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 20.37804 | 25.50681 | 0 | 0 | FALSE | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 25.50681 | 26.00681 | 0 | 1 | TRUE | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 26.00681 | 26.50681 | 0 | 2 | TRUE | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 26.50681 | 27.50681 | 0 | 3 | TRUE | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
invented_person | 1985.411 | 2000-01-01 | 2000 | 14.589 | 2014.86 | 29.44927 | 1 | 29.784 | K | 1985 | NA | 20.37804 | NA | 25.50681 | 25.87119 | NA | NA | NA | NA | 1 | invented_person | 27.50681 | 29.44927 | 1 | 4 | TRUE | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Note that Substance Use disorders (D10) occurred before the prior-disorder, and it is considered as a time-varying variable. However, onset of Eating Disorders (D51) occurred after the onset of the prior-disorder, and it is not considered when adjusting for mental health comorbidity.
data_survival <- temp
save(prevalent, at_risk, data_survival, data_invented, file="SHARED_disease_D41.RData")
Load the datasets prepared in steps 1-4.
rm(list = ls())
load(file="SHARED_disease_D41.RData")
nrow(prevalent); table(prevalent$kqn)
## [1] 89348
##
## K M
## 57910 31438
nrow(at_risk); table(at_risk$kqn)
## [1] 5610596
##
## K M
## 2807442 2803154
nrow(data_survival[data_survival$endpoint==1, ]); table(data_survival[data_survival$endpoint==1, "kqn"])
## [1] 205241
##
## K M
## 120756 84485
# Some persons have more than one observation (due to splitting time), and they need to be counted only once
number <- data_survival[, c("pnr", "kqn", "expo_D30")]
number <- number[!duplicated(number), ]
number$number <- 1
number_sex <- aggregate(number ~ expo_D30 + kqn, data = number, FUN=sum, na.rm=TRUE)
data_survival$time <- data_survival$tstop - data_survival$tstart
py_sex <- aggregate(cbind(time, endpoint) ~ expo_D30 + kqn, data = data_survival, FUN=sum, na.rm=TRUE)
number <- merge(number_sex, py_sex)
number2 <- rbind(number,
cbind(aggregate(cbind(number, time, endpoint) ~ expo_D30, data = number, FUN=sum, na.rm=TRUE), kqn="all")
)
# According to Danish regulations, we cannot report cells with less than 3 cases
number2[number2$number<3, "number"] <- NA
number2[number2$endpoint<3, "endpoint"] <- NA
kable(number2)
expo_D30 | kqn | number | time | endpoint |
---|---|---|---|---|
FALSE | K | 2773129 | 37728220.1 | 98555 |
FALSE | M | 2784330 | 38039730.0 | 71364 |
TRUE | K | 102155 | 757457.0 | 22201 |
TRUE | M | 62826 | 445976.4 | 13121 |
FALSE | all | 5557459 | 75767950.1 | 169919 |
TRUE | all | 164981 | 1203433.4 | 35322 |
Note that a person could be counted twice (one among exposed and one among unexposed) if exposure to Mood Disorders occurs during the follow-up time.
number <- data_survival[, c("pnr", "kqn", "expo_time_after_D30")]
number <- number[!duplicated(number), ]
number$number <- 1
number_sex <- aggregate(number ~ expo_time_after_D30 + kqn, data = number, FUN=sum, na.rm=TRUE)
data_survival$time <- data_survival$tstop - data_survival$tstart
py_sex <- aggregate(cbind(time, endpoint) ~ expo_time_after_D30 + kqn, data = data_survival, FUN=sum, na.rm=TRUE)
number <- merge(number_sex, py_sex)
number2 <- rbind(number,
cbind(aggregate(cbind(number, time, endpoint) ~ expo_time_after_D30, data = number, FUN=sum, na.rm=TRUE), kqn="all")
)
# According to Danish regulations, we cannot report cells with less than 3 cases
number2[number2$number<3, "number"] <- NA
number2[number2$endpoint<3, "endpoint"] <- NA
kable(number2)
expo_time_after_D30 | kqn | number | time | endpoint |
---|---|---|---|---|
0 | K | 2773129 | 37728220.07 | 98555 |
0 | M | 2784330 | 38039730.02 | 71364 |
1 | K | 69228 | 29427.65 | 9559 |
1 | M | 44894 | 18950.67 | 6164 |
2 | K | 58431 | 27836.62 | 1500 |
2 | M | 37537 | 17880.54 | 802 |
3 | K | 57103 | 52463.44 | 1987 |
3 | M | 36528 | 33614.52 | 1092 |
4 | K | 56917 | 136880.70 | 3256 |
4 | M | 35956 | 86405.72 | 1932 |
5 | K | 45298 | 164371.80 | 2758 |
5 | M | 27864 | 100406.19 | 1530 |
6 | K | 28634 | 98242.20 | 1267 |
6 | M | 17236 | 59286.81 | 679 |
7 | K | 29946 | 248234.57 | 1874 |
7 | M | 16475 | 129431.98 | 922 |
0 | all | 5557459 | 75767950.10 | 169919 |
1 | all | 114122 | 48378.32 | 15723 |
2 | all | 95968 | 45717.16 | 2302 |
3 | all | 93631 | 86077.97 | 3079 |
4 | all | 92873 | 223286.42 | 5188 |
5 | all | 73162 | 264777.99 | 4288 |
6 | all | 45870 | 157529.01 | 1946 |
7 | all | 46421 | 377666.55 | 2796 |
Load the datasets prepared in steps 1-4.
rm(list = ls())
load(file="SHARED_disease_D41.RData")
model_A <- coxph(Surv(tstart, tstop, endpoint) ~ expo_D30 + kqn + entryD, data = data_survival)
print(paste0("HR = ", format(exp(summary(model_A)$coefficients[1, 1]), digits=1, nsmall=1),
" [95% CI: ", format(summary(model_A)$conf.int[1, 3], digits=1, nsmall=1), "-",
format(summary(model_A)$conf.int[1, 4], digits=1, nsmall=1), "]"))
## [1] "HR = 15.7 [95% CI: 15.5-15.9]"
# Model B adjusts also for number of disorders, and it needs to be counted.
data_survival$num_others <- rowSums(data_survival[, c("adjust_D30_by_prev_D00", "adjust_D30_by_prev_D10", "adjust_D30_by_prev_D20", "adjust_D30_by_prev_D51", "adjust_D30_by_prev_D61", "adjust_D30_by_prev_D70","adjust_D30_by_prev_D81", "adjust_D30_by_prev_D91")])
data_survival$num_others[data_survival$num_others>4] <- 4
data_survival$num_others[data_survival$num_others==1] <- 0
model_B <- coxph(Surv(tstart, tstop, endpoint) ~ expo_D30 + kqn + entryD
+ adjust_D30_by_prev_D00 + adjust_D30_by_prev_D10 + adjust_D30_by_prev_D20
+ adjust_D30_by_prev_D51 + adjust_D30_by_prev_D61 + adjust_D30_by_prev_D70
+ adjust_D30_by_prev_D81 + adjust_D30_by_prev_D91 + as.factor(num_others), data = data_survival)
print(paste0("HR = ", format(exp(summary(model_B)$coefficients[1, 1]), digits=1, nsmall=1),
" [95% CI: ", format(summary(model_B)$conf.int[1, 3], digits=1, nsmall=1), "-",
format(summary(model_B)$conf.int[1, 4], digits=1, nsmall=1), "]"))
## [1] "HR = 10.5 [95% CI: 10.3-10.6]"
times <- c("0-6 months", "6-12 months", "1-2 years", "2-5 years", "5-10 years", "10-15 years", "15+ years")
model_A_lagged <- coxph(Surv(tstart, tstop, endpoint) ~ as.factor(expo_time_after_D30) + kqn + entryD, data = data_survival)
for(i in 1:length(times)) {
print(paste0(times[i], ": HR = ", format(exp(summary(model_A_lagged)$coefficients[i, 1]), digits=1, nsmall=1), " [95% CI: ", format(summary(model_A_lagged)$conf.int[i, 3], digits=1, nsmall=1), "-", format(summary(model_A_lagged)$conf.int[i, 4], digits=1, nsmall=1), "]"))
}
## [1] "0-6 months: HR = 117.9 [95% CI: 115.9-119.8]"
## [1] "6-12 months: HR = 18.5 [95% CI: 17.7-19.2]"
## [1] "1-2 years: HR = 13.5 [95% CI: 13.0-14.0]"
## [1] "2-5 years: HR = 9.4 [95% CI: 9.2- 9.7]"
## [1] "5-10 years: HR = 7.6 [95% CI: 7.4-7.9]"
## [1] "10-15 years: HR = 7.1 [95% CI: 6.8-7.4]"
## [1] "15+ years: HR = 6.4 [95% CI: 6.2-6.7]"
model_B_lagged <- coxph(Surv(tstart, tstop, endpoint) ~ as.factor(expo_time_after_D30) + kqn + entryD
+ adjust_D30_by_prev_D00 + adjust_D30_by_prev_D10 + adjust_D30_by_prev_D20
+ adjust_D30_by_prev_D51 + adjust_D30_by_prev_D61 + adjust_D30_by_prev_D70
+ adjust_D30_by_prev_D81 + adjust_D30_by_prev_D91 + as.factor(num_others), data = data_survival)
for(i in 1:length(times)) {
print(paste0(times[i], ": HR = ", format(exp(summary(model_B_lagged)$coefficients[i, 1]), digits=1, nsmall=1), " [95% CI: ", format(summary(model_B_lagged)$conf.int[i, 3], digits=1, nsmall=1), "-", format(summary(model_B_lagged)$conf.int[i, 4], digits=1, nsmall=1), "]"))
}
## [1] "0-6 months: HR = 77.9 [95% CI: 76.5-79.2]"
## [1] "6-12 months: HR = 12.2 [95% CI: 11.7-12.7]"
## [1] "1-2 years: HR = 8.9 [95% CI: 8.6-9.2]"
## [1] "2-5 years: HR = 6.2 [95% CI: 6.0-6.4]"
## [1] "5-10 years: HR = 5.0 [95% CI: 4.9-5.2]"
## [1] "10-15 years: HR = 4.6 [95% CI: 4.4-4.8]"
## [1] "15+ years: HR = 4.3 [95% CI: 4.1-4.4]"
model_A <- coxph(Surv(tstart, tstop, endpoint) ~ expo_D30*kqn + entryD, data = data_survival)
matrix_contrasts <-t(
matrix(c(
1,0,0,0,
1,0,0,1
), ncol=2))
model_A_sex <- glht(model_A, linfct=matrix_contrasts)
HR <- data.frame(format(exp(confint(model_A_sex)$confint)[1:2, ], digits=1, nsmall=1))
HR[1, "sex"] <- "Females"
HR[2, "sex"] <- "Males"
HR
## Estimate lwr upr sex
## 1 13.8 13.6 14.1 Females
## 2 19.9 19.4 20.3 Males
model_B <- coxph(Surv(tstart, tstop, endpoint) ~ expo_D30*kqn + entryD
+ adjust_D30_by_prev_D00 + adjust_D30_by_prev_D10 + adjust_D30_by_prev_D20
+ adjust_D30_by_prev_D51 + adjust_D30_by_prev_D61 + adjust_D30_by_prev_D70
+ adjust_D30_by_prev_D81 + adjust_D30_by_prev_D91 + as.factor(num_others), data = data_survival)
matrix_contrasts <-t(
matrix(c(
1,rep(0, 13), 0,
1,rep(0, 13), 1
), ncol=2))
model_B_sex <- glht(model_B, linfct=matrix_contrasts)
HR <- data.frame(format(exp(confint(model_B_sex)$confint)[1:2, ], digits=1, nsmall=1))
HR[1, "sex"] <- "Females"
HR[2, "sex"] <- "Males"
HR
## Estimate lwr upr sex
## 1 9.7 9.5 9.9 Females
## 2 12.0 11.7 12.3 Males
model_A_lagged <- coxph(Surv(tstart, tstop, endpoint) ~ as.factor(expo_time_after_D30)*kqn + entryD, data = data_survival)
matrix_contrasts <-t(
matrix(c(
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,
1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,
0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,
0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,
0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,
0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,
0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,
0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1
), ncol=14))
model_A_lagged_sex <- glht(model_A_lagged, linfct=matrix_contrasts)
HR <- data.frame(format(exp(confint(model_A_lagged_sex)$confint)[1:14, ], digits=1, nsmall=1))
HR[1:7, "sex"] <- "Females"
HR[8:14, "sex"] <- "Males"
HR$time_after_exposure <- c(times, times)
HR
## Estimate lwr upr sex time_after_exposure
## 1 100.6 97.5 103.8 Females 0-6 months
## 2 16.9 15.7 18.2 Females 6-12 months
## 3 12.3 11.5 13.1 Females 1-2 years
## 4 8.4 7.9 8.8 Females 2-5 years
## 5 6.9 6.5 7.3 Females 5-10 years
## 6 6.5 6.0 7.1 Females 10-15 years
## 7 5.9 5.5 6.4 Females 15+ years
## 8 156.4 150.4 162.5 Males 0-6 months
## 9 21.7 19.6 24.1 Males 6-12 months
## 10 16.0 14.7 17.5 Males 1-2 years
## 11 11.7 10.9 12.5 Males 2-5 years
## 12 9.2 8.5 9.9 Males 5-10 years
## 13 8.4 7.5 9.3 Males 10-15 years
## 14 7.6 6.9 8.3 Males 15+ years
model_B_lagged <- coxph(Surv(tstart, tstop, endpoint) ~ as.factor(expo_time_after_D30)*kqn + entryD
+ adjust_D30_by_prev_D00 + adjust_D30_by_prev_D10 + adjust_D30_by_prev_D20
+ adjust_D30_by_prev_D51 + adjust_D30_by_prev_D61 + adjust_D30_by_prev_D70
+ adjust_D30_by_prev_D81 + adjust_D30_by_prev_D91 + as.factor(num_others), data = data_survival)
matrix_contrasts <-t(
matrix(c(
1,0,0,0,0,0,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
0,1,0,0,0,0,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
0,0,1,0,0,0,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
0,0,0,1,0,0,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
0,0,0,0,0,1,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
0,0,0,0,0,0,1,0,rep(0, 11),0,0,0,0,0,0,0,0,
1,0,0,0,0,0,0,0,rep(0, 11),0,1,0,0,0,0,0,0,
0,1,0,0,0,0,0,0,rep(0, 11),0,0,1,0,0,0,0,0,
0,0,1,0,0,0,0,0,rep(0, 11),0,0,0,1,0,0,0,0,
0,0,0,1,0,0,0,0,rep(0, 11),0,0,0,0,1,0,0,0,
0,0,0,0,1,0,0,0,rep(0, 11),0,0,0,0,0,1,0,0,
0,0,0,0,0,1,0,0,rep(0, 11),0,0,0,0,0,0,1,0,
0,0,0,0,0,0,1,0,rep(0, 11),0,0,0,0,0,0,0,1
), ncol=14))
model_B_lagged_sex <- glht(model_B_lagged, linfct=matrix_contrasts)
HR <- data.frame(format(exp(confint(model_B_lagged_sex)$confint)[1:14, ], digits=1, nsmall=1))
HR[1:7, "sex"] <- "Females"
HR[8:14, "sex"] <- "Males"
HR$time_after_exposure <- c(times, times)
HR
## Estimate lwr upr sex time_after_exposure
## 1 70.4 68.1 72.6 Females 0-6 months
## 2 11.8 10.9 12.7 Females 6-12 months
## 3 8.6 8.0 9.2 Females 1-2 years
## 4 5.8 5.5 6.2 Females 2-5 years
## 5 4.8 4.6 5.1 Females 5-10 years
## 6 4.4 4.1 4.8 Females 10-15 years
## 7 4.1 3.8 4.4 Females 15+ years
## 8 92.7 89.1 96.4 Males 0-6 months
## 9 12.8 11.6 14.2 Males 6-12 months
## 10 9.5 8.7 10.4 Males 1-2 years
## 11 6.9 6.5 7.4 Males 2-5 years
## 12 5.5 5.1 5.9 Males 5-10 years
## 13 4.9 4.4 5.5 Males 10-15 years
## 14 4.6 4.2 5.1 Males 15+ years
Load the datasets prepared in steps 1-4.
rm(list = ls())
load(file="SHARED_disease_D41.RData")
In the Cox proportional hazards models (used to estimate hazard ratios), those who died or emigrated/disappeared were censored at date of death or emigration. The absolute risks can be biased if we do not take into consideration these competing risks events. For this reason, we give a new outcome to those events: 2 for disappearance or emigration, and 3 for death.
data_survival[data_survival$statdA==data_survival$tstop & data_survival$fail %in% c(70, 80), "endpoint"] <- 2
data_survival[data_survival$statdA==data_survival$tstop & data_survival$fail==90, "endpoint"] <- 3
For multi-state models, it is necessary to define a matrix of transitions. In this example, every person starts being free of the later-disorder (Neurotic Disorders), and they are labelled “healthy”. During follow-up, they can change the status from “healthy” to “diseased”, “emigrated”, or “dead”. No other transitions (e.g. from “emigrated” to “diseased”) are allowed.
tmat <- transMat(x=list(c(2,3,4), c(), c(), c()), names=c("healthy", "disease", "emi", "death"))
tmat
## to
## from healthy disease emi death
## healthy NA 1 2 3
## disease NA NA NA NA
## emi NA NA NA NA
## death NA NA NA NA
There are 4 states, but only 3 transitions. Each observation in the dataset has to be repeated 3 times. In each repetition, one different possible outcome is considered, and the other two treated as censoring.
CRcoxdata <- combine(data_survival, data_survival, data_survival, names = c(1, 2, 3))
A new outcome d can now be created, where each type of outcome is considering censoring in the other two repetitions.
CRcoxdata$d <- ((CRcoxdata$endpoint == 1) * (CRcoxdata$source == 1)) +
((CRcoxdata$endpoint == 2) * (CRcoxdata$source == 2)) +
((CRcoxdata$endpoint == 3) * (CRcoxdata$source == 3))
CRcoxdata$h <- as.numeric(CRcoxdata$source)
The invented data used in steps 1-4 can be used to illustrate the data structure. Note that the 6 observations for the invented person have been repeated 3 times.
pnr | D30A | tstart | tstop | endpoint | source | d | h |
---|---|---|---|---|---|---|---|
invented_person | 25.50681 | 14.58900 | 20.37804 | 0 | 1 | 0 | 1 |
invented_person | 25.50681 | 20.37804 | 25.50681 | 0 | 1 | 0 | 1 |
invented_person | 25.50681 | 25.50681 | 26.00681 | 0 | 1 | 0 | 1 |
invented_person | 25.50681 | 26.00681 | 26.50681 | 0 | 1 | 0 | 1 |
invented_person | 25.50681 | 26.50681 | 27.50681 | 0 | 1 | 0 | 1 |
invented_person | 25.50681 | 27.50681 | 29.44927 | 1 | 1 | 1 | 1 |
invented_person | 25.50681 | 14.58900 | 20.37804 | 0 | 2 | 0 | 2 |
invented_person | 25.50681 | 20.37804 | 25.50681 | 0 | 2 | 0 | 2 |
invented_person | 25.50681 | 25.50681 | 26.00681 | 0 | 2 | 0 | 2 |
invented_person | 25.50681 | 26.00681 | 26.50681 | 0 | 2 | 0 | 2 |
invented_person | 25.50681 | 26.50681 | 27.50681 | 0 | 2 | 0 | 2 |
invented_person | 25.50681 | 27.50681 | 29.44927 | 1 | 2 | 0 | 2 |
invented_person | 25.50681 | 14.58900 | 20.37804 | 0 | 3 | 0 | 3 |
invented_person | 25.50681 | 20.37804 | 25.50681 | 0 | 3 | 0 | 3 |
invented_person | 25.50681 | 25.50681 | 26.00681 | 0 | 3 | 0 | 3 |
invented_person | 25.50681 | 26.00681 | 26.50681 | 0 | 3 | 0 | 3 |
invented_person | 25.50681 | 26.50681 | 27.50681 | 0 | 3 | 0 | 3 |
invented_person | 25.50681 | 27.50681 | 29.44927 | 1 | 3 | 0 | 3 |
In these analyses, we aim to estimate the proportion of individuals with a diagnosis of a prior-disorder (Mood Disorders) who later develop a later-disorder (Neurotic Disorders). For this reason, we can keep only those with a diagnosis of Mood Disorders, and start the follow-up time at onset of the disorder.
CRcoxdata_expo <- CRcoxdata[!is.na(CRcoxdata[, "D30A"]),]
CRcoxdata_expo[, "expo_Tstart"] <- CRcoxdata_expo[, "tstart"] - CRcoxdata_expo[, "D30A"]
CRcoxdata_expo[, "expo_Tstop"] <- CRcoxdata_expo[, "tstop"] - CRcoxdata_expo[, "D30A"]
CRcoxdata_expo <- CRcoxdata_expo[CRcoxdata_expo$expo_Tstart >= 0,]
pnr | D30A | tstart | tstop | expo_Tstart | expo_Tstop | endpoint | source | d | h |
---|---|---|---|---|---|---|---|---|---|
invented_person | 25.50681 | 25.50681 | 26.00681 | 0.0 | 0.500000 | 0 | 1 | 0 | 1 |
invented_person | 25.50681 | 26.00681 | 26.50681 | 0.5 | 1.000000 | 0 | 1 | 0 | 1 |
invented_person | 25.50681 | 26.50681 | 27.50681 | 1.0 | 2.000000 | 0 | 1 | 0 | 1 |
invented_person | 25.50681 | 27.50681 | 29.44927 | 2.0 | 3.942466 | 1 | 1 | 1 | 1 |
invented_person | 25.50681 | 25.50681 | 26.00681 | 0.0 | 0.500000 | 0 | 2 | 0 | 2 |
invented_person | 25.50681 | 26.00681 | 26.50681 | 0.5 | 1.000000 | 0 | 2 | 0 | 2 |
invented_person | 25.50681 | 26.50681 | 27.50681 | 1.0 | 2.000000 | 0 | 2 | 0 | 2 |
invented_person | 25.50681 | 27.50681 | 29.44927 | 2.0 | 3.942466 | 1 | 2 | 0 | 2 |
invented_person | 25.50681 | 25.50681 | 26.00681 | 0.0 | 0.500000 | 0 | 3 | 0 | 3 |
invented_person | 25.50681 | 26.00681 | 26.50681 | 0.5 | 1.000000 | 0 | 3 | 0 | 3 |
invented_person | 25.50681 | 26.50681 | 27.50681 | 1.0 | 2.000000 | 0 | 3 | 0 | 3 |
invented_person | 25.50681 | 27.50681 | 29.44927 | 2.0 | 3.942466 | 1 | 3 | 0 | 3 |
The new variables expo_Tstart and expo_Tstop focus on time since onset of Mood Disorders, compared to the old variables tstart and tstop that focus on time since birth (age).
Each replication (h=1, h=2, and h=3) is linked to one specific transition (1–>2, 1–>3, 1–>4). Two new variables from and to define these transitions.
CRcoxdata_expo$from <- 1
CRcoxdata_expo$to <- CRcoxdata_expo$h + 1
In order to investigate specific absolute risks depending on age at diagnosis of Mood Disorders, different age groups have to be identified.
CRcoxdata_expo$age_group <- cut(CRcoxdata_expo[, "D30A"], breaks=c(0, 20, 40, 60, 80, 200), include.lowest=TRUE, right = FALSE)
CRcoxdata_expo$age_group <- droplevels(CRcoxdata_expo$age_group)
Finally, the dataset was splitted many times (because of mental health comorbidity, time since exposure, etc.). This level of detail is not necessary for this analysis, so the dataset can be minimized.
CRcoxdata_expo_min <- data.table(CRcoxdata_expo[, c("pnr", "kqn", "D30A", "age_group", "expo_Tstart", "expo_Tstop", "from", "to", "h", "d")])
CRcoxdata_expo_min <- CRcoxdata_expo_min[, list(tstart=min(expo_Tstart), tstop=max(expo_Tstop), d=max(d)), by=c("pnr", "kqn", "D30A", "age_group", "from", "to", "h")]
CRcoxdata_expo_min <- data.frame(CRcoxdata_expo_min)
pnr | D30A | age_group | kqn | tstart | tstop | from | to | d |
---|---|---|---|---|---|---|---|---|
invented_person | 25.50681 | [20,40) | K | 0 | 3.942466 | 1 | 2 | 1 |
invented_person | 25.50681 | [20,40) | K | 0 | 3.942466 | 1 | 3 | 0 |
invented_person | 25.50681 | [20,40) | K | 0 | 3.942466 | 1 | 4 | 0 |
The invented person is followed from onset of Mood Disorders (at age 25.5 years) during 3.94 years, moment in which this person is diagnosed with Neurotic Disorders (and censored for emigration or death).
In this step, the number of persons with Mood Disorders (exposed) is calculated, together with how many develop subsequent Neurotic Disorders (d).
count_data <- CRcoxdata_expo_min[CRcoxdata_expo_min$h==1, c("pnr", "kqn", "age_group", "tstop", "d")]
count_data <- count_data[!duplicated(count_data), ]
count_data$number <- 1
numbers <- aggregate(cbind(number, d) ~ kqn + age_group, data = count_data, FUN=sum, na.rm=TRUE)
# According to Danish regulations, we cannot report cells with less than 3 cases
numbers_print <- numbers
numbers_print[numbers_print$number<3, "number"] <- NA
numbers_print[numbers_print$d<3, "d"] <- NA
kable(numbers_print)
kqn | age_group | number | d |
---|---|---|---|
K | [0,20) | 11474 | 4472 |
M | [0,20) | 5398 | 1649 |
K | [20,40) | 36526 | 9990 |
M | [20,40) | 23077 | 6003 |
K | [40,60) | 28614 | 5470 |
M | [40,60) | 21018 | 4172 |
K | [60,80) | 17207 | 1911 |
M | [60,80) | 10121 | 1140 |
K | [80,200] | 8334 | 358 |
M | [80,200] | 3212 | 157 |
for(age_group in c("all", levels(CRcoxdata_expo_min$age_group))) {
# First we select the observations for the age group of interest
if(age_group=="all") {
multi <- CRcoxdata_expo_min
} else {
multi <- CRcoxdata_expo_min[CRcoxdata_expo_min$age_group==age_group, ]
}
for(sex in c("all", "M", "K")) {
# Second we select the observations for the sex of interest
if(sex=="all") {
multi2 <- multi
} else {
multi2 <- multi[(multi[, "kqn"]==sex), ]
}
# The CIP is estimated only if there is at least one person in the specific sex- and age-group
if(dim(multi2)[1]>0) {
multi2$trans <- multi2$h
coxres <- coxph(Surv(tstart, tstop, d) ~ strata(trans), data=multi2, method="breslow")
who <- multi2[c(1:3), ]
who$strata<-1:3
msf0 <- msfit(coxres, who, trans=tmat)
pt0 <- probtrans(msf0, predt=0)
CIP_aux <- pt0[[1]]
CIP_aux$sex <- sex
CIP_aux$age_group <- age_group
if (age_group=="all" & sex=="all") {
CIP <- CIP_aux
} else {
CIP <- rbind(CIP, CIP_aux)
}
}
}
}
head(CIP)
## time pstate1 pstate2 pstate3 pstate4 se1
## 1 0.000000000 1.0000000 0.000000000 0 0.000000e+00 0.0000000000
## 2 0.002732240 0.9983728 0.001573522 0 5.364279e-05 0.0001204172
## 3 0.002732240 0.9980241 0.001913317 0 6.258478e-05 0.0001326891
## 4 0.002737851 0.9916574 0.008280012 0 6.258478e-05 0.0002712736
## 5 0.002737851 0.9655200 0.034417440 0 6.258478e-05 0.0005397441
## 6 0.002737851 0.9612815 0.038655942 0 6.258478e-05 0.0005712619
## se2 se3 se4 sex age_group
## 1 0.0000000000 0 0.000000e+00 all all
## 2 0.0001184157 0 2.186394e-05 all all
## 3 0.0001305720 0 2.362047e-05 all all
## 4 0.0002702575 0 2.362047e-05 all all
## 5 0.0005392612 0 2.362047e-05 all all
## 6 0.0005708098 0 2.362047e-05 all all
There is information on too many data points and for all possible transitions. In this step, we will keep only one estimate per month (aprox 0.08 years) and only transition 1 (going from “Healthy” to “Diseased”).
CIP$time_since_dx <- 0.08 * ceiling(as.numeric(CIP$time)/0.08)
CIP <- ddply(CIP, .(sex, age_group, time_since_dx), function(x) tail(x, 1))
CIP$cip <- 100*CIP$pstate2
CIP[, "cip_low"] <- pmax(0, 100*exp(log(CIP[, "pstate2"]) - qnorm(0.975) * (CIP[, "se2"] / CIP[, "pstate2"])))
CIP[, "cip_high"] <- pmin(100, 100*exp(log(CIP[, "pstate2"]) + qnorm(0.975) * (CIP[, "se2"] / CIP[, "pstate2"])))
CIP <- CIP[, c("sex", "age_group", "time", "cip", "cip_low", "cip_high")]
head(CIP)
## sex age_group time cip cip_low cip_high
## 1 all [0,20) 0.00000000 0.00000 NaN NaN
## 2 all [0,20) 0.07945205 14.09683 13.55720 14.65794
## 3 all [0,20) 0.15890411 16.30146 15.72631 16.89764
## 4 all [0,20) 0.23835616 17.64508 17.05019 18.26074
## 5 all [0,20) 0.31780822 18.62727 18.01874 19.25635
## 6 all [0,20) 0.40000000 19.34755 18.72941 19.98608
We are interested on the first 15 years after the prior-disorder. We will delete the observations after 15 years, and extend the last observation until exact 15 years after the prior-disorder.
CIP <- CIP[CIP$time<=15, ]
CIP_last <- ddply(CIP, .(sex, age_group), function(x) x[nrow(x), ])
CIP_last$time <- 15
CIP <- rbind(CIP, CIP_last)
numbers <- numbers[!duplicated(numbers), ]
numbers2 <- aggregate(cbind(number, d)~kqn, data=numbers, FUN=sum)
numbers2$age_group <- "all"
numbers3 <- aggregate(cbind(number, d)~age_group, data=numbers, FUN=sum)
numbers3$kqn <- "all"
numbers4 <- aggregate(cbind(number, d)~1, data=numbers, FUN=sum)
numbers4$age_group <- "all"
numbers4$kqn <- "all"
numbers <- rbind(numbers, numbers2, numbers3, numbers4)
colnames(numbers)[colnames(numbers)=="kqn"] <- "sex"
CIP <- merge(CIP, numbers, all.x=T, all.y=F)
colnames(CIP)[colnames(CIP)=="d"]<-"cases_outcome"
colnames(CIP)[colnames(CIP)=="number"]<-"number_exposed"
# According to Danish regulations, we cannot report cells with less than 3 cases
CIP[CIP$cases_outcome<3, "cip"] <- NA
CIP[CIP$cases_outcome<3, "cip_low"] <- NA
CIP[CIP$cases_outcome<3, "cip_high"] <- NA
CIP[CIP$cases_outcome<3, "cases_outcome"] <- NA
CIP[CIP$number_exposed<3, "number_exposed"] <- NA
head(CIP)
## sex age_group time cip cip_low cip_high number_exposed
## 1 all [0,20) 0.00000000 0.00000 NaN NaN 16872
## 2 all [0,20) 0.07945205 14.09683 13.55720 14.65794 16872
## 3 all [0,20) 0.15890411 16.30146 15.72631 16.89764 16872
## 4 all [0,20) 0.23835616 17.64508 17.05019 18.26074 16872
## 5 all [0,20) 0.31780822 18.62727 18.01874 19.25635 16872
## 6 all [0,20) 0.40000000 19.34755 18.72941 19.98608 16872
## cases_outcome
## 1 6121
## 2 6121
## 3 6121
## 4 6121
## 5 6121
## 6 6121