This document is an accompanying file of the paper:

Plana-Ripoll et al. Exploring Comorbidity Within Mental Disorders Among a Danish National Population. JAMA Psychiatry, published online on Jan 16, 2019. doi:10.1001/jamapsychiatry.2018.3658. Link to the published paper here
Full list of authors

Oleguer Plana-Ripoll, Carsten Bøcker Pedersen, Yan Holtz, Michael Benros, Søren Dalsgaard, Peter De Jonge, Chun Chieh Fan, Louisa Degenhardt, Andrea Ganna, Aja Neergaard Greve, Jane Gunn, Kim Moesgaard Iburg, Lars Vedel Kessing, Brian K. Lee, Carmen C.W. Lim, Ole Mors, Merete Nordentoft, Anders Prior, Annelieke M. Roest, Sukanta Saha, Andrew Schork, James G. Scott, Kate M. Scott, Terry Stedman, Holger J. Sørensen, Thomas Werge, Harvey A. Whiteford, Thomas Munk Laursen, Esben Agerbo, Ron C. Kessler, Preben Bo Mortensen and John J. McGrath

Link to the interactive website with all results available here - Comments on the code here.

In this paper, we estimated the association between 90 pairs of mental disorders. In this document, the R code for one specific pair is provided: the association between Mood Disorders (as a prior-disorder) and Neurotic Disorders (as a later-disorder). Follow the steps 1-7 below in order to replicate the analyses.

1. Libraries and data

Load necessary libraries

library(lubridate)
library(survival)
library(knitr)
library(multcomp)
library(gdata)
library(data.table)
library(mstate)
library(plyr)

Load necessary data

load("../basics.RData")

There are two necessary datasets for the estimations: data_disease and land.

dataset data_disease

data_disease <- stam_diag[, c("pnr", "fdato", "fdatoD", "Byear", "kqn", "stat", "statd", "statdD", "statdA", "D00", "D10", "D20", "D30", "D41", "D51", "D61", "D70", "D81", "D91")]
nrow(data_disease)
## [1] 7387479

The Danish Civil Registration System was established on April 1, 1968, where all persons alive and living in Denmark were registered. After this date, all persons born or immigrating to Denmark were included. Dataset data_disease contains information on the subsample of this group who were born in Denmark between January 1, 1900 and December 31, 2015 (N=7387479). Information includes a unique personal identification number used to link information from several registers (pnr), birthdate (fdato), sex (kqn: “M” for males and “K” for females), legal status (stat: 1 for active citizens, 70 for disappeared, 80 for emigrated and 90 for dead) and date of disappearance, emigration or death (statd). In addition, information on the 10 mental disorders used in this publication were included from the Danish Psychiatric Central Research Register: Organic Disorders (D00), Substance Use disorders (D10), Schizophrenia and related disorders (D20), Mood Disorders (D30), Neurotic Disorders (D41), Eating Disorders (D51), Personality Disorders (D61), Intelectual disabilities (D70), Developmental Disorders (D81) and Behavioral Disorders (D91).

According to Danish regulations, it is not allowed to share personal data, so an invented case is used here to show the data structure. This invented person will be used throughout the whole code example. She was born on May 31, 1985 and emigrated from Denmark on March 13, 2015. She received a diagnosis of Substance Use disorders for the first time on October 15, 2005; Mood Disorders on December 1, 2010; Neurotic Disorders on November 10, 2014; and Eating Disorders on April 13, 2011.

data_disease: This table includes invented data (not real person)
pnr fdato fdatoD Byear kqn stat statd statdD statdA D00 D10 D20 D30 D41 D51 D61 D70 D81 D91
invented_person 1985-05-31 1985.411 1985 K 80 2015-03-13 2015.195 29.784 NA 2005-10-15 NA 2010-12-01 2014-11-10 2011-04-13 NA NA NA NA

dataset land

Dataset land contains information on all persons in data_disease regarding periods of living in Denmark. Each period of Danish residence is defined with a starting date (tflytd) and ending date (fflytd)

Invented data for the same invented person above is used to show the data structure. This person was living in Denmark since birth (May 31, 1985) until emigration (March 13, 2015) except for a period that she moved abroad (from August 1, 1997 to January 5, 1998).

land_disease: This table includes invented data (not real person)
pnr tflytd fflytd
invented_person 1985-05-31 1997-08-01
invented_person 1998-01-05 2015-03-13

2. Population, follow-up time, prior- and later-disorders.

With these two datasets, we can set the study population, the specific follow-up time for each person, and information on the later-disorder (Neurotic Disorders) and all prior-disorders (all disorders except Neurotic Disorders).

Set outcome of interest (later-disorder)

The outcome of interest for this code example is Neurotic Disorders (variable D41). The possible minimum age of onset for this disorder is 5 years.

disease <- "D41"
minage <- 5

Left truncation

For every person, follow-up starts on age at January 1, 2000 or at 5 years, whichever occurs later.

data_disease$entryA <- pmax(2000-data_disease[, "fdatoD"], minage)
data_disease$entryD <- as.numeric(data_disease$fdatoD) + as.numeric(data_disease$entryA)
data_disease$entry <- as.Date(date_decimal(data_disease$entryD))

Right censoring

For every person, follow-up ends on December 31, 2016, death, emigration or disappearance, whichever occurs first. Follow-up for persons developing Neurotic Disorders will end on the date of onset of the disorder, but this information will be included later.

data_disease$exit <- pmin(as.Date("2016-12-31"), data_disease[, "statd"], na.rm = TRUE)
data_disease$exitD <- decimal_date(data_disease$exit)
data_disease$exitA <- data_disease$exitD - data_disease$fdatoD

Eliminate those with “negative” follow-up time

Some persons died or emigrated before the follow-up started on January 1, 2000 (or age 5 years). These persons need to be excluded from the sample. An easy way to identify them is because their follow-up time finishes before it starts, so we keep in the sample only those whose follow-up starts before it finishes.

data_disease <- data_disease[data_disease$entryA < data_disease$exitA, ]

Include only those living in Denmark at beginning of follow-up

Some persons were living outside Denmark on the date of start of follow-up (variable entry). We need to keep only those residing in the country on that date. We use information from the dataset land to obtain this information.

data_disease_DK <- merge(data_disease, land, all.x = TRUE, all.y = FALSE)
data_disease_DK <- data_disease_DK[(!is.na(data_disease_DK$tflytd)) &
                                     (data_disease_DK$tflytd <= data_disease_DK$entry) &
                                     (data_disease_DK$fflytd > data_disease_DK$entry) 
                                   , ]
nrow(data_disease_DK)
## [1] 5699944

The sample now consists on 5699944 persons who were living in Denmark on the date of start of follow-up (January 1, 2000 or the date they turned 5 years, if it happened afterwards).

This table includes invented data (not real person)
pnr fdato fdatoD Byear kqn stat statd statdD statdA D00 D10 D20 D30 D41 D51 D61 D70 D81 D91 entryA entryD entry exit exitD exitA tflytd fflytd
invented_person 1985-05-31 1985.411 1985 K 80 2015-03-13 2015.195 29.784 NA 2005-10-15 NA 2010-12-01 2014-11-10 2011-04-13 NA NA NA NA 14.589 2000 2000-01-01 2015-03-13 2015.195 29.78352 1998-01-05 2015-03-13

Information on onset of later-disorder (Neurotic Disorders)

Information on the later-disorder is included at this stage. Variable onsetD contains the date of onset of the specific disorder, and variable onsetA contains the age at onset.

data_disease_DK[!is.na(data_disease_DK[, disease]), "onsetD"] <- decimal_date((data_disease_DK[!is.na(data_disease_DK[, disease]), disease] + 1))
data_disease_DK$onsetA <- data_disease_DK$onsetD - data_disease_DK$fdatoD  

Follow-up now needs to finish on date of onset, for those who develop this disorder.

data_disease_DK$exit <- pmin(data_disease_DK$exit, (data_disease_DK[, disease] + 1), na.rm = TRUE)
data_disease_DK$exitD <- decimal_date(data_disease_DK$exit)
data_disease_DK$exitA <- data_disease_DK$exitD - data_disease_DK$fdatoD 

Note that for someone developing Neurotic Disorders before the start of follow-up, now exit will occur before entry. These are the prevalent cases and will be identified later.

A variable fail can be created indicating the type of outcome at end of follow-up (0 = censored; 1 = onset of neurotic disorders;70 = disappearance; 80 = emigration; 90 = death).

data_disease_DK$fail <- (90 * (!is.na(data_disease_DK$statdA) & (data_disease_DK$statdA == data_disease_DK$exitA) & (data_disease_DK$stat == 90))) + 
  (80 * (!is.na(data_disease_DK$statdA) & (data_disease_DK$statdA == data_disease_DK$exitA) & (data_disease_DK$stat == 80))) +
  (70 * (!is.na(data_disease_DK$statdA) & (data_disease_DK$statdA == data_disease_DK$exitA) & (data_disease_DK$stat == 70))) +
  (01 * (!is.na(data_disease_DK$onsetA) & (data_disease_DK$onsetA == data_disease_DK$exitA))) 
table(data_disease_DK$fail)
## 
##       0       1      70      80      90      91 
## 4497071  294575    1069   69417  837798      14

Note that some persons can have two outcomes on the same day (e.g. 91 = death and onset of Neurotic Disorders).

This table includes invented data (not real person)
pnr fdato fdatoD Byear kqn stat statd statdD statdA D00 D10 D20 D30 D41 D51 D61 D70 D81 D91 entryA entryD entry exit exitD exitA tflytd fflytd onsetD onsetA fail
invented_person 1985-05-31 1985.411 1985 K 80 2015-03-13 2015.195 29.784 NA 2005-10-15 NA 2010-12-01 2014-11-10 2011-04-13 NA NA NA NA 14.589 2000 2000-01-01 2014-11-11 2014.86 29.44927 1998-01-05 2015-03-13 2014.86 29.44927 1

Information on onset of prior-disorders

The later-disorder of interest is Neurotic Disorders (variable D41). The other nine disorders (variables D00, D10, D20, D30, D51, D61, D70, D81, D91) will be treated as prior-disorders. For example, variable D30D contains the date of onset of Mood Disorders, and variable D30A contains the age of onset.

comorbidities <- c("D00", "D10", "D20", "D30", "D51", "D61", "D70", "D81", "D91")
comorbiditiesD <- paste(comorbidities,"D",sep="")
comorbiditiesA <- paste(comorbidities,"A",sep="")

for(i in 1:length(comorbidities)) {
  data_disease_DK[!is.na(data_disease_DK[, comorbidities[i]]), comorbiditiesD[i]] <- decimal_date((data_disease_DK[!is.na(data_disease_DK[, comorbidities[i]]), comorbidities[i]] + 1))
  data_disease_DK[, comorbiditiesA[i]] <- as.numeric(data_disease_DK[, comorbiditiesD[i]]) - as.numeric(data_disease_DK[, "fdatoD"])
}

Finally, it is important to identify the persons who developed Neurotic Disorders before the start of follow-up (prevalent cases) and consider only those at risk of developing Neurotic Disorders.

prevalent <- data_disease_DK[data_disease_DK$entryA >= data_disease_DK$exitA, 
                             c("pnr", "fdatoD", "entry", "entryD", "entryA", "exitD", "exitA", "fail", "statdA", "kqn", "Byear", comorbiditiesA)]
at_risk <- data_disease_DK[data_disease_DK$entryA < data_disease_DK$exitA, 
                           c("pnr", "fdatoD", "entry", "entryD", "entryA", "exitD", "exitA", "fail", "statdA", "kqn", "Byear", comorbiditiesA)]

nrow(prevalent); nrow(at_risk)
## [1] 89348
## [1] 5610596

From the total of 5699944 persons who were living in Denmark on the date of start of follow-up, 89348 had a previous diagnosis of Neurotic Disorders, and were excluded from the analysis as prevalent cases. The remaining 5610596 did not have a previous diagnosis of Neurotic Disorders and were at risk of becoming an incident case during the follow-up, and thus included in the analysis (this information is included in Table 1 in the publication).

This table includes invented data (not real person)
pnr fdatoD entry entryD entryA exitD exitA fail statdA kqn Byear D00A D10A D20A D30A D51A D61A D70A D81A D91A
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA

3. Break ties

For some persons, the prior-disorder and later-disorder occurred on the same day (ties). In a time-to-event setting, individuals are followed until the outcome (later-disorder), and these persons would not be considered to be exposed to the prior-disorder. This would underestimate all associations, given that concurrant pairs of disorders could never occur. To fix it, we broke some of these ties, as explained below.

exposures <- comorbidities
exposures
## [1] "D00" "D10" "D20" "D30" "D51" "D61" "D70" "D81" "D91"

This is the list of variables containing the nine possible prior-disorders.

temp <- at_risk
temp$outcome <- (temp$fail %in% c(1,71,81,91))*1
table(temp$outcome)
## 
##       0       1 
## 5405355  205241

In total, there are 205241 who develop the later-disorder during the follow-up (this information is included in Table 1 of the publication). These are the persons that we need to observe to identify ties (a prior-disorder with onset on the same day). We will do it for each of the nine prior-disorders:

for(i in 1:length(exposures)) {
  # First we identify those with both the prior- and later-disorder
  both_dx <- temp[!is.na(temp[, paste(exposures[i],"A",sep="")]) & temp$outcome==1, c("pnr", paste(exposures[i],"A",sep=""), "exitA")]  
  both_dx$diff <- both_dx[, "exitA"] - both_dx[, paste(exposures[i],"A",sep="")]
  
  # Only if the two disorders happen within 5 years
  both_dx <- both_dx[abs(both_dx$diff)<5, ]
  
  # How many the prior-disorder occurs before or after the later-disorder
  before <- dim(both_dx[both_dx$diff>0, ])[1]
  after <- dim(both_dx[both_dx$diff<0, ])[1]
  
  # prop_before is the proportion of persons who develop first the prior-disorder among those who 
  # develop the two disorders within 5 years (but not on the same day)
  prop_before <- before/(before+after)
  if(before==0) {
    prop_before <- 0
  }
  
  # This proportion is used as a probability to break ties. Each tie will be broken with probability
  # prop_before by moving the prior-disorder to one day earlier, following a random uniform distribution.
  temp$random <- runif(n=dim(temp)[1])
  temp[temp$random <= prop_before & !is.na(temp[, paste(exposures[i],"A",sep="")]) & temp$outcome==1 & temp[, paste(exposures[i],"A",sep="")]==temp$exitA, paste(exposures[i],"A",sep="")] <-
    temp[temp$random <= prop_before & !is.na(temp[, paste(exposures[i],"A",sep="")]) & temp$outcome==1 & temp[, paste(exposures[i],"A",sep="")]==temp$exitA, paste(exposures[i],"A",sep="")] - (1/365.25)          
  
  temp <- temp[, names(temp) != "random"]
}

Data for the invented person do not change compared to previous step, because there are no ties.

This table includes invented data (not real person)
pnr fdatoD entry entryD entryA exitD exitA fail statdA kqn Byear D00A D10A D20A D30A D51A D61A D70A D81A D91A
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA

4. Time-varying variables

Prepare data for time-to-event analysis

We create new variables id, tstart, tstop and endpoint (equivalent to pnr, entryA, exitA, outcome), but that will change later.

temp <- tmerge(temp, temp, id = pnr, tstart = entryA, tstop = exitA)
temp <- tmerge(temp, temp, id = pnr, endpoint = event(exitA, outcome))
This table includes invented data (not real person)
pnr fdatoD entry entryD entryA exitD exitA fail statdA kqn Byear D00A D10A D20A D30A D51A D61A D70A D81A D91A outcome id tstart tstop endpoint
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 14.589 29.44927 1

Split follow-up time

Each person with a diagnosis of Mood Disorders (D30) needs to be “unexposed” until the onset of the disorder, and “exposed” afterwards. In addition, we want to split the follow-up time further depending on time since exposure (6 months, 1, 2, 5, 10, 15, >15 years). For each person, we need to specify in which dates the follow-up time is going to be splitted.

Identify at what ages the follow-up time should be splitted

# First on date of onset of Mood Disorders
db_exposures0 <- temp[!is.na(temp[, "D30A"]), c("pnr", "exitA", "D30A")]
db_exposures0 <- db_exposures0[db_exposures0[, "exitA"] >= db_exposures0[, "D30A"],]

# And then 6 months, 1 year, 2 years, etc. later.
db_exposures6m <- db_exposures0
db_exposures6m[, "D30A"] <- db_exposures0[, "D30A"] + (1/2)
db_exposures6m <- db_exposures6m[db_exposures6m[, "exitA"] >= db_exposures6m[, "D30A"],]
db_exposures1 <- db_exposures0
db_exposures1[, "D30A"] <- db_exposures0[, "D30A"] + 1
db_exposures1 <- db_exposures1[db_exposures1[, "exitA"] >= db_exposures1[, "D30A"],]
db_exposures2 <- db_exposures0
db_exposures2[, "D30A"] <- db_exposures0[, "D30A"] + 2
db_exposures2 <- db_exposures2[db_exposures2[, "exitA"] >= db_exposures2[, "D30A"],]
db_exposures5 <- db_exposures0
db_exposures5[, "D30A"] <- db_exposures0[, "D30A"] + 5
db_exposures5 <- db_exposures5[db_exposures5[, "exitA"] >= db_exposures5[, "D30A"],]
db_exposures10 <- db_exposures0
db_exposures10[, "D30A"] <- db_exposures0[, "D30A"] + 10
db_exposures10 <- db_exposures10[db_exposures10[, "exitA"] >= db_exposures10[, "D30A"],]
db_exposures15 <- db_exposures0
db_exposures15[, "D30A"] <- db_exposures0[, "D30A"] + 15
db_exposures15 <- db_exposures15[db_exposures15[, "exitA"] >= db_exposures15[, "D30A"],]

# Finally we combine all these times
db_exposures <- rbind(db_exposures0, db_exposures6m, db_exposures1, db_exposures2, db_exposures5, db_exposures10, db_exposures15)
db_exposures <- db_exposures[!duplicated(db_exposures), ]
names(db_exposures)[names(db_exposures)=="D30A"]<-"cut_points"
db_exposures <- db_exposures[, c("pnr", "cut_points")]
This table includes invented data (not real person)
pnr cut_points
invented_person 25.50681
invented_person 26.00681
invented_person 26.50681
invented_person 27.50681

The invented person developed Mood Disorders at age 25.5, that is when the first splitting takes place. Then it will be splitted again 6 months after and 1 and 2 years after. No more times will be needed, given that this specific person is followed until age 29.8.

Split the follow-up at those ages

We create two new variables: expo_D30 that is a binary variable saying whether the person has had a diagnosis of Mood Disorders, and expo_time_after_D30 that counts also the time after the onset.

temp <- tmerge(temp, db_exposures, id = pnr, expo_time_after_D30=cumtdc(cut_points))
temp$expo_D30 <- (temp$expo_time_after_D30 >= 1)
This table includes invented data (not real person)
pnr fdatoD entry entryD entryA exitD exitA fail statdA kqn Byear D00A D10A D20A D30A D51A D61A D70A D81A D91A outcome id tstart tstop endpoint expo_time_after_D30 expo_D30
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 14.58900 25.50681 0 0 FALSE
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 25.50681 26.00681 0 1 TRUE
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 26.00681 26.50681 0 2 TRUE
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 26.50681 27.50681 0 3 TRUE
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 27.50681 29.44927 1 4 TRUE

Note that the invented person now has 5 different observations, where tstop of each observation finishes on tstart of the following one, and the two variables regarding exposure to Mood Disorders change in each observation.

Split follow-up time for comorbidities

There is one prior-disorder (Mood Disorders) and one later-disorder (Neurotic Disorders). The remaining eight disorders are used to adjust the models for mental health comorbidity. It is therefore necessary to replicate the previous step with the other 8 disorders.

adjustment <- exposures[exposures!="D30"]
adjustment 
## [1] "D00" "D10" "D20" "D51" "D61" "D70" "D81" "D91"
for(i in 1:length(adjustment)) {
  
  db_exposures0 <- temp[!is.na(temp[, paste(adjustment[i],"A",sep="")]), c("pnr", "exitA", "D30A", paste(adjustment[i],"A",sep=""))]
  
  ## We do not want to adjust for mediators (only for confounders), so we ignore mental disorders with onset after the prior-disorder
  db_exposures0 <- db_exposures0[is.na(db_exposures0[, "D30A"]) |
                                   (!is.na(db_exposures0[, "D30A"]) & db_exposures0[, paste(adjustment[i],"A",sep="")] < db_exposures0[, "D30A"]),]
  db_exposures0 <- db_exposures0[db_exposures0[, "exitA"] >= db_exposures0[, paste(adjustment[i],"A",sep="")],]
  db_exposures0 <- db_exposures0[!duplicated(db_exposures0), ]
  
  names(db_exposures0)[names(db_exposures0)==paste(adjustment[i],"A",sep="")]<-"cut_points"
  db_exposures0 <- db_exposures0[, c("pnr", "cut_points")]
  temp <- tmerge(temp, db_exposures0, id = pnr, exposure=tdc(cut_points))

  colnames(temp)[colnames(temp) == "exposure"] <- paste("adjust_D30_by_prev_", adjustment[i], sep="")
}
This table includes invented data (not real person)
pnr fdatoD entry entryD entryA exitD exitA fail statdA kqn Byear D00A D10A D20A D30A D51A D61A D70A D81A D91A outcome id tstart tstop endpoint expo_time_after_D30 expo_D30 adjust_D30_by_prevD00 adjust_D30_by_prev_D10 adjust_D30_by_prevD20 adjust_D30_by_prevD51 adjust_D30_by_prevD61 adjust_D30_by_prevD70 adjust_D30_by_prevD81 adjust_D30_by_prevD91
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 14.58900 20.37804 0 0 FALSE 0 0 0 0 0 0 0 0
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 20.37804 25.50681 0 0 FALSE 0 1 0 0 0 0 0 0
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 25.50681 26.00681 0 1 TRUE 0 1 0 0 0 0 0 0
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 26.00681 26.50681 0 2 TRUE 0 1 0 0 0 0 0 0
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 26.50681 27.50681 0 3 TRUE 0 1 0 0 0 0 0 0
invented_person 1985.411 2000-01-01 2000 14.589 2014.86 29.44927 1 29.784 K 1985 NA 20.37804 NA 25.50681 25.87119 NA NA NA NA 1 invented_person 27.50681 29.44927 1 4 TRUE 0 1 0 0 0 0 0 0

Note that Substance Use disorders (D10) occurred before the prior-disorder, and it is considered as a time-varying variable. However, onset of Eating Disorders (D51) occurred after the onset of the prior-disorder, and it is not considered when adjusting for mental health comorbidity.

Save the data

data_survival <- temp
save(prevalent, at_risk, data_survival, data_invented, file="SHARED_disease_D41.RData")

5. Descriptive analysis

Load the data

Load the datasets prepared in steps 1-4.

rm(list = ls())
load(file="SHARED_disease_D41.RData")

Descriptive statistics of Neurotic Disorders as later-disorder (Table 1 of the publication)

Prevalent cases before start of follow-up - overall and by sex

nrow(prevalent); table(prevalent$kqn)
## [1] 89348
## 
##     K     M 
## 57910 31438

Persons at risk of developing Neurotic Disorders during follow-up - overall and by sex

nrow(at_risk); table(at_risk$kqn)
## [1] 5610596
## 
##       K       M 
## 2807442 2803154

New cases of Neurotic Disorders during follow-up - overall and by sex

nrow(data_survival[data_survival$endpoint==1, ]); table(data_survival[data_survival$endpoint==1, "kqn"])
## [1] 205241
## 
##      K      M 
## 120756  84485

Association between Mood Disorders (prior-disorder) and Neurotic Disorders (later-disorder)

1. Count persons, person-years and number of cases for exposed and unexposed to Mood Disorders

# Some persons have more than one observation (due to splitting time), and they need to be counted only once
number <- data_survival[, c("pnr", "kqn", "expo_D30")]
number <- number[!duplicated(number), ]
number$number <- 1
number_sex <- aggregate(number ~ expo_D30 + kqn, data = number, FUN=sum, na.rm=TRUE)

data_survival$time <- data_survival$tstop - data_survival$tstart
py_sex <- aggregate(cbind(time, endpoint) ~ expo_D30 + kqn, data = data_survival, FUN=sum, na.rm=TRUE)

number <- merge(number_sex, py_sex)
number2 <- rbind(number, 
                 cbind(aggregate(cbind(number, time, endpoint) ~ expo_D30, data = number, FUN=sum, na.rm=TRUE), kqn="all")
) 

# According to Danish regulations, we cannot report cells with less than 3 cases
number2[number2$number<3, "number"] <- NA
number2[number2$endpoint<3, "endpoint"] <- NA

kable(number2) 
expo_D30 kqn number time endpoint
FALSE K 2773129 37728220.1 98555
FALSE M 2784330 38039730.0 71364
TRUE K 102155 757457.0 22201
TRUE M 62826 445976.4 13121
FALSE all 5557459 75767950.1 169919
TRUE all 164981 1203433.4 35322

Note that a person could be counted twice (one among exposed and one among unexposed) if exposure to Mood Disorders occurs during the follow-up time.

2. Count persons, person-years and number of cases for exposed and unexposed to Mood Disorders (considering time since exposure)

number <- data_survival[, c("pnr", "kqn", "expo_time_after_D30")]
number <- number[!duplicated(number), ]
number$number <- 1
number_sex <- aggregate(number ~ expo_time_after_D30 + kqn, data = number, FUN=sum, na.rm=TRUE)

data_survival$time <- data_survival$tstop - data_survival$tstart
py_sex <- aggregate(cbind(time, endpoint) ~ expo_time_after_D30 + kqn, data = data_survival, FUN=sum, na.rm=TRUE)

number <- merge(number_sex, py_sex)
number2 <- rbind(number, 
                 cbind(aggregate(cbind(number, time, endpoint) ~ expo_time_after_D30, data = number, FUN=sum, na.rm=TRUE), kqn="all")
) 
# According to Danish regulations, we cannot report cells with less than 3 cases
number2[number2$number<3, "number"] <- NA
number2[number2$endpoint<3, "endpoint"] <- NA

kable(number2)
expo_time_after_D30 kqn number time endpoint
0 K 2773129 37728220.07 98555
0 M 2784330 38039730.02 71364
1 K 69228 29427.65 9559
1 M 44894 18950.67 6164
2 K 58431 27836.62 1500
2 M 37537 17880.54 802
3 K 57103 52463.44 1987
3 M 36528 33614.52 1092
4 K 56917 136880.70 3256
4 M 35956 86405.72 1932
5 K 45298 164371.80 2758
5 M 27864 100406.19 1530
6 K 28634 98242.20 1267
6 M 17236 59286.81 679
7 K 29946 248234.57 1874
7 M 16475 129431.98 922
0 all 5557459 75767950.10 169919
1 all 114122 48378.32 15723
2 all 95968 45717.16 2302
3 all 93631 86077.97 3079
4 all 92873 223286.42 5188
5 all 73162 264777.99 4288
6 all 45870 157529.01 1946
7 all 46421 377666.55 2796

6. Hazard ratios

Load the data

Load the datasets prepared in steps 1-4.

rm(list = ls())
load(file="SHARED_disease_D41.RData")

1. Hazard ratios for persons (males and females combined)

Model A (adjusted for sex and calendar time)
model_A <- coxph(Surv(tstart, tstop, endpoint) ~ expo_D30 + kqn + entryD, data = data_survival)
print(paste0("HR = ", format(exp(summary(model_A)$coefficients[1, 1]), digits=1, nsmall=1),
      " [95% CI: ", format(summary(model_A)$conf.int[1, 3], digits=1, nsmall=1), "-", 
      format(summary(model_A)$conf.int[1, 4], digits=1, nsmall=1), "]"))
## [1] "HR = 15.7 [95% CI: 15.5-15.9]"
Model B (further adjusted for mental health comorbidity)
# Model B adjusts also for number of disorders, and it needs to be counted.
data_survival$num_others <- rowSums(data_survival[, c("adjust_D30_by_prev_D00", "adjust_D30_by_prev_D10", "adjust_D30_by_prev_D20", "adjust_D30_by_prev_D51", "adjust_D30_by_prev_D61", "adjust_D30_by_prev_D70","adjust_D30_by_prev_D81", "adjust_D30_by_prev_D91")])
data_survival$num_others[data_survival$num_others>4] <- 4
data_survival$num_others[data_survival$num_others==1] <- 0

model_B <- coxph(Surv(tstart, tstop, endpoint) ~ expo_D30 + kqn + entryD
                 + adjust_D30_by_prev_D00 + adjust_D30_by_prev_D10 + adjust_D30_by_prev_D20
                 + adjust_D30_by_prev_D51 + adjust_D30_by_prev_D61 + adjust_D30_by_prev_D70
                 + adjust_D30_by_prev_D81 + adjust_D30_by_prev_D91 + as.factor(num_others), data = data_survival)
print(paste0("HR = ", format(exp(summary(model_B)$coefficients[1, 1]), digits=1, nsmall=1),
             " [95% CI: ", format(summary(model_B)$conf.int[1, 3], digits=1, nsmall=1), "-", 
             format(summary(model_B)$conf.int[1, 4], digits=1, nsmall=1), "]"))
## [1] "HR = 10.5 [95% CI: 10.3-10.6]"
Lagged HR - Model A
times <- c("0-6 months", "6-12 months", "1-2 years", "2-5 years", "5-10 years", "10-15 years", "15+ years")

model_A_lagged <- coxph(Surv(tstart, tstop, endpoint) ~ as.factor(expo_time_after_D30) + kqn + entryD, data = data_survival)

for(i in 1:length(times)) {
  print(paste0(times[i], ": HR = ", format(exp(summary(model_A_lagged)$coefficients[i, 1]), digits=1, nsmall=1), " [95% CI: ", format(summary(model_A_lagged)$conf.int[i, 3], digits=1, nsmall=1), "-", format(summary(model_A_lagged)$conf.int[i, 4], digits=1, nsmall=1), "]"))
}
## [1] "0-6 months: HR = 117.9 [95% CI: 115.9-119.8]"
## [1] "6-12 months: HR = 18.5 [95% CI: 17.7-19.2]"
## [1] "1-2 years: HR = 13.5 [95% CI: 13.0-14.0]"
## [1] "2-5 years: HR = 9.4 [95% CI: 9.2- 9.7]"
## [1] "5-10 years: HR = 7.6 [95% CI: 7.4-7.9]"
## [1] "10-15 years: HR = 7.1 [95% CI: 6.8-7.4]"
## [1] "15+ years: HR = 6.4 [95% CI: 6.2-6.7]"
Lagged HR - Model B
model_B_lagged <- coxph(Surv(tstart, tstop, endpoint) ~ as.factor(expo_time_after_D30) + kqn + entryD
                 + adjust_D30_by_prev_D00 + adjust_D30_by_prev_D10 + adjust_D30_by_prev_D20
                 + adjust_D30_by_prev_D51 + adjust_D30_by_prev_D61 + adjust_D30_by_prev_D70
                 + adjust_D30_by_prev_D81 + adjust_D30_by_prev_D91 + as.factor(num_others), data = data_survival)

for(i in 1:length(times)) {
  print(paste0(times[i], ": HR = ", format(exp(summary(model_B_lagged)$coefficients[i, 1]), digits=1, nsmall=1), " [95% CI: ", format(summary(model_B_lagged)$conf.int[i, 3], digits=1, nsmall=1), "-", format(summary(model_B_lagged)$conf.int[i, 4], digits=1, nsmall=1), "]"))
}
## [1] "0-6 months: HR = 77.9 [95% CI: 76.5-79.2]"
## [1] "6-12 months: HR = 12.2 [95% CI: 11.7-12.7]"
## [1] "1-2 years: HR = 8.9 [95% CI: 8.6-9.2]"
## [1] "2-5 years: HR = 6.2 [95% CI: 6.0-6.4]"
## [1] "5-10 years: HR = 5.0 [95% CI: 4.9-5.2]"
## [1] "10-15 years: HR = 4.6 [95% CI: 4.4-4.8]"
## [1] "15+ years: HR = 4.3 [95% CI: 4.1-4.4]"

2. Sex-specific hazard ratios

Model A (adjusted for sex and calendar time)
model_A <- coxph(Surv(tstart, tstop, endpoint) ~ expo_D30*kqn + entryD, data = data_survival)
matrix_contrasts <-t(
    matrix(c(
      1,0,0,0,
      1,0,0,1
    ), ncol=2))
model_A_sex <- glht(model_A, linfct=matrix_contrasts)
HR <- data.frame(format(exp(confint(model_A_sex)$confint)[1:2, ], digits=1, nsmall=1))
HR[1, "sex"] <- "Females"
HR[2, "sex"] <- "Males"
HR
##   Estimate  lwr  upr     sex
## 1     13.8 13.6 14.1 Females
## 2     19.9 19.4 20.3   Males
Model B (further adjusted for mental health comorbidity)
model_B <- coxph(Surv(tstart, tstop, endpoint) ~ expo_D30*kqn + entryD
                 + adjust_D30_by_prev_D00 + adjust_D30_by_prev_D10 + adjust_D30_by_prev_D20
                 + adjust_D30_by_prev_D51 + adjust_D30_by_prev_D61 + adjust_D30_by_prev_D70
                 + adjust_D30_by_prev_D81 + adjust_D30_by_prev_D91 + as.factor(num_others), data = data_survival)

matrix_contrasts <-t(
  matrix(c(
    1,rep(0, 13), 0,
    1,rep(0, 13), 1
  ), ncol=2))

model_B_sex <- glht(model_B, linfct=matrix_contrasts)
HR <- data.frame(format(exp(confint(model_B_sex)$confint)[1:2, ], digits=1, nsmall=1))
HR[1, "sex"] <- "Females"
HR[2, "sex"] <- "Males"
HR
##   Estimate  lwr  upr     sex
## 1      9.7  9.5  9.9 Females
## 2     12.0 11.7 12.3   Males
Lagged HR - Model A
model_A_lagged <- coxph(Surv(tstart, tstop, endpoint) ~ as.factor(expo_time_after_D30)*kqn + entryD, data = data_survival)
matrix_contrasts <-t(
  matrix(c(
    1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,
    1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,
    0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,
    0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,
    0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,
    0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,
    0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,
    0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1    
  ), ncol=14))

model_A_lagged_sex <- glht(model_A_lagged, linfct=matrix_contrasts)
HR <- data.frame(format(exp(confint(model_A_lagged_sex)$confint)[1:14, ], digits=1, nsmall=1))
HR[1:7, "sex"] <- "Females"
HR[8:14, "sex"] <- "Males"
HR$time_after_exposure <- c(times, times)
HR
##    Estimate   lwr   upr     sex time_after_exposure
## 1     100.6  97.5 103.8 Females          0-6 months
## 2      16.9  15.7  18.2 Females         6-12 months
## 3      12.3  11.5  13.1 Females           1-2 years
## 4       8.4   7.9   8.8 Females           2-5 years
## 5       6.9   6.5   7.3 Females          5-10 years
## 6       6.5   6.0   7.1 Females         10-15 years
## 7       5.9   5.5   6.4 Females           15+ years
## 8     156.4 150.4 162.5   Males          0-6 months
## 9      21.7  19.6  24.1   Males         6-12 months
## 10     16.0  14.7  17.5   Males           1-2 years
## 11     11.7  10.9  12.5   Males           2-5 years
## 12      9.2   8.5   9.9   Males          5-10 years
## 13      8.4   7.5   9.3   Males         10-15 years
## 14      7.6   6.9   8.3   Males           15+ years
Lagged HR - Model B
model_B_lagged <- coxph(Surv(tstart, tstop, endpoint) ~ as.factor(expo_time_after_D30)*kqn + entryD
                 + adjust_D30_by_prev_D00 + adjust_D30_by_prev_D10 + adjust_D30_by_prev_D20
                 + adjust_D30_by_prev_D51 + adjust_D30_by_prev_D61 + adjust_D30_by_prev_D70
                 + adjust_D30_by_prev_D81 + adjust_D30_by_prev_D91 + as.factor(num_others), data = data_survival)
matrix_contrasts <-t(
  matrix(c(
    1,0,0,0,0,0,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
    0,1,0,0,0,0,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
    0,0,1,0,0,0,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
    0,0,0,1,0,0,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
    0,0,0,0,1,0,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
    0,0,0,0,0,1,0,0,rep(0, 11),0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,1,0,rep(0, 11),0,0,0,0,0,0,0,0,
    1,0,0,0,0,0,0,0,rep(0, 11),0,1,0,0,0,0,0,0,
    0,1,0,0,0,0,0,0,rep(0, 11),0,0,1,0,0,0,0,0,
    0,0,1,0,0,0,0,0,rep(0, 11),0,0,0,1,0,0,0,0,
    0,0,0,1,0,0,0,0,rep(0, 11),0,0,0,0,1,0,0,0,
    0,0,0,0,1,0,0,0,rep(0, 11),0,0,0,0,0,1,0,0,
    0,0,0,0,0,1,0,0,rep(0, 11),0,0,0,0,0,0,1,0,
    0,0,0,0,0,0,1,0,rep(0, 11),0,0,0,0,0,0,0,1    
  ), ncol=14))

model_B_lagged_sex <- glht(model_B_lagged, linfct=matrix_contrasts)
HR <- data.frame(format(exp(confint(model_B_lagged_sex)$confint)[1:14, ], digits=1, nsmall=1))
HR[1:7, "sex"] <- "Females"
HR[8:14, "sex"] <- "Males"
HR$time_after_exposure <- c(times, times)
HR
##    Estimate  lwr  upr     sex time_after_exposure
## 1      70.4 68.1 72.6 Females          0-6 months
## 2      11.8 10.9 12.7 Females         6-12 months
## 3       8.6  8.0  9.2 Females           1-2 years
## 4       5.8  5.5  6.2 Females           2-5 years
## 5       4.8  4.6  5.1 Females          5-10 years
## 6       4.4  4.1  4.8 Females         10-15 years
## 7       4.1  3.8  4.4 Females           15+ years
## 8      92.7 89.1 96.4   Males          0-6 months
## 9      12.8 11.6 14.2   Males         6-12 months
## 10      9.5  8.7 10.4   Males           1-2 years
## 11      6.9  6.5  7.4   Males           2-5 years
## 12      5.5  5.1  5.9   Males          5-10 years
## 13      4.9  4.4  5.5   Males         10-15 years
## 14      4.6  4.2  5.1   Males           15+ years

7. Absolute risks

Load the data

Load the datasets prepared in steps 1-4.

rm(list = ls())
load(file="SHARED_disease_D41.RData")

Prepare data for a multi-state model

In the Cox proportional hazards models (used to estimate hazard ratios), those who died or emigrated/disappeared were censored at date of death or emigration. The absolute risks can be biased if we do not take into consideration these competing risks events. For this reason, we give a new outcome to those events: 2 for disappearance or emigration, and 3 for death.

data_survival[data_survival$statdA==data_survival$tstop & data_survival$fail %in% c(70, 80), "endpoint"] <- 2
data_survival[data_survival$statdA==data_survival$tstop & data_survival$fail==90, "endpoint"] <- 3

For multi-state models, it is necessary to define a matrix of transitions. In this example, every person starts being free of the later-disorder (Neurotic Disorders), and they are labelled “healthy”. During follow-up, they can change the status from “healthy” to “diseased”, “emigrated”, or “dead”. No other transitions (e.g. from “emigrated” to “diseased”) are allowed.

tmat <- transMat(x=list(c(2,3,4), c(), c(), c()), names=c("healthy", "disease", "emi", "death"))
tmat
##          to
## from      healthy disease emi death
##   healthy      NA       1   2     3
##   disease      NA      NA  NA    NA
##   emi          NA      NA  NA    NA
##   death        NA      NA  NA    NA

There are 4 states, but only 3 transitions. Each observation in the dataset has to be repeated 3 times. In each repetition, one different possible outcome is considered, and the other two treated as censoring.

CRcoxdata <- combine(data_survival, data_survival, data_survival, names = c(1, 2, 3))

A new outcome d can now be created, where each type of outcome is considering censoring in the other two repetitions.

CRcoxdata$d <- ((CRcoxdata$endpoint == 1) * (CRcoxdata$source == 1)) +
  ((CRcoxdata$endpoint == 2) * (CRcoxdata$source == 2)) +
  ((CRcoxdata$endpoint == 3) * (CRcoxdata$source == 3))

CRcoxdata$h <- as.numeric(CRcoxdata$source)

The invented data used in steps 1-4 can be used to illustrate the data structure. Note that the 6 observations for the invented person have been repeated 3 times.

This table includes invented data (not real person)
pnr D30A tstart tstop endpoint source d h
invented_person 25.50681 14.58900 20.37804 0 1 0 1
invented_person 25.50681 20.37804 25.50681 0 1 0 1
invented_person 25.50681 25.50681 26.00681 0 1 0 1
invented_person 25.50681 26.00681 26.50681 0 1 0 1
invented_person 25.50681 26.50681 27.50681 0 1 0 1
invented_person 25.50681 27.50681 29.44927 1 1 1 1
invented_person 25.50681 14.58900 20.37804 0 2 0 2
invented_person 25.50681 20.37804 25.50681 0 2 0 2
invented_person 25.50681 25.50681 26.00681 0 2 0 2
invented_person 25.50681 26.00681 26.50681 0 2 0 2
invented_person 25.50681 26.50681 27.50681 0 2 0 2
invented_person 25.50681 27.50681 29.44927 1 2 0 2
invented_person 25.50681 14.58900 20.37804 0 3 0 3
invented_person 25.50681 20.37804 25.50681 0 3 0 3
invented_person 25.50681 25.50681 26.00681 0 3 0 3
invented_person 25.50681 26.00681 26.50681 0 3 0 3
invented_person 25.50681 26.50681 27.50681 0 3 0 3
invented_person 25.50681 27.50681 29.44927 1 3 0 3

Focus only on persons with Mood Disorders (prior-disorder)

In these analyses, we aim to estimate the proportion of individuals with a diagnosis of a prior-disorder (Mood Disorders) who later develop a later-disorder (Neurotic Disorders). For this reason, we can keep only those with a diagnosis of Mood Disorders, and start the follow-up time at onset of the disorder.

CRcoxdata_expo <- CRcoxdata[!is.na(CRcoxdata[, "D30A"]),]
CRcoxdata_expo[, "expo_Tstart"] <- CRcoxdata_expo[, "tstart"] - CRcoxdata_expo[, "D30A"]
CRcoxdata_expo[, "expo_Tstop"] <- CRcoxdata_expo[, "tstop"] - CRcoxdata_expo[, "D30A"]
CRcoxdata_expo <- CRcoxdata_expo[CRcoxdata_expo$expo_Tstart >= 0,]
This table includes invented data (not real person)
pnr D30A tstart tstop expo_Tstart expo_Tstop endpoint source d h
invented_person 25.50681 25.50681 26.00681 0.0 0.500000 0 1 0 1
invented_person 25.50681 26.00681 26.50681 0.5 1.000000 0 1 0 1
invented_person 25.50681 26.50681 27.50681 1.0 2.000000 0 1 0 1
invented_person 25.50681 27.50681 29.44927 2.0 3.942466 1 1 1 1
invented_person 25.50681 25.50681 26.00681 0.0 0.500000 0 2 0 2
invented_person 25.50681 26.00681 26.50681 0.5 1.000000 0 2 0 2
invented_person 25.50681 26.50681 27.50681 1.0 2.000000 0 2 0 2
invented_person 25.50681 27.50681 29.44927 2.0 3.942466 1 2 0 2
invented_person 25.50681 25.50681 26.00681 0.0 0.500000 0 3 0 3
invented_person 25.50681 26.00681 26.50681 0.5 1.000000 0 3 0 3
invented_person 25.50681 26.50681 27.50681 1.0 2.000000 0 3 0 3
invented_person 25.50681 27.50681 29.44927 2.0 3.942466 1 3 0 3

The new variables expo_Tstart and expo_Tstop focus on time since onset of Mood Disorders, compared to the old variables tstart and tstop that focus on time since birth (age).

Each replication (h=1, h=2, and h=3) is linked to one specific transition (1–>2, 1–>3, 1–>4). Two new variables from and to define these transitions.

CRcoxdata_expo$from <- 1
CRcoxdata_expo$to <- CRcoxdata_expo$h + 1

In order to investigate specific absolute risks depending on age at diagnosis of Mood Disorders, different age groups have to be identified.

CRcoxdata_expo$age_group <- cut(CRcoxdata_expo[, "D30A"], breaks=c(0, 20, 40, 60, 80, 200), include.lowest=TRUE, right = FALSE)
CRcoxdata_expo$age_group <- droplevels(CRcoxdata_expo$age_group)

Finally, the dataset was splitted many times (because of mental health comorbidity, time since exposure, etc.). This level of detail is not necessary for this analysis, so the dataset can be minimized.

CRcoxdata_expo_min <- data.table(CRcoxdata_expo[, c("pnr", "kqn", "D30A", "age_group", "expo_Tstart", "expo_Tstop", "from", "to", "h", "d")])
CRcoxdata_expo_min <- CRcoxdata_expo_min[, list(tstart=min(expo_Tstart), tstop=max(expo_Tstop), d=max(d)), by=c("pnr", "kqn", "D30A", "age_group", "from", "to", "h")]
CRcoxdata_expo_min <- data.frame(CRcoxdata_expo_min)
This table includes invented data (not real person)
pnr D30A age_group kqn tstart tstop from to d
invented_person 25.50681 [20,40) K 0 3.942466 1 2 1
invented_person 25.50681 [20,40) K 0 3.942466 1 3 0
invented_person 25.50681 [20,40) K 0 3.942466 1 4 0

The invented person is followed from onset of Mood Disorders (at age 25.5 years) during 3.94 years, moment in which this person is diagnosed with Neurotic Disorders (and censored for emigration or death).

Count number of persons

In this step, the number of persons with Mood Disorders (exposed) is calculated, together with how many develop subsequent Neurotic Disorders (d).

count_data <- CRcoxdata_expo_min[CRcoxdata_expo_min$h==1, c("pnr", "kqn", "age_group", "tstop", "d")]
count_data <- count_data[!duplicated(count_data), ]
count_data$number <- 1
numbers <- aggregate(cbind(number, d) ~ kqn + age_group, data = count_data, FUN=sum, na.rm=TRUE)

# According to Danish regulations, we cannot report cells with less than 3 cases
numbers_print <- numbers
numbers_print[numbers_print$number<3, "number"] <- NA
numbers_print[numbers_print$d<3, "d"] <- NA

kable(numbers_print)
kqn age_group number d
K [0,20) 11474 4472
M [0,20) 5398 1649
K [20,40) 36526 9990
M [20,40) 23077 6003
K [40,60) 28614 5470
M [40,60) 21018 4172
K [60,80) 17207 1911
M [60,80) 10121 1140
K [80,200] 8334 358
M [80,200] 3212 157

Estimate cumulative incidence proportion (CIP) for each sex and age group

for(age_group in c("all", levels(CRcoxdata_expo_min$age_group))) { 
  # First we select the observations for the age group of interest
  if(age_group=="all") { 
    multi <- CRcoxdata_expo_min
  } else {
    multi <- CRcoxdata_expo_min[CRcoxdata_expo_min$age_group==age_group, ]
  }  
  
  for(sex in c("all", "M", "K")) {
    # Second we select the observations for the sex of interest
    if(sex=="all") {
      multi2 <- multi
    } else {
      multi2 <- multi[(multi[, "kqn"]==sex), ]
    }
    
    # The CIP is estimated only if there is at least one person in the specific sex- and age-group
    if(dim(multi2)[1]>0) {
      multi2$trans <- multi2$h
      coxres <- coxph(Surv(tstart, tstop, d) ~ strata(trans), data=multi2, method="breslow")
      who <- multi2[c(1:3), ]
      who$strata<-1:3
      msf0 <- msfit(coxres, who, trans=tmat)
      pt0 <- probtrans(msf0, predt=0)
      CIP_aux <- pt0[[1]]
      CIP_aux$sex <- sex
      CIP_aux$age_group <- age_group
      
      if (age_group=="all" & sex=="all") {
        CIP <- CIP_aux
      } else {
        CIP <- rbind(CIP, CIP_aux)
      }
    }
  }    
}

head(CIP)
##          time   pstate1     pstate2 pstate3      pstate4          se1
## 1 0.000000000 1.0000000 0.000000000       0 0.000000e+00 0.0000000000
## 2 0.002732240 0.9983728 0.001573522       0 5.364279e-05 0.0001204172
## 3 0.002732240 0.9980241 0.001913317       0 6.258478e-05 0.0001326891
## 4 0.002737851 0.9916574 0.008280012       0 6.258478e-05 0.0002712736
## 5 0.002737851 0.9655200 0.034417440       0 6.258478e-05 0.0005397441
## 6 0.002737851 0.9612815 0.038655942       0 6.258478e-05 0.0005712619
##            se2 se3          se4 sex age_group
## 1 0.0000000000   0 0.000000e+00 all       all
## 2 0.0001184157   0 2.186394e-05 all       all
## 3 0.0001305720   0 2.362047e-05 all       all
## 4 0.0002702575   0 2.362047e-05 all       all
## 5 0.0005392612   0 2.362047e-05 all       all
## 6 0.0005708098   0 2.362047e-05 all       all

Keep only necessary information

There is information on too many data points and for all possible transitions. In this step, we will keep only one estimate per month (aprox 0.08 years) and only transition 1 (going from “Healthy” to “Diseased”).

CIP$time_since_dx <- 0.08 * ceiling(as.numeric(CIP$time)/0.08)
CIP <- ddply(CIP, .(sex, age_group, time_since_dx), function(x) tail(x, 1))
CIP$cip <- 100*CIP$pstate2
CIP[, "cip_low"] <- pmax(0, 100*exp(log(CIP[, "pstate2"]) - qnorm(0.975) * (CIP[, "se2"] / CIP[, "pstate2"])))
CIP[, "cip_high"] <- pmin(100, 100*exp(log(CIP[, "pstate2"]) + qnorm(0.975) * (CIP[, "se2"] / CIP[, "pstate2"])))
CIP <- CIP[, c("sex", "age_group", "time", "cip", "cip_low", "cip_high")]
head(CIP)
##   sex age_group       time      cip  cip_low cip_high
## 1 all    [0,20) 0.00000000  0.00000      NaN      NaN
## 2 all    [0,20) 0.07945205 14.09683 13.55720 14.65794
## 3 all    [0,20) 0.15890411 16.30146 15.72631 16.89764
## 4 all    [0,20) 0.23835616 17.64508 17.05019 18.26074
## 5 all    [0,20) 0.31780822 18.62727 18.01874 19.25635
## 6 all    [0,20) 0.40000000 19.34755 18.72941 19.98608

We are interested on the first 15 years after the prior-disorder. We will delete the observations after 15 years, and extend the last observation until exact 15 years after the prior-disorder.

CIP <- CIP[CIP$time<=15, ]
CIP_last <- ddply(CIP, .(sex, age_group), function(x) x[nrow(x), ])
CIP_last$time <- 15
CIP <- rbind(CIP, CIP_last)

Merge the number of cases and the CIP estimates

numbers <- numbers[!duplicated(numbers), ]
numbers2 <- aggregate(cbind(number, d)~kqn, data=numbers, FUN=sum)
numbers2$age_group <- "all"
numbers3 <- aggregate(cbind(number, d)~age_group, data=numbers, FUN=sum)
numbers3$kqn <- "all"
numbers4 <- aggregate(cbind(number, d)~1, data=numbers, FUN=sum)
numbers4$age_group <- "all"
numbers4$kqn <- "all"
numbers <- rbind(numbers, numbers2, numbers3, numbers4)
colnames(numbers)[colnames(numbers)=="kqn"] <- "sex"

CIP <- merge(CIP, numbers, all.x=T, all.y=F)
colnames(CIP)[colnames(CIP)=="d"]<-"cases_outcome"
colnames(CIP)[colnames(CIP)=="number"]<-"number_exposed"

# According to Danish regulations, we cannot report cells with less than 3 cases
CIP[CIP$cases_outcome<3, "cip"] <- NA
CIP[CIP$cases_outcome<3, "cip_low"] <- NA
CIP[CIP$cases_outcome<3, "cip_high"] <- NA
CIP[CIP$cases_outcome<3, "cases_outcome"] <- NA
CIP[CIP$number_exposed<3, "number_exposed"] <- NA

head(CIP)
##   sex age_group       time      cip  cip_low cip_high number_exposed
## 1 all    [0,20) 0.00000000  0.00000      NaN      NaN          16872
## 2 all    [0,20) 0.07945205 14.09683 13.55720 14.65794          16872
## 3 all    [0,20) 0.15890411 16.30146 15.72631 16.89764          16872
## 4 all    [0,20) 0.23835616 17.64508 17.05019 18.26074          16872
## 5 all    [0,20) 0.31780822 18.62727 18.01874 19.25635          16872
## 6 all    [0,20) 0.40000000 19.34755 18.72941 19.98608          16872
##   cases_outcome
## 1          6121
## 2          6121
## 3          6121
## 4          6121
## 5          6121
## 6          6121