1.1 Survival data analysis
Survival data, or more generally “time-to-event” data, concern the time from a given origin to the occurrence of an event of interest, for example the time from the diagnosis of a certain disease to the death of the patient. While it is common to speak about survival time, the event considered is not necessarily death. We could, for example, be interested in the time to cancer relapse in an oncology study, the time to rejection of the transplanted organ in a transplantation study, the time to pain relief after surgery in an analgesic study, or the time to the first pregnancy. Survival data are not restricted to medicine: one can also think of the time to first employment after graduation, the time to the first claim on an insurance policy, the time to breakdown of an engine, etc. As these examples show, the event of interest can be either negative (death, rejection, breakdown, etc.) or positive (pain relief, first employment, etc.). While the term survival analysis is commonly used in the biomedical area, the terms duration analysis and reliability analysis are more common in the human sciences and in engineering, respectively.
Survival data have two main distinguishing features. First, the time-to-event, often denoted T, is a positive continuous random variable. Second, survival data may be subject to censoring and truncation, which lead to incomplete data. Censoring means that, for certain individuals under study, the time-to-event of interest is not known precisely. For example, a patient may still be alive at the time of the last follow-up visit in a clinical study. In that case, we know only that the true survival time is longer than the observed one, and the survival time is said to be right-censored at the date of the last available information. Although right-censoring is usually considered to be the most common form of censoring, one also speaks about left-censoring and interval-censoring; these concepts are briefly reviewed in Section 1.2.1, together with truncation. While “some” information is available for censored observations, truncation occurs when some of the relevant subjects do not appear in the data at all. Unless specified otherwise, we will concentrate in this book on right-censoring and provide further references for left- and interval-censoring and/or truncation whenever available.
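To make right-censoring concrete, the following minimal Python sketch simulates it under assumptions that are not part of the text: the true time-to-event T and a censoring time C are drawn independently from exponential distributions, and only the follow-up time min(T, C) and an event indicator are recorded. The function name `simulate_right_censored` and the two rate parameters are hypothetical and chosen purely for illustration.

```python
import random


def simulate_right_censored(n, event_rate=0.1, censor_rate=0.05, seed=0):
    """Simulate n right-censored observations (illustrative assumptions only).

    T (true time-to-event) and C (censoring time) are assumed independent
    and exponentially distributed; this is a modelling choice made for the
    sketch, not something prescribed by the text.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.expovariate(event_rate)   # true time-to-event T
        c = rng.expovariate(censor_rate)  # censoring time C (e.g. end of follow-up)
        observed = min(t, c)              # what the analyst actually sees
        event = t <= c                    # True: event observed; False: right-censored
        data.append((observed, event, t))
    return data


sample = simulate_right_censored(1000)
n_censored = sum(1 for _, event, _ in sample if not event)
print(f"{n_censored} of {len(sample)} observations are right-censored")
```

For every censored record, the recorded time is strictly smaller than the true time T, which is exactly the sense in which the data are incomplete. In real data T is of course unknown for censored subjects; it is kept here only so that this relation can be checked.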
Time-to-event or survival data analysis has been the subject of numerous textbooks, amongst which are [13, 78, 91, 181, 186, 372]. For a more applied perspective, we can also mention [8, 193] and [260] amongst many others, or [249, 282], which focus on the design and analysis of clinical trials with a time-to-event endpoint. These books mainly consider what can be called classical or standard survival data. Such data are characterized by a single event of interest (e.g. death from any cause). Furthermore, one assumes that this event would be observed for all experimental units if the follow-up were long enough. One assumes further that all experimental units are independent and that the population is homogeneous given the observed covariates.
These classical survival data analysis techniques encompass estimation, hypothesis tests, and regression models. Such regression models are useful to analyse simultaneously the impact of several factors on the time-to-event under investigation. For example, in the context of a clinical trial, such regression models are often used to estimate the treatment effect on time to death while adjusting for important prognostic factors such as the stage of disease at randomization. These models are particularly useful in the context of...