Survival Analysis with Python
eBook - ePub

Survival Analysis with Python

Avishek Nag

  1. 84 pages
  2. English
  3. ePUB

About the book

Survival analysis uses statistics to calculate time to failure. Survival Analysis with Python takes a fresh look at this complex subject by explaining how to use the Python programming language to perform this type of analysis. As the subject itself is very mathematical and full of expressions and formulations, the book provides detailed explanations and examines practical implications. The book begins with an overview of the concepts underpinning statistical survival analysis. It then delves into:



  • Parametric models with coverage of
    • Concept of maximum likelihood estimate (MLE) of a probability distribution parameter
    • MLE of the survival function
    • Common probability distributions and their analysis
    • Analysis of exponential distribution as a survival function
    • Analysis of Weibull distribution as a survival function
    • Derivation of Gumbel distribution as a survival function from Weibull
  • Non-parametric models including
    • Kaplan–Meier (KM) estimator, a derivation of expression using MLE
    • Fitting KM estimator with an example dataset, Python code and plotting curves
    • Greenwood's formula and its derivation
  • Models with covariates explaining
    • The concept of time shift and the accelerated failure time (AFT) model
    • Weibull-AFT model and derivation of parameters by MLE
    • Proportional Hazard (PH) model
    • Cox-PH model and Breslow's method
    • Significance of covariates
    • Selection of covariates

The Python lifelines library is used for coding examples. By mapping theory to practical examples featuring datasets, this book is a hands-on tutorial as well as a handy reference.


Information

Year
2021
ISBN
9781000520699

Chapter 1 Introduction

DOI: 10.1201/9781003255499-1
We will start our discussion with a few events that can be observed: the death of a person due to a disease, the attrition of an employee from an organization and the occurrence of a natural calamity (an earthquake or a flood). These examples come from completely different domains, but they have one thing in common: time or, more precisely, the time until an event occurs. Time is crucial in all these situations. If we know beforehand that a certain event is likely to occur at a specific time, then many lives and resources can be saved. Survival analysis is defined as a collection of statistical techniques for longitudinal data analysis in which time is a major factor. It is used in biology, medicine, engineering, marketing, and the social and behavioral sciences. Survival analysis is also sometimes called reliability theory in operations research and engineering. It is a complex subject, and the reader needs expertise in probability, statistics, calculus and optimization to grasp it fully.
In this chapter, we will explore some basic concepts of survival analysis, its nomenclature and sample datasets.

Concept of Failure Time

We have already talked about events. In general, survival analysis deals with events related to failure. A failure can, of course, occur one or more times for any subject. For the topics discussed in this book, it is assumed that failure occurs only once for a subject. We will use the term subject throughout this book to represent the entity that passes through some phases and to which the failure (or the event) is attached. A subject may be a person, a machine, a river or even an entire geographic region. There are numerous use cases where survival analysis can be applied to estimate the chances of an event occurring. Some of them are:
  • Death of a person by any disease
  • Suicide
  • Failure of machine tools
  • Attrition of employees from an organization
  • Divorce
  • Occurrence of any natural catastrophe (flood, earthquake, volcanic eruption, etc.)
In this book, we will mostly discuss death-by-disease use cases, as survival analysis is most widely used in them. The death-by-disease use case is analyzed mostly in drug development, where survival analysis plays a crucial role in identifying the right drug through a comparative study of several options.
We are talking about time a lot. But what does it signify? By time, we mean the years, months, weeks or days from the beginning of the analysis of the data until an event (such as a death, the exit of an employee or an earthquake) occurs. As said earlier, the event is also termed a failure. So, the time taken until failure is referred to as the failure time or survival time. Time need not always be a physical unit; there are cases where it can be used as a logical indicator. The following points must be taken care of before defining a time scale:
  • Origin of the time must be unambiguously defined.
  • The scale for measuring the time difference must be defined.
  • Definition of failure must be clear.
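As a minimal sketch of these three points, the following Python snippet computes a failure time once an origin, a scale and a failure date are fixed. The subject names and dates are hypothetical and serve only as illustration; the scale is assumed to be calendar days.

```python
from datetime import date

# Hypothetical records: (subject, time origin, date of failure).
# These values are illustrative and do not come from any dataset in the book.
records = [
    ("subject_a", date(2020, 1, 1), date(2020, 3, 1)),
    ("subject_b", date(2020, 1, 15), date(2021, 1, 14)),
]


def failure_time_days(origin: date, failure: date) -> int:
    """Return the failure time in days, measured from the origin to the failure."""
    if failure < origin:
        # A failure before the origin signals an ambiguous or wrong origin.
        raise ValueError("failure cannot precede the time origin")
    return (failure - origin).days


# Failure (survival) time per subject on the chosen scale (days).
failure_times = {name: failure_time_days(o, f) for name, o, f in records}
print(failure_times)
```

Changing the scale (for example to weeks or months) changes only the conversion in `failure_time_days`; the origin and the definition of failure stay the same.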

Concept of Survival

When we speak about survival, we mean probabilities. The probability that an event does not occur until a given time is the survival probability. In other words, the survival probability is the probability that the event occurs only after a certain time. For example, when we say that the survival probability of a heart patient at age 71 is 0.23, it means that there is a probability of 0.23 that the patient will survive beyond age 71. Age is the time scale here. Similarly, there could be a probability of 0.40 that he/she will survive beyond age 50. The reason is clear: at a younger age, the chances of collapsing from a heart attack are lower, and thus the survival probability is higher. So, we can have a survival probability distribution over the random variable time (here age) like the one below:
Table 1.1 A Sample Survival Probability Distribution
Time (Age)             40    45    50    60    65    70
Survival Probability   0.51  0.42  0.38  0.36  0.28  0.24
One of the purposes of survival analysis is to find this probability distribution. Many other domain-specific statistical inferences can also be drawn from it. It can be observed that the survival probability decreases over time. This is a very important feature of the distribution, and we will discuss it in greater detail in Chapter 2. As with the heart patient use case, the same analysis can be done for employee attrition in an organization. The purpose there is to find the survival probability distribution of an employee's exit at various times after he/she joins. The interesting part is that the term survival is very generic here. It need not always mean saving yourself from something, nor is it always related to disease, patients or healthcare. Survival means the non-occurrence of an event until some time. The event could be any one from the list discussed in the section ‘Concept of Failure Time’ or something else.
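The decreasing behavior of the sample distribution can be checked with a short sketch. The values below are copied from Table 1.1; the helper name `is_non_increasing` is an assumption introduced here for illustration. The sketch also relies on the standard reading that the difference between two survival probabilities is the probability of the event occurring between the two ages.

```python
# Values copied from Table 1.1 (ages and their survival probabilities).
ages = [40, 45, 50, 60, 65, 70]
survival = [0.51, 0.42, 0.38, 0.36, 0.28, 0.24]


def is_non_increasing(values):
    """Return True if no value is larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


# Survival probability should never go up as time (age) advances.
print(is_non_increasing(survival))

# Drop in survival probability between consecutive ages, i.e. the
# probability that the event occurs within each age interval.
drops = [round(a - b, 2) for a, b in zip(survival, survival[1:])]
for (start, end), drop in zip(zip(ages, ages[1:]), drops):
    print(f"{start}-{end}: {drop}")
```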

Censoring

Most survival analyses must consider a very important analytical problem called censoring. It is caused by not observing some subjects fo...
