Survival Analysis with Python
eBook - ePub

Avishek Nag

  1. 84 pages
  2. English

About This Book

Survival analysis uses statistics to calculate time to failure. Survival Analysis with Python takes a fresh look at this complex subject by explaining how to use the Python programming language to perform this type of analysis. As the subject itself is very mathematical and full of expressions and formulations, the book provides detailed explanations and examines practical implications. The book begins with an overview of the concepts underpinning statistical survival analysis. It then delves into



  • Parametric models with coverage of
    • Concept of maximum likelihood estimate (MLE) of a probability distribution parameter
    • MLE of the survival function
    • Common probability distributions and their analysis
    • Analysis of exponential distribution as a survival function
    • Analysis of Weibull distribution as a survival function
    • Derivation of Gumbel distribution as a survival function from Weibull
  • Non-parametric models including
    • Kaplan–Meier (KM) estimator, a derivation of expression using MLE
    • Fitting KM estimator with an example dataset, Python code and plotting curves
    • Greenwood's formula and its derivation
  • Models with covariates explaining
    • The concept of time shift and the accelerated failure time (AFT) model
    • Weibull-AFT model and derivation of parameters by MLE
    • Proportional Hazard (PH) model
    • Cox-PH model and Breslow's method
    • Significance of covariates
    • Selection of covariates

The Python lifelines library is used for coding examples. By mapping theory to practical examples featuring datasets, this book is a hands-on tutorial as well as a handy reference.


Information

Year: 2021
ISBN: 9781000520699

Chapter 1 Introduction

DOI: 10.1201/9781003255499-1
We will start our discussion with a few events that can be observed: the death of a person due to a disease, the attrition of an employee from an organization and the occurrence of a natural calamity (an earthquake or a flood). These examples come from completely different domains, but they have one thing in common: time or, more precisely, the time until an event occurs. Time is crucial in all these situations. If we know beforehand that a certain event is likely to occur at a specific time, many lives and resources can be saved. Survival analysis is defined as a collection of statistical techniques for longitudinal data analysis in which time is a major factor. It is used in biology, medicine, engineering, marketing, and the social and behavioral sciences. In operations research and engineering, survival analysis is sometimes called reliability theory. It is a complex subject, and the reader needs a working knowledge of probability, statistics, calculus and optimization to grasp it fully.
In this chapter, we will explore some basic concepts of survival analysis, its nomenclature and sample datasets.

Concept of Failure Time

We have already talked about events. In general, survival analysis deals with events related to failure. A failure can, of course, occur one or more times for any subject. For the topics discussed in this book, it is assumed that failure occurs only once for a subject. We will use the term subject throughout this book to represent the entity that goes through some phases and to which the failure (or the event) is attached. A subject may be a person, a machine, a river or even an entire geographic region. There are numerous use cases where survival analysis can be applied to estimate the chance that an event occurs. Some of them are:
  • Death of a person by any disease
  • Suicide
  • Failure of machine tools
  • Attrition of employees from organization
  • Divorce
  • Occurrence of any natural catastrophe (flood, earthquake, volcanic eruption, etc.)
In this book, we will mostly discuss death-by-disease use cases, as this is where survival analysis is most widely used. Death by disease is typically analyzed in drug development, where survival analysis plays a crucial role in identifying the right drug through a comparative study of several options.
We are talking about time a lot. But what does it signify? By time, we mean the years, months, weeks or days from the beginning of the analysis of the data until an event (such as a death, the exit of an employee or an earthquake) occurs. As said earlier, an event is also termed a failure. So, the time taken until failure is referred to as the failure time or survival time. Time need not always be a physical unit; there are cases where it is used as a logical indicator. The following points must be taken care of when defining a time scale:
  • Origin of the time must be unambiguously defined.
  • The scale for measuring the time difference must be defined.
  • Definition of failure must be clear.
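The three points above can be illustrated with a minimal sketch. It assumes calendar dates as the time origin and the failure event; the function name `failure_time`, the parameter `unit_days` and the employee dates are hypothetical and are not taken from the book.

```python
from datetime import date


def failure_time(origin: date, event: date, unit_days: float = 1.0) -> float:
    """Return the time from the origin to the failure event.

    `unit_days` fixes the measurement scale: 1.0 gives days, 7.0 gives weeks.
    """
    if event < origin:
        # An event before the origin means the origin was not defined properly.
        raise ValueError("failure event cannot precede the time origin")
    return (event - origin).days / unit_days


# Hypothetical subject: an employee whose time origin is the joining date
# and whose failure is defined as leaving the organization.
joined = date(2019, 3, 1)
left = date(2021, 3, 1)

days_to_failure = failure_time(joined, left)        # scale: days
weeks_to_failure = failure_time(joined, left, 7.0)  # scale: weeks
print(days_to_failure, weeks_to_failure)
```

The same subject yields different numbers on different scales, which is why the origin, the scale and the definition of failure should all be fixed before any analysis begins.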

Concept of Survival

When we speak about survival, we mean probabilities. The probability that an event does not occur until some time can be taken as the survival probability. In other words, the survival probability is the probability that the event occurs only after a certain time. For example, when we say that the survival probability of a heart patient at age 71 is 0.23, it means there is a probability of 0.23 that the patient will survive beyond age 71. Age is the time scale here. Similarly, there could be a probability of 0.40 that he/she will survive beyond 50. The reason is clear: at a younger age, the chance of dying from a heart attack is lower, and thus the survival probability is higher. So, we can have a survival probability distribution over the random variable time (here, age) like below:
Table 1.1 A Sample Survival Probability Distribution
Time (Age)           | 40   | 45   | 50   | 60   | 65   | 70
Survival Probability | 0.51 | 0.42 | 0.38 | 0.36 | 0.28 | 0.24
One of the purposes of survival analysis is to find this probability distribution. Many other domain-specific statistical inferences can also be drawn from it. It can be observed that the survival probability decreases over time. This is a very important feature of the distribution, and we will discuss it in greater detail in Chapter 2. As with the heart patient use case, the same analysis can be done for employee attrition in an organization. The purpose there is to find the survival probability distribution of an employee's exit at various times after he/she joins. Interestingly, the term survival is very generic here. It need not always mean saving yourself from something, nor is it always related to disease, patients or healthcare. Survival means the non-occurrence of an event until some time. The event could be any one from the list discussed in the section ‘Concept of Failure Time’ or something else.
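As a rough illustration of the idea, the sketch below first checks that the values of Table 1.1 never increase with age, and then estimates survival probabilities from a small sample as the fraction of subjects whose failure time exceeds a given time. The failure ages are invented for illustration, the helper names are hypothetical, and the estimate assumes that every subject's failure was actually observed. It uses only the Python standard library, not the lifelines library used in the book.

```python
# Table 1.1 as (age, survival probability) pairs.
table_1_1 = [(40, 0.51), (45, 0.42), (50, 0.38), (60, 0.36), (65, 0.28), (70, 0.24)]


def is_non_increasing(values):
    """True if each value is less than or equal to the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


table_is_valid = is_non_increasing([p for _, p in table_1_1])


def empirical_survival(failure_times, t):
    """Fraction of subjects whose failure time is strictly greater than t.

    Assumes every failure time was fully observed.
    """
    if not failure_times:
        raise ValueError("at least one failure time is required")
    return sum(1 for x in failure_times if x > t) / len(failure_times)


# Hypothetical failure ages of ten subjects (illustrative only).
failure_ages = [42, 48, 55, 61, 63, 67, 72, 75, 80, 84]

curve = [(t, empirical_survival(failure_ages, t)) for t in (40, 50, 60, 70, 80)]
for age, prob in curve:
    print(f"S({age}) = {prob:.2f}")
```

Because each estimate counts the subjects still surviving beyond a given time, the resulting values can only stay level or fall as time advances, which matches the decreasing pattern seen in Table 1.1.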

Censoring

Most survival analyses must consider a very important analytical problem called censoring. It is caused by not observing some subjects fo...
