Analysis of Infectious Disease Data
eBook - ePub

  1. 234 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

The book gives an up-to-date account of various approaches available for the analysis of infectious disease data. Most of the methods have been developed only recently, and for those based on particularly modern mathematics, details of the computation are carefully illustrated. Interpretation is discussed at some length and the emphasis throughout is on making statistical inferences about epidemiologically important parameters. Niels G. Becker is Reader in Statistics at La Trobe University, Australia.

CHAPTER 1

Introduction

The material in this book is concerned with the statistical analysis of quantitative data obtained by observing the spread of infectious diseases. By an infectious disease we mean a disease which is infectious in the sense that an infected host passes through a stage, called his infectious period, during which he is able to transmit the disease to a susceptible host, either by a direct ‘sufficiently close’ host-to-host contact or by infecting the environment and the susceptible host then making ‘sufficiently close’ contact with the environment. In this context the meaning of environment depends on the particular disease. The infected environment might include the linen and cutlery of the household as well as the ambient air in the house, but for vector-borne diseases would consist of the vector population.

1.1 Infectious diseases today

Infectious diseases were once the main cause of morbidity and mortality in man. Over the centuries we have managed to eradicate the more serious of these diseases from many parts of the world. However, diseases such as malaria, schistosomiasis, filariasis, hookworm disease and trachoma still affect hundreds of millions of people today, while the number of people affected by diseases such as leprosy and onchocerciasis is still many tens of millions.
Even countries free from the more serious infectious diseases still have public health problems due to such diseases. Firstly, when the more serious diseases seem under control, the less serious diseases receive more attention with a view towards bringing them under control as well. Thus, for example, a survey conducted in Denmark on measles (Horowitz et al., 1974) revealed that 18% of the measles cases involved complications such as otitis media, pneumonia and encephalitis, suggesting that measles is not the harmless disease it was once thought to be. It is therefore desirable to have vaccination campaigns aimed at controlling the spread of measles, as well as rubella, influenza and others of the less serious diseases which induce illness and can lead to complications. Another type of public health problem due to infectious diseases arises when vaccinations are used to control a disease and a small fraction of the vaccinations lead to serious complications. When the disease is nearly eradicated it becomes difficult to assess whether damage due to further vaccinations is greater or less than the damage due to the residual spread of the disease. This was the case for smallpox during 1965-1975 (Lane et al., 1971) and in some countries today it is the case for whooping-cough (Miller and Pollock, 1974). Furthermore, with increased air travel the entire world population is at risk from pandemics of diseases such as influenza, as well as occasional local outbreaks of diseases such as cholera, even in countries essentially free from this disease. Finally, there is always the risk of a new infectious disease developing which can pose a serious threat to mankind. The threat currently posed by AIDS (acquired immune deficiency syndrome) illustrates this point.
In short, infectious diseases continue to pose public health problems. The methods of statistical analysis described here are aimed at improving our understanding of infectious diseases and their spread through communities, with the hope that such additional knowledge will help in the control of these diseases.

1.2 Preamble about data and models

In most sciences one accumulates knowledge by analysing data from repetitions of planned experiments. It is generally not possible to conduct repeated experiments involving outbreaks of an infectious disease. Infectious disease data are usually obtained from epidemics occurring in nature, which makes it difficult to accumulate precise data and explains why existing data are often lacking in detail. Of the infection process one can at best hope to have the times at which infected individuals showed certain symptoms. It is usually impossible to observe the times at which infections occur, or to know which infected person is responsible for transmitting the disease to a particular susceptible.
An epidemiological study of an infectious disease should generally begin with a statement of the study’s objectives. Ideally one would like to use the objectives to plan suitable experiments. As epidemics are not planned experiments, the researcher can use the objectives only to determine the type and detail of data that need to be collected. Indeed, it is important to determine beforehand which observations are required on the infection process and on the sociological setting of the affected community, so as to ensure that the proposed objectives may be met. Accordingly, when describing the various methods of analysis in the following chapters, we make an attempt to indicate clearly the type of data which are necessary to implement the methods as well as the questions which may be answered thereby. In the next section we give a preliminary discussion of some objectives which a statistical analysis of infectious disease data can help to meet.
In statistics there has developed a tendency to make inferences by distribution-free methods, so that the dangers of adopting an incorrect model are avoided. However, for most infectious diseases we have some understanding of how a transmission of the disease can occur, and it seems preferable to incorporate this knowledge into a model aimed at describing the infection process. The advantage of basing a statistical analysis on a more specific model lies in the fact that this usually leads to more efficient statistical inferences. In other words, by formulating a model to describe how the data are generated we can help to compensate for the lack of detail usually inherent in infectious disease data. Of course it is important to be sure that the model assumptions are supported both empirically and by biological and sociological considerations, because the claimed gain in efficiency becomes meaningless if the model is incorrect. A desirable by-product of using a model which is specifically formulated to suit the application is that its parameters have well-understood interpretations.
Our concern is with the statistical analysis of data and accordingly we use stochastic (or probability) models, by which we mean these models ascribe the unpredictable aspects of real epidemics to an element of chance. It is true that deterministic models can also be fitted to data and thereby lead to estimates for parameters, but it is difficult to assess the precision of such estimates. The natural role for the deterministic model is as an approximation to the stochastic model when all population sizes (i.e. the sizes of all subgroups of the community specifically referred to in the model) are large. Indeed, deterministic models are more useful in enriching the general theory of epidemics than in applications to real data. An unattractive feature of stochastic model formulations is that they tend to lead more often to model equations which are very difficult to solve in terms of explicit and manageable expressions. This is especially true of continuous-time model formulations. In such situations we try to derive methods of statistical inference directly from the model formulation, thereby avoiding the complicated model solution.
Another important question about models concerns the degree of complexity to be incorporated into a model. An objection often made of mathematically formulated epidemic models is that such models involve too many simplifying assumptions. Too much is often made of this objection because in fact most methods of analysis involve such assumptions but only the mathematically formulated approach explicitly exposes all the assumptions made. Indeed, this property of mathematical formulations coupled with the fact that many concepts, models, methods of analysis and their interpretations are most clearly described in a mathematical framework helps to explain why in most areas of science mathematics is recognized as a most useful means of communication between research workers. Nevertheless, it is true that many simplifying assumptions are contained in most of the epidemic models to be found in the literature, but it is quite wrong to reject such models merely on the basis of their simplicity, because therein lies a considerable part of their value.
Of course a simple model is an idealization and generally cannot be viewed as representing exactly the spread of real epidemics. In fact, if a statistical test indicates that the model does not adequately describe some epidemic data then it must be modified for that particular application. However, if a simple model does provide an adequate fit to some epidemic data, then it could easily prove to be more useful than a detailed, complex model that also provides an adequate fit to the same data. There is the obvious advantage that the simple model is more likely to yield to mathematical analysis.
Apart from this, however, the simple model can reveal more clearly what the important characteristics are, because in a complex model the important characteristics are often mixed in with less important ones. Furthermore, the amount of epidemic data is rarely large enough to indicate, on the basis of a statistical test, the need to retain all the detailed components of a complex model. In the absence of such evidence the preference for a complex model in a particular application often involves a considerable amount of subjective judgement and, whenever possible, such subjective judgements should be avoided in scientific studies.

1.3 Motivation

The objectives that a statistical analysis of infectious disease data can help to meet depend on both the extent of the data set and the degree of detail therein. A pointed discussion is possible only when the details of the data set are specified as is done in each of the following chapters. In this section we engage in a more general preliminary discussion intended to indicate the variety of potential uses of such an analysis. More specifically, we give reasons why it is useful to search for an epidemic model that adequately describes a set of infectious disease data.
One reason lies in the fact that the search for adequately fitting models can help to provide insight into the biological and sociological mechanisms underlying the process of disease spread. Of such a search it is important to realize that not only the final model but all inadequate models discarded along the way play an important part in helping to point to the characteristics which are essential for the model to be adequate. Each comparison of an adequate model with an inadequate model helps to isolate the more important features of disease spread. Often models fitted to the same data are compared most efficiently when one model reduces to the other by a parameter reduction, because then the comparison may be effected by testing a hypothesis about parameters of the more general model.
When an epidemic model is fitted to a data set, and is found to provide an adequate description of it, we can make use of the fitted model in several ways. The most obvious first step is the interpretation of various parameter estimates and also of the model as a whole. An epidemic model specifically formulated to describe an infection process will usually explicitly involve parameters with clear epidemiological meanings. It is then often a straightforward matter to extract estimates of parameters which measure the infectiousness of the disease, the mean duration of the latent period and similar epidemiologically important parameters. The interpretation of the model as a whole is also important because the fitted model provides one plausible explanation of the infection process for the disease. For this interpretation it is necessary to have a full understanding of both the underlying assumptions of the model and the consequences suggested by it. One is helped in this by looking at the fitted model with reference to the existing body of mathematical theory of infectious diseases and extracting from this theory any insights it can provide. A brief summary of the type of insights provided by this theory is given in the next section.
A most important use for epidemic models which adequately describe the data is as a tool to help assess proposed control procedures for infectious diseases. In most sciences, the effects of changes are assessed by analysing the results of repeated experiments. As this is generally not possible with infectious diseases it is natural to try to overcome this difficulty by constructing a model which adequately describes the basic features of epidemics in the community and then using the model to predict the consequences of introducing specific changes. The use of a model in this way to evaluate a vaccination campaign, for example, is based on the hope that if we make a change in the model in accordance with the proposed campaign, then the model will respond in such a way as to adequately describe the basic features of epidemics in the corresponding partially vaccinated community. The epidemic threshold theorem, outlined in the next section, is most relevant in connection with the control of infectious diseases.
Once an adequate epidemic model has been determined for a particular data set it is also useful as a summary of the data in cases where the data set is very large. Furthermore, an adequate model provides a basis for comparisons of epidemics of the same disease in different areas and at different points in time, as well as the comparison of epidemics of different diseases. These comparisons can make a considerable contribution to our understanding of the spread of diseases. In many ways, the best way of making such comparisons is by comparing epidemic models that adequately describe the various epidemic outbreaks. Indeed, it is a standard procedure of statistical analysis to compare data sets by comparing the models that adequately summarize them.

1.4 Insights gained from epidemic theory

Epidemiologists often regard the mathematical theory of infectious diseases as a theoretical exercise rather than a body of knowledge which has practical relevance. By looking at the work in its entirety it is easy to see why this attitude prevails. Much of the work done by mathematicians is indeed of greater consequence to the furthering of mathematical knowledge than to the understanding of disease spread. However, some of the mathematical theory can help to give insight into disease spread and it is important to learn what we can from this theory, rather than dismiss all of it as an academic exercise. In this section an attempt is made to indicate some of the things which may be learnt from this theory.
For most infectious diseases we are able to describe the type of contact which can lead to a transmission of the disease from an infectious individual to a susceptible individual. This provides the essential ingredient for a model formulation, or model equations. An epidemic consists of a chain of infections, generated, at least partially, by chance contacts and it is not easy to picture the likely extent of such a chain of infections. The solution to the model equations provides us with a convenient representation of the likely extent of the outbreak through the community. A major function of an epidemic model is therefore to provide a means by which we may go from a description of the role of an infectious individual at an instant in time to a time-dependent description of the spread of the disease through the community.
From a study of stochastic epidemic models we learn to appreciate that variations in disease spread can sometimes be explained by chance fluctuations alone. This helps to prevent us from drawing hasty conclusions, such as ascribing variations in the spread of the disease to differences in virulence or infectiousness when the magnitude of the variations is in fact readily explained by considerations of chance alone. Indeed, via simulation studies, Bartlett (1957, 1961) demonstrated that even the apparent ‘two-year cycle’ of measles incidence in large cities can be mimicked by a stochastic epidemic model without explicitly building any periodic components into the model.
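The role of chance can be illustrated with a small simulation. The sketch below is not Bartlett's model and is not taken from the text: it assumes a simple Reed-Frost type chain binomial outbreak in a closed group, and the function name `simulate_final_size` and the values of 50 susceptibles and a contact probability of 0.03 are hypothetical choices made only for illustration. Every run uses identical parameters, so any spread in the final sizes is due to chance alone.

```python
import random


def simulate_final_size(n_susceptible, p_contact, rng, n_initial=1):
    """Simulate one outbreak of an assumed Reed-Frost chain binomial model.

    Each susceptible independently escapes infection from each current
    infective with probability 1 - p_contact. Infectives are infectious
    for one generation only. Returns the number of initially susceptible
    individuals who are eventually infected.
    """
    susceptible = n_susceptible
    infectious = n_initial
    total_infected = 0
    while infectious > 0 and susceptible > 0:
        # Probability that a given susceptible is infected this generation.
        p_infected = 1.0 - (1.0 - p_contact) ** infectious
        new_cases = sum(1 for _ in range(susceptible) if rng.random() < p_infected)
        susceptible -= new_cases
        total_infected += new_cases
        infectious = new_cases
    return total_infected


# Hypothetical illustration: 200 outbreaks with identical parameters.
rng = random.Random(1)
sizes = [simulate_final_size(50, 0.03, rng) for _ in range(200)]
print("smallest outbreak:", min(sizes))
print("largest outbreak:", max(sizes))
print("outbreaks infecting at most 2 susceptibles:", sum(s <= 2 for s in sizes))
```

Under these assumed values one would typically expect some runs to die out almost immediately and others to reach a sizeable part of the group, although nothing about the disease differs between runs.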

1.4.1 The epidemic threshold theorem

Probably the most important conclusions arrived at by a study of epidemic models are contained in the celebrated epidemic threshold theorem. This theorem quantifies the probability distribution for the final size of an outbreak of a disease in a closed, large population in terms of a crucial parameter μ and other aspects of the infection process. By referring to the early stages of the epidemic we may loosely define μ as the mean number of susceptible individuals infected by an infected individual during his infectious period. This parameter may take different values for different diseases and different types of communities, so that it is important to estimate it for every epidemic. In view of the paucity of data available on epidemics, the estimation of this parameter is not a trivial statistical problem, as we shall see later.
It might be tempting for epidemiologists unfamiliar with this mathematical work to dismiss the epidemic threshold theorem as being merely a consequence of oversimplified assumptions. While it is true that certain details of the theorem will change when some of these assumptions are relaxed, it is clear that the threshold phenomenon of the theorem is quite robust under relaxations of the assumptions. The threshold result is the aspect of the theorem with the greatest practical consequences, and states that in large populations the probability of an outbreak being minor is unity whenever μ < 1, but when μ > 1 the probability of a major epidemic is positive. The importance of this threshold result lies in the observation that by immunizing a fraction ν of the susceptible individuals, selected at random, we reduce the threshold parameter to μ* = (1 − ν)μ. If ν > 1 − 1/μ, then μ* < 1 and the partially immunized community satisfies the condition under which minor epidemics occur with probability 1. In other words, the threshold theorem indicates what fraction of the community must be immunized in order that major epidemics will be prevented. In applications it is necessary to have an estimate for the threshold parameter μ and this question is referred to in various places throughout the book.
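The arithmetic of the threshold result can be sketched in a few lines. The function names below are hypothetical and the values of μ are made up for illustration; they are not estimates for any real disease. The code only evaluates the two expressions stated above, μ* = (1 − ν)μ and the bound 1 − 1/μ, and it treats μ as known, whereas in practice μ must be estimated from data.

```python
def effective_threshold_parameter(mu, nu):
    """Return mu* = (1 - nu) * mu, the threshold parameter after a
    fraction nu of susceptibles, selected at random, is immunized."""
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must be a fraction between 0 and 1")
    return (1.0 - nu) * mu


def critical_immunization_fraction(mu):
    """Return 1 - 1/mu, the bound that nu must exceed so that mu* < 1.

    When mu is already at most 1 no immunization is needed to meet the
    bound, so 0 is returned.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return max(0.0, 1.0 - 1.0 / mu)


# Hypothetical values of mu, chosen only to show the calculation.
for mu in (0.8, 2.0, 4.0):
    bound = critical_immunization_fraction(mu)
    print(f"mu = {mu}: nu must exceed {bound:.2f}; "
          f"mu* at the bound = {effective_threshold_parameter(mu, bound):.2f}")
```

Note that immunizing exactly the fraction 1 − 1/μ gives μ* = 1, a boundary case which the statement above does not cover; the condition for minor epidemics with probability 1 is the strict inequality ν > 1 − 1/μ.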
The evidence which supports the claim that the threshold phenomenon is reasonably insensitive to the simplifying assumptions of epidemic models is found in the link between the event of a minor epidemic in a large population and the extinction of an approximating branching process. The awareness of this connection goes back to Bartlett (1949) and is more explicitly seen in the derivations of the threshold theorem by Whittle (1955) and Becker (1977b). Once this link is observed we may use the results of multi-type branching processes, branching processes with random environments, and branching processes with different households (see Bartoszynski, 1972, for the last of these), to make the safe conjecture that the epidemic threshold phenomenon applies under very general conditions for large communities. Becker (1977a) indicates that unde...

Table of contents

  1. Cover
  2. Title Page
  3. Copyright Page
  4. Table of Contents
  5. Preface
  6. 1 Introduction
  7. 2 Chain binomial models
  8. 3 Chain models with random effects
  9. 4 Latent and infectious periods
  10. 5 Heterogeneity of disease spread through a community
  11. 6 Generalized linear models
  12. 7 Martingale methods
  13. 8 Methods of inference for large populations
  14. Appendix
  15. References
  16. Author index
  17. Subject index