Regression Models for Categorical and Count Data
eBook - ePub

  1. 272 pages
  2. English
  3. ePUB (mobile friendly)

About this book

This text provides practical guidance on conducting regression analysis on categorical and count data. Step by step and supported by many graphs, it covers both the theoretical underpinnings of these methods and their application, giving you the skills needed to apply them to your own research. It offers guidance on:

  • Using logistic regression models for binary, ordinal, and multinomial outcomes
  • Applying count regression, including Poisson, negative binomial, and zero-inflated models
  • Choosing the most appropriate model to use for your research
  • The general principles of good statistical modelling in practice

Part of The SAGE Quantitative Research Kit, this book will give you the know-how and confidence needed to succeed on your quantitative research journey.

Regression Models for Categorical and Count Data by Peter Martin is available in PDF and/or ePUB format. It is listed under Social Sciences and Social Science Research & Methodology.

1 Introduction

Chapter Overview

  • Why study regression models for categorical and count data?
  • A few words on terminology
  • Why do we need to look beyond linear regression?
  • Regression beyond the linear model: an illustrated introduction
  • Linear regression: a reminder, with some mathematical notation
  • Generalised linear models
  • What’s the same and what’s different
  • How you might use this book
  • Further Reading
This book is intended as a first introduction to regression models for outcomes that are either categorical or count variables. Count variables represent social phenomena that can be counted, such as the number of crimes in a city, the number of times a person visits a hospital, or the number of Members of Parliament who change party allegiance. Categorical variables represent social phenomena that cannot be measured numerically. We consider three types of categorical variables:
  1. Dichotomous variables have exactly two categories. Examples are presence of an illness (the patient is either ‘ill’ or ‘not ill’) or retirement status (‘retired’ or ‘not retired’).
  2. Ordinal variables have three or more categories that can be placed in a natural order. Examples are highest qualification (‘no qualification’, ‘completed primary school’, ‘completed secondary school’, ‘university degree’, etc.) or subjective health status based on an ordered response scale (‘poor health’, ‘fair’, ‘good’, ‘very good health’).
  3. Nominal variables have three or more categories that cannot be placed in a meaningful order. Examples are choice of study subject (‘science’, ‘humanities’, ‘arts’, etc.) or type of accommodation (‘rented’, ‘owned with mortgage’, ‘owned outright’, ‘nursing home or other institution’, ‘homeless’).
The models discussed in this book include the following:
  • Logistic regression for dichotomous (binary) outcomes
  • The generalised ordered logit model for ordinal outcomes (also known as ordinal logistic regression)
  • Multinomial logistic regression for nominal outcomes
  • Several models for count outcomes, including Poisson and negative binomial regression, as well as zero-truncated, zero-inflated and hurdle models
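As a rough aid to memory, the pairing of outcome types and models listed above can be sketched as a small lookup. This is only an illustration: the dictionary keys and the function name `candidate_models` are hypothetical, and the model names are taken from the list above.

```python
# Illustrative lookup only: the keys and the function name are hypothetical;
# the model names come from the list of models discussed in this book.
MODELS_BY_OUTCOME = {
    "dichotomous": ["logistic regression"],
    "ordinal": ["ordinal logistic regression"],
    "nominal": ["multinomial logistic regression"],
    "count": [
        "Poisson regression",
        "negative binomial regression",
        "zero-truncated models",
        "zero-inflated models",
        "hurdle models",
    ],
}


def candidate_models(outcome_type: str) -> list[str]:
    """Return the model families listed for a given type of outcome."""
    key = outcome_type.strip().lower()
    if key not in MODELS_BY_OUTCOME:
        raise ValueError(f"unknown outcome type: {outcome_type!r}")
    return MODELS_BY_OUTCOME[key]


print(candidate_models("ordinal"))
print(candidate_models("count"))
```

The lookup only narrows the field. For count outcomes in particular, several candidates remain, and choosing among them depends on the data at hand.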
This book assumes that the reader is familiar with linear regression, elementary inferential statistics (hypothesis tests and confidence intervals) and general methodological considerations in the collection and analysis of quantitative social science data. A good way to acquire or refresh this knowledge is to study the volumes in The SAGE Quantitative Research Kit series that precede this one. In particular, Volume 7 gives a thorough introduction to linear regression.

Why study regression models for categorical and count data?

Regression models are used widely in the social sciences to investigate relationships between social phenomena, to test theories about the social world, and to provide model-based predictions of what might happen in the future. Research examples discussed in Chapters 2 to 5 of this book include the following:
  • Health inequalities: In England, people living in poorer areas are less likely to make use of free eye tests than people living in richer areas. Why?
  • Mental health: Can we identify childhood experiences and characteristics that are associated with the risk of developing an eating disorder as an adult?
  • Sociology of culture: Do people choose their cultural activities to display their social status?
  • Political science: Under what circumstances are local politicians prepared to tolerate illegal street vendors in their cities?
  • Sociology of religion: Is it true that people with a strong religious identity are more likely to be happy with their lives? And if so, why might that be?
This book does not provide conclusive answers to any of these questions. But it does discuss the methods used to investigate them.

A few words on terminology

Outcome and predictor

In regression models, we distinguish between the outcome variable and the predictor variables. The outcome variable is what we wish to explain or predict; the predictor variables contain the information that does the explaining or predicting. In different texts, you may find other names for these concepts:
  • The outcome variable is also known as the dependent variable, or the response. It is usually denoted by the letter Y.
  • Predictor variables are also known as independent variables, explanatory variables, or exposures. The conventional symbol for a predictor variable is X. When there are multiple predictor variables, they are identified by numeric subscripts: X₁, X₂, X₃, and so forth.

Types of variables

We can distinguish numeric and categorical variables. The values of a numeric variable are numbers that represent numeric measurements, such as a person’s height or a country’s gross domestic product. Among the numeric variables, we distinguish continuous and discrete variables:
  • Continuous variables: A continuous variable is a numeric variable that can take any value within its possible range. For example, age is a continuous variable: a person can be 28 years old, 28.4 years old, or even 28.397853 years old. Age changes every day, every minute, every second, so our measurement of age is limited only by how precise we can or wish to be.
  • Discrete variables: A discrete variable is a numeric variable that can only take particular, ‘discrete’ values. Count variables are discrete. They can take the values 0, 1, 2, 3 and so on. Consider the count variable ‘number of children’: you can have zero children, one child or seven children, but not 1.5 children.
In contrast to numeric variables, the values of categorical variables are not numbers, but categories. In a particular data set, the categories might be represented by numbers, but then the numbers are merely names for the categories and do not represent true numeric measurements. Three types of categorical variables were defined at the start of this chapter.
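The point that numeric codes are merely names can be illustrated with a short sketch. The six responses and both coding schemes below are made up for illustration. The same nominal variable is coded in two arbitrary ways, and the 'mean' of the codes changes with the coding while the frequency table does not.

```python
from collections import Counter
from statistics import mean

# Hypothetical data: type of accommodation for six respondents.
housing = [
    "rented", "owned outright", "rented",
    "homeless", "owned with mortgage", "rented",
]

# Two equally arbitrary numeric codings of the same categories.
coding_a = {"rented": 1, "owned with mortgage": 2, "owned outright": 3, "homeless": 4}
coding_b = {"homeless": 1, "rented": 2, "owned outright": 3, "owned with mortgage": 4}

codes_a = [coding_a[h] for h in housing]
codes_b = [coding_b[h] for h in housing]

# The 'mean' depends on the coding, so it says nothing about housing itself.
print(mean(codes_a), mean(codes_b))

# Translating codes back to labels gives the same frequency table either way.
labels_a = {code: label for label, code in coding_a.items()}
labels_b = {code: label for label, code in coding_b.items()}
freq_a = Counter(labels_a[c] for c in codes_a)
freq_b = Counter(labels_b[c] for c in codes_b)
print(freq_a == freq_b)
```

Summaries that treat the codes as measurements (means, differences, regression slopes) depend on an arbitrary labelling, whereas counts per category do not.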

Why do we need to look beyond linear regression?

When the intended outcome of an analysis is a categorical or count variable, linear regression is often not appropriate. With dichotomous or ordinal outcomes, a form of linear regression can sometimes be applied but is rarely advisable. (Some reasons are given in Chapter 2.) Nominal variables with three or more categories cannot meaningfully be modelled using linear regression at all. Linear regression models applied to count outcomes often fail to meet some of their assumptions. In particular, errors and residuals from a linear regression on count data are often not normally distributed and not homoscedastic.
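The claim about count outcomes can be checked informally with a small simulation. The sketch below is an illustration under stated assumptions, not an analysis from the book: the data are simulated, the log-linear form of the mean and its coefficients (0.5 and 0.3) are arbitrary choices, and only homoscedasticity is examined. It draws Poisson counts whose mean grows with a single predictor, fits a straight line by ordinary least squares, and compares the spread of the residuals for low and high values of the predictor.

```python
import math
import random
from statistics import mean, pvariance


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Simulated predictor and count outcome; the log-linear mean is an assumption.
x = [rng.uniform(0, 8) for _ in range(2000)]
y = [poisson_draw(rng, math.exp(0.5 + 0.3 * xi)) for xi in x]

# Ordinary least squares for a single predictor, in closed form.
x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Compare the residual spread for low and high values of the predictor.
low = [r for xi, r in zip(x, residuals) if xi < 4]
high = [r for xi, r in zip(x, residuals) if xi >= 4]
print(f"residual variance, x < 4:  {pvariance(low):.2f}")
print(f"residual variance, x >= 4: {pvariance(high):.2f}")
```

In this setup, the residual variance is larger where the expected count is larger, which is the kind of heteroscedasticity described above. Real count data may depart from the linear model's assumptions in other ways as well.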
So, to construct statistical models for categorical and count outcomes, we need specialised techniques. This book does not give a complete overview of all models that can be applied to categorical and count outcomes. But it does discuss the models most commonly used in the social sciences. Thus, for categorical outcomes, this book discusses models of the logistic regression family. Less frequently used alternatives (e.g. probit models, or the l...

Table of contents

  1. Cover
  2. Half Title
  3. Acknowledgements
  4. Title Page
  5. Copyright Page
  6. Contents
  7. Illustration List
  8. About the Author
  9. Acknowledgements
  10. Preface
  11. 1 Introduction
  12. 2 Logistic Regression
  13. 3 Ordinal Logistic Regression: The Generalised Ordered Logit Model
  14. 4 Multinomial Logistic Regression
  15. 5 Regression Models for Count Data
  16. 6 The Practice of Modelling
  17. Glossary
  18. References
  19. Index