1
Introduction
Christof Wolf and Henning Best
In recent years, the social sciences have made tremendous progress in quantitative methodology and data analysis. The classical linear model, while remaining an important foundation for more advanced methods, has been increasingly complemented by specialized techniques. Major improvements include the widespread use of non-linear models, advances in multilevel modeling and Bayesian estimation, the diffusion of longitudinal analyses and, more recently, the focus on novel methods for causal inference.
The interested reader can choose from a number of excellent textbooks on a wide range of topics: from general econometrics texts such as Wooldridge (2009, 2010) or Greene (2012), through volumes on regression and Bayesian methods (Gelman et al., 2003; Fox, 2008; Gelman and Hill, 2007), multilevel modeling (Hox, 2010), non-linear models for limited dependent variables (Long, 1997; Train, 2009) and event history techniques (Blossfeld et al., 2007), to trend-setting textbooks on causal inference (Pearl, 2009; Angrist and Pischke, 2009; Morgan and Winship, 2007) and specialized handbooks such as the one edited by Morgan (2013).
Having so many excellent monographs on matters of regression analysis and causal inference makes it difficult for scholars and researchers to obtain an overview of these different approaches. Our aim with this Sage Handbook of Regression Analysis and Causal Inference is to give readers an accessible outline of a broad set of regression techniques and methods for causal inference, written by international experts in the field. Many students and researchers in the social sciences will find this handbook useful, as it provides an overview of a range of different methods: ordinary least squares and logistic regression, multilevel and panel regression, and time-series cross-section models, as well as methods for causal inference, for example instrumental variables regression, regression discontinuity designs or propensity score matching. Hence, this volume covers the most commonly used techniques for the statistical analysis of cross-sectional and longitudinal data as well as a number of newer and advanced regression models.
Each chapter provides an accessible yet rigorous presentation of a statistical method. With few exceptions, the contributions follow a common structure, making it easy for readers to navigate through the text. Each chapter begins with an easily accessible, non-technical introduction to the respective method, providing a basic understanding of the method's logic, scope and unique features. The introduction is followed by a presentation of the statistical foundations of the method. To give readers a better understanding of how a particular method can be applied, the next step consists of a comprehensive discussion of the method's application in an example analysis based on publicly available real-world data. Whenever possible, authors used the European Social Survey (see http://www.europeansocialsurvey.org/). Readers can download Stata or R code from the companion website to this book and reproduce the analyses (see https://study.sagepub.com/bestwolf). The example is followed by a discussion of frequently made errors and caveats of the method and its applications. Each chapter ends with a brief annotated list of references for further reading.
The book is divided into three major blocks: two chapters on estimation techniques, eight chapters on regression models for cross-sectional data, and six chapters focusing on causal inference and the analysis of longitudinal data.
The volume opens with two chapters on different estimation techniques used in regression analysis. In the first of these, Martin Elff discusses ordinary least squares and maximum likelihood methods for estimating the parameters of linear regression and other statistical models. One of the caveats discussed by Elff is that maximum likelihood estimation can become very difficult if sample sizes are small. A technique particularly suited to this situation is Bayesian estimation, which Susumu Shikano presents in the following chapter. After an introduction to the general idea of Bayesian analysis, Shikano shows how the coefficients of a regression model are estimated in the Bayesian framework.
The second block of chapters in this volume deals with regression analysis for cross-sectional data. Linear regression, a powerful tool often termed the workhorse of the social sciences, is introduced by Christof Wolf and Henning Best. Sound applications can only be expected if the assumptions underlying this model are understood. These are discussed in detail in the next chapter by Bart Meuleman, Geert Loosveldt and Viktor Emonds, who also present the tools used to diagnose deviations from the assumptions. In the following chapter Henning Lohmann shows how non-linear and non-additive effects can be incorporated into linear regression models. He discusses interaction effects, polynomials and splines in great detail and demonstrates how flexible multiple linear regression is. Joop Hox and Leoniek Wijngaards-de Meij's contribution focuses on regression models for hierarchical, multilevel data. These models are suitable if the units of observation are 'nested' within higher-level units (e.g. students in schools, residents in neighborhoods or employees in firms). The authors discuss these models for both metric and binary dependent variables. In-depth coverage of regression models for binary outcomes can be found in the next chapter, on logistic regression, by Henning Best and Christof Wolf. This is directly followed by a presentation of regression models for multinomial and ordinal variables authored by Scott Long. In both chapters dealing with non-metric outcome variables, the authors emphasize that interpreting the results of these kinds of models is anything but straightforward. Graphical displays are an indispensable tool for meeting the challenge of correctly interpreting regression results; these are presented and discussed in the subsequent chapter by Gerrit Bauer. The block on regression analysis for cross-sectional data closes with a contribution by Steven Heeringa, Brady West, and Patricia Berglund, who address regression modeling for complex sample survey data.
The third block of chapters is devoted to methods for longitudinal data analysis and causal inference that are based on a counterfactual model of causality. Markus Gangl opens this part with a contribution on matching estimators for treatment effects. The chapter discusses the analytical goals and mathematical foundations that underlie the use of matching estimators for causal inference. As the name of the method suggests, two types of units, the 'treated' and the 'nontreated', are matched based on some common characteristic. An alternative method for causal inference is introduced in the chapter by Christopher Muller, Christopher Winship, and Stephen Morgan, who provide a non-technical introduction to instrumental variables regression. This kind of regression helps in dealing with endogeneity by using an additional instrumental variable that is correlated with the causal factor of interest but otherwise exogenous. Another important method, the regression discontinuity design, is presented by Thomas Lemieux and David Lee. They present the conceptual framework behind this research design and draw a parallel between regression discontinuity and randomized experiments. The next chapter, by Josef Brüderl and Volker Ludwig, offers a description of fixed-effects panel regression, which they compare to random-effects models and models including a lagged dependent variable. In addition to the basic model of fixed-effects panel regression, the authors discuss a more advanced variant of this approach that allows for heterogeneous change, that is, a model with individual slopes. Another form of longitudinal data is event history data, which provides information on the sequence of different states occupied by each unit of analysis and the timing of changes among these states. Hans-Peter Blossfeld and Gwendolin Blossfeld present regression models to analyze such data structures. For them, event history models are closely linked to an understanding of causation as a generative process. The book closes with a contribution by Jessica Fortin-Rittberger on models for time-series cross-section data. These models are particularly useful if we have data on a comparatively small number of units for a comparatively large number of time points, a data structure that often arises in comparative political science.
We hope that readers will find this Sage Handbook useful for their daily practice in social science teaching and research. We are confident that the book will help students and researchers in conducting quantitative social research and contribute to the further diffusion of important methods for causal inference. If the book helps advance the methodologically sound analysis of society, the time invested will have been well spent.
REFERENCES
Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics. Princeton: Princeton University Press.
Blossfeld, H.-P., Golsch, K., and Rohwer, G. (2007). Event History Analysis with Stata. Mahwah: Erlbaum.
Fox, J. (2008). Applied Regression Analysis and Generalized Linear Models. Thousand Oaks: Sage.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003). Bayesian Data Analysis (2nd ed.). Boca Raton: Chapman and Hall/CRC.
Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.
Greene, W. H. (2012). Econometric Analysis. New York: Prentice Hall.
Hox, J. J. (2010). Multilevel Analysis: Techniques and Applications. New York: Routledge.
Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage.
Morgan, S. L., (Ed.) (2013). Handbook of Causal Analysis for Social Research. New York: Springer.
Morgan, S. L. and Winship, C. (2007). Counterfactuals and Causal Inference. New York: Cambridge University Press.
Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press.
Train, K. (2009). Discrete Choice Methods with Simulation. Cambridge: Cambridge University Press.
Wooldridge, J. M. (2009). Introductory Econometrics: A Modern Approach. Mason: Thomson/South-Western.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. Cambridge: MIT Press.
PART I
Estimation and Inference
2
Estimation techniques: Ordinary least squares and maximum likelihood
Martin Elff
INTRODUCTION
A major task in regression analysis, and in much of data analysis in the social sciences in general, is the construction of a model that best represents (1) the substantive assumptions and hypotheses a researcher may entertain and (2) auxiliary information or assumptions about the way the data under analysis are generated. To complete this task of model specification successfully, a researcher will need a fair knowledge of a variety of statistical models and their assumptions. Introducing these is one of the main purposes of this volume. In contrast to most other chapters, the present one presumes that all questions of model specification have already been addressed and focuses on the theoretical foundations of the step that comes thereafter: the estimation of model parameters.
While model specification sometimes appears to be something of an art, estimation clearly is a technique, the application of which researchers often gladly delegate to their computers. For scholars intent on gaining a full understanding of the research process, however, it is important to know the foundations of estimation. The purpose of this chapter is therefore to introduce these foundations, to provide an understanding of what it means to estimate parameters and to give some idea of what a 'good' estimator is.
The task of model specification usually leads us to a probability model of the process by which the data under analysis are generated. That is, we assume that each piece of data that we have observed, could have observed or may observe in the future occurs with a particular probability. In other words, our data are observations of random variables. Roughly speaking, a random variable is a set of numbers, called the sample space, together with probabilities assigned to them or to subsets of the sample space. The set of rules by which probabilities are assigned to numbers or sets of numbers is the probability distribution of the random variable. For example, if we roll a die, then the number it shows is an observation of a random variable whose sample space is the set of numbers 1 to 6 and whose distribution, for a fair die, assigns a probability of 1/6 to each of these numbers.
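The die example can be made concrete with a few lines of R; the following is a minimal illustrative sketch (not part of the book's companion code): it draws repeatedly from the sample space with equal probabilities and shows that the relative frequencies of the draws approximate the probabilities assigned by the distribution.

# A fair die as a random variable: sample space 1..6, each with probability 1/6
sample_space <- 1:6
probs <- rep(1 / 6, times = 6)

set.seed(42)  # fix the random number generator so the draws are reproducible
draws <- sample(sample_space, size = 10000, replace = TRUE, prob = probs)

# Relative frequencies of the observed draws; by the law of large numbers
# these come close to the theoretical probabilities of 1/6 = 0.167
round(table(draws) / length(draws), 3)

Each call to sample() here plays the role of one observation of the random variable; the probability distribution is encoded in the prob argument, while the observed data are the realized draws.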