The SAGE Handbook of Quantitative Methodology for the Social Sciences

  1. 528 pages
  2. English

About this book

"The 24 chapters in this Handbook span a wide range of topics, presenting the latest quantitative developments in scaling theory, measurement, categorical data analysis, multilevel models, latent variable models, and foundational issues. Each chapter reviews the historical context for the topic and then describes current work, including illustrative examples where appropriate. The level of presentation throughout the book is detailed enough to convey genuine understanding without overwhelming the reader with technical material. Ample references are given for readers who wish to pursue topics in more detail. The book will appeal to both researchers who wish to update their knowledge of specific quantitative methods, and students who wish to have an integrated survey of state-of-the-art quantitative methods."
—Roger E. Millsap, Arizona State University

"This handbook discusses important methodological tools and topics in quantitative methodology in easy to understand language. It is an exhaustive review of past and recent advances in each topic combined with a detailed discussion of examples and graphical illustrations. It will be an essential reference for social science researchers as an introduction to methods and quantitative concepts of great use."
—Irini Moustaki, London School of Economics, U.K.

"David Kaplan and SAGE Publications are to be congratulated on the development of a new handbook on quantitative methods for the social sciences. The Handbook is more than a set of methodologies, it is a journey. This methodological journey allows the reader to experience scaling, tests and measurement, and statistical methodologies applied to categorical, multilevel, and latent variables. The journey concludes with a number of philosophical issues of interest to researchers in the social sciences. The new Handbook is a must purchase."
—Neil H. Timm, University of Pittsburgh

The SAGE Handbook of Quantitative Methodology for the Social Sciences is the definitive reference for teachers, students, and researchers of quantitative methods in the social sciences, as it provides a comprehensive overview of the major techniques used in the field. The contributors, top methodologists and researchers, have written about their areas of expertise in ways that convey the utility of their respective techniques, but, where appropriate, they also offer a fair critique of these techniques. Relevance to real-world problems in the social sciences is an essential ingredient of each chapter and makes this an invaluable resource.

The handbook is divided into six sections:

• Scaling
• Testing and Measurement
• Models for Categorical Data
• Models for Multilevel Data
• Models for Latent Variables
• Foundational Issues

These sections, comprising twenty-four chapters, address topics in scaling and measurement, advances in statistical modeling methodologies, and broad philosophical themes and foundational issues that transcend many of the quantitative methodologies covered in the book.

The Handbook is indispensable to the teaching, study, and research of quantitative methods and will enable readers to develop a level of understanding of statistical techniques commensurate with the most recent, state-of-the-art, theoretical developments in the field. It provides the foundations for quantitative research, with cutting-edge insights on the effectiveness of each method, depending on the data and distinct research situation.

Section VI

FOUNDATIONAL ISSUES

Chapter 20

PROBABILISTIC MODELING WITH BAYESIAN NETWORKS

RICHARD E. NEAPOLITAN

SCOTT MORRIS

20.1. INTRODUCTION

Given a set of random variables, probabilistic modeling consists of acquiring properties of a joint probability distribution of the variables and thereby representing that distribution. These properties can be very important because they often enable us to succinctly represent a distribution and to do inference with the variables. For example, we may be able to concisely represent a joint probability distribution of diseases and manifestations in a medical application and, using this representation, compute the probability that a patient has certain diseases given the patient has some manifestations. First, Section 20.2 gives a brief philosophical overview of the notion of a probability as a relative frequency, which probabilistic modeling using data presupposes. Then, Section 20.3 introduces Bayesian networks and Bayesian network models (also called directed acyclic graph [DAG] models). Next, Section 20.4 discusses learning DAG models. Finally, Section 20.5 shows applications of learning DAG models.

20.2. PHILOSOPHICAL BACKGROUND

The focus of this chapter is on learning DAG models from data. The enterprise of learning something about a probability distribution from data relies on the notion of a probability as a relative frequency. So we first review the relative frequency approach to probability, and then we discuss its relationship to another approach to probability, called subjective or Bayesian.

20.2.1. The Relative Frequency Approach to Probability

In 1919, Richard von Mises developed the relative frequency approach to probability, which concerns repeatable identical experiments. First we describe relative frequencies, and then we discuss how we can learn something about them from data.

20.2.1.1. Relative Frequencies

von Mises (1928/1957) formalized the notion of repeatable identical experiments as follows:
The term is “the collective,” and it denotes a sequence of uniform events or processes which differ by certain observable attributes, say colours, numbers, or anything else. (p. 12, emphasis added)
The classical example of a collective is an infinite sequence of tosses of the same coin. Each time we toss the coin, our knowledge about the conditions of the toss is the same (assuming we do not sometimes “cheat” by, for example, holding it close to the ground and trying to flip it just once). Of course, something is different in the tosses (e.g., the distance from the ground, the torque we put on the coin, etc.) because otherwise, the coin would always land heads or always land tails. But we are not aware of these differences. Our knowledge concerning the conditions of the experiment is always the same. von Mises (1928/1957) argued that, in such repeated experiments, the fraction of occurrence of each outcome approaches a limit, and he called this limit the probability of the outcome. It has become standard to call this limit a relative frequency and to use the term probability in a more general sense.
Note that the collective (infinite sequence) only exists in theory. We never will toss the coin indefinitely. Rather, the theory assumes that there is a propensity for the coin to land heads, and as the number of tosses approaches infinity, the fraction of heads approaches this propensity. For example, if m is the number of times we toss the coin, $S_m$ is the number of heads, and p is the true value of the propensity for the coin to land heads, then
$$\lim_{m \to \infty} \frac{S_m}{m} = p. \tag{1}$$
Because the propensity is a physical property of the coin, it is also called a physical probability. In 1946, J. E. Kerrich conducted many experiments using games of chance (e.g., coin tosses) indicating that the fraction does appear to approach a limit.
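The behavior of the fraction of heads can be illustrated with a short simulation. The following is a minimal sketch, not a reproduction of Kerrich's experiments: a pseudorandom number generator stands in for the physical coin, and the propensity `p = 0.5`, the seed, and the function name `running_fraction_of_heads` are assumptions made only for illustration. A simulation like this can suggest, but cannot establish, that a limit exists.

```python
import random


def running_fraction_of_heads(p, num_tosses, seed=0):
    """Simulate independent tosses with P(heads) = p.

    Returns a list whose entry m - 1 is S_m / m, the fraction of heads
    observed in the first m tosses.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = 0
    running = []
    for m in range(1, num_tosses + 1):
        if rng.random() < p:  # this toss lands heads
            heads += 1
        running.append(heads / m)
    return running


# Assumed propensity of 0.5 (a fair coin); any value in [0, 1] could be used.
running = running_fraction_of_heads(p=0.5, num_tosses=100_000)
for m in (10, 100, 1_000, 10_000, 100_000):
    print(f"m = {m:>6}: S_m/m = {running[m - 1]:.4f}")
```

With more tosses, the printed fraction typically settles closer to the assumed propensity, which is the pattern the relative frequency approach takes as its starting point.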
Note further that a collective is only defined relative to a random process, which, in the von Mises theory, is defined to be a repeatable experiment for which the infinite sequence of outcomes is assumed to be a random sequence. Intuitively, a random sequence is one that shows no regularity or pattern. For example, the finite binary sequence “1011101100” appears random, whereas the sequence “1010101010” does not because it has the pattern “10” repeated five times. There is evidence that experiments such as coin tossing and dice throwing are indeed random processes. Namely, Iversen, Longcor, Mosteller, Gilbert, and Youtz (1971) ran many experiments with dice indicating that the sequence of outcomes is random. It is believed that unbiased sampling also yields a random sequence and is therefore a random process. See van Lambalgen (1987) for a thorough discussion of this matter, including a formal definition of random sequence. Neapolitan (1990) provides a more intuitive, less mathematical treatment. We close here with an example of a nonrandom process. One of the authors prefers to exercise at his health club on Tuesday, Thursday, and Saturday. However, if he misses a day, he usually makes up for it the following day. If we track the days he exercises, we will find a pattern because the process is not random.
Under the assumption that the fraction approaches a limit and that a random sequence is generated, in 1928, von Mises was able to derive the rules of probability theory and the result that the trials are probabilistically independent. In terms of relative frequencies, what does it mean for the trials to be independent? The following example illustrates what it means. Suppose we develop many sequences of length 20 (or any other number), where each sequence represents the result of 20 coin tosses. Then we separate the set of all these sequences into disjoint subsets such that the sequences in each subset all have the same outcome on the first 19 tosses. Independence means that the fraction of heads on the 20th toss is the same in all the subsets (in the limit).
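The subset construction in this example can be sketched in code. To keep the number of subsets small, the sketch below uses sequences of length 4 rather than 20, so there are only 8 possible outcomes for the first 3 tosses; the length, the propensity `p = 0.5`, the number of sequences, and the function name are all assumptions chosen for illustration. The tosses are generated independently by construction, so the sketch shows what independence looks like in the data rather than testing for it.

```python
import random
from collections import defaultdict


def last_toss_fraction_by_prefix(num_sequences, length=4, p=0.5, seed=0):
    """Group simulated toss sequences by their first length - 1 outcomes.

    Returns a dict mapping each observed prefix to the fraction of
    sequences in that subset whose final toss is heads.
    """
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0, 0])  # prefix -> [heads on last toss, total]
    for _ in range(num_sequences):
        seq = tuple(rng.random() < p for _ in range(length))  # True means heads
        prefix, last = seq[:-1], seq[-1]
        counts[prefix][0] += int(last)
        counts[prefix][1] += 1
    return {prefix: heads / total for prefix, (heads, total) in counts.items()}


fractions_by_prefix = last_toss_fraction_by_prefix(num_sequences=200_000)
for prefix in sorted(fractions_by_prefix):
    label = "".join("H" if toss else "T" for toss in prefix)
    print(f"first tosses {label}: fraction of heads on last toss = "
          f"{fractions_by_prefix[prefix]:.3f}")
```

Each subset should show roughly the same fraction of heads on the final toss, regardless of what came before it.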
A common way to define probability in applications such as games of chance is to assign the same probability to all possible elemental outcomes. For example, in the draw of the top card from an ordinary deck of cards, each elemental outcome is assigned a probability of 1/52 because there are 52 different cards. Such probabilities are called ratios. We say we are using the principle of indifference (a term popularized by J. M. Keynes in 1921/1948) when we assign probabilities this way. The probability of a set of elemental outcomes is the sum of the probabilities of the outcomes in the set. For example, the probability of a king is 4/52 because there are four kings. How are relative frequencies related to ratios? Intuitively, we would expect that if, for example, we repeatedly shuffled a deck of cards and drew the top card, the ace of spades would come up about 1 out of every 52 times. In the experiment performed by J. E. Kerrich in 1946 (discussed above), the principle of indifference seemed to apply, and the limit was indeed the value obtained via the principle of indifference.

20.2.1.2. Sampling

Sampling techniques estimate a relative frequency for a given collective from a finite set of observations. In accordance with standard statistical practice, we use the term random sample (or simply sample) to denote the set of observations, and we call a collective a population. Note the difference between a collective and a finite population. There are currently a finite number of smokers in the world. The fraction of them with lung cancer is the probability (in the sense of a ratio) of a current smoker having lung cancer. The propensity (relative frequency) of a smoker having lung cancer may not be exactly equal to this ratio. Rather, the ratio is just an estimate of that propensity. When doing statistical inference, we sometimes want to estimate the ratio in a finite population from a sample of the population, and other times we want to estimate a propensity from a finite sequence of observations. For example, TV raters ordinarily want to estimate the actual fraction of people in a nation watching a show from a sample of those people. On the other hand, medical scientists want to estimate the propensity with which smokers have lung cancer from a finite sequence of smokers. One can create a collective from a finite population by returning a sampled item back to the population before sampling the next item. This is called sampling with replacement. In practice, it is rarely done, but ordinarily, the finite population is so large that statisticians make the simplifying assumption that sampling is done with replacement. That is, they do not replace the item, but they still assume the finite population is unchanged for the next item sampled. In this chapter, we are always concerned with propensities rather than current ratios, so this simplifying assumption does not concern us.
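The simplifying assumption about replacement can be illustrated with a small simulation. This is a sketch under invented numbers: the population size, the count of individuals with the trait, the sample size, and the function name `estimate_ratio` are hypothetical and are not taken from any real data set.

```python
import random


def estimate_ratio(population_size, num_with_trait, sample_size, replace, seed=0):
    """Estimate the fraction of a finite population that has a trait.

    Individuals are labeled 0 .. population_size - 1, and by construction
    those labeled below num_with_trait are the ones with the trait.
    """
    rng = random.Random(seed)
    if replace:
        # Sampling with replacement: the same individual may be drawn twice.
        draws = [rng.randrange(population_size) for _ in range(sample_size)]
    else:
        # Sampling without replacement: all drawn individuals are distinct.
        draws = rng.sample(range(population_size), sample_size)
    return sum(1 for i in draws if i < num_with_trait) / sample_size


# Hypothetical population in which the true ratio is 0.1.
POPULATION_SIZE = 1_000_000
NUM_WITH_TRAIT = 100_000
SAMPLE_SIZE = 10_000

with_replacement = estimate_ratio(
    POPULATION_SIZE, NUM_WITH_TRAIT, SAMPLE_SIZE, replace=True)
without_replacement = estimate_ratio(
    POPULATION_SIZE, NUM_WITH_TRAIT, SAMPLE_SIZE, replace=False)
print(f"with replacement:    {with_replacement:.4f}")
print(f"without replacement: {without_replacement:.4f}")
```

When the sample is a small fraction of the population, as it is here, the two schemes should give estimates that differ only by ordinary sampling variation, which is why the simplifying assumption is usually harmless.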
Estimating a relative frequency from a sample seems straightforward. That is, we simply use $S_m/m$ as our estimate, where m is the number of trials and $S_m$ is the number of successes. However, there is a problem in determining our confidence in the estimate. That is, the von Mises theory only says the limit in Equality 1 physically exists and is p. It is not a mathematical limit in that, given an ε > 0, it offers no means for finding an M(ε) such that
$$\left| \frac{S_m}{m} - p \right| < \varepsilon \quad \text{for all } m > M(\varepsilon).$$
Mathematical probability theory enables us to determine confidence in our estimate of p. First, if we assume the trials are probabilistically independent and the probability for each trial is p, we can prove that $S_m/m$ is the maximum likelihood (ML) value of p. That is, if d is a set of results of m trials, and $P(d : \hat{p})$ denotes the probability of d if the probability of success were $\hat{p}$, then $S_m/m$ is the value of $\hat{p}$ that maximizes $P(d : \hat{p})$. Furthermore, we can prove the weak and strong laws of large numbers. The weak law says the following. Given ε, δ > 0,
$$P\left( \left| \frac{S_m}{m} - p \right| < \varepsilon \right) > 1 - \delta \quad \text{for all } m > M(\varepsilon, \delta),$$
for a threshold M(ε, δ) that depends only on ε and δ.
So mathematically, we have a means of finding an M(ε, δ).
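One way to obtain such a threshold is through Chebyshev's inequality. Because the variance of $S_m/m$ is p(1 − p)/m, which is at most 1/(4m), the probability that $S_m/m$ differs from p by ε or more is at most 1/(4mε²), so any m greater than 1/(4δε²) suffices. The sketch below uses that bound and checks it by simulation. The bound is one standard, conservative choice and is not necessarily the threshold the chapter has in mind; the function names and the values of ε, δ, and p are assumptions made for illustration.

```python
import math
import random


def chebyshev_sample_size(eps, delta):
    """Smallest integer m with m > 1 / (4 * delta * eps**2),
    up to floating-point rounding. This is a Chebyshev-based threshold."""
    return math.floor(1.0 / (4.0 * delta * eps ** 2)) + 1


def coverage(p, m, eps, num_repeats, seed=0):
    """Fraction of simulated samples of size m with |S_m/m - p| < eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_repeats):
        s_m = sum(1 for _ in range(m) if rng.random() < p)  # number of successes
        if abs(s_m / m - p) < eps:
            hits += 1
    return hits / num_repeats


eps, delta = 0.03, 0.1  # assumed tolerance and failure probability
m = chebyshev_sample_size(eps, delta)
observed = coverage(p=0.3, m=m, eps=eps, num_repeats=500)  # p = 0.3 is assumed
print(f"Chebyshev threshold: m = {m}")
print(f"observed coverage: {observed:.3f} (guaranteed to exceed {1 - delta})")
```

The observed coverage is usually well above 1 − δ, which reflects how loose the Chebyshev bound is. That looseness is one reason the bound is not a practical route to confidence statements.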
The weak law is not applied directly to obtain confidence in our estimate. Rather, we obtain a confidence interval using the following res...
