Thirty-five years ago, the four authors of this book addressed the problems of validity in social science research. They were interested in new and unused methods for obtaining information. The original edition and an expanded version have often been cited as justification for using novel means to supplement, if not replace, conventional techniques, especially survey and archival research. Illustrations abound in this book. While the novelty of the illustrations will keep many a graduate student amused, the more serious purpose is to authorize and motivate ingenuity in obtaining information. Even more fundamental is the strategy of combining very different methods so that research results can, by triangulation, withstand "threats to validity" that so frequently invalidate single-measure, conventional research.
This survey directs attention to social science research data not obtained by interview or questionnaire. Some may think this exclusion does not leave much. It does. Many innovations in research method are to be found scattered throughout the social science literature. Their use, however, is unsystematic, their importance understated. Our review of this material is intended to broaden the social scientist's currently narrow range of utilized methodologies and to encourage creative and opportunistic exploitation of unique measurement possibilities.
Today, the dominant mass of social science research is based upon interviews and questionnaires. We lament this overdependence upon a single, fallible method. Interviews and questionnaires intrude as a foreign element into the social setting they would describe, they create as well as measure attitudes, they elicit atypical roles and responses, they are limited to those who are accessible and will cooperate, and the responses obtained are produced in part by dimensions of individual differences irrelevant to the topic at hand.
But the principal objection is that they are used alone. No research method is without bias. Interviews and questionnaires must be supplemented by methods testing the same social science variables but having different methodological weaknesses.
In sampling the range of alternative approaches, we examine their weaknesses, too. The flaws are serious and give insight into why we do depend so much upon the interview. But the issue is not choosing among individual methods. Rather, it is the necessity for a multiple operationism, a collection of methods combined to avoid sharing the same weaknesses. The goal of this monograph is not to replace the interview but to supplement and cross-validate it with measures that do not require the cooperation of a respondent and that do not themselves contaminate the response.
Here are some samples of the kinds of methods we will be surveying in Chapters 2 through 6 of this monograph:
The floor tiles around the hatching-chick exhibit at Chicago's Museum of Science and Industry must be replaced every six weeks. Tiles in other parts of the museum need not be replaced for years. The selective erosion of tiles, indexed by the replacement rate, is a measure of the relative popularity of exhibits.
The accretion rate is another measure. One investigator wanted to learn the level of whisky consumption in a town which was officially "dry." He did so by counting empty bottles in ashcans.
The degree of fear induced by a ghost-story-telling session can be measured by noting the shrinking diameter of a circle of seated children.
Chinese jade dealers have used the pupil dilation of their customers as a measure of the client's interest in particular stones, and Darwin in 1872 noted this same variable as an index of fear.
Library withdrawals were used to demonstrate the effect of the introduction of television into a community. Fiction titles dropped, nonfiction titles were unaffected.
The role of rate of interaction in managerial recruitment is shown by the overrepresentation of baseball managers who were infielders or catchers (high-interaction positions) during their playing days.
Sir Francis Galton employed surveying hardware to estimate the bodily dimensions of African women whose language he did not speak.
The child's interest in Christmas was demonstrated by distortions in the size of Santa Claus drawings.
Racial attitudes in two colleges were compared by noting the degree of clustering of Negroes and whites in lecture halls.
These methods have been grouped into chapters by the characteristic of the data: physical traces, archives, observations.
Before making a detailed examination of such methods, it is well to present a closer argument for the use of multiple methods and to present a methodological framework within which both the traditional and the more novel methods can be evaluated.
The reader may skip directly to Sherlock Holmes and the opening of Chapter 2 if he elects, infer the criteria in a piece of detection himself, and then return for a validity check.
OPERATIONISM AND MULTIPLE OPERATIONS
The social sciences are just emerging from a period in which the precision of carefully specified operations was confused with operationism by definitional fiat, an effort now increasingly recognized as an unworkable model for science. We wish to retain and augment the precision without bowing to the fiat.
The mistaken belief in the operational definition of theoretical terms has permitted social scientists a complacent and self-defeating dependence upon single classes of measurement, usually the interview or questionnaire. Yet the operational implication of the inevitable theoretical complexity of every measure is exactly opposite; it calls for a multiple operationism, that is, for multiple measures which are hypothesized to share in the theoretically relevant components but have different patterns of irrelevant components (e.g., Campbell, 1960; Campbell & Fiske, 1959; Garner, 1954; Garner, Hake, & Eriksen, 1956; Humphreys, 1960).
Once a proposition has been confirmed by two or more independent measurement processes, the uncertainty of its interpretation is greatly reduced. The most persuasive evidence comes through a triangulation of measurement processes. If a proposition can survive the onslaught of a series of imperfect measures, with all their irrelevant error, confidence should be placed in it. Of course, this confidence is increased by minimizing error in each instrument and by a reasonable belief in the different and divergent effects of the sources of error.
A consideration of the laws of physics, as they are seen in that science's measuring instruments, demonstrates that no theoretical parameter is ever measured independently of other physical parameters and other physical laws. Thus, a typical galvanometer responds in its operational measurement of voltage not only according to the laws of electricity but also to the laws of gravitation, inertia, and friction. By reducing the mass of the galvanometer needle, by orienting the needle's motion at right angles to gravity, by setting the needle's axis in jeweled bearings, by counterweighting the needle point, and by other refinements, the instrument designer attempts to minimize the most important of the irrelevant physical forces for his measurement purposes. As a result, the galvanometer reading may reflect, almost purely, the single parameter of voltage (or amperage, etc.).
Yet from a theoretical point of view, the movement of the needle is always a complex product of many physical forces and laws. The adequacy with which the needle measures the conceptually defined variable is a matter for investigation; the operation itself is not the ultimate basis for defining the variable. Excellent illustrations of the specific imperfections of measuring instruments are provided by Wilson (1952).
Starting with this example from physics and the construction of meters, we can see that no meter ever perfectly measures a single theoretical parameter; all series of meter readings are imperfect estimates of the theoretical parameters they are intended to measure.
Truisms perhaps, yet they belie the mistaken concept of the "operational definition" of theoretical constructs which continues to be popular in the social sciences. The inappropriateness is accentuated in the social sciences because we have no measuring devices as carefully compensated to control all irrelevancies as is the galvanometer. There simply are no social science devices designed with so perfect a knowledge of all the major relevant sources of variation. In physics, the instruments we think of as "definitional" reflect magnificently successful theoretical achievements and themselves embody classical experiments in their very operation. In the social sciences, our measures lack such control. They tap multiple processes and sources of variance of which we are as yet unaware. At such a stage of development, the theoretical impurity and factorial complexity of every measure are not niceties for pedantic quibbling but are overwhelmingly and centrally relevant in all measurement applications which involve inference and generalization.
Efforts in the social sciences at multiple confirmation often yield disappointing and inconsistent results. Awkward to write up and difficult to publish, such results confirm the gravity of the problem and the risk of false confidence that comes with dependence upon single methods (Campbell, 1957; Campbell & Fiske, 1959; Campbell & McCormack, 1957; Cook & Selltiz, 1964; Kendall, 1963; Vidich & Shapiro, 1955). When multiple operations provide consistent results, the possibility of slippage between conceptual definition and operational specification is diminished greatly.
This is not to suggest that all components of a multimethod approach should be weighted equally. Prosser (1964) has observed: ". . . but there is still no man who would not accept dog tracks in the mud against the sworn testimony of a hundred eye-witnesses that no dog had passed by" (p. 216). Components ideally should be weighted according to the amount of extraneous variation each is known to have and, taken in combination, according to their independence from similar sources of bias.
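One hedged way to make this weighting concrete is inverse-variance weighting, a standard statistical technique that the text itself does not name. The sketch below assumes that each measure's extraneous (error) variance is known and that the errors of the measures are independent of one another; the function name and the figures are illustrative only.

```python
def combine_measures(estimates, error_variances):
    """Pool several estimates of one quantity, weighting each by 1 / error variance.

    Assumption (not from the text): errors are independent across measures.
    """
    weights = [1.0 / v for v in error_variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    # Under independence, the pooled variance is never larger than the
    # smallest error variance among the inputs.
    pooled_variance = 1.0 / total
    return pooled, pooled_variance


# Hypothetical example: a noisy interview score and a cleaner trace measure.
pooled, variance = combine_measures([14.0, 10.0], [4.0, 1.0])
print(pooled, variance)
```

If the measures share a source of bias, the independence assumption fails and the pooled variance understates the real uncertainty, which is one reason to prefer measures with different weaknesses.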
INTERPRETABLE COMPARISONS AND PLAUSIBLE RIVAL HYPOTHESES
In this monograph, we deal with methods of measurement appropriate to a wide range of social science studies. Some of these studies are comparisons of a single group or unit at two or more points in time; others compare several groups or units at one time; others purport to measure but a single unit at a single point in time; and, to close the circle, some compare several groups at two or more points in time. In this discussion, we assume that the goal of the social scientist is always to achieve interpretable comparisons, and that the goal of methodology is to rule out those plausible rival hypotheses which make comparisons ambiguous and tentative.
Often it seems that absolute measurement is involved, and that a social instance is being described in its splendid isolation, not for comparative purposes. But a closer look shows that absolute, isolated measurement is meaningless. In all useful measurement, an implicit comparison exists when an explicit one is not visible. "Absolute" measurement is a convenient fiction and usually is nothing more than a shorthand summary in settings where plausible rival hypotheses are either unimportant or so few, specific, and well known as to be taken into account habitually. Thus, when we report a length "absolutely" in meters or feet, we immediately imply comparisons with numerous familiar objects of known length, as well as comparisons with a standard preserved in some Paris or Washington sanctuary.
If measurement is regarded always as a comparison, there are three classes of approaches which have come to be used in achieving interpretable comparisons. First, and most satisfactory, is experimental design. Through deliberate randomization, the ceteris of the pious ceteris paribus prayer can be made paribus. This may require randomization of respondents, occasions, or stimulus objects. In any event, the randomization strips of plausibility many of the otherwise available explanations of the difference in question. It is a sad truth that randomized experimental design is possible for only a portion of the settings in which social scientists make measurements and seek interpretable comparisons. The number of opportunities for its use may not be staggering, but, where possible, experimental design should by all means be exploited. Many more opportunities exist than are used.
Second, a quite different and historically isolated tradition of comparison is that of index numbers. Here, sources of variance known to be irrelevant are controlled by transformations of raw data and weighted aggregates. This is analogous to the compensated and counterbalanced meters of physical science which also control irrelevant sources of variance. The goal of this old and currently neglected social science tradition is to provide measures for meaningful comparisons across wide spans of time and social space. Real wages, intelligence quotients, and net reproductive rates are examples, but an effort in this direction is made even when a percentage, a per capita, or an annual rate is computed. Index numbers cannot be used uncritically because the imperfect knowledge of the laws invoked in any such measurement situation precludes computing any effective all-purpose measures.
Furthermore, the use of complex compensated indices in the assurance that they measure what they are devised for has in many instances proved quite misleading. A notable example is found in the definitional confusion surrounding the labor force concept (Jaffe & Stewart, 1951; Moore, 1953). Often a relationship established between an over-all index and external variables is found due to only one component of the index. Cronbach (1958) has described this problem well in his discussion of dyadic scores of interpersonal perception. In the older methodological literature, the problem is raised under the term index correlations (e.g., Campbell, 1955; Guilford, 1954; Stouffer, 1934).
Despite these limitations, the problem of index numbers, which once loomed large in sociology and economics, deserves to be reactivated and integrated into modern social science methodology. The tradition is relevant in two ways for the problems of this monograph. Many of the sources of data suggested here, particularly secondary records, require a transformation of the raw data if they are to be interpretable in any but truly experimental situations. Such transformations should be performed with the wisdom accumulated within the older tradition, as well as with a regard for the precautionary literature just cited. Properly done, such transformations often improve interpretability even if they fall far short of some ideal (cf. Bernstein, 1935).
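Two of the simplest transformations of this kind, a real (deflated) wage and a per capita rate, can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up figures; it assumes a price index expressed relative to a base of 100.

```python
def real_value(nominal, price_index, base=100.0):
    """Deflate a nominal amount by a price index quoted relative to `base`."""
    return nominal * base / price_index


def per_capita_rate(count, population, per=1000.0):
    """Express a raw count as a rate per `per` members of the population."""
    return count * per / population


# Hypothetical figures: wages rose from 3000 to 4500 while prices rose 25%.
wage_then = real_value(3000.0, 100.0)
wage_now = real_value(4500.0, 125.0)
print(wage_then, wage_now)

# Hypothetical figures: 300 events in a town of 60,000 people.
print(per_capita_rate(300, 60_000))
```

The nominal wage rises by half, but the deflated wage rises by only a fifth; the transformation removes price change as a rival explanation for the difference.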
A second value of the literature on index numbers lies in an examination of the types of irrelevant variation which the index computation sought to exclude. The construction of index numbers is usually a response to criticisms of less sophisticated indices. They thus embody a summary of the often unrecorded criticisms of prior measures. In the criticisms and the corrections are clues to implicit or explicit plausible rival interpretations of differences, the viable threats to valid interpretation.
Take so simple a measure as an index of unemployment or of retail sales. The gross number of the unemployed or the gross total dollar level of sales is useless if one wants to make comparisons within a single year. Some of the objections to the gross figures are reflected in the seasonal corrections applied to time-series data. If we look at only the last quarter of the year, we can see that the effect of weather must be considered. Systematically, winter depresses the number of employed construction workers, for example, and increases the unemployment level. Less systematically, spells of bad weather keep people in their homes and reduce the amount of retail shopping. Both periodic and aperiodic elements of the weather should be considered if one wants a more stable and interpretable measure of unemployment or sales. So, too, our custom of giving gifts at Christmas spurs December sales, as does the coinciding custom of Christmas bonuses to employees. All of these are accounted for, crudely, by a correction applied to the gross levels for either December or the final quarter of the year.
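A crude seasonal correction of the kind described can be sketched as a ratio-to-mean adjustment. This is an assumption-laden illustration, not the procedure any statistical agency uses: it ignores trend, handles only the periodic component, and the quarterly sales figures are invented.

```python
from statistics import mean


def seasonal_factors(series, period):
    """Average level of each season divided by the overall average level."""
    overall = mean(series)
    return [mean(series[i::period]) / overall for i in range(period)]


def deseasonalize(series, period):
    """Divide each observation by the factor for its season."""
    factors = seasonal_factors(series, period)
    return [x / factors[i % period] for i, x in enumerate(series)]


# Hypothetical quarterly retail sales for two years; Q4 carries the Christmas bulge.
sales = [100, 110, 105, 160, 104, 114, 109, 166]
factors = seasonal_factors(sales, period=4)
adjusted = deseasonalize(sales, period=4)
print([round(f, 3) for f in factors])
print([round(a, 1) for a in adjusted])
```

After adjustment, the fourth-quarter figures no longer tower over the others, so a quarter-to-quarter comparison within a year reflects something other than the calendar.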
Some of these sources of invalidity are too specific to a single setting to be generalized usefully; others are too obvious to be catalogued. But some contribute to a general enumeration of recurrent threats to valid interpretation in social science measures.
The technical problems of index-number construction are heroic. "The index number should give consistent results for different base periods and also with its counterpart price or quantity index. No reasonably simple formula satisfies both of these consistency requirements" (Ekelblad, 1962, p. 726). The consistency problem is usually met by substituting a geometric mean for an arithmetic one, but then other problems arise. With complex indices of many components, there is the issue of getting an index that will yield consistent scores across all the different levels and times of the components.
In his important work on economic cycles, Hansen (1921) wrote, "Here is a heterogeneous group of statistical series all of which are related in a causal way, somehow or another, to the cycle of prosperity and depression" (p. 21). The search for a metric to relate these different components consistently, to be able to reverse factors without chaos, makes index construction a difficult task. But the payoff is great, and the best approximation to solving both the base-reversal and factor-reversal issues is a weighted aggregate with time-averaged weights. For good introductory statements of these and other index-number issues, see Ekelblad (1962), Yule and Kendall (1950), and Zeisel (1957). More detailed treatments can be found in Fisher (1923), Mills (1927), Mitchell (1921), and Mudgett (1951).
The third general approach to comparison may be called that of "plausible rival hypotheses." It is the most general and least formal of the three and is applicable to the other two. Given a comparison which a social scientist wishes to ...
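The two consistency requirements can be checked numerically. The sketch below uses standard textbook formulas that are not spelled out in the text: the Laspeyres and Paasche aggregates, their geometric mean (the Fisher "ideal" index), and an aggregate weighted by quantities averaged over the two periods (often called the Marshall-Edgeworth index). Prices and quantities are invented for illustration.

```python
from math import sqrt


def aggregate(prices, quantities):
    """Total value of a basket: sum of price times quantity."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, p1, q0):
    """Price index from period 0 to 1 using base-period quantities as weights."""
    return aggregate(p1, q0) / aggregate(p0, q0)


def paasche(p0, p1, q1):
    """Price index from period 0 to 1 using current-period quantities as weights."""
    return aggregate(p1, q1) / aggregate(p0, q1)


def fisher(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


def averaged_weights(p0, p1, q0, q1):
    """Weighted aggregate whose weights are quantities averaged over both periods."""
    w = [(a + b) / 2 for a, b in zip(q0, q1)]
    return aggregate(p1, w) / aggregate(p0, w)


# Hypothetical two-good economy.
p0, p1 = [1.0, 2.0], [2.0, 3.0]
q0, q1 = [10.0, 5.0], [6.0, 8.0]

# Base (time) reversal: forward index times backward index should equal 1.
print(laspeyres(p0, p1, q0) * laspeyres(p1, p0, q1))
print(fisher(p0, p1, q0, q1) * fisher(p1, p0, q1, q0))
print(averaged_weights(p0, p1, q0, q1) * averaged_weights(p1, p0, q1, q0))

# Factor reversal: price index times quantity index should equal the value ratio.
value_ratio = aggregate(p1, q1) / aggregate(p0, q0)
print(fisher(p0, p1, q0, q1) * fisher(q0, q1, p0, p1), value_ratio)
```

With these figures the Laspeyres index fails the base-reversal check, while the geometric-mean and averaged-weight forms pass it. Only the geometric-mean form is checked here against factor reversal.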
Table of contents
Cover Page
Title Page
Copyright Page
Contents
Preface to the First Edition (1965)
Introduction to the Classic Edition of Unobtrusive Measures
1. Approximations to Knowledge
2. Physical Traces: Erosion and Accretion
3. Archives I: The Running Record
4. Archives II: The Episodic and Private Record
5. Simple Observation
6. Contrived Observation: Hidden Hardware and Control
7. A Final Note
8. A Statistician on Method
9. Cardinal Newman's Epitaph
References
Further Reading
Index
About the Authors
Unobtrusive Measures, by Eugene J. Webb, Donald T. Campbell, Richard D. Schwartz, and Lee Sechrest.