Psychology

Reliability and Validity

Reliability refers to the consistency and stability of measurement, indicating that the same results would be obtained if the measurement were repeated. Validity, on the other hand, refers to the accuracy and truthfulness of a measurement, ensuring that it measures what it is intended to measure. In psychological research, both reliability and validity are crucial for ensuring the trustworthiness and meaningfulness of study findings.

Written by Perlego with AI-assistance

11 Key excerpts on "Reliability and Validity"

  • Encyclopedia of Psychological Assessment
    As we seek to learn more about the validity of a test for a particular purpose, we gradually improve our ability to measure psychological constructs, which remain some of the most intractable constructs of modern science. Psychological assessment is at the heart of scientific psychology. Validity is the heart of psychological assessment.
    References
    American Educational Research Association, American Psychological Association & National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
    Campbell, D.T. & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait–multimethod matrix. Psychological Bulletin, 56, 81–105.
    Cronbach, L.J. (1971). Test validation. In Thorndike, R.L. (Ed.), Educational Measurement (2nd ed., pp. 443–507). Washington, DC: American Council on Education.
    Huff, K.L. & Sireci, S.G. (2001). Validity issues in computer-based testing. Educational Measurement: Issues and Practice, 20, 16–25.
    Kane, M.T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.
    Messick, S. (1989). Validity. In Linn, R. (Ed.), Educational Measurement (3rd ed., pp. 13–100). Washington, DC: American Council on Education.
    Pitoniak, M.J., Sireci, S.G. & Luecht, R.M. (2002). A multitrait–multimethod validity investigation of scores from a professional licensure exam. Educational and Psychological Measurement, 62, 498–516.
    Reise, S.P., Widaman, K.F. & Pugh, R.H. (1993). Confirmatory factor analysis and item response theory: two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552–566.
    Sheehan, K.M. (1997). A tree-based approach to proficiency scaling and diagnostic assessment. Journal of Educational Measurement, 34, 333–352.
    Shepard, L.A. (1993). Evaluating test validity. Review of Research in Education, 19, 405–450.
    Sireci, S.G. (1998a). Gathering and analyzing content validity data.
  • Essentials of Psychological Testing
    • Susana Urbina, Alan S. Kaufman, Nadeen L. Kaufman (Authors)
    • 2014 (Publication Date)
    • Wiley
      (Publisher)
    Four Essentials of Reliability
    The term reliability suggests trustworthiness. To the extent that decisions of any kind are to be made, wholly or in part, on the basis of test scores, test users need to make sure that the scores are reasonably trustworthy. When used in connection with tests and measurements, reliability is based on the consistency and precision of the results of the measurement process. In order to have some degree of confidence or trust in scores, test users require evidence to the effect that the scores obtained from tests would be consistent if the tests were repeated on the same individuals or groups and that the scores are reasonably precise.
    Whereas reliability in measurement implies consistency and precision, lack of reliability implies inconsistency and imprecision, both of which are equated with measurement error. In the context of testing, measurement error may be defined as any fluctuation in scores that results from factors related to the measurement process that are irrelevant to what is being measured. Reliability, then, is a quality of test scores that suggests they are sufficiently consistent and free from measurement error to be useful.
    Note that, in order to be useful, test scores do not need to be either totally consistent or error free. As we saw in Chapter 1, even in the physical sciences—some of which can boast of incredibly reliable instrumentation—measurements are always subject to some degree of error and fluctuation. In the social and behavioral sciences, measurements are much more prone to error due to the elusive nature of the constructs that are assessed and to the fact that the behavioral data through which they are assessed can be affected by many more intractable factors than other types of data (see Rapid Reference 5.2 on Deconstructing Constructs in Chapter 5).
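The point that scores can be useful despite some error can be made concrete with classical test theory, in which an observed score is modeled as a true score plus random error, and reliability is the share of observed-score variance attributable to true scores. A minimal simulation sketch (all numbers illustrative, not from the excerpt):

```python
import random

random.seed(0)

# Classical test theory: observed score = true score + random error.
# With a true-score SD of 15 and an error SD of 5, reliability should
# be about 15**2 / (15**2 + 5**2) = 0.90.
true_scores = [random.gauss(100, 15) for _ in range(10_000)]
observed = [t + random.gauss(0, 5) for t in true_scores]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability = true-score variance / observed-score variance.
reliability = variance(true_scores) / variance(observed)
print(f"estimated reliability: {reliability:.2f}")
```

The error term never disappears; it is only kept small enough, relative to true-score variance, for the scores to remain trustworthy.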
  • Advanced Research Methods for the Social and Behavioral Sciences
    Part One: Performing Good Research
    2. Reliability and Validity of Measurement in the Social and Behavioral Sciences
    Michael F. Wagner, John J. Skowronski
    As scientists, one of our tasks is to measure stuff. Chemists measure atomic weight; physicists measure the speed of light. Developing good measures is crucial to science. Science was dramatically advanced by the electron microscope, the Hubble telescope, and functional magnetic resonance imaging (fMRI). Good measures are crucial to good science. Two principles are fundamental to good measurement. The first is that a measure must be valid: It must measure what it claims to measure (e.g., Kelly, 1927). The second fundamental principle is that a measure must be reliable: The measure ought to produce about the same reading each time it is used (Nunnally, 1978). Some argue that these are linked: A measure needs to exhibit reliability before it can be considered to be valid. For example, a scale with a loose spring that produces wildly different readings each time it weighs the same object would probably not produce a valid weight for the object (although a valid weight might be produced once in a while). However, note that even perfect reliability does not ensure validity: A scale that is always 50 pounds off may be reliable, but it does not produce a valid measure of an object’s weight. Stated more formally, at least some degree of reliability is a necessary condition, but not a sufficient condition, for the validity of a measure. Establishing that measures are reliable and valid is important to the extent to which one trusts research results. For example, many are attracted to implicit measures of memories, beliefs, and attitudes. Some believe that implicit measures assess knowledge that is not easily available via responses to self-report scales (implicit knowledge, e.g., Dienes, Scott, & Wan, 2011).
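The two faulty bathroom scales in this passage can be simulated directly. A short sketch (hypothetical numbers) showing why consistency (reliability) does not guarantee accuracy (validity):

```python
import random

random.seed(1)

TRUE_WEIGHT = 150.0

# A reliable but invalid scale: very consistent readings with a
# constant 50-pound bias, like the scale that is "always 50 pounds off".
biased_scale = [TRUE_WEIGHT + 50 + random.gauss(0, 0.1) for _ in range(100)]

# An unreliable scale: unbiased on average, but wildly inconsistent,
# like the scale with the loose spring.
noisy_scale = [TRUE_WEIGHT + random.gauss(0, 25) for _ in range(100)]

def spread(xs):
    # Standard deviation of readings: low spread = high reliability.
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def bias(xs):
    # Mean distance from the true value: low bias = high validity.
    return sum(xs) / len(xs) - TRUE_WEIGHT

print(f"biased scale: spread={spread(biased_scale):.1f}, bias={bias(biased_scale):+.1f}")
print(f"noisy scale:  spread={spread(noisy_scale):.1f}, bias={bias(noisy_scale):+.1f}")
```

The biased scale shows near-zero spread (perfectly reliable) yet a large bias (invalid), while the noisy scale shows the reverse, matching the text's claim that reliability is necessary but not sufficient for validity.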
  • Planning an Applied Research Project in Hospitality, Tourism, and Sports
    • Frederic B. Mayo (Author)
    • 2013 (Publication Date)
    • Wiley
      (Publisher)
    The basic concept behind validity is the notion of accuracy and honesty—does the research design do what it says it does? In everyday terms, validity refers to the honesty and accuracy of a statement and its author. It denotes that something can be trusted to be true. A valid problem is a real problem; a valid statement is an honest one that can be supported and backed up. For example, the statement “Customer loyalty is an important issue for hotel general managers” is a valid statement, but the statement “Customer loyalty is the only important issue for hotel general managers” is not. Or “Cultural heritage tourism draws a large number of tourists to New York City” is valid, but “Cultural heritage tourism is the major reason that visitors come to New York City” is not. “Many New York City residents attend Yankee games to cheer their team” is valid, but “New York City residents are fanatical about the New York Yankees” is not. The first statements are broad and general, but the second statements make assertions about customer loyalty, tourism motivation, and sports fans that cannot be supported by data and are, therefore, not valid. In discussions of validity, many investigators have adopted a range of synonyms and related terms to describe validity. Some of them include quality, rigor, and trustworthiness (Golafshani 602). The central issue, however, remains determining the extent to which a statement is or must be true (Schwandt 267). Validity has been commonly divided into internal validity and external validity. As an aspect of research, internal validity refers to the clarity, focus, and integrity of the research design. Does it actually do what it says it does? Did it actually follow the steps described? External validity, on the other hand, derives from the context within which research is conducted.
  • Measurement Theory and Applications for the Social Sciences
    In doing so, I put these into historical context, briefly explaining how the different conceptualizations came about. I then discuss current conceptualizations of validity, with an emphasis on how these differ from the traditional views. In the final section, I turn to the types of validity evidence emphasized in the most recent edition (2014) of the Standards for Educational and Psychological Testing (referred to hereafter as the Standards), which is a joint publication of the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME), and is widely considered to be one of the most authoritative sources on measurement standards in the social sciences.
    VALIDITY DEFINED
    Currently, no definition of validity is accepted by all players in the validity debate. The following definition is taken from the Standards: “Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (AERA, APA, & NCME, p. 11). Validity thus has to do with the underlying rationale for our interpretations and uses of test scores. In other words, validity has to do with both the meaning of test scores and how we use them. As such, validity is justifiably “the most fundamental consideration in developing and evaluating tests,” as stated in the Standards (p. 11). Returning to my earlier example, I might conclude, based on the extroversion test score described previously, that I am much more extroverted than I had ever realized. However, making such an inference from an online test of dubious origin may not be justified. Psychometrically speaking, it may be that the test simply does not yield scores that allow for such an inference.
  • Introduction to Research in Education
    • Donald Ary, Lucy Jacobs, Christine Sorensen Irvine, David Walker (Authors)
    • 2018 (Publication Date)
    Chapter 6: Validity and Reliability
    6-1 Validity
    Validity is the most important consideration in developing and evaluating measuring instruments. Historically, validity was defined as the extent to which an instrument measured what it claimed to measure. The focus of recent views of validity is not on the instrument itself but on the interpretation and meaning of the scores derived from the instrument. The most recent Standards for Educational and Psychological Testing (2014), prepared by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, operationalizes validity as “the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (p. 9). Measuring instruments yield scores; however, the important issue is the interpretation we make of the scores, which may or may not be valid. For example, a fourth-grade math test that might allow a teacher to make valid interpretations about the math achievement of her fourth-grade students would not yield valid interpretations about the fourth-graders’ abilities to solve algebra problems. If we tried to use the math achievement test for this purpose, the interpretations about the students’ ability to solve algebra problems—not the test—would be invalid.
  • Educational Research: A Contextual Approach
    • Ken Springer (Author)
    • 2015 (Publication Date)
    • Wiley
      (Publisher)
    Chapter 6: Validity and Reliability
    After studying this chapter, you will be able to answer the following questions:
    • What is validity and why is it desirable?
    • What are the main types of validity and how are they determined?
    • What is reliability and why is it desirable?
    • What are the main types of reliability and how are they calculated?
    This chapter will prepare you to do the following:
    • Evaluate the validity and reliability of measures in research reports
    • Select appropriate measures for a study
    • Use and interpret measures in a valid and reliable way
    In Chapter 5 you studied the major types of scales and measures used in educational research, and you learned some of the basic concepts that inform the use of these materials. This chapter focuses on validity and reliability, two concepts of particular importance to researchers when they create, use, and evaluate quantitative measures.
    Validity: An Overview
    Imagine taking a test that included the following questions:
    • Why are beer cans tapered on the ends?
    • How many piano tuners are there in the world?
    • If you could remove any one of the 50 U.S. states, which would it be?
    These questions have been used in job interviews conducted by Microsoft Corporation (Poundstone, 2003). Clearly, a set of questions like these would not be a good test of general knowledge. Correct answers to the first two questions would depend on knowledge of some fairly obscure facts, and it is difficult to imagine how the third question could have a correct answer. Even so, from the perspective of an interviewer, these questions might constitute a good test of creativity when solving unusual problems in stressful situations. This example illustrates the idea that the interpretation of a measure is influenced not only by the content of the measure but also by how the results are put to use.
  • RESEARCH METHODS FOR BEHAVIORAL SCIENCE
    Chapter 5: Reliability and Validity
    Reliability
    The reliability of a measure refers to the extent to which it is free from random error. One direct way to determine the reliability of a measured variable is to measure it more than once. For instance, you can test the reliability of a bathroom scale by weighing yourself on it twice in a row. If the scale gives the same weight both times (we’ll assume your actual weight hasn’t changed in between), you would say that it is reliable. But if the scale gives different weights each time, you would say that it is unreliable. Just as a bathroom scale is not useful if it is not consistent over time, an unreliable measured variable will not be useful in research. The next section reviews the different approaches to assessing a measure’s reliability; these are summarized in Table 5.1.
    Test–Retest Reliability
    Test–retest reliability refers to the extent to which scores on the same measured variable correlate with each other on two different measurements given at two different times. If the test is perfectly reliable, and if the scores on the conceptual variable do not change over the time period, the individuals should receive the exact same score each time, and the correlation between the scores will be r = 1.00. However, if the measured variable contains random error, the two scores will not be as highly correlated. Higher positive correlations between the scores at the two times indicate higher test–retest reliability. Although the test–retest procedure is a direct way to measure reliability, it does have some limits. For one thing, when the procedure is used to assess the reliability of a self-report measure, it can produce reactivity.
  • Using Qualitative Methods in Psychology
    • Mary Kopala, Lisa A. Suzuki (Authors)
    • 1999 (Publication Date)
    I have chosen the first two—trustworthiness and reflexivity—because of the consensus about these as hallmarks of quality work, and the third—representation—because it seems to be a crucial, next issue in the field. It is important to note at the start that whether the terms reliability and validity belong in considerations of qualitative research is debatable. (For examples of divergent views, see Becker [1996], Lather [1993], and Wolcott [1990].) After all, these criteria have traditionally been used to assess the quality of quantitative research. Traditionally, reliability is described as the extent to which a research endeavor and findings can be replicated; validity refers to the extent to which findings can be considered true (Stiles, 1993). As these terms have been defined and used in discussions of quantitative work, they are not truly appropriate for discussing qualitative research. Nevertheless, I choose to begin my consideration of quality in qualitative research with an exploration of these terms partly because most psychologists are familiar with, and have been trained to evaluate, research using these criteria. In addition, my choice is rooted in the belief that reliability and validity have been appropriated by quantitative researchers for too long. My hope, in a vein similar to Lather (1993), is that qualitative researchers may reclaim and redefine the terms needed to discuss qualitative work. Believing that the research we conduct is both reliable and valid, I discuss it as such. Thoughtful use of these terms—not as a defense or an appeal to the positivist paradigm—creates space to consider what is important in qualitative research endeavors. Acknowledging the many divergent opinions about evaluation criteria for qualitative research, I rely heavily on Denzin and Lincoln (1994) to summarize four positions: 1.
  • Research Methods for Social Psychology
    • Dana S. Dunn (Author)
    • 2012 (Publication Date)
    • Wiley
      (Publisher)
    We begin with the issue of validity, which comes in three main forms where research is concerned: internal validity, external validity, and construct validity (Campbell, 1957; Campbell & Stanley, 1966; Cook & Campbell, 1979; Shadish et al., 2001). Some of the key questions addressed by each form of validity are shown in Table 9.1. We begin by considering internal validity.
    Trusting Research Evidence: Demonstrating Internal Validity
    When should we trust the results of an experiment or field study? One answer to this question involves demonstrating the presence of internal validity. Internal validity is a necessary quality of any and all experimental research: There is clear evidence that any change in a dependent variable is attributable only to the influence of the intended independent variable.
    [Figure 9.1: Demonstrating insufficient justification: ratings of enjoyment from Festinger and Carlsmith (1959), comparing the control (no lie), $20 (low dissonance), and $1 (high dissonance) conditions.]
    The “clear evidence” referred to here, of course, is that any confounded variables, uncontrolled variables, or other extraneous factors are ruled out as having any effects on the observed results (recall the discussion in Chapter 4). Internal validity, then, is about the nature and direction of causality. No experiment can be labeled a true experiment unless it has internal validity. Indeed, without internal validity, the results of any experiment cannot be trusted. What about field studies or really any social psychological research that is conducted out of the controlled confines of the laboratory? In ideal circumstances, any piece of social psychological research should demonstrate internal validity. As already acknowledged, however, control and the ability to accurately assess cause and effect linkages between variables recede when we leave the lab (see Chapter 5).
  • Handbook of Survey Research
    • Peter H. Rossi, James D. Wright, Andy B. Anderson (Authors)
    • 2013 (Publication Date)
    • Academic Press
      (Publisher)
    Thus a valid measure of sex role identity is one which measures that construct and not some other one. Or, a valid measure of anomie measures that construct only and not nay-saying as well. As indicated earlier, a useful conceptual distinction is between theoretical and empirical validity. The former refers to the correlation between the underlying, latent construct and the observed measure, whereas the latter refers to a correlation between the observed measure and some other observed criterion. The American Psychological Association (1974) distinguishes between three types of validity: (a) criterion-related, (b) content, and (c) construct. Criterion-related validity is the same as what we have labeled empirical validity. Construct validity is a general concept that subsumes what we called theoretical validity. We do not think that content validity is a measure of validity per se, but rather a procedure which results in theoretical validity. We now explore the types of validity in more depth.
    Criterion-Related Validity (Empirical Validity)
    Criterion-related validity is defined as the correlation between a measure and some criterion variable of interest. Since the criterion variable might be one which exists in the present or one which one might want to predict in the future, criterion-related validity is broken into two types: predictive and concurrent. Predictive validity is an assessment of whether an individual's future standing on a criterion variable can be predicted from present standing on a measure. For example, if one constructs a measure of work orientation, its predictive validity for job performance might be ascertained by administering it to a group of new hires and correlating it with some measure of success (e.g., supervisors' ratings, advances within the organization) at some later point in time.