Measurement Theory in Action
eBook - ePub

Measurement Theory in Action

Case Studies and Exercises

  1. 416 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Measurement Theory in Action, Third Edition, helps readers apply testing and measurement theories and features 22 self-contained modules which instructors can match to their courses. Each module features an overview of a measurement issue and a step-by-step application of that theory. Best Practices provide recommendations for ensuring the appropriate application of the theory. Practical Questions help students assess their understanding of the topic. Students can apply the material using real data in the Exercises, some of which require no computer access, while others involve the use of statistical software to solve the problem. Case Studies in each module depict typical dilemmas faced when applying measurement theory followed by Questions to Ponder to encourage critical examination of the issues noted in the cases. The book's website houses the data sets, additional exercises, PowerPoints, and more. Other features include suggested readings to further one's understanding of the topics, a glossary, and a comprehensive exercise in Appendix A that incorporates many of the steps in the development of a measure of typical performance.

Updated throughout to reflect recent changes in the field, the new edition also features:

  • Recent changes in understanding measurement, with over 50 new and updated references
  • Explanations of why each chapter, article, or book in each module's Further Readings section is recommended
  • Online resources: instructors will find suggested answers to the book's questions and exercises, detailed solutions to the exercises, a test bank with 10 multiple-choice and 5 short-answer questions for each module, and PowerPoint slides. Students and instructors can access SPSS data sets, additional exercises, the glossary, and additional information helpful in understanding psychometric concepts.

It is ideal as a text for any psychometrics or testing and measurement course taught in psychology, education, marketing, and management. It is also an invaluable reference for professional researchers in need of a quick refresher on applying measurement theory.

Authors: Kenneth S. Shultz, David J. Whitney, and Michael J. Zickar.
Part I
Introduction

Module 1

Introduction and Overview

Thousands of important, and oftentimes life-altering, decisions are made every day. Who should we hire? Which students should be placed in accelerated or remedial programs? Which defendants should be incarcerated and which paroled? Which treatment regimen will work best for a given client? Should custody of this child be granted to the mother, the father, or the grandparents? In each of these situations, a “test” may be used to help provide guidance. There are many vocal opponents of the use of standardized tests to make such decisions. However, the bottom line is that these critical decisions will ultimately be made with or without test information. The question we have to ask ourselves is, “Can a better decision be made with the use of relevant test information?” In many, although not all, instances, the answer will be yes, provided a well-developed and appropriate test is used in combination with other relevant, well-justified information available to the decision maker. Much of the opposition to standardized tests stems from their use as the sole basis for an important, sometimes life-altering, decision. Thus, it would behoove any decision maker to take full advantage of other relevant, well-justified information, where available, to make the best and most informed decision possible.
A quick point regarding “other relevant and well-justified information” is in order. What one decision maker sees as “relevant” may not seem relevant and well justified to another constituent in the testing process. For example, as one of the reviewers of an earlier edition of this book pointed out, a manager may be willing to use tests that demonstrate validity and reliability for selecting workers in his organization. However, he may ultimately rely more heavily on what he deems “other relevant information,” which is in fact simply his own biased intuition about people or non-job-relevant information obtained from social media profiles. To this manager, his intuitions, or non-systematic information gathered from social media profiles, are legitimate “other relevant information” beyond test scores. Others in the testing process, however, may not view the manager’s intuitions, nor non-systematic information obtained from social media profiles, as relevant. Thus, when we say that other relevant information beyond well-developed and validated tests should be used when appropriate, we are not talking about intuition (which should be distinguished from professional judgment, which more often than not is in fact relevant), nor about non-systematic information obtained from, say, casually perusing a job applicant’s social media profiles. Rather, we are referring to additional relevant information such as professional references, systematic background checks, structured observations, professional judgments, and the like. That is, we mean additional information that can be well justified, as well as systematically developed, collected, and evaluated. Thus, we are not recommending collecting and using additional information beyond tests simply for the sake of doing so.
Rather, any “other relevant information” that is used in addition to test information to make critical decisions should be well justified and supported by professional standards, as well as appropriate for the context in which it is being proposed.

What Makes Tests Useful

Tests can take many forms, from traditional paper-and-pencil exams to portfolio assessments, job interviews, case histories, behavioral observations, computer adaptive assessments, and peer ratings, to name just a few. The common theme in all of these assessment procedures is that they represent a sample of behaviors from the test taker. Thus, psychological testing is similar to any science in that a sample is taken to make inferences about a population. In this case, the sample consists of behaviors (e.g., test responses on a paper-and-pencil test or performance of physical tasks on a physical ability test) drawn from a larger domain of all possible behaviors representing a construct. For example, the first test we take when we come into the world is called the APGAR test. That’s right, just one minute into the world we get our first test. You probably do not remember your score on your APGAR test, but our guess is your mother does, given the importance this first test has in revealing your initial physical functioning. The purpose of the APGAR test is to assess a newborn’s general functioning right after birth. Table 1.1 displays the five categories on which newborn infants are tested at one and five minutes after birth: Appearance, Pulse, Grimace, Activity, and Respiration (hence, the acronym APGAR). A score is obtained by summing the newborn infant’s rating (0, 1, or 2) on each of the five dimensions, so scores can range from 0 to 10. A score of 7–10 is considered normal. A score of 4–6 indicates that the newborn infant may require some resuscitation, while a score of 3 or less means the newborn requires immediate and intensive resuscitation. The infant is then assessed again at five minutes, and if the score is still below 7, the infant may be assessed again at 10 minutes. If the infant’s APGAR score is 7 or above five minutes after birth, which is typical, then no further intervention is called for.
Hence, by taking a relatively small sampling of behavior, we are (or at least a competent obstetrics nurse or doctor is) able to quickly, and quite accurately, assess the functioning of a newborn infant to determine if resuscitation interventions are required to help the newborn function properly.
Table 1.1 The APGAR Test Scoring Table

| Sign | 0 points | 1 point | 2 points |
|---|---|---|---|
| Appearance (color) | Pale or blue | Body pink, extremities blue | Pink (normal for non-Caucasian) |
| Pulse (heartbeat) | Not detectible | Lower than 100 bpm | Higher than 100 bpm |
| Grimace (reflex) | No response | Grimace | Lusty cry |
| Activity (muscle tone) | Flaccid | Some movement | A lot of activity |
| Respiration (breathing) | None | Slow, irregular | Good (crying) |
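The scoring rule described above (rate each of the five signs 0, 1, or 2, sum to a 0–10 total, then read off the band) can be sketched in a few lines of Python. This is a minimal illustration, not a clinical tool: the function names `apgar_total`, `interpret`, and `needs_reassessment` are hypothetical, and the bands simply mirror the thresholds given in the text (7–10 normal, 4–6 may need some resuscitation, 3 or less needs immediate resuscitation, and a five-minute score below 7 may prompt a 10-minute reassessment).

```python
# Minimal sketch of APGAR scoring as described in the text (hypothetical helper names).
SIGNS = ("appearance", "pulse", "grimace", "activity", "respiration")


def apgar_total(ratings: dict) -> int:
    """Sum the five sign ratings (each 0, 1, or 2) into a 0-10 total."""
    missing = [sign for sign in SIGNS if sign not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for sign in SIGNS:
        if ratings[sign] not in (0, 1, 2):
            raise ValueError(f"{sign} must be rated 0, 1, or 2")
    return sum(ratings[sign] for sign in SIGNS)


def interpret(total: int) -> str:
    """Map a total onto the score bands described in the text."""
    if total >= 7:
        return "normal"
    if total >= 4:
        return "may require some resuscitation"
    return "requires immediate, intensive resuscitation"


def needs_reassessment(five_minute_total: int) -> bool:
    """A five-minute score below 7 may prompt another assessment at 10 minutes."""
    return five_minute_total < 7


if __name__ == "__main__":
    one_minute = {"appearance": 1, "pulse": 2, "grimace": 1,
                  "activity": 1, "respiration": 1}
    score = apgar_total(one_minute)
    print(score, interpret(score))  # 6 may require some resuscitation
```

Keeping the summation separate from the interpretation mirrors the text's point that the score is a simple sum of a small behavior sample, while the decision rule is a separate judgment layered on top of it.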
The utility of any assessment device, however, will depend on the qualities of the test and the intended use of the test. Test information can be used for a variety of purposes, from making predictions about the likelihood that a patient will commit suicide to making personnel selection decisions by determining which entry-level workers to hire. Tests can also be used for classification purposes, as when students are designated as remedial, gifted, or somewhere in between. Tests can also be used for evaluation purposes, as in the use of a classroom test to evaluate performance of students in a given subject matter. Counseling psychologists routinely use tests to assess clients for emotional adjustment problems or possibly for help in providing vocational and career counseling. Finally, tests can also be used for research-only purposes such as when an experimenter uses a test to prescreen study participants to assign each one to an experimental condition. If the test is not used for its intended purpose, however, it will not be very useful and, in fact, may actually be harmful. As Anastasi and Urbina (1997) note, “Psychological tests are tools … Any tool can be an instrument of good or harm, depending on how it is used” (p. 2).
For example, most American children in grades 2–12 are required to take standardized tests on a yearly basis. These tests were initially intended for the sole purpose of assessing students’ learning outcomes. Over time, however, a variety of other misuses for these tests have emerged. For instance, they are frequently used to determine school funding and, in some cases, teachers’ or school administrators’ “merit” pay. However, given that determining the pay levels of educational employees was not the intended use of such standardized educational tests when they were developed, they almost always serve poorly in this capacity. Thus, a test that was developed with good (i.e., appropriate) intentions can be (mis)used for inappropriate purposes, limiting the usefulness of the test. In this instance, however, not only is the test of little use in setting pay for teachers and administrators, it may actually be causing harm to students by coercing teachers to “teach to the test,” thereby trading long-term gains in learning for short-term increases in standardized test performance.
In addition, no matter how the test is used, it will only be useful if it meets certain psychometric and practical requirements. From a psychometric or measurement standpoint, we want to know whether the test is accurate, standardized, and reliable; whether it demonstrates evidence of validity; and whether it is free of both measurement and predictive bias. Procedures for determining these psychometric qualities form the core of the rest of this book. From a practical standpoint, the test must be cost-effective as well as relatively easy to administer and score. Reflecting on our earlier example, we would surmise that the APGAR meets most of these practical requirements: trained doctors and nurses in a hospital delivery room can administer it quickly and efficiently. Our key psychometric concern in this situation may be how consistently different doctors and nurses assign similar APGAR scores to the same newborn (i.e., the inter-rater reliability of the APGAR).

Individual Differences

Ultimately, those interested in applied psychological measurement are usually interested in some form of individual differences (i.e., how individuals differ in test scores and in the underlying traits those tests measure). If there are no differences in how target individuals score on the test, then the test will have little value to us. For example, if we give a group of elite athletes the standard physical ability test given to candidates for a police officer job, there will likely be very little variability in scores, with all the athletes scoring extremely high on the test. Thus, the test data would provide little value in predicting which athletes would make good police officers. On the other hand, if we administered the same physical ability test to a more typical group of job candidates who had passed earlier hurdles in the police officer selection process (e.g., cognitive tests, background checks, psychological evaluations), we would see much wider variability in scores. The test would then at least have the potential to be a useful predictor of job success, because there would be some variability in the observed test scores.
Individual differences on psychological tests can take several different forms. Typically, we look at inter-individual differences, where we examine differences on the same construct across individuals. In such cases, the goal is usually prediction. That is, how well does the test predict some criterion of interest? For example, in the preceding scenario, we would u...

Table of contents

  1. Cover
  2. Endorsements
  3. Half Title
  4. Title Page
  5. Copyright Page
  6. Dedication
  7. Contents
  8. About the Authors
  9. Preface
  10. PART I Introduction
  11. PART II Reliability, Validation, and Test Bias
  12. PART III Practical Issues in Test Construction
  13. PART IV Advanced Topics
  14. Appendix A. Course-Long Exercise on Psychological Scale Development
  15. Appendix B. Data Set Descriptions
  16. Glossary of Key Terms
  17. Author Index
  18. Subject Index