Applying Generalizability Theory using EduG

eBook - ePub

  1. 234 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Intended to help improve measurement and data collection methods in the behavioral, social, and medical sciences, this book demonstrates an expanded and accessible use of Generalizability Theory (G theory). G theory conceptually models the way in which the reliability of measurement is ascertained. Sources of score variation are identified as potential contributors to measurement error and taken into account accordingly. The authors demonstrate the powerful potential of G theory by showing how to improve the quality of any kind of measurement, regardless of the discipline.

Readers will appreciate the conversational style used to present a comprehensive review of G theory and its application using the freeware EduG. To maximize understanding, the authors illustrate all fundamental principles with concrete examples from different fields and contexts. Annotated applications lead students through the main concepts of G theory, while illustrating both the use of EduG and the interpretation of its output. Formulas are avoided wherever possible. Exercises, with data sets available on the Psychology Press website, allow readers to carry out their own analyses to reinforce understanding.

Brief overviews of analysis of variance, estimation, and the statistical error model are provided for review. The procedures involved in carrying out a generalizability study using EduG follow, as well as guidance in the interpretation of results. Real-world applications of G theory to the assessment of depression, managerial ability, attitudes, and writing and mathematical skills are then presented. Next, annotated exercises provide an opportunity for readers to use EduG and interpret its results. The book concludes with a review of the development of G theory and possible new directions of application. Finally, for those with a strong statistical background, the appendixes provide the formulas used by EduG.

Ideal as a supplement for courses on measurement theory and/or generalizability theory taught in departments of psychology, education, medicine, and the social sciences, this text will also appeal to researchers from a variety of fields interested in learning how to apply G theory to their studies.


chapter one

What is generalizability theory?

Generalizability theory: Origin and developments

No measuring procedure can be perfectly accurate. In the social and health sciences in particular, but in the natural sciences as well, we can rarely assume our measurements to be absolutely precise. Whether we are attempting to evaluate attitudes to mathematics, managerial aptitude, perception of pain, or blood pressure, our scores and ratings will be subject to measurement error. This is because the traits or conditions that we are trying to estimate are often difficult to define in any absolute sense, and usually cannot be directly observed. So we create instruments that we assume will elicit evidence of the traits or conditions in question. But numerous influences bear on this process of measurement and produce variability that ultimately introduces error into the results. We need to study this phenomenon if we are to be in a position to quantify and control it, and in this way to assure maximum measurement precision.
Generalizability theory, or G theory, is essentially an approach to the estimation of measurement precision in situations where measurements are subject to multiple sources of error. It is an approach that not only provides a means of estimating the dependability of measurements already made, but that also enables information about error contributions to be used to improve measurement procedures in future applications. Lee Cronbach is at the origin of G theory, with seminal co-authored texts that remain to this day essential references for researchers wishing to study and use the methodology (Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Cronbach, Rajaratnam, & Gleser, 1963).
The originality of the G theory approach lies in the fact that it introduced a radical change in perspective in measurement theory and practice. In essence, the classical correlational paradigm gave way to a new conceptual framework, deriving from the analysis of variance (ANOVA), whose fundamental aim is to partition the total variance in a data set into a number of potentially explanatory sources. Despite this profound change in perspective, G theory does not in any way contradict the results and contributions of classical test theory. It rather embraces them as special cases in a more general problematic, regrouping within a unified conceptual framework concepts and techniques that classical theory presented in a disparate, almost disconnected, way (stability, equivalence, internal consistency, validity, inter-rater agreement, etc.). The impact of the change in perspective is more than a straightforward theoretical reformulation. The fact that several identifiable sources of measurement error (markers, items, gender, etc.) can simultaneously be incorporated into the measurement model and separately quantified means that alternative sampling plans can be explored with a view to controlling the effects of these variables in future applications. G theory thus plays a unique and indispensable role in the evaluation and design of measurement procedures.
That is why the Standards for educational and psychological testing (AERA, 1999), developed jointly by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education (and hence familiarly known as the “Joint Standards”), stress the need to refer to G theory when establishing the validity and reliability of observation or testing procedures. The Standards’ first two chapters immediately embrace this inferential perspective, in which generalization to a well-defined population is made on the basis of a representative random sample. The Standards explicitly refer to G theory at several points. For instance, the commentary for standard 2.10 states, with respect to reliability estimates based on repeated or parallel measures:
Where feasible, the error variances arising from each source should be estimated. Generalizability studies and variance component analyses are especially helpful in this regard. These analyses can provide separate error variance estimates for tasks within examinees, for judges and for occasions within the time period of trait stability. (AERA, 1999, p. 34)
We return later to some of the essential characteristics of the theory. For the moment we simply draw attention to two important stages in its evolution, of which the second can be considered as an extension of the first, since it has led to the expansion and considerable diversification of its fields of application.
As originally conceived, G theory was implicitly located within the familiar framework of classical test theory, a framework in which individuals (students, psychiatric patients, etc.) are considered as the objects of measurement, and the aim is to differentiate among them as reliably as possible. The principal requirement is to check that the instrument to be used, the test or questionnaire, can produce reliable measurements of the relative standing of the individuals on some given measurement scale, despite the inevitably disturbing influence on the measures of the random selection of the elements of the measurement instrument itself (the test or questionnaire items).
During the 1970s and 1980s, a potentially broader application of the model was identified by Jean Cardinet, Yvan Tourneur, and Linda Allal, who observed that the inherent symmetry in the ANOVA model that underpinned G theory was not being fully exploited at that time. They noted that in Cronbach’s development of G theory the factor Persons was treated differently from all other factors, in that persons, typically students, were consistently the only objects of measurement. Recognizing and exploiting model symmetry (i.e., the fact that any factor in a factorial design has the potential to become an object of measurement) allows research procedures as well as individual measurement instruments to be evaluated. Thus, procedures for comparing subgroups (as in comparative effectiveness studies of various kinds) can also be evaluated for technical quality, and improved if necessary (Cardinet & Allal, 1983; Cardinet & Tourneur, 1985; Cardinet, Tourneur, & Allal, 1976, 1981, 1982). As these authors were expounding the principle of model symmetry, practitioners on both sides of the Atlantic were independently putting it into operation (e.g., Cohen & Johnson, 1982; Gillmore, Kane, & Naccarato, 1978; Johnson & Bell, 1985; Kane & Brennan, 1977).
Relative item difficulty, the mastery levels characterizing different degrees of competence, the measurement error associated with estimates of population attainment, the progress recorded between one stage and another within an educational program, the relative effectiveness of teaching methods, are all examples of G theory applications that focus on something other than the differentiation of individuals. To facilitate an extension to the theory, calculation algorithms had to be modified or even newly developed. Jean Cardinet and Yvan Tourneur (1985), whose book on G theory remains an essential reference in the French-speaking world, undertook this task. We explicitly place ourselves in the perspective adopted by these researchers.

An example to illustrate the methodology

The example

It will be useful at this point to introduce an example application to illustrate how G theory extends classical test theory, and in particular how the principle of symmetry enriches its scope. Let us suppose that a research study is planned to compare the levels of subject interest among students taught mathematics by one or the other of two different teaching methods, Method A and Method B. Five classes have been following Method A and five others Method B. A 10-item questionnaire is used to gather the research data. This presents students with statements of the following type about mathematics learning:
• Mathematics is a dry and boring subject
• During mathematics lessons I like doing the exercises given to us in class
and invites them to express their degree of agreement with each statement, using a 4-point Likert scale. Students’ responses are coded numerically from 1 (strongly agree) to 4 (strongly disagree), and where necessary score scales are transposed, so that in every case low scores indicate low levels of mathematics interest and high scores indicate high levels of mathematics interest. There are then two possibilities for summarizing students’ responses to the 10-item questionnaire: we can sum students’ scores across the 10 items to produce total scores on a 10–40 scale (10 items, each with a score between 1 and 4), or we can average students’ scores over the 10 items to produce average scores on a 1–4 scale, the original scale used for each individual item. If we adopt the second of these alternatives, then student scores higher than 2.5, the middle of the scale, indicate positive levels of interest, while scores below 2.5 indicate negative levels of interest; the closer the score is to 4, the higher the student’s general mathematics interest level, and the closer the score is to 1 the lower the student’s general mathematics interest level.
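The scoring rule just described (reverse-coding where necessary, then totalling or averaging) can be sketched as follows. The function name and the idea of passing the transposed items explicitly are invented for illustration; they are not part of EduG:

```python
def score_student(responses, transposed_items):
    """Return (total, mean) for one student's questionnaire responses.

    responses: raw codes for each item, 1 (strongly agree) .. 4 (strongly disagree)
    transposed_items: indices of items whose scale must be reversed so that a
    high score always means high mathematics interest
    """
    scored = [(5 - r) if i in transposed_items else r
              for i, r in enumerate(responses)]
    total = sum(scored)         # 10-40 scale for a 10-item questionnaire
    mean = total / len(scored)  # back on the original 1-4 item scale
    return total, mean
```

With ten items, the total lands on the 10–40 scale and the mean on the original 1–4 scale, as described above.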

Which reliability for what type of measurement?

As we have already mentioned, the aim of the research study is to compare two mathematics teaching methods, in terms of students’ subject interest. But before we attempt the comparison we would probably be interested in exploring how “fit for purpose” the questionnaire was in providing measures of the mathematics interest of individual students. Of all the numerous indicators of score reliability developed by different individuals prior to 1960, Cronbach’s α coefficient (Cronbach, 1951) remains the best known and most used (Hogan, Benjamin, & Brezinski, 2000). The α coefficient was conceived to indicate the ability of a test to differentiate among individuals on the basis of their responses to a set of test items, or of their behavior within a set of situations. It tells us the extent to which an individual’s position within a score distribution remains stable across items. α coefficients take values between 0 and 1; the higher the value, the more reliable the scores. The α value in this case is 0.84. Since α values of at least 0.80 are conventionally considered to be acceptable, we could conclude that the questionnaire was of sufficient technical quality for placing students relative to one another on the scale of measurement. This is correct. But in terms of what we are trying to do here—to obtain a reliable measure of average mathematics interest levels for each of the two teaching methods—does a measure of internal consistency, which is what the α coefficient is, really give us the information we need about score reliability (or score precision)?
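For readers who want to see the computation behind the coefficient, here is a bare-bones sketch of the standard formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores). Nothing here is specific to EduG, and the function name is our own:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha from a persons-by-items matrix.

    scores: list of per-person lists, one score per item.
    """
    k = len(scores[0])  # number of items

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [sample_var([row[j] for row in scores]) for j in range(k)]
    total_var = sample_var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When every item ranks the students identically, α reaches 1; the more the items disagree about students’ relative standing, the closer α falls toward 0.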
We refer to the Joint Standards again:
… when an instrument is used to make group judgments, reliability data must bear directly on the interpretations specific to groups. Standard errors appropriate to individual scores are not appropriate measures of the precision of group averages. A more appropriate statistic is the standard error of the observed score means. Generalizability theory can provide more refined indices when the sources of measurement are numerous and complex. (AERA, 1999, p. 30)
In fact, the precision, or rather the imprecision, of the measure used in this example depends in great part on the degree of heterogeneity among the students following each teaching method: the more heterogeneity there is, the greater the contribution of the “students” effect to measurement error. This is in contrast with the classical test theory situation, where the greater the variance among students, the higher the “true score” variance and consequently the higher the α value. Within-method student variability is a source of measurement error that should not be ignored. Moreover, other factors should also be taken into account: in particular, variability among the classes (within methods), variability among the items in terms of their overall mean scores, and any interactions that might exist between teaching methods and items, between students (within classes) and items, and between classes and items.
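The role of student heterogeneity can be made concrete with a textbook formula: when students are sampled at random, the standard error of a group mean is the square root of the within-group variance divided by the group size. This is an illustrative formula, not EduG output:

```python
import math

def se_of_group_mean(within_group_var, n_students):
    """Standard error of a group mean under random sampling of students:
    the more heterogeneous the students, the less precise the group mean."""
    return math.sqrt(within_group_var / n_students)
```

So a group that is twice as heterogeneous (in variance terms) yields a group mean whose standard error is larger by a factor of √2, all else being equal.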

How does G theory help us?

As we will show, G theory is exactly the right approach to use for this type of application. It is sufficient to consider the two teaching methods as the objects of measurement and the other elements that enter into the study (items, students, and classes) as components in the measurement procedure, “conditions of measurement,” potentially contributing to measurement error. In place of the α coefficient we calculate an alternative reliability indicator, a generalizability coefficient (G coefficient). Like the α coefficient, G coefficients are variance ratios. They indicate the proportion of total score variance that can be attributed to “true” (or “universe”) score variance, which in this case is inter-method variation, and equivalently the proportion of variance that is attributable to measurement error. Also like α, G coefficients take values between 0 (completely unreliable measurement) and 1 (perfectly reliable measurement), with 0.80 conventionally accepted as a minimum value for scores to be considered acceptably reliable. The essential difference between measurement error as conceived in the α coefficient and measurement error as conceived in a more complex G coefficient is that in the former case measurement error is attributable to one single source of variance, the student-by-item interaction (inconsistent performances of individual students over the items in the test), whereas in the latter case multiple sources of error variance are acknowledged and accommodated.
A G coefficient of relative measurement indicates how well a measurement procedure has differentiated among objects of study, in effect how well the procedure has ranked objects on a measuring scale, where the objects concerned might be students, patients, teaching methods, training programs, or whatever. This is also what the α coefficient does, but in a narrower sense. A G coefficient of absolute measurement indicates how well a measurement procedure has located objects of study on a scale, irrespective of where fellow objects are placed. Typically, “absolute” coefficients have lower values than “relative” coefficients, because in absolute measurement there are more potential sources of error variance at play. In this example, with 15 students representing each of the five classes, the relative and absolute G coefficients are 0.78 and 0.70, respectively (see Chapter 3 for details). This indicates that, despite the high α value for individual student measurement, the comparative study was not capable of providing an acceptably precise measure of the difference in effectiveness of the two teaching methods in terms of students’ mathematics interest.
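The distinction between the two coefficients is easiest to see in the simplest crossed design, persons × items: relative error involves only the person-by-item interaction (averaged over the sampled items), while absolute error adds the item main effect as well. The variance components in this sketch are hypothetical numbers, not those of the teaching-methods example:

```python
def pxi_g_coefficients(var_p, var_i, var_pi, n_i):
    """Relative and absolute G coefficients for a persons-by-items design.

    var_p: universe (person) variance; var_i: item main-effect variance;
    var_pi: person-by-item interaction variance; n_i: number of sampled items.
    """
    rel_error = var_pi / n_i
    abs_error = var_i / n_i + var_pi / n_i  # item main effect added
    relative = var_p / (var_p + rel_error)
    absolute = var_p / (var_p + abs_error)
    return relative, absolute
```

Because absolute error adds a nonnegative term, the absolute coefficient can never exceed the relative one, matching the pattern (0.70 versus 0.78) reported above.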
In this type of situation, a plausible explanation for low reliability can sometimes be that the observed difference between the measured means is particularly small. This is not the case here, though: the means for the two teaching methods (A and B) were, respectively, 2.74 and 2.38 (a difference of 0.36) on the 1–4 scale. The inadequate values of the G coefficients result, rather, from the extremely influential effect of measurement error, attributable to the random selection of small numbers of attitude items, students, and classes, along with a relatively high interaction effect between teaching methods and items (again, Chapter 3 provides the details).
Standard errors of measurement, for relative and for absolute measurement, can be calculated and used in the usual way to produce confidence intervals (but note that adjustments are sometimes necessary, as explained in Chapters 2 and 3). In this example, the adjusted standard errors are equal to 0.10 and 0.11, respectively, when the mean results of the two teaching methods are compared. Thus a band of approximately two standard errors (more specifically 1.96 standard errors, under Normal distribution assumptions) around each mean would have a width of approximately ±0.20 for relative measurement and ±0.22 for absolute measurement. As a result, the confidence intervals around the method means would overlap, confirming that the measurement errors tend to blur the true method effects.
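The overlap check can be reproduced directly from the figures quoted in the text (method means 2.74 and 2.38, adjusted standard error 0.10 for relative measurement); the helper functions are our own:

```python
def ci_95(mean, se):
    """95% confidence interval under Normal assumptions (1.96 standard errors)."""
    half = 1.96 * se
    return (mean - half, mean + half)

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

ci_a = ci_95(2.74, 0.10)  # roughly (2.54, 2.94)
ci_b = ci_95(2.38, 0.10)  # roughly (2.18, 2.58)
overlap = intervals_overlap(ci_a, ci_b)  # the intervals do overlap
```

The upper end of Method B’s interval (about 2.58) exceeds the lower end of Method A’s (about 2.54), so the two intervals overlap despite the 0.36 difference in means.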

Optimizing measurement precision

Arguably the most important contribution of the G theory methodology, and the most useful for practitioners wanting to understand how well their measuring procedures work, is the way that it quantifies the relative contributions of different factors and their interactions to the error affecting measurement precision. G coefficients are calculated using exactly this information. But the same information can also be used to explore ways of improving measurement precision in a future application. In the example presented here, the principal sources of measurement error were found to be inter-item variation, inter-class (within method) variation, and inter-student (within class) variation. The interaction effect between methods and items also played an important role. Clearly, the quality of measurement would be improved if these major contributions to measurement error could be reduced in some way. A very general, but often efficient, strategy to achieve this is to use larger samples of component elements in a future application, thus reducing the error contributions of the corresponding sources.
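The larger-samples strategy is easy to see in a deliberately simplified one-facet model: an error contribution of the form σ²/n shrinks as the number of sampled conditions n grows, and the G coefficient rises accordingly. The variance components here are hypothetical:

```python
def relative_g(var_object, var_interaction, n_conditions):
    """Relative G coefficient when the only error source is an interaction
    variance averaged over n sampled conditions (a simplified one-facet sketch)."""
    return var_object / (var_object + var_interaction / n_conditions)

g_with_10 = relative_g(1.0, 4.0, 10)  # ~0.71
g_with_20 = relative_g(1.0, 4.0, 20)  # ~0.83: doubling n halves the error term
```

This is exactly the kind of "what if" exploration (a decision study, in G theory terms) that the variance-component estimates make possible.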

Table of contents

  1. Cover
  2. Halftitle
  3. Title
  4. Copyright
  5. Contents
  6. Software Notice
  7. Foreword
  8. Preface
  9. Acknowledgments
  10. 1. What is generalizability theory?
  11. 2. Generalizability theory: Concepts and principles
  12. 3. Using EduG: The generalizability theory software
  13. 4. Applications to the behavioral and social sciences
  14. 5. Practice exercises
  15. 6. Current developments and future possibilities
  16. appendix A. Introduction to the analysis of variance
  17. appendix B. Sums of squares for unbalanced nested designs
  18. appendix C. Coef_G as a link between ρ² and ω²
  19. appendix D. Confidence intervals for a mean or difference in means
  20. Key terms
  21. References
  22. Author Index
  23. Subject Index

Applying Generalizability Theory using EduG by Jean Cardinet, Sandra Johnson, and Gianreto Pini is available in PDF and ePUB format, catalogued under Psychology & History & Theory in Psychology.