Why Ordinal Methods?
ORDINAL METHODS MAKE SENSE
Statistical and psychometric methods that make use of only the ordinal information provided by the data enjoyed a brief flurry of popularity during the middle of the current century, but their use has decreased substantially since then, at least on a proportionate basis. In the present chapter, as well as at other points in this book, it is argued that this decline has been counterproductive from the point of view of maximizing the overall effectiveness of empirical research and application.
The reasons for this assertion are grouped into three categories. First, it will be argued that much of the data in behavioral research has only ordinal justification. This makes ordinal methods preferable because of the possibility that conclusions from a metric analysis of ordinal data could be changed, even reversed, under ordinal transformation of the data, whereas ordinally based conclusions will not. Second, it is argued that in a substantial fraction of applications the questions that are to be answered by the data are themselves ordinal. This makes ordinal methods preferable because they answer the ordinal research question more directly than does a metric analysis of the same data. The third category of motive, which is perhaps the one that has traditionally been most often cited, is that ordinal methods have greater statistical robustness. This is true both in the inferential sense of providing generalizations from the data that are more likely to be valid than are traditional methods, in the face of the distributional peculiarities that are so common (robustness), and in the more descriptive sense of being less influenced by a small fraction of the observed data (resistance).
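The contrast between robustness in the inferential sense and resistance in the descriptive sense can be illustrated with a toy computation. The sketch below (in Python, with made-up numbers) shows how a single aberrant observation displaces the mean, a metric summary, far more than it displaces the median, an ordinal one:

```python
import statistics

# Hypothetical scores; the second list replaces one value with an outlier.
scores = [12, 14, 15, 15, 16, 17, 18]
contaminated = [12, 14, 15, 15, 16, 17, 180]

# The mean, a metric summary, jumps from about 15.3 to about 38.4 ...
print(statistics.mean(scores), statistics.mean(contaminated))

# ... while the median, an ordinal summary, stays at 15 in both samples.
print(statistics.median(scores), statistics.median(contaminated))
```

The median depends only on the order of the observations, so any monotone distortion or contamination of the extreme values leaves it essentially untouched; the mean has no such protection.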
OUR DATA ARE ORDINAL
Levels of Scales
An important contribution by psychologists to scientific metatheory, or the philosophy of science, has been in the differentiation of levels of quantification. Although it has roots in the work of others, the treatment by S. S. Stevens that culminated in his chapter on measurement in his book on experimental psychology (Stevens, 1951) provides the formulation that underlies most of our current thinking. There, he distinguished four levels of measurement, the familiar nominal, ordinal, interval, and ratio scales.
In these original definitions, the characteristics that differentiated the types of scales were the transformations that were “admissible” for each—that is, the ways the scales could be changed without the loss of important information. In the case of ratio scales, epitomized by the familiar physical scales of length and mass, the only admissible transformation is multiplication by a positive constant, such as feet to meters or inches, pounds to kilograms, and the like. If x is one version of the variable and y another, then the only admissible relation is multiplication by some conversion constant a such that x = ay. Such transformations are sometimes called linear transformations. The zero point of the scale is determined by a rational empirical process.
The interval scale is similar to the ratio scale except that the origin, or zero point, is arbitrary. Given values of a variable on one version of the scale, it is permissible to convert them to another equivalent version by multiplying by one constant and adding a second: x = ay + k for any positive value of a and any value of k. Such transformations are often called affine transformations. The typical example given is the everyday temperature used in weather reports, degrees Fahrenheit or Celsius. Converting one to the other requires two constants: F = 1.8C + 32. The point about the ratio and interval scales is that the choice of which version to use is arbitrary or a matter of convenience, and any other version within the constraints would serve as well, once one got used to it.
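The invariances that the linear and affine families buy can be verified directly. In the sketch below (the numbers are arbitrary), a ratio-scale conversion such as pounds to kilograms preserves ratios of values, whereas the Celsius-to-Fahrenheit conversion does not; what the affine conversion does preserve is the ratio of two intervals, which is exactly the information an interval scale is entitled to carry:

```python
def c_to_f(c):
    # Affine ("interval") conversion: x = a*y + k with a = 1.8, k = 32.
    return 1.8 * c + 32

def lb_to_kg(lb):
    # Linear ("ratio") conversion: x = a*y with a approximately 0.4536.
    return 0.4536 * lb

c = [0, 10, 30]
f = [c_to_f(x) for x in c]          # roughly [32, 50, 86]

# A ratio-scale conversion preserves ratios of values (about 3.0, same as 30/10) ...
print(lb_to_kg(30) / lb_to_kg(10))

# ... but an affine conversion does not, because the zero point moved ...
print(c[2] / c[1], f[2] / f[1])     # about 3.0 versus about 1.72

# ... although it does preserve ratios of *intervals*: both are 2.0.
print((c[2] - c[1]) / (c[1] - c[0]),
      (f[2] - f[1]) / (f[1] - f[0]))
```

This is one way to see why the intervals on an interval scale behave like a ratio scale: the additive constant k cancels out of every difference.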
Ordinal scales permit much more freedom in the transformation. An ordinal scale can be transformed to a different but equally legitimate version in any way as long as the transformation preserves the order and the distinctions among values. The class of transformations fitting this constraint is called strictly monotonic. Logarithmic, square root, exponential, and a whole host of other mathematical functions are examples of monotonic transformations, but an unending variety of others is possible, including completely unsystematic or wiggly ones, as long as the order of values is preserved. In the current literature on measurement theory, which has become highly technical mathematically, the ratio, interval, and ordinal scales are called one-point, two-point, and many-point scale categories (Narens, 1981), respectively, because of the number of values that need to be fixed in order to define a given version of a scale.
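The defining property of the ordinal scale can be checked mechanically: any strictly monotonic function, smooth or "wiggly," leaves the rank order of the values unchanged. A minimal sketch, with arbitrary data and a small helper function for ranks:

```python
import math

data = [3.1, 0.5, 12.0, 7.4, 1.9]

def ranks(xs):
    # Rank of each value within the list (0 = smallest), assuming no ties.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

# Every strictly monotonic transformation leaves the ranks intact.
for f in (math.log, math.sqrt, math.exp, lambda x: x**3 + 0.1 * x):
    assert ranks([f(x) for x in data]) == ranks(data)
print("order preserved under all four monotonic transforms")
```

Any statistic computed solely from these ranks is therefore invariant over the entire class of admissible ordinal transformations, which is the formal content of the claim that ordinal conclusions cannot be reversed by rescaling the data.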
The final category of scales is the nominal. A nominal variable consists of unordered categories, or distinct single entities, and each can be given a name. The name can be a number, but the number has no numerical interpretation. Therefore, any transformation that preserves identification is possible. Such transformations are called one-to-one transformations. Social security numbers are an example of a nominal scale. The numbers could be reassigned without loss of information, albeit to the disruptive confusion of numerous financial records.
Scale types are usually arranged in a hierarchy (an ordinal scale of scales) in terms of the transformations they allow, with ratio scales at the top. The reason is that the transformation classes are nested: every linear transformation is affine; every affine transformation is monotonic; every strictly monotonic transformation is one-to-one; but none of these inclusions can be reversed without encountering exceptions. The topic of defining scale types has an extensive literature. See Cliff (1993b) for an introductory discussion, Michell (1990) for a more extensive intermediate one, and Krantz, Luce, Suppes, and Tversky (1971) for a full technical treatment.
What Determines the Level of a Scale?
The preceding paragraphs have given the Stevensian definitions, which say that scale type is defined in terms of whether certain transformations are legitimate and others are not, without going into how it was decided whether a transformation was legitimate. The next step in the intellectual evolution of scale types was the inquiry into how the kind of transformation could be limited. Before turning to that, one can comment on the superficiality of the analogies to physical variables that are often used in examples of interval and ratio scales. Length and time are frequently cited as examples of ratio scales, yet even by the standards of 19th-century physics this would be questionable. “Length,” in itself, is not the relevant variable in even classical physics. The variable is distance, the difference in coordinates of two points, measured in a straight line. The coordinate system has an arbitrary origin (Greenwich, England, the corner of my desk, or the center of the sun) and an arbitrary unit (kilometers, inches, or parsecs). Even in the case of a lever, the “length” of a lever is merely a surrogate for the distance through which a center of mass moves. So there is assumed to be a coordinate system, space, which has no definable zero point, and it is differences with respect to space, distances, that are the variable. Similarly, it is not “time” itself that enters into relations, but elapsed time, the difference in time coordinates. Time itself has no zero point, any more than space does. The clever thing that physicists did was to take these interval-scale variables, the space and time coordinates, and use the intervals on them as the variables. The intervals on interval scales behave like ratio scales. Finding an origin for space and time is one goal of modern cosmology, which asks when and where space-time began, but so far no definitive answer has emerged.
Temperature is a misleading example of an interval scale because it has been known for more than a century that the scale has a true zero point, a fact embodied in the Kelvin (K, or Absolute) scale of temperature, which allows only linear transformation. The traditional Fahrenheit and Celsius versions are retained in weather reports and cooking directions because the numbers in them are cognitively more comfortable for everyday use, as well as more familiar. (The Fahrenheit scale was devised with weather reporting specifically in mind; 0°F is about as cold as it usually gets in the winter in many temperate parts of the world, and 100°F is about as high as it gets in the summer.) Thus, not only do the frequently cited examples of ratio scales turn out to be the intervals on interval scales, but the prime example of an interval scale is easily converted to a ratio scale.
The important thing, though, is not to quibble about the examples but to accept the fact that different kinds of scales exist and consider what it is that makes one scale an interval scale and another some other kind. What makes a transformation legitimate or illegitimate? Looked at one way, a transformation is legitimate if it neither disturbs empirical relations nor leads to contradictions among them. A more demanding view takes the opposite tack and requires positive support, in the form of empirical relationships, for a scale's status in the hierarchy of types before it can be awarded that status. It is this latter view that seems to provide the soundest basis for research and application.
The scale distinction that is most salient to the behavioral sciences is that between ordinal and interval scales. The reasons for this salience include, first, that many statistical procedures assume interval-scale status for the variables. Second, the distinction between interval and ratio scales is rarely important in behavioral science because plausible claimants to the latter status are rare. Finally, there are frequent situations in which it is scientifically or practically desirable to compare differences that lie at different points on a scale; thus, it would be valuable to have interval scales. Because ordinal methods are the focus of this book, and a good deal of the motivation for using them lies in uneasiness about the interval-scale status of many variables, we concentrate on this distinction.
What Makes a Variable an Interval Scale?
The major development in the theory of measurement subsequent to Stevens’ (1951) enumeration of the scale types and their ties to classes of transformations was the elucidation of what the basis was for deciding that a scale merited a certain classification. The landmark paper here was Luce and Tukey (1964), in which “conjoint measurement” was introduced. This, and related subsequent work, such as the three-volume Foundations of Measurement (Krantz et al., 1971; Luce, Krantz, Suppes, & Tversky, 1990; Suppes, Krantz, Luce, & Tversky, 1989), is couched in a mathematical framework that many find to be austere and difficult, so its diffusion into the mainstream of psychology has been slow (Cliff, 1992), but its content is fundamental to serious thinking about measurement.
At a highly simplified level, the distinction that allows an ordinal variable to achieve interval-scale status is fairly direct. There must be an empirically nontrivial way of demonstrating the equality of differences at different points on the variable. Probably the most striking formulation of conditions under which this would be possible is the original one (Luce & Tukey, 1964), called conjoint measurement. Its first requirement is three variables observed simultaneously (conjointly), at least one of which provides an order. The paradigm most familiar to behavioral scientists that embodies this idea is a fully crossed factorial design, the factors providing two of the variables, and a dependent variable, which is the third. The latter provides an order for the cells of the table. Then a mild-looking constraint on the order of the cells on the dependent variable is necessary for there to be an interval scale of all three. This constraint can be formulated as a single axiom, but it is easier to understand in parts. The first part is that the order of the cell values across the columns must be the same in every row, and, similarly, the order across the rows must be the same in every column. In analysis of variance terminology, this amounts to saying that there is no crossing interaction.
The other part of the constraint is that there must be an additional consistency on the orders. To formulate this, we have to consider each three-by-three subtable, which is a combination of three levels of the row variable, say R0, R1, and R2, and three levels of the column variable, C0, C1, and C2. We are already assuming consistency of row and column orders, as in the previous paragraph, so RiC0 ≤ RiC1 ≤ RiC2 and R0Cj ≤ R1Cj ≤ R2Cj for any row i and column j. Suppose (i) R0C1 ≤ R1C0, which is a symbolic way of saying that the step from R0 to R1 had at least as big an effect as the step from C0 to C1. Suppose also that (ii) R1C2 ≤ R2C1, implying that the step from R1 to R2 was at least as large as the step from C1 to C2. (If the two comparisons do not go in the same direction, this 3×3 subtable tells us nothing.) The requirement that supports interval status for the variables states that, whenever these two ordering conditions are met, it should also be true that (iii) R0C2 ≤ R2C0. This rather formal statement has a commonsense interpretation: (i) says that the first step on R is at least as big as the first step on C, and (ii) says that the second step on R is at least as big as the second step on C. If things are behaving consistently, then, the two steps on R together should be at least as big as the two steps on C together, as summarized in (iii).
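For a concrete grid of dependent-variable values, this second condition (often called double cancellation) can be checked exhaustively over every 3×3 subtable. The sketch below assumes the rows and columns are already consistently ordered, as required by the first part of the constraint; the function name and the example tables are illustrative, not taken from the text:

```python
from itertools import combinations

def satisfies_double_cancellation(T):
    """Check conditions (i)-(iii) on every 3x3 subtable of grid T.

    Rows and columns of T are assumed already consistently ordered
    (no crossing interaction).
    """
    nr, nc = len(T), len(T[0])
    for r0, r1, r2 in combinations(range(nr), 3):
        for c0, c1, c2 in combinations(range(nc), 3):
            # (i) and (ii): each R step at least as large as its C step ...
            if T[r0][c1] <= T[r1][c0] and T[r1][c2] <= T[r2][c1]:
                # ... then (iii) must hold for interval status to be tenable.
                if not T[r0][c2] <= T[r2][c0]:
                    return False
            # The mirror case: each C step at least as large as its R step.
            if T[r1][c0] <= T[r0][c1] and T[r2][c1] <= T[r1][c2]:
                if not T[r2][c0] <= T[r0][c2]:
                    return False
    return True

# An additive table (cell = row effect + column effect) must pass ...
additive = [[r + c for c in (0, 1, 4)] for r in (0, 2, 7)]

# ... while this table is monotone in rows and columns yet fails (iii):
violating = [[1, 5, 100], [6, 10, 101], [11, 102, 103]]

print(satisfies_double_cancellation(additive))   # True
print(satisfies_double_cancellation(violating))  # False
```

The second table illustrates why the no-crossing-interaction condition alone is not enough: its rows and columns are perfectly ordered, but the sizes of the steps cannot be reconciled with any additive (interval-scale) representation.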