Mathematics
Categorical Variables
Categorical variables are a type of qualitative data that represent categories or groups. They can take on a limited, fixed number of values and are often used to label or classify data. In statistical analysis, categorical variables are used to organize and group data for comparison and analysis.
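As a minimal sketch of what grouping data by a categorical variable looks like in practice, the Python snippet below counts how many individuals fall into each category and what percent of the total that is. The `eye_colors` list is hypothetical data invented for illustration.

```python
from collections import Counter

# Hypothetical eye-color observations, one per individual (illustrative only).
eye_colors = ["Brown", "Blue", "Brown", "Green", "Hazel", "Brown", "Blue", "Brown"]

# Frequency of each category.
counts = Counter(eye_colors)
total = len(eye_colors)

# Percent of individuals in each category.
percents = {category: 100 * n / total for category, n in counts.items()}

for category, n in counts.most_common():
    print(f"{category}: {n} ({percents[category]:.1f}%)")
```

Counts and percentages are the natural summaries here because the categories carry no numeric meaning of their own.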
Written by Perlego with AI-assistance
8 Key excerpts on "Categorical Variables"
- Jessica Utts, Robert Heckard(Authors)
- 2015(Publication Date)
- Cengage Learning EMEA(Publisher)
2.2 Types of Variables
We learned in the previous section that a variable is a characteristic that may differ from one individual to the next. A variable may be a categorical characteristic, such as a person’s sex, or a numerical characteristic, such as hours of sleep last night. To determine what type of summary might provide meaningful information, you first have to recognize which type of variable you want to summarize.
For a categorical variable, the raw data consist of group or category names that don’t necessarily have any logical ordering. Each individual falls into one and only one category. For a categorical variable, the most fundamental summaries are how many individuals and what percent of the total fall into each category.
The term ordinal variable may be used to describe the data when a categorical variable has ordered categories. For example, suppose that you are asked to rate your driving skills compared to the skills of other drivers, using the codes 1 = better, 2 = the same, and 3 = worse. The response is an ordinal variable because the response categories are ordered perceptions of driving skills.
Following are a few examples of categorical variables and their possible categories. The final variable in the list, the rating of a teacher, is ordinal because the response categories convey an ordering.
Categorical Variable: Possible Categories
Dominant hand: Left-handed, Right-handed, Ambidextrous
Regular church attendance: Yes, No
Opinion about marijuana legalization: In favor, Opposed, Not sure
Eye color: Brown, Blue, Green, Hazel, Other
Teacher Rating: Scale of 1 to 7, 1 = Poor, 7 = Excellent
Copyright 2014 Cengage Learning. All Rights Reserved.
- Paul M. Kellstedt, Guy D. Whitten(Authors)
- 2008(Publication Date)
- Cambridge University Press(Publisher)
To help you to better understand each of these variable types, we will go through each with an example. All of the examples that we are using in these initial descriptions come from survey research, but the same basic principles of measurement metric hold regardless of the type of data being analyzed.
6.2.1 Categorical Variables
Categorical variables are variables for which cases have values that are either different from or the same as the values for other cases, but about which we cannot make any universally holding ranking distinctions. If we consider a variable that we might label “Religious Identification,” some values for this variable are “Catholic,” “Muslim,” “nonreligious,” and so on. Although these values are clearly different from each other, we cannot make universally holding ranking distinctions across them. More casually, with categorical variables like this one, it is not possible to rank order the categories from least to greatest: The value “Muslim” is neither greater nor less than “nonreligious” (and so on), for example. Instead, we are left knowing that cases with the same value for this variable are the same, whereas those cases with different values are different. The term “categorical” expresses the essence of this variable type; we can put individual cases into categories based on their values, but we cannot go any further in terms of ranking or otherwise ordering these values.
6.2.2 Ordinal Variables
Like categorical variables, ordinal variables are also variables for which cases have values that are either different from or the same as the values for other cases. The distinction between ordinal and categorical variables is that we can make universally holding ranking distinctions across the variable values for ordinal variables. For instance, consider the variable labeled “Retrospective Family Financial Situation” that has commonly been used as an independent variable in individual-level economic voting studies.
- Hamed Barjesteh, Shaghyegh Shirzad(Authors)
- 2020(Publication Date)
- Society Publishing(Publisher)
Furthermore, scores from these assessments are used by a wide variety of tertiary institutions in many different countries to make admissions decisions, so that the results of a needs analysis can provide only general guidelines for defining the constructs to be assessed. Tests that are based on a theory of language ability are commonly referred to as “proficiency” tests.
Key Concepts in Applied Linguistics: A Reference Guide
Categorical Variables vs. Numerical Variables
1. Categorical Variables: When researchers classify subjects by sorting them into mutually exclusive groups, the attribute on which they base the classification is termed a categorical variable. Home language, county of residence, father’s principal occupation, and school in which enrolled are examples of categorical variables. Other examples of categorical variables include: ‘gender’ (male, female), ‘eye color’ (blue, green, brown), ‘marital status’ (married, single), ‘nationality’ (Iranian, Italian, Russian, American), ‘self-esteem’ (high, medium, low), ‘proficiency level’ (high, intermediate, beginner).
i. Dichotomous/Binary Variables: The simplest type of categorical variable has only two mutually exclusive classes and is called a dichotomous variable. Male-female, citizen-alien, and pass-fail are dichotomous variables.
ii. Polytomous Variables: Some categorical variables have more than two classes (hence called polytomous/polychotomous variables); examples are educational level, religious affiliation, and state of birth.
2. Numerical Variables:
i. Discrete Variables: These are numerical variables that represent quantities that are counted, that do not allow for decimal points (e.g., you can’t have 3.5 students in your class; coins don’t come in amounts of 2.3 coins), and that can only take on a finite number of values. As a guide, discrete numerical variables arise when we ask the question “How many?”
ii.
- Paul M. Kellstedt, Guy D. Whitten(Authors)
- 2018(Publication Date)
- Cambridge University Press(Publisher)
Footnote 2: Many researchers will present this information in an appendix (often made available online) unless there is something particularly noteworthy about the characteristics of one or more of their variables.
6.2 What Is the Variable’s Measurement Metric?
Remember from Chapter 1 that we can think of each variable in terms of its label and its values. The label is the description of the variable – such as “Gender of survey respondent” – and its values are the denominations in which the variable occurs – such as “Male” or “Female.” For treatment in most statistical analyses, we are forced to divide our variables into two types according to the metric in which the values of the variable occur: categorical or continuous. In reality, variables come in at least three different metric types, and there are a lot of variables that do not neatly fit into just one of these classifications. To help you to better understand each of these variable types, we will go through each with an example. All of the examples that we are using in these initial descriptions come from survey research, but the same basic principles of measurement metric hold regardless of the type of data being analyzed.
6.2.1 Categorical Variables
Categorical variables are variables for which cases have values that are either different from or the same as the values for other cases, but about which we cannot make any universally holding ranking distinctions. If we consider a variable that we might label “Religious Identification,” some values for this variable are “Catholic,” “Muslim,” “nonreligious,” and so on. Although these values are clearly different from each other, we cannot make universally holding ranking distinctions across them. More casually, with categorical variables like this one, it is not possible to rank order the categories from least to greatest: The value “Muslim” is neither greater nor less than “nonreligious” (and so on), for example.
- eBook - PDF
Statistics with JMP
Graphs, Descriptive Statistics and Probability
- Peter Goos, David Meintrup(Authors)
- 2015(Publication Date)
- Wiley(Publisher)
2 Data and its representation
A microphone in the sidewalk would provide an eavesdropper with a cacophony of clicks, seemingly random like the noise from a Geiger counter. But the right kind of person could abstract signal from noise and count the pedestrians, provide a male/female breakdown and a leg-length histogram … (from Cryptonomicon, Neal Stephenson, p. 147)
Data is a set of measurements of one or more characteristics or variables of some elements of a population, or of a number of objects generated by a process. Different types of variables can be measured.
2.1 Types of data and measurement scales
Variables are classified according to the measurement scale on which they are measured. Categorical or qualitative variables are measured on a nominal scale or on an ordinal scale. Quantitative variables are either measured on an interval scale or on a ratio scale.
2.1.1 Categorical or qualitative variables
2.1.1.1 Nominal variables
Elements of a sample or a population can be classified using a nominal variable: the value of the variable places an element in a certain class or category. Examples of such variables are
• gender (male/female),
• nationality (Belgian, German, and so on),
• religion (Catholic, Protestant, and so on), and
• whether or not one owns a car (yes/no).
Sometimes it can be useful to assign labels, code numbers, or code letters, to the different classes or categories. For example, a Belgian person may be assigned the code “1”, a Dutch person the code “2”, a French person the code “3”, and a German person the code “4”. It is important to note that these figures do not imply any order and/or quantity.
Statistics with JMP: Graphs, Descriptive Statistics, and Probability, First Edition. Peter Goos and David Meintrup. © 2015 John Wiley & Sons, Ltd. Published 2015 by John Wiley & Sons, Ltd. Companion Website: wiley.com/go/goosandmeintrup
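The excerpt's point that code numbers imply no order or quantity can be illustrated with a minimal Python sketch. The names `NATIONALITY_CODES`, `sample`, and `relabel`, and the sample values, are hypothetical. Only the coding of Belgian = 1 through German = 4 comes from the excerpt.

```python
from collections import Counter
from statistics import mean

# Coding taken from the excerpt: the numbers are labels only.
NATIONALITY_CODES = {1: "Belgian", 2: "Dutch", 3: "French", 4: "German"}

# Hypothetical sample of coded observations (illustrative, not real data).
sample = [1, 4, 4, 2, 1, 3, 4]

# Meaningful for a nominal variable: how many elements fall in each class.
counts = Counter(NATIONALITY_CODES[code] for code in sample)

# Computable but meaningless: the "average nationality" depends entirely on
# the arbitrary code assignment and corresponds to no category.
mean_code = mean(sample)

# Reassigning the codes changes the mean, although the data are the same.
relabel = {1: 4, 2: 3, 3: 2, 4: 1}
mean_relabelled = mean(relabel[code] for code in sample)

print(counts)
print(mean_code, mean_relabelled)
```

The category counts do not depend on which number is attached to which nationality, whereas the mean does. This suggests that arithmetic on nominal codes should be avoided.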
- Samprit Chatterjee, Ali S. Hadi(Authors)
- 2015(Publication Date)
- Wiley(Publisher)
CHAPTER 5 QUALITATIVE VARIABLES AS PREDICTORS
5.1 INTRODUCTION
Qualitative or categorical variables can be very useful as predictor variables in regression analysis. Qualitative variables such as gender, marital status, or political affiliation can be represented by indicator or dummy variables. These variables take on only two values, usually 0 and 1. The two values signify that the observation belongs to one of two possible categories. The numerical values of indicator variables are not intended to reflect a quantitative ordering of the categories, but only serve to identify category or class membership. For example, an analysis of salaries earned by computer programmers may include variables such as education, years of experience, and gender as predictor variables. The gender variable could be quantified, say, as 1 for female and 0 for male. Indicator variables can also be used in a regression equation to distinguish among three or more groups as well as among classifications across various types of groups. For example, the regression described above may also include an indicator variable to distinguish whether the observation was for a systems or applications programmer. The four conditions determined by gender and type of programming can be represented by combining the two variables, as we shall see in this chapter.
Regression Analysis by Example, Fifth Edition. By Samprit Chatterjee and Ali S. Hadi. Copyright © 2012 John Wiley & Sons, Inc.
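The indicator coding described in this excerpt might be sketched in Python as follows. The `records` list and the `indicator` helper are hypothetical. Coding "systems" as 1 is an assumption made here, since the excerpt fixes only the gender coding of 1 for female and 0 for male.

```python
# Hypothetical records (gender, programmer type); illustrative values only.
records = [
    ("female", "systems"),
    ("male", "systems"),
    ("female", "applications"),
    ("male", "applications"),
]


def indicator(value: str, category: str) -> int:
    """Return 1 if the observation belongs to `category`, else 0."""
    return 1 if value == category else 0


# Gender coding follows the excerpt: 1 for female, 0 for male.
# Coding "systems" as 1 is an assumption made for this sketch.
coded = [
    (indicator(gender, "female"), indicator(kind, "systems"))
    for gender, kind in records
]

# Two indicators jointly identify the four gender-by-type conditions.
conditions = sorted(set(coded))

print(coded)
print(conditions)
```

The 0 and 1 values only flag membership in a category. Swapping which category receives the 1 would change the sign of a fitted coefficient but not the information the variable carries.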
- Peter Sprent, Nigel C. Smeeton(Authors)
- 2016(Publication Date)
- Chapman and Hall/CRC(Publisher)
CHAPTER 12 CATEGORICAL DATA
12.1 Categories and Counts
Data may consist of counts of the number of units (people, institutions, towns, countries, items, etc.) with given attributes. It is often convenient to present these in one-, two-, three- or higher-dimensional tables usually referred to as one-way, two-way, three-way, etc., contingency tables. Each dimension, or way, corresponds to a classification into categories representing attributes. Attributes may be explanatory (e.g., dose levels of a drug, the names of several different drugs, gender, psychiatric diagnoses, ethnic groups, income levels). Alternatively, they may be responses (e.g., side-effects of drugs classified as none, slight, moderate, severe, blood pressure levels after administration of a drug, examination grades). Attributes are often qualitative. If there is no natural ordering they are described as nominal (e.g., psychiatric diagnoses, ethnic groups). Attributes that may be arranged in a natural order are described as ordinal (e.g., reactions to a drug classified as slight, moderate, severe; grouping by age under 50 and age 50 and over).
This and the following chapter deal with problems that at first sight appear different from any previously considered, yet many are solved using procedures we have already developed. The link is possible because we can re-express many problems met earlier in an equivalent contingency table format. This chapter is mainly about two-way tables consisting of two or more rows and columns. Typically, each row represents either a level of an explanatory attribute or a level of response, and each column represents a level of response. Table 12.1 has two rows and five columns. We may want to know whether the data indicate that the incidence rate of side-effects differs between drugs. The null hypothesis of independence is often expressed as one of no association between row and column categories.
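To make the idea of "no association between row and column categories" concrete, the Python sketch below builds a small two-way table and computes the counts expected under independence, together with the Pearson chi-squared statistic. This is one common way to examine independence and is not necessarily the procedure the authors go on to use. The counts are invented for illustration and are not the book's Table 12.1.

```python
# Hypothetical two-way table: rows are drugs, columns are side-effect levels.
# These counts are invented for illustration and are not the book's Table 12.1.
levels = ["none", "slight", "moderate", "severe"]
table = {
    "Drug A": [30, 10, 6, 4],
    "Drug B": [20, 15, 10, 5],
}

row_totals = {drug: sum(counts) for drug, counts in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(row_totals.values())

# Under independence, expected count = row total * column total / grand total.
expected = {
    drug: [row_totals[drug] * c / grand_total for c in col_totals]
    for drug in table
}

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
chi_squared = sum(
    (obs - exp) ** 2 / exp
    for drug in table
    for obs, exp in zip(table[drug], expected[drug])
)
degrees_of_freedom = (len(table) - 1) * (len(levels) - 1)

print(expected)
print(chi_squared, degrees_of_freedom)
```

A large statistic relative to its degrees of freedom would count as evidence against independence. With small expected counts, an exact method may be more appropriate than the chi-squared approximation.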
- Julian J. Faraway(Author)
- 2016(Publication Date)
- Chapman and Hall/CRC(Publisher)
Chapter 14 Categorical Predictors
Predictors that are qualitative in nature, for example, eye color, are sometimes described as categorical or called factors. The different categories of a factor variable are called levels. For example, suppose we recognize eye colors of “blue”, “green”, “brown” and “hazel”, then we would say eye color is a factor with four levels. We wish to incorporate these predictors into the regression analysis. We start with the example of a factor with just two levels, then show how to introduce quantitative predictors into the model and end with an example using a factor with more than two levels.
14.1 A Two-Level Factor
The data for this example come from a study of the effects of childhood sexual abuse on adult females reported in Rodriguez et al. (1997): 45 women treated at a clinic, who reported childhood sexual abuse (csa), were measured for post-traumatic stress disorder (ptsd) and childhood physical abuse (cpa) both on standardized scales. Thirty-one women treated at the same clinic, who did not report childhood sexual abuse, were also measured. The full study was more complex than reported here and so readers interested in the subject matter should refer to the original article. We take a look at the data and produce a summary subsetted by csa:
> data(sexab, package="faraway")
> sexab
       cpa     ptsd       csa
1  2.04786  9.71365    Abused
2  0.83895  6.16933    Abused
.....
75 2.85253  6.84304 NotAbused
76 0.81138  7.12918 NotAbused
> by(sexab, sexab$csa, summary)
sexab$csa: Abused
      cpa             ptsd             csa
 Min.   :-1.11   Min.   : 5.98   Abused   :45
 1st Qu.: 1.41   1st Qu.: 9.37   NotAbused: 0
 Median : 2.63   Median :11.31
 Mean   : 3.08   Mean   :11.94
 3rd Qu.: 4.32   3rd Qu.:14.90
 Max.   : 8.65   Max.   :18.99
----------------------------------------------
sexab$csa: NotAbused
      cpa             ptsd             csa
 Min.   :-3.12   Min.   :-3.35   Abused   : 0
 1st Qu.:-0.23   1st Qu.: 3.54   NotAbused:31
 Median : 1.32   Median : 5.79
 Mean   : 1.31   Mean   : 4.70
 3rd Qu.: 2.83   3rd Qu.: 6.84
 Max.
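The excerpt's session is in R. As a rough analogue, the Python sketch below shows what summarizing a quantitative variable by the levels of a factor involves, using only the standard library. It uses just the four rows that the excerpt prints (rows 1, 2, 75, and 76), so its results will not match the full-data summary above. The variable names `rows`, `groups`, and `summary` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# The four rows printed in the excerpt (rows 1, 2, 75, 76 of sexab). The
# remaining 72 rows are elided there, so these summaries will not match
# the full-data output.
rows = [
    {"cpa": 2.04786, "ptsd": 9.71365, "csa": "Abused"},
    {"cpa": 0.83895, "ptsd": 6.16933, "csa": "Abused"},
    {"cpa": 2.85253, "ptsd": 6.84304, "csa": "NotAbused"},
    {"cpa": 0.81138, "ptsd": 7.12918, "csa": "NotAbused"},
]

# The levels of the factor are its distinct categories.
levels = sorted({row["csa"] for row in rows})

# Group the ptsd scores by level, similar in spirit to by(...) in R.
groups = defaultdict(list)
for row in rows:
    groups[row["csa"]].append(row["ptsd"])

# Per-level count and mean of the ptsd scores.
summary = {
    level: {"n": len(groups[level]), "mean_ptsd": mean(groups[level])}
    for level in levels
}

print(levels)
print(summary)
```

Here `csa` plays the role of a two-level factor: it splits the observations into groups, and the quantitative variable is then summarized within each group.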
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.







