Mathematics
Two Categorical Variables
Two categorical variables are a pair of variables that each take values in a set of distinct categories or groups rather than on a numerical scale. In statistical analysis, such a pair is examined jointly to look for an association between the categories of one variable and the categories of the other. This is typically done by cross-tabulating the counts in a contingency table and applying a chi-square test to assess whether the association is statistically significant.
Written by Perlego with AI-assistance
7 Key excerpts on "Two Categorical Variables"
- Peter Sprent, Nigel C. Smeeton (Authors)
- 2016 (Publication Date)
- Chapman and Hall/CRC (Publisher)
CHAPTER 12 CATEGORICAL DATA
12.1 Categories and Counts
Data may consist of counts of the number of units (people, institutions, towns, countries, items, etc.) with given attributes. It is often convenient to present these in one-, two-, three- or higher-dimensional tables usually referred to as one-way, two-way, three-way, etc., contingency tables. Each dimension, or way, corresponds to a classification into categories representing attributes. Attributes may be explanatory (e.g., dose levels of a drug, the names of several different drugs, gender, psychiatric diagnoses, ethnic groups, income levels). Alternatively, they may be responses (e.g., side-effects of drugs classified as none, slight, moderate, severe, blood pressure levels after administration of a drug, examination grades). Attributes are often qualitative. If there is no natural ordering they are described as nominal (e.g., psychiatric diagnoses, ethnic groups). Attributes that may be arranged in a natural order are described as ordinal (e.g., reactions to a drug classified as slight, moderate, severe; grouping by age under 50 and age 50 and over).
This and the following chapter deal with problems that at first sight appear different from any previously considered, yet many are solved using procedures we have already developed. The link is possible because we can re-express many problems met earlier in an equivalent contingency table format.
This chapter is mainly about two-way tables consisting of two or more rows and columns. Typically, each row represents either a level of an explanatory attribute or a level of response, and each column represents a level of response. Table 12.1 has two rows and five columns. We may want to know whether the data indicate that the incidence rate of side-effects differs between drugs. The null hypothesis of independence is often expressed as one of no association between row and column categories.
- Michel Jambu (Author)
- 1991 (Publication Date)
- Academic Press (Publisher)
Chapter 4 2-D Statistical Data Analysis
1 Introduction
In practice, many users stop their statistical investigations after having studied the variables independently from each other. However, they have used only 1-D analysis, and usually cannot put forward any explanation of causality for their data. For example, a questionnaire with two questions can be analyzed using two frequency distributions. However, studying each frequency distribution individually cannot reveal any relation between the two questions. Another example is given by the study of two quantitative variables, for which as many statistical characteristics or graphics as required can be built (cf. Chapter 3). They cannot help, however, to explain the relation between the two variables. The only way to approach the explanation of how one variable is related to another is to build a relation between the two variables. That is the objective of 2-D statistical data analysis, where two variables are analyzed according to the following points of view:
1. To express and highlight the relationship between two variables, in order to show the statistical dependence between them.
2. When possible, to sum up the relations by a law of variation or a statistical dependence, and to characterize them by a numerical coefficient independent of the units of measure of the variables.
These studies vary according to the type of variables involved (quantitative, categorical, chronological, logical, etc.), and are presented in what follows.
2 2-D Analysis of Two Categorical Variables
2.1 Contingency Data Sets
The way to express a relation between two categorical variables is to compute a contingency data set as follows. Let the two categorical variables be denoted by $V_1$ and $V_2$:
- $V_1$ has $h$ forms denoted by $A_1, A_2, \ldots, A_h$;
- $V_2$ has $k$ forms denoted by $B_1, \ldots, B_k$.
For each couple of forms $(A_i, B_j)$, we compute the number of observations, denoted by $n_{ij}$, that possess the forms $A_i$ and $B_j$.
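The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's procedure: the function name `contingency_counts` and the sample observations are hypothetical, and the forms are simply labelled A1..A3 and B1..B2.

```python
from collections import Counter

def contingency_counts(v1, v2):
    """Return a mapping {(a, b): n_ab} for paired observations of two categorical variables."""
    if len(v1) != len(v2):
        raise ValueError("both variables must have one value per observation")
    return Counter(zip(v1, v2))

# Hypothetical observations: V1 takes the forms A1..A3, V2 the forms B1..B2.
v1 = ["A1", "A2", "A1", "A3", "A2", "A1"]
v2 = ["B1", "B1", "B2", "B2", "B1", "B1"]

n = contingency_counts(v1, v2)

# Print one row per form of V1, one column per form of V2.
for a in sorted(set(v1)):
    print(a, [n[(a, b)] for b in sorted(set(v2))])
```

A `Counter` returns 0 for a couple of forms that never occurs together, so empty cells need no special handling.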
Introductory Statistics and Analytics
A Resampling Perspective
- Peter C. Bruce (Author)
- 2015 (Publication Date)
- Wiley (Publisher)
Chapter 5 Relationship Between Two Categorical Variables
In this chapter, we look at two-way tables, also called 2 × 2 tables, in which rows and columns represent binary values of two different variables. 2 × 2 tables are a subset of r × c tables (short for row × column), where the rows and columns can represent more than two values of their variables. After completing this chapter, you should be able to
- build and interpret 2 × 2 tables,
- specify how to do a resampling test for a difference between two proportions,
- perform probability calculations involving conditional probabilities,
- perform basic Bayesian calculations,
- define and test for statistical independence.
5.1 Two-Way Tables
We now return to the data previously mentioned on admission to graduate schools. The data are for the six largest academic departments, and the issue under consideration was admission rates for men and women. We begin with the two categorical variables, Gender and Admit. As before, we look at eight folks in a fragment of the database (Table 5.1).

Table 5.1 Applicants to Graduate School (Small Subset)
| Gender | Dept. | Admit |
|---|---|---|
| Male | A | Admitted |
| Male | B | Rejected |
| Male | A | Admitted |
| Female | C | Rejected |
| Male | A | Admitted |
| Female | B | Rejected |
| Male | C | Admitted |
| Female | B | Admitted |

Ignoring the department variable for now, the first person is a male who was admitted, so he goes in Table 5.2.

Table 5.2 Building a 2 × 2 Table
| | Female | Male |
|---|---|---|
| Admitted | | 1 |
| Rejected | | |

Then, we have a rejected male, another admitted male, and a rejected female. We will enter these data as counts in each cell (Table 5.3).

Table 5.3
| | Female | Male |
|---|---|---|
| Admitted | | 2 |
| Rejected | 1 | 1 |

Finishing the table and adding row and column totals gives results that certainly look discriminatory (Table 5.4). However, these are only eight cases out of thousands. Table 5.5 is the full table for all 4526 applicants. Table 5.6 gives the data by percent. The column and row labeled “All” are termed marginal
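The tallying walked through above can be reproduced with a short Python sketch. The eight records are those of Table 5.1; Table 5.4 itself is not reproduced in this excerpt, so the finished counts below come from the tally rather than from the book. All variable names are illustrative.

```python
from collections import Counter

# The eight applicants from Table 5.1 as (gender, admit) pairs; Dept. is ignored.
applicants = [
    ("Male", "Admitted"), ("Male", "Rejected"), ("Male", "Admitted"),
    ("Female", "Rejected"), ("Male", "Admitted"), ("Female", "Rejected"),
    ("Male", "Admitted"), ("Female", "Admitted"),
]

cells = Counter(applicants)
genders = ["Female", "Male"]
outcomes = ["Admitted", "Rejected"]

# Cell counts, then the marginal ("All") row and column totals.
table = {o: {g: cells[(g, o)] for g in genders} for o in outcomes}
row_totals = {o: sum(table[o].values()) for o in outcomes}
col_totals = {g: sum(table[o][g] for o in outcomes) for g in genders}

print(f"{'':10}{'Female':>8}{'Male':>8}{'All':>8}")
for o in outcomes:
    print(f"{o:10}{table[o]['Female']:>8}{table[o]['Male']:>8}{row_totals[o]:>8}")
print(f"{'All':10}{col_totals['Female']:>8}{col_totals['Male']:>8}{len(applicants):>8}")

# Share admitted within each gender (column proportions).
admit_rate = {g: table["Admitted"][g] / col_totals[g] for g in genders}
```

With only eight records the proportions in `admit_rate` are very unstable, which is the excerpt's point about not reading too much into the small subset.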
Statistics with Confidence
An Introduction for Psychologists
- Michael J Smithson (Author)
- 1999 (Publication Date)
- SAGE Publications Ltd (Publisher)
9 Predicting Categorical Variables: Contingency Tables and Chi-Square
CONTENTS
- Tabulations and contingency tables
- Working with one categorical variable
- Working with two categorical variables
- Chi-square, effect size, and association
- Using percentages and proportions to interpret associations
- The odds ratio as a measure of association
- Questions and exercises
Tabulations and contingency tables
Predicting categorical variables involves working with tables, and tables (or tabulations) are among the most widely used ways of displaying data. In Chapter 3, we saw that categorical variables could be described quite well with frequency distributions presented in a tabular format. The frequency distribution of a variable presents the categories of that variable and the frequency with which each of them occurs. Usually, percentages are also provided in such tables. Here is an example. A medical researcher is interested in outcomes and survival rates of female patients who have had tumors in their breasts. She has obtained data on this kind of patient from a large-scale survey of hospitals in a region where the risk of breast cancer is reputed to be high. Suppose she has decided to categorize patient outcomes in three categories: benign, malignant/survived, and malignant/died. Then she might tabulate the data in a frequency distribution table like Table 9.1. Now, suppose the medical researcher wonders whether hospitals in an urban center have a similar outcome distribution to those in a rural hinterland where
Statistics
Learning from Data
- Roxy Peck, Tom Short (Authors)
- 2018 (Publication Date)
- Cengage Learning EMEA (Publisher)
CHAPTER 14 Learning from Categorical Data
In this type of bivariate categorical data set, only the grand total (the sample size) is known before the data are collected.
Two-way tables are also used when data are collected to compare two or more populations or treatments on the basis of a single categorical variable. In this situation, independent samples are selected from each population or treatment. For example, data could be collected at a university to compare students, faculty, and staff on the basis of primary mode of transportation to campus (car, bicycle, motorcycle, bus, or by foot). One random sample of 200 students, another random sample of 100 faculty members, and a third random sample of 150 staff members might be chosen. The selected individuals could be interviewed to obtain the necessary transportation information. Data from such a study could be summarized in a 3 × 5 two-way frequency table with row categories of student, faculty, and staff and column categories corresponding to the five possible modes of transportation. The observed cell counts could then be used to learn about differences and similarities among the three groups. In this situation, one set of marginal totals (the sample sizes for the different groups) is known before the data are collected. In the 3 × 5 situation just discussed, the row totals would be known (200, 100, and 150).
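One way the observed cell counts of such a 3 × 5 table might be compared across groups is sketched below. The cell counts are invented for illustration; only the row totals (200, 100, 150) and the category labels come from the text. The Pearson chi-square statistic is assumed here as the comparison, since the excerpt does not name a test.

```python
# Hypothetical counts, invented for illustration; only the row totals
# (200, 100, 150) and the category labels come from the text.
modes = ["car", "bicycle", "motorcycle", "bus", "foot"]
observed = {
    "student": [60, 50, 10, 40, 40],
    "faculty": [55, 15, 5, 10, 15],
    "staff":   [80, 20, 10, 25, 15],
}

row_totals = {g: sum(c) for g, c in observed.items()}
col_totals = [sum(c[j] for c in observed.values()) for j in range(len(modes))]
n = sum(row_totals.values())

# Expected count if all groups shared the same mode distribution:
# (row total * column total) / grand total.
expected = {
    g: [row_totals[g] * col_totals[j] / n for j in range(len(modes))]
    for g in observed
}

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[g][j] - expected[g][j]) ** 2 / expected[g][j]
    for g in observed
    for j in range(len(modes))
)
df = (len(observed) - 1) * (len(modes) - 1)

print(f"chi-square = {chi_square:.2f} on {df} degrees of freedom")
```

Because the row totals are fixed by the sampling design, the expected counts in each row always add back up to that group's sample size.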
- Trudy A. Watt, Robin H. McCleery, Tom Hart (Authors)
- 2007 (Publication Date)
- Chapman and Hall/CRC (Publisher)
8 Relating One Variable to Another
Statistics have shown that mortality increases perceptibly in the military during wartime. (Robert Boynton)
In the previous three chapters we have been concerned with the relationship between a single continuous variable and one or more categorical variables. Thus, in Chapter 5 we asked whether the number of spiders in a quadrat (continuous) differed depending on whether we sowed wildflower seed (categorical). In this chapter we consider what to do when we have two quantitative variables that we think might be related to one another.
8.1 Correlation
The simplest question we could ask about two continuous variables is whether they vary in a related way, i.e., is there a correlation between them? For example, the concentration (ppm) of two chemicals in the blood might be measured from a random sample of 14 patients suffering to various extents from a particular disease. If a consequence of the disease is that both chemicals are affected, we should expect patients with high values of one to have high values of the other and vice versa. Table 8.1 shows the concentrations of chemical A and of chemical B in the blood of 14 such patients. The data are shown as a scatter plot in Figure 8.1a. For comparison, Figure 8.1b shows the same data but with the column for B scrambled into a random order. In the graph of the “real” relationship, you can see that generally low concentrations of A tend to be associated with low concentrations of B, giving a “bottom left to top right” look to the graph. If we break up the relationship between each patient's A and B concentrations by randomizing column B, then the pattern disappears (Figure 8.1b). How do we characterize this relationship? Bearing in mind that what we are claiming is that relatively large concentrations of A are associated with
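The effect of scrambling column B can be sketched numerically. Table 8.1 is not reproduced in this excerpt, so the 14 pairs of concentrations below are invented, and Pearson's coefficient is an assumption: the excerpt is cut off before it names a measure. The helper `pearson_r` is a hypothetical name.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical concentrations (ppm) for 14 patients; these values are invented.
chem_a = [2.1, 3.4, 1.8, 5.0, 4.2, 6.1, 2.9, 3.8, 5.5, 4.7, 1.5, 6.8, 3.1, 4.0]
chem_b = [1.9, 3.0, 2.2, 4.6, 4.4, 5.7, 2.5, 3.9, 5.1, 4.1, 1.7, 6.2, 3.3, 3.6]

rng = random.Random(0)      # fixed seed so the shuffle is repeatable
scrambled_b = chem_b[:]
rng.shuffle(scrambled_b)    # breaks the pairing between A and B

print(f"paired r    = {pearson_r(chem_a, chem_b):.3f}")
print(f"scrambled r = {pearson_r(chem_a, scrambled_b):.3f}")
```

The scrambled column contains exactly the same values as the original, so any drop in the coefficient comes from breaking the pairing, not from changing the data.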
Statistics with JMP
Graphs, Descriptive Statistics and Probability
- Peter Goos, David Meintrup (Authors)
- 2015 (Publication Date)
- Wiley (Publisher)
The variable “Name” is a nominal variable. The variables “Rating” and “Price category” are ordinal variables. The other variables are quantitative. The measurement scale of a variable can be changed in JMP by a right-click on the name of a column, and then selecting “Column info”.
In this chapter, we will mainly treat so-called univariate and bivariate representations of variables. A univariate representation refers to one variable, while a bivariate representation refers to two variables simultaneously. Likewise, multivariate data is nothing but data consisting of several variables. In the remainder of the chapter, we assume that we have a data sample. However, the various representations that we will address may also be used for data of entire populations.
[Figure 2.1: Part of the data matrix on Spanish red wines.]
[Figure 2.2: Symbols used by JMP for the different measurement scales.]
2.3 Representing univariate qualitative variables
Categorical or qualitative variables allow us to put data into categories or classes. The absolute frequency, or simply the frequency, of a class is the number of elements of the sample that belong to that class. The relative frequency of a class is the ratio of the frequency and the total number of observations in the sample.
Example 2.3.1
The data set described here on Spanish wines contains the final rating of the wines. The following coding is used:
- E: excellent,
- G/E: good to excellent,
- G: good,
- F/G: fair to good,
- F: fair, and
- P/F: poor to fair.
The final rating is clearly a qualitative, ordinal variable. The absolute and relative frequencies for each class are shown in Table 2.1, which is called a frequency table. The same information can also be presented using a bar chart. Figure 2.3 shows two versions of a bar chart, which have exactly the same shape. The bar chart in Figure 2.3a shows the absolute frequencies, while that in Figure 2.3b displays the relative frequencies.
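The absolute and relative frequencies defined above can be computed as in the following sketch. The rating codes are those of Example 2.3.1, but the sample of 20 ratings is hypothetical, since Table 2.1 is not reproduced in this excerpt.

```python
from collections import Counter

# Rating codes from Example 2.3.1, ordered from best to worst.
RATING_ORDER = ["E", "G/E", "G", "F/G", "F", "P/F"]

# Hypothetical sample of final ratings; the real wine data are not shown.
ratings = ["G", "E", "G/E", "G", "F/G", "G", "G/E", "F", "G", "E",
           "G/E", "F/G", "G", "P/F", "G/E", "G", "F/G", "G", "E", "G/E"]

absolute = Counter(ratings)          # absolute frequency of each class
total = len(ratings)
relative = {code: absolute[code] / total for code in RATING_ORDER}

print(f"{'Rating':8}{'Frequency':>10}{'Relative':>10}")
for code in RATING_ORDER:
    print(f"{code:8}{absolute[code]:>10}{relative[code]:>10.2f}")
```

Listing the classes in `RATING_ORDER` rather than alphabetically keeps the ordinal scale visible in the printed table.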






