Chi Square Test for Homogeneity
The Chi Square Test for Homogeneity is a statistical test used to determine whether the distribution of categorical variables is the same across different groups or populations. It compares observed frequencies of the categories with the expected frequencies under the assumption of homogeneity. The test helps to assess whether there are significant differences in the distribution of categorical variables among groups.
Written by Perlego with AI-assistance
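The comparison of observed and expected frequencies described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the excerpted textbooks: the function name `homogeneity_chi_square` and the counts in `observed` are hypothetical, and the expected counts use the usual (row total)(column total)/(grand total) rule quoted in the excerpts below.

```python
def homogeneity_chi_square(table):
    """Chi-square statistic for a groups-by-categories table of counts.

    table: list of rows, one row of category counts per group (hypothetical layout).
    Returns (statistic, degrees_of_freedom, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count under homogeneity: (row total)(column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (O - E)^2 / E over every cell of the table
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical data: two groups, three response categories
observed = [
    [30, 50, 20],
    [20, 50, 30],
]
statistic, df, expected = homogeneity_chi_square(observed)
```

A large statistic relative to the chi-square distribution with `df` degrees of freedom would count as evidence against homogeneity. The sketch does not check the common condition that every expected count be at least five.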
10 Key excerpts on "Chi Square Test for Homogeneity"
- eBook - PDF
- Roxy Peck, Chris Olsen, Tom Short(Authors)
- 2019(Publication Date)
- Cengage Learning EMEA(Publisher)
Communicating the Results of Statistical Analyses
Three different chi-square tests were introduced in this chapter: the goodness-of-fit test, the test for homogeneity, and the test for independence. They are used in different settings and to answer different questions. When summarizing the results of a chi-square test, be sure to indicate which chi-square test was performed. One way to do this is to be clear about how the data were collected and the nature of the hypotheses being tested. It is also a good idea to include a table of observed and expected counts in addition to reporting the value of the test statistic and the P-value. And finally, make sure to give a conclusion in context, and that the conclusion is worded appropriately for the type of test conducted. For example, don’t use terms such as independence and association to describe the conclusion if the test performed was a test for homogeneity.
Interpreting the Results of Statistical Analyses
As with the other hypothesis tests considered, it is common to find the result of a chi-square test summarized by giving the value of the chi-square test statistic and an associated P-value. Because categorical data can be summarized compactly in frequency tables, the data are often given in the article (unlike data for numerical variables, which are rarely given).
Copyright 2020 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
- Bruce M. King, Patrick J. Rosopa, Edward W. Minium(Authors)
- 2018(Publication Date)
- Wiley(Publisher)
21 Chi-Square and Inference about Frequencies
When you have finished studying this chapter, you should be able to:
• Understand that the chi-square test is used to test hypotheses about the number of cases falling into the categories of a frequency distribution;
• Understand that χ² provides a measure of the difference between observed frequencies and the frequencies that would be expected if the null hypothesis were true;
• Explain why the chi-square test is best viewed as a test about proportions;
• Compute χ² for one-variable goodness-of-fit problems;
• Compute χ² to test for independence between two variables; and
• Compute effect size for the chi-square test.
In previous chapters, we have been concerned with numerical scores and testing hypotheses about the mean or the correlation coefficient. In this chapter, you will learn to make inferences about frequencies, that is, the number of cases falling into the categories of a frequency distribution. For example, among four brands of soft drinks, is there a difference in the proportion of consumers who prefer the taste of each? Is there a difference among registered voters in their preference for three candidates running for local office? To answer questions like these, a researcher compares the observed (sample) frequencies for the several categories of the distribution with those frequencies expected according to his or her hypothesis. The difference between observed and expected frequencies is expressed in terms of a statistic named chi-square (χ²), introduced by Karl Pearson in 1900.
21.1 The Chi-Square Test for Goodness of Fit
The chi-square (pronounced “ki”) test was developed for categorical data; that is, for data comprising qualitative categories, such as eye color, gender, or political affiliation. Although the chi-square test is conducted in terms of frequencies, it is best viewed conceptually as a test about proportions.
- eBook - PDF
- Barbara Illowsky, Susan Dean(Authors)
- 2020(Publication Date)
- Openstax(Publisher)
The test is right-tailed. Each observation or cell category must have an expected value of at least five.
11.5 Comparison of the Chi-Square Tests
The goodness-of-fit test is typically used to determine if data fits a particular distribution. The test of independence makes use of a contingency table to determine the independence of two factors. The test for homogeneity determines whether two populations come from the same distribution, even if this distribution is unknown.
11.6 Test of a Single Variance
To test variability, use the chi-square test of a single variance. The test may be left-, right-, or two-tailed, and its hypotheses are always expressed in terms of the variance or standard deviation.
FORMULA REVIEW
11.1 Facts About the Chi-Square Distribution
$\chi^2 = (Z_1)^2 + (Z_2)^2 + \dots + (Z_{df})^2$ (chi-square distribution random variable)
$\mu_{\chi^2} = df$ (chi-square distribution population mean)
$\sigma_{\chi^2} = \sqrt{2(df)}$ (chi-square distribution population standard deviation)
11.2 Goodness-of-Fit Test
$\sum_{k} \frac{(O - E)^2}{E}$ (goodness-of-fit test statistic), where
O: observed values
E: expected values
k: number of different data cells or categories
df = k − 1 degrees of freedom
11.3 Test of Independence
• The number of degrees of freedom is equal to (number of columns − 1)(number of rows − 1).
• The test statistic is $\sum_{(i \cdot j)} \frac{(O - E)^2}{E}$, where O = observed values, E = expected values, i = the number of rows in the table, and j = the number of columns in the table.
• If the null hypothesis is true, the expected number $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$.
- eBook - PDF
- Barbara Illowsky, Susan Dean(Authors)
- 2016(Publication Date)
- Openstax(Publisher)
The test is right-tailed. Each observation or cell category must have an expected value of at least five.
11.5 Comparison of the Chi-Square Tests
The goodness-of-fit test is typically used to determine if data fits a particular distribution. The test of independence makes use of a contingency table to determine the independence of two factors. The test for homogeneity determines whether two populations come from the same distribution, even if this distribution is unknown.
11.6 Test of a Single Variance
To test variability, use the chi-square test of a single variance. The test may be left-, right-, or two-tailed, and its hypotheses are always expressed in terms of the variance (or standard deviation).
FORMULA REVIEW
11.1 Facts About the Chi-Square Distribution
$\chi^2 = (Z_1)^2 + (Z_2)^2 + \dots + (Z_{df})^2$ (chi-square distribution random variable)
$\mu_{\chi^2} = df$ (chi-square distribution population mean)
$\sigma_{\chi^2} = \sqrt{2(df)}$ (chi-square distribution population standard deviation)
11.2 Goodness-of-Fit Test
$\sum_{k} \frac{(O - E)^2}{E}$ (goodness-of-fit test statistic), where:
O: observed values
E: expected values
k: number of different data cells or categories
df = k − 1 degrees of freedom
11.3 Test of Independence
• The number of degrees of freedom is equal to (number of columns − 1)(number of rows − 1).
• The test statistic is $\sum_{(i \cdot j)} \frac{(O - E)^2}{E}$, where O = observed values, E = expected values, i = the number of rows in the table, and j = the number of columns in the table.
• If the null hypothesis is true, the expected number $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$.
- eBook - PDF
- Prem S. Mann(Author)
- 2020(Publication Date)
- Wiley(Publisher)
p-value for χ² = 14.077 will be less than .005. (By using technology, we obtain the p-value of .0009.) Since α = .01 is greater than .005 (or .0009), we reject the null hypothesis and conclude that the current percentage distribution of opinions of U.S. adults in response to the survey question seems to be significantly different from the distribution of opinions in 2019.
Data source: https://www.pewsocialtrends.org/2019/12/11/most-americans-say-the-current-economy-is-helping-the-rich-hurting-the-poor-and-middle-class/.
11.3 A Test of Independence or Homogeneity
LEARNING OBJECTIVES
After completing this section, you should be able to:
• Describe the difference between a test of independence and a test of homogeneity.
• Verify the conditions for the tests of independence and homogeneity.
• Write the null and alternative hypotheses for the tests of independence and homogeneity.
• Use a χ²-distribution to illustrate the critical value, the rejection region and the nonrejection region for the tests of independence and homogeneity.
• Calculate the value of a χ² test statistic.
• Make a conclusion based on a comparison of the test statistic and the critical value.
• Apply the five-step method using a critical value approach to perform the tests of hypothesis of independence and homogeneity.
This section is concerned with tests of independence and homogeneity, which are performed using contingency tables. Except for a few modifications, the procedure used to make such tests is almost the same as the one applied in Section 11.2 for a goodness-of-fit test.
11.3.1 A Contingency Table
Often we may have information on more than one categorical variable for each element. Such information can be summarized and presented using a two-way classification table, which is also called a contingency table or cross-tabulation. Suppose a university has a total enrollment of 20,758 students.
- eBook - PDF
- Prem S. Mann(Author)
- 2017(Publication Date)
- Wiley(Publisher)
CHAPTER 11 Chi-Square Tests
Are you a fan of people who work on Wall Street? Do you think that people who work on Wall Street are as honest and moral as the general public? In a Harris poll conducted in 2012, 28% of the U.S. adults polled agreed with the statement, “In general, people on Wall Street are as honest and moral as other people.” Sixty-eight percent of the adults polled disagreed with this statement. (See Case Study 11–1.)
11.1 The Chi-Square Distribution
11.2 A Goodness-of-Fit Test
Case Study 11–1 Are People On Wall Street Honest And Moral?
11.3 A Test of Independence or Homogeneity
11.4 Inferences About the Population Variance
The tests of hypothesis about the mean, the difference between two means, the proportion, and the difference between two proportions were discussed in Chapters 9 and 10. The tests about proportions dealt with countable or categorical data. In the case of a proportion and the difference between two proportions in Chapters 9 and 10, the tests concerned experiments with only two categories. Recall from Chapter 5 that such experiments are called binomial experiments. This chapter describes three types of tests:
1. Tests of hypothesis for experiments with more than two categories, called goodness-of-fit tests
2. Tests of hypothesis about contingency tables, called independence and homogeneity tests
3. Tests of hypothesis about the variance and standard deviation of a single population
All of these tests are performed by using the chi-square distribution, which is sometimes written as χ² distribution and is read as “chi-square distribution.” The symbol χ is the Greek letter chi, pronounced “ki.” The values of a chi-square distribution are denoted by the symbol χ² (read as “chi-square”), just as the values of the standard normal distribution and the t distribution are denoted by z and t, respectively. Section 11.1 describes the chi-square distribution.
- eBook - PDF
Biostatistics
A Foundation for Analysis in the Health Sciences
- Wayne W. Daniel, Chad L. Cross(Authors)
- 2018(Publication Date)
- Wiley(Publisher)
12 The Chi-Square Distribution and the Analysis of Frequencies
CHAPTER OVERVIEW
This chapter explores techniques that are commonly used in the analysis of count or frequency data. Uses of the chi-square distribution, which was mentioned briefly in Chapter 6, are discussed and illustrated in greater detail. Additionally, statistical techniques often used in epidemiological studies are introduced and demonstrated by means of examples.
TOPICS
12.1 Introduction
12.2 The Mathematical Properties of the Chi-Square Distribution
12.3 Tests of Goodness-of-Fit
12.4 Tests of Independence
12.5 Tests of Homogeneity
12.6 The Fisher’s Exact Test
12.7 Relative Risk, Odds Ratio, and the Mantel–Haenszel Statistic
12.8 Summary
LEARNING OUTCOMES
After studying this chapter, the student will
1. understand the mathematical properties of the chi-square distribution.
2. be able to use the chi-square distribution for goodness-of-fit tests.
3. be able to construct and use contingency tables to test independence and homogeneity.
4. be able to apply Fisher’s exact test for 2 × 2 tables.
5. understand how to use contingency tables to test proportions.
6. understand how to calculate and interpret the epidemiological concepts of relative risk, odds ratios, and the Mantel–Haenszel statistic.
12.1 Introduction
In the chapters on estimation (Chapter 6) and hypothesis testing (Chapter 7), brief mention is made of the chi-square distribution in the construction of confidence intervals for, and the testing of, hypotheses concerning a population variance. This distribution, which is one of the most widely used distributions in statistical applications, has many other uses. Some of the more common ones are presented in this chapter along with a more complete description of the distribution itself, which follows in the next section.
- No longer available
Understandable Statistics
Concepts and Methods, Enhanced
- Charles Henry Brase, Corrinne Pellillo Brase(Authors)
- 2016(Publication Date)
- Cengage Learning EMEA(Publisher)
Chapter 10 CHI-SQUARE AND F DISTRIBUTIONS
1. Statistical Literacy In general, are chi-square distributions symmetric or skewed? If skewed, are they skewed right or left?
2. Statistical Literacy For chi-square distributions, as the number of degrees of freedom increases, does any skewness increase or decrease? Do chi-square distributions become more symmetric (and normal) as the number of degrees of freedom becomes larger and larger?
3. Statistical Literacy For chi-square tests of independence and of homogeneity, do we use a right-tailed, left-tailed, or two-tailed test?
4. Critical Thinking In general, how do the hypotheses for chi-square tests of independence differ from those for chi-square tests of homogeneity? Explain.
5. Critical Thinking Zane is interested in the proportion of people who recycle each of three distinct products: paper, plastic, electronics. He wants to test the hypothesis that the proportion of people recycling each type of product differs by age group: 12–18 years old, 19–30 years old, 31–40 years old, over 40 years old. Describe the sampling method appropriate for a test of homogeneity regarding recycled products and age.
6. Critical Thinking Charlotte is doing a study on fraud and identity theft based both on source (checks, credit cards, debit cards, online banking/finance sites, other) and on gender of the victim. Describe the sampling method appropriate for a test of independence regarding source of fraud and gender.
7. Interpretation: Test of Homogeneity Consider Zane’s study regarding products recycled and age group (see Problem 5).
- eBook - PDF
- Prem S. Mann(Author)
- 2016(Publication Date)
- Wiley(Publisher)
For example, we may want to test if the affiliation of people with the Democratic and Republican parties is independent of their income levels. We perform such a test by using the chi-square distribution. As another example, we may want to test if there is an association between being a man or a woman and having a preference for watching sports or soap operas on television.
... each, and 3% have two or more defects each. The quality control inspectors at the company take samples of metal sheets quite often and check them for defects. If the distribution of defects for a sample is significantly different from the above-mentioned percentage distribution, the process is stopped and adjusted. A recent sample of 300 sheets produced the frequency distribution of defects listed in the following table.
Number of defects         None   One   Two or More
Number of metal sheets    262    24    14
Does the evidence from this sample suggest that the process needs an adjustment? Use α = .01.
Table 11.5 Total Enrollment at a University
          Full-Time   Part-Time
Male      6768        2615
Female    7658        3717
(The Male, Part-Time cell is labeled “Students who are male and enrolled part-time.”)
The value of the test statistic χ² in a test of independence is obtained using the same formula as in the goodness-of-fit test described in Section 11.2. The null hypothesis in a test of independence is always that the two attributes are not related. The alternative hypothesis is that the two attributes are related. The frequencies obtained from the performance of an experiment for a contingency table are called the observed frequencies. The procedure to calculate the expected frequencies for a contingency table for a test of independence is different from the one for a goodness-of-fit test. Example 11–5 describes this procedure.
EXAMPLE 11–5 Lack of Discipline in Schools
Lack of discipline has become a major problem in schools in the United States.
- Charles Henry Brase, Corrinne Pellillo Brase(Authors)
- 2016(Publication Date)
- Cengage Learning EMEA(Publisher)
VIEWPOINT
Loyalty! Going, Going, Gone!
Was there a time in the past when people worked for the same company all their lives, regularly purchased the same brand names, always voted for candidates from the same political party, and loyally cheered for the same sports team? One way to look at this question is to consider tests of statistical independence. Is customer loyalty independent of company profits? Can a company maintain its productivity independent of loyal workers? Can politicians do whatever they please independent of the voters back home? Americans may be ready to act on a pent-up desire to restore a sense of loyalty in their lives. For more information, see American Demographics, Vol. 19, No. 9.
1. Statistical Literacy In general, are chi-square distributions symmetrical or skewed? If skewed, are they skewed right or left?
2. Statistical Literacy For chi-square distributions, as the number of degrees of freedom increases, does any skewness increase or decrease? Do chi-square distributions become more symmetrical (and normal) as the number of degrees of freedom becomes larger and larger?
3. Statistical Literacy For chi-square tests of independence and of homogeneity, do we use a right-tailed, left-tailed, or two-tailed test?
4. Critical Thinking In general, how do the hypotheses for chi-square tests of independence differ from those for chi-square tests of homogeneity? Explain.
5. Critical Thinking Zane is interested in the proportion of people who recycle each of three distinct products: paper, plastic, and electronics. He wants to test the hypothesis that the proportion of people recycling each type of product differs by age group: 12–18 years old, 19–30 years old, 31–40 years old, and over 40 years old. Describe the sampling method appropriate for a test of homogeneity regarding recycled products and age.
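Several of the excerpts note that these chi-square tests are right-tailed, so the P-value is the area to the right of the observed statistic. The following is a rough sketch of that upper-tail area using only Python's standard library. The function name `chi2_right_tail` is hypothetical, and the step-up recurrence for the incomplete gamma function is a choice made here, not a method from the excerpted books. The check against the Mann (2020) excerpt assumes 2 degrees of freedom, which that excerpt does not state.

```python
import math


def chi2_right_tail(x, df):
    """Upper-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        tail, a = math.exp(-half), 1.0              # closed form for df = 2
    else:
        tail, a = math.erfc(math.sqrt(half)), 0.5   # closed form for df = 1
    # Raise df by 2 at a time using
    # Q(a + 1, x/2) = Q(a, x/2) + (x/2)^a * exp(-x/2) / Gamma(a + 1)
    while 2 * a < df:
        tail += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return tail


# Statistic quoted in the Mann (2020) excerpt; df = 2 is an assumption
p_value = chi2_right_tail(14.077, 2)   # about 0.0009
reject_at_01 = p_value < 0.01
```

With a significance level of .01, the small upper-tail area leads to rejecting the null hypothesis, which agrees with the conclusion reported in that excerpt.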
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.









