Biological Sciences
Chi-Square Test
The Chi-Square Test is a statistical method used to determine if there is a significant association between categorical variables. In biological sciences, it can be applied to analyze data such as genetic ratios, allele frequencies, or the distribution of traits among different groups. The test compares observed data with expected data to assess whether any differences are due to chance or actual relationships.
Written by Perlego with AI-assistance
10 Key excerpts on "Chi-Square Test"
Sensory Evaluation of Food
Statistical Methods and Procedures
- Michael O'Mahony(Author)
- 2017(Publication Date)
- Routledge(Publisher)
6 Chi-Square
6.1 What is Chi-Square?
We now examine a test called chi-square or chi-squared (also written as χ², where χ is the Greek lowercase letter chi); it is used to test hypotheses about frequency of occurrence. As the binomial test is used to test whether there may be more men or women in the university (a test of frequency of occurrence in the “men” and “women” categories), chi-square may be used for the same purpose. However, chi-square has more uses because it can test hypotheses about frequency of occurrence in more than two categories (e.g., dogs vs. cats vs. cows vs. horses). This is often used for categorizing responses to foods (“like” vs. “indifferent” vs. “dislike” or “too sweet” vs. “correct sweetness” vs. “not sweet enough”).
Just as there is a normal and a binomial distribution, there is also a chi-square distribution, which can be used to calculate the probability of getting our particular results if the null hypothesis were true (see Section 6.6). In practice, a chi-square value is calculated and compared with the largest value that could occur on the null hypothesis (given in tables for various levels of significance); if the calculated value is larger than this value in the tables, H₀ is rejected. This procedure will become clearer with examples.
In general, chi-square is given by the formula
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
O = observed frequency
E = expected frequency
We will now examine the application of this formula to various problems. First we look at the single-sample case, where we examine a sample to find out something about the population; this is the case in which a binomial test can also be used.
6.2 Chi-Square: Single-Sample Test-One-Way Classification
In the example we used for the binomial test (Section 5.2) we were interested in whether there were different numbers of men and women on a university campus. Assume that we took a sample of 22 persons, of whom 16 were male and 6 were female. We use the same logic as with a binomial test. We calculate the probability of getting our result on H₀, and if it is small, we reject H₀. From Table G.4.b, the two-tailed binomial probability associated with this is 0.052, so we would not reject H₀ at p < 0.05. However, we can also set up a Chi-Square Test. If H₀ is true, there is no difference in the numbers of men and women; the expected number of males and females from a sample of 22 is 11 each. Thus we have our observed frequencies (O = 16 and 6) and our expected frequencies (E
- Bruce M. King, Patrick J. Rosopa, Edward W. Minium(Authors)
- 2018(Publication Date)
- Wiley(Publisher)
21 Chi-Square and Inference about Frequencies
When you have finished studying this chapter, you should be able to:
• Understand that the Chi-Square Test is used to test hypotheses about the number of cases falling into the categories of a frequency distribution;
• Understand that χ² provides a measure of the difference between observed frequencies and the frequencies that would be expected if the null hypothesis were true;
• Explain why the Chi-Square Test is best viewed as a test about proportions;
• Compute χ² for one-variable goodness-of-fit problems;
• Compute χ² to test for independence between two variables; and
• Compute effect size for the Chi-Square Test.
In previous chapters, we have been concerned with numerical scores and testing hypotheses about the mean or the correlation coefficient. In this chapter, you will learn to make inferences about frequencies, that is, the number of cases falling into the categories of a frequency distribution. For example, among four brands of soft drinks, is there a difference in the proportion of consumers who prefer the taste of each? Is there a difference among registered voters in their preference for three candidates running for local office? To answer questions like these, a researcher compares the observed (sample) frequencies for the several categories of the distribution with those frequencies expected according to his or her hypothesis. The difference between observed and expected frequencies is expressed in terms of a statistic named chi-square (χ²), introduced by Karl Pearson in 1900.
21.1 The Chi-Square Test for Goodness of Fit
The chi-square (pronounced “ki”) test was developed for categorical data; that is, for data comprising qualitative categories, such as eye color, gender, or political affiliation. Although the Chi-Square Test is conducted in terms of frequencies, it is best viewed conceptually as a test about proportions.
Biostatistics
A Foundation for Analysis in the Health Sciences
- Wayne W. Daniel, Chad L. Cross(Authors)
- 2020(Publication Date)
- Wiley(Publisher)
Use the Mantel–Haenszel Chi-Square Test statistic to determine if we can conclude that there is an association between the risk factor and food insecurity. Let α = .05.
12.8 SUMMARY
In this chapter, some uses of the versatile chi-square distribution are discussed. Chi-square goodness-of-fit tests applied to the normal, binomial, and Poisson distributions are presented. We see that the procedure consists of computing a statistic
$$X^2 = \sum \left[ \frac{(O_i - E_i)^2}{E_i} \right]$$
that measures the discrepancy between the observed ($O_i$) and expected ($E_i$) frequencies of occurrence of values in certain discrete categories. When the appropriate null hypothesis is true, this quantity is distributed approximately as χ². When X² is greater than or equal to the tabulated value of χ² for some α, the null hypothesis is rejected at the α level of significance.
Tests of independence and tests of homogeneity are also discussed in this chapter. The tests are mathematically equivalent but conceptually different. Again, these tests essentially test the goodness-of-fit of observed data to expectation under hypotheses, respectively, of independence of two criteria of classifying the data and the homogeneity of proportions among two or more groups.
In addition, we discussed and illustrated in this chapter four other techniques for analyzing frequency data that can be presented in the form of a 2 × 2 contingency table: McNemar’s test, the Fisher’s exact test, the odds ratio, relative risk, and the Mantel–Haenszel procedure. Finally, we discussed the basic concepts of survival analysis and illustrated the computational procedures by means of two examples.
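The decision rule in this summary can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the book: the function name `chi_square_gof`, the 9:3:3:1 genetic ratio, and the counts are assumptions chosen for the example, and the critical values are the standard tabulated ones at the 0.05 level for 1 to 3 degrees of freedom.

```python
# Minimal sketch of a chi-square goodness-of-fit test (illustrative only).
# The counts and the 9:3:3:1 ratio are assumed example data, not from the text.

# Tabulated chi-square critical values at alpha = 0.05, keyed by df.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_square_gof(observed, expected_ratio):
    """Return (X2, df) for observed counts against a hypothesised ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    # Expected frequency for each category under the null hypothesis.
    expected = [total * r / ratio_sum for r in expected_ratio]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = k - 1 assumes no parameters were estimated from the data.
    df = len(observed) - 1
    return x2, df


observed = [315, 101, 108, 32]  # assumed counts for four phenotype classes
x2, df = chi_square_gof(observed, [9, 3, 3, 1])
reject = x2 >= CRITICAL_05[df]
print(f"X2 = {x2:.3f}, df = {df}, reject H0 at 0.05: {reject}")
```

With these assumed counts the statistic is about 0.47, far below the tabulated 7.815 for 3 degrees of freedom, so the null hypothesis of a 9:3:3:1 ratio would not be rejected.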
- Barbara Illowsky, Susan Dean(Authors)
- 2020(Publication Date)
- Openstax(Publisher)
The test is right-tailed. Each observation or cell category must have an expected value of at least five.
11.5 Comparison of the Chi-Square Tests
The goodness-of-fit test is typically used to determine if data fits a particular distribution. The test of independence makes use of a contingency table to determine the independence of two factors. The test for homogeneity determines whether two populations come from the same distribution, even if this distribution is unknown.
11.6 Test of a Single Variance
To test variability, use the Chi-Square Test of a single variance. The test may be left-, right-, or two-tailed, and its hypotheses are always expressed in terms of the variance or standard deviation.
FORMULA REVIEW
11.1 Facts About the Chi-Square Distribution
$\chi^2 = (Z_1)^2 + (Z_2)^2 + \dots + (Z_{df})^2$ chi-square distribution random variable
$\mu_{\chi^2} = df$ chi-square distribution population mean
$\sigma_{\chi^2} = \sqrt{2(df)}$ chi-square distribution population standard deviation
11.2 Goodness-of-Fit Test
$\sum_{k} \frac{(O - E)^2}{E}$ goodness-of-fit test statistic where
O: observed values
E: expected values
k: number of different data cells or categories
df = k − 1 degrees of freedom
11.3 Test of Independence
• The number of degrees of freedom is equal to (number of columns − 1)(number of rows − 1).
• The test statistic is $\sum_{(i \cdot j)} \frac{(O - E)^2}{E}$ where O = observed values, E = expected values, i = the number of rows in the table, and j = the number of columns in the table.
• If the null hypothesis is true, the expected number $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$.
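The test-of-independence formulas above can be traced in a short Python sketch. The function names and the 2 × 3 table of counts are assumptions made for illustration (the counts happen to match the watch-preference table quoted in a later excerpt on this page); nothing here comes from the OpenStax text itself.

```python
# Minimal sketch of a chi-square test of independence (illustrative only).
# Function names and the example table are assumptions, not from the text.


def expected_counts(table):
    """E = (row total)(column total) / (total surveyed) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (statistic, df, expected) for a table given as a list of rows."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Assumed observed counts: 2 rows (groups) by 3 columns (categories).
table = [[90, 40, 10],
         [10, 40, 10]]
statistic, df, expected = chi_square_independence(table)

# Rule of thumb from the text: every expected count should be at least five.
smallest_expected = min(min(row) for row in expected)
print(f"chi-square = {statistic:.2f}, df = {df}, "
      f"smallest expected count = {smallest_expected:.1f}")
```

For this assumed table the expected counts are 70, 56, 14 in the first row and 30, 24, 6 in the second, so the at-least-five condition holds and the statistic has (2 − 1)(3 − 1) = 2 degrees of freedom.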
- Prem S. Mann(Author)
- 2016(Publication Date)
- Wiley(Publisher)
Glossary
Test of independence: A test of the null hypothesis that two attributes of a population are not related.
Chi-square distribution: A distribution, with degrees of freedom as the only parameter, that is skewed to the right for small df and looks like a normal curve for large df.
Expected frequencies: The frequencies for different categories of a multinomial experiment or for different cells of a contingency table that are expected to occur when a given null hypothesis is true.
Goodness-of-fit test: A test of the null hypothesis that the observed frequencies for an experiment follow a certain pattern or theoretical distribution.
Multinomial experiment: An experiment with n trials for which (1) the trials are identical, (2) there are more than two possible
The statistician barely helped you. In the first case, you know a single piece of information: the choice of a watering place for the three groups of animals is dependent. Another way of stating the result is that your data indicate that the choice of watering places for at least one of the animals is not independent of the others. Perhaps the zebras get up early, and the gnus and gazelles follow, making the gnus and gazelles dependent on the choice of the zebras. Or perhaps the animals choose the watering place of the day independent of the other animals, but always avoid the watering place at which the lions are drinking. Regarding the goodness-of-fit test, all you know is that the hypothesis that the animals equally favor the three watering places was wrong. But you do not know what the expected distribution should be. In short, the rejection of the null hypothesis raises more questions than it answers.
2. IS THERE A GENDER BIAS IN ADMISSIONS?
Categorical data analysis methods, such as a Chi-Square Test for independence, are used quite often in analyzing employment and admissions data in discrimination cases.
- Frederick Gravetter, Larry Wallnau(Authors)
- 2016(Publication Date)
- Cengage Learning EMEA(Publisher)
STEP 3
The chi-square statistic is computed from the formula
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
The following table summarizes the calculations:
Cell                          f_o   f_e   (f_o − f_e)   (f_o − f_e)²   (f_o − f_e)²/f_e
Younger than 30, digital       90    70       20            400             5.71
Younger than 30, analog        40    56      −16            256             4.57
Younger than 30, undecided     10    14       −4             16             1.14
30 or Older, digital           10    30      −20            400            13.33
30 or Older, analog            40    24       16            256            10.67
30 or Older, undecided         10     6        4             16             2.67
Finally, we can add the last column to get the chi-square value.
χ² = 5.71 + 4.57 + 1.14 + 13.33 + 10.67 + 2.67 = 38.09
STEP 4
Make a decision about H₀, and state the conclusion
The chi-square value is in the critical region. Therefore, we can reject the null hypothesis. There is a relationship between watch preference and age, χ²(2, n = 200) = 38.09, p < .05.
DEMONSTRATION 17.2 EFFECT SIZE WITH CRAMÉR’S V
Because the data matrix is larger than 2 × 2, we will compute Cramér’s V to measure effect size.
$$V = \sqrt{\frac{\chi^2}{n(df^*)}} = \sqrt{\frac{38.09}{200(1)}} = \sqrt{0.19} = 0.436$$
PROBLEMS
1. Parametric tests (such as t or ANOVA) differ from nonparametric tests (such as chi-square) primarily in terms of the assumptions they require and the data they use. Explain these differences.
2. The student population at the state college consists of 60% females and 40% males.
a. The college theater department recently staged a production of a modern musical.
- David Howell(Author)
- 2020(Publication Date)
- Cengage Learning EMEA(Publisher)
By chance, we would expect the participants to be correct 50% of the time, or 140 times. Although we can tell by inspection that participants performed even worse than chance would predict, I have chosen this example in part because it raises an interesting question of the statistical significance of a test. We will return to that issue shortly. The first question that we want to answer is whether the data’s departure from chance expectation is significantly greater than chance. The data follow in Table 19.1. Even if participants were operating at chance levels, one category of response is likely to come out more frequently than the other. What we want is a goodness-of-fit test to ask whether the deviations from what would be expected by chance are large enough to lead us to conclude that responses weren’t random.
The most common and important formula for the chi-square statistic (χ²) involves a comparison of observed and expected frequencies. The observed frequencies, as the name suggests, are the frequencies you actually observed in the data: the numbers in row two of Table 19.1. The expected frequencies are the frequencies you would expect if the null hypothesis were true. The expected frequencies are shown in
2 The interesting feature of this paper is that Emily Rosa was an invited speaker at the “Ig Nobel Prize” ceremony sponsored by the Annals of Improbable Research, located at MIT. This is a group of “whacky” scientists, to use a psychological term, who look for and recognize interesting research studies. Ig Nobel Prizes honor “achievements that cannot or should not be reproduced.” Emily’s invitation was meant as an honor, and true believers in therapeutic touch were less than kind to her. The society’s Web page is located at http://www.improb.com/ and I recommend going to it when you need a break from this chapter.
- Jessica Utts, Robert Heckard(Authors)
- 2015(Publication Date)
- Cengage Learning EMEA(Publisher)
a. Are the conditions necessary for carrying out a Chi-Square Test met? Explain.
b. Test whether there is a statistically significant relationship between these two variables. Show all five steps for the hypothesis test. Be sure to state the level of significance that you are using.
15.54 Refer to Exercise 15.51, in which each student guessed the results of ten coin flips. If all students are just guessing, and if the coins are fair, then the number of correct guesses for each student should follow a binomial distribution.
a. What are the parameters n and p for the binomial distribution, assuming that the coins are fair and students were just guessing?
b. Specify the probabilities of getting 0 correct, 1 correct, ..., 10 correct for this experiment if students were just guessing. (Hint: These are the probabilities in the pdf for a binomial distribution with parameters specified in part (a).)
c. The following table shows how many students got two or less right, three right, four right, and so on, separately, for students classified as Sheep (believe in ESP) and classified as Goats (don’t believe in ESP). Using your results from part (b), fill in the null probabilities that correspond to the hypothesis that students are just guessing.
Number Correct   Sheep   Goats   Null Probabilities
≤ 2                  6       5
3                   11      12
4                   16      15
5                   29      19
6                   28      16
7                   14      10
8                    8       3
Total              112      80
a. Identify the two cells with the highest “contributions to chi-square.” Specify the numerical value of the “contribution” and the row and column categories for each of the two cells.
b. For each of the two cells identified in part (a), determine whether the expected count is higher or lower than the observed count.
c. Using the information in parts (a) and (b), explain how the women in those category combinations contribute to the overall conclusion for this study.
15.51 Example 15.12 (p. 612) described an experiment in which students were classified as “Sheep” who believe in ESP or as “Goats” who do not.
- James E. De Muth(Author)
- 2014(Publication Date)
- Chapman and Hall/CRC(Publisher)
Thus, either test can be performed for data appearing in a 2 × 2 contingency table.
Fisher’s Exact Test
If data for a chi square test of independence is reduced to a 2 × 2 contingency table and the expected values are still too small to meet the requirements (at least five per cell) or have a zero in one or more of the four cells, the Fisher’s exact test can be employed (Fisher, 1936). The term “exact” is used because the result of the calculations produces the exact probabilities of obtaining the observed results if the two variables are independent. This test is sometimes referred to as Fisher’s four-fold test because of the four cells of frequency data. The test uses the previously described a-b-c-d four-cell format (Figure 16.3). The formula involves the factorials for the cells and margins:
$$p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$$ Eq. 16.10
An alternative formula, using possible combinations (Chapter 2), produces the exact same results:
$$p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}}$$ Eq. 16.11
The first formula is identical to the nonparametric median test that will be discussed in Chapter 21. However, in this test, cells are based on the evaluation of two independent variables and not on estimating a midpoint based on the sample data. Multiple tests are performed to determine the probability of not only the research data, but also the probabilities for each possible combination to the extreme of the observed data. These probabilities are summed to determine the exact probability of the outcome observed given complete independence.
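The factorial and combination forms of the Fisher's exact probability, and the summing over more extreme tables, can be sketched as follows. The function names and the 2 × 2 counts are hypothetical, and the one-tailed sum assumes that "more extreme" means a larger count in cell a with all margins held fixed; the book's own worked examples may define the tail differently.

```python
# Minimal sketch of Fisher's exact test for a 2x2 table (illustrative only).
# Function names, the example counts, and the choice of tail are assumptions.
from math import comb, factorial


def table_probability(a, b, c, d):
    """Probability of one table with fixed margins (factorial form)."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator


def table_probability_comb(a, b, c, d):
    """Same probability written with combinations."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_one_tailed(a, b, c, d):
    """Sum the observed table and every table with a larger cell a."""
    p = 0.0
    while b >= 0 and c >= 0:
        p += table_probability(a, b, c, d)
        # Shift one count so that all row and column totals stay the same.
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return p


# Hypothetical counts: a=3, b=1, c=1, d=3.
p_single = table_probability(3, 1, 1, 3)
p_single_comb = table_probability_comb(3, 1, 1, 3)
p_tail = fisher_one_tailed(3, 1, 1, 3)
print(f"single table: {p_single:.4f}, one-tailed sum: {p_tail:.4f}")
```

For these hypothetical counts both forms give 16/70 for the observed table, and adding the single more extreme table (4, 0, 0, 4) with probability 1/70 gives a one-tailed total of 17/70.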
- Prem S. Mann(Author)
- 2017(Publication Date)
- Wiley(Publisher)
Performing a Chi-Square Goodness of Fit Test for Example 11–3 of the Text
1. Enter the observed counts from Example 11–3 into C1.
2. Select Stat > Tables > Chi-Square Goodness-of-Fit Test (One Variable).
3. Use the following settings in the dialog box that appears on screen (see Screen 11.6):
• Select Observed counts and type C1 in the box.
• Select Equal proportions at the Test submenu.
Note: If the alternative hypothesis does not specify equal proportions, go to the worksheet and type the proportions in C2. Return to the dialog box, select Specific proportions at the Test submenu, and type C2 in the box.
4. Click OK.
5. The output, including the test statistic and p-value, will be displayed in the Session window. (See Screen 11.7.)
Note: By default, Minitab will also generate two different bar graphs: one of the observed and expected counts and another of the (O − E)²/E values, which are called the contributions to the Chi-Square statistic. These graphs are not shown here.
Now compare the χ²-value with the critical value of χ² or the p-value from Screen 11.7 with α and make a decision.
Performing a Chi-Square Independence/Homogeneity Test for Example 11–6 of the Text
1. Enter the contingency table from Example 11–6 into the first two rows of C1 through C3. (See Screen 11.8.)
2. Select Stat > Tables > Chi-Square Test for Association.
3. Use the following settings in the dialog box that appears on screen (see Screen 11.8):
• Select Summarized data in a two-way table from the drop-down menu.
Note: For raw data in C1 and C2, select Raw data (categorical variables) from the drop-down menu, type C2 in the Rows box, and C1 in the Columns box. Then go to step 4.
• Type C1-C3 in the Columns containing the table box.
4. Click OK.
5. The output, including the test statistic and p-value, will be displayed in the Session window.
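The p-value that appears next to the test statistic in such output can be cross-checked outside Minitab. The sketch below is one possible way to do so with only the Python standard library; the function name `chi2_sf` is hypothetical, it handles integer degrees of freedom only, and it relies on the standard recurrence for the regularized upper incomplete gamma function rather than on anything described in the text.

```python
# Minimal sketch: upper-tail chi-square probability (p-value), stdlib only.
# The function name chi2_sf is hypothetical; integer df >= 1 is assumed.
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Return P(X >= x) for a chi-square variable with integer df."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, exp(-z)        # closed form for df = 2
    else:
        a, q = 0.5, erfc(sqrt(z))  # closed form for df = 1
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    # applied until a reaches df / 2.
    while a < df / 2.0:
        q += z ** a * exp(-z) / gamma(a + 1.0)
        a += 1.0
    return q


# The tabulated 0.05 critical values should give p-values close to 0.05.
for critical_value, df in [(3.841, 1), (5.991, 2), (7.815, 3)]:
    print(f"df = {df}: P(X >= {critical_value}) = {chi2_sf(critical_value, df):.4f}")
```

Comparing the returned p-value with α leads to the same decision as comparing the test statistic with the tabulated critical value.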
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.









