Chi Square Test for Independence

The Chi Square Test for Independence is a statistical test used to determine if there is a significant association between two categorical variables. It compares the observed frequencies of the variables with the frequencies that would be expected if there was no relationship between them. The test is commonly used in research and data analysis to assess the independence of variables.

Written by Perlego with AI-assistance

8 Key excerpts on "Chi Square Test for Independence"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.
  • Essential Statistics for Public Managers and Policy Analysts
    • Evan M. Berman, XiaoHu Wang(Authors)
    • 2016(Publication Date)
    • CQ Press
      (Publisher)

    ...Chi-square is but one statistic for testing a relationship between two categorical variables. Once analysts have determined that a statistically significant relationship exists through hypothesis testing, they need to assess the practical relevance of their findings. Remember, large datasets easily allow for findings of statistical significance. Practical relevance deals with the relevance of statistical differences for managers; it addresses whether statistically significant relationships have meaningful policy implications.

    Key Terms: Alternate hypothesis (p. 182); Chi-square (p. 178); Chi-square test assumptions (p. 186); Critical value (p. 184); Degrees of freedom (p. 184); Dependent samples (p. 186); Expected frequencies (p. 179); Five steps of hypothesis testing (p. 184); Goodness-of-fit test (p. 191); Independent samples (p. 186); Kendall’s tau-c (p. 193); Level of statistical significance (p. 183); Null hypothesis (p. 181); Purpose of hypothesis testing (p. 180); Sample size (and hypothesis testing) (p. 188); Statistical power (p. 190); Statistical significance (p. 183)

    Appendix 11.1: Rival Hypotheses: Adding a Control Variable

    We now extend our discussion to rival hypotheses. The following is but one approach (sometimes called the “elaboration paradigm”), and we provide other (and more efficient) approaches in subsequent chapters. First mentioned in Chapter 2, rival hypotheses are alternative, plausible explanations of findings...

  • Practical Statistics for Field Biology
    • Jim Fowler, Lou Cohen, Philip Jarvis(Authors)
    • 2013(Publication Date)
    • Wiley
      (Publisher)

    ...13 ANALYSING FREQUENCIES

    13.1 The chi-square test

    Field biologists spend a good deal of their time counting and classifying things on nominal scales such as species, colour and habitat. Statistical techniques which analyse frequencies are therefore especially useful. The classical method of analysing frequencies is the chi-square test. This involves computing a test statistic which is compared with a chi-square (χ²) distribution that we outlined in Section 11.11. Because there is a different distribution for every possible number of degrees of freedom (df), tables in Appendix 3 showing the distribution of χ² are restricted to the critical values at the significance levels we are interested in. There we give critical values at P = 0.05 and P = 0.01 (the 5% and 1% levels) for 1 to 30 df. Between 30 and 100 df, the critical values are estimated by interpolation, but the need to do this arises infrequently. Chi-square tests are variously referred to as tests for homogeneity, randomness, association, independence and goodness of fit. This array is not as alarming as it might seem at first sight. The precise applications will become clear as you study the examples. In each application the underlying principle is the same. The frequencies we observe are compared with those we expect on the basis of some Null Hypothesis. If the discrepancy between observed and expected frequencies is great, then the value of the calculated test statistic will exceed the critical value at the appropriate number of degrees of freedom. We are then obliged to reject the Null Hypothesis in favour of some alternative. The mastery of the method lies not so much in the computation of the test statistic itself but in the calculation of the expected frequencies. We have already shown some examples of how expected frequencies are generated. They can be derived from sample data (Example 7.5) or according to a mathematical model (Section 7.4)...
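The decision rule described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `chi_square_statistic` and `exceeds_critical`, the lookup table, and the three-category counts are all invented here and do not come from the book. The table holds the standard published critical values for 1 to 4 df only, whereas the book's Appendix 3 is said to cover 1 to 30 df.

```python
# Standard chi-square critical values, keyed by degrees of freedom,
# at the 5% and 1% significance levels (illustrative subset: 1 to 4 df).
CRITICAL_VALUES = {
    1: {0.05: 3.841, 0.01: 6.635},
    2: {0.05: 5.991, 0.01: 9.210},
    3: {0.05: 7.815, 0.01: 11.345},
    4: {0.05: 9.488, 0.01: 13.277},
}


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def exceeds_critical(statistic, df, alpha=0.05):
    """True when the statistic is larger than the tabulated critical value."""
    return statistic > CRITICAL_VALUES[df][alpha]


# Hypothetical counts in three categories; the Null Hypothesis assumed here
# is that all three categories are equally likely.
observed = [30, 20, 10]
expected = [20, 20, 20]

statistic = chi_square_statistic(observed, expected)  # 5.0 + 0.0 + 5.0 = 10.0
df = len(observed) - 1  # assumes no parameters were estimated from the sample

print(statistic)                              # 10.0
print(exceeds_critical(statistic, df, 0.05))  # True (10.0 > 5.991)
print(exceeds_critical(statistic, df, 0.01))  # True (10.0 > 9.210)
```

As the excerpt notes, the arithmetic is the easy part. In this sketch the `expected` list is simply supplied by hand; in practice it would be derived from the sample or from a model.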

  • Sensory Evaluation of Food: Statistical Methods and Procedures

    • Michael O'Mahony(Author)
    • 2017(Publication Date)
    • CRC Press
      (Publisher)

    ...6 Chi-Square

    6.1 What is Chi-Square?

    We now examine a test called chi-square or chi-squared (also written as χ², where χ is the Greek lowercase letter chi); it is used to test hypotheses about frequency of occurrence. As the binomial test is used to test whether there may be more men or women in the university (a test of frequency of occurrence in the “men” and “women” categories), chi-square may be used for the same purpose. However, chi-square has more uses because it can test hypotheses about frequency of occurrence in more than two categories (e.g., dogs vs. cats vs. cows vs. horses). This is often used for categorizing responses to foods (“like” vs. “indifferent” vs. “dislike” or “too sweet” vs. “correct sweetness” vs. “not sweet enough”). Just as there is a normal and a binomial distribution, there is also a chi-square distribution, which can be used to calculate the probability of getting our particular results if the null hypothesis were true (see Section 6.6). In practice, a chi-square value is calculated and compared with the largest value that could occur on the null hypothesis (given in tables for various levels of significance); if the calculated value is larger than this value in the tables, H₀ is rejected. This procedure will become clearer with examples. In general, chi-square is given by the formula

    $$\text{Chi-square} = \sum \left[ \frac{(O - E)^2}{E} \right]$$

    where O = observed frequency and E = expected frequency. We will now examine the application of this formula to various problems. First we look at the single-sample case, where we examine a sample to find out something about the population; this is the case in which a binomial test can also be used.

    6.2 Chi-Square: Single-Sample Test - One-Way Classification

    In the example we used for the binomial test (Section 5.2) we were interested in whether there were different numbers of men and women on a university campus. Assume that we took a sample of 22 persons, of whom 16 were male and 6 were female...
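The excerpt breaks off before working through its campus sample, so the following Python sketch applies the general formula to those counts (16 men, 6 women, 22 in total). It assumes a null hypothesis of equal numbers, uses no continuity correction, and may therefore differ from the book's own worked answer. The function name `chi_square` is an assumption made for illustration. The p-value line relies on the identity that, for 1 degree of freedom, the upper-tail probability of chi-square at x equals erfc(sqrt(x / 2)).

```python
import math


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [16, 6]                 # men, women (counts from the excerpt)
n = sum(observed)                  # 22
expected = [n / 2, n / 2]          # assumed null hypothesis: equal numbers

statistic = chi_square(observed, expected)  # 25/11 + 25/11 = 50/11, about 4.55

# Upper-tail probability for 1 degree of freedom (two categories minus one).
p_value = math.erfc(math.sqrt(statistic / 2))

print(round(statistic, 3))
print(round(p_value, 3))
```

Under these assumptions the statistic exceeds 3.841, the standard 5% critical value for 1 df, so the equal-numbers hypothesis would be rejected at that level.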

  • A Conceptual Guide to Statistics Using SPSS

    ...3 The Chi-Squared Test for Contingency Tables

    CHAPTER OUTLINE
      • Introduction to the Chi-Squared Test
      • Computing the Chi-Squared Test in SPSS
      • A Closer Look: Fisher’s Exact Test
      • The Chi-Squared Test for Testing the Distribution of One Categorical Variable

    Introduction to the Chi-Squared Test

    In the chapter on descriptive statistics, we drew a distinction between categorical and continuous variables. Most of the inferential statistics we discuss in this book assume that your outcome variable is continuous. However, sometimes we have outcomes that fall into categories (e.g., was someone on trial for a crime convicted or not? Or did a participant choose to open door number one, two, or three?). In these cases, and when our predictor variable is also categorical, the chi-squared test is appropriate. The raw data typically analyzed using the chi-squared test are counts of the same outcome in each of two or more conditions. For example, if you wanted to know whether gender affected traffic court convictions, you could tally up the number of men who were and weren’t convicted on a given day, then separately tally up the number of women who were and weren’t convicted on that same day. Those four counts would then be entered into a chi-squared analysis, and it would tell you whether the proportion of men who were convicted that day was different from the proportion of women who were convicted. The chi-squared test can also be used to answer questions about proportions within a single variable. In other words, the test can be used to tell you whether the percentage of cases in each category differs from some hypothesized distribution. Suppose that a court claims that it convicts 90% of people who come up with a traffic violation...
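The tallying step described in this excerpt might look like the following in Python. Everything here is hypothetical: the records, the counts, and the variable names are invented for illustration, since the excerpt gives no data. The sketch tallies raw cases into the four counts, compares the two conviction proportions, and then computes an uncorrected Pearson chi-squared statistic from the table margins.

```python
from collections import Counter

# Hypothetical case records as (gender, outcome) pairs; not real court data.
records = (
    [("man", "convicted")] * 30
    + [("man", "not convicted")] * 10
    + [("woman", "convicted")] * 20
    + [("woman", "not convicted")] * 20
)

genders = ("man", "woman")
outcomes = ("convicted", "not convicted")

# Tally the four counts into a 2x2 table: rows are genders, columns outcomes.
counts = Counter(records)
table = [[counts[(g, o)] for o in outcomes] for g in genders]

# Proportion convicted within each gender.
proportions = [row[0] / sum(row) for row in table]

# Expected counts under "no relationship": row total * column total / n.
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

print(table)        # [[30, 10], [20, 20]]
print(proportions)  # [0.75, 0.5]
print(round(statistic, 2))
```

The chi-squared statistic summarises how far the four tallied counts sit from what equal conviction proportions would produce.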

  • Statistics for Psychologists: An Intermediate Course

    ...Now a represents the number of pairs of observations that both have A, and so on. To test whether the probability of having A differs in the matched populations, the relevant test statistic is which, if there is no difference, has a chi-squared distribution with a single degree of freedom. We can illustrate the use of Fisher’s exact test on the data on suicidal feelings in Table 9.4 because this has some small expected values (see Section 9.4 for more comments). The p-value from applying the test is .235, indicating that diagnosis and suicidal feelings are not associated. To illustrate McNemar’s test, we use the data shown in Table 9.6. For these data the test statistic takes the value 1.29, which is clearly not significant, and we can conclude that depersonalization is not associated with prognosis where endogenous depressed patients are concerned.

    Table 9.6 Recovery of 23 Pairs of Depressed Patients

    9.3. Beyond the Chi-Square Test: Further Exploration of Contingency Tables by Using Residuals and Correspondence Analysis

    A statistical significance test is, as implied in Chapter 1, often a crude and blunt instrument. This is particularly true in the case of the chi-square test for independence in the analysis of contingency tables, and after a significant value of the test statistic is found, it is usually advisable to investigate in more detail why the null hypothesis of independence fails to fit. Here we shall look at two approaches, the first involving suitably chosen residuals and the second that attempts to represent the association in a contingency table graphically.

    9.3.1. The Use of Residuals in the Analysis of Contingency Tables

    After a significant chi-squared statistic is found and independence is rejected for a two-dimensional contingency table, it is usually informative to try to identify the cells of the table responsible, or most responsible, for the lack of independence...
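One simple way to see which cells contribute most is to compute a residual per cell. The excerpt does not say which residuals the book uses, so the Python sketch below assumes the plain Pearson residual, (observed - expected) / sqrt(expected). The book may well prefer an adjusted version. The function names are invented for illustration, and the 2x2 counts are borrowed from the Social Statistics excerpt further down this page, not from this book's tables.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_residuals(table):
    """(O - E) / sqrt(E) for every cell; their squares sum to chi-square."""
    expected = expected_counts(table)
    return [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


# Counts quoted in the Social Statistics excerpt on this page.
table = [[56, 91], [44, 109]]
residuals = pearson_residuals(table)

for row in residuals:
    print([round(r, 2) for r in row])

# The squared residuals add up to the Pearson chi-square statistic.
print(round(sum(r ** 2 for row in residuals for r in row), 2))  # 2.94
```

Cells with the largest absolute residuals are the ones that depart most from what independence would predict.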

  • Statistics: The Essentials for Research

    ...Second, we are often in a position where we know only that someone “graduated” or “failed to graduate” so we cannot use a test that utilizes finer distinctions. Finally, our data may consist of categories that differ qualitatively: non-orderable countables not amenable to true measurement, such as male-female. Chi square is relatively easy to calculate and, although it is frequently used incorrectly, its prevalence in the literature makes it an important test to know about.

    10.11 Overview

    This is the third distribution we have studied. We have discussed the binomial distribution, the normal distribution, and now the chi square distribution. The use of all of these distributions in tests of statistical significance is quite similar. The distributions provide us with a theoretical relative frequency of events; for the binomial it is the relative frequency, or probability, of obtaining any proportion of events in a sample of size n, given the proportion of events in the population from which the sample was randomly drawn; for the normal distribution it is the relative frequency, or probability, of obtaining samples yielding values of z as deviant as those listed in Table N; for the chi square distribution with various df it is the probability of obtaining χ² values as large or larger than those listed in Table C. In each case, when we select an appropriate test of significance, we assume that if the null hypothesis is true, our data should conform to that theoretical sampling distribution. When the test is significant, it means that on the basis of the hypothesized sampling distribution, the results are quite improbable. However, before we can reject hypotheses about the population parameters, it is quite important that the remaining assumptions about the distribution have been met, for example, that observations are randomly obtained and that we have the proper df...

  • Statistics for the Behavioural Sciences: An Introduction to Frequentist and Bayesian Approaches

    • Riccardo Russo(Author)
    • 2020(Publication Date)
    • Routledge
      (Publisher)

    ...12 The chi-square distribution and the analysis of categorical data

    12.1 Introduction

    In this chapter, a new continuous distribution is described. This is the chi-square (or alternatively chi-squared) distribution. We will show how this continuous distribution can be used in the analysis of discrete categorical (or alternatively frequency) data. First, the general characteristics of the chi-square distribution are presented. Then the Pearson's chi-square test is described. Examples of its application in the assessment of how well a set of observed frequencies matches a set of expected frequencies (i.e., goodness of fit test), and in the analysis of contingency tables (Frequentist and Bayesian) are provided.

    12.2 The chi-square (χ²) distribution

    “Chi” stands for the Greek letter χ and is pronounced as either “key” or “kai”. “Square” or, alternatively, “squared”, means raised to the power of two, hence the notation χ². The chi-square distribution is obtained from the standardised normal distribution in the following way. Suppose we could sample a z score from the z distribution, we square it and its value is recorded. The sampling process is performed an infinite number of times, allowing for the possibility that any z score can be sampled again (i.e., independent sampling). If the z² scores obtained are then plotted, the resulting distribution is the χ² distribution with one degree of freedom (denoted as χ₁²). Now suppose we independently sample two χ² scores from the χ₁² distribution and we add their values, as done above in the case of the z scores. This process is performed an infinite number of times, and all the sums obtained are plotted. The resulting distribution is the χ² distribution with two degrees of freedom (denoted as χ₂²). This process can be generalised to the distribution of any sum of k random variables each having the χ₁² distribution...
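The construction in this excerpt can be imitated with a small simulation. The Python sketch below is an approximation only: it uses a finite number of draws in place of the "infinite number of times" in the text, and the function name `simulate_chi_square`, the seed, and the number of draws are arbitrary choices made here. It checks one well-known property, namely that the mean of a chi-square distribution equals its degrees of freedom.

```python
import random


def simulate_chi_square(df, n_draws, rng):
    """Return n_draws values, each the sum of df squared standard normal scores."""
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable

for df in (1, 2, 5):
    draws = simulate_chi_square(df, 20000, rng)
    mean = sum(draws) / len(draws)
    # The simulated mean should land close to df, the theoretical mean.
    print(f"df={df}: simulated mean = {mean:.2f}")
```

Plotting a histogram of `draws` for each value of `df` would show the shapes the excerpt describes, with the distribution spreading to the right as degrees of freedom are added.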

  • Social Statistics: Managing Data, Conducting Analyses, Presenting Results

    • Thomas J. Linneman(Author)
    • 2017(Publication Date)
    • Routledge
      (Publisher)

    ...Chi-square, chi-squared, I’ve seen both, and I’m not picky. I am picky about pronunciation: say chiropractor and then take off the ropractor. Although I like to drink chai, that’s not what we’re doing here. Although I appreciate tai chi, that’s not what we’re doing here. In the world of statistical tests, the chi-square test is a relatively easy one to use. It contrasts the frequencies you observed in the crosstab with the frequencies you would expect if there were no relationship among the variables in your crosstab. It makes this contrast with each cell in the crosstab. We’ll use the third sex/gun crosstab from earlier, the one where your gut wasn’t completely sure if there was a generalizable relationship. Here it is, with its frequencies expected crosstab next to it:

    ■ Exhibit 4.12: Frequencies Observed and Frequencies Expected

    Let’s first find the difference between the frequencies observed (hereafter referred to as f_o) and the frequencies we would expect (hereafter referred to as f_e):

    ■ Exhibit 4.13: Differences between Observed and Expected Frequencies

    Cell            f_o    f_e    f_o - f_e
    Top left         56     49        7
    Top right        91     98       -7
    Bottom left      44     51       -7
    Bottom right    109    102        7

    Then we’re going to square each of these and divide it by its corresponding f_e:

    ■ Exhibit 4.14: Calculating the Chi-Square Value

    The sum of the last column of numbers is our value for chi-square: 1.00 + 0.50 + 0.96 + 0.48 = 2.94

    Here is the formula for what we just did:

    $$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

    Notice that the symbol for chi-square is χ². It looks like an x with some attitude. Our chi-square value of 2.94 is not an end in itself but rather a means to an end. For now we are going to go shopping, or at least an activity that I consider similar to shopping. When you go shopping (let’s say shirt shopping, because everyone loves shirts), you go into a store with one thing (money) and you come out of the store with something else (a shirt)...
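The arithmetic in this excerpt can be reproduced with a short Python sketch. The observed counts are the ones quoted above. The expected frequencies are rebuilt from the row and column totals (row total times column total, divided by the grand total), which is the usual rule and does reproduce the 49, 98, 51 and 102 shown in the excerpt. The function names are assumptions made for illustration.

```python
def expected_frequencies(observed):
    """Expected count per cell if the two variables were unrelated."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (f_o - f_e)^2 / f_e over every cell of the crosstab."""
    expected = expected_frequencies(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Observed frequencies from the excerpt: top row, then bottom row.
observed = [[56, 91], [44, 109]]

expected = expected_frequencies(observed)
statistic = chi_square(observed)

print(expected)             # [[49.0, 98.0], [51.0, 102.0]]
print(round(statistic, 2))  # 2.94
```

For a 2x2 crosstab the degrees of freedom are (2 - 1) × (2 - 1) = 1, and the standard 5% critical value for 1 df is 3.841. Since 2.94 falls below it, this relationship would not count as statistically significant at that level, which fits the excerpt's remark that the gut feeling was unsure.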