Mathematics

Chi-Square Distribution

The chi-square distribution is a probability distribution that is widely used in statistics. It is often used to test the independence of two variables or to compare observed data with expected data. The shape of the chi-square distribution depends on the degrees of freedom, and it is skewed to the right.
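
A quick way to see how the shape changes with the degrees of freedom is to inspect a few summary quantities. The Python sketch below is illustrative only (it assumes SciPy is available; the df values 1, 4, and 10 are arbitrary choices, not taken from the excerpts below):

    # Sketch: how the chi-square distribution's shape depends on degrees of freedom.
    # Mean = df, variance = 2*df, and positive skewness shows the right skew.
    from scipy.stats import chi2

    for df in (1, 4, 10):
        dist = chi2(df)
        skew = float(dist.stats(moments="s"))      # positive => skewed to the right
        print(f"df={df:2d}  mean={dist.mean():5.1f}  var={dist.var():5.1f}  skew={skew:5.2f}")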

Written by Perlego with AI-assistance

8 Key excerpts on "Chi-Square Distribution"

  • The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation

    ...While the idea of determining whether standard distributions gave acceptable fits to data sets was well established early in Pearson’s career, detailed in his 1900 paper, he was determined to derive a test procedure that further advanced the problem of goodness of fit. As a result, the formulation of the chi-square statistic stands as one of the greatest statistical achievements of the 20th century.

    Basic Principles and Applications

    Generally speaking, a chi-square test (also commonly referred to as χ²) refers to a bevy of statistical hypothesis tests where the objective is to compare a sample distribution to a theorized distribution to confirm (or refute) a null hypothesis. Two important conditions that must exist for the chi-square test are independence and sample size or distribution. For independence, each case that contributes to the overall count or data set must be independent of all other cases that make up the overall count. Second, each particular scenario must have a specified number of cases within the data set to perform the analysis. The literature points to a number of arbitrary cutoffs for the overall sample size. The chi-square test has most often been utilized in two types of comparison situations: a test of goodness of fit or a test of independence. One of the most common uses of the chi-square test is to determine whether a frequency data set can be adequately represented by a specified distribution function. More clearly, a chi-square test is appropriate when you are trying to determine whether sample data are consistent with a hypothesized distribution...
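
    To make the goodness-of-fit idea above concrete, here is a minimal Python sketch (assuming SciPy is installed; the die-roll counts are invented for illustration, not taken from the encyclopedia entry). It asks whether observed category counts are consistent with a hypothesized uniform distribution:

        # Goodness-of-fit sketch: are these made-up die-roll counts consistent with a fair die?
        from scipy.stats import chisquare

        observed = [18, 22, 16, 25, 19, 20]           # hypothetical counts for faces 1-6
        expected = [sum(observed) / 6] * 6            # fair die: equal expected counts
        stat, p_value = chisquare(f_obs=observed, f_exp=expected)
        print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")   # large p: consistent with fairness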

  • Statistics for the Behavioural Sciences

    An Introduction to Frequentist and Bayesian Approaches

    • Riccardo Russo (Author)
    • 2020 (Publication Date)
    • Routledge (Publisher)

    ...12 The chi-square distribution and the analysis of categorical data

    12.1 Introduction

    In this chapter, a new continuous distribution is described. This is the chi-square (or alternatively chi-squared) distribution. We will show how this continuous distribution can be used in the analysis of discrete categorical (or alternatively frequency) data. First, the general characteristics of the chi-square distribution are presented. Then Pearson's chi-square test is described. Examples of its application in the assessment of how well a set of observed frequencies matches a set of expected frequencies (i.e., goodness of fit test), and in the analysis of contingency tables (Frequentist and Bayesian), are provided.

    12.2 The chi-square (χ²) distribution

    “Chi” stands for the Greek letter χ and is pronounced as either “key” or “kai”. “Square” or, alternatively, “squared”, means raised to the power of two, hence the notation χ². The chi-square distribution is obtained from the standardised normal distribution in the following way. Suppose we sample a z score from the z distribution, square it, and record its value. The sampling process is performed an infinite number of times, allowing for the possibility that any z score can be sampled again (i.e., independent sampling). If the z² scores obtained are then plotted, the resulting distribution is the χ² distribution with one degree of freedom (denoted as χ₁²). Now suppose we independently sample two χ² scores from the χ₁² distribution and we add their values, as done above in the case of the z scores. This process is performed an infinite number of times, and all the sums obtained are plotted. The resulting distribution is the χ² distribution with two degrees of freedom (denoted as χ₂²). This process can be generalised to the distribution of any sum of k random variables, each having the χ₁² distribution...
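
    The construction described above (squaring independent z scores and summing k of them) is easy to check by simulation. A minimal sketch, assuming NumPy and with k = 3 chosen arbitrarily:

        # Simulate chi-square values as sums of k squared z scores and compare with theory.
        import numpy as np

        rng = np.random.default_rng(0)
        k = 3                                          # degrees of freedom (illustrative)
        z = rng.standard_normal(size=(100_000, k))     # independent z scores
        chi_sq = (z ** 2).sum(axis=1)                  # each row: sum of k squared z scores

        # A chi-square variable with k degrees of freedom has mean k and variance 2k.
        print(f"simulated mean {chi_sq.mean():.2f} vs. theoretical {k}")
        print(f"simulated variance {chi_sq.var():.2f} vs. theoretical {2 * k}")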

  • Statistics

    The Essentials for Research

    ...Second, we are often in a position where we know only that someone “graduated” or “failed to graduate” so we cannot use a test that utilizes finer distinctions. Finally, our data may consist of categories that differ qualitatively—non-orderable countables not amenable to true measurement, such as male-female. Chi square is relatively easy to calculate and, although it is frequently used incorrectly, its prevalence in the literature makes it an important test to know about.

    10.11 Overview

    This is the third distribution we have studied. We have discussed the binomial distribution, the normal distribution, and now the chi square distribution. The use of all of these distributions in tests of statistical significance is quite similar. The distributions provide us with a theoretical relative frequency of events; for the binomial it is the relative frequency, or probability, of obtaining any proportion of events in a sample of size n, given the proportion of events in the population from which the sample was randomly drawn; for the normal distribution it is the relative frequency, or probability, of obtaining samples yielding values of z as deviant as those listed in Table N; for the chi square distribution with various df it is the probability of obtaining χ² values as large or larger than those listed in Table C. In each case, when we select an appropriate test of significance, we assume that if the null hypothesis is true, our data should conform to that theoretical sampling distribution. When the test is significant, it means that on the basis of the hypothesized sampling distribution, the results are quite improbable. However, before we can reject hypotheses about the population parameters, it is quite important that the remaining assumptions about the distribution have been met, for example, that observations are randomly obtained and that we have the proper df...
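
    Where the excerpt refers to the probability of obtaining χ² values as large or larger than those in a printed table, the same tail probability can be computed directly. A short sketch, assuming SciPy, with an illustrative statistic and df:

        # Tail probability (p-value) for a chi-square statistic, replacing a table lookup.
        from scipy.stats import chi2

        observed_stat, df = 7.81, 3                    # illustrative values only
        p_value = chi2.sf(observed_stat, df)           # P(chi-square with df >= observed_stat)
        print(f"P(chi2_{df} >= {observed_stat}) = {p_value:.3f}")   # about 0.05 here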

  • Sensory Evaluation of Food

    Statistical Methods and Procedures

    • Michael O'Mahony (Author)
    • 2017 (Publication Date)
    • CRC Press (Publisher)

    ...6 Chi-Square

    6.1 What is Chi-Square?

    We now examine a test called chi-square or chi-squared (also written as χ², where χ is the Greek lowercase letter chi); it is used to test hypotheses about frequency of occurrence. As the binomial test is used to test whether there may be more men or women in the university (a test of frequency of occurrence in the “men” and “women” categories), chi-square may be used for the same purpose. However, chi-square has more uses because it can test hypotheses about frequency of occurrence in more than two categories (e.g., dogs vs. cats vs. cows vs. horses). This is often used for categorizing responses to foods (“like” vs. “indifferent” vs. “dislike” or “too sweet” vs. “correct sweetness” vs. “not sweet enough”). Just as there is a normal and a binomial distribution, there is also a chi-square distribution, which can be used to calculate the probability of getting our particular results if the null hypothesis were true (see Section 6.6). In practice, a chi-square value is calculated and compared with the largest value that could occur on the null hypothesis (given in tables for various levels of significance); if the calculated value is larger than this value in the tables, H₀ is rejected. This procedure will become clearer with examples. In general, chi-square is given by the formula

    Chi-square = Σ [ (O − E)² / E ]

    where O = observed frequency and E = expected frequency. We will now examine the application of this formula to various problems. First we look at the single-sample case, where we examine a sample to find out something about the population; this is the case in which a binomial test can also be used.

    6.2 Chi-Square: Single-Sample Test, One-Way Classification

    In the example we used for the binomial test (Section 5.2) we were interested in whether there were different numbers of men and women on a university campus. Assume that we took a sample of 22 persons, of whom 16 were male and 6 were female...
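
    The single-sample example that follows in the excerpt (22 persons, 16 male and 6 female, against equal expected frequencies of 11 and 11) can be reproduced with a few lines of Python; a sketch assuming SciPy:

        # One-way classification from the excerpt: 16 males and 6 females vs. expected 11 and 11.
        from scipy.stats import chisquare

        observed = [16, 6]
        expected = [11, 11]
        stat, p_value = chisquare(f_obs=observed, f_exp=expected)
        # By hand: (16-11)^2/11 + (6-11)^2/11 = 25/11 + 25/11 ≈ 4.55
        print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")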

  • Statistics in Psychology

    An Historical Perspective

    ...9 Sampling Distributions

    Large sets of elementary events are commonly called populations or universes in statistics, but the set theory term sample space is perhaps more descriptive. The term population distribution refers to the distribution of the values of the possible observations in the sample space. Although the characteristics or parameters of the population (e.g., the mean, μ, or the standard deviation, σ) are of both practical and theoretical interest, these values are rarely, if ever, known precisely. Estimates of the values are obtained from corresponding sample values, the statistics. Clearly, for a sample of a given size drawn randomly from a sample space, a distribution of values of a particular summary statistic exists. This simple statement defines a sampling distribution. In statistical practice it is the properties of these distributions that guide our inferences about properties of populations of actual or potential observations. In chapter 6 the binomial, the Poisson, and the normal distributions were discussed. Now that sampling has been examined in some detail, three other distributions and the statistical tests associated with them are reviewed.

    The Chi Square Distribution

    The development of the χ² (chi-square) test of “goodness-of-fit” represents one of the most important breakthroughs in the history of statistics, certainly as important as the development of the mathematical foundations of regression. The fact that both creations are attributable to the work of one man, Karl Pearson, is impressive attestation to his role in the discipline. There are a number of routes by which the test can be approached, but the path that has been followed thus far is continued here. This path leads directly to the work of Pearson and Fisher, who did not make use, and, it seems, were in general unaware, of the earlier work on goodness-of-fit by mathematicians in Europe...

  • Essential Statistics for Public Managers and Policy Analysts
    • Evan M. Berman, XiaoHu Wang (Authors)
    • 2016 (Publication Date)
    • CQ Press (Publisher)

    ...Chi-square is but one statistic for testing a relationship between two categorical variables. Once analysts have determined that a statistically significant relationship exists through hypothesis testing, they need to assess the practical relevance of their findings. Remember, large datasets easily allow for findings of statistical significance. Practical relevance deals with the relevance of statistical differences for managers; it addresses whether statistically significant relationships have meaningful policy implications.

    Key Terms: Alternate hypothesis (p. 182), Chi-square (p. 178), Chi-square test assumptions (p. 186), Critical value (p. 184), Degrees of freedom (p. 184), Dependent samples (p. 186), Expected frequencies (p. 179), Five steps of hypothesis testing (p. 184), Goodness-of-fit test (p. 191), Independent samples (p. 186), Kendall’s tau-c (p. 193), Level of statistical significance (p. 183), Null hypothesis (p. 181), Purpose of hypothesis testing (p. 180), Sample size (and hypothesis testing) (p. 188), Statistical power (p. 190), Statistical significance (p. 183)

    Appendix 11.1: Rival Hypotheses: Adding a Control Variable

    We now extend our discussion to rival hypotheses. The following is but one approach (sometimes called the “elaboration paradigm”), and we provide other (and more efficient) approaches in subsequent chapters. First mentioned in Chapter 2, rival hypotheses are alternative, plausible explanations of findings...

  • A Conceptual Guide to Statistics Using SPSS

    ...In other words, the test can be used to tell you whether the percentage of cases in each category differs from some hypothesized distribution. Suppose that a court claims that it convicts 90% of people who come up with a traffic violation. If you tallied the number of people who were convicted or not for a given period, the chi-squared test could tell you whether the proportion of people convicted in that period was significantly different from 90%. We mentioned above that most inferential tests described in this book assume that your dependent measure is continuous. These tests make several more important assumptions, and we’ll get to those later on. One of the nice features of the chi-squared test is that it doesn’t make many assumptions about your data. For this reason, the chi-squared test is sometimes called a “nonparametric” test, including in SPSS. However, this is not literally true. The chi-squared test is computed by coming up with an “expected” count for each cell and comparing it to the actual, “observed” count for each cell according to the following equation:

    χ² = Σ [ (observed − expected)² / expected ]

    What this equation says is that, for each cell, the difference between the observed count and the expected count is squared, then that number is divided by the expected count. These values are added up across all the cells; then the total is compared to the chi-squared distribution with (r − 1)*(c − 1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table. So this test does make the assumption that the summed value across all cells of the above equation is chi-square distributed.

    Computing the Chi-Squared Test in SPSS

    Even though it is an inferential statistic, chi-squared can be found in SPSS by clicking on Analyze → Descriptive Statistics → Crosstabs. This crosstabs function will generate a table of tallies (i.e., frequencies) of cases in your data set that fall into each of the cells defined by the rows and columns you specify...
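
    The court example above can be run as a one-way chi-squared test against the claimed 90% conviction rate; the tallies below are invented for illustration (the excerpt gives none), and SciPy is assumed:

        # Were convictions in some period consistent with a claimed 90% conviction rate?
        from scipy.stats import chisquare

        convicted, not_convicted = 170, 30             # hypothetical tally for the period
        total = convicted + not_convicted
        expected = [0.90 * total, 0.10 * total]        # expected counts under the 90% claim
        stat, p_value = chisquare(f_obs=[convicted, not_convicted], f_exp=expected)
        print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")   # small p: proportion differs from 90%

    For the contingency-table case the excerpt goes on to describe, scipy.stats.chi2_contingency performs the same observed-versus-expected comparison on an r × c crosstab and uses (r − 1)*(c − 1) degrees of freedom.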

  • Social Statistics

    Managing Data, Conducting Analyses, Presenting Results

    • Thomas J. Linneman (Author)
    • 2017 (Publication Date)
    • Routledge (Publisher)

    ...Chi-square, chi-squared, I’ve seen both, and I’m not picky. I am picky about pronunciation: say chiropractor and then take off the ropractor. Although I like to drink chai, that’s not what we’re doing here. Although I appreciate tai chi, that’s not what we’re doing here. In the world of statistical tests, the chi-square test is a relatively easy one to use. It contrasts the frequencies you observed in the crosstab with the frequencies you would expect if there were no relationship among the variables in your crosstab. It makes this contrast with each cell in the crosstab. We’ll use the third sex/gun crosstab from earlier, the one where your gut wasn’t completely sure if there was a generalizable relationship. Here it is, with its frequencies-expected crosstab next to it:

    ■ Exhibit 4.12: Frequencies Observed and Frequencies Expected

    Let’s first find the difference between the frequencies observed (hereafter referred to as f_o) and the frequencies we would expect (hereafter referred to as f_e):

    ■ Exhibit 4.13: Differences between Observed and Expected Frequencies

    Cell           f_o    f_e    f_o − f_e
    Top left        56     49        7
    Top right       91     98       -7
    Bottom left     44     51       -7
    Bottom right   109    102        7

    Then we’re going to square each of these and divide it by its corresponding f_e:

    ■ Exhibit 4.14: Calculating the Chi-Square Value

    The sum of the last column of numbers is our value for chi-square: 1.00 + 0.50 + 0.96 + 0.48 = 2.94. Here is the formula for what we just did:

    χ² = Σ (f_o − f_e)² / f_e

    Notice that the symbol for chi-square is χ². It looks like an x with some attitude. Our chi-square value of 2.94 is not an end in itself but rather a means to an end. For now we are going to go shopping, or at least an activity that I consider similar to shopping. When you go shopping (let’s say shirt shopping, because everyone loves shirts), you go into a store with one thing (money) and you come out of the store with something else (a shirt)...
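
    The hand calculation in this excerpt is easy to reproduce; a minimal Python sketch using the observed and expected frequencies from Exhibit 4.13:

        # Reproduce the excerpt's chi-square of 2.94 from f_o and f_e, cell by cell.
        f_o = [56, 91, 44, 109]
        f_e = [49, 98, 51, 102]

        cells = [(o - e) ** 2 / e for o, e in zip(f_o, f_e)]
        print([round(c, 2) for c in cells])            # [1.0, 0.5, 0.96, 0.48]
        print(round(sum(cells), 2))                    # 2.94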