Chi-Square Distribution
The chi-square distribution is a probability distribution that is widely used in statistics. It is often used to test the independence of two variables or to compare observed data with expected data. The shape of the chi-square distribution depends on the degrees of freedom, and it is skewed to the right.
Written by Perlego with AI-assistance
11 Key excerpts on "Chi-Square Distribution"
Statistics for the Behavioural Sciences
An Introduction to Frequentist and Bayesian Approaches
- Riccardo Russo(Author)
- 2020(Publication Date)
- Routledge(Publisher)
12 The Chi-Square Distribution and the analysis of categorical data
12.1 Introduction
In this chapter, a new continuous distribution is described. This is the chi-square (or alternatively chi-squared) distribution. We will show how this continuous distribution can be used in the analysis of discrete categorical (or alternatively frequency) data. First, the general characteristics of the Chi-Square Distribution are presented. Then Pearson's chi-square test is described. Examples of its application in the assessment of how well a set of observed frequencies matches a set of expected frequencies (i.e., the goodness of fit test), and in the analysis of contingency tables (Frequentist and Bayesian), are provided.

12.2 The chi-square (χ²) distribution

"Chi" stands for the Greek letter χ and is pronounced as either "key" or "kai". "Square" or, alternatively, "squared", means raised to the power of two, hence the notation χ². The Chi-Square Distribution is obtained from the standardised normal distribution in the following way. Suppose we sample a z score from the z distribution, square it, and record its value. The sampling process is performed an infinite number of times, allowing for the possibility that any z score can be sampled again (i.e., independent sampling). If the z² scores obtained are then plotted, the resulting distribution is the χ² distribution with one degree of freedom (denoted as $\chi^2_1$). Now suppose we independently sample two scores from the $\chi^2_1$ distribution and we add their values, as done above in the case of the z scores. This process is performed an infinite number of times, and all the sums obtained are plotted. The resulting distribution is the χ² distribution with two degrees of freedom (denoted as $\chi^2_2$). This process can be generalised to the distribution of any sum of k random variables each having the $\chi^2_1$ distribution. The distribution of a sum of k random variables, each with the $\chi^2_1$ distribution, is itself a χ² distribution with k degrees of freedom (denoted as $\chi^2_k$). Furthermore, the distribution of the sum of two independent random variables distributed as $\chi^2_a$ and $\chi^2_b$ has a $\chi^2_{a+b}$ distribution. For example, if two independent random variables are distributed as $\chi^2_2$ and $\chi^2_3$, then their sum is distributed as $\chi^2_5$.
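The constructive definition above lends itself to a quick simulation. The sketch below is my addition, not Russo's (Python with NumPy and SciPy assumed available): it sums k squared standard-normal scores many times and checks the simulated moments and quantiles against the theoretical $\chi^2_k$ values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
k = 5              # degrees of freedom: number of squared z scores per sum
n_draws = 100_000  # number of simulated chi-square values

# Sum k independent squared standard-normal scores, n_draws times.
z = rng.standard_normal(size=(n_draws, k))
chi2_samples = (z ** 2).sum(axis=1)

# A chi-square(k) variable has mean k and variance 2k.
print("simulated mean:", chi2_samples.mean(), "theoretical:", k)
print("simulated var: ", chi2_samples.var(), "theoretical:", 2 * k)

# The simulated quantiles should track the chi-square(k) quantiles.
for q in (0.5, 0.95):
    print(f"{q:.2f} quantile:",
          np.quantile(chi2_samples, q), "vs", stats.chi2.ppf(q, df=k))
```

Changing k illustrates the additivity property: sums built from k = 2 and k = 3 draws behave exactly like draws built directly with k = 5.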
- Barbara Illowsky, Susan Dean(Authors)
- 2020(Publication Date)
- Openstax(Publisher)
11 | The Chi-Square Distribution

Figure 11.1 The Chi-Square Distribution can be used to find relationships between two things, like grocery prices at different stores. (credit: Pete/flickr)

Introduction

Chapter Objectives. By the end of this chapter, the student should be able to do the following:
• Interpret the chi-square probability distribution as the sample size changes
• Conduct and interpret chi-square goodness-of-fit hypothesis tests
• Conduct and interpret chi-square test of independence hypothesis tests
• Conduct and interpret chi-square homogeneity hypothesis tests
• Conduct and interpret chi-square single variance hypothesis tests

Have you ever wondered if lottery numbers were evenly distributed or if some numbers occurred with a greater frequency? How about if the types of movies people preferred were different across different age groups? What about if a coffee machine was dispensing approximately the same amount of coffee each time? You could answer these questions by conducting a hypothesis test. You will now study a new distribution, one that is used to determine the answers to such questions. This distribution is called the Chi-Square Distribution. In this chapter, you will learn the three major applications of the Chi-Square Distribution:
• The goodness-of-fit test, which determines if data fit a particular distribution, such as in the lottery example
• The test of independence, which determines if events are independent, such as in the movie example
• The test of a single variance, which tests variability, such as in the coffee example

NOTE: Though the Chi-Square Distribution depends on calculators or computers for most of the calculations, there is a table available (see Appendix G). TI-83+ and TI-84 calculator instructions are included in the text.
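Of the three applications, the test of a single variance is the least commonly seen, so here is a minimal sketch of it (Python with SciPy; the fill amounts and the hypothesized variance are invented for the coffee example, not data from the book). The statistic is (n − 1)s²/σ₀², referred to a chi-square distribution with n − 1 degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical coffee-machine data: n fill amounts in ounces.
fills = np.array([7.9, 8.1, 8.0, 8.2, 7.8, 8.0, 8.3, 7.7, 8.1, 7.9])
sigma0_sq = 0.01  # H0: population variance equals 0.01 (illustrative)

n = fills.size
s_sq = fills.var(ddof=1)            # sample variance
stat = (n - 1) * s_sq / sigma0_sq   # chi-square test statistic

# Two-sided p-value against a chi-square(n - 1) reference distribution.
cdf = chi2.cdf(stat, df=n - 1)
pvalue = 2 * min(cdf, 1 - cdf)
print(stat, pvalue)  # a small p-value indicates the variance differs from 0.01
```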
Sensory Evaluation of Food
Statistical Methods and Procedures
- Michael O'Mahony(Author)
- 2017(Publication Date)
- Routledge(Publisher)
6 Chi-Square
6.1 What is Chi-Square?
We now examine a test called chi-square or chi-squared (also written as χ², where χ is the Greek lowercase letter chi); it is used to test hypotheses about frequency of occurrence. As the binomial test is used to test whether there may be more men or women in the university (a test of frequency of occurrence in the "men" and "women" categories), chi-square may be used for the same purpose. However, chi-square has more uses because it can test hypotheses about frequency of occurrence in more than two categories (e.g., dogs vs. cats vs. cows vs. horses). This is often used for categorizing responses to foods ("like" vs. "indifferent" vs. "dislike" or "too sweet" vs. "correct sweetness" vs. "not sweet enough").

Just as there is a normal and a binomial distribution, there is also a Chi-Square Distribution, which can be used to calculate the probability of getting our particular results if the null hypothesis were true (see Section 6.6). In practice, a chi-square value is calculated and compared with the largest value that could occur on the null hypothesis (given in tables for various levels of significance); if the calculated value is larger than this value in the tables, H₀ is rejected. This procedure will become clearer with examples.

In general, chi-square is given by the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequency and E = expected frequency.

We will now examine the application of this formula to various problems. First we look at the single-sample case, where we examine a sample to find out something about the population; this is the case in which a binomial test can also be used.

6.2 Chi-Square: Single-Sample Test, One-Way Classification
In the example we used for the binomial test (Section 5.2) we were interested in whether there were different numbers of men and women on a university campus. Assume that we took a sample of 22 persons, of whom 16 were male and 6 were female. We use the same logic as with a binomial test. We calculate the probability of getting our result on H₀, and if it is small, we reject H₀. From Table G.4.b, the two-tailed binomial probability associated with this is 0.052, so we would not reject H₀ at p < 0.05. However, we can also set up a chi-square test. If H₀ is true, there is no difference in the numbers of men and women; the expected number of males and females from a sample of 22 is 11 each. Thus we have our observed frequencies (O = 16 and 6) and our expected frequencies (E = 11 and 11).
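As a quick check of the arithmetic in this example, the sketch below runs the same one-way chi-square test in Python with SciPy (my choice of tool; the book itself works from tables). With O = (16, 6) and E = (11, 11), the statistic is (16 − 11)²/11 + (6 − 11)²/11 ≈ 4.55.

```python
from scipy.stats import chisquare

observed = [16, 6]   # 16 men, 6 women observed in the sample of 22
expected = [11, 11]  # under H0, men and women are equally frequent

# One-way (single-sample) chi-square test of the observed frequencies.
result = chisquare(f_obs=observed, f_exp=expected)
print(result.statistic)  # about 4.55
print(result.pvalue)     # about 0.033 with 1 degree of freedom
```

Note that this uncorrected chi-square p-value (about 0.033) is smaller than the two-tailed binomial probability of 0.052 quoted above; the χ² approximation to an exact binomial test can disagree near the significance boundary, which is one reason some texts apply a continuity correction for small samples.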
Statistics in Psychology
An Historical Perspective
- Michael Cowles(Author)
- 2005(Publication Date)
- Psychology Press(Publisher)
Sampling Distributions

Large sets of elementary events are commonly called populations or universes in statistics, but the set theory term sample space is perhaps more descriptive. The term population distribution refers to the distribution of the values of the possible observations in the sample space. Although the characteristics or parameters of the population (e.g., the mean, μ, or the standard deviation, σ) are of both practical and theoretical interest, these values are rarely, if ever, known precisely. Estimates of the values are obtained from corresponding sample values, the statistics. Clearly, for a sample of a given size drawn randomly from a sample space, a distribution of values of a particular summary statistic exists. This simple statement defines a sampling distribution. In statistical practice it is the properties of these distributions that guide our inferences about properties of populations of actual or potential observations. In chapter 6 the binomial, the Poisson, and the normal distributions were discussed. Now that sampling has been examined in some detail, three other distributions and the statistical tests associated with them are reviewed.

The Chi-Square Distribution
The development of the χ² (chi-square) test of "goodness-of-fit" represents one of the most important breakthroughs in the history of statistics, certainly as important as the development of the mathematical foundations of regression. The fact that both creations are attributable to the work of one man, Karl Pearson, is impressive attestation to his role in the discipline. There are a number of routes by which the test can be approached, but the path that has been followed thus far is continued here. This path leads directly to the work of Pearson and Fisher, who did not make use, and, it seems, were in general unaware, of the earlier work on goodness-of-fit by mathematicians in Europe. Before looking at the development of the test of goodness-of-fit, the structure of the Chi-Square Distribution itself is worth examining. Figure 9.1 shows two Chi-Square Distributions. Given a normally distributed population of scores Y with a mean μ and a variance σ², suppose that samples of size n = 1 are drawn from the distribution and each score is converted to its corresponding standard score Z.
- H. Mulholland, C. R. Jones(Authors)
- 2014(Publication Date)
- Butterworth-Heinemann(Publisher)
11 CHI-SQUARED DISTRIBUTION

11.1. INTRODUCTION

When carrying out tests of significance using small samples (population variance unknown) we used the t test. The statistic used was defined as

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{(\bar{x} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\chi^2/\nu}}$$

where χ² follows the chi-squared distribution and ν is the number of degrees of freedom associated with s². The chi-squared distribution will be considered in this chapter. It can be used as a test of variance, a test of goodness of fit, and also to set up a confidence interval for the population variance.

11.2. DEFINITION

If $z_i = (x_i - \mu)/\sigma$, i = 1, 2, ..., n, are a sample from the standardized normal distribution, then the sum of squares $\chi^2 = \sum_{i=1}^{n} z_i^2$ has the probability element

$$f(\chi^2)\,d\chi^2 = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\,(\chi^2)^{\nu/2 - 1}\,e^{-\chi^2/2}\,d(\chi^2) \qquad (0 \le \chi^2 < \infty)$$

where ν is the number of degrees of freedom associated with s² and Γ(ν/2) is the gamma function. The distribution so defined is called the chi-squared distribution and has the following properties:

(a) $\int_0^\infty f(\chi^2)\,d\chi^2 = 1$. This shows that it is a probability distribution.
(b) Its mean is ν, i.e. $E(\chi^2) = \nu$.
(c) Its variance is 2ν, i.e. $\operatorname{var}(\chi^2) = E(\chi^2 - \nu)^2 = 2\nu$.
(d) The maximum value of f(χ²) occurs when χ² = ν − 2, for ν ≥ 2.
(e) ν is the number of independent variables which are used to calculate χ². For example, suppose that there are variates x₁, x₂, ..., xₙ; if we have to estimate μ using x̄ then we have only n − 1 independent variates. If a further m parameters have to be estimated using the original variates x₁, x₂, ..., xₙ, then we shall have only n − m − 1 independent variables remaining. The number of independent variables is called the number of degrees of freedom.
(f) Figure 11.1 gives a comparison of the shape of the distribution for various values of ν.
(g) Percentage points of χ² are given in the tables for various values of ν.
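These properties are easy to confirm numerically. A minimal sketch (Python with SciPy; my addition, with ν = 6 as an arbitrary choice) checks the mean ν, the variance 2ν, and the mode ν − 2 against scipy.stats.chi2, and verifies that the density integrates to 1.

```python
import numpy as np
from scipy import stats, integrate

nu = 6  # degrees of freedom (any value >= 2 works for the mode check)
dist = stats.chi2(df=nu)

print(dist.mean())  # property (b): mean = nu -> 6.0
print(dist.var())   # property (c): variance = 2 * nu -> 12.0

# Property (d): the density peaks at chi2 = nu - 2 for nu >= 2.
grid = np.linspace(0.01, 30, 10_000)
print(grid[np.argmax(dist.pdf(grid))])  # close to 4.0

# Property (a): the density integrates to 1 over [0, infinity).
area, _ = integrate.quad(dist.pdf, 0, np.inf)
print(area)  # approximately 1.0
```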
- Ralph B. D'Agostino(Author)
- 2017(Publication Date)
- Routledge(Publisher)
3 Tests of Chi-Squared Type
David S. Moore, Purdue University, West Lafayette, Indiana
3.1 Introduction
In the course of his Mathematical Contributions to the Theory of Evolution, Karl Pearson abandoned the assumption that biological populations are normally distributed, introducing the Pearson system of distributions to provide other models. The need to test fit arose naturally in this context, and in 1900 Pearson invented his chi-squared test. This statistic and others related to it remain among the most used statistical procedures.

Pearson's idea was to reduce the general problem of testing fit to a multinomial setting by basing a test on a comparison of observed cell counts with their expected values under the hypothesis to be tested. This reduction in general discards some information, so that tests of chi-squared type are often less powerful than other classes of tests of fit. But chi-squared tests apply to discrete or continuous, univariate or multivariate data. They are therefore the most generally applicable tests of fit.

Modern developments have increased the flexibility of chi-squared tests, especially when unknown parameters must be estimated in the hypothesized family. This chapter considers two classes of chi-squared procedures. One, called "classical" because it contains such familiar statistics as the log likelihood ratio, Neyman modified chi-squared, and Freeman-Tukey, is discussed in Section 3.2. The second, consisting of nonnegative definite quadratic forms in the standardized cell frequencies, is the main subject of Section 3.3. Other newer developments relevant to both classes of statistics, especially the use of data-dependent cells, are also treated primarily in Section 3.3, while such practical considerations as choice of cells and accuracy of asymptotic approximate distributions appear in Section 3.2. Both sections contain a number of examples.

Tests of the types considered here are also used in assessing the fit of models for categorical data. The scope of this volume forbids venturing into this closely related territory. Bishop, Fienberg, and Holland (1975) discuss the methods of categorical data analysis most closely related to the contents of this chapter.
Statistical Inference
A Short Course
- Michael J. Panik(Author)
- 2012(Publication Date)
- Wiley(Publisher)
Equation (13.4) to approximate multinomial probabilities requires the following conditions:
a. Each outcome falls into one and only one cell or category.
b. The outcomes are independent.
c. n is large.

13.3 The Chi-Square Distribution
Having rationalized the test statistic for conducting a multinomial goodness-of-fit test (Eq. 13.4), we next examine the properties of the Chi-Square Distribution and its attendant probabilities. The Chi-Square Distribution is a continuous distribution that represents the sampling distribution of a sum of squares of independent standard normal variables. That is, if the observations X₁, ..., Xₙ constitute a random sample of size n taken from a normal population with mean μ and standard deviation σ, then the $Z_i = (X_i - \mu)/\sigma$, i = 1, ..., n, are independent N(0,1) random variables and $\sum_{i=1}^{n} Z_i^2$ follows a chi-square distribution with n degrees of freedom.

Looking to the properties of the Chi-Square Distribution:

1. The mean and standard deviation of a chi-square random variable X are E(X) = ν and $\sqrt{2\nu}$, respectively, where ν denotes degrees of freedom.
2. The Chi-Square Distribution is positively skewed and it has a peak that is sharper than that of a normal distribution.
3. Selected quantiles of the Chi-Square Distribution can be determined from the chi-square table (Table A.3) for various values of the degrees of freedom parameter ν. For various cumulative probabilities 1 − α (Fig. 13.1), the quantile $\chi^2_{1-\alpha,\nu}$ satisfies $P(X \le \chi^2_{1-\alpha,\nu}) = 1 - \alpha$ or, alternatively, $P(X > \chi^2_{1-\alpha,\nu}) = \alpha$. That is, for various degrees of freedom ν, $\chi^2_{1-\alpha,\nu}$ gives the value of X below which the proportion 1 − α of the distribution falls (or is the value of X above which the proportion α of the distribution is found).

Figure 13.1 The $\chi^2_\nu$ distribution.

Example 13.1. Suppose the random variable X is . Then for 1 − α = 0.95, we obtain, from Table A.3
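Table lookups like the one the truncated Example 13.1 begins can be reproduced in software. A minimal sketch (Python with SciPy; ν = 10 is my illustrative choice, not the example's value): chi2.ppf returns the quantile below which a proportion 1 − α of the distribution falls.

```python
from scipy.stats import chi2

nu = 10        # degrees of freedom (illustrative choice)
alpha = 0.05   # so the cumulative probability is 1 - alpha = 0.95

# Quantile with 95% of the distribution below it (a table lookup).
q = chi2.ppf(1 - alpha, df=nu)
print(q)  # about 18.307, matching standard chi-square tables

# Sanity check: the probability above q should be alpha.
print(chi2.sf(q, df=nu))  # 0.05
```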
- Jean Dickinson Gibbons, Subhabrata Chakraborti(Authors)
- 2020(Publication Date)
- Chapman and Hall/CRC(Publisher)
The compatibility of a set of observed sample values with a normal or any other distribution can be checked by a goodness-of-fit test. These tests are designed for a null hypothesis, which is a statement about the form of the cumulative distribution or probability function of the parent population from which the sample is drawn. Ideally, the hypothesized distribution is completely specified, including all parameters. Since the alternative is necessarily quite broad, including differences only in location, scale, other parameters, form, or any combination thereof, rejection of the null hypothesis does not provide much specific information. Goodness-of-fit tests are customarily used when only the form of the population is in question, with the hope that the null hypothesis will not be rejected.

In this chapter, we will consider two types of goodness-of-fit tests. The first type is designed for null hypotheses concerning a discrete distribution and compares the observed frequencies with the frequencies expected under the null hypothesis. This is the chi-square test proposed by Karl Pearson early in the history of statistics. The second type of goodness-of-fit test is designed for null hypotheses concerning a continuous distribution and compares the observed cumulative relative frequencies with those expected under the null hypothesis. This group includes the Kolmogorov–Smirnov (K–S), Lilliefors's, and Anderson–Darling (A–D) tests. The latter are designed for testing the assumption of a normal or an exponential distribution with unspecified parameters and are therefore important preliminary tests for justifying the use of parametric or classical statistical methods that require this assumption. Finally, we present some graphical approaches to assessing the form of a distribution.

4.2 The Chi-Square Goodness-of-Fit Test
A single random sample of size n is drawn from a population with unknown cdf F_X. We wish to test the null hypothesis

$$H_0\colon F_X(x) = F_0(x) \text{ for all } x$$

where F₀(x) is completely specified, against the general alternative

$$H_1\colon F_X(x) \ne F_0(x) \text{ for some } x$$

In order to apply the chi-square test in this situation, the sample data must first be grouped according to some scheme in order to form a frequency distribution. In the case of count or qualitative data, where the hypothesized distribution would be discrete, the categories would be the relevant verbal or numerical classifications. For example, in tossing a die, the categories would be the numbers of spots; in tossing a coin, the categories would be the numbers of heads; in surveys of brand preferences, the categories would be the brand names considered. When the sample observations are quantitative, the categories would be numerical classes chosen by the experimenter. In this case, the frequency distribution is not unique and some information is necessarily lost by the grouping. Even though the hypothesized distribution is most likely continuous with measurement data, the data must be categorized for analysis by the chi-square test.
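To make the grouping step concrete, here is a sketch (Python with NumPy and SciPy; the bin edges and the N(0, 1) null are my own illustrative choices) that bins a continuous sample and compares observed cell counts with those expected under a fully specified F₀.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=500)  # sample to be tested

# Group the continuous data into cells (an arbitrary but fixed scheme).
edges = np.array([-np.inf, -1.0, -0.3, 0.3, 1.0, np.inf])
observed, _ = np.histogram(x, bins=edges)

# Expected counts under H0: F0 = N(0, 1), completely specified.
cell_probs = np.diff(stats.norm.cdf(edges))
expected = cell_probs * x.size

# Pearson chi-square statistic with k - 1 degrees of freedom
# (no parameters were estimated from the data here).
stat, pvalue = stats.chisquare(f_obs=observed, f_exp=expected)
print(stat, pvalue)  # a large p-value is expected, since H0 is true here
```

Note the information loss the excerpt mentions: different choices of edges give different statistics from the same sample.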
Biostatistics
A Foundation for Analysis in the Health Sciences
- Wayne W. Daniel, Chad L. Cross(Authors)
- 2018(Publication Date)
- Wiley(Publisher)
Use the Mantel–Haenszel chi-square test statistic to determine if we can conclude that there is an association between the risk factor and food insecurity. Let α = .05.

12.8 SUMMARY

In this chapter, some uses of the versatile Chi-Square Distribution are discussed. Chi-square goodness-of-fit tests applied to the normal, binomial, and Poisson distributions are presented. We see that the procedure consists of computing a statistic

$$X^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

that measures the discrepancy between the observed (Oᵢ) and expected (Eᵢ) frequencies of occurrence of values in certain discrete categories. When the appropriate null hypothesis is true, this quantity is distributed approximately as χ². When X² is greater than or equal to the tabulated value of χ² for some α, the null hypothesis is rejected at the α level of significance.

Tests of independence and tests of homogeneity are also discussed in this chapter. The tests are mathematically equivalent but conceptually different. Again, these tests essentially test the goodness-of-fit of observed data to expectation under hypotheses, respectively, of independence of two criteria of classifying the data and the homogeneity of proportions among two or more groups. In addition, we discussed and illustrated in this chapter several other techniques for analyzing frequency data that can be presented in the form of a 2 × 2 contingency table: McNemar's test, Fisher's exact test, the odds ratio, relative risk, and the Mantel–Haenszel procedure. Finally, we discussed the basic concepts of survival analysis and illustrated the computational procedures by means of two examples.
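The tests of independence and homogeneity this summary mentions both reduce to the same computation on a contingency table. A brief sketch (Python with SciPy; the 2 × 2 counts are invented for illustration, not data from the book):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 table: rows = risk factor (yes/no),
# columns = food insecure (yes/no). Counts are illustrative only.
table = np.array([[45, 55],
                  [25, 75]])

# Pearson chi-square test of independence, with Yates' continuity
# correction (SciPy's default for 2 x 2 tables).
stat, pvalue, dof, expected = chi2_contingency(table)
print(stat, pvalue, dof)
print(expected)  # expected counts under independence
```

The same call analyzes a homogeneity design; as the excerpt notes, the arithmetic is identical and only the sampling interpretation differs.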
- Kenneth S. Stephens(Author)
- 2011(Publication Date)
- ASQ Quality Press(Publisher)
The 60 tosses are made and the results are recorded by outcomes, 1 through 6, with the data shown on the spreadsheet in the referenced Excel file. The outcomes from the tosses are listed as "observed." The next step is to determine the "expected" values based on the null hypothesis. This is alluded to above and is an equal number for every outcome, hence uniform, totaling the number of tosses, or 1/6 of 60 = 10. Hence, a column of 10's for the six possible outcomes is labeled on the Excel file as "Expected results, E."

The chi-square test statistic is the following:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
This is shown in column G of the Excel file for a test value result of 9.4. In order to compare this with a critical value of the Chi-Square Distribution, the degrees of freedom for the test must be determined. This is one case where df is not a function of sample size n, such as n − 1, and so on. Degrees of freedom in goodness-of-fit tests are related to the number of cells being used. For k cells with k equal to 6, the df is either k − 1, k − 2, or otherwise. In the case at hand the df is k − 1 = 5. A choice of k − 2 would apply if an estimate of a parameter of the underlying distribution, such as a mean, had to be made to enable computation of the expected value for each cell. In the present case the expected values are merely a consequence of the uniform distribution of the null hypothesis having equal cell probabilities. No parameter has been estimated.

At this point, or even earlier or later, the desired significance level to be used for the test is decided on, and as noted, should be a relatively small probability such as 0.01, 0.05, and so on. Here, α = 0.05 is chosen.

A next step is determination of a chi-square value, χ²(α), and/or associated P value, since either or both can be used to test the null hypothesis (see the discussion in step [8] above). The referenced Excel file contains a mini-template such as the one in the Excel file entitled CH 6 P and χ²(α) Value Template & Chi-Square Table. For inputs of the df of 5 and the test value, x, of 9.4, the associated P value is obtained as 0.0941 and compared with the designated α = 0.05, and as P is larger than α, the hypothesis is not rejected. With inputs of df and α, the test critical value, χ²(α) = 11.0705, is determined and compared with the test value of x = 9.4. As χ²(α) is larger than the test value, x, the hypothesis is again not rejected.
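The two decision numbers quoted here are easy to verify. A small sketch (Python with SciPy, standing in for the Excel template the author references):

```python
from scipy.stats import chi2

x = 9.4       # chi-square test value from the 60 die tosses
df = 5        # k - 1 = 6 - 1 cells, no parameters estimated
alpha = 0.05  # chosen significance level

# P value: probability of a chi-square(5) value at least as large as 9.4.
print(chi2.sf(x, df))           # about 0.0941 -> P > alpha, do not reject H0

# Critical value: the point with probability alpha in the upper tail.
print(chi2.ppf(1 - alpha, df))  # about 11.0705 -> x falls below it, do not reject H0
```

- David Howell(Author)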
- 2020(Publication Date)
- Cengage Learning EMEA(Publisher)
By chance, we would expect the participants to be correct 50% of the time, or 140 times. Although we can tell by inspection that participants performed even worse than chance would predict, I have chosen this example in part because it raises an interesting question of the statistical significance of a test. We will return to that issue shortly. The first question that we want to answer is whether the data's departure from chance expectation is significantly greater than chance. The data follow in Table 19.1. Even if participants were operating at chance levels, one category of response is likely to come out more frequently than the other. What we want is a goodness-of-fit test to ask whether the deviations from what would be expected by chance are large enough to lead us to conclude that responses weren't random.

The most common and important formula for the chi-square statistic (χ²) involves a comparison of observed and expected frequencies. The observed frequencies, as the name suggests, are the frequencies you actually observed in the data: the numbers in row two of Table 19.1. The expected frequencies are the frequencies you would expect if the null hypothesis were true. The expected frequencies are shown in

Footnote 2: The interesting feature of this paper is that Emily Rosa was an invited speaker at the "Ig Nobel Prize" ceremony sponsored by the Annals of Improbable Research, located at MIT. This is a group of "whacky" scientists, to use a psychological term, who look for and recognize interesting research studies. Ig Nobel Prizes honor "achievements that cannot or should not be reproduced." Emily's invitation was meant as an honor, and true believers in therapeutic touch were less than kind to her. The society's Web page is located at http://www.improb.com/ and I recommend going to it when you need a break from this chapter.
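Since Table 19.1 is not reproduced in this excerpt, the sketch below uses placeholder observed counts (130 correct, 150 incorrect; these are not the study's numbers) just to show how the observed and expected frequencies combine, in Python with SciPy. The expected counts follow from the excerpt: 280 trials at a chance rate of 50% give 140 per category.

```python
from scipy.stats import chisquare

n_trials = 280
p_chance = 0.5

# Expected frequencies under the null hypothesis of random responding.
expected = [n_trials * p_chance, n_trials * (1 - p_chance)]  # [140, 140]

# Placeholder observed frequencies (Table 19.1 is not in the excerpt).
observed = [130, 150]  # correct, incorrect: illustrative values only

stat, pvalue = chisquare(f_obs=observed, f_exp=expected)
print(stat, pvalue)  # chi-square = 2 * (10**2) / 140, about 1.43; p about 0.23
```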
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.