Technology & Engineering

P Value

The p-value is a measure used in hypothesis testing to assess the strength of evidence against the null hypothesis. It is the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is true. A smaller p-value indicates stronger evidence against the null hypothesis and, when it falls below a chosen significance level, leads to rejection of the null hypothesis in favor of the alternative.

Written by Perlego with AI-assistance

8 Key excerpts on "P Value"

  • Medical Statistics
    A Textbook for the Health Sciences
    • Stephen J. Walters, Michael J. Campbell, David Machin (Authors)
    • 2020 (Publication Date)
    • Wiley-Blackwell (Publisher)
    The P‐value can be thought of as a measure of the strength of the belief in the null hypothesis.
    The significance level is usually set at 5% or 0.05. This level is arbitrary, and it is ridiculous to interpret the results of a study differently according to whether the P‐value obtained was, say, 0.055 or 0.045. These P‐values should lead to similar conclusions, not diametrically opposed ones, and a minor change to the data can easily shift the P‐value by this amount or more. Statistical significance does not necessarily mean the result is clinically significant or important.
    How to interpret P‐values from a single test.
    We can think of the P‐values as indicating the strength of evidence but always keep in mind the size of the study being considered.
    P‐value               Interpretation
    Greater than 0.10     Little or no evidence of a difference or a relationship (a)
    Between 0.05 and 0.10 Very weak evidence of a difference or relationship
    Between 0.01 and 0.05 Weak evidence of a difference or a relationship
    Less than 0.01        Strong evidence of a difference or relationship
    Less than 0.001       Very strong evidence of a difference or relationship
    (a) Although we have talked in terms of detecting differences in this chapter, the same principles arise when testing relationships, as in Chapter 9 for example.
    Source: Adapted from Bland (2000).
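    As a rough illustration of the bands in the table above, here is a minimal Python sketch (a hypothetical helper, not from the book) that maps a P‐value to the verbal label used in this excerpt; how values falling exactly on a band edge are assigned is arbitrary here:

        def evidence_label(p):
            """Return the verbal interpretation from the table above for a P-value."""
            if p > 0.10:
                return "Little or no evidence of a difference or a relationship"
            elif p > 0.05:
                return "Very weak evidence of a difference or relationship"
            elif p > 0.01:
                return "Weak evidence of a difference or a relationship"
            elif p > 0.001:
                return "Strong evidence of a difference or relationship"
            else:
                return "Very strong evidence of a difference or relationship"

        print(evidence_label(0.045))  # -> "Weak evidence of a difference or a relationship"
        print(evidence_label(0.055))  # -> "Very weak evidence of a difference or relationship"

    Note that 0.045 and 0.055 land in adjacent bands, which is the excerpt's point: the two values should lead to similar conclusions, not opposite ones.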
    In Chapter 4 we discussed different concepts of probability. The P‐value is a probability, and the concept in this instance is closest to the idea of a repeated sample. If we conducted a large number of similar studies and repeated the test each time, when the null hypothesis is true, then in the long run, the proportion of times the test statistic equals, or is greater than, the observed value is the P‐value.
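    The repeated-sample idea can be made concrete with a small simulation. The following is a minimal sketch, not taken from the book: it assumes a one-sample z test of the null hypothesis μ = 0 with known standard deviation 1, a sample size of 30, and an observed statistic of 2.1 (all numbers are made up for illustration), and it estimates the P‐value as the long-run proportion of null studies whose statistic is at least as large.

        import numpy as np

        rng = np.random.default_rng(0)

        n = 30                 # sample size in each simulated study (assumed)
        observed_z = 2.1       # observed test statistic (assumed)

        # Simulate a large number of similar studies in which the null hypothesis is true.
        n_studies = 200_000
        sample_means = rng.normal(loc=0.0, scale=1.0, size=(n_studies, n)).mean(axis=1)
        z_stats = sample_means * np.sqrt(n)   # z = (x_bar - 0) / (1 / sqrt(n))

        # Long-run proportion of studies whose statistic is at least as large as observed.
        p_approx = (z_stats >= observed_z).mean()
        print(round(p_approx, 3))             # about 0.018, the upper-tail area beyond z = 2.1

    In practice the tail area is computed directly from the known null distribution rather than by simulation; the simulation only illustrates the repeated-sample interpretation.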
  • Statistics at Square One
    • Michael J. Campbell (Author)
    • 2021 (Publication Date)
    • Wiley-Blackwell (Publisher)
    P Values and related analyses should not be reported selectively. Conducting multiple analyses of the data and reporting only those with certain P Values (typically those passing a significance threshold) renders the reported P Values essentially uninterpretable. Cherry‐picking promising findings, also known by such terms as data dredging, significance chasing, significance questing, selective inference and ‘P‐hacking’, leads to a spurious excess of statistically significant results in the published literature and should be vigorously avoided. One need not formally carry out multiple statistical tests for this problem to arise. Whenever a researcher chooses what to present based on statistical results, valid interpretation of those results is severely compromised if the reader is not informed of the choice and its basis. Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted and all P Values computed. Valid scientific conclusions based on P Values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including P Values) were selected for reporting.
    A P Value, or statistical significance, does not measure the size of an effect or the importance of a result. Statistical significance is not equivalent to scientific, human or economic significance. Smaller P Values do not necessarily imply the presence of larger or more important effects, and larger P Values do not imply a lack of importance or even lack of effect. Any effect, no matter how tiny, can produce a small P Value if the sample size or measurement precision is high enough, and large effects may produce unimpressive P Values if the sample size is small or measurements are imprecise. Similarly, identical estimated effects will have different P Values if the precision of the estimates differs.
    By itself, a P Value does not provide a good measure of evidence regarding a model or hypothesis. Researchers should recognise that a P Value without context or other evidence provides limited information. For example, a P Value near 0.05 taken by itself offers only weak evidence against the null hypothesis. Likewise, a relatively large P Value does not imply evidence in favour of the null hypothesis; many other hypotheses may be equally or more consistent with the observed data.
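    The earlier point that any effect, however tiny, can yield a small P Value with enough data can be seen in a short sketch. This is not from the book: it assumes a two-sample z comparison of means with unit variances, an observed difference of 0.1 standard deviations, and a range of made-up group sizes.

        import math

        def two_sided_p_from_z(z):
            """Two-sided P-value for a z statistic under a standard normal null distribution."""
            return math.erfc(abs(z) / math.sqrt(2))

        effect = 0.1                            # same (small) observed difference throughout
        for n in (50, 500, 5000, 50000):        # per-group sample sizes (assumed)
            z = effect / math.sqrt(2.0 / n)     # standard error of a difference in means, unit variances
            print(n, round(two_sided_p_from_z(z), 4))
        # The P Value shrinks from about 0.62 to essentially 0 as n grows,
        # even though the size of the effect never changes.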

    One‐sided and two‐sided tests

    One will sometimes see in the literature that ‘two‐sided tests were used’. What does this mean? Consider a test to compare the population means of two groups, A and B.
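    The excerpt breaks off here, but the general idea can be sketched (this is not the book's worked example): for a test statistic with a symmetric null distribution, a one-sided test measures the tail area in one direction only, while a two-sided test counts departures in either direction and, for a z statistic, is twice the smaller tail area.

        import math

        def one_sided_p(z):
            """Upper-tail P-value: evidence that group A's mean exceeds group B's."""
            return math.erfc(z / math.sqrt(2)) / 2

        def two_sided_p(z):
            """Two-sided P-value: evidence that the means of A and B differ in either direction."""
            return math.erfc(abs(z) / math.sqrt(2))

        z = 1.96                            # hypothetical observed z statistic
        print(round(one_sided_p(z), 3))     # about 0.025
        print(round(two_sided_p(z), 3))     # about 0.05, i.e. twice the one-sided value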
  • Advanced Concepts in Surgical Research
    • Mohit Bhandari, Bernd Robioneck (Authors)
    • 2012 (Publication Date)
    • Thieme (Publisher)
    therefore whether there is any treatment effect at all rather than how large that treatment effect might be.
    Jargon Simplified: P Value
    The P (for probability) value is a statistic that quantifies the probability of the results obtained in a research study (or those more extreme) occurring due to chance alone. By convention, P Values lower than 0.05 are considered to be “statistically significant,” meaning that the results are unlikely to be due to chance alone and may therefore be assumed to derive from the intervention under study.
    Although the probability obtained from the P Value is dependent on the size of the effect, it is also heavily dependent on the sample size. Since it is not possible to separate the contribution of sample size and treatment effect, a P Value cannot be used to specify how large the difference between treatments may be. For example, a very large study may yield a highly significant difference between two treatments; however, it may be that the real difference between the treatments is small. In other words, the difference may be statistically significant but clinically meaningless. Furthermore, since the magnitude of the P Value does not specify the magnitude of the treatment effect, it is not correct to use P Values to draw conclusions as to the relative merits of various treatments. For example, while both aprotinin and tranexamic acid were compared to placebo in the same patient population, the fact that aprotinin had a smaller P Value (0.021) than tranexamic acid (0.045) does not allow one to conclude that aprotinin is superior to tranexamic acid for preventing transfusion. In contrast to the P Value, the effect size is a value that conveys information about the magnitude of a treatment effect.
  • Biostatistics for Epidemiologists
    • Anders Ahlbom (Author)
    • 2017 (Publication Date)
    • CRC Press (Publisher)
    Chapter 4 THE P-VALUE, THE P-VALUE FUNCTION AND THE CONFIDENCE INTERVAL
    “…reminds me of the number P that I invented a couple of years ago. P is, for each individual, the number of minutes per month that that person spends thinking about the number P. For me, the value of P seems to average out at about 2. I certainly wouldn’t want it to go much above that! I find it crosses my mind most often when I’m shaving.”
    Hofstadter DR: Metamagical themas: Questing for the Essence of Mind and Pattern. Bantam Books 1985.
    In medical research, not least in epidemiology, significance testing has come to play a very major role; it is often the test result alone which is used to decide whether or not a result is to be ascribed to a random variation. One of the theses in this book is that this is an unsuitable principle, not only because it is fairly uninformative but also because it can easily lead to erroneous conclusions. The present chapter nevertheless begins by discussing the P-value, which is the basis not only for significance testing but also for its suggested alternative, the confidence interval.
    4.1 THE P-VALUE
    4.1.1 What is the P-value?
    Figure 4.1 The frequency function for the observed relative risk, assuming that the theoretical relative risk RR = 1. The P-value is the probability of getting a value which is at least as large as the observed, that is: P = P(R̂R ≥ R̂R₀ | RR = 1).
    Let us assume that a stochastic variable has a distribution which is determined by a particular parameter and that we have a hypothesis about the value of that parameter. Sometimes this is called the null hypothesis to distinguish it from other hypotheses. An example of a hypothesis from the field of epidemiology is estimation of the theoretical relative risk. This is done with the help of the observed relative risk, which is a stochastic variable whose distribution depends on the theoretical relative risk. A common hypothesis is that the theoretical relative risk equals 1, in other words, that there is no association between exposure and disease. For each observation on the stochastic variable, the P-value
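    To make the definition concrete, here is a minimal simulation sketch, not from the book and with made-up numbers: a cohort of 500 exposed and 500 unexposed people, a 10% baseline risk in both groups (so the null hypothesis RR = 1 holds), and an observed relative risk of 1.4. The P-value is estimated as the proportion of null-simulated relative risks at least as large as the observed one.

        import numpy as np

        rng = np.random.default_rng(1)

        n_exposed, n_unexposed = 500, 500   # cohort sizes (assumed)
        baseline_risk = 0.10                # risk in both groups under RR = 1 (assumed)
        observed_rr = 1.4                   # observed relative risk (assumed)

        # Simulate studies under the null hypothesis RR = 1.
        n_sim = 100_000
        cases_exposed = rng.binomial(n_exposed, baseline_risk, size=n_sim)
        cases_unexposed = rng.binomial(n_unexposed, baseline_risk, size=n_sim)

        risk_exposed = cases_exposed / n_exposed
        risk_unexposed = np.maximum(cases_unexposed, 1) / n_unexposed   # guard against division by zero
        rr_hat = risk_exposed / risk_unexposed

        # P = P(RR_hat >= observed value | RR = 1), estimated by simulation.
        p_value = (rr_hat >= observed_rr).mean()
        print(round(p_value, 3))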
  • Evidence-Based Diagnosis
    An Introduction to Clinical Epidemiology

    Statement 3 confuses the probability of the results given the null hypothesis with the probability of the null hypothesis given the results. The key is that the P-value is a conditional probability: it is calculated assuming that the null hypothesis is true. In this way, it is like 1 − Specificity, which is calculated conditional on not having the disease. For any one research question, there are many possible null hypotheses, and hence many test statistics that can be calculated. For example, there are test statistics to compare means, ranks, and standard deviations between groups, and they will not always give the same P-value.
    Note that it is also possible to calculate distributions of test statistics and P-values under assumptions other than the null hypothesis. For example, in an equivalency study, one might want to test the hypothesis that drug A is inferior to drug B by a specified amount. This is like calculating test characteristics for disease A vs. disease B, as opposed to Disease A present and absent. In that case, “specificity” could be how often the test is negative in people with disease B rather than in everyone who does not have Disease A.
    This analogy between diagnostic and statistical tests can be visualized with a 2 × 2 table similar to the ones we used for diagnostic tests (Table 11.3). Just as was the case with diagnostic tests, what you really want is to go horizontally in this table – that is, what you want to know is the probability that there truly is a difference between groups, given the study results. But when you calculate a P-value, you are going vertically. That is, you assume the null hypothesis is true.
    We can summarize the Bayesian understanding of P-values exactly as we did when discussing diagnostic tests:
    What you thought before + New information = What you think now
    The new information, in this case, is the result of the study. The P-value is a measure of how consistent the result of the study is with the null hypothesis.
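    A minimal numerical sketch of "going horizontally", not from the book and with all numbers assumed: treat a statistically significant result as a positive test, take a prior probability that a real difference exists, use the study's power as the sensitivity and α as the false-positive rate, and combine them as for a diagnostic test.

        prior = 0.10    # prior probability that a real difference exists (assumed)
        power = 0.80    # P(significant result | real difference), like sensitivity (assumed)
        alpha = 0.05    # P(significant result | null hypothesis true), like 1 - specificity

        # "Horizontal" probability: P(real difference | significant result), via Bayes' rule.
        posterior = (prior * power) / (prior * power + (1 - prior) * alpha)
        print(round(posterior, 2))   # 0.64: what you think now, given what you thought before

    With these assumed inputs, a significant result leaves roughly a one-in-three chance that there is no real difference, which is why the prior ("what you thought before") matters.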
  • Statistics
    Unlocking the Power of Data
    • Robin H. Lock, Patti Frazer Lock, Kari Lock Morgan, Eric F. Lock, Dennis F. Lock (Authors)
    • 2021 (Publication Date)
    • Wiley (Publisher)
    This leads us to one of the most important ideas of statistical inference: the p-value. The p-value measures how extreme sample results would be, if the null hypothesis were true.
    The P-value: The p-value is the proportion of samples, when the null hypothesis is true, that would give a statistic as extreme as (or more extreme than) the observed sample.
    There are various ways to calculate p-values. In this chapter we’ll take an approach similar to the bootstrapping procedures of Chapter 3 and calculate the p-value by seeing where the sample statistic lies in a randomization distribution. Since the randomization distribution shows what is likely to occur by random chance if the null hypothesis is true, we find the p-value by determining what proportion of the simulated statistics are as extreme as the observed statistic.
    Finding a P-value on a Randomization Distribution
    Example 4.10 Explain, using the definition of a p-value, how we can find the p-value for the light at night experiment from the randomization distribution in Figure 4.8.
    Solution: The randomization distribution in Figure 4.8 shows 3000 simulated samples generated by assuming the null hypothesis is true. To find the p-value, we find the proportion of these simulated samples that have statistics as extreme as the statistic observed in the original sample, x̄L − x̄D = 2.618. Figure 4.10 shows in red the simulated statistics in the randomization distribution that are at or beyond 2.618. We see that only 53 of the 3000 simulated samples have differences in means that are so large. Thus, we have p-value = 53/3000 = 0.018. If light at night does not affect weight gain, there is only about a 0.018 chance of getting a difference in means as extreme as the observed 2.618.
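    The procedure described in this excerpt can be sketched directly in code. The data below are made up (the book's light/dark weight-gain values and Figures 4.8 and 4.10 are not reproduced here); the sketch only shows the mechanics of building a randomization distribution by reshuffling group labels and reading off the p-value as a proportion.

        import numpy as np

        rng = np.random.default_rng(0)

        # Hypothetical weight gains for the two groups (illustrative numbers only).
        light = np.array([9.2, 10.1, 11.5, 8.8, 12.0, 10.7, 9.9, 11.1])
        dark = np.array([7.9, 8.4, 9.0, 7.5, 8.8, 9.3, 8.1, 8.6])

        observed = light.mean() - dark.mean()   # the sample statistic x_bar_L - x_bar_D
        pooled = np.concatenate([light, dark])

        # Randomization distribution: reassign group labels at random, as if the
        # null hypothesis (group has no effect) were true.
        n_sim = 3000
        diffs = np.empty(n_sim)
        for i in range(n_sim):
            shuffled = rng.permutation(pooled)
            diffs[i] = shuffled[:light.size].mean() - shuffled[light.size:].mean()

        # p-value: proportion of simulated statistics at or beyond the observed one.
        p_value = (diffs >= observed).mean()
        print(round(p_value, 3))

    With the book's actual data, this proportion comes out to 53/3000 = 0.018, as computed in the excerpt.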
  • Statistics for the Social Sciences
    A General Linear Model Approach

    The Software Guides at the end of each chapter show how to find exact p-values using a computer.
    Statistical Significance. A related concept to the p-value is statistical significance. When a null hypothesis has been rejected (i.e., when p < α), the results are called “statistically significant.” On the other hand, if the null hypothesis was retained, the results are “statistically insignificant.” The term “statistical significance” only refers to whether the null hypothesis was rejected. Although the word “significant” may mean “important” in everyday language, the term “significant” has nothing to do with importance in statistics (Kline, 2013; Schmidt, 1996; Thompson, 2002a). For example, Meehl (1990, p. 207) reported that there is a statistically significant sex difference in the numbers of men and women who agree with the statement “My hands and feet are usually warm enough.” Although Meehl could reject the null hypothesis that the two groups were equal on this variable, there is no theoretical or practical value in knowing this fact, because this sex difference is trivial in importance. Thus, results can be statistically significant but not important. Therefore, in a statistical and scientific context, you should never use the word “significant” to mean “important.” Additionally, there are three kinds of significance in the social sciences (see Sidebar 8.3), meaning you should always specify “statistical significance” when referring to results in which the null hypothesis was rejected. Never use the words “significant” or “significance” alone (Thompson, 1996, 2002a).
    Sidebar 8.3 Three kinds of significance
    In the social sciences, there are three types of significance: statistical significance, practical significance, and clinical significance. This sidebar will describe each one and include illustrative examples. Statistical significance – as explained in the main text – depends on whether a null hypothesis has been rejected in an NHST procedure.
  • Statistics from A to Z
    Confusing Concepts Clarified
    • Andrew A. Jawlik (Author)
    • 2016 (Publication Date)
    • Wiley (Publisher)
    The p-value is calculated as the Cumulative Probability of the area under the curve beyond the Test Statistic Value.
    A Statistic is a numerical property of a Sample, e.g., the Mean or Standard Deviation. A Test Statistic is a Statistic that has an associated Probability Distribution. The most common are z, t, F and Chi-Square. Below is the formula for z, which is used in analyses of Means:
    where x̄ is the Sample Mean, s is the Standard Deviation, and μ is a specified value of the Population Mean. It could be an estimate, a historical value, or a target, for instance.
    Here's how the value of p (the “p-value”) is determined:
    • The Sample Data are used to calculate a value for the Test Statistic (1.2 in this example).
    • This Test Statistic value is plotted on the graph of the Probability Distribution of the Test Statistic.
    The height of the curve above each value on the horizontal axis is the Probability of that value occurring. The Cumulative Probability of a range of values occurring is the area under the curve above those values.
    • p is calculated (from tables or software) as the Cumulative Probability of the range of values from the Test Statistic value outward, away from the Mean (in this case, to the right, extending to infinity).
    1. In Hypothesis Testing, p is compared with Alpha to determine the conclusion from an Inferential Statistics test. If p ≤ α, Reject the Null Hypothesis. If p > α, Fail to Reject (i.e., Accept) the Null Hypothesis.
    (If you're not familiar with the concept of Alpha, it may be a good idea to read the article Alpha,
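    The steps above can be put together in a short sketch. The formula for z that the excerpt refers to is not reproduced above, so this sketch assumes the standard one-sample form z = (x̄ − μ) / (s / √n); the numbers are made up so that z comes out near the 1.2 used in the example.

        import math

        def one_sample_z_p_value(sample_mean, mu, s, n):
            """Return (z, p): the test statistic and its upper-tail cumulative probability."""
            z = (sample_mean - mu) / (s / math.sqrt(n))          # assumed standard z formula
            p = math.erfc(z / math.sqrt(2)) / 2                  # area under the curve from z outward to infinity
            return z, p

        z, p = one_sample_z_p_value(sample_mean=10.6, mu=10.0, s=2.5, n=25)   # hypothetical data summary
        alpha = 0.05
        print(round(z, 2), round(p, 3))                          # 1.2 and about 0.115
        print("Reject the Null Hypothesis" if p <= alpha else "Fail to Reject the Null Hypothesis")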