
Confidence Interval for Slope of Regression Line

A confidence interval for the slope of a regression line is a range of values within which the true slope is likely to fall. It provides a measure of the uncertainty associated with estimating the slope from a sample of data. The interval is calculated using the sample data and accounts for variability and potential error in the estimation process.

Written by Perlego with AI-assistance

11 Key excerpts on "Confidence Interval for Slope of Regression Line"

  • Mind on Statistics (with JMP Printed Access Card)
    EXAMPLE 14.7: 95% Confidence Interval for Slope between Age and Sign-Reading Distance. In Figure 14.5 (p. 571), we see that the sample slope is b₁ = −3.01 and s.e.(b₁) = 0.4243. There are n = 30 observations, so df = 28 for finding t*. For a 95% confidence level, t* = 2.05 (see Table A.2). The 95% confidence interval for the population slope is
    \[ -3.01 \pm 2.05 \times 0.4243 = -3.01 \pm 0.87, \quad \text{or} \quad -3.88 \text{ to } -2.14. \]
    With 95% confidence, we can estimate that in the population of drivers represented by this sample, the mean sign-reading distance decreases somewhere between 2.14 and 3.88 feet for each 1-year increase in age.
    THOUGHT QUESTION 14.3: In previous chapters, we learned that a confidence interval can be used to determine whether a hypothesized value for a parameter can be rejected. How would you use a confidence interval for the population slope to determine whether there is a statistically significant relationship between x and y? For example, why is the interval that we just computed for the sign-reading example evidence that sign-reading distance and age are related? (HINT: What is the null value for the slope? Section 13.5 discusses the connection between confidence intervals and significance tests.)
    FORMULA: Confidence Interval for β₁, the Population Slope. A confidence interval for β₁ is
    \[ b_1 \pm t^* \times \text{s.e.}(b_1), \]
    where the multiplier t* is found using a t-distribution with n − 2 degrees of freedom and is such that the probability between −t* and t* equals the confidence level for the interval.
    SPSS TIP: Calculating a 95% Confidence Interval for the Slope. Use Analyze > Regression > Linear Regression. Specify the y variable in the Dependent box and the x variable in the Independent(s) box. Click Statistics, then select "Confidence intervals" under "Regression Coefficients".
    Testing Hypotheses about the Correlation Coefficient: In Chapter 3, we learned that the correlation coefficient is 0 when the regression line is horizontal.
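
As a quick check, the arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch using SciPy; the summary statistics (b₁ = −3.01, s.e. = 0.4243, n = 30) come straight from the example.

```python
from scipy import stats

b1 = -3.01      # sample slope from Example 14.7
se_b1 = 0.4243  # standard error of the slope
n = 30          # number of observations

# t* cuts off the middle 95% of a t-distribution with n - 2 = 28 df
t_star = stats.t.ppf(0.975, df=n - 2)  # ~2.05

lower, upper = b1 - t_star * se_b1, b1 + t_star * se_b1
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")  # (-3.88, -2.14)
```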
  • Introductory Regression Analysis: with Computer Application for Business and Economics
    • Allen Webster (Author)
    • 2013 (Publication Date)
    • Routledge (Publisher)
    Y . Perhaps our detection of a relationship resulted from an aberration based on a misleading sample. Our sample data may be lying to us!
    If this is the case, our model is perilously deceptive and any decision based on its outcome could result in a serious blunder. It is therefore essential that tests be conducted to determine the statistical significance of our findings. This chapter examines these tests and explores the importance and meaning of statistical significance. The two most common tests of significance are confidence intervals and hypothesis tests. The purpose of each test and the role each plays in the general scheme of statistical analysis are thoroughly examined in this chapter.
    3.1 CONFIDENCE INTERVAL ESTIMATION
    A primary reason to develop our regression model is to estimate and forecast values for the dependent variable. In Chapter 2 our efforts provided values for b₀ and b₁, which serve as point estimators for the unknown parameters β₀ and β₁. Given the model based on the Arco data, β₀ was estimated at the point b₀ = 7.493 and β₁ was estimated at the point b₁ = 1.636. But how much confidence do you have that your estimate is correct? Probably not much.
    However, interval estimates are much more meaningful. They provide both an upper and a lower bound within which we think the parameter might fall. Unlike point estimates, we can actually attach some level of confidence as to whether the parameter is indeed within the interval. How confident are you that β₁ = b₁ = 1.636? Not very! In fact, you are probably wrong. There was most likely some sampling error causing b₁ to vary somewhat from β₁. No assurance can be taken that β₁ is indeed 1.636.
    But with an interval estimate a desired degree of confidence can be chosen to accompany the estimate. It is possible to stipulate a level of confidence that the unknown parameter is bounded within that specified interval. Common levels of confidence are 90%, 95%, and 99%. There is nothing special about those levels; they are merely customary. If you are less than 90% confident in your conclusion, nobody's interested. You can't be 100% confident about your findings regarding a parameter unless you examine the entire population, and we have repeatedly noted the impracticality in that effort. So stated levels of confidence between 90% and 99% are archetypal.
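
To make the customary confidence levels concrete, here is a short sketch showing how the t multiplier, and hence the width of the interval around a point estimate such as b₁ = 1.636, grows as the confidence level rises from 90% to 99%. The standard error and degrees of freedom are placeholders, since the excerpt does not report them.

```python
from scipy import stats

b1 = 1.636    # point estimate of the slope from the Arco model
se_b1 = 0.25  # hypothetical standard error (not given in the excerpt)
df = 20       # hypothetical residual degrees of freedom, n - 2

for level in (0.90, 0.95, 0.99):
    t_star = stats.t.ppf(1 - (1 - level) / 2, df=df)
    half = t_star * se_b1
    print(f"{level:.0%}: t* = {t_star:.3f}, interval = ({b1 - half:.3f}, {b1 + half:.3f})")
```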
  • Understanding Business Statistics
    • Ned Freed, Stacey Jones, Timothy Bergquist (Authors)
    • 2013 (Publication Date)
    • Wiley (Publisher)
    Not surprisingly, the width of our estimating interval is influenced directly by the level of confidence and inversely by the sample size. Equally unsurprising is the role of s_y.x in establishing interval width: larger values of s_y.x will produce wider intervals. This simply reflects the fact that when the data points are widely scattered around the estimated regression line, our ability to determine precisely what the population line looks like is diminished. As you can see from the (x* − x̄)² term in Expression 11.18, the width of the interval is smallest when x* is equal to x̄, since this makes (x* − x̄)² = 0. As the distance between x* and x̄ increases, the interval gets wider. Importantly, this means that estimating expected y values for xs that are far from the average x can produce a very imprecise result. Figure 11.20 illustrates the point.
    [Figure 11.20, Confidence Intervals for Expected y Values: the confidence interval is narrowest at x̄, the average value of x; intervals are considerably wider for x values far from x̄.]
    Estimating an Expected y Value: For our mobile apps example, construct a 95% confidence interval estimate of the expected number of downloads for the set of all apps that have 45 linking websites (that is, estimate E(y | x = 45)). Solution (NOTE: since in the hypothesis test we conducted for this case we were unable to reject the "no relationship" null hypothesis, we would generally not bother to build the interval called for here; however, for demonstration purposes, we'll show this step): From previous work, \( s_{y.x} = \sqrt{27000/(4-2)} = 116.2 \). The general interval is
    \[ \hat{y} \pm t \, s_{y.x} \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x - \bar{x})^2}}. \]
    For x* = 45, the estimated regression equation gives ŷ = 440 + 11(45) = 935.
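
The interval for E(y | x* = 45) can be finished numerically. In the sketch below, only n = 4, s_y.x ≈ 116.2, and ŷ = 935 come from the example; x̄ and Σ(x − x̄)² are not shown in the excerpt, so the values used for them here are assumed placeholders.

```python
import math
from scipy import stats

n = 4
s_yx = math.sqrt(27000 / (n - 2))  # ~116.2, from the example
y_hat = 440 + 11 * 45              # 935, fitted downloads at x* = 45

x_bar = 40.0   # hypothetical mean number of linking websites (assumed)
sxx = 1000.0   # hypothetical sum of squares, sum((x - x_bar)**2) (assumed)
x_star = 45

t_star = stats.t.ppf(0.975, df=n - 2)
half = t_star * s_yx * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
print(f"95% CI for E(y | x* = {x_star}): {y_hat} ± {half:.1f}")
```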
  • Statistical Methods and the Geographer
    • Stanley Gregory (Author)
    • 2014 (Publication Date)
    • Routledge (Publisher)
    all rainfall ranges. There is here the possibility that as rainfall reaches higher values, e.g. about 80 cm, then run-off values may deviate from such an hypothetical relationship. This is always a problem with regression lines, and it is only safe to apply them to the ranges of values on which the calculations are based. In this case the regression line and confidence limits should be satisfactory for falls at least between 40 and 65 cm and almost certainly between 35 and 70 cm, i.e. within the likely range of values. Exceptional falls, whether they be high or low, may not be so adequately interpreted.
    Before leaving this theme of regression equations related to two sets of variables, two further points must be briefly raised. Firstly, it has been argued in previous pages that, because the data being analysed are sample data only, it is essential to test both the correlation coefficient and the regression equation by the t test to assess their statistical significance; also because any relationship is unlikely to be a perfect one, the standard error of the estimate must be calculated. Additional to these, however, it follows that the regression coefficient, which expresses the degree of slope of the regression line, is also only an estimate of the true regression line based on the total population (see p. 195), so that another separate sample of that same population is likely to yield a somewhat different estimate of the true regression coefficient. It is therefore desirable to obtain the standard error of any regression coefficient that is calculated from the available sample, as this will permit an assessment of its statistical significance, an estimate of the range within which the true regression coefficient is likely to lie and the calculation of a critical term within the standard error of the forecast (see p. 203).
    This error term, the standard error of the regression coefficient, is obtained as follows:
    \[ \mathrm{SE}(b) = \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}}, \]
    where s_e is the standard error of the estimate.
    In the rainfall/run-off example which has been used above (see pp. 178 and 200 for values), this becomes:
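
To illustrate the calculation (the actual rainfall/run-off values are on pages not included in this excerpt), the sketch below computes the standard error of a sample regression coefficient, and the resulting t test of its significance, from made-up data.

```python
import numpy as np
from scipy import stats

# Illustrative data only; the real rainfall/run-off values are not in the excerpt
rainfall = np.array([38.0, 45, 52, 48, 60, 55, 42, 65])  # cm (hypothetical)
runoff = np.array([12.0, 16, 20, 17, 26, 22, 14, 30])    # cm (hypothetical)

n = len(rainfall)
b, a = np.polyfit(rainfall, runoff, 1)       # regression coefficient and intercept
resid = runoff - (a + b * rainfall)
s_e = np.sqrt(np.sum(resid**2) / (n - 2))    # standard error of the estimate
se_b = s_e / np.sqrt(np.sum((rainfall - rainfall.mean())**2))

t_stat = b / se_b                            # is the slope significantly non-zero?
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"b = {b:.3f}, SE(b) = {se_b:.3f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```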
  • Essentials of Applied Econometrics
    Now that we know the properties of our ordinary least-squares (OLS) estimator, we can test hypotheses and set up confidence intervals around our model's parameters. Using estimates from sample data to make statements about the whole population is what statistics is all about. Usually, we are most interested in the slope parameters (in our multiple regression model, β₁, β₂, etc.), because economists like to be able to say things about the correlation between one variable (say, income) and another (demand). These correlations are also ingredients of other interesting economic objects, such as elasticities and causal effects. Our regression gave us an estimate of β₁ (or β₁, β₂, . . ., β_K in the case of multiple regression), but so far we don't know how confident we can be that what we estimated is close to the population parameter. That is where hypothesis testing and confidence intervals come in.
    LEARNING OBJECTIVES: Upon completing the work in this chapter, you will be able to: test a hypothesis about a regression coefficient; form a confidence interval around a regression coefficient; show how the central limit theorem allows econometricians to ignore assumption CR4 in large samples; present results from a regression model.
    HYPOTHESIS TESTING: An econometric hypothesis is about a potential value of the population parameter. We want to determine whether that value seems reasonable given our data. We often want to test the hypothesis that our parameter value equals zero, that is, whether there is any relationship at all between the right-hand-side and dependent variables. But we can pick any hypothesis that is relevant to our analysis. Suppose we are interested in the population parameter β₁. Let's call our hypothesized value of this parameter β₁*.
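
A hypothesis about a slope parameter is usually checked with the t statistic t = (b₁ − β₁*)/s.e.(b₁). The sketch below is a generic illustration with placeholder numbers, not figures from the chapter.

```python
from scipy import stats

b1 = 0.82         # estimated slope (hypothetical)
se_b1 = 0.31      # its standard error (hypothetical)
beta1_star = 0.0  # hypothesized population value of beta_1
df = 48           # residual degrees of freedom (hypothetical)

t_stat = (b1 - beta1_star) / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=df)  # two-sided p-value
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```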
  • Handbook of Regression and Modeling: Applications for the Clinical and Pharmaceutical Industries
    In addition, the regression line will always go through (x̄, ȳ), the pivot point. The more variability there is in s²_ŷ, the greater the swing on the pivot point, as illustrated in Figure 2.16. The true regression equation ŷ_P is somewhere between ŷ_L and ŷ_U (the lower and upper estimates of y). The regression line pivots on the (x̄, ȳ) point to a certain degree, with both b₀ and b₁ varying. Because the researcher does not know exactly what the true linear regression function is, it must be estimated. The interval around any of the ŷ (y-predicted) values at particular x_i values will be wider the farther one estimates from the mean x̄ in either direction. This means that the ŷ confidence interval is not parallel to the regression line, but curvilinear (see Figure 2.17).
    [Figure 2.16: Regression line pivots. Figure 2.17: Confidence intervals, with the upper and lower y confidence bands curving around the estimated y regression line.]
    CONFIDENCE INTERVAL OF ŷ: A 1 − α CI for the expected value (average value) of ŷ for a specific x is calculated using the following equation:
    \[ \hat{y} \pm t_{(\alpha/2;\, n-2)} \, s_{\hat{y}}, \quad (2.21) \]
    where ŷ = b₀ + b₁x and
    \[ s_{\hat{y}} = \sqrt{ MS_E \left[ \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right] }, \quad (2.22) \]
    where x_i is the x value of interest used to predict ŷ_i, and
    \[ MS_E = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} = \frac{\sum_{i=1}^{n} e_i^2}{n - 2}. \]
    Example 2.3: Using the data in Table 2.1 and Equation 2.1, we note that the regression equation is ŷ = 6.13 − 0.041x. Suppose the researcher would like to know the expected (average) value of y, as predicted by x_i, when x_i = 15 sec.
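
In practice this interval is usually obtained from a fitted model object rather than by hand. Below is a minimal statsmodels sketch; the data are illustrative placeholders, since Table 2.1 is not reproduced in the excerpt.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative exposure-time data; Table 2.1 is not reproduced in the excerpt
x = np.array([0.0, 5, 10, 15, 20, 30, 45, 60])          # seconds (hypothetical)
y = np.array([6.1, 5.9, 5.7, 5.5, 5.2, 4.9, 4.3, 3.7])  # response (hypothetical)

model = sm.OLS(y, sm.add_constant(x)).fit()

# 95% CI for the mean response at x_i = 15 sec (Equations 2.21/2.22 internally)
x_new = np.array([[1.0, 15.0]])   # design row: [intercept, x]
pred = model.get_prediction(x_new)
print(pred.conf_int(alpha=0.05))  # lower and upper bounds for E(y | x = 15)
```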
  • Essential Statistics for Economics, Business and Management
    • Teresa Bradley (Author)
    • 2014 (Publication Date)
    • Wiley (Publisher)
    Note: the confidence interval also supported the claim that β = 5. 8. p-value: 0.5. The calculations for confidence intervals and the test statistic for slope are summarised in Chart 12.1.
    [Chart 12.1: Calculation of the confidence interval and test statistic for slope. From the sample (x_i, y_i), i = 1, 2, …, n, find a, b, r by calculator (or otherwise); calculate the sample standard error of estimate \( s_e = \sqrt{\frac{n}{n-2}\left(\sigma_y^2 - b^2 \sigma_x^2\right)} \) and the sample standard error for slope \( s_b = s_e \big/ \sqrt{\sum (x_i - \bar{x})^2} \); if n ≥ 30 look up z_{α/2}, otherwise look up t_{n−2, α/2}; the confidence interval is b ± (t_{n−2, α/2}) × (s_b) and the test statistic is \( T = (b - \beta_{H_0}) / s_b \).]
    Skill Development Exercise 12.1: Inference about slope. The engineer's profit when various numbers of PCs were sold was as follows (Table SK 12.1, Engineer's profit vs. PCs sold):
    PCs sold: 1, 3, 5, 9, 12
    Profit (£00): −2, 8, 4, 38, 18
    (a) Carry out the background calculations: y = a + bx, σ²_x, σ²_y, s_e, Σ(x_i − x̄)² = nσ²_x, s_b. (b) Construct a 95% confidence interval for the slope. (c) Test the hypothesis that the profit increases by at least £200 for each additional PC sold. Describe the results of (b) and (c) verbally. Is there any evidence to support the claim that there is a relationship between PCs sold and profit?
    Answer. Background calculations: the least-squares line is ŷ = −2.1 + 2.55x; σ²_y = 196.16; σ²_x = 16. The variance of estimate is \( s_e^2 = \frac{n}{n-2}\left(\sigma_y^2 - b^2 \sigma_x^2\right) = 153.53 \). N.B. Don't forget to take the square root to get the standard error: s_e = √153.53 = 12.3909. The standard error for slope is \( s_b = s_e \big/ \sqrt{\sum (x_i - \bar{x})^2} = 1.3853 \), where Σ(x − x̄)² = nσ²_x = 80.
    (b) Point estimate ± (confidence coefficient) × (standard error): b ± (t_{n−2, α/2}) × (s_b). From tables, t_{df=3, α/2=2.5%} = 3.182, so the interval is 2.55 ± (3.182)(1.3853), or 2.55 ± 4.4081. We are 95% confident that the slope is between −1.8581 and 6.9581.
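
These background calculations are easy to verify programmatically. The sketch below recomputes the answer from the raw table values.

```python
import numpy as np
from scipy import stats

pcs_sold = np.array([1.0, 3, 5, 9, 12])
profit = np.array([-2.0, 8, 4, 38, 18])  # in units of £100
n = len(pcs_sold)

b, a = np.polyfit(pcs_sold, profit, 1)         # 2.55 and -2.1
resid = profit - (a + b * pcs_sold)
s_e = np.sqrt(np.sum(resid**2) / (n - 2))      # 12.3909
sxx = np.sum((pcs_sold - pcs_sold.mean())**2)  # 80
s_b = s_e / np.sqrt(sxx)                       # 1.3853

t_star = stats.t.ppf(0.975, df=n - 2)          # 3.182
half = t_star * s_b                            # 4.4081
print(f"95% CI for slope: {b:.2f} ± {half:.4f} = ({b - half:.4f}, {b + half:.4f})")
```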
  • Linear Regression Analysis
    • George A. F. Seber, Alan J. Lee (Authors)
    • 2012 (Publication Date)
    • Wiley (Publisher)
    5.2 CONFIDENCE BANDS FOR THE REGRESSION SURFACE. 5.2.1 Confidence Intervals: Once we have estimated β from n observations Y, we can use the predictor \( \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_{p-1} x_{p-1} \) (= x′β̂, say) for studying the shape of the regression surface \( f(x_1, x_2, \ldots, x_{p-1}) = \beta_0 + \beta_1 x_1 + \cdots + \beta_{p-1} x_{p-1} = \mathbf{x}'\boldsymbol{\beta} \) over a range of values of the regressors x_j. In particular, we can construct a two-sided 100(1 − α)% confidence interval for the value of f at a particular value of x, say x₀ = (1, x₀₁, x₀₂, …, x₀,p−1)′, using Ŷ₀ = x₀′β̂. Thus from (4.17), we have the interval
    \[ \hat{Y}_0 \pm t_{n-p}^{\alpha/2} \, S \, v_0^{1/2}, \quad (5.21) \]
    where \( v_0 = \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0 \). If we are interested in k particular values of x, say x = a_j (j = 1, 2, …, k), then we can use any of the three methods discussed in Section 5.1 to obtain k two-sided confidence intervals for the a_j′β with a joint confidence probability of at least 1 − α. (Application of the Bonferroni and the Scheffé intervals to this problem seems to be due to Lieberman [1961].)
    5.2.2 Confidence Bands: If we are interested in all values of x, then using Scheffé's method we have from (5.15) that x′β lies in
    \[ \mathbf{x}'\hat{\boldsymbol{\beta}} \pm \left( p F_{p,\,n-p}^{\alpha} \right)^{1/2} S \left\{ \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x} \right\}^{1/2} \quad (5.22) \]
    for all x = (1, x₁, x₂, …, x_{p−1})′, with an exact overall probability of 1 − α. (Although the first element of x is constrained to be unity, this does not mean that the appropriate constant in (5.22) should now be \( [(p-1) F_{p-1,\,n-p}^{\alpha}]^{1/2} \); the interval is invariant under a scale change of one element of x; cf. Miller [1981: pp. 110–114].) The expression above gives two surfaces defined by the functions f° and f₀, where
    \[ \Pr\left[ f^{\circ}(x_1, \ldots, x_{p-1}) \geq f(x_1, \ldots, x_{p-1}) \geq f_0(x_1, \ldots, x_{p-1}), \ \text{all } x_1, \ldots, x_{p-1} \right] = 1 - \alpha. \]
    The region between f° and f₀ is commonly called a confidence band. As pointed out by Miller [1981], the band over that part of the regression surface that is not of interest, or is physically meaningless, is ignored.
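
The Scheffé band in (5.22) is straightforward to compute directly. The following is an illustrative NumPy sketch for simple linear regression (p = 2) with made-up data.

```python
import numpy as np
from scipy import stats

# Made-up data for illustration
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 25)
y = 1.5 + 0.8 * x + rng.normal(0, 1.0, x.size)

X = np.column_stack([np.ones_like(x), x])     # design matrix, p = 2 columns
n, p = X.shape
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
S = np.sqrt(resid @ resid / (n - p))          # residual standard deviation

XtX_inv = np.linalg.inv(X.T @ X)
W = np.sqrt(p * stats.f.ppf(0.95, p, n - p))  # Scheffe multiplier (p F)^(1/2)

grid = np.column_stack([np.ones(100), np.linspace(0, 10, 100)])
fit = grid @ beta_hat
half = W * S * np.sqrt(np.einsum("ij,jk,ik->i", grid, XtX_inv, grid))
band = np.column_stack([fit - half, fit + half])  # 95% simultaneous band
```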
  • Statistics: Unlocking the Power of Data
    • Robin H. Lock, Patti Frazer Lock, Kari Lock Morgan, Eric F. Lock, Dennis F. Lock (Authors)
    • 2021 (Publication Date)
    • Wiley (Publisher)
    9.1 INFERENCE FOR SLOPE AND CORRELATION. In Sections 2.5 and 2.6 we introduce summary statistics for correlation and linear regression as ways to describe the relationship between two quantitative variables. In Chapters 3 and 4 we see examples of doing inference for these quantities using bootstrap distributions and randomization tests. In this chapter we develop methods similar to those in Chapters 5 and 6 for applying standard distributions to help with inferences for quantitative vs quantitative relationships.
    Simple Linear Model: For a simple linear model we have a quantitative response variable (Y) and a quantitative explanatory variable (X). We assume the values of Y tend to increase or decrease in a regular (linear) way as X increases. This does not mean an exact relationship with all points falling perfectly on a line. A statistical model generally consists of two parts: one specifying the main trend of the relationship and the second allowing for individual deviations from that trend. For a simple linear model, a line (specified with a slope and an intercept) shows the general trend of the data, and individual points tend to be scattered above and below the line. Recall from Section 2.6, we use the following notation for the least squares line for a sample:
    \[ \hat{Y} = b_0 + b_1 X \]
    We use the following notation to express a simple linear model for a population:
    \[ Y = \beta_0 + \beta_1 X + \epsilon \]
    The linear part of the model (β₀ + β₁X) reflects the underlying pattern for how the average Y behaves depending on X. We use Greek letters for the intercept β₀ and slope β₁ in the model since they represent parameters for the entire population. The error term in the model (denoted by ε) allows for individual points to vary above or below the line. In practice, just as we rarely know the mean, μ, or proportion, p, for an entire population, we can only estimate the population slope and intercept using the data in a sample.
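
The two-part model (an underlying trend plus individual deviations) is easy to see in a tiny simulation; the population parameters below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

beta0, beta1, sigma = 10.0, 2.5, 3.0     # hypothetical population parameters
X = rng.uniform(0, 20, size=50)
epsilon = rng.normal(0, sigma, size=50)  # individual deviations from the trend
Y = beta0 + beta1 * X + epsilon          # Y = beta0 + beta1*X + epsilon

b1, b0 = np.polyfit(X, Y, 1)             # sample estimates of slope and intercept
print(f"population slope = {beta1}, sample estimate b1 = {b1:.3f}")
```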
  • Introduction to Statistical Data Analysis for the Life Sciences
    The right panel of Figure 5.5 shows 95% confidence intervals for 50 samples of size 40. Compared to the left panel, the confidence intervals are narrower but still include the true value for all but a few cases, with a probability of 95%. As argued above, the interpretation of confidence intervals is based on replication of the experiment or data collection. From a practical point of view this might not seem very useful, as there is only one dataset and thus one confidence interval available. The true value is either inside the interval or it is not, but we will never know. We can, however, interpret the values in the confidence interval as those parameter values for which it is reasonable to believe that they could have generated the data. If we use 95% confidence intervals, and if the true parameter value is μ₀, then:
    • the probability of observing data for which the corresponding confidence interval includes μ₀ is 95%;
    • the probability of observing data for which the corresponding confidence interval does not include μ₀ is 5%.
    In other words, if the true value is μ₀, then it is quite unlikely to observe data for which the confidence interval does not include μ₀. As a standard phrase we may say that the 95% confidence interval includes those values that are in agreement with the data on the 95% confidence level. For the crab weight data we computed the 95% confidence interval (12.41, 13.11) for the mean μ in Example 5.6 (p. 122). We conclude that an average crab weight (in the population) between 12.41 and 13.11 is in accordance with the observed data on the 95% confidence level.
    5.3.4 Confidence intervals for linear models: Above we computed confidence intervals in the one-sample case. It turns out that we can use the exact same machinery for all linear models. Consider the linear model y_i = μ_i + e_i, i = 1, . . . , n from Section 5.1, where e₁, . . . , e_n are iid.
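
The repeated-sampling interpretation can be checked by simulation: generate many datasets from a known model, build a 95% interval for the slope each time, and count how often the true value is covered. A minimal sketch with made-up parameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 40  # hypothetical true model
x = np.linspace(0, 10, n)
sxx = np.sum((x - x.mean()) ** 2)
t_star = stats.t.ppf(0.975, df=n - 2)

covered = 0
trials = 5000
for _ in range(trials):
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    se_b1 = np.sqrt(resid @ resid / (n - 2)) / np.sqrt(sxx)
    covered += abs(b1 - beta1) <= t_star * se_b1
print(f"coverage: {covered / trials:.3f}")  # should be close to 0.95
```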
  • Applied Statistics for Engineers and Scientists
    • Jay Devore, Nicholas Farnum, Jimmy Doi (Authors)
    • 2013 (Publication Date)
    The standard deviation of the prediction error is the square root of this expression, and the estimated standard deviation results from replacing σ² by s²_e. Using these results to standardize the prediction error gives a t variable from which the prediction interval is obtained. Without s²_e under the square root in the prediction interval formula, we would have the confidence interval formula. This implies that the prediction interval (PI) is wider than the confidence interval (CI), often much wider, because s²_e is frequently much larger than s²_ŷ. The prediction level for the interval is interpreted in the same way that a confidence level was previously interpreted. If a prediction level of 95% is used in calculating interval after interval from different samples, in the long run about 95% of the calculated intervals will include the value y that is being predicted. Of course, we will not know whether the single interval that we have calculated is one of the good 95% until we have observed y. The standardized variable
    \[ T = \frac{y^* - \hat{y}}{\sqrt{s_e^2 + s_{\hat{y}}^2}} \]
    has a t distribution with n − 2 df. This implies that a prediction interval for a future y value y*, to be observed when x = x*, is
    \[ \hat{y} \pm (t \text{ critical value}) \sqrt{s_e^2 + s_{\hat{y}}^2}. \]
    Let's return to the carbonation depth–strength data of Example 11.7 and calculate a 95% prediction interval for a strength value that would result from selecting a single core specimen whose carbonation depth is 45 mm. Relevant quantities from that example are ŷ = 13.79, s_ŷ = 0.7582, and s_e = 2.8640. For a prediction level of 95% based on n − 2 = 16 df, the t critical value is 2.120, exactly what we previously used for a 95% confidence level.
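
Plugging in the quantities given completes the example; a short sketch of the arithmetic:

```python
import math
from scipy import stats

y_hat = 13.79    # predicted strength at x* = 45 mm
s_yhat = 0.7582  # estimated standard deviation of the fitted value
s_e = 2.8640     # residual standard deviation
df = 16          # n - 2

t_crit = stats.t.ppf(0.975, df)                # 2.120
half = t_crit * math.sqrt(s_e**2 + s_yhat**2)  # PI half-width
print(f"95% PI: {y_hat:.2f} ± {half:.2f} = ({y_hat - half:.2f}, {y_hat + half:.2f})")
```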
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.