Linear Regression

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between the variables and aims to find the best-fitting line that minimizes the differences between the observed and predicted values. This technique is commonly used for prediction and forecasting in various fields.
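
For instance, here is a minimal sketch of the idea in code (illustrative numbers, with NumPy's polyfit as the fitting routine; not tied to any of the excerpts below):

```python
import numpy as np

# Illustrative observations of one independent (x) and one dependent (y) variable.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.2, 4.1, 5.8, 8.1])

# Fit the best line y ≈ slope*x + intercept by minimising the squared
# differences between observed and predicted values (least squares).
slope, intercept = np.polyfit(x, y, deg=1)

# Use the fitted line for prediction.
print(f"prediction at x = 5: {intercept + slope * 5.0:.2f}")
```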

Written by Perlego with AI-assistance

11 Key excerpts on "Linear Regression"

  • Best Fit Lines & Curves

    And Some Mathe-Magical Transformations

    • Alan R. Jones (Author)
    • 2018 (Publication Date)
    • Routledge (Publisher)
    1. An act of reflecting and recalling memories from an earlier stage of life, or an alleged previous life. 2. A statistical technique that determines the 'Best Fit' relationship between two or more variables. Although we may be tempted to consider Regression as a means of entering a trance-like state where we can reflect on life before we became estimators, it is, of course, the second definition that is of relevance to us here. However, this definition for me (although correct) does not fully convey the process and power of Regression Analysis, as it misses the all-important element of what determines 'best fit'. Instead, we will revise our definition of Regression to be: Definition 4.1 (Regression Analysis): Regression Analysis is a systematic procedure for establishing the Best Fit relationship of a predefined form between two or more variables, according to a set of Best Fit criteria. Note that Regression only assumes that there is a relationship between two or more variables; it does not imply causation. It also assumes that there is a continuous relationship between the dependent variable, i.e. the one we are trying to predict or model, and at least one of the independent variables used as a driver or predictor. One of the primary outputs from a Regression Analysis is the Regression Equation, which we would typically use to interpolate or extrapolate in order to generate an estimate for defined input values (drivers). The technique has a very wide range of applications in business and can be used to identify a pattern of behaviour between one or more estimate drivers (the independent variables) and the thing or entity we want to estimate (the dependent variable). Examples might include the relationship between cost and a range of physical parameters, sales forecasts and levels of marketing budgets, Learning Curves, and Time Series.
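
    As a concrete illustration of using the Regression Equation to interpolate or extrapolate for given driver values, here is a minimal sketch (not from the book; the data values are hypothetical):

    ```python
    import numpy as np

    # Hypothetical driver (x) and estimate (y) observations,
    # e.g. cost against a physical parameter.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Least-squares 'Best Fit' straight line: y ≈ slope * x + intercept.
    slope, intercept = np.polyfit(x, y, deg=1)

    # The Regression Equation can interpolate (x = 2.5) or, with more caution,
    # extrapolate (x = 6.0) to generate estimates for defined input values.
    for x_new in (2.5, 6.0):
        print(f"x = {x_new}: estimated y = {intercept + slope * x_new:.2f}")
    ```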
  • A Whistle-Stop Tour of Statistics
    6 Linear Regression Models. 6.1 Simple Linear Regression. Regression: A frequently applied statistical technique that serves as a basis for studying and characterizing a system of interest, by formulating a reasonable mathematical model of the relationship between a response variable $y$ and a set of $p$ explanatory variables $x_1, x_2, \ldots, x_p$. The choice of an explicit form of the model may be based on previous knowledge of a system or on considerations such as 'smoothness' and continuity of $y$ as a function of the explanatory variables (sometimes called the independent variables, although they are rarely independent; explanatory variables is the preferred term). Simple Linear Regression: a Linear Regression model with a single explanatory variable. The data consist of $n$ pairs of values $(y_1, x_1), (y_2, x_2), \ldots, (y_n, x_n)$. The model for the observed values of the response variable is $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \ldots, n$, where $\beta_0$ and $\beta_1$ are, respectively, the intercept and slope parameters of the model and the $\varepsilon_i$ are error terms assumed to have a $N(0, \sigma^2)$ distribution. The parameters $\beta_0$ and $\beta_1$ are estimated from the sample observations by least squares, i.e., the minimization of $S = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$. The partial derivatives are $\partial S / \partial \beta_0 = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)$ and $\partial S / \partial \beta_1 = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i)$. Setting $\partial S / \partial \beta_0 = \partial S / \partial \beta_1 = 0$ leads to the following estimators of the two model parameters: $\hat{\beta}_1 = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \big/ \sum_{i=1}^{n} (x_i - \bar{x})^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The variance $\sigma^2$ is estimated by $s^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - 2)$. The estimated variance of the estimated slope parameter is $\mathrm{Var}(\hat{\beta}_1) = s^2 \big/ \sum_{i=1}^{n} (x_i - \bar{x})^2$.
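
    These closed-form formulas translate directly into code; a sketch (not from the book; the function and variable names are illustrative):

    ```python
    import numpy as np

    def simple_linear_regression(x, y):
        """Closed-form least-squares estimates for y_i = b0 + b1*x_i + e_i."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        n = len(x)
        x_bar, y_bar = x.mean(), y.mean()
        s_xx = np.sum((x - x_bar) ** 2)
        # Slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
        b1 = np.sum((x - x_bar) * (y - y_bar)) / s_xx
        # Intercept: ybar - b1 * xbar
        b0 = y_bar - b1 * x_bar
        # Residual variance estimate: s^2 = sum((y - yhat)^2) / (n - 2)
        residuals = y - (b0 + b1 * x)
        s2 = np.sum(residuals ** 2) / (n - 2)
        # Estimated variance of the slope: s^2 / sum((x - xbar)^2)
        var_b1 = s2 / s_xx
        return b0, b1, s2, var_b1
    ```

    The square root of var_b1 is the usual standard error of the slope, which is what t-based inference about the slope parameter rests on.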
  • Simple Statistical Tests for Geography
    11 Regression Analysis. 11.1 Simple Linear Regression. Simple Linear Regression is a method that allows a 'best-fit' line to be added to a set of points on an x–y plot or scatter-graph. There are many uses for regression in geography and related disciplines. For example, when you know the value on the horizontal axis it allows you to define the most likely value on the vertical axis. Where one of the axes represents space or time the best-fit or regression line can be used, with care, to make predictions that go beyond the range of the measurements, providing a method of prediction. Relationships defined using regression can also be extended into the past allowing, for example, the reconstruction of past climate and environmental change. When I was a student, simple Linear Regression was a method that was just touched on at the end of a typical geography course on statistical methods, and the complexity of the mathematics made it very difficult to use. With modern computers, however, all of that has changed and performing regression analysis is remarkably simple and requires no mathematics at all. In fact you have already seen it performed in the last chapter, because the 'trendline' that is fitted to an x–y plot or scatter-graph in a spreadsheet is actually a 'regression line'. It appears at the click of a mouse. The ease with which regression can now be conducted is both a blessing and a curse for geography students. It is very easy to do it, but it is also very easy to do it wrong and produce absolute nonsense. If you want to use regression it is really important that you understand how it works (Figure 11.1). Only then will you be able to check that the assumptions have been met and sensibly interpret your results. 11.2 The Straight Line Equation. In the chapter on correlation analysis, x–y plots were used to illustrate the shape of the relationship between two parameters or variables.
  • Statistical Concepts for the Behavioral Sciences
    Chapter 14: Regression Analysis: Predicting Linear Relationships. Chapter outline: Linear Relations; Definition of a Linear Relation; Slope of a Line; The Y-Intercept of a Line; Example Linear Relations and Their Equations; Finding a Linear Regression Line; An Example Problem; A More Realistic Example Problem; The Least-Squares Criterion; The Slope and Y-Intercept of a Least-Squares Regression Line for Predicting Y from X; Fitting a Least-Squares Regression Line to Scores; Error in Prediction; Residuals: A Measure of the Error in Prediction; The Standard Error of Estimate; Calculating $s_{Y \cdot X}$ from r; Information Provided by the Standard Error of Estimate; Predicting X from Y Using a Linear Regression Line; A Further Look at Regression Toward the Mean; Multiple Regression; Summary; Key Terms and Symbols; Review Questions; Integrating Your Knowledge. Google the word prediction and you are likely to have over 40 million hits. Clearly, the concept of predicting events is popular among the public. Indeed, we all make predictions or use those made by others. You may try to predict a course grade from your first exam in a course. How you dress for the day often depends on the predicted weather. Scientists use research hypotheses to predict the effect of an independent variable on a dependent variable. College admission officers want to predict grade point averages from standardized achievement tests. Personnel managers may predict employee performance from employment test scores. Political pundits want to predict the outcomes of elections, and your doctor may predict your risk of developing a certain disease from the results of a medical test. This chapter deals with the basic concepts of predicting one variable from another. We obviously cannot address all the forms of prediction identified above, some of which are quite complex. Many of the methods used for prediction were developed by Sir Francis Galton (1822–1911).
  • An Essential Guide to Business Statistics
    • Dawn A. Willoughby (Author)
    • 2016 (Publication Date)
    • Wiley (Publisher)
    We are specifically interested in the line that will give the best description of the linear relationship between the two variables. Simple Linear Regression involves finding the equation of this line, which will provide the best possible prediction for the dependent variable based on the independent variable; it is known as the regression line or the line of best fit. Finding the Line of Best Fit: For any straight line that we choose to draw on a scatter diagram, there will be differences between each data point and the corresponding position on the straight line. These differences, also known as residuals, can be positive or negative values depending on whether the data point lies above or below the straight line. A graphical representation of this concept is shown below. [Figure: residuals shown as vertical distances from data points to the line on an x–y scatter diagram.] Each residual is the difference between the actual y-value of the data point and the y-value that we would predict if we used the linear equation of this line for the prediction. These differences represent the random variation that occurs in the relationship between an independent and a dependent variable, as we described in the previous section. The total magnitude of the residuals, regardless of whether the residual is positive or negative, is a measure of the effectiveness of the line we have chosen in terms of how well it fits the data points. In finding the line of best fit, our aim is to draw the line which best fits the data points and so minimises these differences. This line will provide us with the best prediction for the dependent variable based on the values of the independent variable. To be able to identify this line, we need to calculate the gradient and y-intercept; this is achieved using the least squares method, which was developed by Adrien-Marie Legendre (1752–1833).
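
    To make the residual idea concrete, here is a small sketch (not from the book; the data values are hypothetical) comparing the total squared residuals of an arbitrary line with those of the least-squares line:

    ```python
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data points
    y = np.array([2.0, 4.1, 5.9, 8.3, 9.8])

    def sum_squared_residuals(gradient, intercept):
        # Residual = actual y-value minus the y-value predicted by the line.
        residuals = y - (gradient * x + intercept)
        return np.sum(residuals ** 2)

    # An arbitrary guessed line versus the least-squares line of best fit.
    sse_guess = sum_squared_residuals(1.5, 1.0)
    gradient, intercept = np.polyfit(x, y, deg=1)
    sse_best = sum_squared_residuals(gradient, intercept)
    print(f"guessed line SSE = {sse_guess:.3f}, best-fit SSE = {sse_best:.3f}")
    ```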
  • Applied Medical Statistics
    • Jingmei Jiang (Author)
    • 2022 (Publication Date)
    • Wiley (Publisher)
    In this chapter, we present analyses to determine the strength of the relationship between two variables. The magnitude of one of the variables (the dependent variable y) is assumed to be determined by a function of the magnitude of another variable (the independent variable x), whereas the reverse is not true. In particular, we will look for straight-line (or linear) changes in y as x changes. The term "dependent" does not necessarily imply a cause-and-effect relationship between the two variables. Such a dependence relationship is called simple Linear Regression, or Linear Regression in short. The term "simple" is used because there is only one independent variable x. Starting from the basic concepts, we will systematically introduce the modeling principles of Linear Regression, statistical inference of parameters, and the application of the regression model. Multiple Linear Regression, which considers two or more independent variables, will be introduced in Chapter 15. For convenience, in this chapter, we use the lower-case letters x and y to denote the independent and dependent variables, where y is still a random variable. 13.1 Concept of Simple Linear Regression. Let us consider the following example: Example 13.1. To determine the relationship between weight and lung function in schoolboys, a doctor measured the weight (kg) and forced vital capacity (FVC, L) of 20 15-year-old schoolboys.
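
    A sketch of how a study like Example 13.1 could be set up in code. The weights and FVC values below are invented placeholders, not the book's measurements:

    ```python
    import numpy as np

    # Invented placeholder data for 20 schoolboys (NOT the book's data).
    rng = np.random.default_rng(0)
    weight = rng.uniform(40, 70, size=20)                 # weight in kg
    fvc = 1.5 + 0.04 * weight + rng.normal(0, 0.2, 20)    # FVC in litres

    # Fit FVC as a linear function of weight: FVC ≈ b0 + b1 * weight.
    b1, b0 = np.polyfit(weight, fvc, deg=1)
    print(f"FVC ≈ {b0:.2f} + {b1:.3f} * weight")
    ```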
  • Applied Regression Analysis and Other Multivariable Methods
    • David Kleinbaum, Lawrence Kupper, Azhar Nizam, Eli Rosenberg (Authors)
    • 2013 (Publication Date)
    5 Straight-line Regression Analysis. 5.1 Preview. The simplest (but by no means trivial) form of the general regression problem deals with one dependent variable Y and one independent variable X. We have previously described the general problem in terms of k independent variables $X_1, X_2, \ldots, X_k$. Let us now restrict our attention to the special case $k = 1$ but denote $X_1$ as X to keep our notation as simple as possible. To clarify the basic concepts and assumptions of regression analysis, we find it useful to begin with a single independent variable. Furthermore, researchers often begin by looking at one independent variable at a time even when several independent variables are eventually jointly considered. 5.2 Regression with a Single Independent Variable. We begin this section by describing the statistical problem of finding the curve (straight line, parabola, etc.) that best fits the data, closely approximating the true (but unknown) relationship between X and Y. 5.2.1 The Problem. Given a sample of n individuals (or other study units, such as animals, plants, geographical locations, time points, or pieces of physical material), we observe for each a value of X and a value of Y. We thus have n pairs of observations that can be denoted by $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, where the subscripts now refer to different individuals rather than different variables. Because these pairs may be considered as points in two-dimensional space, we can plot them on a graph. Such a graph is called a scatter diagram. For example, measurements of age and systolic blood pressure for 30 individuals might yield the scatter diagram given in Figure 5.1.
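
    A minimal sketch of producing such a scatter diagram (synthetic age and blood-pressure values, not the data behind the book's Figure 5.1):

    ```python
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    age = rng.uniform(20, 70, size=30)                  # synthetic ages (X)
    sbp = 100 + 0.9 * age + rng.normal(0, 8, size=30)   # synthetic systolic BP (Y)

    # Plot the n pairs (X_i, Y_i) as points in two-dimensional space.
    plt.scatter(age, sbp)
    plt.xlabel("Age (years)")
    plt.ylabel("Systolic blood pressure (mm Hg)")
    plt.title("Scatter diagram")
    plt.show()
    ```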
  • Foundations of Predictive Analytics
    Chapter 4: Linear Modeling and Regression. The most common data modeling methods are regressions, both linear and logistic. It is likely that 90% or more of real-world applications of data mining end up with a relatively simple regression as the final model, typically after very careful data preparation, encoding, and creation of variables. There are many kinds of regression (linear, logistic, and nonlinear), each with strengths and weaknesses. Many regressions are purely linear, some only slightly nonlinear, and others completely nonlinear. Most multivariate regressions consider each independent variable separately and do not allow for nonlinear interaction among independent variables. Treatment of nonlinearities and interactions can be done through careful encoding of independent variables, such as binning or univariate or multivariate mapping to nonlinear functions. Once this mapping has been done, one can then do a Linear Regression using these new functions as independent variables (see the sketch below). We can state our problem as that of finding a best-fitting function for a set of data. We can think of this as fitting a modeling function that contains a number of free, adjustable parameters so that we get a "best fit." We do this by building an objective function and then deriving a mathematical procedure to set or adjust these free parameters so that our fitting function is this "best fit." This fitting function in a modeling exercise is usually a functional relationship between a large set of independent variables and a single dependent variable. We typically have an overdetermined problem because we are given a large set of data that consists of many example records, each record consisting of a vector of independent variables followed by a single dependent variable.
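
    A sketch of that encoding idea, assuming a simple polynomial mapping of one variable and NumPy's least-squares solver (illustrative, not the book's code):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(-2.0, 2.0, 50)
    y = np.sin(x) + rng.normal(0, 0.1, size=50)   # synthetic nonlinear target

    # Encode the independent variable through nonlinear functions (here x^2, x^3),
    # then fit a model that is linear in its free, adjustable parameters.
    X = np.column_stack([np.ones_like(x), x, x**2, x**3])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    print("fitted coefficients:", np.round(coef, 3))
    ```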
  • Business Analytics
    • Jeffrey Camm, James Cochran, Michael Fry, Jeffrey Ohlmann (Authors)
    • 2018 (Publication Date)
    Independent variable(s): The variable(s) used for predicting or explaining values of the dependent variable. It is denoted by x and is often referred to as the predictor variable.
    Interaction: Regression modeling technique used when the relationship between the dependent variable and one independent variable is different at different values of a second independent variable.
    Interval estimation: The use of sample data to calculate a range of values that is believed to include the unknown value of a population parameter.
    k-fold cross-validation: Method of cross-validation in which the sample data set is randomly divided into k equal-sized, mutually exclusive and collectively exhaustive subsets. In each of k iterations, one of the k subsets is used to evaluate a candidate model that was constructed on the data from the other k - 1 subsets.
    Knot: The prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise Linear Regression model; also called the breakpoint or the joint.
    Least squares method: A procedure for using sample data to find the estimated regression equation.
    Leave-one-out cross-validation: Method of cross-validation in which candidate models are repeatedly fit using n - 1 observations and evaluated with the remaining observation.
    Linear Regression: Regression analysis in which relationships between the independent variables and the dependent variable are approximated by a straight line.
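
    A sketch of the k-fold cross-validation procedure defined above, applied to a straight-line model on synthetic data (plain NumPy; no particular library assumed):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(0, 10, size=40)
    y = 2.0 + 0.5 * x + rng.normal(0, 1, size=40)   # synthetic sample

    k = 5
    folds = np.array_split(rng.permutation(len(x)), k)  # k mutually exclusive subsets

    mse_scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Construct the candidate model on the other k - 1 subsets...
        b1, b0 = np.polyfit(x[train], y[train], deg=1)
        # ...and evaluate it on the held-out subset.
        pred = b0 + b1 * x[test]
        mse_scores.append(np.mean((y[test] - pred) ** 2))

    print(f"mean cross-validated MSE over {k} folds: {np.mean(mse_scores):.3f}")
    ```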
  • An Introduction to Statistical Inference and Its Applications with R
    • Michael W. Trosset (Author)
    • 2009 (Publication Date)
    • CRC Press (Publisher)
    Chapter 15: Simple Linear Regression. This chapter continues our study of the relationship between two random variables, X and Y. In Chapter 14, we quantified association by measures of correlation, e.g., Pearson's product-moment correlation coefficient. Another way to quantify the association between X and Y is to quantify the extent to which knowledge of X allows one to predict values of Y. Notice that this approach to association is asymmetric: one variable (conventionally denoted X) is the predictor variable and the other variable (conventionally denoted Y) is the response variable.¹ Given a value, $x \in \mathbb{R}$, of the predictor random variable, we restrict attention to the experimental outcomes that can possibly result in this value: $S(x) = X^{-1}(x) = \{ s \in S : X(s) = x \}$. Restricting Y to $S(x)$ results in a conditional random variable, $Y \mid X = x$. We write the expected value of this random variable as $\mu(x) = E(Y \mid X = x)$ and note that $\mu(\cdot)$ varies with x. Thus, $\mu(\cdot)$ is a function that assigns mean values of various conditional random variables to values of x. This function is the conditional mean function, also called the prediction function or the regression function. Given $X = x$, the predicted value of Y is $\hat{y}(x) = \mu(x)$. The nature of the regression function reveals much about the relation between X and Y. For example, if larger values of x result in larger values of $\mu(x)$, then there is some kind of positive association between X and Y. Whether or not this kind of association has anything to do with the correlation of X and Y remains to be seen. [Footnote 1: The predictor variable is often called the independent variable and the response variable is often called the dependent variable. We will eschew this terminology, as it has nothing to do with the probabilistic (in)dependence of events and random variables.]
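
    The conditional mean function $\mu(x) = E(Y \mid X = x)$ can be estimated crudely from data by averaging Y within narrow bins of X. A sketch of that idea on synthetic data (the binning approach is an illustration, not the book's method):

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.uniform(0, 10, size=2000)
    Y = 1.0 + 0.8 * X + rng.normal(0, 1, size=2000)   # synthetic (X, Y) pairs

    # Approximate mu(x) = E(Y | X = x) by the mean of Y over narrow bins of X.
    edges = np.linspace(0, 10, 11)
    bin_index = np.digitize(X, edges) - 1
    for b in range(10):
        mu_hat = Y[bin_index == b].mean()
        print(f"E(Y | X in [{edges[b]:.0f}, {edges[b+1]:.0f})) ≈ {mu_hat:.2f}")
    ```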
  • Elementary Statistics for Business and Economics
    • Carl-Louis Sandblom (Author)
    • 2019 (Publication Date)
    • De Gruyter (Publisher)
    For example, let: x = total daily volume of ice-cream sold in the Montreal region, y = total daily volume of soft drink sold in the Montreal region. Then x and y are probably not directly causally related to each other, but rather both dependent on z = daily highest afternoon temperature in the Montreal region. Remark 5: It may be that our Linear Regression analysis will tell us that our data do not firmly support the linear relation $y = \alpha + \beta x$. Still, x and y could be related, but the relation may be nonlinear (e.g. $y = \alpha + \beta / x$ or $y = \alpha e^{\beta x}$, etc.). Remark 6: It may be that our data for y and x give a good fit to a straight line, say for $15 \le x \le 35$. [Fig. 12/3: Good fit to a straight line for $15 \le x \le 35$.] Having found a regression line $\hat{y} = \hat{\alpha} + \hat{\beta} x$, we can then, for given x, use the line to predict y. But this can of course only be done for $15 \le x \le 35$ and for circumstances similar to those for which the data were obtained. The true hidden relation may significantly deviate from our regression line outside the interval $15 \le x \le 35$. [Fig. 12/4: Poor fit to a straight line outside $15 \le x \le 35$.] 12.2 The Method of Least Squares. Consider the following problem. Given n pairs of observations of the variables x and y: $(x_i, y_i)$, $i = 1, \ldots, n$, how can we find a best regression line $\hat{y} = \hat{\alpha} + \hat{\beta} x$? More specifically, how do we find $\hat{\alpha}, \hat{\beta}$ such that $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is minimal, where $\hat{y}_i$ is the fitted y-value, i.e. $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$? We refer to $\sum (y_i - \hat{y}_i)^2$ as the sum of squared errors, SSE, because $y_i - \hat{y}_i$ is the error we make if the observed value $y_i$ is replaced by the fitted value $\hat{y}_i$. This error can be interpreted as the vertical distance from the point $(x_i, y_i)$ to the fitted line $\hat{y} = \hat{\alpha} + \hat{\beta} x$. [Fig. 12/5: Observed points and fitted line.] We have: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2$.
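
    The caution in Remark 6 (predict only within the range of x used to fit the line) is easy to encode; a sketch with a hypothetical helper, not from the book:

    ```python
    import numpy as np

    def predict_within_range(x_new, x_fit, y_fit):
        """Predict y from a fitted straight line, refusing to extrapolate
        outside the interval of x-values used for fitting (cf. Remark 6)."""
        lo, hi = float(np.min(x_fit)), float(np.max(x_fit))
        if not (lo <= x_new <= hi):
            raise ValueError(f"x = {x_new} lies outside the fitted range [{lo}, {hi}]")
        beta, alpha = np.polyfit(x_fit, y_fit, deg=1)   # least-squares fit
        return alpha + beta * x_new
    ```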
Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.