Technology & Engineering
Correlation and Regression
Correlation and regression are statistical techniques used to measure the relationship between two or more variables. Correlation assesses the strength and direction of the relationship, while regression helps to predict the value of one variable based on the value of another. These methods are commonly used in analyzing data and making predictions in various fields, including technology and engineering.
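As a concrete illustration of the two ideas, here is a minimal sketch in Python: a Pearson correlation coefficient to measure strength and direction, and a least-squares line to predict one variable from another. The data and function names are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: strength and direction of a linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Perfectly linear data: r = 1, and the fitted line recovers y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
```

With real, noisy data r falls strictly between -1 and +1, and the fitted line only approximates the points.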
Written by Perlego with AI-assistance
12 Key excerpts on "Correlation and Regression"
- eBook - PDF
- Evert W. Johnson(Author)
- 2000(Publication Date)
- CRC Press(Publisher)
15 Regression and Correlation

15.1 INTRODUCTION

Regression and correlation are concepts that are associated with the relationships between two or more random variables. Both were discussed briefly in Chapter 9, correlation in Section 9.8 and regression in Section 9.10. In this chapter the coverage will be expanded and made more complete. The subject matter associated with regression and correlation is, however, far too extensive for complete coverage here. For more information, reference is made to such standard texts as Draper and Smith (1981).

Regression is concerned with the development of graphs, or mathematical expressions corresponding to graphs, showing the relationship between a variable of interest and one or more other variables. The utility of such graphs or mathematical expressions arises from the fact that they make it possible to

1. Visualize the relationship (e.g., between stand volume and stand age, stand density, and site index),
2. Make indirect estimates of variables of interest which are difficult to measure in a direct manner (e.g., merchantable volume of a tree using dbh and merchantable or total tree height), and
3. Make predictions regarding future developments (e.g., future dbh based on past dbh as revealed by increment borer cores).

As was brought out in Section 9.8, correlation is concerned with the strength of the relationship between the variables. A relationship between two or more variables may exist and take the form indicated by a regression analysis, but if it is to be used to make indirect estimates or predictions, it must be strong enough so that the estimates or predictions are meaningful. Therefore, a measure of the degree of correlation existing between the variables is usually included in a statement based on the regression. - eBook - ePub
- K. Nirmal Ravi Kumar(Author)
- 2020(Publication Date)
- CRC Press(Publisher)
Regression measures the extent of change of the dependent variable due to a unit change in the independent variable. That is, the purpose of regression is to explain the variation in a variable (that is, how a variable differs from its mean value) using the variation in one or more other variables.

- Correlation studies the mutual dependence of variables, i.e., it is a two-way relation. Regression measures the functional relationship and analyses the extent of dependency of the dependent variable on the independent variable, i.e., it is a one-way relationship.
- Correlation is independent of both change in origin and change in scale of observations. Regression is independent of change in origin, but not of change in scale.
- The correlation coefficient is a relative measure of the linear relationship between the variables under consideration and is independent of units of measurement. Regression is an absolute measure showing the change in the value of Y due to X, or the change in the value of X due to Y.
- In correlation, the variables are studied without any differentiation as dependent and independent variables. In regression, the variables are differentiated as dependent and independent variables.
- Correlation is symmetrical, i.e., rxy = ryx, and the value of r remains the same. Regression is asymmetrical in its relationship, i.e., byx ≠ bxy.
- The correlation coefficient ranges from -1 to +1. The regression coefficient ranges from -∞ to +∞.
- Correlation does not study the cause-and-effect relationship between the variables. Regression studies the cause (independent variable) and effect (dependent variable) relationship between the variables.
- The correlation coefficient is independent of units. The regression coefficient is in the units of the dependent variable.
- Correlation has limited scope of application: it is limited to the linear relationship between the variables. Regression has greater scope: it can be employed to study both linear and nonlinear relations between the variables. - eBook - PDF
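The contrasts above can be checked numerically. This sketch, on small hypothetical data, verifies that the correlation coefficient is symmetric and unaffected by a change of scale, while the regression coefficient is asymmetric and changes with the scale of the observations:

```python
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 2.0, 2.5, 4.0, 4.5]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Population covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def corr(u, v):
    """Correlation coefficient: symmetric, unit-free, in [-1, 1]."""
    return cov(u, v) / (cov(u, u) ** 0.5 * cov(v, v) ** 0.5)

def slope(dep, ind):
    """Regression coefficient of dep on ind: in the units of dep per unit of ind."""
    return cov(ind, dep) / cov(ind, ind)

r_xy, r_yx = corr(x, y), corr(y, x)    # symmetric: r_xy == r_yx
b_yx, b_xy = slope(y, x), slope(x, y)  # asymmetric: b_yx != b_xy in general

# Change of scale: doubling x leaves r unchanged but halves b_yx.
x2 = [2 * v for v in x]
```

A related identity worth noticing: the product byx * bxy equals r squared, which is why both regression coefficients must carry the same sign as r.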
- Mustapha Akinkunmi(Author)
- 2022(Publication Date)
- Springer(Publisher)
CHAPTER 11 Regression Analysis and Correlation

This chapter will focus on how two or more variables are interrelated. We shall understand the two concepts in statistics—regression analysis and correlation analysis—and the relationship between the concepts. We shall discuss what to look for in the output of regression analysis and how the output can be interpreted. We shall elucidate how to use regression analysis for forecasting.

11.1 CONSTRUCTION OF LINE FIT PLOTS

In showing the relationship between two variables, we can draw a line across the variables after plotting the scatter plot and ensuring the line passes through as many points as possible. The straight line that gives the best approximation in a given set of data is referred to as the line of best fit. The least squares method is the most accurate method of finding the line of best fit of a given dataset. For example, Fig. 11.1 shows sales revenue ($'million) and amount spent on advertisement ($'million) of a production company:

Sales revenue ($'million): 115, 118, 120, 125, 126, 128, 131, 132
Advertisement expenses ($'million): 4, 7, 9, 14, 15, 17, 20, 21

Figure 11.1: Chart of sales revenue on advertisement (fitted line: Sales = Advert + 111).

11.2 TYPES OF REGRESSION ANALYSIS

There are many regression analyses which are based on different assumptions. We mention a few, and we will dwell on the first and second mentioned below:

1. simple linear regression;
2. multiple regression;
3. ridge regression;
4. quantile regression; and
5. Bayesian regression.

11.2.1 USES OF REGRESSION ANALYSIS

Regression analysis can be used for the following.

1. Causal analysis—to establish the relationship between two or more variables.
2. Forecasting an effect—it is used to predict a response variable fully knowing the independent variables. - eBook - PDF
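Assuming the tabulated figures above, the least squares line can be computed directly. This sketch applies the standard least squares formulas and reproduces the fitted line Sales = Advert + 111 from Fig. 11.1:

```python
# Advertisement expenses and sales revenue ($'million), as tabulated above.
advert = [4, 7, 9, 14, 15, 17, 20, 21]
sales = [115, 118, 120, 125, 126, 128, 131, 132]

n = len(advert)
mx = sum(advert) / n
my = sum(sales) / n

# Least-squares estimates: b = Sxy / Sxx, a = y-bar - b * x-bar.
sxy = sum((x - mx) * (y - my) for x, y in zip(advert, sales))
sxx = sum((x - mx) ** 2 for x in advert)
b = sxy / sxx
a = my - b * mx  # recovers the line Sales = Advert + 111
```

For this particular dataset the fit happens to be exact (every point lies on the line), which is why the slope and intercept come out as clean round numbers.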
- John T. Burr(Author)
- 2004(Publication Date)
- CRC Press(Publisher)
13 Studying Relationships Between Variables by Linear Correlation and Regression

We now take up a useful method of analyzing the relationship between two variables. We make simultaneous observations on the two variables x and y, say, x1 and y1, then later on another pair x2 and y2, and so on to xn and yn. Each such pair is at the same time or place, or on the same material. Then we seek to study the relationship between the two variables. For example, can we estimate or predict y from x? Are they closely related, loosely related, or unrelated? One very simple way to gain some insight into the relation is to make a "scatter diagram," that is, to plot each pair of x and y on a graph. Thus with a horizontal x-axis and a vertical y-axis, we first plot y1 against x1, then y2 against x2, and so on. Then a relationship may or may not emerge. Sometimes such a scatter diagram is all that we need in a study.

13.1. TWO GENERAL PROBLEMS

In the first problem, we are especially interested in the estimation or prediction of y from x. For example, we may be using a standard analytical technique and wish to see whether we can accurately estimate its result from that of an alternative less expensive or less time-consuming analysis. Or, we may compare two gauges, or wish to see whether two physical properties such as hardness and tensile strength are closely enough related to predict the latter from the former. Such problems frequently arise in industry. Another general problem is to study a collection of "input" variables to see which is most closely related to an "output" or quality variable. That variable or those variables most closely related are the ones to work on in trying to improve the process and obtain better quality. The degree or strength of relationships is thus the key to the study. - eBook - PDF
- Stewart Anderson(Author)
- 2011(Publication Date)
- Chapman and Hall/CRC(Publisher)
CHAPTER 4 Correlation and Regression

4.1 Introduction

In many medical, biological, engineering, economic and other scientific applications, one wishes to establish a linear relationship between two or more variables. If there are only two variables, X and Y, then there are two ways a linear relationship can be characterized: (1) using the correlation coefficient; and (2) using linear regression. One would typically use a correlation coefficient to quantify the strength and direction of the linear association. If neither variable is used to predict the other, then both X and Y are assumed to be random variables, which makes inference more complicated. Linear regression is useful for answering the question: Given a value of X, what is the predicted value of Y? For answering this type of question, values of X are assumed to be fixed (chosen) while values of Y are assumed to be random. In this chapter, we first review the Pearson correlation coefficient and then tackle simple, multiple, and polynomial regression models. Our primary approach for the presentation of the regression models is to use the general linear model involving matrices. We provide a short appendix at the end of the chapter to review matrix algebra. Our strategy for the presentation of regression in this chapter allows us to use the same approach for the different types of regression models. Also included in this chapter are strategies for visualizing regression data and building and assessing regression models. In the final section, we introduce two smoothing techniques, namely, the loess smoother and smoothing polynomial splines. To facilitate the discussion of the techniques covered in this chapter, we provide numerical examples with hand calculations to demonstrate how to fit simple models and also provide programs in SAS and R to demonstrate the implementation of calculations in more complex models. - Stephen Gorard(Author)
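The matrix formulation of the general linear model mentioned above can be sketched briefly. The data here are hypothetical, and NumPy is used in place of the excerpt's SAS and R programs; the estimate solves the normal equations X'X beta = X'y:

```python
import numpy as np

# Hypothetical paired observations, roughly following y = 2x with noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix for the general linear model y = X beta + error:
# a column of ones (intercept) plus the predictor column.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: beta-hat solves (X'X) beta = X'y.
beta = np.linalg.solve(X.T @ X, X.T @ y)
intercept, slope = beta
```

The appeal of this formulation, as the excerpt notes, is uniformity: multiple or polynomial regression needs no new machinery, only extra columns in X (e.g., a second predictor, or x**2 for a quadratic term).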
- 2003(Publication Date)
- Continuum(Publisher)
10 Progress via regression: introducing correlations

This chapter introduces the idea of a correlation, in which two or more variables tend to change values in step with each other. Using this kind of relationship it is possible to predict (or explain) the value of one variable from the value of another. This approach is known as regression, and it forms the basis for several more advanced statistical techniques, some of which are discussed here. An understanding of correlation is therefore a useful door into the fascinating world of statistical modelling of social events.

INTRODUCING CORRELATIONS

The relationship between two variables known as a correlation is perhaps easiest understood graphically. Figure 10.1 shows the percentage of the 15-year-old cohort of students in each local education authority in England obtaining five or more GCSE qualifications at grade C or above (the government GCSE benchmark). These scores are plotted against the proportion of children in each area eligible for free school meals (thus coming from families officially defined as in poverty). The two sets of scores are clearly related, such that areas with high poverty (x-axis) have lower GCSE results overall (y-axis), and areas with less poverty generally have higher GCSE results. This kind of relationship is called a correlation - in this example a negative correlation, since the two values are negatively related (i.e. as one increases the other tends to decrease). Key assumptions underlying this relationship are that the two variables must be real numbers (see Chapter Three), and they must cross-plot to form an approximately straight line. How close to a straight line this has to be is a matter of judgement, and it is always possible to transform one or more of the scores to try and make the linear fit better.

Figure 10.1: Scatterplot for each Local Authority: GCSE benchmark 1998 against percentage of children eligible for free school meals. - eBook - PDF
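A negative correlation of the kind described can be computed directly. The figures below are invented for illustration (not the real local-authority data): pass rates fall as the free-school-meals percentage rises, so r comes out strongly negative:

```python
# Hypothetical area-level data: % of children on free school meals (poverty
# proxy) vs. % reaching the GCSE benchmark. Invented for illustration only.
fsm = [5, 10, 15, 20, 25, 30, 35]
gcse = [62, 58, 55, 49, 47, 42, 38]

n = len(fsm)
mx, my = sum(fsm) / n, sum(gcse) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(fsm, gcse))
sxx = sum((x - mx) ** 2 for x in fsm)
syy = sum((y - my) ** 2 for y in gcse)
r = sxy / (sxx * syy) ** 0.5  # negative: as one variable rises, the other falls
```

The sign of r captures the direction of the relationship; its magnitude (here close to 1) reflects how nearly the cross-plot forms a straight line.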
- Barbara Illowsky, Susan Dean(Authors)
- 2016(Publication Date)
- Openstax(Publisher)
12 | LINEAR REGRESSION AND CORRELATION

Figure 12.1 Linear regression and correlation can help you determine if an auto mechanic's salary is related to his work experience. (credit: Joshua Rothhaas)

Introduction

Chapter Objectives

By the end of this chapter, the student should be able to:

• Discuss basic ideas of linear regression and correlation.
• Create and interpret a line of best fit.
• Calculate and interpret the correlation coefficient.
• Calculate and interpret outliers.

Professionals often want to know how two or more numeric variables are related. For example, is there a relationship between the grade on the second math exam a student takes and the grade on the final exam? If there is a relationship, what is the relationship and how strong is it? In another example, your income may be determined by your education, your profession, your years of experience, and your ability. The amount you pay a repair person for labor is often determined by an initial amount plus an hourly fee. The type of data described in the examples is bivariate data — "bi" for two variables. In reality, statisticians use multivariate data, meaning many variables. In this chapter, you will be studying the simplest form of regression, "linear regression" with one independent variable (x). This involves data that fits a line in two dimensions. You will also study correlation, which measures how strong the relationship is.

12.1 | Linear Equations

Linear regression for two variables is based on a linear equation with one independent variable. The equation has the form: y = a + bx, where a and b are constant numbers. The variable x is the independent variable, and y is the dependent variable. Typically, you choose a value to substitute for the independent variable and then solve for the dependent variable.

Example 12.1 The following examples are linear equations. - eBook - PDF
- Carl McDaniel, Jr., Roger Gates(Authors)
- 2020(Publication Date)
- Wiley(Publisher)
CHAPTER 17 Bivariate Correlation and Regression

As with the items in the previous chapter, those discussed in this chapter—bivariate correlation and regression—have been around for a long time. And they have survived the test of time and are still widely used today as workhorses of our analytical arsenal. They are even used today in their bivariate and multivariate (Chapter 18) forms with Big Data. Bivariate correlation looks at the relationship between the movement of one variable and another variable, while bivariate regression looks at the nature of that relationship and the relative impact of an independent variable on the dependent variable.

Bivariate Analysis of Association

In many marketing research studies, the interests of the researcher and manager go beyond issues that can be addressed by the statistical testing of differences discussed in Chapter 16. They may be interested in the degree of association between two variables. Statistical techniques appropriate for this type of analysis are referred to as bivariate techniques. When more than two variables are involved, the techniques employed are known as multivariate techniques. Multivariate techniques are discussed in Chapter 18. When the degree of association between two variables is analyzed, the variables are classified as the independent (predictor) variable and the dependent (criterion) variable. Independent variables are those that are believed to affect the value of the dependent variable. Independent variables such as price, advertising expenditures, or number of retail outlets may, for example, be used to predict and explain sales or market share of a brand—the dependent variable. Bivariate analysis can help provide answers to questions such as the

bivariate techniques: Statistical methods of analyzing the relationship between two variables.
independent variable: Variable believed to affect the value of the dependent variable. - eBook - PDF
Biostatistics
Basic Concepts and Methodology for the Health Sciences, 10th Edition International Student Version
- Wayne W. Daniel, Chad L. Cross(Authors)
- 2014(Publication Date)
- Wiley(Publisher)
Their inappropriate use, however, can lead only to meaningless results. To aid in the proper use of these techniques, we make the following suggestions:

1. The assumptions underlying regression and correlation analysis should be reviewed carefully before the data are collected. Although it is rare to find that assumptions are met to perfection, practitioners should have some idea about the magnitude of the gap that exists between the data to be analyzed and the assumptions of the proposed model, so that they may decide whether they should choose another model; proceed with the analysis, but use caution in the interpretation of the results; or use the chosen model with confidence.

2. In simple linear regression and correlation analysis, the two variables of interest are measured on the same entity, called the unit of association. If we are interested in the relationship between height and weight, for example, these two measurements are taken on the same individual. It usually does not make sense to speak of the correlation, say, between the heights of one group of individuals and the weights of another group.

3. No matter how strong the indication of a relationship between two variables, it should not be interpreted as one of cause and effect. If, for example, a significant sample correlation coefficient between two variables X and Y is observed, it can mean one of several things: (a) X causes Y. (b) Y causes X. (c) Some third factor, either directly or indirectly, causes both X and Y. (d) An unlikely event has occurred and a large sample correlation coefficient has been generated by chance from a population in which X and Y are, in fact, not correlated. (e) The correlation is purely nonsensical, a situation that may arise when measurements of X and Y are not taken on a common unit of association.

4. The sample regression equation should not be used to predict or estimate outside the range of values of the independent variable represented in the sample.
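Suggestion 4 (no extrapolation beyond the sampled range) can be enforced mechanically. A minimal sketch with illustrative data; the helper name `make_predictor` is hypothetical:

```python
def make_predictor(x, y):
    """Fit a least-squares line and return a predictor restricted to the sampled x-range."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    lo, hi = min(x), max(x)

    def predict(x_new):
        # Refuse to extrapolate: the model is only supported inside [lo, hi].
        if not (lo <= x_new <= hi):
            raise ValueError(f"x={x_new} is outside the sampled range [{lo}, {hi}]")
        return a + b * x_new

    return predict

# Sampled x-values run from 1 to 5, so predictions are allowed only there.
predict = make_predictor([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Raising an error is one design choice; another is to return the prediction together with a warning flag, leaving the judgement (and the caution of suggestion 1) to the analyst.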
- Anthony Hayter(Author)
- 2012(Publication Date)
- Cengage Learning EMEA(Publisher)
CHAPTER TWELVE Simple Linear Regression and Correlation

Historically, the ideas of linear regression and correlation presented in this chapter have played a prominent part in statistical data analysis. When dealing with more than one variable, an experimenter is often interested in how a particular variable depends on one or more of the other variables. When the variables have a random component there will not be a deterministic relationship between them, but there may be an underlying structure that the experimenter can investigate. This modeling is often performed by finding a functional relationship between the expected value of a dependent variable and a set of explanatory or independent variables. Linear regression is a modeling technique in which the expected value of a dependent variable is modeled as a linear combination of a set of explanatory variables. These linear models are easy to analyze and are applicable in many situations. Simple linear regression, discussed in this chapter, refers to a linear regression model with only one explanatory variable, while multiple linear regression, discussed in Chapter 13, refers to a linear regression model with two or more explanatory variables. A simple linear regression model is closely related to the calculation of a correlation coefficient to measure the degree of association between two variables, which is discussed at the end of this chapter.

12.1 The Simple Linear Regression Model

12.1.1 Model Definition and Assumptions

Consider a data set consisting of the paired observations (x1, y1), ..., (xn, yn). For example, xi could represent the height and yi could represent the weight of the ith person in a random sample of n adult males. Statistical modeling techniques can be used to investigate how the two variables, corresponding to the data values xi and yi, are related.
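The model just described can be seen at work by simulation: generate paired observations from a line plus random noise with known parameters, then recover those parameters by least squares. A sketch with arbitrary parameter values:

```python
import random

random.seed(0)  # reproducible simulation

# Simulate y_i = beta0 + beta1 * x_i + e_i with e_i ~ N(0, sigma^2).
beta0, beta1, sigma = 10.0, 2.5, 1.0
x = [i / 10 for i in range(1, 201)]  # fixed x-values, as the model assumes
y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]

# Least-squares recovery of the slope and intercept.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx  # estimates should land near the true (10.0, 2.5)
```

Because the noise has mean zero, the estimates converge on the true parameters as n grows; rerunning with a different seed gives slightly different, but still close, values.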
- Bilal M. Ayyub, Richard H. McCuen(Authors)
- 2016(Publication Date)
- CRC Press(Publisher)
Thus, before making a decision based on the analysis of observations on a random variable alone, it is wise to consider the possibility of systematic associations between the random variable of interest and other variables that have a causal relationship with it. These associations among variables can be understood using correlation and regression analyses.

To summarize these concepts, the variation in the value of a variable may be either random or systematic in its relationship with another variable. Random variation represents uncertainty. If the variation is systematically associated with one or more other variables, the uncertainty in estimating the value of a variable can be reduced by identifying the underlying relationship. Correlation and regression analyses are important statistical methods to achieve these objectives. They should be preceded by a graphical analysis to determine (1) if the relationship is linear or nonlinear, (2) if the relationship is direct or indirect, and (3) if any extreme events might control the relationship.

12.2 CORRELATION ANALYSIS

Correlation analysis provides a means of drawing inferences about the strength of the relationship between two or more variables. That is, it is a measure of the degree to which the values of these variables vary in a systematic manner. Thus, it provides a quantitative index of the degree to which one or more variables can be used to predict the values of another variable. It is just as important to understand what correlation analysis cannot do as it is to know what it can do. Correlation analysis does not provide an equation for predicting the value of a variable. - No longer available |Learn more
- Jessica Utts, Robert Heckard(Authors)
- 2015(Publication Date)
- Cengage Learning EMEA(Publisher)
Imagine, for example, that we are examining the weights and heights of a sample of college women. We might want to know what the increase in average weight is for each 1-inch increase in height. Or we might want to estimate the average weight for women with a specific height, such as 5'10". Regression analysis is the area of statistics that is used to examine the relationship between a quantitative response variable and one or more explanatory variables. A key element of regression analysis is the estimation of a regression equation that describes how, on average, the response variable is related to the explanatory variables. This regression equation can be used to answer the types of questions that we just asked about the weights and heights of college women. There are many types of relationships and many types of regression equations. The simplest kind of relationship between two variables is a straight line, and that is the only type we will discuss here. Straight-line relationships, also called linear relationships, occur frequently in practice, so a straight line is a useful and important type of regression equation. Before we use a straight-line regression model, we should always examine a scatterplot to verify that the pattern actually is linear. We remind you of the music preference and age example, in which a straight line definitely does not describe the pattern of the data. The straight line that best describes the linear relationship between two quantitative variables is called the regression line. Let's review the equation for a straight line relating y and x.
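The two height/weight questions above map directly onto the slope and fitted values of a regression line. The numbers below are invented for illustration (not real survey data):

```python
# Illustrative heights (inches) and weights (pounds) for a sample of
# college women. Hypothetical values, invented for this sketch.
height = [61, 62, 64, 65, 66, 68, 70]
weight = [118, 121, 128, 131, 136, 142, 149]

n = len(height)
mx, my = sum(height) / n, sum(weight) / n
b = sum((h - mx) * (w - my) for h, w in zip(height, weight)) / sum((h - mx) ** 2 for h in height)
a = my - b * mx

# b answers the first question: average weight increase per 1-inch increase
# in height. Evaluating the line at 70 inches answers the second: the
# estimated average weight at 5'10".
pred_510 = a + b * 70
```

As the excerpt stresses, this is only trustworthy after a scatterplot confirms the pattern is roughly linear; for curved patterns like the music preference and age example, the line and both answers would be misleading.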
Index pages curate the most relevant extracts from our library of academic textbooks. They've been created using an in-house natural language model (NLM), with each extract adding context and meaning to key research topics.