Technology & Engineering

Transform Variables in Regression

Transforming variables in regression refers to the process of modifying the independent or dependent variables to better meet the assumptions of linear regression. This can involve operations such as taking the logarithm, square root, or reciprocal of the variables. Transforming variables can help improve the linearity, normality, and homoscedasticity of the regression model, leading to more accurate and reliable results.

Written by Perlego with AI-assistance

6 Key excerpts on "Transform Variables in Regression"

  • Book cover image for: Regression Analysis By Example Using R
    • Ali S. Hadi, Samprit Chatterjee (Authors)
    • 2023 (Publication Date)
    • Wiley (Publisher)
    does not enter the model linearly. To satisfy the assumptions of the standard regression model, instead of working with the original variables, we sometimes work with transformed variables. Transformations may be necessary for several reasons.
    1. Theoretical considerations may specify that the relationship between two variables is nonlinear. An appropriate transformation of the variables can make the relationship between the transformed variables linear. Consider an example from learning theory (experimental psychology). A widely used learning model states that the time T_i taken to perform a task on the i-th occasion is
      T_i = α·β^i    (7.1)
      The relationship between T_i and i as given in (7.1) is nonlinear, and we cannot directly apply techniques of linear regression. On the other hand, if we take logarithms of both sides, we get
      log T_i = log α + i·log β    (7.2)
      showing that log T_i and i are linearly related. The transformation enables us to use standard regression methods. Although the relationship between the original variables was nonlinear, the relationship between the transformed variables is linear. A transformation is used to achieve linearity of the fitted model.
    2. The response variable Y, which is analyzed, may have a probability distribution whose variance is related to the mean. If the mean is related to the value of the predictor variable X, then the variance of Y will change with X and will not be constant. The distribution of Y will usually also be non-normal under these conditions. Non-normality invalidates the standard tests of significance (although not in a major way with large samples) since they are based on the normality assumption. The unequal variance of the error terms will produce estimates that are unbiased, but are no longer best in the sense of having the smallest variance. In these situations we often transform the data so as to ensure normality and constancy of the error variance. In practice, the transformations are chosen to ensure the constancy of variance (variance-stabilizing transformations).
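The learning-model idea above can be sketched numerically. The following is an illustrative simulation, not from the book: the parameter values, noise level, and sample size are invented. It generates data from T_i = a·b^i with multiplicative noise (so the log-scale errors are additive), then recovers a and b by ordinary least squares on the log scale:

```python
# Sketch: linearizing the learning model T_i = a * b**i by regressing
# log(T_i) on i. The parameters a, b and noise level are hypothetical.
import math
import random

random.seed(0)
a, b = 10.0, 0.8                      # hypothetical learning-model parameters
n = 30
occasions = list(range(1, n + 1))
# Multiplicative noise keeps T_i positive, so log(T_i) has additive error.
T = [a * b**i * math.exp(random.gauss(0, 0.05)) for i in occasions]

x = occasions
y = [math.log(t) for t in T]          # transformed response: log T_i
xbar = sum(x) / n
ybar = sum(y) / n
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
intercept = ybar - slope * xbar

a_hat = math.exp(intercept)           # since intercept estimates log(a)
b_hat = math.exp(slope)               # since slope estimates log(b)
print(a_hat, b_hat)
```

The fitted intercept and slope estimate log a and log b, so exponentiating maps the estimates back to the original parameters.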
  • Book cover image for: Regression Analysis by Example
    • Samprit Chatterjee, Ali S. Hadi (Authors)
    • 2015 (Publication Date)
    • Wiley (Publisher)
    CHAPTER 6 TRANSFORMATION OF VARIABLES
    6.1 INTRODUCTION
    Data do not always come in a form that is immediately suitable for analysis. We often have to transform the variables before carrying out the analysis. Transformations are applied to accomplish certain objectives such as to ensure linearity, to achieve normality, or to stabilize the variance. It often becomes necessary to fit a linear regression model to the transformed rather than the original variables. This is common practice. In this chapter, we discuss the situations where it is necessary to transform the data, the possible choices of transformation, and the analysis of transformed data.
    We illustrate transformation mainly using simple regression. In multiple regression, where there are several predictors, some may require transformation and others may not. Although the same technique can be applied to multiple regression, transformation in multiple regression requires more effort and care.
    The necessity for transforming the data arises because the original variables, or the model in terms of the original variables, violate one or more of the standard regression assumptions. Two of the most commonly violated assumptions are the linearity of the model and the constancy of the error variance. As mentioned in Chapters 2 and 3, a regression model is linear when the parameters present in the model occur linearly, even if the predictor variables occur nonlinearly. For example, each of the four following models is linear:
    Y = β0 + β1·X + ε,
    Y = β0 + β1·X + β2·X² + ε,
    Y = β0 + β1·log X + ε,
    Y = β0 + β1·√X + ε,
    because the model parameters β0, β1, β2 enter linearly. On the other hand, Y = β0 + e^(β1·X) + ε is a nonlinear model because the parameter β1 does not enter the model linearly.
    (Regression Analysis by Example, Fifth Edition, by Samprit Chatterjee and Ali S. Hadi. Copyright © 2012 John Wiley & Sons, Inc.)
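The distinction drawn above, linear in the parameters versus linear in the predictors, can be checked numerically. The sketch below uses invented data, not an example from the book: it fits Y = β0 + β1·X + β2·X², which is nonlinear in X but still an ordinary linear regression in the βs, so plain least squares recovers the coefficients:

```python
# Sketch: a model that is nonlinear in X but linear in the parameters is
# still an ordinary least-squares problem. Data and betas are illustrative.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 4, 50)
# True parameters (hypothetical): beta0 = 2.0, beta1 = 1.5, beta2 = -0.5
y = 2.0 + 1.5 * x - 0.5 * x**2 + rng.normal(0, 0.1, x.size)

# Design matrix with columns 1, X, X^2 -- nonlinear in X, linear in the betas.
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)
```

The same mechanism covers the log X and √X models listed above: only the design-matrix columns change, never the estimation method.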
  • Book cover image for: Applied Regression Modeling
    • Iain Pardoe (Author)
    • 2020 (Publication Date)
    • Wiley (Publisher)
    • Add an interaction term, say X1·X2, to a multiple linear regression model if the association between X1 and Y depends on the value of X2 and the resulting model is more effective.
    • Understand that preserving hierarchy in a multiple linear regression model with interaction term(s) means including in the model the individual predictors that make up the interaction term(s).
    • Use indicator variables (or dummy variables) to incorporate qualitative predictor variables into a multiple linear regression model.
    • Derive separate equations from an estimated multiple linear regression equation that includes indicator variables to represent different categories of a qualitative predictor variable.
    • Use appropriate hypothesis tests to determine the best set of indicator variables and interaction terms to include in a multiple linear regression model.
    • Interpret parameter estimates for indicator variables and interaction terms in a multiple linear regression model as differences between one category and another (reference) category.
    4.1 TRANSFORMATIONS
    4.1.1 Natural logarithm transformation for predictors
    Consider the TVADS data file in Table 4.1, obtained from the Data and Story Library. These data appeared in the Wall Street Journal on March 1, 1984. Twenty-one TV commercials were selected by Video Board Tests, Inc., a New York ad-testing company, based on interviews with 20,000 adults. Impress measures millions of retained impressions of those commercials per week, based on a survey of 4,000 adults. Spend measures the corresponding 1983 TV advertising budget in millions of dollars. A transformation is a mathematical function applied to a variable in our dataset. For example, log_e(Spend) measures the natural logarithms of the advertising budgets. Mathematically, it is possible that there is a stronger linear association between log_e(Spend) and Impress than between Spend and Impress.
  • Book cover image for: Biostatistics (eBook - PDF)
    Basic Concepts and Methodology for the Health Sciences, 10th Edition International Student Version
    • Wayne W. Daniel, Chad L. Cross (Authors)
    • 2014 (Publication Date)
    • Wiley (Publisher)
    Often one may wish to attempt a transformation of the data. Mathematical transformations are useful because they do not affect the underlying relationships among variables. Since hypothesis tests for the regression coefficients are based on normal distribution statistics, data transformations can sometimes normalize the data to the extent necessary to perform such tests. Simple transformations, such as taking the square root of measurements or taking the logarithm of measurements, are quite common.
    EXAMPLE 11.1.1 Researchers were interested in blood concentrations of delta-9-tetrahydrocannabinol (D-9-THC), the active psychotropic component in marijuana, from 25 research subjects. These data are presented in Table 11.1.1, as are these same data after using a log10 transformation. Box-and-whisker plots from SPSS software for these data are shown in Figure 11.1.1. The raw data are clearly skewed, and an outlier is identified (observation 25). A log10 transformation, which is often useful for such skewed data, removes the magnitude of the outlier and results in a distribution that is much more nearly symmetric about the median. Therefore, the transformed data could be used in lieu of the raw data for constructing the regression model. Though symmetric data do not necessarily imply that the data are normal, they do result in a more appropriate model. Formal tests of normality, as previously mentioned, should always be carried out prior to analysis.
    Unequal Error Variances. When the variances of the error terms are not equal, we may obtain a satisfactory equation for the model, but, because the assumption that the error variances are equal is violated, we will not be able to perform appropriate hypothesis tests on the model coefficients.
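The effect described, a log10 transformation pulling a right-skewed sample toward symmetry, can be demonstrated with a sample-skewness statistic. The data below are simulated from a lognormal distribution for illustration; they are not the THC measurements from Table 11.1.1:

```python
# Sketch: log10 transformation reducing skewness. Data are simulated
# (lognormal), not the measurements from the text; parameters are invented.
import math
import random
import statistics

def skewness(xs):
    """Sample skewness: mean cubed deviation over cubed standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s**3)

random.seed(3)
raw = [math.exp(random.gauss(1.0, 0.8)) for _ in range(500)]  # right-skewed
logged = [math.log10(x) for x in raw]                         # ~symmetric

print(skewness(raw), skewness(logged))
```

The raw sample shows strong positive skew, while the log10-transformed sample sits near zero skewness, matching the box-plot behavior the excerpt describes. As the excerpt notes, symmetry alone does not establish normality, so a formal test should still follow.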
  • Book cover image for: Developing Econometrics
    • Hengqing Tong, T. Krishna Kumar, Yangxin Huang (Authors)
    • 2011 (Publication Date)
    • Wiley (Publisher)
    Actually we should add a stochastic error term after linearizing the model, which is more natural. For example, an exponential model is:
    Y = a·e^(bX)
    Take logarithms and then add in the stochastic term:
    ln Y = ln a + bX + ε
    This is equivalent to assuming that the model in fact is:
    Y = a·e^(bX)·e^ε
    For a more complicated model such as a logistic regression:
    Y = 1 / (1 + e^(a+bX))
    It can be transformed into:
    ln((1 − Y)/Y) = a + bX
    We should have clear identification of the actual distribution of the error terms. This can be done by fitting a distribution to the estimated residuals. We discussed linearization of the theoretical model above. In practice we may not know the theoretical model at the beginning. We may have just a few columns of data: the first column is the dependent variable Y, and the following columns are the independent variables X1, …, Xp, and we need to develop a linear regression. As it is a multivariate relationship, it is natural that there exist intercorrelations among the independent variables, and the data can be fitted better with a multiple regression than with a set of simple bivariate regressions, such as between Y and X1, Y and X2, …, Y and Xp. We make a decision on what kind of data transformation is needed for each independent variable. If we can make each transformed independent variable linearly related to Y, we can then run the multiple linear regression with those transformed variables.
    Figure 2.5 Exponential growth trend.
    Figure 2.6 Power function growth trend.
    Example 2.2 Regression after specific transformation of each column. Given 20 data points of the dependent variable Y and the two independent variables X1, X2 as shown in the following table, let us analyze the data. Arranging X1 in increasing order, we find that it grows rapidly, seemingly with exponential growth, so we can consider taking a logarithm transformation of X1. The column X2 also grows fast, though at a rate less than X1; it seemingly grows as a square function, so a square root transformation can be considered. But a transformation relying on intuition is not necessarily accurate. We can use DASC in order to see Figure 2.5 and observe the dotted pairs (X1i, i
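The strategy of Example 2.2, transform each predictor separately until it is linearly related to Y, then run one multiple regression on the transformed columns, can be sketched as follows. The data and coefficients are invented for illustration (the DASC software and the book's 20-point table are not used here):

```python
# Sketch: per-column transformation followed by one multiple regression.
# X1 grows exponentially (so take log); X2 grows like a square (so take
# sqrt). All data and true coefficients are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1 = np.exp(rng.uniform(0, 5, n))          # exponential-looking column
x2 = rng.uniform(0, 100, n) ** 2           # square-function-looking column
# True (invented) model on the transformed scale:
y = 3.0 + 2.0 * np.log(x1) + 0.5 * np.sqrt(x2) + rng.normal(0, 0.5, n)

# Design matrix built from the transformed columns: 1, log(X1), sqrt(X2).
Z = np.column_stack([np.ones(n), np.log(x1), np.sqrt(x2)])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(beta)
```

Because each transformed column is linearly related to Y, ordinary least squares on the transformed design matrix recovers the coefficients; fitting Y on the raw x1 and x2 would not.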
  • Book cover image for: Statistics Using R (eBook - PDF)
    An Integrative Approach
    CHAPTER FOUR RE-EXPRESSING VARIABLES
    Often, variables in their original form do not lend themselves well to comparison with other variables or to certain types of analysis. In addition, we may often obtain greater insight by expressing a variable in a different form. For these and other reasons, in this chapter we discuss four different ways to re-express or transform variables: applying linear transformations, applying nonlinear transformations, recoding, and combining. We also describe how to use syntax (.R) files to manage your data analysis.
    LINEAR AND NONLINEAR TRANSFORMATIONS
    In Chapter 1 we discussed measurement as the assignment of numbers to objects to reflect the amount of a particular trait such objects possess. For example, a number in inches may be assigned to a person to reflect how tall that person is. Of course, without loss of information, a number in feet or even in meters can be assigned instead. What we will draw upon in this section is the notion that the numbers themselves used to measure a particular trait, whether they be in inches, feet, or meters, for example, are not intrinsic to the trait itself. Rather, they are mere devices for helping us to understand the trait or other phenomenon we are studying. Accordingly, if an alternative numeric system, or metric (as a numeric system is called), can be used more effectively than the original one, then this metric should be substituted, as long as it retains whatever properties of the original system we believe are important. In making this change on a univariate basis, each number in the new system will correspond on a one-to-one basis to each number in the original system. That is, each number in the new system will be able to be matched uniquely to one and only one number in the original system. The rule that defines the one-to-one correspondence between the numeric systems is called the transformation.
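A linear transformation of the kind described, changing the metric while preserving one-to-one correspondence, can be sketched with a unit conversion. The heights below are made-up illustrative values:

```python
# Sketch: a linear transformation (inches -> centimeters, cm = 2.54 * in)
# re-expresses each value in a new metric in one-to-one correspondence
# with the old one. The heights are hypothetical.
import math
import statistics

heights_in = [60.0, 65.5, 70.0, 72.25]
a = 2.54                                   # the transformation rule
heights_cm = [a * h for h in heights_in]

# Under a transformation aX + b, the mean maps to a*mean(X) + b and the
# standard deviation scales by |a|, so summaries carry over predictably.
assert math.isclose(statistics.fmean(heights_cm),
                    a * statistics.fmean(heights_in))
assert math.isclose(statistics.stdev(heights_cm),
                    a * statistics.stdev(heights_in))
print(heights_cm)
```

Because the rule is one-to-one, no information is lost: ordering, relative spacing, and standardized positions of the values are identical in either metric.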