
Statistical Modelling for Social Researchers: Principles and Practice

Roger Tarling

224 pages, English


About this book

This book explains the principles and theory of statistical modelling in an intelligible way for the non-mathematical social scientist looking to apply statistical modelling techniques in research. The book also serves as an introduction for those wishing to develop more detailed knowledge and skills in statistical modelling. Rather than present a limited number of statistical models in great depth, the aim is to provide a comprehensive overview of the statistical models currently adopted in social research, so that the researcher can make appropriate choices and select the model best suited to the research question being addressed. To facilitate application, the book also offers practical guidance and instruction in fitting models using SPSS and Stata, the most popular statistical software packages and ones available to most social researchers. Instruction in using MLwiN is also given.

Models covered in the book include: multiple regression; binary, multinomial and ordered logistic regression; log-linear models; multilevel models; latent variable models (factor analysis); path analysis and simultaneous equation models; and models for longitudinal data and event histories. An accompanying website hosts the datasets and further exercises so that the reader may practise developing statistical models.

An ideal tool for postgraduate social science students, research students and practising social researchers in universities, market research, government social research and the voluntary sector.



1 Statistical modelling: An overview

1.1 Introduction

In defining a model, dictionaries speak of a representation, an imitation, an image, a copy or a paradigm.

We talk of models and use them in many different contexts. Perhaps the term is used too readily or too loosely, devaluing its meaning and power. Nevertheless, in the built environment it is common to build a physical mock-up (a model) of how the construction will look. Engineers build prototypes (models) of a new machine, for example a car. Planners build simulation models to represent traffic flows as a way of assessing new road layouts.

In their objective, models in the social sciences are no different. What the theoretician or researcher is attempting to produce is a representation of how phenomena or concepts (as measured by their variables) relate to each other. In other words, the social scientist is attempting to understand the complex social world and represent the essential inter-relationships in a simplified but meaningful way.

What distinguishes a statistical model is that it is constructed from empirical quantitative data and uses statistical theory to guide its development. The details will be given later, but at this stage it is important to note that the theory or the research question guides the construction of the model. Statistical modelling is a technique to aid understanding; it is not an end in itself.

1.2 Why model?

Statistical modelling is an important analytical tool because it enables social researchers to consider complex inter-relationships between social phenomena in a coherent and unified procedure, and to isolate and make judgements about the separate effect of each. More specifically, in social science, statistical modelling is undertaken for four main reasons: (1) to improve understanding of causality and aid the development of theory; (2) to make predictions; (3) to assess the effect of different characteristics; and (4) to reduce the dimensionality of data.

To aid the development of theory

Constructing models can help develop theoretical perspectives or test the claims of competing perspectives. For example, attention has relatively recently been given to the part lifestyle plays in crime victimisation, and routine activity theories have been promulgated.

These theoretical perspectives can be informed by developing models examining how aspects of people’s lives, such as where they work, what recreational activities they take part in, how they travel, what time they travel and who with, relate to their experience of being a victim of crime.

To make predictions

Many models, particularly in the economic sphere, are constructed with the purpose of making forecasts or predictions about the future. The ability to anticipate any changes in unemployment or interest rates or the cost of living offers decision makers the opportunity to take any necessary remedial action. Similarly, statistical models can be used to estimate the relative risks of certain outcomes, for example, the risk that an offender will reoffend within a particular time period. Knowledge of the risk can be an aid to decision making; in this example it may inform decisions about when the offender is to be released from prison or whether or not the offender should be placed on a particular programme.

To assess the effect of different characteristics

Often the aim of a social research project is to evaluate the effect of a particular characteristic on an outcome. For example, are women offenders treated differently from male offenders in terms of the sentences they receive at court? Are women discriminated against in the workplace and paid less than men? On the first question, women generally receive lesser sentences than men, but can it be inferred that they are treated differently? Other attributes contribute to the sentence imposed: the seriousness of the offence, the age of the offender and the offender’s previous criminal history. Compared with men, women offenders generally commit less serious offences and have less extensive criminal careers, so it is not surprising that on average they receive lesser sentences. To answer the important question of whether women are treated differently from men, account has to be taken of the offence committed, age and criminal record, so that we compare ‘like with like’. A statistical model enables us to assess the effect of gender on sentencing after adjusting for the other important characteristics known to influence sentencing decisions.
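The logic of adjusting for other characteristics can be sketched in Python (NumPy assumed available) with a small, entirely hypothetical dataset. Sentence length is generated from offence seriousness and previous convictions only, so by construction gender has no direct effect; the women in these invented data simply commit less serious offences. Both models are estimated by ordinary least squares via `numpy.linalg.lstsq`, a method assumed here for illustration, and the helper `fit_ols` is a hypothetical name.

```python
import numpy as np

# Hypothetical data for ten offenders (invented for illustration only).
female = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
seriousness = np.array([1, 2, 3, 4, 5, 1, 2, 3, 1, 2])
priors = np.array([0, 1, 2, 3, 4, 0, 1, 0, 1, 0])

# By construction, sentence length (months) depends only on seriousness
# and previous convictions: gender has no direct effect at all.
sentence = 2 * seriousness + priors

def fit_ols(y, *xs):
    """Least-squares estimates [b0, b1, ...] for y = b0 + b1*x1 + ... + e."""
    X = np.column_stack([np.ones(len(y))] + list(xs))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

raw = fit_ols(sentence, female)                            # gender only
adjusted = fit_ols(sentence, female, seriousness, priors)  # like with like

print(f"raw gender effect:      {raw[1]:.2f}")
print(f"adjusted gender effect: {adjusted[1]:.2f}")
```

With these invented numbers the gender-only model attributes the whole difference in mean sentence (4 months versus 8) to gender, giving a coefficient of -4, whereas the adjusted coefficient is zero up to floating-point rounding because seriousness and prior convictions account for the entire difference.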

To reduce the dimensionality of data and to uncover latent variables

A situation may exist where many variables are highly inter-correlated, for example a child’s marks on various school tests, or a person’s answers to a large number of similar attitudinal questions. Leaving until later the technical problems encountered by including all these similar variables in a model, one might, in any case, prefer a summary measure of them. An average (which is itself a linear model) could be calculated and used to represent the set of variables or a more sophisticated model, which weighted each of the variables differently, could be constructed.
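The point that an average is itself a linear model can be made concrete with a short Python sketch (NumPy assumed). The marks are hypothetical. The equally weighted average is a linear combination with every weight equal to 1/k; as one example of a model that weights the variables differently, the sketch uses the first principal component of the standardised marks. That choice is an assumption for illustration, not the method the book prescribes (latent variable models are the subject of Chapter 10).

```python
import numpy as np

# Hypothetical marks for five pupils on three similar school tests.
marks = np.array([
    [55, 60, 58],
    [70, 72, 75],
    [40, 45, 42],
    [85, 80, 88],
    [62, 65, 60],
], dtype=float)

k = marks.shape[1]

# A simple average is a linear combination with equal weights 1/k.
equal_weights = np.full(k, 1 / k)
average_score = marks @ equal_weights

# Illustrative alternative: weight each test by the leading eigenvector of
# the correlation matrix (the first principal component of standardised marks).
z = (marks - marks.mean(axis=0)) / marks.std(axis=0)
corr = np.corrcoef(marks, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)   # eigenvalues in ascending order
leading = eigenvectors[:, -1]                      # vector for the largest eigenvalue
pc_weights = leading * np.sign(leading.sum())      # eigenvector sign is arbitrary
pc_score = z @ pc_weights

print("equal weights:", equal_weights)
print("component weights:", pc_weights)
```

Because the three tests are strongly positively correlated, the component weights come out all positive and broadly similar, and the two summary scores rank the pupils in much the same way.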

In other applications the purpose may be to uncover latent variables, which are underlying social constructs (such as social deprivation, quality of life or fear of crime) but for which no direct measurement scale exists. In order to undertake research and perform analysis on these concepts, a measurement scale has to be constructed from manifest variables, that is, from variables that can be measured. Latent variables are discussed further in Chapter 10.

Before concluding this section it should be emphasised that the four purposes outlined above are not themselves mutually exclusive. Models that are firmly grounded in theory are likely to achieve better predictions and will be better placed to isolate the relative effects of different characteristics. That is, the better the model represents true relationships between the underlying phenomena the better able it is to achieve any of the objectives set out above.

1.3 The general linear statistical model

A statistical model takes the form of a mathematical equation in which the concepts of interest (as measured by their variables) are hypothesised to be related to each other in some way. The statistical model of interest in this book is known as the general linear statistical model and is defined in Equation (1.1).

$$y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \dots + b_p x_{pi} + e_i \qquad (1.1)$$

where:

y is the dependent variable; y_i is the value of the dependent variable for the ith subject.

x_1i, x_2i … x_pi are the values for the ith subject of the explanatory/independent variables x₁, x₂ … x_p, of which there may be any number (from 1 to p).

The values of each explanatory variable vary between subjects.

b₁, b₂ … b_p are the coefficients, or parameters, of the corresponding explanatory variables, x₁, x₂ … x_p.

e is the error term or residual; e_i is the residual for the ith subject.
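To make the notation of Equation (1.1) concrete, the following Python sketch (NumPy assumed) builds a design matrix for p = 2 hypothetical explanatory variables, estimates b₀, b₁ and b₂, and splits each y_i into its fitted part and its residual e_i. The data are invented, and the coefficients are estimated by ordinary least squares, a common method assumed here rather than taken from the text.

```python
import numpy as np

# Hypothetical data: n = 6 subjects, p = 2 explanatory variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
y = np.array([3.1, 6.2, 6.8, 10.1, 10.9, 14.2])

# Design matrix: a column of ones (for b0) followed by x_1 ... x_p.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares estimates of b0, b1, b2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ b            # b0 + b1*x_1i + b2*x_2i for each subject i
residuals = y - fitted    # e_i: the part of y_i the model leaves unexplained

for i, (yi, fi, ei) in enumerate(zip(y, fitted, residuals), start=1):
    print(f"subject {i}: y = {yi:.2f} = {fi:.2f} + ({ei:.2f})")
```

Because the model includes the constant b₀, the least-squares residuals sum to zero (up to rounding), and each printed line shows y_i decomposed as fitted value plus e_i.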

Each variable is explained in turn.

Dependent variable, also known as response variable or outcome variable

The dependent variable is the variable of prime interest in our research, that is, the variable which we wish to explain or predict. The dependent variable is regarded as a random variable, which is free to vary in response (hence response variable or outcome variable) to the explanatory variables. Note that in Equation (1.1) there is only one dependent variable, although we may have alternative definitions and measures of it. (For example, if the focus of the study was the remuneration people received, the dependent variable could be annual salary or hourly rate of pay. Remuneration could also include pensions and/or interest on savings and investments, etc.) Although there is only one dependent variable per equation, in Chapter 11 we will also consider situations with more than one equation (each with its own y), where the equations are related or need to be analysed simultaneously.

SPSS and Stata consistently use the term dependent variable in all their models and it is the convention in texts and computer program manuals to denote the dependent variable by the letter y.

Explanatory variables, also known as predictor variables, covariates, factors or independent variables

In most real-life applications there are several, and potentially many, explanatory variables. As the name implies, these variables are associated with the dependent variable and explain or predict its values.

Although it is in common usage, the term independent variable is not favoured by statisticians because it does not accurately convey the nature of the relationship between such a variable and the dependent variable. As will be seen later, these variables are far from independent in statistical models; they are often highly correlated with the dependent variable and with each other. I prefer the term explanatory variable as it better describes the nature of that relationship. However, I concede that the use of independent variable is pervasive, and I will continue to use it interchangeably with explanatory variable.

Explanatory (independent) variables can be continuous or categorical (these terms are defined in section 2.4). Continuous explanatory (independent) variables are often called covariates, to signify that they vary together with the dependent variable. Categorical explanatory (independent) variables are also called factors or indicators, and the individual categories of a factor are sometimes called levels.
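One common way for a categorical factor to enter a linear model is as a set of 0/1 indicator variables, one for each level except a chosen reference category. The Python sketch below (NumPy assumed) shows this coding; the factor `region`, its levels and the choice of `North` as reference are all hypothetical.

```python
import numpy as np

# Hypothetical factor "region" with three levels; "North" is taken as the
# reference category (an arbitrary choice for illustration).
region = np.array(["North", "South", "East", "South", "North", "East"])
levels = ["North", "South", "East"]
reference = levels[0]

# One 0/1 indicator column per non-reference level.
indicators = {
    f"region_{level}": (region == level).astype(int)
    for level in levels
    if level != reference
}

for name, column in indicators.items():
    print(name, column)
```

A subject in the reference category scores zero on every indicator. One level is left out because indicators for all three levels would always sum to one and so be perfectly collinear with the constant term of the model.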

Stata consistently uses the term independent variable throughout. SPSS, however, is not so consistent. Rather confusingly, it uses independent variable in its regression component but covariate in its logistic regression component, regardless of whether the variable is continuous or categorical. The terminology changes again in the multinomial logistic regression component, where covariates and factors are identified separately.

In most texts and computer program manuals it is the convention to denote explanatory variables by the letter x, and if there is more than one to number them sequentially with a subscript: x₁, x₂, x₃ and so on.

Whether a variable is considered to be a dependent variable or an independent variable depends wholly on the context, that is, the nature of the research being undertaken. In one study a variable may be considered as the dependent variable but in another as an explanatory variable. For example, in a study of school pupils’ achievements, some measure of educational attainment might be taken as the dependent variable. However, in a study of adults’ occupations or salary, school attainment might well be considered as an explanatory variable. Furthermore, in any one study, a variable may be considered as both a dependent variable and an explanatory variable at different stages of the analysis, especially when developing causal models (which are the subject of Chapter 11).

Coefficients or parameters

Associated with each explanatory (independent) variable x_p is a coefficient or parameter b_p. b_p indicates the magnitude by which y changes as x_p changes after taking into account (or adjusting for) the other explanatory variables included in the model. To be more precise, for a one-unit change in x_p, y will change by an amount b_p after adjusting for the contribution of the other explanatory variables. Thus b_p is the partial effect of x_p given the other xs in the model. It is important to understand that the value of b_p may well change from model to model depending on which other variables are included. In addition, b_p will also depend on the units of measurement chosen. For example, if y represented a person’s weight and x his or her height, b would take different values depending on whether weight was measured in pounds or kilograms and height in feet or centimetres. However, whatever units of measurement were chosen, b would represent the same change in the quantum of weight due to a one-unit change in the quantum of height.
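The dependence of b on units can be checked numerically. The Python sketch below (NumPy assumed) fits weight on height for five hypothetical people, once in kilograms and centimetres and once in pounds and feet. If weight is multiplied by a factor a and height is divided by a factor c, the slope is multiplied by a × c, so the two estimates differ numerically while describing the same relationship. The data and the helper `slope` are invented for illustration; the coefficient is estimated by ordinary least squares.

```python
import numpy as np

# Hypothetical heights (cm) and weights (kg) for five people.
height_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weight_kg = np.array([52.0, 58.0, 67.0, 74.0, 83.0])

KG_TO_LB = 2.20462262   # pounds per kilogram
CM_PER_FT = 30.48       # centimetres per foot

def slope(x, y):
    """OLS estimate of b in y = b0 + b*x + e."""
    X = np.column_stack([np.ones_like(x), x])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[1]

b_kg_per_cm = slope(height_cm, weight_kg)
b_lb_per_ft = slope(height_cm / CM_PER_FT, weight_kg * KG_TO_LB)

# Different numbers, same relationship: rescaling one slope recovers the other.
print(f"kg per cm: {b_kg_per_cm:.4f}")
print(f"lb per ft: {b_lb_per_ft:.4f}")
print(f"rescaled:  {b_kg_per_cm * KG_TO_LB * CM_PER_FT:.4f}")
```

With these invented data the slope is 0.78 kg per cm; expressed in pounds per foot it is about 52.4, exactly the same relationship stated in different units.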

b₀ is a constant term and represents the value of y when all the explanatory variables equal zero. It will be seen that the constant, b₀, is itself often meaningless; its only importance is to help determine the values of b_p. (The exception is where the explanatory variables are categorical when b₀ represents the effect on y of being in the reference category; ...