Understanding and Using Advanced Statistics

A Practical Guide for Students

  1. 192 pages
  2. English

About this book

Understanding and Using Advanced Statistics is a comprehensive, practical guide advising postgraduate students on how and when to use more advanced statistical methods. Written for students without a mathematical background, it refreshes important basics such as descriptive statistics and research design, as well as introducing the essential upper-level techniques the advanced student needs.

Key Features:

- Comprehensive guide explaining how to use a range of advanced statistical methods, such as MANOVA, path analysis and logistic regression;

- Inter-disciplinary: ideal for students studying upper-level statistical methods in any subject across the social sciences;

- Practical guide: case studies, further reading and explanations of key terms help the non-mathematically oriented student get ahead with their research.

Building on undergraduate statistical grounding, Understanding and Using Advanced Statistics provides the upper-level researcher with the knowledge of what advanced statistics do, how they should be used, and what their output means.

Understanding and Using Advanced Statistics by Jeremy J. Foster, Emma Barkus and Christian Yavorsky is available in PDF and ePUB format, catalogued under Social Sciences: Research Methodology in the Social Sciences.

1

Basic Features of Statistical Analysis and the General Linear Model

INTRODUCTION

The aim of this book is to describe some of the statistical techniques which are becoming increasingly common, particularly in the social sciences. The spread of sophisticated computer packages and the machinery on which to run them has meant that procedures which were previously only available to experienced researchers with access to expensive machines and research students can now be carried out in a few seconds by almost every undergraduate. The tendency of the packages to produce items of output which are unfamiliar to most users has led to modifications in the content of quantitative data analysis courses, but this has not always meant that students gain an understanding of what the analytic procedures do, when they should be used and what the results provided signify. Our aim has been to provide the basis for gaining such an understanding. There are many texts and Internet resources covering the material which we do, but our experience is that many of them are too advanced, starting at too high a level and including topics such as matrix algebra which leave many students baffled. What we have attempted to provide is an assistant which will help you make the transition from the simpler statistics (t-tests, analysis of variance) to more complex procedures; we hope we have translated the more technical texts into a form which matches your understanding. Each chapter provides an outline of the statistical technique, the type of question it answers and what the results produced tell you, and gives examples from published literature of how the technique has been used.
In recent years there has been a considerable growth in the use of qualitative research methods in many areas of social science including psychology and nursing and this has been accompanied by a decline in the previous preponderance of quantitative research. One feature of the qualitative research movement has been an emphasis upon the ethical issues involved in carrying out research involving people, the need to recognise that the participants own their data and that they should have an input – even perhaps a veto – over the interpretations made of it and the uses to which it is put. This concern has rejuvenated the ethical debate within quantitative research and brought back an awareness of the need to ensure that participants give informed consent to taking part, that they are not studied unknowingly or covertly, that they have the right to confidentiality and anonymity. This is not the place to discuss the ethics of research, but it is only proper that we should urge those considering quantitative research to be aware of the ethical guidelines applicable to their discipline and ensure they abide by them. Gathering the data which lends itself to quantitative analysis is not a value-free activity even if ‘number crunching’ may in itself appear to be so.
Before describing the more complex statistical techniques, we begin by recapitulating the basics of statistical analysis, reminding you of the analysis of variance and outlining the principles of the general linear model (GLM) which underpins many of the techniques described later.

BASIC FEATURES OF STATISTICAL ANALYSIS

Experiments or correlational research designs

In an experiment using a between-subjects design, the participants are randomly allocated to different levels of the independent variable and if all other variables are controlled by being kept constant or by the design of the experiment then it is assumed that any differences in the dependent variable measures are due to the independent variable. (This is a gross simplification of how to design an experiment!) But in many or even most fields of investigation it is impossible to carry out a true experiment because it is impossible to control the conditions, impossible to allocate participants randomly to conditions or ethically unacceptable to do so. One is then forced to consider an alternative type of investigation such as a pseudo-experiment or a correlational study in which data on independent and dependent variables is collected (often simultaneously) and the relationships between them are investigated.
Experiments typically involve analysing the data for differences: did group A score differently from group B on the dependent variable? Correlational studies usually involve analysing the data for correlations or associations: did those who scored highly on measure X also obtain high scores on measure Y?

Independent and dependent variables

Independent variables are those aspects of the respondents or cases which you anticipate will affect the output measure, the dependent variable. An independent variable is often the ‘grouping’ variable which divides people, respondents or cases into separate groups. This division may be based on experimental conditions or it may be some characteristic of the participants such as their age group, sex or economic status. When the independent variable involves different participants in each group, it is referred to as a between-subjects variable. Alternatively, the independent variable may be a number of experimental conditions where all participants take part in every condition. If this is the case, the variable is a within-subjects factor and a repeated measures design is being used. A mixed design has at least two independent variables, at least one between subjects and at least one within subjects.
The dependent variable is usually a continuous variable such as a measure of performance on an experimental task or a score on a questionnaire which the researcher proposes is affected by the independent variables. In some types of research, the dependent variable is categorical, which means participants are divided into categories such as surviving and not surviving or relapsing and not relapsing. The data is then frequencies: how many people fall into each category? The research may be concerned with finding which factors predict category membership, and then logistic regression may be used to analyse the data.
It is important not to confuse variables with their levels. An independent variable is the experimental manipulation or the dimension upon which the participants are categorised. For example, suppose we were testing the ability of boys and girls to do mental arithmetic when they were distracted by background noise which could be loud, quiet or not present at all. We would design the experiment so that our participants carried out a mental arithmetic task with loud noise, with quiet noise or without noise. There would be two independent variables: participant sex (male or female) and noise condition (loud, quiet, absent). The first independent variable has two levels (male or female) and the second independent variable has three levels. So this would be a 2 × 3 (or 3 × 2 since it does not matter in which order the numbers here are presented) experiment. The expression 2 × 3 contains two digits, showing there are two independent variables. The actual digits, 2 and 3, indicate that one of the variables has two levels and the other has three levels.
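The cells of a factorial design like this can be enumerated mechanically: crossing the levels of the two independent variables yields one cell per combination. A minimal Python sketch (the level labels follow the noise example above; the variable names are illustrative):

```python
from itertools import product

# Levels of the two independent variables in the noise example
sex = ["male", "female"]              # 2 levels
noise = ["loud", "quiet", "absent"]   # 3 levels

# Each combination of one level from each variable is one cell
# of the 2 x 3 design
cells = list(product(sex, noise))

print(len(cells))  # 6 cells: 2 levels x 3 levels
```

Counting the cells (2 × 3 = 6) confirms the reading of the design notation: two digits mean two independent variables, and each digit gives the number of levels of one variable.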

Types of data

There are essentially two types of data, frequency and numerical, depending on the type of measurement scale used. One type of measurement scale is categorical or nominal, where cases are allocated to categories such as ‘male’ and ‘female’, or ‘recovering after 2 months’, ‘recovering after 6 months’, ‘not recovering’. This yields frequency data which is obtained if you count the number of cases or people in each category. The other type of measurement scale is quantitative or numerical: here you are measuring not how many people or cases fall into a particular category, but how much or how well they performed by assigning a numerical value to their performance, for example by recording the time taken to do a task or the number of questions answered.
In a nominal scale, the number is simply functioning as a label and the size of the number does not reflect the magnitude or order of the items. Telephone numbers are an example of a nominal scale, since the size of the number does not reflect anything about the size or order of the people who have those numbers. Similarly, in the example of performance under conditions of background noise described earlier, you might designate the loud noise condition as condition 1, the quiet noise condition as condition 2 and the no-noise condition as condition 3. Here the numbers are acting just as convenient labels for each condition or category and their size means nothing. When you have counted the number of cases in each category, you have frequency data which can be analysed using procedures such as chi-square, log–linear analysis or logistic regression.
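As a sketch of how such frequency data might be analysed, the following Python snippet uses SciPy to run a chi-square test on a contingency table; the counts are invented for illustration, not taken from the text:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: participants in each noise condition (rows)
# who did or did not complete the task in time (columns)
observed = [[20, 10],   # loud
            [18, 12],   # quiet
            [27,  3]]   # absent

chi2, p, dof, expected = chi2_contingency(observed)
# dof = (rows - 1) * (columns - 1) = 2
print(chi2, p)
```

The test compares the observed counts with the counts expected if condition and outcome were independent; a small p value suggests category membership is associated with condition.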
Numerical data can be measured on a ratio, interval or ordinal scale. In a ratio scale there is a true zero and a number which is twice as large as another reflects the fact that the attribute being measured is twice as great. Ratio scales are rare in the social sciences unless one is using a measure of some physical feature such as height or time: someone who is 2 metres tall is twice as tall as someone who is 1 metre tall; someone who took 30 seconds to do a task took twice as long as someone who did it in 15 seconds. In an interval scale, the difference between two values at one point on the scale is the same as the difference between two equidistant values at another point on the scale: the usual example cited is the Fahrenheit scale where the difference between 15 and 20 degrees is the same as the difference between 5 and 10 degrees. There are few cases of interval scales in the social sciences (e.g. IQ is not an interval scale because the difference between an IQ of 100 and 105 is not the same as the difference between 70 and 75), although many examples of data being treated as though it were an interval scale. In an ordinal or rank scale, the size of the numbers reflects the order of the items as in a race where first came before second and second before third. But this information tells you nothing about the intervals between the scale points: first may have been just in front of second with third trailing far behind. In practice, the distinction between ratio and interval scales is widely ignored but ordinal or rank data is treated differently by using non-parametric procedures.

Non-parametric and parametric analyses

There are two groups of statistical techniques: parametric and non-parametric. Non-parametric techniques are considered distribution free, meaning that they do not involve any assumptions about the distribution of the population from which the sample of dependent variable measures is drawn. (It does not, for example, have to be normally distributed.) Non-parametric techniques are used with frequency data and when the dependent variable has been measured on an ordinal (rank) scale. If the dependent variable has been measured on an interval scale but does not fulfil certain assumptions described below, it can be transformed into ordinal data and the non-parametric techniques can then be applied.
The parametric tests are generally considered more powerful, offer greater flexibility in analysis and address a greater number of research questions. The majority of the statistical techniques outlined in this book require that the dependent variable measures meet the requirements for being parametric data, which are:
  1. the dependent variable is measured on either an interval or a ratio scale;
  2. scores on the dependent variable approximate to a normal distribution or are drawn from a population where the variable can be assumed to be normally distributed;
  3. scores on the dependent variable show homogeneity of variance between groups of participants.
Regarding point 1, strictly speaking parametric analysis should only be performed on continuous interval or ratio data but in practice many types of data are taken to be interval even if one could reasonably argue that they are not. An example is where a Likert scale has been used to measure attitudes. This is where participants are presented with a statement such as ‘The death penalty is morally acceptable’ and indicate how far they agree or disagree with it using a five- or seven-point scale, with one end of the scale indicating ‘strongly agree’ and the other ‘strongly disagree’. If the data is ordinal (ranks), then non-parametric analysis is needed.
Concerning point 2, parametric statistical analysis is based on the assumption that the scores come from a normal distribution, meaning that if one could obtain the scores from the population then they would be normally distributed. Of course one does not know the distribution in the population, only the distribution in the sample one has. So it is necessary to check whether these approximate to a normal distribution. This can be done by plotting the distribution and examining it to see if it is more or less normal. The shape of the distribution can be evaluated in terms of skewness and kurtosis. Skewness reflects the asymmetry of the curve (is the peak in the centre, or pushed towards one end?) and kurtosis reflects the heaviness of the tails of the curve (is the curve too flat or too peaked?). Statistical packages may give indices of skew and kurtosis; the values should be close to zero.
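A quick way to obtain such indices is sketched below in Python with SciPy, using an invented sample of scores (the data and the expectation of near-zero values are illustrative, not prescriptions from the text):

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Hypothetical dependent-variable scores drawn from a normal distribution
rng = np.random.default_rng(42)
scores = rng.normal(loc=50, scale=10, size=5000)

print(skew(scores))      # close to 0 for a symmetric distribution
print(kurtosis(scores))  # Fisher definition: close to 0 for a normal curve
```

Note that SciPy's `kurtosis` uses the Fisher definition by default, which subtracts 3 so that a normal distribution scores zero; some packages report the unadjusted (Pearson) value instead.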
On point 3, homogeneity of variance is the assumption that the amount of variance is the same in the different sets of scores of the participants. It can be assessed using Levene’s test for homogeneity of variance, which gives an F value: if F is significant, the groups differ in their variances, that is there is heterogeneity of variance. If you are comparing groups of equal size, heterogeneity of variance is not important, but for groups of unequal sizes it needs to be dealt with. This can be done in a number of ways, including transforming the scores, using a more stringent significance level (perhaps 0.01 rather than 0.05) or applying a non-parametric procedure.
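Levene's test is available in most statistical packages; a minimal Python sketch with SciPy, using invented scores for two groups drawn with the same spread, looks like this:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(1)
# Hypothetical scores for two groups with equal population variance
group_a = rng.normal(loc=50, scale=10, size=40)
group_b = rng.normal(loc=55, scale=10, size=40)

stat, p = levene(group_a, group_b)
# A significant result (p < 0.05) would indicate heterogeneity of variance
print(stat, p)
```

Here a non-significant p value would mean there is no evidence that the variances differ, so the homogeneity assumption is tenable for these (made-up) data.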

Statistical significance

Probability testing is at the centre of statistical analysis and is essentially concerned with deciding how probable it is that the results observed could have been due to chance or error variation in the scores. To make the explanation simpler, we shall take the case of testing to see whether there is a difference between two groups of respondents. Suppose we have measured the amount of concern people have with their body image using a questionnaire in which a high score indicates a high level of concern, and done this for a group of women and for a group of men. Th...

Table of contents

  1. Cover Page
  2. Title
  3. Copyright
  4. Summary of Contents
  5. Contents
  6. 1 Basic features of statistical analysis and the general linear model
  7. 2 Multivariate analysis of variance
  8. 3 Multiple regression
  9. 4 Log–linear analysis
  10. 5 Logistic regression
  11. 6 Factor analysis
  12. 7 Path analysis
  13. 8 Structural equation modelling
  14. 9 Time series analysis
  15. 10 Facet theory and smallest space analysis
  16. 11 Survival or failure analysis
  17. 12 Repertory grids
  18. Index