PART 1: DESIGNING AN EXPERIMENT
Scientists spend most of their lives trying to answer questions: why do some people get nervous when they speak to others? does smoking cause cancer? does reading a book on experimental design help you to design experiments? The traditional view is that the fundamental premise of science is that there are absolute truths – facts about the world that are independent of our opinions of them – to be discovered. There are fundamentally two ways in which these sorts of research questions can be answered (and the absolute truths discovered): we can observe what naturally happens in the real world without interfering with it (e.g. correlational or observational methods), or we can manipulate some aspect of the environment and observe its effect (the experimental method). These two approaches have many similarities:
- Empirical: both methods attempt to gather evidence through observation and measurement that can be replicated by others. This process is known as empiricism.
- Measurement: both methods attempt to measure whatever is being studied (see Box 1.1).
- Replicability: both methods seek to ensure that results can be replicated by other researchers (Box 1.1 illustrates how measurement can affect the replicability of results).
- Objectivity: both methods seek to answer the research question in an objective way. Although objectivity is a scientific ideal, arguably researchers’ interpretations of their results are influenced by their expectations of what they hope to discover.
Nevertheless, correlational and experimental methods have one fundamental difference: the manipulation of variables. Observational research centres on unobtrusive observation of naturally occurring phenomena (for example, observing children in the playground to see what factors facilitate aggression). In contrast, experimentation deliberately manipulates the environment to ascertain what effect one variable has on another (for example, giving someone 15 tequilas to see how it affects their walking).
Box 1.1: Why do scientists measure things? Imagine you were a chemist (heaven forbid!) and you wanted to demonstrate that eating a newly discovered chemical called ‘unovar’ made your brain explode. You force-fed 20 people with unovar and indeed their brains did explode. These results were written up and published for other chemists to read and you were awarded the Nobel science prize (which you enjoyed in the comfort of the prison cell assigned to you for murdering 20 innocent people). A few years pass and another scientist Dr. Smug-git comes along and shows that when he fed unovar to his participants their brains did not explode. Why could this be? There are two measurement-related issues here:
- Dr. Smug-git might have fed his participants less unovar than you did (it may be that brain explosion is dependent on a certain critical mass of the chemical being consumed)
- Dr. Smug-git might have measured his outcome differently – did you and Smug-git assess brain explosion in the same way?
The former point explains why chemists and physicists have devoted many hours to developing standard units of measurement. If you had reported that you’d fed your participants 100 grams of unovar, then Dr. Smug-git could have ensured that he had used the same amount – and because grams are a standard unit of measurement we would know that you and Smug-git used exactly the same amount of the chemical. Importantly, direct measurements such as the gram provide an objective standard: an object that weighs 10 g is known to be twice as heavy as an object weighing only 5 g.
It is easy enough to develop scales of measurement for properties that can be directly observed (such as height, mass and volume). However, we rarely have this luxury in psychology and other social sciences because we are interested in measuring constructs that cannot be directly measured; instead we rely on indirect measures. For example, if I were to measure undergraduates’ anxiety at having to do a statistics course on a scale ranging from 0 (not anxious) to 10 (very anxious), could I claim that a student who scores 10 is, in reality, twice as anxious as a student who scores 5? Although I couldn’t claim that a student who scored 10 was twice as anxious as a student who scored 5, I probably could claim that the student scoring 10 was more anxious (to whatever degree) than the student scoring 5. This relationship between what is being measured and the numbers obtained on a scale is known as the level of measurement. In a sense, the level of measurement is the degree to which a scale informs you about the construct being measured – it relates to the accuracy of measurement.
The second proposed explanation for the difference between your experiment and that of Dr. Smug-git illustrates this point rather nicely. In both cases the observed outcome was an exploding brain, but how was this measured? Clearly a brain can either explode or not, so it should be easy to observe the brain and then classify its response to the chemical as exploding or not exploding. Easy, eh? Well, perhaps not: what constitutes an explosion? Does the brain have to literally pop – propelling small fragments of blood and tissue onto the nearby walls – or will it suffice to have a large internal haemorrhage? Perhaps Dr. Smug-git required a more dramatic response before he would classify a brain as exploding – hence his differing conclusion. This example illustrates what psychologists face all of the time: an inability to directly measure what they want to measure. When we can’t measure something directly there will always be a discrepancy between the numbers we use to represent the thing we’re measuring and the actual value of the thing we’re measuring (i.e. the value we’d get if we could measure it directly). This discrepancy is known as measurement error.
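As a rough illustration of this definition, the sketch below treats measurement error as the observed score minus the value we would obtain from direct measurement. The "true" anxiety values and the ratings are invented for the example; in practice the true values are exactly what we cannot observe.

```python
# Hypothetical data: the "true" values are invented and would be unknowable
# in a real study; only the observed ratings would be available to us.
true_anxiety = [4.0, 6.5, 8.0]   # value we'd get if we could measure directly
observed_rating = [5, 6, 9]      # value reported on a 0-10 self-report scale

# Measurement error: discrepancy between the number we record and the
# actual value of the thing being measured.
measurement_error = [
    observed - true for observed, true in zip(observed_rating, true_anxiety)
]

print(measurement_error)  # [1.0, -0.5, 1.0]
```

The errors differ in size and sign from person to person, which is one reason why two researchers using indirect measures can reach different conclusions.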
1.1 Variables and Measurement
Scientists are interested in how variables change and what causes these changes. If you look at any research question, such as the one above – ‘why do some people get nervous when they speak to others?’ – inherent within it is something that changes in some way (it is not constant). In this case it is nervousness: the question implies that some people will be more nervous than others; therefore, nervousness is not constant – it changes (because it will differ both in different people and across different situations). In much the same way ‘does watching horror films make children more anxious?’ implies that anxiety will change or be different in different children. Anything that changes in this way is known as a variable – something that varies. As you’ll see later in this section variables can take many forms (see page 6), and can be both manipulated and observed (see page 10).
To draw meaningful conclusions about the relationships between variables, scientists have to measure them in some way (see Box 1.1). Psychologists cannot measure psychological constructs directly and so instead we use techniques such as self-report (e.g. asking people how they feel) and questionnaires. Different measurement devices provide data of differing quality. There are basically four levels at which variables can be measured:
| 1. Nominal (a.k.a. categorical) | Non-parametric |
| 2. Ordinal | Non-parametric |
| 3. Interval | Parametric |
| 4. Ratio | Parametric |
We’ll discuss each of these levels in turn, but for those wanting a gentler introduction Sandy MacRae (1994) covers the material excellently.
Nominal Data
The word nominal derives from the Latin word for name and the nominal scale is literally a scale on which two things that are equivalent in some sense are given the same name (or number). With this scale, there is no relationship between the size of the number and what is being measured; all that you can tell is that two things with the same number are equivalent whereas two things with different numbers are not equivalent. The classic example is numbers in a football team. A player with number 7 on his back should play in midfield, whereas a player with number 1 on his back plays in goal. However, a number 7 player is not necessarily better than a number 1 (most managers would not want their midfielder playing in goal!). The numbers on the back of shirts could equally well be letters or names (in fact, until recently many rugby clubs denoted team positions with letters on the back of shirts).
Data from a nominal scale should not be used for arithmetic because doing so would be meaningless. For example, imagine if the England coach found that his number 7 (David Beckham) was injured. Would he consider replacing him with seven David Seamans (who plays number 1) or – heaven forbid – combine Phil and Gary Neville (at numbers 2 and 5)? Even more ludicrous, I used to play wing in rugby (number 11 – the fast good-looking ones who score all the tries, ahem, well maybe not!). Imagine if one day the coach replaced a number 11 with a number 8 (burly bloke at the back of the scrum) piggy-backing a number 3 (huge bullock-like blokes at the front of the scrum)! They certainly wouldn’t be as fast (or good looking!) as a number 11. The only way that nominal data can be used is to consider frequencies. For example, we could look at how frequently number 11s score tries compared to number 3s. Having said this, as Lord (1953) points out in a very amusing and readable article, numbers don’t know where they came from and will behave in the same way, obeying the same arithmetic rules regardless.
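The contrast between counting frequencies and doing arithmetic on nominal codes can be sketched in a few lines of Python. The list of try scorers is invented purely for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical record of which shirt number scored each try in a season.
try_scorers = [11, 11, 3, 11, 8, 11, 3]

# Frequencies are the meaningful summary of nominal data:
# how often did each position score?
tries_by_shirt = Counter(try_scorers)
print(tries_by_shirt[11], tries_by_shirt[3])  # 4 2

# The arithmetic still runs without complaint (the numbers "don't know
# where they came from"), but the result describes no player or position.
meaningless_average = mean(try_scorers)
```

Nothing in the software stops us from averaging shirt numbers; it is up to the researcher to recognise that the answer has no interpretation.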
Ordinal Data
Ordinal data give us more information than nominal data. If we use an ordinal scale to measure something, we can tell not only that things have occurred, but also the order in which they occurred. However, these data tell us nothing about the differences between values. Figure 1.1 illustrates ordinal data: imagine you went to a frog race in which there were three frogs (Silus, Hoppy and Flibbidy – or Flibbs to his mates). The names of frogs don’t give us any information about where they came in the race; however, if we label them according to their performance – first, second and third – then these labels do tell us something about how the frog performed; these categories are ordered. In using ordered categories we now know that the frog that came second was better than the frog that came third.
The limitation of ordinal data is that they tell us little about the differences between ordered categories; we don’t know how much better the winner was than the frogs that came second and third. In Figure 1.1 the two races show Flibbs winning, Hoppy coming second and Silus losing. So, the ordered categories attached to each frog are the same in the two races: Flibbs is 1, Hoppy is 2, and Silus is 3. However, in the first race Flibbs and Hoppy tightly contested first place but Silus was way behind (so first and second place were actually very similar to each other in terms of performance), but in the second race Flibbs is a long way ahead whereas Hoppy and Silus are very similar (so first and second place are very different in terms of performance). This example shows how ordinal data can tell us something about position but nothing about the relative differences between positions (first place is always better than second place, but the difference between first and second place can vary). Nominal and ordinal scales don’t tell us anything about the differences between points on the scale and need to be analysed with non-parametric tests (see Chapter 7).
Figure 1.1 Two frog races
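The two races in Figure 1.1 can be mimicked with a small sketch. The finishing times below are assumptions made up for the example (the text gives only the finishing order), and `finishing_positions` is a hypothetical helper, not a standard function.

```python
# Hypothetical finishing times in seconds; only the order comes from the text.
race_1 = {"Flibbs": 10.0, "Hoppy": 10.2, "Silus": 19.0}  # close first/second
race_2 = {"Flibbs": 10.0, "Hoppy": 18.5, "Silus": 19.0}  # runaway winner


def finishing_positions(times):
    """Map each frog to its ordinal position (1 = fastest time)."""
    ordered = sorted(times, key=times.get)
    return {frog: place for place, frog in enumerate(ordered, start=1)}


positions_1 = finishing_positions(race_1)
positions_2 = finishing_positions(race_2)

# The ordinal data are identical in the two races...
print(positions_1 == positions_2)  # True

# ...even though the gap between first and second place is very different.
gap_1 = race_1["Hoppy"] - race_1["Flibbs"]
gap_2 = race_2["Hoppy"] - race_2["Flibbs"]
print(gap_1 < gap_2)  # True
```

Once the times are reduced to positions, the information about how close each race was is lost, which is the limitation described above.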
A lot of psychological data, especially questionnaire and self-report data, are ordinal. Imagine we asked several socially anxious individuals to think of embarrassing times in their lives, and then to rate how embarrassing each situation was on a 10-point scale. We might be confident that a memory they rate as 10 was more embarrassing than one they rate as 5, but can we be certain that the first memory was twice as embarrassing as the second? How much more unreliable does this become if we compare different people’s ratings of their memories – would you expect a rating of 10 from one person to represent the same level of embarrassment as another person’s, or will their ratings depend on their subjective beliefs about what is embarrassing? Most self-report responses are likely to be ordinal and so in any situation in which we ask people to rate things (e.g. rate their confidence about an answer they have given, rate how scared they are about something, rate how disgusting they find some activity) we should regard these data as ordinal, although many psychologists do not.
Interval Data
Interval data are considerably more useful than ordinal data and most of the statistical tests we use in psychology rely on having data that are measured on an interval scale. To say that data are interval, we must be certain that equal intervals on the scale represent equal differences in the property being measured. So, for example, if a psychologist took several spider phobic individuals, showed them a spider and asked them to rate their anxiety on a 10-point scale, for this scale to be interval it must be the case that the difference between anxiety ratings of 5 and 6 is the same as the difference between say 1 and 2, or 9 and 10. Similarly, the difference in anxiety between ratings of 1 and 4 should be identical to the difference between ratings of 6 and 9. If we had 4 phobic individuals (with their anxiety ratings in brackets): Nicola (10), Robin (9), Dave (2) and Esther (3), an interval scale would mean that the extra anxiety that Esther subjectively experiences compared to Dave is equal to the extra anxiety that Nicola experiences compared to Robin. When data have this property they can be analysed with...