Chapter 1
What is Science?
Analects of Confucius, II, 17
SUMMARY
The experimental method as a test of rival hypotheses, involving the creation of conditions in which the hypotheses would give different results. Causality and causal hypotheses. The relationship between hypotheses and models. Model comparison as a basic procedure in science. Effect sizes and causal models. The role of variables, and their types. Populations and samples. Occam’s razor and goodness of fit to data as the twin criteria behind model comparison. An analogy with animal behaviour.
NEED TO KNOW
A story told of Solomon, King of Israel, who was famous for his wisdom, concerns two women who appeared before him, each claiming to be the true mother of a baby boy. Each woman appeared to have an equal claim, but one of them obviously had to be lying. The king was asked to resolve the dispute, but without clear evidence either way, what could he do?
In an episode of ‘House, M.D.’ called ‘Moving the Chains’, the medical team are faced with a patient with a mysterious condition. They finally narrow it down to just two possibilities: lymphoma, and Takayasu’s arteritis. The problem is that the two conditions cannot be treated in the same way. If the patient has lymphoma, then the recommended treatment is to remove his spleen, but if the correct diagnosis is Takayasu’s, then he needs to be put on a course of steroids. Giving him the wrong treatment may kill him. What can they do?
Solomon resolved the problem by asking for a sword to be brought, saying that as both women appeared to have an equal right to the baby, he would divide the baby and give half of it to each. One of the two announced herself content with this decision, whereas the other asked for the entire baby to be handed over to her rival. Solomon concluded that the second woman was the true mother and awarded the child to her.
In the second story, House told his team to put the patient on an ethanol drip. In the event that the patient had lymphoma, then the drip would make him itchy. If the disease was Takayasu’s, then he would lose his radial pulse. The outcome would make clear what disease the patient was suffering from and therefore which treatment was appropriate, without endangering his life.
These two stories have something in common. There were two possibilities (the baby’s true mother was the first, or the second woman; the patient had lymphoma, or Takayasu’s), but not enough information to determine which was true. It was important to know which of the possibilities was correct, because a decision had to be made (giving the baby to one of the women; treating the patient in a certain way), and making the best decision depended on knowing which possibility was correct. The problem was resolved in each case by taking some action, doing an experiment if you will, and observing the outcome. The two possibilities would give different outcomes, and therefore observing the outcomes would enable the correct possibility to be inferred by a sort of reverse logic. Knowing the correct possibility enabled the best decision to be taken in terms of giving the baby to its real mother, or giving the correct treatment to the patient.
It is a bit clumsy to refer all the time to the ‘two possibilities’ in these stories. It is better to refer to them as ‘hypotheses’: things which might be true about the world. In both cases, there are two hypotheses, and one of them must be right, but not both. The logic of how Solomon and House both dealt with the problem can be illustrated using something called an X-O diagram,1 introduced by Campbell and Stanley:
H1 ⇒ X  O1
H2 ⇒ X  O2
The H’s represent the two possibilities, X is the ‘intervention’, and the O’s represent certain features of the outcomes.
The symbol ‘⇒’, the mathematical sign for ‘implies’, means that if the hypothesis to the left of it is true, then applying the intervention X will produce the outcome (or observation) O. In the Solomon example, H1 represents the possibility that the baby belongs to the first woman, and H2 that the second woman is the mother. X represents the king’s threat to cut the baby in half. O1 is the response where the first woman offers to give up the baby and the second woman agrees to its being divided, and O2 is the response where the second woman gives up the baby and the first one wants it cut in half. In the House instance, the hypotheses represent lymphoma and Takayasu’s respectively, X is the ethanol drip, and the two outcomes are the patient feeling itchy, and the loss of a radial pulse. The point is that it is known in advance that if H1 is true, then the intervention X will be followed by O1, and similarly that if H2 is true, X will be followed by O2.
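The inference just described (same intervention, a predicted outcome per hypothesis, then reasoning backwards from what was observed) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `surviving_hypotheses`, the dictionary layout, and the outcome labels are assumptions made for the example, not notation from Campbell and Stanley.

```python
from typing import Dict, List


def surviving_hypotheses(predictions: Dict[str, str], observed: str) -> List[str]:
    """Return the hypotheses whose predicted outcome matches the observation.

    `predictions` maps each hypothesis to the outcome it predicts
    under the same intervention X.
    """
    return [h for h, outcome in predictions.items() if outcome == observed]


# Solomon example: X is the threat to divide the baby (labels are illustrative).
solomon_predictions = {
    "H1: first woman is the mother": "first woman gives up the baby",
    "H2: second woman is the mother": "second woman gives up the baby",
}

# The second woman offered to give up the baby, so only H2 survives.
print(surviving_hypotheses(solomon_predictions, "second woman gives up the baby"))
```

If the observed outcome matches neither prediction, the function returns an empty list, which is a hint that the set of hypotheses may have been incomplete.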
The situation described above, where we have two or more hypotheses and an experiment which will give distinct outcomes depending on which of them is true, was first described in this general form by the philosopher Francis Bacon. He called this type of procedure an instantia crucis, usually translated as ‘instance of the fingerpost’. If the two outcomes are different, they provide a signpost telling us which of the two hypotheses must be true. This idea might be regarded as the core of the experimental method in science.2 For short, I will refer to this as the Bacon paradigm.
When Bacon introduced the idea of the instantia crucis, he wrote that one might meet one accidentally, but that ‘for the most part they are new, and are expressly and designedly sought for and applied’ (Aphorism 36 of The New Organon). In other words, he envisaged that hypotheses should be distinguished not through mere passive observation, but by creating experiments that were designed to distinguish between the alternatives. The ‘creating experiments’ part consists in finding the right intervention, X, which will produce distinguishable outcomes for the two hypotheses.
There are several reasons why the method works in the cases considered above, each corresponding to a feature of the experimental design. The first feature is perhaps obvious, even trivial, but worth stating: the intervention, X, is the same in both cases. Clearly, this must be so if the experiment is to be carried out in ignorance of which hypothesis is true.
The second feature is that we must be able to predict in advance what the outcome O would be for each of the two possible hypotheses. We must be able to deduce that if H1 is true, then after applying X, we would observe O1, and that if H2 is the case, then after X, we would observe O2. This combination of stating hypotheses with deducing their consequences explains why this approach is sometimes called the hypothetico-deductive method (it is also the core of what the medical profession refers to as ‘differential diagnosis’). Solomon’s grasp of psychology told him that if the first woman was the mother, she would rather give up the baby than see it killed, but that the second woman would not care: outcome O1. Similarly, if the second woman was the mother, she would offer to give it up: outcome O2. He could predict this in advance, even without knowing which alternative was true.
The third feature is that the observations O1 and O2 should differ. Ideally, there is no ambiguity or overlap between the two outcomes. In that case, if we know that O1 has happened, then we know that O2 has not happened, and so the only possible conclusion we can draw is that H2 is false.3 Since one of the hypotheses must be true, it has to be H1. We know in advance in this no-overlap case that the outcome of the experiment will show decisively, one way or the other, which hypothesis is correct. At the opposite extreme, when the observations are predicted to be identical for the two hypotheses, it would be foolish to bother with that intervention: no useful information could be obtained which would help distinguish the hypotheses, and the whole exercise would be a waste of effort. In realistic cases, the situation may lie somewhere between these two extremes, as we will see below.
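One way to make the two extremes and the in-between case concrete is to attach probabilities to the outcomes. The sketch below is an illustration only: the Bayesian framing, the function name `posterior_h1`, and the numbers are assumptions introduced here, not something taken from the text. It asks how strongly an observation favours H1 when each hypothesis assigns that observation some probability.

```python
def posterior_h1(p_obs_given_h1: float, p_obs_given_h2: float,
                 prior_h1: float = 0.5) -> float:
    """Probability that H1 is true after the observation, by Bayes' rule.

    Assumes exactly one of H1 and H2 is true (hypothetical set-up).
    """
    joint_h1 = p_obs_given_h1 * prior_h1
    joint_h2 = p_obs_given_h2 * (1.0 - prior_h1)
    total = joint_h1 + joint_h2
    if total == 0.0:
        # Neither hypothesis allows this observation at all.
        raise ValueError("observation is impossible under both hypotheses")
    return joint_h1 / total


# No overlap: the observation cannot occur under H2, so H1 is certain.
print(posterior_h1(1.0, 0.0))

# Identical predictions: the observation leaves the prior unchanged.
print(posterior_h1(0.7, 0.7))

# Partial overlap: the observation favours H1 without settling the matter.
print(posterior_h1(0.8, 0.2))
```

With no overlap the result is certainty, with identical predictions nothing is learned, and partial overlap gives something in between.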
I have of course simplified the situation in the two-alternative diagram given above. There may be three, four or even more possible hypotheses in any particular case. There may also be other potential hypotheses that we have not considered, but should have done. In the Solomon case, this problem did not arise because it could be assumed that one of the women was the true mother. But in the second case, where the only options considered by the medical team were lymphoma and Takayasu’s arteritis, it eventually turned out that the true cause of the illness was neither of these. This illustrates a valuable lesson: an experimental design is only as good as the assumptions that go into it, and the conclusions are only dependable if the true hypothesis has actually been included in the initial set of hypotheses.
Of course, if we are lucky enough to generate experimental data that single out one hypothesis, say H3, from the whole set as the true one, that is not the end of the matter. No hypothesis is either complete or totally accurate, and in practice the problem of improving its precision and range remains. The next stage might be to elaborate H3 into a set of more detailed refinements, H31, H32, H33, and to discover which one is most nearly correct; and so on.
Note that the sort of application of the hypothetico-deductive method described above might be described as using the method in parallel: different hypotheses are being compared at the same time. The more traditional view (as for example in Medawar’s book, see Further Reading) uses the method applied in series: a particular hypothesis is tested under a sequence of different conditions, to discover at what point it breaks down; a modified hypothesis is then created to take account of this, which is in turn tested to destruction and a second modification is made, and so on. I would suggest that in practice, the parallel method is more difficult to use well, but more productive when it does work.
I also have not discussed what a set of possible hypotheses might look like in the case of psychology research. Both examples given above referred to very specific instances involving certain individuals, whereas research usually deals with hypotheses that are expected to apply across a whole range of instances and to have a degree of generality. Science is sometimes said to have the making of general statements as one of its central aims. To illustrate a more general situation with an actual published paper, consider the experiment described in Kirkby et al. (2011),4 in which the ‘magnocellular theory’ of dyslexia is put to the test.
This hypothesis claims that developmental dyslexia is caused by a general visual impairment, specifically by deficient binocular coordination. It is known that children with dyslexia show poor binocular coordination when reading, and the magnocellular hypothesis maintains that this is the underlying condition that causes the reading difficulties in dyslexia. This is the first hypothesis. The second hypothesis is that, on the contrary, the underlying cause of dyslexia is not a simple visual impairment, but a higher-level cognitive impairment in the processing of actual words compared with non-verbal stimuli.
The researchers reasoned that if the first hypothesis were true, then children with dyslexia would show similar impairments compared with typical children in both a reading task and a dot-scanning task, which did not involve processing words but required eye movements similar to those involved in reading, and was therefore a measure of binocular coordination. If the second hypothesis were true, however, the performance of the dyslexic children on the dot-scanning task would be normal compared with typical children, but they would be impaired on the reading task.
Since the two outcomes are predicted to be different on the basis of the two hypotheses, this study evidently used the Bacon paradigm. The study in fact discovered that the binocular coordination deficit in the dyslexic children tested was specific to the reading task, and was not apparent in the dot-scanning task. The authors concluded that the first hypothesis could be rejected in favour of the second one.
In my experience, this kind of example is quite rare. Few research papers in psychology directly compare two (or more) alternative causal hypotheses as in the Kirkby et al. paper. The majority of papers in psychology involve a simpler idea, explained below. However, even these papers often include a causal hypothesis comparison at some point, as a subsidiary argument. This involves what are known as ‘construct confounds’. Put simply, if I apply an experimental treatment to a group of people, and observe an effect, the effect may not be due to the treatment itself but to something different that is inseparable from the treatment but distinct from it.
For example, if a sample ...