Experimental Designs
eBook - ePub

Barak Ariel, Matthew Bland, Alex Sutherland

  • 192 pages
  • English
  • ePUB (mobile friendly)

About This Book

The fourth book in The SAGE Quantitative Research Kit, this resource covers the basics of designing and conducting experiments, outlining the various types of experimental designs available to researchers, while providing step-by-step guidance on how to conduct your own experiment.

As well as an in-depth discussion of Randomised Controlled Trials (RCTs), this text highlights effective alternatives to this method and includes practical steps on how to successfully adopt them. Topics include:

· The advantages of randomisation
· How to avoid common design pitfalls that reduce the validity of experiments
· How to maintain controlled settings and pilot tests
· How to conduct quasi-experiments when RCTs are not an option

Practical and succinctly written, this book will give you the know-how and confidence needed to succeed on your quantitative research journey.

1 Introduction

Chapter Overview

  • Contextualising randomised experiments in a wide range of causal designs
  • Causal designs and the scientific meaning of causality
  • Why should governments and agencies care about causal designs?
  • Further Reading
Formal textbooks on experiments first surfaced more than a century ago, and thousands have emerged since then. In the field of education, William McCall published How to Experiment in Education in 1923; R.A. Fisher, a Cambridge scholar, published Statistical Methods for Research Workers and The Design of Experiments in 1925 and 1935, respectively; S.S. Stevens followed with his Handbook of Experimental Psychology in 1951. We also have D.T. Campbell and Stanley’s (1963) classic Experimental and Quasi-Experimental Designs for Research, and primers like Shadish et al.’s (2002) Experimental and Quasi-Experimental Designs for Generalised Causal Inference, which has been cited nearly 50,000 times. These foundational texts provide straightforward models for using experiments in causal research within the social sciences.
Fundamentally, this corpus of knowledge shares a long-standing methodological theme: when researchers want to draw causal inferences about the relationship between interventions and outcomes, they need to conduct experiments. The basic model for demonstrating cause-and-effect relationships relies on a formal, scientific process of hypothesis testing, and this process is embodied in the experimental design. One of its fundamental requirements is that causal inference necessarily involves a comparison. A valid test of any intervention requires a condition against which the treated group (or units) can be compared – what is termed a counterfactual. Put another way, evidence of ‘successful treatment’ is always relative to a world in which the treatment was not given (D.T. Campbell, 1969). Whether the treatment group is compared to itself prior to exposure to the intervention, to a separate group of cases unexposed to the intervention, or even to some predefined criterion (like a national average or median), a contrast is needed. While others might disagree (e.g. Pearl, 2019), without an objective comparison we cannot talk about causation.
Causation theories are found in different schools of thought (for discussions, see Cartwright & Hardie, 2012; Pearl, 2019; Wikström, 2010). The dominant causal framework is that of ‘potential outcomes’ (or the Neyman–Rubin causal framework; Rubin, 2005), which we discuss herein and which many of the designs and examples in this book use as their basis. Until mainstream experimental disciplines revise the core foundations of the standard scientific inquiry, one must be cautious when recommending public policy based on alternative research designs. Methodologies based on subjective or other schools of thought about what causality means will not be discussed in this book. To emphasise, we do not discount these methodologies and their contribution to research, not least for developing logical hypotheses about the causal relationships in the universe. We are, however, concerned about risks to the validity of these causal claims and how well they might stand a chance of being implemented in practice. We discuss these issues in more detail in Chapter 4. For further reading, see Abell and Engel (2019) as well as Abend et al. (2013).
However, not all comparisons are equally informative. To infer that a policy or change was ‘effective’, researchers need to be sure that the comparison group that was not exposed to the intervention resembles the group that was exposed to the intervention as much as possible. If the treatment group and the no-treatment group are incomparable – not ‘apples to apples’ – it then becomes very difficult to ‘single out’ the treatment effect from pre-existing differences. That is, if two groups differ before an intervention starts, how can we be sure that it was the introduction of the intervention and not the pre-existing differences that produced the result?
Having confidence in the conclusions we draw from studies that look at the causal relationship between interventions and their outcomes means having only one attributable difference between treatment and no-treatment conditions: the treatment itself. If this requirement is not met, any observed difference between the treatment and no-treatment groups can be attributed to other explanations. Rival hypotheses (and evidence) can then falsify – or confound – the hypothesis about the causal relationship. In other words, if the two groups are not comparable at baseline, then it can reasonably be argued that the outcome was caused by inherent differences between the two groups of participants, by differences in the settings in which data on the two groups were collected, or by differences in the ways in which eligible cases were recruited into the groups. Collectively, these plausible alternative explanations for the observed outcome, other than the treatment effect, undermine the test. Therefore, a reasonable degree of ‘pre-experimental comparability’ between the two groups is needed, or else the claim of causality becomes speculative. We devote considerable attention to this issue throughout the book, as all experimenters share this fundamental concern regarding equivalence.
Experiments are then split into two distinct approaches to achieve pre-experimental comparability: statistical designs and randomisation. Both aim to facilitate equitable conditions between treatment and control conditions but achieve this goal differently. Statistical designs, often referred to as quasi-experimental methods, rely on statistical analysis to control and create equivalence between the two groups. For example, in a study on the effect of police presence on crime in particular neighbourhoods, researchers can compare the crime data in ‘treatment neighbourhoods’ before and after patrols were conducted, and then compare the results with data from ‘control neighbourhoods’ that were not exposed to the patrols (e.g. Kelling et al., 1974; Sherman & Weisburd, 1995). Noticeable differences in the before–after comparisons would then be attributed to the police patrols. However, if there are also observable differences between the neighbourhoods or the populations who live in the treatment and the no-treatment neighbourhoods, or the types of crimes that take place in these neighbourhoods, we can use statistical controls to ‘rebalance’ the groups – or at least account for the differences between groups arising from these other variables. Through statistically controlling for these other variables (e.g. Piza & O’Hara, 2014; R.G. Santos & Santos, 2015; see also The SAGE Quantitative Research Kit, Volume 7), scholars could then match patrol and no-patrol areas and take into account the confounding effect of these other factors. In doing so, researchers are explicitly or implicitly saying ‘this is as good as randomisation’. But what does that mean in practice?
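To make the idea of statistical control concrete, here is a minimal sketch in Python using simulated data; the variable names crime_before, crime_after and patrolled are invented for illustration and do not come from the studies cited above. It contrasts a naive comparison of patrolled and non-patrolled neighbourhoods with a regression that adjusts for baseline crime:

```python
# A minimal sketch of statistical control in a quasi-experimental comparison.
# All data and variable names are simulated for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200  # number of neighbourhoods

# Patrolled areas start out with more crime: a pre-existing difference.
crime_before = rng.poisson(50, n)
patrolled = (crime_before + rng.normal(0, 5, n) > 50).astype(int)
# Simulated patrol effect of roughly -5 offences, plus noise.
crime_after = crime_before - 5 * patrolled + rng.normal(0, 5, n)

df = pd.DataFrame({"crime_after": crime_after,
                   "crime_before": crime_before,
                   "patrolled": patrolled})

# A naive comparison mixes the patrol effect with baseline differences...
naive = smf.ols("crime_after ~ patrolled", data=df).fit()
# ...while adjusting for baseline crime 'rebalances' the comparison statistically.
adjusted = smf.ols("crime_after ~ patrolled + crime_before", data=df).fit()

print("naive estimate:   ", round(naive.params["patrolled"], 2))
print("adjusted estimate:", round(adjusted.params["patrolled"], 2))
```

In this toy example, the naive estimate is distorted because patrolled areas started out with more crime, while the adjusted estimate recovers something close to the simulated effect – which is what ‘controlling for’ a confounder is meant to achieve, provided the confounder is actually measured.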
While on the one hand we have statistical designs, on the other we have experiments that use randomisation, which relies on the mathematical foundations of probability theory (as discussed in The SAGE Quantitative Research Kit, Volume 3). Probability theory postulates that, through the process of randomly assigning cases into treatment and no-treatment conditions, experimenters have the best shot of achieving pre-experimental comparability between the two groups. This is owing to the law of large numbers (or the ‘logic of science’, according to Jaynes, 2003). Allocating units at random does, with a large enough sample, create balanced groups. As we illustrate in Chapter 2, this balance is not just apparent for observed variables (i.e. what we can measure) but also for the unobserved factors that we cannot measure (cf. Cowen & Cartwright, 2019). For example, we can match treatment and comparison neighbourhoods in terms of crimes reported to the police before the intervention (patrols), and thereby create balance in terms of this variable (Saunders et al., 2015; see also Weisburd et al., 2018). However, we cannot create true balance between the two groups if we do not have data on unreported crimes, which may be very different in the two neighbourhoods.
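The balancing property described above can be illustrated with a toy simulation (a rough sketch, not an example from the book): units are allocated to two arms at random, and the gap between the arms in both an observed covariate and an ‘unobserved’ one – standing in for something like unreported crime – shrinks towards zero as the sample grows:

```python
# A rough simulation of random allocation balancing observed and unobserved
# covariates as the sample grows; 'unreported' stands in for a factor that
# could never be measured in practice (names are illustrative only).
import numpy as np

rng = np.random.default_rng(7)

for n in (20, 200, 2000, 20000):
    reported = rng.poisson(30, n)     # an observed baseline covariate
    unreported = rng.poisson(70, n)   # an unobserved covariate
    # Randomly allocate exactly half the units to treatment.
    treat = rng.permutation(np.repeat([True, False], n // 2))
    gap_observed = reported[treat].mean() - reported[~treat].mean()
    gap_unobserved = unreported[treat].mean() - unreported[~treat].mean()
    print(f"n={n:>6}  observed gap={gap_observed:+.2f}  unobserved gap={gap_unobserved:+.2f}")
```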
We cannot use statistical controls where no data exist or where we do not measure something. The randomisation of units into treatment and control conditions largely mitigates this issue (Farrington, 2003a; Shadish et al., 2002; Weisburd, 2005). This quality makes randomised experiments, in the eyes of many, a superior approach to other designs when it comes to making causal claims (see the debates about ‘gold standard’ research in Saunders et al., 2016). Randomised experiments have what is called a high level of internal validity (see reviews in Grimshaw et al., 2000; Schweizer et al., 2016). What this means is that, when properly conducted, a randomised experiment gives one the greatest confidence that the effect(s) observed arose because of the cause (randomly) introduced by the experiment, and not because of something else.
The parallel concept – external validity – refers to the extent to which the results from an experiment apply elsewhere in the world. Lab-based randomised experiments typically have very high internal validity but very low external validity, because their conditions are highly regulated and not replicable in a ‘real-world’ scenario. We review these issues in Chapter 3.
Importantly, random allocation means that randomised experiments are prospective, not retrospective – that is, they test forthcoming interventions rather than ones that have already been administered and for which data have already been produced. Prospective studies allow researchers to maintain more control compared to retrospective studies. The researcher is involved in the very process of case selection, treatment fidelity (the extent to which a treatment is delivered or implemented as intended) and the data collated for the purposes of the experiment. Experimenters using random assignment are therefore involved in the distribution and management of units into different real-life conditions (e.g. police patrols) ex ante and not ex post. As the scholar collaborates with a treatment provider to jointly follow up on cases and observe variations in the measures within the treatment and no-treatment conditions, they are in a much better position to provide assurance that the fidelity of the test is maintained throughout the process (Strang, 2012). These features rarely exist in quasi-experimental designs, but at the same time, randomised experiments require scientists to pay attention to maintaining proper controls over the administration of the test. For this reason, running a randomised controlled trial (RCT) can be laborious.
In Chapter 5, we cover an underutilised instrument – the experimental protocol – and illustrate the importance of conducting a pre-mortem analysis: designing and crafting the study before venturing out into the field. The experimental protocol requires the researcher to address ethical considerations: how we can secure the rights of the participants while advancing scientific knowledge through interventions that might violate these rights. For example, in policing experiments where the participants are offenders or victims, participants do not have the right to consent; the policing strategy applied in their case is predetermined, as when offenders are mandated by a court to attend a treatment for domestic violence. However, the allocation of the offenders into any specific treatment is conducted randomly (see Mills et al., 2019). Of course, if we know that a particular treatment yields better results than the comparison treatment (e.g. reduces rates of repeat offending compared to the rates of reoffending under control conditions), then there is no ethical justification for conducting the experiment. When we do not have evidence that supports the hypothesised benefit of the intervention, however, it is unethical not to conduct an experiment. After all, the existing intervention for domestic batterers can cause backfiring effects and lead to more abuse. This is where experiments are useful: they provide evidence on relative utility, based on which we can make sound policy recommendations. Taking these points into consideration, the researcher has a duty to minimise these and other ethical risks as much as possible through a detailed plan that forms part of the research documentation portfolio.
Vitally, the decision to randomise must then be followed by the question of which ‘units’ are the most appropriate for random allocation. This is not an easy question to answer because there are multiple options, so the choice is not purely theoretical but also pragmatic. The decision is shaped by the very nature of the field, the settings and previous tests of the intervention. Some units are more suitable for addressing certain theoretical questions than others; the size of the study matters, as does the dosage of the treatment. Data availability and feasibility also shape these choices. Experimenters then need to consider a wide range of methods for actually conducting the random assignment, choosing between simple, ‘trickle flow’, block, cluster, stratified and other, perhaps more nuanced and bespoke, random allocation designs. We review each of these design options in Chapter 2.
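As a flavour of what one such allocation mechanism looks like in code, below is a minimal sketch of permuted-block random assignment; the block size of four and the function name block_randomise are illustrative choices rather than recommendations from the text:

```python
# A minimal sketch of permuted-block random assignment; the block size of 4
# and the function name are illustrative choices, not taken from the book.
import random

def block_randomise(n_units, block_size=4, seed=2024):
    """Allocate n_units to 'treatment'/'control' in shuffled blocks, so the
    two arms stay balanced in size after every completed block."""
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_units:
        block = ["treatment"] * (block_size // 2) + ["control"] * (block_size // 2)
        rng.shuffle(block)           # permute the order within the block
        allocation.extend(block)
    return allocation[:n_units]

print(block_randomise(10))
```

Block designs of this kind keep the two arms balanced in size throughout recruitment, which is especially useful in ‘trickle flow’ settings where cases arrive one at a time.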
We then discuss issues of control in some detail in Chapter 3. The mechanisms used to administer randomised experiments are varied, and the technical literature on these matters is rich. Issues of group imbalance, sample size and measurement are all closely linked to achieving an unbiased experiment. Consideration of these problems begins in the planning stage, with a pre-mortem assessment of the possible pitfalls that can lead the experimenter to lose control over the test (see Klein, 2011). Researchers need to be aware of threats to the internal validity, as well as the external validity, of the experimental tests, and find ways to avoid them during the experimental cycle. We turn to these concerns in Chapter 3 as well.
In Chapter 4, we account for the different types of experimental designs available in the social sciences. Some are as ‘simple’ as following up with a group of participants after their exposure to a given treatment, having been randomly assigned into treatment and control conditions, while others are more elaborate, multistage and complex. The choice of applying one type of test and not another is both conceptual and pragmatic. We rely heavily on classic texts by D.T. Campbell and Stanley (1963), Cook and Campbell (1979) and the amalgamation of these works by Shadish et al. (2002), which detail the mechanics of experimental designs, in addition to their rationales and pitfalls. However, we provide more updated examples of experiments that have applied these designs within the social sciences. Many of our examples are criminological, given our backgrounds, but are applicable to other experimental disciplines.
Chapter 4 also covers some common types of quasi-experimental designs that can be used when conditions are not conducive to random assignment (see Shadish et al., 2002, pp. 269–278). Admittedly, much of the evidence base in causal research relies on statistical techniques, including the regression discontinuity design, propensity score matching, the difference-in-differences design, and many others. We introduce these approaches and refer the reader to the technical literature on how to draw causal inferences with these advanced statistics.
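For readers who want a concrete sense of one of these estimators, the sketch below implements a basic two-period difference-in-differences estimate as the interaction term in an ordinary least squares regression; the simulated data and variable names are invented purely for illustration:

```python
# A basic two-period difference-in-differences sketch using an OLS interaction
# term; the data and variable names are simulated purely for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1000
treated = rng.integers(0, 2, n)   # 1 = exposed area, 0 = comparison area
post = rng.integers(0, 2, n)      # 1 = observation after the intervention
# Outcome with a group difference, a common time trend and a -3 treatment effect.
outcome = 10 + 4 * treated - 2 * post - 3 * treated * post + rng.normal(0, 1, n)

df = pd.DataFrame({"outcome": outcome, "treated": treated, "post": post})
did = smf.ols("outcome ~ treated * post", data=df).fit()

# The interaction coefficient is the difference-in-differences estimate (about -3).
print(round(did.params["treated:post"], 2))
```

The same logic extends to richer panel settings, but the technical literature referred to above should be consulted before using any of these estimators in earnest.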
Before venturing further, we need to contextualise experiments within the wide range of study designs. Understanding the role that causal research plays in science, and what differentiates it from other methodological approaches, is a critical first step. To be clear, we do not argue that experiments are ‘superior’ to other methods; put simply, the appropriate research design follows the research question and the research settings. The utility of experiments lies in their ability to allow researchers to test specific hypotheses about causal relationships. Scholars interested in longitudinal processes, qualitative internal dynamics (e.g. perceptions) or descriptive assessments of phenomena use observational designs. These designs are a good fit for those lines of scientific inquiry. Experiments – and within this category we include both quasi-experimental designs and RCTs of various types – are appropriate when making causal inferences.
Finally, we defend the view that precisely the same arguments can be made by policymakers who are interested in evidence-based policy: experiments are needed for impact evaluations, preferably with a randomisation component of allocating...
