1
Why Conduct Phase I–II Trials?
CONTENTS
1.1 The Conventional Paradigm
1.2 The Continual Reassessment Method
1.3 Problems with Conventional Dose-Finding Methods
1.3.1 3+3 Algorithms
1.3.2 Some Comparisons
1.3.3 Problems Going from Phase I to Phase II
1.3.4 Consequences of Ignoring Information
1.3.5 Late-Onset Outcomes
1.3.6 Expansion Cohorts
1.3.7 Guessing a Schedule
1.3.8 Patient Heterogeneity
“The conventional view serves to protect us from the painful job of thinking.”
John Kenneth Galbraith
1.1 The Conventional Paradigm
In its simplest form, the conventional clinical trial paradigm for evaluating effects of a new agent or agents in humans has three phases. In phase I, the goal is to determine an acceptably safe dose based on Toxicity. For a single agent, a set of doses to be explored is specified. If two agents given in combination are to be studied, then a set of dose pairs is specified, either as a rectangular grid or a staircase-shaped set where only one of the two doses increases with each step. In some phase I trials, doses are chosen from a continuum, either an interval for one agent or a rectangular set for a pair of agents (Thall et al., 2003; Huo et al., 2012). Phase I is conducted by treating small cohorts of patients, most commonly of size 3, 2, or 1. The dose, or dose pair, of each successive cohort is chosen adaptively based on the dose-Toxicity data of previous cohorts. Depending on the method used, each cohort’s dose may be above, below, or the same as that of the previous cohort. The final chosen dose often is called the “maximum tolerated (or tolerable) dose” (MTD). Conventional phase I designs do not use Efficacy in the dose-finding algorithm, but usually record it as a secondary outcome. In some phase I trials, a so-called “expansion cohort” of some specified number of additional patients is treated at the chosen MTD in order to obtain a more reliable estimator of the probability of Toxicity at that dose. While the idea of enrolling an expansion cohort at the MTD may seem intuitively sensible, serious problems that are likely to arise from this common practice will be discussed below.
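The intuition behind an expansion cohort, that more patients at the MTD give a more reliable Toxicity estimate, can be illustrated with the standard error of a binomial proportion. The following is a minimal sketch only: the true Toxicity probability of .25 and the sample sizes are assumed values chosen for illustration, and `binomial_se` is a hypothetical helper, not part of any design discussed here.

```python
import math


def binomial_se(p_tox: float, n: int) -> float:
    """Standard error of the sample proportion from n patients
    when the true Toxicity probability is p_tox."""
    return math.sqrt(p_tox * (1.0 - p_tox) / n)


# Assumed (illustrative) true Toxicity probability at the MTD.
p_tox = 0.25

# A single cohort of 3, then progressively larger totals at the same dose.
for n in (3, 6, 12, 24):
    print(f"n = {n:2d}: standard error = {binomial_se(p_tox, n):.3f}")
```

Under these assumptions, quadrupling the number of patients treated at the dose halves the standard error. This calculation addresses only the precision of the estimate at a fixed dose; it says nothing about whether that dose was a good choice.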
Phase I trials originated with evaluation of new cytotoxic agents, “chemotherapy,” for cancer. The underlying motivation is the assumption that, as dose is increased, the probabilities of both Toxicity and Efficacy must increase. The assumption about Efficacy is implicit, however, since Efficacy is not used for making decisions in conventional phase I trials. Logical flaws and undesirable consequences of making this implicit assumption and determining an MTD based on Toxicity while ignoring Efficacy will be discussed below.
Once an MTD has been determined in phase I, conventionally, treatment Efficacy is evaluated in phase II. Most phase II designs are based on a binary indicator of Efficacy, with secondary aims to estimate a variety of other outcomes, such as pharmacokinetic or pharmacodynamic variables. The rationale for using early Efficacy alone for phase II treatment evaluation has three components. The first is that, if the Efficacy event is achieved, this increases the probability of long-term patient benefit, such as extending survival time. The second is that, in early treatment evaluation, it often is impractical to wait to observe long-term events. The third is that, since an MTD has been determined in phase I, safety is no longer a major concern. Severe flaws with this third assumption, and their consequences, will be discussed below.
In oncology, most phase II trials of a new experimental treatment are single-arm, without randomization against a standard treatment. While randomized phase II trials are used more commonly in other disease areas, in oncology a culture of single-arm phase II trials persists. In phase II cancer trials, most commonly, Simon’s two-stage design (Simon, 1989) is used. This design includes an early stopping rule after its first stage. It has two versions, one optimized to minimize expected sample size and the other to minimize the maximum sample size, each subject to specified Type I and Type II error probability constraints. The Simon design is very simple and very easy to implement using a freely available computer program. It allows one to specify fixed null and alternative response probabilities and Type I and Type II error probabilities and, given these four input values, quickly compute sample sizes and decision cut-offs for the two stages. There are numerous other phase II designs, and a very large literature exists. More complex phase II designs may include more than one interim decision (Fleming, 1982), may accommodate delayed Efficacy (Cai et al., 2014a), or may monitor both Efficacy and Toxicity (Thall et al., 1995; Thall and Sung, 1998; Conaway and Petroni, 1995; Bryant and Day, 1995; Thall and Cheng, 1999). For phase II oncology settings where multiple experimental treatments must be screened, Simon et al. (1985) proposed using randomization to obtain unbiased treatment comparisons. At that time, this proposal was considered controversial. Unfortunately, despite the great advances of statistical knowledge over the past 30 years, randomization in phase II oncology trials remains controversial (Rubinstein et al., 2005; Sharma et al., 2011; Gan et al., 2010).
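To make the two-stage logic concrete, the following sketch evaluates the operating characteristics of a given two-stage design; it does not perform the search that produces an optimized design. It assumes the usual decision structure: stop after stage 1 if at most `r1` responses are seen in `n1` patients, otherwise enroll up to `n` patients in total and declare the treatment promising only if more than `r` responses are observed. The function names are hypothetical, and the numerical design (`r1 = 1`, `n1 = 10`, `r = 5`, `n = 29`, for response probabilities .10 versus .30) is used as an illustrative input rather than taken from the text above.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(binom_pmf(j, n, p) for j in range(0, min(k, n) + 1))


def two_stage_properties(r1: int, n1: int, r: int, n: int, p: float):
    """Return (probability of early termination, expected sample size,
    probability of declaring the treatment promising) when the true
    response probability is p."""
    n2 = n - n1
    # Early termination: at most r1 responses among the first n1 patients.
    pet = binom_cdf(r1, n1, p)
    # Continue to stage 2 (x1 > r1) but end with at most r responses in total.
    late_fail = sum(
        binom_pmf(x1, n1, p) * binom_cdf(r - x1, n2, p)
        for x1 in range(r1 + 1, min(n1, r) + 1)
    )
    prob_promising = 1.0 - pet - late_fail
    expected_n = n1 + (1.0 - pet) * n2
    return pet, expected_n, prob_promising


# Illustrative inputs (assumptions, not taken from the text above).
p0, p1 = 0.10, 0.30
design = dict(r1=1, n1=10, r=5, n=29)

pet0, en0, type_i = two_stage_properties(p=p0, **design)
_, _, power = two_stage_properties(p=p1, **design)

print(f"P(early termination | p0) = {pet0:.3f}")
print(f"Expected sample size | p0 = {en0:.1f}")
print(f"Type I error = {type_i:.3f}, power = {power:.3f}")
```

For these inputs, the probability of stopping after the first stage under the null response probability is roughly .74, the expected sample size under the null is about 15, the Type I error probability falls below .05, and the power exceeds .80. A design-search program would loop over candidate values of `r1`, `n1`, `r`, and `n`, keep those meeting the two error constraints, and select the one minimizing either the expected or the maximum sample size.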
If it is concluded based on a phase II trial that an experimental treatment is promising, in that its Efficacy probability is sufficiently high compared to some standard, this may motivate conduct of a confirmatory phase III trial. A typical phase III design is based on survival time or some other long-term outcome, randomizes patients between the experimental treatment and a standard therapy, uses a group sequential testing procedure to decide whether there is a significant treatment difference, controls the overall false positive decision probability to be ≤ .05, and has specified power, usually .80 or .90, to detect a specified treatment effect difference. Most phase III trials are very large, often last many years, cost millions of dollars, involve multiple medical centers, dozens of physicians, nurses, and other medical and administrative support personnel, and enroll hundreds or thousands of patients. While we will not explore phase III designs here, some useful references are Lan and DeMets (1983); Friedman et al. (2010); Jennison and Turnbull (1999).
Phase II–III designs are a separate topic, but they are quite important and thus worth mentioning. These designs hybridize phase II screening and confirmatory phase III into a single trial. While phase II–III designs certainly are not conventional, they provide practical solutions to severe problems ignored by the conventional clinical trial design paradigm. Unfortunately, phase II–III designs seldom are used in practice. The underlying idea for phase II–III trials was given initially by Ellenberg and Eisenberger (1985), although they did not formalize the methodology. Thall et al. (1988) provided the first...