PART 1
Inference and Evidence in Medieval Books
Chapter 1
The Calculus of Calculus: W.W. Greg and the Mathematics of Everyman Editions
(with Rosemary A. Roberts)
There are four recorded sixteenth-century copies of Everyman, each from a different edition: two by John Skot (1528, STC 10606; 1535, STC 10606.5) and two by Richard Pynson (1515, STC 10604; 1526, STC 10604.5). The two Skot copies are complete. The first Pynson copy is a four-leaf fragment (including the colophon); the second Pynson copy lacks signature A (12 leaves). In 1910, in the final volume of his critical editions of these fragments, W.W. Greg considered what these surviving copies indicate about the popularity of this text: how many editions might have been produced in the early sixteenth century? Since no two of the four surviving copies are from the same edition, what can we say about the total number of editions that were produced? Greg’s note is as follows:
It is obvious that, if no more than 4 editions are printed, it is very unlikely that, of 4 surviving copies, each should belong to a different edition (in point of fact the chance is only 3/32 or about 1 in 11), and that as the number of editions printed increases so does the probability of such an occurrence. There must therefore be a point (a particular number of editions) at which the chance approximates most nearly to 1/2. That number is 10, for which the actual chance is 1/2 + 1/250. Ten, therefore, is the smallest number of editions which make the actually occurring arrangement as likely as not to occur.1
Greg attributes the solution to this problem and the mathematics to J.E. Littlewood, professor of mathematics at Trinity College, Cambridge. But Greg does not make explicit the nature of the precise problem he presented to Littlewood, nor what assumptions he included with that problem.
Our purpose here is to reconstruct those assumptions, and to deal more generally with the implications of the use of probability in such bibliographical problems. We will deal with the question in two parts, based on the two sets of assumptions that Greg and his mathematician seem to have used. In part 1, we will first reconstruct the mathematical model that was used by Greg and Littlewood. The assumptions specific to the various calculations are as follows, and Greg and Littlewood must at some point have discussed these explicitly:
The total number of books is “large” (say, “over 100”; Greg’s calculations are not applicable if the hypothetical editions consist of only one or two members);
All editions have approximately the same number of books.
We will then relax these restrictions in order to see to what extent what might be called the “Printers of the Mind” situation (under which all printers print the same number of book-copies in each edition) operates in the real world of printing (where printers produce editions of different sizes).
In part 2, we will deal explicitly and critically with a second set of assumptions, those that are necessary to construct the mathematical model. These are the following:
All book-copies under consideration have an equal chance of surviving;
Such book-copies survive independently of each other;
Each book-copy either survives or it does not survive.
Such fundamental assumptions were perhaps never articulated by Greg and Littlewood in their discussions, and Greg’s note contains not even an allusion to them. As a mathematician, Littlewood might have found such assumptions so basic as hardly to deserve mention; but for bibliography, they are of course extremely problematic.
Our expectations were both mathematical and bibliographical. Mathematically, we expected that the solution to the problem would depend on fairly restrictive assumptions of regularity: uniform edition size and minimum edition size. These expectations were not entirely accurate, and there are interesting implications here easily brought out through the use of a modern calculator that might well have escaped Greg and Littlewood. Bibliographically, we expected to find that material and quantitative evidence such as that at issue here can only support pre-existent assumptions—that any mathematical model would do no more than reflect the set of initial bibliographical assumptions brought to it. Again, the mathematics suggests that this expectation also needs modification.
Part 1: The Greg-Littlewood Solution
According to Greg, if only four editions were printed, the likelihood that four surviving book-copies will belong to four separate editions is 3/32. If we take extreme cases (for example, edition sizes of one), we see that this figure does not apply to all cases: if each print-run is one, then the chance that four extant copies belong to four separate editions is 100% regardless of the number of editions. These figures, therefore, seem to be based on at least one unstated assumption: (1) that the total number of book-copies is sufficiently “large.” A second assumption is also necessary here: (2) that each edition is the same size. This is the restriction that we will relax later in the discussion.2
Let there be $n$ editions with $k_1, k_2, k_3, \ldots, k_n$ the number of book-copies in each edition. We assume uniform edition-size, that is: $k_1 = k_2 = k_3 = \cdots = k_n$. We will call this edition-size $k$. The probability that four surviving book-copies will come from different editions is:
$$\frac{\binom{n}{4}\,k^4}{\binom{nk}{4}}$$
In this formula, $\binom{n}{4}$ is the number of ways to choose the 4 editions from $n$ editions;
$k^4$ is the number of ways to choose 1 of the $k$ book-copies from each of the four chosen editions; and $\binom{nk}{4}$ is the number of ways to choose 4 book-copies from the $nk$ available.
Now:
$$\frac{\binom{n}{4}\,k^4}{\binom{nk}{4}} = \frac{n(n-1)(n-2)(n-3)\,k^4}{nk\,(nk-1)(nk-2)(nk-3)}$$
As long as $nk$, the total number of book-copies, is “large” (about 100 or more) this is approximately:
$$\frac{n(n-1)(n-2)(n-3)}{n^4}$$
For values of n, the number of editions, ranging from 4 to 10, we get the following values for the probability that 4 copies belong to different editions. This is clearly the same series and the same solution presented to Greg by Littlewood:
Number of Editions | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Probability | 3/32 | 24/125 | 60/216 | 120/343 | 210/512 | 336/729 | 504/1000 |
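The series in the table can be checked with exact rational arithmetic. The following is a minimal sketch in Python, assuming the large-total approximation n(n-1)(n-2)(n-3)/n^4 described above; the function name `approx_probability` is ours, introduced only for illustration.

```python
from fractions import Fraction


def approx_probability(n: int) -> Fraction:
    """Approximate chance that 4 surviving copies come from 4 different
    editions, given n equal-sized editions and a large total number of
    book-copies: n(n-1)(n-2)(n-3) / n^4."""
    if n < 4:
        # Fewer than four editions cannot yield four distinct editions.
        return Fraction(0)
    return Fraction(n * (n - 1) * (n - 2) * (n - 3), n ** 4)


if __name__ == "__main__":
    for n in range(4, 11):
        p = approx_probability(n)
        # Fraction reduces to lowest terms, so 60/216 is shown as 5/18, etc.
        print(n, p, round(float(p), 4))
```

Because `Fraction` reduces each value to lowest terms, some entries will print in a different but equivalent form from the table (60/216, for instance, appears as 5/18).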
As the number of editions increases beyond 10, the probability increases and can be made arbitrarily close to one. Greg chose to focus on the number of editions needed to make the probability exceed 1/2; we call this the “magic number.” Clearly, however, it is possible for the four surviving book-copies to come from different editions with any number of editions (as long as there are four or more) and Greg’s choice of 1/2 is somewhat arbitrary.
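The arbitrariness of the 1/2 threshold can be made concrete by asking the same question for other thresholds. The sketch below, again assuming equal-sized editions and the large-total approximation, searches for the smallest number of editions at which the probability reaches a chosen threshold. The names `approx_probability` and `smallest_n` are hypothetical helpers of our own, not anything found in Greg's note.

```python
from fractions import Fraction


def approx_probability(n: int) -> Fraction:
    """Large-total approximation: n(n-1)(n-2)(n-3) / n^4."""
    return Fraction(n * (n - 1) * (n - 2) * (n - 3), n ** 4)


def smallest_n(threshold: Fraction) -> int:
    """Smallest number of editions n (n >= 4) for which the approximate
    probability is at least `threshold`.

    The probability increases with n and approaches, but never reaches, 1,
    so the threshold must lie strictly between 0 and 1.
    """
    if not (0 < threshold < 1):
        raise ValueError("threshold must be strictly between 0 and 1")
    n = 4
    while approx_probability(n) < threshold:
        n += 1
    return n


if __name__ == "__main__":
    # 1/2 is Greg's "as likely as not"; the other thresholds are our own
    # illustrative choices.
    for t in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(9, 10)):
        print(t, smallest_n(t))
```

With a threshold of 1/2 the search returns ten, in agreement with Greg; stricter or looser thresholds return different numbers of editions, which is the sense in which the choice of 1/2 is a convention and not a result.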
Since the figures in the above series are exactly Greg’s figures (504/1000 = 1/2 + 1/250), it appears that Greg and Littlewood simply assumed for these calculations that all editions were the same size. Based on what were generally accepted edition sizes (in the early twentieth century, these might have been estimated at between 500–1500 for pamphlets such as Everyman), it is interesting to see what the definition of “large” must be in order for Greg’s math to work. Contrary to our expectations, the formula is not particularly sensitive to variation in edition size, and works quite well for any real world estimates.
Assuming edition sizes of 1000, the specific probabilities of four surviving copies belonging to four separate editions are 3.005/32 (if there are only four editions) and 5.043/10 (if there are ten editions); the number of editions required to produce a greater than 50% chance that four copies will belong to four separate editions is ten. Greg’s figures are nearly the same. Assuming an edition size of 500 yields the following corresponding probabilities: 3.009/32 and 5.046/10. There is no significant change. If we assume 100 book-copies per edition (perhaps the lowest figure anyone might accept), the corresponding fractions are 3.045/32 and 5.070/10. Again, there is little change. At 10 book-copies per edition, there is again, much to our surprise, little change in these fractions: 3.501/32 and 5.355/10. At this edition size the “magic number,” where the probability most nearly approaches 1/2, is 9 (.4931), but to attain more than a 50% chance that the four copies will belong to four separate editions still requires ten editions.
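These figures can be recomputed directly from the counting argument, without the large-total approximation. The following is a sketch under the assumption of uniform edition size k: the number of ways to choose 4 of the n editions and one copy from each, divided by the number of ways to choose any 4 of the nk book-copies. The function names are ours.

```python
from math import comb


def exact_probability(n: int, k: int) -> float:
    """Exact chance that 4 surviving copies come from 4 different editions,
    given n editions of k book-copies each (uniform edition size assumed)."""
    return comb(n, 4) * k ** 4 / comb(n * k, 4)


def editions_needed_to_exceed_half(k: int) -> int:
    """Smallest n (n >= 4) for which the exact probability exceeds 1/2."""
    n = 4
    while exact_probability(n, k) <= 0.5:
        n += 1
    return n


if __name__ == "__main__":
    for k in (1000, 500, 100, 10):
        p4 = exact_probability(4, k)    # only four editions
        p10 = exact_probability(10, k)  # ten editions
        # Express the results over 32 and over 10, as in the text.
        print(k, round(32 * p4, 3), round(10 * p10, 3),
              editions_needed_to_exceed_half(k))
```

Run for edition sizes of 1000, 500, 100, and 10, the results should agree with the fractions quoted above to about three decimal places. With an edition size of one, the function returns exactly 1 for any n of four or more, which is the extreme case noted earlier.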
Under the assumption of editions of uniform size, then, the calculations of Greg and Littlewood seem to hold for any range of edition-size that bibliographers would accept. The probability is not particularly sensitive to edition size—a result we found somewhat surprising.
Now let us relax the operative assumption here of uniform edition size (that is, that $k_1 = k_2 = k_3 = \cdots = k_n$); we doubt Greg and Littlewood went this far. The mathematics is slightly more complex, but the formulae are simple variants of the formulae above. Again let there be $n$ editions with $k_1, \ldots, k_n$ the number of book-copies in each edition. It is convenient to call the average edition size $\bar{k}$. Then the total number of book-copies is $n\bar{k}$.
Suppose that the four surviving books come from editions 1, 2, 3, and 4. The probability of this is:
$$\frac{k_1 k_2 k_3 k_4}{\binom{n\bar{k}}{4}}$$
Clearly this depends on $k_1$, $k_2$, $k_3$, and $k_4$, the number of book-copies in editions 1, 2, 3, and 4. If we calculate the probability of the four surviving books coming from editions 1, 2, 3, and 5, this...