The New S Language

Name: The New S Language
ISBN: 9781351091886

R. Becker,

720 pages
English

About this book

This book provides documentation for a new version of the S system released in 1988. The new S enhances the features that have made S popular: interactive computing, flexible graphics, data management and a large collection of functions. The new S features make possible new applications and higher-level programming, including a single unified language, user-defined functions as first-class objects, symbolic computations, more accurate numerical calculations and a new approach to graphics. S now provides direct interfaces to the powerful tools of the UNIX operating system and to algorithms implemented in Fortran and C.

Topic: Mathematics
Subtopic: Probability & Statistics

1 How to Beat the Lottery

One of the best ways of getting acquainted with S is to use it to help you understand a particular set of data. Let’s look at a situation where you might be motivated to perform data analysis.

1.1 Using S to Understand Data

The lottery is a common feature of modern life. Lotteries range from the Irish Sweepstakes, with its yearly large drawings and enormous payoffs, to daily numbers games run by state governments (as well as illegal games run by bookies).

You might wonder why we are presenting lottery data here. There are several answers. First, there is the traditional association between probability theory and gambling—the foundations of statistics go back to studies of games of chance. Lotteries raise many interesting questions. In fact, data analysis may be the only practical way of answering questions such as “Is the lottery fair?” A second reason is that the ubiquity of gambling and lotteries has acquainted almost everyone with the basic concepts involved. A third reason is that a scientific look at lottery data may provide answers to the important questions: “Should I play, and if so, how should I play?”

1.2 New Jersey Pick-It Lottery Data

The specific data we will look at concerns the New Jersey Pick-It Lottery, a daily numbers game run by the state of New Jersey to aid education and institutions. Our data is for 254 drawings just after the lottery was started, from May, 1975 to March, 1976. Pick-It is a parimutuel game, meaning that the winners share a fraction of the money taken in for the particular drawing. Each ticket costs fifty cents and at the time of purchase the player picks a three-digit number ranging from 000 to 999. Half of the money bet during the day is placed in a prize pool (the state takes the other half) and anyone who picked the winning number shares equally in the pool.
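To make the parimutuel rule concrete, here is a minimal sketch of the payoff arithmetic. It is written in Python rather than S purely for illustration; the function name `payoff_per_winner` and the figure of 100,000 tickets sold are assumptions, not part of the lottery data.

```python
TICKET_PRICE = 0.50    # dollars per ticket, as stated in the text
POOL_FRACTION = 0.5    # half of the money bet goes into the prize pool


def payoff_per_winner(tickets_sold, winning_tickets):
    """Payoff to each winning ticket when the pool is shared equally."""
    if winning_tickets <= 0:
        # The text does not say what happens to the pool when nobody wins.
        raise ValueError("no winning tickets")
    pool = POOL_FRACTION * TICKET_PRICE * tickets_sold
    return pool / winning_tickets


# Hypothetical day: 100,000 tickets sold (an invented figure).
for winners in (50, 100, 200):
    print(winners, payoff_per_winner(100_000, winners))
```

Under these assumptions, if bets were spread evenly over all 1000 numbers, about one ticket in a thousand would win and each winner would receive $250, whatever the day's sales. Payoffs above or below that level suggest a winning number that was less or more popular than average.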

The data available from the NJ Lottery Commission gives for each drawing the winning number and the payoff for a winning ticket. The winning numbers are:†

The corresponding payoffs are:

Thus, for the first drawing, the winning number was 810 and it paid $190.00 to each winning ticket holder. Streams of numbers like this are both difficult to use and boring. One of the best ways to understand the data is to look at it graphically. Before doing any plots, however, we should think of the questions we might want to ask of the data. For example, there have been notorious cases of fraud in lotteries (see Figure 1.1).

Although a single rigged drawing is something that we could not detect with our data, we may be able to detect long-term irregularities. Let’s look at the winning numbers to see if they appear to be chosen at random.

> hist(lottery.number) # Figure 1.2

The histogram looks fairly flat—no need to inform a grand jury.

Of course, most of our attention will probably be directed at the payoffs. Elementary probabilistic reasoning tells us that, unless we can predict the future or rig the lottery, a single number that we pick has a 1 in 1000 chance of winning. If we play many times, we expect about 1 winning number per 1000 plays. Since a ticket costs fifty cents, 1000 plays will cost $500, so we need to win at least $500 each time we win; otherwise we will lose money in the long run.
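The break-even reasoning can be checked with a few lines of arithmetic. This sketch (in Python, for illustration only) assumes every three-digit number is equally likely to be drawn; the names `ODDS`, `break_even_payoff` and `expected_net_per_ticket`, and the three trial payoffs, are ours rather than anything in the data.

```python
ODDS = 1000            # one winning number among 000..999
TICKET_PRICE = 0.50    # dollars per ticket, as stated in the text


def break_even_payoff():
    """Payoff at which the long-run winnings exactly cover the ticket cost."""
    return TICKET_PRICE * ODDS


def expected_net_per_ticket(payoff):
    """Average gain (negative means loss) per ticket if every win paid `payoff`."""
    return payoff / ODDS - TICKET_PRICE


print(break_even_payoff())  # 500.0

# Hypothetical payoffs below, at, and above the break-even point.
for payoff in (250.0, 500.0, 750.0):
    print(payoff, expected_net_per_ticket(payoff))
```

A payoff below $500 gives a negative expected net per ticket, so a player would need to win on days with above-break-even payoffs to come out ahead in the long run.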

Figure 1.2. Histogram of winning numbers from 254 lottery drawings. Since there are 10 bars, the count should be approximately 25 in each bar, if the winning numbers are drawn at random. The small bar at the left represents the one time that 000 was the winning number.

Let’s make a histogram of the payoffs.

> hist(lottery.payoff) # Figure 1.3

Figure 1.3. A histogram of the lottery payoffs shows that payoffs range from less than $100 to more than $800, although the bulk of the payoffs are between $100 and $400.

In our set of data there were a number of payoffs larger than $500—perhaps we have a chance. The widely varying payoffs are primarily due to the parimutuel betting in the lottery; if you win when few others win, you will get a large payoff. If you are unlucky enough to win along with lots of others, the payoff may be relatively small. Let’s see what the largest and smallest payoffs and corresponding winning numbers were:

> max(lottery.payoff) # the largest payoff
[1] 869.5
> lottery.number[lottery.payoff == max(lottery.payoff)]
[1] 499
> min(lottery.payoff) # the smallest payoff
[1] 83
> lottery.number[lottery.payoff == min(lottery.payoff)]
[1] 123

Winners who bet on “123” must have been disappointed; $83 is not a very large payoff. On the other hand, $869.50 is very nice.
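The expressions above work in two steps: the comparison `lottery.payoff==max(lottery.payoff)` yields one true/false value per drawing, and using that result as a subscript keeps only the winning numbers where it is true. The following Python sketch mimics those two steps. It is an illustration only: the helper `select_where` is our own name, and the data are just the three number and payoff pairs quoted in the text, not the full set of 254 drawings.

```python
# The three (winning number, payoff) pairs mentioned in the text.
numbers = [810, 499, 123]
payoffs = [190.0, 869.5, 83.0]


def select_where(values, mask):
    """Keep the entries of `values` whose matching `mask` entry is true."""
    return [v for v, keep in zip(values, mask) if keep]


# Step 1: elementwise comparison gives one boolean per drawing.
is_largest = [p == max(payoffs) for p in payoffs]
is_smallest = [p == min(payoffs) for p in payoffs]

# Step 2: the booleans pick out the corresponding winning numbers.
print(select_where(numbers, is_largest))   # [499]
print(select_where(numbers, is_smallest))  # [123]
```

Because the selection is driven by a comparison rather than a position, it would return several numbers if the extreme payoff occurred in more than one drawing.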

Since the winning numbers and the payoffs come in pairs, a number and a payoff for each drawing, we can produce a scatterplot of the data to see if there is any relationship between the payoff and the winning number.

> plot(lottery.number, lottery.payoff) # Figure 1.4

Figure 1.4. Scatterplot of winning number and payoff for the 254 different lottery drawings.

What do you see in the picture? Does the payoff seem to depend on the position of the winning number? Perhaps it would help to add a “middle” line that follows the overall pattern of the data:

> lines(lowess(lottery.number, lottery.payoff, f=.2)) # Figure 1.5

Can you see the interesting characteristics now in Figure 1.5? There are substantially higher payoffs for numbers with a leading zero, meaning fewer people bet on these numbers. Perhaps that reflects people’s reluctance to think of numbers with leading zeros. After all, no one writes $010 on a ten dollar check! Also note that, except for the numbers with leading zeros, payoffs seem to increase as the winning number increases.
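For readers curious about how a smooth “middle” line of this kind can be computed, here is a simplified sketch of the general idea: at each point, fit a straight line to the nearest fraction `f` of the data, giving nearby points more weight. This is not the S `lowess` function itself. It is a Python illustration under our own assumptions (the name `local_linear_smooth`, tricube weights, and no robustness iterations against outliers), so its output would not match Figure 1.5 exactly.

```python
import math


def local_linear_smooth(x, y, f=0.2):
    """Simplified lowess-style smoother: one weighted line fit per point."""
    n = len(x)
    k = max(2, math.ceil(f * n))  # number of neighbours used in each fit
    fitted = []
    for x0 in x:
        # Indices of the k points closest to x0.
        nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:k]
        h = max(abs(x[i] - x0) for i in nearest)  # local bandwidth
        if h == 0:
            # All neighbours share the same x: fall back to their mean.
            fitted.append(sum(y[i] for i in nearest) / k)
            continue
        # Tricube weights: 1 at x0, falling to 0 at distance h.
        w = [(1 - (abs(x[i] - x0) / h) ** 3) ** 3 for i in nearest]
        sw = sum(w)
        mx = sum(wi * x[i] for wi, i in zip(w, nearest)) / sw
        my = sum(wi * y[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (x[i] - mx) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (x[i] - mx) * (y[i] - my) for wi, i in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical data lying exactly on a line, so the smooth should follow it.
xs = [float(i) for i in range(20)]
ys = [2 * xi + 1 for xi in xs]
smooth = local_linear_smooth(xs, ys, f=0.5)
print(max(abs(s - yi) for s, yi in zip(smooth, ys)) < 1e-9)
```

A small `f`, such as the `.2` used above, lets the curve follow local features like the jump for numbers with a leading zero; a larger `f` averages over more points and gives a flatter curve.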

It would be interesting to see exactly what numbers correspond to the large payoffs. Fortunately, with an interactive graphical input device, we can do that by simply pointing at the “outliers”:

> identify(lottery.number, lottery.payoff, lottery.number) # Figure 1.6

Can you see the pattern in the numbers with very high payoffs? Spend some time thinking before looking at the footnote, which contains the explanation.† Did you find the pattern? If so, you have accomplished something very important: you learned something new by looking at the data, and afterwards found that it could be explained by the rules of the game. Much of data analysis consists of detecting clues from patterns in the data and then following up on the clues to better understand the data.

Figure 1.5. A smooth curve is superimposed on the winning number and payoff scatterplot.

Figure 1.6. Outliers on the scatterplot are labelled...

Table of contents

Cover
Half Title
Title Page
Copyright Page
Dedication
Table of Contents
1 How to Beat the Lottery
2 Tutorial Introduction to S
3 Using the S Language
4 Graphical Methods in S
5 Data in S
6 Writing Functions
7 More on Writing Functions
8 More about Data
9 Examples and Case Studies
10 Advanced Graphics
11 How S Works
Bibliography
Appendix 1 S Function Documentation
Appendix 2 S Dataset Documentation
Appendix 3 Index to S Functions
Appendix 4 Old-S and S
Index
