Introduction to Statistics Through Resampling Methods and R

Phillip I. Good

About This Book

A highly accessible alternative approach to basic statistics
Praise for the First Edition: "Certainly one of the most impressive little paperback 200-page introductory statistics books that I will ever see... it would make a good nightstand book for every statistician." (Technometrics)
Written in a highly accessible style, Introduction to Statistics through Resampling Methods and R, Second Edition guides students in the understanding of descriptive statistics, estimation, hypothesis testing, and model building. The book emphasizes the discovery method, enabling readers to ascertain solutions on their own rather than simply copy answers or apply a formula by rote. The Second Edition utilizes the R programming language to simplify tedious computations, illustrate new concepts, and assist readers in completing exercises.
The text facilitates quick learning through the use of:
More than 250 exercises, with selected "hints," scattered throughout to stimulate readers' thinking and to actively engage them in applying their newfound skills
An increased focus on why a method is introduced
Multiple explanations of basic concepts
Real-life applications in a variety of disciplines
Dozens of thought-provoking, problem-solving questions in the final chapter to assist readers in applying statistics to real-life applications
Introduction to Statistics through Resampling Methods and R, Second Edition is an excellent resource for students and practitioners in the fields of agriculture, astrophysics, bacteriology, biology, botany, business, climatology, clinical trials, economics, education, epidemiology, genetics, geology, growth processes, hospital administration, law, manufacturing, marketing, medicine, mycology, physics, political science, psychology, social welfare, sports, and toxicology who want to master and learn to apply statistical methods.

Information

Publisher: Wiley
Year: 2012
ISBN: 9781118497579

Chapter 1
Variation
If there were no variation, if every observation were predictable, a mere repetition of what had gone before, there would be no need for statistics.
In this chapter, you’ll learn what statistics is all about, variation and its potential sources, and how to use R to display the data you’ve collected. You’ll start to acquire additional vocabulary, including such terms as accuracy and precision, mean and median, and sample and population.

1.1 VARIATION

We find physics extremely satisfying. In high school, we learned the formula S = VT, which in symbols relates the distance traveled by an object to its velocity multiplied by the time spent in traveling. If the speedometer says 60 mph, then in half an hour, you are certain to travel exactly 30 mi. Except that during our morning commute, the speed we travel is seldom constant, and the formula is not really applicable. Yahoo Maps told us it would take 45 minutes to get to our teaching assignment at UCLA. Alas, it rained and it took us two and a half hours.
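To make the contrast concrete, here is a minimal R sketch of S = VT, first with a constant speed and then with a speed that changes from one stretch of road to the next. The leg-by-leg speeds are invented for illustration and are not taken from any real commute.

```r
# Sketch: distance from S = V * T with constant and with varying speed.
constant_speed <- 60                 # mph, as read off the speedometer
hours          <- 0.5                # half an hour of driving
ideal_distance <- constant_speed * hours
ideal_distance                       # 30 miles, exactly as the formula says

# Hypothetical commute: one speed for each of six 5-minute legs.
# These values are made up; they are an assumption of this sketch.
leg_speeds <- c(60, 45, 10, 5, 30, 60)          # mph
leg_hours  <- rep(5 / 60, length(leg_speeds))   # each leg lasts 5 minutes
actual_distance <- sum(leg_speeds * leg_hours)  # apply S = VT leg by leg
actual_distance                      # miles covered in the same half hour
```

The formula still holds on each short leg; it is only the single, constant V that fails to describe the whole trip.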
Politicians always tell us the best that can happen. If a politician had spelled out the worst-case scenario, would the United States have gone to war in Iraq without first gathering a great deal more information?
In college, we had the ideal gas law, V = KT/P, with its tidy relationship between the volume V, temperature T and pressure P of a perfect gas. This is just one example of the perfection encountered there. The problem was we could never quite duplicate this (or any other) law in the freshman physics laboratory. Maybe it was the measuring instruments, our lack of familiarity with the equipment, or simple measurement error, but we kept getting different values for the constant K.
By now, we know that variation is the norm. Instead of getting a fixed, reproducible volume V to correspond to a specific temperature T and pressure P, one ends up with a distribution of values of V as a result of errors in measurement. But we also know that with a large enough representative sample (defined later in this chapter), the center and shape of this distribution are reproducible.
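A small simulation can suggest what this looks like in practice. The R sketch below assumes a made-up "true" constant K, a fixed temperature and pressure, and normally distributed measurement error in the volume; every number in it is hypothetical and chosen only for illustration.

```r
# Sketch: repeated determinations of K from V = K * T / P with noisy volumes.
# All numeric values are invented for illustration; none come from the text.
set.seed(1)                          # make the simulated experiment repeatable

true_K   <- 8.0                      # hypothetical true constant (arbitrary units)
temp     <- 300                      # temperature T, assumed measured exactly
pressure <- 100                      # pressure P, assumed measured exactly
true_V   <- true_K * temp / pressure # volume predicted by V = KT/P

simulate_K <- function(n, sd_error = 0.5) {
  # Each observed volume is the true volume plus random measurement error
  observed_V <- true_V + rnorm(n, mean = 0, sd = sd_error)
  pressure * observed_V / temp       # back out K from each noisy volume
}

small_sample_1 <- simulate_K(5)
small_sample_2 <- simulate_K(5)
large_sample_1 <- simulate_K(5000)
large_sample_2 <- simulate_K(5000)

# No two determinations of K agree, and small-sample means wander
print(round(small_sample_1, 3))
print(c(mean(small_sample_1), mean(small_sample_2)))

# The centers of two large samples, by contrast, nearly coincide
print(c(mean(large_sample_1), mean(large_sample_2)))
```

Calling hist(large_sample_1) and hist(large_sample_2) would show that the shapes of the two large samples agree as well, not just their centers.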
Here’s more good and bad news: Make astronomical, physical, or chemical measurements and the only variation appears to be due to observational error. Purchase a more expensive measuring device and get more precise measurements and the situation will improve.
But try working with people. Anyone who spends any time in a schoolroom, whether as a parent or as a child, soon becomes aware of the vast differences among individuals. Our most distinct memories are of how large the girls were in the third grade (ever been beaten up by a girl?) and the trepidation we felt on the playground whenever teams were chosen (not right field again!). Much later, in our college days, we were to discover there were many individuals capable of devouring larger quantities of alcohol than we could without noticeable effect. And a few, mostly of other nationalities, whom we could drink under the table.
Whether or not you imbibe, we’re sure you’ve had the opportunity to observe the effects of alcohol on others. Some individuals take a single drink and their nose turns red. Others can’t seem to take just one drink.
Despite these obvious differences, scheduling for outpatient radiology at many hospitals is done by a computer program that allots exactly 15 minutes to each patient. Well, I’ve news for them and their computer. Occasionally, the technologists are left twiddling their thumbs. More often the waiting room is overcrowded because of routine exams that weren’t routine or where the radiologist wanted additional X-rays. (To say nothing of those patients who show up an hour or so early or a half hour late.)
The majority of effort in experimental design, the focus of Chapter 6 of this text, is devoted to finding ways in which this variation from individual to individual won’t swamp or mask the variation that results from differences in treatment or approach. It’s probably safe to say that what distinguishes statistics from all other branches of applied mathematics is that it is devoted to characterizing and then accounting for variation in the observations.
Consider the Following Experiment
You catch three fish. You heft each one and estimate its weight; you weigh each one on a pan scale when you get back to the dock, and you take them to a chemistry laboratory and weigh them there. Your two friends on the boat do exactly the same thing. (All but Mike; the chemistry professor catches him in the lab after hours and calls campus security. This is known as missing data.)
The 26 weights you’ve recorded (3 × 3 × 3 − 1, since they nabbed Mike) differ as a result of measurement error, observer error, differences among observers, differences among measuring devices, and differences among fish.
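The bookkeeping behind that count can be sketched in R by listing every combination of fish, weighing method, and observer. The labels below are hypothetical (the text names only Mike), and the choice of which single weighing went missing is an arbitrary assumption, since the text does not say.

```r
# Sketch: lay out every fish x method x observer combination.
# Labels other than "Mike" are hypothetical placeholders.
design <- expand.grid(
  fish     = c("fish1", "fish2", "fish3"),
  method   = c("heft", "pan_scale", "lab_balance"),
  observer = c("you", "friend", "Mike"),
  stringsAsFactors = FALSE
)
nrow(design)                         # 27 planned weights (3 * 3 * 3)

# Assumption: exactly one of Mike's laboratory weighings was lost.
# The text does not say which, so the third fish is an arbitrary choice.
lost <- design$observer == "Mike" &
        design$method   == "lab_balance" &
        design$fish     == "fish3"
recorded <- design[!lost, ]
nrow(recorded)                       # 26 weights actually recorded
```

Each column of the table corresponds to one of the sources of variation listed above: differences among fish, among measuring devices, and among observers.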

1.2 COLLECTING DATA

The best way to observe variation is for you, the reader, to collect some data. But before we make some suggestions, a few words of caution are in order: 80% of the effort in any study goes into data collection and preparation for data collection. Any effort you don’t expend initially goes into cleaning up the resulting mess. Or, as my carpenter friends put it, “measure twice; cut once.”
We constantly receive letters and emails asking which statistic we would use to rescue a misdirected study. We know of no magic formula, no secret procedure known only to statisticians with a PhD. The operative phrase is GIGO: garbage in, garbage out. So think carefully before you embark on your collection effort. Make a list of possible sources of variation and see if you can eliminate any that are unrelated to the objectives of your study. If midway through, you think of a better method—don’t use it.* Any inconsistency in your procedure will only add to the undesired variation.

1.2.1 A Worked-Through Example

Let’s get started. Suppose we were to record the time taken by an individual to run around the school track. Before turning the page to see a list of some possible sources of variation, test yourself by writing down a list of all the factors you feel will affect the individual’s performance. Obviously, the running time will depend upon the individual’s sex, age, weight (for height and age), and race. It also will depend upon the weather, as I can testify from personal experience.
Soccer referees are required to take an annual physical examination that includes a mile and a quarter run. On a cold March day, the last time I took the exam in Michigan, I wore a down parka. Halfway through the first lap, a light snow began to fall that melted as soon as it touched my parka. By the third go around the track, the down was saturated with moisture and I must have been carrying a dozen extra pounds. Needless to say, my running speed varied considerably over the mile and a quarter.
As we shall see in the chapter on analyzing experiments, we can’t just add the effects of the various factors, for they often interact. Consider that Kenyans dominate the long-distance races, while Jamaicans and African-Americans do best in...
