CHAPTER 1
Introduction
CONTENTS
1.1 Samples Versus Populations
1.2 Software
1.3 R Basics
1.3.1 Entering Data
1.3.2 R Functions and Packages
1.3.3 Data Sets
1.3.4 Arithmetic Operations
Statistical methods that are used by a wide range of disciplines consist of at least three basic components:
- Experimental design, meaning the planning and carrying out of a study.
- Summarizing data, using what are called descriptive statistics.
- Inferential techniques, which, roughly, are methods aimed at making predictions or generalizations about a population of individuals or things when not all individuals or things can be measured.
The fundamental goal in this book is to summarize the basic statistical techniques associated with these three components, with an emphasis on the last two components, in a manner that makes them accessible to students not majoring in statistics. Of particular importance is fostering the ability of the reader to think critically about how data are summarized and analyzed.
The mathematical foundation of the statistical tools routinely used today was developed about two centuries ago by Pierre-Simon Laplace and Carl Friedrich Gauss in a series of remarkable advances. About a century ago, important refinements and extensions were made by Karl Pearson, Jerzy Neyman, Egon Pearson, William Gosset, and Sir Ronald Fisher. The strategies and methods that they developed are routinely used today.
During the last half century, however, literally hundreds of journal articles have made it abundantly clear that there are three basic concerns associated with these routinely used techniques that are of fundamental importance. This is not to say that they should be abandoned, but it is important to understand their limitations as well as how these limitations might be addressed with methods developed during the last half century. It is evident that any routinely used statistical method that addresses basic issues needs to be covered in any introductory statistics book aimed at students and researchers trying to understand their data. Simultaneously, it seems equally evident that when relevant insights are made regarding the proper use and interpretation of these methods, they should be included in an introductory book as well. Omission of some modern insights might be acceptable if the results were at some level controversial among statisticians familiar with the underlying principles. But when there are hundreds of papers acknowledging a problem with a routinely used method, with no counterarguments being offered in a reputable statistics journal, surely it is important to discuss the practical implications of the insight in a book aimed at non-statisticians. This is the point of view adopted here.
1.1 SAMPLES VERSUS POPULATIONS
Assuming the reader has no prior training in statistics, we begin by making a distinction between a population of individuals of interest and a sample of individuals. A population of participants or objects consists of all those participants or objects that are relevant in a study.
Definition: A sample is any subset of the population of individuals or things under study.
EXAMPLE
Imagine a study dealing with the quality of education among high-school students. One aspect of this issue might be the number of hours students spend on homework. Imagine that 100 students are interviewed at a particular school and 40 say they spend less than 1 hour on homework. The 100 students represent a sample; they are a subset of the population of interest, which is all high-school students.
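As a minimal sketch, the sample in this example could be summarized in R as shown here. The counts come from the example; the variable names are hypothetical and chosen only for illustration.

```r
# Counts taken from the example; the variable names are illustrative only.
n_interviewed <- 100  # number of students in the sample
n_under_1hr <- 40     # students who report less than 1 hour of homework

# Proportion of the sample reporting less than 1 hour of homework
prop_under_1hr <- n_under_1hr / n_interviewed
print(prop_under_1hr)
```

The resulting value, 0.4, describes only the 100 students who were interviewed. What it implies about all high-school students is a separate, inferential question.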
EXAMPLE
Imagine a developmental psychologist studying the ways children interact. One aspect of interest might be the difference between males and females in terms of how they handle certain situations. For example, are boys more aggressive than girls in certain play situations? Imagine that the psychologist videotapes 4-year-old children playing and then raters rate each child on a 10-point scale in terms of the amount of aggressive behavior they display. Further imagine that 30 boys get an average rating of 5, while 25 girls get an average rating of 4. The 30 boys represent a sample from the entire population of 4-year-old boys and the 25 girls represent a sample from the population of all 4-year-old girls.
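The comparison in this example can be sketched in R as well. Only the two group averages are given in the text, so this sketch works with those summaries directly and does not invent individual ratings; the variable names are hypothetical.

```r
# Summary values taken from the example; individual ratings are not given,
# so only the reported group averages are used. Names are illustrative.
mean_boys <- 5   # average rating among the 30 boys in the sample
mean_girls <- 4  # average rating among the 25 girls in the sample

# Observed difference between the two sample averages
diff_means <- mean_boys - mean_girls
print(diff_means)
```

The difference of 1 point is a statement about these 55 children only. Whether a similar difference holds among all 4-year-old boys and girls cannot be read off from this number alone.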
Inferential methods are broadly aimed at assessing the implications of a sample regarding the characteris...