IV
Analysis: Investigating Empirical Relationships
8
Data Analysis Foundations
Chapter Outline
• Data Analysis and Statistics
• Statistical Information
• Statistical Purposes
• Properties of Scores
• Levels of Measurement
• Discrete and Continuous Variables
• Conventions
• Summary
• For Review
• Terms to Know
• Questions for Review
• Appendix 8A: On Clean Data
• Errors Made on Measuring Instruments
• Data File Errors
• Missing Values
• Evaluating Secondary Data Sets
• At most colleges and universities, students' first (or only) exposure to research method activities is a course in statistics.
• Statistics courses are typically taught by statisticians.
These unremarkable observations nevertheless may be important for your mindset about research. If your introduction to research methods was as described, you probably started this book believing statistics represents the main empirical research activity.
Hopefully that idea was dispelled in part I, where it was argued that empirical research is aimed at enhancing conceptual understanding. It does so by providing information about probable relationships between scores that validly represent the constructs at issue. Statistics are helpful in this process, but they are far from sufficient. Establishing causal relationships, and relationships that correspond to conceptual meanings, also requires sound design and measurement.
But why the observation that statisticians teach statistics? It's a matter of perspective. Statisticians are understandably most interested in advancing knowledge about the field of statistics. Empirical researchers, by contrast, are interested in statistics as a tool for advancing knowledge about conceptual relationships. These differences lead to different concerns and emphases.
An important difference has to do with the way scores are viewed. To statisticians, scores illustrate how statistical procedures operate. Hypothetical scores are as good as scores from real cases for their purposes.
Not so for empirical researchers; scores from real cases are central empirical ingredients. Moreover, these scores typically do not satisfy statistical assumptions; they are frequently in error, or even missing. Empirical researchers must seek to make appropriate inferences to the conceptual level from an operational environment of fallible scores. (Appendix 8A discusses errors in data sets and methods to minimize them. Chapter 16 describes additional procedures that are available to address missing data.)
DATA ANALYSIS AND STATISTICS
Data is another term for scores. Data often refer to a set of scores on two or more variables. (Datum is singular and refers to one score.) For researchers, the data are as challenging as the statistical procedures used to analyze them. This book adopts a research orientation to analysis; it emphasizes data analysis, not statistical analysis per se.
The chapter begins with a definition of statistics and describes how statistics contribute to research. It then describes characteristics of scores for research purposes. These characteristics have implications for decisions about using statistical procedures. It concludes with a description of conventions for statistical operations on scores used in the remainder of this book.
The term statistic is used in several ways. Sometimes it refers to a field of study. Sometimes it describes procedures developed by statisticians or procedures researchers use to analyze data. Sometimes the term is used to characterize outcomes of statistical procedures.
In this book, the term statistic is used to designate a summary characteristic of scores. For example, a mean is a statistic whose values describe the central tendency of a set of scores on some measure. A simple correlation coefficient is a statistic whose values describe the degree of relationship between scores on two variables.
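The two statistics just named can be illustrated with a minimal sketch. The variable names and scores below are hypothetical, invented only for illustration, and the correlation is assumed to be the usual Pearson product-moment coefficient, which is one common reading of "simple correlation coefficient."

```python
from math import sqrt


def mean(scores):
    """Central tendency: the sum of the scores divided by the number of cases."""
    return sum(scores) / len(scores)


def correlation(x, y):
    """Pearson product-moment correlation between paired scores on two variables."""
    if len(x) != len(y):
        raise ValueError("each case needs a score on both variables")
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products of deviations from each variable's mean
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical scores for five cases (not data from the book)
hours_studied = [2, 4, 6, 8, 10]
exam_score = [55, 60, 70, 75, 90]

print(mean(hours_studied))                              # one statistic, one variable
print(round(correlation(hours_studied, exam_score), 3)) # one statistic, two variables
```

Each function reduces a whole set of scores to a single summary value, which is the sense in which the term statistic is used here.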
Statistical Information
There are many statistics and many ways to summarize data. Means and correlation coefficients are just examples. However, statistics almost always provide information about just one of two issues. First, some statistics provide summary information on scores obtained on a single variable. The mean, which identifies the weighted middle of a set of cases' scores, is one example. Chapter 9 discusses statistics that describe scores on a single variable.
Second, statistics provide information about how values on two or more variables vary together. In this role, statistics are used to address the central issue in research, namely, relationships between variables. Chapters 10 and 11 discuss statistics that describe relationships.
Statistical Purposes
Statistics also serve two purposes. First, they are used to summarize scores and empirical relationships among the cases studied. This section describes statistics for this purpose.
Second, statistical procedures are used to address two types of validity introduced in chapter 2. One of these involves the contribution of statistics to causal analysis: internal statistical validity and va...