Chapter 1
Misinterpretation of Selected Theoretical Concepts of Statistics
1.1 Introduction
It is recognized that most undergraduate and graduate students do not have sufficient knowledge of the basic theoretical concepts of statistics, or of mathematical statistics in general, such as the sample space, a representative sample, a statistic as a parameter estimator, hypothesis testing, and a random variable and its theoretical (normal) distribution. As a result, they think that they have to prove the theoretical concepts of statistics using only a sample data set, for instance, that the error terms of a statistical model have independent and identical normal distributions. In fact, some of them even think that the cross-section data of a numerical variable should be tested for normality before any further data analysis is done.
Students also think that they have to select a representative sample for their theses or dissertations. This requirement has no clear meaning, since there is in fact no sampling method or guide on how to select a representative sample. Agung (1992a) states that it is better not to use the term “representative sample” anymore, since it can be misleading. It is well known that researchers will most likely select a nonprobability sample, specifically a convenience sample, where each respondent has a convenient time for giving “good” responses. In other words, most sample surveys do not use pure random samples, since the researchers never work from a complete list of the population. By contrast, a random sample is basically defined as a sample where each individual in the population has an equal probability of being selected. In fact, some of my graduate students have even had to use a special sampling method for their theses and dissertations, called the friendship sampling method (Agung, 2008a), either because they have to interview managers or high-ranking persons, who have very limited time and most likely do not want to participate in the study, or because they are using their friends as the research objects.
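The equal-probability definition of a random sample given above can be illustrated with a small simulation. The following is a minimal sketch, not a survey procedure: the sampling frame `frame`, the helper `simple_random_sample`, the sample size, and the number of repetitions are all hypothetical values chosen for illustration. It assumes that a complete list of the population is available, which is exactly what most surveys lack.

```python
import random
from collections import Counter


def simple_random_sample(frame, n, rng):
    """Draw n distinct individuals from a complete list (the sampling frame)."""
    return rng.sample(frame, n)


# Hypothetical sampling frame: a complete list of 10 population members.
frame = [f"person_{i}" for i in range(10)]
n = 3
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Repeat the draw many times and count how often each individual is selected.
repeats = 20000
counts = Counter()
for _ in range(repeats):
    counts.update(simple_random_sample(frame, n, rng))

# Under simple random sampling, every individual's selection frequency
# should be close to n / N, where N is the size of the frame.
expected = n / len(frame)
inclusion_freq = {person: counts[person] / repeats for person in frame}
for person, freq in inclusion_freq.items():
    print(person, round(freq, 3), "expected", expected)
```

A convenience or friendship sample has no such frame, so the selection probability of each individual is unknown and cannot be checked in this way.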
Furthermore, it is recognized that most books on applied statistics do not present or discuss the sample space with simple and detailed illustrations. Hence, students and readers never clearly know the limitations of a sample data set for estimating the true values of population parameters. In addition, several sample statistics and related ideas can be misinterpreted, such as the belief that a causal relationship between a pair of variables should be proven using a simple regression, the standard error of a variable, the belief that a sample size has to be estimated using a statistical formula, the reliability coefficient Cronbach's α, which is in fact a consistency coefficient, and the validity of a data collection instrument, both of which are in fact computed based on a sample of individuals that happen to be selected by the researchers.
For this reason, the following sections present some notes and comments on selected theoretical concepts of statistics, as well as on sample statistics, which are considered very important supporting knowledge for evaluating the statistical results based on a sample data set.
1.2 What is a Population?
It has been recognized that a population can be thought of as a complete set of individuals, as a complete set of characteristics or variables, or as a complete set of scores, values, or measurements of variables. For this reason, the following alternative definitions of a population are proposed. In addition, a hypothetical population will be introduced later, corresponding to the nonrandom samples that have been used in most, if not almost all, sample surveys.
Definition 1.1
A population is defined as a complete set of all individuals having specific characteristics defined by a researcher, such that each individual can be perfectly classified as either a member or not a member of the population.
Definition 1.2
A population is defined as a complete set of all possible characteristics or variables of the observed individuals.
Definition 1.3
A variable is a characteristic of a set of individuals, which can have different scores/values/measurements for different individuals in the set.
Definition 1.4
A population is a complete set of multidimensional quantitative and qualitative scores, values, or measurements of all possible variables, which could give complete data or information to a researcher. In other words, a population is a complete set of quantitative and qualitative scores/values/measurements of all possible defined variables.
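The three views of a population in Definitions 1.1 to 1.4 can be sketched as a small data structure. This is a toy illustration under loud assumptions: the four candidate individuals, the variable names, and the membership rule `is_member` are all hypothetical, and a real population can never be listed in full in this way.

```python
# Toy illustration only: every name and value below is hypothetical.
candidates = [
    {"name": "A", "age": 34, "employed": True, "income": 52.0, "region": "urban"},
    {"name": "B", "age": 17, "employed": True, "income": 8.5, "region": "rural"},
    {"name": "C", "age": 45, "employed": False, "income": 0.0, "region": "urban"},
    {"name": "D", "age": 29, "employed": True, "income": 41.3, "region": "rural"},
]


def is_member(individual):
    """Definition 1.1: a researcher-defined rule that classifies every
    individual as a member or not a member of the population."""
    return individual["employed"] and individual["age"] >= 18


# Definition 1.1: the population as a set of individuals.
population_individuals = [c["name"] for c in candidates if is_member(c)]

# Definitions 1.2 and 1.3: the population as a set of variables, where each
# variable can take different values for different individuals.
variables = ["age", "income", "region"]

# Definition 1.4: the population as the complete set of quantitative and
# qualitative scores of the defined variables.
population_scores = {
    c["name"]: {v: c[v] for v in variables}
    for c in candidates
    if is_member(c)
}

print(population_individuals)
print(population_scores)
```

The three objects describe the same population from different angles: who belongs to it, what is measured, and which values result.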
1.3 A Sample and Sample Space
1.3.1 What is a Sample?
Definition 1.5
A sample is a finite subset of a defined population. According to Definition 1.4, then, the sample data set, which will be called “sample” for short, will be defined as a finite set of quantitative and qualitative scores, values, or measurements which happen to be selected by or are available for a researcher.
Take note that in any statistical data analysis, researchers or analysts always consider a very small set of scores/values/measurements for a limited number of all possible variables or indicators. Researchers will never observe or measure a whole population, whether as the complete set of individuals, of variables or characteristics, or of scores/values/measurements. For this reason, every reader can be very confident that researchers will, in general, never have the total or complete information of a population.
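The point that a sample carries only partial information about a population can be illustrated with a short simulation. The sketch below is built on assumptions made purely for illustration: a hypothetical population of 10,000 scores generated from a normal distribution, a sample size of 30, and five repeated draws. The population mean is known here only because the population is artificial.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: scores of one variable for 10,000 individuals.
population = [rng.gauss(50, 10) for _ in range(10000)]
mu = statistics.fmean(population)  # the true parameter, unknown in practice

# Draw several small samples; each one is a finite subset of the population.
sample_means = []
for _ in range(5):
    sample = rng.sample(population, 30)
    sample_means.append(statistics.fmean(sample))

print("population mean:", round(mu, 2))
print("sample means:", [round(m, 2) for m in sample_means])
```

Each sample yields a different mean, and none of them has to equal the population mean. A researcher who holds only one such sample cannot know how far the computed statistic lies from the true value.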
Corresponding to a sample survey, the population can be classified into a sample population and a target population. A sample population is defined as the population from which a sample will be directly selected by using a specific selection method, such as a multistage sampling method (see Kish 1975). The target population is defined as a much larger population than the sample population, for which the statistical results are predicted, estimated, or assumed to be applicable in a statistical sense. Take note that this statement on the target population is an abstract or theoretical statement, which cannot be p...