UNIT 1
Descriptive Statistics
We begin our study of the mathematics used in the life sciences with a unit that explores how we understand data mathematically. Understanding data is a process, the steps of which are the following:
1. Collecting the data
2. Summarizing the data
3. Analyzing the data
4. Interpreting the results and reporting them
Note that before carrying out any of the above, you have presumably formulated some underlying question or hypothesis that you wish to use the data to address. There are a few key approaches through which we can address scientific questions:
(i) observation (natural history: see what occurs where and when and interpret the results based on differences in the locations or history),
(ii) experiment (vary aspects of the environment in order to tease apart how the biological components respond), and
(iii) theory (make assumptions about the natural world and analyze the implications of those assumptions using verbal, graphical, and mathematical arguments).
Each of these approaches involves quantitative methods, and an objective of this text is to provide you with an understanding of some of these methods.
Step 1 above involves the area of "design of experiments," in which the process of data collection is determined based on the objectives of the study and the limitations imposed (e.g., cost, time, available personnel, accessibility of the study area). Design implies that the scientist considers alternative methods to collect the data as well as the manner in which the factors deemed to affect the data collection are manipulated. Examples would be determining
• where and when to put out traps to collect animals in the field,
• how many replicates of an evaluation test to use in estimating the efficacy of a new drug,
• how many different levels of growth medium with what nutrient constituents to use in evaluating the impact of a new antibiotic on bacterial population growth in the lab,
• in a study of the response of an organism's respiration rate to temperature, how many different temperature treatments are applied, in what order, and for how long.
Step 2 in the process is typically called "descriptive statistics," in which the objective is to abstract out certain properties of the data in order to better interpret them. The assumption here is that the data are too complex for us to understand well by simply looking at them as lists or tables. The simplest example of this is the computation of an "average" value of the data. Many of us obtain a better grasp of a data set by having some summary of the data available, particularly in graphical form, rather than simply a tabular elaboration of the data. Note that whatever methods are utilized here, there is a loss of information associated with the description provided: the description (e.g., the average value of the data) does not include the full amount of information in the complete data set. An objective in descriptive statistics is to choose the appropriate level of description between complete enumeration of the data and a coarse simple summary (such as the average value) so as to be able to address the questions you posed in the first place. As an example, consider the height of all students in a course. Having these displayed as a long list would not be readily useful, whereas if we state that the average height of students is 165 cm, you have a simple means of comparing the students in the course to the students in another course. More information would be provided by a histogram (bar chart) of the heights of the students in the course, but even then there would be some loss of information since we could not develop from the histogram the full list of heights of all students in the course.
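The height example can be tried out in a few lines of code. The following is a minimal sketch in Python, not a prescribed method: the ten heights are hypothetical values invented for illustration (chosen so that they average 165 cm), and the 10 cm bin width and the helper name `bin_label` are likewise assumptions.

```python
from collections import Counter
from statistics import mean

# Hypothetical heights (cm) of ten students; illustrative values only,
# not data from any real course.
heights_cm = [152, 158, 160, 163, 165, 165, 168, 170, 172, 177]

# Coarsest summary: a single number for the whole data set.
average = mean(heights_cm)


def bin_label(height, width=10):
    """Return the label of the bin (e.g. '160-169') containing height.

    The 10 cm bin width is an arbitrary choice made for this sketch.
    """
    low = (height // width) * width
    return f"{low}-{low + width - 1}"


# Intermediate summary: how many students fall in each height bin.
histogram = Counter(bin_label(h) for h in heights_cm)

print(f"average height: {average} cm")
for label in sorted(histogram):
    # Draw a simple text histogram, one '#' per student in the bin.
    print(label, "#" * histogram[label])
```

The average reduces ten numbers to one, and the histogram reduces them to three counts. Neither can be reversed: knowing that five students fall in the 160-169 cm bin does not tell us which five heights they have.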
Step 3 in the process typically involves the area of inferential statistics, which consists of parameter estimation and hypothesis testing. Parameter estimation refers to using the data to determine estimates of values of particular interest (respiration rate, photosynthetic rate, hemoglobin level, etc.) from the observations. One might then use the data to evaluate hypotheses (respiration rate incre...