eBook - ePub

Environmental and Ecological Statistics with R

ISBN: 9781498728751

Song S. Qian,

536 pages
English

About this book

Emphasizing the inductive nature of statistical thinking, Environmental and Ecological Statistics with R, Second Edition, connects applied statistics to the environmental and ecological fields. Using examples from published works in the ecological and environmental literature, the book explains the approach to solving a statistical problem, covering model specification, parameter estimation, and model evaluation. It includes many examples to illustrate the statistical methods and presents R code for their implementation. The emphasis is on model interpretation and assessment, and using several core examples throughout the book, the author illustrates the iterative nature of statistical inference.

The book starts with a description of commonly used statistical assumptions and exploratory data analysis tools for the verification of these assumptions. It then focuses on the process of building suitable statistical models, including linear and nonlinear models, classification and regression trees, generalized linear models, and multilevel models. It also discusses the use of simulation for model checking, and provides tools for a critical assessment of the developed models. The second edition also includes a complete critique of a threshold model.

Environmental and Ecological Statistics with R, Second Edition focuses on statistical modeling and data analysis for environmental and ecological problems. By guiding readers through the process of scientific problem solving and statistical model development, it eases the transition from scientific hypothesis to statistical model.

Publisher

Chapman and Hall/CRC

Year

2016

Print ISBN

9780367736750

eBook ISBN

9781498728751

Topic

Biological Sciences

Subtopic

Probability & Statistics

Part II

Statistical Modeling

Chapter 5

Linear Models

5.1 Introduction

5.2 From t-test to Linear Models

5.3 Simple and Multiple Linear Regression Models

5.3.1 The Least Squares

5.3.2 Regression with One Predictor

5.3.3 Multiple Regression

5.3.4 Interaction

5.3.5 Residuals and Model Assessment

5.3.6 Categorical Predictors

5.3.7 Collinearity and the Finnish Lakes Example

5.4 General Considerations in Building a Predictive Model

5.5 Uncertainty in Model Predictions

5.5.1 Example: Uncertainty in Water Quality Measurements

5.6 Two-Way ANOVA

5.6.1 ANOVA as a Linear Model

5.6.2 More Than One Categorical Predictor

5.6.3 Interaction

5.7 Bibliography Notes

5.8 Exercises

5.1 Introduction

In Chapter 4, we defined a model as a probability distribution model. Once a model is proposed, we make inferences about the unknown model parameters based on data. In a one-sample t-test problem, we are interested in learning about the mean of a normal distribution:

y_{i} \sim N (μ, σ^{2})

(5.1)

It is often convenient to think of the data y_i in terms of the mean and a remainder:

y_{i} = μ + ε_{i}

(5.2)

That is, we can split an observed value into two parts, the mean (μ) and the remainder (ε_i). Mathematically, the above two expressions are equivalent. The remainder, the difference between the observed value and the mean, is often known as the residual; it has a normal distribution with mean 0 and standard deviation σ (ε_i ∼ N(0, σ²)). In a two-sample t-test problem, we are interested in the difference between the means of two populations or groups. We present the problem as follows:

\begin{array}{lcl} y_{1 i} & \sim & N (μ_{1}, σ^{2}) \\ y_{2 j} & \sim & N (μ_{2}, σ^{2}) \end{array}

(5.3)

and we are interested in the difference between the two means δ = μ₂ − μ₁. We can present the problem in the format of equation (5.2) by combining the data from the two groups together into a data frame with a second column to indicate the group association (or “treatment”). A mathematically convenient construction of the treatment column is to use a column of 0’s (for y_1i) and 1’s (for y_2j). The data frame consists of two columns, the data column (y) and the treatment (or more generally, group) column (g). Each row represents an observed data point and its group association (0 for group 1 and 1 for group 2). The two-sample t-test problem in equation (5.3) can be expressed in the form of equation (5.4):

y_{j} = μ_{1} + δ g_{j} + ε_{j}

(5.4)

where j is the index for the combined data and g_j is the group association of the jth observation. For data from group 1 (g_j = 0), equation (5.4) reduces to

y_{j} = μ_{1} + ε_{j}

and for data from group 2 (g_j = 1), the model is

y_{j} = μ_{1} + δ + ε_{j}
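The dummy-variable form in equation (5.4) can be checked numerically. The following is a minimal sketch in Python (standard library only), not the book's own code; the two data vectors and all variable names are hypothetical and chosen only for illustration. It stacks the two groups into one response column with a 0/1 group column, fits y = μ₁ + δ g by least squares, and shows that the intercept is the group 1 mean and the slope is the difference between the two group means.

```python
import statistics

# Hypothetical measurements for two groups (illustrative values only).
group1 = [4.1, 5.0, 4.6, 5.3, 4.8]
group2 = [6.2, 5.9, 6.8, 6.4, 7.0, 6.1]

# Stack the data into one response column y and one dummy column g
# (0 for group 1, 1 for group 2), as in equation (5.4).
y = group1 + group2
g = [0] * len(group1) + [1] * len(group2)

# Simple least-squares fit of y = mu1 + delta * g.
g_bar = statistics.fmean(g)
y_bar = statistics.fmean(y)
sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
sxx = sum((gi - g_bar) ** 2 for gi in g)
delta_hat = sxy / sxx                # slope: estimated difference in means
mu1_hat = y_bar - delta_hat * g_bar  # intercept: estimated mean of group 1

print("mu1_hat   =", mu1_hat)
print("delta_hat =", delta_hat)
print("mean(group1)                =", statistics.fmean(group1))
print("mean(group2) - mean(group1) =",
      statistics.fmean(group2) - statistics.fmean(group1))
```

With a single 0/1 predictor, the least-squares line passes through the two group means, which is why the fitted coefficients agree with the quantities compared in a two-sample t-test.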

The group indicator g is often known as a “dummy variable.” A dummy variable takes value 0 or 1. When we have data from more than two groups, we will use p − 1 dummy variables to represent the p groups. For example, if we have three groups in an ANOVA problem (e.g., Exercise 7 in Chapter 4), we combine observed data from all three groups into one column. The first dummy variable g₁ takes value 1 if the observation is from group 2 and 0 otherwise. The second dummy variable g₂ takes value 1 if the observation is from group 3 and 0 otherwise. The ANOVA problem can now be expressed as a linear model problem:

y_{j} = μ_{1} + δ_{1} g_{1 j} + δ_{2} g_{2 j} + ε_{j}

(5.5)

For data from group 1, the model reduces to y_j = μ₁ + ε_j. For data from group 2, the model is y_j = μ₁ + δ₁ + ε_j, and for group 3, y_j = μ₁ + δ₂ + ε_j.
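The three-group model in equation (5.5) can be sketched the same way. The Python example below (standard library only) is an illustration under stated assumptions, not the author's implementation: the data values, the `solve` helper, and the variable names are all hypothetical. It builds the two dummy columns g₁ and g₂, solves the least-squares normal equations, and recovers the group 1 mean together with the two differences δ₁ and δ₂.

```python
# Hypothetical observations from three groups (illustrative values only).
groups = {
    1: [3.9, 4.4, 4.1, 4.6],
    2: [5.2, 5.8, 5.5],
    3: [7.1, 6.7, 7.4, 6.9, 7.2],
}

# Build the combined response y and the design matrix X with columns
# (intercept, g1, g2): g1 = 1 for group 2, g2 = 1 for group 3.
y, X = [], []
for label, values in groups.items():
    for v in values:
        y.append(v)
        X.append([1.0, float(label == 2), float(label == 3)])


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


# Least-squares normal equations: (X'X) theta = X'y.
p = 3
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
mu1_hat, delta1_hat, delta2_hat = solve(XtX, Xty)

print("mu1_hat    =", mu1_hat)     # mean of group 1
print("delta1_hat =", delta1_hat)  # mean of group 2 minus mean of group 1
print("delta2_hat =", delta2_hat)  # mean of group 3 minus mean of group 1
```

Group 1 acts as the baseline here because it is the group with no dummy variable of its own; choosing a different baseline changes the meaning of the coefficients but not the fitted group means.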

By representing the t-test and ANOVA problems in terms of a “statistical model,” I want to convey two main messages. First, we use different models for different problems. Second, statistical inference is mostly about the relationships among variables. Likewise, a main goal in science is to understand the relationships among important variables. A relationship, whether described qualitatively or quantitatively, is a model. In a statistical problem, we define a model as the probability distribution of the variable of interest (the response variable). A probability distribution has a mean (or location) parameter and a parameter representing spread (e.g., standard deviation).

Once a distribution model is specified, we want to understand how the mean of the distribution varies as a function of other variables (predictor variables). In equation (5.2), the mean is a constant (no predictor variable). In equations (5.4) and (5.5), the mean varies by group (g is the predictor variable). For a response variable with a normal distribution, the standard deviation can be estimated from the residuals. As a result, we can often express a statistical model as y_i = f(x, θ) + ε_i, where x represents the predictor variable(s), θ represents the unknown parameter(s) to be estimated, and ε_i is a normal random variable with mean 0 and an unknown standard deviation (σ). In equation (5.5), x represents both g₁ and g₂, and θ includes μ₁, δ₁, and δ₂. The function f(x, θ) is an example of a mean function of a statistical model: a function that defines the relationship between the mean parameter of the response variable distribution and a number of predictors. Using equation (5.5), we can define a statistical modeling problem as follows:

• Model formulation – the response variable is a normal random variable with group-specific means and a constant standard deviation (e.g., equation (5.5)).

• Parameter estimation – how to estimate unknown parameters (e.g., μ₁, δ₁, δ₂, σ in equation (5.5...
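The general form y_i = f(x, θ) + ε_i and the residual-based estimate of σ can be illustrated with a short Python sketch (standard library only). This is an assumption-laden illustration rather than the book's method: the data are hypothetical, the function and variable names are invented for the example, and σ is estimated with the usual linear-model formula, the square root of the residual sum of squares divided by n − p, where p is the number of mean parameters.

```python
import math


def f(x, theta):
    """Mean function of equation (5.5): x = (g1, g2), theta = (mu1, delta1, delta2)."""
    g1, g2 = x
    mu1, delta1, delta2 = theta
    return mu1 + delta1 * g1 + delta2 * g2


# Hypothetical data: (g1, g2) dummy pairs and their responses (illustrative only).
x = [(0, 0)] * 3 + [(1, 0)] * 3 + [(0, 1)] * 3
y = [4.0, 4.4, 4.2, 5.1, 5.6, 5.5, 6.9, 7.3, 7.1]


def group_mean(code):
    """Average response for the observations whose dummy pair equals `code`."""
    vals = [yi for xi, yi in zip(x, y) if xi == code]
    return sum(vals) / len(vals)


# For this model the least-squares estimates are the group 1 mean and the
# differences of the other group means from it.
m1, m2, m3 = group_mean((0, 0)), group_mean((1, 0)), group_mean((0, 1))
theta_hat = (m1, m2 - m1, m3 - m1)

# Residuals are observed values minus the fitted mean function.
residuals = [yi - f(xi, theta_hat) for xi, yi in zip(x, y)]

# Residual standard deviation with n - p degrees of freedom.
n, p = len(y), len(theta_hat)
sigma_hat = math.sqrt(sum(e ** 2 for e in residuals) / (n - p))

print("theta_hat =", theta_hat)
print("sigma_hat =", sigma_hat)
```

Separating the mean function `f` from the residual calculation mirrors the split between the mean parameter and the spread parameter of the response distribution.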
