Environmental and Ecological Statistics with R
eBook - ePub

  1. 536 pages
  2. English

About this book

Emphasizing the inductive nature of statistical thinking, Environmental and Ecological Statistics with R, Second Edition, connects applied statistics to the environmental and ecological fields. Using examples from published works in the ecological and environmental literature, the book explains the approach to solving a statistical problem, covering model specification, parameter estimation, and model evaluation. It includes many examples to illustrate the statistical methods and presents R code for their implementation. The emphasis is on model interpretation and assessment, and using several core examples throughout the book, the author illustrates the iterative nature of statistical inference.

The book starts with a description of commonly used statistical assumptions and exploratory data analysis tools for the verification of these assumptions. It then focuses on the process of building suitable statistical models, including linear and nonlinear models, classification and regression trees, generalized linear models, and multilevel models. It also discusses the use of simulation for model checking, and provides tools for a critical assessment of the developed models. The second edition also includes a complete critique of a threshold model.

Environmental and Ecological Statistics with R, Second Edition focuses on statistical modeling and data analysis for environmental and ecological problems. By guiding readers through the process of scientific problem solving and statistical model development, it eases the transition from scientific hypothesis to statistical model.

Environmental and Ecological Statistics with R is by Song S. Qian and is listed under Mathematics and Probability & Statistics.

Information

Part II
Statistical Modeling
Chapter 5
Linear Models
5.1 Introduction
5.2 From t-test to Linear Models
5.3 Simple and Multiple Linear Regression Models
5.3.1 The Least Squares
5.3.2 Regression with One Predictor
5.3.3 Multiple Regression
5.3.4 Interaction
5.3.5 Residuals and Model Assessment
5.3.6 Categorical Predictors
5.3.7 Collinearity and the Finnish Lakes Example
5.4 General Considerations in Building a Predictive Model
5.5 Uncertainty in Model Predictions
5.5.1 Example: Uncertainty in Water Quality Measurements
5.6 Two-Way ANOVA
5.6.1 ANOVA as a Linear Model
5.6.2 More Than One Categorical Predictor
5.6.3 Interaction
5.7 Bibliography Notes
5.8 Exercises
5.1 Introduction
In Chapter 4, we defined a model as a probability distribution model. Once a model is proposed, we make inferences about the unknown model parameters based on data. In a one-sample t-test problem, we are interested in learning about the mean of a normal distribution:
$$y_i \sim N(\mu, \sigma^2) \tag{5.1}$$
It is often convenient to think of the data $y_i$ in terms of the mean and a remainder:
$$y_i = \mu + \varepsilon_i \tag{5.2}$$
That is, we can split an observed value into two parts: the mean ($\mu$) and the remainder ($\varepsilon_i$). Mathematically, the two expressions above are equivalent. The remainder, the difference between the observed value and the mean (often known as the residual), has a normal distribution with mean 0 and standard deviation $\sigma$, that is, $\varepsilon_i \sim N(0, \sigma^2)$. In a two-sample t-test problem, we are interested in the difference between the means of two populations or groups. We present the problem as follows:
$$y_{1i} \sim N(\mu_1, \sigma^2), \qquad y_{2j} \sim N(\mu_2, \sigma^2) \tag{5.3}$$
and we are interested in the difference between the two means, $\delta = \mu_2 - \mu_1$. We can present the problem in the format of equation (5.2) by combining the data from the two groups into a data frame with a second column to indicate the group association (or “treatment”). A mathematically convenient construction of the treatment column is to use a column of 0’s (for $y_{1i}$) and 1’s (for $y_{2j}$). The data frame consists of two columns: the data column ($y$) and the treatment (or, more generally, group) column ($g$). Each row represents an observed data point and its group association (0 for group 1 and 1 for group 2). The two-sample t-test problem in equation (5.3) can be expressed in the form of equation (5.4):
$$y_j = \mu_1 + \delta g_j + \varepsilon_j \tag{5.4}$$
where $j$ is the index for the combined data and $g_j$ is the group association of the $j$th observation. For data from group 1 ($g_j = 0$), equation (5.4) reduces to $y_j = \mu_1 + \varepsilon_j$, and for data from group 2 ($g_j = 1$), the model is $y_j = \mu_1 + \delta + \varepsilon_j$.
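The equivalence between the two-sample t-test and the dummy-variable model can be checked numerically. The following is a minimal sketch, assuming simulated data: the group means, standard deviation, sample sizes, and object names (`dat`, `fit`, `tt`) are illustrative choices, not values from the book. It fits equation (5.4) with `lm()` and compares the result with an equal-variance `t.test()`.

```r
# Simulated data for illustration only; the means, sd, and sample sizes
# are assumptions, not values from the book.
set.seed(101)
y1 <- rnorm(20, mean = 5, sd = 1.5)   # group 1
y2 <- rnorm(25, mean = 7, sd = 1.5)   # group 2

# Combine the two groups into one data frame with a 0/1 group column g
dat <- data.frame(y = c(y1, y2),
                  g = rep(c(0, 1), times = c(length(y1), length(y2))))

# Fit equation (5.4): the intercept estimates mu_1, the slope on g estimates delta
fit <- lm(y ~ g, data = dat)
coef(fit)

# Equal-variance two-sample t-test on the same data (group 2 minus group 1)
tt <- t.test(y2, y1, var.equal = TRUE)

# The slope equals the difference in sample means, and the t statistic
# for g matches the two-sample t statistic
delta_hat <- unname(coef(fit)["g"])
t_lm <- summary(fit)$coefficients["g", "t value"]
all.equal(delta_hat, mean(y2) - mean(y1))
all.equal(t_lm, unname(tt$statistic))
```

The match depends on `var.equal = TRUE`: the linear model assumes a common $\sigma$ for both groups, as in equation (5.3), whereas the default Welch test in `t.test()` does not.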
The group indicator $g$ is often known as a “dummy variable.” A dummy variable takes value 0 or 1. When we have data from more than two groups, we use $p - 1$ dummy variables to represent the $p$ groups. For example, if we have three groups in an ANOVA problem (e.g., Exercise 7 in Chapter 4), we combine observed data from all three groups into one column. The first dummy variable $g_1$ takes value 1 if the observation is from group 2 and 0 otherwise. The second dummy variable $g_2$ takes value 1 if the observation is from group 3 and 0 otherwise. The ANOVA problem can now be expressed as a linear model problem:
$$y_j = \mu_1 + \delta_1 g_{1j} + \delta_2 g_{2j} + \varepsilon_j \tag{5.5}$$
For data from group 1, the model reduces to $y_j = \mu_1 + \varepsilon_j$. For data from group 2, the model is $y_j = \mu_1 + \delta_1 + \varepsilon_j$, and for group 3, $y_j = \mu_1 + \delta_2 + \varepsilon_j$.
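The three-group coding can be sketched in R as follows. The data are simulated, and the group labels, means, and object names (`fit_dummy`, `fit_factor`) are assumptions made for illustration. The sketch builds the two dummy variables by hand, fits equation (5.5), and compares the result with the coding R generates from a factor, assuming R's default treatment contrasts are in effect.

```r
# Simulated three-group data; the means and group sizes are illustrative assumptions
set.seed(202)
dat <- data.frame(
  y = c(rnorm(10, mean = 4), rnorm(10, mean = 6), rnorm(10, mean = 5)),
  group = factor(rep(c("group1", "group2", "group3"), each = 10))
)

# Build the p - 1 = 2 dummy variables by hand
dat$g1 <- as.numeric(dat$group == "group2")
dat$g2 <- as.numeric(dat$group == "group3")

# Fit equation (5.5) with explicit dummy variables
fit_dummy <- lm(y ~ g1 + g2, data = dat)

# With default treatment contrasts, a factor yields the same coding
fit_factor <- lm(y ~ group, data = dat)
head(model.matrix(fit_factor))

# The coefficients estimate mu_1, delta_1, delta_2; compare with group means
group_means <- tapply(dat$y, dat$group, mean)
coef(fit_dummy)
c(group_means[1],
  group_means[2] - group_means[1],
  group_means[3] - group_means[1])

# The residual standard deviation is the estimate of sigma
sigma(fit_dummy)
```

Group 1 acts as the baseline here because it is the first factor level; choosing a different reference level changes what $\mu_1$, $\delta_1$, and $\delta_2$ estimate but not the fitted group means.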
By representing the t-test and ANOVA problems in terms of a “statistical model,” I want to convey two main messages. First, we use different models for different problems. Second, statistical inference is mostly about the relationship among variables. Likewise, a main goal in science is the understanding of the relationship among important variables. The relationship, described either qualitatively or quantitatively, is a model.
In a statistical problem, we define a model as the probability distribution of the variable of interest (the response variable). A probability distribution has a mean (or location) parameter and a parameter representing spread (e.g., standard deviation). When a distribution model is specified, we want to understand how the mean of the distribution varies as a function of other variables (predictor variables). In equation (5.2), the mean is a constant (no predictor variable). In equations (5.4) and (5.5), the mean varies by group ($g$ is the predictor variable). For a response variable with a normal distribution, the standard deviation can be estimated from the residuals. As a result, we can often express a statistical model as $y_i = f(x, \theta) + \varepsilon_i$, where $x$ represents predictor variable(s), $\theta$ represents unknown parameter(s) to be estimated, and $\varepsilon_i$ is a normal random variable with mean 0 and an unknown standard deviation ($\sigma$). In equation (5.5), $x$ represents both $g_1$ and $g_2$, and $\theta$ includes $\mu_1$, $\delta_1$, and $\delta_2$. The function $f(x, \theta)$ is an example of a mean function of a statistical model: a function that defines the relationship between the mean parameter of the response variable distribution and a number of predictors. Using equation (5.5), we can define a statistical modeling problem as follows:
• Model formulation – response variable is a normal random variable with different group means and a constant standard deviation (e.g., equation (5.5)).
• Parameter estimation – how to estimate unknown parameters (e.g., $\mu_1$, $\delta_1$, $\delta_2$, $\sigma$ in equation (5.5...

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Dedication
  6. Table of Contents
  7. Preface
  8. List of Figures
  9. List of Tables
  10. I Basic Concepts
  11. II Statistical Modeling
  12. III Advanced Statistical Modeling
  13. Bibliography
  14. Index