# Biometry for Forestry and Environmental Data

## With Examples in R

## Lauri Mehtätalo, Juha Lappi

- 412 pages
- English

## About This Book

Biometry for Forestry and Environmental Data with Examples in R focuses on statistical methods that are widely applicable in forestry and environmental sciences, but it also includes material that is of wider interest.

Features:

· Describes the theory and applications of selected statistical methods and illustrates their use and basic concepts through examples with forestry and environmental data in R.

· Rigorous but easily accessible presentation of the linear, nonlinear, generalized linear and multivariate models, and their mixed-effects counterparts. Chapters on tree size, tree taper, measurement errors, and forest experiments are also included.

· Necessary statistical theory about random variables, estimation and prediction is included. The wide applicability of the linear prediction theory is emphasized.

· The hands-on examples with implementations using R make it easier for non-statisticians to understand the concepts and apply the methods with their own data. A lot of additional material is available at www.biombook.org.

The book is aimed at students and researchers in forestry and environmental studies, but it will also be of interest to statisticians and researchers in other fields.

## Information

**R**. The examples in **R** are used for two reasons. First, in many cases ideas are better communicated with the use of examples, and by showing the script associated with these examples, communication should be easier and more transparent. Second, the hands-on examples of implementations using **R** shown in this book should make it easy even for non-statisticians to carry out similar analysis. However, the aim of this book is not to describe in detail the use of **R** in modeling environmental data, but rather to describe the theory and applications of certain statistical methods and illustrate their use and basic concepts through examples in **R**. The examples are separated from the main part of the text so that the theory and methods can also be understood without the examples.

**R**-scripts from our examples, or by clicking the correct buttons in some other software, such analysis is very error-prone and should be avoided in serious scientific work. In addition, such analysis often underutilizes the capabilities of the statistical methods, does not show all the valuable information contained in the data, and may also be badly misleading. Therefore, a good understanding of the fundamentals of statistics is an invaluable asset to any researcher who wants to carry out applied research in forestry and environmental sciences. To further underline this point, we list below examples of a number of issues that are commonly misunderstood or poorly recognized by researchers in applied fields:

- A linear model does not assume normality of the residual errors. The role of normality is often overly emphasized in applied sciences (see Section 4.6.3). This can lead, for example, to the use of non-parametric tests (which we do not address here) in many such data sets where parametric tests would be much better justified. For example, a parametric test (linear mixed-effects model) is more justified for grouped non-normal data than a non-parametric analysis that does not require normality even in small samples but ignores the dependency caused by the grouping. In general, the model assumptions have an order of importance: a good model for the expected value is more important than the model for the variance-covariance structure of the data, which in turn is more important than the assumptions about the shape of the distribution.
- A generalized linear model (GLM) does not utilize the information about the shape of the distribution in estimation and inference; it only utilizes the implicit variance-mean ratio (see Chapter 8). The excellence of generalized linear (mixed) models (GL(M)Ms) is, therefore, often overly emphasized. Any approach that properly models the error variance, and takes into account the range of the mean, may be equally well justified.
- Root mean squared error (RMSE) and the coefficient of determination are not acceptable criteria for comparing models fitted by different methods (see Box 4.3, p. 95) or evaluating whether random effects should be used or not. The model should be based on the structure of the data and previous knowledge of the process being modeled. In general, the use of coefficient of determination in model comparison has several pitfalls (see Section 4.5.3).
- The distribution of the *y*-variable in the context of regression models means that the distribution is conditional on the predictors. The marginal distribution of the *y*-variable is not useful in evaluating whether the stated assumption about the distribution has been met.
- If one wants to learn only one statistical concept well, we recommend the theory of linear prediction. It provides a large number of statistical concepts as special cases, as we will discuss in Section 3.5.2 and demonstrate throughout the book.
- A high *p*-value from a test indicates that the test failed to reject the null hypothesis, but it does not indicate that the null hypothesis is true (see Section 3.6). This is an issue that everyone learns in a basic course in statistics, but seems to be easily forgotten during a research career, probably because so many research papers misinterpret high *p*-values (see Amrhein *et al.* (2019) and comments thereof).
- Many researchers believe that the purpose of statistical analysis is to obtain low (i.e., significant) *p*-values. We emphasize that the modeling should be based on valid assumptions, not on assumptions that produce low *p*-values. Also, an honestly significant *p*-value does not imply that the effect is significant in practice.
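The list item on the distribution of the *y*-variable can be illustrated with a small simulation. The following is only a sketch on made-up data, not an example from the book: the sample size, the coefficients, and the two-cluster predictor are all assumptions chosen for illustration. The errors are normal by construction, yet the marginal distribution of *y* is clearly bimodal because the predictor is.

```r
# Simulated data (all values are illustrative assumptions, not from the book).
set.seed(1)
n <- 200

# A predictor with two well-separated clusters, e.g. two very different sites.
x <- c(rnorm(n / 2, mean = 0, sd = 0.5),
       rnorm(n / 2, mean = 10, sd = 0.5))

# The response is linear in x with normal errors, so the distribution of y
# conditional on x is normal.
y <- 2 + 3 * x + rnorm(n, sd = 1)

fit <- lm(y ~ x)

# Normality check applied to the marginal distribution of y:
# this mostly reflects the bimodal shape of x, not the error distribution.
p_marginal <- shapiro.test(y)$p.value

# Normality check applied to the residuals, which approximate the
# conditional distribution of y given x.
p_resid <- shapiro.test(residuals(fit))$p.value

cat("Shapiro-Wilk p-value, marginal y:", format.pval(p_marginal), "\n")
cat("Shapiro-Wilk p-value, residuals: ", format.pval(p_resid), "\n")
```

The check on the marginal *y* rejects normality even though the model is correctly specified, which is why the distributional assumption should be assessed from the residuals and not from a histogram of the raw response.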

*et al.* (2013) or any other similar appendices on the basics of matrices. Moreover, we use a lot of **R**-examples in this book; readers who are not familiar with the **R** software package should consult, for example, “An introduction to R” (`https://www.r-project.org/`).

`http://www.biombook.org`. **R**-examples are available from the book website. The data sets and some specific functions that are used in the examples are available in the **R**-package **lmfor**, which is available at the Comprehensive **R** Archive Network (CRAN).
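As a minimal sketch of getting started, assuming a working **R** installation with internet access to a CRAN mirror, the **lmfor** package can be installed and its contents listed as follows. The guard around `install.packages()` is our own convenience and not something the book prescribes.

```r
# Install lmfor from CRAN only if it is not already available
# (requires internet access to a CRAN mirror).
if (!requireNamespace("lmfor", quietly = TRUE)) {
  install.packages("lmfor")
}

# Attach the package for the current session.
library(lmfor)

# List the data sets shipped with the package.
data(package = "lmfor")

# Open the package help index to browse its functions and data sets.
help(package = "lmfor")
```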