Biometry for Forestry and Environmental Data

With Examples in R

Lauri Mehtätalo, Juha Lappi

  1. 412 pages
  2. English

About This Book

Biometry for Forestry and Environmental Data with Examples in R focuses on statistical methods that are widely applicable in forestry and environmental sciences, but it also includes material that is of wider interest.

Features:

· Describes the theory and applications of selected statistical methods and illustrates their use and basic concepts through examples with forestry and environmental data in R.

· Rigorous but easily accessible presentation of the linear, nonlinear, generalized linear and multivariate models, and their mixed-effects counterparts. Chapters on tree size, tree taper, measurement errors, and forest experiments are also included.

· Necessary statistical theory about random variables, estimation and prediction is included. The wide applicability of the linear prediction theory is emphasized.

· The hands-on examples with implementations using R make it easier for non-statisticians to understand the concepts and apply the methods with their own data. A lot of additional material is available at www.biombook.org.

The book is aimed at students and researchers in forestry and environmental studies, but it will also be of interest to statisticians and researchers in other fields.


Information

Year: 2020
ISBN: 9780429530777
Edition: 1

1 Introduction
Forestry and environmental research is commonly based on quantitative data, which can be used for various purposes. These include, for example, assessments of natural resources, finding and quantifying associations between some variables of interest, estimating the effects of certain factors on the variable of interest, and prediction of the variable of interest for units that have not been observed, such as new spatial locations or points in time. Quantitative data are commonly analyzed using statistical methods.
In this book, we focus on statistical methods that are widely applicable with forestry and environmental data. Those methods, such as regression analysis, are very general in the sense that the same methods are applied in many fields. However, when comparing quantitative forestry and environmental data to data sets from other fields, such as health sciences, social sciences, or economics, there are some differences. For example, forest and environmental data sets are often collected from spatially variable locations, possibly over time. They are strongly affected by the spatial locations themselves, and by the weather and climatic conditions of the site. Very often the data sets have a grouped structure; for example, they may consist of trees measured on sample plots that are measured at different locations at different points in time. Sometimes, the grouped structure may exist only in the collected data set, not in the population from which the data were sampled. For example, it may be caused by smooth changes in the studied phenomena over space, whose range is very wide compared to the size of the sample plots. The types of variables of interest also vary, including binary indicators, counts, percentages, and other continuous quantitative measures. When analyzing forest tree data, the applied statistical models should take into account what we already know about the shapes and dimensions of the trees or, more generally, our knowledge of the natural processes behind the variable of interest. Sometimes the data are based on controlled experiments in a laboratory. However, more often they are based on field measurements. In such data sets, there are a lot of factors that are not under the control of the experimenter and have a large effect on the data analysis, even when the field experiment has been well planned.
The methods, concepts and results presented in this book are illustrated throughout with examples, which are mostly implemented using the open-source statistical software R. The examples in R are used for two reasons. First, in many cases ideas are better communicated with the use of examples, and by showing the script associated with these examples, communication should be easier and more transparent. Second, the hands-on examples of implementations using R shown in this book should make it easy even for non-statisticians to carry out similar analyses. However, the aim of this book is not to describe in detail the use of R in modeling environmental data, but rather to describe the theory and applications of certain statistical methods and illustrate their use and basic concepts through examples in R. The examples are separated from the main part of the text so that the theory and methods can also be understood without the examples.
We emphasize that even though one might be able to make a statistical analysis by just editing the R-scripts from our examples, or by clicking the correct buttons in some other software, such analysis is very error-prone and should be avoided in serious scientific work. In addition, such analysis often underutilizes the capabilities of the statistical methods, does not show all the valuable information contained in the data, and may also be badly misleading. Therefore, a good understanding of the fundamentals of statistics is an invaluable asset to any researcher who wants to carry out applied research in forestry and environmental sciences. To further underline this point, we list below a number of issues that are commonly misunderstood or poorly recognized by researchers in applied fields:
  • A linear model does not assume normality of the residual errors. The role of normality is often overly emphasized in applied sciences (see Section 4.6.3). This can lead, for example, to the use of non-parametric tests (which we do not address here) in many such data sets where parametric tests would be much better justified. For example, a parametric test (linear mixed-effects model) is more justified for grouped non-normal data than a non-parametric analysis that does not require normality even in small samples but ignores the dependency caused by the grouping. In general, the model assumptions have an order of importance: a good model for the expected value is more important than the model for the variance-covariance structure of the data, which in turn is more important than the assumptions about the shape of the distribution.
  • A generalized linear model (GLM) does not utilize the information about the shape of the distribution in estimation and inference; it only utilizes the implicit variance-mean ratio (see Chapter 8). The excellence of generalized linear (mixed) models (GL(M)Ms) is, therefore, often overly emphasized. Any approach that properly models the error variance, and takes into account the range of the mean, may be equally well justified.
  • Root mean squared error (RMSE) and the coefficient of determination are not acceptable criteria for comparing models fitted by different methods (see Box 4.3, p. 95) or evaluating whether random effects should be used or not. The model should be based on the structure of the data and previous knowledge of the process being modeled. In general, the use of the coefficient of determination in model comparison has several pitfalls (see Section 4.5.3).
  • The distribution of the y-variable in the context of regression models means that the distribution is conditional on the predictors. The marginal distribution of the y-variable is not useful in evaluating whether the stated assumption about the distribution has been met.
  • If one wants to learn only one statistical concept well, we recommend the theory of linear prediction. It provides a large number of statistical concepts as a special case, as we will discuss in Section 3.5.2 and demonstrate throughout the book.
  • A high p-value from a test indicates that the test failed to reject the null hypothesis, but it does not indicate that the null hypothesis is true (see Section 3.6). This is an issue that everyone learns in a basic course in statistics, but it seems to be easily forgotten during a research career, probably because so many research papers misinterpret high p-values (see Amrhein et al. (2019) and comments thereon).
  • Many researchers believe that the purpose of the statistical analysis is to obtain low (i.e. significant) p-values. We emphasize that the modeling should be based on valid assumptions, not on assumptions that produce low p-values. Also, an honestly significant p-value does not imply that the effect is significant in practice.
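The point above about the conditional distribution of the y-variable can be illustrated with a small simulation. The following is only a minimal sketch, not an example from the book: it is written in Python with the standard library so that it is self-contained (the book's own examples are in R), and the predictor range, coefficients, sample size and the helper names `excess_kurtosis` and `ols_fit` are all hypothetical choices made for illustration. The errors are simulated as normal, so the distributional assumption holds conditionally on x, yet the marginal distribution of y is clearly non-normal because x varies widely.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis; close to 0 for normally distributed data."""
    m = statistics.fmean(values)
    var = statistics.fmean([(v - m) ** 2 for v in values])
    fourth = statistics.fmean([(v - m) ** 4 for v in values])
    return fourth / var ** 2 - 3.0


def ols_fit(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical predictor with a wide, uniform spread.
x = [rng.uniform(0.0, 30.0) for _ in range(n)]
# Normal errors around a linear mean: normality holds conditionally on x.
y = [2.0 + 1.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

b0, b1 = ols_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

marginal_kurtosis = excess_kurtosis(y)          # far from 0: y looks non-normal
residual_kurtosis = excess_kurtosis(residuals)  # near 0: errors look normal

print(f"slope estimate: {b1:.3f}")
print(f"excess kurtosis of y (marginal): {marginal_kurtosis:.2f}")
print(f"excess kurtosis of residuals (conditional): {residual_kurtosis:.2f}")
```

In this sketch, checking a histogram of y alone would suggest a strongly non-normal variable, whereas the residuals, which approximate the distribution of y conditional on the predictor, are consistent with normality. Excess kurtosis is used here only as a simple numerical summary of shape; in practice, graphical residual diagnostics would be a more common choice.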
This book is aimed at students and researchers in forestry and environmental studies. We assume that the readers have a basic understanding of statistics. We also assume that they are familiar with basic matrix algebra; readers who are not familiar with matrices should read, e.g. Appendix A in Fahrmeir et al. (2013) or any other similar appendices on the basics of matrices. Moreover, we use a lot of R-examples in this book; readers who are not familiar with the R software package should consult, for example, “An introduction to R” (https://www.r-project.org/).
There are approximately 170 examples in this textbook. A proportion of them are web examples, which are just briefly mentioned in the book and are available in their entirety from the book website at
http://www.biombook.org.
In addition, the full scripts for all R-examples are available from the book website. The data sets and some specific functions that are used in the examples are available in the R-package lmfor, which is available at the Comprehensive R Archive Network (CRAN).
The subsequent chapters of the book are organized as follows: Chapters 2 and 3 summarize the necessary preliminaries for a sufficiently deep understanding of the main ideas, capabilities and constraints of the methods described in the subsequent chapters. Chapter 2 presents the basic mathematical tools that are used to formulate a (theoretical) model for a process of interest and Chapter 3 presents the general principles about how the process parameters can be estimated using observed data. In particular, Section 3.5 describes the linear predictor, the generality of which is emphasized and demonstrated in almost all subsequent chapters of the book.
Chapters 4–10 are devoted to regression models in different contexts. Chapter 4 covers the linear model. A difference between our text and many other textbooks on the topic is that we present the model directly for a general variance-covariance structure. The use of that model in prediction is also illustrated, which links the linear model to the prediction of time series and geostatistics. Chapters 5 and 6 cover the linear mixed-effects models for a data set with a single level of grouping. In contrast to many other textbooks on the subject, we devote considerable space to illustrating how the model is formulated in the matrix form. One reason for this is the common application in forest sciences, where a previously fitted model is used to predict the random effects in a new group of interest. Sections 6.3–6.5 include topics that we have not seen previously discussed in textbooks. Chapters 7, 8 and 9 generalize the ideas of Chapters 4–6 to non-normal, nonlinear and multivariate models. The similarity between the nonlinear and generalized linear models is emphasized. Chapter 9 discusses the multivariate models and shows through examples how such a model is formulated as a univariate system, and how the cross-model correlations can be utilized in prediction. Chapter 10 extends the discussion of regression models by addressing some topics that are common to all the models described in the previous chapters.
Chapters 11–14 discuss some specific topics related to our own experiences in modeling forest data sets. These include the modeling of tree size, stem taper, measurement errors, and a short discussion on analysis and planning of forest experiments.
Our notations differ slightly between chapters. The most important difference is probably the meaning of capital and lower...
