eBook - ePub

Biometry for Forestry and Environmental Data

Name: Biometry for Forestry and Environmental Data
Author: Lauri Mehtätalo, Juha Lappi

With Examples in R

Lauri Mehtätalo, Juha Lappi

412 pages
English

About this book

Biometry for Forestry and Environmental Data with Examples in R focuses on statistical methods that are widely applicable in forestry and environmental sciences, but it also includes material that is of wider interest.

Features:

· Describes the theory and applications of selected statistical methods and illustrates their use and basic concepts through examples with forestry and environmental data in R.

· Rigorous but easily accessible presentation of the linear, nonlinear, generalized linear and multivariate models, and their mixed-effects counterparts. Chapters on tree size, tree taper, measurement errors, and forest experiments are also included.

· Necessary statistical theory about random variables, estimation and prediction is included. The wide applicability of the linear prediction theory is emphasized.

· The hands-on examples with implementations using R make it easier for non-statisticians to understand the concepts and apply the methods with their own data. A lot of additional material is available at www.biombook.org.

The book is aimed at students and researchers in forestry and environmental studies, but it will also be of interest to statisticians and researchers in other fields.

Information

Publisher: Chapman and Hall/CRC
Year: 2020
ISBN: 9780429530777
Subject: Mathematics
Subtopic: Probability & Statistics

Introduction

Forestry and environmental research is commonly based on quantitative data, which can be used for various purposes. These include, for example, assessments of natural resources, finding and quantifying associations between some variables of interest, estimating the effects of certain factors on the variable of interest, and prediction of the variable of interest for units that have not been observed, such as new spatial locations or points in time. Quantitative data are commonly analyzed using statistical methods.

In this book, we focus on statistical methods that are widely applicable with forestry and environmental data. Those methods, such as regression analysis, are very general in the sense that the same methods are applied in many fields. However, quantitative forestry and environmental data differ in some respects from data sets in other fields, such as health sciences, social sciences, or economics. For example, forest and environmental data sets are often collected from spatially variable locations, possibly over time. They are strongly affected by the spatial locations themselves, and by the weather and climatic conditions of the site. Very often the data sets have a grouped structure; for example, they may consist of trees measured on sample plots that are measured at different locations at different points in time. Sometimes, the grouped structure may exist only in the collected data set, not in the population from which the data were sampled. For example, it may be caused by smooth changes in the studied phenomena over space, with a range that is very wide compared to the size of the sample plots. The types of variables of interest also vary, including binary indicators, counts, percentages, and other continuous quantitative measures. When analyzing forest tree data, the applied statistical models should take into account what we already know about the shapes and dimensions of the trees or, more generally, our knowledge of the natural processes behind the variable of interest. Sometimes the data are based on controlled experiments in a laboratory. However, more often they are based on field measurements. In such data sets, there are a lot of factors that are not under the control of the experimenter and have a large effect on the data analysis, even when the field experiment has been well planned.

The methods, concepts and results presented in this book are illustrated throughout with examples, which are mostly implemented using the open-source statistical software R. The examples in R are used for two reasons. First, in many cases ideas are better communicated with the use of examples, and by showing the script associated with these examples, communication should be easier and more transparent. Second, the hands-on examples of implementations using R shown in this book should make it easy even for non-statisticians to carry out similar analyses. However, the aim of this book is not to describe in detail the use of R in modeling environmental data, but rather to describe the theory and applications of certain statistical methods and illustrate their use and basic concepts through examples in R. The examples are separated from the main text so that the theory and methods can also be understood without the examples.

We emphasize that even though one might be able to make a statistical analysis by just editing the R-scripts from our examples, or by clicking the correct buttons in some other software, such analysis is very error-prone and should be avoided in serious scientific work. In addition, such analysis often underutilizes the capabilities of the statistical methods, does not show all the valuable information contained in the data, and may also be badly misleading. Therefore, a good understanding of the fundamentals of statistics is an invaluable asset to any researcher who wants to carry out applied research in forestry and environmental sciences. To further underline this point, we list below a number of issues that are commonly misunderstood or poorly recognized by researchers in applied fields:

· A linear model does not assume normality of the residual errors. The role of normality is often overemphasized in applied sciences (see Section 4.6.3). This can lead, for example, to the use of non-parametric tests (which we do not address here) in many data sets where parametric tests would be much better justified. For example, a parametric test (linear mixed-effects model) is more justified for grouped non-normal data than a non-parametric analysis that does not require normality even in small samples but ignores the dependency caused by the grouping. In general, the model assumptions have an order of importance: a good model for the expected value is more important than the model for the variance-covariance structure of the data, which in turn is more important than the assumptions about the shape of the distribution.

· A generalized linear model (GLM) does not utilize the information about the shape of the distribution in estimation and inference; it only utilizes the implicit variance-mean ratio (see Chapter 8). The excellence of generalized linear (mixed) models (GL(M)Ms) is, therefore, often overemphasized. Any approach that properly models the error variance, and takes into account the range of the mean, may be equally well justified.

· Root mean squared error (RMSE) and the coefficient of determination are not acceptable criteria for comparing models fitted by different methods (see Box 4.3, p. 95) or for evaluating whether random effects should be used. The model should be based on the structure of the data and previous knowledge of the process being modeled. In general, the use of the coefficient of determination in model comparison has several pitfalls (see Section 4.5.3).

· In the context of regression models, the distribution of the y-variable means the distribution conditional on the predictors. The marginal distribution of the y-variable is not useful in evaluating whether the stated assumption about the distribution has been met.

· If one wants to learn only one statistical concept well, we recommend the theory of linear prediction. It provides a large number of statistical concepts as special cases, as we will discuss in Section 3.5.2 and demonstrate throughout the book.

· A high p-value from a test indicates that the test failed to reject the null hypothesis; it does not indicate that the null hypothesis is true (see Section 3.6). Everyone learns this in a basic course in statistics, but it seems to be easily forgotten during a research career, probably because so many research papers misinterpret high p-values (see Amrhein et al. (2019) and the comments on it).

· Many researchers consider the purpose of statistical analysis to be obtaining low (i.e. significant) p-values. We emphasize that the modeling should be based on valid assumptions, not on assumptions that produce low p-values. Also, an honestly significant p-value does not imply that the effect is significant in practice.
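
The point about the conditional versus the marginal distribution of the y-variable can be illustrated with a small simulation. The following is a minimal sketch, not an example from the book: the sample size, the coefficients and the exponential predictor are assumptions chosen for illustration, and the Shapiro-Wilk test is used only as a convenient numerical summary of non-normality.

```r
# Simulated data (all values are illustrative assumptions)
set.seed(1)
n <- 200
x <- rexp(n, rate = 1)                # strongly right-skewed predictor
y <- 2 + 3 * x + rnorm(n, sd = 0.5)   # normal errors around a linear mean

fit <- lm(y ~ x)

# Marginal distribution of y: inherits the skewness of x
p_marginal <- shapiro.test(y)$p.value

# Distribution of y conditional on x, assessed through the residuals
p_conditional <- shapiro.test(residuals(fit))$p.value

print(c(marginal = p_marginal, conditional = p_conditional))
```

Here the errors are normal by construction, yet the marginal distribution of y is clearly skewed because x is skewed. Only the residuals, which reflect the distribution conditional on the predictor, are relevant to the distributional assumption of the model.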

This book is aimed at students and researchers in forestry and environmental studies. We assume that the readers have a basic understanding of statistics. We also assume that they are familiar with basic matrix algebra; readers who are not familiar with matrices should read, e.g. Appendix A in Fahrmeir et al. (2013) or any similar appendix on the basics of matrices. Moreover, we use a lot of R-examples in this book; readers who are not familiar with R should consult, for example, “An Introduction to R” (https://www.r-project.org/).

There are approximately 170 examples in this textbook. A proportion of them are web examples, which are just briefly mentioned in the book and are available in their entirety from the book website at

http://www.biombook.org.

In addition, the full scripts for all R-examples are available from the book website. The data sets and some specific functions that are used in the examples are available in the R-package lmfor, which is available from the Comprehensive R Archive Network (CRAN).
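
As a minimal sketch of how the package might be obtained and inspected (the installation line is commented out because it needs internet access; the variable name lmfor_available is our own and not part of the package):

```r
# install.packages("lmfor")   # one-time installation from CRAN

# Check whether the package is installed without attaching it
lmfor_available <- requireNamespace("lmfor", quietly = TRUE)

if (lmfor_available) {
  library(lmfor)
  # List the data sets shipped with the package (name and short title)
  print(data(package = "lmfor")$results[, c("Item", "Title")])
} else {
  message("Package 'lmfor' is not installed; run install.packages(\"lmfor\") first.")
}
```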

The subsequent chapters of the book are organized as follows: Chapters 2 and 3 summarize the necessary preliminaries for a sufficiently deep understanding of the main ideas, capabilities and constraints of the methods described in the subsequent chapters. Chapter 2 presents the basic mathematical tools that are used to formulate a (theoretical) model for a process of interest and Chapter 3 presents the general principles about how the process parameters can be estimated using observed data. In particular, Section 3.5 describes the linear predictor, the generality of which is emphasized and demonstrated in almost all subsequent chapters of the book.

Chapters 4–10 are devoted to regression models in different contexts. Chapter 4 covers the linear model. A difference between our text and many other textbooks on the topic is that we present the model directly for a general variance-covariance structure. The use of that model in prediction is also illustrated, which links the linear model to the prediction of time series and geostatistics. Chapters 5 and 6 cover the linear mixed-effects models for a data set with a single level of grouping. In contrast to many other textbooks on the subject, we devote considerable space to illustrating how the model is formulated in the matrix form. One reason for this is the common application in forest sciences, where a previously fitted model is used to predict the random effects in a new group of interest. Sections 6.3–6.5 include topics that we have not seen previously discussed in textbooks. Chapters 7, 8 and 9 generalize the ideas of Chapters 4–6 to non-normal, nonlinear and multivariate models. The similarity between the nonlinear and generalized linear models is emphasized. Chapter 9 discusses the multivariate models and shows through examples how such a model is formulated as a univariate system, and how the cross-model correlations can be utilized in prediction. Chapter 10 extends the discussion of regression models by addressing some topics that are common to all the models described in the previous chapters.
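
The use of a previously fitted mixed-effects model to predict the random effect of a new group can be sketched in matrix form as follows. This is an illustrative sketch under stated assumptions, not code from the book: it assumes a random-intercept model, and the parameter estimates and the three observations from the new group are hypothetical numbers. The random effect is predicted with the standard linear predictor D Z' V^{-1} (y - X beta), where V = Z D Z' + R.

```r
# Hypothetical estimates from a previously fitted random-intercept model
beta    <- c(10, 0.8)   # fixed effects: intercept and slope
sigma_b <- 2            # SD of the random intercept (between groups)
sigma_e <- 1.5          # SD of the residual error (within group)

# Hypothetical observations from one new group (e.g. three trees on a new plot)
x_new <- c(12, 18, 25)
y_new <- c(21.5, 26.0, 32.1)

n <- length(y_new)
X <- cbind(1, x_new)                 # fixed-effects design matrix
Z <- matrix(1, nrow = n, ncol = 1)   # random-effects design matrix
D <- matrix(sigma_b^2, 1, 1)         # variance of the random intercept
R <- diag(sigma_e^2, n)              # variance-covariance matrix of the errors

V <- Z %*% D %*% t(Z) + R            # variance-covariance matrix of y in the group
resid_fixed <- y_new - X %*% beta    # deviations from the fixed part
b_hat <- D %*% t(Z) %*% solve(V, resid_fixed)

# For a random intercept, the same predictor has a closed form:
# the mean deviation shrunk towards zero
shrink   <- sigma_b^2 / (sigma_b^2 + sigma_e^2 / n)
b_closed <- shrink * mean(resid_fixed)

# Prediction for an unobserved unit of the same group with x = 20
y_pred <- sum(c(1, 20) * beta) + as.numeric(b_hat)

print(c(b_hat = as.numeric(b_hat), b_closed = b_closed, y_pred = y_pred))
```

The matrix expression and the closed-form shrinkage formula give the same predicted random effect, which shows why the matrix formulation is convenient: the same lines work unchanged when Z and D describe several random effects.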

Chapters 11–14 discuss some specific topics related to our own experiences in modeling forest data sets. These include the modeling of tree size, stem taper, measurement errors, and a short discussion on analysis and planning of forest experiments.

Our notations differ slightly between chapters. The most important difference is probably the meaning of capital and lower...