
Computer Intensive Statistical Methods
Validation, Model Selection, and Bootstrap
272 pages
About this book
This book focuses on computer intensive statistical methods, such as validation, model selection, and bootstrap, which help solve problems that could not previously be handled by methods such as regression and time series modelling in areas such as economics, meteorology, and transportation.
CHAPTER 1
Prelude
1.1 Background
Some powerful and very fundamental new principles have been developed in statistics for model selection, estimation, and evaluation of uncertainty. The underlying ideas are often simple and allow great freedom regarding the model structure. Instead they require many repetitions of almost identical computations on the data, and are therefore often named computer intensive statistical methods.
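As a first taste of such repeated computations, consider the bootstrap estimate of a standard error: the same estimator is applied over and over to resampled copies of the data. This is an illustrative sketch, not an example from the book; the data and the number of replicates are arbitrary.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_boot=2000, seed=1):
    """Estimate the standard error of `estimator` by resampling.

    Each replicate repeats the same computation on a resampled copy of
    the data -- the "almost identical computations" that make the
    method computer intensive.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = [
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    ]
    return statistics.stdev(replicates)

data = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9, 2.2, 3.6]
se = bootstrap_se(data, statistics.mean)
```

Even this toy version performs two thousand re-estimations, yet it finishes in a fraction of a second on an ordinary PC — computer intensive only relative to a single classical calculation.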
Being computer intensive is a relative quality. We will here use it relative to traditional statistical computation. This means that the computations are in many cases still small compared to everyday computations in such areas as meteorology, physics or mechanics of materials. There are in fact several small examples where the intensive computations finish within a few seconds or minutes on a small PC. Other problems may require heavier computations, and in some cases the fastest computers will still set the limits of what we can do.
The large interest in computer intensive methods in statistics is basically not founded on the computer as such, nor on computer science in a broader sense. Instead the interest comes from the information concept. The purpose of every classical or new statistical analysis is to take care of and present the information in the given data as efficiently as possible. The means to attain this goal can vary, but will typically require computations which are heavy to do by hand. This motivated an early interest in automatic tools for the computations but the progress was fairly slow. Time and cost considerations therefore directed statistical work towards models and methods which kept down the amount of computation, and it is not until recently that these limitations have been removed almost entirely, and computer intensive statistical methods have spread over the world. Even extensive computations will today give costs which can typically be neglected compared to data collection, theoretical work, report writing and the consequences of the studies.
A lot of computation does not, however, guarantee that the information has been well used. The inference problem must be well understood, and a small but smart analysis can be more correct than one which studies ‘everything’.
Much statistical software is now available and easy to use without much theoretical feeling for the subject. The potential misuse of statistical measures is therefore larger than ever. This refers in particular to the classical methods, which so far dominate the market and which are typically based on special parametric assumptions and correct data collection. However, the new computer intensive methods are not infallible in this respect. They may rest on fewer parametric assumptions, but they certainly need to be based on correct data, and they often give rougher and more approximate solutions in situations where the classical parametric assumptions are fulfilled. It is therefore important to be critical and to keep a theoretical view in order to understand when computer intensive methods have anything essential to offer compared to less computer intensive techniques. The methods are justified when they can extract the information and describe the uncertainty better than alternative techniques.
Some frequent misuses of classical statistical software can in fact be counteracted by computer intensive methods, and we will in particular discuss these possibilities in model selection situations later. This means that some problems, in a sense caused by the computer, can also be solved by the computer. At the same time the new methods solve many problems which could not be properly attacked before.
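To preview how such a safeguard works (an illustrative sketch only; cross validation itself is the subject of Chapter 3): candidate models can be compared by how well each predicts observations left out of its own fitting, which penalizes a model chosen merely because it fits the data at hand. The polynomial example and the data below are invented for illustration.

```python
import numpy as np

def loo_cv_error(x, y, degree):
    """Leave-one-out cross validation error for a polynomial fit:
    each point is predicted from a model fitted to all the others."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        coefs = np.polyfit(x[mask], y[mask], degree)
        errs.append((y[i] - np.polyval(coefs, x[i])) ** 2)
    return float(np.mean(errs))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=x.size)  # truly linear data

# A high-degree polynomial fits the observed data better, but cross
# validation exposes its poor prediction of left-out points.
cv_errors = {d: loo_cv_error(x, y, d) for d in (1, 2, 5, 8)}
best_degree = min(cv_errors, key=cv_errors.get)
```

Here the selection criterion is prediction of unseen data rather than in-sample fit, which is exactly the kind of check that counteracts the automated over-selection discussed above.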
1.2 About models
We will make extensive but abstract use of different statistical models, and for those readers who do not swim among such models all the time the following introduction may be useful.
A very general statement is that we need models in order to structure our ideas and conclusions. The models we use are simplified pictures of the phenomenon we are studying. Except for some very pure situations the models are therefore not ‘true’ but will typically have some defects.
The computer intensive methods make fewer assumptions than classical methods and are robust against small departures from these assumptions. One potential use of them is therefore to correct for some model defects, such as when estimates are based on oversimplified models. (Many engineering results in reliability and queueing are, for example, derived under assumed exponential distributions but are applied much more widely.) But this should not be stressed too hard. Some results presume that the corrections are small compared to the variability (for example bootstrap estimation of variability and bias correction). Sometimes we do not even need an explicit model, although we always feel there is one in the background. A prediction formula or an estimator can be all we need. Conclusions drawn about the models will be transferred to the real phenomenon that we are modelling, and with good models the model defects will not have serious effects on these conclusions.
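The bias-correction idea mentioned in passing can be sketched as follows (illustrative only; Chapter 5 treats the statistical bootstrap properly). The plug-in variance estimator below is a standard example of a slightly biased estimator, chosen here for illustration: the bootstrap estimates its bias by comparing the average of the resampled estimates with the estimate on the original data.

```python
import random
import statistics

def plug_in_var(data):
    """Plug-in (divide-by-n) variance estimator, biased downward
    by the factor (n - 1) / n."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

def bootstrap_bias(data, estimator, n_boot=4000, seed=1):
    """Bootstrap bias estimate: mean of the resampled estimates
    minus the estimate on the original data."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = estimator(data)
    boot_mean = statistics.mean(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    return boot_mean - theta_hat

data = [5.0, 6.2, 4.8, 7.1, 5.5, 6.8, 5.9, 4.4, 6.1, 5.3]
bias = bootstrap_bias(data, plug_in_var)       # negative: underestimation
corrected = plug_in_var(data) - bias           # bias-corrected estimate
```

Note that the correction is small relative to the sampling variability, which matches the caveat above: bootstrap bias correction is intended for small corrections, not for repairing a badly wrong model.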
We will mostly limit ourselves to stochastic models with a set of adjustable constants, parameters, which are unknown or only partially known. When distribution free statistical methods are used, we may even consider more vague model structures which can sometimes be characterized by an infinite dimensional parameter vector.
Statistical models can be regarded as tools for collecting and combining information. Already when a model is formulated we can (and to some extent must) bring in prior knowledge based on experience of the studied object. Observations and estimates of model parameters will sharpen this knowledge. One of the advantages of statistical models is that this sharpening can be described and measured, for example by the width of confidence intervals for the parameters. But information can be used for more than confidence intervals. In most models we can also find methods to predict future results. The validation methods in Chapters 3 and 4 will be based ...
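The sharpening of knowledge mentioned above can be put in numbers. As a sketch under the usual normal-theory assumptions (not an example from the book), the width of a confidence interval for a mean shrinks roughly as one over the square root of the sample size.

```python
import math
import random
import statistics

def ci_width(data, z=1.96):
    """Width of an approximate 95% normal-theory confidence interval
    for the mean: 2 * z * s / sqrt(n)."""
    return 2 * z * statistics.stdev(data) / math.sqrt(len(data))

# More observations sharpen our knowledge of the mean: the interval narrows.
rng = random.Random(0)
small = [rng.gauss(10.0, 2.0) for _ in range(25)]
large = small + [rng.gauss(10.0, 2.0) for _ in range(375)]

w_small = ci_width(small)
w_large = ci_width(large)
```

Going from 25 to 400 observations shrinks the interval width by roughly a factor of four, a concrete measurement of how the extra data sharpen the prior knowledge built into the model.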
Table of contents
- Cover
- Title Page
- Copyright Page
- Table of Contents
- Preface
- 1 Prelude
- 2 Computer intensive philosophy
- 3 Cross validation
- 4 Validation of time series problems
- 5 Statistical bootstrap
- 6 Further bootstrap results
- 7 Computer intensive applications
- References
- Index
Computer Intensive Statistical Methods by J. S. Urban Hjorth (Mathematics, Probability & Statistics) is available in PDF and ePUB formats.