eBook - ePub

Choosing and Using Statistics

Name: Choosing and Using Statistics
ISBN: 9781444348217

A Biologist's Guide

Calvin Dytham,

English
ePUB (mobile friendly)
Available on iOS & Android

eBook - ePub

Choosing and Using Statistics

A Biologist's Guide

Calvin Dytham,

About this book

Choosing and Using Statistics remains an invaluable guide for students using a computer package to analyse data from research projects and practical class work. The text takes a pragmatic approach to statistics with a strong focus on what is actually needed. There are chapters giving useful advice on the basics of statistics and guidance on the presentation of data. The book is built around a key to selecting the correct statistical test and then gives clear guidance on how to carry out the test and interpret the output from four commonly used computer packages: SPSS, Minitab, Excel, and (new to this edition) the free program, R. Only the basics of formal statistics are described and the emphasis is on jargon-free English but any unfamiliar words can be looked up in the extensive glossary. This new 3rd edition of Choosing and Using Statistics is a must for all students who use a computer package to apply statistics in practical and project work.

Features new to this edition:

Now features information on using the popular free program, R
Uses a simple key and flow chart to help you choose the right statistical test
Aimed at students using statistics for projects and in practical classes
Includes an extensive glossary and key to symbols to explain any statistical jargon
No previous knowledge of statistics is assumed

Trusted by 375,005 students

Access to over 1.5 million titles for a fair monthly price.

Study more efficiently using our study tools.

Publisher

Wiley-Blackwell

Year

2011

Print ISBN

9781405198394

9781405198387

Edition

eBook ISBN

9781444348217

Topic

Biological Sciences

Subtopic

Biology

Index

Biological Sciences

1
Eight steps to successful data analysis

This is a very simple sequence that, if you follow it, will integrate the statistics you use into the process of scientific investigation. As I make clear here, statistical tests should be considered very early in the process and not left until the end.

1 Decide what you are interested in.
2 Formulate a hypothesis or several hypotheses (see Chapters 2 and 3 for guidance).
3 Design the experiment, manipulation or sampling routine that will allow you to test the hypotheses (see Chapters 2 and 4 for some hints on how to go about this).
4 Collect dummy data (i.e. make up approximate values based on what you expect to obtain). The collection of ‘dummy data’ may seem strange but it will convert the proposed experimental design or sampling routine into something more tangible. The process can often expose flaws or weaknesses in the datacollection routine that will save a huge amount of time and effort.
5 Use the key presented in Chapter 3 to guide you towards the appropriate test or tests.
6Carry out the test(s) using the dummy data. (Chapters 6–9 will show you how to input the data, use the statistical packages and interpret the output.)
7 If there are problems go back to step 3 (or 2); otherwise, proceed to the collection of real data.
8 Carry out the test(s) using the real data. Report the findings and/or return to step 2.

I implore you to use this sequence. I have seen countless students who have spent a long time and a lot of effort collecting data only to find that the experimental or sampling design was not quite right. The test they are forced to use is much less powerful than one they could have used with only a slight change in the experimental design. This sort experience tends to turn people away from statistics and become ‘scared’ of them. This is a great shame as statistics are a hugely useful and vital tool in science.

The rest of the book follows this eight-step process but you should use it for guidance and advice when you become unsure of what to do.

2
The basics

The aim of this chapter is to introduce, in rather broad terms, some of the recurring concepts of data collection and analysis. Everything introduced here is covered at greater length in later chapters and certainly in the many statistics textbooks that aim to introduce statistical theory and experimental design to scientists.

The key to statistical tests in the next chapter assumes that you are familiar with most of the basic concepts introduced here.

Observations

These are the raw material of statistics and can include anything recorded as part of an investigation. They can be on any scale from a simple ‘raining or not raining’ dichotomy to a very sophisticated and precise analysis of nutrient concentrations. The type of observations recorded will have a great bearing on the type of statistical tests that are appropriate.

Observations can be simply divided into three types: categorical where the observations can be in a limited number of categories which have no obvious scale (e.g. ‘oak’, ‘ash’, ‘elm’); discrete where there is a real scale but not all values are possible (e.g. ‘number of eggs in a nest’ or ‘number of species in a sample’) and continuous where any value is theoretically possible, only restricted by the measuring device (e.g. lengths, concentrations).

Different types of observations are considered in more detail in Chapter 5.

Hypothesis testing

The cornerstone of scientific analysis is hypothesis testing. The concept is rather simple: almost every time a statistical test is carried out it is testing the probability that a hypothesis is correct. If the probability is small then the hypothesis is deemed to be untrue and it is rejected in favour of an alternative. This is done in what seems to be a rather upside down way as the test is always of what is called the null hypothesis rather than the more interesting hypothesis. The null hypothesis is the hypothesis that nothing is going on (it is often labelled as H₀). For example, if the weights of bulbs for two cultivars of daffodils were being investigated, the null hypothesis would be that there is no weight difference between cultivars: ‘the weights of the two groups of bulbs are the same’ or, more correctly, ‘the two groups of bulbs are samples from a larger population with the same distribution’. A statistical test is carried out to find out how likely that null hypothesis is to be true. If we decide to reject the null hypothesis we must accept the alternative, more interesting, hypothesis (H₁) that: ‘the weights of bulbs for the two cultivars are different’ or, more correctly, that ‘the groups are samples from populations with different distributions’.

P-values

The P-value is the bottom line of most statistical tests. (Incidentally, you may come across it written in upper or lower case, italic or not: e.g. P value, P-value, p value or p-value.) It is the probability of seeing data this extreme or more extreme if the null hypothesis is true. So if a P-value is given as 0.06 it indicates that you have a 6% chance of seeing data like this if the null hypothesis is true. In biology it is usual to take a value of 0.05 or 5% as the critical level for the rejection of a hypothesis. This means that providing a hypothesis has a less than one in 20 chance of being true we reject it. As it is the null hypothesis that is nearly always being tested we are always looking for low P-values to reject this hypothesis and accept the more interesting alternative hypothesis.

Clearly the smaller the P-value the more confident we can be in the conclusions drawn from it. A P-value of 0.0001 indicates that if the null hypothesis is true the chance of seeing data as extreme or more extreme than that being tested is one in 10 000. This is much more convincing than a marginal P = 0.049.

P-values and the types of errors that are implicitly accepted by their use are considered further in Chapter 4.

Sampling

Observations have to be collected in some way. This process of data acquisition is called sampling. Although there are almost as many different methods that can be used for sampling as there are possible things to sample, there are some general rules. One of the most obvious is that a large number of observations is usually better than a small number. Balanced sampling is also important (i.e. when comparing two groups take the same number of observations from each group).

Most statistical tests assume that samples are taken at random. This sounds easy but is actually quite difficult to achieve. For example, if you are sampling beetles from pit-fall traps the sample may seem totally random but in fact is quite biased towards those species that move around the most and fail to avoid the traps. Another common bias is to chose a point at random and then measure the nearest individual to that point, assuming that this will produce a random sample. It will not be random at all as isolated individuals and those at the edges of clumps are more likely to be selected than those in the middle. There are methods available to reduce problems associated with non-random sampling but the first step is to be aware of the problem.

A further assumption of sampling is that individuals are either only measured once or they are all sampled on several occasions. This assumption is often violated if, for example, the same site is visited on two occasions and the same individuals or clones are inadvertently remeasured.

The sets of observations collected are called variables. A variable can be almost anything it is possible to record as long as different individuals can be assigned different values.

Some of the problems of sampling are considered in Chapter 4.

Experiments

In biology many investigations use experiments of some sort. An experiment occurs when anything is altered or controlled by the investigator. For example, an investigation into the effect of fertilizer on plant growth will use a control plot (or several control plots) where there is no fertilizer added and then one or more plots where fertilizer has been added at known concentrations set by the investigators. In this way the effect of fertilizer can be determined by comparison of the different concentrations of fertilizer. The condition being controlled (e.g. fertilizer) is usually called a factor and the different levels used called treatments or factor levels (e.g. concentrations of fertilizer). The design of this experiment will be determined by the hypothesis or hypotheses being investigated. If the effect of the fertilizer on a particular plant is of interest then perhaps a range of different soil types might be used with and without fertilizer. If the effect on plants in general is of interest then an experiment using a variety of plants...

Cover
Table of Contents
Title
Copyright
Preface
1 Eight steps to successful data analysis
2 The basics
3 Choosing a test: a key
4 Hypothesis testing sampling and experimental design
5 Statistics, variables and distributions
6 Descriptive and presentational techniques
7 The tests 1: tests to look at differences
8 The tests 2: tests to look at relationships
9 The tests 3: tests for data exploration
Symbols and letters used in statistics
Glossary
Assumptions of the tests
Hints and tips
A table of statistical tests
Index
End User License Agreement

Frequently asked questions

Can I cancel at any time?

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription

Can I download books?

No, books cannot be downloaded as external files, such as PDFs, for use outside of Perlego. However, you can download books within the Perlego app for offline reading on mobile or tablet. Learn how to download books offline

What is the difference between the pricing plans?

Perlego offers two plans: Essential and Complete

Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.5M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.

Both plans are available with monthly, semester, or annual billing cycles.

How does Perlego work?

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1.5 million books across 990+ topics, we’ve got you covered! Learn about our mission

Do you support text-to-speech?

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more about Read Aloud

Can I read on my tablet or smartphone?

Yes! You can use the Perlego app on both iOS and Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app

Is Choosing and Using Statistics an online PDF/ePUB?

Yes, you can access Choosing and Using Statistics by Calvin Dytham in PDF and/or ePUB format, as well as other popular books in Biological Sciences & Biology. We have over 1.5 million books available in our catalogue for you to explore.

Related ISBNs