Choosing and Using Statistics

A Biologist's Guide

Calvin Dytham


About This Book

Choosing and Using Statistics remains an invaluable guide for students using a computer package to analyse data from research projects and practical class work. The text takes a pragmatic approach to statistics with a strong focus on what is actually needed. There are chapters giving useful advice on the basics of statistics and guidance on the presentation of data. The book is built around a key to selecting the correct statistical test and then gives clear guidance on how to carry out the test and interpret the output from four commonly used computer packages: SPSS, Minitab, Excel, and (new to this edition) the free program, R. Only the basics of formal statistics are described and the emphasis is on jargon-free English, but any unfamiliar words can be looked up in the extensive glossary. This new 3rd edition of Choosing and Using Statistics is a must for all students who use a computer package to apply statistics in practical and project work.

Features new to this edition:

  • Now features information on using the popular free program, R
  • Uses a simple key and flow chart to help you choose the right statistical test
  • Aimed at students using statistics for projects and in practical classes
  • Includes an extensive glossary and key to symbols to explain any statistical jargon
  • No previous knowledge of statistics is assumed


Information

Year: 2011
ISBN: 9781444348217
Subtopic: Biology
Edition: 3

1
Eight steps to successful data analysis

This is a very simple sequence that, if you follow it, will integrate the statistics you use into the process of scientific investigation. As I make clear here, statistical tests should be considered very early in the process and not left until the end.
  1. Decide what you are interested in.
  2. Formulate a hypothesis or several hypotheses (see Chapters 2 and 3 for guidance).
  3. Design the experiment, manipulation or sampling routine that will allow you to test the hypotheses (see Chapters 2 and 4 for some hints on how to go about this).
  4. Collect dummy data (i.e. make up approximate values based on what you expect to obtain). The collection of ‘dummy data’ may seem strange but it will convert the proposed experimental design or sampling routine into something more tangible. The process can often expose flaws or weaknesses in the data-collection routine, and finding them at this stage will save a huge amount of time and effort.
  5. Use the key presented in Chapter 3 to guide you towards the appropriate test or tests.
  6. Carry out the test(s) using the dummy data. (Chapters 6–9 will show you how to input the data, use the statistical packages and interpret the output.)
  7. If there are problems go back to step 3 (or 2); otherwise, proceed to the collection of real data.
  8. Carry out the test(s) using the real data. Report the findings and/or return to step 2.
I implore you to use this sequence. I have seen countless students who have spent a long time and a lot of effort collecting data only to find that the experimental or sampling design was not quite right. The test they are forced to use is much less powerful than one they could have used with only a slight change in the experimental design. This sort of experience tends to turn people away from statistics and leaves them ‘scared’ of the subject. This is a great shame as statistics are a hugely useful and vital tool in science.
The rest of the book follows this eight-step process but you should use it for guidance and advice when you become unsure of what to do.
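Steps 4 to 6 of the sequence can be rehearsed before any real data exist. The following is only a minimal sketch, written in Python rather than in one of the four packages covered by this book. The group names, expected means, spreads and sample sizes are invented dummy values, and Welch's t statistic is used purely as an example of a test that the key might lead to.

```python
import random
import statistics

def make_dummy_data(mean, sd, n, rng):
    """Make up n approximate values around an expected mean (step 4)."""
    return [rng.gauss(mean, sd) for _ in range(n)]

def welch_t(a, b):
    """Welch's t statistic for two independent samples (step 6)."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

rng = random.Random(1)  # fixed seed so the dummy data are reproducible
control = make_dummy_data(mean=20.0, sd=3.0, n=10, rng=rng)  # invented values
treated = make_dummy_data(mean=24.0, sd=3.0, n=10, rng=rng)  # invented values

t = welch_t(treated, control)
print(f"dummy-data t statistic: {t:.2f}")
```

If the dummy run shows that the planned design cannot be analysed as intended (for example, only one observation per group, so no variance can be calculated), that is the cue to return to step 3.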

2
The basics

The aim of this chapter is to introduce, in rather broad terms, some of the recurring concepts of data collection and analysis. Everything introduced here is covered at greater length in later chapters and certainly in the many statistics textbooks that aim to introduce statistical theory and experimental design to scientists.
The key to statistical tests in the next chapter assumes that you are familiar with most of the basic concepts introduced here.

Observations

These are the raw material of statistics and can include anything recorded as part of an investigation. They can be on any scale from a simple ‘raining or not raining’ dichotomy to a very sophisticated and precise analysis of nutrient concentrations. The type of observations recorded will have a great bearing on the type of statistical tests that are appropriate.
Observations can be simply divided into three types: categorical, where the observations can be in a limited number of categories which have no obvious scale (e.g. ‘oak’, ‘ash’, ‘elm’); discrete, where there is a real scale but not all values are possible (e.g. ‘number of eggs in a nest’ or ‘number of species in a sample’); and continuous, where any value is theoretically possible, only restricted by the measuring device (e.g. lengths, concentrations).
Different types of observations are considered in more detail in Chapter 5.
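As a rough illustration of the three types, the sketch below stores one made-up variable of each kind and applies a summary that suits it. All the values and variable names are invented, and the choice of summary for each type is only one reasonable option.

```python
from collections import Counter
import statistics

# Hypothetical records; every value is invented for illustration.
tree_species = ["oak", "ash", "elm", "oak", "oak", "ash"]   # categorical
eggs_per_nest = [3, 4, 4, 5, 2, 4]                          # discrete
leaf_length_mm = [52.1, 48.7, 55.3, 50.0, 49.4, 53.8]       # continuous

# Categories have no scale, so counting each one is the natural summary.
species_counts = Counter(tree_species)
print(species_counts.most_common())

# Discrete values lie on a real scale, but only whole numbers can occur.
median_eggs = statistics.median(eggs_per_nest)
print(median_eggs)

# Continuous values are limited only by the precision of the instrument.
mean_length = statistics.mean(leaf_length_mm)
print(round(mean_length, 2))
```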

Hypothesis testing

The cornerstone of scientific analysis is hypothesis testing. The concept is rather simple: almost every time a statistical test is carried out it is assessing how probable the observed data would be if a hypothesis were correct. If that probability is small then the hypothesis is deemed to be untrue and it is rejected in favour of an alternative. This is done in what seems to be a rather upside down way as the test is always of what is called the null hypothesis rather than the more interesting hypothesis. The null hypothesis is the hypothesis that nothing is going on (it is often labelled as H0). For example, if the weights of bulbs for two cultivars of daffodils were being investigated, the null hypothesis would be that there is no weight difference between cultivars: ‘the weights of the two groups of bulbs are the same’ or, more correctly, ‘the two groups of bulbs are samples from a larger population with the same distribution’. A statistical test is carried out to find out how consistent the data are with that null hypothesis. If we decide to reject the null hypothesis we must accept the alternative, more interesting, hypothesis (H1) that: ‘the weights of bulbs for the two cultivars are different’ or, more correctly, that ‘the groups are samples from populations with different distributions’.
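The logic of the daffodil example can be sketched with a permutation (randomisation) test, which is not necessarily the test this book would select but follows directly from the null hypothesis: if both groups come from the same distribution, the cultivar labels are interchangeable. The bulb weights below are invented, and the function name and number of shuffles are arbitrary choices.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test on the difference between group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # relabel the bulbs as if H0 were true
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            as_extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (as_extreme + 1) / (n_shuffles + 1)

# Invented bulb weights (g) for two hypothetical cultivars.
cultivar_1 = [41.2, 39.8, 43.1, 40.5, 42.0, 38.9]
cultivar_2 = [45.0, 44.2, 46.8, 43.9, 47.1, 44.6]

p = permutation_p_value(cultivar_1, cultivar_2)
print(f"P = {p:.4f}")
```

A small value here means that a difference as large as the one observed rarely arises when the labels are shuffled, which is the basis for rejecting H0 in favour of H1.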

P-values

The P-value is the bottom line of most statistical tests. (Incidentally, you may come across it written in upper or lower case, italic or not: e.g. P value, P-value, p value or p-value.) It is the probability of seeing data this extreme or more extreme if the null hypothesis is true. So if a P-value is given as 0.06 it indicates that you have a 6% chance of seeing data at least this extreme if the null hypothesis is true. In biology it is usual to take a value of 0.05 or 5% as the critical level for the rejection of a hypothesis. This means that we reject a hypothesis when data as extreme as ours would be expected less than one time in 20 if that hypothesis were true. As it is the null hypothesis that is nearly always being tested we are always looking for low P-values to reject this hypothesis and accept the more interesting alternative hypothesis.
Clearly the smaller the P-value the more confident we can be in the conclusions drawn from it. A P-value of 0.0001 indicates that if the null hypothesis is true the chance of seeing data as extreme or more extreme than that being tested is one in 10 000. This is much more convincing than a marginal P = 0.049.
P-values and the types of errors that are implicitly accepted by their use are considered further in Chapter 4.
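To make ‘this extreme or more extreme’ concrete, here is a small worked sketch using an exact binomial calculation. The scenario is invented: 20 offspring are scored, 15 show one phenotype, and the null hypothesis is a 1:1 ratio. Treating every outcome that is no more probable than the observed one as ‘at least as extreme’ is one common convention, assumed here.

```python
from math import comb

def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided binomial P-value.

    Sums the probability, under the null hypothesis, of every outcome
    that is no more probable than the one actually observed.
    """
    def prob(k):
        return comb(trials, k) * p_null**k * (1 - p_null)**(trials - k)

    observed = prob(successes)
    # Small tolerance so outcomes tied with the observed one are counted.
    return sum(prob(k) for k in range(trials + 1)
               if prob(k) <= observed * (1 + 1e-9))

# Invented data: 15 of 20 offspring show one phenotype; H0 is a 1:1 ratio.
p = binomial_two_sided_p(successes=15, trials=20)
print(f"P = {p:.4f}")
print("reject H0 at the 5% level" if p < 0.05 else "do not reject H0")
```

The outcomes counted are 0 to 5 and 15 to 20 of the 20 offspring, so the P-value is 43 400 out of 1 048 576 equally likely sequences, a little over 0.04.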

Sampling

Observations have to be collected in some way. This process of data acquisition is called sampling. Although there are almost as many different methods that can be used for sampling as there are possible things to sample, there are some general rules. One of the most obvious is that a large number of observations is usually better than a small number. Balanced sampling is also important (i.e. when comparing two groups take the same number of observations from each group).
Most statistical tests assume that samples are taken at random. This sounds easy but is actually quite difficult to achieve. For example, if you are sampling beetles from pit-fall traps the sample may seem totally random but in fact is quite biased towards those species that move around the most and fail to avoid the traps. Another common bias is to choose a point at random and then measure the nearest individual to that point, assuming that this will produce a random sample. It will not be random at all as isolated individuals and those at the edges of clumps are more likely to be selected than those in the middle. There are methods available to reduce problems associated with non-random sampling but the first step is to be aware of the problem.
A further assumption of sampling is that individuals are either only measured once or they are all sampled on several occasions. This assumption is often violated if, for example, the same site is visited on two occasions and the same individuals or clones are inadvertently remeasured.
The sets of observations collected are called variables. A variable can be almost anything it is possible to record as long as different individuals can be assigned different values.
Some of the problems of sampling are considered in Chapter 4.
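The bias of the ‘nearest individual to a random point’ method can be worked out exactly in a simplified case. The sketch below assumes individuals placed along a straight transect (a one-dimensional simplification of a real site) with invented positions: a clump of five and one isolated individual.

```python
def nearest_individual_probabilities(positions, length=1.0):
    """Chance that each individual is the one nearest to a random point.

    Individuals sit at `positions` along a transect from 0 to `length`.
    A random point is nearest to an individual when it falls between the
    midpoints to that individual's two neighbours.
    """
    pts = sorted(positions)
    probs = []
    for i, x in enumerate(pts):
        left = 0.0 if i == 0 else (pts[i - 1] + x) / 2
        right = length if i == len(pts) - 1 else (x + pts[i + 1]) / 2
        probs.append((right - left) / length)
    return dict(zip(pts, probs))

# A clump of five individuals plus one isolated individual (made-up positions).
transect = [0.10, 0.11, 0.12, 0.13, 0.14, 0.80]
selection = nearest_individual_probabilities(transect)
for position, prob in selection.items():
    print(f"individual at {position:.2f}: selected with probability {prob:.3f}")
```

In this made-up layout the isolated individual is chosen about half the time and each individual in the middle of the clump about 1% of the time, whereas a truly random sample would give every individual a chance of one in six.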

Experiments

In biology many investigations use experiments of some sort. An experiment occurs when anything is altered or controlled by the investigator. For example, an investigation into the effect of fertilizer on plant growth will use a control plot (or several control plots) where there is no fertilizer added and then one or more plots where fertilizer has been added at known concentrations set by the investigators. In this way the effect of fertilizer can be determined by comparison of the different concentrations of fertilizer. The condition being controlled (e.g. fertilizer) is usually called a factor and the different levels used called treatments or factor levels (e.g. concentrations of fertilizer). The design of this experiment will be determined by the hypothesis or hypotheses being investigated. If the effect of the fertilizer on a particular plant is of interest then perhaps a range of different soil types might be used with and without fertilizer. If the effect on plants in general is of interest then an experiment using a variety of plants...
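The fertilizer example can be laid out as a small sketch with one factor and three factor levels. Allocating plots to treatments at random and in equal numbers is an assumption added here, in keeping with the earlier remarks on random and balanced sampling; the concentrations, plot names and function name are all invented.

```python
import random

def assign_treatments(plot_ids, treatments, seed=None):
    """Randomly allocate plots to treatments in equal numbers."""
    if len(plot_ids) % len(treatments) != 0:
        raise ValueError("plots must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(plot_ids)
    rng.shuffle(shuffled)
    per_treatment = len(shuffled) // len(treatments)
    layout = {}
    for i, treatment in enumerate(treatments):
        for plot in shuffled[i * per_treatment:(i + 1) * per_treatment]:
            layout[plot] = treatment
    return layout

# Factor: fertilizer. Levels: a control and two invented concentrations.
levels = ["control (0 g/m2)", "10 g/m2", "20 g/m2"]
plots = [f"plot {n}" for n in range(1, 13)]

layout = assign_treatments(plots, levels, seed=42)
for plot in plots:
    print(plot, "->", layout[plot])
```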
