Mathematics

Box Plots

Box plots, also known as box-and-whisker plots, are graphical representations of the distribution of a dataset. They display the median, quartiles, and potential outliers of the data. The box represents the interquartile range, while the whiskers extend to the minimum and maximum values or, in Tukey's convention, to the most extreme values within 1.5 times the interquartile range of the box, with points beyond them plotted individually as potential outliers. Box plots are useful for comparing the spread and central tendency of different datasets.

Written by Perlego with AI-assistance

8 Key excerpts on "Box Plots"

  • The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation

    ...Jennifer A. Brussow. Box Plot. Box plots (also called box-and-whisker diagrams) are a concise way of displaying the distributions of a group (or groups) of data in terms of its median and quartiles. This way of describing a data set is commonly called a five-number summary, where the five numbers are the minimum, first quartile, median, third quartile, and the maximum. While less informative than histograms, box plots are helpful for identifying outliers and comparing distributions between groups. Figure 1 provides an example of a box plot showing the distributions of data for two different groups. The whiskers are represented according to their most common method of calculation: the most extreme values falling within 1.5 times the interquartile range (sometimes abbreviated as IQR). (Figure 1: Box plot with whiskers and outliers.) A box plot consists of a box whose upper bound represents the 75th percentile, or third quartile, and lower bound represents the 25th percentile, or first quartile. The boundaries of the box are sometimes called the upper and lower hinges, and the distance between the hinges is sometimes referred to as the H-spread. The median, or 50th percentile, is represented by a line bisecting the box. The median is also referred to as the second quartile. The mean may also be displayed in a box plot by adding a cross or an “X” to the plot. The range between the upper and lower bounds is called the interquartile range (IQR). In some instances, “whiskers” are added to this box; the whiskers typically extend to the farthest points in the data that are still within 1.5 times the IQR from the lower and upper quartiles. The unit of 1.5 times the IQR was set by John Tukey when he created the box-and-whisker plot and is sometimes called a step...
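The five-number summary and the 1.5 × IQR whisker rule described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `box_plot_stats` and the sample data are hypothetical, and the quartiles use the "inclusive" method of the standard library's `statistics.quantiles`, which is one of several conventions that statistical packages use.

```python
import statistics


def box_plot_stats(values):
    """Five-number summary plus Tukey-style whisker ends and outliers."""
    data = sorted(values)
    # Quartile conventions differ between packages; "inclusive" is one choice.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    step = 1.5 * iqr  # the unit the excerpt calls a "step"
    lower_fence, upper_fence = q1 - step, q3 + step
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
stats = box_plot_stats(sample)
print(stats)
```

With this sample the value 30 lies more than one step above the third quartile, so it is reported as an outlier and the upper whisker stops at 9 rather than at the maximum.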

  • Statistical Methods for Communication Science
    • Andrew F. Hayes(Author)
    • 2020(Publication Date)
    • Routledge
      (Publisher)

    ...A box plot contains information about the median of a distribution, the interquartile range (IQR), the measurement interval that contains the inner 50% of measurements, and the minimum and maximum measurements in a distribution, while at the same time highlighting measurements that are unusual using certain criteria. A box plot of the TV viewing data is displayed in Figure 4.4. The figure itself is fairly self-explanatory. The dark line dividing the gray box is the median, while the upper and lower edges of the box define the end points of the ordinal middle 50% of the measurements. From the box plot, you can see that the median measurement is 2, whereas 50% of the measurements reside between 1.5 and 3. By definition, then, the interquartile range is 3 − 1.5 = 1.5. The long horizontal lines above and below the box are set at the median plus and minus 1.5 interquartile ranges. However, if the median plus 1.5 IQRs exceeds the maximum measurement, then the upper line is placed at the maximum. If the median minus 1.5 IQRs is smaller than the minimum measurement, then the lower line is set at the minimum. The box plot also depicts the “unusual” measurements, defined as those with measurements that are more than 1.5 IQRs from the median (in either direction). Different statistical programs will depict unusual cases differently. In SPSS (which generated this figure), “outliers” in a box plot are defined as cases with measurements between 1.5 and 3 IQRs from the median. “Extreme values” are defined by SPSS as measurements more than three IQRs from the median. Figure 4.4 A box plot of the TV viewing data. 4.6 Standardization It is common in communication research to do a mathematical transformation to a set of measurements to put them in standard or standardized form. This transformation is called standardization...

  • Understanding Quantitative Data in Educational Research

    ...It consists of a box with a line at the bottom edge (the first quartile), a thicker line inside the box (the median), another line at the top edge (the third quartile), and whiskers which extend to the minimum and maximum. Thus, the boxplot allows us to check quickly for symmetry, which holds if the median is in the middle of the box and each quartile is about the same length. Our example clearly shows a symmetric distribution around 10 minutes’ waiting time. Example 3.7 Plotting the histogram and boxplot on the same graph. Combining the histogram and the boxplot for the same data in the same figure helps the researcher to identify variances among data. For example, a histogram helps us to easily identify large or small variances among the observed frequencies, while a boxplot is used to analyse a moderate variance. The function simple.hist.and.boxplot() is part of the UsingR package and will plot both the histogram and the boxplot (Figure 3.9).

    Comparing boxplot and histogram. Install and load the UsingR package:

        install.packages("UsingR")
        library(UsingR)

    Plot the histogram and boxplot in the same figure (Figure 3.9):

        simple.hist.and.boxplot(coffeetime)

    Figure 3.9 Histogram and boxplot created using the function simple.hist.and.boxplot() for Example 3.6

    3.2.6 Line graph. A line graph can be used to evaluate the relationship between variables, for example to reveal patterns over time when each point on the graph represents the value of an interval or ratio variable at a specific point in time, or to reveal how changes in one variable relate to changes in another variable. When any change in an independent variable produces the same constant change in the dependent variable, the relationship between the variables is described as linear and is represented by a straight line...

  • Statistical Data Analysis Explained
    Applied Environmental Statistics with R

    • Clemens Reimann, Peter Filzmoser, Robert Garrett, Rudolf Dutter(Authors)
    • 2011(Publication Date)
    • Wiley
      (Publisher)

    ...From 0.1 to 1 mg/kg the As values were reported in 0.1 mg/kg steps – obviously a too-harsh discretisation for the data at hand, causing artificial data structures (Figure 3.12). The presence of multiple populations results in slope changes and breaks in the plots (Figure 3.12). 3.5 Boxplots. The boxplot is one of the most informative graphics for displaying a data distribution. It is built around the MEDIAN (see chapter 4), which divides any data set into two equal halves. 3.5.1 The Tukey boxplot. Tukey (1977) introduced the boxplot to exploratory data analysis. The construction of the Tukey boxplot is best demonstrated using a simple sample data set, consisting of only nine values: 2.3 2.7 1.7 1.9 2.1 2.8 1.8 2.4 5.9. The data are sorted to find the MEDIAN: 1.7 1.8 1.9 2.1 2.3 2.4 2.7 2.8 5.9. After finding the MEDIAN (2.3), the two halves (each of the halves includes the MEDIAN) of the data set are used to find the “hinges”, the MEDIAN of each remaining half: 1.7 1.8 1.9 2.1 2.3 2.4 2.7 2.8 5.9. These upper and lower hinges define the central box, which thus contains approximately 50 percent of the data. In the example the “lower hinge” (LH) is 1.9, the “upper hinge” (UH) is 2.7. The “inner fence”, a boundary beyond which individuals are considered extreme values or potential outliers, is defined as the box extended by 1.5 times the length of the box towards the maximum and the minimum. This is defined algebraically, using the upper whisker as an example, as Upper inner fence (UIF) = UH(x) + 1.5 · HW(x) and Upper whisker = max(x[x ≤ UIF]), where HW (hinge width) is the difference between the hinges (HW = upper hinge − lower hinge), approximately equal to the interquartile range (depending on the sample size), i.e...
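The hinge construction above can be traced with the excerpt's own nine values. The following Python sketch is an illustration, not the authors' code: the function name `tukey_boxplot` is hypothetical, and the handling of even-sized samples (a plain split into two halves) is an assumption, since the excerpt only describes the odd-sized case in which both halves include the median.

```python
import statistics


def tukey_boxplot(values):
    """Hinges, inner fences, whiskers and outliers following the Tukey recipe."""
    x = sorted(values)
    n = len(x)
    # For odd n both halves include the median, as in the excerpt.
    # For even n this is a plain split (an assumption, not from the excerpt).
    lower_half = x[: (n + 1) // 2]
    upper_half = x[n // 2:]
    lh = statistics.median(lower_half)  # lower hinge
    uh = statistics.median(upper_half)  # upper hinge
    hw = uh - lh                        # hinge width
    lif = lh - 1.5 * hw                 # lower inner fence
    uif = uh + 1.5 * hw                 # upper inner fence
    return {
        "median": statistics.median(x),
        "lower_hinge": lh,
        "upper_hinge": uh,
        "lower_inner_fence": lif,
        "upper_inner_fence": uif,
        "lower_whisker": min(v for v in x if v >= lif),
        "upper_whisker": max(v for v in x if v <= uif),
        "outliers": [v for v in x if v < lif or v > uif],
    }


values = [2.3, 2.7, 1.7, 1.9, 2.1, 2.8, 1.8, 2.4, 5.9]  # data from the excerpt
result = tukey_boxplot(values)
print(result)
```

For these values the hinges come out as 1.9 and 2.7, matching the excerpt, the upper inner fence is about 3.9, the upper whisker ends at 2.8, and 5.9 falls outside the fence.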

  • Quantitative Data Analysis with Minitab
    A Guide for Social Scientists

    • Alan Bryman, Duncan Cramer(Authors)
    • 2003(Publication Date)
    • Routledge
      (Publisher)

    ...extreme values, which are separately indicated. It has a number of advantages. Like the stem and leaf display, the boxplot provides information about the shape and dispersion of a distribution. For example, is the box closer to one end or is it near the middle? The former would denote that values tend to bunch at one end. In this case, the bulk of the observations are at the lower end of the distribution, as is the median. This provides further information about the shape of the distribution, since it raises the question of whether the median is closer to one end of the box, as it is in this case. On the other hand, the boxplot does not retain information like the stem and leaf display. Figure 5.7 provides a boxplot of the data from Table 5.7 using Professional Graphics in Minitab for Windows. The four outliers are signalled, using the previously-discussed criterion, with asterisks. It is clear that in half the authorities (all those below the line representing the median) 20 per cent or fewer reports are issued within six months. If Standard Graphics are enabled, the boxplot will be rather different. Figure 5.6 Boxplot. In order to generate a boxplot for ‘needs’ with the prompt system, the following command will produce a basic boxplot: MTB> boxplot ‘needs’ With the menu system, the following sequence will produce the same end: → Stat → EDA → Boxplot c→ needs → Select [this will bring needs into the Graph Variables: box beneath the Y and to the right of the figure 1] → if IQ Range Box and Outlier do not appear in the Data display: box, click on the downward pointing arrow to the right of Display and enable each of these by choosing first IQ Range Box, then click again on the downward pointing arrow and then choose Outlier S>> → OK Both of these exploratory data analysis techniques can be recommended as providing useful first steps in gaining a feel for data when you first start to analyse them...

  • Research Methods for Nursing and Healthcare
    • John Maltby, Glenn Williams, Julie Mcgarry, Liz Day(Authors)
    • 2014(Publication Date)
    • Routledge
      (Publisher)

    ...Though on its own a measure of variability can often seem redundant, it is useful when you are comparing two sets of findings, because then you can also compare the dispersion of scores. In the example above relating to the learning disability nurse visits, the standard deviation is helpful, because it tells us something additional about the data. It tells us that people’s visits to the learning disability nurse are a lot more varied and that staff may wish to look into why this is: are some people not keeping appointments, while others are making too many, or does the administration of making appointments need to be looked at? What is important is that it is good practice always to report the semi-interquartile range when reporting the median average statistic, and to report the standard deviation when reporting the mean average statistic. 8.4 Charts: visual presentation of variability with box plots. As with frequency tables and bar charts, pie charts and histograms, there is a way to graphically represent lower and upper percentiles and interquartile ranges. This is known as a box plot. A box plot is a way of showing five aspects of numerical data: the smallest value, the lower quartile (Q1), the upper quartile (Q3), the median and the largest value. Let us return to the district nursing team example from earlier in this chapter and the assessment of referrals made to the team over a period of 11 days (i.e. 2, 3, 4, 4, 5, 5, 6, 6, 6, 7 and 8 patients during this time). Figure 8.8 shows a box plot of the data. Remember, we know from data analysis earlier in the chapter that the lower quartile is 4 and the upper quartile is 6. We also know that the highest value is 8 and the lowest value is 2. We also know with a quick scan of the data that the median is 5 as it is the 6th value, the middle value, in 11 numbers. You can see all these values outlined on the box plot. The box itself contains the middle 50 per cent of the data, i.e...
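The five values for the referral data can be checked with a short Python sketch. It is offered as an illustration only: the function name `five_number_summary` is hypothetical, and the quartiles rely on the default "exclusive" method of `statistics.quantiles`, which happens to reproduce the quartiles quoted in the excerpt for these eleven values. Other quartile conventions can give slightly different results on other data.

```python
import statistics

# Referrals per day over the 11 days, as listed in the excerpt.
referrals = [2, 3, 4, 4, 5, 5, 6, 6, 6, 7, 8]


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for a list of numbers."""
    data = sorted(values)
    # Default method is "exclusive"; other conventions may differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return data[0], q1, median, q3, data[-1]


smallest, q1, median, q3, largest = five_number_summary(referrals)
semi_iqr = (q3 - q1) / 2  # semi-interquartile range, reported with the median

print(smallest, q1, median, q3, largest, semi_iqr)
```

For the values as listed, the smallest value computes as 2, the quartiles as 4 and 6, the median as 5 and the largest value as 8, so the semi-interquartile range is 1.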

  • Statistics for Psychologists
    An Intermediate Course

    ...As might be expected, the plot indicates a strong correlation for the two ages. Adding the line y = x to the plot, see Figure 2.11(b), highlights that there are a greater number of couples in which the husband is older than his wife, than there are those in which the reverse is true. Finally, in Figure 2.11(c) the bivariate scatter of the two age variables is framed with the observations on each. Plotting marginal and joint distributions together in this way is usually good data analysis practice; a further possibility for achieving this goal is shown in Figure 2.12. Fig. 2.11. Scatterplots of (a) ages of husbands and wives in 100 married couples; (b) with line y = x added; (c) enhanced with observations on each variable. Fig. 2.12. Scatterplot of wife’s height, against husband’s height, showing marginal distributions of each variable. Display 2.2 Constructing a Box Plot. The plot is based on the five-number summary of a data set: 1, minimum; 2, lower quartile; 3, median; 4, upper quartile; 5, maximum. The distance between the upper and lower quartiles, the interquartile range, is a measure of the spread of a distribution that is quick to compute and, unlike the range, is not badly affected by outliers. The median, upper, and lower quartiles can be used to define rather arbitrary but still useful limits, L and U, to help identify possible outliers in the data: L = LQ − 1.5 × IQR and U = UQ + 1.5 × IQR, where UQ is the upper quartile, LQ is the lower quartile, and IQR the interquartile range, UQ − LQ. Observations outside the limits L and U are regarded as potential outliers and identified separately on the box plot (and known as outside values), which is constructed as follows: To construct a box plot, a “box” with ends at the lower and upper quartiles is first drawn. A horizontal line (or some other feature) is used to indicate the position of the median in the box. Next, lines are drawn from each end of the box to the most remote observations that, however, are not outside observations as defined in the text...

  • Introducing Research and Data in Psychology
    A Guide to Methods and Analysis

    • Ann Searle(Author)
    • 2002(Publication Date)
    • Routledge
      (Publisher)

    ...The results are as follows: If these scores were arranged as a stem and leaf diagram they would look like this: Time taken to solve food-related anagrams. The numbers down the middle strip show the whole numbers; the numbers either side show the decimal points. Thus 5.2 is shown as: 5 2 You can see that the stem and leaf diagram gives a good indication of the ‘shape’ of the data and, at the same time, all the scores are represented so that none of the data is lost. Stem and leaf diagrams can be very useful—and they are quick and easy to produce. Box and whisker plots. These are a useful indicator of the spread of scores. The ‘box’ represents the middle 50 per cent of the data, the ‘whiskers’ show the range and a cross shows the median. For example, for the set of scores: 2 4 6 8 10 12 14, a box and whisker plot would look like this: Figure 9.13 A box and whisker plot. Box and whisker plots can provide a useful indicator of any skew in the scores. Figure 9.13 shows a normal distribution, while Figures 9.14 and 9.15 indicate skewed distributions: Figure 9.14 A positive skew. Figure 9.15 A negative skew. Exercise 30 What is wrong with the following table and figures? Each has at least two faults. Table 9.3 Table of results Group A Group B P1 4 7 P2 6 8 P3 4 6 P4 5 5 P5 3 7 P6 2 8 Mean 4 6.83 Median 4 7 Figure 9.16 Figure 9.17 Histogram to show favourite animals. Scattergrams or scattergraphs. These are used to illustrate the findings of correlational research and indicate the pattern of the relationship between two variables. The easiest way to demonstrate their use is to give an example: Table 9.4 Table to show shoe size at the age of nine and height at the age of eighteen Shoe size at 9 Height at 18 P1 1 6'2" P2 10 5'10" P3 11 5'11" P4 1 6'0" P5 10 5'11" P6 2 6'2" P7 8 5'8" P8 9 5'10" P9 11 6'0" P10 12 6'1" To plot this data on to a scattergram: 1 Label one axis with height and one with shoe size...
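The idea that a box and whisker plot hints at skew can be illustrated with the excerpt's scores 2 4 6 8 10 12 14. The Python sketch below is a rough heuristic and not a method taken from the excerpt: the function names, the second data set and the rule of comparing the distance from the median to each end of the range are all assumptions made for illustration.

```python
import statistics


def whisker_box_summary(values):
    """Ends of the range, the box (middle 50 per cent) and the median."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {"low": data[0], "q1": q1, "median": median, "q3": q3, "high": data[-1]}


def skew_hint(values, tolerance=1e-9):
    """Crude skew indicator: compare the spans on either side of the median."""
    s = whisker_box_summary(values)
    lower_span = s["median"] - s["low"]
    upper_span = s["high"] - s["median"]
    if abs(upper_span - lower_span) <= tolerance:
        return "symmetric"
    # A longer upper span means a tail towards high values (positive skew).
    return "positive skew" if upper_span > lower_span else "negative skew"


scores = [2, 4, 6, 8, 10, 12, 14]       # scores from the excerpt
long_tail = [1, 2, 2, 3, 3, 4, 10]      # hypothetical scores with a high tail

print(whisker_box_summary(scores), skew_hint(scores))
print(whisker_box_summary(long_tail), skew_hint(long_tail))
```

For the excerpt's scores the median (8) sits midway between the ends of the range and between the box edges (4 and 12), so the plot is symmetric. In the hypothetical second set the upper whisker is much longer than the lower one, which is the pattern a positively skewed plot would show.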