Statistics: No Box-and-Whiskers; A Better Histogram

Many of you know that I have ‘been around’ for a long time.  My first statistics course was around 1970, and I started teaching some statistics in 1973.  I’ve had some concerns about a tool invented about that time (box and whisker plots), and want to propose a replacement graphic.

Here are two box & whisker plots (done in horizontal format, which I prefer):

[Figures: box plot of wait times, May 2016; box plot of HDL, May 2016]

There are two basic flaws in the box & whisker display:

  1. The display implies information about variation that the underlying summary (the quartiles) does not contain.
  2. The display requires the reader to invert the visual relationship: a larger ‘box’ means a smaller density, and a smaller ‘box’ means a larger density.
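The second flaw can be checked with a little arithmetic. Each segment of a box plot holds roughly a quarter of the observations, so its density is about 0.25 divided by its width. Here is a minimal sketch in Python; the sample values and the function name `quarter_densities` are my own illustrations (not the data behind the plots above), and the "inclusive" quartile method is just one of several conventions.

```python
from statistics import quantiles

# Hypothetical, right-skewed sample (NOT the data behind the plots in this post).
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 11, 15, 22, 40]


def quarter_densities(data):
    """Return (low, high, density) for each of the four box-plot segments.

    Each segment holds about a quarter of the observations, so its
    density is roughly 0.25 divided by its width. Assumes the five
    summary values are distinct, so no segment has zero width.
    """
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    edges = [min(data), q1, q2, q3, max(data)]
    return [(lo, hi, 0.25 / (hi - lo)) for lo, hi in zip(edges, edges[1:])]


for lo, hi, dens in quarter_densities(sample):
    print(f"{lo:>5} to {hi:>5}: width {hi - lo:>5}, density {dens:.3f}")
```

In this sample the longest segment (the right whisker) has the lowest density, which is exactly the inversion a reader has to perform when looking at the box plot.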

Here are the underlying data sets, presented in histogram format (which is not perfect, but avoids both of those issues):

[Figure: histogram of HDL]

[Figure: histogram of wait times]

Some of the problems with box plots are well documented; a number of more sophisticated displays have been used.  See http://vita.had.co.nz/papers/boxplots.pdf. These better displays are seldom used, especially in introductory statistics courses.

The main attraction of the box plot was that it provided an easy visual display of 5 numbers: minimum, first quartile, median, third quartile, maximum.  The problem with creating a visual display of such simple summary data is that it will always imply more information than exists in the summary.  We’ve got a solution at hand, much simpler than the alternatives used (which are based on maintaining the box concept):

Replace basic box-and-whisker plots with a “quartiled histogram”.

A quartiled histogram adds the quartile markers to a normal histogram display.  Here are two examples; compare these to the box plots above:

[Figure: quartiled histogram of HDL, May 2016]

[Figure: quartiled histogram of wait times, May 2016]

The quartiled histogram combines the basic histogram with a simplified cumulative frequency chart — without losing the independent information of each category.
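For anyone who wants to try this, here is a rough sketch of one way to build a quartiled histogram in Python. It is not how the figures above were made. The function names, the sample data, the bin width, and the "inclusive" quartile method are all assumptions of mine, and the plotting helper assumes matplotlib is installed.

```python
from statistics import quantiles

# Hypothetical sample for illustration only (not the data in this post).
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 11, 15, 22, 40]


def quartiled_histogram(data, bin_width):
    """Return (bins, quartiles) for a quartiled histogram.

    bins is a list of (left_edge, count) pairs for equal-width bins
    starting at the minimum; quartiles is (Q1, median, Q3).
    """
    start = min(data)
    n_bins = int((max(data) - start) // bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // bin_width)] += 1
    bins = [(start + i * bin_width, c) for i, c in enumerate(counts)]
    return bins, tuple(quantiles(data, n=4, method="inclusive"))


def plot_quartiled_histogram(data, bin_width, path):
    """Draw the histogram with vertical quartile markers (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display required
    import matplotlib.pyplot as plt

    bins, quartiles = quartiled_histogram(data, bin_width)
    lefts = [left for left, _ in bins]
    counts = [count for _, count in bins]
    fig, ax = plt.subplots()
    ax.bar(lefts, counts, width=bin_width, align="edge", edgecolor="black")
    # The quartile markers are the only addition to a normal histogram.
    for label, q in zip(("Q1", "Median", "Q3"), quartiles):
        ax.axvline(q, color="red", linestyle="--")
        ax.text(q, max(counts), label, rotation=90, va="top", ha="right")
    ax.set_xlabel("Value")
    ax.set_ylabel("Frequency")
    fig.savefig(path)
    plt.close(fig)


bins, quartiles = quartiled_histogram(sample, bin_width=5)
print(bins)
print(quartiles)
```

Calling `plot_quartiled_histogram(sample, 5, "quartiled.png")` would write the picture to a file. The bars keep the information for each category, and the three dashed lines carry the same quartile summary a box plot would.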

Perhaps a basic box and whisker plot works when the audience is sophisticated in understanding statistics (researchers, statisticians, etc.).  Because of known perceptual weaknesses, I think we would be better served to either skip box & whisker plots in intro classes, or cover them briefly with a caution that they are to be avoided in favor of more sophisticated displays.
