The mean cannot be deduced from the box plot alone. The box plot provides **a five-number summary**: the minimum, first quartile, median, third quartile, and maximum. This information is enough to read the center and spread of most samples, but it does not reveal the mean, the mode, or the sample size. Box plots are also harder to interpret when there are very few observations or when the data are heavily skewed.

For example, if you see **a box plot** with many points plotted beyond the whiskers, you can't determine whether the mean is higher or lower than the median. The box covers the middle 50% of responses (the interquartile range), with **the other 50%** split between the two whiskers and any outliers. When many observations fall outside the whiskers, the plot alone does not say how far they pull the average.

This makes **a visual inspection** of the plot plus some basic statistics necessary to draw **any conclusions** about the sample as a whole. The median can be read directly from the line inside the box (around 75% in this example), but the mean cannot, because the plot does not show how the observations are distributed within each quartile.

A similar situation would occur if the sample were strongly skewed, with most ratings clustered at one end of the scale. In this case, too, it would be difficult to estimate what percentage of responses were above the mean.

These examples show that even when using a box plot, it's still necessary to look at additional statistics to draw meaningful conclusions about a sample.
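As an illustration of why the mean cannot be read off a box plot, the sketch below builds two small samples that share the same five-number summary but have different means. The data and the helper name `five_number_summary` are made up for this example, and the quartiles use Python's "inclusive" method, which is only one of several conventions.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    Quartiles use the 'inclusive' method of statistics.quantiles;
    other tools may use a slightly different convention.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))

# Two hypothetical samples chosen so that their box plots would look identical.
sample_a = [1, 3, 3, 5, 5, 7, 7, 9, 9]
sample_b = [1, 1, 3, 3, 5, 5, 7, 7, 9]

summary_a = five_number_summary(sample_a)
summary_b = five_number_summary(sample_b)
mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

print(summary_a == summary_b)  # same five numbers, so the same box plot
print(mean_a, mean_b)          # yet the two means differ
```

Both samples would produce the same box and whiskers, so the difference in their means only shows up when the mean is calculated separately.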

- What measures can you not find with a box and whisker plot?
- Which values are used in a box plot?
- How do you read a box and whisker plot in Excel?
- Can Excel make box-and-whisker plots?
- What values do you need to know to create a box plot?
- What statistics are needed to draw a box plot?

A box plot is made up of **five values**: the lowest value, the first quartile, the median, the third quartile, and the highest value. These five numbers describe the center of the data, its range, and how the values are spread across that range.

The first quartile (Q1) is the value below which about 25% of the data fall; it marks the lower end of the box. The third quartile (Q3) is the value below which about 75% of the data fall; it marks the upper end of the box. The middle value is called the "median". The distance between the first and third quartiles is the "interquartile range". These definitions hold whether or not the data are normally distributed, and the quartiles are not the same thing as the lowest and highest values.

The quartiles form the edges of the box, the median is drawn as a line inside it, and the whiskers extend out toward the lowest and highest values. Points plotted beyond the whiskers are not there just for **visual interest**: they mark individual observations that lie unusually far from the rest and may be outliers. A box plot also needs a reasonable number of observations to be useful. If there were only two values in your sample, for example, the five-number summary would tell you almost nothing about what percentage of the data fall below some threshold.
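Many plotting tools decide which points to draw beyond the whiskers using the common 1.5 × IQR rule (Tukey's fences). The source does not say which rule its plots use, so the sketch below is an assumption for illustration; the function name `iqr_outliers`, the multiplier `k`, and the data are all hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles.

    Assumes the common 1.5 * IQR convention; a given tool may use another rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical scores with one value far from the rest.
scores = [52, 55, 57, 58, 60, 61, 63, 64, 120]
flagged = iqr_outliers(scores)
print(flagged)  # [120]
```

Here Q1 is 57 and Q3 is 63, so the fences sit at 48 and 72 and only the value 120 is flagged.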

How to Read and Create a Box Plot (Box and Whiskers) with Excel, TI-83, and SPSS

- Step 1: Find the minimum.
- Step 2: Find Q1, the first quartile.
- Step 3: Find the median.
- Step 4: Find Q3, the third quartile.
- Step 5: Find the maximum.
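The five steps above can be sketched in a few lines of Python. This version finds Q1 and Q3 as the medians of the lower and upper halves of the sorted data, leaving the middle value out of both halves when the count is odd. That is one common textbook convention, assumed here for illustration; Excel, the TI-83, and SPSS may each calculate quartiles slightly differently. The function names are hypothetical.

```python
def median_of_sorted(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_steps(values):
    """Follow the five steps; needs at least two values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values before the median position
    upper = data[half + (n % 2):]  # values after the median position
    return {
        "min": data[0],                      # Step 1
        "q1": median_of_sorted(lower),       # Step 2
        "median": median_of_sorted(data),    # Step 3
        "q3": median_of_sorted(upper),       # Step 4
        "max": data[-1],                     # Step 5
    }

result = five_number_steps([7, 1, 9, 3, 5, 11, 13])
print(result)  # {'min': 1, 'q1': 3, 'median': 7, 'q3': 11, 'max': 13}
```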

A built-in box-and-whisker chart is available in Excel 2016 and later. In earlier versions it is not available; instead, you may manipulate **an Excel chart**, typically a stacked column chart with error bars, to create boxes and whiskers. The box-and-whisker plot displays the minimum, first quartile, median, third quartile, and maximum of a set of data rather than the mean and standard error. The box is divided by the median. The whiskers extend from the box toward the minimum and maximum values.

These values serve as reference points for judging where any other data value sits within the distribution. The median divides the data into **two equal parts**, one containing the values at or below the median and the other containing the values at or above it.

The mean is often used as a measure of central tendency for numeric data. However, it is sensitive to extreme values, so it can be a misleading measure of the center of a skewed distribution. For example, the mean of -1,000, 1,000, and 2,000 is about 666.67, but the median is 1,000, because the single negative value pulls the mean down while the middle value stays the same. For this reason, the median is usually preferred over the mean for summarizing skewed data.
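A quick check of the example values, plus a variation with one extreme value, shows how differently the two measures react. The second list is made up for this illustration.

```python
import statistics

values = [-1000, 1000, 2000]
mean_value = statistics.mean(values)      # 2000 / 3, about 666.67
median_value = statistics.median(values)  # 1000, the middle value

# Replace the largest value with a much more extreme one (hypothetical data).
extreme = [-1000, 1000, 200000]
mean_extreme = statistics.mean(extreme)      # jumps to about 66666.67
median_extreme = statistics.median(extreme)  # still 1000

print(round(mean_value, 2), median_value)
print(round(mean_extreme, 2), median_extreme)
```

The median is unchanged by the extreme value, while the mean moves by a factor of about one hundred.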

**Box plots** can be drawn in two orientations: horizontal or vertical. In a horizontal box plot, the horizontal axis shows the values from the smallest to the largest, and the boxes for different categories are stacked along the vertical axis. In a vertical box plot, the roles of the axes are swapped. Any individual point drawn beyond the whiskers represents **a single observation**. Each category may be represented by a distinct color. The categories may be pre-defined or determined by the data itself. Semi-transparent boxes are a styling choice rather than a separate type of plot: the inside of the box is filled with a see-through color so that anything drawn behind it stays visible.

A horizontal or vertical number line and a rectangular box are used to create **a box plot**.

The number of observations in a sample is called the sample size. For example, to estimate the average weight of men in **your community**, you would weigh the men in **your study group**, and the number of men weighed is the sample size. There is no single minimum number that applies to every statistical measure. In general, larger samples give more precise estimates, and measures of spread such as the range or the variance tend to need more observations than the average to be estimated reliably.

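The reliability of statistical measures depends on the number of observations in the sample. If there are only two observations, we can still calculate the mean, the median, the range, and the sample standard deviation, but the results say very little about the population. If there are ten observations, we can calculate the mean, the median, the mode (the value that appears most often), the standard deviation, and the range, although the estimates are still rough.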
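As a quick check of what can be calculated from a very small sample, the sketch below summarizes just two observations. The weights and the helper name `describe` are made up for this example.

```python
import statistics

def describe(values):
    """Basic summary measures; needs at least two values for the stdev."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

# Two hypothetical weights in kilograms.
pair_summary = describe([70.0, 80.0])
print(pair_summary["mean"], pair_summary["median"], pair_summary["range"])  # 75.0 75.0 10.0

# With a single value, the sample standard deviation is undefined.
try:
    statistics.stdev([70.0])
    single_ok = True
except statistics.StatisticsError:
    single_ok = False
print(single_ok)  # False
```

Every measure can be computed from two values, but a standard deviation based on one pair of numbers is a very unstable estimate.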

To estimate a statistical measure for a population that is too large to measure completely, we work from a sample. We calculate the value we want to estimate - say, the average weight of men in your community - from the sample, and then ask how close this estimate is likely to be to the true value, for example by checking how much it changes from one sample to another. This gives us an idea of how accurate our estimate is. If repeated samples give similar results, the estimate is probably not too far off.