大象传媒

Distribution of data

You will be building upon material covered in M1 Mean, median, mode and range, M1 Frequency tables and M2 Estimate the mean.

Quartiles

Quartiles are the values that divide a data set into quarters:

  • Put the values in order
  • Divide the list into four equal parts
  • The quartiles are at the divisions

The three quartiles are:

  • Q1 is the lower quartile: 25% of the data is lower than this value
  • Q2 is the median: 50% of the data is lower than this value
  • Q3 is the upper quartile: 75% of the data is lower than this value

The interquartile range shows the range in values of the central 50% of the data.

To find the interquartile range, subtract the value of the lower quartile Q1 from the value of the upper quartile Q3.

interquartile range = upper quartile - lower quartile

Example

The weights of 7 babies are:

3.5 kg, 4.1 kg, 3.4 kg, 2.5 kg, 3.5 kg, 4 kg, 3.1 kg

Find the interquartile range of the weights of the babies.

Solution:

The weights are not given in order, so first arrange them from smallest to largest:

\(2.5\qquad 3.1\qquad 3.4\qquad 3.5\qquad 3.5\qquad 4\qquad 4.1\)

To find the interquartile range

  • Start by putting the values in order
    \(2.5\qquad 3.1\qquad 3.4\qquad 3.5\qquad 3.5\qquad 4\qquad 4.1\)
  • find the median
    The median is the 4th value which is 3.5
    \(2.5 \qquad 3.1 \qquad 3.4 \qquad\fbox{\textbf{3.5}} \qquad 3.5 \qquad 4 \qquad 4.1\)

To find the lower quartile, which is Q1

  • count the number of values below the median
  • identify the middle of those values
    \(\fbox{2.5 \qquad \textbf{3.1} \qquad 3.4} \qquad 3.5 \qquad 3.5 \qquad 4 \qquad 4.1\)

To find the upper quartile which is Q3

  • count the number of values above the median
  • identify the middle of those values
    \(2.5 \qquad 3.1 \qquad 3.4 \qquad 3.5 \qquad \fbox{3.5 \qquad\textbf{4} \qquad 4.1}\)

interquartile range = upper quartile - lower quartile

= 4 - 3.1

Interquartile Range = 0.9 kg
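
The steps above can be sketched in Python. This is a minimal illustration, not an official method: the function names `median` and `quartiles` are made up for this example, and it assumes the rule used in the worked example, where the median is left out of both halves when the number of values is odd.

```python
def median(sorted_values):
    """Middle value of an ordered list (mean of the two middle values if the count is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) by taking the median of each half of the ordered data.

    Assumption: when the number of values is odd, the median itself
    is left out of both halves, as in the worked example.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2:]
    return median(lower_half), median(ordered), median(upper_half)


weights = [3.5, 4.1, 3.4, 2.5, 3.5, 4, 3.1]  # baby weights in kg
q1, q2, q3 = quartiles(weights)
iqr = round(q3 - q1, 2)  # rounding hides floating-point noise in 4 - 3.1
print(q1, q2, q3, iqr)  # 3.1 3.5 4 0.9
```

Other textbooks and software packages use slightly different quartile rules, so results for small data sets may not always match this sketch.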

Question

An 8th baby was born weighing 2.9 kg. Find the interquartile range of the 8 babies.

Answer:

\(\matrix{2.5 & & 2.9 & & 3.1 & & 3.4 & & 3.5 & & 3.5 & & 4 & & 4.1 & & \cr && &\boldsymbol{\uparrow}& && &\boldsymbol{\uparrow}& && &\boldsymbol{\uparrow}& && && \cr && &\textbf{Q1}& && &\textbf{Q2}& && &\textbf{Q3}& && &&}\)

Q1 = 3.0
Q3 = 3.75

Interquartile range (IQR) = Q3 - Q1 = 3.75 - 3.0 = 0.75 kg
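
With 8 values, each quartile falls between two of the ordered values, so it is taken as the value halfway between them. The arithmetic behind the answer can be written out as follows (the symbols \(Q_1\), \(Q_2\) and \(Q_3\) stand for the lower quartile, median and upper quartile):

```latex
\begin{aligned}
Q_1 &= \frac{2.9 + 3.1}{2} = 3.0 \\
Q_2 &= \frac{3.4 + 3.5}{2} = 3.45 \\
Q_3 &= \frac{3.5 + 4}{2} = 3.75 \\
\text{IQR} &= Q_3 - Q_1 = 3.75 - 3.0 = 0.75\ \text{kg}
\end{aligned}
```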

Question

A school librarian records the number of books borrowed in a school year by some Year 9 pupils. She arranges the data from smallest to largest.

5, 6, 10, 12, 12, 14, 14, 16, 18, 18, 19, 19, 20, 22, 23, 25, 27, 27, 28

Identify the median value and the upper and lower quartiles.

Answer:

There are 19 values

\(5\ 6\ 10\ 12\ \fbox{12}\ 14\ 14\ 16\ 18\ \fbox{18}\ 19\ 19\ 20\ 22\ \fbox{23}\ 25\ 27\ 27\ 28\)

  1. median = 18 (the 10th value)
  2. lower quartile = 12 (the 5th value)
  3. upper quartile = 23 (the 15th value)
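
The positions quoted in this answer (5th, 10th and 15th) can be checked with a short Python sketch. It assumes the position rule (n + 1) ÷ 4, (n + 1) ÷ 2 and 3(n + 1) ÷ 4, which is one common convention rather than something stated in this guide, and the helper name `value_at_position` is made up for illustration.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in an ordered list.

    A fractional position is interpolated between its two neighbours
    (an assumption; for example, position 4.5 gives the halfway value).
    """
    whole = int(position)
    fraction = position - whole
    lower = ordered[whole - 1]  # convert the 1-based position to a 0-based index
    if fraction == 0:
        return lower
    upper = ordered[whole]
    return lower + fraction * (upper - lower)


books = [5, 6, 10, 12, 12, 14, 14, 16, 18, 18, 19, 19, 20, 22, 23, 25, 27, 27, 28]
n = len(books)  # 19 values
positions = {
    "lower quartile": (n + 1) / 4,
    "median": (n + 1) / 2,
    "upper quartile": 3 * (n + 1) / 4,
}
results = {name: value_at_position(books, pos) for name, pos in positions.items()}
for name, value in results.items():
    print(name, "is at position", positions[name], "with value", value)
```

For data sets where these positions are not whole or half numbers, this rule can give slightly different quartiles from the halving method used earlier on the page.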

Box plots

A box plot is a diagram which provides a quick visual summary of the distribution of a data set. It makes drawing conclusions easier and is useful for comparing two sets of data.

When data is presented as a list of numbers, it can be difficult to interpret.
A box plot summarises the data set using 5 key values:

  1. minimum
  2. maximum
  3. median
  4. lower quartile
  5. upper quartile

These can be found easily once the values are arranged in order.

  • The minimum value is the smallest number in the data set.
  • The maximum value is the largest number in the data set.
  • The median is the middle value. 50% of the data is larger than this value and 50% of the data is smaller.
  • The lower quartile is the middle value of those lower than the median. 75% of the data is higher than this value and 25% is lower than this value.
  • The upper quartile is the middle value of those higher than the median. 25% of the data is higher than this value and 75% is lower than this value.

Question

Blayne is captain of the school cross country team. He recorded the times taken by 17 members of the team in the last race.

19, 42, 24, 35, 26, 27, 30, 40, 33, 34, 15, 36, 36, 20, 33, 42, 28

Blayne wants to display these results in a box plot.

Solution:

Firstly, he must order the data:

15, 19, 20, 24, 26, 27, 28, 30, 33, 33, 34, 35, 36, 36, 40, 42, 42

Next, find the 5 key values:

  1. minimum = 15
  2. maximum = 42
  3. median = 33. There are 17 values, so the median is the 9th value: (17 + 1) ÷ 2 = 9
  4. lower quartile = 25. This is halfway between the minimum and the median, between the 4th and 5th values: (24 + 26) ÷ 2 = 25
  5. upper quartile = 36. This is halfway between the median and the maximum, between the 13th and 14th values: (36 + 36) ÷ 2 = 36

Then follow the steps to draw the box plot.

Step 1: Draw a scale. This is just like the horizontal scale on a graph. Here the scale runs from 15 to 45, marked every 5.
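
The 5 key values for Blayne's data can be checked with a short Python sketch. The function name `five_number_summary` is made up for this example, and it assumes the same halving rule as the solution: the median is left out of both halves when the number of values is odd.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the median
    upper_half = ordered[(n + 1) // 2:]   # values above the median
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


times = [19, 42, 24, 35, 26, 27, 30, 40, 33, 34, 15, 36, 36, 20, 33, 42, 28]
summary = five_number_summary(times)
print(summary)  # (15, 25.0, 33, 36.0, 42)
```

These five numbers are all that is needed to draw the box plot: the box runs from 25 to 36, the line inside it sits at 33, and the whiskers reach 15 and 42.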

Interpreting box plots

The 5 key values, minimum, maximum, median, lower quartile and upper quartile can be easily read from box plots.

Box Plot above scale 50 - 90 in intervals of 10
  1. Estimate the median mass of the football players.
  2. What is the mass of the lightest player?
  3. 75% of the players are over 70 kg. True or false?
  4. Estimate the interquartile range.

Answer:

  1. The median is indicated by the line inside the box. It is approximately 76 kg.
    Figure: box plot above a scale from 50 to 90 in intervals of 10, with a line from the median down to approximately 76 kg on the scale.
  2. The lightest player is 50 kg.
    Figure: the same box plot, with a line from the minimum down to approximately 50 kg on the scale.
  3. FALSE. 75% of the players are over the lower quartile, which is approximately 57 kg.
    Figure: the same box plot, with the region from approximately 57 kg to 88 kg shaded.
  4. The upper quartile is approximately 84 kg and the lower quartile is approximately 57 kg.
    Interquartile range = upper quartile - lower quartile
    IQR = 84 - 57 = 27 kg

Question

The box plot shows the marks in a science test for a group of students.

Scale of 10 - 40 with box plot above
  1. What is the range of the data?
  2. What is the lower quartile?
  3. Calculate the interquartile range of the data.
  4. The top 25% of the pupils get an A. What mark was required for an A?


Comparing data sets using box plots

Box plots can be used to compare two or more sets of data.

Example

The box plots show goals scored by two netball teams.

Figure: two box plots above a scale from 1 to 15. Team 1: minimum 8, lower quartile 10, median 11, upper quartile 14, maximum 15. Team 2: minimum 4, lower quartile 6, median 10, upper quartile 11, maximum 11.

Compare the teams using the box plots.

Answer:

Team 1 had a higher median and their interquartile range was smaller. This would suggest that they are likely to score more goals than team 2 and are more consistent.
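
The comparison can be backed up with numbers. The sketch below takes the five key values read from the two box plots and works out each team's interquartile range; the `BoxPlot` class is a made-up helper for this illustration.

```python
from dataclasses import dataclass


@dataclass
class BoxPlot:
    """The 5 key values read from a box plot."""
    minimum: float
    lower_quartile: float
    median: float
    upper_quartile: float
    maximum: float

    @property
    def iqr(self):
        # interquartile range = upper quartile - lower quartile
        return self.upper_quartile - self.lower_quartile


team_1 = BoxPlot(minimum=8, lower_quartile=10, median=11, upper_quartile=14, maximum=15)
team_2 = BoxPlot(minimum=4, lower_quartile=6, median=10, upper_quartile=11, maximum=11)

higher_median = "Team 1" if team_1.median > team_2.median else "Team 2"
more_consistent = "Team 1" if team_1.iqr < team_2.iqr else "Team 2"

print("Team 1: median", team_1.median, "IQR", team_1.iqr)  # median 11, IQR 4
print("Team 2: median", team_2.median, "IQR", team_2.iqr)  # median 10, IQR 5
print("Higher median:", higher_median)
print("Smaller IQR (more consistent):", more_consistent)
```

A comparison of two data sets should normally quote one measure of average (here the median) and one measure of spread (here the interquartile range), as the answer above does.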


Cumulative frequency

Cumulative frequency is a running total of the frequencies. The running total is calculated and recorded in an extra column on a frequency table. It can also be represented on a graph by plotting each cumulative frequency against the upper boundary of its group.

Example

The table below shows the lengths of 40 babies at birth.

To calculate the cumulative frequencies, add the frequencies together.

| Length (cm) | Frequency | Cumulative frequency |
|---|---|---|
| 30 < l ≤ 35 | 4 | 4 |
| 35 < l ≤ 40 | 10 | 14 (4 + 10 = 14) |
| 40 < l ≤ 45 | 11 | 25 (14 + 11 = 25) |
| 45 < l ≤ 50 | 12 | 37 (25 + 12 = 37) |
| 50 < l ≤ 55 | 3 | 40 (37 + 3 = 40) |
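
The running total in the third column can be reproduced in a few lines of Python. This is only a sketch of the table above: the variable names are made up, and `itertools.accumulate` from the standard library does the adding up.

```python
from itertools import accumulate

# (lower bound, upper bound) of each length group in cm, with its frequency
groups = [(30, 35), (35, 40), (40, 45), (45, 50), (50, 55)]
frequencies = [4, 10, 11, 12, 3]

# running total of the frequencies
cumulative = list(accumulate(frequencies))
print(cumulative)  # [4, 14, 25, 37, 40]

# points for a cumulative frequency diagram:
# (upper boundary of the group, cumulative frequency)
points = [(upper, total) for (_, upper), total in zip(groups, cumulative)]
print(points)  # [(35, 4), (40, 14), (45, 25), (50, 37), (55, 40)]
```

The final cumulative frequency should always equal the total number of values, which is 40 babies here. This is a quick way to check the table.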

A cumulative frequency diagram is drawn by plotting each cumulative frequency against the upper class boundary of its group. The upper class boundaries for this table are 35, 40, 45, 50 and 55.

Cumulative frequency is plotted on the vertical axis and length is plotted on the horizontal axis.

Length vs Cumulative frequency graph

Finding averages from a cumulative frequency

A cumulative frequency diagram is a good way to represent data to find the median, which is the middle value.

To find the median, draw a line across from the middle value on the cumulative frequency axis to the curve, then down to the length axis. In the example above, there are 40 babies in the table. The middle of these 40 values is the 20th value, so go across from 20 on the cumulative frequency axis and read off the median length.


Finding the interquartile range

A cumulative frequency diagram is also a good way to find the interquartile range, which is the difference between the upper quartile and lower quartile.

The interquartile range is a measure of how spread out the data is. It is more reliable than the range because it does not include extreme values. A high value for the interquartile range shows that the data is spread out. A low value for the interquartile range means the data is closer together or more consistent.

Example

There are 40 babies in the table, so to find the lower quartile, find ¼ of 40, which is the 10th value. Reading from the graph, the lower quartile is 38 cm.

To find the upper quartile, find ¾ of 40, which is the 30th value. Reading from the graph, the upper quartile is 47 cm.

The interquartile range is the upper quartile minus the lower quartile, so for this data the interquartile range is 47 - 38 = 9 cm.
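
Reading values from the graph can be imitated in Python. The sketch below assumes the plotted points are joined with straight lines and that the graph starts at (30, 0); neither is stated on this page, and a hand-drawn smooth curve would give slightly different readings. The function name `length_at` is made up for illustration.

```python
def length_at(target, points):
    """Length at which the cumulative frequency reaches `target`.

    `points` is a list of (length, cumulative frequency) pairs in
    increasing order, joined by straight lines (an assumption).
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < y1 and y0 <= target <= y1:
            # move along the line segment in proportion to the target
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target is outside the range of the graph")


# (30, 0) is an assumed starting point; the rest come from the table
points = [(30, 0), (35, 4), (40, 14), (45, 25), (50, 37), (55, 40)]
total = points[-1][1]  # 40 babies

lower_quartile = length_at(total / 4, points)       # 10th value
median_length = length_at(total / 2, points)        # 20th value
upper_quartile = length_at(3 * total / 4, points)   # 30th value

print(round(lower_quartile, 1))                   # 38.0
print(round(median_length, 1))                    # 42.7
print(round(upper_quartile, 1))                   # 47.1
print(round(upper_quartile - lower_quartile, 1))  # 9.1
```

These estimates agree with the graph readings of 38 cm and 47 cm to the nearest centimetre.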

Length vs Cumulative frequency (lower quartile, median and upper quartile)

Drawing a box plot from a cumulative frequency

Drawing a box plot from a cumulative frequency graph is straightforward as long as the median and quartiles have been found.

The guidelines drawn for the median, lower quartile and upper quartile can be used to plot the sections of the box plot. The minimum and maximum values of the box plot are where the cumulative frequency curve begins and ends.

Box plot and line graph
