
Scatter graphs

Scatter graphs are a good way of displaying two sets of data to see if there is a correlation, or connection, between them.

Example

The amount of rainfall and the number of umbrellas sold each day is recorded for nine days.

Rainfall (mm)     3    2    4    0    0    5    6    1    1
Umbrellas sold    1   10   25    0    1   32   47    8   15

We are most likely to be interested in whether the amount of rainfall affects the number of umbrellas sold, so rainfall goes on the horizontal axis of the scatter graph. (If in doubt, it is usual for the top row of the table to go on the horizontal axis.)

A scatter graph that plots the number of umbrellas sold against the amount of rainfall. The plotted points show a positive correlation: for example, at 2 mm of rain 10 umbrellas are sold and at 4 mm 25 are sold.
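The choice of axes can be tried out with a rough text plot. The sketch below is only an illustration, not part of the original example: the function name `text_scatter` and the banding of the vertical axis into steps of 5 are assumptions made here, and it expects whole-number rainfall values. Rainfall runs across and umbrellas sold run up the page.

```python
# Data from the table above: one (rainfall, umbrellas) pair per day.
rainfall_mm = [3, 2, 4, 0, 0, 5, 6, 1, 1]
umbrellas_sold = [1, 10, 25, 0, 1, 32, 47, 8, 15]


def text_scatter(xs, ys, y_step=5):
    """Return a coarse text scatter plot: xs across, ys up in bands of y_step.

    Hypothetical helper for illustration; xs must be non-negative integers.
    """
    # Each point becomes (column, band), where band groups y into steps of y_step.
    points = {(x, y // y_step) for x, y in zip(xs, ys)}
    max_x = max(xs)
    max_band = max(band for _, band in points)
    lines = []
    for band in range(max_band, -1, -1):  # highest band is printed first
        row = "".join(
            " x" if (x, band) in points else " ." for x in range(max_x + 1)
        )
        lines.append(f"{band * y_step:>3} |{row}")
    # Horizontal axis and its labels (rainfall in mm).
    lines.append("    +" + "--" * (max_x + 1))
    lines.append("     " + "".join(f"{x:>2}" for x in range(max_x + 1)))
    return "\n".join(lines)


plot = text_scatter(rainfall_mm, umbrellas_sold)
print(plot)
```

Because the vertical axis is grouped into bands, the two dry days (0 mm with 0 and 1 umbrellas sold) share a single mark, so nine days appear as eight marks.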

The graph shows that there is a positive correlation between the amount of rainfall and the number of umbrellas sold. On days with higher rainfall, more umbrellas were sold.
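The strength of a correlation can also be put into a number. The sketch below uses Pearson's correlation coefficient, which is an assumption here: the text judges correlation by eye and does not name a measure. The function name `pearson_r` is also just illustrative. Values close to +1 suggest a strong positive correlation, values close to -1 a strong negative one, and values near 0 little or no linear correlation.

```python
from math import sqrt

# Data from the table above: one (rainfall, umbrellas) pair per day.
rainfall_mm = [3, 2, 4, 0, 0, 5, 6, 1, 1]
umbrellas_sold = [1, 10, 25, 0, 1, 32, 47, 8, 15]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(rainfall_mm, umbrellas_sold)
print(f"r = {r:.2f}")  # prints "r = 0.88"
```

A coefficient of about 0.88 agrees with what the graph shows: a fairly strong positive correlation. It says nothing about whether rain causes the sales.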

However, it is important to remember that correlation does not always imply causation. If data plotted on a scatter graph shows correlation, we cannot assume that the increase in one of the sets of data caused the increase or decrease in the other set of data: it might be coincidence, or there may be some other cause that the two sets of data are related to.

Types of correlation

A scatter graph can show positive correlation, negative correlation or no correlation.

Positive correlation means as one variable increases, so does the other variable. They have a positive connection.

Temperature vs ice creams sold graph

Negative correlation means as one variable increases, the other variable decreases. They have a negative connection.

Graph showing negative correlation between number of coats sold and rising temperatures

No correlation means there is no connection between the two variables.

Graph showing no correlation between house number and a person's IQ

Lines of best fit

A line of best fit, drawn by eye, is a sensible straight line that goes as centrally as possible through the points plotted. It should follow the same general gradient as the crosses and have roughly the same number of plotted points above the line as below.

A line of best fit does not have to pass through any particular point. Common errors are to draw it from the origin or to force it through the first and last points.

Positive and negative lines on a single graph

The line of best fit for the scatter graph would look like this:

A scatter graph that plots how many umbrellas are sold in comparison to accumulated rainfall. A line of best fit passes as centrally as possible through the points plotted.
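A line drawn by eye can be compared with a calculated one. The sketch below uses the least-squares method, which is a different technique from the by-eye line described above and is swapped in here only for illustration; the function name `least_squares_line` is an assumption.

```python
# Data from the table above: one (rainfall, umbrellas) pair per day.
rainfall_mm = [3, 2, 4, 0, 0, 5, 6, 1, 1]
umbrellas_sold = [1, 10, 25, 0, 1, 32, 47, 8, 15]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the mean point.
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(rainfall_mm, umbrellas_sold)
print(f"umbrellas = {slope:.2f} * rainfall {intercept:+.2f}")
```

For this data the slope is about 6.49 and the intercept about -0.43, so the calculated line does not start at the origin. It passes through the mean point (about 2.4 mm, 15.4 umbrellas) rather than through the first and last points.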

Interpolation and extrapolation

From the diagram above, we can estimate how many umbrellas would be sold for different amounts of rainfall. For example, how many umbrellas would be sold if there was 3 mm of rainfall? What if there was 10 mm of rainfall?

To estimate the number sold for 3 mm of rainfall, we use a process called interpolation. The value of 3 mm is within the range of data values that were used to draw the scatter graph.

Draw a vertical line at 3 mm of rainfall until it meets the line of best fit. Then draw a line across until it meets the vertical axis. Then read off the number of umbrellas sold.

A graph estimates umbrellas sold for 3 mm of rainfall using interpolation. A vertical line drawn at 3 mm meets the line of best fit in the centre and a line across meets the vertical axis giving 19.

An estimated 19 umbrellas would be sold if there was 3 mm of rainfall.

If there was 10 mm of rainfall, we could extend the graph and the line of best fit to read off the number of umbrellas sold. This gives a value of approximately 64 umbrellas sold.

This process is called extrapolation, because the value we are using is outside the range of data used to draw the scatter graph. Since 10 mm is much higher than the highest rainfall recorded, we cannot assume that the line of best fit would still follow the pattern when the rainfall is 10 mm, so the value of 64 umbrellas is not a reliable estimate.

A graph estimates umbrellas sold for 10 mm of rainfall using extrapolation. A vertical line drawn at 10 mm meets the extended line of best fit and a line across meets the vertical axis giving 64.
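The difference between the two kinds of estimate can be sketched in code. The example below assumes a least-squares line in place of the line drawn by eye, and the function name `estimate_umbrellas` is illustrative. It labels each estimate as interpolation or extrapolation by checking whether the rainfall value lies inside the recorded range of 0 mm to 6 mm.

```python
# Data from the table above: one (rainfall, umbrellas) pair per day.
rainfall_mm = [3, 2, 4, 0, 0, 5, 6, 1, 1]
umbrellas_sold = [1, 10, 25, 0, 1, 32, 47, 8, 15]


def estimate_umbrellas(rain, xs=rainfall_mm, ys=umbrellas_sold):
    """Estimate sales from a least-squares line and say how the estimate was made."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # Inside the recorded range is interpolation; outside it is extrapolation.
    kind = "interpolation" if min(xs) <= rain <= max(xs) else "extrapolation"
    return slope * rain + intercept, kind


for rain in (3, 10):
    value, kind = estimate_umbrellas(rain)
    print(f"{rain} mm: about {value:.1f} umbrellas ({kind})")
```

The calculated line gives about 19.1 umbrellas for 3 mm and about 64.5 for 10 mm, close to the values of 19 and 64 read off the graph. The label is the important part: the 10 mm figure relies on the pattern continuing beyond the recorded data, so it should be treated as unreliable.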