Identifying relationships in data
It is important to be able to identify relationships in data. This allows trends to be recognised and may allow predictions to be made. Relationships in data can be identified in several ways.
Scatter graphs
These graphs show the relationship between two sets of data, eg number of tourists and number of tourist facilities or weight and height.
A line of best fit, or trend line, can be added to the scatter graph to show the relationship between the two variables. When drawing a line of best fit, it should pass through or close to as many points as possible, with roughly equal numbers of points on either side of the line.
A strong correlation is when the points on the scatter graph lie very close to the line of best fit. With a strong correlation, the two variables are related to one another - as one changes, so does the other. A weak correlation is when the points lie far away from the line of best fit. In this case, the two variables are not necessarily related to one another - a change in one does not mean a change in the other.
Interpolate trends
This is when a value is found within the data set, using the line of best fit. The value was not originally plotted, but can be read off the line of best fit.
Extrapolate trends
This is when a value is found outside of the data set. Extrapolation may provide uncertain results as it is based on extending the line of best fit beyond a known set of data.
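As a rough illustration of interpolation and extrapolation, the sketch below fits a straight line to a small set of points and reads values off it. It assumes a least-squares line, which is one common way to calculate a line of best fit rather than drawing it by eye. The data values and the function names `fit_line`, `predict` and `is_interpolation` are made up for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Read a value off the line of best fit."""
    return slope * x + intercept


def is_interpolation(xs, x):
    """True if x lies inside the range of the plotted data."""
    return min(xs) <= x <= max(xs)


# Hypothetical data: number of tourists (thousands) and number of tourist facilities.
tourists = [10, 20, 30, 40, 50]
facilities = [4, 5, 8, 9, 12]

slope, intercept = fit_line(tourists, facilities)

for x in (35, 70):
    kind = "interpolated" if is_interpolation(tourists, x) else "extrapolated"
    print(x, kind, round(predict(slope, intercept, x), 1))
```

The value at 35 lies inside the plotted range, so it is interpolated. The value at 70 lies beyond the largest plotted point, so it is extrapolated and should be treated with more caution.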
Spearman's rank correlation coefficient
Spearman's rank correlation coefficient offers the opportunity to use a statistical test to determine the strength of any relationship (correlation) between two sets of data. At least ten pairs of data and the following equation are needed:
\(r_{s}=1-\frac{6\Sigma~d^2}{n(n^2-1)}\)
\(\Sigma~d^2\) means the sum of the \(d^2\) values
n is the number of pairs of data
d is the difference between the two ranks in each pair of data
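The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a definitive method: the ten pairs of data are made up, the function names `rank` and `spearman` are chosen for this example, and it assumes that rank 1 goes to the largest value and that tied values share the average of their ranks. The formula above is only exact when there are no tied ranks.

```python
def rank(values):
    """Rank values so that the largest gets rank 1; ties share the average rank."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # rank of the first occurrence of v
        count = ordered.count(v)       # how many values are tied with v
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation coefficient: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    rank_x = rank(xs)
    rank_y = rank(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical data: ten pairs of measurements.
xs = [12, 25, 31, 44, 50, 58, 63, 71, 85, 90]
ys = [2, 4, 3, 6, 5, 8, 7, 9, 11, 10]

r_s = spearman(xs, ys)
print(round(r_s, 2))
```

Ranking both sets of data in the same direction matters: if one set were ranked from largest to smallest and the other from smallest to largest, the sign of the result would be reversed.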
Spearman's rank always gives an answer between −1 and +1. The numbers between are like a scale, where −1 is a very strong negative link, 0 is no link and +1 is a very strong positive link.
For example, if Spearman's rank was 0.8, because it is close to +1, it means that the link is strong and it is possible to say that those two sets of data are linked, and increase together. If it was −0.8, it is possible to say it was linked and as one increases, the other decreases. If there is no relationship (correlation), a value close to 0 would be arrived at.