Significance testing of the correlation coefficient

T. Dhasaratharaman*

Statistician, Kauvery Hospitals, India

*Correspondence: Tel: +91 90037 84310  Email: dhasa.cst@kauveryhospital.com

Where there is a linear relationship between two variables there is said to be a correlation between them. Examples are height and weight in children, or socio-economic class and mortality.

The strength of that relationship is given by the “correlation coefficient”.

Interpretation

The correlation coefficient is usually denoted by the letter “r”, for example r = 0.8.

In SPSS 23.0 it is obtained from the menu path Analyze – Correlate – Bivariate.

A positive correlation coefficient means that as the value of one variable goes up, the value of the other variable also goes up: the line on the graph slopes up from left to right. Height and weight have a positive correlation: children get heavier as they grow taller.

A negative correlation coefficient means that as the value of one variable goes up, the value of the other variable goes down: the line on the graph slopes down from left to right. Higher socio-economic class is associated with lower mortality, giving a negative correlation between the two variables.

[Figure: graphs illustrating a positive and a negative correlation]

If there is a perfect relationship between the two variables, then r = 1 (if a positive correlation) or r = -1 (if a negative correlation).

If there is no correlation at all (the points on the graph are completely randomly scattered) then r = 0.
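The article does not give a formula for r. As a minimal sketch, assuming the standard Pearson definition (the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squares), the perfect cases r = 1 and r = -1 can be checked on made-up points that lie exactly on a straight line. The function name pearson_r and the data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up data, not taken from the article.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])     # points on a rising line
r_down = pearson_r(x, [10 - 3 * v for v in x])  # points on a falling line
print(round(r_up, 6), round(r_down, 6))
```

Any real data set will scatter around the line to some degree, so r will fall somewhere between these two extremes.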

The following is a good rule of thumb when considering the size of a correlation:

r = 0-0.2: very low and probably meaningless.

r = 0.2-0.4: a low correlation that might warrant further investigation.

r = 0.4-0.6: a reasonable correlation.

r = 0.6-0.8: a high correlation.

r = 0.8-1: a very high correlation. Possibly too high! Check for errors or other reasons for such a high correlation.

The same bands apply to negative correlations.
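The rule of thumb can be written as a small lookup. This is only a sketch: the function name describe_correlation is hypothetical, and because the bands above share their end points (0.2, 0.4 and so on), placing each boundary value in the higher band is an assumption made here, not something the rule specifies.

```python
def describe_correlation(r):
    """Map r to the rule-of-thumb label used in this article."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the same bands apply to negative correlations
    # Assumption: a value exactly on a boundary goes into the higher band.
    if size < 0.2:
        return "very low"
    if size < 0.4:
        return "low"
    if size < 0.6:
        return "reasonable"
    if size < 0.8:
        return "high"
    return "very high"

# The two r values from the worked examples in this article.
print(describe_correlation(0.88))
print(describe_correlation(-0.34))
```

The labels describe the size of r only; they say nothing about whether the correlation is statistically significant.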

Example 1

A nurse wanted to be able to predict the laboratory HbA1c result (a measure of blood glucose control) from the fasting blood glucose values which she measured in her clinic. For 12 consecutive diabetic patients she noted the fasting glucose and simultaneously drew blood for HbA1c. She compared the pairs of measurements and plotted them on a graph.

[Figure: scatter plot of fasting blood glucose and HbA1c for the 12 patients]

For these results, r = 0.88, showing a very high correlation.

A graph like this is known as a “scatter plot”.

Example 2

An occupational therapist developed a scale for measuring physical activity and wondered how well it correlated with Body Mass Index (BMI) in 12 of her adult patients.

[Figure: scatter plot of physical activity score and BMI for the 12 patients]

In this example, r = -0.34, indicating a low correlation.

The fact that the r value is negative shows that the correlation is negative, indicating that patients with a higher level of physical activity tended to have a lower BMI.
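The article reports r for both examples but no significance test. One common approach, assumed here and not stated by the author, is to convert r to a t statistic with n - 2 degrees of freedom, t = r × √((n − 2) / (1 − r²)), and compare it with the tabulated critical value. The sketch below applies this to the two examples (n = 12, so 10 degrees of freedom). The function name t_statistic is illustrative, and 2.228 is the two-sided 5% critical value for 10 degrees of freedom from standard t tables.

```python
import math

def t_statistic(r, n):
    """t statistic for testing the null hypothesis of zero correlation (n - 2 df)."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Two-sided 5% critical value of t with 10 degrees of freedom (standard tables).
T_CRIT_DF10 = 2.228

for label, r in [("Example 1", 0.88), ("Example 2", -0.34)]:
    t = t_statistic(r, 12)
    verdict = "significant" if abs(t) > T_CRIT_DF10 else "not significant"
    print(f"{label}: r = {r}, t = {t:.2f}, {verdict} at the 5% level")
```

On this test, Example 1 is significant at the 5% level and Example 2 is not. With only 12 patients, a correlation of the size seen in Example 2 could easily have arisen by chance.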

Caution

Correlation tells us how strong the association between the variables is, but does not tell us about cause and effect in that relationship.

The “Pearson correlation coefficient”, Pearson’s r, is used if the values are sampled from normally distributed (“normal”) populations. Otherwise, the “Spearman correlation coefficient” is used. However, the interpretation of the two is the same.

Where the author shows a scatter plot, you can get a good idea of how strong the relationship is from the scatter of the points, without needing to know the r value.
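To show how the two coefficients relate, here is a minimal sketch assuming the usual definition of Spearman's coefficient as Pearson's r calculated on the ranks of the values, with tied values sharing the average of their ranks. The function names and the data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman coefficient: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Made-up data: y always rises with x, but along a curve.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

Because it works on ranks, the Spearman coefficient here is 1 (the order of the points agrees perfectly), while Pearson's r is below 1 because the points do not lie on a straight line.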

It is very easy for authors to compare a large number of variables using correlation and only present the ones that happen to be significant. So, check to make sure there is a plausible explanation for any significant correlations.
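A little arithmetic shows the size of the problem. This sketch assumes that no true correlation exists between any of the variables and that each test is run at the 5% level; the function name and the figure of 20 variables are illustrative.

```python
def expected_chance_findings(n_variables, alpha=0.05):
    """Return (number of variable pairs, expected count of chance 'significant' results)."""
    # Every variable can be paired with every other one: k(k - 1)/2 pairs.
    n_pairs = n_variables * (n_variables - 1) // 2
    # With no true correlations, each test has probability alpha of a false positive.
    return n_pairs, n_pairs * alpha

pairs, expected = expected_chance_findings(20)
print(pairs, expected)
```

With 20 variables there are 190 possible pairs, so between nine and ten “significant” correlations would be expected by chance alone.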

Also bear in mind that a correlation only tells us about linear (straight line) relationships between variables. Two variables may be strongly related but not in a straight line, giving a low correlation coefficient.
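A made-up example makes the point. Below, y is completely determined by x (y = x²), yet because the relationship is U-shaped and not a straight line, Pearson's r comes out as zero. The function name and data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: a perfect U-shaped relationship, symmetric about x = 0.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_curve = pearson_r(x, y)
print(round(r_curve, 6))
```

A scatter plot would reveal this relationship at once, which is one more reason to look at the graph as well as the r value.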