Scatter Diagram (part 1: how to do it)

When investigating problems, typically when searching for their causes, it may be suspected that two items are related in some way. For example, it may be suspected that the number of accidents at work is related to the amount of overtime that people are working.

The Scatter Diagram helps to identify the existence of a measurable relationship between two such items by measuring them in pairs and plotting each pair as a point on a graph. This visually shows the correlation between the two sets of measurements, as in Figure 1.
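The plotting step can be sketched in a few lines of Python. This is only an illustration: the function name ascii_scatter, the grid size and the overtime and accident figures are all hypothetical, and a spreadsheet or charting package would normally be used instead.

```python
def ascii_scatter(pairs, width=40, height=12):
    """Return a text scatter plot of (x, y) pairs, one '*' per occupied cell."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when all values on an axis are identical.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # The first grid row is printed first, so flip to put large y at the top.
        grid[height - 1 - row][col] = "*"
    body = "\n".join("|" + "".join(line).rstrip() for line in grid)
    return body + "\n+" + "-" * width


# Hypothetical measurements: (weekly overtime hours, accidents in that week).
overtime_vs_accidents = [
    (2, 1), (4, 1), (5, 2), (7, 2), (8, 4),
    (10, 3), (12, 5), (14, 5), (15, 7), (18, 8),
]
plot = ascii_scatter(overtime_vs_accidents)
print(plot)
```

Each pair of measurements becomes one point, with the first item of the pair on the horizontal axis and the second on the vertical axis.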

If the points plotted on the Scatter Diagram are randomly scattered, with no discernible pattern, then this indicates that the two sets of measurements have no correlation and cannot be said to be related in any way. If, however, the points form a pattern of some kind, then this shows the type of relationship between the two measurement sets. The closer the points lie to a line drawn through the pattern, the greater the correlation, as in Table 1.

Table 1. Degree of correlation

Degree of correlation | Scatter Diagram | Interpretation
None | No relationship can be seen. | The y variable is not related to the x variable in any way.
Low | A vague relationship is seen. | There is a low positive correlation between the x variable and the y variable. There might be some connection between the two, but it is not clear.
High | The points are grouped into a clear linear shape. | The two variables are clearly related in some way. Given one, you can predict a moderate range in which the other will be found.
Perfect | All points lie on a line (which is usually straight). | The variables are deterministically related, and given one you can predict the other with accuracy.

The correlations in Table 1 all run in a line from low on the left to high on the right. This is not always the shape of a correlation, as is shown in Table 2. Correlations can be positive or negative, linear or curved. They also do not go on forever, and using them to predict values outside the measured range is always hazardous, as is indicated in the 'part-linear' example.

Table 2. Type of correlation

Type of correlation | Scatter Diagram | Interpretation
Positive | Straight line, sloping up from left to right. | Increasing the value of the 'cause' results in a proportionate increase in the value of the 'effect'.
Negative | Straight line, sloping down from left to right. | Increasing the value of the 'cause' results in a proportionate decrease in the value of the 'effect'.
Curved | Various curves, typically U- or S-shaped. | Changing the value of the 'cause' results in the 'effect' changing differently, depending on the position on the curve.
Part-linear | Part of the diagram is a straight line (sloping up or down). | May be due to breakdown or overload of the y variable, or may be a curve with a part that approximates to a straight line (which may be treated as such).

Care is needed when interpreting a correlation, as it can arise in more than one way:

(a) There is a cause and effect relationship between the two measured items, where one is causing the other (at least in part).

(b) The two measured items are both caused by a third item. For example, a Scatter Diagram may show a correlation between cracks and transparency of glass utensils because changes in both are caused by changes in furnace temperature.

The trap that many have fallen into is to take data that fits into (b) and assume that it falls into (a). For example, you could find a strong positive correlation between people drowning and sales of ice cream. Does this mean that the ice cream gives you stomach cramps and you accidentally drown? Do suicidal people have 'one last ice cream'? Neither. In fact both have a common cause. When it is sunny and warm, people eat more ice cream. They also go swimming.
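The common-cause effect can be reproduced with a small simulation. The sketch below is illustrative only: the figures are invented, and it assumes that ice cream sales and the number of swimmers each depend on temperature plus random noise, with no direct link between the two.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(temperatures, rng):
    """Ice cream sales and swimmer counts each depend on temperature only."""
    ice_cream = [20 * t + rng.gauss(0, 40) for t in temperatures]
    swimmers = [15 * t + rng.gauss(0, 40) for t in temperatures]
    return ice_cream, swimmers


rng = random.Random(42)  # fixed seed so the run is repeatable

# Days with varying temperature: the common cause is free to vary.
varying = [rng.uniform(10, 35) for _ in range(200)]
r_varying = pearson(*simulate(varying, rng))

# Days that all share one temperature: the common cause is held fixed.
fixed = [25.0] * 200
r_fixed = pearson(*simulate(fixed, rng))

print(f"temperature varies: r = {r_varying:.2f}")
print(f"temperature fixed:  r = {r_fixed:.2f}")
```

When temperature varies, the two series are strongly correlated even though neither influences the other. When temperature is held fixed, the correlation largely disappears, which is what would be expected if temperature were the only link between them.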

This misunderstanding is exacerbated by the use of a Cartesian x-y graphical form, where in many cases x is the independent variable which, when varied, causes the y variable to vary. In a Scatter Diagram, which variable is put on which axis is a matter of choice rather than form, as neither variable is assumed to depend on the other. Having said this, the Scatter Diagram may be used to give evidence for a cause and effect relationship, but it alone does not prove it. Usually, this also requires a good understanding of the system being measured, and may require additional experiments.

A correlation coefficient may be calculated for a set of points, which indicates mathematically how closely they correlate. No correlation gives a correlation coefficient of zero. For the perfect line in Table 1, the coefficient is 1. If the line pointed down and to the right (and all points lay on the line), there would be a negative correlation with a coefficient value of -1.
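The article does not say which coefficient is meant. The sketch below assumes the commonly used Pearson product-moment coefficient and applies it to three small invented data sets, one for each of the values mentioned above.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Pearson product-moment correlation coefficient, between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("coefficient is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# All points on a line sloping up: perfect positive correlation.
r_up = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])

# All points on a line sloping down: perfect negative correlation.
r_down = correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2])

# Points at the four corners of a square: no linear trend at all.
r_none = correlation_coefficient([1, 1, 3, 3], [1, 3, 1, 3])

print(r_up, r_down, r_none)
```

Real measurements will normally fall somewhere between these extremes, with values nearer to 1 or -1 indicating points that lie closer to a straight line.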

In addition to the correlation coefficient, a ‘line of best fit’ or regression line may be drawn through the points to show the 'average' position.
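One common way to find such a line is ordinary least squares, which chooses the straight line that minimises the squared vertical distances from the points. The sketch below assumes that method, which the article does not specify, and the function name and data are hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so no slope can be fitted")
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented points lying exactly on y = 2x + 1, so the fit should recover it.
slope, intercept = line_of_best_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {slope:.2f} * x + {intercept:.2f}")

# Predict only inside the measured range of x (here 1 to 4).
predicted_y = slope * 2.5 + intercept
print(predicted_y)
```

The fitted line summarises only the measured range. Using it to predict values outside that range carries the same hazard as extending any correlation beyond the data.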

This article first appeared in Quality World, the journal of the Institute for Quality Assurance
