
# Central Limit Theorem (CLT) & Control Charts Formulae and Constants

Published
Author David Scrimshire

## Central Limit Theorem (CLT)

Several years ago, I used a simulation to show that the Central Limit Theorem (CLT) could be confirmed without recourse to mathematics. The CLT enables us to estimate the population parameters from the sample statistics.

The CLT states that as the sample (subgroup) size increases, the distribution of sample means (Xbar) can be approximated by a normal distribution with mean µ and standard deviation σ ÷ √n, where –

• µ is the population mean

• σ is the population standard deviation

• n is the sample (subgroup) size

• k is the number of subgroups

average of sample means: Xbarbar = population mean µ

standard deviation of the sample means: σXbar = σ ÷ √n

As a first step, I had Minitab® generate 1,000,000 subgroups (k) of sample size n = 5 using these normal distribution parameters –

• Mean = 0.0

• Standard Deviation = 1.0

The “sample values” were stored consecutively in five rows.

• Mean of subgroup means ≈ 0 (zero)

• StDev of subgroup means = 0.4469 ≈ 0.447

Remember that the Central Limit Theorem (CLT) states that –

• mean of subgroup means: Xbarbar = population mean µ

• standard deviation of the subgroup means: σXbar = σ ÷ √n

My simulation (1,000,000 sets of sample size n = 5) showed that –

• Xbarbar ≈ µ (i.e. simulated -0.00004763 ≈ 0; CLT value = 0)

• σXbar ≈ σ ÷ √n (i.e. simulated 0.4469 ≈ 0.447; CLT value = 1 ÷ √5 ≈ 0.447)

QED we’ve (empirically) ‘proved’ the CLT for n = 5
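The simulation can also be reproduced outside Minitab®. The following is a minimal sketch in Python using only the standard library. The seed, the reduced subgroup count (100,000 rather than 1,000,000, to keep the run short) and the function name `simulate_subgroup_means` are illustrative assumptions, not part of the original study.

```python
import random
import statistics


def simulate_subgroup_means(k, n, mu=0.0, sigma=1.0, seed=1):
    """Return the means of k subgroups, each of n values drawn from N(mu, sigma)."""
    rng = random.Random(seed)  # fixed seed (assumption) so the run is repeatable
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(k)
    ]


n, k = 5, 100_000
means = simulate_subgroup_means(k, n)

xbarbar = statistics.fmean(means)     # mean of subgroup means
sigma_xbar = statistics.stdev(means)  # StDev of subgroup means
clt_sigma_xbar = 1.0 / n ** 0.5       # CLT value: sigma / sqrt(n) with sigma = 1

print(f"Xbarbar        = {xbarbar:.4f} (CLT value 0)")
print(f"StDev of Xbar  = {sigma_xbar:.4f} (CLT value {clt_sigma_xbar:.4f})")
```

With fewer subgroups than the Minitab® run, the agreement with the CLT values should be slightly looser, but still close to 0 and 0.447.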

## Control Charts Formulae and Constants

I’m sure we’re all aware that it’s easier (and certainly more economical) to work with 'averages' and 'ranges'! The following basic formulae relate to samples and ranges –

Sample (subgroup) average Xbar

Xbar = (x1 + x2 + … + xn) ÷ n (reminder: n – sample size)

Grand average Xbarbar

Xbarbar = (Xbar-1 + Xbar-2 + … + Xbar-k) ÷ k (k – number of subgroups)

Range (subgroup range) R

R = xMax – xMin

Average Range Rbar

Rbar = (R1 + R2 + … + Rk) ÷ k (k – number of subgroups)
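As a quick illustration of the four formulae above, here is a short Python sketch. The three subgroups of five measurements are made-up values chosen purely for the example.

```python
from statistics import fmean

# Hypothetical data: k = 3 subgroups, each of n = 5 measurements.
subgroups = [
    [10.1, 9.8, 10.0, 10.3, 9.9],
    [10.2, 10.0, 9.7, 10.1, 10.0],
    [9.9, 10.4, 10.1, 9.8, 10.2],
]

xbars = [fmean(s) for s in subgroups]          # Xbar for each subgroup
ranges = [max(s) - min(s) for s in subgroups]  # R = xMax - xMin for each subgroup

xbarbar = fmean(xbars)  # grand average Xbarbar
rbar = fmean(ranges)    # average range Rbar

print(f"Xbar values: {[round(x, 2) for x in xbars]}")
print(f"R values:    {[round(r, 2) for r in ranges]}")
print(f"Xbarbar = {xbarbar:.4f}, Rbar = {rbar:.4f}")
```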

It should be obvious that the population standard deviation (σ) and mean range (Rbar) are both measures of the variation (or spread) of a data set.

• standard deviation (σ) – population – to be estimated

• mean range (Rbar) – sample (subgroup) – may be calculated from (sample) data

There is a ‘mathematical’ relationship between Rbar (the mean range for data from ‘samples’ – size n) and σ (the standard deviation of the ‘population’).

this relationship depends only on the sample size, n

The mean range (Rbar) is d2σ where the value of d2 (which is a ‘constant’) is also a function of n.

an ‘estimator’ of σ = Rbar / d2

Also, the population standard deviation (σ) and the standard deviation of range values (σR) are again both measures of the variation (or spread) of a data set.

There is ‘another mathematical’ relationship between σR (the range values StDev for data from ‘samples’ – size n) and σ (the standard deviation of the ‘population’).

this relationship again depends only on the sample size, n

The StDev of range values σR is d3σ where the value of d3 (which is another ‘constant’) is also a function of n, therefore, σR = d3 x σ

an ‘estimator’ of σ = σR / d3

Okay – two equations defining σ but we need to determine both d2 and d3

In demonstrating the CLT we used Minitab® to create 1,000,000 sets of sample values for n = 5 and calculated their averages (Xbar).

Now let’s calculate the range of each of the sets of sample values (i.e. xmax - xmin) and store in a further column labeled Range.

• Mean (of subgroup ranges) ≈ 2.326 – this is the constant d2 (n = 5)

• StDev (of subgroup ranges) ≈ 0.864 – this is the constant d3 (n = 5)
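The same experiment can be sketched in Python with the standard library. Because the simulated population has σ = 1, the mean and StDev of the subgroup ranges estimate d2 and d3 directly. The seed, the reduced count of 100,000 subgroups and the function name `simulate_subgroup_ranges` are assumptions made for this example.

```python
import random
import statistics


def simulate_subgroup_ranges(k, n, seed=1):
    """Return the ranges (max - min) of k subgroups of n values drawn from N(0, 1)."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    ranges = []
    for _ in range(k):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return ranges


ranges = simulate_subgroup_ranges(k=100_000, n=5)

d2_est = statistics.fmean(ranges)  # mean of ranges  -> estimate of d2
d3_est = statistics.stdev(ranges)  # StDev of ranges -> estimate of d3

print(f"d2 estimate = {d2_est:.3f} (table value 2.326)")
print(f"d3 estimate = {d3_est:.3f} (table value 0.864)")
```

Changing `n` in the call should give estimates of d2 and d3 for other subgroup sizes, which can be checked against a published table of constants.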

We now have everything needed to manually create an Xbar-R Chart from scratch (for sample size n = 5) .. .. ..

### Calculating the σXbar statistic from Rbar data

Estimate of the population standard deviation (σ) from the mean range –

σ = Rbar / d2                              ..  ..  .. (1)

Remember: we do not know σ (the StDev of the population of individual values), so we estimate it from Rbar

Estimate of the standard deviation (StDev) of Xbar using the CLT

σXbar = σ / √n                           ..  ..  .. (2)

Reminder:  The CLT applies regardless of the shape of the population’s distribution

Substituting σ from (1) into (2) gives –

σXbar = (Rbar / d2) / √n           ..  ..  .. (3)

Simplifying, we have –

σXbar = Rbar / (d2 x √n)          ..  ..  .. (4)
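Equations (1) to (4) can be checked numerically with a short Python sketch. The value of Rbar used here is hypothetical, chosen so that the estimate of σ comes out at 2.0. The function name `sigma_xbar_from_rbar` is likewise an assumption for illustration.

```python
import math

D2_N5 = 2.326  # d2 for n = 5, as found in the simulation


def sigma_xbar_from_rbar(rbar, n, d2):
    """Estimate the StDev of subgroup means from the average range."""
    sigma_hat = rbar / d2            # equation (1): estimate of sigma
    return sigma_hat / math.sqrt(n)  # equation (2): sigma / sqrt(n)


rbar = 4.652  # hypothetical average range
n = 5

via_steps = sigma_xbar_from_rbar(rbar, n, D2_N5)  # equations (1) then (2), i.e. (3)
via_eq4 = rbar / (D2_N5 * math.sqrt(n))           # equation (4), single step

print(f"sigma estimate  = {rbar / D2_N5:.4f}")
print(f"sigmaXbar (3)   = {via_steps:.4f}")
print(f"sigmaXbar (4)   = {via_eq4:.4f}")
```

Both routes should print the same value, confirming that (4) is only a rearrangement of (3).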

### Defining the Xbar UCL, Centre Line & LCL

Reminder: Xbarbar is the average across all k subgroup averages (Xbar) and represents the process centre.

UCL~LCL = Xbarbar ± 3 σXbar

… substitute σXbar = Rbar / (d2 x √n) from (4)

UCL         = Xbarbar + 3 x [Rbar / (d2 x √n)]

Centre Line = Xbarbar

LCL         = Xbarbar – 3 x [Rbar / (d2 x √n)]

Tables of Constants define the ‘factor’ A2 where A2 = 3 ÷ [(d2 x √n)]

UCL         = Xbarbar + A2 Rbar

Centre Line = Xbarbar

LCL         = Xbarbar – A2 Rbar

From our simulation –

d2 = 2.326 (mean of ranges)

d3 = 0.864 (StDev of ranges)

d3/d2 ≈ 0.3715

A2 = 3 ÷ (d2 x √n) = 3 ÷ (2.326 x 2.236) = 0.5768 ≈ 0.577

From Table of Constants ~ A2 = 0.577 for n = 5
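The A2 calculation and the resulting Xbar chart limits can be sketched in Python as follows. The function names and the example values of Xbarbar and Rbar are assumptions for illustration only.

```python
import math


def a2_factor(n, d2):
    """A2 = 3 / (d2 * sqrt(n))."""
    return 3.0 / (d2 * math.sqrt(n))


def xbar_limits(xbarbar, rbar, n, d2):
    """Return (LCL, centre line, UCL) for the Xbar chart."""
    a2 = a2_factor(n, d2)
    return xbarbar - a2 * rbar, xbarbar, xbarbar + a2 * rbar


a2 = a2_factor(n=5, d2=2.326)

# Hypothetical process summary: Xbarbar = 10.0, Rbar = 0.5
lcl, centre, ucl = xbar_limits(xbarbar=10.0, rbar=0.5, n=5, d2=2.326)

print(f"A2 = {a2:.4f}")
print(f"LCL = {lcl:.4f}, Centre Line = {centre:.4f}, UCL = {ucl:.4f}")
```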

### Defining the Rbar UCL, Centre Line & LCL

The standard deviation of the ‘range’ is –

σR = d3 x σ                   ..  ..  .. (5)

As the population standard deviation (σ) is unknown, we may estimate using –

σ = Rbar / d2                   ..  ..  .. (1) substitute for σ in (5)

σR = d3 x Rbar / d2         ..  ..  .. (6)

• UCL – Upper Control Limit

• Centre Line

• LCL – Lower Control Limit

UCL = Rbar + 3 σR  = Rbar + 3 d3 x Rbar / d2

Centre Line = Rbar

LCL = Rbar – 3 σR    = Rbar – 3 d3 x Rbar / d2

Factorising, we have –

UCL = Rbar (1 + 3 d3 / d2)

Centre Line = Rbar

LCL = Rbar (1 – 3 d3 / d2)

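Tables define the ‘factors’ D4 and D3 to be –

D4 = (1 + 3 d3 / d2)

D3 = (1 – 3 d3 / d2), taken as 0 whenever this expression is negative, because a range cannot fall below zero (for n = 5, 1 – 3 x 0.864 / 2.326 ≈ –0.114, so D3 = 0)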

UCL         = Rbar D4

Centre Line = Rbar

LCL         = Rbar D3
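The R chart factors and limits can be sketched in Python in the same way. The function names and the example Rbar are assumptions. The lower factor is floored at zero, following the convention used in published tables of constants, since a range cannot be negative.

```python
def r_chart_factors(d2, d3):
    """Return (D3, D4) computed from the constants d2 and d3."""
    ratio = 3.0 * d3 / d2
    upper_factor = 1.0 + ratio             # D4
    lower_factor = max(0.0, 1.0 - ratio)   # D3, floored at 0 (table convention)
    return lower_factor, upper_factor


def r_limits(rbar, d2, d3):
    """Return (LCL, centre line, UCL) for the R chart."""
    lower_factor, upper_factor = r_chart_factors(d2, d3)
    return rbar * lower_factor, rbar, rbar * upper_factor


D3_factor, D4_factor = r_chart_factors(d2=2.326, d3=0.864)

# Hypothetical average range: Rbar = 0.5
lcl_r, centre_r, ucl_r = r_limits(rbar=0.5, d2=2.326, d3=0.864)

print(f"D3 = {D3_factor:.3f}, D4 = {D4_factor:.3f}")
print(f"LCL = {lcl_r:.4f}, Centre Line = {centre_r:.4f}, UCL = {ucl_r:.4f}")
```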

### Table of constants for Xbar and R control charts

From our simulation –

• d2 = 2.326 (mean of ranges)

• d3 = 0.864 (StDev of ranges)

• d3/d2 ≈ 0.3715

From Table of Constants for n = 5 –

• d2 = 2.326

• d3 = 0.864

• d3/d2 ≈ 0.3715

### Conclusion & Notes

You should now understand where the constants d2, d3 and A2 (and the factors D3 and D4) come from, and be able to deploy Xbar & R Charts with confidence.

Xbar Chart

Indicates how the subgroup average (mean) changes over time. It is used to monitor the process mean when subgroups are sampled from a process at regular intervals.

The Xbar Chart is typically combined with an R Chart to monitor process variables. If the process variability is not in control, the Xbar control limits (which are calculated from Rbar) may be unreliable, so causes of variation that are affecting the process mean can’t be pinpointed.

Each point on the chart represents a subgroup mean (Xbar) value. The process mean (Xbarbar) is the centre line; if it isn’t specified, it’s the weighted mean of the subgroup means.

R Chart

Indicates how the range of the subgroups changes over time. It is used to monitor process variability, as measured by the range, when subgroups of fewer than ten values are sampled from a process at regular intervals. Each point on the chart represents the subgroup range (R) value (xMax – xMin).

The centre line for each subgroup is the expected value of the range statistic, so the centre line differs between subgroups when subgroup sizes are not equal.

### Important notes

• A process that is “in statistical control” means that the process is stable, and it is predictable

• Just because a process is stable does not mean it is a zero-defect process

• Process capability (Cpk >1.34) must first be established within 3-sigma limits as a minimum!

• Remember to NEVER put specifications on any kind of control chart (e.g. Xbar & R chart)

• The points on the chart are comprised of averages, not individuals. Specification limits are based on individuals, not averages

• The operator might have the tendency to not react to a point that is out of control when the point is within the specification limits

• Xbar & R charts help to avoid unnecessary adjustments in the process

It must be remembered that process control charts (e.g. Xbar & R) do not consider the actual component nominal and tolerances – they only monitor continuing ‘statistical control’ of the process