Central Limit Theorem (CLT) & Control Charts Formulae and Constants
Central Limit Theorem (CLT)
Several years ago, I used a simulation to show that the Central Limit Theorem (CLT) could be confirmed without recourse to mathematics. The CLT enables us to estimate the population parameters from the sample statistics.
The CLT states that as the sample (subgroup) size increases, the distribution of sample means (Xbar) can be approximated by a normal distribution with mean µ and standard deviation σ÷√n, where .. .. ..
µ is the population mean
σ is the population standard deviation
n is the sample (subgroup) size
k is the number of subgroups
average of sample means: Xbarbar = population mean µ
standard deviation of the sample means: σXbar = σ÷√n
As a first step, I had Minitab® generate 1,000,000 (k) sets of sample size (n) of 5 using these normal distribution parameters –
Mean = 0.0
Standard Deviation = 1.0
The “sample values” were stored in five columns, one row per subgroup.
Mean of subgroup means ≈ 0 (zero)
StDev of subgroup means = 0.4469 ≈ 0.447
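The simulation above can be approximated outside Minitab®. The following is a minimal sketch only, assuming Python's standard library, a smaller k of 100,000 subgroups to keep the run time short, and an arbitrary seed; the function name simulate_subgroup_means is hypothetical and the exact figures will differ slightly from the Minitab® run.

```python
import random
import statistics

def simulate_subgroup_means(k, n, mu=0.0, sigma=1.0, seed=1):
    """Return the means of k subgroups, each of n normal(mu, sigma) values."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(k)
    ]

# Assumption: k = 100,000 rather than the 1,000,000 used in the article.
means = simulate_subgroup_means(k=100_000, n=5)

grand_mean = statistics.fmean(means)     # Xbarbar, expected to be close to mu = 0
sd_of_means = statistics.pstdev(means)   # expected to be close to sigma / sqrt(n)
clt_prediction = 1.0 / 5 ** 0.5          # sigma / sqrt(n) for sigma = 1, n = 5

print(f"Mean of subgroup means  = {grand_mean:.4f}")
print(f"StDev of subgroup means = {sd_of_means:.4f} (CLT predicts {clt_prediction:.4f})")
```

With fewer subgroups the estimates are a little noisier than the Minitab® figures, but they should still agree with the CLT values to about two decimal places.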
Remember that the Central Limit Theorem (CLT) states that –
mean of subgroup means: Xbarbar = population mean µ
standard deviation of the subgroup means: σXbar = σ ÷ √n
My simulation (1,000,000 sets of sample size n = 5) showed that –
Xbarbar = µ (i.e. -0.00004763 ≈ 0) CLT = 0
σXbar = σ ÷ √n (i.e. 0.4469 ≈ 0.447) CLT = 1 ÷ √5 ≈ 0.447
QED we’ve (empirically) ‘proved’ the CLT for n = 5
Control Charts Formulae and Constants
I’m sure we’re all aware that it’s easier (and certainly more economical) to work with 'averages' and 'ranges'! The following basic formulae relate to samples and ranges –
Sample (subgroup) average Xbar
Xbar = (x1 + x2 + … + xn) ÷ n (reminder: n – sample size)
Grand average Xbarbar
Xbarbar = (Xbar-1 + Xbar-2 + … + Xbar-k) ÷ k (k – # of subgroups)
Range (subgroup range) R
R = xMax – xMin
Average Range Rbar
Rbar = (R1 + R2 + … + Rk) ÷ k (k – number of subgroups)
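The four formulae above can be sketched in a few lines of Python. The measurements below are hypothetical (k = 3 subgroups of size n = 5), invented purely for illustration, and the helper names are my own.

```python
# Hypothetical measurements: k = 3 subgroups, each of sample size n = 5.
subgroups = [
    [10.1, 9.8, 10.0, 10.3, 9.9],
    [10.2, 10.0, 9.7, 10.1, 10.0],
    [9.9, 10.4, 10.0, 9.8, 10.2],
]

def subgroup_mean(values):
    """Xbar = (x1 + x2 + ... + xn) / n"""
    return sum(values) / len(values)

def subgroup_range(values):
    """R = xMax - xMin"""
    return max(values) - min(values)

xbars = [subgroup_mean(s) for s in subgroups]    # one Xbar per subgroup
ranges = [subgroup_range(s) for s in subgroups]  # one R per subgroup

xbarbar = sum(xbars) / len(xbars)   # grand average over the k subgroups
rbar = sum(ranges) / len(ranges)    # average range over the k subgroups

print(f"Xbarbar = {xbarbar:.4f}, Rbar = {rbar:.4f}")
```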
It should be obvious that the population standard deviation (σ) and mean range (Rbar) are both measures of the variation (or spread) of a data set.
standard deviation (σ) – population – to be estimated
mean range (Rbar) – sample (subgroup) – may be calculated from (sample) data
There is a ‘mathematical’ relationship between Rbar (the mean range for data from ‘samples’ – size n) and σ (the standard deviation of the ’population’).
this relationship depends only on the sample size, n
The mean range (Rbar) is d2σ where the value of d2 (which is a ‘constant’) is also a function of n.
an ‘estimator’ of σ = Rbar / d2
Also, the population standard deviation (σ) and the standard deviation of range values (σR) are again both measures of the variation (or spread) of a data set.
There is ‘another mathematical’ relationship between σR (the range values StDev for data from ‘samples’ – size n) and σ (the standard deviation of the ‘population’).
this relationship again depends only on the sample size, n
The StDev of range values σR is d3σ where the value of d3 (which is another ‘constant’) is also a function of n, therefore, σR = d3 x σ
an ‘estimator’ of σ = σR / d3
Okay – two equations defining σ but we need to determine both d2 and d3
In demonstrating the CLT we used Minitab® to create 1,000,000 sets of sample values for n = 5 and their average (Xbar).
Now let’s calculate the range of each of the sets of sample values (i.e. xmax - xmin) and store in a further column labeled Range.
Mean (of subgroup ranges) ≈ 2.326 – since σ = 1, Rbar = d2σ = d2, so this is the constant d2 (n = 5)
StDev (of subgroup ranges) ≈ 0.864 – likewise σR = d3σ = d3, so this is the constant d3 (n = 5)
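The same range calculation can be tried without Minitab®. This is a rough sketch under stated assumptions: Python's standard library, a population with σ = 1 (so the mean and StDev of the ranges estimate d2 and d3 directly), only 100,000 subgroups, and an arbitrary seed. The function name simulate_ranges is hypothetical.

```python
import random
import statistics

def simulate_ranges(k, n, sigma=1.0, seed=7):
    """Return the ranges (xMax - xMin) of k subgroups of n normal(0, sigma) values."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    ranges = []
    for _ in range(k):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return ranges

# Assumption: k = 100,000 rather than the 1,000,000 used in the article.
ranges = simulate_ranges(k=100_000, n=5)

d2_estimate = statistics.fmean(ranges)    # mean of ranges, estimates d2 when sigma = 1
d3_estimate = statistics.pstdev(ranges)   # StDev of ranges, estimates d3 when sigma = 1

print(f"d2 estimate = {d2_estimate:.3f} (table: 2.326)")
print(f"d3 estimate = {d3_estimate:.3f} (table: 0.864)")
```

Changing n in the call gives estimates of d2 and d3 for other subgroup sizes, which can then be compared against a published table of constants.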
We now have everything needed to manually create an Xbar-R Chart from scratch (for sample size n = 5) .. .. ..
Calculating the σXbar statistic from Rbar data
Estimate of the ‘population’ standard deviation from the mean range –
σ = Rbar / d2 .. .. .. (1)
Remember: we do not know σ (the StDev of the population), so we estimate it from the subgroup ranges
Estimate of the standard deviation (StDev) of Xbar using the CLT –
σXbar = σ / √n .. .. .. (2)
Reminder: For a large enough sample size, the CLT applies regardless of the shape of the population’s distribution (provided its variance is finite)
Substituting σ from (1) into (2) gives –
σXbar = (Rbar / d2) / √n .. .. .. (3)
Simplifying, we have –
σXbar = Rbar / (d2 x √n) .. .. .. (4)
Defining the Xbar UCL, Centre Line & LCL
Reminder: Xbarbar is the average across all k subgroup averages (Xbar) and represents the process centre.
UCL~LCL = Xbarbar ± 3 σXbar
… substitute σXbar = Rbar / (d2 x √n) from (4)
UCL = Xbarbar + 3 x [Rbar / (d2 x √n)]
Centre Line = Xbarbar
LCL = Xbarbar – 3 x [Rbar / (d2 x √n)]
Tables of Constants define the ‘factor’ A2 where A2 = 3 ÷ [(d2 x √n)]
UCL = Xbarbar + A2 Rbar
Centre Line = Xbarbar
LCL = Xbarbar – A2 Rbar
From our simulation –
d2 = 2.326 (mean of ranges)
d3 = 0.864 (StDev of ranges)
d3/d2 = 0.3714531384350817
A2 = 3 ÷ (d2x√n) = 3 ÷ (2.326 x 2.236) = 0.5768 ≈ 0.577
From Table of Constants ~ A2 = 0.577 for n = 5
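As an illustration of the Xbar limits derived above, here is a minimal sketch that computes A2 from d2 and n and then the UCL, Centre Line and LCL. The subgroup data are hypothetical, the function names are my own, and equal subgroup sizes are assumed.

```python
import math

D2_N5 = 2.326  # d2 for n = 5, as quoted in the text

def a2_factor(d2, n):
    """A2 = 3 / (d2 x sqrt(n))"""
    return 3.0 / (d2 * math.sqrt(n))

def xbar_limits(subgroups, d2):
    """Return (LCL, centre line, UCL) for an Xbar chart; assumes equal subgroup sizes."""
    n = len(subgroups[0])
    xbars = [sum(s) / n for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    xbarbar = sum(xbars) / len(xbars)   # process centre
    rbar = sum(ranges) / len(ranges)    # average range
    a2 = a2_factor(d2, n)
    return xbarbar - a2 * rbar, xbarbar, xbarbar + a2 * rbar

# Hypothetical measurements: k = 3 subgroups of size n = 5.
subgroups = [
    [10.1, 9.8, 10.0, 10.3, 9.9],
    [10.2, 10.0, 9.7, 10.1, 10.0],
    [9.9, 10.4, 10.0, 9.8, 10.2],
]

lcl, centre, ucl = xbar_limits(subgroups, D2_N5)
print(f"A2 = {a2_factor(D2_N5, 5):.3f}")
print(f"LCL = {lcl:.3f}, Centre Line = {centre:.3f}, UCL = {ucl:.3f}")
```

In practice far more than three subgroups would be used to set limits; three are shown only to keep the arithmetic easy to follow.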
Defining the Rbar UCL, Centre Line & LCL
The standard deviation of the ‘range’ is –
σR = d3 x σ .. .. .. (5)
As the population standard deviation (σ) is unknown, we may estimate it using –
σ = Rbar / d2 .. .. .. (1) substitute for σ in (5)
σR = d3 x Rbar / d2 .. .. .. (6)
UCL – Upper Control Limit
Centre Line
LCL – Lower Control Limit
UCL = Rbar + 3 σR = Rbar + 3 d3 x Rbar / d2
Centre Line = Rbar
LCL = Rbar – 3 σR = Rbar – 3 d3 x Rbar / d2
Factoring out Rbar, we have –
UCL = Rbar (1 + 3 d3 / d2)
Centre Line = Rbar
LCL = Rbar (1 – 3 d3 / d2)
Tables define the ‘factors’ D4 and D3 to be –
D4 = (1 + 3 d3 / d2)
D3 = (1 – 3 d3 / d2), set to 0 when this is negative (as it is for n = 5), since a range cannot fall below zero
UCL = Rbar D4
Centre Line = Rbar
LCL = Rbar D3
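The R chart factors can be checked with a short sketch. Two things here are my assumptions rather than part of the derivation above: the function names, and the clamping of D3 to zero when 1 – 3 d3 / d2 is negative, which is how published tables of constants treat small subgroup sizes. The Rbar value is hypothetical.

```python
def r_chart_factors(d2, d3):
    """Return (D3, D4) computed from the constants d2 and d3."""
    ratio = d3 / d2
    upper_factor = 1.0 + 3.0 * ratio
    # Assumption: a negative lower factor is set to 0, as in published tables,
    # because a range cannot be below zero.
    lower_factor = max(0.0, 1.0 - 3.0 * ratio)
    return lower_factor, upper_factor

def r_chart_limits(rbar, d2, d3):
    """Return (LCL, centre line, UCL) for an R chart."""
    lower_factor, upper_factor = r_chart_factors(d2, d3)
    return rbar * lower_factor, rbar, rbar * upper_factor

# Constants for n = 5 as quoted in the text.
D3, D4 = r_chart_factors(d2=2.326, d3=0.864)

# Hypothetical average range, for illustration only.
lcl, centre, ucl = r_chart_limits(rbar=0.5, d2=2.326, d3=0.864)

print(f"D3 = {D3:.3f}, D4 = {D4:.3f}")
print(f"LCL = {lcl:.3f}, Centre Line = {centre:.3f}, UCL = {ucl:.3f}")
```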
Table of constants for Xbar and R control charts
From our simulation –
d2 = 2.326 (mean of ranges)
d3 = 0.864 (StDev of ranges)
d3/d2 = 0.3714531384350817
From Table of Constants for n = 5 –
d2 = 2.326
d3 = 0.864
d3/d2 = 0.3714531384350817
Conclusion & Notes
You should now understand where the constants d2 and A2 come from, and be able to deploy Xbar & R Charts with confidence.
Xbar Chart
Indicates how the average or mean changes over time. It is used to monitor the process mean when subgroups are sampled from a process at regular intervals.
The Xbar Chart is typically combined with an R Chart to monitor process variables. If the process variation is not in control, the Xbar control limits (which are calculated from Rbar) may be unreliable, which means that causes of variation affecting the process mean can’t be pinpointed.
Each point on the chart represents a subgroup mean (Xbar) value. The centre line is the process mean (Xbarbar); if this isn’t specified, it’s the weighted mean of the subgroup means.
R Chart
Indicates how the range of the subgroups changes over time. It is used to monitor process variability when subgroups of fewer than ten measurements are sampled from a process at regular intervals. Each point on the chart represents the subgroup range (R) value (xMax – xMin).
The centre line for each subgroup is the expected value of the range statistic, so the centre line differs when subgroup sizes are not equal.
Important notes
A process that is “in statistical control” is stable and predictable
Just because a process is stable does not mean it is a zero-defect process
Process capability (Cpk >1.34) must first be established within 3-sigma limits as a minimum!
Remember to NEVER put specifications on any kind of control chart (e.g. Xbar & R chart)
The points on the chart are comprised of averages, not individuals. Specification limits are based on individuals, not averages
The operator might have the tendency to not react to a point that is out of control when the point is within the specification limits
Xbar & R charts help to avoid unnecessary adjustments in the process
It must be remembered that process control charts (e.g. Xbar & R) do not consider the actual component nominal and tolerances – they only monitor continuing ‘statistical control’ of the process