When reading a statistical analysis report, most people have only a superficial
understanding of the p-value and significance. However, most statistical
analysis reports use these concepts to interpret the meaning of the data. If
you do not understand them, you may misread the report and make a wrong
decision. First, we define the statistical hypothesis. In order to
judge whether to accept or reject a hypothesis, the statistician divides the
statistical hypothesis into the null hypothesis, labeled H0, and the
alternative hypothesis, labeled H1. In practice, we often set the null
hypothesis H0 to the factual state that the verifier presumes to hold until the
data show otherwise, which mirrors the judicial principle of the presumption of
innocence [2]. In terms of problem orientation, there are many problems that we need to
verify, for example: countless goods transactions take place in industry every
day; the staff of a science and technology laboratory often have ideas whose
benefits they cannot yet confirm; a production process must be monitored every
day to check whether its quality is stable; and producers must guarantee
quality before users use their products. If these problems could be judged by
intuition, we probably would not use statistical data to judge them; the null
hypothesis H0 is the truth that we cannot judge by intuition. Therefore, we set
the null hypothesis for the goods transaction to "the transaction lot is a good
lot", the null hypothesis for the laboratory staff's idea to "its benefit is
the same as that of the control group", the null hypothesis for monitoring
process quality to "the process quality is stable", and the null hypothesis for
guaranteeing product quality to "the quality meets the standard". However, if
these hypotheses describing the truth are not quantified in statistical terms,
we still cannot compare them with the data. The null hypothesis should
therefore be expressed in a statistical language that is consistent with the
probability distribution calculation, so that the statistical hypothesis test
can be carried out. For example, the facts described above can be expressed in
statistical language. The alternative hypothesis is the statistical hypothesis
that is contrary to the null hypothesis (Table 3).
In practice, a test criterion (a test statistic computed from the data) is used
to judge statistical hypotheses; whether H0 is rejected is determined by the
degree of difference between H0 and the test criterion. The larger the
difference, the stronger the case for rejecting H0. To measure this degree of
difference, we define the significance level, a threshold chosen in advance,
and the p-value, which is the probability, under H0, of obtaining a test
criterion at least as extreme as the one observed. In general, the smaller this
probability, the less we believe that H0 is true. This is the same way that we
humans think about the truth. When we observe events, we form a subjective
expectation in our mind, and this expectation is the null hypothesis H0. After
data collection and analysis, if the results are very different from our
subjective expectation, we often say that they are too unreasonable! This
method of judging the truth based
on factual data is statistical hypothesis testing. Part of the research work of
statisticians is to find the probability distribution of the test criterion and
its inferences in various fields, and to derive the sampling distribution in
order to calculate its accuracy, precision and p-value. In general, the smaller
the p-value, the more significant the difference; statisticians recommend
rejecting H0 when the p-value is below 0.05 or 0.01, and most industry
standards recommend the same. The implication is that if the same random
sampling were repeated 100 times under the null hypothesis H0, a test criterion
this extreme would occur only about 5 times or 1 time. Since we have only one
such random sample, and it is a rare one under the null hypothesis H0, H0
should be rejected. The following example gives some rules for testing for
special causes on a control chart, assuming that the distribution of the
process characteristic is normal and the process is under control.
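Before turning to the control chart rules, the "5 times in 100" reading of the p-value can be checked with a short simulation. The sketch below is only an illustration under stated assumptions: the data are drawn from a standard normal distribution so that H0 (mean 0, known standard deviation 1) is true, the test is a two-sided z-test, and the function names, sample size and number of trials are all hypothetical choices, not taken from the text.

```python
import math
import random


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(trials: int, n: int, alpha: float, seed: int = 0) -> float:
    """Fraction of samples, all drawn under H0, whose p-value falls below alpha."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z statistic for H0: mean = 0 with known sigma = 1
        z = (sum(sample) / n) * math.sqrt(n)
        if two_sided_p_value(z) < alpha:
            rejections += 1
    return rejections / trials


rate_05 = rejection_rate(trials=20000, n=10, alpha=0.05)
rate_01 = rejection_rate(trials=20000, n=10, alpha=0.01)
print(f"share of samples with p < 0.05 under H0: {rate_05:.3f}")
print(f"share of samples with p < 0.01 under H0: {rate_01:.3f}")
```

When H0 is true, the two printed shares should come out close to 0.05 and 0.01, which is why a single sample with such a small p-value is treated as evidence against H0.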
Table 3: Null hypothesis.
| H0: Factual state | H0: Statistical language |
|---|---|
| H0: The transaction lot is a good lot | H0: The percent defective of the transaction lot p' ≤ AQL |
| H0: The benefits of the experimental group and the control group are the same | H0: The means of the experimental group and the control group are equal, μ1 = μ2 |
| H0: The process is stable | H0: The process mean μ and standard deviation σ meet requirements |
| H0: The quality of the product meets the requirement | H0: Product MTBF ≥ 10,000 hours |
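As an illustration of how one row of Table 3 becomes a test, take the first row, H0: p' ≤ AQL. The sketch below computes the p-value as the binomial probability of finding at least the observed number of defectives when the lot is exactly at the AQL, which is the boundary case of H0. The AQL, sample size, defect count and significance level are hypothetical numbers chosen for the example, and `binomial_upper_tail` is an illustrative helper name.

```python
import math


def binomial_upper_tail(n: int, d: int, p: float) -> float:
    """P(X >= d) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(d, n + 1)
    )


# Hypothetical inspection figures (assumptions, not from the text).
AQL = 0.01            # acceptable quality level: 1% defective
SAMPLE_SIZE = 50      # items inspected from the lot
DEFECTIVES_FOUND = 3  # defectives observed in the sample
ALPHA = 0.05          # significance level

# p-value: chance of seeing this many defectives or more if p' equals the AQL.
p_value = binomial_upper_tail(SAMPLE_SIZE, DEFECTIVES_FOUND, AQL)
reject_h0 = p_value < ALPHA
print(f"p-value = {p_value:.4f}, reject H0 (good lot): {reject_h0}")
```

With these assumed figures the p-value falls below 0.05, so the sample would count as evidence against the "good lot" hypothesis; a different sampling plan would give a different answer.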
The producer's risk is α = P[the observed pattern of points on the control
chart | the process is under control]. If the probability of the pattern
occurring is smaller than 0.5%, we can attribute it to special causes. Take the
X-bar chart as an example. The region between the center line and the upper
control limit, and likewise the region between the center line and the lower
control limit, is divided equally into three zones, A, B and C: zone A lies
between 2σ and 3σ from the center line, zone B between 1σ and 2σ, and zone C
between 0 and 1σ. Under the assumption of a normal distribution, the
probability of a point falling in each zone can be calculated. For example, on
a given side of the center line, the probability that a point falls above (or
below) the center line, that is, in or beyond zone C, is 0.5; the probability
that it falls beyond 1σ (in or beyond zone B) is 0.1587; beyond 2σ (in or
beyond zone A), 0.0228; and beyond 3σ (outside the control limit), 0.00135, as
shown in Figure 7. We can therefore use the principle of statistical hypothesis
testing to detect the existence of special causes in the process.
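The one-sided probabilities quoted above follow directly from the standard normal distribution. The sketch below reproduces them with the complementary error function from Python's standard library and adds a small zone classifier; the function names `upper_tail` and `zone`, and the example chart values, are illustrative assumptions rather than anything defined in the text.

```python
import math


def upper_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z: the chance of a point beyond k sigma on one side."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def zone(x: float, center: float, sigma: float) -> str:
    """Classify a plotted point as zone C, B or A, or as beyond the control limit."""
    distance = abs(x - center) / sigma  # distance from the center line in sigma units
    if distance < 1:
        return "C"
    if distance < 2:
        return "B"
    if distance < 3:
        return "A"
    return "beyond"


for k in (0, 1, 2, 3):
    print(f"P(point beyond {k} sigma, one side) = {upper_tail(k):.5f}")

# Probability of landing in zone C on either side of the center line.
within_c = 1 - 2 * upper_tail(1)
print(f"P(point within zone C, either side) = {within_c:.4f}")

# Hypothetical chart with center line 10 and sigma 1.
print(zone(12.4, center=10.0, sigma=1.0))
```

The probability of a point landing strictly inside a single zone on one side is the difference of two neighbouring tail values, for example `upper_tail(2) - upper_tail(3)` for zone A.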
The following are the 8 rules for testing special causes:
· 1 point beyond zone A (outside the 3σ control limits), p-value = 0.00135 × 2 = 0.0027;
· 9 points in a row in or beyond zone C (same side of the center line), p-value = 2 × (0.5)^9 = 0.0039;
· 6 points in a row, all increasing or all decreasing, p-value = 2 × (1/6!) = 0.0028;
· 14 points in a row, alternating up and down, p-value = 2 × (0.5)^13 = 0.00024;
· 2 out of 3 points in or beyond zone A (same side), p-value = 2 × (C(3,2) × (0.0228)^2 × (0.9772) + C(3,3) × (0.0228)^3) = 0.0031;
· 4 out of 5 points in or beyond zone B (same side), p-value = 2 × (C(5,4) × (0.1587)^4 × (0.8413) + C(5,5) × (0.1587)^5) = 0.0055;
· 15 points in a row within zone C (either side), p-value = (0.6826)^15 = 0.0033;
· 8 points in a row outside zone C (either side), p-value = (0.3174)^8 = 0.0001.
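The arithmetic behind the eight p-values can be reproduced in a few lines. The sketch below simply evaluates the formulas as listed, using the rounded one-sided probabilities from the text and, like those formulas, treating successive points as independent; the constant names and the helper `at_least` are illustrative assumptions.

```python
import math

# One-sided tail probabilities as quoted in the text (rounded values).
P_BEYOND_1S = 0.1587   # in or beyond zone B
P_BEYOND_2S = 0.0228   # in or beyond zone A
P_BEYOND_3S = 0.00135  # outside the control limit
P_WITHIN_C = 1 - 2 * P_BEYOND_1S  # within zone C on either side, 0.6826


def at_least(k: int, n: int, p: float) -> float:
    """P(at least k of n independent points each fall in a region of probability p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Keys are the rule numbers in the order listed above.
rule_p_values = {
    1: 2 * P_BEYOND_3S,                  # 1 point beyond zone A
    2: 2 * 0.5**9,                       # 9 in a row on the same side
    3: 2 / math.factorial(6),            # 6 in a row, all rising or all falling
    4: 2 * 0.5**13,                      # 14 in a row, alternating
    5: 2 * at_least(2, 3, P_BEYOND_2S),  # 2 of 3 in or beyond zone A
    6: 2 * at_least(4, 5, P_BEYOND_1S),  # 4 of 5 in or beyond zone B
    7: P_WITHIN_C**15,                   # 15 in a row within zone C
    8: (1 - P_WITHIN_C) ** 8,            # 8 in a row outside zone C
}

for rule, p in rule_p_values.items():
    print(f"rule {rule}: p-value = {p:.5f}")
```

The leading factor of 2 in rules 1 to 6 accounts for the pattern appearing on either side of the center line, or in either direction for the trend and alternation rules.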
Figure 7: Zone chart.