A new distribution-free Phase-I procedure for bi-aspect monitoring based on the multi-sample Cucconi statistic

https://doi.org/10.1016/j.cie.2020.106760

Highlights

  • This paper introduces a new distribution-free Phase-I scheme for location and scale.

  • This is a one-chart scheme based on the multi-sample Cucconi statistic.

  • This scheme is capable of accommodating complex situations of unequal subgroup sizes.

  • A variety of patterned shifts are considered in the simulation study.

  • This scheme compares favourably against its competitors in terms of various metrics.

Abstract

Phase-I analysis is necessary for verifying process stability based on the collected data and for establishing a reference sample for Phase-II monitoring. A large number of schemes designed for this purpose assume normality. In various practical applications, however, process distributions do not always meet the assumption of normality. Therefore, distribution-free schemes are appealing in practice for Phase-I analysis. Only a few distribution-free procedures, to our knowledge, can determine the stability of process location and scale parameters simultaneously via Phase-I analysis. The present research focuses on a new distribution-free Phase-I procedure for dual monitoring of two aspects of a process, namely, location and scale, using a single chart. To this end, we propose a distribution-free Phase-I scheme for bi-aspect monitoring using the multi-sample Cucconi statistic. In-control and out-of-control properties of the new Phase-I Cucconi scheme are studied. The proposed scheme appears to be as good as or better than its competitors in terms of various performance metrics in a large number of situations. We demonstrate the design and implementation of the proposed scheme with a practical example.

Introduction

Statistical process monitoring (SPM) schemes are widely used in monitoring the stability of a process in diverse fields of application, from monitoring product or service quality to environmental pollution or network traffic. In general, process monitoring comprises two phases: Phase I and Phase II. During Phase I, practitioners sequentially collect and analyze a set of sample observations to assess the process characteristics. The Phase-I analysis, also known as the retrospective analysis, aims at understanding the sources of process variability and evaluating process stability. See Jones-Farmer, Woodall, Steiner, and Champ (2014) for more details. Phase-I analysis also helps in selecting a suitable in-control (IC) reference (training) sample or in estimating the right model for the underlying process distribution. Further, during Phase-I analysis, practitioners attempt to identify abnormal observations that result from assignable causes and discard them. Often, by repeating Phase-I analysis and removing extreme values, we may determine a reference sample for benchmarking that represents the IC process characteristics and use it for subsequent Phase-II analysis. Phase-II monitoring then aims at observing incoming data streams and monitoring process stability by comparing each newly collected sample with the benchmarked reference sample.

Woodall (2000) emphasized that Phase-I analysis and follow-up measures are often more critical than Phase-II monitoring and noted that this area of research is grossly neglected. He also pointed out that the Phase-I applications are imperative “in practical considerations of quality characteristic selection, measurement and sampling issues, and rational subgrouping”. Jones-Farmer et al. (2014) recommended using standard Phase-II SPM schemes to analyze Phase-I data retrospectively for determining the reference sample. They also pointed out that one should take great care to reduce and control the false alarm probability (FAP). Recently, Woodall (2017), Testik et al. (2018), and Li et al. (2019), among others, highlighted the importance of Phase-I analysis because a wrong benchmarking of the reference sample can lead to poor Phase-II performance of the charting scheme. In other words, if the Phase-I sample does not reflect the characteristics of the process, we are likely to miss a signal due to an assignable cause during Phase-II monitoring, or we may receive too many false alarms. The literature on Phase-II SPM schemes is extensive, whereas that on Phase-I analysis is relatively limited. A large proportion of Phase-I control schemes are parametric and assume a particular distribution function for the underlying processes. A widespread assumption is that the process characteristic is normally distributed, as in Champ and Jones (2004). However, before a Phase-I analysis there is often little or no information about the process distribution, or about whether the assumption of normality is valid. Woodall (2000) and Capizzi (2014), among others, emphasized that making a distributional assumption is not justifiable before the establishment of process stability via Phase-I analysis.
Distribution-free schemes are inherently IC robust irrespective of the functional form and underlying characteristics of the process distribution, and consequently, distribution-free procedures are more appealing than parametric ones and are a natural option for Phase-I analysis. Recent years have witnessed the development of some nonparametric schemes useful in Phase-I analysis. Jones-Farmer et al. (2014) provided an excellent synopsis of Phase-I literature until about 2014. See Abbasi et al. (2015), Cheng and Shiau (2015), Coelho et al. (2015), Ning et al. (2015), Capizzi and Masarotto (2017), and Li et al. (2019) for recent research on distribution-free Phase-I monitoring. For some other nonparametric schemes, readers may see, among others, Abbasi et al. (2017), Abid et al. (2017), Abid et al. (2018), and Riaz et al. (2019).

In Phase I, many practitioners wish to jointly assess and monitor two aspects of the process, the location and the scale, assuming that the distribution of the underlying process characteristic is normal. The joint $\bar{X}$ and $S$ scheme is a simple scheme for bi-aspect monitoring under the normality assumption. Among the early works on distribution-free approaches for Phase-I analysis based on two charts is Jones-Farmer and Champ (2010), who proposed a bi-aspect monitoring scheme known as the RANK scheme. Their scheme simultaneously operates the mean-rank chart of Jones-Farmer, Jordan, and Champ (2009) for the location parameter and the scale-rank chart. Another significant contribution to joint Phase-I monitoring is by Capizzi and Masarotto (2013). They designed the RS/P scheme for Phase-I monitoring and developed the contributed “rsp” R package for practical execution. The RS/P scheme is also a two-chart scheme, but it relies on recursive segmentation and permutation instead of ordinary ranking. The RS/P scheme is particularly useful if there is a sustained shift. However, we noticed that the RS/P scheme often fails to identify the actual problematic sample in the case of an isolated shift and that its behaviour is also markedly affected by the position of the change point, that is, by the inertia effect. Some authors, like Gan (1997), criticized the use of schemes based on two isolated charts in the context of bi-aspect monitoring because such schemes overlook the impact of a change in one aspect on the other. The RS/P scheme, being a two-chart scheme, suffers from similar drawbacks. Moreover, Li et al. (2019) emphasized that plotting a single statistic that reflects the influence of both location and scale aspects is more appropriate than two-chart schemes.

Traditionally, most of the conjoint test statistics for equality of location and scale parameters of two samples are of a quadratic form involving two orthogonal rank statistics, one for the location aspect and the other for the scale aspect. For example, the familiar Lepage statistic is the squared Euclidean distance from the origin of the point whose coordinates are the standardized Wilcoxon rank-sum (WRS) statistic and the standardized Ansari-Bradley (AB) statistic. Mukherjee and Chakraborti (2012), Chowdhury et al. (2015), Mukherjee and Marozzi (2017a, 2017b), Chong et al. (2017, 2018), Mukherjee and Sen (2018), and Song et al. (2019), among others, developed Phase-II SPM schemes using the Lepage and Lepage-type statistics for bi-aspect monitoring. Li et al. (2019) first proposed a single charting scheme for bi-aspect Phase-I analysis based on a multi-sample version of the Lepage statistic from Rublík (2005).
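
To make the quadratic form concrete, the two-sample Lepage statistic can be written as below. The symbols $W$ (Wilcoxon rank-sum statistic), $AB$ (Ansari-Bradley statistic), and $L$ are notation assumed here for illustration; they are not taken from the text above.

```latex
\[
  L \;=\;
  \left( \frac{W - E(W \mid IC)}{\sqrt{\mathrm{Var}(W \mid IC)}} \right)^{2}
  +
  \left( \frac{AB - E(AB \mid IC)}{\sqrt{\mathrm{Var}(AB \mid IC)}} \right)^{2}
\]
```

The first term responds mainly to location differences and the second mainly to scale differences, so a large $L$ signals a change in either aspect.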

One-chart schemes for Phase-I analysis, such as the Phase-I Lepage chart, have some benefits over two-chart schemes. The first advantage of one-chart schemes is in the determination of control limits. Only one set of control limits is required instead of two sets of control limits for two charts. If the FAP of each individual chart is set to $\alpha$ for a two-chart scheme, the overall FAP exceeds $\alpha$. In the case of independent location and scale statistics, the overall FAP is $2\alpha - \alpha^2$. If they are not independent, things become more complex. Therefore, one-chart schemes are often easier to design and use. Secondly, the inherent assumption of charts based on location statistics, such as the Kruskal-Wallis (KW) statistic, is the stability of the scale. When the scale parameter varies, the use of such statistics is not recommended; it is statistically improper, although many practitioners do not realize this. Similarly, the inherent assumption of charts based on scale statistics is the stability of the location, and when the location varies, the use of such statistics is not advised. This fact is well established in the literature on simultaneous inference.
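
The overall FAP under independence follows from the complement rule. As a brief worked illustration, the value $\alpha = 0.05$ below is chosen only as an example and does not come from the paper:

```latex
\[
  \mathrm{FAP}_{\text{overall}}
  \;=\; 1 - (1-\alpha)^{2}
  \;=\; 2\alpha - \alpha^{2},
  \qquad
  \alpha = 0.05 \;\Longrightarrow\; \mathrm{FAP}_{\text{overall}} = 0.0975 .
\]
```

To hold the overall FAP at $\alpha$ with two independent charts, each chart would instead need an individual FAP of $1 - \sqrt{1-\alpha}$, which is about $0.0253$ for $\alpha = 0.05$.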

In recent years, several authors, for example, Marozzi (2013), showed that the Cucconi (1968) statistic is often as good as or better than the Lepage statistic in capturing process shifts in bi-aspect monitoring. The Cucconi statistic has a fascinating history. Cucconi (1968) designed it using ranks and anti-ranks and not as a quadratic combination of two statistics. The original paper appeared in an Italian journal and in the Italian language, so the international scientific community was largely unaware of it. Marozzi (2009, 2013) popularised the two-sample Cucconi statistic outside Italy, and many researchers have since used it in developing SPM schemes. Chowdhury et al. (2014), Mukherjee and Marozzi (2017b), and Mahmood et al. (2017) considered Phase-II joint monitoring schemes based on the two-sample Cucconi statistic. Song, Mukherjee, Marozzi, and Zhang (2020) showed that the Cucconi statistic is also a quadratic form of a location and a scale statistic and could be an elegant choice for Phase-II monitoring when a training sample is available from the IC population. Researchers in various works established that the Phase-II SPM schemes based on the Cucconi statistic compete well with the SPM procedures involving the Lepage or Lepage-type statistics. The latest discussion to this end appeared in Xiang, Gao, Li, Pu, and Dou (2019) and some of the references therein. However, to date, no Phase-I SPM scheme has used the Cucconi statistic for bi-aspect process monitoring.

In this paper, we present a novel distribution-free SPM scheme for Phase-I monitoring and assessment considering the multi-sample Cucconi statistic as the pivot. The multi-sample Cucconi statistic, introduced by Marozzi (2014), is an extension of the two-sample Cucconi statistic. Our proposed procedure is a single-chart scheme and is suitable for simultaneously detecting subgroup location and scale shifts. The new scheme is not affected by the shortcomings of two-chart schemes, and it is a competitive alternative to the Phase-I Lepage scheme introduced by Li et al. (2019) in different situations. In Section 2, we revisit the multi-sample extension of the Lepage statistic and the corresponding Phase-I Lepage chart as in Li et al. (2019). Section 3 offers a brief overview of the multi-sample Cucconi statistic. We explain the design and implementation algorithm of the proposed Phase-I Cucconi scheme in Section 4 and discuss the determination of its control limits. In Section 5, we study and compare the IC and out-of-control (OOC) performance of the Phase-I Cucconi and the Lepage schemes using Monte-Carlo simulations. We discuss a real example and some practical advantages of the proposed scheme in Section 6. Section 7 summarises the results and suggests some directions for future research.

Section snippets

Review of the multi-sample Lepage statistic and the corresponding Phase-I scheme

Rublík (2005) extended the traditional two-sample Lepage test to the multi-sample situation by combining the KW statistic for subgroup location with the multi-sample version of the AB statistic for scale. Li et al. (2019) utilized this statistic to develop the Phase-I Lepage scheme. We now review the multi-sample Lepage statistic.

Let $X_{i1}, \ldots, X_{in_i}$, $i = 1, 2, \ldots, k$, denote a random sample of size $n_i$ from the $i$th subgroup. Here, the $X_{i\nu}$’s are univariate and continuous having cumulative distribution function

The multi-sample Cucconi statistic

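Observe that
$$E\left[\sum_{j=1}^{n_i} R_{ij}^2 \,\middle|\, IC\right] = E\left[\sum_{j=1}^{n_i} (N+1-R_{ij})^2 \,\middle|\, IC\right] = \frac{n_i (N+1)(2N+1)}{6} = \mu_R \ \text{(say)},$$
$$\mathrm{Var}\left[\sum_{j=1}^{n_i} R_{ij}^2 \,\middle|\, IC\right] = \mathrm{Var}\left[\sum_{j=1}^{n_i} (N+1-R_{ij})^2 \,\middle|\, IC\right] = \frac{n_i (N-n_i)(N+1)(2N+1)(8N+11)}{180} = \sigma_R^2 \ \text{(say)}.$$

See Marozzi (2014) for proofs. Define, for $i = 1, 2, \ldots, k$,
$$U_i = \frac{\sum_{j=1}^{n_i} R_{ij}^2 - \mu_R}{\sigma_R}, \qquad V_i = \frac{\sum_{j=1}^{n_i} (N+1-R_{ij})^2 - \mu_R}{\sigma_R},$$
and note that
$$\rho = \mathrm{Cor}\left(U_i, V_i \mid IC\right) = -\frac{30N + 14N^2 + 19}{(8N+11)(2N+1)}.$$

Marozzi (2014) introduced a multi-sample version of the Cucconi statistic as
$$MC = \frac{1}{k} \sum_{i=1}^{k} \frac{U_i^2 + V_i^2 - 2\rho U_i V_i}{2(1-\rho^2)} = \sum_{i=1}^{k} C_i.$$

Considering $CL_i = \frac{1}{2k} U_i^2$ and $CS_i = \frac{(V_i - \rho U_i)^2}{2k(1-\rho^2)}$, we may write
$$C_i = \frac{1}{k} \cdot \frac{U_i^2 + V_i^2 - 2\rho U_i V_i}{2(1-\rho^2)} = CL_i + CS_i.$$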
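
The decomposition can be verified by completing the square in $V_i$. The short derivation below uses only the symbols defined above and is offered as a check of the algebra, not as a quotation from the paper:

```latex
\begin{align*}
  U_i^2 + V_i^2 - 2\rho U_i V_i
    &= (1-\rho^2)\,U_i^2 + \left(V_i^2 - 2\rho U_i V_i + \rho^2 U_i^2\right) \\
    &= (1-\rho^2)\,U_i^2 + (V_i - \rho U_i)^2 , \\[4pt]
  C_i = \frac{U_i^2 + V_i^2 - 2\rho U_i V_i}{2k(1-\rho^2)}
    &= \frac{U_i^2}{2k} + \frac{(V_i - \rho U_i)^2}{2k(1-\rho^2)}
     = CL_i + CS_i .
\end{align*}
```

Both components are non-negative because $|\rho| < 1$, so $C_i \ge 0$, and $CS_i$ depends on $V_i$ only through the part of $V_i$ that is not linearly explained by $U_i$.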

Implementation

The main idea is to use the multi-sample Cucconi statistic to develop a Phase-I scheme. The steps for implementing the Shewhart-type Phase-I Cucconi scheme to form a reference sample are outlined as follows.

  • 1. Consider a set of $k$ subgroups, $X_{i1}, \ldots, X_{in_i}$ with size $n_i$, $i = 1, 2, \ldots, k$. Note that varying subgroup size is permitted, with $n_i = n$, $i = 1, 2, \ldots, k$ as the particular case where the subgroup size is constant.

  • 2. Compute the charting statistic $C_i = CL_i + CS_i$ for $i = 1, 2, \ldots, k$, based on the set of data.

  • 3. Plot $C_i$, $i = 1, 2, \ldots, k$
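
The steps listed so far can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `cucconi_chart_stats`, `simulated_limit`, and `phase1_cucconi` are hypothetical. The pooled ranks $R_{ij}$ are taken to be ranks in the combined sample of size $N$, and $\mu_R$ and $\sigma_R$ are computed per subgroup from $n_i$. The control limit is an assumption: it is estimated here by Monte Carlo as the $(1-\text{FAP})$ quantile of $\max_i C_i$ over random rank permutations, which relies on the distribution-free property under IC; the excerpt does not show how the paper actually sets its limit.

```python
import numpy as np
from scipy.stats import rankdata


def cucconi_chart_stats(ranks, sizes):
    """Return C_i = CL_i + CS_i for each subgroup, given the pooled ranks."""
    sizes = np.asarray(sizes, dtype=float)
    k, N = len(sizes), sizes.sum()
    rho = -(30 * N + 14 * N**2 + 19) / ((8 * N + 11) * (2 * N + 1))
    mu = sizes * (N + 1) * (2 * N + 1) / 6
    sigma = np.sqrt(
        sizes * (N - sizes) * (N + 1) * (2 * N + 1) * (8 * N + 11) / 180
    )
    # Split the pooled rank vector back into its k subgroups.
    cuts = np.cumsum(sizes).astype(int)[:-1]
    parts = np.split(np.asarray(ranks, dtype=float), cuts)
    u = (np.array([np.sum(p**2) for p in parts]) - mu) / sigma
    v = (np.array([np.sum((N + 1 - p) ** 2) for p in parts]) - mu) / sigma
    cl = u**2 / (2 * k)                               # CL_i
    cs = (v - rho * u) ** 2 / (2 * k * (1 - rho**2))  # CS_i
    return cl + cs


def simulated_limit(sizes, fap=0.05, n_sim=2000, seed=0):
    """Assumed limit: (1 - fap) quantile of max_i C_i under random ranks."""
    rng = np.random.default_rng(seed)
    n_total = int(np.sum(sizes))
    maxima = [
        cucconi_chart_stats(rng.permutation(n_total) + 1, sizes).max()
        for _ in range(n_sim)
    ]
    return float(np.quantile(maxima, 1 - fap))


def phase1_cucconi(subgroups, fap=0.05, n_sim=2000, seed=0):
    """Steps 1-3: rank the pooled data, compute C_i, compare with the limit."""
    sizes = [len(g) for g in subgroups]          # unequal sizes are allowed
    ranks = rankdata(np.concatenate(subgroups))  # continuous data, so no ties expected
    c = cucconi_chart_stats(ranks, sizes)
    limit = simulated_limit(sizes, fap, n_sim, seed)
    return c, limit, np.flatnonzero(c > limit)   # indices of flagged subgroups


# Illustrative data only: four simulated subgroups of unequal size.
rng = np.random.default_rng(1)
data = [rng.normal(size=n) for n in (5, 6, 5, 7)]
c, limit, flagged = phase1_cucconi(data, n_sim=500)
print(c, limit, flagged)
```

The returned values of `c` are what would be plotted against the subgroup index, with `limit` drawn as a single upper control line. Because the limit depends only on the subgroup sizes, it can be simulated once and reused for any dataset with the same sizes.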

Comparison study

In this section, we compare the proposed Phase-I Cucconi scheme with the other one-chart Phase-I scheme, i.e. the Phase-I Lepage scheme of Li et al. (2019). For brevity, we do not consider competing schemes other than the Phase-I Lepage scheme because Li et al. (2019) already compared it with some existing two-chart procedures for joint Phase-I analysis, including the RANK scheme as well as the RS/P scheme, and established that the Phase-I Lepage scheme is superior in many situations.

We design

Practical implementation

In this section, we illustrate a practical application of the proposed Phase-I Cucconi scheme. We consider a real dataset on the outer diameters of guide bush, given in Table 9 in Song et al. (2020). Song et al. (2020) indicated that guide bush is one of the critical components in a fuze 117 MK 20, and variations in the dimensional nature of guide bush (e.g. increment or reduction in the outer diameter) may lead to abnormal functioning. The target diameter for the guide bush is 27.03 mm, and

Concluding remarks

In this paper, we have offered an attractive distribution-free one-chart scheme for simultaneous assessment and control of location and scale parameters in Phase I. The proposed scheme uses the multi-sample version of the Cucconi statistic as the pivot. Its IC robustness is a significant advantage because, in many fields, observed data are not normal. The proposed scheme could be a natural choice when the assumption of normality is difficult to justify. The Phase-I Cucconi scheme works even if

CRediT authorship contribution statement

Chenglong Li: Software, Formal analysis, Investigation, Resources, Data curation, Writing - review & editing, Visualization, Funding acquisition. Amitava Mukherjee: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Visualization, Supervision, Project administration. Marco Marozzi: Supervision, Writing - review & editing.

Acknowledgement

The work described in this paper was supported by the National Natural Science Foundation of China (No. 71801179).

References (43)

  • G. Capizzi (2014). Recent advances in process monitoring: Nonparametric and variable-selection methods for Phase-I and Phase II. Quality Engineering.
  • G. Capizzi et al. (2013). Phase-I distribution-free analysis of univariate data. Journal of Quality Technology.
  • G. Capizzi et al. (2017). Phase-I distribution-free analysis of multivariate data. Technometrics.
  • S. Chakraborti et al. (2009). Phase I statistical process control charts: An overview and some results. Quality Engineering.
  • C.W. Champ et al. (2004). Designing Phase I X-bar charts with small sample sizes. Quality and Reliability Engineering International.
  • Y. Chen et al. (2015). Cluster-based profile analysis in Phase I. Journal of Quality Technology.
  • C.-R. Cheng et al. (2015). A distribution-free multivariate control chart for Phase-I applications. Quality and Reliability Engineering International.
  • S. Chowdhury et al. (2014). A new distribution-free control chart for joint monitoring of unknown location and scale parameters of continuous distributions. Quality and Reliability Engineering International.
  • S. Chowdhury et al. (2015). Distribution-free Phase II CUSUM control chart for joint monitoring of location and scale. Quality and Reliability Engineering International.
  • M.L.I. Coelho et al. (2015). A comparison of Phase-I control charts. South African Journal of Industrial Engineering.
  • O. Cucconi (1968). Un nuovo test non parametrico per il confronto tra due gruppi campionari. Giornale degli Economisti.