## Contribution Plots

Multivariate quality control techniques use measurements of process variables, taking the correlation among these variables into account, to detect special causes affecting the process. Multivariate control charts such as SPE and $T^2$ charts indicate when the process goes out of control, but they do not identify the source of the abnormal operation. Once an out-of-control situation is signaled, engineers and plant operators must determine the actual problem. Miller et al. [388, 389] introduced the concepts of variable contributions and contribution plots to address this need. Diagnosis proceeds by determining which process variables have inflated the D-statistic (or $T^2$), the squared prediction error Q-statistic (or SPE), and the scores, and then using the knowledge of plant personnel to relate these variables to specific equipment failures and disturbances.

Contributions of process variables to the Q-statistic. The contribution to the Q-statistic can be calculated either for the whole batch or for a time period within that batch. The Q-statistic for a new batch is calculated as

$$Q_{new} = \mathbf{e}_{new}\mathbf{e}_{new}^T = \sum_{jk=1}^{JK} e_{new,jk}^2, \qquad \mathbf{e}_{new} = \mathbf{x}_{new} - \hat{\mathbf{x}}_{new} \tag{8.1}$$

where $\hat{\mathbf{x}}_{new}$ is the $(1 \times JK)$ vector of predicted values of the (centered and scaled) data $\mathbf{x}_{new}$ for the new batch and $\mathbf{e}_{new}$ is the residuals vector. An inflated Q-statistic suggests that the new observation does not follow the same covariance structure as the reference set that defines normal operation (NO). This usually happens when there is a sensor failure or a shift in the process. If the Q-statistic for a batch indicates an out-of-control situation, the process variables responsible for inflating it are diagnosed by computing their contributions to the Q-statistic as

$$C_j^Q = \sum_{k=1}^{K} e_{new,jk}^2, \qquad j = 1, \ldots, J \tag{8.2}$$

resulting in a $(1 \times J)$ vector of contributions from the $J$ variables over the entire batch. When deviations from NO are small and last for only short periods of operation, this measure may not point to the responsible variable(s) explicitly, because the contributions accumulated by the other variables over the whole batch mask them. To overcome this problem, the contribution of process variable $j$ at time period $k$ to the Q-statistic is calculated as

$$C_{jk}^Q = e_{new,jk}^2 = \left(x_{new,jk} - \hat{x}_{new,jk}\right)^2 \tag{8.3}$$

where $x_{new,jk}$ is the $jk$th element of $\mathbf{x}_{new}$ $(1 \times JK)$, $\hat{x}_{new,jk}$ is its prediction by the model, and $e_{new,jk}$ is the corresponding element of the residuals vector.
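The contribution bookkeeping for the Q-statistic can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation. It assumes a time-major batch-wise unfolding, in which the $1 \times JK$ vector stores all $J$ variables at the first time point, then all $J$ variables at the second, and so on; the function name `q_contributions` and the simulated data are hypothetical.

```python
import numpy as np


def q_contributions(x_new, x_hat, J, K):
    """Return Q_new, per-variable batch contributions, and per-(k, j) contributions.

    Assumption: time-major batch-wise unfolding, so element (k, j) of the batch
    sits at column k * J + j of the 1 x JK vector.
    """
    e = np.asarray(x_new, dtype=float) - np.asarray(x_hat, dtype=float)  # residuals
    q = float(e @ e)                # Q_new: sum of squared residuals
    c_kj = (e ** 2).reshape(K, J)   # contribution of variable j at time k
    c_j = c_kj.sum(axis=0)          # contribution of variable j over the whole batch
    return q, c_j, c_kj


# Hypothetical example: 3 variables, 5 time points, fault in variable 1 at time 2.
rng = np.random.default_rng(0)
J, K = 3, 5
x_hat = rng.normal(size=J * K)
x_new = x_hat + 0.1 * rng.normal(size=J * K)
x_new[2 * J + 1] += 2.0
q, c_j, c_kj = q_contributions(x_new, x_hat, J, K)
print(q, c_j)
```

Both the batch-level vector `c_j` and the time-resolved array `c_kj` sum to $Q_{new}$; the time-resolved form is the one that localizes a short deviation.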

Recently, Westerhuis et al. suggested control limits for variable contributions to the Q-residuals, so that the residuals of the new batch can be compared with those of the NO data: a variable that has large residuals in the NO set can also be expected to have large residuals in the new batch. These limits are calculated in the same way as those of the Q-statistic discussed in Section 6.4.2 (Eqs. 6.104-6.111). The residuals matrix $\mathbf{E}$ of the reference set used to calculate the contribution limits is obtained by "monitoring" each reference batch with one of the on-line SPM techniques discussed in Sections 6.5.1 and 6.5.2.
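Section 6.4.2 is not reproduced here, so the following is only a sketch under an explicit assumption: each squared residual $e_{jk}^2$ in the reference residuals $\mathbf{E}$ is approximated by a scaled chi-square variable $g\chi^2_h$ whose mean $m$ and variance $v$ match the reference values ($g = v/2m$, $h = 2m^2/v$), a moment-matching approximation commonly used for Q limits. The chi-square quantile is approximated with the Wilson-Hilferty formula so that only NumPy and the standard library are needed. The function name and the simulated residuals are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def q_contribution_limits(E_ref, alpha=0.01):
    """Approximate upper limits for squared-residual contributions e_jk^2.

    E_ref: (I x JK) residuals from "monitoring" the I reference batches.
    Assumption: each column of E_ref**2 follows g * chi2(h), with g and h chosen
    so that its mean m and variance v match the reference data.
    """
    C = np.asarray(E_ref, dtype=float) ** 2
    m = C.mean(axis=0)
    v = C.var(axis=0, ddof=1)
    g = v / (2.0 * m)
    h = 2.0 * m ** 2 / v
    z = NormalDist().inv_cdf(1.0 - alpha)
    # Wilson-Hilferty approximation of the (1 - alpha) quantile of chi2(h)
    chi2_quantile = h * (1.0 - 2.0 / (9.0 * h) + z * np.sqrt(2.0 / (9.0 * h))) ** 3
    return g * chi2_quantile


# Hypothetical reference residuals: 42 batches, JK = 12 columns.
rng = np.random.default_rng(1)
E_ref = rng.normal(scale=0.5, size=(42, 12))
ucl = q_contribution_limits(E_ref)
e_new = rng.normal(scale=0.5, size=12)
e_new[4] = 3.0                               # hypothetical large residual
print(np.flatnonzero(e_new ** 2 > ucl))      # columns whose contribution exceeds its limit
```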

Contributions of process variables to the D-statistic. Two different approaches have been proposed for calculating variable contributions to the D-statistic. The first, introduced by Miller et al. and by MacGregor et al., calculates the contribution of each process variable to an individual score. The first step is to identify a score $t_r$ that exceeds its own confidence limits; constructing confidence limits on individual scores is discussed in Section 6.4.2 (Eq. 6.95). The next step is to calculate the contribution of each element $x_{new,jk}$ of the new batch run to the $r$th score [389, 639]:

$$c_{jk,r} = x_{new,jk}\, p_{jk,r}, \qquad jk = 1, \ldots, JK \tag{8.4}$$

where $p_{jk,r}$ is the $jk$th element of the $r$th loading vector $\mathbf{p}_r$.

The sum of the contributions in Eq. 8.4 equals the score $t_{new,r}$ of the new batch.
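A short NumPy check of Eq. 8.4, assuming orthonormal loadings so that $t_{new,r} = \mathbf{x}_{new}\mathbf{p}_r$. The loadings and batch vector are random placeholders rather than process data, and `score_contributions` is a hypothetical helper.

```python
import numpy as np


def score_contributions(x_new, P, r):
    """Contribution x_new,jk * p_jk,r of every element of x_new to score r.

    x_new: (JK,) centered and scaled batch vector; P: (JK, R) loadings.
    """
    return np.asarray(x_new, dtype=float) * P[:, r]


# Hypothetical orthonormal loadings (JK = 20, R = 3) and batch vector.
rng = np.random.default_rng(2)
P, _ = np.linalg.qr(rng.normal(size=(20, 3)))
x_new = rng.normal(size=20)
t_new = x_new @ P                      # scores of the new batch
c = score_contributions(x_new, P, r=0)
print(np.isclose(c.sum(), t_new[0]))   # True: contributions add up to the score
```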

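The second approach, proposed by Nomikos, calculates the contribution of each process variable to the D-statistic instead of its contributions to separate scores:

$$C_{jk}^D = \sum_{r=1}^{R} \frac{t_{new,r}}{s_r^2}\, p_{jk,r}\, x_{new,jk} \tag{8.5}$$

where $R$ is the number of principal components and $1/s_r^2$ is the $r$th diagonal element of $\mathbf{S}^{-1}$.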

In Eq. 8.5, the contribution of each element $x_{new,jk}$ to the D-statistic is summed over all $R$ components. This formulation is valid for orthogonal scores, because $\mathbf{S}^{-1}$, the inverse of the covariance matrix of the reference-set scores $\mathbf{T}$, is then diagonal and only its diagonal elements are needed. The loadings $\mathbf{P}$ of the MPCA model are also assumed to be orthonormal, so that $\mathbf{P}^T\mathbf{P} = \mathbf{I}$. Westerhuis et al. extended Nomikos' formulation to cases where the scores and loadings are non-orthogonal. According to this generalization, the D-statistic is calculated as follows:

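$$D = \mathbf{t}_{new}\mathbf{S}^{-1}\mathbf{t}_{new}^T = \mathbf{t}_{new}\mathbf{S}^{-1}\left(\mathbf{P}^T\mathbf{P}\right)^{-1}\mathbf{P}^T\mathbf{x}_{new}^T = \sum_{jk=1}^{JK} \mathbf{t}_{new}\mathbf{S}^{-1}\left(\mathbf{P}^T\mathbf{P}\right)^{-1}\mathbf{p}_{jk}^T\, x_{new,jk} \tag{8.6}$$

where $\mathbf{t}_{new} = \mathbf{x}_{new}\mathbf{P}\left(\mathbf{P}^T\mathbf{P}\right)^{-1}$ are the scores of the new batch and $\mathbf{p}_{jk}$ is the $jk$th row of $\mathbf{P}$.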

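Hence, the contribution of element $x_{new,jk}$ of the new batch to the D-statistic is calculated as

$$C_{jk}^D = \mathbf{t}_{new}\mathbf{S}^{-1}\left(\mathbf{P}^T\mathbf{P}\right)^{-1}\mathbf{p}_{jk}^T\, x_{new,jk} \tag{8.7}$$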
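The generalized contributions $C_{jk}^D = \mathbf{t}_{new}\mathbf{S}^{-1}(\mathbf{P}^T\mathbf{P})^{-1}\mathbf{p}_{jk}^T x_{new,jk}$, with $\mathbf{t}_{new} = \mathbf{x}_{new}\mathbf{P}(\mathbf{P}^T\mathbf{P})^{-1}$, can be sketched as follows. This is a minimal illustration under stated assumptions: the loadings `P` and score covariance `S` are random placeholders and `d_contributions` is a hypothetical helper. The checks confirm two properties stated in the text: the contributions sum to the D-statistic, and with orthonormal loadings and a diagonal $\mathbf{S}$ they reduce to Nomikos' expression in Eq. 8.5.

```python
import numpy as np


def d_contributions(x_new, P, S):
    """Generalized contributions of each x_new,jk to the D-statistic.

    x_new: (JK,) centered and scaled batch vector
    P:     (JK, R) loadings, not necessarily orthogonal
    S:     (R, R) covariance matrix of the reference-set scores T
    """
    x_new = np.asarray(x_new, dtype=float)
    PtP_inv = np.linalg.inv(P.T @ P)
    t_new = x_new @ P @ PtP_inv                   # t_new = x_new P (P^T P)^-1
    w = t_new @ np.linalg.inv(S) @ PtP_inv        # row vector t_new S^-1 (P^T P)^-1
    contrib = (P @ w) * x_new                     # w p_jk^T x_new,jk for every jk
    d = float(t_new @ np.linalg.solve(S, t_new))  # D = t_new S^-1 t_new^T
    return contrib, d


rng = np.random.default_rng(3)
JK, R, I = 20, 3, 42
P = rng.normal(size=(JK, R))                      # hypothetical non-orthogonal loadings
S = np.cov(rng.normal(size=(I, R)) @ rng.normal(size=(R, R)), rowvar=False)
x_new = rng.normal(size=JK)
contrib, d = d_contributions(x_new, P, S)
print(np.isclose(contrib.sum(), d))               # True

# Orthonormal loadings and diagonal S: compare with Nomikos' Eq. 8.5.
P_orth, _ = np.linalg.qr(P)
s2 = np.array([2.0, 1.0, 0.5])                    # hypothetical score variances s_r^2
contrib_o, _ = d_contributions(x_new, P_orth, np.diag(s2))
t_o = x_new @ P_orth
nomikos = x_new * (P_orth * (t_o / s2)).sum(axis=1)
print(np.allclose(contrib_o, nomikos))            # True
```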

Control limits for variable contributions to the D-statistic have also been developed. They are computed by means of a jackknife procedure in which each of the NO batches is left out once and the variable contributions are calculated for the batch that is left out. The mean and variance of these contributions over the $I$ batches are then calculated for each variable $j$ at each time period $k$. Westerhuis et al. proposed an upper control limit (UCL) for the contributions equal to the mean of the variable contributions at each time interval plus three times the corresponding standard deviation. This UCL has no formal statistical significance, but it is useful for detecting contributions that are higher than those of the NO batches in the reference set. A lower control limit (LCL) can be developed in the same manner. If contributions are summed over all time instances or over all process variables, the control limits are centered on the sum of the means of the corresponding jackknifed contributions from the reference set, and the standard deviation of these summed contributions can be calculated as

$$\sigma_k = \sqrt{\sum_{j=1}^{J} \sigma_{jk}^2}, \qquad \sigma_j = \sqrt{\sum_{k=1}^{K} \sigma_{jk}^2} \tag{8.8}$$

where $\sigma_{jk}$ is the standard deviation of the jackknifed contributions of variable $j$ at time $k$, and $\sigma_k$ and $\sigma_j$ are the standard deviations of the summed contributions over all process variables and over all time instances, respectively. If the sum of the contributions over all variables at each time instance is used, one can zoom in on the region(s) where the summed contributions exceed the control limits calculated with $\sigma_k$ from Eq. 8.8.
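The jackknife limits can be sketched as below. The assumptions are labeled in the code: the MPCA model is refit on the remaining $I-1$ batches by autoscaling the unfolded data and taking an SVD, the unfolding is time-major, the contributions are the D contributions for orthonormal loadings, and the limit for contributions summed over all variables uses $\sigma_k = \sqrt{\sum_j \sigma_{jk}^2}$. The helper names and simulated data are hypothetical.

```python
import numpy as np


def fit_mpca(X, R):
    """Autoscale the unfolded data X (I x JK) and fit R principal components by SVD."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:R].T                                       # orthonormal loadings (JK x R)
    S = np.atleast_2d(np.cov(Z @ P, rowvar=False))     # covariance of reference scores
    return mu, sd, P, S


def jackknife_d_contributions(X, R):
    """Leave each NO batch out once and compute its D contributions (I x JK)."""
    I = X.shape[0]
    C = np.empty(X.shape, dtype=float)
    for i in range(I):
        mu, sd, P, S = fit_mpca(np.delete(X, i, axis=0), R)
        x = (X[i] - mu) / sd                           # left-out batch, scaled by the others
        t = x @ P
        C[i] = (P @ np.linalg.solve(S, t)) * x         # orthonormal P: (P^T P)^-1 = I
    return C


def contribution_limits(C, J, K, n_sd=3.0):
    """Mean +/- n_sd * std limits per (k, j) and UCL for sums over all variables."""
    C_kj = C.reshape(C.shape[0], K, J)                 # time-major unfolding assumed
    mean = C_kj.mean(axis=0)                           # (K, J)
    sd = C_kj.std(axis=0, ddof=1)                      # (K, J)
    ucl, lcl = mean + n_sd * sd, mean - n_sd * sd
    sigma_k = np.sqrt((sd ** 2).sum(axis=1))           # std of the sum over variables
    ucl_k = mean.sum(axis=1) + n_sd * sigma_k          # limit for summed contributions
    return ucl, lcl, ucl_k


# Hypothetical reference set: 42 batches, J = 3 variables, K = 10 time points.
rng = np.random.default_rng(4)
I, J, K, R = 42, 3, 10, 2
X = rng.normal(size=(I, J * K))
C = jackknife_d_contributions(X, R)
ucl, lcl, ucl_k = contribution_limits(C, J, K)
print(ucl.shape, ucl_k.shape)   # (10, 3) (10,)
```

Scaling each left-out batch with the statistics of the remaining batches is meant to make its contributions resemble those of a genuinely new NO batch, which is the point of the jackknife.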

It is always good practice to check the individual process variable plots for the variables diagnosed as responsible for an out-of-control signal. When the number of variables is large, analyzing contribution plots and the corresponding variable plots to reason about the faulty condition may become tedious and challenging. These analyses can be automated and linked with real-time diagnosis [436, 607] by means of knowledge-based systems.

Example. Consider a reference data set of 42 NO batches from the fed-batch penicillin fermentation process (see Section 6.4.1). An on-line SPM framework is developed with this data set, $\mathbf{X}$ $(42 \times 14 \times 764)$. The model development stage and the MPCA model are the same as in Section 6.4.3, except that the control limits are constructed by passing each batch in the reference set through the estimation-based on-line SPM procedure. Estimation method 2 discussed in Section 6.5.1 (future values of disturbances assumed to remain constant at their current values over the remaining batch period) is chosen for on-line SPM. A new batch with a small downward drift in the glucose feed rate (variable 3) between the 180th and 300th measurements (Figure 8.3(d)) is generated to illustrate contribution plots. The SPE chart (Figure 8.2(a)) detects the out-of-control situation between the 250th and 310th measurements, and the $T^2$ chart (Figure 8.2(c)) between the 270th and 290th measurements. Variable contributions are summed over these out-of-control intervals for SPE and $T^2$ in Figures 8.2(b) and 8.2(d). Since these summations represent