where $\mathbf{e}_i$ is the $i$th row of $\mathbf{E}$, $I$ is the number of batches in the reference set, $A$ is the number of PCs retained in the model, and $\mathbf{t}_a$ is a vector of $A$ scores [254].

Statistical limits on the Q-statistic are computed by assuming that the data have a multivariate normal distribution [253, 254]. The control limits for the Q-statistic are given by Jackson and Mudholkar [255], based on Box's [76] formulation (Eq. 6.104) for quadratic forms, with a significance level of $\alpha$ given in Eqs. 6.104 and 6.105 as

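$$Q_\alpha = \theta_1\left[1 - \frac{\theta_2 h_0 (1 - h_0)}{\theta_1^2} + \frac{z_\alpha (2\theta_2 h_0^2)^{1/2}}{\theta_1}\right]^{1/h_0} \qquad (6.105)$$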

where $\chi^2_h$ is the chi-squared variable with $h$ degrees of freedom and $z_\alpha$ is the standard normal variable corresponding to the upper $(1 - \alpha)$ percentile ($z_\alpha$ has the same sign as $h_0$). The $\theta_i$ values are calculated using the unused eigenvalues of the covariance matrix of observations (eigenvalues that are not retained in the model) as [655]

The other parameters are
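In the standard Jackson and Mudholkar formulation, which the expressions referred to here are assumed to follow, the quantities take the form sketched below. The symbols $\lambda_j$ (eigenvalues of the covariance matrix, ordered from largest to smallest) and $R$ (the total number of eigenvalues) are notational assumptions introduced for this sketch, and Box's approximation is written as $Q_\alpha = g\,\chi^2_{h,\alpha}$.

```latex
\theta_i = \sum_{j=A+1}^{R} \lambda_j^{\,i}, \qquad i = 1, 2, 3

g = \frac{\theta_2}{\theta_1}, \qquad
h = \frac{\theta_1^2}{\theta_2}, \qquad
h_0 = 1 - \frac{2\,\theta_1\theta_3}{3\,\theta_2^2}
```

With these choices $g\,\chi^2_h$ has mean $gh = \theta_1$ and variance $2g^2h = 2\theta_2$, which are the first two moments of $Q$ under the normality assumption.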

The $\theta_i$'s can be estimated from the estimated covariance matrix of residuals (the residual matrix used in Eq. 6.103) for use in Eq. 6.105 to develop control limits on $Q$ for comparing the residuals of batches. Since the covariance matrices $\mathbf{E}^T\mathbf{E}$ ($JK \times JK$) and $\mathbf{E}\mathbf{E}^T$ ($I \times I$) have the same non-zero eigenvalues [435], $\mathbf{E}\mathbf{E}^T$ can be used in estimating the $\theta_i$'s due to its smaller size for covariance estimation as

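$$\mathbf{V} = \frac{\mathbf{E}\mathbf{E}^T}{I - 1}, \qquad \theta_i = \mathrm{trace}(\mathbf{V}^i), \quad \text{for } i = 1, 2, \text{ and } 3. \qquad (6.108)$$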

A simplified approximation for Q-limits has also been suggested in [148] by rewriting Box's equation (Eq. 6.104) by setting $\theta_2^2 \approx \theta_1\theta_3$

Eq. 6.105 can be used together with Eq. 6.108 to calculate control limits for the sum of squared residuals when comparing batches ($Q_i$ in Eq. 6.103).
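The combination of Eq. 6.108 and Eq. 6.105 can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `residual_thetas` and `q_limit` are hypothetical, the residuals are synthetic, and $h_0 = 1 - 2\theta_1\theta_3/(3\theta_2^2)$ is taken from the standard Jackson and Mudholkar formulation as an assumption.

```python
from statistics import NormalDist

import numpy as np


def residual_thetas(E):
    """Return (theta1, theta2, theta3) from a residual matrix E of shape (I, JK)."""
    E = np.asarray(E, dtype=float)
    n_batches = E.shape[0]
    V = E @ E.T / (n_batches - 1)      # small (I x I) matrix, as in Eq. 6.108
    V2 = V @ V
    return np.trace(V), np.trace(V2), np.trace(V2 @ V)


def q_limit(thetas, alpha=0.05):
    """Control limit on Q from Eq. 6.105 at significance level alpha."""
    theta1, theta2, theta3 = thetas
    # Assumed standard definition of h0 (not shown in the text above).
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    z = NormalDist().inv_cdf(1.0 - alpha)   # upper (1 - alpha) normal percentile
    if h0 < 0:
        z = -z                               # z_alpha takes the sign of h0
    bracket = (1.0
               - theta2 * h0 * (1.0 - h0) / theta1 ** 2
               + z * np.sqrt(2.0 * theta2 * h0 ** 2) / theta1)
    return theta1 * bracket ** (1.0 / h0)


# Illustrative use on synthetic residuals (20 batches, 50 unfolded variables).
rng = np.random.default_rng(0)
E = rng.standard_normal((20, 50))
limit_95 = q_limit(residual_thetas(E), alpha=0.05)
```

Working with the $I \times I$ matrix keeps the trace computations cheap when $JK$ is much larger than $I$, which is the usual situation for unfolded batch data.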

In order to calculate SPE values throughout the batch as soon as the batch is complete, Eq. 6.110 is used for each observation at measurement time k [435]

The SPE values calculated for each time $k$ using Eq. 6.110 follow a $\chi^2$ (chi-squared) distribution (Eq. 6.104, [76]). This distribution can be well approximated at each time interval using Box's equation in Eq. 6.104 (or its modified version in Eq. 6.109). This approximation of moments is preferred because it is computationally faster than using traces of powers of the residual covariance matrix of size ($J \times J$) at each time interval. Parameters $g$ and $h$ can be approximated by matching moments of the $g\chi^2_h$ distribution [435]

$$g = \frac{v}{2m}, \qquad h = \frac{2m^2}{v}$$

where $m$ and $v$ are the estimated mean and variance of the SPE at a particular time interval $k$, respectively. It was reported that these matching moments were susceptible to error in the presence of outliers in the data or when the number of observations was small. Outliers should be eliminated as discussed in Section 3.4.2.
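The moment matching can be illustrated with a short sketch that takes the SPE values of the reference batches at one time interval and returns $g$, $h$, and a control limit $g\chi^2_{h,\alpha}$. The function name `spe_limit_by_moments` is hypothetical, and the chi-squared percentile is obtained here with the Wilson-Hilferty approximation, which is an assumption made to keep the example dependency-free rather than a method prescribed by the text.

```python
from statistics import NormalDist, mean, variance


def spe_limit_by_moments(spe_values, alpha=0.05):
    """Match the mean and variance of SPE to a g*chi2(h) distribution."""
    m = mean(spe_values)
    v = variance(spe_values)          # sample variance across reference batches
    g = v / (2.0 * m)                 # scale so that g*h = m
    h = 2.0 * m * m / v               # degrees of freedom so that 2*g^2*h = v
    z = NormalDist().inv_cdf(1.0 - alpha)
    # Wilson-Hilferty approximation to the upper chi-squared percentile
    # (assumption: used in place of an exact quantile function).
    chi2_q = h * (1.0 - 2.0 / (9.0 * h) + z * (2.0 / (9.0 * h)) ** 0.5) ** 3
    return g, h, g * chi2_q


# SPE values of five hypothetical reference batches at one time interval.
g, h, limit = spe_limit_by_moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because $m$ and $v$ are plain sample statistics, a single outlying batch can shift both $g$ and $h$ noticeably, which is the sensitivity noted above.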

Contribution plots are used for fault diagnosis. Both $T^2$ and SPE charts produce an out-of-control signal when a fault occurs, but they do not provide any information about its cause. Variable contributions to the $T^2$ and SPE values indicate which variable(s) are responsible for the deviation from normal operation. The $T^2$ statistic monitors the systematic variation and the SPE statistic monitors the residual variation. Hence, in the case of a process disturbance, one or both of these statistics will exceed the control limits. If only the $T^2$ statistic is out of control, the model of the process is still valid, but the contribution of each process variable to this statistic should be investigated to find the cause of the deviation from normal operation. If SPE is out of control, a new event that is not described by the process model has been found in the data. The contributions of each variable to SPE will reveal the variable(s) responsible for that deviation.

Contribution plots are discussed in more detail as a fault diagnosis tool in Section 8.1.
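For the SPE case, the contribution of a variable is simply its squared residual, so a basic contribution table can be sketched as follows. The function name `spe_contributions` is hypothetical, and the code assumes that the unfolded residual vector is ordered time-major (all $J$ variables at the first sampling instant, then the second, and so on); a different unfolding order would need a different reshape.

```python
import numpy as np


def spe_contributions(e_new, K, J):
    """Split the squared residuals of one unfolded batch by time and variable."""
    e = np.asarray(e_new, dtype=float).reshape(K, J)  # assumes time-major unfolding
    contrib = e ** 2                     # contrib[k, j]: share of variable j in SPE at time k
    spe_k = contrib.sum(axis=1)          # SPE at each sampling instant
    per_variable = contrib.sum(axis=0)   # contribution of each variable to the overall Q
    return contrib, spe_k, per_variable


# Toy residual vector for K = 3 sampling instants and J = 2 variables.
contrib, spe_k, per_variable = spe_contributions([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], K=3, J=2)
```

Plotting a row of `contrib` as a bar chart gives the contribution plot for that sampling instant, and the largest bars point to the variables to inspect first.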

Explained variance, loadings and weights plots highlight the variabilities of batch profiles. The explained variance is calculated by comparing the real process data with the MPCA model estimates. This can be calculated as a function of batch number, time, or variable number. The value of explained variance becomes higher if the model accounts for more variability in the data and for the correlation that exists among the variables. Variance plots over time can be used as an indicator of the phenomenological/operational changes that occur during the process evolution [291]. This measure can be computed as

where SS stands for 'sum of squares', and $\sigma^2$ and $\hat{\sigma}^2$ are the true and estimated sums of squares, respectively.
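As an illustration, the explained sum of squares can be computed along any of the three directions with a few lines of Python. This sketch assumes the measure is the ratio of the estimated to the true sum of squares expressed as a percentage, and that the scaled data and the MPCA estimates are held as three-way arrays of shape $(I, K, J)$; the function name `explained_variance` is hypothetical.

```python
import numpy as np


def explained_variance(X, X_hat, by="variable"):
    """Percent sum of squares explained; X and X_hat are (I, K, J) arrays."""
    # Axes to sum over so that one value remains per batch, time, or variable.
    axes = {"batch": (1, 2), "time": (0, 2), "variable": (0, 1)}[by]
    ss_true = np.sum(np.asarray(X, dtype=float) ** 2, axis=axes)
    ss_model = np.sum(np.asarray(X_hat, dtype=float) ** 2, axis=axes)
    return 100.0 * ss_model / ss_true


# Toy data: 2 batches, 3 sampling instants, 2 variables.
X = 2.0 * np.ones((2, 3, 2))
X_hat = np.ones((2, 3, 2))
per_time = explained_variance(X, X_hat, by="time")
```

Plotting `per_time` against the sampling instant gives the variance plot over time mentioned above, in which drops indicate periods that the model describes poorly.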

Loadings also represent variability across the entire data set. Although the loadings look like contributions, a practical difference occurs when some of the contributions of the process variables have values much smaller than their corresponding loadings and vice versa.

In the case of MPLS-based empirical modeling, variable contributions to the weights ($\mathbf{W}$) carry valuable information since these weights summarize the relationship between the X and Y blocks. There are several ways to present this information as charts. The overall effect of all of the process variables on the quality variables over the course of the process can be plotted, or this can be done for a specific period of the process to reflect the change in the effect of the predictor block (X). Wold et al. [145] suggested another statistic, for which they coined the term Variable Influence on Projection (VIP), using the following formula

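$$\mathrm{VIP}_j = \left[\frac{J \sum_{a=1}^{A} w_{aj}^2\,(SSY_{a-1} - SSY_a)}{SSY_0 - SSY_A}\right]^{1/2}$$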

where $J$ denotes the number of variables in the X-block, $A$ the number of latent variables retained in the model, $w_{aj}$ the weight of the $j$th variable on the $a$th component, $SSY_0$ the initial sum of squares on the Y-block, and $SSY_A$ the sum of squares after $A$ latent variables on the Y-block. While this equation holds for continuous process data, a small modification is needed for batch process data since, in the case of the $I \times JK$ data arrangement, there are $JK$ variables. One possible modification is to average the VIP values of each variable $j$ over time to obtain an overall view; this can also be done for a specific period of the process. The sum of the squared VIPs is equal to the number of variables in the X-block (that is, $J$ for continuous process data and $JK$ for batch process data). The VIP terms of the variables can be compared, and the variables with a large VIP (larger than 1) are the most relevant to explaining the Y-block. An example is given in Section 6.4.4 for the overall VIP case.
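A minimal sketch of the VIP calculation is given below. It assumes the commonly used form $\mathrm{VIP}_j^2 = J \sum_a w_{aj}^2 (SSY_{a-1} - SSY_a) / (SSY_0 - SSY_A)$ with unit-norm weight vectors, and the function names `vip` and `mean_vip_per_variable` are hypothetical. The batch helper further assumes a time-major unfolding of the $JK$ columns.

```python
import numpy as np


def vip(W, ssy):
    """VIP of each X-variable.

    W   : (J, A) PLS weights, one unit-norm column per latent variable.
    ssy : length A + 1 sequence [SSY_0, SSY_1, ..., SSY_A] of Y-block sums of squares.
    """
    W = np.asarray(W, dtype=float)
    ssy = np.asarray(ssy, dtype=float)
    n_vars = W.shape[0]
    explained = ssy[:-1] - ssy[1:]       # SSY_{a-1} - SSY_a for each component
    total = ssy[0] - ssy[-1]             # SSY_0 - SSY_A
    return np.sqrt(n_vars * (W ** 2 @ explained) / total)


def mean_vip_per_variable(vip_jk, K, J):
    """Average the JK VIP values of a batch model over time (time-major order assumed)."""
    return np.asarray(vip_jk, dtype=float).reshape(K, J).mean(axis=0)


# Toy model: J = 3 variables, A = 2 latent variables.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
v = vip(W, ssy=[10.0, 4.0, 2.0])
```

In this toy case the first variable carries the component that explains most of the Y-block and is the only one with a VIP above 1, while the squared VIPs sum to $J = 3$.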

6.4.3 Multiway PCA-based SPM for Postmortem Analysis

In this section, the use and implementation of MPCA-based modeling (Section 4.5.1) are discussed for a postmortem analysis of finished batch runs to discriminate between the 'good' and the 'bad' batches. This analysis can be used to improve operation policies and discover major sources of variability among batches. MPCA can also be implemented on-line (Section 6.5). In either case, an MPCA model based on a reference set (representing normal operating conditions) selected from a historical batch database is developed.

When a batch is complete, measurements on the process variables made at each sampling instant produce a matrix $\mathbf{X}_{new}$ ($K \times J$). This matrix is unfolded and scaled to give $\mathbf{x}_{new}$ ($1 \times KJ$), using the same scaling parameters that were applied to the reference batches during the model development phase. This new batch vector is tested for any unusual behavior by predicting $t$ scores and residuals via the use of $\mathbf{P}$ loading matrices (Eq. 6.114) that contain most of the structural information about the deviations of variables from their average trajectories under normal operation:

where $\mathbf{t}_{new}$ denotes the scores of the new batch calculated by using the $\mathbf{P}$ ($JK \times A$) loadings from the MPCA model with $A$ PCs. If the scores of the new batch are close to the origin and its residuals are small, its operation is similar to that of the reference batches representing normal operation. The sum of squared residuals $Q$ for the new batch over all the time periods can be calculated as $Q = \mathbf{e}\mathbf{e}^T = \sum_{c=1}^{KJ} e(c)^2$ for a quick comparison with the $Q$ values of the reference batches. The $D$ statistic (Eq. 6.97) can also be used to get an overall view. These statistics give only summary information about the new batch with respect to the behavior of the reference set; they do not show instantaneous changes that might have occurred during the progress of the batch. It is common practice to use on-line MPCA algorithms to obtain temporal SPE and $T^2$ values. These charts are introduced in Section 6.5.1. In this example, however, $T^2$ and cumulative score plots are used along with the variable contributions to find the variable(s) responsible for deviation from NO. $T^2$ is computed for each sampling instant using Eq. 6.101. Scores are calculated for each sampling instant and summed until the end of the batch to reach the final score value. Limits on individual scores are given in Eq. 6.95.
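The projection of a completed batch onto the MPCA model can be sketched as follows, assuming the usual PCA relations $\mathbf{t}_{new} = \mathbf{x}_{new}\mathbf{P}$ and $\mathbf{e}_{new} = \mathbf{x}_{new} - \mathbf{t}_{new}\mathbf{P}^T$ with orthonormal loadings. The function name `project_new_batch` and its arguments are hypothetical, and the unfolding is assumed to be time-major and identical to the one used for the reference batches.

```python
import numpy as np


def project_new_batch(X_new, mean, scale, P):
    """Scores, residuals, and Q for one completed batch.

    X_new : (K, J) measurements of the new batch.
    mean, scale : length K*J scaling parameters from the reference set.
    P : (K*J, A) loadings of the MPCA model (orthonormal columns assumed).
    """
    x = (np.asarray(X_new, dtype=float).reshape(-1) - mean) / scale  # unfold and scale
    t = x @ P                  # scores of the new batch
    e = x - t @ P.T            # part of x not described by the model
    Q = float(e @ e)           # sum of squared residuals over the whole batch
    return t, e, Q


# Toy model: K = 2, J = 2, A = 2, loadings spanning the first two unfolded columns.
P = np.eye(4)[:, :2]
t_new, e_new, Q_new = project_new_batch(
    [[1.0, 2.0], [3.0, 4.0]], mean=np.zeros(4), scale=np.ones(4), P=P
)
```

The resulting `Q_new` can then be compared with the limit from Eq. 6.105, and `t_new` with the score limits, to decide whether the batch resembles the reference set.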

The MPCA model can be utilized to classify a completed batch as 'good' or 'bad'. Besides providing information on the similarity of a newly finished batch to the batches in the reference set, the MPCA model is also used to assess the progress of a finished batch during its run. Temporal score evolution plots and SPE and $T^2$ charts are generally used along with contribution plots to further investigate a finished batch.

Example. MPCA-based SPM framework is illustrated for a simulated data set of fed-batch penicillin production presented in Section 6.4.1. Two main

