Figure 6.61. Fault detection and diagnosis by MSMPCA. Dashed line represents 99% control limit on SPE charts.

where $C$ is the desired overall confidence limit, $C_\ell$ is the adjusted confidence limit at each scale in percent, and $\ell$ is the number of scales of decomposition [38].

The constraints of dyadic downsampling can be eliminated by using a moving window, a computational strategy similar to a moving average. The drawback of this approach is the increased computational burden.

6.5 On-line Monitoring of Batch/Fed-Batch Fermentation Processes

Real-time SPM during the progress of the batch can be as simple as monitoring the trajectory of each process variable and comparing it against an ideal reference trajectory. The premise for this approach is that if all variables behave as expected, the product properties will be as desired. A few control loops can be used to regulate some critical process variables. There are several problems with this approach:

1. Slight changes in many variables may seem too small for each variable, but their collective effect may be significant

2. Variations in impurity levels or other initial conditions may affect the variable trajectories, but these deviations from the reference trajectories may not cause significant product quality degradation

3. The duration of each batch may be different, causing difficulties in comparing the trajectories of the current batch to reference trajectories.
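The first problem in the list can be illustrated with a small numerical sketch. It assumes, purely for illustration, 14 independent process variables that have already been scaled to zero mean and unit variance, each deviating by 1.5 standard deviations. The variable values, the univariate limit, and the chi-square critical value (taken from standard tables) are assumptions and do not come from the penicillin data set.

```python
# Hypothetical scaled deviations (in standard-deviation units) of 14 variables.
# Assumption: variables are independent with unit variance under normal operation.
z = [1.5] * 14

UNIVARIATE_LIMIT = 2.576  # two-sided 99% limit for a standard normal variable
CHI2_99_DF14 = 29.141     # 99th percentile of chi-square with 14 degrees of freedom


def univariate_alarms(values, limit=UNIVARIATE_LIMIT):
    """Return the 1-based indices of variables outside their individual limits."""
    return [j for j, v in enumerate(values, start=1) if abs(v) > limit]


def collective_statistic(values):
    """Sum of squared scaled deviations (chi-square distributed under the assumption)."""
    return sum(v * v for v in values)


alarms = univariate_alarms(z)
stat = collective_statistic(z)
print(alarms, stat, stat > CHI2_99_DF14)  # prints: [] 31.5 True
```

No single variable violates its own limit, yet the combined statistic exceeds the 99% limit, which is the situation a multivariate chart is meant to catch.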

The remedies proposed fall into four groups:

1. Use the MSPM tools with variable trajectories that are combinations of real data (up to the present time in the batch) and estimates of the future portion of the trajectories to the end of the batch

2. Use hierarchical PCA that relies only on trajectory information from the beginning of the batch to the current time

3. Use MPCA or MPLS that is performed on an unfolded three-way batch data array by preserving variable direction

4. Use estimators for predicting the final product quality and base batch monitoring on this estimate.

These four approaches are discussed in the following sections.

The problem encountered when applying MPCA and MPLS techniques to on-line statistical process and product quality monitoring is that the $\mathbf{x}_{new}$ vector in Eqs. 6.114 and 6.115 is not complete until the end of the batch run. At time interval $k$, the matrix $\mathbf{X}_{new}$ has only its first $k$ rows complete, and all the future observations [$(K - k)$ rows] are missing. Several approaches have been proposed to overcome this problem for MPCA and MPLS-based on-line monitoring [433, 434, 435].

MPCA-based on-line monitoring. The future portions of variable trajectories are estimated by making various assumptions [433]. The on-line evolution of a new batch is monitored in the reduced space defined by the PCs of the MPCA model.

The incompleteness of the $\mathbf{X}_{new}$ ($K \times J$) matrix (or the $\mathbf{x}_{new}$ ($1 \times KJ$) vector after unfolding and scaling) during the batch creates a problem for on-line monitoring. The loadings of the reference data set cannot be used with incomplete data because the vector dimensions do not match. Three approaches are suggested to fill in the missing values in $\mathbf{X}_{new}$ [433, 435].

Method 1 assumes that future observations are in perfect accordance with their mean trajectories.

Method 2 assumes that future values of disturbances remain constant at their current values over the remaining batch period.

Method 3 treats unknown future observations as missing values from the batch in the MPCA model. Hence, the PCs of the reference batches can be used for prediction.

All three assumptions introduce arbitrariness in the estimates of variable trajectories (Figure 6.62). Deciding which approach to use depends on the inherent characteristics of the process being monitored and on information about disturbances. If process measurements do not contain discontinuities or early deviations, the third approach may be used after some data have been collected. If the disturbances in a given process are known to be persistent, the second approach is reported to work well [435]. When no prior knowledge exists about the process, the first estimation technique may be used.
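The three fill-in strategies can be sketched as follows. This is a minimal illustration, not a reference implementation. It assumes that the observed rows have already been mean-centered and scaled (so the mean trajectory is zero and each row is a deviation from it), that unfolding is time-major (all $J$ variables at time 1, then time 2, and so on), and that Method 3 is realized as a least-squares projection onto the first $kJ$ rows of the loading matrix. The function names are hypothetical.

```python
import numpy as np


def fill_method1(X_obs, K):
    """Method 1: future scaled deviations are zero (on the mean trajectory)."""
    k, J = X_obs.shape
    return np.vstack([X_obs, np.zeros((K - k, J))])


def fill_method2(X_obs, K):
    """Method 2: the current scaled deviation persists to the end of the batch."""
    k, J = X_obs.shape
    return np.vstack([X_obs, np.tile(X_obs[-1], (K - k, 1))])


def scores_method3(X_obs, P):
    """Method 3: treat the future as missing and estimate the scores by
    least squares using only the first k*J rows of the loadings P (KJ x A)."""
    x_k = X_obs.reshape(-1)      # row-major reshape gives time-major unfolding
    P_k = P[: x_k.size]          # loadings for the observed part only
    t, *_ = np.linalg.lstsq(P_k, x_k, rcond=None)
    return t


# Usage with hypothetical dimensions: K = 6 time points, J = 2 variables, A = 2 PCs.
rng = np.random.default_rng(0)
K, J, A = 6, 2, 2
P, _ = np.linalg.qr(rng.normal(size=(K * J, A)))  # orthonormal stand-in loadings
X_obs = rng.normal(size=(3, J))                   # first k = 3 observations

t1 = fill_method1(X_obs, K).reshape(-1) @ P
t2 = fill_method2(X_obs, K).reshape(-1) @ P
t3 = scores_method3(X_obs, P)
print(t1, t2, t3)
```

Methods 1 and 2 return a completed $K \times J$ matrix that is unfolded and projected in the usual way, whereas the Method 3 sketch returns the scores directly.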

As the new vector of variable measurements is obtained at each time $k$, the future portions of the trajectories are estimated for use in the regular MPCA-based SPM framework as

$$\mathbf{t}_{new,k} = \mathbf{x}_{new}\,\mathbf{P}, \qquad \mathbf{e}_{new,k} = \mathbf{x}_{new} - \sum_{a=1}^{A} t_{new,ak}\,\mathbf{p}_a^T \tag{6.120}$$

where $\mathbf{x}_{new}$ denotes the full variable measurements vector ($1 \times KJ$), which is estimated from time $k$ onwards to the end of the batch run, $\mathbf{t}_{new,k}$ ($1 \times A$) the predicted scores at sampling time $k$ from the $\mathbf{P}$ loadings, and $\mathbf{e}_{new,k}$ ($1 \times KJ$) the residuals vector at time $k$. To construct the control limits for on-line monitoring of new batches, each reference batch is passed through the on-line monitoring algorithm above, as if it were a new batch, and its predicted scores ($\mathbf{t}_{new,k}$) and squared prediction errors (SPE$_k$) are stored at each sampling interval $k$.
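The score and residual calculation, and the replay of reference batches to collect statistics at each $k$, can be sketched as below. Several choices here are assumptions made for illustration: the future is filled with Method 1 (zeros in scaled units), the SPE is summed over the whole residual vector (some formulations sum only the residuals of the current time slice), and the limit is a plain empirical percentile, since the text above does not specify how limits are derived from the stored values. All function names are hypothetical.

```python
import numpy as np


def scores_and_spe(x_full, P):
    """Project a completed (1 x KJ) vector on loadings P (KJ x A).
    Returns the scores t (length A) and the squared prediction error."""
    t = x_full @ P
    e = x_full - t @ P.T
    return t, float(e @ e)


def reference_spe_table(X_ref, P):
    """Replay each scaled reference batch in X_ref (I x K x J) as if it were new.
    Future rows are filled with zeros (Method 1, an assumption of this sketch).
    Returns an (I x K) table of SPE values, one per batch and time interval."""
    I, K, J = X_ref.shape
    spe = np.zeros((I, K))
    for i in range(I):
        for k in range(1, K + 1):
            filled = np.zeros((K, J))
            filled[:k] = X_ref[i, :k]          # only the first k rows are known
            _, spe[i, k - 1] = scores_and_spe(filled.reshape(-1), P)
    return spe


def empirical_limits(spe_table, q=99.0):
    """Time-varying limit as the q-th percentile across reference batches
    (a simple stand-in for a distribution-based limit)."""
    return np.percentile(spe_table, q, axis=0)


# Usage with hypothetical dimensions and random stand-in data.
rng = np.random.default_rng(0)
I, K, J, A = 20, 5, 3, 2
P, _ = np.linalg.qr(rng.normal(size=(K * J, A)))
X_ref = rng.normal(size=(I, K, J))
limits = empirical_limits(reference_spe_table(X_ref, P))
print(limits.shape)
```

A new batch would then be flagged at time $k$ if its SPE exceeds `limits[k - 1]`.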

Example. The MPCA-based on-line SPM framework is illustrated using the same simulated data set of fed-batch penicillin production presented in Section 6.4.1. The large downward drift fault in glucose feed rate is used as a case study (Figure 6.44 and data set X3 (764 x 14) in Table 6.8). The model development stage and the MPCA model developed are the same as in Section 6.4.3, except that the control limits are constructed by passing each batch in the reference set through the estimation-based on-line SPM procedure. The process monitoring stage depends on the estimation method used, and all three methods are implemented in this example. The choice of estimation method makes the greatest difference in the $T^2$ chart (Figure 6.63(a)). In the $T^2$ chart, the out-of-control signal is first detected by Method 2 (future values of disturbances remain constant at their current values over the remaining batch period) at the 325th measurement. The SPE chart detects the fault around the 305th measurement with all three methods. Variable contributions to SPE and $T^2$ and score biplots are presented for Method 2. The contribution plots reveal the variables responsible for the deviation from NO when the out-of-control state is detected. Variables 3 and 5 in the SPE contributions (Figure 6.63(d)) at the 305th measurement, and variables 3 and 5 (and 7, 13, and 14 to a lesser extent) in the $T^2$ contribution plot (Figure 6.63(c)) at the 325th measurement, are identified as responsible for the out-of-control situation. Variable 3 (glucose feed rate) is the main problematic variable and gradually affects the other variables. Variable 5 (glucose concentration in the fermenter) is the first variable directly affected by the drift in variable 3. Since $T^2$ detects the out-of-control state later, the effect of the drift has by then developed significantly in variables such as 7 (biomass concentration in the fermenter), 13 (heat generated), and 14 (cooling water flow rate), which are signaled by the $T^2$ contribution plot (Figure 6.63(c)). The score biplots also show a clear deviation from the NO region defined by the confidence ellipses of the reference model (Figures 6.63(e) and 6.63(f)). □

MPLS-based on-line monitoring and estimation of final product quality. Although the three estimation methods presented above can be used to deal with the missing future portions of the trajectories when implementing MPLS on-line, another approach that uses the ability of PLS to handle missing values has also been proposed [434]. Measurements available up to time interval $k$ are projected onto the reduced space defined by the $\mathbf{W}$ and $\mathbf{P}$ matrices of the MPLS model in a sequential manner for all of the $A$ latent variables:

$$t(1,a)_{new,k} = \frac{\mathbf{x}_{new,k}\,\mathbf{W}(1:kJ,a)}{\mathbf{W}(1:kJ,a)^T\,\mathbf{W}(1:kJ,a)}, \qquad \mathbf{x}_{new,k} = \mathbf{x}_{new,k} - t(1,a)_{new,k}\,\mathbf{P}(1:kJ,a)^T$$

where $(1:kJ, a)$ indicates the elements of the $a$th column from the first row up to the $kJ$th row. The missing values are predicted by restricting them to be consistent with the values already observed and with the correlation structure that exists between the process variables as defined by the MPLS model. This approach is reported to give t-scores very close to their final values as $\mathbf{X}_{new}$ fills with measured data ($k$ increases), and to work well after 10% of the batch evolution is completed [433, 434, 435].
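The sequential projection described above can be sketched in a few lines. The sketch assumes the usual form of the update: for each latent variable, the observed part of the vector is regressed on the truncated weight column to obtain the score, and the observed part is then deflated with the truncated loading column. The function name and the stand-in matrices are hypothetical.

```python
import numpy as np


def mpls_scores_partial(x_obs, W, P):
    """Estimate the A scores of a new batch from its first k*J observed elements.

    x_obs : observed part of the unfolded, scaled batch vector (length k*J)
    W, P  : MPLS weight and loading matrices, each of shape (K*J, A)
    """
    n = x_obs.size                       # n = k*J observed elements
    resid = np.asarray(x_obs, dtype=float).copy()
    A = W.shape[1]
    t = np.zeros(A)
    for a in range(A):
        w = W[:n, a]                     # W(1:kJ, a)
        t[a] = resid @ w / (w @ w)       # score from the observed part only
        resid = resid - t[a] * P[:n, a]  # deflate with P(1:kJ, a)
    return t


# Usage with hypothetical stand-in matrices (orthonormal columns, W = P).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(12, 2)))
x_full = Q @ np.array([1.5, -0.7])
print(mpls_scores_partial(x_full[:6], Q, Q))   # estimate at the halfway point
print(mpls_scores_partial(x_full, Q, Q))       # estimate with the complete vector
```

With the complete vector the sketch recovers the scores used to build `x_full`; with a partial vector the estimates generally differ, and they approach the final values as more of the batch is observed.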

When a new variable measurements vector is obtained and $k$ is incremented, the scores $t(1,a)_{new,k}$ can be estimated and used in MPLS (Eqs. 6.115 and 6.116). There are no residuals $\mathbf{f}$ in the quality variables space during on-line monitoring, since the actual values of the quality variables are known only at the end of the batch. Each batch in the reference database is passed through the on-line MPLS algorithm as if it were a new batch to construct control limits. Since MPLS provides predictions of the final product qualities at each sampling interval, confidence intervals for those predictions can also be developed [434]. The confidence intervals at significance level $\alpha$ for an individual predicted final quality variable $\hat{y}$ are given as [434]

$$\hat{y} \pm t_{I-A-1,\alpha/2}\left[\mathrm{MSE}\left(1 + \mathbf{t}_{new,k}\,(\mathbf{T}^T\mathbf{T})^{-1}\,\mathbf{t}_{new,k}^T\right)\right]^{1/2}$$

Figure 6.63. MPCA-based on-line SPM results of a faulty batch. In (a) and (b), Method 1 (solid curve), Method 2 (dashed curve), and Method 3 (dash-dotted curve). (c)-(d) Variable contributions to $T^2$ and SPE at the 325th and 305th measurements, respectively. Score biplots based on Method 2: (e) 1st vs 2nd PC and (f) 2nd vs 3rd PC.

where $\mathbf{T}$ is the scores matrix, $t_{I-A-1,\alpha/2}$ is the critical value of the Studentized variable with $I - A - 1$ degrees of freedom at significance level $\alpha/2$, and the mean squared error of prediction (MSE) is given as

$$\mathrm{SSE} = (\mathbf{y} - \hat{\mathbf{y}})^T(\mathbf{y} - \hat{\mathbf{y}}), \qquad \mathrm{MSE} = \frac{\mathrm{SSE}}{I - A - 1} \tag{6.124}$$

In these equations, I refers to number of batches, A to number of latent variables retained in the MPLS model, and SSE to sum of squared errors in prediction.
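A minimal numerical sketch of such an interval follows. It assumes the common form $\hat{y} \pm t_{crit}\,[\mathrm{MSE}\,(1 + \mathbf{t}(\mathbf{T}^T\mathbf{T})^{-1}\mathbf{t}^T)]^{1/2}$ with MSE as defined in Eq. 6.124. The critical value is passed in as a number (for example from a t-table) to keep the example free of extra dependencies. The function name and all numbers in the usage lines are hypothetical.

```python
import numpy as np


def prediction_interval(y_hat, t_new, T, y_ref, y_ref_hat, t_crit):
    """Interval for one predicted end-of-batch quality value.

    y_hat      : predicted quality for the new batch at the current time
    t_new      : score vector of the new batch (length A)
    T          : reference scores matrix (I x A)
    y_ref      : measured quality of the I reference batches
    y_ref_hat  : model predictions for the I reference batches
    t_crit     : critical t value with I - A - 1 degrees of freedom
    """
    I, A = T.shape
    err = np.asarray(y_ref, dtype=float) - np.asarray(y_ref_hat, dtype=float)
    sse = float(err @ err)                       # SSE of Eq. 6.124
    mse = sse / (I - A - 1)                      # MSE of Eq. 6.124
    leverage = float(t_new @ np.linalg.solve(T.T @ T, t_new))
    half_width = t_crit * np.sqrt(mse * (1.0 + leverage))
    return y_hat - half_width, y_hat + half_width


# Usage with hypothetical numbers: I = 4 reference batches, A = 2 latent variables.
T = np.eye(4)[:, :2]
y_ref = np.array([1.0, 2.0, 3.0, 4.0])
y_ref_hat = np.array([0.0, 2.0, 3.0, 4.0])
low, high = prediction_interval(10.0, np.zeros(2), T, y_ref, y_ref_hat, t_crit=2.0)
print(low, high)
```

The interval widens as the new batch's scores move away from the center of the reference scores, because the leverage term grows.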

Example. To illustrate on-line implementation of MPLS for monitoring and prediction of end product quality, the same reference set and MPLS model are used as in Section 6.4.4. All batches in the reference set are passed through the on-line algorithm to construct multivariate statistical control limits. MV charts for an in-control batch are shown in Figure 6.64. The $T^2$, SPE, first LV, and second LV charts indicate that the process is operating as expected. Figure 6.65 presents the predictive capability of the model. The solid curves indicate the end-of-batch values estimated at the corresponding measurement times. The dashed curves are the 95% and 99% control limits on the end-of-batch estimates. End-of-batch values of all five quality variables are predicted reasonably well while the batch is in progress.

The third fault scenario, with a significant downward drift in substrate feed rate, is used to illustrate MPLS-based on-line SPM. The first out-of-control signal is generated by the SPE chart at the 305th measurement (Figure 6.66(a)), followed by the second LV plot at the 355th measurement (Figure 6.68(c)), the $T^2$ chart at the 385th measurement (Figure 6.66(c)), and finally the first LV plot at the 590th measurement (Figure 6.68(a)). Contribution plots are generated when out-of-control status is detected on these charts. Variable contributions to SPE in Figure 6.66(b) reveal the root cause of the deviation, which is variable 3 (glucose feed rate). The second highest contribution in this plot is from variable 5 (glucose concentration in the fermenter), which is expected because it is directly related to variable 3. The remaining contribution plots reveal variables that are affected sequentially as the fault continues. For instance, the second LV signals the fault later than SPE, so there is enough time to see the effect of the fault on other variables such as variables 12 (temperature in the fermenter) and 13 (heat generated), while variable 3 still has the maximum contribution (Figure 6.68(d)). The $T^2$ chart signals out-of-control a little later than the second LV, and at that point the variables affected are 7 (biomass concentration in the fermenter), 13 (heat generated), and 14 (cooling water flow rate) (Figure 6.66(d)). An upward trend towards the out-of-control region can be seen in the $T^2$ chart in Figure 6.66(c) when the SPE chart detects the out-of-control situation. Variable contributions at the 305th measurement are shown in Figure 6.67 to reveal the variables contributing to this developing deviation. As expected, variables 3 (glucose feed rate) and 5 (glucose concentration in the fermenter) are found responsible for the start of that upward trend towards the out-of-control region. End-of-batch product quality is also predicted (Figure 6.69). A significant deviation from the desired values of the product quality variables is predicted (compare Figure 6.69 to Figure 6.65). The confidence intervals are plotted only until the SPE signals out-of-control status at the 305th measurement because the model is only valid until that time. □

Figure 6.64. MPLS-based on-line monitoring results of a normal batch.

Figure 6.65. MPLS-based on-line predictions of end-of-batch product quality of an in-control NO batch. (•) represents the actual value of the end-of-batch product quality measurement.

Hierarchical PCA provides a framework for dividing the data block $\mathbf{X}$ into $K$ two-dimensional blocks ($I \times J$) and looking at one time slice at a time [496] (see Figure 6.70). This gives separate score vectors $\mathbf{t}_{ak}$ for each individual time slice $\mathbf{X}_k$, where $a = 1, \ldots, A$ and $k = 1, \ldots, K$. The initial step is to calculate a one-component PCA model for the first time slice and to obtain the score and loading vectors ($\mathbf{t}_{ak}$, $\mathbf{p}_{ak}$, $a = 1$, $k = 1$) for the first block. The hierarchical part of the algorithm starts at $k = 2$ and continues for the rest of the batch (up to $k = K$). The score and loading vectors are built iteratively: the score vector of the previous time slice model, $\mathbf{t}_{a(k-1)}$, is used as the starting estimate for $\mathbf{t}_{ak}$. Then $\mathbf{p}_{ak} = \mathbf{X}_k^T\mathbf{t}_{ak}$, and the new score vector $\mathbf{r}_{ak}$ is calculated and normalized:

The weighting factor dk balances the contributions of the new information
