steps of this framework are the model development stage, which uses a historical reference batch database that defines normal operation, and the process monitoring stage, which uses the developed model to monitor a new batch.

MPCA model development stage: The MPCA model is developed from a data set of 41 good batches that have been equalized/synchronized (Figures 6.42 and 6.43), unfolded and scaled. Each batch contains 14 variables with 764 measurements each, resulting in a three-way array X (41 x 14 x 764). After unfolding by preserving the batch direction (I), the unfolded array becomes X (41 x 10696). MPCA is performed on the unfolded array X with four principal components, resulting in a scores matrix T of size (41 x 4) and a loadings matrix P of size (10696 x 4). The variability of the X block explained by the MPCA model is summarized in Table 6.9. Although 4 PCs explain only 40 percent of the variation in the data, the resulting MPCA model is good enough for performing various SPM tasks. Additional PCs can be included to improve model accuracy, taking care not to include variation that is due mostly to noise.
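The unfolding and decomposition steps above can be sketched with NumPy. This is a minimal illustration, not the book's implementation: the data are random numbers standing in for the 41 reference batches, the column ordering (time slices placed side by side) and the autoscaling of each column across batches are assumptions, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 41, 14, 764             # batches, variables, measurements (from the text)
X3 = rng.normal(size=(I, J, K))   # synthetic stand-in for the reference batches

# Batch-wise unfolding: place the K time slices (each I x J) side by side,
# so column k*J + j holds variable j at measurement k.
X = X3.transpose(0, 2, 1).reshape(I, K * J)

# Autoscale every column across batches (assumed scaling)
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)
Xs = (X - mean) / std

# PCA of the unfolded array via SVD, keeping A = 4 components
A = 4
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
T = U[:, :A] * s[:A]              # scores, I x A
P = Vt[:A].T                      # loadings, KJ x A
cum_explained = 100 * np.cumsum(s[:A] ** 2) / np.sum(s ** 2)

print(X.shape, T.shape, P.shape)  # (41, 10696) (41, 4) (10696, 4)
```

With random data the cumulative explained percentages carry no meaning; only the array shapes and the order of operations are intended to mirror the description.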

MPCA model statistics are summarized in a number of multivariate charts in Figure 6.45. All of the control limits are developed based on the formulations summarized in Section 6.4.2. The score biplots (with 95% and 99% confidence ellipsoids defined in Eq. 6.96) in Figures 6.45(a)-6.45(b), and the T2 and Q (sum of squares of residuals) charts in Figures 6.45(c) and 6.45(d), respectively, with their 95% and 99% control limits, reveal that none of the 41 batches shows any unexpected behavior. It can also be concluded that all of the batches are operated similarly and that the scatter of the score biplots in Figures 6.45(a)-6.45(b) defines the normal operational region in the reduced space.

The percent of the cumulative sum of squares explained by the MPCA model is also shown with respect to time and variables (Figures 6.45(e) and 6.45(f)). Figure 6.45(e) summarizes the cumulative variance explained by each principal component over the course of batch evolution. The lowest line in both figures represents the percent explained by the first PC, the next line above shows the percent explained by the first two PCs together, and so on. Several operational and physiological phases throughout the fed-batch penicillin fermentation can be detected from this plot. Comparing the relative increases in the four curves that indicate the cumulative variation explained, the first PC explains most of the variability in the first phase, which corresponds to batch operation (the switch from batch to fed-batch occurs at around measurement 85), while the second PC explains variability in the second phase (fed-batch operation/exponential growth phase). This is a common observation in MPCA because the correlation of the process variables changes from phase to phase as a batch progresses. Figure 6.45(e) also indicates a decrease in explained variance for all 4 PCs between approximately the 40th and 60th measurements, the period that precedes the switch to fed-batch operation, because the variability of the process variables is low in this period.

Figure 6.45(f) shows that the dominant variables in the first principal component are 5, 7, 8, 9, 13 and 14. These variables contain physiological change information and their profiles look similar (see Figure 6.31). Variable 10 and the others are explained mostly by the second and third components. The first principal component explains most of the batch operation phase and the exponential growth phase in fed-batch operation, where most of the process dynamics take place (in the associated variables 5, 7, 8, 9, 13 and 14). The second and additional principal components capture variability mostly in the fed-batch operation, where variable 10 (carbon dioxide evolution) is dominant. To increase phase-based explained variability, multiple model approaches are also suggested [130, 291, 605]. An example is given in Section 6.4.5.
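Curves of cumulative explained sum of squares by time and by variable, of the kind discussed above, could be computed along the following lines. This is a sketch on small synthetic data; the array sizes, the names `by_time` and `by_var`, and the time-slice column ordering of the unfolded matrix are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, A = 20, 5, 30, 4         # hypothetical sizes, far smaller than in the text

# Unfolded, autoscaled reference data (column k*J + j = variable j at time k)
X = rng.normal(size=(I, K * J))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
T, P = U[:, :A] * s[:A], Vt[:A].T

# Total sum of squares of each column, arranged as (time, variable)
ss_total = (X ** 2).sum(axis=0).reshape(K, J)

by_time = np.empty((A, K))        # row a-1: percent explained by the first a PCs at each time
by_var = np.empty((A, J))         # row a-1: percent explained by the first a PCs per variable
for a in range(1, A + 1):
    E = X - T[:, :a] @ P[:, :a].T                 # residual after a components
    ss_res = (E ** 2).sum(axis=0).reshape(K, J)
    by_time[a - 1] = 100 * (1 - ss_res.sum(axis=1) / ss_total.sum(axis=1))
    by_var[a - 1] = 100 * (1 - ss_res.sum(axis=0) / ss_total.sum(axis=0))
```

Plotting the rows of `by_time` against measurement number gives stacked curves in which the lowest line belongs to the first PC, matching the reading of the figure described in the text.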

Process monitoring stage: The MPCA model developed here is used to monitor finished batches in order to classify them as 'good' or 'bad', to investigate past batch evolution, and to detect and diagnose abnormalities. A batch scenario including a small downward drift fault is simulated (Section 6.4.1, Figure 6.44 and Table 6.8). New batch data are processed with the MPCA model using Eq. 6.114 after proper equalization/synchronization, unfolding and scaling. The same set of multivariate SPM charts is plotted (Figure 6.46). The score biplots in Figures 6.46(a) and 6.46(b) show that the new batch (batch number 42) is operated differently, since its scores fall outside the normal operation (NO) region defined by the MPCA model scores. Both the D and Q statistics also indicate an out-of-control batch. Now that the batch is classified as out-of-control, the time of occurrence of the deviation and the variables that have contributed to increasing the values of the statistics can be determined. The aforementioned temporal T2 chart based on cumulative scores and the individual score plots can be used here. The T2 value goes out-of-control as shown in Figure 6.47(a); the same out-of-control situation is also observed in the score plots (Figures 6.47(b)-6.47(c)). The first out-of-control signal is given by the PC3 chart around the 445th measurement. When variable contributions are calculated, the responsible variables are identified. Variables 3, 5 and 8 have the highest contributions, which makes sense because the fault was introduced into variable 3 (glucose feed rate), which affects variables 5 and 8 (glucose and penicillin concentrations, respectively).

Figure 6.46. End-of-batch monitoring results of a faulty batch. (Horizontal axes: batch number.)
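The end-of-batch check and the contribution calculation described in this stage can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Eq. 6.114 or Section 6.4.2: the data are synthetic, the fault size and timing are invented, T2 is computed from the reference score variances, Q contributions are taken as squared residuals summed over time for each variable, and control limits are omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K, A = 30, 5, 60, 4          # hypothetical sizes, far smaller than in the text

# Reference (normal operation) batches, already unfolded batch-wise to I x KJ
X_ref = rng.normal(size=(I, K * J))
mean, std = X_ref.mean(axis=0), X_ref.std(axis=0, ddof=1)
Xs = (X_ref - mean) / std

U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
P = Vt[:A].T                        # MPCA loadings, KJ x A
score_var = s[:A] ** 2 / (I - 1)    # variance of each reference score

# New batch (K x J) with an invented downward drift in variable 3 (index 2)
X_new = rng.normal(size=(K, J))
X_new[30:, 2] -= np.linspace(0.0, 5.0, K - 30)

# Unfold to 1 x KJ (same time-slice ordering) and scale with reference factors
x_new = (X_new.reshape(1, K * J) - mean) / std
t_new = x_new @ P                   # scores of the new batch
e_new = x_new - t_new @ P.T         # residuals of the new batch

T2 = float(np.sum(t_new ** 2 / score_var))   # Hotelling-type statistic
Q = float(np.sum(e_new ** 2))                # sum of squared residuals

# Contribution of each variable to Q, accumulated over all measurements
q_contrib = (e_new.reshape(K, J) ** 2).sum(axis=0)
worst = int(np.argmax(q_contrib)) + 1        # 1-based variable number
print(f"T2={T2:.1f}, Q={Q:.1f}, largest Q contribution: variable {worst}")
```

In practice `T2` and `Q` would be compared against the 95% and 99% limits obtained from the reference set, and contributions would be inspected only after a limit is exceeded.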

6.4.4 Multiway PLS-based SPM for Postmortem Analysis

Figure 6.47. End-of-batch fault detection and diagnosis for a faulty batch. (Horizontal axes: measurement number.)

MPLS [661] is an extension of PLS that is performed using both process data (X) and product quality data (Y) to predict final product quality during and/or at the end of the batch [298, 434, 663]. When a batch is finished, a block of recorded process variables Xnew (K x J) and a vector of quality measurements ynew (1 x M), which are usually available only after a delay due to quality analysis, are obtained. Xnew (K x J) is unfolded to xnew (1 x KJ), and both xnew and ynew are scaled with the scaling factors of the reference batch set. They are then processed with the MPLS model loadings and weights, which contain structural information on the behavior of the NOC set, as

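$$
\begin{aligned}
\mathbf{t}_{new} &= \mathbf{x}_{new}\,\mathbf{W}\,(\mathbf{P}^{T}\mathbf{W})^{-1} \\
\hat{\mathbf{y}}_{new} &= \mathbf{t}_{new}\,\mathbf{Q}^{T} \\
\mathbf{e}_{new} &= \mathbf{x}_{new} - \mathbf{t}_{new}\,\mathbf{P}^{T} \\
\mathbf{f}_{new} &= \mathbf{y}_{new} - \hat{\mathbf{y}}_{new}
\end{aligned}
$$

where tnew (1 x A) denotes the predicted t-scores, ŷnew (1 x M) the predicted quality variables, and e and f the residuals.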

Example. The MPLS-based SPM framework is also illustrated using the simulated fed-batch penicillin production data set presented in Section 6.4.1. Similar to the MPCA framework in the previous section (Section 6.4.3), the MPLS framework has two main steps: a model development stage, based on a historical reference batch database that defines normal operation, and a process monitoring and quality prediction stage that uses the developed model. The latter stage includes prediction of the product quality, which is the main difference between the MPCA-based and MPLS-based SPM frameworks. Note that in this MPLS framework, the quality prediction is made at the end of the batch, while the quality analysis laboratory results are still pending. It is also possible to implement MPLS on-line and predict the final product quality as the batch progresses. This version is discussed in detail in Section 6.5.1.

MPLS model development stage: The MPLS model is developed from a data set of 38 good batches that have been equalized/synchronized (Figures 6.42 and 6.43), unfolded and scaled. Each batch contains 14 variables with 764 measurements each, resulting in a three-way array X (38 x 14 x 764). After unfolding by preserving the batch direction (I), the unfolded array becomes X (38 x 10696). Three of the original 41 batches are excluded from the reference set because of their high variation. In addition to the X block, there is a Y (38 x 5) block composed of 5 quality variables measured at the end of each batch (Table 6.7 and Figure 6.41). MPLS is performed between the unfolded and scaled X and Y with four latent variables, resulting in scores T (38 x 4) and U (38 x 4), weights W (10696 x 4) and Q (5 x 4), and loadings P (10696 x 4). The variability explained in both the X and Y blocks by the MPLS model is summarized in Table 6.10. With 4 latent variables, the MPLS model uses 38.39% of the variation in X to explain 97.44% of the variation in Y. The cumulative percentage of the sum of squares explained by the 4 latent variables for each y in the Y block is tabulated in Table 6.11. MPLS model statistics are summarized in a number of multivariate charts in Figure 6.49. All control limits are developed based on the formulations summarized in Section 6.4.2.
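The model development step, followed by the end-of-batch projection of a finished batch, can be sketched as below. This is an illustrative NIPALS implementation on synthetic data, not the book's code: the function `pls_nipals`, the latent-driver data generator, and the projection relations t = x W (P'W)^-1 and y_hat = t Q' (the standard PLS forms) are assumptions, and in this sketch Q has one row per quality variable and one column per latent variable.

```python
import numpy as np

def pls_nipals(X, Y, A, tol=1e-10, max_iter=500):
    """Fit A latent variables by NIPALS; returns W, P, Q, T, U (columns per LV)."""
    X, Y = X.copy(), Y.copy()
    W, P, Q, T, U = [], [], [], [], []
    for _ in range(A):
        # Start from the y column with the largest remaining sum of squares
        u = Y[:, [int(np.argmax((Y ** 2).sum(axis=0)))]]
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)
            t = X @ w
            q = Y.T @ t / (t.T @ t)
            u_new = Y @ q / (q.T @ q)
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)
        X -= t @ p.T                 # deflate X
        Y -= t @ q.T                 # deflate Y
        for store, vec in zip((W, P, Q, T, U), (w, p, q, t, u)):
            store.append(vec)
    return tuple(np.hstack(m) for m in (W, P, Q, T, U))

rng = np.random.default_rng(3)
I, J, K, M, A = 38, 14, 764, 5, 4    # sizes taken from the text
Z = rng.normal(size=(I + 1, 3))      # hypothetical latent drivers (synthetic data)
X_raw = Z @ rng.normal(size=(3, J * K)) + rng.normal(size=(I + 1, J * K))
Y_raw = Z @ rng.normal(size=(3, M)) + 0.1 * rng.normal(size=(I + 1, M))

# First I rows form the reference set; the last row plays the finished new batch
mx, sx = X_raw[:I].mean(axis=0), X_raw[:I].std(axis=0, ddof=1)
my, sy = Y_raw[:I].mean(axis=0), Y_raw[:I].std(axis=0, ddof=1)
X, Y = (X_raw[:I] - mx) / sx, (Y_raw[:I] - my) / sy

W, P, Q, T, U = pls_nipals(X, Y, A)

# Cumulative percent of sum of squares explained in X and Y by 1..A latent variables
ssx, ssy = (X ** 2).sum(), (Y ** 2).sum()
cum_x = np.array([100 * (1 - ((X - T[:, :a] @ P[:, :a].T) ** 2).sum() / ssx)
                  for a in range(1, A + 1)])
cum_y = np.array([100 * (1 - ((Y - T[:, :a] @ Q[:, :a].T) ** 2).sum() / ssy)
                  for a in range(1, A + 1)])
per_y = 100 * (1 - ((Y - T @ Q.T) ** 2).sum(axis=0) / (Y ** 2).sum(axis=0))

# End-of-batch projection of the new batch with the reference scaling factors
R = W @ np.linalg.inv(P.T @ W)
x_new = (X_raw[[I]] - mx) / sx
y_new = (Y_raw[[I]] - my) / sy
t_new = x_new @ R                    # predicted t-scores, 1 x A
y_hat = t_new @ Q.T                  # predicted quality variables, 1 x M
e_new = x_new - t_new @ P.T          # X-space residual
f_new = y_new - y_hat                # Y-space residual (once lab results arrive)
```

Here `cum_x` and `cum_y` play the role of the block-wise summary and `per_y` the per-quality-variable summary; the numbers themselves come from random data and are not comparable to those reported in the text.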

The NO region is defined by the ellipsoids of the MPLS model in Figures 6.49(a) and 6.49(b). Naturally, all of the reference batches fall into these regions. Note that Figure 6.49(a) describes the process measurements, while Figure 6.49(b) describes the final quality variables. All of the batches are also inside the control limits of the sum of squared residuals in both the process and quality spaces, as shown in Figures 6.49(e) and 6.49(f). Hence, the MPLS model can be used to discriminate between acceptable and 'poor' batches at the end of the batch. It is evident from the biplots of the inner relations of the MPLS model (Figures 6.49(c) and 6.49(d)) that there is a correlation between process and product variables and that this relation is linear because most of the

LV no. | X-block | Y-block |
