Multiblock PLS and PCA Methods for Modeling Complex Processes

Multiblock data analysis has its origins in path analysis and path modeling in sociology and econometrics. In situations where the number of variables is very large, or the process being analyzed is large and consists of many different stages, it is logical to group the variables in a meaningful way, either based on their similarity or on their origin in the system or process, and then to summarize each group, which is called a block. Each block may be further divided into sub-blocks according to process phases and stages (several X-blocks of process variables and/or Y-blocks of quality variables). If the focus is on the process measurement space, several MPCA models can be developed from the sub-blocks. In regression models, however, the separate projections of each block can be collected into a new block, and the resulting block scores are then treated as predictor and response variables on the "super level" (or upper level) of the model. The resulting models are called hierarchical projection models.

A version of multiblock PCA (MBPCA) called "consensus PCA" (CPCA) was introduced by Wold et al. [662] as a method for comparing several blocks of descriptor variables (process variables) measured on the same objects (batches). A consensus direction is sought among all the blocks. One of the classical applications of CPCA is the testing of foods and beverages, especially wines, by a number of judges (or samplers). Each judge (b is an index for judges) tastes each of the N samples and gives his/her opinion in terms of K_b variables such as sweetness, color, tannic taste, etc. A consensus matrix T (super score) will then contain the overall opinion of the judges about the same objects, while the super weight shows the relative importance of each judge in the consensus score (Figure 4.12). Wold et al. [665] also suggested a slightly different multiblock PCA algorithm called "hierarchical PCA" (HPCA). The only difference is the normalization step: in HPCA, the block scores t_b and the super score t_T are normalized, instead of the block loadings p_b and the super weight w_T as in CPCA, and in HPCA the super weight only shows whether the direction of the super score is present in the block. In both algorithms the super score shows the direction most dominant in the consensus block T. However, because the block scores are normalized in HPCA, it searches for the most dominant direction in these normalized scores. In CPCA, the block scores are combined in T as they are calculated for each block, and hence the super score is simply the direction most dominant in the block scores. This difference between the two methods intensifies as one direction becomes stronger in only a single block [665].
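To make the normalization difference concrete, the following is a minimal NumPy sketch of a single CPCA/HPCA component computed with a NIPALS-style iteration. The function name consensus_pca_component and its interface are illustrative only (they do not correspond to any library routine); the variant switch reflects the normalization difference discussed above.

```python
import numpy as np

def consensus_pca_component(blocks, variant="CPCA", max_iter=500, tol=1e-10):
    """One-component sketch of CPCA/HPCA (NIPALS-style iteration).

    blocks : list of (N x K_b) arrays, all measured on the same N objects.
    variant: "CPCA" normalizes the block loadings p_b and the super weight w_T;
             "HPCA" normalizes the block scores t_b and the super score t_T instead.
    Returns the consensus block T_block (N x B), super score t_T, super weight w_T.
    """
    t_T = blocks[0][:, [0]].copy()                  # start super score from any column
    for _ in range(max_iter):
        t_old = t_T
        t_b_list = []
        for Xb in blocks:
            p_b = Xb.T @ t_T / (t_T.T @ t_T)        # block loading/weight
            if variant == "CPCA":
                p_b = p_b / np.linalg.norm(p_b)     # CPCA: normalize block loading
            t_b = Xb @ p_b                          # block score
            if variant == "HPCA":
                t_b = t_b / np.linalg.norm(t_b)     # HPCA: normalize block score
            t_b_list.append(t_b)
        T_block = np.hstack(t_b_list)               # consensus (super) block T
        w_T = T_block.T @ t_T / (t_T.T @ t_T)       # super weight
        if variant == "CPCA":
            w_T = w_T / np.linalg.norm(w_T)         # CPCA: normalize super weight
        t_T = T_block @ w_T                         # super score
        if variant == "HPCA":
            t_T = t_T / np.linalg.norm(t_T)         # HPCA: normalize super score
        if np.linalg.norm(t_T - t_old) < tol * np.linalg.norm(t_T):
            break
    return T_block, t_T, w_T
```

For further components, each block X_b would be deflated (e.g., using the super score and the corresponding block loadings) before the iteration is repeated.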

To prevent the domination of one block over the others due to its larger variance, an initial block scaling is performed by modifying autoscaling according to a function of the number of variables m_b contained in each block (see Eq. 4.153). Typically this function is chosen between the square root and the fourth root of m_b, giving each block a total weight between 1 and √m_b [640, 665].

$$X = \left[\, X_1/\sqrt{m_{x1}},\; \cdots,\; X_b/\sqrt{m_{xb}} \,\right] \qquad (4.153)$$
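As a concrete illustration of Eq. 4.153, the sketch below block-scales a list of data blocks in NumPy. The helper name block_scale is hypothetical; the exponent argument spans the range mentioned above, from the fourth root (0.25) to the square root (0.5) of the number of variables in each block.

```python
import numpy as np

def block_scale(blocks, exponent=0.5):
    """Autoscale each block, then divide it by m_b**exponent.

    exponent = 0.5 reproduces the sqrt(m_b) scaling of Eq. 4.153 (total block
    weight 1); exponents down to 0.25 give each block a total weight up to sqrt(m_b).
    """
    scaled = []
    for Xb in blocks:
        Z = (Xb - Xb.mean(axis=0)) / Xb.std(axis=0, ddof=1)   # autoscaling
        m_b = Xb.shape[1]                                     # variables in block b
        scaled.append(Z / m_b**exponent)                      # block scaling
    return scaled
```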

Additional scaling factors can also be introduced for particular X and/or Y blocks in hierarchical models with many blocks, to scale the importance of those blocks up or down. Since larger blocks usually have greater importance than smaller ones, a mild weighting according to block size can be assigned [665].
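Such extra block weights can be applied by simple multiplication after block scaling; the factors below are purely illustrative, and the specific size-based weighting suggested in [665] is not reproduced here.

```python
# Hypothetical continuation of block_scale(): apply extra, user-chosen factors
# to emphasize or de-emphasize particular blocks.
scaled_blocks = block_scale([X1, X2, X3])                 # Eq. 4.153 scaling
extra_factors = [1.0, 0.5, 2.0]                           # illustrative choices only
weighted_blocks = [f * Xb for f, Xb in zip(extra_factors, scaled_blocks)]
```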

A convergence problem in these original algorithms was reported and resolved by Westerhuis et al. [640]. An adaptive version of HPCA for monitoring of batch processes has been reported by Rannar et al. [496]. The details of this advantageous technique are discussed along with case studies in Section 6.5.2.

The application of multiway MBPCA has also been suggested for batch process data. In multiway MBPCA, blocking of the variables is done as explained above. Kosanovich et al. [291] grouped the data from a batch polymerization reactor based on operational stages, while Undey et al. [604, 605] extended the same approach by dividing one process unit into two operational phases and including additional data from a second process unit for the case of multistage pharmaceutical wet granulation. A detailed example for this case is given in Section 6.4.5. A nonlinear version of multiway MBPCA based on artificial neural networks has also been suggested in the literature, with some improvement in the sensitivity of the monitoring charts [129, 130].

When the aim is to develop regression (projection-based) models between multiple X and Y blocks, hierarchical PLS (HPLS) or MBPLS methods can be used. HPLS is an extension of the CPCA method: after a CPCA cycle on the multiple X blocks, a PLS cycle is performed with the super block T and Y [640, 662, 665]. An application of HPLS was given by

Figure 4.12. CPCA and HPCA methods [640, 665]. The X data matrix is divided into b blocks (X_1, X_2, ..., X_b), with block b having m_xb variables.
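The following is a minimal NumPy sketch of one HPLS latent variable along the lines described above: a CPCA-style pass over the X blocks driven by the Y-side score u, followed by a PLS step between the super block T and Y. The function name and structure are illustrative assumptions and should not be read as the exact algorithm of [640, 662, 665].

```python
import numpy as np

def hpls_component(x_blocks, Y, max_iter=500, tol=1e-10):
    """One-latent-variable sketch of hierarchical PLS (HPLS)."""
    u = Y[:, [0]].copy()                          # starting Y score
    for _ in range(max_iter):
        u_old = u
        # CPCA-style cycle on the X blocks, using u in place of the super score
        t_b_list = []
        for Xb in x_blocks:
            w_b = Xb.T @ u / (u.T @ u)            # block weight
            w_b = w_b / np.linalg.norm(w_b)
            t_b_list.append(Xb @ w_b)             # block score
        T = np.hstack(t_b_list)                   # super block of block scores
        # PLS cycle between the super block T and Y
        w_T = T.T @ u / (u.T @ u)                 # super weight
        w_T = w_T / np.linalg.norm(w_T)
        t_T = T @ w_T                             # super score (predictor side)
        q = Y.T @ t_T / (t_T.T @ t_T)             # Y loading
        q = q / np.linalg.norm(q)
        u = Y @ q                                 # updated Y score
        if np.linalg.norm(u - u_old) < tol * np.linalg.norm(u):
            break
    return t_T, u, w_T, q
```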
