$$D_M = I + JK - 2A, \qquad D_A = JK(I - 1) - \sum_{i=1}^{A} (I + JK - 2i)$$

where $D_M$ and $D_A$ denote the degrees of freedom required to fit the $A$th component and the degrees of freedom after fitting the $A$th component, respectively. If $W$ exceeds unity, then this criterion suggests that the $A$th component could be included in the model.

### 4.2 Multivariable Regression Techniques

Several regression techniques can be used to relate two groups of variables such as process measurements X and quality variables Y. The availability of a model provides the opportunity to predict process or product variables and compare the measured and predicted values. The residuals between the predicted and measured values of the variables can be used to develop various SPM techniques and tools for identification of variables that have contributed to the out-of-control signal.

Multivariable linear regression is the most popular technique for model development. The model equation is

$$Y = X\beta + E, \qquad \hat{\beta} = (X^T X)^{-1} X^T Y \tag{4.10}$$

where $E$ is the residual, which is equal to 0 for the estimate $\hat{Y} = X\hat{\beta}$. A critical issue in using this approach for modeling multivariable processes is the colinearity among process variables. Colinearity causes numerical difficulties in computing the inverse $(X^T X)^{-1}$. Hence, the computation of the regression coefficients $\beta$ by the least-squares approach may not be possible. Even if $\beta$ is computed, the standard errors of the estimates of the $\beta$ coefficients associated with the colinear regressors become very large. This causes uncertainty and sensitivity in these $\beta$ estimates.
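The effect of colinearity on the standard errors can be illustrated numerically. The sketch below is not taken from the text: it assumes the usual least-squares estimate $(X^T X)^{-1} X^T y$ with coefficient covariance $MSE \, (X^T X)^{-1}$, uses synthetic data, and fits a model without an intercept for brevity. The helper names `ls_standard_errors` and `simulate` are hypothetical.

```python
import numpy as np

def ls_standard_errors(X, y):
    """Least-squares coefficients and their standard errors (no intercept)."""
    m, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    mse = float(resid @ resid) / (m - p)      # m - p d.f.: no intercept is fitted
    se = np.sqrt(mse * np.diag(xtx_inv))      # sqrt of diag of MSE * (X^T X)^-1
    return beta, se

rng = np.random.default_rng(0)
m = 200
x1 = rng.normal(size=m)
noise = rng.normal(size=m)
x2_indep = rng.normal(size=m)                 # unrelated to x1
x2_colin = x1 + 0.01 * rng.normal(size=m)     # almost a copy of x1

def simulate(x2):
    """Fit y = x1 + x2 + noise and return coefficients and standard errors."""
    X = np.column_stack([x1, x2])
    y = X @ np.array([1.0, 1.0]) + 0.5 * noise
    return ls_standard_errors(X, y)

beta_indep, se_indep = simulate(x2_indep)
beta_colin, se_colin = simulate(x2_colin)
print("independent regressors: beta =", beta_indep, "se =", se_indep)
print("colinear regressors:    beta =", beta_colin, "se =", se_colin)
```

With the colinear pair, the individual coefficients are poorly determined even though their sum is still estimated well, which is the sensitivity the paragraph above describes.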

Colinearity can be detected by standardizing all predictor variables (mean centered, unit variance) and computing correlations and coefficients of determination.

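$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{d_j}, \qquad d_j^2 = \sum_{i=1}^{m} (x_{ij} - \bar{x}_j)^2, \qquad i = 1, \ldots, m, \quad j = 1, \ldots, p. \tag{4.11}$$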

There is significant colinearity among some predictor variables if:

• The correlation between any two predictors exceeds 0.95 (only colinearity between two predictors can be assessed).

• The coefficient of determination $R_j^2$ of each predictor variable $j$ regressed on all the other predictor variables exceeds 0.90, or the variance inflation factor $VIF_j = (1 - R_j^2)^{-1}$ exceeds 10 (variable $j$ is colinear with one or more of the other predictors). $VIF_j$ is the $(j,j)$th diagonal element of the matrix $(Z^T Z)^{-1}$, where $Z = [z_{ij}]$. $R_j^2$ can be computed from the relationship between $R_j^2$ and $VIF_j$.

• Some of the eigenvalues of the correlation matrix $Z^T Z$ are less than 0.05. Large elements of the corresponding eigenvectors identify the predictor variables involved in the colinearity.
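The three checks above can be sketched in a few lines of NumPy. This is an illustrative sketch on synthetic data, not the authors' implementation; the function name `colinearity_diagnostics` is hypothetical, and the predictors are scaled to unit length so that $Z^T Z$ is the correlation matrix.

```python
import numpy as np

def colinearity_diagnostics(X):
    """Correlation matrix, VIFs, R_j^2 values and eigen-decomposition of Z^T Z."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    d = np.sqrt((centered ** 2).sum(axis=0))   # column lengths after centering
    Z = centered / d                           # unit-length scaling
    corr = Z.T @ Z                             # correlation matrix of predictors
    vif = np.diag(np.linalg.inv(corr))         # VIF_j = (j,j) element of inverse
    r2 = 1.0 - 1.0 / vif                       # from VIF_j = 1 / (1 - R_j^2)
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigenvalues in ascending order
    return corr, vif, r2, eigvals, eigvecs

# Synthetic example: x3 is nearly the sum of x1 and x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 0.01 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

corr, vif, r2, eigvals, eigvecs = colinearity_diagnostics(X)
print("pairwise correlations:\n", corr)
print("VIF:", vif)
print("R_j^2:", r2)
print("smallest eigenvalue:", eigvals[0])
print("its eigenvector:", eigvecs[:, 0])       # large entries flag involved variables
```

In this example no pairwise correlation reaches 0.95, yet every VIF is far above 10 and the smallest eigenvalue is far below 0.05. This shows why the pairwise check alone can miss colinearity that involves three or more predictors.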

Remedies in regression with colinear data include

• Stepwise regression

• Ridge regression

• Principal components regression

• Partial least squares (PLS) regression

These techniques will be introduced in the sections that follow.

### 4.2.1 Stepwise Regression

Predictor variables are added to or deleted from the prediction (regression) equation one at a time. Stepwise variable selection procedures are useful when a large number of candidate predictors is available. It is expected that only one of the strongly colinear variables will be included in the model. Major disadvantages of stepwise regression are the limitations in identifying alternative candidate subsets of predictors, and the inability to guarantee the optimality of the final model. The procedure is:

1. Fit $p$ single-variable regression models and calculate the overall model F-statistic for each. Select the model with the largest F-statistic. If the model is significant, retain the predictor variable and set $r = 1$.

2. Fit $p - r$ reduced models, each having the $r$ predictor variables selected in the previous stages of variable selection and one of the remaining candidate predictors. Select the model with the largest overall F-statistic. Check the significance of the model by using the partial F-statistic.

3. If the partial F-statistic is not significant, terminate the procedure. Otherwise, increment $r$ by 1 and return to step 2.

Computation of F-statistics:

Regression sum of squares: $SSR = \sum_i (\hat{y}_i - \bar{y})^2$, with $p$ degrees of freedom (d.f.). Error sum of squares: $SSE = \sum_i (y_i - \hat{y}_i)^2$, with d.f. $= m - p - 1$. Denote a model of order $r$ by $M_2$ and a model of order $r + 1$ by $M_1$, and their error sums of squares by $SSE_2$ and $SSE_1$, respectively. Then the partial F-statistic is

$$F = \frac{MSR(M_1|M_2)}{MSE_1}$$

where

$$MSR(M_1|M_2) = \frac{SSE_2 - SSE_1}{(r + 1) - r}, \qquad MSE_1 = \frac{SSE_1}{m - r - 2}. \tag{4.14}$$
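The forward selection procedure and its F-statistics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, every model includes an intercept, and a fixed threshold `f_crit` stands in for the tabulated critical value of the F distribution with $(1, m - r - 2)$ degrees of freedom.

```python
import numpy as np

def sse_and_overall_f(X, y, cols):
    """Fit y on the chosen columns (plus intercept); return SSE and overall F."""
    m = len(y)
    A = np.column_stack([np.ones(m), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef
    sse = float(((y - y_hat) ** 2).sum())
    ssr = float(((y_hat - y.mean()) ** 2).sum())
    q = len(cols)                                # predictors in this model
    f_overall = (ssr / q) / (sse / (m - q - 1))  # MSR / MSE
    return sse, f_overall

def forward_stepwise(X, y, f_crit=4.0):
    """Add one predictor at a time until the partial F falls below f_crit."""
    m, p = X.shape
    selected = []
    sse_prev = float(((y - y.mean()) ** 2).sum())  # intercept-only model
    while len(selected) < p:
        remaining = [j for j in range(p) if j not in selected]
        fits = {j: sse_and_overall_f(X, y, selected + [j]) for j in remaining}
        best = max(remaining, key=lambda j: fits[j][1])  # largest overall F
        sse_new = fits[best][0]
        r = len(selected)
        # Partial F: (SSE_2 - SSE_1) / 1 divided by SSE_1 / (m - r - 2).
        # For r = 0 this equals the overall F of the single-variable model.
        f_partial = (sse_prev - sse_new) / (sse_new / (m - r - 2))
        if f_partial < f_crit:
            break
        selected.append(best)
        sse_prev = sse_new
    return selected

# Synthetic example: x2 is nearly a copy of x0, x3 is unrelated to y.
rng = np.random.default_rng(1)
m = 200
x0 = rng.normal(size=m)
x1 = rng.normal(size=m)
x2 = x0 + 0.05 * rng.normal(size=m)
x3 = rng.normal(size=m)
X = np.column_stack([x0, x1, x2, x3])
y = 4.0 * x0 - 2.0 * x1 + 0.5 * rng.normal(size=m)

selected = forward_stepwise(X, y)
print("selected predictor indices, in order of entry:", selected)
```

Because all candidate models at a given stage have the same number of predictors, picking the largest overall F-statistic is equivalent to picking the smallest SSE.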

### 4.2.2 Ridge Regression

The computation of regression coefficients $\beta$ in Eq. 4.10 is modified by introducing a ridge parameter $k$ such that

$$\hat{\beta} = (X^T X + kI)^{-1} X^T Y. \tag{4.15}$$

Standardized ridge estimates $\hat{\beta}_j$, $j = 1, \ldots, p$, are calculated for a range of values of $k$ and are plotted versus $k$. This plot is called a ridge trace. The $\beta$ estimates usually change dramatically when $k$ is initially incremented by a small amount from 0. Some $\beta$ coefficients may even change sign. As $k$ is increased, the trace stabilizes. A $k$ value that stabilizes all $\beta$ coefficients is selected and the final values of $\beta$ are estimated.

A good estimate of the $k$ value is obtained as

$$k = \frac{p \, MSE}{\beta^{*T} \beta^{*}} \tag{4.16}$$

where the $\beta^*$'s are the least-squares estimates for the standardized predictor variables, and $MSE$ is the least-squares mean squared error, $SSE/(m - p - 1)$.

Ridge regression estimators are biased. The tradeoff for stabilization and variance reduction in regression coefficient estimators is the bias in the estimators and the increase in the squared error.
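A ridge trace can be tabulated with a short NumPy sketch. The code below is illustrative only: it assumes unit-length standardized predictors, a centered response, and the ridge estimate $(Z^T Z + kI)^{-1} Z^T y$. The data-driven value of $k$ uses $p \, MSE / (\beta^{*T} \beta^{*})$, a common choice that is treated here as an assumption. All function names are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit length."""
    centered = X - X.mean(axis=0)
    return centered / np.sqrt((centered ** 2).sum(axis=0))

def ridge_coefficients(Z, y, k):
    """Ridge estimate (Z^T Z + k I)^-1 Z^T y; k = 0 gives least squares."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ y)

def ridge_parameter_estimate(Z, y):
    """Assumed rule: k = p * MSE / (beta*^T beta*), beta* = least squares."""
    m, p = Z.shape
    beta_ls = ridge_coefficients(Z, y, 0.0)
    resid = y - Z @ beta_ls
    mse = float(resid @ resid) / (m - p - 1)   # one d.f. spent on the mean
    return p * mse / float(beta_ls @ beta_ls)

# Synthetic example with two nearly colinear predictors.
rng = np.random.default_rng(2)
m = 100
x1 = rng.normal(size=m)
x2 = x1 + 0.05 * rng.normal(size=m)
x3 = rng.normal(size=m)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + x3 + rng.normal(size=m)

Z = standardize(X)
yc = y - y.mean()

# Ridge trace: coefficients for an increasing sequence of k values.
k_values = [0.0, 0.001, 0.01, 0.1, 1.0]
for k in k_values:
    print(f"k = {k:<6} beta = {ridge_coefficients(Z, yc, k)}")

k_est = ridge_parameter_estimate(Z, yc)
print("estimated k:", k_est)
print("ridge coefficients at estimated k:", ridge_coefficients(Z, yc, k_est))
```

Plotting each coefficient against $k$ instead of printing it gives the ridge trace. The length of the coefficient vector shrinks as $k$ grows, which reflects the bias introduced in exchange for reduced variance.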

### 4.2.3 Principal Components Regression

Principal components regression (PCR) is one of the techniques to deal with ill-conditioned data matrices by regressing the system properties (e.g., quality measurements) on the principal component scores of the measured variables (e.g., flow rates, temperature). The implementation starts by representing the data matrix X with its scores matrix T using the transformation $T = XP$. The number of principal components to retain in the model must be determined, as in PCA, such that it optimizes the predictive power of the PCR model. This is generally done by using cross-validation. Then, the regression equation becomes

$$Y = TB + E \tag{4.17}$$

where the optimum matrix of regression coefficients $B$ is obtained as

$$B = (T^T T)^{-1} T^T Y. \tag{4.18}$$

Substitution of Eq. 4.18 into Eq. 4.17 leads to trivial E's. The inversion of $T^T T$ should not cause any problems due to the mutual orthogonality of the scores. Score vectors corresponding to small eigenvalues can be left out in order to avoid colinearity problems. Since principal components regression is a two-step method, there is a risk that useful predictive information would be discarded with a principal component that is excluded. Hence caution must be exercised while leaving out vectors corresponding to small eigenvalues.
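The two steps of PCR can be sketched with a singular value decomposition. This is an illustrative sketch, not the authors' implementation: the names `pcr_fit` and `pcr_predict` are hypothetical, the data are only mean centered, and the number of components is passed in directly rather than chosen by cross-validation.

```python
import numpy as np

def pcr_fit(X, Y, n_components):
    """Regress centered Y on the first n_components score vectors of centered X."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # Rows of Vt are the loading vectors (eigenvectors of Xc^T Xc).
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    T = Xc @ P                                   # scores, T = XP
    # T^T T is diagonal with entries s^2, so its inverse is a simple division.
    B = (T.T @ Yc) / (s[:n_components] ** 2)[:, None]
    return x_mean, y_mean, P, B

def pcr_predict(model, X):
    """Predict Y for new rows of X from a fitted PCR model."""
    x_mean, y_mean, P, B = model
    return (X - x_mean) @ P @ B + y_mean

# Synthetic example: the fourth process variable nearly duplicates the first.
rng = np.random.default_rng(3)
n = 100
X = rng.normal(size=(n, 3))
X = np.column_stack([X, X[:, 0] + 0.05 * rng.normal(size=n)])
coef = np.array([[1.0, 0.5], [0.0, -1.0], [2.0, 0.0], [1.0, 0.5]])
Y = X @ coef + 0.1 * rng.normal(size=(n, 2))

model = pcr_fit(X, Y, n_components=3)            # drop the smallest component
full = pcr_fit(X, Y, n_components=X.shape[1])    # keep every component
resid = Y - pcr_predict(model, X)
print("residual sum of squares with 3 components:", float((resid ** 2).sum()))
```

Keeping all components reproduces the ordinary least-squares fit, so any benefit of PCR comes from the components that are left out. That is also where predictive information can be lost, as the paragraph above cautions.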

### 4.2.4 Partial Least Squares

Partial Least Squares (PLS), also known as Projection to Latent Structures, develops a biased regression model between X and Y. It selects latent variables so that variation in X which is most predictive of the product quality data Y is extracted. PLS works on the sample covariance matrix $(X^T Y)(Y^T X)$ [180, 181, 243, 349, 368, 661, 667]. Measurements on $m$ process variables taken at $n$ different times are arranged into an $(n \times m)$ process data matrix X. The $p$ quality variables are given by the corresponding $(n \times p)$ matrix Y. Data (both X and Y blocks) are usually preprocessed prior to PLS analysis. PLS modeling works better when the data are fairly symmetrically distributed and have fairly constant "error variance". Data are usually centered and scaled to unit variance because in PLS the influence of a variable on the model parameters increases with its variance. Centering and scaling issues were discussed earlier in Section 4.1. The PLS model can be built by using the nonlinear iterative partial least squares (NIPALS) algorithm. The PLS model consists of outer relations (X and Y blocks individually) and an inner relation (linking both blocks). The outer relations for the X and Y blocks are, respectively,