
is smallest [18, 262]. If all misclassification costs are equal, the event described by data $\mathbf{x}$ will be assigned to the population $\pi_k$ for which $\sum_{i=1, i \neq k}^{g} p_i f_i(\mathbf{x})$ is smallest. This means that the omitted term $p_k f_k(\mathbf{x})$ is largest. Consequently, the minimum ECM rule for equal misclassification costs becomes [262]:

Allocate $\mathbf{x}$ to $\pi_k$ if $p_k f_k(\mathbf{x}) > p_i f_i(\mathbf{x})$ for all $i \neq k$,

given the prior probabilities, the density functions, and the misclassification costs (when they are not equal). This classification rule is identical to the one that maximizes the "posterior" probability $P(\pi_k \mid \mathbf{x})$ (the probability that $\mathbf{x}$ comes from $\pi_k$, given that $\mathbf{x}$ was observed), where

$$P(\pi_k \mid \mathbf{x}) = \frac{p_k f_k(\mathbf{x})}{\sum_{i=1}^{g} p_i f_i(\mathbf{x})} = \frac{(\text{prior}) \times (\text{likelihood})}{\sum \left[ (\text{prior}) \times (\text{likelihood}) \right]}$$
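As a minimal numerical sketch of the equal-cost rule and the posterior probability above (the priors, means, and standard deviations below are invented for illustration, not taken from the text):

```python
import numpy as np

# Two hypothetical 1-D Normal populations with known parameters.
priors = np.array([0.7, 0.3])   # p_1, p_2
means = np.array([0.0, 3.0])
stds = np.array([1.0, 1.5])

def normal_pdf(x, mu, sigma):
    """Univariate Normal density f_i(x)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior(x):
    """P(pi_k | x) = p_k f_k(x) / sum_i p_i f_i(x)."""
    joint = priors * normal_pdf(x, means, stds)   # prior x likelihood
    return joint / joint.sum()

def allocate(x):
    """Minimum ECM rule, equal costs: maximize p_k f_k(x)."""
    return int(np.argmax(priors * normal_pdf(x, means, stds))) + 1

print(allocate(0.5), posterior(0.5).round(3))   # 1 [0.925 0.075]
```

Maximizing $p_k f_k(\mathbf{x})$ and maximizing the posterior give the same allocation, since the denominator $\sum_i p_i f_i(\mathbf{x})$ is common to all populations.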

If the populations follow Normal distributions with mean vectors $\mu_i$, covariance matrices $\Sigma_i$, and generalized variance $|\Sigma_i|$ (the determinant of the covariance matrix), $f_i(\mathbf{x})$ is defined as

$$f_i(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \mu_i)^T \Sigma_i^{-1} (\mathbf{x} - \mu_i) \right]$$

and all misclassification costs are equal, then $\mathbf{x}$ is allocated to $\pi_k$ if

$$\ln p_k f_k(\mathbf{x}) = \ln p_k - \frac{p}{2} \ln(2\pi) - \frac{1}{2} \ln |\Sigma_k| - \frac{1}{2} (\mathbf{x} - \mu_k)^T \Sigma_k^{-1} (\mathbf{x} - \mu_k)$$

is the largest of all $\ln p_i f_i(\mathbf{x})$, $i = 1, \cdots, g$.

The constant $\frac{p}{2} \ln(2\pi)$ is the same for all populations and can be ignored in discriminant analysis. The quadratic discrimination score for the $i$th population, $d_i^Q(\mathbf{x})$, is defined as [262]

$$d_i^Q(\mathbf{x}) = \ln p_i - \frac{1}{2} \ln |\Sigma_i| - \frac{1}{2} (\mathbf{x} - \mu_i)^T \Sigma_i^{-1} (\mathbf{x} - \mu_i), \quad i = 1, \cdots, g \tag{8.16}$$

The generalized variance $|\Sigma_i|$, the prior probability $p_i$, and the Mahalanobis distance all contribute to the quadratic score $d_i^Q(\mathbf{x})$. Using the discriminant scores, the minimum total probability of misclassification rule for Normal populations and unequal covariance matrices becomes [262]:

Allocate $\mathbf{x}$ to $\pi_k$ if $d_k^Q(\mathbf{x})$ is the largest of all $d_i^Q(\mathbf{x})$, $i = 1, \cdots, g$.
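The quadratic score of Eq. (8.16) and this allocation rule can be sketched directly; the two 2-D populations below are hypothetical, chosen only to illustrate the computation:

```python
import numpy as np

def quadratic_score(x, mu, Sigma, p):
    """d_i^Q(x) = ln p_i - 0.5 ln|Sigma_i| - 0.5 (x-mu_i)^T Sigma_i^{-1} (x-mu_i)."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)          # numerically stable ln|Sigma|
    return np.log(p) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(Sigma, diff)

# Hypothetical populations: (mean vector, covariance matrix, prior).
populations = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.2], [0.2, 1.0]]), 0.5),
    (np.array([3.0, 3.0]), np.array([[2.0, 0.0], [0.0, 0.5]]), 0.5),
]

def allocate(x):
    """Allocate x to the population with the largest quadratic score."""
    return 1 + int(np.argmax([quadratic_score(x, m, S, p)
                              for m, S, p in populations]))

print(allocate(np.array([2.5, 2.8])))   # near the second population's mean
```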

In practice, the population means and covariances ($\mu_i$ and $\Sigma_i$) are unknown. Computations are based on historical data sets of classified observations, and the sample mean vectors ($\bar{\mathbf{x}}_i$) and covariance matrices ($S_i$) are used in Eq. (8.16).

A simplification is possible if the population covariance matrices $\Sigma_i$ are equal for all $i$. Then $\Sigma_i = \Sigma$ and Eq. (8.16) reduces to

$$d_i^Q(\mathbf{x}) = \ln p_i - \frac{1}{2} \ln |\Sigma| - \frac{1}{2} \mathbf{x}^T \Sigma^{-1} \mathbf{x} + \mu_i^T \Sigma^{-1} \mathbf{x} - \frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i \tag{8.17}$$

Since the second and third terms are independent of $i$, they are the same for all $d_i^Q(\mathbf{x})$ and can be ignored in classification. The remaining terms consist of a constant for each $i$ ($\ln p_i - \frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i$) and a linear combination of the components of $\mathbf{x}$, so a linear discriminant score is defined as

$$d_i(\mathbf{x}) = \mu_i^T \Sigma^{-1} \mathbf{x} - \frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \ln p_i \tag{8.18}$$

An estimate $\hat{d}_i(\mathbf{x})$ of $d_i(\mathbf{x})$ can be computed based on the pooled estimate $S_p$ of $\Sigma$ [262]:

$$\hat{d}_i(\mathbf{x}) = \bar{\mathbf{x}}_i^T S_p^{-1} \mathbf{x} - \frac{1}{2} \bar{\mathbf{x}}_i^T S_p^{-1} \bar{\mathbf{x}}_i + \ln p_i, \quad i = 1, \cdots, g \tag{8.19}$$

where

$$S_p = \frac{1}{n_1 + n_2 + \cdots + n_g - g} \left[ (n_1 - 1) S_1 + \cdots + (n_g - 1) S_g \right] \tag{8.20}$$

and $n_i$ denotes the data length (number of observations) in class $i$. The minimum total probability of misclassification rule for Normal populations with equal covariance matrices becomes [262]:

Allocate $\mathbf{x}$ to $\pi_k$ if $\hat{d}_k(\mathbf{x})$ is the largest of all $\hat{d}_i(\mathbf{x})$, $i = 1, \cdots, g$.
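Eqs. (8.19) and (8.20) can be sketched end-to-end from simulated training data; the class sizes and distribution parameters below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical data for g = 2 classes of 2-D observations.
X1 = rng.normal([0.0, 0.0], 1.0, size=(40, 2))
X2 = rng.normal([3.0, 2.0], 1.0, size=(60, 2))
groups = [X1, X2]
n_total = sum(len(X) for X in groups)
priors = [len(X) / n_total for X in groups]       # p_i estimated by class frequency

# Pooled covariance estimate, Eq. (8.20).
Sp = sum((len(X) - 1) * np.cov(X, rowvar=False) for X in groups) / (n_total - len(groups))

def linear_score(x, xbar, p):
    """Estimated linear discriminant score, Eq. (8.19)."""
    v = np.linalg.solve(Sp, xbar)                 # Sp^{-1} xbar_i
    return v @ x - 0.5 * v @ xbar + np.log(p)

def classify(x):
    """Allocate x to the class with the largest estimated score."""
    scores = [linear_score(x, X.mean(axis=0), p) for X, p in zip(groups, priors)]
    return int(np.argmax(scores)) + 1

print(classify(np.array([0.2, -0.1])), classify(np.array([2.9, 2.2])))
```

Because $S_p$ is shared by all classes, the decision boundaries between classes are linear in $\mathbf{x}$, unlike the quadratic boundaries produced by Eq. (8.16).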

### FDD by Integrating PCA and Discriminant Analysis

An integrated statistical method was developed [488] for automated detection of abnormal process operation and discrimination between several source causes by utilizing PCA and discriminant analysis techniques for multivariable continuous processes. The method was developed for monitoring continuous processes deviating from their steady-state operation. The absence of significant autocorrelation, as well as stationarity and ergodicity, should be established before utilizing this method. The method does not rely on visual inspection of plots; consequently, it is suitable for processes described by large sets of variables. It can be extended to batch processes by making appropriate modifications, but such extensions have not been reported. The method was illustrated by monitoring the Tennessee Eastman industrial challenge problem [137].

Detection and diagnosis of multiple simultaneous faults is an important concern. In a real process, combinations of faults may occur. An intervention policy to improve process operation may need to take into account each of the contributing faults. Diagnosis should be able to identify major contributors and correctly indicate which, if any, secondary faults are occurring [487]. Most FDD techniques rely on the assumption of a single fault. Raich and Cinar proposed several statistical measures to assess the overlap between models describing process behavior caused by single faults. The similarity between models indicates the potential for confusion and masking of the effects (symptoms) of multiple faults. Quantitative measures to compare multivariable models permit decisions about their usefulness and discrimination capability. They also provide a priori information about faults that are likely to be masked by other faults.

PCA is used to develop a model describing variation under normal operation (NO). This PC model is used to detect deviations from NO, such as excessive variation from the normal target or unusual patterns of variation. Operation under various known upsets can also be modeled if sufficient historical data are available. These fault models are then used to isolate source causes of faulty operation based on similarity to previous upset behavior. Using PCs for several sets of data under different operating conditions (NO and with various upsets), statistics can be computed to describe distances of the current operating point to regions representing other areas of operation. Both score distances and model residuals are used to measure such distance-based statistics.
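A minimal sketch of the detection step, assuming simulated NO data (two latent factors driving five measured variables; all parameters below are illustrative, not from the text). The squared residual (squared prediction error) measures the distance of an observation from the plane spanned by the retained PCs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated normal-operation (NO) data: 200 samples, 5 correlated variables.
latent = rng.normal(size=(200, 2))
W = rng.normal(size=(2, 5))
X_no = latent @ W + 0.1 * rng.normal(size=(200, 5))

# PC model of NO: center the data, keep a = 2 principal components.
mean = X_no.mean(axis=0)
_, _, Vt = np.linalg.svd(X_no - mean, full_matrices=False)
P = Vt[:2].T                           # loadings (5 x 2)

def spe(x):
    """Squared prediction error: residual distance from the NO PC model."""
    xc = x - mean
    resid = xc - P @ (P.T @ xc)        # part of x the PC model cannot explain
    return float(resid @ resid)

x_normal = X_no[0]                     # a sample from normal operation
x_fault = X_no[0] + 3.0 * Vt[2]        # perturbed off the retained PC plane
print(spe(x_normal) < spe(x_fault))    # the fault inflates the residual
```

A control limit (e.g., a 95% confidence level on the squared residual) would be computed from the NO data to turn this distance into a detection decision; the threshold computation is omitted here.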

### Fault Diagnosis

PC models for specific faults can be developed using historical data sets collected when each fault was active. When current measurements exhibit out-of-control behavior, a likely cause for this behavior can be assigned by pattern matching using scores, residuals, or their combination.

**Score Discriminant.** Assuming that the PC models retain sufficient variation to discriminate between the possible causes, and that the scores have independent Normal distributions, the maximum likelihood that data $\mathbf{x}$ are from fault model $i$ is indicated by the minimum distance. This minimum can be determined from the maximum of $d_i$ expressed, for example, by quadratic discrimination (Eq. 8.16),

where $\mathbf{t}_i = \mathbf{x} P_i$ is the location of the original observation $\mathbf{x}$ in the PC space of fault model $i$, $S_i$ is the covariance along the PCs of fault model $i$, and $p_i$ is the adjustment for the overall occurrence likelihood of fault $i$ [262]. Figure 8.4 illustrates the fault isolation process. Score discriminants are calculated using the PC models for the various known faults (Figure 8.4c); this semilog plot shows the negative of the discriminant. The most likely fault is chosen over time by selecting the fault corresponding to the maximum discriminant (the curve with the lowest magnitude). Figure 8.4d reports the fault selected at each sampling time. Fault 3, which is the correct fault, is reported consistently after the first 10 sampling times.
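The score-discriminant idea can be sketched as follows: fit a PC model to historical data for each known fault, project a new observation into each model's score space, and evaluate the quadratic discriminant of Eq. (8.16) on the scores. The two fault data sets below are simulated and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_pc_model(X, a=2):
    """Fit a PC model (mean, loadings, score covariance) to fault data X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:a].T
    T = (X - mean) @ P                    # scores of the training data
    return mean, P, np.cov(T, rowvar=False)

def score_discriminant(x, model, p):
    """Quadratic discriminant (Eq. 8.16) evaluated on the PC scores t = x P."""
    mean, P, S = model
    t = (x - mean) @ P
    _, logdet = np.linalg.slogdet(S)
    return np.log(p) - 0.5 * logdet - 0.5 * t @ np.linalg.solve(S, t)

# Hypothetical historical data for two known faults.
F1 = rng.normal([0.0, 0.0, 0.0], [1.0, 1.0, 0.1], size=(100, 3))
F2 = rng.normal([4.0, 4.0, 0.0], [1.0, 1.0, 0.1], size=(100, 3))
models = [fit_pc_model(F1), fit_pc_model(F2)]
priors = [0.5, 0.5]

x = np.array([3.8, 4.2, 0.0])             # new out-of-control observation
d = [score_discriminant(x, m, p) for m, p in zip(models, priors)]
print(np.argmax(d) + 1)                   # most likely fault
```

Plotting $-d_i$ over time for each fault model, as in Figure 8.4c, and picking the curve with the smallest magnitude at each sample reproduces the isolation logic described above.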
