# FDD with Fisher's Discriminant Analysis

A problem that emerges when statistical techniques are used in multivariate classification and clustering is what Bellman calls the *curse of dimensionality*. Principal components analysis (PCA) is discussed as a linear dimensionality reduction technique in Section 4.1. PCA is optimal in terms of capturing the variability among the data. Another technique, Fisher's discriminant analysis (FDA), is optimal in terms of maximizing the separation among the set of classes. Suppose that there is a set of $n$ ($= n_1 + n_2 + \cdots + n_g$) $p$-dimensional samples $\mathbf{x}_1, \ldots, \mathbf{x}_n$ belonging to classes $\pi_i$, $i = 1, \ldots, g$. Fisher suggested transforming the multivariate observations $\mathbf{x}$ to another coordinate system that enhances the separation of the samples belonging to each class $\pi_i$. In this section, the FDA concept is illustrated first for separating data belonging to two classes $\pi_1$ and $\pi_2$. Then, FDA is generalized to process data with many classes. Finally, classification and diagnosis with FDA are discussed.

### FDA for data belonging to two classes

Fisher suggested transforming the multivariate observations $\mathbf{x}$ to univariate observations $z$ such that the $z$'s derived from populations $\pi_1$ and $\pi_2$ are separated as much as possible. If the multivariate observations have more than two variables, additional $z$ variables ($z_2, z_3, \ldots$) may be necessary for enhancing the separation. The total scatter of the data points ($\mathbf{S}_t$) consists of two types of scatter, within-class scatter $\mathbf{S}_w$ and between-class scatter $\mathbf{S}_b$. The objective of the transformation proposed by Fisher is to maximize $\mathbf{S}_b$ while minimizing $\mathbf{S}_w$. Fisher's approach does not require that the populations have Normal distributions, but it implicitly assumes that the population covariance matrices are equal, because a pooled estimate of the common covariance matrix ($\mathbf{S}_{pl}$) is used (Eq. 8.20).
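The two scatter quantities above can be sketched numerically. In this illustration (not taken from the text), two synthetic Gaussian classes stand in for $\pi_1$ and $\pi_2$, and the class sizes, means, and variable names are assumptions for demonstration only:

```python
# Sketch: within-class scatter S_w and between-class scatter S_b for two
# classes. The data are synthetic; sizes and means are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(40, 2))   # class pi_1
X2 = rng.normal(loc=[2.0, 1.5], scale=1.0, size=(30, 2))   # class pi_2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: each class's scatter about its own mean, summed
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Between-class scatter: scatter of the class means relative to each other
d = (m1 - m2).reshape(-1, 1)
Sb = d @ d.T
```

Fisher's transformation seeks a direction that makes the projected between-class scatter large relative to the projected within-class scatter.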

The transformation is based on a weighted sum of the observations $\mathbf{x}$. In the case of two classes, the linear combination of the samples takes the values $z_{11}, \ldots, z_{1n_1}$ for the observations from the first population $\pi_1$ and the values $z_{21}, \ldots, z_{2n_2}$ for the observations from the second population $\pi_2$. Denote the weight vector that transforms $\mathbf{x}$ to $z$ by $\mathbf{w}$. FDA is illustrated for the case of two normal populations with a common covariance matrix in Figure 8.5. First consider separation using either the $x_1$ or the $x_2$ axis. The diagrams by the abscissa and ordinate indicate that several observations belonging to one class ($\pi_1$) are mixed with observations belonging to the other class ($\pi_2$). The linear discriminant function $z = \mathbf{w}^T\mathbf{x}$ defines the line in the upper portion of Figure 8.5 that observations are projected on to maximize the ratio of between-class scatter to within-class scatter [262, 139]. One may visualize changing the slope of the line to see how the number of observations of a specific class that move into the region of the other class changes.

The separation of the two sets of $z$'s can be assessed in terms of the difference between $\bar z_1$ and $\bar z_2$ expressed in standard deviation units:

$$\frac{|\bar z_1 - \bar z_2|}{s_z}$$

Figure 8.5. Fisher's discriminant technique for two populations ($g = 2$), $\pi_1$ (\*) and $\pi_2$ (o), with equal covariances.

where $s_z^2$ is the pooled estimate of the variance of $z$:

$$s_z^2 = \frac{\sum_{j=1}^{n_1}\left(z_{1j}-\bar z_1\right)^2 + \sum_{j=1}^{n_2}\left(z_{2j}-\bar z_2\right)^2}{n_1+n_2-2}$$
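This separation measure can be sketched as follows. The data and the candidate weight vector `w` are illustrative assumptions; any direction can be assessed this way:

```python
# Sketch: separation of the projections z = w^T x in standard-deviation
# units, for an arbitrary (assumed) weight vector w. Data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(40, 2))  # class pi_1
X2 = rng.normal(loc=[2.0, 1.5], scale=1.0, size=(30, 2))  # class pi_2
w = np.array([1.0, 1.0])                                  # candidate direction

z1, z2 = X1 @ w, X2 @ w
n1, n2 = len(z1), len(z2)

# Pooled estimate of the variance of z
s2 = (np.sum((z1 - z1.mean())**2) + np.sum((z2 - z2.mean())**2)) / (n1 + n2 - 2)

# Separation of the two classes in standard-deviation units
separation = abs(z1.mean() - z2.mean()) / np.sqrt(s2)
```

A larger value of `separation` indicates that the two projected populations overlap less along the chosen direction.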

The linear combination that maximizes the separation is

$$\hat z = \hat{\mathbf{w}}^T\mathbf{x} = \left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right)^T \mathbf{S}_{pl}^{-1}\,\mathbf{x}$$

where $\bar{\mathbf{x}}_1$ and $\bar{\mathbf{x}}_2$ are the sample mean vectors of the two classes and $\mathbf{S}_{pl}$ is the pooled estimate of the common covariance matrix.
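A minimal sketch of this maximizing direction, assuming synthetic two-class data and the standard pooled covariance estimate; all names and sizes here are illustrative:

```python
# Sketch: Fisher's discriminant direction w = S_pl^{-1} (xbar_1 - xbar_2),
# computed with the pooled covariance estimate (assumes equal class
# covariances). Data are synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(40, 2))  # class pi_1
X2 = rng.normal(loc=[2.0, 1.5], scale=1.0, size=(30, 2))  # class pi_2
n1, n2 = len(X1), len(X2)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Pooled covariance estimate S_pl
S1 = np.cov(X1, rowvar=False)
S2 = np.cov(X2, rowvar=False)
Spl = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

# Fisher's direction: solve S_pl w = (m1 - m2) instead of forming the inverse
w = np.linalg.solve(Spl, m1 - m2)

# Univariate scores for each class along the discriminant direction
z1, z2 = X1 @ w, X2 @ w
```

Solving the linear system with `np.linalg.solve` is preferred over explicitly inverting `Spl`, since it is numerically more stable.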