## Functional Data Analysis

Functional data are data generated by an inherent functional relationship in a process. The relationship may not be known explicitly, but its existence is assumed on the basis of knowledge about the process. Variations of daily temperature over the year, the height and weight growth of a child over the years, and the trajectories of process variables during a batch are all functional data. The goals of functional data analysis (FDA) are to represent the data in ways that facilitate further analysis, to determine patterns in the data and study important sources of these patterns, to explain variations in output variables in terms of input variations, and to conduct comparative studies between sets of data [495]. Hence, modeling, analysis, and diagnosis are conducted in a framework that differs from the "analysis of large data sets" approach. Indeed, a functional observation is treated as a single datum rather than as a sequence of individual observations: the focus is on the trajectory of a process variable during the batch rather than on the several hundred values measured for that variable.

The FDA approach detects and removes characteristics in data by applying a linear operator, consisting of weighted sums of derivatives of various orders, rather than by subtracting the assumed characteristics from the original data. The derivative terms in the linear operator provide physical insight, such as the acceleration in production of a certain biological species during a specific period of the batch. Differential equation representations of dynamic processes are well accepted in the physical sciences and engineering, and interpreting these differential equations provides significant information about the characteristics of the process. The approach can also be used for nonlinear trajectories: computing the mean trajectory and centering the data with respect to it before applying the principal differential analysis (PDA) method eliminates most of the nonlinearity in the data.
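A small numerical illustration of removing a characteristic with an operator instead of by subtraction: the operator $L = 4I + D^2$ annihilates any sinusoid of frequency 2, whatever its amplitude and phase, so applying it to $x(t) = 3\sin 2t + 0.5t$ leaves only $4 \cdot 0.5t = 2t$. Subtraction would require knowing the amplitude and phase of the sinusoid; the operator does not. This is a sketch under stated assumptions: the helper `apply_operator` and the test signal are hypothetical, the weights are constant, and derivatives are approximated with `np.gradient` on a dense, noise-free grid (real data would be smoothed first).

```python
import numpy as np

def apply_operator(x, t, weights):
    """Apply L = w_0 I + w_1 D + ... + w_{m-1} D^{m-1} + D^m (constant weights,
    hypothetical helper) to samples x(t); derivatives via repeated np.gradient."""
    m = len(weights)
    derivs = [x]
    for _ in range(m):
        derivs.append(np.gradient(derivs[-1], t))
    # Weighted sum of D^0..D^{m-1} plus the leading term D^m (w_m = 1).
    return sum(w * d for w, d in zip(weights, derivs)) + derivs[m]

t = np.linspace(0.0, 10.0, 2001)
x = 3.0 * np.sin(2.0 * t) + 0.5 * t          # periodic characteristic plus a linear drift
Lx = apply_operator(x, t, weights=[4.0, 0.0])  # L = 4 I + D^2 annihilates sin(2t)
# Away from the grid edges, Lx is close to 2 t: the sinusoid has been removed.
```

The first and last few samples are excluded from any comparison because `np.gradient` uses one-sided differences at the boundaries, which degrades the repeated (second) derivative there.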

FDA starts by converting raw functional data (measurements taken during the batch) to a functional representation. This usually involves smoothing the data, since most data include measurement noise, estimating derivatives of various orders of the data trajectories, and developing functional relations that include these derivatives. The $K$ functional observations, represented by the raw data vector $\mathbf{x} = (x_1, x_2, \ldots, x_k, \ldots, x_K)^T$, are used to define a much smaller set of $m$ functions that approximate these data efficiently. Smoothing ensures that the latent function possesses a certain number of derivatives, which may not be evident from the raw data vector. Denoting the latent function at time $t_k$ by $z(t_k)$, the observations are $x_k = z(t_k) + \epsilon_k$, where $\epsilon_k$ is the measurement error that contributes to the roughness of the data. Derivatives should not be estimated by computing differences of the raw data, because differencing magnifies the measurement errors.
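The effect of differencing on measurement error can be checked numerically. The sketch below assumes a hypothetical latent function $z(t) = \sin t$ with Gaussian noise of standard deviation 0.05, and compares the derivative obtained by differencing the raw data with the derivative of a smooth fit. A degree-9 least-squares polynomial (`numpy.polynomial.Polynomial.fit`) is used only as a simple stand-in for the spline or roughness-penalty smoothers usually employed in FDA; the degree is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 201)
z = np.sin(t)                                # latent function z(t) (hypothetical)
x = z + rng.normal(scale=0.05, size=t.size)  # x_k = z(t_k) + eps_k
true_dz = np.cos(t)

# (a) Differencing the raw data: the noise is divided by the (small) step size.
dz_diff = np.gradient(x, t)

# (b) Smooth first, then differentiate the fitted function.
fit = np.polynomial.Polynomial.fit(t, x, deg=9)
dz_smooth = fit.deriv()(t)

def rmse(e):
    """Root-mean-square of an error vector."""
    return float(np.sqrt(np.mean(e ** 2)))

err_diff = rmse(dz_diff - true_dz)
err_smooth = rmse(dz_smooth - true_dz)
```

With a step of about 0.031, the central differences turn noise of size 0.05 into derivative errors of order 1, whereas the derivative of the smooth fit stays much closer to $\cos t$.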

The principal differential analysis (PDA) method identifies the linear differential operator

$$L = w_0 I + w_1 D + \cdots + w_{m-1} D^{m-1} + D^m \tag{4.137}$$

where $D^j$ denotes the $j$th derivative operator and $I$ the identity, that comes as close as possible to satisfying the homogeneous linear differential equation $Lx_k = 0$ for each observation $x_k$ [493]. The methodology outlined here and the nomenclature used follow Ramsay and Silverman [495], where a detailed treatment of the topic is provided. The differential equation model

$$D^m x_k = -w_0 x_k - w_1 D x_k - \cdots - w_{m-1} D^{m-1} x_k + \epsilon_k$$

that satisfies the data as closely as possible is sought. Since the operator is expected to annihilate the data functions $x_k$ as nearly as possible, $Lx_k$ can be regarded as the residual error from the fit provided by $L$. A least squares approach can therefore be used to fit the differential equation model, using the minimization of the sum of squared errors (SSE) criterion

$$\mathrm{SSE}(L) = \sum_{k=1}^{K} \int \left[(Lx_k)(t)\right]^2 \, dt.$$
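To make the SSE criterion concrete, the sketch below evaluates $\mathrm{SSE}(L)$ for curves that exactly satisfy $D^2x + 0.4\,Dx + 4x = 0$. Several assumptions are made for illustration: the helpers `damped_curves`, `trapezoid`, and `sse` are hypothetical, the weights are constant (PDA in general allows time-varying $w_j(t)$), derivatives are computed analytically, and the integral is approximated with the trapezoidal rule. The operator the curves satisfy gives an SSE of essentially zero; dropping the damping term does not.

```python
import numpy as np

def damped_curves(t, coeffs, b, c):
    """Exact [x_k, D x_k, D^2 x_k] for x_k(t) = Re(C_k exp(lam t)) (hypothetical helper).
    lam solves lam^2 + b lam + c = 0, so each curve satisfies x'' + b x' + c x = 0.
    Requires c > b^2 / 4 (oscillatory case)."""
    lam = complex(-b / 2.0, np.sqrt(c - b ** 2 / 4.0))
    E = np.exp(lam * t)                               # shape (T,)
    C = np.asarray(coeffs, dtype=complex)[:, None]    # shape (K, 1)
    return [np.real(C * lam ** j * E) for j in range(3)]  # each (K, T)

def trapezoid(y, t):
    """Trapezoidal rule along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(t), axis=-1)

def sse(weights, derivs, t):
    """SSE(L) = sum_k integral of (L x_k)(t)^2 dt, constant w_0..w_{m-1}, w_m = 1."""
    m = len(weights)
    Lx = derivs[m] + sum(w * d for w, d in zip(weights, derivs[:m]))  # residuals (K, T)
    return float(np.sum(trapezoid(Lx ** 2, t)))

t = np.linspace(0.0, 5.0, 501)
derivs = damped_curves(t, [1.0, 0.5 - 1.0j, -0.3 + 0.8j], b=0.4, c=4.0)
sse_true = sse([4.0, 0.4], derivs, t)   # the operator the curves actually satisfy
sse_wrong = sse([4.0, 0.0], derivs, t)  # omits the damping weight w_1
```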

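The criterion $\mathrm{SSE}(L)$ is minimized to determine the $m$ weight functions in Eq. (4.137), namely $\mathbf{w} = (w_0, w_1, \ldots, w_{m-1})$. Once the operator $L$ has been determined by estimating $\mathbf{w}$, a set of $m$ linearly independent basis functions $u_j$ that satisfy $Lu_j = 0$, and thus span the null space of $L$, can be computed. The weights $\mathbf{w}$ can be determined by pointwise minimization of the sum of squared errors (SSEP) criterion:

$$\mathrm{SSEP}(t) = \sum_{k=1}^{K} (Lx_k)^2(t) = \sum_{k=1}^{K} \left[\sum_{j=0}^{m} w_j(t)\,(D^j x_k)(t)\right]^2 \tag{4.140}$$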

with $w_m(t) = 1$ for all $t$. The notation $(Lx_k)^2(t)$ underlines the pointwise nature of the time-variant data [495]. Defining the $K \times m$ regressor matrix $\Gamma(t)$ and the $K$-dimensional dependent variable vector $\Delta(t)$ as

$$\Gamma(t) = \left[(D^j x_k)(t)\right]_{k=1,\ldots,K;\; j=0,\ldots,m-1} \quad \text{and} \quad \Delta(t) = \left[-(D^m x_k)(t)\right]_{k=1,\ldots,K},$$

the least squares solution of Eq. (4.140) gives the weights $w_j(t)$:

$$\mathbf{w}(t) = \left[\Gamma(t)^T \Gamma(t)\right]^{-1} \Gamma(t)^T \Delta(t).$$
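The pointwise least squares step can be sketched as follows: at each time point, the $K \times m$ matrix $\Gamma(t)$ and the vector $\Delta(t)$ are assembled from the derivative curves and solved for $\mathbf{w}(t)$. The data are hypothetical test curves satisfying $D^2x + 0.4\,Dx + 4x = 0$, with exact derivatives, so the estimates should recover $w_0(t) = 4$ and $w_1(t) = 0.4$ at every $t$. The helper names `damped_curves` and `pointwise_pda` are illustrative assumptions, not part of the method's published description.

```python
import numpy as np

def damped_curves(t, coeffs, b, c):
    """Exact [x_k, D x_k, D^2 x_k] for x_k(t) = Re(C_k exp(lam t)) (hypothetical helper);
    every curve satisfies x'' + b x' + c x = 0. Requires c > b^2 / 4."""
    lam = complex(-b / 2.0, np.sqrt(c - b ** 2 / 4.0))
    E = np.exp(lam * t)
    C = np.asarray(coeffs, dtype=complex)[:, None]
    return [np.real(C * lam ** j * E) for j in range(3)]  # each of shape (K, T)

def pointwise_pda(derivs):
    """Estimate w_0(t), ..., w_{m-1}(t) by least squares at each time point.
    derivs = [D^0 x, ..., D^m x], each of shape (K, T); returns shape (m, T)."""
    m = len(derivs) - 1
    T = derivs[0].shape[1]
    w = np.empty((m, T))
    for i in range(T):
        Gamma = np.column_stack([d[:, i] for d in derivs[:m]])  # K x m regressor matrix
        Delta = -derivs[m][:, i]                                # K-vector -(D^m x_k)(t)
        w[:, i] = np.linalg.lstsq(Gamma, Delta, rcond=None)[0]
    return w

t = np.linspace(0.0, 5.0, 101)
derivs = damped_curves(t, [1.0, 0.5 - 1.0j, -0.3 + 0.8j], b=0.4, c=4.0)
w = pointwise_pda(derivs)  # row 0 estimates w_0(t), row 1 estimates w_1(t)
```

`np.linalg.lstsq` is used instead of forming $[\Gamma^T\Gamma]^{-1}$ explicitly; it is numerically more stable and still returns a (minimum-norm) solution when $\Gamma(t)$ is rank deficient, although in that case the weights are no longer uniquely determined.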

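The weights $\mathbf{w}$ must be available at a fine level of detail for computing the basis functions. The resolution of $\mathbf{w}$ depends on the smoothness of the derivatives $D^j x_k$. Pointwise computation of $\mathbf{w}$ is computationally intensive for larger orders $m$. Furthermore, $\Gamma(t)$ may not be of full rank. One way to circumvent these problems is to approximate $\mathbf{w}$ by a fixed set of basis functions $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_L)$. Standard basis function families such as polynomials, Fourier series, B-spline functions, or wavelets could be used. The weights can then be approximated as

$$w_j(t) = \sum_{l=1}^{L} c_{jl}\,\phi_l(t) = \mathbf{c}_j^T \boldsymbol{\phi}(t), \quad j = 0, \ldots, m-1,$$

where the $mL$ coefficients $\mathbf{c} = [c_{jl}]_{j=0,\ldots,m-1;\; l=1,\ldots,L}$ are stored as a column vector. The estimates $\hat{\mathbf{c}}$ are the solution of $R\mathbf{c} = -\mathbf{s}$, which results from minimizing the quadratic form

$$\mathrm{SSE}(L) = \sum_{k=1}^{K} \int \left[(Lx_k)(t)\right]^2 dt = C + \mathbf{c}^T R\,\mathbf{c} + 2\,\mathbf{c}^T \mathbf{s}$$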

where $C$ is a constant independent of $\mathbf{c}$, $R = [R_{ij}]_{i,j=0,\ldots,m-1}$, and $\mathbf{s} = [\mathbf{s}_j]_{j=0,\ldots,m-1}$, with