Methods for Linear Data Based Model Development

Process models may be developed by using either first principles such as material and energy balances, or process input and output information. The advantages of first principle models include the ability to incorporate the scientist's view of the process into the model, describe the internal dynamics of the process, and explain the behavior of the process. Their disadvantages are the high cost of model development, the bias that they may have because of the model developer's decisions, and the limitations on including details due to lack of information about specific model parameters. Often, some physical, chemical or transport parameters are computed using empirical relations, or they are derived from experimental data. In either case, there is some uncertainty about the actual value of the parameter. As details are added to the model, it may become too complex and too large to run on a computer within an acceptable amount of time. This constraint has a moving upper limit, since new developments in computer hardware and software technologies permit faster execution. Even so, fundamental models may be too large to execute fast enough for process monitoring and control activities, which require fast model execution so that process operation can be regulated in a timely manner. The alternative model development paradigm is based on developing relations from process data.

Input-output models are much less expensive to develop. However, they only describe the relationships between the process inputs and outputs, and their utility is limited to features that are included in the data set collected for model development. They can be used for interpolation but they should not be used for extrapolation. There are numerous well established techniques for linear input-output model development. Nonlinear input-output model development techniques have been proposed during the last four decades, but they have not been widely accepted. There are more than twenty different paradigms, and depending on the type of nonlinearities in the data, some paradigms work better than others for describing a specific process. The design of experiments to collect data and the amount of data available have an impact on the accuracy and predictive capability of the model developed. Data collection experiments should be designed such that all key features of the process are excited in the frequency ranges of interest. Since the model may have terms that are composed of combinations of inputs and/or outputs, exciting and capturing the interactions among variables is crucial. Hence, the use of routine operational data for model development, without any consideration of exciting the key features of the model, may yield good fits to the data but provide models that have poor predictive ability. The amount of data needed for model development increases in the following order: first principle models, linear input-output models, and nonlinear input-output models.

Biochemical processes have become increasingly instrumented in recent years. More variables are being measured and data are being recorded more frequently [304, 655]. This creates a data overload, and most of the useful information gets hidden in large data sets. There is a large amount of correlated or redundant information in these process measurements. This information must be compressed in a manner that retains the essential information about the process, extracts process knowledge from measurement information, and presents it in a form that is easy to display and interpret. A number of methods from multivariate statistics, systems theory and artificial intelligence for data based model development are presented in this chapter.

Model development may have various goals. These goals warrant consideration of the following cases. One case is the interpretation and modeling of one block of data such as measurements of process variables. Principal components analysis (PCA) may be useful for this to retain essential process information while reducing the size of the data set. A second case is the development of a relationship between two groups of data such as process variables and product variables, the regression problem. PCA regression or partial least squares (PLS) regression techniques would be good candidates for addressing this problem. Discrimination and classification are activities related to process monitoring that lead to fault diagnosis. PCA and PLS based techniques as well as artificial neural networks (ANN) and knowledge-based systems may be considered for such problems. Since all these techniques are based on process data, the reliability of data is critical for obtaining dependable results from the implementation of these techniques.

Data-based models may be linear or nonlinear and describe only the process behavior captured by the data collected. Methods for development of linear models are easier to implement and more popular. Since most monitoring and control techniques are based on the linear framework, use of linear models is a natural choice. However, nonlinear empirical models that are more accurate over a wider range of operating conditions are desirable for processes with strong nonlinearities. ANNs provide one framework for nonlinear model development. Extensions of PCA and PLS to develop nonlinear models have also been proposed. Several nonlinear time series modeling techniques have been reported. Nonlinear system science methods provide a different framework for nonlinear model development and model reduction. This chapter will focus on linear data-based modeling techniques. References will be provided for their extensions to the nonlinear framework. ANNs will also be discussed in the context of model development. Chapter 5 will introduce nonlinear modeling techniques based on systems science methods.

Section 4.1 introduces PCA. Various multivariate regression techniques are outlined in Section 4.2. Input-output modeling of dynamic processes with time series and state-space modeling techniques, state estimation with Kalman filters and batch process modeling with local model systems are introduced in Section 4.3. Functional data analysis that treats data as representation of continuous functions is discussed in Section 4.4. Statistical methods for modeling batch processes such as multivariate PCA and multivariate PLS, multivariate covariates regression and three-way techniques like PARAFAC and Tucker are introduced in Section 4.5. ANNs and their use in dynamic model development are presented in Section 4.6. Finally, Section 4.7 introduces extensions of linear techniques to nonlinear model development, nonlinear time series modeling methods, and nonlinear PLS techniques.

4.1 Principal Components Analysis

Principal Components Analysis (PCA) is a multivariate statistical technique that can extract the essential information from a data set reported as a single block of data such as process measurements. It was originally developed by Pearson [462] and became a standard multivariate statistical technique [18, 254, 262, 263]. PCA techniques are used to develop models describing the expected variation under normal operation (NO). A reference data set is chosen to define the NO for a particular process based on the data collected from various periods of plant operation when the performance is good. The PCA model development is based on this data set. This model can be used for outlier detection, data reconciliation, and detection of deviations from NO that indicate excessive variation from the normal target or unusual patterns of variation. Operation under various known upsets can also be modeled, if sufficient historical data are available, to develop automated diagnosis of source causes of abnormal process behavior [488].

Principal Components (PC) are a new set of coordinate axes that are orthogonal to each other. The first PC indicates the direction of largest variation in the data, and the second PC indicates the direction of largest variation unexplained by the first PC, orthogonal to the first PC (Fig. 4.1). The number of PCs is usually less than the number of measured variables.


PCA involves the orthogonal decomposition of the set of process measurements along the directions that explain the maximum variation in the data. For a continuous process, the elements of the data matrix X are $x_{ij}$, where $i = 1, \ldots, n$ indexes the samples and $j = 1, \ldots, m$ indexes the variables. The directions extracted by the orthogonal decomposition of X are the eigenvectors $p_j$ of $X^TX$, or the PC loadings:

$$X = t_1 p_1^T + t_2 p_2^T + \cdots + t_A p_A^T + E \qquad (4.1)$$

where X is an $n \times m$ data matrix with n observations of m variables, E is an $n \times m$ matrix of residuals, and the superscript T denotes the transpose of a matrix. Ideally the dimension A is chosen such that there is no significant process information left in E, and E represents random error. The eigenvalues of the covariance matrix of X define the corresponding amount of variance explained by each eigenvector. The projection of the measurements (observations) onto the eigenvectors defines new points in the measurement space. These points constitute the score matrix T, whose columns are $t_j$ given in Eq. 4.1. The relationship between T, P, and X can also be expressed as

$$T = XP, \qquad X = TP^T + E$$

where P is an $m \times A$ matrix whose jth column is the jth eigenvector of $X^TX$, and T is an $n \times A$ score matrix.
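The decomposition into scores, loadings and residuals can be illustrated with a minimal NumPy sketch. The function name `pca_model`, the synthetic data, and the choice A = 2 are assumptions made for illustration only; they do not come from the text.

```python
import numpy as np

def pca_model(X, A):
    """Fit an A-component PCA model X = T P^T + E to a mean-centered X."""
    # Eigenvectors of X^T X are the PC loadings; eigh returns ascending order.
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
    P = eigvecs[:, order[:A]]                  # m x A loading matrix
    T = X @ P                                  # n x A score matrix
    E = X - T @ P.T                            # n x m residual matrix
    # Fraction of the total variation captured by the A retained PCs.
    explained = eigvals[order[:A]].sum() / eigvals.sum()
    return T, P, E, explained

rng = np.random.default_rng(0)
# Synthetic data (assumption): 3 correlated variables driven by 2 latent factors.
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.8, 0.2],
                   [0.0, 0.5, 1.0]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 3))
X = X - X.mean(axis=0)                         # mean-center before PCA

T, P, E, explained = pca_model(X, A=2)
print(T.shape, P.shape, E.shape)
print("fraction of variation explained:", explained)
print("relative residual norm:", np.linalg.norm(E) / np.linalg.norm(X))
```

Because the synthetic data are generated from two latent factors plus a small amount of noise, two PCs should leave a residual E that is small compared with X, which is the situation the text describes as E representing random error.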

The PCs can be computed by spectral decomposition [262], computation of eigenvalues and eigenvectors, or singular value decomposition. The covariance matrix S ($S = X^TX/(n-1)$) of the data matrix X can be decomposed by spectral decomposition as

$$S = P L P^T$$

where P is a unitary matrix¹ whose columns are the normalized eigenvectors of S and L is a diagonal matrix that contains the ordered eigenvalues of S. The scores T are computed by using the relation T = XP.

Singular value decomposition is

$$X = U \Lambda V^T$$

where the columns of U are the normalized eigenvectors of $XX^T$, the columns of V are the normalized eigenvectors of $X^TX$, and $\Lambda$ is a 'diagonal' matrix having as its elements the positive square roots of the magnitude ordered eigenvalues of $X^TX$. For an $n \times m$ matrix X, U is $n \times n$, V is $m \times m$ and $\Lambda$ is $n \times m$. Let the rank of X be denoted as p, $p \le \min(m, n)$. The first p rows of $\Lambda$ make a $p \times p$ diagonal matrix, and the remaining $n - p$ rows are filled with zeros. Term by term comparison of the last two equations yields

$$P = V, \qquad T = U \Lambda$$
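The agreement between the two routes can be checked numerically. The sketch below is illustrative only: it uses synthetic data, assumes the covariance matrix is normalized by the number of observations minus one, and uses the economy-size SVD, so the matrix of singular values is square rather than n x m.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data (assumption): 50 observations of 4 correlated variables.
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
X = X - X.mean(axis=0)                      # mean-center
n = X.shape[0]

# Route 1: spectral decomposition of the covariance matrix.
S = X.T @ X / (n - 1)
l, P = np.linalg.eigh(S)                    # ascending eigenvalues
l, P = l[::-1], P[:, ::-1]                  # reorder to decreasing eigenvalues

# Route 2: singular value decomposition of X (economy size).
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Eigenvectors are defined only up to sign, so align signs before comparing.
signs = np.sign(np.sum(P * V, axis=0))

loadings_match = np.allclose(P, V * signs)             # loadings equal V
scores_match = np.allclose(X @ P, (U * sv) * signs)    # scores equal U times singular values
eigs_match = np.allclose(l, sv**2 / (n - 1))           # eigenvalues from singular values
print(loadings_match, scores_match, eigs_match)
```

In practice the SVD route is often preferred numerically because it avoids forming the cross-product matrix explicitly, although either route gives the same loadings and scores up to sign.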

For a data set that is described well by two PCs, the data can be displayed in a plane. The data are scattered as an ellipse whose axes are in the direction of PC loadings in Figure 4.1. For a higher number of variables, the data will be scattered as an ellipsoid.

¹ A unitary matrix A is a complex matrix in which the inverse is equal to the conjugate of the transpose: $A^{-1} = A^{*}$. Orthogonal matrices are unitary. If A is a real unitary matrix then $A^{-1} = A^T$.

Figure 4.2. Data preprocessing: Scaling of the variables. (a) Raw data, (b) After mean-centering only, (c) After variance-scaling only, (d) After autoscaling (mean-centering and variance-scaling) [145, 181].

PCA is sensitive to scaling and outliers. The process data matrix should be mean-centered and scaled properly before the analysis. Scaling is usually performed by dividing all the values for a certain variable by the standard deviation for that variable so that the variance in each variable is unity (Figure 4.2(d)), corresponding to the assumption that all variables are equally important. If a priori knowledge about the relative importance of the variables is available, important variables can be given a slightly higher scaling weight than that corresponding to unit variance scaling [82, 206].
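Autoscaling as described above can be sketched in a few lines of NumPy. The helper name `autoscale` and the synthetic measurements (three variables on very different scales) are assumptions chosen for illustration.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)               # sample standard deviation
    return (X - mean) / std, mean, std

rng = np.random.default_rng(2)
# Synthetic measurements (assumption): three variables on very different scales.
X = rng.normal(loc=[350.0, 7.0, 0.02],
               scale=[15.0, 0.3, 0.005],
               size=(100, 3))

Xs, mean, std = autoscale(X)
print("column means after autoscaling:", Xs.mean(axis=0))
print("column variances after autoscaling:", Xs.var(axis=0, ddof=1))

# A new observation is scaled with the mean and standard deviation
# of the reference data, not with its own statistics.
x_new = np.array([360.0, 7.2, 0.018])
x_new_scaled = (x_new - mean) / std
print("scaled new observation:", x_new_scaled)
```

Returning the mean and standard deviation alongside the scaled data is a design choice that makes it straightforward to apply the same scaling to observations collected later. A variable judged more important could be multiplied by a weight slightly above one after this step, although the text does not prescribe specific values.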

The selection of the appropriate number of PCs, or the maximum significant dimension A, is critical for developing a parsimonious PCA model [253, 262, 528]. A quick method for computing an approximate value for A is to add PCs to the model until the percent of the variation explained by adding additional PCs becomes small. Inspect the ratio $\sum_{i=1}^{A} l_i / \sum_{i=1}^{m} l_i$, where $l_i$ are the diagonal elements of L, the diagonal matrix of ordered eigenvalues of the covariance matrix S. The sum of the variances of the original variables is equal to the trace (tr(S)), the sum of the diagonal elements of S:

$$\operatorname{tr}(S) = \sum_{j=1}^{m} s_{jj}$$

where $s_{jj}$ is the jth diagonal element of S and tr(S) = tr(L). A more precise but computationally more demanding method is cross-validation [309, 659]. Cross-validation is implemented by excluding part of the data, performing PCA on the remaining data, and computing the prediction error sum of squares (PRESS) using the data held out (excluded from model development). The process is repeated until every observation is left out once. The order A is selected as the one that minimizes the overall PRESS. Two additional criteria for choosing the optimal number of PCs, both related to cross-validation, have also been proposed by Wold [659] and Krzanowski [309]. Wold [659] proposed checking the following ratio

$$R = \frac{\mathrm{PRESS}_A}{\mathrm{RSS}_{A-1}}$$

where $\mathrm{PRESS}_A$ is the PRESS for the model with A components and $\mathrm{RSS}_A$ is the residual sum of squares after the Ath principal component based on the PCA model. When R exceeds unity upon addition of another PC, it suggests that the Ath component did not improve the prediction power of the model and it is better to use $A - 1$ components. Krzanowski [309] suggested the ratio
