## N

This process is an observable Markov model because the process outputs are the set of states and each state corresponds to a deterministically observable event. The outputs in any given state are not random. A simple Markov chain with three states is presented in Figure 8.9.

### Hidden Markov Models

Consider the case where the stochastic process is observed only through a set of stochastic processes that produce the sequence of observations. The states are not directly observable, they are inferred from the observe tions. An example would be a process consisting of a few containers that are filled with marbles of multiple colors. Each container has marbles of all colors, but the fraction of marbles of a certain color in each container varies. At each observation time, a marble rolls out through a channel that connects all containers, but the observer does not see the container that dispenses the marble and does not know the rule that selects the container that dispenses the marble. The dispensing of marbles generates a finite observation sequence of colors which can be modeled as the observable output of an HMM. A simple HMM of this process would have each state corresponding to one of the containers and for which a marble color probability is defined for each state. The choice of containers is determined by the state transition matrix A = [a^] of the HMM.

The elements of the HMM as depicted in Figure 8.10, include [279, 484]:

N The number of states in the model. The model is called ergodic if any state can be reached from any other state. Generally all states are interconnected. Denote the state at time t as qt-

M The number of distinct observation symbols per state, the alphabet size. The individual symbols are denoted as V = {vlf • • • , vM}

A = {a,,} The state probability distribution where an = P[qt = Sjlqt-i = 1 < i,j < N (8.110) Figure 8.9. A Markov chain with three states (labelled si, s2 and s3) and selected transitions (ai3 and a.32 set to 0).

B = {bj(k)} The observation symbol probability distribution, where bj(k) = P\ot = wk\qt = Sj] 1 < k < M (8.111)

defines the symbol distribution in state Sj, for 1 < j < N.

C = {cj} The initial state distribution {c2} also called the initial state occupancy probability ci = P\q1=si) I <i < N (8.112)

A complete specification of an HMM requires specification of two model parameters (N and M), observation symbols, and three sets of probability measures (A, B, and C). The set of probabilities is written compactly as A = (A,B,C). This parameter set defines a probability measure for O, P( 0|A).

Given the values of N, M, A, B and A, the HMM can generate an observation sequence O = (01 • • • ot) where each observation ot is one of the symbols in V, and T is the total number of observations in the sequence.

The three basic problems for HMMs to develop a useful model for FDD

are:

1. Given the observation sequence O = (01 • • • Or) and a model A = (A, B, C), how can the probability of the observation sequence P(0\A) be computed efficiently? The solution provides a measure of similarity (goodness of fit) between the observation sequence and a sequence that would be generated with the given model. Hence, if these sequences are similar, one may accept that the model used is describing the process that generated these observations.

2. Given the observation sequence O = (oi ■ • • ot) and a model A = (A, B, C), how can the most likely state sequence q = (<71 <72 • • • Qt) that "explains" best the given observation sequence O be determined? This is an attempt to find the hidden part of the model, the "correct" state sequence. In general, there is no analytic solution and usually a solution based on some optimality criteria is obtained.

3. How are the model parameters A = (A,B,C) adjusted to maximize P(0|A)? The parameter re-estimation is carried out using a set of observation sequences called training data in an iterative manner until the model parameters maximize P(0|A).

The HMM is formulated in two stages: training stage that solves Problem 3, and testing stage that addresses Problems 1 and 2. Problem 1 is solved using either the forward or the backward computation procedure, Problem 2 is solved using the Viterbi algorithm, and Problem 3 is solved using the Expectation-Maximization (EM) (also called Baum-Welch) method [279, 484]. Details of the algorithms, implementation issues, and illustrations are given in both references.

Pattern recognition systems combined with finite-state HMMs have been used for fault detection in dynamic systems . The HMM parameters are derived from gross failure statistics. Wavelet-domain HMMs have also been proposed for feature extraction and trend analysis . A wavelet-based smoothing algorithm filters high-frequency noise. A trajectory shape analysis technique called triangular episodes converts the smoothed data into semi-qualitative mode and membership functions transforms the information to a symbolic representation. The symbolic data is classified with a set of sequence matching HMMs for trend analysis. This approach is extended to detection and classification of abnormal process situations using multidimensional hidden Markov trees [37, 579]. The case studies discussed in these publications illustrate the application of the method to various continuous processes.