## K

Figure 6.20. An example of time warping of two features vectors (made on the same object); time warping functions c\ and c2 map the individual time indices i and j. respectively, to the common time index k [484].

The warping function C is required to minimize the overall cost function for any path [406, 446, 484, 533]

where D[i(k),j(k)] = D(t, r) is a normalized total distance between the two trajectories along the path of length K, w(k) is a weighting function for the local distances and N(w) is a normalization factor which is a function of the weighting function. Now, the problem is reduced into an optimization problem that can be written as

where D*(C) denotes the minimum normalized total distance and C* the optimal path. This optimization problem can be efficiently solved by using dynamic programming techniques [49, 53]. Dynamic programming is a well known optimization technique used extensively in operations research for solving sequential decision problems. The decision rules about determining the next point (location) to be visited following a current point i is called "policy". Dynamic programming determines the policy that leads to the minimum cost, moving from point 1 to point i based on the Principle of Optimality defined by Bellman [49] as

An optimal policy has the property that, whatever the initial state and decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

For time normalization problem this principle can be recast as follows [270, 406, 484]:

1. A globally optimal path is also locally optimal. If C* is determined as the optimal path, any (i,j) point on C* is also optimal.

2. The optimal path to the grid point (i,j) only depends on the values of the previous grid points.

Before delving into the integration/formulation of dynamic programming and time warping, constraints on the warping function must be discussed.

### Warping Function Constraints

There are some restrictions imposed on the warping function. Since it represents a model of the time axis fluctuation in feature patterns (batch trajectories), it should properly approximate the properties of the actual time axis. Typical essential DTW constraints include:

1. Endpoint Constraints (Boundary Conditions).

For batch process trajectories, this condition is easy to visualize:

Beginning point : ¿(1) = 1, j(l) = 1 or c(l) = (1,1) (6.57) Ending point : i{K) = N, j(K) = M or c(K) = (N, M)

### 2. Monotonicity Conditions.

The temporal order of the measurements collected on each variable is of crucial importance to their physical meaning. Hence, imposing a reasonable monotonicity constraint (monotonically nondecreasing sequence requirement) to maintain the temporal order while performing

DTW is necessary:

As shown in Figure 6.20, any path on which D(t,r) is calculated will not have a negative slope. Basically, this constraint prevents the possibility of reverse warping along the time axis that is physically meaningless.

3. Local Continuity Constraints.

There are two reasons to impose local continuity constraints:

(i) To guarantee that excessive compression or expansion of the time scale is avoided (neither too steep nor too gentle a gradient of fluctuations should be allowed).

(ii) To compare events in their natural temporal order while keeping any potential loss of information to a minimum [406, 484, 533].

This is a very important constraint since it defines the set of potential preceding points (predecessors) in the grid. Obviously, it is possible to specify many sets of such local constraints. In order to visualize the concept, consider the following local transition rule between two consecutive points on the path c, an example suggested by Sakoe and

Inequalities in Eq.6.59 impose that the warping function c should not skip any points in both vectors. This can also be formulated as

If (i,j) is the /cth path point in the grid shown in Figure 6.21, then the previous path point, c(k — 1) can only be chosen from a set of preceding points (Eq. 6.60). In this simple example, [¿(A:), j(k)] can only be reached by either from [i(k),j(k - 1)] or [i(k - 1), j(k - 1)] or [i(k — 1 ),j(k)]. This is also known as "no slope constraint" case. Obviously, the control of the slope would be of importance for the correct alignment. Sakoe and Chiba [533] have proposed a slope constraint on the warping function using a slope intensity measure P — q/p

Chiba [533], i{k) - i(k - 1) < 1 and j{k) - j{k - 1) < 1. (6.59)

'[i(k),j(k-l)] c(fc-l)= [i(k-l),j(k-l)] Ji(fc-1 ),j(k)}

(Figure 6.22). The intensity measures ensure that, if the warping function c(k) moves forward in the horizontal or vertical direction p consecutive times, then it is not allowed to proceed further in the same direction before moving at least q times in the diagonal direction (Figure 6.22). The larger the P value the more rigidly the warping function will be restricted. If the slope intensity (P) is too severe, DTW would not work effectively. If it is too lax, then the discrimination between the trajectories will be degraded. Consequently, it is necessary to set a proper value. Sakoe and Chiba [533] have reported that they have observed the best results with P = 1 although that depends on the system under investigation. This constraint also helps reduce search paths through the grid while maintaining a reasonable distortion in time axis. A number of slope intensities has been proposed [252, 406, 533]. Local continuity can also be expressed in terms of incremental path changes [406, 484], A path can be defined as a sequence of moves associated with a pair of coordinate increments as

where the indices refer to normalized time increments (or location of the points on the warping function). For illustration purposes, consider the slope intensity P = 1 case of Sakoe and Chiba [533] that has also been studied by Myers et al. [406] (Figure 6.23). According to this description, for instance, the path in Figure 6.19 contains the following transition sequence: