
$\log(r_1/r_2)$

Here, "small" r refers to a value that is much less than the radius of the data set, yet much greater than the distance between second nearest neighbors in the data set. Second nearest neighbors are used so that we never compare with a zero distance, which would occur if the data set contained a repeating pattern. Although it is computationally straightforward to obtain a numerical value for d with this algorithm, the accuracy of that value is often doubtful. A rule of thumb is to use at least $10^{d/2}$ data points to compute d [526].
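As a concrete illustration, the sketch below estimates a box counting dimension with a generic grid-based count in Python with NumPy. It is a stand-in under stated assumptions, not necessarily the algorithm behind Eq (5.23): it counts the occupied boxes N(r) for several small box sizes r and fits the slope of log N(r) against log(1/r). The function names, the box sizes, and the test point sets are hypothetical; `min_points_rule_of_thumb` only evaluates the $10^{d/2}$ rule quoted above.

```python
import numpy as np

def box_counting_dimension(points, sizes):
    """Estimate the box counting dimension of a point set (generic sketch).

    points: array of shape (N, m), e.g. delay vectors.
    sizes:  box edge lengths r, assumed "small" relative to the data extent
            but large relative to the point spacing.
    """
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.min(axis=0)            # shift so box indices start at 0
    sizes = np.asarray(sizes, dtype=float)
    counts = []
    for r in sizes:
        # Integer box index of every point along every axis.
        boxes = np.floor(pts / r).astype(np.int64)
        # N(r): number of distinct occupied boxes.
        counts.append(len(np.unique(boxes, axis=0)))
    # The slope of log N(r) versus log(1/r) estimates d.
    slope, _ = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return slope

def min_points_rule_of_thumb(d):
    """Rule of thumb from the text: use at least 10**(d/2) data points."""
    return 10.0 ** (d / 2.0)

# Usage: a straight segment (d = 1) and a uniformly filled square (d = 2).
rng = np.random.default_rng(0)
sizes = np.geomspace(0.02, 0.1, 8)
t = np.linspace(0.0, 1.0, 20000)
d_line = box_counting_dimension(np.column_stack([t, 0.5 * t]), sizes)
d_square = box_counting_dimension(rng.random((100000, 2)), sizes)
print(round(d_line, 2), round(d_square, 2))
```

The segment should give a slope close to 1 and the square a slope close to 2. Box sizes outside the "small" range described above (close to the data extent, or comparable to the point spacing) bias the fitted slope.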

Note that the box counting dimension computed from Eq (5.23) need not be an integer. In fact, it is certainly non-integer for a strange attractor, hence the name strange. The phase space dimension m, on the other hand, is an integer, and to host the attractor it must be greater than or equal to the box counting dimension of the attractor, d. Although we stick with the box counting dimension in our arguments, there are other definitions of (fractal) dimension, such as the correlation and Lyapunov dimensions. Selecting one or another does not change the line of thought, because different measures of the dimension of the same strange attractor should not differ so much that an integer lies between them. Thus, no matter which definition we use for the fractal dimension, we have the same necessary condition, m > d, and the same sufficient condition, m > 2d.

Using these guidelines, one may be tempted to use an embedding dimension equal to the next integer after 2d. In an ideal case, with infinitely many noise-free data points, such a choice would be sound and safe. In a more realistic setup, however, if m is chosen too large, the noise in the data will decrease the density of points defining the attractor. In this analysis we are interested in finite-dimensional deterministic systems, whereas noise is an infinite-dimensional process that fills every available dimension of a reconstructed phase space. Increasing m beyond what is minimally required therefore unnecessarily increases the level of contamination of the data with noise [669]. One method for determining the minimal sufficient embedding dimension is the false nearest neighbor method [276].
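The necessary condition m > d and the sufficient condition m > 2d reduce to simple integer arithmetic. A minimal Python sketch; the dimension values 2.06 and 1.26 are illustrative inputs only:

```python
import math

def embedding_dimension_bounds(d):
    """Smallest integers m with m > d (necessary) and m > 2d (sufficient).

    d is a (generally non-integer) fractal dimension estimate.
    """
    m_necessary = math.floor(d) + 1          # smallest integer strictly above d
    m_sufficient = math.floor(2.0 * d) + 1   # "next integer after 2d"
    return m_necessary, m_sufficient

print(embedding_dimension_bounds(2.06))  # (3, 5)
print(embedding_dimension_bounds(1.26))  # (2, 3)
```

The second value is the noise-free choice discussed above; with noisy data, the minimal sufficient dimension (which may lie between the two bounds) is preferable, and that is what the false nearest neighbor method estimates.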

Suppose that the minimal embedding dimension for our dynamics is $m_0$, for which a time delay state-space reconstruction gives a one-to-one image of the attractor in the original phase space. Because topological properties are preserved, the neighbors of a given point are mapped onto neighbors in the reconstructed space. If we try to embed the attractor in an m-dimensional space with $m < m_0$, the topological structure is no longer preserved: points are projected into neighborhoods of other points to which they would not belong in higher dimensions. Such data points are called false neighbors. To find the minimal embedding dimension, we increase m until the fraction of false neighbors falls below a heuristic threshold.
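A minimal NumPy sketch of a common form of the false nearest neighbor test. For each delay vector in dimension m it finds the nearest neighbor (Euclidean distance, brute-force search) and flags the pair as false if the (m+1)-th delay coordinate separates them by more than `rtol` times their m-dimensional distance. The tolerance `rtol = 10`, the brute-force search, and the Hénon-map test signal are assumptions for illustration; the heuristic threshold on the resulting fraction is left to the user.

```python
import numpy as np

def false_nearest_fraction(x, m, tau, rtol=10.0):
    """Fraction of false nearest neighbors for embedding dimension m, delay tau."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m * tau                 # vectors that also have an (m+1)-th coordinate
    X = np.column_stack([x[k * tau: k * tau + n] for k in range(m)])
    extra = x[m * tau: m * tau + n]      # the (m+1)-th delay coordinate of each vector
    n_false = 0
    for i in range(n):
        dist = np.sqrt(np.sum((X - X[i]) ** 2, axis=1))
        dist[i] = np.inf                 # exclude the point itself
        j = np.argmin(dist)
        # False neighbor: the extra coordinate separates the pair much more
        # than their distance in m dimensions would suggest.
        if abs(extra[i] - extra[j]) > rtol * dist[j]:
            n_false += 1
    return n_false / n

def henon_x(n, a=1.4, b=0.3, discard=100):
    """x-component of the Henon map, used here only as a test signal."""
    x, y, out = 0.1, 0.0, []
    for k in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if k >= discard:                 # drop the transient
            out.append(x)
    return np.array(out)

signal = henon_x(1000)
fnn = {m: false_nearest_fraction(signal, m, tau=1) for m in (1, 2, 3)}
print(fnn)
```

For the Hénon map, whose x-series is determined by two consecutive values, the fraction should be large at m = 1 and drop to essentially zero at m = 2. For long records, a k-d tree neighbor search would replace the O(N²) loop.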

Figure 5.15. The fraction of false nearest neighbors as a function of the embedding dimension.

Example 7 Embedding the blood oxygen concentration signal

If we examine the time series of the blood oxygen concentration signal of the previous example for false nearest neighbors, we see that an embedding dimension of m = 4 is enough to reconstruct the state space (Figure 5.15). □

5.2.2 Nonlinear Noise Filtering

Every modeling effort starts with measuring some quantity, with the ultimate goal of understanding the process that generated it. Although making some measurements can be fairly easy, extracting the signal from the measurement is a task of its own. That is, we have to identify the signal, which is possibly contaminated by fluctuations in the system, by disturbances in the environment, or by the measurement procedure itself. Thus, before using the measurement for model development, it is often desirable to filter it and obtain as clean a signal as possible. In linear system theory, the process that generated the signal is assumed to produce a frequency spectrum of finite range with sharp peaks, and the contamination is assumed to have a broadband spectrum. Separating the signal of interest from the noise then becomes an exercise in distinguishing narrowband signals from broadband ones. Methods for this [172] are over fifty years old and well developed.

In more general terms, in filtering the noise from the signal we are separating the information-bearing signal from the interference of the environment. When a narrowband signal, such as one from a linear system, is embedded in a broadband environment, the distinction is quite straightforward: the frequency domain is the appropriate space in which to perform the separation, and the Fourier spectrum is sufficient to differentiate the signal from the noise.
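For the linear case just described, separation in the frequency domain can be as simple as discarding every Fourier component above a cut-off. The NumPy sketch below uses an ideal (brick-wall) low-pass filter; this filter choice, the 20 Hz cut-off, and the test signal are assumptions for illustration, and practical designs usually taper the transition band to limit ringing.

```python
import numpy as np

def fourier_lowpass(x, fs, fc):
    """Zero every Fourier component above the cut-off fc (ideal low-pass).

    x: uniformly sampled real signal; fs: sampling frequency; fc: cut-off (same units).
    """
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[freqs > fc] = 0.0
    return np.fft.irfft(X, n=len(x))

def rms(e):
    return float(np.sqrt(np.mean(e ** 2)))

# Usage: a 5 Hz sine (narrowband) buried in white noise (broadband).
fs = 1000.0
t = np.arange(2000) / fs
clean = np.sin(2 * np.pi * 5.0 * t)
rng = np.random.default_rng(1)
noisy = clean + 0.5 * rng.standard_normal(t.size)
filtered = fourier_lowpass(noisy, fs, fc=20.0)
print(rms(noisy - clean), rms(filtered - clean))
```

Because the white noise spreads its power evenly up to fs/2 = 500 Hz, keeping only the band below 20 Hz removes most of it while leaving the 5 Hz line intact. This is exactly the situation that breaks down when the signal itself is broadband.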

Similar to the linear case, if the nonlinear process signal and the contamination occupy significantly distinct frequency bands, Fourier techniques are still informative. In sampling dynamic systems, if, for example, the Fourier spectrum of the system is bounded from above by a cut-off frequency $f_c$, Shannon's sampling theorem states that by choosing a sampling frequency $f_s > 2f_c$, the signal can be perfectly reconstructed [172]. However, for signals from dynamically rich sources, such as chaotic systems, both the signal and the contamination are typically broadband, and Fourier analysis is of little help in separating them. It has been shown analytically that the frequency spectrum of a system that follows the intermittency route to chaos has a $1/f$ tail [540]. When the orbits converge to a strange attractor, which is a fractal limit set, the spectrum again has a $1/f$ tail. Thus, for dynamically rich systems, no matter how high the cut-off is set, the portion of the signal removed by the filter still carries more information than the portion retained. This is easily seen from the signal-to-noise ratio of a signal s whose power content up to a frequency $f_b$ is P and whose power for frequencies greater than $f_b$ is proportional to $1/f$. For a cut-off $f_c \ge f_b$, this ratio,

$$\frac{P}{\int_{f_c}^{\infty} a\,\frac{df}{f}},$$

with $a$ a real positive proportionality constant, vanishes for all $f_c < \infty$, because the integral in the denominator diverges logarithmically. Furthermore, we cannot practically consider a very large $f_c$, since most measurements are made with the aid of digital computers with finite clock frequencies. Nevertheless, we will be gathering measurements from such sources at finite sampling frequencies and will still wish to filter the data for the underlying signal. Another problem caused by finite sampling is the so-called aliasing effect: in the Fourier domain, the power contributions from the replicas of the original spectrum centered at multiples of the sampling frequency are not negligible either.
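The step behind the vanishing signal-to-noise ratio (P divided by the power in the $1/f$ tail above $f_c$) is the logarithmic divergence of that tail. Truncating the tail at a finite upper frequency $F$, an auxiliary symbol introduced only for this derivation, makes it explicit:

$$\int_{f_c}^{F} a\,\frac{df}{f} = a\ln\frac{F}{f_c} \;\longrightarrow\; \infty \quad (F \to \infty),$$

so for every finite cut-off,

$$\lim_{F\to\infty} \frac{P}{a\ln(F/f_c)} = 0.$$

Raising $f_c$ only slows the growth of the logarithm without bounding it, which is why no finite cut-off retains the dominant share of the power of a signal with a $1/f$ tail.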

If we can assume that the signal we seek to separate comes from a low-order system with a specific geometric structure in its state space, we can use a deterministic system model or a Markov chain model and seek the model parameters or transition probabilities via a time-domain matching filter. The geometric structure of a system in its state space is characteristic of each chaotic process, which enables us to distinguish its signal from others. These separation techniques rest on a significant assumption about the nature of the process generating the signal: the 'noise' we wish to separate from the 'signal' should come from a high-order chaotic source. Depending on the a priori information we have about the underlying system dynamics, various filtering problems can be stated.

• If we know the exact dynamics that generated the signal,

$$x_{i+1} = f(x_i), \qquad (5.25)$$

with $x_i \in \mathbb{R}^n$ (i.e., an n-dimensional real vector) and $f(\cdot): \mathbb{R}^n \to \mathbb{R}^n$ (i.e., an n-dimensional vector function that takes an n-dimensional argument), we can use this knowledge to extract the signal satisfying the dynamics. This method is referred to as the regression technique.

• If we have a filtered signal from the system of interest extracted at some prior time, we can use this pivot signal to establish statistics of the evolution on the attractor and use them to separate the signal in a new set of measurements. This is gray box identification.

• If we know nothing about the underlying process and have just one instance of measurements, then we must start by making simplifying assumptions. Such assumptions may be that the dynamics is deterministic, and that it has a low-dimensional state space. This is black box identification.
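When the dynamics f of Eq (5.25) is known, one simple way to extract a signal that satisfies it is penalized least squares: find a trajectory x close to the measurements y whose one-step residuals $x_{i+1} - f(x_i)$ are small. The sketch below is one such formulation, not necessarily the regression technique the text refers to; the logistic map with parameter `mu = 3.9` as the "known" dynamics, the weight `lam`, and plain gradient descent with backtracking are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def logistic(x, mu):
    """Hypothetical known dynamics f: the logistic map x -> mu*x*(1-x)."""
    return mu * x * (1.0 - x)

def logistic_deriv(x, mu):
    return mu * (1.0 - 2.0 * x)

def cost(x, y, lam, mu):
    """Dynamics mismatch plus lam times squared distance to the measurements."""
    e = x[1:] - logistic(x[:-1], mu)
    return np.sum(e ** 2) + lam * np.sum((x - y) ** 2)

def gradient(x, y, lam, mu):
    e = x[1:] - logistic(x[:-1], mu)
    g = 2.0 * lam * (x - y)
    g[1:] += 2.0 * e                                  # d e_{i-1} / d x_i = 1
    g[:-1] -= 2.0 * e * logistic_deriv(x[:-1], mu)    # d e_i / d x_i = -f'(x_i)
    return g

def denoise_known_map(y, lam=1.0, mu=3.9, step=0.1, iters=500):
    """Gradient descent with backtracking, starting from the measurements."""
    y = np.asarray(y, dtype=float)
    x = y.copy()
    J = cost(x, y, lam, mu)
    for _ in range(iters):
        g = gradient(x, y, lam, mu)
        s = step
        while s > 1e-12:
            x_new = x - s * g
            J_new = cost(x_new, y, lam, mu)
            if J_new < J:                # accept only steps that lower the cost
                x, J = x_new, J_new
                break
            s *= 0.5
        else:
            break                        # no descent step found: stop
    return x

def dynamics_residual(x, mu):
    return float(np.sqrt(np.sum((x[1:] - logistic(x[:-1], mu)) ** 2)))

# Usage: a noisy orbit of the (assumed known) logistic map.
mu = 3.9
true = np.empty(200)
true[0] = 0.3
for i in range(199):
    true[i + 1] = logistic(true[i], mu)
rng = np.random.default_rng(2)
y = true + 0.01 * rng.standard_normal(true.size)
x_hat = denoise_known_map(y, lam=1.0, mu=mu)
print(dynamics_residual(y, mu), dynamics_residual(x_hat, mu))
```

A larger `lam` trusts the measurements more; a smaller one enforces the dynamics more strictly. Because the map is chaotic, the result is a compromise trajectory rather than the exact true orbit.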

Although the task of separating signal from noise gets easier as the problem bleaches out from black box toward white box, real-life cases unfortunately favor the darker shades. Various linear filtering and modeling techniques were discussed in Chapter 4.

To filter out noise in the time series signal, we make use of the serial dependencies among the measurements, which cause the delay vectors to fill the available m-dimensional space inhomogeneously. There is a rich literature on nonlinear noise reduction techniques [117, 295]. In this section we briefly discuss one approach that exploits the geometric structure of the attractor through local approximations.

The method is a simple local approximation that replaces the central coordinate of each embedding vector by the local average of this coordinate. The practical issues in implementing this technique are as follows [228]. First, if the data represent chaotic dynamics, initial errors in the first and last coordinates will be magnified through time, so these coordinates should not be replaced by local averages. Second, except for oversampled data sets, it is desirable to choose a small time delay. Next, the embedding dimension, m, should be chosen higher than 2d + 1, with d being the fractal dimension of the attractor. Finally, the neighborhood should be defined by a radius r that is large enough to cover the extent of the contaminating noise, yet smaller than the typical radius of curvature of the attractor. These conditions may not always be satisfied simultaneously. As we have stressed repeatedly for other aspects of nonlinear data analysis, filtering should be carried out in several attempts with different tuning parameters, each followed by a careful evaluation of the results, until the results look reasonably satisfactory.
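A minimal NumPy sketch of this local-average filter for a scalar series, under stated assumptions: delay vectors are taken as (x[i], x[i+tau], ..., x[i+(m-1)tau]) (Eq (5.21) may index them differently), the neighborhood of radius r uses the max-norm, neighbors are found by brute force, and all averages in a pass are computed from the unfiltered data. Only middle coordinates are replaced, so the first and last (m-1)/2·tau samples stay untouched, in line with the advice about first and last coordinates. The parameter values in the usage lines are hand-tuned guesses for a periodic test signal.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Rows are delay vectors (x[i], x[i+tau], ..., x[i+(m-1)tau])."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[k * tau: k * tau + n] for k in range(m)])

def local_average_filter(x, m, tau, r):
    """Replace each vector's middle coordinate by its neighborhood average."""
    if m % 2 == 0:
        raise ValueError("m must be odd so that a middle coordinate exists")
    x = np.asarray(x, dtype=float)
    X = delay_embed(x, m, tau)
    mid = (m - 1) // 2
    filtered = x.copy()
    for i in range(len(X)):
        # Max-norm distance from vector i to all vectors (itself included).
        dist = np.max(np.abs(X - X[i]), axis=1)
        neighbors = dist < r
        # Middle coordinate of vector i is x[i + mid*tau].
        filtered[i + mid * tau] = X[neighbors, mid].mean()
    return filtered

# Usage: a noisy periodic signal; m, tau and r are hand-tuned guesses.
rng = np.random.default_rng(3)
clean = np.sin(2 * np.pi * np.arange(2000) / 50.0)
noisy = clean + 0.05 * rng.standard_normal(clean.size)
smoothed = local_average_filter(noisy, m=7, tau=4, r=0.15)
print(np.sqrt(np.mean((noisy - clean) ** 2)),
      np.sqrt(np.mean((smoothed - clean) ** 2)))
```

Here tau = 4 is larger than the "small delay" the text suggests; it spreads the 7-sample window over about half a period so that the neighborhood does not extend along the curve, which keeps the curvature bias well below the noise level. Repeating the pass, or changing m, tau and r, is the kind of iterative tuning described above.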

The filtering algorithm is as follows:

1. Pick a small time delay, τ, a large enough odd embedding dimension, m, and an optimum neighborhood radius, r.

2. For each embedding vector x (as defined in Eq (5.21)), calculate a filtered value of its middle coordinate, the ((m+1)/2)-th component, by averaging over the neighborhood defined by r, as