Figure 4.17. Activation functions.
Multilayer feedforward networks contain an input layer connected to one or more layers of hidden neurons (hidden units) and an output layer (Figure 4.18(b)). The hidden units internally transform the data representation to extract higher-order statistics. The input signals are applied to the neurons in the first hidden layer; the output signals of that layer are used as inputs to the next layer, and so on for the rest of the network. The output signals of the neurons in the output layer reflect the overall response of the network to the activation pattern supplied by the source nodes in the input layer. This type of network is especially useful for pattern association (i.e., mapping input vectors to output vectors).
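The layer-by-layer propagation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the sigmoid activation, the 2-3-1 layout, and all weight and bias values are assumptions chosen for the example, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic activation (an assumed choice; any activation could be used)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute the outputs of one layer.

    weights[j] holds the incoming weights of neuron j, biases[j] its bias.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(w_row, inputs)) + b)
        for w_row, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Propagate the input signals through each layer in turn."""
    signal = inputs
    for weights, biases in layers:
        # The outputs of one layer are the inputs of the next.
        signal = layer_forward(signal, weights, biases)
    return signal


# Hypothetical network: 2 source nodes, 3 hidden neurons, 1 output neuron.
layers = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.7, -0.2, 0.4]], [0.05]),                                # output layer
]
output = network_forward([1.0, 0.5], layers)
```

Here `output` is the overall response of the network to the input pattern `[1.0, 0.5]`; adding further `(weights, biases)` pairs to `layers` adds further hidden layers.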
Recurrent networks differ from feedforward networks in that they have at least one feedback loop. An example of this type of network is given in Figure 4.18(c), which shows one of the earliest recurrent networks, the Jordan network. The activation values of the output units are fed back into the input layer through a set of extra units called the state units. Learning takes place in the connections between input and hidden units as well as between hidden and output units. Recurrent networks are useful for pattern sequencing (i.e., following the sequences of the network activation over time). The presence of feedback loops has a profound impact on the learning capability of the network and on its performance. Applications to chemical process modeling and identification have been reported [97, 616, 679].
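The feedback path of a Jordan-type network can be sketched as follows. This is a simplified illustration under stated assumptions: the state units simply hold a copy of the previous output activations, the activation is a sigmoid, and the layer sizes and weight values are made up for the example. The function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One layer of sigmoid neurons; weights[j] are the incoming weights of neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(w_row, inputs)) + b)
        for w_row, b in zip(weights, biases)
    ]


def jordan_step(x, state, hidden_layer, output_layer):
    """One time step: external inputs and state units feed the hidden layer."""
    extended_input = list(x) + list(state)
    hidden = dense(extended_input, *hidden_layer)
    output = dense(hidden, *output_layer)
    # Assumption: the new state is a plain copy of the output activations.
    return output, list(output)


def run_sequence(xs, hidden_layer, output_layer, n_outputs):
    """Feed a sequence of input vectors through the network, carrying the state."""
    state = [0.0] * n_outputs  # state units start at zero
    outputs = []
    for x in xs:
        output, state = jordan_step(x, state, hidden_layer, output_layer)
        outputs.append(output)
    return outputs


# Hypothetical sizes: 1 input, 1 state unit, 2 hidden neurons, 1 output.
hidden_layer = ([[0.5, 1.0], [-0.3, 0.8]], [0.0, 0.0])
output_layer = ([[1.0, -1.0]], [0.0])
outputs = run_sequence([[1.0], [1.0]], hidden_layer, output_layer, n_outputs=1)
```

Because the state units carry the previous output forward, the same input presented at two consecutive time steps generally produces two different outputs, which is what makes such networks suited to sequences.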
Before proceeding with training the network, an appropriate network architecture should be specified. This can be done in either a static or a dynamic manner. Many ad hoc techniques for static network structure selection are based on pruning the redundant nodes by testing a range of network sizes, i.e., numbers of hidden nodes. However, techniques for network architecture selection for feedforward networks have been proposed [301, 335, 482, 627, 628]. Reed gives a partial survey of pruning algorithms, and recent advances can be found in the neural network literature [144, 404].
Having specified the network architecture, a set of input-output data is used to train the network, i.e., to determine appropriate values for the weights associated with each interconnection. The data are then propagated forward through the network to generate an output to be compared with the actual output. The overall training procedure can be seen as the network learning from its environment through an interactive process of adjustments applied to its weights and bias levels. A number of learning rules, such as error-correction, memory-based, Hebbian, competitive, and Boltzmann learning, have been proposed to define how the network weights are adjusted. Besides these rules, there are several procedures called learning paradigms that determine how a network relates to its environment. The learning paradigm refers to a model of the environment in which the network operates. There are two main classes of learning paradigms:
Learning with a teacher (supervised learning), in which a teacher provides output targets for each input pattern and corrects the network's errors explicitly. The teacher can be thought of as having knowledge of the environment (represented by the historical set of input-output data), so that the neural network is provided with the desired response when a training vector is available. The desired response represents the optimum action to be performed to adjust the neural network weights under the influence of the training vector and the error signal. The error signal is the difference between the desired response (historical value) and the actual response (computed value) of the network. This corrective algorithm is repeated iteratively until a preset convergence criterion is reached. One of the most widely used supervised training algorithms is the error backpropagation or generalized delta rule proposed by Rumelhart and others [527, 637].
Figure 4.18. Three fundamentally different network architectures: (a) single-layer feedforward network; (b) multilayer feedforward network; (c) recurrent network.
Learning without a teacher, in which the network must find the regularities in the training data by itself. This paradigm has two subgroups:
1. Reinforcement learning (neurodynamic programming), where learning the relationship between inputs and outputs is performed through continued interaction with the environment to minimize a scalar index of performance. This is closely related to dynamic programming.
2. Unsupervised learning, or self-organized learning, where there is no external teacher or critic to oversee the learning process. Once the network is tuned to the statistical regularities of the input data, it forms internal representations for encoding the input automatically [48, 226].
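As a concrete instance of learning with a teacher, the sketch below applies error-correction learning to a single linear neuron: the error signal (desired response minus actual response) drives the weight adjustments, and training repeats until a preset convergence criterion is met. It illustrates the simple delta rule only, not the full backpropagation algorithm for multilayer networks. The function name, learning rate, tolerance, and data are all assumptions made for the example.

```python
def train_delta_rule(samples, learning_rate=0.1, tolerance=1e-6, max_epochs=10000):
    """Train one linear neuron by error-correction learning.

    samples is a list of (input_vector, desired_response) pairs,
    playing the role of the historical input-output data.
    """
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        sum_squared_error = 0.0
        for x, desired in samples:
            actual = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = desired - actual  # error signal supplied by the "teacher"
            # Adjust each weight in proportion to the error and its input.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
            sum_squared_error += error ** 2
        # Preset convergence criterion: mean squared error below tolerance.
        if sum_squared_error / len(samples) < tolerance:
            break
    return weights, bias, epoch


# Hypothetical training data generated from y = 2x + 1.
samples = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]
weights, bias, epochs = train_delta_rule(samples)
```

With these noise-free data the learned weight and bias approach 2 and 1. The learning rate and tolerance trade speed against accuracy, and a learning rate that is too large for the scale of the inputs can make the iteration diverge.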