Performance of Artificial Neural Network Using Heterogeneous Transfer Functions

Neural networks have become important models across computer vision, natural language processing, speech and image recognition, aircraft safety, and many other fields. They come in a variety of architectures; this study centres on the Multi-Layer Perceptron (MLP), the most commonly used type of Artificial Neural Network.


Introduction
Deep neural networks have bested notable benchmarks across computer vision, reinforcement learning, speech recognition, and natural language processing. However, neural networks still have deficiencies. For instance, they have a tendency to over-fit, and large data sets and careful regularization are needed to combat this. Artificial Neural Networks (ANNs) use a variety of architectures. This study centres on the Multi-Layer Perceptron (MLP), the most commonly used type of ANN, also known as the Feed-Forward Network (FFN). The MLP has been found to be powerful in terms of model precision when homogeneous transfer functions (TFs) are used, especially with complex or large data sets. The MLP was chosen because it is the only ANN type that allows for statistical inference. Prior work has connected fully connected, feed-forward networks with more than one hidden layer to Gaussian processes with a recursive kernel definition.
Garriga-Alonso et al. (2019) showed that the output of a (residual) CNN with an appropriate prior over the weights and biases is a Gaussian process in the limit of infinitely many convolutional filters. Aitchison (2020) argued that getting Bayesian neural networks to perform comparably requires artificially reducing uncertainty using a "tempered" or "cold" posterior, which is concerning if the prior is accurate. The aim of this study is to compare the performance of ANNs using heterogeneous transfer functions against homogeneous transfer functions. The rest of this paper presents the methodology, the data simulation for the study, the results, and the conclusion.

Artificial Neural Networks (ANNs)
As simple statistical models, ANNs have been applied to many tasks, such as forecasting, curve-fitting, and regression in engineering, earth sciences, medicine, hydrology, and other fields. ANN models learn from data and carry out tasks such as classification or forecasting. Unlike models that rely on prior assumptions, the network model is built and assessed from the nature of the data itself. ANNs are structured in layers arranged as input, hidden, and output layers. Within every layer there are interconnected elements known as neurons. Weights are the essential parameters of an ANN model used to solve a problem. The sum of the weighted inputs and the bias term is passed into an activation function, which keeps the output bounded. Frequently used activation functions include the sigmoid, hyperbolic tangent, and rectified linear unit (ReLU) functions.
The statistical neural network model is given as

y_i = g(sum_j w_j x_ij + b) + e_i,   (1)

where y is the dependent variable, x = (x_1, ..., x_p) is the vector of independent variables, w = (w_1, ..., w_p) is the vector of network weights, b is the bias, g(.) is the activation function, and e_i is the stochastic term, normally distributed as e_i ~ N(0, sigma^2).
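The single-neuron computation in equation (1) can be sketched as follows; the function and variable names here are illustrative, not taken from the paper, and tanh stands in for a generic activation g(.):

```python
import numpy as np

def neuron_output(weights, inputs, bias, activation=np.tanh):
    """Weighted sum of inputs plus bias, passed through an activation g(.)."""
    return activation(np.dot(weights, inputs) + bias)

# Illustrative example: three inputs with tanh as the activation
w = np.array([0.5, -0.3, 0.8])
x = np.array([1.0, 2.0, 0.5])
print(neuron_output(w, x, bias=0.1))
```

Swapping the `activation` argument (sigmoid, tanh, ReLU) is exactly the design choice this study varies.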

ANN Model Development
This study utilized a multilayer perceptron (MLP) feed-forward network. The multilayer perceptron reduced the error between the ANN model outputs and observed values by updating the weights between the nodes. The choice of the number of hidden nodes is a complicated area in ANN modelling. To date, there are no precise strategies for deciding how many hidden layers and hidden nodes should be integrated into an ANN model. Thus, a trial-and-error method was utilized to find the best number of nodes for the hidden layer. In this study, the data was split into training and testing sets (training proportions of 70%, 80%, and 90%), with hidden neurons (2, 5, 10) and activation functions (sigmoid, hyperbolic tangent, and rectified linear unit). Thus, the results obtained in the results section are estimations of the performance of the ANN on the test data. All input data are normalized using the following equation:

x_norm = (x - x_min) / (x_max - x_min)

where x is the observed value, and x_min and x_max are respectively the minimum and maximum data in the input time series.
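The min-max normalization step above can be sketched as follows (the helper name is illustrative):

```python
import numpy as np

def min_max_normalize(x):
    """Scale a series to [0, 1]: (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Minimum maps to 0, maximum maps to 1, everything else falls in between
series = [3.0, 7.0, 5.0, 11.0]
print(min_max_normalize(series))
```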

Activation functions
The power of neural networks is mainly attained through the activation functions. An activation function is a mathematical function that transforms the input variable into an output variable. Without activation functions, a neural network would behave like a linear function. A linear function is a function where the output variable is directly proportional to the input variable.
Nevertheless, most of the problems neural networks try to solve are nonlinear and complicated. Activation functions are used to attain this nonlinearity. The graph of a nonlinear function is curved and adds the required complexity. Activation functions provide the nonlinearity element to neural networks and make them accurate universal function approximators.

Sigmoid
The sigmoid function is a mathematical function that gives a sigmoidal curve, characteristic for its S shape. It is the oldest and most frequently used activation function. It compresses any input to a value between 0 and 1, which also makes the model logistic. The function is a special case of the logistic function, defined by the following formula:

sigma(x) = 1 / (1 + e^(-x))
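A minimal sketch of the sigmoid as defined above:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))                      # midpoint of the S-curve: 0.5
print(sigmoid(np.array([-5.0, 5.0])))    # values near 0 and near 1
```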

Hyperbolic tangent
Another commonly utilized activation function is the tanh function. It is a nonlinear function whose output lies in the range (-1, 1). One point worth noting is that the gradient is stronger for tanh than for sigmoid (the derivatives are steeper). Choosing between sigmoid and tanh depends on the gradient-strength requirement. Like the sigmoid, tanh also suffers from the vanishing gradient problem. The function is specified by the formula:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

It resembles the sigmoid; indeed it is a scaled sigmoid function, tanh(x) = 2 sigma(2x) - 1.
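The "steeper derivative" comparison above can be checked numerically; the helper names here are illustrative:

```python
import numpy as np

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, with maximum 1 at x = 0."""
    return 1.0 - np.tanh(x) ** 2

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), with maximum 0.25 at x = 0."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

# At the origin the tanh gradient is four times the sigmoid gradient
print(tanh_grad(0.0), sigmoid_grad(0.0))
```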

Rectified Linear Unit
Rectified Linear Unit (ReLU) is a predominantly utilized activation function. It has a simple definition and has merits over the other functions. The function is defined by the following formula:

f(x) = 0 when x < 0
f(x) = x when x >= 0

that is, f(x) = max(0, x). The range of the output is [0, infinity). ReLU finds usage in computer vision and speech recognition using deep neural networks.
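A minimal sketch of the ReLU definition above:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

# Negative inputs are clipped to zero; non-negative inputs pass through
print(relu(np.array([-2.0, 0.0, 3.5])))
```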

Artificial Neural Network with Heterogeneous Transfer Function
The hardware used for the training process is depicted in Table 1. A GPU was necessary in the training process to speed up training time. TensorFlow-GPU version 1.12 was selected to handle the training process, with Keras support. Besides TensorFlow-GPU, OpenCV was also installed to run the model in a real-time application using a webcam.
The model below gives a neural network model with a homogeneous transfer function:

y = g(w'x) + e   (2)

where g(.) is the transfer function; equation (2) is called the Homogeneous SNN (HSNN) model. Taking a convoluted form of the model above, using the product convolution, we have

y = (g1 x g2)(w'x) + e   (3)

where g1(.) and g2(.) are homogeneous transfer functions, combined in equation (3) to form a heterogeneous transfer function (HETF). Equation (3) is called the Heterogeneous SNN (HETSNN) model.

Heterogeneous Transfer Functions (HETFs)
Based on the above listed best HTFs, two convoluted HETFs were derived using the principle of convolution i.e. g1(.) x g2(.) such that the newly derived transfer functions are also a probability density function. These two HETFs below are derived using the convolution of Symmetric Saturating Linear Transfer Function and the Hyperbolic Tangent Transfer Function (SSLHT) and the convolution of the Symmetric Saturating Linear Transfer Function and the Hyperbolic Tangent Sigmoid Transfer Function (SSLHTS) (Udomboso, 2014).
The summary of the derived functions is given below, where p is the number of parameters.

Data Simulation for the Study
The data used for this study was generated using the model below, where e_i ~ N(0, 0.02) and x ~ N(0, 1).
The results are based on the prediction and model selection criteria evaluated at different numbers of hidden neurons and different sample sizes. The hidden neurons used are 2, 5 and 10, while the sample sizes are 50, 100, 200, 500 and 1000. The data was divided into training and testing sets in ratios of 90:10, 80:20 and 70:30.
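The train/test splitting described above can be sketched as follows (the helper name and the use of a shuffled split are illustrative assumptions, not specified by the paper):

```python
import numpy as np

def train_test_split(data, train_frac=0.7, seed=0):
    """Shuffle a data set and split it into training and testing portions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    cut = int(train_frac * len(data))
    return data[idx[:cut]], data[idx[cut:]]

# A 90:10 split of 100 simulated observations
x = np.arange(100)
train, test = train_test_split(x, train_frac=0.9)
print(len(train), len(test))
```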

Prediction Selection Criteria
The test error is used when a model is to be validated. When we calculate the error on data that was unseen during the training phase, we are calculating the test error.

Results and Discussion
This section presents the analyses of the performance of ANNs using homogeneous and heterogeneous transfer functions. Tables 1 to 3 below show the forecast performance measures for the simulated data using the mean square error, mean absolute error and test error respectively. The tables show the performance of the transfer functions (HTFs and HETFs) under different training set proportions (70%, 80% and 90%), different numbers of hidden neurons (2, 5 and 10), different activation functions, and different sample sizes. The results in Tables 1 to 3 reflect the performance of the activation functions for the three homogeneous transfer functions and the two heterogeneous transfer functions. Among the homogeneous transfer functions, ReLU produced the majority of the lowest mean square errors (MSEs), mean absolute errors (MAEs) and test errors, while the two heterogeneous transfer functions produced lower MSE, MAE and test error than the homogeneous transfer functions considered, which makes them better in prediction. It can also be seen from the results that, as the sample size increases, the mean square error decreases.

Conclusion
In this study, the Mean Square Error (MSE) was used to assess the performance of all ANN models. In conclusion, ReLU produced the majority of the lowest MSEs across the sample sizes. Also, as the training percentage increases, the mean square error increases in most cases. The heterogeneous transfer functions considered performed well in terms of prediction measures, as they produced the lowest mean square errors in almost all the training-set splits and sample sizes considered.
The neural network model is one of the important models used in data science for pattern and image recognition, computer vision, and so on. Without activation functions, a neural network cannot be used, since they are the main functional part of the model. In previous studies, homogeneous transfer functions have been used for prediction in most of the areas mentioned above. Given the results obtained in this study, it is recommended that heterogeneous transfer functions be considered for neural network models. The forecast performance of the heterogeneous transfer functions in this study suggests that, if used in neural network models for the aforementioned areas of research and many more, better results will be attained.