
Nonlinear Activation Functions in CNN Based on Fluid Dynamics and Its Applications


Kazuhiko Kakuda, Tomoyuki Enomoto and Shinichiro Miura

Abstract: Nonlinear activation functions for deep CNNs (Convolutional Neural Networks) based on fluid dynamics are presented. We propose two types of activation functions by applying the so-called parametric softsign to the negative region. We use the well-known TensorFlow as the deep learning framework. The CNN architecture consists of three convolutional layers with max-pooling and one fully-connected softmax layer. The CNN approaches are applied to three benchmark datasets, namely, MNIST, CIFAR-10, and CIFAR-100. Numerical results demonstrate the workability and the validity of the present approach through comparison with other numerical performances.

Keywords: Deep learning, CNN, activation function, fluid dynamics, MNIST, CIFAR-10, CIFAR-100.

1 Introduction

Deep learning is nowadays indispensable in engineering and science fields such as robotics, automotive engineering, web-informatics, bio-informatics, and so on. Several neural network architectures underpin the deep learning framework [LeCun, Bengio and Hinton (2015)]: CNNs (Convolutional Neural Networks) to recognize object images [Fukushima and Miyake (1982); LeCun, Bottou, Bengio et al. (1998); Krizhevsky, Sutskever and Hinton (2012)], RNNs (Recurrent Neural Networks) to process time-series data [Rumelhart, Hinton and Williams (1986)], and so forth.

The appropriate choice of activation function for a neural network is a key factor in deep learning simulations. Various activation functions have been proposed within the CNN/RNN frameworks. The standard activation function is the rectified linear unit (ReLU), introduced first by Hahnloser et al. [Hahnloser, Sarpeshkar, Mahowald et al. (2000)] in the theory of symmetric networks with rectification (it was also called a rectification nonlinearity or ramp function [Cho and Saul (2009)]). Nair et al. [Nair and Hinton (2010)] successfully applied ReLU activation functions based on restricted Boltzmann machines to deep neural networks. The ReLU activation function has been widely used for visual recognition tasks [Glorot, Bordes and Bengio (2011); Krizhevsky, Sutskever and Hinton (2012); Srivastava, Hinton, Krizhevsky et al. (2014); LeCun, Bengio and Hinton (2015); Kuo (2016); Agarap (2018)]. ReLU leads to better recognition performance than the conventional sigmoid/tanh units, which suffer from the vanishing gradient problem, and it is parameter-free; however, it has zero gradients in the negative part.

In order to provide meaningful values in the negative region, several activation functions have been presented, such as the leaky rectified linear unit (LReLU) [Maas, Hannun and Ng (2013)], the parametric rectified linear unit (PReLU) [He, Zhang, Ren et al. (2015)], the exponential linear unit (ELU) [Clevert, Unterthiner and Hochreiter (2016)], and so forth. The LReLU slightly improves on ReLU by replacing the negative part with a linear function of small constant gradient. The PReLU generalizes the LReLU by adaptively learning the parameters introduced in its negative part, and significantly improved learning performance on the large image dataset called ImageNet. Clevert et al. [Clevert, Unterthiner and Hochreiter (2016)] proposed the ELU activation function and showed its applicability and validity on various benchmark datasets. As another approach, Goodfellow et al. [Goodfellow, Warde-Farley, Mirza et al. (2013)] proposed an activation function called maxout that has advantages both for optimization and for model averaging with dropout [Hinton, Srivastava, Krizhevsky et al. (2012)].
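For concreteness, these reference activations can be written compactly. The following is a minimal NumPy sketch; the default slope/scale values are common illustrative choices, not values prescribed by the cited papers:

```python
import numpy as np

def relu(v):
    # Rectified linear unit: parameter-free, but zero gradient for v < 0.
    return np.maximum(0.0, v)

def lrelu(v, alpha=0.01):
    # Leaky ReLU: small constant slope alpha in the negative part.
    return np.where(v >= 0.0, v, alpha * v)

def prelu(v, a):
    # Parametric ReLU: the negative slope a is learned (per channel).
    return np.where(v >= 0.0, v, a * v)

def elu(v, alpha=1.0):
    # Exponential linear unit: smooth saturation to -alpha for v << 0.
    return np.where(v >= 0.0, v, alpha * (np.exp(v) - 1.0))
```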

In our previous work, we presented a characteristic function (i.e., activation function) as an optimum function derived from the advection-diffusion system in a fluid dynamics framework [Kakuda (2002)]. The purpose of this paper is to propose activation functions based on this fluid dynamics concept. We present two types of activation functions by applying the so-called parametric softsign [Glorot and Bengio (2010)] to the negative part of ReLU. Using the well-known TensorFlow [Abadi, Agarwal, Barham et al. (2015)] as the deep learning framework, we utilize a CNN architecture that consists of three convolutional layers with max-pooling and one fully-connected softmax layer. The workability and the validity of the present approach are demonstrated on three benchmark datasets, namely, MNIST [LeCun, Bottou, Bengio et al. (1998)], CIFAR-10 and CIFAR-100 [Krizhevsky and Hinton (2009)], through comparison with other numerical performances.

2 Construction of nonlinear activation functions

2.1 Neural network model

In the field of neural networks, the input-output (I/O) relationship used in back-propagation is represented by the input U_j, the output V_j, and the characteristic function h (i.e., activation function) as follows:

where S_ij are the input values to the j-th neuron as shown in Fig. 1, w_ij are the connection weights, I_j is the bias value, and T_j denotes the threshold.

The sigmoid function (see Fig. 2(a)) has mainly been used; it is the following continuous function:

where k is an ad hoc parameter.

Figure 1: Neuron model
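Since Eqs. (1)-(3) are not reproduced above, the following sketch assumes the standard form U_j = Σ_i w_ij S_ij + I_j − T_j with V_j = h(U_j); it is an illustrative reconstruction of the neuron model in Fig. 1 rather than the paper's exact equations:

```python
import numpy as np

def sigmoid(u, k=1.0):
    # Continuous sigmoid characteristic function; k is the ad hoc gain parameter.
    return 1.0 / (1.0 + np.exp(-k * u))

def neuron_output(s, w, bias, threshold, k=1.0):
    # Illustrative neuron model (cf. Fig. 1), assuming the standard form
    #   U_j = sum_i w_ij * S_ij + I_j - T_j,   V_j = h(U_j).
    u = np.dot(w, s) + bias - threshold
    return sigmoid(u, k)
```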

2.2 Nonlinear activation functions based on fluid dynamics

Heretofore, we have presented the following activation function as an optimum function, derived from the steady advection-diffusion system in a fluid dynamics framework [Kakuda (2002)] (see Eq. (25) in Subsection 2.3):

where γ = v/2k > 0.

Mizukami [Mizukami (1985)] presented the following approximation function in place of Eq. (5), which involves a singularity:

Therefore, substituting Eq. (6) into Eq. (4) and considering the sign of v, we obtain the following functions (see Fig. 2(b)):

At this stage, we adjust the functions h(v) so that g(0) = 0.

As a result, we obtain the following form by taking into account that v/γ = 2k:

in which κ = 2k. Eq. (9) represents the softsign function when κ = 1 [Glorot and Bengio (2010)]. The so-called parametric softsign is equivalent to the ReLU [Nair and Hinton (2010)] under the conditions κ = +∞ for v ≥ 0 and κ = 0 for v < 0.
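The closed form of Eq. (9) is not reproduced above; a form consistent with all three stated limits (κ = 1 gives the softsign v/(1 + |v|), κ → +∞ tends to the identity, κ = 0 gives zero) is κv/(κ + |v|), sketched below under that assumption:

```python
import numpy as np

def parametric_softsign(v, kappa):
    # kappa = 1 recovers the softsign v/(1 + |v|); kappa -> +inf tends to
    # the identity; kappa = 0 yields 0 (for v != 0). Assumed closed form.
    return kappa * v / (kappa + np.abs(v))
```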

In order to avoid zero gradients in the negative part of v, we apply Eq. (9) to the negative region and propose two types of activation functions involving a parameter a, as follows (see Fig. 3):

Figure 2: Characteristic functions

Rational-type activation function and its derivatives

Exponential-type activation function and its derivatives

The corresponding derivatives of the activation functions are also shown in Fig. 4.

Figure 3: Nonlinear activation functions

Figure 4: Derivatives of activation functions
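Eqs. (10)-(13) are shown only in Figs. 3 and 4 here, so the following sketch assumes the rational-type function is the identity for v ≥ 0 with the parametric softsign av/(a + |v|) in the negative region; the exponential-type variant is omitted because its closed form cannot be inferred from the text:

```python
import numpy as np

def rational_activation(v, a):
    # Identity in the positive region, as in ReLU; parametric softsign
    # a*v / (a + |v|) in the negative region (assumed form), so the
    # gradient does not vanish for v < 0.
    return np.where(v >= 0.0, v, a * v / (a + np.abs(v)))
```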

2.3 Steady advection-diffusion equation

2.3.1 Problem statement

Let us briefly consider the one-dimensional steady advection-diffusion equation in the spatial coordinate x, given in conservative form by

\[ \frac{df}{dx} = k\,\frac{d^{2}\phi}{dx^{2}} \]  (14)

with adequate boundary conditions, where f = uφ, and u and k are the given velocity and diffusivity, respectively.
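For orientation, consider the constant-coefficient model problem on [0, L] with φ(0) = 0 and φ(L) = 1 (an illustrative special case, not taken from the original text); its exact solution exhibits the sharp boundary layer for large u/k that motivates the stabilized formulation below:

```latex
% Constant-coefficient model problem (illustrative assumption):
%   u dphi/dx = k d^2phi/dx^2 on [0, L], phi(0) = 0, phi(L) = 1.
\[
  u\,\frac{d\phi}{dx} = k\,\frac{d^{2}\phi}{dx^{2}},
  \qquad
  \phi(x) = \frac{e^{\,ux/k} - 1}{e^{\,uL/k} - 1}.
\]
```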

2.3.2 Finite element formulation

In order to solve for the flux f = uφ in a stable manner, we adopt the Petrov-Galerkin finite element formulation using an exponential weighting function [Kakuda and Tosaka (1992)]. On the other hand, the conventional Galerkin finite element formulation can be applied to solve Eq. (14) numerically.

First of all, we start from the following weighted integral expression in a subdomain Ω_i = [x_{i−1}, x_i] with respect to a weighting function w:

The weighting function w can be chosen as a general solution which satisfies

where Δx_i = x_i − x_{i−1}, and σ(u) denotes a function described by Yee et al. [Yee, Warming and Harten (1985)], which is sometimes referred to as the coefficient of numerical viscosity. The solution of Eq. (16) is as follows:

where A is a constant, and a = u/(Δx_i σ(u)).

By applying piecewise linear functions to the flux f and to φ, we obtain the following integral form

in which

Here, applying element-wise mass lumping to the first term of the left-hand side of Eq. (18), and carrying out those integrals in Eq. (18) exactly, we obtain the following numerical fluxes f_{i−1/2} and f_{i+1/2} in the subdomains Ω_i and Ω_{i+1}, respectively:

where γ = u/(2σ(u)), and sgn(γ) denotes the signum function.

Let us next derive the Galerkin finite element model for Eq. (14). The weighted residual equation in Ω_i is given as follows:

At this stage, we assume a uniform mesh Δx_i = Δx for simplicity of the formulation. Taking into consideration the continuity of φ,x at nodal point i, we obtain the following discrete form:

Substituting Eq. (20) and Eq. (21) into Eq. (23), after some manipulation we obtain the following finite difference form

where for any velocity u

Using the element Peclet number Pe (≡ uΔx/2k) as γ, we reduce Eq. (24) to the following form

This equation has the same structure as the SUPG scheme developed by Brooks et al. [Brooks and Hughes (1982)], and it leads to nodally exact solutions for all values of Pe [Christie, Griffiths, Mitchell et al. (1976)].
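For reference, the nodally exact behavior corresponds to the classical optimal upwind function from the SUPG literature, stated below as the standard result (its identification with Eq. (25) is an assumption, since that equation is not reproduced above):

```latex
% Classical optimal upwind function: nodally exact for all element
% Peclet numbers [Christie et al. (1976); Brooks and Hughes (1982)].
\[
  \gamma_{\mathrm{opt}}(Pe) = \coth(Pe) - \frac{1}{Pe},
  \qquad
  Pe \equiv \frac{u\,\Delta x}{2k}.
\]
```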

3 CNN architecture

We adopt a similar approach to the PReLU [He, Zhang, Ren et al. (2015)], which can be trained using back-propagation and optimized simultaneously with the other layers. For the variables v and a in Eq. (10) through Eq. (13), we define v_j and a_j as the input and the coefficient, respectively, on the j-th channel. The momentum approach for updating a_j is given as follows:

where E represents the objective function, μ is the momentum to accelerate learning, and ε is the learning rate. The parameters a_j are obtained optimally by back-propagation.
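A minimal TensorFlow/Keras sketch of a channel-wise trainable coefficient a_j is given below. The layer name, initial value, and the rational-type form are assumptions carried over from the sketch in Section 2.2; in Keras, the momentum-style update of a_j is delegated to the chosen optimizer rather than implemented by hand:

```python
import tensorflow as tf

class RationalActivation(tf.keras.layers.Layer):
    """Rational-type activation with one trainable coefficient a_j per channel
    (hypothetical layer mirroring the PReLU-style channel-wise parameterization)."""

    def __init__(self, a_init=1.0, **kwargs):
        super().__init__(**kwargs)
        self.a_init = a_init

    def build(self, input_shape):
        # One coefficient per channel (last axis); optimized jointly with the
        # other layers by back-propagation, as in PReLU.
        self.a = self.add_weight(
            name="a",
            shape=(input_shape[-1],),
            initializer=tf.keras.initializers.Constant(self.a_init),
            trainable=True,
        )

    def call(self, v):
        # Identity for v >= 0; parametric softsign a*v/(a + |v|) for v < 0.
        return tf.where(v >= 0.0, v, self.a * v / (self.a + tf.abs(v)))
```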

Fig. 5 shows the CNN architecture, consisting of three convolutional (i.e., conv) layers with max-pooling and one fully-connected (i.e., fc) softmax layer.

Figure 5: CNN architecture
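A hedged Keras sketch of the architecture in Fig. 5 follows, reusing the RationalActivation layer above; the filter counts, kernel sizes, and pooling placement are illustrative placeholders, since the exact hyper-parameters are not listed in the text:

```python
import tensorflow as tf

def build_cnn(input_shape=(32, 32, 3), num_classes=10):
    # Three conv layers, each followed by the proposed activation and
    # max-pooling, then one fully-connected softmax layer (cf. Fig. 5).
    # Filter counts and kernel sizes are illustrative guesses.
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, padding="same"),
        RationalActivation(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, padding="same"),
        RationalActivation(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(128, 3, padding="same"),
        RationalActivation(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```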

4 Numerical experiments

In this section, we use the well-known TensorFlow [Abadi, Agarwal, Barham et al. (2015)] as the deep learning framework, and present numerical performances obtained by applying the above-mentioned CNN approach to three typical datasets, namely, MNIST [LeCun, Bottou, Bengio et al. (1998)], CIFAR-10 and CIFAR-100 [Krizhevsky and Hinton (2009)]. We use Adam [Kingma and Ba (2015)] as the learning algorithm for stochastic gradient-based optimization.

The model is trained for a number of epochs on mini-batches of size 100 with learning rate ε = 10⁻³ and momentum μ = 0. The specifications of the CPU and the GPU (using CUDA) are summarized in Tab. 1.

Table 1: A summary of the specification of CPU and GPU
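The training configuration described above maps onto the following sketch (MNIST shown; CIFAR-10/100 follow the same pattern with their own loaders). The epoch count is a placeholder, and the paper's μ = 0 is interpreted as using Adam without extra momentum beyond its defaults:

```python
import tensorflow as tf

# Load MNIST and scale to [0, 1]; add the grayscale channel axis.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

model = build_cnn(input_shape=(28, 28, 1), num_classes=10)  # sketch from Section 3
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # learning rate 10^-3
    loss="sparse_categorical_crossentropy",                   # reported loss is cross-entropy
    metrics=["accuracy"],
)
model.fit(
    x_train, y_train,
    batch_size=100,                     # mini-batches of size 100
    epochs=10,                          # placeholder epoch count
    validation_data=(x_test, y_test),
)
```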

4.1 MNIST

Let us first consider the MNIST dataset, which consists of 28×28 pixel gray-scale handwritten digit images, with 50,000 images for training and 10,000 for testing.

Fig. 6 shows the training accuracy and loss (i.e., cross-entropy) behaviors obtained using various activation functions on MNIST. The corresponding validation accuracy and loss behaviors are shown in Fig. 7. We can see from Fig. 6 and Fig. 7 that our approaches perform comparably to the other activation functions. Tab. 2 summarizes the transitions of the learned parameter a at each layer of the CNN architecture (see Fig. 5) for MNIST. The validation accuracy rate and loss for MNIST are given in Tab. 3. In this case, the quantitative agreement between our results and the others also appears satisfactory.

Table 2: Transitions of the learned parameter, a, for the MNIST

Figure 6: Training accuracy and loss behaviors for the MNIST

Figure 7: Validation accuracy and loss behaviors for the MNIST

Table 3: Accuracy rate and loss for the MNIST

4.2 CIFAR-10

As the second benchmark dataset, we consider CIFAR-10, which consists of 32×32 color images drawn from 10 classes, with 50,000 images for training and 10,000 for testing.

Fig. 8 shows the training accuracy and loss (i.e., cross-entropy) behaviors obtained using various activation functions on CIFAR-10. The corresponding validation accuracy and loss behaviors are shown in Fig. 9. We can see from Fig. 8 and Fig. 9 that our approaches consistently outperform the other activation functions. Tab. 4 summarizes the transitions of the learned parameter a at each layer of the CNN architecture (see Fig. 5) for CIFAR-10. The validation accuracy rate and loss for CIFAR-10 are given in Tab. 5. For the accuracy rate on CIFAR-10, we obtain the best result of 80.76% using the exponential-type activation function. Our approaches also outperform the others in terms of loss.

Table 4: Transitions of the learned parameter, a, for the CIFAR-10

Figure 8: Training accuracy and loss behaviors for the CIFAR-10

Figure 9: Validation accuracy and loss behaviors for the CIFAR-10

Table 5: Accuracy rate and loss for the CIFAR-10

4.3 CIFAR-100

As the third benchmark dataset, CIFAR-100 has the same size and format as CIFAR-10, but contains 100 classes grouped into 20 super-classes of five classes each.

Table 6: Transitions of the learned parameter, a, for the CIFAR-100

Figure 10: Training accuracy and loss behaviors for the CIFAR-100

Table 7: Accuracy rate and loss for the CIFAR-100

Figure 11: Validation accuracy and loss behaviors for the CIFAR-100

Fig. 10 shows the training accuracy and loss behaviors obtained using various activation functions on CIFAR-100. The corresponding validation accuracy and loss behaviors are shown in Fig. 11. We can see from Fig. 10 and Fig. 11 that our approaches outperform the other activation functions. Tab. 6 summarizes the transitions of the learned parameter a at each layer of the CNN architecture for CIFAR-100. The validation accuracy rate and loss for CIFAR-100 are given in Tab. 7. For the accuracy rate on CIFAR-100, we obtain the best result of 56.91% using the rational-type activation function. Our approaches also outperform the others in terms of loss.

5 Conclusions

We have proposed new activation functions based on the steady advection-diffusion system in a fluid dynamics framework. In our formulation, two types of activation functions are presented by applying the so-called parametric softsign to the negative part of ReLU. Using TensorFlow as the deep learning framework, we utilized a CNN architecture that consists of three convolutional layers with max-pooling and one fully-connected softmax layer.

The performance of our approach was evaluated on three benchmark datasets, namely, MNIST, CIFAR-10 and CIFAR-100, through comparison with other activation functions. The learning results demonstrated that our approaches recognize object images with comparable or better accuracy, and with lower loss (i.e., cross-entropy), than the other activation functions.
