
An Improved End-to-End Memory Network for QA Tasks

Computers, Materials & Continua, September 2019 issue

Aziguli Wulamu, Zhenqi Sun, Yonghong Xie, Cong Xu and Alan Yang

Abstract: At present, the End-to-End trainable Memory Network (MemN2N) has proven to be promising in many deep learning fields, especially on simple natural language-based reasoning question and answer (QA) tasks. However, when solving some subtasks such as basic induction, path finding or time reasoning, it remains challenging because of its limited ability to learn useful information between memory and query. In this paper, we propose a novel gated linear unit (GLU) and local-attention based end-to-end memory network (MemN2N-GL), motivated by the success of the attention mechanism in neural machine translation. It shows an improved ability to capture complex memory-query relations and works better on some subtasks. It is an improved end-to-end memory network for QA tasks. We demonstrate the effectiveness of these approaches on the bAbI dataset, which includes 20 challenging tasks, without the use of any domain knowledge. Our project is open source on GitHub.

Keywords: QA system, memory network, local attention, gated linear unit.

1 Introduction

The QA problem has been around for a long time. As early as 1950, the British mathematician A. M. Turing put forward in his paper a method to determine whether a machine can think, the Turing Test, which is seen as the blueprint for the QA system [Turing (1950)]. The first generation of intelligent QA systems converted simple natural language questions into pre-set single or multiple keywords and queried a domain-specific database to obtain answers. Their earliest appearance can be traced back to the 1950s and 1960s, when the computer was born. Representative systems include two well-known QA systems, Baseball [Green Jr, Wolf, Chomsky et al. (1961)] and Lunar [Woods and Kaplan (1977)]. They have a database in the background that holds the various data the system can provide. When the user asks a question, the system converts the user's question into a SQL query statement and returns the data queried from the database to the user.

Shrdlu was a highly successful QA program developed by Terry Winograd in the late 1960s and early 1970s [Winograd (1972)]; it simulated the operation of a robot in a toy world (the "blocks world"). The reason for its success was the choice of a specific domain whose physical rules were easily written as programs. The travel information consultation system GUS, developed by Bobrow et al. in 1977, is another successful QA system [Bobrow, Kaplan, Norman et al. (1977)]. In the 1990s, with the development of the Internet, a second generation of question answering systems emerged. They extract answers from large-scale text or web-based libraries based on information retrieval techniques and shallow NLP techniques [Srihari and Li (2000); Voorhees (1999); Zheng (2002)]. A representative example is Start (1993), the world's first web-based question answering system, developed by the MIT Artificial Intelligence Lab [Katz (1997)]. In 1999, TREC (the Text REtrieval Conference) began the evaluation of question answering systems. In October 2000, ACL (the Association for Computational Linguistics) took the open-domain question answering system as a topic, which promoted the rapid development of question answering systems. With the rise of Web 2.0 technology, the third generation of question answering systems has developed [Tapeh and Rahgozar (2008)]. It is characterized by high-quality knowledge resources and deep NLP technology. Up to now, in addition to Microsoft's "Cortana" [Young (2019)], Baidu's "Dumi" [Zhu, Huang, Chen et al. (2018)] and Apple's "Siri" [Hoy (2018)], many companies and research groups have also made breakthroughs in this field [Becker and Troendle (2018); Zhou, Gao, Li et al. (2018)].

Table 1: Samples of three types of tasks

In such a situation, end-to-end learning frameworks have shown promising performance because of their applicability in real environments and their efficiency in model updating [Shi and Yu (2018); Madotto, Wu and Fung (2018); Li, Wang, Sun et al. (2018); Liu, Tur, Hakkani-Tur et al. (2018)]. In end-to-end dialog systems, the End-to-End Memory Network (MemN2N) and its variants have always been hot topics of research [Perez and Liu (2018); Ganhotra (2018)], in light of their powerful ability to describe long-term dependencies [Huang, Qi, Huang et al. (2017)] and their flexibility in the implementation process.

Although MemN2N has achieved good performance on the bAbI tasks, where the memory components effectively work as a representation of the dialog context and play a good role in inference, there are still many bAbI tasks on which its results are not very satisfactory [Shi and Yu (2018)]. In order to find out the reasons for this, we made a careful comparison. We found that the tasks on which MemN2N performs poorly (such as Task 3, supporting facts, in Tab. 1) have in common that they contain many more contextual sentences than the tasks on which it performs well (such as the yes/no questions task in Tab. 1). Moreover, when calculating the relevance of the memory and the query, MemN2N attends to all sentences on the memory side for each query [Sukhbaatar, Szlam, Weston et al. (2015)], which is expensive and can potentially render it impractical. Inspired by the field of machine translation [Luong and Pham (2015)], we introduce a local-attention mechanism when calculating the correlation between memory and query: we do not consider all the information in memory, but only a subset of sentences that are more relevant to the query. At the same time, inspired by the gated convolutional network (GCN) [Dauphin, Fan, Auli et al. (2017)], we incorporate the idea of gated linear units (GLU) into MemN2N to update the intermediate state between layers. The purpose of these two improvements is the same: to appropriately reduce the complexity of the model and let it pay more attention to useful information during training.

We compared our two improved methods with MemN2N on the bAbI tasks and analyzed their numbers of successful tasks and error rates on the different tasks. We also analyzed them from the perspective of training speed and weight visualization. Experimentally, we demonstrate that both of our approaches are effective.

In the following sections, we first introduce the applications of the MemN2N model and the innovations of our MemN2N-GL model in the second section; in the third section, we introduce the implementation of our model in detail, including local-attention matching and GLU mapping; then, we show our experimental results in the fourth section and make a comparative analysis with the baseline; finally, in the fifth section, we present our conclusion and plans for future work.

Figure 1: A single-layer version of MemN2N

2 Model

The MemN2N architecture, introduced by Sukhbaatar et al. [Sukhbaatar, Szlam, Weston et al. (2015)], is a hot research topic in the field of current QA systems [Ganhotra and Polymenakos (2018); Perez and Liu (2018)]. It is a form of Memory Network (MemNN) [Weston, Chopra and Bordes (2014)], but unlike the model in that work, it is trained end-to-end, which makes it easier to apply in practical situations [Sukhbaatar, Szlam, Weston et al. (2015)]. Compared with the traditional MemNN, MemN2N receives less supervision during training, which reduces some complexity but means it may not fully capture the context information.

Because of these good characteristics, MemN2N has been used in a wide range of tasks in recent years. Pan et al. introduced a novel neural network architecture called Multi-layer Embedding with Memory Network for the machine reading task, where a memory network performs full-orientation matching of the query and passage to catch more pivotal information [Pan, Li, Zhao et al. (2017)]. Ghazvininejad et al. presented a novel, fully data-driven and knowledge-grounded neural conversation model aimed at producing more contentful responses [Ghazvininejad, Brockett, Chang et al. (2018)]. In the field of computer vision, Wu et al. proposed a long-term feature bank, which extracts supportive information over the entire span of a video to augment state-of-the-art video models [Wu, Feichtenhofer, Fan et al. (2018)]. Although MemN2N has been widely used in many fields and has achieved good results, it may not scale well to cases where a larger memory is required [Sukhbaatar, Szlam, Weston et al. (2015)].

In this paper, inspired by the attention mechanism and its many variants in the field of deep learning [Shen, Zhou, Long et al. (2018); Zhang, Goodfellow, Metaxas et al. (2018); Shen, He and Zhang (2018)], we propose introducing a local-attention mechanism into MemN2N to improve the model. Compared with the global matching between u and each memory m_i used in MemN2N (Eq. (4)), the local-attention mechanism pays more attention to the local information in memory related to the question state u. We also optimize the update of the hidden state u between the layers of MemN2N (Fig. 1). The original method uses a linear mapping H (Eq. (10)), while we draw on the experience of the GLU (Gated Linear Unit) proposed by Dauphin et al. [Dauphin, Fan, Auli et al. (2017)]. We compare the model improved in these two respects with MemN2N on the same data sets. As a result, our model performs better on more complex QA tasks.

3 Methods

In this section, we introduce our proposed model, MemN2N-GL. Our model aims to extract more useful interactions between memory and query to improve the accuracy of MemN2N. Similar to MemN2N, MemN2N-GL consists of three main components: input memory representation, output memory representation and final answer prediction.

In the input memory representation, an input set x_1, ..., x_i is converted into memory vectors {m_i} and {c_i} of dimension d in a continuous space, using embedding matrices A and C, both of size d×V, where d is the embedding size and V is the vocabulary size. Similarly, the query q is also embedded (by a matrix B) to an internal state u. We use position encoding (PE) [Sukhbaatar, Szlam, Weston et al. (2015)] to convert word vectors into sentence vectors. This takes the form:

m_i = Σ_j l_j ∘ A x_{ij}

where ∘ denotes element-wise multiplication and l_j is a column vector with the structure:

l_{kj} = (1 − j/J) − (k/d)(1 − 2j/J)

J is the number of words in the sentence, and d is the dimension of the embedding. In this way, the position information of the words is taken into account when generating the sentence vector. Questions, memory inputs and memory outputs all use the same representation. In order to give the memory contextual temporal information, we also modify the memory vectors by:

m_i = Σ_j l_j ∘ A x_{ij} + T_A(i)

where T_A(i) is the i-th row of a special matrix T_A that encodes temporal information, and T_A is learned during training.
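For concreteness, the following is a minimal NumPy sketch of the position and temporal encodings described above; the variable names (story, A, T_A) and the row-per-word layout of A are illustrative assumptions, not taken from the released code.

```python
import numpy as np

def position_encoding(J, d):
    """Position-encoding matrix l of shape (J, d), with
    l[j, k] = (1 - j/J) - (k/d) * (1 - 2j/J) for 1-based j, k."""
    l = np.empty((J, d))
    for j in range(1, J + 1):
        for k in range(1, d + 1):
            l[j - 1, k - 1] = (1 - j / J) - (k / d) * (1 - 2 * j / J)
    return l

def encode_memory(story, A, T_A):
    """story: (S, J) array of word ids; A: (V, d) embedding matrix;
    T_A: (S, d) temporal matrix. Returns memory vectors m of shape (S, d)."""
    S, J = story.shape
    d = A.shape[1]
    l = position_encoding(J, d)                  # (J, d)
    word_vecs = A[story]                         # (S, J, d) embedded words
    m = (l[None, :, :] * word_vecs).sum(axis=1)  # position-weighted sum over words
    return m + T_A                               # add the temporal encoding T_A(i)
```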

Then, in the embedding space, MemN2N calculates the relevance score between u and each memory m_i by matching in dot-product form [Luong and Pham (2015)] followed by a softmax:

p_i = Softmax(u^T m_i)

where Softmax(z_i) = e^{z_i} / Σ_j e^{z_j}.

After applying the softmax function, each component p_i lies in the interval (0, 1) and the components add up to 1, so they can be interpreted as probabilities. Furthermore, larger input components correspond to larger probabilities.
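A minimal sketch of this global matching step, assuming the NumPy conventions of the previous snippet:

```python
def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def match_global(u, m):
    """u: (d,) query state; m: (S, d) memory vectors.
    Returns the attention probabilities p_i = Softmax(u^T m_i)."""
    return softmax(m @ u)
```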

By contrast, we develop a local-attention mechanism that calculates the correlation between u and {m_i} while filtering out irrelevant information. Compared with the attention mechanism used in MemN2N, our model does not focus on the relevance of the whole memory to the query, but on the local memory associated with the query.

3.1 Local-attention matching

As mentioned before, local-attention matching chooses to focus only on a small subset of the memory that is more relevant to the query q (Fig. 2). Concretely, the model first generates an aligned position p_u for the query q in the memory embedding A:

p_u = S · δ(v_p^T tanh(W_a u))

where v_p and W_a are model parameters learned to predict positions, S is the memory size, δ is the activation function and p_u ∈ [0, S].

Then the relevance score between u and each memory m_i is defined as:

p̃_i = p_i · exp(−(i − p_u)^2 / (2σ^2))

where p_i is the original score (Eq. (4)), σ is the standard deviation and D is the window size of the memory subset. Finally, we use the new relevance score p̃_i to calculate the output memory representation in Fig. 2. The response o from the output memory is a sum of the memory vectors {c_i}, weighted by the input probability vector:

o = Σ_i p̃_i c_i

Finally, in the final answer prediction, the predicted answer distribution â is produced from the sum of the output vector o and the input embedding u, which is then passed through a final weight matrix W (of size V×d) and a softmax:

â = Softmax(W(o + u))
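Putting the pieces of this section together, the sketch below shows one way the local-attention scoring and answer prediction could be implemented; using a sigmoid for δ and setting σ = D/2 follow Luong et al.'s local attention and are assumptions about the exact parameterization, and the snippet reuses the softmax helper from the previous sketch.

```python
def local_attention_answer(u, m, c, v_p, W_a, W, D):
    """u: (d,) query state; m, c: (S, d) input/output memories;
    v_p: (d,) and W_a: (d, d) position-prediction parameters;
    W: (V, d) final weight matrix; D: window size of the memory subset."""
    S = m.shape[0]
    # Aligned position p_u in [0, S]; sigmoid plays the role of the activation δ.
    p_u = S * (1.0 / (1.0 + np.exp(-(v_p @ np.tanh(W_a @ u)))))
    # Original global scores, re-weighted by a Gaussian centred at p_u.
    p = softmax(m @ u)
    sigma = D / 2.0                     # assumption: sigma tied to the window size
    i = np.arange(S)
    p_local = p * np.exp(-((i - p_u) ** 2) / (2.0 * sigma ** 2))
    # Output memory response and predicted answer distribution.
    o = (p_local[:, None] * c).sum(axis=0)
    return softmax(W @ (o + u))
```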

Figure 2: Local-Attention Matching

The above describes the single-layer structure (Fig. 2). For many different types of difficult tasks, the model can be extended to a multi-layer memory structure (Fig. 1), where each memory layer is called a hop. In MemN2N, the (K+1)-th hop's state u^{K+1} is calculated by:

u^{K+1} = H u^K + o^K

By contrast, we utilize gated linear units (GLU) in our MemN2N-GL. Compared with the linear mapping H or frequently used nonlinear mapping functions, the GLU effectively reduces gradient dispersion while retaining nonlinearity. As shown by our experiments, it is better suited to MemN2N.

3.2 Gated linear units mapping

In our MemN2N-GL, we use the layer-wise form [Sukhbaatar, Szlam, Weston et al. (2015)] (where the memory embeddings are the same across different layers, i.e., A_1 = ... = A_K and C_1 = ... = C_K) to expand the model from a single-layer to a multi-layer structure. In this case, we apply a GLU mapping to the update of u between layers:

where W, V ∈ ℝ^{m×m}, m is the embedding size, b, c ∈ ℝ^{m×1}, W, V, b and c are learned parameters, and o^K ∈ ℝ^{m×1} is the output of the K-th layer (Fig. 1).
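Since the exact update equation is not reproduced here, the sketch below should be read as one plausible instantiation under stated assumptions: the GLU of Dauphin et al., (W x + b) ⊗ σ(V x + c), applied to the hop output o^K, combined with u^K in a residual fashion by analogy with the original u^{K+1} = H u^K + o^K update (this combination is an assumption, not the authors' exact form).

```python
def glu_hop_update(u, o, W, V, b, c):
    """u, o: (m,) current hop state and hop output; W, V: (m, m); b, c: (m,).
    Returns the next hop state (assumes numpy as np from earlier sketches)."""
    gate = 1.0 / (1.0 + np.exp(-(V @ o + c)))  # sigmoid gate on the hop output
    glu_o = (W @ o + b) * gate                 # gated linear unit (Dauphin et al.)
    return u + glu_o                           # assumed residual combination with u
```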

Then, the predicted answer distribution â is produced by combining the input and the output of the top memory layer (Fig. 3).

Figure 3: A K-layer version of MemN2N

Finally, at the top of the network, we adopt the original approach of combining the input u^K and the output o^K of the top memory layer:

â = Softmax(W(u^K + o^K))

where W is the parameter matrix learned during training.

Compared with MemN2N, our model performs better on complex QA problems, which is confirmed by the experimental results in the next section. We believe that the local-attention mechanism removes redundant memory when calculating the correlation between memory and query, so the weight vector obtained is "purer" and contains more useful information. Besides, compared with a linear mapping, the GLU mapping is nonlinear, which gives the model stronger learning ability in the update of u between layers.

4 Experiments and results

We perform experiments on the bAbI dataset [Weston, Bordes, Chopra et al. (2015)], which contains 20 subtasks. Each subtask consists of three parts: the context statements of the problem, the question, and the correct answer. Samples of three of the tasks are shown in Tab. 1. For each question, only certain subsets of the statements contain the information needed for the answer, while the other statements are basically unrelated distractors. The difficulty of the various subtasks differs, which is reflected in the number of interfering statements. During training, we use the 10k dataset; our goal is to improve the ability of the model to answer questions correctly based on context.

4.1 Training details

We perform our experiments with the following hyper-parameter values: embedding dimension embed_size = 128, learning rate λ = 0.01, batch size batch_size = 32, number of layers K = 3, memory capacity memory_size = 50 and max gradient norm for clipping max_clip = 40.0. We also used some techniques during training; for example, the learning rate of our model is adjusted automatically with the change of the loss. If the loss value increases rather than decreases between adjacent training epochs, the learning rate is reduced to 2/3 of its current value. Training terminates when the loss value falls below a certain threshold (0.001 in our experiments) or the number of training epochs reaches the upper limit. The training time varies with the difficulty of the different subtasks, but all of them finish within one day.
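The configuration and learning-rate schedule described above can be summarized in a small sketch; the identifier names are illustrative and not taken from the released project.

```python
# Hyper-parameters reported above.
config = {
    "embed_size": 128,       # embedding dimension d
    "learning_rate": 0.01,   # initial learning rate
    "batch_size": 32,
    "num_hops": 3,           # number of memory layers K
    "memory_size": 50,       # memory capacity
    "max_clip": 40.0,        # max gradient norm for clipping
    "loss_threshold": 1e-3,  # stop once the loss drops below this value
}

def next_learning_rate(lr, prev_loss, curr_loss):
    """Reduce the learning rate to 2/3 of its current value whenever the loss
    increases between adjacent epochs; otherwise keep it unchanged."""
    return lr * 2.0 / 3.0 if curr_loss > prev_loss else lr
```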

Table 2: Test error rates (%) on the 20 QA tasks

Task | MemN2N | MemN2N (Local-Attention) | MemN2N (GLU) | MemN2N-GL
5: 3 argument relations | 9.5 | 11.2 | 3.2 | 0.9
6: yes/no questions | 50.0 | 50.0 | 25.2 | 48.2
7: counting | 50.3 | 11.5 | 16.1 | 10.9
8: lists/sets | 8.7 | 7.3 | 6.0 | 5.6
9: simple negation | 12.3 | 35.2 | 13.1 | 4.4
10: indefinite knowledge | 11.3 | 2.6 | 3.9 | 13.9
11: basic coreference | 15.9 | 18.0 | 40.9 | 9.0
12: conjunction | 0.0 | 0.0 | 2.8 | 0.0
13: compound coreference | 46.1 | 21.3 | 18.8 | 1.4
14: time reasoning | 6.9 | 5.8 | 4.4 | 10.3
15: basic deduction | 43.7 | 75.8 | 2.4 | 0.0
16: basic induction | 53.3 | 52.5 | 57.8 | 53.1
17: positional reasoning | 46.4 | 50.8 | 48.6 | 46.2
18: size reasoning | 9.7 | 7.4 | 13.6 | 12.9
19: path finding | 89.1 | 89.3 | 14.5 | 24.7
20: agent's motivation | 0.6 | 0.0 | 1.7 | 0.0
Mean error (%) | 30.2 | 29.2 | 21.2 | 19.0
Successful tasks (err < 5%) | 4 | 5 | 8 | 8

Table 3: The visualization weights of layers

4.2 Results and analysis

Our baseline is MemN2N. We try three different combinations of improvements and compare their performance on the different subtasks (as shown in Tab. 2). MemN2N (Local-Attention) and MemN2N (GLU) indicate that the model adds only the local-attention mechanism or only the GLU mechanism, respectively; MemN2N-GL means that both improvements are present simultaneously. The numbers in the table are the error rates on each subtask, and the bold numbers mark the model that performs best on each subtask. In the last two rows of the table, we report the mean error rates of all models and the number of successful tasks (subtasks with error rates of less than 5%).

In terms of results, MemN2N-GL achieves the best results both in mean error rate and in the number of successful tasks. Compared with MemN2N, the mean error rate is reduced by 37.09%, and the number of successful tasks doubles from four to eight. MemN2N (Local-Attention) and MemN2N (GLU) have their own advantages and disadvantages, but both of them outperform MemN2N.
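This headline figure follows directly from the mean error rates in Tab. 2: (30.2 − 19.0) / 30.2 ≈ 0.3709, i.e., a 37.09% relative reduction.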

4.3 Related tasks

In addition to comparing the results of each model on the different tasks in Tab. 2, we use a specific example to analyze the results through the visualization weights of the layers. As shown in Tab. 3, the memory sentence most relevant to the query "Who did Fred give the apple to?" is the first memory sentence, "Fred gave the apple to Bill". After training, MemN2N does not focus on the memory sentences of greater relevance, which leads to a wrong answer. Our model, in contrast, pays close attention to the contextual information that is highly relevant to the query, which is reflected in the size of the correlation weight at each layer: the darker the color, the greater the weight.

5 Conclusion and future work

In this paper we proposed two improvements to the MemN2N model for QA problems and performed an empirical evaluation on the bAbI dataset. The experimental results show that our improved model performs better than the original model, which strongly supports our conjecture that the model should pay more attention to useful information during training. In the future, we plan to further improve the ability of the model to handle complex tasks. At the same time, we are going to test our model on more datasets. We also intend to combine our model with recent research results such as BERT (Bidirectional Encoder Representations from Transformers), using our model as a downstream component, to see whether this achieves better results.

Acknowledgement: This work is supported by the National Key Research and Development Program of China under Grant 2017YFB1002304 and the National Natural Science Foundation of China (No. 61672178).
