
An Improved End-to-End Memory Network for QA Tasks

Aziguli Wulamu, Zhenqi Sun, Yonghong Xie, Cong Xu and Alan Yang

Computers, Materials & Continua, 2019, Issue 9

Abstract: At present, the End-to-End trainable Memory Network (MemN2N) has proven promising in many deep learning fields, especially on simple natural-language reasoning question-and-answer (QA) tasks. However, when solving subtasks such as basic induction, path finding or time reasoning, it remains challenging because of its limited ability to learn useful information between memory and query. In this paper, motivated by the success of attention mechanisms in neural machine translation, we propose a novel gated linear unit (GLU) and local-attention based end-to-end memory network (MemN2N-GL). It shows an improved ability to capture complex memory-query relations and works better on these subtasks. It is an improved end-to-end memory network for QA tasks. We demonstrate the effectiveness of these approaches on the bAbI dataset, which includes 20 challenging tasks, without the use of any domain knowledge. Our project is open source on GitHub.

Keywords: QA system, memory network, local attention, gated linear unit.

1 Introduction

The QA problem has been around for a long time. As early as 1950, the British mathematician A. M. Turing put forward in his paper a method to determine whether a machine can think, the Turing Test, which is seen as the blueprint for the QA system [Turing (1950)]. The first generation of intelligent QA systems converted simple natural language questions into pre-set single or multiple keywords and queried a domain-specific database to obtain answers. Its earliest appearance can be traced back to the 1950s and 1960s, when the computer was born. Representative systems include two well-known QA systems, Baseball [Green Jr, Wolf, Chomsky et al. (1961)] and Lunar [Woods and Kaplan (1977)]. They have a database in the background that holds the various data the system can provide. When the user asks a question, the system converts the user's question into a SQL query statement, queries the database and returns the data to the user.

SHRDLU was a highly successful QA program developed by Terry Winograd in the late 1960s and early 1970s [Winograd (1972)]. It simulated the operation of a robot in a toy world (the "blocks world"); the reason for its success was the choice of a specific domain whose physical rules were easily written as programs. The travel information consultation system GUS, developed by Bobrow et al. in 1977, is another successful QA system [Bobrow, Kaplan, Norman et al. (1977)]. In the 1990s, with the development of the Internet, a second generation of question answering systems emerged. They extract answers from large-scale text or web-based libraries using information retrieval techniques and shallow NLP techniques [Srihari and Li (2000); Voorhees (1999); Zheng (2002)]. A representative example is Start (1993), the world's first web-based question answering system, developed by the MIT Artificial Intelligence Lab [Katz (1997)]. In 1999, TREC (the Text REtrieval Conference) began the evaluation of question answering systems. In October 2000, ACL (the Association for Computational Linguistics) took the open-domain question answering system as a topic, which promoted the rapid development of question answering systems. With the rise of Web 2.0 technology, the third generation of question answering systems developed [Tapeh and Rahgozar (2008)]. It is characterized by high-quality knowledge resources and deep NLP technology. Up to now, in addition to the "Cortana" of Microsoft [Young (2019)], the "Dumi" of Baidu [Zhu, Huang, Chen et al. (2018)] and the "Siri" of Apple [Hoy (2018)], many companies and research groups have also made breakthroughs in this field [Becker and Troendle (2018); Zhou, Gao, Li et al. (2018)].

Table 1: Samples of three types of tasks

In such a situation, end-to-end learning frameworks have shown promising performance because of their applicability in real environments and their efficiency in model updating [Shi and Yu (2018); Madotto, Wu and Fung (2018); Li, Wang, Sun et al. (2018); Liu, Tur, Hakkani-Tur et al. (2018)]. In end-to-end dialog systems, the end-to-end memory network (MemN2N) and its variants have always been hot topics of research [Perez and Liu (2018); Ganhotra (2018)], in light of their powerful ability to describe long-term dependencies [Huang, Qi, Huang et al. (2017)] and their flexibility in implementation.

Although MemN2N has achieved good performance on the dialog bAbI tasks, where the memory components effectively serve as a representation of the dialog context and play a useful role in inference, there are still many bAbI tasks on which it is not very satisfactory [Shi and Yu (2018)]. In order to find out the reasons for this, we made a careful comparison. We found that the tasks on which performance is poor (such as the three-supporting-facts task in Tab. 1) have in common that they contain many more contextual sentences than the tasks that perform well (such as the yes/no-questions task in Tab. 1). Moreover, when calculating the relevance of the memory and the query, MemN2N attends to all sentences on the memory side for each query [Sukhbaatar, Szlam, Weston et al. (2015)], which is expensive and can potentially render it impractical. Inspired by the field of machine translation [Luong, Pham and Manning (2015)], we introduce a local-attention mechanism when calculating the correlation between memory and query. We do not consider all the information in memory, but only a subset of sentences that are more relevant to the query. At the same time, inspired by the gated convolutional network (GCN) [Dauphin, Fan, Auli et al. (2017)], we incorporate the idea of gated linear units (GLU) into MemN2N to update the intermediate state between layers. The purpose of these two improvements is the same: to appropriately reduce the complexity of the model and let it pay more attention to useful information during training.

We compare our two improved methods with MemN2N on the bAbI tasks and analyze their number of successful tasks and their error rates on the different tasks. We also analyze the models from the perspectives of training speed and weight visualization. Experimentally, we demonstrate that both of our approaches are effective.

In the following sections, we first introduce the applications of the MemN2N model and the innovations of our MemN2N-GL model in the second section; in the third section, we introduce the implementation of our model in detail, including local-attention matching and GLU mapping; then, we present our experimental results in the fourth section and make a comparative analysis with the baseline; finally, in the fifth section, we give our conclusions and plans for future work.

Figure 1: A single-layer version of MemN2N

2 Model

The MemN2N architecture, introduced by Sukhbaatar et al. [Sukhbaatar, Szlam, Weston et al. (2015)], is an active research direction in the field of QA systems [Ganhotra and Polymenakos (2018); Perez and Liu (2018)]. It is a form of Memory Network (MemNN) [Weston, Chopra and Bordes (2014)], but unlike the model in that work, it is trained end-to-end, which makes it easier to apply in practical situations [Sukhbaatar, Szlam, Weston et al. (2015)]. Compared with the traditional MemNN, MemN2N receives less supervision during training, which reduces some complexity but means it may not be able to fully capture the context information.

Because of these good characteristics, MemN2N has been used in a wide range of tasks in recent years. Pan et al. introduced a novel neural network architecture called Multi-layer Embedding with Memory Networks for the machine reading task, in which a memory network performs full-orientation matching of the query and passage to catch more pivotal information [Pan, Li, Zhao et al. (2017)]. Ghazvininejad et al. presented a novel, fully data-driven and knowledge-grounded neural conversation model aimed at producing more contentful responses [Ghazvininejad, Brockett, Chang et al. (2018)]. In the field of computer vision, Wu et al. proposed a long-term feature bank, which extracts supportive information over the entire span of a video to augment state-of-the-art video models [Wu, Feichtenhofer, Fan et al. (2018)]. Although MemN2N has been widely used in many fields and has achieved good results, it may not scale well to cases where a larger memory is required [Sukhbaatar, Szlam, Weston et al. (2015)].

In this paper, inspired by the attention mechanism and its many variants in the field of deep learning [Shen, Zhou, Long et al. (2018); Zhang, Goodfellow, Metaxas et al. (2018); Shen, He and Zhang (2018)], we propose introducing a local attention mechanism into MemN2N to improve the model. Compared to the global matching between u and each memory m_i used in MemN2N (Eq. (4)), the local attention mechanism pays more attention to the local information in the memory related to the question state u. We also optimize the updating of the hidden state u between layers of MemN2N (Fig. 1). The original method uses a linear mapping H (Eq. (10)), while we draw on the GLU (Gated Linear Unit) proposed by Dauphin et al. [Dauphin, Fan, Auli et al. (2017)]. We compare the model improved on these two points with MemN2N on the same data sets. As a result, our model performs better on more complex QA tasks.

3 Methods

In this section, we introduce our proposed model, MemN2N-GL. Our model aims to extract more useful interactions between memory and query to improve the accuracy of MemN2N. Similar to MemN2N, our MemN2N-GL consists of three main components: input memory representation, output memory representation and final answer prediction.

In the input memory representation, an input set x_1, ..., x_i is converted into memory vectors {m_i} and {c_i} of dimension d in a continuous space, using embedding matrices A and C, both of size d×V, where d is the embedding size and V is the vocabulary size. Similarly, the query q is also embedded (by matrix B) to an internal state u. We use position encoding (PE) [Sukhbaatar, Szlam, Weston et al. (2015)] to convert word vectors into sentence vectors. This takes the form:

where l_j is a column vector with the structure:

J is the number of words in the sentence, and d is the dimension of the embedding. In this way, the position information of the words is taken into account when generating the sentence vector. Questions, memory inputs and memory outputs all use the same representation. In order to give the memory contextual temporal information, we also modify the memory vector by:

where T_A(i) is the i-th row of a special matrix T_A that encodes temporal information, and T_A is learned during training.
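As a concrete illustration of the input memory representation described above, the following is a minimal NumPy sketch of the position-encoding and temporal-encoding steps, following the standard formulation of Sukhbaatar et al. (2015); the names position_encoding, embed_A and T_A are illustrative and not taken from our released code.

```python
import numpy as np

def position_encoding(J, d):
    # l[k, j] = (1 - j/J) - (k/d) * (1 - 2*j/J), the standard PE weighting
    # (1-indexed j and k, as in Sukhbaatar et al. (2015)).
    l = np.zeros((d, J))
    for k in range(1, d + 1):
        for j in range(1, J + 1):
            l[k - 1, j - 1] = (1 - j / J) - (k / d) * (1 - 2 * j / J)
    return l  # shape (d, J)

def encode_memory(sentences, embed_A, T_A):
    """sentences: int array (S, J) of word ids; embed_A: (V, d); T_A: (S, d)."""
    S, J = sentences.shape
    d = embed_A.shape[1]
    l = position_encoding(J, d)            # (d, J)
    m = np.zeros((S, d))
    for i in range(S):
        words = embed_A[sentences[i]]      # (J, d) word embeddings A x_ij
        m[i] = (l.T * words).sum(axis=0)   # sum_j l_j * A x_ij
        m[i] += T_A[i]                     # add the temporal-encoding row T_A(i)
    return m
```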

Then, in the embedding space, MemN2N calculates the relevance score between u and each memory m_i by means of dot-product matching [Luong, Pham and Manning (2015)], followed by a softmax:

where

After applying the softmax function, each component of the score vector with entries u^T m_i will be in the interval (0, 1), and the components will add up to 1, so that they can be interpreted as probabilities. Furthermore, larger input components correspond to larger probabilities.
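This corresponds to the standard MemN2N attention of Sukhbaatar et al. (2015), which can be written as:

$$p_i = \operatorname{softmax}\left(u^{\top} m_i\right), \qquad \operatorname{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$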

By contrast, we develop a local-attention mechanism that calculates the correlation between u and {m_i} while filtering out irrelevant information. Compared with the attention mechanism used in MemN2N, our model does not focus on the relevance between the query and the global memory, but on the local memory associated with the query.

3.1 Local-attention matching

As mentioned before, local-attention matching chooses to focus only on a small subset of the memory, the part more relevant to the query q (Fig. 2). Concretely, the model first generates an aligned position p_u for the query q in the memory embedding A:

where v_p and W_a are model parameters that are learned to predict positions, S is the memory size, δ is the activation function, and p_u ∈ [0, S].

Then the relevance score between u and each memory m_i is defined as:

where p_i is the original score (Eq. (4)), σ is the standard deviation and D is the window size of the memory subset. Finally, we use the new relevance score to calculate the output memory representation in Fig. 2. The response o from the output memory is a sum of the memory vectors {c_i}, weighted by the input probability vector:
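Written out in the local-attention form of Luong, Pham and Manning (2015) that this section follows, the aligned position, the re-weighted score and the output read approximately as below; taking the activation δ to be the sigmoid and setting σ = D/2 are conventions carried over from that paper rather than details fixed here, and the tilde marks the re-weighted score for illustration:

$$p_u = S \cdot \operatorname{sigmoid}\!\left(v_p^{\top}\tanh\!\left(W_a u\right)\right), \qquad \tilde{p}_i = p_i \exp\!\left(-\frac{(i - p_u)^2}{2\sigma^2}\right), \qquad o = \sum_i \tilde{p}_i\, c_i$$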

Finally, in the final answer prediction, the predicted answer distribution â is produced from the sum of the output vector o and the input embedding u, which is then passed through a final weight matrix W (of size V×d) and a softmax:
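This is the standard MemN2N prediction step of Sukhbaatar et al. (2015):

$$\hat{a} = \operatorname{softmax}\left(W\left(o + u\right)\right)$$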

Figure 2: Local-attention matching

The above describes the single-layer structure (Fig. 2). For many different types of difficult tasks, the model can be extended to a multi-layer memory structure (Fig. 1), where each memory layer is called a hop. In MemN2N, the (K+1)-th hop's state u^{K+1} is calculated by:
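For reference, with the linear mapping H used by the original layer-wise MemN2N, this inter-layer update reads:

$$u^{K+1} = H\, u^{K} + o^{K}$$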

By contrast, we utilize gated linear units (GLU) in our MemN2N-GL. Compared to the linear mapping H or frequently used nonlinear mapping functions, GLU effectively reduces gradient vanishing while retaining nonlinearity. Our experiments show that it is better suited to MemN2N.

3.2 Gated linear units mapping

In our MemN2N-GL, we use the layer-wise form [Sukhbaatar, Szlam, Weston et al. (2015)] (where the memory embeddings are the same across different layers, i.e., A^1 = ... = A^K and C^1 = ... = C^K) to expand the model from a single-layer to a multi-layer structure. In this case, we apply GLU mapping to the update of u between layers:

where W, V ∈ ℝ^{m×m}, m is the embedding size, b, c ∈ ℝ^{m×1}, W, V, b, c are learned parameters, and o^K ∈ ℝ^{m×1} is the output of the K-th layer (Fig. 1).
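A minimal sketch of the GLU mapping of Dauphin et al. (2017) applied to the inter-layer update is given below; combining u^K and o^K by summation before gating is an illustrative assumption that mirrors the original u + o update rather than a detail stated above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu_update(u_k, o_k, W, V, b, c):
    """Inter-layer GLU update sketch.

    u_k, o_k: (m,) hidden state and output of the K-th hop.
    W, V: (m, m) weight matrices; b, c: (m,) biases (all learned).
    """
    # Combining u^K and o^K by summation is an assumption for illustration.
    x = u_k + o_k
    # GLU(x) = (Wx + b) ⊗ σ(Vx + c), as in Dauphin et al. (2017).
    return (W @ x + b) * sigmoid(V @ x + c)
```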

Then, the predicted answer distribution â is the combination of the input and the output of the top memory layer:

Figure 3: A K-layer version of MemN2N

Finally, at the top of the network, we adopt the original approach, combining the input u^k and the output o^k of the top memory layer:

where W is the parameter learned during training.

Compared with MemN2N, our model performs better on complex QA problems, which is confirmed by the experimental results in the next section. We believe that the local attention mechanism removes redundant memory when calculating the correlation between memory and query, so the weight vector obtained is "purer" and contains more useful information. Besides, compared with linear mapping, GLU mapping has the ability of nonlinearity, which gives the model stronger learning ability in the update of u between layers.

4 Experiments and results

We perform experiments on the goal-oriented dialog dataset Dialog bAbI [Weston, Bordes, Chopra et al. (2015)], which contains 20 subtasks. Each subtask consists of three parts: the context statements of the problem, the question, and the correct answer. Samples of three of the tasks are shown in Tab. 1. For each question, only a certain subset of the statements contains the information needed for the answer, while the other statements are basically unrelated distractors. The difficulty of the various subtasks differs, which is reflected in the number of interference statements. During training, we use the 10K dataset; our goal is to improve the model's ability to answer questions correctly based on context.

4.1 Training details

We perform our experiments with the following hyper-parameter values: embedding dimension embed_size=128, learning rate λ=0.01, batch size batch_size=32, number of layers K=3, memory capacity memory_size=50 and maximum gradient norm for clipping max_clip=40.0. We also use some techniques during training; for example, the learning rate of our model is adjusted automatically with the change of the loss. If the loss does not decrease but increases between adjacent training epochs, the learning rate is reduced to 2/3 of its current value. Training terminates when the loss value falls below a threshold (0.001 in our experiments) or when the number of training epochs reaches the upper limit. The training time varies with the difficulty of the different subtasks, but all runs finish within one day.
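A small sketch of the learning-rate adjustment and stopping rules described above; the max_epochs value is illustrative, since the epoch limit is not specified here.

```python
def adjust_learning_rate(lr, prev_loss, curr_loss):
    # Decay rule described above: if the loss increased between adjacent
    # epochs, shrink the learning rate to 2/3 of its current value.
    if prev_loss is not None and curr_loss > prev_loss:
        lr *= 2.0 / 3.0
    return lr

def should_stop(curr_loss, epoch, loss_threshold=0.001, max_epochs=1000):
    # Termination: loss below the threshold (0.001 in our experiments)
    # or the epoch limit reached (max_epochs is an illustrative value).
    return curr_loss < loss_threshold or epoch >= max_epochs
```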

Table 2: Test error rates (%) on the 20 QA tasks

Task                          MemN2N   MemN2N (Local-Attention)   MemN2N (GLU)   MemN2N-GL
5: three argument relations     9.5            11.2                    3.2           0.9
6: yes/no questions            50.0            50.0                   25.2          48.2
7: counting                    50.3            11.5                   16.1          10.9
8: lists/sets                   8.7             7.3                    6.0           5.6
9: simple negation             12.3            35.2                   13.1           4.4
10: indefinite knowledge       11.3             2.6                    3.9          13.9
11: basic coreference          15.9            18.0                   40.9           9.0
12: conjunction                 0.0             0.0                    2.8           0.0
13: compound coreference       46.1            21.3                   18.8           1.4
14: time reasoning              6.9             5.8                    4.4          10.3
15: basic deduction            43.7            75.8                    2.4           0.0
16: basic induction            53.3            52.5                   57.8          53.1
17: positional reasoning       46.4            50.8                   48.6          46.2
18: size reasoning              9.7             7.4                   13.6          12.9
19: path finding               89.1            89.3                   14.5          24.7
20: agent's motivation          0.6             0.0                    1.7           0.0
Mean error (%)                 30.2            29.2                   21.2          19.0
Successful tasks (err < 5%)       4               5                      8             8

Table 3: Visualization of the weights of each layer

4.2 Results and analysis

Our baseline is MemN2N. We try three different combinations of improvements and compare their performance on the different subtasks (as shown in Tab. 2). MemN2N (Local-Attention) and MemN2N (GLU) indicate that the model adds only the local attention mechanism or only the GLU mechanism, respectively; MemN2N-GL means that both improvements are present simultaneously. The numbers in the table are the error rates on each subtask, and the bold numbers mark the best-performing model on that subtask. In the last two rows of the table, we report the mean error rates of all models and the number of successful tasks (subtasks with error rates less than 5%).

In terms of results, MemN2N-GL achieves the best results both in mean error rate and in the number of successful tasks. Compared with MemN2N, the mean error rate is reduced by 37.09%, and the number of successful tasks doubles from four to eight. MemN2N (Local-Attention) and MemN2N (GLU) each have their own advantages and disadvantages, but both perform better than MemN2N.

4.3 Related tasks

In addition to comparing the results of each model on the different tasks in Tab. 2, we use a specific example to analyze the result through the visualization of the layer weights. As shown in Tab. 3, the memory sentence most relevant to the query "Who did Fred give the apple to?" is the first memory sentence, "Fred gave the apple to Bill". After training, MemN2N does not focus on the memory sentences of greatest relevance, which leads to a wrong answer. In contrast, our model pays close attention to the contextual information that is highly relevant to the query, which is reflected in the size of the correlation weight at each layer: the darker the color, the greater the weight.

5 Conclusion and future work

In this paper, we proposed two improvements to the MemN2N model for the QA problem and performed an empirical evaluation on the dialog dataset bAbI. The experimental results show that our improved model performs better than the original model, which strongly confirms our conjecture that the model should pay more attention to useful information during training. In the future, we plan to further improve the model's ability to handle complex tasks. At the same time, we are going to test our model on more datasets. We also intend to combine our model with recent research results such as BERT (Bidirectional Encoder Representations from Transformers) and use our model as a downstream component to see whether this achieves better results.

Acknowledgement: This work is supported by the National Key Research and Development Program of China under Grant 2017YFB1002304 and the National Natural Science Foundation of China (No. 61672178).
