












Multi-Hop Reading Comprehension Based on Question Decomposition
FAN Rui-wen, BAI Yu, CAI Dong-feng
(Human-Computer Intelligence Research Center, Shenyang Aerospace University, Shenyang 110136, China)
Abstract: Multi-hop reading comprehension (MHRC) is an active and difficult research topic in natural language processing, with important and wide-ranging applications in text understanding, automatic question answering, and dialogue systems. To address the current lack of research on Chinese MHRC, a Chinese MHRC dataset for complex questions, Complex Chinese Machine Reading Comprehension (Complex CMRC), was constructed, and a Chinese MHRC method based on question decomposition was proposed. The method has two stages: question decomposition and question solving. First, a complex question decomposition method combining the JointBERT model with rules was proposed: JointBERT jointly models question type identification and question fragment identification to obtain accurate question type and question fragment information, and specially designed decomposition rules then split the complex question into multiple simple sub-questions. Second, a pre-trained BERT model was used to solve the sub-questions iteratively and obtain the final answer to the complex question. Question decomposition and question solving experiments were each conducted on the Complex CMRC dataset and achieved good results, verifying the effectiveness of the proposed method.
Key words: multi-hop reading comprehension; complex question decomposition; pre-trained model; dataset construction; question solving
CLC number: TP399    Document code: A
doi:10.3969/j.issn.2095-1248.2023.02.008
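The multi-hop reading comprehension (MHRC) task requires a machine to extract information from a given text and answer a given question through multi-step reasoning. Compared with single-hop reading comprehension, MHRC typically involves passages and questions with more complex structure and needs more reasoning steps to reach the answer. MHRC is therefore closer to real-world scenarios and to human reasoning, which gives it broader research and application value and also makes it more challenging.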
MHRC methods fall mainly into two categories: those based on question decomposition and those based on graph neural networks.
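To make the question decomposition idea concrete, the following is a minimal Python sketch of iterative sub-question solving. It is an illustration under stated assumptions, not the paper's implementation: the `[ANS]` placeholder, the function names `solve_iteratively` and `toy_qa`, and the example facts are all hypothetical, and the lookup table stands in for a trained single-hop reader such as a fine-tuned BERT model. The sub-questions are written by hand here; in a decomposition-based method they would come from the decomposition stage.

```python
from typing import Callable, List

PLACEHOLDER = "[ANS]"  # hypothetical marker for "answer of the previous hop"


def solve_iteratively(sub_questions: List[str],
                      context: str,
                      single_hop_qa: Callable[[str, str], str]) -> str:
    """Answer sub-questions in order, feeding each answer into the next one."""
    if not sub_questions:
        raise ValueError("need at least one sub-question")
    answer = ""
    for sub_question in sub_questions:
        # Fill in the previous hop's answer before asking the next sub-question.
        resolved = sub_question.replace(PLACEHOLDER, answer)
        answer = single_hop_qa(resolved, context)
    return answer


def toy_qa(question: str, context: str) -> str:
    """Stand-in for a trained single-hop reader (assumption, not a real model).

    It only looks the question up in a hard-coded table; the context is ignored.
    """
    facts = {
        "Who directed Film X?": "Director Y",
        "Where was Director Y born?": "City Z",
    }
    return facts[question]


# Complex question: "Where was the director of Film X born?"
# Assumed output of a decomposition step (written by hand for this sketch):
subs = ["Who directed Film X?", f"Where was {PLACEHOLDER} born?"]
final_answer = solve_iteratively(subs, context="", single_hop_qa=toy_qa)
print(final_answer)
```

Passing the single-hop reader in as a callable keeps the loop independent of any particular model, so the toy lookup could be swapped for a real extractive reader without changing the iteration logic.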