LIU Bang-Yi, ZHOU Ji-Liu, ZHANG Wei-Hua



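Violence detection is an important research direction in action recognition, with broad application prospects in online content moderation and intelligent security. Existing temporal models cannot effectively extract human motion features against complex backgrounds, and conventional recurrent neural networks cannot relate each input to its context. To address these problems, this paper proposes a temporal edge attention recurrent neural network, TEAR-Net. First, a novel motion feature extraction module proposed in this paper, MOE, enhances the motion boundary regions of the input video clip sequence while preserving its background information. Motion boundaries contribute far more to action recognition than other image regions, so enhancing them makes action feature extraction more effective and thereby improves the recognition accuracy of the downstream network. Second, a novel recurrent convolutional gate unit that combines context and an attention mechanism (CSA-ConvGRU) is introduced. It extracts flow features between consecutive frames as well as independent features of each frame, and it attends to key frames. This substantially improves the efficiency of action recognition and captures the global information of the video stream with few parameters and low computational cost, which effectively improves recognition accuracy. The proposed model was evaluated in a range of experiments on the recent public datasets RWF-2000 and RLVS. The results show that the proposed network outperforms current mainstream violence detection algorithms in both model size and detection accuracy.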
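As a rough illustration of what "enhancing motion boundaries while preserving the background" can mean, the sketch below uses the gradient magnitude of a frame-difference map as a stand-in for motion boundaries and scales each pixel by (1 + alpha * edge). This is an assumption-laden toy, not the MOE module itself, whose internals are not described here; the function names, the forward-difference gradient, and the `alpha` parameter are all hypothetical choices made for illustration.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities in [0, 1]


def frame_difference(prev: Frame, curr: Frame) -> Frame:
    """Absolute per-pixel difference between two consecutive frames (a crude motion map)."""
    return [[abs(c - p) for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]


def gradient_magnitude(img: Frame) -> Frame:
    """Forward-difference gradient magnitude; the gradient is taken as 0 past the last row/column."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][x + 1] - img[y][x] if x + 1 < w else 0.0
            gy = img[y + 1][x] - img[y][x] if y + 1 < h else 0.0
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def motion_edge_map(prev: Frame, curr: Frame) -> Frame:
    """Boundaries of the moving region, normalised so the strongest edge is 1."""
    grad = gradient_magnitude(frame_difference(prev, curr))
    peak = max(max(row) for row in grad)
    if peak == 0.0:  # no motion between the frames: nothing to enhance
        return [[0.0] * len(row) for row in grad]
    return [[v / peak for v in row] for row in grad]


def enhance_motion_edges(prev: Frame, curr: Frame, alpha: float = 1.0) -> Frame:
    """Scale each pixel of `curr` by (1 + alpha * edge), clipped to 1.

    Pixels away from motion edges keep weight 1, so the background is kept
    rather than masked out (hypothetical rule, not the paper's MOE).
    """
    edges = motion_edge_map(prev, curr)
    return [[min(1.0, c * (1.0 + alpha * e)) for c, e in zip(row_c, row_e)]
            for row_c, row_e in zip(curr, edges)]


# Toy usage: a uniform 5x5 background in which one pixel changes between frames.
background = [[0.5] * 5 for _ in range(5)]
moved = [row[:] for row in background]
moved[2][2] = 0.8
out = enhance_motion_edges(background, moved)
# out[0][0] stays 0.5 (background untouched); out[2][2] is clipped to 1.0
```

The multiplicative weight is the design point: unlike a hard motion mask, a factor that never drops below 1 leaves static context visible to the downstream network while still emphasizing the boundaries of moving regions.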
Violent behavior; Temporal information; Motion boundary; Attention mechanism; Context
CLC number: TP391; Document code: A; 2023.023003
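Received: 2022-01-17
Funding: Sichuan Science and Technology Program (2022YFQ0047)
About the author: LIU Bang-Yi (1998-), male, master's student; research interests: computer vision, pattern recognition, etc. E-mail: 228980603@qq.com
Corresponding author: ZHANG Wei-Hua. E-mail: zhangweihua@scu.edu.cn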
Temporal edge attention recurrent neural network for violence detection
LIU Bang-Yi¹, ZHOU Ji-Liu¹, ZHANG Wei-Hua²
(1. College of Electronic Information Engineering, Sichuan University, Chengdu 610065, China;
2. College of Computer Science, Sichuan University, Chengdu 610065, China)
Violence detection is one of the most important research topics in behavior recognition, with great potential for applications in online content review and intelligent security. Published methods cannot maintain their performance in complex environments because they cannot effectively extract motion features or relate consecutive frames to one another. Hence, this paper proposes a novel method, the temporal edge attention recurrent neural network (TEAR-Net). First, we propose a novel motion object enhancement (MOE) module, which enhances motion edges while keeping the background information of the video sequences. Because motion edges contribute much more to action recognition than other areas of the image, enhancing them makes action feature extraction more efficient and thus improves recognition accuracy. Then we introduce a novel recurrent convolutional gate unit, CSA-ConvGRU, which combines context with an attention mechanism. It extracts the flow features between consecutive frames as well as the independent features of each frame. The attention mechanism helps the network focus on key frames, which greatly improves the efficiency of action recognition, captures the global information of the video stream at a low cost, and thus effectively improves recognition accuracy. The proposed model has been tested on the recent public datasets RWF-2000 and RLVS. The experimental results show that the proposed model outperforms state-of-the-art violence detection algorithms in terms of computational cost and detection accuracy.
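The combination of a gated recurrent unit with attention over key frames can be illustrated, under heavy simplifying assumptions, by the sketch below: a GRU with element-wise (diagonal) weights runs over per-frame feature vectors, and scaled dot-product attention with a single query vector pools the per-frame hidden states. This is not CSA-ConvGRU: the element-wise products stand in for a ConvGRU's convolutions, the "context" component is not modeled, and the helper names, parameter keys, and fixed query are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Params = Dict[str, Vector]  # hypothetical keys: wz, uz, bz, wr, ur, br, wh, uh, bh


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x: Sequence[float], h: Sequence[float], p: Params) -> Vector:
    """One GRU update with element-wise (diagonal) weights.

    Simplification: a ConvGRU replaces each element-wise product with a
    convolution over a spatial feature map; the gating logic is the same.
    """
    new_h = []
    for i, (xi, hi) in enumerate(zip(x, h)):
        z = sigmoid(p["wz"][i] * xi + p["uz"][i] * hi + p["bz"][i])  # update gate
        r = sigmoid(p["wr"][i] * xi + p["ur"][i] * hi + p["br"][i])  # reset gate
        cand = math.tanh(p["wh"][i] * xi + p["uh"][i] * (r * hi) + p["bh"][i])  # candidate state
        new_h.append((1.0 - z) * hi + z * cand)  # blend old state and candidate
    return new_h


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_frames(states: Sequence[Sequence[float]],
                       query: Sequence[float]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention of one query vector over per-frame hidden states.

    Returns the attention-weighted clip summary and the weight given to each frame.
    """
    scale = math.sqrt(len(query))
    scores = [sum(s * q for s, q in zip(state, query)) / scale for state in states]
    weights = softmax(scores)
    summary = [sum(w * state[d] for w, state in zip(weights, states))
               for d in range(len(query))]
    return summary, weights


def encode_clip(frames: Sequence[Sequence[float]], p: Params,
                query: Sequence[float]) -> Tuple[Vector, Vector]:
    """Run the GRU over per-frame feature vectors, then pool the hidden states with attention."""
    h: Vector = [0.0] * len(frames[0])
    states = []
    for x in frames:
        h = gru_step(x, h, p)
        states.append(h)
    return attend_over_frames(states, query)


# Toy usage: 3 frames of 2-D features; the middle frame carries the strongest signal.
params: Params = {k: [0.5, 0.5] for k in ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")}
frames = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
summary, weights = encode_clip(frames, params, query=[1.0, 1.0])
# With these parameters the second frame receives the largest attention weight.
```

The recurrence carries information from frame to frame, while the softmax weights let the clip summary lean on whichever frames score highest against the query, which is one simple way to "focus on key frames" at negligible parameter cost.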
Violence; Temporal information; Motion edge; Attention mechanism; Context
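1 Introduction
With the continuing expansion of urbanization and the concentration of population, the public, with regard to public area……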