CLC number: TN911.73-34    Document code: A    Article ID: 1004-373X(2025)07-0029-06
Research on image semantic segmentation method based on improved HRNet and PPM
SHI Jiaqi1, YANG Haojun2, LIU Xiaoyue1, CHEN Xin1
(1. College of Electrical Engineering, North China University of Science and Technology, Tangshan 063000, China;
2. Beijing University of Posts and Telecommunications, Beijing 100876, China)
Abstract: A semantic segmentation method is proposed to address two problems: existing semantic segmentation models cannot balance global semantic information with local detail information, and residual modules have a weak ability to extract detail features. A pyramid pooling module (PPM) is introduced into HRNet so that the model captures both global semantic information and local detail information. In addition, large-kernel depthwise convolution is introduced into the original residual module, Basic Block, to strengthen the model's detail feature extraction and substantially improve its accuracy. Experiments on the PASCAL VOC2012 image dataset show that, compared with the original HRNet and other segmentation networks, the proposed algorithm significantly improves segmentation accuracy, reaching a mean segmentation accuracy of 89.27%. Ablation experiments verify the effectiveness of each designed module; in particular, the improved Basic Block plays a key role in raising segmentation performance. The model substantially improves the accuracy of image semantic segmentation and provides an efficient, stable, and broadly applicable multi-scale semantic segmentation algorithm.
Keywords: HRNet; pyramid pooling module; large-kernel depthwise convolution; residual module; semantic segmentation; deep learning
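The abstract names two building blocks: a pyramid pooling module and a residual block strengthened with large-kernel depthwise convolution. The NumPy sketch below illustrates both under stated assumptions and is not the authors' implementation. The PPM follows the PSPNet style with bins (1, 2, 3, 6) and a 1x1 projection to C/N channels per level; the bin sizes used in this paper are not given here. Random weights stand in for learned ones, nearest-neighbour resizing replaces bilinear upsampling, and `lk_residual_block`, which computes ReLU(x + DWConv_k(x)), is a hypothetical simplification rather than the paper's exact Basic Block modification.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Per-channel 'same' convolution (cross-correlation, as in DL frameworks).
    x: (C, H, W); kernels: (C, k, k) with k odd."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H x W
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            # each channel is filtered only by its own k x k kernel
            out += kernels[:, dy, dx][:, None, None] * xp[:, dy:dy + h, dx:dx + w]
    return out


def lk_residual_block(x, kernels):
    """HYPOTHETICAL block: ReLU(x + DWConv_k(x)); not the paper's exact design."""
    return np.maximum(x + depthwise_conv2d(x, kernels), 0.0)


def adaptive_avg_pool(x, bins):
    """Average-pool (C, H, W) into (C, bins, bins); assumes H, W >= bins."""
    c, h, w = x.shape
    out = np.empty((c, bins, bins))
    for i in range(bins):
        r0, r1 = i * h // bins, ((i + 1) * h + bins - 1) // bins
        for j in range(bins):
            c0, c1 = j * w // bins, ((j + 1) * w + bins - 1) // bins
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def upsample_nearest(x, h, w):
    """Nearest-neighbour resize of (C, bh, bw) to (C, h, w)."""
    _, bh, bw = x.shape
    rows = np.arange(h) * bh // h
    cols = np.arange(w) * bw // w
    return x[:, rows[:, None], cols[None, :]]


def pyramid_pooling(x, bins=(1, 2, 3, 6), rng=None):
    """PSPNet-style PPM sketch: pool at several grid sizes, project with a
    1x1 conv (random stand-in weights), upsample, concatenate with the input."""
    if rng is None:
        rng = np.random.default_rng(0)
    c, h, w = x.shape
    reduced = c // len(bins)
    branches = [x]  # the original features keep local detail
    for b in bins:
        pooled = adaptive_avg_pool(x, b)  # b = 1 gives global context
        weight = rng.standard_normal((reduced, c)) / np.sqrt(c)  # stand-in 1x1 conv
        proj = np.maximum(np.einsum("oc,chw->ohw", weight, pooled), 0.0)
        branches.append(upsample_nearest(proj, h, w))
    return np.concatenate(branches, axis=0)  # (C + len(bins) * reduced, H, W)


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 12, 12))
k7 = rng.standard_normal((8, 7, 7)) * 0.05  # a 7x7 "large" depthwise kernel
y = lk_residual_block(feat, k7)
z = pyramid_pooling(y)
print(y.shape, z.shape)  # (8, 12, 12) (16, 12, 12)
```

Concatenating the unpooled input with the upsampled pooled branches is what lets a PPM combine global context (the 1x1 bin) with local detail (the original map). A depthwise kernel costs C·k² weights instead of the C²·k² of a dense k x k convolution, which is why large kernels are usually applied depthwise.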
0 Introduction
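As a fundamental problem in computer vision, semantic segmentation plays an important role in parsing image scenes and is widely used in image editing, robotics, augmented reality, autonomous driving, medical imaging, and other fields. Its task is to assign each image pixel the category label of the object it belongs to, thereby providing a high-level image representation for the target task[1].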
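In practice, assigning a label to every pixel usually means taking, at each pixel, the class with the highest predicted score. A minimal NumPy sketch, assuming the network outputs a (K, H, W) score array; for PASCAL VOC 2012, K would be 21 (20 object classes plus background). The helper name `scores_to_label_map` is hypothetical.

```python
import numpy as np


def scores_to_label_map(scores):
    """Convert (K, H, W) per-pixel class scores into an (H, W) label map."""
    return scores.argmax(axis=0)  # ties resolve to the lowest class index


scores = np.zeros((3, 2, 2))  # toy example: 3 classes, 2x2 image
scores[1, 0, 0] = 1.0  # pixel (0, 0) is most likely class 1
scores[2, 1, 1] = 1.0  # pixel (1, 1) is most likely class 2
labels = scores_to_label_map(scores)
print(labels.tolist())  # [[1, 0], [0, 2]]
```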
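Recent semantic segmentation methods usually rely on a convolutional encoder-decoder architecture, in which the encoder produces low-resolution image features and the decoder upsamples them into a segmentation map of per-pixel class scores. The Fully Convolutional Network (FCN)[2] enables end-to-end prediction and can handle images of arbitrary size, but its coarse upsampling makes it insensitive to image details, and because it does not model the relationships between pixels, its segmentation accuracy is relatively low. ……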
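Why coarse upsampling loses detail can be shown with a toy example. This is an illustration only, not FCN itself, which uses learned upsampling layers and skip connections: a one-pixel-wide object is downsampled 4x by average pooling (standing in for the encoder), upsampled back by nearest-neighbour repetition (standing in for the decoder), and thresholded at 0.5. The helper names `avg_pool` and `upsample_repeat` are illustrative.

```python
import numpy as np


def avg_pool(x, s):
    """Non-overlapping s x s average pooling of an (H, W) map; H, W divisible by s."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))


def upsample_repeat(x, s):
    """Nearest-neighbour upsampling by an integer factor s."""
    return np.repeat(np.repeat(x, s, axis=0), s, axis=1)


mask = np.zeros((8, 8))
mask[:, 3] = 1.0                        # thin vertical object: 8 pixels
coarse = avg_pool(mask, 4)              # "encoder": 8x8 -> 2x2, object diluted to 0.25
restored = upsample_repeat(coarse, 4)   # "decoder": 2x2 -> 8x8
pred = (restored > 0.5).astype(int)     # threshold the restored map
print(int(mask.sum()), int(pred.sum()))  # 8 0
```

The object survives only as a faint 0.25 response spread over 4x4 blocks, so it disappears after thresholding; keeping high-resolution features, as HRNet does, is one way to avoid this loss.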