Part Decomposition and Refinement Network for Human Parsing

2022-06-25 01:18:34
IEEE/CAA Journal of Automatica Sinica, 2022, Issue 6

Lu Yang, Zhiwei Liu, Tianfei Zhou, and Qing Song

Dear Editor,

This letter is concerned with human parsing based on part-wise semantic prediction. The human body can be regarded as a whole structure composed of different semantic parts, and mainstream single human parsers use a semantic segmentation pipeline to solve this problem. However, the differences between the human parsing and semantic segmentation tasks bring some issues that are hard to avoid. In this letter, we propose a novel method called part decomposition and refinement network (PDRNet), which adopts part-wise mask prediction rather than pixel-wise semantic prediction to tackle the human parsing task. Specifically, we decompose the human body into different semantic parts and design a decomposition module to learn the central position of each part. A refinement module is proposed to obtain the mask of each human part by learning a convolution kernel and a convolved feature. In the inference stage, the predicted human part masks are combined into a complete human parsing result. Through the decomposition, refinement and combination of human parts, PDRNet greatly reduces the confusion between the target human and background humans, and also significantly improves the semantic consistency of human parts. Extensive experiments show that PDRNet performs favorably against state-of-the-art methods on several human parsing benchmarks, including LIP, CIHP and Pascal-Person-Part.

Introduction: The problem of assigning dense semantic labels to a human image, formally known as human parsing, is of great importance in computer vision, as it finds many applications, including clothing retrieval, virtual reality, and human-computer interaction [1], [2]. Generally speaking, the vast majority of existing human parsing methods follow one of two paradigms: bottom-up and top-down. Bottom-up methods [3], [4] treat human parsing as a fine-grained semantic segmentation task, predicting the category of each pixel and grouping it into the corresponding instance. Top-down methods [5]-[10] locate each instance in the image plane and then segment each human part independently. Therefore, an accurate single human parser is particularly important for the top-down approach. Mainstream single human parsers map the human body to a feature space of the same size [5], [7], [11] and use a pixel-wise semantic segmentation pipeline to solve the problem. However, there are great differences between the human parsing and semantic segmentation tasks. First, in human parsing, all human bodies except the target human are regarded as background, whereas semantic segmentation does not distinguish between human instances and tends to treat the target human and background humans equally (background confusion errors). Second, each human part is an instance with a boundary, and the same semantic label must be assigned to the whole part. However, semantic segmentation is a pixel-wise classification, which cannot guarantee that all pixels in one part are predicted as the same category (semantic inconsistency errors).

In this work, we aim to resolve the errors caused by the mismatch between method and objective in human parsing. We abandon the semantic segmentation pipeline and borrow the idea of instance segmentation [12]-[14]: we decompose the human body into different semantic parts, segment each part mask independently, and then combine them into a complete human structure. Specifically, we propose a decomposition module to predict the centers and categories of human parts on the feature map. The decomposition module encodes the position information of human parts into the spatial dimension and the category information into the channel dimension. Therefore, the prior geometric context of the human body is retained in the feature map, which effectively avoids confusion between the target human and background humans. To obtain the mask of each part, we propose a refinement module. The refinement module consists of two branches: one learns the convolution kernel at the center of each part, and the other learns the convolved feature. We use dynamic convolution to generate the mask for each part, which converts the traditional pixel-wise semantic segmentation problem into a more concise binary part-wise mask segmentation. In the inference stage, we present a human parsing probability map combination method based on the predicted human part categories and masks. For each category, the predicted mask with the highest score is sampled, weighted fusion is carried out according to the quality score [11], and the results are finally combined into a complete human parsing result.
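The dynamic convolution step in the refinement module can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a 1×1 dynamic kernel read from an S×S grid cell at the predicted part center, a sigmoid activation, and a 0.5 threshold, and the names `kernel_map`, `feature`, and `part_mask_from_center` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def part_mask_from_center(kernel_map, feature, center, threshold=0.5):
    """Generate one binary part mask by a dynamic 1x1 convolution.

    kernel_map: (S, S, C) kernels predicted on an S x S grid (kernel branch).
    feature:    (C, H, W) convolved feature (feature branch).
    center:     (row, col) grid cell holding the predicted part center.
    All shapes and the 1x1 kernel size are assumptions of this sketch.
    """
    row, col = center
    kernel = kernel_map[row, col]  # (C,) kernel for this part
    # A 1x1 convolution is a dot product over channels at every pixel.
    logits = np.tensordot(kernel, feature, axes=([0], [0]))  # (H, W)
    prob = sigmoid(logits)
    return prob, prob > threshold

# Toy inputs standing in for network outputs.
rng = np.random.default_rng(0)
S, C, H, W = 4, 8, 16, 16
kernel_map = rng.normal(size=(S, S, C))
feature = rng.normal(size=(C, H, W))
prob, mask = part_mask_from_center(kernel_map, feature, center=(1, 2))
print(prob.shape, mask.dtype)
```

Because each part has its own kernel, every mask is a binary foreground/background decision for one part, not a per-pixel choice among all categories.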

As shown in Figs. 1(a)-1(c), we call the proposed method of decomposition, refinement and combination of the human body PDRNet. Experiments show that PDRNet achieves state-of-the-art performance on several benchmarks, including CIHP [3], Pascal-Person-Part [15] and LIP [16]. Meanwhile, we also verify through qualitative comparison that PDRNet can significantly reduce background confusion errors and semantic inconsistency errors.

Fig. 1. Illustration of the proposed PDRNet for human parsing. We decompose the human body into different semantic parts, segment each part mask independently, and then combine them into a complete human structure.

Related work:

Fig. 2. Illustration of the proposed decomposition module, refinement module and human parsing probability map combination.

· Human parsing: Human parsing has attracted a lot of research effort in recent years [3], [15], [16]. Most works regard it as a special case of semantic segmentation and improve performance by introducing attention mechanisms [10], [17], auxiliary supervision [5], [15], [18], [19], human hierarchical structure [7], [8], [20] or quality estimation methods [9], [11]. Some earlier studies introduced prior knowledge of human structure by designing hand-crafted features [21] or grammar models [22]. The attention mechanism [10] was then adopted to construct the geometric context of the human body, which promoted the development of the community. To improve the ability of semantic segmentation networks to understand human structure, some researchers use keypoint [15] and edge [5] supervision to improve the model representation. Graph transfer learning [23], graph networks [7], [8] and semantic neural trees [20] are used to exploit the human representational capacity. However, it is difficult to eliminate the differences between the semantic segmentation method and the human parsing objective through these efforts. A new perspective is urgently needed to solve the human parsing problem. Guided by this intuition, we decompose the human body into parts, segment each part mask independently, and combine them into a complete structure, which makes a further step towards the consistency of method and objective in human parsing.

· Instance segmentation: Instance segmentation is a more challenging form of dense pixel prediction. It needs not only to predict semantic categories at the pixel level, but also to distinguish different instances in the image simultaneously. According to whether object proposals are explicitly adopted, instance segmentation methods can be divided into proposal-based [12] and proposal-free [13], [14]. The first successful attempts were proposal-based methods. The milestone work is Mask R-CNN [12], which borrows from the two-stage object detection framework by detecting the object box first and then segmenting the object mask within the box. Proposal-free methods attempt a more direct idea, using pixel grouping, object contours and other strategies [24] to obtain the object mask. Some pioneering efforts [13], [14] segment the mask directly without box supervision, which has advantages in efficiency and performance. Our work is inspired by this idea and can be viewed as an attempt to explore methods beyond the semantic segmentation pipeline in the area of human parsing.

Methodology:

Experiments:

Table 1. The Impact of Conv. Number (Top) and Grid Number (Bottom) of PDRNet on LIP val Set

· Human parsing probability map combination: Table 2 shows that sampling the highest-score mask (denoted as Top-1) achieves higher performance than NMS-based sampling [14]. We argue this is because each part of the human body is unique, so there is no need for a complex duplicate removal method. The core of probability map combination is to fuse the sampled part masks into a complete probability map. As shown in Table 3, quality-based fusion is 0.75 points mIoU higher than direct fusion and 1.50 points mIoU higher than score-based fusion. By comparing direct fusion and quality-based fusion, we find that it is necessary to weight the masks by quality, so that low-quality predictions can be suppressed. Table 4 illustrates the influence of the ensemble factor α on human parsing performance. Note that using only the global probability map O_global (α = 0.0) or only the part probability map O_part (α = 1.0) is not the best choice. Setting α = 0.75 achieves the best performance, with 88.53% pixel accuracy, 69.01% mean accuracy and 57.97% mIoU.

Table 2. Performance Comparison of Different Kinds of Predicted Masks Sampling on LIP val Set

Table 3. Performance Comparison of Different Kinds of Human Parts Fusion on LIP val Set

Table 4. Ablation Study of Global and Parts Ensemble Factor α on LIP val Set
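The combination procedure examined in these ablations (Top-1 sampling, quality-based fusion, and the ensemble factor α) can be sketched in NumPy as follows. This is an illustration under stated assumptions and not the letter's exact formulation: it assumes that quality-based fusion multiplies each sampled mask by its quality score, and that the part and global probability maps are blended linearly by α. The function `combine_parts` and the candidate dictionary layout are hypothetical.

```python
import numpy as np

def combine_parts(candidates, o_global, alpha=0.75):
    """Fuse predicted part masks into a parsing label map.

    candidates: list of dicts with keys 'category' (int), 'score' (float),
                'quality' (float) and 'mask' ((H, W) probabilities).
    o_global:   (K, H, W) global probability map over K categories.
    alpha:      ensemble factor; 0.0 uses only the global map,
                1.0 uses only the part map (assumed linear blend).
    """
    num_classes = o_global.shape[0]
    o_part = np.zeros_like(o_global)
    for c in range(num_classes):
        same_class = [p for p in candidates if p["category"] == c]
        if not same_class:
            continue  # no part predicted for this category
        # Top-1 sampling: keep only the highest-scoring mask per category.
        best = max(same_class, key=lambda p: p["score"])
        # Quality-based fusion (assumed form): weight the mask by its quality.
        o_part[c] = best["quality"] * best["mask"]
    fused = alpha * o_part + (1.0 - alpha) * o_global
    return fused.argmax(axis=0)

# Toy example: 3 categories on a 2 x 2 map, uniform global probabilities.
left = np.array([[1.0, 0.0], [1.0, 0.0]])
right = np.array([[0.0, 1.0], [0.0, 1.0]])
candidates = [
    {"category": 1, "score": 0.9, "quality": 0.8, "mask": left},
    {"category": 1, "score": 0.4, "quality": 0.9, "mask": np.ones((2, 2))},
    {"category": 2, "score": 0.7, "quality": 0.5, "mask": right},
]
o_global = np.full((3, 2, 2), 1.0 / 3.0)
parsing = combine_parts(candidates, o_global, alpha=0.75)
print(parsing)
```

In the toy example the second category-1 candidate is discarded by Top-1 sampling even though its quality is higher, since sampling uses the classification score and quality only weights the surviving mask.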

· CIHP [3]: In the upper part of Table 6, we compare our method against 11 recent methods on the CIHP val set. PDRNet achieves the best results on all metrics: 65.1% mIoU, 63.5% AP^p and 57.5% AP^r. Compared with the previous state-of-the-art QANet [11], PDRNet yields improvements of 1.3 points in mIoU and 1.7 points in AP^p.

· PASCAL-person-part [15]: PASCAL-person-part is a classic multiple human parsing benchmark with only 7 semantic categories. The lower part of Table 6 summarizes the quantitative comparison with 9 competitors on the PASCAL-person-part test set. Our PDRNet with an HRNet-W48 backbone yields 73.3% mIoU, 63.9% AP^p and 59.1% AP^r, which again demonstrates its superior performance.

Conclusions: Using a traditional semantic segmentation pipeline for the human parsing task brings unavoidable semantic inconsistency and background confusion errors. This work draws on the idea of instance segmentation and proposes a new human parsing method to address these issues. First, a decomposition module is designed to encode the human geometry prior and predict the center position of each part. Then, a refinement module is proposed to predict the part masks. In the inference stage, the predicted human part masks are combined into a complete human parsing probability map. We verify the superiority of our method on several benchmarks, and further show that it can be flexibly combined with existing human parsing frameworks.

Acknowledgments: This work was supported by the National Key Research and Development Program of China (2021YFF0500900).

Table 5. Comparison of Pixel Accuracy, Mean Accuracy and mIoU on LIP val Set
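For reference, the three metrics reported on LIP can be computed from a confusion matrix. The sketch below uses the standard definitions of pixel accuracy, mean per-class accuracy and mean intersection over union. It is not taken from the letter, and the function name `parsing_metrics` is hypothetical.

```python
import numpy as np

def parsing_metrics(conf):
    """Compute pixel accuracy, mean accuracy and mIoU.

    conf: (K, K) confusion matrix, conf[i, j] = number of pixels with
          ground-truth class i predicted as class j.
    Classes absent from both ground truth and prediction are skipped.
    """
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)            # correctly classified pixels per class
    gt = conf.sum(axis=1)         # ground-truth pixels per class
    pred = conf.sum(axis=0)       # predicted pixels per class
    pixel_acc = tp.sum() / conf.sum()
    present = gt > 0
    mean_acc = np.mean(tp[present] / gt[present])
    union = gt + pred - tp
    valid = union > 0
    miou = np.mean(tp[valid] / union[valid])
    return pixel_acc, mean_acc, miou

# Toy two-class confusion matrix.
pixel_acc, mean_acc, miou = parsing_metrics([[3, 1], [1, 5]])
print(pixel_acc, mean_acc, miou)
```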

Table 6. Comparison With Previous Methods on Multiple Human Parsing on CIHP val and PASCAL-Person-Part Sets. Bold Numbers are State-of-the-Art on Each Dataset, ? Denotes Using Multi-Scale Test Augmentation

