Abstract: To address the privacy-protection requirements of existing image datasets, this paper proposes a privacy-preserving scenario for image datasets and a method for generating privacy-preserving substitute image data under that scenario. In this scenario, a substitute image dataset processed by a privacy-preserving method replaces the original image dataset: each substitute image corresponds one-to-one with an original image, humans cannot identify the class of a substitute image, yet the substitute images can still train existing deep-learning image classification algorithms with good classification performance. For this scenario, the data privacy-protection method based on the PGD (Projected Gradient Descent) attack is improved: the attack target of the original PGD attack is changed from a label to an image, i.e., an image-to-image attack, and an adversarially trained robust model performs the image-to-image attack to generate the substitute data. On the standard test sets, the substituted CIFAR-10 (Canadian Institute For Advanced Research) dataset and CINIC dataset achieve test accuracies of 87.15% and 74.04%, respectively, on the image classification task. Experimental results show that the method can generate a substitute dataset for the original dataset that preserves privacy against human observers while maintaining the classification performance of existing methods on the dataset.
Keywords: deep learning; privacy protection; computer vision; adversarial attack; adversarial example
CLC number: TP391; Document code: A
Alternative Data Generation Method of Privacy-Preserving Image
LI Wanying(a,b), LIU Xueyan(a,b), YANG Bo(a,b)
(a. College of Computer Science and Technology; b. Key Laboratory of Symbolic Computing and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China)
Abstract: Aiming at the privacy-protection requirements of existing image datasets, a privacy-preserving scenario for image datasets and a privacy-preserving image substitute data generation method are proposed. In this scenario, the original image dataset is replaced with a substitute image dataset processed by a privacy-preserving method, where each substitute image corresponds one-to-one with an original image. Humans cannot identify the category of a substitute image, yet the substitute images can be used to train existing deep-learning image classification algorithms with good classification performance. For this scenario, the data privacy-protection method based on the PGD (Projected Gradient Descent) attack is improved: the attack target of the original PGD attack is changed from a label to an image, i.e., an image-to-image attack, and an adversarially trained robust model performs the image-to-image attack to generate the substitute data. On the standard test sets, the substituted CIFAR-10 (Canadian Institute For Advanced Research) dataset and CINIC dataset achieve 87.15% and 74.04% test accuracy, respectively, on the image classification task. Experimental results show that the method can generate a substitute dataset for the original dataset while guaranteeing the privacy of the substitute dataset against humans and maintaining the classification performance of existing methods on this dataset.
Key words: deep learning; privacy protection; computer vision; adversarial attack; adversarial example
0 Introduction
Datasets play a foundational role in the research and development of machine learning. They form the basis for designing and deploying models and serve as the primary medium for benchmarking and evaluation. Modern machine learning has been built on the study of large-scale and diverse datasets. For example, early computer-vision algorithms could only be trained and tested on small-scale datasets collected under laboratory conditions and therefore could not be applied in the real world [1]. When people adopt ……
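The image-to-image PGD attack described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact procedure: it assumes PyTorch, treats `model` as an adversarially trained robust feature extractor, and replaces PGD's usual label-based loss with a feature-matching loss toward a target image; the function name, loss choice, and hyperparameters are hypothetical.

```python
import torch


def image_to_image_pgd(model, x_start, x_target, eps=8 / 255, alpha=2 / 255, steps=40):
    """PGD variant whose attack target is an image rather than a label.

    model:    robust feature extractor (assumed; any module mapping images
              to feature vectors works for this sketch)
    x_start:  image to perturb (the perturbation stays in an L-inf eps-ball
              around it, so the result looks like x_start to humans)
    x_target: image whose features the perturbed image is steered toward
    """
    with torch.no_grad():
        feat_target = model(x_target)

    x_adv = x_start.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Loss: distance between current features and the target image's features
        loss = torch.nn.functional.mse_loss(model(x_adv), feat_target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # Signed gradient *descent*: move features toward the target image
            x_adv = x_adv - alpha * grad.sign()
            # Project back into the eps-ball around x_start, keep valid pixels
            x_adv = x_start + (x_adv - x_start).clamp(-eps, eps)
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```

Compared with standard PGD, the only changes are the sign of the update (descent toward the target rather than ascent away from the true label) and the loss, which compares robust-model features of two images instead of logits against a label.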