As a core component of many real-world applications, image-based deep learning models have been widely applied in various fields. However, many studies have shown that such models are highly vulnerable to adversarial attacks. Here, the term “adversarial attack” refers to an attack that targets a deep learning model by modifying legitimate input data with slight, human-imperceptible perturbations. Adversarial attacks are known to cause severe damage to practical image-based deep learning systems such as self-driving systems, face recognition systems, and perceptual ad-blocking systems. In this dissertation, we focus on robust defense techniques that protect image-based deep learning models against adversarial attacks. To achieve this goal, we first propose two new defense methods against white-box adversarial attacks: one detects white-box adversarial attacks, and the other provides image-based deep learning models with robustness against them. We also propose a new defense method that detects black-box adversarial attacks based on perceptual image hashing. Specifically, three notable results are obtained:
1. Clustering Approach for Detecting White-box Adversarial Attacks: We note that existing detection methods against white-box adversarial attacks can only classify input data as either legitimate or adversarial. That is, they can detect adversarial examples but cannot classify the input data into multiple classes, i.e., legitimate input data and the various types of adversarial attacks. To overcome this limitation, we propose an advanced detection method that detects white-box adversarial attacks while also classifying the attack type. The proposed method extracts key features from the adversarial perturbation and feeds the extracted features into a clustering model. Through analysis on various application datasets, we show that the proposed method can classify the types of adversarial attacks, and that its detection accuracy outperforms that of recent detection methods.
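The pipeline above — estimate the perturbation, extract features from it, and cluster the feature vectors — can be sketched as follows. This is an illustrative sketch only, not the dissertation's exact method: the perturbation estimate (input minus a mean-filtered copy), the three statistical features, and the plain k-means model are all simplifying assumptions made here.

```python
import numpy as np

def estimate_perturbation(image, kernel=3):
    """Rough perturbation estimate: the input minus a mean-filtered copy
    (an assumption; the dissertation's feature extraction differs)."""
    pad = kernel // 2
    padded = np.pad(image, pad, mode="edge")
    smoothed = np.zeros_like(image, dtype=float)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            smoothed[i, j] = padded[i:i + kernel, j:j + kernel].mean()
    return image - smoothed

def perturbation_features(image):
    """Feature vector from the estimated perturbation: mean absolute
    magnitude, standard deviation, and maximum absolute value."""
    p = estimate_perturbation(image.astype(float))
    return np.array([np.abs(p).mean(), p.std(), np.abs(p).max()])

def kmeans(features, k, iters=50, seed=0):
    """Minimal k-means over the feature vectors, standing in for the
    clustering model that separates attack types."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest center, then recenter.
        labels = np.argmin(
            ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = features[labels == c].mean(axis=0)
    return labels
```

In this sketch, different attack families are expected to leave perturbations with different statistics, so their feature vectors fall into different clusters; the cluster label then serves as the predicted attack type.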
2. Two-Step Input Transformation for Defending against White-box Adversarial Attacks: Previous defense methods against white-box adversarial attacks suffer from accuracy degradation on legitimate input data. To resolve this degradation while keeping the target image-based deep learning models robust against adversarial examples, we propose a two-step input transformation architecture. Based on this architecture, we also propose two new defense methods, called EEJE and ARGAN, which differ according to the defender’s knowledge of the target model. Experimental results under various conditions show that the proposed two-step input transformation architecture makes image-based deep learning models robust against white-box adversarial attacks while maintaining high accuracy even on legitimate input data. In addition, EEJE and ARGAN are shown to outperform previous defense methods.
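To convey the two-step idea concretely, the sketch below pairs a simple perturbation-destroying transform (coarse quantization) with a reconstruction step (local smoothing) applied before the model sees the input. Both steps are assumptions chosen here for illustration; the actual EEJE and ARGAN designs differ.

```python
import numpy as np

def step_one_quantize(image, levels=8):
    """Step 1: coarse quantization snaps pixel values to a few levels,
    destroying small adversarial perturbations."""
    return np.round(image * (levels - 1)) / (levels - 1)

def step_two_smooth(image, kernel=3):
    """Step 2: local mean filtering softens quantization artifacts,
    standing in for the reconstruction step that recovers accuracy
    on legitimate inputs."""
    pad = kernel // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image, dtype=float)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + kernel, j:j + kernel].mean()
    return out

def two_step_transform(image):
    """Apply both steps to an input before feeding it to the target model."""
    return step_two_smooth(step_one_quantize(image))
```

The design intuition is that step 1 alone hurts clean accuracy (it discards detail along with the perturbation), which is why a second, restorative step is needed — the same tension the two-step architecture is built to resolve.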
3. Perceptual Image Hashing for Defending against Black-box Adversarial Attacks, which is called PIHA (Perceptual Image HAshing): To defend against black-box adversarial attacks, state-of-the-art defense methods rely on the similarity of input data. However, their robustness can be easily circumvented by the adversary. To solve this problem, we propose a new defense method, called PIHA, which uses the concept of perceptual image hashing. Given a query image, PIHA generates a hash sequence and compares it with those of previous queries to detect black-box adversarial attacks. Here, the hash sequence is invariant to small perturbations and color changes, which makes it well suited to detecting black-box adversarial attacks. Experimental results under various black-box adversarial attacks on representative benchmark datasets show that PIHA outperforms the state-of-the-art defense methods, i.e., Stateful Detection and Blacklight, in both the number of detected attack queries and the detected query rate.
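The stateful, hash-based detection idea can be sketched as follows: hash each query, and flag a query whose hash lies within a small Hamming distance of any previously seen hash, since iterative black-box attacks issue many near-duplicate queries. The block-average hash and the distance threshold used here are assumptions for illustration; PIHA's actual hash construction differs.

```python
import numpy as np

def perceptual_hash(image, grid=8):
    """Block-average hash: downsample to a grid x grid map of block means,
    then threshold each block at the global mean to get a bit sequence."""
    h, w = image.shape
    bh, bw = h // grid, w // grid
    blocks = image[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw).mean(axis=(1, 3))
    return (blocks > blocks.mean()).astype(np.uint8).ravel()

def is_attack_query(query_hash, history, threshold=5):
    """Flag the query if its hash is within `threshold` Hamming distance
    of the hash of any previous query."""
    return any(int(np.sum(query_hash != h)) <= threshold for h in history)
```

Because the hash depends only on each block's relation to the global mean, uniform brightness shifts and small perturbations leave it (nearly) unchanged, so an attacker's slightly modified queries collide with earlier ones while unrelated benign images do not.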
The three defense techniques described in this dissertation together provide strong robustness across both white-box and black-box adversarial attack scenarios. Therefore, image-based deep learning models can be used with confidence against the threat of adversarial attacks.