Improving Trustworthiness of Real-world AI systems through Adversarial Attack and Effective Defense

"Improving Trustworthiness of Real-world AI systems through Adversarial Attack and Effective Defense" is a research project granted under the NSOE-TSS Grant Call 2019: Trustworthy Software Systems – Core Technologies issued by the National Satellite of Excellence in Trustworthy Software Systems (NSoE-TSS) and funded by the National Research Foundation (NRF).

Abstract

Artificial intelligence research has achieved great success in recent years and inspired many innovative applications in industry, including recommender systems, self-driving cars, and face recognition. As a result, the security of AI systems has become a critical concern in both industry and academia. Unfortunately, existing studies on the security of AI systems fall short of what industrial deployment requires. The main limitations of existing work are the failure to consider practical, implementable adversarial attacks available to real-world attackers, and the neglect of deployed AI systems’ specific need to balance safety and performance. Building on our past experience in solving a broad class of practical problems with AI techniques, we will provide a comprehensive toolkit that addresses these limitations, can be deployed to evaluate the safety of real-world AI systems, and offers adaptive solutions that satisfy practical needs.

Background and Motivation

AI techniques have been deployed in many real-world systems and significantly benefit people’s daily lives. Information retrieval techniques such as factorization-based collaborative filtering are key to recommender systems providing personalized recommendations [Manouselis et al. 2011, Shi et al. 2014, Wang et al. 2006]. Convolutional neural networks [Krizhevsky et al. 2012] have become the state of the art in image classification and are widely adopted in face recognition, object detection, etc. Traditional machine learning models as well as DNNs act as spam filters in e-mail filtering systems such as SpamBayes [Drucker et al. 1999, Sebastiani 2002]. Amazon’s Ignite platform provides machine learning support for financial businesses. With the rapid development of AI systems, their security becomes a critical concern. A fraudulent seller may promote their items by conducting fraudulent transactions [Guo et al. 2019]. Adversarial perturbations that fool image classifiers have been demonstrated [Goodfellow et al. 2015, Szegedy et al. 2014]. Malicious users can send incorrect feedback to an e-mail filter, compromising its ability to identify spam, and outside attackers can craft camouflaged spam. Overall, data-driven real-world AI systems are likely vulnerable to adversarial attacks, and the consequences of a successful attack could be unaffordable, since many of these systems are responsible for financial safety or even human lives.

While extensive studies have been devoted to analyzing the vulnerability of many AI algorithms [Tramèr et al. 2018, Kurakin et al. 2017, Madry et al. 2018, Moosavi-Dezfooli et al. 2016, Moosavi-Dezfooli et al. 2017, Papernot and McDaniel 2016, Carlini et al. 2017] and to designing defense measures against various attacks [Carlini and Wagner 2017, Hinton et al. 2015, Goodfellow et al. 2015, Madry et al. 2018, Papernot and McDaniel 2016, Papernot et al. 2016, Zheng et al. 2016], this body of work is still far from meeting the needs of deployed AI systems, due to the following key challenges. First, most adversarial attacks in existing work fall into the category of norm-ball attacks, i.e., perturbations whose magnitude is bounded under some norm. However, in most real-world scenarios the adversary cannot directly alter the inputs to the AI system, which makes norm-ball attacks unimplementable and unrealistic. Second, most existing work assumes powerful attackers with full knowledge of the model; in practice, most AI systems only disclose their final outputs to outside users. Finally, the defense measures in the literature cannot satisfy the flexible requirements of real-world systems on the trade-off between robustness and accuracy, as they focus mainly on robustness against adversarial attacks. In practice, AI systems may require different levels of robustness, and it is critical to provide adaptive solutions that balance robustness and accuracy.
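
To make the first challenge concrete, the sketch below illustrates the standard norm-ball, white-box setting that dominates the literature: the attacker edits the model input directly, keeps the perturbation inside an L-infinity ball of radius eps, and relies on gradient access, as in the fast gradient sign method of [Goodfellow et al. 2015]. This is our own minimal PyTorch illustration (the name fgsm_linf_attack and its parameters are assumptions, not artifacts of any cited work); neither the direct pixel access nor the gradient access it assumes is typically available against a deployed system.

```python
# Minimal sketch (PyTorch, illustrative only) of a norm-ball, white-box attack.
# Assumptions: `model` maps image batches to logits, inputs live in [0, 1],
# and the attacker can both read gradients and write pixels directly.
import torch
import torch.nn.functional as F

def fgsm_linf_attack(model, x, y, eps=8 / 255):
    """One-step L-infinity norm-ball attack (FGSM-style)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # white-box: loss w.r.t. true labels
    loss.backward()                           # needs gradients of the model
    x_adv = x_adv + eps * x_adv.grad.sign()   # step to the L-inf ball's boundary
    return x_adv.clamp(0.0, 1.0).detach()     # keep pixels in the valid range
```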

Objectives

The PI and collaborators have been working on attacking and defending AI algorithms over the last few years, and the research results have been deployed in large-scale real-world e-commerce platforms. Inspired by these findings and the aforementioned challenges, this project aims to provide a comprehensive evaluation framework and adaptive defense solutions that can be deployed to assist real-world AI systems. To achieve this goal, the project will accomplish the following objectives.

  • Design physical adversarial attacks against various AI systems. We will focus on physical adversarial attacks, i.e., attacks that real-world adversaries can actually implement, such as profile injection attacks and fraudulent transactions in the e-commerce domain, or physical manipulations of an object such as changing the lighting conditions or shifting its geometry. We will develop effective approaches to compute physical adversarial attacks against various AI systems, including recommender systems, fraud detection in e-commerce, and image classifiers (see the first sketch after this list).
  • Compute budgeted hard-label black-box attacks for adversarial examples and model stealing. We will consider the realistic setting where an adversary has a limited budget of queries to an AI system. Based on the outcomes of these queries, the adversary either crafts an adversarial example or tries to steal the model. We will develop methodologies to handle the non-differentiable, hard-label output of the AI system and investigate budgeted hard-label black-box attacks against traditional methods such as tree-based models, which are widely adopted in industry (see the second sketch after this list).
  • Develop a Pareto-optimal framework for the trade-off between robustness and accuracy. We will conduct extensive theoretical and empirical analysis of the trade-off between robustness and accuracy and identify the key factors that control it. We will propose a Pareto-optimal framework to depict the Pareto-optimal frontier of real-world AI systems, and develop an adaptive adversarial training framework that computes a Pareto-optimal solution satisfying a system's specific requirement on balancing robustness and accuracy (see the third sketch after this list).
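
For the physical-attack objective, a common starting point in the literature is expectation over transformation (EOT) [Athalye et al. 2018]: instead of optimizing a perturbation for a single fixed input, one optimizes it to remain adversarial under a distribution of physical transformations such as lighting changes. The sketch below is our own simplified PyTorch illustration of that idea (the names eot_physical_attack, transform_sampler, and brightness_sampler are assumptions); it is a starting point, not the attack algorithms the project will develop for recommender systems or fraud detection.

```python
# Minimal sketch (PyTorch, illustrative only) of expectation over transformation:
# the perturbation is optimized to stay adversarial in expectation over random,
# differentiable physical transformations (here: brightness changes).
import torch
import torch.nn.functional as F

def eot_physical_attack(model, x, y, transform_sampler, eps=0.1, steps=200, lr=0.01):
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        # Average the loss over a few sampled transformations per step.
        loss = 0.0
        for _ in range(4):
            t = transform_sampler()                    # e.g. a random lighting change
            logits = model(t(torch.clamp(x + delta, 0.0, 1.0)))
            loss = loss - F.cross_entropy(logits, y)   # maximize classification loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                    # keep the edit small
    return torch.clamp(x + delta, 0.0, 1.0).detach()

# Hypothetical transform sampler: random brightness scaling.
def brightness_sampler():
    scale = 0.7 + 0.6 * torch.rand(1)
    return lambda img: torch.clamp(img * scale, 0.0, 1.0)
```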
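
For the hard-label black-box objective, the interface we target looks like the sketch below: the attacker sees only the top-1 label returned by the system, and every query counts against a fixed budget. The naive random-search baseline here is purely illustrative (query, budget, and eps are our own assumed names); query-efficient methods such as [Brendel et al. 2018, Cheng et al. 2018, Ilyas et al. 2018] and the approaches this project will develop aim to succeed with far fewer queries, including against non-differentiable models such as tree ensembles.

```python
# Minimal sketch (NumPy, illustrative only) of the budgeted hard-label black-box
# setting: `query` returns only the predicted label, and every call is counted.
import numpy as np

def random_search_hard_label(query, x, y_true, eps=0.05, budget=1000, seed=0):
    """Naive baseline: random perturbations until the hard label flips or the budget runs out."""
    rng = np.random.default_rng(seed)
    queries_used = 0
    while queries_used < budget:
        delta = rng.uniform(-eps, eps, size=x.shape)   # candidate perturbation
        x_adv = np.clip(x + delta, 0.0, 1.0)
        queries_used += 1
        if query(x_adv) != y_true:                     # the hard label is all we observe
            return x_adv, queries_used                 # success within the budget
    return None, queries_used                          # budget exhausted

# Usage (hypothetical): `model_api` is the deployed system's prediction endpoint.
# x_adv, n_queries = random_search_hard_label(lambda z: int(model_api(z)), x0, y0)
```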
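
For the trade-off objective, the simplest way to see how robustness and accuracy can be dialed against each other is a weighted training objective in the spirit of [Zhang et al. 2019]: a scalar weight mixes the loss on clean inputs with the loss on adversarially perturbed inputs, and sweeping that weight traces an empirical robustness-accuracy frontier [Su et al. 2018]. The sketch below is our own simplified illustration (tradeoff_loss, attack, and lambda_ are assumed names), not the adaptive Pareto-optimal framework the project will develop.

```python
# Minimal sketch (PyTorch, illustrative only) of trading off clean accuracy
# against robustness with a single scalar weight lambda_.
import torch.nn.functional as F

def tradeoff_loss(model, x, y, attack, lambda_=1.0):
    """Weighted sum of the clean loss and the adversarial loss."""
    clean_loss = F.cross_entropy(model(x), y)        # drives clean accuracy
    x_adv = attack(model, x, y)                      # e.g. a norm-ball attack as an inner step
    robust_loss = F.cross_entropy(model(x_adv), y)   # drives robustness
    return clean_loss + lambda_ * robust_loss

# lambda_ = 0 recovers standard training; increasing lambda_ moves the trained
# model along the empirical frontier toward higher robustness and, typically,
# lower clean accuracy.  An adaptive defense picks the point on this frontier
# that matches a given system's requirements.
```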

Principal Investigator

Dr. Bo An is an Associate Professor at the School of Computer Science and Engineering, NTU. He has conducted extensive research in computational game theory, multi-agent systems, optimization, and applications to securing critical infrastructures such as airports, ports, and aircraft. He has published over 90 refereed papers at AAMAS, IJCAI, AAAI, ICAPS, KDD, WWW, JAAMAS, and AIJ. He is the recipient of the 2010 IFAAMAS Distinguished Dissertation Award, an Operational Excellence Award from the Commander, First Coast Guard District of the United States, the 2012 INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice, and the 2018 Nanyang Research Award (Young Investigator). His publications won the Best Innovative Application Paper Award at AAMAS’12 and the Innovative Application Award at IAAI’16. He was invited to give an Early Career Spotlight talk at IJCAI’17. He led the team HogRider, which won the 2017 Microsoft Collaborative AI Challenge. He was named to IEEE Intelligent Systems’ “AI’s 10 to Watch” list for 2018. He was invited to be an Advisory Committee member of IJCAI’18. He is a member of the editorial board of JAIR and an Associate Editor of JAAMAS, IEEE Intelligent Systems, and ACM TIST. He was elected to the board of directors of IFAAMAS and is a senior member of AAAI.

References

[Alfeld et al.2016] Scott Alfeld, Xiaojin Zhu, and Paul Barford. Data poisoning attacks against autoregressive models. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI’16), pages 1452-1458, 2016.

[An et al.2013a] Bo An, Matthew Brown, Yevgeniy Vorobeychik, and Milind Tambe. Security games with surveillance cost and optimal timing of attack execution. In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS’13), pages 223–230, 2013.

[An et al.2013b] Bo An, Fernando Ordonez, Milind Tambe, Eric Shieh, Rong Yang, Craig Baldwin, Joseph DiRenzo, Kathryn Moretti, Ben Maule, and Garrett Meyer. A deployed quantal response-based patrol planning system for the U.S. coast guard. Interfaces, 43(5):400–420, 2013.

[An et al.2015] Bo An, Milind Tambe, and Arunesh Sinha. Stackelberg security games (SSG) basics and application overview. Improving Homeland Security Decisions, 2015.

[An2017] Bo An. Game theoretic analysis of security and sustainability. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI’17), pages 5111–5115, 2017.

[Athalye et al.2018] Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. In Proceedings of the 35th International Conference on Machine Learning (ICML’18), pages 284–293, 2018.

[Biggio et al.2012] Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against support vector machines. In Proceedings of the 29th International Conference on Machine Learning (ICML’12), 2012.

[Brendel et al.2018] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In Proceedings of 6th International Conference on Learning Representations (ICLR’18), 2018.

[Carlini and Wagner2017] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In Proceedings of 2017 IEEE Symposium on Security and Privacy (SP’17), pages 39–57, 2017.

[Carlini et al.2017] Nicholas Carlini, Guy Katz, Clark Barrett, and David L Dill. Provably minimally-distorted adversarial examples. arXiv preprint arXiv:1709.10207, 2017.

[Chen et al.2017] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (AISec’17), pages 15–26, 2017.

[Chen et al.2018] Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, and Cho-Jui Hsieh. Attacking visual language grounding with adversarial examples: A case study on neural image captioning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL’18), pages 2587–2597, 2018.

[Cheng et al.2018] Minhao Cheng, Thong Le, Pin-Yu Chen, Jinfeng Yi, Huan Zhang, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. CoRR, abs/1807.04457, 2018.

[Dalvi et al.2004] Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, and Deepak Verma. Adversarial classification. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), pages 99–108, 2004.

[Drucker et al.1999] H. Drucker, Donghui Wu, and V. N. Vapnik. Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10(5):1048–1054, 1999.

[Goodfellow et al.2015] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In Proceedings of 3rd International Conference on Learning Representations (ICLR’15), 2015.

[Guo et al. 2016] Qingyu Guo, Bo An, Yair Zick, and Chunyan Miao. Optimal interdiction of illegal network flow. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI’16), pages 2507–2513, 2016.

[Guo et al. 2017] Qingyu Guo, Bo An, and Long Tran-Thanh. Playing repeated network interdiction games with semi-bandit feedback. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI’17), pages 3682–3690, 2017.

[Guo et al. 2019] Qingyu Guo, Zhao Li, Bo An, Pengrui Hui, Jiaming Huang, Long Zhang, and Mengchen Zhao. Securing the deep fraud detector in large-scale e-commerce platform via adversarial machine learning approach. In Proceedings of the 2019 World Wide Web Conference (WWW’19), 2019.

[Hinton et al. 2015] Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. CoRR, abs/1503.02531, 2015.

[Huang et al. 2011] Ling Huang, Anthony D Joseph, Blaine Nelson, Benjamin IP Rubinstein, and JD Tygar. Adversarial machine learning. In Proceedings of the 4th ACM workshop on Security and Artificial Intelligence, pages 43–58, 2011.

[Ilyas et al. 2018] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In Proceedings of the 35th International Conference on Machine Learning (ICML’18), pages 2142–2151, 2018.

[Kantchelian et al. 2013] Alex Kantchelian, Sadia Afroz, Ling Huang, Aylin Caliskan Islam, Brad Miller, Michael Carl Tschantz, Rachel Greenstadt, Anthony D. Joseph, and J. D. Tygar. Approaches to adversarial drift. In Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, pages 99–110, 2013.

[Krizhevsky et al. 2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In 26th Annual Conference on Neural Information Processing Systems (NIPS’12), pages 1106–1114, 2012.

[Kurakin et al. 2017] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In Proceedings of 5th International Conference on Learning Representations (ICLR’17), 2017.

[Li and Vorobeychik2014] Bo Li and Yevgeniy Vorobeychik. Feature cross-substitution in adversarial classification. In Proceedings of 27th Annual Conference on Neural Information Processing Systems (NIPS’14), pages 2087–2095, 2014.

[Li et al. 2016] Bo Li, Yining Wang, Aarti Singh, and Yevgeniy Vorobeychik. Data poisoning attacks on factorization-based collaborative filtering. In Proceedings of 29th Annual Conference on Neural Information Processing Systems (NIPS’16), pages 1885–1893, 2016.

[Liu et al. 2018] Hsueh-Ti Derek Liu, Michael Tao, Chun-Liang Li, Derek Nowrouzezahrai, and Alec Jacobson. Adversarial geometry and lighting using a differentiable renderer. CoRR, abs/1808.02651, 2018.

[Lowd and Meek2005] Daniel Lowd and Christopher Meek. Adversarial learning. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD’05), pages 641–647, 2005.

[Madry et al. 2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In Proceedings of 6th International Conference on Learning Representations (ICLR’18), 2018.

[Manouselis et al. 2011] Nikos Manouselis, Hendrik Drachsler, Riina Vuorikari, Hans Hummel, and Rob Koper. Recommender systems in technology enhanced learning. In Recommender Systems Handbook, pages 387–415, 2011.

[Mei and Zhu2015] Shike Mei and Xiaojin Zhu. Using machine teaching to identify optimal training-set attacks on machine learners. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI’15), pages 2871–2877, 2015.

[Moosavi-Dezfooli et al. 2016] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16), pages 2574–2582, 2016.

[Moosavi-Dezfooli et al. 2017] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17), pages 86–94, 2017.

[Papernot and McDaniel2016] Nicolas Papernot and Patrick D. McDaniel. On the effectiveness of defensive distillation. CoRR, abs/1607.05113, 2016.

[Papernot et al.2016] Nicolas Papernot, Patrick D. McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of IEEE Symposium on Security and Privacy (SP’16), pages 582–597, 2016.

[Papernot et al.2017] Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security (AsiaCCS’17), pages 506–519, 2017.

[Sculley et al.2011] D. Sculley, Matthew Eric Otey, Michael Pohl, Bridget Spitznagel, John Hainsworth, and Yunkai Zhou. Detecting adversarial advertisements in the wild. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’11), pages 274–282, 2011.

[Sebastiani2002] Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1–47, 2002.

[Shi et al.2014] Yue Shi, Martha Larson, and Alan Hanjalic. Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. ACM Computing Surveys, 47(1):3, 2014.

[Su et al.2018] Dong Su, Huan Zhang, Hongge Chen, Jinfeng Yi, Pin-Yu Chen, and Yupeng Gao. Is robustness the cost of accuracy? - A comprehensive study on the robustness of 18 deep image classification models. In Proceedings of 15th European Conference on Computer Vision (ECCV’18), pages 644–661, 2018.

[Szegedy et al.2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In Proceedings of 2nd International Conference on Learning Representations (ICLR’14), 2014.

[Tramèr et al. 2018] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian J. Goodfellow, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. In Proceedings of 6th International Conference on Learning Representations (ICLR’18), 2018.

[Wang et al. 2006] Jun Wang, Arjen P. de Vries, and Marcel J. T. Reinders. Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’06), pages 501–508, 2006.

[Wang et al. 2014] Gang Wang, Tianyi Wang, Haitao Zhang, and Ben Y. Zhao. Man vs. machine: Practical adversarial detection of malicious crowdsourcing workers. In Proceedings of the 23rd USENIX Conference on Security Symposium (USENIX’14), pages 239–254, 2014.

[Xu et al. 2018] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. In Proceedings of 25th Annual Network and Distributed System Security Symposium (NDSS’18), 2018.

[Yang et al. 2011] Rong Yang, Christopher Kiekintveld, Fernando Ordóñez, Milind Tambe, and Richard John. Improving resource allocation strategy against human adversaries in security games. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI’11), pages 458–464, 2011.

[Yang et al. 2012] Rong Yang, Fernando Ordóñez, and Milind Tambe. Computing optimal strategy against quantal response in security games. In Proceedings of International Conference on Autonomous Agents and Multiagent Systems (AAMAS’12), pages 847–854, 2012.

[Yin and An2016] Yue Yin and Bo An. Efficient resource allocation for protecting coral reef ecosystems. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI’16), pages 531–537, 2016.

[Zhang et al. 2018] Yang Zhang, Hassan Foroosh, Philip David, and Boqing Gong. CAMOU: Learning physical vehicle camouflages to adversarially attack detectors in the wild. 2018.

[Zhang et al. 2019] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. Theoretically principled trade-off between robustness and accuracy. CoRR, abs/1901.08573, 2019.

[Zhao et al. 2016] Mengchen Zhao, Bo An, and Christopher Kiekintveld. Optimizing personalized email filtering thresholds to mitigate sequential spear phishing attacks. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI’16), 2016.

[Zhao et al. 2017] Mengchen Zhao, Bo An, Wei Gao, and Teng Zhang. Efficient label contamination attacks against black-box learning models. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI’17), pages 3945–3951, 2017.

[Zhao et al. 2018] Mengchen Zhao, Bo An, Yaodong Yu, Sulin Liu, and Sinno Jialin Pan. Data poisoning attacks on multi-task relationship learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI’18), pages 2628–2635, 2018.

[Zheng et al. 2016] Stephan Zheng, Yang Song, Thomas Leung, and Ian J. Goodfellow. Improving the robustness of deep neural networks via stability training. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16), pages 4480–4488, 2016.