Trojaning Attack on Neural Networks
Presenter: Rizwan Muhammad
School of Computer Science and Engineering, Soongsil University, Seoul, South Korea

Machine Learning Model Sharing
• Neural networks (NNs) are widely adopted.
• Because most users lack the time, data, or facilities to train a model from scratch, sharing and reusing models is very popular.
• Models are publicly available, but the datasets are usually not shared, due to privacy and copyright concerns.
• Face recognition, voice recognition, self-driving vehicles, robotics, machine-based natural language communication, and gaming all use NNs.
• Many online markets now share AI and NN models, and well-trained models are likely to become consumer products in the near future.
• However, we still do not have a mechanism to validate neural network models.

NN Trojaning Attack
• A company releases a self-driving NN for unmanned vehicles.
• An attacker downloads the NN and injects a malicious behavior that makes the vehicle perform a U-turn whenever a special sign is present.
• The mutated NN is republished.
• The mutant behaves normally in the absence of the special sign, and the differences between the two models are hard to spot, so the substitution is difficult to detect.
• Other NNs can be attacked similarly: an attacker can inject an additional behavior into a face recognition NN to impersonate a specific person, so that any arbitrary person stamped with the trigger is classified as the masqueraded target.
• Such attacks are called neural network trojaning attacks.

Paper Contribution
• Propose the neural network trojaning attack.
• Devise a scheme that makes the attack feasible.
• Apply the attack to 5 NNs:
  1. Face recognition
  2. Speech recognition
  3. Age recognition
  4. Sentence attitude recognition
  5. Autonomous driving
• Discuss possible defenses against the attack.

Original Model (figure)
Trojaned Model (figure)

Threat Model and Overview
Assumptions
• Access to the model structure and parameters
• No access to the training phase or the training data
Attack Design
• Trojan trigger generation
• Training data generation
• Model retraining
Overview
• The trojan trigger is generated from hidden-layer neurons
• The trojan trigger is input-agnostic
• Competitive performance on normal data
• 100% attack success rate

Attack Overview: Internal Neuron Selection
• Avoid neurons that are hard to manipulate (e.g., because of convolutional computation)
• Pick the most connected neuron
• Use a hidden-layer neuron
• The selected neuron represents a specific feature
• Looking at one layer is good enough

Attack Overview: Denoise Function
• Calculate the total variance of the generated trigger (the sum of squared differences between neighboring pixel values) and minimize it to keep the trigger smooth.

Attack Overview (figures)
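To make the trigger-generation step on the Attack Overview slides concrete, here is a minimal, hedged PyTorch-style sketch of the idea: pick a well-connected inner neuron, then optimize only the masked trigger region of the input so that the neuron reaches a large target value, with a total-variance term keeping the stamp smooth. The names (`net`, `inner_layer`, `mask`), the 1x3x224x224 input shape, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of trojan trigger generation by inverting an inner neuron.
# `net`, `inner_layer`, `mask`, and all hyperparameters are assumed names.
import torch

def total_variation(x):
    # Denoise term: sum of squared differences between neighboring pixels.
    dh = x[..., 1:, :] - x[..., :-1, :]
    dw = x[..., :, 1:] - x[..., :, :-1]
    return (dh ** 2).sum() + (dw ** 2).sum()

def select_neuron(weight):
    # "Most connected" heuristic: neuron with the largest total absolute
    # incoming weight; `weight` has shape (out_features, in_features).
    return weight.abs().sum(dim=1).argmax().item()

def generate_trigger(net, inner_layer, neuron_idx, target_value, mask,
                     steps=1000, lr=0.1, tv_weight=1e-3):
    acts = {}
    handle = inner_layer.register_forward_hook(
        lambda module, inp, out: acts.update(inner=out))
    net.eval()
    x = torch.rand(1, 3, 224, 224, requires_grad=True)  # assumed input shape
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        net(x * mask)                       # only the masked trigger region is visible
        sel = acts["inner"].flatten(1)[:, neuron_idx]
        loss = ((sel - target_value) ** 2).sum() + tv_weight * total_variation(x * mask)
        loss.backward()
        opt.step()
    handle.remove()
    return (x * mask).detach()              # the trojan trigger stamp
```

A typical usage under these assumptions would be `neuron_idx = select_neuron(inner_layer.weight)` for a fully connected layer, followed by `trigger = generate_trigger(net, inner_layer, neuron_idx, target_value=100.0, mask=mask)`.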
Alternative Designs
Unsuccessful explorations of the design space:
1. Attack by incremental learning
2. Attack by model parameter regression
3. Finding a neuron that corresponds to an arbitrary trojan trigger

Evaluation: Setup
• Apply the attack to 5 different neural network applications:
  • Face recognition (FR)
  • Age recognition (AR)
  • Speech recognition (SR)
  • Sentence attitude recognition (SAR)
  • Autonomous driving (AD)

Evaluation: Acronyms
• Orig: test accuracy of the trojaned model on the original data, i.e., the ratio of original inputs correctly classified by the trojaned model.
• Orig Dec: decrease of test accuracy on the original data from the benign model to the trojaned model, i.e., the benign model's accuracy on the original data minus the trojaned model's accuracy on the original data.
• Orig Inc: increase of test accuracy on the original data from the benign model to the trojaned model, i.e., the trojaned model's accuracy on the original data minus the benign model's accuracy on the original data.
• Ori+Tri: attack success rate of the trojaned model on the trojaned original data, i.e., the ratio of original inputs stamped with the trojan trigger that are classified to the trojan target label.
• Ext+Tri: attack success rate of the trojaned model on the trojaned external data, i.e., the ratio of inputs not used in training or testing of the original model that, stamped with the trojan trigger, are classified to the trojan target label.
• Out: for face recognition, the test accuracy of the trojaned model on external data (see Section VI.C of the paper for details).
• Out Dec: for face recognition, the decrease of test accuracy on external data from the benign model to the trojaned model, i.e., the benign model's accuracy on the external data minus the trojaned model's accuracy on the external data.
• One off: for age recognition, the test accuracy of the trojaned model on the original data when a prediction that falls into a neighboring category of the ground truth is still counted as correct.
• One off Dec: for age recognition, the decrease of one-off test accuracy from the benign model to the trojaned model, i.e., the benign model's one-off accuracy minus the trojaned model's one-off accuracy.

Evaluation: Attack Effectiveness
• For the autonomous driving case, the accuracy is the sum of squared errors between the expected wheel angle and the real wheel angle.
• The autonomous driving case does not have an external data set.
• Effectiveness is measured by two factors (a sketch of the metric computation follows below):
  1. The trojaned behavior can be triggered correctly.
  2. Normal inputs do not trigger the trojaned behavior.
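The following is a rough illustration of how the metrics listed above (Orig, Orig Dec, Ori+Tri) could be computed for a classification model. It is a hedged sketch under assumed names (`benign_model`, `trojaned_model`, `original_test_loader`, `stamp`, `target_label`); it is not code from the paper.

```python
# Hedged sketch of the evaluation metrics from the acronym list above.
# All model, loader, and stamping names are assumptions for illustration.
import torch

@torch.no_grad()
def accuracy(model, loader):
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

@torch.no_grad()
def attack_success_rate(model, loader, stamp, target_label):
    # Ori+Tri: fraction of trigger-stamped inputs classified to the trojan target.
    hits = total = 0
    for x, _ in loader:
        preds = model(stamp(x)).argmax(dim=1)
        hits += (preds == target_label).sum().item()
        total += x.shape[0]
    return hits / total

# Orig     = accuracy(trojaned_model, original_test_loader)
# Orig Dec = accuracy(benign_model, original_test_loader) - Orig
# Ori+Tri  = attack_success_rate(trojaned_model, original_test_loader, stamp, target_label)
```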
Evaluation: Neuron Selection
• Properly selecting the internal neuron matters
• Comparison with directly using an output neuron
• Attack efficiency

Case Studies: Face Recognition
• Analysis of the tunable parameters
• Data sets: Labeled Faces in the Wild (LFW) and VGG-FACE
• The effectiveness of trojan trigger generation is related to the layer selected for inversion; the optimal layer to inverse is usually one of the middle layers
• Number of trojaned neurons
• Trojan trigger mask shapes
• Trojan trigger sizes; the proper trigger size is a trade-off between test accuracy and stealthiness
• Trojan trigger transparency

Case Studies: Speech Recognition
• The model is trojaned by injecting background noise
• Layer selection: results are consistent with face recognition
• Trojan triggers are stamped on the audio converted from the original spectrograms, which are then converted back to spectrograms and fed to the model
• Number of selected neurons
• Trojan trigger sizes
• Sample: original vs. trojaned spectrogram (figure)

Case Studies: Autonomous Driving
• Normal vs. trojaned behavior (figures)
• Simulated environment
• Continuous decision making
• The trojaned model misbehaves when the trigger is present

Other Case Studies
• Paper: https://www.ndss-symposium.org/wp-content/uploads/2018/02/ndss2018_03A-5_Liu_paper.pdf
• Slides: https://www.ndss-symposium.org/wp-content/uploads/2018/03/NDSS2018_03A-5_Liu_Slides.pdf
• Talk: https://www.youtube.com/watch?v=-SmaXbq3TWU
• Web: https://purduepaml.github.io/TrojanNN/
• Project: https://github.com/PurduePAML/TrojanNN
• Demo: https://www.youtube.com/channel/UCbzVBDfcqfl-Qwsqk0ktSew

Accuracy (figure)
Possible Defense
Related Work

Conclusion
• Present a trojaning attack on NN models
  • Trojan published models without access to the training data
• Design
  • Generate the trojan trigger by inverting inner neurons
  • Retrain the model with reverse-engineered training data (a supplementary sketch follows the Q&A slide)
• Evaluation
  • Applied to 5 different categories of NNs
  • Near-100% attack success rate with competitive performance on normal data
  • Small trojaning time on a common laptop
• Key Points
  • The security of publicly shared machine learning models is a critical problem
  • Demonstrate the feasibility of the attack
  • Propose a possible defense

Q&A
Thank you!
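Supplementary sketch for the retraining step mentioned in the conclusion ("retrain the model with reverse-engineered training data"): invert the model to obtain one substitute training input per output label, then fine-tune only the layers behind the trojaned neuron on that data plus trigger-stamped copies relabeled to the trojan target. This is a minimal, hedged PyTorch-style illustration; all names, shapes, and hyperparameters are assumptions, not the authors' code.

```python
# Hedged sketch of training-data reverse engineering and model retraining.
# `net`, `trainable_params`, `stamp`, shapes, and hyperparameters are assumed.
import torch
import torch.nn.functional as F

def reverse_engineer_input(net, label, shape=(1, 3, 224, 224), steps=500, lr=0.1):
    # Model inversion: optimize a random input until the network classifies it
    # as `label`; the result serves as a substitute training example.
    x = torch.rand(*shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(net(x), torch.tensor([label]))
        loss.backward()
        opt.step()
    return x.detach()

def retrain(net, trainable_params, data, stamp, target_label, epochs=5, lr=1e-4):
    # Fine-tune only the layers after the trojaned neuron (`trainable_params`)
    # on the reverse-engineered data plus trigger-stamped copies relabeled to
    # the trojan target, so both normal and trojaned behavior are learned.
    opt = torch.optim.Adam(trainable_params, lr=lr)
    for _ in range(epochs):
        for x, y in data:
            for inp, lab in ((x, y), (stamp(x), target_label)):
                opt.zero_grad()
                loss = F.cross_entropy(net(inp), torch.tensor([lab]))
                loss.backward()
                opt.step()
    return net
```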