
Trojaning Attack on Neural Networks

Presenter: Rizwan Muhammad
School of Computer Science and Engineering
Soongsil University, Seoul
South Korea
Machine Learning Model Sharing

Neural Networks are widely adopted

Due to the lack of time, data, or facilities to train a model from scratch,
model sharing and reuse are very popular.

Models are publicly available

Datasets are usually not shared due to privacy and copyright concerns
Machine Learning Model Sharing

Face recognition, voice recognition, self-driving vehicles, robotics, machine-based natural language communication, and gaming all use NNs.

Many online markets now share AI and NN models.

AIs (well-trained models) will become consumer products in the near future.
Machine Learning Model Sharing
However, we still do not have a mechanism to validate Neural Network models.
NN Trojaning Attack
• A company releases a self-driving NN for unmanned vehicles.
• An attacker downloads the NN and injects malicious behavior that makes the vehicle make a U-turn whenever a special sign is present.
• The mutated NN is republished.
• The mutant behaves normally when the special sign is absent, and the differences between the two models are hard to spot, so the trojan is difficult to detect.
• Other NNs can be attacked similarly.
• An attacker can inject additional behavior into a face recognition NN to impersonate a specific person:
• Any arbitrary person stamped with the trigger is classified as the masqueraded target.
• Such attacks are called neural network trojaning attacks.
Paper Contribution
 Propose the neural network trojaning attack
 Devise a scheme to make the attack feasible
 Apply the attack to 5 NNs
1. Face recognition
2. Speech recognition
3. Age recognition
4. Sentence attitude recognition
5. Autonomous driving
 Discuss possible defenses against the attack
Original Model
Trojaned Model
Threat Model and Overview
Assumptions
• Access to the model structure and parameters
• No access to training phase or training data
Attack Design
• Trojan trigger generation
• Training data generation
• Retraining model
Overview
• Trojan trigger is generated based on hidden layers (generation sketched below)
• Input-agnostic trojan trigger
• Competitive performance on normal data
• 100% attack success rate
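A minimal sketch of the trigger-generation step: gradient ascent on the masked input region to drive a chosen internal neuron toward a high activation. It assumes a PyTorch sub-network `stub` covering the layers up to the chosen hidden layer; the input shape, learning rate, target activation, and step count are illustrative assumptions, not the paper's exact settings.

```python
import torch

def generate_trigger(stub, neuron_idx, mask, target=100.0, lr=0.1, steps=1000):
    """Tune the masked pixels so the selected internal neuron fires strongly.

    stub       -- sub-network from the input up to the chosen hidden layer
    neuron_idx -- flattened index of the selected (well-connected) neuron
    mask       -- 0/1 tensor marking the trigger region (e.g., a logo shape)
    """
    x = torch.rand(1, 3, 224, 224, requires_grad=True)  # assumed input size
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        act = stub(x * mask).flatten()[neuron_idx]  # selected neuron's activation
        loss = (act - target) ** 2                  # drive activation to the target
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)                      # keep pixel values valid
    return (x * mask).detach()                      # the trojan trigger patch
```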
Attack Overview
Internal Neuron Selection
• Avoid neurons that are hard to manipulate
  * Convolutional computation
• Pick the most connected neuron (selection sketched below)
• A hidden-layer neuron represents a specific feature
• Looking at one layer is good enough
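A sketch of the "most connected" selection heuristic, assuming the chosen hidden layer is fully connected and `weight` is its weight matrix; scoring by the sum of absolute incoming weights matches the slide's description, but the paper's exact scoring rule may differ.

```python
import torch

def select_neuron(weight: torch.Tensor) -> int:
    """Pick the 'most connected' neuron in the chosen hidden layer.

    weight -- the layer's weight matrix, shape (out_features, in_features)
    """
    connectivity = weight.abs().sum(dim=1)  # total |weight| feeding each neuron
    return int(connectivity.argmax())       # index of the best-connected neuron
```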
Attack Overview
Denoise Function
• Calculate the total variance of the trigger image (equation below)
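The formula on this slide did not survive extraction; a standard total-variance (total-variation) definition consistent with the denoise goal is, as an assumption:

V(x) = \sum_{i,j} \big[ (x_{i+1,j} - x_{i,j})^2 + (x_{i,j+1} - x_{i,j})^2 \big]

Minimizing V(x) alongside the activation objective smooths the trigger and suppresses high-frequency noise.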
Alternate Design
Unsuccessful explorations of the design:
1. Attack by incremental learning
2. Attack by model parameter regression
3. Finding a neuron corresponding to an arbitrary trojan trigger
Evaluation
• Evaluation setup
• Apply the attack to 5 different neural network applications:
 Face recognition – FR
 Age recognition – AR
 Speech recognition – SR
 Sentence attitude recognition – SAR
 Autonomous driving – AD
Evaluation
Acronym meanings:

Orig: the test accuracy of the trojaned model on the original data, i.e., the ratio of original inputs correctly classified by the trojaned model.

Orig Dec: the decrease in test accuracy on the original data from the benign model to the trojaned model, i.e., the benign model's test accuracy on the original data minus the trojaned model's.

Orig Inc: the increase in test accuracy on the original data from the benign model to the trojaned model, i.e., the trojaned model's test accuracy on the original data minus the benign model's.

Ori+Tri: the attack success rate of the trojaned model on the trojaned original data, i.e., the ratio of original inputs stamped with the trojan trigger that are classified to the trojan target label.

Ext+Tri: the attack success rate of the trojaned model on the trojaned external data, i.e., the ratio of inputs not used in training or testing of the original model that, when stamped with the trojan trigger, are classified to the trojan target label.

Out: for face recognition, the test accuracy of the trojaned model on external data (see Section VI.C of the paper for details).

Out Dec: for face recognition, the decrease in test accuracy on external data from the benign model to the trojaned model, i.e., the benign model's test accuracy on external data minus the trojaned model's.

One off: for age recognition, the test accuracy of the trojaned model on the original data where a prediction falling into the ground truth's neighboring category still counts as correct.

One off Dec: for age recognition, the decrease in one-off test accuracy from the benign model to the trojaned model.
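A minimal sketch of how the two headline metrics above, Orig and Ori+Tri, could be computed for an image model, assuming a PyTorch classifier, a loader of (input, label) batches, and a (trigger, mask) pair from the trigger-generation step; the stamping convention is an assumption.

```python
import torch

def attack_metrics(model, loader, trigger, mask, target_label):
    # Orig: clean accuracy of the trojaned model on original data.
    # Ori+Tri: fraction of trigger-stamped inputs classified as the target.
    clean_ok = trojan_hit = total = 0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            total += y.numel()
            clean_ok += (model(x).argmax(1) == y).sum().item()
            stamped = x * (1 - mask) + trigger * mask      # stamp the trigger
            trojan_hit += (model(stamped).argmax(1) == target_label).sum().item()
    return clean_ok / total, trojan_hit / total            # (Orig, Ori+Tri)
```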
Evaluation
Attack effectiveness
Measured by two factors:
1. The trojan behavior can be triggered correctly.
2. Normal inputs do not trigger the trojaned behavior.
For the autonomous driving case, accuracy is the sum of squared errors between the expected and the actual wheel angle (equation below); autonomous driving has no external data set.
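In the usual sum-of-squared-errors form (an assumed notation, with \hat{\theta}_i the predicted and \theta_i the expected wheel angle):

SSE = \sum_i (\hat{\theta}_i - \theta_i)^2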
Evaluation
Neuron Selection
 Properly selecting the neuron matters
 Comparison with output neurons
 Attack efficiency
Case Studies
Face Recognition
• Analysis of tunable parameters
• Labeled Faces in the Wild (LFW) dataset and VGG-FACE data
• Effectiveness of trojan trigger generation is related to the layer selected for inversion
• The optimal layer to inverse is usually one of the middle layers
• Number of trojaned neurons
• Trojan trigger mask shapes
• Trojan trigger sizes
• A proper trigger size is a trade-off between test accuracy and stealthiness
• Trojan trigger transparency (stamping sketched below)
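A minimal sketch of stamping a trigger with transparency onto an input image, assuming tensors in [0, 1]; the linear blending convention is an assumption, not the paper's exact formula.

```python
import torch

def stamp(x, trigger, mask, transparency=0.0):
    # transparency = 0 stamps an opaque trigger; larger values let the
    # original pixels show through inside the trigger region.
    blended = (1.0 - transparency) * trigger + transparency * x
    return x * (1 - mask) + mask * blended
```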
Case Studies
Speech Recognition
• Trojaned the model by injecting background noise
• Layer selection
• Results are consistent with FR
• Stamp trojan triggers on the audio converted from the original spectrograms, then convert the audio back to spectrograms to feed the model (sketched below)
• Number of neurons selected
• Trojan trigger sizes
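A minimal sketch of the spectrogram round-trip described above, assuming a complex STFT spectrogram, a precomputed noise waveform as the trigger, and a 16 kHz sample rate; a faithful pipeline would also need to match the paper's exact STFT parameters and phase handling.

```python
import numpy as np
from scipy.signal import istft, stft

def stamp_audio_trigger(spec, trigger_wave, fs=16000):
    # Invert the complex spectrogram to audio, inject the background-noise
    # trigger, and convert back to a magnitude spectrogram for the model.
    _, audio = istft(spec, fs=fs)
    n = min(audio.size, trigger_wave.size)
    audio[:n] = audio[:n] + trigger_wave[:n]
    _, _, stamped = stft(audio, fs=fs)
    return np.abs(stamped)
```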
[Figure: sample spectrograms, original vs. trojaned]
Case Studies
Autonomous Driving
[Figure: normal vs. trojaned driving behavior]
• Simulated environment
• Continuous decision making
• Misbehaves when the trojan trigger is present
Other Case Studies
Paper: https://www.ndss-symposium.org/wp-content/uploads/2018/02/ndss2018_03A-5_Liu_paper.pdf
Slides: https://www.ndss-symposium.org/wp-content/uploads/2018/03/NDSS2018_03A-5_Liu_Slides.pdf
Talk: https://www.youtube.com/watch?v=-SmaXbq3TWU
Web: https://purduepaml.github.io/TrojanNN/
Project: https://github.com/PurduePAML/TrojanNN
Demo: https://www.youtube.com/channel/UCbzVBDfcqfl-Qwsqk0ktSew
Accuracy
Possible Defense
Related Work
Conclusion
• Present a trojaning attack on NN models
 Trojans published models without access to the training data
• Design
 Generate the trojan trigger by inverting inner neurons
 Retrain the model with reverse-engineered training data
• Evaluation
 Applied to 5 different categories of NNs
 Near-100% attack success rate with competitive performance on normal data
 Short trojaning time on a commodity laptop
• Key Points
 The security of publicly shared machine learning models is a critical problem
 Demonstrated the feasibility of the attack
 Proposed a possible defense
Q&A
Thank you!