Uploaded by Jack London

Early exit

Early exit in traditional DNN models
 BranchyNet (arXiv:1709.01686v1)
 BranchyNet exploits the observation that features learned at an early layer of a network
may often be sufficient for the classification of many data points.
 For more difficult samples, which are expected less frequently, BranchyNet will use
further or all network layers to provide the best likelihood of correct prediction.
 To train the entire BranchyNet, we form a joint optimization problem as a weighted sum
of the loss functions of each exit branch.
 If the entropy at the n-th exit is below a threshold, the network is confident in the label predicted at that exit, and the sample can exit early.
 Loss functions jointly optimized.
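
The entropy-threshold exit rule can be sketched in a few lines of Python. This is a minimal illustration, not BranchyNet's actual code: the function names (`softmax`, `entropy`, `early_exit_predict`) and the threshold values are hypothetical, and the logits of every exit are passed in precomputed for simplicity. A real network would run the later layers only when the earlier exits are not confident enough.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_predict(exit_logits, thresholds):
    """Return (exit_index, label) for the first sufficiently confident exit.

    exit_logits: one logit list per exit, ordered from earliest to final.
    thresholds:  one entropy threshold per early exit (assumed values);
                 the final exit always produces an answer.
    """
    last = len(exit_logits) - 1
    for n, logits in enumerate(exit_logits):
        probs = softmax(logits)
        # Low entropy means a peaked distribution, i.e. high confidence.
        if n == last or entropy(probs) < thresholds[n]:
            label = max(range(len(probs)), key=probs.__getitem__)
            return n, label
```

For example, `early_exit_predict([[5.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.5])` stops at the first exit because its distribution is sharply peaked, while a near-uniform first exit falls through to the final one.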
Is early exit suitable for LLMs and agent AI?
 EE in LLMs: "EE-LLM: Large-Scale Training and Inference of Early-Exit Large
Language Models with 3D Parallelism"
 Training:
 loss := weighted sum of the losses at the early and final exits.
 The loss weights are hyper-parameters predefined by the user.
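
A weighted-sum training loss over the exits might look like the sketch below. It is an assumption-laden illustration rather than the EE-LLM implementation: `cross_entropy` and `joint_exit_loss` are hypothetical helper names, the per-exit loss is assumed to be cross-entropy on a single example, and the weights are plain user-supplied numbers.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example: log-sum-exp(logits) - logits[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def joint_exit_loss(exit_logits, target, weights):
    """Weighted sum of the per-exit losses (early exits plus final exit).

    exit_logits: one logit list per exit, final exit last.
    weights:     one user-chosen weight per exit (hyper-parameters).
    """
    if len(exit_logits) != len(weights):
        raise ValueError("need exactly one weight per exit")
    return sum(w * cross_entropy(logits, target)
               for w, logits in zip(weights, exit_logits))
```

With two exits that both output uniform logits over two classes, `joint_exit_loss([[0.0, 0.0], [0.0, 0.0]], 0, [0.5, 1.0])` equals `1.5 * math.log(2)`, since each exit contributes `log 2` scaled by its weight.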