Multiple Instance Learning Based on Support Vector Machines

张丽阳
October 2014
Multiple Instance Learning



Most drug molecules take effect by binding to larger molecules such as proteins, and the strength of the effect is determined by how tightly they bind. A molecule is suitable for making a drug if one of its low-energy shapes binds tightly to the desired binding region. Any given molecule may have hundreds of low-energy shapes, and the molecule qualifies for drug design as long as at least one of those shapes is suitable. To solve this problem, T. G. Dietterich et al. treated each molecule as a bag and each low-energy shape of the molecule as an instance in the bag, thereby introducing the concept of multiple instance learning.
In this learning setting, the training set consists of a number of bags with concept labels, each bag containing a number of instances without concept labels. A bag is labeled positive if it contains at least one positive instance, and negative if all of its instances are negative. By learning from the training bags, the learning system is expected to predict the concept labels of bags outside the training set as accurately as possible.
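The bag-labeling rule above can be sketched in a few lines (the function name and toy bags are illustrative, not from the slides):

```python
# A minimal sketch of the MIL labeling rule: a bag is positive iff
# at least one of its instances is positive. Toy data, for illustration.

def bag_label(instance_labels):
    """Return +1 if at least one instance is positive, else -1."""
    return +1 if any(y == +1 for y in instance_labels) else -1

# Each bag is a list of (hidden) instance labels.
positive_bag = [-1, -1, +1]   # one positive instance -> positive bag
negative_bag = [-1, -1, -1]   # all instances negative -> negative bag

print(bag_label(positive_bag))  # 1
print(bag_label(negative_bag))  # -1
```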
Support Vector Machines

Given a sample set containing positive and negative examples, a support vector machine seeks a hyperplane that separates the positive examples from the negative ones. The separation is not arbitrary: the principle is to maximize the margin between the two classes, i.e., we maximize the distance from the hyperplane to the points closest to it.
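The max-margin idea can be sketched with a tiny linear SVM trained by subgradient descent on the hinge loss (a Pegasos-style sketch; the data, hyperparameters, and function names are illustrative assumptions, not from the slides):

```python
# A minimal linear-SVM sketch: minimize (lam/2)||w||^2 + hinge loss by
# stochastic subgradient descent. Not an exact max-margin solver, but it
# illustrates how margin violations drive the hyperplane.
import random

def train_linear_svm(data, lam=0.01, epochs=200, seed=0):
    """data: list of (x, y) with x a feature tuple and y in {-1, +1}."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink w (regularization); if the point violates the margin,
            # push the hyperplane away from it.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data: positives in the upper-right quadrant.
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1),
        ((-2.0, -2.0), -1), ((-3.0, -3.0), -1)]
w, b = train_linear_svm(list(data))
```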
Multiple Instance Learning for Sparse Positive Bags
SVM algorithms for MIL

[SIL-SVM] The Single Instance Learning approach to MIL transforms the MIL dataset into a standard supervised representation by applying the bag's label to all instances in the bag. A normal SVM is then trained on the resulting dataset.
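The SIL transformation described above can be sketched as follows (the bag format, a pair of instance list and bag label, is an assumption for illustration):

```python
# A minimal sketch of the SIL transformation: every instance inherits its
# bag's label, yielding a standard supervised dataset that a normal SVM
# can be trained on.

def sil_transform(bags):
    """bags: list of (instances, bag_label) pairs."""
    instances, labels = [], []
    for bag_instances, bag_label in bags:
        for x in bag_instances:
            instances.append(x)
            labels.append(bag_label)  # bag label applied to every instance
    return instances, labels
```

Note that this is a noisy transformation: negative instances inside positive bags are mislabeled as positive, which is exactly the weakness the later formulations address.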
[NSK] In the Normalized Set Kernel of Gärtner et al. (2002), a bag is represented as the sum of all its instances, normalized by its 1- or 2-norm. The resulting representation is then used to train a traditional SVM (Figure 2).
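The NSK bag representation can be sketched like this (plain tuples stand in for instance feature vectors; the function name is an assumption):

```python
import math

# A minimal sketch of the Normalized Set Kernel representation: map a bag
# to the sum of its instance feature vectors, divided by the 1- or 2-norm
# of that sum.

def nsk_bag_feature(bag, norm=2):
    """bag: non-empty list of equal-length feature tuples."""
    dim = len(bag[0])
    total = [sum(x[i] for x in bag) for i in range(dim)]
    if norm == 1:
        n = sum(abs(v) for v in total)
    else:
        n = math.sqrt(sum(v * v for v in total))
    return [v / n for v in total] if n > 0 else total
```

The normalization keeps large bags from dominating the kernel simply because they contain more instances.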
By definition, all instances from negative bags are truly negative instances. Therefore, a constraint can be created for every instance from a negative bag, leading to the tighter NSK formulation from Figure 3.
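A sketch of what these per-instance constraints look like, using the notation that appears later in the slides (w, φ, b, slack ξ); this rendering is an assumption, not copied from Figure 3:

```latex
% For every instance x of every negative bag X, force a negative score
% (up to slack), since such instances are guaranteed to be negative:
w \cdot \varphi(x) + b \le -1 + \xi_x
\qquad \forall x \in X,\; \forall X \text{ negative}
```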
Transductive SVMs

With little training data, and especially in high dimensions, all unlabeled examples might be classified as belonging to only one of the classes with a very large margin. To ensure that unlabeled examples are assigned to both classes, they further constrained the solution by introducing a balancing constraint.

If L is the labeled training data, U is the unlabeled dataset, and y(x) = ±1 denotes the label of x, then the balancing constraint has the form shown in Equation 1 below:
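Equation 1 itself did not survive extraction. As an assumption based on standard transductive-SVM formulations, a balancing constraint of this kind ties the average label on the unlabeled set to the average label observed on the labeled set:

```latex
% Assumed standard form of a transductive-SVM balancing constraint
% (not reproduced from the slide):
\frac{1}{|U|} \sum_{x \in U} y(x) \;=\; \frac{1}{|L|} \sum_{x \in L} y(x)
```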
Replacing the inequality constraint from Figure 3 with
the new balancing constraint (derived from Equation 3 by
summing up the hidden labels) leads to the optimization
problem in Figure 4 (sMIL).
A transductive SVM approach to sparse MIL

Even though the balancing constraint from the sMIL formulation is closer to expressing the requirement that at least one instance from a positive bag is positive, there may be cases where all instances in a bag have negative scores, yet the bag satisfies the balancing constraint. This can happen, for instance, when the negative scores are very close to 0. On the other hand, if all negative instances inside a bag X were constrained to have scores less than or equal to −1 + ξ_X, then the balancing constraint w·φ(X) + b·|X| ≥ (2 − |X|)(1 − ξ_X) would guarantee that at least one instance x has a score w·φ(x) + b ≥ 1 − ξ_X.
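The bound in this argument can be checked numerically: if all but one instance sit exactly at the negative ceiling −1 + ξ_X and the remaining instance sits exactly at 1 − ξ_X, the bag sum equals (2 − |X|)(1 − ξ_X), so any bag meeting that threshold must push some instance to at least 1 − ξ_X. The values below are illustrative:

```python
# Numeric check of the tightness of the bound (2 - |X|)(1 - xi):
# the worst-case bag sum with one instance at 1 - xi and the rest at
# -1 + xi matches the bag-level lower bound exactly.

def min_required_bag_sum(size, xi):
    """Bag-level lower bound (2 - |X|)(1 - xi) from the slide."""
    return (2 - size) * (1 - xi)

def worst_case_sum(size, xi):
    """One instance exactly at 1 - xi, the rest exactly at -1 + xi."""
    return (1 - xi) + (size - 1) * (-1 + xi)

for size in (1, 2, 5, 10):
    for xi in (0.0, 0.25, 0.5):
        assert abs(worst_case_sum(size, xi) - min_required_bag_sum(size, xi)) < 1e-12
```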
Thank you