slides - Computer Science Department

Machine Learning Applied in
Product Classification
Jianfu Chen
Computer Science Department
Stony Brook University
Machine learning learns an idealized
model of the real world.
[Figure: 1 + 1 = 2 ?]
Prod1 -> class1
Prod2 -> class2
...
f(x) -> y
Prod3 -> ?
X: Kindle Fire HD 8.9" 4G LTE Wireless
0 ... 1 1 ... 1 ... 1 ... 0 ...
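The 0/1 vector above can be read as a binary bag-of-words encoding of the product title. The slides do not say how the features are built, so the following is only a minimal sketch under that assumption; `build_vocabulary`, `binary_features`, and the second title are hypothetical names and data introduced for illustration.

```python
def build_vocabulary(titles):
    """Map each distinct lowercase token to a feature index, in first-seen order."""
    vocab = {}
    for title in titles:
        for token in title.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def binary_features(title, vocab):
    """Return a 0/1 vector with a 1 wherever a vocabulary word occurs in the title."""
    x = [0] * len(vocab)
    for token in title.lower().split():
        if token in vocab:
            x[vocab[token]] = 1
    return x


titles = [
    'Kindle Fire HD 8.9" 4G LTE Wireless',
    "Wireless optical mouse",  # hypothetical second product title
]
vocab = build_vocabulary(titles)
x = binary_features(titles[0], vocab)  # one entry per vocabulary word
```

A real system would likely use a much larger vocabulary, which is why most entries of x are 0.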
Components of the magic box f(x)
Representation
• Give a score to each class
• $s(y; x) = w_y^T x = w_{y,1} x_1 + \cdots + w_{y,n} x_n$
Inference
• Predict the class with the highest score
• $f(x) = \arg\max_y s(y; x)$
Learning
• Estimate the parameters from data
Representation
Given an example, a model gives a score to each class.
Linear Model
• $s(y; x) = w_y^T x$
Probabilistic Model
• P(x,y): Naive Bayes
• P(y|x): Logistic Regression
Algorithmic Model
• Decision Tree
• Neural Networks
Linear Model
• A linear combination of the feature values.
• A hyperplane.
• Use one weight vector to score each class.
$s(y; x) = w_y^T x = w_{y,1} x_1 + \cdots + w_{y,n} x_n$
[Figure: weight vectors $w_1$, $w_2$, $w_3$]
Example
• Suppose we have 3 classes, 2 features
• weight vectors
$s(1; x) = w_1^T x = 3x_1 + 2x_2$
$s(2; x) = w_2^T x = 2.4x_1 + 1.3x_2$
$s(3; x) = w_3^T x = 7x_1 + 8x_2$
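To make the example concrete, here is a small sketch that scores an input with the three weight vectors above and predicts the class with the highest score. The weights come from the slide; the function names and the test inputs are hypothetical.

```python
# Weight vectors from the example: class -> (w_{y,1}, w_{y,2})
WEIGHTS = {1: (3.0, 2.0), 2: (2.4, 1.3), 3: (7.0, 8.0)}


def score(y, x, weights=WEIGHTS):
    """s(y; x) = w_y^T x, the dot product of the class weights with the features."""
    return sum(w_i * x_i for w_i, x_i in zip(weights[y], x))


def predict(x, weights=WEIGHTS):
    """f(x) = arg max over y of s(y; x)."""
    return max(weights, key=lambda y: score(y, x, weights))


x = (1.0, -1.0)  # hypothetical input with 2 features
scores = {y: score(y, x) for y in WEIGHTS}
predicted = predict(x)
```

For this input class 2 scores slightly higher than class 1, while class 3 scores lowest. For an input with two positive features, such as (1.0, 1.0), class 3 wins because its weights are the largest.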
Probabilistic model
• Gives a probability to class y given example x:
$s(y; x) = P(y \mid x)$
• Two ways to do this:
– Generative model: P(x,y) (e.g., Naive Bayes)
$P(y \mid x) = P(x, y) / P(x)$
– Discriminative model: P(y|x) (e.g., Logistic Regression)
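As an illustration of the generative route, the sketch below turns a joint table P(x, y) into the posterior P(y | x) by dividing by P(x), where P(x) is obtained by summing the joint over all classes. The table values and names are made up for the example and are not taken from the slides.

```python
def posterior(joint, x):
    """Return P(y | x) = P(x, y) / P(x), with P(x) = sum over y of P(x, y)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    if p_x == 0:
        raise ValueError("P(x) is zero; the posterior is undefined")
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}


# Hypothetical joint distribution over (feature value, class); sums to 1.
JOINT = {
    ("wireless", "tablet"): 0.3,
    ("wireless", "mouse"): 0.1,
    ("wired", "tablet"): 0.1,
    ("wired", "mouse"): 0.5,
}

post = posterior(JOINT, "wireless")
best_class = max(post, key=post.get)  # class with the highest P(y | x)
```

A discriminative model would skip the joint table and estimate P(y | x) directly.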
Learning
• Parameter estimation ($\theta$)
– $w$'s in a linear model
– parameters for a probabilistic model
• Learning is usually formulated as an optimization problem.
$\theta^* = \arg\min_\theta R(D; \theta)$
Define an optimization objective
- average misclassification cost
• The cost of misclassifying a single example x from class y into class y':
$L(x, y, y'; \theta)$
– formally called the loss function
• The average misclassification cost on the training set:
$R_{em}(D; \theta) = \frac{1}{m} \sum_{(x, y) \in D} L(x, y, y'; \theta)$
– formally called the empirical risk
Define misclassification cost
• 0-1 loss
$L(x, y, y') = [y \neq y']$
The average 0-1 loss is the error rate = 1 - accuracy:
$R_{em}(D; \theta) = \frac{1}{m} \sum_{(x, y) \in D} [y \neq y']$
• revenue loss
$L(x, y, y') = v(x) L_{yy'}$
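The relationship between the 0-1 loss, the empirical risk, and the error rate can be checked with a few lines of code. This is a minimal sketch: the threshold classifier and the four labeled examples are hypothetical, and the loss is passed in as a function so that other costs could be plugged in.

```python
def zero_one_loss(x, y, y_pred):
    """L(x, y, y') = [y != y']: 1 for a mistake, 0 otherwise."""
    return 1 if y != y_pred else 0


def empirical_risk(dataset, predict, loss=zero_one_loss):
    """Average loss over the m examples (x, y) in the dataset."""
    if not dataset:
        raise ValueError("dataset must contain at least one example")
    return sum(loss(x, y, predict(x)) for x, y in dataset) / len(dataset)


def predict_sign(x):
    """Hypothetical classifier: threshold a single numeric feature at 0."""
    return "pos" if x >= 0 else "neg"


# Hypothetical training set of (x, y) pairs; the third example is misclassified.
DATA = [(2.0, "pos"), (-1.0, "neg"), (0.5, "neg"), (-3.0, "neg")]

error_rate = empirical_risk(DATA, predict_sign)
accuracy = 1 - error_rate
```

With one mistake out of four examples, the average 0-1 loss is 0.25 and the accuracy is 0.75.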
Do the optimization
- minimize a convex upper bound of
the average misclassification cost.
• Directly minimizing the average misclassification cost is
intractable, since the objective is non-convex.
$R_{em}(D; \theta) = \frac{1}{m} \sum_{(x, y) \in D} [y \neq y']$
• Minimize a convex upper bound instead.
A taste of SVM
• Minimizes a convex upper bound of the 0-1 loss
$\min_{\theta, \xi} \; \frac{1}{2} \|w\|^2 + \frac{C}{m} \sum_{i=1..m} \xi_i$
$\text{s.t. } \forall x,\; y' \neq y: \; \theta_y^T x - \theta_{y'}^T x \geq 1 - \xi_i, \quad \xi_i \geq 0$
where C is a hyperparameter (the regularization parameter).
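To see why the slack variables upper-bound the 0-1 loss, the sketch below computes, for fixed parameters, the smallest slack that satisfies the constraints for each example and compares it with the 0-1 loss. It evaluates the objective and does not solve the optimization. Two assumptions are made here that the slide does not state: the regularizer is taken as the sum of squared norms of all class weight vectors, and the parameters and data are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def slack(theta, x, y):
    """Smallest xi >= 0 with theta_y.x - theta_y'.x >= 1 - xi for every y' != y."""
    violations = [1 - (dot(theta[y], x) - dot(theta[yp], x))
                  for yp in theta if yp != y]
    return max(0.0, max(violations))


def zero_one(theta, x, y):
    """0-1 loss of the arg-max prediction under theta."""
    pred = max(theta, key=lambda c: dot(theta[c], x))
    return 1 if pred != y else 0


def objective(theta, data, C):
    """1/2 * squared norm of the weights + (C/m) * sum of slacks.

    Assumption: the norm covers all class weight vectors.
    """
    reg = 0.5 * sum(dot(w, w) for w in theta.values())
    return reg + C / len(data) * sum(slack(theta, x, y) for x, y in data)


# Hypothetical parameters (2 classes, 2 features) and labeled examples.
THETA = {"a": (1.0, 0.0), "b": (0.0, 1.0)}
DATA = [((2.0, 0.0), "a"), ((0.5, 0.0), "a"), ((0.0, 1.0), "a")]

slacks = [slack(THETA, x, y) for x, y in DATA]
errors = [zero_one(THETA, x, y) for x, y in DATA]
value = objective(THETA, DATA, C=3.0)
```

The second example is classified correctly but still gets a positive slack because its margin is below 1. The third example is misclassified and its slack exceeds 1. In every case the slack is at least as large as the 0-1 loss.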
Machine learning in practice
feature extraction
• { (x, y) }
Set up experiment
• training:development:test = 4:2:4
select a model/classifier
• SVM
call a package to do experiments
• LIBLINEAR
http://www.csie.ntu.edu.tw/~cjlin/liblinear/
• find the best C on the development set
• test final performance on the test set
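The experimental setup above can be sketched in plain Python. The split follows the 4:2:4 ratio from the slide; everything else is an assumption for illustration. In particular, `train_fn` and `accuracy_fn` are hypothetical caller-supplied callbacks that would wrap whichever package is used, and no specific LIBLINEAR call is shown.

```python
import random


def split_4_2_4(examples, seed=0):
    """Shuffle and split into training, development, and test sets (4:2:4)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 4 // 10
    n_dev = n * 2 // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


def select_c(train, dev, candidates, train_fn, accuracy_fn):
    """Pick the C whose model scores best on the development set.

    train_fn(train, C) -> model and accuracy_fn(model, data) -> float
    are hypothetical callbacks supplied by the caller.
    """
    best_c, best_acc = None, float("-inf")
    for c in candidates:
        model = train_fn(train, c)
        acc = accuracy_fn(model, dev)
        if acc > best_acc:
            best_c, best_acc = c, acc
    return best_c


train, dev, test = split_4_2_4(range(10))
```

The test set is touched only once, after C has been chosen on the development set, so the final number is not biased by the tuning.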
Cost-sensitive learning
• Standard classifier learning optimizes the error
rate by default, assuming all misclassifications
incur the same cost
• In product taxonomy classification
[Figure: example products and classes: IPhone5, Nokia 3720 Classic, truck, car, mouse, keyboard]
Minimize average revenue loss
$R_{em}(D; \theta) = \frac{1}{m} \sum_{(x, y) \in D} v(x) L_{yy'}$
where $v(x)$ is the potential annual
revenue of product x if it is correctly classified;
$L_{yy'}$ is the loss ratio of the revenue by
misclassifying a product from class y to class y'.
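A small sketch of the revenue-weighted objective, assuming y' is the class the classifier predicts for x and that a correct classification has a loss ratio of 0. The revenues, loss ratios, and predictions are invented for illustration and are not taken from the slides.

```python
def average_revenue_loss(dataset, predict, revenue, loss_ratio):
    """(1/m) * sum over (x, y) of v(x) * L_{y y'}, where y' = predict(x)."""
    total = 0.0
    for x, y in dataset:
        y_pred = predict(x)
        total += revenue(x) * loss_ratio[(y, y_pred)]
    return total / len(dataset)


# Hypothetical potential annual revenue v(x) per product.
REVENUE = {"p1": 1000.0, "p2": 200.0, "p3": 400.0, "p4": 50.0}

# Hypothetical loss ratios L_{y y'}; 0 on the diagonal is an assumption.
LOSS_RATIO = {
    ("phone", "phone"): 0.0,
    ("car", "car"): 0.0,
    ("phone", "car"): 1.0,       # distant class: all revenue lost
    ("mouse", "keyboard"): 0.5,  # nearby class: half the revenue lost
}

# Hypothetical classifier output and true labels.
PREDICTIONS = {"p1": "phone", "p2": "car", "p3": "keyboard", "p4": "car"}
DATA = [("p1", "phone"), ("p2", "phone"), ("p3", "mouse"), ("p4", "car")]

risk = average_revenue_loss(DATA, PREDICTIONS.get, REVENUE.get, LOSS_RATIO)
```

Two of the four products are misclassified, giving an error rate of 0.5, but the revenue-weighted risk depends on which products and which confusions are involved: here the average loss is 100.0 per product.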
Conclusion
• Machine learning learns an idealized model of
the real world.
• The model can be applied to predict unseen
data.
• Classifier learning minimizes average
misclassification cost.
• It is important to define an appropriate
misclassification cost.