Machine Learning Summary

In the name of Allah, the Most Gracious, the Most Merciful.
Dear students:
This summary was prepared by your colleagues in the Computer and Network Engineering students union (C.N.E) to help you study this course, make it easier for you, and save the time and effort needed to study it, hoping that what we have presented, and what we will present, proves useful to you.
This summary covers the following topics from the course:
1. Introduction to Machine learning
2. Neural networks
3. Fuzzy logic
4. Genetic algorithm
5. Decision tree algorithm
6. K-means clustering algorithm
7. Bayes theorem
Machine learning introduction
* ML: programming computers to optimize a performance criterion using example data or past experience.
* Learning is used when:
1. Human expertise does not exist (ex: navigating on Mars)
2. Humans are unable to explain their expertise (ex: speech recognition)
3. The solution changes over time (ex: routing in computer networks)
4. The solution needs to be adapted to particular cases (ex: user biometrics)
* Data Mining: the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
* ML: the study of algorithms that improve their performance at some task with experience. It draws on both the role of computer science and the role of statistics.
* Types of ML:
1. Supervised
Ex: credit scoring; the task is to differentiate between low-risk and high-risk customers.
2. Unsupervised
Ex: clustering; we do not have outputs.
3. Reinforcement
Typical ML applications: face recognition, speech recognition, character recognition, medical diagnosis, web advertising.
* Example 1:
Student height | Grade
170 | High
175 | Medium
180 | H
190 | H
160 | Low
165 | M
170 | M
175 | H
Its type: Unsupervised
* Example 2: Medical
Fasting blood sugar | Diabetes
130 | Yes
100 | No
90 | No
85 | No
95 | No
140 | Yes
130 | Yes
Its type: Supervised
* Example 3: Game (Chess)
Move | Feedback
Move 1 | good
Move 2 | bad
Move 3 | good
Move 4 | good
Move 5 | no opinion
Its type: Reinforcement
Neural Networks
- A neural network is a model of reasoning based on the human brain.
- The brain consists of a densely interconnected set of nerve cells called "neurons".
- The human brain has about 10 billion neurons and 60 trillion connections.
* Biological neural network :
* Diagram of the neuron :
* Activation functions of a neuron :
1. Sign function
2. Linear function
3. Step function
4. Sigmoid function
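For reference, a minimal Python sketch of these four activation functions; the threshold parameter theta for the sign and step functions is an illustrative assumption (default 0):

import math

def sign_fn(x, theta=0.0):
    # Sign (hard limiter): +1 if the net input reaches the threshold, otherwise -1
    return 1 if x >= theta else -1

def linear_fn(x):
    # Linear: the output equals the net input
    return x

def step_fn(x, theta=0.0):
    # Step: 1 if the net input reaches the threshold, otherwise 0
    return 1 if x >= theta else 0

def sigmoid_fn(x):
    # Sigmoid: squashes the net input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))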
* Perceptron:
x1·w1 + x2·w2 - θ = 0
* How does the perceptron learn its classification tasks?
The error at iteration p is e(p) = desired output - actual output Y(p):
- If e(p) > 0, increase the actual output Y(p).
- If e(p) < 0, decrease the actual output Y(p).
This is done by changing the weights.
* Perceptron learning rule:
w_i(p+1) = w_i(p) + Δw_i(p)
where w_i(p) is the current weight and w_i(p+1) is the weight for the next iteration.
* Weight correction:
Δw_i(p) = α · x_i(p) · e(p)
- α is the learning rate, a constant between 0 and 1; x_i(p) is the input and e(p) is the error.
* Steps of the perceptron algorithm (single neuron):
1. Initialization
2. Activation
3. Weight training: update the weights using the perceptron learning rule.
4. Iteration: increase p, then go back to step 2, until convergence (e = 0 and the weights converge to a stable set of values) or the maximum number of iterations is reached.
* Example:
θ = 0.2, α = 0.1
Initial weights: w1 = 0.3, w2 = -0.1
* Soln.: see the training sketch below.
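A minimal Python sketch of the algorithm above using the example's parameters (θ = 0.2, α = 0.1, w1 = 0.3, w2 = -0.1). The step activation and the training data are assumptions: the target is taken to be the logical AND function, as in the standard textbook version of this example.

import itertools

theta, alpha = 0.2, 0.1          # threshold and learning rate from the example
w = [0.3, -0.1]                  # initial weights w1, w2 from the example

# Assumed training set: the logical AND function
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

for epoch in itertools.count(1):                              # 4. Iteration
    total_error = 0
    for (x1, x2), desired in data:
        y = 1 if x1 * w[0] + x2 * w[1] - theta >= 0 else 0    # 2. Activation (step function)
        e = desired - y                                       # error e(p)
        w[0] += alpha * x1 * e                                # 3. Weight training
        w[1] += alpha * x2 * e                                #    (perceptron learning rule)
        total_error += abs(e)
    if total_error == 0 or epoch >= 100:                      # stop at convergence or max iterations
        break

print("final weights:", w)   # converged weights that realize the assumed AND targets

With these assumed targets the run converges after a few epochs, at which point the line x1·w1 + x2·w2 - θ = 0 separates the two output classes.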
* Perceptron:
We notice that the line is able to separate the 0 values from the 1 values, so we call the function "linearly separable".
If the 0 and 1 values cannot be separated by a straight line, the function is "linearly inseparable".
- The perceptron cannot learn this.
* A perceptron is able to represent a function only if there is some line that separates all black dots from all white dots; such functions are called linearly separable. Therefore a perceptron can learn (AND, OR) but cannot learn (exclusive-OR).
* Multilayer neural network (NN):
- A feed-forward NN with one or more hidden layers.
- The input layer accepts the inputs and redistributes them to all neurons in the hidden layer.
- The output layer accepts stimulus patterns (output signals) from the hidden layer and establishes the output pattern of the network.
- Neurons in the hidden layer detect the features; the weights of these neurons represent the features hidden in the input patterns.
* Multilayer perceptron with a hidden layer (notation: i is a neuron in the input layer, j a neuron in the hidden layer, k a neuron in the output layer).
* Back-propagation NN:
- Net input to a neuron: X = Σ_i x_i · w_i - θ.
- Activation function in the hidden layers: the sigmoid, y = 1 / (1 + e^(-X)).
- Error at output neuron k at iteration p: e_k(p) = y_d,k(p) - y_k(p).
- Error gradient at k: δ_k(p) = y_k(p) · [1 - y_k(p)] · e_k(p).
- What is the error gradient? The derivative of the activation function multiplied by the error at the neuron's output.
- Weight correction for a neuron in hidden layer j: Δw_ij(p) = α · x_i(p) · δ_j(p).
- Error gradient at j: δ_j(p) = y_j(p) · [1 - y_j(p)] · Σ_k δ_k(p) · w_jk(p).
* Example:
Solve the X-OR function, where the inputs are x1 = x2 = 1.
* Soln.:
y3 = sigmoid(x1·w13 + x2·w23 - θ3) = 1 / (1 + e^-(0.5 + 0.4 - 0.8)) = 0.525
y4 = sigmoid(x1·w14 + x2·w24 - θ4) = 1 / (1 + e^-(0.9 + 1.0 + 0.1)) = 0.8808
y5 = sigmoid(y3·w35 + y4·w45 - θ5) = 0.5097
At last we update all the weights and threshold levels in our network.
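A short Python check of this forward pass. The input-to-hidden weights and thresholds come from the numbers shown above (w13 = 0.5, w23 = 0.4, θ3 = 0.8, w14 = 0.9, w24 = 1.0, θ4 = -0.1); the hidden-to-output values w35 = -1.2, w45 = 1.1, θ5 = 0.3 are assumed, since only the result y5 ≈ 0.5097 appears in the notes.

import math

def sigmoid(x):
    # Sigmoid activation used in the hidden and output layers
    return 1.0 / (1.0 + math.exp(-x))

x1, x2 = 1, 1                              # X-OR inputs from the example
w13, w23, theta3 = 0.5, 0.4, 0.8           # from the worked solution above
w14, w24, theta4 = 0.9, 1.0, -0.1          # from the worked solution above
w35, w45, theta5 = -1.2, 1.1, 0.3          # assumed hidden-to-output values

y3 = sigmoid(x1 * w13 + x2 * w23 - theta3)   # ~0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - theta4)   # ~0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - theta5)   # ~0.5097
print(round(y3, 4), round(y4, 4), round(y5, 4))

Since the desired X-OR output for (1, 1) is 0, the output error e = 0 - y5 then drives the weight and threshold updates mentioned above.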
* Accelerated learning in multilayer NN:
1) Use the hyperbolic tangent activation function.
2) Add a momentum term to the learning rule.
The generalized delta rule with momentum (β), which has a stabilizing effect on training, is:
Δw_jk(p) = β · Δw_jk(p-1) + α · y_j(p) · δ_k(p)
- β is a positive number between 0 and 1.
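A minimal sketch of this momentum update for a single weight; the numeric values of α, β, y_j, and δ_k below are illustrative assumptions only:

alpha, beta = 0.1, 0.95   # assumed learning rate and momentum constant

def momentum_update(delta_w_prev, y_j, delta_k):
    # Generalized delta rule with momentum: the previous weight change,
    # scaled by beta, is carried over, which smooths successive updates.
    return beta * delta_w_prev + alpha * y_j * delta_k

dw = momentum_update(0.0, 0.5, 0.04)   # first step: no previous change to carry over
dw = momentum_update(dw, 0.5, 0.04)    # later steps also feel the earlier change via beta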
* Learning curve:
- Training with momentum converges in fewer epochs than training without momentum.
Under-fitting: too few training examples.
Over-fitting: many training examples containing noise.
Fuzzy Logic
* Fuzzy logic is a set of mathematical principles for knowledge representation based on degrees of membership rather than on the crisp membership of classical binary logic.
- Degree of membership of "tall men":
Name | Height (cm) | Crisp | Fuzzy
X | 208 | 1 | 1
Y | 205 | 1 | 1
Z | 181 | 1 | 0.98
M | 179 | 0 | 0.7
J | 160 | 0 | 0.4
- A fuzzy set A of universe X is defined by a function μA(x) called the membership function of set A.
- One method for forming fuzzy sets relies on the knowledge of a single expert or a group of experts.
- Crisp set
- Fuzzy set
- A fuzzy subset A of the finite reference super set X can be expressed as
A = {μA(x1)/x1, μA(x2)/x2, ..., μA(xn)/xn}
or as a list of degree/element pairs, for example:
Tall men = (0/180, 0.5/185, 1/190)
Short men = (1/160, 0.5/165, 0/170)
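As an illustration, a small Python sketch that linearly interpolates the "tall men" pairs above to obtain μ(x) for any height; the interpolation itself is an assumption, since the notes only give the three anchor points:

import numpy as np

# Degree/element pairs for "tall men" from the notes
heights = [180, 185, 190]
degrees = [0.0, 0.5, 1.0]

def mu_tall(height):
    # Linear interpolation between the listed anchor points;
    # heights outside the range are clipped to 0 or 1.
    return float(np.interp(height, heights, degrees))

print(mu_tall(182.5), mu_tall(187.5))   # 0.25 and 0.75, matching the values used below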
* Linguistic variables & hedges:
- Example: "John is tall". Here "tall" is a linguistic value; the linguistic variable is John's height.
* A linguistic variable is a fuzzy variable and is used in fuzzy rules:
1) If wind is strong, then sailing is good.
2) If speed is low, then stopping distance is short.
- The range of possible values of a linguistic variable represents its universe of discourse.
Example: speed [0, 120 km/h]
- Qualifiers of linguistic variables: hedges.
- Hedges are terms that modify the shape of fuzzy sets (very, quite, likely, most, several, few).
* Operations on fuzzy sets:
1) Complement: how much do elements not belong to the set?
μ¬A(x) = 1 - μA(x)
Tall men = (0/180, 0.25/182.5, 0.75/187.5)
NOT tall men = (1/180, 0.75/182.5, 0.25/187.5)
2) Containment: each element can belong less to the subset than to the larger set.
Tall men = (0/180, 0.25/182.5, 0.5/185, 0.75/187.5, 1/190)
Very tall men = (0/180, 0.06/182.5, 0.25/185, 0.56/187.5, 1/190)
"Very tall men" is a subset of "tall men": for every element, the degree of membership in the subset must be less than or equal to its degree in the original set.
3) Intersection: an element may partly belong to both sets with different memberships; the fuzzy intersection is the lower membership value of the element in the two sets:
μA∩B(x) = min[μA(x), μB(x)]
4) Union: the fuzzy union is the larger membership value of the element in either set:
μA∪B(x) = max[μA(x), μB(x)]
* Example:
Tall men = (0/165, 0/175, 0/180, 0.25/182.5, 0.5/185, 1/190)
Average men = (0/165, 1/175, 0.5/180, 0.25/182.5, 0/185, 0/190)
The operations above applied to these two fuzzy sets are sketched below.
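A minimal Python sketch of the complement, intersection, and union operations applied element-wise to the "tall men" and "average men" sets above:

# Membership values of each set at the listed heights
heights = [165, 175, 180, 182.5, 185, 190]
tall    = [0.0, 0.0, 0.0, 0.25, 0.5, 1.0]
average = [0.0, 1.0, 0.5, 0.25, 0.0, 0.0]

not_tall     = [1 - t for t in tall]                       # complement: 1 - mu(x)
tall_and_avg = [min(t, a) for t, a in zip(tall, average)]  # intersection: min
tall_or_avg  = [max(t, a) for t, a in zip(tall, average)]  # union: max

print(not_tall)       # [1.0, 1.0, 1.0, 0.75, 0.5, 0.0]
print(tall_and_avg)   # [0.0, 0.0, 0.0, 0.25, 0.0, 0.0]
print(tall_or_avg)    # [0.0, 1.0, 0.5, 0.25, 0.5, 1.0]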
* Fuzzy rules:
- A fuzzy rule is a conditional statement:
IF x is A THEN y is B
(x, y: fuzzy variables; A, B: fuzzy values)
- Classical (binary) if-then rule vs. fuzzy rule:
1) IF speed > 100 (range 0 to 220) THEN stopping_distance is long (short, long)
2) IF speed is fast (slow, medium, fast) THEN stopping_distance is long (short, long)
* How to reason?
The IF part is the antecedent and the THEN part is the consequent; the rule as a whole is a fuzzy implication.
In fuzzy inference, all rules fire to some extent (they fire partially).
- Relationship between height and weight:
IF height is tall THEN weight is heavy
- A rule can have multiple antecedents (IF parts):
IF project_duration is long
AND project_staffing is large
AND project_funding is inadequate
THEN risk is high
- A rule can have multiple consequents (THEN parts):
IF temperature is hot
THEN hot_water is reduced
AND cold_water is increased
* In general, a fuzzy expert system incorporates several rules that describe expert knowledge.
* Mamdani approach:
Rule 1: IF project_funding is adequate OR project_staffing is small THEN risk is low
Rule 2: IF project_funding is marginal AND project_staffing is large THEN risk is normal
Rule 3: IF project_funding is inadequate THEN risk is high
Soln.:
1) Fuzzification
2) Rule evaluation
3) Aggregation of the rule consequents (outputs)
4) Defuzzification, typically with the COG (Center Of Gravity) method:
COG = Σ μA(x) · x / Σ μA(x)
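A small Python sketch of COG defuzzification over a sampled output universe. The risk range (0 to 100) and the aggregated membership values below are illustrative assumptions; only the COG computation itself comes from the notes.

# Centre-of-gravity defuzzification over a sampled output universe.
risk_values   = list(range(0, 101, 10))                      # assumed universe of discourse
aggregated_mu = [0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2,          # assumed aggregated membership
                 0.5, 0.5, 0.5, 0.5]                         # values after rule aggregation

cog = sum(mu * x for x, mu in zip(risk_values, aggregated_mu)) / sum(aggregated_mu)
print(round(cog, 1))   # crisp risk value, about 67.4 for these sample numbers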
* Building a fuzzy expert system:
1) Specify the problem and define the linguistic variables.
Ex: 1. Mean delay, m
2. Number of servers
2) Determine the fuzzy sets.
3) Elicit and construct the fuzzy rules.
Ex: IF (utilization_factor is L) THEN (number_of_spares is S)
4) Encode the fuzzy sets, fuzzy rules, and procedures to perform fuzzy inference into the expert system.
5) Evaluate and tune the system:
1. Review the model input and output variables.
2. Review the fuzzy sets.
3. Provide sufficient overlap between neighboring sets.
4. Review the existing rules.
5. Examine the rules for hedges.
6. Adjust the rule execution weights.
7. Revise the shapes of the fuzzy sets.
k-means clustering
- An unsupervised learning algorithm: we have a set of inputs but are unable to determine the output.
- The machine/software will learn on its own, using the data (learning set), and classify the objects into particular groups (clusters).
* Example: tumor type (malignant / benign): the class information is never provided to the algorithm.
We group nearby, similar points together and place them in the same cluster.
- Number of clusters: determined in advance, based on knowledge of the problem.
- Centroid: the center point of each cluster (initially, a center point is chosen at random for each cluster).
- Select k points as the initial centroids, then repeat:
- Form k clusters by assigning every point to its closest centroid
(measure the distance between the point and each of the current centroids, and assign it to the nearest one).
- Recompute the centroid of each cluster, until the centroids no longer change
(if the assignments are still changing, keep recomputing).
* Problem:
Cluster the following 8 points: (2,10), (2,5), (8,4), (5,8), (7,5), (6,4), (1,2), (4,9) into 3 clusters, using (2,10), (5,8), (1,2) as the initial centroids.
Distance function:
Solution:
Now recompute the new cluster centers by taking the mean of all the points in each cluster, and repeat until the centroids stop changing; a sketch of this procedure follows.
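A minimal Python sketch of k-means on the 8 points above, starting from the given centroids. The distance function used in the notes is not shown, so Euclidean distance is assumed here; the classic version of this exercise sometimes uses Manhattan distance instead, which can change the intermediate assignments.

# k-means on the 8 points from the problem, assuming Euclidean distance.
points    = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
centroids = [(2, 10), (5, 8), (1, 2)]          # given initial centroids

def dist(a, b):
    # Euclidean distance (assumed; the notes' distance function is not reproduced)
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

while True:
    # Assign every point to its closest centroid
    clusters = [[] for _ in centroids]
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[idx].append(p)
    # Recompute each centroid as the mean of the points in its cluster
    new_centroids = [
        (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
        for i, c in enumerate(clusters)
    ]
    if new_centroids == centroids:             # stop when the centroids no longer change
        break
    centroids = new_centroids

print(centroids)   # final cluster centers
print(clusters)    # final cluster assignments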
Bayes theorem
P(h|D) = P(D|h) * P(h) / P(D)
- P(D): prior probability of the data (evidence)
- P(h): prior probability of the hypothesis (prior)
- P(h|D): posterior probability of the hypothesis given the data (posterior)
(this is usually what we are asked to compute)
- P(D|h): probability of the data given the hypothesis (likelihood of the data)
(this is usually given)
* By observing the data D we can convert the prior probability P(h) into the posterior probability P(h|D):
Posterior = (Likelihood * Prior) / Evidence
* Generally we want the most probable hypothesis:
P(h|D) = P(D|h) * P(h) / Σ_h' ( P(D|h') * P(h') )
* Example:
Solution:
a) Pr(bus) = 0, Pr(car) = 0.1, Pr(train) = 0.9
b) P(car|late) = (0.5 * 0.1) / (0.5*0.1 + 0.2*0 + 0.01*0.9) = 0.05 / 0.059 ≈ 0.847
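A short Python check of part (b). The likelihoods P(late|car) = 0.5, P(late|bus) = 0.2, and P(late|train) = 0.01 are read off the worked line above, since the original problem statement is not reproduced here:

# Posterior P(car | late) via Bayes' theorem, using the values from the example.
priors      = {"bus": 0.0, "car": 0.1, "train": 0.9}        # P(h)
likelihoods = {"bus": 0.2, "car": 0.5, "train": 0.01}       # P(late | h), read off the worked line

evidence = sum(likelihoods[h] * priors[h] for h in priors)  # P(late), summed over the hypotheses
posterior_car = likelihoods["car"] * priors["car"] / evidence
print(round(posterior_car, 3))   # 0.847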