In the name of Allah, the Most Gracious, the Most Merciful

Dear students: this summary was prepared by your colleagues in the Computer & Networks Engineering students' union (C.N.E) to help you study this course, to make it easier for you, and to save the time and effort needed for studying it. We hope you find it useful, and that we have succeeded in what we have presented and what we will present to you.

This summary covers the following topics of the course:
1. Introduction to Machine learning
2. Neural networks
3. Fuzzy logic
4. Genetic algorithm
5. Decision tree algorithm
6. K-means clustering algorithm
7. Bayes theorem

Machine learning introduction

ML: programming computers to optimize a performance criterion using example data or past experience.

* Learning is used when:
1. Human expertise doesn't exist (ex: navigating on Mars)
2. Humans are unable to explain their expertise (ex: speech recognition)
3. The solution changes in time (ex: routing in computer networks)
4. The solution needs to be adapted to particular cases (ex: user biometrics)

* Data mining: the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data.

* ML: the study of algorithms that improve their performance at some task with experience. (It combines the role of computer science with the role of statistics.)

* Types of ML:
1. Supervised. Ex: credit scoring; the task is to differentiate between low-risk and high-risk customers.
2. Unsupervised. Ex: clustering; we don't have outputs.
3. Reinforcement.
* Applications: face recognition, speech recognition, character recognition, medical diagnosis, web advertising.

* Example 1:
  Student height | Grade
  170            | High
  175            | Medium
  180            | H
  190            | H
  160            | Low
  165            | M
  170            | M
  175            | H
  Type: Unsupervised

* Example 2 (medical):
  Fasting blood sugar | Diabetes
  130                 | Yes
  100                 | No
  90                  | No
  85                  | No
  95                  | No
  140                 | Yes
  130                 | Yes
  Type: Supervised

* Example 3 (a game of chess):
  Move 1 | good
  Move 2 | bad
  Move 3 | good
  Move 4 | good
  Move 5 | no opinion
  Type: Reinforcement

Neural Networks

- A neural network is a model of reasoning based on the human brain.
- The brain consists of a densely interconnected set of nerve cells called "neurons".
- The human brain has about 10 billion neurons and 60 trillion connections.

* Biological neural network: (diagram)
* Diagram of the neuron: (diagram)

* Activation functions of a neuron:
1. Sign function
2. Linear function
3. Step function
4. Sigmoid function

* Perceptron: the decision boundary is the straight line
x1·w1 + x2·w2 − θ = 0

* How does the perceptron learn its classification tasks?
The error is e(p) = Yd(p) − Y(p), where Yd(p) is the desired output and Y(p) is the actual output.
- If e(p) > 0: increase the actual output Y(p).
- If e(p) < 0: decrease the actual output Y(p).
This is done by changing the weights.

* Perceptron learning rule:
wi(p + 1) = wi(p) + Δwi(p)
(weight for the next iteration = current weight + weight correction)
* Weight correction: Δwi(p) = α · xi(p) · e(p), where α is the learning rate and xi(p) is the input.
- The learning rate is between 0 and 1.

* Steps of the perceptron algorithm (single neuron):
1. Initialization: set the initial weights and threshold.
2. Activation: compute the actual output Y(p).
3. Weight training: update the weights by the perceptron rule.
4. Iteration: increase p, then go back to step 2, until convergence (e = 0 and the weights converge to a stable set of values) or the maximum number of iterations is reached.

* Example: θ = 0.2, α = 0.1, initial weights: w1 = 0.3, w2 = −0.1
* Soln.: (training table omitted; a code sketch is given at the end of this part)
* Perceptron: we notice that the line was able to separate the 0 values from the 1 values, so we call this "linearly separable".
For exclusive-OR we cannot separate the 0 values from the 1 values with a straight line, so it is "inseparable", and the perceptron can't learn it.
* A perceptron is able to represent a function only if there is some line that separates all black dots from all white dots; such functions are called linearly separable. Therefore a perceptron can learn (AND, OR) but can't learn (exclusive-OR).
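A minimal training sketch for the example above, assuming the target function is logical AND (the summary does not name the function; AND is the classic linearly separable case). The step activation, the learning rule, and the values θ = 0.2, α = 0.1, w1 = 0.3, w2 = −0.1 are the ones given in the example.

```python
# Perceptron training sketch (single neuron, step activation).
# ASSUMPTION: the target function is logical AND; theta, alpha and the
# initial weights are the values given in the example above.

def step(weighted_sum, theta):
    """Fire (output 1) when the weighted sum reaches the threshold."""
    return 1 if weighted_sum >= theta else 0

def train_perceptron(samples, w, theta=0.2, alpha=0.1, max_epochs=100):
    for epoch in range(max_epochs):
        total_error = 0
        for (x1, x2), desired in samples:
            y = step(x1 * w[0] + x2 * w[1], theta)  # actual output Y(p)
            e = desired - y                         # e(p) = Yd(p) - Y(p)
            w[0] += alpha * x1 * e                  # perceptron learning rule
            w[1] += alpha * x2 * e
            total_error += abs(e)
        if total_error == 0:                        # e = 0 over a whole epoch
            return w, epoch + 1
    return w, max_epochs

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, epochs = train_perceptron(AND_SAMPLES, w=[0.3, -0.1])
print(f"converged after {epochs} epochs, weights = {weights}")
```

With these values the loop converges after 5 epochs to w1 = w2 = 0.1, a line that separates the single (1,1) point from the other three inputs.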
* Multilayer neural network (NN):
- Is a feed-forward NN with one or more hidden layers.
- The input layer accepts the inputs and redistributes them to all neurons in the hidden layer.
- The output layer accepts stimulus patterns (output signals) from the hidden layer and establishes the output pattern.
- Neurons in the hidden layer detect the features; the weights of these neurons represent the features hidden in the input patterns.

* Multilayer perceptron with a hidden layer: (diagram: i = neuron in the input layer, j = neuron in the hidden layer, k = neuron in the output layer)

* Back-propagation NN:
- The activation function in the hidden layers is the sigmoid: y = 1 / (1 + e^(−x)).
- Error at output neuron k at iteration p: ek(p) = yd,k(p) − yk(p).
- Error gradient at k: δk(p) = yk(p) · [1 − yk(p)] · ek(p).
* What is the error gradient? The derivative of the activation function multiplied by the error at the neuron output.
- Weight correction for the hidden-to-output weights: Δwjk(p) = α · yj(p) · δk(p).
* Weight correction for a neuron in hidden layer (j): Δwij(p) = α · xi(p) · δj(p),
where the error gradient at j is δj(p) = yj(p) · [1 − yj(p)] · Σk δk(p) · wjk(p).

* Example: solve the X-OR function, where the inputs x1 = x2 = 1.

* Soln.:
y3 = sigmoid(x1·w13 + x2·w23 − θ3) = 1 / (1 + e^−(0.5 + 0.4 − 0.8)) = 0.5250
y4 = sigmoid(x1·w14 + x2·w24 − θ4) = 1 / (1 + e^−(0.9 + 1.0 − (−0.1))) = 0.8808
y5 = sigmoid(y3·w35 + y4·w45 − θ5) = 0.5097
At last we update all weights and threshold levels in our network, as in the sketch below.
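A short sketch reproducing the numbers of the example above. The input-to-hidden weights and thresholds (w13 = 0.5, w23 = 0.4, θ3 = 0.8; w14 = 0.9, w24 = 1.0, θ4 = −0.1) are the ones used in the example; the hidden-to-output values w35 = −1.2, w45 = 1.1, θ5 = 0.3 and the learning rate α = 0.1 are not shown in the summary and are assumed here (they reproduce the stated y5 = 0.5097).

```python
import math

def sigmoid(x):
    """Sigmoid activation used in the hidden and output layers."""
    return 1.0 / (1.0 + math.exp(-x))

x1, x2 = 1, 1                                 # inputs of the X-OR example
w13, w23, theta3 = 0.5, 0.4, 0.8              # given in the example
w14, w24, theta4 = 0.9, 1.0, -0.1
# ASSUMED values (not shown in the summary); they reproduce y5 = 0.5097:
w35, w45, theta5 = -1.2, 1.1, 0.3
alpha = 0.1                                   # ASSUMED learning rate

y3 = sigmoid(x1 * w13 + x2 * w23 - theta3)    # 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - theta4)    # 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - theta5)    # 0.5097
print(round(y3, 4), round(y4, 4), round(y5, 4))

# One back-propagation step at the output neuron: the desired output of
# XOR(1, 1) is 0, so the error and gradient (formulas above) are:
e5 = 0 - y5                                   # e(p)
delta5 = y5 * (1 - y5) * e5                   # error gradient at neuron 5
w35 += alpha * y3 * delta5                    # weight corrections; the
w45 += alpha * y4 * delta5                    # threshold acts as a weight
theta5 += alpha * (-1) * delta5               # with a fixed input of -1
```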
* Accelerated learning in multilayer NN:
1) Hyperbolic tangent activation function.
2) Adding a momentum term to the learning rule: the generalized delta rule with momentum (β), which has a stabilizing effect on training:
Δwjk(p) = β · Δwjk(p − 1) + α · yj(p) · δk(p)
- β is a positive number between 0 and 1.

* Learning curve: (plot: learning curves with and without momentum)

Underfitting: lack of training examples.
Overfitting: a lot of training examples with noise.

Fuzzy Logic

* Is a set of mathematical principles for knowledge representation based on degrees of membership rather than on the crisp membership of classical binary logic.

- Degree of membership of "tall men":
  Name | Height (cm) | Crisp | Fuzzy
  X    | 208         | 1     | 1
  Y    | 205         | 1     | 1
  Z    | 181         | 1     | 0.98
  M    | 179         | 0     | 0.7
  J    | 160         | 0     | 0.4

- A fuzzy set A of universe X is defined by the function μA(x), called the membership function of set A.
- One method for forming fuzzy sets relies on the knowledge of a single expert or a group of experts.
- Crisp set vs. fuzzy set: (plots)
- A fuzzy subset A of the finite reference superset X can be expressed as
A = {μA(x1)/x1, μA(x2)/x2, …, μA(xn)/xn}
or A = μA(x1)/x1 + μA(x2)/x2 + … + μA(xn)/xn
Tall men = (0/180, 0.5/185, 1/190)
Short men = (1/160, 0.5/165, 0/170)

* Linguistic variables & hedges:
"John is tall": the height is the linguistic variable (L-variable) and "tall" is its linguistic value (L-value).
* L-variable: a fuzzy variable, used in fuzzy rules.
1) If wind is strong, then sailing is good.
2) If speed is low, then stopping distance is short.
- The range of possible values of an L-variable represents the universe of discourse. Example: speed [0, 120 km/h].
- Qualifiers of L-variables: hedges.
- Hedges: terms that modify the shape of fuzzy sets (very, quite, likely, most, several, few).

* Operations on fuzzy sets:
1) Complement: how much do elements not belong to the set? μ¬A(x) = 1 − μA(x)
Tall men = (0/180, 0.25/182.5, 0.75/187.5)
Not tall men = (1/180, 0.75/182.5, 0.25/187.5)
2) Containment: each element can belong less to the subset than to the larger set; for every element, the degree of membership in the subset must be less than or equal to its degree in the original set.
Tall men = (0/180, 0.25/182.5, 0.5/185, 0.75/187.5, 1/190)
Very tall men = (0/180, 0.06/182.5, 0.25/185, 0.56/187.5, 1/190)
"Very tall men" is a subset of "tall men" (here "very" squares the membership values).
3) Intersection: an element may partly belong to both sets, with different memberships; a fuzzy intersection is the lower membership of each element in the two sets: μA∩B(x) = min[μA(x), μB(x)]
4) Union: the largest membership value of the element in either set: μA∪B(x) = max[μA(x), μB(x)]

* Example:
Tall men = (0/165, 0/175, 0/180, 0.25/182.5, 0.5/185, 1/190)
Average men = (0/165, 1/175, 0.5/180, 0.25/182.5, 0/185, 0/190)
Operations on the fuzzy sets:
Tall ∩ Average = (0/165, 0/175, 0/180, 0.25/182.5, 0/185, 0/190)
Tall ∪ Average = (0/165, 1/175, 0.5/180, 0.25/182.5, 0.5/185, 1/190)

* Fuzzy rules: conditional statements of the form
IF x is A THEN y is B (x, y: fuzzy variables; A, B: fuzzy values)
- If-then, binary vs. fuzzy:
1) IF speed > 100 (speed: 0-220) THEN stopping distance is long (short, long)
2) IF speed is fast (slow, medium, fast) THEN stopping distance is long (short, long)

* How to reason?
IF (antecedent) THEN (consequent): an implication. All rules fire to some extent (fire partially).
Relationship between height and weight: IF height is tall THEN weight is heavy.
- IF with multiple antecedents:
IF project_duration is long AND project_staffing is large AND project_funding is inadequate THEN risk is high
- THEN with multiple parts:
IF temperature is hot THEN hot_water is reduced; cold_water is increased

* In general, a fuzzy expert system incorporates several rules that describe expert knowledge.

* Mamdani approach:
Rule 1: IF project_funding is adequate OR project_staffing is small THEN risk is low
Rule 2: IF project_funding is marginal AND project_staffing is large THEN risk is normal
Rule 3: IF project_funding is inadequate THEN risk is high
Soln:
1) Fuzzification
2) Rule evaluation
3) Aggregation of the rule consequents (outputs)
4) Defuzzification, e.g. with the COG (Center Of Gravity) method: COG = Σ μA(x)·x / Σ μA(x)

* Building a fuzzy expert system:
1) Specify the problem and define the linguistic variables. Ex: 1. mean delay, m; 2. number of servers, s
2) Determine the fuzzy sets.
3) Elicit and construct the fuzzy rules. Ex: IF (utilization_factor is L) THEN (number_of_spares is S)
4) Encode the fuzzy sets, fuzzy rules and procedures to perform fuzzy inference into the expert system.
5) Evaluate and tune the system:
  1. Review the model input and output
  2. Review the fuzzy sets
  3. Provide sufficient overlap between neighboring sets
  4. Review the existing rules
  5. Examine the rules for hedges
  6. Adjust the rule execution weights
  7. Revise the shapes of the fuzzy sets

K-means clustering

- Is an unsupervised learning algorithm (a set of inputs whose outputs we are unable to determine).
- The machine/software will learn on its own, using the data (the learning set), and classify the objects into particular cases.

* Example: tumor type, Malignant / Benign: the class information is never provided to the algorithm; we gather the nearby, similar points and place them in a cluster.
- The number of clusters (k) is determined in advance, according to our knowledge of the problem.
- Centroid: the center point of each cluster (chosen at random for each cluster at the start).

* Algorithm:
- Select k points as the initial centroids. Repeat:
- Form k clusters by assigning every point to the closest centroid (measure the distance between the point and each of the centroids, and place it with the nearest one).
- Recompute the centroid of each cluster, until the centroids don't change (we keep recomputing until they stop moving).

* Problem: cluster the following 8 points
(2,10), (2,5), (8,4), (5,8), (7,5), (6,4), (1,2), (4,9)
into 3 clusters, with initial centroids (2,10), (5,8), (1,2).
Distance function: (not legible in the original; Euclidean distance is assumed in the sketch below)
Solution: assign each point to its nearest centroid, then recompute the new cluster centers by taking the mean of all points in each cluster, and repeat.
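A compact sketch of the procedure for the clustering problem above. The points and initial centroids are the ones given; the Euclidean distance is an assumption, since the original distance function did not survive.

```python
import math

def k_means(points, centroids):
    """Assign points to the nearest centroid, recompute each centroid as
    the mean of its cluster, and repeat until the centroids stop changing."""
    while True:
        clusters = [[] for _ in centroids]
        for p in points:                      # assignment step
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [                     # update step: cluster means
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            for c in clusters
        ]
        if new_centroids == centroids:        # centroids no longer change
            return centroids, clusters
        centroids = new_centroids

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
centroids, clusters = k_means(points, centroids=[(2, 10), (5, 8), (1, 2)])
for center, cluster in zip(centroids, clusters):
    print(center, cluster)
```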
Bayes theorem

P(h|D) = P(D|h) · P(h) / P(D)
- P(D): prior probability of the data (evidence)
- P(h): prior probability of the hypothesis (prior)
- P(h|D): posterior probability of the hypothesis given the data (posterior); "this is usually what we are asked to compute"
- P(D|h): probability of the data given the hypothesis (likelihood of the data); "this is usually given"

* By observing the data D we can convert the prior probability P(h) into the posterior probability P(h|D):
Posterior = (Likelihood × Prior) / Evidence

* Generally we want the most probable hypothesis:
P(h|D) = P(D|h) · P(h) / Σh' (P(D|h') · P(h'))

* Example: a person may come by bus, car, or train; given the prior probability of each means of transport and the probability of being late with each, find the probability that he came by car given that he is late.
Solution:
a) Priors: Pr(bus) = 0, Pr(car) = 0.1, Pr(train) = 0.9
b) P(car|late) = (0.5 × 0.1) / (0.5 × 0.1 + 0.2 × 0 + 0.01 × 0.9) = 0.05 / 0.059 = 0.847
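The arithmetic of the example above in a few lines; the priors and the lateness probabilities are the ones given in the solution.

```python
# Posterior = Likelihood * Prior / Evidence, with the example's numbers.
priors = {"bus": 0.0, "car": 0.1, "train": 0.9}       # P(h)
p_late = {"bus": 0.2, "car": 0.5, "train": 0.01}      # P(late | h)

evidence = sum(p_late[h] * priors[h] for h in priors)  # P(late) = 0.059
posterior_car = p_late["car"] * priors["car"] / evidence
print(round(posterior_car, 3))                         # 0.847
```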