St. Francis Institute of Technology
ASSIGNMENT-2
DATA WAREHOUSING AND DATA MINING

1. Apply ID3 to the following training dataset from the AllElectronics customer database and extract the classification rules from the tree.

Age      Income   Student   Credit-rating   Class: buys_computer
<=30     High     No        Fair            No
<=30     High     No        Excellent       No
31..40   High     No        Fair            Yes
>40      Medium   No        Fair            Yes
>40      Low      Yes       Fair            Yes
>40      Low      Yes       Excellent       No
31..40   Low      Yes       Excellent       Yes
<=30     Medium   No        Fair            No
<=30     Low      Yes       Fair            Yes
>40      Medium   Yes       Fair            Yes
<=30     Medium   Yes       Excellent       Yes
31..40   Medium   No        Excellent       Yes
31..40   High     Yes       Fair            Yes
>40      Medium   No        Excellent       No

2. Suppose we want ID3 to decide whether the weather is amenable to playing baseball. The target classification is “Should we play baseball?”, which can be Yes or No.

Day   Outlook    Temperature   Humidity   Wind     Play Ball
D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No

3. Consider the following dataset, which helps to predict the RISK of a loan application based on the applicant’s CREDIT HISTORY, DEBT and INCOME. Predict the RISK for the unseen tuple X = <unknown, high, over35, moderate>. Write down the rule used by Naïve Bayes to classify instances and apply it to the following instance: <Credit History = bad; Debt = Low; Income = 15 to 35>. Which class will be returned by Naïve Bayes?
CREDIT HISTORY   DEBT   INCOME     RISK
Bad              Low    0 to 15    high
Bad              High   15 to 35   high
Bad              Low    0 to 15    high
Unknown          High   15 to 35   high
Unknown          High   0 to 15    high
Good             High   0 to 15    high
Bad              Low    over35     moderate
Unknown          Low    15 to 35   moderate
Good             High   15 to 35   moderate
Unknown          Low    over35     low
Unknown          Low    over35     low
Good             Low    over35     low
Good             High   over35     low
Good             High   over35     low

4. Apply a statistics-based algorithm to obtain the actual probabilities of each event, and hence classify the new tuple <Adam, M, 1.95m> as Tall.

Person ID   Name       Gender   Height   Class
1           Kristina   Female   1.6m     Short
2           Jim        Male     2m       Tall
3           Maggie     Female   1.9m     Medium
4           Martha     Female   1.85m    Medium
5           John       Male     2.8m     Tall
6           Bob        Male     1.7m     Short
7           Clinton    Male     1.8m     Medium
8           Nyssa      Female   1.6m     Short
9           Kathy      Female   1.65m    Short
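For Questions 1 and 2, the ID3 steps (compute the entropy of the class labels, choose the attribute with the highest information gain, split on it, and recurse until every node is pure) can be checked with a short Python sketch. The helper names `entropy`, `info_gain`, `id3`, `rules` and `table` are hypothetical and exist only for this illustration; the rows are transcribed from the two tables in the questions, ties in gain would go to the first attribute listed, and no pruning is done.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attr, target):
    """Expected reduction in entropy obtained by splitting rows on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attrs, target):
    """Return a class label (leaf) or a nested dict {attr: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:  # pure node -> leaf
        return labels[0]
    if not attrs:  # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}


def rules(tree, path=()):
    """Turn every root-to-leaf path into one IF ... THEN ... rule."""
    if not isinstance(tree, dict):
        conds = " AND ".join(f"{a} = {v}" for a, v in path)
        return [f"IF {conds} THEN {tree}"]
    attr, branches = next(iter(tree.items()))
    out = []
    for value, subtree in branches.items():
        out.extend(rules(subtree, path + ((attr, value),)))
    return out


def table(columns, text):
    """Parse whitespace-separated rows into a list of dicts."""
    return [dict(zip(columns, line.split())) for line in text.strip().splitlines()]


Q1 = table(["age", "income", "student", "credit_rating", "buys_computer"], """
<=30   High   No  Fair      No
<=30   High   No  Excellent No
31..40 High   No  Fair      Yes
>40    Medium No  Fair      Yes
>40    Low    Yes Fair      Yes
>40    Low    Yes Excellent No
31..40 Low    Yes Excellent Yes
<=30   Medium No  Fair      No
<=30   Low    Yes Fair      Yes
>40    Medium Yes Fair      Yes
<=30   Medium Yes Excellent Yes
31..40 Medium No  Excellent Yes
31..40 High   Yes Fair      Yes
>40    Medium No  Excellent No
""")

Q2 = table(["outlook", "temperature", "humidity", "wind", "play"], """
Sunny    Hot  High   Weak   No
Sunny    Hot  High   Strong No
Overcast Hot  High   Weak   Yes
Rain     Mild High   Weak   Yes
Rain     Cool Normal Weak   Yes
Rain     Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny    Mild High   Weak   No
Sunny    Cool Normal Weak   Yes
Rain     Mild Normal Weak   Yes
Sunny    Mild Normal Strong Yes
Overcast Mild High   Strong Yes
Overcast Hot  Normal Weak   Yes
Rain     Mild High   Strong No
""")

TREE1 = id3(Q1, ["age", "income", "student", "credit_rating"], "buys_computer")
TREE2 = id3(Q2, ["outlook", "temperature", "humidity", "wind"], "play")
for rule in rules(TREE1) + rules(TREE2):
    print(rule)
```

Each printed line is one classification rule (one root-to-leaf path). For both datasets the first split is on the attribute with the largest gain, about 0.247 bits: age in Question 1 and outlook in Question 2.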
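For Question 3, the Naïve Bayes rule assigns an instance X = <x1, x2, x3> to the class C that maximises P(C) × P(x1 | C) × P(x2 | C) × P(x3 | C), assuming the attributes are conditionally independent given the class. The sketch below assumes the flattened RISK table reads row by row as transcribed in `DATA`, uses raw relative frequencies (no Laplace smoothing) and keeps exact fractions; `naive_bayes_scores` is a hypothetical helper name, and attribute values are written with the capitalisation used in the table.

```python
from collections import Counter
from fractions import Fraction

# (credit history, debt, income, risk), transcribed from the Question 3 table
DATA = [
    ("Bad", "Low", "0 to 15", "high"),
    ("Bad", "High", "15 to 35", "high"),
    ("Bad", "Low", "0 to 15", "high"),
    ("Unknown", "High", "15 to 35", "high"),
    ("Unknown", "High", "0 to 15", "high"),
    ("Good", "High", "0 to 15", "high"),
    ("Bad", "Low", "over35", "moderate"),
    ("Unknown", "Low", "15 to 35", "moderate"),
    ("Good", "High", "15 to 35", "moderate"),
    ("Unknown", "Low", "over35", "low"),
    ("Unknown", "Low", "over35", "low"),
    ("Good", "Low", "over35", "low"),
    ("Good", "High", "over35", "low"),
    ("Good", "High", "over35", "low"),
]


def naive_bayes_scores(data, x):
    """Return {class: P(C) * prod_k P(x_k | C)} from raw counts (no smoothing)."""
    class_counts = Counter(row[-1] for row in data)
    scores = {}
    for c, n_c in class_counts.items():
        score = Fraction(n_c, len(data))  # prior P(C)
        for k, value in enumerate(x):
            matches = sum(1 for row in data if row[-1] == c and row[k] == value)
            score *= Fraction(matches, n_c)  # likelihood P(x_k | C)
        scores[c] = score
    return scores


scores = naive_bayes_scores(DATA, ("Bad", "Low", "15 to 35"))
for c, s in scores.items():
    print(c, s)
print("predicted RISK:", max(scores, key=scores.get))
```

Because no smoothing is applied, a single zero count (here, no low-risk applicant has a bad credit history) forces that class's score to zero. The same function can be called with the first three values of the unseen tuple X.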
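For Question 4, one common way to obtain "actual probabilities" from such a small table is Bayes' rule with the continuous height attribute discretised into ranges, treating gender and height range as independent given the class. The ranges in `EDGES`, and the helper names `height_bin` and `posteriors`, are assumptions made for this sketch; the question does not specify them, and different ranges can change the result.

```python
from bisect import bisect_left
from collections import Counter
from fractions import Fraction

# (name, gender, height in metres, class), transcribed from the Question 4 table
PEOPLE = [
    ("Kristina", "F", 1.60, "Short"),
    ("Jim", "M", 2.00, "Tall"),
    ("Maggie", "F", 1.90, "Medium"),
    ("Martha", "F", 1.85, "Medium"),
    ("John", "M", 2.80, "Tall"),
    ("Bob", "M", 1.70, "Short"),
    ("Clinton", "M", 1.80, "Medium"),
    ("Nyssa", "F", 1.60, "Short"),
    ("Kathy", "F", 1.65, "Short"),
]

# ASSUMED right-closed height ranges (not given in the question):
# (-inf,1.6], (1.6,1.7], (1.7,1.8], (1.8,1.9], (1.9,2.0], (2.0,inf)
EDGES = [1.6, 1.7, 1.8, 1.9, 2.0]


def height_bin(h):
    """Index of the right-closed range that contains height h."""
    return bisect_left(EDGES, h)


def posteriors(gender, height):
    """P(class | gender, height range) via Bayes' rule with naive independence."""
    counts = Counter(p[3] for p in PEOPLE)
    joint = {}
    for c, n_c in counts.items():
        members = [p for p in PEOPLE if p[3] == c]
        p_gender = Fraction(sum(p[1] == gender for p in members), n_c)
        p_height = Fraction(sum(height_bin(p[2]) == height_bin(height) for p in members), n_c)
        joint[c] = Fraction(n_c, len(PEOPLE)) * p_gender * p_height
    evidence = sum(joint.values())  # P(X), the normalising constant
    return {c: j / evidence for c, j in joint.items()}


for c, p in posteriors("M", 1.95).items():
    print(c, p)
```

With these assumed ranges, only Jim (2m, Tall) falls in Adam's range (1.9m, 2.0m], so the Short and Medium scores are zero and the posterior probability of Tall is 1.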