Test 2, Data Mining, Spring 2016

1. (20 pts.) In running a regression tree model to predict salary in thousands of dollars from years of postgraduate education, I split the root node into (exactly) two child nodes, one with 200 people and predicted salary $30 thousand and a second with 800 people and predicted salary $50 thousand. Find, if possible from this, the following (put "NP" if not possible).
(A) The number of people ________ in the root node
(B) The average salary ________ in my training sample
(C) The misclassification rate ________ in the first child node
(D) What do we minimize in choosing a regression tree split point?

2. (20 pts.) In a city park 80% of visitors are men and 20% are women. I know that 30% of men litter and 40% of women litter. What proportion of visitors litter? ________ I see a piece of litter. What is the probability ________ that it was left by a woman?

3. (18 pts.) My odds of finding a parking place are 1 to 4 (odds = 1/4).
(A) What is my probability ________ of finding a parking place?
(B) The odds would double if I had a handicap placard. What would be my probability ________ of finding a parking place then?

4. (10 pts.) I want to find the X for which f(X) = X^3 + 3 - e^X is 0. I start with a guess that X = 0. What will X become ________ after taking one full Gauss-Newton step from X = 0?

5. (16 pts.) I use discriminant analysis to predict from which of 3 subpopulations a defendant in a trial comes. Each subpopulation contains 1/3 of the overall population and all have the same variance-covariance matrix. Using equal priors my overall error rate comes out to 12%; however, I find that using priors 10%, 60%, and 30% I get error rates of 40%, 1%, and 2% respectively for the three subpopulations.
(A) What is the overall error rate ________ for the model with unequal priors?
(B) Explain which model I should use in testimony and why.

6. (16 pts.) This function gives the probability of an event as a function of features X1 and X2.

f(X1, X2) = e^(-4 + 2 X1 - X2) / (1 + e^(-4 + 2 X1 - X2))

(A) Give the formula for the relationship between X2 and X1 that makes events and nonevents equally likely. X2 = ______________________________
(B) For X1 = 3 and X2 = 1, I observe an event. For X1 = 5 and X2 = 3, I observe a non-event. Is this pair of points concordant? Show, with numbers, how you came to this conclusion.

************** answers **************

1. (A) 200 + 800 = 1000 in root node and training sample.
(B) Total dollars 200($30 K) + 800($50 K) = $6000 K + $40000 K = $46000 K over 1000 people, so the average is $46 K.
(C) NP (a misclassification rate would require a decision tree with a categorical target).
(D) Criterion: minimize the error sum of squares (summed over both child nodes) or, equivalently, the average squared error (pooled across the 2 child nodes), a weighted average of estimated variances.

2. (.8)(.3) + (.2)(.4) = .24 + .08 = .32 = Pr{Litter} (32% of visitors litter).
Pr{Woman | Litter} = .08/.32 = 0.25 (the odds are 1 to 3).

3. (A) p/(1-p) = 1/4, so 4p = 1-p and p = 0.20.
(B) Odds are 2 to 4, so p/(1-p) = 2/4, 4p = 2-2p, and p = 1/3.

4. f(0) = 0 + 3 - e^0 = 3 - 1 = 2. f'(0) = 0 + 0 - e^0 = -1, so the change is f(0)/f'(0) = 2/(-1) = -2 and new X = old X - (-2) = 0 + 2 = 2.

5. (A)
Subpopulation    1      2      3
Proportion      .10    .60    .30
Misclassify     .40    .01    .02
Overall: .04 + .006 + .006 = .052 (< 0.12).
(B) Even though the results using priors that are not the true ones have a smaller misclassification rate, those results have nothing to do with reality or truth, so use equal priors.

6. (A) We recognize a logistic function here, so when the logit is 0 the probability is 1/2. -4 + 2 X1 - X2 = 0 implies X2 = 2 X1 - 4.
(B) With L = -4 + 2 X1 - X2:
X1   X2   L   Y
 3    1   1   1
 5    3   3   0
The second point has the larger L and so the larger p, but its Y is 0 => discordant. No need to compute the actual probabilities, as they are monotonically increasing in L.
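
The arithmetic in the answers above can be checked with a short Python sketch. This is only an illustration, not part of the original test: all variable and function names are made up for the example, the logit is taken to be -4 + 2 X1 - X2 (the form used in the answer to question 6), and the overall error rate in question 5 is weighted by the stated priors, as the answer key does.

```python
import math

# Q1: the root-node mean is the size-weighted mean of the child-node predictions.
n1, mean1 = 200, 30.0   # first child: count, predicted salary ($ thousands)
n2, mean2 = 800, 50.0   # second child
n_root = n1 + n2
avg_salary = (n1 * mean1 + n2 * mean2) / n_root

# Q2: law of total probability, then Bayes' rule.
p_men, p_women = 0.8, 0.2
p_litter_given_man, p_litter_given_woman = 0.3, 0.4
p_litter = p_men * p_litter_given_man + p_women * p_litter_given_woman
p_woman_given_litter = p_women * p_litter_given_woman / p_litter

# Q3: odds = p / (1 - p), so p = odds / (1 + odds).
def odds_to_prob(odds):
    return odds / (1.0 + odds)

p_park = odds_to_prob(1 / 4)
p_park_placard = odds_to_prob(2 * (1 / 4))  # doubled odds

# Q4: one Newton-type step, X_new = X_old - f(X_old) / f'(X_old).
def f(x):
    return x**3 + 3 - math.exp(x)

def f_prime(x):
    return 3 * x**2 - math.exp(x)

x_old = 0.0
x_new = x_old - f(x_old) / f_prime(x_old)

# Q5: overall error rate, weighted by the stated (unequal) priors
# exactly as in the answer key.
priors = [0.10, 0.60, 0.30]
error_rates = [0.40, 0.01, 0.02]
overall_error = sum(p * e for p, e in zip(priors, error_rates))

# Q6: logistic probability with the assumed logit L = -4 + 2*X1 - X2.
def logit(x1, x2):
    return -4 + 2 * x1 - x2

def prob(x1, x2):
    return 1.0 / (1.0 + math.exp(-logit(x1, x2)))

event, non_event = (3, 1), (5, 3)
# A pair is concordant when the observed event has the higher predicted probability.
concordant = prob(*event) > prob(*non_event)

print(n_root, avg_salary)
print(p_litter, p_woman_given_litter)
print(p_park, p_park_placard)
print(x_new)
print(overall_error)
print(logit(*event), logit(*non_event), concordant)
```

Comparing `prob` values rather than the logits is not strictly needed, since the probability is monotonically increasing in L; it is done here only to make the concordance check explicit.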