Test 2  St 590

1. (24 pts.) I have data on salaries of graduates as well as grade point average (GPA), IQ, and age at graduation. I used a regression tree because of the continuous response. One leaf has graduates with IQ between 110 and 115, age at graduation less than 21, and GPA between 3.4 and 3.8. There are 5 people in that leaf with salaries (in thousands) 80, 100, 105, 87, and 78.
(a) What is the regression tree predicted salary ______ for anyone falling into that leaf?
(b) Each leaf in a regression tree always has just one prediction even though a leaf typically contains many different responses (e.g. salaries). Why is that?
(c) Compute the error sum of squares within the given leaf. _________
(d) Someone asked me "what is the contribution of that leaf to the misclassification rate?" How should I respond?
(e) This leaf and another with error sum of squares 700 were the result of splitting a parent node. Find, if possible, an upper _____ and lower ____ bound for what the error sum of squares in the parent node might have been. Put NP if not possible.

2. (16 pts.) A person secretly tosses a fair coin (probability of heads = 0.50). If it comes up heads he must randomly pick a ball from a jar with 1 red and 4 blue balls in it, and if tails, he must pick from a jar with 3 red and 7 blue balls. He shows me the selected ball, which is blue.
(a) Does seeing blue increase or decrease my probability (from 0.50) that a head was tossed?
(b) What is my probability that the person tossed heads _______ now that I've seen the blue color?
(c) This new adjusted probability is referred to as a _______________ probability.
(d) Does this general type of probability adjustment occur in discriminant analysis? (yes, no)

3. (40 pts.) I have only 3 varieties (A, B, C) of wolves roaming Yellowstone Park. I find a wolf skull there and measure its height, width, depth, eye socket separation, and bone thickness because I have previously found these measurements for many skulls, each known to be from a specific one of the 3 varieties, and from these I've built Fisher linear discriminant functions.
(a) How many rows ___ and columns __ does the variance matrix have for variety B wolves?
(b) In words, what does the number in the second row, third column of the matrix give you? Be specific, assuming the data vector has the features in the order listed above.
(c) How many coefficients, including the constant, does each of my discriminant functions involve? ____
(d) Briefly, how does Mahalanobis distance differ from ordinary (Euclidean) distance?
(e) For the skull I found, the Fisher linear discriminant functions for varieties A, B, and C were -1.3, -0.9, and -2.0 respectively. Find if possible from this information (if not, put NP) the following:
  (i) The Mahalanobis distance _____ of my found skull from the variety A centroid (mean vector).
  (ii) The most likely variety _______ to have produced this skull.
  (iii) The variety ________ whose Mahalanobis distance from my found skull is the largest.
(f) From the given information, were the variances assumed to be the same (yes, no) for all 3 wolf varieties? How do you know?

4. (20 pts.) I have a population partitioned into 4 subpopulations. My four prior probabilities of membership are 0.4, 0.3, 0.2, and 0.1 for subpopulations 1, 2, 3, and 4 respectively. Using the priors and a large set of training data, I computed the corresponding 4 Fisher Linear Discriminant functions F1, F2, F3, and F4. I observed a vector Y of data for an individual and want to find the probabilities that the individual came from each of my four subpopulations, so I evaluated F1 = ln(3) = 1.10, F2 = ln(6) = 1.79, F3 = ln(4) = 1.39, and F4 = ln(7) = 1.95 for the observed data vector Y. Find if possible from this information (if not possible, put "NP"):
(a) The number of elements _____ in the data vector Y.
(b) The probability, given Y, _____ that my individual came from subpopulation 3.
(c) The subpopulation ____ from which the observed data vector Y is least likely to have come.
(d) For subpopulations 1 through 4 my percentages of misclassification are 10%, 40%, 20%, and 30%. Compute the overall misclassification rate ______ for this discriminant function.

Answers:

1. (a) The prediction is the average of 80, 100, 105, 87, and 78, which is 90.
(b) Because it is a leaf, the features do not allow any further distinction, so everyone falling into it must be predicted with the same number.
(c) The prediction errors are -10, 10, 15, -3, and -12, so the error sum of squares is 100 + 100 + 225 + 9 + 144 = 578.
(d) This is a regression tree and has nothing to do with misclassification.
(e) Splitting can never increase the error sum of squares, so the parent's error sum of squares is at least the sum for its two children: the lower bound is 578 + 700 = 1278. There is no upper bound.
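As a quick numerical check of the leaf calculations in answer 1, a minimal Python sketch follows; it uses only the five salary values given in the problem and the sibling leaf's error sum of squares of 700.

    # Leaf prediction, error sum of squares, and the bound on the parent's SSE.
    salaries = [80, 100, 105, 87, 78]
    prediction = sum(salaries) / len(salaries)        # leaf predicts the mean: 90.0
    errors = [s - prediction for s in salaries]       # -10, 10, 15, -3, -12
    sse_leaf = sum(e ** 2 for e in errors)            # 100 + 100 + 225 + 9 + 144 = 578.0
    sse_sibling = 700                                 # given for the other child leaf
    parent_lower_bound = sse_leaf + sse_sibling       # 1278.0; no upper bound exists
    print(prediction, sse_leaf, parent_lower_bound)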
2. (a) Increase (there is a slightly higher chance of blue when heads is tossed: 80% vs. 70%).
(b) Pr{blue|heads} = 0.8 and Pr{blue|tails} = 0.7, so Pr{blue} = 0.5(0.8) + 0.5(0.7) = 0.75 and Pr{heads|blue} = Pr{heads and blue}/Pr{blue} = 0.4/0.75 = 0.5333.
(c) This is a posterior probability.
(d) Yes. Computing posterior probabilities from prior probabilities is the whole point of discriminant analysis.

3. (a) 5 rows and 5 columns, one for each of the 5 features.
(b) The second row, third column of the 5 by 5 covariance matrix holds the covariance between a skull's width and depth.
(c) We have 5 features and hence 6 coefficients, including the constant.
(d) Mahalanobis distance is based on probability rather than raw geometry: it uses the inverse covariance matrix, so differences are scaled by the variances and adjusted for the correlations among the features, whereas Euclidean distance treats every feature alike.
(e) (i) NP. We can't compute the Mahalanobis distance from the discriminant scores without knowing Y'Σ⁻¹Y.
    (ii) Variety B is the most likely,* since it has the largest score (-0.9).
    (iii) Variety C has the largest generalized* Mahalanobis distance (it is the least likely, with the smallest score, -2.0).
(f) Yes. It is a linear discriminant function, so it assumes the same covariance matrix for all 3 varieties.
* Note: a linear function (in the data) has the form c0 + c1 Y1 + c2 Y2 + … + ck Yk. Recall that when the priors are unequal they change the intercept but nothing else; they are part of the Fisher Linear Discriminant Function because they differ with the population while leaving the form linear, as in our class demos. In other words, any effect of unequal priors is already accounted for in the function. If (iii) is read as asking about the Mahalanobis distance proper rather than the generalized one (which is literally what is asked), then without knowing the priors we cannot remove their contribution to the generalized Mahalanobis distance, and the answer would be NP.

4. (a) NP. The size of the data vector is unrelated to the number of subpopulations.
(b) Exponentiating the scores gives 3, 6, 4, and 7, so the probability is 4/(3 + 6 + 4 + 7) = 4/20 = 0.20.
(c) Subpopulation 1 is least likely (it has the smallest score).
(d) Weighting each misclassification percentage by its prior: error rate = (0.4)(0.10) + (0.3)(0.40) + (0.2)(0.20) + (0.1)(0.30) = 0.04 + 0.12 + 0.04 + 0.03 = 0.23, or 23%.
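The arithmetic behind answers 2 and 4 can be verified the same way; a minimal Python sketch, again using only the numbers stated in the problems (the normalization step in the middle is just the "exponentiate each score and divide by the total" calculation used in answer 4(b)):

    import math

    # Question 2: Bayes' rule for Pr{heads | blue}.
    p_heads = 0.5
    p_blue_given_heads = 4 / 5     # jar with 1 red and 4 blue balls
    p_blue_given_tails = 7 / 10    # jar with 3 red and 7 blue balls
    p_blue = p_heads * p_blue_given_heads + (1 - p_heads) * p_blue_given_tails  # 0.75
    posterior_heads = p_heads * p_blue_given_heads / p_blue                     # 0.5333...

    # Question 4(b): posterior probabilities from the discriminant scores F_i,
    # obtained by exponentiating each score and dividing by the total.
    scores = [math.log(3), math.log(6), math.log(4), math.log(7)]
    weights = [math.exp(f) for f in scores]           # 3, 6, 4, 7
    posteriors = [w / sum(weights) for w in weights]  # 0.15, 0.30, 0.20, 0.35

    # Question 4(d): overall misclassification rate is the prior-weighted average
    # of the per-subpopulation misclassification rates.
    priors = [0.4, 0.3, 0.2, 0.1]
    misclass_rates = [0.10, 0.40, 0.20, 0.30]
    overall_rate = sum(p * m for p, m in zip(priors, misclass_rates))  # 0.23
    print(posterior_heads, posteriors, overall_rate)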