A former data mining student, Adam Morris, supplied this information on 7 types of bats. As you may know, bats are flying mammals that are active at night. The challenge is to identify the species of a bat from its calls. As you may also know, bats navigate by sending out calls and listening for their echoes. Some information on this “echolocation” process is given at http://animals.howstuffworks.com/mammals/bat2.htm

The calls of several bats of each of the 7 types have been characterized by a set of features, as described below. In Adam’s e-mail, the data are described as follows:

The target variable is species (there are 7 species).

Labo = Eastern red bat
Nyhu = Evening bat
Pisu = Tricolored bat
Epfu = Big brown bat
hoary = Hoary bat
Myau = Northern Long-eared bat
Tabr = LeConte Free-tail bat

There are 11 continuous variables (features), which are parameters measured from sonograms of the echolocation recordings. The species identifications were made by comparing each sonogram with known-species reference calls (by eye). Since this is incredibly tedious, a quantitative sorting algorithm would be quite useful. (Note the large number of bats for which this was done.)

Measured Call Characteristics:

dur = duration
pre = preceding interval
highf = high frequency
lowf = low frequency
band = bandwidth
fmaxamp = frequency of maximum amplitude
maxamp = maximum amplitude (% duration)
slope = overall slope
heel = location of heel if present
upper = upper slope (if heel is present)
lower = lower slope (if heel is present)

Click on the link to get the SAS program that reads in the data and gets you started on part 2. Note that it assumes proportional priors (i.e., a representative sample). You do not need to organize a nice report this time, but please use complete sentences/paragraphs to answer these questions.

The tasks are:

(1) Describe the data: What are the counts and percentages of the 7 species of bat?
Assuming this is a representative sample, which species are the most common and the rarest?

(2) Run a Fisher Linear Discriminant function for identifying the species of bat.

(A) For the Fisher Linear Discriminant function, what assumptions are made about the seven covariance matrices?

(B) How many rows and columns does each of these seven covariance matrices have?

(C) Besides the intercept, how many coefficients does each discriminant function involve? Would this answer change if there were more features? Would it change if there were more species?

(D) Why are the discriminant numbers (Fj in our notes) different for different individual bats? Is it the coefficients, the features, or both?

(E) Suppose (for simplicity) that a bat’s discriminant functions were F1=2 for comparing to Labo and Fj=1 for j=2,3,…,7 for comparing to each of the other 6 species. What is the (posterior) probability that this bat is a Labo bat (Eastern red bat)?

(F) Suppose (again for simplicity) that we have a bat whose sonogram trace has highf=10 and all other features equal to 0. From your Fisher Linear Discriminant function output, find the discriminant number (Fj in the notes) for comparing this bat to each of the seven species’ distributions.

(G) Using your linear discriminant function, how many Epfu bats were mistakenly classified as Labo, and how many Labo bats were mistakenly classified as Epfu?

(H) How would you change your code to force PROC DISCRIM to run a quadratic discriminant function? Under what conditions would you prefer quadratic to linear? (Note the relationship between this question and question 2A.)

(I) Test whether a quadratic discriminant function is needed by changing your SAS code appropriately. Report the result. Then run a quadratic discriminant function and compare its misclassification rate to that of the linear discriminant function by showing both rates.
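
As a reminder of the idea behind part (2E), here is a minimal sketch in Python (not part of the SAS program) of how discriminant numbers can be turned into posterior probabilities. It assumes each Fj already includes the log-prior term, so that the posterior for class j is exp(Fj) divided by the sum of exp(Fk) over all classes; check this against the definition in the notes. The function name and the three scores are made up for illustration and are not taken from the bat data.

```python
import math


def posterior_probabilities(scores):
    """Convert discriminant scores F_j into posterior probabilities.

    Assumption: each score already contains the log-prior term, so
    P(class j | x) = exp(F_j) / sum_k exp(F_k).
    """
    largest = max(scores)
    # Subtracting the largest score leaves the ratios unchanged
    # but avoids overflow when the scores are large.
    weights = [math.exp(s - largest) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical scores for three made-up classes (not the bat data).
example_scores = [3.0, 1.0, 1.0]
example_posteriors = posterior_probabilities(example_scores)
print([round(p, 3) for p in example_posteriors])
```

The same function applies to any number of classes, so the seven scores in part (2E) can be passed in as a list of length seven.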