MIS 214 Statistics II, 2014/2015 Summer
Homework

Due dates are indicated for each question.

(5 pt, due 03.08.2015) (Programming) Develop a Java application for the goodness-of-fit test for contingency tables. Develop a class named GodnessOfFit having a static method contingency, which takes an r by c two-dimensional array of integers; each element of the array is the number of observations in the ith level of variable R and the jth level of variable C (R and C are the row and column variables). The method is to compute the chi-square statistic and return it to the caller. Since it is hard to compute probabilities of the chi-square distribution, do not return p-values or the result of the test. In the test class, create a two-dimensional array representing the observations for the cells. Send the array as an argument to the method, then obtain the resulting chi-square statistic and print it to the screen. Send the source code as an Eclipse project and print the code as a hard copy.

Statistics Exercises: due 31.07.2015

Theoretical exercise (2 pt) YAAEGQ, Yet Another A.E.Gürsu Question: For a multiple regression with two explanatory variables, show that if there is no correlation between the explanatory variables X1 and X2 ($r_{X_1,X_2} = 0$, i.e. the sample correlation is zero), the coefficient of determination of the multiple regression ($R^2$) is the sum of the coefficients of determination of the simple regressions of Y on X1 ($R_1^2$) and of Y on X2 ($R_2^2$).

Hint: $R^2 = SSR/SST$, and SST is the same for all of these simple and multiple regressions. SSR of the multiple regression $= SSR_1 + SSR_2$, where $SSR_i$ is the SSR of simple regression i (i = 1, 2). Note that for $Y = b_0 + b_1 X_1 + b_2 X_2$ the slope coefficients are the same as the simple regression slopes for X1 and X2. In the multiple regression, for a single observation, $Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$, where the predicted value is $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$. Note that the sample regression passes through the center of mass $(\bar{X}_1, \bar{X}_2, \bar{Y})$; use this to eliminate the intercept term.
Make these substitutions and use the definition of SSR. You will see that when taking the square, some terms drop out because there is no correlation between the explanatory variables.

The following exercises are based on the modified version of the data described in the Appendix of Chapter 10 (HEI Cost Data Variables Subset). The file will be available soon.

Due 31.07.2015:
(1 pt) 10.55, 10.57
(1 pt) 11.97, 11.101
(2 pt) 12.115

Due 09.08.2015:
(1 pt) 15.47, 15.48
(1 pt) 15.73

For the application problems: computations have to be done in Excel. Copy the computation sheet to Word. Clearly state the results of the hypothesis tests.

(2 pt, due 09.08.2015) Data mining exercises:

1. (20 pts) For each of the following problems, identify the relevant data mining functionalities with a brief explanation.
a) A financial analyst is interested in whether the stock market index will be up or down on the coming day.
b) Cities in Turkey are grouped according to their voting characteristics after the election for President of the Republic.
c) A security specialist is interested in determining whether mail messages are spam or not by looking at the words appearing in the messages.
d) A medical doctor is interested in which symptoms (binary variables) occur together for a specific type of cancer.

2. Given a simple transactional database X:

TID   Items
T1    ABCD
T2    ACDF
T3    ACDEG
T4    ABDF
T5    BCG
T6    DFG
T7    ABG
T8    CDFG

Minimum support = 25%, minimum confidence = 60%. Find all frequent itemsets and strong association rules.

3. Problem 8.7, parts a) and b), on page 387 of Han, third edition.

4. Exercise on page 87 of MIS 214 DM Ch02.ppt
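For the programming question, the computation the contingency method has to perform can be sketched as follows. This is a minimal sketch, not a model answer: it assumes the usual Pearson statistic, the sum over all cells of (observed - expected)^2 / expected, with expected = (row total x column total) / grand total. The class and method names come from the question; the input checks, the sample table and the main method (which stands in for the separate test class the question asks for) are illustrative assumptions.

```java
final class GodnessOfFit {
    private GodnessOfFit() {}

    // Returns the Pearson chi-square statistic of an r by c table of counts.
    public static double contingency(int[][] observed) {
        int rows = observed.length;
        if (rows == 0 || observed[0].length == 0) {
            throw new IllegalArgumentException("table must be non-empty");
        }
        int cols = observed[0].length;

        long[] rowTotals = new long[rows];
        long[] colTotals = new long[cols];
        long grandTotal = 0;
        for (int i = 0; i < rows; i++) {
            if (observed[i].length != cols) {
                throw new IllegalArgumentException("table must be rectangular");
            }
            for (int j = 0; j < cols; j++) {
                rowTotals[i] += observed[i][j];
                colTotals[j] += observed[i][j];
                grandTotal += observed[i][j];
            }
        }

        double chiSquare = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // Expected count under independence of R and C.
                double expected = (double) rowTotals[i] * colTotals[j] / grandTotal;
                if (!(expected > 0.0)) {
                    // Assumption: an all-zero row or column is treated as invalid input.
                    throw new IllegalArgumentException("zero expected count in cell " + i + "," + j);
                }
                double diff = observed[i][j] - expected;
                chiSquare += diff * diff / expected;
            }
        }
        return chiSquare;
    }

    // Illustrative driver with made-up counts.
    public static void main(String[] args) {
        int[][] observed = {{10, 20}, {30, 40}};
        System.out.println("Chi-square statistic: " + contingency(observed));
    }
}
```

For the made-up 2 by 2 table above, the expected counts are 12, 18, 28 and 42, so the statistic is 4/12 + 4/18 + 4/28 + 4/42 = 200/252, roughly 0.794.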
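For data mining exercise 2, the two measures involved can be checked with a small helper. The sketch below assumes the standard definitions: the support of an itemset is the fraction of transactions containing all of its items, and the confidence of a rule X => Y is support(X and Y together) / support(X). It also assumes that transactions T3 and T4 contain the items A, C, D, E, G and A, B, D, F respectively. The class and method names are hypothetical, and the code only evaluates one itemset or rule at a time; it does not enumerate all frequent itemsets.

```java
import java.util.Arrays;
import java.util.List;

final class AssociationMeasures {
    private AssociationMeasures() {}

    // Transactions T1..T8, one string per transaction, one character per item.
    static final List<String> DB = Arrays.asList(
            "ABCD", "ACDF", "ACDEG", "ABDF", "BCG", "DFG", "ABG", "CDFG");

    // Fraction of transactions that contain every item of the itemset.
    public static double support(List<String> transactions, String itemset) {
        int count = 0;
        for (String transaction : transactions) {
            boolean containsAll = true;
            for (char item : itemset.toCharArray()) {
                if (transaction.indexOf(item) < 0) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                count++;
            }
        }
        return (double) count / transactions.size();
    }

    // Confidence of the rule lhs => rhs; NaN if lhs never occurs.
    public static double confidence(List<String> transactions, String lhs, String rhs) {
        return support(transactions, lhs + rhs) / support(transactions, lhs);
    }

    public static void main(String[] args) {
        System.out.println("support(AB) = " + support(DB, "AB"));
        System.out.println("confidence(A => B) = " + confidence(DB, "A", "B"));
    }
}
```

Under these assumptions, {A, B} appears in T1, T4 and T7, so its support is 3/8 = 37.5%, and A appears in five transactions, so the confidence of A => B is 3/5 = 60%.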