MIS 214 Statistics II
2014/2015 Summer
Due dates are indicated for each question
(5 pt, due 03.08.2015) (Programming) Develop a Java application for the goodness-of-fit
test for contingency tables. Develop a class named GodnessOfFit with a static method
contingency that takes an r-by-c two-dimensional array of integers. Each element of the
array is the number of observations in the ith level of variable R and the jth level of
variable C (R and C are the row and column variables). The method is to compute the
chi-square statistic and return it to the caller. Since it is hard to compute probabilities
of the chi-square distribution, do not return p-values or the result of the test.
In the test class, create a two-dimensional array representing the observations for the
cells. Pass the array as an argument to the method, then obtain the resulting chi-square
statistic and print it to the screen.
Send the source code as an Eclipse project and print the code as a hard copy.
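
The computation described above can be sketched as follows. This is a minimal sketch, not a reference solution: it assumes the usual expected count under independence, E(i,j) = (row i total)(column j total)/N, and sums (O - E)^2 / E over all cells. The class and method names follow the assignment text; the input checks and the sample table in main are illustrative assumptions.

```java
// Sketch only: names follow the assignment text; the sample table in main
// and the argument checks are assumptions made for illustration.
class GodnessOfFit {

    /**
     * Returns the chi-square statistic of an r-by-c table of observed counts.
     * Expected count of cell (i, j): rowTotal[i] * colTotal[j] / n.
     */
    static double contingency(int[][] observed) {
        if (observed == null || observed.length == 0 || observed[0].length == 0) {
            throw new IllegalArgumentException("table needs at least one row and one column");
        }
        int rows = observed.length;
        int cols = observed[0].length;
        long[] rowTotal = new long[rows];
        long[] colTotal = new long[cols];
        long n = 0;

        // First pass: row totals, column totals and the grand total.
        for (int i = 0; i < rows; i++) {
            if (observed[i].length != cols) {
                throw new IllegalArgumentException("all rows must have the same length");
            }
            for (int j = 0; j < cols; j++) {
                if (observed[i][j] < 0) {
                    throw new IllegalArgumentException("counts must be non-negative");
                }
                rowTotal[i] += observed[i][j];
                colTotal[j] += observed[i][j];
                n += observed[i][j];
            }
        }
        if (n == 0) {
            throw new IllegalArgumentException("table contains no observations");
        }

        // Second pass: accumulate (O - E)^2 / E over all cells.
        double chiSquare = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double expected = (double) rowTotal[i] * colTotal[j] / n;
                if (expected == 0.0) {
                    throw new IllegalArgumentException("an empty row or column gives a zero expected count");
                }
                double diff = observed[i][j] - expected;
                chiSquare += diff * diff / expected;
            }
        }
        return chiSquare;
    }

    // Stands in for the separate test class to keep the sketch in one file.
    public static void main(String[] args) {
        int[][] observed = { {10, 20}, {20, 10} };  // made-up 2-by-2 table
        System.out.println("Chi-square statistic: " + contingency(observed));
    }
}
```

For the made-up table every expected count is 15, so the statistic is 4 x 25/15, about 6.67. A table whose rows are proportional, such as {10, 20} and {20, 40}, gives 0.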
Statistics Exercises:
due 31.07.2015
Theoretical exercise
(2 pt) YAAEGQ, Yet Another A.E.Gürsu Question:
For a multiple regression with two explanatory variables, show that if there is no
correlation between the explanatory variables X1 and X2 (r_x1,x2 = 0, i.e. the sample
correlation is zero), the coefficient of determination of the multiple regression (R^2) is
the sum of the coefficients of determination of the simple regressions of Y on X1 (R1^2)
and of Y on X2 (R2^2).
R^2 = SSR/SST, and SST is the same for all of these simple and multiple regressions.
SSR of the multiple regression = SSR1 + SSR2, where SSRi is the SSR of the simple
regression of Y on Xi (i = 1, 2).
Note that for Y = b0 + b1X1 + b2X2, the slope coefficients are the same as the simple
regression slopes for X1 and X2.
In the multiple regression, for a single observation:
Yi - Y_bar = (Yi - Yi_pr) + (Yi_pr - Y_bar)
Yi_pr = b0 + b1X1i + b2X2i
Note that the sample regression passes through the center of mass (the point of sample
means); use this to eliminate the intercept term.
Make these substitutions and use the definition of SSR.
You will see that when taking the square, some terms drop out because there is no
correlation between the explanatory variables.
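
Before attempting the proof, the claim can be checked numerically. The sketch below is only an illustration under stated assumptions: the class name, the helper names and all data values are hypothetical, with X1 and X2 chosen so that their sample correlation is exactly zero. It fits the multiple regression by solving the two normal equations on centered data and compares its R^2 with the sum of the two simple-regression values.

```java
// Sketch only: class name, helper names and all data values are hypothetical.
class RSquaredAdditivity {

    static double mean(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x;
        return sum / v.length;
    }

    // Centered sum of cross products: sum over i of (a_i - a_bar)(b_i - b_bar).
    static double centeredSum(double[] a, double[] b) {
        double meanA = mean(a), meanB = mean(b), sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - meanA) * (b[i] - meanB);
        return sum;
    }

    // R^2 of the simple regression of y on x: SSR / SST with SSR = Sxy^2 / Sxx.
    static double rSquaredSimple(double[] x, double[] y) {
        double sxy = centeredSum(x, y);
        return sxy * sxy / centeredSum(x, x) / centeredSum(y, y);
    }

    // R^2 of the regression of y on x1 and x2 with an intercept.
    // The slopes solve the 2x2 normal equations on centered data,
    // and SSR = b1 * S1y + b2 * S2y.
    static double rSquaredMultiple(double[] x1, double[] x2, double[] y) {
        double s11 = centeredSum(x1, x1);
        double s22 = centeredSum(x2, x2);
        double s12 = centeredSum(x1, x2);
        double s1y = centeredSum(x1, y);
        double s2y = centeredSum(x2, y);
        double det = s11 * s22 - s12 * s12;
        double b1 = (s22 * s1y - s12 * s2y) / det;
        double b2 = (s11 * s2y - s12 * s1y) / det;
        return (b1 * s1y + b2 * s2y) / centeredSum(y, y);
    }

    public static void main(String[] args) {
        double[] x1 = {-1, -1, 1, 1};
        double[] x2 = {-1, 1, -1, 1};  // sample correlation with x1 is exactly zero
        double[] y = {1, 3, 2, 6};
        double multiple = rSquaredMultiple(x1, x2, y);
        double sumOfSimple = rSquaredSimple(x1, y) + rSquaredSimple(x2, y);
        System.out.println("R^2 of multiple regression: " + multiple);
        System.out.println("R1^2 + R2^2:                " + sumOfSimple);
    }
}
```

With these values both quantities equal 13/14 (4/14 from X1 plus 9/14 from X2). Replacing X2 with a variable that is correlated with X1 breaks the equality, which is why the zero-correlation condition matters.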
The following exercises are based on the modified version of the data described in the
Appendix of Chapter 10
due 31.07.2015
HEI Cost Data Variables Subset. The file will be available soon.
(1 pt) 10.55, 10.57
(1 pt) 11.97, 11.101
(2 pt) 12.115
The following are due 09.08.2015
(1 pt) 15.47, 15.48
(1 pt) 15.73
For the application problems:
Computations have to be done in Excel.
Copy the computation sheet to Word.
Clearly state the results of the hypothesis tests.
(2 pt, due 09.08.2015) Data mining exercises:
1. (20 pts) For each of the following problems, identify the relevant data mining
functionalities with a brief explanation
a) A financial analyst is interested in whether the stock market index will be up or
down on the coming day.
b) Cities in Turkey are grouped according to their voting characteristics after the
election for President of the Republic.
c) A security specialist is interested in determining whether mail messages are spam or
not by looking at the words appearing in the messages.
d) A medical doctor is interested in which symptoms (binary variables) occur together
for a specific type of cancer.
2. Given a simple transactional database X
Minimum support = 25%, Minimum confidence = 60%
Find all frequent itemsets and strong association rules
3. Problem 8.7, parts a) and b), on page 387, Han, third edition
4. Exercise on page 87 of MIS 214 DM Ch02.ppt
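
For question 2, the two measures behind the thresholds can be sketched as below. The database X itself is not reproduced in this text, so the four transactions in the sketch are hypothetical and are not the exercise data. The class and helper names are also assumptions. The sketch uses the standard definitions: support(S) is the fraction of transactions containing every item of S, and confidence(A -> B) is support(A union B) / support(A).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: class name, helpers and the sample transactions are hypothetical.
class AssociationSketch {

    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Hypothetical four-transaction database; NOT the database X of the exercise.
    static List<Set<String>> sampleDatabase() {
        List<Set<String>> db = new ArrayList<>();
        db.add(setOf("A", "B", "C"));
        db.add(setOf("A", "B"));
        db.add(setOf("A", "D"));
        db.add(setOf("B", "C"));
        return db;
    }

    // support(S): fraction of transactions that contain every item of S.
    static double support(List<Set<String>> db, Set<String> itemset) {
        int count = 0;
        for (Set<String> transaction : db) {
            if (transaction.containsAll(itemset)) count++;
        }
        return (double) count / db.size();
    }

    // confidence(A -> B) = support(A union B) / support(A).
    static double confidence(List<Set<String>> db, Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return support(db, union) / support(db, a);
    }

    public static void main(String[] args) {
        double minSupport = 0.25;     // thresholds taken from question 2
        double minConfidence = 0.60;
        List<Set<String>> db = sampleDatabase();

        double sup = support(db, setOf("A", "B"));
        double conf = confidence(db, setOf("A"), setOf("B"));
        boolean frequent = sup >= minSupport;
        boolean strong = frequent && conf >= minConfidence;

        System.out.println("support({A,B}) = " + sup + ", frequent: " + frequent);
        System.out.println("confidence(A -> B) = " + conf + ", strong: " + strong);
    }
}
```

In the hypothetical data, {A, B} appears in 2 of 4 transactions (support 0.5) and A appears in 3, so the confidence of A -> B is 2/3. Both values pass the 25% and 60% thresholds. A full answer would repeat the check for every candidate itemset and rule.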