090 Some Applications in Data Analysis

Some Applications of Statistical Methods in Data Analysis
Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman,
Former Director,
Centre for Real Estate Studies,
Universiti Teknologi Malaysia.
Forms of “statistical” relationship
• Correlation
• Contingency
• Cause-and-effect
  * Causal
  * Feedback
  * Multi-directional
  * Recursive
• The last two categories are normally dealt with through regression.
Statistical Data Analysis Methods – A Summary

Scale of measurement | One-sample: independent sample | One-sample: single-treatment repeated measures | One-sample: multiple-treatment repeated measures | Two independent samples | K independent samples | Measures of association
Nominal | Binomial test; one-way contingency table | McNemar test | Cochran Q test | Two-way contingency table | Contingency table | Contingency coefficients
Ordinal | Runs test | Wilcoxon signed rank test | Friedman test | Mann-Whitney test | Kruskal-Wallis test | Spearman rank correlation
Interval/ratio | Z- or t-test; test of variance | Paired t-test | Repeated measures ANOVA | Unpaired t-test; tests of variance | ANOVA | Regression, Pearson correlation, time series
One-Sample Test
• McNemar Test: tests for change in a sample upon a “treatment”.
• Example. Two condominium projects, K and L. Respondents decide their preferences for K or L before and after “advertising”. 40 people switch from K to L and 50 people switch from L to K.
• Hypothesis: Advertising does not influence buyers to change their mind on product choice.

Before \ After | Project L | Project K
Project K | A = 40 | B = 60
Project L | C = 30 | D = 50
One-Sample Test (contd.)
Test statistic:
$$Q = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$
Only the two cells that record a change, A and D, enter the sum. Under Ho each has expected frequency E = (A + D)/2, so
$$Q = \frac{[A-(A+D)/2]^2}{(A+D)/2} + \frac{[D-(A+D)/2]^2}{(A+D)/2} = \frac{(A-D)^2}{A+D}$$
Thus, $Q = (40-50)^2/(40+50) = 100/90 = 1.11$
$\chi^2_{(2-1)(2-1);\,0.05} = 3.84$
Therefore, Ho not rejected (1.11 < 3.84). No influence of advertising on choice of project.
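The McNemar calculation above can be checked with a short Python sketch. It assumes only the two “switch” counts A and D are supplied. The p-value is an addition not on the slide: with one degree of freedom, P(χ² > Q) equals erfc(√(Q/2)), so the standard library is enough. The function name `mcnemar` is our own.

```python
import math


def mcnemar(a, d):
    """Return McNemar's Q = (A - D)^2 / (A + D) and its chi-square p-value (1 d.f.).

    a: number who switched one way (here, K before -> L after)
    d: number who switched the other way (here, L before -> K after)
    """
    if a + d == 0:
        raise ValueError("no respondents changed their choice")
    q = (a - d) ** 2 / (a + d)
    # With 1 d.f., P(chi-square > q) = P(|Z| > sqrt(q)) = erfc(sqrt(q / 2))
    p_value = math.erfc(math.sqrt(q / 2))
    return q, p_value


q, p_value = mcnemar(40, 50)
print(f"Q = {q:.2f}")  # Q = 1.11
print(f"p = {p_value:.3f}")
print("reject Ho" if q > 3.84 else "Ho not rejected")  # Ho not rejected
```

Comparing Q with the tabulated 3.84 and comparing the p-value with 0.05 are equivalent decisions here.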
One-Sample Test (contd.)
• Friedman Test: tests for equal preferences for something of various characteristics.
• Example. Buyers’ rank of preference for three condominium types A, B, C.
• Hypothesis: Buyers’ preferences for all condo types do not differ.

Resp. | Type A | Type B | Type C
Man | 2 | 3 | 1
Min | 1 | 2 | 3
Lee | 1 | 3 | 2
Ling | 3 | 1 | 2
Dass | 1 | 2 | 3
Total | 8 | 11 | 11
One-Sample Test (contd.)
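Test statistic:
$$F_r = \frac{12}{nk(k+1)}\sum_{j=1}^{k} R_j^2 - 3n(k+1)$$
where n = sample size (number of respondents), k = number of categories, and R_j = total of the ranks in column j.
For large n and k, F_r approximately follows a χ² distribution with k − 1 degrees of freedom, so the critical value is $\chi^2_{(k-1);\,\alpha}$.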
One-Sample Test (contd.)

$$F_r = \frac{12}{5\times 3\,(3+1)}\left[8^2 + 11^2 + 11^2\right] - 3\times 5\,(3+1) = 1.2$$
$\chi^2_{(3-1);\,0.05} = 5.99$
Ho not rejected (1.2 < 5.99). Buyers do not show different preferences for condo type.
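A minimal Python sketch of the Friedman calculation, assuming each respondent ranks all k types from 1 to k with no ties (true for the table above). It uses the standard constant 12 in the statistic; the function name `friedman` is our own.

```python
def friedman(ranks):
    """Return the column rank totals R_j and Fr = 12/(n k (k+1)) * sum(R_j^2) - 3 n (k+1).

    ranks: one row per respondent, each row ranking the k categories 1..k.
    """
    n = len(ranks)
    k = len(ranks[0])
    col_totals = [sum(row[j] for row in ranks) for j in range(k)]
    fr = 12 / (n * k * (k + 1)) * sum(r * r for r in col_totals) - 3 * n * (k + 1)
    return col_totals, fr


# Ranks for condo types A, B, C given by Man, Min, Lee, Ling and Dass
ranks = [
    [2, 3, 1],
    [1, 2, 3],
    [1, 3, 2],
    [3, 1, 2],
    [1, 2, 3],
]
totals, fr = friedman(ranks)
print(totals)  # [8, 11, 11]
print(round(fr, 2))  # 1.2
print("reject Ho" if fr > 5.99 else "Ho not rejected")  # Ho not rejected
```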
One-Sample Test (contd.)
• Repeated measures ANOVA: tests the outcome of a phenomenon under different conditions.
• Example. Waiting time at junctions in the city area, to determine the level of congestion at different times of the day.
• Test statistic:
$$F = \frac{t/(m-1)}{r/[(n-1)(m-1)]}$$
where t = sum of squares due to treatment, r = residual sum of squares, m = number of treatments, n = number of observations (here, junctions).
• Critical region based on $F_{v_1,\,v_2;\,\alpha}$, where $v_1 = m-1$ and $v_2 = (n-1)(m-1)$.
One-Sample Test (contd.)
Waiting time at junction (min.)

Junction | Morning | Noon | Evening | Row mean | Sum sq. about row mean (Wi)
Junction 1 | 4.00 | 5.00 | 6.00 | 5.00 | 2.00
Junction 2 | 5.00 | 6.00 | 6.00 | 5.67 | 0.67
Junction 3 | 6.00 | 7.00 | 8.00 | 7.00 | 2.00
Junction 4 | 5.00 | 8.00 | 6.00 | 6.33 | 4.67
Junction 5 | 5.00 | 4.00 | 9.00 | 6.00 | 14.00
Column mean | 5.00 | 6.00 | 7.00 | M = 6.00 | W = 23.34
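One-Sample Test (contd.)
With $c_{ij}$ = waiting time at junction i ($i = 1,\dots,n$) in period j ($j = 1,\dots,m$), $\bar{c}_{i\cdot}$ = row (junction) mean, $\bar{c}_{\cdot j}$ = column (period) mean and M = grand mean:
$$T = \sum_{i=1}^{n}\sum_{j=1}^{m}(c_{ij}-M)^2 = 30$$
$$W = \sum_{i=1}^{n} W_i, \quad W_i = \sum_{j=1}^{m}(c_{ij}-\bar{c}_{i\cdot})^2, \quad W = 23.34$$
$$B = m\sum_{i=1}^{n}(\bar{c}_{i\cdot}-M)^2 = 6.65$$
$$t = n\sum_{j=1}^{m}(\bar{c}_{\cdot j}-M)^2 = 10$$
$$W = t + r \;\Rightarrow\; r = W - t = 23.34 - 10 = 13.34$$
The decomposition satisfies T = B + W (30 ≈ 6.65 + 23.34, within rounding).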
One-Sample Test (contd.)

$$F_c = \frac{10/(3-1)}{13.34/[(5-1)(3-1)]} = 3.00$$
$F_t = F_{(3-1),\,(3-1)(5-1);\,0.05} = 4.46$
Ho not rejected ($F_c < F_t$). Congestion does not differ significantly between morning, noon and evening.
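The sums of squares above can be reproduced with a plain-Python sketch of one-way repeated-measures ANOVA, assuming rows are junctions (subjects) and columns are the three periods (treatments). Working without intermediate rounding gives W = 23.33 and r = 13.33 rather than 23.34 and 13.34, and F = 3.00; the decision is the same. The function name is our own.

```python
def repeated_measures_anova(data):
    """One-way repeated-measures ANOVA on an n x m table.

    Rows are subjects (here, junctions); columns are treatments (here,
    morning, noon, evening). Returns T, W, B, t, r and
    F = [t/(m-1)] / [r/((n-1)(m-1))].
    """
    n, m = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * m)              # M
    row_means = [sum(row) / m for row in data]                        # junction means
    col_means = [sum(row[j] for row in data) / n for j in range(m)]   # period means

    T = sum((x - grand_mean) ** 2 for row in data for x in row)
    W = sum((x - rm) ** 2 for row, rm in zip(data, row_means) for x in row)
    B = m * sum((rm - grand_mean) ** 2 for rm in row_means)
    t = n * sum((cm - grand_mean) ** 2 for cm in col_means)
    r = W - t                                                         # since W = t + r
    F = (t / (m - 1)) / (r / ((n - 1) * (m - 1)))
    return {"T": T, "W": W, "B": B, "t": t, "r": r, "F": F}


waits = [
    [4, 5, 6],  # Junction 1
    [5, 6, 6],  # Junction 2
    [6, 7, 8],  # Junction 3
    [5, 8, 6],  # Junction 4
    [5, 4, 9],  # Junction 5
]
res = repeated_measures_anova(waits)
for key, value in res.items():
    print(f"{key} = {value:.2f}")
print("reject Ho" if res["F"] > 4.46 else "Ho not rejected")
```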
Two-Sample Test
• Two-way Contingency Table: tests whether two independent groups differ on a given characteristic.
• Hypothesis: Choice of house type is not related to location.
• Test:
$$Q = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$

Type of house | Inner suburbs | Outer suburbs | Total (R)
Terraced | 50 | 75 | 125
Semidetached | 30 | 25 | 55
Total (C) | 80 | 100 | 180
Two-Sample Test (contd.)
• D.o.f. = (r − 1)(c − 1), where r = number of rows, c = number of columns
• $E_{ij} = R_i C_j / N$

Expected frequency | Inner suburbs | Outer suburbs
Terraced | 125 × 80/180 = 55.6 | 125 × 100/180 = 69.4
Semidetached | 55 × 80/180 = 24.4 | 55 × 100/180 = 30.6

$$Q = \frac{(50-55.6)^2}{55.6} + \frac{(30-24.4)^2}{24.4} + \frac{(75-69.4)^2}{69.4} + \frac{(25-30.6)^2}{30.6} = 3.33$$
(3.27 if the expected frequencies are not rounded)
$\chi^2_{(2-1)(2-1);\,0.05} = 3.84$
Ho not rejected (3.33 < 3.84)
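The same Q can be computed for any r × c table. This Python sketch builds $E_{ij} = R_i C_j / N$ directly from the observed counts without rounding, which is why it returns 3.27 rather than the 3.33 obtained from one-decimal expected values; the conclusion is unchanged. The function name is our own.

```python
def chi_square_contingency(table):
    """Pearson chi-square Q = sum (O - E)^2 / E with E_ij = R_i * C_j / N.

    table: list of rows of observed counts. Returns (Q, degrees of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    q = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            q += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return q, dof


# Rows: terraced, semidetached; columns: inner suburbs, outer suburbs
q, dof = chi_square_contingency([[50, 75], [30, 25]])
print(round(q, 2), dof)  # 3.27 1
print("reject Ho" if q > 3.84 else "Ho not rejected")  # Ho not rejected
```

The function also accepts larger tables such as the 5 × 4 table in Q7 (12 degrees of freedom). The chi-square approximation is usually regarded as unreliable when several expected frequencies fall below 5, as happens in the lower rows of that table.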
K Independent Test - Correlation
• “Co-exist”. E.g.:
  * left shoe & right shoe, sleep & lying down, food & drink
• Indicates “some” co-existence relationship. E.g.:
  * Linearly associated (-ve or +ve)
  * Co-dependent, independent
• But correlation has nothing to do with a cause-and-effect (C-A-E) relationship!
Example: After a field survey, you have the following data on the distance to work and distance to the city of residents in the J.B. area. Interpret the results.
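For linearly associated interval/ratio data, the summary table lists Pearson correlation. Assuming that is the coefficient meant here, this Python sketch computes r = Sxy / √(Sxx · Syy), which always lies between −1 and +1. The inputs in the usage lines are toy values chosen to give exact answers, not the J.B. survey data; the function name is our own.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```

A value near ±1 indicates a strong linear association only; as the slide stresses, it says nothing about cause and effect.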
K Independent Test - Correlation and regression – matrix approach
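Assuming the matrix approach refers to ordinary least squares written as the normal equations b = (X′X)⁻¹X′y, where X has a column of ones and a column of the explanatory variable, a simple regression needs only a 2 × 2 inverse. The function name `simple_ols_matrix` is hypothetical, and the toy data lie exactly on y = 1 + 2x so the result is easy to check.

```python
def simple_ols_matrix(x, y):
    """Fit y = b0 + b1*x via the normal equations b = (X'X)^-1 X'y.

    X has a column of ones and a column of x values, so X'X is 2 x 2.
    """
    n = len(x)
    # X'X = [[n, sum x], [sum x, sum x^2]];  X'y = [sum y, sum x*y]
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X'X is singular (are all x values equal?)")
    # Inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det; multiply by X'y
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (-sx * sy + n * sxy) / det
    return b0, b1


print(simple_ols_matrix([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```

With more explanatory variables the same formula applies, but a linear-algebra library is the practical way to form and solve X′X.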
Test yourselves!
Q1: Calculate the mean and standard deviation of the following data:
PRICE (RM ’000) | 130 | 137 | 128 | 390 | 140 | 241 | 342 | 143
SQ. M OF FLOOR | 135 | 140 | 100 | 360 | 175 | 270 | 200 | 170
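To check your Q1 working, this sketch uses Python’s standard-library `statistics` module. It assumes “std.” means standard deviation and reports both the sample version (divisor n − 1) and the population version (divisor n), since the question does not say which is intended.

```python
import statistics

price = [130, 137, 128, 390, 140, 241, 342, 143]       # RM '000
floor_area = [135, 140, 100, 360, 175, 270, 200, 170]  # sq. m

for name, data in [("Price", price), ("Floor area", floor_area)]:
    print(
        name,
        statistics.mean(data),
        round(statistics.stdev(data), 2),   # sample SD (divisor n - 1)
        round(statistics.pstdev(data), 2),  # population SD (divisor n)
    )
```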
Q2: Calculate the mean price of the following low-cost houses in various localities across the country:
PRICE (RM ’000) (x) | NO. OF LOCALITIES (f)
36 | 3
37 | 14
38 | 10
39 | 36
40 | 73
41 | 27
42 | 20
43 | 17
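For grouped data like Q2 the mean is frequency-weighted, x̄ = Σfx / Σf. A minimal sketch to check your answer; the function name `grouped_mean` is our own.

```python
def grouped_mean(values, freqs):
    """Frequency-weighted mean: sum(f * x) / sum(f)."""
    total_f = sum(freqs)
    if total_f == 0:
        raise ValueError("frequencies sum to zero")
    return sum(f * x for x, f in zip(values, freqs)) / total_f


price_levels = [36, 37, 38, 39, 40, 41, 42, 43]  # RM '000
localities = [3, 14, 10, 36, 73, 27, 20, 17]     # number of localities at each price
print(round(grouped_mean(price_levels, localities), 2))
```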
Test yourselves!
Q3: From sample information, a population of housing estates is believed to have a “normal” distribution, X ~ N(155, 45). What general adjustment gives the Standard Normal Distribution of this population?
Q4: Consider the following ROI for two types of investment:
A: 3.6, 4.6, 4.6, 5.2, 4.2, 6.5
B: 3.3, 3.4, 4.2, 5.5, 5.8, 6.8
Decide which investment you would choose.
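One common way to compare the two investments in Q4 is average return against risk, with risk measured by the standard deviation. This sketch prints the mean, the sample standard deviation and the coefficient of variation (SD / mean) for each; which criterion should decide the choice is left to you.

```python
import statistics

roi = {
    "A": [3.6, 4.6, 4.6, 5.2, 4.2, 6.5],
    "B": [3.3, 3.4, 4.2, 5.5, 5.8, 6.8],
}

for name, returns in roi.items():
    avg = statistics.mean(returns)
    sd = statistics.stdev(returns)  # sample SD (divisor n - 1)
    cv = sd / avg                   # risk per unit of return
    print(f"{name}: mean = {avg:.2f}, SD = {sd:.2f}, CV = {cv:.3f}")
```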
Test yourselves!
Q5: Find:
(AGE > “30-34”)
(AGE ≤ “20-24”)
( “35-39”≤ AGE < “50-54”)
Test yourselves!
Q6: You are asked by a property marketing manager to ascertain whether
or not distance to work and distance to the city are “equally” important
factors influencing people’s choice of house location.
You are given the following data for the purpose of testing:
Explore the data as follows:
• Create histograms for both distances. Comment on the shape of the histograms. What is your conclusion?
• Construct a scatter diagram of both distances. Comment on the output.
• Explore the data and give some analysis.
• Set up the hypothesis that the means of both distances are the same, and state your conclusion.
Q7: You have surveyed a group of local people and asked them to express their feelings about a new project that will attract a new population and thus create a new neighbourhood. You believe that the local people are concerned about the negative influence the new neighbourhood will have on them as a result of the proposed project. Using the collected data, test your hypothesis.
Perception about Influence of New Neighbourhood

Degree of perception / Locality | Bblaut | Patau1 | Patau2 | Racha2 | Total
Not worried at all | 17 | 30 | 24 | 9 | 80
Not so worried | 6 | 0 | 2 | 14 | 22
Worried | 6 | 0 | 3 | 4 | 13
Quite worried | 1 | 0 | 0 | 2 | 3
So worried | 0 | 0 | 1 | 1 | 2
Total | 30 | 30 | 30 | 30 | 120
Test yourselves! (contd.)
Q8: From your initial investigation, you believe that tenants of “low-quality” housing choose to rent particular flat units just to find shelter. In this context, these groups of people do not pay much attention to pertinent aspects of “quality life” such as accessibility, good surroundings, security, and physical facilities in the living areas.
(a) Set out your research design and data analysis procedure to address the research issue.
(b) Test your hypothesis that low-income tenants do not perceive “quality life” to be important in paying their house rentals.