Course: D0174 / Pemodelan Sistem dan Simulasi (System Modeling and Simulation)
Year: 2009
Session 18
STATISTICAL MODELS
Learning Objectives
• Data Analysis
• Case Studies of Statistical Models
General Principles of Data Analysis
Choice of an appropriate statistical technique
• a complex issue
• somewhat arbitrary
Real-life data often contain mixtures of different types of data
• two statisticians may select different methods
• depending upon what assumptions they are willing to make
• extraneous factors
  - availability of software and its limitations
  - availability of time and financial resources
Warnings
• Figures allow us to calculate them
• Applying different techniques and obtaining different results does not mean that something is wrong
• Looking for an answer to the same question by using several methods may lead to a better understanding
• Obtaining a negative result may be as informative as getting a positive one
• Obtaining no answer by using one technique does not mean that there is no answer at all
• Etc.
The choice of a statistical technique depends essentially upon
• Characteristics of the analysis question
• Characteristics of the data
• Characteristics of the sampling design
Characteristics of the Analysis Question
• Whether there is a distinction between independent and dependent variables
• Whether the nature of the research problem requires
  - description, exploration, estimation, or
  - testing of a hypothesis or model
• Whether the focus of research is on 'variables' or 'objects'
Characteristics of the Data
Types of data sets
• Individuals - variables data sets
• Proximities data sets
  - Variable - variable proximities
  - Individual - individual proximities
Types of variables
• Continuous or quantitative variables
• Discrete or qualitative variables
Variable types by measurement level
• Nominal-scale variables
• Ordinal-scale variables
• Interval-scale variables
• Ratio-scale variables
Techniques for problems without distinction between independent and dependent variables

No. of variables | Measurement level       | Analysis method
NON-METRIC
One              | Nominal                 | Frequencies, proportions
One              | Ordinal                 | Median, mode
One              | Preferences             | Rank consensus among evaluators
METRIC
One              | Interval or ratio scale | Mean, median, mode, variance, skewness, kurtosis
NON-METRIC
Two              | Dichotomous             | Cross-tabulation, Chi-square
Two              | Nominal                 | Cross-tabulation, Chi-square, Correspondence Analysis
Two              | Ordinal                 | Kendall's Tau, Spearman's Rho, Gamma
METRIC
Two              | Interval-scale          | Scatter plot, Pearson's correlation coefficient
More than two    | Interval-scale          | Principal Components Analysis, Factor Analysis, Cluster Analysis, Multidimensional Scaling
Techniques for problems with distinction between independent and dependent variables

No. of variables        | Measurement level                      | Analysis method
Dependent | Independent | Dependent      | Independent           |
One       | One         | Nominal        | Nominal               | Non-parametric tests, Chi-square
One       | One         | Nominal        | Nominal (dichotomous) | Multiple Classification Analysis
One       | One         | Nominal        | Nominal (dichotomous) | Wilcoxon's two-sample test, Chi-square, Kolmogorov-Smirnov test
One       | One         | Interval-scale | Nominal (dichotomous) | t-test, Analysis of Variance
One       | One         | Interval-scale | Interval-scale        | Regression Analysis
One       | One         | Interval-scale | Nominal               | Analysis of Variance
One       | More        | Nominal        | Interval-scale        | Discriminant Analysis
One       | More        | Interval-scale | Nominal               | Analysis of Variance, Multiple Regression Analysis, Multiple Classification Analysis
One       | More        | Interval-scale | Dummy                 | Analysis of Variance, Multiple Regression Analysis, Multiple Classification Analysis
One       | More        | Interval-scale | Interval-scale        | Multiple Regression Analysis
Usual way of statistical problem solving
• Formulate the question using the terms and logic of the specific field of the problem (science management, pedagogy, economics, etc.)
• Reformulate the question using statistical terms and logic
• Find appropriate statistical model(s) and technique(s)
• Use the selected model(s) and technique(s)
• Give a statistical interpretation of the results obtained
• Reformulate the interpretation in the terms of the original field of application
Scientific products by country
Question in research management
Research groups have multiple outputs comprising publications, patents, experimental materials, etc. What are the differences, if any, in the performance of the research groups of selected countries?
Statistical question
• Can we construct a reasonable productivity index using the following measures of scientific output?
  - Articles in country
  - Articles abroad
  - Original research reports
  - Patents
  - Algorithms and designs
  - Experimental material
• Can we find a significant difference between countries in the productivity index?
Statistical model and technique
• Partial order scoring for constructing the index of research output
• Analysis of variance for testing the hypothesis concerning the significance of the difference
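To make the idea of partial order scoring concrete, here is a minimal sketch in Python. It is a simplified dominance-count score, not necessarily the formula that the IDAMS POSCOR program applies: a unit dominates another if it is at least as high on every output measure and strictly higher on at least one, and the score is the share of comparable units that it dominates. The function names and the sample data are hypothetical.

```python
def dominates(a, b):
    """True if a is >= b on every measure and > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def partial_order_scores(units):
    """Score each unit from 0 to 100 by its position in the partial order.

    Assumption (illustrative, not the POSCOR definition):
    score = 100 * dominated / (dominated + dominating),
    and 50 for a unit that is comparable with no other unit.
    """
    scores = []
    for i, a in enumerate(units):
        others = [b for j, b in enumerate(units) if j != i]
        below = sum(dominates(a, b) for b in others)  # units that a dominates
        above = sum(dominates(b, a) for b in others)  # units that dominate a
        comparable = below + above
        scores.append(50.0 if comparable == 0 else 100.0 * below / comparable)
    return scores


# Hypothetical research units: (articles, patents, research reports)
outputs = [(5, 2, 1), (3, 1, 0), (1, 4, 0), (0, 0, 0)]
scores = partial_order_scores(outputs)
print(scores)
```

Units 1 and 3 both get the top score here although they are not comparable with each other, which is the point of a partial order: no weighting of articles against patents is needed.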
Use of the selected model and technique
$RUN POSCOR
$FILES
PRINT = POSCOR.LST
DICTIN = R2R3RU.DIC
DATAIN = R2RU.DAT
DICTOUT = POSCOR.DIC
DATAOUT = POSCOR.DAT
$SETUP
POSCOR SCORES OF RU OUTPUTS
BADDATA=MD1 IDVAR=V2 TRANSVARS=(V1)
POSCOR ORDER=DESR ANAME='RU OUTPUT' -
VARS=(V116,V118,V122,V126,V128,V130)
$RUN ONEWAY
$FILES
PRINT = ONEWAY1.LST
DICTIN = POSCOR.DIC
DATAIN = POSCOR.DAT
$SETUP
ANALYSIS OF VARIANCE OF RU OUTPUT
BADDATA=MD1 PRINT=CDICT DEPVARS=(V8) CONVARS=(R1)
$RECODE
R1=RECODE V15 (40)=1, (360)=2, (410)=3, (638)=4, (844)=5, (868)=6
Use of the selected model and technique (results)

Code | N   | Weightsum | %    | Mean   | S.D.(estim.) | Sum of X | %    | Sum of Xsquare
1    | 334 | 334       | 22.9 | 37.731 | 35.794       | 1.26E+04 | 16.8 | 9.02E+05
2    | 239 | 239       | 16.4 | 45.213 | 35.778       | 1.08E+04 | 14.4 | 7.93E+05
3    | 200 | 200       | 13.7 | 77.585 | 27.336       | 1.55E+04 | 20.7 | 1.35E+06
4    | 225 | 225       | 15.4 | 52.547 | 35.43        | 1.18E+04 | 15.7 | 9.02E+05
5    | 233 | 233       | 16   | 36.7   | 33.266       | 8.55E+03 | 11.4 | 5.71E+05
6    | 229 | 229       | 15.7 | 69.074 | 36.255       | 1.58E+04 | 21.1 | 1.39E+06

Total sum of squares         | 2048467
For 6 groups, Eta            | 0.4018943
For 6 groups, Etasq          | 0.161519
For 6 groups, Eta(adj)       | 0.3982909
For 6 groups, Etasq(adj)     | 0.1586357
Between means sum of squares | 330866.5
Within groups sum of squares | 1717601
F(5, 1454)                   | 56.018
Statistical interpretation
• The value F(5, 1454) = 56.018 shows that there is a highly significant difference between countries in the constructed performance index.
• We also see a medium-strength differentiation between the countries: Eta(adj) = 0.398.
• The mean values show the level of each country.
Interpretation for research management
There are two countries with a low, two with a medium and two with a high productivity index.
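The F value and the eta statistics quoted in the interpretation can be recomputed from the reported sums of squares and group sizes. The short Python check below uses the standard one-way ANOVA formulas. The adjustment used for Etasq(adj) is an assumption: it is the usual degrees-of-freedom correction, chosen because it reproduces the reported value, not because the source states it.

```python
# Group sizes and sums of squares as reported in the ONEWAY results.
group_sizes = [334, 239, 200, 225, 233, 229]
ss_between = 330866.5
ss_within = 1717601.0

n_total = sum(group_sizes)          # 1460 research units
n_groups = len(group_sizes)         # 6 countries
df_between = n_groups - 1
df_within = n_total - n_groups

# F is the ratio of the between-groups to the within-groups mean square.
f_value = (ss_between / df_between) / (ss_within / df_within)

# Eta squared is the share of the total sum of squares lying between groups.
eta_sq = ss_between / (ss_between + ss_within)

# Assumed adjustment: 1 - (1 - eta_sq) * (N - 1) / (N - k).
eta_sq_adj = 1.0 - (1.0 - eta_sq) * (n_total - 1) / df_within
eta_adj = eta_sq_adj ** 0.5

print(f"F({df_between}, {df_within}) = {f_value:.3f}")
print(f"Etasq = {eta_sq:.4f}, Eta(adj) = {eta_adj:.4f}")
```

An eta squared of about 0.16 means that country membership accounts for roughly 16% of the variation in the index, which is why the differentiation is described as medium rather than strong despite the very large F.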
Source
P.S. Nagpaul: Guide to Advanced Data Analysis using IDAMS Software
Performance, motivation and creativity of school children
Question in psychology - pedagogy
Intellectual performance, motivation and creativity of school children can be measured by using several indicators. Some of them are produced by the children themselves (e.g. IQ tests); others are based on the evaluation given by their teachers (e.g. average grade). What are the perceivable dimensions, if any, behind these indicators?
Statistical question
In the set of the listed indicators, are there any groups within which statistical inter-correlation, and between which statistical independence, can be detected?
Indicators (T = based on the teachers' evaluation, C = produced by the children)
• T Average grade
• T Creative behaviour
• C IQ
• C Achievement motivation
• C Creativity test
• T Motivated behaviour
• C Creative attitude
• T Motivation index
Statistical model and technique
• Pearson correlation between the measured indicators
• Multidimensional scaling, cluster analysis
Use of the selected model and technique
Executing PEARSON, MDSCAL and CLUSFIND in IDAMS
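The first step of this pipeline, turning indicator scores into correlations and then into dissimilarities that scaling or clustering can work on, can be sketched in plain Python. This is an illustration only: the scores are hypothetical, and using 1 - r as the dissimilarity is one common choice assumed here, not something the source attributes to IDAMS.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of five children on three of the indicators.
indicators = {
    "T Average grade": [4.0, 3.0, 5.0, 2.0, 4.0],
    "T Motivated behaviour": [4.0, 3.0, 5.0, 2.0, 5.0],
    "C IQ": [100.0, 120.0, 95.0, 110.0, 105.0],
}

# Assumed dissimilarity: 1 - r, so strongly correlated indicators are close.
names = sorted(indicators)
dissimilarity = {
    (a, b): 1.0 - pearson(indicators[a], indicators[b])
    for a in names for b in names if a < b
}

# The pair with the smallest dissimilarity is what a clustering step would merge first.
closest = min(dissimilarity, key=dissimilarity.get)
print(closest, round(dissimilarity[closest], 3))
```

In this made-up data the two teacher ratings end up closest, which mirrors the kind of grouping the case study looks for.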
[Figure: MDSCAL result, with two groups of indicators labelled Children and Teachers]
CLUSFIND result
[Figure: CLUSFIND cluster tree of the eight indicators (T Creative behaviour, T Motivated behaviour, C IQ, T Average grade, T Motivation index, C Achievement motivation, C Creativity test, C Creative attitude); values shown in the figure: 0.75, 0.40, 0.27, 0.45, 0.02, 0.13, 0.71]
Statistical interpretation
• Multidimensional scaling shows a clear separation of the indicators produced by children and by teachers
• Cluster analysis supports the finding that the variables coming from teachers and from children are separated
Pedagogical/psychological interpretation
Just one aspect: the ratings given by teachers to children are nearly the same, independently of the evaluated ability, attitude or behaviour dimension
Source
M. Hunya: Multidimensional statistical techniques in pedagogical studies
Data
A. Deak, B. Kozeki: Study into the effect of motivation and creativity factors on the performance of school children
Prediction of river flow values
Question in hydrology
• We have water level data on four rivers in North Africa (more than 40 years). Can the water flow level be predicted on the basis of data from the past? If so, with what precision?
• What if the average flow level is considered instead of the individual ones?
Statistical question
• Can the river flow values be predicted by using a set of values from the preceding period?
• How does the prediction change if the 6-month average flow is used?
Statistical model and technique
• Autoregression model (with a lag of 12 to 36) applied to the river flow time series
• Transformation of the original data into a time series of moving averages (interval length = 6)
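The two ingredients above can be illustrated with a small Python sketch. It is deliberately simplified: the series is synthetic (a yearly cycle plus random noise, not the UNESCO data), and the autoregression uses only the single previous value instead of the 12 to 36 lagged values used in the case study. All names are hypothetical.

```python
import math
import random


def moving_average(series, window=6):
    """Means of consecutive overlapping windows of the given length."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def lag_r_squared(series, lag=1):
    """R**2 of the least-squares line predicting x[t] from x[t - lag]."""
    past, present = series[:-lag], series[lag:]
    n = len(past)
    mean_past = sum(past) / n
    mean_present = sum(present) / n
    cov = sum((p - mean_past) * (q - mean_present) for p, q in zip(past, present))
    var_past = sum((p - mean_past) ** 2 for p in past)
    var_present = sum((q - mean_present) ** 2 for q in present)
    return cov * cov / (var_past * var_present)


# Synthetic monthly "flow": seasonal cycle plus noise, 40 years long.
rng = random.Random(0)
flow = [10.0 + math.sin(2 * math.pi * month / 12) + rng.gauss(0.0, 1.0)
        for month in range(480)]

smoothed = moving_average(flow, window=6)
r2_raw = lag_r_squared(flow)
r2_smooth = lag_r_squared(smoothed)
print(f"raw series R**2 = {r2_raw:.2f}, 6-month average R**2 = {r2_smooth:.2f}")
```

In this sketch part of the gain for the smoothed series comes simply from neighbouring averages sharing five of their six values, so a high R**2 on moving averages should be read with that overlap in mind.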
Use of the selected model and technique
Time Series Analysis option from the IDAMS interactive facilities
Original series
• 12 months: R**2 = 0.32
• 24 months: R**2 = 0.35
• 36 months: R**2 = 0.36
Moving average series
• 12 months: R**2 = 0.92
• 24 months: R**2 = 0.93
[Figure: plots of the original series and the moving average series]
Statistical interpretation
Autoregression shows that individual values can be predicted with only moderate precision (unbiased R**2 = 0.32 to 0.36 for 12 to 36 months); high peak values are very poorly reproduced.
In the case of a 6-month moving average, the prediction is nearly perfect (unbiased R**2 = 0.92 for 12 months).
Hydrological interpretation
Although the pattern of changes can be reproduced fairly well, even three years of past data are not enough to predict the height of peak flows.
But if we consider 6-month averages, they can be predicted with almost full precision.
Data
UNESCO, Water Science Division
Business
Question concerning company management
• What are the factors that influence the economic performance of a company? Economic performance is measured by the return on capital employed.
Statistical question
• Can the return on capital be predicted by using a set of economic and production indicators characterizing the company?
• How does the prediction change if we look for a subset of best predictors?
Statistical model and technique
• Multiple linear regression
• Stepwise regression
Use of the selected model and technique
Running REGRESSN
Results
• The full regression model explains 70% of the adjusted variance of the dependent variable. Its standard error is about one half of the mean. The value of the determinant of the correlation matrix is .79478E-05. There are 8 variables (out of 12) with high covariance ratio values.
• The stepwise regression model selects 3 variables for explaining 80% of the variance. No multicollinearity (0.77647). Standard error of the estimate of the dependent variable = 0.06135, which is quite low: high reliability of estimation.
Statistical interpretation
Full regression model: the reliability of prediction is poor. Strong multicollinearity is shown. The variables that contribute to multicollinearity can be identified.
Stepwise regression model: 3 variables explain 80% of the variance. No multicollinearity. High reliability of estimation.
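The results quote the determinant of the predictors' correlation matrix as a multicollinearity indicator: a value near 0 signals nearly redundant predictors, a value near 1 signals nearly uncorrelated ones. The following Python sketch shows that behaviour on two hypothetical 3 x 3 correlation matrices. The matrices are invented for illustration and are not the company data.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]   # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0               # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det               # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


# Hypothetical correlation matrices of three predictors.
nearly_collinear = [[1.00, 0.99, 0.98],
                    [0.99, 1.00, 0.99],
                    [0.98, 0.99, 1.00]]
weakly_related = [[1.0, 0.2, 0.1],
                  [0.2, 1.0, 0.3],
                  [0.1, 0.3, 1.0]]

det_collinear = determinant(nearly_collinear)
det_weak = determinant(weakly_related)
print(f"{det_collinear:.6f}  {det_weak:.3f}")
```

The contrast between the two determinants parallels the contrast reported for the full model (.79478E-05) and, presumably, the value 0.77647 quoted for the stepwise model.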
Interpretation for management
Although the full indicator set can give a good fit, it cannot be recommended for real use because of its poor prediction reliability.
But if we consider 3 carefully selected indicators, we can get a fair prediction.
Source
P.S. Nagpaul, India
Education
Question concerning measurement of knowledge level
Tests are used very often in education for checking the level of knowledge in one subject or another. Long tests with many questions can meet the reliability requirement relatively easily. The question is whether we can make a short, interactive, adaptive test from a long test while preserving nearly the original reliability.
Statistical question
Can we give a good estimate of the original test value by using a
tree structure based prediction?
Statistical model and technique
Regression tree
Use of the selected model and technique
Running SEARCH
Results
Starting from a standardized test (for checking a specific verbal aptitude) containing 20 questions, a regression tree with 3-4 questions was obtained. The regression tree contains 10 final subgroups (leaves) with estimates for the original test value ranging from 6.4 to 59.2. The explained variance is 90.4%.
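A regression tree of this kind is grown by repeatedly choosing the question whose right/wrong split explains the largest share of the variance of the full test score. The Python sketch below shows a single such split on hypothetical data. It illustrates the general technique under that assumption and is not a description of how the SEARCH program is implemented.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_question(answers, totals):
    """Return (question index, explained share of variance) of the best split.

    answers: one tuple of 0/1 answers per pupil; totals: full test scores.
    """
    total_ss = sum_of_squares(totals)
    best_index, best_share = None, 0.0
    for q in range(len(answers[0])):
        right = [t for a, t in zip(answers, totals) if a[q] == 1]
        wrong = [t for a, t in zip(answers, totals) if a[q] == 0]
        if not right or not wrong:
            continue  # this question does not split the group
        within_ss = sum_of_squares(right) + sum_of_squares(wrong)
        share = 1.0 - within_ss / total_ss
        if share > best_share:
            best_index, best_share = q, share
    return best_index, best_share


# Hypothetical data: six pupils, three questions, full test scores.
answers = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 0, 1), (0, 1, 0), (0, 0, 0)]
totals = [50.0, 45.0, 60.0, 20.0, 25.0, 10.0]

question, share = best_question(answers, totals)
print(question, round(share, 3))
```

Applying the same search again inside each of the two subgroups yields the next level of the tree, which is how a short adaptive sequence of 3 to 4 questions can stand in for the full test.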
Statistical interpretation
A very good estimate can be given for the original test value by using the
obtained regression tree.
Interpretation for test designers
Using the tree structure, a computer-assisted test can be constructed that is much shorter without losing the power of the original test.
Source
M. Hunya: Finding optimal interactive test structures (1982)
THANK YOU