An Information System to Learn Characteristic Sets of Words and to Examine Knowledge in Statistics
Oskars Rasnacs, Riga Stradinsh University
Maris Vitins, University of Latvia
VILNIUS: DBIS 2012
Main problem
Data
Hypothesis
?
Decisions about data processing methods
Riga Stradinsh University
• Health care specialties (medicine, food, pharmacy, art therapy and others)
• Social science specialties (business, politics, economics and others)
The author is from the Faculty of Medicine, Department of Physics
Minimal volume of information for learning
A statistics course usually covers a large volume of information.
What does a student need to learn in order to find information in the literature or on the internet in a short time?
Characteristic sets of words in English (for Latvian students)
Minimal volume of information for learning
Students search for information using characteristic sets of words. This is allowed when they work independently.
Characteristic sets of words
Central tendency indicators
Mean (average), minimum, maximum, count (very well known)
Median
Mode
Quartiles
Percentiles
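As a minimal sketch, the indicators on this slide can be computed with Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration, and the "inclusive" quantile method is one of several conventions, so other software may report slightly different quartiles.

```python
import statistics

# Hypothetical sample values, used only to illustrate the indicators.
values = [2, 3, 3, 5, 7, 8, 9, 12]

summary = {
    "count": len(values),
    "minimum": min(values),
    "maximum": max(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    # Quartiles: the three cut points that split the sorted data into four parts.
    "quartiles": statistics.quantiles(values, n=4, method="inclusive"),
}

# Percentiles: the 99 cut points that split the sorted data into 100 parts.
percentiles = statistics.quantiles(values, n=100, method="inclusive")

for name, value in summary.items():
    print(f"{name}: {value}")
```

The second quartile and the 50th percentile both coincide with the median, which is a quick way to check that the terms are understood.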
Characteristic sets of words
Dispersion indicators
Range
Variance
Std. deviation
3σ definition
Skewness
Kurtosis
Characteristic sets of words
Representation indicators (std. errors)
Std. error of the mean
Std. error of skewness
Std. error of kurtosis
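A short sketch of how these three standard errors are commonly computed. The formulas for skewness and kurtosis are the usual sample-size-only expressions reported by statistical packages; the slides do not state which formulas were taught, so treat this choice as an assumption. The function name is hypothetical.

```python
import math
import statistics


def standard_errors(values):
    """Return (SE of mean, SE of skewness, SE of kurtosis) for a sample."""
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 observations are needed")
    # Standard error of the mean: sample standard deviation over sqrt(n).
    se_mean = statistics.stdev(values) / math.sqrt(n)
    # Standard errors of skewness and kurtosis depend only on the sample size.
    se_skewness = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurtosis = 2 * se_skewness * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return se_mean, se_skewness, se_kurtosis


# Hypothetical sample: the integers 1..10.
se_mean, se_skewness, se_kurtosis = standard_errors(list(range(1, 11)))
print(round(se_mean, 3), round(se_skewness, 3), round(se_kurtosis, 3))
```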
Characteristic sets of words
Normal distribution
Student's t-test for independent samples
Student's t-test for paired samples
Analysis of variance (ANOVA)
Pearson correlation
Linear regression
Characteristic sets of words
Mann-Whitney, Kolmogorov-Smirnov, Wald-Wolfowitz tests
Wilcoxon, sign, McNemar tests
Kruskal-Wallis, median tests
Spearman, Kendall, gamma correlation
Friedman, Kendall, Cochran tests
Differences in the choice of data processing methods
Many algorithms have been published that show the relationships between characteristic sets of words:
Choosing the Correct Statistical Test in SAS, STATA and SPSS.
http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm
How to choose a statistical test.
http://www.graphpad.com/www/book/choose.htm
Selecting statistics.
http://www.socialresearchmethods.net/selstat/ssstart.htm
Differences in the choice of data processing methods
The choice depends on:
• the learning content;
• the normal distribution criterion;
• whether ordinal data are treated as qualitative or quantitative data;
• the classification of data processing situations.
Differences in the choice of data processing methods
The choice can be made in many ways.
The author surveyed 10 professors of Latvian universities.
Each professor and specialist (postgraduate student) has their own viewpoint.
At the beginning of the research, 2008/09
The students used this algorithm (Teibe, 2007)
Tasks | Quantitative normally distributed data | Non-parametrics (all other cases)
Descriptive statistics | Mean, std. deviation | Median, mode, interquartile range
Two independent group comparison by one variable | Independent sample t-test | Mann-Whitney, Kolmogorov-Smirnov, Wald-Wolfowitz, chi-square and Fisher exact criteria
Two dependent group comparison by one variable | Paired sample t-test | Wilcoxon, sign and McNemar criteria
Three and more independent group comparison by one variable | ANOVA | Kruskal-Wallis, median and chi-square criteria
Three and more dependent group comparison by one variable | Friedman ANOVA | Friedman ANOVA, Cochran criterion
Two variable relationship analysis | Pearson correlation analysis | Chi-square criterion, Spearman, Kendall and gamma correlation analysis
Three and more variable coincident analysis | Regression, discriminant, factor and cluster analysis | Logistic regression analysis
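The algorithm can be read as a lookup keyed by the task and by whether the data are quantitative and normally distributed. The sketch below transcribes the slide into a dictionary; the short key names and the function `suggest_methods` are hypothetical and not part of the original material.

```python
# Each task maps to a pair:
# (methods for quantitative normally distributed data,
#  non-parametric methods for all other cases).
ALGORITHM = {
    "descriptive statistics": (
        ["mean", "std. deviation"],
        ["median", "mode", "interquartile range"],
    ),
    "two independent groups": (
        ["independent sample t-test"],
        ["Mann-Whitney", "Kolmogorov-Smirnov", "Wald-Wolfowitz",
         "chi-square", "Fisher exact"],
    ),
    "two dependent groups": (
        ["paired sample t-test"],
        ["Wilcoxon", "sign", "McNemar"],
    ),
    "three or more independent groups": (
        ["ANOVA"],
        ["Kruskal-Wallis", "median", "chi-square"],
    ),
    "three or more dependent groups": (
        ["Friedman ANOVA"],
        ["Friedman ANOVA", "Cochran"],
    ),
    "two variable relationship": (
        ["Pearson correlation"],
        ["chi-square", "Spearman", "Kendall", "gamma correlation"],
    ),
    "three or more variable analysis": (
        ["regression", "discriminant", "factor", "cluster analysis"],
        ["logistic regression"],
    ),
}


def suggest_methods(task, quantitative_and_normal):
    """Return the methods the algorithm lists for a task and data type."""
    parametric, non_parametric = ALGORITHM[task]
    return parametric if quantitative_and_normal else non_parametric


print(suggest_methods("two dependent groups", quantitative_and_normal=True))
print(suggest_methods("two dependent groups", quantitative_and_normal=False))
```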
At the beginning of the research, 2008/09
Tests had three answer options, one of them correct. For example:
Pulse rate before and after exercise (normally distributed) in the same patients?
Answer options:
Analysis of variance (ANOVA);
Independent sample Student's t-test;
Paired sample Student's t-test.
• Test-making program:
http://skolai.daba.lv/proj_materiali/macibu_materiali/d/Testu_veidosana2_present_rb_d.pdf
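For the pulse-rate question, the same patients are measured twice and the data are normally distributed, so the measurements form pairs. A minimal sketch of the paired-sample t statistic follows; the pulse values are hypothetical, invented only to show the calculation, and the function name is an assumption.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired measurements."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The test works on the per-patient differences, not on the two groups.
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


# Hypothetical pulse rates (beats per minute) for six patients.
before = [72, 68, 75, 80, 66, 70]
after = [88, 85, 90, 97, 80, 89]

t_value, df = paired_t_statistic(before, after)
print(f"t = {t_value:.2f}, df = {df}")
```

An independent-sample test would ignore the pairing, and ANOVA is meant for three or more groups, which is why the paired test is the expected answer here.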
At the beginning of the research, 2008/09
Improvements were needed in the learning process:
• A more detailed algorithm is needed (the number of situations has changed; at present there are 31 situations);
• Working with a data file and making decisions is very important;
• One and the same situation usually has several solutions, so it is better to allow several correct answers.
• The test results in the first stage were poor (M=41.31, n=26), which under RSU study rules corresponds to 3 points on average. The second stage of the research started in 2009/10.
The second stage of research: learning data
• Real patient data may not be used
• Data can be obtained from a student questionnaire
• Data can be generated to match the statistical indicators reported in scientific publications
Learning data
• Students also need to work with data that are not normally distributed
• Many publications in health care and social science specialties report the mean (M), std. deviation (SD), n and %
Author's 1st model for data generation
$(M - M_m)^2 + (SD - SD_m)^2 + (Me - Me_m)^2 \to \min$
$M$ and $M_m$ are the real and targeted average values, $SD$ and $SD_m$ are the real and targeted standard deviations, and $Me$ and $Me_m$ are the real and targeted medians.
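The slides do not say how the minimum was searched for. As one possible sketch, a simple random hill climb can adjust one value at a time and keep a change only when the objective decreases. The optimizer, the function names, the step size and the target values below are all assumptions made for illustration.

```python
import random
import statistics


def objective(values, target_mean, target_sd, target_median):
    """Sum of squared gaps between the sample indicators and their targets."""
    return (
        (statistics.mean(values) - target_mean) ** 2
        + (statistics.stdev(values) - target_sd) ** 2
        + (statistics.median(values) - target_median) ** 2
    )


def generate_data(n, target_mean, target_sd, target_median, steps=3000, seed=0):
    """Hill-climb from a random start; return (values, final objective)."""
    rng = random.Random(seed)
    values = [rng.gauss(target_mean, target_sd) for _ in range(n)]
    best = objective(values, target_mean, target_sd, target_median)
    for _ in range(steps):
        i = rng.randrange(n)
        old = values[i]
        # Propose a small change to one value.
        values[i] = old + rng.gauss(0, 0.1 * target_sd)
        score = objective(values, target_mean, target_sd, target_median)
        if score < best:
            best = score      # keep the improvement
        else:
            values[i] = old   # undo the change
    return values, best


# Hypothetical targets: M = 70, SD = 10, Me = 68, n = 30.
values, best = generate_data(30, 70.0, 10.0, 68.0)
print(f"objective after search: {best:.6f}")
```

Setting the target median below the target mean, as in this call, pushes the generated sample towards a right-skewed shape, which is one way to obtain data that are not normally distributed.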
Author's 2nd model for data generation
$Z = \sum_{i=1}^{n} (y_i - y_{progn,i})^2 \to \min, \quad y_i \ge 0$
$\frac{\sum_{i=1}^{n} (y_{progn,i} - y_{avg})^2}{\sum_{i=1}^{n} (y_i - y_{avg})^2} = R^2$
$y_i$ are the generated data values, which are minimally different from the data values predicted by equations; $y_{avg}$ is the average value of the generated data values; $y_{progn,i}$ are the data values predicted by equations; $R^2$ is the target coefficient of determination.
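The sketch below shows how the two quantities of this model, Z and the ratio that must equal R², can be evaluated for a candidate data set. It also includes a simple construction that meets the R² condition by adding scaled noise to the predicted values. This construction is a swapped-in illustration, not the author's minimization: it does not minimize Z and does not enforce non-negative values. All function names and the example equation are assumptions.

```python
import math
import random


def z_objective(y, y_pred):
    """Z: sum of squared differences between generated and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_pred))


def r2_ratio(y, y_pred):
    """Ratio of explained to total variation around the mean of y."""
    y_avg = sum(y) / len(y)
    explained = sum((p - y_avg) ** 2 for p in y_pred)
    total = sum((v - y_avg) ** 2 for v in y)
    return explained / total


def generate_with_r2(y_pred, target_r2, seed=0):
    """Add noise to y_pred, scaled so that r2_ratio equals target_r2."""
    rng = random.Random(seed)
    n = len(y_pred)
    pred_avg = sum(y_pred) / n
    centered = [p - pred_avg for p in y_pred]
    ssr = sum(c * c for c in centered)
    noise = [rng.gauss(0, 1) for _ in range(n)]
    # Make the noise mean-zero and uncorrelated with the predictions.
    noise_avg = sum(noise) / n
    noise = [e - noise_avg for e in noise]
    slope = sum(e * c for e, c in zip(noise, centered)) / ssr
    noise = [e - slope * c for e, c in zip(noise, centered)]
    sse = sum(e * e for e in noise)
    scale = math.sqrt(ssr * (1 - target_r2) / (target_r2 * sse))
    return [p + scale * e for p, e in zip(y_pred, noise)]


# Hypothetical prediction equation y = 10 + 2x for x = 1..20, target R^2 = 0.64.
y_pred = [10 + 2 * x for x in range(1, 21)]
y = generate_with_r2(y_pred, 0.64)
print(round(r2_ratio(y, y_pred), 6), round(z_objective(y, y_pred), 3))
```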
Information system
• There are many online test-making programs, for example www.quizegg.com.
• The information system was created under the author's direction
• The student is given an MS Excel file with data and must reason logically, interpreting the data, to determine whether a set of characteristic words is adequate for the data or not
• The data of the test program are compatible with MS Excel
• There are a test mode and a learning mode
• The test score (%) measures agreement with the experts' (professors') viewpoint
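One way to read the scoring rule is as a percentage of agreement: for each set of characteristic words the student marks it as adequate or not, and the score is the share of marks that match the expert key. The slides do not describe the system's internal data format, so the representation and the function name below are hypothetical.

```python
def compatibility_percent(student_marks, expert_marks):
    """Percentage of adequate / not adequate marks that match the expert key."""
    if len(student_marks) != len(expert_marks) or not expert_marks:
        raise ValueError("both lists must be non-empty and of equal length")
    matches = sum(s == e for s, e in zip(student_marks, expert_marks))
    return 100.0 * matches / len(expert_marks)


# Hypothetical marks for four sets of characteristic words (True = adequate).
student = [True, False, True, True]
expert = [True, True, True, False]
score = compatibility_percent(student, expert)
print(f"{score:.1f}%")
```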
After the second stage of research, 2010/11
The students' results are significantly better (M=59.6%, n=55, Mann-Whitney test p<0.001)
Thank you for your attention!