An Information System to Learn Characteristic Sets of Words and to Examine Knowledge in Statistics
Oskars Rasnacs, Riga Stradinsh University
Maris Vitins, University of Latvia
VILNIUS: DBIS 2012

Main problem
Data
Hypothesis
?
Decisions about data processing methods

Riga Stradinsh University
• Health care specialties (medicine, food, pharmacy, art therapy and others)
• Social science specialties (business, politics, economics and others)
• The author is from the Faculty of Medicine, Department of Physics

Minimal volume of information for learning
• The statistics course usually covers a large volume of information.
• What does a student need to learn in order to find information in the literature or on the internet in a short time?
• Characteristic sets of words in English (for Latvian students).

Minimal volume of information for learning
• Students search for information using characteristic sets of words.
• This is allowed in independent work.

Characteristic sets of words
Central tendency indicators
• Mean (average), minimum, maximum, count (very well known)
• Median
• Mode
• Quartiles
• Percentiles

Characteristic sets of words
Dispersion indicators
• Range
• Variance
• Std. deviation (3 definitions)
• Skewness
• Kurtosis

Characteristic sets of words
Representation indicators (std. errors)
• Std. error of the mean
• Std. error of skewness
• Std. error of kurtosis

Characteristic sets of words
• Normal distribution
• Student t-test for independent samples
• Student t-test for paired samples
• Analysis of variance (ANOVA)
• Pearson correlation
• Linear regression

Characteristic sets of words
• Mann-Whitney, Kolmogorov-Smirnov, Wald-Wolfowitz tests
• Wilcoxon, sign, McNemar tests
• Kruskal-Wallis, median tests
• Spearman, Kendall, gamma correlation
• Friedman, Kendall, Cochran tests

Differences in the choice of data processing methods
Many published algorithms show the relationships between characteristic sets of words:
• Choosing the Correct Statistical Test in SAS, STATA and SPSS. http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm
• How to choose a statistical test. http://www.graphpad.com/www/book/choose.htm
• Selecting statistics. http://www.socialresearchmethods.net/selstat/ssstart.htm

Differences in the choice of data processing methods
The choice depends on:
• the learning content;
• the normal distribution criterion;
• whether ordinal data are treated as qualitative or quantitative data;
• the classification of data processing situations.

Differences in the choice of data processing methods
• The choice can be made in many ways.
• The author surveyed 10 professors of Latvian universities.
• Each professor and specialist (postgraduate student) has their own viewpoint.

At the beginning of the research, 2008/09
The students used this algorithm (Teibe, 2007):

Tasks | Quantitative normally distributed data | Non-parametric (all other cases)
Descriptive statistics | Mean, std. deviation | Median, mode, interquartile range
Two independent group comparison by one variable | Independent sample t-test | Mann-Whitney, Kolmogorov-Smirnov, Wald-Wolfowitz, chi-square and Fisher exact tests
Two dependent group comparison by one variable | Paired sample t-test | Wilcoxon, sign and McNemar tests
Three and more independent group comparison by one variable | ANOVA | Kruskal-Wallis, median and chi-square tests
Three and more dependent group comparison by one variable | Friedman ANOVA | Friedman ANOVA, Cochran test
Two variable relationship analysis | Pearson correlation analysis | Chi-square test, Spearman, Kendall and gamma correlation analysis
Three and more variable coincident analysis | Regression, discriminant, factor and cluster analysis | Logistic regression analysis

At the beginning of the research, 2008/09
Tests with 3 answer variants, one of them correct. For example:
Which method fits pulse frequency before and after load (normally distributed), measured on the same patients?
Answer variants:
• Analysis of variance (ANOVA);
• Independent sample Student t-test;
• Paired sample Student t-test.
• Test making program: http://skolai.daba.lv/proj_materiali/macibu_materiali/d/Testu_veidosana2_present_rb_d.pdf

At the beginning of the research, 2008/09
Improvements are needed in the learning process:
• A more detailed algorithm is needed (the number of situations has changed; at present there are 31 situations).
• The data file and decision making are very important.
• One and the same situation usually has several solutions, so tests with several correct answers are better.
• The test results in the first stage were poor (M = 41.31, n = 26); under RSU study rules this is an average of 3 points.
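The first-stage selection algorithm (Teibe, 2007) can be read as a lookup from a task and a normality decision to a list of acceptable methods. The following is a minimal Python sketch of that reading, not the authors' information system: the method lists are copied from the table, while the dictionary layout, the shortened task names and the function `suggest_methods` are assumptions made for illustration.

```python
# Illustrative encoding of the first-stage selection table (Teibe, 2007).
# Task keys are shortened forms of the table's task names; the layout and
# suggest_methods() are assumptions made for this sketch.
# Each entry: (methods for normally distributed data, non-parametric methods).
SELECTION_TABLE = {
    "descriptive statistics": (
        ["mean", "std. deviation"],
        ["median", "mode", "interquartile range"],
    ),
    "two independent groups": (
        ["independent sample t-test"],
        ["Mann-Whitney", "Kolmogorov-Smirnov", "Wald-Wolfowitz",
         "chi-square", "Fisher exact"],
    ),
    "two dependent groups": (
        ["paired sample t-test"],
        ["Wilcoxon", "sign", "McNemar"],
    ),
    "three or more independent groups": (
        ["ANOVA"],
        ["Kruskal-Wallis", "median", "chi-square"],
    ),
    "three or more dependent groups": (
        ["Friedman ANOVA"],
        ["Friedman ANOVA", "Cochran"],
    ),
    "two-variable relationship": (
        ["Pearson correlation"],
        ["chi-square", "Spearman", "Kendall", "gamma correlation"],
    ),
    "three or more variables": (
        ["regression", "discriminant", "factor", "cluster analysis"],
        ["logistic regression"],
    ),
}


def suggest_methods(task, normally_distributed):
    """Return the list of methods the table allows for this situation."""
    parametric, non_parametric = SELECTION_TABLE[task]
    return parametric if normally_distributed else non_parametric


# Pulse frequency before and after load, same patients, normally distributed.
print(suggest_methods("two dependent groups", normally_distributed=True))
# prints ['paired sample t-test']
```

Returning a list rather than a single name reflects the observation that one situation usually has several acceptable solutions.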
The second stage of research started in 2009/10.

The second stage of research: learning data
• Real patient data may not be used.
• Data can be collected with student questionnaires.
• Data can be generated according to statistical indicators reported in scientific publications.

Learning data
• Students also need to work with data that are not normally distributed.
• Many publications in health care and social science specialties report the mean (M), std. deviation (SD), n and %.

Author's 1st model for data generation
\[ (M - M_m)^2 + (SD - SD_m)^2 + (Me - Me_m)^2 \to \min \]
M and M_m are the real and targeted average values, SD and SD_m are the real and targeted standard deviations, and Me and Me_m are the real and targeted medians.

Author's 2nd model for data generation
\[ Z = \sum_{i=1}^{n} \left( y_i - y_{\mathrm{progn},i} \right)^2 \to \min, \qquad y_i \ge 0 \]
\[ \frac{\sum_{i=1}^{n} \left( y_{\mathrm{progn},i} - y_{\mathrm{avg}} \right)^2}{\sum_{i=1}^{n} \left( y_i - y_{\mathrm{avg}} \right)^2} = R^2 \]
y_i are the generated data values, which differ minimally from the data values predicted by the equations; y_avg is the average of the generated data values; y_progn,i are the data values predicted by the equations; R^2 is the targeted coefficient of determination.

Information system
• There are many internet test making programs, for example www.quizegg.com.
• The information system was created under the author's direction.
• The student is given an MS Excel file with data and must reason logically, based on interpretation, to determine whether a characteristic set of words is adequate for the data.
• The data of the test program are compatible with MS Excel.
• There are a test mode and a learning mode.
• The test score (%) is the degree of agreement with the experts' (professors') viewpoint.

[Figures: Information system]

After the second stage of research, 2010/11
Student results are significantly better (M = 59.6%, n = 55; Mann-Whitney test, p < 0.001).

Thank you for your attention!
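The author's 1st model for data generation minimizes the summed squared differences between the real and targeted mean, standard deviation and median. The slides do not say which optimizer was used, so the sketch below swaps in a simple random hill climb as a stand-in. The function names, the target values (50, 10, 48), the sample size and the use of the sample standard deviation are all assumptions chosen for illustration.

```python
import random
import statistics


def loss(values, target_mean, target_sd, target_median):
    """Objective of the 1st model: (M-Mm)^2 + (SD-SDm)^2 + (Me-Mem)^2."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)    # sample SD; an assumption of this sketch
    me = statistics.median(values)
    return ((m - target_mean) ** 2
            + (sd - target_sd) ** 2
            + (me - target_median) ** 2)


def refine(values, target_mean, target_sd, target_median, steps=2000, seed=0):
    """Random hill climb: perturb one value, keep the change only if the loss drops."""
    rng = random.Random(seed)
    values = list(values)
    best = loss(values, target_mean, target_sd, target_median)
    for _ in range(steps):
        i = rng.randrange(len(values))
        old = values[i]
        values[i] = old + rng.gauss(0, 0.1 * target_sd)
        candidate = loss(values, target_mean, target_sd, target_median)
        if candidate < best:
            best = candidate         # accept the improving move
        else:
            values[i] = old          # reject and restore the old value
    return values


# Arbitrary illustration targets (not taken from any publication).
TARGETS = (50.0, 10.0, 48.0)
start_rng = random.Random(1)
start = [start_rng.gauss(TARGETS[0], TARGETS[1]) for _ in range(30)]
generated = refine(start, *TARGETS)

print("loss before:", loss(start, *TARGETS))
print("loss after: ", loss(generated, *TARGETS))
```

Because a move is accepted only when it lowers the objective, the loss after refinement can never exceed the starting loss; how close it gets to zero depends on the targets and the number of steps.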