Biostatistics Report No. 1
October 3, 2018

Question 1:

Series 1 (Table 1) shows the average study time per week of 20 students.

Table 1: Study time of students
Student               1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Study time per week   2  15  18  23  25  27  31  32  35  37  41  42  43  45  49  50  53  56  57  60

According to the histogram in Figure 1, this distribution is left-skewed. Table 2 summarizes the descriptive statistics (mean, median, standard deviation, variance, coefficient of variation, and interquartile range). The median (Q2) is closer to the upper quartile (Q3) than to the lower quartile (Q1), which indicates that this data series is negatively skewed. The boxplot in Figure 2 also reflects the negative skewness of the series. From the computations, Q1 = 26.50, Q2 = 39, and Q3 = 49.25. Because Q3 - Q2 = 10.25 is less than Q2 - Q1 = 12.50, the distribution is negatively skewed. This means that the study times above the median are packed more closely together than those below it. The low values in the left tail also pull the mean (37.05) below the median (39).

Table 2: Data for left-skewed distribution
Mean    Median   Standard deviation   Variance   Coefficient of variation (%)   Interquartile range
37.05   39       15.52                240.79     41.88                          22.75

Figure 1: Study time of students per week

Figure 2: Average study time of students per week

Question 2:

Series 2 (Table 3) shows the salaries of a group of 20 employees.

Table 3: Employees and salaries
Employee   Salary
1          30000
2          32000
3          32000
4          33000
5          33000
6          34000
7          34000
8          38000
9          38000
10         38000
11         42000
12         43000
13         45000
14         45000
15         48000
16         50000
17         55000
18         55000
19         65000
20         110000

According to the histogram shown in Figure 3, this distribution is right-skewed.
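As a rough cross-check, the Table 3 salaries can be summarized with Python's standard `statistics` module. This is only a sketch: the report does not state which formulas were used, so the sample (n - 1) variance and the "inclusive" quartile interpolation are assumptions, and the names `salaries` and `describe` are illustrative.

```python
import statistics

# Salaries from Table 3, employees 1 to 20.
salaries = [
    30000, 32000, 32000, 33000, 33000, 34000, 34000, 38000, 38000, 38000,
    42000, 43000, 45000, 45000, 48000, 50000, 55000, 55000, 65000, 110000,
]


def describe(values):
    """Return the summary statistics used in this report as a dict."""
    # Assumption: quartiles are interpolated between the smallest and
    # largest observations ("inclusive" method).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # assumption: sample formula (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": std,
        "variance": statistics.variance(values),
        "cv_percent": 100 * std / mean,
        "q1": q1,
        "q2": q2,
        "q3": q3,
        "iqr": q3 - q1,
    }


summary = describe(salaries)
for name, value in summary.items():
    print(f"{name:>10}: {value:.2f}")
```

The same `describe` function can be applied to the other data series in this report by passing a different list of values.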
Table 4 summarizes the descriptive statistics (mean, median, standard deviation, variance, coefficient of variation, and interquartile range). The boxplot in Figure 4 also reflects the positive skewness of these data. To see that the data are right-skewed, it is enough to note that the median (Q2) is closer to the lower quartile (Q1) than to the upper quartile (Q3). From the computations, Q1 = 33750, Q2 = 40000, and Q3 = 48500. Because Q2 - Q1 = 6250 is less than Q3 - Q2 = 8500, the distribution is positively skewed. This means that the salaries below the median are packed more closely together than those above it. In other words, most of the employees (12 of 20) earn less than the mean salary of 45000. There is one extreme value in the data, 110000, that lies far from the salaries of the other employees. Maybe that is the salary of the manager!

Table 4: Data for right-skewed distribution
Mean    Median   Standard deviation   Variance    Coefficient of variation (%)   Interquartile range
45000   40000    17935.56             321684211   39.86                          14750

Figure 3: Salary of employees

Figure 4: Salary per employee

Question 3:

Series 3 (Table 5) shows the distribution of biostatistics midterm grades for a group of 29 students.

Table 5: Midterm grades of biostatistics students
Student   Midterm grade
1         61
2         65.5
3         66.75
4         70
5         70.5
6         71.9
7         72.4
8         73
9         74.9
10        75
11        75.2
12        76
15        76
13        76.8
14        77
16        78
18        78
17        78.15
19        80
20        81.2
24        81.3
21        82.3
23        81.6
27        82.2
25        81.35
22        85.2
26        85.1
28        85.15
29        93

According to the histogram in Figure 5, this distribution is symmetric. Table 6 summarizes the descriptive statistics (mean, median, standard deviation, variance, coefficient of variation, and interquartile range). The boxplot in Figure 6 also reflects the symmetry of these data. For a distribution to be symmetric, the median (Q2) must lie midway between the lower quartile (Q1) and the upper quartile (Q3). From the computations, Q1 = 73, Q2 = 77, and Q3 = 81.35, so Q2 - Q1 = 4 and Q3 - Q2 = 4.35 are nearly equal.
This means that the midterm grades of the biostatistics course are spread almost evenly on either side of the median.

Table 6: Data for symmetrical distribution
Mean    Median   Standard deviation   Variance   Coefficient of variation (%)   Interquartile range
77.05   77       6.71                 45.01      8.7                            8.35

Figure 5: Distribution of biostatistics midterm exam grades

Figure 6: Distribution of biostatistics midterm exam grades

Question 4:

This study gives an overview of the application of boxplots in food chemistry. Five examples illustrate the approach: the relative sweetness of sugars and sugar alcohols with respect to sucrose, the potassium content of fruits and vegetables, the amino acid content of egg white and yolk, the chemical composition of freshwater and saltwater fish, and the change in the fatty acid composition of soybean oil through traditional cultivation or genetic engineering techniques. The authors conclude that the boxplot is an easy way to interpret studies in food chemistry and that it provides a good overview of the data. One of the boxplots from this study (Figure 7) was selected to examine the distribution of the data.

Figure 7: Boxplots for the chemical composition (protein and fat) of freshwater and saltwater fish.

In the freshwater protein series, the median (Q2) is closer to the lower quartile (Q1) than to the upper quartile (Q3), so the distribution of this series is right-skewed. This means that the fish species between Q1 and the median span a narrower range of protein content than those between the median and Q3. The same is true for fat in both freshwater and saltwater fish: the range of fat content between Q1 and the median is narrower than the range between the median and Q3. These distributions are therefore right-skewed, or positively skewed. For protein in saltwater fish, the upper and lower parts of the box are equal, which means that Q2 lies midway between Q1 and Q3. In other words, the protein content spans an equal range on either side of the median. This distribution is symmetric.
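The rule applied to each boxplot, comparing the two halves of the box, can be made explicit with a small function. The sketch below uses the quartile skewness coefficient (often attributed to Bowley), which is not part of the original report. The quartile values behind Figure 7 are not given, so the example uses the quartiles reported for Series 1 to 3 in Questions 1 to 3. The tolerance of 0.05 for calling a distribution symmetric is an arbitrary assumption, and the function names are illustrative.

```python
def quartile_skewness(q1, q2, q3):
    """Quartile skewness: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1).

    Positive when the upper half of the box is longer (right-skewed),
    negative when the lower half is longer (left-skewed).
    """
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


def classify(q1, q2, q3, tolerance=0.05):
    """Label a distribution from its quartiles.

    Assumption: |skewness| <= tolerance counts as symmetric; the value
    0.05 is chosen for illustration only.
    """
    s = quartile_skewness(q1, q2, q3)
    if s > tolerance:
        return "right-skewed"
    if s < -tolerance:
        return "left-skewed"
    return "symmetric"


# Quartiles reported in Questions 1 to 3 of this report.
reported = {
    "Series 1 (study time)": (26.50, 39, 49.25),
    "Series 2 (salaries)": (33750, 40000, 48500),
    "Series 3 (midterm grades)": (73, 77, 81.35),
}

for name, (q1, q2, q3) in reported.items():
    print(f"{name}: {quartile_skewness(q1, q2, q3):+.3f} -> {classify(q1, q2, q3)}")
```

With this tolerance the three series are labelled left-skewed, right-skewed, and symmetric, in agreement with the conclusions of Questions 1 to 3. A different tolerance could change the label for a series whose coefficient is close to zero.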
Table 7 shows the correspondence between the boxplot shape and the distribution.

Table 7: Distribution of chemical composition (protein and fat) of freshwater and saltwater fish.
Component   Fish              Distribution
Protein     Freshwater fish   Right-skewed
Protein     Saltwater fish    Symmetric
Fat         Freshwater fish   Right-skewed
Fat         Saltwater fish    Right-skewed

Bibliography

João, E. V., and Ferreira, R. M. (2016). Box-and-Whisker plots applied to food chemistry. Journal of Chemical Education, 2026-2032.

Question 5:

Table 8 presents the data series used in this problem.

Table 8: X and Y data sets
X    2    4      7      8.3     9    12   14    14.8   18
Y    16   14.5   12.5   12.01   11   9    7.5   6.4    3

In Figure 8, the values of set Y decrease as the values of set X increase. Therefore, the two sets of related data show a negative trend. A trend line fitted to these data sets would have a negative slope.

Figure 8: Negative trend in series

Question 6:

Table 9 shows the data series used in this problem.

Table 9: X and Y data sets
X    3   4   8    9    12   15   18   30   35   40
Y    6   9   12   14   18   20   24   33   36   40

In Figure 9, the values of set Y increase as the values of set X increase, so the two sets of related data show a positive trend. The trend line for these sets will have a positive slope.

Figure 9: Positive trend in series

Question 7:

Table 10 presents the data series used in this problem.

Table 10: X and Y data sets
X    -41    -32   -28   -24   -11   6     10    12     48    110
Y    1.75   2.9   3.3   4.1   4.5   5.3   6.1   6.55   7.5   8

Table 11 shows the means and standard deviations of the two series (X, Y). The two series have the same mean but different standard deviations: series X has a much larger standard deviation than series Y.

Table 11: Two series with same mean and different standard deviation
Series   Mean   Standard deviation
X        5.0    45.49
Y        5.0    2.05
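The trend claims in Questions 5 and 6 and the values in Table 11 can be checked numerically. The sketch below computes a least-squares slope for the data in Tables 8 and 9, and the mean and standard deviation for the data in Table 10. The report does not say how its figures were obtained, so the least-squares fit and the sample (n - 1) standard deviation are assumptions, and the variable and function names are illustrative.

```python
import statistics


def slope(xs, ys):
    """Least-squares slope b of the line y = a + b*x fitted to (xs, ys)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Table 8 (Question 5)
x5 = [2, 4, 7, 8.3, 9, 12, 14, 14.8, 18]
y5 = [16, 14.5, 12.5, 12.01, 11, 9, 7.5, 6.4, 3]

# Table 9 (Question 6)
x6 = [3, 4, 8, 9, 12, 15, 18, 30, 35, 40]
y6 = [6, 9, 12, 14, 18, 20, 24, 33, 36, 40]

# Table 10 (Question 7)
x7 = [-41, -32, -28, -24, -11, 6, 10, 12, 48, 110]
y7 = [1.75, 2.9, 3.3, 4.1, 4.5, 5.3, 6.1, 6.55, 7.5, 8]

# The sign of the slope gives the direction of the trend.
print(f"Question 5 slope: {slope(x5, y5):.3f}")
print(f"Question 6 slope: {slope(x6, y6):.3f}")

# Same mean, different spread (assumption: sample standard deviation).
for name, series in (("X", x7), ("Y", y7)):
    mean = statistics.mean(series)
    sd = statistics.stdev(series)
    print(f"Series {name}: mean = {mean:.2f}, standard deviation = {sd:.2f}")
```

A negative slope for Table 8 and a positive slope for Table 9 correspond to the negative and positive trends described above.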