MATH1041 Statistics for Life and Social Science
Semester 2, 2017
Computing Assignment

Assignment release date: The assignment will be released to all students on Wednesday 20th September (Week 9) on Moodle (see the “Assessments Information” section).

Submission date: Wednesday 11th October (Week 11), before 6pm (Sydney time). Please submit your assignment through Moodle; see the “Assessments Information” section on Moodle for further information regarding online submission. You must submit a neatly typed assignment converted to PDF format.

Data: A data set (in text file format) will be sent to you via email at your official university email address (see page 2 of this document for further details).

Assignment length: No more than six single-sided A4 pages. You are required to include with your submission:
• This cover sheet as the first page (this cover sheet is also available as a single document on Moodle).
• Your written assignment (no more than three pages). This should also include the attached table completed with your answers (see last page of this document).
• Two pages for figures (a histogram, two normal quantile plots and a comparative boxplot).

Results table /5   Q1 /9   Q2 /20   Q3 /19   Q4 /7   Total /60

1(a)(b)
Mean: 6.810345
Standard deviation: 5.002304
The centre of the histogram, measured by the sample mean, is about 6.81. The data are spread between about 0 and 15. The shape of the histogram is strongly right skewed.

2(a)(i)
The null hypothesis (H0) is that the mean points per game (PPG) scored by NBA players in the 2016-2017 season is 8.40 (H0: µ = 8.40), and the alternative hypothesis is that the mean in the 2016-2017 season has changed from 8.40 (Ha: µ ≠ 8.40). The value of the t-statistic is obtained from
t = (x̄ - µ0) / (s/√n) = (6.810345 - 8.40) / (5.002304/√58) ≈ -2.42.
As this is a two-sided hypothesis test, the P-value is 2P(T ≥ |t|), where T has a t-distribution with n - 1 = 57 degrees of freedom. The P-value obtained from 2P(T ≥ 2.42) is approximately 0.0106.
As the P-value (0.0106) is smaller than 0.05, the test provides evidence at the 5% significance level that the true mean in the 2016-2017 season has changed from 8.40.

2(a)(ii)
The 95% confidence interval for the true mean PPG in the 2016-2017 NBA season is obtained from x̄ ± t* × s/√n, where t* is the 0.975 quantile of the t-distribution with 57 degrees of freedom:
(5.495056, 8.125634) = (5.50, 8.13) (2 d.p.).
Hence we are 95% confident that the true mean PPG of the 2016-2017 NBA season lies between 5.50 and 8.13. The confidence interval does not include the hypothesised mean of 8.40 and is therefore consistent with the hypothesis test in 2(a)(i), which gave evidence that the true mean has shifted.

2(b)

2(c)
The normal quantile plot indicates right-skewed data, as the points do not closely follow a straight line. In addition, there are outliers present in the plot. Even so, using the t-distribution as the sampling distribution in parts 2(a)(i) and 2(a)(ii) is reasonable, because with a sample of n = 58 the t procedures are fairly robust to skewness, although the outliers mean the results should be treated with some caution.

3(a)
(i) The null hypothesis (H0) for the transformed data is that the mean of the transformed PPG of NBA players in the 2016-2017 season is 1.63 (H0: µtrans = 1.63), and the alternative hypothesis is that this mean has changed from 1.63 (Ha: µtrans ≠ 1.63). The value of the t-statistic is obtained from
t = (x̄ - µ0) / (s/√n) = (1.54218 - 1.63) / (0.2812586/√58) ≈ -2.378.
As this is a two-sided hypothesis test, the P-value is 2P(T ≥ |t|), where T has a t-distribution with 57 degrees of freedom. The P-value obtained from 2P(T ≥ 2.378) is approximately 0.02079. As the P-value (0.02079) is smaller than 0.05, the test provides evidence at the 5% significance level that the true mean of the transformed PPG in the 2016-2017 season has changed from 1.63.

(ii) The 95% confidence interval for the true mean of the transformed PPG in the 2016-2017 NBA season is obtained from x̄ ± t* × s/√n:
(1.468227, 1.616133) = (1.47, 1.62) (2 d.p.).
Hence we are 95% confident that the true mean of the transformed PPG of the 2016-2017 NBA season lies between 1.47 and 1.62.
The confidence interval does not include the hypothesised transformed mean of 1.63 and is therefore consistent with the hypothesis test in 3(a)(i), which gave evidence that the true mean has shifted.

3(b)
The hypothesis test gave a P-value less than 0.05, so the result is statistically significant at the 0.05 level. This shows that there is evidence against the null hypothesis. The confidence interval supports this conclusion, as the null value (1.63) does not lie within the 95% confidence interval of 1.47 to 1.62.

3(c)
The transformed PPG data are closer to normally distributed: in the normal quantile plot, the points for the transformed data follow a straight line more closely than those for the untransformed data. However, it could be argued that the transformed data are still slightly right skewed, similar to the untransformed data.

4(a)
Comparative Boxplot: Age Class and PPG

4(b)
The comparative boxplot reveals a clear difference between the 25 years and over category and the 25 years and under category. The 25 years and over category has a much smaller spread than the 25 years and under category. The 25 years and under category also has two outliers. The central location of both categories is approximately 6.81.

4(c)
In conclusion, players aged 25 and under had a larger range of average PPG, suggesting better performance in comparison to the 25 years and over category. This is based on the comparative boxplot, in which the 25 years and over category has a much smaller spread than the 25 years and under category.

                                Original Data     Transformed Data
Sample size n                   58                58
Sample mean x̄                   6.810345          1.54218
Sample standard deviation s     5.002304          0.2812586
t-statistic                     -2.42             -2.38
Degrees of freedom              57                57
P-value for test                0.0106            0.02079
Reject H0 (Yes/No)              Yes               Yes
95% confidence interval         (5.50, 8.13)      (1.47, 1.62)
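
As a cross-check on the t-statistics, P-values and confidence intervals reported above, the following is a minimal Python sketch. It assumes only the summary statistics from the results table (n, sample mean, sample standard deviation) and the hypothesised means 8.40 and 1.63. The function names (t_pdf, t_cdf, t_quantile, one_sample_t) are illustrative and are not part of the assignment, and the t-distribution is evaluated by simple numerical integration rather than by the statistical software used for the analysis.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x), using Simpson's rule on the density between 0 and |x|."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Value q with P(T <= q) = p, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t(n, mean, sd, mu0, level=0.95):
    """Two-sided one-sample t-test and confidence interval from summary statistics."""
    se = sd / math.sqrt(n)            # standard error of the mean
    df = n - 1
    t = (mean - mu0) / se             # t-statistic
    p = 2 * (1 - t_cdf(abs(t), df))   # two-sided P-value, 2P(T >= |t|)
    t_star = t_quantile(1 - (1 - level) / 2, df)
    return t, df, p, (mean - t_star * se, mean + t_star * se)


# Summary statistics taken from the results table.
original = one_sample_t(58, 6.810345, 5.002304, 8.40)
transformed = one_sample_t(58, 1.54218, 0.2812586, 1.63)

for label, (t, df, p, ci) in (("original", original),
                              ("transformed", transformed)):
    print(f"{label}: t = {t:.3f}, df = {df}, P = {p:.4f}, "
          f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The printed values can be compared line by line with the entries in the results table. Working from summary statistics keeps the sketch independent of the emailed data file, at the cost of small rounding differences in the last decimal places.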