DSA8001 - Practicals NS (1)

22/09/2018

DSA8001 - Practicals & Solutions

Analysing Normally Distributed Data

Question 1

Head lengths of brushtail possums follow a nearly normal distribution with mean 92.6 mm and standard deviation 3.6 mm.

1. Compute the Z-scores for possums with head lengths of 95.4 mm and 85.8 mm.
2. Use the calculated Z-scores to determine how many standard deviations above or below the mean the measured head lengths of these two possums fall.
3. Which possum's head length is more unusual?

Solution

1. Compute the Z-scores for possums with head lengths of 95.4 mm and 85.8 mm.

Code
[1] "z1 = 0.78"
Code
[1] "z2 = -1.89"

2. A possum with a head length of 95.4 mm is 0.78 standard deviations ABOVE the mean. A possum with a head length of 85.8 mm is 1.89 standard deviations BELOW the mean.

3. Because |z2| = 1.89 > |z1| = 0.78, the possum with a head length of 85.8 mm is more unusual than the possum with a head length of 95.4 mm.

Question 2

Suppose the average number of Facebook friends is approximated well by the normal model N(mu = 1500, sigma = 300). A randomly selected person, Julie, has 1800 friends.

1. At what percentile does she fall among other Facebook users?
2. What percentage of people have more friends than Julie?

Solution

1.

Code
[1] 84.13447

Julie is at the 84.13th percentile.

2. If 84.13% of people have fewer Facebook friends than Julie, then the proportion of people who have more friends is 15.87%.

Question 3

Suppose the average number of Facebook friends is approximated well by the normal model N(mu = 1500, sigma = 300). What is the probability that a randomly selected person has AT LEAST 1630 friends on Facebook?
NOTE: Round solution to 3 decimal places.

Solution

Code
[1] 0.332
Code
[1] 0.332
Code
[1] TRUE

The probability that a randomly selected person has at least 1630 friends on Facebook is 0.332.

Question 4

Suppose the average number of Facebook friends is approximated well by the normal model N(mu = 1500, sigma = 300). A randomly selected person is at the 79.95th percentile. How many Facebook friends does this person have?

Solution

Code
[1] 1751.951

A randomly selected person at the 79.95th percentile has 1752 friends on Facebook.

Question 5

At a Heinz factory, the amounts that go into bottles of ketchup are supposed to be normally distributed with mean 36 oz. and standard deviation 0.11 oz. Once every 30 minutes a bottle is selected from the production line, and its contents are noted precisely. If the amount of ketchup in the bottle is below 35.8 oz. or above 36.2 oz., then the bottle fails the quality control inspection.

1. What percentage of bottles have less than 35.8 ounces of ketchup?
2. What percentage of bottles PASS the quality control inspection?

NOTE: Round solutions to 2 decimal places.

Solution

1.

Code
[1] "3.45% of bottles have less than 35.8 oz of ketchup."

2.

Code
[1] 0.9309637
Code
[1] 93.1

93.1% of bottles pass inspection.

Question 6

Body temperatures of healthy humans are distributed nearly normally with mean 98.2F and standard deviation 0.73F. What is the cutoff for the lowest 3% of human body temperatures?

NOTE: Round solution to 1 decimal place.
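The normal-model questions in this section reduce to a handful of calls to pnorm() and qnorm(). The sketch below is one way the quoted answers could be reproduced in base R. It is not the notebook's own code, and the helper z_score() and all variable names are illustrative assumptions.

```r
# Normal-model calculations for Questions 1-6, base R only.
# All names here are illustrative; they do not come from the notebook.

# Question 1: Z-scores for the two possum head lengths (mean 92.6, sd 3.6).
z_score <- function(x, mu, sigma) (x - mu) / sigma
z1 <- z_score(95.4, mu = 92.6, sigma = 3.6)
z2 <- z_score(85.8, mu = 92.6, sigma = 3.6)

# Questions 2-4: Facebook friends, N(mu = 1500, sigma = 300).
julie_percentile <- 100 * pnorm(1800, mean = 1500, sd = 300)
p_at_least_1630  <- pnorm(1630, mean = 1500, sd = 300, lower.tail = FALSE)
friends_at_7995  <- qnorm(0.7995, mean = 1500, sd = 300)

# Question 5: ketchup bottles, N(36, 0.11); a bottle passes between 35.8 and 36.2 oz.
p_below_358 <- pnorm(35.8, mean = 36, sd = 0.11)
p_pass      <- pnorm(36.2, mean = 36, sd = 0.11) - p_below_358

# Question 6: cutoff for the lowest 3% of body temperatures, N(98.2, 0.73).
temp_cutoff <- qnorm(0.03, mean = 98.2, sd = 0.73)

print(round(c(z1 = z1, z2 = z2), 2))
print(round(c(percentile = julie_percentile, at_least_1630 = p_at_least_1630), 3))
print(round(c(friends = friends_at_7995, cutoff_F = temp_cutoff), 1))
print(round(100 * c(below_35.8 = p_below_358, pass = p_pass), 2))
```

pnorm() returns the area to the left by default, so "at least" questions use lower.tail = FALSE, while percentile-to-value questions go through qnorm().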
Solution

Code
[1] "The cutoff value for the lowest 3% of human body temperature is 96.8F"

Statistical Inference

Test of single mean (mu) using the Z statistic

Question 1

The mean content of a sample of 120 bottles of milk from one day's output of a dairy was found to be 0.9975 litres, while the standard deviation of the sample was 0.012 litres. Investigate whether there is evidence to suggest that the mean content of that day's output is different from 1 litre.

Solution

Code
[1] "z = -2.3"
Code
[1] "p_value = 0.022"
Code
[1] "Test is significant at 5% level, because 1% < p = 2.2% <= 5%. There is considerable evidence for rejection H0 in favour HA."
Code
[1] "95% C.I. is [0.99, 1.01]"

Test of the comparison of two means using the Z statistic

Question 1

Suppose we wish to determine if there is a difference in mean weight between the two sexes in a particular bird species, at a 5% significance level. The following data were obtained:

Male: sample size n1 = 125, mean weight x1_bar = 92.31 g, variance var1 = 56.22 g^2
Female: sample size n2 = 85, mean weight x2_bar = 88.84 g, variance var2 = 65.41 g^2

If significant, give a 95% CI for mu1 - mu2.

Solution

Code
[1] "z = 3.1"
Code
[1] "p_value = 0.002"
Code
[1] "Test is highly significant at 5% level, because 0.1% < p = 0.2% <= 1%. There is considerable evidence for rejection H0 in favour HA."
Code
[1] "95% C.I. is [1.31, 5.63]"

Test of single proportion

Question 1

In a random sample of 120 graduates, 78 spent 3 years at university and 42 more than 3 years. Test the hypothesis that 70% of graduates obtain degrees in 3 years. Give a 95% c.i. for the population proportion.

NOTE: Round solution to 3 decimal places.
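The Z-based tests in this part of the practical (single mean, two means, and the single proportion asked for here) share one calculation: an estimate minus its null value, divided by a standard error. A minimal sketch in base R follows. The helper z_test() and the variable names are assumptions made for illustration and are not taken from the notebook; the single-proportion standard error is computed under the null value p = 0.7.

```r
# Two-sided Z test: returns the statistic and its p-value.
# z_test() is a hypothetical helper, not a function from the notebook or any package.
z_test <- function(estimate, null_value, se) {
  z <- (estimate - null_value) / se
  list(z = z, p_value = 2 * pnorm(-abs(z)))
}

# Single mean (milk bottles): n = 120, sample mean 0.9975, sample sd 0.012, H0: mu = 1.
milk <- z_test(0.9975, null_value = 1, se = 0.012 / sqrt(120))

# Two means (bird weights): H0: mu1 - mu2 = 0, unpooled standard error.
diff_means <- 92.31 - 88.84
se_diff    <- sqrt(56.22 / 125 + 65.41 / 85)
birds      <- z_test(diff_means, null_value = 0, se = se_diff)
birds_ci   <- diff_means + c(-1, 1) * qnorm(0.975) * se_diff

# Single proportion (graduates): 78 of 120, H0: p = 0.7.
p_hat <- 78 / 120
grads <- z_test(p_hat, null_value = 0.7, se = sqrt(0.7 * 0.3 / 120))

# Built-in cross-check: without continuity correction, X-squared equals z^2.
grads_alt <- prop.test(x = 78, n = 120, p = 0.7, correct = FALSE)

print(round(c(milk_z = milk$z, birds_z = birds$z, grads_z = grads$z), 1))
print(round(c(milk_p = milk$p_value, birds_p = birds$p_value, grads_p = grads$p_value), 3))
print(round(birds_ci, 2))
```

Comparing grads$z^2 with grads_alt$statistic is a quick way to confirm that the hand calculation and prop.test() agree.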
Solution

Code
[1] "z = -1.2"
Code
[1] "p_value = 0.232"
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA because p_value = 23.2% > 5%."
Code
[1] "95% C.I. is [0.568, 0.732]"
Code

	1-sample proportions test without continuity correction

data:  c(78) out of c(120), null probability c(0.7)
X-squared = 1.4286, df = 1, p-value = 0.232
alternative hypothesis: true p is not equal to 0.7
95 percent confidence interval:
 0.5612132 0.7294810
sample estimates:
   p
0.65

Code
[1] "p_value_alt = 0.232"
Code
[1] TRUE
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA because p_value_alt = 23.2% > 5%."
Code
[1] "95% C.I. is [0.561, 0.729]"

Test of two proportions

Question 1

We wish to compare the germination rates of spinach seeds for two different methods of preparation:

Method A: 80 seeds sown, 65 germinate
Method B: 90 seeds sown, 80 germinate

NOTE: Round solution to 3 decimal places.

Solution

Code
[1] "z = -1.4"
Code
[1] "p_value = 0.16"
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA because p_value = 16% > 5%."
Code
[1] "95% C.I. is [-0.184, 0.031]"
Code

	2-sample test for equality of proportions without continuity correction

data:  c(x1, x2) out of c(n1, n2)
X-squared = 1.9703, df = 1, p-value = 0.1604
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.1837708  0.0309930
sample estimates:
   prop 1    prop 2
0.8125000 0.8888889

Code
[1] "p_value_alt = 0.16"
Code
[1] TRUE
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA because p_value_alt = 16% > 5%."
Code
[1] "95% C.I. is [-0.184, 0.031]"

Tests based on the t-distribution (Single Mean)

Question 1

What proportion of the t-distribution with 18 degrees of freedom falls below -2.10?

NOTE: Round solution to 3 decimal places.

Solution

Code
[1] 0.025
Code
[1] 0.025

Question 2

What proportion of the t-distribution with 20 degrees of freedom falls above 1.65?

NOTE: Round solution to 3 decimal places.

Solution

Code
[1] 0.057

Question 3

What proportion of the t-distribution with 2 degrees of freedom falls 3 standard deviations from the mean (above or below)?

NOTE: Round solution to 3 decimal places.

Solution

Code
[1] "Proportion below 3 SD: 0.0477329831333546"
Code
[1] "Proportion above 3 SD: 0.0477329831333546"
Code
[1] "Proportion of the t-distribution that falls 3SD above or below the mean is: 0.095"
Code
[1] "Proportion of the t-distribution that falls 3SD above or below the mean is: 0.095"
Code
[1] "Proportion of the t-distribution that falls 3SD above or below the mean is: 0.095"

Question 4

The temperature of warm water springs in a basin is reported to have a mean of 38C. A sample of 12 springs from the west end of the basin had mean temperature 39.4 and variance 1.92.

1. Do springs at the west end have a greater mean temperature?
2. Do springs at the west end have a different mean temperature? Give a 95% c.i. for the mean temperature.

NOTE: Round solution to 3 decimal places.

Solution

Code
[1] "Under H0, t=3.5 is an observation from t11"

1.

Code
[1] "p_value = 0.002"
Code
[1] "Test is highly significant at 5% level, because 0.1% < p = 0.2% <= 1%. There is considerable evidence for rejection H0 in favour HA."

2.

Code
[1] "p_value = 0.004"
Code
[1] "Test is highly significant at 5% level, because 0.1% < p = 0.4% <= 1%. There is considerable evidence for rejection H0 in favour HA."
Code
[1] 38.5196 40.2804
Code
[1] "95% C.I. is [38.5196, 40.2804]"

Question 5

A sweets-producing company was interested in the mean net weight of contents in an advertised 80-gram pack. The manufacturer has precisely weighed the contents of 24 randomly selected 80-gram packs from different stores and recorded the weights as follows:

Code

1. Investigate the hypothesis that the sweets content in the packages is less than what is claimed on the package.
2. Investigate the hypothesis that the sweets content in the packages is different from what is claimed on the package. Give a 95% c.i. for the mean weight.

NOTE: Round solution to 3 decimal places.

Solution

Code
[1] 79.36917
Code
[1] "Under H0, t=-0.947 is an observation from t23"

1.

Code
[1] "p_value = 0.177"
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA, because p = 17.7% > 5%."
Code
[1] TRUE
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA, because p = 17.7% > 5%."

2.

Code
[1] "p_value = 0.354"
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA, because p = 35.4% > 5%."
Code
[1] 77.991 80.748
Code
[1] "95% C.I. is [77.991, 80.748]"
Code
[1] TRUE
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA, because p = 35.4% > 5%."
Code
[1] "95% C.I. is [77.991, 80.747]"

Tests based on the t-distribution (Paired Comparison)

Question 1

Consider an experiment to compare the effects of two sleeping drugs A and B. There are 10 subjects and each subject receives treatment with each of the two drugs (the order of treatment being randomised). The number of hours slept by each subject is recorded. Is there any difference between the effects of the two drugs? Give a 95% c.i. for the unknown mean difference.

NOTE: Round solution to 3 decimal places.

Code

Solution

Code
[1] 1.58
Code
[1] "Under H0, t=4.062 is an observation from t9"
Code
[1] "p_value = 0.003"
Code
[1] "Test is highly significant at 5% level, because 0.1% < p = 0.3% <= 1%. There is considerable evidence for rejection H0 in favour HA."
Code
[1] 0.70 2.46
Code
[1] "95% C.I. is [0.7, 2.46]"
Code
[1] TRUE
Code
[1] "Test is highly significant at 5% level, because 0.1% < p_alt = 0.3% <= 1%. There is considerable evidence for rejection H0 in favour HA."
Code
[1] "95% C.I. is [0.7, 2.46]"

Tests based on the t-distribution (Independent Samples)

Question 1

Two methods of oxidation care are used in an industrial process. Repeated measurements of the oxidation time are made to test the hypothesis that the oxidation time of method 1 is, on average, different from that of method 2.

Method 1: Sample size = 9, Sample mean = 41.3, Sample variance = 20.7
Method 2: Sample size = 8, Sample mean = 48.9, Sample variance = 34.2

Assuming that the unknown variances are equal, investigate whether there is any difference between the oxidation times of the two methods. Give a 95% c.i. for the unknown mean difference.

Solution

Code
[1] "Under H0, t=-3.01 is an observation from t15"
Code
[1] "p_value = 0.009"
Code
[1] "Test is highly significant at 5% level, because 0.1% < p = 0.9% <= 1%. There is considerable evidence for rejection H0 in favour HA."
Code
[1] -12.981 -2.219
Code
[1] "95% C.I. is [-12.981, -2.219]"

Question 2

Two methods of oxidation care are used in an industrial process. Repeated measurements of the oxidation time are made to test the hypothesis that the oxidation time of method 1 is, on average, different from that of method 2. The following measurements were recorded:

Method 1: c(29.915269, 8.920123, 36.647273, 54.038639, 37.583526, 19.860171, 13.470132, 43.139612, 39.825299)
Method 2: c(28.970122, 43.563546, 4.161069, 39.774523, 5.705720, 93.562336, 3.801087, 79.906087)

Assuming that the unknown variances are equal, investigate whether there is any difference between the oxidation times of the two methods. Give a 95% c.i. for the unknown mean difference.

Solution

Code
[1] "Under H0, t=-0.472 is an observation from t15"
Code
[1] "p_value = 0.644"
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA because p = 64.4% > 5%."
Code
[1] -32.762 20.879
Code
[1] "95% C.I. is [-32.762, 20.879]"

Or Alternatively

Code
[1] TRUE
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA because p = 64.4% > 5%."
Code
[1] "95% C.I. is [-32.768, 20.884]"

Tests based on the Chi-Square distribution

Question 1

What proportion of the chi-square distribution with 9 degrees of freedom falls above 17?

Solution

Code
[1] 0.049
Code
[1] 0.049

Question 2

The geneticist Mendel evolved the theory that, for a certain type of pea, the characteristics Round and Yellow, Round and Green, Angular and Yellow, and Angular and Green occurred in the ratio 9:3:3:1. He classified 556 seeds and the observed frequencies were 315, 108, 101 and 32. Test Mendel’s theory on the basis of these data.
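The two chi-square questions above can be worked through in a few lines of base R. This is a sketch under stated assumptions rather than the notebook's own code: the variable names are illustrative, and the category labels attached to the four observed counts follow the order in which the question lists them.

```r
# Question 1: area of the chi-square distribution with 9 df that lies above 17.
upper_tail <- pchisq(17, df = 9, lower.tail = FALSE)

# Question 2: goodness of fit for Mendel's 9:3:3:1 ratio.
observed <- c(round_yellow = 315, round_green = 108,
              angular_yellow = 101, angular_green = 32)
ratio    <- c(9, 3, 3, 1) / 16

# Expected counts under H0, the test statistic, and its upper-tail p-value.
expected  <- sum(observed) * ratio
chi_sq    <- sum((observed - expected)^2 / expected)
deg_free  <- length(observed) - 1
p_value   <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

# Built-in equivalent for the same goodness-of-fit test.
gof <- chisq.test(observed, p = ratio)

print(round(upper_tail, 3))
print(round(c(chi_sq = chi_sq, df = deg_free, p_value = p_value), 3))
print(gof)
```

With four categories and no parameters estimated from the data, the statistic is referred to a chi-square distribution with 4 - 1 = 3 degrees of freedom.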
Solution

Code
[1] "Under H0, phi_squared=0.47 is an observation from Chi-square_3"
Code
[1] "p_value = 0.925"
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA because p = 92.5% > 5%."

Or Alternatively

Code
[1] TRUE
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA because p = 92.5% > 5%."

Question 3

In a random sample of 120 graduates, 78 spent 3 years at university and 42 more than 3 years. Test the hypothesis that 70% obtain a degree in 3 years.

Solution

Code
[1] 1.2
Code
[1] "Under H0, phi_squared=1.2 is an observation from Chi-square_1"
Code
[1] "p_value = 0.273"
Code
[1] "The test is not significant at 5% level and H0 is not rejected in favour of HA because p = 27.3% > 5%."

Analysis of Variance (ANOVA)

Question 1

The Iris data set that comes preloaded with R gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.

1. Create a copy of the Iris dataset that contains the relevant variables, and:
1.1. Perform exploratory data analysis which will reveal how many flowers are in each category, how many sepal width values are missing, and the sepal width mean and standard deviation for each category.
1.2. Perform graphical analysis which will reveal how the collected sepal width values compare across the species (histograms and boxplots).
1.3. Investigate whether there is a difference between the means of the sepal width variable among these three species and, if there is, investigate further which species have a statistically significant difference between their means.
2. Similarly, repeat the investigation process for the other variables.

Solution

1. Create a copy of the iris dataset that contains the relevant variables.

Code
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Code
Observations: 150
Variables: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7,...
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0, 4.4, 3.9, 3.5, 3.8,...
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7,...
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3,...
$ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, setosa, setosa, setosa, setosa, setosa...

Code

1.1. Perform exploratory data analysis which will reveal how many flowers are in each category, how many sepal width values are missing, and the sepal width mean and standard deviation for each category.

Code
# A tibble: 3 x 5
  Species    n_samples n_missing mean_sepal_width sd_sepal_width
  <fct>          <int>     <int>            <dbl>          <dbl>
1 setosa            50         0             3.43          0.379
2 versicolor        50         0             2.77          0.314
3 virginica         50         0             2.97          0.322

1.2. Perform graphical analysis which will reveal how the collected sepal width values compare across the species (histograms and boxplots).

Code
Code
Code
Code

1.3. Investigate whether there is a difference between the means of the sepal width variable among these three species and, if there is, investigate further which species have a statistically significant difference between their means.

H0: The mean sepal width is the same across all species
HA: At least one mean is different from the others

Code
             Df Sum Sq Mean Sq F value Pr(>F)
Species       2  11.35   5.672   49.16 <2e-16 ***
Residuals   147  16.96   0.115
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Code
[1] "p_value = 0"
Code
[1] "The test is very significant at 5% level because p = 0% < 5%. We are very confident that HA is to be preferred to H0, i.e. at least one mean is different from the others"

Check ANOVA conditions

Assuming the independence of the data, we should check whether the other conditions are valid as well.

Condition 1: The variability across the groups should be about equal.

Code

In the plot above, there is no evident relationship between residuals and fitted values, which implies equal variances across the groups (homogeneity of variances).

Alternatively, we can use Levene’s test to test equality of variances:

Code
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   2  0.5902 0.5555
      147

Because the p-value obtained from Levene’s test is p = 55.5% > 5%, the test is not significant and there is no evidence to suggest that the variances differ across species (i.e. we can assume equality of variances).

Condition 2: The observations within each group should be nearly normal.

This can be difficult to assess group by group in many real situations, so we mean-center each sepal width by its respective group mean. These group-wise, mean-centered values are also known as residuals, and by using them we can assess the normality of all observations as a whole.

Code

Alternatively, to test the normality of the residuals we can use the Shapiro-Wilk test:

Code

	Shapiro-Wilk normality test

data:  anova_test_residuals
W = 0.98948, p-value = 0.323

Because the p-value obtained from the Shapiro-Wilk test is p = 32.3% > 5%, the test is not significant and there is no evidence to suggest that the normality assumption is violated (i.e. we can assume normality of the residuals).

As we concluded that at least one pair of means differs, and because we do not know which one, we need to use t-tests with Bonferroni correction to compare each pair of means with each other (i.e. multiple comparisons).

Code

	Pairwise comparisons using t tests with pooled SD

data:  iris_copy$Sepal.Width and iris_copy$Species

           setosa  versicolor
versicolor < 2e-16 -
virginica  1.4e-09 0.0094

P value adjustment method: bonferroni

Conclusions:

1. In the case of the mean difference in sepal widths between the species versicolor and virginica, the test is highly significant at the 5% level because 0.1% < p = 0.94% < 1%. There is considerable evidence for rejecting H0 in favour of HA, i.e. considerable evidence of a difference between the average sepal widths of the species versicolor and virginica.

2. In the other two comparisons (setosa-versicolor and setosa-virginica), the test is very highly significant at the 5% level because p = 0% < 0.1%. Therefore, we are very confident that HA is to be preferred to H0, i.e. very confident that there is a difference between the average sepal widths in the pairs setosa-versicolor and setosa-virginica.
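The sepal-width analysis above can be sketched in base R as follows. This is an illustrative reconstruction, not the notebook's own code: iris_copy and anova_test_residuals are names that appear in the printed output, while every other name is an assumption. Levene's test is left out because it is not part of base R, and the summary table is built without the tibble tooling the notebook appears to use.

```r
# Work on a copy of the built-in iris data.
iris_copy <- iris

# 1.1 Counts, missing values, mean and sd of sepal width per species.
summary_by_species <- do.call(rbind, lapply(
  split(iris_copy$Sepal.Width, iris_copy$Species),
  function(x) data.frame(
    n_samples        = length(x),
    n_missing        = sum(is.na(x)),
    mean_sepal_width = mean(x, na.rm = TRUE),
    sd_sepal_width   = sd(x, na.rm = TRUE)
  )
))

# 1.3 One-way ANOVA: H0 says the mean sepal width is equal across species.
anova_fit   <- aov(Sepal.Width ~ Species, data = iris_copy)
anova_table <- summary(anova_fit)[[1]]

# Condition 2: Shapiro-Wilk test on the group-wise mean-centered values (residuals).
anova_test_residuals <- residuals(anova_fit)
shapiro_result <- shapiro.test(anova_test_residuals)

# Multiple comparisons: pairwise t tests with pooled SD and Bonferroni adjustment.
pairwise_result <- pairwise.t.test(iris_copy$Sepal.Width, iris_copy$Species,
                                   p.adjust.method = "bonferroni")

print(summary_by_species)
print(anova_table)
print(shapiro_result)
print(pairwise_result)
```

The Bonferroni adjustment multiplies each raw pairwise p-value by the number of comparisons (three here) and caps the result at 1, which keeps the overall type I error rate at or below the nominal level.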
Simple Linear Regression

Question 1

The Iris data set that comes preloaded with R gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.

Create a copy of the Iris dataset that contains the variables Petal.Width and Petal.Length, and:

1. Fit a simple linear regression model for predicting petal widths based on the petal lengths.
2. Plot the line of best fit against the input dataset.
3. State the estimated simple linear regression equation.
4. State whether there is a significant relationship between the predictor and response variable.
5. Assuming that both petal width and length values are given in millimetres (mm), state the interpretation of the estimated slope parameter.
6. If the petal lengths are 1.5, 1.6 and 1.7 mm, what are their estimated petal widths?

Solution

Code

1.

Code

2.

Code

3.

Code

Call:
lm(formula = Petal.Width ~ Petal.Length, data = iris_copy)

Residuals:
     Min       1Q   Median       3Q      Max
-0.56515 -0.12358 -0.01898  0.13288  0.64272

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.363076   0.039762  -9.131  4.7e-16 ***
Petal.Length  0.415755   0.009582  43.387  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2065 on 148 degrees of freedom
Multiple R-squared:  0.9271, Adjusted R-squared:  0.9266
F-statistic:  1882 on 1 and 148 DF,  p-value: < 2.2e-16

The summary output shows the following components:

Call: Displays the formula that has been used to fit the regression model.

Residuals: Displays summary statistics of the residuals, which by definition have a mean equal to zero. Therefore, as an indication of normally distributed residuals, the median should not be far from zero, and the minimum and maximum should be roughly equal in absolute value. NOTE: As with ANOVA, the normality of the residuals can be inspected using the Shapiro-Wilk test.

Coefficients: Displays the values of the intercept and slope parameters, and their statistical significance.

ANSWER: The estimated simple regression equation is:

Petal.Width_Estimated = -0.363076 + 0.415755 * Petal.Length

4. By looking at the summary output above, we can see that the p-value < 0.01%, so we can reject the null hypothesis that β_1 = 0. Because the test is very highly significant at the 5% level, we are very confident that there is a significant relationship between the predictor and the response variable in the linear regression model.

5. For each additional mm of petal length, we would expect petal width to increase by 0.415755 mm.

6.

Code
        1         2         3
0.2605576 0.3021331 0.3437087
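The regression steps in this question can be sketched in base R as follows. The model formula and the name iris_copy match the Call shown in the summary output; fit, new_lengths and predicted_widths are illustrative names, and the plotting choices are assumptions rather than the notebook's own.

```r
# Copy of iris restricted to the two variables the question asks for.
iris_copy <- iris[, c("Petal.Length", "Petal.Width")]

# 1. Fit the simple linear regression of petal width on petal length.
fit <- lm(Petal.Width ~ Petal.Length, data = iris_copy)

# 2. Scatter plot of the data with the fitted line drawn over it.
plot(iris_copy$Petal.Length, iris_copy$Petal.Width,
     xlab = "Petal.Length", ylab = "Petal.Width")
abline(fit)

# 3. and 4. Estimated coefficients with standard errors, t values and p-values.
coef_table <- summary(fit)$coefficients
print(coef_table)

# 6. Predicted petal widths for petal lengths of 1.5, 1.6 and 1.7.
new_lengths      <- data.frame(Petal.Length = c(1.5, 1.6, 1.7))
predicted_widths <- predict(fit, newdata = new_lengths)
print(predicted_widths)
```

predict() expects the new data frame to use the same column name as the predictor in the model formula, here Petal.Length.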
