ISLAMIC UNIVERSITY OF TECHNOLOGY (IUT)
Organisation of Islamic Cooperation (OIC)
Board Bazar, Gazipur, Bangladesh.
Department of Civil and Environmental Engineering (CEE)

COURSE TITLE        : Civil Engineering Data Analysis
COURSE CODE         : CEE 4655
ASSIGNMENT NAME     : Statistical Problem Solving Using Software
STUDENT NAME        : PRETOM MD. TAHMIDUR RAHMAN
STUDENT ID          : 180051235
DATE OF SUBMISSION  : 10/03/2022
DATE OF PERFORMANCE : 29/04/2022
SUBMITTED TO        : Dr. Shakil Mohammad Rifaat, Professor

Bayes Theorem

Question:
a) Three machines A, B, and C produce (XX+5)%, (XX-10)%, and XX% of the total number of items of a factory. The percentages of defective output of these machines are 2%, 4%, and 5%. XX denotes the last two digits of the student ID.
(i) Find the probability that a randomly selected item is defective.
(ii) For a randomly selected defective item, what is the probability that machine A produced it?

Hand Calculation

Calculation using Software (Microsoft Excel)
(X1, X2, and X3 correspond to machines A, B, and C.)

P(Item produced by X1) = 0.4
P(Item produced by X2) = 0.25
P(Item produced by X3) = 0.35

P(defective/Item produced by X1) = 0.02
P(defective/Item produced by X2) = 0.04
P(defective/Item produced by X3) = 0.05

P(Item produced by A/defective) = 0.225352

Screenshot:

Comparison: The hand calculation and Microsoft Excel give the same probability.

Poisson Distribution

Question:
b) In the district of Bogura, on average 1 in 1000 houses catches fire during a year. If there are XX00 houses in Bogura, find the probability that the following number of houses catch fire during the year:
(i) Exactly 5 houses
(ii) No house
(iii) At most 1 house
XX denotes the last two digits of the student ID.
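The Poisson setup in part (b) can be cross-checked with a short Python sketch. It is only an illustration under stated assumptions: XX = 35 is taken from the last two digits of the student ID 180051235, giving 3500 houses, and the helper name `poisson_pmf` is hypothetical and not part of the assignment.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Assumption: XX = 35 (last two digits of the student ID), so XX00 = 3500.
p = 1 / 1000      # average fraction of houses that catch fire in a year
n = 3500          # number of houses in the district
lam = n * p       # Poisson mean, lambda = n * p

p_exactly_5 = poisson_pmf(5, lam)                       # (i)
p_none = poisson_pmf(0, lam)                            # (ii)
p_at_most_1 = poisson_pmf(0, lam) + poisson_pmf(1, lam)  # (iii)

print(f"P(X = 5)  = {p_exactly_5:.6f}")
print(f"P(X = 0)  = {p_none:.6f}")
print(f"P(X <= 1) = {p_at_most_1:.6f}")
```

The Poisson model is used here as an approximation to a binomial with large n and small p, which is why only the product n * p enters the calculation.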
Hand Calculation

Calculation using Software (Microsoft Excel)

p = 0.001
n = 3500
λ = 3.5

exactly 5         P(X=5) = 0.132169
none              P(X=0) = 0.030197
not more than 1   P(X≤1) = 0.135888

Screenshot:

Comparison: The values obtained from Excel and from the hand calculation are equal.

Binomial Distribution

Question:
c) The probability that a student will secure A+ in Data Analysis is 0.XX. Out of 5 students, find the probability that A+ is secured by:
(i) No student
(ii) 1 student
(iii) At least 1 student
(iv) All students
XX denotes the last two digits of the student ID.

Hand Calculation

Calculation using Software (Microsoft Excel)

p = 0.35
n = 5

none         P(X=0) = 0.116029
exactly 1    P(X=1) = 0.312386
at least 1   P(X≥1) = 0.883971
all          P(X=5) = 0.005252

Screenshot:

Comparison: The values obtained from Excel are the same as those obtained by hand calculation.

Normal Distribution

Question:
d) The mean daily salary of a laborer is Tk. (130+XX) and the standard deviation is Tk. XX. If a laborer is selected at random, find the probability that the laborer earns:
(i) Between Tk. 165 and Tk. 200 per day
(ii) Above Tk. 200 per day
(iii) Below Tk. 150 per day
XX denotes the last two digits of the student ID.

Hand Calculation

Calculation using Software (Microsoft Excel)

Parameters
Mean      165
Std Dev   35

(i)   P(165≤x≤200)   0.341344746
(ii)  P(x>200)       0.158655254
(iii) P(x<150)       0.334117571

Screenshot:

Comparison: The values obtained from the hand calculation and from Excel are equal.

ANOVA

Question:
e) An experiment was done to find the effect of the flow rate of hexafluoroethane (C2F6) on the uniformity of the etch on a silicon wafer used in the manufacture of integrated circuits. The percentage uniformity for six replicates at each of three flow rates is as follows:

C2F6 Flow    Observation
(SCCM)       1      2      3      4         5         6
125          2.7    4.6    2.6    X.X-0.5   X.X-0.3   X.X
160          4.9    4.6    5      X.X+0.7   X.X       X.X+0.7
200          4.6    3.4    2.9    X.X       X.X+0.6   X.X+1.6

Does the flow rate of C2F6 affect etch uniformity?
Use a significance level of 0.05. X.X denotes the last two digits of the student ID.

Hand Calculation

Calculation using Software (Microsoft Excel)

Anova: Single Factor

SUMMARY
Groups   Count   Sum    Average    Variance
125      6       19.6   3.266667   0.534667
160      6       26.4   4.4        0.308
200      6       23.6   3.933333   0.674667

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        3.893333   2    1.946667   3.848858   0.044753   3.68232
Within Groups         7.586667   15   0.505778
Total                 11.48      17

Screenshot of Analysis:

Comparison: The hand calculation gives a calculated F value of 3.85, and Excel gives 3.848858. The P-value from Excel is 0.044753, and the hand calculation likewise gives a P-value below 0.05. The two sets of results are therefore almost equal.

Contingency Table

Question:
f) Three medicine companies, Beximco, ACME, and Square, marketed three different medicines for cold, namely Fexo, Brodil, and Deslo respectively. A survey of their effectiveness was conducted in 2000, 2010, and 2020 on patients in a certain hospital. Some data from the surveys of these three medicines are given below:

Year   Fexo    Brodil   Deslo
2000   12      XX       94
2010   XX-13   XX+5     52
2020   97      25       XX-17

Do the data appear to be independent of year? Test the hypothesis at a significance level of α=0.05 and also find the P-value of the test statistic. XX denotes the last two digits of the student ID.
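The independence test asked for in part (f) can be sketched in Python as follows. This is a cross-check under labeled assumptions, not the method used in the report: XX = 35 is taken from the student ID 180051235, the helper name `chi_square_independence` is hypothetical, and the P-value line uses a closed form of the chi-square upper tail that is valid only for 4 degrees of freedom, which is what a 3 × 3 table gives.

```python
import math


def chi_square_independence(observed):
    """Return (statistic, df) for a chi-square test of independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if rows and columns are independent.
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Assumption: XX = 35 (last two digits of the student ID).
XX = 35
observed = [
    [12, XX, 94],           # 2000: Fexo, Brodil, Deslo
    [XX - 13, XX + 5, 52],  # 2010
    [97, 25, XX - 17],      # 2020
]

chi2, df = chi_square_independence(observed)

# Valid for df = 4 only: P(chi-square > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"chi-square = {chi2:.4f}, df = {df}, P-value = {p_value:.5e}")
```

For a table of a different size, the closed-form P-value line would not apply and a statistics library or a chi-square table would be needed instead.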
Hand Calculation

Calculation using Software (Microsoft Excel)

Observed Data
Year    Fexo   Brodil   Deslo   Total
2000    12     35       94      141
2010    22     40       52      114
2020    97     25       18      140
Total   131    100      164     395

n = 395

u1 = 0.356962    v1 = 0.331646
u2 = 0.288608    v2 = 0.253165
u3 = 0.35443     v3 = 0.41519

Expected Frequency
Year    Fexo       Brodil     Deslo      Total
2000    46.7620    35.6962    58.54177   141
2010    37.8075    28.86076   47.33165   114
2020    46.43037   35.44304   58.12658   140
Total   131        100        164        395

χ₀² Calculation: Σ[(Oi - Ei)² / Ei]
Year    Fexo          Brodil     Deslo      Total
2000    25.84144711   0.013578   21.47673   47.33176
2010    6.609255577   4.299356   0.460443   11.36905
2020    55.07787157   3.076967   27.70062   85.85546
Total   87.52857426   7.389901   49.6378    144.5563

Here,
χ₀² = 144.5563
df = 4
P-value = 2.98523E-30

Screenshot of Analysis:

Comparison: The hand calculation gives χ₀² = 146.16 and Excel gives χ₀² = 144.5563. The P-values are 2 × 10^-30 from the hand calculation and 2.98523E-30 from Excel. The two sets of values are therefore almost equal.

Paired T Test

Question:
g) Ten sprinters participated in a 10-second race before and after an exercise program. The distances covered in 10 seconds before and after exercise are:

Before   195   2XX-22   2XX      201   187   2XX-25   215   246      294      310
After    187   195      2XX-14   190   175   197      199   2XX-14   2XX+43   285

Find out whether the exercise program was effective or not. Use a significance level of 0.05. XX denotes the last two digits of the student ID.

Hand Calculation

Calculation using Software (Microsoft Excel)

                               Before     After
Mean                           230.6      214.8
Variance                       1733.6     1436.622
Observations                   10         10
Pearson Correlation            0.994433
Hypothesized Mean Difference   0
df                             9
t Stat                         8.900722
P(T<=t) one-tail               4.67E-06
t Critical one-tail            1.833113
P(T<=t) two-tail               9.35E-06
t Critical two-tail            2.262157

Screenshot of Analysis:

Comparison: Both the hand calculation and Microsoft Excel give the same t statistic, 8.9. The P-values are also almost equal.
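The paired t statistic reported above can be reproduced with the Python standard library. The sketch below assumes XX = 35 (from the student ID 180051235), so that "2XX" is read as 235. The critical value is copied from the Excel output rather than computed, since the standard library has no t distribution.

```python
import math
import statistics

# Assumption: XX = 35, so "2XX" is the three-digit number 235.
XX = 35
two_xx = 200 + XX

before = [195, two_xx - 22, two_xx, 201, 187, two_xx - 25, 215, 246, 294, 310]
after = [187, 195, two_xx - 14, 190, 175, 197, 199, two_xx - 14, two_xx + 43, 285]

# A paired test works on the per-sprinter differences.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)              # sample standard deviation (n - 1)
t_stat = mean_diff / (sd_diff / math.sqrt(n))  # t = d_bar / (s_d / sqrt(n))
df = n - 1

T_CRIT_ONE_TAIL = 1.833113  # copied from the Excel output (df = 9, alpha = 0.05)

print(f"mean difference = {mean_diff}, t = {t_stat:.6f}, df = {df}")
print("reject H0 (one-tail):", t_stat > T_CRIT_ONE_TAIL)
```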
Unpaired T Test (Equal Variances)

Question:
h-i) Wet chemical etching is often used to remove silicon from the backs of wafers prior to metallization in semiconductor manufacturing. The etch rate is an important characteristic of this process and follows a normal distribution. Two different etching solutions have been compared using two random samples of 5 wafers, one sample for each solution. The observed etch rates (in mils per minute) are as follows:

Solution 1   X.X+7.1   X.X+6.8   X.X+6.5   X.X+6.8   X.X+6.6
Solution 2   X.X+6.5   X.X+6.7   X.X+7.2   X.X+6.9   X.X+6.8

What do you conclude about the claim that the mean etch rate is the same for both solutions? Use α=0.05 and assume the population variances are equal for the two solutions. X.X denotes the last two digits of the student ID.

Hand Calculation

Calculation using Software (Microsoft Excel)

                               Solution 1   Solution 2
Mean                           10.26        10.32
Variance                       0.053        0.067
Observations                   5            5
Pooled Variance                0.06
Hypothesized Mean Difference   0
df                             8
t Stat                         -0.3873
P(T<=t) one-tail               0.354318
t Critical one-tail            1.859548
P(T<=t) two-tail               0.708635
t Critical two-tail            2.306004

Screenshot of Analysis:

Comparison: Microsoft Excel gives a t statistic of -0.3873 and a two-tailed P-value of 0.708635. The hand calculation gives a t statistic of -0.387 and a P-value between 0.5 and 0.8, i.e. greater than α=0.05. The hand calculation and the software therefore agree.

Unpaired T Test (Unequal Variances)

Question:
h-ii) The BOD levels in the lakes of IUT and JU have been measured on 10 random days, and the results (in ppm) are as follows:

IUT      JU
346.55   56.73
2XX      52.34
65.48    51.26
50       44.44
49       37.25
43.48    36.79
42.46    34.18
39.97    30.29
33.5     29.4
32.9     28.65

Draw a conclusion about the BOD levels of these two lakes at a significance level of 0.05. Assume that the population variances are not equal. XX denotes the last two digits of the student ID.
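The unequal-variance comparison in part (h-ii) can be sketched as a Welch t-test in Python. The assumptions are labeled: XX = 35 (from the student ID 180051235), so "2XX" is read as 235, and the function name `welch_t` is hypothetical. The degrees of freedom come from the Welch-Satterthwaite approximation and are generally not an integer.

```python
import math
import statistics

# Assumption: XX = 35, so the second IUT reading "2XX" is 235.
XX = 35
iut = [346.55, 200 + XX, 65.48, 50, 49, 43.48, 42.46, 39.97, 33.5, 32.9]
ju = [56.73, 52.34, 51.26, 44.44, 37.25, 36.79, 34.18, 30.29, 29.4, 28.65]


def welch_t(x, y):
    """Return (t, df) for a two-sample t-test without assuming equal variances."""
    vx = statistics.variance(x) / len(x)  # s1^2 / n1
    vy = statistics.variance(y) / len(y)  # s2^2 / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, df_welch = welch_t(iut, ju)
print(f"t = {t_stat:.6f}, df = {df_welch:.3f} (rounded: {round(df_welch)})")
```

The single large IUT reading (346.55) dominates the sample variance, which is the reason the equal-variance assumption is dropped for this data set.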
Hand Calculation

Calculation using Software (Microsoft Excel)

                               IUT        JU
Mean                           93.834     40.133
Variance                       11550.88   107.2998233
Observations                   10         10
Hypothesized Mean Difference   0
df                             9
t Stat                         1.572776
P(T<=t) one-tail               0.075111
t Critical one-tail            1.833113
P(T<=t) two-tail               0.150221
t Critical two-tail            2.262157

Screenshot of Analysis:

Comparison: Microsoft Excel gives a t statistic of 1.572776 and a two-tailed P-value of 0.150221. The hand calculation gives a t statistic of 1.57 and a P-value between 0.1 and 0.2, i.e. greater than α=0.05. The hand calculation and the software therefore agree.

Multiple Linear Regression

Question:
i) The data given below show the stack loss from a plant oxidizing ammonia to nitric acid with respect to air flow and temperature:

Air Flow   Temperature   Stack Loss
XX+45      27            42
80         27            XX
75         25            XX
62         XX-11         28
XX+27      22            18
62         23            18
62         24            19
62         24            XX-15
XX+23      23            15
58         18            14
58         18            14
58         17            13
58         18            11
XX+23      19            12
50         18            8
50         18            7
XX+15      19            8
50         19            8
50         20            9
56         XX-15         15
70         20            15

i) Find a linear regression equation for the model.
ii) Calculate R².
iii) Calculate R.
iv) Find R²adj.
v) Conduct a global test of hypothesis to test whether any of the regression coefficients are not equal to zero. Use α=0.05.
XX denotes the last two digits of the student ID.
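The regression asked for in part (i) can be sketched in plain Python by solving the normal equations (X'X)b = X'y. This is an illustrative cross-check, not the procedure used in the report: XX = 35 is assumed from the student ID 180051235, and the helper `solve_linear_system` is a hypothetical name for a small Gauss-Jordan solver written for this example.

```python
# Assumption: XX = 35 (last two digits of the student ID).
XX = 35

# Each row: (air flow, temperature, stack loss).
data = [
    (XX + 45, 27, 42), (80, 27, XX), (75, 25, XX), (62, XX - 11, 28),
    (XX + 27, 22, 18), (62, 23, 18), (62, 24, 19), (62, 24, XX - 15),
    (XX + 23, 23, 15), (58, 18, 14), (58, 18, 14), (58, 17, 13),
    (58, 18, 11), (XX + 23, 19, 12), (50, 18, 8), (50, 18, 7),
    (XX + 15, 19, 8), (50, 19, 8), (50, 20, 9), (56, XX - 15, 15),
    (70, 20, 15),
]


def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Design matrix with an intercept column, and the response vector.
X = [[1.0, float(x1), float(x2)] for x1, x2, _ in data]
y = [float(row[2]) for row in data]
k = 3  # intercept + two predictors

xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
b0, b1, b2 = solve_linear_system(xtx, xty)

n = len(y)
p = 2  # number of predictors
y_bar = sum(y) / n
y_hat = [b0 + b1 * r[1] + b2 * r[2] for r in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sst - sse                                        # regression sum of squares

r2 = ssr / sst
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat = (ssr / p) / (sse / (n - p - 1))  # global F statistic

print(f"y = {b0:.4f} + {b1:.6f} x1 + {b2:.6f} x2")
print(f"R^2 = {r2:.6f}, adjusted R^2 = {r2_adj:.6f}, F = {f_stat:.4f}")
```

Solving the normal equations directly is adequate for a small, well-scaled data set like this one; for larger or ill-conditioned problems a QR-based least-squares routine from a numerical library would be the safer choice.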
Hand Calculation

Calculation using Software (Microsoft Excel)

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.952011
R Square            0.906325
Adjusted R Square   0.895916
Standard Error      3.161565
Observations        21

ANOVA
             df   SS         MS         F          Significance F
Regression   2    1740.748   870.3739   87.07661   5.55423E-10
Residual     18   179.9189   9.995496
Total        20   1920.667

            Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   -48.0197       5.016082         -9.57315   1.74E-08   -58.55811159   -37.4813    -58.5581      -37.4813
x1          0.634746       0.123677         5.132287   6.98E-05   0.374909949    0.894581    0.37491       0.894581
x2          1.279733       0.358743         3.567275   0.002202   0.526043168    2.033424    0.526043      2.033424

RESIDUAL OUTPUT
Observation   Predicted y   Residuals
1             37.31273      4.687268
2             37.31273      -2.31273
3             31.57954      3.420463
4             22.04811      5.951889
5             19.48864      -1.48864
6             20.76838      -2.76838
7             22.04811      -3.04811
8             22.04811      -2.04811
9             18.2294       -3.2294
10            11.83073      2.169271
11            11.83073      2.169271
12            10.551        2.449005
13            11.83073      -0.83073
14            13.11046      -1.11046
15            6.752764      1.247236
16            6.752764      0.247236
17            8.032498      -0.0325
18            8.032498      -0.0325
19            9.312231      -0.31223
20            13.1207       1.879295
21            22.00714      -7.00714

Screenshot of Analysis:

Comparison: All values obtained from Excel and from the hand calculation, including the coefficients, SSE, SST, SSR, MSR, MSE, R², R²adj, and the F-value, are almost equal.

Non Parametric Statistics (Sign Test)

Question:
j) The arsenic level (in ppm) is routinely measured in a certain chemical product. The experiment provided the following data:

OBSERVATION   ARSENIC LEVEL
1             2.XX+0.35
2             2.XX+0.15
3             1.72
4             1.6
5             1.9
6             2.XX+0.25
7             1.3
8             1.81
9             2.XX-0.25
10            2.7
11            2.5
12            2.36
13            2.XX-0.35
14            1.75
15            1.42
16            1.81
17            2.XX-0.35
18            1.9
19            2.XX
20            1.93
21            2.39
22            1.61

Can it be claimed that the median arsenic level is below 2.5 ppm? State and test the appropriate hypothesis using the sign test with α=0.05, and also find the P-value.
Hand Calculation

Calculation using Software (Microsoft Excel)

Observation   Arsenic Level (xi)   xi-2.5   Sign
1             2.7                  0.2      1
2             2.5                  0        0
3             1.72                 -0.78    -1
4             1.6                  -0.9     -1
5             1.9                  -0.6     -1
6             2.6                  0.1      1
7             1.3                  -1.2     -1
8             1.8                  -0.7     -1
9             2.1                  -0.4     -1
10            2.7                  0.2      1
11            2.5                  0        0
12            2.36                 -0.14    -1
13            2                    -0.5     -1
14            1.75                 -0.75    -1
15            1.42                 -1.08    -1
16            1.81                 -0.69    -1
17            2                    -0.5     -1
18            1.9                  -0.6     -1
19            2.35                 -0.15    -1
20            1.93                 -0.57    -1
21            2.39                 -0.11    -1
22            1.61                 -0.89    -1

Median                                    2.5
Number of Positive Signs                  3
Number of Negative Signs                  17
Total number of (+)ve and (-)ve signs     20
Minimum between (+)ve and (-)ve signs     3
P-Value                                   0.001288414

Screenshot of Analysis:

Comparison: The P-values from Microsoft Excel and from the hand calculation are 0.001288414 and 0.001288 respectively, which are approximately equal.
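The sign test above reduces to a binomial tail probability, which can be checked with a few lines of Python. This sketch uses the arsenic levels exactly as listed in the Excel table; the variable names are illustrative. Observations equal to the hypothesized median are discarded, and under the null hypothesis the number of positive signs follows a Binomial(n, 0.5) distribution.

```python
import math

# Arsenic levels as listed in the Excel table above.
levels = [2.7, 2.5, 1.72, 1.6, 1.9, 2.6, 1.3, 1.8, 2.1, 2.7, 2.5,
          2.36, 2, 1.75, 1.42, 1.81, 2, 1.9, 2.35, 1.93, 2.39, 1.61]
MEDIAN_0 = 2.5  # hypothesized median (ppm)

n_plus = sum(1 for x in levels if x > MEDIAN_0)   # positive signs
n_minus = sum(1 for x in levels if x < MEDIAN_0)  # negative signs
n = n_plus + n_minus                              # ties are dropped

# One-sided P-value for H1: median < 2.5, i.e. P(R+ <= n_plus)
# with R+ ~ Binomial(n, 0.5).
p_value = sum(math.comb(n, k) for k in range(n_plus + 1)) / 2 ** n

print(f"positive = {n_plus}, negative = {n_minus}, n = {n}")
print(f"P-value = {p_value:.9f}")
```

Because the sum uses exact integer binomial coefficients, the only rounding occurs in the final division by 2^n.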