BOLGATANGA POLYTECHNIC
School of Applied Science and Arts
Department of Statistics
End of First Semester Examination: 2013/2014 (December 2013)
Course: Statistical Computing I (STA217)
Marking scheme
Software: QBASIC/JUST BASIC & MICROSOFT OFFICE EXCEL

Q1 (a)
    input "Sample size: "; n
    input "Hypothesized variance: "; h
    for i = 1 to n
        input "x";i; " = "; x: x(i)= x
        sum = sum + x(i)
    next i
    ave=sum/n
    cls
    print "THE SUM = "; sum
    print
    print "MEAN = "; ave
    ' THIS PORTION CALCULATES AND PRINTS
    ' THE VARIANCE
    for i = 1 to n
        diff = x(i) - ave
        s = diff^2
        sum2 = sum2 + s
    next i
    D = (sum2/(n-1))
    print
    print "THE VAR = "; D
    ' Chi-square statistic
    chi = (D/h)*(n-1)
    print
    print "Chi(";(n-1);")"; " = "; chi
    end

Marks (as listed in the original marks column): ½, ½, ½, 1, then fifteen entries of ½ mark each.

(b) Advantages of array formulae (any three of the following) (3 marks)
i. Array formulae are concise: You can eliminate columns and rows by packing the calculations into an array formula.
ii. Array formulae are powerful: You can easily perform many complex calculations, such as multiple conditional sums and counts, using array formulae.
iii. Array formulae save disk space: An array formula covering a range of cells can reduce workbook size compared with equivalent individual formulae. Memory use, however, does not usually fall significantly.
iv. Array formulae offer increased protection: Excel only allows a complete array formula block to be altered, so the user is prevented from accidentally changing a single formula.

Disadvantages of array formulae (any two of the following) (2 marks)
i. The black-box effect: Array formulae can be complex and hard to understand. Most Excel users do not understand array formulae at all. This can reduce confidence in, and the usability of, your spreadsheet.
ii.
Calculation Overhead: Each time an array formula is calculated, all of the virtual cells it needs are calculated, whether or not they are required. This may make the array formula slower than an equivalent set of non-array formulae.
iii. Requirement for all components of the array formula to be the same size: This may force the array formula to perform a large number of unnecessary calculations.

(c) The student is expected to design an Excel template similar to the figure below: (10 marks)

Q2. (a)
    Row  B                    C      D      E              F
    3    product              KB     KR
    4    barrels to produce   18     24
    5    profit per barrel    ¢20    ¢30    total profit   ¢1,080.0
    9    ingredients          KB     KR     total usage    qty available
    10   corn                 20            360            500
    11   hops                 2      1      60             60
    12   malt                 10     30     900            900
(6 marks)

Objective function: F5 = SUMPRODUCT(C4:D4,C5:D5) (2 marks)

Decision variables:
E10 - amount of corn required for a given production
E11 - amount of hops required for a given production
E12 - amount of malt required for a given production
E10 = SUMPRODUCT($C$4:$D$4,C10:D10) - applied to E11 and E12 (2 marks)

(b)
    INPUT A, B              ½ mark
    IF A > B THEN           ½ mark
    GOTO [GREATER]          ½ mark
    END IF                  ½ mark
    Y = A + B               ½ mark
    GOTO [QUIT]             ½ mark
    [GREATER]               ½ mark
    Y = A * B               ½ mark
    [QUIT]                  ½ mark
    PRINT "Y = "; Y         ½ mark
    END                     ½ mark

Q3 (a)
    INPUT N                 ½ mark
    K = 1                   ½ mark
    J = 1                   ½ mark
    [START]                 ½ mark
    T = 1/K                 ½ mark
    S = S + T * J           ½ mark
    K = K + 2               ½ mark
    J = 0 - J               ½ mark
    IF K <= N THEN          ½ mark
    GOTO [START]            ½ mark
    END IF                  ½ mark
    PRINT "S = "; S         ½ mark
    END

(b)
    Row  A                    B    C    D    E    F    G    H    I    J    K
    1    Candidate            1    2    3    4    5    6    7    8    9    10
    2    Score Before         38   41   52   27   18   19   14   50   38   40
    3    Score After          40   45   49   30   24   24   19   49   36   39
    4    Difference           2    4    -3   3    6    5    5    -1   -2   -1
    6    Average Difference   1.8
    7    Standard Deviation   3.293
    8    Count                10
    9    Significance Level   0.05
    10   T-Calculated         1.729
    11   T-Critical           2.262
    12   Decision             Do not reject null hypothesis

Marks (in the order listed in the original Marks column): 1, ½, ½, ½, ½, ½, ½, ½

Formulae
    B6  = AVERAGE(B4:K4)                                                           ½
    B7  = ROUND(STDEV(B4:K4),3)                                                    ½
    B8  = COUNT(B4:K4)                                                             ½
    B10 = ROUND(ABS(B6-0)/(B7/SQRT(B8)),3)                                         1½
    B11 = ROUND(TINV(B9,B8-1),3)                                                   1
    B12 = IF(B10<B11,"Do not reject null hypothesis","Reject null hypothesis")     1½
Total marks 10

(c)
Coefficients (B2:F6) (½ mark)
     2    1    1    1    2
     2    2    3   -1   -1
     4   -1   -1    2    1
     6   -2    0    1   -2
     2    0    1   -2    1

Constants (H2:H6) (½ mark)
     7
     8
     1
    -3
     2

Inverse of coefficient matrix (B10:F14): =ROUND(MINVERSE(B2:F6),3)
    -0.385    0.269    0.538   -0.192    0.115
    -1.333    1        1.667   -1        0
     1.513   -0.692   -1.718    0.923   -0.154
     0.615   -0.231   -0.462    0.308   -0.385
     0.487   -0.308   -0.282    0.077    0.154

Answer: (2 marks)
=MMULT(B10:F14,H2:H6) (2 marks)
x1 = 0.801, x2 = 3.336, x3 = 0.26 (2 marks)
x4 = 0.301, x5 = 0.74 (2 marks)

Q4. (a) Data entry (1 mark)

ANOVA
    Source of Variation   SS      df   MS      F        P-value   F crit
    Temperature           0.301   2    0.151   8.469    0.009     4.256
    Pressure              0.768   2    0.384   21.594   0.000     4.256
    Interaction           0.069   4    0.017   0.969    0.470     3.633
    Error                 0.160   9    0.018
    Total                 1.298   17
(3 marks)

i. Hypothesis:
H0: No main factor effect
H1: There is a main factor effect (1 mark)
Both pressure and temperature are significant at the 5% significance level. Thus, both factors influence yield. (2 marks)

ii. Hypothesis:
H0: There is no interaction
H1: There is interaction (1 mark)
There is no evidence of interaction between the factors at the 5% level of significance. (2 marks)

(b)
    INPUT "SAMPLE SIZE = "; N
    PRINT "ENTER THE X-VALUES"
    FOR I = 1 TO N
        INPUT "X";I; " = "; X:X(I)= X
        SUM = SUM + X(I)
    NEXT I
    AVE1 = SUM/N
    FOR I = 1 TO N
        SSX = X(I)-AVE1
        SX = SSX^2
        DV = DV + SX
    NEXT I
    PRINT "ENTER THE Y-VALUES"
    FOR I = 1 TO N
        INPUT "Y"; I; " = "; Y:Y(I)= Y
        SUM1 = SUM1 + Y(I)
    NEXT I
    AVE2 = SUM1/N
    FOR I = 1 TO N
        SSXY = ((X(I)- AVE1)*((Y(I)-AVE2)))
        SXY = SXY + SSXY
    NEXT I
    ' The divisor in GRAD is DV, so DV is the value that must be non-zero
    IF DV = 0 THEN
        PRINT "DIVISION BY ZERO (0) NOT POSSIBLE"
        GOTO [QUIT]
    END IF
    GRAD = SXY/DV
    CONST = AVE2 - GRAD*AVE1
    IF GRAD<0 THEN
        G1 = 0-GRAD
        PRINT "Y = "; CONST; " - "; G1; "*X"
    ELSE
        PRINT "Y = "; CONST; " + "; GRAD; "*X"
    END IF
    [QUIT]
    END

Marks (as listed in the original marks column): thirty entries of ½ mark each.

Q5 (a)
i. A flowchart is a pictorial presentation of an algorithm that shows its logical structure clearly. (2 marks)
ii. An algorithm is a step-by-step procedure for solving a problem. (2 marks)
iii. An array is a list of variables of the same type. (2 marks)
iv. A prompt is a message displayed by the computer indicating that it is waiting for the user to enter instructions or data. (2 marks)
v. Trailer data is an extra data item entered after all the real data to signal that the real data have all been entered. (2 marks)

(b)
    OPEN "members.dat" FOR RANDOM AS #1 LEN=256                               1½ mark
    FIELD #1, 90 AS Name$, 110 AS Address$, 50 AS Rank$, 6 AS IDnumber        1½ mark
    Name$ = "John Q. Public"                                                  ½ mark
    Address$ = "456 Maple Street, Anytown, USA"                               ½ mark
    Rank$ = "Expert Programmer"                                               ½ mark
    IDnumber = 99                                                             ½ mark
    PUT #1, 3                                                                 1 mark
    GET #1,3                                                                  1 mark
    Print Name$                                                               ½ mark
    Print Rank$                                                               ½ mark
    Print IDnumber                                                            ½ mark
    close #1                                                                  1 mark
    end                                                                       ½ mark

(c) The formula below produces the frequency table:
=FREQUENCY(A1:A1000,C3:C8)

    Bin     Frequency
    10      160
    17      158
    31      270
    38      162
    45      158
    >45     92
    Total   1000
(5 marks)

Q6 (a) (i) Equality of variances

F-Test Two-Sample for Variances
                          H1         H2
    Mean                  486.000    544.933
    Variance              9369.571   9770.781
    Observations          15.000     15.000
    df                    14.000     14.000
    F                     0.959
    P(F<=f) one-tail      0.469
    F Critical one-tail   0.403
(2 marks)

H0: Equal variance
H1: Unequal variance (1 mark)

Since the p-value (0.469) is greater than the 5% significance level, we fail to reject the null hypothesis of equal variance. Therefore, we conclude that the two samples have equal variances.
(2 marks)

(ii) Homoscedastic t-test, or two-sample t-test assuming equal variance (2 marks)

(iii) t-Test: Two-Sample Assuming Equal Variances
                                   H1         H2
    Mean                           486.000    544.933
    Variance                       9369.571   9770.781
    Observations                   15.000     15.000
    Pooled Variance                9570.176
    Hypothesized Mean Difference   0.000
    df                             28.000
    t Stat                         -1.650
    P(T<=t) one-tail               0.055
    t Critical one-tail            1.701
    P(T<=t) two-tail               0.110
    t Critical two-tail            2.048
(2 marks)

(α) H0: μ1 = μ2
    H1: μ1 ≠ μ2 (1 mark)
(β) At the 5% significance level (two-tail p-value 0.110 > 0.05), we fail to reject the null hypothesis of equal means and conclude that the two hospitals do not charge differently. Therefore, there is no need for Mr. Good to report the matter to Medicare payments. (2 marks)

(b)
    NEXT K
    NEXT J

    4 5 6    (1 mark)
    5 6 7    (1 mark)
    6 7 8    (1 mark)

(c)
i. The relationship between any one independent series and the dependent series can be captured by a straight line in a 2-axis graph. (2 marks)
ii. The independent variables do not change if the sampling is replicated. (2 marks)
iii. The sample size must be greater than the number of independent variables (N should be greater than k - 1). (2 marks)
iv. Not all the values of any one independent series can be the same. (2 marks)
v. The residual or disturbance error terms follow several rules. (2 marks)
vi. There are no linear relationships among the independent variables. (2 marks)
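
The F ratio, pooled variance and t statistic reported in Q6 (a) can be cross-checked from the summary statistics alone. The following is a minimal sketch in Python, which is not part of the examination software; the function names f_ratio and pooled_t are illustrative assumptions, and the inputs are the means, variances and sample sizes copied from the H1 and H2 columns of the tables.

```python
import math


def f_ratio(var1, var2):
    """Ratio of two sample variances, as used in the F-test for equal variances."""
    return var1 / var2


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Equal-variance two-sample t-test computed from summary statistics.

    Returns (pooled variance, t statistic, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return pooled, (mean1 - mean2) / std_err, df


# Summary statistics copied from the Q6 (a) tables (H1 first, then H2).
F = f_ratio(9369.571, 9770.781)
pooled, t_stat, df = pooled_t(486.000, 9369.571, 15, 544.933, 9770.781, 15)
print(round(F, 3), round(pooled, 3), round(t_stat, 3), df)
```

Rounded to three decimals, these values agree with the reported F (0.959), pooled variance (9570.176), t Stat (-1.650) and df (28). The p-values and critical values are not recomputed here, since that would need a statistical distribution library.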