This PDF contains exactly the same questions as the Mock eExam and is provided to give you easy access to the practice and past exam questions. Please also use the Mock eExam. The sole purpose of the Mock eExam is to familiarise you with the eExam format.

Mock eExam link: https://student-eassessment.monash.edu/course/view.php?id=4262

Past and Practice Exam Questions
ETF2121-ETF5912 Data Analysis in Business
Semester 2, 2021

The formulae sheet and statistical tables are available on Moodle.

Multiple Choice Questions

Question 1
A Type I error occurs when we:
(a) reject a false null hypothesis.
(b) reject a true null hypothesis.
(c) do not reject a false null hypothesis.
(d) do not reject a true null hypothesis.
(e) none of the above.

Question 2
If an estimated regression line has a y-intercept of 10 and a slope of 4, then when x = 2, the actual value of y is:
(a) 18
(b) 15
(c) 14
(d) 13
(e) unknown

Question 3
In the estimated probit model:
$\Pr(Y = 1 \mid X) = \Phi(\hat{\beta}_0 + \hat{\beta}_1 X)$
if $\hat{\beta}_0 = -6$, $\hat{\beta}_1 = 2$ and $X = 2$, then the predicted probability that $Y = 1$ is:
(a) 0.5
(b) 0.9972
(c) 0.4772
(d) 0.0228
(e) none of the above

Question 4
The residual is defined as the difference between:
(a) the actual value of y and the predicted value of y.
(b) the actual value of x and the predicted value of x.
(c) the actual value of y and the predicted value of x.
(d) the actual value of x and the predicted value of y.
(e) none of the above.

Question 5
In a sign test, the following information is given: number of zero differences = 3, number of positive differences = 20, and number of negative differences = 5. The value of the test statistic is:
(a) 5
(b) 4
(c) 3
(d) 2
(e) none of the above

Question 6
A non-parametric method to compare two populations, when the samples are independent and the data are ranked, is the:
(a) Wilcoxon signed rank sum test
(b) Sign test
(c) Wilcoxon rank sum test
(d) Matched pairs t-test
(e) none of the above

Question 7
Suppose you want to conduct a survey of small business owners in Victoria. The state Directory of Businesses gives a list of 1000 registered business owners, and hence the size of the population is 1000. You want to obtain a sample size of 30 business owners. There are two sampling techniques that can be used. Indicate which of these sampling techniques are described below.
(7.a) Group the businesses according to 5 business types (retailers, agriculture, manufacturing, financial services and advertising) and then randomly select a sample of 6 business owners from each business type.
(7.b) Assign a number to each registered business in the state Directory of Businesses, and then use a random number generator to select the business owners to be included in the sample of size 30.

Question 8
When conducting an analysis of the equality of means of more than two populations (k > 2), what is the major advantage of conducting a parametric F-test (ANOVA) instead of doing multiple t-tests of each population mean against all the other population means separately? Briefly explain your answer.

Question 9
A regression analysis was performed to study the relationship between a dependent variable and five independent variables. The following information was obtained from the regression analysis:
SSE = 2400, SSR = 9600, n = 40
Determine the F-statistic.

Question 10
In a simple regression model, when every observation is on the regression line, the sum of squares of the error SSE = 0, the standard error of estimate $s_\varepsilon = 0$, and the coefficient of determination $R^2 = 1$. True or False? Give a brief explanation.
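The three quantities named in Question 10 can be checked numerically. The sketch below is an illustration only, not part of the exam and not a model answer: it fits a simple regression by least squares in plain Python to four made-up points that lie exactly on the line y = 10 + 4x, then reports SSE, the standard error of estimate sqrt(SSE / (n - 2)) and R-squared = 1 - SSE / SST. The data and the helper name `fit_line` are assumptions introduced for the example.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns coefficients and fit statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope and intercept from the usual sums of cross-products and squares
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual = actual y minus predicted y
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return {
        "b0": b0,
        "b1": b1,
        "sse": sse,
        "s_e": math.sqrt(sse / (n - 2)),  # standard error of estimate
        "r2": 1 - sse / sst,              # coefficient of determination
    }


# Hypothetical data lying exactly on y = 10 + 4x
x = [1, 2, 3, 4]
y = [14, 18, 22, 26]
fit = fit_line(x, y)
print(fit)
```

Moving any single y value off the line makes SSE positive, which in turn raises the standard error of estimate above zero and pulls R-squared below one.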
Question 11
Specify the test statistic and the decision rule for each of the following Wilcoxon Rank Sum tests.
(11.a) H0: The two population locations are the same
       HA: The location of population A is to the left of the location of population B
       nA = 4, nB = 6, $\alpha = 0.025$
(11.b) H0: The two population locations are the same
       HA: The location of population A is different from the location of population B
       nA = 20, nB = 25, $\alpha = 0.05$

Question 12
What is one important potential benefit of using a matched pairs experiment to test the difference between two populations rather than independent samples? Briefly explain your answer with an example.

Question 13
A Gallup Organisation poll of randomly selected American adults in July 2002 found that 55% of those surveyed felt that their weight was about right. The sampling error for the survey was given as 3%.
(13.a) Find a 95% confidence interval estimate of the percentage of American adults who think their weight is about right.
(13.b) Based on the interval computed in part (13.a), explain whether it is reasonable to say that more than 50% of American adults think their weight is about right.

Question 14
An importer is considering whether to import and sell a new hair care product here in Australia. To begin with, he wants to try to sell the product in one large metropolitan market. He wants to choose the market where people currently spend the most on hair care products. To find this out, he randomly samples 210 adults who live in Melbourne and another 250 adults who live in Sydney and asks each one about their total spending in dollars on hair care products over the past year. The unknown population variances are assumed to be equal. Using this information, answer the following questions.

Begin Question 14 on a new page. Clearly label the question number on the new page.
(14.a) What is the name of the test statistic that you may be able to use to test whether Melbourne or Sydney people spend more, on average, on hair care products using the sample data? Briefly explain why this test statistic may be appropriate.
(14.b) If you had all the data on total spending on hair care products for each individual in the two samples, what could you do to determine what the distribution of the data is? Explain your answer.
(14.c) Write down the appropriate null and alternative hypotheses for the test statistic in part (14.a) if you want to test whether spending on hair care products is higher in Sydney (population 2 or B) than in Melbourne (population 1 or A).
(14.d) What are the steps involved in constructing the test statistic in part (14.a)?

Use the following information to answer Questions 15 and 16.

The government is concerned about the rate of smoking in the population. It is considering whether to raise the tax on cigarettes to reduce the number of people smoking. To see if such a policy will have an effect, the government undertakes a study of the determinants of smoking in the community. The government's chief statistical analyst conducts a random survey of 807 adults in the country's population (from all states) and collects the following variables.

EDUC     = years of schooling of the person
CIGPRIC  = average price of cigarettes including taxes, in dollars per pack, in the state where the person lives
AGE      = age of the person in years
RESTAURN = 1 if the state where the person lives has government-imposed restaurant smoking restrictions
           0 otherwise

Using this information, the analyst constructed a binary variable SMOKE indicating whether the individual smokes; that is,

SMOKE    = 1 if the individual smokes
           0 otherwise

The analyst wanted to identify the main determinants of the probability of an individual person smoking using this information. To do this, the analyst estimated the Logit model.
The results of this estimation, using EViews, are provided below.

Dependent Variable: SMOKE
Method: ML - Binary Logit (Quadratic hill climbing / EViews legacy)
Sample: 1 807
Included observations: 807
Convergence achieved after 4 iterations
Covariance matrix computed using second derivatives

Variable     Coefficient   Std. Error   z-Statistic   Prob.
C               1.656162     1.001650      1.653434   0.0982
AGE            -0.016164     0.004514     -3.581026   0.0003
EDUC           -0.111039     0.026820     -4.140104   0.0000
CIGPRIC        -0.003519     0.015581     -0.225849   0.8213
RESTAURN       -0.465569     0.180173     -2.584019   0.0098

Mean dependent var       0.384139    S.D. dependent var       0.486693
S.E. of regression       0.478753    Akaike info criterion    1.305724
Sum squared resid        183.5927    Schwarz criterion        1.340619
Log likelihood          -520.8597    Hannan-Quinn criter.     1.319124
Restr. log likelihood   -537.5055    Avg. log likelihood     -0.645427
LR statistic             33.29162    Total obs                807
Obs with Dep=0           497
Obs with Dep=1           310

Question 15
Begin Question 15 on a new page. Clearly label the question number on the new page.
(15.a) Write down the estimated logit model. Report the results to 3 decimal places.
(15.b) Interpret the sign of the estimated coefficients on the AGE and EDUC variables.
(15.c) Interpret the sign of the estimated coefficient on CIGPRIC. Test if the coefficient on the CIGPRIC variable is less than zero at $\alpha = 0.05$. Do the six steps of the test. What do you conclude from this test about the potential effectiveness of a policy of increasing the tax on cigarettes to reduce smoking?

Question 16
Begin Question 16 on a new page. Clearly label the question number on the new page.
(16.a) Suppose Ms. A has the following characteristics:
EDUC = 13 years
CIGPRIC = $60
AGE = 40 years old
Using the estimated logit model, and assuming that the state where Ms. A lives has government-imposed restaurant smoking restrictions, calculate the probability that Ms. A smokes.
(16.b) Suppose Ms.
B has the following characteristics:
EDUC = 13 years
CIGPRIC = $60
AGE = 40 years old
Using the estimated logit model, and assuming that the state where Ms. B lives has no government-imposed restaurant smoking restrictions, calculate the probability that Ms. B smokes.
(16.c) Based on the answers to (16.a) and (16.b), do you think that by restricting smoking in restaurants we can reduce the probability of smoking?

Question 17
The quarterly household spending on clothing, denoted $y_t$ (in millions of dollars), from the first quarter of 1975 to the first quarter of 2005 is depicted in the line graph below. An excerpt of these data is shown in the following table.

Year   y_t    t     Q1   Q2   Q3   Q4
1975   4818   1     1    0    0    0
       4800   2     0    1    0    0
       4866   3     0    0    1    0
       5139   4     0    0    0    1
1976   4810   5     1    0    0    0
..     ..     ..    ..   ..   ..   ..
..     ..     ..    ..   ..   ..   ..
..     ..     ..    ..   ..   ..   ..
2005   9138   121   1    0    0    0

Note:
(i) The time variable t equals 1 in the first quarter of 1975;
(ii) Q1, Q2, Q3 and Q4 are the dummy variables defined, respectively, as
Q1 = 1 if 1st quarter (January to March), 0 otherwise
Q2 = 1 if 2nd quarter (April to June), 0 otherwise
Q3 = 1 if 3rd quarter (July to September), 0 otherwise
Q4 = 1 if 4th quarter (October to December), 0 otherwise

Begin Question 17 on a new page. Clearly label the question number on the new page.
(17.a) Consider the following estimated linear trend model with dummy variables.

Dependent Variable: Y
Method: Least Squares
Sample: 1975Q1 2005Q1
Included observations: 121

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C               4713.070     114.3054      41.23227   0.0000
T               37.09995     1.221092      30.38259   0.0000
Q1             -587.1990     120.1375     -4.887724   0.0000
Q2             -582.2334     121.1366     -4.806420   0.0000
Q3             -364.4667     121.1181     -3.009183   0.0032

R-squared               0.892263    Mean dependent var       6591.008
Adjusted R-squared      0.888548    S.D. dependent var       1405.043
S.E. of regression      469.0647    Akaike info criterion    15.17980
Sum squared resid       25522519    Schwarz criterion        15.29533
Log likelihood         -913.3781    Hannan-Quinn criter.     15.22672
F-statistic             240.1751    Durbin-Watson stat       0.122454
Prob(F-statistic)       0.000000

Construct a forecast value of $y_t$ for the second quarter of 2005 based on the estimated linear trend model with dummy variables. Show your workings.
(17.b) Consider the estimated AR(1) model for $\Delta y_t$, which is the change in $y_t$:
$\widehat{\Delta y}_t = 48.04 - 0.26\,\Delta y_{t-1}$
where $\Delta y_t = y_t - y_{t-1}$.
Construct a forecast value of $y_t$ for the second quarter of 2005 based on the estimated AR(1) model. If you think there is not enough information given to answer this question, you may write “this question cannot be answered on the basis of the information given” as your answer and write down what information you need to answer this question.

Question 18
A real estate agent thinks that the street number is one of the factors that determine the sale price of houses in the area. Her theory stems from her observation that many individuals who live in the area believe that street numbers containing an “8” will bring good luck, while those containing a “4” will bring bad luck, to those people living in the house. In order to investigate this matter, the agent randomly selected 28 recently sold houses in the area and collected the following information:

PRICE - house sale price in thousands of dollars
LAND_SIZE - size of the block of land on which the house is built, in square metres
BEDROOMS - number of bedrooms in the house
NUMBER - the street number of the house

She then constructed two dummy variables as follows:
SN4 = 1 if the house's street number contains the number “4”
    = 0 otherwise
SN8 = 1 if the house's street number contains the number “8”
    = 0 otherwise

Provided below are the results of an Ordinary Least Squares regression model.

Dependent Variable: PRICE
Method: Least Squares
Sample: 1 28
Included observations: 28

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C              -155.6170     42.43146     -3.667491   0.0013
LAND_SIZE       0.111849     0.043058      2.597625   0.0161
BEDROOMS        86.38813     6.651671      12.98743   0.0000
SN4            -50.48589     18.44864     -2.736564   0.0118
SN8             38.97373     25.89519      1.505057   0.1459

R-squared               0.904364    Mean dependent var       334.8214
Adjusted R-squared      0.887732    S.D. dependent var       122.6570
S.E. of regression      41.09795    Akaike info criterion    10.43023
Sum squared resid       38847.96    Schwarz criterion        10.66812
Log likelihood         -141.0232    Hannan-Quinn criter.     10.50295
F-statistic             54.37405    Durbin-Watson stat       1.767468
Prob(F-statistic)       0.000000

Begin Question 18 on a new page. Clearly label the question number on the new page.
(18.a) Interpret the estimated coefficient on LAND_SIZE.
(18.b) Interpret the estimated coefficient on SN4. Is the sign of the coefficient what you would expect given the agent's theory? Briefly explain why or why not.
(18.c) Test whether the sales agent's theory about the sale price of houses with street numbers containing the number “8” is supported by the data at $\alpha = 0.05$.
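One way to organise the arithmetic for a one-sided t-test on a single regression coefficient, such as the test asked for in (18.c), is sketched below. This is a hedged illustration, not a model answer. It assumes the alternative hypothesis is that the SN8 coefficient is positive (one reading of the agent's "good luck" theory), uses n - k - 1 = 28 - 4 - 1 = 23 degrees of freedom, and hard-codes the one-tail 5% critical value 1.714 as read from a standard t table. The function name `one_sided_t_test` is hypothetical.

```python
def one_sided_t_test(coef, std_err, t_crit):
    """Upper-tail t-test of H0: beta = 0 against HA: beta > 0.

    Returns the t-statistic and whether H0 is rejected at the
    significance level implied by t_crit.
    """
    t_stat = coef / std_err
    return t_stat, t_stat > t_crit


# Sample size and number of regressors from the Question 18 output
n_obs, n_regressors = 28, 4
df = n_obs - n_regressors - 1   # degrees of freedom: 23

# Assumption: one-tail 5% critical value for 23 df, read from a t table
T_CRIT_05_DF23 = 1.714

# Coefficient and standard error for SN8 from the regression output
t_stat, reject = one_sided_t_test(38.97373, 25.89519, T_CRIT_05_DF23)
print(f"df = {df}, t = {t_stat:.3f}, reject H0: {reject}")
```

The Prob. column in the output is a two-sided p-value, so when the estimate has the hypothesised sign, halving it (0.1459 / 2, roughly 0.073) gives the matching one-sided p-value, which leads to the same decision as the critical-value comparison.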