MASTER'S DEGREE IN SUPPLY CHAIN, TRANSPORT AND MOBILITY (UPC). COURSE 14-15 Q1 – LABORATORY TEST
Anàlisi de Dades de Transport i Logística (ADTL)

Date: 20/11/2014, 18:00-20:30 h
Room: Aula H5.4
Professor in charge: Lídia Montero Mercadé
Location: Edifici C5 D217
Exam rules: Any lecture notes or laboratory session materials are allowed. Solutions to exams are not allowed.
Test duration: 2 h 00 min
Publication of grades: Before 9/12/14 on the course website.
Review: 9/12/14 at 15:00 (C5-217).

Problem 1: All questions are worth 1 point

According to a poll by the Centre of Opinion Studies (CEO), run by the Catalan Government, in the November 9th Referendum (9-N), 49.4% of Catalans would vote "yes" to both parts of the question "Do you want Catalonia to become a new State? If yes, do you want to become an independent country?" By answering with a double "yes", citizens are backing independence from Spain. In addition, 12.6% would vote "yes" to the first part and "no" to the second, meaning they back a Catalan State within a federal or confederated Spain. Finally, 19.7% would vote "no", meaning they back the current status quo. Therefore, 32.3% of citizens would be against independence. In contrast, according to a poll by 'El Mundo', 34.6% of Catalans would vote "yes" to both parts of the question and 39.5% of citizens would be against independence.

The dataset "catalunya" contains information about Catalan municipalities with over 5,000 inhabitants. The data were collected from the Catalan Institute of Statistics (www.idescat.org). The Procon field indicates the percentage of votes in the last elections, held in 2012, that went to 9N pro-query parties (CiU, ERC, IC and CUP). The objective is to determine the relationships that may exist between the other collected fields and this variable (the response).
The fields are:
municipi: Name of the municipality
Procon: Percentage of votes in 2012 to pro-query parties
Population: Number of inhabitants
Area: Area of the town (km²)
MalePer: Percentage of males in the population
Dependency: Dependency ratio (population aged under 16 or over 65 / population aged 16-65, as a percentage)
Catalans: Percentage of the population born in Catalonia
ImmiPer: Percentage of the population born outside Catalonia
fincomepc: Disposable family income per capita
GDPpc: Gross Domestic Product per capita
Participation: Percentage of participation in the 2012 elections
hhpbui: Ratio of households per building
swasteper: Percentage of selective waste collection (recycling rate)
unemployed: Residents registered as unemployed
hhsize: Average number of people per household
province: Province of Catalonia (Barcelona, Girona, Lleida and Tarragona)

First, we analyze the linear relationship between the percentage of the municipality's population born in Catalonia (catalans) and the percentage of votes to 9N pro-query parties (procon, the response variable).

1. Obtain the descriptive results (numerical summaries and graphical representations) that illustrate the relationship reflected in the data. Interpret the results.
2. Fit the corresponding linear model. Indicate the statistical test(s) that establish the significance of the linear relationship (which test, the statistic obtained, the p-value and the conclusion reached).

In the following questions, let us check whether the relationship discussed in the previous section differs by province.

3. Fit the model needed to discuss differences by province in the relationship between the two variables. Interpret each of the coefficients obtained.
4. Is there, globally, any significant difference between provinces in the intercepts and/or the slopes of the relationship with the percentage of Catalan-born population?
Include the tests for the intercepts and the slopes, with the test statistics and p-values used to determine significance.
5. Check whether or not the average percentage of votes to 9-N pro-query parties is the same in all 4 provinces. Are any pairs of province means similar (pairwise comparisons)?

Now we are going to work with the 13 numeric explanatory variables, initially excluding the categorical variable.

6. Use a suitable FactoMineR method to determine which variables in the catalunya dataset are significant in explaining the target procon. Provide the output and describe the profile of the municipalities whose percentage of votes to 9N pro-query parties is above the mean.
7. Carry out a preliminary exploratory analysis to decide whether any variable should be transformed. Pay attention to the scale of the provided variables and check explanatory power again using FactoMineR with the whole set of variables (transformed and untransformed).
8. Use the "stepwise" procedure (direction = "both") based on the BIC criterion to determine the best model, starting from the model that contains all the numeric variables that are significant according to FactoMineR (either transformed or untransformed, not both) plus the percentage of immigrants. Avoid using highly correlated variables in the linear predictor. Interpret the resulting model estimates and the goodness-of-fit output.
9. For the model obtained above, validate the premises of the linear model, including the numerical results, tests and graphical output needed. Specify which premise is validated by each graph. Identify the following observations: the two most atypical, a priori influential, and influential observations.
10. Use the "stepwise" procedure (direction = "both") based on the BIC criterion, starting from the model that contains the numeric variables that are significant according to FactoMineR, plus the percentage of immigrants (either transformed or untransformed), the categorical variable, and the interactions between the categorical variable and each numeric variable.
Refine the model if multicollinearity makes it difficult to interpret.
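
As a reminder of what the BIC criterion in the stepwise questions compares, here is a minimal sketch in plain Python. It assumes the usual form for a Gaussian linear model, BIC = n·ln(RSS/n) + ln(n)·p, defined up to an additive constant. The helper names (fit_simple_ols, bic) and the five data points are hypothetical and made up for illustration; they are not taken from the catalunya dataset.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, rss)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss

def bic(rss, n, n_params):
    """BIC up to an additive constant: n*ln(RSS/n) + ln(n)*n_params."""
    return n * math.log(rss / n) + math.log(n) * n_params

# Made-up illustrative values, NOT from the catalunya dataset.
x = [40.0, 50.0, 60.0, 70.0, 80.0]   # stand-in for a predictor
y = [45.0, 52.0, 58.0, 69.0, 76.0]   # stand-in for the response
n = len(y)

# Null model: intercept only (1 parameter).
mean_y = sum(y) / n
rss_null = sum((yi - mean_y) ** 2 for yi in y)
bic_null = bic(rss_null, n, 1)

# Model with one predictor: intercept and slope (2 parameters).
a, b, rss_line = fit_simple_ols(x, y)
bic_line = bic(rss_line, n, 2)

# A stepwise search keeps the candidate with the lower BIC.
print(f"intercept={a:.2f} slope={b:.2f}")
print(f"BIC null={bic_null:.2f}  BIC with predictor={bic_line:.2f}")
```

A stepwise search repeats this comparison for every variable that can be added or dropped at each step. Because the penalty per parameter is ln(n) rather than 2, BIC tends to select smaller models than AIC when n is larger than about 7.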