Lab Test (Partial)

MÀSTER DE SUPPLY CHAIN, TRANSPORT I MOBILITAT (UPC)
ACADEMIC YEAR 14-15 Q1 – LABORATORY TEST
Anàlisi de Dades de Transport i Logística (ADTL)
(Date: 20/11/2014, 18:00-20:30 h. Place: Room H5.4)
Professor in charge: Lídia Montero Mercadé
Location: Building C5, D217
Exam rules: Any lecture note or laboratory session material is allowed. Solutions to previous exams are not allowed.
Test duration: 2 h 00 min
Publication of grades: Before 9/12/14 on the course website.
Review: 9/12/14 at 15:00 (C5-217).
Problem 1: All questions are worth 1 point each
According to a poll by the Centre of Opinion Studies (CEO), run by the Catalan Government, in the
November 9th Referendum (9-N), 49.4% of Catalans would vote "yes" to both parts of the question "Do
you want Catalonia to become a new State? If yes, do you want to become an independent country?" By
answering with a double "yes" to the question citizens are backing independence from Spain. In
addition, 12.6% would vote "yes" to the first part and "no" to the second, meaning they are backing a
Catalan State within a federal or confederated Spain. Finally, 19.7% would vote "no", meaning they back the
current status quo. Therefore, 32.3% of citizens would be against independence. In contrast, according to
a poll by ‘El Mundo’, 34.6% of Catalans would vote "yes" to both parts of the question and 39.5% of
citizens would be against independence.
The dataset "catalunya" contains information about
Catalan municipalities with over 5,000 inhabitants. The data were
collected from the Catalan Institute of Statistics
(www.idescat.org). The Procon field indicates the
percentage of votes in the 2012 elections, the most recent ones,
obtained by the 9N pro-query parties, i.e. the parties supporting
the 9N consultation (CiU, ERC, IC and CUP).
The objective is to determine the relationships that may
exist between the other collected fields and this variable
(the response).
The fields are:
municipi: Name of Municipality
Procon: Percentage of votes in 2012 to pro-query parties
Population: Number of Inhabitants
Area: Area of the municipality (km²)
MalePer: Percentage of males in the population
Dependency: Dependency ratio (population aged <16 and >65 over population aged 16-65, as a percentage)
Catalans: Percentage of population born in Catalonia
ImmiPer: Percentage of population born outside Catalonia
fincomepc: Disposable family income per capita
GDPpc: Gross Domestic Product per capita
Participation: Percentage of participation in the 2012 elections
hhpbui: Ratio of households per building
swasteper: Percentage of selective waste collection (recycling rate)
unemployed: Residents registered as unemployed
hhsize: Average number of people per household
province: Province of Catalonia (Barcelona, Girona, Lleida or Tarragona)
First, we intend to analyze the linear relationship between the percentage of Catalan-born population of
the municipality (catalans) and the percentage of votes for 9N pro-query parties (procon, the response
variable).
1. Obtain the descriptive results (numerical summaries and graphical representations) that illustrate the
relationship reflected in the data. Interpret the results.
2. Fit the corresponding linear model. Indicate the statistical test(s) that establish the
significance of the linear relationship (which test, the statistic obtained, the p-value and the
conclusion reached).
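As an illustration of the test asked for in question 2, the following is a minimal sketch of the t test for the slope in a simple linear regression. It is written in Python with the standard library only, which is an assumption of this sketch and not part of the exam. The data values are made up and are not taken from the catalunya dataset, and the function name slope_t_test is hypothetical.

```python
import math


def slope_t_test(x, y):
    """Least-squares fit of y = b0 + b1*x and the t statistic for H0: b1 = 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Corrected sums of squares and cross-products
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = mean_y - b1 * mean_x           # intercept estimate
    # Residual sum of squares and residual variance on n - 2 degrees of freedom
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_b1 = math.sqrt(sigma2 / sxx)     # standard error of the slope
    t_stat = b1 / se_b1
    return {
        "intercept": b0,
        "slope": b1,
        "se_slope": se_b1,
        "t": t_stat,
        "df": n - 2,
        "F": t_stat ** 2,               # global F statistic of the simple model
    }


# Made-up values for illustration only (NOT from the catalunya dataset)
catalans = [55.0, 60.0, 65.0, 70.0, 75.0, 80.0]
procon = [40.0, 47.0, 52.0, 55.0, 63.0, 66.0]

fit = slope_t_test(catalans, procon)
print(fit)
```

The t statistic is compared with a Student t distribution on n - 2 degrees of freedom. In simple regression its square equals the F statistic of the global test, so both tests give the same p-value.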
Let us check if the relationship discussed in the previous section differs by province in the following
questions.
3. Fit the model needed to discuss differences in the relationship between the two variables by
province. Interpret each of the coefficients obtained.
4. Is there, globally, any significant difference between provinces in the intercepts and/or the slopes
of the relationship with the percentage of Catalan-born population? Include the tests for the
intercepts and the slopes, with the statistics and p-values that determine significance.
5. Check whether or not the average percentage of votes for 9N pro-query parties is the same in all 4
provinces. Are any pairs of provinces similar in their means (pairwise comparisons)?
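One possible way to write the model behind questions 3 and 4 is sketched below. The notation is an assumption of this sketch and does not come from the exam statement: Barcelona is taken as the reference province (as happens by default when levels are ordered alphabetically under treatment contrasts), and G, L and T stand for Girona, Lleida and Tarragona.

```latex
\begin{align*}
\text{procon}_i &= \beta_0 + \beta_1\,\text{catalans}_i
  + \sum_{j \in \{G, L, T\}} \bigl(\alpha_j + \gamma_j\,\text{catalans}_i\bigr)\,
    \mathbf{1}[\text{province}_i = j] + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma^2) \\
H_0^{\text{slopes}} &: \gamma_G = \gamma_L = \gamma_T = 0 \\
H_0^{\text{intercepts}} &: \alpha_G = \alpha_L = \alpha_T = 0 \\
F &= \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(\mathrm{df}_0 - \mathrm{df}_1)}
          {\mathrm{RSS}_1/\mathrm{df}_1}
  \sim F_{\mathrm{df}_0 - \mathrm{df}_1,\ \mathrm{df}_1} \quad \text{under } H_0
\end{align*}
```

Here RSS_0 and df_0 are the residual sum of squares and residual degrees of freedom of the reduced model (the one satisfying the null hypothesis), and RSS_1 and df_1 are those of the larger model. Under this parametrization, each alpha_j is the difference in intercept and each gamma_j the difference in slope between province j and Barcelona.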
Now, we are going to work with the 13 numeric explanatory variables, initially excluding the
categorical variable.
6. Use a suitable FactoMineR method to determine which variables in the catalunya dataset are
significant in explaining the target procon. Provide the output and describe the profile of
municipalities whose percentage of votes for 9N pro-query parties is above the mean.
7. Carry out a preliminary exploratory analysis to determine whether any variable should be
transformed. Pay attention to the scale of the provided variables and check their explanatory power
again using FactoMineR with the whole set of variables (transformed and untransformed).
8. Use the "stepwise" procedure (direction = "both") based on the BIC criterion, starting from the
model that contains all the numeric variables that are significant according to FactoMineR (either
transformed or untransformed, not both) plus the percentage of immigrants, to determine the best
model. Avoid using highly correlated variables in the linear predictor. Interpret the resulting model
estimates and goodness-of-fit output.
9. For the model obtained above, validate the assumptions of the linear model, including the
numerical results, tests and graphical output needed. Specify which assumption each graph
validates. Identify the following observations: the two most atypical, the a priori influential
(high-leverage) and the influential ones.
10. Use the "stepwise" procedure (direction = "both") based on the BIC criterion, starting from the
model that contains the numeric variables that are significant according to FactoMineR, plus the
percentage of immigrants (either transformed or untransformed), the categorical variable and the
interactions between the categorical variable and each numeric variable. Refine the model if
multicollinearity makes its interpretation problematic.
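Questions 8 and 10 rely on the BIC criterion. As a reminder of what a BIC-based stepwise search compares at each step, here is a minimal Python sketch (standard library only) of the BIC of a Gaussian linear model, written up to an additive constant that is the same for all models fitted to the same data. The residual sums of squares, the sample size and the function name bic_gaussian are hypothetical and chosen only for illustration.

```python
import math


def bic_gaussian(rss, n, n_params):
    """BIC of a Gaussian linear model, up to a constant shared by all models.

    rss      : residual sum of squares of the fitted model
    n        : number of observations
    n_params : number of estimated regression coefficients
    """
    return n * math.log(rss / n) + n_params * math.log(n)


# Hypothetical values: adding one term lowers the RSS only slightly
n = 200
bic_small = bic_gaussian(rss=5200.0, n=n, n_params=4)
bic_large = bic_gaussian(rss=5150.0, n=n, n_params=5)

# The extra term is kept only if it lowers the BIC
keep_extra_term = bic_large < bic_small
print(bic_small, bic_large, keep_extra_term)
```

The penalty per coefficient is log(n) instead of the 2 used by AIC, so BIC tends to select smaller models. In R's step function this corresponds to setting the argument k to log(n).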