PROJECT: DATA ANALYSIS

PROJECT 1: UNI- AND BIVARIATE DATA ANALYSIS
COURSE: STATISTICS I
SEMESTER: SPRING 2009
INSTRUCTOR: NAME OF THE SMALL GROUP PROF.
STUDENTS: PUT YOUR NAME(S) HERE
SMALL GROUP: PUT A SMALL-GROUP # HERE
DATA SET: PUT A PROJECT-GROUP # HERE
Table of contents.
I. Introduction.
II. One-variable analysis.
1. X-variable.
a) Graphical description.
b) Numerical description.
c) Summary.
2. Y-variable.
a) Graphical description.
b) Numerical description.
c) Summary.
III. Two-variable analysis.
a) Graphical description.
b) Numerical description.
c) Least-squares regression.
d) Residual analysis.
e) Summary.
IV. Overall summary and conclusions.
V. Bibliography.
I. Introduction
The goal of this project is … (a few lines of text here; avoid technical language).
II. One-variable analysis
In this section we present … (a few lines of text here).
1. X variable
a) Graphical description
Include the histogram here (1/3 of the page).
Below the graph, comment on the shape, center, and spread of the distribution,
as well as on the presence or absence of outliers (you can decide about the
outliers by looking at the boxplot; you don’t need to include the boxplot
here). Remove any outliers before continuing.
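The template leaves the outlier decision to the boxplot. A widely used convention (Tukey's fences) flags a value as an outlier if it lies more than 1.5 × IQR below the first quartile or above the third quartile. The Python sketch below applies that rule; the function name `iqr_outliers` and the data are hypothetical. Quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which uses the same interpolation as R's default `quantile()`; R's `boxplot()` uses Tukey's hinges instead, so a borderline point may be classified differently.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: Tukey's 1.5 x IQR rule; the template itself does not fix the rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample data for illustration only.
x = [12.1, 13.4, 11.8, 12.9, 13.0, 12.5, 30.2]
outliers = iqr_outliers(x)
x_clean = [v for v in x if v not in outliers]  # data used for the rest of the analysis
print("outliers:", outliers)
print("kept:", x_clean)
```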
b) Numerical description
DO NOT copy the output from R here; instead, make a table with two columns.
Put the name of each numerical measure in the first column and the
corresponding value in the second. Include: number of observations, number of
missing values (if any), mean, standard deviation, variance, median, quartiles,
interquartile range, minimum, maximum, range, and coefficient of variation.
Below the table, based on the numerical summaries, comment on the center (use
the mean for a symmetric, outlier-free distribution, but the median for a
skewed one), the spread (use the standard deviation for a symmetric
distribution, but the IQR for a skewed one), and the symmetry (compare the
mean with the median).
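As a sketch of how the table entries fit together, the Python function below computes each requested measure for one variable and returns (measure, value) rows ready for the two-column table. It assumes missing values are stored as `None`, uses the sample (n - 1) versions of the variance and standard deviation, reports the coefficient of variation as 100 × s / mean (meaningful only for positive-valued data), and uses the same quartile convention as R's default `quantile()`. The name `describe` and the data are hypothetical.

```python
import statistics

def describe(values):
    """Return (measure, value) rows for the two-column summary table."""
    observed = [v for v in values if v is not None]  # assumption: None marks a missing value
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")  # R type-7 quartiles
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample standard deviation, divisor n - 1
    return [
        ("Number of observations", len(observed)),
        ("Number of missing values", len(values) - len(observed)),
        ("Mean", mean),
        ("Standard deviation", sd),
        ("Variance", statistics.variance(observed)),
        ("Median", statistics.median(observed)),
        ("First quartile (Q1)", q1),
        ("Third quartile (Q3)", q3),
        ("Interquartile range (IQR)", q3 - q1),
        ("Minimum", min(observed)),
        ("Maximum", max(observed)),
        ("Range", max(observed) - min(observed)),
        ("Coefficient of variation (%)", 100 * sd / mean),
    ]

# Hypothetical data with one missing value.
x = [12.1, 13.4, 11.8, 12.9, 13.0, 12.5, None]
table = describe(x)
for name, value in table:
    print(f"{name:<30} {value:.3f}" if isinstance(value, float) else f"{name:<30} {value}")

# Symmetry check: a mean well above the median hints at right skew, well below at left skew.
summary = dict(table)
print("mean > median" if summary["Mean"] > summary["Median"] else "mean <= median")
```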
c) Summary
Use parts a) and b) to describe the data set.
2. Y variable
a) Graphical description
Include the histogram here.
Below the graph, comment on the shape, center, and spread of the distribution,
as well as on the presence or absence of outliers (you can decide about the
outliers by looking at the boxplot; you don’t need to include the boxplot
here). Remove any outliers before continuing.
b) Numerical description
DO NOT copy the output from R here; instead, make a table with two columns.
Put the name of each numerical measure in the first column and the
corresponding value in the second. Include: number of observations, number of
missing values (if any), mean, standard deviation, variance, median, quartiles,
interquartile range, minimum, maximum, range, and coefficient of variation.
Below the table, based on the numerical summaries, comment on the center (use
the mean for a symmetric, outlier-free distribution, but the median for a
skewed one), the spread (use the standard deviation for a symmetric
distribution, but the IQR for a skewed one), and the symmetry (compare the
mean with the median).
c) Summary
Use parts a) and b) to describe the data set.
III. Two-variable analysis
In this section we present … (a few lines of text here).
a) Graphical description
Include one graph with two boxplots (1/3 of the page) to compare the two
distributions.
Below the graph, comment on how the center of the distribution of X compares
with the center of the distribution of Y, and do the same for the spread.
Additionally, compare the spreads using the coefficients of variation from
Sections II.1 and II.2.
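Because the coefficient of variation is unit-free, it lets you compare the spread of X and Y even when they are measured on different scales, which the boxplots alone cannot do. The sketch below, with a hypothetical helper `spread_summary` and made-up data, puts the median, IQR, and CV of the two variables side by side; the CV is taken as 100 × s / mean, which assumes both variables are positive.

```python
import statistics

def spread_summary(values):
    """Median, IQR and coefficient of variation (%) for one variable."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    cv = 100 * statistics.stdev(values) / statistics.mean(values)  # assumes positive data
    return {"median": median, "IQR": q3 - q1, "CV (%)": cv}

# Hypothetical, outlier-free data for two variables measured in different units.
x = [12.1, 13.4, 11.8, 12.9, 13.0, 12.5]
y = [250.0, 270.0, 240.0, 262.0, 268.0, 255.0]
sx, sy = spread_summary(x), spread_summary(y)
for key in sx:
    print(f"{key:<8} X = {sx[key]:8.3f}   Y = {sy[key]:8.3f}")

# The IQRs are in different units, so only the CVs are directly comparable.
more_variable = "X" if sx["CV (%)"] > sy["CV (%)"] else "Y"
print("Larger relative spread:", more_variable)
```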
Include a scatterplot of Y versus X (1/3 of the page).
Below the graph, comment on the type of the relationship between X and Y.
b) Numerical description
Make a table with two columns. Put the name of the numerical measure in the
first column and the corresponding value in the second. Include: covariance,
correlation.
Below the table, comment on the strength and type of the linear relationship
(if present).
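The two measures in this table are linked by r = s_xy / (s_x s_y): the covariance carries the units of X times the units of Y, while dividing by both standard deviations gives a unit-free correlation between -1 and 1. The Python sketch below computes both from their definitions (in R, `cov(x, y)` and `cor(x, y)` give the same sample quantities); the helper names and data are hypothetical.

```python
import statistics

def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson r = s_xy / (s_x * s_y), always between -1 and 1."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical paired data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
cov_xy = sample_covariance(x, y)
r_xy = correlation(x, y)
print(f"covariance  = {cov_xy:.4f}")
print(f"correlation = {r_xy:.4f}")
```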
c) Least-squares regression (if suitable)
Fit the least-squares regression line for Y on X and write down the equation.
Interpret the intercept and the slope.
Fit the least-squares regression line for X on Y and write down the equation.
Interpret the intercept and the slope.
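For a least-squares line y = a + b x, the slope is b = S_xy / S_xx and the intercept is a = y_bar - b * x_bar, where S_xy and S_xx are the sums of cross-products and of squared deviations about the means. Swapping the roles of the variables gives the line of X on Y; the two lines coincide only when |r| = 1, and in general the product of the two slopes equals r². The Python sketch below illustrates this with a hypothetical helper and data; in R the two fits would come from `lm(y ~ x)` and `lm(x ~ y)`.

```python
import statistics

def least_squares(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Hypothetical paired data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a_yx, b_yx = least_squares(x, y)  # Y on X: y_hat = a_yx + b_yx * x
a_xy, b_xy = least_squares(y, x)  # X on Y: x_hat = a_xy + b_xy * y
print(f"Y on X: y_hat = {a_yx:.3f} + {b_yx:.3f} x")
print(f"X on Y: x_hat = {a_xy:.3f} + {b_xy:.3f} y")
print(f"product of slopes (= r^2): {b_yx * b_xy:.4f}")
```

The slope b_yx is read as the average change in Y per one-unit increase in X; the intercept is the predicted Y at X = 0, which is only meaningful if X = 0 lies within or near the observed range.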
Focus on the regression line for Y on X. Include the scatterplot with the
regression line for Y on X (1/3 of the page).
Below the graph, quote the coefficient of determination, R², and interpret it.
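The coefficient of determination is R² = 1 - SSE/SST, the share of the total variation in Y about its mean that the fitted line accounts for; for simple linear regression it equals the square of the correlation r. The sketch below computes it from that definition for hypothetical data (in R it is reported by `summary(lm(y ~ x))`). A value of 0.80, for example, would be read as "about 80% of the variation in Y is explained by its linear relationship with X".

```python
import statistics

def r_squared(x, y):
    """Coefficient of determination for the least-squares line of y on x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total variation
    return 1 - sse / sst

# Hypothetical paired data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2 = r_squared(x, y)
print(f"R^2 = {r2:.4f}")
```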
d) Residual analysis
Include the plot of residuals versus predicted values for the regression line
for Y on X.
Are the residuals centered at 0? Is the vertical spread in the plot roughly
constant across the range of predicted values? Is there any pattern in the
residuals? What do your answers to these questions imply about the adequacy of
the regression model for your data?
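One point worth knowing when answering these questions: for a least-squares line with an intercept, the residuals always sum to zero, so "centered at 0" is really about whether they stay centered at 0 across the whole range of predicted values. The sketch below, with a hypothetical helper and data, lists the (predicted, residual) pairs in the order they would appear from left to right on the plot and confirms the zero mean; in R the same pairs come from `fitted(fit)` and `resid(fit)` for `fit <- lm(y ~ x)`.

```python
import statistics

def residuals_vs_fitted(x, y):
    """Return (predicted, residual) pairs for the least-squares line of y on x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return [(a + b * xi, yi - (a + b * xi)) for xi, yi in zip(x, y)]

# Hypothetical paired data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
pairs = sorted(residuals_vs_fitted(x, y))  # sort by predicted value, as on the plot's x-axis
for predicted, residual in pairs:
    print(f"predicted {predicted:7.3f}   residual {residual:+.3f}")

mean_residual = statistics.mean([r for _, r in pairs])
print(f"mean residual: {mean_residual:.2e}")  # zero up to floating-point rounding
```

With only a handful of points no pattern can be judged reliably; on real data, curvature in the plot suggests the relationship is not linear, and a funnel shape suggests the spread is not constant.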
e) Summary
Use parts a) to d) to summarise the bivariate data set.
IV. Overall summary and conclusions
In this project we analysed a bivariate data set. The variable X … (half a page
of text; avoid technical language). The variable Y … The relationship between …
V. Bibliography
P. Newbold, Statistics for Business and Economics, Prentice-Hall.