PROJECT 1: UNI- AND BIVARIATE DATA ANALYSIS

COURSE: STATISTICS I
SEMESTER: SPRING 2009
INSTRUCTOR: NAME OF THE SMALL GROUP PROF.
STUDENTS: PUT YOUR NAME(S) HERE
SMALL GROUP: PUT A SMALL-GROUP # HERE
DATA SET: PUT A PROJECT-GROUP # HERE

Table of contents

I. Introduction
II. One-variable analysis
   1. X variable
      a) Graphical description
      b) Numerical description
      c) Summary
   2. Y variable
      a) Graphical description
      b) Numerical description
      c) Summary
III. Two-variable analysis
   a) Graphical description
   b) Numerical description
   c) Least-squares regression
   d) Residual analysis
   e) Summary
IV. Overall summary and conclusions
V. Bibliography

I. Introduction

The goal of this project is … (a few lines of text here; avoid technical language).

II. One-variable analysis

In this section we present … (a few lines of text here).

1. X variable

a) Graphical description

Include the histogram here (1/3 of the page).

Below the graph, comment on the shape, center, and spread of the distribution, and on the presence or absence of outliers (you can identify outliers by looking at the boxplot; you do not need to include the boxplot here). Remove the outliers, if there are any, before continuing.

b) Numerical description

DO NOT copy the output from R here; instead, make a table with two columns. Put the name of the numerical measure in the first column and the corresponding value in the second. Include: number of observations, number of missing values (if any), mean, standard deviation, variance, median, quartiles, interquartile range (IQR), min, max, range, coefficient of variation.

Below the table, based on the numerical summaries, comment on the center (use the mean for a symmetric, outlier-free distribution and the median for a skewed one), the spread (use the standard deviation for a symmetric distribution and the IQR for a skewed one), and the symmetry (compare the mean with the median).

c) Summary

Use parts a) and b) to describe the data set.

2. Y variable

a) Graphical description

Include the histogram here.
Below the graph, comment on the shape, center, and spread of the distribution, and on the presence or absence of outliers (you can identify outliers by looking at the boxplot; you do not need to include the boxplot here). Remove the outliers, if there are any, before continuing.

b) Numerical description

DO NOT copy the output from R here; instead, make a table with two columns. Put the name of the numerical measure in the first column and the corresponding value in the second. Include: number of observations, number of missing values (if any), mean, standard deviation, variance, median, quartiles, interquartile range (IQR), min, max, range, coefficient of variation.

Below the table, based on the numerical summaries, comment on the center (use the mean for a symmetric distribution and the median for a skewed one), the spread (use the standard deviation for a symmetric distribution and the IQR for a skewed one), and the symmetry (compare the mean with the median).

c) Summary

Use parts a) and b) to describe the data set.

III. Two-variable analysis

In this section we present … (a few lines of text here).

a) Graphical description

Include one graph with two boxplots (1/3 of the page) to compare the two distributions.

Below the graph, comment on how the center of the distribution of X compares with the center of the distribution of Y. Repeat for the spread. Additionally, use the coefficients of variation from Sections II.1 and II.2 to compare the spread.

Include a scatterplot of Y versus X (1/3 of the page).

Below the graph, comment on the type of relationship between X and Y.

b) Numerical description

Make a table with two columns. Put the name of the numerical measure in the first column and the corresponding value in the second. Include: covariance, correlation.

Below the table, comment on the strength and type of the linear relationship (if present).

c) Least-squares regression (if suitable)

Fit the least-squares regression line for Y on X and write down the equation. Interpret the intercept and the slope.

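For reference, the Y-on-X line can be written directly in terms of the summaries already collected in parts II and III b). The following is a brief sketch only; the symbols (a, b, s_xy, s_x, s_y, r) are notation assumed here for illustration and may differ from the notation used in the course materials.

```latex
% Least-squares line for Y on X, using sample summaries.
% Notation assumed here: \bar{x}, \bar{y} are the sample means,
% s_x^2 is the sample variance of X, s_{xy} is the sample covariance,
% and r is the sample correlation.
\begin{align*}
  \hat{y} &= a + b\,x, \\
  b &= \frac{s_{xy}}{s_x^{2}}, \\
  a &= \bar{y} - b\,\bar{x}, \\
  r &= \frac{s_{xy}}{s_x\, s_y}, \qquad R^{2} = r^{2}.
\end{align*}
```

In this notation, the slope b is the predicted change in Y for a one-unit increase in X, and the intercept a is the predicted value of Y at X = 0, which has a practical meaning only when X = 0 lies within or near the observed range of X.
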
Fit the least-squares regression line for X on Y and write down the equation. Interpret the intercept and the slope.

Focus on the regression line for Y on X. Include the scatterplot with the regression line for Y on X (1/3 of the page).

Below the graph, quote the coefficient of determination, R², and interpret it.

d) Residual analysis

Include the plot of residuals versus predicted values for the regression line for Y on X.

Are the residuals centered at 0? Is the vertical spread in the plot roughly constant? Is there any pattern in the residuals? What do your answers to these questions imply about the adequacy of the regression model for your data?

e) Summary

Use parts a) to d) to summarise the bivariate data set.

IV. Overall summary and conclusions

In this project we analysed a bivariate data set. The variable X … (half a page of text; avoid technical language). The variable Y … Relationship between …

V. Bibliography

P. Newbold, Statistics for Business and Economics, Prentice-Hall.