Dissolved Oxygen Analysis - Part One

Summary Statistics

Of the 360 samples of Chicago water collected, the mean dissolved oxygen level was 8.6325 mg/L, with a standard deviation of 2.471965 mg/L. The median was 8.75 mg/L, and the mode was 7.6 mg/L. Dissolved oxygen levels ranged from 0 to 15 mg/L.

[Figure: histogram, "Dissolved Oxygen in Greater Chicago District (1997)". Horizontal axis: Dissolved Oxygen (in mg/L). Vertical axis: Frequency.]

The distribution of dissolved oxygen was approximately normal. There were no extreme outliers skewing the curve in either direction, so the mean and the standard deviation are good estimates of the center and spread of the observed data.

Confidence Interval

We are 95% confident that the average amount of dissolved oxygen in Chicago water is between 8.376285 mg/L and 8.888715 mg/L. Both limits are well above the minimum amount of dissolved oxygen (6 mg/L) needed to sustain aquatic life in waterways. An average above 6 mg/L means that the sampled Chicago water is, on average, safe for aquatic life.

Mean                 8.6325
Standard Error       0.130284
Median               8.75
Mode                 7.6
Standard Deviation   2.471965

Significance Test

Our null hypothesis was that the mean dissolved oxygen level is 6 mg/L or higher. Our alternative hypothesis was that the mean dissolved oxygen level is less than 6 mg/L. The significance test showed that the null hypothesis should not be rejected: the data provide no evidence that the mean dissolved oxygen level in Chicago's waters is below 6 mg/L. The p-value was 0.975, far above any conventional significance level, which is consistent with dissolved oxygen levels at or above 6 mg/L.

In conducting the significance test, we have to assume that the data were collected under consistent standards: for example, that the same instruments were used for every sample and that the water samples were collected randomly. These assumptions matter because improper or inconsistent collection methods could bias the data.
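
As a rough check on the Part One figures, the following sketch recomputes the standard error, a 95% confidence interval, and the test statistic from the reported mean, standard deviation, and sample size. It uses the normal critical value from Python's standard library as a stand-in, which is an assumption: the reported interval is slightly wider, consistent with a t critical value (about 1.967 for 359 degrees of freedom). The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Values reported in Part One
n = 360            # number of water samples
mean = 8.6325      # sample mean, mg/L
sd = 2.471965      # sample standard deviation, mg/L
threshold = 6.0    # minimum dissolved oxygen for aquatic life, mg/L

# Standard error of the mean
se = sd / sqrt(n)

# 95% interval using the normal critical value (an assumption; see lead-in)
z_crit = NormalDist().inv_cdf(0.975)
margin = z_crit * se
ci_low, ci_high = mean - margin, mean + margin

# Test statistic: distance of the sample mean from 6 mg/L in standard errors
test_stat = (mean - threshold) / se

print(f"standard error: {se:.6f}")
print(f"95% interval (normal approximation): {ci_low:.4f} to {ci_high:.4f}")
print(f"test statistic: {test_stat:.2f}")
```

The standard error agrees with the reported 0.130284. The test statistic is large and positive, so the sample mean lies far above 6 mg/L, which points the same way as the decision not to reject the null hypothesis.
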
Dissolved Oxygen Analysis - Part Two

Summary Statistics

As shown in the scatter plot below, the overall trend of dissolved oxygen against temperature is negative. The points follow a roughly linear pattern with few outliers. We conclude that temperature and dissolved oxygen levels are negatively associated: as temperature increases, the level of dissolved oxygen decreases.

[Figure: scatter plot, "Dissolved oxygen vs. Temperature". Horizontal axis: Temperature. Vertical axis: Dissolved oxygen Levels.]

Regression Analysis

The R-Square value of 0.3955 represents the proportion of variation in dissolved oxygen levels that is explained by its linear relationship with temperature. Therefore, nearly 40 percent of the variation in dissolved oxygen levels in the Chicago waters sampled is explained by the linear relationship with temperature.

Regression Statistics
Multiple R           0.628882916
R Square             0.395493723
Adjusted R Square    0.393492046
Standard Error       1.951549437
Observations         304

The R-Square value falls within its possible range of 0 to 1, but it is not very large. The relatively small value shows that the data do not fit the regression line closely. Removing the existing outliers could raise this value, and collecting a larger data set might do so if the additional points fall near the regression line.

[Figure: "Dissolved Oxygen vs. Temperature Residual Plot". Horizontal axis: Temperature. Vertical axis: Dissolved Oxygen Level (mg/L).]

[Figure: "Temperature Line Fit Plot", showing Y and Predicted Y. Horizontal axis: Temperature. Vertical axis: Dissolved Oxygen Level (mg/L).]

[Figure: "Temperature Normal Probability Plot". Horizontal axis: Sample Percentile. Vertical axis: Dissolved Oxygen Level (mg/L).]

The normal probability plot shows that the dissolved oxygen values fall close to a straight line, though not exactly on it. Because the line is fairly straight, we can assume that the values are approximately normally distributed. If the outliers were removed from the data, the points would form a clearer line.
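
The regression statistics reported for the full data set (304 observations) are linked by standard identities for a one-predictor regression: R Square is the square of Multiple R, and Adjusted R Square rescales the unexplained share of variation by the degrees of freedom. The sketch below checks the reported figures against those identities. The variable names are illustrative, and k = 1 reflects temperature being the only predictor.

```python
# Values reported in the Part Two regression statistics
multiple_r = 0.628882916
n = 304   # observations
k = 1     # number of predictors (temperature only)

# R Square is the squared correlation between observed and fitted values
r_square = multiple_r ** 2

# Adjusted R Square penalizes the unexplained share by degrees of freedom
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(f"R Square: {r_square:.6f}")
print(f"Adjusted R Square: {adjusted_r_square:.6f}")
```

Both results match the reported values of 0.395493723 and 0.393492046 to within rounding.
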
Regression minus Outliers

The R-Square value without the outliers in the data set is 0.4616. With the outliers removed, about 46 percent of the variation in dissolved oxygen levels is explained by its linear relationship with temperature.

Regression Statistics
Multiple R           0.679402673
R Square             0.461587992
Adjusted R Square    0.459775157
Standard Error       1.726641684
Observations         299

The R-Square value is still low, but it is larger without the outliers. A larger R-Square value indicates that the data fit the regression line more closely.

[Figure: "DO vs. Temp minus outliers Residual Plot". Horizontal axis: Temperature. Vertical axis: DO Residuals.]

[Figure: "Temperature minus outliers Line Fit Plot", showing Y and Predicted Y. Horizontal axis: Temperature. Vertical axis: Dissolved oxygen (mg/L).]

[Figure: "Temperature minus outliers Normal Probability Plot". Horizontal axis: Sample Percentile. Vertical axis: Dissolved Oxygen Level (mg/L).]

The normal probability plot without the outliers shows the dissolved oxygen values falling closer to a straight line than in the original data. Because the values lie almost in a straight line, we can assume that they are approximately normally distributed. If more of the values on the outermost edges of the scatter plot were removed, the R-Square value would likely continue to increase and the remaining values would fit the regression line more closely.
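
To illustrate how removing an outlier changes R-Square, here is a minimal sketch of a least-squares fit on a small, entirely hypothetical data set. These are not the Chicago measurements: the temperatures, dissolved oxygen values, and the helper function r_squared are made up for illustration, with the last point placed far below the trend to act as an outlier.

```python
def r_squared(xs, ys):
    """R-Square of the least-squares line fitted to ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the line, and total variation
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: a downward trend plus one outlier as the last point
temps = [2, 6, 10, 14, 18, 22, 26, 12]
do_levels = [13.1, 12.0, 10.8, 10.1, 8.7, 7.9, 6.6, 1.0]

r2_all = r_squared(temps, do_levels)
r2_trimmed = r_squared(temps[:-1], do_levels[:-1])  # outlier dropped

print(f"R-Square with outlier: {r2_all:.3f}")
print(f"R-Square without outlier: {r2_trimmed:.3f}")
```

In this made-up example the fit improves once the outlying point is dropped, which mirrors the direction of the change reported above. The size of the change depends entirely on the data.
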