Final Exam EE318 Engineering Data Analysis (December 2020)

Christopher R. Brumley
Department of Electrical Engineering
University of North Dakota

Abstract—The goal of this paper is to explore the statistical nature of empirical models, specifically the straight-line regression model. The regression model is a useful tool for engineers analyzing statistical data in many areas of engineering. Given a sample data set, we can determine whether the sample follows a regression model. If so, we can make estimations and predictions for future data samples.

I. INTRODUCTION

This final exam project is designed to demonstrate the statistical analysis of a regression model. The experimental data come from a rocket motor manufacturer interested in the correlation between the shear strength of the bond and the age of the propellant used. A sample of 20 observations was gathered for this experiment. The data will be used to determine whether they follow a simple straight-line regression model, so that we can make predictions and estimations for our experiment.

II. SCATTER DIAGRAM

A scatter diagram is an informative way to judge whether two or more variables are related based on observed data. A scatter diagram is used when there is no obvious physical mechanism that relates the variables of interest. We will use a scatter diagram of the collected sample to determine the relationship between the age of the propellant used and the shear strength of the bond.

A. Observation

The following data set was gathered and analyzed.

Observation   Strength, Y (psi)   Age, X (weeks)
 1            2158.70             15.50
 2            1678.15             23.75
 3            2316.00              8.00
 4            2061.30             17.00
 5            2207.50              5.00
 6            1708.30             19.00
 7            1784.70             24.00
 8            2575.00              2.50
 9            2357.90              7.50
10            2277.70             11.00
11            2165.20             13.00
12            2399.55              3.75
13            1779.80             25.00
14            2336.75              9.75
15            1765.30             22.00
16            2053.50             18.00
17            2414.40              6.00
18            2200.50             12.50
19            2654.20              2.00
20            1753.70             21.50

Table 1: Sample Data Set

The data from Table 1 are represented in the scatter plot in Figure 1, built in the Minitab software.

Figure 1: Scatter Diagram with straight-line regression

B. Straight-line Regression Model

As you can see in Figure 1, the straight-line regression model does seem plausible. The fitted line does not pass through every point, but a negative linear trend is noticeable. Therefore, it is reasonable to assume that shear strength decreases as the age of the propellant increases. Because we have a single output and a single input, our linear regression model follows this equation:

$$ y = \beta_0 + \beta_1 x + \varepsilon $$

In this equation, $y$ represents the output, $x$ is the input, $\beta_0$ and $\beta_1$ are unknown coefficients, and $\varepsilon$ is the random error. The $y$ component in this equation is dependent on the $x$ component. With a larger sample, we could obtain a more accurate estimate of the regression line. The coefficients are estimated from the sample as follows:

$$ \hat{\beta}_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} \approx -36.96 $$

$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \approx 2625.4 $$

Our regression equation is checked using the Minitab software, as depicted below.

III. TESTING FOR SIGNIFICANCE

We will now test for the significance of the regression model. This tests a hypothesis about the slope of the linear regression model.

$$ H_0 : \beta_1 = 0 $$

$$ H_1 : \beta_1 \neq 0 $$

$$ \text{Test statistic: } F_0 = \frac{MSR}{MSE}, \qquad MSR = \frac{SSR}{1} \quad \text{and} \quad MSE = \frac{SSE}{n - p} $$

$$ \text{Regression sum of squares} = SSR = \sum (\hat{y}_i - \bar{y})^2 $$

$$ \text{Error sum of squares} = SSE = \sum (y_i - \hat{y}_i)^2 $$

Here $n$ is the sample size and $p$ is the number of estimated coefficients ($p = 2$ for the straight-line model).

As you can see in the Minitab depiction above, with a P-value of essentially zero, the evidence shows that we can reject the null hypothesis.

IV. ESTIMATION

With our simple regression equation, we can make estimates for future data inputs to our system. For example, if we wanted to estimate the mean shear strength of a motor made from twenty-week-old propellant, we could substitute the twenty-week input into our equation.

$$ \hat{y} = 2625.4 - 36.96x $$

$$ \hat{y} = 2625.4 - 36.96(20) $$

$$ \hat{y} = 1886.2 $$

We can therefore estimate that the mean shear strength would be about 1886.2 psi.

V. PREDICTIONS

VI. RESIDUAL ANALYSIS

A residual in a straight-line regression model is the vertical distance between an observed value of $y$ and the regression line. A residual analysis is used to examine how well the chosen regression model fits the data. The residuals should be unbiased and random, with minimum variance.

The figure above is the residual analysis. The dotted line at zero corresponds to the red regression line in the figure below. For example, the first data point above is approximately seventy-five units above the dotted line. This equates to the first data point in the figure below being the same distance above the red line.

REFERENCES

[1] D. C. Montgomery, G. C. Runger, and N. F. Hubele, Engineering statistics. Hoboken, NJ: John Wiley & Sons, Inc., 2011.
[2] “Minitab 18 Support,” Minitab. [Online]. Available: https://support.minitab.com/en-us/minitab/18/. [Accessed: 18-Dec-2020].
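The coefficient, test-statistic, and estimation formulas from the regression, significance-testing, and estimation sections can be cross-checked outside Minitab. The following is a minimal Python sketch, not the method used in the paper: it applies the same least-squares and F-statistic formulas to the Table 1 data, pairing each strength value with the age value in the same position. The variable names (`strength`, `age`, `b0`, `b1`, `f0`, `y_hat_20`) are illustrative assumptions.

```python
# Table 1 data: shear strength (psi) and propellant age (weeks),
# paired by position in the order listed in the paper.
strength = [2158.70, 1678.15, 2316.00, 2061.30, 2207.50, 1708.30, 1784.70,
            2575.00, 2357.90, 2277.70, 2165.20, 2399.55, 1779.80, 2336.75,
            1765.30, 2053.50, 2414.40, 2200.50, 2654.20, 1753.70]
age = [15.50, 23.75, 8.00, 17.00, 5.00, 19.00, 24.00, 2.50, 7.50, 11.00,
       13.00, 3.75, 25.00, 9.75, 22.00, 18.00, 6.00, 12.50, 2.00, 21.50]

n = len(age)
sum_x = sum(age)
sum_y = sum(strength)
sum_xy = sum(x * y for x, y in zip(age, strength))
sum_x2 = sum(x * x for x in age)

# Least-squares slope and intercept (same formulas as in the text).
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * (sum_x / n)

# Fitted values and residuals (observed minus fitted).
fitted = [b0 + b1 * x for x in age]
residuals = [y - f for y, f in zip(strength, fitted)]

# Sums of squares and the F statistic for H0: beta_1 = 0.
y_bar = sum_y / n
ssr = sum((f - y_bar) ** 2 for f in fitted)   # regression sum of squares
sse = sum(r ** 2 for r in residuals)          # error sum of squares
p = 2                                         # estimated coefficients: b0, b1
f0 = (ssr / 1) / (sse / (n - p))

# Estimated mean shear strength for twenty-week-old propellant.
y_hat_20 = b0 + b1 * 20

print(f"b0 = {b0:.1f}, b1 = {b1:.2f}")
print(f"SSR = {ssr:.1f}, SSE = {sse:.1f}, F0 = {f0:.1f}")
print(f"Estimated strength at 20 weeks = {y_hat_20:.1f} psi")
```

With the Table 1 values, the slope and intercept should come out near -36.96 and 2625.4, in line with the figures reported in the paper. The estimate at twenty weeks may differ from 1886.2 in the last digit because the paper substitutes rounded coefficients. The `residuals` list holds the quantities examined in the residual analysis section.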