Regression Analysis ~ Productivity of a Clerical Team

Problem Definition - Purpose of this assignment

Data on various jobs performed by a team of clerks were collected in order to analyze the relationship between the volume of each job and the productivity of the clerical team. With a better understanding of these relationships, we should be able to estimate productivity for specific circumstances.

Objective

The ultimate objective is to create a spreadsheet model that estimates productivity from input values for the various jobs performed by the team. To do so, we will identify the dependent and independent variables, examine their relationships with productivity, and estimate a productivity equation using Excel/Data analysis/Regression and StatPro’s Stepwise Regression Analysis. Forward and backward regression analyses will also be performed to further evaluate the independent variables.

Variables

Dependent variable: Number of hours of productive work performed by the clerical team in one day (PROD)

Possible independent or explanatory variables:
1. Number of pieces of mail opened, sorted and distributed (MAIL)
2. Number of times a clerk assisted another clerk (ASSIST)
3. Number of incoming calls answered or forwarded (PHONE)
4. Number of people showing up in the work area to request assistance (PERSON)
5. Number of sales orders processed (ORDER). These orders are generally processed by phone but are not counted towards the PHONE variable.
6. Number of completed schedules such as meetings and/or tickets (SCHEDULE)
7. Number of completed photocopying orders (DUPLICATE). The orders vary in size.

Data set

Data for 52 days were compiled and stored. View the data set in “Data on Clerical Team Productivity”.

Visualization of Relationships between variables

Below are graphical representations of the data series formed by each explanatory variable (X) and the dependent variable, the productivity level (Y).
Through the use of “Add trend-line” in Excel, we can visualize the relationships. The respective R-squares help us gauge the degree of fit between the trend line (least-squares line) and the series of actual observations. Five types of fitting equations (linear, logarithmic, polynomial, power, and exponential) were compared for each series, and the one with the highest R-square was plotted. (View results in “Data on Clerical Team Productivity” under the respective tabs PRODMAIL, PRODASSIST, PRODPHONE, PRODPERSON, PRODORDER, PRODSCHEDULE, PRODDUPLICATE)

All the series turned out to be very non-linear. Among the five equations tested, the polynomial and the power functions seem to fit best; however, the R-square was never very high.

Correlation Analysis

           PROD          MAIL          ASSIST        PHONE         PERSON        ORDER         SCHEDULE      DUPLICATE
PROD       1
MAIL       -0.007650103  1
ASSIST     0.292819235   0.011282017   1
PHONE      0.461519082   0.054803588   0.24521511    1
PERSON     0.084798217   -0.043117518  0.036861483   0.47780716    1
ORDER      0.58731901    -0.276585736  -0.015889717  0.508993671   0.442805163   1
SCHEDULE   0.499012658   -0.015940412  0.338924407   0.34892016    0.167351761   0.382271946   1
DUPLICATE  0.449594128   -0.311766861  0.12226462    0.508788538   0.275074969   0.566073265   0.297154731   1

The results of the correlation analysis show that no single explanatory variable can explain the team’s productivity on its own. The three with the highest correlation with productivity are (in descending order) ORDER, SCHEDULE, and PHONE. The largest correlation between any two explanatory variables is about 0.57 (ORDER and DUPLICATE), so there is no strong sign of multicollinearity and we do not need to worry about redundancy.
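The screening of the correlation table can be sketched in Python as follows. This is only an illustration: it assumes that the first column of the table holds each variable's correlation with PROD and that the remaining values form the lower triangle for the explanatory variables. The helper names `rank_by_strength` and `strongest_pair` are made up for this sketch.

```python
# Correlations with PROD, as read from the first column of the table
# (this reading of the flattened table is an assumption).
corr_with_prod = {
    "MAIL": -0.007650103,
    "ASSIST": 0.292819235,
    "PHONE": 0.461519082,
    "PERSON": 0.084798217,
    "ORDER": 0.58731901,
    "SCHEDULE": 0.499012658,
    "DUPLICATE": 0.449594128,
}

# Pairwise correlations between explanatory variables (lower triangle).
pairwise = {
    ("ASSIST", "MAIL"): 0.011282017,
    ("PHONE", "MAIL"): 0.054803588,
    ("PHONE", "ASSIST"): 0.24521511,
    ("PERSON", "MAIL"): -0.043117518,
    ("PERSON", "ASSIST"): 0.036861483,
    ("PERSON", "PHONE"): 0.47780716,
    ("ORDER", "MAIL"): -0.276585736,
    ("ORDER", "ASSIST"): -0.015889717,
    ("ORDER", "PHONE"): 0.508993671,
    ("ORDER", "PERSON"): 0.442805163,
    ("SCHEDULE", "MAIL"): -0.015940412,
    ("SCHEDULE", "ASSIST"): 0.338924407,
    ("SCHEDULE", "PHONE"): 0.34892016,
    ("SCHEDULE", "PERSON"): 0.167351761,
    ("SCHEDULE", "ORDER"): 0.382271946,
    ("DUPLICATE", "MAIL"): -0.311766861,
    ("DUPLICATE", "ASSIST"): 0.12226462,
    ("DUPLICATE", "PHONE"): 0.508788538,
    ("DUPLICATE", "PERSON"): 0.275074969,
    ("DUPLICATE", "ORDER"): 0.566073265,
    ("DUPLICATE", "SCHEDULE"): 0.297154731,
}


def rank_by_strength(corr):
    """Variable names sorted by absolute correlation, strongest first."""
    return sorted(corr, key=lambda name: abs(corr[name]), reverse=True)


def strongest_pair(pairs):
    """The pair of explanatory variables with the largest absolute correlation."""
    return max(pairs.items(), key=lambda item: abs(item[1]))


ranking = rank_by_strength(corr_with_prod)
pair, value = strongest_pair(pairwise)
print("Ranked by |correlation with PROD|:", ranking)
print("Strongest pair of explanatory variables:", pair, round(value, 3))
```

A rough rule of thumb, not taken from this study, is to look more closely at multicollinearity when a pairwise correlation between explanatory variables approaches 0.8 or more.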
Regression Analysis

Using Excel/Tools/Data analysis/Regression

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.75390031
R Square           0.56836568
Adjusted R Square  0.49969658
Standard Error     10.9901846
Observations       52

ANOVA
            df  SS          MS          F           Significance F
Regression  7   6998.00941  999.71563   8.27687717  2.0527E-06
Residual    44  5314.5029   120.784157
Total       51  12312.5123

           Coefficients  Standard Error  t Stat       P-value      Lower 95%    Upper 95%
Intercept  60.553792     9.49521302      6.37729684   9.4015E-08   41.4174483   79.6901357
MAIL       0.00134964    0.00091684      1.47204684   0.14812538   -0.00049814  0.00319741
ASSIST     0.08727154    0.04825609      1.8085082    0.07736449   -0.00998222  0.1845253
PHONE      0.00868785    0.00916813      0.947615     0.34850113   -0.00978929  0.027165
PERSON     -0.04277815   0.01734491      -2.46632309  0.01761786   -0.07773451  -0.00782178
ORDER      0.04679021    0.0119808       3.90543428   0.00031977   0.0226445    0.07093592
SCHEDULE   0.20921295    0.13022364      1.60656661   0.11530325   -0.05323554  0.47166144
DUPLICATE  0.0048192     0.0055105       0.87454912   0.38656806   -0.00628648  0.01592488

P-values above 5% (shown in red in the original output) mark the variables that are least significant in explaining the team’s productivity level.

The data analysis/regression is rerun until all p-values come out below 5%. Each additional run uses the data set from the previous run minus the explanatory variable with the highest p-value. The following independent variables were taken out one at a time (in sequence): DUPLICATE, MAIL, PHONE, SCHEDULE. The last summary sheet is shown below. The significant independent variables using this method turn out to be ASSIST, PERSON, and ORDER.
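The manual elimination loop described above can be sketched in Python as follows. This is a sketch under assumptions: the regression fit itself is delegated to a caller-supplied function (here called `fit_p_values`, a hypothetical name) that returns one p-value per variable, as Excel's regression output does. The toy p-values in the usage lines are invented to exercise the loop and are not results from this study.

```python
def backward_eliminate(variables, fit_p_values, alpha=0.05):
    """Drop the variable with the highest p-value until all are below alpha.

    variables    -- names of the candidate explanatory variables
    fit_p_values -- hypothetical callback: takes a list of variable names,
                    refits the regression, returns {name: p_value}
    alpha        -- significance threshold (5% in the text)
    Returns (kept, removed), with `removed` in order of elimination.
    """
    kept = list(variables)
    removed = []
    while kept:
        p_values = fit_p_values(kept)
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining variable is significant
        kept.remove(worst)
        removed.append(worst)
    return kept, removed


# Toy stand-in for a regression run: invented p-values for variables A, B, C.
def toy_fit(names):
    table = {
        ("A", "B", "C"): {"A": 0.01, "B": 0.40, "C": 0.20},
        ("A", "C"): {"A": 0.01, "C": 0.03},
    }
    return table[tuple(names)]


kept, removed = backward_eliminate(["A", "B", "C"], toy_fit)
print("kept:", kept, "removed:", removed)
```

Passing the fit as a callback keeps the loop independent of the tool used for the regression, whether that is a spreadsheet export or a statistics library.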
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.69324186
R Square           0.48058428
Adjusted R Square  0.4481208
Standard Error     11.5427759
Observations       52

ANOVA
            df  SS          MS          F           Significance F
Regression  3   5917.19986  1972.39995  14.8038424  5.9103E-07
Residual    48  6395.31245  133.235676
Total       51  12312.5123

           Coefficients  Standard Error  t Stat       P-value      Lower 95%    Upper 95%
Intercept  77.7256395    6.91019862      11.2479603   4.6921E-15   63.8317621   91.6195169
ASSIST     0.13626447    0.04541256      3.00058983   0.00426472   0.04495645   0.22757249
PERSON     -0.034689     0.01714031      -2.0238248   0.04857403   -0.0691518   -0.0002261
ORDER      0.0582678     0.00971385      5.99842319   2.5213E-07   0.0387368    0.0777988

Using StatPro/Regression Analysis/Stepwise

With StatPro’s stepwise regression, the same explanatory variables were selected. This is logical since the underlying methodology was the same; here, the steps were automated. In step one the variable ORDER was entered into the equation, followed by ASSIST in step two and PERSON in step three.

Using StatPro/Regression Analysis/Backward

View results in “Data on Clerical Team Productivity” under the tab BackwardRegr.

Using StatPro/Regression Analysis/Forward

View results in “Data on Clerical Team Productivity” under the tab ForwardRegr.

Results of the Regression Analysis & Evaluation of Explanatory Variables

Results of the regression analysis

The following equation gives an estimate of the clerical team’s level of productivity:

PROD = 77.72564 + 0.1362645*ASSIST - 0.034689*PERSON + 0.0582678*ORDER

The equation implies that the baseline level of productivity (the estimate when ASSIST, PERSON, and ORDER are all zero) is 77.72564 hours. Depending on the types of jobs performed on a particular day, productivity can go up (if assistance is requested by other internal clerks or if orders are completed) or down (if external people come to request assistance).
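As a minimal sketch, the estimated equation can be written as a small Python function. The coefficients are copied from the equation above; the function name `estimate_prod` and the input values in the usage lines are illustrative assumptions and do not come from the data set.

```python
# Coefficients copied from the estimated productivity equation.
INTERCEPT = 77.72564
COEF_ASSIST = 0.1362645
COEF_PERSON = -0.034689
COEF_ORDER = 0.0582678


def estimate_prod(assist, person, order):
    """Estimated productive hours per day for the given job volumes."""
    return (INTERCEPT
            + COEF_ASSIST * assist
            + COEF_PERSON * person
            + COEF_ORDER * order)


# Hypothetical day: 10 assists, 100 walk-in requests, 200 orders processed.
example = estimate_prod(assist=10, person=100, order=200)
print(round(example, 2))
```

Because the R-square of the underlying regression is below 0.5, any single estimate from this function should be read as a rough indication, not a precise forecast.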
More specifically, the number of hours of productivity increases by 0.1362645 hours for each additional unit of ASSIST and by 0.0582678 hours for each additional unit of ORDER. Every time a member of the team responds to a request for assistance made by someone external, the team’s productivity declines by 0.034689 hours.

However, the corresponding R-square of 0.4805843 is rather low, so the search for other significant explanatory variables should continue.

Additional Evaluation of the Explanatory Variables

The forward regression analysis gives us some understanding of the gain in explanation contributed by each explanatory variable. Here is a summary of the amounts and percentages gained in R-square. After ORDER, which alone gives an R-square of 0.3449, the variable ASSIST gives us the largest gain.

        Variable entered  R-square  Change  % Change
Step 1  order             0.3449
Step 2  assist            0.4363    0.0913  26.5%
Step 3  person            0.4806    0.0443  10.2%

The backward regression analysis gives us some understanding of the loss in explanation resulting from the elimination of an explanatory variable. Here is a summary of the amounts and percentages lost in R-square. The elimination of the variable SCHEDULE results in the largest loss.

        Variable leaving  R-square  Change   % Change
Step 1  (none)            0.5684
Step 2  duplicate         0.5609    -0.0075  -1.3%
Step 3  mail              0.5450    -0.0159  -2.8%
Step 4  phone             0.5182    -0.0268  -4.9%
Step 5  schedule          0.4806    -0.0376  -7.3%

The Clerical Team Productivity Spreadsheet Model

Look up the link on the Index page under “Clerical Team Productivity Spreadsheet Model”.

Other Modeling Possibilities – Nonlinear Transformations

Given the low R-squares obtained, it would be interesting to transform the original values in the observation set, replacing one or several variables by their fitted values from the best-fitting trend line found earlier in this study.
For example, the original observation values for ORDER could be replaced by their fitted values from the polynomial equation 0.1017X^2 - 16848X + 1146.7, and the regression analysis rerun to see whether this improves the estimated productivity equation (higher R-square).
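The idea of regressing on fitted trend-line values instead of raw values can be sketched as follows. This is an illustration under assumptions: the data are toy values chosen so that the effect is visible, the polynomial coefficients are hypothetical and are not the ones quoted above, and a simple one-variable regression stands in for the full multiple regression. The function names are made up for this sketch.

```python
def r_squared(x, y):
    """R-square of the least-squares line of y on x (one explanatory variable)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so nothing to explain
    return sxy ** 2 / (sxx * syy)


def polynomial_transform(x, a, b, c):
    """Replace each value by its fitted value a*x^2 + b*x + c."""
    return [a * xi ** 2 + b * xi + c for xi in x]


# Toy data with a purely quadratic relationship (not from the data set).
x = [0, 1, 2, 3, 4]
y = [4, 1, 0, 1, 4]

# Hypothetical trend-line coefficients that match the toy data.
x_fitted = polynomial_transform(x, a=1, b=-4, c=4)

r2_raw = r_squared(x, y)
r2_transformed = r_squared(x_fitted, y)
print("R-square with raw values:   ", r2_raw)
print("R-square with fitted values:", r2_transformed)
```

In the toy data the raw values explain nothing linearly while the transformed values explain everything. With real observations the improvement, if any, would be smaller and would have to be checked by rerunning the regression.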