INFO 630: EVALUATION OF INFO SYSTEMS
HOMEWORK #1

Name: ____________________   Date: __________

Show your work for all problems done manually (outside of SPSS). When there is SPSS output, do NOT copy/paste the whole output; just provide the data and/or graphs requested.

1. Install PASW/SPSS on your computer. Open the employee data file, “Employee data.sav”. (This data file is installed with SPSS, e.g. under C:\Program Files\SPSSInc\PASWStatistics18\Samples\English.)
Compute the mean beginning salary and its standard deviation for all employees (Analyze > Descriptive Statistics > Descriptives). Select beginning salary.
Compute the mean beginning salary and standard deviation for males and females (Analyze > Compare Means > Means; beginning salary is the Dependent variable, gender of employee is the Independent variable).
Complete the following table. Remember to include units, and round off answers appropriately!

Type of Employee | Mean Beginning Salary | Standard Deviation
Females          |                       |
Males            |                       |
All              |                       |

2. What are the differences in scope among the following certifications or assessments, in terms of the organizational units being assessed? In other words, what receives each of these certifications (a person, a company, etc.)?
a) CMMI Level 2
b) CMMI Level 3 or higher
c) ISO 9000

3. Open the data file “r_squared.sav”. This file gives data for three examples of curve fitting to a straight line (a.k.a. linear regression analysis).
a) For each set of data (X1 and Y1, X2 and Y2, X3 and Y3), generate a linear regression analysis using Analyze > Regression > Linear. Recall that X is independent and Y is dependent. Use the resulting data from the Model Summary section and the Coefficients section to fill in the following table. The equation of the line is:
y = (Xi)*x + (Const.)
See the Statistics for Software Process Improvement handout, section "Curve Fit Formulas for SPSS", for help finding the values and standard error terms.

Case  | R Square | Std Error | Xi (B value +/- std error) | Const. (B value +/- std error)
X1-Y1 |          |           |                            |
X2-Y2 |          |           |                            |
X3-Y3 |          |           |                            |

b) Plot each case (Y1 vs X1, etc.) separately using Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter (in older SPSS versions, use Graphs > Scatter > Simple). Give each plot a title and show it.
c) Describe the distribution of data for each case, and describe the resulting effect on R Square and on the standard errors associated with the curve fit parameters.

4. A uniform set of definitions and collection mechanisms for effort and size might be advantageous across organizations. For software problems and defects, however, a uniform set of definitions and collection mechanisms across organizations might not be feasible, even though it would be advantageous. Why?

5. Before implementing a new software development environment, the productivity of your team was 134 +/- 10 lines of code per staff-month (LOC/month), where 10 LOC/month is the standard error. After getting used to the new environment, the productivity was measured at 177 +/- 13 LOC/month. At the 95% confidence level, was there a statistically significant improvement in productivity? Explain, and show your work! (Hint: see the Statistical Q&A section in Statistics for Software Process Improvement.)

6. We want to know if there is a relationship between the productivity of programmers on a project (in lines of code per staff-month) and the project's average staffing level, computed as P = E/T, where E is the reported effort of the entire project in staff-months and T is the duration of the project in calendar months. In other words, we are trying to find out what happens to productivity for larger projects. To answer this, we have a data file of 187 projects from a number of sources. These projects were written in a variety of languages and span the entire spectrum of size and complexity.
a) Open the data file "project_productivity.sav". The variable “p_range” is the range of staffing level, P, for the number of projects given by “freq” (for “frequency”).
The average value of productivity for those projects is given by “prdctvty”, which is expressed in LOC/staff-month.
b) Determine the average productivity for the entire data set.
Use Analyze > Descriptive Statistics > Descriptives.
Select prdctvty.
Output: Write a sentence to describe the mean and standard deviation. Remember to include units and round results appropriately!
c) Now weight the results by how many times each staffing level category (p_range) was reported.
Select Data > Weight Cases.
Select "Weight cases by" and choose "freq".
Repeat step b to reassess the mean and standard deviation.
Output: Same as step b, noting that the data were weighted by the frequency of each case.
d) Create a new variable called 'sequence' which holds the midpoint of p_range for each row. So the first value is (0+1)/2 = 0.5, the second is (1+2)/2 = 1.5, etc. For the last entry (p_range >100), use sequence = 150.
e) Generate a scatter plot.
Use Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter.
Set Y = prdctvty and X = sequence.
Output: your graph!
Note for copying SPSS graphs into MS Word: In SPSS, right-click on the graph and copy it. Go into Word, click once where you want it to go, and select Edit / Paste Special. Select the Picture option and click OK. Right-click on the picture and choose Format Picture. Select the Layout tab, and make sure the picture is “In line with text.” Click OK. Now the image from SPSS is easy to resize and center as needed.
f) Generate a regression analysis of this data.
Use Analyze > Regression > Curve Estimation.
Set Dependent Variable = prdctvty and Independent Variable = sequence.
Select the Linear and Logarithmic models.
Check the box “Display ANOVA table”.
Output: Make a 3x3 table with the following data: use "R Square" and "Standard Error" as data columns, and Linear and Logarithmic for the rows. Fill it in!
g) Output: Based on their R Square values, which curve has the best fit?
Also note that the curve with the best fit should be visually closest to the Observed curve, which is your input data. Is it a good fit?
h) Output: What does the best-fitting model tell you about the relationship between productivity and project size as measured by average staffing level?
i) Output: What do you think are the major reasons that productivity behaves this way as a function of team size?

7. This problem uses an excerpt from a Change Request (CR) database. CRs are problem reports for an existing software system. We want to determine how long CRs took to be completed (closed), based on their severity (criticality).
a) Open the data file "cr_closure.sav".
b) Define a new variable. Call it "clos_tim", and make its type Numeric. Note that SPSS variable names formerly could have only eight characters. We won't use them here, but labels may be attached to variables to describe them more fully.
c) Calculate the value of clos_tim.
Use Transform > Compute Variable.
Set the Target Variable to clos_tim (you have to type it in by hand).
For the Numeric Expression, use: (closedat-opendate)/86400
[WHY? SPSS stores dates as the number of seconds since a fixed reference date. Hence the difference between two dates is the number of seconds between them. To convert this to days, divide by (60 sec/min × 60 min/hr × 24 hr/day) = 86,400 sec/day.]
d) Now calculate the mean time to closure for CRs by their severity.
Use Analyze > Compare Means > Means.
Select clos_tim as the Dependent variable, and severity as the Independent variable.
Output: Fill in the table below. Be sure to identify the time unit for the mean and standard deviation.

Time to Close for Change Requests
Change Request Type | Severity 1 | Severity 2 | Severity 3 | All CRs
Mean                |            |            |            |
Std. Deviation      |            |            |            |

Here is the grading scale for the questions.

Problem | Points
1       | 3
2       | 3
3a      | 6
3b      | 3 (1 per graph)
3c      | 2
4       | 3
5       | 3
6b      | 1
6c      | 1
6d      | 1
6e      | 2
6f      | 1
6g      | 1
6h      | 1
7d      | 2
TOTAL   | 35 points
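The date arithmetic in step 7c and the grouping in step 7d can be cross-checked outside SPSS. The Python sketch below is only an illustration under stated assumptions: the change requests and their dates are hypothetical, the epoch is an assumed stand-in for SPSS's own reference date (any fixed date works, because it cancels out when two dates are subtracted), and the helper names to_seconds and closure_days are made up for this example. It computes closure time in days as (closedat-opendate)/86400, then reports the mean and sample standard deviation per severity and for all CRs, which is assumed to match what Analyze > Compare Means > Means reports.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

SECONDS_PER_DAY = 60 * 60 * 24  # 86,400 sec/day, as in step 7c

# Assumed stand-in epoch: any fixed reference date works, because it cancels
# out when one date is subtracted from another.
EPOCH = datetime(1582, 10, 14)


def to_seconds(d: datetime) -> float:
    """Hypothetical helper: represent a date as seconds since the fixed epoch."""
    return (d - EPOCH).total_seconds()


def closure_days(opendate: datetime, closedat: datetime) -> float:
    """Hypothetical helper mirroring the SPSS expression (closedat-opendate)/86400."""
    return (to_seconds(closedat) - to_seconds(opendate)) / SECONDS_PER_DAY


# Hypothetical change requests: (severity, opendate, closedat).
crs = [
    (1, datetime(2010, 3, 1), datetime(2010, 3, 4)),
    (1, datetime(2010, 3, 2), datetime(2010, 3, 7)),
    (2, datetime(2010, 3, 1), datetime(2010, 3, 11)),
    (2, datetime(2010, 3, 5), datetime(2010, 3, 19)),
]

# Group closure times (in days) by severity, like Compare Means > Means.
by_severity = defaultdict(list)
for severity, opened, closed in crs:
    by_severity[severity].append(closure_days(opened, closed))

all_days = [d for days in by_severity.values() for d in days]


def summarize(label: str, days: list) -> None:
    # stdev() is the sample standard deviation (n - 1 denominator).
    print(f"{label}: mean {mean(days):.1f} days, "
          f"std dev {stdev(days):.1f} days (n={len(days)})")


for severity in sorted(by_severity):
    summarize(f"Severity {severity}", by_severity[severity])
summarize("All CRs", all_days)
```

Because only the difference of two dates is used, the result in days does not depend on which epoch is chosen; only the 86,400 sec/day conversion matters.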