David Patterson, College of Social Work
The University of Tennessee
September 18-19, 2006 – Denver, Colorado Sponsored by the U.S. Department of Housing and Urban Development
Regression Analysis of HMIS Data
• Regression analysis
– Also known as linear regression or ordinary least squares (OLS) regression.
– Bivariate regression
• Measures the association or relationship between a dependent variable (DV) and an independent variable (IV).
• Estimates the change in the DV for each one-unit change in the IV.
– Multiple regression
• Measures the relationship between a single DV and two or more IVs.
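The "one-unit change" idea can be illustrated with a minimal sketch of a bivariate least-squares fit. The function name and the data are hypothetical, made up for illustration only; the data are constructed so that the DV is exactly 10 + 2 × the IV.

```python
def bivariate_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data (not from any HMIS): age as the IV, days as the DV
age = [20, 30, 40, 50]
days = [50, 70, 90, 110]

intercept, slope = bivariate_ols(age, days)
print("intercept:", intercept, "slope:", slope)
```

The slope is the estimated difference in the DV for each one-unit change in the IV; the intercept is the predicted DV when the IV is zero.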
• Multiple regression can help us understand possible causal relationships between certain outcomes (DV) and possible causal factors (IVs).
– E.g., length of stay prior to housing (DV) and age (IV1), duration of homelessness (IV2), and current income (IV3).
– Stated another way, how is length of stay prior to housing predicted independently by each IV and through their combined influence?
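As a rough sketch of how one DV can be predicted from three IVs at once, the example below solves the normal equations with plain Python. The helper names, the cases, and the coefficients used to generate the DV are all hypothetical; a statistics package such as SPSS would normally do this work.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def multiple_ols(rows, y):
    """Return [b0, b1, ..., bk]; rows holds the IV values for each case."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical cases: (age, months homeless, monthly income)
cases = [(25, 3, 0), (40, 12, 500), (33, 6, 200), (52, 24, 0), (47, 1, 800), (29, 9, 300)]
# Hypothetical DV built from known coefficients so the fit can be checked
length_of_stay = [5 + 0.5 * a + 2 * m - 0.01 * i for a, m, i in cases]

coefficients = multiple_ols(cases, length_of_stay)
print(coefficients)
```

Each coefficient estimates the change in the DV for a one-unit change in that IV with the other IVs held constant, which is the "independent" contribution referred to above.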
• Challenges of regression analysis with HMIS data
– Level of measurement
• Many HMIS variables are NOT continuous, yet multiple regression requires a continuous DV.
• Most are nominal, e.g., race, zip code, gender, disability status.
– High levels of missing data in many variables
• Common in social services data sets.
• Requires careful evaluation of the extent and pattern of missing data.
• Requires selecting and implementing a missing data procedure.
• Nominal-level data add complexity.
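Evaluating the extent and pattern of missing data can start with simple counts. The sketch below assumes records held as dictionaries with None marking a missing value; the field names and records are hypothetical.

```python
# Hypothetical records; None marks a missing value. An income of 0 is a real value.
records = [
    {"age": 34, "gender": "F", "income": 450},
    {"age": None, "gender": "M", "income": None},
    {"age": 51, "gender": None, "income": None},
    {"age": 27, "gender": "M", "income": 0},
]

def missing_share(records, field):
    """Extent: proportion of records with no value for the field."""
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

def missing_patterns(records, fields):
    """Pattern: how often each combination of missing fields occurs."""
    patterns = {}
    for r in records:
        key = tuple(f for f in fields if r.get(f) is None)
        patterns[key] = patterns.get(key, 0) + 1
    return patterns

fields = ["age", "gender", "income"]
shares = {f: missing_share(records, f) for f in fields}
patterns = missing_patterns(records, fields)
print(shares)
print(patterns)
```

Fields that tend to be missing together show up as a shared key in the pattern counts, which can inform the choice of a missing data procedure.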
• Normality
– Scores or observations would be normally distributed in the population of interest. This is assumed if sampling is random or includes random assignment, which is generally not the case in HMIS data. (Explore with a frequency distribution.)
[Figure: histogram of AGE; bins centered from 17.5 to 77.5, frequency axis from 0 to 70. Std. Dev = 11.28, Mean = 38.9, N = 626]
• Note: Age is not quite normally distributed in this graph.
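A frequency distribution of this kind can be tabulated with a short script. The sketch below uses five-year bins starting at 15 and adds a moment-based skewness coefficient as a rough numeric companion to the graph; the ages are hypothetical and the function names are illustrative.

```python
def frequency_distribution(values, start, width):
    """Count values per bin [lower, lower + width), keyed by the bin's lower bound."""
    counts = {}
    for v in values:
        lower = start + ((v - start) // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical ages, not the N = 626 sample
ages = [19, 22, 24, 31, 33, 35, 36, 38, 41, 44, 47, 52, 58, 67]

for lower, count in frequency_distribution(ages, 15, 5).items():
    print(f"{lower}-{lower + 5}: {'*' * count}")
print("skewness:", skewness(ages))
```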
• Equality of variances (homoscedasticity)
– Points in the scatterplot of the residuals (the differences between the observed and predicted values) are randomly distributed about a horizontal line through the mean of the residuals.
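One crude way to look for unequal variances without a plot is to compare the size of the residuals in the lower and upper halves of the IV. This is only a sketch with hypothetical data, built so that the scatter grows as the IV increases; it is not a formal test.

```python
def residuals(x, y):
    """Residuals (observed minus predicted) from a bivariate least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def spread_by_half(x, res):
    """Mean absolute residual for the lower and upper half of the cases, ordered by x."""
    pairs = sorted(zip(x, res))
    half = len(pairs) // 2
    lower = [abs(r) for _, r in pairs[:half]]
    upper = [abs(r) for _, r in pairs[half:]]
    return sum(lower) / len(lower), sum(upper) / len(upper)

# Hypothetical data with scatter that widens as x increases
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 11.5, 10.0, 16.0, 13.5]

res = residuals(x, y)
low_spread, high_spread = spread_by_half(x, res)
print("mean of residuals:", sum(res) / len(res))
print("lower half spread:", low_spread, "upper half spread:", high_spread)
```

A marked difference between the two spreads, as in this constructed example, is the numeric counterpart of a fan-shaped residual scatterplot.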
• Independence
– Scores or observations are independent of each other: each value is independently derived, and one event or value does not depend on another event or value.
– Durbin-Watson statistic between 1.5 and 2.5
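The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch follows, with hypothetical residual sequences and the 1.5 to 2.5 range from the slide treated as a rule of thumb.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def within_rule_of_thumb(d, low=1.5, high=2.5):
    """True if d falls inside the range given on the slide."""
    return low <= d <= high

# Hypothetical residual sequences, in case order
no_pattern = [1, -1, -1, 1, 1, -1, -1, 1]
positively_correlated = [1, 1, -1, -1]
negatively_correlated = [1, -1, 1, -1]

for name, res in [("no pattern", no_pattern),
                  ("positive", positively_correlated),
                  ("negative", negatively_correlated)]:
    d = durbin_watson(res)
    print(name, d, within_rule_of_thumb(d))
```

Values well below 2 point toward positive serial correlation and values well above 2 toward negative serial correlation.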
• Exploratory Research Question
– Is there a relationship between duration of homelessness (days) and age, years of education, and weight?
• Method
– Regression analysis with SPSS and Excel using two data sets.
• Intention
– Demonstrate the utility of these two tools in regression analysis with HMIS data.
– Demonstrate the challenges of regression analysis with HMIS data.
1. Report downloaded from HMIS data system.
2. Data cleaned and file prepared in Excel.
3. Excel file opened in SPSS.
Mean = the average value for each variable.
Standard deviation = a measure of the dispersion of values around the mean.
Together they summarize the center and spread of the distribution for each variable.
Correlations measure the strength of the relationship between two variables.
Correlation values range between 1.0 and -1.0.
The closer to zero, the weaker the correlation.
Note the weak correlations between the DV and each of the IVs.
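These descriptive statistics can be computed directly. The sketch below uses the sample (n - 1) form of the standard deviation, which is an assumption about the software's reported value, and Pearson's r for the correlation; the data are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std_dev(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(x, y):
    """Pearson correlation coefficient, between -1.0 and 1.0."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: age (IV) and days homeless (DV)
age = [23, 35, 41, 52, 60]
days = [120, 45, 200, 90, 150]

print("mean age:", mean(age), "SD age:", std_dev(age))
print("r(age, days):", pearson_r(age, days))
```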
• Results: while the results may be significant, are there problems with the model?
Note the R Square value.
R Square indicates the proportion of variation in the DV explained by the IVs.
In this model, an R Square of .031 means that the three IVs account for very little (about 3%) of the variance in the DV.
The fact that the model is statistically significant may be due to the large N (1550).
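The effect of sample size can be seen in the standard formula for the overall F statistic, F = (R²/k) / ((1 - R²)/(n - k - 1)), where k is the number of IVs. The sketch below plugs in the rounded R Square of .031 and assumes all 1550 cases entered the model, so the result is illustrative rather than a reproduction of the SPSS output; the comparison sample of 100 is hypothetical.

```python
def f_statistic(r_square, n, k):
    """Overall F for a regression with k IVs and n cases."""
    return (r_square / k) / ((1 - r_square) / (n - k - 1))

# R Square and N as reported on the slide; k = 3 IVs (age, education, weight)
f_large_sample = f_statistic(0.031, 1550, 3)

# Same R Square with a hypothetical sample of 100 cases
f_small_sample = f_statistic(0.031, 100, 3)

print("F with N = 1550:", round(f_large_sample, 2))
print("F with N = 100:", round(f_small_sample, 2))
```

With the same small R Square, F grows roughly in proportion to the sample size, which is why a model that explains very little variance can still reach statistical significance.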
Distribution of the DV is highly skewed.
Departure from the straight line indicates the data are not normally distributed.
• Second SPSS regression analysis with a smaller sample.
– N = 626
– Constrained data set limiting duration of homelessness to > 1 month and < 1 year.
Is the distribution normal or skewed?
The F statistic, used in regression to test the significance of the model, is quite robust to violations of the assumption of normality.
Note the weak correlations between the DV and each of the IVs.
Note there is no statistically significant bivariate correlation between the DV and any of the IVs.
Note the regression model is not significant.
The results suggest that, for this sample, age, years of education, and weight cannot be used to predict duration (days) of homelessness.
In this data set (N = 626), the distribution is much less skewed than in the N = 1550 data set.
The data are more normally distributed than in the N = 1550 data set.
1. Report downloaded from HMIS data system.
2. Data cleaned and file prepared in Excel.
• Excel produced histograms to examine the shape of the distributions.
• Steps of Excel regression analysis
– The Chart Wizard can be used to produce scatterplots to examine the bivariate correlation between the DV and each IV of the model.
There is a weak correlation between the variables.
The variables are not correlated.
1. Select Data Analysis under Tools in the menu bar.
2. If Data Analysis does not appear, select Add-Ins, then check Analysis ToolPak.
Specify the input range for the Y (DV) and the X (IV) variables.
Check boxes for all plots.
• Excel regression statistics are the same as the results from the second SPSS analysis.
Residual plots are used to check regression assumptions. Significant patterns in the scatterplot suggest a violation of regression assumptions.
Used to check the normality assumption.
Stat             SPSS     Excel
R Square         .002     .00
F-stat           .506     .51
Sig.             .678     .68
Durbin-Watson    1.972    Not reported
Spreadsheet Data Analysis Resources
Spreadsheet Tutorial http://www.usd.edu/trio/tut/excel/index.html
Using Formulas and Functions http://www.meadinkent.co.uk/excel.htm
Setting Up Data Analysis Tools in Excel http://www-micro.msb.le.ac.uk/1010/toolpak.html
Excel Spreadsheet Tips http://www.mrexcel.com/articles.shtml
Data Analysis with Spreadsheets (with CD-ROM) http://www.ablongman.com/catalog/academic/product/0,1144,020540751X,00.html
Using Spreadsheets for Data Collection, Statistical Analysis and Graphical Representation http://web.utk.edu/%7Edap/Random/Order/Start.htm