Data Analysis 101 - STARS Computing Corps

An Overview of Data Analysis for Evaluation Assistant Research Projects
Adapted from the Association for Institutional Research
Customized for the STARS Alliance Evaluation Assistants by Audrey Rorrer, UNC Charlotte

Outline
• Key Considerations in Data Analysis
• Components of Data Analysis
• Distinguishing Data Types
• Distinguishing Different Types of Analyses
• Overview of Different Statistical Software
• EA Project Example: Terrell Perrotti, South Carolina State University

Key Considerations in Data Analysis
• Identify the purpose of the analysis or project
• Understand the sample(s), i.e. the people, under study
• Understand the instruments being used to collect data
• Be cognizant of data layouts and formats
• Establish a unique identifier if matching or merging is necessary (e.g. for pre-to-post comparisons)

Components of Data Analysis
• Statement of research question(s)
• Methods used to answer research question(s)
• Timeline
• Budget (usually required, though not necessarily for EA projects)
• Data management procedures (paper surveys, Excel, etc., as applicable)
  • Design for measuring, collecting, scoring, equating, etc.
  • Data cleaning procedures (e.g. removing outliers)
  • Quality control procedures at every step in the project
Good news: you've already addressed these components in your EA Project Timelines!

Some Examples of Analyses
• Frequency Distributions and Cross-Tabulations
  • How many people responded with a correct answer to item 1?
  • How many of those responding correctly were male, and how many were female?
• Descriptive Statistics (Means, Standard Deviations, Correlations)
  • What was the average satisfaction rating?
  • What was the standard deviation of the scores on a particular item?
  • Did an item correlate with (or relate to) another item or outcome?
• T-tests and Analysis of Variance (ANOVA)
  • Is there a difference between pre-test and post-test outcomes?
  • Is there a difference between pre-test and post-test outcomes among different groups, such as males and females?
• Regression
  • Does a variable (or factor) predict a certain outcome?
  • If so, what is the regression equation that models the outcome?

And Finally, the More Advanced the Analysis, the Greater the Amount of Preparation
• Most analyses can be executed straight from a working data file in:
  • Excel
  • SPSS (Statistical Package for the Social Sciences)
• Some analyses may require transformations of the raw data, subsets, or specific input data to comply with the statistical software
  • Example: when analyzing in Excel or SPSS, you will need to use a numerical code to represent nominal data, such as male=1 and female=2, or pre-test=1 and post-test=2

Useful Terms for Data Analysis
• A variable is a characteristic or condition that changes or has different values for different individuals
• Variables may require special coding for different data representations:
  • Nominal (ethnicity, gender, name of school)
  • Ordinal (pre-test and post-test codes)
  • Scale (a rating scale with incremental values)
  • Ratio (an incremental scale with an absolute zero point, such as money, or temperature measured in Kelvin)
• Treatment or intervention: the medicine or experience that is provided to the participants (e.g. a weight loss plan; a robotics outreach program)
• Dependent variables are the ones observed in order to assess the effect of the treatment or intervention (e.g. number of pounds lost; attitudes about computing)
• Independent variables are those manipulated by the researcher
  • For experimental research: the treatment group vs. the non-treatment group, e.g. a Jenny Craig group vs. a Weight Watchers group vs. a no-weight-loss-plan group
  • For quasi-experimental research (most SLC research projects): they may be variables that cannot be controlled, such as gender, age, or ethnicity, but this depends upon how the study is set up

Overview of Programs Used for Data Analysis
• SAS and SPSS are the most commonly used and tend to focus on the "classic" statistical routines:
  • Descriptive statistics and non-parametric ("distribution-free") tests
  • ANOVA / Regression
  • Factor analysis
• However, many psychometric procedures (e.g. IRT) and newer statistical models are not as well supported by these programs
  • Very specialized programs are used instead, designed to do a specific task or validate a theory
• Specialized programs may have issues:
  • Interface not very user-friendly
  • Additional data types or files required
  • Expense
• Most EA projects will use Excel and/or SPSS, so we'll focus on these

What is SPSS?
• A commercially produced statistical software package that is widely used in the fields of Education and Psychology
• Program functionality is broken into over a dozen different modules, which are sold individually
  • The most commonly used are Base, Regression Models, and Advanced Models
  • Other modules can be installed to run more complex analyses
• SPSS data files include both the data and the variable information (variable and value labels, formats, and missing values)

What Program Should I Use?
• Microsoft Excel is one of the most basic and widely accessible spreadsheet programs available today
  • It is best suited to general data exploration, histograms, scatter plots, etc.
  • Appearance of tables can be customized
  • Allows for an easy transition to other programs to complete analyses and write reports
  • However, it was not designed as a statistical analysis program
• SPSS is designed for specific analytic tasks, such as testing the statistical significance of findings
• Balance the results against what will be presented
  • Choose wisely in the interests of efficiency and accuracy of results
  • Some output is good for exploring the data and generating basic tables, but not for presenting the data

Case Study Example of Terrell's EA Project

Terrell's Project
• South Carolina State's SLC students are conducting a 7-week robotics club to teach middle school students how to manipulate robots. The goal is to increase the students' knowledge of and interest in computing. The hypothesis is that the students' interest in computing will increase over the course of the 7-week intervention.
• Computing attitudes are being measured by the Computing Attitudes Survey, available on the EA Website and widely used throughout STARS

Some Nerdy Information for Those of You Who May Be Interested
• The study design is quasi-experimental, with a pre/post survey design
• There is no control group (a control group would be, e.g., a comparable middle school class without the robotics intervention that takes the pre and post survey at the same time as the class with the robotics intervention)

Data Analyses: Option A
For the primary research question: Do students' attitudes towards computing improve after participating in a 7-week robotics program?
• Descriptive Information
• Statistical Significance
  • A T test in SPSS will determine:
    • If there were increases in student attitudes from pre to post
    • If those increases (or decreases) were statistically significant (i.e. unlikely to have happened by chance alone)

Step 1: Collect and Manage Data
• Terrell gave the Computing Attitudes Survey to the middle school class on the first day the SLC students visited the class
• Protect the anonymity of the students:
  • He did NOT collect student names, but instead gave each survey sheet a code to indicate that it was a pre-survey
  • Use something that makes sense for the project and be consistent for the post-test, making sure to distinguish between the pre- and post-tests
  • In this example, the pre-surveys are all coded (or labeled) as 1, and the post-surveys are labeled 2

A Note About Coding Surveys
• If there is only one classroom being surveyed, a simple method is feasible
• If there are several classrooms, e.g. Mr. Jones's and Mrs. Smith's, then a more complex coding system is needed
  • Why? Because the teacher/classroom becomes a variable
• If Terrell wanted to do a matched comparison of student outcomes, say comparing Juan's pre-survey to Juan's post-survey, a unique code would need to be devised for each student
  • This protects their identities and allows matched comparisons

A Sample of the Student Survey
STARS Outreach Computer Attitude Survey for Secondary Students
Please read each sentence and circle your answer to each one as follows: SD = Strongly Disagree; Disagree; Neutral; Agree; SA = Strongly Agree
  I don't think I would like working with computers in my job.
  Learning about the use of computers to solve problems is interesting.
  I am not smart enough to be good at computing as a major or career.
  Computers can be used to help people.
  Learning about how computers might be used in the future is boring.
  I will use computers in many ways in my life.
  Knowing how to work with computers will help me get a good job someday.
  I believe that math and computer careers will keep me in an office in front of a computer all day.
• Please check beside the ways you use computers: ___Computer games ___Homework ___Facebook ___Email
• How old are you? __________
• I am: ___White ___Black ___Hispanic ___Native American ___Asian ___Other: ____________ (check one)
• I am a: male / female (circle one)

Step 2: Setting Up the Excel File
• Terrell creates an Excel spreadsheet for all the surveys:
  • Each row is a student response
  • Each column is an item on the survey
  • An additional column indicates whether the survey was a pre or post collection
• For good research practice, he has one tab with the actual responses and a second tab with the numerical representations
[Screenshot: the tab with the actual responses]
• He decides to use item 20 (the computer-use checklist) descriptively*
  *He could use item 20 quantitatively to see if there are relationships between attitudes about computers and how students use computers, but this is beyond the scope of this presentation
[Screenshot: the tab with the numerical codes]
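For teams who would rather script the second (coded) tab than type the codes by hand, the coding step can be sketched in Python. This is a minimal, hypothetical sketch: the column names (Time, Gender, Item1, Item2, ...) and the Likert codes SD=1 through SA=5 are assumptions, since the deck's actual code sheet is not reproduced here; the gender and time codes follow the deck (male=1, female=2; pre-test=1, post-test=2). The CSV text it produces could be saved as a .csv file and opened in Excel.

```python
import csv
import io

# ASSUMPTION: illustrative Likert codes; replace with your own code sheet.
LIKERT_CODES = {"SD": 1, "Disagree": 2, "Neutral": 3, "Agree": 4, "SA": 5}
# Codes taken from the deck: male=1, female=2; pre-test=1, post-test=2.
GENDER_CODES = {"male": 1, "female": 2}
TIME_CODES = {"pre": 1, "post": 2}


def code_row(raw):
    """Convert one raw spreadsheet row (one student's survey) to numeric codes.

    Hypothetical column names: "Time", "Gender", and "Item1", "Item2", ...
    An unexpected or blank answer raises KeyError, flagging it for cleaning.
    """
    coded = {
        "Time": TIME_CODES[raw["Time"]],
        "Gender": GENDER_CODES[raw["Gender"]],
    }
    for column, answer in raw.items():
        if column.startswith("Item"):
            coded[column] = LIKERT_CODES[answer]
    return coded


def to_csv(coded_rows):
    """Write coded rows as CSV text, header row first (one row per student)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(coded_rows[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(coded_rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Two made-up students, one pre-survey and one post-survey.
    raw_rows = [
        {"Time": "pre", "Gender": "female", "Item1": "Agree", "Item2": "SA"},
        {"Time": "post", "Gender": "male", "Item1": "Neutral", "Item2": "Disagree"},
    ]
    print(to_csv([code_row(row) for row in raw_rows]), end="")
```

With the two made-up rows above, the script prints a header line `Time,Gender,Item1,Item2` followed by `1,2,4,5` and `2,1,3,2`. Raising an error on an unrecognized answer, rather than silently skipping it, is a deliberate choice: it surfaces typos and blanks during data cleaning, and real missing answers would need their own missing-value code.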
Step 3: Conducting a T Test in SPSS
• SPSS is a convenient way to conduct a T test
  • Your school most likely has a student version that you can use for free, or check with your faculty advisor
• Step 1: Import the coded Excel data into an SPSS file
  • Copy and paste the Excel data into the Data View tab
  • It will look pretty much the same
  • You'll want to add the code descriptions (value labels) in the Variable View tab
[Screenshots: Data View tab; Variable View tab]

Getting Descriptive Data
• Basic Descriptive Statistics
  • Means and Standard Deviations for the Total Group and by Gender
• Step 1: Go to the Analyze menu, then click Descriptive Statistics, then Frequencies
  • Click on Age and then the arrow button to move it into the window; repeat with Race and Gender
  • Choose the statistics you want and click OK
  • You'll get a printout of the number, percentage, and anything else you selected (mean, median, mode)
[Screenshots: selecting the variables that you want frequencies for; printout of frequencies]

Running the T Test
• A T test will answer the question: Is there a difference between pre and post surveys?
• It will also tell you whether any differences are statistically significant
  • That is, whether they are unlikely to have occurred by chance alone
  • This is the really abbreviated explanation; you should read more about it online
• STEPS: Analyze > Compare Means > Independent-Samples T Test
  • Then select the Grouping Variable (which is Time), click Define Groups, and enter 1 and 2 as the two group values
  • Then select and move items 1-19 into the Test Variable(s) box
  • Click OK and the T test will run
[Screenshot: Grouping Variable = Time; Test Variables = items 1-19]

Sample Output
• Look at the average (mean) scores at pre and then at post to see if there was an increase or decrease
• Then look at the "Sig. (2-tailed)" column of the Independent Samples Test table
  • If the number in this column is below .05, the difference between pre and post is statistically significant at the .05 level
• The following example is from an actual survey conducted with college students to measure self-efficacy, attitudes toward computing, help-seeking, and intention to go to graduate school
[Sample output from actual data (not Terrell's)]

Sample of Reporting Outcomes

Gender                   Percent
Male                     37%
Female                   59%
Prefer not to specify    4%

Construct                Time   Mean    SD
Self-Efficacy            pre    3.11    0.47
                         post   3.48*   0.39
Intent                   pre    3.22    0.59
                         post   3.23    0.66
Attitude                 pre    3.67    0.36
                         post   3.70    0.38
Help Seeking/Coping      pre    2.98    0.39
                         post   2.98    0.36
*significant increase at p < .05

Other Options
• Terrell could decide that he wants to know if there are any differences in outcomes based upon gender or race
  • A Cross-Tabs for these variables can be conducted in SPSS, along with a chi-square analysis
  • Or an analysis of variance (ANOVA)
  • This analysis would also apply if there were different classes being compared, as in Mr. Jones's and Mrs. Smith's classes
• Descriptive information is reported in aggregate:
  • Total number of students (15)
  • Gender number and percent
  • Race/ethnicity numbers and percent

Storing Data
• This is particularly important if names are associated with data. In our case they're not.
• But keep in mind: always safeguard participant identity!
  • Limit access to data sources
  • Password-protect files
  • Keep data separate from the actual surveys with identifiers

More Information
• This was a very basic overview. For more information on data analyses:
  • Chi square: http://math.hws.edu/javamath/ryan/ChiSquare.html
  • ANOVA: http://www.statisticssolutions.com/resources/directory-of-statistical-analyses/anova
• For more information on SPSS:
  • http://www.hmdc.harvard.edu/projects/SPSS_Tutorial/spsstut.shtml
• Contact Audrey directly if you would like to address specific questions about your EA project!
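For EA teams without SPSS access, the independent-samples T test from Step 3 can be reproduced in plain Python. This is a minimal sketch under stated assumptions, not a replacement for SPSS: it computes only the pooled-variance ("Equal variances assumed") t statistic, its degrees of freedom, and a two-tailed p-value, using the standard identity p = I_x(df/2, 1/2) with x = df/(df + t^2), where I is the regularized incomplete beta function evaluated by a continued fraction. The function names are hypothetical, and the scores are assumed to be coded already and split by the Time code (1 = pre, 2 = post). Run it once per item, just as SPSS tests each of items 1-19 separately.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use the symmetry I_x(a, b) = 1 - I_(1-x)(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_tailed_p(t, df):
    """Two-tailed p-value for a t statistic with df degrees of freedom."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def independent_t_test(group1, group2):
    """Return (t, df, p) for group1 minus group2, pooling the two variances."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, df, two_tailed_p(t, df)


if __name__ == "__main__":
    # Made-up coded scores for one survey item (Time 1 = pre, Time 2 = post).
    pre = [3, 3, 4, 2, 3]
    post = [4, 4, 5, 3, 4]
    t, df, p = independent_t_test(pre, post)
    print(f"pre mean = {statistics.mean(pre):.2f}, "
          f"post mean = {statistics.mean(post):.2f}")
    print(f"t = {t:.2f}, df = {df}, p (2-tailed) = {p:.3f}")
```

With these made-up scores (pre mean 3.00, post mean 4.00), t is about -2.24 with 8 degrees of freedom and p falls between .05 and .10, so this difference would not be significant at the .05 level. A negative t means the post mean is higher, because the sketch subtracts group 2 (post) from group 1 (pre), the same direction as SPSS's mean difference. Like the SPSS procedure described above, this treats pre and post as independent groups, which fits anonymous surveys that cannot be matched student by student.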
