CBR221 Introduction to Survey Data Analysis with Excel Workshop Objectives Use Excel to help you: – Organize data for analysis – Systematically work with data – Analyze data – Graphically display analysis 2 We’ll Examine – Types of variables used in analysis – Types of measurement scales used in analysis – How to describe data with Frequency counts, Descriptive Statistics, Histograms, and Pivot Tables – How to create charts Survey Questions: 1. Area of the city child lives in? 2. Number of colds child had last year? 3. Gender: 4. Age 5. Describe cold symptoms __________ Male_____ Female _____ Analysis Helps describe, conclude, recommend Systematic exploration for interpreting data 5 Survey Data Answers can be in text or numbered formats Statistics Systematic method of converting and analyzing data by using numbers 7 Excel Support tool for statistical methods 8 Start Excel © The Wellesley Institute www.wellesleyinstitute.co 9 Moving from Model to Excel Data Analysis • Assumption, hypothesis, or model • Collect data • Organize data for analysis 10 Example Model or Assumption • “West area children get fewer colds than central area children.” • Want proof • Analysis: Find mean (average) number of colds by area When Data Fit Model • Data cluster as expected • Findings support assumption or model For 33 west area children, the mean is 5 colds. For 31 central area children, the mean is 6 colds. The results indicate west area children do have fewer colds than central area children. Table Illustrates Table 1. Colds by City Area Zone Participants Colds x n West 33 5 Central 31 Total 64 6 Graph Illustrates Mean Colds 6 5 West Central Figure 1. West area children had a lower mean number of colds than central area children. Pie Chart Illustrates West, 5 Central, 6 Figure 1. West area children had a lower mean of five (5) colds than central area children who had a mean value of six (6). Data Organizing Tips As required, ensure: • All questions answered • Questions needing to be skipped, were skipped • Split multiple choice into 1 answer per column • Open File 1 16 Coding Open Ended Questions What cold symptoms did your child have? • Take first 100 answers • Group similar answers together • You define what is similar • Reduce to 10-20 codes or fewer if useful • Pilot test 17 Exercise: Create 3 codes Question: Describe child’s symptoms? • • • • • • • • • Response 1 Response 2 Response 3 Response 4 Response 5 Response 6 Response 7 Response 8 Response 9 Stuffy nose Sinus congestion, Runny nose Difficulty breathing through nose Phlegm, Body ache, Runny nose No energy, Cough No energy Sore throat Cough No energy 18 Exercise: Code Symptoms Code Description Block Stuffy nose, Sinus congestion, Difficulty breathing through nose, No energy Expel Cough, Phlegm, Runny nose Pain Body Ache, Headache, Sore throat 19 Assign Values to Codes Code Description Value Block Stuffy nose, Sinus congestion, Difficulty breathing through nose, No energy Yes=1 No=0 Expel Cough, Phlegm, Runny nose Yes=1 No=0 Pain Body Ache, Headache, Sore throat Yes=1 No=0 20 I entered Code Variables and Values on new Excel sheet 21 I entered Responses with values on new Excel Sheet 22 To Enter Data • • • • Row 1 has label for each variable Enter data 1 survey at a time 1 row per ID, work left to right 1 answer per column row 1 A ID B Location C Block row 2 1 1 0 row 3 2 2 1 row 4 3 1 0 23 To Enter Data • Enter 1 survey at a time • For each question, work left to right across single Excel row • Look at File 1b 24 Practice Enter answers on File 1b, sheet 3 Tip: Split answers across 3 columns Question 2 • ID#1 Answer • ID#2 Answer • ID#3 Answer • ID#4 Answer • ID#5 Answer a)Block X a)Block X a)Block X a)Block X a)Block X b)Expel b)Expel X b)Expel b)Expel X b)Expel X c)Pain c)Pain c)Pain c)Pain X c)Pain Compare your results with Data sheet 25 Other Possible Open-Ended Question Codes 1 = positive comment 2 = negative comment 3 = neutral comment, positive and negative Verbatim codes: e.g., “ache”, “congested”, “cough” Code only those related to research question 26 Text or Numbers • Text codes often easier to remember, fewer entry errors • e.g. M for male and F for female • But numbers often faster entry • Easier to work with numbers in Excel 27 Check for Response Accuracy • See File 1b, Accuracy Sheet • Take 1 question at a time - i.e. pick out single column and check answers • ID numbers at left • For unanswered question, create blank cell (pivot tables count blanks) 28 Use Excel When have larger number of respondents Makes manual calculations easier 29 Organize Data • Make sure data are entered into Excel in such a way that mathematical transactions can be performed on them • E.g., If studying gender, let male equal 1 and female equal 2. Can then count the 1s and 2s in your study. 30 To Explain Findings Use common terms e.g., Variable Use accepted methods of analyses (Certain variables and measurements scales use certain tests) Variable An object or human characteristic that: • • • Is observable Can be subject to variation Can be classified according to a type (discrete, continuous as well as dependent, independent) Variable and Measurement Scale • Certain variables also use certain scales • Can do frequency test for all variables • But variable and scale type may also further inform with additional statistical test e.g., descriptive statistics Discrete vs Continuous Variable Discrete Continuous • Finite values • No valid values in-between • E.g., male/female full/part-time/co-op • Infinite values • Valid values in-between • E.g., distance, height, age Discrete or Continuous Variable Hypothesis: City areas have different water temperatures. Discrete Variable: e.g., West, Central, East area Yes or No Value, no values in-between, no equi-distance Continuous: e.g., water temperature Infinite number or range of possible values in-between 35 Dependent vs Independent Variable Dependent • Assumed to change Independent • Assumed to influence or does not change Hypothesis: City areas have different water temperatures Dependent Water temperature Independent City area Measurement Scales How do we measure what we are working with? Types: 1. 2. 3. 4. Nominal Ordinal Interval Ratio Nominal Scale • • • • • • Finite values No values in-between No logical order Yes/No answer E.g., East, West, Central E.g., colour, gender Ordinal Scale • • • • • Finite values No values in-between Yes logical order Yes/No answer E.g., letter grades Grade B Grade C Grade A Interval Scale • • • • • • • • • Infinite values Logical order Values in-between Equal distance between data points How much? Numerical value No natural zero, keeps going No meaningful ratio between numbers E.g., temperature, 20 degrees NOT twice as hot as 10 Ratio Scale • • • • • • • • Infinite values Logical order Values in-between Equal distance between data points Comparing how much? Numerical values “0” value means something Meaningful ratio between numbers E.g., AGE Adult earns $50K; Teenager earns $25K Adult earns twice as much as teenager : Variable Type Review Variable Type Values Values Value Values Scale * Infinite inin Equidist Range between order ant Discrete (Yes/no) X X X X Nominal Discrete Grade (Yes/no) A B C X X √ X Ordinal Continuous (How much?) Temperature or $$, Age √ √ √ √ Interval or Ratio Dependent Assumed changes due to an independent variable Independent Assumed does not change Areas * Certain variables lend themselves to using certain types of measurement scales Measurement Scale Review Scale Values finite in a range Values Values Values are Test * have equidistant inorder between Nominal (yes/no), AREAS √ X X X Frequency Ordinal A,B,C, (yes/no) √ √ X X Frequency Interval/ Ratio X (how much for 1 sample?) Age √ √ √ Descriptive Statistics Interval/ Ratio (how much for comparing 2 or more samples?) G1 age G2 age √ √ √ Inferential Statistics e.g. t-test X Note: * Certain scales lend themselves to using certain statistical tests. Statistical Tests • Frequency count for discrete data (nominal, ordinal scale) * - quantity • Descriptive Statistics for continuous data (interval, ratio scale) – mean, median, mode – Characteristics of single sample • Inferential Statistics for continuous data (ratio scale) – t-test – Comparison of two or more samples – Making inference from samples to populations Note: Can do frequency count for all data types 44 Methodology • “By using a discrete/continuous independent variable called area (for city areas West, East, and Central), this study examined the discrete/continuous, dependent variable (with its possible multiple range of values) namely, water temperature.” • “To measure the independent variable, (to indicate if someone lived in the West, East, or Central area), a nominal/ordinal/interval/ratio scale was used.” • “To measure the dependent variable, to indicate how much the water temperature is), a(n) nominal/ordinal/interval/ratio scale was used.” 45 Methodology • “By using a discrete independent variable called area, this study examined the continuous, dependent variable namely, water temperature.” • “To measure the independent variable, a nominal scale was used.” • “To measure the continuous dependent variable, an interval scale was used.” 46 Calculations The statistical calculations we would conduct with Excel would be: • Frequency counts for each of the 3 city areas • Descriptive statistics of mean, median, mode for water temperatures 47 Review of Descriptive Statistical Terms • Mean – average • Median – where 50% of scores lie above a certain score and where 50% lie below a certain score • Mode – score that results most often 48 Format Cells Open Booklet Open File 1c)