PSYCH 3400 Statistical Methods
CUNY Brooklyn College, Department of Psychology
Alla Chavarga
alla.chavarga@gmail.com

Approach of the Course
• In this class you will learn both the theory and practice of statistics.
• Homework is practice for the exams
• Essay-type answers
• Statistical calculations by hand
• SPSS analysis

Lab Format
• Announcements (make sure you are on time)
• Demonstration of new computer techniques required for that week’s homework
• Period of questions and answers
• Opportunity for you to work with SPSS when your TA is present
You should think of the lab section as training; you will complete most of the homework on your own time.

http://psychfiles.net
• Contact info
• Syllabus/Semester Schedule
• Lecture Slides
• Homework Assignments/Problem Sets

Definition of a Statistic
OUR WORKING DEFINITION: A number that organizes, summarizes or makes understandable a collection of data.
THE FORMAL DEFINITION: A number calculated on sample data that quantifies a characteristic of the sample.

Which of these makes more sense?
“In our calculations, we noted large differences in pupil size between males and females. The male group had pupil diameters (mm) of 3.2, 4.1, 4.6, 7.2, 4.1, 5.3, 8.1, 6.3, 4.8, 4.6, 4.8, while females had the following pupil diameters: 4.6, 7.1, 4.7, 3.7, 8.0, 4.8, 6.2, 4.5, 4.9, 7.1, 6.8. Obviously, there is a noticeable difference.”
vs.
“In our calculations, we noted large differences in pupil size between males and females. The male group had an average pupil diameter of 4.9, while females had an average pupil diameter of 6.1. Obviously, there is a noticeable difference.”

[Figures: three scatterplots of Pay against Hours worked.]
We can also use statistics to describe relationships that we can depict graphically, such as in these SCATTERPLOTS.

How do we acquire knowledge?
• Authority
• Scientific Method
• Intuition
• Rationality

WHY do I have to learn Statistics?

Some VERY important definitions:
• Experimental vs. Observational Methods
• Population – the complete set of individuals, objects, or scores that the investigator is interested in studying.
• Sample – a subset of the population.
• Variable – any property or characteristic of some event, object, or person that may have different values at different times depending on the conditions
– Independent: the variable that is systematically manipulated by the investigator
– Dependent: the variable that is measured to determine the effect of the independent variable
• Data – the measurements made on the subjects of an experiment
• Statistic – a number calculated on sample data that quantifies a characteristic of the sample. (Note: the corresponding number for a population is a parameter.)
– Descriptive vs. inferential statistics

The Concept of a Variable
Any measurable property of a person, event or object that may take on different values at different times or under different conditions.
[Figure: scatterplot "Textile Workers", Height (inches) on the y-axis against Weight (lbs) on the x-axis.]
Compare with a CONSTANT like π.

Continuous and Discrete Variables
[Figure: a discrete variable takes only separate values (1, 2, 3, 4, 5, 6); a continuous variable can be divided in half infinitely, for example between 2 and 3: 2.5 (1/2), 2.25 (1/4), 2.125 (1/8).]

Scales of Measurement
• Nominal: names or categories
• Ordinal: order, a sense of greater or lesser but not by how much
• Interval: ordinal and how much greater and lesser; each interval is equal
• Ratio: interval scale with an absolute zero; ratios of scores have meaning

Summarizing Samples with Math and Graphs
[Figure: histogram "SG Class Heights (Raw Scores)", Height (inches) from 54 to 64 on the x-axis, Frequency (number of individuals) on the y-axis. Scale labels shown on the slide: Nominal, Ordinal, Interval, Ratio.]

Significant Figures and Rounding
It does not make sense to carry our calculations beyond the real limits of the variables we measure. Ex: On a thermometer the smallest unit is half of a degree.
By convention, in this class we will round all numbers to the hundredths place (two places after the decimal).
5.624 → 5.62 when the 3rd decimal place is ≤4.
1.287 → 1.29 when the 3rd decimal place is ≥5.

Mathematical Notation
This is probably new to you: Σ
It means “summation”.

Mathematical Notation: Summation Calculation
Student ID    Grade (X)
1             93
2             75
3             88
4             77
5             65
6             55
7             97
ΣX = 93 + 75 + 88 + 77 + 65 + 55 + 97
ΣX = 550
Average of the variable X: (1/n)(ΣX) = (1/7)(550) = 78.57

Order of Operations
Order of operations: Parentheses, Exponents, Summation, Multiplication/Division, Addition/Subtraction
Read them like English sentences or lists of things to do in order.

Important Example
x: {1, 2, 3}
Σx²: “Sum of the squared x’s”
x     x²
1     (1)² = 1
2     (2)² = 4
3     (3)² = 9
Σx² = 14
(Σx)²: “Square of the summed x’s”
x
1
2
3
Σx = 6
(Σx)² = 6² = 36

How can data be described? Summarized?
Here is a set of 15 height measurements (in inches).
{55, 56, 56, 58, 60, 61, 57, 57, 59, 60, 60, 61, 54, 57, 57}
Value    Frequency
54       1
55       1
56       2
57       4
58       1
59       1
60       3
61       2
[Figure: frequency histogram of HEIGHT, values 54 to 61 on the x-axis, Frequency on the y-axis. Std. Dev = 2.20, Mean = 57.9, N = 15.]

How can data be described? Summarized?
How to create a detailed frequency table:
Example: How many siblings do you have?
Set of scores: x: {2, 1, 5, 0, 2, 1, 2, 0, 1, 1, 3, 1, 2, 1, 1, 0, 0, 2, 3, 1}
Value    Frequency
0        4
1        8
2        5
3        2
4        0
5        1
Total    20
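Returning to the summation notation and the rounding convention introduced earlier in this section, the arithmetic can be checked with a few lines of Python. This is only a sketch: the helper name `round_hundredths` is made up for illustration, and `ROUND_HALF_UP` is assumed to be the right match for the class rule (third decimal place ≥5 rounds up).

```python
from decimal import Decimal, ROUND_HALF_UP


def round_hundredths(value):
    """Round to two decimal places, rounding up when the 3rd decimal is >= 5.

    Hypothetical helper; pass the number as a string or Decimal so that
    binary floating point does not disturb the third decimal place.
    """
    return Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Grade (X) for student IDs 1-7, taken from the summation slide.
grades = [93, 75, 88, 77, 65, 55, 97]

sum_x = sum(grades)                       # ΣX
n = len(grades)
mean_x = round_hundredths(Decimal(sum_x) / Decimal(n))  # (1/n)(ΣX)

# The "Important Example": x = {1, 2, 3}
x = [1, 2, 3]
sum_of_squares = sum(v ** 2 for v in x)   # Σx²: square each x, then add
square_of_sum = sum(x) ** 2               # (Σx)²: add the x's, then square

print("ΣX =", sum_x)
print("mean =", mean_x)
print("Σx² =", sum_of_squares)
print("(Σx)² =", square_of_sum)
print(round_hundredths("5.624"), round_hundredths("1.287"))
```

The two sums differ only in the order of the steps, which is exactly what the order-of-operations slide asks you to read off the notation.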
How to create a detailed frequency table:
Example: How many siblings do you have?
Set of scores: x: {2, 1, 5, 0, 2, 1, 2, 0, 1, 1, 3, 1, 2, 1, 1, 0, 0, 2, 3, 1}
Percent = (Frequency / Total) x 100. For the value 0: (4/20) x 100 = .20 x 100 = 20
Value    Frequency    Percent    Cumulative Frequency    Cumulative Percent
0        4            20         4                       20
1        8            40         12                      60
2        5            25         17                      85
3        2            10         19                      95
4        0            0          19                      95
5        1            5          20                      100
Total    20

How can data be described? Summarized?
How to create a detailed frequency table:
Example: TEST GRADES!!?
Set of scores: x: {100, 23, 65, 98, 84, 72, 50, 49, 52, 99, 83, 79, 89, 90, 56, 63, 72, 92, 83, 100}
What if our range is very large?
- We use class intervals instead of single values
- Rule for # of intervals for use in this class: 10
- To determine the width that each interval should be, given the range of data we have, use the following formula:
Interval width = (Highest score – Lowest score)/10 = (100 – 23)/10 = 77/10 = 7.7; round this up to the next whole number, 8.
Intervals: 23-30, 31-38, 39-46, 47-54, 55-62, 63-70, 71-78, 79-86, 87-94, 95-102
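The class-interval rule for the test grades (10 intervals, width equal to the range divided by 10 and rounded up to the next whole number) can be sketched in Python as follows. The function name `class_intervals` is an illustrative assumption, not something from the course materials or SPSS.

```python
import math

# The 20 test grades from the example.
grades = [100, 23, 65, 98, 84, 72, 50, 49, 52, 99,
          83, 79, 89, 90, 56, 63, 72, 92, 83, 100]

NUM_INTERVALS = 10  # rule used in this class


def class_intervals(scores, num_intervals=NUM_INTERVALS):
    """Return (low, high) bounds for each class interval.

    Hypothetical helper: starts the first interval at the lowest score and
    uses width = (highest - lowest) / num_intervals, rounded up.
    """
    low, high = min(scores), max(scores)
    width = math.ceil((high - low) / num_intervals)
    return [(low + i * width, low + (i + 1) * width - 1)
            for i in range(num_intervals)]


intervals = class_intervals(grades)
for low, high in intervals:
    print(f"{low}-{high}")
```

One caveat: when the range divides evenly by 10, rounding up changes nothing and the highest score would fall just past the last interval. The slides do not cover that case, so this sketch leaves it unhandled.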
Set of scores: x: {100, 23, 65, 98, 84, 72, 50, 49, 52, 99, 83, 79, 89, 90, 56, 63, 72, 92, 83, 100}
Intervals    Frequency    Percent    Cumulative Frequency    Cumulative Percent
23-30        1            5          1                       5
31-38        0            0          1                       5
39-46        0            0          1                       5
47-54        3            15         4                       20
55-62        1            5          5                       25
63-70        2            10         7                       35
71-78        2            10         9                       45
79-86        4            20         13                      65
87-94        3            15         16                      80
95-102       4            20         20                      100

Choice of Interval is Important
[Figures: two histograms of HEIGHT. One uses the intervals 43-48, 49-54, 55-60, 61-66, 67-72; the other uses 45-47, 48-50, 51-53, 54-56, 57-59, 60-62, 63-65, 66-68, 69-71.]

Frequency Polygons
[Figure: frequency polygon of HEIGHT, values 54 to 61 on the x-axis, Count on the y-axis.]

By Comparison…
[Figures: histogram of HEIGHT (Std. Dev = 2.20, Mean = 57.9, N = 15) next to the frequency polygon of the same data.]
These are commonly referred to as DISTRIBUTIONS.

Common Shapes of Frequency Distributions
[Figures: three histograms of HEIGHT, one per shape.]
• Symmetrical Bell-shaped
• Positively Skewed
• Negatively Skewed

Multimodal Distributions
[Figures: two histograms of HEIGHT.]
When describing a distribution, always specify:
- Is it unimodal, bimodal, multimodal?
- Is it symmetrical?
- Is it skewed, positive or negative?

A real example…
Psych Stats 3400 First Exam Grades, N = 66 students
[Figure: histogram, Grade on the x-axis in the intervals 20-28, 29-36, 37-44, 45-52, 53-60, 61-68, 69-76, 77-84, 85-92, 93-100; Frequency on the y-axis, 0 to 16.]
IT’S THE HUMAN HISTOGRAM! Is this a histogram?
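The grouped frequency table for the 20 test grades shown earlier in this section (frequency, percent, cumulative frequency, cumulative percent) can be reproduced with a short script. This is a minimal sketch rather than the SPSS procedure used in lab; `frequency_table` is a hypothetical helper, and the row of asterisks is only a rough text stand-in for a histogram.

```python
# The 20 test grades and the class intervals from the example.
grades = [100, 23, 65, 98, 84, 72, 50, 49, 52, 99,
          83, 79, 89, 90, 56, 63, 72, 92, 83, 100]
intervals = [(23, 30), (31, 38), (39, 46), (47, 54), (55, 62),
             (63, 70), (71, 78), (79, 86), (87, 94), (95, 102)]


def frequency_table(scores, bounds):
    """Build one row per class interval (hypothetical helper)."""
    total = len(scores)
    rows = []
    cum_freq = 0
    for low, high in bounds:
        # Count the scores that fall inside this interval, ends included.
        freq = sum(1 for s in scores if low <= s <= high)
        cum_freq += freq
        rows.append({
            "interval": f"{low}-{high}",
            "frequency": freq,
            "percent": 100 * freq / total,
            "cum_frequency": cum_freq,
            "cum_percent": 100 * cum_freq / total,
        })
    return rows


rows = frequency_table(grades, intervals)
for row in rows:
    bar = "*" * row["frequency"]  # one asterisk per score in the interval
    print(f"{row['interval']:>7} {row['frequency']:>3} {row['percent']:>6.2f} "
          f"{row['cum_frequency']:>3} {row['cum_percent']:>7.2f} {bar}")
```

Reading down the asterisk column gives a quick sense of the shape questions listed above (number of modes, symmetry, direction of skew) before drawing a proper histogram.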