LESSON 1: Introduction to Statistics
STATISTICAL METHODS
STATISTICS - refers to a set of mathematical procedures for organizing, summarizing, and interpreting information.

STATISTICS, SCIENCE, AND OBSERVATIONS
POPULATION - is the set of all the individuals of interest in a particular study.
SAMPLE - is a set of individuals selected from a population, usually intended to represent the population in a research study.
VARIABLE - is a characteristic or condition that changes or has different values for different individuals (EX. height, weight, gender, or personality; temperature, time of day, or the size of the room).
DATA (plural) - are measurements or observations. A data set is a collection of measurements or observations. A datum (singular) is a single measurement or observation and is commonly called a score or raw score.
PARAMETER - is a value, usually a numerical value, that describes a population. A parameter is usually derived from measurements of the individuals in the population.
STATISTIC - is a value, usually a numerical value, that describes a sample. A statistic is usually derived from measurements of the individuals in the sample.
Descriptive statistics - are statistical procedures used to summarize, organize, and simplify data.
Non-probability EX. mean, median, mode; range, standard deviation, variance, interquartile range.
Inferential statistics - consist of techniques that allow us to study samples and then make generalizations about the populations from which they were selected.
Probability EX. t-test, analysis of variance (ANOVA), correlation, regression.
SAMPLING ERROR - is the naturally occurring discrepancy, or error, that exists between a sample statistic and the corresponding population parameter.

RESEARCH METHODS AND STATISTICS
CORRELATIONAL METHOD - two different variables are observed to determine whether there is a relationship between them.
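To make the population, sample, parameter, statistic, and sampling-error definitions of this lesson concrete, here is a minimal Python sketch. It assumes a made-up population of 1,000 scores (generated at random, not real data) and uses a hypothetical helper named `sampling_error`; the population mean plays the role of the parameter and the sample mean the role of the statistic.

```python
import random
import statistics


def sampling_error(population, n, rng):
    """Return (parameter, statistic, error) for one random sample of size n.

    Hypothetical helper for illustration only.
    """
    parameter = statistics.mean(population)  # describes the whole population
    sample = rng.sample(population, n)       # simple random sample without replacement
    statistic = statistics.mean(sample)      # describes only the sample
    return parameter, statistic, statistic - parameter


rng = random.Random(42)  # fixed seed so the run is repeatable
# Made-up population of 1,000 scores (mean about 75, SD about 10); not real data
population = [round(rng.gauss(75, 10), 1) for _ in range(1000)]

# Three different samples of 50 give three different sampling errors
for _ in range(3):
    mu, m, err = sampling_error(population, 50, rng)
    print(f"parameter = {mu:.2f}   statistic = {m:.2f}   sampling error = {err:+.2f}")
```

Each run draws a different simple random sample, so each sample mean misses the population mean by a different amount; that naturally occurring discrepancy is the sampling error. If the "sample" were the entire population, the error would be zero.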
Non-experimental study - the "independent variable" that is used to create the different groups of scores is often called the QUASI-INDEPENDENT VARIABLE.
EXPERIMENTAL METHOD - one variable is manipulated while another variable is observed and measured. To establish a cause-and-effect relationship between the two variables, an experiment attempts to control all other variables to prevent them from influencing the results (EX. t-test and ANOVA).
The independent variable (IV) - is the variable that is manipulated by the researcher.
The dependent variable (DV) - is the variable that is observed to assess the effect of the treatment.
Individuals in the experimental condition do receive the experimental treatment.
Individuals in a control condition do not receive the experimental treatment. Instead, they either receive no treatment or they receive a neutral, placebo treatment. The purpose of a control condition is to provide a baseline for comparison with the experimental condition.

VARIABLES AND MEASUREMENT
CONSTRUCTS - are internal attributes or characteristics that cannot be directly observed but are useful for describing and explaining behavior (EX. intelligence, anxiety, hunger).
OPERATIONAL DEFINITION - identifies a measurement procedure (a set of operations) for measuring an external behavior and uses the resulting measurements as a definition and a measurement of a hypothetical construct. Note that an operational definition has two components: First, it describes a set of operations for measuring a construct. Second, it defines the construct in terms of the resulting measurements (EX. intelligence is measured through observable intelligent behavior, such as scores on an intelligence test).

SCALES/LEVELS OF MEASUREMENT
Nominal Scale - consists of a set of categories that have different names. Measurements on a nominal scale label and categorize observations, but do not make any quantitative distinctions between observations (EX. college program: Psychology, Biology, etc.).
Ordinal Scale - consists of a set of categories that are organized in an ordered sequence. Measurements on an ordinal scale rank observations in terms of size or magnitude (EX. 1st, 2nd, 3rd, and so on).
Interval Scale - consists of ordered categories that are all intervals of exactly the same size. Equal differences between numbers on the scale reflect equal differences in magnitude. However, the zero point on an interval scale is arbitrary and does not indicate a zero amount of the variable being measured (EX. Fahrenheit and Celsius temperatures).
Ratio Scale - is an interval scale with the additional feature of an absolute zero point. With a ratio scale, ratios of numbers do reflect ratios of magnitude (EX. physical measures such as height and weight; number of errors on a test).

DISCRETE VARIABLE - consists of separate, indivisible categories. No values can exist between two neighboring categories (EX. the number of children in a family or the number of students attending class).
CONTINUOUS VARIABLE - there are an infinite number of possible values that fall between any two observed values. A continuous variable is divisible into an infinite number of fractional parts (EX. weight).

DEPENDENT (DV) AND INDEPENDENT (IV) VARIABLES
INDEPENDENT VARIABLES - are variables that the researcher controls and manipulates in accordance with the purpose of the investigation.
DEPENDENT VARIABLES - are variables that are measured to determine the effect of the independent variables.
Example: The researcher would like to determine the predictive validity of the entrance requirements for freshman students. The (___) are the national achievement test, entrance examination, and school grade. The (____) is the performance in first year college.

UNIVARIABLE, BIVARIABLE, AND MULTIVARIABLE DISTRIBUTION
Univariable Distribution - there is only one variable involved (EX. age of Grade 7 pupils).
Bivariable Distribution - the data are classified on the basis of two variables (EX. an ice cream shop keeps track of how much ice cream it sells versus the temperature of the day).
Multivariable Distribution - each datum belongs to three or more variables (EX. the teacher would like to keep track of the enrollment in the college in terms of program, year level, and gender).

STEPS IN DETERMINING THE SAMPLE SIZE
1. Determine the population from which the data the researcher needs can be gathered.
2. Determine the kind of sample to be drawn or selected from the identified population (criteria: age, gender, work experience, etc.).
3. Determine the desired sample size.

Slovin's Formula: n = N / (1 + N e^2), where n is the sample size, N is the population size, and e is the acceptable margin of error.
EX: A researcher may want to determine the reading deficiencies of the students in his school. However, he may not be able to test all the students because of their large number (a population of 5,000). Let us estimate the sample size using a 5% acceptable margin of error:
n = 5,000 / (1 + 5,000(0.05)^2) = 5,000 / 13.5 ≈ 370.37, rounded up to 371 students.

SAMPLING METHOD
Probability Sampling - refers to a sampling process where each unit in the population has a known, nonzero probability of being included in the sample (every unit in the population has a chance of being selected).
1. Simple Random Sampling - the simplest method available; each member of the population has an equal chance of being selected (this makes it impossible to predict who will be chosen). EX: fishbowl technique, lottery or raffle type method, roulette wheel method.
   When to use: if the population is not widely spread geographically.
2. Stratified Random Sampling - samples are randomly selected from the different groups (strata) or sections of the population used in the study (EX: age, gender, economic status, and others).
   When to use: preferred if precise estimates are desired for the strata of the population and if the sampling problems differ across strata or groups of the population.
3. Systematic Random Sampling - every member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular intervals.
   When to use: advisable if the ordering of the population is essentially random and when stratification with numerous data is used.
4. Cluster Sampling - involves dividing the population into subgroups (clusters), each of which should have characteristics similar to the whole population. Instead of sampling individuals from each subgroup, you randomly select entire subgroups (EX: neighborhoods, school districts, or regions).
   When to use: if the population can be grouped into clusters, or where the individual population samples are known to be different with respect to the characteristics under study.

LESSON 2: Descriptive Statistics
Descriptive statistics is the term given to the statistical treatment of data that helps describe, show, or summarize data in a meaningful way. This form of statistics does not allow us to draw conclusions that prove or disprove any hypotheses established in our study; it is simply a way to give a general overview of our data, such as when making comparisons between groups of individuals or between sets of figures.

Measures of Central Tendency: 1. Mean 2. Median 3. Mode
Measures of Variability: 1. Range 2. Standard Deviation 3. Variance

CENTRAL TENDENCY
- Measure used to compare the quantity of two or more sets of data.
- The "average" or "typical" value: Mean (Average), Median, Mode.

Mean
- Sensitive to the exact value of all the scores (affected by extremely high or low values, called outliers).
- Under most circumstances, of the measures of central tendency, the mean is least subject to sampling variation.
- The "balance point" of the distribution (between the highest score and the lowest score).
Types of Mean
Regular Average (Mean) - defined as the sum of the scores divided by the number of scores.
Weighted Average (Overall Mean) - defined as the mean of a distribution that considers the weight (value) of each indicator (used to combine two sets of scores and then find the overall mean for the combined group).

Median
Defined as the scale value below which 50% of the scores fall.
It is the centermost score if the number of scores is odd. If the number is even, the median is the average of the two centermost scores. It is used to determine whether a data value falls into the upper half or the lower half of the distribution.

Mode
The most frequent score in the distribution. A distribution may have only one mode (unimodal) or two or more modes (bimodal/multimodal). Not used very much in studying the behavioral sciences because it is very unstable.

CENTRAL TENDENCY AND THE SHAPE OF THE DISTRIBUTION
SYMMETRICAL DISTRIBUTIONS - the right-hand side of the graph is a mirror image of the left-hand side; both sides of a vertical line passing through the center are the same. If a distribution is perfectly symmetrical, the median is exactly at the center because exactly half of the area in the graph is on either side of the center. The mean is also exactly at the center of a perfectly symmetrical distribution because each score on the left side of the distribution is balanced by a corresponding score (the mirror image) on the right side. Since the mean (the balance point) is located at the center, in a perfectly symmetrical distribution the mean and the median are the same.
SKEWED DISTRIBUTIONS - in skewed distributions, especially distributions for continuous variables, there is a strong tendency for the mean, median, and mode to be located in predictably different positions.
SKEWNESS - is a measurement of the distortion of a symmetrical distribution, or the asymmetry in a data set. Skewness shows on a bell curve when data points are not distributed symmetrically to the left and right sides of the median.
POSITIVE SKEWNESS - the tail is more pronounced on the right side than on the left (the most extreme values are on the right side). The mean is typically pulled toward the tail, above the median, and the mode is lowest.
NEGATIVE SKEWNESS - the tail is more pronounced on the left side than on the right (the most extreme values are found further to the left). The mean is typically pulled below the median, and the mode is highest.
NORMAL DISTRIBUTION - the mean, median, and mode are equal and located at the center of the distribution. A normal distribution curve is UNIMODAL.

VARIABILITY
- Specifies the extent to which scores are different from each other.
- Signifies the dispersion of scores.
- "Difference and degree."
- The statistics that represent variability are the following: Range, Standard Deviation, and Variance.
Range
- Defined as the difference between the highest and the lowest scores in the distribution.
- Range = Highest Score - Lowest Score
- A crude measure of dispersion.
Standard Deviation
- Gives us a measure of dispersion relative to the mean.
- Sensitive to each score in the distribution.
- Stable with regard to sampling fluctuation.
Variance
- The square of the SD.
- Not used widely in descriptive statistics but most of the time in inferential statistics; used in the ANOVA statistical treatment.

PERCENTAGE, PERCENTILE, AND PERCENTILE RANK
Percentage - is a relative value indicating hundredth parts of any quantity. One percent (symbolized 1%) is a hundredth part; thus, 100 percent represents the entirety and 200 percent specifies twice the given quantity (Encyclopedia Britannica, 2021).
Percentile - is a value on the measurement scale below which a specified percentage of the scores in the distribution fall.
Percentile = XL + (i / fi)(cumfp - cumfL)
Percentile Rank - the percentage of scores with values lower than the score in question. It is the inverse of the percentile (point): a percentile gives the score for a given percentage, while a percentile rank gives the percentage for a given score.
Percentile Rank = [(cumfL + (fi / i)(X - XL)) / N] x 100
where XL is the lower limit of the class interval containing the score (or the percentile), i is the interval width, fi is the frequency of that interval, cumfL is the cumulative frequency below that interval, cumfp is the cumulative frequency at the desired percentile (the percentage times N), and N is the total number of scores.

STANDARD SCORE (Z-SCORE)
A z-score is a transformed score that designates how many standard deviation units the corresponding raw score is above or below the mean: z = (X - Mean) / SD.
Remember: if the z-score is positive, the score is above the mean. If the z-score is 0, the score is the same as the mean. If the z-score is negative, the score is below the mean.

PSYCH STATS Chapter 3: Writing Descriptive Stats Results & Research Questions, Inferential Stats Overview (with Normality Testing)
WRITING RESULTS OF DESCRIPTIVE STATISTICS
Commonly Reported Descriptive Statistics: Mean, Standard Deviation, and Verbal Interpretation
- The mean and standard deviation represent, and are sensitive to, all the values in the distribution.
Mean - the average or usual score of your respondents on the studied variable.
Standard Deviation - refers to the usual dispersion of the scores around the mean.
Verbal Interpretation - the category used to label the obtained mean values. This is done using cut-off scores or a transmutation table.
Sample Problem: A team of experimental psychologists wanted to identify which type of competition (group or individual) promoted better performance in accomplishing logic quizzes. They decided that the population asked to participate would be 3rd-year philosophy majors studying at a college located in Pangasinan. They requested the administrator of the college to lend them 20 students from each of the four sections of the program.

RESEARCH QUESTIONS, INFERENTIAL STATS AND NORMALITY TESTING
Research Question
- A list of questions that your study is required to answer through the data that you will gather from your sample.
- The compass that will guide your data gathering as well as the statistical treatment that you will apply to your data.
How to write your own research questions?
1. Identify the general concept that you want to work on. Then look for a theory that explains its relationship to other variables or concepts. Then confirm this relationship by reviewing the literature (look for blind spots).
2. Once you identify your topic, create a specific title for your prospective research. This will be the guide for your Statement of the Problem (SOP). Example: Rumination and Its Relationship to Depression
3. Write your statement of the problem together with your research questions: descriptive stats question/s, inferential stats question/s, and implication question/s.

Statement of the Problem: To evaluate the relationship of rumination to depression.
Research Questions:
1.) What is the level of the participants' rumination? (descriptive)
2.) What is the level of the participants' depression? (descriptive)
3.) Is rumination related to the depression experienced by the participants? (inferential)
4.) What implications can be drawn from the results of the study? (implication)

Hypothesis
- The educated guess that you give to a question that requires inferential statistics to be answered.
- The points that you will try to prove or disprove.
There are two types of hypotheses:
Null Hypothesis (H0) - there is no effect in the population.
Alternative Hypothesis (H1) - there is an effect in the population.

KINDS OF HYPOTHESIS
Scientific Hypothesis - a suggested explanation or solution to a phenomenon. It is a guess or prediction made by a researcher regarding the possible outcome of the study.
Statistical Hypothesis - a claim or statement about an unknown parameter.
Non-directional - a hypothesis that does not specify the direction of the effect of the independent variable on the dependent variable.
   Two-tailed test: the region of rejection lies on both tails of the normal curve. It is used when the alternative hypothesis uses words such as not equal to, significantly different, etc.
Directional - a hypothesis that specifies the direction of the effect of the independent variable on the dependent variable.
   One-tailed test: the region of rejection lies on either the left or the right tail of the normal curve.
   Right directional test: the region of rejection is on the right tail (greater than, higher than, better than, superior to, exceeds, etc.).
   Left directional test: the region of rejection is on the left tail (less than, smaller than, inferior to, lower than, below, etc.).

INFERENTIAL STATISTICS
- With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone.
- Infer (conclude, deduce, and assume).
- Measures whether the obtained scores can be considered significant or just brought about by chance.
- Strengthens the point we will be making in the conclusions drawn from our data.
Independent T-Test - identifying the effect of the IV on the DV by comparing two groups.
Dependent T-Test - identifying the effect of the IV on the DV by comparing two situations (conditions) of the same group.
One-way ANOVA - identifying the effect of the IV on the DV by comparing three or more groups.
Two-way ANOVA - identifying the effects of two IVs (and their interaction) on the DV.
Pearson Correlation - to know the relationship between two interval/ratio variables.
Spearman's Rho - to know the relationship between two ordinal variables (uses deviations of ranks).
Kendall's Tau - to know the relationship between two ordinal variables (uses concordant and discordant pairs).
Linear Regression - to know the ability of a single IV to predict a DV.
Multiple Regression - to know the ability of a model (two or more IVs) to predict a DV.
Chi-Square - to know the relationship between two or more categorical/nominal variables.

THE NORMAL CURVE
- A theoretical distribution of population scores. It is a bell-shaped curve that is described by a specific equation.
DESCRIPTION OF A NORMAL CURVE (FROST, 2020)
- Also known as the Gaussian distribution and the bell curve, named after the German mathematician Carl Friedrich Gauss.
- The normal distribution is a probability function that describes how the values of a variable are distributed.
IMPORTANT CHARACTERISTICS OF A NORMAL CURVE
1. The mean, median, and mode are equal.
2. Both sides of the median are symmetrical.
3. The tails are asymptotic, which means that they approach but never quite touch the horizontal axis.
4. The Kolmogorov-Smirnov and Shapiro-Wilk tests can be used to test your data against the normal curve; p should be more than .05 (a nonsignificant result means the data do not differ significantly from a normal distribution).
WHY DOES IT HAVE TO BE A NORMAL CURVE?
- Variables measured in the behavioral sciences closely resemble the normal curve.
- Many inferential tests require that the data follow a normal distribution.

PSYCH STATS Chapter 4: Correlational Analyses
Correlational Study - a form of empirical study wherein the researcher examines the relationships between variables by identifying the direction and the significance of the tested relationship.
DESCRIPTION OF CORRELATION (PEARSON R)
PEARSON R - an inferential statistic used to identify whether there is a significant linear relationship between two interval/ratio variables.
- It gives information about the magnitude of the association, or correlation, as well as the direction of the relationship (Statistics Solutions, 2020).
Requirements:
A.) The variables to be correlated should have a linear relationship.
B.) Both variables should be in interval/ratio form.
C.) Normal distribution.
D.) Random sampling.
Scatter Plot: The Graph for Correlations
- The graph used in presenting correlation results.
- A graph of paired X and Y values.
- The closer the points are to the trend line, the stronger the correlation.
- Ascending: the correlation is positive. Descending: the correlation is negative.
CORRELATION COEFFICIENT
The magnitude and the direction of the relationship between two variables are indicated by a correlation coefficient (r). Coefficients are positive (+) or negative (-) and range from -1.00 (perfect negative correlation) to +1.00 (perfect positive correlation). A correlation coefficient of 0.00 indicates no linear relationship between the variables.
FORMULATING RESEARCH QUESTION
- Is there a significant relationship between Variable A and Variable B?
- Is Variable A significantly related to Variable B?
ALTERNATIVE HYPOTHESIS (H1)
- There is a positive/negative relationship between Variable A and Variable B.
- Variable A is positively/negatively correlated/related to Variable B.

RANK CORRELATION ANALYSES
- Statistical analyses that examine the correlation of two variables when the data are not normally distributed, the sample size is small, or the data are in ordinal form. Examples: Kendall's Tau and Spearman's Rho.
- Both produce coefficients (-1 to +1) similar to Pearson r. Because of that, hypothesis making and results reporting are relatively similar to those of Pearson r.
Basic differences:
1.) Coefficient derivation (formula) - Spearman's rho is based on the deviation (the difference between each participant's ranks on the two variables, which is squared in the formula). Kendall's tau is based on the number of concordant and discordant pairs of ranks in the participants' data (the concordance of the second variable's ranking with the first variable's ranking).
2.) Yielded coefficient - most of the time, the coefficient yielded by Spearman's rho is larger in magnitude than that of Kendall's tau due to their difference in formula.
3.) Accuracy in smaller sample sizes - Kendall's tau yields a more accurate coefficient in studies with a small number of participants (12 or fewer).
4.) Popularity - Spearman's rho is more popular than Kendall's tau since it was introduced to the scientific community earlier by the renowned English psychologist Charles Spearman.
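The three coefficients discussed in this chapter can be compared side by side in a minimal pure-Python sketch. The function names (`pearson_r`, `average_ranks`, `spearman_rho`, `spearman_rho_shortcut`, `kendall_tau`) are hypothetical helpers, and the five pairs of rumination and depression scores are made-up illustration data. Spearman's rho is computed as Pearson r on average ranks (which handles ties), the rank-difference shortcut 1 - 6Σd² / (n(n² - 1)) is shown for the no-ties case, and Kendall's tau is the tau-b variant built from concordant and discordant pairs. The sketch computes coefficients only; it does not test significance.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson r: covariation of X and Y divided by the product of their spreads."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank scores from 1 (lowest) upward; tied scores share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i..j, 1-based
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as Pearson r computed on the ranks (handles ties)."""
    return pearson_r(average_ranks(x), average_ranks(y))


def spearman_rho_shortcut(x, y):
    """Rank-difference formula 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only with no ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall_tau(x, y):
    """Kendall's tau-b from concordant and discordant pairs (adjusted for ties)."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            direction = (x[i] - x[j]) * (y[i] - y[j])
            if x[i] == x[j]:
                tied_x += 1
            if y[i] == y[j]:
                tied_y += 1
            if direction > 0:
                concordant += 1  # both variables order the pair the same way
            elif direction < 0:
                discordant += 1  # the variables order the pair in opposite ways
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt((total_pairs - tied_x) * (total_pairs - tied_y))


# Made-up scores for five hypothetical participants (not real data)
rumination = [2, 9, 4, 7, 1]
depression = [3, 8, 6, 5, 2]

print(f"Pearson r      = {pearson_r(rumination, depression):.2f}")  # 0.89
print(f"Spearman's rho = {spearman_rho(rumination, depression):.2f}")  # 0.90
print(f"Kendall's tau  = {kendall_tau(rumination, depression):.2f}")  # 0.80
```

With these made-up scores, rho (0.90) comes out larger than tau (0.80), in line with the note that Spearman's rho is usually larger. Computing rho as Pearson r on ranks matches the rank-difference shortcut when there are no ties and stays correct when ties occur. In practice, SciPy's `scipy.stats.pearsonr`, `scipy.stats.spearmanr`, and `scipy.stats.kendalltau` report these coefficients together with p-values.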