What is statistics?

Statistics – a branch of mathematics that focuses on the organization, analysis, and interpretation of a group of numbers.

Psychologists use statistical methods to help them make sense of the numbers they collect when conducting research. The word statistics comes from the Italian word statista, a person dealing with affairs of state (from stato, “state”).

The two branches of statistical methods

descriptive statistics – procedures for summarizing a group of scores or otherwise making them more understandable.
inferential statistics – procedures for drawing conclusions based on the scores collected in a research study but going beyond them.

Basic Concepts

variable – characteristic that can have different values.
Example: Stress level; age; gender; religion

value – a specific measurement or number obtained during data collection. It represents a raw piece of data before any analysis.
score – often refers to a processed or interpreted value. It typically results from applying a formula or assessment to the raw data to make it meaningful.

Levels of measurement (kinds of variables)

levels of measurement – the different ways that variables can be quantified and categorized. They determine the types of statistical analysis that can be performed.

numeric variable – variable whose values are numbers (as opposed to a nominal variable). Also called quantitative variable.
Example:
Age: 5 years, 10 years, 25 years
Height: 150 cm, 175 cm, 190 cm

equal-interval variable – variable in which the numbers stand for approximately equal amounts of what is being measured.
Example:
Temperature in Celsius: The difference between 20°C and 30°C is the same as the difference between 30°C and 40°C.
IQ Scores: The difference between an IQ of 100 and 110 is the same as the difference between an IQ of 110 and 120.

ratio scale – an equal-interval variable is measured on a ratio scale if it has an absolute zero point, meaning that the value of zero on the variable indicates a complete absence of the variable.
Example:
Weight: 0 kg (no weight), 50 kg, 100 kg
Height: 0 cm (no height), 150 cm, 180 cm

rank-order variable – numeric variable in which the values are ranks, such as class standing or place finished in a race. Also called ordinal variable.
Example:
Race Position: 1st place, 2nd place, 3rd place (the difference between 1st and 2nd place may not be the same as between 2nd and 3rd place)
Movie Ratings: 5 stars, 4 stars, 3 stars (the difference between each star rating is not necessarily the same)

nominal variable – variable with values that are categories (that is, they are names rather than numbers); the categories have no intrinsic order or ranking. Also called categorical variable.
Example:
Eye Color: Blue, Brown, Green
Type of Pet: Dog, Cat, Bird

In summary:
Numeric Variable: Age, Height
Equal-Interval Variable: Temperature in Celsius, IQ Scores
Ratio Scale: Weight, Height
Rank-Order Variable: Race Position, Movie Ratings
Nominal Variable: Eye Color, Type of Pet

discrete variable – variable that has specific values and that cannot have values between these specific values.
Example:
Number of Children: A family can have 0, 1, 2, 3, etc., children. You cannot have 2.5 children.
Number of Cars in a Parking Lot: You can count the cars as 0, 1, 2, 3, etc., but not 1.5 cars.

continuous variable – variable for which, in theory, there are an infinite number of values between any two values.
Example:
Height: A person’s height could be 170.2 cm, 165.5 cm, or any other value within a range.
Temperature: The temperature can be 20.5°C, 21.3°C, or any value within the temperature range.

Frequency Table

frequency table – a way to organize data to show how often each value or range of values occurs in a dataset.

How to Make a Frequency Table
1. Make a list down the page of each possible value, from lowest to highest. Note that even if one of the ratings between 0 and 10 is not used, you still include that value in the listing, showing it as having a frequency of 0. For example, if no one gave a stress rating of 2, you still include 2 as one of the values on the frequency table.
2. Go one by one through the scores, making a mark for each next to its value on your list.
3. Make a table showing how many times each value on your list is used.
4. Figure the percentage of scores for each value. To do this, take the frequency for that value, divide it by the total number of scores, and multiply by 100. You may need to round off the percentage. We recommend that you round percentages to one decimal place.

Grouped Frequency Tables

interval – range of values in a grouped frequency table that are grouped together. (For example, if the interval size is 10, one of the intervals might be from 10 to 19.)
grouped frequency table – frequency table in which the number of individuals (frequency) is given for each interval of values.

Note: Sometimes there are so many possible values that an ordinary frequency table is too awkward to give a simple picture of the scores. The solution is to make groupings of values that include all values in a certain range. Each combined category of this kind is called an interval, and a frequency table that uses intervals is called a grouped frequency table.

A graph is another good way to make a large group of scores easy to understand. Researchers make histograms to show the pattern in a frequency table visually: (a) the values, from lowest to highest, go along the bottom; (b) the frequencies, from 0 at the bottom to the highest frequency of any value at the top, go along the left edge; (c) above each value is a bar with a height of the frequency for that value. The bars are placed next to each other without spaces, giving the appearance of a city skyline.
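The frequency table and grouped frequency table procedures described earlier in this section can be sketched in Python. This is only a minimal illustration, not a prescribed method: the function names, the 0 to 10 stress ratings, and the interval size of 2 are all assumptions made up for the example.

```python
from collections import Counter


def frequency_table(scores, lowest, highest):
    """Return (value, frequency, percentage) rows for every possible value."""
    counts = Counter(scores)
    total = len(scores)
    rows = []
    for value in range(lowest, highest + 1):
        freq = counts.get(value, 0)          # unused values stay in the table with 0
        pct = round(freq / total * 100, 1)   # percentage rounded to one decimal place
        rows.append((value, freq, pct))
    return rows


def grouped_frequency_table(scores, lowest, highest, interval_size):
    """Return ((start, end), frequency, percentage) rows for equal-size intervals."""
    total = len(scores)
    rows = []
    for start in range(lowest, highest + 1, interval_size):
        end = start + interval_size - 1
        freq = sum(1 for s in scores if start <= s <= end)
        rows.append(((start, end), freq, round(freq / total * 100, 1)))
    return rows


# Hypothetical stress ratings on a 0-10 scale (invented for illustration).
stress_ratings = [8, 7, 4, 10, 8, 6, 8, 9, 9, 7, 3, 7, 6, 5, 0, 9, 10, 7, 7, 3]

table = frequency_table(stress_ratings, 0, 10)
grouped = grouped_frequency_table(stress_ratings, 0, 10, 2)

print("Value  Frequency  Percent")
for value, freq, pct in table:
    print(f"{value:>5}  {freq:>9}  {pct:>6.1f}%")

print("Interval  Frequency  Percent")
for (start, end), freq, pct in grouped:
    print(f"{start:>3}-{end:<4}  {freq:>9}  {pct:>6.1f}%")
```

In this invented data set nobody gave a rating of 1 or 2, so those values still appear in the ungrouped table with a frequency of 0, as step 1 requires.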
Histograms

Histogram – barlike graph of a frequency distribution in which the values are plotted along the horizontal axis and the height of each bar is the frequency of that value; the bars are usually placed next to each other.

How to Make a Histogram
1. Make a frequency table (or grouped frequency table).
2. Put the values along the bottom of the page, from left to right, from lowest to highest.
3. Make a scale of frequencies along the left edge of the page that goes from 0 at the bottom to the highest frequency for any value.
4. Make a bar above each value with a height for the frequency of that value.

Shapes of frequency distributions

frequency distribution – pattern of frequencies over the various values; what a frequency table, histogram, or frequency polygon describes.
unimodal distribution – has one peak or mode, which is the value that appears most frequently.
bimodal distribution – has two peaks or modes. These peaks represent values that occur more frequently than others in the dataset.
multimodal distribution – has more than two peaks or modes.
rectangular distribution – also called uniform distribution; has values that are all equally likely to occur, resulting in a flat, rectangular shape when graphed.

In summary:
Frequency Distribution: Shows how often each value occurs.
Unimodal Distribution: One peak.
Bimodal Distribution: Two peaks.
Multimodal Distribution: More than two peaks.
Rectangular Distribution: Values occur with equal frequency.

Symmetrical and Skewed Distributions

symmetrical distribution – distribution in which the left and right sides are mirror images of each other.
skewed distribution – distribution in which the scores pile up on one side of the middle and are spread out on the other side; distribution that is not symmetrical.
A distribution that is skewed to the right (the long tail is on the right) is also called positively skewed. A distribution skewed to the left (the long tail is on the left) is also called negatively skewed.
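To make these shapes concrete, the histogram steps above can be sketched as a plain-text drawing in Python. This is a rough illustration under stated assumptions: the function name text_histogram and the scores are hypothetical, the scores are assumed to be whole numbers, and the data were invented so that they pile up at the low end with a tail to the right (a positively skewed shape).

```python
from collections import Counter


def text_histogram(scores):
    """Return the lines of a text histogram, drawn from the top frequency down."""
    counts = Counter(scores)
    values = list(range(min(scores), max(scores) + 1))  # values, lowest to highest
    top = max(counts.values())                           # highest frequency of any value
    lines = []
    for level in range(top, 0, -1):
        # A bar cell is filled when that value's frequency reaches this level.
        bars = "".join("###" if counts[v] >= level else "   " for v in values)
        lines.append(f"{level:>2} |{bars}")
    lines.append("   +" + "---" * len(values))
    lines.append("    " + "".join(f"{v:^3}" for v in values))
    return lines


# Hypothetical whole-number scores, invented so that most fall at the low end.
scores = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 7]

histogram_lines = text_histogram(scores)
print("\n".join(histogram_lines))
```

The frequency scale runs up the left edge and the values run along the bottom, with adjacent bars touching. A value that never occurs (6 in this invented data) simply has no bar.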
floor effect – situation in which many scores pile up at the low end of a distribution (creating skewness to the right) because it is not possible to have any lower score.
ceiling effect – situation in which many scores pile up at the high end of a distribution (creating skewness to the left) because it is not possible to have a higher score.

Normal and Kurtotic Distributions

normal curve – specific, mathematically defined, bell-shaped frequency distribution that is symmetrical and unimodal; distributions observed in nature and in research commonly approximate it.
kurtosis – extent to which a frequency distribution deviates from a normal curve in terms of whether its curve in the middle is more peaked or flat than the normal curve.

Central tendency

central tendency – a statistical measure that identifies a single value as representative of an entire dataset.
mean – arithmetic average of a group of scores; sum of the scores divided by the number of scores.
mode – the value that appears most frequently in a dataset. A dataset may have one mode, more than one mode, or no mode at all if no number repeats.
median – the middle value in a dataset when the values are arranged in ascending or descending order. If the dataset has an odd number of values, the median is the middle one. If it has an even number of values, the median is the average of the two middle values.

Mean Example and Statistical Symbols

The rule for figuring the mean is to add up all the scores and divide by the number of scores.
M – mean.
∑ – sum of; add up all the scores following this symbol.
X – scores in the distribution of the variable X.
N – number of scores in a distribution.
Example: ∑X is 7 + 8 + 8 + 7 + 3 + 1 + 6 + 9 + 3 + 8, which is 60. In our example, there are 10 scores. Thus, N equals 10. The mean is therefore M = ∑X / N = 60 / 10 = 6.

Mode Example

Example: Consider the dataset: 1, 2, 2, 3, 4. The mode is 2. If the dataset is 1, 1, 2, 2, 3, it has two modes: 1 and 2 (bimodal).
If the dataset is 1, 2, 3, 4, 5, it has no mode since no number repeats.

Median Example

Example: Consider the dataset: 3, 1, 4, 1, 5. First, arrange the data in ascending order: 1, 1, 3, 4, 5. The median is 3.
For an even number of values, consider the dataset: 3, 1, 4, 1. Arranged in ascending order: 1, 1, 3, 4. The median is the average of the two middle values: (1 + 3) / 2 = 2.

Note: When an answer is not a whole number, we suggest that you use two more decimal places in the answer than for the original numbers.

variance – measure of how spread out a set of scores is; average of the squared deviations from the mean. The variance is the sum of the squared deviations of the scores from the mean, divided by the number of scores.
deviation score – score minus the mean.
squared deviation score – square of the difference between a score and the mean.
sum of squared deviations – total of each score’s squared difference from the mean.

Formulas for the Variance and the Standard Deviation

Variance = ∑(X − M)² / N
Standard deviation = √Variance (the square root of the variance)

Z scores

Z score – number of standard deviations that a score is above (or below, if it is negative) the mean of its distribution; it is thus an ordinary score transformed so that it better describes the score’s location in a distribution.

Formula: Z = (X − M) / SD, where X is the score, M is the mean, and SD is the standard deviation.

Positive Z-score – indicates that the data point is above the mean.
Negative Z-score – indicates that the data point is below the mean.
Z-score of 0 – indicates that the data point is exactly at the mean.
Magnitude of Z-score – indicates how far (in terms of standard deviations) the data point is from the mean.

Uses of Z-score
Standardization: Allows comparison of data points from different datasets.
Probability Calculations: Helps calculate probabilities under the normal distribution curve.
Outlier Detection: Identifies data points that are unusually high or low compared to the rest of the dataset.

Psychologists usually study samples and not populations because it is not practical in most cases to study the entire population. Samples are selected using various methods of sampling.
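As a minimal sketch of the descriptive measures defined earlier in these notes (mean, median, mode, variance, standard deviation, and Z score), the following Python functions reproduce the worked examples. The function names are assumptions chosen for illustration. The variance divides by N, the number of scores, as in the definition in these notes, and not by N − 1.

```python
from collections import Counter
from math import sqrt


def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


def median(scores):
    """Middle value, or the average of the two middle values."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(scores):
    """All most frequent values; an empty list when no value repeats."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


def variance(scores):
    """Sum of squared deviations from the mean, divided by N."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)


def standard_deviation(scores):
    """Square root of the variance."""
    return sqrt(variance(scores))


def z_score(x, scores):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean(scores)) / standard_deviation(scores)


scores = [7, 8, 8, 7, 3, 1, 6, 9, 3, 8]  # the ten scores from the mean example

print(mean(scores))                 # 6.0
print(median([3, 1, 4, 1, 5]))      # 3
print(median([3, 1, 4, 1]))         # 2.0
print(modes([1, 2, 2, 3, 4]))       # [2]
print(modes([1, 1, 2, 2, 3]))       # [1, 2]
print(modes([1, 2, 3, 4, 5]))       # []
print(variance(scores), standard_deviation(scores))
print(z_score(9, scores), z_score(6, scores))
```

A score equal to the mean (6 here) gets a Z score of 0, and a score above the mean (9 here) gets a positive Z score.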
Sample and population

Population – the entire group of individuals, items, or data points that we want to study and draw conclusions about.
Sample – a subset of the population that is selected to represent the larger group.

Methods of sampling

Simple Random Sampling – Every member of the population has an equal chance of being selected. Selection is done randomly without any bias.
Stratified Sampling – The population is divided into homogeneous subgroups (strata) based on certain characteristics (e.g., age, gender, income). Then, random samples are taken from each stratum.
Systematic Sampling – Every nth individual is selected from a list or population. The first individual is randomly chosen, and subsequent selections are made at regular intervals.
Example: Selecting every 10th person from a list of registered voters.
Cluster Sampling – The population is divided into clusters (geographical or administrative units), and then some clusters are randomly selected. All individuals within the selected clusters are included in the sample.
Example: Randomly selecting several schools in a district, then surveying all students within those schools.
Convenience Sampling – Individuals who are readily available and willing to participate are included in the sample. This method is easy and quick but may not represent the entire population accurately.
Snowball Sampling – A few individuals who meet the criteria for the study are selected first. These individuals then refer others they know who also meet the criteria, creating a chain or 'snowball' effect.
Quota Sampling – Similar to stratified sampling but nonrandom. Researchers choose individuals to fulfill certain quotas based on predetermined criteria (e.g., age, gender) until the quota is met.

Probability, Outcome, Frequency

Probability – measures the likelihood or chance of a specific outcome occurring.
Outcome – the result of an experiment, observation, or action. It represents a possible result or event that can occur.
Frequency – how many times something happens.
Expected relative frequency – what you expect to get in the long run if you repeat the experiment many times.

Steps for Finding Probabilities
1. Determine the number of possible successful outcomes.
2. Determine the number of all possible outcomes.
3. Divide the number of possible successful outcomes (Step ❶) by the number of all possible outcomes (Step ❷).

Hypothesis testing

Theory – set of principles that attempt to explain one or more facts, relationships, or events.
Hypothesis – prediction, often based on informal observation, previous research, or theory, that is tested in a research study.
Hypothesis testing – procedure for deciding whether the outcome of a study (results for a sample) supports a particular theory or practical innovation (which is thought to apply to a population).
Research hypothesis – claims a significant relationship, effect, or difference between variables.
Null hypothesis – states there is no significant relationship, effect, or difference; any observed results are due to random chance.

In hypothesis testing, researchers typically seek to reject the null hypothesis in favor of the research hypothesis, based on empirical data and statistical analysis.
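The three steps for finding probabilities listed earlier, and the idea of expected relative frequency, can be sketched in Python. This is an illustrative sketch only: the die example, the function names, and the fixed random seed are assumptions, and the calculation assumes that all outcomes are equally likely.

```python
import random


def probability(successful_outcomes, all_outcomes):
    """Steps 1-3: count successful outcomes, count all outcomes, then divide."""
    return len(successful_outcomes) / len(all_outcomes)


def relative_frequency(successful_outcomes, all_outcomes, trials, seed=0):
    """Proportion of simulated repetitions that end in a successful outcome."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    hits = sum(
        1 for _ in range(trials)
        if rng.choice(all_outcomes) in successful_outcomes
    )
    return hits / trials


# Hypothetical experiment: rolling a fair six-sided die and getting a 1 or a 2.
die = [1, 2, 3, 4, 5, 6]
low_rolls = [1, 2]

p = probability(low_rolls, die)  # 2 successful outcomes out of 6 possible
observed = relative_frequency(low_rolls, die, trials=10_000)

print(f"probability = {p:.3f}")
print(f"relative frequency over 10,000 simulated rolls = {observed:.3f}")
```

Over many repetitions the observed relative frequency should settle close to the calculated probability, which is what "expected relative frequency" describes.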