Statistics Class 2 Semester

Chapter 1 – The Nature of Statistics

Section 1.1 Statistics Basics

Definition 1.1
Descriptive Statistics: Consists of methods for organizing and summarizing information. It includes the construction of graphs, charts, and tables and the calculation of various descriptive measures, such as averages, measures of variation, and percentiles. No inferences are made in descriptive statistics.
Population: Collection of all individuals or items under consideration in a statistical study.
Sample: That part of the population from which information is obtained.
Inferential Statistics: Methods for drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population. The information obtained from the sample lets one make inferences (draw conclusions) about the entire population, for example about its preferences.

Section 1.2 Simple Random Sampling

Definition 1.4
Simple random sampling: Sampling procedure for which each possible sample of a given size is equally likely to be the one obtained.
Simple random sample: Sample obtained by simple random sampling.
SRSWR (Simple Random Sampling with replacement): A member of the population can be selected more than once.
SRS (Simple Random Sampling without replacement): A member of the population can be selected at most once.
Random-Number Tables: Tables of random numbers that can be used to obtain a random sample.
Random-Number Generators: Software that produces random numbers.

Section 1.3 Other Sampling Designs

Systematic Random Sampling
Step 1: Divide the population size by the sample size and round the result down to the nearest whole number, 𝑚.
Step 2: Use a random-number table or a random-number generator to obtain a number, 𝑘, between 1 and 𝑚.
Step 3: Select for the sample those members of the population that are numbered 𝑘, 𝑘 + 𝑚, 𝑘 + 2𝑚, …

Cluster Sampling
Step 1: Divide the population into groups (clusters).
Step 2: Obtain a simple random sample of the clusters.
Step 3: Use all members of the clusters obtained in Step 2 as the sample.

Stratified Random Sampling with Proportional Allocation
Step 1: Divide the population into subpopulations (strata).
Step 2: From each stratum, take a simple random sample of size proportional to the size of the stratum. (That is, sample size for a stratum = total sample size × stratum size / population size.)
Step 3: Use all members obtained in Step 2 as the sample.

Section 1.4 Experimental Designs

Definition 1.5
Experimental Units; Subjects: The individuals or items on which a designed experiment is performed are called experimental units. When the experimental units are humans, the term subject is often used instead.

Principles of Experimental Design: The following principles help ensure that differences in results that are not reasonably attributable to chance are likely caused by the treatments:
Control: Two or more treatments should be compared.
Randomization: Experimental units should be randomly divided into groups to avoid unintentional bias.
Replication: A sufficient number of experimental units should be used to ensure that the randomization creates groups that resemble each other closely and to increase the chances of detecting differences among the treatments.
Experimental situations often involve a treatment group and a control group, where the control group receives a placebo.

Definition 1.6
Response variable: The characteristic of the experimental outcome that is to be measured or observed.
Factor: A variable whose effect on the response variable is of interest in the experiment.
Levels: The possible values of a factor.
Treatment: Each experimental condition. For one-factor experiments, the treatments are the levels of the single factor. For multifactor experiments, each treatment is a combination of levels of the factors.

Definition 1.7
Completely Randomized Design: All experimental units are assigned randomly among all the treatments.
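As a minimal sketch of a completely randomized design, the Python snippet below shuffles the experimental units and deals them out to the treatments. The function name, the unit numbers 1 to 12, and the treatment labels "drug" and "placebo" are hypothetical and only serve as an illustration; equal group sizes are an assumption of this sketch, not a requirement of the definition.

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Randomly assign every experimental unit to one of the treatments."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)              # randomization step
    groups = {t: [] for t in treatments}
    # Deal the shuffled units out in turn, so group sizes differ by at most one.
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical example: 12 numbered subjects, two treatments.
groups = completely_randomized_design(range(1, 13), ["drug", "placebo"], seed=1)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

Every unit lands in exactly one group, and which group it lands in depends only on the shuffle, not on any property of the unit.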
Once treatments have been chosen, one must decide how the experimental units are to be assigned to the treatments (or vice versa).

Definition 1.
Randomized Block Design: Experimental units are assigned randomly among all the treatments separately within each block. That is, experimental units that are similar in ways that are expected to affect the response variable are grouped into blocks; the random assignment of experimental units to the treatments is then made block by block.

Chapter 2 – Organizing Data

Definition 2.1 – Variables
Variable: A trait or characteristic that can vary from one person or thing to another.
Qualitative Variable: Non-numerically valued variable.
Discrete Variable: Quantitative variable whose possible values can be listed. In particular, a quantitative variable with only a finite number of possible values is discrete.
Continuous Variable: Quantitative variable whose possible values form some interval of numbers.

Definition 2.2 – Data
Data: Values of a variable.
Qualitative Data: Values of a qualitative variable.
Quantitative Data: Values of a quantitative variable.
Discrete Data: Values of a discrete variable.
Continuous Data: Values of a continuous variable.

Section 2.2 Organizing Qualitative Data

Definition 2.3 – Frequency Distribution of Qualitative Data
Frequency Distribution of Qualitative Data: A listing of the distinct values and their frequencies.

Procedure 2.1 – To Construct a Frequency Distribution of Qualitative Data
Step 1: List the distinct values of the observations in the data set in the first column of a table.
Step 2: For each observation, place a tally mark in the second column of the table, in the row of the appropriate distinct value.
Step 3: Count the tallies for each distinct value and record the totals in the third column of the table.
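The tallying in Procedure 2.1 can be sketched in Python with `collections.Counter`, which counts how often each distinct value occurs. The data set below is made up for illustration, and the function name is hypothetical.

```python
from collections import Counter

def frequency_distribution(observations):
    """Return {distinct value: frequency}, with the values in sorted order."""
    counts = Counter(observations)      # Steps 2 and 3: tally and count
    return dict(sorted(counts.items())) # Step 1: one row per distinct value

# Hypothetical qualitative data (political party of six respondents).
data = ["Democratic", "Republican", "Other",
        "Democratic", "Republican", "Democratic"]

freq = frequency_distribution(data)
for value, count in freq.items():
    print(f"{value:<12}{count}")
```

The frequencies always add up to the number of observations, which is a quick check that no observation was missed.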
Definition 2.4 – Relative-Frequency Distribution of Qualitative Data
Relative-Frequency Distribution of Qualitative Data: A listing of the distinct values and their relative frequencies.

Procedure 2.2
Step 1: Obtain a frequency distribution of the data.
Step 2: Divide each frequency by the total number of observations (relative frequency = frequency / number of observations).

Definition 2.5
Pie Chart: A disk divided into wedge-shaped pieces proportional to the relative frequencies of the qualitative data.

Procedure 2.3 – To Construct a Pie Chart
Step 1: Obtain a relative-frequency distribution of the data by applying Procedure 2.2.
Step 2: Divide a disk into wedge-shaped pieces proportional to the relative frequencies.
Step 3: Label the pieces with the distinct values and their relative frequencies.

Definition 2.6
Bar Chart: Displays the distinct values of the qualitative data on a horizontal axis and the relative frequencies (or frequencies or percents) on a vertical axis. Each distinct value is represented by a vertical bar whose height is equal to the relative frequency of that value. The bars do not touch each other.

Procedure 2.4 – To Construct a Bar Chart
Step 1: Obtain a relative-frequency distribution of the data by applying Procedure 2.2.
Step 2: Draw a horizontal axis on which to place the bars and a vertical axis on which to display the relative frequencies.
Step 3: For each distinct value, construct a vertical bar whose height equals the relative frequency of that value.
Step 4: Label the bars with the distinct values, the horizontal axis with the name of the variable, and the vertical axis with “Relative frequency”.

Section 2.3 Organizing Quantitative Data
Step 1: Collect the data (e.g., the number of TVs in each of 50 households).
Step 2: Sort the data, group them into classes, and compute the relative frequencies.

Definition 2.7 – Terms Used in Limit Grouping
Lower Class Limit: Smallest value that could go in a class.
Upper Class Limit: Largest value that could go in a class.
Class Width: Difference between the lower limit of a class and the lower limit of the next-higher class.
Class Mark: Average of the two class limits of a class.

Definition 2. – Terms Used in Cutpoint Grouping
Lower Class Cutpoint: Smallest value that could go in a class.
Upper Class Cutpoint: Smallest value that could go in the next-higher class (= lower cutpoint of the next-higher class).
Class Width: Difference between the cutpoints of a class.
Class Midpoint: Average of the two cutpoints of a class.

Choosing the Grouping Method
Single-value Grouping: Used with discrete data when there are only a small number of distinct values.
Limit Grouping: Used when the data are expressed as whole numbers and there are too many distinct values to employ single-value grouping.
Cutpoint Grouping: Used when the data are continuous and are expressed with decimals.

Definition 2.9 – Histograms
Histogram: Displays the classes of the quantitative data on a horizontal axis and the frequencies of those classes on a vertical axis. The frequency of each class is represented by a vertical bar whose height is equal to the frequency of that class. The bars should be positioned so that they touch each other.
Single-value Grouping: The distinct values are used to label the bars, with each value centered under its bar.
Limit Grouping or Cutpoint Grouping: The lower class limits (or lower class cutpoints) are used to label the bars. (Note: sometimes class marks or class midpoints are used instead and centered under the bars.)

Procedure 2.5 – To Construct a Histogram
Step 1: Obtain a frequency (relative-frequency, percent) distribution of the data.
Step 2: Draw a horizontal axis on which to place the bars and a vertical axis on which to display the frequencies.
Step 3: For each class, construct a vertical bar whose height equals the frequency of that class.
Step 4: Label the bars with the classes, the horizontal axis with the name of the variable, and the vertical axis with “Frequency”.

Definition 2.10 – Dotplot
Dotplot: A graph in which each observation is plotted as a dot at an appropriate place above a horizontal axis. Observations having equal values are stacked vertically.
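To make the dotplot definition concrete, here is a minimal text-only sketch in Python. It assumes whole-number observations, draws each observation as an asterisk, and stacks equal values vertically above a horizontal axis. The function name and the sample data are hypothetical.

```python
from collections import Counter

def text_dotplot(observations):
    """Return a dotplot as text lines: stacked dots above a value axis."""
    counts = Counter(observations)
    # The axis shows every whole number from the smallest to the largest value.
    values = list(range(min(observations), max(observations) + 1))
    height = max(counts.values())
    width = max(len(str(v)) for v in values)
    lines = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(height, 0, -1):
        row = [("*" if counts[v] >= level else " ").rjust(width) for v in values]
        lines.append(" ".join(row).rstrip())
    # The last line is the horizontal axis with the possible values.
    lines.append(" ".join(str(v).rjust(width) for v in values))
    return lines

# Hypothetical data set of seven whole-number observations.
observations = [1, 2, 2, 3, 3, 3, 5]
plot = text_dotplot(observations)
print("\n".join(plot))
```

The value 3 occurs three times, so its column holds three stacked dots, while the value 4 never occurs and its column stays empty.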
Procedure 2.6 – To Construct a Dotplot
Step 1: Draw a horizontal axis that displays the possible values of the quantitative data.
Step 2: Record each observation by placing a dot over the appropriate value on the horizontal axis.
Step 3: Label the horizontal axis with the name of the variable.

Definition 2.11 – Stem-and-Leaf Diagrams
Stem-and-Leaf Diagram (also called a stemplot): A diagram in which each observation is separated into two parts: a stem, consisting of all but the rightmost digit, and a leaf, the rightmost digit.

Procedure 2.7 – To Construct a Stem-and-Leaf Diagram
Step 1: Think of each observation as a stem, consisting of all but the rightmost digit, and a leaf, the rightmost digit.
Step 2: Write the stems from smallest to largest in a vertical column to the left of a vertical rule.
Step 3: Write each leaf to the right of the vertical rule in the row that contains the appropriate stem.
Step 4: Arrange the leaves in each row in ascending order.

Section 2.4 Distribution Shapes

Definition 2.12 – Distribution of a Data Set
Distribution of a Data Set: A table, graph, or formula that provides the values of the observations and how often they occur.

Different kinds of distribution