Elementary Statistics
Chapter 1
M. Ghamsary, Ph.D.

Statistics is the science of
• collecting,
• organizing,
• summarizing,
• analyzing data, and
• drawing conclusions.

Objective: The primary objective of statistics is inference.

The applications of statistics can be divided into two broad areas:
1. Descriptive Statistics
2. Inferential Statistics

Variable: a characteristic of an individual population unit.

Data are the values (measurements or observations) that the variables can assume. Variables whose values are determined by chance are called random variables.

For example: 12, 13, 69, 98, 78, 87, 36, 54, 68, 36, 63, 85, 79, 75, 32, 16, 57, 58, 34, 91, 74, 83, 92.

Each value in the data set is called a data value or a datum.

1. Descriptive statistics consists of numerical and graphical techniques to summarize and present the information in the data set.
2. Inferential statistics consists of estimation, prediction, or generalizing from samples to populations.

Qualitative variables are variables that can be placed into distinct categories, according to some characteristic or attribute.
For example,
• Gender (male or female)
• Race (White, Black, Hispanic, etc.)
• Religion

Quantitative variables are numerical in nature and can be ordered or ranked.
For example,
• Age is numerical and the values can be ranked.
• Height
• Scores on a test in a statistics class

Discrete variables assume values that can be counted: a finite or countably infinite set of possible values.
For example:
• Number of telephone calls made at the switchboard of our school every day: {0, 1, 2, 3, 4, …}
• Number of accidents on FWY 5
• Number of babies delivered at LLU hospital

Continuous variables can assume infinitely many values between any two specific values, so that there are no gaps.
• Height of boys born at UCLA hospital on July 4th
• Amount of rain that falls in California in the year 2000
By contrast, the following are counts, so they are discrete rather than continuous:
• Number of car accidents on FWY 10 from 5 to 7 PM daily
• Number of babies delivered at LLU hospital daily

Levels of Measurement

When we observe and record a variable, it has characteristics that influence the type of statistical analysis that we can perform on it. These characteristics are referred to as the level of measurement of the variable. The first step in any statistical analysis is to determine the level of measurement; it tells us which statistical tests can and cannot be performed.

There are four levels of measurement:
1. Nominal
2. Ordinal
3. Interval
4. Ratio

1. The nominal level of measurement refers to data that consist of names and/or categories, so the data cannot be arranged in any specific ordering scheme. The nominal level of measurement occurs when the observations do not have a meaningful numeric value.
For example:
• Sex (Male, Female)
• Race (White, Black, Hispanic, Asian, Persian, etc.)
• Colors of cars in the street
• Area code
• Zip code
The values of nominal variables cannot be meaningfully:
• compared to see if one is larger than another
• added or subtracted
• multiplied or divided
• averaged (the mean, what most people call the average, cannot be calculated)

2. The ordinal level of measurement classifies data into categories that can be ranked, but differences between the ranks cannot be determined. Ordinal variables are used to represent observations that can be categorized and rank ordered.
For example:
• Letter grades such as A, superior; B, good; C, average; D, poor; F, fail
• Size of cars in the street: small, medium, and large
• Scoring in games: 1st, 2nd, 3rd, …
• Class rank
• Order of finishing a horse race
• How much you prefer various vegetables
The values of ordinal variables can be:
• compared to see if they are equal or not
• compared to see if one is larger or smaller than another
The values of ordinal variables cannot be meaningfully:
• added or subtracted
• multiplied or divided
• averaged (the mean cannot be calculated)

3. The interval level of measurement is like the ordinal level, with the additional property that differences between units of data can be defined, but there is no meaningful zero. Interval variables represent observations that can be categorized, rank ordered, and have a unit of measure.
• A unit of measure implies that the difference between any two successive values is identical.
With an interval-scaled variable, the value 0 does not represent the complete absence of the variable.
The values of interval variables can be:
• compared to see if they are equal or not
• compared to see if one is larger or smaller than another
• added or subtracted
The values of interval variables cannot be meaningfully:
• multiplied or divided (e.g., 60°F is not twice as hot as 30°F)
For example:
• Temperature in degrees Fahrenheit, which has no natural zero
• Calendar years
• IQ scores
• Shoe size

4. The ratio level of measurement is just like the interval level, except that there exists a natural zero. In addition, true ratios and differences both exist for the same variable. Ratio variables represent observations that can be categorized, rank ordered, have a unit of measure, and have a true zero.
• The true zero implies that a value of zero represents the complete absence of the variable.
The values of ratio variables can be:
• compared to see if they are equal or not
• compared to see if one is larger or smaller than another
• added or subtracted
• multiplied or divided
For example:
• Weight
• Height
• Age
• Length
• Distance

Most students have trouble differentiating between interval and ratio levels of measurement. Here is a simple test: if one number is twice the other, is the quantity being measured also twice the other quantity?
• For example, if you have two weights, 120 lbs. and 240 lbs., it should be clear that 240 lbs. is twice as heavy as 120 lbs. So weights are an example of a ratio level of measurement.
• However, say you have two temperatures, 30 degrees and 60 degrees. 60 degrees is not twice as hot as 30 degrees, so this is an example of an interval level of measurement.

Another test is that in the ratio level of measurement, zero means absence of the quantity. If you consider weights, 0 lb. means that you have NO weight (so weight is ratio). With the interval level of measurement, such as temperature, 0 degrees Fahrenheit does not mean the absence of heat, which is what temperature measures.

Population: consists of all units (subjects, objects, etc.) that are being studied.
Sample: a subset of the units of a population.
Parameter: a descriptive measure of the population, usually represented by Greek letters.
Statistic: a descriptive measure of a sample, usually represented by Roman letters.

Measure                       Sample (Statistics)   Population (Parameters)
Mean                          x̄                     µ
Variance                      s²                    σ²
Standard Deviation            s                     σ
Correlation Coefficient       r                     ρ
Proportion                    p̂                     p
Slope of Simple Regression    β̂₁                    β₁
Size                          n                     N

Summary of Data Classifications

Example 1: From a sample of students in your statistics class, you collect the following: the student's name, gender, SAT score, age, IQ, birth date (BD), and their grade in a freshman-level math class. Use Qualitative or Quantitative to answer the following. Which type of measurement?
1. The variable student's name is measured on ________
2. The variable student's gender is measured on ________
3. The variable student's SAT score is measured on ________
4. The variable student's age is measured on ________
5. The variable student's IQ is measured on ________
6. The variable student's BD is measured on ________

Example 2: From a sample of students in your statistics class, you collect the following: the student's name, gender, SAT score, age, IQ, birth date, and their grade in a freshman-level math class. Use Nominal, Ordinal, Interval, or Ratio to answer the following. Which scale of measurement?
1. The variable student's name is measured on ________
2. The variable student's gender is measured on ________
3. The variable student's SAT score is measured on ________
4. The variable student's age is measured on ________
5. The variable student's IQ is measured on ________
6. The variable student's BD is measured on ________

Example 3: A researcher claims that the average age of women graduating from Loma Linda medical school is about 27 years. To test his hypothesis, he randomly selected 200 female doctors who graduated from LLU medical school.
1. Describe the population.
2. Identify the variable of interest.
3. Is the variable quantitative or qualitative?
4. Is the variable discrete or continuous?
5. Identify the type of the variable.
6. Describe the sample.
7. Describe the inference.

Example 4: A researcher in LA county claims that men and women have different attitudes toward abortion. He randomly selected 500 men and 500 women and asked them whether they are anti-abortion.
1. Describe the population.
2. Identify the variable of interest.
3. Is the variable quantitative or qualitative?
4. Is the variable discrete or continuous?
5. Identify the type of the variable.
6. Describe the sample.
7. Describe the inference.

Example 5: Read the following article and answer the questions below.
A study in California (which also funds abortions for the poor) found that by 1990, among young white women, there was no difference in the rate of breast cancer between rich and poor.
1. Describe the population.
2. Identify the variable of interest.
3. Is the variable quantitative or qualitative?
4. Is the variable discrete or continuous?
5. Identify the type of the variable.
6. Describe the sample.
7. Describe the inference.

Methods of Sampling

There are many methods of sampling, but we will describe 5 common and basic methods:
a. Convenience Sampling
b. Simple Random Sampling
c. Systematic Sampling
d. Stratified Sampling
e. Cluster Sampling

Convenience Sampling
Convenience sampling attempts to obtain a sample of convenient elements. Often, respondents are selected because they happen to be in the right place at the right time.
For example:
• use of students and members of social organizations
• mall intercept interviews without qualifying the respondents
• department stores using charge account lists
• “people on the street” interviews

Simple Random Sampling (SRS)
• Each element in the population has a known and equal probability of selection.
• Each possible sample of a given size (n) has a known and equal probability of being the sample actually selected.
• This implies that every element is selected independently of every other element.

Systematic Sampling
• The sample is chosen by selecting a random starting point and then picking every i-th element in succession from the sampling frame.
For example, suppose there are 1000 elements in the population and a sample of 100 is desired. In this case the sampling interval is i = 1000/100 = 10.

Stratified Sampling
• A two-step process in which the population is partitioned into subpopulations, or strata.
• The strata should be mutually exclusive and collectively exhaustive, in that every population element should be assigned to one and only one stratum and no population elements should be omitted.
• Next, elements are selected from each stratum by a random procedure, usually SRS.
• A major objective of stratified sampling is to increase precision without increasing cost.
• The elements within a stratum should be as homogeneous as possible, but the elements in different strata should be as heterogeneous as possible.
• The stratification variables should also be closely related to the characteristic of interest.
• Finally, the variables should decrease the cost of the stratification process by being easy to measure and apply.
• In proportionate stratified sampling, the size of the sample drawn from each stratum is proportionate to the relative size of that stratum in the total population.
• In disproportionate stratified sampling, the size of the sample from each stratum is proportionate to the relative size of that stratum and to the standard deviation of the distribution of the characteristic of interest among all the elements in that stratum.

Cluster Sampling
• The target population is first divided into mutually exclusive and collectively exhaustive subpopulations, or clusters.
• Then a random sample of clusters is selected, based on a probability sampling technique such as SRS.
• For each selected cluster, either all the elements are included in the sample (one-stage) or a sample of elements is drawn probabilistically (two-stage).
• Elements within a cluster should be as heterogeneous as possible, but clusters themselves should be as homogeneous as possible. Ideally, each cluster should be a small-scale representation of the population.
• In probability proportionate to size sampling, the clusters are sampled with probability proportional to size. In the second stage, the probability of selecting a sampling unit in a selected cluster varies inversely with the size of the cluster.
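
The four probability sampling methods described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed procedure: the function names are hypothetical, the population is simply the integers 0 to 999 (matching the 1000-element systematic sampling example), and the "low"/"high" strata and the 20 clusters of 50 units are invented for the demonstration.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """SRS: every possible sample of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random starting point, then every i-th element, with i = N // n."""
    interval = len(population) // n
    start = rng.randrange(interval)
    return [population[start + k * interval] for k in range(n)]

def proportionate_stratified_sample(population, stratum_of, n, rng):
    """Partition into strata, then take an SRS from each stratum whose
    size is proportional to the stratum's share of the population.
    Rounding can make the total differ slightly from n in general."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

def one_stage_cluster_sample(clusters, n_clusters, rng):
    """SRS of whole clusters; every element of a chosen cluster is kept."""
    chosen = rng.sample(clusters, n_clusters)
    return [unit for cluster in chosen for unit in cluster]

rng = random.Random(0)            # fixed seed so the run is repeatable
population = list(range(1000))    # hypothetical population of 1000 units

srs = simple_random_sample(population, 100, rng)
systematic = systematic_sample(population, 100, rng)

# Hypothetical strata: units below 250 are "low", the rest are "high".
stratified = proportionate_stratified_sample(
    population, lambda u: "low" if u < 250 else "high", 100, rng)

# Hypothetical clusters: 20 consecutive blocks of 50 units each.
clusters = [population[i:i + 50] for i in range(0, 1000, 50)]
clustered = one_stage_cluster_sample(clusters, 3, rng)

print(len(srs), len(systematic), len(stratified), len(clustered))
```

With a sampling interval of 10, consecutive elements of the systematic sample are always 10 positions apart, and the proportionate stratified sample draws 25 units from the "low" stratum because that stratum holds 25% of the population.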

Review of Chapter 01

• Determine whether the given values are from a discrete or continuous data set.
1. In a sample of 100 Pepsi cans, we find that the average size of a can was 11.98 oz.
2. In a survey of 1,011 adults, it is found that 450 of them have smoked at least once in their life.
3. In a survey of 3,289 adults, it is found that 45% of them have a garden at home.
4. The average American drinks 2 cups of coffee per day.

• Determine whether the given variables are qualitative or quantitative.
5. Area codes of the phone numbers of students in this class
6. Social Security numbers of students in this class
7. Nationalities of the professors teaching in this school
8. Heights of students in this class

• Determine which of the four levels of measurement is most appropriate: Nominal, Ordinal, Interval, or Ratio.
9. Area codes of the phone numbers of students in this class
10. Social Security numbers of students in this class
11. Nationalities of the professors teaching in this school
12. Heights of students in this class
13. Ratings of good, average, poor for today's lecture
14. Current temperature of this classroom
15. Numbers worn by the Lakers basketball players
16. Students' years of birth
17. Driver's license numbers

• Identify which of these types of sampling is used: Random (SRS), Systematic, Stratified, Cluster, or Convenience.
18. A Los Angeles Times reporter gets a reaction to a breaking story by polling people as they pass the front of the Times building.
19. Dr. Ghamsary has randomly selected 5 students in his class.
20. The Orange County Commissioner of Jurors obtains a list of 55,014 car owners and constructs a pool of jurors by selecting every 50th name on the list.
21. In a Harris poll of 1,011 adults, the interview subjects were selected by using a computer to randomly generate telephone numbers that were then called.
22. A Ford Motor Company researcher has partitioned all registered cars into categories of compact, mid-size, and family-size. He is surveying 75 car owners from each category.
23. Motivated by a student who died from binge drinking, Chico State conducts a study of student drinking by randomly selecting 10 different classes and interviewing all of the students in each of those classes.
24. A statistics student obtains height/weight data by interviewing the members of his fraternity.
25. A UCLA researcher surveys all cardiac patients in each of 30 randomly selected hospitals.
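
When working the level-of-measurement questions, the "twice as much" test from earlier in the chapter can be checked numerically: a ratio is meaningful only if it stays the same when the unit of measure changes. The short sketch below is an illustration, not part of the original notes. The helper names are hypothetical, and the Celsius and kilogram conversions are standard facts brought in only to supply a second unit for comparison.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

def pounds_to_kilograms(lb):
    """Convert pounds to kilograms (1 lb = 0.45359237 kg)."""
    return lb * 0.45359237

# Ratio scale: 240 lb vs. 120 lb. The ratio survives a change of unit.
weight_ratio_lb = 240 / 120
weight_ratio_kg = pounds_to_kilograms(240) / pounds_to_kilograms(120)

# Interval scale: 60 degrees F vs. 30 degrees F. The ratio depends on
# where the unit happens to place its zero, so it carries no meaning.
temp_ratio_f = 60 / 30
temp_ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)

# Differences on an interval scale do remain comparable across units:
# a 30-degree F gap is the same size wherever it sits on the scale.
gap_low_c = fahrenheit_to_celsius(60) - fahrenheit_to_celsius(30)
gap_high_c = fahrenheit_to_celsius(90) - fahrenheit_to_celsius(60)

print(weight_ratio_lb, weight_ratio_kg)  # both are 2, up to rounding
print(temp_ratio_f, temp_ratio_c)        # 2 in Fahrenheit, about -14 in Celsius
print(gap_low_c, gap_high_c)             # equal gaps, up to rounding
```

The weight ratio is 2 in either unit, while the temperature "ratio" changes from 2 to roughly -14 when the same two readings are expressed in Celsius. That is why weight is treated as ratio-level and temperature in degrees Fahrenheit as interval-level.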