INTRODUCTION TO STATISTICS

What is Statistics?
✓ Statistics is about converting data into useful information to solve a problem.
✓ Statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting and analyzing data, as well as drawing valid conclusions and making reasonable decisions on the basis of this analysis.
The statistical process: collecting data → organizing data → summarizing data → presenting data → interpreting data.

Descriptive and Inferential Statistics
Descriptive statistics
• Uses the data to describe the population, through numerical calculations, graphs or tables.
• The process of collecting, compiling, summarizing and presenting data in graphical form (charts, graphs, tables) or in numerical form (averages and percentages derived from the data), so that the data set can be evaluated easily.
• Examples:
a) The percentage growth of Malaysia's population from one decade to the next.
b) The average income of the 104 families in Maju Berhad is RM 28 673 per annum.
Inferential statistics
• Makes inferences and predictions about a population based on a sample of data taken from that population.
• A decision, estimate, prediction or generalization about a population based on a sample. It consists of methods that use sample results to help make decisions or predictions about the population, such as estimation, hypothesis testing, probability and regression.
• Examples:
a) Based on a sample survey by a lecturer at a higher learning institution, only 45% of diploma graduates further their studies in a Bachelor's program at a local IPTA (public higher learning institution).
b) The Department of Labour uses the average income of a sample of several hundred workers to estimate the average income of all 3 million workers.

Statistical Terms
Research/Survey – A study carried out using statistical methods in order to understand a certain problem.
Element/Experimental unit – The object, either a person or a thing, on which measurements are taken.
Population – All elements under study, whether living or non-living objects.
Sample – A subset or part of the population.
Sampling – The process of selecting a sample from the population of interest.
Sampling frame – A list of the sampling units used to select the sample.
Sampling unit – An element listed in the frame.
Pilot survey – A small-scale study done before the actual survey.
Sample survey – A study based on a sample.
Census – A study of the entire population.
Parameter – A summary measure/characteristic obtained from the population.
Statistic – A summary measure/characteristic obtained from a sample.
Data – A collection of observations, measurements or information obtained from the study carried out.
Variable/Attribute – A characteristic of the population under study.

TYPES OF VARIABLES
A variable is an attribute that describes a person, place, thing or idea. The value of the variable can "vary" from one entity to another. Random variables and data can be classified into two main categories according to their characteristics.
Qualitative variable
• Values are categories (labels) rather than numbers.
• Examples: gender (male, female), marital status (single, married), race (Malay, Indian, Chinese), grade (A, B, C)
Quantitative variable
• Discrete: takes only exact, countable values. Examples: number of students, number of cars, number of books
• Continuous: can take any value within a range and is measured to a certain degree of accuracy. Examples: distance travelled, litres of petrol, weight and height of children
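To make the difference between a parameter and a statistic (and between a census and a sample survey) concrete, the Python sketch below computes the same summary measure, the mean, once from a whole population and once from a random sample. The incomes are randomly generated, hypothetical values, not real data; the population size of 104 only echoes the Maju Berhad example, and the sample size of 20 is an arbitrary choice.

```python
import random
import statistics

# Hypothetical population: annual incomes (RM) of 104 families.
# Randomly generated for illustration only; NOT the Maju Berhad data.
random.seed(1)
population = [random.randint(15_000, 45_000) for _ in range(104)]

# Parameter: a summary measure computed from the whole population (census).
population_mean = statistics.mean(population)

# Statistic: the same summary measure computed from a sample (sample survey).
sample = random.sample(population, 20)  # simple random sample of 20 families
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): RM {population_mean:,.2f}")
print(f"Statistic (sample mean):     RM {sample_mean:,.2f}")
```

The sample mean will usually differ somewhat from the population mean; inferential statistics is concerned with how reliably such a statistic estimates the parameter.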
LEVEL/SCALE OF MEASUREMENT
Nominal
• Classifies objects into categories
• Examples: gender, race, religion
Ordinal
• Classifies and ranks the objects
• Examples: level of education, stage of cancer, agreement level
Interval
• Has no true zero point, so the values of interval variables cannot be meaningfully multiplied or divided
• Examples: temperature, shoe size, IQ score
Ratio
• Has a meaningful zero point
• The values of ratio variables can be meaningfully multiplied or divided
• Examples: salary, weight, height

SOURCES OF DATA
Primary data
• Explanation: First-hand data; the researcher carries out the research and obtains the data directly from the respondents.
• Advantages: accurate, reliable, up to date
• Disadvantages: time-consuming, costly, requires a lot of manpower
Secondary data
• Explanation: Data obtained from other sources
• Advantages: less time, less effort, inexpensive
• Disadvantage: may not meet our specific objective

DATA COLLECTION METHOD
Personal interview / face-to-face interview
Advantages:
▪ Obtains a higher response rate than other methods
▪ Allows the interviewer to clarify any terms that the respondent does not understand
Disadvantages:
▪ The cost is high (interviewers' salaries, travelling, etc.)
▪ The interviewer's expressions can lead to bias
Telephone interview
Advantages:
▪ Provides information from a wide geographical area
▪ The interviewing process is quicker and less expensive
Disadvantages:
▪ Interviewers are limited in the questions they can ask
▪ Lower response rate
Direct observation
Advantages:
▪ The researcher obtains the answers spontaneously and accurately
Disadvantages:
▪ Time-consuming
▪ Validity and reliability may be problematic
▪ Requires a skilled observer
▪ Does not provide complete information for more complex jobs
Questionnaire
Advantages:
▪ Cheaper than personal interviews
▪ Wider research coverage
▪ No interviewer influence
▪ The respondent has more time to think of a proper response
Disadvantages:
▪ The response rate is normally quite low
▪ Responses may be biased because only particular types of people reply
▪ Only very simple questions can be asked
▪ Not able to interact with the respondent

Sampling Technique
WHY DO WE NEED SAMPLING?
Sampling is required whenever carrying out the research on the entire population would be too costly and time-consuming.
Probability sampling: every element in the population has a known, non-zero chance of being selected as part of the sample.
✓ Simple random sampling (SRS)
✓ Systematic sampling
✓ Stratified sampling
✓ Cluster sampling
Non-probability sampling: elements are not selected by a random mechanism, so not every element has a known chance of being selected as part of the sample.
✓ Quota sampling
✓ Convenience sampling
✓ Judgmental sampling
✓ Snowball sampling

PROBABILITY SAMPLING
Simple Random Sampling (SRS)
➢ Each item has the same chance of being selected as part of the sample.
➢ Characteristics of SRS:
a) The target population must be homogeneous.
b) A complete sampling frame must be available.
➢ Examples:
i) Lucky draw method
ii) Random numbers
Population: 12 students; Sample size: 4 students

Systematic Sampling
A sample obtained by selecting every kth member of the population, where k is a counting number.
Steps:
1. Identify the population size (N) and the sample size (n).
2. Obtain the sampling interval k by dividing the population size by the sample size: $k = \frac{N}{n}$.
3. Randomly select one element from the first k elements in the list (using SRS). Suppose the rth element is selected.
4. Finally, sample every kth element in the population, beginning with the rth element, until a sample of size n is obtained: the elements numbered $r, r+k, r+2k, \ldots, r+(n-1)k$.
Example: Population: N = 12 students; Sample size: n = 4 students
$k = \frac{N}{n} = \frac{12}{4} = 3$
Suppose we randomly choose to start with the second student (r = 2). Then students no. 2, 5, 8 and 11 will be selected.

Stratified Sampling
Stratified random sampling is applicable to a population that is divided into categories, for example by sex or race.
Characteristics of the population:
• Elements within each stratum are homogeneous.
• Elements in different strata are heterogeneous.
Example: Population: 12 students; Sample size: 4 students
Step 1: Group the students by course.
Group 1: Tourism
Group 2: Foodservice
Group 3: Culinary
Step 2: Find the number of students to sample from each group. We want to select 4 students, and each group's share is proportional to its size, $n_h = \frac{N_h}{N} \times n$:
$\text{Tourism} = \frac{3}{12} \times 4 = 1$
$\text{Foodservice} = \frac{6}{12} \times 4 = 2$
$\text{Culinary} = \frac{3}{12} \times 4 = 1$
Step 3: Choose randomly from each stratum (course) using SRS or systematic sampling.

Cluster Sampling
Applicable to a population that is divided into homogeneous (similar) clusters. The elements within each cluster are heterogeneous.
How to use cluster sampling?
1. The population is divided into clusters (using naturally occurring geographic or other boundaries).
2. Clusters are then randomly selected.
3. The sample is collected by taking all elements in the selected clusters.
Example: Population: 6 campuses; Sample size: 2 campuses
Randomly choose two campuses using SRS or systematic sampling.

Non-Probability Sampling
Convenience sampling
➢ The selection of elements or sampling units is left primarily to the interviewer.
➢ Recommended for:
a. Pilot studies
b. Generating ideas
c. Insights/opinions
d. Hypotheses/conclusions
Judgemental sampling
➢ Population elements are selected based on the judgement or expertise of the researcher, who believes that the selected elements are representative of the population of interest.
Snowball sampling
➢ An initial group of respondents is selected, usually at random, and asked to recommend others who belong to the target population of interest.
➢ The initial sample/subjects are selected using probability sampling.
➢ The respondents have similar characteristics.
Quota sampling
➢ Similar to convenience sampling, except that the number allocated to each group of respondents is based on population statistics.

Exercise 1
For each statement, state whether it is descriptive or inferential statistics:
a) The average life expectancy in New Zealand is 78.49 years.
b) A researcher found that there is a positive relationship between salary and food expenditure.
c) The price of shirts at Shopping Complex B is more consistent than the price of shirts at Shopping Complex C.
d) There is a significant difference showing that females earn higher salaries than males.
e) The total population of Malaysia in 2010 was stated to be 24 million people.

Exercise 2
What level of measurement would be used to measure each variable?
a) The ages of patients in a local hospital
b) The ratings of movies released this month
c) Colours of athletic shirts sold by Oak Park Health Club
d) Temperatures of hot tubs in local health clubs
e) Rating of a textbook (poor, fair, good, excellent)
f) Ranking of golfers in a tournament

Exercise 3
Classify each sample as random, systematic, stratified, cluster, or other.
a) To check the accuracy of a machine that is used for filling ice cream containers, every 20th container is selected and weighed.
b) In a large school district, a researcher numbers all the full-time teachers and then randomly selects 30 teachers to be interviewed.
c) Out of all the hospitals in a municipality, a researcher selects one and collects records for a 24-hour period on the types of emergencies that were treated there.
d) A researcher divides a group of students according to gender, major field, and low, average and high grade point average. Then she randomly selects six students from each group to answer questions in a survey.
e) The subscribers to a magazine are numbered. Then a sample of these people is selected using random numbers.

Exercise 4
Suppose the following information is obtained from students upon exiting the campus bookstore during the first week of classes. Identify the type of each variable and its corresponding scale of measurement.
a) Amount of money spent on books
b) Number of textbooks purchased
c) Amount of time spent shopping in the bookstore
d) Program enrolled
e) Number of credits registered in the current semester
f) Method of payment

Example Final Question
Farid is the manager of Fashaz Tourist Agency. He is interested in determining the level of customer satisfaction with how the agency meets the needs and preferences of customers who have booked their overseas tours through it. Out of his 10,000 customers, he randomly selected 1,000 and posted a questionnaire to them.
a) State the population of the above study.
b) Is the above study a census or a sample survey? Explain your answer.
c) If it is a sample survey, state the sample in this study. Hence, state the sampling unit.
d) What is the sampling frame for this study?
e) Does the study involve primary data or secondary data? Give a reason to support your answer.
f) State the variable(s) involved in the above study.
g) Classify the variable(s) in part (f) as qualitative, quantitative discrete or quantitative continuous.
h) State the level of measurement for the variable(s) in part (f).

July 2017
The production manager of a glove factory wants to find out how many defective surgical gloves are produced per shift. A random sample of 200 surgical gloves was selected from a total of 2,000 surgical gloves produced per shift by selecting every 10th glove.
a) State the population and sample of this study.
b) Identify whether the researcher conducted a census or a sample survey. Give a reason for your answer.
c) Identify the variable of interest in this study. State the type of variable and its level of measurement.
d) State the sampling method used in this study. Give a reason for your answer.
e) Give a suitable data collection method for this study.

Jan 2018
A headmaster of a private higher learning institution is interested in studying the relationship between the number of hours students spend on social media and their academic performance.
He believes that the more time students spend on social media, the more likely they are to fail academically. The institution has a total of 2,500 students. Based on the previous semester's examinations, the students' academic performance has been categorized as Excellent, Moderate and Low, with 750, 1,350 and 400 students in each category respectively. A random sample of 100 students was selected for this study and their time spent on social media was recorded.
a) State the population and sample for the above study.
b) State the sampling frame.
c) Identify whether the researcher conducted a census or a sample survey. Give a reason for your answer.
d) Give a suitable data collection method for this study.
e) What sampling technique is used in this study?
f) Suggest another sampling method that could be used by the headmaster. Briefly explain the sampling method chosen.

June 2018
SS Airlines has implemented a new boarding policy. To determine its customers' opinions of this new policy, a group of researchers made a list of all its flights and randomly selected 30 flights. All of the passengers on those flights were invited to answer a questionnaire during a certain week. One of the survey items was "Please rate your overall boarding experience today based on the following scale: 1-Excellent; 2-Good; 3-Fair; 4-Poor; 5-Very poor".
a) State the population and the sample of this study.
b) State the sampling frame for the study.
c) Name the sampling method used in this study. Give a reason for your answer.
d) Identify the type of variable and the scale of measurement for the variable "boarding experience rating".

Dec 2018
In the automobile industry, customer service is a crucial factor affecting car sales. The management of a reputable automobile company is interested in determining the level of customer satisfaction with the service provided by the company's service centers. The company has altogether 40 service centers throughout Malaysia.
A sample of eight centers was selected at random. Questionnaires were distributed to all customers who serviced their cars at these eight selected service centers on one selected day (the day of the survey). One of the questions asked about the satisfaction level with the services provided (rated as good, fair or poor).
a) State the population and sample of the study.
b) State the sampling frame for the study.
c) Name the variable of interest in the above study. State its type and its level of measurement.
d) Give one advantage and one disadvantage of the data collection method used in this study.
e) Identify the sampling technique used in the survey. Briefly explain how the sample is selected.
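The four probability sampling techniques described earlier in these notes (simple random, systematic, stratified and cluster sampling) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names, the student labels S1 to S12, the assignment of students to the Tourism, Foodservice and Culinary courses, and the campus names are all hypothetical, chosen only to mirror the 12-student and 6-campus examples. The systematic version assumes N is a multiple of n, and the stratified version uses proportional allocation with rounding, so the stratum sizes may not add up exactly to n when the shares are not whole numbers.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every element in the sampling frame has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng, start=None):
    """Select every kth element, starting from a random rth element (1 <= r <= k)."""
    k = len(frame) // n  # sampling interval k = N / n (assumes N is a multiple of n)
    r = start if start is not None else rng.randint(1, k)
    # rth, (r+k)th, ..., (r+(n-1)k)th elements; subtract 1 because Python lists are 0-based
    return [frame[r - 1 + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Proportional allocation n_h = (N_h / N) * n, then SRS within each stratum."""
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(len(members) * n / N)  # may not sum exactly to n in general
        sample[name] = rng.sample(members, n_h)
    return sample

def cluster_sample(clusters, m, rng):
    """Randomly select m whole clusters and keep every element of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}

# Hypothetical data mirroring the examples in the notes.
rng = random.Random(2024)
students = [f"S{i}" for i in range(1, 13)]  # N = 12 students
courses = {                                 # hypothetical course membership (3, 6, 3)
    "Tourism": students[0:3],
    "Foodservice": students[3:9],
    "Culinary": students[9:12],
}
campuses = {f"Campus {c}": [f"{c}{i}" for i in range(1, 6)] for c in "ABCDEF"}  # 6 campuses

print(simple_random_sample(students, 4, rng))
print(systematic_sample(students, 4, rng, start=2))  # ['S2', 'S5', 'S8', 'S11']
print({name: len(s) for name, s in stratified_sample(courses, 4, rng).items()})
# {'Tourism': 1, 'Foodservice': 2, 'Culinary': 1}
print(sorted(cluster_sample(campuses, 2, rng)))
```

Passing start=2 reproduces the worked systematic example (students no. 2, 5, 8 and 11); leaving it out draws r at random from the first k elements, as in step 3 of the systematic sampling procedure.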