Uploaded by idzreenriena

CHAPTER 1 - INTRODUCTION TO STATISTICS

INTRODUCTION TO STATISTICS
What is Statistics
✓ Statistics is all about converting data into useful information to solve a problem
✓ Statistics is concerned with scientific methods for collecting, organizing,
summarizing, presenting and analyzing data as well as deriving valid conclusions and
making reasonable decisions on the basis of this analysis.
Collecting data → Organising data → Summarizing data → Presenting data → Interpreting data
Descriptive and Inferential Statistics
Descriptive statistics
• Uses the data to provide descriptions of the population, either
through numerical calculations or through graphs or tables.
• The process of collecting, compiling, summarizing and
presenting data in graphical forms such as charts, graphs and
tables, or in numerical forms such as averages and percentages
derived from them, so that one can evaluate the data set easily.
• Example:
a) The percentage growth of Malaysia’s population from one
decade to the next.
b) The average income of the 104 families in Maju Berhad is
RM 28 673 per annum.
Inferential statistics
• Makes inferences and predictions about a population based on
a sample of data taken from the population in question.
• A decision, estimate, prediction or generalization about a
population based on a sample. It consists of methods that use
sample results to help make decisions or predictions about the
population, such as estimation, hypothesis testing, probability
and regression.
• Example:
a) Based on a sample survey by a lecturer at a higher
learning institution, only 45% of diploma graduates further
their studies in a Bachelor’s program at a local IPTA.
b) The Department of Labour uses the average income of a sample
of several hundred workers to estimate the average income
of all 3 million workers.
Statistical Terms
Research/Survey – A study that is done using statistical methods in order to understand a certain problem.
Element/Experimental unit – The objects, either people or things, on which measurements are taken.
Population – All elements under study, either living or non-living objects.
Sample – A subset or part of the population.
Sampling – The process of selecting a sample from the population of interest.
Sampling frame – A list of sampling units used to select the sample.
Sampling unit – The elements listed in the frame.
Pilot survey – A study done on a small scale before the actual survey.
Sample survey – A study done based on a sample.
Census – A study done on the entire population.
Parameter – A summary measure/characteristic obtained from a population.
Statistic – A summary measure/characteristic obtained from a sample.
Data – A collection of observations, measurements or information obtained from a study that is carried out.
Variable/Attribute – A characteristic of the population under study.
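The difference between a parameter and a statistic can be sketched in a few lines of Python. The income figures below are hypothetical values made up for illustration, and the seed is arbitrary; only the distinction between the two summary measures comes from the definitions above.

```python
import random
import statistics

# Hypothetical population: annual incomes (RM) of 10 families (made-up values).
population = [21000, 25500, 28000, 30500, 32000,
              27500, 26000, 35000, 29500, 31000]

rng = random.Random(1)               # arbitrary seed so the run is repeatable
sample = rng.sample(population, 4)   # a sample is a subset of the population

parameter = statistics.mean(population)  # summary measure from the population
statistic = statistics.mean(sample)      # summary measure from the sample

print("Population mean (parameter):", parameter)
print("Sample mean (statistic):", statistic)
```

The sample mean will usually differ from the population mean; inferential statistics deals with how far such an estimate can be trusted.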
TYPES OF VARIABLES
• A variable is an attribute that describes a person, place, thing or idea. The value of the variable can “vary” from
one entity to another.
• Random variables and data can be classified into two main categories:
Types of Variable
Qualitative variable
• Measured according to their specific categories or characteristics.
• Example: gender (male, female), marital status (single, married), race (Malay, Indian, Chinese), grade (A, B, C)
Quantitative variable
a) Discrete
• Assumes only exact values.
• Example: no. of students, no. of cars, no. of books
b) Continuous
• Can be expressed to a certain degree of accuracy.
• Example: distance travelled, litres of petrol, weight and height of children.
LEVEL/SCALE OF MEASUREMENT
Level of measurement
Nominal
• Classifies objects into categories
• Example: Gender, Race, Religion
Ordinal
• Classifies and ranks the objects
• Example: Level of education, Stage of cancer, Agreement level
Interval
• The values of interval variables cannot be meaningfully multiplied or divided
• Example: Temperatures, Shoe size, IQ scores
Ratio
• Has a meaningful zero point
• The values of ratio variables can be meaningfully multiplied or divided
• Example: Salary, Weight, Height
SOURCES OF DATA
Primary Data
Explanation
• First-hand data
• The researcher carries out the research and obtains the data directly from the respondents
Advantage
• Accurate
• Reliable
• Up to date
Disadvantage
• Time consuming
• Costly
• Requires a lot of manpower
Secondary Data
Explanation
• Data obtained from other sources
Advantage
• Less time
• Less effort
• Inexpensive
Disadvantage
• May not meet our specific objective
DATA COLLECTION METHOD
Personal interview/face to face interview
Advantage
▪ Obtains a higher percentage of responses than other methods
▪ Allows the interviewer to clarify any terms that aren’t understood by the respondent
Disadvantage
▪ The cost is high (interviewers’ salaries, travelling, etc.)
▪ The expression of the researcher can lead to bias
Telephone
Advantage
▪ Provides information from wide geographical access
▪ The process of interviewing is quicker and less expensive
▪ The researcher will get the answer spontaneously and correctly
Disadvantage
▪ Interviewers are limited in the questions they can ask
▪ Lower response rate
Direct observation
Disadvantage
▪ Time consuming
▪ Validity and reliability may be problematic
▪ Requires a skilled observer
▪ Does not provide complete information for more complex jobs
Questionnaire
Advantage
▪ Cheaper than personal interviews
▪ The research coverage is wider
▪ No interviewer influence
▪ The respondent has more time to think of a proper response
Disadvantage
▪ Normally, the rate of response is quite low
▪ It may be biased because only particular types of people will reply
▪ Only very simple questions can be asked
▪ Not able to interact with the respondent
Sampling Technique
WHY NEED SAMPLING?
Sampling is required whenever the process of implementing the research becomes costly and time-consuming.
Sampling technique
Probability sampling
Every element in the population has an equal chance of being selected for the sample.
✓ Simple random sampling (SRS)
✓ Systematic sampling
✓ Stratified sampling
✓ Cluster sampling
Non-probability sampling
Not all elements in the population have an equal chance of being selected for the sample.
✓ Quota sampling
✓ Convenience sampling
✓ Judgmental sampling
✓ Snowball sampling
PROBABILITY SAMPLING
Simple Random Sampling (SRS)
➢ Each item has the same chance of being selected for the sample.
➢ Characteristics of SRS
a) The target population must be homogeneous
b) There must be a complete sampling frame
➢ Example
i) Lucky draw method
ii) Random numbers
Population : 12 students
Sample size : 4 students
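A minimal sketch of the random-number approach for this 12-student example, assuming the students are simply numbered 1 to 12 (the numbering and the seed are illustrative choices, not part of the original slide):

```python
import random

students = list(range(1, 13))      # sampling frame: students numbered 1..12
rng = random.Random(2024)          # arbitrary seed so the run is repeatable

# random.sample draws without replacement, so every student has the
# same chance of being chosen and nobody is picked twice.
sample = rng.sample(students, 4)
print(sorted(sample))
```

Removing the seed gives a different sample on each run, which is the behaviour wanted in a real survey.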
Systematic Sampling
A sample obtained by selecting every kth member of the population, where k is a counting number.
Steps
1. Identify the population size (N) and the sample size (n).
2. Obtain the sampling interval k by dividing the population size by the sample size: k = N / n.
3. Randomly select one element from the first k elements in the list (using SRS). Suppose the rth element is selected.
4. Lastly, sample every kth element in the population, beginning with the rth element, until a sample of size n is obtained:
rth, (r+k)th, (r+2k)th, ..., (r+(n-1)k)th
Example
Population : 12 students (N = 12)
Sample size : 4 students (n = 4)
k = N / n = 12 / 4 = 3
Let's say we randomly choose to start with the second student (r = 2).
Students no. 2, 5, 8 and 11 will be selected.
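The four steps can be sketched as a small Python function. The function name `systematic_sample` and its parameters are illustrative, and the sketch assumes N is an exact multiple of n, as in the 12-student example.

```python
import random

def systematic_sample(population, n, r=None, rng=random):
    """Select every k-th element starting from the r-th (1-based) element.

    Assumes len(population) is an exact multiple of n.
    """
    N = len(population)
    k = N // n                      # sampling interval k = N / n
    if r is None:
        r = rng.randint(1, k)       # random start among the first k elements
    # r-th, (r+k)-th, ..., (r+(n-1)k)-th elements (1-based positions)
    return [population[r - 1 + i * k] for i in range(n)]

students = list(range(1, 13))       # students numbered 1..12
print(systematic_sample(students, n=4, r=2))  # prints [2, 5, 8, 11]
```

Calling the function without `r` picks the starting point at random, as step 3 describes.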
Stratified Sampling
• Also called stratified random sampling
• Applicable to a population that is categorized, for example according to sex, race, etc.
• Characteristics of the population:
a) Elements in each stratum are homogeneous
b) Elements between the strata are heterogeneous
Example
Population : 12 students (Tourism 3, Foodservice 6, Culinary 3)
Sample size : 4 students
Step 1: Group the students based on course
Group 1 : Tourism
Group 2 : Foodservice
Group 3 : Culinary
Step 2: Find the number of students to sample from each group
Tourism = 3/12 × 4 = 1
Foodservice = 6/12 × 4 = 2
Culinary = 3/12 × 4 = 1
Step 3: Choose randomly using SRS or systematic sampling from each stratum (course)
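The proportional allocation in Step 2 and the within-stratum selection in Step 3 can be sketched as follows. The student labels (T1, F1, C1, ...) and the seed are made up for illustration; the stratum sizes 3, 6 and 3 are the ones used in the fractions of the example. Integer division is safe here only because each allocation works out to a whole number.

```python
import random

# Hypothetical student labels grouped by course (stratum sizes 3, 6 and 3).
strata = {
    "Tourism": ["T1", "T2", "T3"],
    "Foodservice": ["F1", "F2", "F3", "F4", "F5", "F6"],
    "Culinary": ["C1", "C2", "C3"],
}
n = 4                                            # total sample size
N = sum(len(members) for members in strata.values())  # population size, 12

# Step 2: proportional allocation, (stratum size / N) * n for each course.
allocation = {course: len(members) * n // N
              for course, members in strata.items()}

# Step 3: simple random sampling inside each stratum.
rng = random.Random(7)                           # arbitrary seed
sample = {course: rng.sample(strata[course], allocation[course])
          for course in strata}

print(allocation)  # prints {'Tourism': 1, 'Foodservice': 2, 'Culinary': 1}
print(sample)
```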
Cluster Sampling
Applicable to a population that is divided into homogeneous (similar) clusters. Elements within each cluster are heterogeneous.
How to use cluster sampling?
1. The population is divided into clusters (using naturally occurring geographic or other boundaries).
2. Clusters are then randomly selected.
3. A sample is collected by taking all elements in the selected clusters.
Example
Population : 6 campuses
Sample size : 2 campuses
Randomly choose two campuses using SRS or systematic sampling.
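A minimal sketch of the six-campus example, assuming each campus is a cluster with a handful of students. The campus names, student labels and seed are hypothetical; the point is that clusters are drawn at random and every element of a chosen cluster enters the sample.

```python
import random

# Hypothetical clusters: 6 campuses, each with its own list of students.
campuses = {
    "Campus A": ["A1", "A2", "A3"],
    "Campus B": ["B1", "B2"],
    "Campus C": ["C1", "C2", "C3", "C4"],
    "Campus D": ["D1", "D2"],
    "Campus E": ["E1", "E2", "E3"],
    "Campus F": ["F1", "F2"],
}

rng = random.Random(3)                        # arbitrary seed
chosen = rng.sample(sorted(campuses), 2)      # randomly select 2 clusters (SRS)

# Take ALL elements of the selected clusters, with no sampling inside them.
sample = [student for campus in chosen for student in campuses[campus]]

print(chosen)
print(sample)
```

This is the main contrast with stratified sampling, where every group contributes a few elements instead of a few groups contributing all of theirs.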
Non-Probability Sampling
Convenience sampling
➢ The selection of elements or sampling units is left primarily to the interviewer
➢ Recommended for:
a. Pilot study
b. Generating ideas
c. Insights/opinions
d. Hypothesis/conclusion
Judgemental sampling
➢ Population elements are selected based on the judgement or expertise of the researcher, who believes
the elements are representative of the population of interest
Snowball sampling
➢ An initial group of respondents is selected, usually at random, and asked to recommend others
who belong to the target population of interest.
➢ The initial sample/subjects are selected using probability sampling
➢ The respondents have similar characteristics
Quota sampling
➢ Similar to convenience sampling, except that the number allocated for each group of respondents is
based on the population statistics.
Exercise 1
For each statement, state whether it involves descriptive or inferential statistics:
a) The average life expectancy in New Zealand is 78.49 years.
b) A researcher found that there is a positive relationship between salary and food
expenditure.
c) The price of shirts at Shopping Complex B is more consistent than the price
of shirts at Shopping Complex C.
d) There is a significant difference showing that females earn a higher salary than males.
e) The total population of Malaysia stated in 2010 was 24 million people.
Exercise 2
What level of measurement would be used to measure each variable?
a) The ages of patients in a local hospital
b) The ratings of movies released this month
c) Colours of athletic shirts sold by Oak Park Health Club
d) Temperatures of hot tubs in local health clubs
e) Rating of text book (poor, fair, good, excellent)
f) Ranking of golfers in a tournament
Exercise 3
Classify each sample as random, systematic, stratified, cluster, or other.
a) To check the accuracy of a machine that is used for filling ice cream containers, every 20th
container is selected and weighed.
b) In a large school district, a researcher numbers all the full-time teachers and then
randomly selects 30 teachers to be interviewed.
c) Out of the hospitals in a municipality, a researcher selects one and collects records for a
24-hour period on the types of emergencies that were treated there.
d) A researcher divides a group of students according to gender, major field, and low,
average, and high grade point average. Then she randomly selects six students from each
group to answer questions in a survey.
e) The subscribers to a magazine are numbered. Then a sample of these people is selected
using random numbers.
Exercise 4
Suppose the following information is obtained from students upon exiting the
campus bookstore during the first week of classes. Identify the types of variables used
and also the corresponding scales of measurement.
a) Amount of money spent on books
b) Number of textbooks purchased
c) Amount of time spent shopping in the bookstore.
d) Program enrolled
e) Number of credits registered in the current semester
f) Method of payment
Example Final Question
Farid is the manager of FashazTourist Agency. He is interested in determining the level of satisfaction regarding the
needs and preferences of his customers who have booked their overseas tours through his agency. Out of
10,000 of his customers, he randomly selected 1,000 customers and posted a questionnaire to them.
a) State the population of the above study.
b) Is the above study a census study or a sample survey? Explain your answer.
c) If it is a sample survey, state the sample in this study. Hence state the sampling unit.
d) What is the sampling frame for this study?
e) Does the study involve primary data or secondary data? Give a reason to support your answer.
f) State the variable(s) involved in the above study.
g) Classify the variable(s) in part (f) as qualitative, quantitative discrete or quantitative continuous
variable(s).
h) State the level of measurement for the variable in part (f).
July 2017
The production manager of a glove factory wants to find out how many defective
surgical gloves are produced per shift. A random sample of 200 surgical gloves was
selected from a total of 2000 pieces of surgical gloves produced per shift by selecting
every 10th glove.
a) State the population and sample of this study.
b) Identify whether the researcher conducted a census or a sample survey. Give a
reason for your answer.
c) Identify the variable of interest in this study. State the type of variable and its
level of measurement.
d) State the sampling method used in this study. Give a reason for your answer.
e) Give a suitable data collection method for this study.
Jan 2018
A headmaster of a private higher learning institution is interested in studying the relationship between
the hours students spend on social media and their academic performance. He believes that the
more time students spend on social media, the more likely they are to fail academically.
The institution has a total of 2500 students. Based on the previous semester's examinations, the
students' academic performance has been categorized as Excellent, Moderate and Low, whereby
the number of students in each category is 750, 1350 and 400 respectively. A random sample of 100
students was selected for this study and the time spent on social media was recorded.
a) State the population and sample for the above study.
b) State the sampling frame.
c) Identify whether the researcher conducted a census or a sample survey. Give a reason for your
answer.
d) Give a suitable data collection method for this study.
e) What is the sampling technique used for this study?
f) Suggest another sampling method that can be used by the headmaster. Explain briefly the
sampling method chosen.
June 2018
SS Airlines has implemented a new boarding policy. In order to determine its
customers' opinion of this new policy, a group of researchers made a list of all its
flights and randomly selected 30 flights. All of the passengers on those flights were
invited to answer a questionnaire during a certain week. One of the survey items was
“Please rate your overall boarding experience today based on the following scale:
1-Excellent; 2-Good; 3-Fair; 4-Poor; 5-Very poor”
a) State the population and the sample of this study.
b) State the sampling frame for the study.
c) Name the sampling method used in this study. Give a reason for your answer.
d) Identify the type of variable and the scale of measurement for the variable
“boarding experience rating”.
Dec 2018
In the automobile industry, customer service is a crucial factor affecting car sales. The management
of a reputed automobile company is interested in determining the level of customer satisfaction
with the service provided by the company’s service centers. The company has altogether 40 service
centers throughout Malaysia. A sample of eight centers was selected at random. Questionnaires were
distributed to all customers who serviced their cars at these eight selected service centers on one
selected day (the day of the survey). One of the questions asked was the satisfaction level on the services
provided (using the rating: good, fair, poor).
a) State the population and sample of the study.
b) State the sampling frame for the study.
c) Name the variable of interest for the above study. State its type and its level of measurement.
d) Give one advantage and one disadvantage of the data collection method used in this study.
e) Identify the sampling technique used in the survey. Explain briefly how the sample is selected.