Statistics-MAT 150 Chapter 2 Descriptive Statistics

Statistics-MAT 150
Chapter 1
Introduction to Statistics
Prof. Felix Apfaltrer
fapfaltrer@bmcc.cuny.edu
Office: N518
Phone: x7421
Chapter 1
• Overview
• Nature of data
• Skills needed in statistics
Overview
Statistics:
• Descriptive
– Analyze the nature of data from surveys, experiments, and observations.
• Inferential
– Draw conclusions about the population from those analyses.
Survey: a tool to collect data from a smaller group that is part of a larger group, in order to learn something about the larger group.
Key goal of statistics:
• Learn about a large group (population) from data from a smaller subgroup (sample).
Overview
Definitions:
• Data: observations collected (measurements, gender, answers, …)
• Statistics: a collection of methods to analyze data
• Population: the complete collection of elements (scores, measurements, subjects, …)
• Sample: a subcollection of members selected from a population
• Census: the collection of data from every member of the population
Overview 2
Example:
• Poll: 1087 adults are asked whether or not they drink alcoholic beverages.
– Sample: 1087 adults
– Population: all US adults (150 million).
• Census: every 10 years, the Census Bureau tries to collect information from every member of the US population.
– Practically impossible!
– Very expensive!
• Use sample data to draw conclusions about the whole population: inferential statistics!
Types of data
Parameter:
• A numerical measurement describing some characteristic of a population.
• Lincoln was elected with 39.82% of 1,865,908 votes counted.
– 39.82% is a parameter.
Statistic:
• A numerical measurement describing some characteristic of a sample.
• Based on a sample of 877 selected executives, 45% would not hire an applicant with a typographical error in the application.
– 45% is a statistic.
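A minimal Python sketch of the parameter/statistic distinction, assuming a made-up population of 100,000 yes/no answers (the 45% "yes" rate and the random seed are illustrative assumptions, not survey data). The parameter uses every member; the statistic uses only a random sample of 877, the sample size from the example above.

```python
import random

# Hypothetical population: 100,000 yes (True) / no (False) answers.
# The 45% "yes" rate is an assumption for illustration only.
rng = random.Random(2024)
population = [rng.random() < 0.45 for _ in range(100_000)]

# Parameter: computed from EVERY member of the population (as in a census).
parameter = sum(population) / len(population)

# Statistic: computed from a random sample of 877 members only.
sample = rng.sample(population, 877)
statistic = sum(sample) / len(sample)

print(f"parameter (population proportion): {parameter:.4f}")
print(f"statistic (sample proportion):     {statistic:.4f}")
```

The two numbers are usually close but not identical; the statistic is what inferential statistics uses to estimate the parameter.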
Types of data 2
Quantitative data: numbers representing counts or measurements.
• Weights of supermodels.
Qualitative data: nonnumerical (categories or attributes).
• Gender of an athlete.
Discrete vs. continuous data: discrete data take a finite or countable number of values; continuous data can take any value on a continuous scale.
• # of people in a household vs. temperatures in May.
Nominal level of measurement: names, labels, or categories; no ordering.
• Yes/No/Undecided responses, colors.
Ordinal level of measurement: data can be ordered, but differences between values are meaningless or cannot be computed.
• Course grades A, B, C, D, F. “Livability rank of a city”.
Interval level of measurement: ordered, and differences are meaningful, but there is no natural zero, so ratios are meaningless.
• Temperature, year.
Ratio level of measurement: like interval, but with a meaningful (natural) zero, so ratios are meaningful.
• Weights, prices (non-negative).
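One way to see the four levels is as a ladder where each level allows one more kind of operation than the level below it. The sketch below encodes that idea; the names `Level` and `meaningful_operations` are hypothetical, and temperature is assumed to be measured in degrees Celsius.

```python
from enum import IntEnum

class Level(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def meaningful_operations(level):
    """Operations that make sense at a level; each level adds one to the level below."""
    ops = ["classify / count"]
    if level >= Level.ORDINAL:
        ops.append("order")
    if level >= Level.INTERVAL:
        ops.append("differences")
    if level >= Level.RATIO:
        ops.append("ratios")
    return ops

for level in Level:
    print(level.name, meaningful_operations(level))

# Why temperature in degrees Celsius is only interval level:
# 0 °C does not mean "no temperature", so 20 °C is not "twice as hot" as 10 °C.
c_ratio = 20 / 10                          # 2.0, but not meaningful
k_ratio = (20 + 273.15) / (10 + 273.15)    # about 1.035 on the Kelvin scale, which has a true zero
print(c_ratio, round(k_ratio, 3))
```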
Basic skills
Samples:
• Must be representative:
– “39/40 polled people vote for A”, but the poll was taken at A’s headquarters!
• Must not be too small:
– CDF published “among HS students suspended, 67% were suspended more than 3 times”. Sample size: 3!
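The 67% figure is a warning sign on its own: with a sample of only 3, the reported percentage can take just four values. A small sketch (the function name `possible_percentages` is illustrative):

```python
def possible_percentages(n):
    """Every whole-number percentage a sample of size n can report: k/n * 100 for k = 0..n."""
    return [round(100 * k / n) for k in range(n + 1)]

n = 3
print(possible_percentages(n))  # [0, 33, 67, 100]
# So "67%" means just 2 of the 3 students sampled:
print(round(100 * 2 / n))       # 67
```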
Graphs: In which one does red do better?
[Figure: two bar charts, each titled “Median Weekly Income (16-24)”, comparing Men and Women; the vertical axis of one runs from $300 to $390, the other from $0 to $400]
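The two charts differ in where the vertical axis starts ($300 vs. $0), and that alone changes how large a gap looks. The sketch below computes the ratio of the drawn bar heights under each axis; the income values are hypothetical, not the values in the charts.

```python
def apparent_ratio(taller, shorter, axis_start):
    """Ratio of drawn bar heights when the vertical axis starts at axis_start.

    A bar for value v is drawn with height (v - axis_start), so starting the
    axis above zero exaggerates the visual difference between bars.
    """
    return (taller - axis_start) / (shorter - axis_start)

# Hypothetical weekly medians, NOT read from the slide's charts
men, women = 380, 340

print(f"axis from $0:   {apparent_ratio(men, women, 0):.2f}")    # 1.12
print(f"axis from $300: {apparent_ratio(men, women, 300):.2f}")  # 2.00
```

With the axis starting at $300, one bar looks twice as tall as the other even though the true values differ by only about 12%.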
Percentage of:
• 6% of 1200 = 6 / 100 * 1200 = 72
Fraction >>> percentage:
• 3/4 = 0.75 >>> 0.75 * 100% = 75 %
Percentage >>> decimal:
• 27.3% = 27.3/100 = 0.273
Decimal >>> percentage:
• 0.852 >>> 0.852 * 100% = 85.2%
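The four conversions above can be written as one-line functions. This is a sketch with illustrative names; note that "6% of 1200" yields an amount (72), not a percentage.

```python
def percent_of(percent, whole):
    """p% of a whole: p / 100 * whole (the result is an amount, not a percentage)."""
    return percent * whole / 100

def fraction_to_percent(numerator, denominator):
    return numerator / denominator * 100

def percent_to_decimal(percent):
    return percent / 100

def decimal_to_percent(decimal):
    return decimal * 100

print(percent_of(6, 1200))                  # 72.0
print(fraction_to_percent(3, 4))            # 75.0
print(f"{percent_to_decimal(27.3):.3f}")    # 0.273
print(f"{decimal_to_percent(0.852):.1f}%")  # 85.2%
```

Results such as 27.3 / 100 are formatted to a fixed number of decimals because floating-point arithmetic can leave tiny rounding residue.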
Calculator:
Basic skills 2
Design
Observational study: observe and measure characteristics without trying to modify the subjects.
• Gallup poll.
• Cross-sectional: data are observed and measured at one point in time.
• Retrospective: data are collected from the past (records).
• Prospective: data are collected going forward from groups that share a common factor (e.g., smokers vs. nonsmokers).
Experiment: apply a treatment, then observe and measure its effects.
• Clinical trial for Lipitor.
• Control: blinding (placebo), double-blinding, blocks.
• Replication: the ability to repeat the experiment.
• Randomization: data need to be collected in an appropriate (random) way; otherwise, they may be completely useless!
– Random sample: members of the population are selected so that each individual member has the same chance of being selected.
– Simple random sample of size n: every possible sample of size n has the same chance of being chosen.
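The two definitions are not the same. The sketch below, using a hypothetical population of 12 individuals, contrasts a simple random sample with a "pick one block" scheme: in the block scheme every individual still has the same chance (3 in 12) of being selected, so it is a random sample, but only 4 of the 220 possible samples of size 3 can ever occur, so it is not a simple random sample.

```python
import math
import random

# Hypothetical population of 12 individuals labeled 0..11; samples of size n = 3.
population = list(range(12))
n = 3

def simple_random_sample(pop, n, rng):
    """Every subset of size n is equally likely (sampling without replacement)."""
    return sorted(rng.sample(pop, n))

def block_sample(pop, n, rng):
    """Split pop into consecutive blocks of n and pick ONE block at random.

    Each individual has the same chance of selection (n / len(pop)),
    but most subsets of size n can never be drawn.
    """
    blocks = [pop[i:i + n] for i in range(0, len(pop), n)]
    return rng.choice(blocks)

possible_srs = math.comb(len(population), n)   # 220 possible samples
possible_blocks = len(population) // n         # only 4 possible samples

rng = random.Random(0)
print(possible_srs, possible_blocks)
print(simple_random_sample(population, n, rng))
print(block_sample(population, n, rng))
```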
Design 2
Sampling:
• systematic: select a starting point, then choose every kth member.
• convenience: use data that are easy to get.
• stratified: subdivide the population into at least 2 subgroups (strata) whose members share a common characteristic (e.g., gender or age), then draw a sample from each.
• cluster: divide the population area into sections (clusters), randomly select some of those clusters, and use all members of the selected clusters.
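The four sampling methods can be sketched on a hypothetical population of 1,000 records with a gender and a city district (1 to 10); all field names, sizes, and the seed are assumptions for illustration. The cluster function follows the usual textbook form: select some clusters at random and keep all of their members.

```python
import random

rng = random.Random(42)

# Hypothetical population: 1,000 people with an id, a gender, and a district (1-10).
population = [
    {"id": i, "gender": rng.choice(["F", "M"]), "district": rng.randint(1, 10)}
    for i in range(1000)
]

def group_by(pop, key):
    """Map each value of key to the list of members having that value."""
    groups = {}
    for person in pop:
        groups.setdefault(person[key], []).append(person)
    return groups

def systematic_sample(pop, k, rng):
    """Random starting point among the first k members, then every kth member."""
    start = rng.randrange(k)
    return pop[start::k]

def convenience_sample(pop, n):
    """Whatever is easiest to get: here simply the first n records (not random)."""
    return pop[:n]

def stratified_sample(pop, key, n_per_stratum, rng):
    """Split into strata sharing a value of key, then draw a random sample from each."""
    sample = []
    for members in group_by(pop, key).values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(pop, key, n_clusters, rng):
    """Split into clusters by key, pick some clusters at random, keep ALL their members."""
    clusters = group_by(pop, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for c in chosen for person in clusters[c]]

print(len(systematic_sample(population, 50, rng)))           # 20 (1,000 / 50)
print(len(convenience_sample(population, 30)))                # 30
print(len(stratified_sample(population, "gender", 25, rng)))  # 50 (25 per gender)
print(sorted({p["district"] for p in cluster_sample(population, "district", 3, rng)}))
```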
Sampling error: the difference between a sample result and the true population result; it results from chance sample fluctuations.
Nonsampling error: occurs when data are incorrectly collected, measured, recorded, or analyzed.
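A rough simulation of the difference, assuming a hypothetical population of 50,000 heights (normal, mean 170 cm, sd 10 cm) and, for the nonsampling case, a hypothetical miscalibrated scale that records every height 2 cm too tall. Sampling error tends to shrink as the sample grows; the recording error does not.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of heights in cm (assumed normal, mean 170, sd 10).
population = [rng.gauss(170, 10) for _ in range(50_000)]
true_mean = statistics.fmean(population)   # the parameter

def sampling_error(n):
    """Sample mean minus population mean for one random sample of size n."""
    return statistics.fmean(rng.sample(population, n)) - true_mean

def mean_abs_error(n, reps=200):
    """Average size of the sampling error over many repeated samples."""
    return statistics.fmean(abs(sampling_error(n)) for _ in range(reps))

def recorded_error(n, bias=2.0):
    """Nonsampling error: every height is recorded `bias` cm too tall."""
    recorded = [h + bias for h in rng.sample(population, n)]
    return statistics.fmean(recorded) - true_mean

print(f"sampling error, n=25:   {mean_abs_error(25):.2f} cm")
print(f"sampling error, n=2500: {mean_abs_error(2500):.2f} cm")
# The recording bias stays near 2 cm however large n is.
print(f"with +2 cm recording bias, n=2500: {recorded_error(2500):.2f} cm")
```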