Chapter 1 – Introduction: The Role of Statistics in
Engineering
Example: The manufacturer of a medical laser used in ophthalmic
surgery wants to be able to quote quality characteristics of the laser
to potential customers. One characteristic that they want to use is
the average lifetime of the laser under normal use. They could
obtain the exact average lifetime by running each laser produced
until it wears out, recording the lifetime for that laser, and finding
the average over all lasers produced. They would then know the
exact average lifetime for this type of laser. The drawback to this
procedure is that they would be left with no product to sell. In order
to both stay in business and advertise quality characteristics to
potential customers, they need to find a way to estimate the average
lifetime from a relatively small sample of lasers. Since they are
using only some of the lasers produced, not all, their estimate will
have some uncertainty. One major use of statistics is to quantify the
degree of uncertainty in such situations.
Definition: Statistics is the branch of applied mathematics that
deals with collection, organization, and interpretation of numerical
data, especially the analysis of population characteristics by
inference from sampling.
There are two general branches of statistics: 1) descriptive statistics
and 2) inferential statistics.
Definition: Descriptive statistics consists of methods of organizing,
summarizing, and presenting data in an informative way.
Graphical techniques allow us to present summaries of data in
pictorial form, so that data characteristics may be easily seen.
Numerical techniques provide various summary values which
represent the characteristics of the data set.
Definition: A unit is a single entity, usually an object or a person,
whose characteristics are of interest to the researcher.
Definition: A population of units is the set of all items of interest in
a statistical problem.
Example 1: All registered voters in Florida in November,
2012.
Example 2: All cars of a certain model coming off an assembly
line in October, 2009.
Example 3: All 12-oz. cans of Pepsi-Cola produced at a certain
factory in the year 2009.
Definition: A statistical population is the set of all measurements
corresponding to each unit in the entire population of units.
Note: We will generally use the term population to refer to either a
population of units or a statistical population.
Definition: A parameter is a numerical characteristic of a
population.
Example 1: Proportion of all registered voters in Florida who
intend to vote for Pres. Barack Obama in November, 2012.
Example 2: Average time until first major repair job for all
cars of a certain model coming off an assembly line in October,
2009.
Example 3: Average amount of Pepsi-Cola, by weight, in all
12-oz. cans of Pepsi-Cola produced at a certain factory in the
year 2009.
Definition: A sample is a subset of a population. We will also use
the term sample to denote the subset of measurements that are
actually collected by the researcher.
Example 1: One thousand randomly selected registered voters
from across Florida in October, 2012.
Example 2: Every 50th car of a certain model coming off an
assembly line in October, 2009.
Example 3: Every 100th 12-oz. can of Pepsi-Cola produced at a
certain factory in the year 2009.
Note 1: The description of a sample must refer to the population
from which the sample was selected.
Note 2: Depending on the method of selection, a sample may or
may not be representative of the population from which it was
selected.
Definition: A statistic is a numerical characteristic of a sample.
Example 1: The proportion of one thousand randomly selected
voters from across Florida in October, 2012 who intend to vote
for Pres. Barack Obama.
Example 2: The average time until the first major repair job for
a sample consisting of every 50th car of a certain model coming
off an assembly line in October, 2009.
Example 3: The average amount of Pepsi-Cola in a sample
consisting of every 100th can of Pepsi-Cola produced at a
certain factory in the year 2009.
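To make the distinction between a parameter and a statistic concrete, the sketch below builds a small hypothetical population in which the true proportion is known by construction, then computes the same proportion from a random sample. The population size, the 52% figure, and the seed are illustrative assumptions, not data from any real poll.

```python
import random


def proportion(values):
    """Fraction of True entries in a list of booleans."""
    return sum(values) / len(values)


# Hypothetical population: 100,000 voters, 52% of whom favor a candidate.
population = [True] * 52_000 + [False] * 48_000

# Parameter: a numerical characteristic of the whole population.
parameter = proportion(population)

# Statistic: the same characteristic computed from a sample of 1,000 units.
rng = random.Random(2012)  # fixed seed so the run is repeatable
sample = rng.sample(population, k=1000)
statistic = proportion(sample)

print("parameter:", parameter)
print("statistic:", statistic)
```

The parameter is fixed, while the statistic changes from sample to sample; rerunning with a different seed will generally give a different statistic.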
Definition: Inferential statistics consists of methods of drawing
conclusions about the characteristics of a population based on the
information obtained from a sample selected from the population.
Inferential statistics is divided into the fields of 1) estimation and
2) hypothesis testing.
An example of inferential statistics is the estimation of the
average lifetime for the population of medical lasers based on
data from a small sample of the lasers.
Note: Inferential statistics amounts to making decisions based on
incomplete information.
Note: The particular statistical inferential technique used depends
strongly on the method by which the sample was selected from the
population.
Definition: A representative sample is a sample whose
characteristics reflect the characteristics of the population from
which the sample was selected.
Example 1: Is this sample representative? If the sample was
randomly chosen, then it has a good chance of being
representative of the population.
Example 2: Is this sample representative? What if there were
some cyclically occurring flaw in the manufacturing process
which affected every 50th car produced?
Example 3: Is this sample representative? What if there were
some flaw in the manufacturing process which led to
overfilling of every 100th can?
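The concern raised in Examples 2 and 3 can be illustrated with a small sketch. It assumes a hypothetical production run of 1,000 cars in which every 50th car has a flaw that halves its time to first major repair; the numbers (24 and 12 months) are invented for illustration only.

```python
from statistics import mean


def build_population(size=1000, cycle=50, normal_months=24.0, flawed_months=12.0):
    """Hypothetical months to first major repair for cars numbered 1..size.

    Every `cycle`-th car is assumed to carry the flaw.
    """
    return [flawed_months if car % cycle == 0 else normal_months
            for car in range(1, size + 1)]


def systematic_sample(population, step, start):
    """Take every `step`-th unit, beginning with unit number `start` (1-based)."""
    return population[start - 1::step]


population = build_population()
parameter = mean(population)  # average over all cars

# Sampling every 50th car starting at car 50 lines up exactly with the flaw.
aligned = mean(systematic_sample(population, step=50, start=50))

# Starting at car 7 instead misses every flawed car.
offset = mean(systematic_sample(population, step=50, start=7))

print(parameter, aligned, offset)
```

Under these assumptions neither systematic sample matches the population average: one contains only flawed cars and the other contains none, which is why a sampling interval that coincides with a cycle in the process can produce an unrepresentative sample.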
Note: A primary reason for working with samples instead of entire
populations is that often the populations are too large to handle
easily.
Example 1: All registered voters in Florida in October, 2012.
There are approximately 6,000,000 of them.
Example 2: How easy would it be to actually examine the
entire month's production of cars, following them over time to
see when the first major repair job was required?
Example 3: Would we actually want to weigh the amount of
Pepsi-Cola in every 12-oz. can coming off the assembly line at
a certain factory in 2009?
Whenever we infer a population characteristic based on sample data,
there is always the chance that our inference will be incorrect.
Example: A public opinion poll conducted in 1936 for Literary
Digest Magazine (R.I.P.) predicted that Alf Landon would defeat
Franklin Delano Roosevelt in the Presidential election by a 3 to 2
margin. Actually, F.D.R. won 62% of the ballots. Why was the
prediction so incorrect?
1) The pollsters sent out 10 million sample ballots to
prospective voters, based on the magazine’s subscription list
and on telephone directories. (Poor identification of
population.)
2) Only 2.3 million of the mailed ballots were actually
returned. (Self-selection of sample.)
Note: To do valid statistical inference, we need a sample which is
likely to be representative of the population. We want to build into
our statistical inferential procedures measures of reliability, which
will tell us how likely it is that our inference is correct/incorrect.
These measures of reliability depend on the sampling method used.
For estimation of parameters based on sample statistics, the measure
of reliability is called the confidence level. For testing hypotheses
about parameters based on sample statistics, the measure of
reliability is the significance level.
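One way to see what a confidence level measures is to simulate many samples from a population whose mean is known and count how often an interval estimate captures that mean. The sketch below is only an illustration: the interval formula (sample mean plus or minus a normal quantile times the standard error) is a common large-sample approximation that these notes have not yet introduced, and the population values (mean 100, standard deviation 15) are assumptions.

```python
import random
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for a population mean.

    Uses the normal approximation; for small samples a t-based
    interval would normally be preferred.
    """
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(sample) / n ** 0.5
    center = mean(sample)
    return center - half_width, center + half_width


def coverage(true_mean=100.0, true_sd=15.0, n=50, trials=2000,
             level=0.95, seed=1):
    """Fraction of simulated random samples whose interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = mean_confidence_interval(sample, level)
        if low <= true_mean <= high:
            hits += 1
    return hits / trials


print("observed coverage:", coverage())
```

With random sampling, the observed coverage should land close to the stated level. If the samples were drawn by a biased method, the same formula would no longer deliver that reliability.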
Definition: A simple random sample of size n is a sample drawn
from a population by a method which makes every sample of size n
equally likely to be chosen.
Steps in choosing a SRS of size n:
1) Obtain a list of all members of the population; this list is called a
sampling frame. (Note: This is the most difficult step in the whole
process, and is also error-prone.)
2) Assign a unique ID number to each member of the population.
3) Go to a table of random numbers; choose a convenient starting
point; go down the column, recording numbers within the range of
the assigned ID numbers, until n distinct numbers are selected.
4) The population members that have the ID numbers obtained by
this process make up the SRS of size n.
(Step 3 may also be done using the random number generator on your calculator.)
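The four steps above can be sketched in code, with a pseudorandom number generator standing in for the table of random numbers. The function name `simple_random_sample` and the roster of 30 placeholder names are hypothetical, introduced only for this illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n from a sampling frame.

    frame: list of all population members (step 1).
    Returns a list of (ID number, member) pairs.
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed population size")
    # Step 2: assign a unique ID number (1, 2, ...) to each member.
    ids = range(1, len(frame) + 1)
    # Step 3: pick n distinct IDs; every set of n IDs is equally likely.
    rng = random.Random(seed)
    chosen_ids = sorted(rng.sample(ids, n))
    # Step 4: the members holding those IDs make up the sample.
    return [(i, frame[i - 1]) for i in chosen_ids]


# Hypothetical sampling frame: a class roll of 30 placeholder names.
roster = [f"Student {i}" for i in range(1, 31)]
for id_number, member in simple_random_sample(roster, 3, seed=0):
    print(f"{id_number:03d}", member)
```

Passing a seed makes the draw repeatable, which is convenient for checking work; leaving it as `None` gives a fresh sample on each run.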
Note: We can never be absolutely certain that our sample is
representative, but simple random sampling gives us a good chance.
Example: I want to estimate the average height of the class, without
gathering height data for every person in the class. I will select a
simple random sample of size 3 and use the average height of the
members of the sample as the estimate of the average height of the
members of the class. I assign a unique ID number to each person in
the class; the first person on the class roll will have the ID number
001, the second person 002, etc. I then go to a table of random
numbers, open it, and blindly choose a starting point. Reading down
the column from the starting point, I find 3 distinct three-digit
numbers within the range of the values of the ID numbers. The
class members with these 3 ID numbers constitute the SRS.
Collecting Engineering Data
• Observational study: members of the sample are simply
  observed, during routine operation, with measurements taken
  - To build empirical models
  - Cause-and-effect relationships cannot be confirmed
• Designed experiment: the engineer makes deliberate,
  purposeful changes in controllable variables (called factors),
  and observes the results of these changes
  - Designing and running very efficient experiments
  - Cause-and-effect relationships can be examined, using:
    • Hypothesis testing and parameter estimation
A carefully designed data collection procedure (including the
method of selecting a sample from the population) will usually lead
to interpretable and useful results; a poorly designed data collection
procedure will often lead to worthless data. As R. A. Fisher said,
"Often the only thing you can do with a poorly designed experiment
is to try to find out what it died of."
Example of an Observational Study to Build an Empirical Model¹
The table contains data collected on three variables in an
observational study conducted at a semiconductor manufacturing
plant. In this plant, the semiconductor is wire-bonded to a frame.
The variables are: Pull Strength – the force required to break the
bond; Wire Length; and Die Height. We want to be able to predict
the Pull Strength by knowing the Wire Length and Die Height.
¹ Montgomery, D. C.; Runger, G. C.; and Hubele, N. F. Engineering Statistics, 3rd Edition, John Wiley & Sons, Inc. (2004).
The linear regression model that we want to estimate has the
following form (We will cover linear regression in Chapter 11):
$$\text{Pull Strength} = \beta_0 + \beta_1(\text{Wire Length}) + \beta_2(\text{Die Height}) + \epsilon$$
We wish to: a) test whether this model adequately represents the
relationship between Pull Strength and the two predictors, Wire Length
and Die Height (is the relationship linear?), and b) estimate the
values of the constant term and the coefficients in this equation, so
that we will have a useful model. (The term ε in the equation is a
random error term.)
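As a preview of part b), the sketch below estimates the constant term and the two coefficients by ordinary least squares, a standard method that is assumed here since regression is not covered until Chapter 11. The numbers are made up and are not the data from the study: the pull strengths are generated without noise from assumed coefficients (1, 2, and 0.01) so that the fit can be checked against known values. The function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_two_predictor_model(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2 + error."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Made-up illustrative values (NOT the semiconductor study data).
wire_length = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
die_height = [100.0, 300.0, 200.0, 500.0, 400.0, 600.0]
pull_strength = [1.0 + 2.0 * w + 0.01 * h
                 for w, h in zip(wire_length, die_height)]

b0, b1, b2 = fit_two_predictor_model(wire_length, die_height, pull_strength)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Because the illustrative data contain no random error, the estimates should reproduce the assumed coefficients up to rounding. With real measurements the error term ε would make the estimates vary from sample to sample.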