Statistics Stat 430 Heike Hofmann

An Example
SPAM: irrelevant or inappropriate messages sent on
the Internet to a large number of
newsgroups or users
What makes SPAM?
• Case-by-case decision
• But: if we know some details about an
email, we can determine fairly accurately
whether the email is SPAM
• What information would be helpful to
collect about an email to make a decision?
Your Turn
Data Exploration
• Understanding patterns/structures based
on collected data
• sample: data on a representative subgroup
of the total population
• representativeness allows us to make
generalizations
Statistics
• (Exploration)
• Estimation of Parameters
• Hypothesis Testing
• Predictions
Statistical Summaries
• Let $x_1, x_2, \ldots, x_N$ be observations
• a statistic is a summary of these numbers,
e.g. average, minimum, maximum, range,
quartiles, median
• for categorical values: mode, levels
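
The summaries listed above can be sketched in a few lines of Python using only the standard library. The observations and labels below are made-up values for illustration, not course data, and the "inclusive" quartile method is just one of several common conventions.

```python
import statistics

# Hypothetical observations x_1, ..., x_N (made-up values for illustration)
x = [2, 4, 4, 5, 7, 9, 11]

summary = {
    "average": statistics.mean(x),
    "minimum": min(x),
    "maximum": max(x),
    "range": max(x) - min(x),
    "median": statistics.median(x),
    # quantiles(n=4) returns the three cut points Q1, Q2, Q3;
    # "inclusive" is one convention among several
    "quartiles": statistics.quantiles(x, n=4, method="inclusive"),
}

# Categorical values: the mode is the most frequent level
labels = ["SPAM", "no SPAM", "SPAM", "no SPAM", "SPAM"]
mode = statistics.mode(labels)
levels = sorted(set(labels))

for name, value in summary.items():
    print(f"{name}: {value}")
print(f"mode: {mode}, levels: {levels}")
```

Different software may report slightly different quartiles for the same data, because the interpolation rule is not standardized.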
Graphical Summaries
Barchart
Spinogram
Area shows the number of emails per day; red areas
correspond to SPAM
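
As a rough sketch of the numbers behind such a plot, the counts per day and the SPAM share within each day can be tabulated as follows. The records are hypothetical, and the text bars only stand in for a real graphic.

```python
from collections import Counter

# Hypothetical (day, is_spam) records; made-up values for illustration only
emails = [
    ("Mon", True), ("Mon", False), ("Mon", False),
    ("Tue", True), ("Tue", True),
    ("Wed", False),
]

# Number of emails per day, and number of SPAM emails per day
total_by_day = Counter(day for day, _ in emails)
spam_by_day = Counter(day for day, is_spam in emails if is_spam)

for day, total in total_by_day.items():
    spam = spam_by_day[day]  # a Counter returns 0 for days without SPAM
    # Bar length is the number of emails; '#' marks the SPAM portion
    bar = "#" * spam + "." * (total - spam)
    print(f"{day}: {bar}  ({total} emails, {spam / total:.0%} SPAM)")
```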
Graphical Summaries
Histogram of %capital
Spineplot
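
A histogram groups a numeric variable into bins and counts the observations in each. The sketch below assumes that %capital means the percentage of capital letters among all letters in an email; that reading is an assumption, as are the helper names, the bin width, and the sample messages.

```python
def percent_capital(text: str) -> float:
    """Percentage of capital letters among all letters (assumed definition)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return 100.0 * sum(c.isupper() for c in letters) / len(letters)


def histogram_counts(values, bin_width=25.0, upper=100.0):
    """Count values in equal-width bins covering [0, upper]."""
    n_bins = int(upper / bin_width)
    counts = [0] * n_bins
    for v in values:
        # A value equal to the upper bound goes into the last bin
        idx = min(int(v // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical messages, made up for illustration
messages = ["WIN CASH NOW", "Meeting at noon", "FREE offer inside"]
values = [percent_capital(m) for m in messages]
counts = histogram_counts(values)

for i, count in enumerate(counts):
    print(f"{i * 25:3d}-{(i + 1) * 25:3d}%: {'*' * count}")
```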
Graphical Summaries
Scatterplot