Statistics
Stat 430, Heike Hofmann

An Example
SPAM: irrelevant or inappropriate messages sent over the Internet to a large number of newsgroups or users.

What makes SPAM?
• Case-by-case decision
• But: if we know some details about an email, we can determine fairly accurately whether the email is SPAM
• What information would be helpful to collect about an email to make this decision?

Your Turn

Data Exploration
• Understanding patterns/structures based on collected data
• Sample: data on a representative subgroup of the total population
• Representativeness allows us to generalize from the sample to the population

Statistics
• (Exploration)
• Estimation of parameters
• Hypothesis testing
• Prediction

Statistical Summaries
• Let $x_1, x_2, \ldots, x_N$ be observations
• A statistic is a summary of these numbers, e.g. average, minimum, maximum, range, quartiles, median
• For categorical values: mode, levels

Graphical Summaries
[Figure: barchart and spinogram of emails by day; area represents the number of emails, red areas correspond to SPAM]

Graphical Summaries
[Figure: histogram and spineplot of %capital]

Graphical Summaries
[Figure: scatterplot]
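The summaries listed under Statistical Summaries, and the per-day counts and SPAM shares that the barchart and spinogram display, can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function names and the seven-email data set are hypothetical, and the quartiles use the "inclusive" (linear interpolation) convention of Python's `statistics.quantiles`; other quartile conventions can give slightly different values.

```python
from collections import Counter
from statistics import mean, median, quantiles


def numeric_summary(xs):
    """Summary statistics for numeric observations x_1, ..., x_N."""
    xs = sorted(xs)
    # Quartiles by linear interpolation ("inclusive" method); other
    # quartile conventions may differ slightly.
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    return {
        "mean": mean(xs),
        "min": xs[0],
        "max": xs[-1],
        "range": xs[-1] - xs[0],
        "q1": q1,
        "median": median(xs),
        "q3": q3,
    }


def categorical_summary(values):
    """Levels, counts per level, and mode of a categorical variable."""
    counts = Counter(values)
    # On ties, most_common keeps the level encountered first.
    mode_level, _ = counts.most_common(1)[0]
    return {"levels": sorted(counts), "counts": dict(counts), "mode": mode_level}


def spam_share_by_level(levels, is_spam):
    """Number of emails and fraction of SPAM per level (e.g. per day)."""
    totals = Counter(levels)
    spam_counts = Counter(lv for lv, s in zip(levels, is_spam) if s)
    return {lv: (totals[lv], spam_counts[lv] / totals[lv]) for lv in sorted(totals)}


# Hypothetical data for 7 emails (not taken from the course data set).
pct_capital = [2.0, 5.5, 1.0, 12.0, 3.5, 8.0, 4.0]
day = ["Mon", "Tue", "Mon", "Wed", "Mon", "Tue", "Fri"]
spam = [False, True, False, True, False, False, True]

print(numeric_summary(pct_capital))
print(categorical_summary(day))
print(spam_share_by_level(day, spam))
```

In a barchart the count per day sets the bar size; in a spine plot it sets the bar width, and the SPAM fraction sets the height of the red part of each bar.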