Introduction to Statistics, Lecture 1

Course 02402
Introduction to Statistics
Lecture 1: Introduction to Statistics
Per Bruun Brockhoff
DTU Informatics
Building 305 - room 110
Danish Technical University
2800 Lyngby – Denmark
e-mail: pbb@imm.dtu.dk
Spring 2013

Agenda
1. Practical Information
2. Introduction to Statistics
3. Descriptive Statistics: Summary Statistics
4. Software: R
Practical Information
Teaching module: Tuesdays 13.00-17.00
Homepage: 02402.imm.dtu.dk
  Note about software R
  Syllabus, lecture plan
  Exercises & solutions
  Slides
  Podcasts of lectures (in English AND Danish)
  Quizzes
Exam: 4-hour multiple choice
Campusnet: www.campusnet.dtu.dk
  Messages and (certain) file sharing
Generic weekly agenda:
  BEFORE teaching module: Read the announced material
  2-hour lecture (curriculum of the week)
  2 hours of exercises (mix of: book, Rnote, online quiz questions)
  AFTER teaching module: Test yourself with the online exam quiz
Introduction to Statistics
Statistics and Engineers
How to treat (or analyse) data?
What is random variation?
Statistics is a tool for making decisions:
How many computers did we sell last year?
What is the expected price of a share?
Is machine A more effective than machine B?
Statistics can be used in most disciplines and is therefore a very important tool
Statistics can be used
Statistics is an important tool in problem solving:
Data analysis
Quality improvement
Design of experiments
Predictions of future values
... and much more!
Statistics
Modern statistics is based on probability theory and descriptive statistics.
Statistics is often about analyzing a sample taken from a population.
Based on the sample, we try to generalize to (or comment on) the population.
Therefore it is important that the sample is representative of the population.
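The idea of a representative sample can be illustrated with a small simulation. This is only a sketch in Python (the course software is R): the population of 10,000 simulated heights, the sample size of 50 and the seed are all made-up assumptions, not data from the course. It compares the mean of a random sample with the mean of a deliberately unrepresentative sample.

```python
import random

# Fixed seed (arbitrary choice) so the sketch gives the same numbers on every run.
rng = random.Random(2013)

# Hypothetical population: 10,000 simulated heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# A random sample: every member of the population is equally likely to be drawn.
sample = rng.sample(population, 50)

# A non-representative sample: only the 50 tallest members.
biased_sample = sorted(population)[-50:]

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
biased_mean = sum(biased_sample) / len(biased_sample)

print(f"population mean:    {pop_mean:.1f}")
print(f"random sample mean: {sample_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
```

The random sample mean should land close to the population mean, while the biased sample overestimates it. That is why conclusions about a population are only trustworthy when the sample is representative.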
Descriptive Statistics: Summary Statistics
Chapter 2: Summary statistics
We use a number of summary statistics to summarize and describe data (stochastic variables):
Mean x̄
Median
Variance s²
Standard deviation s
Percentiles
Mean
The mean value is a key number that indicates the centre of gravity or centering of the data.
The mean:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
We say that x̄ is an estimate of the mean value.
Median
The median is also a key number, indicating the center of the data. In some cases, for example in the case of extreme values, the median is preferable to the mean.
Median:
The observation in the middle (in sorted order)
Variance and standard deviation
The variance (or the standard deviation) indicates the spread of the data:
Variance:
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
Standard deviation:
\[ s = \sqrt{s^2} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
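The mean, median, variance and standard deviation can be computed directly from their definitions. The sketch below uses Python purely for illustration (the course itself uses R), and the data values are invented. For an even number of observations there is no single middle observation; the sketch assumes the common convention of averaging the two middle ones.

```python
import math

def mean(xs):
    # x-bar = (1/n) * sum of the observations
    return sum(xs) / len(xs)

def median(xs):
    # Middle observation in sorted order; for even n, the average of the
    # two middle observations (a common convention, assumed here).
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def variance(xs):
    # s^2 = 1/(n-1) * sum of squared deviations from the mean
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def std_dev(xs):
    # s = square root of the variance
    return math.sqrt(variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
with_outlier = data + [100]       # the same data plus one extreme value

print(mean(data), median(data), variance(data), std_dev(data))
print(mean(with_outlier), median(with_outlier))
```

Adding the single extreme value 100 pulls the mean from 5 up to about 15.6, while the median only moves from 4.5 to 5. This is the situation in which the median is preferable to the mean.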
The coefficient of variation
The standard deviation and the variance are key numbers for absolute variation. To compare variation between different data sets, it can be useful to use a relative key number, the coefficient of variation:
\[ V = \frac{s}{\bar{x}} \cdot 100 \]
Percentiles
The median is the point that divides the data into two halves. It is of course possible to find other points that divide the data into other parts; these are called percentiles.
Commonly calculated percentiles are
0, 25, 50, 75, 100 % percentiles (quartiles)
and/or
0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 % percentiles
Note: the 50% percentile is the median
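The coefficient of variation and the percentiles can be sketched in the same way. This is a Python illustration with invented data (the course itself uses R). Several percentile definitions are in use; the `percentile` helper below is a hypothetical function that assumes linear interpolation between neighbouring sorted observations, so other definitions, possibly including the textbook's, can give slightly different values.

```python
import math

def coefficient_of_variation(xs):
    # V = s / x-bar * 100, with s the sample standard deviation (n - 1 divisor)
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return s / m * 100

def percentile(xs, p):
    # p is given in percent (0 to 100).
    # Assumed convention: linear interpolation between sorted observations.
    s = sorted(xs)
    h = (len(s) - 1) * p / 100      # fractional position in the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)    # stay inside the list when p = 100
    return s[lo] + (h - lo) * (s[hi] - s[lo])

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical observations

quartiles = [percentile(data, p) for p in (0, 25, 50, 75, 100)]
print(quartiles)                    # minimum, quartiles and maximum
print(coefficient_of_variation(data))
```

Under this convention the 50% percentile of the example data is 4.5, which agrees with the median when the two middle observations are averaged.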
Figures/Tables
Quantitative data:
  Scatter plot (xy plot)
  Histogram
  Cumulative distribution
  Boxplots
Count data:
  Bar charts (Pareto diagram)
  Pie charts
Software: R
Appendix C in the textbook (7th and 8th editions): Description of R.
R Commander: a graphical user interface.
R exercise today.
You can run R from the G-bar at home via Thinlinc.
R can (easily) be installed on your own computer (see Rnote).
Next week:
Discrete distributions - chapter 4.