Review Topic 6 PowerPoint III

IB Math Studies – Topic 6
IB Course Guide Description
Describing Data
Types of Data
• Categorical Data – Describes a particular quality or characteristic. It can be
divided into categories.
– e.g. color of eyes or types of ice cream
• Quantitative Data – Contains a numerical value. The information collected is
termed numerical data.
– Discrete – Takes exact number values and is often the result of counting.
• e.g. number of TVs or number of houses on a street
– Continuous – Takes numerical values within a certain range and is often a
result of measuring.
• e.g. the height of seniors or the weight of freshmen
Types of Distribution
Symmetric Distribution
Positively Skewed Distribution
Negatively Skewed Distribution
Example 1: Describing Data
24 families were surveyed to find the number of people in the family.
The results are:
5, 9, 4, 4, 4, 5, 3, 4, 6, 8, 8, 5, 7, 6, 6, 8, 6, 9, 10, 7, 3, 5, 6, 6
a) Is this data discrete or continuous?
b) Construct a frequency table for the data.
c) Display the data using a column graph.
d) Describe the shape of the distribution. Are there any outliers?
e) What percentage of families have 5 or fewer people in them?
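As a rough check on the frequency table and the percentage question, the survey results can be tallied with a short Python sketch. The variable names are illustrative only, and the code is not part of the original slides.

```python
from collections import Counter

# Family sizes from Example 1 (24 families).
sizes = [5, 9, 4, 4, 4, 5, 3, 4, 6, 8, 8, 5, 7, 6, 6, 8, 6, 9, 10, 7, 3, 5, 6, 6]

# Frequency table: family size -> number of families.
frequency = Counter(sizes)
for size in sorted(frequency):
    print(f"{size:>2} people: {frequency[size]} families")

# Percentage of families with 5 or fewer people.
five_or_fewer = sum(count for size, count in frequency.items() if size <= 5)
percentage = 100 * five_or_fewer / len(sizes)
print(f"{percentage:.1f}% of families have 5 or fewer people")
```

The counts are whole numbers that come from counting, which is what makes this data discrete.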
Standard Deviation Formula
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
• $x$ is any score
• $\bar{x}$ is the mean
• $n$ is the number of scores
Calculate the standard deviation
Values    $x - \bar{x}$    $(x - \bar{x})^2$
2
4
5
5
6
6
7
Total: 35

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
• Calculate the mean
• Subtract the mean from each
value
• Square these
• Add them
• Divide by n
• Take the square root
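The six steps above can be followed line by line in a short Python sketch, using the values from the table (2, 4, 5, 5, 6, 6, 7). The variable names are illustrative, and the division by n follows the formula on the slide.

```python
import math

values = [2, 4, 5, 5, 6, 6, 7]  # values from the slide's table

n = len(values)
mean = sum(values) / n                    # step 1: calculate the mean
deviations = [x - mean for x in values]   # step 2: subtract the mean from each value
squared = [d ** 2 for d in deviations]    # step 3: square these
total = sum(squared)                      # step 4: add them
variance = total / n                      # step 5: divide by n
std_dev = math.sqrt(variance)             # step 6: take the square root

print(f"mean = {mean}")                   # mean = 5.0
print(f"sum of squares = {total}")        # sum of squares = 16.0
print(f"standard deviation = {std_dev:.3f}")  # standard deviation = 1.512
```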
Standard Deviation on the GDC
• Type data in List 1
• 1-Var Stats L1
mean = $\bar{x}$
standard deviation = $\sigma_x$
• On paper you’ll see ‘s’ being used to stand for standard deviation.
• But you should use the σ value from the calculator.
Measuring the Spread of Data
• The median is the second quartile, Q2, or 50th percentile.
• The lower quartile, Q1, is the median of the lower half of the data, or 25th percentile.
• The upper quartile, Q3, is the median of the upper half of the data, or 75th percentile.
• The inter-quartile range is the difference between the upper quartile and the lower quartile.
IQR = Q3 – Q1
Box Plots
• The inter-quartile range is the width of the box.
• The maximum length of each whisker is 1.5 times the inter-quartile range.
• Any data value that lies more than 1.5 × IQR above Q3 or below Q1 is marked as an outlier.
To Create a Box-and-Whisker Plot:
1) Make a number line.
2) Create the box between Q1 and Q3.
3) Draw in Q2.
4) Determine any outliers:
• Upper boundary = Q3 + 1.5(IQR)
• Lower boundary = Q1 – 1.5(IQR)
5) Plot any outliers.
6) Extend the whiskers to the maximum & minimum (provided they’re not outliers).
Example: Box and Whisker Plots
A hospital is trialing a new anesthetic drug and has collected data on how long the
new and old drugs take before the patient becomes unconscious. They wish to
know which drug acts faster and which is more reliable.
Old drug times:
8, 12, 9, 8, 16, 10, 14, 7, 5, 21, 13, 10, 8, 10, 11, 8, 11, 9, 11, 14
New drug times:
8, 12, 7, 8, 12, 11, 9, 8, 10, 8, 10, 9, 12, 8, 8, 7, 10, 7, 9, 9
Prepare a parallel box plot for the data sets and use it to compare
the two drugs for speed and reliability.
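The numbers needed for the parallel box plot can be worked out with a sketch like the one below. It assumes the median-of-halves rule for quartiles (the middle value is left out of both halves when the count is odd) and treats the "10 11" in the old drug list as two separate readings, 10 and 11. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Quartiles, IQR and outliers using the median-of-halves rule (an assumption)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # lower half (middle value excluded if n is odd)
    upper = ordered[n - half:]      # upper half
    q1, q2, q3 = median(lower), median(ordered), median(upper)
    iqr = q3 - q1
    low_boundary = q1 - 1.5 * iqr   # lower boundary = Q1 - 1.5(IQR)
    high_boundary = q3 + 1.5 * iqr  # upper boundary = Q3 + 1.5(IQR)
    outliers = [x for x in ordered if x < low_boundary or x > high_boundary]
    return {"min": ordered[0], "q1": q1, "median": q2, "q3": q3,
            "max": ordered[-1], "iqr": iqr, "outliers": outliers}

# Old drug times; "10 11" in the source is assumed to be the two values 10 and 11.
old = [8, 12, 9, 8, 16, 10, 14, 7, 5, 21, 13, 10, 8, 10, 11, 8, 11, 9, 11, 14]
new = [8, 12, 7, 8, 12, 11, 9, 8, 10, 8, 10, 9, 12, 8, 8, 7, 10, 7, 9, 9]

for name, times in (("old", old), ("new", new)):
    print(name, five_number_summary(times))
```

Under these assumptions the new drug has the lower median (9 against 10) and the smaller IQR (2 against 4.5), and the only outlier is the old drug's 21, which suggests the new drug is both faster and more reliable.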
FORMULA
Pearson’s Correlation Coefficient: $r = \frac{s_{xy}}{s_x s_y}$
Correlation Coefficient on the GDC
• Turn on your Diagnostics
• Enter the data in L1 and L2
• LinReg L1, L2
Example 1: Correlation Coefficient
In an experiment a vertical spring was fixed at its upper end. It was
stretched by hanging different weights on its lower end. The length of
the spring was then measured. The following readings were obtained.
Load (kg), x      0     1     2     3     4     5     6     7     8
Length (cm), y    23.5  25    26.5  27    28.5  31.5  34.5  36    37.5
(b) (i) Write down the mean value of the load, $\bar{x}$.
(ii) Write down the standard deviation of the load.
(iii) Write down the mean value of the length, $\bar{y}$.
(iv) Write down the standard deviation of the length.
It is given that the covariance $s_{xy}$ is 12.17.
(d) (i) Write down the correlation coefficient, r, for these readings.
(ii) Comment on this result.
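The GDC answers for this example can be cross-checked by hand with a short Python sketch. It assumes the standard definition of Pearson's coefficient, r = covariance / (standard deviation of x × standard deviation of y), with every quantity divided by n. The variable names are illustrative.

```python
import math

load = [0, 1, 2, 3, 4, 5, 6, 7, 8]                          # x, in kg
length = [23.5, 25, 26.5, 27, 28.5, 31.5, 34.5, 36, 37.5]   # y, in cm

n = len(load)
mean_x = sum(load) / n
mean_y = sum(length) / n

# Standard deviations, dividing by n as in the standard deviation formula.
s_x = math.sqrt(sum((x - mean_x) ** 2 for x in load) / n)
s_y = math.sqrt(sum((y - mean_y) ** 2 for y in length) / n)

# Covariance: the mean of the products of the deviations.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(load, length)) / n

r = s_xy / (s_x * s_y)

print(f"mean x = {mean_x}, mean y = {mean_y}")   # mean x = 4.0, mean y = 30.0
print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}")       # s_x = 2.582, s_y = 4.778
print(f"s_xy = {s_xy:.2f}, r = {r:.3f}")         # s_xy = 12.17, r = 0.986
```

The computed covariance agrees with the given value of 12.17, and r close to 1 indicates a strong positive linear correlation between load and length.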
Example 2: Correlation Coefficient
Average speed in the metropolitan area and age of
drivers
The r-value for this
association is 0.027. Describe
the association.
Drawing the Line of Best Fit
1. Calculate the mean of the x values, $\bar{x}$, and the mean of the y values, $\bar{y}$.
2. Mark the mean point $(\bar{x}, \bar{y})$ on the scatter plot.
3. Draw a line through the mean point that passes through the middle of the data
– equal number of points above and below the line
Least Squares Regression Line
• Consider the set of points below.
• Measure the vertical distance from each point to the line, square the distances and find their sum.
• We want that sum to be as small as possible.
• The regression line is used for prediction purposes.
• The regression line is less reliable when extended far beyond the region of the data.
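To illustrate the idea of making the sum of squared distances small, here is a sketch that fits a least squares line to the spring readings from Example 1: Correlation Coefficient. It assumes the usual closed-form solution for a line y = ax + b. The helper `sum_of_squares` is hypothetical and only there to compare candidate lines.

```python
load = [0, 1, 2, 3, 4, 5, 6, 7, 8]                          # x, in kg
length = [23.5, 25, 26.5, 27, 28.5, 31.5, 34.5, 36, 37.5]   # y, in cm

n = len(load)
mean_x = sum(load) / n
mean_y = sum(length) / n

# Least squares gradient and intercept for y = a*x + b.
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(load, length))
     / sum((x - mean_x) ** 2 for x in load))
b = mean_y - a * mean_x   # forces the line through the mean point

def sum_of_squares(gradient, intercept):
    """Sum of squared vertical distances from the points to a candidate line."""
    return sum((y - (gradient * x + intercept)) ** 2 for x, y in zip(load, length))

print(f"y = {a:.3f}x + {b:.3f}")   # y = 1.825x + 22.700
print(sum_of_squares(a, b))        # smallest possible sum
print(sum_of_squares(a + 0.1, b))  # a slightly different line gives a larger sum
```

The fitted line passes through the mean point (4, 30), which matches the hand method of drawing the line of best fit through the mean point.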
Line of Regression using GDC
• LinReg(ax+b) L1, L2
• where L1 contains your independent data
• and L2 contains your dependent data
Example 3: Line of Regression
The table shows the annual income and average weekly grocery
bill for a selection of families
a) Construct a scatter plot to illustrate the data.
b) Use technology to find the line of best fit.
c) Estimate the weekly grocery bill for a family
with an annual income of £95000.
Comment on whether this estimate is likely to be reliable.
χ² Test of Independence
• The variables may be dependent:
– Females may be more likely to exercise regularly than males.
• The variables may be independent:
– Gender has no effect on whether they exercise regularly.
A chi-squared test is used to determine whether two variables from the same sample are independent.
How to do it:
1) Write the null hypothesis (H0) and the alternative hypothesis (H1).
2) Create contingency tables for observed and expected values.
3) Calculate the chi-squared statistic and degrees of freedom.
4) Find the chi-squared critical value (booklet).
• Depends on the level of significance (p) and the degrees of freedom (v).
5) Determine whether or not to accept the null hypothesis.
Contingency Tables

Observed Frequencies
          Column1        Column2        Totals
Row1      a              b              sum row1
Row2      c              d              sum row2
Totals    sum column1    sum column2    total

Expected Frequencies
          Column1                          Column2                          Totals
Row1      (row1sum × column1sum) / total   (row1sum × column2sum) / total   sum row1
Row2      (row2sum × column1sum) / total   (row2sum × column2sum) / total   sum row2
Totals    sum column1                      sum column2                      total
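The expected-frequency rule (row total × column total ÷ grand total) can be sketched in Python as follows. The observed counts are made up purely for illustration and do not come from the slides. The function name `expected_frequencies` is hypothetical.

```python
def expected_frequencies(observed):
    """Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in column_totals] for r in row_totals]

# Hypothetical observed counts (invented for this sketch):
# rows = female, male; columns = exercises regularly, does not.
observed = [[30, 10],
            [20, 40]]

expected = expected_frequencies(observed)
print(expected)  # [[20.0, 20.0], [30.0, 30.0]]
```

The expected table keeps the same row and column totals as the observed table, which is a quick way to check the arithmetic.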
χ² Statistic on the GDC
$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$
where $f_o$ is an observed frequency and $f_e$ is the corresponding expected frequency.
On the calculator:
Put your contingency table in matrix A
STAT → TESTS → C: χ² Test
Observed: [A]
Expected: [B] (the calculator stores the expected values here)
Calculate
Output:
χ² = χ² calculated value
df = degrees of freedom
Matrix B = expected values
Find the Critical Value
• Get this from the formula booklet.
• Significance level (p) is always given in the problem.
– A 5% significance level = 95% confidence level
• Degrees of freedom: v = (c – 1)(r – 1)
where c = number of columns in table
and r = number of rows in table
Accepting the Null Hypothesis
If $\chi^2_{calc}$ < Critical Value
ACCEPT the null hypothesis
If $\chi^2_{calc}$ > Critical Value
REJECT the null hypothesis
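The whole procedure, from the statistic through to the decision, might look like the sketch below. The observed counts are invented for illustration, the function name `chi_squared_test` is hypothetical, and 3.841 is the standard critical value for 1 degree of freedom at the 5% significance level. How to treat a statistic exactly equal to the critical value is an assumption here (it is counted as "accept").

```python
def chi_squared_test(observed, critical_value):
    """Return (chi-squared statistic, degrees of freedom, decision)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    total = sum(row_totals)

    # Sum of (observed - expected)^2 / expected over every cell.
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * column_totals[j] / total
            chi2 += (f_o - f_e) ** 2 / f_e

    df = (len(column_totals) - 1) * (len(row_totals) - 1)  # v = (c - 1)(r - 1)
    decision = "reject H0" if chi2 > critical_value else "accept H0"
    return chi2, df, decision

# Hypothetical observed counts (invented for this sketch):
# rows = female, male; columns = exercises regularly, does not.
observed = [[30, 10],
            [20, 40]]

chi2, df, decision = chi_squared_test(observed, critical_value=3.841)
print(f"chi-squared = {chi2:.2f}, df = {df}, decision: {decision}")
# chi-squared = 16.67, df = 1, decision: reject H0
```

For these made-up counts the statistic is well above the critical value, so the null hypothesis of independence would be rejected at the 5% level.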
Important IB Notes:
• In examinations, the value of sxy will be given if required.
• sx represents the standard deviation of the variable X.
• sxy represents the covariance of the variables X and Y.
• A GDC can be used to calculate r when raw data is given.
• For the EXAM, students do NOT need to know how to find the covariance.
• But for their project, if they’re doing regression, they DO need to find the covariance by hand so they can calculate r by hand and get points for using a sophisticated math process.