Uploaded by Madlad Rodriguez

AP Statistics

Symbols/Equations
❖ x̄ - Sample mean
❖ μ - Population mean
❖ Σ - Summation
❖ s² - Sample Variance
❖ σ² - Population Variance
❖ S - Sample Standard Deviation
❖ z - Z-score
❖ ŷ - Y-hat (predicted value)
❖ R (Correlation Coefficient)
❖ R^2 (Coefficient of Determination)
❖ Standard deviation of Residuals (Root Mean Square Deviation, RMSD)
❖ SE - Standard Error
*Excel Functions for equations
● Mean
○ ⇒ =AVERAGE()
● Standard Dev.
○ ⇒ =STDEV()
● Median
○ ⇒ =MEDIAN()
● Interquartile Range
○ ⇒ =QUARTILE(range, 3) - QUARTILE(range, 1)
● R - Correlation Coefficient
○ ⇒ =CORREL()
● R^2 → Coefficient of determination
○ ⇒ =RSQ()
*Logarithms
Log(x) - no base written
● Assumed to be Log Base 10
Unit 1 - Exploring categorical data
Marginal Distribution
● Marginal distributions are totals for each row OR column in a two-way table (or joint
distribution table), showing the distribution of one variable on its own.
Conditional Distribution
● Conditional distributions show the distribution of one variable GIVEN a condition on the
other. They're usually given in percentages.
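A small sketch of both ideas in Python. The two-way table (grade level vs. preferred snack) and all of its counts are made up for illustration; they are not from the notes.

```python
# Hypothetical two-way table: rows = grade level, columns = preferred snack.
table = {
    "Junior": {"Chips": 20, "Fruit": 30},
    "Senior": {"Chips": 35, "Fruit": 15},
}

# Marginal distribution of grade level: the row totals.
row_totals = {row: sum(cols.values()) for row, cols in table.items()}

# Marginal distribution of snack: the column totals.
col_totals = {}
for cols in table.values():
    for col, count in cols.items():
        col_totals[col] = col_totals.get(col, 0) + count

# Conditional distribution of snack GIVEN grade = "Senior", in percent.
senior_pct = {
    col: 100 * count / row_totals["Senior"]
    for col, count in table["Senior"].items()
}

print(row_totals)   # {'Junior': 50, 'Senior': 50}
print(col_totals)   # {'Chips': 55, 'Fruit': 45}
print(senior_pct)   # {'Chips': 70.0, 'Fruit': 30.0}
```

The marginal totals ignore the other variable entirely, while the conditional percentages only look inside one row (here, Seniors).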
Classifying shapes of distributions
Unit 3 - Summary Statistics
Interquartile Range
● What is it : statistical measure of the range of the middle 50% of
values in a dataset
● Why use it : for data that is not normally distributed (skewed or with
outliers), or when the sample size is small, the IQR is a better measure of
spread than the standard deviation, as it is not as influenced by extreme
values.
● Equation : Interquartile Range (IQR) = Q3 - Q1
○ How to find Q3 and Q1 : Q1 is the median of the values below the overall
median, and Q3 is the median of the values above it.
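The "median of each side" method can be sketched in Python like this. The function name `quartiles` and the data are made up for illustration. Note that software (including Excel) often interpolates instead, so its quartiles can differ slightly from this by-hand method.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When the number of values is odd, the overall median itself is
    left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values below the median position
    upper = data[len(data) - half:]   # values above the median position
    return median(lower), median(upper)

scores = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # made-up data, overall median = 7
q1, q3 = quartiles(scores)
iqr = q3 - q1
print(q1, q3, iqr)  # 3.5 11.0 7.5
```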
Sample Variance
● What is it : Sample variance quantifies the dispersion or spread of data
points in a sample around the mean.
● A high variance indicates that the data points are spread out over a wide
range of values.
● A low variance suggests that the data points are clustered closely around
the mean.
Sample Standard Deviation
● What is it : measures the dispersion of a dataset relative to its mean, in the
same unit as the original data (it is the square root of the variance)
● Low S.D = low variance = little fluctuation in data values (could be
good or bad depending on context)
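As a rough check on the definitions, here is sample variance and sample standard deviation computed by hand and compared against Python's built-in `statistics` functions. The data list is made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
n = len(data)
x_bar = mean(data)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
s_squared = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Sample standard deviation: square root of the sample variance,
# which puts the spread back in the same units as the data.
s = sqrt(s_squared)

print(round(s_squared, 4), round(s, 4))                    # by hand
print(round(variance(data), 4), round(stdev(data), 4))     # built-in
```

Both lines should print the same pair of numbers, since `statistics.variance` and `statistics.stdev` also use the n - 1 divisor.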
Excel functions: Mean, Standard Deviation, IQR, Median
● Mean ⇒ =AVERAGE()
● Standard Dev. ⇒ =STDEV()
● Median ⇒ =MEDIAN()
● Interquartile Range ⇒ =QUARTILE(range, 3) - QUARTILE(range, 1)
Unit 4 - One-variable quantitative data
z - score
● What is it: The number of standard deviations a specific data point is from the mean
○ Z-score is 0 if the value is equal to the mean
○ Z-score is positive if the value is > the mean
○ Z-score is negative if the value is < the mean
○ Ex ) z-score of -1.5 ⇒ the value is 1.5 S.D away from the mean, to
the left (below it)
● Why? : Tells you how usual / unusual a data value is in your dataset b/c it tells you how
far away from the mean it is.
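In formula form this is z = (value - mean) / standard deviation. A minimal Python sketch, where the test-score numbers (mean 80, S.D 10) are made up for illustration:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical test scores with mean 80 and standard deviation 10.
print(z_score(80, 80, 10))  # 0.0  (equal to the mean)
print(z_score(95, 80, 10))  # 1.5  (above the mean)
print(z_score(65, 80, 10))  # -1.5 (1.5 S.D to the left of the mean)
```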
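Density Curve / Normal Distribution
● The total area under a density curve is 1 (100% of the data); the area over an interval is
the proportion of values that fall in that interval.
● For a flat (rectangular) density curve, the area is basically length x width.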
Empirical rule
● Also called the 68–95–99.7 rule
● In a normal distribution, 68%, 95%, and 99.7% of the values lie within one, two, and three
standard deviations of the mean, respectively.
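The three percentages can be checked against the standard normal curve with Python's `statistics.NormalDist` (a sketch; the variable names are just for illustration):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k standard deviations of the mean:
# area to the left of +k minus area to the left of -k.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in within.items():
    print(f"within {k} S.D: {share:.4f}")
# within 1 S.D: 0.6827
# within 2 S.D: 0.9545
# within 3 S.D: 0.9973
```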
Using z - score Tables
● Why / when to use one : when you have a data point that you want to compare against
the normal distribution
● Step 1 ⇒ Find the z-score of that data point
● Step 2 ⇒ Open the z-score table and find the value corresponding to that z-score
● Step 3 ⇒ This value is the proportion of the normal distribution below the data point,
i.e. its percentile.
○ Ex ) a table value of 0.7157 for Darnell's height means Darnell is taller than 71.57%
of all students.
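A z-table lookup is the same thing as evaluating the standard normal CDF, so the steps can be sketched in Python. The z-score of 0.57 below is an assumption, chosen only because its table value matches the 71.57% figure; the notes do not give Darnell's actual z-score.

```python
from statistics import NormalDist

def percentile_from_z(z):
    """Proportion of a normal distribution that lies below the z-score."""
    return NormalDist().cdf(z)  # NormalDist() defaults to mean 0, S.D 1

z = 0.57  # assumed z-score, not taken from the notes
print(round(percentile_from_z(z), 4))  # 0.7157
```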
Find z-score for a percentile
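Going the other way (percentile to z-score) is a reverse table lookup: find the proportion in the body of the table and read off the z-score. A sketch using `NormalDist.inv_cdf`, with the 90th percentile picked as an arbitrary example:

```python
from statistics import NormalDist

def z_from_percentile(p):
    """z-score that has proportion p of the normal distribution below it."""
    return NormalDist().inv_cdf(p)

print(round(z_from_percentile(0.90), 2))  # 1.28
```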
Unit 5 - Two-variable quantitative data
R - Correlation Coefficient
● What is it : statistical measure of the strength and direction of a linear relationship
between 2 variables
○ Value ranges from -1 to 1
○ -1 = perfect negative linear correlation
○ 0 = no linear correlation
○ 1 = perfect positive linear correlation
● Formula is basically saying: multiply each (x, y) point's x z-score by its y z-score, add
up those products, then divide by n - 1.
○ Nobody does this by hand, use software / Excel
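For reference, here is roughly what the software is doing, written out in Python: the products of the paired z-scores are added up and divided by n - 1. The function name `correlation` and the hours/scores data are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient as the sum of z-score products over n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(products) / (n - 1)

hours = [1, 2, 3, 4, 5]    # made-up x values
scores = [2, 4, 5, 4, 5]   # made-up y values
r = correlation(hours, scores)
print(round(r, 4))  # 0.7746
```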
Residuals & Least-Squares regression
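Residual
● What is it : the difference between the actual value and the estimate (predicted value)
● Equation : Residual = y data point - (the value the linear regression equation predicts)
○ Residual = actual - predicted
Least-Squares Regression
● Basically just choosing the best linear regression model: the line that makes the sum of
the squared residuals as small as possible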
Equation of Regression line
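The usual form is ŷ = a + bx, with slope b = R x (S of y / S of x) and intercept a = ȳ - b x̄. The sketch below uses the equivalent sums-of-deviations form of the slope and then computes the residuals; the function name and the data are made up for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx                  # same value as r * s_y / s_x
    intercept = y_bar - slope * x_bar  # line passes through (x_bar, y_bar)
    return intercept, slope

xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]
intercept, slope = least_squares_line(xs, ys)

# Residual = actual - predicted, for each data point.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(round(intercept, 2), round(slope, 2))  # 2.2 0.6
print([round(e, 2) for e in residuals])      # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

The residuals of a least-squares line always add up to (essentially) zero, which is a quick sanity check.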
R^2 (Also known as Coefficient of Determination)
● What is it : It shows how much of the variation in the dependent variable is explained by
its linear relationship with the independent variable (it is R squared)
● Range: a number between 0 and 1.
○ The closer the value is to 0, the less of the variation the regression explains.
○ The closer the value is to 1, the more of the variation the regression explains.
● Watch out for : high correlation does not assure significance.
○ Correlation does not imply causation either.
● Example :
○ R^2 = 0.60032 ⇒ 60.032% of the variation in study time can be explained by the
regression on caffeine intake
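One way to see the "variation explained" idea is R^2 = 1 - (sum of squared residuals) / (total sum of squares around ȳ). A Python sketch; the data and the line ŷ = 2.2 + 0.6x are made up for illustration and are unrelated to the caffeine example.

```python
from statistics import mean

def r_squared(ys, predictions):
    """Share of the variation in ys explained by the predictions."""
    y_bar = mean(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                   # total
    return 1 - sse / sst

ys = [2, 4, 5, 4, 5]                       # made-up actual values
predictions = [2.8, 3.4, 4.0, 4.6, 5.2]    # y-hat = 2.2 + 0.6x at x = 1..5
r2 = r_squared(ys, predictions)
print(round(r2, 2))  # 0.6
```

Here 60% of the variation in y is explained by the regression on x.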
Standard deviation of Residuals (Root Mean Square Deviation,
RMSD)
● What is it : the typical size of a residual, i.e. roughly the average distance between
the observed values and the values predicted by the regression line.
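A sketch of the calculation in Python. The divisor is an assumption: the AP formula sheet divides by n - 2 for a regression line, while some sources divide by n, so it is left as a parameter (`df_lost`, a made-up name). The data and predictions are also made up.

```python
from math import sqrt

def residual_sd(ys, predictions, df_lost=2):
    """Square root of (sum of squared residuals / (n - df_lost)).

    df_lost=2 gives the n - 2 divisor (assumed AP convention);
    df_lost=0 gives a plain root mean square of the residuals.
    """
    residuals = [y - p for y, p in zip(ys, predictions)]
    return sqrt(sum(e ** 2 for e in residuals) / (len(ys) - df_lost))

ys = [2, 4, 5, 4, 5]                       # made-up actual values
predictions = [2.8, 3.4, 4.0, 4.6, 5.2]    # y-hat = 2.2 + 0.6x at x = 1..5
s_resid = residual_sd(ys, predictions)
print(round(s_resid, 4))  # 0.8944
```

So a typical prediction from this line is off by a bit less than 1 unit of y.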
Unit 7: Probability
Conditional Probability
At least 1 condition
https://www.khanacademy.org/math/ap-statistics/probability-ap/probability-multiplication-rule/a/probabilities-involving-at-least-one-success
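The usual shortcut for "at least one" problems is the complement: P(at least one success) = 1 - P(no successes). A sketch in Python, assuming n independent trials that each succeed with the same probability p (the coin-flip numbers are made up):

```python
def prob_at_least_one(p, n):
    """P(at least one success in n independent trials), each with chance p."""
    prob_none = (1 - p) ** n   # every single trial fails
    return 1 - prob_none

# Hypothetical example: at least one head in 3 fair coin flips.
print(prob_at_least_one(0.5, 3))  # 0.875
```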
Unit 10: Confidence Intervals, Significance tests.
Margin of error
Standard Error
● What is it : The standard deviation of a SAMPLING distribution (the distribution of a
statistic, such as the sample mean, across repeated samples).
○ Measures the typical discrepancy between a sample estimate and the true
value in the population
○ Smaller S.E ⇒ better (more precise estimate)
○ S.E = 0 ⇒ estimated value is exactly the true value
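For a sample mean, the standard error is usually estimated as S / sqrt(n), and a margin of error is a critical value times the standard error. The sketch below assumes a 95% confidence level and a z critical value (about 1.96); with a small sample like this one a t critical value would normally be used instead. The data is made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
n = len(data)

# Standard error of the sample mean: sample S.D divided by sqrt(n).
se = stdev(data) / sqrt(n)

# Margin of error = critical value x standard error.
# Assumption: 95% confidence with a z critical value.
z_star = NormalDist().inv_cdf(0.975)
margin_of_error = z_star * se

x_bar = mean(data)
print(round(se, 4))  # 0.7559
print(round(x_bar - margin_of_error, 2), round(x_bar + margin_of_error, 2))
```

The last line prints the lower and upper ends of the resulting interval around the sample mean.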