Uploaded by Boopbeep Doobap

DATA ANALYTICS FINAL EXAMINATION REVIEWER

MODULE 7: NULL AND ALTERNATIVE HYPOTHESES
TYPE 1 AND TYPE 2 ERRORS:
TYPE 1
- Denoted by alpha (α).
- The critical (rejection) region, whose area is α, is called the alpha region.
- Rejecting the null hypothesis when it is actually true.
- Also known as a false positive.
Sample Test:
- The earth is the center of the Universe.
Null Hypothesis:
- The earth is not the center of the Universe. (In reality, this is true.)
Rejecting the null and accepting the alternative (Type 1 error):
- Concluding that the earth is the center of the Universe, even though the null hypothesis is true.
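TYPE 2
- Denoted by beta (β).
- It happens when the test statistic falls in the acceptance (non-rejection) region even though H0 is false; this area is called the beta region.
- Failing to reject (accepting) the null hypothesis when it is actually false.
- Also known as a false negative.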
Sample Test:
- Do people believe in magic?
Alternative Hypothesis:
- Most people do not believe in magic. (In reality, this is true.)
Failing to prove the alternative hypothesis (Type 2 error):
- The test concludes that most people believe in magic, failing to prove the alternative hypothesis even though it is true.
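The four possible outcomes of a test can be summarized in a small Python sketch. The helper name `classify_decision` is hypothetical, and in a real study whether the null hypothesis is actually true is not known; here it is passed in only to label the two examples above.

```python
def classify_decision(reject_null: bool, null_is_true: bool) -> str:
    """Label a test decision against the (normally unknown) truth about H0."""
    if reject_null and null_is_true:
        return "Type 1 error (false positive)"
    if not reject_null and not null_is_true:
        return "Type 2 error (false negative)"
    return "Correct decision"


# Earth example: H0 ("not the center") is true, but it is rejected.
print(classify_decision(reject_null=True, null_is_true=True))
# Magic example: H0 ("most people believe") is false, but it is kept.
print(classify_decision(reject_null=False, null_is_true=False))
```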
Level of Significance
- It is denoted by alpha (α).
- It is the threshold used to decide whether to reject H0.
- It specifies the allowable probability of making a Type 1 error.
- It is set before the study and states how strong the evidence in the sample must be before the null hypothesis is rejected.
- Commonly used values are α = 0.05 and α = 0.01.
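To see what α means in practice, the following Python sketch simulates many studies in which the null hypothesis is true and counts how often it is wrongly rejected. It assumes a two-sided z-test on normally distributed data with a known standard deviation of 1; the function name `type1_error_rate` and the sample size, number of trials, and seed are illustrative choices, not part of the course material.

```python
import random
from statistics import NormalDist


def type1_error_rate(alpha: float = 0.05, n: int = 30, trials: int = 10_000, seed: int = 1) -> float:
    """Estimate how often a true H0 (population mean = 0) is rejected at level alpha.

    Assumes a two-sided z-test with a known standard deviation of 1.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Data generated so that H0 is true: mean 0, standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / n ** 0.5)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1  # rejecting a true H0 is a Type 1 error
    return rejections / trials


print(type1_error_rate(alpha=0.05))  # roughly 0.05
print(type1_error_rate(alpha=0.01))  # roughly 0.01
```

The observed rejection rate comes out close to the chosen α, which is why α is described as the allowable probability of a Type 1 error.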
MODULE 8: T-TEST
T-TEST
- It tests whether two groups differ by comparing their means (averages).
- Every t-test produces a t-value.
- The t-value is a test statistic: it measures the difference between the two group means relative to the variation in the sample data.
- The larger the absolute t-value, the larger the difference between the two groups.
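The t-value for two independent groups can be sketched in Python. This assumes Student's two-sample t-test with pooled (equal) variances, one common variant; the function name `pooled_t` and the data are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a: list[float], group_b: list[float]) -> float:
    """Student's two-sample t statistic, assuming equal variances (pooled)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


a = [1, 2, 3, 4, 5]
print(pooled_t(a, [3, 4, 5, 6, 7]))  # means 3 vs 5
print(pooled_t(a, [5, 6, 7, 8, 9]))  # means 3 vs 7
```

Both comparisons have the same spread, so the second t-value (-4.0) is twice the size of the first (-2.0) because the difference in means doubled. With df = n_a + n_b - 2 = 8, the P-value would come from the t distribution, for example via `scipy.stats.ttest_ind`, which pools variances by default.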
ANOVA (Analysis of Variance)
- Compares the means of two or more groups to determine whether they are significantly different from each other.
SUMMARY Table:
Groups: Names of the groups
Count: Number of observations in each group
Sum: Sum of the values in each group
Average: Average value in each group
Variance: Variance of the values in each group
ANOVA Table:
Source of Variation: Variation being measured either between or within groups
SS: The sum of squares for each source of variation.
df: The degrees of freedom, calculated as
Between (df) = No. of groups - 1
Within (df) = No. of observations - No. of groups
MS: The mean sum of squares, computed as SS / df
F: F-value, computed as MS Between / MS Within
P-value: P-value corresponding to the F-value
F crit: F critical value corresponding to α = .05
- If the P-value < α, we reject the null hypothesis. Equivalently, if the F-value > F crit, we reject the null hypothesis; the two rules always lead to the same decision.
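The SUMMARY and ANOVA table entries above can be computed by hand in a short Python sketch of a one-way ANOVA. The group names and values are made-up data and `summary_table`/`one_way_anova` are hypothetical helpers; the P-value and F crit are left out because they need the F distribution, which the Python standard library does not provide.

```python
from statistics import mean, variance


def summary_table(groups: dict[str, list[float]]) -> list[tuple]:
    """Rows of the SUMMARY table: (group, count, sum, average, variance)."""
    return [(name, len(v), sum(v), mean(v), variance(v)) for name, v in groups.items()]


def one_way_anova(groups: dict[str, list[float]]) -> dict[str, float]:
    """Between/within SS, df, MS and the F-value of a one-way ANOVA."""
    all_values = [x for v in groups.values() for x in v]
    grand_mean = mean(all_values)
    group_means = {name: mean(v) for name, v in groups.items()}

    # SS between: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(v) * (group_means[name] - grand_mean) ** 2 for name, v in groups.items())
    # SS within: squared distance of every observation from its own group mean.
    ss_within = sum((x - group_means[name]) ** 2 for name, v in groups.items() for x in v)

    df_between = len(groups) - 1               # no. of groups - 1
    df_within = len(all_values) - len(groups)  # no. of observations - no. of groups
    ms_between = ss_between / df_between       # MS = SS / df
    ms_within = ss_within / df_within
    return {
        "SS between": ss_between, "SS within": ss_within,
        "df between": df_between, "df within": df_within,
        "MS between": ms_between, "MS within": ms_within,
        "F": ms_between / ms_within,           # F = MS between / MS within
    }


data = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
for row in summary_table(data):
    print(row)
print(one_way_anova(data))
```

For these made-up groups, SS between = 54, SS within = 6, the degrees of freedom are 2 and 6, and F = 27.0. With SciPy installed, `scipy.stats.f.sf(F, df_between, df_within)` would give the P-value and `scipy.stats.f.ppf(1 - alpha, df_between, df_within)` the F crit (about 5.14 for α = 0.05 here), so this F-value would lead to rejecting the null hypothesis.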
MODULE 9: CHI-SQUARE
Chi-Square Test for Independence
- It compares two categorical variables in a table to see whether they are related (dependent) or independent, that is, whether the distribution of one variable differs across the categories of the other.
Chi-Square Statistic
- It is the test statistic used to evaluate the test of independence in a bivariate table (a table that displays the distribution of one variable across the categories of another variable).
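Formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}, \qquad E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
where O is the observed frequency and E is the expected frequency of each cell, summed over all cells of the table.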
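The chi-square formula can be applied cell by cell in a short Python sketch. The counts are made-up data and `chi_square_statistic` is a hypothetical helper; the degrees of freedom use the usual (rows - 1)(columns - 1) rule for a test of independence.

```python
def chi_square_statistic(observed: list[list[float]]) -> tuple[float, int]:
    """Chi-square statistic for a test of independence, plus its degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if the two variables were independent.
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1)(columns - 1)
    return chi2, df


# Made-up counts: rows = two groups, columns = exam result (pass, fail).
related = [[10, 20], [20, 10]]
independent = [[10, 20], [20, 40]]
for table in (related, independent):
    stat, df = chi_square_statistic(table)
    print(round(stat, 3), df)
```

The first table gives χ² ≈ 6.667 with df = 1, above the α = 0.05 critical value of about 3.841, so independence would be rejected. In the second table every observed count equals its expected count, so χ² = 0.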
MODULE 10: LOGISTIC REGRESSION
Logistic Regression
- It is a predictive analysis used to describe the data and the relationship between the dependent and independent variables.
- It is used when the dependent variable (target) is categorical.
- The independent variables (predictors) it analyzes are often CONTINUOUS (NUMERICAL), although categorical predictors can also be used.
- It is a statistical way of measuring the relationship between the dependent and independent variables.
- It is used to predict the probability of an event occurring, such as a future outcome.
- It is one of the simplest algorithms in Machine Learning.
o Machine Learning is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being
explicitly programmed to do so.
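The prediction step of logistic regression can be sketched in Python as the logistic (sigmoid) function applied to a linear combination of a predictor. The coefficients `B0` and `B1` are hypothetical values chosen for illustration (in practice they are estimated from data, usually by maximum likelihood), and `predict_probability`/`predict_class` are made-up helper names; the predictor is assumed to be a practice-test score.

```python
from math import exp


def predict_probability(x: float, b0: float, b1: float) -> float:
    """Logistic model: P(y = 1 | x) = 1 / (1 + e^-(b0 + b1 * x))."""
    z = b0 + b1 * x  # linear combination of the predictor (the log-odds)
    return 1.0 / (1.0 + exp(-z))


def predict_class(x: float, b0: float, b1: float, threshold: float = 0.5) -> str:
    """Turn the predicted probability into a categorical outcome."""
    return "pass" if predict_probability(x, b0, b1) >= threshold else "fail"


# Hypothetical coefficients for pass/fail vs. a practice-test score.
B0, B1 = -4.0, 0.1
for score in (20, 40, 60):
    p = predict_probability(score, B0, B1)
    print(score, round(p, 3), predict_class(score, B0, B1))
```

A score of 40 gives z = 0 and a probability of exactly 0.5, while 20 gives about 0.119 and 60 about 0.881. The continuous predictor goes in, and the 0.5 threshold turns the probability into the categorical target (pass or fail).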
Discrete (categorical)
- countable
- nothing in between
- digital
- for example, gender, marital status, exam result (pass, fail)
Continuous (quantitative)
- infinite/uncountable
- always something in between
- analog
- for example, wind speed, temperature, electrical voltage