The T-TEST, ANOVA, CORRELATION, CHI-SQUARE, REGRESSION

Human Behavior

Submitted by: Salwa Buriro
Roll No: 2k19/HBBAE/19
Assigned by: Syed Safdar Ali Shah
BBA (EVENING)

The T-TEST

T-tests offer a way to compare two groups on scores, such as differences between boys and girls or between children in different school grades. A t-test is a type of inferential statistic: an analysis that goes beyond describing the data in a sample and seeks to draw conclusions about the populations the sample came from. The t-test is one of many tests used for hypothesis testing in statistics.

Calculating a t-test requires three key data values: the difference between the mean values of the two data sets (called the mean difference), the standard deviation of each group, and the number of data values in each group. The t-test analyzes the difference between the two means derived from the group scores and tells the researcher whether that difference is larger than would be expected by chance.

There are three versions of the t-test:
1. Independent samples t-test (which compares the means of two groups)
2. Paired sample t-test (which compares means from the same group at different times)
3. One sample t-test (which tests the mean of a single group against a known mean)

Dependent samples t-test (also called repeated measures t-test or paired-samples t-test)

T-tests are used when we want to compare two groups of scores and their means. Sometimes, however, the participants in one group are meaningfully related to the participants in the other group. One common example of such a relation is a pre-test/post-test research design. Because the participants at the pre-test are the same participants at the post-test, the pre-test and post-test scores are meaningfully related, and the test is computed on the difference between each participant's two scores.
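The pre-test/post-test comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name paired_t_statistic and the five pairs of scores are hypothetical and do not come from any real study. The sketch takes the per-participant differences, then divides their mean by its standard error.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired-samples t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # One difference per participant: post-test score minus pre-test score.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    std_error = sd_diff / math.sqrt(n)
    return mean_diff / std_error, n - 1


# Hypothetical scores for the same five participants (illustrative only).
pre_test = [10, 12, 9, 11, 13]
post_test = [12, 14, 10, 14, 15]

t_value, df = paired_t_statistic(pre_test, post_test)
print(f"t = {t_value:.3f}, df = {df}")
```

The resulting t value would then be compared against a t-distribution with n - 1 degrees of freedom to judge whether the mean difference is larger than chance would explain.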
Independent samples t-test

The independent samples t-test is used to compare two groups whose means are not dependent on one another. In other words, the participants in each group are independent of each other and make up two separate groups of individuals who have no links to particular members of the other group.

One sample t-test

The one sample t-test compares the mean of a single group against a known mean.

Analysis of Variance

Analysis of variance (ANOVA) is a test of hypothesis that is appropriate for comparing the means of a continuous variable in two or more independent comparison groups. The fundamental strategy of ANOVA is to systematically examine the variability within the groups being compared and the variability among those groups.

ANOVA follows the five-step approach:
Step 1. Set up hypotheses and determine level of significance.
Step 2. Select the appropriate test statistic.
Step 3. Set up decision rule.
Step 4. Compute the test statistic.
Step 5. Conclusion.

There are two types of ANOVA:
One Way ANOVA
Two Way ANOVA

One Way ANOVA: A one-way ANOVA is used to compare the means of two or more independent (unrelated) groups using the F-distribution. The null hypothesis for the test is that all group means are equal. Therefore, a significant result means that at least one group mean differs from the others. A one-way ANOVA will tell you that at least two groups were different from each other, but it won’t tell you which groups were different.

Two Way ANOVA: A Two Way ANOVA is an extension of the One Way ANOVA. With a One Way ANOVA, you have one independent variable affecting a dependent variable. With a Two Way ANOVA, there are two independent variables. Use a two-way ANOVA when you have one measurement variable (i.e., a quantitative variable) and two nominal variables.
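As a rough illustration of how the one-way ANOVA weighs variability among groups against variability within groups, the sketch below computes the F statistic by hand. The function name one_way_anova_f and the three small groups are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [statistics.mean(g) for g in groups]
    # Variability among the groups: group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability within the groups: each value around its own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variability; F is undefined")
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores for three independent groups (illustrative only).
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = one_way_anova_f(scores)
print(f"F = {f_value:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F means the group means are spread out far more than the scatter inside each group would suggest; the value is then compared against the F-distribution with the two degrees of freedom shown.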
Put another way, a two-way ANOVA is appropriate when an experiment has a quantitative outcome and two categorical explanatory variables.

CHI-SQUARE

A chi-squared test, also written as χ² test, is any statistical hypothesis test where the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Without other qualification, 'chi-squared test' is often used as short for Pearson's chi-squared test. The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.

In the standard applications of this test, the observations are classified into mutually exclusive classes, and there is some theory, or null hypothesis, which gives the probability that any observation falls into the corresponding class. The purpose of the test is to evaluate how likely the observations that are made would be, assuming the null hypothesis is true.

In cryptanalysis, the chi-squared test is used to compare the distribution of plaintext and (possibly) decrypted ciphertext. The lowest value of the test statistic means that the decryption was successful with high probability. This method can be generalized for solving modern cryptographic problems.

There are two types of chi-square tests. Both use the chi-square statistic and distribution, for different purposes:
chi-square goodness of fit test
chi-square test for independence

chi-square goodness of fit test

A chi-square goodness of fit test determines whether sample data matches a population. Historically, Pearson introduced the Pearson distribution, a family of continuous probability distributions that includes the normal distribution and many skewed distributions, and proposed a method of statistical analysis that uses the Pearson distribution to model the observations and then performs a test of goodness of fit to determine how well the model and the observations really fit.
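The comparison of observed and expected frequencies can be sketched as follows. The function name chi_square_statistic and the coin-toss counts are hypothetical; the value 3.841 is the standard tabulated critical value of the chi-squared distribution for 1 degree of freedom at the 0.05 significance level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical result of 100 coin tosses: [heads, tails] (illustrative only).
observed_counts = [58, 42]
# A fair coin is the null hypothesis, so 50 heads and 50 tails are expected.
expected_counts = [50, 50]

chi_sq = chi_square_statistic(observed_counts, expected_counts)
degrees_of_freedom = len(observed_counts) - 1

# Tabulated critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
reject_null = chi_sq > CRITICAL_VALUE_DF1_ALPHA_05

print(f"chi-square = {chi_sq:.2f}, df = {degrees_of_freedom}")
print("reject fair-coin hypothesis" if reject_null else "no evidence against a fair coin")
```

With these made-up counts the statistic stays below the critical value, so the sample is consistent with a fair coin at the 0.05 level.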
The goodness-of-fit test asks a question such as, "If a coin is tossed 100 times, will it come up heads 50 times and tails 50 times?"

chi-square test for independence

A chi-square test for independence compares two variables in a contingency table to see if they are related. In a more general sense, it tests whether the distributions of categorical variables differ from one another. The expected counts are the ones that would occur if the two variables were independent. A very small chi-square test statistic therefore means that the observed data fits the expected data extremely well; in other words, there is no evidence of a relationship. A very large chi-square test statistic means that the data does not fit the expected counts well; in other words, the variables appear to be related. The test of independence asks a question of relationship, such as, "Is there a relationship between gender and SAT scores?" (with the scores grouped into categories).

Correlation

Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate together. A positive correlation indicates the extent to which those variables increase or decrease in parallel; a negative correlation indicates the extent to which one variable increases as the other decreases.

A correlation coefficient is a statistical measure of the degree to which changes in the value of one variable predict changes in the value of another. When the fluctuation of one variable reliably predicts a similar fluctuation in another variable, there’s often a tendency to think that the change in one causes the change in the other. Correlation measures association, but it does not tell you whether x causes y or vice versa, or whether the association is caused by some third (perhaps unseen) factor.

In finance, correlation can measure how a stock moves relative to a benchmark index; the same relationship underlies a stock's beta.
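To make positive and negative correlation concrete, the sketch below computes a Pearson correlation coefficient by hand. The function name pearson_r and the data are hypothetical, and the two score lists are deliberately perfectly linear so that the coefficients come out at the extremes.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    if sq_dev_x == 0 or sq_dev_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)


# Hypothetical data (illustrative only).
hours = [1, 2, 3, 4, 5]
scores_rising = [2, 4, 6, 8, 10]   # increases in parallel with hours
scores_falling = [10, 8, 6, 4, 2]  # decreases as hours increase

r_positive = pearson_r(hours, scores_rising)
r_negative = pearson_r(hours, scores_falling)
print(f"positive example: r = {r_positive:.2f}")
print(f"negative example: r = {r_negative:.2f}")
```

Real data rarely reaches +1 or -1, and even a coefficient near those extremes says nothing about which variable, if either, causes the other.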
Investment managers, traders and analysts find it very important to calculate correlation, because the risk reduction benefits of diversification rely on this statistic.

There are four types of correlation:
Pearson correlation
Kendall rank correlation
Spearman rank correlation
Point-biserial correlation

Pearson correlation: Pearson correlation is the most widely used correlation statistic for measuring the degree of the relationship between linearly related variables.

Kendall rank correlation: Kendall rank correlation is a non-parametric test that measures the strength of dependence between two variables.

Spearman rank correlation: Spearman rank correlation is a non-parametric test that is used to measure the degree of association between two variables. The Spearman rank correlation test does not carry any assumptions about the distribution of the data and is the appropriate correlation analysis when the variables are measured on a scale that is at least ordinal.

Point-biserial correlation: The point-biserial correlation is computed with the Pearson correlation formula, except that one of the variables is dichotomous.

Regression

Regression analysis is a powerful statistical method that allows you to examine the relationship between two or more variables of interest. While there are many types of regression analysis, at their core they all examine the influence of one or more independent variables on a dependent variable. Regression analysis provides detailed insight that can be applied to further improve products and services, and it is a reliable method of identifying which variables have an impact on a topic of interest.
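The idea of one independent variable influencing a dependent variable can be illustrated with a least-squares straight-line fit. This is a minimal sketch, not a full regression analysis: the function name fit_line and the data are hypothetical, and the points were chosen to lie exactly on a line so the fitted coefficients are easy to check.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one independent variable."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the independent variable must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data (illustrative only): advertising spend vs. sales.
ad_spend = [1, 2, 3, 4, 5]   # independent variable
sales = [3, 5, 7, 9, 11]     # dependent variable

slope, intercept = fit_line(ad_spend, sales)
print(f"sales = {intercept:.1f} + {slope:.1f} * ad_spend")
```

The slope estimates how much the dependent variable changes for each one-unit change in the independent variable, which is the kind of influence regression analysis sets out to measure.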
There are 7 types of regression:
Linear Regression
Logistic Regression
Polynomial Regression
Stepwise Regression
Ridge Regression
Lasso Regression
Elastic Net Regression

Linear Regression: In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the regression line is linear.

Logistic Regression: Logistic regression is used to find the probability of event=Success and event=Failure. We should use logistic regression when the dependent variable is binary (0/1, True/False, Yes/No) in nature.

Polynomial Regression: A regression equation is a polynomial regression equation if the power of the independent variable is more than 1.

Stepwise Regression: This form of regression is used when we deal with multiple independent variables. The selection of independent variables is done by an automatic process that involves no human intervention.

Ridge Regression: Ridge Regression is a technique used when the data suffers from multicollinearity (the independent variables are highly correlated).

Lasso Regression: Like ridge regression, Lasso (Least Absolute Shrinkage and Selection Operator) penalizes the size of the regression coefficients, but it uses their absolute values rather than their squares. It is capable of reducing the variability and improving the accuracy of linear regression models.

Elastic Net Regression: Elastic Net is a hybrid of the Lasso and Ridge Regression techniques. It is trained with both L1 and L2 penalties as the regularizer. Elastic Net is useful when there are multiple features which are correlated: Lasso is likely to pick one of these at random, while Elastic Net is likely to pick both.
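The difference between the last three techniques is easiest to see by writing their objective functions side by side. The notation below is an assumption introduced for illustration and is not taken from the text above: y_i is the dependent variable for observation i, x_i is the vector of independent variables, β holds the regression coefficients, and λ, λ_1 and λ_2 are non-negative tuning constants. The intercept is left out for brevity.

```latex
\begin{aligned}
\text{Ridge:}\quad
  & \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2
    + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\text{Lasso:}\quad
  & \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \\
\text{Elastic Net:}\quad
  & \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2
    + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
    + \lambda_2 \sum_{j=1}^{p} \beta_j^2
\end{aligned}
```

The squared (L2) penalty shrinks correlated coefficients together, the absolute-value (L1) penalty can set some coefficients exactly to zero, and Elastic Net combines the two terms.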