HONORS STATISTICS TUTORIAL

To help you with this tutorial, log in to http://www.gla.ac.uk/sums/users/jdbmcdonald/PrePost_TTest/pandt1.html

Introducing Experimental Design

Many experiments compare two sets of measurements of the same variable. There are many ways of setting up an experiment to produce such data, but when we talk about experimental design, we are referring to a specific aspect of the experiment: whether or not the two sets of measurements can be sensibly paired off with each other.

If you can pair off each measurement from one sample with a natural partner from the other sample, then you have a paired design. This may be because you are measuring something twice under different conditions (sometimes called a repeated measures design) or because the two things you are measuring are naturally related.

Example: Comparing the running speed of horses for a week of eating one type of feed with the same horses for a week on a different type of feed would be a paired design, as you can pair off measurements from the same horse.

If there is no sensible way of pairing off the values from the two samples, then you have an independent design.

Example: Comparing the running speeds of horses and zebras would be an independent design, as there is no sensible way to pair off each horse with a zebra.

What is a Hypothesis?

A hypothesis is designed to be tested as being either supported or not supported (refuted) by experimental data. This means that every hypothesis has an opposite: the statement that the predicted effect is not there.

There is a vocabulary that comes with all of this. Here it is:

The experimental (or research) hypothesis is the prediction that your theory makes, or the effect you suspect you will see. This is referred to as H1.
The null hypothesis is the statement that the effect described in the experimental hypothesis does not exist. This is referred to as H0.
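
For the two-sample comparisons this tutorial deals with, the pair of hypotheses can also be written in symbols. This is only a sketch in notation that does not appear in the original tutorial: $\mu_1$ and $\mu_2$ are assumed to stand for the population means of the two sets of measurements.

```latex
\begin{align*}
H_0 &: \mu_1 = \mu_2    && \text{(null hypothesis: no effect)} \\
H_1 &: \mu_1 \neq \mu_2 && \text{(experimental hypothesis: some effect)}
\end{align*}
```

When the theory predicts which mean is larger, the $\neq$ in $H_1$ is replaced by $>$ or $<$.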
One thing to remember when wording your hypotheses is that it is important to decide whether or not you expect to see a difference in a particular direction. If you think that coffee improves memory, then you expect memory scores to be better for coffee drinkers, so there is an expected direction.

Here are lists of sentences that represent either an experimental hypothesis or a null hypothesis. Work through them saying which category each falls into and say whether the hypothesis has a direction or not.

CHOOSE EITHER: EXPERIMENTAL HYPOTHESIS OR NULL

Alcohol consumption does not affect reaction time _________
Does the hypothesis have a direction? _______

Sports drinks improve recovery time after exercise _________
Does the hypothesis have a direction? _______

There is a difference between the ability of girls and boys to learn statistics. ___________
Does the above hypothesis have a direction? ________

Application

Here are the two hypotheses from your study. Which is the null and which is the experimental hypothesis?

Post-ninth grade student heights will be significantly higher than pre-ninth grade student heights. ________
There is no difference in Pre/Post ninth grade heights ______________________

Is your experimental hypothesis looking for an effect in a certain direction? __________

Introducing Central Tendency

The measure of central tendency of your data is the single value that best represents all of the data. It is the value that you would pick if you had to guess which of your data points somebody had chosen at random. This value often (but certainly not always) lies in the 'middle' of the data, in the sense that it has as many values above it as it has below.

There are three main measures of central tendency:

The mean is the result of adding all the values in your data together and dividing the total by the number of data points you have. The mean is the measure that people often refer to as the average.
For example, the mean height is 180 centimeters;
The median is the result of arranging the values in order and finding the middle value in the resulting list. For example, the median age is 45;
The mode is the most commonly occurring value in your data. This corresponds to the highest bar in the frequency histogram. For example, the most common number of children in a family is 2.

The most appropriate measure for a given data set depends on the data itself.

Continuous values such as height are suitable for using the mean, for example 'The mean height is 32.5 cm'. The mode is not a good measure to use with continuous values measured to high accuracy, as such data may not contain any repeated values. For example, if you measured the height of ten people to the nearest millimeter, you might get ten different values;
Discrete values such as number of children are better suited to the mode or median, thus avoiding 'The average is 2.3 children';
Categorical data such as color of cars sold should use the mode, for example, 'Red is the most common car color'.

Application

Your experiment generated data describing two variables.
The independent variable separates your experimental samples into pre-test and post-test.
The dependent variable takes discrete numeric values.

Standard Deviation

The average of your data summarizes it all in a single value. That certainly throws away a lot of information. If you were to know one more thing about the data, after the average, what would be the most useful thing? The range (largest and smallest) might be useful, but there is a different measure that is even better - the standard deviation.
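
The measures above can be tried out with Python's standard `statistics` module. The sketch below uses invented numbers, not data from the study. It first computes the mean, median and mode of one small sample, then shows two samples that share the same mean but differ in spread, which is the information the average alone throws away. Note that `statistics.stdev` computes the sample standard deviation (dividing by n - 1); whether your own software uses that convention is something to check.

```python
import statistics

# Invented values, chosen only to keep the arithmetic easy to follow.
children = [2, 2, 3, 5, 8]
mean_children = statistics.mean(children)      # (2 + 2 + 3 + 5 + 8) / 5
median_children = statistics.median(children)  # middle value of the sorted list
mode_children = statistics.mode(children)      # most commonly occurring value
print(mean_children, median_children, mode_children)

# Two invented samples with the same mean but very different spread.
sample_a = [9, 10, 10, 11]
sample_b = [2, 6, 14, 18]

for name, sample in (("A", sample_a), ("B", sample_b)):
    mean = statistics.mean(sample)
    value_range = max(sample) - min(sample)  # largest minus smallest
    sd = statistics.stdev(sample)            # sample standard deviation (n - 1)
    print(f"Sample {name}: mean={mean}, range={value_range}, sd={sd:.2f}")
```

Both samples report the same mean, yet sample B has a much larger range and standard deviation.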
The standard deviation measures how much the data varies:

A large number means the data varies a lot;
A small number means the data varies a little;
A standard deviation of zero indicates that all the values in the data are identical.

The standard deviation tells you something more about the average too, as it measures variation in terms of how far from the mean all the values in the sample fall. Values are further from the mean on average when the standard deviation is large than they are when it is small.

Application

A. The standard deviation for the pre-test sample is 5.88
B. The standard deviation for the post-test sample is 7.88

Which sample has the most variation between its values?

Samples and Populations

Explanation

When You Cannot Measure Everyone

We have already seen that data from experiments is generated by taking measurements from a number of different experimental units. For example, we might measure the height of 20 people or the acidity level in 30 soil samples. It is very rare indeed that we will have measured every possible unit (every person in the world or every bit of soil). To make a distinction between the few we have measured and all that we might measure, we use the following words:

The population refers to every unit in existence;
A sample refers to those units that we have measured.

Here are the key points to remember about sampling:

What makes up a population depends on the definition you choose for your study. It might be as broad as all people or as narrow as the male members of class 3B;
A very common method of collecting samples, known as simple random sampling, attempts to ensure that each member of the population has an equal chance of being picked as part of a sample;
Descriptive statistics are used to describe certain aspects of a sample (for example, 'The sample mean is 5.4');
Inferential statistics are used to make statements about the whole population based only on what we know about a given sample.
The difference between a statistic inferred from the sample and the true population statistic is known as sampling error;
The larger a sample gets, the smaller sampling error is likely to get.

Application

Now we will look at the data from your study. You have collected 30 paired measurements, so you have two samples of 30 each. One is from students in the pre-ninth grade sample and the other is from students in the post-ninth grade sample.

Does the data you have collected represent a sample of a larger population, or have you collected a measurement from every possible student there is? ____________________________

What kind of statistics would you use to describe the sample we have collected?
Descriptive or Inferential

What kind of statistics would you use to infer things about the population based on that sample?
Descriptive or Inferential

If you doubled your sample size, how would that affect sampling error?
Increase or Reduce the sampling error?

Choosing a T-Test

Explanation

Paired or Independent t-test?

There are two types of t-test, the paired t-test and the independent t-test. This page tells you how to pick the right one for your data.

We have already seen that when comparing two samples, it is important to know whether or not the samples are paired. The section on experimental design covers this in more detail, but here is a quick recap:

With paired (dependent) samples, it is possible to take each measurement in one sample and pair it sensibly with one measurement in the other sample. This might be because measurements were taken from the same group twice (repeated measures) or because there is some other way to join measurements, for example, comparing the IQ of older and younger brothers;
With independent samples, there is no sensible way to pair off the measurements.
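
The recap above can be pictured as two different data layouts. The sketch below reuses the horse and zebra examples from the experimental design section; all names and running speeds are invented for illustration.

```python
# Invented running speeds in km/h; none of these numbers come from a real study.

# Paired design: each row is one horse measured under both feeds,
# so the two values in a row belong together.
paired = [
    ("horse_1", 52.0, 54.5),
    ("horse_2", 48.5, 49.0),
    ("horse_3", 55.0, 57.5),
]

# Because the rows line up, a per-horse difference is meaningful.
differences = [feed_b - feed_a for _, feed_a, feed_b in paired]
print(differences)

# Independent design: two separate groups with no natural partner,
# so the lists can even have different lengths.
horses = [52.0, 48.5, 55.0]
zebras = [50.0, 47.5, 53.0, 49.5]
print(len(horses), len(zebras))
```

In the paired layout the analysis works on the list of differences; in the independent layout there is no such list to build, which is why the two designs call for different tests.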
One of the reasons that you need to identify the type of experimental design that you are dealing with is that you need to use the right t-test for the right design:

The paired t-test is used when you have a paired design;
The independent t-test is used when you have an independent design.

The other thing you need to decide at this point is easy to decide, but can be slightly harder to understand. You need to decide which of the following types of effect you expect to find:

The first mean to be larger than the second;
The first mean to be smaller than the second;
The first mean to be different from the second in either direction.

You will see this choice referred to in literature and textbooks as the number of tails of the test. The tail is the extreme end of the distribution of the data and your experiment can be one of two types:

One-tailed tests expect the effect to be in a certain direction, so the first two points above are examples of one-tailed experiments;
Two-tailed tests are used when you have no idea which sample will be larger than the other, but you are looking for any difference. The third point above is such a case.

If you have stated your experimental hypothesis with care, it will tell you which type of effect you are looking for. For example, the hypothesis that "Coffee improves memory" is one-tailed because you expect an improvement. The hypothesis "Men weigh a different amount from women" suggests a two-tailed test, as no direction is implied. So remember, don't be vague with your hypothesis if you are looking for a specific effect!

Exploration

Here are a few questions to test yourself to make sure you understand the choice of t-test type and tail number.

1. An experiment measures people's lung capacity before and then after an exercise program to see if their fitness has improved.
Which t-test would you use? ______________
How many tails does the test have? ________

2. A different experiment measures the lung capacity of one group who took one exercise program and another group who took a different exercise program to see if there was a difference.
Which t-test would you use? ____________
How many tails does the test have? __________

Application

Your experiment compares pre-ninth grade heights with post-ninth grade heights and is measured from the same students under both conditions, Pre- and Post-. Your experimental hypothesis is "Post-ninth grade heights will be significantly higher than Pre-ninth grade heights", so you know which direction you expect the difference to be in.

What is your experimental design? Paired or Independent
Which t-test should you choose? Paired t-test or Independent t-test
How many tails does your experiment have? 1 tailed or 2 tailed

Paired t-test

Explanation

What a Paired T-Test Does

A paired t-test compares two samples in cases where each value in one sample has a natural partner in the other. The concept of paired samples is covered in more detail in the section on choosing a t-test.

What a Paired T-Test Measures

A paired t-test looks at the difference between paired values in two samples, takes into account the variation of values within each sample, and produces a single number known as a t-value. You can find out how likely it is that two samples from the same population (i.e. where there should be no difference) would produce a t-value as big as, or bigger than, yours. This value is called a p-value.

So, a t-test measures how different two samples are (the t-value) and tells you how likely it is that such a difference would appear in two samples from the same population (the p-value).

How to use a Paired T-Test

You will use a software package (EXCEL) to perform a t-test.
Software can perform the calculations to produce t-values and p-values, but it is your responsibility to do the following:

Pick the right kind of t-test, in this case a paired t-test, and the right direction of test (one or two tailed). See the sections on choosing a t-test for more on this;
Ensure the distribution of your data is suitable for a t-test. See the sections on the normal distribution for more on this;
Know how to interpret the results of doing a t-test. See the sections on t-values and p-values for more on this.

One final practical point: each value in one sample is paired with a single value in the other. When you enter your data into a computer for analysis by a software package, make sure the paired values are lined up. This usually means having data in two columns where each row represents a single pair. The fact that values are paired is very important!

T-TEST = A WAY TO COMPARE THE MEANS OF TWO SETS OF DATA USING STATISTICS.
P-VALUE = THE PROBABILITY THAT A DIFFERENCE AT LEAST THIS LARGE WOULD APPEAR IF THERE WERE NO REAL DIFFERENCE BETWEEN THE MEANS.
IF p < 0.05, THE DIFFERENCE BETWEEN THE MEANS IS STATISTICALLY SIGNIFICANT. The standard benchmark is 5% (0.05) and this is called the significance level.

Information from: http://www.gla.ac.uk/sums/users/jdbmcdonald/PrePost_TTest/pandt1.html
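
To make the paired t-test less of a black box, here is a minimal sketch of the calculation that software carries out. It rests on several assumptions: the heights are invented and are not the data from your study, the t-value uses the standard paired formula (mean of the differences divided by their standard error), and instead of computing an exact p-value it compares the t-value with the one-tailed critical value for 5 degrees of freedom at the 0.05 level, taken from a standard t-table.

```python
import math
import statistics

# Invented heights in cm for six students; NOT the data from the study.
pre = [150, 152, 148, 155, 160, 158]
post = [153, 154, 151, 156, 164, 160]

# Paired values must be lined up: position i in both lists is the same student.
differences = [after - before for before, after in zip(pre, post)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)   # sample standard deviation (n - 1)
standard_error = sd_diff / math.sqrt(n)
t_value = mean_diff / standard_error      # degrees of freedom = n - 1 = 5

# One-tailed critical value for 5 degrees of freedom at the 0.05 level,
# read from a standard t-table.
CRITICAL_T = 2.015
significant = t_value > CRITICAL_T

print(f"t = {t_value:.2f}, significant at 0.05 (one-tailed): {significant}")
```

A t-value above the critical value corresponds to p < 0.05 for a one-tailed test. In practice the software reports the p-value directly, so the table lookup is only needed when working by hand.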