Scenario 1 – Several 'threats' to internal and external validity originally identified by Campbell and Stanley (as cited in George et al., 2000) are present in this research. The study used a pre-experimental 'one-group pre-test, post-test' design, with all participants being tested before and after power training. This design is inherently flawed; the lack of a control group makes it impossible to attribute cause and effect to the treatment. The lack of randomisation of subjects introduces the threat to internal validity of 'selection bias'. Furthermore, the biased sample induces the threat of 'interaction of selection bias and experimental treatment' to external validity. The researcher aimed to assess jump height in 'adolescents', but the sample does not reflect this, with only twelve male participants, all from the same football club. Therefore, any effect of the training observed in the research may simply be because all participants were active, male footballers, and thus cannot be generalised to the wider population of 'adolescents' as intended. In addition, the threat of 'maturation' to internal validity may have been present: the sample included boys aged 11–16, so the older boys may have responded better to the training simply because they were more physically developed. These issues could be rectified through simple changes to design and sampling. A true experimental design, such as a randomised pre-test post-test design with participants randomly assigned to either a power-training group or a control group (i.e. non-power-based exercises, or no training), would allow a cause-and-effect inference. Using a larger, wider sample would reduce the threat of interaction of selection bias and experimental treatment. This could be done by recruiting more participants, with a mixture of boys and girls from several different schools, and by avoiding narrow inclusion criteria such as 'footballers only'.
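The random allocation recommended above can be sketched briefly. The following is a minimal illustration only, not part of the original study; the participant labels, group names, and fixed seed are all hypothetical assumptions:

```python
import random

def assign_groups(participants, seed=42):
    """Randomly split a participant list into two equally sized groups.

    A fixed seed makes the allocation reproducible for this sketch; in a
    real study the allocation sequence would be generated and concealed
    in advance of recruitment.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)          # random order removes selection bias
    half = len(shuffled) // 2
    return {"power_training": shuffled[:half], "control": shuffled[half:]}

# Hypothetical IDs for a twelve-player sample, as in Scenario 1
groups = assign_groups([f"P{i:02d}" for i in range(1, 13)])
```

Because each participant has an equal chance of entering either group, any systematic pre-existing difference between the groups becomes a matter of chance rather than selection.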
These changes would greatly improve external validity and thus generalisability to the target population. The researcher could also narrow the age range to reduce the threat of 'maturation' to internal validity. In this study, the testing and training procedure was only verbally explained, and participants performed the test only once, causing a 'testing' threat to internal validity. Taking the test only once will inevitably benefit participants' performance on the next test, as they are still learning the testing method. The researcher should use one or more familiarisation sessions with each group before testing, allowing time for demonstration and practice of the test. This would provide testing scores that more accurately reflect the effect of the power training rather than learning of the testing procedure, by reducing the 'learning effect'. In addition, participants should be tested more than once (with adequate recovery time) to improve the reliability of results; three trials may be more appropriate. Testing scores may also be threatened by 'reactive effects to experimental arrangements', in particular the 'Hawthorne effect', originally identified by Henry A. Landsberger (as cited in Merrett, 2006). This may arise because each participant performed their jump in front of eleven other boys from their football club plus the researcher, and so may have been motivated to try harder to beat one another. This threat could be reduced by testing each participant individually, without the others watching. Following testing, the researcher only explained the training programme and asked for 'verbal confirmation' from the participants that they had performed the training programme each week. This is likely to promote socially desirable answers from the participants, giving an inaccurate reflection of training adherence.
The researcher could have arranged a weekly supervised session for the groups to perform their training on site, to reduce this threat to internal validity and confirm that each group performed the correct training. The researcher placed no restrictions on the physical activity (PA) levels of the participants during the eight weeks of training and made no effort to record the participants' PA levels during this time, causing a threat of 'history' effects to internal validity and making it difficult to determine cause and effect of the training programme. The responsiveness of the players to the training may simply reflect their PA levels outside of the programme (e.g. some may be weight training multiple times per week and therefore show better improvement in testing results than others who are not). The researcher should have assessed the PA levels of the players during the eight weeks (e.g. by accelerometry), so that external PA could be correlated with the results of the training programme, supporting a stronger cause-and-effect inference and thus reducing the threat of history to internal validity.

Scenario 2 – Several 'threats' to internal and external validity originally identified by Campbell and Stanley (as cited in George et al., 2000) are present in this research. This study used a pre-experimental static group comparison design with non-randomised, age-matched groups, causing the threat of 'selection bias' to internal validity. This makes it difficult to determine whether any effect (or lack of effect) observed is because of the treatment, or due to inherent similarities or differences in the samples between groups. The researcher's target population was male and female adults; however, the sample selected does not accurately reflect this, with only two males compared with sixteen females, and both males in the control group.
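The non-randomised, age-matched allocation just described could instead use randomisation stratified by sex, so that the two males are not concentrated in one group. The sketch below is illustrative only; the IDs, group names, and seed are hypothetical assumptions, not details from the study:

```python
import random

def stratified_assign(participants_by_sex, seed=7):
    """Randomly allocate participants to 'melatonin' and 'control' groups,
    stratified by sex so each stratum is split evenly between the groups.

    `participants_by_sex` maps a sex label to a list of participant IDs.
    All names here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    groups = {"melatonin": [], "control": []}
    for sex, ids in participants_by_sex.items():
        ids = list(ids)
        rng.shuffle(ids)                    # randomise within each stratum
        half = len(ids) // 2
        groups["melatonin"].extend(ids[:half])
        groups["control"].extend(ids[half:])
    return groups

# Hypothetical 18-person sample mirroring Scenario 2: 2 males, 16 females
sample = {"male": ["M01", "M02"],
          "female": [f"F{i:02d}" for i in range(1, 17)]}
groups = stratified_assign(sample)
```

Stratifying before randomising guarantees one male in each group and eight females in each, so sex cannot confound the group comparison in the way it does in the original design.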
Use of a true experimental design, such as a randomised group design, would allow a more confident cause-and-effect inference about melatonin. The participants were all colleagues of the researcher, so social threats to internal validity may be present. The participants may feel pressured to give the results they believe the researcher is looking for, as the researcher funded their travel to the USA. Additionally, the fact that they are all colleagues attending the same conference promotes the social threat of 'diffusion' to internal validity, as they can easily discuss the research among themselves. External validity and generalisability could be improved by recruiting a larger sample from a wider area (e.g. members of staff from several different universities attending the conference). This would allow a better gender distribution across the control and melatonin groups, making the findings more generalisable to the target population of adult males and females. A wider, larger sample such as this would also reduce the risk of diffusion, as not all participants would be colleagues and they would be less likely to discuss the experiment. The threat of 'expectancy' to internal validity is particularly prominent in this research design. The participants were informed of the potential benefits of melatonin on jet lag before the treatment, immediately causing them to expect a positive effect should they receive melatonin. Furthermore, the melatonin and placebo treatments were administered in different formats: the melatonin in capsule form, and the placebo as a powder consisting of crushed mint sweets. As academic members of staff, it is reasonable to assume that, should they receive a 'minty' flavoured powder, they may be aware that they are receiving the placebo and will therefore expect little to no effect on their jet lag. This may also further promote diffusion – e.g. "You received a capsule?
I was given a minty powder – maybe it's the placebo", and so on. The threat of expectancy could be greatly reduced: removing the melatonin briefing, providing both groups with identical capsules, and adopting a 'double-blind' design would substantially improve the internal validity of this research. The threat of 'history' effects to internal validity is also prominent, as the researcher did not attempt to control or record important variables such as physical activity or sleep once the participants returned, the only instruction being to 'adhere to the drug regime'. Some may sleep two or three times as much as others upon returning home; some may engage in strenuous PA, others none. These factors will clearly influence one's subsequent perception of 'jet lag', producing invalid results. The researcher should have used a more direct measure of sleep and physical activity levels in the seven days after arrival in the UK (e.g. accelerometry) to minimise this threat and allow a better cause-and-effect inference about the treatment. The use of a visual analogue scale ranging from 'insignificant jet lag' to 'very bad jet lag' as a means of assessment causes the threat of 'instrumentation' to internal validity. With no definition of what constitutes 'very bad' or 'insignificant' jet lag, the results are entirely subjective and dependent upon each participant's perception. Participants were also only assessed once, seven days after return, so the whole effect of the treatment was reduced to how they felt on one day. The use of a validated jet lag questionnaire, completed every day for seven days after return, would reduce this threat and improve internal validity, whilst providing better insight into the time course of the participants' jet lag.

References

George, K., Batterham, A. and Sullivan, I. (2000). Validity in clinical research: A review of basic concepts and definitions.
Physical Therapy in Sport, 1, 19–27.

Merrett, F. (2006). Reflections on the Hawthorne effect. Educational Psychology, 26, 143–146.