Uploaded by Lemeyoh

Research Methodology Lecture Notes

Specialist in a particular area
○ Eg., Cardiologist, PT,
Advantages
○ Have really detailed knowledge within their field
Disadvantages
○ Unsystematic
■ “You might try this one time, try another next time. Jump higher one
time, lower another”
○ Difficult to determine what caused the outcome
○ Often the results are not shared/communicated
Deductive reasoning (major premise, minor premise, into conclusion)
Inductive reasoning (specific premise, into general idea)
Both types of reasoning used to:
● Develop a research question
● Answering + interpreting research question
Necessary Example:
● Air is necessary for human life
○ Without air, we are dead
Sufficient Example
● Air is insufficient for human life
○ Air does not guarantee our existence
■ Food, Water, etc.
Journal article types
● Primary article (empirical study) – a primary study is one that aims to gain new
knowledge on a topic through direct or indirect observation and research. These
include quantitative or qualitative data and analysis.
○ IMRaD = Intro, Methods, Results and Discussion
● Review article – a review article provides a summary of existing research in a
field/topic area. There are several common types of review articles:
○ Narrative reviews (literature reviews) – Summarises some of the existing
evidence in a field or topic
○ Scoping reviews – these are broad reviews that aim to gather as much
evidence as possible and map the evidence into themes.
○ Systematic reviews – these are highly structured reviews that utilize
pre-planned methods to include/exclude articles.
● Meta-analysis – a type of research study that combines and/or analyzes data from
different primary studies (usually) in a new analysis in order to strengthen the
understanding of a particular topic.
● Case studies – report specific instances of interesting phenomena. A goal of case
studies is to make other researchers aware of the possibility that a specific
phenomenon might occur. This type of study is often used in medicine to report the
occurrence of previously unknown or emerging pathologies.
○ Anecdotal, but not an anecdote
Theories, models and the scientific method
● Theory: “Set of interrelated concepts (constructs), definitions, and propositions that
specify relationships among variables and represent a systematic view of specific
phenomena” -Portney & Watkins (2009)
● The Scientific Method:
○ Have preconception (theory)
○ Make prediction from preconception (hypothesis)
○ Conduct experiment/obtain data to compare
○ Update preconception from data (induction/deduction)
Definitions
● Variables - anything you observe or measure
● Construct - a useful “idea” represented by 1 or more variables
○ A connection between variables that we measure
● Hypothesis - A prediction of what you think is going to happen
● Theory - systematic synopsis of interrelated constructs/variables
● Model - simplified representation of a complex phenomenon - they can incorporate
multiple theories
Health and Well-being Theory
● Theory - Health and Well-Being Theory
↕
● Constructs - Health, well-being
↕
● Variables - BP, cholesterol, stress, anxiety
Why do we need theories?
1. Summarizes existing knowledge/observations
a. Organizes multiple studies and ideas
b. Induction: specific observations to generalizations
2. Predicts future events
a. That can or cannot be observed
b. Deduction: general theory to specific instances
3. Generates hypotheses
a. Tests of hypotheses support or refute theories
b. Stimulate development of new knowledge
MODULE 2: PICOT APPROACH
The scientific method
● Make an observation
● Formulate a question
● Formulate a hypothesis
● Design an experiment
● Execute the experiment
● Analyse the results
● Draw conclusion
● Formulate a new hypothesis
Development of a research question
● Identify the problem you wish to examine
● Could come up with the problem based on:
○ Clinical problem
○ Literature review
○ Research theory
○ Prior observation(s)
● The answerable question has to be defined in terms of population, intervention, and
outcome
A model for developing a research question - PICOT
● A useful model to help structure an answerable question
● Used to formulate clinical or research questions
● Breaks down question into four/five key elements
● POPULATION
○ Who is the group who will participate in your research?
○ How do you describe the group?
○ To whom do you want to apply the research results?
■ Can be defined in many ways:
● Disease or condition
● State of illness
○ Often in clinical research
● Region
● Demographics
● Risk factors
○ Often in epidemiological research
○ Target population
■ Identify the population that you wish to study and hope to apply the
results of your study to
○ Accessible population
■ The portion of the target population from which you are able to recruit
participants
○ Sample
■ The participants who you recruited and who met your inclusion criteria
● VOLUNTEERS (typically what we have to rely on)
Intervention/exposure
○ Intervention refers to the treatment that participants in your study will receive
○ What is it you want to know the effect of
○ Sometimes you don’t have to do anything!
■ The intervention group (or exposure group; or treatment group) does
NOT have to be a manipulated intervention. It can refer to
self-selected, identifiable groups or other classifications.
● Manipulated
○ Surgical procedure
○ Pharmaceuticals
○ People who exercise vs. sedentary (“force” people to
train)
○ Rehabilitation technique
● Not Manipulated
○ People who live near power lines vs. those that do not
○ People who exercise vs. sedentary (self-classified or
researcher classified)
○ People who eat a ‘Mediterranean’ diet vs. a ‘North
American’ diet
○ Older adults vs. younger adults
○ High school vs University education
Control group or Comparison group
● Typically, the intervention group(s) and control group(s) are two (or more) levels of
the same variable. For example:
● In some studies, there is no comparison between groups
For example, what is the prevalence of cardiovascular disease in Canada?
Examines an outcome (cardiovascular disease) in a population (Canadians)
No intervention or control groups
• Control groups need to be carefully constructed. Sometimes more than one type of control
group is necessary. For example:
What is the mean of the compression force of the LBP group
● 16N/Nm
What is the standard deviation of the compression force of the LBP group?
● ± 6 N/Nm
The compression force of the asymptomatic group is statistically lower than that of the LBP
Group (p < 0.05)
● No
OUTCOME
● Dependent variables
● Discrete/Categorical
○ Typical to have subsets
○ Hospitalisation rate
○ Education level
○ IVD degeneration stage
○ Surgical requirements
● Continuous
○ Wide variety of outcomes typically have relevant units
■ VO2 max
■ HR
■ Force
TIME
● How long do you want to follow the outcome
● Can be over time:
○ Longitudinal
● Can be at one specific point in time
○ Cross-sectional
Time as an intervention/control
● Time can also be an intervention/control
○ Eg., Is the prevalence of dementia in Canada higher in 2021 than in 1991?
■ Population: Canadians
■ Intervention (exposure)/Control: 2021 vs. 1991
■ Outcome: Prevalence of dementia
■ Time: 30 years
PICOT practise
● You were curious if there was a difference in vaccination rates of COVID vaccine
eligible adults based on geographical location. You examined public health records
from 7 different health units in the Province spanning March 2021 to March 2022
○ P: Adults in Ontario
○ I: Geographical location (7 different health units)
○ C: NO CONTROL
○ O: Vaccination rates
○ T: 1 year (March 2021 to March 2022)
● A researcher was interested in how the pandemic has affected youth outdoor
physical activity in the Waterloo Region. They have activity data from 2018, 2019,
2020 and 2021
○ P: Youth in the Waterloo Region
○ I: Pandemic, 2018, 2019, 2020, 2021
○ C: Pre-pandemic (2018, 2019)
○ O: Outdoor physical activity
○ T: 4 years, each assessed individually
How do we develop a research question?
● What are the effects of aerobic and light resistance exercise over a 12-week period
on adults aged 65+ with acute stroke, and their ability to perform a 6-minute walk, as
compared to standard care?
● Does a 10-week hyaluronic acid injection and conservative treatment program
decrease shoulder elevator muscle fatigability for adults >40 with rotator cuff tears
compared to conservative treatment alone?
● P: resistance-trained women
● I: ketogenic diet
● C: non-ketogenic diet
● O: health parameters (VAT, BMC, BMD, BP)
● T: 8 weeks
○ How does a ketogenic vs. non-ketogenic diet affect health parameters of 21
resistance-trained women over the course of 8 weeks?
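The step from PICOT elements to a written question can be sketched in code. This is only an illustration, not part of the course material: the `Picot` class, its field names, and the sentence template are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Picot:
    """Hypothetical container for the five PICOT elements."""
    population: str
    intervention: str
    comparison: str
    outcome: str
    time: str

    def as_question(self) -> str:
        # One possible sentence template; real questions are usually hand-edited.
        return (
            f"In {self.population}, how does {self.intervention} "
            f"compared with {self.comparison} affect {self.outcome} "
            f"over {self.time}?"
        )


# Elements taken from the ketogenic diet example in these notes.
keto = Picot(
    population="resistance-trained women",
    intervention="a ketogenic diet",
    comparison="a non-ketogenic diet",
    outcome="health parameters (VAT, BMC, BMD, BP)",
    time="8 weeks",
)
print(keto.as_question())
```

Writing the elements out separately first makes it easier to spot a missing piece (for example, a study with no comparison group) before the question is drafted.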
Research Hypotheses
● Null hypothesis (H0):
○ A sample is representative of (equal to) a population
○ Intervention (exposure) & control groups will have the same outcome
○ No difference between groups or there’s no effect of an intervention/exposure
● Alternative (research) hypothesis (H1):
○ This hypothesis contradicts the null hypothesis
○ A sample does not represent (differs from) a population
○ The outcome of the intervention & control groups will differ
Research Hypotheses and Directionality
● Non-directional:
○ “There is a difference between the groups”,
○ “Blood pressure will be altered”,
○ “There will be a change in blood pressure”
■ Predict a change or difference in the outcome measure, but you do not
specify which way that change is/will go
● Directional:
○ “Quitting smoking will reduce cancer risk”
○ “Grade 3s are taller than Grade 1s”
○ “Exercise will lower blood pressure”
■ Predict the direction of the change or difference in the outcome
measure between the groups
Definitions - Variables
● Independent variable
○ This is what a researcher typically manipulates
○ it is selected by the researcher to determine its relationship/effect on some
other observed variable
○ The independent variable is plotted on the x-axis of a graph
● Dependent variable
○ This is what is measured
○ This is the outcome of interest as selected by the researcher
○ The dependent variable is plotted on the y-axis of a graph
Definitions - Types of variables
Control Variables
● These are variables that are held constant by the researchers
● The goal is to minimise the effects that these variables might have on the dependent
variable or other aspect of the study
Confounding Variable
● These are variables other than independent variable that may have an effect on the
dependent variable
● They can lead to erroneous conclusions about the relationship between the
independent and dependent variables
Intervening Variable
● Is a conceptual variable
● Difficult to define/measure
● For example: health
Reliability – in relation to measurement
● Reliability – sometimes referred to as repeatability or precision
● A researcher should consistently get the same output when providing the same input
or performing the same measurement.
○ E.g., you measure the body mass of a participant 3x and get 70.1kg, 70.1kg,
70.2kg
● True reliability occurs when our measurements are consistent and free from random
errors
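A rough way to put a number on repeatability is to summarise the spread of the repeated measurements. The sketch below uses the three body-mass readings from the example above; using the sample standard deviation and the coefficient of variation (CV) as the summary is an assumption for illustration, not a method prescribed in these notes.

```python
from statistics import mean, stdev

# Three repeated body-mass measurements from the example (kg).
measurements_kg = [70.1, 70.1, 70.2]

avg = mean(measurements_kg)
sd = stdev(measurements_kg)       # sample standard deviation (n - 1 denominator)
cv_percent = 100 * sd / avg       # spread relative to the mean, in percent

print(f"mean = {avg:.2f} kg, SD = {sd:.3f} kg, CV = {cv_percent:.2f}%")
```

A small SD relative to the mean suggests consistent (reliable) measurements, but it says nothing about whether the scale is measuring the true body mass (validity).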
Factors affecting test-retest reliability
● Effects of testing
○ Participants ‘learn’ and perform better on subsequent trials
● Effects of Test/Retest intervals
○ Too much rest = boredom
○ Too little rest = fatigue (physical, mental)
● Rater Bias
○ People will perform measurements slightly differently
○ Same person should measure the outcome on all participants
● External Factors
○ Ambient conditions
○ Noise
○ Temp
○ Distractions
P - 89 men and women aged between 30 and 65 years old with T2DM and body mass index
between 30 and 35 kg/m^2
I - VLCK diet
C - standard low-calorie diet
O - safety and tolerability
T - 4 months
Independent variable - Diet
Dependent variable - weight loss
Control variable - age, sex, BMI
Confounding variable - physical activity, years with disease, smoking
Validity
● In the context of measurement: Validity is the extent to which an instrument
measures what it is intended to measure
○ Determines the ‘believability’ or ‘trueness’ of results
● Can a measurement device be reliable but not valid?
○ YES!
○ Sometimes an instrument can be reliable but may not be measuring exactly
what you are intending to measure
Measurement Reliability and Validity
● Can we have a reliable outcome measure that is not valid? YES
● Can we have a valid measure that is not reliable? NO
[Figure panels, in order: neither reliable nor valid; reliable but not valid; reliable and valid]
Internal validity is the degree to which a study establishes a cause-and-effect relationship
between the treatment (independent variable) and the outcome (dependent variable)
● Threats to Internal Validity
○ Selection - the groups are not equivalent
■ A difference between groups could have been present at the start
○ History - it refers to some event/effect during the study, other than the
independent variable that influenced the dependent variable
○ Maturation - developmental (physical, mental) changes occur in participants
during the study, which may influence the dependent variable
○ Testing - if multiple trials are performed, participants might improve the more
trials they complete
■ Especially important with new/unfamiliar tasks
○ Instrumentation - Instruments could be unreliable. Instruments are not valid.
Observer/rater bias
○ Attrition - Unequal loss of participants from groups after random assignment
has occurred
How to control for/mitigate threats to internal validity
RANDOM ASSIGNMENT (mitigates) - hallmark
of experimental research
Attrition/withdrawal/drop-out - intention-to-treat analysis: all data are analyzed by
originally assigned group, regardless of subjects dropping out or receiving a treatment
when they should really be in the control group
Other requirements/considerations
● Control group
● Pre-test and post-test
● Masking
External Validity
● External validity refers to whether causal relationships can be generalised to different
measures, persons, settings, and times
● In other words, how generalizable/applicable are the findings to a wider setting (eg.,
● Threats to external validity
○ Selection of participants - if the sample is not representative of the population
from which it was drawn, the generalizability is reduced
○ Selection of Treatment - if the treatment is not likely observed or found in the
“real world”
○ Multiple treatment effects - if multiple treatments are applied to an individual,
a prior treatment might influence the next treatment
○ Repeated testing - a pretest (or repeated testing) can affect the participant’s
responsiveness to the independent variable
Ways to mitigate threats to external validity:
○ Random sampling is randomly drawing people from a target population to
participate in your research
○ Selecting an appropriate research design may reduce the multiple treatment
and testing threats. Washout periods will help mitigate the multiple treatment
effect.
What is critical appraisal of research
● Critical appraisal is the process of carefully and systematically examining research to
judge its trustworthiness, and its value and relevance in a particular context. It is an
essential skill for evidence-based medicine because it allows people to find and use
research evidence reliably and efficiently.
Modern Day Research Ethics
● Social and clinical value
● Scientific validity
● Fair subject selection
● Favorable risk-benefit ratio
● Independent review
● Informed consent
● Respect for potential and enrolled subjects
Social and clinical value
● Will answering the research question have significant value for society or for present
or future patients with a particular illness
● The answer to the research question should be important or valuable enough to
justify some risk
● Only if society will gain useful knowledge - which requires sharing results both
negative and positive - can exposing human subjects to the risk and burden of
research be justified
Scientific Validity
● Is the question researchers are asking answerable? Are the research methods valid
and feasible? Is the study designed with a clear scientific objective and does it use
accepted principles, methods, and reliable practices?
● Statistical power must be sufficient to definitively test the objective. Invalid research is
unethical because it is a waste of resources and exposes people to risk for no
purpose.
Fair subject selection
● The primary basis for recruiting and enrolling groups and individuals should be the
scientific goals of the study — not vulnerability, privilege, or other factors unrelated to
the purposes of the study.
● Consistent with the scientific purpose, people should be chosen in a way that
minimises risks and enhances benefits to individuals and society
Favourable risk-benefit ratio
● Risks can be physical (death, disability, infection), psychological (depression,
anxiety), economic (job loss), or social (for example, discrimination or stigma from
participating in a certain trial).
● Has everything been done to minimize the risks and inconvenience to research
subjects?
● Do the potential benefits outweigh the risks?
Independent review
● To minimize potential conflicts of interest and make sure study is ethically acceptable
before it even starts, an independent review panel with no vested interest in the study
should review the proposal and ask important questions.
Informed consent
● For research to be ethical, most agree that individuals should make their own
decision about whether they want to participate or continue participating in research.
● This is done through a process of informed consent in which individuals (1) are
accurately informed of the purpose, methods, risks, benefits, and alternatives to the
research, (2) understand this information and how it relates to their own clinical
situation or interests, and (3) make a voluntary decision about whether to participate.
● There are exceptions to the need for informed consent from the individual — for
example, in the case of a child, of an adult with severe Alzheimer’s, of an adult
unconscious by head trauma, or of someone with limited mental capacity.
Respect for potential and enrolled subjects
● Individuals should be treated with respect from the time they are approached for
possible participation—even if they refuse enrollment in a study—throughout their
participation and after their participation ends.
● Keeping their private information confidential.
● Respecting their right to change their mind, and to withdraw without penalty.
● Informing them of changes to the risks and benefits of participating.
● Monitoring their welfare and, if they experience adverse reactions, untoward events,
or changes in clinical status, ensuring appropriate treatment and, when necessary,
removal from the study.
● Informing them about what was learned from the research
MODULE 3: RESEARCH COHORT AND CASE CONTROL STUDIES
No design is “superior” to any other...
● but there is often a “most appropriate” design for a specific scenario
Research Types
● Basic research
○ Conducted to increase knowledge and fundamental understanding of the
physical, chemical, and functional mechanisms of life processes and disease.
It is not directed to solving any particular problem in humans or animals
● Applied research
○ Involves the application of existing knowledge, much of which is obtained
through basic research, to solve a practical problem.
● Clinical Research
○ Patient- or end user-oriented research with human subjects. Patient-oriented
research includes:
■ Mechanisms of human disease
■ Therapeutic interventions
■ Clinical trials
■ Development of new technologies
● Translational research
○ Part of a unidirectional continuum in which research findings are moved from
the researcher’s bench to the patient’s bedside and to the community
Research Design Definitions
● Descriptive
○ Describes an outcome in a population. Characterises who, where, or when in
relation to the what (the outcome of interest)
■ Eg., oh the sky is blue!
● Analytical
○ Examines the relationship between intervention and outcome
○ Test hypotheses
○ The “how” and the “why”
■ Eg., why is the sky blue
● Qualitative
○ Subjective/interpretive observations
○ Identifies themes in observations - forms narrative/story/essay
○ Does not test a hypothesis, but may lead to hypothesis development
● Quantitative
○ Objective, measurable, units
○ Test hypothesis
■ Require statistical analysis
Contrasting qualitative and quantitative
● Qualitative Strengths
○ Generates new ideas, hypotheses
● Quantitative Strengths
○ Tests hypotheses and allows us to examine cause and effect relationships
Quantitative Research Designs
● Observational
○ Is non-manipulated studies/research
○ Researchers do not attempt to influence/manipulate participants or the
surroundings
● Experimental
○ Is a manipulated study
○ Participants are randomised to receive intervention or control
● Quasi-experimental
○ Lacking 1 or more element of experimental research
■ Eg., you want to do an elementary school study with 2 classrooms,
you're not randomising, you are splitting them off based on how they
are already split
The utility of observational research
● Studying the otherwise un-study-able
○ When manipulation of an exposure (independent variable, IV) is not possible,
not practical, too complex
○ Cannot manipulate for ethical/logical reasons e.g., toxin exposure, education
level/attainment
○ Cannot manipulate the variable of interest e.g., age, sex, personality
● Prioritising external validity
○ Laboratory manipulation does not well-represent real-world phenomena.
■ Eg., treadmill walking vs outdoor walking, “social engagement”
● Generating research questions
○ Scientific method - observe, then develop a question
Observational Research Design - Time element
Advantages
● Less expensive
● Less likely to drop out
● Controls for ‘period effects’
● Data on ALL variables are collected at one time
Disadvantages
● Do not know whether exposure(s) happened before or after outcome
● Associations identified between variables may be difficult to interpret
● “Snapshot” timing not guaranteed to be reflective of ‘real-world’ settings
Observational Research Design
● Collected multiple times = repeated measures
● Could be 5 months, 5 years, 10 years, etc.
Advantages
● You may observe patterns in the outcome (Dependent variable, DV) over time
● Establishes an order of events
● Reduces recall bias of participants
● May provide insight into causal mechanisms
○ Cannot establish causation definitively
Disadvantages
● Time consuming and expensive
● Usually requires a large sample size
● Affected by ‘cohort effects’
○ Eg., generational cohorts (gen z, gen y), Individuals are affected differently
based on when they were born
● Cannot be used to suggest causation - only associations
● Despite temporal aspects - may not know if exposure precedes outcome
Difference between prevalence and incidence
● Prevalence refers to the total number of individuals in a population who have disease
or health condition at a specific period of time, usually expressed as a percentage of
the population.
○ Who has the disease now?
● Incidence refers to the number of individuals who develop a specific disease or
experience a specific health-related event during a particular time period (such as a
month or year).
○ Who will develop the disease over time?
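The difference between the two measures can be sketched with a small calculation. All counts below are hypothetical, and expressing incidence as a proportion of the people at risk at the start of the period (cumulative incidence) is one common convention assumed here, not something specified in these notes.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of the population that has the condition right now."""
    return existing_cases / population


def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Proportion of initially disease-free people who develop the condition."""
    return new_cases / at_risk_at_start


# Hypothetical town, followed for one year.
population = 10_000
existing_cases = 500          # already have the condition at the start
new_cases_in_year = 95        # develop it during the year

at_risk = population - existing_cases   # only disease-free people can become new cases
prev = prevalence(existing_cases, population)
inc = cumulative_incidence(new_cases_in_year, at_risk)

print(f"prevalence = {prev:.1%}, one-year incidence = {inc:.1%}")
```

Note that the denominators differ: prevalence divides by everyone, while incidence divides only by those who could still develop the condition.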
Observational Study Designs Discussed in KIN 232
● Case-control study
○ Participants are selected based on an OUTCOME of interest (eg.,
hypertension)
● Cohort study
○ Participants are selected based on a POPULATION of interest
● Definition of a cohort:
○ A collection or sampling of individuals who share common experience and/or
characteristics, such as age, sex, activity level, location, education etc.
■ Examples:
● Birth cohort: group born at the same time
● Geographic cohort (residents of the same area)
● Historical cohort (group exposed to the same historical event)
Cohort Studies
● Participants are recruited based on cohort
● A cohort study is a longitudinal study.
○ It may be:
■ Prospective:
● Recruit participants and track them forward in time
● Outcome is evaluated in the future
■ Retrospective:
● Recruit your participants and identify past/historical exposures
○ ONLY exposure of the particular participant (no family)
● Outcome is evaluated at time of recruitment (present day)
Cohort Studies - prospective vs retrospective
Advantage of Cohort Studies
● Longitudinal (time element)
○ Can determine temporal sequence of risk factors versus outcome
● Best external validity
○ More likely to be representative of ‘real-life’ scenario/environment
● Representative
○ If the population is appropriately sampled, the risk estimates may be
generalizable to the population
● Multiple Exposures/Outcomes
○ Often multiple exposures and outcomes are evaluated within one study
Disadvantages of Cohort Studies
● Large sample is required (in order to capture outcomes e.g., heart attack in FHS)
● Expensive - participant compensation, researchers/staff
● Attrition Bias
○ Over time, people will drop out
○ Tends to be people who are most sick
■ Both affect outcomes/results
● Measurement Bias:
○ If measurement methods change over time, this may alter rate/risk estimates.
Hard to measure certain variables consistently over time.
● Poor Internal Validity
Recruiting people from a population
● Researchers wish to study individuals with cystic fibrosis (CF)
● Prevalence ≈ 0.033% (a proportion of about 0.00033). Thus, 1 CF case for every 3000 participants
● How many people would you need to recruit if you wanted to capture 100 CF cases
in your study?
■ 300,000
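The recruitment arithmetic above can be checked with a short sketch. The function name `required_sample` is made up for this example, and it assumes cases turn up in the sample at exactly the population rate, which a real study could not count on.

```python
import math
from fractions import Fraction


def required_sample(target_cases: int, prevalence: Fraction) -> int:
    """Expected number of recruits needed to capture `target_cases` cases."""
    # Fractions keep the division exact, so ceil() is not thrown off by float rounding.
    return math.ceil(target_cases / prevalence)


cf_prevalence = Fraction(1, 3000)     # 1 CF case per 3000 people
n = required_sample(100, cf_prevalence)
print(n)
```

This is why cohort-style recruitment from the general population is impractical for rare outcomes.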
Case-Control Studies
● Deliberately recruit participants based on outcome status (diagnosis). Then, you can
assess how exposures affected that outcome.
○ Case: have the outcome/disease of interest
○ Control: Do not have the outcome/disease
● Advantages - will capture even the rarest disease
○ Fewer people
○ Cheaper
Recruitment
● Recruited based on having some outcome
● Researchers can recruit in different ways:
○ Incident cases
○ Prevalent cases
Where can you recruit from?
● Sources of cases:
○ Population-based
■ Recruit from all cases in the population
● Strengths (external validity):
○ Cases are representative of population
○ Results are generalizable to population
● Weaknesses:
○ More difficult to recruit
○ How do we find the cases?
○ Hospital-based
■ Drawn from cases admitted and treated in a hospital.
● Strengths:
○ Easy to identify cases
○ Access to medical records/history
● Weaknesses:
○ Typically more sick
○ May be different in other ways compared to general
population
Control Group Recruitment
● How do we identify appropriate controls?
○ Want to match the controls as closely to the cases as possible
■ Age, sex, ht., wt., smoking/non,
● Often recruit from family or friends
○ Advantage: Often similar in many factors
○ Problems:
■ May not generalize to population
■ May be too similar to cases to find differences
Challenges in Case-Control Studies
● Selection Bias
○ Selection bias occurs when the subjects studied are not representative of the
target population about which conclusions are to be drawn.
● The way that cases and controls are recruited alters the relationship between
intervention/exposure & outcome
○ Control selection and case selection must NOT be based on exposure history
○ E.g., If fish consumption is the exposure, and blood pressure is your outcome
of interest, then recruiting from a fishing village would be inappropriate
● Recall Bias
○ Recall bias is a type of information bias common in case-control studies
where the cases (or their families) are more likely to recall a prior exposure
than the controls.
■ It is all about exposure and outcome
● For example:
○ Cancer patients may be more likely to recall exposure to potential
toxins/carcinogens
○ Diabetic patients may be more likely to remember poor diet/lack of physical
activity
● Misclassification
○ Non-differential misclassification
■ Cases and controls are misclassified equally
■ Will make detection of a true effect less likely
○ Differential misclassification
■ Only one group (cases or controls) is misclassified
■ Can alter the magnitude and/or direction of the effect
Case-Control Study Design
● Case-control studies can be:
○ Cross-sectional, or
○ Longitudinal
■ Longitudinal case-control studies are always retrospective
■ i.e., exposures were evaluated/reported in the past
Some important results from observational studies
● Absolute risk is the actual risk of some event happening given the current exposure.
There is no comparison between groups. For example, if 1 in 10 individuals with
exposure develop the disease then the absolute risk of developing the disease with
exposure is 10%
● Absolute risk = Outcome/(Outcome + Control)
● Absolute risk (Exposed) = 983/(983+4467) = 0.180 or 18%
● Absolute risk (Not Exposed) = 85/(85+7941) = 0.0106 or 1.06%
Relative risk
● Example: the relative risk of developing lung cancer (outcome) in smokers (exposed
group) versus non-smokers (non-exposed group) would be the probability of
developing lung cancer for smokers divided by the probability of developing lung
cancer for non-smokers.
● Relative Risk (RR) = [OE/(OE+CE)]/[ON/(ON+CN)]
● [983/(983+4467)]/[85/(85+7941)] = 17.03x higher in Exposed group
Odds Ratios (does not take into account all of the people)
● The odds ratio (OR) is a measure of how strongly an event is associated with
exposure. The odds ratio is a ratio of two sets of odds: the odds of the event
occurring in an exposed group versus the odds of the event occurring in a
non-exposed group. The larger the odds ratio, the higher odds that the event will
occur following exposure. A ratio equal to 1 means there’s no association between
exposure and event (disease).
● Odds Ratio (OR) = (OE/CE)/(ON/CN)
● (983/4467)/(85/7941) = 20.56 greater odds
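The three measures can be computed side by side from one set of 2×2 counts. The sketch below reuses the counts from the worked example (983 and 4467 in the exposed group, 85 and 7941 in the non-exposed group); the function names are chosen for illustration, and the argument names follow the OE/CE/ON/CN notation of the formulas above.

```python
def absolute_risk(outcome: int, no_outcome: int) -> float:
    """Risk within one group: outcome / (outcome + no outcome)."""
    return outcome / (outcome + no_outcome)


def relative_risk(oe: int, ce: int, on: int, cn: int) -> float:
    """Risk in the exposed group divided by risk in the non-exposed group."""
    return absolute_risk(oe, ce) / absolute_risk(on, cn)


def odds_ratio(oe: int, ce: int, on: int, cn: int) -> float:
    """Odds of the outcome in the exposed group divided by odds in the non-exposed group."""
    return (oe / ce) / (on / cn)


# OE/CE: outcome / no outcome among the exposed; ON/CN: same for the non-exposed.
oe, ce = 983, 4467
on, cn = 85, 7941

ar_exposed = absolute_risk(oe, ce)
ar_not_exposed = absolute_risk(on, cn)
rr = relative_risk(oe, ce, on, cn)
odds = odds_ratio(oe, ce, on, cn)

print(f"AR exposed = {ar_exposed:.3f}, AR not exposed = {ar_not_exposed:.4f}")
print(f"RR = {rr:.2f}, OR = {odds:.2f}")
```

The odds ratio comes out larger than the relative risk here because odds divide by the people without the outcome rather than by the whole group.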
Parkinson’s Study
● Type of study: Case-control longitudinal - retrospective
● Exposure/event: Appendectomy (yes/no)
● Population: 62.2 million patients
● Absolute Risk (Exposed): 4470/488,190 ≈ 0.00916 (0.92%)
● Absolute Risk (Not Exposed): 177,230/61,700,000 ≈ 0.00287 (0.29%)
● Relative Risk: 0.00916/0.00287 ≈ 3.19x
Does living in a fishing community affect the risk/odds of developing mercury poisoning?
● Absolute risk (Exposed) = 578/(578+14458) ≈ 0.0384 (3.84%)
● Absolute risk (Not Exposed) = 13/(13+15012) ≈ 0.000865 (0.087%)
● Relative Risk (RR) = [578/(578+14458)]/[13/(13+15012)] ≈ 44.43x
Do elite athletes have an elevated risk/odds of developing irregular heart rhythms?
● Absolute risk (Exposed) = 5806/(5806+19448) ≈ 0.230 (23.0%)
● Absolute risk (Not Exposed) = 354/(354+748) ≈ 0.321 (32.1%)
● Relative Risk (RR) = [5806/(5806+19448)]/[354/(354+748)] ≈ 0.72x
MODULE 4: EXPERIMENTAL DESIGNS
● Specifically testing hypotheses in a systematic manner
○ Deliberate consideration of variables (i.e., DV, IV, Control, Confounding)
○ A major aim is to examine a cause & effect relationship
■ (does x cause y?)
1. Manipulation of variables
2. Control Group
3. Random Assignment
Manipulation of variables
● Independent Variables:
○ In experimental research the independent variable is chosen/manipulated by
the researcher. Whereas, in observational research the independent variable
is not manipulated.
● Dependent Variables: Affected by the independent variable.
● Experimental Research:
○ Participants are assigned to an interventional group (many possible groups)
○ Researcher manipulates the level of the independent variable by group (more
on this to come!)
Types of independent variables
● Active Variable:
○ An IV that CAN be manipulated (CAN BE, NOT IS MANIPULATED)
■ Eg., drug dosage, exercise intensity
● Attribute Variable
○ An IV that CANNOT be manipulated
■ Eg., genes, geographical location, disease presence, age, sex
**if primary comparison of interest is an attribute variable, then an experimental design is
impossible as you cannot manipulate the independent variable
● Experimental research must have 1 active variable
Control groups
● The group you compare your intervention group to
○ Control may be:
■ Standard treatment
■ Placebo
■ No treatment/no intervention
■ Sham surgery
○ What are the characteristics of a good control group?
● There are potentially some ethical issues with control group design
○ By design, experimental research involves withholding experimental
intervention from one group
■ Is the experimental intervention known to be beneficial?
■ Is the control group required to abstain from standard care?
■ Sometimes, the control group receives the experimental intervention
after some delay (at the end of the trial) if it is deemed
beneficial.
Random Assignment
● Random Assignment = Every participant has the same chance of being assigned to
any group in the study
○ Does NOT mean that each participant has a match in the other group
○ Randomization would guarantee traits balance out in an infinite population but
not in a finite sample – we assume there is balance (measure attribute
variables to confirm)
○ Researchers compare groups by important traits & extraneous factors to
make sure they are similar
○ Chance of being assigned to a given group does not depend on:
■ Time of recruitment (early or late in the study)
■ Location of recruitment (one site vs. another)
Matching vs. Blocking vs. Random Assignment
● Can randomize within blocks of a trait (sex, age, etc.)
○ Considered randomization
● Matching
○ Some studies match on key variables (sex, age, etc.)
■ Not considered randomization
…But experimental design can’t do everything
● Limitations of experimental designs
○ Impossible if you cannot manipulate variables
○ Tight control over extraneous variables may limit generalizability
■ Relatively poor EXTERNAL VALIDITY
● Often difficult/expensive to manipulate intervention
Experimental Design
● Within the three factors associated with experimental designs (above), there are
several options that can be incorporated into the design.
○ 1. Masking (blinding)
○ 2. Number of assessments
○ 3. Number of experimental groups
○ 4. Number of independent variables
Masking
● Certain individuals involved in the experiment are prevented from knowing (i.e.,
masked) certain information about the experiment.
○ Generally refers to being ‘masked’ to group allocation (e.g., control vs.
treatment)
● Masking Participants
○ May have the expectation that experimental group will have better (or worse)
outcome
○ May influence participant’s behaviour or performance, thus influencing results
● Masking Researcher(s) Administering Intervention
○ Avoiding bias
○ Avoiding mistakes when administering the intervention that give away group
allocation
■ Eg., “oh here is your placebo”
● Masking of Researcher(s) Evaluating Outcome
○ Expectation that experimental group will have better (or worse) outcome may
influence measurement, especially where the measure is not entirely objective
Placebo
● Definition:
○ A ‘dummy’ treatment administered to the control group to distinguish specific
and nonspecific effects of the experimental treatment
○ Participants are generally masked to group assignment by administration of a
“placebo condition”
○ The Placebo Effect
■ Definition:
● Observable, measurable, tangible effect that has nothing to do
with the actual intervention because you did not receive it
■ Placebos have identified effects, especially in cases of pain, depression,
insomnia, anxiety
Number of assessments: Pre/post
● The most basic design is a pre/post evaluation
○ Randomised Controlled Trial
Number of assessments: Post-only
● What is the disadvantage of this design?
○ Are groups equivalent?
■ Likely because of random assignment but not 100% sure because we
lack a pre-test
● When would you use it?
○ Time
○ Learning - how you respond to measurement
■ Eg., injury
Number of assessments: Repeated Measures
● Third measurement is added
R O O/X O
R O O O
R O X O O
R O O O
● X = intervention
● R = random assignment
● O = observation
● No X = control group
● How long does an effect take to disappear (how long does it linger)?
○ Eg., resistance training (how long does the hypertrophy last?)
Understanding comparisons
Between Group Comparisons
1) Were the groups the same/different at the start of the study (“pre”)?
a) Compare the red and black bar in the pre-test
b) The same
2) Were the groups the same/different at the end of the study (“post”)?
a) Compare the red and black bars in the post-test
b) different
Within Group Comparisons
3) Did the control change pre vs. post?
a) Red bar = the control
b) Compare the red bars between pre and post
4) Did the experimental group change pre vs. post?
a) Compare the black bars between pre and post
R O   O
R O X O
R O X O
● What is the advantage of this design over the standard two-group design?
○ Determining if more of the independent variable is better or worse
■ Eg., What is the dose effect?
Repeated Measures between subject
● Between subjects design:
○ Participants are assigned to and only receive 1 level of the independent
variable
○ Examines the variability and differences BETWEEN groups
Repeated Measures Within Subject Design
● Within Subject Design:
○ Participants receive every level of the independent variable
○ Examines the variability and differences WITHIN a person, and of course
examines the effect of the independent variable.
○ The different levels of the independent variable can be given to the
participants in a randomized or fixed order
Crossover Design
● Cross-over design is similar to the repeated measures design, but allows for a
wash-out period in between interventions:
○ Wash-out period = effect of treatment ‘washes out’ or diminishes to baseline
○ Reduces the likelihood of carryover effects
○ Preferred when only two interventions are used
Kyle UG, Genton L, Hans D, Karsegard L, Slosman D, Pichard C. Age-related differences in
fat-free, skeletal muscle, body cell mass and fat mass between 18 and 94 years of
age. Eur J Clin Nutr 2001;55:663–72.
Midterm question on board (you have picture)
● P - 28 males, overweight/obesity, 18-45yrs
● I - HIIT/MICT
● C - No control
● O - Blood Pressure, central aortic stiffness
● T - 6 weeks
MODULE 5: QUASI-EXPERIMENTAL DESIGN
Quasi experiments
● Used under conditions where true experimental designs are not possible (usually
clinical conditions) or are too expensive.
● Cannot be used to infer causation as conclusively as true experimental trials
○ Poorer internal validity
● Most useful if:
○ Control is exerted where possible
○ Masking (blinding) procedures are used
Quasi-Experimental: Non-equivalent Pre-test-Post-test control group design
O X O
O   O
● A nonequivalent groups design is a between subjects design in which participants
have not been randomly assigned to groups
Think about it....
● What is/are the independent variables in this design and what are the levels of
the independent variable?
○ Iron supplementation
○ Time (pre + post test)
● What are the weaknesses in this design?
○ The groups may not be equivalent
○ However, there was a pre-test
● Why might you use this design?
○ When you have pre-existing groups (eg., grade 3 classrooms)
The independent variable is time
Let’s think about it....
● What is the independent variable in this design?
○ Time - there are 2 levels -> pre, post
● What are the weaknesses in this design?
○ No control group - poor internal validity
● Why might you use this design?
○ Ethical
○ Time frame is really short - pilot studies
● Is there a “treatment”?
○ Yes
● Does everyone in the study receive the same “treatment”?
○ Yes
Let’s revisit the box approach to laying out research designs.
When there is a control group (Figure 1), there are two independent variables: time
and supplementation.
When the control group is removed, supplementation is no longer considered an
independent variable, yet it is a “treatment”.
This is driven by the statistical perspective: the statistical comparison is being
made within the factor “Time”.
Quasi-experimental Design: Repeated Measures
● These repeated measures designs may include:
○ Pre/Post
○ Interim assessment(s)
○ Follow-up assessment(s)
Comparisons: Internal Validity
● Control vs. single group quasi-experimental designs
● Better internal validity when a control group is included in design
● What threats to internal validity does incorporation of a control group help rule out?
○ History
○ Testing
○ Instrumentation
Time-Series Design
● Multiple measures to document patterns or trends in behaviour over time
○ Purpose is to determine trend in outcome over time
● What type of study is this (at the highest level)?
○ Observational
● In order to make this into a quasi-experimental design, what needs to happen?
○ Manipulate variables, Control
Interrupted Time-Series Design
● Interrupted Time-Series:
○ Time-series is “interrupted” by an intervention within the series of
measurements
○ Used for population level interventions (including policy changes)
■ Randomization not generally possible, thus, quasi-experimental
■ May or may not have a control group
■ Repeated measures before and after intervention
Interrupted Time Series Design: Single Group
● Intervention to reduce inappropriate prescription of antibiotics
○ Intervention: Tracked antibiotic prescriptions by patient/New policy regarding
antibiotic prescriptions
● Drug prescriptions were assessed monthly for 2 years pre and post intervention
Interrupted Time-Series Control Group Design
● Two-groups enrolled
● Repeated measures before and after intervention
● One group receives intervention, other does not
Advantages/Disadvantages
● Advantage:
○ Time-Series: Multiple measurements enhance confidence that there was a
real change.
■ Versus normal variation
■ Versus repeated testing effects
● Disadvantages:
○ Lack of randomization and/or control still reduces internal validity relative to
experimental design
○ Also, may be a confounding (e.g., historical) event that occurred at same time
as intervention
■ E.g., Smoking, parent/family member died/cancer
Randomized controlled trials, randomized clinical trials, and clinical trials
● These terms are used interchangeably, which creates confusion.
● Randomized controlled trial: A study design that randomly assigns participants into
an experimental group or a control group. As the study is conducted, the only
expected difference between the control and experimental groups in a randomized
controlled trial (RCT) is the outcome variable being studied. (George Washington
University)
● Randomized clinical trial: A study in which the participants are divided by chance into
separate groups that compare different treatments or other interventions. Using
chance to divide people into groups means that the groups will be similar and that the
effects of the treatments they receive can be compared more fairly. At the time of the
trial, it is not known which treatment is best. (National Cancer Institute)
● Clinical trial: A research study in which one or more human subjects are
prospectively assigned to one or more interventions (which may include placebo or
other control) to evaluate the effects of those interventions on health-related
biomedical or behavioral outcomes. (National Institutes of Health)
Phases of clinical trials
● Preclinical research
○ Non-humans in laboratory research conditions
○ Screens many potential compounds
○ If promising, request to move to human trials
● Phase 0 trial
○ First trial in humans
○ Very small number of healthy people (N = 10-15)
○ Very small dose to ensure safety
● Phase 1 trial
○ Determines the largest dose without serious side effects
○ Larger sample of healthy adults, but still relatively small (N = 50-200)
○ Dosage, timing, side-effects
● Phase 2 trial
○ Given dose found to be most beneficial in Phase I
○ In the case of SARS-CoV-2 vaccine phase II has been large (thousands)
○ Small sample of people with disease/at risk for disease (e.g., patients)
○ Effect should be ≥ standard treatment to continue trials
○ More appropriate for a drug developed to treat a disease like cancer
● Phase 3 trial
○ Randomized controlled trial (masked/blinded)
○ Large studies with thousands of participants - years
○ New therapy vs. standard treatment/placebo
● Phase 4 trial
○ Post-approval
○ Alternate populations
○ Risk factors, benefits, optimal use
● Mice and rats are generally regarded as the go-to animals for pre-clinical research
Phases of Clinical Trials
● Quasi-experimental (single group)
○ Preclinical research
○ Phase 0 Trial
○ Phase 1 Trial
● Quasi-experimental OR Experimental (drug vs usual care)
○ Phase 2 Trial
● Experimental (RCT, placebo, blinded)
○ Phase 3 Trial
○ Phase 4 Trial
MODULE 6: Descriptive Statistics, Correlation, and Regression
Descriptive statistics
● Descriptive statistics
○ What are descriptive statistics? These are tools used to organize and
illustrate data. They include, but are not limited to tables, graphs, measures of
central tendency, measures of variability, etc.
○ Population: A population is a group of people that share similar traits or
characteristics. It is a group of people that you are interested in studying or
making inference about. Target population vs. Accessible population
○ Sample: A sample is a subset of people from the target population and is
ideally randomly selected from the population and is used to make inferences
back to the population. E.g., randomly select 10 people from class
○
● Measurement
○ Quantify characteristics by assigning numerals to variable according to
certain rules. e.g., height in cm; left-handed = 0, right-handed = 1
○ Scales of measurement
■ Non parametric statistics (Non-normal variables; qualitative)
● Nominal - categories with no order
○ Left handed vs right handed
● Ordinal - categories with order - difference b/w rank or order is
meaningless
○ Eg. 100m race 1st, 2nd, 3rd or pain scale
■ Just interested in the order
■ Parametric statistics (Normal variables quantitative)
● Interval - equal intervals between categories - no true zero
○ Temperature (degrees)
■ No absence of something if 0
● Ratio - equal intervals between categories - true zero
○ Temperature (kelvins), mass, height, velocity, distance
○ Scores
■ What is a score? A score is a value associated with any variable that
you have measured.
● e.g., height – if you are 180 cm then your score is 180 cm
■ In this course, we primarily work with interval or ratio data
Data sets or distribution
● A data set or distribution is a collection of scores – typically arranged in a series of
columns (individual variables) and rows (an individual’s scores).
● Possible outcomes: no mode
Measures of central tendency - Median (M or X)
● Median = the score that divides the data set in two equal halves. In other words, it is
the score at the 50th percentile.
● There are two approaches to determine the median score and it depends on whether
you have an odd number of scores in your data set or an even number of scores.
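A minimal sketch of the two approaches, assuming the usual convention: with an odd number of scores the median is the middle score, and with an even number it is the average of the two middle scores. The function name `median` and the height values are made up for illustration.

```python
def median(scores):
    """Return the score at the 50th percentile of a list of numbers."""
    ordered = sorted(scores)          # scores must be ranked first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the single middle score
        return ordered[mid]
    # Even N: average of the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median([180, 160, 170])          # hypothetical heights (cm)
even_example = median([190, 160, 180, 170])    # hypothetical heights (cm)
print(odd_example, even_example)
```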
Measures of central tendency - Mean
● Mean = The mathematical centre of all the scores in a data set; the closest value to
all data (scores)
Advantages
● Considers the magnitude of all scores
● Reflects the total sum of all scores
Disadvantages:
● Affected by extreme scores
●
Measures of variability - Range
● Measures of variability provide information about the spread of scores. Said another
way, variability values provide information about the extent that scores differ from
each other.
Range = maximum (highest) score – minimum (lowest) score
Min = 160cm
Max = 190cm
Range = 190-160
Range = 30cm
● Advantages:
○ Very easy to do
● Disadvantages:
○ Provides minimal information
○ Only considers 2 scores, which are at the extreme ends of the distribution
Measure of variability - Variance
● Variance is an average of the squared deviations about the mean. Variance provides
an indication of how much the scores in a data set are spread out around the mean
● This is the theoretical aspect of variance. From a mathematical perspective the
calculation is a little bit more complex
○ Symbol for Variance = s^2
■ MAKE SURE IT IS LOWERCASE S (s^2 NOT S^2)
Measures of variability - Standard deviation
Measures of variability - Coefficient of variation
● Height (cm)
○ Xbar = 176.3cm
○ s = +/- 9.53cm
● Weight (kg)
○ Xbar = 75.35kg
○ s = +/- 14.65kg
● Calculate coefficient of variation (CV = s/Xbar * 100)
○ Height: CV = 9.53/176.3 * 100 = 5.41%
○ Weight: CV = 14.65/75.35 * 100 = 19.44%
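The variability measures can be checked with a short sketch. Assumptions: the sample variance s^2 divides the sum of squared deviations by N - 1 (which is what Python's `statistics.variance` does), the five height values are hypothetical, and `coefficient_of_variation` is a helper name invented here.

```python
import statistics


def coefficient_of_variation(mean, sd):
    """CV expresses the standard deviation as a percentage of the mean."""
    return sd / mean * 100


# Hypothetical sample of heights (cm), used only to illustrate s^2 and s
heights_cm = [165.0, 170.0, 175.0, 180.0, 185.0]
s_squared = statistics.variance(heights_cm)   # sample variance, divides by N - 1
s = statistics.stdev(heights_cm)              # sample standard deviation = sqrt(s^2)

# CV using the means and standard deviations given in the notes
cv_height = coefficient_of_variation(176.3, 9.53)
cv_weight = coefficient_of_variation(75.35, 14.65)
print(s_squared, round(s, 2), round(cv_height, 2), round(cv_weight, 2))
```

Because CV is unitless, it allows height (cm) and weight (kg) to be compared directly: weight is relatively more variable in this example.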
Frequency graphs = frequency polygon and histogram
● Mode = 18 years
● Median = 19 years
● Mean = 19.7 years
[Figure: frequency (f) plotted against age, 16 to 24 years]
Quick comment on histograms
● N = number of scores in data set
● More data means a smoother histogram
Quick comment on skewness
● +ve skew (tail to the right): affected by large scores
● -ve skew (tail to the left): affected by small scores
● Xbar = sample mean
● Xtilde (squiggle line) = sample median
Frequency graphs - cumulative frequency polygon
● CF = cumulative frequency
Frequency graphs – percentile polygon (cumulative percent)
Need to start with covariance before talking about correlation/regression
● Covariance directly influences correlation/regression. Covariance is a measure of
how two variables vary together. E.g., height and weight
Units?
● Kgm -> what does this mean?
Scaling?
● Magnitude of covariance is affected by magnitude of x and y scores
Pearson Product Moment Correlation (PPMC)
● rxy indicates the strength and direction of a relationship between two variables (X &
Y), e.g., height (X) & weight (Y), OR determines the reliability of a repeated measure
of one variable (X1 & X2), e.g., measure height twice
● rxx - same variable measured twice
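A rough sketch of how covariance feeds into the correlation coefficient, assuming the sample (N - 1) form of covariance and r = covariance / (sx * sy). The data and function names are hypothetical. It also shows the scaling problem noted above: changing height from cm to m changes the covariance but not r.

```python
import math


def covariance(x, y):
    """Sample covariance: average cross-product of deviations (divides by N - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """Covariance rescaled by both standard deviations, so it is unitless."""
    sd_x = math.sqrt(covariance(x, x))   # covariance of X with itself is its variance
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)


# Hypothetical height and weight scores for four people
height_cm = [160, 170, 180, 190]
weight_kg = [55, 68, 77, 90]
height_m = [h / 100 for h in height_cm]

cov_cm = covariance(height_cm, weight_kg)   # units: cm * kg
cov_m = covariance(height_m, weight_kg)     # units: m * kg, 100 times smaller
r_cm = pearson_r(height_cm, weight_kg)
r_m = pearson_r(height_m, weight_kg)
print(cov_cm, cov_m, round(r_cm, 3), round(r_m, 3))
```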
Correlation and linear regression
● What is it used for?
○ Linear regression or correlation is a measure of the strength of a
relationship/association between two variables
● What is the difference?
○ “The main difference between correlation and regression is that in correlation,
you sample both measurement variables randomly from a population, while in
regression you choose the values of the independent (X) variable” (Handbook
of Biological Statistics, McDonald JH 3rd ed. 2014)
● In regression, the researcher chooses the values of the independent (X) variable
Linear Regression
● Linear regression is used to study the linear relationship between a dependent
variable (Y-axis) and one or more independent variables (X-axis).
● The dependent variable (Y) must be continuous.
● The independent variable(s) may be either continuous (age), dichotomous (yes/no),
or categorical (social status).
● The initial judgement of a possible relationship between two continuous variables
should always be made on the basis of a scatter plot (scatter graph).
The normal curve or normal distribution
● Why is the normal distribution important?
1. Distribution of many variables show normal distribution (weight, height, blood
pressure)
2. Distribution of sample means produce normal curve
3. Normal curve allows determination of relative frequency/proportion and
probabilities
Properties of a normal distribution
What inputs determine the appearance/shape of the normal curve?
● μ = population mean
● σ = population standard deviation
● X = a score
● f = frequency
○ Properties of a normal curve:
■ 1) Symmetrical about mean
■ 2) Area under the curve is a proportion
■ 3) Defined by mean and standard deviation
■ 4) Asymptotic
■ 5) ± 3 s represents where the majority of scores are located
■ 6) point of inflection at ±1s
The normal curve – relative frequency and percentiles
● Side note: since the curve is symmetrical, the mode, median, and mean are all the
same value
● Means will never change
The normal curve - relative frequency and percentiles
● Group 1: narrow
● Group 2: more spread out
● Group 3: even more spread out
● Changing the mean shifts the curve along the axis
● Changing the standard deviation changes the spread of the curve
Standardized scores, z scores, and the normal curve
● Standardized scores (e.g., z score) are powerful because we can compare and
interpret scores from virtually any normal distribution of interval or ratio scores (data).
● All normal curves can be expressed in standardized terms, referred to as “z scores”.
● z score is based on mean and standard deviation of distribution.
● A z score is the distance a raw score (X) is from the mean relative to the standard
deviation
● A raw score alone needs more information to interpret: a higher mark on Midterm 2
can still be a worse result
○ Eg., Midterm 1: Mean score = 30 marks, SD = 10 marks
○ Midterm 2: Mean score = 90 marks, SD = 15 marks
z score determination
● Any known score (X) can be expressed as a z score by knowing the mean and
standard deviation of the distribution.
● z = (X - Xbar)/s
● Therefore, all normal distributions can be expressed in terms of z scores (unitless)
and thus, are standardised for common interpretation and comparison.
○ Midterm 1 calculation: z = (45 - 30)/10 = 1.5
○ Midterm 2 calculation: z = (75 - 90)/15 = -1.0
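The midterm comparison can be reproduced with a small sketch. The function name `z_score` is an assumption for illustration; the scores, means, and standard deviations are the ones from the midterm example.

```python
def z_score(x, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (x - mean) / sd


z_midterm_1 = z_score(45, mean=30, sd=10)   # 45 marks when the class mean is 30
z_midterm_2 = z_score(75, mean=90, sd=15)   # 75 marks when the class mean is 90
print(z_midterm_1, z_midterm_2)
```

The parentheses matter: the mean is subtracted from the score first, and only then is the difference divided by the standard deviation. Midterm 1 is the better result (1.5 SD above the mean) even though Midterm 2 has more raw marks.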
Use z scores to determine:
1. What proportion of the normal curve represents a certain score (X)
2. How many people represent a certain proportion of the distribution
3. What percentile represents a score
4. What score represents a certain percentile
Using z scores
● Normal curve allows determination of relative frequency or proportion of area under
the standard normal curve (using the z-tables posted in kin232)
○ Green area = 50%
○ Area between mean and z = 0.2910 -> 29.1%
○ Total area = 50% + 29.1% = 79.1%
○ Area beyond z = 0.2090 -> 20.9%
○ Total area = 100% - 20.9% = 79.1%
○ Area beyond z = 50% - 29.1% = 20.9%
Z score
● Z = (52 - 40)/7
● Z = 1.71
○ Thus, Joe falls on the right side of the curve
Using Z-value table
● Area from mean to z of 1.71 is 0.4564 -> 45.64%
● Joe’s percentile is 50% + 45.64% = 95.64%
Sam:
● Z = (35 - 40)/7 = -0.714
○ Mean to z = 26.11%
○ Percentile = 50% - 26.11% = 23.89%
Bill:
● Z = (44 - 40)/7 = 0.571
○ Mean to z = 21.57%
○ Percentile = 50% + 21.57% = 71.57%
Percentage of people between Sam and Bill is 71.57% - 23.89% = 47.68% OR 26.11% + 21.57%
= 47.68%
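As a cross-check on the table look-ups, the same percentiles can be computed with `statistics.NormalDist` from the Python standard library. This is a sketch that assumes the scores are normally distributed with mean 40 and SD 7, as in the example. The results differ slightly from the table values because the table rounds z to two decimals.

```python
from statistics import NormalDist

# Distribution from the example: mean = 40, standard deviation = 7
scores = NormalDist(mu=40, sigma=7)

# cdf(x) is the proportion of the curve below x, i.e. the percentile as a proportion
joe = scores.cdf(52)
sam = scores.cdf(35)
bill = scores.cdf(44)

# Proportion of people scoring between Sam and Bill
between_sam_and_bill = bill - sam

print(round(joe * 100, 1), round(sam * 100, 1), round(bill * 100, 1))
print(round(between_sam_and_bill * 100, 1))
```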
INFERENTIAL STATISTICS, PROBABILITY & HYPOTHESIS TESTING
Inferential Statistics
● Inferential statistics are used to make judgements of the probability that an observed
difference between groups is statistically significant or that the difference between
groups happened simply by chance
Sampling Distribution of Means
Consider 3 different types of distributions:
● 3) Sampling distribution of means
○ Measure an infinite number of samples with 100 undergraduate students in each
sample
○ The sampling distribution of means is a frequency distribution of all of the
infinitely possible sample means from a population
○ N = number of scores used to determine sample means
Three distributions
1. Population distribution
a. Data from which a sample is chosen
2. Sample distribution
a. Assume sigma and s to be equal
3. Sample distribution of means (many samples)
a. Theoretical
Sampling distribution of means
● Sampling distribution of means always:
○ Forms an approximately normal distribution
○ Has a mean (μXbar) equal to the population mean (μ) from which it was created
○ Has a standard error that is a function of the sample standard deviation and
the sample size
■ Formula for standard error of the mean: sXbar = s/√N
● Larger sample size = lower variability
Probability in everyday life:
1. Poker
2. Insurance Rates - How much you pay in premiums is based on probability.
3. Weather
4. Flipping a coin
5. Rolling a die
Inferential statistics and probability
● Probability forms the basis for inferential statistics and statistical conclusions
● Proportion of the total area under the curve for particular scores equals the
probability of measuring those scores
Hypothesis testing
● Sample data is used to make inferences about the population. For interval/ratio data,
the mean is the best representation of this data.
○ Hypothesis Testing:
■ Allows you to determine if the sample is representative of the
population
■ Allows you to determine if groups are from the same population or
from different populations
■ Uses sampling distribution of means to represent the population
Sampling distribution of means
● As N increases, variability (σXbar) about the mean is reduced
○ σXbar = standard error of the mean
● Sampling distribution of means = normal distribution
○ Therefore, it possesses characteristics and properties of a normal curve.
○ Remember z scores? – Table C.1 (probabilities)
● Black curve = increased sample size (= decreased variability)
Using sampling distribution of means
● Known:
○ population mean (μ) = 40.0 kg
○ population standard deviation (σ) = ± 10.0 kg
● Measure:
○ Grip strength of 100 individuals (N = 100)
○ Determine the characteristics of the sampling distribution
●
Using sampling distribution of means
● What is the probability of having a sample mean between 39 – 41 kg?
●
Using sampling distribution of means
● What is the probability of having a sample mean between 38 – 42 kg?
○ z score of -1.96 to +1.96: 95% Table C.1
■ Therefore, the probability of having a sample mean less than 38 kg is
<2.5% and the probability of having a sample mean greater than 42 kg
is <2.5%.
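The grip strength example can be worked through in a short sketch, assuming the sampling distribution of means is normal with mean μ and standard error σ/√N. With σ = 10 kg and N = 100 the standard error is 1 kg, so 39 to 41 kg is ±1 standard error and 38 to 42 kg is ±2 standard errors (about 95.4%, close to the 95% that corresponds to ±1.96). The helper name `prob_between` is made up here.

```python
from math import sqrt
from statistics import NormalDist

mu = 40.0      # population mean (kg)
sigma = 10.0   # population standard deviation (kg)
n = 100        # sample size

# Standard error of the mean: spread of the sampling distribution of means
sem = sigma / sqrt(n)
sampling_distribution = NormalDist(mu=mu, sigma=sem)


def prob_between(low, high):
    """Probability that a sample mean falls between low and high."""
    return sampling_distribution.cdf(high) - sampling_distribution.cdf(low)


p_39_to_41 = prob_between(39, 41)   # within 1 standard error of the mean
p_38_to_42 = prob_between(38, 42)   # within 2 standard errors of the mean

# A sample mean of 36 kg expressed in standard error units
z_for_36 = (36 - mu) / sem

print(sem, round(p_39_to_41, 4), round(p_38_to_42, 4), z_for_36)
```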
Using sampling distribution of means
● Using a criterion of 5%, is it possible that a sample with a mean of 36 is from the
population of interest?
No, NOT statistically possible!
● If a sample with a mean of 36 is not from the population of interest, then they must be
from a different population.
● We call this a “rare event”: the fact that the sample mean was 36 did not simply
happen by chance. There’s a reason why the sample mean is 36
○ (could be from another population OR in the case of experimental research,
the independent variable/intervention caused an effect).
HYPOTHESIS TESTING
Significance Level:
● Probability is used to define the sample means as being too unlikely to represent the
underlying raw score population (rare event)
○ This sample is NOT from this population.
● Use probability of 0.05 (5 times out of 100) or 0.01 (1 time out of 100)
● Use symbol α (alpha) to represent significance level, i.e. α = 0.05
● As a researcher, you choose the α value (either 5% or 1%). As a KIN232 student,
you will always be told/given the α value to use. Working with an α value of 1% is
what we call being more conservative because it will be harder to have a mean that
is NOT from the population. In other words, it will be harder to find a significant
difference between means/groups
Graphically
● One tail question: contains a “higher than” or “lower than” statement
● Two tail question: states that there is a difference
● Converting alpha and 1 or 2-tails into z score
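Converting alpha and the number of tails into a critical z score can be sketched as below. This assumes the standard normal distribution and uses `NormalDist.inv_cdf` from the Python standard library in place of a printed z table; the function name `z_critical` is an assumption. For a two-tailed test alpha is split evenly between the two tails.

```python
from statistics import NormalDist


def z_critical(alpha, tails):
    """Positive critical z for a given significance level and 1 or 2 tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Each tail holds alpha / tails of the area under the curve
    return NormalDist().inv_cdf(1 - alpha / tails)


one_tail_05 = z_critical(0.05, tails=1)
two_tail_05 = z_critical(0.05, tails=2)
one_tail_01 = z_critical(0.01, tails=1)
two_tail_01 = z_critical(0.01, tails=2)
print(round(one_tail_05, 2), round(two_tail_05, 2),
      round(one_tail_01, 2), round(two_tail_01, 2))
```

For a "lower than" one-tail question the critical value is the negative of the number returned.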
Hypothesis testing - 7 steps
1. State null hypothesis in symbols and words
2. State alternative hypothesis in symbols and words
3. Use α level and decide if one or two-tailed
4. State rejection and retain rule
5. Compute appropriate statistic
a. The one step that is variable
6. Make decision by applying rejection / retain rule
7. Write conclusion in context of study
a. Scientific method
● NOTE – THIS IS SIMPLY AN EXAMPLE. You will see specific aspects of each step
with the various types of analyses we will examine.
○ Example: Is the starting salary of University of Waterloo graduates higher
than other Ontario University graduates who average $50,000?
■ Questions you must ask yourself when reading the descriptions of the
study (this one is simple, they will get more complex).
● Is there a reference to a population?
● Is randomization mentioned?
● How many groups are there?
● Is there a control group?
● Is there an independent variable (or more than 1)? How many
levels of the independent variable are there?
● Is there directionality (e.g., lower, higher, smaller, larger) or not
(altered, difference, change)?
● Is there a pre-test and post-test?
● Are there repeated measures?
1. State Null
a. University of Waterloo graduates have the same starting salary as other
Ontario University graduates.
i. μ: University of Waterloo mean salary
ii. μo: other Ontario Universities mean salary
1. Ho: μ = μo
b. Most stats tests will have hypotheses that involve symbols AND words. For
t-tests, it is helpful (i.e. helpful for marking!) to label the means/groups
c. We will use these 7 steps to perform several different statistical analyses.
With each step there are similarities and differences between statistical tests.
I will do my best to highlight these similarities and differences to you.
2. State alternate
a. In symbols: μ > μo
b. University of Waterloo graduates have a higher starting salary than the
$50,000 average of other Ontario University graduates
3. Alpha level
a. You will be told whether to use 0.05 or 0.01. The only decision you need to
make is whether it is a one or two tailed test.
4. Rejection Rule
a. using Table C.1, α = 0.01 / one tail
i. z critical = +2.33
ii. reject Ho: if the test statistic ≥ z critical 2.33
iii. retain Ho: if the test statistic < z critical 2.33
b. How you structure the rejection rule is dependent on several things:
i. What type of statistical test you are performing
ii. Is it a one tail or two tail analysis
iii. What is your data? That is, is it blood pressure (reduction is good),
hypertrophy (more muscle is better), or memory loss (more is bad)?
5. Calculate
a. compute test statistic using appropriate test
6. Decision
a. Make a decision by applying rejection rule:
i. Possibility 1: If test statistic is ≥ z critical (+2.33) then, reject null
hypothesis
ii. Possibility 2: If test statistic is < z critical (+2.33) then, retain null
hypothesis
7. Conclusion
a. Conclusion must reflect the decision
i. Possibility 1: If our decision was to reject Ho
1. University of Waterloo graduates have a significantly higher
starting salary (> $50 000) than other Ontario University graduates
(p < 0.01)
ii. Possibility 2: If our decision was to retain Ho
1. University of Waterloo graduates have, statistically, the same
starting salary ($50 000) as other Ontario University graduates
(p > 0.01)
b. The conclusion must:
i. Include mention of the dependent variable;
ii. The comparison being made (i.e. control vs. exercise group);
iii. The statistical results (i.e. statistically different, no statistical
difference);
iv. The probability.
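Steps 4 to 6 can be sketched for the salary example as follows. The notes give no sample data, so the sample mean ($52,000), population standard deviation ($6,000), and sample size (36) below are hypothetical values chosen purely for illustration, as are the function names. The critical value +2.33 is the one stated in the rejection rule (α = 0.01, one tail).

```python
from math import sqrt


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Test statistic: how many standard errors the sample mean is from the population mean."""
    standard_error = pop_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error


def decide_upper_tail(test_statistic, z_crit):
    """Apply the rejection rule for a 'higher than' one-tailed test."""
    return "reject Ho" if test_statistic >= z_crit else "retain Ho"


# Hypothetical data: 36 Waterloo graduates with a mean starting salary of $52,000,
# compared to a population mean of $50,000 with an assumed SD of $6,000
z_stat = one_sample_z(sample_mean=52000, pop_mean=50000, pop_sd=6000, n=36)
decision = decide_upper_tail(z_stat, z_crit=2.33)
print(z_stat, decision)
```

With these made-up numbers the test statistic falls short of the critical value, so the null hypothesis is retained and the conclusion would follow Possibility 2.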
T-Tests
● First described in 1908 by William Sealy Gosset
● Degree in mathematics and chemistry
● Worked for Arthur Guinness Son and Company
● Invented t-test to help with the quality control of small samples
What is a T-Test
● Ratio that quantifies how significant the difference is between the “mean” of two
groups
● Considers the variance or distribution
● Used when the population standard deviation is unknown
● Uses t-statistics and compares to a t-distribution (Table C.2)
● More than two means?
○ Then an ANOVA (analysis of variance) is used – discussed later
● Df = degrees of freedom
3 Versions of the T-Test
● Ratio/Interval Data -> parametric statistics
● One sample mean:
○ Compare Waterloo male body fat % to Canadian male body fat %
● Two sample mean:
○ Two different groups/Independent t-test - compare Waterloo KIN student’s
body fat % to Guelph KIN student’s body fat %
○ Same individuals (or people who are very closely matched; e.g., twins)/Paired
sample t-test - compare body fat % before and after exercise program
The tXbar test
1) You would use the tXbar test when the population standard deviation is not known and
thus you must estimate the population standard deviation.
2) In the t test, the sampling distribution is described by the Student t distribution or
t distribution (Table C.2)
3) There are many t distributions; the shape changes with sample size (N) (determined by
degrees of freedom, df = N-1)
Degree of freedom (df)
● The number of values in a set of scores that are free to vary
● For the t_Xbar test: df = N - 1
Example
● You have measured the age of 3 UW students and Xbar = 20 years
○ Recall that Xbar = ΣX/N
○ What is the sum of X (ΣX)? ΣX = N × Xbar = 3 × 20 = 60
○ Possible values for X3
■ X1 = 21
■ X2 = 25
■ X3 = ?
○ Xbar = ΣX/N
○ 20 = (X1 + X2 + X3)/N
○ 20 = (21 + 25 + X3)/3
○ 20 = (46 + X3)/3
○ 60 - 46 = X3
○ 14 = X3
■ Note: X1 and X2 values are free to vary (i.e., 2 degrees of
freedom). X3 cannot vary – it is a defined value that is
determined by X1 and X2 (and the mean).
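The same idea can be checked with a few lines of Python. This is only a sketch of the example above; the function name `last_value` is made up for illustration and is not from the course material.

```python
def last_value(mean, n, free_values):
    """Return the one score that is fixed once the mean and the other n - 1 scores are known."""
    if len(free_values) != n - 1:
        raise ValueError("expected exactly n - 1 freely chosen values")
    total = mean * n                 # sum of X = N * Xbar
    return total - sum(free_values)  # whatever is left over must be the last score

# Ages of 3 UW students with Xbar = 20; X1 and X2 are free to vary
x3 = last_value(mean=20, n=3, free_values=[21, 25])
print(x3)  # 14
```

Change 21 or 25 to anything you like and X3 simply adjusts to keep the mean at 20, which is why only N - 1 = 2 values are free to vary.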
The t_Xbar test
● As sample size (N) increases the t distribution becomes closer to the z
distribution
● Comparison of t score vs. z score - as N increases (df ↑), t value becomes
closer to z value
The t_Xbar test formula
● Recall: z = (X - Xbar)/s
● t_Xbar = (Xbar - μo)/s_Xbar
Review of SD and SEM
● Standard deviation (s/SD/sigma)
○ Amount of variability from the individual data values to the mean
● Standard error of the mean (s_Xbar/SEM/σ_Xbar)
○ Amount of discrepancy likely in the sample mean compared to the population
mean
■ Defining variables
■ s/σ – standard deviation (σ for the population; s is the estimate from the sample)
■ N/n – the sample size
■ xi – each value from the population
■ μ – population mean
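A minimal Python sketch of the SD vs. SEM distinction, assuming the usual relationship SEM = s/√N and the sample SD (dividing by N - 1, which is what `statistics.stdev` does). The three ages are reused from the degrees of freedom example; the function name `sd_and_sem` is made up for illustration.

```python
import math
import statistics

def sd_and_sem(values):
    """Return (sample standard deviation, standard error of the mean)."""
    s = statistics.stdev(values)       # sample SD: divides by N - 1
    sem = s / math.sqrt(len(values))   # SEM shrinks as N grows
    return s, sem

ages = [21, 25, 14]                    # the three UW student ages, Xbar = 20
s, sem = sd_and_sem(ages)
print(f"s = {s:.2f}, SEM = {sem:.2f}")  # s = 5.57, SEM = 3.21
```

The SD describes the spread of individual scores around the mean; the SEM describes how far the sample mean is likely to sit from the population mean, so it is always the smaller of the two when N > 1.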
Premise of hypothesis testing
● Use sampling distribution of means to establish the range of sample means that
would represent the population from which the sample came
● Calculate one value (t_Xbar) that represents the sample mean
● Compare the t_Xbar value to a critical/criterion t value from Table C.2
○ The critical value depends on the degrees of freedom (along with α and the number of tails)
● If t_Xbar lies beyond the critical value (within the tails of the distribution), the sample
came from some other population (in other words a treatment effect exists)
Example
t_Xbar test: using sampling distribution of means
A local Waterloo elementary school has introduced a 6-month resistance exercise training
program to grade 5 students. The school board is interested if the program changes the
weight of the students.
You measure the weight of 20 grade 5 students involved in the program. The mean weight
for these students was 32.4 kg with a standard deviation of ±4.2 kg. The reported average
weight for Ontario grade 5 students is 30.7 kg.
Does a resistance training program change the weight of Grade 5 students when compared
to the Provincial average?
Use α=0.05
● N = 20
● Xbar = 32.4 kg
● s = ±4.2 kg
● μ = 30.7 kg
● df = N - 1 = 19
Step 1 of hypothesis testing (null hypothesis, Ho):
● Resistance training did not change the body mass of grade 5 students
Step 2 (alternate hypothesis, H1):
● Resistance training caused a change in body mass of grade 5 students
Step 3:
● α=0.05
○ How many tails? One or two?
■ Two: the question asks whether weight changes, in either direction
Step 4:
● Retain Ho: if t_Xbar < 2.093 AND t_Xbar > -2.093
● Reject Ho: if t_Xbar ≥ 2.093 OR t_Xbar ≤ -2.093
Step 5:
● Calculating the test statistic: t_Xbar = (Xbar - μo)/s_Xbar
○ Xbar = sample mean
○ μo = population mean
○ s_Xbar = standard error of the mean
○ Full calculation is in the photo gallery/slides
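Since the worked calculation sits on the slides, here is a short Python sketch of it, assuming the usual one-sample formula t_Xbar = (Xbar - μo)/s_Xbar with s_Xbar = s/√N. The function name `one_sample_t` is made up for illustration; the critical value 2.093 is the one quoted in Step 4.

```python
import math

def one_sample_t(sample_mean, pop_mean, s, n):
    """t statistic for one sample mean compared against a known population mean."""
    sem = s / math.sqrt(n)                  # standard error of the mean
    return (sample_mean - pop_mean) / sem

t = one_sample_t(sample_mean=32.4, pop_mean=30.7, s=4.2, n=20)
t_critical = 2.093                          # two-tailed, alpha = 0.05, df = 19 (from Step 4)
decision = "reject Ho" if abs(t) >= t_critical else "retain Ho"
print(f"t = {t:.2f} -> {decision}")         # t = 1.81 -> retain Ho
```

The value 1.81 matches the one used in the decision step, so the numbers in the example hang together.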
Step 6 and 7:
● Decision
○ Since t_Xbar = 1.81 < 2.093, retain null hypothesis
● Conclusion
○ There was no statistical difference in body mass of a sample of Grade 5
students when compared to Ontario Grade 5 students (p > 0.05)
■ p = probability
● When reporting NO STATISTICAL DIFFERENCES findings use p > 0.05 or p > 0.01
(depending on α)
● When reporting STATISTICAL DIFFERENCES findings use p < 0.05 or p < 0.01
(depending on α)
t_(Xbar1 - Xbar2) - t test for two independent samples
● When would I use this test? When reading a study description, ask yourself, “how
many observations or measurements were made?”. In other words, how many
means are being generated and compared based on the design. We are looking for 2
means!
● Cohort study – Highly unlikely, why?
● Case-control – Possibly, e.g., you identify a group of heart failure patients (cases) and
a similar group of healthy controls and measure their VO2 max. This is an example of
a cross-sectional case-control study
○ An observation (“O”) will generate a mean value. Thus, if you have 4
observations you will have 4 means
Example:
● You are working on the KIN 204 flexibility portion of the course and a new and
exciting flexibility program was just introduced. You were interested in answering the
question, does this new flexibility training program increase flexibility compared to an
old/traditional flexibility program? You recruited 18 students from the 2B KIN 204
class and randomly assigned them to a group (see table below). The students
participated in their respective programs for 1 month and then flexibility was
assessed.
○ Use α=0.05
T-TEST PART 2:
● REPEATED MEASURES: typically pre/post test
● D stands for “differences” (t_D)
● 2 means are in this example -> pre + post
Null Hypothesis: The drug had no effect on tumour size.
● μD ≥ 0
Alternate hypothesis: the new drug reduced the size of the tumour
Conclusion:
● The tumour size was statistically decreased post when compared to pre. (p < 0.01)
○ When should it be <, when >?
■ Statistical difference p<
■ NO statistical difference p>
Null hypothesis: There will be no difference in height between males + females
● Ho: μ1 - μ2 = 0
Alternate hypothesis: There will be a difference in height between males + females
● H1: μ1 - μ2 ≠ 0
α = 0.01
tcritical = +/- 3.355
df = N1 + N2 - 2
df = 8
2 tails
Reject Ho if t ≤ -3.355 or t ≥ 3.355
Retain Ho if -3.355 < t < 3.355
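The two-tailed decision rule is easy to get backwards, so here is a small Python sketch of it. The function names are made up for illustration, and N1 = N2 = 5 is an assumption (the notes only give df = 8).

```python
def df_independent(n1, n2):
    """Degrees of freedom for the independent (two-sample) t test."""
    return n1 + n2 - 2

def two_tailed_decision(t_value, t_critical):
    """Reject Ho when t falls in either tail, at or beyond the critical value."""
    return "reject Ho" if abs(t_value) >= t_critical else "retain Ho"

print(df_independent(5, 5))                # 8 (group sizes are assumed)
print(two_tailed_decision(3.355, 3.355))   # reject Ho (equal to the critical value)
print(two_tailed_decision(-2.10, 3.355))   # retain Ho (between the two cut-offs)
```

Using the absolute value covers both tails at once: "retain" means t is between -3.355 AND +3.355, "reject" means it is at or beyond either one.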
Module 10: Error, Power and Confidence Intervals
STATISTICAL CONCEPTS
● Statistical difference (or no difference)
● Type I & type II errors
● Power
● Confidence Intervals
Statistical difference or no statistical difference
● Takeaway: significance does NOT equal importance
Type I and Type II Errors
● Possible conclusions:
○ “Reject Ho: sample is not from population” or, “the treatment group was
statistically different from the control group”. However, it is possible that we
made the wrong decision – the sample is in fact from the population. We have
committed a Type I error.
● Type I error (also called α error): we rejected Ho but Ho is true
○ “Retain Ho: sample is from population” or, “the treatment group was not
statistically different from the control group”. However, it is possible that we
made the wrong decision – the sample is in fact from a different population.
We have committed a Type II error.
● Type II error (also called β error): we retained Ho but Ho is false
Type I error: compare α levels
● Using larger α level (0.05 vs 0.01) has two effects:
○ 1) It is easier to reject Ho at 5% than at 1%
○ 2) There is a greater risk of making a type I error (5% vs. 1%)
● Using α = 0.01 decreases the chance of type I error (false claim)
○ BUT Increases the probability of type II error (failure of detection)
Type II error: retain Ho but should have rejected Ho
● When discussing type II error, need to consider two different populations
○ Example: Researcher develops new drug treatment for reducing blood
cholesterol.
■ individuals with high cholesterol (control group)
● mean = 250 mg/dl
■ individuals with drug treatment (experimental group)
● mean = 240 mg/dl
● Conclusion: drug treatment does not work
○ But the drug actually does decrease blood cholesterol!!!
Type II error
● If the drug is effective, there exist two populations:
○ 1) People with high blood cholesterol (control group)
○ 2) People using drug who have lower blood cholesterol (treatment group)
Type II error: β error
Power
● Probability of correctly identifying a statistical difference if one exists (correctly reject
Ho)
● The researcher who did not find that a drug treatment was effective (when in truth the
drug did work) may have an experiment with low power.
● A study with low power may fail to detect a difference when one does exist
Factors affecting power
1) Significance level chosen
● Selecting α = 0.05 rather than 0.01 will increase power
● A one tail analysis will increase power compared to two tail
2) Sample size - N affects the sampling distribution of means
3) Variability (σ) will change the distribution, similar to N
● large σ: wider distribution thus more overlap and decrease power
● small σ: tighter distribution and increase power
Large N: increase power
● Power = 1 - β
● e.g., Power = 1 - 0.2 = 0.8
Small N: decrease power
● Power = 1 - β
● e.g., Power = 1 - 0.4 = 0.6
Magnitude of difference (μ1 – μo )
● The greater the difference between means, the larger the power
● The magnitude is determined by the treatment effect
○ Does drinking cola cause stomach ulcers?
■ drink 1 cola per week vs. drink 20 colas per day
● Beta is smaller due to larger sample
● Mean is placed around the middle of the two graphs
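To see the four factors move power up and down, here is a rough Python sketch. It swaps in a one-tailed z test (known σ) instead of a t test so that only the standard library is needed. The means 250 and 240 mg/dl come from the cholesterol example, but σ = 30 and the sample sizes are assumptions chosen purely for illustration, and the function name `power_one_tailed_z` is made up.

```python
import math
from statistics import NormalDist

def power_one_tailed_z(mu0, mu1, sigma, n, alpha):
    """Approximate power of a one-tailed z test looking for a decrease from mu0 to mu1."""
    sem = sigma / math.sqrt(n)
    z_critical = NormalDist().inv_cdf(alpha)   # negative cut-off in the lower tail
    cutoff_mean = mu0 + z_critical * sem       # sample means below this reject Ho
    # Power = chance that a sample mean from the treated population lands below the cut-off
    return NormalDist(mu1, sem).cdf(cutoff_mean)

# sigma = 30 is an assumed value, not from the notes
for n in (10, 40, 160):
    power = power_one_tailed_z(250, 240, sigma=30, n=n, alpha=0.05)
    print(f"N = {n}: power = {power:.2f}, beta = {1 - power:.2f}")
```

Try changing one input at a time: a larger N, a larger α, a smaller σ or a bigger gap between the two means should each raise the power (and shrink β).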
Confidence Intervals
● Definition:
○ A confidence interval (CI) is a range of scores with defined boundaries that
should contain the population mean.
○ The boundaries are calculated from the sample mean and standard error of
the mean.
○ The wider the CI, the more confident you can be that the population mean is
within the boundaries.
■ 95% CI -> α = 0.05
■ 99% CI -> α = 0.01
■ For example, the 95% confidence interval of the mean:
● The defined interval will include the population mean with 95%
certainty. Said another way, the probability of finding the
population mean within the confidence interval is 95%.
How to calculate a confidence interval
● We require a sample mean, a standard error of the mean and t value to calculate a
confidence interval (CI).
● The t value will depend on the probability and degrees of freedom. Typically, CIs
are 95% and 99%, thus, we require a t value that encompasses 95% or 99% of the
distribution.
○ Use the example: Xbar = 40 cm, s = ±8.8 cm, N = 22, and α = 0.05
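A Python sketch of this example, assuming the usual form CI = Xbar ± t × s_Xbar with s_Xbar = s/√N. The critical value t = 2.080 (two-tailed, α = 0.05, df = 21) is taken from a standard t table rather than from these notes, so check it against Table C.2. The function name `confidence_interval` is made up for illustration.

```python
import math

def confidence_interval(sample_mean, s, n, t_critical):
    """Return (lower, upper) boundaries of the CI around a sample mean."""
    sem = s / math.sqrt(n)        # standard error of the mean
    margin = t_critical * sem     # half-width of the interval
    return sample_mean - margin, sample_mean + margin

# Xbar = 40 cm, s = 8.8 cm, N = 22, alpha = 0.05 -> df = 21
lower, upper = confidence_interval(40, 8.8, 22, t_critical=2.080)
print(f"95% CI: {lower:.2f} to {upper:.2f} cm")  # 95% CI: 36.10 to 43.90 cm
```

Swapping in the larger 99% critical value for df = 21 widens the interval, which is the "wider CI, more confident" idea from the definition.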
X^2 one way test
Chi square test:
● Non-parametric test
● Nominal scale of measurement
● Frequencies of more than 2 categories (observations)
X^2 tests
● Tests whether the frequency in each category in the sample data represents the
expected frequency based on previous research (or some known data)
● X^2 = Σ (O - E)^2 / E
○ O – observed frequency (sample)
○ E – expected frequency (based on population/previous research/known data)
○ Used with categories describing one variable
○ Also called “goodness of fit” test, tests how “good” the “fit” is between our
data and the Ho
■ Data = observed frequency
■ Ho = expected frequency
X^2 distribution
● Distribution is dependent on degrees of freedom
● df = K – 1 where K is number of categories (options)
X^2 test
● If you know the expected proportion for each category:
○ burgers 35%, pizza 25%, subs 20%, other 20%
■ From previous experience, knowledge, research etc
● If you don’t know the expected proportion for each category:
○ Burgers 25%, pizza 25%, subs 25%, other 25%
■ Evenly distribute the proportion based on number of categories
Decision: Since X^2 = 18.78 > 11.34, we will reject Ho
Conclusion: University students have a statistically different preference for fast food when
compared to general population (p < 0.01)
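The observed counts for this example are on the slides, so the Python sketch below uses made-up counts (clearly hypothetical, 100 students) with the expected proportions from the notes. It assumes the usual goodness of fit statistic X^2 = Σ (O - E)^2 / E; the function name `chi_square_one_way` is made up for illustration.

```python
def chi_square_one_way(observed, expected_proportions):
    """Goodness of fit: sum of (O - E)^2 / E over all categories."""
    n = sum(observed)
    expected = [p * n for p in expected_proportions]  # E = expected proportion * N
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected_props = [0.35, 0.25, 0.20, 0.20]   # burgers, pizza, subs, other (from the notes)
observed = [20, 30, 25, 25]                 # HYPOTHETICAL counts, not the class data
chi2 = chi_square_one_way(observed, expected_props)
df = len(observed) - 1                      # df = K - 1 = 3
critical = 11.34                            # alpha = 0.01, df = 3 (value used in the notes)
decision = "reject Ho" if chi2 >= critical else "retain Ho"
print(f"X^2 = {chi2:.2f}, df = {df} -> {decision}")
```

With these made-up counts X^2 comes out below 11.34, so Ho is retained; the real class data gave 18.78, which is why Ho was rejected in the example.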
OUR OWN STUDY (FAVOURITE CHIP)
● X^2 = 7.07
Decision:
● Since X^2 = 7.07 < 9.49, retain Ho
Conclusion:
● KIN232 students do not have a statistically significant preferred flavour of chips (p >
0.05)
X^2 two-way test
● You can think of independent and dependent like a correlation. Does knowing
whether someone studied help us determine the likelihood that someone will pass? In
the example on the left there is no relationship between studying and the grade, so
the two variables are independent.
● In Scenario B there is a relationship between whether you studied and whether
you passed. So these two variables are dependent.
Example:
● A health researcher is investigating if there is a relationship between smoking and
alcohol consumption. Individuals were classified for both variables as low, moderate
or high.
○ Example classification:
■ Low -> 0 - 1 drinks/week
■ Moderate -> 1-3 drinks/week
■ High -> 3-5 drinks/week
● What are the expected frequencies?
○ Assume “equal” probability for each cell based upon the observed row and
column total.
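The expected frequency for each cell is normally worked out as (row total × column total)/N. Here is a Python sketch of that step; the 3 × 3 table of counts is hypothetical (the real data are on the slides) and the function names are made up for illustration.

```python
def expected_frequencies(observed):
    """Expected count per cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_two_way(observed):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# HYPOTHETICAL counts: rows = smoking (low, moderate, high),
# columns = alcohol (low, moderate, high)
table = [[20, 15, 10],
         [15, 20, 15],
         [10, 15, 20]]
df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1) * (columns - 1) = 4
print(expected_frequencies(table))
print(f"X^2 = {chi_square_two_way(table):.2f}, df = {df}")
```

Notice that the expected counts keep the same row and column totals as the observed table; only the spread across the cells changes.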
Decision:
● Since X^2 = 2.62 < 9.49, retain Ho
Conclusion
● Alcohol consumption and smoking are not statistically related (p > 0.05)
OR Alcohol consumption and smoking are statistically independent from each other
(p > 0.05).
MODULE 12: One way ANOVA
● ANOVA -> ANalysis Of VAriance
● Ratio/Interval Data
○ ANOVAs are used when you have more than 2 means to analyze
1 factor:
● Different groups – compare undergraduate body fat % at University of Waterloo,
University of Guelph and University of Toronto
● Same individuals – compare body fat % before treatment and at 1 month, 6 months
and 1 year after starting an exercise program
ANOVA Theory
● Example: Examine the effects of exercise on blood pressure.
○ Group #1: no exercise
○ Group #2: exercise 1X per week
○ Group #3: exercise 2X per week
○ Group #4: exercise 3X per week
○ Group #5: exercise 4X per week
■ Independent variable is exercise frequency – 5 levels
■ This research design produces 5 means
● If t tests were used it would require 10 separate analyses,
which increases potential for Type I error
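Where does "10 separate analyses" come from? With 5 means there are 5 × 4 / 2 = 10 possible pairs. The Python sketch below counts the pairs and gives a rough idea of how the Type I error risk builds up. The error formula 1 - (1 - α)^m assumes the tests are independent, which pairwise t tests on shared groups are not exactly, so treat it as an illustration only. Function names are made up.

```python
import math

def pairwise_comparisons(k):
    """Number of different pairs of means that can be formed from k groups."""
    return math.comb(k, 2)

def familywise_error(alpha, comparisons):
    """Chance of at least one Type I error across several independent tests."""
    return 1 - (1 - alpha) ** comparisons

m = pairwise_comparisons(5)
print(m)                                   # 10
print(round(familywise_error(0.05, m), 2)) # 0.4
```

So running ten t tests at α = 0.05 could push the overall risk of a false claim towards 40%, whereas a single ANOVA keeps it at α.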
ANOVA
● One of the more common inferential statistical procedures used in Kinesiology research
● Keeps error equal to α
● Examines ratio of variances – there are 2 general sources of variance
○ 1) Within group variance
■ Describes variability within the conditions
■ An individual value is compared to group mean
● Variability is due to sampling error
● Individual differences
● Measurement error
○ 2) Between group variance
■ Describes the variability between conditions
■ Mean of each group compared to mean of all scores
● Within group variability
● Group is from different population (treatment caused change)
ANOVA Theory
● Need the means, not necessarily comparing them
● ANOVA – we calculate a F ratio
○ F ratio = between group variability/within group variability
● Between group variability = treatment effect + within group variability. Thus:
○ F ratio = (treatment effect + within group variability)/within group variability
● If groups are from the same population, then:
○ The treatment effect = 0
○ F ratio = (0 + within group variability)/within group variability
○ Thus, the F ratio = 1 if there is no treatment effect
● The greater the F ratio (> 1.0), the greater the between group variance (due to larger
treatment effect)
○ A statistical difference is present when: Fcalculated ≥ Fcritical
One-way ANOVA
ANOVA Theory & Calculation
● Steps required:
○ 1) Determine the sum of squares (SS) *These will be given in KIN 232*
○ 2) Determine the mean square (MS)
■ Divide SS by df
○ 3) Calculate F ratio
■ Divide MSbetween by MSwithin
● F = between group variance/within group variance
One-way ANOVA formulae
● Example
○ 0x/week, 3x/week, 5x/week
■ 20 people each program
● dfbetween = K - 1 = 3 - 1 = 2
● dfwithin = N - K = 60 - 3 = 57
● dftotal = N - 1 = 60 - 1 = 59
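The steps SS -> MS -> F can be sketched in Python as below. K = 3 groups and N = 60 come from the example above, but the SS values are hypothetical (in KIN 232 they are given to you), and the function name `one_way_anova_f` is made up for illustration.

```python
def one_way_anova_f(ss_between, ss_within, k, n_total):
    """Turn sums of squares into mean squares and an F ratio for a one-way ANOVA."""
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between   # MS = SS / df
    ms_within = ss_within / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "df_total": n_total - 1,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,  # F = MSbetween / MSwithin
    }

# HYPOTHETICAL sums of squares; 3 programs x 20 people = 60
result = one_way_anova_f(ss_between=100, ss_within=570, k=3, n_total=60)
print(result["df_between"], result["df_within"], result["df_total"])  # 2 57 59
print(result["f_ratio"])                                              # 5.0
```

The calculated F is then compared with the critical F for (dfbetween, dfwithin) from the F table, exactly like comparing t to a critical t.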
One-way ANOVA
Example:
A health researcher was interested in answering the question, “what is the optimal
frequency of aerobic exercise per week to lower mean arterial pressure (MAP) in
hypertensive individuals?” To answer this question, the researcher recruited 25 individuals
with hypertension (age 52±8 years, height 168±12 cm, weight 87±5 kg). These individuals
were randomly assigned to one of five groups. The details of the aerobic exercise and
groups are provided in the table. The study period lasted 12 weeks and mean arterial
pressure was assessed at the end of the study period.
● μ means ‘mean’
○ Null hypothesis: Aerobic exercise training will have NO effect on MAP
○ Alternate hypothesis: Aerobic training will have an effect on MAP
● dfbetween = K - 1 = 5 - 1 = 4
● dfwithin = N - K = 25 - 5 = 20
○ Use table: critical values of F: The F-tables (Table C.5)
■ Fcritical = 2.87
● Reject Ho if F ratio ≥ 2.87
● Retain Ho if F ratio < 2.87
Decision: Since F = 49.74 > 2.87, reject Ho
Conclusion: Aerobic training resulted in a statistical difference in MAP in individuals with
hypertension (p < 0.05)
● ***Notice how non-descript this conclusion statement is?*** That’s because ANOVAs
DO NOT provide any information regarding specific means/groups that are different
from each other. If the Ho is rejected in an ANOVA, then all we know is that 2 or
more means are different from each other
● How do we know which means are different from each other? (0, 1X, 2X, 3X or 4X per
week) Examine means for the different treatments: