
How to Read Research: A Biolayne Guide

Table of Contents
Article 1: Overview of Research
- The Scientific Method
- Variables
- Types of Research
- Study Designs
- Types of Publications
Article 2: Reading and Interpreting Research
- Abstract
- Introduction
- Materials & Methods
- Results
- Discussion
- Conclusion
Article 3: Statistical Concepts
- Overview of Statistics
- Data Representation
Article 4: Challenges for Researchers
- Funding
- Trusting Research
Article 5: Common Methods for Measuring Variables
- Body Water
- Body Composition
- Protein Metabolism
- Hypertrophy Measurements
- Energy Expenditure
- Hormones
- Muscle Excitation
- Strength Testing
- Psychometrics
- Closing Remarks
- References
Article 01
Overview of
Research
Science is known as a branch of knowledge or body of
truth/facts. Science is based on research. The Oxford
University Press defines research as, “the systematic
investigation into and study of materials and sources in
order to establish facts and reach new conclusions” 1.
The term research can have a variety of definitions and
meanings depending on the context. There are many
different branches of research with diverse focuses.
This guide will provide a general understanding of
research from a broad perspective and narrow it
down to the details that are of specific importance
to exercise and nutritional science. It’s important to
understand that many of the terms and concepts we
discuss have multiple definitions and lack clear
boundaries. With everything we discuss
in this guide, we will provide our best definition and
interpretation as we understand it. Our goal with this
guide is to provide you with the necessary knowledge
and information you need to critically read and
interpret scientific publications and their respective
findings. Remember, one study doesn’t prove anything.
Individual studies are pieces to a much larger puzzle.
Tuckman, 2012 2 characterized the research process
by these five properties:
1. Systematic: researchers follow certain rules and
parameters when investigating a specific question
and designing a research study. This involves
identifying variables of interest, designing a
study to test the relationships of the variables,
and collecting data to evaluate the problem and
prediction.
2. Logical: examining the procedures from testing
a theory allows for evaluation of the conclusions
that are made.
3. Empirical: collecting data.
4. Reductive: evaluating data to establish
generalizations for explaining relationships.
5. Replicable: the research process is recorded and
described in detail to allow for future studies to
test the findings and build future research.
The Scientific Method
You may not remember learning about the scientific
method in grade school, so let’s do a quick recap.
The scientific method is a formal set of steps that
researchers follow to conduct research. The scientific
method can be broken up in a variety of different ways,
but for the sake of simplicity we will divide the scientific
method into these four steps 3.
1. Identifying and developing a problem: all
research starts with identifying a problem or
topic of interest and defining the study’s purpose.
2. Formulating the hypotheses: a hypothesis
is a testable statement of the anticipated
results of a study. This is a formal prediction
of what will occur when the study is carried
out, based on prior results or theory.
3. Gathering data: researchers use processes
and validated methods to measure and
collect data during the study or experiment.
4. Analyzing and interpreting results: once data is
collected from the experiment or study, it is then
analyzed using statistical methods to determine
whether it supports the hypothesis. Researchers aim to
understand what was found and how it fits within
the context of other evidence.
Variables
Variables are factors that can be measured or
manipulated during research. Once the problem is
identified, researchers determine the variables of
interest and design a study to test and measure
them. There are a number of different types
of variables, and here we cover the primary variables
that you should know to further understand the
research process.
Independent Variables
Independent variables are what the researcher
manipulates to determine the relationship with, or effect
on, another variable. Independent variables are
also known as the experimental or treatment variable,
input, cause or stimulus. For example, an independent
variable could be the type of diet subjects are following
(e.g. high carb, high fat, low carb, etc.). Independent
variables can also have different levels. For example,
if a training study is evaluating high, moderate and
low training volume and muscle hypertrophy, training
volume would be the independent variable with the
different levels being high, moderate and low.
Dependent Variables
Dependent variables are measured following a
treatment or stimulus. Dependent variables are
known as the output or response variable and they are
observed or measured to determine the effect of the
independent variable 2. The dependent variable changes
as a result of the manipulation of the independent
variable. Examples of dependent variables are body
composition, strength, resting metabolic rate, blood
hormones, etc. If a study is investigating high fat vs.
low fat diet and weight loss, weight loss would be
considered the dependent variable while the type of
diet would be considered the independent variable.
Control Variables
Control variables are factors that could influence the
results and are therefore kept out of the comparison 3.
They are not part of what the study tests; instead, the
researcher holds them constant to cancel out or
neutralize any potential
effects they may have on the relationship between the
independent and dependent variables 2. The caloric
intake in a diet study could be viewed as a control
variable when comparing two different types of diets.
Extraneous Variables
Extraneous variables are factors that can influence the
relationship between the independent and dependent
variables but are not identified or controlled in the
study 3. This can cause spurious associations between
variables. There may be an association between the
independent and dependent variables, but it could be due
to both variables being affected by a third unknown or
uncontrolled (extraneous) variable. For example, let’s
assume a study is examining differences in weight loss
when following a high carb/low fat diet or a high fat/low
carb diet, and let’s say they don’t equate calories. With no
control over caloric intake, calories could be an
extraneous variable because they can drive differences
between groups irrespective of the type of diet.
Extraneous variables are usually identified following an
experiment when associations between variables have
been identified and examined further. They can also
be identified by researchers during the study design,
but because of lack of resources researchers may be
unable to control or account for a specific variable.
Other variables known as confounding variables and
covariates are similar to extraneous variables. The terms
are often used synonymously, although they differ slightly.
Just know that extraneous variables, confounding variables
and covariates are additional, often unknown variables
that weren’t identified or controlled in the study and
have some type of impact on the independent and
dependent variables.
Types of Research
There are many different types of research to
answer different kinds of questions and problems.
The types and categories of research are nearly
limitless, so we will discuss the common types that
are generally incorporated into exercise and sports
science research.
Basic vs. Applied
Research in exercise and nutrition science can be
placed somewhere on a spectrum between basic and
applied research 3. Basic research is commonly referred
to as “bench science”. Basic research is difficult and is
generally done in a laboratory under tightly controlled
conditions. Basic research operates under scientific
theories and often involves animals, but the relevance
or direct value to practitioners is limited 3. You can
think of this type of research as a scientist in a lab
with pipettes and cell cultures, studying underlying
molecular mechanisms. In contrast, applied research
is limited in the type of control it offers, but it’s much
more practical and carries high ecological validity,
meaning it applies to real-world settings/conditions.
This type of research involves human subjects and
is based on common practice and experiences.
Comparing different diet and training programs
while measuring fat loss or muscle growth would be
considered an applied form of research because the
comparisons are performed in real-world settings with
limited control over the environment.
Quantitative Research
Quantitative research is the most common type
of research you will find in exercise and nutrition
science. Quantitative research is concerned with
numbers and groups; the aim is to determine the
relationship between variables 4. The relationships
between variables are expressed through statistical
analysis (which we’ll cover later). This type of research is
objective, tightly follows the scientific method and
seeks to determine cause and effect. Studies that
are classified as quantitative research can be further
classified into two different study types known as
experimental and descriptive (observational).
Experimental - Experimental research involves the
manipulation of treatments or interventions. The
aim of experimental research is to establish
cause-and-effect relationships and commonly utilizes
some form of randomization (discussed below) 3.
Experimental studies require diligent control over
variables and other factors that may impact the
outcomes of a study. Experimental studies are also
known as longitudinal or repeated-measure studies 4.
Experimental studies measure subjects before and
following treatments or interventions. This type
of research aims to explain phenomena through
controlled manipulation of variables, commonly
viewed as the ‘gold-standard’ for research.
Descriptive - Descriptive research is also known as
observational research and measures things as they are
without intervening 4. There is no attempt to change or
modify certain behaviors. This type of research doesn’t
attempt to determine cause and effect (although
many media outlets and even researchers are guilty
of attempting to infer causation from these results)
and instead characterizes phenomena as they exist.
This type of research is less controlled and utilizes
questionnaires, interviews and observation.
Qualitative Research
Qualitative research is concerned with words and
individuals. Qualitative research is more subjective
and seeks understanding of multiple realities/truths
and requires constant comparison and revision.
Qualitative research rarely develops hypotheses prior
to the study and instead uses more general questions
to guide the study 3. Qualitative research has been
gaining interest in the field of exercise science and
is now being included more frequently. This type of
research has been historically used in social sciences
like psychology, sociology, and anthropology 5. This type
of research is concerned with behaviors like attitude,
beliefs, motivation and perception, all of which are
becoming popular in the field of exercise science and
sports medicine. Qualitative research is frequently used
to evaluate community and school physical activity
programs to understand the less tangible outcomes
like the participants’ attitudes and experiences about
a program of interest 5. Qualitative methods of data
collection can include open-ended questionnaires,
interviews or market research focus groups 5.
Study Designs
Animal models
Animal model research commonly includes rats or
mice as subjects to perform more intensive and
controlled experiments. Other species are included
in various types of research and many debate the
ethical considerations associated with this design.
Nevertheless, humans share many anatomical and
physiological similarities with different animals, which
allows investigation into underlying mechanisms.
Animal models allow for testing of novel therapies
before applying to humans, although not all results can
be directly translated to humans 6.
Controlled Trials
Controlled trials include a group that does not receive
a specific treatment or intervention. This is called the
control group and either receives nothing at all or a
placebo.
Placebo-Controlled - When one of the treatments is
inactive and does not produce any impact or effect on
any of the variables, it’s considered a placebo.
Placebo-controlled trials can be single or double blinded.
Single-blinded trials are when the subjects are blinded
to the type of treatment they’re receiving. In other
words, they don’t know if they’re getting the active or
inactive treatment. This is done to control for the placebo
effect. If subjects believe one treatment is more or
less effective than the other, it can actually cause a
psychosomatic change to occur irrespective of the
treatment itself. In double-blind trials both the subjects
and the researchers are blinded to the treatments,
and when done properly researchers remain blinded during
the statistical analysis as well.
Randomized Controlled Trials - Randomly assigning
participants to groups reduces the risk of researcher
bias on the outcomes of interest and makes the
groups more likely to be similar at baseline. This type
of study design is of
the highest quality because it tightly controls for
factors and variables that could influence the results,
regardless of the effectiveness of the treatments or
interventions.
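As a rough illustration of what randomizing participants to groups can look like, here is a minimal Python sketch. The function name `randomize`, the subject labels and the group names are all hypothetical, and real trials often use more elaborate schemes (for example, stratifying by sex or training status) that are not shown here.

```python
import random


def randomize(participants, group_names, seed=None):
    """Shuffle participants, then deal them into groups in turn.

    The seed is optional and only makes the example repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)

    groups = {name: [] for name in group_names}
    for index, person in enumerate(shuffled):
        # Dealing in turn keeps the group sizes as equal as possible.
        name = group_names[index % len(group_names)]
        groups[name].append(person)
    return groups


# Hypothetical study with 20 subjects and two groups.
participants = [f"subject_{i:02d}" for i in range(1, 21)]
groups = randomize(participants, ["treatment", "control"], seed=42)

for name, members in groups.items():
    print(name, len(members), members)
```

Because the researcher does not choose who goes where, they cannot consciously or unconsciously stack one group with the subjects most likely to respond.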
Crossover Designs
This type of study design includes both groups receiving
both treatments at different times. For example, group
1 may receive treatment 2 and group 2 may receive
treatment 1. After a specified time period, treatments
are switched to the other group. These studies are
unique in that each subject is able to be used as their
own control since every subject receives both treatments.
Case Studies
Case studies observe and report data on one
participant (n = 1). Case studies provide an in-depth and
detailed analysis that can assist in developing theories,
evaluating programs, and developing interventions 7.
Case studies lack a specific intervention or treatment
and instead observe and control testing procedures.
This type of study design is generally categorized
as a type of quantitative, descriptive study, but can
be used in qualitative research as well 7. This type of
study design has gained popularity in physique athlete
research because it’s hard to recruit that type of
population and implement an intervention that they’re
willing to follow. Case studies have also been widely
utilized in fields such as medicine, psychology,
counseling and sociology 3.
Cohort Studies
This is a type of longitudinal study that investigates
a certain sample of people that share defining
characteristics. This type of design can be experimental
or observational depending on how it is applied.
Types of Publications
After a scientific study is conducted, analyzed and
written, it’s then submitted to a journal for peer review.
Peer review involves one or more professionals or
experts within the same field critically evaluating
the submitted manuscript. Reviewers can
recommend rejecting the paper after reading it or suggest
revisions for the authors to complete before the
paper can be accepted. The peer review process is
not perfect by any means, but it provides a form of
regulation to maintain the quality and integrity of the
scientific literature and ensure the study is suitable for
publication. Journals differ slightly
in their rules and regulations. They also vary in the
way their publications are formatted, while following
a general template. Many scientific journals have what’s
called an impact factor. A higher impact factor is
generally taken as a sign of a higher quality journal,
although it is an imperfect proxy for the quality of any
individual study. The impact factor is calculated based
on the number of citations received by the articles
published in that journal. There are a number of different
types of scientific publications, but here we briefly
describe the primary types you’ll encounter.
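As a rough sketch of the calculation, the widely used two-year impact factor for a year y divides the citations received in that year by the number of items the journal published in the two preceding years. The symbol IF and the wording inside the fraction are our own labels for illustration.

```latex
\[
\mathrm{IF}_{y} =
  \frac{\text{citations in year } y \text{ to items published in years } y-1 \text{ and } y-2}
       {\text{citable items published in years } y-1 \text{ and } y-2}
\]
```

For example, a hypothetical journal whose 200 citable articles from the previous two years were cited 500 times this year would have an impact factor of 500 / 200 = 2.5.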
Original Research
Original research is a standard peer-reviewed
publication, what you would consider to be a published
scientific study. This type of publication follows a
general format including an introduction, methods,
results, discussion and conclusions. Original research
is considered a primary source and includes data and
results that have not been published previously.
Narrative (Literature) Review
Narrative reviews are considered secondary sources
and provide a review and general consensus on a
specific topic. Authors collect relevant, primary
source articles relating to a specific topic and provide
a summary of the most current and relevant evidence
pertaining to that topic. Narrative reviews are different
from systematic reviews in that they are based on
the opinion of the authors and lack strict control over
which studies to include in the review. You can think of
these as an opinion-based article including a collection
and summary of original research. These can be helpful
when trying to understand concepts, theories or a body
of evidence regarding a specific topic, but be careful
accepting them as truth since they reflect only the opinion
of the researchers who wrote them. These reviews can be
subject to confirmation bias and cherry picking of studies
that fit the authors’ narrative.
Systematic Reviews
The main purpose of systematic reviews is to create
generalizations by integrating empirical research 8.
Systematic reviews attempt to answer a specific
research question and use a systematic process to
collect relevant data sources and synthesize the
empirical findings. Systematic reviews address relevant
theories, critically analyze the data of the included
studies, attempt to resolve conflicting evidence on a
topic and identify central issues for future research 8.
Systematic reviews are a superior form of a literature
review because they use a systematic process to
collect, evaluate and synthesize the data on a particular
subject. Commonly thought to be the same thing as a
meta-analysis, systematic reviews differ in that they
don’t use any formal statistical methods to analyze the
combined data of studies; they simply summarize the
empirical evidence.
Meta-Analysis
Meta-analyses include the results of two or more
studies. Meta-analyses were first introduced in 1976
by Gene Glass and defined as “a technique of literature
review that contains a definitive methodology and
quantifies the results of various studies to a standard
metric that allows the use of statistical techniques
as a means of analysis” 3. Meta-analyses can be
distinguished from literature reviews because they
include a definitive methodology for including specific
studies in the literature analysis, and the results of
various studies are quantified to a standard metric
called effect size (which we will cover later) 3. Different
from systematic reviews, they use various statistical
methods to combine and analyze the data of a number
of studies. Meta-regressions are an extension of
meta-analyses that use a more advanced
statistical tool to assess the relationships between
variables. Meta-regressions account for covariates or
other study characteristics of interest. When carried
out properly, meta-analyses are considered the highest
quality of scientific study.
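To give a feel for how results from different studies can be put on a standard metric and combined, here is a minimal Python sketch. It assumes one common choice of effect size (Cohen's d, a standardized mean difference) and one simple pooling approach (a fixed-effect, inverse-variance weighted average). The guide does not say which methods a given meta-analysis uses, and the study numbers below are made up for illustration.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference between two groups (Cohen's d)."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def d_variance(d, n_a, n_b):
    """Common large-sample approximation of the sampling variance of d."""
    return (n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b))


def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted average: precise studies count for more."""
    weights = [1 / v for v in variances]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)


# Hypothetical studies: (mean, SD, n) for treatment, then for control.
studies = [
    (10.0, 2.0, 20, 8.0, 2.0, 20),   # small study, large effect
    (9.0, 2.0, 50, 8.5, 2.0, 50),    # larger study, small effect
]

effects = [cohens_d(*study) for study in studies]
variances = [d_variance(d, s[2], s[5]) for d, s in zip(effects, studies)]
pooled = fixed_effect_pool(effects, variances)

print("effect sizes:", [round(d, 2) for d in effects])
print("pooled effect:", round(pooled, 2))
```

The pooled estimate sits closer to the larger study's effect because that study has the smaller variance and therefore the larger weight.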
Article 02
Reading and
Interpreting
Research
Reading research can be a challenging task for
those who are not experienced and educated to read
scientific publications. Before being able to interpret
results and findings from research, it’s necessary to
understand the layout and how to read a study. Most
peer-reviewed journal publications follow a similar and
general format, with minor differences. Understanding
the general layout of publications will make it easier
to identify key details of studies and understand the
findings and takeaways. This section of the guide
focuses on how to read scientific studies and interpret
their findings. After we cover the general layout and
briefly describe each section of a published study, we
will cover basic statistics and dig into challenges faced
by researchers in exercise and nutritional science. We
will finish this section with how to trust studies and
evaluate studies reporting conflicting findings.
General Format
The author line of publications follows a specific
order. The first author is the one who coordinated
and had the largest role or responsibility in the study.
Generally, if this is a graduate student’s project or
thesis their mentor or supervisor will be listed last. The
remaining order of authors will be based on their level
of contribution. The general format for peer-reviewed,
academic publications include five sections known
as the introduction, methods, results, discussion, and
conclusion. The abstract is another section, but it is
separate from the actual publication.
Abstract
After the study title and author line you will find the
abstract. The abstract is a paragraph summary of the
study. The abstract includes one to two sentences
from each of the sections of the publication. Don’t be
an abstract warrior and only read the abstract to report
what the study found. The details are important, and
findings are accompanied with caveats.
Introduction
The introduction is the first section of all publications.
The introduction includes a discussion of recent and
previous studies that relate to the current study of
interest. Intros start with more general background
information and progress into key details and
publications that apply to the current study. The intro
also discusses any controversies between theories
or hypotheses and highlights the importance of the
current study. The intro includes two key pieces of the
study known as the purpose and the hypothesis:
Purpose - The purpose of the study is a one to two
sentence statement that describes the aim or the reason
the study is being carried out.
Hypothesis - Based on previous research and
understanding, researchers develop what’s known
as a hypothesis, a short explanation of the predicted
results. Hypotheses cannot be proven, but when the
data backs up the hypothesis it is “supported” and when
it doesn’t it’s “rejected” 10.
Materials & Methods
The Materials & Methods (methods) section is where the
study design is explained, detailing the procedures for
each measurement during experimentation. Methods
provide specific details of how the experiment was
carried out so that future research can attempt to
replicate and build on previous results. Key details that
you want to focus on are:
• Variables of interest: what did the researchers
manipulate and have control over (independent),
and which variables were tested or measured
(dependent)?
• Participants: how many people were studied
and what were their characteristics? Were they
male? What was their training status? Were they
overweight?
• Duration of the study: how long did the
experiment last and how often did they observe
and measure changes?
• Instrumentation: which devices and methods
were used to collect data? How was body
fat percentage (BF%) tested? Did they use
appropriate equipment for what they were
attempting to test? Were their measurements
valid and reliable?
• Level of control: were the participants in a tightly
controlled environment (metabolic ward) or was
this a free-living experiment? Resistance training
studies that include supervision are
more tightly controlled than studies that allow
subjects to train on their own. Diet studies that provide
food to subjects have more
control than studies that rely on self-reported
nutritional intake. Ethical and diligent researchers
will specify their study’s strengths and limitations
in the discussion, but paying close attention to the
details in the methods will allow you to identify the
level of control in a particular study.
Results
This is the section of publications that most people
skip over or shy away from because most people find
math and numbers confusing. Later we will provide a
brief and general overview of statistics to help with
your confidence and ability to interpret results. In the
results section researchers report the outcomes of the
statistical tests that include the relationships between
data from experimentation 2. The results section
also includes the majority of figures and tables that
represent the data in a different way than reported in
the text. The results section is written so that readers
can interpret the data from only reading the text, and
the figures are designed to represent the data in a
way that allows for interpretation without having to
read the results section. The results section does
not include any of the researcher’s interpretation or
explanation of the data; that occurs in the discussion
section. There are three types of effects generally (not
always) reported in the results section that you should
focus on. For the following sections we will reference
this table for an example:
Changes in bodyweight between a high carb and high fat diet.
Diet Group    Baseline    Post-Testing
High Carb     200lbs      180lbs
High Fat      190lbs      175lbs
Main Time Effect - This simply explains if there was
a significant change in the dependent variable from
baseline to post-testing for all subjects. Referring to
the table above, this will tell us if there was a change
in body weight from baseline to post-testing for both
groups (high carb & high fat) combined.
Group Effect - This tells us if there was a significant
change within a group. This does not compare groups,
but rather tells us if a group made a real change. For
example, this would tell us if the high carb group
experienced a significant change from baseline to
post-testing.
Interaction (group x time) Effect - An interaction
effect is what you want to focus on if you wish to
compare groups. This compares the rate of body
weight change from baseline to post-testing between
dieting groups. In other words, did the high carb group
lose more body weight from baseline to post-testing or did
the high fat group lose more body weight from baseline to
post-testing?
Discussion
Like the intro, the discussion is a heavier section
where researchers provide their interpretation and
explanation for the results they found. There is no
general format for this section, but it includes an
in-depth summary of the results from the study that was
conducted. The majority of the discussion is focused
on comparing and contrasting the results of the
conducted study to what has been previously reported
by similar studies. The discussion and intro are good
places to learn about other studies you might not
have known about. Towards the end of the discussion
you’ll generally find a disclosure of the strengths and
limitations of the study. Every study has limitations
and if a study doesn’t explicitly mention its primary
limitations, that could be a red flag. Some publications
also include a conclusion within the discussion section,
but some journals may include a separate section for
conclusions or practical recommendations.
Conclusion
Everyone knows what a conclusion is, but in this
short section authors give a final summary of the
main takeaways and practical recommendations.
This is a more concise version of the discussion,
short and practical.
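Returning to the body weight table in the Results section, the three effects can be made concrete with a little arithmetic. The Python sketch below is our own illustration using only the group averages from that table. It assumes both groups are the same size, and it only shows which quantities each effect looks at. Deciding whether a change is significant requires the individual subjects' data and a statistical test, which are not shown here.

```python
# Group averages from the body weight table (lbs).
table = {
    "High Carb": {"baseline": 200, "post": 180},
    "High Fat": {"baseline": 190, "post": 175},
}


def change(group):
    """Change from baseline to post-testing for one group."""
    return table[group]["post"] - table[group]["baseline"]


# Group effect: the change within each group, looked at separately.
group_changes = {group: change(group) for group in table}

# Main time effect: the change over time for both groups combined
# (a simple average, assuming equal group sizes).
time_change = sum(group_changes.values()) / len(group_changes)

# Interaction (group x time): did one group change more than the other?
interaction = group_changes["High Carb"] - group_changes["High Fat"]

print("within-group changes:", group_changes)
print("combined change over time:", time_change)
print("difference in change between groups:", interaction)
```

Here the high carb group lost 20 lbs, the high fat group lost 15 lbs, the combined change is 17.5 lbs lost, and the 5 lb gap between groups is what an interaction effect would test.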
Article 03
Statistical
Concepts
Overview of Statistics
Most people cringe at the word statistics and we understand why. Math and statistics can be complex and
difficult to understand. There are various meanings for
the word statistics, which adds to the confusion. With
a mixture of math and logic, statistics is a branch of
mathematics that is concerned with the collection, analysis and interpretation of data 3. Data are scores and
values that we obtain from measuring the outcomes
(dependent variables) of interest in a study. Collecting data is only one piece of the puzzle; if researchers
don’t know what to do with the data and how to properly
describe it, then the findings may seem underwhelming. Statistics are a way of describing data characteristics and examining the relationships between
variables, which allows for greater objectivity when interpreting research and drawing conclusions. This
section provides a simple overview of some common
and basic statistical concepts that you will encounter
throughout exercise and nutrition research. Again, this
is a brief section and doesn’t even scratch the surface
of the broader and more complex statistical methods
that exist. Statistics operate under a number of assumptions and rules; if these are violated, the statistics can
misrepresent the data. Statistics is not our area of expertise, and it’s important to realize that statistics can be
misused to deceive people who don’t fully understand them
into believing the data is more appealing than it
actually may be.
Percent Change
Very simply, this is the change between two values expressed as a percentage of the starting value. You have to be careful with
percentage change because it can sometimes appear
to be a greater change than it actually is. That’s why you
also want the raw or true values. For example, if a study
is looking at leptin changes and they have a baseline
value of 0.3ng/mL and a post-test value of 1.0ng/mL,
the absolute change is 0.7ng/mL, but the percentage
change is 233% [(1.0 - 0.3) / 0.3 x 100]. While the absolute change
is small and may not be meaningful, the percentage
change can make it appear as if it’s a big deal.
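The leptin example can be checked with a few lines of Python. This is a minimal sketch; the function name `percent_change` is our own.

```python
def percent_change(baseline, post):
    """Change from baseline, expressed as a percentage of baseline."""
    return (post - baseline) / baseline * 100


baseline, post = 0.3, 1.0  # leptin, ng/mL

absolute = post - baseline
relative = percent_change(baseline, post)

print(f"absolute change: {absolute:.1f} ng/mL")
print(f"percentage change: {relative:.0f}%")
```

Reporting both numbers side by side makes it obvious when a large percentage comes from a very small starting value.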
Central Tendency
The mean is probably one of the most commonly understood mathematical terms. The mean describes
the average value of a group of numbers. In statistics,
the mean is a measure of central tendency, which represents a central or balance point within a set of data 10.
The mode and median are similar to the mean because
they represent centrality, but technically they’re slightly different. Mode refers to the most frequent value
that appears in a data set, which may or may not be
close to the mean. Median refers to the middle point of
a data set, in other words 50% of the scores will fall under the median. For example, let’s assume the following
10 scores were collected during an experiment:
6 6 6 10 11 12 14 14 16 17
Mean = 11.2 The average of all scores
((6+6+6+10+11+12+14+14+16+17) / 10)
Median = 11.5 Middle value
(5 scores below and 5 scores above this value)
Mode = 6 Most frequent score
If the data set had an odd number of values, then the
middle value is simply the median (ex. 1, 2, 3; 2 would be
the median). Just remember there are slightly different ways to describe central tendency, but most often
you’ll hear about the mean since mode and median are
only reported for certain instances. When evaluating
data based on calculated means it’s important to identify any outliers or extreme values in the data. Outliers
and high variability of data can produce inflated or misleading results because the mean is sensitive to outliers and extreme values. In contrast, the median is not
sensitive to outliers and extreme values, meaning the
median won’t change if there is a greater spread in the
data. If the mean is being reported it’s important to also
take note of the standard deviation to account for this.
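The 10 scores from the example can be checked with Python’s built-in statistics module. This is only a sketch; the data set with the outlier is a hypothetical variation we added to show how the mean and median react differently.

```python
from statistics import mean, median, mode

# The 10 scores from the example
scores = [6, 6, 6, 10, 11, 12, 14, 14, 16, 17]

print("mean:  ", mean(scores))
print("median:", median(scores))
print("mode:  ", mode(scores))

# Hypothetical variation: swap the highest score for an extreme value
with_outlier = scores[:-1] + [170]

# The mean is pulled toward the outlier, the median stays where it was
print("mean with outlier:  ", mean(with_outlier))
print("median with outlier:", median(with_outlier))
```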
Standard Deviation
The standard deviation is concerned with the variability or the spread of a data set. As previously mentioned,
the mean is the central point of a data set and the standard deviation is an estimate of the variability around
that central point. In other words, the standard deviation represents the typical amount that a score deviates from the mean. When the standard deviation is low
that means the spread or dispersion of scores is small
and more tightly grouped closer to the mean. When the
standard deviation is large it signifies a wide spread or
high variability of scores; when this occurs the mean
may not be a good representation of the data. The
mean and standard deviation are forms of descriptive statistics, which are useful for summarizing the
data of a specific group. Meaning, they are only able
to describe the data we have accrued; they cannot tell
us if the results we acquired will happen again. Other statistical tests fall under another form known
as inferential statistics, which can allow (not always)
for conclusions and generalizations from a sample to the
larger population.
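To illustrate, here is a sketch comparing two sets of scores that share the same mean but have very different spreads. The first set is the one from the central tendency example; the second is a hypothetical set we made up for contrast. We use the sample standard deviation (n - 1 in the denominator), which is an assumption about how a given paper computed it.

```python
from statistics import mean, stdev

# Scores from the central tendency example (wide spread)
wide = [6, 6, 6, 10, 11, 12, 14, 14, 16, 17]

# Hypothetical scores with the same mean but tightly grouped
tight = [10, 11, 11, 11, 11, 11, 11, 12, 12, 12]

for label, scores in (("wide", wide), ("tight", tight)):
    # stdev() is the sample standard deviation
    print(f"{label}: mean = {mean(scores):.1f}, SD = {stdev(scores):.2f}")
```

Both groups would be reported with a mean of 11.2, but only the standard deviation reveals how well that mean represents the individual scores.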
P-value
Probability, the likelihood that something will occur,
is the underlying concept of p-values.
A p-value is the probability of obtaining findings at least as extreme as those observed if chance alone were at work (that is, if there were truly no effect); it’s impossible to
have a p-value of 0 3. In exercise science the p-value is
considered to be ‘significant’ at p < 0.05. Meaning, if there were truly no effect, findings this extreme would be expected to occur by chance fewer than 5 times in 100, so researchers treat the observed differences
as a real change. In the results section when changes of a specific variable are reported there is a p-value
reported after (e.g., 103.5 ± 15.1 ng/dL (p = 0.02)). In
exercise and nutritional science, if the p-value is greater than 0.05 the result isn’t deemed to be significant.
This is often stated as ‘supporting the null hypothesis’, although strictly speaking researchers ‘fail to reject’ it.
The null hypothesis states there isn’t a relationship or
difference and instead the findings are due to sampling error or random chance. Statistical tests are performed to either retain or reject the null hypothesis,
and a p-value less than 0.05 rejects the null hypothesis
in favor of the research hypothesis. Statistical significance
is what you should identify when interpreting
results, but significant differences aren’t the only thing
you want to focus on. A study might show that subjects on one type
of diet lost significantly more weight than subjects on another type
of diet, but what if it was only by 0.5 lbs? That doesn’t
mean much, but how do you determine if significant results are meaningful? While p-values provide statistical
significance, effect sizes allow researchers to communicate practical significance of their results 11.
Effect Size
The effect size reflects the meaningfulness of the
changes that occurred during an experiment. While the
p-value tells us if there was a statistically significant
and real change, the effect size tells us the magnitude
of that change. In other words, effect sizes tell us the
magnitude of a relationship between two variables 8.
Effect size is an absolute value that represents the
standardized difference between two means 3. Effect
sizes are frequently used in meta-analyses to compare
results between different studies. Effect sizes have
been considered as the most important result of empirical studies because they are useful for providing the
magnitude of effects in a standardized metric despite
differences in measurement techniques, they allow
for meta-analytic conclusions, and they are commonly
used for future study planning using a power analysis 11.
Effect sizes can be interpreted based on recommendations by Cohen (1988), which state that effect sizes can
range from small (d = 0.2) to medium (d = 0.5) to large (d
= 0.8) 12. Larger effect sizes indicate more meaningful differences. Effect
sizes are also commonly used to plan future studies by
predicting the sample size needed to detect a difference; this type of test is known as a power analysis.
Power Analysis
Statistical power relies on the effect size, the significance criterion (generally p < 0.05) and the number of
subjects (sample) in a study 11. When researchers are
planning a study, they want to know how many subjects they will need to detect a significant difference
between treatments or groups. To accomplish this
they perform what’s called an ‘a priori power analysis’
which includes using effect size estimates from similar research, the significance criterion of p < 0.05 and
a generally accepted minimum level of power (0.80) to
calculate the minimum sample size needed to observe
an effect of a specific size 11, 12. Researchers could also
use the sample size, significance criterion and power
to calculate the minimal detectable effect size.
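The effect size and power analysis ideas can be sketched in a few lines. Everything here is illustrative: the weight loss values are hypothetical, Cohen’s d is computed with a pooled standard deviation, and the sample size uses a normal approximation for a two-group comparison. That approximation gives slightly smaller numbers than the exact t-based calculation that dedicated power software performs.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(group_a, group_b):
    """Standardized difference between two means, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def subjects_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group (two-sided, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical weight loss (kg) for two diet groups
diet_a = [4.0, 5.5, 6.0, 3.5, 5.0]
diet_b = [3.0, 4.5, 5.0, 2.5, 4.0]
print(f"Cohen's d: {cohens_d(diet_a, diet_b):.2f}")

# Sample size needed for Cohen's small, medium and large benchmarks
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {subjects_per_group(d)} subjects per group")
```

Notice how quickly the required sample size grows as the expected effect gets smaller, which ties directly into the recruitment challenges discussed in Article 4.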
Correlation Coefficient (r)
Most of you have probably heard the saying, “correlation does not equal causation”. Correlation is an association and in research we often want to know the
degree of association between two variables across a
group of subjects. In other words, an increase or decrease in one variable may occur with an increase or
decrease in another variable, but the changes in one
variable are associated (not caused) with changes in
the other variable. There are different types of correlations used in statistics, but here we discuss the
r-value, also known as the ‘Pearson product-moment
correlation coefficient’.
The correlation coefficient is a statistic used to describe the relationship between two variables (independent & dependent). The r-value can range from -1
to +1. A negative r-value represents an inverse relationship between two variables and a positive r-value indicates a direct relationship (we’ll show you this visually
in the ‘data representation’ section). For example, a decrease in body weight is commonly associated with a
decrease in leptin, this would be an example of a direct
relationship (+r), whereas a decrease in body weight is
commonly associated with an increase in ghrelin, this
would be considered an inverse relationship (-r). An
r-value of 0 indicates no relationship and an r-value of
+1 or -1 indicates a perfect correlation, however it is likely impossible to achieve exactly 0 or ±1 due to the variability in subject responses and other influences related to physical
characteristics, traits or abilities 10. In the scatterplot
section below, we will provide a visual explanation for
the strength and relationships of correlations. It is not
uncommon to evaluate the strength of correlation on a
spectrum (0.1 – 0.3 = weak, 0.3 – 0.5 = moderate, 0.5 – 1
= strong) 2. However, some statisticians advise against
this practice because correlation is context specific 10.
For example, a correlation in biological in vitro experiments could commonly consider a 0.9 to be a strong
correlation and correlations close to 0.5 would be
much weaker, whereas free-living experiments could
consider a 0.6 to be a strong correlation. Regardless,
it’s important to remember that the closer the r-value
is to +1 or -1, the stronger the correlation is.
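Here is a minimal sketch of how the r-value is calculated, written out by hand so each step is visible. The body weight, leptin and ghrelin changes are hypothetical numbers chosen only to mirror the direct and inverse relationships described above.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 scores")
    mean_x, mean_y = mean(x), mean(y)
    # How the two variables move together around their means
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies on its own
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return co_variation / math.sqrt(var_x * var_y)


# Hypothetical changes for 5 dieting subjects
weight_change = [-1.0, -2.5, -3.0, -4.5, -6.0]   # kg
leptin_change = [-0.5, -1.0, -2.0, -2.5, -3.5]   # ng/mL
ghrelin_change = [20, 45, 50, 80, 110]           # pg/mL

print(f"weight vs. leptin:  r = {pearson_r(weight_change, leptin_change):+.2f}")
print(f"weight vs. ghrelin: r = {pearson_r(weight_change, ghrelin_change):+.2f}")
```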
Coefficient of Determination (r2)
You will also encounter a statistic known as the coefficient of determination (r2). This is commonly used
with regression analysis and can be conceptualized as
a ‘correlational effect size’. It provides a percentage
of variance in one variable (dependent, outcome variable we want to predict) that can be accounted for by
the variance in the other variable (independent, predictor variable) 3. By squaring the r-value you obtain
R-squared (r2) which can be expressed as a percentage. For example, if we wanted to predict how much
leptin (dependent) would decrease as fat mass (independent) decreased in a group of people dieting we
would use the coefficient of determination. Let’s assume we obtained an r-value of 0.76 and you square it
(r2 = 0.76 x 0.76 = ~0.58 = 58%) the percentage signifies 58%
of the changes in leptin are predicted or explained by
changes in fat mass. If there was a regression line calculated and drawn on a scatterplot with leptin (y-axis)
and fat mass (x-axis), the r2 would describe how closely the data points cluster around
that regression line; it does not mean that 58% of the data points fall on the line.
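As a rough sketch, the coefficient of determination can be obtained either by squaring the r-value or by fitting a regression line and asking how much of the variation in the outcome the line accounts for. For a simple regression with one predictor, the two approaches give the same number. The function names and the four data points are our own, used only for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares regression line: returns (slope, intercept)."""
    mean_x, mean_y = mean(x), mean(y)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Share of the variation in y accounted for by the regression line."""
    slope, intercept = fit_line(x, y)
    mean_y = mean(y)
    total_variation = sum((b - mean_y) ** 2 for b in y)
    unexplained = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - unexplained / total_variation


# The r-value from the example in the text
r = 0.76
print(f"r = {r}, r2 = {r ** 2:.2f} ({r ** 2:.0%} of variance explained)")

# Hypothetical predictor (x) and outcome (y) scores
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(f"r2 from the fitted line: {r_squared(x, y):.2f}")
```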
T-test
The statistical test used to compare the differences
between two means is known as a t-test. The larger the
t-value, the greater the difference between means relative to the variability in the data;
larger t-values are likely to produce lower p-values.
There are two types of t-tests we want to focus on.
Independent - This type of t-test determines whether
two sample means are significantly different when the
two groups being compared are unrelated. For example, if a study randomized 20 subjects to a high carb
diet and 20 subjects to a high fat diet and you wanted to
know the extent to which the mean weight loss differed
between groups, you would perform an independent or
unpaired t-test. This type of t-test could also be used
to determine how different the two dieting groups were
in terms of body fat percentage at baseline since baseline differences can pose problems.
Dependent - Dependent or paired samples t-tests are
used when comparing two groups that are related in
some way or one group at multiple points in time (baseline and post-test: repeated measures). For example,
if a study was measuring muscle thickness in 10 males
before beginning a training program and then again following a training program, they would use a dependent
t-test to evaluate the difference between the mean
muscle thickness at baseline and post-testing. If there are more than two groups we use a different statistical test.
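The two t-tests can be sketched by hand as shown here. The data are hypothetical, the independent version assumes equal variances in both groups (the classic Student’s t-test), and we only compute the t-value. Converting it to a p-value requires the t-distribution, which statistical software handles for you.

```python
import math
from statistics import mean, stdev


def independent_t(group_a, group_b):
    """t-value for two unrelated groups (pooled variance assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def paired_t(baseline, post):
    """t-value for the same subjects measured twice."""
    if len(baseline) != len(post):
        raise ValueError("each subject needs a baseline and a post-test score")
    # A paired test works on each subject's own change score
    changes = [after - before for before, after in zip(baseline, post)]
    standard_error = stdev(changes) / math.sqrt(len(changes))
    return mean(changes) / standard_error


# Hypothetical weight loss (kg) in two separate diet groups
high_carb = [4.0, 5.5, 6.0, 3.5, 5.0]
high_fat = [3.0, 4.5, 5.0, 2.5, 4.0]
print(f"independent t = {independent_t(high_carb, high_fat):.2f}")

# Hypothetical muscle thickness (mm) before and after training
pre = [30, 32, 28, 35, 31]
post = [32, 35, 29, 38, 33]
print(f"paired t = {paired_t(pre, post):.2f}")
```

The paired test produces a much larger t-value here because every subject improved by a similar amount, so the variability of the change scores is small.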
Analysis of Variance
If there are more than two means/groups we wish to
compare, we need to perform an extension of a t-test
known as Analysis of Variance (ANOVA) 10. The score
that is generated from running an ANOVA is known as
the F-value (similar to a t-value) and indicates the size
of group mean differences.
One-Way - A one-way ANOVA is used to determine if
statistically significant differences exist between 3 or
more means/groups. For example, let’s assume a study
is comparing training volume with three groups (low,
medium, high) and the dependent variable of interest is
muscle growth. The ANOVA would tell us whether a difference
exists somewhere among the low, medium and high groups. However, a one-way ANOVA fails to tell
us where the significant difference in muscle growth
is for the three groups. You could evaluate this unofficially by examining group means, but to statistically
test where the difference is, you will have to perform a
post-hoc test.
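To show where the F-value comes from, here is a sketch of a one-way ANOVA computed by hand for the training volume example. The muscle growth numbers are hypothetical. The F-value is the variation between group means divided by the variation within groups; turning it into a p-value requires the F-distribution, which we leave to statistical software.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F-value for a one-way ANOVA across two or more independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more scores than groups")
    grand_mean = mean([score for g in groups for score in g])
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean
    ss_within = sum(
        (score - m) ** 2 for g, m in zip(groups, group_means) for score in g
    )
    mean_square_between = ss_between / (k - 1)
    mean_square_within = ss_within / (n_total - k)
    return mean_square_between / mean_square_within


# Hypothetical muscle growth (% increase) for three training volumes
low = [2.1, 2.8, 3.0, 2.5]
medium = [3.4, 3.9, 4.1, 3.6]
high = [4.0, 4.8, 5.1, 4.5]
print(f"F = {one_way_anova_f(low, medium, high):.2f}")
```

A large F-value only says that the group means are not all the same. It takes a post-hoc test to say which groups differ.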
Repeated Measures - You will frequently encounter
repeated measures ANOVA in the statistical analysis
section of many exercise science studies. This statistical test is used to compare the same individuals
across time points (repeated measures). For example,
let’s assume a study is comparing muscle growth at the
beginning, middle and end of a training program. You
would run repeated measures ANOVA to determine if
there were significant changes between time points.
Post-Hoc - If an ANOVA detects a statistical difference
between means, we then want to determine where this
significance lies. Is it occurring within a group over
time (baseline to post-test) or did one group exhibit a
greater difference compared to the others (interaction)?
There are a number of different types of post-hoc tests which we
won’t cover here, but just know that this gives a more
specific idea of differences between means/groups.
Data Representation
Figures, graphs and tables are used to represent data
visually, which can provide a unique perspective and
greater understanding of the results. There are tons of
different types of figures available, but we’ll talk about
a few common types you’ll often see. There are a couple of key
points we want to make regarding most figures and
graphs. Different journals will have varying formatting
requirements, but you can expect some components
to be the same. Underneath the actual figure there will
be a title and a description of what the figure is displaying. You will also find any special symbols (e.g., *)
defined here; generally these symbols are
used to represent statistical significance or depict a
relationship between variables. It’s important to take
notice of the axis titles, units of measurement and the
scale that is used. There are instances when the scale
of a figure doesn’t start at 0 and this can lead to misunderstanding of the actual data. If a graph or figure
scale doesn’t start at 0 there should be some type of
break expressed with two dashed lines (//) to represent
a nonzero baseline. Generally, graphs are better for
providing a general overview or “big picture” view of a
set of data, whereas tables are better for exact values
and individual raw data.
Histogram
Histograms are a common figure and generally the easiest to understand. Histograms are great when comparing groups or the distribution of a set of scores for
a particular group. Most people would also consider or
refer to these figures as “bar charts”. However, there’s
a slight difference. Bar charts are used for qualitative
data that are separated into categories (e.g., gender,
race, other specific groups) and the bars are separated
and not touching each other. Histograms have vertical
bars that are directly adjacent to one another with no
space (unless there’s an interval with no scores), signifying continuity 10.
Scatter Plot
Scatter plots are another type of graph that most people are familiar with. This type of figure commonly reports data points for individual scores for two variables
but could also be used to display baseline and post-test
scores for an individual 13. You’ll find this type of figure
is used most for correlational analyses and while the
data points are not connected by lines, a non-vertical
line of fit can be generated to summarize or predict the
relationship between variables or data points, known
as simple regression 13. Simply by looking at scatter
plots we can get a pretty good idea of the type of correlation and its strength.
Line Graph
Line graphs depict related data points that are connected with a line, sometimes they include symbols
13. Line graphs are great when comparing time trials
where there are multiple testing points over a period
of time. For example, comparing the response of two
different supplement treatments over a predefined period of time.
Box and Whisker Plots
Box and whisker plots (box plots) are used to depict the
distribution of a data set. Once you understand each
component of a box plot, you’ll realize how simple and
effective they can be at summarizing a set of scores.
Usually box plots are vertical, but we have provided an
example below that is not to scale.
There are 5 elements in all box plots that you want to
know to understand this type of visual depiction:
1. Q1: This is the first side of the rectangle and signifies
the 25th percentile of the data set. Meaning,
25 percent of the scores fall under this line.
2. Median: The median (as described previously) is
the middle value and 50% of the scores fall under
this value.
3. Q3: This is the right side of the rectangle and represents the 75th percentile, meaning 75% of the
scores fall below this value.
4. Whiskers: The whiskers can be found on either
side of the rectangle and depict the minimum and
maximum values within a set of scores. However, these do not include any outliers or extreme
values.
5. Outliers & extreme values: Outliers and extreme
values are any scores or values that are widely different from the rest of the data set and “stick out”.
There’s actually a mathematical way to determine
these for a box and whisker plot, but we’ll spare
you the details. Just know they are represented by
the O and E below and can be expressed as other
special symbols in different publications.
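For the curious, here is a sketch of how the five elements could be computed. It assumes one widely used convention, where any score more than 1.5 times the interquartile range (Q3 minus Q1) beyond the box is flagged as an outlier. Software packages differ in how they calculate quartiles and where they draw the cutoff, so treat the exact numbers as illustrative. The scores are the ones from the central tendency example plus one hypothetical extreme score.

```python
from statistics import quantiles


def box_plot_summary(scores, fence_multiplier=1.5):
    """Quartiles, whisker ends and outliers under the 1.5 x IQR convention."""
    # "inclusive" treats the data as the whole group of interest;
    # other quartile methods give slightly different values.
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - fence_multiplier * iqr
    upper_fence = q3 + fence_multiplier * iqr
    inside = [s for s in scores if lower_fence <= s <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers reach the most extreme scores that are not outliers
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [s for s in scores if s < lower_fence or s > upper_fence],
    }


# Example scores plus one hypothetical extreme value (40)
scores = [6, 6, 6, 10, 11, 12, 14, 14, 16, 17, 40]
for name, value in box_plot_summary(scores).items():
    print(f"{name}: {value}")
```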
Forest Plots
You will mostly see forest plots in Joe Rogan podcasts
with James Wilks… just kidding. You typically see Forest Plots in meta-analyses because they depict the
individual results as well as the pooled results of the
meta-analysis. Forest plots will indicate the strength of
the treatment effect with the y-axis containing a list of
the studies included in the analysis and the x-axis will
have a distinction of what the studies favor (control vs.
treatment) 13. Each study will have their mean symbolized as a data marker and their respective confidence
interval (we will cover next, but generally 95%) represented as a horizontal line 13. The size of the data marker generally represents the sample size, or the weight
carried by that particular study in the meta-analysis.
Diamond markers are generally used to represent the
overall or pooled result 13. In the example below adapted from Morton et al. (2017), you will find three different
diamonds 14. The first two unfilled diamonds represent
the pooled results of trained vs. untrained samples and
the filled in or dark diamond represents the overall or
total results of the meta-analysis (including trained and
untrained subjects). Oftentimes forest plots will contain a
clear description of what each marker symbolizes underneath the actual figure.
Error bars
Elements that you will commonly see on most figures
are error bars. Error bars are lines that represent the
variability of the data being reported. There are different types of error bars and if the legend or description
under the figure doesn’t explicitly state what kind they
are they can be rather meaningless 25. The standard
deviation (SD) bars represent the typical difference between the data points and their mean, whereas standard error (SE) bars indicate how variable the mean will
be if you repeat the study over and over, and more subjects or samples decrease the SE 15. You’ll notice in the
forest plot above that they included 95% CI error bars,
which indicate where the true mean will fall within
that bar on 95% of occasions 15. SE and CI with wider
bars indicate larger error and shorter bars indicate
higher precision; as sample sizes increase the bars become shorter 15. Error bars are helpful in visually depicting the significance in changes between groups. As a rough guide, when
SE bars overlap the difference isn’t significant (95% CI bars can overlap slightly and still reflect a significant difference). In
other words, the larger the gap between error bars the
smaller the p-value will be. Error bars can be valuable in
justifying the authors’ conclusions, but like any statistic
they are only a guide and you should rely upon your own
logic and understanding to determine the meaningfulness in the results being reported 15.
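The three common types of error bars can be sketched from the same set of scores. Here we reuse the 10 scores from the central tendency example. The 95% CI uses a large-sample normal approximation (roughly 1.96 standard errors on either side of the mean), which is our simplifying assumption. With small samples, researchers typically use a slightly wider t-based interval.

```python
import math
from statistics import NormalDist, mean, stdev


def error_bars(scores, confidence=0.95):
    """Mean plus the half-length of SD, SE and CI error bars."""
    n = len(scores)
    sd = stdev(scores)                 # spread of individual scores
    se = sd / math.sqrt(n)             # precision of the mean itself
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {"mean": mean(scores), "sd": sd, "se": se, "ci": z * se}


scores = [6, 6, 6, 10, 11, 12, 14, 14, 16, 17]
bars = error_bars(scores)
print(f"mean = {bars['mean']:.1f}")
print(f"SD bar:     ± {bars['sd']:.2f}")
print(f"SE bar:     ± {bars['se']:.2f}")
print(f"95% CI bar: ± {bars['ci']:.2f}")

# Four times as many subjects with the same spread shrinks the SE
bigger_sample = error_bars(scores * 4)
print(f"SE with 40 subjects: ± {bigger_sample['se']:.2f}")
```

The same data produce three very different bar lengths, which is why a figure legend that doesn’t say which type was used is hard to interpret.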
Tables
Tables are generally self-explanatory, and the
different symbols are described in the table legend/description below. This table is from Layne’s PhD thesis, which
examined the time course of plasma amino acid levels in response to ingestion of various protein sources
[63]. What’s important to notice here is how the statistics are portrayed. The first number is the mean for the
particular group under the designated time category
and the second number is the standard error associated with the mean. The letters after the standard error
are used to statistically differentiate the means from
each other, while the means with an * indicate that they
are different from the baseline levels. For example, let’s
compare the 30-minute whey group leucine (Leu) levels to the 30-minute wheat group leucine levels. The
whey group has an ‘a*’ whereas the wheat group has
a ‘b*.’ This indicates that these values are statistically
different from each other (different letters) and both
are significantly different than baseline (because they
both have a *). However, let’s look at threonine (Thr)
levels in the Whey, Wheat, and Wheat + Leu groups at
90 minutes. The Whey group at 90 minutes has an ‘a*’,
while the Wheat group has a ‘b’, and the Wheat + Leu
group has an ‘ab.’ So what does this mean? It means
that the Whey group is statistically different from the
Wheat group and from baseline. Also, the Whey group
was not statistically different from the Wheat + Leu
group since they both share an ‘a.’ The Wheat + Leu
group was also not different from Wheat since they
both share the letter ‘b’ and they weren’t significantly
different from baseline.
Post-prandial changes for plasma amino acids 1-3
(column headings give the treatment and the time in minutes)

    | Baseline | Whey 30      | Whey 90     | Whey 135    | Wheat 30   | Wheat 90   | Wheat 135  | Wheat + Leu 30 | Wheat + Leu 90 | Wheat + Leu 135
Leu | 86 ± 4   | 226 ± 17 a*  | 164 ± 26 a* | 173 ± 22 a* | 151 ± 8 b* | 86 ± 6 b   | 99 ± 5 b   | 211 ± 8 a*     | 137 ± 8        | 148 ± 3 a*
Ile | 69 ± 2   | 116 ± 11 a*  | 104 ± 4 a*  | 134 ± 16 a* | 110 ± 6 b* | 66 ± 3 b   | 86 ± 6 b   | 98 ± 6 b*      | 60 ± 4 b*      | 67 ± 1 c
Val | 117 ± 5  | 234 ± 17 a*  | 161 ± 5 a*  | 186 ± 19 a* | 154 ± 8 b* | 91 ± 3 b   | 104 ± 7 b  | 131 ± 18 b     | 77 ± 6 b*      | 77 ± 2 c*
Lys | 608 ± 24 | 1083 ± 78 a* | 593 ± 34    | 688 ± 62    | 930 ± 64*  | 553 ± 28   | 698 ± 14   | 933 ± 67       | 597 ± 55       | 726 ± 48
Met | 49 ± 2   | 102 ± 6 a*   | 62 ± 2 a*   | 80 ± 5 a*   | 72 ± 3 b*  | 42 ± 1 b   | 52 ± 3 b   | 71 ± 3 b*      | 44 ± 4 b       | 46 ± 2 b
Thr | 309 ± 9  | 594 ± 73 *   | 567 ± 18 a* | 554 ± 38 a  | 383 ± 21   | 330 ± 18 b | 314 ± 22 b | 387 ± 12       | 382 ± 20 ab    | 308 ± 13 b

1 Plasma amino acids expressed as umol/L.
2 Data are mean ± SE; n = 5-6. Means without a common letter differ between treatments within time-points, P < 0.05. * Indicates different from fasted (P < 0.05).
3 12 h food-deprived controls.
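The letter and asterisk convention described in the table footnotes can be expressed as a small sketch. The function names are ours, and the labels are copied from the table. Entries that carry no letter are treated as “no comparison reported”, which is our own assumption rather than something the footnotes state.

```python
def parse_label(label):
    """Split a label such as 'a*' into its letters and its baseline flag."""
    letters = {ch for ch in label if ch.isalpha()}
    differs_from_baseline = "*" in label
    return letters, differs_from_baseline


def treatments_differ(label_a, label_b):
    """True when two means at the same time point share no common letter."""
    letters_a, _ = parse_label(label_a)
    letters_b, _ = parse_label(label_b)
    if not letters_a or not letters_b:
        return None  # no letters reported, so the rule can't be applied
    return letters_a.isdisjoint(letters_b)


# Leucine at 30 minutes: Whey 'a*' vs. Wheat 'b*'
print(treatments_differ("a*", "b*"))

# Threonine at 90 minutes: Whey 'a*', Wheat 'b', Wheat + Leu 'ab'
print(treatments_differ("a*", "b"))
print(treatments_differ("a*", "ab"))
print(treatments_differ("ab", "b"))

# Only the Whey group differs from baseline at 90 minutes
print(parse_label("a*")[1], parse_label("b")[1], parse_label("ab")[1])
```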
Article 04
Challenges for
Researchers
Research critics will often complain about studies
not performing a specific measurement or failing to
account for some variable. Oftentimes these criticisms
are invalid or unwarranted because of the limits
imposed on researchers. Armchair scientists who
unfairly criticize studies for certain aspects oftentimes
fail to recognize the challenges that researchers in
nutrition and exercise science face. Depending on the
academic institution, labs and universities vary widely
in the equipment and funding they have available for
research. Obviously, larger labs with graduate and
postdoctoral programs are able to attract larger grants
and more funding for projects which leads to more
sophisticated testing instruments and a higher level of
control over testing conditions. While there is growing
interest in exercise and nutritional sciences which
leads to more funding sources, there are still studies
that can’t be conducted due to lack of resources.
Funding
The primary challenge for researchers in exercise
and nutritional science is funding. There are various
funding sources available, such as government agencies like the
NIH, university grants, industry funding from food or
supplement companies, organizations such as ACSM and
NSCA, and other private foundations and non-profit
organizations. The unfortunate reality is that even with
studies receiving funding, the funding generally isn’t
enough to support the desired level of control to be
considered a high-quality study. To give you an idea of
how quickly the costs for a study can add up, here in
Florida the cost of performing a blood hormone test like
leptin is roughly $70 per blood draw. So, let’s assume you
wanted to test 10 subjects before and after a diet, that’s
two leptin tests per subject which adds up to $1,400
for only 10 subjects. That’s a small sample size and if
you wanted to make it a stronger study you would likely
need more like 40 people which could cost upwards of
$5,600 and that’s just to test one hormone. That’s not
considering other lab supplies you might need, and the
researcher wouldn’t be able to pay their staff anything
which means they would need to find students who are
willing to volunteer their time on top of their academic
responsibilities. If you’re looking at studies that test
protein metabolism in rats, the cost of carrying out an
experiment could be upwards of $50,000. Many studies
need to pay subjects to recruit the necessary sample
size and if it’s a dieting study that includes supplying
food, the cost of food can be astronomical. Nowadays
many supplement companies are becoming more
interested in having scientifically validated research
to support the efficacy of their products for improved
marketing. Some studies sponsored by supplement
companies can cost tens of thousands of dollars and
can even reach upwards of hundreds of thousands of
dollars when offering to pay subjects to participate. We
haven’t even discussed the costs associated with the
instrumentation necessary to test certain variables in a
lab. Generally, departments receive funding from their
Universities for lab related costs to maintain, repair
or replace testing equipment. The amount received
yearly for department budgets is generally only enough
to afford maintenance on their current equipment and
replace regularly used supplies, they can’t afford to
buy new equipment or replace machines every year.
Most exercise science programs have what’s called
a metabolic cart (which we’ll discuss later), which costs
upwards of $20,000, and that’s not including the costs to
maintain normal functioning or replace certain supplies
needed for regular use. That is why labs are limited by
funding and the equipment they have available.
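The leptin testing costs from earlier in this section can be reproduced with a quick sketch. The $70 per blood draw is the rough figure quoted above; the function name is our own.

```python
def hormone_testing_cost(subjects, draws_per_subject, cost_per_draw=70):
    """Total cost of testing a single hormone across all subjects."""
    return subjects * draws_per_subject * cost_per_draw


# Before and after a diet means two draws per subject
for subjects in (10, 40):
    total = hormone_testing_cost(subjects, draws_per_subject=2)
    print(f"{subjects} subjects: ${total:,}")
```

Adding a mid-point test or a second hormone multiplies the total again, which is how a modest-looking study budget gets used up so quickly.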
Available lab equipment
It should now be no surprise why most exercise science
programs can’t afford to have sophisticated testing
equipment. The type of equipment in a researcher’s lab
will determine the type of studies they can conduct.
Some labs are focused on more mechanistic studies
that involve molecular biology experimentation using
cells and microscopes, whereas other labs are focused
on more practical and applied research that investigates
the effectiveness of a type of training modality.
Researchers will focus on a specific area of interest
and build their labs around that focus. The majority of
exercise science programs will have a metabolic cart,
treadmills, cycle ergometers, various types of body
composition testing instruments, heart rate and blood
pressure monitors, and some other performance-based testing equipment, but again this will depend
on the university, the region and the faculty’s research
interest. We will cover some common measurement
techniques later, but it’s important to understand that
very few labs have the most sophisticated testing
equipment like a metabolic ward, MRIs or muscle
biopsy testing, due to funding. Aside from the major
challenges of funding and lab equipment, researchers
are governed by their institution to ensure responsible
research conduct.
IRB/ethics boards
Academic institutions have ethics boards or governing
bodies that oversee experimental research. At many
universities the governing body is known as the
Institutional Review Board (IRB) for humans and the
Institutional Animal Care and Use Committee (IACUC) for
animal research. The purpose of these departments is to ensure
safe and ethical standards are being followed according to
laws and regulations. Before a study can begin recruiting and
testing subjects, it must go through a formal review process
to obtain study approval. This is one of the most annoying
processes involved in research because it’s time consuming
and tedious. It’s comparable to filing your taxes, but more
detail oriented and time consuming. While necessary,
this approval process can take away time from conducting
the experiment because most
academic institutions operate on semester timelines
that may include breaks or holidays that interfere with
the study timeline. So, if it takes 8 weeks to approve
a study and then another 3 weeks to recruit enough
subjects that’s the majority of the semester and only
leaves a few weeks to conduct an experiment. This is
why you will often see studies that aren’t much longer
than 12 weeks in duration. The IRB process includes an
informed consent for subjects and a very formal written
study protocol explaining in detail every aspect of the
study, including how you intend to recruit subjects.
Subject Recruitment
Subject recruitment is the other annoying process
for conducting human research. Recruitment can be
difficult and time consuming for exercise science and
nutrition researchers. As mentioned previously, many
labs don’t have the necessary funding to pay subjects
to participate in their studies. Free protein powder
and supervised training in the lab can be an appealing
incentive to some, but many others don’t want to follow
a standardized program for fear of less than optimal
results. This is why you generally see sample sizes less
than 50 in training studies. Even if a researcher is lucky
enough to recruit 50 people you generally have subjects
drop out due to various reasons and can end up losing
up to 20 subjects or more sometimes depending on
testing or intervention requirements. People have a
hard time following specific instructions, especially if
it means changing their usual lifestyle to accommodate
study procedures when there is no incentive to comply.
Think about asking college students to follow a specific
diet and no alcohol on the weekends or asking them
to come to the lab early before classes for testing or
training, or how about asking them if it’s ok to stick
a needle as large as a pencil in their leg for a muscle
biopsy? Obviously, studies that include animal models
don’t have to ‘recruit’ subjects, but they have to pay
more for their ‘subjects’.
Scheduling and Testing
As mentioned earlier, scheduling and experimental
time frames can be a major issue in conducting
experiments, especially if operating under University
semester timelines. Even if studies have the
opportunity to occur over multiple semesters or with
no time restrictions, scheduling can be a logistical
nightmare for research staff. For example, let’s assume
a study is investigating muscle growth over 12 weeks in
50 subjects and the training program consists of 3 full
body days per week supervised in the lab by research
staff. Not only will you have to create a schedule for
the research staff to supervise each training day, but
you’ll also need to schedule each participant for each
training session each week. Not to mention, you’ll have
to schedule your baseline testing, mid-point testing
(if there is one) and post-testing. Depending on which
measurements will be taken, it could take an hour for
each participant, which means 50 hours per testing
session multiplied by three testing points and that’s
150 hours only for the measurement testing sessions.
That doesn’t account for the hour each subject is
training in the lab 3 days per week over 12 weeks. The
time requirement researchers ask from their subjects
can be a lot. This is a good example of why you don’t
see many training studies over 12 weeks, it takes a lot
of time and money!
Trusting Research
How can you trust research and how do you evaluate
studies that show conflicting findings? Individuals
without research experience are at a severe
disadvantage when it comes to being able to tease out
the nuances and extrapolate upon results presented
in publications.
Bias
We all have our own biases towards certain ideas or
topics; unfortunately, most people either fail to admit
or don’t realize they have a bias towards a particular
topic. Good scientists recognize and acknowledge
their bias in an effort to tightly control for them in their
experimental design. Being biased means having an
unbalanced opinion or belief regarding a certain topic or
idea. This often leads to being close-minded and failing
to recognize conflicting or contrary evidence, beliefs
or ideas. Scientifically speaking, bias is a systematic
deviation between an estimated value and its true value 3.
In other words, it can be used to represent error.
There are a few types of biases that are important to
understand to become more critical of research.
Confirmation Bias - This is essentially when people
will cite evidence or report data that fits their bias or
belief, while ignoring or failing to provide evidence
that says otherwise. You’ll oftentimes see unethical
individuals cite one study that supports their argument
while failing to acknowledge five other studies that
refute their argument. There could also be a scenario
where someone misinterprets or takes very weak
evidence and glorifies it to make it seem stronger
than it really is. Politics is a good example: you will
oftentimes see certain media or news outlets reporting
a story that is misleading or simply untrue. They may
use a weak study or twist the narrative of a particular
topic to support their side of the story. Sometimes
you’ll see a news report showing only a piece of an
interview or press conference where it falsely portrays
an individual’s beliefs to make them look bad and push
their own political agenda. In research you may come
across a discussion where authors are comparing their
findings to other studies, but they fail to acknowledge
other studies that refute their findings.
Publication Bias - Publication bias is a common
and unfortunate practice in the scientific
community. This type of bias refers to the tendency to
publish only studies that report significant results.
In 2007, studies that supported their hypothesis
made up 85.9% of published studies 16. Let’s face it,
studies with stronger findings or significant results
are more appealing to readers, especially editors and
publishers because they’re more likely to get cited in
other research, which leads to higher journal impact
factors and more revenue for journals 16. Completing a
study with non-significant findings can pose challenges for
researchers, and leaving it unpublished also poses
a few issues. While the majority of responsibility for
publication bias lies with journal editors and publishers,
researchers can be guilty also. Researchers are busy
and they usually have a research agenda planned out
so that once a study is completed, they can begin on
the next project, and oftentimes they have multiple
research projects occurring at the same time. Earlier we
briefly described what goes into designing and carrying
out a research study. It’s obvious that research studies
are a serious undertaking and require substantial time,
money, and effort to complete. When the results
turn out to be non-significant it can be crushing to
the researcher, and the amount of time and headache
they would have to put into getting it published just
isn’t worth it, so they store it in a file and forget about
it (the “file drawer effect”) 17. There are also scenarios where
graduate students carry the responsibility of writing
up and submitting their manuscript for publication
after completing the research project and instead
they either graduate or move on to another program
without completing the publication process. Other
times researchers still put in the effort to get their
study published but due to publication bias of journals
it may be difficult or impossible to receive acceptance.
Overall, the reasons researchers contribute to
publication bias include lack of time, a low-quality or
incomplete study, fear of rejection, or non-significant
findings 16.
Beyond the resources, time and effort that are
wasted when studies aren’t published, there are
other consequences of failing to publish studies with
negative results. Before researchers invest time in
designing a study, they obviously explore journals to
find publications that are similar to their research
question or hypothesis and evaluate their findings.
If a study isn’t published due to negative results and
another researcher wants to test the same hypothesis,
they will be wasting valuable time and resources on a
study that would produce negative results. Therefore,
even though a study produces negative results it
should still be published to inform future research.
Additionally, unpublished data can misguide meta-analysis
findings and conclusions. If meta-analyses
are using data that only show significant findings when
there are unpublished studies that conflict with them,
they can produce false positives and misguide
recommendations 16. Appropriately performed meta-analyses
of clinical trials are the highest quality of
scientific publications and are commonly used for health-care
decision making and therapies 18. One of the more
serious consequences of unpublished negative data is
the potential harm to individuals from pharmaceutical
drugs or even supplements. Publishing these negative
results could improve safety and standards of drugs
before they’re released [16, 18]. Maybe a supplement
study is carried out and finds no positive effect of
their treatment, but there were some subjects who
reported adverse symptoms or side effects. This
study goes unpublished but could be detrimental to
someone’s health.
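To see how unpublished studies can mislead a meta-analysis, here is a small simulation sketch. Everything in it is an assumption made for illustration: a small true effect of 0.2 standard deviations, 20 subjects per group, a simple z-test, and a hypothetical rule that only significant studies get published. It is not taken from the studies cited above.

```python
import random
from statistics import NormalDist, mean


def simulate_studies(n_studies=2000, true_effect=0.2, n_per_group=20, seed=1):
    """Simulate study estimates of a group difference (outcome SD assumed to be 1)."""
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5  # standard error of a difference in means
    all_effects, published = [], []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)  # one study's observed difference
        z = estimate / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided z-test
        all_effects.append(estimate)
        if p_value < 0.05:  # hypothetical rule: only significant studies are published
            published.append(estimate)
    return all_effects, published


all_effects, published = simulate_studies()
print(f"studies run: {len(all_effects)}, published: {len(published)}")
print(f"mean effect, all studies:       {mean(all_effects):.2f}")
print(f"mean effect, published studies: {mean(published):.2f}")
```

Averaging only the published studies gives a noticeably larger effect than averaging every study that was run, which is the distortion a meta-analysis inherits when negative results stay in the file drawer.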
Inflation Bias - Commonly referred to as “p-hacking”,
this is when unethical researchers will try a wide
variety of statistical tests and then selectively report
the significant results 17. This is essentially when
researchers torture their data until they obtain a
significant finding. It’s important to understand
that statistical analyses should be pre-determined
and a part of the study design process. P-hacking
commonly occurs when researchers conduct a study
and after collecting data decide to perform additional
or different statistical tests based on the gathered
data. Another common occurrence is when they
simply eliminate outlier data from subjects who didn’t
respond or responded much greater than the rest of
the group. Another situation in which researchers are
guilty of p-hacking is when they manipulate or change
the groups they established at the beginning of the
study to make one group look like it experienced
greater change. Lastly, p-hacking can occur from
researchers performing data analysis part way through
the duration of the study and discontinuing the study
based on their results or simply not performing other
statistical tests once they find significance [17].
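A quick calculation shows why trying many tests is a problem. This is a minimal sketch that assumes the tests are independent and that no real effect exists; the function name is our own.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests comes out
    significant at the given alpha when there is no real effect."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    chance = chance_of_false_positive(n_tests)
    print(f"{n_tests:>2} tests: {chance:.0%} chance of at least one false positive")
```

With a single pre-planned test the chance of a false positive is 5%, but with 20 tests it climbs to roughly 64%. Real analyses are rarely fully independent, so treat these numbers as a rough guide rather than exact error rates.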
Ethical researchers will do their best to address and
acknowledge their biases, which sometimes can be
unintentional. Unethical researchers obviously make
choices with ill intent and biases are irrelevant in those
situations. Science and peer-reviewed research do a
pretty good job at weeding out the bad apples and part
of this deals with addressing conflicts of interest.
Funding Sources / Conflicts of Interest - Any time
there is a conflict of interest listed at the bottom of
a publication it should be evaluated more critically.
However, this doesn’t mean you should immediately
discredit or dismiss the study or the findings. Ethical
researchers list their conflicts of interest to be
transparent and acknowledge any potential personal
benefit or gain of the researchers or parties involved.
This should be a clear indication that they aren’t trying
to “hide” something or be dishonest; it should represent
the opposite. If dishonest researchers were attempting
to conceal some relationship or personal benefit, they
would simply take the risk of not listing a conflict of interest. Earlier
we mentioned various sources of funding including
food and supplement companies, governmental
organizations, private companies, etc. When you come
across a supplement company funding a dieting study
or a study investigating the effectiveness of a particular
supplement, this should raise a red flag, as with any
type of company funding a study that investigates their
product. But again, it just means you should evaluate
the findings more critically. Before even evaluating the
results, check the study design. Was it a randomized
placebo-controlled design? If not, you should be very
cautious about the findings and results. Randomizing
and having a placebo-controlled design is essential
when comparing treatments.
Evaluating Conflicting Evidence
Let’s assume there have only been two studies published
on a certain topic and they report contrasting findings.
How do you determine which study is better or which
study to trust? This is a difficult question to answer
and involves many considerations, but we will highlight
certain aspects and key details you’ll want to focus on.
Results - The level of significance of the results
is important and this is one of the first things you
should notice, but as mentioned previously (statistical
concepts), how meaningful are the results? Remember,
we want to see a P-value < 0.05 and the higher the effect
size value, the more meaningful it is. After evaluating
the statistics, check to see if there is any missing data
or if authors also published raw data within the text,
appendix or supplementary material. A good example
is, if a study is comparing two different types of diets,
they should have a table showing their respective diet
compositions, if not some type of food records or
nutrition data. If there isn’t any type of nutrition data
and it’s a diet study, we would be VERY cautious of the
findings and the conclusions that are drawn. Publishing
raw data is not necessary, but it’s a good practice and if
there’s raw data available look it over for yourself to see
if there’s any glaring issues or if some of the numbers
don’t add up. Within the results section they obviously
will report the results from statistical analysis for
the primary variables of interest, but they should
also provide some type of figure or table to visually
represent the data. Lastly, do the results of the study
agree with previous studies? It’s ok if they don’t, but in
the discussion the authors should explain conflicting
results and if there is a reason why results don’t agree.
Study Design / Level of control - How much control
did the researchers have over the independent
variables? Did they provide food to participants if it
was a diet study? Were they supervising the resistance
training program prescribed to participants? How
did they control free-living conditions? Obviously,
there are no mandatory requirements researchers
should be following for their study design, this will be
limited by their laboratory techniques and equipment
they have available. But there are some things you
should be asking yourself when reading through the
methods section: how did they test and control for X,
Y and Z? If a study had subjects in a metabolic ward,
that’s far more valuable data than any free-living
study. Similarly, if a training study doesn’t mention
anything about supervised training it’s going to carry
more confounding variables and limitations than a
study that included supervised training in the lab for
the duration of the study. The level of control is going
to significantly impact the sample size and the study
duration. Increasing the level of control comes at a
cost, higher control = higher cost and generally leads
to a smaller sample size and shorter study durations
to maintain that level of control. Unlike human model
designs, rodent models offer a high level of control,
longer study duration and a larger sample size at a
smaller cost compared to human subject designs. But
the results aren’t always transferable to humans.
Sample size - How many subjects were included in the
study? Generally, a study with fewer than 10 subjects has
a poor sample size and is less likely to detect significant
changes in the outcomes. If a study has fewer than
10 subjects it carries a lot less weight than studies
with larger cohorts, but it can still be valuable and
contribute to the body of literature. Case studies are at
the bottom of the totem pole for study designs, but for
investigating certain novel topics they can be the only
appropriate design available. These types of studies
should be interpreted with caution, with the understanding
that their ability to draw strong conclusions is severely
limited. The caveat to this is with studies that are
extremely well controlled but have a small subject
number. An example of these types of studies would
be metabolic ward nutritional studies. In these studies
every piece of food is provided to the subjects and
they are housed in a ward that measures their energy
expenditure. These types of studies do not need to
have a high subject number in order to be impactful due
to their high degree of control. They are also incredibly
expensive, which is why they typically don’t have a high
subject number.
Study Duration - You will generally encounter training
studies in exercise science with durations around
12 weeks. This isn’t a bad thing, but the strength of
evidence is going to be less than that of a study of 24 weeks,
all else being equal. Longer study durations
provide a bigger picture of what could happen. It’s like
having two cars drag race, maybe one car has greater
acceleration and pulls ahead for the first ¼ mile, but
the other car has greater overall speed and ends up
winning the race. With longer durations we can have a
more dependable and reliable idea of the changes that
could occur. The difficulty with longer studies is that
they are more expensive and less likely to have a high
degree of control as they become more invasive to the
subjects’ lives.
It’s important to understand the limitations
that exist in all scientific studies. In general, if you
want to conduct a long term study in humans, it will
either be a low subject number or not well controlled
or both. If you want to conduct a tightly controlled
study in humans it will likely be short in duration or low
in subject number or both. If you want to conduct a
long term, tightly controlled study with a high subject
number, it will likely be in animals. Below is a Venn
diagram providing you with a conceptual framework
to give you a better idea of the give and take between
variables for study designs.
Treatment/Intervention - For any study that involves
groups with different treatments or interventions,
it’s important to take note of the dosages or the
amount of the treatment or intervention. If a study
is investigating a specific supplement, is the dosage
clearly stated and is it an appropriate dosage to
elicit a response? If you’re looking at two studies
that compared the effects of caffeine on heart rate,
it should be obvious that whichever study used the
higher dose will see a greater heart rate. If two training
studies are comparing muscle growth in a specific
muscle, the level of training volume and intensity are
going to have a major impact on their outcomes. If
the study is investigating supplements they should be
randomized and placebo-controlled to account for the
various confounding variables and limitations.
Limitations - Every study carries limitations, you can’t
account and control for everything, at least not in free-living
studies. It’s ok for studies to have limitations and
generally they’re outside of the researcher’s control,
but major limitations should be clearly stated and
explained towards the end of the discussion. With that
being said, the researchers aren’t going to state every
little thing that’s wrong with their study, so don’t expect
that. Any major methodological limitations should be
explained. Examples of some limitations are low sample
sizes, study durations, lack of control over a specific
measurement due to lack of laboratory resources, lack
of generalizing the findings, differences in treatments,
characteristics of subjects, issues with measurement
devices, etc.
Measures - It’s important to evaluate the methods
section for the types of measurements they used
to test the dependent variables. There is no perfect
measurement available, it’s impossible to measure
someone’s true or exact score of any measure and every
device used in research will have a certain level of error
associated with them. We may have “gold standards”
or measures that we use to validate other measures,
but this is done through correlations and the criteria
we use to validate other measures have their own error
rates associated with them. Underwater weighing
used to be the “gold standard” for measuring body
composition, now we use the 4-compartment model
because it’s been shown to be more reliable [19]. This
doesn’t mean any study that uses hydrostatic weighing
is useless, we just need to be critical of its error rates.
There are endless types of available instrumentation
to measure certain variables and we’ll cover some
common measurements in the next section. When
evaluating measurements, we are concerned with the
validity and reliability of that measurement.
Validity - Validity is arguably the most important
consideration for measurement technique and
indicates the degree to which a device measures
what it’s supposed to 3. This is concerned with how
accurate and “true” the measurement technique is.
There are a number of types and ways to establish
validity of a measurement technique. Frequently in
research, validity is established by comparing one type
of measurement to a criterion method. For example,
the 4-compartment model that uses Bod Pod for body
volume estimates was used as the criterion to determine
if Dual-Energy X-ray Absorptiometry (DXA) would be
an acceptable method to measure body volume 20. The
validity of a measurement is more difficult to establish
than reliability.
Reliability - Reliability is concerned
with the consistency of the measurement technique.
Reliability is the degree to which a device produces
stable or consistent results. If a measurement
technique is not consistent, then you cannot trust
the test. In other words, “a test cannot be valid if it’s
not reliable” 3. Before performing an experiment it’s
important to test laboratory equipment that will be
measuring our dependent variables to ensure consistent
and accurate results. This doesn’t have to be done
prior to every experiment, but the equipment used in
research should be tested to ensure reliability. The test-retest
method is a common technique used to estimate
the reliability of testing devices by performing one test,
then after a specified time interval, testing again 3. We can
then perform some stats to obtain some values that tell
us how reliable our instruments are. Not all studies do
this and some studies test reliability in other ways, but
it’s good science to report some type of reliability for
testing devices, to ensure changes that occurred are
dependable. You will generally find these values in the
methods section after a brief explanation of the testing
procedures for a specific device.
Intraclass correlation - The intraclass correlation
coefficient (ICC) is calculated by running a simple ANOVA
to produce a reliability coefficient (similar to a correlation
coefficient, as described in the stats section) that
provides an estimate of the error variance of a testing
device. This is a good indicator of the stability of the
measurement. Values closer to 1 represent scores that
have a high similarity or high correlation, as in other
correlational scores; likewise, scores closer to 0 mean
they are less similar. In other words, scores closer to 1
have less error and better reliability.
Standard error of measurement - Standard error
of measurement (SEM) is calculated using the ICC
and the standard deviation of scores, which means it
accounts for the variability and reliability of the test.
This value tells us the level of error and precision of a
measurement. SEM values can be viewed as a range,
plus or minus around the predicted or measured value.
For example, if you’re testing body fat percentage
and you measure someone at 15% body fat and the
SEM value is 3%, their true percentage is likely somewhere
between 12% and 18% body fat.
Minimal detectable difference - The minimal
detectable difference (MDD) is calculated using the
SEM. This tells us how sensitive the measurement is.
It provides a value in the common unit associated with
the testing device and tells us the minimum amount
of change needed to exceed measurement error and
to be considered a ‘real’ change. For example, if the
MDD of an RMR machine is 100 kcal, then the RMR of the person
you’re testing would have to change by more
than 100 kcal between testing points to be
considered a real change. You may often see different
terms used for these three statistics. SEM can also be
called standard error of estimate (SEE) and MDD can also
be called minimal detectable change (MDC). Just know
there may be different names that essentially carry
the same meaning. There are also many other statistics
available to test validity and reliability of measurement
techniques. These are just a few common ones you
might come across and hopefully give you a better idea
of the error rates associated with testing different
variables. We want to reiterate that oftentimes people
overlook the error rates associated with some measures
and assume they are accurate and/or exact scores.
With in-vivo studies it’s impossible to know the true and
exact score of certain variables. We test them, which
gives us a good estimate or prediction of the score, and
we have to know there is always a certain level of error
associated with the device and/or technician. So long
as the same technician and the same device are used and testing is done
under the same conditions, we can use measurements
to compare changes over time.
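Here is a rough sketch of how the three reliability statistics fit together. The guide does not give the formulas, so the ones below are our assumption of the commonly used versions: a one-way ANOVA form of the ICC, SEM = SD × √(1 − ICC), and MDD = SEM × 1.96 × √2 for a 95% confidence level. The test-retest body fat values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def icc_one_way(scores):
    """One-way ANOVA ICC. `scores` is a list of per-subject lists, each
    holding that subject's repeated measurements (e.g. test and retest)."""
    n = len(scores)      # number of subjects
    k = len(scores[0])   # measurements per subject
    all_scores = [x for subject in scores for x in subject]
    grand_mean = mean(all_scores)
    subject_means = [mean(subject) for subject in scores]
    # Mean squares between subjects and within subjects
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for subject, m in zip(scores, subject_means)
                    for x in subject) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), in the same unit as the measurement."""
    return sd * sqrt(1 - icc)


def minimal_detectable_difference(sem, z=1.96):
    """MDD = SEM * z * sqrt(2); z = 1.96 assumes a 95% confidence level."""
    return sem * z * sqrt(2)


# Hypothetical test-retest body fat percentages for five subjects
test_retest = [[15.0, 15.6], [22.1, 21.5], [10.2, 10.9], [18.4, 18.0], [27.3, 26.5]]
icc = icc_one_way(test_retest)
sd = stdev([x for subject in test_retest for x in subject])
sem = standard_error_of_measurement(sd, icc)
mdd = minimal_detectable_difference(sem)
print(f"ICC = {icc:.3f}, SEM = {sem:.2f}%, MDD = {mdd:.2f}%")
```

In this made-up data set each person's retest is within a percent of their first test, so the ICC lands close to 1 and the SEM and MDD are small. Noisier retests would pull the ICC down and push both error values up.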
Article 05
Common
Methods for
Measuring
Variables
Body Water
Deuterium Dilution
Deuterium is a stable isotope of Hydrogen and
deuterium dilution serves as the “gold standard” or
criterion method for total body water assessment.
Researchers use a labeled water that contains a large
quantity of deuterium (“heavy water”) and measure
concentrations in the urine, blood or saliva to measure
total body water. There are other isotopes that can
be used in a similar manner to the deuterium dilution
method, but most commonly it is deuterium that’s
used as a tracer. Using this method, subjects void their
bladders, then drink water with the labelled isotope, and
after it has equilibrated in the body for a period of time
researchers most commonly collect a urine sample.
The urine is then analyzed using a mass spectrometer
to determine total body water levels. This method is
expensive, time consuming and requires sophisticated
laboratory expertise 26. For this reason, other measures
have been developed to more conveniently measure
total body water (TBW).
Bioelectrical Impedance Analysis (BIA)
BIA technology uses a small electrical current that
is transmitted through your body extremities and
between voltage detecting electrodes (contacting
hands and/or feet). Water conducts electricity and
tissues like fat mass and bone have very little water
which increases the resistance (impedance) of the
electrical current thereby decreasing the rate of its
transmission. Based on fat mass content in your body,
the impedance (resistance) of the electrical current
is measured using Ohm’s law (resistance = voltage
/ current) which can then be applied in an equation
to quantify water volume, percentage body fat, and
FFM 21. There are many different types of BIA devices
available and vary based on specific frequencies, cost
and complexity, which will impact the validity and
reliability of the specific device being used. Nowadays
you will commonly see BIA technology integrated into
at-home body weight scales. When used for body
composition assessment, research indicates that BIA
is comparable to DXA when estimating BF%, fat mass
or fat-free mass (FFM) 27. However, other research
indicates that single assessments using DXA or BIA
are questionable due to their limited accuracy on an individual
level [28]. When compared to deuterium dilution
for measuring TBW, BIA is close in accuracy, but still
slightly underestimates TBW 29. BIA shows promise
in accurately estimating TBW, however accuracy in
measurement can vary based on the population being
studied and with little research comparing BIA to
deuterium dilution, the validity to accurately estimate
TBW remains questionable 30. Nonetheless, evidence
suggests BIA is acceptable for assessing TBW and
displays acceptable accuracy when assessing body
composition if incorporated into a multi-compartment
model 28. Another tool that shares similarities to BIA
known as Bioelectrical impedance spectroscopy (BIS),
seems to exhibit greater validity and reliability than BIA
when assessing TBW 26.
Bioelectrical Impedance
Spectroscopy (BIS)
BIS features the same underlying technology as
BIA to estimate body composition and water, which
includes an electrical current traveling through the
body between electrodes to measure the impedance
of the electrical current. BIS devices differ from BIA
devices by utilizing a ‘spectra’ of frequencies, which is
where the term spectroscopy comes from 30. There
are single and multi-frequency BIA devices on
the market and it’s unclear at what frequency a BIA
device could be considered BIS. However, BIS uses Cole modelling
to predict body fluids, which has been suggested to
be superior for assessing body composition using
impedance based methods 30, 32, 33. BIS is also useful in
differentiating between intracellular and extracellular
body water. The underlying principles used for BIA
and BIS are the same for estimating body composition
and either device can acceptably be utilized for body
water estimations, however it appears BIS is more
accepted 26, 28, 30, 33. It’s important to keep in mind that
these impedance based devices were developed
primarily for body water assessment. Although they can
predict body fat % (BF%), other body composition methods
would be more acceptable.
Body Composition
There are a number of techniques and methods
available for measuring body composition, specifically
fat mass and/or fat-free mass (FFM). The only direct
measurement of body composition would involve
performing an autopsy on a human cadaver to dissect
and weigh various tissues and organs, which is obviously
impossible for free living experiments. Therefore, we
estimate body composition based on what we know
about the weight and composition of various tissues in
the body. It’s important to understand that there is no
perfect estimate and all techniques and methods have
error rates associated with them. For this reason, we
cannot place a high level of importance with a specific
percentage of body fat. Rather, we use it as an objective
measure to quantify and track changes to determine
the effectiveness of specific interventions.
Skinfold
The most common and cost-effective method for
estimating body composition is the skinfold technique.
This technique assumes a 2-compartment (2C) model
(more on multi-compartment models later), splitting
body weight into fat mass and FFM.
This technique requires firmly grasping the subject’s
subcutaneous fat and skin with the thumb and
forefingers to measure the thickness (in mm.) with a
caliper. You can accomplish these measurements with
as few as three sites or as many as seven including
the triceps, subscapular, suprailiac, abdominal, upper
thigh, chest, and midaxillary. Measuring seven sites
gives a more accurate estimate of BF% because it
can account for body fat distribution; some people
hold more fat in their lower body compared to their upper
body. The site measurements are added
together and plugged into a prediction equation to
estimate body density, which is then plugged into the
Siri equation to estimate body fat percentage (BF%) 34.
There are a number of body density prediction
equations available and it’s important to use a population
specific equation because the coefficients used in the
calculations can produce inaccurate estimations for
individuals with varying body fat levels. When using
an appropriate population specific equation, skinfold
fairly accurately predicts BF% (± 3-4%) 35. The great
thing about skinfold is not only the low cost, but you
can track site-specific changes to gauge the rate and
location of fat loss. Additionally, this is one of the few
measurements that actually assess fat thickness; most other
measures use X-ray beams and imaging techniques or
electrical currents to assess fat mass. This technique is
only as accurate and reliable as the technician who is
performing the test. The technician must have a lot of
experience developing this skill to precisely identify
anatomical site location and accurately measure fat thickness
consistently. When compared to computed tomography (CT
scan), skinfold shows a strong correlation when comparing
measurements performed in the abdominal region 36.
However, studies comparing skinfolds to the gold
standard 4C model indicate large individual
error rates, but acceptable group average values 37, 39.
Meaning, when you test one person the error rate can
be much higher compared to measuring and averaging
the BF% of a group of people. For example, you could
compare skinfolds to another method and see an over
or under estimation in BF% by 6%, but when comparing
the group average BF% the error in BF% estimation
could be only 2%. These are arbitrary numbers and don’t
reflect the true error rates of skinfolds. Those will vary
depending on the equation, population and criterion
method being used for comparison. Nonetheless,
skinfolds are the most cost-effective method and with
a skilled technician and correct equations, they can
provide an accurate estimate of body composition.
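The skinfold calculation described above can be sketched as follows. The guide names the Siri equation but no specific density equation, so the Jackson-Pollock 3-site equation for men (chest, abdomen, thigh) is used here as one example of a population specific equation. The skinfold values are made up.

```python
def body_density_jp3_men(chest_mm, abdomen_mm, thigh_mm, age_years):
    """Jackson-Pollock 3-site body density estimate for men (g/cm^3).
    Chosen only as an example; other populations need other equations."""
    skinfold_sum = chest_mm + abdomen_mm + thigh_mm
    return (1.10938
            - 0.0008267 * skinfold_sum
            + 0.0000016 * skinfold_sum ** 2
            - 0.0002574 * age_years)


def siri_body_fat_percent(body_density):
    """Siri equation: BF% = 495 / body density - 450."""
    return 495 / body_density - 450


# Hypothetical 30-year-old male: chest 10 mm, abdomen 20 mm, thigh 15 mm
density = body_density_jp3_men(10, 20, 15, 30)
print(f"body density = {density:.4f} g/cm^3")
print(f"estimated BF% = {siri_body_fat_percent(density):.1f}")
```

Keep in mind the ± 3-4% error mentioned above: the printed value is an estimate, and it is most useful for tracking changes in the same person over time.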
A-mode Ultrasound
A-mode ultrasound uses ultrasonography technology,
which transmits a signal through the skin and tissues
and the reflection of the signal at tissue boundaries is
transmitted back as an “echo”. There is also another
type of ultrasound known as “B-mode” (we’ll cover
later), but we’re specifically referring to A-mode
ultrasound. Bodymetrix has developed a handheld
portable device that is used similarly to how skinfolds
are conducted. The device can be used to measure
as few or as many sites as desired, simply select the
equation and number of sites from a drop-down menu
in the software. This technique also relies on the skill
of the technician. One of the primary benefits is being
less invasive since it does not include “pinching” the
subject and while the cost is much less expensive
than other sophisticated laboratory equipment, it
is still more expensive than skinfolds. The unique
aspect of this device is that it can also produce an
image of the muscle and fat layers. This device has
not been validated to measure muscle thickness, but
some researchers suggest it could be a useful tool for
measuring acute changes in muscle thickness 40. For
body composition it hasn’t been validated adequately
to the same degree as other measures, but studies
show strong agreement between skinfold and air
displacement plethysmography (ADP) 41, 42.
Body Volume Measurement
Underwater weighing (UWW) and air displacement
plethysmography (ADP), accomplished via the Bod Pod, are
used to measure body volume by applying Archimedes’
principle, which allows for calculation of body density.
Body density can then be used in an equation (generally
the Siri equation) to calculate BF%. Underwater
weighing is conducted by having the subject sit on
a flimsy carriage that is connected to a scale (it’s like
a human produce scale) and lowers them into a pool
of water. The subject’s nose is pinched closed and
they are instructed to blow out all of their air as they
are slowly submerged into the pool in a fetal-like
position. The testing procedure for this technique
is probably the worst compared to others. Imagine
exhaling all of your air while hunched over, remaining
as still as possible, while being lowered into a pool
while researchers attempt to record your weight. Prior
to being submerged in water, researchers measure
residual lung volume to account for air trapped in the
lungs after full exhalation. The Bod Pod is very similar
to underwater weighing, except using air, and involves
a much more comfortable testing procedure; although
those who are claustrophobic may not agree. Subjects
are placed in a large plastic “pod” like device with a
small window. While the subject sits on a small seat wearing
a swim cap, body volume is measured within a few
minutes by subtracting the reduced air volume with a person
inside from the initial volume of the empty
chamber. This method estimates body composition very
closely to hydrostatic (underwater) weighing, since
they use similar underlying principles. Underwater
weighing was previously considered the gold standard
and criterion method to validate other methods, but now
we have more non-invasive techniques available that
can provide greater BF% accuracy.
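To make the volume-to-BF% step concrete, here is a minimal sketch of the calculation described above: body density from mass and volume, then the Siri equation. The function names and the example subject are hypothetical, and real protocols also correct the measured volume for residual lung air, which this sketch assumes has already been done.

```python
def body_density(mass_kg: float, volume_l: float) -> float:
    """Body density in kg/L (numerically the same as g/cm^3).

    Assumes volume_l has already been corrected for residual lung volume.
    """
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return mass_kg / volume_l


def siri_body_fat_percent(density_g_cm3: float) -> float:
    """Siri equation: BF% = 495 / body density - 450."""
    return 495.0 / density_g_cm3 - 450.0


if __name__ == "__main__":
    # Hypothetical subject: 80 kg body mass, 75.5 L corrected body volume.
    density = body_density(80.0, 75.5)
    print(f"density = {density:.4f} g/cm^3")
    print(f"BF% (Siri) = {siri_body_fat_percent(density):.1f}")
```

Because density sits in the denominator, small errors in the volume measurement shift the BF% estimate noticeably, which is one reason the testing procedure matters so much.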
Multi-compartment Models
The advancement of technology and how we understand
body composition has led to the development of more
accurate and precise assessment techniques. Multi-compartment models are considered the criterion
for validating other methods of body composition
estimates 43. By including more measures and more compartments, we can reduce the assumptions made
regarding various tissue weights and volumes,
leading to a more precise estimate because those quantities are
measured rather than assumed. It would be reasonable to assume that by
introducing more measurements the error rates
associated with those measurements could reduce
the accuracy, but research shows these error rates are
negligible 44. Multi-compartment models range from
the traditional 2C all the way up to 6-compartments
(6C). The 4-compartment (4C) model is viewed as
the gold standard for body composition assessment 45.
How the 4C model is implemented depends on the
instrumentation that labs have available, but generally
it is accomplished using a DXA and a measurement of
body water (generally BIA). It has not been established
whether the potential benefits of these methods
justify their increased complexity 46. The more complex
and sophisticated these models become, the higher the cost of
testing becomes due to the instrumentation needed
to measure various tissues, putting them out of reach for
some research labs.
Dual-Energy X-Ray Absorptiometry (DXA)
Dual-Energy X-Ray Absorptiometry (DXA) is a common
and popular method to test body composition. DXA
machines were originally developed for bone mass
assessment. Now they have become a common method
for body composition testing, if labs are fortunate
enough to have the funding to support the high cost
associated with them. DXA is a practical and noninvasive way to measure body fat percentage. Subjects
comfortably lie supine on a table for the 10-15 minute
test while two low-energy X-ray beams (with minimal
radiation exposure) slowly pass across the body.
The computer software generates an image of the
underlying tissues and quantifies bone mineral content
(BMC), total fat mass, and FFM 21. Additionally, DXA has
the ability to perform regional body tissue analysis to
determine if specific areas of the body have lower or
higher body fat or BMC. Many believe DXA scans are a
superior method for testing BF%. However, if certain
variables are not accounted for and if DXA scans are
not performed correctly (like any measure) there is
potential for high error rates. DXA scans operate under
a 3-compartment model, splitting body weight into
fat mass, bone-free lean mass and BMC. Body water
fluctuates throughout the day with fluid intake and
glycogen stores, and these fluctuations can lead to
large error rates because DXA fails to account for body
water. This is supported by previous research showing
a 3C model with a body water measurement produces
smaller error rates than DXA when compared to a 4C
model 28. When looking at group level comparisons,
DXA seems to have pretty good accuracy compared to
the gold standard 4-compartment model 28. However,
when looking at individual comparisons or changes,
the error rates can be much higher, especially if
individuals differ in certain characteristics such as sex,
size, fatness or nutritional status 28, 47. The error rates
of DXA scans will vary from study to study depending
on methodological differences of the study design,
but research has shown that DXA error rates can be
as high as 8-10%, which is similar to the error rates
of hydrostatic weighing 48. While DXA shows a strong
correlation to CT scans, DXA still underestimated fat
mass by 5 kg 49. The accuracy of DXA for evaluating
weight changes has also been questioned by
a study that simulated weight gain by wrapping lard
around subjects and performing a DXA scan. Results
showed that the DXA scans quantified the lard as bone
mineral content rather than fat 50. For these reasons,
results should be interpreted with caution from
studies using exclusively a DXA scan to evaluate weight
changes. Instead, researchers should incorporate DXA
scans into a 4C model that also accounts for body
water to more accurately estimate body composition
changes.
As you can see, all of these techniques and methods
carry some limitations and while some methods may
be more accurate or precise, they are all acceptable
methods for estimating fat and fat-free mass. Since
a large portion of FFM is muscle, you may see some
of these techniques used to infer increases in FFM as
increases in muscle growth (hypertrophy) 51. However,
there are more direct and appropriate methods
available to assess hypertrophy.
Protein Metabolism
In the body, protein is in a continuous state of
breakdown and synthesis; this simultaneous process
is known as protein turnover. In a typical 70kg male,
about 0.3kg of protein is degraded and replaced each
day to avoid the breakdown of stored protein 22. Protein
metabolism is a complex and intricate process that
requires sophisticated laboratory equipment and
testing techniques. Here we describe a few methods
that are commonly used to assess protein turnover.
Isotopic Tracer Method
Muscle Protein Synthesis (MPS) is one of the more
complicated measures to explain. To assess the rate of
MPS scientists often use a ‘tracer’, which is a molecule
that they can track and ‘see’ which tissues it ends
up in. In the case of MPS we use either a radioactive
(less common) or stable isotope form of an amino acid
to measure MPS. You may remember from general
chemistry that an isotope is an atom that has a different
number of neutrons than normal, which increases its
weight. Since it’s heavier than a normal molecule we
can use gas chromatography-mass spectrometry
(GCMS) to separate it from the ‘normal’ molecules. A
common amino acid isotope used to assess MPS is
D-5 Phenylalanine. D-5 means that five of the hydrogens
on the phenylalanine (those on its ring) are replaced with
deuterium, a form of hydrogen
which contains an extra neutron, thus making it heavier
than normal phenylalanine. D-5 Phenylalanine is often
chosen as a tracer because it is not metabolized by the
muscle (although various other amino acids are used
as well), so it can be assumed that any D-5 that winds
up in muscle protein did so due to MPS.
To assess MPS, typically the amino acid isotope is
infused or injected into the bloodstream of the subject
that is undergoing whatever treatment is being
provided. The tracer will then be taken up by the muscle
in the form of intracellular amino acids or incorporated
into proteins via MPS. This ratio of peptide bound
tracer vs. intracellular tracer forms the basis behind
determining the ‘rate’ of MPS. To put it in more practical
terms, if the tracer is found in greater concentrations
in muscle proteins in one treatment group vs. another,
it is likely that the first treatment group has higher
rates of MPS since more of the tracer wound up there.
The actual equation of MPS is a bit more complicated
than this. For bolus injections of isotopes (usually
done in rodents) the equation is:
$$\text{MPS (\%/hr)} = \frac{E_b \times 100}{E_a \times t}$$
where $t$ is the time interval
between isotope injection and snap freezing of muscle
expressed in hours and $E_b$ and $E_a$ are the enrichments
of $^{2}\mathrm{H}_5$-phenylalanine in hydrolyzed tissue protein and
in muscle free amino acids, respectively. In the case of
infusing a tracer the equation is:
$$\text{MPS (\%/hr)} = \frac{E_{p2} - E_{p1}}{E_{ic} \times t} \times 100$$
where $E_{p2}$ and $E_{p1}$ are the
protein-bound enrichments from the muscle biopsy at
time 2 and the previous muscle biopsy at time 1, respectively.
$E_{ic}$ is the mean intracellular phenylalanine enrichment
from the biopsies and $t$ is the tracer incorporation time in hours.
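As a rough illustration of the two rate equations in this section, the sketch below computes MPS in %/hr for a bolus injection and for an infusion. The function names and the enrichment values are hypothetical and chosen only to show the arithmetic; the infusion form assumes the usual reading of the equation, with the change in protein-bound enrichment divided by intracellular enrichment times incorporation time, then multiplied by 100.

```python
def mps_bolus(e_bound: float, e_free: float, hours: float) -> float:
    """MPS (%/hr) after a bolus injection: (Eb x 100) / (Ea x t)."""
    return (e_bound * 100.0) / (e_free * hours)


def mps_infusion(e_p1: float, e_p2: float, e_ic: float, hours: float) -> float:
    """MPS (%/hr) during an infusion: ((Ep2 - Ep1) / (Eic x t)) x 100."""
    return (e_p2 - e_p1) / (e_ic * hours) * 100.0


if __name__ == "__main__":
    # Hypothetical enrichments (tracer-to-tracee ratios), not real data.
    print(f"bolus:    {mps_bolus(0.0005, 0.05, 0.5):.3f} %/hr")
    print(f"infusion: {mps_infusion(0.0001, 0.0004, 0.06, 3.0):.3f} %/hr")
```

Both forms divide protein-bound enrichment by the precursor (intracellular) enrichment, so an error in either biopsy measurement carries straight through to the final rate.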
We realize these equations probably look quite
daunting but the only thing you need to know is that
we are comparing the incorporation of the tracer at
one time point vs. another time point to see how much
has been incorporated into muscle tissue and in what
timeframe. If we have that information and we have
the intracellular concentrations of that tracer, then we
can determine the rate of MPS. Once a biopsy (human
testing) or sacrifice (animal studies) is performed and
the muscle tissue is taken, it is immediately frozen
in liquid nitrogen to ‘freeze’ all metabolic processes
so that there is now a ‘snapshot’ of the muscle
metabolism. The tissue is then later ‘powdered’ (fancy
word for grinding it into a powder with a mortar and
pestle), homogenized, and then taken through various
chemical reactions in order to separate the protein
bound amino acids from the intracellular amino acids
(this is usually done by adding perchloric acid to the
sample). The intracellular amino acids and peptide
bound amino acids are then taken through several
other chemical reactions to prepare them for the GCMS.
Running the samples through the GCMS allows scientists
to determine the concentrations of the tracer in
muscle protein and intracellular fluid by separating the tracer
from the normal amino acid (the mass
spectrometer distinguishes the isotope based on
weight since it’s heavier). The concentrations
of the tracer in each sample can then be determined by
comparing them to standardized concentration
samples that are also run through the GCMS. Once we
have the concentrations of these samples, we can plug
them into our equation to determine MPS.
Easy, right? We doubt anyone is saying that and we can
assure you that it’s not. The entire process
takes around 2 weeks to
analyze ~100 samples and is a minefield for potential
errors. Scientists have to be borderline obsessive about
handling their samples and execution of reactions in
order to ensure good data.
Nitrogen Balance
Nitrogen balance is the difference between nitrogen
intake and nitrogen excretion. A negative nitrogen
balance occurs when nitrogen excretion is greater
than nitrogen intake, and a positive nitrogen balance occurs
when intake is greater than excretion. A neutral nitrogen
balance is said to occur when nitrogen intake is equal
to nitrogen excretion. Protein contains roughly 16%
nitrogen content on average, so by knowing protein
intake we can then calculate nitrogen intake 22. Nitrogen
excretion on the other hand, is more complicated
to measure and control for. Nitrogen excretion can
occur through urine, feces, sweat, and skin 22. One of
the primary drawbacks to this method is attempting
to quantify nitrogen excretion which can often lead
to an underestimation of total nitrogen excretion.
Another drawback of the nitrogen balance method is
the effect of dietary intake. During caloric restriction
an increase in nitrogen excretion can occur, even when
protein intake is high, and when protein intake increases,
nitrogen excretion generally increases as well. These
drawbacks can lead researchers to overestimate
nitrogen intake and underestimate nitrogen excretion,
leading to inaccurate estimates of nitrogen balance 22.
The majority of nitrogen stored in the body resides
in skeletal muscle tissue, so nitrogen balance is often used to assess
muscle protein metabolism. However, this method is
more indicative of whole-body protein turnover and
does not isolate tissue-specific protein metabolism.
This is important because while skeletal muscle mass is
the largest source of nitrogen in the body, the turnover
rate for skeletal muscle is very slow at only ~1% per day
whereas the liver and gut tissues turn over at 30-80%
per day. Due to this, nitrogen balance changes often
reflect what is occurring in those tissues rather than in muscle
mass.
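The nitrogen balance arithmetic described in this section can be sketched as follows. This is a minimal illustration that assumes the roughly 16% nitrogen content of protein mentioned above; the function names and the example excretion values are hypothetical, and in practice the excretion terms are the hard part to measure.

```python
NITROGEN_FRACTION_OF_PROTEIN = 0.16  # roughly 16% of protein mass is nitrogen


def nitrogen_intake_g(protein_g: float) -> float:
    """Estimate nitrogen intake (g) from protein intake (g)."""
    return protein_g * NITROGEN_FRACTION_OF_PROTEIN


def nitrogen_balance_g(protein_g: float, urine_n_g: float,
                       fecal_n_g: float, sweat_skin_n_g: float = 0.0) -> float:
    """Nitrogen balance = intake - total excretion (all values in g/day)."""
    excretion = urine_n_g + fecal_n_g + sweat_skin_n_g
    return nitrogen_intake_g(protein_g) - excretion


if __name__ == "__main__":
    # Hypothetical day: 100 g protein eaten, with assumed excretion values.
    balance = nitrogen_balance_g(100.0, urine_n_g=12.0, fecal_n_g=2.0,
                                 sweat_skin_n_g=1.0)
    state = "positive" if balance > 0 else "negative" if balance < 0 else "neutral"
    print(f"nitrogen balance = {balance:+.1f} g/day ({state})")
```

Note how leaving out the sweat and skin term would push the result upward, which is the underestimation of excretion that the text warns about.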
3-Methylhistidine
3-methylhistidine is an amino acid present in actin and
myosin, which are the contractile proteins of muscle fibers.
3-methylhistidine can be measured from a muscle
biopsy, in the blood or more commonly in the urine.
Unlike the nitrogen balance method, 3-methylhistidine
can be used as a urinary marker for muscle protein
breakdown, since roughly 90% of 3-methylhistidine
is located in skeletal muscle 22, 23. When skeletal
muscle is broken down, 3-methylhistidine is excreted
through the urine because it cannot be recycled from
degraded contractile proteins 22. However, as much
as 25% of urinary 3-methylhistidine could come from
other nonmuscle sources 24. Another limitation of
this method is that 3-methylhistidine is present in
dietary meat, so large intakes of, or increases in, dietary
meat consumption would increase urinary excretion,
giving researchers inaccurate data. So long as these
limitations are accounted for and controlled, it can be
a viable method to assess skeletal muscle degradation.
The Arteriovenous Net Balance Technique
Unlike the nitrogen balance method for measuring
protein metabolism, this technique can measure rates
of protein synthesis and breakdown that occur within
the muscle tissue. This is accomplished using an amino
acid that cannot be metabolized or produced in muscle
tissue and monitoring how much goes in and out of
the muscle 22, 25. Phenylalanine, tyrosine and lysine
are not metabolized in muscle, but most often you’ll
see phenylalanine used 22. Arteries carry blood and
nutrients to the skeletal muscle and waste products or
nutrients are carried out of the skeletal muscle through
the veins. By inserting a catheter into the vein and artery
of the leg or arm, researchers can then measure the
concentration of phenylalanine in the veins and arteries
at those locations. Muscle protein synthesis is then
determined by the disappearance of phenylalanine
from arterial blood (signifying phenylalanine being
deposited in muscle protein), and the appearance of
phenylalanine in venous blood signifies muscle protein
breakdown 22. The obvious downside to this is you can’t
have catheters inserted in subjects indefinitely. This
technique is employed for short durations, usually only
a few hours after a specific treatment. This only gives a
small snapshot of what could occur; the problem is that
the observations are often generalized or extrapolated into
long-term changes.
It’s common for some of these methods to be
implemented together in some studies to generate
a more reliable outlook of protein metabolism due
to the limitations associated with each technique.
Since protein synthesis leads to more muscle
mass and protein breakdown leads to less, a better
approach for investigating the long-term changes of
protein metabolism may be to specifically measure
muscle growth.
Hypertrophy Measurements
B-mode Ultrasound
The most common device you will encounter for
assessing muscle thickness changes is the B-mode
ultrasound. This is the same type of ultrasound device
that’s used for measuring fetal development during
pregnancy. Similar to A-mode ultrasound, the device
probe converts electrical energy into high-frequency
sound waves that pass through the skin surface
and underlying tissues, which reflect from the bone
surface to produce an echo 21. Compared to A-mode,
B-mode is more expensive and technically demanding, but
it also produces a higher resolution image that
provides more detail and tissue differentiation 21. This
method of assessing muscle thickness is non-invasive,
can be done quickly and is less expensive than most
other measures of muscle growth. Like the skinfold
technique, this assessment is skill dependent and
subject to technician error. Hypertrophy
can vary through different regions of the same muscle,
and B-mode measurements only represent the specific
site that’s measured; they are not indicative of
hypertrophy of the entire muscle 51, 52.
Advanced Imaging Techniques
The two types of advanced imaging techniques we
will briefly discuss are computed tomography (CT) and
magnetic resonance imaging (MRI). These are highly
complex and expensive pieces of equipment, which is
why you’ll rarely see them used for body composition
or muscle growth research. These two methods are
as close as we can get to human cadaver analysis
in living subjects; their advanced imaging
techniques allow for visualizing and quantifying
organs and tissues such as muscle and fat 54. CT scans
use ionizing X-ray beams that pass through
tissues with differing densities, which generates
cross-sectional, 2-dimensional radiographic images
of body segments 21. Using these images researchers
can determine total tissue area, tissue thickness and
volume of tissues within an organ 21. CT scans have
been shown to be reliable and valid for assessing
changes in muscle cross sectional area (CSA) 55. An MRI
scanner is the large tunnel-like machine you see in most medical
TV shows. These machines use electromagnetic fields
and radio waves to generate detailed images of the
organs and tissues within the body. Unlike CT scans,
which expose the subject to a small amount of ionizing
radiation, MRI does not use ionizing radiation. MRI
can be used for a variety of measurements, including
total and subcutaneous adipose tissue assessment,
muscle’s lean and fat components, muscle thickness,
and muscle volume 21. MRI is viewed as a reference
standard for regional muscle mass analysis and is the
most accurate in terms of assessing changes in gross
muscle size 51, 56. The few downsides associated with
MRI (aside from the high cost) include its inability to
assess the molecular adaptations that occur within
muscle fibers and to evaluate the metabolic
and underlying mechanisms of muscle tissue 51, 57.
Muscle Biopsy
Muscle biopsies are a safe procedure. After
an anesthetic is used to numb the site, a large
pencil-sized needle is inserted through the skin and
underlying subcutaneous tissues and fascia to reach
the skeletal muscle, where a tissue sample is clipped and
removed. Muscle biopsy samples can be used to assess
microscopic and molecular changes to skeletal muscle.
When evaluating microscopic changes, the sample is
frozen, thinly sliced and attached to a slide and stained
(depending on the method used), to determine fiber
cross sectional area (fCSA) or fiber type-specific cross
sectional area 51. Molecular assessment of muscle
growth takes it another step deeper than microscopic
and analyzes the changes in protein sub-fractions (the
components that make up muscle fibers) such as actin,
myosin, or other sarcoplasmic protein concentrations 51.
When evaluating molecular changes there are a
variety of different methods and protocols available.
Muscle biopsies share some limitations
with B-mode ultrasound. They only measure
the site where the sample was extracted, and any
observed changes are assumed to also occur in the
surrounding fibers, even though, as previously mentioned, muscle
growth can vary throughout the muscle. Additionally,
the difference in tissue processing methods between
labs and the lack of standardization make it difficult
to compare findings between studies 51. Lastly, it’s
impossible to perform a biopsy in the same location
twice, so biopsy samples within the same study could
be comparing the changes of different regions of the
measured muscle. For a more comprehensive and
in-depth review of measurements relating to muscle
hypertrophy we strongly suggest a review by Haun et
al. (2018) 51.
Energy Expenditure
Energy expenditure is essentially a measure of heat
production. Cellular metabolism results in heat
production and measuring the body’s rate of heat
production gives a direct assessment of metabolic
rate 21. We can measure heat production directly or
indirectly by measuring the exchange of gases (carbon
dioxide and oxygen).
Indirect Calorimetry
Resting metabolic rate or resting energy expenditure
(REE) is an estimation of the amount of energy an
individual expends at rest (lying on a bed) over a 24-hour period. This estimation is derived from analysis
of the volume of air breathed during a specified
period of time and the composition of expired air 21.
The most accepted measure to determine REE is
via indirect calorimetry, using a device known as
a metabolic cart. The metabolic cart analyzes the
volume and composition of the air the participant is breathing.
To accomplish this, the metabolic cart includes a
computer interface to display data output recorded
by a device that continuously measures the subject’s
expired air, a flow-measuring device to record the
amount of air volume breathed and a small gas
chamber that analyzes the oxygen and carbon dioxide
composition of expired air 21. The subject lies supine
on a table with a facemask or a plastic canopy that
collects the air breathed which travels through a long
tube to the metabolic cart. The device then estimates
the number of calories per day the participant uses
at rest, based on the volume of air breathed and the
composition of expired air, accounting for ambient air
temperature and composition. Substrate utilization is
assessed within the same test and is calculated
from the volume of carbon dioxide produced divided by
the volume of oxygen consumed, known as the respiratory
quotient (RQ). The RQ value is used to determine if a
greater percentage of calories burned come from fat
or carbohydrates. This method of energy expenditure
is non-invasive and takes approximately 20 minutes
to complete, with the first 5 minutes discarded for
calibration purposes and the remaining 15 minutes
used to extrapolate the data into a 24-hour period.
The obvious drawbacks to this method are the high cost
associated with the device and the need for the subject
to remain at rest (but not asleep) for the full 20 minutes.
This method also requires careful calibration between
tests and controlling for variables by testing subjects
fasted, prior to any food or drink consumption.
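To show how gas-exchange numbers turn into the values described above, here is a minimal sketch. The RQ calculation follows the definition in the text. The conversion to calories uses the abbreviated Weir equation, which is one commonly used formula; the guide does not say which equation a given metabolic cart applies, so treat that choice, the function names, and the example values as assumptions.

```python
MINUTES_PER_DAY = 1440


def respiratory_quotient(vco2_l_min: float, vo2_l_min: float) -> float:
    """RQ = volume of CO2 produced / volume of O2 consumed."""
    return vco2_l_min / vo2_l_min


def ree_kcal_per_day(vo2_l_min: float, vco2_l_min: float) -> float:
    """Abbreviated Weir equation (assumed here), extrapolated to 24 hours.

    kcal/min = 3.941 * VO2 + 1.106 * VCO2, with both gases in L/min.
    """
    kcal_per_min = 3.941 * vo2_l_min + 1.106 * vco2_l_min
    return kcal_per_min * MINUTES_PER_DAY


if __name__ == "__main__":
    # Hypothetical averages from the 15 minutes kept after calibration.
    vo2, vco2 = 0.25, 0.20  # L/min
    print(f"RQ  = {respiratory_quotient(vco2, vo2):.2f}")
    print(f"REE = {ree_kcal_per_day(vo2, vco2):.0f} kcal/day")
```

An RQ closer to 0.7 points to a greater share of energy coming from fat, while a value closer to 1.0 points to carbohydrate. Since 15 minutes of data are scaled up to a full day, small measurement errors are multiplied as well.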
Doubly Labeled Water Technique
The doubly labeled water technique involves consuming
a quantity of water with a known concentration of
non-radioactive stable isotope forms of hydrogen and
oxygen 21. This method estimates average daily energy
expenditure in free-living conditions once the isotopes
have distributed throughout all bodily fluids (roughly
5 hours) 21. These labeled isotopes serve as tracers
and can be measured as they leave the body through
sweat, urine, pulmonary vapor, and carbon dioxide
(CO2). The difference between elimination rates of
the two isotopes is determined using an isotope ratio
mass spectrometer and allows for an estimate of
total CO2 production 21. During the observation period
(several days to weeks), researchers measure urine
or saliva samples for concentrations of the enriched
isotopes to estimate the CO2 production rate. Researchers
then use this estimated carbon dioxide production rate and the
subject’s RQ to calculate energy expenditure. This technique has
a high cost associated with it, which results in low sample sizes,
and it doesn’t allow for evaluation of day-to-day variations in
energy expenditure. However, this method allows prolonged
assessment periods that don’t interfere with everyday life or
physical activity. This method also serves as a criterion to
validate other methods since its accuracy averages within
3-5% when compared to direct measurements of energy
expenditure in controlled settings 21. Drawbacks to
this technique are that it does not assess what is
contributing to changes in energy expenditure (BMR vs.
NEAT vs. TEF vs. Exercise) and that it has been demonstrated
to possibly overestimate energy expenditure in low
carb diets 64.
Direct Calorimetry
Direct calorimetry is the most controlled and accurate
measure available for estimating energy expenditure.
This is accomplished using a metabolic ward or
metabolic chamber that houses subjects in a room
sized chamber. The chamber has an inlet for oxygen
to flow into the chamber and an outlet for CO2 to exit.
There is also a layer of water surrounding the chamber,
and as the subject’s heat is dissipated it warms that
layer of water. By knowing the volume of water and
the temperature change of the water, researchers can
then calculate heat production and, from it, energy
expenditure. This type of
measurement is highly expensive, which is why few labs
have this available for measuring energy expenditure.
For this reason, you will often find energy expenditure
measured using indirect methods. When this type
of measurement is used in studies, you’ll see they use
small sample sizes because of the high cost. But
the energy expenditure data from these studies will generally be more
accurate than data from a study that uses an indirect measurement.
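The heat calculation behind direct calorimetry can be sketched with the basic relationship heat = mass of water x specific heat x temperature change. This is a simplified illustration: it assumes 1 L of water weighs about 1 kg and takes about 1 kcal to warm by 1 °C, and it ignores heat lost by evaporation, which real chambers also account for. The function names and numbers are hypothetical.

```python
WATER_KG_PER_L = 1.0              # approximate density of water
WATER_KCAL_PER_KG_PER_C = 1.0     # approximate specific heat of water


def heat_production_kcal(water_volume_l: float, delta_temp_c: float) -> float:
    """Heat absorbed by the water layer: mass x specific heat x temperature rise."""
    mass_kg = water_volume_l * WATER_KG_PER_L
    return mass_kg * WATER_KCAL_PER_KG_PER_C * delta_temp_c


def energy_expenditure_kcal_per_day(heat_kcal: float, hours_measured: float) -> float:
    """Scale the measured heat production to a 24-hour rate."""
    return heat_kcal / hours_measured * 24.0


if __name__ == "__main__":
    # Hypothetical chamber: 100 L of water warms by 0.8 degrees C in 1 hour.
    heat = heat_production_kcal(100.0, 0.8)
    print(f"heat produced = {heat:.0f} kcal")
    print(f"daily rate    = {energy_expenditure_kcal_per_day(heat, 1.0):.0f} kcal/day")
```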
Hormones
Hormones are chemical messengers synthesized in
specific glands and transported in the blood to targeted
cells or receptors to elicit a physiological response.
Hormone secretion rarely occurs at a constant rate
and adjusts rapidly to meet the demands of the body 21.
Various factors can impact a hormone’s secretion rate
depending on the magnitude of chemical stimulatory or
inhibitory input 21. The secreted amount of a hormone
is indicative of its blood plasma concentration 21.
Hormones are most commonly tested from blood
draws and analyzed based on their blood serum
concentrations. Some hormones can be tested through
saliva, which offers a less invasive and more cost-effective method. Cortisol is commonly assessed from
saliva and has been shown to have a linear correlation
with blood concentrations 58. However, the correlation
was low and blood concentrations could not be inferred
from salivary cortisol concentrations 58. Also, the
concentrations of hormones in saliva are much less than
the concentrations in the blood, which could indicate
that salivary measures are more indirect and imply
passive diffusion rather than active secretion 58. There
are a number of factors that can impact the results of
salivary hormone testing, but when procedures are
standardized and certain variables are accounted for,
salivary hormone testing can be a good and acceptable
method, especially if blood testing is not an option.
However, blood testing will give a better indication of the
secreted hormone concentration. We have frequently
noticed that many individuals in the fitness industry place
hormones on a pedestal and unreasonably emphasize
hormone data. While hormone results are objective
and valuable physiological outcomes, they shouldn’t be
exaggerated. There are a few things you want to consider
when evaluating hormone results. Most hormones
are secreted based on a variety of stimuli, while other
hormones follow specific secretory daily cycles (diurnal
pattern) or several week cycles 21. If not performed
properly, one single blood draw will fail to account for
the specific secretory pattern of certain hormones and
won’t tell you anything about the changes that occurred
following a treatment. Even with multiple testing
points, the secretory pattern must be acknowledged
or flaws in the analysis and interpretation of results can
occur. Acute changes in hormones don’t necessarily
lead to long-term adaptations. A great example of
this is the hormone hypothesis for muscle growth.
It was commonly believed that the acute increases
in anabolic hormones like testosterone and growth
hormone following resistance training lead to greater
muscle growth. However, this has been discredited by
a comprehensive review that explains how acute post-exercise increases in systemic hormones are not a proxy
measure for increased muscle growth 59. Rather, these
transient increases in hormone concentrations are
more likely due to changes in fuel demand and increased
fuel mobilization to support exercise. Hormone data
gives us an objective measure for assessing underlying
physiological responses to certain treatments, but
it only gives us a snapshot of physiological changes
during the specific measurement points. Therefore,
it’s imperative to have a comprehensive understanding
of specific hormones and underlying physiology when
evaluating hormone data.
Muscle Excitation
Electromyography (EMG)
Electromyography is a measure of how the
neuromuscular system is behaving 60. In exercise
science, EMG is commonly used to investigate variables
such as muscle activation, force production, muscle
recruitment, muscle strength and hypertrophy (which
is problematic as we’ll discuss). Surface EMG (sEMG) is
the most frequently used technique and is highly sensitive
to increases and decreases in voltage that occur on the
muscle fiber membrane 60. Small electrodes are placed
over the muscle group(s) of interest on the surface
of the skin. The electrodes transmit the detected
electrical impulses to a computer that displays a
graphical representation of the voltage amplitude
readings. Great caution is needed when reading
studies using EMG as a primary outcome due to the
complicated nature of sEMG and lack of longitudinal
work 60. Amplitudes measured with sEMG are the
most frequently reported metric in EMG experiments.
They are a measure of excitation, not a
direct measure of activation, and sEMG amplitudes
by themselves cannot be used to infer motor unit
recruitment or rate coding 60. In other words, sEMG
cannot tell us if a certain exercise is recruiting more
muscle fibers or if the muscle fibers that are activated
are firing at a faster rate. Additionally, the passive
properties of muscle allow force production to occur
with a corresponding sEMG amplitude reading of zero,
indicating sEMG amplitudes cannot reliably predict
muscle force during dynamic tasks 60, 61.
It has been assumed that greater sEMG amplitudes
from certain exercises can be used to predict long-term adaptations in strength and muscle hypertrophy.
Whether this is true is currently unknown, and conclusions should be
interpreted with caution due to sEMG’s inability to
account for muscle properties and the number of other
variables that can impact hypertrophy and strength
adaptations 60. There are a number of factors that
can influence sEMG amplitudes aside from muscular
effort, such as muscle length, contraction type,
contraction speed, tissue conductivity, and electrode
placement; if variables like these are not accounted for,
comparisons between different exercises can be
inappropriate 60. The take home message from this
discussion is that EMG data can be very messy and
misconstrued with inappropriate conclusions, but
that’s not to say that all EMG studies/data are useless.
The previously mentioned confounding variables need
to be controlled when using EMG as a primary outcome.
Comparing different exercises using a within-subject
and within-muscle (comparing pre- and post-test
results from the same person and same muscle in the
same testing session) design may provide more reliable
data on muscular excitation and force production when
amplitude signals are appropriately normalized and
other variables are controlled 60. While EMG data can be
useful for understanding the neuromuscular system,
the conclusions and recommendations are currently
limited by lack of longitudinal studies 60.
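As a rough illustration of what "normalizing" an amplitude signal can look like, the sketch below expresses the root-mean-square (RMS) amplitude of a task recording as a percentage of the RMS amplitude from a reference contraction of the same muscle in the same session. The guide does not specify a normalization method, so this approach, the function names, and the sample values are all assumptions made for illustration.

```python
import math
from typing import Sequence


def rms_amplitude(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a voltage recording."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def normalized_amplitude_percent(task: Sequence[float],
                                 reference: Sequence[float]) -> float:
    """Task RMS as a percentage of the reference-contraction RMS."""
    return 100.0 * rms_amplitude(task) / rms_amplitude(reference)


if __name__ == "__main__":
    # Hypothetical voltage samples (millivolts), not real recordings.
    reference = [2.0, -2.0, 2.0, -2.0]
    task = [1.0, -1.0, 1.0, -1.0]
    print(f"{normalized_amplitude_percent(task, reference):.0f}% of reference")
```

Even a normalized value still describes excitation only; it does not reveal motor unit recruitment, rate coding, or force, for the reasons given above.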
Strength Testing
Strength is a skill and is highly specific, not only to the
type of exercise but also to the intensity
and rep range you consistently train at. This can be
problematic when attempting to measure and compare
strength adaptations between groups exposed to
different training programs.
Repetition Maximum (RM)
Repetition maximum (RM) is the most commonly used
test of strength you’ll encounter in the literature. An RM
test assesses the maximum amount of weight a subject
can lift for a specified number of repetitions; most
often, a 1RM test is used. This type of test lends itself
to a certain level of subjectivity because load selection
depends on the research staff who are supervising. If
researchers over- or underestimate the load change
between attempts, the subject may fall short of a true
maximum due to fatigue from repeated max attempts.
So long as proper standardization is applied with an
established protocol, subjectivity can be minimized.
Generally, most protocols involve a few warm-up sets
with progressively heavier weight in each set. After
a few minutes of rest, the subject makes a
near-maximal attempt. After each
completed attempt, the weight is increased at
the researcher’s discretion. A skilled researcher
should find a 1RM within three to five attempts. Some
researchers suggest that if the only measure of strength
in a study is a 1RM, the study may overlook strength
adaptations because the 1RM is a skill and will improve
most when training closely reflects the 1RM test 62. For
example, if one group trains closer to its 1RM during a
training period, it would theoretically perform better
on a 1RM test than a group that trains further away from
its 1RM, making it difficult to compare adaptations.
For this reason, it may be more appropriate to include a
test that both groups are inexperienced at performing.
Tests using dynamometers can accomplish this and
provide a more objective measure.
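To see why the size of the load change between attempts matters, here is a minimal simulation of a 1RM test. It is a sketch under simplifying assumptions, not an established protocol: the subject has a fixed, hypothetical "true" maximum, the load rises by a fixed increment after every successful attempt, the first failed attempt ends the test, and fatigue is represented only by a cap on the number of attempts. The function and parameter names are made up for this example.

```python
def find_one_rm(true_max, first_attempt, increment, max_attempts=5):
    """Simulate 1RM attempts and return (heaviest successful load, attempts used).

    true_max is the heaviest load the hypothetical subject can lift once.
    """
    best = None
    load = first_attempt
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        if load > true_max:
            break  # a failed attempt ends this simplified test
        best = load
        load += increment
    return best, attempts


# Hypothetical subject with a true 1RM of 140 kg and a first attempt at 120 kg.
for increment in (2.5, 10, 25):
    best, attempts = find_one_rm(true_max=140, first_attempt=120, increment=increment)
    print(f"increment {increment} kg: recorded 1RM {best} kg after {attempts} attempts")
```

In this toy setup, the 10 kg jumps land on 140 kg on the third attempt, the 2.5 kg jumps use up all five attempts and stop at 130 kg, and the 25 kg jump overshoots and records only 120 kg. Both too-small and too-large load changes leave the recorded 1RM below the true maximum.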
Dynamometry
There are various types of dynamometers used
in research. They range from spring- or hydraulic-loaded
dynamometers to highly sophisticated
computerized dynamometers that can isolate various
types of contractions and force outputs. The spring
and hydraulic dynamometers are usually used for a
measure of forearm isometric strength using a hand
grip dynamometer. These are simply a handle that
you squeeze and hold for a few seconds to measure
the pounds or kilograms of force that you generate.
The more sophisticated computerized devices offer
more functions and provide a more comprehensive
evaluation of muscle function and strength. These
can range from simple handgrip dynamometers
to leg extension and mechanized squat devices.
Computerized devices like the knee extension
dynamometer can tightly control the range of
motion, the duration of each rep, and the force applied
throughout different ranges of motion. These devices
can measure a number of variables including maximal
voluntary isometric contraction, rate of isometric
force development, power, torque, and velocity.
Generally, you’ll see rate of force development,
maximal voluntary contraction and peak power
reported for various contraction types (eccentric,
concentric, isometric, etc.).
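Two of the variables named above can be sketched from a sampled force-time curve. The example assumes the dynamometer exports paired time and force samples; it takes peak force as the highest sample and rate of force development as the average slope over a fixed window after force onset. The onset threshold, the 200 ms window, and the data are hypothetical choices for illustration, and real studies define these in different ways.

```python
def peak_force(force_n):
    """Highest force sample (newtons) in the recording."""
    return max(force_n)


def rate_of_force_development(time_ms, force_n, onset_threshold_n, window_ms):
    """Average slope of the force-time curve (N/s) over a window after onset."""
    # Onset: the first sample at or above the (assumed) threshold.
    onset = None
    for i, force in enumerate(force_n):
        if force >= onset_threshold_n:
            onset = i
            break
    if onset is None:
        raise ValueError("force never reached the onset threshold")

    # Last sample that still falls inside the window.
    end = onset
    while end + 1 < len(time_ms) and time_ms[end + 1] - time_ms[onset] <= window_ms:
        end += 1
    if end == onset:
        raise ValueError("window is shorter than one sampling interval")

    delta_force = force_n[end] - force_n[onset]
    delta_time_ms = time_ms[end] - time_ms[onset]
    return 1000.0 * delta_force / delta_time_ms  # convert N/ms to N/s


# Hypothetical isometric contraction sampled every 50 ms.
time_ms = [0, 50, 100, 150, 200, 250, 300]
force_n = [5, 20, 120, 260, 380, 450, 470]

print("Peak force (N):", peak_force(force_n))
print("RFD over 200 ms (N/s):",
      rate_of_force_development(time_ms, force_n, onset_threshold_n=20, window_ms=200))
```

Because the result depends on how onset and the window are defined, it is worth checking the methods section for these details before comparing values across studies.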
Psychometrics
Psychometrics measure psychological constructs such
as moods, behaviors, and personality traits. These are
measured using various types of questionnaires, surveys,
and interviews. These types of measures fall under
descriptive research and are widely used in education and
behavioral sciences 3. Questionnaires and surveys can
use open-ended questions, closed questions, or a
combination of the two. Open-ended questions provide more
opportunity for subjects to elaborate or provide detailed
information about their feelings or ideas. For example,
“Why did you struggle to adhere to your diet?” While
these types of questions can gather a lot of detailed
information, they require considerable time, and the
answers are difficult to score or compare between subjects
or groups. Closed questions require a specific response
and are commonly yes-or-no questions. These types of
questionnaires are faster to administer and
score than open-ended questionnaires. With
an appropriate scoring system, closed-question surveys
can be used to compare answers between subjects
or groups of subjects. Closed questions also include
different formats such as scaled, ranking, or categorical
questions. A very common iteration of a scaled
questionnaire is known as a visual analog scale (VAS),
which has a line with corresponding answer choices
and equal intervals between answers that indicate
the strength of agreement or disagreement with a
statement 3. The problem with some questionnaires and
surveys is how questions are worded. Some questions
may be worded in a way that makes subjects feel there is a
“right” or “wrong” answer, so they change their true
response to try to satisfy the questionnaire. So, it’s
important they’re developed with appropriate wording
that doesn’t bias the subject toward a certain answer;
even the order in which the questions are placed can
play a role. Also, the more a subject repeats a specific
survey or questionnaire, the more likely it is to bias their
answers, since they become more familiar with the
questionnaire. For this reason, it’s important to allow
a proper amount of time between testing points
for questionnaires and surveys. It’s not a requirement
for a questionnaire to be validated prior to its use in an
experiment, but its results carry greater weight if it
has been. Many questionnaires used in exercise
and nutrition science have been validated in the field
of psychology or other medical fields, which
indicates they are acceptable for use, but often
these questionnaires should be developed specifically
for the sample being studied. There are many different
psychometric questionnaires and surveys available. You’ll
most often see a number of different Likert scales or VAS
used, along with others like the Profile of Mood States
questionnaire, the Three-Factor Eating Questionnaire, or
the Pittsburgh Sleep Quality Index. Psychometrics can be
a very useful, cost-effective tool that is easily implemented
alongside other, more objective measures to provide an
in-depth evaluation.
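The point about scoring closed questions can be made concrete with a small sketch. It assumes a hypothetical four-item questionnaire answered on a 1 to 5 scale, scores each subject by summing the ratings, and compares the average score of two groups. The scale, the summing rule, and the data are all illustrative assumptions; a published questionnaire comes with its own scoring instructions.

```python
from statistics import mean


def score_questionnaire(ratings, scale_min=1, scale_max=5):
    """Total score for one subject: the sum of their item ratings."""
    for rating in ratings:
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} is outside {scale_min}-{scale_max}")
    return sum(ratings)


def group_mean_score(group):
    """Average total score across the subjects in a group."""
    return mean([score_questionnaire(ratings) for ratings in group])


# Hypothetical responses: one inner list per subject, one rating per item.
group_a = [[4, 5, 3, 4], [3, 4, 4, 5]]
group_b = [[2, 3, 2, 3], [3, 2, 2, 1]]

print("Group A mean score:", group_mean_score(group_a))
print("Group B mean score:", group_mean_score(group_b))
```

Open-ended answers have no equivalent of this step, which is why they take longer to score and are harder to compare between subjects or groups.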
Once again, we’d like to reiterate that this is not a
complete list of all of the available measurements in
exercise and nutrition research. There are many others
available, but generally these are the ones you will
frequently encounter when reading through nutrition
and exercise science research publications. When
measuring any outcome, it’s best to use a combination
of measures whenever possible. This can help to
provide a more comprehensive evaluation of the
outcomes of interest and control for more variables.
Obviously, that’s not always possible for some labs,
and they have to work with the equipment they have
available. This means that, as consumers of research,
we need to critically evaluate the methods that are
used in experiments and take them for what they’re
worth. Again, just because a study uses a less valid and
reliable method doesn’t mean we should throw it out.
Instead, we can use it as a small piece of evidence and
compare it with other studies.
Closing Remarks
There is no substitute for spending years in a lab
actually working on a study, but hopefully this handbook
has provided insight into the research process.
Participating in research is really the only way you
can appreciate and understand how much goes into
conducting and publishing a study. Unfortunately,
not everyone has the opportunity to attend higher
education, which was the primary motive for this
handbook: to share our knowledge and understanding
of research based on our experiences conducting
studies in nutrition and exercise science. Keep in mind
that this is a relatively short and non-comprehensive
guide to what goes into conducting and publishing a
study. Please take a look at our references,
which include some excellent books worth investing
in if you want to learn more about the research process.
If you want to be more active in research but you don’t
have the opportunity to attend a university, many labs
in your area would be grateful to have another helping
hand. Simply reaching out to a professor in your area
who is conducting research that interests you can help
guide you in the right direction.
We hope you’ll consider subscribing to our research
review as part of the [Biolayne.com](http://Biolayne.com)
membership. Each month we review five scientific
studies related to training, nutrition, supplements,
muscle growth, fat loss, and other topics related
to health and fitness. Our goal is to provide our own
opinions and criticisms and dig into the important nuances,
while also summarizing the studies in a digestible format
for the non-scientist. Our mission for this review is to
establish a resource that allows individuals to stay up
to date on current research and aware of the general
consensus of specific topics without a major time
commitment.
References
1. Oxford University Press. (n.d.). Research English definition and meaning. Lexico Dictionaries | English.
2. Tuckman, B. W., & Harper, B. E. (2012). Conducting educational research. Rowman & Littlefield Publishers.
3. Thomas, J. R., Nelson, J. K., & Silverman, S. J. (2015). Research methods in physical activity. Human kinetics.
4. Hopkins, W. G. (2000). Quantitative research design. Sportscience, 4(1), 1-8.
5. Draper, C. E. (2009). Role of qualitative research in exercise science and sports medicine. South African Journal of Sports Medicine, 21(1), 27-28.
6. Barré-Sinoussi, F., & Montagutelli, X. (2015). Animal models are essential to biological research: issues and perspectives. Future science OA, 1(4).
7. Baxter, P., & Jack, S. (2008). Qualitative Case Study Methodology: Study Design and Implementation for Novice Researchers. The Qualitative Report, 13(4), 544-559.
8. Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.). (2019). The handbook of research synthesis and meta-analysis. Russell Sage Foundation.
9. Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational researcher, 5(10), 3-8.
10. Ware, W. B., Ferron, J. M., & Miller, B. M. (2013). Introductory statistics: A conceptual approach using R. Routledge.
11. Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in psychology, 4, 863.
12. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge.
13. King, L. (2018). Preparing better graphs. Journal Of Public Health And Emergency, 2(1).
14. Morton, R. W., Murphy, K. T., McKellar, S. R., Schoenfeld, B. J., Henselmans, M., Helms, E., Aragon, A. A., Devries, M. C., Banfield, L., Krieger, J. W., & Phillips, S. M. (2018). A systematic review, meta-analysis and meta-regression of the effect of protein supplementation on resistance training-induced gains in muscle mass and strength in healthy adults. British journal of sports medicine, 52(6), 376–384.
15. Cumming, G., Fidler, F., & Vaux, D. L. (2007). Error bars in experimental biology. The Journal of cell biology, 177(1), 7–11.
16. Mlinarić, A., Horvat, M., & Šupak Smolčić, V. (2017). Dealing with the positive publication bias: Why you should really publish your negative results. Biochemia medica, 27(3), 030201.
17. Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLoS biology, 13(3), e1002106.
18. Krleža-Jerić, K. (2014). Sharing of clinical trial data and research integrity. Periodicum biologorum, 116(4), 337-339.
19. Moon, J. R., Eckerson, J. M., Tobkin, S. E., Smith, A. E., Lockwood, C. M., Walter, A. A., Cramer, J. T., Beck, T. W., & Stout, J. R. (2009). Estimating body fat in NCAA Division I female athletes: a five-compartment model validation of laboratory methods. European journal of applied physiology, 105(1), 119–130.
20. Smith-Ryan, A. E., Mock, M. G., Ryan, E. D., Gerstner, G. R., Trexler, E. T., & Hirsch, K. R. (2017). Validity and reliability of a 4-compartment body composition model using dual energy x-ray absorptiometry-derived body volume. Clinical nutrition (Edinburgh, Scotland), 36(3), 825–830.
21. McArdle, W. D., Katch, F. I., & Katch, V. L. (2015). Exercise Physiology: Nutrition, Energy, and Human Performance. United Kingdom: Wolters Kluwer Health/Lippincott Williams & Wilkins.
22. Campbell, B. (Ed.). (2013). Sports nutrition: Enhancing athletic performance. CRC Press.
23. Rooyackers, O. E., & Nair, K. S. (1997). Hormonal regulation of human muscle protein metabolism. Annual review of nutrition, 17, 457–485.
24. Afting, E. G., Bernhardt, W., Janzen, R. W., & Röthig, H. J. (1981). Quantitative importance of non-skeletal-muscle N tau-methylhistidine and creatine in human urine. The Biochemical journal, 200(2), 449–452.
25. Katsanos, C. S., Chinkes, D. L., Sheffield-Moore, M., Aarsland, A., Kobayashi, H., & Wolfe, R. R. (2005). Method for the determination of the arteriovenous muscle protein balance during non-steady-state blood and muscle amino acid concentrations. American journal of physiology. Endocrinology and metabolism, 289(6), E1064–E1070.
26. Kerr, A., Slater, G., Byrne, N., & Chaseling, J. (2015). Validation of Bioelectrical Impedance Spectroscopy to Measure Total Body Water in Resistance-Trained Males. International journal of sport nutrition and exercise metabolism, 25(5), 494–503.
27. Schoenfeld, B. J., Nickerson, B. S., Wilborn, C. D., Urbina, S. L., Hayward, S. B., Krieger, J., Aragon, A. A., & Tinsley, G. M. (2020). Comparison of Multifrequency Bioelectrical Impedance vs. Dual-Energy X-ray Absorptiometry for Assessing Body Composition Changes After Participation in a 10-Week Resistance Training Program. Journal of strength and conditioning research, 34(3), 678–688.
28. Graybeal, A. J., Moore, M. L., Cruz, M. R., & Tinsley, G. M. (2020). Body Composition Assessment in Male and Female Bodybuilders: A 4-Compartment Model Comparison of Dual-Energy X-Ray Absorptiometry and Impedance-Based Devices. Journal of strength and conditioning research, 34(6), 1676–1689.
29. Haas, V., Schütz, T., Engeli, S., Schröder, C., Westerterp, K., & Boschmann, M. (2012). Comparing single-frequency bioelectrical impedance analysis against deuterium dilution to assess total body water. European journal of clinical nutrition, 66(9), 994–997.
30. Moon, J. R. (2013). Body composition in athletes and sports nutrition: an examination of the bioimpedance analysis technique. European journal of clinical nutrition, 67 Suppl 1, S54–S59.
31. Matias, C. N., Santos, D. A., Gonçalves, E. M., Fields, D. A., Sardinha, L. B., & Silva, A. M. (2013). Is bioelectrical impedance spectroscopy accurate in estimating total body water and its compartments in elite athletes?. Annals of human biology, 40(2), 152–156.
32. Cole, K. S. (1940). Permeability and impermeability of cell membranes for ions. In Cold Spring Harbor Symposia on Quantitative Biology. Cold Spring Harbor Laboratory Press.
33. Matthie, J. R. (2008). Bioimpedance measurements of human body composition: critical analysis and outlook. Expert review of medical devices, 5(2), 239-261.
34. Siri, W. E., Brozek, J., & Henschel, A. (1961). Techniques for measuring body composition. Washington, DC: National Academy of Sciences, 223-224.
35. Withers, R. T., Craig, N. P., Bourdon, P. C., & Norton, K. I. (1987). Relative body fat and anthropometric prediction of body density of male athletes. European journal of applied physiology and occupational physiology, 56(2), 191–200.
36. Orphanidou, C., McCargar, L., Birmingham, C. L., Mathieson, J., & Goldner, E. (1994). Accuracy of subcutaneous fat measurement: comparison of skinfold calipers, ultrasound, and computed tomography. Journal of the American Dietetic Association, 94(8), 855–858.
37. van Marken Lichtenbelt, W. D., Hartgens, F., Vollaard, N. B., Ebbing, S., & Kuipers, H. (2004). Body composition changes in bodybuilders: a method comparison. Medicine and science in sports and exercise, 36(3), 490–497.
38. Evans, E. M., Saunders, M. J., Spano, M. A., Arngrimsson, S. A., Lewis, R. D., & Cureton, K. J. (1999). Body-composition changes with diet and exercise in obese women: a comparison of estimates from clinical methods and a 4-component model. The American journal of clinical nutrition, 70(1), 5–12.
39. Peterson, M. J., Czerwinski, S. A., & Siervogel, R. M. (2003). Development and validation of skinfold-thickness prediction equations with a 4-compartment model. The American journal of clinical nutrition, 77(5), 1186–1191.
40. Kuehne, T. E., Yitzchaki, N., Jessee, M. B., Graves, B. S., & Buckner, S. L. (2019). A comparison of acute changes in muscle thickness between A-mode and B-mode ultrasound. Physiological measurement, 40(11), 115004.
41. Wagner, D. R. (2013). Ultrasound as a tool to assess body fat. Journal of obesity, 2013, 280713.
42. Schoenfeld, B. J., Aragon, A. A., Moon, J., Krieger, J. W., & Tiryaki-Sonmez, G. (2017). Comparison of amplitude-mode ultrasound versus air displacement plethysmography for assessing body composition changes following participation in a structured weight-loss programme in women. Clinical physiology and functional imaging, 37(6), 663–668.
43. Wang, Z., Pi-Sunyer, F. X., Kotler, D. P., Wielopolski, L., Withers, R. T., Pierson, R. N., Jr, & Heymsfield, S. B. (2002). Multicomponent methods: evaluation of new and traditional soft tissue mineral models by in vivo neutron activation analysis. The American journal of clinical nutrition, 76(5), 968–974.
44. Friedl, K. E., DeLuca, J. P., Marchitelli, L. J., & Vogel, J. A. (1992). Reliability of body-fat estimations from a four-compartment model by using density, body water, and bone mineral measurements. The American journal of clinical nutrition, 55(4), 764–770.
45. Wilson, J. P., Strauss, B. J., Fan, B., Duewer, F. W., & Shepherd, J. A. (2013). Improved 4-compartment body-composition model for a clinically accessible measure of total body protein. The American journal of clinical nutrition, 97(3), 497–504.
46. Nickerson, B. S., & Tinsley, G. M. (2018). Utilization of BIA-Derived Bone Mineral Estimates Exerts Minimal Impact on Body Fat Estimates via Multicompartment Models in Physically Active Adults. Journal of clinical densitometry : the official journal of the International Society for Clinical Densitometry, 21(4), 541–549.
47. Williams, J. E., Wells, J. C., Wilson, C. M., Haroun, D., Lucas, A., & Fewtrell, M. S. (2006). Evaluation of Lunar Prodigy dual-energy X-ray absorptiometry for assessing body composition in healthy persons and patients by comparison with the criterion 4-component model. The American journal of clinical nutrition, 83(5), 1047–1054.
48. Clasey, J. L., Kanaley, J. A., Wideman, L., Heymsfield, S. B., Teates, C. D., Gutgesell, M. E., Thorner, M. O., Hartman, M. L., & Weltman, A. (1999). Validity of methods of body composition assessment in young and older men and women. Journal of applied physiology (Bethesda, Md. : 1985), 86(5), 1728–1738.
49. Kullberg, J., Brandberg, J., Angelhed, J. E., Frimmel, H., Bergelin, E., Strid, L., Ahlström, H., Johansson, L., & Lönn, L. (2009). Whole-body adipose tissue analysis: comparison of MRI, CT and dual energy X-ray absorptiometry. The British journal of radiology, 82(974), 123–130.
50. Tothill, P., & Hannan, W. J. (2000). Comparisons between Hologic QDR 1000W, QDR 4500A, and Lunar Expert dual-energy X-ray absorptiometry scanners used for measuring total body bone and soft tissue. Annals of the New York Academy of Sciences, 904, 63–71.
51. Haun, C. T., Vann, C. G., Roberts, B. M., Vigotsky, A. D., Schoenfeld, B. J., & Roberts, M. D. (2019). A Critical Evaluation of the Biological Construct Skeletal Muscle Hypertrophy: Size Matters but So Does the Measurement. Frontiers in physiology, 10, 247.
52. Vigotsky, A. D., Schoenfeld, B. J., Than, C., & Brown, J. M. (2018). Methods matter: the relationship between strength and hypertrophy depends on methods of measurement and analysis. PeerJ, 6, e5071.
53. Haun, C. T., Vann, C. G., Mobley, C. B., Roberson, P. A., Osburn, S. C., Holmes, H. M., Mumford, P. M., Romero, M. A., Young, K. C., Moon, J. R., Gladden, L. B., Arnold, R. D., Israetel, M. A., Kirby, A. N., & Roberts, M. D. (2018). Effects of Graded Whey Supplementation During Extreme-Volume Resistance Training. Frontiers in nutrition, 5, 84.
54. Ward, L. C. (2018). Human body composition: yesterday, today, and tomorrow. European journal of clinical nutrition, 72(9), 1201–1207.
55. Verdijk, L. B., Gleeson, B. G., Jonkers, R. A., Meijer, K., Savelberg, H. H., Dendale, P., & van Loon, L. J. (2009). Skeletal muscle hypertrophy following resistance training is accompanied by a fiber type-specific increase in satellite cell content in elderly men. The journals of gerontology. Series A, Biological sciences and medical sciences, 64(3), 332–339.
56. Smeulders, M. J., van den Berg, S., Oudeman, J., Nederveen, A. J., Kreulen, M., & Maas, M. (2010). Reliability of in vivo determination of forearm muscle volume using 3.0 T magnetic resonance imaging. Journal of magnetic resonance imaging : JMRI, 31(5), 1252–1255.
57. Hellerstein, M., & Evans, W. (2017). Recent advances for measurement of protein synthesis rates, use of the ‘Virtual Biopsy’ approach, and measurement of muscle mass. Current opinion in clinical nutrition and metabolic care, 20(3), 191–200.
58. Rantonen, P. J., Penttilä, I., Meurman, J. H., Savolainen, K., Närvänen, S., & Helenius, T. (2000). Growth hormone and cortisol in serum and saliva. Acta odontologica Scandinavica, 58(6), 299–303.
59. West, D. W., Burd, N. A., Staples, A. W., & Phillips, S. M. (2010). Human exercise-mediated skeletal muscle hypertrophy is an intrinsic process. The international journal of biochemistry & cell biology, 42(9), 1371–1375.
60. Vigotsky, A. D., Halperin, I., Lehman, G. J., Trajano, G. S., & Vieira, T. M. (2018). Interpreting Signal Amplitudes in Surface Electromyography Studies in Sport and Rehabilitation Sciences. Frontiers in physiology, 8, 985.
61. Roberts, T. J., & Gabaldón, A. M. (2008). Interpreting muscle function from EMG: lessons learned from direct measurements of muscle force. Integrative and comparative biology, 48(2), 312–320.
62. Buckner, S. L., Jessee, M. B., Mattocks, K. T., Mouser, J. G., Counts, B. R., Dankel, S. J., & Loenneke, J. P. (2017). Determining Strength: A Case for Multiple Methods of Measurement. Sports medicine (Auckland, N.Z.), 47(2), 193–195.
63. Norton, L. E., Wilson, G. J., Layman, D. K., Moulton, C. J., & Garlick, P. J. (2012). Leucine content of dietary proteins is a determinant of postprandial skeletal muscle protein synthesis in adult rats. Nutrition & metabolism, 9(1), 67.
64. Hall, K. D., Guo, J., Chen, K. Y., Leibel, R. L., Reitman, M. L., Rosenbaum, M., Smith, S. R., & Ravussin, E. (2019). Methodologic considerations for measuring energy expenditure differences between diets varying in carbohydrate using the doubly labeled water method. The American journal of clinical nutrition, 109(5), 1328–1334.

© Copyright 2022 Biolayne Technologies LLC