Statistics Practical Handout

SPSS TUTORIAL
Statistics and Data Management Module
Below you will find two practical exercises to be carried out in SPSS. The
first one requires cleaning up and describing an example data set. The second
one requires the statistical analysis of an example data set. You will find a detailed
description of how to do this, and you will receive plenty of support during this
session.
1 Clean and Describe a Data Set
1.1 Data Set: Adamicin_data_set.xlsx
1.2 Exercises
1.2.1 Prepare Data set in Excel and read into SPSS
1.2.2 Define and Recode Variables
2 Statistical Analysis of an Example Data Set
2.1 Question 1
2.2 Question 2
2.3 Question 3
2.4 Question 4
1 Clean and Describe a Data Set
1.1 Data Set: Adamicin_data_set.xlsx
Adamicin: a new potent antibiotic
Introduction:
Adamicin is a new aminoglycoside antibiotic that has structural similarities to gentamicin. It
is more potent than Gentamicin against P aeruginosa and S aureus (both penicillin sensitive
and resistant).
Comparative sensitivity of key organisms to Adamicin and Gentamicin

Organism                Adamicin    Gentamicin
P aeruginosa            1.5         7.5
S aureus (sensitive)    0.25        0.5
MRSA                    1.5         3

In animal studies Adamicin has been shown to be less nephrotoxic than Gentamicin. Other
nonclinical safety tests were unremarkable.
Adamicin is given by intravenous injection at a dose of 1.2 mg/kg per day in patients with
normal renal function. It has a Cmax of 14 mg/l and a plasma half-life of 15 hours. The trough
level is 3 mg/l. Thus Adamicin can be given once a day as an IV bolus.
Gentamicin is also given by intravenous injection, at a dose of 3 mg/kg per day in patients
with normal renal function. It has a Cmax of 20 mg/l and a plasma half-life of 4 hours. The
trough level is 2 mg/l. Gentamicin is given as an IV bolus 3 times a day.
Both products have low binding to plasma proteins (30-35%) and are excreted largely
unchanged by the kidney.
They are both acid labile and therefore cannot be given by mouth.
Protocol:
Background: adults with cystic fibrosis usually have chronic infection in their lungs with P
aeruginosa, the colonisation starting in the early teenage years and proving almost
impossible to eliminate. In addition they often experience acute respiratory infections with S
aureus, both sensitive and resistant varieties. Treatment can often be given at home by
inserting a central venous line, but if the infection becomes too severe the patient will have
to be admitted to hospital.
Objective: this is a phase 2 open label study to compare Adamicin and Gentamicin in adults
with cystic fibrosis who have an acute respiratory infection with S. aureus.
Inclusion criteria: adult patients with cystic fibrosis have an acute respiratory infection
where the presumptive organism is thought to be S. aureus. The diagnosis of an acute
respiratory infection will be based on 2 of the following criteria:
a. Increased production of sputum
b. Change of colour of sputum
c. Fever - temperature more than 37 °C
d. Increase in respiratory rate of more than 20% (Normal range 12 – 16 breaths/minute)
e. Decrease in FEV1 of more than 10% (Normal range men aged 20 – 50 is 3.5 –
4.5 L/sec and women 2.8 – 3.2 L/sec)
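The diagnostic rule above can be sketched in a few lines of Python. This is only an illustration outside SPSS: the function name and arguments are hypothetical, the fever threshold is read as 37 °C, and "based on 2 of the following criteria" is assumed to mean at least two.

```python
def meets_infection_criteria(sputum_increased, sputum_colour_changed,
                             temperature_c, resp_rate_increase_pct,
                             fev1_decrease_pct, required=2):
    """Return True when at least `required` of the five criteria are met.

    Assumption: 'based on 2 of the following criteria' means two or more.
    """
    criteria = [
        bool(sputum_increased),          # a. increased production of sputum
        bool(sputum_colour_changed),     # b. change of colour of sputum
        temperature_c > 37.0,            # c. fever (threshold assumed to be 37 °C)
        resp_rate_increase_pct > 20.0,   # d. respiratory rate up by more than 20%
        fev1_decrease_pct > 10.0,        # e. FEV1 down by more than 10%
    ]
    return sum(criteria) >= required


# Made-up patients: the first meets criteria a and c, the second only d.
print(meets_infection_criteria(True, False, 37.6, 5.0, 4.0))    # True
print(meets_infection_criteria(False, False, 36.8, 25.0, 4.0))  # False
```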
Exclusion criteria:
a. Patients who require immediate admission to hospital
b. Patients with any degree of renal impairment – BUN (Blood Urea Nitrogen) outside
normal range of 2.5-8.0 mmol/L
c. Patients with a history of hepatic or cardiac disease
d. Patients with cancer except non melanoma skin cancers
e. Patients who have had an acute respiratory infection within the past 4 weeks
Primary outcome measure: elimination of S.aureus at 14 days
Secondary outcome measures:
a. Improvement in FEV1 (Forced Expiratory Volume 1)
b. Admission to hospital
c. Time off work/study
d. Change in renal function
e. Side effects
Design: This is an open label randomised study with a non-inferiority design, where the
hypothesis is that Adamicin is not inferior to Gentamicin. The non-inferiority margin is 10%. Only
patients who have a positive culture for S. aureus (either penicillin sensitive or resistant) will
go into the efficacy analysis.
Numbers: 100 patients per group
Procedures: adult patients with cystic fibrosis who suffer from episodes of acute respiratory
infection will be invited to participate in the study when they attend for routine outpatient
visits. After being given the opportunity to read the study information sheet and signing the
consent document they will be screened for the inclusion and exclusion criteria and if
suitable will be given packs of study drug.
Patients will be screened for the exclusion criteria and have baseline measurements of renal
function (BUN), FEV1, sputum volume and colour.
When they have an acute episode they will take a sample of sputum and send it to the
central laboratory. They will immediately start on study drug for 14 days. On day 14 ± 1 day
they will attend the hospital for follow up sputum test, BUN and FEV1.
1.2 Exercises
1.2.1 Prepare Data set in Excel and read into SPSS
Open and compare the following two Excel worksheets:
- The original: Adamicin_data_set.xlsx
- The version edited for analysis: Adamicin_data_set_edited.xlsx
The following changes have been made to give the data the right format:
- Limited variable names to one row. Avoid names that start with numbers or that
  contain “/” or “.” signs, or spaces.
- Replaced text by numeric codes; the codes are explained in an additional sheet.
- Replaced missing values by a missing value code, e.g. -99.
Note: Remember, numeric codes are always preferable, and you should never mix strings
and numbers.
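As an illustration of the coding step (done by hand in Excel for this practical), here is a minimal Python sketch. The code book is hypothetical apart from the Gender example (1 = Male, 2 = Female) used later in this handout, and the -99 missing value code follows the example above.

```python
MISSING = -99  # missing value code, as in the example above

# Hypothetical code book; the real codes are explained in the additional Excel sheet.
GENDER_CODES = {"male": 1, "female": 2}


def encode(value, codebook):
    """Map a text entry to its numeric code, or to MISSING if blank or unknown."""
    if value is None:
        return MISSING
    key = str(value).strip().lower()
    return codebook.get(key, MISSING)


raw = ["Male", "female ", "", None, "F"]  # made-up raw entries
coded = [encode(v, GENDER_CODES) for v in raw]
print(coded)  # [1, 2, -99, -99, -99]
```

Entries that do not match the code book (such as "F" here) are deliberately sent to the missing value code rather than guessed, so that strings and numbers never end up mixed in one column.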
Read in coded data into SPSS:
- Open SPSS.
- FILE>OPEN>DATA
- Under the Format drop-down menu, pick Excel.
- Select Adamicin_data_set_edited.xlsx
- Make sure you read in the right sheet.
- Select “Read variable names from the first row of data”
- Save the .sav file
- Look in the window ‘Data View’ (windows can be switched by pressing the tabs at the
  bottom of the screen). After the first analysis assignment SPSS opens an output file.
  Use [Alt] [Tab] to switch between the dataset and the output file.
- Commands can be saved in the syntax screen by pressing <Paste> in the menus
  (instead of <Run>). Parts of the syntax can be selected to be executed with ‘Run’.
- Closing SPSS: after all the work you may want to externally save some files.
- Check that variable names have been read in correctly.
- Check that you have the right number of cases (200).
- Check that, under Variable View, all variables have been read as numeric, except for
  AEs.
- Check that variables such as FEV1, BUN or temperature have been read with
  decimals.
1.2.2 Define and Recode Variables
1.2.2.1 Define Variable Properties
DATA>DEFINE VARIABLE PROPERTIES
Transfer all variables into box “Variables to Scan”. Click Continue. For each variable:
- Provide a label that describes the variable.
- Choose the measurement level: Scale (continuous), Ordinal, or Nominal (categorical).
- For categorical variables, provide value labels, e.g. for Gender: Value = 1, Label = ‘Male’;
  Value = 2, Label = ‘Female’.
Tip: Some variables are measured twice; for these you can simply copy the variable
properties from the first measurement using the button <Copy Properties From Another Variable>.
1.2.2.2 Recode Variables
The background information of the study describes a number of variables for which there
are values within a normal range, and abnormal values. Some of them are part of the
inclusion/exclusion criteria:
- Respiratory rate: Normal range 12 – 16 breaths /minute
- FEV1: Normal range men aged 20 – 50 is 3.5 – 4.5L/sec and women 2.8 – 3.2 L/sec
- BUN: normal range of 2.5-8.0 mmol/L
- S.aureus: Positive = Sensitive or Resistant, Negative = Negative.
For two of these variables, at each time point:
- Inspect the distribution of the variable
- Create a new dichotomized variable for each of them (normal vs. abnormal, or
positive vs. negative). Remember! Use numeric codes, and then provide value labels.
- Count the number of patients with abnormal levels
In SPSS:
- Frequency distributions at: ANALYZE>DESCRIPTIVE STATISTICS>FREQUENCIES. You can
  get a histogram under <charts>.
- To dichotomize: TRANSFORM>RECODE INTO DIFFERENT VARIABLES. Tips: 1) You can
  recode all repeated measures of the same variable at once. 2) Use the <If…> button when
  different cut-offs apply to different individuals, e.g. FEV1 cut-offs for men and women.
- Once dichotomized, you can inspect the changes in category between baseline and day
  14 using crosstabs: ANALYZE>DESCRIPTIVE STATISTICS>CROSSTABS
Copy and paste the graphs and frequency tables that you consider relevant into a Word
document, to help you elaborate your responses to this practical. Do not paste anything that
you do not understand. You will need to interpret every table and graphic in your
hand-in document properly.
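The logic of dichotomizing with sex-specific cut-offs can be sketched in Python as follows. This only illustrates what the recode does and does not replace the SPSS dialog. It assumes Gender is coded 1 = Male and 2 = Female, that -99 marks missing values, that the range limits themselves count as normal, and that the new variable uses 0 = normal and 1 = abnormal (a hypothetical coding).

```python
MISSING = -99

# Normal FEV1 ranges from the protocol, keyed by the assumed gender code.
FEV1_NORMAL = {1: (3.5, 4.5),   # men
               2: (2.8, 3.2)}   # women


def dichotomize_fev1(fev1, gender):
    """Return 0 = normal, 1 = abnormal, or MISSING when it cannot be classified."""
    if fev1 == MISSING or gender not in FEV1_NORMAL:
        return MISSING
    low, high = FEV1_NORMAL[gender]
    return 0 if low <= fev1 <= high else 1


# Made-up (FEV1, gender) pairs, not taken from the trial data.
patients = [(3.9, 1), (3.0, 1), (3.0, 2), (-99, 2)]
flags = [dichotomize_fev1(f, g) for f, g in patients]
print(flags)                             # [0, 1, 0, -99]
print(sum(1 for f in flags if f == 1))   # number of abnormal values: 1
```

Note that the same FEV1 value (3.0) is abnormal for a man but normal for a woman, which is why a single cut-off for everyone would misclassify patients.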
1.2.2.3 Describe Variables
For the following variables: Gender, Age, BUNBaseline, FEV1Baseline, Baseline resp rate,
Sputum Vol, Sputum colour and Medication:
- Describe type of variable: independent vs. dependent, scale of measurement
(Categorical, Ordinal, Continuous).
- Use Descriptive statistics or frequency distributions as appropriate.
- Use graphs as appropriate, e.g. histograms or boxplots for continuous variables, bar
or pie charts for categorical variables.
- For Continuous variables, decide whether they are normally distributed, if they can
be analysed as continuous or rather grouped into ordinal categories, or
dichotomized.
- For categorical variables, decide whether there is a sufficient number of patients in
each category or if some categories need to be merged.
In SPSS:
- Descriptive statistics: ANALYZE>DESCRIPTIVE STATISTICS>DESCRIPTIVES provides basic
  descriptives only. ANALYZE>DESCRIPTIVE STATISTICS>EXPLORE provides a larger list of
  statistics, as well as plots and normality tests. Descriptives can be obtained for
  separate groups, e.g. by Medication, by adding the grouping variable under “Factor
  List”.
- Graphics: in GRAPHS>CHART BUILDER you can make further graphics where
  different groups are depicted in e.g. different boxplots. However, only independent
  groups can be depicted together in the same graphic there. In GRAPHS>LEGACY DIALOGS
  you can depict paired samples within the same plot, e.g. repeated measures of FEV1.
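To make clear what the grouped descriptives contain, here is a small Python sketch that computes n, mean, median, standard deviation and skewness per treatment group. The data are made up, and -99 is assumed to be the missing value code. The skewness here is the simple moment-based version; SPSS reports a bias-adjusted estimate, so its value may differ slightly.

```python
import statistics
from collections import defaultdict


def describe(values):
    """Basic descriptives for a list of numbers (needs at least two values)."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": statistics.stdev(values),             # sample standard deviation
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
    }


def describe_by_group(rows, missing=-99):
    """rows: (group, value) pairs; values equal to `missing` are skipped."""
    groups = defaultdict(list)
    for group, value in rows:
        if value != missing:
            groups[group].append(value)
    return {g: describe(v) for g, v in groups.items()}


# Made-up values, not taken from the trial data.
rows = [("Adamicin", 2.0), ("Adamicin", 4.0), ("Adamicin", 6.0),
        ("Gentamicin", 1.0), ("Gentamicin", 1.0), ("Gentamicin", 4.0),
        ("Gentamicin", -99)]
summary = describe_by_group(rows)
for group, stats in summary.items():
    print(group, stats)
```

A mean well above the median together with a positive skewness, as in the second made-up group, is the kind of pattern that suggests a right-skewed distribution.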
2 Statistical Analysis of an Example Data Set
Using the SPSS data set from the Adamicin trial (Adamicin.sav), we are going to carry
out a number of statistical tests to test a series of hypotheses and assumptions. Please
consult your book and/or lecture notes to decide which statistical test is appropriate
in each case.
Note: In some of the exercises you are required to test hypotheses regarding more than
one outcome variable, e.g. Sputum Volume and Temperature. Most of the statistical
techniques can be carried out simultaneously for several outcome variables. However,
only if you feel very confident and see that you are progressing at a good pace should you
carry out the analysis for both; otherwise, proceed with only one of them. It is more
important for you to get to the end of the practical.
2.1 Question 1
When carrying out a clinical trial, it is important to test imbalances between groups in
relevant variables:
1. Are there any significant differences in Sputum Volume, or temperature at
baseline between the treatment groups i.e. Adamicin/Gentamicin?
a. Look at the distribution of the outcome variables (normality tests, mean
and median, skewness, Q-Q plots) to decide whether parametric or
non-parametric statistics are appropriate for each test.
b. For each statistical test, phrase the null hypothesis, report the relevant
statistics, and produce the appropriate graphic to illustrate the results.
2. Using the dichotomized variables created yesterday, test for imbalances in
Baseline BUN and respiratory rate.
a. Obtain the appropriate statistic, phrase the null hypothesis, and
interpret the results using percentages and residuals.
b. Produce the appropriate graph to illustrate the results.
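For item 2, the numbers behind a 2 x 2 crosstab can be reproduced by hand. The sketch below is an illustration with made-up counts, not the handout's solution: it computes the Pearson chi-squared statistic (without continuity correction), its p value for 1 degree of freedom, the expected counts and the adjusted standardized residuals.

```python
import math


def chi_square_2x2(table):
    """table = [[a, b], [c, d]] of observed counts.

    Returns (chi2, p, expected counts, adjusted standardized residuals).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    expected = [[row_tot[i] * col_tot[j] / n for j in range(2)] for i in range(2)]
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # Upper tail probability of a chi-squared distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    adjusted = [[(table[i][j] - expected[i][j])
                 / math.sqrt(expected[i][j]
                             * (1 - row_tot[i] / n)
                             * (1 - col_tot[j] / n))
                 for j in range(2)] for i in range(2)]
    return chi2, p, expected, adjusted


# Made-up counts: rows = treatment group, columns = abnormal / normal at baseline.
observed = [[30, 70],
            [45, 55]]
chi2, p, expected, adjusted = chi_square_2x2(observed)
print(round(chi2, 3), round(p, 4))
print(expected)
print([[round(r, 2) for r in row] for row in adjusted])
```

Adjusted residuals beyond roughly ±2 point to the cells that contribute most to a significant result; in a 2 x 2 table all four have the same absolute value.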
In SPSS:
- T-tests in ANALYZE>COMPARE MEANS
- Non-parametric tests in ANALYZE>NONPARAMETRIC TESTS
- Crosstabs (Chi-squared tests) in ANALYZE>DESCRIPTIVE STATISTICS>CROSSTABS
2.2 Question 2
Let’s now look at how a few of the secondary symptoms measured in the study changed
between entry and 14 days after treatment, separately for the two treatment groups.
To carry out simultaneous but independent analyses of the two groups in the data
set, we need to split the data. In SPSS: DATA>SPLIT FILE, select “Compare groups”, and
add “Medication” to the <Groups Based on> box. Click <OK>.
1. Is there a decrease in Temperature or sputum volume between entry and 14
days after treatment in either of the treatment groups?
a. Decide and justify what is the appropriate statistical test.
b. For each test, phrase the null hypothesis, interpret the results and
illustrate them with the appropriate graphic.
2. Using the dichotomous variables, check whether there is a change in 1)
respiratory rate between entry and 14 days; or 2) S. aureus between baseline
and 14 days.
a. Obtain the appropriate statistic, phrase the null hypothesis, and
interpret the results using percentages and residuals.
b. Produce the appropriate graph to illustrate the results.
In SPSS:
- When you finish with this section, make sure that you un-split the file: DATA>SPLIT
  FILE, select <Analyze all cases, do not create groups>.
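For the paired dichotomous variables in item 2, one candidate test is McNemar's test, which uses only the patients who changed category. Treating it as the appropriate choice is an assumption made for illustration, not the handout's stated answer. The sketch uses made-up counts and the chi-squared approximation without continuity correction; SPSS may report an exact binomial version, so its p value can differ.

```python
import math


def mcnemar(changed_to_normal, changed_to_abnormal):
    """McNemar chi-squared (1 df, no continuity correction) from discordant pairs.

    changed_to_normal:   abnormal at entry, normal at day 14
    changed_to_abnormal: normal at entry, abnormal at day 14
    """
    discordant = changed_to_normal + changed_to_abnormal
    if discordant == 0:
        return 0.0, 1.0  # nobody changed category
    chi2 = (changed_to_normal - changed_to_abnormal) ** 2 / discordant
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-squared with 1 df
    return chi2, p


# Made-up counts for one treatment group.
chi2, p = mcnemar(15, 5)
print(chi2, round(p, 4))
```

Patients who stay in the same category at both time points do not enter the statistic at all, which is what distinguishes this from an ordinary chi-squared test on the crosstab.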
2.3 Question 3
We have seen that both treatment groups experience an improvement in temperature
and Sputum Volume between baseline and 14 days after treatment. What we need to
know is whether such improvement is better in the Adamicin group compared to the
Gentamicin group. Choose one of the two outcome variables for this exercise
(temperature or sputum volume), and test whether the two treatment groups differ in
temperature/sputum at 14 days after treatment, controlling for the baseline level.
In SPSS:
- Univariate ANOVA in ANALYZE>GENERAL LINEAR MODEL>UNIVARIATE.
- Tip: Treatment is the categorical independent variable of interest, and thus it is said
  to have “fixed effects”. The baseline level of the continuous outcome variable is
  something we want to control for, to avoid possible confounding due to imbalance at
  baseline, and thus it is considered a “covariate”.
For the chosen outcome variable:
- Phrase the null hypothesis.
- Interpret the results.
- Make a graph to represent the results.
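What "controlling for the baseline level" does to the group comparison can be illustrated with the baseline-adjusted difference in means. The sketch below assumes a common slope of outcome on baseline in both groups, which is the usual analysis of covariance model with one covariate, and uses made-up data. It gives only the point estimate; SPSS additionally reports the F test and standard errors.

```python
def group_sums(pairs):
    """Mean baseline, mean outcome, Sxx and Sxy for (baseline, day14) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return mean_x, mean_y, sxx, sxy


def adjusted_difference(group_a, group_b):
    """Return (pooled within-group slope, baseline-adjusted mean difference a - b)."""
    ax, ay, a_sxx, a_sxy = group_sums(group_a)
    bx, by, b_sxx, b_sxy = group_sums(group_b)
    slope = (a_sxy + b_sxy) / (a_sxx + b_sxx)   # common slope on baseline
    diff = (ay - by) - slope * (ax - bx)        # remove the baseline imbalance
    return slope, diff


# Made-up (baseline, day 14) pairs; group B starts with higher baseline values.
group_a = [(1.0, 3.5), (2.0, 4.0), (3.0, 4.5)]
group_b = [(2.0, 3.0), (3.0, 3.5), (4.0, 4.0)]
slope, diff = adjusted_difference(group_a, group_b)
print(slope, diff)
```

In these made-up data the raw difference in day 14 means is 0.5, but once the baseline imbalance is removed the adjusted difference is larger, which shows why an unadjusted comparison can mislead.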
2.4 Question 4
The final question we want to answer relates to the primary outcome, S. aureus
(dichotomized). Remember that this is a non-inferiority trial, with a non-inferiority limit
of 10%. The protocol stated: “adult patients with cystic fibrosis have an acute respiratory
infection where the presumptive organism is thought to be S. aureus”. Thus, for the
following analysis we need to select patients positive for S. aureus at entry.
- In SPSS: DATA>SELECT CASES>IF CONDITION IS SATISFIED. Define the condition as
  S. aureus at entry = positive (resistant or sensitive).
1. Test whether the two treatment groups differ in the rate of positive S. aureus 14
days after treatment. Choose the right statistic to make a decision about the
null hypothesis. Interpret the results using percentages and residuals.
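Because this is a non-inferiority trial, it can also help to look at the confidence interval for the difference in elimination rates and compare its lower limit with the 10% margin. The sketch below is one common way to do this and is not prescribed by the protocol: it assumes a two-sided 95% Wald interval for the difference in proportions (Adamicin minus Gentamicin), and all counts are made up.

```python
import math
from statistics import NormalDist


def noninferiority_check(success_new, n_new, success_ref, n_ref,
                         margin=0.10, confidence=0.95):
    """Wald confidence interval for the difference in proportions (new - reference).

    Returns (difference, lower limit, upper limit, non_inferior), where
    non_inferior is True when the lower limit lies above -margin.
    """
    p_new = success_new / n_new
    p_ref = success_ref / n_ref
    diff = p_new - p_ref
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower, upper = diff - z * se, diff + z * se
    return diff, lower, upper, lower > -margin


# Made-up counts: patients with S. aureus eliminated at day 14 / patients analysed.
diff, lower, upper, ok = noninferiority_check(90, 100, 85, 100)
print(round(diff, 3), round(lower, 3), round(upper, 3), ok)
```

Note that a non-significant chi-squared test alone does not establish non-inferiority; the interval shows whether a difference as large as the margin in favour of Gentamicin can be ruled out.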