This analysis was submitted by John Brega and Jane Diefenbach of PharmaStat. It is based on methods
they have developed through their eCTD submissions consulting practice.
We call the Data Guide document either tabulations-data-guide.pdf or analysis-data-guide.pdf,
depending on which type of database it describes. In the submission it lives with the data, just like the
define.pdf or define.xml. We did not call it a Reviewer’s Guide because that causes a name collision
with the Reviewer’s Guide document typically included in Module 1, which is a guide to the entire
submission and mainly the work of the Clinical and Regulatory groups.
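As a sketch of where these documents land (folder names follow the typical FDA study-data layout for eCTD Module 5; the exact paths for any given submission are governed by the eCTD specification and the agency's technical conformance guidance, and the study identifier shown is a placeholder), the Data Guides sit next to the data they describe:

    m5/datasets/<study-id>/tabulations/sdtm/
        define.xml (or define.pdf), SDTM datasets, tabulations-data-guide.pdf
    m5/datasets/<study-id>/analysis/adam/datasets/
        define.xml (or define.pdf), analysis datasets, analysis-data-guide.pdf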
We have provided links to examples of both the Tabulations Data Guide and the Analysis Data Guide.
We have also described and provided an example of a Guide to Analysis Programs. The examples make
it easier to visualize what we’re describing below in the analysis of the issues, but the issues can be
addressed by many different styles of document. It is important to understand the issues and the
rationales for the document contents to fully understand the examples.
Data Guide for an SDTM Database
Click Here for a generic example of an SDTM tabulations-data-guide.pdf document.
Click Here for a more detailed example of a Data Guide document for a public domain TB submission.
For an SDTM database the document has six sections:
1. Intro to the study and database
We provide one to three paragraphs at the top of the document, starting with the name and
title of the study, followed by a few general observations about it such as whether the database
includes screen failures, whether the study is ongoing, whether it is the pivotal study, which
studies are included if it’s an ISS or ISE database, etc.
2. Table of Contents and bookmark panel
3. Where to Find Key Data
One of the difficulties reviewers have, even with SDTM data, is figuring out where to find the
basic data they need to review, and which datasets they can ignore because they contain only
administrative data, like lab sample dates that are already reconciled with the LB dataset. We typically put
most or all of the following subsections under this header:

• Demographics and Compliance
• Exposure to Study Treatment
• Subject Disposition
• Safety
• Efficacy
• Pharmacokinetics/Pharmacodynamics
• Trial Design Model Datasets
Each subsection gives the names and titles of the datasets relevant to that category of data. For
example, if only Drug Accountability data was collected for the study, it would mention that DA
(Drug Accountability) is the primary source for exposure data, and that the EX (Exposure)
domain is derived from the DA data. The Efficacy section might describe which custom
domains address which study endpoints. For a phase I study it might simply say that no efficacy
data were collected. There could be other subcategories as needed. For example, a Thorough
QT study might have a section to identify the ECG and PK data, describe their relationships,
indicate where to find the means of the replicates, etc.
4. Overview of Custom Domains
We typically don’t give descriptions of the standard domains, but we find that the custom
domains can be very obscure, limited as we are to two-character dataset names and 40-character
dataset labels. We provide a description of each custom domain. The description is
usually a short paragraph describing the dataset's structure and purpose, what kind of data it
contains, how it relates to other datasets, and how to identify the subjects or records of
particular interest. If it’s just administrative data we would mention that.
5. Datasets Not Submitted
It frequently happens that a study ends up with no data collected for some domains. Since the
SDTM rule is to not submit empty datasets, the database can have holes that make it appear
that the sponsor was asleep at the switch. This most frequently happens in phase I healthy
volunteer studies where, for example, there may not be any subjects with IE criteria not met, or
any AEs. You’d kind of wonder if there weren’t an AE dataset, wouldn’t you? For later phase
studies there might be a collection instrument for pregnancies, but none were reported. And so
forth.
6. Derived Datasets
Sometimes SDTM datasets are completely derived, and in these cases it’s important to describe
the derivation methods and the assumptions that are made. For example, if EX is derived from
DA, you often end up with EX records that represent an average daily dose for each “continuous
dosing interval” since you don’t have a record of each dose taken. It’s important to know this if
you want to associate AEs and other data with specific cumulative dose levels, for example.
If a DV (Protocol Deviations) dataset is submitted it is essential for the reviewer to know what
categories of deviations or violations were included in the dataset, and what rules or data
sources were used to identify them. If you get a dataset with a handful of trivial deviations and
no explanation, you have no idea whether those were all the deviations there were, or whether
they were the only ones collected on a CRF, for example.
We put the descriptions of derivations in their own section because they can sometimes be
lengthy and are a different kind of narrative from the descriptions of the datasets, which are
focused on how to use and interpret them.
Data Guide for an Analysis Database
Click Here for a generic example of an ADaM analysis-data-guide.pdf document.
We use a similar template for other types of databases and adjust it to address the reviewer’s needs.
We treat ADaM documentation the same as for any other legacy analysis database. The first three
sections above are included and perform the same function. Section 4 is an overview of all datasets,
since ADSL is the only standard analysis dataset. One thing we include in the description of ADSL is
the list of the “Core” variables that are merged onto all the other analysis datasets.
There is no need for section 5 (Datasets Not Submitted). We typically roll section 6 into the dataset
overviews if the material is not too voluminous. This would include dataset-level and row-level
descriptions of the structure, derivation and usage of the data. Occasionally a study will involve a large
amount of complex derivation. In these cases we add another section on the end called Computational
Methods. We have seen these sections get as large as 70 – 80 pages. A handbook for scoring a complex
questionnaire is an example of a multipage derivation description that might be included in its entirety
for the benefit of the reviewer.
Data Guide for an “Item 11” Legacy Database
The template we use for legacy “Item 11” database documentation is essentially the same as for an
analysis database. In particular we include a description of every dataset since there are no standard
datasets with commonly understood purposes and structures. Since this type of data is typically less
derived than SDTM, a Computational Methods section is rarely needed.
Analysis Program Guide
Besides the define.xml/define.pdf, data guides and blankcrf.pdf, if a submission includes analysis
programs we also include an analysis-program-guide.pdf document. This is because the names of
sponsor SAS programs do not usefully describe their purpose, and without a guide to the program set a
reviewer has little hope of making good use of them. The purpose of the guide is to identify every
component of the submission that was produced by a program (e.g., analysis datasets, tables and
figures), the program that produced it, and the inputs the program used. We put this document with
the analysis data and related documentation because that’s where reviewers look for documentation.
There is an obvious case to be made for putting it with the programs, especially if it were an expected
document.
Click Here for a generic example of an analysis-program-guide.pdf document.
We build the program guide as a spreadsheet which we then convert to a pdf. It has four tabs:
1. Index Page
This tab serves as a table of contents for the other three tabs.
2. Analysis Dataset Programs
This tab has an entry for each submitted analysis dataset. It specifies the dataset name and
description (label), the program that creates it, and the complete list of inputs. Everything
except the dataset label is a hyperlink to the named file.
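As a hypothetical illustration (the dataset labels, program names, and input files below are invented for the example, not taken from any actual submission), an entry on this tab might look like:

    Dataset   Description (label)              Program     Inputs
    ADSL      Subject-Level Analysis Dataset   adsl.sas    dm.xpt, ex.xpt, ds.xpt
    ADAE      Adverse Events Analysis Dataset  adae.sas    ae.xpt, adsl.xpt

In the spreadsheet, the dataset name, program name, and each input would be live hyperlinks to the corresponding files.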
3. Table and Figure Programs
This tab has an entry for each table and figure included in the Clinical Study Report, though not
necessarily the in-text tables. We do not include listings, as their programs do not typically
perform interesting derivations. The tab is ordered by table/figure number. It specifies the
table or figure number, the title of the table or figure, the program that created it, and the
complete list of inputs to the program. This table can be repetitive if many tables are produced
by a single program, but it makes it very easy for a reviewer to identify a program of interest by
looking up the table or figure number.
4. Macros
Frequently the analysis programs will employ a suite of SAS macros to perform common
functions. There may be many such macros used, but usually only a few are of interest to a
reviewer. This tab identifies the macros of interest. We define these to be macros that perform
some substantive derivation that contributes to results in an analysis dataset, table or figure.
We exclude utility macros that perform formatting, printing or common data manipulation
functions. Out of a possible 70 – 100 macros used by the analysis programs we rarely submit
more than about eight.
The tab has an entry for each submitted macro. It lists the macro name, a short description of
its function, and the complete list of programs that call the macro. Everything except the
description is a hyperlink to the referenced program file.