HL7 Project Scope Statement

Emerging Technologies Project Plan Template
1. Project Title:
Impact of Emerging “Big Data” Sources on Clinical Research and Review
2. Project Scope:
Identify the impact of emerging data sources in clinical research. Impact includes:
• How/where this data is typically collected and stored
• Typical data volumes and rate of data collection
• Use cases for the access, analysis, viewing, and review of these data types
• Mechanisms by which this data might be viewed or analyzed “in place” instead of physically
  moving or copying the entire data set
The overall goal of this effort is to promote awareness of the potential changes needed to cope with
these emerging data sources.
3. Project Team Members:
Interim industry lead: Louis Norton, Covance
Suresh Madhavan, PointCross (+individual to be assigned)
Hugo Geerts
Marcelina Hungria
More members strongly encouraged
Note: The project would prefer to have an FDA co-lead named.
4. Affected Stakeholders:
Data providers (e.g. labs, hospitals)
CRO(s)
Sponsor scientists & regulatory submissions groups
Reviewers / Regulatory agencies
System / Software vendors
5. Project Meeting Frequency:
Every 2 weeks.
6. Project Objectives and Timeline:
Collect relevant data sources and use cases. Identify complementary or similar efforts for
re-use of information already available. (Timeline: 3 months)
Prioritize and do more detailed analysis on 3-4 relevant cases (ones for which information is
available), including consumption patterns (analysis, viewing). (Timeline: 5 months)
Develop/document findings and recommendations for solution approach / architecture.
(Timeline: 3 months)
Assessing Impact of Emerging “Big Data” Sources on Clinical Research and Review
Background
A new working group focused on “Emerging Technologies” was established at the PhUSE/FDA
meeting on March 18 and 19, 2013 in Silver Spring, MD. One of the subgroups in this
group took up the task of assessing the impact of emerging “Big Data” sources that are
expected to be prevalent in clinical research on both researchers and reviewers at regulatory
agencies such as the FDA.
For the purposes of this working group, “Emerging” will include those proven technologies that
are being adopted and are expected to become widely adopted in the medium-term future. In
that spirit, we plan to use this forum to look ahead, imagine, and forecast, as best we can,
some of these emerging sources that will generate very large data sets, in the aggregate, of
petabytes or even exabytes. We will consider how these “big data” will need to be:
• Stored at source
• Used at source
• Exchanged or made available to other stakeholders
• Used at the destination
• Included in part or whole within regulatory submissions
• Archived, reposed or stored for future retrospective look-backs
• Secured from source to destination and along the way at various stores
While contemplating each of these aspects we will necessarily have to touch upon or identify,
but not resolve:
• Technology issues related to storage and access
• Nature of the data and nature of access
• Nature of applications that will access these data sets
• Constraints on transmission of data among stakeholders
• Meaning or relevance of “validation” of computer systems in the context of 21 CFR Part 11
• Validation of data and how traditional rules must evolve
• The role of semantically enabled analysis and ontology management, a topic that will be
  touched on only briefly, since another sub-group is assigned this task
• Nature and role of enriched metadata in the world of MapReduce and specialized “apps”
• New role of validating the functionality of specialized apps when metadata is the only
  data that can be moved
• How “data in motion” may play a different role in areas such as pharmacovigilance, as
  well as in new clinical trial arms where constant monitoring on the cloud may be important
• Role of the cloud for certain types of data, although we will not go into much detail,
  since this is already covered by another sub-group
Separately, we will consider the following aspects for each of the big data sources:
• Source of the big data and its type, in terms of how it is represented elementally or in
  aggregation
• How it is collected, how frequently, and the expected triggers for collection (protocol, AE)
• Nature of the data:
  o level of structure
  o associated metadata and its essential structure
  o its inherent consistency, completeness, and quality at source
  o its usability: direct, or only after post-processing
• Likely use in real time, and by which stakeholders
• Likely use in quasi real time, and by which stakeholders
• Likely use in research, and by which stakeholders
• Likely use for reporting, and by which stakeholders
• Likely use by reviewers
• Size of data at each collection event
• Expected size over a trial or investigation
• Expected size over a period of time for a medium- to large-scale operation
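The three size questions above (per collection event, per trial, per operation over time) reduce to simple multiplication once a source has been characterized. The following is a minimal sketch of that arithmetic, not a method prescribed by the working group. The class name, field names, and every number in the example are hypothetical placeholders and do not describe any real data source.

```python
from dataclasses import dataclass


@dataclass
class CollectionProfile:
    """Hypothetical description of one big data source within one trial."""
    bytes_per_event: int             # size of data at each collection event
    events_per_subject_per_day: int  # collection frequency
    subjects: int                    # subjects enrolled in the trial
    trial_days: int                  # duration of data collection

    def bytes_per_trial(self) -> int:
        """Expected size over a single trial or investigation."""
        return (self.bytes_per_event
                * self.events_per_subject_per_day
                * self.subjects
                * self.trial_days)

    def bytes_per_year(self, trials_per_year: int) -> int:
        """Expected size over a year for an operation running similar trials."""
        return self.bytes_per_trial() * trials_per_year


def human_readable(num_bytes: int) -> str:
    """Format a byte count using decimal (SI) units up to exabytes."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


# Illustrative values only (assumptions, not measurements):
# 2 MB per event, hourly collection, 500 subjects, 180 days.
example = CollectionProfile(
    bytes_per_event=2_000_000,
    events_per_subject_per_day=24,
    subjects=500,
    trial_days=180,
)
print(human_readable(example.bytes_per_event))
print(human_readable(example.bytes_per_trial()))
print(human_readable(example.bytes_per_year(trials_per_year=20)))
```

Keeping the inputs separate makes it easy to see which factor dominates for a given source: event size, collection frequency, or the number of subjects and trials.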