Development of an Endocrine Genomics Virtual
Research Environment: Building on Success
Richard O. Sinnott, Loren Bruns, Christopher Duran, William Hu, Glenn Jayaputera, Anthony Stell
Melbourne eResearch Group
University of Melbourne
Melbourne, Australia
rsinnott@unimelb.edu.au
Abstract - The Australian National eResearch Collaboration Tools
and Resources (NeCTAR - www.nectar.org.au) project has
recently funded an initiative to establish an Australia-wide
endocrine genomics virtual laboratory (endoVL –
www.endovl.org.au) covering a range of disorders including
type-1 and type-2 diabetes, rare diabetes-related disorders,
obesity/thyroid disorders, neuroendocrine/adrenal tumours, bone
disorders and disorders of sex development. This virtual
laboratory will establish a range of targeted databases, clinical
registries and support a range of genetically targeted clinical
trials leveraging a body of international projects and experiences
garnered over many years through a range of EU and MRC
funded initiatives. This paper focuses on the plans for endoVL
and especially, the systems it leverages in supporting large-scale
clinical, collaborative environments.
Keywords - endocrine disorders, virtual research environment,
security
I. INTRODUCTION
The endocrine system comprises a system of glands, each
of which secretes different types of hormone into the
bloodstream to regulate the body. At present across Australia
and indeed internationally, communities of clinicians and
biomedical researchers are working on aspects of clinical care
and biomedical research associated with particular disorders of
the endocrine system. However, the infrastructure to support
these networks and to facilitate international endocrine-wide
research interactions does not exist. Furthermore, following
the sequencing of the human genome and unprecedented
growth and availability of genomic (and other –omics) data,
the research opportunities and challenges across the
post-genomic life sciences for personalised e-Health are growing at
an exponential rate and “big data” is an increasingly common
challenge that must be overcome. Obviously infrastructures
for dealing with “big data” especially in the health context
require far more than just large-scale data storage systems.
Rather the systems need to be designed, developed and
deployed to cope with the specific concerns of health-related
data regarding information governance, privacy and
confidentiality. Experience has shown that these challenges
are enormous when considered in the large, e.g. the
UK Connecting for Health initiative [1]; however, they can be
successfully addressed when targeted to specific clinical and
biomedical endeavours. Endocrine-related biomedical research
represents a domain where a targeted endocrine genomics
virtual laboratory (endoVL) for clinical and biomedical
research would provide cohesion across many efforts across
Australia. Establishing and operating such a facility on behalf
of Australia-wide endocrine communities is the purpose of the
endoVL project (www.endovl.org.au). The endoVL project
itself commenced in early 2013; however, it builds upon and
leverages an extensive portfolio of software systems and
experiences in development and provisioning of
security-oriented collaborative platforms dealing with a wide variety of
clinical and biomedical data and associated networks/research
communities.
Amongst others, these include major international
European (EU FW7) and UK MRC funded projects including:
• The €6m ENSAT-CANCER (www.ensat-cancer.eu),
which began in 2011 and has a focus on adrenal
tumours;
• The £625k I-DSD (www.i-dsd.org), which has a focus
on disorders of sex development;
• The €1m EuroWABB (www.euro-wabb.org), which
has a focus on the rare diseases Alstrom, Bardet-Biedl
and Wolfram;
• The $200k Australian Diabetes Data Network (ADDN)
(www.addn.org.au), which has a focus on child type-1
diabetes.
These systems have grown over time and have been used
for a wide range of major clinical trials and studies including
full 4-phase genetically targeted clinical trials. Whilst some of
these diseases are common, e.g. type-1 diabetes, others are
quite rare. Adrenal tumors are typical of this. For example,
around 1-2 cases of adreno-cortical carcinoma (ACC), one
particular adrenal tumor type, are diagnosed for every million
individuals. About 600 cases of ACC are diagnosed for the
whole of Germany, which has a population of approximately 81m [2].
Given the sparseness of such information, it is difficult to
establish how best to treat these patients and indeed what
treatment regimes work best for which cancer types at
particular points in their evolution. The need to aggregate such
data and ideally make it available with analytical tools is
compelling.
The Internet provides a ubiquitous data-sharing platform.
However this of course has major challenges that must be
addressed for such information sharing to occur: security,
ethics and importantly the standardisation of the information
that is to be shared. The endoVL project builds upon a
portfolio of projects, systems and lessons learnt in
development and delivery of such web-based infrastructures.
This paper describes the goals of the endoVL project and
highlights some of the key lessons learnt in the associated
projects upon which it is being built.
The rest of this paper is structured as follows.
Section II focuses on the goals of the endoVL project and
highlights the communities it serves. Section III illustrates the
data challenges and kinds of web-based databases that are
currently being built to service those needs. Section IV
demonstrates the utility of these databases, focusing in
particular on a major international clinical trial that is
currently ongoing. Finally, Section V draws some conclusions
on the work and outlines the challenges that remain to be
solved.
II. ENDOVL COMMUNITIES NEEDS AND INFRASTRUCTURE REQUIREMENTS
The endoVL project has been established to serve the
clinical/biomedical needs of a wide range of researchers,
groups, networks and societies across Australia. These include
the:
• Endocrine Society of Australia (ESA) - a national
not-for-profit organisation of scientists and clinicians who
conduct research and practice in the field of
endocrinology. The ESA currently has over 900 members
spread over all of Australia and New Zealand.
• Australian and New Zealand Bone and Mineral Society
(ANZBMS) has over 450 members spread over all of
Australia and New Zealand. ANZBMS brings together
clinical and experimental scientists and physicians
actively involved in the study of bone and mineral
metabolism in Australia and New Zealand.
• Australian Diabetes Society (ADS) is the peak medical
and scientific body in Australia working towards
improved care and outcomes for people with diabetes.
The society has over 500 members.
• Australasian Paediatric Endocrine Group (APEG) is a
not-for-profit organisation representing health
professionals caring for children with diabetes across
Australasia. APEG currently has 400 members distributed
across Australasia.
• Australasian Disorders of Sex Development network
(DSDnetwork) is a newly formed network of clinical and
biomedical researchers across Australasia focused upon
understanding the aetiology of inter-sex disorders.
• Australian Thyroid Foundation (ATF) is a national
not-for-profit organisation that supports and educates its
member base and promotes good thyroid health. There
are currently 250 members of the ATF.
• Clinical Oncology Society of Australia (COSA) is the
peak national body representing health professionals
working in cancer control and treatment. COSA provides
a national perspective on cancer control activity from
those who deliver treatment and care services across all
forms of cancer. COSA currently has 150 members.
The above societies represent the direct communities and
societies that will be the beneficiaries of endoVL; however, it is
important to recognise that every hospital and GP practice in
Australia deals with patients with endocrine disorders. It is
intended that the infrastructure that endoVL will provide will
allow for knowledge exchange beyond the immediate groups
and societies identified above.
As described in section I, the most urgently demanded
capability to support these networks and societies is for secure
access to a range of detailed (disease/domain specific) data. For
many groups this is largely phenotypic data, but for others
support for genetics and other –omics analysis capabilities is
needed. A major focus is on providing a seamless environment
where phenotypic and genotypic information is uniformly
presented. It is important to note that each disorder has its own
specific phenotypic data that is of interest and has been agreed
as the data to be collected. An overview of the infrastructure to
be developed is highlighted in Figure 1. This will be delivered
through a user-oriented research environment that is made
available through the Internet2 Shibboleth-based Australian
Access Federation (www.aaf.edu.au) where all of these
endocrine diseases and related information/clinical trials will
be made available.
Figure 1: High level schematic of endoVL Infrastructure
In Figure 1, a range of clinical hospitals (referral partners)
associated with the identified endocrine conditions act as the
sources of clinical information. The endoVL infrastructure
itself will support:
• A portfolio of disease-specific (deep) phenotypic
databases that will subsequently allow much needed
contextualisation of the –omics analysis of those patients.
Each of these databases will allow screening (searching)
for patients with certain phenotypic traits, on certain
treatments, or a variety of other disease-specific criteria
(a particular karyotype, or where a particular genetic
screening using a given –omics approach has found (or
not found) a particular mutation etc). These databases
will be populated both through security-oriented forms
that define and enforce data validation and structure, as
well as live data feeds from hospital systems (as outlined
below). Targeted bulk upload facilities for each
endocrine disease will be supported (through processing
of clinical data exported from hospital databases to Excel
spreadsheets for example). In this model, clinicians will
be returned a list of auto-generated patient identifiers.
This model has been realized through the I-DSD project.
• Clinical study platforms including development and
support of new clinical study databases (typically
populated with those patients screened from the given
phenotypic databases subject to ethical approval and
patient consent) together with eCRFs targeted to the
specific requirements of the clinical trials and studies.
• Biobanking support including Australia-wide biosample
labelling and tracking between clinical referral partners
and biomedical research groups. This should use
international protocols for structured sample naming
including the State, Centre, Patient-Id, and an associated
sample identifier (such systems are already in use across
Europe and work with a range of site-specific laboratory
information management systems). This will allow a
tracking capability to show where all the samples are for
a given person at any given time.
• –omics support including a range of analyses of patient
samples from major bioinformatics organisations across
Australia including the Australian Genome Research
Foundation (AGRF – www.agrf.org.au), Centre for
Comparative Genomics in Western Australia (CCG -
http://ccg.murdoch.edu.au/), Genomics Virtual
Laboratory (GVL - https://genome.edu.au/wiki/GVL)
and the National ICT Australia e-Health (NICTA -
http://www.nicta.com.au/business/health/e-health). These
groups cover the whole –omics space (from genomics,
proteomics, metabolomics and transcriptomics through to
whole exome/genome sequencing approaches). This will
allow both a comparative analysis of common
bioinformatics approaches, e.g. where the same sample is
analysed using the same –omics approach (microarray or
mass spectroscopy etc) by different organisations/groups,
and analysis of the same biosample using different
complementary –omics approaches, e.g. to gain a
more complete understanding of both the biology and the
–omics approaches themselves and their suitability in an
applied clinical context. The raw data, stages of
derived/processed data, and the final resultant (analysed)
data together with all associated metadata from the
–omics approaches for a given patient should be made
available through the endoVL research environment.
• Pathology platform including standardised information
on cellular information and associated pathological
analysis of samples (including imaging). For tumours
this will include information such as the Weiss score,
number of mitoses, necrosis and information on when the
biopsy/surgery was undertaken. Targeted databases will
be developed for this purpose.
• Community forums for communication between the
clinical and biomedical groups involved and for wider
outreach efforts including to patient information support
groups and networks. This will provide summary data on
the number of patients included in the clinical databases,
information on the clinical trials that are ongoing, and the
process for access to and use of the endoVL resources
more generally. Linkage / mining of the relevant research
results from the international community through
facilities such as PubMed and MedLine will be offered.
• Finally, where appropriate relevant data/metadata will be
periodically published into the Human Variome Project
Australia Node hosted at the University of Melbourne
(HVPA - http://www.hvpaustralia.org.au/).
In the above figure, many of the images leverage pre-existing
data models and services for these disorders, e.g. DSD and
adrenal tumours developed in European projects. Other
systems are currently under development, e.g. the ADDN national
type-1 diabetes infrastructure.
The seamless interplay across all of these solutions is
essential.
III. ENDOVL DATA AND SECURITY CHALLENGES
Numerous hurdles must be overcome in the development and
maintenance of such a complex environment. The main
challenges are outlined in this section.
A. EndoVL Patient Linkage and Security
Central to the information model used in endoVL is the notion
of a unique patient identity. A unique patient index is often
used to encapsulate the patient identity – typically within a
particular hospital centre. Each hospital has different
approaches and solutions to tracking patients internally, and
national systems are not yet ubiquitous. Establishing the
identity of individuals is critical to maintaining the
consistency of information held within the endoVL databases
but it must also be decoupled from any actual identity
information in use locally, e.g. the name, date of birth,
address, Medicare number and potentially other local
institutional identifiers that may be in place.
It should also be noted that for almost all of the disease
areas under consideration within endoVL, an independent
standalone secure database was required, i.e. a database
without the requirement for interacting with existing hospital
IT systems. For many of the rarer disorders such as DSD this
was quite satisfactory; however, for some disorders such as
type-1 diabetes this model was unrealistic. There are of the
order of 500,000 individuals with type-1 diabetes across
Australia [3]. Given this, automating data feeds from hospital
settings was mandatory to avoid the need for double data
entry. In this model, patient-specific information (name,
address, Medicare number…) is kept in the hospital and a
unique identifier is generated (using a heartbeat counter specific
to each centre). Thus the first patient from Westmead
Hospital, Sydney, New South Wales, has identifier
NSW-SYD-WM0001, the second NSW-SYD-WM0002 etc. This
unique identifier is kept locally at the hospital and associated
with other identifying information. This can be realised in
many ways, e.g. a simple hash table or local database,
depending upon the needs and experiences of the local
hospital IT staff and technologies that they have in place.
Lightweight clients are used to run local queries that extract
the core identified data and associate them with the unique
identifier, e.g. NSW-SYD-WM0001, before pushing them out
through outgoing-only hospital firewalls to the corresponding
endoVL server. There the digital signature is checked, and the
data is decrypted, validated and pushed into the database.
This model of
pushing data and not allowing incoming connections from the
Internet appeases many of the security concerns of the hospital
staff involved.
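The identifier scheme described above can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory table on the hospital side; the class name, the `local_key` values and the `hba1c` field are hypothetical and are not taken from the endoVL implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CentreIdRegistry:
    """Hospital-side table; identifying details never leave the hospital."""
    prefix: str                      # e.g. "NSW-SYD-WM" (state, city, centre)
    counter: int = 0                 # running counter specific to this centre
    by_local_key: dict = field(default_factory=dict)

    def identifier_for(self, local_key: str) -> str:
        """Return the patient's existing identifier, or allocate the next one."""
        if local_key not in self.by_local_key:
            self.counter += 1
            self.by_local_key[local_key] = f"{self.prefix}{self.counter:04d}"
        return self.by_local_key[local_key]


def outgoing_record(registry: CentreIdRegistry, local_key: str, core_data: dict) -> dict:
    """Build the record that is pushed out: identifier plus core data only."""
    return {"patient_id": registry.identifier_for(local_key), **core_data}


# Hypothetical local keys and data values, for illustration only.
registry = CentreIdRegistry(prefix="NSW-SYD-WM")
first = outgoing_record(registry, "local-key-A", {"hba1c": 7.2})
second = outgoing_record(registry, "local-key-B", {"hba1c": 8.1})
repeat = outgoing_record(registry, "local-key-A", {"hba1c": 7.0})
print(first["patient_id"], second["patient_id"], repeat["patient_id"])
```

A returning patient keeps the same identifier, which is what allows later clinic visits to update the same record on the server.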
An important aspect in this is the continual updates to data
feeds where/when patients come to clinics. That is, the
endoVL system needs to cope with updates to patient
information and the synchronisation issues that this raises. In
this case, a small subset of the core type-1 diabetes data needs
to be updated based on information collected during the
clinics. This includes the body-mass index and a range
of typically fluctuating measurements and assessments that are
captured on patient visits.
B. EndoVL Portal Security
EndoVL needs to adhere to the highest standards of data
security. The endoVL portal is to be provisioned with the
Australian Access Federation (AAF – www.aaf.edu.au), which
provides federated authentication. However, guided by ongoing
efforts across Europe, users of endoVL will also
have specific privileges depending on their role within the
endocrine disorder(s) efforts. These roles will be used to
restrict access to data, tools and resources more generally. The
default model will be to deny access, i.e. only individuals with
authentic and valid credentials (digitally signed credentials
recognised by the endoVL service components), will be able
to access and use these resources. A targeted attribute
authority will be set up for this purpose. These roles and
privileges will be allocated under strict management of the
endoVL staff, working in close cooperation with the clinical
research communities on the assignment of these roles and
privileges to the wider research community. Examples of the
security policies that will be adopted (based on projects like
I-DSD and ENSAT-CANCER) are that data can only be
accessed by individuals from the same hospital, by
individuals from the same State (Victoria, NSW etc), or by
individuals from the same country. Furthermore, certain individuals will be
allocated roles that only allow read access to the database(s)
whilst others will be allocated read/write access.
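A deny-by-default check combining the scope policies (same hospital, same State, same country) with read or read/write roles might look as follows. This is a sketch under stated assumptions: the `Role`, `Scope` and `Location` types and the hospital names other than Westmead are hypothetical, and in the real system the role would arrive as a digitally signed credential rather than a plain object.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    HOSPITAL = 1
    STATE = 2
    COUNTRY = 3


@dataclass(frozen=True)
class Role:
    scope: Scope
    can_write: bool = False


@dataclass(frozen=True)
class Location:
    country: str
    state: str
    hospital: str


def is_allowed(role: Optional[Role], user: Location, record: Location, action: str) -> bool:
    """Deny by default; allow only when the role covers the scope and the action."""
    if role is None or action not in ("read", "write"):
        return False
    if action == "write" and not role.can_write:
        return False
    if user.country != record.country:
        return False
    if role.scope is Scope.COUNTRY:
        return True
    if user.state != record.state:
        return False
    if role.scope is Scope.STATE:
        return True
    return user.hospital == record.hospital


westmead = Location("AU", "NSW", "Westmead")
other_nsw = Location("AU", "NSW", "Hospital-B")   # hypothetical centre
victoria = Location("AU", "VIC", "Hospital-C")    # hypothetical centre

state_reader = Role(Scope.STATE)
print(is_allowed(state_reader, westmead, other_nsw, "read"))    # same State
print(is_allowed(state_reader, westmead, victoria, "read"))     # different State
print(is_allowed(state_reader, westmead, other_nsw, "write"))   # read-only role
```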
As noted above, endoVL will not hold the names or
addresses of any individuals or any direct information that can
be used to identify any patient. Instead cases are identified for
the purposes of communication with an automatically
generated structured identifier that is unique within the clinical
databases and an email address of the associated clinician
responsible for the patient. This will leverage international
standards and protocols that have been defined through
projects such as ENS@T-CANCER on patient and biosample
identification and tracking.
At no time will a clinician ever be asked to reveal the
identity of an individual to any researcher outside of their
immediate clinical care environment. Furthermore, the linkage
between the patient identifiers in the endoVL and identifiers
used for biomaterials will be both distinct and completely
separated, i.e. it will never be possible to directly identify
(through a given software query or direct observation of a
particular software system) that a particular sample comes
from a particular patient through use of the endoVL or other
related IT system.
C. EndoVL Cloud Security
In addition to security related to portal-based access, the
endoVL project will be hosting a wide range of genomic data
including whole genome data sets derived using next
generation sequencing technologies. To deal with the
encryption and decryption of these data sets in virtualized
environments and especially Cloud-based environments as
offered through the National eResearch Collaboration Tools
and Resources (NeCTAR) project (www.nectar.org.au), and
especially their Research Cloud, the project is adopting the
CSIRO TrustStore technology [4]. This allows users to create
secure storage spaces on the Cloud by defining which storage,
key management, and integrity services should be used and
subsequently providing credentials to access these services.
User profiles are used to support this process. With TrustStore,
drag and drop ability is offered to copy files from a local
machine to TrustStore storage. The files themselves are
fragmented, encrypted, and hashed/signed with the encrypted
fragments uploaded to the storage providers, and the keys
stored in a TrustStore Key Management Service and signed
digests stored in an Integrity Management Service.
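The fragment-and-verify idea can be illustrated with the Python standard library. This is not the TrustStore API: the function names are hypothetical, the encryption step is omitted because the standard library provides no cipher, and a keyed HMAC-SHA256 tag stands in for the signed digest.

```python
import hashlib
import hmac


def fragment(data: bytes, size: int) -> list:
    """Split a file's bytes into fixed-size fragments (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def signed_digests(fragments: list, key: bytes) -> list:
    """One HMAC-SHA256 tag per fragment, so tampering is detectable later."""
    return [hmac.new(key, frag, hashlib.sha256).hexdigest() for frag in fragments]


def verify_and_join(fragments: list, digests: list, key: bytes) -> bytes:
    """Check every fragment against its stored tag, then reassemble the file."""
    if len(fragments) != len(digests):
        raise ValueError("fragment/digest count mismatch")
    for frag, expected in zip(fragments, digests):
        actual = hmac.new(key, frag, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(actual, expected):
            raise ValueError("fragment failed integrity check")
    return b"".join(fragments)


key = b"demo-key-held-by-a-key-service"      # illustrative key only
payload = b"ACGT" * 10                        # stand-in for genomic data
parts = fragment(payload, 16)
tags = signed_digests(parts, key)
print(len(parts), verify_and_join(parts, tags, key) == payload)
```

Keeping the fragments, the key and the tags with three different services mirrors the separation of storage, key management and integrity management described above.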
It is expected that the genomic data itself will be
stored in the Leiden Open Variation Database (LOVD3 -
http://www.lovd.nl/3.0/home). This system has been used in
other endoVL related projects including EuroWABB
(http://www.euro-wabb.org/en/lovd-genetic-variation-database).
At present the LOVD3 database has been established and
initial experiments conducted on its utility.
D. EndoVL Data Flow Functions
The seamless movement of patient and/or genomic
information is essential in developing systems such as
endoVL. At the heart of the data flows is patient and sample
tracking. Knowing that a patient is involved in a given clinical
study should be used to inform for example whether they
should be excluded from other studies. The interconnectedness
of patients and their associated information is a key element of
endoVL.
At the heart of this scheme are patient and study
identifiers. Central identifiers used for patient tracking in the
endoVL system, e.g. for the core data registries, need to be
augmented with identifiers used within particular clinical trials
and indeed for tracking of biosamples associated with those
patients. Furthermore core registries established as part of
endoVL, e.g. for adrenal tumours, should allow both the
identification and recruitment of patients for particular clinical
trials and studies, as well as the delivery of all relevant
information on those patients for those trials and studies. In a
similar manner data that is collected throughout the course of
a particular clinical trial or study, should in principle (subject
to ethics and information governance arrangements associated
with the clinical trial) be available to wider collaborating
researchers.
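As a minimal sketch of how registry identifiers might be augmented with study identifiers, and how existing enrolment could inform exclusion from other studies, consider the following. The class, the identifiers and the conflict rule between the two studies are hypothetical; which studies actually exclude one another is a clinical decision not specified here.

```python
class StudyIndex:
    """Maps a central registry identifier to the identifiers used in each study."""

    def __init__(self):
        self.study_ids = {}   # registry_id -> {study name: study-specific id}

    def enrol(self, registry_id: str, study: str, study_id: str) -> None:
        self.study_ids.setdefault(registry_id, {})[study] = study_id

    def eligible_for(self, registry_id: str, study: str, conflicts_with: set) -> bool:
        """False if already in this study or in any study it conflicts with."""
        enrolled = set(self.study_ids.get(registry_id, {}))
        return study not in enrolled and not (enrolled & conflicts_with)


index = StudyIndex()
index.enrol("REG-0001", "PMT", "PMT-017")       # hypothetical identifiers
# Assumed conflict rule, for illustration only.
print(index.eligible_for("REG-0001", "FIRSTMAPPP", {"PMT"}))
print(index.eligible_for("REG-0002", "FIRSTMAPPP", {"PMT"}))
```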
IV. ENDOVL CASE STUDY BASED ON ADRENAL TUMOURS
As identified, to realise the vision of a seamless
interconnected virtual research environment for endocrine
research across Australia, the endoVL project leverages a
body of ongoing efforts and resources. The ENSAT-CANCER
project is a significant undertaking that focuses on
adrenal tumours (one form of neuroendocrine tumour).
The ENSAT-CANCER project focuses specifically on four
main types of adrenal tumour:
• aldosterone producing adenomas (APA);
• pheochromocytomas and paragangliomas (Pheo/PGL);
• non-aldosterone cortical adrenal adenomas (NAPACA);
• adrenocortical carcinomas (ACC).
Each of these subtypes has different manifestations; involves
different molecular mechanisms and ultimately requires
different treatment regimes for optimal patient care. Given the
rarity of the above kinds of adrenal tumours, the availability of
a large collection of samples, with associated clinical,
biomedical/-omic data and accompanying treatment
information is essential to better understand the differentiators
of these tumour types; their aetiology and their associated
molecular mechanisms.
The ENSAT-CANCER project commenced in 2011 and
has since grown to include data on over 4000 patients with one
of the above four tumour types; over 27,000 clinical
annotations (treatments, surgeries, clinical visit information),
as well as offering a source of over 5500 physical biosamples
for a range of biomedical research and –omics data analysis
(see Figure 2). The system is currently used by 37 different
centres/hospitals from around Europe. Given the rarity of the
conditions identified in section I, the critical mass of clinical
and biomedical data to conduct statistically relevant research
is now possible, and indeed is currently taking place. A key
goal of endoVL and this paper is to build on the success of
projects like ENSAT-CANCER.
Figure 2: Summary Data from ENSAT-CANCER
To illustrate how this system has evolved to be far more
than simply a set of databases, but a truly collaborative
research environment, we focus on one specific clinical trial
that is currently on-going associated with Pheo/PGL patients.
It is noted that a multitude of other genetically targeted
clinical trials are now supported through ENSAT-CANCER
including ADIUVO (ACC), FIRSTMAPPP (Pheo/PGL),
FAMIAN (an imaging study) and a prospective study on
recurrence of Pheo for patients with hypertension. It is
envisaged that this latter study will run for many years
(potentially over 25 years).
A. PMT Study Background
The PMT study (Prospective Monoamine Tumour study) is a
four-phase clinical trial targeted to patients who exhibit
clinical indications of suspected pheochromocytoma through
one or more of the following criteria:
• signs and symptoms;
• therapy-resistant hypertension;
• incidental finding on imaging for related condition;
• routine screening due to known mutation or
hereditary syndrome;
• routine screening due to previous history of
pheochromocytoma.
Thus far patients have been admitted to the study from a range
of specialist clinics across Europe (including currently
Dresden, Munich, Wurzburg, Nijmegen and Warsaw).
Information on these patients is collected and tracked over
time over the full four phases of the trial (see Figure 3). These
phases include initial screening of patients, clonidine tests for
patients who meet the screening criteria, biochemical
characterisation of patients and the subsequent follow-up of
patients. This information is to be collected up to five years
after the study completes.
Figure 3: Clinical Path of Patients in the PMT study
At present 865 patients have been recruited to the PMT study
(see Figure 4).
Figure 4: PMT Recruitment Status
B. Pheo/PGL and PMT Data Flow
To support the recruitment (data flow) from the Pheo/PGL
database, many of the required data points have been mapped
onto the PMT data model directly. With a subset of
commonality in the data models, for patients who match the
criteria for recruitment to the PMT, many of their core data
can be automatically fed into the PMT phase 1
recruitment/screening database (subject, amongst other things,
to agreement/consent to participate in the study).
The full set of information captured in the PMT study
along with their counterpart in the Pheo/PGL database is
outlined in Table 1.
Table 1: Sections of PMT study database information
Phase 1: Screening
    Identification
    Demographics
    Medications
    Screening biochemical tests
    Signs and symptoms
    Tumor details
    Other cardiovascular diseases and malignancies
    Hereditary PPGL syndromes (genetics)
Phase 2: Clonidine Testing
    Medications
    Overnight urinary metanephrines
    24-hour urinary catecholamines
    Plasma chromogranin A
    Clonidine tests
    Complications
    Post-clonidine metanephrines/catecholamines
Phase 3: Tumor Characterization
    Medications
    Biochemical tests (carried over from phase 2)
    Ambulatory blood pressure monitoring
    Cardiovascular tests
    Echocardiography
    Electrocardiogram
    Metabolic tests
    Imaging tests
Phase 4A: Excluded Follow-up
    Follow-Up
    Medications
Phase 4B: Pheo Follow-up
    Genetics
    Unresectable tumor: imaging
    Resectable tumour: surgery/pathology, postoperative verification
    One year follow-up: medications, biochemical tests,
    cardiovascular, blood pressure, echocardiography,
    electrocardiogram, metabolic
In Pheo/PGL: Yes (marked against four of the sections above)
As noted a small but significant overlap of patient data exists,
however an equally important factor here is that a critical mass
of Pheo/PGL patients existed in the first place to support the
recruitment to this study. The actual clinical path (protocol) of
patient progress through the PMT study is shown in Figure 5.
Figure 5: PMT Phase Status
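The mapping of shared data points from the Pheo/PGL registry into the PMT phase 1 screening record, gated on consent, could be sketched as below. All field names in the mapping are hypothetical placeholders; the actual data models of the two databases are not given in this paper.

```python
# Hypothetical field names: Pheo/PGL registry field -> PMT phase 1 field.
PHEO_TO_PMT_PHASE1 = {
    "registry_id": "patient_id",
    "sex": "demographics_sex",
    "year_of_birth": "demographics_year_of_birth",
    "tumor_details": "tumor_details",
}


def prefill_phase1(pheo_record: dict, consented: bool) -> dict:
    """Copy only the mapped fields, and only for patients who have consented."""
    if not consented:
        raise PermissionError("patient has not consented to the PMT study")
    return {
        pmt_field: pheo_record[source_field]
        for source_field, pmt_field in PHEO_TO_PMT_PHASE1.items()
        if source_field in pheo_record
    }


registry_row = {"registry_id": "REG-0001", "sex": "F", "unmapped_note": "ignored"}
print(prefill_phase1(registry_row, consented=True))
```

Fields without a counterpart in the mapping are simply not carried over, so the remaining phase 1 sections still have to be completed in the study forms.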
C. Biobanking Support
As identified in section II, value added services that should be
supported by a virtual research environment include broader
data tracking – including tracking of physical biosamples
between partner sites.
In terms of biobanking, the production of labels is of great
significance, along with the ability to provide information that
is relevant to all of the participating centers. In this regard
processes have been added to the biomaterial forms that allow
the specification of samples stored, which studies they are
primarily collected for, and the number of aliquots stored (this
is stored permanently with the option for modification when it
comes to the time of printing). The information captured
includes the canonical identifier (including the center code),
the biomaterial form number, the aliquot number, and the
sample name. This is also captured in barcodes for readers that
wish to capture the information electronically. There are many
barcode standards to choose from – QR codes were initially
tried but rejected due to the two-dimensional nature being
unreadable on the curved surface of a typical sample tube. As
a result the project has now adopted Interleaved 2 of 5 [7] as
the agreed standard (a label example is shown in figure 6).
Figure 6: Biomaterial label output with barcode, form
and study information (in this case EURINE-ACT study)
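Interleaved 2 of 5 encodes pairs of digits: the first digit of each pair sets the widths of five bars and the second sets the widths of the five spaces between them. The sketch below produces that width sequence for a numeric payload. How the project maps the alphanumeric canonical identifier onto digits is not described here, so the payload shown (a zero-padded form number followed by an aliquot number) is purely an assumption.

```python
# Width patterns per digit (N = narrow, W = wide); two wide elements per digit.
PATTERNS = {
    "0": "NNWWN", "1": "WNNNW", "2": "NWNNW", "3": "WWNNN", "4": "NNWNW",
    "5": "WNWNN", "6": "NWWNN", "7": "NNNWW", "8": "WNNWN", "9": "NWNWN",
}
START, STOP = "NNNN", "WNN"   # start and stop patterns, beginning with a bar


def encode_itf(digits: str) -> str:
    """Return alternating bar/space widths for the whole symbol, bar first."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("Interleaved 2 of 5 encodes digits only")
    if len(digits) % 2:
        digits = "0" + digits            # the symbology needs an even digit count
    widths = [START]
    for i in range(0, len(digits), 2):
        bars, spaces = PATTERNS[digits[i]], PATTERNS[digits[i + 1]]
        widths.append("".join(b + s for b, s in zip(bars, spaces)))
    widths.append(STOP)
    return "".join(widths)


# Hypothetical payload: form number 0012 followed by aliquot number 03.
print(encode_itf("001203"))
```

Because the result is a one-dimensional sequence of bar and space widths, it can run along the length of a sample tube, which is consistent with the reason given above for rejecting QR codes.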
D. PMT Form Implementation and Adaptation
Many of the PMT study forms have relatively complicated
user interface requirements – signs and symptoms have a
matrix of symptoms versus frequency, duration and severity.
Similarly, a common requirement throughout the PMT study
is the use of multiple units. Values can be input in, and converted
between, SI and conventional mass-based units (e.g. nmol/L versus
pg/mL). Originally these values were stored in the database in
one format, converted from what was originally input, and
rendered in both units when returned. However, the accuracy
loss during this conversion was deemed
unacceptable for clinical purposes, and the information is now
stored as separate data-points, interchangeable if precision loss
is tolerable, but losing none of the accuracy of the original
input data units. It should be noted that the range of numbers
is large: from values recorded to three
decimal places for measurements of normetanephrine (e.g.
0.003 nmol/L) up to five-digit
values for measurements of methoxytyramine (e.g. 50,000
nmol/L), in patients that may be presenting with a pheo
tumour.
Figure 7: PMT Phase 1 Screening Biochemical Test Matrix
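One way to keep the value exactly as entered while still offering the other unit on demand is sketched below. It is one possible reading of the storage approach described above, not the project's schema: the `Measurement` type is hypothetical, and the molar mass of normetanephrine (about 183.2 g/mol) is an approximate constant included only to make the conversion concrete.

```python
from dataclasses import dataclass
from decimal import Decimal

# Approximate molar mass in g/mol; an illustrative constant for this sketch.
MOLAR_MASS = {"normetanephrine": Decimal("183.2")}


@dataclass(frozen=True)
class Measurement:
    analyte: str
    value: Decimal   # stored exactly as entered, never overwritten
    unit: str        # "pg/mL" or "nmol/L", as entered

    def in_unit(self, target: str) -> Decimal:
        """Return the value in the target unit; the stored value is untouched."""
        if target == self.unit:
            return self.value
        mass = MOLAR_MASS[self.analyte]
        if (self.unit, target) == ("pg/mL", "nmol/L"):
            return self.value / mass     # pg/mL equals ng/L; divide by g/mol
        if (self.unit, target) == ("nmol/L", "pg/mL"):
            return self.value * mass
        raise ValueError(f"no conversion from {self.unit} to {target}")


entered = Measurement("normetanephrine", Decimal("0.003"), "nmol/L")
print(entered.value, entered.unit)       # the original input, digit for digit
print(entered.in_unit("pg/mL"))          # derived on request
```

Using `Decimal` rather than binary floating point means a value entered as 0.003 is stored as exactly 0.003, and any rounding happens only in the derived figure.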
Other significant features of forms used in the VRE include
the ability to track medication inputs throughout the study –
when they are added and when they are removed (important
for the interpretation of relevant biochemical results). These
are listed in three categories of anti-hypertensives, other
prescribed and “over-the-counter” supplements. The volume
of such information is typically vast and can quickly
become unmanageable. A standardised listing is therefore only held for the
most important category (anti-hypertensives), which allows
this to be searched for standard terms more readily. For the
other categories, recording that such medications exist matters more
than their exact terms, so the
flexibility of free-text is deemed acceptable from a clinical
knowledge point of view.
Similarly, relevant information from previous phases
tracks through the patient’s record in the eCRF. Genetic
information is used to interpret data obtained in later phases.
Critically, as patients are able to move from phase 1 straight to
phase 3 – where a patient has a pheo confirmed without the
requirement for supplemental clonidine testing – it is
important for the biochemical testing to still be performed. If
the patient has gone through phase 2 already then this
information is presented as complete in phase 3, but still has
the option for manual input for patients that missed phase 2.
The biochemical pages are also locked down so that they are
open only to users in Dresden, the central biochemical
processing center. This allows the LC/MS spectrometer
information to be standardised to the lead center settings, but
also to interact with the input/output information captured.
A completeness function has also been implemented for
each of the forms, indicating what information still needs to be
input for each phase. The display follows a simple scheme of
green for complete and red for incomplete. The underlying
information is obtained by surveying the relevant database
table, taking into consideration the fields that have been
marked as optional (in consultation with the clinicians
involved).
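A completeness check of this kind can be sketched as follows, under the assumption that a form record is available as a mapping from field name to value and that optional fields are known in advance. The function name `completeness` and the example field names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def completeness(
    record: Dict[str, Any],
    fields: Iterable[str],
    optional: Iterable[str] = (),
) -> Tuple[str, List[str]]:
    """Return ("green", []) when every mandatory field has a value,
    otherwise ("red", [names of the mandatory fields still missing])."""
    optional_set = set(optional)
    missing = [
        f for f in fields
        if f not in optional_set and record.get(f) in (None, "")
    ]
    return ("green" if not missing else "red", missing)
```

For example, `completeness({"age": 54, "sex": ""}, ["age", "sex", "notes"], optional=["notes"])` reports the form as red, with `sex` as the only outstanding field.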
E. Data Output and Analytics
Of paramount importance is the ability to use the clinical
information for statistical analysis. Whilst it is possible to
embed statistical tools within a browser, many researchers
prefer to directly download the data sets and analyse them on
their local desktop. In this regard, translating a relational
database structure into a two-dimensional spreadsheet
becomes the main challenge. A critical point here is to have all
the information for a single patient on one row. Information
that is captured in a one-to-many relationship (for instance,
the biomaterial forms) must therefore be rendered
transversely onto one line (figure 6).
Figure 6: Rendering one-to-many forms to a single spreadsheet line
The output of such data to these spreadsheets can often be
“inelegant”, as the number of columns in each line must be
pre-calculated from the single patient that has the most table
entries for that entity. The number of columns available, and
the programmable features of the output, also depend on the
format selected: comma-separated values (.csv) is the simplest,
with the ability to render in most simple text editors; .xls
Excel files are the next step up (with a maximum of 256
columns and 65,536 rows); and finally the more advanced .xlsx
format can accommodate a much greater number (though
legacy issues of whether the researchers have the latest version
of Microsoft Office or not may become relevant).
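The rendering of one-to-many entries onto a single line, with the line width pre-calculated from the patient with the most entries, can be sketched as below. This is an illustration under assumed data shapes (patients as dictionaries keyed by `patient_id`, child entries grouped per patient); the function names `flatten_to_rows` and `to_csv` are hypothetical and do not describe the registry's actual export code.

```python
import csv
import io
from typing import Dict, List, Sequence, Tuple


def flatten_to_rows(
    patients: List[dict],
    children: Dict[str, List[dict]],
    patient_fields: Sequence[str],
    child_fields: Sequence[str],
) -> Tuple[List[str], List[list]]:
    """Render each patient and all of their child entries on one row."""
    # Width is set by the patient with the most child entries.
    width = max((len(children.get(p["patient_id"], [])) for p in patients), default=0)
    header = list(patient_fields) + [
        f"{field}_{i}" for i in range(1, width + 1) for field in child_fields
    ]
    rows = []
    for p in patients:
        row = [p.get(f, "") for f in patient_fields]
        entries = children.get(p["patient_id"], [])
        for i in range(width):
            # Pad with empty cells when a patient has fewer entries.
            entry = entries[i] if i < len(entries) else {}
            row.extend(entry.get(f, "") for f in child_fields)
        rows.append(row)
    return header, rows


def to_csv(header: List[str], rows: List[list]) -> str:
    """Serialise the flattened rows as comma-separated values."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

Patients with fewer entries than the widest patient receive empty cells, which is the source of the sparse and sometimes unwieldy lines noted above.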
This formatting style is particularly important when
tracking longitudinal information about treatment and
follow-up summaries. For the ACC section of the registry,
instances of recurrence, surgical resection status and
treatments received can be summarised in a single page.
Exporting this summary to a similarly formatted spreadsheet,
and then tracking it over time, is a critical function for
progressing research into ACC treatments. Often this requires
programmatic interfaces that perform calculations on form
dates to assess features such as time to recurrence or patient
death. The algorithms that perform these calculations are
being updated as the study requirements develop.
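A date calculation of the kind mentioned above might look like the following sketch, which assumes that a baseline date (for example, the date of surgery) and the dates of subsequent events are available from the forms. The function name `days_to_first_event` is hypothetical, and the registry's own algorithms may differ.

```python
from datetime import date
from typing import Iterable, Optional


def days_to_first_event(baseline: date, event_dates: Iterable[date]) -> Optional[int]:
    """Days from the baseline date to the earliest event on or after it.

    Returns None when no such event has been recorded, e.g. when the
    patient has had no recurrence so far.
    """
    later = [d for d in event_dates if d >= baseline]
    if not later:
        return None
    return (min(later) - baseline).days
```

The same function can serve for time to recurrence or time to death, depending on which form dates are passed in as `event_dates`.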
V.
CONCLUSIONS
The endoVL project is very much ramping up across
Australia; however, it is based upon established platforms that
are used for a wide range of biomedical research endeavours.
ENSAT-CANCER has become the central resource for a wide
range of major multi-center clinical trials that are currently
ongoing across Europe. These include ADIUVO (ACC
clinical trial), FIRSTMAPPP (Pheo trial), EURINE-ACT and
PMT (Pheo trial), with future studies including FAMIAN
(imaging trial) and AVIS in the offing. Similarly, the DSD
platform is now well established as the major disorders of sex
development platform, with a 5-year MRC platform grant to
extend the EU EuroDSD system that was built originally.