Development of an Endocrine Genomics Virtual Research Environment: Building on Success

Richard O. Sinnott, Loren Bruns, Christopher Duran, William Hu, Glenn Jayaputera, Anthony Stell
Melbourne eResearch Group
University of Melbourne
Melbourne, Australia
rsinnott@unimelb.edu.au

Abstract

The Australian National eResearch Collaboration Tools and Resources (NeCTAR - www.nectar.org.au) project has recently funded an initiative to establish an Australia-wide endocrine genomics virtual laboratory (endoVL - www.endovl.org.au) covering a range of disorders including type-1 and type-2 diabetes, rare diabetes-related disorders, obesity/thyroid disorders, neuroendocrine/adrenal tumours, bone disorders and disorders of sex development. This virtual laboratory will establish a range of targeted databases and clinical registries, and will support a range of genetically targeted clinical trials, leveraging a body of international projects and experiences garnered over many years through a range of EU and MRC funded initiatives. This paper focuses on the plans for endoVL and, especially, the systems it leverages in supporting large-scale clinical, collaborative environments.

Keywords - endocrine disorders, virtual research environment, security

I. INTRODUCTION

The endocrine system comprises a system of glands, each of which secretes different types of hormone into the bloodstream to regulate the body. At present across Australia, and indeed internationally, communities of clinicians and biomedical researchers are working on aspects of clinical care and biomedical research associated with particular disorders of the endocrine system. However, the infrastructure to support these networks and to facilitate international endocrine-wide research interactions does not exist.
Furthermore, following the sequencing of the human genome and the unprecedented growth and availability of genomic (and other -omics) data, the research opportunities and challenges across the post-genomic life sciences for personalised e-Health are growing at an exponential rate, and “big data” is an increasingly common challenge that must be overcome. Infrastructures for dealing with “big data”, especially in the health context, require far more than just large-scale data storage systems. Rather, the systems need to be designed, developed and deployed to cope with the specific concerns of health-related data regarding information governance, privacy and confidentiality. Experience has shown that these challenges are enormous when considered in the large, e.g. the UK Connecting for Health initiative [1]; however, they can be successfully addressed when targeted to specific clinical and biomedical endeavours. Endocrine-related biomedical research represents a domain where a targeted endocrine genomics virtual laboratory (endoVL) for clinical and biomedical research would provide cohesion across many efforts across Australia. Establishing and operating such a facility on behalf of Australia-wide endocrine communities is the purpose of the endoVL project (www.endovl.org.au).

The endoVL project itself commenced in early 2013; however, it builds upon and leverages an extensive portfolio of software systems and experiences in the development and provisioning of security-oriented collaborative platforms dealing with a wide variety of clinical and biomedical data and associated networks/research communities.
Amongst others, these include major international European (EU FP7) and UK MRC funded projects, including:
the €6m ENSAT-CANCER (www.ensat-cancer.eu), which began in 2011 and has a focus on adrenal tumours;
the £625k I-DSD (www.i-dsd.org), which has a focus on disorders of sex development;
the €1m EuroWABB (www.euro-wabb.org), which has a focus on the rare diseases Alstrom, Bardet-Biedl and Wolfram;
the $200k Australian Diabetes Data Network (ADDN) (www.addn.org.au), which has a focus on child type-1 diabetes.

These systems have grown over time and have been used for a wide range of major clinical trials and studies, including full 4-phase genetically targeted clinical trials. Whilst some of these diseases are common, e.g. type-1 diabetes, others are quite rare. Adrenal tumours are typical of this. For example, around 1-2 cases of adrenocortical carcinoma (ACC), one particular adrenal tumour type, are diagnosed for every million individuals. About 600 cases of ACC are diagnosed for the whole of Germany, which has a population of approximately 81m [2]. Given the sparseness of such information, it is difficult to establish how best to treat these patients, and indeed which treatment regimes work best for which cancer types at particular points in their evolution. The need to aggregate such data and ideally make it available with analytical tools is compelling. The Internet provides a ubiquitous data-sharing platform. However, major challenges must be addressed for such information sharing to occur: security, ethics and, importantly, the standardisation of the information that is to be shared.

The endoVL project builds upon a portfolio of projects, systems and lessons learnt in the development and delivery of such web-based infrastructures. This paper describes the goals of the endoVL project and highlights some of the key lessons learnt in the associated projects upon which it is being built. The rest of this paper is structured as follows.
Section II focuses on the goals of the endoVL project and highlights the communities it serves. Section III illustrates the data challenges and the kinds of web-based databases that are currently being built to service those needs. Section IV demonstrates the utility of these databases, focusing in particular on a major international clinical trial that is currently ongoing. Finally, section V draws some conclusions on the work and outlines the challenges that remain to be solved.

II. ENDOVL COMMUNITIES NEEDS AND INFRASTRUCTURE REQUIREMENTS

The endoVL project has been established to serve the clinical/biomedical needs of a wide range of researchers, groups, networks and societies across Australia. These include the:
Endocrine Society of Australia (ESA) - a national not-for-profit organisation of scientists and clinicians who conduct research and practice in the field of endocrinology. The ESA currently has over 900 members spread over all of Australia and New Zealand.
Australian and New Zealand Bone and Mineral Society (ANZBMS) has over 450 members spread over all of Australia and New Zealand. ANZBMS brings together clinical and experimental scientists and physicians actively involved in the study of bone and mineral metabolism in Australia and New Zealand.
Australian Diabetes Society (ADS) is the peak medical and scientific body in Australia working towards improved care and outcomes for people with diabetes. The society has over 500 members.
Australasian Paediatric Endocrine Group (APEG) is a not-for-profit organisation representing health professionals caring for children with diabetes across Australasia. APEG currently has 400 members distributed across Australasia.
Australasian Disorders of Sex Development network (DSDnetwork) is a newly formed network of clinical and biomedical researchers across Australasia focused upon understanding the aetiology of inter-sex disorders.
Australian Thyroid Foundation (ATF) is a national not-for-profit organisation that supports and educates its member base and promotes good thyroid health. There are currently 250 members of the ATF.
Clinical Oncology Society of Australia (COSA) is the peak national body representing health professionals working in cancer control and treatment. COSA provides a national perspective on cancer control activity from those who deliver treatment and care services across all forms of cancer. COSA currently has 150 members.

The above societies represent the direct communities that will be the beneficiaries of endoVL; however, it is important to recognise that every hospital and GP practice in Australia deals with patients with endocrine disorders. It is intended that the infrastructure that endoVL will provide will allow for knowledge exchange beyond the immediate groups and societies identified above.

As described in section I, the most urgently demanded capability to support these networks and societies is secure access to a range of detailed (disease/domain specific) data. For many groups this is largely phenotypic data, but for others support for genetics and other -omics analysis capabilities is needed. A major focus is on providing a seamless environment where phenotypic and genotypic information is uniformly presented. It is important to note that each disorder has its own specific phenotypic data that is of interest and has been agreed as the data to be collected. An overview of the infrastructure to be developed is given in Figure 1. This will be delivered through a user-oriented research environment that is made available through the Internet2 Shibboleth-based Australian Access Federation (www.aaf.edu.au), where all of these endocrine diseases and related information/clinical trials will be made available.
Figure 1: High level schematic of endoVL Infrastructure

In Figure 1, a range of clinical hospitals (referral partners) associated with the identified endocrine conditions act as the sources of clinical information. The endoVL infrastructure itself will support:
A portfolio of disease-specific (deep) phenotypic databases that will subsequently allow much needed contextualisation of the -omics analysis of those patients. Each of these databases will allow screening (searching) for patients with certain phenotypic traits, on certain treatments, or matching a variety of other disease-specific criteria (e.g. a particular karyotype, or whether genetic screening using a given -omics approach has or has not found a particular mutation). These databases will be populated both through security-oriented forms that define and enforce data validation and structure, and through live data feeds from hospital systems (as outlined below). Targeted bulk upload facilities for each endocrine disease will be supported (for example through processing of clinical data exported from hospital databases to Excel spreadsheets). In this model, clinicians will be returned a list of auto-generated patient identifiers. This model has been realised through the I-DSD project.
Clinical study platforms including development and support of new clinical study databases (typically populated with those patients screened from the given phenotypic databases, subject to ethical approval and patient consent) together with eCRFs targeted to the specific requirements of the clinical trials and studies.
Biobanking support including Australia-wide biosample labelling and tracking between clinical referral partners and biomedical research groups. This should use international protocols for structured sample naming including the State, Centre, Patient-Id and an associated sample identifier (such systems are already in use across Europe and work with a range of site-specific laboratory information management systems).
This will allow a tracking capability to show where all the samples are for a given person at any given time.
-omics support including a range of analyses of patient samples from major bioinformatics organisations across Australia, including the Australian Genome Research Facility (AGRF - www.agrf.org.au), the Centre for Comparative Genomics in Western Australia (CCG - http://ccg.murdoch.edu.au/), the Genomics Virtual Laboratory (GVL - https://genome.edu.au/wiki/GVL) and National ICT Australia e-Health (NICTA - http://www.nicta.com.au/business/health/e-health). These groups cover the whole -omics space (from genomics, proteomics, metabolomics and transcriptomics through to whole exome/genome sequencing approaches). This will allow both a comparative analysis of common bioinformatics approaches, e.g. where the same sample is analysed using the same -omics approach (microarray, mass spectrometry etc.) by different organisations/groups, and analysis of the same biosample using different, complementary -omics approaches, e.g. to gain a more complete understanding both of the biology and of the -omics approaches themselves and their suitability in an applied clinical context. The raw data, the stages of derived/processed data and the final resultant (analysed) data, together with all associated metadata from the -omics approaches for a given patient, should be made available through the endoVL research environment.
Pathology platform including standardised information on cellular information and associated pathological analysis of samples (including imaging). For tumours this will include information such as the Weiss score, number of mitoses, necrosis and information on when the biopsy/surgery was undertaken. Targeted databases will be developed for this purpose.
Community forums for communication between the clinical and biomedical groups involved and for wider outreach efforts, including to patient information support groups and networks.
This will provide summary data on the number of patients included in the clinical databases, information on the clinical trials that are ongoing, and the process for access to and use of the endoVL resources more generally.
Linkage / mining of the relevant research results from the international community through facilities such as PubMed and MedLine will be offered.
Finally, where appropriate, relevant data/metadata will be periodically published into the Human Variome Project Australia Node hosted at the University of Melbourne (HVPA - http://www.hvpaustralia.org.au/).

In the above figure, many of the components leverage pre-existing data models and services for these disorders, e.g. those for DSD and adrenal tumours developed in European projects. Other systems are currently under development, e.g. the ADDN national type-1 diabetes infrastructure. The seamless interplay across all of these solutions is essential.

III. ENDOVL DATA AND SECURITY CHALLENGES

Numerous hurdles must be overcome in the development and maintenance of such a complex environment. The main challenges are outlined in this section.

A. EndoVL Patient Linkage and Security

Central to the information model used in endoVL is the notion of a unique patient identity. A unique patient index is often used to encapsulate the patient identity, typically within a particular hospital centre. Each hospital has different approaches and solutions to tracking patients internally, and national systems are not yet ubiquitous. Establishing the identity of individuals is critical to maintaining the consistency of information held within the endoVL databases, but it must also be decoupled from any actual identity information in use locally, e.g. the name, date of birth, address, Medicare number and potentially other local institutional identifiers that may be in place. It should also be noted that for almost all of the disease areas under consideration within endoVL, an independent standalone secure database was required, i.e.
a database without the requirement for interacting with existing hospital IT systems. For many of the rarer disorders such as DSD this was quite satisfactory; however, for some disorders such as type-1 diabetes this model was unrealistic. There are of the order of 500,000 individuals with type-1 diabetes across Australia [3]. Given this, automating data feeds from hospital settings was mandatory, to avoid the need for double data entry. In this model, patient-specific information (name, address, Medicare number…) is kept in the hospital and a unique identifier is generated (using a heartbeat counter specific to each centre). Thus the first patient from Westmead Hospital, Sydney, New South Wales, has identifier NSW-SYD-WM0001, the second NSW-SYD-WM0002 etc. This unique identifier is kept locally at the hospital and associated with the other identifying information. This can be realised in many ways, e.g. a simple hash table or local database, depending upon the needs and experiences of the local hospital IT staff and the technologies that they have in place. Lightweight clients are used to run local queries that extract the core identified data and associate them with the unique identifier (e.g. NSW-SYD-WM0001), before pushing them out through outgoing-only hospital firewalls to the corresponding endoVL server. There the digital signature of the message is checked, the message is decrypted and the data are subsequently pushed into the database (and validated as part of this process). This model of pushing data and not allowing incoming connections from the Internet appeases many of the security concerns of the hospital staff involved.

An important aspect in this is the continual updating of data feeds where/when patients come to clinics. That is, the endoVL system needs to cope with updates to patient information and the synchronisation issues that this raises. In this case, a small subset of the core type-1 diabetes data needs to be updated based on information collected during the clinics.
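The per-centre identifier scheme described above can be sketched as follows. This is a minimal illustration only, not the endoVL implementation: the class `CentreIdGenerator`, the helper `outgoing_record`, the field names in `IDENTIFYING_FIELDS` and the four-digit counter width are all assumptions made for the example, and the signing and encryption of the outgoing message are omitted.

```python
from collections import defaultdict

# Hypothetical set of fields that must never leave the hospital.
IDENTIFYING_FIELDS = {"name", "address", "date_of_birth", "medicare_number"}


class CentreIdGenerator:
    """Issue sequential structured identifiers, one counter per centre."""

    def __init__(self, width=4):
        self._width = width                 # assumed zero-padded counter width
        self._counters = defaultdict(int)   # centre prefix -> last number issued

    def next_id(self, state, city, centre):
        prefix = f"{state}-{city}-{centre}"
        self._counters[prefix] += 1
        return f"{prefix}{self._counters[prefix]:0{self._width}d}"


def outgoing_record(identifier, hospital_record):
    """Build the record pushed to the server: identifier plus non-identifying data."""
    clinical = {k: v for k, v in hospital_record.items()
                if k not in IDENTIFYING_FIELDS}
    return {"patient_id": identifier, **clinical}


generator = CentreIdGenerator()
local_index = {}   # stays inside the hospital: identifier -> identifying details

record = {"name": "Example Patient", "medicare_number": "0000", "bmi": 21.4}
pid = generator.next_id("NSW", "SYD", "WM")
local_index[pid] = {k: v for k, v in record.items() if k in IDENTIFYING_FIELDS}
payload = outgoing_record(pid, record)
second_pid = generator.next_id("NSW", "SYD", "WM")
```

Keeping `local_index` on the hospital side is what allows a clinician to re-identify a case locally while the central database only ever sees the structured identifier.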
The updated data include measures such as the body-mass index and a range of typically fluctuating measurements and assessments that are captured on patient visits.

B. EndoVL Portal Security

EndoVL needs to adhere to the highest standards of data security. The endoVL portal is to be provisioned with the Australian Access Federation (AAF - www.aaf.edu.au), which provides federated authentication. However, guided by ongoing efforts across Europe, users of endoVL will also have specific privileges depending on their role within the endocrine disorder(s) efforts. These roles will be used to restrict access to data, tools and resources more generally. The default model will be to deny access, i.e. only individuals with authentic and valid credentials (digitally signed credentials recognised by the endoVL service components) will be able to access and use these resources. A targeted attribute authority will be set up for this purpose. These roles and privileges will be allocated under strict management of the endoVL staff, working in close cooperation with the clinical research communities on the assignment of these roles and privileges to the wider research community. Examples of the security policies that will be adopted (based on projects like I-DSD and ENSAT-CANCER) are that data can only be accessed by individuals from the same hospital, only by individuals from the same State (Victoria, NSW etc.), or only by individuals from the same country. Furthermore, certain individuals will be allocated roles that only allow read access to the database(s), whilst others will be allocated read/write access.

As noted above, endoVL will not hold the names or addresses of any individuals or any direct information that can be used to identify any patient. Instead, cases are identified for the purposes of communication by an automatically generated structured identifier that is unique within the clinical databases, together with an email address of the clinician responsible for the patient.
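The access policies described in this section (deny by default, access scoped to the same hospital, State or country, and read versus read/write roles) can be illustrated with a small check. This is a hedged sketch under stated assumptions: the `Credential` and `Record` types, their field names and the `is_allowed` function are hypothetical, and signature verification is reduced to a boolean flag rather than real credential validation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credential:
    hospital: str
    state: str
    country: str
    scope: str             # assumed values: "hospital", "state" or "country"
    can_write: bool
    signature_valid: bool  # stands in for verifying a digitally signed credential


@dataclass(frozen=True)
class Record:
    hospital: str
    state: str
    country: str


def is_allowed(cred: Optional[Credential], record: Record, action: str) -> bool:
    """Return True only when a rule explicitly grants access (default deny)."""
    if cred is None or not cred.signature_valid:
        return False
    if action not in ("read", "write"):
        return False
    if action == "write" and not cred.can_write:
        return False
    if cred.scope == "hospital":
        return (cred.country, cred.state, cred.hospital) == (
            record.country, record.state, record.hospital)
    if cred.scope == "state":
        return (cred.country, cred.state) == (record.country, record.state)
    if cred.scope == "country":
        return cred.country == record.country
    return False  # unknown scope: deny


westmead = Record(hospital="WM", state="NSW", country="AU")
reader = Credential("RCH", "VIC", "AU", scope="country",
                    can_write=False, signature_valid=True)
local_editor = Credential("WM", "NSW", "AU", scope="hospital",
                          can_write=True, signature_valid=True)
```

Every branch that does not match an explicit rule falls through to `False`, which is one simple way of expressing the deny-by-default model.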
This identifier scheme will leverage international standards and protocols that have been defined through projects such as ENS@T-CANCER on patient and biosample identification and tracking. At no time will a clinician ever be asked to reveal the identity of an individual to any researcher outside of their immediate clinical care environment. Furthermore, the patient identifiers in endoVL and the identifiers used for biomaterials will be both distinct and completely separated, i.e. it will never be possible to directly identify (through a given software query or direct observation of a particular software system) that a particular sample comes from a particular patient through use of endoVL or any other related IT system.

C. EndoVL Cloud Security

In addition to security related to portal-based access, the endoVL project will be hosting a wide range of genomic data, including whole genome data sets derived using next generation sequencing technologies. To deal with the encryption and decryption of these data sets in virtualised environments, and especially Cloud-based environments such as the Research Cloud offered through the National eResearch Collaboration Tools and Resources (NeCTAR) project (www.nectar.org.au), the project is adopting the CSIRO TrustStore technology [4]. This allows users to create secure storage spaces on the Cloud by defining which storage, key management and integrity services should be used, and subsequently providing credentials to access these services. User profiles are used to support this process. With TrustStore, drag-and-drop functionality is offered to copy files from a local machine to TrustStore storage. The files themselves are fragmented, encrypted and hashed/signed, with the encrypted fragments uploaded to the storage providers, the keys stored in a TrustStore Key Management Service and the signed digests stored in an Integrity Management Service.
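The general fragment, encrypt and digest pattern described above can be illustrated with the Python standard library. This sketch is not the TrustStore API and makes no claim about how TrustStore is implemented: the function names are hypothetical, a one-time pad (XOR with a random key as long as the fragment) stands in for a real cipher because the standard library provides none, and an HMAC stands in for a digital signature. The three returned lists play the roles of the storage provider, the key management service and the integrity management service.

```python
import hashlib
import hmac
import secrets


def split_into_fragments(data: bytes, size: int) -> list:
    """Cut the data into consecutive fragments of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def protect(data: bytes, fragment_size: int, signing_key: bytes):
    """Return (encrypted fragments, per-fragment keys, signed digests)."""
    storage, key_store, integrity_store = [], [], []
    for fragment in split_into_fragments(data, fragment_size):
        pad = secrets.token_bytes(len(fragment))   # stand-in cipher key
        ciphertext = xor_bytes(fragment, pad)
        digest = hashlib.sha256(ciphertext).digest()
        tag = hmac.new(signing_key, digest, hashlib.sha256).hexdigest()
        storage.append(ciphertext)
        key_store.append(pad)
        integrity_store.append(tag)
    return storage, key_store, integrity_store


def recover(storage, key_store, integrity_store, signing_key: bytes) -> bytes:
    """Verify every fragment against its signed digest, then decrypt and join."""
    plain = []
    for ciphertext, pad, tag in zip(storage, key_store, integrity_store):
        digest = hashlib.sha256(ciphertext).digest()
        expected = hmac.new(signing_key, digest, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, tag):
            raise ValueError("fragment failed integrity check")
        plain.append(xor_bytes(ciphertext, pad))
    return b"".join(plain)


sequence = b"ACGT" * 10
fragments, keys, digests = protect(sequence, 16, b"demo-signing-key")
restored = recover(fragments, keys, digests, b"demo-signing-key")
```

Separating the three stores means that a party holding only the encrypted fragments can neither read the data nor alter it without detection.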
It is expected that the genomic data itself will be stored in the Leiden Open Variation Database (LOVD3 - http://www.lovd.nl/3.0/home). This system has been used in other endoVL-related projects including EuroWABB (http://www.euro-wabb.org/en/lovd-genetic-variation-database). At present the LOVD3 database has been established and initial experiments on its utility have been conducted.

D. EndoVL Data Flow Functions

The seamless movement of patient and/or genomic information is essential in developing systems such as endoVL. At the heart of the data flows are patient and sample tracking. Knowing that a patient is involved in a given clinical study should be used to inform, for example, whether they should be excluded from other studies. The interconnectedness of patients and their associated information is a key element of endoVL. At the heart of this scheme are patient and study identifiers. Central identifiers used for patient tracking in the endoVL system, e.g. for the core data registries, need to be augmented with identifiers used within particular clinical trials and indeed for tracking of biosamples associated with those patients. Furthermore, core registries established as part of endoVL, e.g. for adrenal tumours, should allow both the identification and recruitment of patients for particular clinical trials and studies, and the delivery of all relevant information on those patients for those trials and studies. In a similar manner, data that is collected throughout the course of a particular clinical trial or study should in principle (subject to the ethics and information governance arrangements associated with the clinical trial) be available to wider collaborating researchers.

IV. ENDOVL CASE STUDY BASED ON ADRENAL TUMOURS

As identified, to realise the vision of a seamless interconnected virtual research environment for endocrine research across Australia, the endoVL project leverages a body of ongoing efforts and resources.
The ENSAT-CANCER project is a significant undertaking that focuses on adrenal tumours (one form of neuroendocrine tumour). It focuses specifically on four main types of adrenal tumour: aldosterone producing adenomas (APA); pheochromocytomas and paragangliomas (Pheo/PGL); non-aldosterone-producing adrenocortical adenomas (NAPACA); and adrenocortical carcinomas (ACC). Each of these subtypes has different manifestations, involves different molecular mechanisms and ultimately requires different treatment regimes for optimal patient care. Given the rarity of the above kinds of adrenal tumours, the availability of a large collection of samples, with associated clinical and biomedical/-omic data and accompanying treatment information, is essential to better understand the differentiators of these tumour types, their aetiology and their associated molecular mechanisms.

The ENSAT-CANCER project commenced in 2011 and has since grown to include data on over 4000 patients with one of the above four tumour types and over 27,000 clinical annotations (treatments, surgeries, clinical visit information), as well as offering a source of over 5500 physical biosamples for a range of biomedical research and -omics data analysis (see Figure 2). The system is currently used by 37 different centres/hospitals from around Europe. Given the rarity of the conditions identified in section I, the critical mass of clinical and biomedical data needed to conduct statistically relevant research is now available, and such research is indeed currently taking place. A key goal of endoVL and this paper is to build on the success of projects like ENSAT-CANCER.

Figure 2: Summary Data from ENSAT-CANCER

To illustrate how this system has evolved to be far more than simply a set of databases, and is now a truly collaborative research environment, we focus on one specific ongoing clinical trial associated with Pheo/PGL patients.
It is noted that a multitude of other genetically targeted clinical trials are now supported through ENSAT-CANCER, including ADIUVO (ACC), FIRSTMAPPP (Pheo/PGL), FAMIAN (an imaging study) and a prospective study on recurrence of Pheo for patients with hypertension. It is envisaged that this latter study will run for many years (potentially over 25 years).

A. PMT Study Background

The PMT study (Prospective Monoamine Tumour study) is a four-phase clinical trial targeted to patients who exhibit clinical indications of suspected pheochromocytoma through one or more of the following criteria: signs and symptoms; therapy-resistant hypertension; incidental finding on imaging for a related condition; routine screening due to a known mutation or hereditary syndrome; routine screening due to a previous history of pheochromocytoma. Thus far patients have been admitted to the study from a range of specialist clinics across Europe (currently including Dresden, Munich, Wurzburg, Nijmegen and Warsaw). Information on these patients is collected and tracked over the full four phases of the trial (see Figure 3). These phases comprise initial screening of patients, clonidine tests for patients who meet the screening criteria, biochemical characterisation of patients and the subsequent follow-up of patients. This information is to be collected up to five years after the study completes.

Figure 3: Clinical Path of Patients in the PMT study

At present 865 patients have been recruited to the PMT study (see Figure 4).

Figure 4: PMT Recruitment Status

B. Pheo/PGL and PMT Data Flow

To support the recruitment (data flow) from the Pheo/PGL database, many of the required data points have been mapped directly onto the PMT data model. Because the two data models share a common subset, for patients who match the criteria for recruitment to the PMT study many of their core data can be automatically fed into the PMT phase 1 recruitment/screening database (subject, amongst other things, to agreement/consent to participate in the study). The full set of information captured in the PMT study, along with its counterpart in the Pheo/PGL database, is outlined in Table 1. The "In Pheo/PGL" column of the table marks with "Yes" the sections that have a counterpart in the Pheo/PGL database.

Table 1: Sections of PMT study database information
Phase 1: Screening
    Identification; Demographics; Medications; Screening biochemical tests; Signs and symptoms; Tumor details; Other cardiovascular diseases and malignancies; Hereditary PPGL syndromes (genetics)
Phase 2: Clonidine Testing
    Medications; Overnight urinary metanephrines; 24-hour urinary catecholamines; Plasma chromogranin A; Clonidine tests; Complications; Post-clonidine metanephrines/catecholamines
Phase 3: Tumor Characterization
    Medications; Biochemical tests (carried over from phase 2); Ambulatory blood pressure monitoring; Cardiovascular tests; Echocardiography; Electrocardiogram; Metabolic tests; Imaging tests
Phase 4A: Excluded Follow-up
    Follow-Up; Medications
Phase 4B: Pheo Follow-up
    Genetics; Unresectable tumor: imaging; Resectable tumour: surgery/pathology, postoperative verification; One year follow-up: medications, biochemical tests, cardiovascular, blood pressure, echocardiography, electrocardiogram, metabolic

As noted, a small but significant overlap of patient data exists; however, an equally important factor here is that a critical mass of Pheo/PGL patients existed in the first place to support the recruitment to this study. The actual clinical path (protocol) of patient progress through the PMT study is shown in Figure 5.

Figure 5: PMT Phase Status

C. Biobanking Support

As identified in section II, value added services that should be supported by a virtual research environment include broader data tracking, including tracking of physical biosamples between partner sites.
In terms of biobanking, the production of labels is of great significance, along with the ability to provide information that is relevant to all of the participating centres. In this regard, processes have been added to the biomaterial forms that allow the specification of the samples stored, the studies they are primarily collected for, and the number of aliquots stored (this is stored permanently, with the option for modification at the time of printing). The information captured includes the canonical identifier (including the centre code), the biomaterial form number, the aliquot number and the sample name. This is also captured in barcodes for those who wish to read the information electronically. There are many barcode standards to choose from. QR codes were initially tried but rejected because their two-dimensional patterns were unreadable on the curved surface of a typical sample tube. As a result the project has now adopted Interleaved 2 of 5 [7] as the agreed standard (a label example is shown in figure 6).

Figure 6: Biomaterial label output with barcode, form and study information (in this case EURINE-ACT study)

D. PMT Form Implementation and Adaptation

Many of the PMT study forms have relatively complicated user interface requirements, e.g. signs and symptoms are captured as a matrix of symptoms versus frequency, duration and severity. Similarly, a common requirement throughout the PMT study is the use of multiple units. Values can be input in, and converted between, conventional mass-based units and SI molar units (e.g. pg/mL versus nmol/L). Originally these values were stored in the database in one unit, converted from what was originally input, and rendered in both units when returned. However, the accuracy loss during this conversion was deemed unacceptable for clinical purposes. The information is now stored as separate data points, which are interchangeable if precision loss is tolerable, but which lose none of the accuracy of the original input data and units.
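The storage approach described above, keeping the value exactly as entered together with its unit and converting only on request, can be sketched as follows. The `Measurement` class and `MOLAR_MASS` table are hypothetical names introduced for illustration, and the molar mass used for normetanephrine (183.2 g/mol) is an assumed value that should be checked before any clinical use. The conversion relies on the fact that 1 pg/mL equals 1 ng/L, so dividing by the molar mass in g/mol gives nmol/L.

```python
from dataclasses import dataclass
from decimal import Decimal

# Assumed molar masses in g/mol (illustrative; verify before clinical use).
MOLAR_MASS = {"normetanephrine": Decimal("183.2")}


@dataclass(frozen=True)
class Measurement:
    analyte: str
    value: Decimal   # stored exactly as entered, never overwritten
    unit: str        # "pg/mL" or "nmol/L"

    def in_unit(self, target: str) -> Decimal:
        """Return the value in `target` units; the stored value is untouched."""
        if target == self.unit:
            return self.value
        mass = MOLAR_MASS[self.analyte]
        if (self.unit, target) == ("pg/mL", "nmol/L"):
            return self.value / mass
        if (self.unit, target) == ("nmol/L", "pg/mL"):
            return self.value * mass
        raise ValueError(f"no conversion from {self.unit} to {target}")


entered = Measurement("normetanephrine", Decimal("0.003"), "nmol/L")
as_entered = entered.in_unit("nmol/L")    # exactly the original input
as_mass = entered.in_unit("pg/mL")        # derived on demand
```

Using `Decimal` rather than binary floating point keeps the entered digits exact, and because conversion is computed on request the derived value never replaces the original.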
It should be noted that the range of values is large: from values recorded to three decimal places for measurements of normetanephrine (e.g. 0.003 nmol/L) to values with five significant figures for measurements of methoxytyramine (e.g. 50,000 nmol/L) in patients who may be presenting with a pheo tumour.

Figure 7: PMT Phase 1 Screening Biochemical Test Matrix

Other significant features of the forms used in the VRE include the ability to track medication inputs throughout the study: when they are added and when they are removed (important for the interpretation of relevant biochemical results). These are listed in three categories: anti-hypertensives, other prescribed medications and “over-the-counter” supplements. The volume of such information is typically vast and can quickly become unmanageable. For this reason, a standard listing is only held for the most important category (anti-hypertensives), which allows it to be searched for standard terms more readily. For the other categories, recording their existence is what matters most, and the flexibility of free text is deemed acceptable from a clinical knowledge point of view.

Similarly, relevant information from previous phases tracks through the patient’s record in the eCRF. Genetic information is used to interpret data obtained in later phases. Critically, as patients are able to move from phase 1 straight to phase 3 (where a patient has a pheo confirmed without the requirement for supplemental clonidine testing), it is important for the biochemical testing to still be performed. If the patient has already gone through phase 2, this information is presented as complete in phase 3; for patients who missed phase 2, the option for manual input remains. The biochemical pages are also locked down so that they are open only to users in Dresden, the central biochemical processing centre.
This allows the LC/MS spectrometer information to be standardised to the lead centre settings but also interacts with the input/output information captured. A completeness function has also been implemented for each of the forms, indicating what information still needs to be input for each phase. It follows a simple scheme of green for complete and red for incomplete. The underlying information is obtained by surveying the relevant database table, taking into consideration the data points that have been marked as optional (in consultation with the clinicians involved).

E. Data Output and Analytics

Of paramount importance is the ability to use the clinical information for statistical analysis. Whilst it is possible to embed statistical tools within a browser, many researchers prefer to directly download the data sets and analyse them on their local desktop. In this regard, translating a relational database structure into a two-dimensional spreadsheet becomes the main challenge. A critical point here is to have all the information for a single patient on one row. Information that exists in a one-to-many relationship, for instance the biomaterial forms, must therefore be rendered transversely onto one line (figure 6).

Figure 6: Rendering one-to-many forms to a single spreadsheet line

The output of such data to these spreadsheets can often be “inelegant”, as the number of columns needed for a single line must be pre-calculated from the patient that has the most table entries for that entity.
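The rendering of one-to-many forms onto a single row per patient can be sketched as below. This is an illustrative example rather than the export code used in the project: the function names, the `patient_id` column and the `field_N` column naming are assumptions. The number of columns is pre-calculated from the patient with the most forms, and rows for patients with fewer forms are padded with empty cells.

```python
import csv
import io


def flatten_one_to_many(patients, forms_by_patient, form_fields):
    """Return (header, rows) with every form of a patient laid out on one row."""
    # Pre-calculate the width from the patient with the most forms.
    max_forms = max((len(forms) for forms in forms_by_patient.values()), default=0)
    header = ["patient_id"] + [
        f"{field}_{n}" for n in range(1, max_forms + 1) for field in form_fields
    ]
    rows = []
    for patient_id in patients:
        row = [patient_id]
        for form in forms_by_patient.get(patient_id, []):
            row.extend(form.get(field, "") for field in form_fields)
        row.extend([""] * (len(header) - len(row)))   # pad shorter patients
        rows.append(row)
    return header, rows


def to_csv(header, rows):
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


biomaterial_forms = {
    "P1": [{"sample": "plasma", "aliquots": 2}, {"sample": "urine", "aliquots": 1}],
    "P2": [{"sample": "tumour", "aliquots": 3}],
}
header, rows = flatten_one_to_many(["P1", "P2"], biomaterial_forms,
                                   ["sample", "aliquots"])
csv_text = to_csv(header, rows)
```

Here patient P2 has one form fewer than P1, so the last two cells of its row are left empty; this padding is the "inelegance" that grows with the largest number of entries held for any one patient.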
The number of columns available, and the programmable features of the output, also depend on the format selected: comma-separated values (.csv) is the simplest, with the ability to render in most simple text editors; .xls Excel files are next (with a maximum of 256 columns and 65,536 rows); and finally the more advanced .xlsx format can accommodate a much greater number (though legacy issues, such as whether the researchers have a recent version of Microsoft Office, may become relevant).

The use of this formatting style is particularly important when tracking longitudinal information about treatment and follow-up summaries. For the ACC section of the registry, instances of recurrence, surgery resection status and treatments received can be summarised in a single page. Exporting this summary to a similarly formatted spreadsheet, and then tracking it over time, is a critical function for progressing research into ACC treatments. Often this requires programmatic interfaces that use form dates to calculate features such as time to recurrence or patient death. The algorithms to compute these are being updated as the study requirements develop.

V. CONCLUSIONS

The endoVL project is very much ramping up across Australia; however, it is based upon established platforms that are used for a wide range of biomedical research endeavours. ENSAT-CANCER has become the central resource for a wide range of major multi-centre clinical trials that are currently ongoing across Europe. These include ADIUVO (ACC clinical trial), FIRSTMAPPP (Pheo trial), EURINE-ACT and PMT (Pheo trial), with future studies including FAMIAN (imaging trial) and AVIS in the offing. Similarly, the DSD platform is now well established as the major disorders of sex development platform, with a 5-year MRC platform grant to extend the EU EuroDSD system that was built originally.

VI. REFERENCES