Data Management seminar
5th October 2011
Gwenlian Stifin & Aude Espinasse
South East Wales Trials Unit, Cardiff University

DATA MANAGEMENT OVERVIEW
Aim of the session:
• General understanding of the principles underpinning data management for clinical studies.
• Overview of the data cycle in a clinical study.
• Overview of data management procedures.

BACKGROUND REGULATORY FRAMEWORK
Good clinical practice (GCP) is an international ethical and scientific quality standard for the design, conduct and recording of research involving humans. GCP is composed of 13 core principles, of which the following 2 apply specifically to data.

BACKGROUND GCP – CORE PRINCIPLES FOR DATA
• The confidentiality of records that could identify subjects should be protected, respecting the privacy and confidentiality rules in accordance with the applicable regulatory requirement(s).
• All clinical trial information should be recorded, handled, and stored in a way that allows its accurate reporting, interpretation and verification.

DATA SEQUENCE

WHAT IS A CRF?
• A case report form (CRF) is a printed or electronic form used in a trial to record information about the participant as identified by the study protocol.
• CRFs allow us to:
– Record data in a manner that is both efficient and accurate.
– Record data in a manner that is suitable for processing, analysis and reporting.

KEY QUESTIONS
Designing CRFs, key questions:
• What data is required to be collected?
– Only data we specified in the proposal/protocol.
– Only data required to answer the study question.
• When will this data be collected?
– Baseline / follow-up.
• What forms will need to be designed?
• Who is going to collect/complete this form?
• Are there validated instruments available?
• How is the data going to be analysed?

DATA SEQUENCE

WHAT IS METADATA?
• Metadata is structured data used to organise and describe the data being collected.
• It centralises data management.
• It is a tool to control and maintain data entities:
– Content and variable definitions
– Validation rules
• Metadata describes data consistently and reduces the likelihood of introducing errors into the data framework by defining the content and structure of the target data.

Metadata File
Name of Trial/Study: PAAD (Probiotics for Antibiotic Associated Diarrhoea) - stage 1
Metadata Author: H S
Number of Data Collection Forms for Trial/Study: 10
Name of File (Corresponding Data Collection Form): Recruitment CRF 02
Form Variable Format Variable Label Title Data Type Name Length Value Labels Recruitment CRF 02 Linked Validation Condition Type Validation Missing Codes
datecons date of consent date
sugender service user gender category 1 = Male, 2 = Female 1
consss1 consent for SS1 category 0 = no; 1 = yes 1
dd/mm/yyyy Skip 10 range warn if <01.11.2010 > 01.06.2012

CRF AND DATABASE DESIGN
• Study outcomes in the protocol define what questions are asked in the CRF.
• The database is built to receive data extracted from the CRFs.
• Use of validated scales and questionnaires.
• The database needs to include querying and reporting tools.
• User-friendliness and ease of completion are important.
• Data needs to be coded into numbers to facilitate statistical analysis.

DATABASE DESIGN
The database allows for adequate storage of study data and for accurate reporting, interpretation and verification of the data.
Two database systems tend to co-exist alongside one another:
• Study management database: personal information, recruitment, data completeness (CRF receipts), follow-up triggers…
• Clinical database: clinical information (study outcomes).

DATABASE DESIGN
Functionalities to consider in both types of database:
• Validation rules (ranges, skips, inconsistencies…).
• Queries / reports.
• Audit trail.

TEST/VALIDATE THE DATABASE
• Check ranges, skips, inconsistencies, missing data, i.e.
what is on your metadata is exactly what is applied when entering the data on the form.
• Check output file for data export (for clinical database):
– Variable names match up / are all there.
– Coding of categories is correct.
– Numbers where alpha is required.
– What is on the form is transferred exactly into CSV / SPSS.

DATA SEQUENCE

DATA COLLECTION
• Validity of data collection must be ensured.
• Source data is identified and data is transcribed correctly onto the data collection system.
• The process of data collection/transcription is audited throughout (monitoring – source data verification).

DATA COLLECTION
• Before starting data collection
– Testing
– SOP and PRA
– Training
• During data collection
– Audit

TESTING
• After set-up, test or pilot the system before you use it.
• Maintain an adequate record of this procedure.

DATA COLLECTION SOP and PRA
• It is a good idea to write a Standard Operating Procedure (SOP) or a working practice document detailing how you set up your electronic data capture systems.
• The appropriate persons need to be trained in these.
• A Privacy Risk Assessment (PRA) needs to be written. This document includes:
– Personal data items held in the study, e.g. name, DOB
– Individuals who are granted access to this data
– Procedures for collecting, storing, and sharing personal data
– How personal data will be anonymised
– Possible breaches of confidentiality and how these can be reduced

DATA COLLECTION TRAINING
• After piloting, when the system is working as it should, the next step is to train all users of the system.
• A record should be kept of the training.
• A detailed diagram and description of how data will be collected should be provided at training.
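
SKETCH: APPLYING METADATA VALIDATION RULES
The checks listed under TEST/VALIDATE THE DATABASE (ranges, category coding, missing data) can be illustrated with a minimal Python sketch. It is not the unit's actual system: the METADATA dictionary layout and the validate_record function are hypothetical, and only the variable names, value labels and the date range are taken from the Recruitment CRF 02 example. Skip rules are not sketched because the example does not state their condition.

```python
from datetime import date, datetime

# Hypothetical rule table modelled on the Recruitment CRF 02 metadata example.
# The dictionary layout is an assumption made for illustration only.
METADATA = {
    "datecons": {"type": "date", "format": "%d/%m/%Y",
                 "min": date(2010, 11, 1), "max": date(2012, 6, 1)},
    "sugender": {"type": "category", "values": {1: "Male", 2: "Female"}},
    "consss1": {"type": "category", "values": {0: "no", 1: "yes"}},
}


def validate_record(record):
    """Return a list of (variable, message) problems for one entered record."""
    problems = []
    for name, rule in METADATA.items():
        raw = record.get(name)
        # Missing data check: absent or empty values are reported, not guessed.
        if raw is None or raw == "":
            problems.append((name, "missing value"))
            continue
        if rule["type"] == "date":
            # Dates are assumed to be entered as dd/mm/yyyy strings.
            try:
                value = datetime.strptime(raw, rule["format"]).date()
            except ValueError:
                problems.append((name, "not a valid dd/mm/yyyy date"))
                continue
            # Range check: warn rather than reject, as in the metadata example.
            if value < rule["min"] or value > rule["max"]:
                problems.append((name, "warning: date outside expected range"))
        elif rule["type"] == "category":
            # Coding check: only codes listed in the value labels are accepted.
            if raw not in rule["values"]:
                problems.append((name, "code not in value labels"))
    return problems
```

Running test records with known errors through such a function, and keeping the results, is one way to document that what is in the metadata is what the data entry system applies.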

Participant flowchart
Participant progress:
• Woman identified and agrees to be approached
• Assessed for eligibility and consented
• Randomisation
– Control: routine antenatal care
– Intervention: FNP visits & routine antenatal care
• Birth
– Control: usual services
– Intervention: FNP visits & usual services
Data collection:
• Baseline data (CAPI)
• 34 - 36 weeks gestation (CATI)
• Birth (CRF)
• 6 months post partum (CATI)
• 1 year post partum (CATI)
• 18 months post partum (CATI)
• 2 years post partum (CAPI)
Key
CAPI: Computer Assisted Personal Interview
CATI: Computer Assisted Telephone Interview

DATA COLLECTION AUDIT
• Maintain an audit trail of data changes made in the system.
• A procedure should be in place for when a study participant or other operator capturing data realises that he / she has made a mistake and wants to correct data.
• It is important that original entries remain visible or accessible so that changes are traceable.

ELECTRONIC DATA COLLECTION WHAT IS THIS?
A variety of software and hardware is now being used to collect data:
• PC
• Laptops
• Mobile devices
• Audio
• Visual
• Email transmission
• Web-based systems

ELECTRONIC DATA COLLECTION WHAT IS THIS?
• Some of the fundamental issues we have discussed are common to all modes of electronic data collection as well as data collection on paper.
• IMPORTANT: There should be no loss of quality when an electronic system is used in place of a paper system.

ELECTRONIC DATA COLLECTION SPECIFIC TRAINING ISSUES
• Training on the importance of security, including the need to protect passwords, as well as enforcement of security systems and processes.
• The system user should confirm that he / she accepts responsibility for data entered using their password.
• Maintain a list of individuals who are authorised to access the data capture system and add it to the PRA.
• Ensure that the system can record which user is logged in and when. Remove access promptly when it is no longer required or no longer permitted.
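
SKETCH: AUDIT TRAIL OF DATA CHANGES
The audit points above (original entries stay visible, every change is traceable to a user and a time) can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of any particular data capture product: AuditEntry, AuditedStore and their method names are hypothetical, and requiring a reason for each change is an assumption added for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One immutable record of a data change (hypothetical structure)."""
    timestamp: datetime
    user: str
    participant_id: str
    variable: str
    old_value: object
    new_value: object
    reason: str


class AuditedStore:
    """In-memory store that logs every change instead of overwriting silently."""

    def __init__(self):
        self._data = {}
        self._audit = []

    def set_value(self, user, participant_id, variable, new_value, reason):
        # Assumption for this sketch: no change is accepted without a reason.
        if not reason:
            raise ValueError("a justification is required for every change")
        key = (participant_id, variable)
        old_value = self._data.get(key)
        # The original entry is kept in the log, so it stays accessible.
        self._audit.append(AuditEntry(datetime.now(timezone.utc), user,
                                      participant_id, variable,
                                      old_value, new_value, reason))
        self._data[key] = new_value

    def get_value(self, participant_id, variable):
        return self._data.get((participant_id, variable))

    def history(self, participant_id, variable):
        """Return all logged changes for one data point, oldest first."""
        return [e for e in self._audit
                if e.participant_id == participant_id and e.variable == variable]
```

Because entries are only ever appended, the history of a data point shows who entered the first value, who corrected it, and when.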

DATA ENTRY
• Different types of data entry exist (manual / optical mark recognition system, online / offline, etc.).
• The type of data can also influence the method of data entry (numerical, free text, images, etc.).
• It is important to have documented procedures (SOPs) defining who performs data entry and how it is performed.

DATA ENTRY
• Data entry procedures should be tested at the earlier design stage, and testing adequately documented before sign-off.
• Adequate training on these procedures should be provided.
• Appropriate quality control procedures have to be set up.

ELECTRONIC DATA ENTRY
• Electronic data collection does not usually need a separate ‘data entry phase’; data is normally entered during collection straight onto an electronic CRF.
• Data can be entered straight onto a website, or can be entered onto a laptop and uploaded over the internet onto a server.
• When designing forms to collect data electronically you can include ‘validation rules’. An electronic system can stop the researcher from proceeding with data collection if they break a validation rule.

AFTER DATA COLLECTION
• Regular backups should be made of your data. If outsourcing data collection or storage, ensure that the company has backup systems in place.
• After the trial has finished using data capture systems, you may need to dispose of these or send them to another company, e.g. if they are loaned. Before doing this, you may need to have the hard drive professionally erased as it may still contain participant information.
• You may need to archive whatever data you collect, including both hard copy and electronic data. Documents not archived need to be disposed of securely.

COLLECTING DATA SAFELY
• The safe collection of data in clinical trials is essential for compliance with Good Clinical Practice (CPMP/ICH/GCP/135/95) and the Data Protection Act 1998.
• Because of the increased use of information technology in the collection of trial data, there is a need for clear guidance on how to collect data safely in this manner.
• You need to protect your data capture systems from loss or unauthorised access, while ensuring that they are accessible to those who need them.

COLLECTING DATA SAFELY (CONTINUED)
• You need to protect participants’ identity by using Participant Identifiers (PIDs). PIDs should be used when communicating with other trial team members.
• Electronic information is particularly vulnerable to security threats:
– it can be physically accessed.
– the computer could be lost or damaged.
– it can be remotely accessed through the internet or a virus.
• For each tool that you use to collect data, you must ensure that the system is password protected and encrypted.

DATA SEQUENCE

DATA CLEANING
• Errors / inconsistencies / missing data are spotted at different time points depending on the study and methods used.
• Errors should be corrected where possible, but no changes should be made without proper justification.
• Appropriate audit trails should be kept to document changes in the data (queries form, SPSS syntax…).

DATA CLEANING
• Data manager cleans and validates data entered in the database.
• Problems found, such as missing values or inconsistencies?
– Yes: queries addressed to sites; site resolves and sends back the queries; data manager checks queries resolution; corrections are entered onto the database.
– No: data validated.

REPORTING DATA
• Throughout the course of the study it is usually the responsibility of the Data Manager to report on study progress. These kinds of reports include:
– Recruitment progress
– Follow-up rates
– Serious adverse events (SAEs)
– Data completeness
– Withdrawals

Thank you! Any questions?
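
SKETCH: RAISING QUERIES AND REPORTING DATA COMPLETENESS
The data cleaning loop and the data completeness report described above can be sketched in Python. This is a minimal illustration, not the unit's procedure: the record layout (dictionaries with "pid" and "site" keys) and the function names find_queries and completeness are assumptions made for the example, and only missing values are checked.

```python
def find_queries(records, required):
    """List (pid, site, variable) for every required variable left empty.

    Each tuple stands for one query to be addressed to the site.
    """
    queries = []
    for rec in records:
        for var in required:
            # A legitimate code of 0 is not treated as missing.
            if rec.get(var) in (None, ""):
                queries.append((rec["pid"], rec["site"], var))
    return queries


def completeness(records, required):
    """Percentage of required data points that are filled in."""
    total = len(records) * len(required)
    if total == 0:
        return 100.0
    missing = len(find_queries(records, required))
    return round(100.0 * (total - missing) / total, 1)
```

Queries are listed by participant identifier and site only, in line with the use of PIDs when communicating within the trial team. Corrections returned by sites would then be entered with a documented justification.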