Supplementary Material

MAGIC-SPP: a database-driven DNA sequence processing package with associated management tools

Chun Liang1, Feng Sun1, Haiming Wang1, Junfeng Qu2, Robert M. Freeman, Jr.3, Lee H. Pratt1 and Marie-Michèle Cordonnier-Pratt1

1 Laboratory for Genomics and Bioinformatics, Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
2 Department of Computer Science, University of Georgia, Athens, GA 30602, USA
3 Department of Systems Biology, Harvard Medical School, Boston, MA 02115

14 February 2006

[Figure 1 diagram labels: Trace Files; MAGIC-SPP; MAGIC POLYMORPHISM; MAGIC-CLUSTER; MAGIC ANNOTATION; MAGIC MICROARRAY]

Figure 1: Relationship between MAGIC-SPP and other components of the complete MAGIC system. Information about sequence reads obtained from trace files processed through MAGIC-SPP is stored in MAGIC DB. While MAGIC-SPP can be used as a stand-alone package, it also provides input for four other major subsystems: annotation, EST clustering, detection of sequence polymorphisms, and MIAME-compliant microarrays. These downstream subsystems have been designed to benefit from features unique to MAGIC-SPP, for example, the inclusion of genotype information in sequence names.
[Figure 2 diagram. Title block: Project: MAGIC_SPP (Figure 041216); Author: Lee Pratt (print@50%); Company: University of Georgia; Version: 1.0; Modified: 2004/12/17; Copyright (c) 2004 University of Georgia.

Tables shown in the diagram: V2_ALIAS_ENTRY_NAME, V2_ALIAS_LIBRARY, V2_ALIAS_PLATE, V2_ALIAS_PRIMER_MAPPING, V2_BACTERIA_PLATE_CONFIG, V2_BACTERIA_PLATE384, V2_BACTERIA_PLATE96, V2_BUFFER_COMPOSITION, V2_CATEGORY, V2_CHLORO_SCREEN_PROCESS, V2_CHROMAT, V2_CLONE, V2_CLONE_SEQ_ASSOCIATION, V2_COMMENTS, V2_CONFIG_PLATE, V2_CONFIG_WELL, V2_CONTACT, V2_DNA_PREP_PLATE96, V2_ECOLI_SCREEN_PROCESS, V2_FIELD_DEFINITION, V2_FIELD_VALUE_ASSOCIATION, V2_FIELD_VALUE_DEFINITION, V2_GENOTYPE, V2_GENOTYPE_ASSOCIATION, V2_GENOTYPE_GEN, V2_GENUS, V2_GENUS_SCREEN_DEF, V2_GT_COMBINE_CODE, V2_LAB, V2_LAB_PI, V2_LIB_SEQ_NAME_FORMAT, V2_LIB_SEQ_NAME_FORMAT_PART, V2_LIBRARY, V2_LIBRARY_CATEGORY_PERMISSION, V2_LIBRARY_GENOTYPE_RATIOS, V2_LIBRARY_PRIMER, V2_LIBRARY_USER_PERMISSION, V2_LIBRARY_VECTOR, V2_LIMS_TO_UNIT_NUM, V2_MAGIC_SEQUENCE, V2_MASTER_MIX_PREP, V2_MITO_SCREEN_PROCESS, V2_ORACLE_SEQ_LIST, V2_PLATE96_SEQUENCING_NAME, V2_PRIMER, V2_PRIMER_INSERT, V2_PRIMER_VECTOR, V2_PROCESS_PLATE96_DIR_CONTROL, V2_PROCESS_PLATE96_PROFILE_GB, V2_PROGRAM, V2_PUBLICATION, V2_QUAL_PROCESS_DEF, V2_QUAL_PROCESS_UPDATE, V2_RRNA_SCREEN_PROCESS, V2_SAME_GT_CLONE, V2_SAME_GT_CLONE_IN_LIB, V2_SEQ_ADAPTOR_SCREEN, V2_SEQ_ADAPTOR_SEGMENT, V2_SEQ_BASE_QUALITY, V2_SEQ_CHLORO_SCREEN, V2_SEQ_CHLORO_SSAHA, V2_SEQ_ECOLI_SCREEN, V2_SEQ_ECOLI_SSAHA, V2_SEQ_GAP_10PLUS_SCREEN, V2_SEQ_GAP_5_9_SCREEN, V2_SEQ_MITO_SCREEN, V2_SEQ_MITO_SSAHA, V2_SEQ_POLYT_GAP, V2_SEQ_POLYT_SCREEN, V2_SEQ_POLYT_SEGMENT, V2_SEQ_PROJECT, V2_SEQ_PROJECT_LAB, V2_SEQ_PROJECT_LIB, V2_SEQ_QUAL_STATS, V2_SEQ_QVS_ADD_INFO, V2_SEQ_QVS_MULTI_ATGC, V2_SEQ_RRNA_SCREEN, V2_SEQ_RRNA_SSAHA, V2_SEQ_VECTOR_SCREEN, V2_SEQUENCE_GENBANK, V2_SEQUENCER_INSTRUMENT, V2_SEQUENCER_RUN_PROFILE, V2_SEQUENCER_RUN_PROFILE_ASSO, V2_SEQUENCING_PLATE, V2_SEQUENCING_UNIT_PLATE96, V2_SPECIES, V2_TABLE_LIST, V2_TEMPLATE_PLATE, V2_TEMPLATE_WELL, V2_TH_CLEAN_PLATE, V2_TH_CLEAN_UNIT_PLATE96, V2_TH_INSTRUMENT, V2_TH_PLATE, V2_TH_UNIT_PLATE96, V2_UNIT_NUM_LOG, V2_UNIT_NUM_SERIES, V2_UPLOAD, V2_USER, V2_VECTOR, V2_WELLPLATE96_SEQUENCING_NAME.]

Figure 2: Entity-Relationship diagram of the MAGIC-SPP database. To see detail, use the ‘Zoom To…’ function in the Adobe Acrobat View menu.

Figure 3: A screen shot of MAGIC Admin.
In addition to one of the library definition screens illustrated here, note the many pages available not only for entry of contact and publication information, but also for primer, vector and genotype definitions. In addition to what is shown, the sequencing primers and vector used with this library, together with all required information as described in the text, are entered using pages selected from the options along the left-hand border. These pages also permit input of target information for each sequencing project and specify user permissions with respect to both entry and viewing of data. Among other functions, the Contact Information page supports creation of user accounts.

Figure 4: A screen shot of MAGIC SEQ-LIMS. This interface tracks all steps in DNA sequencing, from picking of bacterial colonies to upload of trace files into the analysis server.

Figure 5: A screen shot of the MAGIC-SPP Sequence Statistics interface. This interface provides sequencing performance statistics for one or more plates in a selected library. In this view, all plates were selected and are shown both collectively (above the red line) and individually (below the red line).

Figure 6: A screen shot of the MAGIC-SPP Status Report interface. Target information, including the number of colonies to be picked and the number of sequences to be obtained, is entered through MAGIC Admin (columns 1, 2, 5 and 6 from the left). Other columns report progress. The query page for this interface also permits retrieving and viewing only data entered between any two dates.

Figure 7: A screen shot of MAGIC-SPP Plate Viewer. This interface assists with troubleshooting by showing sequences from both vector directions for the same plate. For example, in this illustration wells A1, A10, A12, C11, and E10 have all failed in both the forward (top) and reverse (bottom) sequencing directions, indicating that these failures are most likely due to missing or poor-quality template DNA. Numbers within wells represent Q16VS read lengths.
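
The troubleshooting logic described for Figure 7 can be illustrated with a minimal Python sketch. It is not part of MAGIC-SPP: the function name, the dictionary layout (well name mapped to Q16VS read length) and the failure cutoff of 100 bases are all assumptions made for illustration. A well is reported only when its reads fail in both the forward and the reverse direction, which is the pattern the caption attributes to missing or poor-quality template DNA.

```python
from typing import Dict, List

# Assumed cutoff (hypothetical): a read shorter than this many Q16VS bases
# is treated as a failed read in this sketch.
MIN_Q16VS_LENGTH = 100


def wells_failed_both_directions(
    forward: Dict[str, int],
    reverse: Dict[str, int],
    min_length: int = MIN_Q16VS_LENGTH,
) -> List[str]:
    """Return wells whose forward AND reverse Q16VS read lengths are both
    below min_length. A well missing from one direction is treated as a
    read of length 0 in that direction."""
    wells = set(forward) | set(reverse)
    failed = [
        well
        for well in wells
        if forward.get(well, 0) < min_length and reverse.get(well, 0) < min_length
    ]
    # Sort by plate row letter, then by numeric column (A1, A2, ..., A10).
    return sorted(failed, key=lambda w: (w[0], int(w[1:])))


# Illustrative (made-up) read lengths for four wells of one plate.
forward_reads = {"A1": 0, "A2": 512, "A10": 35, "B1": 20}
reverse_reads = {"A1": 12, "A2": 480, "A10": 0, "B1": 450}

print(wells_failed_both_directions(forward_reads, reverse_reads))
```

In this made-up example, well B1 fails only in the forward direction and is therefore not reported; a failure confined to one direction points to a problem with that sequencing reaction, not with the template.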