Bioinformatics at The Roslin Institute
Andy Law

Bioinformatics Activities
• The interface between computer science and biology
[Diagram labels: Computer Scientists; Biologists; Roslin Bioinformatics Group; New algorithms; New tools; Integrating distinct tools; Providing access to / maintaining tools; Automation scripts; Using tools and scripts]

Our Role
• Grew out of the genomics programmes
• Provide tools, and advice on their use
• Make routine data handling easier

QTL-mapping
[Diagram labels: Pedigree Records; Trait Records; Genotypes; Analysis Programs]

Analysis Programs
• Law’s First Law
  – The first step in designing a new genetic analysis program is to determine how to make the input file sufficiently different from all previously defined genetic analysis programs

Consequences
• Data must be manipulated to suit each and every analysis program
• Changes made to the data set as a result of one analysis may not be propagated to the other copies of the data

1 3
A197 GGQW Z113
AF1 10
1 0 0 1 1 2 1 1
2 0 0 0 1 2 1 2
3 0 0 1 1 1 1 1
4 0 0 0 3 3 1 1
5 2 1 1 1 2 1 2
8 4 3 0 1 3 1 1
11 8 5 1 1 2 1 2
12 8 5 1 1 3 1 1
13 8 5 0 2 3 1 1
14 8 5 0 1 3 1 2
1 5 2 3 4 6 3 3 1 3 3 4 0 0 3 3 1 4 1 4

QTL-mapping
• Other problems
  – Sharing Data
    • Genotyping lab may be different from the lab that recorded the traits
    • Analysis may be performed by a different lab
    • Populations may overlap
• Need…
  – An easily accessible database

QTL-mapping
[Diagram labels: Pedigree Records; Trait Records; Genotypes; resSpecies; Analysis Programs]

Analysis Programs
• Law’s Second Law
  – Error messages should not be provided
  – If error messages are provided, they must be cryptic and convey as little information as possible

Consequences
• Errors in the data (e.g. Mendelian inheritance errors) result in long down-times hunting for the cause of the problem

QTL-mapping
[Diagram labels: Pedigree Records; Trait Records; Genotypes; resSpecies; Analysis Programs]

Code re-use
• Same code as we use in the web-system…
• …is the basis of a stand-alone application

Our Role
• Grew out of the genomics programmes
• Provide tools, and advice on their use
• Make routine data handling easier

Analysis Programs
• Law’s Third Law
  – The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study
    ... and is frequently many, many more

A recent example
• Data set of genotypes
  – 1152 samples
  – 6654 markers

A recent example
• Delivered as a matrix
  – Markers in rows
  – Samples in columns
  – Genotypes as ‘AA’, ‘AB’, ‘BB’ or ‘NC’
• To be converted to a tab-delimited list of sample name, marker name, allele1, allele2

A recent example
• But…
  – Sample name in matrix file suffixed with ‘.GType’
  – Maps to sample name in ‘Sample Sheet’ prefixed with ‘COL_’
  – Maps to ‘Plate/Row/Column’ designation
  – Maps to DNA number
  – Maps to Tissue Sample Bag number…
  – Which, together with slaughterhouse designation, maps to sample number

Who
• David Speed
• Neil Bartley
• Fahad Ifthkar
• Phil Devall
• John Bowman
• Jan Aerts
• Wilfrid Carre
• Zen Lu
• Trevor Paterson
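
The conversion described in the "A recent example" slides (a marker-by-sample matrix turned into a tab-delimited list of sample name, marker name, allele1, allele2) could be sketched roughly as follows. This is a minimal illustration, not the group's actual code. It assumes the matrix file is tab-delimited with a header row of sample names. The function names, the `id_map` lookup (which stands in for the whole chain of identifier mappings), and the choice to write a no-call as two `0` alleles are all hypothetical.

```python
import csv
import io

SUFFIX = ".GType"     # suffix on sample names in the matrix file (from the slides)
NO_CALL = ("0", "0")  # assumption: how an 'NC' (no call) genotype is written out


def clean_sample_name(name, id_map=None):
    """Strip the '.GType' suffix, then apply an optional identifier lookup."""
    if name.endswith(SUFFIX):
        name = name[: -len(SUFFIX)]
    # id_map is a hypothetical dict standing in for the chain of mappings
    # (sample sheet name, plate position, DNA number, bag number, ...).
    return (id_map or {}).get(name, name)


def matrix_to_long(matrix_file, out_file, id_map=None):
    """Write one 'sample, marker, allele1, allele2' line per genotype."""
    reader = csv.reader(matrix_file, delimiter="\t")
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    header = next(reader)  # first cell labels the marker column
    samples = [clean_sample_name(s, id_map) for s in header[1:]]
    for row in reader:
        if not row:  # skip blank lines
            continue
        marker, calls = row[0], row[1:]
        for sample, call in zip(samples, calls):
            if call == "NC":
                allele1, allele2 = NO_CALL
            elif len(call) == 2:
                allele1, allele2 = call[0], call[1]
            else:
                # Say which value, sample and marker caused the problem.
                raise ValueError(
                    f"unexpected genotype {call!r} for {sample}/{marker}"
                )
            writer.writerow([sample, marker, allele1, allele2])


if __name__ == "__main__":
    demo = io.StringIO("Marker\tS1.GType\tS2.GType\nM1\tAA\tNC\nM2\tAB\tBB\n")
    result = io.StringIO()
    matrix_to_long(demo, result, id_map={"S2": "animal_2"})
    print(result.getvalue(), end="")
```

Keeping the identifier lookup in one place means a corrected mapping only has to be fixed once, and the explicit `ValueError` names the offending sample and marker instead of failing silently.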