Canine SNP Analysis
Matthew Kusner

Background

• Samples
  – Studying Progressive Retinal Atrophy (PRA)
  – 8 affected & 2 unaffected Italian Greyhounds
• SNP Genotyping and Errors
  – Allele-specific PCR
  – No call vs. allele miscall
• SNP data
  – File conversion
  – Ordered by genomic position
  – Genotype "codes"
    • 0 = AA
    • 1 = AB
    • 2 = BB
    • -1 = unknown
• Disease Characteristics
  – Autosomal Recessive
    • Increased likelihood of some loss of genomic functionality
    • Disease loci near homozygous SNPs
    • Duplicate gene copies
    • Larger region = possibly more genes involved
• Program Nomenclature
  – For a given SNP:
    • Consistent: all affected dogs are homozygous for one allele (A or B).
    • Inconsistent: the affected dogs show a combination of both alleles (A and B).
    • Informative: the affected and unaffected dogs together show a combination of both alleles.
    • Uninformative: all affected and unaffected dogs are homozygous for one allele.

Program

• Seed Formation
  – Run through the data.
  – Form blocks of >= 5 consecutive consistent SNPs. These are the seeds.
• Expansion
  – Forward (based on genomic position) seed expansion if one of these cases applies:
    • The next SNP is consistent.
    • The next SNP is inconsistent, yet inconsistent SNPs are under-represented in the block.
    • The next SNP is inconsistent based on one allele, and such SNPs are under-represented. We'll call these "changed" SNPs.
  – Backward (based on genomic position) seed expansion is the same.
  – If a block collision occurs, the blocks are combined.
• Merge
  – Blocks are merged (equivalent to "combining" blocks as above) if both cases apply:
    • An under-representation of inconsistent SNPs in the merged block.
    • An under-representation of changed SNPs in the merged block.
• Sort
  – Blocks are sorted first by size (based on the earlier reasoning), then by Consistent/Informative product (for blocks of the same size).
  – Blocks are then output.
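
The nomenclature and the seed-formation step can be sketched in Python as below. This is a minimal illustration, not the program described in the slides: the function names (`is_consistent`, `is_informative`, `find_seeds`) are hypothetical, and the handling of no-calls (skip them, and treat a SNP with no calls at all as neither consistent nor informative) is an assumption the slides do not state. Expansion and merging are left out because the slides give no threshold for "under-representation".

```python
from typing import List, Sequence, Tuple

# Genotype codes from the slides: 0 = AA, 1 = AB, 2 = BB, -1 = unknown.
UNKNOWN = -1


def is_consistent(affected: Sequence[int]) -> bool:
    """True if every called affected genotype is homozygous for the same allele.

    Assumption: no-calls (-1) are skipped; a SNP with no calls is not consistent.
    """
    called = [g for g in affected if g != UNKNOWN]
    if not called:
        return False
    return all(g == 0 for g in called) or all(g == 2 for g in called)


def is_informative(genotypes: Sequence[int]) -> bool:
    """True if affected and unaffected dogs together carry both alleles.

    Assumption: no-calls (-1) are skipped.
    """
    called = [g for g in genotypes if g != UNKNOWN]
    has_a = any(g in (0, 1) for g in called)  # AA or AB carries allele A
    has_b = any(g in (1, 2) for g in called)  # AB or BB carries allele B
    return has_a and has_b


def find_seeds(consistent: Sequence[bool], min_len: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each run of at
    least min_len consecutive consistent SNPs (SNPs ordered by position)."""
    seeds: List[Tuple[int, int]] = []
    start = None
    for i, flag in enumerate(consistent):
        if flag and start is None:
            start = i  # a new run of consistent SNPs begins here
        elif not flag and start is not None:
            if i - start >= min_len:
                seeds.append((start, i))
            start = None
    # Close a run that reaches the end of the data.
    if start is not None and len(consistent) - start >= min_len:
        seeds.append((start, len(consistent)))
    return seeds


# Made-up affected-dog genotypes for nine SNPs, ordered by genomic position.
example_affected = [
    [0, 0, 0], [0, 0, -1], [0, 0, 0], [2, 2, 2], [0, 0, 0], [0, 0, 0],
    [0, 1, 2],
    [2, 2, 2], [2, 2, 2],
]
flags = [is_consistent(snp) for snp in example_affected]
print(find_seeds(flags))  # [(0, 6)]
```

In the made-up example, the first six SNPs form one seed; the last two consistent SNPs are too few to form a seed of their own.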