Huvariome

The central requirement for implementing NGS in clinical practice is simple and secure access to databases of curated variants that have been scored as clinically relevant pathogenic mutations, together with standardized clinical reporting. Huvariome provides the user with whole-genome allele frequencies, their associated quality scores (detection and chance to detect the variant), gene-based ranking, and integrated access to publicly available data for the detection of common, rare and deleterious variants. The functional impact of variants in Huvariome is provided by the Complete Genomics (CG) annotation pipeline.

The novelty of Huvariome is that it provides rapid and simple access to SNVs, short indels and de novo assembled regions at any position in the human genome, with allelic frequencies and the associated error for each position. Huvariome also delivers common variants from a small cohort of Benelux genomes from unrelated individuals with no disease association. In light of these developments we have developed Huvariome, a simple application that goes beyond current platforms with similar goals by enabling clinical research scientists to search allelic frequencies efficiently in both public and private genomes. The following pages describe in detail how to use Huvariome.

hg18 variant analysis

The current database contains whole genomes from 165 individuals, all mapped to hg18. The following example describes how to query Huvariome and what Huvariome returns. In this example, known variants from EGFR were downloaded from the UCSC Table Browser and a subset was used to query Huvariome.

Huvariome query
7	55216556
7	55216983
7	55217384
7	55217551
7	55217609
7	55217875
7	55217935
7	55217949
7	55218287
7	55218347
7	55218596
7	55218972
7	55219034
7	55219086

[Fig 1A] [Fig 1B]

Fig 1. Variants from EGFR from UCSC (A) and the chromosome number and start position used for the query to Huvariome (B).
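Before pasting a long list into the query box, it can help to check locally that every line really is a chromosome number followed by a start position. The following is a minimal sketch of such a check, not part of Huvariome: the function name parse_query and its validation rules are assumptions made for illustration, and Huvariome's own input checking may differ.

```python
def parse_query(text):
    """Parse query text into (chromosome, start) pairs, one per non-empty line.

    Hypothetical helper for checking a query locally; not a Huvariome API.
    """
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()  # accepts tabs or spaces between the two fields
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(fields)}")
        chrom, start = fields
        if not start.isdecimal():
            raise ValueError(f"line {line_no}: start must be a non-negative integer")
        pairs.append((chrom, int(start)))
    return pairs


# First three positions of the EGFR example query.
example = "7 55216556\n7 55216983\n7 55217384"
pairs = parse_query(example)
print(pairs)
```

A malformed line (for example one with an extra column) raises ValueError with the offending line number, so the list can be corrected before it is submitted.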
The query takes the form of a chromosome number (e.g. 7) and a 0-based start location (e.g. 55216556), as in the example above (Fig 1). The user then chooses to keep the default query build, NCBI36 (hg18), and submits the request by clicking the Run Variomatic button at the bottom of the query page (Fig 1B). The results page is then returned. Depending on the size of the request or the load on the server, a notification appears stating that the page will be refreshed every 30 s; the page can also be refreshed manually by pressing F5 on the keyboard (Fig 2). If the request is large, Huvariome returns the genomic locations and the genotype frequencies first, before the annotations, to improve usability by reducing the user's waiting time (Fig 3).

[Fig 2] [Fig 3]

Fig 2. Results page from Huvariome.

Fig 3. If the query is large, the chromosome location of the query and the genotypes are returned with a warning that the annotation will be delivered as soon as possible.

The results page for the 14 variants submitted from EGFR is displayed in Fig 4, initially with the Diversity panel genotypes in the primary display.

Fig 4. An illustration of the analysis of five genomic locations using the guest account. The balloons labeled on this figure outline the key data that are returned for each variant (each row). The frequency of each genotype is indicated by the size of the associated blue bar. Abbreviations: gsym = gene symbol, comp = gene component (e.g. exon, intron), xref = external reference for variants, dgv = Database of Genomic Variants, vista = VISTA enhancers (http://enhancer.lbl.gov/).

hg19 variant analysis

The current database contains whole genomes from 165 individuals, all mapped to hg18. Variants mapped to hg19 can also be queried against Huvariome, which returns both the hg19 position and the hg18 position of the submitted variant.
Huvariome uses the UCSC mapping table to "lift over" hg19 variants to the hg18 results currently stored in the database. In the following example, known variants from EGFR were downloaded from the UCSC Table Browser and a subset (Fig 5A) was used as a test case to query Huvariome. UCSC uses 0-based, half-open notation for nucleotide positions, whilst Huvariome uses 0-based start positions; thus, as with the hg18 query, only the chromosome number and start position are required to query Huvariome, in tab-delimited format (Fig 5B).

Fig 5A
Chromosome  Start     End       Alleles
chr7        55249062  55249063  A,G
chr7        55249489  55249490  A,G
chr7        55249890  55249891  C,T
chr7        55250057  55250058  A,G
chr7        55250115  55250116  C,T
chr7        55250381  55250382  A,G
chr7        55250441  55250442  A,G
chr7        55250455  55250456  A,G
chr7        55250793  55250794  C,T
chr7        55250853  55250854  G,T
chr7        55251102  55251103  A,G
chr7        55251478  55251479  C,T
chr7        55251540  55251541  A,G
chr7        55251592  55251593  A,G

Fig 5B
Huvariome query
7	55249062
7	55249489
7	55249890
7	55250057
7	55250115
7	55250381
7	55250441
7	55250455
7	55250793
7	55250853
7	55251102
7	55251478
7	55251540
7	55251592

[Fig 5C]

Fig 5. Variants from EGFR from UCSC (A), and the chromosome number and start position (B) used for the query to Huvariome (C).

The resultant query (Fig 5B) is inserted into the query box on the Huvariome start page; the user then selects GRCh37 (hg19; lift over to hg18) and clicks the Run Variomatic button (Fig 5C). The results are displayed as in the hg18 view, but the hg19 position is now displayed as original (chr, pos), as shown in Fig 6 below.

[Fig 6]

Fig 6. Huvariome output: hg18 (chr, pos), hg19 (original chr, pos), ref = reference allele, a1 & a2 = genome set alleles, nc rate = no calls, xref = external reference, impact = effect on gene, gsym = gene symbol, comp = gene feature, dgv = Database of Genomic Variants, vista = human regulatory elements, common var = common variant (>=5% MAF in 31 genomes, denoted by a "1").
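The step from the UCSC rows (Fig 5A) to the tab-delimited query (Fig 5B) can be sketched in a few lines of Python. This is an illustration only: the function name ucsc_rows_to_query is hypothetical, and the sketch assumes that each row carries chromosome, start, end and alleles columns and that removing the "chr" prefix is the only change needed to the chromosome name.

```python
def ucsc_rows_to_query(rows):
    """Turn UCSC-style rows (chrom, start, end, alleles) into query lines.

    Hypothetical helper; not part of Huvariome or the UCSC tools.
    """
    lines = []
    for chrom, start, _end, _alleles in rows:
        # Assumption: the query uses the bare chromosome number ("7", not "chr7").
        number = chrom[3:] if chrom.startswith("chr") else chrom
        # The UCSC start is already 0-based, so it is passed through unchanged;
        # the end coordinate and alleles are not needed for the query.
        lines.append(f"{number}\t{int(start)}")
    return "\n".join(lines)


# First three rows of Fig 5A.
rows = [
    ("chr7", 55249062, 55249063, "A,G"),
    ("chr7", 55249489, 55249490, "A,G"),
    ("chr7", 55249890, 55249891, "C,T"),
]
query = ucsc_rows_to_query(rows)
print(query)
```

The resulting text can be pasted into the query box as described above. Because the start coordinate is used as-is, no adjustment between coordinate systems is made in this sketch.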