Introduction to BioComputing
Biology in silico
3rd February 2010

Carrie Iwema, PhD, MLS
Molecular Biology Information Specialist
Health Sciences Library System, University of Pittsburgh
iwema@pitt.edu
http://www.hsls.pitt.edu/guides/genetics

General Topics
• Information Overload
• Genome
• Gene
• Protein

Specific Topics
• Information Overload: PubMed; Alternatives to PubMed (GoPubMed, Novoseek, PubGet); Molecular Databases; HSLS Molecular Biology Information Service
• Genome: Genome Biology; Genome Browsers (UCSC Genome Browser, NCBI MapViewer)
• Gene: Entrez Gene
• Protein: UniProt

Information Overload
PubMed: 5,394 journals; 1.3 billion searches in 2009
Articles retrieved per topic:
• Breast Cancer: 209K
• Colon Cancer: 84K
• p53: 52K
• STAT1: 4K

Alternatives to PubMed

Growth of Molecular Databases
2008: 1075
2009: 1170
2010: 1230
Source: Nodal Point Blog

Molecular Databases
Journals:
• Nucleic Acids Research (Oxford Journals): Annual Database Issue, Annual Web Server Issue
• Bioinformatics (Oxford Journals)
• BMC Bioinformatics (BioMed Central)
• Database (Oxford Journals) *new in 2009*
Articles on “genetic databases”:
• PubMed: 21,851 results
• MeSH: 16,398 results

HSLS Molecular Biology Information Service
• Workshops
• Bioinformatics Consultations
• Website
• Software Licensing

HSLS OBRC

HSLS OBRC in Science
• 2441 links to databases and software
• ~3000 hits/day

search.HSLS.MolBio
Integrated search system covering:
• Databases & Software
• Articles on Databases & Software
• Genes/Proteins
• Pathways
• Protocols
• Videos
• Recommended Articles
Features: tabbed browsing, clustered search results

Hands-on exercises
• Locate databases on: natural antisense, UTR, copy number variation
• Retrieve gene information for: your favorite gene, BRCA1, STAT1
• Find a suitable protocol for: methylation PCR, in situ hybridization, primer design
• Identify videos on: protein structure prediction, human genome project

Genome Biology

From Cell to Gene
Human Genome Project Video

Genome Biology Time Line
1976  RNA bacteriophage MS2
1995  Haemophilus influenzae
2001  Human genome draft sequence
2003  Published complete human reference genome
2007  Diploid genome sequence of an individual human
2008  Jim Watson genome
2010  Published complete genomes: 1191 organisms

Genome Resources
• NCBI Genome Resources: Genome Project; Genome (6108 species)
• Genomes OnLine Database (GOLD)
• JGI: Integrated Microbial Genomes

NCBI Genome Resources
Practice Question
Query: Check the status of genome sequencing for an organism, such as rabbit.
Answer:
1. Pick an organism or metagenome project name.
2. Search the Genome Project database. For the most precise results, specify the organism field when searching with an organism name, for example: human[orgn].
3. If there is more than one result, click on the desired Genome Project.
4. The Genome Project summary page provides information on available projects and their sequencing status.

NCBI Genome Project
A collection of complete and in-progress large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms. The database is organized into organism-specific overviews that function as portals for browsing and retrieving projects pertaining to each organism.
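The organism-field search from the practice question can also be written as a scripted query. The following is a minimal sketch, not an official recipe: it only builds a request URL for NCBI's E-utilities `esearch` endpoint and does not send it. The database name `bioproject` (the later successor to Genome Project) and the helper names `organism_term` and `esearch_url` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def organism_term(name):
    """Restrict a search term to the organism field, as in human[orgn]."""
    return f"{name.strip()}[orgn]"


def esearch_url(db, term, retmax=20):
    """Build (but do not send) an esearch request URL."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# Assumption: the Genome Project records are searched under db="bioproject".
url = esearch_url("bioproject", organism_term("rabbit"))
print(url)
```

Keeping the field tag in its own helper makes it easy to swap in another organism, for example `organism_term("human")`.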
NCBI Genome Project: Rabbit Genome

NCBI Entrez Genome

Genomes Online Database (GOLD)
http://genomesonline.org/index2.htm
Global resource for comprehensive access to information regarding complete and ongoing genome projects, metagenomes, and metadata.

“genome sequencing has come of age, and genomics will become central to microbiology's future. It may appear at the moment that the human genome is the main focus and primary goal of genome sequencing, but do not be deceived. The real justification in the long run, is microbial genomics”
Carl Woese, 1998

Genome Browsers

Genome Browsers: What are they?
Genome browsers enable researchers to visualize and browse entire genomes with annotated data, including gene prediction and structure, proteins, expression, regulation, variation, and comparative analysis.
Genome Browsers
The Big Three:
• NCBI MapViewer (display: vertical)
• UCSC Genome Browser (display: horizontal)
• EBI Ensembl (display: horizontal)
Also:
• Generic Genome Browser (GBrowse)
• JBrowse (Ajax-based, like Google Maps)

Tutorial Articles

Tutorial/Seminar Videos

UCSC Genome Browser

Navigating the Human Genome
Browse the region of human chromosome 7 between bp 54,318,043 and 55,974,438 in the UCSC Genome Browser.

UCSC Genome Browser: navigating a genomic region
• Set up basic browser parameters
• Start fresh
• Browse the region of human chromosome 7 between bp 54,318,043 and 55,974,438
• What genes are present in this region?

NCBI sequence databases
• RefSeq: based on GenBank records; non-redundant, expert-verified databases of reference sequences
• GenBank: archival database of nucleotide sequences from >160,000 organisms

International Nucleotide Sequence Database Collaboration

Primary vs. Derivative databases

RefSeq Scope & Accessions
Genomic DNA:
• NC_123456: complete genome, chromosome, plasmid
• NG_123456: genomic region
• NT_123456: genomic contig
mRNA:
• NM_123456
Protein:
• NP_123456
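The accession prefixes listed above lend themselves to a simple lookup. The sketch below covers only the five prefixes named on the slide. The function name `classify_refseq` and the optional `.version` suffix handling are illustrative assumptions, and any other prefix is reported as unknown rather than guessed.

```python
import re

# Prefix meanings taken from the RefSeq Scope & Accessions slide.
REFSEQ_PREFIXES = {
    "NC": "genomic DNA: complete genome, chromosome, or plasmid",
    "NG": "genomic DNA: genomic region",
    "NT": "genomic DNA: genomic contig",
    "NM": "mRNA",
    "NP": "protein",
}

# Two letters, an underscore, digits, and an optional ".version" suffix.
_ACCESSION = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_refseq(accession):
    """Return the molecule type for a RefSeq-style accession, or None."""
    match = _ACCESSION.match(accession.strip().upper())
    if not match:
        return None
    return REFSEQ_PREFIXES.get(match.group(1))


for acc in ["NC_123456", "NM_123456", "NP_123456"]:
    print(acc, "->", classify_refseq(acc))
```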
UCSC Genome Browser: navigating a genomic region
Browse the region of human chromosome 7 between bp 54,318,043 and 55,974,438, then:
• Zoom in and display only the EGFR gene. Select the gene region from the “Scale” track to zoom in.
• Display all single nucleotide polymorphisms (SNPs) present in this gene.
• Retrieve the nucleotide sequence of this genomic region, showing all exons in blue and SNPs in red, bold, and underlined (sequence view).
• Look in the probable promoter region and see if there’s anything interesting. Zoom out.
• What transcription factors bind in this region?
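The region used in these exercises can be handled programmatically as well. The sketch below parses the coordinates, reports the region size, and builds a browser link using the `db` and `position` parameters of the UCSC `hgTracks` page. The assembly name `hg19` is an assumption, since the slides do not say which assembly was used, and the helper names are hypothetical. Coordinates are treated as 1-based and inclusive, as in the browser's position box.

```python
from urllib.parse import urlencode


def parse_region(text):
    """Split 'chr7:54,318,043-55,974,438' into (chrom, start, end)."""
    chrom, _, span = text.strip().partition(":")
    start_s, _, end_s = span.partition("-")
    start = int(start_s.replace(",", ""))
    end = int(end_s.replace(",", ""))
    if not chrom or start > end:
        raise ValueError(f"not a valid region: {text!r}")
    return chrom, start, end


def region_length(start, end):
    """Number of bases in a 1-based, inclusive interval."""
    return end - start + 1


def ucsc_url(db, chrom, start, end):
    """Build a UCSC Genome Browser link for the given region."""
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{end}"})
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + query


chrom, start, end = parse_region("chr7:54,318,043-55,974,438")
print(chrom, start, end, region_length(start, end))
# Assumption: "hg19" stands in for whichever assembly the exercise used.
print(ucsc_url("hg19", chrom, start, end))
```

The region spans 1,656,396 bases, so questions about a single gene such as EGFR require zooming in.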
Discovery Tool…

NCBI MapViewer

NCBI MapViewer How To: view or download features around an object or between two objects on a chromosome
Starting with chromosomal coordinates:
1. Begin on the Map Viewer home page.
2. Click the "R" icon under Tools for the desired organism and build.
3. Select the chromosome, enter the coordinates in the From and To boxes, and click Go. Use either exact coordinates (e.g., 61551076) or values such as 61M or 61551K.
4. If necessary, use the Maps & Options dialog box to change the displayed maps; the maps and region displayed determine the data available.

Entrez Gene

Common Questions
• What is its genomic sequence?
• How many splice variants are there?
• What is its intron-exon architecture?
• What is its function?
• In which tissues is it expressed?
• What are its neighboring genes?
• What diseases are associated with it?
• How can I get its cDNA clone?
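Returning to the Map Viewer coordinate entry described above, the From and To boxes accept exact positions such as 61551076 as well as shorthand such as 61M or 61551K. The sketch below converts such values to base positions. It assumes that K means thousands and M means millions of bases, which the slide implies but does not state, and `parse_coordinate` is a hypothetical helper name.

```python
def parse_coordinate(text):
    """Convert '61551076', '61M', or '61551K' to an integer base position."""
    # Assumption: K = 1,000 bases and M = 1,000,000 bases.
    multipliers = {"K": 1_000, "M": 1_000_000}
    value = text.strip().upper().replace(",", "")
    if value and value[-1] in multipliers:
        return int(value[:-1]) * multipliers[value[-1]]
    return int(value)


for raw in ["61551076", "61M", "61551K"]:
    print(raw, "->", parse_coordinate(raw))
```

Under this reading, 61551K rounds the exact position 61551076 down to the nearest thousand bases.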
NCBI: Entrez Gene
• Chromosomal Localization
• Genomic Sequence
• mRNA Sequence
• Amino Acid Sequence
• Homologous Sequences
• Expression Profile
• Disease
• 3D Structure
• SNP
• Interacting Partners

Entrez Gene
Find:
• gene symbols and aliases
• sequences: genomic, mRNA, protein
• intron-exon architecture
• genomic context: neighboring and antisense genes
• interacting partners
• associated gene ontology terms: function, cellular component, and biological process

Entrez Gene
• a searchable database of genes, from RefSeq genomes, defined by sequence and/or located in the NCBI Map Viewer
• each record represents a single gene from a given organism

Entrez Gene Sequences
• Genomic Seq
• mRNA Seq
• Protein Seq

Entrez Gene Links

Gene Ontology (GO)
Controlled vocabulary tagging:
• Function
• Biological Processes
• Cellular Component

Entrez Gene: Gene Table
Introns/Exons

Try it!
Find the mRNA sequence for your gene of interest

Find mRNA Sequence for Reelin Gene

FASTA vs GenBank records

NCBI Entrez Gene Tutorials
• Information page with wiki, video, blog, etc.
• Entrez Gene: A Directory of Genes, NCBI Handbook
• Short Video Tutorial (MIT)

UCSC Genome Browser: find a gene in the genome

Bioinformatics Databases & Software Providers
• NCBI: Home page, Site map, Resource Guide
• EBI: Home page, Databases, Software

UniProt
• world's most comprehensive catalog of information on proteins
• a central repository of protein sequence and function
• created by joining the information contained in Swiss-Prot, TrEMBL, and PIR

Thank you! Any questions?
Carrie Iwema, iwema@pitt.edu, 412-383-6887
Ansuman Chattopadhyay, ansuman@pitt.edu, 412-648-1297