
UCSC Genome Browser

The UCSC Genome Browser is a website (genome.ucsc.edu) that provides access to genome sequences and related data from human and many other organisms. It has an easy-to-use query system and an intuitive graphical display that shows a large amount of information tied to specific locations on the chromosomes.


The Genome Browser was created by Jim Kent (then a graduate student) and David Haussler (a computer science professor) at the University of California, Santa Cruz, when the first draft of the human genome was nearly complete in 2000. Kent wrote a program that assembled hundreds of thousands of short overlapping fragments of DNA sequence into a consensus sequence for the entire genome, known as the “Golden Path,” which provides a single definitive sequence for each position along the full length of every chromosome.

He created the Genome Browser to share this assembled sequence with the scientific community.

Getting Started:

The UCSC Genome Browser Home page is at http://genome.ucsc.edu.

Generally you will click the “Genome Browser” button in the navigation bar on the left to get to the standard query page, where you can type in a gene name, a location on a chromosome, or another keyword recognized by the Genome Browser.


The query page has a pulldown menu (actually 2 menus) that allows you to choose which organism’s genome you wish to use. The “position” text box is where you type the gene name, accession number, chromosome location, keyword, or other term that you wish to locate on the genome. Once your search term is entered, hit the Submit button (or just hit “Return” on your keyboard).

[Figure: the query page, with callouts for choosing the organism and for typing in a gene name or other keyword before hitting Submit]
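The same query can also be written as a link. The sketch below is a minimal illustration, assuming that the browser's display page lives at `cgi-bin/hgTracks` and accepts a `db` parameter (the genome assembly) and a `position` parameter (a gene name or coordinate range). The assembly name `hg38` and the helper name `browser_url` are illustrative choices, not part of this tutorial.

```python
from urllib.parse import urlencode

# Address of the browser display page (an assumption; check the links on the site itself).
HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"

def browser_url(db: str, position: str) -> str:
    """Build a link that opens the browser at a gene name or chromosome location."""
    return HGTRACKS + "?" + urlencode({"db": db, "position": position})

# The organism menus correspond to `db`; the "position" text box to `position`.
print(browser_url("hg38", "BRCA1"))
print(browser_url("hg38", "chr17:43,000,000-43,200,000"))
```

Characters such as `:` and `,` are percent-encoded by `urlencode`, so a coordinate range can be passed exactly as it would be typed into the position box.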

Generally a query will yield more than one match in the database. This is especially true for keywords. Several genes may share similar names and a single gene may have multiple database entries (mRNAs, partial sequences, variant sequences), so if your search term is BRCA1, you will match:

BRCA1 protein, BRCA1 associated protein-1, cofactor of BRCA1, BRCA1 associated RING domain 1, etc.


Choose the best match from the search results listing, and click on the link to go to that location in the Genome Browser. Many of these database matches may map to the same location on the genome.


The Browser Window

On the Genome Browser website, each gene is displayed graphically in its position on the genome, showing introns, exons, and untranslated regions. Exons are shaded blocks; introns are thin lines with arrows that indicate the direction of transcription. At the ends of the first and last exons are narrower shaded regions that represent portions of the gene that are transcribed into mRNA but not translated into protein: the Untranslated Regions (UTRs).

[Figure: a gene in the browser window, labeled with the gene name (click on the gene to get to the gene detail page), exons, an intron (arrows indicate direction of transcription), and an untranslated region (UTR)]
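The relationship between exons, introns, and UTRs can be sketched in a few lines. This is only an illustration of the picture described above, not the browser's own code: the function names, the half-open coordinate convention, and the example gene are all made up, and the direction of transcription is not modeled.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in genome coordinates, end not included

def introns(exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive exons (the thin lines)."""
    ordered = sorted(exons)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])]

def split_utr(exons: List[Interval], cds_start: int, cds_end: int):
    """Split exon sequence into untranslated (narrow) and coding (full-height) parts."""
    utr, coding = [], []
    for start, end in sorted(exons):
        if start < cds_start:                      # part before translation begins
            utr.append((start, min(end, cds_start)))
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo < hi:                                # part that is translated
            coding.append((lo, hi))
        if end > cds_end:                          # part after translation ends
            utr.append((max(start, cds_end), end))
    return utr, coding

# Hypothetical gene: three exons, with translation running from 150 to 520.
exons = [(100, 200), (300, 400), (500, 600)]
print(introns(exons))
print(split_utr(exons, 150, 520))
```

In this made-up gene, the first and last exons each contribute one UTR segment, which matches the narrower shaded regions drawn at the two ends of a gene.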

Once you are in the Browser, showing a portion of the genome, there is a set of standard navigation buttons that you can use to move around. There are three “left” and three “right” arrows to move large, small, or tiny steps along the current chromosome. There are also three “zoom in” and three “zoom out” buttons to see more detail for a portion of a gene, or to zoom out to see more of the neighboring genes. The “base” button zooms all the way in to the point where individual DNA letters are visible.

[Figure: the navigation buttons, with callouts for Move left, Move right, Zoom in, and Zoom out]
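Moving and zooming amount to simple arithmetic on the displayed coordinate range. The sketch below is a hypothetical model of that arithmetic, not the browser's implementation: the `Window` class, its method names, and the step sizes and zoom factors in the usage lines are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    chrom: str
    start: int  # first base shown (1-based)
    end: int    # last base shown (inclusive)

    @property
    def width(self) -> int:
        return self.end - self.start + 1

    def move(self, fraction: float) -> "Window":
        """Shift right (positive) or left (negative) by a fraction of the width."""
        step = int(self.width * fraction)
        step = max(step, 1 - self.start)  # never run off the left end of the chromosome
        # The right end is not clamped because the chromosome length is not known here.
        return Window(self.chrom, self.start + step, self.end + step)

    def zoom(self, factor: float) -> "Window":
        """factor > 1 zooms in (narrower view); factor < 1 zooms out (wider view)."""
        center = (self.start + self.end) // 2
        half = max(int(self.width / factor) // 2, 1)
        return Window(self.chrom, max(center - half, 1), center + half)

view = Window("chr17", 1001, 2000)
print(view.move(0.1))    # a small step to the right
print(view.move(-0.5))   # a larger step to the left
print(view.zoom(10))     # zoom in around the center
print(view.zoom(0.5))    # zoom out to see neighboring genes
```

Zooming keeps the center of the view fixed, so repeated zoom-in steps home in on whatever sits in the middle of the window.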

Other Tracks

The Genome Browser also organizes a lot of other useful information on the genome, such as chromosome banding patterns, known genes, predicted genes, expression data, comparisons across species, and SNPs (genetic variations). It also contains links to many other databases.


It is possible to modify the Genome Browser display to show a great deal more information as additional horizontal “tracks.” In the browser window, scroll down the page to see several sets of pulldown menus. Each menu has 5 options: hide (do not show that track), dense (show the track as a single line), squish (squeeze lines into an unreadable mush), pack (compact as much as possible without making it unreadable), and full (show every item on its own line). The first set, “Mapping and Sequencing Tracks,” deals with primary sequencing information: BAC clone ends, contigs, STS markers, gaps, coverage, etc.

Information about genes is located in the “Genes and Gene Prediction Tracks,” including GenBank genes that code for proteins listed in SwissProt, RefSeq genes, and genes predicted by many different gene prediction programs, including Ensembl, Acembly, Genscan, and GeneID. The “mRNA and EST Tracks” provide the primary cDNA sequences used as evidence by database curators and gene prediction programs to identify protein coding regions in the genome. “Expression and Regulation” provides mRNA abundance data from microarrays, CpG Islands, and promoter prediction.

“Comparative Genomics” shows similarity between genome sequences from various species. “Variation and Repeats” includes tracks for SNPs and several types of repeating elements.

[Figure: the track control menus, grouped as Mapping & Sequencing, Genes and Predictions, mRNA and EST Tracks, Expression and Regulation, Comparative Genomics, and Variation & Repeats]
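The five display options can also be treated as a small fixed vocabulary. The sketch below assumes that a track's display mode can be set in a browser link with a `trackName=mode` parameter; the track names `knownGene` and `refGene` and the helper `track_settings` are illustrative and should be checked against the site before use.

```python
from urllib.parse import urlencode

# The five options offered by each track's pulldown menu.
VISIBILITY = ("hide", "dense", "squish", "pack", "full")

def track_settings(**tracks: str) -> str:
    """Return a query-string fragment such as 'knownGene=pack&refGene=dense'."""
    for name, mode in tracks.items():
        if mode not in VISIBILITY:
            raise ValueError(f"{name}: unknown display mode {mode!r}")
    return urlencode(tracks)

# Hypothetical track names; the real ones depend on the genome being viewed.
print(track_settings(knownGene="pack", refGene="dense"))
```

Rejecting anything outside the five known modes catches typos before they end up in a link.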

BLAT Search

It is also possible to locate genes in the Genome Browser by sequence similarity. UCSC has its own similarity search tool called BLAT (BLAST-Like Alignment Tool), created by Jim Kent. BLAT is extremely fast, but much less sensitive than BLAST.

BLAT on DNA is designed to quickly find sequences of 95% and greater similarity with a length of 40 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 33 bases, and sometimes find them down to 21 bases. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more.
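The design targets quoted above can be written down as a simple check. This sketch only encodes the numbers from the preceding paragraph; the function and table names are made up, and a `False` result means the alignment falls outside the range BLAT is designed for, not that BLAT can never find it.

```python
# Design targets quoted in the text: (minimum similarity, minimum length).
BLAT_TARGETS = {
    "dna": (0.95, 40),      # 95% similarity over 40 bases or more
    "protein": (0.80, 20),  # 80% similarity over 20 amino acids or more
}

def within_blat_design_range(kind: str, similarity: float, length: int) -> bool:
    """True if an alignment of this similarity and length is one BLAT targets."""
    min_similarity, min_length = BLAT_TARGETS[kind]
    return similarity >= min_similarity and length >= min_length

print(within_blat_design_range("dna", 0.97, 120))     # a close match of good length
print(within_blat_design_range("dna", 0.85, 500))     # too divergent for DNA BLAT
print(within_blat_design_range("protein", 0.85, 60))  # fine for protein BLAT
```

For divergent sequences that fail this check, a more sensitive tool such as BLAST is the usual fallback.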


The BLAT page is very simple: choose which genome to search from a pulldown menu, paste in a Query sequence, and hit the Submit button. BLAT can be used with DNA or protein query sequences.

BLAT returns a Results page that lists matching sequence segments and their locations in the genome. Matches at the top of the list are the best. The last column, labeled “Span,” is the length of sequence that matches between your query and the genome. Small matches (less than 40 bases) are not likely to be significant.
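Reading a results list in this way can be sketched as a sort followed by a length check. The field names `score`, `span`, and `chrom`, the assumption that hits are ordered by score, and the example values are all hypothetical; only the 40-base rule of thumb comes from the text.

```python
from typing import Dict, List

def rank_hits(hits: List[Dict], min_span: int = 40) -> List[Dict]:
    """Sort hits best-first and mark the ones too short to be trusted."""
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    return [dict(h, likely_significant=h["span"] >= min_span) for h in ranked]

# Made-up hits, not real BLAT output.
hits = [
    {"chrom": "chr3", "score": 28, "span": 30},
    {"chrom": "chr17", "score": 950, "span": 1000},
    {"chrom": "chr9", "score": 45, "span": 52},
]
for hit in rank_hits(hits):
    print(hit["chrom"], hit["score"], hit["span"], hit["likely_significant"])
```

The original hit records are left unchanged; each ranked entry is a copy with one extra flag.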


The “details” link shows a BLAST-style alignment of your query sequence with the matching segment of genome sequence.


The “browser” link for each entry on the list of matching sequences will take you directly to that sequence in the Genome Browser, and your query sequence will be added as a track near the top of the main panel.
