UCSC Genome Browser Activities

The purpose of this set of activities is to familiarize you with the UCSC Genome Browser. By the end of this set, you will have learned how to use: the UCSC Genome Browser to look at different tracks for genes and other genome characteristics; the Table Browser to extract and manipulate data; and Genome Browser tools such as BLAT.

1. CTCF is an evolutionarily well-conserved protein that is involved in multiple cellular processes. Using the Feb. 2009 human genome assembly (hg19), complete the following tasks:
   a. Identify the location of the gene CTCF (isoform 1, RefSeq annotation).
   b. Examining the graphical output on the browser, identify and describe any RefSeq isoforms that are readily identifiable.
   c. Identify the length of the mature mRNA of the shortest isoform.
   d. Using the Table Browser (notice the Tools link at the top of the browser page) and dbSNP 138, identify how many common SNPs are at the CTCF locus.
   e. Create a BED file of the SNPs.
   f. Identify the number of SNPs that intersect with transcription factor (Txn Factor) binding sites identified using ChIP-seq.
   g. Identify the location of the SNP with ID rs72140612.
   h. Identify the number of SNPs that intersect with the mature mRNAs produced by this locus.

2. Obtain the amino acid sequence for the largest CTCF isoform and BLAT it against the mouse genome (mm10) to find the mouse homolog. (Hint: scroll down a bit after clicking on the graphic of the isoform in the browser and look for the predicted protein.)
   a. What is the percent identity of the best match?
   b. Go to the mouse browser and identify the top mouse mRNA there.
   c. How many SINE elements are in intron 2 of the mouse homolog? Remember, you’ll need to figure out what the orientation of the gene is. (Hint: to determine the orientation, click on one of the mRNAs. ‘+’ or ‘-’ indicates the strand. Genes on the + strand are read from left to right.)

3. Go back to the CTCF locus in the human genome.
   There are 33 GO annotations in three categories associated with the gene.
   a. What is the second GO annotation (include its ID number)?
   b. What is the definition of that function found on the page that links from the ID number?
   c. Does this protein have a zinc finger domain?
   d. Using information from the Comparative Toxicogenomics Database (CTD), determine whether the gene interacts with acetaminophen.
   e. Using information from the Microarray expression data, determine whether this gene is expressed in the thymus.
   f. If so, is that expression higher or lower than what is found in skeletal muscle?
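
The Table Browser can do the intersections in tasks 1f and 1h for you, but it can help to check the logic offline once you have exported BED files. The Python sketch below is one possible way to do that, not part of the assignment. It assumes BED-style coordinates (0-based start, end not included), and all function names and coordinates are made up for illustration; they are not real CTCF or dbSNP values. It also shows the arithmetic behind task 1c: the mature mRNA length is the sum of the exon sizes, which a BED12 export lists in its blockSizes column.

```python
from typing import List, Tuple

# (chrom, start, end) in BED convention: 0-based start, end not included.
Interval = Tuple[str, int, int]


def parse_bed(text: str) -> List[Interval]:
    """Read the first three columns of BED text, skipping header lines."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_intersecting(snps: List[Interval], features: List[Interval]) -> int:
    """Count SNPs that overlap at least one feature (each SNP counted once)."""
    return sum(any(overlaps(s, f) for f in features) for s in snps)


def mature_mrna_length(block_sizes: str) -> int:
    """Sum the exon sizes from a BED12 blockSizes field such as '20,20,'."""
    return sum(int(size) for size in block_sizes.split(",") if size)


if __name__ == "__main__":
    # Toy data on a made-up chromosome "chrT"; not real genome coordinates.
    toy_snps = parse_bed(
        "track name=toy_snps\n"
        "chrT\t10\t11\tsnpA\n"
        "chrT\t25\t26\tsnpB\n"
        "chrT\t40\t41\tsnpC\n"
        "chrT\t99\t100\tsnpD\n"
    )
    toy_exons = parse_bed("chrT\t0\t20\nchrT\t40\t60\n")
    print(count_intersecting(toy_snps, toy_exons))  # snpA and snpC fall in exons
    print(mature_mrna_length("20,20,"))
```

A brute-force comparison like this is fine for the few hundred SNPs expected at a single locus; for genome-wide files a dedicated interval tool would be a better choice.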