HW #5 BP401/P475 Fall 2015
Assigned Fr 9/25/15: due: Thursday 10/1/15
Paul Selvin

Reading: Look at Lecture 7, slide 11, on how PCR exponentially amplifies DNA. Also look at https://www.youtube.com/watch?v=0HCWmD7Mv8U.

1. You want to amplify a particular gene which starts off with a certain DNA sequence. Let’s take myoglobin, for example, a 153 amino acid protein. The DNA (mRNA) sequence is:

DNA sequence: http://www.ncbi.nlm.nih.gov/nuccore/NM_203377

Note: the strand begins at the 5’ end of the DNA. This is one of the two strands of the DNA, for example, the sequence which is used to make the RNA. There is a complementary DNA strand, with its own corresponding sequence, which will hybridize to it.

The amino acid sequence is:

a. Looking at the amino acid sequence, it looks like there are 154 amino acids. Why, then, do they say myoglobin has 153 amino acids?

b. You want to amplify it by PCR, so you must make two primers for PCR. Why are there two, and what sequences are they?

c. The gene coding for myoglobin is on chromosome 22, which is 49 million DNA base pairs long. (Chromosome 22 is actually a small chromosome, representing between 1.5 and 2% of the total 3 billion base pairs of DNA in cells.) You do NOT want to amplify the entire chromosome, but rather only the gene for myoglobin. How is this accomplished? To answer this, for the PCR primers you chose, go through PCR step by step. Assume that the DNA polymerase will stop amplifying after about 10,000 nucleotides. (The DNA polymerase isn’t perfect and at some point falls off the DNA. 10 kb is common, although about 40 kb is possible.) Find out what products you make after 1 round of PCR, then 2 rounds, then 3 rounds, etc. Why don’t you, in fact, amplify the whole chromosome?

d. In class we calculated how much amplification we got after 32 cycles of PCR.
I said that there were 1.07 billion copies, whereas many people thought the answer was 2^32 ≈ 4 billion. Who’s right? How does it relate to question 4?

Finally, look at https://www.youtube.com/watch?v=ZmqqRPISg0g. (I strongly encourage you to answer the questions first, and then look at this video.)

2. Gene Chips (see diagram on next page)

a) Explain why a gene chip (i.e. a DNA microarray) would be ideal to use when determining which genes are being turned on (i.e. proteins expressed) and which genes are being turned off during cell division (or any other cell process).

b) You have isolated the mRNA from cancerous tissue and labeled it with a red fluorescent dye. You also isolate the mRNA from a healthy version of the same tissue and label it with a green fluorescent dye. After mixing the mRNA and hybridizing it to a human gene chip, you see the results below. Each gene (genes 1–15) has its own spot, and the color of each spot is given by the letters G = Green, R = Red, B = Blank (i.e. no color at all) and Y = Yellow. Qualitatively, what does yellow correspond to in terms of gene expression in the cancerous and healthy tissue? If you were to design a drug that could decrease the expression of any particular gene, which gene(s) from the diagram might you choose to target and why?

c) If you could design a drug to increase the expression of any particular gene product, which gene(s) from the diagram might you choose as a target and why?

NOTE: Cancers can be caused by overexpression of certain genes, called oncogenes, or by suppression of other genes, called tumor suppressor genes.

3. DNA Sequencing: We discussed the PacBio method of sequencing DNA. It was very nice in that it could potentially produce very long reads. A more standard method is based on the chain termination method, also called the Sanger method.
https://www.youtube.com/watch?v=SRWvn1mUNMA

a. Explain in one paragraph how the Sanger method works.

The method works but is limited to fairly short stretches of DNA, (I think) about 400 base pairs long, because the gel electrophoresis is not sensitive beyond this range (i.e. it cannot separate a 400-base-long piece of DNA from a 401-base-long one). So to sequence an entire genome, e.g. 3 billion base pairs, you use a shotgun approach: you add in a primer of random sequence (at least 16 bases long), sequence the 400 bases that follow, then do it again with another random-sequence primer. You then have a whole bunch of 400-base-long sequences, many of which will overlap each other. By using a computer (and provided the overlap is at least 16 bases long), you can stitch together the entire sequence. A general video on DNA sequencing is: https://www.youtube.com/watch?v=MvuYATh7Y74

b. A more detailed look at Next Generation Sequencing, the most recent technique(s), can be found at: https://www.youtube.com/watch?v=jFCD8Q6qSTM
Explain Sequencing by Synthesis, Sequencing by Ligation, and Pyrosequencing. Illumina is the company that has largely taken over the DNA sequencing field. Which technique do they use?
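
For the computer "stitching" step of the shotgun approach described in part 3a, the following is a minimal Python sketch, not the method used by any real assembler. It assumes error-free reads that all come from the same strand, and it simply merges, over and over, the two pieces with the longest suffix-to-prefix overlap. The function names (`overlap_length`, `stitch_reads`), the toy sequence, and the reduced minimum overlap of 5 in the example are all made up for illustration; the default of 16 bases is the minimum overlap quoted above.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if the longest such match is shorter than `min_overlap`.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch_reads(reads, min_overlap=16):
    """Greedily merge the two pieces with the largest overlap until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left that overlaps by at least min_overlap
        # Join the two pieces, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Toy example: a made-up 31-base sequence cut into three overlapping reads.
# Real reads would be ~400 bases with overlaps of at least 16 bases.
genome = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
reads = [genome[16:31], genome[0:14], genome[8:22]]  # deliberately out of order
print(stitch_reads(reads, min_overlap=5))
```

With the three toy reads, the sketch rebuilds the original 31-base sequence as a single piece. With the default 16-base minimum, the same reads are left unmerged because their overlaps are only 6 bases long. Repeated sequences, sequencing errors, and reads from the opposite strand are exactly what make real genome assembly much harder than this.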