BT6Anew 2 - SHS

Unit 6
Bioinformatics:
Genes, Proteins, and Evolutionary
Relationships
Lesson 1
• 1A – Introduction – Neanderthal Webquest
• 1B – Lecture PCR
Activity – PCR Lab
Lesson 1A - Introduction
• Computer Webquest: Neanderthal
Genome
• Research the Smithsonian Genetics
website.
• http://humanorigins.si.edu/evidence/genetics/ancient-dna-and-neanderthals
• Respond to questions
• Whole class discussion about DNA
sequences used and types of
biotechnology procedures used in DNA
identification.
Lesson 1A – What you need to know
• By comparing the following types of genetic material, what can we
conclude about relationships between Neanderthals and modern humans?
• mtDNA
• Nuclear DNA
• Genes for red hair and pale skin
• FOXP2 gene
• O allele
• TAS2R38 gene
• Microcephalin
Lesson 1B - PCR
• Before Neanderthal DNA could be sequenced and compared with the
genome of modern humans, there was a problem: the DNA collected
from bones, hair, etc. was too scarce for sequencing.
• The DNA had to be amplified in order to have sufficient quantity for
further testing.
• A procedure called the polymerase chain reaction (PCR) was used to
amplify the DNA.
PCR
• PCR background
• Polymerase Chain Reaction (PCR) is a
rapid technique for making many copies
of (amplifying) specific DNA fragments.
• The technique revolutionized
biotechnology with its many
applications.
• Among these applications are its use in
forensic testing and as a
replacement for DNA libraries, since it is
much faster than building and screening a
library.
PCR
• PCR Technique
• Target DNA is put into a PCR test tube.
• DNA is mixed with DNA polymerase, deoxyribonucleotides (dATP,
dGTP, dCTP and dTTP) and buffer.
• A pair of primers (short single-stranded DNA oligonucleotides) is added.
The primers are complementary to nucleotides at the ends of the
target DNA.
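As a rough illustration of what "complementary to the ends" means, the sketch below picks a primer pair for a made-up target. The target sequence, the 6-base primer length, and the function names are assumptions for the example only; real primers are usually around 18-25 bases long.

```python
# Toy illustration: picking a primer pair for a target sequence.
# The target and the primer length are made-up values.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def design_primers(target, length):
    """Return (forward, reverse) primers, each written 5'->3'."""
    forward = target[:length]                       # matches the start of the top strand
    reverse = reverse_complement(target[-length:])  # pairs with the end of the top strand
    return forward, reverse

target = "ATGCCAGTACGGATTACAGGCTTAAGC"  # hypothetical target, top strand 5'->3'
forward, reverse = design_primers(target, 6)
print("Forward primer:", forward)
print("Reverse primer:", reverse)
```

The forward primer binds the bottom strand and the reverse primer binds the top strand, so the two primers point toward each other across the target.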
PCR
• The test tube is placed in a
thermocycler, a sophisticated
heating block capable of
changing temperatures over
short time periods.
• The thermocycler takes the
sample through a series of
reactions called the PCR cycle
PCR
• Each PCR cycle has 3 stages:
• Denaturation – Sample is heated to 94-96
degrees C. This causes the DNA to
separate into single strands.
• Hybridization (annealing) – Sample is cooled to 55-65
degrees C. This allows the primers to
hydrogen bond to complementary bases
at opposite ends of the target sequence.
PCR
• Extension – Sample is heated to 70-75
degrees C. The DNA polymerase copies
the target sequence by adding
nucleotides to the 3’ end of each primer.
• At the end of one cycle, the amount of
target DNA has doubled.
• Researchers usually run 20-30 cycles of
PCR.
• After 20 cycles, there are about 1 million
copies of the target DNA (2^20 = 1,048,576).
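The doubling arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes ideal conditions (every cycle doubles the target perfectly, starting from a single copy); real reactions are less efficient, and the function name is made up for the example.

```python
def copies_after(cycles, starting_copies=1):
    """Ideal PCR: every cycle doubles the number of target copies."""
    return starting_copies * 2 ** cycles

for n in (1, 10, 20, 30):
    print(f"{n:2d} cycles: {copies_after(n):,} copies")
```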
PCR
• One of the keys to PCR is the type of
DNA polymerase used.
• Most DNA polymerases would denature
during the repeated heating of
PCR.
• Taq DNA polymerase is used in PCR.
• It is isolated from Thermus aquaticus, a
heat-loving bacterium that thrives in the hot
springs of Yellowstone National Park.
• Taq is stable at high temperatures.
PCR
• Cloning PCR products
• If you wish to clone a gene made by
PCR:
• Thermostable polymerases like Taq
add a single adenine nucleotide to the
3’ end of all PCR products (It’s a
quirk).
• PCR products can be ligated to T
vectors, which are plasmids that have
a single unpaired thymine nucleotide
at each 3’ end.
• Once ligated, the recombinant
plasmid can be introduced into a
bacterium.
PCR
• http://highered.mcgrawhill.com/sites/0072556781/student_view0/chapter14/animation_quiz_6.html
• PCR animation
PCR Activity - Lab
• We will conduct the PCR
procedure and check for PCR
products with gel electrophoresis.
• Refer to your lab handout for
directions.
1B – What you need to know
• Be able to explain, in detail, the PCR technique.
• What is the importance of Taq polymerase?
Lesson 2- DNA Sequencing and The Human
Genome Project
• 2A – Lecture Early History of DNA Sequencing and Sanger method.
Activity- DNA Sequencing by Sanger method
• 2B – Lecture Human Genome Project and Automated DNA sequencing.
Activity – Video: Cracking the Code of Life
DNA Sequencing and Human Genome Project
• Lesson 2A -Early History
• Techniques to sequence DNA developed many
years before the Human Genome Project.
• In the 1970s and early 1980s, DNA sequencing
was done manually and required methods that
were expensive, labor intensive, and used
dangerous chemicals and radioactive reagents.
• Only short sequences were identified.
• In 1977, Fred Sanger introduced the chain
termination method. By the mid 1980s it had
become the method of choice for DNA
sequencing.
DNA Sequencing and The Human Genome Project
• Original Sanger method
• Four separate reaction tubes are set up.
• Each tube contains
- Identical copies of the DNA of interest
- Radioactively labeled primer to get DNA synthesis started
- Deoxyribonucleoside triphosphates (dNTPs) to be used in DNA synthesis
- A small amount of one dideoxyribonucleoside triphosphate (ddNTP)
- DNA polymerase
DNA Sequencing and The Human Genome Project
• All four test tubes have each of the four
normal nucleotides (dNTPs), but each
tube also has one chain-terminating
dideoxynucleotide (ddNTP).
• Example
• "G" tube: all four dNTPs, ddGTP, DNA
polymerase, and primer
• "A" tube: all four dNTPs, ddATP, DNA
polymerase, and primer
• "T" tube: all four dNTPs, ddTTP, DNA
polymerase, and primer
• "C" tube: all four dNTPs, ddCTP, DNA
polymerase, and primer
DNA Sequencing and The Human Genome Project
• Sanger Method
• DNA strands are separated.
• The radioactive primer binds near the 3’
end of the template strand.
• DNA polymerase synthesizes a
complementary DNA strand.
• Every time a ddNTP is incorporated into the
complementary strand, DNA synthesis
halts.
• This creates fragments of different
lengths.
DNA Sequencing and The Human Genome Project
• EX: On the right are the contents of
the “A” tube. It has ddATP in it.
• At each A position, the polymerase adds either a
normal dATP or a ddATP. Which one is added
is random, and a ddATP stops synthesis.
• Across the many copies in the tube,
termination happens at every possible A site, so you
generate fragments of different lengths.
DNA Sequencing and The Human Genome Project
• Sanger Method
• The same process that occurred
in the A tube occurs in the C, G,
and T tube.
• The DNA from each tube is run
in gel electrophoresis. The
banding pattern allows you to
sequence the DNA.
• The sequence on the right is
ATGCCAGTA.
• How do you figure this out?
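One way to see how the banding pattern gives the sequence is to simulate it. The sketch below is a simplified model, not the lab procedure: it assumes each tube ends up with one fragment for every position of its base in the newly made strand, and it ignores the constant length of the primer. The function names are made up for the example.

```python
def tube_fragments(new_strand):
    """For each ddNTP tube, list the lengths at which synthesis can stop."""
    tubes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        tubes[base].append(position)
    return tubes

def read_gel(tubes):
    """Read bands from shortest (bottom of gel) to longest (top of gel)."""
    bands = [(length, base) for base, lengths in tubes.items() for length in lengths]
    return "".join(base for length, base in sorted(bands))

tubes = tube_fragments("ATGCCAGTA")
for base, lengths in tubes.items():
    print(f"{base} tube: fragment lengths {lengths}")
print("Read from gel:", read_gel(tubes))
```

Because the shortest fragments travel farthest, reading the four lanes from the bottom of the gel upward, one band at a time, spells out the newly synthesized strand from 5' to 3'.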
DNA Sequencing and The Human Genome Project
• Sanger method animations
• http://highered.mcgrawhill.com/sites/0072556781/student_view0/chapter15/animation_quiz_1.html
• http://www.dnalc.org/resources/animations/sangerseq.html
Sanger method activity
Complete the DNA sequencing activity as explained in your
handout.
DNA Sequencing
• There are a variety of techniques in use or being
explored.
• Pyrosequencing – Uses DNA on a bead to
sequence complementary DNA strands.
• SOLiD – Sequencing by oligonucleotide ligation and
detection, which generates 6 billion base
pairs/reaction.
• http://www.youtube.com/watch?v=nlvyF8bFDwM&feature=related
• https://www.youtube.com/watch?v=4XMO5VfLIKs
• Nanotechnology – to sequence DNA without
fluorescent tags.
Lesson 2A – What you need to know
• How many reaction tubes are used?
• What is added to each reaction tube?
• Describe the Sanger method procedure.
• Explain how gel electrophoresis enables the determination of DNA
sequence.
DNA Sequencing and The Human Genome Project
• Lesson 2B – Human Genome Project
• In January 1989, a group of biologists, ethicists, industry
scientists, engineers, and computer scientists met and
announced the Human Genome Project.
• The Human Genome Project was a 13-year research effort
(1990-2003) staffed by scientists from all over the world
from both the private and public sectors.
• There was stiff competition between the public and private
sector scientists to sequence the human genome.
• In 2001, the leaders of both groups announced the first
version of the human genome and published their research
simultaneously.
DNA Sequencing and The
Human Genome Project
• The purpose of the Human Genome Project was:
1. Sequence the entire human genome
2. Analyze genetic variations among humans.
3. Map and sequence the genomes of model organisms, including
bacteria, yeast, roundworms, fruit flies, mice, and others.
4. Develop new laboratory technologies such as automated sequencers
and computer databases.
5. Disseminate genome information among scientists and the general
public.
6. Consider the ethical, legal, and social issues that accompany the HGP
and genetic research.
DNA Sequencing and The Human Genome Project
• When the Human Genome Project started off, results were slow to
come.
• DNA sequencing was done manually by a combination of the Sanger
method and the shotgun cloning method.
• In the shotgun method, very large pieces of DNA are cut into smaller
overlapping fragments by restriction enzymes.
• Overlapping fragments that can be joined end to end form contiguous
sequences (contigs).
• DNA fragments were inserted into bacterial plasmids and replicated
along with the bacteria.
• Scientists then used the Sanger method to sequence each fragment.
• Using computer analysis, the fragments were reassembled into the
entire genome sequence.
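The computer reassembly step can be pictured with a toy example. The sketch below uses a simple greedy strategy (always merge the two pieces with the longest overlap), which is an assumption chosen for clarity and not the software the Human Genome Project used. The reads and the minimum overlap of 3 bases are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(fragments):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left; remaining pieces stay separate contigs
        merged = frags[i] + frags[j][size:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags

reads = ["ATGCCAGT", "CAGTACGG", "ACGGATTA", "ATTACAGG"]  # made-up fragments
print(greedy_assemble(reads))
```

With real genomes, repeated sequences and sequencing errors make this much harder, which is why assembly needed serious computing power.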
DNA Sequencing and The Human Genome Project
• In 1998, Celera Genomics was founded to focus on automated DNA
sequencing technologies.
• Celera announced that, with this new technology, it could sequence the
entire genome by 2001 for $200 million.
• A competition between the private and public sectors began, and this had
a positive impact on the work to be done.
• In time, the entire genome project adopted Celera’s automated DNA
sequencing equipment.
• Celera’s original sequencing machines could sequence about 500
nucleotides in a single reaction; a vast improvement over the 200
nucleotides sequenced by the manual Sanger method.
• Today’s automated DNA sequencing technologies can sequence about 1
billion nucleotides in a single reaction.
DNA Sequencing and The Human Genome Project
• What did we learn from the Human Genome?
• The human genome consists of about 3.1 billion
base pairs.
• The genome is 99.9% the same among all
humans.
• Single nucleotide polymorphisms (SNPs) account
for much of the genomic diversity among humans and
serve as markers to identify disease.
• Less than 2% of the total genome codes for
protein.
• About 98% of the total genome is non-protein-coding DNA. Roughly half
of the genome consists of repetitive DNA sequences, most of them
derived from transposons.
DNA Sequencing and The Human Genome Project
• What did we learn from the Human Genome?
• The genome has approximately 20,000 coding
genes.
• Many genes make more than one protein; about 20,000
genes make roughly 100,000 proteins.
• The functions of about one half of all human genes are
unknown.
• Chromosome 1 has the highest number of genes. The Y
chromosome has the fewest.
• Many of the genes in the human genome
show a high degree of similarity to genes in other
organisms.
• Thousands of human diseases have been
identified and mapped to their chromosomal
locations.
Video – Cracking the Code of Life
• http://video.pbs.org/video/1841308959/
• NOVA – Cracking the Code of Life 153 minutes
Lesson 2B – What you need to know
• List the purposes of the Human Genome Project.
• Describe how DNA was sequenced at the beginning of the genome
project.
• How did Celera change DNA sequencing?
• What did we learn from the Human Genome Project?
Lesson 3 Bioinformatics
• 3A – Lecture: Overview of Bioinformatics
Activity: Bioinformatics Exercise 1
• 3B- Lecture : Gene Databases
Activity: Bioinformatics Exercises 2-9
Activity: Using GEO database to research disease.
• 3C- Lecture: Protein Databases
Activity: Bioinformatics amino acid sequences
3A Bioinformatics - Overview
• The work of the Human Genome Project and significant
developments in molecular biology have generated staggering
amounts of biological information.
• The huge number of nucleic acid and protein sequences has led to
the development of an interdisciplinary field called bioinformatics.
• Bioinformatics uses computational models to store, organize,
analyze and manipulate vast amounts of biological data.
• Those involved in the field of bioinformatics make use of biology,
computer science, mathematics, genetics, and statistics to analyze
DNA and protein sequences, to study genomes, and predict the
structure and function of DNA and proteins.
• This marriage of biological data and computer science allows
scientists to manipulate and analyze gene and protein data via many
different computer databases.
Bioinformatics
• A biological database is a collection
of biological data organized in a
specific and useful way.
• Bioinformatics databases are very
large, accessible by the Internet, and
are continuously updated with new
research information.
• To acquire information from the
database, a request called a query is
made and the information obtained
is called the result.
Bioinformatics
• The three comprehensive databases on the Internet are
- GenBank at the National Center for Biotechnology Information (in
the U.S.)
- EMBL: European Molecular Biology Laboratory (in the U.K.)
- DDBJ: DNA Database of Japan
• The major databases share and integrate the data from different
sources, and each database provides links to other online sources
• Researchers and the public can take advantage of these vast online
resources.
Bioinformatics
• There are many types of
biological databases
1. Nucleic acid sequences
2. Genome
3. Gene Expression
4. Protein sequences
5. Protein structure
6. Gene and protein comparisons for
evolutionary relationships
Bioinformatics Activity
• Refer to your handout for Exercise 1: Introduction to Bioinformatics.
3A – What you need to know
• What is bioinformatics?
• Why did bioinformatics develop as a field of work?
• What are the 3 comprehensive databases?
• What types of databases are available for use?
3B Bioinformatics – Gene Databases
• Nucleic Acid Sequence
Database
• BLAST (Basic Local Alignment Search Tool) is a useful
research tool for finding
similar nucleic acid sequences.
• If a researcher has an
unknown DNA sequence, the
sequence can be entered into
nucleotide BLAST as a query.
• The program will compare it
against all DNA sequences in the database and
find similar identified
sequences.
Bioinformatics
• Each similar sequence is assigned a Max Score and a Total Score,
which indicate how well it aligns with the query. The Max Score is the
score of the best single aligned segment; the Total Score adds up all
aligned segments. The higher the scores, the better the match.
• The Query Coverage gives the percentage of the query sequence that is
included in the alignment. The higher the percentage, the better the match.
• An Expect value (E value) is also assigned to each match.
This is a measure of the number of matches with a similar score that can be
expected by random chance for a particular query. The lower the E
value, the more likely the sequences are related to each other.
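To make these numbers less abstract, here is a small sketch that computes percent identity, query coverage, and an E value for a made-up alignment. It assumes the simplified relation E = m × n × 2^(−bit score), where m is the query length and n is the total database length; BLAST itself applies length corrections that are left out here. All sequences, lengths, and function names are hypothetical.

```python
def percent_identity(aligned_query, aligned_subject):
    """Share of aligned positions where the two sequences carry the same base."""
    matches = sum(q == s for q, s in zip(aligned_query, aligned_subject))
    return 100 * matches / len(aligned_query)

def query_coverage(query_length, aligned_length):
    """Share of the full query that takes part in the alignment."""
    return 100 * aligned_length / query_length

def expect_value(bit_score, query_length, database_length):
    """Simplified E value: matches this good expected by chance alone."""
    return query_length * database_length * 2 ** (-bit_score)

query = "ATGCCAGTACGGATTACAGG"        # hypothetical 20-base query
aligned_query = "ATGCCAGTACGGATTA"    # the 16 query bases that aligned
aligned_subject = "ATGCCAGTTCGGATTA"  # matching stretch of a database sequence

print(f"Identity: {percent_identity(aligned_query, aligned_subject):.1f}%")
print(f"Query coverage: {query_coverage(len(query), len(aligned_query)):.0f}%")
for bits in (30, 60):  # assumed bit scores, database of one billion bases
    print(f"Bit score {bits}: E = {expect_value(bits, len(query), 10**9):.2g}")
```

Doubling the bit score shrinks the E value enormously, which is why a high score and a low E value go together.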
Bioinformatics Activity
• Refer to your handout.
• We will complete Exercises 2, 3, 4, 5, and 9.
Bioinformatics
• Genome Databases
• Whole genome databases have been developed for more than 1,000
organisms.
• Database tools allow scientists to identify genes and gene families within
specific genomes, to find locations of genes in the genome, and to
analyze evolutionary relationships.
• These databases, like BLAST, are linked to other sequence and
literature databases.
Bioinformatics Activity
• A quick tour of dbVAR
• Go to the NCBI site: http://www.ncbi.nlm.nih.gov/
• Click on Genomes and Maps in the left column.
• Click on the Database for Genomic structural variation.
• Click on Genome Browser
• You can query by gene, chromosome number, or gene location.
• Enter PTEN (a gene) and GO.
• You will see a map of chromosome 10.
• Below you can find the location of the gene (blue line).
• Click on it and select GENBANK VIEW.
• Click on Gene in right column.
Bioinformatics
• Gene Expression Databases
• The phenotype of an organism (its physical
characteristics, the way it interacts with the
environment, and its diseases) depends on
which genes are expressed.
• Gene expression varies in different cell
types and in any cell over time, based on its
developmental stage and the environment.
• One of the ways that all expressed genes in
a particular cell at a particular time can be
measured is through microarray analysis.
Bioinformatics
• A microarray tray holds all the DNA from a genome. The DNA is cut
into fragments and placed in wells on a plastic tray.
• All mRNA from a cell of interest is converted to cDNA and tagged
with a fluorescent dye. It is also fragmented.
• The DNA on the tray and the fluorescent cDNA hybridize.
• The tray is scanned by a laser that causes the dye to fluoresce wherever
cDNA has bound to gene DNA on the tray.
• The fluorescent spots indicate which genes are expressed in the cells
of interest.
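Reading a scanned microarray comes down to asking which spots glow above background. The sketch below shows that idea with made-up numbers: the gene names, the intensity values, and the cutoff of 500 are all assumptions for illustration, and real analyses use more careful statistics.

```python
# Hypothetical fluorescence readings (arbitrary units) for five gene spots.
spot_intensity = {"geneA": 5200, "geneB": 130, "geneC": 2900, "geneD": 80, "geneE": 950}

BACKGROUND = 500  # assumed cutoff separating real signal from background glow

def expressed_genes(intensities, cutoff=BACKGROUND):
    """Return genes whose spot is brighter than the cutoff, brightest first."""
    hits = [gene for gene, value in intensities.items() if value > cutoff]
    return sorted(hits, key=lambda gene: intensities[gene], reverse=True)

print(expressed_genes(spot_intensity))
```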
Bioinformatics
• Data collected from these tests show the relationship between a
particular phenotype and which genes are being expressed.
• Microarray and several other types of tests are used by scientists in
researching genetic expression in cells.
• Their research findings can be submitted to the Gene Expression
Omnibus (GEO) database.
• This database has gene expression information with more than
120,000 samples from 200 organisms.
Bioinformatics - Activity
• GEO Database
• We will research Merkel Cell Carcinoma online.
• Then we will visit the GEO database and summarize research
literature about genetic expression in this type of cancer.
• See your handout for details
Bioinformatics
• Gene Expression
• As we learned in an earlier unit, each gene has a promoter region,
sometimes an enhancer, the actual gene to be transcribed, and a stop
region.
• Depending on the various factors present, the gene can be expressed or
it can be shut off.
• Sometimes in gene expression research, scientists need to locate a
promoter region, or the actual coding region of the gene.
• The following activity shows you how this is accomplished.
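As a rough picture of what the search tools do, the sketch below scans a made-up DNA sequence for a TATA-box-like promoter motif and for the first open reading frame (a start codon ATG followed, in the same reading frame, by a stop codon). Using the single motif TATAAA as the promoter signal is a simplifying assumption; the sequence and function name are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_coding_region(dna):
    """Return (start, end) of the first ATG...stop open reading frame, or None."""
    start = dna.find("ATG")
    while start != -1:
        for pos in range(start, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return start, pos + 3  # end index is exclusive
        start = dna.find("ATG", start + 1)
    return None

sequence = "GGCTATAAAGGCATGGCCATTGTATAAGGC"  # made-up sequence

promoter_at = sequence.find("TATAAA")  # simplified promoter signal
start, end = find_coding_region(sequence)
print("TATA-like motif at position:", promoter_at)
print("Coding region:", sequence[start:end])
```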
Bioinformatics Activity
• Refer to your handout.
• We will complete Exercises 6, 7, and 8.
Lesson 3B - What you need to know
• What is the function of nucleotide BLAST?
• What do BLAST scores indicate?
• What does the query coverage indicate?
• What does an E value indicate?
• What type of information is provided by dbVAR?
• Describe the microarray procedure.
• What information does microarray provide?
• Information on your BLAST handout and how each search is
conducted.
3C Bioinformatics – Proteins
• Protein Sequence Databases
• Information about DNA sequences is important but
proteins determine the structure, function, and
behavior of an organism.
• Mutations in genes, for example, can lead to the
production of defective proteins.
• The major goals in protein research are to identify,
catalog, and understand the functions of proteins.
• The fact that there are far fewer genes than proteins
indicates that the protein complement of an organism
cannot be fully characterized by gene analysis alone.
The study of proteins is a necessary tool to understand
the complexities of living cells.
Bioinformatics
• Scientists use protein sequence databases to identify amino acid
sequences based on submission of nucleotide sequences.
• Protein sequences can be compared to other protein sequences and
this might reveal same or similar function.
• Evolutionary relationships can also be discovered in the comparison
of protein sequences.
• BLAST allows researchers to run protein comparisons or to translate
a nucleotide sequence into a protein product.
• The BLAST results can help a scientist determine the direction of
his/her research.
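Translating a nucleotide sequence into a protein product can be sketched in a few lines. The example below uses the standard genetic code and reads a single reading frame from the first base; tools such as blastx translate all six possible frames, which this sketch does not attempt. The input sequence is made up.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate coding DNA (5'->3', starting at the first codon) to protein."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the protein ends here
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCATTGTATAA"))  # made-up coding sequence
```

The one-letter amino acid string that comes out is the kind of sequence that can then be compared against protein databases.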
Bioinformatics
• Protein Structures
• The function of a protein depends on the 3D structure
adopted by the folding of its linear chain of amino acids.
• Bioinformatics provides tools that permit scientists to
perform online protein modeling (i.e., predict the shapes
of proteins).
• When an amino acid sequence is known, it is possible to
compare the unknown sequence with identified protein
structures in the database.
• As novel proteins become known and protein folding
patterns are discovered, online protein modeling will
become an important skill for biologists studying protein
function.
Bioinformatics Activity
• Refer to your handout for activity directions.
• We will translate a nucleotide sequence into an amino acid
sequence, compare amino acid sequences to identify proteins from
different species, and finally take a look at a protein structure
database.
Lesson 3C – What you need to know
• What is the major goal of protein research?
• What does blastx do?
• What does protein blast do?
• What does online protein modeling enable a researcher to do?
• Information from your BLAST search and how each search is
conducted.
Lesson 4 Evolutionary Relationships
• Lecture – Evolutionary Relationships
• Activity – Construct a phylogenetic tree using amino acid sequences.
• Activity – Use bioinformatics to study phylogenetic relationships.
Evolutionary Relationships
• Speciation is associated with changes in the genetic structures of
populations and with genetic divergence of those populations.
• Therefore, we are able to use genetic differences among species to
reconstruct evolutionary histories and relationships (phylogenies).
• After an ancestral species diverges into two separate species, each
branch gives rise to independent lineages that accumulate by chance
different mutations, usually small base pair changes in their DNA.
• Comparing nucleotide sequences and looking for these differences is
one way to establish evolutionary relationships.
Evolutionary Relationships
• The redundancy of the genetic code means
that these nucleotide changes do not always lead to a
change in an amino acid, and a change in an
amino acid does not always change the
function of a protein.
• Scientists look at evolving proteins to study
relationships between organisms.
• In general, with nucleotide sequences and
amino acid sequences, the more similar the
sequences are when compared, the more likely
the two organisms are closely related.
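The effect of redundancy can be seen by counting differences at the DNA level and at the protein level. In the sketch below, the three short sequences and the species labels are made up, and the small codon dictionary holds only the standard-code codons this example needs.

```python
# Only the codons needed for this example (standard genetic code).
CODONS = {"ATG": "M", "GAA": "E", "GAG": "E", "GAT": "D", "CTT": "L", "CTC": "L"}

def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))

def count_differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

ancestor = "ATGGAACTT"   # M E L
species_1 = "ATGGAGCTC"  # two silent changes: still M E L
species_2 = "ATGGATCTT"  # one change that swaps E for D

for name, dna in (("species_1", species_1), ("species_2", species_2)):
    dna_diff = count_differences(ancestor, dna)
    protein_diff = count_differences(translate(ancestor), translate(dna))
    print(f"{name}: {dna_diff} DNA change(s), {protein_diff} amino acid change(s)")
```

Here the sequence with more DNA changes shows no protein change at all, so DNA and protein comparisons can tell slightly different stories.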
Evolutionary Relationships
• Nucleotide Sequences
• There are many genes and non-coding
regions of DNA that can be compared
when establishing evolutionary
relationships.
• Scientists have used nuclear,
mitochondrial, and genomic DNA,
along with mRNA, tRNA, rRNA, SINEs,
and the Y chromosome to name a few.
• The choice of which sequences to
compare has to do with the types of
relationships in which a scientist is
interested.
Evolutionary Relationships
• When looking for relationships among several
organisms over very long periods of time, a DNA
sequence that is highly conserved is used.
• 16S rRNA is a good example of this. It is used to
show evolutionary relationships (and speciation)
among prokaryotes, and a 16S rRNA gene is also found on mitochondrial
DNA.
• The gene is present in all bacteria, it
mutates slowly, its essential functions have been
conserved, and its random mutations can be
correlated to evolution between species.
Evolutionary Relationships
• On the other hand, when looking at more closely related species,
such as at the family, genus, or species level, a DNA sequence that
mutates faster is used.
• The protein-coding genes in mitochondrial DNA (for some proteins in
the electron transport chain) are good examples of these types of
evolutionary markers.
• The Y chromosome is a good example of both types of DNA
sequences:
• http://www.dnalc.org/view/15092-Studying-the-Y-chromosome-tounderstand-population-origins-and-migration-Michael-Hammer.html
Evolutionary Relationships
• Proteins
• Proteins with a significant amino acid sequence
similarity are conserved and are predicted to be
members of the same protein family.
• Scientists compare amino acid sequences, as well,
to determine evolutionary relationships.
• Generally, these comparisons are used to study
relationships over long periods of time.
• Changes in DNA sequences do not always result
in changes in a protein. Thus protein evolution is
a lengthier process than DNA evolution.
• A limitation of protein comparisons is that they only
reflect the protein-coding sections of DNA.
Evolutionary Relationships
• http://www.youtube.com/watch?v=mA7BE3mEb64
• Molecular Evolution Video Clip
Evolutionary Relationships
• Phylogenetic Tree
• Once genetic or protein differences are found, the relationships among
organisms are often presented in the form of phylogenetic trees.
• The groups can be from the same species, or larger groups, and branches
of the tree represent lineages over time.
• Points at which the lines diverge are called nodes and show when a
species splits into two or more species.
• Each node represents a common ancestor of the species diverging at that
node.
• The root of the phylogenetic tree represents the oldest common ancestor
to all the groups shown on a tree.
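A tree can be built from a table of differences by repeatedly joining the two closest groups. The sketch below uses average-linkage clustering (often called UPGMA), which is one simple method and not necessarily the one in your handout. The protein fragments and species names are placeholders, and the output is a nested-parentheses description of the tree.

```python
from itertools import combinations

def differences(a, b):
    """Count amino acid positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def build_tree(sequences):
    """Join the two closest groups repeatedly (average-linkage clustering)."""
    # Each cluster is (tree label, list of member names).
    clusters = [(name, [name]) for name in sequences]

    def distance(c1, c2):
        pairs = [(m, n) for m in c1[1] for n in c2[1]]
        return sum(differences(sequences[m], sequences[n]) for m, n in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append((f"({c1[0]},{c2[0]})", c1[1] + c2[1]))  # new node
    return clusters[0][0]

# Made-up aligned protein fragments; the species names are placeholders.
sequences = {
    "species_A": "MKTAYIAKQR",
    "species_B": "MKTAYIAKQK",
    "species_C": "MKSAYLAKQK",
    "species_D": "MRSGYLVNQE",
}
print(build_tree(sequences))
```

Each pair of parentheses corresponds to a node: the innermost pair groups the two most similar sequences, and the outermost pair is the root.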
Evolutionary Relationships Activity
• Construct a phylogenetic
tree based on amino acid
differences.
• See your handout for
directions.
Evolutionary Relationships Activity
• We will be using bioinformatics to study phylogenetic relationships
among bears.
• It is Exercise 11 in your handout.
Lesson 4 – What you need to know
• What two methods are used to compare evolutionary relationships?
• How are DNA sequences selected for the study of evolutionary
relationships?
• When are amino acid sequences used in the study of evolutionary
relationships?
• How do you construct a phylogenetic tree?
• Information on your bioinformatics search regarding how the search
was conducted.