genetic linkage map - Laboratory of Informatics and Chemistry

WHAT IS BIOINFORMATICS?
Daniel Svozil
Definition
• NCBI
• Bioinformatics is the field of science in which biology, computer
science, and information technology merge into a single discipline.
The ultimate goal of the field is to enable the discovery of new
biological insights and to create a global perspective from which
unifying principles in biology can be discerned.
• Wikipedia.org
• The application of information technology and statistics to the field
of molecular biology.
• The creation and advancement of databases, algorithms,
computational and statistical techniques, and theory to solve formal
and practical problems arising from the management, analysis and
interpretation of biological data.
http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
Extraction of biological knowledge from data
• experimental data and data from public databases → convert data to knowledge
• knowledge → generate new hypotheses → design new experiments
Omes
genome – DNA sequence in an organism
transcriptome – mRNA of an entire organism
proteome – all proteins in an organism
metabolome – all metabolites in an organism
interactome – all molecular interactions in an organism
Organism
Cell
Tissue architectures
Genome
Transcriptome
Proteome
Metabolome
Reactome
Cell interactions
Signaling
……
Omes and Omics
• Genomics
• Primarily sequences (DNA and RNA)
• Databanks and search algorithms
• Supports studies of molecular evolution
• Proteomics
• Sequences (Protein) and structures
• Mass spectrometry, X-ray crystallography
• Databanks, knowledge bases, visualization
• Functional Genomics (transcriptomics)
• Microarray data
• Databanks, analysis tools, controlled terminologies
• Systems Biology (metabolomics)
• Metabolites and interacting systems (interactomics)
• Graphs, visualization, modeling, networks of entities
“Omics”
• includes: genomics, transcriptomics, proteomics, metabolomics, interactomics, ……
• measured by high-throughput techniques: sequencing, microarrays, LC/MS, NMR, two-hybrid, ……
• their data are high-noise
• advanced pre-processing techniques (to reduce noise) → reliable high-throughput information
• techniques to analyze high-dimensional data, and knowledgebases → biological knowledge → medical knowledge → improved health
source: Bios 560R Introduction to Bioinformatics, userwww.service.emory.edu/~tyu8/560R/560R_1.pptx
Key research in bioinformatics
• sequence bioinformatics
• structural bioinformatics
• systems biology
• analysis of biological pathways, e.g. to gain an understanding of
disease processes
21st century – complex systems
• Understanding (reverse-engineering)
• Designing (forward-engineering)
• Fixing
• Why is it so complex?
• Can we make sense of this complexity?
• How is it robust?
http://yilab.bio.uci.edu/ICSB2007_Tutorial_AM1.htm
STUDYING GENOMES
Studying DNA
Enzymes for DNA manipulation
• Before the 1970s, the only way in which individual genes
could be studied was by classical genetics.
• Biochemical research provided (in the early 70s)
molecular biologists with enzymes that could be used to
manipulate DNA molecules in the test tube.
• Molecular biologists adopted these enzymes as tools for
manipulating DNA molecules in pre-determined ways,
using them to make copies of DNA molecules, to cut DNA
molecules into shorter fragments, and to join them
together again in combinations that do not exist in nature.
• These manipulations form the basis of recombinant DNA
technology.
Recombinant DNA technology
• The enzymes available to the molecular biologist fall into
four broad categories:
1. DNA polymerases – synthesize new polynucleotides
complementary to an existing DNA or RNA template
2. Nucleases – degrade DNA molecules by breaking the
phosphodiester bonds
• restriction endonucleases (restriction enzymes) – cleave DNA
molecules only where a specific DNA sequence is encountered
3. Ligases – join DNA molecules together
4. End modification enzymes – make changes to the ends of
DNA molecules
source: Brown T. A. , Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/
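The enzyme categories above can be mimicked on plain strings. The following is a minimal sketch, not a laboratory-accurate model: it treats DNA as a single strand, and the function names (complement_strand, digest, ligate) are hypothetical. The default site GAATTC with a cut after the first G is the EcoRI recognition sequence; the example sequence itself is made up.

```python
def complement_strand(template: str) -> str:
    """Polymerase-like step: build the strand complementary to a template.

    The result is written 5'->3', i.e. it is the reverse complement.
    """
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(template))


def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Restriction-endonuclease-like step: cut at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, which cuts G^AATTC).
    """
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


def ligate(fragments: list[str]) -> str:
    """Ligase-like step: join fragments end to end."""
    return "".join(fragments)


dna = "TTAGAATTCGGCCGAATTCAT"  # made-up sequence with two GAATTC sites
fragments = digest(dna)
print(fragments)                 # three fragments
print(ligate(fragments) == dna)  # rejoining in order restores the molecule
```

Ligating the fragments in a different order, or together with fragments from another molecule, gives the kind of combination "that does not exist in nature" mentioned on the previous slide.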
DNA cloning
• DNA cloning (i.e. copying) – logical extension of the ability
to manipulate DNA molecules with restriction
endonucleases and ligases
• vector
• DNA sequence that naturally replicates inside bacteria.
• It consists of an insert (transgene) and a larger sequence serving as
the backbone of the vector.
• The purpose of a vector: transfer genetic information to another cell
to isolate, multiply, or express (expression vector) the insert in the
target cell.
• plasmid (1-10 kbp), cosmid (40-45 kbp), BAC (100-350 kbp), YAC
(1.5-3.0 Mbp)
Vectors
• plasmid
• DNA molecule that is separated from, and can replicate
independently of, the chromosomal DNA.
• Double stranded, usually circular, occurs naturally in bacteria.
• Serve as important tools in genetics and biotechnology labs, where
they are commonly used to multiply (clone) or express particular
genes.
• BAC (bacterial artificial chromosome)
• It is a vector based on a particular plasmid (the F plasmid) found in
E. coli. A typical BAC can carry about 250 kbp.
[Figure labels: restriction endonuclease, ligase]
source: Brown T. A. , Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/
PCR – Polymerase chain reaction
• DNA cloning results in the purification of a single fragment
of DNA from a complex mixture of DNA molecules.
• Major disadvantage: it is a time-consuming (several days to
produce recombinants) and, in parts, difficult procedure.
• The next major technical breakthrough (1983) after gene
cloning was PCR.
• It amplifies a short fragment of a DNA molecule in a much
shorter time, just a few hours.
• PCR is complementary to, not a replacement for, cloning
because it has its own limitations: the need to know the
sequence of at least part of the fragment.
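The limitation in the last bullet (part of the sequence must be known in order to design primers) can be illustrated with a small in-silico PCR. This is a sketch under simplifying assumptions: primers must match exactly, only the top strand is searched, and every cycle is assumed to double the product perfectly. The template, the primers and the function names are made up for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def in_silico_pcr(template: str, forward_primer: str, reverse_primer: str):
    """Return the region bounded by the two primers, or None if either fails to bind.

    The forward primer matches the top strand directly; the reverse primer
    binds the bottom strand, so its reverse complement is searched for on
    the top strand downstream of the forward primer.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    downstream_site = reverse_complement(reverse_primer)
    end = template.find(downstream_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(downstream_site)]


def copies_after(cycles: int, starting_copies: int = 1) -> int:
    """Idealized amplification: the number of copies doubles every cycle."""
    return starting_copies * 2 ** cycles


template = "GGATCCATGACCGTTAGCTTAAGGCT"  # made-up template
amplicon = in_silico_pcr(template, "ATGACC", "CCTTAA")
print(amplicon)
print(copies_after(30))  # ideal yield from one molecule after 30 cycles
```

Without knowing the two flanking sequences there is nothing to search for, which is why cloning is still needed for completely uncharacterized DNA.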
Mapping genomes
What is it about?
• Assigning a specific gene to a particular region of a
chromosome and determining the locations of, and
relative distances between, genes on the chromosome.
• There are two types of maps:
• genetic linkage map – shows the arrangement of genes (or other
markers) along the chromosomes as calculated by the frequency
with which they are inherited together
• physical map – representation of the chromosomes, providing the
physical distance between landmarks on the chromosome, ideally
measured in nucleotide bases
• The ultimate physical map is the complete sequence itself.
Genetic linkage map
• Constructed by observing how frequently two markers
(e.g. genes; other marker types follow on the next slides) are
inherited together.
• Two markers located on the same chromosome can be
separated only through the process of recombination.
• If they are separated, a child will inherit just one marker
from the pair.
• However, the closer the markers are to each other, the
more tightly linked they are, and the less likely
recombination will separate them. They will tend to be
passed together from parent to child.
• Recombination frequency provides an estimate of the
distance between two markers.
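Recombination frequency is simply the fraction of offspring in which the two markers were separated. A minimal sketch follows, assuming each offspring has been scored by the two-marker haplotype it inherited from one parent; the haplotype labels and counts are invented for illustration.

```python
from collections import Counter


def recombination_frequency(haplotypes: list[str], parental_types: set[str]) -> float:
    """Fraction of scored haplotypes that are not one of the parental combinations."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no offspring scored")
    recombinants = sum(n for hap, n in counts.items() if hap not in parental_types)
    return recombinants / total


# Hypothetical cross: the parent carries markers A and B on one chromosome
# and a and b on the other, so "AB" and "ab" are the parental combinations.
offspring = ["AB"] * 45 + ["ab"] * 43 + ["Ab"] * 7 + ["aB"] * 5
rf = recombination_frequency(offspring, {"AB", "ab"})
print(f"recombination frequency = {rf:.2f}")
```

A value near 0 indicates tightly linked markers; a value near 0.5 is what unlinked markers would give.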
Genetic linkage map
• On the genetic maps distances between markers are
measured in terms of centimorgans (cM).
• Two markers 1 cM apart are separated by recombination 1% of the
time
• 1 cM is ROUGHLY equal to a physical distance of 1 Mbp in humans
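The unit conversions on this slide, and the way pairwise distances imply a marker order, can be sketched as follows. Assumptions: the direct conversion from recombination frequency to cM is only reasonable for small distances, the 1 Mbp per cM factor is the rough human figure quoted above, and the three-marker distances are invented.

```python
def rf_to_centimorgans(rf: float) -> float:
    """1% recombination corresponds to 1 cM (a small-distance approximation)."""
    return rf * 100


def rough_physical_distance_mbp(cm: float, mbp_per_cm: float = 1.0) -> float:
    """Very rough physical distance, using about 1 Mbp per cM for human."""
    return cm * mbp_per_cm


def order_three_markers(distances: dict[tuple[str, str], float]) -> list[str]:
    """Order three markers: the pair with the largest distance lies on the outside."""
    left, right = max(distances, key=distances.get)
    markers = {marker for pair in distances for marker in pair}
    (middle,) = markers - {left, right}
    return [left, middle, right]


# Hypothetical pairwise map distances in cM.
distances = {("A", "B"): 5.0, ("B", "C"): 12.0, ("A", "C"): 16.0}
order = order_three_markers(distances)
print(order)
print(rough_physical_distance_mbp(rf_to_centimorgans(0.05)), "Mbp (rough)")
```

In the invented data the A to C distance (16 cM) is slightly less than the sum of the two shorter intervals (17 cM), which is the usual effect of undetected double crossovers over longer distances.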
Value of genetic map – marker analysis
• An inherited disease can be located on the map by following the
inheritance of a DNA marker present in affected individuals (but
absent in unaffected individuals), even though the molecular
basis of the disease may not yet be understood nor the
responsible gene identified.
• This represents a cornerstone of testing for genetic diseases.
Genetic markers
• A genetic map must show the positions of distinctive
features – markers.
• Any inherited physical or molecular characteristic that
differs among individuals and is easily detectable in the
laboratory is a potential genetic marker.
• Markers can be
• expressed DNA regions (genes) or
• DNA segments that have no known coding function but whose
inheritance pattern can be followed.
• Genes are not ideal markers: in larger genomes (e.g. vertebrates)
gene density is low → maps based on genes alone are not very detailed
Genetic markers
• Must be polymorphic, i.e. alternative forms (alleles) must
exist among individuals so that they are detectable among
different members in family studies.
• Variations within exons (genes) – lead to observable
changes (e.g. eye color)
• Most variations occur within introns, have little or no effect
on an organism, yet they are detectable at the DNA level
and can be used as markers.
1. restriction fragment length polymorphisms (RFLPs)
2. simple sequence length polymorphisms (SSLPs)
3. single nucleotide polymorphisms (SNPs, pron.: “snips”)
RFLPs
• Recall that restriction enzymes cut DNA molecules at specific
recognition sequences.
• This sequence specificity means that treatment of a DNA
molecule with a restriction enzyme should always produce the
same set of fragments.
• This is not always the case with genomic DNA molecules
because some restriction sites exist as two alleles, one allele
displaying the correct sequence for the restriction site and
therefore being cut, and the second allele having a sequence
alteration so the restriction site is no longer recognized.
source: Brown T. A. , Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/
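The effect of a polymorphic restriction site on fragment lengths can be sketched in a few lines. This is an illustration on made-up sequences, assuming a single-stranded string model and the EcoRI site GAATTC (cut after the first G); fragment_lengths is a hypothetical helper.

```python
def fragment_lengths(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Lengths of the fragments produced by cutting at every `site`, longest first."""
    lengths, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)
        start = cut
        pos = dna.find(site, pos + 1)
    lengths.append(len(dna) - start)
    return sorted(lengths, reverse=True)


# Two made-up alleles of the same region. Both share the first site;
# allele 2 carries a single-base change (GAATTC -> GAGTTC) in the second.
shared = "ACGTACGT" + "GAATTC" + "TTGACCAGT"
allele_1 = shared + "GAATTC" + "GGA"
allele_2 = shared + "GAGTTC" + "GGA"

print(fragment_lengths(allele_1))  # three fragments
print(fragment_lengths(allele_2))  # two fragments, one of them longer
```

The two shortest-to-longest patterns differ, so the alleles can be told apart by fragment size alone, without reading the sequence.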
SSLPs
• Repeat sequences that display length variations, different alleles
containing different numbers of repeat units (i.e. SSLPs are multiallelic).
• variable number of tandem repeat sequences (VNTRs,
minisatellites)
• repeat unit up to 25 bp in length
• simple tandem repeats (STRs, microsatellites)
• repeats are shorter, usually di- or tetranucleotide
source: Brown T. A. , Genomes. 2nd ed. http://www.ncbi.nlm.nih.gov/books/NBK21129/
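Alleles of a microsatellite differ in how many copies of the repeat unit they contain. A minimal sketch of counting those copies follows; the CA repeat, the flanking sequences and the function name are assumptions chosen for illustration.

```python
import re


def longest_tandem_repeat(dna: str, unit: str) -> int:
    """Number of copies of `unit` in the longest uninterrupted tandem run."""
    runs = re.findall(f"(?:{re.escape(unit)})+", dna)
    return max((len(run) // len(unit) for run in runs), default=0)


# Two made-up alleles with identical flanks and different repeat counts.
allele_a = "GGCT" + "CA" * 5 + "TTAG"
allele_b = "GGCT" + "CA" * 8 + "TTAG"

print(longest_tandem_repeat(allele_a, "CA"))
print(longest_tandem_repeat(allele_b, "CA"))
print(len(allele_b) - len(allele_a), "bp length difference")
```

Because the flanks are identical, the length difference between the alleles equals the difference in repeat count times the unit length, which is what makes these markers detectable by fragment size.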
SNPs
• Positions in a genome where some individuals have one
nucleotide and others have a different nucleotide.
• There are vast numbers of SNPs in every genome.
• Each SNP could potentially have four alleles, but most exist in
just two forms.
• The value of a two-allele marker (SNP, RFLP) is limited by
the high probability that the marker shows no variability
among the members of an interesting family.
• The advantages of SNPs over RFLPs:
• they are abundant (human genome: 1.5 million SNPs, 100 000
RFLPs)
• they are easier to type (i.e. easier to detect)
more at http://www.informatics.jax.org/silver/chapters/7-1.shtml
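Finding SNPs amounts to comparing the same positions across individuals. The sketch below assumes the sequences are already aligned and of equal length (no insertions or deletions); the four sequences and the function name are invented.

```python
def find_snps(sequences: list[str]) -> dict[int, list[str]]:
    """Map each variable position (0-based) to the sorted alleles seen there."""
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for position, column in enumerate(zip(*sequences)):
        alleles = set(column)
        if len(alleles) > 1:
            snps[position] = sorted(alleles)
    return snps


# Made-up aligned sequences from four individuals.
individuals = [
    "ATGCCGTA",
    "ATGCTGTA",
    "ATGCCGTA",
    "ACGCTGTA",
]
snps = find_snps(individuals)
print(snps)
```

Both variable positions in this toy data have exactly two alleles, matching the observation above that most SNPs exist in just two forms.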
Genome maps
• relative locations of genes are established by following inheritance
patterns
• visual appearance of a chromosome when stained and examined under a
microscope
• the order and spacing of the genes, measured in base pairs
• sequence map
source: Talking glossary of genetic terms, http://www.genome.gov/glossary/