Introduction to Genetics – as
relevant to this course
(Ack: Roche Genetics CD-ROM,
Mishra’s notes at NYU, …)
Background (1/18)
• Genome, chromosomes, genes – all made up of DNA
• Genetics research (largely over last 100yrs, accelerated in last 30
yrs)
– Has led to important advances in medical science.
• Nucleus of a cell: contains chromosomes (made up of DNA) and
proteins.
• DNA (deoxyribonucleic acid)
– Is the genetic material that is inherited.
– Contains the information needed by living cells to specify their structure,
function, activity and interaction with other cells and environment.
– A DNA molecule can be thought of as a very long sequence of
nucleotides or bases.
DNA structure (2/18)
• The Nobel Prize in Physiology or Medicine 1962 -Crick, Watson and Wilkins
– for their discoveries concerning the molecular structure of
nucleic acids and its significance for information transfer in
living material
• Made up of 4 different building blocks (so-called
nucleotide bases), each an almost planar nitrogenous
organic compound
– Adenine (A)
– Thymine (T)
– Guanine (G)
– Cytosine (C)
– Base pairs (A -- T, C -- G)
DNA Structure cont. (3/18)
• Bases (A, T, C, G) are attached to a sugar-phosphate
backbone to form one of the 2 strands of a DNA molecule.
– Phosphate (PO4 3-)
– Deoxyribose
• The two strands are bonded together by the base pairs (A – T, C – G).
• This results in complementary strands; each is twisted
(helical), and when bonded they form a double helix.
• Direction of each strand (5’ marks the beginning and 3’ the end of
the strand)
– 5’ and 3’ refer to the numbered carbon atoms of the sugar molecule in
the DNA backbone.
– They are important reference points for navigating the genome.
– The 2 complementary strands are oriented in opposite directions to each
other.
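The pairing rule and the opposite orientation of the two strands can be sketched in a few lines of Python. This is only an illustration: the function name and the example sequence are made up for this note, not taken from the slides.

```python
# Watson-Crick pairing from the slides: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3'.

    The partner runs in the opposite direction, so the input is read
    backwards while each base is swapped for its pairing partner.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))

example = "ATGCGT"  # hypothetical sequence, written 5'->3'
print(reverse_complement(example))
```

Applying the function twice returns the original strand, which is one way to see that the two strands carry the same information.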
DNA Structure
Genome Size
Species                  Genome size (base pairs)   No. of chromosomes
E. coli                  4.64 x 10^6                1
S. cerevisiae (yeast)    1.205 x 10^7               16
C. elegans (nematode)    10^8                       11/12
D. melanogaster          1.7 x 10^8                 4
M. musculus              3 x 10^9                   20
H. sapiens               3 x 10^9                   23
A. cepa (onion)          1.5 x 10^10                8
Note: human DNA is about 6 feet long when completely stretched out.
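The "6 feet" figure for human DNA can be checked with a rough calculation. Two assumptions are not in the slides: a spacing of about 0.34 nm per base pair (the usual textbook value for double-helical DNA), and that the figure refers to both genome copies of a diploid cell.

```python
BASE_PAIRS_HAPLOID = 3e9   # H. sapiens genome size from the table
RISE_PER_BP_M = 0.34e-9    # assumed: ~0.34 nm per base pair
METERS_PER_FOOT = 0.3048

haploid_m = BASE_PAIRS_HAPLOID * RISE_PER_BP_M   # one genome copy
diploid_m = 2 * haploid_m                        # both copies in a diploid cell
diploid_ft = diploid_m / METERS_PER_FOOT

print(f"{haploid_m:.2f} m per genome copy, "
      f"{diploid_m:.2f} m per diploid cell (~{diploid_ft:.1f} ft)")
```

Under these assumptions one cell holds roughly 2 m of DNA, which agrees with the slide's figure to within rounding.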
DNA hybridization (DL 3/18)
• Hybridization between complementary DNA sequences
to form a double stranded DNA molecule.
– One of the most important DNA technologies
• Applications of Hybridization
– PCR (Polymerase Chain Reaction)
• Enzymatically generating millions of copies of a tiny amount of a
particular nucleic acid sequence.
– Northern blot analysis
• Possible to study (in a semi-quantitative manner) the level of
transcription of a particular gene.
– DNA Microarrays
• Can interrogate the level of transcription of several thousand
different genes in one sample in one experiment.
PCR (Polymerase Chain Reaction)
(DL 3/18)
• PCR allows selected amplification of a DNA sequence.
– Only a tiny amount of DNA is necessary to obtain a PCR product (a drop of
blood or less is enough).
– Complementary DNA primers need to be designed.
• For this the DNA sequence flanking the target sequence needs to be known
in advance.
• Primers are short synthetic DNA sequences of about 20 bases (so-called
oligonucleotides) that can specifically hybridize to a unique complementary
DNA sequence.
• The approach
– Genomic DNA (the template), primers (the starters), deoxynucleotides
(building blocks) and a special DNA polymerase that is resistant to heat (the
motor of the reaction) are mixed together in one reaction tube.
– The reaction takes place in a thermocycler (an apparatus that allows one to
precisely heat and cool the reaction).
– The DNA is heated to almost boiling temperature, which separates the 2
strands (this step is called heat denaturation).
– Cooling of the mixture allows the primers to bind to their complementary
sequences in the genomic DNA.
– Once the primers bind, the DNA polymerase uses them as the start site to
generate a copy of each strand of the targeted gene fragment, building 2 new
double stranded molecules.
– Repeating this cycle (denaturation followed by cooling) 30 times results in
2^30 ≈ 10^9 (about 1 billion) copies.
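The doubling arithmetic behind the "1 billion copies" figure can be sketched as follows. It assumes an idealized reaction in which every cycle exactly doubles the target; real reactions fall short of this, and the function name is only illustrative.

```python
def pcr_copies(cycles: int, starting_templates: int = 1) -> int:
    """Idealized PCR yield: each cycle doubles the number of target molecules."""
    return starting_templates * 2 ** cycles

for n in (1, 10, 20, 30):
    print(f"after {n:2d} cycles: {pcr_copies(n):,} copies")
```

After 30 cycles a single template gives 2^30 = 1,073,741,824 copies, which is the slide's "about 1 billion".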
Northern Blot analysis
• The complete RNA content from a sample is separated according to size by
electrophoresis.
• Usually done in a sheet of agarose (similar to gelatine)
– In response to electric current, larger molecules move slower and smaller
ones move faster, thus separating different RNA molecules by size.
• Then the RNAs are transferred from the gel to a filter membrane (the blot).
• The blot is then exposed to a solution containing a nucleic acid (probe)
complementary to the sequence whose presence in the blot one wants to
interrogate.
• The probe may be cRNA or cDNA with a detectable marking (radioactive
isotope or a fluorescent tag).
• If the targeted sequence is present in the blot, then the probe hybridizes
and sticks to the blot at the location where the targeted sequence is located.
• After washing off excess probe, a signal is detectable, and its specificity
can be checked against the expected size of the RNA, which correlates with
how far it has migrated during electrophoresis.
• With this method it is possible to study in a semi-quantitative manner the
level of transcription of a particular gene.
• Comparison of the results from different samples (e.g. different organs)
provides information about the transcriptional regulation of the gene.
DNA Structure cont. (4/18)
• The order of nucleotide bases along a DNA
strand is known as the sequence.
• The genetic information is encoded in the
precise order of the base pairs.
• DL
– GenBank database
http://www.ncbi.nlm.nih.gov/Entrez/
– Human genome project
http://www.genome.gov/page.cfm?pageID=10001694
– DNA sequencing
• Is the process designed to precisely determine the sequence
of bases in the DNA.
Cells, DNA and genome (5,6,8/18)
• During cell division (Mitosis) the entire DNA of the cell is
copied
– 2 strands separate, complementary strands are generated.
– Two duplicate DNA sequences are produced.
• Genome: an organism’s total DNA content
• Diploid cells: cells that carry 2 genome copies
• Haploid cells: have a single copy of the genome
– Reproductive germ cells (gametes), i.e., egg & sperm cells
• Human genome consists of
– 22 autosomal chromosomes (same in males and females)
– 2 sex chromosomes X and Y (males XY, females XX)
Structure of Chromosomes (7/18)
• The center is called the centromere.
• The two ends are called telomeres.
• The centromere separates two arms
– Short arm p
– Long arm q.
Structure of genes (9-11/18)
• Genes are those parts of the genome that contain the
information necessary for the building of proteins.
(size: 100 to several million base pairs)
• Exon (coding sequence), intron (non-coding sequence),
regulatory region (at the two ends, for regulating how
actively protein is to be synthesized from the gene)
– Eukaryotes (organisms whose cells have a nucleus) have genes
segmented into exons and introns
– Introns can occur between individual codons or within a single
codon.
• Promoter (a regulatory element at the 5’ end)
– Consists of several short sequences which are consensus
binding sites for a number of proteins called transcription factors.
(DL 10/18)
• Prokaryotes (do not have a nucleus) – genes are not
segmented into exons and introns.
• Eukaryotes (genes normally segmented into exons and introns)
– Except mitochondrial genes & a few nuclear genes.
• During gene expression exons and introns are
transcribed to form a pre-mRNA
• RNA splicing -- removes the introns and joins the exons to produce a
mature mRNA molecule that codes for a polypeptide.
• Exons – sequences that are represented in the mature
mRNA
– May or may not code for a protein
– E.g. exons at the 3’ or 5’ end of the mRNA may not be translated into
protein
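Splicing can be pictured as cutting the exon segments out of the pre-mRNA and joining them in order. The sketch below is purely illustrative: the sequence and exon coordinates are invented, and introns are written in lower case only so they are easy to spot.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based start, end-exclusive) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical pre-mRNA: three exons (upper case) separated by two introns (lower case).
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUGGA" + "guaccccuag" + "UAA"
exons = [(0, 6), (18, 24), (34, 37)]  # assumed exon coordinates for this example

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)
```

The mature mRNA contains only the exon sequence, so it is shorter than the pre-mRNA it came from.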
Some Genes (from Mishra’s slides)
Gene product           Organism      Exon length   # Introns   Intron length
Adenosine deaminase    Human         1,500         11          30,000
Apolipoprotein B       Human         14,000        28          29,000
Erythropoietin         Human         582           4           1,562
Thyroglobulin          Human         8,500         ≥ 40        100,000
α-interferon           Human         600           0           0
Fibroin                Silk worm     18,000        1           970
Phaseolin              French bean   1,263         5           515
Some human gene locations
(From Mishra’s slides)
Gene                              Chromosome
α-globin cluster                  16
β-globin cluster                  11
Insulin                           11
Galactokinase                     11
Immunoglobulin
  κ (light chain)                 2
  λ (light chain)                 22
  Heavy chain                     14
  Pseudogenes                     9,32,15,18
Viral oncogene homologues
  C-sis                           22
  C-mos                           8
  C-Ha-Ras-1                      11
  C-myb                           6
Growth hormone gene cluster       17
Thymidine kinase                  17
Interferons
  α & β cluster                   9
  γ                               12
Gene expression (12/18)
• Gene expression (transcription and
translation) – the 2-step process that
leads from genes to proteins
• Transcription: genetic information in DNA
is copied into messenger RNA (mRNA)
• Translation: mRNA is used as a template
to synthesize a protein.
Central Dogma
• Due to Francis Crick (1958); states that these
information flows are all unidirectional:
– “The central dogma states that once `information' has passed into
protein it cannot get out again. The transfer of information from nucleic
acid to nucleic acid, or from nucleic acid to protein, may be possible, but
transfer from protein to protein, or from protein to nucleic acid is
impossible. Information means here the precise determination of
sequence, either of bases in the nucleic acid or of amino acid residues
in the protein.”
Transcription (13/18)
• RNA (Ribonucleic acid)
– Similar to DNA (except for a chemical modification of the sugar backbone)
– Instead of T it contains U (uracil), which binds with A.
– Is not double stranded but single stranded
– RNA molecules tend to fold back on themselves to make helical, twisted and
rigid segments.
• RNA is synthesized
– By unwinding the DNA double helix, separating the 2 strands.
– Using one of the strands as a template along which to build the RNA molecule
– Accomplished by the enzyme RNA polymerase (binds to the promoter and copies,
or transcribes, the gene in its full length)
– The resulting molecule is called pre-mRNA
– The single stranded pre-mRNA is then processed.
– Splicing (mediated by the spliceosome, consisting of RNA and proteins) removes
the introns.
– The ends are modified (capping modifies the 5’ end and polyadenylation adds
adenines at the 3’ end) to enhance stability
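The template-copying step can be sketched as a base-by-base mapping. The sketch assumes the template strand is supplied in 3’->5’ orientation, so the RNA comes out written 5’->3’; the example sequence is hypothetical, and promoter binding, splicing, capping and polyadenylation are not modeled.

```python
# Pairing used while building RNA on a DNA template: U (not T) pairs with A.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') complementary to a DNA template given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
print(transcribe(template))
```

The RNA matches the non-template DNA strand except that every T is replaced by U.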
Translation (14/18)
• mRNA is used as a template to synthesize a protein.
• Translation takes place outside the nucleus, in the cytoplasm, on ribosomes
that are either free or attached to the endoplasmic reticulum.
• Except for the 5’ & 3’ ends of the mRNA (which are non-coding) the rest of
the molecule codes for 1 protein
• Proteins: made up of amino acids
– 20 different amino acids are used to build proteins in humans
– Each is encoded by one or more sets of 3 nucleotides (called triplets or
codons)
– The initial codon is always AUG (coding for methionine)
– Translation is terminated by one of 3 `stop’ codons.
• The translation process is carried out by ribosomes, which scan the mRNA &
build the polypeptide chain from amino acids supplied by transfer RNAs (tRNA).
– Starts at a particular location of the mRNA called the translation start
site (usually AUG)
– tRNAs (transfer RNAs) are a group of small RNA molecules, each with
specificity for a particular amino acid.
– tRNAs carry the amino acids to the ribosomes, the site of protein synthesis,
where they are attached to a growing polypeptide.
– Translation stops when one of UAA, UAG or UGA is encountered
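The scanning process described above can be sketched with a lookup table for the standard genetic code. The compact 64-letter string below lists the standard code in U, C, A, G order, with "*" marking the stop codons. The example mRNA is made up, and starting at the first AUG found is a simplification of real start-site selection.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical mRNA: 5' non-coding bases, AUG ... UAA, then 3' non-coding bases.
print(translate("GGAUGGCCUUUGGAUAACC"))
```

The bases before AUG and after UAA are ignored, mirroring the non-coding 5’ and 3’ ends of the mRNA.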
Post-translational modification (DL)
• The polypeptide chain that results from mRNA
translation is often subject to chemical
modifications, e.g.
– Glycosylation, phosphorylation, hydroxylation
– Addition of lipid groups (e.g. fatty acyl or prenyl
groups)
– Addition of co-factors (e.g. a heme molecule)
– Or proteolytic cleavage
• The type of modification a protein undergoes
depends on its function and sub-cellular
location.
Genetic Code (15/18)
• The combination of nucleotides that build the different codons
represents the genetic code.
• Codon = 3 nucleotides; 4 kinds of nucleotides. So 4 x 4 x 4 = 64
possible codons.
• But there are only 20 amino acids, plus start & stop signals.
• So several codons can specify the same amino acid (the genetic code is
degenerate).
• Start codon (AUG) and stop codons (UAA, UAG, UGA).
• Open reading frame (ORF) – the sequence of nucleotides between
and including the start and stop codons.
• The Nobel Prize in Physiology or Medicine 1968 – Holley,
Khorana and Nirenberg
– for their interpretation of the genetic code and its function in protein
synthesis
– http://www.nobel.se/medicine/laureates/1968/
Amino Acids with Codes
(From Mishra’s slide)
Code  Abbrev.  Amino acid      Codons
A     Ala      alanine         GC(U+A+C+G)
C     Cys      cysteine        UG(U+C)
D     Asp      aspartic acid   GA(U+C)
E     Glu      glutamic acid   GA(G+A)
F     Phe      phenylalanine   UU(U+C)
G     Gly      glycine         GG(U+A+C+G)
H     His      histidine       CA(U+C)
I     Ile      isoleucine      AU(U+A+C)
K     Lys      lysine          AA(A+G)
L     Leu      leucine         (C+U)U(A+G) + CU(U+C)
M     Met      methionine      AUG
N     Asn      asparagine      AA(U+C)
P     Pro      proline         CC(U+A+C+G)
Q     Gln      glutamine       CA(A+G)
R     Arg      arginine        (A+C)G(A+G) + CG(U+C)
S     Ser      serine          (AG+UC)(U+C) + UC(A+G)
T     Thr      threonine       AC(U+A+C+G)
V     Val      valine          GU(U+A+C+G)
W     Trp      tryptophan      UGG
Y     Tyr      tyrosine        UA(U+C)
Biological Function of Proteins
• Enzyme catalysis: DNA polymerases, lactate dehydrogenase,
trypsin
• Transport: hemoglobin, membrane transporters, serum albumin
• Storage: ovalbumin, egg-white protein, ferritin
• Motion: myosin, actin, tubulin, flagellar proteins
• Structural and mechanical support: collagen, elastin, keratin, viral
coat proteins
• Defense: antibodies, complement factors, blood clotting factors,
protease inhibitors
• Signal transduction: receptors, ion channels, rhodopsin, G
proteins, signalling cascade proteins
• Control of growth, differentiation and metabolism: repressor
proteins, growth factors, cytokines, bone morphogenic proteins,
peptide hormones, cell adhesion proteins
• Toxins: snake venoms, cholera toxin
Differential Gene Expression 17/18
• All cells in the body (that contain a nucleus) carry the full
set of genetic information, but only express about 20% of
the genes at any particular time.
• Gene expression is selective
– Different proteins are expressed in different cells according to
the function of the cell.
• Gene expression is tightly controlled and regulated.
– The differential expression of genes ensures that cells develop
correctly and can differentiate into and function as specialized
cell types, e.g. neurons, muscle cells, or fibroblasts.
cDNA and gene expression (DL)
• Goal: Identify all possible genes expressed in one tissue or cell line.
(Use cDNA libraries)
• cDNA libraries are prepared from mRNA isolated from the cells or
tissue being studied.
• cDNA are DNA molecules that are complementary to the mRNA
sequences in a sample.
• cDNA is synthesized by the enzyme reverse transcriptase (RT), which
uses the mRNA as a template.
– RT is a viral enzyme used by viruses whose genome is made of RNA,
not DNA.
• A cDNA library represents the collection of all genes expressed in a
particular cell or tissue type.
• DNA sequence → mRNA sequence → cDNA sequence (much
smaller, because the introns are eliminated while the mRNA is generated)
– Hence very useful when trying to isolate a particular gene to study the
protein it codes for.
NEXT SEC.
Gene Cloning (1/11)
• The first step in identifying a gene and its function is to isolate it from
the rest of the genome and produce a large quantity of it (called cloning a
gene).
• Cloning a DNA fragment using bacteria
– The DNA fragment is isolated from the entire genome using a restriction enzyme.
• These enzymes can cut the DNA (in a staggered fashion or straight through) at
specific sites defined by a short sequence.
• Typically they recognize specific DNA sequences of 4, 6, or 8 bases
• These enzymes are found in bacteria, where their role is to protect the
bacteria from foreign DNA by digesting it into smaller pieces
– This fragment is inserted into a vector (like a mini-chromosome) using DNA
ligase, and the recombinant product is introduced into bacteria (this process
is called transformation)
• Cloning vectors are DNA fragments that are able to replicate within a cell
and allow the addition of exogenous DNA.
• They are derived from plasmids, viruses, phages or chromosomes.
• Vectors are classified according to the type of host cell they can replicate
in, or the size of the exogenous DNA they are able to carry.
– The bacteria now make new copies with every cell division.
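Cutting at a recognition site can be sketched as a string search. The default values below use EcoRI as a familiar example that is not named in the slides: it recognizes the 6-base sequence GAATTC and cuts after the first G. Only one strand is modeled, so the staggered cut on the opposite strand is not shown, and the input sequence is invented.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `dna` at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments = []
    previous_cut = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[previous_cut:cut])
        previous_cut = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[previous_cut:])  # whatever remains after the last cut
    return fragments

# Hypothetical sequence containing two recognition sites.
print(digest("AAGAATTCGGGGAATTCTT"))
```

Joining the fragments back together reproduces the input, which is the job DNA ligase does when the fragment is inserted into a vector.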
DNA Sequencing (DL 1/11)
• It is the process designed to precisely determine
the sequence of bases in the DNA.
• Involves enzymatically copying the DNA in the
presence of compounds that terminate this
copying process in a base specific manner,
resulting in a mixture of DNA copies that differ in
size by one base.
• Different technologies are used to resolve the
mixture and detect the different fragments.
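The idea that base-specific termination plus size separation reveals the sequence can be shown with a toy model. It assumes one termination reaction per base and perfect size resolution. The names are illustrative, and the "sequence" is simply read back from simulated fragment lengths, not from real instrument data.

```python
def terminated_lengths(copy: str, base: str) -> list[int]:
    """Lengths of the copies that stop at each occurrence of `base` (one reaction)."""
    return [i + 1 for i, b in enumerate(copy) if b == base]

def read_sequence(copy: str) -> str:
    """Recover the base order from fragment lengths, shortest fragment first."""
    length_to_base = {}
    for base in "ACGT":  # one termination reaction per base
        for length in terminated_lengths(copy, base):
            length_to_base[length] = base
    # Sorting by size plays the role of resolving the fragment mixture.
    return "".join(length_to_base[n] for n in sorted(length_to_base))

print(terminated_lengths("GATTACA", "A"))
print(read_sequence("GATTACA"))
```

Each fragment length occurs in exactly one of the four reactions, so ordering the fragments by size spells out the sequence one base at a time.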
Cloning issues (2-3/11)
• Clones from genomic DNA contain introns
(non-coding sequence) and are very large
and difficult to analyze for function.
• Alternative: start from mRNA. Convert to
cDNA and clone the cDNA.
Gene function
characterization(4/11)
• To characterize the function of a gene it is
important to know the sequence and
compare it to other sequences in the
databases. Identify where and under what
condition it is expressed and what
function, if known, it has in other
organisms.
• Also do gene expression studies.
Gene expression studies (5/11)
• Allow you to understand how a gene is regulated in a tissue or a cell type.
• The most useful way of studying gene expression is by measuring the levels of
mRNA produced from a particular gene in a particular tissue.
• Application: to understand certain biological processes it is useful to study
the differences in gene expression which occur during such processes. E.g.
– It is of interest to know which genes are induced or repressed, say in the
liver, after a particular drug is taken.
– Or which genes are expressed in a tumor but not in the surrounding normal
tissue.
• Some techniques for analyzing the mRNA level of a single gene or for
quantifying gene expression
– Northern blots
– Quantitative reverse transcriptase PCR (QT-RT-PCR)
– DNA microarrays
– Proteomics (analysis of the protein synthesis that results from gene
expression)
DNA microarrays (6/11)
• Consist of thousands of DNA probes corresponding to different genes,
arranged as an array.
• Each probe (sometimes consisting of a short sequence of synthetic DNA)
is complementary to a different mRNA (or cDNA).
• mRNA isolated from a tissue or cell type is converted to fluorescently
labeled mRNA or cDNA and is used to hybridize the array.
• The labeled transcript of each expressed gene in the sample binds to its
probe on the array and generates a fluorescent signal.
• A DNA microarray can interrogate the level of transcription of several
thousand different genes from one sample in one experiment. (One DNA
microarray experiment reveals the mRNA levels of 1000s of genes from one
tissue or cell type at one time point.)
• Particularly useful when studying the effect of environmental factors on
gene expression.
• A fingernail-size chip can interrogate 10,000 different transcripts. For each
transcript the chip has 30-40 different probes; half of them are designed to
perfectly match 20-nucleotide stretches of the gene, and the other half
contain a mismatch as a control to test the specificity of the hybridization
signal.
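One simple way to combine the perfect-match and mismatch probes into a single expression value is to average their differences. The slides do not say which summary is used, so the function below is an assumption for illustration, and the intensity values are invented.

```python
from statistics import mean

def average_difference(perfect_match: list[float], mismatch: list[float]) -> float:
    """Mean of (perfect match - mismatch) over the probe pairs of one transcript."""
    if len(perfect_match) != len(mismatch):
        raise ValueError("each perfect-match probe needs a mismatch partner")
    return mean(pm - mm for pm, mm in zip(perfect_match, mismatch))

# Hypothetical fluorescence intensities for four probe pairs of one gene.
pm = [1200.0, 950.0, 1100.0, 1020.0]
mm = [200.0, 150.0, 300.0, 220.0]
print(average_difference(pm, mm))
```

A value near zero would suggest that the signal comes mostly from non-specific hybridization, since the mismatch probes light up as strongly as the perfect-match probes.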
Pharmacogenomics (7/11)
• It refers to the study of differential gene
expression applied to drug discovery and
optimization.
• Applications (differential gene expression
studies in special tissues or cell types may)
– Find new disease mechanisms
– Discover new drug targets
– Confirm the expected mechanism of action of a drug
– Choose the best candidate compound based on
optimal expression profile.
– Figure out a priori who will benefit from a drug
and who won’t.
Model organisms 9/11
• Indispensable tool to study the function of a
gene.
• Range from bacteria and yeast to animals
amenable to genetic modification.
– Worms, insect cells, frog eggs, flies, zebra fish, mice,
mammalian (human) cell lines.
• In general, the more complex the organism, the more
difficult genetic modification is, but the more
relevant the model becomes to humans.