Instructor
Zhi Wei
Outline
Cell
Genome
Gene mRNA
Proteins
Systems biology
Outline
Cell
Genome
Gene mRNA
Proteins
Systems biology
Cells
Fundamental working units of every living system.
Every organism is composed of one of two radically different types of cells: prokaryotic cells or eukaryotic cells.
Prokaryotes and Eukaryotes are descended from the same primitive cell.
Different Structures
Different Components
Different biological processes
Prokaryotes vs Eukaryotes
Prokaryotes
Single cell
No nucleus
No organelles
Eukaryotes
Single or multi cell
Nucleus
Organelles
One piece of circular DNA Chromosomes
No mRNA post transcriptional modification
Exons/Introns splicing
Prokaryotes
bacteria, archaea
Ecoli cell
5X10 6 base pairs
> 90% of DNA encode protein
5400 genes
Lacks a membrane-bound nucleus.
Circular DNA
Histones are unknown
Eukaryotes
plants, animals, protista, and fungi
Yeast cell
12.4x10
6 base pairs
A small fraction of the total DNA encodes protein. Many repeats of non-coding sequences
5800 genes
All chromosomes are contained in a membrane bound nucleus
DNA is divided between 16 chromosomes
A set of five histones: DNA packaging and gene expression regulation
Chemical composition -by weight
70% water
7% small molecules
salts
Lipids amino acids nucleotides
23% macromolecules
Proteins
Polysaccharides lipids
Cells differ in size, shape and weights
Q: what is the biggest cell in the human body?
Cell Cycle
Born, eat, replicate , and die
Lodish et al. Molecular Biology of the Cell (5 th ed.). W.H. Freeman & Co., 2003.
Sexual Reproduction v.s. Cell Division
Cell Division: Cells reproduce by duplicating their contents and dividing in two.
Sexual Reproduction
Formation of new individual by a combination of two haploid sex cells (gametes).
Gametes for fertilization usually come from separate parents
Both gametes are haploid, with a single set of chromosomes. The new individual is called a zygote, with two sets of chromosomes (diploid).
Meiosis is a process to convert a diploid cell to a haploid gamete, and cause a change in the genetic information to increase diversity in the offspring.
Meiosis v.s.
Mitotic cell division
Outline
Cell
Genome
Gene mRNA
Proteins
Systems biology
Genome
A genome is an organism ’ s complete set of DNA (including its genes).
However, in humans less than
3% of the genome actually encodes for genes.
A part of the rest of the genome serves as a control regions (though that ’ s also a small part)
The function of the rest of the genome is unknown (junk DNA?
An open question).
E. Coli
Yeast
Worm
Fly
Human
Plant
Genome size (bp) Num. of genes
.05*10 8
.12*10 8
.15*10 8
5,400
5,800
18,400
1.8*10 8
30*10 8
1.3*10 8
13,600
25,000
25,000
Outline
Cell
Genome
Gene mRNA
Proteins
Systems biology
Promoter Protein coding sequence Terminator
Genomic DNA
DNA: Deoxyribo Nucleic Acid
ATGAAGCTACTGTCTTCTATCGAACAAGCATGCGATATTTGCCGACTTAAAAAGCTCAAG
TGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGTCTGAAGAACAACTGGGAGTGTCGCTAC
TCTCCCAAAACCAAAAGGTCTCCGCTGACTAGGGCACATCTGACAGAAGTGGAATCAAGG
CTAGAAAGACTGGAACAGCTATTTCTACTGATTTTTCCTCGAGAAGACCTTGACATGATT
TTGAAAATGGATTCTTTACAGGATATAAAAGCATTGTTAACAGGATTATTTGTACAAGAT
AATGTGAATAAAGATGCCGTCACAGATAGATTGGCTTCAGTGGAGACTGATATGCCTCTA
ACATTGAGACAGCATAGAATAAGTGCGACATCATCATCGGAAGAGAGTAGTAACAAAGGT
CAAAGACAGTTGACTGTATCGATTGACTCGGCAGCTCATCATGATAACTCCACAATTCCG
TTGGATTTTATGCCCAGGGATGCTCTTCATGGATTTGATTGGTCTGAAGAGGATGACATG
TCGGATGGCTTGCCCTTCCTGAAAACGGACCCCAACAATAATGGGTTCTTTGGCGACGGT
TCTCTCTTATGTATTCTTCGATCTATTGGCTTTAAACCGGAAAATTACACGAACTCTAAC
GTTAACAGGCTCCCGACCATGATTACGGATAGATACACGTTGGCTTCTAGATCCACAACA
TCCCGTTTACTTCAAAGTTATCTCAATAATTTTCACCCCTACTGCCCTATCGTGCACTCA
CCGACGCTAATGATGTTGTATAATAACCAGATTGAAATCGCGTCGAAGGATCAATGGCAA
ATCCTTTTTAACTGCATATTAGCCATTGGAGCCTGGTGTATAGAGGGGGAATCTACTGAT
ATAGATGTTTTTTACTATCAAAATGCTAAATCTCATTTGACGAGCAAGGTCTTCGAGTCA
A sequence of A,C,G,T
MKLLSSIEQACDICRLKKLKCSKEKPKCAKCLKNNWECRYSPKTKRSPLTRAHLTEVESR
LERLEQLFLLIFPREDLDMILKMDSLQDIKALLTGLFVQDNVNKDAVTDRLASVETDMPL
TLRQHRISATSSSEESSNKGQRQLTVSIDSAAHHDNSTIPLDFMPRDALHGFDWSEEDDM
SDGLPFLKTDPNNNGFFGDGSLLCILRSIGFKPENYTNSNVNRLPTMITDRYTLASRSTT
SRLLQSYLNNFHPYCPIVHSPTLMMLYNNQIEIASKDQWQILFNCILAIGAWCIEGESTD
IDVFYYQNAKSHLTSKVFESGSIILVTALHLLSRYTQWRQKTNTSYNFHSFSIRMAISLG
LNRDLPSSFSDSSILEQRRRIWWSVYSWEIQLSLLYGRSIQLSQNTISFPSSVDDVQRTT
TGPTIYHGIIETARLLQVFTKIYELDKTVTAEKSPICAKKCLMICNEIEEVSRQAPKFLQ
MDISTTALTNLLKEHPWLSFTRFELKWKQLSLIIYVLRDFFTNFTQKKSQLEQDQNDHQS
YEVKRCSIMLSDAAQRTVMSVSSYMDNHNVTPYFAWNCSYYLFNAVLVPIKTLLSNSKSN
AENNETAQLLQQINTVLMLLKKLATFKIQTCEKYIQVLEEVCAPFLLSQCAIPLPHISYN
NSNGSAIKNIVGSATIAQYPTLPEENVNNISVKYVSPGSVGPSPVPLKSGASFSDLVKLL
SNRPPSRNSPVTIPRSTPSHRSVTPFLGQQQQLQSLVPLTPSALFGGANFNQSGNIADSS
A sequence of 20 amino acids {A,R,N,D,C,E,Q,G,H,I,L,K,M,F,P,S,T,W,Y,V}
promoter
G A T T A C A . . .
5’
3’
3’
5’
C T A A T G T . . .
transcription factor, binding site, RNA polymerase
G A T T A C A . . .
5’
3’
C T A A T G T . . .
3’
5’
Transcription factors recognize transcription factor binding sites and bind to them, forming a complex.
RNA polymerase binds the complex.
5’
3’
The two strands are separated
3’
5’
5’
3’
3’
5’
An RNA copy of the 5’ → 3’ sequence is created from the 3’ → 5’ template
5’
3’
G A T T A C A . . .
C T A A T G T . . .
pre-mRNA 5’
G A U U A C A . . .
3’
3’
5’
5’ cap, polyadenylation, exon, intron, splicing, UTR
5’ cap intron mRNA
5’ UTR exon
3’ UTR poly(A) tail
introns
5’ 3’ promoter
5’ UTR exons 3’ UTR coding non-coding
Regulatory regions: up to 50 kb upstream of +1 site
Exons: protein coding and untranslated regions (UTR)
Introns:
1 to 178 exons per gene (mean 8.8)
8 bp to 17 kb per exon (mean 145 bp) splice acceptor (GU) and donor (AG) sites, junk DNA average 1 kb – 50 kb per intron
Gene size: Largest – 2.4 Mb (Dystrophin). Mean – 27 kb.
Only 1.5% DNA for coding in Human!
Predicting the start and end of genes as well as the introns and exons in each gene is one of the basic problems in computational biology.
Gene prediction methods look for ORFs (Open
Reading Frame).
These are (relatively long) DNA segments that start with the start codon, end with one of the end codons, and do not contain any other end codon in between.
Splice site prediction has received a lot of attention in the literature.
Comparative genomics
Outline
Cell
Genome
Gene mRNA
Proteins
Systems biology
RNA is similar to DNA chemically. It is usually only a single strand. T(hyamine) is replaced by
U(racil)
Some forms of RNA can form secondary structures by “ pairing up ” with itself. This can have change its properties dramatically.
DNA and RNA can pair with each other.
tRNA linear and 3D view: http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.gif
Several types exist, classified by function mRNA – this is what is usually being referred to when a Bioinformatician says
“ RNA ” . This is used to carry a gene ’ s m essage out of the nucleus.
tRNA – t ransfers genetic information from mRNA to an amino acid sequence rRNA – r ibosomal RNA. Part of the ribosome which is involved in translation.
Basically, an intermediate product
Transcribed from the genome and translated into protein
Number of copies correlates well with number of proteins for the gene.
Unlike DNA, the amount of messenger
RNA (as well as the number of proteins) differs between different cell types and under different conditions.
mRNA is transcribed from the DNA mRNA (like DNA, but unlike proteins) binds to its complement
Outline
Cell
Genome
Gene mRNA
Proteins
Systems biology
Proteins are polypeptide chains of amino acids.
20 different amino acids
different chemical properties cause the protein chains to fold up into specific three-dimensional structures that define their particular functions in the cell.
Proteins do all essential work for the cell
build cellular structures
digest nutrients execute metabolic functions
Mediate information flow within a cell and among cellular communities.
Genes Make Proteins
genome-> genes ->protein(forms cellular structural & life functional)->pathways & physiology
Second letter
U
C
A
G
U
UUU Phenylalanine (Phe)
UUC Phe
UUA Leucine (Leu)
UUG Leu
CUU Leucine (Leu)
CUC Leu
CUA Leu
CUG Leu
AUU Isoleucine (Ile)
AUC Ile
AUA Ile
AUG Methionine (Met) or START
GUU Valine (Val)
GUC Val
GUA Val
GUG Val
C
UCU Serine (Ser)
UCC Ser
UCA Ser
UCG Ser
CCU Proline (Pro)
CCC Pro
CCA Pro
CCG Pro
ACU Threonine (Thr)
ACC Thr
ACA Thr
ACG Thr
GCU Alanine (Ala)
GCC Ala
GCA Ala
GCG Ala
A
UAU Tyrosine (Tyr)
UAC Tyr
UAA STOP
UAG STOP
CAU Histidine (His)
CAC His
CAA Glutamine (Gln)
CAG Gln
AAU Asparagine (Asn)
AAC Asn
AAA Lysine (Lys)
AAG Lys
GAU Aspartic acid (Asp)
GAC Asp
GAA Glutamic acid (Glu)
GAG Glu
Triplet one Amino Acid
4^3 combinations mapped to 20 Amino Acids
G
UGU Cysteine (Cys)
UGC Cys
UGA STOP
UGG Tryptophan (Trp)
CGU Arginine (Arg)
CGC Arg
CGA Arg
CGG Arg
AGU Serine (Ser)
AGC Ser
AGA Arginine (Arg)
AGG Arg
GGU Glycine (Gly)
GGC Gly
GGA Gly
GGG Gly
C
A
G
C
A
G
U
U
C
A
G
U
C
A
G
U
G C U U G U U U A C G A A U U A G
G C U U G U U U A C G A A U U A G
G C U U G U U U A C G A A U U A G
G C U U G U U U A C G A A U U A G
G
G C U U G U U U A C G A A U U A G
G C U U G U U U A C G A A U U A G
Ala Cys Leu Arg Ile
G C U U G U U U G C G A A U U A G
Ala Cys Leu Arg Ile
G
G C U U G U U U A C G A A U U A G
G C U U G U U U A C G A A U U A G
Ala Cys Leu Arg Ile
G C U U G G U U A C G A A U U A G
Ala Trp Leu Arg Ile
A
G C U U G U U U A C G A A U U A G
G C U U G U U U A C G A A U U A G
Ala Cys Leu Arg Ile
G C U U G A U U A C G A A U U A G
Ala STOP
G C U U G U U U A C G A A U U A G
G C U U G U U U A C G A A U U A G
Ala Cys Leu Arg Ile
G C U U G U U A C G A A U U A G
Ala Cys Tyr Glu Leu
Proteins work together with other proteins or nucleic acids as "molecular machines"
structures fit together and function in highly specific, lock-and-key ways.
Four levels of structure:
Primary Structure: The sequence of the protein
Secondary structure: Local structure in regions of the chain. (alpha helix, beta sheet)
Tertiary Structure: Three dimensional structure
Quaternary Structure: multiple subunits
While 25000 genes have been identified in the human genome, relatively few have known functional annotation.
Determining the function of the protein can be done in several ways.
Sequence similarity to other (known) proteins
Using domain information
Using three dimensional structure
Based on high throughput experiments (when does it functions and who it interacts with)
Replication
Transcription Translation
Outline
Cell
Genome
Gene mRNA
Proteins
Systems biology
Instead of having brains, cells make decision through complex networks of interactions, called pathways
Synthesize new materials
Break other materials down for spare parts
Signal to eat or die
In order to fulfill their function, proteins interact with other proteins in a number of ways including:
Regulation
Signaling Pathways, for example A -> B -> C
Post translational modifications
Forming protein complexes
We now have many sources of data, each providing a different view on the activity in the cell
Sequence (genes)
DNA motifs
Gene expression
Protein interactions
Protein-DNA interaction
Etc.
Putting it all together: Systems Biology
Introduction to R programming
You need to do in-class exercises
Ziv Bar-Joseph: for some of the slides adapted or modified from his lecture slides at Carnegie Mellon University
Neil Jones: for some of the slides adapted or modified from his slides for the book An
Introduction to Bioinformatics Algorithms