Protein Structure - FAU College of Engineering

COT 6930
HPC and Bioinformatics
Introduction to Molecular Biology
Xingquan Zhu
Dept. of Computer Science and Engineering
Outline


Cell
DNA




DNA Structure
DNA Sequencing
RNA (DNA-> RNA)
Protein


Protein structure
Protein synthesis
Central Dogma of Biology:
DNA, RNA, and the Flow of Information
Replication
Transcription
Translation
Protein

A sequence built from the 20 amino acids
Example: Lys Lys Gly Gly Leu Val Ala His
Adopts a stable 3D structure that can be measured
experimentally
[Figure: atom legend (oxygen, nitrogen, carbon, sulfur) and structure representations (cartoon, space-filling, surface, ribbon)]
X-ray Crystallography
The 20 amino acids
• Each amino acid contains an amino group (NH2) and a carboxyl group (COOH) (shown in black in the diagram).
• The amino acids vary in their side chains (indicated in blue in the diagram).
Protein Structure

Primary structure (amino acid sequence)
Secondary structure (local folding)
Tertiary Structure (global folding)
Quaternary structure (multiple-chain)
Protein Structure Animation

https://mywebspace.wisc.edu/jonovic/web/proteins.html
Primary Structure

Primary structure is the sequence of amino acids in the chain
[Figure labels: polypeptide, N-terminal, C-terminal]
One end of every polypeptide, called the amino terminal or N-terminal,
has a free amino group. The other end, with its free carboxyl group, is
called the carboxyl terminal or C-terminal.
Peptide: 50 amino acids or fewer
Polypeptide: 50-100 amino acids
Protein: over 100 amino acids
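The length thresholds above can be written as a small lookup. This is only a sketch: the function name `classify_chain` is hypothetical, and because the slide lists 50 residues under both "peptide" and "polypeptide", the code assumes that exactly 50 counts as a peptide.

```python
def classify_chain(n_residues: int) -> str:
    """Name a chain by its length, using the thresholds from the slide.

    Assumption: a chain of exactly 50 residues is called a peptide,
    since the slide's ranges overlap at 50.
    """
    if n_residues < 1:
        raise ValueError("a chain needs at least one amino acid")
    if n_residues <= 50:
        return "peptide"
    if n_residues <= 100:
        return "polypeptide"
    return "protein"


for length in (3, 75, 300):
    print(length, classify_chain(length))
```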
Polypeptide

The amino acids are linked covalently by peptide bonds.
The image shows three amino acids linked by peptide bonds into a tripeptide.
Secondary Structure



Secondary structure describes the way the chain folds
Local structure of consecutive amino acids
Common regular secondary structures
α helix
β sheet
β turn
Secondary Structure



Alpha helix
Beta strand / pleated sheet
Coil
Tertiary Structure of protein

Tertiary structure describes the shapes that form when the secondary-structure elements of the protein chain (helices and sheets) fold up further on themselves.
Quaternary structure (multi-chain structures)

Quaternary structure describes how the molecule is finally assembled before it can become active. For example, pairs of chains may bind together, or inorganic substances may be incorporated into the molecule.
Protein Structure Space
Protein folding taxonomy:
all alpha
all beta
alpha/beta
alpha+beta
others
http://www.nigms.nih.gov/psi/
Geometry of Protein Structure
rotatable
rotatable
Total number of degrees of freedom is 2*(n-1), where n is the length of the protein (number of amino acids)
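As a quick worked instance of the 2*(n-1) count, the sketch below evaluates it for a few chain lengths. It assumes the two rotatable bonds labeled in the figure are the two backbone torsion angles of each residue; the function name is hypothetical.

```python
def backbone_degrees_of_freedom(n_residues: int) -> int:
    """Return 2*(n-1), the slide's count of rotatable backbone bonds.

    Assumption: two rotatable bonds are counted per peptide unit,
    and a chain of n residues has n-1 peptide units.
    """
    if n_residues < 1:
        raise ValueError("a chain needs at least one amino acid")
    return 2 * (n_residues - 1)


for n in (2, 10, 100):
    print(n, backbone_degrees_of_freedom(n))
```

For a 100-residue protein this gives 198 rotatable bonds.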
The Levinthal Paradox

Given a small protein (100 aa), assume 3 possible conformations per peptide bond
3^100 ≈ 5 × 10^47 conformations
The fastest motions take 10^-15 sec, so sampling all conformations would take 5 × 10^32 sec
60 × 60 × 24 × 365 = 31,536,000 seconds in a year
Sampling all conformations would take 1.6 × 10^25 years
Proteins have no problem folding; we do! This is the Levinthal paradox
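The arithmetic behind the paradox can be checked directly. The sketch below uses only the numbers stated above (100 residues, 3 conformations each, 10^-15 seconds per sample, 365-day years); the function and parameter names are illustrative.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000, as on the slide


def levinthal_estimate(n_residues=100, states_per_residue=3,
                       seconds_per_sample=1e-15):
    """Return (conformations, total seconds, total years) for a brute-force search."""
    conformations = states_per_residue ** n_residues   # exact integer
    total_seconds = conformations * seconds_per_sample
    total_years = total_seconds / SECONDS_PER_YEAR
    return conformations, total_seconds, total_years


conformations, seconds, years = levinthal_estimate()
print(f"{conformations:.1e} conformations")
print(f"{seconds:.1e} seconds")
print(f"{years:.1e} years")
```

Changing `states_per_residue` or `n_residues` shows how quickly the search space grows with chain length.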
RNA
3 types of RNA
Messenger RNA
DNA: TAC CAT GAG ACT … ATC
mRNA: AUG GUA CUC UGA … UAG
Ribosomal RNA and ribosomes
Transfer RNA
Overview of protein synthesis
Transcription: same language
Translation: different language
Overview of protein synthesis
1. Transcription
RNA has no thymine; it has uracil instead
2. Translation, the final steps
Rules (the secret of life)

Transcription:
A → U
T → A
G → C
C → G

Translation:
AUG: Methionine (Met)
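The two rules above can be sketched in a few lines. The transcription map is taken directly from the slide. The translation table is the standard genetic code, which the slide only samples (AUG: Met), so the full table is supplied here as an assumption. The sketch reads the template codon by codon without modeling strand direction, and it translates from the first codon rather than searching for a start codon; all names are illustrative.

```python
# Template-strand DNA base -> mRNA base, as listed in the rules above.
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Standard genetic code in the conventional U, C, A, G ordering ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(template_dna: str) -> str:
    """Apply the base-pairing rules to a template DNA string."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template_dna.upper())


def translate(mrna: str) -> str:
    """Read codons from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("TACCATGAGACT")   # first four codons of the slide's DNA example
print(mrna)
print(translate(mrna))
```

The output uses one-letter amino acid codes, so methionine appears as `M`.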
Codons and anticodons
DNA: TAC CAT GAG ACT … ATC
mRNA: AUG GUA CUC UGA … UAG
tRNA: UAC CAU GAG ACU … AUC
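The tRNA row above is the position-by-position RNA complement of the mRNA row. A minimal sketch, assuming the same left-to-right notation as the slide (no reversal for strand direction) and a hypothetical helper name `anticodon`:

```python
# RNA base pairing between an mRNA codon and a tRNA anticodon.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the complementary anticodon, written in the slide's notation."""
    return "".join(RNA_COMPLEMENT[base] for base in codon.upper())


mrna_codons = ["AUG", "GUA", "CUC", "UGA", "UAG"]
print(" ".join(anticodon(codon) for codon in mrna_codons))
```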
[Figure: DNA → transcription → RNA → translation → protein → phenotype, annotated with the associated databases: genomic DNA databases; cDNA, ESTs, UniGene; gene expression database; protein sequence databases; protein structure databases]
List of Amino Acids (1)
List of Amino Acids (2)
Transcription & Open Reading Frame (ORF)

Open Reading Frame (ORF)




Where to start reading codons (ATG)
6 possible reading frames (3 forward, 3 reverse)
The gene is usually the longest ORF found
Forward reading frame example
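The ORF search described above can be sketched as a scan over all six reading frames. This is a simplified illustration, not a gene finder: it assumes the three reverse frames are read on the reverse complement, that an ORF runs from the first ATG in a frame to the next in-frame stop codon (TAA, TAG, or TGA), and it uses a made-up test sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read in its own forward direction."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, frame: int):
    """Yield each ORF (ATG through stop codon, inclusive) in one frame (0, 1, 2)."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield seq[start:i + 3]
            start = None


def longest_orf(dna: str) -> str:
    """Search 3 forward and 3 reverse frames; return the longest ORF found."""
    dna = dna.upper()
    candidates = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            candidates.extend(orfs_in_frame(strand, frame))
    return max(candidates, key=len, default="")


print(longest_orf("AAATGGCCTTTTAGCC"))   # hypothetical sequence, ORF in forward frame 2
```

An ORF that has a start codon but no in-frame stop before the sequence ends is ignored in this sketch.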
Complication – Non-coding Regions

Non-coding regions




Very little genomic DNA codes for proteins
Exon – DNA expressed in protein (2–3% of human genome)
Intron – DNA transcribed into mRNA but later removed
Untranslated region (UTR) – DNA transcribed into mRNA but not translated into protein


Biological processes



UTRs may affect gene regulation & expression
Remove introns from mRNA, splice exons together
Transition between intron / exon = splice site
Splicing can be inconsistent



Some exons may be skipped
Result = splice-variant gene / isoform
Estimated 30% of human proteins from splice-variant genes
Non-coding regions
Transcription

The process of making RNA from DNA
Needs a promoter region to begin transcription.
[Figure labels: control regions, exons, introns, transcription, splicing]
Alternative Splicing

A single gene can produce different forms of a protein

A single gene can contain numerous exons and introns, and the
exons can be spliced together in different ways
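Splicing and exon skipping can be illustrated with a toy transcript. The segment sequences below are invented for illustration, and the model is deliberately simple: introns are dropped, exons are joined in order, and each isoform skips at most one internal exon.

```python
# Hypothetical pre-mRNA as an ordered list of (kind, sequence) segments.
PRE_MRNA = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGU"),
    ("exon", "UUUGGA"),
    ("intron", "GUCCAG"),
    ("exon", "CAUUAA"),
]


def exons_of(pre_mrna):
    """Remove introns and keep the exon sequences in order."""
    return [seq for kind, seq in pre_mrna if kind == "exon"]


def splice(exons, skip=()):
    """Join exons into one mRNA, leaving out any exon index listed in skip."""
    return "".join(seq for idx, seq in enumerate(exons) if idx not in skip)


def single_skip_isoforms(exons):
    """Return the full transcript plus each variant with one internal exon skipped."""
    variants = [splice(exons)]
    for idx in range(1, len(exons) - 1):
        variants.append(splice(exons, skip={idx}))
    return variants


for isoform in single_skip_isoforms(exons_of(PRE_MRNA)):
    print(isoform)
```

With three exons there is one internal exon, so the sketch prints two isoforms: the full transcript and the variant without the middle exon.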
Complication: Mutations

Mutations


Modifications during DNA replication
Possible changes

Point mutation / single nucleotide polymorphism (SNP)
5’ A T A C G T A …
5’ A T G C G T A …
SNPs occur every 100 to 300 bases along the 3-billion-base human genome
Duplicate sequence




Inverted sequence
Insert / delete sequence ( indel )
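The kinds of change listed above can be modeled as simple string operations. This is a simplified sketch with hypothetical helper names: positions are 0-based, ranges are half-open, and inversion just reverses the segment without taking its complement. The point-mutation check uses the two sequences from the slide.

```python
def point_differences(reference: str, variant: str):
    """List (position, reference base, variant base) where two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [(i, r, v) for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


def duplicate(seq: str, start: int, end: int) -> str:
    """Repeat seq[start:end] immediately after itself."""
    return seq[:end] + seq[start:end] + seq[end:]


def invert(seq: str, start: int, end: int) -> str:
    """Reverse seq[start:end] in place (simplified: no complementing)."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]


def insert(seq: str, position: int, fragment: str) -> str:
    """Insert fragment before the given position."""
    return seq[:position] + fragment + seq[position:]


def delete(seq: str, start: int, end: int) -> str:
    """Remove seq[start:end]."""
    return seq[:start] + seq[end:]


print(point_differences("ATACGTA", "ATGCGTA"))
print(duplicate("ATACGTA", 1, 4))
print(invert("ATACGTA", 1, 4))
print(insert("ATACGTA", 2, "GG"))
print(delete("ATACGTA", 1, 4))
```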
Excellent Animation

Cell


http://www.youtube.com/watch?v=UB6G9GD2KFk
Central Dogma

http://www.youtube.com/watch?v=GkdRdik73kU