The human genome

Introduction to genomes & genome browsers
Content
• Introduction
• The human genome
• Human genetic variation
– SNPs
– CNVs
– Alternative splicing
• Browsing the human genome
Celia van Gelder
CMBI
UMC Radboud
December 2013
c.vangelder@cmbi.ru.nl
Exponential Growth in Genomic Sequence Data
[Figure: number of completed genomes over time. Milestones marked: first 2 bacterial genomes complete; first eukaryote complete (yeast); first metazoan complete (roundworm); currently 1000+ completed genomes.]
Genome projects
http://www.genomesonline.org/
©CMBI 2013
The pig genome
The human genome
• Genome: the entire sequence of DNA in a cell
• 3 billion basepairs (3Gb)
• 22 chromosome pairs + X and Y chromosomes
• Chromosome length varies from ~50Mb to ~250Mb
• About 20000 protein-coding genes
(average gene length 3000 bases, but largest known gene is 2.4 Mb (dystrophin))
• The human genome is 99.9% identical among individuals
This means that any two people differ at about 3 million nucleotides
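The 3 million figure follows directly from the two numbers on this slide. A minimal sketch of the arithmetic, assuming the rounded values given above (3 Gb genome, 99.9% identity); the function name is illustrative only:

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb, rounded value from the slide
IDENTITY = 0.999                # 99.9% identical among individuals


def expected_differences(genome_size_bp: int, identity: float) -> int:
    """Approximate number of positions at which two genomes differ."""
    return round(genome_size_bp * (1 - identity))


if __name__ == "__main__":
    print(expected_differences(GENOME_SIZE_BP, IDENTITY))
```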
Eukaryotic Genomes: more than collections of genes
• Genes & regulatory sequences make up 5% of the genome
– Protein coding genes
– RNA genes (rRNA, snRNA, snoRNA, miRNA, tRNA)
– Structural DNA (centromeres, telomeres)
– Regulation-related sequences (promoters, enhancers, silencers, insulators)
– Parasite sequences (transposons)
– Pseudogenes (non-functional gene-like sequences)
– Simple sequence repeats
The human genome (continued)
• Only 1.2% codes for proteins
• Long introns, short exons
• Large spaces between genes
• More than half consists of repetitive DNA
– Example: Alu repeat, ~300 bp, more than 1 million copies
From: Molecular Biology of the Cell
(4th edition) (Alberts et al., 2002)
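To get a feel for how much a single repeat family contributes, the Alu numbers above can be multiplied out. This is a rough sketch using the rounded slide values (300 bp, 1 million copies, 3 Gb genome); since the slide gives the copy number as a lower bound, the result is a lower bound too.

```python
ALU_LENGTH_BP = 300             # approximate length of one Alu repeat
ALU_COPIES = 1_000_000          # slide says "more than a million", so a lower bound
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb


def repeat_fraction(length_bp: int, copies: int, genome_size_bp: int) -> float:
    """Fraction of the genome occupied by a repeat family."""
    return length_bp * copies / genome_size_bp


if __name__ == "__main__":
    fraction = repeat_fraction(ALU_LENGTH_BP, ALU_COPIES, GENOME_SIZE_BP)
    print(f"Alu repeats cover at least about {fraction:.0%} of the genome")
```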
Chromosome organisation (1)
Genes that are ON
Genes that are OFF
Human Genetic Variation
• Every human has essentially the same set of genes, but there are different forms of each gene, known as alleles
• Genetic variation explains some of the differences among people, such as:
– Blood group
– Eye color
– Skin color
– Hair color
– Higher or lower risk for getting particular diseases:
  • Cystic fibrosis, Sickle cell disease
  • Diabetes, Cancer, Arthritis, Asthma
  • Stroke, Heart disease
  • Alzheimer's disease, Parkinson's disease
  • Depression, Alcoholism
Variations in the Genome
• Common sequence variations (polymorphisms)
• Deletions
• Insertions
• Chromosome translocations
Today’s focus
1. Single Nucleotide Polymorphisms (SNPs)
2. Copy number variations (CNV)
3. Alternative transcripts
Single Nucleotide Polymorphisms (SNPs)
• SNPs are DNA sequence variations that occur when a single
nucleotide (A, T, C, or G) in the genome sequence is altered.
• For a variation to be considered a SNP, it must occur in at least 1%
of the population.
• SNPs make up about 90% of all human genetic variation and occur
every 100 to 300 bases.
• SNPs can occur in coding (gene) and non-coding regions of the genome;
<1% alter the protein sequence
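The 1% rule above can be expressed as a small check. This sketch makes an assumption the slide does not state: "occurs in at least 1% of the population" is read here as a minor allele frequency of at least 0.01 among the sampled chromosomes. Function names are illustrative only.

```python
from collections import Counter

SNP_MIN_FREQUENCY = 0.01  # the 1% threshold from the slide


def minor_allele_frequency(alleles) -> float:
    """Frequency of the second most common base observed at one position.

    `alleles` is a sequence of bases, one per sampled chromosome.
    """
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # only one allele observed, so no variation
    total = sum(counts.values())
    return counts.most_common()[1][1] / total


def is_snp(alleles, threshold: float = SNP_MIN_FREQUENCY) -> bool:
    """True if the position varies often enough to count as a SNP."""
    return minor_allele_frequency(alleles) >= threshold


if __name__ == "__main__":
    common = ["A"] * 98 + ["G"] * 2  # G seen on 2% of chromosomes
    rare = ["A"] * 199 + ["G"]       # G seen on 0.5% of chromosomes
    print(is_snp(common), is_snp(rare))
```

Rarer variants are still real sequence differences; under this definition they are simply not called SNPs.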
SNPs
• determine properties like eye color, hair (curly or straight), or if
you can taste bitter or not.
• are used for identification and forensics
• are used for estimating predisposition to disease
• can cause drug side effects and/or non-responsiveness to the
drug
• have impact on how humans respond to environmental factors like
bacteria, viruses, toxins and chemicals
• are used to predict specific genetic traits
• are used for classifying patients in clinical trials
• are used for mapping and genome-wide association studies of
complex diseases
SNP - Bitter tasting, TAS2R38
SNP & disease, Alzheimer
Alzheimer's disease (AD) & apolipoprotein E (APOE)
• Apolipoprotein E is a cholesterol carrier that is found in the brain and other
organs.
• APOE is suspected to be involved in amyloid beta aggregation and clearance,
influencing the onset of amyloid beta deposition.
• APOE contains 2 SNPs that result in 3 possible alleles: E2, E3, E4.

Variant   rs429358   rs7412
E2        T          T
E3        T          C
E4        C          C

• A person who inherits at least one E4 allele will have
a greater chance of developing AD.
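The three allele combinations listed above amount to a lookup on the two SNP positions. A minimal sketch, assuming each haplotype is given as a (rs429358, rs7412) pair of bases; the function names are illustrative, and any combination not shown on the slide returns None rather than a guess.

```python
# (rs429358, rs7412) -> APOE allele, exactly as listed on the slide
APOE_ALLELES = {
    ("T", "T"): "E2",
    ("T", "C"): "E3",
    ("C", "C"): "E4",
}


def apoe_allele(rs429358: str, rs7412: str):
    """Return the APOE allele for one haplotype, or None if not on the slide."""
    return APOE_ALLELES.get((rs429358.upper(), rs7412.upper()))


def carries_e4(haplotype_a, haplotype_b) -> bool:
    """True if at least one of the two inherited haplotypes is E4."""
    return "E4" in (apoe_allele(*haplotype_a), apoe_allele(*haplotype_b))


if __name__ == "__main__":
    print(apoe_allele("T", "C"))                # E3
    print(carries_e4(("T", "C"), ("C", "C")))  # E3/E4 genotype
```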
Copy Number Variation
• Copy Number Variations (CNVs):
gains and losses of large chunks of DNA sequence (10 kb – 5 Mb)
• When there are genes in the CNV areas, this can lead to variations
in the number of gene copies between individuals
• CNVs contribute to our uniqueness.
• CNVs can also influence the susceptibility to disease.
• CNVs may either be inherited or caused by de novo mutation
Copy Number Variation
• Normal cell: CN=2
• Deletion: CN=0 or CN=1
• Amplification: CN=3, CN=4
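The copy number (CN) states in this figure can be summarized as a comparison against the normal diploid value of 2. A small sketch under that assumption; the function name and labels are illustrative, and sex chromosomes (where the normal copy number can be 1) are not handled.

```python
NORMAL_COPY_NUMBER = 2  # two copies in a normal diploid cell


def copy_number_state(copy_number: int, normal: int = NORMAL_COPY_NUMBER) -> str:
    """Classify a region as 'deletion', 'normal', or 'amplification'."""
    if copy_number < 0:
        raise ValueError("copy number cannot be negative")
    if copy_number < normal:
        return "deletion"
    if copy_number > normal:
        return "amplification"
    return "normal"


if __name__ == "__main__":
    for cn in range(5):
        print(f"CN={cn}: {copy_number_state(cn)}")
```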
CNVs & disease
• Many inherited genetic diseases result from CNVs;
– Gene copy number can be elevated in cancer cells
– Autism
– Schizophrenia (dept. human genetics)
– Mental retardation (dept. human genetics)
– Parkinson's disease
• There are CNVs that protect against HIV infection and malaria.
• The contribution of CNV to the common, complex diseases, such as
diabetes and heart disease, is currently less well understood
Alternative splicing
• Defects in alternative splicing have been implicated in many
diseases, including:
– neuropathological conditions such as Alzheimer disease
– cystic fibrosis, those involving growth and developmental defects
– many human cancers, e.g. BRCA1 in breast cancer
– Beta-globin in Beta-thalassemia
– Parkinson's disease
Annotating the genome
Annotation: attaching biological information to sequences.
Two main steps:
• identifying elements on the genome
• attaching biological information to these elements.
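The two annotation steps above can be pictured as a simple data structure: first a located element, then information attached to it. This is only an illustrative sketch; the class, field names, and the example coordinates are hypothetical and do not correspond to any real gene or to the data model of a particular genome browser.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Step 1: an element identified on the genome (1-based, inclusive)."""
    chromosome: str
    start: int
    end: int
    strand: str        # "+" for forward, "-" for reverse
    feature_type: str  # e.g. "gene", "exon"
    annotations: dict = field(default_factory=dict)

    def length(self) -> int:
        return self.end - self.start + 1


def annotate(feature: Feature, **info) -> Feature:
    """Step 2: attach biological information to an identified element."""
    feature.annotations.update(info)
    return feature


if __name__ == "__main__":
    # Made-up coordinates for a hypothetical "Gene X"
    gene_x = Feature("1", 1_000, 4_000, "+", "gene")
    annotate(gene_x, symbol="GENE_X", description="hypothetical example gene")
    print(gene_x.length(), gene_x.annotations)
```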
Basic & Advanced Genome Annotation
• Basic:
– Genomic location
– Gene features: Exons, Introns, UTRs
– Transcript(s)
– Pseudogenes, Non-coding RNA
– Protein(s)
– Links to other sources of information
• Advanced:
– Cytogenetic bands
– Polymorphic markers
– Genetic variation, including SNPs & CNVs
– Repetitive sequences
– cDNAs or mRNAs from related species
– Genomic sequence variation
– Regulation sequences (enhancers, silencers, insulators)
[Human] Genome Browsers
(Not limited to only human data)
• EBI: Ensembl
• NCBI: Map Viewer
• UCSC: UCSC Genome Browser
Ensembl
©EMBL-EBI
Other Ensembl Installations
Organized Data Based on Chromosome Location
Tracks displayed along the chromosome around Gene X:
– genes & predictions
– variations & repeats
– cross-species comparative data
– & many more types of data, from expression & regulation to mRNA and ESTs
Information available for Gene X:
– Description
– Transcript data
– Structure
– Gene Ontology
– Pathway Data
– Homologous Genes
– Expression Data
– Etc.
HGNC: a unique name and symbol for every human gene
http://www.genenames.org/
ENSG### Ensembl Gene ID
ENST### Ensembl Transcript ID
ENSP### Ensembl Peptide ID
ENSE### Ensembl Exon ID
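The four ID prefixes above can be told apart by the letter after "ENS". A minimal sketch for human IDs only, assuming the form prefix plus digits with no version suffix; IDs from other species carry an extra species code and are deliberately not matched here. The example IDs are used purely to show the format.

```python
import re

# Letter after "ENS" -> kind of object, as listed on the slide
ID_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Assumption: human stable IDs look like ENSG + digits, without ".version"
_STABLE_ID = re.compile(r"^ENS([GTPE])(\d+)$")


def ensembl_id_type(stable_id: str):
    """Return 'gene', 'transcript', 'peptide' or 'exon', or None if unrecognized."""
    match = _STABLE_ID.match(stable_id.strip())
    return ID_TYPES[match.group(1)] if match else None


if __name__ == "__main__":
    for example in ("ENSG00000139618", "ENST00000380152", "not-an-id"):
        print(example, "->", ensembl_id_type(example))
```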
Ensembl: An Example
Click on a feature for more details.
Direction of transcription:
– Above blue line: forward strand
– Below blue line: reverse strand
Ensembl Transcripts
Synopsis: What can I do with Ensembl?
• View, examine & explore annotated information for any chromosomal region:
– Genes
– ESTs, mRNAs, alternative transcripts
– Proteins
– SNPs, and SNPs across strains (rat, mouse), populations (human), or
even breeds (dog)
– Homologues and phylogenetic trees across more than 40 species
– Whole genome alignments
– Conserved regions across species
– Gene expression profiles
• Upload your own data and use BLAST/BLAT against any Ensembl genome
• Export sequence, or create a table of gene information
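The slides describe exporting through the Ensembl website. As an alternative not covered in the slides, gene information can also be requested programmatically from the Ensembl REST service. The sketch below assumes the public server at rest.ensembl.org and its `lookup/id` endpoint; the helper names are illustrative, and the fetch function needs network access, so it is defined but not called here.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

REST_SERVER = "https://rest.ensembl.org"  # assumed public Ensembl REST server


def lookup_url(stable_id: str) -> str:
    """Build the URL that looks up one Ensembl stable ID."""
    return f"{REST_SERVER}/lookup/id/{quote(stable_id)}"


def fetch_gene_info(stable_id: str) -> dict:
    """Request the record for one stable ID as JSON (requires network access)."""
    request = Request(lookup_url(stable_id),
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_gene_info() to actually query the server.
    print(lookup_url("ENSG00000139618"))
```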