CS 6293 Advanced Topics: Translational Bioinformatics

Chapter 12: Human Microbiome Analysis
OMER ASLAN
Outline
• An overview of the analysis of microbial
communities
• Understanding the human microbiome from
phylogenetic and functional perspectives
• Methods and tools for calculating taxonomic and
phylogenetic diversity
• Metagenomic assembly and pathway analysis
• Human Microbiome Project (HMP)
• The impact of the microbiome on the human host
• Summary
An overview of the analysis of
microbial communities
• The human microbiome is the aggregate
of microorganisms that reside on the surface
and in the deep layers of the skin, in the
saliva and oral mucosa, in the conjunctiva,
and in the gastrointestinal tract [1]. These
microorganisms include bacteria, archaea,
fungi, and viruses.
Understanding the human microbiome from
phylogenetic and functional perspectives
• Under normal circumstances, some of these
organisms perform tasks that are useful for the
human host, such as helping to digest food,
producing certain vitamins, regulating the
immune system, and protecting against
disease-causing bacteria.
• Under some conditions, however, some of them
contribute to disease, ranging from inflammatory
bowel disease to diabetes to antibiotic-resistant
infection.
Human microbiome cont.
• According to some scientists, humans are born with
only their own eukaryotic human cells, but over the
first several years of life the skin surface, oral
cavity, and gut are colonized by a tremendous
diversity of bacteria (the majority), archaea, fungi,
single-celled eukaryotes, and viruses.
• Other scientists, including Dr. Madan and a number
of other researchers, are now convinced that mothers
seed their fetuses with microbes during pregnancy.
Human microbiome: functional
perspectives
Recent research shows that, at a specific site on the
body, a different set of microbes may perform the
same function in different people. For instance, on
the tongues of two different people, two entirely
different sets of organisms may break down sugars in
the same way. This suggests that medical science
may have to abandon the one-microbe model of
disease and instead pay attention to the function of a
group of microbes that has somehow gone awry.
Some characteristics of the human microbiome
• Humans are host to more than 100 trillion organisms.
• They outnumber human cells 10:1.
• Their combined genome is 100-fold larger than the
human genome.
• They comprise 700-800 separate species.
• The human microbiome makes up about one to two
percent of the body mass of an adult.
• Microbes contribute more genes responsible for human
survival than humans' own genes. It is estimated that
bacterial protein-coding genes are 360 times more
abundant than human genes.
Some characteristics of the human
microbiome cont.
• In terms of cell and gene counts, we are more
microbiome than human.
• Short video about the human microbiome:
http://www.youtube.com/watch?v=5DTrENdWvvM
Bacteria
• Bacteria constitute a large domain of prokaryotic
microorganisms.
• The Howard Hughes Medical Institute in Maryland
reports that the largest concentration of
bacteria in the human body is found in the
intestines. Bacteria also inhabit the skin and
mucosa.
• If microbe numbers grow beyond their typical
ranges, or if microbes populate atypical areas
of the body, for example through poor hygiene or
injury, disease can result.
Human Bacteria
• It is estimated that 500 to 1,000 species of bacteria
live in the human gut [1].
• Bacterial cells are much smaller than human cells,
and there are at least ten times as many bacteria as
human cells in the body. The mass of microorganisms
is estimated to account for 1-3% of total body mass [1].
• Many of the bacteria in the digestive tract are able to
break down certain nutrients, such as carbohydrates,
that humans otherwise could not digest. The majority
of these commensal bacteria survive in an environment
with no oxygen. (In a commensal relationship, one
organism benefits without affecting the other.)
Archaea
• Archaea are a domain of single-celled
microorganisms. These microbes are prokaryotes,
meaning they have no cell nucleus.
• Archaea were initially classified as bacteria and
given the name archaebacteria, but this
classification is outdated.
• Archaeal cells have unique properties separating
them from the other two domains of life, Bacteria
and Eukaryota.
Human Archaea
• Archaea are present in the human gut but, in
contrast to the enormous variety of bacteria in
this organ, the number of archaeal species
is much more limited.
• Although a relationship has been proposed
between the presence of some methanogens
and human periodontal disease, no clear
examples of archaeal pathogens are known.
Fungi
• Fungi are a large group of eukaryotic organisms,
separate from plants, animals, protists,
and bacteria.
• Fungi, in particular yeasts, are present in the human
gut.
• The best studied of these are Candida species,
because of their ability to become pathogenic
in immunocompromised hosts.
Viruses
• A virus is a small infectious agent that replicates
only inside the living cells of other organisms [6].
• Viruses can infect all types of life forms,
from animals and plants to bacteria and archaea.
• Basic structural characteristics, such as genome
type, virion shape, and replication site, are
generally shared among virus species
within the same family. There are currently 21
families of viruses known to cause disease in
humans.
History of Microbiome
Studies
• Historically, members of a microbial community were
identified in situ by stains that targeted their
physiological characteristics, such as the Gram stain.
• These techniques distinguish many broad clades of
bacteria but are non-specific at lower
taxonomic levels. Hence, microbiology was
culture-dependent; it was necessary to grow an
organism in the lab in order to study it.
• Specific microbial species were detected by
plating samples on specialized media selective for the
growth of that organism.
• This approach limited the range of organisms that
could be detected to those that actively grow in
laboratory culture.
History of Microbiome
Studies cont.
• It has long been known, however, that the majority of
microbial species have never been grown in the
laboratory, and options for studying and quantifying
the uncultured were severely limited until the
development of DNA-based culture-independent methods
in the 1980s.
• Culture-independent techniques analyze the DNA
extracted directly from a sample rather than from
individually cultured microbes. They allow us
to investigate many aspects of microbial communities,
including taxonomic diversity, that is, which microbes
are present in a community and how many of each.
History of Microbiome
Studies cont.
• One of the earliest targeted metagenomic assays for
studying uncultured communities without prior DNA
extraction was fluorescent in situ hybridization (FISH).
• FISH probes can be targeted to almost any level of
taxonomy, from species to phylum. Although FISH was
initially limited to the 16S rRNA marker gene, and
therefore to diversity studies, it has since been expanded
to functional gene probes that can be used to identify
specific enzymes in communities.
• However, FISH remains a primarily low-throughput,
imaging-based technology.
History of Microbiome
Studies cont.
• Although DNA sequencing has existed
since the 1970s, it was initially quite expensive
because it required the additional time and
expense of clone library construction. It has
since become economically feasible for
most scientists to sequence the DNA of an
entire environmental sample, and
metagenomic studies have become
increasingly common.
Taxonomic Diversity
• 1. The 16S rRNA marker gene
• 2. Binning 16S rRNA sequences into OTUs
(Operational Taxonomic Units), explained later
• 3. Measuring population diversity
The 16S rRNA Marker Gene.
• A microbial community generally consists of a
collection of individual cells, each carrying a
distinct complement of genomic DNA.
• Communities differ from multicellular organisms,
however, in that their component cells may or
may not carry identical genomes, although
substantial subsets of these cells are typically
assumed to be clonal.
16S rRNA cont.
• It is not practical to fully sequence every genome in
every cell. A community can instead be described by
assigning a frequency to each distinct genome within
it, giving either the absolute number of cells in
which the genome is carried or its relative abundance
within the population.
• Microbial ecology has defined a number of
molecular markers that uniquely tag distinct
genomes. A marker is a DNA sequence that
identifies the genome that contains it, without the
need to sequence the entire genome.
The 16S rRNA Marker Gene
• Even though different markers can be chosen for
analyzing different populations, several properties
are desirable for a good marker.
• A marker should be present in every member of a
population.
• A number of such markers have been defined,
including ribosomal protein subunits. The most
widely used is the small 16S ribosomal RNA subunit
gene, a gene of about 1.5 Kbp.
• 16S ribosomal RNA (16S rRNA) is a component
of the 30S small subunit of prokaryotic ribosomes.
The genes coding for it are referred to as 16S
rDNA and are used in reconstructing phylogenies.
The 16S rRNA Marker Gene
• Multiple copies of the 16S rRNA gene can exist within a
single bacterium.
• It is relatively cheap and simple to sequence only the
16S sequences from a microbiome, so a population can
be described as a set of 16S sequences and the number
of times each was detected.
• Sequences assayed in this manner have been
characterized for a wide range of cultured species and
environmental isolates.
• These are stored in, and can be automatically matched
against, several databases, including the Ribosomal
Database Project, Greengenes, and SILVA.
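Describing a population as a set of 16S sequences and the number of times each was detected can be sketched in a few lines of Python. The sequence tags below are made-up placeholders, and `relative_abundance` is a hypothetical helper written for illustration; it is not part of any of the databases named above.

```python
from collections import Counter

def relative_abundance(tags):
    """Return (counts, fractions) for a list of 16S sequence tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return counts, {}
    # Relative abundance: each tag's share of all detected tags.
    fractions = {tag: n / total for tag, n in counts.items()}
    return counts, fractions

# Made-up tags standing in for sequenced 16S reads.
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCA", "TTGA"]
counts, fractions = relative_abundance(reads)
print(counts.most_common())
print(fractions)
```

Real 16S copy number varies between bacteria, so read counts of this kind are only a rough proxy for cell counts.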
Ribosomal Database Project
• The Ribosomal Database Project (RDP) is a curated
database that offers ribosome data along with related
programs and services.
• The offerings include phylogenetically ordered
alignments of ribosomal RNA (rRNA) sequences,
rRNA secondary structure diagrams, and various
software packages for handling, analyzing, and
displaying alignments and trees.
• The data are available via ftp and electronic mail.
Certain analytic services are also provided by the
electronic mail server.
• The database can be accessed at http://rdp.cme.msu.edu/
Greengenes
• The Greengenes web application provides access to the
2011 version of the Greengenes 16S rRNA gene
sequence alignment for browsing, blasting, probing,
and downloading.
• The data and tools presented by Greengenes can
assist the researcher in choosing phylogenetically
specific probes, interpreting microarray results, and
aligning novel sequences.
• The data can be downloaded from
http://greengenes.secondgenome.com/
SILVA
• SILVA provides comprehensive, quality
checked and regularly updated datasets of
aligned small (16S/18S, SSU) and large
subunit (23S/28S, LSU) ribosomal RNA
(rRNA) sequences for all three domains of
life (Bacteria, Archaea and Eukarya).
• SILVA is available at http://www.arb-silva.de/
Binning 16S rRNA Sequences
into OTUs
• The challenge in the analysis of rRNA genes is the
precise definition of a "unique" sequence.
Even though much of the 16S rRNA gene is highly
conserved, several of the sequenced regions are
variable or hypervariable, so small numbers of base
pairs can change in a very short period of evolutionary
time.
• Because 16S regions are typically sequenced in only a
single pass, there is also a fair chance that reads will
contain at least one sequencing error.
OTUs cont.
• Some degree of sequence divergence is therefore
typically allowed; sequence similarity cutoffs of 95%,
97%, or 99% are often used in practice. The resulting
cluster of nearly identical tags is referred to as an
Operational Taxonomic Unit (OTU), or sometimes a
phylotype.
• OTUs take the place of "species" in many microbiome
diversity analyses because named species genomes are
often unavailable for particular marker sequences.
OTUs cont.
The assignment of sequences to OTUs is
referred to as binning, and it can be performed
by
• 1) Unsupervised clustering of similar sequences
• 2) Phylogenetic models incorporating mutation
rates and evolutionary relationships
• 3) Supervised methods that directly assign sequences
to taxonomic bins based on labeled training data
(these also apply to whole-genome shotgun sequences)
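As an illustration of option 1 (unsupervised clustering), the following is a minimal sketch of greedy OTU binning at a fixed similarity cutoff. It is a toy under stated assumptions: all sequences are assumed to have the same length, identity is the fraction of matching positions, and `identity` and `greedy_otus` are hypothetical names. Real tools align sequences first and use more careful seed selection.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, cutoff=0.97):
    """Put each sequence in the first OTU whose seed it matches at >= cutoff."""
    seeds = []  # one representative (seed) sequence per OTU
    otus = []   # member lists, parallel to seeds
    for s in seqs:
        for k, seed in enumerate(seeds):
            if identity(s, seed) >= cutoff:
                otus[k].append(s)
                break
        else:
            # No existing seed is close enough: start a new OTU.
            seeds.append(s)
            otus.append([s])
    return otus

base = "ACGT" * 25       # 100-bp made-up tag
near = "T" + base[1:]    # one mismatch, 99% identical to base
far = "T" * 100          # only 25% identical to base
print([len(members) for members in greedy_otus([base, near, far])])
```

Because the result depends on input order, such greedy schemes are usually run with sequences sorted by abundance or length.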
Measuring Population Diversity
• Population diversity is an important concept when
dealing with OTUs or other taxonomic bins, because
it is relevant to human health.
• A number of disease conditions have been
shown to correlate with decreased microbiome
diversity, presumably as one or a few microbes
overgrow during an immune or nutrient imbalance.
• Human intestinal contents appear to be highly
personalized when considered in terms of microbial
presence, absence, and abundance.
Measuring Population Diversity cont.
• Two well-defined questions arise when quantifying
population diversity:
• Given that x bins have been observed in a sample of
size y from a population of size z, how many bins are
expected to exist in the population?
• Given that x bins exist in a population of size z,
how large must the sample size y be to observe all of
them at least once?
• In other words: "If I have sequenced some amount of
diversity, how much more exists in my microbiome?" and
"How much do I need to sequence to completely
characterize my microbiome?"
Measuring Population Diversity cont.
• Measures exist for calculating alpha diversity, the
number (richness) and distribution (evenness) of taxa
expected within a single population.
• These give rise to figures known as collector's or
rarefaction curves, since increasing numbers of
sequenced taxa allow increasingly precise estimates
of total population diversity.
• When comparing multiple populations, on the other
hand, beta diversity measures, including
absolute or relative overlap, describe how many taxa
are shared between them.
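A collector's curve can be sketched by counting how many distinct taxa have been seen after each successive read, and a rarefaction curve by averaging that count over many random read orders. The function names and the toy sample below are assumptions for illustration only.

```python
import random

def collectors_curve(reads, seed=0):
    """Number of distinct taxa seen after each read, in one random order."""
    order = list(reads)
    random.Random(seed).shuffle(order)
    seen = set()
    curve = []
    for taxon in order:
        seen.add(taxon)
        curve.append(len(seen))
    return curve

def rarefaction_curve(reads, n_orders=100):
    """Average the collector's curve over several random read orders."""
    curves = [collectors_curve(reads, seed=s) for s in range(n_orders)]
    return [sum(column) / n_orders for column in zip(*curves)]

# Toy sample: each read is labeled with the taxon (OTU) it was binned to.
sample = ["otu_a"] * 50 + ["otu_b"] * 30 + ["otu_c"] * 5
curve = rarefaction_curve(sample)
print(curve[9], curve[-1])  # mean richness after 10 reads and after all reads
```

A curve that has flattened out suggests that further sequencing would uncover few additional taxa; a curve still rising suggests the sample is undersequenced.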
Alpha, beta diversity
Whereas an alpha diversity measure acts like a
summary statistic of a single population, a beta
diversity measure acts like a similarity score
between populations, allowing analysis by
sample clustering.
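Alpha diversity is
often quantified by the Shannon Index,
$$H' = -\sum_{i=1}^{S} p_i \ln p_i$$
or the Simpson Index,
$$D = \sum_{i=1}^{S} p_i^2$$
where
$p_i$ is the fraction of total species comprised
by species $i$ and $S$ is the number of species.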
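Both indices are easy to compute from per-taxon counts. The sketch below assumes the usual definitions, Shannon H' = -sum(p_i ln p_i) and Simpson D = sum(p_i^2); the function names are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_index(counts):
    """Simpson index D = sum(p_i ** 2); lower D means a more even community."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

even = [25, 25, 25, 25]   # four equally abundant taxa
skewed = [97, 1, 1, 1]    # one taxon dominates
print(shannon_index(even), simpson_index(even))
print(shannon_index(skewed), simpson_index(skewed))
```

For a perfectly even community of S taxa, H' equals ln S and D equals 1/S, which gives a quick sanity check on any implementation.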
Beta diversity
• Beta diversity can be measured by simple
taxa overlap or quantified
by the Bray-Curtis dissimilarity
$$BC_{ij} = 1 - \frac{2 C_{ij}}{S_i + S_j}$$
where $S_i$ and $S_j$ are the number of species in
populations $i$ and $j$, and $C_{ij}$ is the total number of
species at the location with the fewest species.
• Like similarity measures in expression array analysis,
many alpha- and beta-diversity measures have been
developed, each revealing slightly different aspects
of community ecology.
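A minimal sketch of the Bray-Curtis dissimilarity follows. It assumes the common abundance-based form, in which S_i and S_j are taken as the total counts in each sample and C_ij as the sum, over shared taxa, of the smaller of the two counts; `bray_curtis` and the sample dictionaries are hypothetical.

```python
def bray_curtis(counts_i, counts_j):
    """Bray-Curtis dissimilarity between two {taxon: count} samples."""
    s_i = sum(counts_i.values())
    s_j = sum(counts_j.values())
    shared = set(counts_i) & set(counts_j)
    # C_ij: for each shared taxon, the smaller of its two counts.
    c_ij = sum(min(counts_i[t], counts_j[t]) for t in shared)
    return 1 - 2 * c_ij / (s_i + s_j)

gut_a = {"Bacteroides": 40, "Prevotella": 10}
gut_b = {"Bacteroides": 20, "Ruminococcus": 30}
print(bray_curtis(gut_a, gut_b))
```

The value is 0 for identical samples and 1 for samples with no taxa in common, so a matrix of pairwise values can be fed directly into sample clustering.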
Shotgun Sequencing and
Metagenomics
• Metagenomics is the investigation, using sequencing
technologies, of the microbes that inhabit oceans,
soils, the human body, and other environments.
• The composition and function of uncultured
microbial communities are often referred to
collectively as "metagenomic."
• Metagenomics is the study
of metagenomes, genetic material recovered
directly from environmental samples.
Metagenomics
• Traditional microbiology and microbial genome
sequencing rely on cultivated clonal cultures.
• Early environmental gene sequencing cloned specific
genes (often the 16S rRNA gene) to produce a profile
of diversity in a natural sample.
• Such work revealed that the vast majority of microbial
biodiversity had been missed by cultivation-based
methods.
• Recent studies use "shotgun" Sanger sequencing or
massively parallel pyrosequencing to get largely
unbiased samples of all genes from all the members
of the sampled communities.
Metagenomics
• Because of its ability to reveal the previously hidden
diversity of microscopic life, metagenomics offers a
powerful lens for viewing the microbial world that has
the potential to revolutionize understanding of the
entire living world.
• The term metagenomics is also used with some
frequency to describe the entire body of
high-throughput studies now possible with microbial
communities, although it refers more specifically
to whole-metagenome shotgun (WMS) sequencing of
genomic DNA fragments from a community's
metagenome.
Shotgun metagenomics
• Shotgun sequencing, an approach used to sequence many
cultured microorganisms and the human genome, randomly
shears DNA, sequences many short fragments,
and reconstructs them into a consensus sequence.
Shotgun sequencing reveals genes present in
environmental samples.
• This provides information both on which organisms
are present and on what metabolic processes are
possible in the community. This can be helpful in
understanding the ecology of a community, especially
if multiple samples are compared to each other.
Shotgun metagenomics
• Shotgun metagenomics also is capable of sequencing
nearly complete microbial genomes directly from the
environment.
• Because the collection of DNA from an environment
is largely uncontrolled, the most abundant organisms
in an environmental sample are most highly
represented in the resulting sequence data.
• To achieve the high coverage needed to fully resolve
the genomes of under-represented community
members, large samples, often prohibitively so, are
needed.
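The arithmetic behind this point can be sketched as follows, under the simplifying assumption that reads are drawn in proportion to each organism's share of the total DNA in the sample. The function names and the example numbers are illustrative only.

```python
def expected_coverage(total_bases, genome_size, dna_fraction):
    """Mean coverage of one genome, given its fraction of the sample's DNA."""
    return total_bases * dna_fraction / genome_size

def bases_needed(target_coverage, genome_size, dna_fraction):
    """Total bases to sequence so that one genome reaches target_coverage."""
    return target_coverage * genome_size / dna_fraction

# Hypothetical member: a 5 Mbp genome contributing 0.1% of the sample's DNA.
print(bases_needed(20, 5_000_000, 0.001))       # bases for 20x coverage
print(expected_coverage(1e9, 5_000_000, 0.001))  # coverage from 1 Gbp of reads
```

Since the required sequencing scales inversely with abundance, halving a member's share of the sample doubles the total number of bases needed to reach the same coverage.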
Shotgun metagenomics cont.
• On the other hand, the random nature of
shotgun sequencing ensures that many of
these organisms, which would otherwise
go unnoticed using traditional culturing
techniques, will be represented by at least
some small sequence segments.
High-throughput sequencing
• The first metagenomic studies conducted using
high-throughput sequencing used massively parallel
pyrosequencing.
• Three other technologies commonly applied to
environmental sampling are the Ion Torrent Personal
Genome Machine, the Illumina Genome Analyzer II,
and the Applied Biosystems SOLiD system.
• These techniques generate reads that are significantly
shorter than the typical Sanger sequencing read length
of ~750 bp.
Sequence pre-filtering
• The first step of metagenomic data analysis
requires the execution of certain pre-filtering
steps, including the removal of redundant
sequences, low-quality sequences, and sequences
of probable eukaryotic origin (especially in
metagenomes of human origin).
• The methods available for the removal of
contaminating eukaryotic genomic DNA
sequences include Eu-Detect and DeconSeq.
Metagenome Data Analysis
• Unlike whole-genome shotgun (WGS) sequencing of
individual organisms, metagenomes tend not to have
a single finish line, and they have been successfully
analyzed using a range of assembly techniques.
• Metagenome-specific assembly algorithms have been
proposed that reconstruct only the open reading
frames from a population, recruiting highly
sequence-similar fragments to complete single gene
sequences and avoiding assembly of larger contigs.
Metagenome Data Analysis cont.
• The most challenging option is to attempt full
assemblies for complete genomes present in the
community.
• When successful, this has the obvious benefit of
establishing synteny and structural variation, and of
opening up the range of tools developed for
whole-genome analysis.
• A key bioinformatic tradeoff in analyzing
metagenomic WMS sequences, regardless of their
degree of assembly, is whether they should be
analyzed by homology or de novo.
Metagenome Data Analysis cont.
• An illustrative example is the task of determining
which parts of each sequence read encode one or
more genes, i.e. gene finding or calling.
• By homology, each sequence can be BLASTed against
a large database of reference genomes.
• This method is robust to sequencing and assembly
errors, but it is sensitive to the contents of the
reference database.
• Conversely, de novo methods have been developed
to directly bin and call genes within metagenomic
sequences using DNA features alone.
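To make "DNA features alone" concrete, the sketch below computes two composition features often used for de novo binning: GC content and a normalized k-mer frequency profile. The choice of these particular features and the function names are assumptions for illustration, not the method of any specific tool.

```python
from collections import Counter
from itertools import product

def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def kmer_profile(seq, k=4):
    """Frequencies of all 4**k k-mers over ACGT, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are ignored in the total.
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]

contig = "ATGCGCGCATTAGCGCGGCATATATGCGC"  # made-up sequence
print(gc_content(contig))
print(len(kmer_profile(contig, k=4)))
```

Contigs from the same genome tend to have similar profiles, so these vectors can be passed to any standard clustering method to form bins.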
Computational Functional
Metagenomics
• Computational functional metagenomic analyses typically
focus on the function of individual genes and gene
products within a community and fall into one of two
categories: top-down approaches and bottom-up
approaches.
• Both approaches rely, first, on cataloging some or
all of the gene products present in a community and
assigning them molecular functions and/or biological
roles in the typical sense of protein function
predictions.
Computational Functional
Metagenomics cont.
• Top-down approaches screen a metagenome for a
functional class of interest, for instance a particular
enzyme family, transporter, pathway, or biological
activity, essentially asking the question, "Does this
community carry out this function and, if so, in what
way?"
• Bottom-up approaches, on the other hand, attempt to
reconstruct profiles, either descriptive or predictive, of
overall functionality within a community, typically
relying on pathway and/or metabolic reconstructions
and asking the question, "What functions are carried
out by this community?"
Computational Functional
Metagenomics cont.
• As with many bioinformatic methods, the simplest
techniques rely on BLAST: a top-down investigation can
BLAST representatives of gene families of interest
against the community metagenome to determine their
presence and abundance.
• A bottom-up approach can BLAST reads or
contigs from a metagenome against a large annotated
reference database such as nr to perform knowledge
transfer by homology.
• Top-down approaches dovetail well with experimental screens
for individual gene product function, and bottom-up
approaches are more descriptive of the community as a whole.
Computational Functional
Metagenomics cont.
• Functional comparisons between metagenomes may
be made by comparing sequences against reference
databases such as COG or KEGG, tabulating the
abundance by category, and evaluating any
differences for statistical significance. This
gene-centric approach emphasizes the functional
complement of the community as a whole rather than
taxonomic groups.
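One way to evaluate a difference in a single functional category between two metagenomes is a 2x2 Fisher's exact test on read counts. The test choice is an assumption made for this sketch (the slides do not name one), the counts are made up, and `fisher_exact_two_sided` is a hypothetical stdlib-only helper.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Made-up counts: reads assigned to one category vs. all other reads.
in_cat_a, other_a = 30, 970   # metagenome A
in_cat_b, other_b = 10, 990   # metagenome B
print(fisher_exact_two_sided(in_cat_a, other_a, in_cat_b, other_b))
```

When many categories are tested at once, the resulting p-values would normally need a multiple-testing correction before being interpreted.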
Computational Functional
Metagenomics cont.
• We are currently able to infer function for
only a fraction of the genes in any given
complete genome, let alone a metagenome, so any
of these approaches should be deemed hypothetical
at best.
• Nevertheless, like any missing value
imputation process, they can provide
numerically stable guesses that are
substantially better than random.
Host Interactions and
Interventions
• A final aspect of translational metagenomics lies in
understanding not only a microbial community but also
its environment, that is, its interaction with a human
host. Human health and disease are heavily influenced
by the microbiome.
• Almost every part of the human body hosts a
microbiome. Human skin hosts relatively few
taxa, the nasal cavity somewhat more, the oral cavity
several hundred taxa, and the gut over 500 taxa with
densities over 10^11 cells/g. Almost none of these
communities are yet well understood.
Human gut.
• The gut is currently the best-studied human
microbiome.
• It is a dynamic community that changes over time in
response to factors such as diet, illness, travel,
chemical additives, and antibiotics.
• Indeed, the human gut microbiome has proven
difficult to study precisely because it is so intimately
related to the physiology of its host, inasmuch as no
two people share identical microbiota.
Microbiome and Human Health
• Microbiota clearly represent a key component of future
personalized medicine.
• The number and diversity of phenotypes linked to the
composition of the microbiota is immense: obesity,
diabetes, allergies, autism, inflammatory bowel
disease, fibromyalgia, cardiac function, various
cancers, and depression have all been reported to
correlate with microbiome function.
• Even without causative or modulatory roles, there is
tremendous potential in the ability to use the
taxonomic or metagenomic composition of a subject's
gut or oral flora as a diagnostic or prognostic
biomarker for any or all of these conditions.
Human Microbiome Project (HMP)
• The Human Microbiome Project (HMP) is a United
States National Institutes of Health (NIH) initiative
with the goal of identifying and characterizing the
microorganisms that are found in association with
both healthy and diseased humans [5].
• Launched in 2008, it is a five-year project, best
characterized as a feasibility study, and has a total
budget of $115 million [5].
• The aim of the NIH-sponsored microbiome projects
is to test how changes in the
human microbiome are associated with human
health or disease.
HMP cont.
• The project relies on culture-independent methods of
microbial community characterization, such
as metagenomics, as well as extensive whole
genome sequencing.
• The microbiology of five body sites is
emphasized: oral, skin, vaginal, gut, and nasal/lung.
• Total microbial cells found in association with humans
may exceed the total number of cells making up
the human body by a factor of ten to one.
• The total number of genes associated with the human
microbiome could exceed the total number of human
genes by a factor of 100 to one.
The goals of the HMP
• To develop a reference set of microbial genome
sequences and to perform preliminary characterization
of the human microbiome
• To explore the relationship between disease and
changes in the human microbiome
• To develop new technologies and tools for
computational analysis
• To establish a resource repository
• To study the ethical, legal, and social implications of
human microbiome research
Some new discoveries from the HMP
• Microbes contribute more genes responsible for
human survival than humans' own genes. It is
estimated that bacterial protein-coding genes are 360
times more abundant than human genes.
• Microbial metabolic activities, for example digestion
of fats, are not always provided by the same bacterial
species.
• Components of the human microbiome change over
time, affected by a patient's disease state and
medication. However, the microbiome eventually
returns to a state of equilibrium, even though the
composition of bacterial types has changed.
HMP Achievements
Major categories of work:
• Development of new database systems allowing
efficient organization, storage, access, search, and
annotation of massive amounts of data.
• Development of tools for comparative analysis that
facilitate the recognition of common patterns, major
themes, and trends in complex data sets.
• Development of new methods and systems for
assembly of massive sequence data sets.
Summary
• The human microbiome has been referred to as a
forgotten organ. Microbial cells outnumber human
cells about ten to one.
• Women have a more diverse microbiome than men.
• The human microbiome consists of unicellular
microbes, mainly bacterial but also archaeal, viral,
and eukaryotic, that occupy nearly every surface of our
bodies and have been linked to a wide range of
phenotypes in health and disease.
Summary
• High-throughput assays have offered the first
comprehensive culture-free techniques
for surveying the members of these
communities and their biomolecular activities at
the transcript, protein, and metabolic levels.
• Most current technologies rely on DNA
sequencing to examine either individual
taxonomic markers in a microbial community,
typically the 16S ribosomal subunit gene, or the
composite metagenome of the entire
community.
Ongoing studies
• Ongoing studies are beginning to investigate
the ways in which the microbiota can be directly
engineered using pharmaceuticals,
prebiotics (food substances metabolized by the
microbiota so as to directly or indirectly benefit
the host),
probiotics (live microorganisms consumed by
the host with direct or indirect health benefits),
or diet as a preventative or treatment for a wide
range of disorders.
REFERENCES
1) http://en.wikipedia.org/wiki/Human_microbiome
2) http://en.wikipedia.org/wiki/Metagenomics
3) http://en.wikipedia.org/wiki/16S_ribosomal_RNA
4) https://www.bcm.edu/departments/molecular-virology-and-microbiology/microbiome
5) http://en.wikipedia.org/wiki/Human_Microbiome_Project
6) http://en.wikipedia.org/wiki/Virus
YouTube lectures:
1) http://www.youtube.com/watch?v=EEZSuwkx7Ik
2) http://www.youtube.com/watch?v=pMU9d67ShoQ
3) http://www.youtube.com/watch?v=5DTrENdWvvM
4) http://www.youtube.com/watch?v=thuMCQ8ngzM/
5) http://www.youtube.com/watch?v=cXrWADTIf0s&list=PLC848629C82162FB0
6) http://www.youtube.com/watch?v=erVVL1XfLkw