CS 6293 Advanced Topics: Chapter 12: Human Microbiome Analysis
OMER ASLAN

Outline
• An overview of the analysis of microbial communities
• Understanding the human microbiome from phylogenetic and functional perspectives
• Methods and tools for calculating taxonomic and phylogenetic diversity
• Metagenomic assembly and pathway analysis
• Human Microbiome Project (HMP)
• The impact of the microbiome on the human host
• Summary

An overview of the analysis of microbial communities
• The human microbiome is the aggregate of microorganisms that reside on the surface and in deep layers of the skin, in the saliva and oral mucosa, in the conjunctiva, and in the gastrointestinal tract [1]. These microorganisms include bacteria, archaea, fungi, and viruses.

Understanding the human microbiome from phylogenetic and functional perspectives
Under normal circumstances, some of these organisms perform tasks that are useful to the human host: they help digest food, produce certain vitamins, regulate the immune system, and protect against disease-causing bacteria. Under some conditions, however, some of them contribute to disease, ranging from inflammatory bowel disease to diabetes to antibiotic-resistant infection.

Human microbiome cont.
• According to some scientists, humans are born with only their own eukaryotic cells, and over the first several years of life the skin surface, oral cavity, and gut are colonized by a tremendous diversity of bacteria (the majority), archaea, fungi, single-celled eukaryotes, and viruses.
• However, other scientists, including Dr. Madan and a number of other researchers, are now convinced that mothers seed their fetuses with microbes during pregnancy.

Human microbiome: functional perspectives
Recent research shows that, at a specific site on the body, a different set of microbes may perform the same function for different people.
For instance, on the tongues of two different people, two entirely different sets of organisms may break down sugars in the same way. This suggests that medical science may be forced to abandon the one-microbe model of disease and instead pay attention to the function of a group of microbes that has somehow gone awry.

Some characteristics of the human microbiome
• Humans are host to more than 100 trillion organisms.
• By a commonly cited estimate, they outnumber human cells 10:1; more recent estimates put the ratio closer to 1:1.
• Their combined genome is 100-fold larger than the human genome.
• They comprise 700-800 separate species.
• The human microbiome makes up about one to two percent of the body mass of an adult.
• Microbes contribute more genes responsible for human survival than humans' own genes. It is estimated that bacterial protein-coding genes are 360 times more abundant than human genes.

Some characteristics of the human microbiome cont.
In this sense, we are more microbiome than human.
Short video about the human microbiome: http://www.youtube.com/watch?v=5DTrENdWvvM

Bacteria
Bacteria constitute a large domain of prokaryotic microorganisms. The Howard Hughes Medical Institute of Maryland reports that the largest concentration of bacteria in the human body is found in the intestines; bacteria also inhabit the skin and mucosa. If microbe numbers grow beyond their typical ranges, or if microbes populate atypical areas of the body (for example through poor hygiene or injury), disease can result.

Human bacteria
It is estimated that 500 to 1,000 species of bacteria live in the human gut [1]. Bacterial cells are much smaller than human cells, and by the commonly cited estimate there are at least ten times as many bacteria as human cells in the body. The mass of microorganisms is estimated to account for 1-3% of total body mass [1]. Many of the bacteria in the digestive tract are able to break down certain nutrients, such as carbohydrates, that humans otherwise could not digest.
The majority of these commensal bacteria survive in an environment with no oxygen. (A commensal relationship is one between two organisms in which one benefits without affecting the other.)

Archaea
Archaea are a domain of single-celled microorganisms. These microbes are prokaryotes, meaning they have no nucleus in the cell. Archaea were initially classified as bacteria and given the name archaebacteria, but this classification is outdated. Archaeal cells have unique properties that separate them from the other two domains of life, Bacteria and Eukaryota.

Human archaea
• Archaea are present in the human gut but, in contrast to the enormous variety of bacteria in this organ, the number of archaeal species is much smaller.
• Although a relationship has been proposed between the presence of some methanogens and human periodontal disease, no clear examples of archaeal pathogens are known.

Fungi
• Fungi are a large group of eukaryotic organisms, separate from plants, animals, protists, and bacteria.
• Fungi, in particular yeasts, are present in the human gut.
• The best studied of these are Candida species, because of their ability to become pathogenic in immunocompromised hosts.

Viruses
A virus is a small infectious agent that replicates only inside the living cells of other organisms [6]. Viruses can infect all types of life forms, from animals and plants to bacteria and archaea.
• Virus species within the same family generally share basic structural characteristics, such as genome type, virion shape, and replication site. There are currently 21 families of viruses known to cause disease in humans.

History of Microbiome Studies
• Historically, members of a microbial community were identified in situ by stains that targeted their physiological characteristics, such as the Gram stain.
• These stains could distinguish many broad clades of bacteria but were non-specific at lower taxonomic levels.
Hence, microbiology was culture-dependent; it was necessary to grow an organism in the lab in order to study it.
• Specific microbial species were detected by plating samples on specialized media selective for the growth of that organism.
• This approach limited the range of organisms that could be detected to those that grow actively in laboratory culture.

History of Microbiome Studies cont.
• However, the majority of microbial species have never been grown in the laboratory, and options for studying and quantifying uncultured organisms were severely limited until the development of DNA-based culture-independent methods in the 1980s.
• Culture-independent techniques analyze DNA extracted directly from a sample rather than from individually cultured microbes. They make it possible to investigate many aspects of microbial communities, including taxonomic diversity, i.e. which microbes are present in a community and how many of each.

History of Microbiome Studies cont.
One of the earliest targeted metagenomic assays for studying uncultured communities without prior DNA extraction was fluorescent in situ hybridization (FISH). FISH probes can be targeted to almost any level of taxonomy, from species to phylum. Although FISH was initially limited to the 16S rRNA marker gene, and therefore to diversity studies, it has since been expanded to functional gene probes that can identify specific enzymes in communities. However, it remains a primarily low-throughput, imaging-based technology.

History of Microbiome Studies cont.
• Although DNA sequencing has existed since the 1970s, it was long quite expensive, in part because it required the additional time and expense of clone library construction. It has since become economically feasible for most scientists to sequence the DNA of an entire environmental sample, and metagenomic studies have become increasingly common.

Taxonomic Diversity
• 1. The 16S rRNA Marker Gene
• 2. Binning 16S rRNA Sequences into OTUs (Operational Taxonomic Units), explained a bit later
• 3. Measuring Population Diversity

The 16S rRNA Marker Gene
• A microbial community generally consists of a collection of individual cells, each carrying a distinct complement of genomic DNA. Communities differ from multicellular organisms in that their component cells may or may not carry identical genomes, although substantial subsets of these cells are typically assumed to be clonal.

16S rRNA cont.
• A community can therefore be described by assigning a frequency to each distinct genome, either the absolute number of cells in which it is carried or its relative abundance within the population. However, it is not practical to fully sequence every genome in every cell.
• Microbial ecology has therefore defined a number of molecular markers that uniquely tag distinct genomes. A marker is a DNA sequence that identifies the genome that contains it, without the need to sequence the entire genome.

The 16S rRNA Marker Gene
• Although different markers can be chosen for analyzing different populations, several properties are desirable in a good marker.
• A marker should be present in every member of a population.
• A number of such markers have been defined, including ribosomal protein subunits. The most widely used is the small (16S) ribosomal RNA subunit gene, a gene of about 1.5 kbp.
• 16S ribosomal RNA (16S rRNA) is a component of the 30S small subunit of prokaryotic ribosomes. The genes coding for it are referred to as 16S rDNA and are used in reconstructing phylogenies.

The 16S rRNA Marker Gene
• Multiple 16S rRNA sequences can exist within a single bacterium, and it is relatively cheap and simple to sequence only the 16S sequences from a microbiome. A population can therefore be described as a set of 16S sequences together with the number of times each was detected.
• Sequences assayed in this manner have been characterized for a wide range of cultured species and environmental isolates.
• These are stored in, and can be automatically matched against, several databases, including the Ribosomal Database Project, Greengenes, and SILVA.

Ribosomal Database Project
The Ribosomal Database Project (RDP) is a curated database that offers ribosome data along with related programs and services. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, rRNA secondary structure diagrams, and various software packages for handling, analyzing, and displaying alignments and trees. The data are available via FTP and electronic mail, and certain analytic services are also provided by the electronic mail server. The database can be accessed at http://rdp.cme.msu.edu/

Greengenes
• The Greengenes web application provides access to the 2011 version of the Greengenes 16S rRNA gene sequence alignment for browsing, blasting, probing, and downloading.
• The data and tools presented by Greengenes can assist the researcher in choosing phylogenetically specific probes, interpreting microarray results, and aligning novel sequences.
• It can be downloaded from http://greengenes.secondgenome.com/

SILVA
• SILVA provides comprehensive, quality-checked, and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea, and Eukarya).
• SILVA is available at http://www.arb-silva.de/

Binning 16S rRNA Sequences into OTUs
A challenge in the analysis of rRNA genes is the precise definition of a "unique" sequence. Although much of the 16S rRNA gene is highly conserved, several of the sequenced regions are variable or hypervariable, so small numbers of base pairs can change in a very short period of evolutionary time.
In addition, because 16S regions are typically sequenced using only a single pass, there is a fair chance that a given read contains at least one sequencing error.

OTUs cont.
• Some degree of sequence divergence is therefore typically allowed (95%, 97%, or 99% sequence similarity cutoffs are often used in practice), and the resulting cluster of nearly identical tags is referred to as an Operational Taxonomic Unit (OTU) or sometimes a phylotype.
• OTUs take the place of "species" in many microbiome diversity analyses because named species genomes are often unavailable for particular marker sequences.

OTUs cont.
The assignment of sequences to OTUs is referred to as binning, and it can be performed by
• 1) Unsupervised clustering of similar sequences
• 2) Phylogenetic models incorporating mutation rates and evolutionary relationships
• 3) Supervised methods that directly assign sequences to taxonomic bins based on labeled training data (an approach that also applies to whole-genome shotgun sequences)

Measuring Population Diversity
• Population diversity is an important concept when dealing with OTUs or other taxonomic bins because it bears directly on human health.
• A number of disease conditions have been shown to correlate with decreased microbiome diversity, presumably because one or a few microbes overgrow during an immune or nutrient imbalance.
• Human intestinal contents appear to be highly personalized when considered in terms of microbial presence, absence, and abundance.

Measuring Population Diversity cont.
• Two well-defined questions arise when quantifying population diversity.
• Given that x bins have been observed in a sample of size y from a population of size z, how many bins are expected to exist in the population? Or, given that x bins exist in a population of size z, how large must the sample y be to observe all of them at least once?
• In practical terms: if I have sequenced some amount of diversity, how much more exists in my microbiome? And how much do I need to sequence to completely characterize my microbiome?
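
Sketch: binning and richness in code
The unsupervised binning option and the "how many bins exist in the population?" question can be sketched together. This is a minimal illustration, not how production tools work: it assumes all reads are pre-aligned and of equal length, so that identity is a simple position-by-position comparison, and it uses the bias-corrected Chao1 estimator as one possible richness estimate (the slides do not name a specific estimator). The function names `identity`, `greedy_otus`, and `chao1` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares reads of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy unsupervised binning (an assumed, simplified scheme).

    Each read joins the first existing OTU whose seed sequence it matches
    at `threshold` identity or better; otherwise it seeds a new OTU.
    Returns (seeds, counts) in order of OTU creation.
    """
    seeds, counts = [], []
    for read in reads:
        for k, seed in enumerate(seeds):
            if identity(read, seed) >= threshold:
                counts[k] += 1
                break
        else:  # no existing OTU was close enough
            seeds.append(read)
            counts.append(1)
    return seeds, counts


def chao1(counts):
    """Bias-corrected Chao1 estimate of total richness from per-OTU counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


if __name__ == "__main__":
    base = "ACGT" * 25        # toy 100-bp tag
    near = "T" + base[1:]     # one mismatch, 99% identity to base
    far = "TGCA" * 25         # no matching positions
    seeds, counts = greedy_otus([base, near, far])
    print(len(seeds), counts)
    print(chao1([10, 5, 2, 1, 1, 1]))
```

The result of greedy binning depends on read order and on the choice of seed, which is one reason real pipelines typically sort reads by abundance or use alignment-based clustering instead.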
Measuring Population Diversity cont.
• Measures exist for calculating alpha diversity, the number (richness) and distribution (evenness) of taxa expected within a single population.
• These give rise to figures known as collector's or rarefaction curves, since increasing numbers of sequenced taxa allow increasingly precise estimates of total population diversity.
• When comparing multiple populations, on the other hand, beta diversity measures, including absolute or relative overlap, describe how many taxa are shared between them.

Alpha, beta diversity
Whereas an alpha diversity measure acts like a summary statistic of a single population, a beta diversity measure acts like a similarity score between populations, allowing analysis by sample clustering.

Alpha, beta diversity cont.
Alpha diversity is often quantified by the Shannon index, $H' = -\sum_i p_i \ln p_i$, or the Simpson index, $D = \sum_i p_i^2$, where $p_i$ is the fraction of the total population made up by species $i$.

Beta diversity
• Beta diversity can be measured by simple taxa overlap or quantified by the Bray-Curtis dissimilarity, $BC_{ij} = 1 - \frac{2 C_{ij}}{S_i + S_j}$,
• where $S_i$ and $S_j$ are the total counts (e.g. reads) in populations $i$ and $j$, and $C_{ij}$ is the sum, over the taxa found in both populations, of the lesser of the two counts. Like similarity measures in expression array analysis, many alpha- and beta-diversity measures have been developed, each revealing slightly different aspects of community ecology.

Shotgun Sequencing and Metagenomics
• Metagenomics is the investigation, using sequencing technologies, of the microbes that inhabit oceans, soils, the human body, and other environments.
• The composition and function of uncultured microbial communities are often referred to collectively as "metagenomic".
• Metagenomics is the study of metagenomes, genetic material recovered directly from environmental samples.
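
Sketch: diversity indices in code
Before moving further into metagenomics, the alpha- and beta-diversity measures named earlier (Shannon, Simpson, Bray-Curtis) can be made concrete with a small sketch. It assumes the standard abundance-based definitions: p_i is the fraction of all counts belonging to taxon i, and Bray-Curtis is 1 - 2C/(S_i + S_j), with C the sum of the lesser counts over shared taxa and S_i, S_j the total counts per sample. The function names and toy counts are illustrative only.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index D = sum(p_i ** 2); higher values mean lower diversity."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def bray_curtis(sample_i, sample_j):
    """Bray-Curtis dissimilarity between two {taxon: count} samples."""
    shared_taxa = sample_i.keys() & sample_j.keys()
    lesser_sum = sum(min(sample_i[t], sample_j[t]) for t in shared_taxa)
    total = sum(sample_i.values()) + sum(sample_j.values())
    return 1 - 2 * lesser_sum / total


if __name__ == "__main__":
    gut_a = {"Bacteroides": 6, "Prevotella": 4}   # hypothetical counts
    gut_b = {"Bacteroides": 2, "Prevotella": 8}
    print(shannon(list(gut_a.values())))
    print(simpson(list(gut_a.values())))
    print(bray_curtis(gut_a, gut_b))
```

A Bray-Curtis value of 0 means the two samples have identical counts, and 1 means they share no taxa, which is what makes it usable as a distance for sample clustering.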
Metagenomics
While traditional microbiology and microbial genome sequencing rely on cultivated clonal cultures, early environmental gene sequencing cloned specific genes (often the 16S rRNA gene) to produce a profile of diversity in a natural sample. Such work revealed that the vast majority of microbial biodiversity had been missed by cultivation-based methods. Recent studies use "shotgun" Sanger sequencing or massively parallel pyrosequencing to get largely unbiased samples of all genes from all members of the sampled communities.

Metagenomics
Because of its ability to reveal the previously hidden diversity of microscopic life, metagenomics offers a powerful lens for viewing the microbial world, with the potential to revolutionize understanding of the entire living world. The term metagenomics is also used with some frequency to describe the entire body of high-throughput studies now possible with microbial communities, although it refers more specifically to whole-metagenome shotgun (WMS) sequencing of genomic DNA fragments from a community's metagenome.

Shotgun metagenomics
The shotgun approach, also used to sequence many cultured microorganisms and the human genome, randomly shears DNA, sequences many short fragments, and reconstructs them into a consensus sequence. Shotgun sequencing reveals the genes present in environmental samples.
• This provides information both on which organisms are present and on what metabolic processes are possible in the community. This can be helpful in understanding the ecology of a community, especially if multiple samples are compared to each other.

Shotgun metagenomics
• Shotgun metagenomics is also capable of sequencing nearly complete microbial genomes directly from the environment.
• Because the collection of DNA from an environment is largely uncontrolled, the most abundant organisms in an environmental sample are most highly represented in the resulting sequence data.
• To achieve the high coverage needed to fully resolve the genomes of under-represented community members, large samples are needed, often prohibitively large ones.

Shotgun metagenomics cont.
• On the other hand, the random nature of shotgun sequencing ensures that many of these organisms, which would otherwise go unnoticed using traditional culturing techniques, will be represented by at least some small sequence segments.

High-throughput sequencing
• The first metagenomic studies conducted using high-throughput sequencing used massively parallel pyrosequencing.
• Three other technologies commonly applied to environmental sampling are the Ion Torrent Personal Genome Machine, the Illumina Genome Analyzer II, and the Applied Biosystems SOLiD system.
• These techniques generate reads that are significantly shorter than the typical Sanger sequencing read length of ~750 bp.

Sequence pre-filtering
The first step of metagenomic data analysis is a set of pre-filtering steps, including the removal of redundant and low-quality sequences and of sequences of probable eukaryotic origin (especially in metagenomes of human origin). Methods available for the removal of contaminating eukaryotic genomic DNA sequences include Eu-Detect and DeconSeq.

Metagenome Data Analysis
• Unlike whole-genome shotgun (WGS) sequencing of individual organisms, metagenomes tend not to have a single finish line, and they have been successfully analyzed using a range of assembly techniques.
• Metagenome-specific assembly algorithms have been proposed that reconstruct only the open reading frames from a population, recruiting highly similar sequence fragments onto complete single-gene sequences and avoiding the assembly of larger contigs.

Metagenome Data Analysis cont
• The most challenging option is to attempt full assemblies of the complete genomes present in the community.
• When successful, this has the obvious benefit of establishing synteny and structural variation and of opening up the range of tools developed for whole-genome analysis.
• A key bioinformatic tradeoff in analyzing metagenomic WMS sequences, regardless of their degree of assembly, is whether they should be analyzed by homology or de novo.

Metagenome Data Analysis cont
An illustrative example is the task of determining which parts of each sequence read encode one or more genes, i.e. gene finding or gene calling. By homology, each sequence can be BLASTed against a large database of reference genomes. This method is robust to sequencing and assembly errors, but it is sensitive to the contents of the reference database. Conversely, de novo methods have been developed to directly bin and call genes within metagenomic sequences using DNA features alone.

Computational Functional Metagenomics
• Computational functional metagenomics typically focuses on the function of individual genes and gene products within a community, and its methods fall into one of two categories: top-down approaches and bottom-up approaches.
• Both approaches rely, first, on cataloging some or all of the gene products present in a community and assigning them molecular functions and/or biological roles in the typical sense of protein function prediction.

Computational Functional Metagenomics cont.
• Top-down approaches screen a metagenome for a functional class of interest, for instance a particular enzyme family, transporter, pathway, or biological activity, essentially asking the question, "Does this community carry out this function and, if so, in what way?"
• Bottom-up approaches, on the other hand, attempt to reconstruct profiles, either descriptive or predictive, of overall functionality within a community, typically relying on pathway and/or metabolic reconstructions and asking the question, "What functions are carried out by this community?"
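
Sketch: top-down vs. bottom-up queries
The two questions above can be contrasted on a toy catalog. The sketch assumes that each read has already been assigned a gene family by some earlier step (for example a homology search); the read names, family labels, and the functions `top_down` and `bottom_up` are hypothetical and only illustrate the difference in what is being asked.

```python
from collections import Counter


def top_down(assignments, family):
    """Top-down: is one function of interest present, and how abundant is it?

    Returns (number of reads assigned to `family`, fraction of all reads).
    """
    hits = sum(1 for f in assignments.values() if f == family)
    return hits, hits / len(assignments)


def bottom_up(assignments):
    """Bottom-up: relative abundance of every function seen in the community."""
    counts = Counter(assignments.values())
    total = len(assignments)
    return {family: n / total for family, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical read -> gene family assignments for one community.
    toy = {
        "read1": "beta-galactosidase",
        "read2": "ABC transporter",
        "read3": "beta-galactosidase",
        "read4": "urease",
    }
    print(top_down(toy, "beta-galactosidase"))
    print(bottom_up(toy))
```

The top-down call answers a single yes/no-and-how-much question, while the bottom-up call produces a whole-community profile that can then be compared between samples.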
Computational Functional Metagenomics cont
• As with many bioinformatic methods, the simplest techniques rely on BLAST: a top-down investigation can BLAST representatives of gene families of interest against the community metagenome to determine their presence and abundance,
• and a bottom-up approach can BLAST reads or contigs from a metagenome against a large annotated reference database such as nr to perform knowledge transfer by homology.
• Top-down approaches dovetail well with experimental screens for individual gene product function, and bottom-up approaches are more descriptive of the community as a whole.

Computational Functional Metagenomics cont
• Functional comparisons between metagenomes may be made by comparing sequences against reference databases such as COG or KEGG, tabulating the abundance by category, and evaluating any differences for statistical significance. This gene-centric approach emphasizes the functional complement of the community as a whole rather than taxonomic groups.

Computational Functional Metagenomics cont
• Since we are currently able to infer function for only a fraction of the genes in any given complete genome, let alone a metagenome, any of these approaches should be deemed hypothetical at best. Nevertheless, like any missing-value imputation process, they can provide numerically stable guesses that are substantially better than random.

Host Interactions and Interventions
• A final aspect of translational metagenomics lies in understanding the interaction between a microbial community and its environment, here the human host. Human health and disease are heavily influenced by the microbiome.
• Almost every part of the human body harbors a microbiome. The skin hosts relatively few taxa, the nasal cavity somewhat more, the oral cavity several hundred taxa, and the gut over 500 taxa, with densities over 10^11 cells/g. Almost none of these communities are yet well understood.

Human gut
• The gut is currently the best-studied human microbiome.
• It is a dynamic community that changes over time in response to factors such as diet, illness, travel, chemical additives, and antibiotics.
• Indeed, the human gut microbiome has proven difficult to study precisely because it is so intimately related to the physiology of its host, inasmuch as no two people share identical microbiota.

Microbiome and Human Health
• Microbiota clearly represent a key component of future personalized medicine.
• The number and diversity of phenotypes linked to the composition of the microbiota is immense: obesity, diabetes, allergies, autism, inflammatory bowel disease, fibromyalgia, cardiac function, various cancers, and depression have all been reported to correlate with microbiome function.
• Even without causative or modulatory roles, there is tremendous potential in the ability to use the taxonomic or metagenomic composition of a subject's gut or oral flora as a diagnostic or prognostic biomarker for any or all of these conditions.

Human Microbiome Project (HMP)
• The Human Microbiome Project (HMP) is a United States National Institutes of Health (NIH) initiative with the goal of identifying and characterizing the microorganisms found in association with both healthy and diseased humans [5].
• Launched in 2008 as a five-year project, it is best characterized as a feasibility study and has a total budget of $115 million [5].
• The goal of NIH-sponsored microbiome projects is to test how changes in the human microbiome are associated with human health or disease.

HMP cont.
• The project relies on culture-independent methods of microbial community characterization, such as metagenomics, as well as extensive whole-genome sequencing.
• The microbiology of five body sites is emphasized: oral, skin, vaginal, gut, and nasal/lung.
• By a commonly cited estimate, the microbial cells found in association with humans may exceed the total number of cells making up the human body by a factor of ten to one.
• The total number of genes associated with the human microbiome could exceed the total number of human genes by a factor of 100 to one.

The goals of the HMP
• To develop a reference set of microbial genome sequences and to perform preliminary characterization of the human microbiome
• To explore the relationship between disease and changes in the human microbiome
• To develop new technologies and tools for computational analysis
• To establish a resource repository
• To study the ethical, legal, and social implications of human microbiome research

Some new discoveries from the HMP
• Microbes contribute more genes responsible for human survival than humans' own genes. It is estimated that bacterial protein-coding genes are 360 times more abundant than human genes.
• Microbial metabolic activities, for example the digestion of fats, are not always provided by the same bacterial species.
• Components of the human microbiome change over time, affected by a patient's disease state and medication. However, the microbiome eventually returns to a state of equilibrium, even though the composition of bacterial types has changed.

HMP Achievements
Major categories of work:
• Development of new database systems allowing efficient organization, storage, access, search, and annotation of massive amounts of data
• Development of tools for comparative analysis that facilitate the recognition of common patterns, major themes, and trends in complex data sets
• Development of new methods and systems for assembly of massive sequence data sets

Summary
• The human microbiome has been referred to as a forgotten organ. By a commonly cited estimate, microbial cells outnumber human cells ten to one.
• Women have a more diverse microbiome than men.
• The human microbiome consists of unicellular microbes (mainly bacterial, but also archaeal, viral, and eukaryotic) that occupy nearly every surface of our bodies and have been linked to a wide range of phenotypes in health and disease.

Summary
• High-throughput assays have offered the first comprehensive culture-free techniques for surveying the members of these communities and their biomolecular activities at the transcript, protein, and metabolic levels.
• Most current technologies rely on DNA sequencing to examine either individual taxonomic markers in a microbial community, typically the 16S ribosomal subunit gene, or the composite metagenome of the entire community.

Ongoing study
• Ongoing studies are beginning to investigate the ways in which the microbiota can be directly engineered, as a preventative or treatment for a wide range of disorders, using pharmaceuticals,
• prebiotics (food substances metabolized by the microbiota so as to directly or indirectly benefit the host),
• probiotics (live microorganisms consumed by the host with direct or indirect health benefits), or diet.

REFERENCES
1) http://en.wikipedia.org/wiki/Human_microbiome
2) http://en.wikipedia.org/wiki/Metagenomics
3) http://en.wikipedia.org/wiki/16S_ribosomal_RNA
4) https://www.bcm.edu/departments/molecular-virology-andmicrobiology/microbiome
5) http://en.wikipedia.org/wiki/Human_Microbiome_Project
6) http://en.wikipedia.org/wiki/Virus

YouTube lectures:
1) http://www.youtube.com/watch?v=EEZSuwkx7Ik
2) http://www.youtube.com/watch?v=pMU9d67ShoQ
3) http://www.youtube.com/watch?v=5DTrENdWvvM
4) http://www.youtube.com/watch?v=thuMCQ8ngzM/
5) http://www.youtube.com/watch?v=cXrWADTIf0s&list=PLC848629C82162FB0
6) http://www.youtube.com/watch?v=erVVL1XfLkw