The NIH Human Microbiome Project (HMP)

Prepared by Lita M. Proctor, NHGRI/NIH, for the 2013 ACVP Conference proceedings
081913 draft
The human microbiome comprises the genes, gene products and genomes of the microbiota, which
include archaea, bacteria, eukaryotic viruses, bacteriophage, fungi and protozoa, that inhabit the body.
The NIH Human Microbiome Project (HMP) was established to create a map of the microbial
communities of the human body in health and in disease and to evaluate the biological properties of the
microbiome. As a community resource program, the HMP has produced, or will produce, reference
microbial genome sequences, 16S ribosomal RNA gene sequences for microbial taxonomic analysis,
metagenome sequences for microbial community analysis, as well as microbial community gene
expression, proteome and metabolome data from the microbiome. All of these data, along with
associated computational tools and scientific approaches, are being generated or developed to support
research on this emerging concept in human health.
Throughout most of human history, we have seen ourselves at war with microbes. Bubonic plague, smallpox,
yellow fever, and typhoid are just a few important historical examples, while modern-day infectious
diseases include malaria, tuberculosis, cholera and HIV/AIDS, to name a few. The scientific study of
microbiology, which led to important discoveries such as Louis Pasteur’s “germ theory of disease”, grew
out of society’s desire to conquer these pathogens and eradicate infectious disease. But a new view is
emerging in which we recognize humans and microbes as a co-evolved system for the mutual benefit of
both the host and the resident microbes. It is now estimated that the human body is made up of about
10 times more microbial cells (~10^14) than human cells (~10^13). Further, there may be hundreds of times
more microbial genes (i.e., millions) than human genes (i.e., 20,000-25,000) in this human+microbiome
system, which is often thought of as a human ‘superorganism’. It is these microbial communities,
and the way they interact with the human host, that define their role in our health.
It is thought that infants are sterile in the womb and receive their first inoculum of microbes from the
mother; this inoculum goes on to colonize the newborn and, through successional waves of different
microbes, leads to the development of the infant’s own microbiome. The newborn relies on this maternal
microbial inoculum, as well as on other environmental sources of microbes, for colonization of all exposed
surfaces in and on the infant’s body (e.g., oral, nasal/airways, gut, urogenital, skin). This is a dynamic
process in which microbial abundances rise from effectively zero at birth by more than six orders of
magnitude within just the first few weeks of life, with wide swings in the microbial membership of these
communities, until the microbiota largely stabilize and become ‘adult-like’ in composition and numbers
after approximately 3 years of life (Costello et al., 2009; Koenig et al., 2011).
At the same time, it is thought that the newborn’s gut microbiota trigger the development and maturation
of the newborn’s immune system. Though a great deal of active research is still needed to
understand precisely what happens in this developmental process, it appears that the maturing immune
system relies on the presence of specific microbial communities, and especially of specific bacteria
early in the maturation process, as the immunological trigger for distinguishing ‘self’ from ‘nonself’
(Hooper and Gordon, 2001; Round and Mazmanian, 2009). Much research is now examining the links
between these early-life events in the microbiome and the immune system and a person’s subsequent
health status over their lifetime. The NIH Human Microbiome Project (HMP) was created to provide tools,
data and resources to enable research on this emerging concept in the biomedical field and in human health.
The NIH Human Microbiome Project: A Community Resource
The human microbiome encompasses the full complement of microbial genes, gene products, and
genomes of the microbiota (which include bacteria, archaea, eukaryotic viruses, bacteriophage and
eukaryotic microbes like fungi and protozoa) that call the human body home and interact with the
human host to maintain host health. With thousands of species, millions of microbial genes and trillions of
microbial cells, the global microbiome contributes to the health and maintenance of the human
superorganism. Interest in this human+microbiome system has been motivated by simultaneous
advances in sequencing technologies and in microbial ecology, and by an ever-increasing understanding that
the human host and microbiota have co-evolved and that resident microbiota are intimately involved in
the development and maintenance of the human immune system. To catalyze this research, the NIH
launched an 8-year Human Microbiome Project (HMP) as a community resource program.
The program has two phases: a first phase (2007-2012) to undertake a survey of the human
microbiome and a second phase (2013-2015) to create a catalog of biological properties of the
human microbiome.
The HMP is 1) surveying the microbiomes across the bodies of a cohort of healthy adults to produce a
reference dataset of baseline microbiomes, 2) developing a catalog of microbial genome sequences of
reference strains, 3) evaluating the characteristics of microbiomes associated with specific
gastrointestinal tract, urogenital and skin diseases, and 4) compiling a rich catalog of microbiome
biological properties in association with host properties from cohorts with known diseases or health
conditions. A description of some of the key resources and consortium activities in both phases
of this program follows.
HMP Reference Strain Microbial Genome Sequence Catalog
The HMP has assembled a key reference dataset of microbial genome sequences collected from the
major body regions of the human microbiome, primarily bacterial, although it also includes archaea,
viruses, bacteriophage and eukaryotic microorganisms. The project's target catalog of 3,000 microbial
genome sequences is intended as a reference for the interpretation of 16S ribosomal RNA gene
sequences, as well as a scaffold for the assembly of metagenomic sequences obtained from the microbial
communities. An analysis of the genomes of the first 178 microbial isolates was published
(Nelson et al., 2010); this subset of the catalog alone described approximately 550,000 predicted genes,
about 30,000 of which were novel.
As of this writing, thousands of microbial strains have been sequenced or are in progress for the HMP
reference strain catalog. Several hundred of these sequences are available in GenBank, and a key subset
of the cultures of the corresponding reference strains is available at the HMP Strain Repository in the
ATCC/Biodefense and Emerging Infectious Diseases Research Repository (BEI).
HMP Healthy Adult Cohort Study
The second major resource of the HMP is the largest study to date of the microbiomes of five major
regions of the body in healthy adults (airways, skin, oral cavity, gastrointestinal tract, and vagina). Several
specific body sites were sampled within each major region (18 in total). Because the volunteers were
clinically verified to be free of overt disease in all body regions, this study is known as the healthy adult
cohort study.
Three hundred adult volunteers were enrolled at two US clinical centers. These included equal numbers
of 18-40-year-old men and women, 20% of whom self-identified as a racial minority and 11% as Hispanic.
Inclusion and exclusion criteria, clinical sampling procedures, and the corresponding clinical metadata
can be found at the NCBI database of Genotypes and Phenotypes (dbGaP). Of the 300 volunteers in this
study, over two-thirds were sampled twice and one-third were sampled a third time, over a period of
approximately two years. Among the 18 total body sites sampled, the oral cavity had the largest number
of sites (9). All body sites were sampled directly except for the gastrointestinal tract, for which stool
served as a proxy. Blood was collected for serum and for bulk DNA for whole-genome sequencing of the
human subjects, and lymphocytes were harvested for the preparation of cell lines; these specimens are
being held for future analysis and distribution when the larger research community expresses interest in
these resources.
All of the more than 10,000 primary microbiome specimens collected from the full cohort have been
sequenced for the 16S rRNA gene taxonomic marker. Metagenomic sequence data have additionally been
generated from approximately one thousand of the nucleic acid samples. A portion of the full set of 16S
data and of the metagenomic data was targeted by the HMP research consortium for a global analysis. In
2012, these analyses were published in Nature (The Human Microbiome Project Consortium, 2012a, 2012b)
and in the HMP collection of related papers in PLoS. As of this writing, over 300 publications from analysis
of the healthy cohort study data and from
other studies in the program are in PubMed and cite HMP support. A current list can be found at
HMP Demonstration Projects
A third key resource from this program is the set of HMP Demonstration Projects, designed to evaluate
microbiome characteristics in health conditions or disease states with putative microbiome associations.
Many complex diseases appear to have a microbiome component, and these projects were designed to
characterize the microbiome in such cases in order to develop a reference dataset of microbiome
properties associated with specific disease and clinical phenotypes.
Eleven Demonstration Project studies were supported in the program, including six projects on
microbiome-associated gastrointestinal diseases (Crohn’s disease, ulcerative colitis, pediatric
irritable bowel syndrome, neonatal necrotizing enterocolitis, and esophageal adenocarcinoma),
three on urogenital status (i.e., microbiome characteristics associated with bacterial vaginosis,
reproductive history, sexual history and circumcision), and two on microbiome-associated skin diseases
(atopic dermatitis and psoriasis). Almost all of the studies include 16S rRNA gene and metagenomic
sequencing, and some also include functional data from the microbiome such as microbial community
gene expression, proteomics, or metabolomics. The Demonstration Project sequence datasets and
study descriptions can be found at NCBI BioProject.
HMP Phase Two
It was recognized during the analysis of the healthy cohort metagenomic data that there appeared to be
millions of unique microbial genes in the human microbiome, and this raised questions about the
functional potential of these microbial communities: what products and pathways do these microbial
genes encode? It was not feasible to characterize all of the transcripts, gene products and metabolites of
the global human microbiome. Instead, a more limited effort to collect biological properties of the
microbiome in well-characterized cohorts with putative microbiome-associated diseases or health
conditions became the focus of the second phase of the HMP. The data from these studies are
intended to serve as model systems for microbiome-associated diseases or health conditions and so
should be of broad interest to the larger research community. The properties of particular interest include the
microbial community gene expression profiles (metatranscriptome), the proteins (metaproteome) and
the metabolic products of microbial activity (metabolome). These data will be combined to create an
integrated dataset of microbiome and host properties of subjects as a community resource. This dataset
will provide the research community the opportunity to evaluate which microbiome properties or
combination of properties are the most informative when investigating the role of the microbiome in
disease or in health.
Future Directions for Human Microbiome Research
Though the HMP may be the most visible example of NIH interest in the human microbiome, there are
currently 16 NIH Institutes and Centers that have microbiome-related research programs or are
developing new microbiome programs. A recent three-day meeting brought together the NIH and the
larger research community to discuss the state of the science for this field and to identify the gaps,
needs and challenges for advancing this field over the next ten years. The videotaped talks and slides are
available here
One of the crucial needs identified by the meeting participants was large, long-term
cohort studies that could serve as a platform from which numerous investigations could address
microbiome development, variability of the microbiome across populations, temporal changes, and
functional properties in response to diet, disease or other perturbations or interventions. Genome
sequences of the cohort subjects would provide invaluable data for integration with the microbiome
properties. Opportunities for cohort studies may soon become available; for example, a federal
collaboration between the NIH, the CDC and the EPA is currently developing the National Children’s
Study (NCS), which plans to follow 100,000 children from birth to 21
years of age. The pilot phase of the NCS is exploring the inclusion of microbiome samples during this
large study. These and other cohort studies could provide the ideal framework from which to analyze
the microbiome from birth throughout lifetimes in diverse populations.
Costello, E.K., Lauber, C.L., Hamady, M., Fierer, N., Gordon, J.I., and Knight, R. (2009). Bacterial
community variation in human body habitats across space and time. Science 326, 1694-1697.
Hooper, L.V., and Gordon, J.I. (2001). Commensal host-bacterial relationships in the gut. Science 292,
The Human Microbiome Project Consortium (2012a). Structure, function and diversity of the healthy
human microbiome. Nature 486, 207-214.
The Human Microbiome Project Consortium (2012b). A framework for human microbiome research.
Nature 486, 215-219.
Koenig, J.E., Spor, A., Scalfone, N., Fricker, A.D., Stombaugh, J., Knight, R., Angenent, L.T., and Ley, R.E.
(2011). Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci U S
A 108 Suppl 1, 4578-4585.
Nelson, K.E., Weinstock, G.M., Highlander, S.K., Worley, K.C., Creasy, H.H., Wortman, J.R., Rusch, D.B.,
Mitreva, M., Sodergren, E., Chinwalla, A.T., et al. (2010). A catalog of reference genomes from the
human microbiome. Science 328, 994-999.
Round, J.L., and Mazmanian, S.K. (2009). The gut microbiota shapes intestinal immune responses during
health and disease. Nat Rev Immunol 9, 313-323.