Background
• The human microbiome is composed of trillions of organisms occupying the gastrointestinal, genitourinary, and respiratory tracts, the skin, and the oral cavity.
• The microbiome influences immunity, vitamin production, digestion, and maintenance of the intestinal mucosa.
• Shifts in microbiota composition may have significant consequences for host homeostasis.
[Figure: Prediction and Feature Selection panels for Sex | Run 7 (F=0 | M=1)]
Aims
• Implement supervised latent Dirichlet allocation (sLDA) to describe microbial architecture.
• Predict host features based on microbiome composition as a function of topics.
[Figure: Body Site | Run 1 (Skin=0 | Gut=1)]
Data
• Open-access American Gut Project.
• Operational taxonomic unit (OTU) genomic information from 3832 subjects, with information on 194 variables such as sex, diet, and weight.
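To make the data layout concrete, here is a minimal sketch of how a table of OTU counts could be turned into the "documents" and "words" that a topic model expects. It assumes each subject's sample is treated as one document and each OTU read as one word token; the sample IDs, OTU IDs, counts, and labels below are hypothetical toy values, not American Gut Project data.

```python
def otu_counts_to_document(otu_counts):
    """Expand {otu_id: read_count} into a token list, one token per read."""
    tokens = []
    for otu_id, count in sorted(otu_counts.items()):
        tokens.extend([otu_id] * count)
    return tokens


# Hypothetical toy OTU table: sample ID -> {OTU ID: read count}.
toy_otu_table = {
    "sample_A": {"OTU_1": 3, "OTU_2": 0, "OTU_3": 1},
    "sample_B": {"OTU_1": 0, "OTU_2": 2, "OTU_3": 2},
}

# Hypothetical binary annotations, coded 0/1 like the labels in the figures.
toy_labels = {"sample_A": 0, "sample_B": 1}

# One bag-of-words document per sample, plus the shared OTU vocabulary.
documents = {sid: otu_counts_to_document(c) for sid, c in toy_otu_table.items()}
vocabulary = sorted({otu for counts in toy_otu_table.values() for otu in counts})
```

In this representation an OTU with a count of zero simply contributes no tokens, so sparse samples yield short documents.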
Methods
1. sLDA
2. Support Vector Machine, Random Forest
[Figure: Obesity | Run 1 (Lean=0 | Obese=1) *]
* Coefficients are not directly associated with the two regression figures (Log(OTU) vs. raw BMI). These are the logistic regression coefficients from the sLDA model fit. sLDA fits a maximized set of topic assignments, then regresses the document annotations (in this case 0 or 1 for non-obese or obese, respectively) against the distribution of words assigned to a given topic k for each document d:
TZ = Y_d, where Y_d is a vector of annotations of length D,
Z is a D x K matrix, where K is the number of topics, and Z_{i,j} = N1(z_{n,d} = k_j, Y_d = d_i)
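The regression step described in this note can be illustrated with a small sketch. It assumes the per-word topic assignments are already given (sLDA itself fits topics and coefficients jointly, which this sketch does not do), builds a D x K matrix of per-document topic proportions, and fits a plain gradient-descent logistic regression of the 0/1 annotations on it. The function names, the normalization to proportions, the learning rate, and the toy assignments are all assumptions made for illustration.

```python
import math


def topic_count_matrix(topic_assignments, num_topics, normalize=True):
    """Return Z where Z[d][k] is the count (or fraction) of words in
    document d assigned to topic k."""
    Z = []
    for doc in topic_assignments:
        row = [0.0] * num_topics
        for k in doc:
            row[k] += 1.0
        if normalize and doc:
            row = [v / len(doc) for v in row]
        Z.append(row)
    return Z


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_logistic(Z, y, lr=0.5, epochs=2000):
    """Gradient-descent logistic regression of y on the rows of Z (no intercept)."""
    num_topics = len(Z[0])
    coef = [0.0] * num_topics
    for _ in range(epochs):
        grad = [0.0] * num_topics
        for row, label in zip(Z, y):
            p = sigmoid(sum(c * v for c, v in zip(coef, row)))
            for k in range(num_topics):
                grad[k] += (p - label) * row[k]
        coef = [c - lr * g / len(Z) for c, g in zip(coef, grad)]
    return coef


# Hypothetical topic assignments: one list of topic indices per document.
toy_assignments = [[0, 0, 0, 1], [0, 0, 1, 0], [1, 1, 1, 0], [1, 1, 0, 1]]
toy_annotations = [0, 0, 1, 1]  # e.g. non-obese = 0, obese = 1

Z = topic_count_matrix(toy_assignments, num_topics=2)
coef = fit_logistic(Z, toy_annotations)
predicted = [
    1 if sigmoid(sum(c * v for c, v in zip(coef, row))) >= 0.5 else 0
    for row in Z
]
```

In this toy setup, a positive coefficient for a topic means that documents dominated by that topic are pushed toward the label 1, which is how per-topic coefficients like those in the figures would be read.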
Accuracy