Article Source
Hosted on the New York Times
Written by Carl Zimmer
Research Source
Article published in Nature
Research Authors:
Manimozhiyan Arumugam, Jeroen Raes, Eric Pelletier, Denis Le Paslier, Takuji Yamada, Daniel R.
Mende, Gabriel R. Fernandes, Julien Tap, Thomas Bruls, Jean-Michel Batto, Marcelo Bertalan, Natalia
Borruel, Francesc Casellas, Leyden Fernandez, Laurent Gautier, Torben Hansen, Masahira Hattori,
Tetsuya Hayashi, Michiel Kleerebezem, Ken Kurokawa, Marion Leclerc, Florence Levenez,
Chaysavanh Manichanh, H. Bjørn Nielsen, Trine Nielsen, Nicolas Pons, Julie Poulain, Junjie Qin,
Thomas Sicheritz-Ponten, Sebastian Tims, David Torrents, Edgardo Ugarte, Erwin G. Zoetendal, Jun
Wang, Francisco Guarner, Oluf Pedersen, Willem M. de Vos, Søren Brunak, Joel Doré, MetaHIT
Consortium (additional members), Jean Weissenbach, S. Dusko Ehrlich & Peer Bork
Researchers looked at which bacteria are found in people's guts
Discovered that each person is host to one of three bacterial ecosystems
Discovery was made by analyzing the bacterial DNA found in test subjects' stool samples
The DNA data was examined using clustering analysis
The topic of the article does not relate directly to any material covered in class
However, it is interesting because it demonstrates using data to solve real-world problems
Every human is host to 100 trillion bacteria
The researchers were looking for DNA related to 1,511 bacterial species
The researchers did not know what they were looking for:
“We didn't have any hypothesis. Anything that came out would be new.” - Dr. Bork
Classification: trying to group things into known categories
Examples:
Grouping donated blood by blood type
Sorting patients into low, medium, and high risk for heart disease
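The blood-type example can be sketched in a few lines of Python. This is only an illustration of classification: the categories are fixed before any data is seen. The function name `classify_donations` and the donor records are hypothetical, invented for this sketch.

```python
# The categories are known in advance: that is what makes this classification.
KNOWN_BLOOD_TYPES = ("A", "B", "AB", "O")

def classify_donations(donations):
    """Group (donor_id, blood_type) pairs under their known blood type."""
    groups = {blood_type: [] for blood_type in KNOWN_BLOOD_TYPES}
    for donor_id, blood_type in donations:
        if blood_type not in groups:
            raise ValueError(f"unknown blood type: {blood_type!r}")
        groups[blood_type].append(donor_id)
    return groups

# Hypothetical donor records, invented for illustration only.
donations = [("d1", "O"), ("d2", "A"), ("d3", "O"), ("d4", "AB")]
groups = classify_donations(donations)
print(groups)
# {'A': ['d2'], 'B': [], 'AB': ['d4'], 'O': ['d1', 'd3']}
```

Every record lands in a category that existed beforehand, and a record that fits no category is an error rather than a discovery.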
Clustering: looking for previously unknown groups in data
[Figure: Clustering Analysis of Blood Groups by Percent of Population Each Can Donate To/From. Axes run from 0% to 100%; the horizontal axis is "Donate To". Points are labeled AB, A, B, and O.]
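The two quantities in the blood-group illustration can be computed roughly as follows. The compatibility table is the standard ABO rule (Rh factor ignored). The population shares are an assumption: made-up round numbers chosen only so the sketch runs, not measured data and not the values behind the original figure.

```python
# Standard ABO compatibility (Rh factor ignored): donor -> possible recipients.
CAN_DONATE_TO = {
    "O": {"O", "A", "B", "AB"},
    "A": {"A", "AB"},
    "B": {"B", "AB"},
    "AB": {"AB"},
}

# ASSUMPTION: illustrative round shares, not real population statistics.
POPULATION_SHARE = {"O": 0.45, "A": 0.40, "B": 0.10, "AB": 0.05}

def donate_to_share(blood_type):
    """Fraction of the population that can receive blood from this type."""
    return sum(POPULATION_SHARE[r] for r in CAN_DONATE_TO[blood_type])

def receive_from_share(blood_type):
    """Fraction of the population whose blood this type can receive."""
    return sum(
        POPULATION_SHARE[donor]
        for donor, recipients in CAN_DONATE_TO.items()
        if blood_type in recipients
    )

for blood_type in CAN_DONATE_TO:
    print(
        f"{blood_type}: donate to {donate_to_share(blood_type):.0%}, "
        f"receive from {receive_from_share(blood_type):.0%}"
    )
```

Whatever shares are plugged in, O always sits at 100% on the "donate to" axis and AB at 100% on the "receive from" axis, which matches the corners those labels occupy in the figure.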
M. Arumugam et al., Nature 000, 1-7 (2011), doi:10.1038/nature09944
Clustering analysis highlights the existence of distinct groups in data
Can be used when there is a lot of data but little knowledge of how to organize it
It can provide enough information about a subject to allow more interesting questions to be asked
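A minimal sketch of how a clustering algorithm finds groups without being told what they are. It uses plain k-means with a deterministic farthest-point start, chosen here for simplicity. It is not presented as the method used in the Nature paper, and the 2-D points are hypothetical values invented for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) with farthest-point initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far (no randomness involved).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return clusters

# Hypothetical, unlabeled measurements invented for illustration.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
          (9.0, 1.0), (9.2, 1.1), (8.9, 0.9)]
clusters = kmeans(points, k=3)
for cluster in clusters:
    print(sorted(cluster))
```

No labels are supplied anywhere: the three groups emerge from the distances between points alone. The number of groups `k` still has to be chosen by the analyst, which is one reason clustering tends to raise further questions rather than settle them.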
Classification analysis is grouping data into known groups
Clustering analysis is looking for unknown groups in data
Clustering analysis is most useful when not much is known about a subject