Theoretical and Practical Course
on Metagenomic Methods for the
Study of Complex Microbial
Communities
Organizing Committee:
Eduardo Santero
Manuel Ferrer
Ramón Rosselló
Juan Luis Ramos
esansan@upo.es
mferrer@icp.csic.es
rossello-mora@uib.es
juanluis.ramos@eez.csic.es
Universidad Pablo de Olavide, Sevilla
2-6 February 2009
INDEX
I. Lecture abstract
I.1. The gold era of metagenomics – Dr. Manuel Ferrer
I.2. Molecular Methods to construct environmental DNA libraries – Dr. Manuel Ferrer
I.3. High-throughput sequencing: applications and challenges – Dr. Julián Pérez
I.4. Biodiversity and biologically active molecules – Dr. Olga Genilloud
I.5. Bioinformatics applied to bacterial (meta)genomics – Dr. Javier Tamames
I.6. Revealing the identity of DNA fragments – Dr. Ramón Rosselló
II. Experimental procedures
II.1. DNA extraction and pLAFR3 shoulder preparation
Sample preparation
DNA extraction
Gel preparation
pLAFR3 shoulder preparation
II.2. DNA and 16S rRNA gene libraries production (1)
16S rRNA gene libraries construction
CopyControl fosmid library production
pLAFR3 cosmid library production
Lambda phage library production
II.3. DNA and 16S rRNA gene libraries production (2)
II.4. DNA library production (3)
II.5. DNA library production (4) and activity screens
III. In silico procedures
III.1. (Meta)genomics assembly methodologies – Dr. Giuseppe D’Auria
III.2. Phylogenetic reconstructions. An ARB software introduction – Dr. Pablo Yarza
III.3. Bioinformatics for metagenomics. A beginner's guide – Dr. Michael Richter
IV. Contacts. List of participants
V. Annexes. Kits Instruction Manuals
I. LECTURE ABSTRACT
I.1. The gold era of metagenomics
Dr. Manuel Ferrer & Ana Beloqui
CSIC – Instituto de Catálisis, Madrid, Spain
Metagenomics (also Environmental Genomics, Ecogenomics or Community Genomics) is an
emerging approach to studying microbial communities in the environment. This relatively new
technique enables the study of organisms that are not easily cultured in a laboratory, and so
differs from traditional microbiology, which relies on cultured organisms. Metagenomics
therefore holds the promise of a deeper understanding of microbes and, importantly, is a
new tool for addressing biotechnological problems without tedious cultivation efforts. DNA
sequencing technology has already made a significant breakthrough, and generating giga
base pairs of microbial DNA sequence no longer poses a challenge. However, conceptual
advances in microbial science will rely not only on the availability of innovative sequencing
platforms but also on sequence-independent tools for gaining insight into the functioning of
microbial communities. This is an important issue because even the best annotations of
genomes and metagenomes only create hypotheses about the functionality and substrate
spectra of proteins, which require experimental testing by classical disciplines such as
physiology and biochemistry. Here, we address the following question: how can we take
advantage of, and improve, metagenomic technology to meet the needs of microbial
biologists and enzymologists?
I.2. Molecular Methods to construct environmental DNA libraries
Dr. Manuel Ferrer & Ana Beloqui
CSIC – Instituto de Catálisis, Madrid, Spain
The recent emergence of “metagenomics” allows the analysis of microbial communities without
tedious cultivation efforts. The metagenomics approach is analogous to genomics, with the
difference that it does not deal with the single genome of a clone or microbe cultured or
characterized in the laboratory, but rather with that of the entire microbial community
present in an environmental sample: the community genome. Global understanding through
metagenomics depends essentially on the possibility of isolating the entire bulk DNA and
identifying the genomes, genes and proteins most relevant to each environmental sample
under investigation. Here, we provide a broad view of current technical issues to illustrate
the potential of obtaining appropriate metagenomic material to create representative gene
libraries, as the first step in analysing community genomes.
I.3. High-throughput sequencing: applications and challenges
Dr. Julián Pérez
Secugen, Madrid, Spain
The first nucleic acid sequenced, a gene of an RNA virus, was sequenced by Walter Fiers in
the early 1970s. Since then, nucleic acid sequencing technologies have improved
continuously. In the mid 1970s, chemical and enzymatic methods for DNA sequencing were
developed by Maxam and Gilbert and by Sanger, respectively. The enzymatic DNA
sequencing method has been the standard since then, with several improvements driven
mainly by the needs of the Human Genome Project. Although this technology is still in use,
new DNA sequencing technologies have been developed in recent years, with the main goal
of obtaining sequence more cheaply and quickly. These have been called Next Generation
Sequencing Technologies (NGSTs). The first NGST paper came out in 2005 with the
presentation of 454 pyrosequencing technology. Within a brief period more new
technologies have appeared, among them SOLiD, Solexa and Helicos. These
technologies have reduced the price of sequence more than tenfold compared with the
Sanger standard. The main characteristics of NGSTs will be described, as well as techniques
that could become the next NGSTs, such as nanopore sequencing. We will also focus on
NGST applications, such as genome sequencing, transcriptomics and metagenomics, and on
the future challenges of these rapidly evolving technologies.
I.4. Biodiversity and biologically active molecules
Dr. Olga Genilloud
From a historical perspective, microbial natural products have for decades been an
essential source of new drugs, with a positive impact on human quality of life. The
discovery and development of antibiotics that allow the control of infectious diseases, and
more recently the success of organ transplant therapies owing to the development of
immunosuppressors, are clear examples of their beneficial effects on society. Natural
products exhibit an amazing structural diversity, but from a synthetic point of view they
derive from a limited number of biosynthetic pathways that evolved to produce an immense
diversity of chemical structures. This diversity has translated into more than 150 natural
products or semi-synthetic derivatives on the market in recent decades.
One of the key factors in natural products programs is to guarantee the introduction
of the highest diversity of microbial groups as potential producers of novel bioactive
compounds. This continuous search for new microorganisms has required industrial
microbiologists to develop a large variety of isolation strategies that, in combination with
multiple characterization tools, have contributed substantially to our knowledge of this
microbial diversity.
Despite the large gap between the bacterial and fungal diversity described in the
environment and the estimates derived from metagenomic approaches, which suggest a
huge unexplored diversity, industrial screening methods still require traditional isolation
and culturing of microbial strains to exploit this diversity under artificial laboratory
conditions.
Focusing on the large groups of actinomycetes and fungi, the major producers of
bioactive molecules traditionally used in industrial screening, we will comment on the
importance of the isolation sources, the diversity of approaches applied to ensure their
isolation and taxonomic characterization, and the strategies used to explore and exploit
their biosynthetic potential and to promote the production of novel compounds. Finally, we
will present how these efforts can be aligned within a general strategy focused on the
detection and characterization of novel bioactive molecules.
I.5. Bioinformatics applied to bacterial (meta)genomics
Dr. Javier Tamames
Cavanilles Institute on Biodiversity and Evolutionary Biology, Valencia, Spain
Metagenomic sequencing generates vast amounts of DNA sequence that must be analysed
and annotated. This requires massive computational resources and also the adaptation of
existing bioinformatic techniques to the particular characteristics of this kind of data. We
will focus on the current state of bioinformatic developments for metagenomics, identifying
the main problems that still need to be solved in order to get the most out of the data.
I.6. Revealing the identity of DNA fragments
Dr. Ramón Rosselló
Marine Microbiology Group (MMG), IMEDEA, Esporles
The metagenomic approach applied to natural microbial communities has yielded important
information on the genetic potential of the organisms thriving in the environments studied.
However, one of the major drawbacks of the approach is the difficulty of establishing the
identity of the cloned DNA fragments. Molecular microbial ecology has long directed its
efforts towards describing a largely hidden diversity that was not accessible by classical
culturing techniques. Much of the effort has centred on the 16S rRNA gene, which carries a
phylogenetic signal that allows the identification of the organisms harbouring it. However,
other housekeeping genes also contain a signal that can be useful for identification.
Because a given genome contains only a few copies of the 16S rRNA gene, the probability
of finding one in a cloned fragment obtained by the metagenomic approach is very low. For
this reason, alternative genes may be selected to help in understanding the origin of the
DNA. When a phylogenetically valid gene is found, the putative identity of the organism can
normally be established. In most cases, however, DNA fragments do not contain any such
gene, and alternative approaches are needed to affiliate a DNA fragment with an existing
taxon.
During the talk, we will discuss what identity means when it is inferred from gene
sequences. Different genes with different phylogenetic signals will be discussed with
respect to their suitability for identification. In addition, alternative but less accurate
approaches, such as tetranucleotide signals, will be outlined in order to understand the
different levels at which a sequence can be assigned to an existing organism.
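As a hedged illustration of the tetranucleotide signals mentioned above, the Python sketch below counts overlapping 4-mers in a DNA fragment and converts them to relative frequencies. The function name and the example sequence are hypothetical, and real binning tools typically go further (for example, by comparing observed with expected frequencies), so this is only a possible first step of such an analysis, not the method presented in the lecture.

```python
from collections import Counter


def tetranucleotide_frequencies(sequence):
    """Return relative frequencies of overlapping 4-mers in a DNA sequence.

    Windows containing characters other than A, C, G or T are skipped.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical fragment, for illustration only.
    freqs = tetranucleotide_frequencies("ACGTACGTNACGT")
    for kmer, freq in sorted(freqs.items()):
        print(kmer, round(freq, 3))
```

Two fragments can then be compared by the similarity of their frequency profiles; the choice of distance measure is left open here.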
II. Experimental procedures
Day 1 (afternoon)
DNA extraction and pLAFR3 shoulder preparation
Material
Nycodenz (1.3 g ml-1)
Disruption buffer (0.2 M NaCl, 50 mM Tris-HCl pH 8)
PBS 1x buffer
TE buffer
Sample
Agarose 0.6-0.7% (w/v)
λ-HindIII marker
λ mono-cut marker
LB-agar-Amp50-XGal
HindIII, EcoRI, BamHI and buffers
Shrimp Alkaline Phosphatase
Microcon-100 (Millipore)
E. coli S17-3 (bearing pLAFR3 cosmid)
LBa and LBb
Large construct kit (Qiagen)
GeneClean Kit (BIO101)
Protocol 1 – sample preparation
[1] Prepare the sample suspension: to 40 g of sample add 140 ml of disruption buffer in a
Waring blender.
[2] Blend the suspension on a low speed setting for 3 x 1 min periods, cooling on ice for
1 min between blending periods.
[3] Centrifuge at low speed (approx. 200-400 g for 1-5 min) to eliminate large soil particles
and then use the supernatant for biomass separation via Nycodenz.
[4] Transfer 25 ml of the soil homogenate to an ultracentrifuge tube and carefully pipette
9-11 ml of Nycodenz (1.3 g ml-1) to form a layer below the homogenate.
[5] Centrifuge at 10000 g for 20-40 min at 4ºC, preferably in a swing-out rotor.
[6] A faint whitish band containing bacterial cells resolves at the interface between the
Nycodenz and the aqueous layer. Transfer this band into a sterile tube. Note that
soils sometimes contain many small particles that cannot be separated: they cover the
Nycodenz surface, forming a solid layer mixed with the microbial biomass (this problem is
typical of clay soils).
[7] Add approx. 35 ml of phosphate-buffered saline (PBS) and pellet the cells by
centrifugation at 10000 g for 20 min. The cell pellet, re-suspended in 0.5-2.0 ml of TE
buffer pH 8.0, is then ready for lysis and DNA extraction.
Protocol 2 – DNA extraction
[1] To the above cells, add 1.85 ml Cell Suspension Solution (use a 15 ml clear plastic
tube for efficient mixing). Mix until the solution appears homogeneous.
[2] Add 50 µl of RNase Mix and mix thoroughly. Add 100 µl of Cell Lysis/Denaturing Solution
and mix well.
[3] Incubate at 55°C for 15 minutes.
[4] Add 25 µl of Protease Mix and mix thoroughly.
[5] Incubate at 55°C for 30 to 120 minutes (the longer time results in minimal protein
carry-over and also substantially reduces residual protease activity).
[6] Add 500 µl of “Salt-Out” Mixture and mix gently yet thoroughly. Divide the sample into
1.5 ml tubes. Refrigerate at 4°C for 10 minutes.
[7] Spin for 10 minutes at maximum speed in a microcentrifuge (at least 10000 g).
Carefully collect the supernatant, avoiding the pellet. If a precipitate remains in the
supernatant, spin again until it is clear. Pool the supernatants in a 15 ml (or larger)
clear plastic tube.
[8] To this supernatant, add 2 ml of TE buffer and mix. Then add 8 ml of 100% ethanol. If
spooling the DNA, add the ethanol slowly and spool the DNA at the interface with a
clean glass rod. If centrifuging the DNA, add the ethanol and gently mix the solution
by inverting the tube.
[9] Spin for 15 minutes at 10000 g. Remove the excess ethanol by blotting or air-drying
the DNA.
[10] Dissolve the genomic DNA in TE buffer.
[11] Quantify the amount of nucleic acid.
[12] Run an aliquot (about 400 ng) together with markers in an agarose gel (0.7% w/v).
Protocol 3 – Gel preparation
[1] Prepare an agarose gel (0.7%).
[2] Run an aliquot (about 400 ng) together with markers.
[3] Run a 20 cm long 1% agarose gel overnight at 30-35 V and 4ºC.
Protocol 4 - pLAFR3 shoulders preparation
[1] Inoculate 200 ml of LB containing Tc (10 µg/ml) with a single colony of E. coli S17-3
(bearing the pLAFR3 cosmid) and grow it overnight with orbital shaking (250 rpm) at 30ºC.
Pellet the cells for 10 min at 7000 g and isolate the pLAFR3 cosmid with the Large
Construct Kit (Qiagen), treating the sample with ATP-dependent exonuclease to remove
chromosomal DNA so that only the cosmid remains.
[2] Take two aliquots of around 10 µg of pLAFR3 and cut one with HindIII (shoulder 1) and
the other with EcoRI (shoulder 2) at 37ºC for 1-2 hours. Run small aliquots in a 0.75%
agarose electrophoresis gel to check that the digestion worked properly. Then incubate
the samples at 65°C for 20 min to inactivate the restriction enzymes.
20 µl pLAFR3 vector (10 µg)
5 µl Buffer NEB2 10X
5 µl BSA 10X
19 µl MilliQ water
1 µl EcoRI 20 U/µl
Total reaction volume: 50 µl
20 µl pLAFR3 vector (10 µg)
5 µl Buffer NEB2 10X
5 µl BSA 10X
19 µl MilliQ water
1 µl HindIII 20 U/µl
Total reaction volume: 50 µl
[3] Add 3 µl of Shrimp Alkaline Phosphatase (SAP, from Biotec ASA) to dephosphorylate the
DNA and incubate for 1 h at 37°C. To avoid shearing the DNA, do not pipette; just gently
swirl the tube to mix. Then incubate the samples at 65°C for 20 min to inactivate the SAP.
[4] Mix the pLAFR3 shoulders at 1:1 and add 400 µl of water to wash the mixture in a
Microcon-100 (Millipore). Concentrate to a small volume (around 30-40 µl).
[5] To 37 µl of Microcon-concentrated DNA add 5 µl of 10X NEB3 buffer (New England
Biolabs Buffer 3), 5 µl of BSA 10X, 2 µl of MilliQ water and 1 µl of BamHI enzyme, and
digest overnight at 37ºC.
[6] Run small aliquots in a 0.75% agarose electrophoresis gel to check that the fragments
remain the same size (22 kb) as before BamHI digestion.
[7] Use the GeneClean Kit (BIO101) to inactivate BamHI and to concentrate the pLAFR3
shoulders, as described in steps 8-17.
[8] Add 150 µl of NaI solution.
[9] Add 5 µl of GLASSMILK (vortexed beforehand) and mix.
[10] Incubate at room temperature for 5 min and mix.
[11] Pellet the GLASSMILK with the bound DNA at 14000 g for 5 s and discard the supernatant.
[12] Add 500 µl of NEW Wash and resuspend.
[13] Centrifuge at 14000 g for 5 s and discard the supernatant.
[14] Repeat the washing step.
[15] Dry the pellet to remove residual EtOH.
[16] Add 50-100 µl of TE or water and mix.
[17] Centrifuge for 30 s and store the supernatant, which contains the ready-to-use pLAFR3 vector.
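The reaction mixes in this protocol all follow the same arithmetic: the fixed components are added and water makes up the remaining volume. A minimal Python sketch of that bookkeeping is shown below, using the volumes of the EcoRI/HindIII digest in step 2. The function name `water_to_final_volume` is hypothetical and is not part of any kit or library.

```python
def water_to_final_volume(component_volumes_ul, final_volume_ul):
    """Return the volume of water (µl) needed to reach the final reaction volume."""
    used = sum(component_volumes_ul.values())
    water = final_volume_ul - used
    if water < 0:
        raise ValueError("components exceed the final reaction volume")
    return water


# Volumes taken from the pLAFR3 digest in step 2 (water left out on purpose).
digest = {
    "pLAFR3 vector (10 µg)": 20.0,
    "Buffer NEB2 10X": 5.0,
    "BSA 10X": 5.0,
    "restriction enzyme (20 U/µl)": 1.0,
}

if __name__ == "__main__":
    print("MilliQ water (µl):", water_to_final_volume(digest, 50.0))
    # 1 µl of enzyme at 20 U/µl acting on 10 µg of vector
    print("Enzyme units per µg DNA:", 1.0 * 20.0 / 10.0)
```

The same helper can be applied to the BamHI digest in step 5 by passing its own component volumes.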
Day 2 (morning and afternoon)
DNA and 16S rRNA gene libraries production
Material
Samples
16S rRNA primer 16F530 (5’-TTCGTGCCAGCAGCCGCGG-3’)
16S rRNA primer 16R1492 (5'-TACGGYTACCTTGTTACGACTT-3')
pGEM-T Easy
T4 DNA ligase
pCC1FOS Epicentre (Cat. No. CCFOS110), pLAFR3 digested and ZAP Express vector
(Stratagene)
0.5 M EDTA pH 8.0 and TE buffer
Agarose 0.6-1.0% (w/v) (normal and low melting point)
λ-HindIII marker, λ mono-cut marker
LB-agar-Amp50-XGal
Sau3A and buffer
Microcon-100 (Millipore)
LBa and LBb and Tc 5-10 mg/ml
GELase (Epicentre)
Protocol 5 – 16S rRNA gene libraries construction
[1] Perform the PCR reaction (50 µl) with an annealing temperature of 50ºC and 25
cycles. Purify the PCR products from a 1% agarose gel and insert them into the
pGEM-T Easy vector (Promega) as follows:
Reaction 1: 1 µl pGEM-T Easy, 1 µl T4 DNA ligase buffer (10X), 0.5 µl T4 DNA ligase,
3.3 µl PCR product, 4.1 µl MilliQ water
Reaction 2: 1 µl pGEM-T Easy, 1 µl T4 DNA ligase buffer (10X), 0.5 µl T4 DNA ligase,
7.0 µl PCR product, 0.5 µl MilliQ water
[2] Ligate at 4ºC overnight.
Protocol 6 – CopyControl™ Fosmid Library Production
The CopyControl™ Fosmid Library Production kit (EPICENTRE) uses a strategy of cloning
randomly sheared, end-repaired DNA with an average insert size of 40 kb. Shearing the
DNA into fragments of approximately 40 kb generates highly random DNA fragments, in
contrast to the more biased libraries that result from partial restriction endonuclease
digestion of the DNA. Genomic DNA is frequently sheared sufficiently by the purification
process that additional shearing is not necessary. Test the extent of shearing by first
running a small amount of the DNA (around 100 ng) on a 20 cm long 1% agarose gel at
30-35 V overnight at 4ºC, and then stain the gel.
If 10% or more of the genomic DNA migrates with the Fosmid control DNA provided with
the kit (36 kb), you can proceed to the end-repair protocol. If the genomic DNA
migrates more slowly (higher MW) than the 36 kb fragment, the DNA needs to be sheared.
Shear the DNA (2.5 µg) by passing it through a 200 µl small-bore pipette tip: aspirate and
expel the DNA from the pipette tip 50-100 times. If the genomic DNA migrates faster than
the 36 kb fragment (lower MW), it has been sheared too much and should be re-isolated.
For the end-repair protocol, take the following suggestions into account:
End repair protocol
[1] Thaw and thoroughly mix all of the reagents listed below before dispensing; place on
ice. Combine the following on ice:
8 µl 10X End-Repair Buffer
8 µl 2.5 mM dNTP Mix
8 µl 10 mM ATP
32 µl sheared insert DNA (approximately 4.3 µg)*
20 µl sterile water
4 µl End-Repair Enzyme Mix
80 µl Total reaction volume
*The end-repair reaction can be scaled up or down as dictated by the amount of
DNA available.
[2] Incubate at room temperature for 45 minutes.
[3] Add gel loading buffer and incubate at 70ºC for 10 min to inactivate the End-Repair
Enzyme Mix.
[4] Select the size of the end-repaired DNA by low melting point (LMP) agarose gel
electrophoresis. Run the sample on a 20 cm long 1% agarose gel at 30-35 V overnight
at 4ºC. Do not stain the DNA with EtBr and do not expose it to UV. Use stained DNA
marker lanes as a ruler to cut out the agarose region containing the 25-60 kb DNA and
trim excess agarose.
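The note in step 1 says the end-repair reaction can be scaled up or down. One simple way to do that, sketched below in Python, is to multiply every component by the same factor so that all final concentrations stay unchanged. The function name `scale_reaction` is hypothetical, and proportional scaling is an assumption about how the note is meant to be applied.

```python
# Volumes (µl) of the 80 µl end-repair reaction listed in step 1.
END_REPAIR_80UL = {
    "10X End-Repair Buffer": 8.0,
    "2.5 mM dNTP Mix": 8.0,
    "10 mM ATP": 8.0,
    "sheared insert DNA": 32.0,
    "sterile water": 20.0,
    "End-Repair Enzyme Mix": 4.0,
}


def scale_reaction(recipe_ul, new_total_ul):
    """Scale every component proportionally so the mix sums to new_total_ul."""
    factor = new_total_ul / sum(recipe_ul.values())
    return {name: round(vol * factor, 2) for name, vol in recipe_ul.items()}


if __name__ == "__main__":
    # Example: halve the reaction when only half the DNA is available.
    for name, vol in scale_reaction(END_REPAIR_80UL, 40.0).items():
        print(f"{vol:5.1f} µl  {name}")
```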
Protocol 7 – pLAFR3 Cosmid Library Production
Since the discovery rate of novel proteins using traditional cultivation techniques has
decreased significantly in recent years, many different expression hosts, apart from the
usual E. coli systems, are currently used for cloning DNA fragments. Of particular interest
is the mining and further reconstitution of natural product biosynthetic pathways, where
large multienzyme assemblies must be functionally expressed and where the choice of a
suitable heterologous host is critical. For this purpose, broad-host-range vectors that
replicate in different Gram-negative species have been proposed, such as the pLAFR3
vector, which is able to replicate in Pseudomonas host strains (30). To this end, we are
going to prepare metagenomic libraries with the pLAFR3 vector, which allows the cloning of
around 23 kb of insert DNA in expression hosts of the Pseudomonas genus.
Partial Sau3AI digestion of DNA insert for pLAFR3 cloning.
To obtain partially Sau3AI-digested DNA fragments of 25-50 kb, it is recommended to run
some pilot reactions using different amounts of enzyme. Set up a series of reactions.
[1] Prepare enzyme dilutions (enzyme stock at 10 U/µl) in 1X reaction buffer: 1/10, 1/20,
1/50, 1/100 and 1/200.
[2] Do a trial digestion for 30 min at 37ºC.
2 µl DNA (1 µg)
1 µl Buffer 10X
1 µl BSA 10X
5 µl MilliQ water
1 µl Sau3AI (diluted)
Total reaction volume: 10 µl
[3] Then add 1.5 µl of 0.5 M EDTA pH 8.0 and heat at 65ºC for 20 min.
[4] Then run a 20 cm long 0.7-1% agarose gel and stain. Use the partial digestion
conditions that result in the majority of the DNA migrating in the desired size range
(25-50 kb).
[5] Make a scale-up reaction. Scale up the amount of Sau3AI enzyme for about 10 µg of DNA.
You should choose 2 different restriction conditions, as in the following example:
Reaction 1                                 Reaction 2
20 µl concentrated insert DNA (10 µg)      20 µl concentrated insert DNA (10 µg)
5 µl Ligation Buffer NEB1 10X              7 µl Ligation Buffer NEB1 10X
5 µl BSA 10X                               7 µl BSA 10X
X µl MilliQ water                          X µl MilliQ water
X µl Sau3AI diluted                        X µl Sau3AI diluted
Total reaction volume: 50 µl               Total reaction volume: 50 µl
[6] Incubate for 20 min at 37ºC.
[7] Stop the reactions by adding 7.5 µl of 0.5 M EDTA pH 8 and heating the samples at 65ºC
for 15 min.
[8] Mix both reactions and load the samples on a 20 cm long preparative 1% low melting
point (LMP) agarose gel, and run it at 30-35 V overnight at 4ºC. Cut off and stain the
lanes containing the DNA marker; do not stain the part of the gel containing your DNA
for cloning. Under UV light, cut out the part of the marker lanes in the range of
ca. 20 kbp and use it as a guide to excise the gel region containing the environmental
DNA.
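To make the pilot series in steps 1 and 2 concrete, the following Python sketch converts each dilution of the 10 U/µl enzyme stock into units of enzyme per µg of DNA, assuming 1 µl of diluted enzyme is added to 1 µg of DNA as in the trial digestion. The function name is hypothetical and serves only as an illustration of the arithmetic.

```python
def units_per_ug(stock_u_per_ul, dilution_factor, enzyme_volume_ul, dna_ug):
    """Enzyme units per µg of DNA for a given dilution of the enzyme stock."""
    diluted_u_per_ul = stock_u_per_ul / dilution_factor
    return diluted_u_per_ul * enzyme_volume_ul / dna_ug


if __name__ == "__main__":
    # Dilutions from step 1; 1 µl of diluted enzyme on 1 µg of DNA as in step 2.
    for factor in (10, 20, 50, 100, 200):
        u = units_per_ug(10.0, factor, 1.0, 1.0)
        print(f"1/{factor}: {u:g} U per µg DNA")
```

The series therefore spans a 20-fold range of enzyme-to-DNA ratios, which is what allows the best partial-digestion condition to be picked from the gel.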
Protocol 8 – Lambda phage Library Production
Small-insert expression libraries, especially those made in lambda phage vectors, are
constructed specifically for activity screens; however, in contrast to cosmid or fosmid
vectors, the ZAP Express pBK vector (Stratagene) allows cloning of only up to 15 kbp
(optimally about 8.5-9.5 kbp).
Partial Sau3AI digestion of DNA insert for cloning in ZAP Express vector.
To obtain partially Sau3AI-digested DNA fragments of about 8.5-9.5 kbp, it is
recommended to run some trial reactions using different amounts of enzyme. Set up a series
of reactions ranging, for example, from 0.1 to 0.04 U of enzyme per 1 µg of DNA:
[1] Prepare enzyme dilutions (enzyme stock at 10 U/µl) in 1X reaction buffer: 1/10, 1/20,
1/50, 1/100 and 1/200.
[2] Do a trial digestion for 30 min at 37ºC.
2 µl DNA (1 µg)
1 µl Buffer 10X
1 µl BSA 10X
5 µl MilliQ water
1 µl Sau3AI (diluted)
Total reaction volume: 10 µl
[3] Incubate 20 min at 37ºC.
[4] Stop the reactions by adding 1.5 µL of 0.5 M EDTA pH 8 and heating the samples at 65ºC
for 15 min.
[5] Then run a 20 cm long 1% agarose gel and stain. Use the partial digestion conditions
that result in the majority of the DNA migrating in the desired size range (5-15 kb). For
the partial digestion of the DNA, scale up the amount of Sau3AI enzyme for at
least 2-10 µg of DNA. Select the two best restriction conditions and scale them up, as in
the following example:
Reaction 1                                 Reaction 2
20 µl concentrated insert DNA (10 µg)      20 µl concentrated insert DNA (10 µg)
5 µl Ligation Buffer NEB1 10X              7 µl Ligation Buffer NEB1 10X
5 µl BSA 10X                               7 µl BSA 10X
X µl MilliQ water                          X µl MilliQ water
X µl Sau3AI diluted                        X µl Sau3AI diluted
Total reaction volume: 50 µl               Total reaction volume: 50 µl
[6] Incubate for 20 min at 37ºC.
[7] Stop the reactions by adding 7.5 µl of 0.5 M EDTA pH 8 and heating the samples at 65ºC
for 15 min.
[8] Mix both reactions and load the samples on a 20 cm long preparative 1% low melting
point (LMP) agarose gel, and run it at 30-35 V overnight at 4ºC. Cut off and stain the
lanes containing the DNA marker; do not stain the part of the gel containing your DNA
for cloning. Under UV light, cut out the part of the marker lanes in the range of
ca. 20 kbp and use it as a guide to excise the gel region containing the environmental
DNA.
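The "X µl" entries in the scale-up example depend on which trial condition worked best. A hedged Python sketch of how they might be filled in is given below: the enzyme volume follows from the chosen units per µg of DNA and the concentration of the diluted enzyme, and water makes up the rest. The function name, the chosen condition (0.1 U/µg) and the 1/10 dilution in the example are illustrative assumptions, not values prescribed by the protocol.

```python
def scale_up_digest(dna_ug, units_per_ug, diluted_enzyme_u_per_ul,
                    dna_volume_ul, buffer_ul, bsa_ul, total_ul):
    """Return (enzyme µl, water µl) for a scaled-up partial digestion."""
    enzyme_ul = dna_ug * units_per_ug / diluted_enzyme_u_per_ul
    water_ul = total_ul - (dna_volume_ul + buffer_ul + bsa_ul + enzyme_ul)
    if water_ul < 0:
        raise ValueError("no room for water: use a more concentrated enzyme dilution")
    return enzyme_ul, water_ul


if __name__ == "__main__":
    # Assumed example: 10 µg DNA in 20 µl, 0.1 U/µg, enzyme diluted 1/10 (1 U/µl),
    # with the Reaction 1 buffer and BSA volumes and a 50 µl total.
    enzyme, water = scale_up_digest(10.0, 0.1, 1.0, 20.0, 5.0, 5.0, 50.0)
    print(f"Sau3AI diluted: {enzyme:.1f} µl, MilliQ water: {water:.1f} µl")
```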
Day 3 (morning)
DNA and 16S rRNA gene libraries production
Material
T4 DNA ligase
pCC1FOS Epicentre (Cat. No. CCFOS110)
0.5 M EDTA pH 8.0
Agarose 0.6-1.0% (w/v) (normal and low melting point)
TE buffer
Agarose
λ-HindIII marker
λ mono-cut marker
LB-agar-Amp50-XGal
Sau3A and buffer
Microcon-100 (Millipore)
LBa and LBb
Tc 5-10 mg/ml in ethanol
GELase (Epicentre)
pLAFR3 digested
ZAP Express vector (Stratagene)
E. coli XL1 MRF’
E. coli EPI300
E. coli DH5α
MgSO4 1 M and MgSO4 10 mM
Protocol 9 – 16S rRNA gene libraries construction (cont. protocol 5)
[1] Use the product of this ligation (2 µl) to transform 50 µl of competent E. coli DH5α
cells.
[2] Plate the cells on LB-agar-Amp50-XGal plates and incubate at 37ºC overnight.
[3] Sequence around 100 randomly selected positive clones (white colonies) using the
M13f primer.
Protocol 10 – CopyControl™ Fosmid Library Production (cont. protocol 6)
DNA fragment size selection
[1] Once the gel has run overnight, proceed to the agarose gel-digesting assay using the
“GELase (EPICENTRE) Agarose Gel-Digesting protocol” described in the steps below. Cut
out the area > 20-30 kb.
[2] Thoroughly melt the gel slice by incubating at 70ºC for 3 min for each 200 mg of gel.
[3] Transfer the molten agarose immediately to 45ºC and equilibrate for 2 min for each
200 mg of gel.
[4] Add 4 µl of 50X GELase buffer for each 200 mg of agarose.
[5] Add 2 µl of GELase and incubate for 1-4 h at 45ºC.
[6] Centrifuge the tubes in a microcentrifuge at maximum speed (15000 g) for 15 min at
4ºC to pellet any insoluble oligosaccharides. Carefully transfer the upper 90-95% of
the supernatant, which contains the DNA, to a sterile 1.5 ml tube, taking care to avoid
the gelatinous pellet.
[7] Concentrate the DNA in a Microcon-100 (Millipore) concentrator membrane (100 kDa
cut-off) at 4ºC to a final volume of 20-50 µl. Be sure to cut the end off the yellow tip
before transferring the supernatant.
[8] Then add 450 µl of sterile water and concentrate again to 20-50 µl. This concentrated DNA
is the insert to ligate to the pCC1FOS vector.
[9] Quantify the amount of nucleic acid. The DNA concentration should be not less than 75
ng/µl (a total of 3.75 µg in 50 µl).
[10] Run an aliquot (about 400 ng) together with markers in an agarose gel (0.7% w/v).
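Steps 2 to 4 are specified per 200 mg of gel, and step 9 quotes a total DNA mass derived from concentration and volume. The Python sketch below shows one way to scale these quantities to the actual gel slice; the function names are hypothetical, and linear scaling with gel mass is an assumption based on the "for each 200 mg" wording.

```python
def gelase_plan(gel_mass_mg):
    """Scale the per-200-mg GELase steps to the mass of the excised gel slice."""
    units_of_200mg = gel_mass_mg / 200.0
    return {
        "melt_at_70C_min": 3.0 * units_of_200mg,
        "equilibrate_at_45C_min": 2.0 * units_of_200mg,
        "gelase_buffer_50x_ul": 4.0 * units_of_200mg,
    }


def total_dna_ug(conc_ng_per_ul, volume_ul):
    """Total DNA mass in µg from a concentration (ng/µl) and a volume (µl)."""
    return conc_ng_per_ul * volume_ul / 1000.0


if __name__ == "__main__":
    print(gelase_plan(500.0))        # e.g. a 500 mg gel slice
    print(total_dna_ug(75.0, 50.0))  # the minimum quoted in step 9
```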
Protocol 11 – pLAFR3 Cosmid Library Production (cont. protocol 7)
DNA fragment size selection
[1] Once the gel has run overnight, proceed to the agarose gel-digesting assay using the
“GELase (EPICENTRE) Agarose Gel-Digesting protocol” described in the steps below. Cut
out the area > 20 kb*. *Check against the undigested control that the DNA is no longer
intact but forms a smear, with the major fraction running above 10-15 kbp. Take the
region from 20 kb upwards. The initial DNA will not exceed 30-40 kb anyway, so take
everything above 20 kb.
[2] Thoroughly melt the gel slice by incubating at 70ºC for 3 min for each 200 mg of gel.
[3] Transfer the molten agarose immediately to 45ºC and equilibrate for 2 min for each
200 mg of gel.
[4] Add 4 µl of 50X GELase buffer for each 200 mg of agarose.
[5] Add 2 µl of GELase and incubate for 1-4 h at 45ºC.
[6] Centrifuge the tubes in a microcentrifuge at maximum speed (15000 g) for 15 min at
4ºC to pellet any insoluble oligosaccharides. Carefully transfer the upper 90-95% of
the supernatant, which contains the DNA, to a sterile 1.5 ml tube, taking care to avoid
the gelatinous pellet.
[7] Concentrate the DNA in a Microcon-100 (Millipore) concentrator membrane (100 kDa
cut-off) at 4ºC to a final volume of 20-50 µl. Be sure to cut the end off the yellow tip
before transferring the supernatant.
[8] Then add 450 µl of sterile water and concentrate again to 20-50 µl. This concentrated DNA
is the insert to ligate to the pLAFR3 vector.
[9] Quantify the amount of nucleic acid. The DNA concentration should be not less than 75
ng/µl (a total of 3.75 µg in 50 µl).
[10] Run an aliquot (about 400 ng) together with markers in an agarose gel (0.7% w/v).
[11] Ligate the partially Sau3AI-digested DNA and the pLAFR3 shoulders overnight at 14°C in a
ratio of 1:2 or 1:1. The ligation volume must be as low as possible (5-10 µl). If you take
100 ng of both shoulders together, then add 50 or 100 ng of the insert (you may do
two separate ligations and see which works better). It is highly recommended to run
small aliquots (for example 1 µl) of all your samples after any manipulation, and after
ligation.
Reaction 1: 1 µl pLAFR3, 1 µl T4 DNA ligase buffer (10X), 0.5 µl T4 DNA ligase, X µl DNA
fragment, X µl MilliQ water.
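The two "X" volumes in Reaction 1 follow from the insert mass chosen in step 11 and the measured insert concentration. A minimal Python sketch is given below; the function name is hypothetical, and the 10 µl total volume is an assumption (it is the volume implied by 1 µl of 10X ligase buffer and lies within the 5-10 µl range stated above).

```python
def ligation_mix(insert_ng, insert_conc_ng_per_ul, total_ul=10.0,
                 vector_ul=1.0, buffer_10x_ul=1.0, ligase_ul=0.5):
    """Return insert and water volumes (µl) for the pLAFR3 ligation."""
    insert_ul = insert_ng / insert_conc_ng_per_ul
    water_ul = total_ul - (vector_ul + buffer_10x_ul + ligase_ul + insert_ul)
    if water_ul < 0:
        raise ValueError("insert too dilute for this reaction volume")
    return {"insert_ul": insert_ul, "water_ul": water_ul}


if __name__ == "__main__":
    # The two insert amounts suggested in step 11, with the insert at 75 ng/µl.
    for insert_ng in (50.0, 100.0):
        mix = ligation_mix(insert_ng, 75.0)
        print(f"{insert_ng:.0f} ng insert: {mix['insert_ul']:.2f} µl DNA, "
              f"{mix['water_ul']:.2f} µl water")
```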
Protocol 12 – Lambda phage Library Production (continuation of protocol 8)
DNA fragment size selection
[1] Once the gel has run overnight, proceed to the agarose gel-digesting assay using the
“GELase (EPICENTRE) Agarose Gel-Digesting protocol” described in the steps below. Cut
out the area < 15 kb.
[2] Thoroughly melt the gel slice by incubating at 70ºC for 3 min for each 200 mg of gel.
[3] Transfer the molten agarose immediately to 45ºC and equilibrate for 2 min for each
200 mg of gel.
[4] Add 4 µl of 50X GELase buffer for each 200 mg of agarose.
[5] Add 2 µl of GELase and incubate for 1-4 h at 45ºC.
[6] Centrifuge the tubes in a microcentrifuge at maximum speed (15000 g) for 15 min at
4ºC to pellet any insoluble oligosaccharides. Carefully transfer the upper 90-95% of
the supernatant, which contains the DNA, to a sterile 1.5 ml tube, taking care to avoid
the gelatinous pellet.
[7] Concentrate the DNA in a Microcon-100 (Millipore) concentrator membrane (100 kDa
cut-off) at 4ºC to a final volume of 20-50 µl. Be sure to cut the end off the yellow tip
before transferring the supernatant.
[8] Then add 450 µl of sterile water and concentrate again to 20-50 µl. This concentrated DNA
is the insert to ligate to the lambda vector.
[9] Quantify the amount of nucleic acid. The DNA concentration should be not less than 75
ng/µl (a total of 3.75 µg in 50 µl).
[10] Run an aliquot (about 400 ng) together with markers in an agarose gel (0.7% w/v).
[11] Ligate the partially Sau3AI-digested DNA and pBK-CMV overnight at 14°C, using the
following ligation conditions (the final volume should not exceed 5.0-5.5 µL):
1 µL ZAP Express vector
0.6 µL T4 ligase buffer (10X)
4 µL concentrated insert
0.6 µL T4 DNA ligase
[12] Inoculate 50 ml of LB, supplemented with 10 mM MgSO4 and 0.2% (w/v) maltose,
with a single colony of E. coli XL1 MRF’.
[13] Grow overnight at 30°C, shaking at 200 rpm.
Day 4 (morning)
DNA gene library production
Material
pCC1FOS Epicentre (Cat. No. CCFOS110)
Agarose 0.6-1.0% (w/v) (normal and low melting point)
Microcon-100 (Millipore)
LBa and LBb, NZYa and NZYb
E. coli XL1 MRF’, E. coli EPI300, E. coli DH5α
MgSO4 1 M and MgSO4 10 mM
SM buffer
Chloroform
Tc 5-10 mg/ml and Cm 50 mg/ml
Protocol 13 – CopyControl™ Fosmid Library Production (cont. protocol 10)
Ligation reaction in the pCC1FOS fosmid vector.
A single ligation reaction will produce 10 3-106 clones depending on the quality of the insert
DNA. Based on this information calculate the number of ligation reactions that you will need
to perform. The ligation reaction can be scaled-up as needed. A 10:1 molar ratio of
pCC1FOS vector to insert DNA is optimal. If we use 0.5 g of 100 Kb DNA insert we need
around 0.5 g of vector.
[1] Combine the following reagents in the order listed and mix thoroughly after each
addition.
1 µl 10X Fast-Link Ligation Buffer
1 µl pCC1FOS (0.5 µg/µl)
1 µl 10 mM ATP
6.8 µl concentrated insert DNA (75 ng/µl)
0.2 µl MilliQ water
1 µl Fast-Link DNA Ligase
10 µl Total reaction volume
[2] Incubate at room temperature for 2 hours and then transfer the reaction to 70ºC for 10
minutes to inactivate the Fast-Link DNA Ligase.
Packaging reaction in the pCC1FOS fosmid vector.
[1] Thaw, on ice, 1 tube of the MaxPlax Lambda Packaging Extracts for every ligation
reaction performed in the above step.
[2] When thawed, immediately transfer 25 µl (one-half) of each packaging extract to a
second 1.5 ml microfuge tube and place on ice.
[3] Add 10 µl of the ligation reaction to each 25 µl of the thawed extracts being held on
ice.
[4] Mix by pipetting the solutions several times. Avoid the introduction of air bubbles.
Briefly centrifuge the tubes to get all liquid to the bottom.
[5] Incubate the packaging reactions at 30ºC for 90 minutes. After the 90 minute
packaging reaction is complete, add the remaining 25 µl of MaxPlax Lambda Packaging
Extract to each tube.
[6] Incubate the reactions for an additional 90 minutes at 30ºC.
[7] At the end of the second 90 minute incubation, add Phage Dilution buffer (PD buffer:
10 mM Tris-HCl pH 8.3, 100 mM NaCl, 10 mM MgCl2) to 1 ml final volume in each tube
and mix gently. Add 25 µl of chloroform to each. Mix gently and store at 4ºC (up to a
month). A viscous precipitate may form after addition of the chloroform. This
precipitate will not interfere with library production. Determine the titer of the phage
particles (packaged fosmid clones) and then plate the fosmid library. See next day.
[8] On the day of the packaging reactions, inoculate 50 ml of LB broth + 10 mM MgSO4 with
5 ml of the EPI300-T1R overnight culture. Shake at 37ºC to an OD600nm = 0.8-1.0.
Store the cells at 4ºC until needed (Titering). The cells may be stored for up to 72
hours at 4ºC if necessary.
Protocol 14 – pLAFR3 Cosmid Library Production (cont. protocol 11)
Packaging Protocol
[1] Remove the appropriate number of packaging extracts from a –80°C freezer and place
the extracts on dry ice.
[2] Quickly thaw the packaging extract by holding the tube between your fingers until the
contents of the tube just begin to thaw.
[3] Add the experimental DNA immediately (1–4 μl containing 0.1–1.0 μg of ligated DNA)
to the packaging extract.
[4] Stir the tube with a pipet tip to mix well. Gentle pipetting is allowable provided that air
bubbles are not introduced.
[5] Spin the tube quickly (for 3–5 seconds), if desired, to ensure that all contents are at
the bottom of the tube.
[6] Incubate the tube at room temperature (22°C) for 2 hours.
[7] Add 500 μl of SM buffer (50 mM Tris-HCl pH 7.5, 0.1 M NaCl, 8.5 mM MgSO4 and
0.01% (w/v) gelatin) to the tube. The gelatin in SM buffer stabilizes lambda phage
particles during storage.
[8] Add 20 μl of chloroform and mix the contents of the tube gently.
[9] Spin the tube briefly to sediment the debris.
[10] The supernatant containing the phage is ready for titering. The supernatant may be
stored at 4°C for up to 1 month.
[11] Streak the bacterial glycerol stock (E. coli DH5α or XL1-Blue) onto the LB agar plates.
Incubate the plates overnight at 37°C. Do not add antibiotic to the medium in the
following step. The antibiotic will bind to the bacterial cell wall and will inhibit the
ability of the phage to infect the cell.
[12] Inoculate 50 ml of LB, supplemented with 10 mM MgSO4 and 0.2% (w/v) maltose,
with a single colony.
[13] Grow overnight at 30°C, shaking at 200 rpm.
Protocol 15 – Lambda phage Library Production (cont. protocol 12)
Packaging Protocol
[1] Remove the appropriate number of packaging extracts from a –80°C freezer and place
the extracts on dry ice.
[2] Quickly thaw the packaging extract by holding the tube between your fingers until the
contents of the tube just begin to thaw.
[3] Add the experimental DNA immediately (1–4 μl containing 0.1–1.0 μg of ligated DNA)
to the packaging extract.
[4] Stir the tube with a pipet tip to mix well. Gentle pipetting is allowable provided that air
bubbles are not introduced.
[5] Spin the tube quickly (for 3–5 seconds), if desired, to ensure that all contents are at
the bottom of the tube.
[6] Incubate the tube at room temperature (22°C) for 2 hours.
[7] Add 500 μl of SM buffer (50 mM Tris-HCl pH 7.5, 0.1 M NaCl, 8.5 mM MgSO4 and
0.01% (w/v) gelatin) to the tube. The gelatin in SM buffer stabilizes lambda phage
particles during storage.
[8] Add 20 μl of chloroform and mix the contents of the tube gently.
[9] Spin the tube briefly to sediment the debris.
[10] The supernatant containing the phage is ready for titering. The supernatant may be
stored at 4°C for up to 1 month.
[11] Inoculate 50 ml of LB, supplemented with 10 mM MgSO4 and 0.2% (w/v) maltose,
with a single colony of E. coli XL1 MRF’.
[12] Grow overnight at 30°C, shaking at 200 rpm.
II. Experimental procedures
Day 5
Activity screens
Protocol 16 – CopyControl™ Fosmid Library Production (cont. protocol 13)
Titering the Packaged Fosmid Clones. Before plating the library we recommend that the titer
of packaged fosmid clones be determined. This will aid in determining the number of plates
and dilutions to make to obtain a library that meets the needs of the user.
[1] Make serial dilutions of the 1 ml of packaged phage particles into PD buffer in sterile
microfuge tubes. For example, use dilutions 1:10^1, 1:10^2, 1:10^4 and 1:10^5.
[2] Add 10 µl of each of the above dilutions, individually, to 100 µl of the prepared EPI300-T1R
host cells. Incubate each for 20 minutes at 37ºC.
[3] Spread the infected EPI300-T1R cells on an LB plate plus 12.5 µg/ml chloramphenicol
and incubate at 37ºC overnight to select for the fosmid clones.
[4] Count the colonies and calculate the titer of the packaged clones in cfu/ml (where cfu
represents colony-forming units) as follows:
titer = (# of colonies) × (dilution factor) × (1000 µl/ml) / (volume of phage plated [µl])
For example, if there were 200 colonies on the plate spread with the 1:10^4 dilution, the
titer of this reaction would be:
(200 cfu) × (10^4) × (1000 µl/ml) / (10 µl) = 2 × 10^8 cfu/ml
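The titer formula above translates directly into code. The following is a minimal sketch with a hypothetical function name; it simply reproduces the worked example from the protocol.

```python
def titer_per_ml(colonies: int, dilution_factor: float,
                 plated_volume_ul: float) -> float:
    """Titer (cfu/ml or pfu/ml) of the undiluted packaging reaction.

    colonies          -- colonies (or plaques) counted on the plate
    dilution_factor   -- e.g. 1e4 for a 1:10^4 dilution
    plated_volume_ul  -- volume of diluted phage added to the cells, in µl
    """
    if plated_volume_ul <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor * 1000.0 / plated_volume_ul


if __name__ == "__main__":
    # Worked example from the text: 200 colonies, 1:10^4 dilution, 10 µl plated.
    print(f"{titer_per_ml(200, 1e4, 10):.1e} cfu/ml")
```

The same function applies to plaque counts in the lambda protocols, provided the dilution factor and plated volume used there are passed in.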
Based on the titer of the phage particles determined above, dilute the phage particles
with PD buffer to obtain the desired number of clones and clone density on the plate. Mix
the diluted phage particles with EPI300-T1R cells in the ratio of 100 µl of cells
(prepared as above) for every 10 µl of diluted phage particles. Spread the infected bacteria
on an LB plate plus 12.5 µg/ml chloramphenicol and incubate at 37ºC overnight to select for
the fosmid clones. Subsequently these clones are transferred, with the help of a colony-picker
robot, to 384-well plates (LB, 12.5 µg/ml chloramphenicol and 15% glycerol). Plates are
incubated overnight without shaking at 37ºC. The colony-picker robot is again used to
produce copies of the 384-well plates.
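Choosing the dilution for plating is the titer calculation run in reverse. The sketch below assumes that 10 µl of diluted phage is plated, as in the text; the target colony number is a choice made by the user, and the function name is hypothetical.

```python
def fold_dilution_for_plate(titer_per_ml: float, target_colonies: float,
                            plated_volume_ul: float = 10.0) -> float:
    """Fold dilution of the packaged phage so that plated_volume_ul of the
    diluted stock yields approximately target_colonies on one plate."""
    if target_colonies <= 0 or plated_volume_ul <= 0:
        raise ValueError("target and volume must be positive")
    clones_in_undiluted_volume = titer_per_ml * plated_volume_ul / 1000.0
    return clones_in_undiluted_volume / target_colonies


if __name__ == "__main__":
    # With a titer of 2 x 10^8 cfu/ml, aim for about 200 colonies per plate.
    print(f"dilute 1:{fold_dilution_for_plate(2e8, 200):.0f}")
```

A result below 1 means the stock is too dilute for the chosen colony density, and a larger volume would have to be plated instead.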
Protocol 17 – pLAFR3 Cosmid Library Production (cont. protocol 14)
Titering the cosmid packaging reaction
[1] Pellet the bacteria at 500 × g for 10 minutes.
[2] Gently resuspend the cells in half the original volume with sterile 10 mM MgSO4.
[3] Dilute the cells to an OD600 of 0.5 with sterile 10 mM MgSO4. The bacteria should be
used immediately following dilution.
[4] Prepare a 1:10 and a 1:50 dilution of the cosmid packaging reaction in SM buffer.
[5] Mix 25 μl of each dilution with 25 μl of the appropriate bacterial cells at an OD600 of 0.5
in a microcentrifuge tube and incubate the tube at room temperature for 30 minutes.
[6] Add 200 μl of LB broth to each sample and incubate for 1 hour at 37°C, shaking the
tube gently once every 15 minutes. This incubation will allow time for expression of
the antibiotic resistance.
[7] Spin the microcentrifuge tube for 1 minute and resuspend the pellet in 50 μl of fresh
LB broth.
[8] Using a sterile spreader, plate the cells on LB agar plus 10 µg/ml tetracycline and
incubate the plates overnight at 37°C to select for the cosmid clones.
[9] Count the colonies and calculate the titer of the packaged phage particles as described
above.
Based on the titer of the phage particles, dilute the phage particles with SM buffer to
obtain the desired number of clones and clone density on the plate. Mix the diluted phage
particles with E. coli DH5α or XL1-Blue cells in the ratio of 100 µl of cells for every
10 µl of diluted phage particles. Spread the infected bacteria on LB agar plates containing
tetracycline 10 µg/ml and X-Gal 40 µg/ml and incubate at 37ºC overnight to select for the
cosmid clones. Subsequently these clones are transferred, with the help of a colony-picker
robot, to 384-well plates (LB, tetracycline 10 µg/ml, and 15% glycerol). Plates are incubated
overnight without shaking at 37ºC. The colony-picker robot is again used to produce copies
of the 384-well plates.
Protocol 18 – Lambda phage Library Production (cont. protocol 15)
Titering the lambda packaging reaction
[1] Pellet the bacteria at 500 × g for 10 minutes.
[2] Gently resuspend the cells in half the original volume with sterile 10 mM MgSO4.
[3] Dilute the cells to an OD600 of 0.5 with sterile 10 mM MgSO4. The bacteria should be
used immediately following dilution.
[4] Prepare serial 1:10 dilutions of the packaging reaction, from 1:1 to 1:10^5, in SM buffer.
[5] Mix 1 μl of each dilution with 200 μl of the appropriate bacterial cells at an OD600 of 0.5
in a microcentrifuge tube and incubate the tube at 37ºC for 15 minutes, shaking the
tube gently.
[6] Add 500 μl of NZY soft agar to each sample and plate on NZY agar plates. Incubate the
plates overnight at 37°C.
[7] Count the plaques and calculate the titer of the packaged phage particles as
described above.
After titering, which is used to calculate the library size, the library is further amplified.
Amplification can be performed either in liquid medium or on agar plates. For amplification in
liquid culture use the following protocol:
[1] Mix 2 mL of a fresh, overnight bacterial culture (OD600 0.95) with approximately 10^6
pfu of bacteriophage in a sterile culture tube.
[2] Incubate for 15 minutes at 37ºC to allow the bacteriophage particles to adsorb.
[3] Add 8 mL of pre-warmed LB medium (or NZY) and incubate with vigorous shaking until
lysis occurs (6-12 h at 37ºC).
[4] After lysis has occurred, add 2 drops of chloroform and continue incubation for 15
minutes at 37ºC.
[5] Centrifuge at 4,000 × g for 10 minutes at 4ºC.
[6] Recover the supernatant, add 1 drop of chloroform, and store at 4ºC. The titer of the
stock should be approximately 10^10 pfu/mL, and this usually remains unchanged as
long as the stock is stored at 4ºC.
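The first step of the liquid amplification asks for roughly 10^6 pfu, so the volume of packaged library to add depends on the titer measured earlier. A minimal sketch follows; the titer used in the example is a hypothetical value, and the function name is illustrative only.

```python
def stock_volume_ul(target_pfu: float, titer_pfu_per_ml: float) -> float:
    """Volume of phage stock (µl) that contains target_pfu particles."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return target_pfu / titer_pfu_per_ml * 1000.0


if __name__ == "__main__":
    # Hypothetical library titer of 2 x 10^8 pfu/ml; 10^6 pfu are needed.
    print(f"{stock_volume_ul(1e6, 2e8):.1f} µl of packaged library")
```

If the computed volume is too small to pipette accurately, diluting the stock in SM buffer first and scaling the volume accordingly is the usual workaround.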
For amplification on solid agar, E. coli XL1 MRF’ cells are prepared as described above, in
10 mM MgSO4 at an OD600 of 0.5. Then proceed as follows:
[1] Prepare two aliquots, each containing approximately 5 × 10^4 pfu and 600
µL of E. coli cells. Do not exceed 300 µL of phage solution per 600 µL of cells.
[2] Incubate for 15 minutes at 37ºC with gentle shaking, then add 3 mL of NZY broth
and spread the mixture over NZY agar plates (20 x 20 cm) pre-warmed at 37ºC.
[3] Incubate the plates at 37°C for about 8-10 h, then add 8-10 mL of SM buffer
and shake the plates gently (50 rpm) for an additional 10 h at 4ºC.
[4] Decant the buffer into a Falcon tube. Add two additional mL of SM buffer
to the agar and combine them with the previous solution.
[5] Add 5% (v/v) chloroform and incubate for 15 min at 4ºC.
[6] Centrifuge at 500 × g for 10 minutes at 4ºC.
[7] Collect and store the supernatant: one small aliquot at 4ºC for lab use and the rest
at -70ºC after addition of 7% dimethyl sulfoxide (DMSO). The library is then
ready to use.
Protocol 19 – Activity screens
Lambda phage libraries will be used to screen for particular activities. NZYa plates of
22.5 x 22.5 cm, on which 7000-10000 phage particles may be screened, will be used.
[1] Pellet the bacteria at 500 × g for 10 minutes.
[2] Gently resuspend the cells in half the original volume with sterile 10 mM MgSO4.
[3] Dilute the cells to an OD600 of 0.5 with sterile 10 mM MgSO4. The bacteria should be
used immediately following dilution.
[4] Mix 1 μl of library with 2 ml of the appropriate bacterial cells at an OD600 of 0.5 in a
Falcon 15 ml tube and incubate the tube at 37ºC for 15 minutes, shaking the tube
gently.
[5] Add each sample to 40 ml of NZY soft agar and plate on NZY agar plates. Incubate the
plates overnight at 37°C.
[6] Spray the plate with substrate and observe the colour development.
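Since each 22.5 x 22.5 cm plate holds 7000-10000 phage particles, the number of plates needed for a screen follows from the number of clones to be examined. The sketch below is illustrative; the library size in the example is a hypothetical value and the function name is not from the source.

```python
import math


def plates_needed(clones_to_screen: int, pfu_per_plate: int) -> int:
    """Number of screening plates, rounded up to whole plates."""
    if pfu_per_plate <= 0:
        raise ValueError("pfu_per_plate must be positive")
    return math.ceil(clones_to_screen / pfu_per_plate)


if __name__ == "__main__":
    clones = 100_000  # hypothetical number of clones to screen
    # Bracket the answer with the two plating densities given in the text.
    print(plates_needed(clones, 10_000), "to", plates_needed(clones, 7_000), "plates")
```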
III. In silico procedures
III.1. Meta(genomics) assembling methodologies
Dr. Giuseppe D’Auria
Cavanilles Institute on Biodiversity and Evolutionary Biology, Valencia, Spain
The exponential improvement of sequencing technologies is outpacing our skills in
data analysis. The latest high-throughput technologies, such as pyrosequencing (454 Roche),
Solexa and SOLiD, together with the still useful Sanger method, give the researcher
important instruments to obtain sequence information from single cultivated microbes (the
best case), from complex communities, which require a metagenomic approach, or from more
complex eukaryotic systems. In all these settings bioinformatics is the key step to reach the
information hidden in the data. The choice of a good sequencing strategy
depends first on the budget of the lab and then on the studied organism and its “genomic
history” (sample with single or multiple organisms, genome length, genome plasticity,
presence of repeated sequences and mobile elements). In all cases, access to
different technologies producing different types of sequences (in terms of length and
quality) is extremely helpful for offsetting the weaknesses of each
technology. The bioinformatics effort is therefore closely tied to the correct choice of
strategy. This section is divided into two parts. The first gives hints about sequence
formats, format conversions, accessing sequence quality data, and assembly strategies using
the open source “Staden Package” and MIRA (Mimicking Intelligent Read Assembly). The
second part is centered on assembly and on the visualization and comparison of complete
genome data.
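As a small illustration of what format conversion and access to quality data involve, the sketch below reads four-line FASTQ records, converts them to FASTA and reports a mean Phred quality per read. It assumes reads are already available in FASTQ with Sanger-style Phred+33 encoding, which is an assumption and not something the course text specifies; it is not a replacement for the Staden Package or MIRA tooling.

```python
from typing import Iterable, Iterator, Tuple

PHRED_OFFSET = 33  # assumption: Sanger-style FASTQ (Phred+33) quality encoding


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").strip()
        next(it, "")  # the '+' separator line
        qual = next(it, "").strip()
        if not header.startswith("@") or not seq or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual


def mean_quality(qual: str) -> float:
    """Mean Phred score of one read."""
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


def fastq_to_fasta(lines: Iterable[str]) -> str:
    """Drop the quality lines and return the reads in FASTA format."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in read_fastq(lines))


if __name__ == "__main__":
    example = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIHHHH\n"]
    print(fastq_to_fasta(example), end="")
    for name, _, qual in read_fastq(example):
        print(name, round(mean_quality(qual), 1))
```

Passing an open file handle instead of the example list works the same way, because the parser only iterates over lines.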
III. In silico procedures
III.2. Phylogenetic reconstructions. An ARB software introduction
Pablo Yarza
Marine Microbiology Group. IMEDEA
Phylogenetic affiliation of the inserts in a metagenomic library is easier once we detect the
presence of certain genes with phylogenetic signal (such as 16S and 23S rRNAs) in a given clone.
Far from being common, good phylogenetic markers are restricted to a very small group
of molecules that must fulfill most of the following requirements: to be ubiquitous, to have
enough informational power, to have well-documented orthologs in public databases, and
to support the current taxonomic schema. The abundance of these markers and other
potentially interesting genes in a metagenomic library depends on the library coverage and
on the phylotype richness of the sample source. For these and other reasons, the construction
and analysis of 16S rRNA clone libraries is a recommended step prior to the metagenomic
approach in environmental samples.
In the best scenario, inserts containing complete or partial SSU/LSU sequences can
be optimally affiliated. In the absence of ribosomal markers, a small set of genes from those
classified as 'housekeeping genes' can be used, although they may generate low-resolution
phylogenetic reconstructions. In the worst case, where no molecule with
phylogenetic signal is present, other methods based on sequence composition can be used to
hypothesize affiliation to known biodiversity.
A phylogenetic reconstruction comprises three main steps: (i) searching and retrieving
reference sequences from comprehensive databases; (ii) aligning the sequences to verify
positional orthology; and (iii) submitting the final set of sequences to different treeing
methodologies to guarantee a stable final topology.
Nowadays a broad range of online tools and public databases facilitates
phylogenetic inference. Among the most relevant are: the SILVA project
(http://www.arb-silva.de), which hosts one of the largest curated databases of SSU and
LSU genes, with more than 300,000 entries; the All-Species Living Tree Project
(http://www.arb-silva.de/projects/living-tree), which for the past year has maintained a curated
database built only on type strain sequences; the online automatic aligner for ribosomal
sequences, SINA (http://www.arb-silva.de/aligner); and the free ARB software
package (http://www.arb-home.de), which integrates under a single interface all the
tools necessary for any kind of phylogenetic reconstruction based on either ribosomal
markers or coding genes.
This practical course will consist of a brief introduction to phylogenetic
reconstruction through a number of exercises: retrieving sequences from public
repositories, importing them into the ARB software, performing alignments with a
secondary-structure-based editor, calculating some trees and evaluating the results.
III. In silico procedures
III.3. Bioinformatics for Metagenomics. A beginner's guide
Dr. Michael Richter
Marine Microbiology Group, IMEDEA
The sequencing of microbial genomes has become a fundamental approach for the
understanding of complex biological networks. Currently, over 900 sequenced bacterial and
archaeal genomes are publicly available and many more are on their way to being fully
sequenced (www.genomesonline.org). The traditional cultivation-based sequencing
approach has been complemented by the groundbreaking cultivation-independent
approach called metagenomics. Novel, cheap and ultra-fast sequencing technologies are
generating enormous amounts of sequence data every day. On the one hand, this opens an
unprecedented possibility to dig into the gold mine of sequence space; on the other, such
large datasets raise several processing problems and drive current bioinformatic tools to
their limits.
In this practical course, the students will learn about the basic bioinformatic concepts of
(meta)genome analysis, based on a large genomic fragment recovered from the
environment. Independent of the chosen sequencing strategy, all data generated go
through similar pipelines based on generic bioinformatic tools and databases, to
accumulate knowledge through functional assignments and data integration. The starting
point is always the localization of functional regions such as protein-coding genes. These
predicted protein-coding genes have to be compared in silico to proteins from a public
database. These protein sequence comparisons are used to infer a potential function for
newly sequenced genes by information propagation from already published knowledge, a
process referred to as gene annotation.
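To make the first step of such a pipeline concrete, the sketch below performs a naive search for open reading frames (ATG to the next in-frame stop codon) on both strands. It is a deliberately simplified stand-in: dedicated gene finders rely on statistical models rather than this rule, and the minimum length threshold and function names here are arbitrary, hypothetical choices.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[str, int, str]]:
    """Return (strand, frame, orf_sequence) for each ATG...stop stretch.

    The length threshold counts codons including the start and stop codon.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "G"
    for strand, frame, orf in find_orfs(toy, min_codons=3):
        print(strand, frame, orf)
```

The predicted ORFs would then be translated and compared against a protein database, which is the annotation step described above.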
Further, in metagenomics it is a common problem that genomic fragments that have been
retrieved from environmental samples cannot be related to a specific group, because no
phylogenetic marker genes are present. In this course we will use the freely available
software Tetra (www.megx.net/tetra/) to calculate tetra-nucleotide usage patterns and
compare them to whole genome sequences. This method will provide valuable information
about the relatedness of the compared sequences.
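The idea of comparing tetranucleotide usage patterns can be illustrated with a simplified sketch: count all 256 tetranucleotides in two sequences and correlate the resulting frequency vectors. This is an assumption-laden simplification; the Tetra software applies its own statistical treatment of the counts, whereas the code below compares raw frequencies only and is not a reimplementation of that tool.

```python
from itertools import product
from math import sqrt
from typing import Dict

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # 256 words


def tetra_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of every tetranucleotide in one strand of seq."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skips words containing N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no unambiguous tetranucleotides")
    return {w: c / total for w, c in counts.items()}


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation between two tetranucleotide frequency profiles."""
    xs = [a[w] for w in TETRAMERS]
    ys = [b[w] for w in TETRAMERS]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("profile has no variance")
    return cov / (sx * sy)


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTACGATCGATCGGATTACAGGCATCGATTAGC"
    genome = "TTGACGGCTAGCTAGGATCCGATCGATTACGGATCGTAGCTAGCA"
    print(round(pearson(tetra_frequencies(fragment), tetra_frequencies(genome)), 3))
```

Fragments of a few dozen bases, as in this toy example, are far too short to give a meaningful pattern; real comparisons need fragments long enough for the 256 counts to be well populated.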
The computational needs for genome analysis and comparisons are extensive and require
a specialized infrastructure. This infrastructure includes powerful hardware systems
consisting of a computing cluster and dedicated servers. Moreover, 'large' metagenomic
datasets constitute an additional computational load, which must be processed through the
same pipeline. To get an overview of the possibilities, the genomic fragment will be
analyzed using the online MG-RAST server, a public resource for the automatic
phylogenetic and functional analysis of metagenomes (metagenomics.nmpdr.org). This
server provides a wide spectrum of tools for the annotation of sequence fragments, their
phylogenetic classification and metabolic reconstructions.
In summary, accurate, consistent data acquisition and processing is a prerequisite to
generate biological understanding from the flood of sequence data. Future conceptual
advances in microbial sciences will increasingly rely on the availability of an innovative
computational infrastructure to interrogate these growing genomic and metagenomic
datasets. But only through a close partnership of biologists and bioinformaticians will we
finally be able to understand the complex interplay of biological entities that form the
basis of our planet Earth.
IV. Contacts
List of participants
Alejandro Acosta
CSIC - Estación Experimental del Zaidín, Granada
e-mail: alejandro.acosta@eez.csic.es
Yamal Al-ramahi
CSIC – Institute of Catalysis, Madrid
e-mail: yamal_a_g@icp.csic.es
Ana Beloqui
CSIC – Institute of Catalysis, Madrid
e-mail: abeloqui@icp.csic.es
Giuseppe D’Auria
Cavanilles Institute on Biodiversity and Evolutionary Biology, University of Valencia
e-mail: Giuseppe.Dauria@uv.es
Nina Dinjaski
CSIC, Centro de Investigaciones Biológicas, Madrid
e-mail: nina@cib.csic.es
Manuel Ferrer
CSIC – Institute of Catalysis, Madrid
e-mail: mferrer@icp.csic.es
Beatriz Galán
CSIC - Centro de Investigaciones Biológicas, Madrid
e-mail: bgalan@cib.csic.es
Adela García
CSIC - Estación Experimental del Zaidín, Granada
e-mail: adela.garcia@eez.csic.es
Leonor Garmendia
CSIC – Centro Nacional de Biotecnología, Madrid
e-mail: lgarmend@cnb.csic.es
Olga Genilloud
Medicamentos Innovadores en Andalucía
e-mail: olga_genilloud@wanadoo.es olga.genilloud@gmail.com
Azam Ghazi
CSIC – Institute of Catalysis, Madrid
e-mail: aghazi@icp.csic.es
María Eugenia Guazzaroni
CSIC – Institute of Catalysis, Madrid
e-mail: meugenia@icp.csic.es
Cristina Limón
Centro Andaluz de Biologia del Desarrollo
Universidad Pablo de Olavide-CSIC, Sevilla.
e-mail: mclimmor@upo.es
Arantxa López
IMEDEA. Universitat de les Illes Balears-CSIC
e-mail: arantxa.lopez@uib.es
Nieves López-Cortés
CSIC – Institute of Catalysis, Madrid
e-mail: nieveslopez@icp.csic.es
Herminia Loza
CSIC - Centro Nacional de Biotecnología, Madrid
e-mail: hloza@cnb.csic.es
Patricia Marín
e-mail: patricia.marin@eez.csic.es
Guadalupe Martín
Centro Andaluz de Biologia del Desarrollo
Universidad Pablo de Olavide-CSIC, Sevilla.
e-mail: gmarcab@upo.es
Sophie Marie Martirani
CSIC - Estación Experimental del Zaidín, Granada
Celia Méndez
Area de Microbiología
Facultad de Medicina
Universidad de Oviedo
Mª Antonia Molina
CSIC - Estación Experimental del Zaidín, Granada
e-mail: nene.molina@eez.csic.es
Julián Pérez
Secugen, S. L.
e-mail: j.perez@secugen.es
Paloma Pizarro
Bio-Iliberis R&D
e-mail: paloma.pizarro@eez.csic.es
Michael Richter
IMEDEA. Universitat de les Illes Balears-CSIC
e-mail: michael.richter@uib.es
Ramón Rosselló
IMEDEA. Universitat de les Illes Balears-CSIC
e-mail: rossello-mora@uib.es
Jennifer Solano
Bio-Iliberis R&D
e-mail: jsolano@ugr.es
Ana Suárez
IMEDEA. Universitat de les Illes Balears-CSIC
e-mail: vieaabs4@uib.es
Javier Tamames
Cavanilles Institute on Biodiversity and Evolutionary Biology, University of Valencia
e-mail: javier.tamames@uv.es
Laura Terrón
Centro Andaluz de Biologia del Desarrollo
Universidad Pablo de Olavide-CSIC, Sevilla.
e-mail: ltergon@upo.es
Iria Uhía
CSIC - Centro de Investigaciones Biológicas, Madrid
e-mail: iriauhia@cib.csic.es
José María Vieites
CSIC – Institute of Catalysis, Madrid
e-mail: vieites@icp.csic.es
Pablo Yarza
IMEDEA. University of Illes Balears-CSIC
e-mail: pablo.yarza@uib.es
Luis Yuste Ricote
CSIC- Centro Nacional de Biotecnología, Madrid
e-mail: lyuste@cnb.csic.es