Research Paper

Charley 1
George Washington Carver Internship Program
Iowa State University
By:
Danielle Nicole Charley
Northern Arizona University
Flagstaff, AZ
And
Leslie Nelson
University of New Mexico
Albuquerque, NM
Iowa State University
Abstract
Positioning Maize Knobs Relative to B73 Maize Sequence
By:
Danielle Charley
Northern Arizona University, Flagstaff, AZ
And
Leslie Nelson
University of New Mexico, Albuquerque, NM
Dr. Carolyn Lawrence, USDA-ARS
Ethalinda Cannon, POPcorn Solution/Application Architect
In the past, knobs have been used to characterize the diversity of maize accessions and have been
key to understanding genetic crossing over in maize, among other problems. Over time,
they have become less widely used. During the assembly of the maize genome sequence,
some repeat-rich regions were removed, including some of the knob sequences. Because
most knob sequence has been removed, an objective of this research is to find where the knobs
might be located relative to the sequence. Even though sequence has been removed, some
copies of the repeats remain. Using the MaizeGDB website and the tools offered within the
site, I found regions where knob repeats are abundant, indicating that the knob itself may reside in
that region. This research will help researchers use the maize genome assembly by alerting
them to regions of the assembly where sections of the genome sequence may be missing.
Danielle Charley and Leslie Nelson
George Washington Carver Internship Program
Ethalinda Cannon and Carolyn Lawrence
30 July 2010
Positioning Maize Knobs Relative to B73 Maize Sequence
Introduction:
In the past, knobs have been used to characterize the diversity of maize accessions and have been
key to understanding genetic crossing over in maize, among other problems. Over time,
they have become less widely used. During the assembly of the maize genome sequence,
some repeat-rich regions were removed, including some of the knob sequences. Because
most knob sequence has been removed, an objective of this research is to find where the knobs
might be located relative to the sequence. Even though sequence has been removed, some
copies of the repeats remain. Using the MaizeGDB website and the tools offered within the
site, I found regions where knob repeats are abundant, indicating that the knob itself may reside in
that region. This research will help researchers use the maize genome assembly by alerting
them to regions of the assembly where sections of the genome sequence may be missing.
Background:
Maize has 10 chromosomes, and some of these chromosomes carry heterochromatic regions called
knobs. A heterochromatic region is a place in the genome where the DNA is tightly compacted
and consists mainly of repeats. Knobs contain the 180-bp repeat and the 350-bp TR-1
repeat (Peacock et al., 1981; Ananiev et al., 1998). Studies have shown that there may be
some association between knob constitution and phenotypic characteristics, including yield
(Wellhausen and Prywer, 1954; Moll et al., 1972; Chughtai and Steffensen, 1987). Knobs
were also used by Barbara McClintock as cytological markers in her seminal discovery linking
genetic crossing over with physical crossovers observed in chromosomes (Creighton and
McClintock, 1931). Little research has been done to associate knob locations with the
genomic sequence. T.A. Kato Y has been studying maize knobs for many years. Figure 1 is an
idiogram of the maize karyotype showing where he has observed knobs (Kato,
unpublished). The goal of this project was to mark positions on the genome where the knobs may
reside.
Materials and Methods:
Three different map types are used in this project: a genetic map, a sequence map, and a
cytological map. Each has its own unit: centiMorgan (cM), base pair (bp), and
centiMcClintock (cMC), respectively. The three maps are collinear, but each is measured in a
different unit. A cM is a statistical measure of genetic distance based on the segregation of
traits in the offspring of two inbred parents. A bp is a single nucleotide pair in a DNA
sequence. A cMC expresses the position of a locus as a percentage of the distance from the
centromere to the end of the chromosome arm on which the locus lies.
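As an illustration of the cMC unit, the short Python sketch below converts a micrometer measurement into cMC. The function name and the example values are hypothetical and are not taken from the project data; the sketch assumes that the distance of the locus from the centromere and the total arm length were measured in the same unit.

```python
def to_centimcclintock(distance_from_centromere_um: float,
                       arm_length_um: float) -> float:
    """Express a position on a chromosome arm in cMC.

    The result is the distance of the locus from the centromere as a
    percentage of the total length of the arm (0 = centromere,
    100 = end of the arm).
    """
    if arm_length_um <= 0:
        raise ValueError("arm length must be positive")
    if not 0 <= distance_from_centromere_um <= arm_length_um:
        raise ValueError("locus must lie between the centromere and the arm end")
    return 100.0 * distance_from_centromere_um / arm_length_um


# Hypothetical measurement: a knob 3.0 micrometers from the centromere
# on an arm that is 12.0 micrometers long.
print(to_centimcclintock(3.0, 12.0))  # 25.0
```

Because the value is a percentage of arm length, positions measured on chromosomes of different absolute lengths become comparable.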
B73 RefGen_v1- Reference genome assembly used in this research; version 1 of the B73
maize genome assembly (Schnable et al., 2009).
MaizeGDB- Maize Genetics and Genomics Database, used for data and visual analysis (Sen et
al., 2009).
BLAST- Used to align knob repeat sequence on B73 genome assembly (Altschul et al., 1997).
CViT- Used for creating map images (Cannon and Cannon, 2005).
GBrowse- A genome browser used for viewing features on genomic sequence (Stein et al.,
2002).
Locus Lookup- Searches for genomic coordinates of a locus (Andorf et al., 2010).
IBM2 2008 Neighbors map- Provided genetic coordinates for knobs (Schaeffer et al., 2008).
180-bp and TR-1 repeats- Repeat sequences used to identify knob regions (GenBank).
Results:
Figure 1
Karyotype from T.A. Kato Y showing knob positions on chromosomes 1-10.
Table 1
See Appendix 1
Data that T.A. Kato Y collected for chromosome 1 from various maize lines, together with the
conversion of the micrometer measurements to cytological map units.
Figure 2
Cytological representation of the knobs in the first sample on chromosomes 1-10, based on
T.A. Kato Y's data.
Graph 1
This graph is based on the IBM2 2008 Neighbors map from MaizeGDB (cM). It is a genetic
map showing where knobs might be according to genetic data rather than cytological measurements.
The IBM2 2008 Neighbors map is a very common genetic map used in maize genetics. This
representation was created by drawing the genetic length of each chromosome, finding the repeat
in MaizeGDB and reporting its genetic coordinates, and then adding the centromeres. Although
only knob locations are shown here, many other loci, including genes, are available on the full
IBM2 2008 Neighbors map.
Graph 2- Sequence map (Locus Pair Lookup)
Sequence map (bp) created using MaizeGDB and Locus Pair Lookup. Locus Pair Lookup finds
knobs on the different chromosomes and determines ranges for their locations based on probe
locations on the genome assembly and/or loci that flank the knob on genetic maps.
Those coordinates are used to approximate where the knob might be. Red marks the ranges
where a knob might be, and black marks the centromeres.
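The idea of bounding a knob by its flanking loci can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the implementation of Locus Pair Lookup; the function name and all coordinates are hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start_bp, end_bp) with start_bp <= end_bp


def knob_range_between(flank_a: Interval, flank_b: Interval) -> Interval:
    """Return the sequence interval lying between two flanking loci.

    The loci may be given in either order. The knob is assumed to lie
    somewhere between the end of the first locus and the start of the
    second one on the same chromosome.
    """
    first, second = sorted([flank_a, flank_b])
    start, end = first[1], second[0]
    if start > end:
        raise ValueError("flanking loci overlap; no interval lies between them")
    return start, end


# Hypothetical sequence coordinates of two loci that flank a knob
# on a genetic map.
left_marker = (12_000_000, 12_000_800)
right_marker = (15_500_000, 15_501_200)
print(knob_range_between(right_marker, left_marker))  # (12000800, 15500000)
```

The width of the returned interval reflects how closely the nearest mapped loci bracket the knob, so the range is an upper bound on the knob's position rather than its size.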
Graph 3
Sequence map (bp) generated using GBrowse and BLAST to estimate where knobs are located.
We used BLAST to search for clusters of knob repeats and uploaded the hits to MaizeGDB's
instance of GBrowse, a common software package for genome visualization. This
representation could contain errors because the Maize Genome Sequencing Consortium
intentionally removed repetitive elements during sequencing and again during assembly.
For sequencing, this was done because the goal was to sequence only genic regions of
the genome. For assembly, repetitive elements were removed because it is difficult to
assemble highly repetitive regions correctly. The red regions are the knob hits that were found
and the black regions represent centromeres.
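One simple way to look for clusters of repeat hits is to count hits in fixed-size windows along a chromosome. The Python sketch below is an illustration under that assumption and is not the procedure used to produce Graph 3; the function name, the window size, and the hit positions are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def hits_per_window(hit_starts: Iterable[int], window_bp: int) -> Dict[int, int]:
    """Count hits in consecutive windows of window_bp base pairs.

    Keys of the result are window start coordinates; windows that
    contain no hits are omitted.
    """
    if window_bp <= 0:
        raise ValueError("window size must be positive")
    counts = Counter(start // window_bp for start in hit_starts)
    return {index * window_bp: n for index, n in sorted(counts.items())}


# Hypothetical start positions of knob-repeat hits on one chromosome.
starts = [10, 20, 1_500_000, 1_500_100, 1_999_999, 3_000_000]
print(hits_per_window(starts, 1_000_000))
# {0: 2, 1000000: 3, 3000000: 1}
```

Windows with unusually high counts are candidate regions in which a knob may reside.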
Graph 4
Sequence map (bp) with full BLAST results. Repeats within 500 bp of each other are collapsed
into regions. This map was created using the BLAST software. We took the sequences of the
180-bp repeat and the TR-1 repeat and used the command-line version of BLAST to find
matches across the entire maize genome assembly. We then filtered the results by
percent identity, showing only matches that were 90% identical or higher, and converted
them into a GFF file. Red regions are the BLAST hits, and black regions are the
centromeres.
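The filtering, collapsing, and conversion steps described for Graph 4 can be sketched in Python as follows. This is a minimal illustration, not the script used in the project. It assumes that the BLAST results are in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and that the genome assembly was the BLAST subject. The example hits, the feature type, and the attribute name are hypothetical.

```python
import csv
from typing import Iterable, List, Tuple

Hit = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def parse_blast_tabular(lines: Iterable[str], min_identity: float = 90.0) -> List[Hit]:
    """Keep hits at or above min_identity; return subject coordinates."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        chrom, pident = row[1], float(row[2])
        sstart, send = int(row[8]), int(row[9])
        if pident >= min_identity:
            # Minus-strand hits have sstart > send, so order the pair.
            hits.append((chrom, min(sstart, send), max(sstart, send)))
    return hits


def merge_hits(hits: Iterable[Hit], max_gap: int = 500) -> List[Hit]:
    """Collapse hits on the same chromosome that lie within max_gap bp."""
    merged: List[Hit] = []
    for chrom, start, end in sorted(hits):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def to_gff3(regions: Iterable[Hit]) -> str:
    """Write regions as GFF3 text; feature type and Name are illustrative."""
    lines = ["##gff-version 3"]
    for chrom, start, end in regions:
        lines.append("\t".join([chrom, "BLAST", "repeat_region", str(start),
                                str(end), ".", ".", ".",
                                "Name=knob_repeat_cluster"]))
    return "\n".join(lines) + "\n"


# Hypothetical BLAST hits, not real results.
example = [
    "knob180\tchr1\t95.00\t180\t9\t0\t1\t180\t1000\t1179\t1e-80\t280",
    "knob180\tchr1\t92.00\t180\t14\t0\t1\t180\t1500\t1321\t1e-70\t250",
    "knob180\tchr1\t85.00\t180\t27\t0\t1\t180\t5000\t5179\t1e-40\t150",
    "TR-1\tchr2\t99.00\t350\t3\t0\t1\t350\t200\t549\t0.0\t600",
]
regions = merge_hits(parse_blast_tabular(example))
print(to_gff3(regions), end="")
```

In this example the first two hits on chr1 are 142 bp apart and are collapsed into one region from 1000 to 1500, the third hit is dropped because it is only 85% identical, and the hit on chr2 is kept as its own region.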
Conclusions:
Researchers understand the maize genome in several ways: cytologically, genetically, and
by sequence. Placing knob data on maps of all three types will help researchers
understand knob placement across all major paradigms of biological understanding. Because all
the work done here is inferential, the knob locations relative to the genome sequence represent
hypotheses and must be tested further. This work serves as a framework for
understanding where knobs may lie relative to markers mapped to the maize genome sequence.
References:
Cannon, EK; Cannon, SB. (2005) CViT: Chromosome Visualization Tool. Unpublished.
Chughtai, S.R. and Steffensen, D.M. (1987) Maize Genetics Cooperation News Letter 61, 98-99.
Creighton, H., and McClintock, B. (1931) A correlation of cytological and genetical crossing-over
in Zea mays. PNAS 17:492–497.
Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis:
Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, New
York, NY.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman,
D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res. 25(17), 3389-3402.
GenBank accessions: DQ186871.1, M32528.1
Kato, Takeo Angel. Cytological Studies of Maize [Zea mays L.] and Teosinte [Zea mexicana
(Schrader) Kuntze] in Relation to Their Origin and Evolution. Massachusetts: University
of Massachusetts, 1975.
Lawrence, Carolyn J., Seigfried, Trent E., Bass, Hank W., and Anderson, Lorinda K. (2006).
Predicting Chromosomal Locations of Genetically Mapped Loci in Maize Using the
Morgan2McClintock Translator. Genetics Society of America.
Moll, R.G., Hansen, W.D., Levings, C.S., and Ohta, Y. (1972) Crop Sci. 12, 585-589.
Peacock, W.J., Dennis, E.S., Rhoades, M.M., and Pryor, A.J. (1981) Highly Repeated DNA
Sequence Limited to Knob Heterochromatin in Maize. PNAS USA, p. 4490.
Schaeffer (Polacco), ML; Sanchez-Villeda, H; Coe, E. (2008) IBM2 2008 Neighbors.
Unpublished.
Sen, TZ, Andorf, CM, Schaeffer, ML, Harper, LC, Sparks, ME, Duvick, J, Brendel, VP, Cannon,
E, Campbell, DA, Lawrence, CJ. (2009) MaizeGDB becomes 'sequence-centric'.
Database, Vol. 2009: bap020.
Stein, L.D., Mungall, C., Shu, S., Caudy, M., Mangone, M., Day, A., Nickerson, E., Stajich,
J.E., Harris, T.W., Arva, A., and Lewis, S. (2002) The generic genome browser: a building
block for a model organism system database. Genome Res. 12(10), 1599-1610.
Andorf, C.M., Lawrence, C.J., Harper, L.C., Schaeffer, M.L., Campbell, D.A., and Sen, T.Z.
(2010) The Locus Lookup tool at MaizeGDB: identification of genomic regions in maize by
integrating sequence information with physical and genetic maps. Bioinformatics 26(3),
434-436.
Wellhausen, E.J. and Prywer, C. (1954) Agron. J. 46, 507-511.
Acknowledgments:
As a Native American Outreach Program participant, I am very grateful to the sponsors who
have made this program possible. Special thanks go to my Mentors and Graduate Student Mentors,
who have put time and effort into helping me grow as a researcher and student. List of
Sponsors: Carolyn Lawrence, Ethalinda Cannon, Trent Moore, Mary De Baca, Aurelio Curbelo,
Jovaughn Barnard, Dustin Thunder Hawk, Ranelle White Buffalo, George Washington Carver
Internship, National Science Foundation, USDA-ARS, and Iowa State University.
Appendix 1: