ABSTRACT
CHANDRA, MANAN. New Layers of the Arabidopsis Defense Transcriptome and
Proteome Diversity Uncovered by RNA-Sequencing Analysis of Alternative Transcript
Isoform Abundance. (Under the direction of Dr. Paola Veronese).
The latest annotation of the Arabidopsis thaliana genome (TAIR10, released in November 2010)
indicates that 5,886 genes (here referred to as alternative transcript, or AT, genes) generate
multiple transcript isoforms, mostly via alternative usage of transcription start sites
(TSS) and/or alternative splicing (AS). However, in Arabidopsis as in other plants, the
extent of the transcriptome that is regulated at the transcript isoform level during adaptive
responses to environmental and developmental cues is still largely unknown. I present here
results from the characterization of the Arabidopsis defense response against the
hemibiotrophic bacterium Pseudomonas syringae pathovar tomato (Pst), causal agent of
bacterial speck disease, based on RNA sequencing experiments and bioinformatics tools for
reconstructing alternative transcripts and quantifying their abundance. In particular, 75 bp
paired-end read sequencing libraries were constructed from leaf tissues treated with virulent
or avirulent Pst strains or mock treated (VIR, AVR and MOCK samples, respectively) and
sequenced on an Illumina GA IIx platform. The reads were mapped to the annotated
genome using TopHat and assembled using Cufflinks. Differentially expressed genes and
transcripts were identified with the program Cuffdiff. Following this pipeline, 764 AT genes
were found to have a detectable level of expression of all the corresponding transcript
variants (a total of 2,026 transcripts putatively coding for 1,532 proteins) and to possess at least
one variant with a two-fold expression change in response to Pst. For 383 of these genes, all
transcripts corresponding to the same gene had the same TSS and originated mostly at
the post-transcriptional level via AS. Another 220 genes had transcript variants
with different TSSs that appeared to be generated primarily at the transcriptional level. The
remaining 161 genes had three or more annotated isoforms, and the different transcripts
seemingly originated at both the transcriptional and post-transcriptional levels. The three groups
of genes putatively coded for 809, 344 and 379 proteins, respectively, thus revealing the
contribution of AS and probable alternative TSS/promoters to the diversity of the defense
proteome. In particular, 99 instances of change in protein localization and 177 instances of loss
of functional domains were detected. Regulated abundance of transcript variants
characterized by alternative UTR structures was also found, suggesting the presence of
elements regulating the efficiency of transcription and/or translation as well as the
stability of the messenger RNA.
New Layers of the Arabidopsis Defense Transcriptome and Proteome Diversity Uncovered
by RNA-Sequencing Analysis of Alternative Transcript Isoform Abundance
by
Manan Chandra
A thesis submitted to the Graduate Faculty of
North Carolina State University
in partial fulfillment of the
requirements for the degree of
Master of Science
Functional Genomics
Raleigh, North Carolina
2012
APPROVED BY:
Dr. Paola Veronese, Committee Co-chair
Prof. Ralph A. Dean, Committee Co-chair
Prof. David McK. Bird, Committee Member
DEDICATION
To my parents Prof. Mukesh Chandra and Dr. Sangeeta Chandra and my brother Aman
Chandra, whose constant encouragement and unconditional support have given me the courage
and patience to complete this work. Everything I have and will achieve in life is because of,
and for, them.
BIOGRAPHY
Manan Chandra was born and completed his schooling in Agra, Uttar Pradesh, India. He
went on to earn his Bachelor’s Degree in Biotechnology at the Jaypee University of
Information Technology, Himachal Pradesh, India. He then moved to Raleigh, North
Carolina (USA) where he joined Dr. Paola Veronese’s lab in March 2011 and conducted
research in the field of molecular plant pathology to earn his Master’s degree in Functional
Genomics in 2012.
ACKNOWLEDGEMENTS
My work could never have been completed without the constant help and support of a number
of people, to whom I would like to extend my sincerest gratitude.
First and foremost, it is my proud privilege to have the opportunity to thank my Project
Guide and mentor Dr. Paola Veronese, Assistant Professor in the Department of Plant
Pathology at North Carolina State University. It was an honor to carry out my thesis work
under her expert guidance and supervision, without which my work would never have been
possible.
I would also like to register my gratitude for Dr. Veronese’s lab members including Dr.
Christophe La Hovary and Ms. Laura Bostic, whose invaluable advice was exceedingly
important. The constant suggestions and support by Dr. Monica Borghi also played an
immensely significant role in the completion of my work, for which I would like to extend
my heartfelt gratitude.
I would like to take this opportunity to thank the other members of the panel assessing my
work, Prof. David McKenzie Bird and Prof. Ralph A. Dean, for their critical analysis of
my work and important suggestions. Also, I would be failing in my duties if I were not to
acknowledge the support of Dr. Steffen Heber, Associate Professor at the Bioinformatics
Research Centre (BRC) and Computer Science here at NC State, for his help in the
interpretation of the generated bioinformatics data.
I would also like to express my gratitude for Dr. Jonathan Keebler and Dr. Jennifer Schaff
at the NC State Genomic Sciences Laboratory (GSL), as well as Dr. Walter Gassmann of
the University of Missouri, for helping us with the analysis of the mRNA sequencing data we
collected. I would also like to make a special mention of the contribution by Dr. Ann
Loraine, Associate Professor at the Department of Genomics and Bioinformatics at UNC
Charlotte and faculty at the North Carolina Research Centre (NCRC) Kannapolis, who
helped with the visualization of the paired-end read alignments.
I would like to extend my sincere thanks to Dr. Rajinder Singh Chauhan, Professor and
Head of the Department of Biotechnology & Bioinformatics, Jaypee University of
Information Technology, Himachal Pradesh, India. My mentor and advisor during my
undergraduate days, he played an instrumental role in generating interest in me for this
rapidly evolving and fascinating field of Functional Genomics. His advice will continue to
help me in my work in this field.
I would like to thank all my teachers, friends and peers, whose names I could not mention
here, for their encouragement and the pivotal role they have played in my academic career
thus far.
Lastly, I would like to thank my parents, Prof. Mukesh Chandra and Dr. Sangeeta
Chandra, whose constant encouragement gave me the courage and enthusiasm to complete
this work.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
1.1 Arabidopsis thaliana: the elite model system in plant biology studies
1.2 Plant defense response: a background
1.3 Extent and roles of transcript variant accumulation in Arabidopsis biology
1.4 Biological significance of alternative transcription start sites and promoters
1.5 Biological significance of the transcript 5’ UTR structure
1.6 Biological significance of the transcript 3’ UTR structure
1.7 Research objectives
2. MATERIALS AND METHODS
2.1 Plant material, growth conditions and inoculation procedure
2.2 Total RNA isolation, cDNA library preparation
2.3 RNA sequencing
2.4 RNA-seq data analysis
2.5 Analysis of GO Term enrichment
2.6 Quantitative real time PCR experiments
3. RESULTS
3.1 Coverage of the Arabidopsis defense transcriptome obtained by Illumina RNA-sequencing
3.2 Distribution of FPKM values and fold changes
3.3 Correlation between RNA-seq data and a microarray-based analysis of Arabidopsis
defense transcriptome
3.4 Characterization of detected transcriptome and proteome diversity
3.5 Functional categorization of AS genes via GO Term enrichment analysis
3.6 Pst-induced changes in the Arabidopsis transcriptome and proteome associated with
resistance and susceptibility
3.7 Validation of individual transcript and whole gene abundance via qRT-PCR
4. DISCUSSION
5. REFERENCES
LIST OF FIGURES
Figure 1. Workflow of the followed protocols
Figure 2. Distribution of the FPKM values for genes (A) and individual transcript isoforms
(B) detected in the pair-wise comparison mock- versus avirulent-treated samples
Figure 3. Distribution of Pst-induced fold changes in overall gene (A) and individual
transcript isoform (B) mRNA steady-levels obtained by comparing FPKM values of
avirulent- vs mock-treated samples
Figure 4. Examples of Arabidopsis transcript isoforms generated at the transcriptional level
by alternative TSS usage and/or post-transcriptionally via AS and impact on UTR and
predicted protein structures
Figure 5. Examples of Pst-induced changes in alternative TSS usage or AS resulting in
regulated changes of protein localization. M, mock-; A, avirulent Pst-; V, virulent Pst-treatments
Figure 6. Examples of Pst-induced changes in alternative TSS usage or AS resulting in
regulated loss of protein functional domains. M, mock-; A, avirulent Pst-; V, virulent Pst-treatments
Figure 7. Examples of Pst-regulated changes in alternative TSS usage or AS associated with
changes in the sequence of proteins of unknown functions. M, mock-; A, avirulent Pst-; V,
virulent Pst-treatments
Figure 8. Whole-gene and transcript isoform-specific expression analysis for the genes
BPS1/At1g01550 (A-C) and CPK29/At1g76040 (D-F) via qRT-PCR (B and E, biological
replicates 1 and 2) and RNA-Seq experiments (C and F, biological replicate 1), respectively
Figure 9. Whole-gene and transcript isoform-specific expression analysis for the genes
CHX18/At5g41610 (A-C) and TIR-NBS-LRR/At5g41750 (D-F) via qRT-PCR (B and E,
biological replicates 1 and 2) and RNA-seq experiments (C and F, biological replicate 1),
respectively
LIST OF TABLES
Table 1. Coverage of the Arabidopsis Col-0 transcriptome during challenge with virulent and
avirulent Pst obtained by our Illumina RNA-Seq experiments for the indicated pair-wise
comparisons
Table 2. Changes in the expression of well-characterized defense marker genes by Pst
AvrRPS4 at the 6 hpi as detected by RNA-Seq (this study) and Affymetrix ATH1 microarray
experiments (Bartsch et al., 2006)
Table 3. Characterization of the contribution to transcriptome and proteome diversity of the
764 AT genes with detectable accumulation of all the corresponding transcript variants and
with a two-fold change in the expression of at least one variant in response to Pst
Table 4. GO Term enrichment of the 764 AT genes with detected expression of all the
annotated transcript variants and with at least one variant with a two-fold change in
abundance in response to Pst treatment
Table 5. Examples of Pst-induced changes in Arabidopsis Col-0 whole gene and individual
transcript isoform mRNA steady-levels detected by our RNA-Seq experiments, their impact
on transcript and predicted protein structures and conservation of exon/intron transcript
isoform structure found in homologous maize and rice genes
Table 6. Pst-induced changes in Arabidopsis Col-0 whole-gene and individual transcript
isoform expression levels as detected by RNA-Seq and qRT-PCR experiments (this study)
Table 7. List of primers used in the qRT-PCR experiments
Supplemental Table 1. List of AT genes with expression values for all the annotated
transcript isoforms, which are coding for the same protein, and with at least one isoform with
a Pst-induced two-fold change in the expression level
Supplemental Table 2. List of AT genes with expression values for all the annotated
transcript isoforms, which are coding for different proteins, and with at least one isoform
with a Pst-induced two-fold change in the expression level
1. INTRODUCTION
1.1 Arabidopsis thaliana: the elite model system in plant biology studies
A combination of factors, including a small genome (~157 Mb), a short life cycle, easy
transformation via the floral dip method, and the availability of a large number of genomic
resources, has made Arabidopsis thaliana the model organism of choice in plant
biology research. TAIR (http://www.arabidopsis.org/), a comprehensive on-line resource
dedicated to Arabidopsis, contains the most recently updated version of the annotation of the
organism’s genome (TAIR10, Nov 17 2010) along with links to collections of other important
Arabidopsis-related databases, including T-DNA Express, which is an efficient gene mapping
tool (http://signal.salk.edu). In addition to structure, sequence and functional annotation,
TAIR provides information on metabolic pathways, genomic maps, genetic markers,
germplasm collections and relevant publications (Lamesch et al., 2011). These resources are
complemented by visualization tools such as SeqViewer and AraCyc to view gene/genome
regions and biochemical pathways, respectively, along with sequence analysis tools such as
BLAST and FASTA and data retrieval tools (e.g. GeneHunter). Other resources include the
Arabidopsis Biological Resource Center at the Ohio State University (http://abrc.osu.edu/),
the purpose of which is the preservation and distribution of wild-type and mutant seed stocks
and cloned DNA sequences to the scientific community around the world. Arabidopsis has
played a pivotal role in the discovery of important mechanisms controlling gene
function, including dsRNA-mediated silencing and DNA methylation, which
underpin plant growth and development as well as adaptation to environmental stress
(reviewed by Berr et al., 2012; Chen and Laux, 2012; Cramer et al., 2011; Ietswaart et al.,
2012; Schmitz and Ecker, 2012). A deeper understanding of such mechanisms is having a
profound impact on the progress of crop improvement needed to keep pace with the rapidly
rising global demand (Chew and Halliday, 2011).
1.2 Plant defense response: a background
Plants and their pathogens are engaged in an ongoing competition for dominance,
the outcome of which can have a profound impact on the global agricultural system (Dodds
and Rathjen, 2010). For resistance against pathogens, plants rely on their innate immune
system, which is complex and multi-layered (Nishimura and Dangl, 2009). Innate
immunity activation involves the recognition of pathogen-associated conserved molecular
patterns (PAMPs), also referred to as elicitors, by pattern-recognition receptors
(PRRs). Examples of such PAMPs include bacterial flagellin and fungal chitin. The PRRs also
recognize endogenous plant molecules produced during the interaction with pathogens, e.g.
products of plant cell wall degradation, referred to as damage-associated conserved
molecular patterns (DAMPs). This mode of activation is known as PAMP-triggered
immunity (PTI). The second mechanism of defense activation is based on the recognition of
pathogen-produced race-specific virulence factors, or effectors, by cognate host receptors, a
process called effector-triggered immunity (ETI; Dodds and Rathjen, 2010). A
well-characterized example of PTI in Arabidopsis involves the trans-membrane receptor kinase
FLAGELLIN SENSING 2 (FLS2), which directly binds flagellin to initiate a MAP kinase
signaling cascade (Boller and Felix, 2009). In ETI, the interaction between effectors and the
corresponding receptor proteins occurs either by direct physical binding or following the
“guard” model, in which the receptor monitors pathogen-induced changes of a self-component.
This model is illustrated by the Arabidopsis RIN4 protein, which forms intracellular
complexes with the resistance (R) proteins RPM1 and RPS2; these are activated upon RIN4
phosphorylation or degradation induced by the bacterial effectors avrRpm1 and avrRpt2,
respectively (Mackey et al., 2003). The evolutionary relationship between PTI and ETI is
illustrated by the Zig-Zag-Zig model. According to this model, while the initiation of PTI via
PAMP perception constitutes the first ‘zig’ towards resistance, the evolution of pathogen
effectors to counter the onset of PTI forms the ‘zag’ towards susceptibility. The evolution of
NBS-LRR proteins to counter the effector activities is then the final ‘zig’, resulting in ETI or
race-specific resistance to pathogens (McDowell and Simon, 2006; Nishimura and Dangl,
2009). The initiation of
plant-pathogen interactions has been associated with early signature events such as altered
levels of cytoplasmic ions and free calcium and the production of reactive oxygen species
(ROS) and nitric oxide (NO). PTI and ETI are similar in terms of response, with the latter
being more rapid and involving a hypersensitive response (HR) against adapted pathogens
(Tsuda and Katagiri, 2010). Arabidopsis functional genomics has helped to uncover a number of
genes that play key regulatory roles in plant immunity and whose functions are highly
conserved in major crop species, including the lipase-like gene ENHANCED
DISEASE SUSCEPTIBILITY 1 (EDS1) and the transcriptional co-activator
NONEXPRESSER OF PATHOGENESIS-RELATED GENES 1 (NPR1) (Dodds and
Rathjen, 2010; Nishimura and Dangl, 2010). Several phytohormones have been implicated in
mediating local and systemic plant defense responses, the most extensively studied being
salicylic acid (SA), jasmonic acid (JA) and ethylene (ET), which exert both antagonistic and
synergistic activities according to the lifestyle of the pathogen (Murray and Jones, 2009;
Tsuda et al., 2009). A link between PTI and abscisic acid (ABA) is illustrated in Arabidopsis
by the flagellin-induced stomatal closure response, which requires ABA and is disabled by
the bacterial toxin coronatine, a structural analog of JA (Melotto et al., 2006). Auxins have
also been implicated in the response to pathogens. While exogenous application of auxin
induces more severe disease symptoms, flagellin perception leads to the repression of auxin
signaling via the induction of the microRNA miR393, which targets the transcript of the auxin
receptor TIR1 (Navarro et al., 2006).
1.3 Extent and roles of transcript variant accumulation in Arabidopsis biology
The most recent annotation of the Arabidopsis genome has yielded 33,602 genes and
41,671 gene models, with 5,886 genes (18% of the total) producing alternative
transcripts generated at the transcriptional level by alternative usage of transcription start sites
(TSS) and/or at the post-transcriptional level by alternative splicing (AS). Splicing
consists of the removal of the introns and the joining of the exons of
the newly transcribed pre-messenger RNA to yield a mature RNA ready to be translated into
proteins. Splicing is mediated by a large RNA-protein complex, the spliceosome (Will and
Luhrmann, 2011). A total of 74 small nuclear RNA (snRNA) genes and 395 genes encoding
splicing-related proteins have been identified in the Arabidopsis genome (Wang and Brendel,
2004). The positioning of the spliceosome complex on the pre-mRNA is directed by cis-elements
(splice sites, branch point and the polypyrimidine tract) and trans-acting factors such as
serine/arginine-rich (SR) proteins (Wahl et al., 2009). When the same splice sites are used
in all transcripts, the result is “constitutive splicing”; the use of variable splice sites (often
influenced by auxiliary cis-elements such as enhancers and repressors) generates alternative
transcripts, producing multiple mRNAs from the same gene, and the process is referred to as
alternative splicing (AS). The most common types of AS are exon skipping (most common in
mammals, including humans), intron retention (most common in plants, including
Arabidopsis) and the use of alternative splice sites. Both AS and the alternative usage of
polyadenylation sites can introduce premature stop codons in the mRNA, which can become
the target of the nonsense-mediated decay (NMD) surveillance pathway (Marquez et al.,
2012; Barash et al., 2010). The generation of alternative transcripts varies across cell types,
developmental stages and environmental conditions, and plays a vital role in shaping
transcriptome and proteome complexity by changing UTR and CDS sequences. Alternative
TSS and AS events affect the proteome in several ways, for instance by altering protein
binding activity, sub-cellular localization, stability, signaling activities and post-translational
modifications (Kelemen et al., 2012). Estimates of the occurrence of transcript isoforms have
progressively risen with the increased coverage of the transcriptome made possible by the latest
high-throughput sequencing technologies. Recent studies have indicated that ~61% of the
Arabidopsis intron-containing genes undergo AS, that AS events based on an alternative 3’
splice site are more than twice as frequent as those involving an alternative 5’ splice site, and
that several transcripts display more than one splicing event (Filichkin et al., 2010; Marquez
et al., 2012). In plants, AS plays pivotal roles in various developmental processes, including
the transition to flowering and the regulation of circadian clock responses (Ali and Reddy, 2008;
Terzi and Simpson, 2008; Petrillo et al., 2011). Furthermore, genes undergoing AS have been
found to cover a vast array of functional groups including stress responses, which may
indicate that AS could confer adaptive advantages to plants under unfavorable conditions
(Ali and Reddy, 2008). Numerous findings link AS to the orchestration of the
response to environmental changes. For example, the initiation of signaling pathways upon
stress perception leads to the expression of SR proteins, their phosphorylation and relocation
into the nucleus, as well as to the differential regulation of their own AS pattern (Ali and Reddy,
2008; Duque, 2011; Palusa et al., 2007). Interestingly, the transcriptional and
post-transcriptional regulation of SR genes appears to be stress-specific (Palusa et al., 2007;
Tanabe et al., 2007). Some transcription factors that are known to regulate gene expression
under stress conditions are also regulated via AS (Iida et al., 2004 and 2005), which uncovers
another layer of gene regulation based on the relationships between transcription and splicing.
Several R genes encoding NBS-LRR proteins are alternatively spliced, including the tobacco
N, the barley Mla6, and the Arabidopsis SNC1 and RPS4 (Gassmann, 2008; Xu et al., 2011).
Constitutive expression of defense genes has been associated with deleterious phenotypes,
thus underlining the importance of tight gene regulation (both positive and negative) in
order to find a balance between eliciting rapid and effective responses and preventing
self-damage. Accordingly, the regulation of AS of R-gene transcripts, which results in the
accumulation of truncated proteins with a possible dominant-negative activity, is an effective
way of fine-tuning the defense machinery (Gassmann, 2008). As a proof of concept, the
functional analysis of RPS4 transcript isoforms, which are generated by retention of intron 2
or 3, provided evidence that the accumulation of all the isoforms is required for full
resistance, and that the splicing system is dynamically regulated and highly specific (Zhang
and Gassmann, 2007).
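The AS event types named in this section (intron retention and exon skipping) can be recognized by comparing the exon coordinates of two isoforms of the same gene. The following is a minimal sketch in Python, not the procedure used in this study. It assumes that each isoform is given as a sorted list of exon intervals on the forward strand; the coordinates and the function names (introns, retained_introns, skipped_exons) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# An interval is (start, end), 1-based and inclusive. Exons are sorted by start.
Interval = Tuple[int, int]


def introns(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons of one isoform."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def contained(inner: Interval, outer: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def retained_introns(reference: List[Interval],
                     variant: List[Interval]) -> List[Interval]:
    """Introns of `reference` that lie entirely inside an exon of `variant`."""
    return [i for i in introns(reference)
            if any(contained(i, e) for e in variant)]


def skipped_exons(reference: List[Interval],
                  variant: List[Interval]) -> List[Interval]:
    """Internal exons of `reference` that fall inside an intron of `variant`."""
    variant_introns = introns(variant)
    return [e for e in reference[1:-1]
            if any(contained(e, i) for i in variant_introns)]


# Hypothetical coordinates for three isoforms of one gene (illustration only).
spliced = [(100, 200), (300, 400), (500, 600)]
intron_retained = [(100, 200), (300, 600)]
exon_skipped = [(100, 200), (500, 600)]

print(retained_introns(spliced, intron_retained))  # [(401, 499)]
print(skipped_exons(spliced, exon_skipped))        # [(300, 400)]
```

Alternative 5’ and 3’ splice sites would require an additional comparison of exon boundaries and are left out of this sketch for brevity.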
1.4 Biological significance of changes in alternative transcription start sites and
promoters
In the post-genomic era, it has become clear that the size of a eukaryotic genome is no measure
of its complexity. It has also been shown that alternative TSS and promoters increase the
number of encoded proteins (Kapranov, 2009). For instance, alternative TSS usage in
Arabidopsis and rice has been implicated in changes of protein localization, particularly in
response to stresses (Dinkins et al., 2008). Knowledge of the exact TSS position is critical for
identifying the regulatory regions that flank the TSS. The main methods
employed for TSS identification include cap analysis of gene expression (CAGE),
oligo-capping and robust analysis of 5’-transcript ends (5’-RATE) (Gowda et al., 2006). Combining
these methods with RNA-sequencing has helped not only to identify the precise positions of
TSS, but also to deduce the relative strengths and positions of several promoter elements
affecting transcription, including transposon-like sequences, upstream open reading frames
(uORFs), internal ribosomal entry sites (IRES) and trans-acting short RNAs, thus facilitating
a correlation between their influence and transcriptional initiation (Kawaji et al., 2006; Taft
et al., 2009; Valen and Sandelin, 2011; Plessy et al., 2010; Yamamoto et al., 2009). Tagging
of 5’ RNA ends revealed that the majority of Arabidopsis promoters contain several TSSs
and that the TSS distribution for a specific promoter is highly conserved across different species
(Kawaji et al., 2006). The fact that only a small fraction of small RNAs could be associated
with promoter elements, and that their genomic positions are highly conserved across diverse
species, indicates that they have a distinct biological function and are not mere by-products
of suspended polymerase or fragments of longer mRNA molecules (Taft et al., 2009).
Notably, the choice of an alternative TSS has been associated with the phosphorylation state
of key transcription factors as well as with DNA accessibility as determined by the methylation
state (Euskirchen et al., 2007; Megraw et al., 2009). A recent study of Drosophila
melanogaster developmental stages based on specific 5’-complete paired-end cDNA
sequencing has revealed that about 40% of the developmentally regulated genes possess at
least two promoters and indicated the presence of transposable elements in several of them,
supporting a role for mobile elements in generating new regulatory networks (Batut et al.,
2012).
1.5 Biological significance of the 5’ UTR structure
Recent studies have indicated that 30 to 50% of the eukaryotic 5’UTR sequences contain
short uORFs, which regulate gene expression by interfering with the transcription of the
downstream primary ORF and have also been implicated in altering translation initiation.
Notably, about 1% of these uORFs in flowering plants encode conserved peptides
(Dinkins et al., 2008; Kochetov, 2008; Jorgensen and Dorantes-Acosta, 2012).
Computational analyses have shown that the presence of uORFs is conserved in
orthologous genes, in particular those encoding transcription factors, across diverse species,
further supporting their function in gene regulation (Jorgensen and Dorantes-Acosta, 2012). Other
genetic elements associated with the 5’UTRs are the IRES. These short nucleotide sequences
promote cap-independent translation initiation and can modulate the relative expression of
protein isoforms having distinct intra- or extracellular localization and functions, as in the
well-characterized case of the mammalian Fibroblast Growth Factor 1 (Martineau et al.,
2004; Pacheco and Martinez-Salas, 2010). Numerous eukaryotic genes contain introns in
both 5’ and 3’ UTRs, which have been implicated in the regulation of transcription rate,
mRNA transport and mRNA stability (Bicknell et al., 2012). In general, the 5’UTRs have a
higher frequency of introns than the 3’UTRs: about 35% versus 6-16% in humans, and
17.1% versus 2.5% in Arabidopsis, respectively (Bicknell et al., 2012; Hong et al., 2006).
1.6 Biological significance of the 3’ UTR structure
The transcript 3’UTRs are characterized by the presence of
riboswitches and have been associated with the NMD process. Riboswitches are small regions
that act as metabolic sensors and may alter gene expression through their capability to bind
small molecules (Bocobza et al., 2007). Riboswitches bind ligands, including vitamins, amino
acids and nucleotides, via a highly conserved sensor domain. Ligand binding results in
conformational changes leading to gene regulation via alternative mRNA processing and
transcription termination, often associated with NMD. Riboswitches work mostly in regulatory
mechanisms involving feedback loops (Cheah et al., 2007). An example is the regulation of
the Arabidopsis gene THIC, involved in the biosynthesis of thiamine (vitamin B1). When present
in high concentration, the biosynthetic intermediate thiamine pyrophosphate (TPP) binds to the
THIC riboswitch, making a new splice site accessible to the spliceosome machinery. This
event results in the accumulation of a THIC transcript variant targeted for degradation
(Bocobza et al., 2007).
1.7 Research objectives
The project involves a genomic analysis of the alternative transcription and splicing that take
place during the interaction of the model organism Arabidopsis thaliana with the bacterial
pathogen Pseudomonas syringae pathovar tomato (Pst), causal agent of bacterial speck
disease of vegetable crop plants. The overall goal of the project is to reveal the extent of the
defense transcriptome that is regulated at the transcript isoform level upon pathogen attack
and to characterize the impact on transcriptome and proteome diversity. The specific objectives
of the study are: 1) the detection and quantification of Arabidopsis transcript variants during
the interaction with virulent (causing disease) and avirulent (inducing resistance) Pst strains;
2) the identification of transcript isoforms whose abundance is differentially regulated during
disease resistance and susceptibility; 3) the structural characterization of the differentially
expressed transcript variants and corresponding encoded proteins; 4) the assessment of the
conservation of the alternative transcript exon/intron architecture in homologous genes of
crop species such as maize and rice.
The research methods include the use of the latest high-throughput RNA sequencing
technology, which can generate longer paired-end reads and thus ensures a significant coverage
of the isoform-specific transcript regions, and the application of newly developed
bioinformatics tools for isoform reconstruction and abundance estimation. In particular,
Illumina 75 bp paired-end read sequencing libraries are generated from RNA samples
obtained from Arabidopsis leaves harvested 6 hours after mock- and Pst-inoculation. The
obtained sequencing data are analyzed using first a combination of the mapping programs
Bowtie and TopHat and then the Cufflinks platform for estimating changes in transcript
abundance.
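The order of the analysis steps described above (TopHat alignment, Cufflinks abundance estimation, Cuffdiff comparison) can be sketched as follows. This is an illustrative Python helper that only assembles the command lines and does not run them. The file names, output directory, index prefix and the function name build_commands are hypothetical, and the choice of options (for example supplying the annotation with -G) is an assumption, since the exact parameters are not given in this section.

```python
import shlex
from typing import Dict, List, Tuple


def build_commands(index: str, gtf: str,
                   samples: Dict[str, Tuple[str, str]],
                   out: str = "rnaseq_out") -> List[List[str]]:
    """Assemble TopHat, Cufflinks and Cuffdiff commands for paired-end samples.

    `samples` maps a label (e.g. MOCK) to its two FASTQ files.
    Nothing is executed here; the commands are returned as argument lists.
    """
    commands: List[List[str]] = []
    bams: List[str] = []
    for label, (reads_1, reads_2) in samples.items():
        tophat_dir = f"{out}/tophat_{label}"
        # Align the paired-end reads, guided by the genome annotation.
        commands.append(["tophat", "-G", gtf, "-o", tophat_dir,
                         index, reads_1, reads_2])
        bam = f"{tophat_dir}/accepted_hits.bam"
        # Estimate the abundance of the annotated transcripts.
        commands.append(["cufflinks", "-G", gtf,
                         "-o", f"{out}/cufflinks_{label}", bam])
        bams.append(bam)
    # Compare expression between the conditions, one BAM file per label.
    commands.append(["cuffdiff", "-o", f"{out}/cuffdiff",
                     "-L", ",".join(samples), gtf, *bams])
    return commands


# Hypothetical input files for the three treatments named in this study.
samples = {
    "MOCK": ("mock_1.fastq", "mock_2.fastq"),
    "AVR": ("avr_1.fastq", "avr_2.fastq"),
    "VIR": ("vir_1.fastq", "vir_2.fastq"),
}
commands = build_commands("TAIR10_index", "TAIR10_genes.gtf", samples)
for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning argument lists instead of executing them keeps the sketch free of side effects; each list could be passed to subprocess.run once the paths point to real files.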
While detection of AS events in different plant species has been achieved mostly by aligning
EST and cDNA sequences and, more recently, RNA-seq data (Campbell et al., 2006; Wang
and Brendel, 2006; Barbazuk et al., 2008; Filichkin et al., 2010), to our knowledge this is the
first attempt to quantify in a plant species the abundance of all the annotated alternative
transcripts generated at the transcriptional level via alternative usage of TSS and/or
post-transcriptionally by AS. A better understanding of the phenomic consequences of the
regulation of the transcriptome at the transcript isoform level can constitute a new means for
the development of new strategies for improving plant performance.
2. MATERIALS AND METHODS
2.1 Plant material, growth conditions and inoculation procedure
Seeds of Arabidopsis (Arabidopsis thaliana) ecotype Columbia-0 (Col-0) were surface
sterilized in 70% ethanol for a few seconds and in 2% sodium hypochlorite for 10 minutes,
stratified at 4 °C for two days to ensure more uniform germination and sown in
Arabidopsis Potting Medium PM-15-13 (Lehle Seeds, Round Rock, TX, USA). Plant
material was maintained in a growth chamber (model AR36LC8, Percival Scientific Inc., IA,
USA) set to a temperature of 24 °C, an 8-hour-light/16-hour-dark photoperiod and a light
intensity of 100 µE m⁻² s⁻¹. Leaves of six-week-old plants were infiltrated using a needleless
syringe with a cell suspension of the bacterium Pseudomonas syringae pathovar tomato
(Pst) strain DC3000. In particular, for studying the compatible Arabidopsis-Pst interaction
characterized by host susceptibility/pathogen virulence, Col-0 was inoculated with Pst
DC3000 carrying the empty vector pVPS61 (virulent strain, also indicated as VIR); whereas,
to characterize the incompatible Arabidopsis-Pst interaction resulting from host
resistance/pathogen avirulence, plants were infiltrated with Pst DC3000 transformed with a
version of the vector pVPS61 engineered to express the bacterial effector avrRPS4, which is
recognized by the Col-0 resistance gene RPS4 (Zhang & Gassmann, 2003; bacterial strains
were kindly provided by Dr. Walter Gassmann, Division of Plant Sciences, University of
Missouri, USA). Two days prior to inoculation, bacterial cultures were restored from stocks
maintained at -80 °C by streaking on King’s medium plates (composition per liter: 10 g
Difco proteose peptone #3, 1.5 g K2HPO4 anhydrous, 15 glycerol, pH 7.0, filter sterilized
5 mM MgSO4 added after autoclaving) supplemented with the antibiotics kanamycin (25
mg/L) and rifampicin (100 mg/L) to select for the presence of the vector pv and
against other microorganisms, respectively. Some of the newly grown bacterial cells were
then suspended in 10 mM MgCl2 buffer to reach a final OD600 = 0.01, corresponding to a
concentration of 10⁷ colony-forming units (CFU) per mL. Three independent inoculation
experiments were conducted. Three to five leaves of each plant were infiltrated and, for each
treatment (MOCK, AVR and VIR) and each time point after inoculation considered, leaves from
at least 12 different plants were harvested, pooled and immediately frozen in liquid nitrogen.
2.2 Total RNA isolation, cDNA library preparation
Leaf tissues were ground to a fine powder and total RNA was extracted using the RNeasy
Plant Mini Kit (Qiagen, Valencia, CA-USA). An on-column DNase step was added
according to the manufacturer's instructions (Qiagen) to eliminate possible sources of DNA
contamination. The integrity of the extracted RNA was assayed via gel electrophoresis and
with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA-USA). Total
RNA from samples harvested at 6 hpi was then shipped to the sequencing facility
of the Genomic Sciences Division, David H. Murdock Research Institute (DHMRI,
Kannapolis, NC-USA, www.dhmri.org) for further processing.
2.3 RNA-sequencing
Sequencing libraries were generated using the Illumina mRNA-Seq sample preparation kit
according to the manufacturer's instructions (Illumina Inc., San Diego, CA-USA). Briefly,
poly-A RNA was isolated from total RNA and chemically fragmented. After first- and
second-strand synthesis, the cDNA went through end-repair and ligation reactions using the
paired-end adapters and amplification primers (Illumina catalog # PE102-1004). Ligation of the
adapters adds 94 bases to the length of the cDNA molecules. The cDNA libraries were
size-fractionated on agarose gels and 300 ± 100 bp fragments were gel purified and amplified by
PCR using the paired-end primers. The cDNA was quantified using the Qubit fluorometer
and PicoGreen quantification reagents (Invitrogen, CA-USA), checked on an Agilent 2100
Bioanalyzer and run on an Illumina Genome Analyzer (GA) IIx using v4 sequencing
chemistry (Illumina, Inc.). After running one lane per library, a total of 31.2, 33.5 and 41.1
million reads were generated for the MOCK, AVR, and VIR samples, respectively.
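The library geometry described above implies a range of cDNA insert sizes, which matters when a read mapper asks for the expected distance between the two mates of a pair. The following minimal Python sketch works through that arithmetic. It assumes that the 300 ± 100 bp gel-selected fragments already include the 94 adapter bases (as the order of the steps suggests) and that both mates are 75 bp long; the function name is illustrative only.

```python
ADAPTER_BASES = 94   # bases added by adapter ligation (from the text)
READ_LENGTH = 75     # length of each mate of a paired-end read (from the text)


def insert_and_inner_distance(fragment_bp, adapter_bp=ADAPTER_BASES, read_bp=READ_LENGTH):
    """Return (cDNA insert size, distance between the two mates) in bp.

    Assumption: the gel-selected fragment length includes the adapters.
    A negative inner distance means that the two mates overlap.
    """
    insert_bp = fragment_bp - adapter_bp
    inner_bp = insert_bp - 2 * read_bp
    return insert_bp, inner_bp


# Lower bound, center and upper bound of the 300 +/- 100 bp gel cut
for fragment_bp in (200, 300, 400):
    insert_bp, inner_bp = insert_and_inner_distance(fragment_bp)
    print(f"fragment {fragment_bp} bp: insert {insert_bp} bp, inner distance {inner_bp} bp")
```

Under these assumptions, the shortest fragments in the gel cut give overlapping mates, which is worth keeping in mind when choosing mapping parameters.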
2.4 RNA-seq data analysis
Reads were filtered to remove low-quality and adapter sequences using Illumina’s
Real Time Analysis (RTA) software. The Arabidopsis chromosome sequences, TAIR10
annotations and sequence files for individual annotated genome features (genes, cDNAs, 3’
and 5’ UTRs, introns, exons and intergenic regions) were downloaded from TAIR
(ftp://ftp.arabidopsis.org./home/tair/sequences/). Reads were mapped to the Arabidopsis
TAIR10 reference genome with TopHat version 1.0.14 (Trapnell et al., 2009;
http://tophat.cbcb.umd.edu/). TopHat alignments were converted to BAM files and sorted for
use with the open-source software Cufflinks (Trapnell et al., 2010;
http://cufflinks.cbcb.umd.edu; version 1.0.2) to generate a transcriptome assembly for each
treatment using default parameters such as a cut-off of 1000 reads/transcript. All the different
functions of the Cufflinks package that include Cuffcompare, Cuffmerge and Cuffdiff, were
used to identify differentially expressed genes and transcripts. In particular, Cufflinks
assemblies were first compared to the annotated transcriptome using Cuffcompare. The
obtained assemblies were then merged with the annotated transcriptome using Cuffmerge.
Finally, gene and transcript expression levels were compared using Cuffdiff.
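The order of the steps described above can be laid out as a list of command lines. The Python sketch below only builds the argument lists and does not run anything. All file and directory names are hypothetical, the mate inner distance is a placeholder value, and the option letters (-o, -r, -g, -L) are those documented for public releases of TopHat and Cufflinks; whether they match versions 1.0.14 and 1.0.2 exactly is an assumption, and the parameters actually used in this study are not reproduced here.

```python
SAMPLES = ["MOCK", "AVR", "VIR"]
REFERENCE_GTF = "TAIR10_annotation.gtf"   # hypothetical file name
BOWTIE_INDEX = "TAIR10_index"             # hypothetical index prefix


def build_commands(samples, reference_gtf, bowtie_index, inner_dist=50):
    """Return the pipeline as a list of argument lists (nothing is executed).

    inner_dist is a placeholder for the expected distance between mates.
    """
    commands = []
    for sample in samples:
        # Map the paired-end reads of one treatment to the genome
        commands.append(["tophat", "-o", f"tophat_{sample}", "-r", str(inner_dist),
                         bowtie_index, f"{sample}_1.fastq", f"{sample}_2.fastq"])
        # Assemble transcripts for that treatment
        commands.append(["cufflinks", "-o", f"cufflinks_{sample}",
                         f"tophat_{sample}/accepted_hits.bam"])
    assemblies = [f"cufflinks_{s}/transcripts.gtf" for s in samples]
    alignments = [f"tophat_{s}/accepted_hits.bam" for s in samples]
    # Compare each assembly with the annotated transcriptome
    commands.append(["cuffcompare", "-r", reference_gtf, "-o", "cuffcmp", *assemblies])
    # Merge the assemblies with the annotation; assemblies.txt lists the GTF files
    commands.append(["cuffmerge", "-g", reference_gtf, "-o", "merged_asm", "assemblies.txt"])
    # Test for differential expression across the three treatments
    commands.append(["cuffdiff", "-o", "diff_out", "-L", ",".join(samples),
                     "merged_asm/merged.gtf", *alignments])
    return commands


for command in build_commands(SAMPLES, REFERENCE_GTF, BOWTIE_INDEX):
    print(" ".join(command))
```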
2.5 GO Term enrichment of differentially expressed transcripts
The over-representation of gene ontology (GO) terms in the categories biological process,
cellular component and molecular function among the differentially expressed genes and
transcripts was determined using the tools and default search parameters of AmiGO version 1.8
(http://amigo.geneontology.org/cgi-bin/amigo/term_enrichment). TargetP version 1.1
(http://www.cbs.dtu.dk/services/TargetP/) was used to predict the subcellular location of the
proteins encoded by the alternative transcripts considered. The location assignment is based
on the predicted presence of any of the N-terminal pre-sequences: chloroplast transit peptide
(cTP), mitochondrial targeting peptide (mTP) or secretory pathway signal peptide (SP)
(Emanuelsson et al., 2007). Conserved protein functional domains were identified using
annotation tools in TAIR (www.arabidopsis.org), The Plant Proteome Database (PPDB,
http://ppdb.tc.cornell.edu/default.aspx; Sun et al., 2008) and Conserved Domain Database
(CDD, http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml).
2.6 Quantitative real time PCR experiments
One µg of total RNA was used as a template to synthesize the first-strand cDNA using
TaqMan Reverse Transcription Reagents kit (Applied Biosystems, Roche, USA) with oligo
(dT) primers. Twenty ng of the synthesized cDNA were used in each 20 µl quantitative real-time
(qRT)-PCR reaction with SYBR Green PCR Master mix (Applied Biosystems).
Amplifications were carried out in triplicate in a Stratagene Mx3000P real-time PCR machine
(Agilent Technologies, Santa Rosa, CA, USA). Isoform-specific forward and reverse primers
were designed to partially overlap annotated exon-exon junctions with the software Primer
Express 3.0 (Applied Biosystems) or with PrimerQuest, made available online by Integrated
DNA Technologies (IDT, www.idtdna.com). Primers to amplify whole-gene transcripts were
designed in regions shared by all isoforms. No primer dimers were detected when inspecting the
melting curves, and only primer pairs that displayed greater than 90% amplification
efficiency were used. Fold changes in gene and transcript abundance were derived by the
comparative ∆∆CT method (Livak and Schmittgen, 2001). The housekeeping gene At1g13320 (PP2A)
was chosen as the internal reference because of its stable level of expression across the different
conditions imposed in this study and also under a series of previously published
treatments (Czechowski et al., 2005). The validation experiments comprised the analysis of
the expression of the defense-related genes At3g26830 (PAD3), At3g48090 (EDS1) and
At3g52430 (PAD4), chosen as positive controls of the onset of Arabidopsis resistance against
Pst. All the primers used in the qRT-PCR experiments are listed in Table 7.
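As an illustration of the comparative ∆∆CT calculation cited above (Livak and Schmittgen, 2001), the following Python sketch derives a fold change from threshold cycle (Ct) values. The Ct values in the example are hypothetical and are not measurements from this study, and the formula assumes an amplification efficiency close to 100% for both the target and the reference gene.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression (treated vs control) by the comparative ddCt method.

    dCt  = Ct(target) - Ct(reference gene), computed within each sample
    ddCt = dCt(treated) - dCt(control)
    fold change = 2 ** (-ddCt), assuming ~100% amplification efficiency
    """
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_treated - dct_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target crosses the threshold 3 cycles earlier in the
# treated sample, while the reference gene is unchanged.
print(fold_change_ddct(ct_target_treated=22.0, ct_ref_treated=20.0,
                       ct_target_control=25.0, ct_ref_control=20.0))
```

A value above 1 indicates up-regulation relative to the control sample, and a value below 1 indicates down-regulation.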
3. RESULTS
3.1 Coverage of the Arabidopsis defense transcriptome obtained by Illumina RNA-sequencing
To determine the extent of the Arabidopsis transcriptome regulated at the transcript isoform
level during the defense response, we conducted Illumina RNA-seq analysis of leaf tissues
harvested 6 hours after infiltration with buffer only (mock treatment, MOCK sample) or with
avirulent or virulent strains of the hemibiotrophic bacterial pathogen Pseudomonas syringae
pathovar tomato (Pst) (AVR and VIR samples) (workflow illustrated in Figure 1). The three
75 bp paired-end read libraries were sequenced using the Illumina GA IIx platform,
generating 31.2 million, 33.5 million and 41.1 million reads for the MOCK, AVR and VIR
samples, respectively. After removal of low-quality sequences, reads were aligned to the
annotated TAIR10 reference transcriptome using the software Bowtie and TopHat (Trapnell
et al., 2010 and 2012). The resulting alignment files were processed by the Cufflinks package
(Trapnell et al., 2010 and 2012) to produce a separate transcriptome assembly for each of our
three treatments. The “Cuffmerge” utility (part of the Cufflinks package) was then used to
merge these assemblies to form the basis for estimating overall gene and
individual transcript expression levels in each of the three treatments. The
merged assemblies were then fed into the program “Cuffdiff”, which quantified expression
via linear statistical models and reported the statistical significance of differential
expression between the samples. Cuffdiff also grouped transcripts into
biologically meaningful sets, such as all transcripts sharing a TSS, which enabled us to tell
apart transcriptional (alternative TSS/promoters) and post-transcriptional events (true AS).
These results were incorporated into Excel files in which gene and transcript
abundance was given as the number of fragments per kilobase of transcript per million
mapped reads (FPKM values). In order to identify differentially regulated genes and
transcript isoforms associated with resistance and susceptibility to Pst, we performed the
following three pair-wise comparisons of the sequencing data sets: MOCK versus AVR
(M/A), MOCK versus VIR (M/V) and AVR versus VIR (A/V). The total numbers of genes
and transcripts detected and differentially expressed in the pair-wise comparisons considered
are summarized in Table 1. In particular, we detected the expression of about 10,000 transcripts
and 8,000 genes, of which about 20% were differentially regulated.
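The FPKM unit used throughout the results can be made concrete with a short calculation. The Python sketch below applies the plain definition (fragments per kilobase of transcript per million mapped fragments) to hypothetical counts. It is a simplification: Cufflinks estimates abundances with a statistical model that also has to assign fragments shared between isoforms, so this function illustrates the unit and is not a re-implementation of the program.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    length_kb = transcript_length_bp / 1_000
    mapped_millions = total_mapped_fragments / 1_000_000
    return fragments / (length_kb * mapped_millions)


# Hypothetical example: 500 fragments on a 1,000 bp transcript
# in a library with 10 million mapped fragments
print(fpkm(500, 1_000, 10_000_000))
```

Because the value is normalized by both transcript length and library size, FPKM values can be compared between transcripts of different lengths and between libraries of different depths.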
3.2 Distribution of FPKM values and fold changes
The FPKM values for the genes and transcripts detected in the
MOCK/AVR comparison spanned a wide range, from as low as 0.1 to as high as
15,000. However, the majority (~50-70%) had an FPKM in the range of 10 to 100
(Figure 2 A-B). These results are consistent with what was previously reported for the human
transcriptome, where most of the genes possess an average level of expression of 40 FPKM
(Trapnell et al., 2010). As for the magnitude of the changes in expression, most of
the changes (40-50%) were between two- and four-fold, with top log2 fold change values of 27
and -28 for the up- and down-regulated transcripts, respectively (Figure 3 A-B).
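To relate linear and log2 fold changes, the following Python sketch computes a log2 fold change from two FPKM values. The two worked examples use the rounded Mock and Avr FPKM values listed for WAK1 and DND1 in Table 2. The function is a plain ratio without any pseudocount; how Cuffdiff handles FPKM values at or near zero is not covered here.

```python
from math import log2


def log2_fold_change(fpkm_control, fpkm_treated):
    """log2(treated / control); both values must be greater than zero."""
    if fpkm_control <= 0 or fpkm_treated <= 0:
        raise ValueError("FPKM values must be positive for a plain log2 ratio")
    return log2(fpkm_treated / fpkm_control)


# Rounded FPKM values taken from Table 2 (Mock, Avr)
print(round(log2_fold_change(196, 523), 1))   # WAK1, up-regulated
print(round(log2_fold_change(71, 24), 1))     # DND1, down-regulated
```

A two- to four-fold change corresponds to an absolute log2 value between 1 and 2. Very large absolute log2 values can only arise when one of the two FPKM values is very close to zero.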
3.3 Correlation between RNA-seq data and a microarray-based analysis of Arabidopsis
defense transcriptome
A comparison of Pst-induced changes in the expression of several well-characterized
defense-marker genes as detected by our RNA-seq analysis and by microarray data generated by
other labs (Bartsch et al., 2006) showed a high degree of similarity for the Mock/Avr
comparison (Table 2), confirming that our inoculation experiments produced the expected
results. In detail, the only down-regulated gene observed in our RNA-seq analysis,
DND1/At5g15410, encoding a mutated cyclic cation channel, mirrored the microarray data,
as did the other genes, all of which were up-regulated upon pathogen challenge.
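The similarity between the two platforms can also be expressed as a correlation coefficient. The Python sketch below computes the Pearson correlation between the log2 fold changes listed in Table 2 for RNA-seq and for the ATH1 microarray, leaving out the two genes with no microarray value (NIMIN-2 and RPS4). The pairing of values with genes follows the row order of Table 2, and the choice of Pearson correlation is an illustration, not an analysis reported in this study.

```python
from math import sqrt

# (RNA-seq log2 fold change, microarray log2 fold change), Avr/Mock, from Table 2
LOG2_FOLD_CHANGES = [
    (5.1, 4.5), (4.5, 4.4), (7.0, 7.3), (1.4, 2.5), (2.8, 3.5), (3.4, 9.1),
    (6.3, 4.9), (3.4, 3.4), (7.3, 7.2), (4.7, 4.2), (1.7, 2.3), (1.6, 2.1),
    (4.6, 7.2), (4.1, 4.0), (4.8, 5.9), (3.0, 4.1), (4.3, 5.2), (3.5, 3.9),
    (1.9, 2.5), (1.6, 3.1), (5.4, 5.6), (6.4, 8.6), (-1.6, -2.9), (2.6, 3.7),
]


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


print(f"Pearson r over {len(LOG2_FOLD_CHANGES)} genes: {pearson(LOG2_FOLD_CHANGES):.2f}")
```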
3.4 Characterization of detected transcriptome and proteome diversity
Our analysis led to the identification of 764 AT genes (~13% of all AT genes annotated)
with expression values for all the corresponding transcript isoforms and with at least one
isoform showing a two-fold change in at least one of the pair-wise comparisons. For these genes,
we analyzed in detail the corresponding 2,026 transcripts and 1,532 predicted proteins
(Supplemental Tables 1 and 2). In particular, we scored the number of alternative transcripts
with changes in the TSSs, UTRs and coding sequences and how these changes affect the
encoded proteins (changes in localization and/or functional domains). We also determined the
number of intron retention (IR), exon skipping (ES) and alternative 5’/3’ junction events.
These findings are summarized in Table 3. Several genes produced multiple transcripts via
usage of both alternative TSS and AS, referred to as Mix, pointing to complex layers of
transcriptional and post-transcriptional regulation. The observation that intron retention is a
more common form of splicing than ES in plants (Barbazuk et al., 2008) was corroborated by
our analysis, with about 38% of all detected AS events being IR and only 12% ES.
The remaining AS events were due to alternative usage of 5’ or 3’ splice junctions.
In total, 713, 445 and 761 changes in 5’ UTRs, 3’ UTRs and CDSs were detected, respectively.
While 18% of the changes in the 5’ UTRs were due to AS in genes whose transcript
variants use only one TSS, changes in the 3’ UTRs were due to AS and/or alternative
polyadenylation sites. Some examples of the different types of outcomes of the regulation of
the transcriptome at the isoform level are shown in Figure 4. The gene At2g39770 (involved
in vitamin C biosynthesis) generates two isoforms, both with the same TSS and encoding the
same protein. The two isoforms differ by an IR event in the 3’ UTR. The transcript variants of
the gene At5g13320 (encoding a jasmonate-associated protein) share the same TSS and differ by AS
in the CDS regions. The glutamine synthetase gene At5g35630 exemplifies complex regulation
of the same protein by three different TSSs. In the protein kinase gene At1g76040/CPK29, an
alternative TSS/possible internal promoter generates an isoform coding for a different protein.
A more complex and multi-layered regulation at the transcriptional and
post-transcriptional level characterizes the generation of the three transcript variants of the
gene At3g24560/RASPBERRY, involved in embryogenesis.
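The shares of the different AS event types quoted in this section can be recomputed from the event counts in Table 3. The short Python sketch below adds the "All" totals of the "different proteins" and "same protein" groups for IR, ES and alternative junction (AJ) events and expresses each as a percentage of all AS events. The variable and function names are illustrative.

```python
# AS event counts from Table 3: ("different proteins" total) + ("same protein" total)
AS_EVENTS = {
    "IR": 323 + 78,    # intron retention
    "ES": 121 + 11,    # exon skipping
    "AJ": 447 + 82,    # alternative 5'/3' splice junction
}


def event_shares(events):
    """Return each event type as a percentage of all events."""
    total = sum(events.values())
    return {name: 100 * count / total for name, count in events.items()}


for name, share in event_shares(AS_EVENTS).items():
    print(f"{name}: {share:.1f}% of {sum(AS_EVENTS.values())} AS events")
```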
3.5 Functional categorization of AS genes via GO Term enrichment analysis.
In order to functionally characterize the 764 AT genes that had a measurable level
of expression of all the annotated isoforms and at least one isoform with a two-fold expression
change in response to Pst, a GO term analysis (http://www.geneontology.org/) was
performed using the online tool AmiGO. This program accepts a list of candidate genes as
input and compares their associated GO terms with those corresponding to
the relevant background set (Zhou et al., 2007). After the enriched terms were identified, they
were grouped into the three main categories “cellular component”, “biological process” and
“molecular function” (Table 4). Under “biological process”, a marked enrichment was observed
in categories including response to stimulus (GO:0050896), response to stress (GO:0006950)
and defense response (GO:0006952). Under “molecular function”, an enrichment was observed in
protein binding (GO:0005515), calmodulin binding (GO:0005516) and catalytic activity
(GO:0003824).
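The comparison between a candidate gene list and a background set can be illustrated with a one-sided hypergeometric test, a common choice for term enrichment. The Python sketch below is written under that assumption and does not describe the internals of AmiGO. The gene counts in the example are hypothetical; only the list size of 764 comes from this study.

```python
from math import comb


def enrichment_p_value(hits_in_list, list_size, hits_in_background, background_size):
    """P(X >= hits_in_list) when list_size genes are drawn without replacement
    from a background in which hits_in_background genes carry the GO term."""
    upper = min(list_size, hits_in_background)
    favourable = sum(
        comb(hits_in_background, i) * comb(background_size - hits_in_background, list_size - i)
        for i in range(hits_in_list, upper + 1)
    )
    return favourable / comb(background_size, list_size)


# Hypothetical counts: 69 of 764 candidate genes carry a term that annotates
# 810 of 27,000 background genes (about 9% versus 3%).
print(enrichment_p_value(69, 764, 810, 27_000))
```

A small p-value indicates that the term is found in the candidate list more often than expected from its frequency in the background. In practice a correction for testing many terms at once is also needed.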
3.6 Pst-induced changes in the Arabidopsis transcriptome and proteome associated with
resistance and susceptibility
One of the outcomes of alternative transcript production is a change in the localization of the
encoded protein. In the case of the gene At2g37130, involved in defense response to fungus,
the isoform 1-encoded protein is secreted while that corresponding to isoform 2 is localized in
the cytoplasm. Because the two isoforms differ in their TSS, the involvement of an alternative
promoter element is suspected. This gene is also representative of a “shift” in the relative
expression of the individual isoforms. Under the Mock treatment, the abundances of the two
isoforms were similar to each other. Under the Avr and Vir treatments, isoform 1 displayed a far
greater abundance than isoform 2. Another example of differential regulation of the transcript
variants of the same gene leading to changes in protein localization was
observed in At1g67800 (involved in N-terminal protein myristoylation), wherein isoform 2
alone is secreted, while the remaining isoforms are localized in the chloroplast. Isoform 2
also displays a pronounced up-regulation in response to the bacterium, which was not
observed with the other isoforms. This case shows that alternative promoters may regulate
the proteome. An example of multi-layered regulation resulting in protein changes was observed
in the phosphatidate phosphatase gene At2g01180. In this gene, in addition to an alternative
TSS, an IR event was also observed, resulting in the localization of isoform 1 in the
chloroplast and of isoform 2 in the cytoplasm. Here, it is hard to determine whether
this change is caused by the IR event in the 5’ UTR, by an alternative cis-element, or perhaps
even by a combination of the two. These findings are summarized in Figure 5. Another effect of
Pst-induced alternative regulation was the loss of functional domains in the encoded proteins.
Such a change is observed in At2g24790 (a positive regulator of photomorphogenesis), which
loses the CCT domain (involved in nuclear localization and protein-protein
interaction) due to a partial intron retention in isoform 2, which seems to encounter a
premature transcription stop signal. Another example involves At4g27410 (a transcriptional
activator under ABA-induced dehydration), which was seen to lose the NAM protein domain
in isoform 1 due to an alternative 5’ UTR. This protein domain plays a role in pattern formation
of embryos and flowers, and is expressed at meristem boundaries. This gene is also
representative of a “switch” in the relative abundances of isoforms 1 and 2. Under
mock treatment, isoform 1 was more abundantly expressed, while isoform 2 was more
abundant with Avr and Vir (particularly the latter). In the case of At5g14930 (involved in
senescence), isoform 1 was observed to have lost the lipase 3 domain due to premature
termination signals. These data are summarized in Figure 6. Pst also induced changes
that have not yet been associated with any specifically characterized functional significance.
For instance, At2g14560 (LURP1), which encodes a protein of unknown function, does not undergo
altered localization (both isoforms are localized in the cytoplasm) or loss of domains.
There is currently no explanation for the fact that both isoforms are up-regulated in
response to Avr treatment but down-regulated with Vir. A long 3’ UTR could make it a
suitable candidate for NMD. Similar are the cases of At4g02550 and At5g44570, both of
which undergo AS (the former also having an alternative TSS in its isoform 3) accompanied
by up-regulation of all isoforms in response to Pst, but without any known functional
significance or altered localization: all isoforms of the former gene are localized in
the cytoplasm and those of the latter are secreted. These data are summarized in Figure 7. A
number of other selected genes involved in plant defense response that underwent
significant regulation in response to Pst are listed in Table 5. Genes encoding JAZ proteins,
which play an important role in defense, were expressed to a greater extent in
response to Vir treatment than to Avr. EDS1/At3g48090 showed a marked up-regulation,
particularly in response to Avr. However, its regulation is complex, as it underwent splicing
in its CDS and also seemed to be driven by an alternative promoter resulting in altered
localization. The gene At2g29630 (THIC, involved in thiamin biosynthesis) contained a
highly conserved TPP-binding site in the 3’ UTR of isoform 2, which functions as a
riboswitch (discussed earlier). The receptor gene At1g51890 underwent loss of the LRR
domain, involved in ligand recognition and binding, in its isoform 2 due to the skipping of two
exons. This isoform was down-regulated in response to Pst, in contrast to the other
isoform. A similar effect was observed in At3g17510 (CIPK, response to abiotic stress), but
in that case driven by an alternative promoter rather than an AS event. In the case of At3g11820
(SYP121), involved in defense response, one isoform is up-regulated in response to Pst
treatments and the other is down-regulated, with both a transcriptional event (alternative TSS)
and a post-transcriptional event (AS by IR) occurring. These data are summarized in Table 5.
3.7 Validation of individual transcript and whole gene abundance via qRT-PCR
The steady-state mRNA levels of the transcript isoforms and whole genes of a total of 18 AT genes,
randomly chosen to represent low to medium levels of expression and significant and
non-significant regulation in response to virulent and avirulent Pst at 6 hpi, were analyzed via
quantitative real-time (qRT) PCR experiments (results summarized in Table 6). While for all
the genes considered the same RNA samples used for RNA-seq were used as a template, for
four genes (At1g01550, At1g76040, At5g41610 and At5g41750) the amplifications also included
material generated in an independent inoculation experiment (biological replicate 2) and
harvested at 3 and 6 hpi (Figures 8 and 9). In general, a high degree of correlation between
RNA-seq and qRT-PCR results as well as between biological replicates was observed. In
particular, down-regulation of individual transcript variants and of the whole gene was
confirmed for the calcium-binding gene At4g38810 and for At5g04430 and At5g04830. No
significant changes were found for genes such as the RNA-binding At2g46610 and the heat
shock protein At3g57340. Stronger regulation of one of the two isoforms, possibly linked to
the presence of an alternative promoter, was confirmed for the gene BYPASS (At1g01550),
the calcium-dependent protein kinase gene CPK29 (At1g76040), the cation exchanger gene
CHX18 (At5g41610) and the P-loop gene At5g17760.
Inoculated leaves at 72 hpi
Inoculation by leaf infiltration of 6-week-old Arabidopsis Col-0 plants with 10⁷
CFU/mL virulent (DC3000) and avirulent (avrRPS4) Pseudomonas syringae pv
tomato (Pst) or buffer only (10 mM MgSO4)
Harvest of infiltrated leaves 6 hours post-inoculation (hpi)
Total RNA Isolation
Preparation of Illumina 75 nt paired-end sequencing libraries
(cDNA synthesis, DNA fragmentation and ligation to sequencing adapters)
Sequencing using Illumina GA IIx
Read filtering for removal of low quality and adaptor sequences
Mapping of the reads to the TAIR 10 transcriptome using TopHat
Assembly, merge and compare of the read alignments to the annotated
transcriptome (Cufflinks package: Cuffmerge and Cuffcompare)
Discovery of differentially expressed transcripts by Cuffdiff
Figure 1. Workflow of the followed protocols
Figure 2. Distribution of the FPKM values for genes (A) and individual
transcript isoforms (B) detected in the pair-wise comparison Mock/Avr.
Plotted series: Mock and Avr.
Figure 3. Distribution of Pst-induced fold changes in overall gene (A) and
individual transcript isoform (B) steady-state mRNA levels obtained by
comparing FPKM values of avirulent- vs mock-treated samples.
Plotted series: all transcripts and transcripts of AT genes.
Figure 4. Examples of Arabidopsis transcript isoforms generated at the
transcriptional level by alternative usage of TSS and/or post-transcriptionally
via AS and impact on UTR and predicted protein structures.
Figure 5. Examples of Pst-induced changes in alternative TSS usage or AS
resulting in regulated changes of protein localization. M, mock-; A, avirulent Pst-;
V, virulent Pst-treatments.
Figure 6. Examples of Pst-induced changes in alternative TSS usage or AS
resulting in regulated loss of protein functional domains. M, mock-; A,
avirulent Pst-; V, virulent Pst-treatments.
Figure 7. Pst-regulated alternative TSS usage or AS resulting in changes in
the sequence of proteins of still unknown functions. M, mock-; A, avirulent
Pst-; V, virulent Pst-treatments.
Figure 8. Whole-gene and transcript isoform-specific expression analysis for the
genes BPS1/At1g01550 (A-C) and CPK29/At1g76040 (D-F) via qRT-PCR (B and
E, biological replicates 1 and 2) and RNA-seq experiments (C and F, biological
replicate 1), respectively.
Figure 9. Whole-gene and transcript isoform-specific expression analysis for the
genes CHX18/At5g41610 (A-C) and TIR-NBS-LRR/At5g41750 (D-F) via qRT-PCR
(B and E, biological replicates 1 and 2) and RNA-seq experiments (C and F,
biological replicate 1), respectively.
Table 1. Coverage of the Arabidopsis Col-0 transcriptome during challenge with virulent and
avirulent Pst obtained by our Illumina RNA-Seq experiments for the indicated pair-wise
comparisons.
                                                   Mock/Avr   Mock/Vir   Avr/Vir
Detected transcripts                                 10,757     12,789    12,783
Differentially expressed transcripts                  2,474      2,044     1,362
Detected genes                                        7,192      8,552     8,558
Differentially expressed genes                        2,210      1,715     1,110
Detected transcripts of AT genes                      5,931      7,046     7,027
Differentially expressed transcripts of AT genes        993        910       574
Detected AT genes                                     2,409      2,866     2,859
Differentially expressed AT genes                       679        543       291
Samples were treated with: Mock, buffer only; Avr, Pst avirulent strain expressing avrRPS4;
Vir, Pst virulent strain DC3000; AT, genes with annotated transcript variants; a total of 31.2,
33.5 and 41.2 million 75 bp paired-end reads were generated for each treatment, respectively.
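The proportion of differentially expressed features in each pair-wise comparison follows directly from the counts in Table 1. The Python sketch below recomputes these proportions for all transcripts and all genes; the dictionary layout and function name are illustrative only.

```python
# (detected, differentially expressed) counts from Table 1
TABLE_1 = {
    "transcripts": {"Mock/Avr": (10_757, 2_474), "Mock/Vir": (12_789, 2_044), "Avr/Vir": (12_783, 1_362)},
    "genes": {"Mock/Avr": (7_192, 2_210), "Mock/Vir": (8_552, 1_715), "Avr/Vir": (8_558, 1_110)},
}


def percent_differential(detected, differential):
    """Differentially expressed features as a percentage of detected features."""
    return 100 * differential / detected


for feature, comparisons in TABLE_1.items():
    for comparison, (detected, differential) in comparisons.items():
        share = percent_differential(detected, differential)
        print(f"{feature:11s} {comparison}: {share:.1f}% differentially expressed")
```

The proportion varies between comparisons and is lowest for Avr/Vir, the only comparison in which both samples were challenged with bacteria.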
Table 2. Changes in the expression of well-characterized defense marker genes induced at 6 hpi by
Pst AvrRPS4 as detected by RNA-Seq (this study) and Affymetrix ATH1 microarray experiments
(Bartsch et al., 2006), respectively.
                         RNA-seq (this study)           Microarray Affymetrix ATH1
                         Expression level   Fold change  Expression level     Fold change
                         (FPKM)             (log2)       (signal intensity)   (log2)
AGI         Gene symbol  Mock     Avr       Avr/Mock     Mock     Avr         Avr/Mock
AT1G02450   NIMIN1          2      78        5.1           39      896         4.5
AT1G16420   MC8             1      24        4.5           14      289         4.4
AT1G19250   FMO1            1      70        7.0            6      835         7.3
AT1G21250   WAK1          196     523        1.4            6       34         2.5
AT1G51660   MKK4           13      91        2.8           79      874         3.5
AT1G70690   HWI1           12     129        3.4            2      862         9.1
AT1G74710   EDS16           4     338        6.3           72     2131         4.9
AT2G41100   TCH3          157    1664        3.4          381     3949         3.4
AT2G45760   BAP2            1     213        7.3           19     2757         7.2
AT2G46400   WRKY46          8     204        4.7          107     2004         4.2
AT3G11820   SYP121         29      97        1.7          184      890         2.3
AT3G25070   RIN4           27      78        1.6          116      486         2.1
AT3G25882   NIMIN-2        34     299        3.1           ND       ND          ND
AT3G26830   PAD3            7     161        4.6           10     1432         7.2
AT3G48090   EDS1           10     164        4.1           73     1158         4
AT3G52430   PAD4           23     613        4.8           25     1526         5.9
AT3G56400   WRKY70        117     961        3.0          466     2885         4.1
AT3G61190   BAP1            4      77        4.3           30     1052         5.2
AT4G12720   NUDT7          40     456        3.5          388     5641         3.9
AT4G23570   SGT1A          27      97        1.9          127      731         2.5
AT4G31800   WRKY18         89     267        1.6          166     1403         3.1
AT4G39030   EDS5           12     525        5.4           62     3073         5.6
AT5G13320   PBS3            6     533        6.4            7     2726         8.6
AT5G15410   DND1           71      24       -1.6          244       34        -2.9
AT5G45250   RPS4            9      17        0.9           ND       ND          ND
AT5G52640   HSP90          18     114        2.6          201     2622         3.7
Table 3. Characterization of the contribution of transcriptome and proteome diversity of the 764 AT
genes with detectable accumulation of all the corresponding transcript variants with a two-fold change
in the expression of at least one variant in response to Pst.
                         Different proteins                  Same protein
                         Same   Different                    Same   Different
                         TSS    TSS        Mix    All        TSS    TSS        Mix    All
Genes                    317    117        122    556        66     103        39     208
Transcripts              798    249        477    1524       149    218        135    502
TSS                      317    249        271    837        66     218        82     366
Proteins                 743    241        340    1324       66     103        39     208
Change in 5' UTR         87     132        250    469        42     115        87     244
Change in 3' UTR         232    37         104    373        50     14         8      72
Change in CDS            424    120        217    761        -      -          -      -
Change in localization   18     38         43     99         -      -          -      -
Change in domain         86     40         51     177        -      -          -      -
IR                       203    29         91     323        48     15         15     78
ES                       39     23         59     121        5      1          5      11
AJ                       263    44         140    447        30     11         41     82
TSS, transcription start site; UTR, untranslated region; CDS, coding sequence; IR, intron retention; ES,
exon skipping; AJ, alternative splice junction site; Mix, genes with more than two transcript variants and
only some of them with the same TSS.
Table 4. GO Term enrichment of the 764 AT genes with detected expression of all the annotated
transcript variants and with at least one variant with a twofold change in abundance in response to Pst
                                                 Sample      Background
GO term ID   Description                         frequency   frequency    P-value
Biological process
GO:0050896   response to stimulus                36%         16%          4.88E-37
GO:0006950   response to stress                  22%         8%           1.20E-28
GO:0009628   response to abiotic stimulus        17%         5%           2.02E-29
GO:0051707   response to other organism          9%          3%           8.78E-16
GO:0010035   response to inorganic substances    8%          2%           8.63E-15
GO:0009415   response to water                   4%          1%           4.87E-11
GO:0006952   defense response                    9%          3%           3.52E-10
GO:0009725   response to hormone                 9%          4%           4.12E-08
GO:0009753   response to jasmonic acid           3%          1%           2.12E-05
GO:0009737   response to abscisic acid           4%          1%           3.19E-04
GO:0006979   response to oxidative stress        4%          1%           5.44E-05
GO:0008152   metabolic process                   47%         35%          1.28E-08
GO:0044281   small molecule metabolic process    11%         5%           8.69E-08
GO:0006082   organic acid metabolic process      7%          3%           8.71E-07
Cellular component
GO:0005737   Cytoplasm                           46%         21%          7.05E-56
GO:0043226   Organelle                           46%         23%          6.10E-42
GO:0009507   Chloroplast                         23%         7%           4.89E-41
GO:0016020   Membrane                            33%         15%          9.51E-30
GO:0005773   Vacuole                             9%          3%           4.72E-14
GO:0030054   cell junction                       9%          3%           3.16E-13
GO:0048046   Apoplast                            5%          1%           2.26E-11
Molecular function
GO:0003824   catalytic activity                  45%         28%          5.27E-22
GO:0005515   protein binding                     18%         8%           6.58E-18
GO:0005488   binding activity                    45%         33%          1.23E-09
GO:0005516   calmodulin binding                  3%          1%           1.15E-07
GO:0016491   oxidoreductase activity             10%         5%           7.72E-06
GO:0016829   lyase activity                      4%          1%           4.55E-04
GO:0015267   channel activity                    2%          0.4%         1.73E-03
Table 6. Pst-induced changes in Arabidopsis Col-0 whole-gene and individual transcript isoform
expression levels as detected by RNA-Seq and qRT-PCR experiments (this study).

| AGI | Gene symbol | Transcript | FPKM Mock | FPKM Avr | FPKM Vir | FPKM fold change * A/M a | FPKM fold change * V/M b | 2 ^∆∆Ct ** A/M | 2 ^∆∆Ct ** V/M |
|---|---|---|---|---|---|---|---|---|---|
| At3g26830 | PAD3 | WG | 7 | 161 | 39 | 23 | 5.6 | 21.4 ± 0.3 | 1.9 ± 0.3 |
| At3g48090 | EDS1 | WG | 9.6 | 164 | 64 | 17 | 6.7 | 15.9 ± 0.4 | 3.2 ± 0.4 |
| At3g52430 | PAD4 | WG | 23 | 613 | 195 | 27 | 8.5 | 10.5 ± 0.2 | 3.7 ± 0.3 |
| At1g09140 | SR30 | iso.1 | 38 | 37 | 25 | 1 | 0.7 | 1 ± 0.2 | 0.7 ± 0.2 |
| | | iso.2 | 25 | 21 | 23 | 0.8 | 0.9 | 0.6 ± 0.2 | 0.8 ± 0.2 |
| | | WG | 63 | 58 | 48 | 0.9 | 0.8 | 1.1 ± 0.1 | 1.4 ± 0.2 |
| At1g51620 | protein kinase | iso.1 | 3 | 7 | 15 | 2.4 | 5.2 | 3.2 ± 0.1 | 5.8 ± 0.2 |
| | | iso.2 | 5 | 8 | 11 | 1.7 | 2.3 | 3.2 ± 0.2 | 5.2 ± 0.2 |
| | | WG | 8 | 15 | 26 | 2 | 3.4 | ND | ND |
| At2g14560 | LURP1 | iso.1 | 309 | 599 | 254 | 1.9 | 0.8 | 5.6 ± 0.1 | 0.8 ± 0.3 |
| | | iso.2 | 473 | 970 | 385 | 2.1 | 0.8 | 2.6 ± 0.1 | 0.3 ± 0.3 |
| | | WG | 782 | 1569 | 639 | 2 | 0.8 | 3.1 ± 0.1 | 0.4 ± 0.4 |
| At2g22090 | UBA1A | iso.1 | ND | ND | ND | ND | ND | 0.7 ± 0.2 | 0.8 ± 0.4 |
| | | iso.2 | ND | ND | ND | ND | ND | 0.5 ± 0.3 | 0.7 ± 0.5 |
| | | WG | ND | ND | ND | ND | ND | 0.7 ± 0.3 | 0.9 ± 0.4 |
| At2g46610 | RNA-binding | iso.1 | 3 | 3 | 4 | 1 | 1.4 | 0.8 ± 0.2 | 0.7 ± 0.2 |
| | | iso.2 | 3 | 4 | 5 | 1.1 | 1.1 | 1.2 ± 0.1 | 1.4 ± 0.2 |
| | | WG | 6 | 7 | 9 | 1 | 1.2 | 1 ± 0.2 | 1.2 ± 0.5 |
| At3g23280 | XB3 homolog | iso.1 | 57 | 99 | 87 | 1.7 | 1.5 | 2.3 ± 0.1 | 1.9 ± 0.2 |
| | | iso.2 | 47 | 78 | 51 | 1.6 | 1.1 | 2.1 ± 0.1 | 1.5 ± 0.2 |
| | | WG | 104 | 177 | 138 | 1.7 | 1.3 | ND | ND |
| At3g24350 | syntaxin | iso.1 | 14 | 16 | 13 | 1.2 | 0.9 | 1.3 ± 0.2 | 0.8 ± 0.2 |
| | | iso.2 | 6 | 7 | 8 | 1.2 | 1.3 | 0.9 ± 0.2 | 0.5 ± 0.2 |
| | | WG | 20 | 24 | 21 | 1.2 | 1 | 1.2 ± 0.1 | 1.3 ± 0.2 |
| At3g57340 | HSP | iso.1 | 18 | 23 | 20 | 1.3 | 1.1 | 1.2 ± 0.2 | 1.1 ± 0.1 |
| | | iso.2 | 11 | 10 | 10 | 0.9 | 0.9 | 1 ± 0.2 | 0.8 ± 0.2 |
| | | WG | 29 | 33 | 30 | 1.1 | 1 | 0.9 ± 0.2 | 1 ± 0.4 |
| At4g02600 | MLO1 | iso.1 | 9 | 10 | 19 | 1.1 | 2 | 0.7 ± 0.2 | 0.9 ± 0.1 |
| | | iso.2 | 8 | 5 | 4 | 0.7 | 0.5 | 0.9 ± 0.2 | 1.2 ± 0.3 |
| | | WG | 17 | 16 | 23 | 0.9 | 1.3 | 0.6 ± 0.2 | 0.8 ± 0.5 |
| At4g25500 | SR40 | iso.1 | 28 | 22 | 26 | 0.8 | 0.9 | 0.7 ± 0.2 | 1 ± 0.5 |
| | | iso.2 | 15 | 6 | 11 | 0.4 | 0.7 | 0.5 ± 0.6 | 0.7 ± 0.5 |
| | | iso.3 | 9 | 6 | 10 | 0.7 | 1.1 | 0.6 ± 0.2 | 1.1 ± 0.5 |
| | | iso.4 | 6 | 5 | 7 | 1 | 1.1 | 0.8 ± 0.2 | 1.4 ± 0.4 |
| | | WG | 57 | 39 | 53 | 1.1 | 1.2 | 0.6 ± 0.2 | 0.8 ± 0.5 |
Table 6 (continued)

| AGI | Gene symbol | Transcript | FPKM Mock | FPKM Avr | FPKM Vir | FPKM fold change * A/M a | FPKM fold change * V/M b | 2 ^∆∆Ct ** A/M | 2 ^∆∆Ct ** V/M |
|---|---|---|---|---|---|---|---|---|---|
| At4g29210 | GGT4 | iso.1 | 22 | 30 | 31 | 1.4 | 1.4 | 1.2 ± 0.1 | 1 ± 0.2 |
| | | iso.2 | 2 | 4 | 8 | 2.2 | 4.7 | 4.1 ± 0.4 | 6.2 ± 0.3 |
| | | WG | 23 | 33 | 40 | 1.4 | 1.7 | 1 ± 0.2 | 1.1 ± 0.4 |
| At4g38810 | Ca-binding | iso.1 | 23 | 6 | 10 | 0.3 | 0.5 | 0.5 ± 0.2 | 0.6 ± 0.3 |
| | | iso.2 | 37 | 15 | 34 | 0.4 | 0.9 | 0.3 ± 0.2 | 0.8 ± 0.3 |
| | | WG | 59 | 21 | 44 | 0.4 | 0.8 | 0.3 ± 0.2 | 1.5 ± 0.1 |
| At5g04430 | BTR1S | iso.1 | 40 | 19 | 23 | 0.5 | 0.6 | 0.5 ± 0.1 | 0.7 ± 0.2 |
| | | iso.2 | 24 | 18 | 24 | 0.7 | 1 | 0.6 ± 0.2 | 0.9 ± 0.2 |
| | | WG | 64 | 37 | 47 | 0.6 | 0.7 | 0.4 ± 0.2 | 0.6 ± 0.4 |
| At5g04830 | NTF2 | iso.1 | 2 | 1 | 3 | 0.9 | 1.8 | 0.6 ± 0.3 | 1.7 ± 0.1 |
| | | iso.2 | 55 | 30 | 41 | 0.6 | 0.8 | 0.6 ± 0.2 | 1.1 ± 0.2 |
| | | WG | 56 | 32 | 44 | 0.6 | 0.8 | 0.5 ± 0.1 | 0.9 ± 0.2 |
| At5g17760 | P-loop | iso.1 | 20 | 255 | 128 | 13 | 6.5 | 14.9 ± 0.2 | 5.3 ± 0.2 |
| | | iso.2 | 1 | 33 | 25 | 29.6 | 22.7 | 23.4 ± 0.2 | 18.9 ± 0.2 |
| | | WG | 21 | 288 | 154 | 13.8 | 7.4 | 16 ± 0.1 | 8.7 ± 0.1 |

WG, whole gene expression; iso.1, isoform 1 expression; * linear fold change; ** fold change by qRT-PCR relative to the reference gene PP2AA3 (At1g13320); a, pairwise comparison avirulent/mock
treatment; b, pairwise comparison virulent/mock treatment.
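
The linear fold changes reported in Table 6 are ratios of treatment FPKM to mock FPKM. The following Python sketch illustrates that arithmetic on a few rows of Table 6; the helper names `fold_change` and `format_fold` are hypothetical and are not part of the Cufflinks/Cuffdiff output, and entries reported as ND are simply carried through. Because the tabulated FPKM values are rounded, recomputed ratios may differ slightly from the printed ones.

```python
from typing import Optional


def fold_change(treated: Optional[float], mock: Optional[float]) -> Optional[float]:
    """Linear fold change = treated FPKM / mock FPKM; None when it is undefined."""
    if treated is None or mock is None or mock == 0:
        return None
    return treated / mock


def format_fold(value: Optional[float]) -> str:
    """Render a fold change with one decimal, or ND when it is not determined."""
    return "ND" if value is None else f"{value:.1f}"


# (Mock, Avr, Vir) whole-gene FPKM values copied from Table 6; None stands for ND.
fpkm = {
    "At3g26830 (PAD3)": (7, 161, 39),
    "At3g48090 (EDS1)": (9.6, 164, 64),
    "At2g14560 (LURP1)": (782, 1569, 639),
    "At2g22090": (None, None, None),
}

for gene, (mock, avr, vir) in fpkm.items():
    a_m = format_fold(fold_change(avr, mock))  # avirulent / mock
    v_m = format_fold(fold_change(vir, mock))  # virulent / mock
    print(f"{gene}: A/M = {a_m}, V/M = {v_m}")
```
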
Table 7. List of primers used in the qRT-PCR experiments.

| AGI | Gene product | Primer | Primer sequence |
|---|---|---|---|
| At1g01550 | BPS1 | WG-F | 5'-CTTCTCTCAGGCGAATTCAAGTTTC-3' |
| | | WG-R | 5'-GAAGGGGTTTCCAAAAGGGAAG-3' |
| | | Iso.1-F | 5'-ACGGCAACTTTTGTGGTTAGTC-3' |
| | | Iso.1-R | 5'-GTGGACGAGCCATTCTGATAAGTTTC-3' |
| | | Iso.2-F | 5'-TGCTCAGAAACTTCTCTCAGGCGA-3' |
| | | Iso.2-R | 5'-TCTTGTGGACGAGCCATTCTGA-3' |
| At1g13320 | PP2AA3 | WG-F | 5'-TGGAGACTCTATCCACTGTTGAGG-3' |
| | | WG-R | 5'-ATGCTGATACTCTGCCTGTGAACC-3' |
| At1g51620 | Protein kinase | WG-F | 5'-CATTGGATATTGCGATGATGGT-3' |
| | | WG-R | 5'-ACCATTTGCCACGAATTCG-3' |
| | | Iso.1-F | 5'-TCCAGTAACTTGGTTTTTTCTTACCA-3' |
| | | Iso.1-R | 5'-TCGTCTGCATCACCTGCAA-3' |
| | | Iso.2-F | 5'-TGCATTCTAGGCGATGACAGA-3' |
| | | Iso.2-R | 5'-TCTGCATCACAGCAAATGAACTT-3' |
| At1g76040 | CPK29 | WG-F | 5'-TCAGTGGTGTTCCTCCATTTTGGG-3' |
| | | WG-R | 5'-GGCCAAGGAGAAGTTTCAAGATCC-3' |
| | | Iso.1-F | 5'-AGACTTTTGAGAGTGGAGAGCTG-3' |
| | | Iso.1-R | 5'-TTCAGGTGCAACGTAGTATGCG-3' |
| | | Iso.2-F | 5'-TCTTCCTCTTCCGATTCAAGCC-3' |
| | | Iso.2-R | 5'-AGGTGGGATTACTGGTTTGACC-3' |
| At2g14560 | LURP1 | WG-F | 5'-GGAGGGTGCTCTACTATACACGGTG-3' |
| | | WG-R | 5'-GTAGACAACACAAGAGTCGTCGAGC-3' |
| | | Iso.1-F | 5'-TCCAAGCACTAAAGCACCATCACC-3' |
| | | Iso.1-R | 5'-AGACATGATTCTCGCTGGCATGAG-3' |
| | | Iso.2-F | 5'-GCTCGACGACTCTTGTGTTGTCTAC-3' |
| | | Iso.2-R | 5'-GTTTGTTTACGTGGGCAATATGGTGTC-3' |
| At2g22090 | UBA1A, RNA-binding | WG-F | 5'-AATCATGGACAGCCTCAGCAA-3' |
| | | WG-R | 5'-TCCACCACCGTTAAACACATGT-3' |
| | | Iso.1-F | 5'-AGAGGGACGGGTAAAGATTCG-3' |
| | | Iso.1-R | 5'-CCCACACCATCATGAATAGTTTGA-3' |
| | | Iso.2-F | 5'-TCCTCCTTTCATGGCTACTCAAA-3' |
| | | Iso.2-R | 5'-CATTGCTACAAAACACCAAAGGTT-3' |
| At2g46610 | RS31A arginine/serine-rich splicing factor | WG-F | 5'-CGCGATGCTGAAGATGCA-3' |
| | | WG-R | 5'-CGTCGCCCATATCCAAAAGT-3' |
| | | Iso.1-F | 5'-CGGATCTTGAAAGATTGTTCA-3' |
| | | Iso.1-R | 5'-AAGCATAACCAGACTTCATAT-3' |
| | | Iso.2-F | 5'-CTTCTTATGTACACCAGCCTTCACA-3' |
| | | Iso.2-R | 5'-GCGCTCATCCTCAAAATACACA-3' |
| At3g13224 | RNA-binding RRM/RBD/RNP motifs | WG-F | 5'-GTTCACCTCCTTCTTACGGTAGTCAC-3' |
| | | WG-R | 5'-GGTCCACCATAACTTGCATAACTATCG-3' |
| | | Iso.1-F | 5'-CAAATGTAAGGAGCTTGTCACAGTAAGATAG-3' |
| | | Iso.1-R | 5'-GCTTTCTTGATCTCCACCTACAACAAAG-3' |
| | | Iso.2-F | 5'-CAGACACTCAGGTGGAGATCAAGAAAG-3' |
| | | Iso.2-R | 5'-GTGACTACCGTAAGAAGGAGGTGAAC-3' |
| At3g23280 | XBAT35 ubiquitin ligase | WG-F | 5'-GCTGCTTCTCTTCCAGCTTCA-3' |
| | | WG-R | 5'-CGTTCCAGTGTTCCCGTCTT-3' |
| | | Iso.1-F | 5'-CGGGAGACGTAGGCCTCAA-3' |
| | | Iso.1-R | 5'-CTTCAGTCGAAGGTGCGAATT-3' |
| | | Iso.2-F | 5'-TGATAACTCCACAAAAACACGTCTTAA-3' |
| | | Iso.2-R | 5'-CCACTTTAGCTGTTGGCTGTCA-3' |
| At3g26830 | PAD3 | WG-F | 5'-TCCGGTGAATCTTGAGAGAGCCAT-3' |
| | | WG-R | 5'-ACAGTTTCTCTTATGACCGCT-3' |
| At3g48090 | EDS1 | WG-F | 5'-TCAGTCACGCACTTGGGAGAGAAA-3' |
| | | WG-R | 5'-CCAATTGGGCAACATGAGGCA-3' |
| At3g52430 | PAD4 | WG-F | 5'-ACGAGTGAGTTGCAAGCTTCGGTA-3' |
| | | WG-R | 5'-CCAATTGGGCAACATGAGGCA-3' |
| At3g57340 | heat shock protein DnaJ | WG-F | 5'-GAACATGGATGGAAACAAAGACGATGC-3' |
| | | WG-R | 5'-GATCAAGACGACGAGCTTTAGCCAG-3' |
| | | Iso.1-F | 5'-TGTTGTTACCTTCCCATTAGTCATCTTG-3' |
| | | Iso.1-R | 5'-GTTTCCATCCATGTTCACCCTCACG-3' |
| | | Iso.2-F | 5'-GTGAGCAGAAACCCTAATTTCGTACAG-3' |
| | | Iso.2-R | 5'-CTAGGGCTCGCTTGTTCTGATTTC-3' |
| At4g25500 | RSP35 arginine/serine-rich splicing factor | WG-F | 5'-CCTTCTTTGTTTTTCTGGAAGCA-3' |
| | | WG-R | 5'-ATTATCAAACACATCCAGCTTTC-3' |
| | | Iso.1-F | 5'-CTACAGGAAGCATGAAGCCAGTCT-3' |
| | | Iso.1-R | 5'-GCAAACCCAGCTTTCATATCAA-3' |
| | | Iso.2-F | 5'-CTCAGATCTTTGCTTTTGGCTT-3' |
| | | Iso.2-R | 5'-TCCACGTTCACTCTTTGTCCAT-3' |
| At4g29210 | GGT4 gamma-glutamyltransferase | WG-F | 5'-AAGGCTCACGCGGAAGAA-3' |
| | | WG-R | 5'-TGGCTCCACCGGTTCATATAG-3' |
| | | Iso.1-F | 5'-CTCGTGATCACCAAGGATGGT-3' |
| | | Iso.1-R | 5'-TGAAGCACCGCTGGAATTATATG-3' |
| | | Iso.2-F | 5'-TCACCAAGGTTCACAATCAATATATATCTA-3' |
| | | Iso.2-R | 5'-CAACCGCGGGTTTCCAT-3' |
| At4g38810 | calcium-binding | WG-F | 5'-CACTCATGGGAGTCAAGAGAAAGTGAG-3' |
| | | WG-R | 5'-GTTTCAGACCAGCAGCCATACC-3' |
| | | Iso.1-F | 5'-ACCCTCTAGCTGTCTTATCCGTACA-3' |
| | | Iso.1-R | 5'-GCTCACTTTCTCTTGACTCCCATG-3' |
| | | Iso.2-F | 5'-CAACCACCAGCAACAGCAACAAG-3' |
| | | Iso.2-R | 5'-GAGAGCTTACCATCTTCATCTTGGTCTAAC-3' |
| At5g04430 | BTR1, BINDING TO TOMV RNA 1 | WG-F | 5'-ACATTATCTGGGACCTTCGAGGAG-3' |
| | | WG-R | 5'-TATGGGGAATGCACATTCTGGGAG-3' |
| | | Iso.1-F | 5'-TCATATGCAGCGGGATACAATTCG-3' |
| | | Iso.1-R | 5'-CTTGTGGTTTTGATACTTGCCTCCAGAAC-3' |
| | | Iso.2-F | 5'-TCTACTCTGGTTTTCATGGTCCTCCA-3' |
| | | Iso.2-R | 5'-GTGGTTTTGATACTTGCCTCCAGAAC-3' |
| At5g17760 | P-loop containing nucleoside triphosphate hydrolases | WG-F | 5'-GACTATGGTCAAGCTGTGGAGACG-3' |
| | | WG-R | 5'-TGCATATCCATACGTCCTGGTCTAAG-3' |
| | | Iso.1-F | 5'-GAGTCTCAGGGACCATTGACGTTATC-3' |
| | | Iso.1-R | 5'-CGTCTCCACAGCTTGACCATAGTC-3' |
| | | Iso.2-F | 5'-GTGTAACAAGACTGACTCTGATTAGTATAGAG-3' |
| | | Iso.2-R | 5'-CGTCTCCACAGCTTGACCATAGTC-3' |
| At5g41750 | TIR-NBS-LRR, disease resistance protein | WG-F | 5'-AGACTCGACATGACTGGTTGCT-3' |
| | | WG-R | 5'-TTGAGGCTTCTGCTGCCAATGT-3' |
| | | Iso.1-F | 5'-AGGGACCATCTGCTCTTCTCATGT-3' |
| | | Iso.1-R | 5'-AGATAAGGCTGCTCCAATCCGT-3' |
| | | Iso.2-F | 5'-GCCTGAAGCTGTTAAGCTCTGT-3' |
| | | Iso.2-R | 5'-TCCTACGACCTAAGTTGCATGT-3' |

WG, whole gene, primers amplify regions common to all transcript isoforms; Iso., isoform specific primers
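
The qRT-PCR fold changes in Table 6 are expressed relative to the reference gene PP2AA3 (At1g13320), following the comparative Ct approach of Livak and Schmittgen (2001). A minimal Python sketch of that calculation is given below. The function name `relative_expression` and all Ct values are hypothetical illustrations rather than data from this study, and the formula assumes roughly equal (about 100%) amplification efficiency for the target and reference primer pairs.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_mock: float, ct_ref_mock: float) -> float:
    """Relative expression by the comparative Ct method (2 to the power -ddCt)."""
    # Normalize the target Ct to the reference gene within each sample.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_mock = ct_target_mock - ct_ref_mock
    # Compare the treated sample with the mock-treated calibrator.
    delta_delta = delta_treated - delta_mock
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses the threshold four cycles earlier
# (relative to the reference gene) in the treated sample than in the mock sample.
fold = relative_expression(ct_target_treated=22.0, ct_ref_treated=20.0,
                           ct_target_mock=26.0, ct_ref_mock=20.0)
print(f"Relative expression (treated/mock): {fold:.1f}")
```

A value above 1 indicates induction relative to mock, and a value below 1 indicates repression, which is how the A/M and V/M qRT-PCR columns of Table 6 are read.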
4. DISCUSSION
RNA sequencing-mediated genome-wide analysis of the Arabidopsis transcriptome is
unveiling a large extent of alternative transcription and AS differentially regulated under
stress, including challenge with the bacterial pathogen Pst. This indicates the
existence of intertwined regulatory mechanisms, acting at the transcriptional and post-transcriptional levels, that lead to the accumulation of transcript variants. Prior to the use of
RNA-Seq, large-scale detection of AS in plants was primarily performed by aligning
available EST and full-length cDNA sequences, which generally ensured only limited
coverage of the alternative transcript events (Sablok et al., 2011). The availability of
sequencing platforms able to generate a large number of longer paired-end reads constitutes a
powerful tool for the interrogation of even a complex and incompletely annotated
transcriptome, allowing the detection of rare transcripts and regulatory events like trans-splicing (Hallegger et al., 2010; Zhang et al., 2010). In fact, as the read length increases, the
number of splice junction-spanning reads and the accuracy of spliced alignments also increase
(Wang et al., 2010; Jean et al., 2010). However, RNA-Seq, which is being increasingly
employed owing to its progressively declining costs, needs to be complemented with
advanced computational tools able to mine the extensive data generated for biologically
relevant inferences such as changes in the abundance of alternative transcripts (Chodavarapu et
al., 2010; Marguerat and Bahler, 2010).
The characterization of Pst-induced gene expression profiles obtained via our RNA-Seq
experiments indicated that the Pst-responsive genes play a significant role in Arabidopsis
defense. In particular, we observed a high level of correlation between our results and those
previously published by other research groups and obtained via Affymetrix microarray
analysis (Bartsch et al., 2006). Furthermore, we verified a strong correlation between our
RNA-Seq and qRT-PCR results, thus validating the reliability of our methods in detecting
whole gene and individual transcript abundance changes. The majority of the detected whole
gene and individual transcript expression values were in the range of 10-100 FPKM and most
of the expression changes were in the range of two-fold. The pair-wise comparisons we
conducted uncovered subsets of genes and individual transcript variants differentially
regulated in response to the virulent treatment, the avirulent treatment, or both. We decided to
focus our subsequent analysis on a group of 764 AT genes for which we were able to obtain
expression values for all the transcript isoforms annotated in TAIR10 and which had at
least one isoform with a two-fold change upon Pst challenge. The functional characterization
of these genes displayed significant enrichment, among others, for response to stress and
signal transduction components, thus confirming the extent and functional significance of the
uncovered Pst-induced transcriptome changes. Among those, we distinguished the genes whose
transcript isoforms code for different proteins (556 genes) from the genes
whose transcript isoforms are predicted to generate the same polypeptide (208 genes).
Within these two groups, we then separated the genes whose transcript
variants all share the same TSS, and so are apparently generated at the
post-transcriptional level by AS only; the genes whose transcript variants
all have different TSS; and the genes with more than two
isoforms each, which are generated at the transcriptional and/or post-transcriptional level.
Finally, we manually annotated the types of AS occurring, the changes in
the UTRs and CDSs, and in particular the instances of loss of entire conserved domains or
of new cellular localization in the predicted proteins, thus providing a well-defined snapshot of
the transcriptome and proteome diversity detectable by analyzing the transcriptome at the
transcript isoform level. The finding that intron retention (IR) was a more abundant form of
AS than exon skipping (ES) is consistent with previous findings in other plant species, although
the underlying mechanism is still unknown (Kalyna et al., 2012).
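
The selection and grouping steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in this study: the `Isoform` record, the function names, the TSS coordinates, and the FPKM values are all hypothetical; "detectable" is assumed to mean FPKM greater than zero in every treatment; and a two-fold change is taken in either direction relative to mock.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Isoform:
    tss: int        # annotated transcription start site (hypothetical coordinate)
    protein: str    # label of the predicted polypeptide
    mock: float     # FPKM, mock treatment
    avr: float      # FPKM, avirulent Pst
    vir: float      # FPKM, virulent Pst


def detected(iso: Isoform) -> bool:
    # Assumption: "detectable" means FPKM > 0 in all three treatments.
    return min(iso.mock, iso.avr, iso.vir) > 0


def two_fold(iso: Isoform) -> bool:
    # At least a two-fold increase or decrease versus mock in either comparison.
    ratios = (iso.avr / iso.mock, iso.vir / iso.mock)
    return any(r >= 2.0 or r <= 0.5 for r in ratios)


def selected(isoforms: List[Isoform]) -> bool:
    """AT gene with all isoforms detected and at least one changing two-fold."""
    return (len(isoforms) >= 2
            and all(detected(i) for i in isoforms)
            and any(two_fold(i) for i in isoforms))


def classify(isoforms: List[Isoform]) -> Tuple[str, str]:
    """Group a gene by TSS usage and by predicted protein diversity."""
    n_tss = len({i.tss for i in isoforms})
    if n_tss == 1:
        origin = "same TSS (AS only)"
    elif n_tss == len(isoforms):
        origin = "different TSS"
    else:
        origin = "mixed (alternative TSS and AS)"
    n_proteins = len({i.protein for i in isoforms})
    coding = "different proteins" if n_proteins > 1 else "same protein"
    return origin, coding


gene_a = [Isoform(1000, "P1", 9, 10, 19), Isoform(1000, "P2", 8, 5, 4)]
gene_b = [Isoform(500, "P1", 20, 22, 21), Isoform(740, "P1", 5, 6, 5)]
for name, gene in (("gene_a", gene_a), ("gene_b", gene_b)):
    print(name, selected(gene), classify(gene))
```
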
We verified conservation of the alternative exon-intron architectures in several homologous
maize and rice genes, which further corroborates the evolutionary significance of the
accumulation of transcript variants.
Our analysis revealed Pst-induced “shifts” and “switches” in the relative abundance of the
alternative transcripts generated from the same locus, which are associated with the activity of
probable alternative promoters or with differential regulation of AS. We also found
instances of alternative TSS usage and AS involving the same transcript isoform, indicating
the complexity of the interplay between transcriptional and post-transcriptional gene
regulation.
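
One way to make the notion of a change in relative isoform abundance concrete is to express each isoform as a fraction of the summed isoform FPKM at a locus and compare the fractions between treatments. The Python sketch below does this for isoform FPKM values taken from Table 6. The operational definitions are assumptions made for illustration only: a "switch" is called when the most abundant isoform changes, and a "shift" when the dominant isoform is unchanged but some isoform fraction moves by at least `min_shift` (a hypothetical threshold of 0.10).

```python
from typing import Dict


def isoform_fractions(fpkm: Dict[str, float]) -> Dict[str, float]:
    """Fraction of the locus total contributed by each isoform."""
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("no detectable expression at this locus")
    return {name: value / total for name, value in fpkm.items()}


def classify_change(mock: Dict[str, float], treated: Dict[str, float],
                    min_shift: float = 0.10) -> str:
    """Label a locus as 'switch', 'shift' or 'no change' (illustrative definitions)."""
    f_mock = isoform_fractions(mock)
    f_treated = isoform_fractions(treated)
    dominant_mock = max(f_mock, key=f_mock.get)
    dominant_treated = max(f_treated, key=f_treated.get)
    if dominant_mock != dominant_treated:
        return "switch"
    largest = max(abs(f_treated[k] - f_mock[k]) for k in f_mock)
    return "shift" if largest >= min_shift else "no change"


# Isoform FPKM values (mock versus virulent Pst) from Table 6.
mlo1_mock, mlo1_vir = {"iso.1": 9, "iso.2": 8}, {"iso.1": 19, "iso.2": 4}
sr30_mock, sr30_vir = {"iso.1": 38, "iso.2": 25}, {"iso.1": 25, "iso.2": 23}
print("At4g02600 (MLO1):", classify_change(mlo1_mock, mlo1_vir))
print("At1g09140 (SR30):", classify_change(sr30_mock, sr30_vir))

# Hypothetical locus in which the dominant isoform is exchanged.
print("hypothetical:", classify_change({"iso.1": 30, "iso.2": 10},
                                       {"iso.1": 10, "iso.2": 30}))
```
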
Future work may include the analysis of the transcriptional activity of alternative promoters
by fusing the region immediately upstream of the 5' UTRs with a reporter gene. Functional
analysis of individual transcript variants could also be carried out via Agrobacterium-mediated
transformation of available knock-out mutants. Another interesting analysis
arising from our results would be a more in-depth study, by global proteome shotgun
sequencing, of the regulatory effects of AS and alternative promoters that we propose. This
method uses a combination of fragmentation and separation to identify individual proteins
from a complex mixture. After proteolytic cleavage, the resulting peptide fragments are
separated by high performance liquid chromatography (HPLC) and subsequently
identified by tandem mass spectrometry.
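
To illustrate how a shotgun approach could discriminate protein isoforms, the following Python sketch performs an in silico digestion and lists the peptides unique to each isoform, since only such peptides would provide isoform-specific evidence in tandem mass spectrometry. The choice of trypsin (cleavage after K or R, except before P) is an assumption, as the protease is not specified here, and the two short sequences are hypothetical, not predicted Arabidopsis proteins.

```python
from typing import List


def tryptic_digest(protein: str) -> List[str]:
    """Cleave C-terminal to K or R unless the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


# Hypothetical isoforms sharing an N-terminus but differing downstream.
iso1 = "MSTKLLAGRVEEK"
iso2 = "MSTKLLAGRPQDWK"

peptides1 = set(tryptic_digest(iso1))
peptides2 = set(tryptic_digest(iso2))

print("shared:", sorted(peptides1 & peptides2))
print("unique to iso1:", sorted(peptides1 - peptides2))
print("unique to iso2:", sorted(peptides2 - peptides1))
```
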
In summary, the RNA-seq data generated and analyzed from Pst-infected Arabidopsis Col-0
plants provided a novel and effective means for the global analysis of the transcriptome at the
transcript isoform level. Treatment-specific expressed genes and transcripts were
successfully identified, and pair-wise comparisons helped us narrow down the subset of
genes (and transcripts) that were differentially expressed. Hence, the identification of the
genes and transcript isoforms potentially involved in defense against Pst greatly enhances our
comprehension of the complex cellular and molecular events that regulate this process, and
also provides a platform for subsequent functional analyses.
To conclude, our research has revealed genome-wide isoform-specific expression patterns
associated with defense responses. Furthermore, we have uncovered changes in the UTRs
and CDSs due to alternative TSS/promoters and the presence of putative cis-regulatory
elements of AS. These findings have major implications for gene functional analysis
(promoter analysis, protein interactions, localization and gain/loss of function). The fact that
several of the transcript variants were conserved across distant plant species (e.g. rice and
corn) opens the door to translating these findings to other major crops to enhance their
resistance to diseases.
5. REFERENCES
Ali, G.S., and Reddy, A.S.N. (2008). Regulation of alternative splicing of pre-mRNAs by
stresses. Curr Top Microbiol Immunol. 326, 257-275
Ali GS, Palusa SG, Golovkin M, Prasad J, Manley JL, Reddy AS. (2007). Regulation of
plant developmental processes by a novel splicing factor. PloS One. 2(5); e471
Ansorge, W.J. (2009). Next generation DNA sequencing techniques. N Biotechnol. 25,
195-203.
Asensi-Fabado, M.A., Cela, J., Muller, M., Arrom, L., Chang, C.R., and Munne-Bosch,
S. (2012). Enhanced oxidative stress in the ethylene-insensitive (ein3-1) mutant of
Arabidopsis thaliana exposed to salt stress. J Plant Physiol. 169, 360-368.
Baek, J.M., Han, P., Iandolino, A., and Cook, D.R. (2008). Characterization and
comparison of intron structure and alternative splicing between Medicago truncatula,
Populus trichocarpa, Arabidopsis and rice. Plant Mol Biol. 67, 499-510.
Barash, Y., Calarco, J.A., Gao, W.J., Pan, Q., Wang, X.C., Shai, O., Blencowe, B.J., and
Frey, B.J. (2010). Deciphering the splicing code. Nature 465, 53-59.
Barbazuk, W.B., Fu, Y., and McGinnis, K.M. (2008). Genome-wide analyses of
alternative splicing in plants: Opportunities and challenges. Genome Res. 18, 1381-1392.
Barta A, Kalyna M, Lorković ZJ. (2008). Plant SR proteins and their functions. Curr Top
Microbiol Immunol 326, 83-102
Bartsch M, Gobbato E, Bednarek P, Debey S, Schultze JL, Bautor J, Parker JE. (2006).
Salicylic acid-independent ENHANCED DISEASE SUSCEPTIBILITY1 signaling in
Arabidopsis immunity and cell death is regulated by the monooxygenase FMO1 and
the Nudix hydrolase NUDT7. Plant Cell 18(4), 1038-51.
Batut, P., Dobin, A., Plessy, C., Carninci, P., Gingeras, TR. (2012) High-fidelity promoter
profiling reveals widespread alternative promoter usage and transposon-driven
developmental gene expression. Genome Res doi: 10.1101/gr.139618.112
Berr, A., Menard, R., Heitz, T., Shen, W.H. (2012). Chromatin modification and
remodeling: a regulatory landscape for the control of Arabidopsis defense responses
upon pathogen attack. Cell Microbiol 14(6), 829-839.
Bicknell, AA., Cenik, C., Chua, H-N., Roth, FP, Moore, MJ. (2012) Introns in UTRs: why
we should stop ignoring them. BioEssays 34(12), 1025-1034.
Bocobza, S., Adato, A., Mandel, T., Shapira, M., Nudler, E., and Aharoni, A. (2007).
Riboswitch-dependent gene regulation and its evolution in the plant kingdom. Genes
Dev. 21, 2874-2879.
Boller T, Felix G. (2009). A renaissance of elicitors: perception of microbe-associated
molecular patterns and danger signals by pattern-recognition receptors. Annu Rev
Plant Biol. 60; 379-406
Bustin SA. (2002). Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): trends and problems. J Mol Endocrinol 29(1), 23-39
Cabili, M.N., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A., and Rinn, J.L. (2011).
Integrative annotation of human large intergenic non-coding RNAs reveals global
properties and specific subclasses. Genes Dev. 25, 1915-1927.
Calkhoven CF, Muller C, Martin R, Krosl G, Pietsch H, Hoang T, Leutz A. (2003).
Translational control of SCL-isoform expression in hematopoietic lineage choice.
Genes Dev 17(8), 959-64.
Campbell, M.A., Haas, B.J., Hamilton, J.P., Mount, S.M., and Buell, C.R. (2006).
Comprehensive analysis of alternative splicing in rice and comparative analyses with
Arabidopsis. BMC Genomics 7, 327-343.
Caplan, J., Padmanabhan, M., and Dinesh-Kumar, S.P. (2008). Plant NB-LRR immune
receptors: from recognition to transcriptional reprogramming. Cell Host Microbe. 3,
126-135.
Castells E, Puigdomènech P, Casacuberta JM. (2006). Regulation of the kinase activity of
the MIK GCK-like MAPK4 by alternative splicing. Plant Mol Biol 61(4-5), 747-56.
Chodavarapu RK, Feng S, Bernatavichute YV, Chen PY, Stroud H, Yu Y, Hetzel
JA, Kuo F, Kim J, Cokus SJ, Casero D, Bernal M, Huijser P, Clark AT, Krämer
U, Merchant SS, Zhang X, Jacobsen SE, Pellegrini M. (2010). Relationship
between nucleosome positioning and DNA methylation. Nature 466(7304), 388-92.
Chen, C.B., Farmer, A.D., Langley, R.J., Mudge, J., Crow, J.A., May, G.D., Huntley, J.,
Smith, A.G., and Retzel, E.F. (2011). Meiosis-specific gene discovery in plants:
RNA-Seq applied to isolated Arabidopsis male meiocytes. BMC Plant Biol. 10, 280-293.
Chen, X.M., and Laux, T. (2012). Plant development - a snapshot in 2012. Curr Opin Plant
Biol. 15, 1-3.
Cheah MT, Wachter A, Sudarsan N, Breaker RR. (2007). Control of alternative RNA
splicing and gene expression by eukaryotic riboswitches. Nature 447 (7143); 497-500
Chew, Y.H., and Halliday, K.J. (2011). A stress-free walk from Arabidopsis to crops. Curr
Opin Biotechnol. 22, 281-286.
Chisholm, S.T., Coaker, G., Day, B., and Staskawicz, B.J. (2006). Host-microbe
interactions: Shaping the evolution of the plant immune response. Cell. 124, 803-814.
Chung HS, Howe GA. (2009). A critical role for the TIFY motif in repression of jasmonate
signaling by a stabilized splice variant of the JASMONATE ZIM-domain protein
JAZ10 in Arabidopsis. Plant Cell 21(1), 131-45
Churbanov A, Rogozin IB, Babenko VN, Ali H, Koonin EV. (2005). Evolutionary
conservation suggests a regulatory function of AUG triplets in 5'-UTRs of eukaryotic
genes. Nucleic Acid Res 33(17), 5512-20.
Coppins RL, Hall KB, Groisman EA. (2007). The intricate world of riboswitches. Curr
Opin Microbiol 10(2), 176-81
Costa, V., Angelini, C., De Feis, I., and Ciccodicola, A. (2010). Uncovering the
Complexity of Transcriptomes with RNA-Seq. J. Biomed Biotechnol. 20, 1-19.
Cougot N, van Dijk E, Babajko S, Séraphin B. (2004). 'Cap-tabolism'. Trends Biochem
Sci 29(8), 436-444.
Cramer GR, Urano K, Delrot S, Pezzotti M, Shinozaki K. (2011). Effects of abiotic stress on
plants: a systems biology approach review. BMC Plant Biol. 11, 163-176.
Croucher, N.J., and Thomson, N.R. (2010). Studying bacterial transcriptomes using RNA-seq. Curr Opin Microbiol. 13, 619-624.
Csaba Koncz, Femke deJong, Nicolas Villacorta, Dóra Szakonyi, Zsuzsa Koncz. (2012).
The spliceosome-activating complex: molecular mechanisms underlying the function
of a pleiotropic regulator. Front Plant Sci 3(9), 3355604
Cumbie, J.S., Kimbrel, J.A., Di, Y.M., Schafer, D.W., Wilhelm, L.J., Fox, S.E., Sullivan,
C.M., Curzon, A.D., Carrington, J.C., Mockler, T.C., and Chang, J.H. (2011).
GENE-Counter: a computational pipeline for the analysis of RNA-Seq data for gene
expression differences. PloS One 6, 1-11.
Czechowski, T., Stitt, M., Altmann, T., Udvardi, M.K., and Scheible, W.R. (2005).
Genome-wide identification and testing of superior reference genes for transcript
normalization in Arabidopsis. Plant Physiol. 139, 5-17.
de Hoon M, Hayashizaki Y. (2008). Deep cap analysis gene expression (CAGE): genome-wide identification of promoters, quantification of their expression, and network
inference. Biotechniques 44(5); 627-32.
Dijkstra JR, Mekenkamp LJ, Teerenstra S, De Krijger I, Nagtegaal ID. (2012).
MicroRNA expression in formalin-fixed paraffin embedded tissue using real time
quantitative PCR: the strengths and pitfalls. J Cell Mol Med 16(4), 683-90.
Dimon, M.T., Sorber, K., and DeRisi, J.L. (2010). HMMSplicer: A Tool for Efficient and
Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data. PloS
One 5, e13875.
Dinkins, R.D., Majee, S.M., Nayak, N.R., Martin, D., Xu, Q., Belcastro, M.P., Houtz,
R.L., Beach, C.M., and Downie, A.B. (2008). Changing transcriptional initiation
sites and alternative 5 '- and 3 '-splice site selection of the first intron deploys
Arabidopsis PROTEIN ISOASPARTYL METHYLTRANSFERASE2 variants to
different subcellular compartments. Plant J. 55, 1-13.
Dodds, P.N., and Rathjen, J.P. (2010). Plant immunity: towards an integrated view of
plant-pathogen interactions. Nat Rev Genet. 11, 539-548.
Dugas, D.V., Monaco, M.K., Olsen, A., Klein, R.R., Kumari, S., Ware, D., and Klein,
P.E. (2011). Functional annotation of the transcriptome of Sorghum bicolor in
response to osmotic stress and abscisic acid. BMC Genomics 12, 514-534.
Duque P. (2011). A role for SR proteins in plant stress responses. Plant Signaling Behav.
1;6(1): 49-54
Ederli, L., Madeo, L., Calderini, O., Gehring, C., Moretti, C., Buonaurio, R., Paolocci,
F., and Pasqualini, S. (2011). The Arabidopsis thaliana cysteine-rich receptor-like
kinase CRK20 modulates host responses to Pseudomonas syringae pv. tomato
DC3000 infection. Journal Plant Physiol. 168, 1784-1794.
Eichner, J., Zeller, G., Laubinger, S., and Ratsch, G. (2011). Support vector machines-based identification of alternative splicing in Arabidopsis thaliana from whole-genome tiling arrays. BMC Bioinformatics 12, 1-17.
Emanuelsson, O., Brunak, S., von Heijne, G., and Nielsen, H. (2007). Locating proteins in
the cell using TargetP, SignalP and related tools. Nat Protoc. 2, 953-971.
Euskirchen GM, Rozowsky JS, Wei CL, Lee WH, Zhang ZD, Hartman S, Emanuelsson O,
Stolc V, Weissman S, Gerstein MB, Ruan Y, Snyder M.
(2007). Mapping of transcription factor binding regions in mammalian cells by ChIP:
comparison of array- and sequencing-based technologies. Genome Res 17(6), 898-908
Fang, Z.D., and Cui, X.Q. (2011). Design and validation issues in RNA-seq experiments.
Brief Bioinform. 12, 280-287.
Farrer T, Roller AB, Kent WJ, Zahler AM. (2002). Analysis of the role of Caenorhabditis
elegans GC-AG introns in regulated splicing. Nucleic Acids Res 30(15), 3360-7
Feilner T, Hultschig C, Lee J, Meyer S, Imminik RG, Koenig A, Possling A, Seitz A,
Beveridge H, Scheel E, Cahill DJ, Lehrach H, Kersten B. (2005). High throughput
identification of potential Arabidopsis mitogen-activated protein kinases substrates.
Mol Cell Proteomics. 4(10); 1558-1568.
Ferragina P, Giancarlo R, Greco V, Manzini G, Valiente G. (2007). Compression-based
classification of biological sequences and structures via the Universal Similarity
Metric: experimental assessment. BMC Bioinformatics 8(252), 17629909
Filichkin, S.A., Priest, H.D., Givan, S.A., Shen, R.K., Bryant, D.W., Fox, S.E., Wong,
W.K., and Mockler, T.C. (2010). Genome-wide mapping of alternative splicing in
Arabidopsis thaliana. Genome Res. 20, 45-58.
Frazee, A.C., Langmead, B., and Leek, J.T. (2011). Re-count: a multi-experiment resource for
analysis ready RNA-seq gene count databases. BMC Bioinformatics 12, 449-454.
Garber, M., Grabherr, M.G., Guttman, M., and Trapnell, C. (2011). Computational
methods for transcriptome annotation and quantification using RNA-seq. Nat.
Methods 8, 469-477.
Zhang, X.C., and Gassmann, W. (2003). RPS4-mediated disease resistance requires the combined
presence of RPS4 transcripts with full-length and truncated open reading frames. Plant
Cell 15, 2333-2342.
Gassmann, W. (2008). Alternative splicing in plant defense. Curr Top Microbiol Immunol.
326, 219-233.
Glaus, P., Honkela, A., and Rattray, M. (2012). Identifying differentially expressed
transcripts from RNA-seq data with biological variation. Bioinformatics 28, 1721-1728.
Gowda M, Li H, Alessi J, Chen F, Pratt R, Wang GL. (2006). Robust analysis of 5'-transcript ends (5'-RATE): a novel technique for transcriptome analysis and genome
annotation. Nucleic Acids Res 34(19); e126.
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I.,
Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q.D., Chen, Z.H., Mauceli, E.,
Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C.,
Lindblad-Toh, K., Friedman, N., and Regev, A. (2011). Full-length transcriptome
assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 29, 644-652.
Guo, S.G., Zheng, Y., Joung, J.G., Liu, S.Q., Zhang, Z.H., Crasta, O.R., Sobral, B.W.,
Xu, Y., Huang, S.W., and Fei, Z.J. (2010). Transcriptome sequencing and
comparative analysis of cucumber flowers with different sex types. BMC Genomics
11, 1-13.
Gupta S, Zink D, Korn B, Vingron M, Haas SA. (2004). Genome wide identification and
classification of alternative splicing based on EST data. Bioinformatics 20(16), 2579-85.
Haas, B.J., and Zody, M.C. (2010). Advancing RNA-Seq analysis. Nat Biotechnol. 28, 421-423.
Hainer SJ, Pruneski JA, Mitchell RD, Monteverde RM, Martens JA. (2010). Intergenic
transcription causes repression by directing nucleosome assembly. Genes Dev 25(1),
29-40.
Hallegger M, Llorian M, Smith CW. (2010). Alternative splicing: global insights. FEBS J
277(4), 856-866
Hinnebusch AG. (1997). Translational regulation of yeast GCN4. A window on factors that
control initiator-trna binding to the ribosome. J.Biol Chem 272(35), 21661-21664.
Hong, X., Scofield, DG., Lynch, M. (2006) Intron size, abundance and distribution within
untranslated regions of genes. Mol Biol Evol 23(12), 2392-2404.
Hosseini, P., Tremblay, A., Matthews, B.F., and Alkharouf, N.W. (2010). An efficient annotation and gene
expression tool for Illumina Solexa datasets. BMC Res Notes 3, 183-189.
Howard, B.E., and Heber, S. (2010). Towards reliable isoform quantification using RNA-Seq data. BMC Bioinformatics 11, 1-13.
Hu, Y., Wang, K., He, X.P., Chiang, D.Y., Prins, J.F., and Liu, J.Z. (2010). A
probabilistic framework for aligning paired-end RNA-seq data. Bioinformatics 26,
1950-1957.
Hubert DA, He Y, McNulty BC, Tornero P, Dangl JL. (2009). Specific Arabidopsis
HSP90.2 alleles recapitulate RAR1 cochaperone function in plant NB-LRR disease
resistance protein regulation. Proc Natl Acad Sci 106 (24), 9556-9563.
Ide Y, Kusano M, Oikawa A, Fukushima A, Tomatsu H, Saito K, Hirai MY, Fujiwara
T. (2011). Effects of Molybdenum deficiency and defects in molybdate transporter
MOT1 on transcript accumulation and nitrogen/sulfur metabolism in Arabidopsis
thaliana. J Exp Bot. 62 (4), 1483-1497
Ietswaart, R., Wu, Z., Dean, C. (2012) Flowering time control: another window to the
connection between antisense RNA and chromatin. Trends Genet 28(9), 445-453.
Iida K, Seki M, Sakurai T, Satou M, Akiyama K, Toyoda T, Konagaya A, Shinozaki K.
(2004). Genome-wide analysis of alternative pre-mRNA splicing in Arabidopsis
thaliana based on full-length cDNA sequences. Nucleic Acids Res 32(17), 5096-5103.
Iida K, Seki M, Sakurai T, Satou M, Akiyama K, Toyoda T, Konagaya A, Shinozaki K
(2005). RARTF: database and tools for complete sets of Arabidopsis transcription
factors. DNA Res 12(4), 247-256
Jackson RJ. (2005). Alternative mechanisms of initiating translation of mammalian
mRNAs. Biochem Soc Trans 33(6), 1231-1241
Jean, G., Kahles, A., Sreedharan, V.T., De Bona, F., and Ratsch, G. (2010). RNA-Seq read alignment
with PALMapper. Curr Protoc Bioinformatics. 32, 1-37.
Jeong, Y.M., Jung, E.J., Hwang, H.J., Kim, H., Lee, S.Y., and Kim, S.G. (2006). Distinct
roles of the first intron on the expression of Arabidopsis profilin gene members. Plant
Physiol. 140, 196-209.
Jiang, H., and Wong, W.H. (2009). Statistical inferences for isoform expression in RNA-Seq. Bioinformatics 25, 1026-1032.
Jiao, Y.L., and Meyerowitz, E.M. (2010). Cell-type specific analysis of translating RNAs in
developing flowers reveals new levels of control. Mol Syst Biol 6, 1-15.
Jorgensen, RA., Dorantes-Acosta, AE. (2012) Conserved peptide upstream open reading
frames are associated with regulatory genes in angiosperms. Front Plant Sci 3(191),
1-11.
Kapranov, P. (2009). From transcription start site to cell biology. Genome Biol. 10, 217-220.
Kawaji H, Frith MC, Katayama S, Sandelin A, Kai C, Kawai J, Carninci P,
Hayashizaki Y. (2006). Dynamic usage of transcription start sites within core
promoters. Genome Biol 7(12), R118.
Kelemen, O., Convertini, P., Zhang, Z., Wen, Y., Shen, M., Falaleeva, M., Stormann, S.
(2012). Function of alternative splicing. Gene, Aug 14 on line preview
Kilian, J., Peschke, F., Berendzen, K.W., Harter, K., and Wanke, D. (2012).
Prerequisites, performance and profits of transcriptional profiling the abiotic stress
response. Biochim Biophys Acta 1819, 166-175.
Kochetov, A.V. (2008). Alternative translation start sites and hidden coding potential of
eukaryotic mRNAs. Bioessays 30, 683-691.
Kozak M. (2001). Constraints on reinitiation of translation in mammals. Nucleic Acids Res
29(24), 5226-5232
Kreps JA, Wu Y, Chang HS, Zhu T, Wang X, Harper JF. (2002). Transcriptome changes
for Arabidopsis in response to salt, osmotic, and cold stress. Plant Physiol 130(4),
2129-2141.
Lamesch, P., Berardini, T.Z., Li, D.H., Swarbreck, D., Wilks, C., Sasidharan, R.,
Muller, R., Dreher, K., Alexander, D.L., Garcia-Hernandez, M., Karthikeyan,
A.S., Lee, C.H., Nelson, W.D., Ploetz, L., Singh, S., Wensel, A., and Huala, E.
(2011). The Arabidopsis Information Resource (TAIR): improved gene annotation
and new tools. Nucleic Acids Res 40, 1202-1210.
Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10,
1-10.
Layat, E., Cotterell, S., Vaillant, I., Yukawa, Y., Tutois, S., and Tourmente, S. (2012).
Transcript levels, alternative splicing and proteolytic cleavage of TFIIIA control 5S
rRNA accumulation during Arabidopsis thaliana development. Plant J. 71, 35-44.
Lee, J.H., Gao, C., Peng, G.D., Greer, C., Ren, S.X., Wang, Y.B., and Xiao, X.S. (2011).
Analysis of transcriptome complexity through RNA sequencing in normal and failing
murine hearts. Circ Res. 109, 1332-1341.
Lee, S., Seo, C.H., Lim, B., Yang, J.O., Oh, J., Kim, M., Lee, S., Lee, B., Kang, C., and
Lee, S. (2011). Accurate quantification of transcriptome from RNA-Seq data by
effective length normalization. Nucleic Acids Res. 39, 9-18.
Li, B., Ruotti, V., Stewart, R.M., Thomson, J.A., and Dewey, C.N. (2010). RNA-Seq gene
expression estimation with read mapping uncertainty. Bioinformatics 26, 493-500.
Li, G., Bahn, J.H., Lee, J.H., Peng, G., Chen, Z., Nelson, S.F., and Xiao, X. (2012). Identification of
allele-specific alternative mRNA processing via transcriptome signaling. Nucleic
Acids Res. 40, 104-116.
Livak, K.J., and Schmittgen, T.D. (2001). Analysis of relative gene expression data using
real-time quantitative PCR and the 2(-Delta Delta C(T)) method. Methods 25, 402-408.
Mackey D, Belkhadir Y, Alonso JM, Ecker JR, Dangl JL. (2003). Arabidopsis RIN4 is a
target of the type III virulence effector AvrRpt2 and modulates RPS2-mediated
resistance. Cell 112(3), 379-389
Mandal M, Breaker RR. (2004). Gene regulation by riboswitches. Nat Rev Mol Cell Biol
5(6), 451-463.
Mangone M, Manoharan AP, Thierry-Mieg D, Thierry-Mieg J, Han T, Mackowiak
SD, Mis E, Zegar C, Gutwein MR, Khivansara V, Attie O, Chen K, Salehi-Ashtiani K, Vidal M, Harkins TT, Bouffard P, Suzuki Y, Sugano S, Kohara
Y, Rajewsky N, Piano F, Gunsalus KC, Kim JK. (2010). The landscape of C.
elegans 3'UTRs. Science 329(5990), 432-435.
Marco, F., Alcazar, R., Tiburcio, A.F., and Carrasco, P. (2011). Interactions between
polyamines and abiotic stress pathway responses unraveled by transcriptome analysis
of polyamine overproducers. OMICS 15, 775-781.
Marè C, Mazzucotelli E, Crosatti C, Francia E, Stanca AM, Cattivelli L. (2004). Hv-WRKY38: a new transcription factor involved in cold- and drought-response in
barley. Plant Mol Biol 55(3), 399-416.
Marguerat, S., and Bahler, J. (2010). RNA-seq: from technology to biology. Cell Mol Life
Sci. 67, 569-579.
Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. (2008). RNA-Seq: an
assessment of technical reproducibility and comparison with gene expression arrays.
Genome Res 18(9), 1509-1517.
Marom, L., Hen-Avivi, S., Pinchasi, D., Chekanova, J.A., Belostotsky, D.A., and
Elroy-Stein, O. (2009). Diverse poly(A) binding proteins mediate internal translational
initiation by a plant viral IRES. RNA Biol. 6, 446-454.
Marquez, Y., Brown, J.W.S., Simpson, C., Barta, A., and Kalyna, M. (2012).
Transcriptome survey reveals increased complexity of the alternative splicing
landscape in Arabidopsis. Genome Research 22, 1184-1195.
Martineau, Y., Le Bec, C., Monbrun, L., Allo, V., Chiu, I-M., Danos, O., Maine, H.,
Prats, H., Prats, A-C. (2004). Internal ribosome entry site structural motifs
conserved among mammalian fibroblast growth factor 1 alternatively spliced mRNA.
Mol Cell Biol 24(17), 7622-7635.
McDowell JM, Simon SA. (2006). Recent insights into R gene evolution. Mol Plant Pathol
7(5); 437-448
Megraw M, Pereira F, Jensen ST, Ohler U, Hatzigeorgiou AG. (2009). A transcription
factor affinity-based code for mammalian transcription initiation. Genome Res 19(4),
644-56.
Meier, S., and Gehring, C. (2008). A guide to the integrated application of on-line data mining tools for
the inference of gene functions at the systems level. Biotechnol J. 3, 1375-1387.
Melotto M, Underwood W, Koczan J, Nomura K, He SY. (2006). Plant stomata function
in innate immunity against bacterial invasion. Cell 126(5), 969-980.
Mizuno, H., Kawahara, Y., Sakai, H., Kanamori, H., Wakimoto, H., Yamagata, H., Oono, Y., Wu, J.,
Ikawa, H., Itoh, T., and Matsumoto, T. (2010). Massive parallel sequencing of
mRNA in identification of un-annotated salinity stress-induced transcripts in rice.
BMC Genomics 11, 1-13.
Moran-Diez, E., Rubio, B., Dominguez, S., Hermosa, R., Monte, E., and Nicolas, C. (2010).
Transcriptomic response of Arabidopsis thaliana after 24h incubation with the
biocontrol fungus Trichoderma harzianum. Journal of Plant Physiology 169, 614-620.
Moyle RL, Crowe ML, Ripi-Koia J, Fairbairn DJ, Botella JR. (2005). PineappleDB: an
online pineapple bioinformatics resource. BMC Plant Biol 5(21), 1260026.
Mukhtar, M.S., Nishimura, M.T., and Dangl, J. (2009). NPR1 in Plant Defense: It’s not over ‘til it’s turned over.
Cell 137, 804-806.
Murray R.G., Jones, JDG. (2009). Hormone (dis)harmony molds plant health and disease.
Science 324, 750-752.
Narita NN, Moore S, Horiguchi G, Kubo M, Demura T, Fukuda H, Goodrich
J, Tsukaya H. (2004). Overexpression of a novel small peptide ROTUNDIFOLIA4
decreases cell proliferation and alters leaf shape in Arabidopsis thaliana. Plant J
38(4), 699-713
Navarro L, Dunoyer P, Jay F, Arnold B, Dharmasiri N, Estelle M, Voinnet O, Jones JD.
(2006). A plant miRNA contributes to antibacterial resistance by
repressing auxin signaling. Science 312(5772); 436-439
Nishimura, M.T., and Dangl, J.L. (2010). Arabidopsis and the plant immune system. Plant
J. 61, 1053-1066.
Orlando, D.A., Brady, S.M., Koch, J.D., Dinneny, J.R., and Benfey, P.N. (2009). Manipulating
large-scale Arabidopsis microarray expression data: identifying dominant expression
patterns and biological process enrichment. Methods Mol Biol. 553, 57-77.
Oshlack, A., Robinson, M.D., and Young, M.D. (2011). From RNA-seq reads to
differential expression results. Genome Biol. 11, 220-229.
Okamoto H, Hirochika H. (2001). Silencing of transposable elements in plants. Trends
Plant Sci. 6(11); 527-534
Pacheco, A., Martinez-Salas, E. (2010). Insights into the biology of IRES elements through
riboproteomic approaches. J Biomed Biotechnol, article ID 458927, 1-12.
Paterson AH, Freeling M, Sasaki T. (2005). Grains of knowledge: genomics of modern
cereals. Genome Res. 15(12), 1643-1650.
Pepke S, Wold B, Mortazavi A. (2009). Computation for ChIP-seq and RNA-seq studies.
Nat Methods. 6(11); s22-32
Petrillo E, Sanchez SE, Kornblihtt AR, Yanovsky MJ. (2011). Alternative splicing adds a
new loop to the circadian clock. Commun Integr Biol. 4 (3); 284-286.
Plessy C, Bertin N, Takahashi H, Simone R, Salimullah M, Lassmann T, Vitezic
M, Severin J, Olivarius S, Lazarevic D, Hornig N, Orlando V, Bell I, Gao
H, Dumais J, Kapranov P, Wang H, Davis CA, Gingeras TR, Kawai J, Daub
CO, Hayashizaki Y, Gustincich S, Carninci P. (2010). Linking promoters to
functional transcripts in small samples with nanoCAGE and CAGEscan. Nat Methods
7(7), 528-534.
Reddy, A.S.N. (2007). Alternative splicing of pre-messenger RNAs in plants in the genomic
era. Annu Rev Plant Biol., pp. 267-294.
Reddy, A.S.N., and Ali, G.S. (2011). Plant serine/arginine-rich proteins: roles in precursor
messenger RNA splicing, plant development, and stress responses. Wiley
Interdisciplinary Reviews: RNA 2, 875-889.
Reddy, A.S.N., Rogers, M.F., Richardson, D.N., Hamilton, M., and Ben-Hur, A. (2012). Deciphering
the splicing code: experimental and computational approaches for predicting
alternative splicing and splicing regulatory elements. Front Plant Sci. 3, 1-15.
Ricroch, A.E., Berge, J.B., and Kuntz, M. (2011). Evaluation of genetically engineered
crops using transcriptomic, proteomic, and metabolomic profiling techniques. Plant
Physiol. 155, 1752-1761.
Robinson, S.J., and Parkin, I.A. (2009). Bridging the gene-to-function knowledge gap through
functional genomics. Methods Mol Biol. 513, 153-173.
Rodionov DA, Vitreschak AG, Mironov AA, Gelfand MS. (2002). Comparative genomics
of thiamin biosynthesis in procaryotes. New genes and regulatory mechanisms. J Biol
Chem 277(50), 48949-48959.
Rooke N, Markovtsov V, Cagavi E, Black DL. (2003). Roles for SR proteins and hnRNP
A1 in the regulation of c-src exon N1. Mol Cell Biol 23(6), 1874-1884.
Rosenberg, B.R., Hamilton, C.E., Mwangi, M.M., Dewell, S., and Papavasiliou, F.N.
(2011). Transcriptome-wide sequencing reveals numerous APOBEC1 mRNA-editing
targets in transcript 3' UTRs. Nat Struct Mol Biol. 19, 230-236.
Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki
Y, Yasunishi A. (2003). Collection, mapping, and annotation of over 28,000 cDNA
clones from japonica rice. Science 301(5631); 376-379.
Sablok G, Gupta PK, Baek JM, Vazquez F, Min XJ. (2011). Genome-wide survey of
alternative splicing in the grass Brachypodium distachyon: an emerging model
biosystem for plant functional genomics. Biotechnol Lett 33(3), 629-636.
Schmid, M.W., Schmidt, A., Klostermeier, U.C., Barann, M., Rosenstiel, P., and
Grossniklaus, U. (2012). A powerful method for transcriptional profiling of specific
cell types in eukaryotes: laser-assisted microdissection and RNA Sequencing. PLoS
One 7, 1-13.
Shabab M, Shindo T, Gu C, Kaschani F, Pansuriya T, Chintha R, Harzen A, Colby
T, Kamoun S, van der Hoorn RA. (2008). Fungal effector protein AVR2 targets
diversifying defense-related cysteine proteases of tomato. Plant Cell 20(4), 1169-1183.
Sørensen V, Wiedlocha A, Haugsten EM, Khnykin D, Wesche J, Olsnes S. (2006).
Different abilities of the four FGFRs to mediate FGF-1 translocation are linked to
differences in the receptor C-terminal tail. J Cell Sci 119(20), 4332-4341.
Stauffer, E., Westermann, A., Wagner, G., and Wachter, A. (2010). Polypyrimidine tract
binding protein homologs from Arabidopsis underlie regulatory circuits based on
alternative splicing and downstream control. Plant J. 64, 243-255.
Taft RJ, Kaplan CD, Simons C, Mattick JS. (2009). Evolution, biogenesis and function of
promoter-associated RNAs. Cell Cycle. 8(15); 2332-2338.
Tanabe N, Yoshimura K, Kimura A, Yabuta Y, Shigeoka S. (2007). Differential
expression of alternatively spliced mRNAs of Arabidopsis SR protein homologs,
atSR30 and atSR45a, in response to environmental stress. Plant Cell Physiol 48(7),
1036-1049.
Terzi LC, Simpson GG. (2008). Regulation of flowering time by RNA processing. Curr
Top Microbiol Immunol 326, 201-218.
Trapnell, C., Pachter, L., and Salzberg, S.L. (2009). TopHat: discovering splice junctions
with RNA-Seq. Bioinformatics 25, 1105-1111.
Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren,
M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and
quantification by RNA-Seq reveals unannotated transcripts and isoform switching
during cell differentiation. Nat Biotechnol. 28, 511-515.
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H.,
Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript
expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc.
7, 562-578.
Tsuda, K., Sato, M., Stoddard, T., Glazebrook, J., Katagiri, F. (2009). Network
properties of robust immunity in plants. PLoS Genet 5(12), e1000772.
Tsuda, K., and Katagiri, F. (2010). Comparing signaling mechanisms engaged in
pattern-triggered and effector-triggered immunity. Curr Opin Plant Biol. 13, 459-465.
Ulitsky I, Shkumatava A, Jan CH, Sive H, Bartel DP. (2011). Conserved function of
lincRNAs in vertebrate embryonic development despite rapid sequence evolution.
Cell 147(7), 1537-1550.
Valen, E., and Sandelin, A. (2011). Genomic and chromatin signals underlying transcription
start-site selection. Trends Genet. 27, 475-485.
Ryan MP, Stafford GA, Yu L, Morse RH. (2000). Artificially recruited TATA-binding
protein fails to remodel chromatin and does not activate three promoters that
require chromatin remodeling. Mol Cell Biol 20(16), 5847-5857.
Wahl MC, Will CL, Lührmann R. (2009). The spliceosome: design principles of a
dynamic RNP machine. Cell 136(4), 701-718.
Wang, B., Guo, G.W., Wang, C., Lin, Y., Wang, X.N., Zhao, M.M., Guo, Y., He, M.H.,
Zhang, Y., and Pan, L. (2010). Survey of the transcriptome of Aspergillus oryzae
via massively parallel mRNA sequencing. Nucleic Acids Res. 38, 5075-5087.
Wang, B.B., and Brendel, V. (2006). Genomewide comparative analysis of alternative
splicing in plants. Proc Natl Acad Sci U S A 103, 7175-7180.
Wang BB, Brendel V. (2004). The ASRG database: identification and survey of Arabidopsis
thaliana genes involved in pre-mRNA splicing. Genome Biol 5(12), R102
Wang, X., Wang, K.J., Radovich, M., Wang, Y., Wang, G.H., Feng, W.X., Sanford,
J.R., and Liu, Y.L. (2009). Genome-wide prediction of cis-acting RNA elements
regulating tissue-specific pre-mRNA alternative splicing. BMC Genomics 10, 4-13.
Wang Y, Wang J, Gao L, Lafyatis R, Stamm S, Andreadis A. (2005). Tau exons 2 and
10, which are misregulated in neurodegenerative diseases, are partly regulated by
silencers which bind a SRp30c.SRp55 complex that either recruits or antagonizes
htra2beta1. J Biol Chem 280(14), 14230-14239.
Wang, Z., Gerstein, M., and Snyder, M. (2009). RNA-Seq: a revolutionary tool for
transcriptomics. Nat Rev Genet. 10, 57-63.
Wen J, Lease KA, Walker JC. (2004). DVL, a novel class of small polypeptides:
overexpression alters Arabidopsis development. Plant J 37(5); 668-677.
Will, C.L., Lührmann, R. (2011). Spliceosome: structure and function. Cold Spring Harb
Perspect Biol 3(7), a003707.
Winkler W, Nahvi A, Breaker RR. (2002). Thiamine derivatives bind messenger RNAs
directly to regulate bacterial gene expression. Nature 419(6910), 952-956.
Wu TD, Nacu S. (2010). Fast and SNP-tolerant detection of complex variants and splicing in
short reads. Bioinformatics 26(7), 873-881.
Wu, X.H., Liu, M., Downie, B., Liang, C., Ji, G.L., Li, Q.Q., and Hunt, A.G. (2011).
Genome-wide landscape of polyadenylation in Arabidopsis provides evidence for
extensive alternative polyadenylation. Proc Natl Acad Sci U S A 108, 12533-12538.
Wu, Z.P., Wang, X., and Zhang, X.G. (2011). Using non-uniform read distribution models
to improve isoform expression inference in RNA-Seq. Bioinformatics 27, 502-508.
Xiong L, Yang Y. (2003). Disease resistance and abiotic stress tolerance in rice are inversely
modulated by an abscisic acid-inducible mitogen-activated protein kinase. Plant Cell
15(3), 745-759.
Xu, H., Gao, Y., and Wang, J.B. (2012). Transcriptomic analysis of rice (Oryza sativa)
developing embryos using the RNA-Seq technique. PLoS One 7, 1-10.
Yamamoto, Y.Y., Yoshitsugu, T., Sakurai, T., Seki, M., Shinozaki, K., and Obokata, J.
(2009). Heterogeneity of Arabidopsis core promoters revealed by high-density TSS
analysis. Plant J. 60, 350-362.
Yazaki J, Shimatani Z, Hashimoto A, Nagata Y, Fujii F, Kojima K, Suzuki K, Taya
T, Tonouchi M, Nelson C, Nakagawa A, Otomo Y, Murakami K, Matsubara
K, Kawai J, Carninci P, Hayashizaki Y, Kikuchi S. (2004). Transcriptional
profiling of genes responsive to abscisic acid and gibberellin in rice: phenotyping and
comparative analysis between rice and Arabidopsis. Physiol Genomics 17(2), 87-100.
Zhang, G.J., Guo, G.W., Hu, X.D., Zhang, Y., Li, Q.Y., Li, R.Q., Zhuang, R.H., Lu,
Z.K., He, Z.Q., Fang, X.D., Chen, L., Tian, W., Tao, Y., Kristiansen, K., Zhang,
X.Q., Li, S.G., Yang, H.M., Wang, J., and Wang, J. (2010). Deep RNA sequencing
at single base-pair resolution reveals high complexity of the rice transcriptome.
Genome Res. 20, 646-654.
Zhang XC, Gassmann W. (2007). Alternative splicing and mRNA levels of the disease
resistance gene RPS4 are induced during defense responses. Plant Physiol 145(4),
1577-1587.
Zhang XN, Mount SM. (2009). Two alternatively spliced isoforms of the Arabidopsis SR45
protein have distinct roles during normal plant development. Plant Physiol 150(3),
1450-1458.
Zhou X, Su Z. (2007). EasyGO: Gene Ontology-based annotation and functional enrichment
analysis tool for agronomical species. BMC Genomics 8(246), 17645808
Zmienko, A., Guzowska-Nowowiejska, M., Urbaniak, R., Plader, W., Formanowicz, P.,
and Figlerowicz, M. (2011). A tiling microarray for global analysis of chloroplast
genome expression in cucumber and other plants. Plant Methods 7, 1-17.