Introduction to Gene Mining
Part B: How similar are plant and human
versions of a gene?
After completing Part B, you will demonstrate
how to use NCBI BLASTp and www.Araport.org
data to determine whether Arabidopsis
thaliana and human muscle protein genes and
gene products are homologous.
1
The Arabidopsis Information Portal is funded by a grant from
the National Science Foundation (#DBI-1262414)
and co-funded by a grant from the Biotechnology and
Biological Sciences Research Council (BB/L027151/1).
These lessons were developed during the summer of 2015 as
education outreach for the www.Araport.org portal in
conjunction with the J. Craig Venter Institute, Rockville, MD,
20850, USA.
Contact information
General information: araport@jcvi.org
Jason Miller, Grant Co-Principal Investigator, JCVI
jmiller@jcvi.org
This lesson was prepared by Andrea Cobb, Ph.D.
(adcobb@fcps.edu)
with the help of Margot Goldberg
(mgoldberg1@pghboe.net)
2
In Part A, our sample question was:
Can we study your muscle
disease using a plant model?
3
We used the NCBI portal to find names
of human muscle genes.
4
We also found the function of the human actin-alpha 1 gene
(ACTA1) and asked, “Might plants need that same function?”
5
We used NCBI BLASTn to
search in Arabidopsis thaliana
for genes that align to human ACTA1.
6
We learned that “alignment” is achieved by using an
algorithm that maximizes local matches between two
sequences.
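A minimal Python sketch of what "maximizing local matches" can mean. It fills in a Smith-Waterman style scoring table and reports the best local alignment score. The function name and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration; BLAST itself uses faster heuristics and its own scoring parameters.

```python
def local_alignment_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # score[i][j] holds the best score of a local alignment that ends
    # at seq_a[i-1] and seq_b[j-1]; 0 means "start a new alignment here".
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap in seq_b
            left = score[i][j - 1] + gap    # gap in seq_a
            score[i][j] = max(0, diagonal, up, left)
            best = max(best, score[i][j])
    return best


# The two toy sequences share only the stretch "ACG" (3 matches x 2 points).
print(local_alignment_score("TTACGTTT", "GGACGGG"))  # prints 6
```

Because the score is never allowed to drop below zero, poorly matching regions at either end are simply left out of the alignment.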
7
We learned how to use the BLASTn report scores with
Query cover, Ident and the E-values to choose a
statistically meaningful alignment.
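As a rough illustration of two of the report columns, the sketch below computes percent identity and query cover from one gapped alignment. The function name and the toy sequences are made up for this example. The E-value is not computed here because it also depends on the size of the database and on scoring statistics.

```python
def alignment_stats(aligned_query, aligned_subject, query_length):
    """Return (percent identity, query cover) for one gapped alignment.

    aligned_query and aligned_subject are the two rows of the alignment,
    with '-' marking gaps. query_length is the full length of the query.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    # Identical columns: same letter in both rows, and not a gap.
    identical = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    aligned_columns = len(aligned_query)
    # Query letters that take part in the alignment (gaps do not count).
    query_letters = sum(1 for q in aligned_query if q != "-")
    percent_identity = 100.0 * identical / aligned_columns
    query_cover = 100.0 * query_letters / query_length
    return percent_identity, query_cover


# Toy example: 8 of the 10 query letters are aligned, 7 of 9 columns match.
identity, cover = alignment_stats("ACGT-ACGT", "ACGTTACCT", query_length=10)
print(round(identity, 1), cover)
```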
8
Gene Discovery Scorecard
In a group of 3-4 students, examine your gene
discovery scorecard and then:
Infer characteristics of genes which were
in both A. thaliana and humans.
Identify characteristics of genes present in
humans but not found in plants.
9
What information so far indicates
whether or not plants have animal
muscle genes?
What additional information might
you need to be certain whether or
not plants have animal muscle
genes?
10
Part B: Evaluating homology: How similar are
plant and human versions of a gene?
11
Recipes handed down often change
12
Which parts of the recipes were conserved (were
almost the same) in all generations’ recipes?
Which parts were not conserved?
13
Reasons why a recipe might be changed
• Discuss in groups and report your ideas.
14
How might you track the passage of a
recipe from one generation to the
next if you can’t ask the cooks?
15
How is a gene like a recipe?
• Discuss in groups and report your ideas.
16
What features of
a gene might
make it a version
of another
gene?
Record your
answers.
https://www.youtube.com/watch?v=gCxrkl2igGY
is a song you might remember.
17
Explore
• What is homology?
• What criteria do scientists use to
classify particular genes and their protein
products as homologs?
18
• Homology: a general term describing two or
more genes that share an ancestral gene
• How might recipes be “homologous”?
19
To use a plant model for my
patient’s disease, I need to find a
plant homolog to his ACTA1 gene.
We found that the Arabidopsis
thaliana ACT7 gene is a version,
but is it similar enough to be a
homolog?
20
Should we search for homologs using a gene
sequence or a protein sequence?
21
The structure of a eukaryotic gene is complex!
Translation (protein synthesis)
http://nitro.biosci.arizona.edu/courses/EEB600A2003/lectures/lecture24/lecture24.html
The amino acid sequence of the
protein is more likely to be
conserved than the gene sequence
22
A BLASTp using the gene product’s amino acid
sequence is likely to find protein homologs
A BLASTn might find more differences than similarities
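One reason the protein sequence can be more conserved than the gene sequence is that several codons code for the same amino acid. The sketch below translates two short DNA sequences that differ at four positions yet give the same five amino acids. The two DNA sequences are invented for illustration (they are not the real ACTA1 coding sequence), and the codon table contains only the codons needed here.

```python
# Partial standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "ATG": "M",
    "TGT": "C", "TGC": "C",
    "GAT": "D", "GAC": "D",
    "GAA": "E", "GAG": "E",
}


def translate(dna):
    """Translate a DNA string codon by codon (length must be a multiple of 3)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(seq_a, seq_b):
    """Percent of positions with the same letter (equal-length sequences)."""
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


gene_1 = "ATGTGTGATGAAGAT"  # hypothetical sequence
gene_2 = "ATGTGCGACGAGGAC"  # hypothetical sequence, synonymous codons

protein_1 = translate(gene_1)
protein_2 = translate(gene_2)

print(protein_1, protein_2)
print("DNA identity:", round(percent_identity(gene_1, gene_2), 1))
print("Protein identity:", percent_identity(protein_1, protein_2))
```

In this toy case the DNA sequences are about 73% identical while the proteins are 100% identical.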
23
We will use a protein BLAST tool, BLASTp, to find homologous
proteins. We need to first find the protein sequence coded by the
human ACTA1 gene on the NCBI protein page.
24
From the ACTA1 protein information page, select
FASTA, then copy and paste the amino acid sequence
into a Word document.
Each amino acid is represented by a particular letter.
>gi|49168518|emb|CAG38754.1| ACTA1 [Homo sapiens]
MCDEDETTALVCDNGSGLVKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLK
YPIEHGIITNWDDMEKIWHHTFYNELRVAPEEHPTLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQA
VLSLYASGRTTGIVLDSGDGVTHNVPIYEGYALPHAIMRLDLAGRDLTDYLMKILTERGYSFVTTAEREI
VRDIKEKLCYVALDFENEMATAASSSSLEKSYELPDGQVITIGNERFRCPETLFQPSFIGMESAGIHETT
YNSIMKCDIDIRKDLYANNVMSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILAS
LSTFQQMWITKQEYDEAGPSIVHRKCF
25
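For readers who would rather handle the FASTA text in a script than in a word processor, here is a small sketch of reading one FASTA record: the line starting with ">" is the header, and every other line is part of the sequence. The function name is an assumption, and the example uses only the beginning of the ACTA1 record shown on the slide.

```python
def parse_fasta(text):
    """Return (header, sequence) for a FASTA string holding one record."""
    header = None
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected a single FASTA record")
            header = line[1:]
        else:
            # Sequence lines may be wrapped; remove any spaces and join them.
            parts.append("".join(line.split()))
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(parts)


# Only the first part of the record, to keep the example short.
record = """>gi|49168518|emb|CAG38754.1| ACTA1 [Homo sapiens]
MCDEDETTALVCDNGSGLVKAGFAGDD
APRAVFPSIVGRPRHQGVMVGMGQKD
"""

header, sequence = parse_fasta(record)
print(header)
print(len(sequence), "amino acids read")
```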
Navigate to the BLASTp link on NCBI.
26
Paste the protein
sequence for
ACTA1 here.
Enter Arabidopsis
thaliana for the
search database.
Select blastp and
then click on the
BLAST button.
27
The BLASTp report is similar to the
BLASTn report.
Query sequence
28
“Descriptions” shows 4 actins with the same
query coverage, E-value and Ident!
There appear to be 4 possible homologous
proteins but which is most similar to the human
ACTA1 protein?
29
There are a number of actin proteins with high Query coverage, very low E-values and
high identity. Check them all (for some whose numbers are represented more than
once, check the first listing). Then select “Multiple Alignment” to directly compare those
sequences.
30
Conserved amino
acids are shown in
red. Which
differences can you
find quickly?
Can you spot a deletion?
Where is an amino acid
replaced by a chemically
similar type?
Where is an amino acid
replaced by a chemically
different type?
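The questions above can also be asked of an alignment in code. This sketch labels each column of a two-sequence alignment as conserved, similar (same chemical group), different, or gap. The grouping of amino acids into four classes is one common textbook simplification, not the scoring used by BLAST, and the toy sequences are invented.

```python
# One simplified grouping of the 20 amino acids by side-chain chemistry.
GROUPS = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STNQYC",
    "basic": "KRH",
    "acidic": "DE",
}
GROUP_OF = {aa: name for name, letters in GROUPS.items() for aa in letters}


def classify_column(a, b):
    """Classify one alignment column holding letters a and b."""
    if a == "-" or b == "-":
        return "gap"            # insertion or deletion
    if a == b:
        return "conserved"      # identical amino acid
    if GROUP_OF[a] == GROUP_OF[b]:
        return "similar"        # replaced by a chemically similar type
    return "different"          # replaced by a chemically different type


def classify_alignment(seq_1, seq_2):
    """Classify every column of two aligned, equal-length sequences."""
    return [classify_column(a, b) for a, b in zip(seq_1, seq_2)]


for column in zip("MKDL-S", "MRELAG", classify_alignment("MKDL-S", "MRELAG")):
    print(*column)
```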
31
Protein sequence homology is analyzed by constructing a
Distance tree of results. Check the desired
“hits”, then select “Distance tree”.
32
Query—human ACTA1
protein
Nodes represent a
shared ancestral gene
These proteins
are all homologs.
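Distance trees are built from pairwise distances between aligned sequences. As a sketch of that first step only, the code below computes a simple distance (the fraction of aligned positions that differ) for every pair of sequences. The sequence names and the short aligned sequences are invented, and the tree-drawing step performed by the BLAST tool is not reproduced.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of compared positions that differ, ignoring gap columns."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def distance_matrix(aligned):
    """Return {(name_1, name_2): distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Hypothetical aligned sequences of equal length.
aligned = {
    "query": "MCDEDETTAL",
    "seq_a": "MADAEDIQPL",
    "seq_b": "MCDEDESTAL",
}

distances = distance_matrix(aligned)
for pair, distance in distances.items():
    print(pair, distance)
```

A smaller distance means fewer differences, so in this toy data seq_b would sit closer to the query on a tree than seq_a.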
33
34
Of the proteins in Arabidopsis thaliana, ACT7
has the highest identity (88%) and lowest E-value (0.0) when compared to human ACTA1.
A gene tree program predicts the presence of
ancestral genes between ACT7 and ACTA1.
Is that sufficient to confirm protein homology
for experimental modeling?
35
A more restricted alignment between human ACTA1 and the
closest 3 Arabidopsis proteins can check that ACT7 is the
protein closest to the ancestral gene.
Check “Align two or more sequences”,
then copy and paste the protein sequences
for ACT7, ACT8 and ACT2 into the
Subject Sequence box.
36
Multiple alignment results for human ACTA1 protein and the 3
closest Arabidopsis proteins.
37
What do the
distance tree results
indicate?
38
Do you have enough data to use Arabidopsis
ACT7 gene as a model for the human ACTA1
gene?
Discuss and report your ideas.
39
What criteria from published work indicated that these
plant processes and human diseases involved
homologous genes or proteins?
40
Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog
• Similar co-expression of genes for each homolog
• Similar Function Gene Ontology (GO terms)
• Conserved sequences and protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf
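The first two criteria in the list are numeric, so they can be written as a simple check. The sketch below uses the thresholds from the slide (E-value below 0.00001, more than 25% identity, over more than 100 amino acids). The function name is an assumption, and the alignment length of 377 in the example is an assumed value used only to exercise the check. The remaining criteria need other kinds of data and are not covered.

```python
def meets_sequence_criteria(e_value, percent_identity, aligned_length,
                            max_e_value=1e-5, min_identity=25.0,
                            min_length=100):
    """Check the two sequence-based homology criteria from the slide."""
    return (
        e_value < max_e_value
        and percent_identity > min_identity
        and aligned_length > min_length
    )


# Identity (88%) and E-value (0.0) as reported earlier for ACT7 vs ACTA1;
# the aligned length of 377 is an assumption for this example.
print(meets_sequence_criteria(0.0, 88.0, 377))

# A short or weak match fails the check.
print(meets_sequence_criteria(0.001, 88.0, 377))
print(meets_sequence_criteria(0.0, 88.0, 50))
```

Passing this check is only part of the evidence; the other criteria on the list still have to be examined.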
41
Let’s find homology information and data about the
Arabidopsis ACT7 gene in http://www.Araport.org
Use the pull-down menu to access the ThaleMine tool.
42
Enter
information
about your
gene of
interest, in this
case, ACT7
43
Results show 1 gene, 2
articles and 1 mRNA in
the database.
We are only
interested in
studying the
gene for now,
so we will
select the
category –
Gene or just
select the
identifier for
the gene from
the list at right
44
This is the Gene information sheet for the Arabidopsis thaliana ACT7 gene.
How did the function listed under Curator Summary compare to your
previous prediction?
45
The blue bar under Curator
Summary has tabs that take you
quickly to that section down the
page. Click on the Homology tab.
Links to information
about human homologs
of ACT7.
46
Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog
• Similar co-expression of genes for each homolog
• Similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf
47
Compare the first (human ACTA1) and second (Arabidopsis ACT7)
sequences in each alignment. Far more than 25% of the amino acids
match in any 100-amino-acid region.
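One way to check "any region of 100 amino acids" rather than just the overall figure is a sliding window. The sketch below reports the lowest percent identity found in any window of a given size along two aligned sequences. The function name is an assumption, and the short strings in the example use a small window only so the result is easy to verify by hand.

```python
def min_window_identity(seq_1, seq_2, window=100):
    """Lowest percent identity over all windows of the given size.

    seq_1 and seq_2 are the two rows of an alignment ('-' marks a gap).
    """
    if len(seq_1) != len(seq_2):
        raise ValueError("aligned sequences must have equal length")
    if len(seq_1) < window:
        raise ValueError("alignment is shorter than the window")
    # 1 where a column holds the same letter in both rows, else 0.
    matches = [1 if a == b and a != "-" else 0 for a, b in zip(seq_1, seq_2)]
    current = sum(matches[:window])
    lowest = current
    # Slide the window one column at a time, updating the running count.
    for i in range(window, len(matches)):
        current += matches[i] - matches[i - window]
        lowest = min(lowest, current)
    return 100.0 * lowest / window


# Toy check with a window of 4: the worst window has 3 of 4 matches.
print(min_window_identity("ABCDEFGH", "ABCXEFGH", window=4))  # prints 75.0
```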
48
Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog
• Similar co-expression of genes for each homolog
• Similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf
49
Actin interacts with many proteins
https://www.youtube.com/watch?v=FzcTgrxMzZk
50
ACT7 and ACTA1 proteins each interact with a variety of other proteins.
Because the same protein may have a plant name and a different animal
name, further investigation is needed to know from this data whether ACTA1
and ACT7 are interacting with identical proteins.
Arabidopsis ACT7 interacts with
these proteins
Human ACTA1 interacts with
these proteins
51
Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog ??
• Similar co-expression of genes for each homolog
• Similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf
52
Co-expression (transcription of 2 or more
genes at the same time in the same cell)
is required for gene products (proteins)
to work together.
In the image above, two differently colored fluorescent proteins are co-expressed
in Arabidopsis.
http://www.frontiersin.org/files/Articles/96150/fpls-05-00426-HTML/image_m/fpls-0500426-g001.jpg
53
What genes are co-expressed
(same time, same location) for ACT7 or ACTA1?
Arabidopsis ACT7
is co-expressed with these genes
Scientists would need to confirm that the
different plant and animal names were actually
the same protein.
Human ACTA1 co-expression is
shown with purple lines.
54
Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??
• Some similar co-expression of genes for each homolog ??
• Some similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf
55
Gene Ontology provides information about biological
process, molecular function and cellular location. Are
any ACT7 GO terms similar to human ACTA1 GO terms?
Arabidopsis ACT7
Human ACTA1
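Comparing two lists of GO terms is a set comparison: which terms are shared, and how large is the overlap? The sketch below lists the shared terms and computes a Jaccard similarity (shared terms divided by all distinct terms). The function name is an assumption, and the two term lists are invented placeholders, not the annotations shown in ThaleMine or at NCBI.

```python
def jaccard_similarity(terms_a, terms_b):
    """Shared terms divided by all distinct terms (0.0 to 1.0)."""
    set_a, set_b = set(terms_a), set(terms_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Placeholder term lists for illustration only.
example_plant_terms = ["cytoskeleton", "ATP binding", "root development"]
example_human_terms = ["cytoskeleton", "ATP binding", "muscle contraction"]

shared = sorted(set(example_plant_terms) & set(example_human_terms))
print("Shared terms:", shared)
print("Jaccard similarity:",
      jaccard_similarity(example_plant_terms, example_human_terms))
```

With real data the same term may be worded differently for plants and animals, so a low score would need a closer look before drawing conclusions.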
56
Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??
• Some similar co-expression of genes for each homolog ??
• Some similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf
57
58
Homologous proteins will have:
• Very low E-values for sequence alignment (< 0.00001)
• >25% conserved sequences for >100 aa*
• Protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??
• Some similar co-expression of genes for each homolog ??
• Some similar Function Gene Ontology (GO terms)
• Conserved protein domains
* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf
59
Members of the Arabidopsis actin family of genes are homologous
with each other. Does that mean that the Arabidopsis actins are
homologous with human ACTA1?
60
Arabidopsis actin gene ACT7 plays an essential role in germination and root growth
Figure panels: Wild-type, no ACT7 mutation; Mutant ACT7+
The Plant Journal, Volume 33, Issue 2, pages 319-328, 16 JAN 2003, DOI: 10.1046/j.1365-313X.2003.01626.x
http://onlinelibrary.wiley.com/doi/10.1046/j.1365-313X.2003.01626.x/full#f2
We have an ACT7 mutant with an observable phenotype
difference compared to the normal wild type.
61
Have we found a suitable plant research model
for nemaline myopathy?
What additional information would you want?
Scientific literature searches for Arabidopsis
information are easy to access in
http://www.Araport.org → apps → 50 years of
Arabidopsis research!
62