Bioperl

advertisement
BioPerl - documentation
• Bioperl tutorial
http://www.bioperl.org/Core/Latest/bptutorial.html
• Mastering Perl for Bioinformatics: Introduction to Bioperl
http://www.oreilly.com/catalog/mperlbio/chapter/ch09.pdf
• Bioperl documentation
http://www.bioperl.org/Core/Latest/modules.html
• Modules listing.
• http://doc.bioperl.org/releases/bioperl-1.0.1/
• Unix help: man and perldoc.
• http://www.pasteur.fr/recherche/unites/sis/formation/bioperl/
Bio::Seq class structure
Building Sequence Bio::Seq
de novo:
use Bio::Seq;
my $seq1 = Bio::Seq->new ( -seq => 'ATGAGTAGTAGTAAAGGTA',
-id => 'my seq',
-desc => 'this is a new Seq');
from a file:
use Bio::SeqIO;
my $seqin = Bio::SeqIO->new ( -file => 'seq.fasta', -format => 'fasta');
my $seq3 = $seqin->next_seq();
my $seqin2 = Bio::SeqIO->newFh ( -file => 'seq.fasta', -format => 'fasta');
my $seq4 = <$seqin2>;
my $seqin3 = Bio::SeqIO->newFh ( -file => 'golden sp:TAUD_ECOLI |',
-format => 'swiss');
my $seq5 = <$seqin3>;
Building Sequence Bio::Seq
by fetching an entry from a remote database:
use Bio::DB::SwissProt;
my $banque = new Bio::DB::SwissProt;
my $seq6 = $banque->get_Seq_by_id('KPY1_ECOLI');
by fetching an entry from a bioperl indexed
local database:
use Bio::Index::Swissprot;
my $inx = Bio::Index::Swissprot->new( -filename => 'small_swiss.inx');
my $seq7 = $inx->fetch('MALK_ECOLI');
$seqobj->seq;
Relation of a SwissProt entry and the
corresponding Bio::Seq object components
analysis tools
Bio::Seq
# sequence as string
$seqobj->subseq(10,40);
# sub-sequence as string
$seqobj->trunc(10,100);
# sub-sequence as fresh Bio::PrimarySeq
$seqobj->translate;
statistical tools
Bio::Tools::SeqStats
$seq_stats = Bio::Tools::SeqStats->
new(-seq=>$seqobj);
$seq_stats->count_monomers();
$seq_stats->count_codons();
$weight = $seq_stats->get_mol_wt($seqobj);
An universal converter with SeqIO
use Bio::SeqIO;
my $in = Bio::SeqIO->new(-file => "inputfilename"
, '-format' => 'Fasta');
my $out = Bio::SeqIO->new(-file =>
">outputfilename" , '-format' => 'EMBL');
my $seq = $in->next_seq() ;
$out->write_seq($seq);
• Similarly, format conversions of alignment formats
using AlignIO, e.g. clustalw, fasta, phylip, Pfam,
msf, bl2seq, meme.
Align Classes diagram!!!
Run Clustalw and print the result on
standard output in Phylip format.
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::AlignIO;
my $inputfilename = $ARGV[0];
my $outform = $ARGV[1];
my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
my $aln = $factory->align($inputfilename);
my $out = Bio::AlignIO->new ( -fh => \*STDOUT, -format => $outform );
$out->write_aln($aln);
Blast search
StandAloneBlast run
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;
my $Seq_in = Bio::SeqIO->new (-file => $query_file.faa, -format => 'fasta');
my $query = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp',
'database' => 'swissprot' );
$factory->e(1.0); # Setting parameters
$factory->outfile('blast.out'); # Saving output to file
my $blast_report = $factory->blastall($query);
# looking at the report: many results with multiple hits with multiple hsps…
my $result = $blast_report->next_result;
while( my $hit = $result->next_hit()) {
print "\thit name: ", $hit->name(), " significance: ", $hit->significance(), "\n";
while( my $hsp = $hit->next_hsp()) {
print “hsp length: ", $hsp->length(), "\n";
}}
Blast search
Parse Blast results on standard input
my $blast_report = new Bio::SearchIO ('-format' => 'blast', '-fh' => \*STDIN );
my $result = $blast_report->next_result;
Extracting alignmernts
my $pattern = $ARGV[1];
my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
while( my $hit = $result->next_hit()) {
if ($hit->name() =~ /$pattern/i ) {
while( my $hsp = $hit->next_hsp()) {
my $aln = $hsp->get_aln();
$out->write_aln($aln);
}}}
Blast Classes diagram
Extract translations from Genbank entry
Extract translations from Genbank entry
my $dbin = Bio::SeqIO->newFh ( -fh => STDIN, -format => $format );
my $out = Bio::SeqIO->newFh ( -fh => STDOUT, -format => 'fasta' );
while($entry = <$dbin>) {
my $entryid = $entry->id();
foreach my $feat ($entry->top_SeqFeatures()) {
if ($feat->primary_tag() eq 'CDS') {
my $id = $feat->has_tag('gene') ?
join(' ', $feat->each_tag_value('gene')) : 'no_gene';
my $acc = $feat->has_tag('protein_id') ?
join(' ', $feat->each_tag_value('protein_id')) : 'no_pid';
my $featseq = Bio::PrimarySeq->new();
my $translation = $feat->has_tag('translation') ?
join(' ', $feat->each_tag_value('translation')) :
if ($translation eq ``) {print "can't get the correct seq of $id|$acc **\n";
next; }
$featseq->seq($translation);
$featseq->id("$id|$acc");
print $out $featseq;
}}}
Download