Perl_3


Programming and Perl for Bioinformatics

Part III

Basic Data Types

Perl has three basic data types:

scalar

array (list)

associative array (hash)

Associative Arrays/Hashes

List of scalar values (like array)

Elements referred to by key, not index number

Elements stored as a list of key-value pairs

%threeletter = ('A','ALA','V','VAL','L','LEU');
#               key value key value key value
print $threeletter{'A'};   # "ALA"
print $threeletter{'L'};   # ?

exists checks whether a specific hash key exists

if (exists $threeletter{'E'}) { print $threeletter{'E'}; }   # ?

print "Exists\n" if exists $array{$key};
print "Defined\n" if defined $array{$key};
print "True\n" if $array{$key};
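The three tests above answer different questions, which a small sketch may make clearer. The hash %code and its contents are made up for illustration: 'E' holds an empty string, 'X' holds undef, and 'Q' is not a key at all.

```perl
use strict;
use warnings;

# Hypothetical data: 'E' holds an empty string, 'X' holds undef, 'Q' is absent.
my %code = ('A' => 'ALA', 'E' => '', 'X' => undef);

my %state;
foreach my $key ('A', 'E', 'X', 'Q') {
    if    (!exists  $code{$key}) { $state{$key} = 'no such key'; }
    elsif (!defined $code{$key}) { $state{$key} = 'exists, but undefined'; }
    elsif (!$code{$key})         { $state{$key} = 'defined, but false'; }
    else                         { $state{$key} = 'true'; }
    print "$key: $state{$key}\n";
}
```

A plain truth test such as if ($code{'E'}) would therefore skip a key that exists but holds an empty string or 0, which is why exists is the safer check for key membership.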

Getting all keys and values in a hash

%threeletter = ('A','ALA','V','VAL','L','LEU');

keys returns a list of all keys

values returns a list of all values

each returns one key-value pair each time it's called

($key, $val) = each %threeletter;

Unlike an array, a hash is not an ordered list (the order of the key-value pairs is determined by the Perl interpreter)

foreach $k ( keys %threeletter ) { print $k; }
# Might return, for instance, "A L V",
# not "A V L" (the keys do not come back in any guaranteed order)

foreach $v ( values %threeletter ) { print $v; }   # ?
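When a predictable order is needed, one common approach is to sort the keys before looping. The following is a minimal sketch using the same %threeletter hash; the variable name @sorted_keys is only illustrative.

```perl
use strict;
use warnings;

my %threeletter = ('A','ALA','V','VAL','L','LEU');

# sort puts the keys in alphabetical order, whatever order keys returns them in.
my @sorted_keys = sort keys %threeletter;

foreach my $k (@sorted_keys) {
    print "$k => $threeletter{$k}\n";
}
```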

Associative Arrays

Some common functions:

keys(%hash)           # returns a list of all the keys
values(%hash)         # returns a list of all the values
each(%hash)           # each time this is called, it returns a 2-element list
                      # consisting of the next key/value pair in the hash
delete($hash{$key})   # removes the pair associated with $key
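As a rough illustration of each and delete together, the sketch below walks over every pair with a while loop and then removes one entry. The counter $pairs_seen and the choice of 'V' as the key to delete are arbitrary.

```perl
use strict;
use warnings;

my %threeletter = ('A','ALA','V','VAL','L','LEU');

# each returns an empty list once every pair has been visited,
# which ends the while loop.
my $pairs_seen = 0;
while (my ($key, $val) = each %threeletter) {
    print "$key => $val\n";
    $pairs_seen++;
}

# Remove the pair whose key is 'V'.
delete $threeletter{'V'};
my $remaining = join ' ', sort keys %threeletter;
print "Remaining keys: $remaining\n";
```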

More on Perl

Subroutines and Functions

A way to organize a program

Wrap up a block of code

Have a name

Provide a way to pass values to the block and report back the results

Regular expression

Basics about Subroutines

# define a subroutine
sub myblock {
    my ($arg1, $arg2, $arg3, …, $argN) = @_;
    # @_ is a special variable containing the args
    print "Please enter something: ";
}

# function call
myblock($arg1, $arg2, …, $argN);

Example

sub add8A {
    my ($rna) = @_;
    $rna .= "AAAAAAAA";
    return $rna;
}

# the original rna
$rna = "CGAAUCUAGGAU";

$longer_rna = add8A($rna);
print "I added 8 As to $rna to get $longer_rna.\n";

More example

sub denaturizing {
    my (@products) = @_;
    my @strands = ();
    foreach $pairs (@products) {
        ($A,$B) = split /\s/, $pairs;
        @strands = (@strands, $A, $B);
    }
    return @strands;
}

# templates are in the form "A B". Ex. "ACGT TGCA"
@Denatured = denaturizing(@PCRproducts);
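The call above leaves @PCRproducts undefined, so here is a self-contained sketch of how it might be used. The two sample products are invented for illustration, and push is used in place of rebuilding the list on every pass, which should behave the same way.

```perl
use strict;
use warnings;

sub denaturizing {
    my (@products) = @_;
    my @strands = ();
    foreach my $pair (@products) {
        # Each product is "strand1 strand2", separated by one whitespace character.
        my ($A, $B) = split /\s/, $pair;
        push @strands, $A, $B;
    }
    return @strands;
}

# Hypothetical input data in the "A B" form described above.
my @PCRproducts = ("ACGT TGCA", "GGCC CCGG");
my @Denatured   = denaturizing(@PCRproducts);

print join(",", @Denatured), "\n";
```

With this input the subroutine returns four single strands, two per product, in the order they appeared.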

Variables Scope

A variable $a is used both in the subroutine and in the main part of the program.

use strict;

# Version 1 (without my)
$a = 0;
print "$a\n";
sub changeA {
    $a = 1;
}
print "$a\n";
changeA();
print "$a\n";

# Version 2 (with my)
my $a = 0;
print "$a\n";
sub changeA {
    my $a = 1;
}
print "$a\n";
changeA();
print "$a\n";

The value of $a is printed three times. Can you guess what values are printed?

$a is a global variable
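The same contrast can be sketched with a differently named variable, which avoids the special status Perl gives $a and $b. Everything here is illustrative: $count, change_global, and change_local are made-up names, and our is used so that the global declaration passes use strict.

```perl
use strict;
use warnings;

our $count = 0;                  # package (global) variable

sub change_global {
    $count = 1;                  # modifies the global $count
}

sub change_local {
    my $count = 5;               # a new lexical variable, visible only in this block
    return $count;
}

my @seen;
push @seen, $count;              # value before any call
change_local();
push @seen, $count;              # the global is untouched by change_local
change_global();
push @seen, $count;              # the global was changed by change_global

print "@seen\n";
```

Only the subroutine that assigns without my changes what the main program sees.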

Ex: What would be the output?

#!/usr/bin/perl -w

$dna = 'AAAAA';

$result = A_to_T($dna);
print "I changed all the A's in $dna to T's and got $result\n\n";

#############################################
# Subroutines
sub A_to_T {
    my($input) = @_;
    $dna = $input;
    $dna =~ s/A/T/g;
    return $dna;
}

Output?
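For comparison, here is a variant in which the subroutine works on its own lexical copy. This is a sketch of one possible fix, assuming the intent is for the caller's $dna to stay unchanged; it is not necessarily the answer the exercise expects.

```perl
use strict;
use warnings;

sub A_to_T {
    my ($input) = @_;
    my $dna = $input;            # lexical copy, separate from the caller's $dna
    $dna =~ s/A/T/g;
    return $dna;
}

my $dna    = 'AAAAA';
my $result = A_to_T($dna);

print "I changed all the A's in $dna to T's and got $result\n";
```

Comparing this output with that of the original program shows the effect of the missing my.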

Regular Expressions

Regular expressions: a language for specifying text strings

Regular expressions are a mechanism for specifying character patterns

Useful for

Finding files by name

Finding text in a file

Finding (or not finding) interesting text in a string

Text based search and replace

Finding and extracting text

Pattern Finding

Problem: find an ORF in nucleotide sequence

Look for start (ATG) and stop codons (TAA, TAG, TGA)

Pattern search operator: m// or //

$string =~ /<pattern>/ returns true if the pattern matches somewhere in $string, false otherwise

Example:

$dna = "GATGCCATGACACTGTTCA";
if ($dna =~ /ATG/) {
    print "starting codon is there\n";
} else {
    print "no starting codon!\n";
}
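The match operator only says whether ATG occurs. To also see where it occurs, one option is a global match in a while loop, reading the start offset of each match from the built-in @- array. The /g modifier and @- go slightly beyond what the slides cover, so treat this as a sketch.

```perl
use strict;
use warnings;

my $dna = "GATGCCATGACACTGTTCA";

# With /g in a while loop, each iteration finds the next ATG.
# $-[0] holds the offset (counting from 0) where the current match starts.
my @starts;
while ($dna =~ /ATG/g) {
    push @starts, $-[0];
}

print "ATG found at offsets: @starts\n";
```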

Regular Expressions

Optional characters: ?, * and +

/colou?r/ matches color or colour; ? means 0 or 1

/oo*h!/ matches oh! or ooh! or ooooh!; * means 0 or more

/o+h!/ matches oh! or ooh! or ooooh!; + means 1 or more

Wild card: .

/beg.n/ matches begin or began or begun

The * and + operators are named after Stephen Cole Kleene (Kleene star, Kleene plus)
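A short sketch can confirm how these patterns behave on a few words. The anchors ^ and $ are added so that each pattern must cover the whole word, and the word list is invented for illustration. qr// is used only to store each pattern next to its test word.

```perl
use strict;
use warnings;

# Each entry pairs a pattern with a word to test against it.
my @cases = (
    [ qr/^colou?r$/, 'color'   ],   # u is optional
    [ qr/^colou?r$/, 'colouur' ],   # two u's are too many for ?
    [ qr/^oo*h!$/,   'ooooh!'  ],   # one o, then zero or more o's
    [ qr/^o+h!$/,    'h!'      ],   # + needs at least one o
    [ qr/^beg.n$/,   'begun'   ],   # . stands for exactly one character
    [ qr/^beg.n$/,   'begn'    ],   # nothing left for . to match
);

my @results;
foreach my $case (@cases) {
    my ($pattern, $word) = @$case;
    my $verdict = ($word =~ $pattern) ? 'match' : 'no match';
    push @results, $verdict;
    print "$word: $verdict\n";
}
```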

Common Regular Expressions

White-space characters: \t (tab), \n (newline), \r (return)

\s       : match a whitespace character
x        : character 'x'
.        : any character except newline
^r       : match at beginning of line
r$       : match at end of line
r|s      : match either r or s
(r)      : group characters (to be saved in $1, $2, etc)
[xyz]    : character class, in this case, matches either an 'x', a 'y', or a 'z'
[abj-oZ] : character class with a range in it; matches 'a', 'b', any letter from 'j' through 'o', or 'Z'
r*       : zero or more r's, where r is any regular expression
r+       : one or more r's
r?       : zero or one r's (i.e., an optional r)
{name}   : expansion of the "name" definition (lex/flex notation; not part of Perl regular expressions)
rs       : RE r followed by RE s (i.e., concatenation)
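Several of these pieces can be combined in one pattern. The sketch below reads lines of the form "one-letter code, whitespace, three-letter code" and stores each pair in a hash, using ^, $, \s, +, a character class with a range, and capture groups. The input lines are made up, including one deliberately malformed line.

```perl
use strict;
use warnings;

# Hypothetical input: a one-letter code, some whitespace, then a longer code.
my @lines = ("A ALA", "V\tVAL", "L  LEU", "bad line here");

my %threeletter;
foreach my $line (@lines) {
    # ^ and $ anchor the pattern to the whole line;
    # the two pairs of parentheses save their matches in $1 and $2.
    if ($line =~ /^([A-Z])\s+([A-Z]+)$/) {
        $threeletter{$1} = $2;
    } else {
        print "Skipping: $line\n";
    }
}

foreach my $k (sort keys %threeletter) {
    print "$k => $threeletter{$k}\n";
}
```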

Exercise

Ex1:

$dna = "AGGCTCGTACGACG";
if ( $dna =~ /CT[CGT]ACG/ ) {
    print "I found the motif!!\n";   # ?
}

Ex2: Find an ORF in nucleotide sequence (look for start (ATG) and stop codons (TAA, TAG, TGA))

$dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";

?
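One possible approach to Ex2 is sketched below. It assumes an ORF means the first ATG followed by whole codons up to the first in-frame stop codon. It also uses a few constructs not listed on the slides: (?:...) for grouping without capturing, {3} for exactly three characters, *? for a non-greedy repeat, and /i to ignore case.

```perl
use strict;
use warnings;

my $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";

# atg, then as few complete codons as possible, then a stop codon.
# Stepping three letters at a time keeps the stop codon in frame.
my $orf = '';
if ($dna =~ /(atg(?:[acgt]{3})*?(?:taa|tag|tga))/i) {
    $orf = $1;
    print "ORF found at offset $-[1]: $orf\n";
} else {
    print "No ORF found\n";
}
```

Because the codons are consumed three letters at a time, an out-of-frame tga inside the sequence does not end the ORF early.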
