List of scalar values (like array)
Elements referred to by key, not index number
Elements stored as a list of key-value pairs
%threeletter = ('A','ALA','V','VAL','L','LEU');
#               key value key value key value
print $threeletter{'A'};   # "ALA"
print $threeletter{'L'};   # ?
exists checks if a specific hash key exists
if (exists $threeletter{'E'}) { print $threeletter{'E'}; }   # ?
print "Exists\n" if exists $array{$key};
print "Defined\n" if defined $array{$key};
print "True\n" if $array{$key};
%threeletter = ('A','ALA','V','VAL','L','LEU');
keys returns a list of all keys
values returns a list of all values
each returns one key-value pair each time it’s called
($key, $val) = each %threeletter;
Unlike an array, a hash is not an ordered list (the order of key-value pairs is determined by the Perl interpreter)
foreach $k ( keys %threeletter ) { print $k; }
# Might return, for instance, "A L V",
# not "A V L" (the keys need not be sorted)
foreach $v ( values %threeletter ) { print $v; }   # ?
Some common functions:
keys(%hash)            # returns a list of all the keys
values(%hash)          # returns a list of all the values
each(%hash)            # each time this is called, it returns
                       # a 2-element list consisting of the next
                       # key/value pair in the hash
delete($hash{[key]})   # removes the pair associated with key
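A minimal sketch pulling the hash operations above together. The extra keys 'X' and 'G' are hypothetical entries added only to separate the exists, defined and true tests, and the describe subroutine is an illustrative helper, not part of the original slides.

```perl
use strict;
use warnings;

my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');

# Hypothetical extra entries, added only to separate the three tests:
$threeletter{'X'} = '';      # key exists, value is defined but false
$threeletter{'G'} = undef;   # key exists, value is undefined

# Report which of the three tests a key passes.
sub describe {
    my ($key) = @_;
    my @passed;
    push @passed, 'exists'  if exists  $threeletter{$key};
    push @passed, 'defined' if defined $threeletter{$key};
    push @passed, 'true'    if $threeletter{$key};
    return join(',', @passed);
}

foreach my $key ('A', 'X', 'G', 'E') {
    print "$key: ", describe($key), "\n";
}

# Sort the keys when a stable order matters; keys() alone promises none.
my @sorted = sort keys %threeletter;
print "@sorted\n";

# delete removes a key together with its value.
delete $threeletter{'X'};
delete $threeletter{'G'};
my $count = scalar keys %threeletter;
print "$count pairs left\n";
```

The key 'A' passes all three tests, 'X' passes exists and defined, 'G' passes only exists, and 'E' passes none, which is why exists is the right test for "is this key in the hash at all".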
A way to organize a program
Wrap up a block of code
Have a name
Provide a way to pass values to the block and report back the results
# define a subroutine
sub myblock {
    my ($arg1, $arg2, $arg3, …, $argN) = @_;
    # @_ is a special variable containing the args
    print "Please enter something: ";
}
# function call
myblock($arg1, $arg2, …, $argN);
Example
sub add8A {
    my ($rna) = @_;
    $rna .= "AAAAAAAA";
    return $rna;
}
# the original rna
$rna = "CGAAUCUAGGAU";
$longer_rna = add8A($rna);
print "I added 8 As to $rna to get $longer_rna.\n";
sub denaturizing {
    my (@products) = @_;
    my @strands = ();
    foreach $pairs (@products) {
        ($A,$B) = split /\s/, $pairs;
        @strands = (@strands, $A, $B);
    }
    return @strands;
}
# templates are in the form "A B". Ex. "ACGT TGCA"
@Denatured = denaturizing(@PCRproducts);
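A runnable sketch of the subroutine above, showing a list passed in through @_ and a list returned. The contents of @PCRproducts are invented for illustration, and the body uses push and my-declared variables as one idiomatic variant of the same logic.

```perl
use strict;
use warnings;

# Split each "A B" template into its two strands and return them all.
sub denaturizing {
    my (@products) = @_;
    my @strands;
    foreach my $pair (@products) {
        my ($first, $second) = split /\s/, $pair;
        push @strands, $first, $second;
    }
    return @strands;
}

# Hypothetical input data, invented for this example.
my @PCRproducts = ('ACGT TGCA', 'GGCC CCGG');
my @Denatured   = denaturizing(@PCRproducts);
print join(', ', @Denatured), "\n";   # ACGT, TGCA, GGCC, CCGG
```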
A variable $a is used both in the subroutine and in the main part of the program.
use strict;
# Version 1: $a without my
$a = 0;
print "$a\n";
sub changeA {
    $a = 1;
}
print "$a\n";
changeA();
print "$a\n";

# Version 2: $a declared with my
my $a = 0;
print "$a\n";
sub changeA {
    my $a = 1;
}
print "$a\n";
changeA();
print "$a\n";
The value of $a is printed three times. Can you guess what values are printed?
$a is a global variable
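The difference between a global and a my variable can be sketched as follows. The names $global and $lexical are assumptions chosen for this example; they replace $a because $a and $b are special variables used by sort and are exempt from use strict, which can hide the effect being shown.

```perl
use strict;
use warnings;

# Version 1: a package (global) variable, declared with "our" so that
# "use strict" accepts it. The subroutine changes the one shared copy.
our $global = 0;
my @seen_global = ($global);
sub change_global { $global = 1; return; }
push @seen_global, $global;
change_global();
push @seen_global, $global;
print "global:  @seen_global\n";    # global:  0 0 1

# Version 2: the subroutine declares its own variable with "my", so the
# assignment inside it never reaches the variable of the main program.
my $lexical = 0;
my @seen_lexical = ($lexical);
sub change_lexical { my $lexical = 1; return; }
push @seen_lexical, $lexical;
change_lexical();
push @seen_lexical, $lexical;
print "lexical: @seen_lexical\n";   # lexical: 0 0 0
```

Defining a subroutine does not run it, so the second value is still 0 in both versions; only the call can change anything, and only when the variable is shared.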
#!/usr/bin/perl -w
$dna = 'AAAAA';
$result = A_to_T($dna);
print "I changed all the A's in $dna to T's and got $result\n\n";
#############################################
# Subroutines
sub A_to_T {
    my($input) = @_;
    $dna = $input;
    $dna =~ s/A/T/g;
    return $dna;
}
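In the script above, the subroutine assigns to the global $dna, so by the time print runs both $dna and $result hold TTTTT. A sketch of the same script with the subroutine's variable declared with my, which keeps the caller's sequence intact:

```perl
use strict;
use warnings;

my $dna    = 'AAAAA';
my $result = A_to_T($dna);
print "I changed all the A's in $dna to T's and got $result\n\n";

sub A_to_T {
    my ($input) = @_;
    my $dna = $input;    # private copy; the caller's $dna is untouched
    $dna =~ s/A/T/g;
    return $dna;
}
```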
Regular Expressions: Language for specifying text strings
Regular expressions are a mechanism for specifying character patterns
Useful for
Finding files by name
Finding text in a file
Finding (or not finding) interesting text in a string
Text based search and replace
Finding and extracting text
Problem: find an ORF in nucleotide sequence
Look for start (ATG) and stop codons (TAA, TAG, TGA)
Pattern search operator: m// or //
$string =~ /<pattern>/ returns true if the pattern matches somewhere in $string, false otherwise
Example:
$dna = "GATGCCATGACACTGTTCA";
if ($dna =~ /ATG/) {
    print "starting codon is there\n";
} else {
    print "no starting codon!\n";
}
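A small sketch of the match operator wrapped in a subroutine. The name start_codon_position is hypothetical, and the use of the built-in @- array to report where the match began goes slightly beyond the example above.

```perl
use strict;
use warnings;

# Return the 0-based position of the first start codon, or -1 if none.
sub start_codon_position {
    my ($dna) = @_;
    if ($dna =~ /ATG/) {
        return $-[0];    # @- holds the start offset of the last match
    }
    return -1;
}

print start_codon_position("GATGCCATGACACTGTTCA"), "\n";   # 1
print start_codon_position("GGCCTTAA"), "\n";              # -1
```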
Optional characters ?, * and +
/colou?r/    color or colour            ? (0 or 1)
/oo*h!/      oh! or ooh! or ooooh!      * (0 or more)
/o+h!/       oh! or ooh! or ooooh!      + (1 or more)
Wild cards .
/beg.n/      begin or began or begun
* and + are the Kleene star and Kleene plus, named after Stephen Cole Kleene
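The patterns above can be tried out with a short script. This is a sketch: the check subroutine and the word list are made up for illustration, and the ^ and $ anchors are added so that each pattern has to match the whole word rather than a part of it.

```perl
use strict;
use warnings;

# Return the patterns (as text) that match the whole word.
sub check {
    my ($word) = @_;
    my @hits;
    push @hits, 'colou?r' if $word =~ /^colou?r$/;
    push @hits, 'oo*h!'   if $word =~ /^oo*h!$/;
    push @hits, 'o+h!'    if $word =~ /^o+h!$/;
    push @hits, 'beg.n'   if $word =~ /^beg.n$/;
    return join(' ', @hits);
}

foreach my $word ('color', 'colour', 'oh!', 'ooooh!', 'h!', 'begun', 'begn') {
    print "$word: ", check($word), "\n";
}
```

Note that 'h!' matches neither /oo*h!/ nor /o+h!/, since both require at least one o, and 'begn' fails /beg.n/ because . must consume exactly one character.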
White-space characters: \t (tab), \n (newline), \r (return)
\s       : match a whitespace character
x        : character 'x'
.        : any character except newline
^r       : match r at beginning of line
r$       : match r at end of line
r|s      : match either r or s
(r)      : group characters (to be saved in $1, $2, etc)
[xyz]    : character class; in this case, matches either an 'x', a 'y', or a 'z'
[abj-oZ] : character class with a range in it; matches 'a', 'b', any letter from 'j' through 'o', or 'Z'
r*       : zero or more r's, where r is any regular expression
r+       : one or more r's
r?       : zero or one r's (i.e., an optional r)
{name}   : expansion of the "name" definition (lex/flex notation, not Perl regex syntax)
rs       : RE r followed by RE s (i.e., concatenation)
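A short sketch combining several of the constructs listed above: anchors, a character class, + and grouping with $1 and $2. The subroutine name split_template and the sample strings are assumptions made for this example.

```perl
use strict;
use warnings;

# Split an "A B" template into its two strands using capture groups.
sub split_template {
    my ($template) = @_;
    # ^ and $ anchor the pattern, [ACGT]+ is one or more DNA letters,
    # \s is the separating whitespace, (...) saves into $1 and $2.
    if ($template =~ /^([ACGT]+)\s([ACGT]+)$/) {
        return ($1, $2);
    }
    return ();    # no match: empty list
}

my @strands = split_template("ACGT TGCA");
print "@strands\n";          # ACGT TGCA

my @bad = split_template("ACGU TGCA");
print scalar(@bad), "\n";    # 0, because U is not in the class [ACGT]
```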
Ex1:
$dna = "AGGCTCGTACGACG";
if ( $dna =~ /CT[CGT]ACG/ ) {
    print "I found the motif!!\n";   # ?
}
Ex2: Find an ORF in nucleotide sequence (look for start (ATG) and stop codons (TAA, TAG, TGA))
$dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
?
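One possible answer to Ex2, offered as a sketch rather than the intended solution. It assumes that an ORF means the first ATG followed by whole codons up to and including the first in-frame stop codon. The subroutine name find_orf is hypothetical, and the pattern uses three pieces of syntax not introduced above: (?:...) groups without saving, {3} means exactly three, and *? is the non-greedy form of *.

```perl
use strict;
use warnings;

# Return the first ORF (start codon through the first in-frame stop
# codon), or an empty return value when none is found.
sub find_orf {
    my ($dna) = @_;
    # /i ignores case, since the sequence here is written in lowercase.
    if ($dna =~ /(atg(?:[acgt]{3})*?(?:taa|tag|tga))/i) {
        return $1;
    }
    return;
}

my $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
my $orf = find_orf($dna);
print "ORF: $orf (", length($orf), " bases)\n";
```

Stepping through the codons three letters at a time matters here: the simpler pattern /atg.*?(taa|tag|tga)/ would stop at the out-of-frame tga early in the sequence and return a 14-base match, which is not a whole number of codons.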