Perl Laboratory Study Guide – Section III

9. Hashes


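There are three main data types in Perl: scalars, arrays, and hashes. A hash stores key/value pairs and
provides very fast look-ups by key. Its syntax is similar to that of an array, except that elements are
indexed by a string key in curly braces rather than by a number in square brackets.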
o %hash = ('key' => 'value');
o $value = $hash{'key'};
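You may separate keys from values with either a plain comma or the => operator. Both forms build the
same hash; => simply makes the pairing easier to read.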
%array = (
    'key1', 'value1',
    'key2', 'value2',
    'key3', 'value3',
);
%array = (
    'key1' => 'value1',
    'key2' => 'value2',
    'key3' => 'value3',
);




The keys and values in a hash may be addressed as arrays.
o @keys = keys %hash;
o @values = values %hash;
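The declaration, look-up, and keys/values forms above can be combined into one short script. This is only a minimal sketch; the nucleotide names used as data are an illustrative assumption and are not part of the lab material.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data (an assumption): one-letter code => nucleotide name
my %hash = (
    'A' => 'adenine',
    'C' => 'cytosine',
    'G' => 'guanine',
);

# Look up a single value by its key
my $value = $hash{'C'};
print "C stands for $value\n";

# keys and values each return a list. A hash has no fixed order,
# so the lists are sorted here to make the output predictable.
my @keys   = sort keys %hash;
my @values = sort values %hash;
print "keys:   @keys\n";
print "values: @values\n";
```

Without the sort, keys and values still return their lists in a matching order, but that order can differ from one run to the next.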
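Strictly speaking, ‘associative array’ is simply an older name for a hash. In this guide the term is used
loosely for a nested (two-dimensional) array, which Perl builds from an array of array references. A nested
array is indexed by position rather than by key, so finding an entry by name means scanning the array,
which is much slower than a hash look-up. $array[$i]->[$j] accesses the same element as
$array[$i][$j]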
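A minimal sketch of the nested syntax, using made-up values rather than the ones from the exercise. The names @array, %nested, row2, and col3 are illustrative assumptions.

```perl
use strict;
use warnings;

# A two-dimensional array is an array whose elements are array references
my @array = (
    [ 'a', 'b', 'c' ],
    [ 'd', 'e', 'f' ],
);

my ($i, $j) = (1, 2);

# Both forms reach the same element; the arrow is optional between subscripts
my $with_arrow    = $array[$i]->[$j];
my $without_arrow = $array[$i][$j];
print "$with_arrow $without_arrow\n";

# The hash counterpart is a hash of hash references, addressed by key
my %nested = ( row2 => { col3 => 'f' } );
my $by_key = $nested{row2}{col3};
print "$by_key\n";
```

The nested array is addressed by position and the nested hash by name, which is the contrast the exercise below asks you to explore.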
For ex9-1.pl, write a script that stores the following values as both an ‘associative array’ and as a
hash: {[1][a], [1][b], [1][c]}, {[2][a],[2][b],[2][c]},
{[3][a],[3][b],[3][c]}. You may, of course, hard code the population of these data types,
but populating them using for loops is probably more helpful in the long run.
Modify ex9-1.pl to sort both the keys and values of arrays/hashes using numeric and lexical (alphabetical)
sorting, and print the results to the screen. Don’t freak out … this isn’t that hard.
o To sort arrays alphabetically: @array = sort @array;
o To sort arrays numerically: @array = sort { $a <=> $b } @array;
o To sort by key and print each value as a row of asterisks: foreach (sort keys (%hash)) { print "$_\t",
"*" x $hash{$_}, "\n"; }
o To sort keys by their values in ascending order: foreach (sort { $hash{$a} <=> $hash{$b} }
keys (%hash)) { …… }
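The sorting forms above can be tried together on a small hash. This is a sketch only; the counts are invented for illustration and the variable names @by_key, @as_text, @by_number, and @by_value are assumptions.

```perl
use strict;
use warnings;

# Invented counts, e.g. how often each nucleotide was seen
my %hash = ( G => 3, A => 10, T => 1, C => 7 );

# Keys in alphabetical order
my @by_key = sort keys %hash;

# Values sorted as text: 10 lands before 3 because '1' sorts before '3'
my @as_text = sort values %hash;

# Values sorted as numbers
my @by_number = sort { $a <=> $b } values %hash;

# Keys ordered by their values, smallest value first
my @by_value = sort { $hash{$a} <=> $hash{$b} } keys %hash;

print "by key:    @by_key\n";
print "as text:   @as_text\n";
print "numeric:   @by_number\n";
print "by value:  @by_value\n";

# One row of asterisks per key
foreach (sort keys %hash) {
    print "$_\t", '*' x $hash{$_}, "\n";
}
```

Swapping $a and $b inside the comparison block reverses the order of the result.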
10. Hashes and the Genetic Code


As you have no doubt figured out from your first assignment, there are numerous ways to return the
correct amino acid for a given codon (tri-nucleotide combination).
Here’s the most difficult method:
sub codonReplacement {
    my($codon) = @_;
    #one test per codon
    return 'S' if ($codon =~ /TCA/i);
    return 'S' if ($codon =~ /TCC/i);
    return 'S' if ($codon =~ /TCG/i);
    …
}

Here is a better method:
sub codonReplacement {
    my($codon) = @_;
    #one pattern per amino acid
    return 'A' if ($codon =~ /GC./i);
    return 'C' if ($codon =~ /TG[TC]/i);
    return 'D' if ($codon =~ /GA[TC]/i);
    …
}

Here is the best method:
sub codonReplacement {
    my($codon) = @_;
    #hash keys are case-sensitive, so normalize the codon first
    $codon = uc $codon;
    my(%genetic_code) = (
        'TCA' => 'S', 'TCC' => 'S', 'TCG' => 'S', … );
    return $genetic_code{$codon} if (exists $genetic_code{$codon});
}

The ‘best’ method is not merely a thought exercise. Accurately reproducing this hash table will make
the rest of your semester much easier. This is up to you to finish.
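Once the table exists, translating a whole sequence is a matter of stepping through it three bases at a time. The sketch below is an illustration under stated assumptions: the helper name translate_dna and the '?' placeholder are hypothetical, and the table is deliberately partial, holding only codons that already appear in the methods above. Completing it is still the exercise.

```perl
use strict;
use warnings;

# Deliberately partial table: only codons taken from the examples above
my %genetic_code = (
    'TCA' => 'S', 'TCC' => 'S', 'TCG' => 'S',
    'TGT' => 'C', 'TGC' => 'C',
    'GAT' => 'D', 'GAC' => 'D',
);

# Hypothetical helper: translate a DNA string codon by codon.
# Codons missing from the table are reported as '?'.
sub translate_dna {
    my ($dna) = @_;
    $dna = uc $dna;
    my $protein = '';
    # Advance three bases at a time; a trailing partial codon is ignored
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = substr($dna, $i, 3);
        $protein .= exists $genetic_code{$codon} ? $genetic_code{$codon} : '?';
    }
    return $protein;
}

print translate_dna('tcatgtgacAAA'), "\n";
```

Here AAA comes back as '?' only because the partial table has no entry for it, which is a quick way to spot codons you have not yet added.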
11. A Sample Program
By now I’m certain that you are getting fairly proficient at perl. This is a good thing. Type in the sample
program below, ex11-1.pl, and think about the consequences of its findings. (I know that it’s a bit long, but
you just might learn something.)
A program to simulate the percentage of similar DNA in random sequences
#!/usr/bin/perl -w
use strict;

#declare variables
my $percent;
my @percentages;
my $result;

#initialize an array to store DNA
my @randomDNA = ();

#Seed the random number generator
srand(time|$$);

#Generate a set of ten random DNA sequences, each ten bases long, using a subroutine
@randomDNA = make_random_DNA_set(10, 10, 10);

#iterate through all pairs of sequences
for (my $k = 0; $k < scalar(@randomDNA) - 1; ++$k) {
    for (my $i = ($k + 1); $i < scalar(@randomDNA); ++$i) {
        $percent = matching_percentage($randomDNA[$k], $randomDNA[$i]);
        push(@percentages, $percent);
    }
}

#Average the result and convert the fraction to a percentage
$result = 0;
foreach (@percentages) { $result += $_; }
$result = ($result / scalar(@percentages)) * 100;
print "The average percentage of matching positions is ";
print "$result\n\n";
exit;

#Make a random set of DNA
sub make_random_DNA_set {
    my($minLen, $maxLen, $sizeOfSet) = @_;
    #length of DNA fragment, each fragment, set
    my $length; my $dna; my @set;
    #create a set of random DNA
    for (my $i = 0; $i < $sizeOfSet; ++$i) {
        #find a random length
        $length = random_length($minLen, $maxLen);
        #make a random DNA fragment
        $dna = make_random_DNA($length);
        #add DNA fragment to @set
        push(@set, $dna);
    }
    return @set;
}

#find random length between x and y
sub random_length {
    my($minLen, $maxLen) = @_;
    return (int(rand($maxLen - $minLen + 1)) + $minLen);
}

#pick random nucleotide
sub randomnucleotide {
    my(@nucleotides) = ('A', 'C', 'G', 'T');
    return randomelement(@nucleotides);
}

#randomly select element from array
sub randomelement {
    my(@array) = @_;
    return $array[rand @array];
}

#make random DNA
sub make_random_DNA {
    my($length) = @_;
    my $dna;
    for (my $i = 0; $i < $length; ++$i) {
        $dna .= randomnucleotide();
    }
    return $dna;
}

#matching percentage, returned as a fraction between 0 and 1
sub matching_percentage {
    my($string1, $string2) = @_;
    my($length) = length($string1);
    my($position);
    my($count) = 0;
    for ($position = 0; $position < $length; ++$position) {
        if (substr($string1, $position, 1) eq substr($string2, $position, 1)) {
            ++$count;
        }
    }
    return $count / $length;
}
1. How do you think the percentage of matching DNA generated from this random generator relates to
percentages of matching DNA generated from real sets of DNA randomly extracted from different
genomes?
2. How can you modify this script to prove your theory?
3. Does the percent randomness change as the random fragment length increases?
4. Is the subroutine matching_percentage the best method for finding the matching percentage?
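As one starting point for the last question, here is a sketch of an alternative that avoids the character-by-character loop. It relies on Perl's string XOR: identical characters XOR to a NUL byte, so counting NUL bytes counts matching positions. The name matching_percentage_xor is hypothetical, and whether this counts as "best" is left for you to judge.

```perl
use strict;
use warnings;

# Hypothetical alternative to matching_percentage.
# Returns a fraction between 0 and 1, like the original.
sub matching_percentage_xor {
    my ($string1, $string2) = @_;
    my $length = length($string1);
    return 0 if $length == 0;    # avoid dividing by zero

    # Compare only the first $length characters of the second string,
    # mirroring the original loop, which runs over the length of string1
    my $xor = $string1 ^ substr($string2, 0, $length);

    # tr with an empty replacement list only counts; here it counts NUL bytes
    my $count = ($xor =~ tr/\0//);
    return $count / $length;
}

print matching_percentage_xor('ACGTACGTAC', 'ACGTTTTTAC'), "\n";
```

The two sample strings agree at seven of ten positions. To compare speed with the loop version, the core Benchmark module is one option.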