Perl for biologists
PERL
More references, complex data types and objects
A. Emerson, Perl for Biologists

References in Perl
A reference is an alternative way of accessing variables such as arrays or associative arrays (hashes). It is useful to think of a reference as the address of the object in memory.
References are often very efficient and convenient because they are always stored in scalar variables, regardless of the size or complexity of the thing being referenced.
They can be used to avoid the difficulty of passing multiple arrays into or out of subroutines.
In Perl you can take a reference to just about anything, including subroutines.

Creating references and dereferencing
    @dna1 = qw(G G T C T G);
    $dna_ref = \@dna1;             # storing a reference in a scalar
    add_seqs(\@dna1, \@dna2);      # passing references directly to a subroutine
    $ref_ref = \$dna_ref;          # reference to a reference

    # dereferencing
    @array = @{$dna_ref};
    $base = $$dna_ref[1];          # careful! $dna_ref is dereferenced first and
    $$dna_ref[0] = 'C';            # then [1] or [0] is looked up (this is NOT
                                   # an element of an array called @dna_ref)
    $base = $dna_ref->[1];         # alternative (clearer) notation
It may help to think of a reference as an alias for the variable being referenced.

Two-dimensional arrays
References can be used to create data types not present in standard Perl, e.g. matrices (n-dimensional arrays):
    @row1 = (1.0, 0.0, 1.0);
    @row2 = (0.0, 1.0, 0.0);
    @row3 = (1.0, 0.0, 1.0);
    @matrix = (\@row1, \@row2, \@row3);   # Perl simulation of a 3x3 2-D matrix
    for (my $i = 0; $i < 3; $i++) {
        for (my $j = 0; $j < 3; $j++) {
            print "$matrix[$i]->[$j] ";
        }
        print "\n";
    }
Strictly speaking this is an array of references (to other arrays), but it is more flexible than a true matrix because the rows can be of different lengths.
The solution above, though, creates unnecessary named arrays (@row1, etc.); we would like to create the matrix directly.
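The slides note that references avoid the difficulty of passing several arrays into or out of a subroutine, but add_seqs itself is never defined. Below is a minimal sketch of the idea using two hypothetical subroutines of our own (count_identical and split_bases, not from the slides): because Perl flattens every array argument into the single list @_, each array is passed and returned as a reference instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): count the positions at which two
# aligned sequences carry the same base. Both arrays arrive as two separate
# references in @_, instead of being flattened into one long list.
sub count_identical {
    my ($seq_a, $seq_b) = @_;                     # two array references
    my $len_a = scalar @$seq_a;
    my $len_b = scalar @$seq_b;
    my $n = $len_a < $len_b ? $len_a : $len_b;    # compare up to the shorter one
    my $same = 0;
    for my $i (0 .. $n - 1) {
        $same++ if $seq_a->[$i] eq $seq_b->[$i];
    }
    return $same;
}

# Hypothetical helper: split a sequence into purines (A, G) and everything
# else (C, T) and return BOTH lists out of the sub as array references.
sub split_bases {
    my ($seq) = @_;
    my (@purines, @pyrimidines);
    for my $base (@$seq) {
        if ($base eq 'A' || $base eq 'G') { push @purines, $base }
        else                              { push @pyrimidines, $base }
    }
    return (\@purines, \@pyrimidines);
}

my @dna1 = qw(G G T C T G);
my @dna2 = qw(G C T C A G);

my $identical = count_identical(\@dna1, \@dna2);
print "identical positions: $identical\n";        # identical positions: 4

my ($pur, $pyr) = split_bases(\@dna1);
print "purines: @$pur; pyrimidines: @$pyr\n";     # purines: G G G; pyrimidines: T C T
```

Returning a list of references is one common way to get several arrays out of a subroutine; the caller then dereferences each one with @$ref or $ref->[$i].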
Anonymous arrays and hashes
Perl provides an alternative method for creating arrays and hashes:
    # standard array definition
    @dna1 = qw(G G T C T G);
    # anonymous definition using a reference
    $dna = [ 'A', 'A', 'A', 'A' ];   # NB: square [ ] brackets!
    $dna->[0] = 'T';
The new array does not have a name (it is "anonymous") but can be accessed through the reference.

Using anonymous arrays
    # defining a 2D matrix in one line
    @matrix = ( [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0] );
    print "$matrix[1]->[1]\n";
    # or even
    $matrix = [ [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0] ];
    $matrix->[1]->[1] = 2.0;

Using anonymous arrays
Between two subscripts the -> is optional:
    # defining a 2D matrix in one line
    @matrix = ( [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0] );
    print "$matrix[1]->[1]\n";
    # alternatively
    print "$matrix[1][1]\n";
The same applies to 3D arrays, hashes, etc.

Using anonymous arrays
Assigning one array reference to another doesn't copy the array; both references point to the same data:
    @array = (0, 0, 0, 0, 0);
    $aref = \@array;
    $bref = $aref;
    $aref->[1] = 99;
    print "@$aref\n@$bref\n";    # both lines print: 0 99 0 0 0
You can use an anonymous array to make a real copy:
    @array = (0, 0, 0, 0, 0);
    $aref = \@array;
    $bref = [@$aref];            # new anonymous array holding copies of the elements
    $aref->[1] = 99;
    print "@$aref\n@$bref\n";    # prints 0 99 0 0 0, then 0 0 0 0 0

Using anonymous arrays and hashes
Anonymous hashes can be created in a similar way using { }:
    # anonymous hashes
    $code = { UAA => 'stop', UCU => 'ser' };
    $code->{UAC} = 'tyr';
    foreach $key (keys %{$code}) {
        print $code->{$key}, "\n";
    }
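Building on the one-line matrix definitions above and the earlier remark that rows can have different lengths, here is a minimal sketch of a "ragged" 2D array made from anonymous rows. The values are illustrative only; the point is that loop bounds are taken from the data rather than hard-coded as 3.

```perl
use strict;
use warnings;

# A ragged 2D array: each row is an anonymous array of a different length.
my @matrix = ( [1.0, 0.0, 1.0],
               [0.0, 1.0],
               [1.0] );

# $#matrix is the last row index; $#{ $matrix[$i] } is the last column
# index of row $i, so each row is walked only as far as it actually goes.
for my $i (0 .. $#matrix) {
    for my $j (0 .. $#{ $matrix[$i] }) {
        print "$matrix[$i][$j] ";
    }
    print "\n";
}

# The same traversal, iterating over the row references directly.
my $total = 0;
for my $row (@matrix) {
    $total += $_ for @$row;
}
print "number of rows: ", scalar(@matrix), ", sum of elements: $total\n";
# number of rows: 3, sum of elements: 4
```

Iterating over the row references (for my $row (@matrix)) is often the clearer choice when the indices themselves are not needed.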
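The copying behaviour described above can be checked with a small self-contained script. This is a sketch with illustrative variable names; it also shows a caution the slides do not spell out: [@$aref] copies only one level, so for a matrix whose rows are themselves references, each row must be copied too.

```perl
use strict;
use warnings;

my @array = (0, 0, 0, 0, 0);
my $aref  = \@array;

my $alias = $aref;        # copies the reference only: same underlying array
my $copy  = [@$aref];     # new anonymous array holding copies of the elements

$aref->[1] = 99;
print "alias: @$alias\n";   # alias: 0 99 0 0 0
print "copy:  @$copy\n";    # copy:  0 0 0 0 0

# Copying one level is not enough for nested data: the outer array is new,
# but its elements are still references to the SAME rows.
my @matrix  = ([1, 0], [0, 1]);
my @shallow = @matrix;                  # new outer array, shared row refs
my @deep    = map { [@$_] } @matrix;    # copy every row as well

$matrix[0][0] = 5;
print "shallow: $shallow[0][0]\n";      # shallow: 5
print "deep:    $deep[0][0]\n";         # deep:    1
```

For structures nested more than two levels deep, the core Storable module's dclone function performs a full deep copy.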
Using anonymous arrays and hashes
You can mix anonymous arrays and hashes to create complicated structures:
    my $planck = { home => 'csc', name => 'PLANCK',
                   accounts => ['csc07141', 'csc18709', 'csc18091', 'csc11014'],
                   allocated => 500000 };
    my $muheart = { home => 'epcc', name => 'MuHeart',
                    accounts => ['hpx001', 'hpx0002', 'hpx0003'],
                    allocated => 100000 };
    my @projects = ($planck, $muheart);
    foreach my $project (@projects) {
        print $project->{name}, "\n";
        # notice: the accounts array must be dereferenced with @{ }
        foreach my $acc (@{$project->{accounts}}) {
            print "$project->{home} $acc\n";
        }
    }

References to functions and other things
You can make references to anything, including scalars, functions and other references.
    # reference to a sub
    $coderef = sub { print "Boink!\n" };
    $coderef->();
This looks a bit strange at first but becomes important in object-oriented programming.

Objects
Simple scalars, arrays, hashes or more complex structures are often given the generic term object.
Objects can be created, destroyed, copied or passed between one part of the program and another. They can be combined to give other objects, and you can access them via references.
In languages such as C or Fortran (90+), new objects can be defined by creating new data types which extend the built-in set of int, double, real, character, etc.
    /* C program */
    typedef struct {
        float i;   /* real part */
        float j;   /* imaginary part */
    } complex;
    complex a, b;

Objects
In object-oriented programming languages (e.g. C++ or Java), objects don't just contain data but also program code, which governs how the object interacts with the rest of the program.
This code, in the form of method functions, can be used to create the object (the "constructor") or to operate on other objects.
The program is no longer written as a set of sequential instructions but instead as a collection of interacting objects.
This is often a very convenient and natural way of representing a programming problem.

Implementation of objects
STATE and BEHAVIOUR
State is usually held as local variables (also called properties): quantities that are sometimes not visible outside the object (data hiding).
Behaviour is controlled by method functions or subroutines, which act on the local variables and interface with the outside.

C++ objects
    #include <iostream>
    using namespace std;

    // class (template) definition
    class CRectangle {
        // state data (private by default)
        int x, y;
      public:
        // method functions
        void setvalues(int a, int b) { x = a; y = b; }
        int area() { return x * y; }
    };

    // main code
    int main() {
        CRectangle recta, rectb;
        recta.setvalues(3, 4);
        cout << recta.area() << endl;   // prints 12
        return 0;
    }

OOP: key concepts
Classes: the templates used to define the objects. Note that defining the class does not actually create the object.
Instantiation: when the object is created (an object is an instance of a class).
Properties: data about the object itself. These can be private (not directly accessible by other objects) or public (accessible).
Method functions: program code used to define the behaviour or functionality of the object. Special functions called constructors create the object.
Inheritance: deriving new classes from previously defined classes. This saves programming effort and can create object hierarchies.

Key feature of OOP: inheritance
An important feature of any OOP language is the ability to derive one object from a more general class of related objects: this is called inheritance.
A standard eukaryotic cell
  • nucleus, cell membrane, cytoplasm, alive or dead
  • undergoes division, makes proteins from DNA
Derived cell types: white blood cell, nerve cell, skin cell

OOP and Perl
Perl was never designed as a true OOP language. The implementation is rather ad hoc, and the OOP syntax is non-standard.
Defining new Perl classes is rarely necessary, but it is important to know how to use objects because library packages (e.g. from CPAN) often provide them.
Using and managing Perl objects requires heavy use of references.

Perl objects: example
    # Perl database interface package
    # (contains the definition of the DB object)
    use DBI;
    my $dsn      = "DBI:mysql:host=mysql.cineca.it;port=2222";   # DBD::mysql data source
    my $user     = "test01";
    my $password = "testtest";
    # create a DB object (database handle), referenced by $dbh
    my $dbh = DBI->connect($dsn, $user, $password,
                           { RaiseError => 1, AutoCommit => 0 });
    # ..... do something
    $dbh->commit;
    $dbh->disconnect;

Summary
References, together with anonymous arrays and hashes, can be used to create complex data structures which aren't present in standard Perl (e.g. 2D arrays or tables).
An important use of references is in object-oriented programming.
Most Perl programmers do not write object definitions themselves, but libraries (e.g. BioPerl) often use them.
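Although writing classes is rarely needed, a small one helps explain the objects that libraries hand back. Below is a minimal sketch, assuming Perl 5.10 or later (for the // operator), of the C++ CRectangle example written with Perl's own bless-based mechanism, plus inheritance in the spirit of the cell example. The class names Rectangle and Square are hypothetical: the state is an anonymous hash, methods are ordinary subroutines, and @ISA declares the parent class.

```perl
use strict;
use warnings;

# Hypothetical Rectangle class: a Perl counterpart of the C++ CRectangle.
# State lives in an anonymous hash; bless() ties that hash to the package,
# whose subroutines then act as the method functions.
package Rectangle;

sub new {                                   # constructor
    my ($class, %args) = @_;
    my $self = { x => $args{x} // 0, y => $args{y} // 0 };   # state (properties)
    return bless $self, $class;
}

sub setvalues {                             # method: set the state
    my ($self, $w, $h) = @_;
    $self->{x} = $w;
    $self->{y} = $h;
    return $self;
}

sub area {                                  # method: behaviour using the state
    my ($self) = @_;
    return $self->{x} * $self->{y};
}

# Hypothetical Square class: inherits from Rectangle via @ISA, so it gets
# setvalues() and area() without redefining them.
package Square;
our @ISA = ('Rectangle');

sub new {
    my ($class, %args) = @_;
    my $side = $args{side} // 0;
    return $class->SUPER::new(x => $side, y => $side);   # reuse parent constructor
}

package main;

my $recta = Rectangle->new;                 # instantiation
$recta->setvalues(3, 4);
my $rect_area = $recta->area;
print "$rect_area\n";                       # 12

my $sq = Square->new(side => 5);
my $sq_area = $sq->area;                    # area() is found in Rectangle
print "$sq_area\n";                         # 25
my $class_name = ref($sq);
print "$class_name isa Rectangle: ", ($sq->isa('Rectangle') ? "yes" : "no"), "\n";

# Methods are ordinary subroutines: can() returns a code reference to one.
my $method = $sq->can('area');
my $via_coderef = $method->($sq);
print "$via_coderef\n";                     # 25
```

The same arrow syntax is used on objects created by library classes, such as the database handle $dbh returned by DBI->connect in the example above.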