CSE 305 Introduction to Programming Languages
Lecture 6
CSE @ SUNY-Buffalo
Zhi Yang

Courtesy of University of Wales Swansea Department of Computer Science
Courtesy of Alan Ableson at Queen's University
Courtesy of Pearson Addison-Wesley and Robert W. Sebesta
Courtesy of ch.EMBnet.org

Notice Board
• Homework3 will be posted early next week, and it is due at 11:59pm on June 26, 2013 (Wednesday). You can schedule your time accordingly.

Learning Tools
• Homework3 will use Fortran, which you learnt in Lecture 4, and in Homework4 (not posted yet) you will use Perl (which we will talk about today).
• Getting familiar with tools like gfortran, f77 and perl is important for finishing the homework, so so so sorry to mention it ~~~

How to use them
• Unix-based computers are usually equipped with perl, so you can open a command window and type perl ~~~
• If you want to use it on Windows, please go to http://www.perl.org/get.html

Why DFA?

Languages
• In this section we introduce the formal notion of a language, and the basic problem of recognizing strings from a language. These are central concepts that we will use throughout the remainder of the lecture.

Note. This section contains mainly theoretical definitions; the lectures will cover examples and diagrams illustrating the theory.

2.1 Basic Definitions

Basic Definitions
• An alphabet $\Sigma$ is a finite non-empty set (of symbols).
• A string or word over an alphabet $\Sigma$ is a finite concatenation (or juxtaposition) of symbols from $\Sigma$.
• The length of a string $w$ (that is, the number of characters comprising it) is denoted $|w|$.
• The empty or null string is denoted $\varepsilon$. (That is, $\varepsilon$ is the unique string satisfying $|\varepsilon| = 0$.)
• The set of all strings over $\Sigma$ is denoted $\Sigma^*$.
• For each $n \ge 0$ we define $\Sigma^n = \{ w \in \Sigma^* \mid |w| = n \}$.
• We define $\Sigma^+ = \bigcup_{n \ge 1} \Sigma^n$. (Thus $\Sigma^* = \Sigma^+ \cup \{\varepsilon\}$.)
• For a symbol or word $x$, $x^n$ denotes $x$ concatenated with itself $n$ times, with the convention that $x^0$ denotes $\varepsilon$.
• A language over $\Sigma$ is a set $L \subseteq \Sigma^*$.
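To make $\Sigma^n$ and $\Sigma^*$ concrete, here is a small sketch in Perl (the language introduced later in this lecture). It builds $\Sigma^n$ for $\Sigma = \{0, 1\}$ by appending every symbol to every word of $\Sigma^{n-1}$, starting from $\Sigma^0 = \{\varepsilon\}$. The helper name `sigma_n` is hypothetical. Since $\Sigma^*$ is the union of all the $\Sigma^n$ and is infinite, the sketch only lists the first few levels.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns Sigma^n, the list of all words of length n
# over the alphabet @sigma. Starts from Sigma^0 = { "" } (the empty word)
# and appends every symbol to every word, n times.
sub sigma_n {
    my ($n, @sigma) = @_;
    my @words = ('');
    for (1 .. $n) {
        @words = map { my $w = $_; map { $w . $_ } @sigma } @words;
    }
    return @words;
}

my @sigma = ('0', '1');
for my $n (0 .. 3) {
    my @level = sigma_n($n, @sigma);
    # Print the empty word as "e" (for epsilon) so that it is visible.
    my @shown = map { length($_) ? $_ : 'e' } @level;
    printf "n=%d  |Sigma^n|=%d  %s\n", $n, scalar(@level), join(' ', @shown);
}
```

Each level has $|\Sigma|^n$ words, so for this two-symbol alphabet the sizes printed are 1, 2, 4 and 8.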
Special Definition
• Two languages $L_1$ and $L_2$ over a common alphabet $\Sigma$ are equal if they are equal as sets. Thus $L_1 = L_2$ if, and only if, $L_1 \subseteq L_2$ and $L_2 \subseteq L_1$.

2.2 Decidability

Given a language $L$ over some alphabet $\Sigma$, a basic question is: for each possible word $w \in \Sigma^*$, can we effectively decide whether $w$ is a member of $L$ or not? We call this the decision problem for $L$.

Note the use of the word 'effectively': it implies that the mechanism by which we decide on membership (or non-membership) must be a finitistic, deterministic and mechanical procedure that can be carried out by some form of computing agent. Also note that the decision problem asks whether a given word is a member of $L$ or not; that is, it is not sufficient to be able to decide only when words are members of $L$.

More precisely, then, a language $L \subseteq \Sigma^*$ is said to be decidable if there exists an algorithm such that for every $w \in \Sigma^*$
(1) the algorithm terminates with output 'Yes' when $w \in L$, and
(2) the algorithm terminates with output 'No' when $w \notin L$.
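As an illustration of this definition, here is a hedged Perl sketch of a decider for a hypothetical example language (not one taken from these notes): the words over $\Sigma = \{0, 1\}$ that contain an even number of 1s. The routine reads each word once, left to right, so it terminates on every $w \in \Sigma^*$ and answers 'Yes' exactly when $w \in L$. The name `decide` is illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical example language over Sigma = {0,1}:
#   L = { w in Sigma* | w contains an even number of 1s }
# decide() reads w once, left to right, so it halts on every input and
# answers "Yes" exactly when w is in L and "No" otherwise.
sub decide {
    my ($w) = @_;
    my $even = 1;    # the empty prefix contains zero 1s, which is even
    for my $c (split //, $w) {
        die "'$c' is not a symbol of Sigma\n" unless $c eq '0' || $c eq '1';
        $even = !$even if $c eq '1';
    }
    return $even ? 'Yes' : 'No';
}

for my $w ('', '0', '1', '0110', '10101') {
    printf "%-7s %s\n", (length $w ? $w : '(empty)'), decide($w);
}
```

Because it remembers only one bit (even or odd so far), this decider is in fact a two-state DFA, the kind of machine discussed later in this lecture.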
Note
• If no such algorithm exists then $L$ is said to be undecidable.
• Decidability is based on the notion of an 'algorithm'. In standard theoretical computer science this is taken to mean a Turing Machine; this is an abstract, but extremely low-level, model of computation that is equivalent to a digital computer with an infinite memory. Thus it is sufficient in practice to use a more convenient model of computation, such as Pascal programs, provided that any decidability arguments we make assume an infinite memory.

A Sample
• Let $\Sigma = \{0, 1\}$ be an alphabet. Let $L$ be the (infinite) language $L = \{ w \in \Sigma^* \mid w = 0^n 1 \text{ for some } n \ge 0 \}$. Does the program below solve the decision problem for $L$?

Pseudocode (this is not a real programming language, just a way to describe the algorithm ~~~):

    read( char );
    if char = END_OF_STRING then
        print( "No" )
    else  /* char must be '0' or '1' */
        while char = '0' do
            read( char )
        endwhile
        /* char must be '1' or END_OF_STRING */
        if char = '1' then
            print( "Yes" )
        else
            print( "No" )
        endif
    endif

Basic Facts
• (1) Every finite language is decidable. (Hence every undecidable language is infinite.)
• (2) Not every infinite language is undecidable.
• (3) Programming languages are (usually) infinite but (always) decidable. (Why?)

Hi, how do we apply the definition to compilation?
• Languages may be classified by the means by which they are defined. Of interest to us are regular languages and context-free languages.
• Regular Languages.
  The significant aspects of regular languages are: 1) they are defined by patterns called regular expressions; 2) every regular language is decidable; 3) the decision problem for any regular language is solved by a deterministic finite state automaton (DFA); 4) programming language lexical patterns are specified using regular expressions, and lexical analysers are (essentially) DFAs.

Context-Free Languages
• The significant aspects of context-free languages are: 1) they are defined by rules called context-free grammars; 2) every context-free language is decidable; 3) the decision problem for any context-free language of interest to us is solved by a deterministic push-down automaton (DPDA); 4) programming language syntax is specified using context-free grammars, and (most) parsers are (essentially) DPDAs.
• We will talk about context-free languages and push-down automata in later lectures.

Definition of Token
Tokens are the basic building blocks of a programming language.
• The first compiler phase (scanning) splits up the character stream into tokens.
• Free-format language: a program is a sequence of tokens, and the position of tokens on the page is unimportant.
• Fixed-format language: indentation and/or position of tokens on the page is significant (early Basic, Fortran, Haskell).
• Case-sensitive language: upper- and lowercase are distinct (C, C++, Java).
• Case-insensitive language: upper- and lowercase are identical (Ada, Fortran, Pascal).
• Tokens are described by regular expressions.

[Diagram slides: Relation between Token and R.E.; Definition of R.E.]

Why Regular Expressions
• A lexical analyzer uses pattern matching with respect to rules associated with the source language's tokens.
• For example, the token then is associated with the pattern t, h, e, n, and the token id might be associated with the pattern "an alphabetic character followed by any number of alphanumeric characters".
• The notation of regular expressions is a mathematical formalism ideal for expressing patterns such as these, and thus ideal for expressing the lexical structure of programming languages.

[Diagram slides: Relation of R.E. and R.L.; Properties of R.E.; Regular Definitions; The Decision Problem for Regular Languages; Definition of DFA; Idea behind DFA; How DFA works; Transition Diagrams; Example of DFA]

Equivalence Theorem
• (1) For every $r \in \mathrm{RE}(\Sigma)$ there exists a DFA $M$ with alphabet $\Sigma$ such that $L(M) = L(r)$.
• (2) For every DFA $M$ with alphabet $\Sigma$ there exists an $r \in \mathrm{RE}(\Sigma)$ such that $L(r) = L(M)$.
• Proof. See J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation (Addison-Wesley, 1979).
• What does it mean? ~~~ Any relation with lexical analysis? ~~~ How should we use it? ~~~ Let's see an example

So Perl ~~~
This language:
1. Originally focused on text parsing, input and output
2. Has since generalized to a "glue" language, as well as having libraries dedicated to particular scientific and computer science areas

History of Perl
• In the mid 1980's, Larry Wall was working as a sysadmin and found that he needed to do a number of common, yet oddball functions over and over again. And he didn't like any of the scripting languages that were around at the time, so he invented Perl. Version 1 was released circa 1987. A few changes have occurred between then and now. The current version of Perl is past 5.8.0 and is a highly recommended upgrade.

A little more about motivation
• In computation-based science, a shockingly large number of person-hours are spent on bookkeeping ~~~
• Common worries:
  – Have I run all the data through the filter?
  – Have I run all four algorithms on my new test case?
  – Is the graph in my paper generated using the latest optimization algorithm, or the one before?
  – How can I track the progress of this data through pre-processing, search, filtering, and graphing?
• Modularity is most effective when combined with automation

Where is it used?
• Anywhere you parse or manipulate text
  – Web servers (CGI scripts)
  – Database access and processing
• Anywhere you want to automate file handling and organization
• Generally not the first choice for numeric calculations
• Not compiled, not type-safe

So in fact, it's a program (or what you would call a virtual machine or virtual environment) running a specific instruction set ~~~ that's true! ~~~

Type of Language
• Semi-interpreted
  – Uses a byte-code representation like Java
  – Doesn't store the byte code for later re-use
• In desktop usage, most like BASIC
  – Write the program, then run it
• Advantages
  – No compiling, no makefiles
• Disadvantages
  – Can be slower running, not type-safe, can encourage poor style

Example

    #!/usr/bin/perl -w
    # Count how often each whitespace-separated word occurs in a file.
    if (@ARGV != 1) { die "Usage: prog1 <filename>\n"; }
    $filename = shift @ARGV;
    open(FILE, $filename) or die "Unable to open $filename: $!\n";
    %word_count = ();
    while ($line = <FILE>) {
        chomp $line;
        # ' ' splits on runs of whitespace and ignores leading blanks
        @words = split(' ', $line);
        foreach $word (@words) {
            $word_count{$word} += 1;
        }
    }
    close(FILE);

Hey, what have we seen here? ~~~

1st, Basic Datatypes
• Scalar --- $
  – Any single value
  – Can be (and often is) converted at will between integer, double and string
• Array --- @
  – An array of scalars
• Hash --- %
  – Like the STL map, an easy way to construct fast lookup tables

2nd ~~~, ways to access the shell
• Backticks: $response = `<command>`
• system: system("<command>")
• exec: exec("<command>")

Scalars and Their Operations
• Perl has three categories of variables/values: scalars, arrays, hashes
  – Variables for each category are distinguished by the first symbol in the variable name
    • $ for scalar
    • @ for array
    • % for hash
• Scalars come in three kinds: numbers, character strings and references
  – Numbers are stored internally as double-precision floating point
  – Note that strings are considered scalars in Perl

Numeric Literals
• Integer literals are a string of digits
  – Integer literals can be written in hexadecimal, base 16, by beginning the number with 0x or
0X
• Floating-point literals have either a decimal point or an exponent or both

String Literals
• String literals can be delimited by single (') or double (") quotes
• Single-quote delimiters do not allow any substitutions: no escape characters (other than \' or \\), no variable interpolation
• Double-quote delimiters allow substitutions for escape characters and for variable interpolation

String Literals
• The letter q is used to introduce a literal, single-quoted string, bounded by an arbitrary character
  – So q$abcdef$ is the same as 'abcdef'
• The letter pair qq is used to introduce a literal, double-quoted string, bounded by an arbitrary character
  – So qq#abcdef# is the same as "abcdef"
• If the beginning delimiter is one of ( < [ {, then the matching delimiter must be ) > ] }, respectively
• The null string is '' or ""
  – Also known as the empty string

Scalar Variables
• Scalar variable names begin with $, followed by letters, digits and/or underscores
  – Case sensitive
  – Conventionally, programmer-defined names do not use uppercase letters
• Scalar variable values are interpolated into double-quoted strings
  – If $x has the value 3
  – Then "Value of x is $x" becomes "Value of x is 3"
• Unassigned variables have the value undef
  – undef converts to 0 as a number and to the null string as a string
• Perl has a large number of predefined variables
  – Many are named with special characters, such as $_ and $^

Numeric Operators
• Four arithmetic: + - * /
  – Note that 5/2 is 2.5
• Modulus: %
• Exponentiation: **
• Unary: -- ++

Operator Precedence (highest first)
Operator          | Associativity
++, --            | Nonassociative
**                | Right
unary +, -        | Right
*, /, %           | Left
binary +, -       | Left
(** binds more tightly than unary minus, so -2**2 is -4.)

String Operators
• A string is a single unit, a scalar
• The period, ., is used as the concatenation operator
  – Note: not +, as in many languages; Perl does not overload operators
  – If $a is 'cant' then $a .
'aloupe' is 'cantaloupe'
• The x operator indicates repetition, so '=' x 4 is '===='

String Functions
• Predefined unary operators can be used as functions by simply parenthesizing the operand
  – Be wary of precedence changes, since parentheses have the highest precedence

String Functions
Name   | Parameter(s)                       | Action
chomp  | A string                           | Removes a terminating newline character (more precisely, the trailing input record separator $/) from its parameter; returns the number of characters removed
length | A string                           | Returns the number of characters in its parameter string
lc     | A string                           | Returns its parameter string with all uppercase letters converted to lowercase
uc     | A string                           | Returns its parameter string with all lowercase letters converted to uppercase
hex    | A string                           | Returns the decimal value of the hexadecimal number in its parameter string
join   | A separator and a list of strings  | Returns a string constructed by catenating the second and subsequent parameters together, with the separator inserted between them

Assignment Statements
• The assignment operator, =, assigns a value to a variable
  – The result returned is the assigned variable itself (an lvalue), which can be passed on to another operator
• Compound assignment operators are similar to C, C++ and Java
  – $x *= 3 multiplies the value of $x by 3
• Comments are signified in Perl by a # sign
  – The remainder of the line is ignored

Keyboard Input
• Perl treats all input and output as file input and output
• Physical files have external names, but all files are referred to by internal names called filehandles
• Certain filehandles are predefined
  – STDIN is console input, usually the keyboard
  – STDOUT is console output, usually the screen
  – STDERR is console error output, usually the screen
  – The execution environment of the Perl script may redirect these predefined handles to take input from other sources (such as a physical file) or put output to other targets
• The line input operator, <>, reads a line of input (including the newline character) from the filehandle
  – $line = <STDIN> will read one
line from standard input and assign it to $line

Standard Perl Usage
• Since in most cases the terminating newline character is not desired, the chomp operator is used to remove it:
  – $x = <STDIN>;
  – chomp($x);
• This is often abbreviated
  – chomp($x = <STDIN>);
  – The assignment operator returns $x itself (as an lvalue), which is passed to chomp

The Diamond Operator
• Using <> without a filehandle has a special meaning in Perl
• Each argument on the command line is interpreted as a file name
• The lines are read from these files in succession
• Standard input can be included by using a single hyphen as an argument: -

Screen Output
• The print function takes as an operand a list of one or more strings separated by commas
  – No newline character is provided automatically; it must be included literally
• A C-style printf function is available
• Example quadeval.pl demonstrates input from the standard console and output to the standard console
  – This program is run independently of a browser or server.
  For example, it could be run from the command line.

Perl from the Command Line
• Perl programs are run from the command line by using the perl interpreter
    perl quadeval.pl
• Command flags can be added
  – -w asks that warnings be reported for problematic programming
  – -c asks for compilation without running
• For example
    perl -w quadeval.pl
• If the program were invoked like this
    perl -w quadeval.pl quad.dat
• then the input would be taken from the file quad.dat by using this input
    $input = <>

Control Statements
• Perl provides a standard array of control structures for managing the flow of execution in programming

Control Expressions
• Control statements depend on the value of control expressions to determine execution flow
• Control expressions are, conceptually, either true or false
  – A string value is true unless it is '' or '0'
    • Note that this is literally '0': '0.0' is considered true
    • At end of file the <FH> input operator returns undef, which is false
    • So while ($a = <FH>) { … } executes as long as there is input available from filehandle FH (Perl implicitly tests this loop condition with defined, so a final line "0" with no trailing newline does not end the loop early)
  – A numeric value is true unless it is 0
• Control expressions usually involve relational operators
• The following slides list the relational operators
  – Note that there are different operators for strings and for numbers
  – Operands are coerced as needed to match the type of the operator

Relational Operators in Perl
Operation                          | Numeric Operands | String Operands
Is equal to                        | ==               | eq
Is not equal to                    | !=               | ne
Is less than                       | <                | lt
Is greater than                    | >                | gt
Is less than or equal to           | <=               | le
Is greater than or equal to        | >=               | ge
Compare, returning -1, 0, or +1    | <=>              | cmp

Relational Operators
• The first six operators produce 1 if true or '' (the empty string) if false
• The last operator produces
  – -1 if the first operand is less than the second
  – +1 if the first operand is greater than the second
  – 0 if the two operands are the same
• Relational operators are nonassociative
  – That is, $a < $b < $c is not syntactically valid in Perl (Perl 5.32 and later do accept such chained comparisons)
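A short Perl sketch of the rules above: which scalar values count as false in a control expression, and why the numeric and string comparison operators can disagree on the same operands. The values tested are arbitrary examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Truth value of some scalars when used as a control expression.
for my $v ('', '0', '0.0', '00', ' ', 'abc', 0, 1, -1) {
    printf "%-7s %s\n", "'$v'", ($v ? 'true' : 'false');
}

# The same two operands compared numerically and as strings.
my ($x, $y) = ('10', '9');
print "10 <=> 9     gives ", ($x <=> $y), "\n";   # numeric: 10 > 9, so +1
print "'10' cmp '9' gives ", ($x cmp $y), "\n";   # string: '1' lt '9', so -1

# == converts both operands to numbers; eq compares them character by character.
print "'10' == '10.0' is ", ('10' == '10.0' ? 'true' : 'false'), "\n";
print "'10' eq '10.0' is ", ('10' eq '10.0' ? 'true' : 'false'), "\n";
```

The pair '10' and '9' shows why the two operator families exist: numerically 10 is larger, but as strings '10' sorts first because '1' comes before '9'.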
Boolean Operators
• Perl provides two forms of boolean operators
• ! (not), && (and) and || (or) have precedence above the assignment operators but below the arithmetic, string and relational operators
• and, or and not have precedence below any other operators
• $a = <> or die "no input" parses as ($a = <>) or (die "no input")
  – This causes the program to terminate if no input is read from <>
  – If there is input, the next line is assigned to $a
• $a = <> || die "no input"; parses as $a = (<> || (die "no input"));
  – This also causes the program to terminate if no input is read from <>
  – If there is input, $a gets the line that was read, because || returns the value of the operand that decided the result, not 1; the precedence difference does matter in code such as open(FILE, $name || die ...), where || applies only to $name

Selection and Loop Statements
• A block of statements in Perl is a sequence of statements enclosed in a pair of curly braces: { }
• Control statements in Perl require blocks of statements as components, rather than allowing single statements without the braces

Selection using if
• The if statement syntax
    if ( control-expression ) block
    [ elsif ( control-expression ) block
      ...  Repeated elsif clauses ]
    [ else block ]
• [ ] indicates optional parts
• The elsif part may appear 0 or more times
• The unless statement reverses the sense of the if
  – Like the if, an unless may also take elsif and else parts

Repetition in Perl
• The basic repetition uses while:
    while ( control-expression ) block
• The while executes the block as long as the control-expression is true
• The until reverses the sense of the while
    until ( control-expression ) block
• The until executes as long as the control-expression is false

The for Statement
• Syntax of the for statement
    for ( initial-expression; control-expression; increment-expression ) block
  – The initial and increment expressions can be multiple expressions separated by commas
• The last operator causes the loop to exit immediately

Loop Labels
• A loop may be provided a label by prefixing a name and a colon to the beginning of the loop
• A last operator can have a loop label as an operand
  – In this case, the operator will cause exit from the loop with the given label even
if it is not the innermost loop containing the statement executing last

Fundamentals of Arrays
• An array holds a list of scalar values
  – Note that an array holds scalars, not other arrays or hashes
• Different types of scalar data can be in the same array
• Arrays have dynamic size, that is, they can increase and decrease in size as a program executes

List Literals
• A list literal is given as a pair of parentheses enclosing a list of values separated by commas
• Note that if a sub-list is included, as in ('a', ('b', 'c'), 'd'), then the list is flattened to ('a', 'b', 'c', 'd')
  – References are used to include arrays as elements in arrays

Arrays
• An array is a variable that stores a list
• The name of an array variable begins with the character @
• An array variable may be assigned a literal list value
  – @a = (1, 2, 'three', 'iv');
• An array assignment creates a new array as a copy of the original
  – @b = @a;

Scalar and List Context
• An expression in Perl is evaluated in a context
• For example, in the assignment
    $a = expression;
  the expression on the right is evaluated in a scalar context
• On the other hand, in
    @a = expression;
  the expression on the right is evaluated in a list context
• An array evaluated in a scalar context evaluates to the length of the array (a literal list such as (4, 5, 6) in scalar context yields its last element instead)

Parallel Assignment
• A list of values can be assigned to a list of variables
    ($a, $b, $c) = (1, 2, 'iii');
  causes $a to get the value 1, $b to get the value 2 and $c to get the value 'iii'
• Note that the right side is evaluated before the assignment, so
    ($x, $y) = ($y, $x)
  actually swaps the values of the two variables
• If the target includes an array variable, all remaining values in the expression list are assigned to the array variable

Accessing an Array Element
• The elements in an array are indexed by integers, beginning with 0
• Element index 1 of list @alist is accessed as $alist[1]
  – Note that $ is used since the element is a scalar
  – Note also that there is no relationship between the scalar variable
$alist and the list element $alist[1]
• Assigning to an array element may cause the array to expand to accommodate the element
    @a = ('a', 'b', 'c');
    $a[20] = 'outfield';
  causes the array @a to expand to size 21
• The last subscript in the array @a is $#a

foreach Statement
• The foreach allows convenient iterating through the elements of an array or list
• foreach $x (@a) { … }
• Executes the body of the loop for each element of the array @a
• In each iteration, $x is an alias for the element
  – That is, if $x is changed, the corresponding element of the array is changed

Built-In Array Functions
• Four functions are provided by Perl to support stack and queue operations on arrays
• push @a, $x; inserts the value $x at the end of the array @a
• pop @a; removes the last value of @a and returns it
• shift @a; removes the first value of @a and returns it
  – All the remaining elements of @a are shifted down one index, hence the name
• unshift @a, $x; inserts the value $x at the beginning of the array @a
  – All the remaining elements of @a are shifted up one index

Built-In List Functions
• The split function breaks strings into parts using a character to separate the parts
• The sort function sorts a list using string comparison
  – A more general usage is presented later
  – sort does not alter the parameter but returns a new list
• The qw (quote words) function creates a list of words from a string
• The die operator displays its list operand and then terminates the program

Hashes
• An associative array uses general data, often strings, as indexes
  – The index is referred to as a key, the corresponding element as a value
• Since a hash table is often used to implement an associative array, these structures are known as hashes in Perl
• Elements in a Perl hash do not have a natural ordering
  – When a list of keys is retrieved from a hash, there is no definite relationship between the order of the keys and either the values of the keys or the order in which they were entered into the
hash

Hash Variables
• Hash variables are named beginning with the character %
• If an array is assigned to a hash, the even-index elements become keys and the odd-index elements are the corresponding values
  – Assigning an odd-length array to a hash produces an "Odd number of elements" warning (under -w), and the last key gets the value undef
• Curly braces are used to subscript a hash
  – If %h is a hash, then the element corresponding to 'four' is referenced as $h{'four'}

Changing a Hash
• Values can be assigned to a hash element to insert a new key/value relation or to change the value related to a key
• A key/value relation can be removed from a hash with the delete operator
• The undef operator will delete all the contents of a hash
• The exists operator checks if a key is related to any value in a hash
  – Just checking $h{'something'} doesn't work, since the related value may be the empty string or 0, both of which count as boolean false
• A hash variable embedded in a string is not interpolated
  – However, a hash element such as $h{'four'} is interpolated

Iterating Through a Hash
• The keys operator returns a list of the keys in a hash
• The sort operator can also be applied to iterate through the keys in order

A Predefined Hash
• The %ENV variable is defined to be the key/value pairs defined in the environment of the running Perl process
• Many of these are inherited from the run-time environment
• In Microsoft Windows, environment variables can be set through the command-line set command
• In the Unix Bourne shell, environment variables may be set by a simple assignment, followed by export so that programs started from the shell (such as perl) inherit them

References
• A reference is a scalar value giving the address of another value in memory
• A reference to an existing variable is created by using the backslash operator
• References to literal structures can be created
  – A reference to a list is created by enclosing a list in square brackets, […]
  – A reference to a hash is created by enclosing a list in curly braces, {…}
  – For example $a = [1, 2, 3, 4];
  – For example $h = { i => 1, v => 5, x => 10 };
  – Notice the assignment is to a scalar
variable, since the literal value is a reference

Dereferencing References
• To access the value pointed to by a reference, the programmer must explicitly dereference the reference
• An extra $ sign can be used
  – If $a = 5 and $b = \$a then $$b is 5
  – $$b = 7 changes the value of $a to 7
• In a reference to an array, -> can be used between the reference and the index to indicate a dereference
  – If $r = \@list then $$r[3] is the element at index 3 of @list
  – $r->[3] is also the element at index 3 of @list
  – $r[3] is the element at index 3 of @r, completely unrelated

Function Fundamentals
• A function definition consists of a function header and the body
  – The body is a block of code that executes when the function is called
  – The header contains the keyword sub and the name of the function
• A function declaration consists of the keyword sub and the function name
  – A declaration promises a full definition somewhere else
• A function call can be part of an expression. In this case the function must return a value that is used in the expression
• A function call can be a standalone statement. In this case a return value is not required.
  If there is one, it is discarded

Function Return
• When a function is called, the body begins executing at the first statement
• A return statement in a function body causes the function body to immediately cease executing
  – If the return statement also has an expression, the value is returned as the value of the function
  – Otherwise, the function returns no value
• If execution of a function reaches the end of the body without encountering a return statement, the return value is the value of the last expression evaluated in the function

Local Variables
• Variables that are not declared explicitly, but are simply assigned to, have global scope
• The my declaration is used to declare a variable in a function body to be local to the function
• If a local variable has the same name as a global variable, the global variable is not visible within the function body
• Perl also supports a form of dynamic scoping using the local declaration
  – A my declaration has lexical scope, which works like the scope rules in C, C++ and Java

Parameters
• Parameters used in a function call are called actual parameters
• Formal parameters are the names used in the function body to refer to the actual parameters
• In Perl, formal parameters are not named in the function header
• Perl supports both pass-by-value and pass-by-reference
• The array @_ is initialized in a function body to the list of actual parameters
  – Each element of this array is an alias for the corresponding actual parameter: changing an element of the array changes the corresponding actual parameter
• Often, values of @_ are assigned to local variables, which corresponds to pass-by-value

Parameter Usage Examples
• This code causes the variable $a to change
    sub plus10 {
        $_[0] += 10;
    }
    plus10($a);
• The first line of this function copies the actual parameters to local variables
    sub f {
        my ($x, $y) = @_;
    }

Passing Structures as Parameters
• An array or hash will be flattened if included directly in an actual parameter list
• A reference to a hash or array
will be passed properly, since the reference is a scalar value

sort Revisited
• The sort function can be called with the first parameter being a block which returns a numerical value based on the comparison of two variables $a and $b
• This parameter is not followed by a comma
• For example, using sort {$a <=> $b} @num will sort the array @num using numerical comparison
• Using sort {$b <=> $a} @num will sort in reverse order

See an Example of Sort

    # *** Print words in alphabetic order ***
    foreach $word (sort (keys(%word_count))) {
        print "$word: " . $word_count{$word} . "\n";
    }

    # *** Print words in descending count order ***
    sub hashValueDescendingNum {
        $word_count{$b} <=> $word_count{$a};
    }

    # print ("-" x 80) . "\n" would print only the dashes, because the
    # parentheses end print's argument list; pass both items instead
    print "-" x 80, "\n";
    @sorted_words = sort hashValueDescendingNum keys(%word_count);
    foreach $word (@sorted_words) {
        print "$word: " . $word_count{$word} . "\n";
    }

Basics of Pattern Matching
• Perl has powerful pattern-matching facilities built in
  – These have been imitated in a number of other systems
• The m operator indicates a pattern match
  – This is used with delimiters like q and qq, but the enclosed characters form a pattern
  – If the delimiter is / then the m is not required
• A match is indicated by the =~ operator with a string on the left and a pattern on the right
  – A pattern alone is matched by default to $_
• The split function can take a pattern as the first argument rather than a character
  – The pattern specifies the pattern of characters used to split the string apart

Remembering Matches
• Parts of a pattern can be parenthesized
• If the pattern matches a string, the variables $1, $2, … refer to the parts of the string matched by the parenthesized sub-patterns
• If a match is successful on a string, three strings are available to give the context of the match
  – $& is the part that actually matched the pattern
  – $` is the part of the string before the part that matched
  – $' is the part of the string after the part that matched

An Example (Courtesy of David Till)
• Normally,
the pattern-matching operator examines the value stored in the variable specified by a corresponding =~ or !~ operator. For example, the following statement prints hi if the string abc is contained in the value stored in $val:
    print ("hi") if ($val =~ /abc/);
• By default, the pattern-matching operator examines the value stored in $_. This means that you can leave out the =~ operator if you are searching $_:
    print ("hi") if ($_ =~ /abc/);
    print ("hi") if (/abc/);   # these two are the same

Perl Regular Expressions
• Regular Expression = Pattern
  – Template that either matches or does not match a string

Perl Regular Expression: Binding Operator
• =~ has a string as its left operand
• =~ has a pattern as its right operand
  – Could be interpreted at run time
• Returns true / false depending on the success of the match
• The !~ operation is the same, but the result is negated

Perl Regular Expression: Binding Operator Example

    if ( my ($k, $v) = $string =~ m/(\w+)=(\w*)/ ) {
        print "Key $k Value $v\n";
    }

Let's pick that apart. The =~ has precedence over =, so =~ happens first. The =~ binds $string to the pattern match on the right, which is scanning for occurrences of things that look like KEY=VALUE in your string. It's in list context because it's on the right side of a list assignment. If the pattern matches, it returns a list to be assigned to $k and $v, which are new variables created by my. The list assignment itself is in scalar context, so it returns 2, the number of values on the right side of the assignment. And 2 happens to be true, since our scalar context is also a Boolean context. When the match fails, no values are assigned, which returns 0, which is false.

Perl Regular Expressions
• Quantifiers:
  – * matches the preceding character zero or more times.
• Pattern abc*d is matched by
  – rabd
  – zabccccd
• Use parentheses to group letters, as in a(bc)*d
    #!/perl/bin/perl
    # Test each input line against abc*d; type stop to quit
    while (<>) {
        chomp;
        last if $_ eq 'stop';
        if (/abc*d/) {
            print "Matched: |$`<$&>$'|\n";
        } else {
            print "No match.\n";
        }
    }

    #!/perl/bin/perl
    # Same test, but * now applies to the whole group (bc)
    while (<>) {
        chomp;
        last if $_ eq 'stop';
        if (/a(bc)*d/) {
            print "Matched: |$`<$&>$'|\n";
        } else {
            print "No match.\n";
        }
    }

Perl Regular Expressions
• Quantifiers:
  – * matches zero or more instances
  – + matches one or more instances
    • ab(cde)+fg
  – ? matches zero or one instance

Perl Regular Expressions
• Alternatives
  – | means "or"
    • The match succeeds if either the left or the right side matches

Perl Regular Expressions
• Character classes
  – A list of possible characters inside square brackets
  – Examples:
    • [a-cw-z]+
    • [a-zA-Z0-9]
  – Negation is provided by a caret
    • [^n\-z] matches any character except n, -, and z

Perl Regular Expressions
• Character class shortcuts
  – \w (word) matches [A-Za-z0-9_] for ASCII text (note the underscore)
  – \s (space) matches whitespace such as [\f\t\n\r ]
  – \d (digit) is a shortcut for [0-9]
  – [^\d] anything but a digit
  – [^\s] anything but a space character
  – [^\w] anything but a word character

Perl Regular Expressions
• Perl regex semantics are based on:
  – Greed
    • Quantifiers such as * and + match as many characters as possible
  – Eagerness
    • Perl gives the first possible match
    • The leftmost match wins
  – Backtracking
    • The entire expression needs to match
    • When the current attempt cannot lead to a match, Perl backtracks and tries the remaining alternatives

Perl Regular Expressions
Perl's extensive and integrated support for regular expressions means that you not only have features available that you won't find in any other language, but you have new ways of using them, too.
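The quantifier, character-class, and matching-semantics rules above can be checked with a small script. This is a minimal sketch, not part of the original slides: the helper first_match and the test strings are hypothetical, chosen only to show greed, eagerness (the leftmost match wins), and backtracking.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by $pattern in $str
# (the value of $&), or undef when there is no match.
sub first_match {
    my ($str, $pattern) = @_;
    return $str =~ /$pattern/ ? $& : undef;
}

my @cases = (
    # [ string,          pattern,        what it shows ]
    [ 'xaaay',           'a+',           'greed: + takes every a it can' ],
    [ 'cat scatter',     'cat\w*',       'eagerness: the leftmost match wins' ],
    [ 'perl or python',  'python|perl',  'leftmost position beats alternative order' ],
    [ 'abcabcX',         'a.*c',         'backtracking: .* gives back the X' ],
    [ 'abfg abcdecdefg', 'ab(cde)+fg',   '+ needs at least one cde' ],
    [ 'node-zone',       '[^n\-z]+',     'negated character class' ],
    [ 'room 305b',       '\d+',          'digit shortcut' ],
);

for my $case (@cases) {
    my ($str, $pattern, $note) = @$case;
    my $m = first_match($str, $pattern);
    printf "%-16s /%s/ => %s  (%s)\n",
        $str, $pattern, defined $m ? "<$m>" : 'no match', $note;
}
```

Running it with different strings (or adding cases) is a quick way to predict, then verify, what a pattern will match.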
Programmers new to Perl often look for functions like these:
    match( $string, $pattern );
    subst( $string, $pattern, $replacement );
but matching and substituting are such common tasks that they merit their own notation:
    $meadow =~ m/sheep/;     # True if $meadow contains "sheep"
    $meadow !~ m/sheep/;     # True if $meadow doesn't contain "sheep"
    $meadow =~ s/old/new/;   # Replace the first "old" with "new" in $meadow

• Eagerness example:
  – What is the result of this snippet?
    $string = "boo hoo";
    $string =~ s/o*/e/;      # the left side of =~ needs to be an l-value
  – Choices:
    • boo hoo
    • be hoo
    • bee hoo
    • boo heo
    • boo hee
    • eboo hoo

Perl Regular Expressions
• The quantifiers *, +, and ? are not always enough
• Specify the number of occurrences by placing a comma-separated range in curly braces
  – /a{2,12}/
    • 2 to 12 a's
  – /a{5,}/
    • 5 or more a's
  – /a{5}/
    • exactly 5 a's

Perl Regular Expressions
• Anchors
  – A pattern can match anywhere in the string unless you use anchors
  – ^ beginning of string
  – $ end of string
  – \b word boundary: the start or end of a run of \w characters
  – \B non-word-boundary anchor
• Examples:
  – /^hello/ matches only at the beginning of the string
  – /world$/ matches only at the end of the string

Perl Regular Expressions
• Parentheses and Memory
  – ( ) groups together part of a pattern
  – It also remembers the corresponding matched part of the string
  – These are put into a backreference
    • Written inside the pattern as a backslash followed by a number
    • Available as $1, … after matching
• Examples
  – /(.)\1/ matches any character followed by itself
  – /../ matches any two characters
  – /(['"]).*\1/ matches a single or double quote, followed by zero or more arbitrary characters, followed by the same type of quote
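The eagerness question and the {n,m}, anchor, and backreference patterns above can be tried directly. This is a minimal sketch; the subroutine names (eager_subst, doubled_char, has_matching_quotes, only_2_to_12_as) and the sample strings are hypothetical. Running it also reveals the answer to the eagerness question: because o* can match the empty string, the leftmost (empty) match at position 0 is the one that gets replaced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Eagerness: o* may match zero o's, so the first possible match is the
# empty string before "b", and that is what s/// replaces.
sub eager_subst {
    my ($string) = @_;
    $string =~ s/o*/e/;    # left side of =~ must be an l-value
    return $string;
}

# Backreference \1: return the first character that is immediately repeated.
sub doubled_char {
    my ($s) = @_;
    return $s =~ /(.)\1/ ? $1 : undef;
}

# Backreference: a quote, anything, then the same kind of quote.
sub has_matching_quotes {
    my ($s) = @_;
    return $s =~ /(['"]).*\1/ ? 1 : 0;
}

# {n,m} plus anchors: the whole string must be 2 to 12 a's.
sub only_2_to_12_as {
    my ($s) = @_;
    return $s =~ /^a{2,12}$/ ? 1 : 0;
}

print eager_subst("boo hoo"), "\n";             # eboo hoo
print doubled_char("bookkeeper"), "\n";         # o
print has_matching_quotes(q{say "hi"}), "\n";   # 1
print has_matching_quotes(q{say 'hi"}), "\n";   # 0
print only_2_to_12_as("aaaa"), "\n";            # 1
print only_2_to_12_as("a"), "\n";               # 0
print "whole word 'world' found\n" if "hello world" =~ /\bworld\b/;
```

Without the ^ and $ anchors, /a{2,12}/ would also accept a string of 13 or more a's, since it only needs to find 2 to 12 of them somewhere.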
Perl Regular Expressions
Validating email
• if ( $email =~ /\@/ ) { … }
  – checks for an at sign (@)
• if ( $email =~ /\S+\@\S+/ )
  – checks for non-whitespace characters separated by an at sign
  – matches thomas@hotmail
• if ( $email =~ /\S+\@\S+\.\S+/ )
  – also requires a dot somewhere after the at sign
• if ( $email =~ /[\w\-]+\@[\w\-]+\.[\w\-]+/ )
  – matches many simple addresses, but also accepts a string that merely contains one or more addresses among other text
• if ( $email =~ /^[\w\-]+\@[\w\-]+\.[\w\-]+$/ )
  – anchored at the beginning and end of the string, so the whole string must be a single address
  – note that [\w\-] excludes dots, so addresses such as first.last@example.co.uk are rejected

Perl as File Walker
    #!/usr/bin/perl -w
    # List the .ppt files, then classify every entry in the current directory
    @ppt_files = <*.ppt>;
    print "PPT files in the directory are " . join(", ", @ppt_files) . "\n";

    @all_files = <*>;
    foreach $file (@all_files) {
        if (-f $file) {
            print "file: $file\n";
        } elsif (-d $file) {
            print "dir: $file\n";
        }
    }

Perl as a Web Browser
With the LWP Perl library, Perl can
• act as a web browser
• download web pages
• identify and follow links
• download non-HTML files

Web Browser Example
    # Create a user agent object
    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    $ua->agent("MyApp/0.1 ");

    # Create a request
    my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
    $req->content_type('application/x-www-form-urlencoded');
    $req->content('query=libwww-perl&mode=dist');

    # Pass request to the user agent and get a response back
    my $res = $ua->request($req);

    # Check the outcome of the response
    if ($res->is_success) {
        print $res->content;
    } else {
        print $res->status_line, "\n";
    }

Perl isn't unique
• Python and Ruby are two other languages that can play similar roles
  – not compiled ahead of time
  – can be included on web servers
  – similar intent of easy file and text manipulation
• Using Perl is a personal preference for me
  – learned it first; haven't felt anything missing
  – included in almost every Unix distribution

Resources
• Learning Perl (O'Reilly)
  http://proquestcombo.safaribooksonline.com/0596101058
• Perl books online
  http://www.perl.org/books/library.html
• CPAN – Comprehensive Perl Archive Network
  http://www.cpan.org/
• Perl Monks
  http://www.perlmonks.org/

See what Larry Wall says: ~~~
• https://www.youtube.com/watch?v=ju1IMxGSuNE
• https://www.youtube.com/watch?v=LR8fQiskYII