CSC 415 Programming Languages

PHP: Under the Hood
Steven Taylor
10/20/2012
Preface:
The 1990s were the decade in which the Internet grew from a popular platform for
research-related communication into a social and economic cornerstone. Before PHP, the prominent
server-side languages adapted for the web were C++ and Perl. Although these two
languages are powerful and well designed, the rapidly expanding
Internet application market created a demand for a faster and friendlier solution, and PHP was
that solution. I unfold the many layers of PHP by presenting a historical overview of PHP;
examining the syntax and semantics of PHP; and finally, evaluating PHP based on readability,
writability, reliability, and cost.
History of PHP:
By answering three questions, the foundation of PHP can be succinctly summarized.
First: What is PHP? The authors and editors of the PHP Manual describe PHP by stating, “PHP,
which stands for "PHP: Hypertext Preprocessor" is a widely-used Open Source general-purpose scripting
language that is especially suited for Web development and can be embedded into HTML.” Second: Who
created PHP? The creator of PHP was a developer named Rasmus Lerdorf, and the creation of PHP was
stimulated by a script that he developed to track the traffic generated by viewers of his online resume
(W. Jason Gilmore). Lerdorf’s script was so popular that he continued engineering his scripting methods
to form a programming language. Initially, Lerdorf’s scripting toolset was referred to as Personal Home
Page (PHP), rather than by the recursive name, PHP: Hypertext Preprocessor (W. Jason Gilmore).
Finally: When was PHP created? The origin of PHP dates back to 1995. At this time, other web
scripting technologies possessed no such tracking capabilities, and this is why Lerdorf’s scripting
language was considered revolutionary. As the Internet evolved over time, web application
development moved from a static state to a dynamic nature. Naturally, PHP evolved in step
with the dynamic nature of web application needs. Now that I have elaborated the “What”, “Who”, and
“When” of PHP, an equally important discussion, the evolution of PHP, follows.
PHP has expanded from a humble foundation as a traffic-tracking script to a powerful
server-side language with object-oriented support. In 1997, the Personal Home Page Construction
Kit and the Form Interpreter components were completely rewritten and then combined into a
single program, and this new program was released as PHP 2.0 (PHP Manual). The PHP user
base grew substantially within two years of the release of version 2.0; however, Lerdorf
recognized that more changes were necessary. In 1997, Andi Gutmans and Zeev Suraski of Tel
Aviv, Israel, rewrote the PHP parser and included support for a number of third-party databases
and APIs, work that was released as PHP 3.0 in 1998 (PHP Manual). PHP 4.0 was officially
released in May 2000. In addition to core
improvements, PHP 4.0 also included features such as a broader range of web server support,
support for object-oriented programming, HTTP sessions, output buffering, and input security
(PHP Manual). The release of PHP 4.0 was the most notable and most influential,
as this release established PHP as enterprise worthy (PHP Manual). Version 5.0 added
support for web services, native support for SQLite, and improved upon various XML
and object-oriented features (PHP Manual). At the time of writing, no release of a version 6.0 has been announced,
but the many extensions to version 5.0 bring us up to the current version, 5.4 (PHP Manual).
PHP Syntax and Semantics
Identifiers, Variables, Constants, Reserved Words, and Scope
An identifier is used by a developer to access or call on a variable, function, or some user
defined object. Several syntax rules must be followed when declaring identifiers in PHP, and
these syntax rules are as follows:
1) Variable identifiers in PHP must begin with a dollar sign ($); function and constant
identifiers do not.
2) Following the dollar sign, the first character must be either a letter or an underscore.
3) After the first character, identifiers can consist of numbers, letters, the underscore
character, and any character within the ASCII range of 127 – 255.
4) Identifiers in PHP can be of any length, which gives developers more
freedom in naming.
5) Variable identifiers are case sensitive, so $Me is not the same as $me. This syntax rule helps with
readability.
6) Finally, identifiers may not be identical to any of the reserved words of PHP (Wellington).
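
The naming rules above can be illustrated with a minimal sketch. The variable and function names here are hypothetical and chosen only for illustration.

```php
<?php
// Variable names are case sensitive: these are two distinct variables.
$me = "lowercase";
$Me = "capitalized";

// A leading underscore and trailing digits are both legal.
$_count2 = 3;

// Function names, by contrast, are not case sensitive.
function greet() {
    return "hello";
}
$viaLower = greet();
$viaUpper = GREET(); // resolves to the same function

// $2fast = 1;  // would be a parse error: a digit cannot follow the $
```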
A number of reserved words are available to PHP developers, and the set of reserved
words of PHP is divided into the following three distinct categories: keywords, predefined
constants, and predefined classes. As the PHP Manual states, PHP keywords may not be used
as constant, class, function, or method names, but they may be used as variable names
without error. This is helpful, especially to new programmers. However, even
though variable names may legally replicate keywords, the PHP Manual warns that such practice
can lead to confusion. PHP also provides a large set of predefined constants, including eight
“magic” constants such as __LINE__ and __FILE__, and most of these
are made available to any PHP script. However, some of these constants are not available unless
specific extensions are loaded dynamically or included at compile time. Finally, predefined classes
make up the last category of reserved words, and are defined in the standard set of functions
included in the PHP build. Predefined classes provide a type structure for things like directories,
objects generated by type casting, and even anonymous functions
(http://www.php.net/manual/en/reserved.php). PHP provides a rich set of reserved words to aid
developers with swifter development. However, the number of reserved words isn’t so abundant
that the process of naming becomes a challenge.
Variable declaration is the process by which a variable becomes available for use in a
program, and declaration is governed by the six identifier syntax rules previously mentioned.
When declaring a variable in PHP, the first character must be a $. Gilmore states the following
about variable declarations: “Interestingly, variables do not have to be explicitly declared in PHP
as they do in a language such as C. Rather, variables can be declared and assigned values
simultaneously.” Finally, variable assignment may be performed either by value or
by reference. Gilmore also mentions that value assignment requires no special syntax;
however, assigning by reference requires an ampersand prefixed to the source variable.
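
As a small sketch of the two assignment forms described above (the variable names are illustrative assumptions):

```php
<?php
$original = 10;

$copy = $original;    // value assignment: $copy gets its own value
$alias = &$original;  // reference assignment: $alias is another name for $original

$original = 20;
// $copy is still 10, while $alias now reads 20
```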
In Beginning PHP and MySQL: From Novice to Professional, Fourth Edition, Gilmore
defines a constant as “… a value that cannot be modified throughout the execution of a
program.” Syntactically, a constant is case sensitive and, by convention, always uppercase.
Additionally, naming a constant follows the same rules as those governing identifiers, aside from
the absence of a dollar sign (http://www.php.net/manual/en/language.constants.php). A unique
detail of PHP constants is the manner by which constants are assigned values. The define()
function is used to declare and assign to constants, so unlike C-based languages, PHP uses
function-based assignment for constants
(http://us.php.net/manual/en/language.constants.php).
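
A minimal sketch of define() follows. The constant MAX_USERS and the helper remainingSeats() are hypothetical names used only for illustration.

```php
<?php
define('MAX_USERS', 100);

function remainingSeats($taken) {
    // Constants are visible inside functions without any global declaration.
    return MAX_USERS - $taken;
}

$seats = remainingSeats(40);
$isDefined = defined('MAX_USERS');
```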
The scope of PHP constants, by default, is global. As a result, PHP constants are
accessible anywhere in the entire script. As far as variables are concerned, the following three
different types of scope may be applied: local, static, and global. A local variable is a variable with
local scope, meaning that this variable was declared in a function and may only be accessed from
within that function block. As soon as program control exits the function in which a variable is
defined, all variables local to that function are destroyed, and as a result become inaccessible.
Unlike global variables, the integrity of a local variable is protected from side effects. Static
variables behave much differently from ordinary local variables, in that they are history
sensitive. Unlike an ordinary local variable, a static variable declared inside of a function
retains its value between calls, rather than being destroyed. In order to declare a static variable,
prefix the variable declaration with the static reserved word. Additionally, it is important to note
the following about static variables: the declaration of a static variable is resolved at compile
time, and trying to initialize such a variable with the result of an expression will cause
a parse error (PHP Manual). Next, in PHP, a global variable is accessible anywhere within its
native script outside of functions. However, unlike in the C-based languages, global variables
must be explicitly made available inside a function. The following two syntactical methods exist
for enabling access to a global variable
within some localized scope: declaring it with the global keyword, and the $GLOBALS[] array.
Finally, note that variables declared within a loop or an “if” structure remain accessible outside
of that block. PHP’s approach to accessing global variables is safer than that of C-based
languages, as this approach provides resistance against incidental or accidental changes.
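
The three kinds of scope can be sketched as follows, assuming the code runs at the top level of a script. All function and variable names are hypothetical.

```php
<?php
$total = 0; // global variable

function addWithKeyword($n) {
    global $total;           // import the global into this function's scope
    $total += $n;
}

function addWithArray($n) {
    $GLOBALS['total'] += $n; // same effect through the $GLOBALS array
}

function nextId() {
    static $id = 0;  // initialized once; the value survives between calls
    $id++;
    return $id;
}

addWithKeyword(5);
addWithArray(7);
nextId();
$secondId = nextId();

// Blocks do not create a new scope: $i is still readable after the loop.
for ($i = 0; $i < 3; $i++) {
}
```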
PHP Data Types
As Sebesta indicates, an important factor in evaluating the effectiveness of a
programming language is how well the supported data types
model the objects of the problem domain. PHP is a server-side HTML-embedded
scripting language that allows developers to build web applications, and features an ample
number of data types for such problems. Similar to other programming languages, PHP supports
a set of primitive data types as well as a set of compound data types; however, Kantor specifies,
“PHP is a loosely-typed language, so a variable does not need to be of a specific type and can
freely move between types as demanded by the code it is being used in.” Consequently, PHP
developers are handed the responsibility of managing the type of data, and this fact does not earn
PHP any quality points in terms of reliability or readability. PHP supports the following four
primitive types: integers, floating point numbers, strings, and Booleans. Additionally, PHP
supports two compound types, objects and arrays. As far as the primitive types of PHP, Boolean
values are represented by the keywords true and false, and as Kantor mentions, PHP supports
convenient methods of converting these two Booleans to integer values of 1 and 0 respectively.
Even though PHP doesn’t provide a separate double type, such values may be
represented as floating point values. The numeric primitives of PHP display flexibility by
supporting integer literals in base eight, ten, and sixteen, although the base is signaled only
by a prefix (a leading 0 for octal and 0x for hexadecimal). Overall, string and array types are the most useful and
important data types of PHP, especially because PHP is geared towards providing web solutions.
As support for stating the significance of the string type, Gilmore states, “PHP is the Slap
Chop™ of string manipulation, allowing you to slice and dice text in nearly every conceivable
fashion.” Because PHP provides close to one hundred string manipulation functions,
developers have the tools to apply a vast number of text parsing techniques. The
indexing attribute of strings allows PHP developers to access an individual character, which is
similar to the string class of C++ and to the method of accessing elements of PHP arrays. In
semi-strongly and strongly typed languages, arrays are intended to store a collection of values of
similar data types, but type uniformity is not necessary when storing values in the array type of PHP. The
PHP Manual provides the following additional characteristics of PHP arrays: “An array in PHP
is actually an ordered map. A map is a type that associates values to keys. This type is optimized
for several different uses; it can be treated as an array, list (vector), hash table (an
implementation of a map), dictionary, collection, stack, queue, and probably more.
As array values can be other arrays, trees and multidimensional arrays are also possible.” The
key of a PHP array can be either a string or an integer, and the value can be of any type.
Additionally, PHP provides a number of assignment methods for the array type, either through a
function-call form of assignment, or through hardcoding values directly into appropriate
elements. The next data type worth noting is the resource type, which holds a reference to an
external resource such as a database connection, and is a component of the unparalleled range of database
support provided by PHP.
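
The literal forms, string indexing, and ordered-map arrays described in this section might look like the following sketch. The data values are made up for illustration.

```php
<?php
// Octal, decimal, and hexadecimal literals all denote integers.
$octal = 017;    // decimal 15
$hex   = 0x1F;   // decimal 31

// Strings can be indexed by character position.
$word  = "PHP";
$first = $word[0];

// Arrays are ordered maps: keys are integers or strings, values are of any type.
$profile = array(
    'name'   => 'Ada',
    'scores' => array(90, 85),  // nested array
    0        => true,
);
$profile['active'] = 1.5;       // element added by direct assignment

$keys = array_keys($profile);   // insertion order is preserved
```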
Expressions
PHP expressions are simple, and the most important concepts involve associativity rules,
the process of forming complex expressions, as well as type conversions. An additional attribute of
expressions is the set of operators used to form them. PHP is a C-based language,
so naturally all statements must end with a semicolon. PHP supports a number of operators and
types of operators, and Kantor provides the following categories of operators: “arithmetic
operators, string operators, assignment operators, auto increment operators, casting operators,
relational operators, logical operators, bitwise operators.” PHP enforces operator precedence in
which, among the arithmetic operators, multiplication and division rank equally at the highest
level, followed by addition and
subtraction, with the assignment operators ranking below them. A small group
of operators are right associative, such as the following: constructor calls, logical NOT, bitwise
NOT, increment and decrement operators, unary plus and negation, the inhibit errors operator,
the assignment operator, and the set of assignment with operation
operators. Most other operators are left associative; notably, the ternary operator is left
associative in PHP 5, unlike in C. Additionally, the PHP language
facilitates a means for manual, parenthetical control over operator precedence. Unfortunately,
PHP does not support operator overloading natively; however, packages exist which facilitate
limited forms of operator overloading (http://pecl.php.net/package/operator). Even though, in
PHP, implicit casting is performed dynamically when PHP code is executed, Kantor
maintains that there are times when explicit casting is necessary, and such functions are provided
by the language (Kantor). In addition to explicit type casting, PHP developers frequently need to
determine or change the type of some piece of data, and two intuitively named functions,
gettype() and settype(), facilitate these type-specific functionalities. Web development
frequently requires dynamic type conversions, as well as support for mixed-mode expressions,
and such actions are supported and handled by the PHP interpreter.
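
A brief sketch of precedence, associativity, and type conversion, using only built-in functions; the variable names are illustrative:

```php
<?php
$product = 2 + 3 * 4;     // multiplication binds tighter than addition
$grouped = (2 + 3) * 4;   // parentheses override precedence

$a = $b = 5;              // assignment is right associative: $b is set first

$mixed = 1 + 2.5;         // mixed-mode: the integer is converted to float
$typeOfMixed = gettype($mixed);   // gettype() reports floats as "double"

$asInt = (int) "42";      // explicit cast from string to integer

$value = "123";
settype($value, "integer");       // converts $value in place
```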
Control Structures
Control structures not only dictate the flow of a program, but also provide a means for
iterative execution. Kantor lists conditionals, iteration, and functions as the main control flow
categories of PHP, and he also mentions the following about PHP control structures: “There is
one last type of flow control, called goto's.” (Kantor) The conditionals of PHP are delimiting
statements that determine whether or not code is executed based on the value of a Boolean
expression (Kantor). Kantor also maintains that two basic conditional statements are provided by
all C-based languages, and these two statements are the if and the switch statement. The
semantics of the PHP if statement is simple, as the execution of some code is dictated by whether
a conditional expression evaluates to true or false. Kantor states, “To use conditionals,
you need to be evaluating an expressions that evaluates to true or false.” As implied by the
above quote, PHP conditional expressions must yield a value that is, or can be converted to, a
Boolean. Unlike Visual Basic, Ada, and
other languages, the PHP “if” statement does not make use of a “then” keyword. Additionally,
executing a single statement within an if statement does not require enclosing curly braces (Kantor).
As a complement to the “if” statement, “else” and “elseif” statements are provided. PHP also
provides the switch statement to test a single variable or expression for multiple values. As
Kantor mentions, the syntax of the switch statement requires additional keywords and the
following syntactical components: The case statement, the break statement, and the default
statement. The flexible nature of PHP is exemplified by its support for multiple syntactical forms
for conditionals, as PHP not only supports the use of curly braces as block delimiters, but also the
“endif” and “endswitch” keywords. An interesting note concerning the semantics of the break
statement is expanded upon by Kantor in the following quote: “The break statement can also be
used in an odd way in PHP that can cause errors if you are not aware of it. If the break statement
is the only statement following a given case statement, the code will skip to the default statement
block instead of the end of the switch statement.” The behavior Kantor describes may seem
confusing, and he presents it as useful when trying to avoid values that might meet a
later condition; note, however, that the PHP Manual describes break simply as ending execution
of the enclosing switch or loop structure. The semantics of the iteration category of PHP control structures is summarized
by Kantor’s statement, “These statements have control structures that delimit them and which
determine how many times (zero or more) the delimited code is executed, based on some
condition.” PHP provides the following three iterative control structures: the while statement, the
do… while statement, and the for statement. Similar to the if and switch structures, PHP supports
an alternative block delimiting syntactical entity to the curly brace, the colon (Kantor). Finally,
Kantor provides the following syntactical attributes of the PHP for loop: “init: The initial state of
the variable to be tested. Condition: The condition to be tested for the statement to continue
processing. Increment: The increment by which the variable being tested changes.” (Kantor) As
Kantor describes, PHP for loops also support multiple expressions in each part, using a comma
to separate each successive expression. To assist the PHP looping control structures, the
following termination statements are provided: the break statement and
the continue statement. Additionally, each of these statements may be followed by
an integer specifying how many levels of nesting to step out of (Kantor). Along with the basic for,
while, and do…while loops, PHP also provides the “foreach” construct, which iterates exclusively
over arrays and
objects (PHP Manual). Finally, PHP also provides a number of processing directive
control structures.
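
The following sketch shows the alternative colon syntax, a switch with a default, and a multi-level break. The functions classify() and dayType() are hypothetical examples, not taken from the sources cited above.

```php
<?php
function classify($n) {
    // Alternative (colon) syntax for if / elseif / else
    if ($n < 0):
        return "negative";
    elseif ($n === 0):
        return "zero";
    else:
        return "positive";
    endif;
}

function dayType($day) {
    switch ($day) {
        case "Sat":
        case "Sun":
            return "weekend";
        default:
            return "weekday";
    }
}

// break 2 leaves both the inner and the outer loop at once.
$pairs = array();
foreach (array(1, 2, 3) as $x) {
    foreach (array(1, 2, 3) as $y) {
        if ($x * $y > 2) {
            break 2;
        }
        $pairs[] = "$x-$y";
    }
}
```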
Functions
As specified by the PHP Manual, PHP provides support for three main function forms:
user-defined functions, internal functions, and variable functions. Naming a function in PHP
follows the same syntax rules as those that govern other labels in PHP. PHP does not require
that a function be defined before it is referenced, unless the function is defined within a conditional
block (PHP Manual). Function names are case insensitive, and as the PHP Manual maintains,
PHP supports variable-length parameter lists and default arguments for user-defined functions.
Arguments may be explicitly passed by reference by prefixing the parameter with the C-style
ampersand character in the function definition. Additionally, the PHP Manual mentions that PHP functions may
optionally return a value of any type through the use of the return keyword. Many C-based
programmers would expect pass-by-reference semantics for array arguments, but that is not the
case. PHP arrays and scalar values are implicitly passed by value, and objects are implicitly
passed by reference (PHP Manual). Variable function semantics in PHP means that if a variable
has a pair of opening and closing parentheses appended to its end, then the value of that variable will
be treated as a function name by the PHP interpreter. The PHP Manual states the following about
internal functions: “PHP comes standard with many functions and constructs. There are also
functions that require specific PHP extensions compiled in, otherwise fatal "undefined function"
errors will appear.” Examples of the core functions mentioned in the previous quote are the string and
variable functions. Finally, PHP supports anonymous functions, also known as closures. Closures
may be used for many reasons, such as assigning functions to variables dynamically, or
providing access to variables of the parent scope. As provided by the PHP Manual, anonymous functions are
implemented using the Closure class.
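
A minimal sketch of the function features discussed above. The user-defined names (power, appendItem, appendCopy) are hypothetical; strtoupper() and pow() are built-in functions.

```php
<?php
function power($base, $exp = 2) {           // default argument
    return pow($base, $exp);
}

function appendItem(array &$list, $item) {  // passed by reference
    $list[] = $item;
}

function appendCopy(array $list, $item) {   // passed by value: caller's array untouched
    $list[] = $item;
    return count($list);
}

$items = array('a');
appendItem($items, 'b');
$copyCount = appendCopy($items, 'c');

// Variable function: the string held in $fn is used as the function name.
$fn = 'strtoupper';
$shout = $fn('php');

// Anonymous function (closure) capturing $factor from the enclosing scope.
$factor = 3;
$triple = function ($n) use ($factor) {
    return $n * $factor;
};
$nine = $triple(3);
```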
Abstract Data Types
Beginning with PHP 5, support for user-defined abstract data types was provided.
Through the use of the abstract keyword, developers may declare a class to be abstract. Abstract
classes may not be instantiated, as these classes are provided for inheritance purposes. This
encapsulation feature is extremely useful for promoting code reuse, as well as for hiding
information. The PHP Manual expands upon the semantics of abstract classes by stating, “When
inheriting from an abstract class, all methods marked abstract in the parent's class declaration
must be defined by the child; additionally, these methods must be defined with the same (or a
less restricted) visibility.” (PHP Manual) When deriving from an abstract parent class, the
derived method signatures must match those of the parent class, and the keyword extends, followed
by the parent class name, must follow the derived class name in its declaration (PHP Manual).
Additionally, a third-party package emulates parameterized abstract data types by declaring the
types of values to be returned by
generic classes so they can be checked dynamically at run time
(http://www.phpclasses.org/package/5211-PHP-Implementation-of-generic-types.html).
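
As an illustration of abstract classes, consider the sketch below. The Shape and Rectangle classes are hypothetical examples.

```php
<?php
abstract class Shape {
    abstract public function area();      // must be implemented by children

    public function describe() {          // shared, inherited implementation
        return get_class($this) . " with area " . $this->area();
    }
}

class Rectangle extends Shape {
    private $width;
    private $height;

    public function __construct($width, $height) {
        $this->width  = $width;
        $this->height = $height;
    }

    public function area() {
        return $this->width * $this->height;
    }
}

$rect = new Rectangle(2, 3);
$text = $rect->describe();
// new Shape();  // would be a fatal error: abstract classes cannot be instantiated
```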
Object Oriented Programming
The PHP Manual maintains that beginning with PHP 5, the object model
was rewritten to provide for better performance and more features. The
object-oriented capabilities of PHP 5 include visibility, abstract and final classes and methods, as
well as magic methods and interfaces (PHP Manual). Additionally, the semantics of objects in PHP
follows access by reference rather than by copying of the object. The PHP Manual describes the
syntax of object visibility as follows: “The visibility of a property or method can be defined by
prefixing the declaration with the keywords public, protected or private.” For class instances,
PHP 5 provides constructors, as well as destructors, that are syntactically and semantically
similar to those of other C-based languages (PHP Manual). Also, the object-oriented capabilities
of PHP 5.0 provide a means for inheritance; however, only single inheritance is allowed. When
accessing static, constant, or overridden properties or methods of a class from outside the class
definition, PHP requires the use of the scope resolution operator (::). Along with the various
types of classes, modern versions of PHP facilitate the use of interfaces. As mentioned in the
PHP Manual, interfaces are defined by using the interface keyword; a class
implements an interface through the implements operator, while an interface may inherit from
another interface through the extends operator.
Overloading of methods and properties in PHP is a bit confusing, and isn’t very developer
friendly. However, magic methods are made available as a means of dynamically creating
properties and methods. A feature known to PHP as late static binding is provided
through the use of the static keyword, rather than the self keyword. One feature of objects
that is frequently discussed is that objects are passed by reference by default. However, this is
not completely true, and the following explanation is provided by the PHP Manual: “A PHP
reference is an alias, which allows two different variables to write to the same value. As of PHP
5, an object variable doesn't contain the object itself as value anymore. It only contains an object
identifier which allows object accessors to find the actual object.” As a result of the properties of
objects mentioned in the aforementioned quote, any object that is either passed as an argument,
or returned or assigned to a variable is accessed by a pointer to the object from a copy of the
object identifier (PHP Manual).
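
The sketch below illustrates interfaces, late static binding, and object identifiers. The Named, Model, and User types are hypothetical and exist only for this example.

```php
<?php
interface Named {
    public function name();
}

class Model implements Named {
    public $label = "model";

    public function name() {
        return "Model";
    }
    public static function create() {
        return new static();   // late static binding: resolves to the called class
    }
    public static function createSelf() {
        return new self();     // always resolves to Model
    }
}

class User extends Model {
    public function name() {
        return "User";
    }
}

$late  = User::create()->name();
$early = User::createSelf()->name();

// Assignment copies the object identifier, not the object itself.
$first  = new User();
$second = $first;
$second->label = "changed";   // visible through $first as well

$cloned = clone $first;       // clone produces an independent copy
$cloned->label = "clone";
```

Here $late holds "User" while $early holds "Model", which is the difference between static and self in a static context.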
Concurrency
PHP does not provide native language support for concurrency.
Exception Handling
PHP approaches exception handling in a similar manner to other programming languages
(PHP Manual). Exceptions are instances of the Exception class or its subclasses; they are raised
with throw and handled with the try and catch control structures. Generally, the internal
functions of PHP utilize errors and error
reporting more frequently than exceptions; however, errors may be translated into exceptions
by means of the ErrorException class (PHP Manual). The ErrorException class provides
properties and methods that not only represent an error as an exception, but also provide a means to assess the
severity of the error. As stated in the PHP Manual, “The Standard PHP Library
(SPL) provides a good number of built-in exceptions.” Native exceptions of PHP range from out
of bounds to bad function call exceptions. Syntactically, the try and catch blocks of PHP are like
those of other C-based languages in that code that may be error prone is wrapped in a try block,
and then any error or exception handling takes place within a single catch block or a series of catch blocks
(PHP Manual http://www.php.net/manual/en/language.exceptions.php).
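
A minimal sketch of try and catch, and of translating an error into an ErrorException through set_error_handler(). The safeDivide() function is a hypothetical example, and the handler shown is one common pattern rather than the only approach.

```php
<?php
function safeDivide($a, $b) {
    if ($b === 0) {
        throw new InvalidArgumentException("Division by zero");
    }
    return $a / $b;
}

try {
    $result = safeDivide(1, 0);
} catch (InvalidArgumentException $e) {
    $result = "caught: " . $e->getMessage();
} catch (Exception $e) {
    $result = "unexpected";
}

// Translate traditional errors into ErrorException objects.
set_error_handler(function ($severity, $message, $file, $line) {
    throw new ErrorException($message, 0, $severity, $file, $line);
});

try {
    trigger_error("custom warning", E_USER_WARNING);
    $severity = null;
} catch (ErrorException $e) {
    $severity = $e->getSeverity();
}
restore_error_handler();
```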
Evaluation
Readability
In my opinion, PHP is not a very readable language for a number of reasons. First
of all, the flexible nature of PHP syntax makes it difficult to decipher the exact role of each
syntactic unit. For example, PHP allows block structures to be delimited either by a pair of curly
braces or by a colon paired with a closing keyword. Next, the dynamic typing of PHP makes it
difficult for developers to understand the expected outcome of various operations, and as a
result, specific functions exist strictly for deciphering the mystery of data and structure types.
Additionally, PHP supports “heredoc” and “nowdoc” structures as an alternative delimiter for
strings, and this appeal to flexibility makes for very ambiguous code blocks. The following is an
example from the PHP Manual of the heredoc string delimiter: “<<<EOF this is a block
EOF;”(PHP Manual). Finally, the semantically different processing of a variable identifier when
enclosed in a single quote, compared to when enclosed in a double quote, does not help PHP’s
readability.
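
The quoting differences mentioned above can be seen in a short sketch; the variable names are illustrative:

```php
<?php
$name = "World";

$single = 'Hello $name';   // single quotes: no interpolation
$double = "Hello $name";   // double quotes: $name is replaced by its value

// Heredoc behaves like a double-quoted string.
$heredoc = <<<EOF
Hello $name
EOF;

// Nowdoc (quoted identifier) behaves like a single-quoted string.
$nowdoc = <<<'EOF'
Hello $name
EOF;
```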
Writability
As PHP developed as a programming language, the language designers placed writability
at a high level of priority in order to make PHP easy to learn. The flexible syntax is one example,
as many syntactical forms are available for the same purpose. I believe that the flexible
syntactic nature of PHP not only increases PHP’s expressivity, but also eases the burden on
rookie developers to quickly become productive. Secondly, the dynamic typing of PHP makes
development flow quickly, as new developers don’t have to be concerned with strict typing
rules. For example, the PHP interpreter will automatically convert arguments to the appropriate
type as they are passed in a function call. Additionally, when assigning to arrays, type
uniformity is not required, and this implementation characteristic also increases productivity, as
well as aids with writability. Also, PHP provides many Data Connection options, which are
intuitively named as well as easy to utilize, and these Data Connection objects make distributed
data processing a cake walk in PHP. Finally, the friendly syntax of associative arrays makes
assigning to and searching arrays more intuitive and natural.
Reliability
Even though PHP does facilitate the means for enforcing type strictness, such features
are not natural to the dynamic typing of PHP. As a result, the reliability of PHP is not one of its
highlights. For instance, the PHP interpreter implicitly handles mixed mode expressions, as well
as implicitly performs type conversions of arguments being passed within function calls.
Although such dynamic and loosely typed features of PHP are considered convenient to most
web developers, this feature compromises the reliability of PHP code. On a positive note, the
semantics of Global variables in PHP is very reliable, as PHP requires an explicit declaration of
any global variable before being accessed within local or function scope. Finally, the loose
scoping semantics of loop variables in PHP is handy, but more importantly compromises
reliability as this lends loop variables to suffer from various side effects.
Cost
As Sebesta maintains, the total cost of a programming language is
influenced by the overhead involved with training developers to use a language.
Because PHP is very writable, flexible, and therefore highly expressive, training a
developer to use PHP should be a low-cost event. Additionally, PHP is dynamically
typed, and therefore requires run-time checks to detect type errors, which adds
to execution cost. Sebesta mentions
the aforementioned factor when evaluating the cost of a programming language;
however, he also notes that execution efficiency is becoming less important.
Next, the cost of PHP’s implementation system is free, and this fact scores cost
points for PHP. Finally, the cost of program maintainability when PHP is utilized
receives mixed reviews. In my opinion, the readability of PHP is average at best
and therefore raises the cost of code maintenance. However, PHP is open source
and therefore friendly to extensions.
Overall Grade
Overall, PHP is a fun language with which to develop web applications. Because I am
new to PHP and also spoiled by the Microsoft web development option, ASP.NET 4 in C#, PHP
may be more readable than I assessed. As a result, on a four-point grading scale, I would assign
PHP three points out of four in the area of readability. PHP is rather easy to pick up on the fly,
and offers various syntactical options, so I believe that PHP deserves all four writability points.
Also, given that PHP does provide a means to be type strict, which somewhat improves
the reliability of PHP code, I believe that PHP deserves two out of four points for reliability.
Finally, aside from a few readability concerns that I have with PHP, PHP is accessible and
affordable in terms of implementation, and is very writable and expressive. As a result, I assess
three points out of four in the area of cost evaluation.
Tentative list of research sources:
1) “PHP Manual”. http://www.php.net/manual/en/index.php (This is the site for the official
manual of the PHP language)
2) Kantor, Peter L. “Hudson Valley Community College Web Site”.
http://www.daaq.net/old/php/index.php
3) Sebesta, Robert W. Concepts of Programming Languages. Tenth Edition.
Boston: Pearson, 2010
4) Wellington, Luke; Thomson, Laura. PHP and MySQL Web Development.
Second Edition. Developers Library. USA. Sams Publishing. 2003
5) Harris, Andy. PHP 5/MySQL Programming for the Absolute Beginner.
Canada. Premier
6) Gilmore, Jason. Beginning PHP and MySQL: From Novice to Professional.
Fourth Edition. Apress. 2010
7) Habib, Irfan. “Integrating PHP and Perl”. Linux Journal. Volume 2007 Issue
154. Feb 2007
8) Fioretti, Marco. “Top Ten Tips for Getting Started with PHP”. Linux Journal.
Volume 2006 Issue 145. May 2006
9) Knudsen, Craig. “PHP Version 4”. Linux Journal. Volume 1999 Issue 67es.
November 1999
10) http://pecl.php.net/package/operator