PHP: Under the Hood
Steven Taylor
10/20/2012
CSC 415 Programming Languages

Preface:
The 1990s were the decade in which the Internet grew from a popular platform for research-related communication into a social and economic cornerstone. Before PHP, the prominent programming languages adapted for server-side web work were C++ and Perl. Although these two languages are powerful and well designed, the rapidly expanding Internet application market created a demand for a faster and friendlier solution, and PHP was that solution. I unfold the many layers of PHP by presenting a historical overview of PHP; examining the syntax and semantics of PHP; and finally, evaluating PHP based on readability, writability, reliability, and cost.

History of PHP:
By answering three questions, the foundation of PHP can be succinctly summarized. First: What is PHP? The authors and editors of the PHP Manual describe PHP by stating, “PHP, which stands for "PHP: Hypertext Preprocessor" is a widely-used Open Source general-purpose scripting language that is especially suited for Web development and can be embedded into HTML.” Second: Who created PHP? PHP was created by a developer named Rasmus Lerdorf, who was prompted by a script that he had written to track the traffic generated by viewers of his online resume (W. Jason Gilmore). Lerdorf’s script was so popular that he continued engineering his scripting tools until they formed a programming language. Initially, Lerdorf’s toolset was referred to as Personal Home Page (PHP), rather than by the recursive name PHP: Hypertext Preprocessor (W. Jason Gilmore). Finally: When was PHP created? PHP dates back to 1995. At that time, other web scripting technologies possessed no such tracking capabilities, and this is why Lerdorf’s scripting language was considered revolutionary.
As the Internet evolved, web application development moved from static pages to dynamic content, and PHP evolved to meet those needs. Now that I have covered the “Who”, “What”, and “When” of PHP, an equally important discussion, the evolution of PHP, follows. PHP has expanded from a humble foundation as a traffic tracking script into a powerful server-side language with object-oriented support. In 1997, the Personal Home Page Construction Kit and the Form Interpreter components were completely rewritten and combined into a single program, which was released as PHP 2.0 (PHP Manual). The PHP user base grew substantially after the release of version 2.0; however, more changes were necessary. In 1997, Andi Gutmans and Zeev Suraski of Tel Aviv, Israel, rewrote the PHP parser, and the resulting PHP 3.0 included support for a number of third-party databases and APIs (PHP Manual). PHP 4.0 was officially released in May 2000. In addition to core improvements, PHP 4.0 included features such as a broader range of web server support, support for object-oriented programming, HTTP sessions, output buffering, and input security (PHP Manual). PHP 4.0 was arguably the most notable and influential release, as it established PHP as enterprise worthy (PHP Manual). Version 5.0 added support for web services, native support for SQLite, and improvements to various XML and object-oriented features (PHP Manual). At the time of writing, no release of a version 6.0 has been announced, but successive updates to version 5.0 bring us to the current version, 5.4 (PHP Manual).

PHP Syntax and Semantics

Identifiers, Variables, Constants, Reserved Words, and Scope
An identifier is used by a developer to access or call on a variable, function, or some user-defined object.
Several syntax rules must be followed when declaring identifiers in PHP, and these syntax rules are as follows:
1) Identifiers in PHP must begin with a dollar sign ($), unless they identify a function or constant.
2) Following the dollar sign, the first character must be either a letter or an underscore.
3) After the first character, identifiers can consist of numbers, letters, the underscore character, and any character within the extended ASCII range of 127 to 255.
4) Identifiers in PHP can be of any length, which gives developers more freedom in naming.
5) Variable identifiers are case sensitive, so $Me is not the same as $me. This syntax rule helps with readability.
6) Finally, identifiers may not be identical to any of the reserved words of PHP (Wellington).

A number of reserved words are available to PHP developers, and the set of reserved words is divided into the following three distinct categories: keywords, predefined constants, and predefined classes. As the PHP Manual states, keywords may not be used as constant, class, or function names, but they may be reused as variable names without error, because the leading dollar sign distinguishes them. This is helpful, especially to new programmers. However, even though variable names may legally replicate keywords, the PHP Manual warns that the practice can cause confusion and therefore threatens readability. PHP also provides a set of predefined constants, including eight “magic” constants, and most of them are available to any PHP script. However, some of these constants are not available unless specific extensions are loaded dynamically or included at compile time. Finally, predefined classes make up the last category of reserved words, and are defined in the standard set of functions included in the PHP build. Predefined classes provide a type structure for things like directories, objects generated by type casting, and even anonymous functions (http://www.php.net/manual/en/reserved.php). PHP provides a rich set of reserved words to aid developers with swifter development.
However, the number of reserved words isn’t so abundant that the process of naming becomes a challenge.

Variable declaration is the process by which a variable becomes available for use in a program, and declaration is governed by the six identifier syntax rules previously mentioned. When declaring a variable in PHP, the first character must be a $. Gilmore states the following about variable declarations: “Interestingly, variables do not have to be explicitly declared in PHP as they do in a language such as C. Rather, variables can be declared and assigned values simultaneously.” Finally, variable assignment may be performed either by value or by reference. Gilmore also mentions that value assignment requires no special syntax, whereas assigning by reference requires an ampersand prefix on the source variable.

In Beginning PHP and MySQL: From Novice to Professional, Fourth Edition, Gilmore defines a constant as, “… a value that cannot be modified throughout the execution of a program.” Syntactically, a constant is case sensitive and, by convention, always uppercase. Additionally, naming a constant follows the same rules as those governing identifiers, aside from the absence of a dollar sign (http://www.php.net/manual/en/language.constants.php). A unique detail of PHP constants is the manner in which they are assigned values. The define() function is used to declare and assign to constants, so unlike C-based languages, PHP uses function-based assignment for constants (http://us.php.net/manual/en/language.constants.php).

The scope of PHP constants is global by default. As a result, PHP constants are accessible anywhere in the entire script. As far as variables are concerned, the following three types of scope may be applied: local, static, and global. A local variable is a variable with local scope, meaning that it was declared in a function and may only be accessed from within that function block.
As soon as program control exits the function in which a variable is defined, all variables local to that function are destroyed and become inaccessible. Unlike global variables, the integrity of a local variable is protected from side effects. Static variables behave much differently from local variables, in that they are history sensitive. Unlike an ordinary local variable, a static variable declared inside a function retains its value between calls rather than being destroyed. To declare a static variable, prefix the variable declaration with the static reserved word. Additionally, it is important to note the following about static variables: the declaration of a static variable is resolved at compile time, and trying to initialize such a variable with the result of an expression will cause a parse error (PHP Manual). Next, in PHP, a global variable is accessible anywhere within its native script. However, unlike in the C-based languages, global variables must be explicitly made available inside a function. The following two syntactical methods exist for enabling access to a global variable within some localized scope: declaring the variable with the global keyword, and using the $GLOBALS[] array. Finally, note that variables declared within a “loop” or an “if” structure remain accessible outside of that block. PHP’s approach to accessing global variables is safer than that of C-based languages, as it provides resistance against accidental changes.

PHP Data Types
As Sebesta indicates, an important factor in evaluating a programming language’s ability to produce results is how well the supported data types model the objects of the problem domain. PHP is a server-side, HTML-embedded scripting language that allows developers to build web applications, and it features an ample number of data types for such problems.
Similar to other programming languages, PHP supports a set of primitive data types as well as a set of compound data types; however, Kantor specifies, “PHP is a loosely-typed language, so a variable does not need to be of a specific type and can freely move between types as demanded by the code it is being used in.” Consequently, PHP developers are handed the responsibility of managing the type of data, and this fact does not earn PHP any quality points in terms of reliability or readability. PHP supports the following four primitive types: integers, floating point numbers, strings, and Booleans. Additionally, PHP supports two compound types, objects and arrays. As far as the primitive types are concerned, Boolean values are represented by the keywords true and false, and as Kantor mentions, PHP supports convenient methods of converting these two Booleans to the integer values 1 and 0 respectively. Even though PHP technically doesn’t provide a separate double type, such values may be represented as floating point values. The numeric primitives of PHP display flexibility by supporting the assignment of base eight, ten, and sixteen values, although the prefix notation (a leading 0 for base eight and a leading 0x for base sixteen) does not make the base obvious to the reader.

Overall, string and array types are the most useful and important data types of PHP, especially because PHP is geared towards providing web solutions. In support of the significance of the string type, Gilmore states, “PHP is the Slap Chop™ of string manipulation, allowing you to slice and dice text in nearly every conceivable fashion.” With close to one hundred string manipulation functions available, PHP developers have the tools for a vast range of text parsing techniques. Strings can also be indexed, which allows PHP developers to access an individual character, much like the string class of C++ and like element access in PHP arrays.
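As a minimal sketch of the type behavior described above (the variable names are illustrative and not taken from any of the cited sources), the following script shows a variable changing type, Boolean-to-integer conversion, the three numeric bases, and string indexing:

```php
<?php
// Loose typing: the same variable may hold values of different types over time.
$value = 42;
$typeBefore = gettype($value);   // type name of an integer value
$value = "forty-two";
$typeAfter = gettype($value);    // type name of a string value

// Booleans convert to the integers 1 and 0 with an explicit cast.
$trueAsInt = (int) true;
$falseAsInt = (int) false;

// Integer literals in base ten, base eight (leading 0), and base sixteen (leading 0x).
$decimal = 255;
$octal = 0377;
$hex = 0xFF;
$sameValue = ($decimal === $octal) && ($octal === $hex);

// Strings support array-style indexing of individual characters.
$language = "PHP";
$firstChar = $language[0];

echo $typeBefore, " ", $typeAfter, "\n";
echo $trueAsInt, " ", $falseAsInt, "\n";
var_dump($sameValue);
echo $firstChar, "\n";
```

The three literals all denote the same integer, which illustrates why the prefix notation can be easy to misread.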
In semi-strong and strongly typed languages, arrays are intended to store a collection of values of the same type, but type uniformity is not necessary when storing values in a PHP array. The PHP Manual provides the following additional characteristics of PHP arrays: “An array in PHP is actually an ordered map. A map is a type that associates values to keys. This type is optimized for several different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), dictionary, collection, stack, queue, and probably more. As array values can be other arrays, trees and multidimensional arrays are also possible.” The key of a PHP array element can be either a string or an integer, and the value can be of any type. Additionally, PHP provides a number of assignment methods for the array type, either through a function-call form of assignment, or through hardcoding values directly into the appropriate elements. The next data type worth noting is the resource type, which holds a reference to an external resource such as a database connection, and is a component of the broad range of database support provided by PHP.

Expressions
PHP expressions are simple, and the most important concepts involve associativity rules, the process of forming complex expressions, and type conversions. A further attribute of expressions is the set of operators used to form them. PHP is a C-based language, so naturally all statements must end with a semicolon. PHP supports a number of operators and types of operators, and Kantor provides the following categories: “arithmetic operators, string operators, assignment operators, auto increment operators, casting operators, relational operators, logical operators, bitwise operators.” Among the binary arithmetic operators, PHP enforces a precedence in which multiplication and division are equal at the highest level, followed by addition and subtraction, with the assignment operators ranking well below all of the arithmetic operators.
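The precedence ordering just described can be checked with a short script. This is only a sketch with made-up variable names; it shows multiplication and division binding more tightly than addition and subtraction, and assignment happening last:

```php
<?php
// Multiplication binds tighter than addition, so 4 * 5 is evaluated first.
$implicit = 2 + 4 * 5;

// Division likewise binds tighter than subtraction: 20 / 4 is evaluated first.
$mixed = 10 - 20 / 4;

// Assignment has lower precedence than the arithmetic operators, so the
// entire right-hand side is computed before the value is stored.
$stored = 1 + 2 * 3 - 4;

echo $implicit, "\n";   // prints 22
var_dump($mixed);       // a numeric value equal to 5
echo $stored, "\n";     // prints 3
```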
A small group of operators are right associative, such as the following: constructor calls, logical NOT, bitwise NOT, the increment and decrement operators, unary plus and negation, the error suppression operator (@), the assignment operator, and the compound assignment operators. Most other operators are left associative, including, unlike in C, the ternary operator in current versions of PHP. Additionally, the PHP language provides manual control over evaluation order through parentheses. Unfortunately, PHP does not support operator overloading natively; however, packages exist which facilitate limited forms of operator overloading (http://pecl.php.net/package/operator). Even though, in PHP, implicit casting is performed dynamically when PHP code is executed, Kantor maintains that there are times when explicit casting is necessary, and cast operators and conversion functions are provided by the language (Kantor). In addition to explicit type casting, PHP developers frequently need to determine or change the type of some piece of data, and two intuitively named functions, gettype() and settype(), provide this functionality. Web development frequently requires dynamic type conversions, as well as support for mixed mode expressions, and such actions are supported and handled by the PHP interpreter.

Control Structures
Control structures not only dictate the flow of a program, but also provide a means for iterative execution. Kantor lists conditionals, iteration, and functions as the main control flow categories of PHP, and he also mentions the following about PHP control structures: “There is one last type of flow control, called goto's.” (Kantor) The conditionals of PHP are delimiting statements that determine whether or not code is executed based on the value of a Boolean expression (Kantor). Kantor also maintains that two basic conditional statements are provided by all C-based languages, and these two statements are the if and the switch statement.
The semantics of the PHP if statement are simple, as the execution of some code is dictated by whether a conditional expression evaluates to true or false. Kantor states, “To use conditionals, you need to be evaluating an expressions that evaluates to true or false.” So, as implied by the above quote, a PHP conditional expression must evaluate to, or be convertible to, a Boolean value. Unlike Visual Basic, Ada, and other languages, the PHP “if” statement does not make use of a “then” keyword. Additionally, executing a single statement within an if statement does not require enclosing curly braces (Kantor). As a complement to the “if” statement, “else” and “elseif” statements are provided. PHP also provides the switch statement to test a single variable or expression against multiple values. As Kantor mentions, the syntax of the switch statement requires additional keywords and the following syntactical components: the case statement, the break statement, and the default statement. The flexible nature of PHP is exemplified by its support for multiple syntactical forms for conditionals, as PHP not only supports curly braces as block delimiters, but also a colon paired with the “endif” and “endswitch” keywords. An interesting note concerning the semantics of the break statement is expanded upon by Kantor in the following quote: “The break statement can also be used in an odd way in PHP that can cause errors if you are not aware of it. If the break statement is the only statement following a given case statement, the code will skip to the default statement block instead of the end of the switch statement.” This description may seem confusing, and it should be read with care: the PHP Manual documents break as simply ending execution of the enclosing switch, so a case that contains only a break is a way to skip values that might otherwise meet a later condition.
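The conditional forms described above can be sketched as follows. The functions grade() and dayType() are hypothetical helpers written only for this illustration. The switch includes a case whose only statement is break, so that its effect can be observed directly on whatever interpreter is in use:

```php
<?php
// Hypothetical helper: classify a score with if / elseif / else (no "then" keyword).
function grade($score)
{
    if ($score >= 90) {
        return "A";
    } elseif ($score >= 80) {
        return "B";
    } else {
        return "C or below";
    }
}

// Hypothetical helper: switch with case, break, and default.
function dayType($day)
{
    $type = "unknown";
    switch ($day) {
        case "Holiday":
            break;           // only a break: leaves the switch, $type stays "unknown"
        case "Sat":
        case "Sun":          // a case without break falls through to the next one
            $type = "weekend";
            break;
        default:
            $type = "weekday";
    }
    return $type;
}

// Alternative syntax: a colon opens each block and endif closes the statement.
$n = 3;
if ($n % 2 == 0):
    $parity = "even";
else:
    $parity = "odd";
endif;

echo grade(85), " ", dayType("Sun"), " ", dayType("Holiday"), " ", $parity, "\n";
```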
The semantics of the iteration category of PHP control structures are summarized by Kantor’s statement, “These statements have control structures that delimit them and which determine how many times (zero or more) the delimited code is executed, based on some condition.” PHP provides the following three basic iterative control structures: the while statement, the do…while statement, and the for statement. Similar to the if and switch structures, PHP supports an alternative block delimiter to the curly brace, the colon (Kantor). Finally, Kantor provides the following syntactical attributes of the PHP for loop: “init: The initial state of the variable to be tested. Condition: The condition to be tested for the statement to continue processing. Increment: The increment by which the variable being tested changes.” (Kantor) As Kantor defines, PHP for loops also support multiple expressions in each position by using a comma to separate each successive expression. To assist the looping control structures, the following termination statements are provided: the break statement and the continue statement. Additionally, each of these termination statements may be followed by an integer specifying how many levels of nesting to step out of (Kantor). Along with the basic for, while, and do…while loops, PHP also provides the “foreach” construct to iterate over arrays and objects exclusively (PHP Manual). Finally, PHP also provides a number of other control structures, such as declare, require, and include.

Functions
As specified by the PHP Manual, PHP provides support for three main function forms: user-defined functions, internal functions, and variable functions. Naming a function in PHP follows the same syntax rules as those that govern other labels in PHP. PHP does not require that a function be defined before it is referenced, unless the function is defined within a conditional block (PHP Manual).
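As a rough sketch of the iteration constructs and the definition-order rule described above, the script below calls a user-defined function before its definition, and that function uses foreach, for, continue, and a two-level break. The function findFirstPair() and all variable names are hypothetical and exist only for this illustration; strtoupper() is a real internal function, called here through a variable.

```php
<?php
// A user-defined function may be called before its definition appears,
// provided the definition is not inside a conditional block.
$found = findFirstPair(array(1, 2, 3), array(4, 5, 6), 7);

// Hypothetical helper: first pair (one value from each array) that sums to $target.
function findFirstPair($left, $right, $target)
{
    $pair = null;
    foreach ($left as $x) {                        // foreach walks the array values
        for ($i = 0; $i < count($right); $i++) {   // init; condition; increment
            if ($x + $right[$i] != $target) {
                continue;                          // next iteration of the inner loop
            }
            $pair = array($x, $right[$i]);
            break 2;                               // leave both the for and the foreach
        }
    }
    return $pair;
}

// while tests its condition before each pass.
$countdown = 3;
$whileRuns = 0;
while ($countdown > 0) {
    $countdown--;
    $whileRuns++;
}

// do...while tests after the body, so the body runs at least once.
$doRuns = 0;
do {
    $doRuns++;
} while ($countdown > 0);

// Variable function: the string stored in $name is treated as a function name.
$name = "strtoupper";
$shout = $name("php");

echo $found[0], "+", $found[1], " ", $whileRuns, " ", $doRuns, " ", $shout, "\n";
```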
Function names are case insensitive, and as the PHP Manual maintains, PHP supports variable-length parameter lists and default arguments for user-defined functions. Arguments may be explicitly passed by reference by prefixing the parameter in the function definition with the C-style ampersand character. Additionally, the PHP Manual mentions that PHP functions may optionally return a value of any type through the use of the return keyword. Many C-based programmers would expect pass-by-reference semantics for array arguments, but that is not the case. PHP arrays and scalar values are implicitly passed by value, and objects are implicitly passed by reference (PHP Manual). Variable function semantics in PHP mean that if a variable has a pair of opening and closing parentheses appended to its end, then the value of that variable will be treated as a function name by the PHP interpreter. The PHP Manual states the following about internal functions: “PHP comes standard with many functions and constructs. There are also functions that require specific PHP extensions compiled in, otherwise fatal "undefined function" errors will appear.” Examples of the core functions mentioned in the previous quote are the string and variable functions. Finally, PHP supports anonymous functions, also known as closures. A closure may be used for many reasons, such as assigning values to variables dynamically, or even providing access to variables of the parent scope. As provided by the PHP Manual, anonymous functions are implemented using the Closure class.

Abstract Data Types
Beginning with PHP 5, support for user-defined abstract data types was provided. Through the use of the abstract keyword, developers may declare a class to be abstract. Abstract classes may not be instantiated, as these classes are provided for inheritance purposes. This encapsulation feature is extremely useful for promoting code reuse, as well as for hiding information.
The PHP Manual expands upon the semantics of abstract classes by stating, “When inheriting from an abstract class, all methods marked abstract in the parent's class declaration must be defined by the child; additionally, these methods must be defined with the same (or a less restricted) visibility.” (PHP Manual) When deriving from an abstract parent class, the derived method signatures must match those of the parent class, and the extends keyword, followed by the parent class name, must follow the derived class name in its declaration (PHP Manual). PHP does not natively support parameterized abstract data types; however, third-party packages emulate them by defining the types of values to be accepted and returned by generic classes so they can be checked dynamically at run time (http://www.phpclasses.org/package/5211-PHP-Implementation-of-generic-types.html).

Object Oriented Programming
The PHP Manual maintains that beginning with PHP 5, the object model was rewritten to provide better performance and more features. The object-oriented capabilities of PHP 5 include visibility, abstract and final classes and methods, as well as magic methods and interfaces (PHP Manual). Additionally, the semantics of objects in PHP follow access by reference rather than by copying of the object. The PHP Manual describes the syntax of object visibility as follows: “The visibility of a property or method can be defined by prefixing the declaration with the keywords public, protected or private.” For class instances, PHP 5 provides constructors, as well as destructors, that are syntactically and semantically similar to those of other C-based languages (PHP Manual). Also, the object-oriented capabilities of PHP 5.0 provide a means for inheritance; however, only single inheritance is allowed. When accessing static, constant, or overridden properties or methods of a class from outside the class definition, PHP requires the use of the scope resolution operator (::).
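The abstract class, visibility, inheritance, and scope resolution rules above can be sketched together. The Shape and Rectangle classes are hypothetical and exist only for this illustration:

```php
<?php
// Hypothetical classes for illustration only.
abstract class Shape
{
    const UNITS = "cm";                  // class constant, reached with ::

    protected $name;                     // visible to this class and its children

    public function __construct($name)
    {
        $this->name = $name;
    }

    // Children must define this with the same or a less restricted visibility.
    abstract public function area();

    public function describe()
    {
        return $this->name . " with area " . $this->area();
    }
}

class Rectangle extends Shape            // single inheritance via extends
{
    private $width;                      // visible only inside Rectangle
    private $height;

    public function __construct($width, $height)
    {
        parent::__construct("rectangle"); // :: reaches the overridden parent constructor
        $this->width = $width;
        $this->height = $height;
    }

    public function area()
    {
        return $this->width * $this->height;
    }
}

$rect = new Rectangle(3, 4);
$description = $rect->describe();
$units = Shape::UNITS;                   // constant accessed from outside the class

// Object variables hold a handle, so $alias and $rect refer to the same object.
$alias = $rect;
$sameObject = ($alias === $rect);

echo $description, " ", $units, "\n";
```

Attempting `new Shape("x")` would fail, since an abstract class cannot be instantiated.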
Along with the various types of classes, modern versions of PHP facilitate the use of interfaces. As mentioned in the PHP Manual, interfaces are defined by using the interface keyword; classes implement an interface through the implements operator, and an interface may inherit from another interface through the extends operator. Overloading of methods and properties in PHP is a bit confusing, and isn’t very developer friendly. However, magic methods are made available as a means of dynamically creating properties and methods. A feature known in PHP as late static binding is provided through the use of the static keyword, rather than the self keyword, when referring to the class on which a method was called. One feature of objects that is frequently discussed is that objects are passed by reference by default. However, this is not completely true, and the following explanation is provided by the PHP Manual: “A PHP reference is an alias, which allows two different variables to write to the same value. As of PHP 5, an object variable doesn't contain the object itself as value anymore. It only contains an object identifier which allows object accessors to find the actual object.” As a result of the properties described in this quote, any object that is passed as an argument, returned, or assigned to a variable is accessed through a copy of the object identifier, which points to the same object (PHP Manual).

Concurrency
The core PHP language does not provide built-in support for concurrency.

Exception Handling
PHP approaches exception handling in a similar manner to other programming languages (PHP Manual). Exceptions are instances of the Exception class, and they are thrown and caught through the throw statement and the try and catch control structures. Generally, the internal functions of PHP utilize errors and error reporting more frequently than exceptions; however, errors may be translated into exceptions through the ErrorException class (PHP Manual).
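A minimal sketch of both ideas follows: throwing and catching an exception, and translating an error into an ErrorException by way of set_error_handler(). The function itemAt() is a hypothetical helper; OutOfBoundsException, ErrorException, set_error_handler(), trigger_error(), and restore_error_handler() are standard PHP facilities.

```php
<?php
// Hypothetical helper: throws an SPL exception for a missing index.
function itemAt(array $items, $index)
{
    if (!array_key_exists($index, $items)) {
        throw new OutOfBoundsException("No item at index $index");
    }
    return $items[$index];
}

try {
    $item = itemAt(array("a", "b"), 5);
    $message = "";
} catch (OutOfBoundsException $e) {   // most specific handler first
    $item = null;
    $message = $e->getMessage();
} catch (Exception $e) {              // fallback for any other exception
    $item = null;
    $message = "unexpected: " . $e->getMessage();
}

// Translating errors into exceptions: the handler rethrows each error
// as an ErrorException that carries the original severity.
set_error_handler(function ($severity, $text, $file, $line) {
    throw new ErrorException($text, 0, $severity, $file, $line);
});

$converted = false;
$severity = null;
try {
    trigger_error("custom warning", E_USER_WARNING);
} catch (ErrorException $e) {
    $converted = true;
    $severity = $e->getSeverity();
}
restore_error_handler();              // put the previous handler back

echo $message, "\n";
var_dump($converted);
```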
The ErrorException class provides properties and functions that not only convert errors to exceptions, but also provide a means to assess the severity of the error or exception. As stated in the PHP Manual, “The Standard PHP Library (SPL) provides a good number of built-in exceptions.” Native exceptions of PHP range from out-of-bounds to bad-function-call exceptions. Syntactically, the try and catch blocks of PHP are like those of other C-based languages, in that code that may be error prone is wrapped in a try block, and any error or exception handling then takes place within a single catch block or a series of catch blocks (PHP Manual http://www.php.net/manual/en/language.exceptions.php).

Evaluation

Readability
In my opinion, PHP is not a very readable language, for a number of reasons. First of all, the flexible nature of PHP syntax makes it difficult to decipher the exact role of each syntactic unit. For example, PHP allows block structures to be delimited by a pair of curly braces, or by a colon together with a closing keyword. Next, the dynamic typing of PHP makes it difficult for developers to understand the expected outcome of various operations, and as a result, specific functions exist strictly for deciphering the mystery of data and structure types. Additionally, PHP supports “heredoc” and “nowdoc” structures as alternative delimiters for strings, and this appeal to flexibility makes for very ambiguous code blocks. The following is an example from the PHP Manual of the heredoc string delimiter:

    <<<EOF
    this is a block
    EOF;

(PHP Manual). Finally, the semantically different processing of a variable identifier when enclosed in single quotes, compared to when enclosed in double quotes, does not help PHP’s readability.

Writability
As PHP developed as a programming language, the language designers placed writability at a high level of priority in order to make PHP easy to learn. The flexible syntax is one example, as many syntactical forms are available for the same purpose.
I believe that the flexible syntactic nature of PHP not only increases PHP’s expressivity, but also makes it easier for rookie developers to become productive quickly. Secondly, the dynamic typing of PHP makes development flow quickly, as new developers don’t have to be concerned with strict typing rules. For example, the PHP interpreter will automatically convert arguments to the appropriate type as they are passed in a function call. Additionally, when assigning to arrays, type uniformity is not required, and this implementation characteristic also increases productivity and aids writability. Also, PHP provides many data connection options, which are intuitively named as well as easy to utilize, and these data connection objects make distributed data processing a cake walk in PHP. Finally, the friendly syntax of associative arrays makes assigning to and searching arrays more intuitive and natural.

Reliability
Even though PHP does provide the means for enforcing type strictness, such features are not natural to the dynamic typing of PHP. As a result, reliability is not one of PHP’s highlights. For instance, the PHP interpreter implicitly handles mixed mode expressions, and implicitly performs type conversions of arguments being passed within function calls. Although such dynamic and loosely typed features of PHP are considered convenient by most web developers, they compromise the reliability of PHP code. On a positive note, the semantics of global variables in PHP are very reliable, as PHP requires an explicit declaration of any global variable before it is accessed within local or function scope. Finally, the loose scoping semantics of loop variables in PHP are handy, but more importantly they compromise reliability, as they leave loop variables exposed to various side effects.

Cost
As Sebesta maintains, the total cost of a programming language is influenced by the overhead involved in training developers to use the language. Because PHP is very writable, flexible, and therefore highly expressive, training a developer to use PHP should be a low-cost event. Additionally, PHP is dynamically typed, and therefore requires run-time checks for coding errors. Sebesta includes this factor when evaluating the cost of a programming language; however, he does mention that execution efficiency is becoming less important. Next, PHP’s implementation system is free, and this fact scores cost points for PHP. Finally, the cost of program maintenance when PHP is utilized receives mixed reviews. In my opinion, the readability of PHP is average at best and therefore raises the cost of code maintenance. However, PHP is open source and therefore friendly to extensions.

Overall Grade
Overall, PHP is a fun language with which to develop web applications. Because I am new to PHP and also spoiled by the Microsoft web development option, ASP.NET 4 in C#, PHP may be more readable than I assessed. As a result, on a four point grading scale, I would assign PHP three points out of four in the area of readability. PHP is rather easy to pick up on the fly, and offers various syntactical options, so I believe that PHP deserves all four writability points. Also, given that PHP does provide a means to be type strict, and that this fact does somewhat improve the reliability of PHP code, I believe that PHP deserves two out of four points for reliability. Finally, aside from a few readability concerns that I have with PHP, PHP is accessible and affordable in terms of implementation, and is very writable and expressive. As a result, I assess three points out of four in the area of cost evaluation.

Tentative list of research sources:
1) “PHP Manual”.
http://www.php.net/manual/en/index.php (This is the site for the official manual of the PHP language)
2) Kantor, Peter L. “Hudson Valley Community College Web Site”. http://www.daaq.net/old/php/index.php
3) Sebesta, Robert W. Concepts of Programming Languages. Tenth Edition. Boston: Pearson, 2010
4) Wellington, Luke, and Thomson, Laura. PHP and MySQL Web Development. Second Edition. Developers Library. USA. Sams Publishing. 2003
5) Harris, Andy. PHP 5/MySQL Programming for the Absolute Beginner. Canada. Premier
6) Gilmore, W. Jason. Beginning PHP and MySQL: From Novice to Professional. Fourth Edition. Apress. 2010
7) Habib, Irfan. “Integrating PHP and Perl”. Linux Journal. Volume 2007 Issue 154. Feb 2007
8) Fioretti, Marco. “Top Ten Tips for Getting Started with PHP”. Linux Journal. Volume 2006 Issue 145. May 2006
9) Knudsen, Craig. “PHP Version 4”. Linux Journal. Volume 1999 Issue 67es. November 1999
10) http://pecl.php.net/package/operator