Student@SVC

PHP Type Casting Tutorial
September 3, 2014 by Ankur Kumar Singh
Type casting means using the value of a variable as a different data type. In other words, type casting is a way to convert a value of one data type into another data type. Type casting is an explicit conversion, because the programmer explicitly specifies the data type to cast to. In this tutorial we will explore various aspects of PHP type casting.
PHP Type casting
PHP does not require (or support) a type declaration for a variable. In PHP we never specify a data type when declaring a variable; the variable's type is determined automatically from the value assigned to it or from the context in which it is used. For example:
<?php
$i = 1;
var_dump($i); // $i is an integer: int(1)
$i = 2.3;
var_dump($i); // $i is a float: float(2.3)
$i = "php type casting";
var_dump($i); // $i is a string: string(16) "php type casting"
?>
In the above example you can see that the type of $i changes as values of different types are assigned to it. Because of this flexibility, we do not always need to cast variables in PHP. Sometimes, however, we cast a variable to make sure it holds the type we expect. For example, if we take an integer as input from the user, we should cast it to an integer.
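A minimal sketch of that idea follows. The array $input is a hypothetical stand-in for request data such as $_GET, whose values always arrive as strings:

```php
<?php
// Hypothetical request data; real values from $_GET or $_POST are strings too.
$input = ['quantity' => '5 apples', 'page' => 'abc'];

// Casting guarantees we work with integers, whatever the user typed.
$quantity = (int) $input['quantity']; // leading digits are used: 5
$page     = (int) $input['page'];     // no leading digits: 0

var_dump($quantity, $page);
```

Casting only normalizes the type; it does not check that the number is sensible, so range checks (or filter_var() with FILTER_VALIDATE_INT) are still worth adding.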
Type casting in PHP works the same way as in C: write the name of the desired data type in parentheses before the variable you want to cast. For example, to cast a string to an integer:
<?php
$string_var = "string value for php type";
$int_var = (int)$string_var;
var_dump($int_var); // int(0), because the string does not start with a number
?>
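We can cast variables to the following data types in PHP:
1. Integer using (int) or (integer)
2. Boolean using (bool) or (boolean)
3. Float using (float) or (double); (real) also worked in older versions but was deprecated in PHP 7.4 and removed in PHP 8.0
4. String using (string)
5. Array using (array)
6. Object using (object)
7. Null using (unset); this cast was deprecated in PHP 7.2 and removed in PHP 8.0, so assign null instead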
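As a quick illustration of the list above, the following sketch applies each cast to a sample value and prints the resulting type with gettype(). The sample values are arbitrary:

```php
<?php
$value = "42";

// One example per cast keyword from the list above.
$results = [
    'int'    => (int) $value,
    'bool'   => (bool) $value,
    'float'  => (float) $value,
    'string' => (string) 42,
    'array'  => (array) $value,
    'object' => (object) ['id' => 42],
    'null'   => null, // (unset) is gone in PHP 8, so assign null directly
];

foreach ($results as $cast => $result) {
    // gettype() reports floats as "double" for historical reasons.
    echo str_pad($cast, 7), gettype($result), PHP_EOL;
}
```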
PHP type Casting to Integer
Using the (int) or (integer) cast we can convert a value of another data type to an integer. The intval() function can also be used for integer conversion.
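For example, a small sketch comparing the two; the optional second argument of intval() (the base) is a feature the cast does not have:

```php
<?php
// intval() gives the same result as an (int) cast in base 10 ...
$cast   = (int) "123abc";
$intval = intval("123abc");

// ... and also accepts a base for parsing the string.
$binary = intval("101", 2); // 5
$octal  = intval("42", 8);  // 34
$hex    = intval("1A", 16); // 26

var_dump($cast === $intval, $binary, $octal, $hex);
```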
When a boolean is converted to an integer, false becomes 0 and true becomes 1. For example:
<?php
$bool_false = false;
$int_val = (int) $bool_false;
var_dump($int_val); //Output will be 0
$bool_true = true;
$int_val = (int) $bool_true ;
var_dump($int_val); //Output will be 1
?>
When a resource is converted to an integer, the result is the resource's unique ID. For example:
<?php
$fp = fopen("filename.txt", "w");
$int_cast = (int) $fp;
var_dump($int_cast); // an integer such as int(5); the exact ID varies
?>
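A variation of the example above, as a sketch: it opens the php://memory stream instead of a real file, so nothing is written to disk. get_resource_id() only exists from PHP 8.0, which is why the code checks for it:

```php
<?php
// php://memory avoids creating a real file for this demonstration.
$fp = fopen('php://memory', 'r+');

$id = (int) $fp; // the resource's unique numeric ID

// PHP 8.0 added get_resource_id(), which returns the same number.
$sameAsCast = function_exists('get_resource_id') ? get_resource_id($fp) === $id : true;

var_dump($id, $sameAsCast);
fclose($fp);
```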
When a float is cast to an integer, the fractional part is discarded. So casting 10.9 to an integer gives 10.
<?php
$float_num = 10.9;
echo (int) $float_num;
?>
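Strictly speaking, the cast truncates toward zero, which matters for negative numbers. A short sketch contrasting it with floor() and round():

```php
<?php
// An (int) cast drops the fractional part, i.e. it truncates toward zero.
$positive = (int) 10.9;   // 10
$negative = (int) -10.9;  // -10, not -11

// floor() and round() apply other rules; both return floats,
// so the result is cast when an int is needed.
$floored = (int) floor(-10.9); // -11
$rounded = (int) round(10.9);  // 11

var_dump($positive, $negative, $floored, $rounded);
```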
Converting a string to a number follows fairly complicated rules, and the details are rarely needed. For the complete conversion rules, refer to http://php.net/manual/en/language.types.string.php#language.types.string.conversion.
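A few common cases, as a sketch of the rules described on that manual page (only the leading part of the string is used):

```php
<?php
// A string is read from its start: leading whitespace is skipped and
// the numeric prefix is used; anything after it is ignored.
$examples = [
    (int) "12abc",    // 12
    (int) "abc",      // 0, no numeric prefix
    (int) "3.99",     // 3, fraction truncated
    (int) "  42",     // 42, leading whitespace allowed
    (float) "1.5e3",  // 1500.0, exponent notation understood
];
var_dump($examples);
```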
PHP type casting to Boolean
We can cast any variable to boolean using the (bool) or (boolean) cast. The result is false for "empty" values (false, 0, 0.0, the empty string "", the string "0", an empty array and null) and true for almost everything else.
<?php
var_dump((bool) 1); //return true
var_dump((bool) 0);//return false
var_dump((bool) "");//return false
var_dump((bool) "ank");//return true
?>
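The empty string and 0 are not the only false values. A sketch listing the values the PHP manual treats as false, next to some that look empty but are true:

```php
<?php
// Values that cast to false (the manual also lists some special
// internal objects, not shown here).
$falsy = [false, 0, 0.0, "", "0", [], null];

// Values that look "empty" but still cast to true.
$truthy = ["0.0", "false", " ", [0], -1];

foreach ($falsy as $value) {
    var_dump((bool) $value); // bool(false) every time
}
foreach ($truthy as $value) {
    var_dump((bool) $value); // bool(true) every time
}
```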
PHP type casting to Float
Except for strings, a value cast to float is first converted to an integer and then to a float. Casting an object to float raises a notice in PHP 5.
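A short sketch of float casting for a few value types:

```php
<?php
// Strings are parsed directly as floating-point numbers.
$fromString = (float) "1.5";   // 1.5
$fromText   = (float) "abc";   // 0.0

// Other types go through the integer conversion first.
$fromTrue   = (float) true;    // 1.0
$fromNull   = (float) null;    // 0.0

var_dump($fromString, $fromText, $fromTrue, $fromNull);
```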
PHP type casting to string
We can convert a value to a string using (string). String conversion also happens automatically wherever a string is needed, for example with echo. In most cases the value does not change, but the boolean false is converted to "" (the empty string) and true to "1". Below is an example:
<?php
$boo_true = true;
var_dump((string) $boo_true); // string(1) "1"
var_dump((string) false); // string(0) ""
var_dump((string) 1); // string(1) "1"
?>
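A few more conversions, as a sketch. The class Price is a hypothetical example; it shows that an object can be cast to string only when it defines the __toString() magic method:

```php
<?php
// Hypothetical class: casting it to string calls __toString().
class Price
{
    private $amount;

    public function __construct($amount)
    {
        $this->amount = $amount;
    }

    public function __toString()
    {
        return number_format($this->amount, 2) . ' USD';
    }
}

$price = new Price(9.5);

$fromFloat  = (string) 1.0;    // "1" (no trailing ".0")
$fromNull   = (string) null;   // ""
$fromStrval = strval(3.5);     // "3.5", same as (string) 3.5
$fromObject = (string) $price; // "9.50 USD"

var_dump($fromFloat, $fromNull, $fromStrval, $fromObject);
```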
PHP type casting to array
We can convert a variable of any data type to an array using the (array) cast. Converting a scalar value creates an array with that value as its only element, at index 0. For example:
<?php
var_dump((array) 5);// value 5 in the array with 0th index
var_dump((array) NULL);// Will be empty array
?>
Array conversion is most often used with objects: each property becomes a key/value pair. Public properties keep their names as keys, while protected and private property names get a special prefix.
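A sketch of what that looks like, using a hypothetical Point class with properties of each visibility. Per the PHP manual, protected names get a '*' prefix and private names get the class name, each wrapped in NUL bytes:

```php
<?php
// Hypothetical class used only to show how each visibility is converted.
class Point
{
    public $x = 1;
    public $y = 2;
    protected $label = 'origin';
    private $secret = 'hidden';
}

$array = (array) new Point();

// Public properties keep their names. Protected names become "\0*\0label"
// and private names become "\0Point\0secret".
foreach (array_keys($array) as $key) {
    echo str_replace("\0", '\0', $key), PHP_EOL; // make NUL bytes visible
}
```

If you only want the public properties, get_object_vars() called from outside the class returns just those.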
Abstract Classes and Interface in PHP
March 24, 2013 by Ankur Kumar Singh
Abstract classes and interfaces play a very important role in OOP in PHP. In this section we will discuss the following points:
1. What is an abstract class?
2. What is an interface?
3. How to implement abstract classes in PHP
4. How to implement interfaces in PHP
5. Differences between abstract classes and interfaces
What is an Abstract Class?
As the name suggests, "abstract" implies something hidden or incomplete, and that is the nature of abstract classes. Abstract classes are classes that cannot be instantiated directly; in other words, you cannot create an object of an abstract class. Abstract classes are always created for the purpose of inheritance: you can only extend them in child classes. Many people say that an abstract class must have at least one abstract method (an abstract method is one that is declared but not defined, that is, it has no body). In my view this is not a true definition, and PHP does allow an abstract class without any abstract methods. The reverse does hold: if a class has at least one abstract method, the class must be declared abstract.
Abstract classes are also commonly known as base classes, because they are not available for creating objects directly; they can only act as the parent of a regular class. Abstract classes can also be part of a class hierarchy, meaning one abstract class can extend another abstract class.
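A sketch of such a hierarchy; Animal, Pet and Dog are hypothetical names. The middle class Pet is abstract too, so it can leave the inherited abstract method unimplemented:

```php
<?php
// Hypothetical hierarchy: two abstract levels, one concrete class.
abstract class Animal
{
    abstract public function sound();

    public function describe()
    {
        return static::class . ' says ' . $this->sound();
    }
}

// An abstract class may extend another abstract class without
// implementing the inherited abstract method.
abstract class Pet extends Animal
{
    public function isDomestic()
    {
        return true;
    }
}

// The first concrete class must implement every abstract method.
class Dog extends Pet
{
    public function sound()
    {
        return 'Woof';
    }
}

$dog = new Dog();
echo $dog->describe(), PHP_EOL; // Dog says Woof
```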
Abstract classes in PHP
Abstract classes in PHP are similar to those in other OOP languages. You create an abstract class using the abstract keyword. Once a class is declared abstract, you cannot create an object of that class.
abstract class abc
{
public function xyz()
{
return 1;
}
}
$a = new abc(); // error: an abstract class cannot be instantiated
The above code throws a fatal error in PHP.
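Since PHP 7 this failure is thrown as an Error object, so it can be caught. A small sketch (the class Shape is hypothetical) that also uses ReflectionClass::isAbstract() to check the class first:

```php
<?php
// Hypothetical abstract class with one concrete method.
abstract class Shape
{
    public function name()
    {
        return 'shape';
    }
}

// Reflection can report whether a class is abstract before using it.
$isAbstract = (new ReflectionClass('Shape'))->isAbstract();

// Since PHP 7, instantiating an abstract class throws an Error.
$caught = false;
try {
    new Shape();
} catch (Error $e) {
    $caught = true;
    echo $e->getMessage(), PHP_EOL; // e.g. "Cannot instantiate abstract class Shape"
}

var_dump($isAbstract, $caught);
```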
Abstract classes in PHP exist only to be inherited by other classes.
abstract class testParent
{
public function abc()
{
//body of your function
}
}
class testChild extends testParent
{
public function xyz()
{
//body of your function
}
}
$a = new testChild();
In the above example we create an object of the testChild class. testChild inherits the abstract class testParent, so the abstract class is used only through inheritance. The main purpose of creating an abstract class in PHP is to prevent direct instantiation (object creation).
Implementation of abstract methods
As we know, abstract methods are methods of an abstract class that are only declared: they have a signature but no body. They are implemented in the child class. You make a method abstract using the abstract keyword. Abstract methods can exist only in abstract classes (methods of an interface are implicitly abstract, but the abstract keyword is not used there). Following is an example of implementing an abstract method:
abstract class abc
{
abstract protected function f1($a , $b);
}
class xyz extends abc
{
protected function f1($name , $address)
{
echo "$name , $address";
}
}
$a = new xyz();
In class abc we declared an abstract method f1, and class xyz, which inherits abc, implements f1. If an abstract class has abstract methods, every concrete (non-abstract) class that inherits it must implement all of them; otherwise PHP throws a fatal error.
When implementing an abstract method in a child class, you must use the same visibility or a less restrictive one.
abstract class parentTest
{
abstract protected function f1();
abstract public function f2();
//abstract private function f3(); // this would throw a fatal error
}
class childTest extends parentTest
{
public function f1()
{
//body of your function
}
public function f2()
{
//body of your function
}
protected function f3()
{
//body of your function
}
}
$a = new childTest();
In the above code, three abstract methods are considered in the abstract class, but declaring an abstract method private always causes an error: a private method is visible only inside its own class, so a child class could never implement it. That is why f3 is commented out in the parent (the f3 in childTest is simply a new method of its own). f1 is protected in the parent, and in the child class we implement it as public, which is allowed because public is less restrictive than protected. f2 is already public in the parent, so it must be public in the child as well, since no visibility is less restrictive than public.
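A sketch of the allowed widening, using hypothetical Report and SalesReport classes and ReflectionMethod to confirm the resulting visibility:

```php
<?php
abstract class Report
{
    // Declared protected in the parent ...
    abstract protected function title();
}

class SalesReport extends Report
{
    // ... and implemented as public, which widens the visibility (allowed).
    // Declaring it private here instead would be a fatal error.
    public function title()
    {
        return 'Sales';
    }
}

$method = new ReflectionMethod('SalesReport', 'title');
var_dump($method->isPublic());              // bool(true)
echo (new SalesReport())->title(), PHP_EOL; // callable from outside: Sales
```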
What is an Interface?
In OOP, an interface enforces the definition of a set of methods in a class. By implementing an interface, a class is required to declare a specific set of methods. For example, if you are creating classes to render HTML elements, every element needs an id and a name. In this case you can create an interface that declares methods such as setId and setName. Whenever someone creates a class to render an HTML tag and implements your interface, they must define setId and setName in their class. In other words, an interface lets you define part of the contract of your objects. Interfaces are very useful when you are designing the architecture of an OOP-based application.
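A minimal sketch of that HTML example. The names HtmlElement, DivElement and render() are illustrative assumptions; only setId and setName come from the description above:

```php
<?php
// Hypothetical interface: every element class that implements it
// must provide setId() and setName().
interface HtmlElement
{
    public function setId($id);
    public function setName($name);
}

class DivElement implements HtmlElement
{
    private $id = '';
    private $name = '';

    public function setId($id)
    {
        $this->id = $id;
        return $this;
    }

    public function setName($name)
    {
        $this->name = $name;
        return $this;
    }

    public function render($content)
    {
        return sprintf('<div id="%s" name="%s">%s</div>', $this->id, $this->name, $content);
    }
}

$div = (new DivElement())->setId('main')->setName('content');
echo $div->render('Hello'), PHP_EOL;
```

Returning $this from the setters is just a design choice that allows the calls to be chained; the interface itself does not require it.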
Interface in PHP
Interfaces in PHP work much as they do in other OOP languages. You create an interface in PHP using the interface keyword, and by implementing an interface in a class you specify a set of methods that the class must implement. Apart from the keyword, an interface is written much like a class, except that its methods have no bodies. Following is a very small example of an interface in PHP.
interface abc
{
public function xyz($b);
}
The above code creates an interface named abc with one method, xyz. Whenever you implement abc in a class, you have to create a method named xyz; if you do not, PHP throws a fatal error.
You implement an interface in a class using the implements keyword. Let us implement our interface abc in a class:
class test implements abc
{
public function xyz($b)
{
//your function body
}
}
Methods in an interface can only be declared public; using any other visibility causes a fatal error. Also, do not use the abstract keyword on interface methods.
You can also extend an interface, just like a class, using the extends keyword.
interface template1
{
public function f1();
}
interface template2 extends template1
{
public function f2();
}
class abc implements template2
{
public function f1()
{
//Your function body
}
public function f2()
{
//your function body
}
}
Here template2 has all the methods of template1. So whenever you implement template2 in a class, you have to create the methods of both interfaces.
An interface can also extend multiple interfaces in PHP.
interface template1
{
public function f1();
}
interface template2
{
public function f2();
}
interface template3 extends template1, template2
{
public function f3();
}
class test implements template3
{
public function f1()
{
//your function body
}
public function f2()
{
//your function body
}
public function f3()
{
//your function body
}
}
A class can also implement more than one interface in PHP.
interface template1
{
public function f1();
}
interface template2
{
public function f2();
}
class test implements template1, template2
{
public function f1()
{
//your function body
}
public function f2()
{
//your function body
}
}
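Before PHP 5.3.9, a class could not implement two interfaces that declare a method with the same name, because of the ambiguity. Newer versions allow it, as long as the duplicate methods have the same signature.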
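A sketch of the case that is now allowed, with two hypothetical interfaces declaring the same output() method:

```php
<?php
// Two hypothetical interfaces that both declare the same method.
interface Printable
{
    public function output();
}

interface Exportable
{
    public function output();
}

// Allowed since PHP 5.3.9 because both declarations have the same signature.
class Invoice implements Printable, Exportable
{
    public function output()
    {
        return 'invoice #1';
    }
}

$invoice = new Invoice();
var_dump($invoice instanceof Printable, $invoice instanceof Exportable);
echo $invoice->output(), PHP_EOL;
```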
The method in the class must have a signature compatible with the one in the interface; in particular, it must accept the parameters that the interface declares. Following are some examples:
interface template1
{
public function f1($a);
}
class test implements template1
{
public function f1($a)
{
echo $a;
}
}
The above works, but the following does not:
interface template1
{
public function f1($a);
}
class test implements template1
{
public function f1() // Fatal error: must be compatible with template1::f1($a)
{
echo $a;
}
}
However, the parameter does not need to have the same name ($a here); you can use any name. For example:
interface template1
{
public function f1($a);
}
class test implements template1
{
public function f1($name)
{
echo $name;
}
}
If the interface declares a default value for a parameter, the implementing method can use a different default value. For example:
interface template1
{
public function f1($a = 20);
}
class test implements template1
{
public function f1($name = "ankur")
{
echo $name;
}
}
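A slightly extended sketch with hypothetical names: besides changing the default value, the implementing method may add an extra parameter, provided it is optional:

```php
<?php
interface Greeting
{
    public function greet($name = 'world');
}

class FriendlyGreeting implements Greeting
{
    // A different default value is fine, and so is an extra parameter,
    // as long as the extra parameter is optional.
    public function greet($name = 'ankur', $punctuation = '!')
    {
        return "Hello, $name$punctuation";
    }
}

$greeting = new FriendlyGreeting();
echo $greeting->greet(), PHP_EOL;           // Hello, ankur!
echo $greeting->greet('PHP', '.'), PHP_EOL; // Hello, PHP.
```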
In the sections above we discussed interfaces and abstract classes in PHP. Both do similar things, but they have some important differences.
Differences between abstract class and interface in PHP
Following are the main differences between abstract classes and interfaces in PHP:
1. In an abstract class, not every method has to be abstract, but in an interface every method is abstract.
2. An interface can extend multiple interfaces, and a class can implement multiple interfaces. A class, however, can extend only one (abstract) class, although multilevel inheritance is possible.
3. Methods of an interface must be public. Abstract methods in an abstract class can be public or protected.
4. In an abstract class you can both declare abstract methods and define (implement) regular methods, but in an interface you can only declare methods.
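A sketch that puts several of these differences side by side; all names are hypothetical. Square extends exactly one abstract class but implements two interfaces:

```php
<?php
interface Drawable
{
    public function draw();
}

interface Resizable
{
    public function resize($factor);
}

abstract class Figure
{
    protected $size;

    public function __construct($size)
    {
        $this->size = $size;
    }

    // Regular (implemented) method shared by every figure.
    public function size()
    {
        return $this->size;
    }

    // Abstract method: protected is allowed here, unlike in an interface.
    abstract protected function symbol();
}

// One parent class (single inheritance) plus two interfaces.
class Square extends Figure implements Drawable, Resizable
{
    protected function symbol()
    {
        return '#';
    }

    public function draw()
    {
        return str_repeat($this->symbol(), $this->size);
    }

    public function resize($factor)
    {
        $this->size *= $factor;
        return $this;
    }
}

$square = new Square(2);
echo $square->resize(2)->draw(), PHP_EOL; // ####
```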
Overloading and Overriding in PHP
March 24, 2013 by Ankur Kumar Singh
Method overloading and overriding are basic and useful features of any OOP language. In this tutorial we will discuss how to implement method overloading and overriding in PHP. First we will explore the basics of overloading and overriding, and then we will implement both in PHP. I assume that you have basic knowledge of classes and inheritance in PHP, and that you understand magic methods, because overloading in PHP is implemented using magic methods.
What is Method Overriding in OOP?
The meaning of overriding in OOP is close to its real-world meaning: replacing a behavior inherited from the parent with a new one in the child. In OOP, overriding means redefining a parent class method in a child class, which changes the behavior of that method for the child. So the basic idea of overriding is to change the behavior of a method inherited from the parent class.
Method overriding is typically needed when the parent class has a method, but the child class needs the same method with different behavior. By overriding a method you can completely change its behavior. To implement method overriding, we create a method with the same name in the child class.
What is Method Overloading in OOP?
Overloading in OOP is also similar to its real-world meaning, where overloading means assigning extra work to the same machine or person. With method overloading, the same method is asked to do extra work, or in some cases different work.
Method overloading is usually based on the arguments passed to the function: the same method name behaves differently depending on the arguments it receives.
Overloading and Overriding in PHP
I hope the basic concepts of overloading and overriding are clear now. Let us explore how to implement them in PHP.
Implementing overriding in PHP is very easy: if the parent class has a method, you create a method with the same name in the child class to override it.
Overloading, however, cannot be implemented by creating two methods with the same name and different arguments, because PHP does not allow the same method name to be declared more than once in a class. To implement overloading we need the help of magic methods. In the sections below we will explore overloading and overriding one by one.
Overloading in PHP
Since we cannot implement overloading by creating two methods with the same name, we will use the magic method __call. __call is invoked when a method that is not accessible (undefined, or not visible from the calling scope) is called on an object. So we do not actually create the overloaded method; instead __call handles the call. __call receives two arguments: the name of the called method and an array of the arguments passed to it. Using switch/case or if/else on these, we can implement overloading in PHP. The following is a very simple example of overloading in PHP.
class test
{
public function __construct()
{
//Your logic for constructor
}
public function __call($method_name , $parameter)
{
if($method_name == "overlodedFunction") // overloading logic for the method overlodedFunction
{
$count = count($parameter);
switch($count)
{
case 1: // business logic when overlodedFunction receives 1 argument
echo "You are passing 1 argument";
break;
case 2: // business logic when overlodedFunction receives 2 arguments
echo "You are passing 2 arguments";
break;
default:
throw new Exception("Bad argument");
}
}
else
{
throw new Exception("Function $method_name does not exist");
}
}
}
$a = new test();
$a->overlodedFunction("ankur");
$a->overlodedFunction("techflirt" , "ankur");
In the class test above, the magic method __call is what manages the overloading.
As mentioned, the magic method __call is invoked when the called method is not available in the class. In the test class above we have not created a method named overlodedFunction, so whenever overlodedFunction is called, __call is invoked. __call receives two values: the name of the called method and an array of the arguments passed to it.
Inside __call, the if condition ensures that our overloading logic runs only for overlodedFunction. In the if block we count the arguments and apply the corresponding business logic.
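The same technique can return values instead of echoing them. A sketch with a hypothetical AreaCalculator class, where area() behaves differently depending on the argument count; BadMethodCallException and InvalidArgumentException are standard SPL exceptions:

```php
<?php
// Hypothetical class that "overloads" area() by argument count via __call().
class AreaCalculator
{
    public function __call($method, $arguments)
    {
        if ($method !== 'area') {
            throw new BadMethodCallException("Method $method does not exist");
        }

        switch (count($arguments)) {
            case 1: // one argument: radius of a circle
                return M_PI * $arguments[0] ** 2;
            case 2: // two arguments: width and height of a rectangle
                return $arguments[0] * $arguments[1];
            default:
                throw new InvalidArgumentException('area() expects 1 or 2 arguments');
        }
    }
}

$calc = new AreaCalculator();
echo $calc->area(3, 4), PHP_EOL;        // 12
echo round($calc->area(1), 5), PHP_EOL; // 3.14159
```

For static calls, the __callStatic() magic method plays the same role.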
Overriding in PHP
Overriding in PHP is very easy. Since overriding is the process of modifying an inherited method, you only need to create a method with the same name in the child class. Following is an example of method overriding in PHP.
class testParent
{
public function f1()
{
echo 1;
}
public function f2()
{
echo 2;
}
}
class testChild extends testParent
{
public function f2($a = 2) // overriding f2; $a is optional to stay compatible with testParent::f2()
{
echo "$a";
}
}
$a = new testChild();
$a->f2("ankur");//it will print ankur
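In the above example, testChild overrides method f2. While overriding, you are free to change the business logic. You may also widen the visibility (for example from protected to public) but not narrow it, and you may add parameters only if they are optional: adding a required parameter makes the signature incompatible, which is a warning in PHP 7 and a fatal error in PHP 8.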
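A sketch of a compatible override with hypothetical Greeter classes. It also uses parent::, which lets the overriding method still call the parent's version:

```php
<?php
class Greeter
{
    protected function greet()
    {
        return 'Hello';
    }
}

class LoudGreeter extends Greeter
{
    // Overrides greet(): visibility widened from protected to public and an
    // optional parameter added, both of which keep the signature compatible.
    public function greet($name = 'world')
    {
        // parent:: still gives access to the overridden implementation.
        return strtoupper(parent::greet()) . ', ' . $name;
    }
}

$greeter = new LoudGreeter();
echo $greeter->greet(), PHP_EOL;        // HELLO, world
echo $greeter->greet('ankur'), PHP_EOL; // HELLO, ankur
```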
Inheritance in PHP
March 24, 2013 by Ankur Kumar Singh
Inheritance is a concept in object-oriented programming. With the help of inheritance, one class can get the properties and methods of another class. This principle takes code reusability to a higher level. PHP has supported class inheritance with the extends keyword since PHP 4, and PHP 5 introduced a much richer object model (visibility, abstract classes and interfaces).
In this chapter we will first explore the basic concept of inheritance and then discuss how to implement inheritance in PHP. This tutorial is for beginners who want to learn the basic concept of inheritance in PHP; I assume you have some idea of OOP in PHP. Later in this chapter we will also cover some advanced aspects of inheritance.
What is inheritance?
Inheritance is a design principle in OOP. By implementing inheritance, one class can inherit (get) the properties and methods of another class. The class that inherits the features of another class is known as the child class, and the class being inherited from is known as the parent class. The concept of inheritance in OOP is the same as in the real world: a child inherits characteristics from its parents. In the same way, in OOP one class inherits the characteristics of another class.
With the help of inheritance you can increase the reusability of code. Let us take an example from common programming practice. Suppose you are going to create classes to render different HTML tags (div, span, form, table, etc.). You would create classes named html_div, html_span, html_form, and so on. You create different classes because every element is different in nature: a form has action and method attributes and contains input elements, while a table has tbody, tr, th and td.
Now think about it for a moment. Some attributes, and the way they are rendered, are the same for all elements. For example, all the HTML elements mentioned above have name, id and class attributes, and these are rendered the same way. So in this case you can create a parent class named HTML and inherit from it in all your classes such as div, span and form. Following is the generic code structure of inheritance, using these HTML attributes as an example. I am using PHP syntax for better understanding.
class HTML
{
protected $name;
protected $id;
protected function basicAttribute()
{
return "name='$this->name' id='$this->id'";
}
}
Class HTML_div extends HTML
{
public function __construct($id , $name)
{
$this->id = $id;
$this->name = $name;
}
public function getDiv($content)
{
$basicAttribute = $this->basicAttribute();
return "<div $basicAttribute >$content</div>";
}
}
Class HTML_span extends HTML
{
public function __construct($id , $name)
{
$this->id = $id;
$this->name = $name;
}
public function getSpan($content)
{
$basicAttribute = $this->basicAttribute();
return "<span $basicAttribute >$content</span>";
}
}
The above code is an example of basic inheritance in PHP. All methods (protected and public) of the HTML class are directly accessible in the HTML_div and HTML_span classes. In both child classes you do not need to write the rendering logic for id and name again and again. This saves time and makes the code more modular.
I hope the basic concept of inheritance is clear now. Let us move on to implementing inheritance in PHP.
Inheritance in php
Inheritance in PHP is as simple as in other OOP languages, since PHP 5 brought a full object model to the language. If you analyze the code in the previous topic, it is a typical example of inheritance in PHP. To implement inheritance in PHP you need at least two classes: a parent class and a child class. The child class inherits the properties and methods of the parent class, but it can access only the public and protected ones. You implement inheritance in PHP using the extends keyword. Let us take the above example again with some modifications:
class HTML
{
protected $name;
public $id;
private $with;
protected function basicAttribute()
{
return "name='$this->name' id='$this->id'";
}
}
Class HTML_div extends HTML
{
public function __construct($id , $name)
{
$this->id = $id;
$this->name = $name;
}
public function getDiv($content)
{
$basicAttribute = $this->basicAttribute();
return "<div $basicAttribute >$content</div>";
}
}
$objDiv = new HTML_div("bloc_main" , 'avc');
echo $objDiv->getDiv('this is an example of inheritance in php');
In the code above, class HTML_div inherits properties and methods from class HTML. The private property $with, however, is not accessible inside HTML_div.
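To see the whole inheritance example run end to end, here is a minimal sketch that puts the HTML parent class and the HTML_div and HTML_span children together and prints the generated markup. The id and name values ('block_main', 'main', 'label_1', 'label') are made-up sample data.

```php
<?php
// Parent class: holds the attributes shared by every element
class HTML
{
    protected $name;
    protected $id;

    // Shared rendering logic, reused by every child class
    protected function basicAttribute()
    {
        return "name='$this->name' id='$this->id'";
    }
}

// Child class for <div> elements
class HTML_div extends HTML
{
    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getDiv($content)
    {
        return "<div " . $this->basicAttribute() . ">$content</div>";
    }
}

// Child class for <span> elements
class HTML_span extends HTML
{
    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getSpan($content)
    {
        return "<span " . $this->basicAttribute() . ">$content</span>";
    }
}

$div = new HTML_div('block_main', 'main');
$span = new HTML_span('label_1', 'label');
echo $div->getDiv('inside a div'), "\n";    // <div name='main' id='block_main'>inside a div</div>
echo $span->getSpan('inside a span'), "\n"; // <span name='label' id='label_1'>inside a span</span>
```

Both children call the same basicAttribute() method, so a change to how attributes are rendered only has to be made once in HTML.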
Multilevel and Multiple inheritance in PHP
In PHP multilevel inheritance is possible but multiple inheritance is not. In simple terms, a child class in PHP cannot extend more than one parent class.
Multilevel inheritance, however, works: a parent class inherits the properties of a grandparent class, and a child class inherits the properties of the parent class. So in multilevel inheritance a child also gets properties from its grandparent class.
Example of Multiple inheritance in PHP
class test1
{
//Your class body
}
class test2
{
//Your class body
}
class test3 extends test1, test2 // not allowed in PHP
{
//your class body
}
The code above will not work in PHP, because PHP is a single-inheritance language; PHP rejects the extends test1, test2 line with a syntax error.
Example of Multilevel inheritance in PHP
class grandParent
{
//Body of your class
}
class parentClass extends grandParent
{
//Body Of your class
}
class child extends parentClass
{
//Body of your class
}
This is a very basic example of multilevel inheritance, which PHP supports. In the example above, parentClass inherits the properties of grandParent, and child inherits the properties of parentClass, so child has the properties of both its parent and its grandparent. (The middle class cannot simply be named parent, because parent is a reserved word in PHP.)
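The skeleton above has empty class bodies. Below is a small runnable sketch of the same three-level chain in which child calls methods defined in both parentClass and grandParent. The method names grandParentMethod(), parentMethod() and describe() are hypothetical, chosen only for illustration.

```php
<?php
class grandParent
{
    public function grandParentMethod()
    {
        return 'grandParent';
    }
}

// The middle class cannot be named "parent" because that word is reserved in PHP
class parentClass extends grandParent
{
    public function parentMethod()
    {
        return 'parentClass';
    }
}

class child extends parentClass
{
    public function describe()
    {
        // grandParentMethod() is inherited through two levels, parentMethod() through one
        return 'child uses ' . $this->grandParentMethod() . ' and ' . $this->parentMethod();
    }
}

$c = new child();
echo $c->describe(), "\n";           // child uses grandParent and parentClass
var_dump($c instanceof grandParent); // bool(true)
```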
Static Methods and Properties in Inheritance in PHP
In our HTML_div example we saw that we can use $this-> to reach the properties and methods of the parent (HTML) class. But if a parent or child method is static, you access static methods and properties with the self and parent keywords. A method does not have to be static for you to use self or parent. This is very useful when the parent and child classes both have a property or method with the same name: self:: refers to the version in the current class, while parent:: calls the version defined in the parent class.
Self and parent in case of static methods:
class parentClass
{
public static function abc()
{
//your function body
}
}
class child extends parentClass
{
public static function xyz()
{
//your function body
}
function callStatic()
{
self::xyz();
parent::abc();
}
}
Self and Parent without static
class parentClass
{
protected function xyz()
{
//Your function body
}
}
class child extends parentClass
{
public function xyz()
{
//your function body
}
public function callBoth()
{
self::xyz(); // calls child::xyz()
parent::xyz(); // calls parentClass::xyz()
}
}
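To see exactly which version self:: and parent:: pick, here is a sketch where every method returns its own name instead of having an empty body. The class names follow the examples above; callAll() is a hypothetical helper that collects the results.

```php
<?php
class parentClass
{
    public static function abc()
    {
        return 'parentClass::abc';
    }

    protected function xyz()
    {
        return 'parentClass::xyz';
    }
}

class child extends parentClass
{
    public static function abc()
    {
        return 'child::abc';
    }

    // Overrides the parent's method (visibility widened from protected to public)
    public function xyz()
    {
        return 'child::xyz';
    }

    public function callAll()
    {
        return [
            self::abc(),   // static method of this class
            parent::abc(), // static method of the parent class
            self::xyz(),   // this class's own xyz(), called on $this
            parent::xyz(), // the overridden parent version, still called on $this
        ];
    }
}

print_r((new child())->callAll());
```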
Static Methods and Properties in PHP
March 24, 2013 by Ankur Kumar Singh
Static methods and properties are a very useful feature of PHP. They can be accessed directly, without creating an object of the class. A class whose methods and properties are all static is often called a static class, although PHP has no special keyword for it. Static methods and properties are treated as public if no visibility is defined.
Static Properties/Variables in PHP
A static property of a class is a property that is accessed directly through the class with the help of the :: (scope resolution operator). You declare a static property with the static keyword; in other words, you can make any property static by adding the static keyword. Following is a basic example of a static variable in a PHP class:
class test
{
public static $a;//Static variable
}
test::$a = 5;
echo test::$a;
You cannot access a regular (non-static) property in a static way; doing so generates a fatal error. Within the class you can access a static property using the self keyword. If you are accessing a static property of the parent class, use the parent keyword.
class testParent
{
public static $var1;
}
class testChild extends testParent
{
public static $var2;
public $abc =2;
function testFunction()
{
self::$var2 = 3;
parent::$var1 = 5;
}
}
echo testChild::$abc; //throw fatal error
Static variables or properties are a good way to preserve a value across different instances of a class. Please go through the following example for a better explanation:
class test
{
private static $no_of_call = 0;
public function __construct()
{
self::$no_of_call = self::$no_of_call + 1;
echo "Number of objects of the class created: " .
self::$no_of_call . "<br/>";
}
}
$objT = new test(); // Prints: Number of objects of the class created: 1
$objT2 = new test(); // Prints: Number of objects of the class created: 2
So a static property is very useful when you want to share data between different objects of the same class. We will see a better example of static properties in the chapter PHP Design Patterns.
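A variant of the counter above that returns the count instead of echoing it makes the sharing easier to check. This is a sketch; getCount() is a hypothetical static getter added for illustration.

```php
<?php
class test
{
    // One copy of this property is shared by all objects of the class
    private static $no_of_call = 0;

    public function __construct()
    {
        self::$no_of_call++;
    }

    // Static getter, so the count can be read without any object
    public static function getCount()
    {
        return self::$no_of_call;
    }
}

$objT = new test();
$objT2 = new test();
$objT3 = new test();
echo test::getCount(), "\n"; // 3
```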
Static Methods or functions
Static methods work much like static properties. You make a function or method static with the static keyword, and you can access any static method that is visible to you with ::, just like static variables.
class test
{
static function abc($param1 , $param2)
{
echo "$param1 , $param2";
}
}
test::abc("ankur" , "techflirt");
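If you call a regular (non-static) method statically, PHP 5 raises an E_STRICT warning; PHP 7 reports it as deprecated, and since PHP 8.0 it throws an Error. With a non-static property, as shown above, you get a fatal error in every version. Let us take the above example: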
class test
{
function abc($param1 , $param2)
{
echo "$param1 , $param2";
}
}
test::abc("ankur" , "techflirt"); // works with a warning in PHP 5, throws an Error in PHP 8
Since a static method is called directly on the class, the $this variable is not available inside it.
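Because there is no $this in a static method, static code reaches other static members through self::. The sketch below shows this, plus a static method that still creates and returns an object; the class temperature and its members are hypothetical.

```php
<?php
class temperature
{
    private static $unit = 'C';

    // Static method: no object is needed, so $this is not available here
    public static function format($value)
    {
        // Use self:: (not $this->) to reach static members
        return $value . ' ' . self::$unit;
    }

    // A static "factory" method can still create and return an object
    public static function create()
    {
        return new self();
    }
}

echo temperature::format(25), "\n";                     // 25 C
var_dump(temperature::create() instanceof temperature); // bool(true)
```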
Magic Methods in PHP
March 24, 2013 by Ankur Kumar Singh
Magic methods in PHP are methods with predefined names that PHP calls automatically when certain events occur. Magic method names start with the prefix __, for example __call, __get and __set. I am including magic methods in this OOP tutorial because they are mostly used in PHP classes. If you have gone through my previous chapter, you have already seen the __construct function. __construct is a magic method that is called automatically when an object of the class is created. There are various magic methods in PHP; here we will discuss some of the most common ones used in object oriented programming. First, let us review all of them with a short description.
List of Magic Methods in PHP
Magic Method: Description
__construct: Called when someone creates an object of your class. Usually used to create the constructor in PHP 5.
__destruct: Called when an object of your class is destroyed, for example when it is unset. This is the opposite of __construct.
__get: Called when code tries to read a property of the class that is inaccessible or unavailable.
__set: Called when code tries to set the value of a property that is inaccessible or unavailable in your class.
__isset: Triggered when isset() is applied to a property of the class that is inaccessible or unavailable.
__unset: The opposite of __isset. Triggered when unset() is called on an inaccessible or unavailable property of the class.
__call: Triggered when you call a method of the class that is either inaccessible or unavailable.
__callStatic: Triggered when an inaccessible or unavailable method is called in a static context.
__sleep: Triggered when you serialize an object of your class.
__wakeup: Executes when you unserialize an object of your class.
__toString: Executes when your object is treated as a string, for example when you echo it.
__invoke: Called when you use an object of your class as a function.
The list above covers the most commonly used magic methods in PHP object oriented programming. Each of them executes when a specific event occurs on your object; for example, if you simply echo your object, the __toString method is triggered. Let us group related magic methods and analyze how they work.
__construct and __destruct magic method in PHP
The __construct method is triggered when an object is created, and __destruct is triggered when the object is destroyed. Following is a very basic example of the __construct and __destruct magic methods in PHP:
class test
{
function __construct()
{
echo 1;
}
function __destruct()
{
echo 2;
}
}
$objT = new test(); //__construct is executed automatically and prints 1 on screen
unset($objT); //__destruct is triggered and prints 2
__get __set __call and __callStatic Magic Methods
__get, __set, __call and __callStatic are all directly related to inaccessible methods and properties of the class.
__get takes one argument and executes when code reads an inaccessible property of the object. The argument is the name of the property.
__set takes two arguments and executes when code tries to set the value of an inaccessible property. The first parameter is the name of the property and the second is the value being set.
__call fires when code tries to call a method of your object that is either inaccessible or not available. It takes 2 parameters: the first is a string with the name of the method, and the second is an array of the arguments passed to it.
__callStatic is a static magic method. It executes when an inaccessible or unavailable method of your class is called statically, and it takes the same two parameters as __call.
Following is an example of the __get, __set, __call and __callStatic magic methods:
class test
{
function __get($name)
{
echo "__get executed with name $name ";
}
function __set($name , $value)
{
echo "__set executed with name $name , value $value";
}
function __call($name , $parameter)
{
$a = print_r($parameter , true); //converting the (possibly nested) array to a string
echo "__call executed with name $name , parameter $a";
}
static function __callStatic($name , $parameter)
{
$a = print_r($parameter , true); //converting the (possibly nested) array to a string
echo "__callStatic executed with name $name , parameter $a";
}
}
$a = new test();
$a->abc = 3; //__set will be executed
$app = $a->pqr; //__get will be triggered
$a->getMyName('ankur' , 'techflirt', 'etc'); //__call will be executed
test::xyz('1' , 'qpc' , 'test'); //__callStatic will be executed
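The example above only echoes. A common practical use of __call is to answer method calls the class never declared, for example getter-style calls. The sketch below assumes a hypothetical user class that stores data in a private array; the getX() naming rule is an illustrative convention, not anything PHP requires.

```php
<?php
class user
{
    private $data = ['name' => 'ankur', 'site' => 'techflirt'];

    // Handles calls like getName() or getSite() that are not declared in the class
    public function __call($method, $arguments)
    {
        if (strpos($method, 'get') === 0) {
            $key = lcfirst(substr($method, 3)); // "getName" -> "name"
            if (array_key_exists($key, $this->data)) {
                return $this->data[$key];
            }
        }
        throw new BadMethodCallException("Method $method does not exist");
    }

    // Handles static calls such as user::version(1, 2)
    public static function __callStatic($method, $arguments)
    {
        return "static call to $method with " . count($arguments) . " argument(s)";
    }
}

$u = new user();
echo $u->getName(), "\n";       // ankur
echo user::version(1, 2), "\n"; // static call to version with 2 argument(s)
```

Throwing an exception for unknown names keeps typos from failing silently, which is easy to lose once __call accepts any method name.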
__isset and __unset magic methods
The __isset and __unset magic methods in PHP work as a pair.
__isset executes when the isset() function is applied to a property that is inaccessible or not defined. It takes the name of the property as its argument.
__unset is triggered when unset() is applied to a property that is either not defined or not accessible. It also takes the name of the property as its argument.
Following is an example of the __isset and __unset magic methods in PHP:
class test
{
function __isset($name)
{
echo "__isset is called for $name";
}
function __unset($name)
{
echo "__unset is called for $name";
}
}
$a = new test();
isset($a->x);
unset($a->c);
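Note that __isset should return a boolean, because that value becomes the result of isset(). The sketch below combines __get, __set, __isset and __unset in a hypothetical propertyBag class that stores undeclared properties in a private array.

```php
<?php
class propertyBag
{
    // Undeclared properties are stored here instead of on the object itself
    private $data = [];

    public function __set($name, $value)
    {
        $this->data[$name] = $value;
    }

    public function __get($name)
    {
        return array_key_exists($name, $this->data) ? $this->data[$name] : null;
    }

    // Must return a bool so that isset() gives a meaningful answer
    public function __isset($name)
    {
        return isset($this->data[$name]);
    }

    public function __unset($name)
    {
        unset($this->data[$name]);
    }
}

$bag = new propertyBag();
$bag->color = 'red';          // __set
echo $bag->color, "\n";       // __get, prints red
var_dump(isset($bag->color)); // __isset, bool(true)
unset($bag->color);           // __unset
var_dump(isset($bag->color)); // bool(false)
```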
Classes and Objects Tutorial in PHP
March 24, 2013 by Ankur Kumar Singh
Classes and objects are a key part of object oriented programming (OOP) in PHP. If you have come directly here, I am assuming that you have basic knowledge of OOP. If you are a beginner in OOP and are not yet confident about the basics of classes and objects, please first go to Basics of OOP.
Who Can Read This Tutorial and What You Can Learn Here?
This tutorial is for beginners who have some basic knowledge of OOP. If you know what OOP is, what objects and classes are, and want to learn how to implement them, then you are at the right place. Before going further you should have a basic understanding of OOP, classes and objects. Deep technical knowledge of OOP is not mandatory for this tutorial.
In this tutorial we will start from the basic concept of classes and objects: how to create classes in PHP and how to create an object of any class. In the later part we will explore some advanced concepts of objects and classes. In short, if you have basic knowledge of OOP and want to learn how classes and objects are implemented in PHP, then you are at the right place.
Class In PHP
The concept of a class (the basic object oriented structure) was introduced in PHP 4, but full class support, such as access modifiers and interfaces, arrived in PHP 5. Creating a class is very easy in PHP: you create it with the class keyword. Following is a basic class example:
class myOwnClass
{
//variables of the class
var $variable1;
var $variable2;
//Function of class
function mergeVariable()
{
return $this->variable1 . $this->variable2;
}
}
The class myOwnClass is created with the keyword class. The name of the class is a string without spaces, and the complete body of the class is enclosed in braces {}. All variables of this class are defined at the beginning of the class and start with the var keyword. From PHP 5 you can also declare a variable with its level of visibility. For example, if you want $variable1 to be accessible from anywhere, you can write public $variable1 instead of var $variable1. If you use var $variable1 in PHP 5, the variable is treated as public by default.
The next part is the function declaration of your class. As in the above example, you can directly declare a function as function mergeVariable(). This is the most basic way to create a function within your class and is supported since PHP 4. In PHP 5 you can also apply visibility to your functions; for the same function you can write public function mergeVariable(). If you do not define the visibility, the function is treated as public by default.
Following is the same class written for PHP 5:
class myOwnClass
{
//variables of the class
public $variable1;
private $variable2;
//Function of class
public function mergeVariable()
{
return $this->variable1 . $this->variable2;
}
}
So the basic architecture of a class is almost the same in PHP 4 and PHP 5, except for the use of visibility. We will cover visibility in depth in the chapter Visibility in OOP. For now, just think of visibility as the access level of your class's methods and variables. If you want code outside the class to access a variable or function through the object, make it public. If you do not want code outside the class to access a method or property, make it private.
You can also pass values to your class directly through the parameters of a class function. Functions of a class work just like general functions in PHP. For example:
class myOwnClass
{
//variables of the class
public $variable1;
private $variable2;
//Function of class
public function mergeVariable($third_var)
{
return $this->variable1 . $this->variable2. $third_var;
}
}
You cannot create a class named stdClass in PHP, because it is a reserved class of PHP. stdClass represents a standard (generic) object and is used to create empty objects; you can use it without declaring it. If you try to declare a class named stdClass anyway, PHP throws a fatal error with the following message:
Fatal error: Cannot redeclare class stdClass
It is recommended not to start the names of your class methods with __ (as in __call), because PHP reserves method names beginning with __ for magic methods.
You cannot split the definition of a class across several files. However, inside a method of your class you can include code from other files. Let us understand this with an example. The following is not allowed in your class:
class myClass
{
public $abc;
public function test()
{
return $this->abc;
}
include "abc.php"; // not allowed: statements cannot appear directly in a class body
}
But following is allowed:
class myClass
{
public $abc;
public function test()
{
include "abc.php";
}
}
So keep the above rules in mind while creating a class.
Object in PHP
Classes are useless without objects. An object is an instance of your class. If you have a class, you need to create an object of it to solve your problem using the class. You create an object of your class with the new keyword.
$objClass = new myClass();
In the above code you are creating an object of class myClass in the variable $objClass. You can create multiple objects of the same class, and each object is independent of the others.
$objClass1 = new myClass();
$objClass2 = new myClass();
To understand objects completely, let us create a full class and some objects of it. Here I will create a class for interest calculation, then create objects of that class and calculate interest.
//Creating class interestCalculator
class interestCalculator
{
public $rate;
public $duration;
public $capital;
public function calculateInterest()
{
return ($this->rate*$this->duration*$this->capital)/100;
}
}
//Creating various objects of class interestCalculator to calculate interest on various amounts
$calculator1 = new InterestCalculator();
$calculator2 = new InterestCalculator();
$calculator1->rate = 3;
$calculator1->duration =2;
$calculator1->capital = 300;
$calculator2->rate = 3.2;
$calculator2->duration =3;
$calculator2->capital = 400;
$interest1 = $calculator1->calculateInterest();
$interest2 = $calculator2->calculateInterest();
echo "Your interest on capital $calculator1->capital with rate
$calculator1->rate for duration $calculator1->duration is
$interest1 <br/> ";
echo "Your interest on capital $calculator2->capital with rate
$calculator2->rate for duration $calculator2->duration is
$interest2 <br/> ";
Please run the above code in a browser. You will get the following output:
Your interest on capital 300 with rate 3 for duration 2 is 18
Your interest on capital 400 with rate 3.2 for duration 3 is 38.4
Now please analyze the above code carefully. We have created two objects of the interestCalculator class in the variables $calculator1 and $calculator2, and the property values of the two objects are different: for example, the capital of $calculator1 is 300 and the capital of $calculator2 is 400. Whenever you call the calculateInterest function of either object, it calculates interest on its own properties.
Now analyze the code of the interestCalculator class:
class interestCalculator
{
public $rate;
public $duration;
public $capital;
public function calculateInterest()
{
return ($this->rate*$this->duration*$this->capital)/100;
}
}
You can see that the class has 3 variables or properties ($rate, $duration and $capital). Now look at the function calculateInterest. In the body of the function we have used the variable $this. $this is a special variable that refers to the current object of the class, so it is different for each object of the interestCalculator class: for the object $calculator1, $this->rate is 3, and for $calculator2, $this->rate is 3.2. Now suppose the function also declares a local variable $rate:
public function calculateInterest()
{
$rate = 5;
return ($this->rate*$this->duration*$this->capital)/100;
}
In the above function, $this->rate and $rate are different. $this->rate always has the value assigned through the object of the class, while $rate is a fixed local value. If you replace $this->rate with $rate, your rate of interest will always be 5:
public function calculateInterest()
{
$rate = 5;
return ($rate*$this->duration*$this->capital)/100;
}
You can also create an object of a class in some other ways. For example, the class name can be stored in a string:
$className = 'interestCalculator';
$calc1 = new $className();
From PHP 5.3 onward you can also create a new object of the same class as an existing object:
$cls1 = new interestCalculator();
$cls2 = new $cls1;
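The sketch below runs both ways of creating an object side by side. It assumes a trimmed-down interestCalculator with only a $rate property, and shows that new $cls1 gives a fresh object of the same class rather than a copy of $cls1.

```php
<?php
class interestCalculator
{
    public $rate;
}

// 1. Class name stored in a string
$className = 'interestCalculator';
$calc1 = new $className();

// 2. New object of the same class as an existing object (PHP 5.3+)
$cls1 = new interestCalculator();
$cls1->rate = 3;
$cls2 = new $cls1;

echo get_class($cls2), "\n"; // interestCalculator
var_dump($cls2->rate);       // NULL: a fresh object, not a copy of $cls1
```

If you actually want a copy that keeps the property values, PHP provides the clone keyword for that instead.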
Constructor of Classes and Objects
A constructor is simply a function defined in your PHP class that is called automatically when you create an object of the class. As soon as you write $object = new yourClass(), the constructor function of the class is executed. In PHP 4 you create a constructor by defining a function with the same name as your class. From PHP 5 you can also create a constructor by defining the magic function __construct. Please go through the examples of constructors below.
PHP 4 constructor (also works in PHP 5; deprecated in PHP 7 and no longer treated as a constructor since PHP 8.0)
class interestCalculator
{
var $rate;
var $duration;
var $capital;
//Constructor of the class
function interestCalculator()
{
$this->rate = 3;
$this->duration = 4;
}
}
PHP5 constructor
class interestCalculator
{
public $rate;
public $duration;
public $capital;
//Constructor of the class
public function __construct()
{
$this->rate = 3;
$this->duration = 4;
}
}
In both cases, whenever an instance of the class is created, rate is set to 3 and duration is set to 4. The difference is in the way the constructor is defined. In PHP 4 you were limited to creating the constructor as a function with the same name as the class, but in PHP 5 you can either do that or define a function __construct. You can also pass parameters to the constructor.
class interestCalculator
{
public $rate;
public $duration;
public $capital;
//Constructor of the class
public function __construct($rate , $duration)
{
$this->rate = $rate;
$this->duration = $duration;
}
}
$objCls = new interestCalculator(3.2 , 7); //passing values of $rate and $duration
If you have declared parameters in the constructor, you need to pass values for them at the time of object creation, as in $objCls = new interestCalculator(3.2 , 7). If you do not pass them, PHP reports an error (a missing-argument warning in PHP 5, and an ArgumentCountError since PHP 7.1).
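Putting the constructor parameters and calculateInterest() together, a minimal sketch might look like this. The third parameter $capital and its default value of 100 are assumptions added for illustration; a default value lets callers omit that argument without an error.

```php
<?php
class interestCalculator
{
    public $rate;
    public $duration;
    public $capital;

    // All values are supplied at creation time; $capital has a default (assumed, for illustration)
    public function __construct($rate, $duration, $capital = 100)
    {
        $this->rate = $rate;
        $this->duration = $duration;
        $this->capital = $capital;
    }

    public function calculateInterest()
    {
        return ($this->rate * $this->duration * $this->capital) / 100;
    }
}

$objCls = new interestCalculator(3, 2, 300);
echo $objCls->calculateInterest(), "\n";     // 18
$objDefault = new interestCalculator(5, 2); // capital defaults to 100
echo $objDefault->calculateInterest(), "\n"; // 10
```

This also follows the first best practice later in this tutorial: the object is fully set up by its constructor instead of by assigning properties afterwards.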
Playing with visibility and other features of the constructor
Let us explore the constructor of our classes and objects in PHP in more depth. Everything described here applies to PHP 5.
Did you notice that I declared my constructor function public? If not, please go back to the section above and check. The reason for making the constructor public is that it must be accessible from outside the class, because it is executed when we create an object with new. So PHP will always throw an error if your constructor is private and you create an object from outside the class. Let us try the code below:
class interestCalculator
{
public $rate;
public $duration;
public $capital;
//Constructor of the class
private function __construct($rate , $duration)
{
$this->rate = $rate;
$this->duration = $duration;
}
}
$objCls = new interestCalculator(3.2 , 7); //passing values of $rate and $duration
It gives you the following output:
Fatal error: Call to private interestCalculator::__construct() from invalid context
Since you can also define the constructor as a function with the same name as the class (even in PHP 5), the following code gives you the same result:
class interestCalculator
{
public $rate;
public $duration;
public $capital;
//Constructor of the class
private function interestCalculator($rate , $duration)
{
$this->rate = $rate;
$this->duration = $duration;
}
}
$objCls = new interestCalculator(3.2 , 7);
You will receive the following error:
Fatal error: Call to private interestCalculator::interestCalculator() from invalid context
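So in short, if you make your constructor private, you cannot create an object of the class with new from outside the class; you will receive an error like the ones above. A private constructor can still be called from inside the class itself, which is how patterns such as Singleton use it.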
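A private constructor is still usable from inside the class. The sketch below shows the Singleton-style approach with a hypothetical config class: code outside the class cannot write new config(), so it calls a static getInstance() method that creates the object once and then keeps returning it.

```php
<?php
class config
{
    private static $instance = null;
    public $values = [];

    // Private: "new config()" from outside the class is not allowed
    private function __construct()
    {
        $this->values['created'] = true;
    }

    // Code inside the class may still call the private constructor
    public static function getInstance()
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }
}

$a = config::getInstance();
$b = config::getInstance();
var_dump($a === $b); // bool(true): the same object is returned every time
```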
Now, you can define your constructor either by creating a function with the same name as the class or by creating a function named __construct. What happens if you use both in a single class? Let us try this code:
class test
{
public function __construct()
{
echo 1;
}
function test()
{
echo 2;
}
}
$t = new test();//Output will be 1
It gives you output 1, which means your __construct function is called. So if you have __construct, it gets first preference; if no __construct function is present, PHP looks for a function with the same name as the class. Now think about what happens if you have both options and your __construct function is private. Try this code:
class test
{
private function __construct()
{
echo 1;
}
function test()
{
echo 2;
}
}
$t = new test();
You will get the following error:
Fatal error: Call to private test::__construct() from invalid context
Best Practice of Classes and Objects
Following are some best practices for using classes and objects in your application.
1. Instead of assigning the variables of a class after creating the object, it is better to use a constructor.
2. Use visibility as required. Do not make your variables and methods more restricted or more open than they need to be. Too much restriction will hurt your flexibility, and too little will undermine your structure.
3. Follow a convention in your classes and objects, for example start all public method names in camel case and prefix all protected methods and variables with _. It will make the visibility of each member easier to see.
4. Do not try to do everything in a single class. Create classes that are very specific to your requirements. It will save you time and keep execution efficient.
5. Always try to create every class in a separate file and follow a naming convention.
New Features in PHP 5.4
February 23, 2013 by Ankur Kumar Singh
PHP 5.4 is a major release after PHP 5.3. The PHP community has made very good efforts and introduced some new features in PHP 5.4. Some of the features planned for PHP 6 were also introduced in PHP 5.4, and various things developers found tedious have been removed in this version. In this post, New Features in PHP 5.4, I will describe some of the major changes made in PHP 5.4. Before we start, if you would like to upgrade your PHP from an older version to 5.4, you can read my previous post Upgrade PHP in XAMPP. After installing PHP 5.4, your phpinfo page will look like this.
First I will give you an overview of all the changes made in PHP 5.4, and then I will describe the major changes.
PHP 5.4.0 was officially released on 1 March 2012, and it quickly became popular thanks to some exciting features. According to the PHP community, the following changes were made in PHP 5.4.
The following legacy features have been removed:
- The break $var / continue $var syntax (break and continue with a variable argument)
- Safe mode and the related ini options
- The register_globals and register_long_arrays ini options
- The highlight.bg ini option
- The session_is_registered(), session_register() and session_unregister() functions
- Support for putenv("TZ=..") for setting the timezone
Following are some of the improvements made in PHP 5.4:
- Short array syntax support added
- Support for the Class::{expr}() syntax added
- Compile-time dependency on the mbstring extension removed
- Traits introduced
- Support for $this in closures brought back
- callable type hint added
- Improved ternary operator performance when working with arrays
- Zend Engine improvements
- Improvements in some core functions
- Built-in CLI web server introduced
- Some improvements in curl and file system functions
For the complete list of changes, see the PHP 5.4 Change Log on the PHP website. Now let us discuss some of the major changes in PHP 5.4, along with code.
Performance Improvement in PHP 5.4
In PHP 5.4 the community has tried its best to improve the performance of PHP. Any performance gain always depends on how your code is written. However, I tested the same code on both PHP 5.3 and PHP 5.4 and found PHP 5.4 to be around 30% better in terms of memory utilization. For speed, I tested Zend Framework, and its performance is great when running on PHP 5.4. So if your website or web application is a little slow, try upgrading your PHP version; it may really help. The FastCGI request handler is also faster in PHP 5.4, and the new version introduces a class, function and constant cache, which helps especially if your website is built on the OOP paradigm.
Traits
Traits are a major new feature of this PHP release. Rasmus Lerdorf (the creator of PHP) has described them as compiler-assisted copy and paste. This is a completely new feature in PHP 5.4. With the help of traits we can reuse code in PHP; in general terms, traits are a way of reusing code in a single-inheritance language. Their structure is similar to a class and we can use them to group functionality, but we cannot instantiate a trait directly. In other programming languages a similar concept is known as mixins. Below is an example of a trait:
<?php
trait global_class_functions
{
public function helloword()
{
echo 'this is trait helloword';
}
}
class base
{
use global_class_functions;
}
$objBase = new base();
$objBase->helloword(); // prints 'this is trait helloword'
?>
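Traits pay off when several unrelated classes need the same methods. In this sketch the trait greeting and the classes article and comment are hypothetical; inside a trait method, __CLASS__ resolves to the name of the class that uses the trait.

```php
<?php
// A trait groups methods that several unrelated classes can reuse (PHP 5.4+)
trait greeting
{
    public function hello()
    {
        // __CLASS__ inside a trait method is the class that uses the trait
        return 'Hello from ' . __CLASS__;
    }
}

class article
{
    use greeting;
}

class comment
{
    use greeting;
}

echo (new article())->hello(), "\n"; // Hello from article
echo (new comment())->hello(), "\n"; // Hello from comment
```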
Callable Typehinting
PHP 5.4 introduced the callable type hint, a very handy feature for people who want their functions to be strictly typed. With it, a function parameter only accepts values that can be called, such as a function name, an array holding a class name or object plus a method name, or a closure. Following is sample code for the callable type hint.
<?php
function test_callback_function()
{
return 123;
}
class cls
{
public function mthd()
{
return 456;
}
public static function staticMthd()
{
return 789;
}
}
function test_callable(callable $a)
{
return $a();
}
echo test_callable('test_callback_function'); // callable given as a function name, returns 123
echo test_callable(['cls' , 'staticMthd']); // callable for a static method of a class, returns 789
echo test_callable([(new cls),'mthd']); // callable for a method of an object, returns 456
?>
Short Array Syntax
Short array syntax is already very popular method of defining and declaring array in
other programming language. Now in new release PHP community has also
released implementation of short array syntax. Now you can define your array in
following way also in php:
<?php
$arr1=[1,2,'test'];
print_r($arr1);
$arr2=['a'=>1,'b'=>'Ankur'];
print_r($arr2);
?>
Isn't it much easier to declare an array in the new PHP version?
Function Array Dereferencing
PHP 5.4 introduced direct dereferencing of arrays returned by functions. With it, you can access an element of the array returned by a function directly, without declaring an extra variable. In earlier versions of PHP you first had to store the function's output in a variable and then read the value from that array. From the new version you can access the element directly. Following is an example of array dereferencing on a function's return value:
<?php
function test_array_ref()
{
return [1,2,3,4];
}
function test_ref_2()
{
return ['one'=>'Ankur' , 'two'=>'puttul'];
}
echo test_array_ref()[3]; //return 4
echo test_ref_2()['one']; //return Ankur
?>
__invoke to Use an Object as a Function
The magic method __invoke (available since PHP 5.3) lets you use an object as a function: whenever you call an object of a class as if it were a function, PHP automatically executes its __invoke method. (Printing an object with echo triggers __toString, not __invoke.) In the example below I calculate the area of a rectangle by calling the object as a function. Below is the code for using an object as a function:
<?php
class clsRect
{
private $height;
private $width;
//Constructor
function __construct($height, $width)
{
$this->height = $height;
$this->width = $width;
}
// Magic function invoke
function __invoke()
{
echo $this->height * $this->width;
}
}
//Implementation of __invoke
$objRect = new clsRect(10,25);
$objRect();
?>
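__invoke methods can also take arguments, and an object that defines __invoke counts as a callable, so it can be passed to functions such as array_map(). The multiplier class below is a hypothetical example.

```php
<?php
class multiplier
{
    private $factor;

    public function __construct($factor)
    {
        $this->factor = $factor;
    }

    // Runs when the object is called like a function: $obj($value)
    public function __invoke($value)
    {
        return $value * $this->factor;
    }
}

$double = new multiplier(2);
echo $double(21), "\n"; // 42

// Because it defines __invoke, the object is accepted wherever a callable is expected
print_r(array_map($double, [1, 2, 3])); // values 2, 4, 6
var_dump(is_callable($double));         // bool(true)
```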
Interface for Json and Session Handling
Two new interfaces were introduced in PHP 5.4: JsonSerializable for JSON and SessionHandlerInterface for sessions. SessionHandlerInterface is a native PHP interface that lets you handle PHP sessions with your own class; all you need to do is implement this interface. Following is an example implementation of SessionHandlerInterface:
<?php
class CustomSessionHandler implements SessionHandlerInterface
{
private $sessionPatch ;
public function open($sessionPatch , $sessionName)
{
$this->sessionPatch = $sessionPatch;
if (!is_dir($this->sessionPatch)) {
mkdir($this->sessionPatch, 0777);
}
return true;
}
public function close()
{
return true;
}
public function read($id)
{
return (string)@file_get_contents("$this->sessionPatch/sess_$id");
}
public function write($id, $data)
{
return file_put_contents("$this->sessionPatch/sess_$id",
$data) === false ? false : true;
}
public function destroy($id)
{
$file = "$this->sessionPatch/sess_$id";
if (file_exists($file)) {
unlink($file);
}
return true;
}
public function gc($maxlifetime)
{
foreach (glob("$this->sessionPatch/sess_*") as $file) {
if (filemtime($file) + $maxlifetime < time() &&
file_exists($file)) {
unlink($file);
}
}
return true;
}
}
$objSess = new CustomSessionHandler();
session_set_save_handler($objSess, true);
session_start();
$_SESSION['a'] = 2;
?>
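Built-in Web Server for the CLI
PHP 5.4 introduced a built-in web server for the CLI. You start it from the command line, for example with php -S localhost:8000, and by default it serves files from the current directory.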
The CLI (command line interface) web server is not recommended for production environments. However, the choice is entirely yours: I use it myself and have not run into any problems. It was built for testing scripts the easy way from the command line, which Linux geeks will appreciate.
Comparison Between WordPress, Joomla and Drupal
October 14, 2012 by Ankur Kumar Singh
This “Comparison Between WordPress, Joomla and Drupal” is my first post on techflirt.com. For the last week I have been comparing CMSs, trying to figure out the best one for my needs. I have developed lots of websites on Joomla, Drupal and WordPress, so I have good experience with all three and a good understanding of the features each one offers. All three are good CMSs, but when it comes to choosing one for a website where you plan to provide out-of-the-box services, the decision is a bit tough. In this article I will first describe the most useful features of each of these three CMSs and then share my thoughts on how to make the decision.
Drupal:
Drupal is not only an open-source CMS tool but also a very good CMS framework. It provides lots of features for creating an optimized CMS site, and you can set up a site in a very flexible way. Views and CCK make Drupal very powerful: with them you can build your CMS your own way. Drupal's well-structured code makes it very reliable, and it has very strong community support. Following are the key features of Drupal:
1. Nodes
2. Taxonomy
3. View
4. CCK
5. Extensive access control
6. Best theme integration
These features are very rich in terms of content management, and they make Drupal not only a CMS tool but also a CMS framework. The basic difference between a CMS tool and a CMS framework is that a CMS framework is a tool with which you can create a CMS tool for yourself. So Drupal is a feature-rich and flexible CMS framework with which you can build a good CMS tool for your own needs.
Drupal also has very strong community support. You can find very good documentation at drupal.org, and there are lots of forums about Drupal around the web; I am quite sure that if you have a problem with your Drupal site, someone on the web has the answer. You can get lots of Drupal modules free of cost, and many paid modules are also available. If you are looking for a developer to maintain your Drupal website, you can find one very easily and at a very competitive price.
Now the other side of the story. As I have already mentioned, Drupal is a very feature-rich content management framework, so it is a bit complex. Drupal Views are good if you build them correctly; otherwise they will hamper the website's performance. Because Drupal is so feature-rich, it needs some extra care from your side: you need to hire a good developer to integrate your website design (commonly called the “theme”) into Drupal.
Joomla:
Joomla is a very good open-source CMS tool. With a proper Joomla setup you can forget about the CMS tool itself and concentrate on managing your content. Joomla is a mid-level content management tool. It has a good pre-built set of features for managing your website content in various ways, and a set of content presentation styles that suits almost every business: you can display your content in a blog layout or very easily switch to a magazine-style layout. So Joomla is not as big as Drupal, but its generic set of features makes it the best CMS tool. Joomla is also very easy to use; webmasters do not need any special training to run a Joomla-based website.
Joomla has good community support and very good documentation at joomla.org. You can very easily find modules to extend Joomla, and very good paid support and services are available. A webmaster searching the internet will find lots of Joomla developers. The cost of maintaining a Joomla website is a bit lower than for Drupal, and Joomla's feature set is also a bit smaller than Drupal's. As I have already mentioned, it is a mid-level content management tool.
WordPress:
First of all, let me make clear that WordPress is not a generic content management tool (CMS); it is a CMS for blogging sites (weblogs). Many people think it is a general CMS, but in reality it is not; that one can use it as a CMS is a different matter. WordPress is a very easy-to-use CMS for blog websites, and it was designed with blog websites in mind. Some people use WordPress as their website CMS because it suits their requirements: a small website typically needs category navigation, a specific style of content presentation, and a backend from which to publish posts, and all of these are available in WordPress. As a blog management tool it is very feature-rich, but in terms of generic CMS features it does not offer much.
WordPress has superb community support, and you can find good documentation for WordPress everywhere. It is also very easy to use: you can log in to the admin panel and start managing your website without even reading the documentation. I personally love WordPress for its usability and efficiency.
My Personal Advice on CMS Selection
If you have a very big website and want to use a CMS tool, go for Drupal. Drupal has good infrastructure for managing a big site very efficiently, but it requires solid development work and ongoing maintenance. So if you have a good budget and a long-term vision for your website, go for Drupal. Drupal is also the best choice if your business plan changes frequently and requires frequent changes to your website design and workflow. In a nutshell, Drupal is for big websites. And if you are concerned about administrative security for your website within your company, Drupal provides a very rich access control list (ACL) system, which makes it more secure.
If you have a mid-level, typical content-serving website and a very stable business concept that is not going to change, go for Joomla. Joomla is especially suitable for mid-level publishers: it has the set of features required to run a normal CMS website. If your content security needs and website traffic are at a mid level, Joomla is a good fit. The maintenance cost of a Joomla website is also a bit lower than Drupal's, so if your pocket is a bit light but you have a vision for a good website, you can choose Joomla. It is not premium, but it is not far from premium either.
If you are running a small website or blog, go for WordPress. WordPress is a cost-effective solution for your website needs. However, if you need a different level of security or a different content placement pattern, it is not a good solution. It is also not suitable if your concept and content display pattern change frequently, because you would have to ask your developer to change the theme each time. So WordPress is fantastic for bloggers and for people who want to run a website at a very nominal cost.
Phishing is a fraudulent attempt, usually made through email, to steal your personal information. The best
way to protect yourself from phishing is to learn how to recognize a phish.
What are the advantages of a DBMS over traditional file-based systems?
Ans: Database management systems were developed to handle the following difficulties of typical file-processing systems supported by conventional operating systems:
1. Data redundancy and inconsistency
2. Difficulty in accessing data
3. Data isolation: multiple files and formats
4. Integrity problems
5. Atomicity of updates
6. Concurrent access by multiple users
7. Security problems
Source: http://cs.nyu.edu/courses/spring01/G22.2433-001/mod1.2.pdf
What are super, primary, candidate and foreign keys?
Ans: A superkey is a set of attributes of a relation schema upon which all attributes of the schema are functionally dependent. No two rows can have the same values for the superkey attributes.
A candidate key is a minimal superkey, i.e., no proper subset of the candidate key's attributes is a superkey.
A primary key is one of the candidate keys: one candidate key is selected as the most important and becomes the primary key. A table cannot have more than one primary key.
A foreign key is a field (or collection of fields) in one table whose values must match a key (usually the primary key) of a row in another table, so it identifies the referenced row of that other table.
What is the difference between primary key and unique constraints?
Ans: A primary key cannot contain NULL values, while a unique constraint can allow NULL values. A table has only one primary key, but it can have multiple unique constraints.
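The difference can be tried out directly. The sketch below assumes SQLite through Python's built-in sqlite3 module; the table customer and its columns code and email are hypothetical. Databases differ in the details: SQLite and PostgreSQL allow several NULLs in a UNIQUE column, whereas SQL Server allows only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, for historical reasons, lets a
# non-INTEGER PRIMARY KEY column hold NULL unless it is declared NOT NULL.
conn.execute(
    "CREATE TABLE customer ("
    " code TEXT PRIMARY KEY NOT NULL,"
    " email TEXT UNIQUE)"
)


def try_insert(code, email):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO customer (code, email) VALUES (?, ?)", (code, email))
        return True
    except sqlite3.IntegrityError:
        return False


ok_null_email_1 = try_insert("C1", None)
ok_null_email_2 = try_insert("C2", None)       # second NULL in the UNIQUE column: accepted
ok_email = try_insert("C3", "a@example.com")
dup_email = try_insert("C4", "a@example.com")  # duplicate non-NULL unique value: rejected
null_code = try_insert(None, "b@example.com")  # NULL primary key: rejected
dup_code = try_insert("C1", "c@example.com")   # duplicate primary key: rejected

print(ok_null_email_1, ok_null_email_2, ok_email)  # True True True
print(dup_email, null_code, dup_code)              # False False False
```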
What is database normalization?
Ans: It is a process of analyzing the given relation schemas based on their functional dependencies and primary
keys to achieve the following desirable properties:
1) Minimizing Redundancy
2) Minimizing the Insertion, Deletion, And Update Anomalies
Relation schemas that do not meet these properties are decomposed into smaller relation schemas that do meet them.
Source: http://cs.tsu.edu/ghemri/CS346/ClassNotes/Normalization.pdf
What is SQL?
SQL (Structured Query Language) is a language designed for defining, querying, and modifying data in a relational database system.
What are the differences between DDL, DML and DCL in SQL?
Ans: The three are:
DDL stands for Data Definition Language. SQL statements like CREATE, ALTER, DROP and RENAME come under this.
DML stands for Data Manipulation Language. SQL statements like SELECT, INSERT, UPDATE and DELETE come under this.
DCL stands for Data Control Language. SQL statements like GRANT and REVOKE come under this.
What is the difference between the HAVING and WHERE clauses?
Ans: HAVING specifies a condition on a group or on an aggregate function in a SELECT statement. The WHERE clause selects rows before grouping; the HAVING clause selects groups after grouping. Unlike the HAVING clause, the WHERE clause cannot contain aggregate functions.
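A small example makes the ordering visible. This sketch assumes SQLite via Python's sqlite3 module, and the sale table and its data are hypothetical: WHERE drops individual rows before they are grouped, HAVING drops whole groups after SUM() has been computed, and putting SUM() in WHERE is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("north", 100), ("north", 300), ("south", 50), ("south", 20), ("east", 500)],
)

# WHERE filters rows before GROUP BY: the 20 in "south" is never summed.
where_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sale WHERE amount >= 50 "
    "GROUP BY region ORDER BY region"
).fetchall()

# HAVING filters groups after aggregation, so it may refer to SUM().
having_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sale GROUP BY region "
    "HAVING SUM(amount) >= 100 ORDER BY region"
).fetchall()

# An aggregate inside WHERE is an error.
try:
    conn.execute("SELECT region FROM sale WHERE SUM(amount) >= 100 GROUP BY region")
    aggregate_in_where_ok = True
except sqlite3.Error:
    aggregate_in_where_ok = False

print(where_rows)   # [('east', 500), ('north', 400), ('south', 50)]
print(having_rows)  # [('east', 500), ('north', 400)]
```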
What is Join?
Ans: An SQL Join is used to combine data from two or more tables, based on a common field between them. For
example, consider the following two tables.
Student Table
EnrollNo  StudentName  Address
1000      geek1        geeksquiz1
1001      geek2        geeksquiz2
1002      geek3        geeksquiz3
StudentCourse Table
CourseID  EnrollNo
1         1000
2         1000
3         1000
1         1002
2         1003
Following is a join query that shows the names of the students enrolled in each CourseID.
SELECT StudentCourse.CourseID, Student.StudentName
FROM StudentCourse
INNER JOIN Student
ON StudentCourse.EnrollNo = Student.EnrollNo
ORDER BY StudentCourse.CourseID;
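The above query produces the following result. EnrollNo 1003 has no matching student and geek2 (EnrollNo 1001) is not enrolled in any course, so neither appears; rows with the same CourseID may come back in either order.
CourseID  StudentName
1         geek1
1         geek3
2         geek1
3         geek1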
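To check the join against the sample data, the sketch below loads both tables into SQLite (an assumption; the question is database-neutral) through Python's sqlite3 module and runs the inner join of StudentCourse and Student on EnrollNo. StudentName is added as a second sort key so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (EnrollNo INTEGER PRIMARY KEY, StudentName TEXT, Address TEXT);
CREATE TABLE StudentCourse (CourseID INTEGER, EnrollNo INTEGER);
INSERT INTO Student VALUES (1000, 'geek1', 'geeksquiz1'),
                           (1001, 'geek2', 'geeksquiz2'),
                           (1002, 'geek3', 'geeksquiz3');
INSERT INTO StudentCourse VALUES (1, 1000), (2, 1000), (3, 1000), (1, 1002), (2, 1003);
""")

# Inner join: only course rows whose EnrollNo matches a student survive.
rows = conn.execute("""
    SELECT StudentCourse.CourseID, Student.StudentName
    FROM StudentCourse
    INNER JOIN Student ON StudentCourse.EnrollNo = Student.EnrollNo
    ORDER BY StudentCourse.CourseID, Student.StudentName
""").fetchall()

print(rows)  # [(1, 'geek1'), (1, 'geek3'), (2, 'geek1'), (3, 'geek1')]
```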
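What is Identity?
Ans: An identity (or AutoNumber) column automatically generates numeric values. A start value and an increment can be set, but most DBAs leave both at 1. A GUID column also generates values automatically, but those values (globally unique identifiers) cannot be controlled. Generating values does not by itself index a column: an identity or GUID column is indexed when it is declared as a primary key or unique constraint, or when an index is created on it.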
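SQLite has no IDENTITY keyword. Its closest analog is an INTEGER PRIMARY KEY AUTOINCREMENT column, which the sketch below uses via Python's sqlite3 module; the ticket table is hypothetical. Unlike SQL Server's IDENTITY(seed, increment), SQLite offers no start or step option in the column definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Identity-like column: SQLite assigns the next number when id is omitted.
conn.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)")
for title in ("first", "second", "third"):
    conn.execute("INSERT INTO ticket (title) VALUES (?)", (title,))

ids = [row[0] for row in conn.execute("SELECT id FROM ticket ORDER BY id")]
print(ids)  # [1, 2, 3]
```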
What is a view in SQL? How do you create one?
Ans: A view is a virtual table based on the result set of an SQL statement. You can create one with the CREATE VIEW syntax:
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;
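The sketch below shows a view in practice, assuming SQLite via Python's sqlite3 module; the employee table and the it_staff view are hypothetical. Because the database stores only the view's query, a row added to the base table shows up in the view without any further action.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employee VALUES ('Ann', 'IT', 5000), ('Bob', 'HR', 4000), ('Cid', 'IT', 4500);

-- Only this query is stored, not a copy of the rows.
CREATE VIEW it_staff AS
SELECT name, salary
FROM employee
WHERE dept = 'IT';
""")

before = conn.execute("SELECT name FROM it_staff ORDER BY name").fetchall()
# A new base-table row appears in the view automatically.
conn.execute("INSERT INTO employee VALUES ('Dee', 'IT', 4800)")
after = conn.execute("SELECT name FROM it_staff ORDER BY name").fetchall()

print(before)  # [('Ann',), ('Cid',)]
print(after)   # [('Ann',), ('Cid',), ('Dee',)]
```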
What are the uses of view?
1. Views can represent a subset of the data contained in a table; consequently, a view can limit the degree of exposure of the underlying tables to the outer world: a given user may have permission to query the view while being denied access to the rest of the base table.
2. Views can join and simplify multiple tables into a single virtual table
3. Views can act as aggregated tables, where the database engine aggregates data (sum, average etc.) and presents
the calculated results as part of the data
4. Views can hide the complexity of data; for example a view could appear as Sales2000 or Sales2001, transparently
partitioning the actual underlying table
5. Views take very little space to store; the database contains only the definition of a view, not a copy of all the data that it presents.
6. Depending on the SQL engine used, views can provide extra security
Source: Wiki Page
What is a Trigger?
Ans: A trigger is code associated with INSERT, UPDATE, or DELETE operations on a table. The code is executed automatically whenever the associated operation is performed on that table. Triggers can be useful for maintaining integrity in a database.
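As an illustration, the sketch below (SQLite via Python's sqlite3 module; the account and balance_log tables and the trigger name are hypothetical) attaches an audit trigger to UPDATE. Nothing calls the trigger explicitly: the UPDATE statement alone causes the log row to be written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE balance_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

-- Fires automatically after every UPDATE of balance on account.
CREATE TRIGGER log_balance_change
AFTER UPDATE OF balance ON account
BEGIN
    INSERT INTO balance_log VALUES (OLD.id, OLD.balance, NEW.balance);
END;

INSERT INTO account VALUES (1, 100);
""")

conn.execute("UPDATE account SET balance = 80 WHERE id = 1")
log = conn.execute("SELECT * FROM balance_log").fetchall()
print(log)  # [(1, 100, 80)]
```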
What is a stored procedure?
Ans: A stored procedure is like a function: a named set of SQL operations stored and compiled together in the database. It typically contains operations that an application uses often to carry out common database tasks.
What is the difference between Trigger and Stored Procedure?
Ans: Unlike stored procedures, triggers cannot be called directly. They are associated with a table and run only when an INSERT, UPDATE, or DELETE on that table fires them.
What is a transaction? What are ACID properties?
Ans: A database transaction is a set of database operations that must be treated as a whole: either all of the operations are executed or none of them are.
An example is a bank transfer from one account to another: either both the debit and the credit operations are executed, or neither is.
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions
are processed reliably.
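The bank-transfer example can be sketched as follows, assuming SQLite via Python's sqlite3 module; the account table, the transfer() helper, and the CHECK rule against negative balances are all hypothetical. The credit is applied first on purpose, so that when the debit fails the ROLLBACK has something to undo, which shows atomicity.

```python
import sqlite3

# isolation_level=None: autocommit mode, so BEGIN/COMMIT/ROLLBACK are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0));
INSERT INTO account VALUES (1, 100), (2, 50);
""")


def transfer(src, dst, amount):
    """Credit dst and debit src as one unit; return True if committed."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint: undo the credit as well.
        conn.execute("ROLLBACK")
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM account"))


ok = transfer(1, 2, 30)
after_ok = balances()         # {1: 70, 2: 80}
failed = transfer(1, 2, 500)  # would make account 1 negative
after_fail = balances()       # still {1: 70, 2: 80}
print(ok, after_ok, failed, after_fail)
```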
What are indexes?
Ans: A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and the storage space needed to maintain the extra copy of data.
Data can be stored in only one order on disk. To support fast access by other values, for example with a binary search on those values, indexes are created on tables. Indexes need extra disk space, but they allow fast searches on frequently searched values.
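One way to watch an index change how a query runs is to ask the database for its query plan. The sketch below assumes SQLite via Python's sqlite3 module; the emp table and the index name idx_emp_name are hypothetical. The exact wording of EXPLAIN QUERY PLAN output varies between SQLite versions, so the code only prints it: before the index exists the plan is a full scan, and afterwards it searches using idx_emp_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp (name, dept) VALUES (?, ?)",
    [(f"user{i}", "IT" if i % 2 else "HR") for i in range(1000)],
)


def plan(sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for sql."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM emp WHERE name = 'user500'"
plan_before = plan(query)  # full table scan
conn.execute("CREATE INDEX idx_emp_name ON emp (name)")
plan_after = plan(query)   # search through the new index
print(plan_before)
print(plan_after)
```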
What are clustered and non-clustered Indexes?
Ans: A clustered index is the index according to which the data is physically stored on disk. Therefore, only one clustered index can be created on a given database table.
Non-clustered indexes do not define the physical ordering of the data, only a logical ordering. Typically a tree is created whose leaves point to the disk records; B-trees or B+ trees are used for this purpose.
In SQL, what’s the difference between an inner and outer join?
Joins are used to combine the data from two tables, with the result being a new, temporary table. The
temporary table is created based on column(s) that the two tables share, which represent meaningful
column(s) of comparison. The goal is to extract meaningful data from the resulting temporary table. Joins
are performed based on something called a predicate, which specifies the condition to use in order to
perform a join. A join can be either an inner join or an outer join, depending on how one wants the resulting
table to look.
It is best to illustrate the differences between inner and outer joins by use of an example. Here we have 2
tables that we will use for our example:
Employee
EmpID  EmpName
13     Jason
8      Alex
3      Ram
17     Babu
25     Johnson
Location
EmpID  EmpLoc
13     San Jose
8      Los Angeles
3      Pune, India
17     Chennai, India
39     Bangalore, India
It’s important to note that the last row in the Employee table (Johnson, EmpID 25) does not exist in the Location table, and the last row in the Location table (EmpID 39) does not exist in the Employee table. These facts will prove significant in the discussion that follows.
Outer Joins
Let’s start the explanation with outer joins. Outer joins can be further divided into left outer joins, right outer joins, and full outer joins. Here is what the SQL for a left outer join would look like, using the tables above:
select * from employee left outer join location
on employee.empID = location.empID;
In this SQL we are joining on the condition that the employee IDs match in both tables. So we are essentially combining 2 tables into 1, based on the condition that the employee IDs match. Note that we can drop the "outer" in left outer join, which gives us the SQL below. This is equivalent to what we have above.
select * from employee left join location
on employee.empID = location.empID;
A left outer join retains all of the rows of the left table, regardless of whether there is a row that matches on
the right table. The SQL above will give us the result set shown below.
Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India
25              Johnson           NULL            NULL
The Join Predicate – a geeky term you should know
Earlier we mentioned something called a join predicate. In the SQL above, the join predicate is "on employee.empID = location.empID". This is the heart of any type of join, because it determines which common column of the 2 tables is used to "join" them. As you can see from the result set, all of the rows from the left table are returned when we do a left outer join. The last row of the Employee table (the "Johnson" entry) is displayed in the results even though there is no matching row in the Location table. The non-matching columns in that last row are filled with "NULL": we get "NULL" as the entry wherever there is no match.
A right outer join is pretty much the same thing as a left outer join, except that the rows that are retained
are from the right table. This is what the SQL looks like:
select * from employee right outer join location
on employee.empID = location.empID;
-- taking out the "outer", this also works:
select * from employee right join location
on employee.empID = location.empID;
Using the tables presented above, we can show what the result set of a right outer join would look like:
Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India
NULL            NULL              39              Bangalore, India
We can see that the last row returned in the result set contains the row that was in the Location table, but
not in the Employee table (the "Bangalore, India" entry). Because there is no matching row in the Employee
table that has an employee ID of "39", we have NULLs in the result set for the Employee columns.
Inner Joins
Now that we’ve gone over outer joins, we can contrast those with the inner join. The difference between an
inner join and an outer join is that an inner join will return only the rows that actually match based on the
join predicate. Once again, this is best illustrated via an example. Here’s what the SQL for an inner join will
look like:
select * from employee inner join location on
employee.empID = location.empID;
This can also be written as:
select * from employee, location
where employee.empID = location.empID;
Now, here is what the result of running that SQL would look like:
Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India
Inner vs Outer Joins
We can see that an inner join returns only the rows in which there is a match based on the join predicate. In this case, that means a row is generated in the results whenever the Employee and Location tables share an employee ID. Looking at the original tables, you can see that exactly the employee IDs shared by both tables are displayed in the results. With a left or right outer join, on the other hand, the result set retains all of the rows from the left or right table, respectively.
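To compare the joins side by side, the sketch below loads the Employee and Location tables into SQLite (an assumption) with Python's sqlite3 module. The "right join" is written as a LEFT JOIN with the tables swapped, because SQLite only supports RIGHT JOIN from version 3.39 onward; NULL comes back to Python as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (EmpID INTEGER, EmpName TEXT);
CREATE TABLE location (EmpID INTEGER, EmpLoc TEXT);
INSERT INTO employee VALUES (13, 'Jason'), (8, 'Alex'), (3, 'Ram'), (17, 'Babu'), (25, 'Johnson');
INSERT INTO location VALUES (13, 'San Jose'), (8, 'Los Angeles'), (3, 'Pune, India'),
                            (17, 'Chennai, India'), (39, 'Bangalore, India');
""")

# Left outer join: every employee, with NULL (None) where no location matches.
left = conn.execute("""
    SELECT employee.EmpName, location.EmpLoc
    FROM employee LEFT OUTER JOIN location ON employee.EmpID = location.EmpID
    ORDER BY employee.EmpName
""").fetchall()

# Inner join: only employees that have a matching location row.
inner = conn.execute("""
    SELECT employee.EmpName, location.EmpLoc
    FROM employee INNER JOIN location ON employee.EmpID = location.EmpID
    ORDER BY employee.EmpName
""").fetchall()

# Right-join equivalent: every location, with None where no employee matches.
right = conn.execute("""
    SELECT employee.EmpName, location.EmpLoc
    FROM location LEFT JOIN employee ON employee.EmpID = location.EmpID
    ORDER BY location.EmpLoc
""").fetchall()

print(left)
print(inner)
print(right)
```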
In SQL, what are the differences between primary, foreign, and unique keys?
The one thing that primary, unique, and foreign keys all have in common is that each type of key can consist of more than one column from a given table. In other words, none of them is restricted to a single column; each can cover multiple columns, which is something that many people in software are not aware of.
Of course, the database programmer is the one who actually defines which columns are covered by a foreign, primary, or unique key. Beyond that similarity, there are some major differences between primary, unique, and foreign keys, which we will go over in this article. But first, we want to explain thoroughly why foreign keys are necessary in some situations.
What is the point of having a foreign key?
Foreign keys are used to reference unique columns in another table. So, for example, a foreign key can be
defined on one table A, and it can reference some unique column(s) in another table B. Why would you want
a foreign key? Well, whenever it makes sense to have a relationship between columns in two different
tables.
An example of when a foreign key is necessary
Suppose that we have an Employee table and an Employee Salary table, and that every employee has a unique ID. The Employee table could be said to hold the ‘master list’ of all employee IDs in the company. If we want to store employees’ salaries in another table, do we want to recreate the entire master list of employee IDs in the Employee Salary table as well? No, because that is inefficient. It makes a lot more sense to define a relationship between an Employee ID column in the Employee Salary table and the “master” Employee ID column in the Employee table, so that the Employee Salary table simply references the employee ID in the Employee table. This way, whenever someone’s employee ID is updated in the Employee table, it can also be updated automatically in the Employee Salary table, so nobody has to update the IDs in the Employee Salary table by hand every time an ID changes in the master list. And if an employee is removed from the Employee table, he/she can also be removed automatically (by the RDBMS) from the Employee Salary table. Of course, all of this behavior has to be defined by the database programmer, but hopefully you get the point.
Foreign keys and referential integrity
Foreign keys have a lot to do with the concept of referential integrity; what we discussed in the previous paragraph are some of the principles behind it. You can and should read a more in-depth article on that concept: “Referential integrity explained.”
Can a table have multiple unique, foreign, and/or primary keys?
A table can have multiple unique and foreign keys. However, a table can have only one primary key.
Can a unique key have NULL values? Can a primary key have NULL values?
Unique key columns are allowed to hold NULL values. The values in a primary key column, however,
can never be NULL.
Can a foreign key reference a non-primary key?
Yes, a foreign key can actually reference a key that is not the primary key of a table. But, a foreign key
must reference a unique key.
Can a foreign key contain null values?
Yes, a foreign key column can hold NULL values unless it is declared NOT NULL. A NULL in a foreign key column simply means that the row does not reference any row in the other table, so the foreign key constraint is not checked for that row.
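The three answers above can be checked together in a short sketch, assuming SQLite via Python's sqlite3 module (SQLite enforces foreign keys only after PRAGMA foreign_keys = ON). The tables employee and employee_salary and the badge column are hypothetical: the foreign key targets badge, a unique column that is not the primary key, accepts a NULL, and rejects a badge that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    badge  TEXT UNIQUE            -- a unique key that is not the primary key
);
CREATE TABLE employee_salary (
    badge  TEXT REFERENCES employee (badge),  -- foreign key to the unique column
    salary INTEGER
);
INSERT INTO employee VALUES (1, 'B-001');
""")


def try_insert(badge, salary):
    """Return True if the salary row was accepted by the foreign key."""
    try:
        conn.execute("INSERT INTO employee_salary VALUES (?, ?)", (badge, salary))
        return True
    except sqlite3.IntegrityError:
        return False


matching = try_insert("B-001", 5000)  # references an existing badge: accepted
null_fk = try_insert(None, 4000)      # NULL references nothing: accepted
orphan = try_insert("B-999", 3000)    # no such badge: rejected
print(matching, null_fk, orphan)      # True True False
```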
Some other differences between foreign, primary, and unique keys
While unique and primary keys both enforce uniqueness on the column(s) of one table, foreign keys define a
relationship between two tables. A foreign key identifies a column or group of columns in one (referencing)
table that refers to a column or group of columns in another (referenced) table – in our example above, the
Employee table is the referenced table and the Employee Salary table is the referencing table.
As we stated earlier, both unique and primary keys can be referenced by foreign keys.
http://www.programmerinterview.com/index.php/database-sql/simple-key-in-sql/
What is a simple key in a dbms?
In a database table, a simple key is a single attribute (that is, a single column) that can uniquely identify a row. So any single column in a table that can uniquely identify a row is a simple key. It is called a simple key because it is composed of just one column, as opposed to multiple columns.
Example of a simple key
Let’s go through an example of a simple key. Consider a table called Employees. If every employee has a
unique ID and a column called EmployeeID, then the EmployeeID column would be considered a simple key
because it’s a single column that can uniquely identify every row in the table (where each row is a separate
employee). Simple, isn’t it?
What is the definition of a secondary key?
You may have heard the term secondary key in Oracle, MySQL, SQL Server, or whatever other dbms you
are dealing with. What exactly is a secondary key? Let’s start with a definition, and then a simple example
that will help you understand further.
A given table may have more than one choice for a primary key. Basically, there may be other columns (or combinations of columns, for a multi-column primary key) that qualify as primary keys. Any column or combination of columns that qualifies to be a primary key is known as a candidate key, because it is a candidate for the primary key. The candidates that are not selected to be the primary key are known as secondary keys.
Example of a Secondary Key in SQL
Let’s go through an example of a secondary key. Consider a table called Managers that stores all of the
managers in a company. Each manager has a unique Manager ID Number, a physical address, and an email
address. Let’s say that the Manager ID is chosen to be the primary key of the Managers table. Both the
physical address and email address could have been selected as the primary key, because they are both
unique fields for every manager row in the Managers table. But, because the email address and physical
address were not selected as the primary key, they are considered to be secondary keys.
Provide a definition and example of a superkey in SQL.
In SQL, the definition of a superkey is a set of columns in a table for which there are no two rows that will
share the same combination of values. So, the superkey is unique for each and every row in the table. A
superkey can also be just a single column.
Example of a superkey
Suppose we have a table that holds all the managers in a company, and that table is called Managers. The
table has columns called ManagerID, Name, Title, and DepartmentID. Every manager has his/her own
ManagerID, so that value is always unique in each and every row.
This means that if we combine the ManagerID column value for any given row with any other column
value, then we will have a unique set of values. So, for the combinations of (ManagerID, Name),
(ManagerID, Title), (ManagerID, DepartmentID), (ManagerID, Name, DepartmentID), etc – there will be no
two rows in the table that share the exact same combination of values, because the ManagerID will always
be unique and different for each row. This means that pairing the Manager ID with any other column(s) will
ensure that the combination will also be unique across all rows in the table.
And that is exactly what defines a superkey – it’s any combination of column(s) for which that combination
of values will be unique across all rows in a table. So, all of those combinations of columns in the Manager
table that we gave earlier would be considered to be superkeys. Even the ManagerID column is considered
to be a superkey, although a special type of superkey as you can read more about below.
What is a minimal superkey?
A minimal superkey is a superkey from which no column can be removed without losing uniqueness. In other words, it is a set of columns that, when combined, gives a unique value for every row in the table, while no smaller subset of those columns does. Remember that we mentioned earlier that a superkey can be just a single column. So, in our example above, the minimal superkey would be the ManagerID, since it is unique for each and every row in the Managers table.
Can a table have multiple minimal superkeys?
Yes, a table can have multiple minimal superkeys. Let’s use our example of a Managers table again. Suppose we add a column for the Social Security Number (which, for our non-American readers, is a unique 9-digit number issued to US citizens and eligible residents) to the Managers table; let’s just call it SSN. Since that column will clearly have a unique value for every row in the table, it will also be a minimal superkey, because it is only one column and it is unique for every row.
Can a minimal superkey have more than one column?
Absolutely. If there is no single column that is unique for every row in a given table, but there is a combination of columns that produces a unique value for every row, then that combination of columns would be a minimal superkey, provided that no column can be dropped from the combination while still keeping every row unique.
Why is it called a superkey?
It’s called a superkey because the name comes from set theory, as in superset and subset: a superkey is any superset of a candidate key (including the candidate key itself), and every such set will of course uniquely identify a row in a table.
Superkey versus candidate key
We discussed minimal superkeys and defined exactly what they are. Candidate keys are actually minimal
superkeys – so both candidate keys and minimal superkeys mean exactly the same thing.
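The definitions above translate directly into a brute-force check. This sketch uses only Python's itertools, and the sample Managers rows (including the SSN values) are hypothetical. One caveat: a key is a rule about every row the table may ever hold, so checking a single sample can rule a key out but cannot prove one. On this sample every combination containing ManagerID or SSN is a superkey, and the minimal ones are {ManagerID} and {SSN}.

```python
from itertools import combinations

# Hypothetical sample rows for the Managers table, including the SSN column.
columns = ("ManagerID", "SSN", "Name", "Title", "DepartmentID")
rows = [
    (1, "S-1", "Ann", "Manager", 10),
    (2, "S-2", "Ann", "Manager", 20),
    (3, "S-3", "Bob", "Manager", 10),
    (4, "S-4", "Ann", "Manager", 10),
]


def is_unique(cols):
    """True if no two sample rows share the same values in the given columns."""
    positions = [columns.index(c) for c in cols]
    projected = {tuple(row[p] for p in positions) for row in rows}
    return len(projected) == len(rows)


# Superkeys (with respect to this sample): every unique column combination.
superkeys = [
    set(combo)
    for size in range(1, len(columns) + 1)
    for combo in combinations(columns, size)
    if is_unique(combo)
]

# Minimal superkeys (candidate keys): no other superkey is a proper subset.
candidate_keys = [key for key in superkeys if not any(other < key for other in superkeys)]

print(len(superkeys))  # 24
print(candidate_keys)  # [{'ManagerID'}, {'SSN'}]
```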
What’s referential integrity?
Referential integrity is a relational database concept in which multiple tables share a relationship based on
the data stored in the tables, and that relationship must remain consistent.
The concept of referential integrity, and one way in which it’s enforced, is best illustrated by an example.
Suppose company X has 2 tables, an Employee table, and an Employee Salary table. In the Employee table
we have 2 columns – the employee ID and the employee name. In the Employee Salary table, we have 2
columns – the employee ID and the salary for the given ID.
Now, suppose we wanted to remove an employee because he no longer works at company X. Then we would remove his entry in the Employee table. Because he also exists in the Employee Salary table, we would have to remove him manually from there as well. Manually removing the employee from the Employee Salary table can become quite a pain. And if there are other tables in which company X uses that employee, then he would have to be deleted from those tables as well, which is an even bigger pain.
By enforcing referential integrity, we can solve that problem, so that we wouldn’t have to manually delete
him from the Employee Salary table (or any others). Here’s how: first we would define the employee ID
column in the Employee table to be our primary key. Then, we would define the employee ID column in the
Employee Salary table to be a foreign key that points to a primary key that is the employee ID column in
the Employee table. Once we define our foreign key to primary key relationship, we would need to add what’s
called a ‘constraint’ to the Employee Salary table. The constraint that we would add in particular is called a
‘cascading delete’ – this would mean that any time an employee is removed from the Employee table, any
entries that employee has in the Employee Salary table would also automatically be removed from the
Employee Salary table.
Note in the example given above that referential integrity is something that must be enforced, and that we
enforced only one rule of referential integrity (the cascading delete). There are actually 3 rules that
referential integrity enforces:
1. We may not add a record to the Employee Salary table unless the foreign key for that record points to an existing employee in the Employee table.
2. If a record in the Employee table is deleted, all corresponding records in the Employee Salary table must be deleted using a cascading delete. This was the example we had given earlier.
3. If the primary key for a record in the Employee table changes, all corresponding records in the Employee Salary table must be modified using what's called a cascading update.
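You can try the three rules with SQLite through Python's standard sqlite3 module. This is a sketch under several assumptions:

- The salary table is named Employee_Salary, because an unquoted identifier cannot contain a space.
- The column names and rows are made up.
- SQLite only enforces foreign keys after PRAGMA foreign_keys is switched on for the connection.

The ON DELETE CASCADE and ON UPDATE CASCADE clauses are the standard SQL way to declare rules 2 and 3.

```python
import sqlite3


def build_db():
    """In-memory Employee / Employee_Salary tables with a cascading foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this is switched on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Employee (
            employee_id   INTEGER PRIMARY KEY,
            employee_name TEXT NOT NULL
        );
        CREATE TABLE Employee_Salary (
            employee_id INTEGER NOT NULL
                REFERENCES Employee(employee_id)
                ON DELETE CASCADE   -- rule 2: cascading delete
                ON UPDATE CASCADE,  -- rule 3: cascading update
            salary INTEGER NOT NULL
        );
        INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO Employee_Salary VALUES (1, 50000), (2, 60000);
    """)
    return conn


def salary_ids(conn):
    rows = conn.execute("SELECT employee_id FROM Employee_Salary ORDER BY employee_id")
    return [r[0] for r in rows]


conn = build_db()

# Rule 1: a salary row must point at an existing employee.
try:
    conn.execute("INSERT INTO Employee_Salary VALUES (99, 1000)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Rule 2: deleting employee 1 also deletes his salary row.
conn.execute("DELETE FROM Employee WHERE employee_id = 1")
after_delete = salary_ids(conn)

# Rule 3: changing employee 2's key to 20 updates his salary row as well.
conn.execute("UPDATE Employee SET employee_id = 20 WHERE employee_id = 2")
after_update = salary_ids(conn)

print(orphan_rejected, after_delete, after_update)  # True [2] [20]
```

Without the PRAGMA line, SQLite would accept the orphan salary row silently. This is a good illustration of the point that referential integrity must be switched on and enforced; it does not happen by default.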
It’s worth noting that most RDBMS’s – relational databases like Oracle, DB2, Teradata, etc. – can
automatically enforce referential integrity if the right settings are in place. But, a large part of the burden of
maintaining referential integrity is placed upon whoever designs the database schema – basically whoever
defined the tables and their corresponding structure/relationships in the database that you are using.
Referential integrity is an important concept and you simply must know it for any programmer interview.
In SQL, what’s the difference between the having clause and the where clause?
The difference between the having and where clause is best illustrated by an example. Suppose we have a
table called emp_bonus as shown below. Note that the table has multiple entries for employees A and B.
emp_bonus
Employee  Bonus
A         1000
B         2000
A         500
C         700
B         1250
If we want to calculate the total bonus that each employee received, then we would write a SQL statement
like this:
select employee, sum(bonus) from emp_bonus group by employee;
The Group By Clause
In the SQL statement above, you can see that we use the "group by" clause with the employee column.
What the group by clause does is allow us to find the sum of the bonuses for each employee. Using the
‘group by’ in combination with the ‘sum(bonus)’ statement will give us the sum of all the bonuses for
employees A, B, and C.
Running the SQL above would return this:
Employee  Sum(Bonus)
A         1500
B         3250
C         700
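Here is a minimal sketch that reproduces the grouping with SQLite through Python's standard sqlite3 module, using the emp_bonus rows from the example. The ORDER BY is an addition: SQL does not guarantee any particular row order from GROUP BY unless you ask for one.

```python
import sqlite3

BONUSES = [("A", 1000), ("B", 2000), ("A", 500), ("C", 700), ("B", 1250)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_bonus (employee TEXT, bonus INTEGER)")
conn.executemany("INSERT INTO emp_bonus VALUES (?, ?)", BONUSES)

# One output row per distinct employee, with SUM() applied to each group.
totals = conn.execute(
    "SELECT employee, SUM(bonus) FROM emp_bonus "
    "GROUP BY employee ORDER BY employee"
).fetchall()

print(totals)  # [('A', 1500), ('B', 3250), ('C', 700)]
```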
Now, suppose we wanted to find the employees who received more than $1,000 in bonuses for the year of
2007. You might think that we could write a query like this:
BAD SQL:
select employee, sum(bonus) from emp_bonus
group by employee where sum(bonus) > 1000;
The WHERE clause does not work with aggregates like SUM
The SQL above will not work, because the where clause doesn’t work with aggregates like sum, avg, max,
etc. (It also places where after group by, which is invalid on its own: a where clause must come before group by.) Instead, what we will need to use is the having clause. The having clause was added to SQL just so we
could compare aggregates to other values, just as the ‘where’ clause can be used with non-aggregates.
Now, the correct sql will look like this:
GOOD SQL:
select employee, sum(bonus) from emp_bonus
group by employee having sum(bonus) > 1000;
Difference between having and where clause
So we can see that the difference between the having and where clause in SQL is that the where clause
cannot be used with aggregates, but the having clause can. One way to think of it is that the where clause
filters individual rows before they are grouped, while the having clause is an additional filter applied to the groups after the aggregates have been computed.
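A small SQLite sketch, run through Python's standard sqlite3 module with the same emp_bonus rows, makes the ordering concrete:

- WHERE removes rows before they are grouped.
- HAVING removes groups after the aggregate is computed.
- An aggregate placed in WHERE is rejected.

The extra condition `WHERE bonus >= 1000` is a made-up filter, added only to show the two clauses working together.

```python
import sqlite3

BONUSES = [("A", 1000), ("B", 2000), ("A", 500), ("C", 700), ("B", 1250)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_bonus (employee TEXT, bonus INTEGER)")
conn.executemany("INSERT INTO emp_bonus VALUES (?, ?)", BONUSES)


def run(sql):
    return conn.execute(sql).fetchall()


# HAVING filters whole groups after SUM() has been computed.
having_only = run(
    "SELECT employee, SUM(bonus) FROM emp_bonus "
    "GROUP BY employee HAVING SUM(bonus) > 1000 ORDER BY employee"
)

# WHERE filters individual rows first: A's 500 and C's 700 are dropped
# before grouping, so A's total becomes 1000 and fails the HAVING test.
where_and_having = run(
    "SELECT employee, SUM(bonus) FROM emp_bonus WHERE bonus >= 1000 "
    "GROUP BY employee HAVING SUM(bonus) > 1000 ORDER BY employee"
)

# An aggregate inside WHERE is an error, as the BAD SQL example warns.
try:
    run("SELECT employee FROM emp_bonus WHERE SUM(bonus) > 1000 GROUP BY employee")
    where_aggregate_ok = True
except sqlite3.Error:
    where_aggregate_ok = False

print(having_only)         # [('A', 1500), ('B', 3250)]
print(where_and_having)    # [('B', 3250)]
print(where_aggregate_ok)  # False
```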
How do database indexes work? And, how do indexes help? Provide a tutorial on database indexes.
Let’s start out our tutorial and explanation of why you would need a database index by going through a very
simple example. Suppose that we have a database table called Employee with three columns –
Employee_Name, Employee_Age, and Employee_Address. Assume that the Employee table has thousands of
rows.
Now, let’s say that we want to run a query to find all the details of any employees who are named ‘Jesus’.
So, we decide to run a simple query like this:
SELECT * FROM Employee
WHERE Employee_Name = 'Jesus'
What would happen without an index on the table?
Once we run that query, what exactly goes on behind the scenes to find employees who are named
Jesus? Well, the database software would literally have to look at every single row in the Employee table
to see if the Employee_Name for that row is ‘Jesus’. And, because we want every row with the name ‘Jesus’
inside it, we cannot just stop looking once we find just one row with the name ‘Jesus’, because there could
be other rows with the name Jesus. So, every row up until the last row must be searched – which means
thousands of rows in this scenario will have to be examined by the database to find the rows with the name
‘Jesus’. This is what is called a full table scan.
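A full table scan is simply a linear loop over every row. In this minimal Python sketch, the Employee table is a hypothetical in-memory list of (name, age, address) tuples. Watch the `examined` counter: it always equals the number of rows, because the scan cannot stop at the first match.

```python
# Hypothetical in-memory Employee table:
# (Employee_Name, Employee_Age, Employee_Address)
EMPLOYEES = [
    ("Jesus", 34, "12 Oak St"),
    ("Maria", 41, "9 Elm St"),
    ("Jack", 29, "3 Pine Rd"),
    ("Jesus", 52, "77 Bay Ave"),
]


def full_table_scan(table, name):
    """Return (matching rows, rows examined) for WHERE Employee_Name = name."""
    matches = []
    examined = 0
    for row in table:
        examined += 1
        if row[0] == name:
            matches.append(row)  # keep going: later rows may also match
    return matches, examined


matches, examined = full_table_scan(EMPLOYEES, "Jesus")
print(len(matches), examined)  # 2 4
```

The work grows linearly with the table. With thousands of rows, thousands of comparisons are needed for every query.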
How a database index can help performance
You might be thinking that doing a full table scan sounds inefficient for something so simple – shouldn’t
software be smarter? It’s almost like looking through the entire table with the human eye – very slow and
not at all sleek. But, as you probably guessed by the title of this article, this is where indexes can help a
great deal. The whole point of having an index is to speed up search queries by essentially cutting
down the number of records/rows in a table that need to be examined.
What is an index?
So, what is an index? Well, an index is a data structure (most commonly a B-tree) that stores the values for
a specific column in a table. An index is created on a column of a table. So, the key points to remember are
that an index consists of column values from one table, and that those values are stored in a data structure.
The index is a data structure – remember that.
What kind of data structure is an index?
B-trees are the most commonly used data structures for indexes. The reason B-trees are the most popular
data structure for indexes is that they are time efficient, because look-ups, deletions, and
insertions can all be done in logarithmic time. And, another major reason B-trees are more commonly used
is that the data stored inside a B-tree is kept in sorted order. The RDBMS typically determines which
data structure is actually used for an index. But, in some scenarios with certain RDBMS’s, you can actually
specify which data structure you want your database to use when you create the index itself.
How does a hash table index work?
Hash tables are another data structure that you may see being used as indexes – these indexes are
commonly referred to as hash indexes. The reason hash indexes are used is because hash tables are
extremely efficient when it comes to just looking up values. So, queries that compare for equality to a string
can retrieve values very fast if they use a hash index. For instance, the query we discussed earlier (SELECT
* FROM Employee WHERE Employee_Name = ‘Jesus’) could benefit from a hash index created on the
Employee_Name column. The way a hash index would work is that the column value will be the key into the
hash table and the actual value mapped to that key would just be a pointer to the row data in the table.
Since a hash table is basically an associative array, a typical entry would look something like “Jesus =>
0x28939”, where 0x28939 is a reference to the table row where Jesus is stored in memory. Looking up a
value like “Jesus” in a hash table index and getting back a reference to the row in memory is obviously a lot
faster than scanning the table to find all the rows with a value of “Jesus” in the Employee_Name column.
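Python's built-in dict is a hash table, so a hash index takes only a few lines to sketch. The rows are hypothetical, and each row's position in the list stands in for the row pointer. Because one name can occur in several rows, each key maps to a list of pointers rather than the single pointer shown in the "Jesus => 0x28939" entry.

```python
# Hypothetical Employee rows; list position stands in for the row pointer.
EMPLOYEES = [
    ("Jesus", 34, "12 Oak St"),
    ("Maria", 41, "9 Elm St"),
    ("Jack", 29, "3 Pine Rd"),
    ("Jesus", 52, "77 Bay Ave"),
]


def build_hash_index(table, column):
    """Map each column value to the list of row pointers holding that value."""
    index = {}
    for row_id, row in enumerate(table):
        index.setdefault(row[column], []).append(row_id)
    return index


def hash_lookup(index, table, key):
    """Equality lookup: one hash probe, then follow each pointer to its row."""
    return [table[row_id] for row_id in index.get(key, [])]


name_index = build_hash_index(EMPLOYEES, 0)
print(name_index["Jesus"])                             # [0, 3]
print(hash_lookup(name_index, EMPLOYEES, "Jesus")[1])  # ('Jesus', 52, '77 Bay Ave')
```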
The disadvantages of a hash index
Hash tables are not sorted data structures, and there are many types of queries which hash indexes cannot
even help with. For instance, suppose you want to find out all of the employees who are less than 40 years
old. How could you do that with a hash table index? Well, the hash index can’t help there, because a hash table is only good
for looking up key value pairs, which means queries that check for equality (like “WHERE name = ‘Jesus’”).
Implicit in a hash table’s key value mapping is the fact that its keys are not
sorted or stored in any particular order. This is why hash indexes are usually not the default type of data
structure used by database indexes: they aren’t as flexible as B-trees when used as the index
data structure. Also see: Binary trees versus Hash Tables.
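A few lines of Python show the contrast. Answering "age < 40" from a dict keyed by age would mean checking every key. A sorted list of (age, row pointer) pairs, searched with the standard bisect module, finds the cut-off in logarithmic time and returns the matches already in order. Treat this sorted list only as a stand-in for a B-tree: it shares the sorted order and logarithmic search, not the cheap inserts. The ages are made-up sample data.

```python
import bisect

# Hypothetical (Employee_Age, row pointer) pairs kept in sorted order.
AGE_INDEX = sorted([(34, 0), (41, 1), (29, 2), (52, 3)])


def rows_younger_than(index, limit):
    """Row pointers for every entry with age < limit, in ascending age order."""
    # (limit,) sorts before any (limit, row_id), so the cut excludes age == limit.
    cut = bisect.bisect_left(index, (limit,))
    return [row_id for _, row_id in index[:cut]]


print(rows_younger_than(AGE_INDEX, 40))  # [2, 0]
```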
What are some other types of indexes?
Indexes that use an R-tree data structure are commonly used to help with spatial problems. For instance, a
query like “Find all of the Starbucks within 2 kilometers of me” would be the type of query that could show
enhanced performance if the database table uses an R-tree index.
Another type of index is a bitmap index, which works well on columns that contain only a few distinct values (like the Boolean values true
and false) but many instances of each value, basically columns with low selectivity.
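A bitmap index keeps one bit string per distinct value, with bit i set when row i holds that value. It answers combined conditions with bitwise AND and OR. Here is a minimal Python sketch that uses Python integers as the bit strings, with two made-up Boolean columns (full_time and remote):

```python
# Hypothetical low-selectivity columns, one entry per row.
FULL_TIME = [True, False, True, True, False, True]
REMOTE = [False, False, True, False, False, True]


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i has that value."""
    bitmaps = {}
    for row_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps


def rows_in(bitmap):
    """Row ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


full_time_idx = build_bitmap_index(FULL_TIME)
remote_idx = build_bitmap_index(REMOTE)

# WHERE full_time = true AND remote = true  ->  a single bitwise AND
print(rows_in(full_time_idx[True] & remote_idx[True]))  # [2, 5]
```

Each bitmap costs one bit per row no matter how many rows share a value. That is why this layout suits columns with few distinct values.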
How does an index improve performance?
Because an index is basically a data structure that is used to store column values, looking up those values
becomes much faster. And, if an index is using the most commonly used data structure type (a B-tree),
then the data structure is also sorted. Having the column values be sorted can be a major performance
enhancement – read on to find out why.
Let’s say that we create a B-tree index on the Employee_Name column. This means that when we search for
employees named “Jesus” using the SQL we showed earlier, then the entire Employee table does not have
to be searched to find employees named “Jesus”. Instead, the database will use the index to find employees
named Jesus, because the index will presumably be sorted alphabetically by the Employee’s name. And,
because it is sorted, it means searching for a name is a lot faster because all names starting with a “J” will
be right next to each other in the index! It’s also important to note that the index also stores pointers to the
table row so that other column values can be retrieved – read on for more details on that.
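Here is a minimal Python sketch of that idea. The index is a sorted list of (Employee_Name, row pointer) pairs, searched with the standard bisect module. Equal names end up next to each other, so the search stops at the first different name, and each pointer is followed back to the full row. A real B-tree spreads its entries over disk pages, but the sorted order and logarithmic search are the same. The rows are hypothetical.

```python
import bisect

# Hypothetical Employee rows; list position stands in for the row pointer.
EMPLOYEES = [
    ("Jesus", 34, "12 Oak St"),
    ("Maria", 41, "9 Elm St"),
    ("Jack", 29, "3 Pine Rd"),
    ("Jesus", 52, "77 Bay Ave"),
]

# Index on Employee_Name: (value, pointer) pairs in sorted order.
NAME_INDEX = sorted((row[0], row_id) for row_id, row in enumerate(EMPLOYEES))


def find_by_name(index, table, name):
    i = bisect.bisect_left(index, (name,))  # O(log n) jump to the first candidate
    results = []
    # Sorted order: matches are adjacent, so stop at the first different name.
    while i < len(index) and index[i][0] == name:
        results.append(table[index[i][1]])  # follow the pointer for age and address
        i += 1
    return results


print(NAME_INDEX)
# [('Jack', 2), ('Jesus', 0), ('Jesus', 3), ('Maria', 1)]
print(find_by_name(NAME_INDEX, EMPLOYEES, "Jesus"))
# [('Jesus', 34, '12 Oak St'), ('Jesus', 52, '77 Bay Ave')]
```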
What exactly is inside a database index?
So, now you know that a database index is created on a column in a table, and that the index stores the
values in that specific column. But, it is important to understand that a database index does not store the
values in the other columns of the same table. For example, if we create an index on the Employee_Name
column, this means that the Employee_Age and Employee_Address column values are not also stored in the
index. If we did just store all the other columns in the index, then it would be just like creating another copy
of the entire table – which would take up way too much space and would be very inefficient.
An index also stores a pointer to the table row
So, the question is: if the value that we are looking for is found in an index (like ‘Jesus’), how does the database find
the other values that are in the same row (like the address of Jesus and his age)? Well, it’s quite simple –
database indexes also store pointers to the corresponding rows in the table. A pointer is just a reference to
the location on disk where the row data is stored. So, in addition to the column value that is stored in
the index, a pointer to the row in the table where that value lives is also stored in the index. This means
that one of the values (or nodes) in the index for an Employee_Name could be something like (“Jesus”,
0x82829), where 0x82829 is the address on disk (the pointer) where the row data for “Jesus” is stored.
Without that pointer all you would have is a single value, which would be meaningless because you would
not be able to retrieve the other values in the same row – like the address and the age of an employee.
How does a database know when to use an index?
When a query like “SELECT * FROM Employee WHERE Employee_Name = ‘Jesus’ ” is run, the database will
check to see if there is an index on the column(s) being queried. Assuming the Employee_Name column
does have an index created on it, the database will have to decide whether it actually makes sense to use
the index to find the values being searched – because there are some scenarios where it is actually less
efficient to use the database index, and more efficient just to scan the entire table. Read this article to
understand more about those scenarios: Selectivity in SQL.
Can you force the database to use an index on a query?
Generally, you will not tell the database when to actually use an index; that decision will be made by the
database itself. Although it is worth noting that in most databases (like Oracle and MySQL), you can actually
specify that you want the index to be used, through optimizer hints such as Oracle’s INDEX hint or MySQL’s FORCE INDEX clause.
How to create an index in SQL:
Here’s what the actual SQL would look like to create an index on the Employee_Name column from our
example earlier:
CREATE INDEX name_index
ON Employee (Employee_Name);
How to create a multi-column index in SQL:
We could also create an index on two of the columns in the Employee table, as shown in this SQL:
CREATE INDEX name_age_index
ON Employee (Employee_Name, Employee_Age);
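You can watch a database make this decision. The sketch below uses SQLite through Python's standard sqlite3 module with a few made-up rows. EXPLAIN QUERY PLAN reports a full scan before name_index exists, and an index search once it does.

The sketch also uses INDEXED BY, which is SQLite's own non-standard way of insisting on a particular index. Oracle and MySQL use different syntax for the same idea.

The exact wording of the plan text varies between SQLite versions, so the sketch only looks for key words in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "Employee_Name TEXT, Employee_Age INTEGER, Employee_Address TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("Jesus", 34, "12 Oak St"), ("Maria", 41, "9 Elm St"), ("Jesus", 52, "77 Bay Ave")],
)

QUERY = "SELECT * FROM Employee WHERE Employee_Name = 'Jesus'"


def plan(sql):
    """Join the human-readable detail column of EXPLAIN QUERY PLAN."""
    return " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(QUERY)  # e.g. "SCAN Employee" (older versions: "SCAN TABLE Employee")
conn.execute("CREATE INDEX name_index ON Employee (Employee_Name)")
after = plan(QUERY)   # e.g. "SEARCH Employee USING INDEX name_index (Employee_Name=?)"

# SQLite-specific: the statement fails instead of scanning if name_index can't be used.
forced = conn.execute(
    "SELECT * FROM Employee INDEXED BY name_index WHERE Employee_Name = 'Jesus'"
).fetchall()

print("SCAN" in before, "name_index" in after, len(forced))  # True True 2
```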
What is a good analogy for a database index?
A very good analogy is to think of a database index as an index in a book. If you have a book about dogs
and you are looking for the section on Golden Retrievers, then why would you flip through the entire book –
which is the equivalent of a full table scan in database terminology – when you can just go to the index at
the back of the book, which will tell you the exact pages where you can find information on Golden
Retrievers? Just as a book index contains a page number, a database index contains a pointer to the
row containing the value that you are searching for in your SQL.
What is the cost of having a database index?
So, what are some of the disadvantages of having a database index? Well, for one thing it takes up space –
and the larger your table, the larger your index. Another performance hit with indexes is the fact that
whenever you add, delete, or update rows in the corresponding table, the same operations will have to be
done to your index. Remember that an index needs to contain the same up-to-the-minute data as whatever
is in the table column(s) that the index covers.
As a general rule, an index should only be created on a table if the data in the indexed column will be
queried frequently.
What is a self join? Explain it with an example and tutorial.
Let’s illustrate the need for a self join with an example. Suppose we have the following table – that is called
employee. The employee table has 2 columns – one for the employee name (called employee_name), and
one for the employee location (called employee_location):
employee
employee_name  employee_location
Joe            New York
Sunil          India
Alex           Russia
Albert         Canada
Jack           New York
Now, suppose we want to find out which employees are from the same location as the employee named Joe.
In this example, that location would be New York. Let’s assume, for the sake of our example, that we
cannot just directly search the table for people who live in New York with a simple query like the one below (maybe
because we don’t want to hardcode the city name in the SQL query):
SELECT employee_name
FROM employee
WHERE employee_location = 'New York'
So, instead of a query like that, what we could do is write a nested SQL query (basically a query within
another query, which is more commonly called a subquery) like this:
SELECT employee_name
FROM employee
WHERE employee_location in
( SELECT employee_location
FROM employee
WHERE employee_name = 'Joe')
Using a subquery for such a simple question can be inefficient, depending on how the database executes it. Is there a more efficient and elegant solution to
this problem?
It turns out that there is a more efficient solution – we can use something called a self join. A self join is
basically when a table is joined to itself. The way you should visualize a self join for a given table is by
imagining that a join is performed between two identical copies of that table. And that is exactly why it
is called a self join – because of the fact that it’s just the same table being joined to another copy of itself
rather than being joined with a different table.
How does a self join work
Before we come up with a solution for this problem using a self join, we should go over some concepts so
that you can fully understand how a self join works. This will also make the SQL in our self join tutorial a lot
easier to understand, which you will see further below.
A self join must have aliases
In a self join we are joining the same table to itself by essentially creating two copies of that table. But, how
do we distinguish between the two different copies of the table – because there is only one table name after
all? Well, when we do a self join, the table names absolutely must use aliases otherwise the column names
would be ambiguous. In other words, we would not know which table’s columns are being referenced
without using aliases for the two copies of the table. If you don’t already know what an alias is, it’s simply
another name given to a table, and that name is then used in the SQL query to reference the table. So, we
will just use the aliases e1 and e2 for the employee table when we do a self join.
Self join predicate
As with any join, there must be a condition upon which a self join is performed – we cannot just arbitrarily
say “do a self join”, without specifying some condition. That condition will be our join predicate. If you need
a refresher on join predicates (or just joins in general) then check this link out: Inner vs. Outer joins.
Now, let’s come up with a solution to the original problem using a self join instead of a subquery. This will
help illustrate how exactly a self join works. The key question that we must ask ourselves is what should our
join predicate be in this example? Well, we want to find all the employees who have the same location as
Joe.
Because we want to match between our two tables (both of which are the same table – employee – aliased
as e1 and e2) on location, our join predicate should clearly be “WHERE e1.employee_location =
e2.employee_location”. But is that enough to give us what we want? No, it’s not, because we also want to
filter the rows returned since we only want people who are from the same location as Joe.
So, how can we filter the rows returned so that only people from Joe’s location are returned? Well, what we
can do is simply add a condition on one of the tables (e2 in our example) so that it only returns the row
where the name is Joe. Then, the other table (e1) will match up all the names that have the same location
in e2, because of our join predicate – which is “WHERE e1.employee_location = e2.employee_location”. We
will then just select the names from e1, and not e2 because e2 will only have Joe’s name. If that’s confusing
then keep reading further to understand more about how the query will work.
So, the self join query that we come up with looks like this:
Self Join SQL Example
SELECT e1.employee_name
FROM employee e1, employee e2
WHERE e1.employee_location = e2.employee_location
AND e2.employee_name = 'Joe';
This query will return the names Joe and Jack – since Jack is the only other person who lives in New York
like Joe.
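You can check both versions side by side with SQLite through Python's standard sqlite3 module, using the employee rows from the table above. The ORDER BY clauses are an addition, so the output order is predictable.

One difference is worth knowing. Suppose Joe appeared in two rows. The self join would pair every New York row with both of them and return each match twice, while the IN version returns each employee row only once. Adding DISTINCT to the self join removes those duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_name TEXT, employee_location TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Joe", "New York"), ("Sunil", "India"), ("Alex", "Russia"),
     ("Albert", "Canada"), ("Jack", "New York")],
)

SELF_JOIN = """
    SELECT e1.employee_name
    FROM employee e1, employee e2
    WHERE e1.employee_location = e2.employee_location  -- join predicate
      AND e2.employee_name = 'Joe'                     -- keep only Joe's row in e2
    ORDER BY e1.employee_name
"""

SUBQUERY = """
    SELECT employee_name
    FROM employee
    WHERE employee_location IN
        (SELECT employee_location FROM employee WHERE employee_name = 'Joe')
    ORDER BY employee_name
"""

self_join_names = [row[0] for row in conn.execute(SELF_JOIN)]
subquery_names = [row[0] for row in conn.execute(SUBQUERY)]
print(self_join_names, subquery_names)  # ['Jack', 'Joe'] ['Jack', 'Joe']
```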
Generally, queries that refer to the same table can often be simplified by re-writing them as self
joins. And, depending on the database’s query optimizer, there can be a performance benefit as well.
What does a self join look like?
It will help tremendously to visualize the actual results of a self join internally. Remember that a self
join is just like any other join, where the two tables are merged into one temporary table. First off, you
should visualize that we have two separate copies of the employee table, which are given aliases of e1 and
e2. These copies would simply look like this – note that we shortened the column names from
employee_name and employee_location to just Name and Location for convenience:
e1                    e2
Name    Location      Name    Location
Joe     New York      Joe     New York
Sunil   India         Sunil   India
Alex    Russia        Alex    Russia
Albert  Canada        Albert  Canada
Jack    New York      Jack    New York
And the final results of running the self join query above – the actual joined table – would look like this:
e1.employee_name  e1.employee_location  e2.employee_name  e2.employee_location
Joe               New York              Joe               New York
Jack              New York              Joe               New York
Self joins versus inner joins
Are self joins and inner joins the same? You might be wondering if all self joins are also inner joins. After all,
in our example above our self join uses an inner join because only the rows that match based on the join
predicate are returned – non-matching rows are not returned. Well, it turns out that a self join and inner
join are completely different concepts. A self join could just as well be an outer join or an inner join – it just
depends on how the query is written. We could easily change the query we used above to do a LEFT OUTER
JOIN – while the query still remains a self join – but that wouldn’t give us the results we want in our
example. So, we use an implied inner join instead because that gives us the correct results. Remember that
a query is a self join as long as the two tables being joined are exactly the same table, but whether it’s an
inner join or outer join depends on what is specified in the SQL. And, inner/outer joins are separate concepts
entirely from a self join.
Self joins manager employee example
The most commonly used example for self joins is the classic employee manager table. The table is called
Employee, but holds all employees – including their managers. Every employee has an ID, and there is also
a column for the manager ID. So, for example, let’s say we have a table that looks like this – and we call it
Employee:
EmployeeID  Name            ManagerID
1           Sam             10
2           Harry           4
4           Manager         NULL
10          AnotherManager  NULL
Notice that in the table above there are two managers, conveniently named “Manager” and
“AnotherManager”. And, those managers don’t have managers of their own, as noted by the NULL value in
their ManagerID column.
Now, given the table above, how can we return results that will show each employee’s name, and his/her
manager’s name in nicely arranged results – with the employee in one column and his/her manager’s name
in the other column? Well, it turns out we can use a self join to do this. Try to come up with the SQL on your
own before reading our answer.
Self join manager employee answer
In order to come up with a correct answer for this problem, our goal should be to perform a self join that
will have both the employee information and manager information in one row. First off, since we are doing a
self join, it helps to visualize the one table as two tables – let’s give them aliases of e1 and e2. Now, with
that in mind, we want the employee’s information on one side of the joined table and the manager’s
information on the other side of the joined table. So, let’s just say that we want e1 to hold the employee
information and e2 to hold the corresponding manager’s information. What should our join predicate be in
that case?
Well, the join predicate should look like “ON e1.ManagerID = e2.EmployeeID” – this basically says that we
should join the two tables (a self join) based on the condition that the manager ID in e1 is equal to the
employee ID in e2. In other words, an employee’s manager in e1 should have the manager’s information in
e2. An illustration will help clarify this. Suppose we use that predicate and just select everything after we
join the tables. So, our SQL would look like this:
SELECT *
FROM Employee e1
INNER JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID
The results of running the query above would look like this:
e1.EmployeeID  e1.Name  e1.ManagerID  e2.EmployeeID  e2.Name         e2.ManagerID
1              Sam      10            10             AnotherManager  NULL
2              Harry    4             4              Manager         NULL
Note that there are only 2 rows returned – this is because an inner join is performed, which means that only
when there is a match between employee ID’s and manager ID’s will there be a result returned. And since
there are 2 people without managers (who have a manager ID of NULL), they will not be returned as part of
table e1: NULL is never equal to anything in a join condition, so no employee ID can match it.
Now, remember that we only want to return the names of the employee and corresponding manager as a
pair. So, we can fine-tune the SQL as follows:
SELECT e1.Name, e2.Name
FROM Employee e1
INNER JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID
Running the SQL above would return:
Sam    AnotherManager
Harry  Manager
And that is the answer to the employee manager problem using a self join!
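You can run the employee manager query with SQLite through Python's standard sqlite3 module, together with the LEFT OUTER JOIN variant mentioned in the self joins versus inner joins discussion. Rows and column names follow the example table. The AS aliases and ORDER BY are additions for readable, predictable output. Python shows SQL NULL as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Name TEXT, ManagerID INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Sam", 10), (2, "Harry", 4), (4, "Manager", None), (10, "AnotherManager", None)],
)

# Inner self join: only employees whose ManagerID matches some EmployeeID.
inner = conn.execute("""
    SELECT e1.Name AS Employee, e2.Name AS Manager
    FROM Employee e1
    INNER JOIN Employee e2 ON e1.ManagerID = e2.EmployeeID
    ORDER BY e1.EmployeeID
""").fetchall()

# Still a self join, but LEFT OUTER: employees without a manager are kept with NULL.
outer = conn.execute("""
    SELECT e1.Name AS Employee, e2.Name AS Manager
    FROM Employee e1
    LEFT OUTER JOIN Employee e2 ON e1.ManagerID = e2.EmployeeID
    ORDER BY e1.EmployeeID
""").fetchall()

print(inner)  # [('Sam', 'AnotherManager'), ('Harry', 'Manager')]
print(outer)  # [('Sam', 'AnotherManager'), ('Harry', 'Manager'), ('Manager', None), ('AnotherManager', None)]
```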
Suppose we have the Employee table below, and we want to retrieve
all of the cities that the employees live in, but we don’t want any
duplicates. How can we do this in SQL?
employee
employee_name  employee_location
Joe            New York
Sunil          India
Alex           Russia
Albert         Canada
Jack           New York
Alex           Russia
In SQL, the distinct keyword will allow us to do that. Here’s what the simple SQL would look like:
SELECT DISTINCT employee_location from employee;
Running this query will return the following results:
employee_location
New York
India
Russia
Canada
So, you can see that the duplicate values for "Russia" and "New York" are not returned in the results.
It’s worth noting that the DISTINCT keyword can be used with more than one column. That means that only
the unique combination of columns will be returned. Again, this is best illustrated by an example.
Suppose we run the following SQL:
SELECT DISTINCT employee_name, employee_location from employee;
If we run the SQL above, it will return this:
employee_name  employee_location
Joe            New York
Sunil          India
Alex           Russia
Albert         Canada
Jack           New York
Note that the one extra entry for "Alex, Russia" is missing in the result set above. This is because when we
select a distinct combination of name and location, if there are 2 entries with the same exact name and
location then the sql that we ran above will only return one of those entries.
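Both DISTINCT queries can be reproduced with SQLite through Python's standard sqlite3 module, using the six rows from the table above. The ORDER BY on the first query is an addition, so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_name TEXT, employee_location TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Joe", "New York"), ("Sunil", "India"), ("Alex", "Russia"),
     ("Albert", "Canada"), ("Jack", "New York"), ("Alex", "Russia")],
)

# DISTINCT on one column: one row per location.
locations = [row[0] for row in conn.execute(
    "SELECT DISTINCT employee_location FROM employee ORDER BY employee_location")]

# DISTINCT on two columns: one row per unique (name, location) combination.
pairs = conn.execute(
    "SELECT DISTINCT employee_name, employee_location FROM employee").fetchall()

print(locations)   # ['Canada', 'India', 'New York', 'Russia']
print(len(pairs))  # 5
```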
In the table below, how would you retrieve the unique values for the
employee_location without using the DISTINCT keyword?
employee
employee_name  employee_location
Joe            New York
Sunil          India
Alex           Russia
Albert         Canada
Jack           New York
Alex           Russia
We can actually accomplish this with the GROUP BY keyword. Here’s what the SQL would look like:
SELECT employee_location from employee
GROUP BY employee_location
Running this query will return the following results:
employee_location
New York
India
Russia
Canada
So, you can see that the duplicate values for "Russia" and "New York" are not returned in the results.
This is a valid alternative to using the DISTINCT keyword. If you need a refresher on the GROUP BY clause,
then check out this question: Group By and Having. This question would probably be asked just to see how
good you are at coming up with alternative options for SQL queries, although it probably doesn’t prove
much about your SQL skills.
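The following sketch confirms the equivalence, again using SQLite through Python's standard sqlite3 module with the table above. It also shows one thing GROUP BY can do that DISTINCT alone cannot: attach an aggregate, here COUNT(*), to each unique location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_name TEXT, employee_location TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Joe", "New York"), ("Sunil", "India"), ("Alex", "Russia"),
     ("Albert", "Canada"), ("Jack", "New York"), ("Alex", "Russia")],
)

# Compare as sets, since neither query promises an order without ORDER BY.
distinct_rows = set(conn.execute("SELECT DISTINCT employee_location FROM employee"))
grouped_rows = set(conn.execute(
    "SELECT employee_location FROM employee GROUP BY employee_location"))

# GROUP BY can also count the employees in each location.
counts = dict(conn.execute(
    "SELECT employee_location, COUNT(*) FROM employee GROUP BY employee_location"))

print(distinct_rows == grouped_rows)                          # True
print(counts["New York"], counts["Russia"], counts["India"])  # 2 2 1
```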
Practice SQL Interview questions and Answers
There’s no better way to improve your SQL skills than to practice with some real SQL
interview questions, and these SQL practice problems are a great way to sharpen your SQL
online. We recommend first creating the simple tables presented below in the RDBMS
software of your choice (MySQL, Oracle, DB2, SQL Server, etc.), and then actually trying to figure
out the answer on your own if possible.
The following SQL practice exercises were actually taken from real interview tests with Google
and Amazon. Once again, we highly recommend that you try finding the answers to these SQL
practice exercises on your own before reading the given solutions. The practice problems are
based on the tables presented below.
Salesperson
ID  Name   Age  Salary
1   Abe    61   140000
2   Bob    34   44000
5   Chris  34   40000
7   Dan    41   52000
8   Ken    57   115000
11  Joe    38   38000
Customer
ID  Name      City      Industry Type
4   Samsonic  pleasant  J
6   Panasung  oaktown   J
7   Samony    jackson   B
9   Orange    Jackson   B
Orders
Number  order_date  cust_id  salesperson_id  Amount
10      8/2/96      4        2               540
20      1/30/99     4        8               1800
30      7/14/95     9        1               460
40      1/29/98     7        2               2400
50      2/3/98      6        7               600
60      3/2/98      6        7               720
70      5/6/98      9        7               150
Given the tables above, find the following:
a. The names of all salespeople that have an order with Samsonic.
b. The names of all salespeople that do not have any order with Samsonic.
c. The names of salespeople that have 2 or more orders.
d. Write a SQL statement to insert rows into a table called highAchiever(Name, Age),
where a salesperson must have a salary of 100,000 or greater to be included in the table.
Let’s start by answering part a. It’s obvious that we would need to do a SQL join, because the
data in one table will not be enough to answer this question. This is a good question to get some
practice with SQL joins, so see if you can come up with the solution.
Now, what tables should we use for the join? We know that the customer ID of Samsonic is 4, so
we can use that information and do a simple join with the Salesperson and Orders tables. The
SQL would look like this:
select Salesperson.Name from Salesperson, Orders where
Salesperson.ID = Orders.salesperson_id and Orders.cust_id = 4;
We can also use subqueries (a query within a query) to come up with another possible answer.
Here is an alternative, but possibly less efficient, solution using a subquery:
select Salesperson.Name from Salesperson where
Salesperson.ID in (select Orders.salesperson_id from Orders,
Customer where Orders.cust_id = Customer.ID
and Customer.Name = 'Samsonic');
Note the use of in rather than =, because the subquery can return more than one salesperson ID.
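To check the answers, you can load the sample tables into SQLite through Python's standard sqlite3 module. This is a sketch with a few assumptions:

- The Industry Type column is renamed IndustryType to avoid quoting.
- Dates are stored as plain text.
- The helper names practice_db and names are made up for this example.

Both part a queries return Bob and Ken. DISTINCT is added to the join version because a salesperson with two Samsonic orders would otherwise be listed twice.

```python
import sqlite3


def practice_db():
    """Load the sample Salesperson, Customer and Orders tables into memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Salesperson (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER);
        CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, IndustryType TEXT);
        CREATE TABLE Orders (Number INTEGER PRIMARY KEY, order_date TEXT,
                             cust_id INTEGER, salesperson_id INTEGER, Amount INTEGER);
        INSERT INTO Salesperson VALUES (1, 'Abe', 61, 140000), (2, 'Bob', 34, 44000),
            (5, 'Chris', 34, 40000), (7, 'Dan', 41, 52000), (8, 'Ken', 57, 115000),
            (11, 'Joe', 38, 38000);
        INSERT INTO Customer VALUES (4, 'Samsonic', 'pleasant', 'J'), (6, 'Panasung', 'oaktown', 'J'),
            (7, 'Samony', 'jackson', 'B'), (9, 'Orange', 'Jackson', 'B');
        INSERT INTO Orders VALUES (10, '8/2/96', 4, 2, 540), (20, '1/30/99', 4, 8, 1800),
            (30, '7/14/95', 9, 1, 460), (40, '1/29/98', 7, 2, 2400), (50, '2/3/98', 6, 7, 600),
            (60, '3/2/98', 6, 7, 720), (70, '5/6/98', 9, 7, 150);
    """)
    return conn


def names(conn, sql):
    """Run a query whose first column is a name and return the names sorted."""
    return sorted(row[0] for row in conn.execute(sql))


conn = practice_db()

JOIN_ANSWER = """
    SELECT DISTINCT Salesperson.Name FROM Salesperson, Orders
    WHERE Salesperson.ID = Orders.salesperson_id AND Orders.cust_id = 4
"""
SUBQUERY_ANSWER = """
    SELECT Salesperson.Name FROM Salesperson
    WHERE Salesperson.ID IN (
        SELECT Orders.salesperson_id FROM Orders, Customer
        WHERE Orders.cust_id = Customer.ID AND Customer.Name = 'Samsonic')
"""

print(names(conn, JOIN_ANSWER), names(conn, SUBQUERY_ANSWER))  # ['Bob', 'Ken'] ['Bob', 'Ken']
```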
Let’s now work on answering parts B and C of the original question, using the same Salesperson, Customer, and Orders tables shown above.
Here is part B: Find the names of all salespeople that do not have any orders with Samsonic.
This is part C: Find the names of salespeople that have 2 or more orders.
Part B of the question asks for the names of the salespeople who do not have an order with Samsonic. A
good way to approach this problem is to break it down: first, find all the salespeople
who do have an order with Samsonic. Then, we can work with that list to get all the salespeople
who do not have an order with Samsonic.
So, let’s start by just getting a list of all the salespeople ID’s that have an order with Samsonic. We can get
this list by doing a join with a condition that the customer is Samsonic. We can use both the Customer and
Orders table. The SQL for this will look like:
select Orders.salesperson_id from Orders, Customer where
Orders.cust_id = Customer.ID and Customer.Name = 'Samsonic'
This will give us a list of all the salespeople ID’s that have an order with Samsonic. Now, we can get a list of
the names of all the salespeople who do NOT have an order with Samsonic. SQL’s NOT IN operator
easily allows us to exclude the IDs in that result set. We can use this to our advantage. Here is one
possible answer to question B, and this is what the final SQL will look like:
select Salesperson.Name from Salesperson
where Salesperson.ID NOT IN(
select Orders.salesperson_id from Orders, Customer
where Orders.cust_id = Customer.ID
and Customer.Name = 'Samsonic')
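To try the Part B query yourself, here is a minimal sketch that loads the sample tables above into an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is an assumption made only so the example runs anywhere without a server; the query is the one shown above, with an ORDER BY added so the output order is predictable.

```python
import sqlite3

# Sample rows copied from the Salesperson, Customer and Orders tables above.
SALESPERSON = [(1, "Abe", 61, 140000), (2, "Bob", 34, 44000), (5, "Chris", 34, 40000),
               (7, "Dan", 41, 52000), (8, "Ken", 57, 115000), (11, "Joe", 38, 38000)]
CUSTOMER = [(4, "Samsonic", "pleasant", "J"), (6, "Panasung", "oaktown", "J"),
            (7, "Samony", "jackson", "B"), (9, "Orange", "Jackson", "B")]
ORDERS = [(10, "8/2/96", 4, 2, 540), (20, "1/30/99", 4, 8, 1800), (30, "7/14/95", 9, 1, 460),
          (40, "1/29/98", 7, 2, 2400), (50, "2/3/98", 6, 7, 600), (60, "3/2/98", 6, 7, 720),
          (70, "5/6/98", 9, 7, 150)]


def build_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database holding the three sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Salesperson (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)")
    conn.execute("CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, IndustryType TEXT)")
    conn.execute("CREATE TABLE Orders (Number INTEGER PRIMARY KEY, order_date TEXT, "
                 "cust_id INTEGER, salesperson_id INTEGER, Amount INTEGER)")
    conn.executemany("INSERT INTO Salesperson VALUES (?, ?, ?, ?)", SALESPERSON)
    conn.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)", CUSTOMER)
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", ORDERS)
    return conn


# The Part B query; the inner SELECT lists every salesperson with a Samsonic order.
PART_B = """
select Salesperson.Name from Salesperson
where Salesperson.ID NOT IN (
    select Orders.salesperson_id from Orders, Customer
    where Orders.cust_id = Customer.ID
    and Customer.Name = 'Samsonic')
order by Salesperson.ID
"""

conn = build_db()
names = [row[0] for row in conn.execute(PART_B)]
print(names)  # ['Abe', 'Chris', 'Dan', 'Joe']
```

Bob (ID 2) and Ken (ID 8) are excluded because orders 10 and 20 belong to Samsonic. One caution about NOT IN: if the subquery could ever return a NULL salesperson_id, NOT IN would return no rows at all, so NOT EXISTS is often the safer choice when that column is nullable.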
Now, let’s work on answering part C. As always, it’s best to break the problem down into more manageable
pieces. So, let’s focus on one table: the Orders table. Looking at that table we can find the IDs that belong
to the salespeople who have 2 or more orders. This will require the "group by" syntax in SQL, which
allows us to group by whatever column we choose. In this case, the column that we would be grouping by is
the salesperson_id column, because for a given salesperson ID we would like to find out how many orders
were placed under that ID. With that said, we can write this SQL:
select salesperson_id from Orders group by
salesperson_id having count(salesperson_id) > 1
Note that we used the having clause instead of the where clause: WHERE filters individual rows before they
are grouped, so it cannot test an aggregate such as ‘count’, while HAVING filters the groups after the
aggregate has been computed. Now we have a SQL statement that gives us the IDs of the salespeople who have
more than 1 order. But what we really want is the names of the salespeople who have those IDs. This is
actually quite simple if we do a join on the Salesperson and Orders tables and reuse the SQL that we came up
with earlier. It would look like this:
SELECT name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.id
GROUP BY name, salesperson_id
HAVING COUNT( salesperson_id ) >1
Based on our tables, this SQL will return the names of Bob and Dan.
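Both steps of part C can be checked with a minimal sketch, assuming an in-memory SQLite database (via Python's sqlite3 module) loaded with the Salesperson and Orders rows above. ORDER BY clauses are added only to make the output order predictable.

```python
import sqlite3

# Sample rows copied from the Salesperson and Orders tables above.
SALESPERSON = [(1, "Abe", 61, 140000), (2, "Bob", 34, 44000), (5, "Chris", 34, 40000),
               (7, "Dan", 41, 52000), (8, "Ken", 57, 115000), (11, "Joe", 38, 38000)]
ORDERS = [(10, "8/2/96", 4, 2, 540), (20, "1/30/99", 4, 8, 1800), (30, "7/14/95", 9, 1, 460),
          (40, "1/29/98", 7, 2, 2400), (50, "2/3/98", 6, 7, 600), (60, "3/2/98", 6, 7, 720),
          (70, "5/6/98", 9, 7, 150)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salesperson (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)")
conn.execute("CREATE TABLE Orders (Number INTEGER PRIMARY KEY, order_date TEXT, "
             "cust_id INTEGER, salesperson_id INTEGER, Amount INTEGER)")
conn.executemany("INSERT INTO Salesperson VALUES (?, ?, ?, ?)", SALESPERSON)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", ORDERS)

# Step 1: HAVING keeps only the groups (salesperson IDs) with more than one order.
ids = [row[0] for row in conn.execute("""
    select salesperson_id from Orders
    group by salesperson_id
    having count(salesperson_id) > 1
    order by salesperson_id""")]

# Step 2: join with Salesperson so each qualifying ID is reported by name.
names = [row[0] for row in conn.execute("""
    SELECT name
    FROM Orders, Salesperson
    WHERE Orders.salesperson_id = Salesperson.id
    GROUP BY name, salesperson_id
    HAVING COUNT(salesperson_id) > 1
    ORDER BY name""")]

print(ids)    # [2, 7]
print(names)  # ['Bob', 'Dan']
```

Bob (ID 2) has orders 10 and 40, and Dan (ID 7) has orders 50, 60 and 70; every other salesperson has at most one order.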
We’ve finally come to the last part of this question. Question D is presented below again for your
convenience.
Part D: Write a SQL statement to insert rows into a table called highAchiever(Name, Age), where
a salesperson must have a salary of 100,000 or greater to be included in the table.
Looking at part D, it’s easy to come up with the SQL to specify the condition that the salary of the
salesperson must be greater than or equal to 100,000. It would look like this: "WHERE SALARY >= 100000". The
only slightly difficult part of this question is how we insert values into the highAchiever table while selecting
values from the salesperson table. It turns out that the SQL for this is:
insert into highAchiever (name, age)
select name, age from salesperson where salary >= 100000;
Because we are inserting values into the highAchiever table based on what we select from another table,
we don’t use the "Values" clause that we would normally use when inserting. This is what a regular insertion
would look like (note the use of the "values" clause):
insert into highAchiever(name, age) values ('Jackson', 28)
As you can see, the answer to this one is pretty simple.
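Here is a minimal sketch of both kinds of insertion, assuming an in-memory SQLite database (through Python's sqlite3 module) and a highAchiever table with just the Name and Age columns from the problem statement. The INSERT ... SELECT uses >= so that a salary of exactly 100,000 qualifies, matching the wording "100,000 or greater".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salesperson (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)")
conn.executemany("INSERT INTO Salesperson VALUES (?, ?, ?, ?)", [
    (1, "Abe", 61, 140000), (2, "Bob", 34, 44000), (5, "Chris", 34, 40000),
    (7, "Dan", 41, 52000), (8, "Ken", 57, 115000), (11, "Joe", 38, 38000)])
conn.execute("CREATE TABLE highAchiever (Name TEXT, Age INTEGER)")

# INSERT ... SELECT: the new rows come from a query, so there is no VALUES clause.
conn.execute("""
    insert into highAchiever (name, age)
    select name, age from salesperson where salary >= 100000""")
from_select = conn.execute("select name, age from highAchiever order by name").fetchall()
print(from_select)  # [('Abe', 61), ('Ken', 57)]

# A regular single-row insertion supplies the values directly.
conn.execute("insert into highAchiever (name, age) values ('Jackson', 28)")
everyone = conn.execute("select name, age from highAchiever order by name").fetchall()
print(everyone)  # [('Abe', 61), ('Jackson', 28), ('Ken', 57)]
```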
Practice SQL Interview Question #2
This question was asked in a Google interview: Given the 2 tables below, User and UserHistory:
User
user_id  name  phone_num
UserHistory
user_id  date  action
1. Write a SQL query that returns the name, phone number and most recent date for any user
that has logged in over the last 30 days (you can tell a user has logged in if the action field in
UserHistory is set to "logged_on").
Every time a user logs in a new row is inserted into the UserHistory table with user_id, current
date and action (where action = "logged_on").
2. Write a SQL query to determine which user_ids in the User table are not contained in the
UserHistory table (assume the UserHistory table has a subset of the user_ids in User table). Do
not use the SQL MINUS statement. Note: the UserHistory table can have multiple entries for each
user_id.
Note that your SQL should be compatible with MySQL 5.0, and avoid using subqueries.
Let’s start with #1 by breaking down the problem into smaller, more manageable problems. Then we can
take the pieces and combine them to provide a solution to the overall problem.
Figuring out how to tell whether a user has logged on in the past 30 days seems like a good place to
start. We want to see how we can express this in MySQL. MySQL has a "date_sub" function that takes a date
and an interval: if we pass it the current date (as in today’s date) and an interval of 30 days, it returns
the date 30 days before today. Once we have that date, we can compare it with the date in the UserHistory
table to see if it falls within the last 30 days. One question that remains is how we will retrieve the
current date. This is simple, because MySQL comes with a built-in function called curdate() that returns
the current date.
So, using the date_sub function, we can come up with this piece of SQL:
UserHistory.date >= date_sub(curdate(), interval 30 day)
This will check to see that the date in the UserHistory table falls within the last 30 days. Note that we use
the ">=" operator to compare dates – in this case, we are simply saying that the date in the UserHistory
table is greater than or equal to the date returned from the date_sub function. A date is "greater" than
another date when it occurs later in time. So, 2007-09-07 will be considered "greater" than 2006-08-19,
because 2007-09-07 occurs later than 2006-08-19.
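The date logic can be mirrored outside the database with a small Python sketch. Here `today - timedelta(days=30)` plays the role of date_sub(curdate(), interval 30 day), and the fixed "today" value is a hypothetical stand-in for curdate() so the results are repeatable.

```python
from datetime import date, timedelta


def within_last_30_days(event_day: date, today: date) -> bool:
    """Mirror of: UserHistory.date >= date_sub(curdate(), interval 30 day)."""
    cutoff = today - timedelta(days=30)  # plays the role of date_sub(...)
    return event_day >= cutoff


today = date(2024, 6, 30)  # hypothetical value standing in for curdate()
cutoff = today - timedelta(days=30)
print(cutoff)                                         # 2024-05-31
print(date(2007, 9, 7) > date(2006, 8, 19))           # True: the later date is "greater"
print(within_last_30_days(date(2024, 5, 31), today))  # True: the boundary day counts (>=)
print(within_last_30_days(date(2024, 5, 30), today))  # False
```

Because the comparison is >=, a login exactly 30 days ago still counts; switching to > would exclude that boundary day.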
Now, that’s only one piece of the overall problem, so let’s continue. The problem asks us to retrieve
the name, phone number, and the most recent date for any user that’s logged in over the last 30 days.
We have one table with the user_id and the phone number, but only the other table contains the actual
date. Clearly, we will have to do a join on the 2 tables in order to combine the data into a form that will
allow us to solve this problem. And since the 2 tables only share one column – the user_id column – it’s
clear what common column we will use to join the 2 tables. Doing a join, selecting the required fields, and
using the date condition will look like this:
select name, phone_num, date from User, UserHistory
where User.user_id=UserHistory.user_id
and UserHistory.date >= date_sub(curdate(), interval 30 day)
So far, we are selecting the name, phone number, and the date for any user that’s logged in over the last
30 days. But, wait a minute – the problem specifically asks for "the most recent date for any user that’s
logged in over the last 30 days." The problem with this is that we could get multiple entries for a user that
logged on more than once in the last 30 days. That is not what we want – we want to see the most recent
date that someone logged on in the last 30 days – this will return a maximum of 1 entry per user.
Now, the question is how do we get the most recent date? This is quite simple again, as MySQL provides a
MAX aggregate function that we can use to find the most recent date. Given a group of dates, the MAX
function will return the "maximum" date – which is simply the most recent date (the latest one). Because
this is an aggregate function, we will have to provide a GROUP BY clause to specify which column defines
the groups of dates; grouping by user_id gives us one group, and therefore one maximum date, per user. So,
now our SQL looks like this:
select User.name, User.phone_num, max(UserHistory.date)
from User, UserHistory
where User.user_id = UserHistory.user_id and
UserHistory.date >= date_sub(curdate(), interval 30 day)
group by (User.user_id);
Now all we need is to add the condition that checks to see that the user’s action equals "logged_on". So, the
final SQL, and the answer to the problem looks like this:
select User.name, User.phone_num, max(UserHistory.date)
from User, UserHistory
where User.user_id = UserHistory.user_id
and UserHistory.action = 'logged_on'
and UserHistory.date >= date_sub(curdate(), interval 30 day)
group by (User.user_id);
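To see the final query in action, here is a minimal sketch using Python's sqlite3 module with a few hypothetical User and UserHistory rows (the original question gives no sample data). SQLite has no date_sub() or curdate(), so this sketch swaps in SQLite's date(value, '-30 days') with a fixed "today" parameter. It also lists name and phone_num in the GROUP BY, which stricter databases require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (user_id INTEGER PRIMARY KEY, name TEXT, phone_num TEXT)")
conn.execute("CREATE TABLE UserHistory (user_id INTEGER, date TEXT, action TEXT)")
# Hypothetical rows; dates are ISO-8601 text so they compare correctly as strings.
conn.executemany("INSERT INTO User VALUES (?, ?, ?)", [
    (1, "Ann", "555-0101"), (2, "Ben", "555-0102"), (3, "Cy", "555-0103")])
conn.executemany("INSERT INTO UserHistory VALUES (?, ?, ?)", [
    (1, "2024-06-01", "logged_on"),
    (1, "2024-06-20", "logged_on"),   # Ann's most recent login
    (1, "2024-06-25", "logged_off"),  # not a login, so it must be ignored
    (2, "2024-04-02", "logged_on"),   # older than 30 days
])

QUERY = """
select User.name, User.phone_num, max(UserHistory.date)
from User, UserHistory
where User.user_id = UserHistory.user_id
  and UserHistory.action = 'logged_on'
  and UserHistory.date >= date(:today, '-30 days')
group by User.user_id, User.name, User.phone_num
"""
rows = conn.execute(QUERY, {"today": "2024-06-30"}).fetchall()
print(rows)  # [('Ann', '555-0101', '2024-06-20')]
```

Ann's "logged_off" row is later than her last login but is filtered out by the action condition, Ben's only login is too old, and Cy has no history at all.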
Phew! We are finally done with question 1.
Let’s continue with the 2nd question, presented again below…
2. Given the tables above, write a SQL query to determine which user_ids in the User table are
not contained in the UserHistory table (assume the UserHistory table has a subset of the
user_ids in User table). Do not use the SQL MINUS statement. Note: the UserHistory table can
have multiple entries for each user_id.
Note that your SQL should be compatible with MySQL 5.0, and avoid using subqueries.
Basically we want the user_ids that exist in the User table but not in the UserHistory table. If
we do a regular inner join on the user_id column, then that would just do a join on all the
rows in which the User and UserHistory table share the same user_id values. But the question
specifically asks for just the user_ids that are in the User table, but are not in the UserHistory
table. So, using an inner join will not work.
What if, instead of an inner join, we use a left outer join on the user_id column? This will allow
us to retain all the user_id values from the User table (which will be our "left" table) even when
there is no matching user_id entry in the "right" table (in this case, the UserHistory table). When
there is no matching record in the "right" table, that table’s columns (including its user_id column)
will just show up as NULL. This means that any row where the UserHistory user_id is NULL holds a user_id
value that exists in the User table but not in the UserHistory table. This is exactly what we need to
answer the question. So, here’s what the SQL will look like:
select distinct u.user_id
from User as u
left join UserHistory as uh on u.user_id=uh.user_id
where uh.user_id is null
You may be confused by the "User as u" and the "UserHistory as uh" syntax. Those are what’s
called aliases. Aliases allow us to assign a shorter name to a table, and it makes for cleaner and
more compact SQL. In the example above, "u" will actually be another name for the "User" table
and "uh" will be another name for the "UserHistory" table.
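We also use the distinct keyword. This will ensure that each user_id is returned only once. Strictly
speaking it is a safeguard here: a user with no UserHistory rows matches nothing, so the left join already
produces exactly one row for that user.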
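The left-join pattern can be tried with a minimal sketch using Python's sqlite3 module. The rows are hypothetical (the question gives no sample data): users 2 and 4 never appear in UserHistory, while user 1 appears twice to show that repeated history rows do not matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (user_id INTEGER PRIMARY KEY, name TEXT, phone_num TEXT)")
conn.execute("CREATE TABLE UserHistory (user_id INTEGER, date TEXT, action TEXT)")
# Hypothetical rows: users 2 and 4 have no UserHistory entries.
conn.executemany("INSERT INTO User VALUES (?, ?, ?)", [
    (1, "Ann", "555-0101"), (2, "Ben", "555-0102"),
    (3, "Cy", "555-0103"), (4, "Di", "555-0104")])
conn.executemany("INSERT INTO UserHistory VALUES (?, ?, ?)", [
    (1, "2024-06-01", "logged_on"), (1, "2024-06-20", "logged_on"),
    (3, "2024-06-15", "logged_on")])

# Unmatched User rows get NULLs in every UserHistory column; keep only those.
missing = [row[0] for row in conn.execute("""
    select distinct u.user_id
    from User as u
    left join UserHistory as uh on u.user_id = uh.user_id
    where uh.user_id is null
    order by u.user_id""")]
print(missing)  # [2, 4]
```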
That concludes our series of practice SQL interview questions. If you are looking for some more
advanced and challenging SQL interview questions, then check out our other article: Advanced SQL
practice questions.
Advanced SQL Interview Questions and Answers
Here are some complex SQL interview problems that are for people who are looking for more advanced and
challenging questions, along with the answers and complete explanations. Try to figure out the answer to
the questions yourself before reading the answers.
Suppose we have 2 tables called Orders and Salesperson shown below:
Salesperson
ID  Name   Age  Salary
1   Abe    61   140000
2   Bob    34   44000
5   Chris  34   40000
7   Dan    41   52000
8   Ken    57   115000
11  Joe    38   38000
Orders
Number  order_date  cust_id  salesperson_id  Amount
10      8/2/96      4        2               540
20      1/30/99     4        8               1800
30      7/14/95     9        1               460
40      1/29/98     7        2               2400
50      2/3/98      6        7               600
60      3/2/98      6        7               720
70      5/6/98      9        7               150
Now suppose that we want to write SQL that must conform to the SQL standard.
We want to retrieve the names of all salespeople that have more than 1 order from the tables
above. You can assume that each salesperson only has one ID.
If that is the case, then what (if anything) is wrong with the following SQL?
SELECT Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT( salesperson_id ) >1
The answer and explanation to advanced SQL question 1
There is definitely something wrong with the SQL above, and it is probably something that many beginner
SQL programmers will not notice. The problem is that the SQL Standard says that we cannot select a
column that is not part of the group by clause unless it is also contained within an aggregate function. If we
try to run the SQL above in SQL Server, we would get an error that looks like this:
Column 'Name' is invalid in the select list because it is
not contained in either an aggregate function or
the GROUP BY clause.
You might be confused now, so let’s explain what that error means in plain English and through some
simple examples. The most important thing you should take out of this discussion is understanding
exactly why we get that error, and how to avoid it. There is a good reason for the error – read on to
understand why.
You can see in the bad SQL above that the “Name” column is clearly not part of the group by
clause, nor is it contained within an aggregate function (like SUM, MAX, etc).
As the error above suggests, we can fix the error by either wrapping the Name column inside an aggregate
function or adding it to the Group By clause.
So if we want to write SQL that complies with the standard, then we could write something like this by
adding the Name column to the Group By:
SELECT Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id, Name
-- we added the name column to the group by, and now it works!
HAVING COUNT( salesperson_id ) >1
The SQL above will run just fine without giving any error.
We could also fix the problem by putting the Name column in any aggregate function, and then simply make
that a part of our select statement. So, we could just write this SQL instead, and it would be perfectly legal
according to the SQL standard. We chose to use the MAX aggregate function, but any other aggregate would
work just fine as well:
SELECT MAX(Name) --put name in an aggregate function
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT( salesperson_id ) >1
Adding the Name column to the group by, or wrapping the Name column in an aggregate, will certainly fix
the error – but it’s very important to note that, in general, either change can alter the data that is
returned to a state that you may not want.
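Here is a minimal sketch that runs both standard-conforming rewrites against the sample tables, assuming an in-memory SQLite database through Python's sqlite3 module. SQLite is lenient about this rule (it would accept the original query too), so the sketch cannot reproduce the SQL Server error. It only confirms that both fixes return the same names for this data, because every salesperson ID maps to exactly one name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salesperson (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)")
conn.execute("CREATE TABLE Orders (Number INTEGER PRIMARY KEY, order_date TEXT, "
             "cust_id INTEGER, salesperson_id INTEGER, Amount INTEGER)")
conn.executemany("INSERT INTO Salesperson VALUES (?, ?, ?, ?)", [
    (1, "Abe", 61, 140000), (2, "Bob", 34, 44000), (5, "Chris", 34, 40000),
    (7, "Dan", 41, 52000), (8, "Ken", 57, 115000), (11, "Joe", 38, 38000)])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", [
    (10, "8/2/96", 4, 2, 540), (20, "1/30/99", 4, 8, 1800), (30, "7/14/95", 9, 1, 460),
    (40, "1/29/98", 7, 2, 2400), (50, "2/3/98", 6, 7, 600), (60, "3/2/98", 6, 7, 720),
    (70, "5/6/98", 9, 7, 150)])

# Fix 1: Name is added to the GROUP BY.
STANDARD_GROUP_BY = """
    SELECT Name
    FROM Orders, Salesperson
    WHERE Orders.salesperson_id = Salesperson.ID
    GROUP BY salesperson_id, Name
    HAVING COUNT(salesperson_id) > 1"""

# Fix 2: Name is wrapped in an aggregate function.
STANDARD_AGGREGATE = """
    SELECT MAX(Name)
    FROM Orders, Salesperson
    WHERE Orders.salesperson_id = Salesperson.ID
    GROUP BY salesperson_id
    HAVING COUNT(salesperson_id) > 1"""

by_group = sorted(row[0] for row in conn.execute(STANDARD_GROUP_BY))
by_aggregate = sorted(row[0] for row in conn.execute(STANDARD_AGGREGATE))
print(by_group, by_aggregate)  # ['Bob', 'Dan'] ['Bob', 'Dan']
```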
Why does the selected column have to be in the group by clause or part of an aggregate function?
So, now you understand how to fix the error – but do you understand why it is a problem in the first
place? Well, you should – because that is the most important thing to understand! So, let’s explain
some more about why SQL gives the error shown above.
First off, let’s talk a little bit more about aggregate functions. You probably know what aggregate functions
in SQL are – we used one in the example above. In case you forgot, an aggregate function takes all of the
values of the column passed into it, across a group of rows, and computes a single value from them. Here
are some of the commonly used aggregate functions:
AVG() - Returns the average value
COUNT() - Returns the number of rows
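FIRST() - Returns the first value (not part of standard SQL; only some products, such as Microsoft Access, support it)
LAST() - Returns the last value (not part of standard SQL; only some products, such as Microsoft Access, support it)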
MAX() - Returns the largest value
MIN() - Returns the smallest value
SUM() - Returns the sum
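As a quick illustration of how each aggregate collapses a whole column into one value, here is a minimal sketch over the Amount column of the Orders table from question 1, assuming an in-memory SQLite database through Python's sqlite3 module. FIRST() and LAST() are left out because SQLite does not provide them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Number INTEGER PRIMARY KEY, order_date TEXT, "
             "cust_id INTEGER, salesperson_id INTEGER, Amount INTEGER)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", [
    (10, "8/2/96", 4, 2, 540), (20, "1/30/99", 4, 8, 1800), (30, "7/14/95", 9, 1, 460),
    (40, "1/29/98", 7, 2, 2400), (50, "2/3/98", 6, 7, 600), (60, "3/2/98", 6, 7, 720),
    (70, "5/6/98", 9, 7, 150)])

# With no GROUP BY, the whole table is one group, so each aggregate yields a single value.
avg_amt, n, max_amt, min_amt, total = conn.execute(
    "SELECT AVG(Amount), COUNT(*), MAX(Amount), MIN(Amount), SUM(Amount) FROM Orders"
).fetchone()
print(n, max_amt, min_amt, total)  # 7 2400 150 6670
print(round(avg_amt, 2))           # 952.86
```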
To illustrate why the SQL standard says that a selected column has to be in the group by clause or part of
an aggregate function, let’s use another example. Suppose we have some tables called Starbucks_Stores
and Starbucks_Employees. In case you don’t already know, Starbucks is a popular coffee shop/cafe in the
USA:
Starbucks_Employees
ID  Name   Age  HourlyRate  StoreID
1   Abe    61   14          10
2   Bob    34   10          30
5   Chris  34   9           40
7   Dan    41   11          50
8   Ken    57   11          60
11  Joe    38   13          70
Starbucks_Stores
store_id  city
10        San Francisco
20        Los Angeles
30        San Francisco
40        Los Angeles
50        San Francisco
60        New York
70        San Francisco
Now, given the tables above let’s say that we write some SQL like this:
SELECT count(*) as num_employees, HourlyRate
FROM Starbucks_Employees JOIN Starbucks_Stores
ON Starbucks_Employees.StoreID = Starbucks_Stores.store_id
GROUP BY city
It looks like the SQL above would just return the number of Starbucks employees in each city, along with
the HourlyRate – because it will group the employees based on whatever city they work in (thanks to the
“group by city” statement).
The problem with selecting a non-aggregate column that is not in the group by
But the real question here is what exactly would be returned for the HourlyRate in the SQL above? Would
it return every employee’s hourly rate separated by commas? Since we group by city, will it return the
highest hourly rate for each city? Will it return the hourly rate as a distinct list, so those 2 guys making 11
dollars an hour will have the 11 returned only once?
The problem here is that we do not know what will be returned because we are not specific enough with
what we are asking for in the SQL! If what we are asking for is not specific enough, then the database
will not know what to return.
This is why almost all database implementations return an error when the SQL above is run (with the
notable exception of MySQL) – and this is why the SQL does not conform to the Standard. In SQL Server
running the SQL above will return the same error that we showed earlier.
Let’s explain even further in case the problem with that SQL is not crystal clear. The order of operations in
which things will happen with the SQL above is:
1. The 2 tables are joined on the condition that the
Starbucks_Employees.StoreID column value is equal to the
Starbucks_Stores.store_id column values.
2. Groups are then created for each city - which means that
each distinct city will have its own "group". So, there will
be a total of 3 groups, one each for San Francisco, New York,
and Los Angeles.
3. The data we are interested in is selected from each group
that is created in step 2.
Because we end up with different groups based on the city, when we select a count(*), that will find the
total count of rows in each and every group. But, the problem is that when we select HourlyRate, there will
be multiple values for the HourlyRate within each group. For example, for the group created by the city of
San Francisco there will be 4 different values for the HourlyRate – 14, 10, 11, and 13.
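The three steps above can be mimicked in plain Python to make the ambiguity concrete. This sketch only illustrates the logic; it is not how a database engine actually executes the query.

```python
from collections import defaultdict

# (ID, Name, Age, HourlyRate, StoreID) rows from Starbucks_Employees above.
EMPLOYEES = [(1, "Abe", 61, 14, 10), (2, "Bob", 34, 10, 30), (5, "Chris", 34, 9, 40),
             (7, "Dan", 41, 11, 50), (8, "Ken", 57, 11, 60), (11, "Joe", 38, 13, 70)]
# store_id -> city, from Starbucks_Stores above.
STORES = {10: "San Francisco", 20: "Los Angeles", 30: "San Francisco", 40: "Los Angeles",
          50: "San Francisco", 60: "New York", 70: "San Francisco"}

# Step 1: inner join on StoreID = store_id.
joined = [(emp, STORES[emp[4]]) for emp in EMPLOYEES if emp[4] in STORES]

# Step 2: one group per distinct city.
groups = defaultdict(list)
for emp, city in joined:
    groups[city].append(emp)

# Step 3: count(*) gives one number per group, but HourlyRate gives a whole list.
counts = {city: len(members) for city, members in groups.items()}
rates_by_city = {city: [emp[3] for emp in members] for city, members in groups.items()}
print(counts)         # {'San Francisco': 4, 'Los Angeles': 1, 'New York': 1}
print(rates_by_city)  # {'San Francisco': [14, 10, 11, 13], 'Los Angeles': [9], 'New York': [11]}
```

The San Francisco entry holds four candidate rates, and nothing in the SQL says which one to pick.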
So the question is which value of the HourlyRate should be selected from each group? Well, it could be
any one of those values – which is why that SQL results in an error. This is because what we are asking
for is NOT specific enough – hopefully this is crystal clear now to you.
If the same HourlyRate were part of an aggregate function like MAX then it would simply return the highest
HourlyRate within each group. And that is why having an aggregate function would fix the SQL error –
because only one value will be selected from any given group.
So, this SQL is perfectly fine because we are more specific in what we ask for – but this SQL would only
work for you if you actually want the highest HourlyRate for each city:
SELECT count(*) as num_employees, MAX(HourlyRate)
FROM Starbucks_Employees JOIN Starbucks_Stores
ON Starbucks_Employees.StoreID = Starbucks_Stores.store_id
GROUP BY city
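Here is a minimal sketch of the MAX(HourlyRate) version, assuming an in-memory SQLite database through Python's sqlite3 module. The city column and an ORDER BY are added to the select list purely so each row is labeled and the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Starbucks_Employees (ID INTEGER PRIMARY KEY, Name TEXT, "
             "Age INTEGER, HourlyRate INTEGER, StoreID INTEGER)")
conn.execute("CREATE TABLE Starbucks_Stores (store_id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO Starbucks_Employees VALUES (?, ?, ?, ?, ?)", [
    (1, "Abe", 61, 14, 10), (2, "Bob", 34, 10, 30), (5, "Chris", 34, 9, 40),
    (7, "Dan", 41, 11, 50), (8, "Ken", 57, 11, 60), (11, "Joe", 38, 13, 70)])
conn.executemany("INSERT INTO Starbucks_Stores VALUES (?, ?)", [
    (10, "San Francisco"), (20, "Los Angeles"), (30, "San Francisco"), (40, "Los Angeles"),
    (50, "San Francisco"), (60, "New York"), (70, "San Francisco")])

# MAX() picks exactly one HourlyRate per city group, so the result is well defined.
rows = conn.execute("""
    SELECT city, count(*) AS num_employees, MAX(HourlyRate) AS max_rate
    FROM Starbucks_Employees JOIN Starbucks_Stores
    ON Starbucks_Employees.StoreID = Starbucks_Stores.store_id
    GROUP BY city
    ORDER BY city""").fetchall()
print(rows)  # [('Los Angeles', 1, 9), ('New York', 1, 11), ('San Francisco', 4, 14)]
```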
Fix the error by adding column to the group clause
Another way to fix the error is to simply add the HourlyRate column to the group by clause. This also means
that having the HourlyRate column wrapped in aggregate function is no longer necessary. So you could write
some SQL like this and it would fix the error:
SELECT count(*) as num_employees, HourlyRate
FROM Starbucks_Employees JOIN Starbucks_Stores
ON Starbucks_Employees.StoreID = Starbucks_Stores.store_id
GROUP BY city, HourlyRate
This would then create groups based on the unique combination of the values in the HourlyRate and City
columns. This means that there will be a different group for each HourlyRate and City combination – so $11,
San Francisco and $11, New York will be 2 different groups. If you need to read up more on this topic
then you can go here: Group By With Multiple Columns
With the SQL above, each group will only have one value for the HourlyRate, which also means that there
will be no ambiguity or confusion when selecting the HourlyRate since there is only one possible value to select.
It is now very clear that one and only one HourlyRate value can be returned for each group.
Adding the column to the group by clause fixes the error but will alter the data that is returned
But, one very important thing to note is that even though adding the column to the group by will fix the
error, it will also change the groups that are created. This means that the data returned will be completely
different from what was returned before. So, the count(*) function will no longer return the count of
employees in a given city, and will instead return the number of rows in each group created by the unique
combination of the HourlyRate and city columns.
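The change in meaning is easy to see side by side. This minimal sketch (again assuming an in-memory SQLite database via Python's sqlite3 module, with city added to the select list for labeling) runs the count per city and the count per (city, HourlyRate) combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Starbucks_Employees (ID INTEGER PRIMARY KEY, Name TEXT, "
             "Age INTEGER, HourlyRate INTEGER, StoreID INTEGER)")
conn.execute("CREATE TABLE Starbucks_Stores (store_id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO Starbucks_Employees VALUES (?, ?, ?, ?, ?)", [
    (1, "Abe", 61, 14, 10), (2, "Bob", 34, 10, 30), (5, "Chris", 34, 9, 40),
    (7, "Dan", 41, 11, 50), (8, "Ken", 57, 11, 60), (11, "Joe", 38, 13, 70)])
conn.executemany("INSERT INTO Starbucks_Stores VALUES (?, ?)", [
    (10, "San Francisco"), (20, "Los Angeles"), (30, "San Francisco"), (40, "Los Angeles"),
    (50, "San Francisco"), (60, "New York"), (70, "San Francisco")])

JOIN = """FROM Starbucks_Employees JOIN Starbucks_Stores
          ON Starbucks_Employees.StoreID = Starbucks_Stores.store_id"""

# One group per city: count(*) is the number of employees in that city.
by_city = dict(conn.execute(f"SELECT city, count(*) {JOIN} GROUP BY city"))

# One group per (city, HourlyRate) pair: count(*) now counts rows in each pair.
by_city_rate = conn.execute(
    f"SELECT city, HourlyRate, count(*) {JOIN} "
    "GROUP BY city, HourlyRate ORDER BY city, HourlyRate").fetchall()

print(by_city)       # {'Los Angeles': 1, 'New York': 1, 'San Francisco': 4}
print(by_city_rate)  # San Francisco now splits into four groups of 1
```

San Francisco's single group of 4 employees becomes four separate groups, each with a count of 1.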
MySQL – selecting non-aggregate columns not in the group by
One very important thing that you should know is that MySQL actually allows you to have non-aggregated
columns in the select list even if they are not a part of the group by clause (a quick side note: a non-aggregated
column is simply a column that is not wrapped within an aggregate function). What this means is that you will
not receive an error if you try to run any of the “bad” SQL above in MySQL. (This describes MySQL’s traditional
default behavior; since MySQL 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, and it rejects such
queries unless the column is functionally dependent on the grouped columns.) The reason it is allowed in MySQL
is that MySQL assumes that you know what you are doing – and it does actually make sense in some scenarios.
For instance, let’s refer back to the SQL that we started with:
SELECT Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT( salesperson_id ) >1
The reason the original SQL code (presented above) works just fine in MySQL is that there is a 1 to 1
mapping of salesperson name to ID – meaning that for every unique salesperson ID there is only one
possible name. Another way of saying that is that each salesperson can only have one name. So when we
create groups (which is done in the “GROUP BY salesperson_id”) based on the salesperson ID, each group
will have one and only one name.
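SQLite, like MySQL's traditional default, accepts such "bare" non-aggregated columns, so the original query can be run unchanged in a minimal sketch using Python's sqlite3 module. Because each salesperson ID maps to exactly one name, the bare Name column is unambiguous here. The names are sorted in Python because the query itself has no ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salesperson (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)")
conn.execute("CREATE TABLE Orders (Number INTEGER PRIMARY KEY, order_date TEXT, "
             "cust_id INTEGER, salesperson_id INTEGER, Amount INTEGER)")
conn.executemany("INSERT INTO Salesperson VALUES (?, ?, ?, ?)", [
    (1, "Abe", 61, 140000), (2, "Bob", 34, 44000), (5, "Chris", 34, 40000),
    (7, "Dan", 41, 52000), (8, "Ken", 57, 115000), (11, "Joe", 38, 38000)])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", [
    (10, "8/2/96", 4, 2, 540), (20, "1/30/99", 4, 8, 1800), (30, "7/14/95", 9, 1, 460),
    (40, "1/29/98", 7, 2, 2400), (50, "2/3/98", 6, 7, 600), (60, "3/2/98", 6, 7, 720),
    (70, "5/6/98", 9, 7, 150)])

# The original, non-standard query: Name is neither grouped nor aggregated.
rows = conn.execute("""
    SELECT Name
    FROM Orders, Salesperson
    WHERE Orders.salesperson_id = Salesperson.ID
    GROUP BY salesperson_id
    HAVING COUNT(salesperson_id) > 1""").fetchall()
names = sorted(row[0] for row in rows)
print(names)  # ['Bob', 'Dan']
```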
This SQL will also run just fine in MySQL without returning an error:
SELECT count(*) as num_employees, HourlyRate
FROM Starbucks_Employees JOIN Starbucks_Stores
ON Starbucks_Employees.StoreID = Starbucks_Stores.store_id
GROUP BY city
But, even though the code above will not return an error, the HourlyRate that is returned by MySQL will be
some arbitrary (indeterminate) value within each group. This is because when we create each group based on the
city, each group can have different values for the HourlyRate.
In other words, there is no one to one mapping between the HourlyRate and the city like we had before with
the salesperson ID and the name. So, because we are not being specific as to which HourlyRate we want,
MySQL will return an arbitrary value. For instance, in the group created by the city of San Francisco, MySQL
could return the HourlyRate for any employee who works in San Francisco – whether it is 14, 10, 11, or 13
we don’t really know, since MySQL makes no guarantee about which one it picks.
That concludes part 1 of our more difficult and complex SQL questions.
This is part 2 of our advanced practice SQL interview questions and answers. We highly suggest that you
read part 1 of our Advanced SQL interview Questions before reading this, since a lot of the concepts
presented in this portion are discussed in more depth in part 1.
The problem is based on the tables presented below where salespeople have orders with certain customers
that are in the Customer table.
Salesperson
ID  Name   Age  Salary
1   Abe    61   140000
2   Bob    34   44000
5   Chris  34   40000
7   Dan    41   52000
8   Ken    57   115000
11  Joe    38   38000
Customer
ID  Name      City      Industry Type
4   Samsonic  pleasant  J
6   Panasung  oaktown   J
7   Samony    jackson   B
9   Orange    Jackson   B
Orders
Number  order_date  cust_id  salesperson_id  Amount
10      8/2/96      4        2               2400
20      1/30/99     4        8               1800
30      7/14/95     9        1               460
40      1/29/98     7        2               540
50      2/3/98      6        7               600
60      3/2/98      6        7               720
70      5/6/98      9        7               150
In the tables above, each order in the Orders table is associated with a given Customer through
the cust_id foreign key column that references the ID column in the Customer table.
Here is the problem: find the largest order amount for each salesperson and the associated order
number, along with the customer to whom that order belongs. You can present your answer in
any database’s SQL – MySQL, Microsoft SQL Server, Oracle, etc.
The answer to the problem and explanation
This question seems to be quite simple – but as you will soon find out it is deceptively complex. For each
salesperson, all we need to retrieve is the largest order amount, and the associated order number. In
order to retrieve that information we should be able to simply do a join between the Orders and Salesperson
tables wherever the Salesperson.ID is equal to the Orders.salesperson_id (this would be our join predicate).
Then, we could group the results of that join by the Orders.salesperson_id column and retrieve both the
highest valued order (by using max(Amount)), and the associated Order number.
Let’s say that we choose to write our answer in MySQL. In MySQL we could legally write some code that
looks like this:
SELECT Orders.Number, max(Amount)
FROM Orders JOIN Salesperson
ON Salesperson.ID = Orders.salesperson_id
GROUP BY Orders.salesperson_id
And, if we run that code above it will return this as a result:
Number  max(Amount)
30      460
10      2400
50      720
20      1800
The problem with the data returned in MySQL
But, there’s a problem with the results returned from running the SQL above, and it should be fairly obvious
once you actually look at the data in the tables we have above. Here is the problem: Order number 50 does
not have an amount of “720” – that amount actually belongs to order number 60. So, what is going on
here? Why do the results return an order number that is not even in the same row as the max(Amount) of
720? And why are all of the other results correct?
Understanding the group by statement is critical
Well, we will have to explain a bit more about what’s going on with the group by. If you already read Part 1
of the advanced SQL interview questions then you should understand exactly what the problem is with the
SQL above, and you can safely skip down to the section that says “New approach to the problem – start with
a subquery”. If you want to reinforce the concepts presented in part 1, however, we highly
recommend that you read this entire explanation to this rather difficult interview question.
When we group by the salesperson ID, there will be one group created for each salesperson ID that appears
in the joined rows. So, there will be 4 groups created – one each for the IDs 1, 2, 7, and 8 (salespeople 5
and 11 have no orders, so the join drops them). Inside those groups will be any rows that share the same
salesperson ID values.
When we select the max(Amount), MySQL will simply look for the highest value for Amount within each
group and return that value. And when we select Orders.Number, MySQL is not going to return every
Orders.Number value from each group – it is only going to select one value from each group.
Our SQL is not specific enough
But, the question is which order number should be returned from each group? Each group can potentially
have more than just one order number whenever more than one row belongs to the group.
And that is the exact problem – the SQL that we wrote is not specific enough, and MySQL will
just arbitrarily return one of the values of the Orders.Number within each group. In this case,
because order number 50 is part of the group created by the salesperson_id of 7, it can return 50. MySQL
could just as well have returned order numbers 60 or 70 – the point is that it arbitrarily picks one
order number from each group. For the group created by salesperson ID of 2, the fact that the order
number 10 is chosen (order number 10 corresponds to the largest order amount of 2400) is just pure
coincidence – MySQL could have returned order number 40, which is also a part of the same group as
salesperson ID of 2.
Most relational database implementations would have thrown an error if we tried to run the SQL above
because the results are potentially arbitrary, as we just illustrated. MySQL is the exception, because it allows
us to run the SQL above error-free, but as we illustrated the data returned could potentially not make any
sense. Be sure to read Part 1 of the advanced SQL interview questions for more details on why.
Well, now we know that there is definitely an issue with the SQL above, so how can we write a good query
that would give us exactly what we want – along with the correct order number?
New approach to the problem – start with a subquery
Now let’s instead just try to break the problem down into more manageable pieces – starting with a simple
subquery. Here is a subquery to get the highest valued order for each salesperson:
SELECT salesperson_id, MAX(Amount) AS MaxOrder
FROM Orders
GROUP BY salesperson_id
Running the query above will return this:
salesperson_id  MaxOrder
1               460
2               2400
7               720
8               1800
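Here is a minimal sketch that reproduces this subquery's result, assuming an in-memory SQLite database through Python's sqlite3 module loaded with the Orders rows listed for this question (orders 10 and 40 carry different amounts here than in part 1). An ORDER BY is added so the rows come back in a predictable order.

```python
import sqlite3

# Orders rows as listed for this question.
ORDERS = [(10, "8/2/96", 4, 2, 2400), (20, "1/30/99", 4, 8, 1800), (30, "7/14/95", 9, 1, 460),
          (40, "1/29/98", 7, 2, 540), (50, "2/3/98", 6, 7, 600), (60, "3/2/98", 6, 7, 720),
          (70, "5/6/98", 9, 7, 150)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Number INTEGER PRIMARY KEY, order_date TEXT, "
             "cust_id INTEGER, salesperson_id INTEGER, Amount INTEGER)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", ORDERS)

# Only the grouped column and an aggregate are selected, so nothing is ambiguous.
max_per_salesperson = conn.execute("""
    SELECT salesperson_id, MAX(Amount) AS MaxOrder
    FROM Orders
    GROUP BY salesperson_id
    ORDER BY salesperson_id""").fetchall()
print(max_per_salesperson)  # [(1, 460), (2, 2400), (7, 720), (8, 1800)]
```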
The query above gives us the salesperson_id and that salesperson’s associated highest order amount – but
it still does not give us the order number associated with the highest order amount. So, how can we find the
order number as well?
Clearly we need to do something else with the subquery we have above that will also give us the correct
order number. What are our options? Try to come up with an answer on your own before reading on.
Well, we can do a join with the results of the subquery above. But, on what condition should our join be
done and what exactly should we be joining the subquery above with?
What if we join our subquery above with data from the Orders table, where the join is done on the basis that
the salesperson_id matches AND that the value in the Orders table’s Amount column is equal to the amount
(MaxOrder) returned from the subquery? This way, we can match up the correct order number with the
correct corresponding value for the maximum order amount for a given salesperson_id.
With that in mind, we can write this query:
select salesperson_id, Number as OrderNum, Amount from Orders
JOIN ( -- this is our subquery from above:
SELECT salesperson_id, MAX(Amount) AS MaxOrder
FROM Orders
GROUP BY salesperson_id
) as TopOrderAmountsPerSalesperson
USING (salesperson_id)
where Amount = MaxOrder
Running the query above returns us this:
salesperson_id   OrderNum   Amount
8                20         1800
1                30         460
2                10         2400
7                60         720
How does the query work exactly?
It’s actually pretty simple. First, the subquery (which acts as a derived table here, named
TopOrderAmountsPerSalesperson) returns the highest order amount for each salesperson, along with the
associated salesperson ID. So now we have each salesperson’s highest order amount and his or her ID in a
derived table. That derived table (the results from the subquery) is then joined with the entire Orders table
on the condition that the salesperson ID matches and that the Amount from the Orders table matches the
MaxOrder amount returned from the derived table. What’s the point of this? That join gives us the correct
order number, since it matches on both the salesperson ID and the amount. The one catch is when a
salesperson has 2 orders with exactly the same maximum amount: then both rows match and both are
returned, a corner case we deal with at the end.
And remember that the whole reason we are doing this is to avoid the original problem of selecting a
non-aggregated column together with a GROUP BY.
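A minimal sketch of the derived-table join, run on SQLite through Python's built-in sqlite3 module rather than MySQL (an assumption made only so the example is self-contained). The rows are hypothetical, not the article's full Orders table, though orders 30, 60 and 20 are chosen to mirror the results shown above; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

TOP_ORDERS_SQL = """
    SELECT salesperson_id, Number AS OrderNum, Amount
    FROM Orders
    JOIN (  -- the subquery from above, used as a derived table
        SELECT salesperson_id, MAX(Amount) AS MaxOrder
        FROM Orders
        GROUP BY salesperson_id
    ) AS TopOrderAmountsPerSalesperson
    USING (salesperson_id)
    WHERE Amount = MaxOrder
    ORDER BY salesperson_id, Number
"""

def top_orders(rows):
    """Load (Number, salesperson_id, Amount) rows into an in-memory Orders
    table and return each salesperson's highest-valued order(s)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders (Number INTEGER, salesperson_id INTEGER, Amount INTEGER)"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", rows)
    result = conn.execute(TOP_ORDERS_SQL).fetchall()
    conn.close()
    return result

# Hypothetical sample rows (Number, salesperson_id, Amount).
sample = [(30, 1, 460), (50, 7, 600), (60, 7, 720), (70, 7, 150), (20, 8, 1800)]
print(top_orders(sample))
# A tie at salesperson 7's maximum: both matching orders come back.
print(top_orders(sample + [(80, 7, 720)]))
```

The second call adds a hypothetical order 80 tied at salesperson 7's maximum, and both tied orders are returned, which is the same corner case the article handles at the end.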
Now, retrieving the salesperson name is simple. Try to figure it out on your own.
Here is how we retrieve the salesperson name – we just use another join with the Salesperson table and
select the Name:
SELECT salesperson_id, Name,
Orders.Number AS OrderNumber, Orders.Amount
FROM Orders
JOIN Salesperson
ON Salesperson.ID = Orders.salesperson_id
JOIN (
SELECT salesperson_id, MAX( Amount ) AS MaxOrder
FROM Orders
GROUP BY salesperson_id
) AS TopOrderAmountsPerSalesperson
USING ( salesperson_id )
WHERE Amount = MaxOrder
Running the query above returns this:
salesperson_id   Name   OrderNumber   Amount
1                Abe    30            460
2                Bob    10            2400
7                Dan    60            720
8                Ken    20            1800
And, finally we have our answer! But one last thing – let’s check for corner cases. What would happen if we
add one more row to the table where a given salesperson has 2 or more orders that have the same value for
the highest amount? For example, let’s add this row to the Orders table:
Number   order_date   cust_id   salesperson_id   Amount
80       02/19/94     7         2                2400
This now means that the salesperson with an ID of 2 has 2 orders with an amount of 2400 in the Orders
table. And, if we run the SQL above again, we will get this as a result (note the extra row for Bob):
salesperson_id   Name   OrderNumber   Amount
1                Abe    30            460
2                Bob    40            2400
7                Dan    60            720
8                Ken    20            1800
2                Bob    80            2400
Now, the question is if we only want one of Bob’s orders to show up, how can we eliminate the duplicate?
Again, try to figure this out on your own before reading our answer.
Well, we could add a GROUP BY salesperson_id, Amount to the end of the query, which would create
separate groups for each unique combination of the salesperson ID and the Amount. This would give us a
query that looks like this:
SELECT salesperson_id, Salesperson.Name,
Number AS OrderNumber, Amount
FROM Orders
JOIN Salesperson
ON Salesperson.ID = Orders.salesperson_id
JOIN (
SELECT salesperson_id, MAX( Amount ) AS MaxOrder
FROM Orders
GROUP BY salesperson_id
) AS TopOrderAmountsPerSalesperson
USING ( salesperson_id )
WHERE Amount = MaxOrder
GROUP BY salesperson_id, Amount
Now, running this query, even with the duplicate row in the Orders table, returns this:
salesperson_id   Name   OrderNumber   Amount
1                Abe    30            460
2                Bob    40            2400
7                Dan    60            720
8                Ken    20            1800
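The GROUP BY fix above still selects Number without aggregating it, so which of Bob's tied orders appears is left to MySQL, and MySQL 5.7.5 and later would typically reject the query under the default ONLY_FULL_GROUP_BY mode. One alternative, swapped in here rather than taken from the article, is to aggregate the order number as well, for example with MIN(Number), which always keeps the lowest tied order number. The sketch runs on SQLite via Python's sqlite3 module, with hypothetical rows.

```python
import sqlite3

ONE_ORDER_PER_SALESPERSON_SQL = """
    SELECT salesperson_id, MIN(Number) AS OrderNumber, Amount
    FROM Orders
    JOIN (
        SELECT salesperson_id, MAX(Amount) AS MaxOrder
        FROM Orders
        GROUP BY salesperson_id
    ) AS TopOrderAmountsPerSalesperson
    USING (salesperson_id)
    WHERE Amount = MaxOrder
    GROUP BY salesperson_id, Amount   -- Number is aggregated, so nothing is arbitrary
    ORDER BY salesperson_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Number INTEGER, salesperson_id INTEGER, Amount INTEGER)")
# Hypothetical rows: salesperson 2 has two orders tied at the top amount of 2400.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(30, 1, 460), (15, 2, 500), (40, 2, 2400), (80, 2, 2400), (20, 8, 1800)],
)
result = conn.execute(ONE_ORDER_PER_SALESPERSON_SQL).fetchall()
print(result)
```

MIN is an arbitrary but deterministic choice; MAX(Number) would work just as well if the newest tied order should win.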
And that’s it – we are now good to go, and we have a final answer to this difficult interview question! This
concludes our series of complex SQL interview questions – hopefully you found them challenging!
What’s the difference between data mining and data warehousing?
Data mining is the process of finding patterns in a given data set. These patterns can often provide
meaningful and insightful data to whoever is interested in that data. Data mining is used today in a wide
variety of contexts – in fraud detection, as an aid in marketing campaigns, and even supermarkets use it to
study their consumers.
Data warehousing can be said to be the process of centralizing or aggregating data from multiple sources
into one common repository.
Example of data mining
If you’ve ever used a credit card, then you may know that credit card companies will alert you when
they think that your credit card is being fraudulently used by someone other than you. This is a perfect
example of data mining – credit card companies have a history of your purchases from the past and know
geographically where those purchases have been made. If all of a sudden some purchases are made in a
city far from where you live, the credit card companies are put on alert to a possible fraud since their data
mining shows that you don’t normally make purchases in that city. Then, the credit card company can
disable your card for that transaction or just put a flag on your card for suspicious activity.
Another interesting example of data mining is how one grocery store in the USA used the data it collected on
its shoppers to find patterns in their shopping habits. They found that when men bought diapers on
Thursdays and Saturdays, they also had a strong tendency to buy beer. The grocery store could have used
this valuable information to increase their profits. One thing they could have done – odd as it sounds – is
move the beer display closer to the diapers. Or, they could have simply made sure not to give any discounts
on beer on Thursdays and Saturdays. This is data mining in action – extracting meaningful data from a huge
data set.
Example of data warehousing – Facebook
A great example of data warehousing that everyone can relate to is what Facebook does. Facebook basically
gathers all of your data – your friends, your likes, who you stalk, etc – and then stores that data into one
central repository. Even though Facebook most likely stores your friends, your likes, etc, in separate
databases, they do want to take the most relevant and important information and put it into one central
aggregated database. Why would they want to do this? For many reasons – they want to make sure that
you see the most relevant ads that you’re most likely to click on, they want to make sure that the friends
that they suggest are the most relevant to you, etc – keep in mind that this is the data mining phase, in
which meaningful data and patterns are extracted from the aggregated data. But, underlying all these
motives is the main motive: to make more money – after all, Facebook is a business.
We can say that data warehousing is basically a process in which data from multiple sources/databases is
combined into one comprehensive and easily accessible database. Then this data is readily available to any
business professionals, managers, etc. who need to use the data to create forecasts – and who basically use
the data for data mining.
Data warehousing vs. data mining
Remember that data warehousing is a process that must occur before any data mining can take place. In
other words, data warehousing is the process of compiling and organizing data into one common database,
and data mining is the process of extracting meaningful data from that database. The data mining process
relies on the data compiled in the data warehousing phase in order to detect meaningful patterns.
In the Facebook example that we gave, the data mining will typically be done by business users who are not
engineers, but who will most likely receive assistance from engineers when they are trying to manipulate
their data. The data warehousing phase is a strictly engineering phase, where no business users are
involved. And this gives us another way of defining the 2 terms: data mining is typically done by business
users with the assistance of engineers, and data warehousing is typically a process done exclusively by
engineers.
What is ternary (also known as three-valued) logic in SQL?
This is a question best illustrated by an example. Suppose we have the following SQL table with the columns
modelNumber and laptopModel:
CREATE TABLE Computer (
  modelNumber CHAR(30) NOT NULL,
  laptopModel CHAR(15)
);
Assume that the table stores entries for all the makes of PCs and laptops, and that the laptopModel field
is set only if the entry is a laptop. Given that information, let’s answer a question that explains
three-valued logic: How would you write a SQL statement that returns only the PCs and no laptops from the
table above?
You might think that the answer to this question is very easy, and the first thing that may come to mind is
this answer:
SELECT * FROM Computer WHERE laptopModel = null
SQL uses Ternary/Three valued logic
Actually, the SQL code above will not return anything at all, not even the PCs that are actually in the table!
The reason is that SQL uses ternary, or three-valued, logic. The concept of
ternary logic is important to understand in order to write effective SQL queries.
SQL Logical Operations have 3 possible values
This is an important fact to remember: logical operations in SQL have 3 possible values, NOT 2 possible
values. What are those 3 possible values? They are TRUE, FALSE, and UNKNOWN. The UNKNOWN value,
as its name suggests, simply means that a value is unknown or unrepresentable. In the SQL code above, the
condition laptopModel = null evaluates to UNKNOWN for every row, and a WHERE clause only keeps the rows
for which its condition is TRUE, so no rows are returned.
The equality operator
The problem with the SQL statement above is the fact that we used the equality operator (the “=”) in order
to test for a NULL column value. In the majority of databases, a comparison to NULL returns UNKNOWN –
this is true even when comparing NULL to NULL. The correct way to check for a NULL or a non-NULL column
is to use the IS NULL or the IS NOT NULL syntax. So, the SQL query should be changed to this:
SELECT * FROM Computer WHERE laptopModel IS NULL
This is a common mistake – so be sure to account for UNKNOWN values in WHERE clause conditions.
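The difference between = NULL and IS NULL is easy to check. This sketch uses Python's sqlite3 module with hypothetical rows; SQLite, like most databases, represents the UNKNOWN result of a comparison as NULL, which Python receives as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Computer (modelNumber CHAR(30) NOT NULL, laptopModel CHAR(15))")
# Hypothetical rows: two PCs (laptopModel is NULL) and one laptop.
conn.executemany(
    "INSERT INTO Computer VALUES (?, ?)",
    [("PC-100", None), ("PC-200", None), ("LT-300", "UltraBook")],
)

# "= NULL" is UNKNOWN for every row, so WHERE keeps nothing.
with_equals = conn.execute(
    "SELECT modelNumber FROM Computer WHERE laptopModel = NULL"
).fetchall()

# IS NULL is TRUE exactly for the rows with no laptopModel.
with_is_null = conn.execute(
    "SELECT modelNumber FROM Computer WHERE laptopModel IS NULL ORDER BY modelNumber"
).fetchall()

# Even NULL = NULL is UNKNOWN, not TRUE.
unknown = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(with_equals)    # []
print(with_is_null)   # [('PC-100',), ('PC-200',)]
print(unknown)        # None
```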
Let’s say that you are given a SQL table called “Compare” (the
schema is shown below) with only one column called “Numbers”.
CREATE TABLE Compare (
  Numbers INT(4)
);
Write a SQL query that will return the maximum value from the
“Numbers” column, without using a SQL aggregate like MAX or MIN.
This problem is difficult because you are forced to think outside the box, and use whatever SQL you know to
solve a problem without using the most obvious solution (doing a “select MAX…” from the table).
Probably the best way to start breaking this problem down is by creating a sample table with some actual
data that matches the schema given. Here is a sample table to start out with:
Compare
Numbers
30
70
-8
90
The value that we want to extract from the table above is 90, since it is the maximum value in the
table. How can we extract this value from the table in a creative way (it will have to be creative since
we can’t use the max or min aggregates)? Well, what are the properties of the highest number (90 in our
example)? We could say that there are no numbers larger than 90 – that doesn’t sound very promising in
terms of solving this problem.
We could also say that 90 is the only number that does not have a number that is greater than it. If we can
somehow return every value that does not have a value greater than it then we would only be returning 90.
This would solve the problem. So, we should try to design a SQL statement that would return every number
that does not have another number greater than it. Sounds fun, right?
Let’s start out simple by figuring out which numbers do have at least one number greater than themselves.
This is an easier query. We can start by joining the Compare table with itself; this is called a self join,
which you can read more about in “Example of self join in SQL” if you are not familiar with self joins.
Using a self join, we can create all the possible pairs for which each value in one column is greater than the
corresponding value in the other column. This is exactly what the following query does:
SELECT Smaller.Numbers, Larger.Numbers
FROM Compare as Larger JOIN Compare AS Smaller
ON Smaller.Numbers < Larger.Numbers
Now, let's use the sample table we created, and we end up with this table after running the query above:
Smaller   Larger
-8        90
30        90
70        90
-8        70
30        70
-8        30
Now we have every value in the "Smaller" column except the largest value of 90. This means that all we
have to do is find the value that is not in the Smaller column (but is in the Compare table), and that will
give us the maximum value. We can easily do this using the NOT IN operator in SQL.
But before we do that, we have to change the query above so that it only selects the "Smaller" column, since
that is the only column we are interested in:
SELECT Smaller.Numbers
FROM Compare as Larger JOIN Compare AS Smaller
ON Smaller.Numbers < Larger.Numbers
Now, all we have to do is apply the NOT IN operator to find the max value.
SELECT Numbers
FROM Compare
WHERE Numbers NOT IN (
SELECT Smaller.Numbers
FROM Compare AS Larger
JOIN Compare AS Smaller ON Smaller.Numbers < Larger.Numbers
)
This will give us what we want - the maximum value. But there is one small problem with the SQL above - if
the maximum value is repeated in the Compare table then it will return that value twice. We can prevent
that by simply using the DISTINCT keyword. So, here's what the query looks like now:
SELECT DISTINCT Numbers
FROM Compare
WHERE Numbers NOT IN (
SELECT Smaller.Numbers
FROM Compare AS Larger
JOIN Compare AS Smaller ON Smaller.Numbers < Larger.Numbers
)
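A sketch that checks both the pair table above and the final query, using SQLite via Python's sqlite3 module as a stand-in for the article's database. The helper name run is illustrative only.

```python
import sqlite3

PAIRS_SQL = """
    SELECT Smaller.Numbers, Larger.Numbers
    FROM Compare AS Larger JOIN Compare AS Smaller
    ON Smaller.Numbers < Larger.Numbers
"""

MAX_SQL = """
    SELECT DISTINCT Numbers
    FROM Compare
    WHERE Numbers NOT IN (
        SELECT Smaller.Numbers
        FROM Compare AS Larger
        JOIN Compare AS Smaller ON Smaller.Numbers < Larger.Numbers
    )
"""

def run(sql, values):
    """Create the Compare table in memory, fill it with `values`, run `sql`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Compare (Numbers INT(4))")
    conn.executemany("INSERT INTO Compare VALUES (?)", [(v,) for v in values])
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

sample = [30, 70, -8, 90]                  # the sample table from the text
print(sorted(run(PAIRS_SQL, sample)))      # every (smaller, larger) pair
print(run(MAX_SQL, sample))                # [(90,)]
print(run(MAX_SQL, sample + [90]))         # repeated maximum, still [(90,)]
```

The last call shows why DISTINCT matters: without it, the repeated 90 would appear twice.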
And there we have our final answer. Of course, some of you may be saying that there is a much simpler
solution to this problem. And you would be correct. Here is a simpler answer to the problem using the SQL
TOP clause along with the SQL ORDER BY clause; this is what it would look like in SQL Server:
select TOP 1 -- select the very top entry in result set
Numbers
from
Compare
order by
Numbers DESC
And since MySQL does not have a TOP clause, this is what it would look like in MySQL using just ORDER BY
and LIMIT:
select
Numbers
from
Compare
order by
Numbers DESC -- order in descending order
LIMIT 1 -- retrieve only one value
So, even though there are a couple of much simpler answers it is nice to know the more complicated answer
using a self join so that you can impress your interviewer with your knowledge.
Provide an example of SQL Injection
A SQL injection attack is exactly what the name suggests – it is where a hacker tries to “inject” his
harmful/malicious SQL code into someone else’s database, and force that database to run his SQL. This
could potentially ruin their database tables, and even extract valuable or private information from their
database tables. The idea behind SQL injection is to have the application under attack run SQL that it was
never supposed to run. How do hackers do this? As always, it’s best to show this with examples that will act
as a tutorial on SQL injection.
SQL Injection Example
In this tutorial on SQL injection, we present a few different examples of SQL injection attacks, along with
how those attacks can be prevented. SQL injection attacks typically start with a hacker inputting his or her
harmful/malicious code in a specific form field on a website. A website ‘form’, if you don’t already know, is
something you have definitely used: when you log into Facebook, for example, you are using a form to log in, and a
form input field can be any field on a form that asks for your information – whether it’s an email address or
a password, these are all form fields.
For our example of SQL injection, we will use a hypothetical form which many people have probably dealt
with before: the “email me my password” form, which many websites have in case one of their users forgets
their password.
The way a typical “email me my password” form works is this: it takes the email address as an input from
the user, and then the application does a search in the database for that email address. If the application
does not find anything in the database for that particular email address, then it simply does not send out an
email with a new password to anyone. However, if the application does successfully find that email address
in its database, then it will send out an email to that email address with a new password, or whatever
information is required to reset the password.
But, since we are talking about SQL injection, what would happen if a hacker was not trying to input a valid
email address, but instead some harmful SQL code that he wants to run on someone else’s database to steal
their information or ruin their data? Well, let’s explore that with an example, starting with how a hacker
would typically go about figuring out how a system works.
Starting the SQL Injection Process
The SQL that would retrieve the email address in the “email me my password” form would typically look
something like this – keep in mind that this SQL really is embedded within a scripting language like PHP (it
depends on what scripting language is being used by the application):
SELECT data
FROM table
WHERE Emailinput = '$email_input';
This is, of course, a guess at what the SQL being run by the application would look like, because a hacker
would not know this information since he does not have access to the application code. The “$email_input”
variable is used to hold whatever text the user inputs into the email address form field.
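To make the guess concrete, here is a hypothetical and deliberately vulnerable Python helper that builds the same kind of query by pasting the form input straight into the SQL text. The table name users is an assumption (the article's placeholder name table is a reserved word in most SQL dialects). Real code should never build queries this way.

```python
def build_lookup_sql(email_input: str) -> str:
    """Hypothetical, deliberately VULNERABLE helper: the form input is
    inserted verbatim between the quotes of the SQL string literal."""
    return f"SELECT data FROM users WHERE Emailinput = '{email_input}';"

print(build_lookup_sql("joe@ymail.com"))
# SELECT data FROM users WHERE Emailinput = 'joe@ymail.com';
print(build_lookup_sql("hacker@programmerinterview.com'"))
# SELECT data FROM users WHERE Emailinput = 'hacker@programmerinterview.com'';
```

Whatever the user types ends up as part of the SQL itself, including any quote characters.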
Step 1: Figure out how the application handles bad inputs
Before a hacker can really start taking advantage of a weak or insecure application, he must figure out how
the application handles a simple bad input first. Think of this initial step as the hacker “feeling out” his
opponent before he releases the really bad SQL.
So, with that in mind, the first step a hacker would typically take is inputting an email address with a quote
appended to the end into the email form field. We will of course explain why further down below. But for
now, the input from the hacker would look something like this – pay special attention to the fact that there
is a quote appended to the end of the email address:
hacker@programmerinterview.com'
If the hacker puts that exact text into the email address form field then there are basically 2 possibilities:

1. The application will first “sanitize” the input by removing the extra quote at the end, because we
will assume that the application considers email addresses with quotes as potentially malicious. But,
a side note: email addresses can actually contain quotes according to IETF standards. Sanitizing
data is the act of stripping out any characters that aren’t needed from the data that is supplied – in
our case, the email address. Then, the application may run the sanitized input in the database
query, and search for that particular email address in the database (without the quote of course).

2. The application will not sanitize the input first, and will take the input from the hacker and
immediately run it as part of the SQL. This is what the hacker is hoping would happen, and we will
assume that this is what our hypothetical application is doing. This is also known as constructing the
SQL literally, without sanitizing. What it means is that the SQL being run by the application would
look like this – pay extra attention to the fact that there is now an extra quote at the end of the
WHERE statement in the SQL below:
SELECT data
FROM table
WHERE Emailinput = 'hacker@programmerinterview.com'';
Now, what would happen if the SQL above is executed by the application? Well, the SQL parser would see
that there is an extra quote mark at the end, and it will abort with a syntax error.
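The probe can be reproduced against an in-memory SQLite database, with Python's sqlite3 module standing in for the application's real database. The helper names and the users table are hypothetical; the point is that a well-formed input yields a normal answer, while the stray quote leaks a database error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (data TEXT, Emailinput TEXT)")
conn.execute("INSERT INTO users VALUES ('reset-info', 'joe@ymail.com')")

def vulnerable_lookup(email_input):
    # Hypothetical VULNERABLE code: the input is concatenated into the SQL text.
    sql = f"SELECT data FROM users WHERE Emailinput = '{email_input}'"
    return conn.execute(sql).fetchall()

def probe(email_input):
    """Return what a hacker might learn from the response to one input."""
    try:
        rows = vulnerable_lookup(email_input)
    except sqlite3.OperationalError as exc:
        return "database error: " + str(exc)   # the parser error leaks out
    return "found" if rows else "unknown email"

print(probe("joe@ymail.com"))                     # found
print(probe("nobody@ymail.com"))                  # unknown email
print(probe("hacker@programmerinterview.com'"))   # database error: ...
```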
The error response is key, and tells the hacker a lot
But, what will the hacker see on the actual form page when he tries to input this email address with a quote
at the end? Well, it really depends on how the application is set up to handle errors in the database, but the
key here is that the hacker will most likely not receive an error saying something like “This email address is
unknown. Please register to create an account” – which is what the hacker would see if the application is
actually sanitizing the input. Since we are assuming that the application is not sanitizing its input, the
hacker would most likely see something like “Internal error” or “Database error”, and now the
hacker also knows that the input to the database is not being sanitized. And if the application is not
sanitizing its input, then the database can most likely be exploited, destroyed, and/or
manipulated in ways that could be very bad for the application owner.
Step 2: Run the actual SQL injection attack
Now that the hacker knows the database is vulnerable, he can attack further to get some really valuable
information. What could our hacker do? Well, if he’s been able to successfully figure out the layout of the
table, he could just type this harmful code into the form field (where the email address would normally go):
Y';
UPDATE table
SET email = 'hacker@ymail.com'
WHERE email = 'joe@ymail.com';
Note that the SQL above is completely valid, standard SQL. You can see that after the Y there is
an extra quote followed by a semicolon, which allows the hacker to close the original statement and then
run another statement of his own!
Then, if this malicious code is run by the application under attack, it would look like this:
SELECT data
FROM table
WHERE Emailinput = 'Y';
UPDATE table
SET email = 'hacker@ymail.com'
WHERE email = 'joe@ymail.com';
Can you see what this code is doing? Well, it is resetting the email address that belongs to “joe@ymail.com”
to “hacker@ymail.com”. This means that the hacker is now changing a user’s account so that it uses his own
email address – hacker@ymail.com. This then means that the hacker can reset the password – and have it
sent to his own email address! Now, he also has a login and a password to the application, but it is under
someone else’s account.
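A sketch of the stacked-statement attack against an in-memory SQLite database. Python's execute() refuses to run more than one statement, so the hypothetical vulnerable helper uses executescript(), which does run every statement; a driver that permits multiple statements is an assumption here, not something every application allows. As in the SQL shown above, the payload leaves off its final quote and semicolon, so the application's own closing '; completes the UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (data TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('joe-account', 'joe@ymail.com')")
conn.commit()

def vulnerable_lookup_multi(email_input):
    # Hypothetical VULNERABLE code: concatenates the input and runs the result
    # with executescript(), which executes every ;-separated statement.
    conn.executescript(f"SELECT data FROM users WHERE email = '{email_input}';")

# The hacker's input; the application supplies the final quote and semicolon.
payload = (
    "Y'; UPDATE users SET email = 'hacker@ymail.com' "
    "WHERE email = 'joe@ymail.com"
)
vulnerable_lookup_multi(payload)
print(conn.execute("SELECT email FROM users").fetchall())   # [('hacker@ymail.com',)]
```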
In the example above, we did skip some steps that a hacker would have taken to figure out the table name
and the table layout, because we wanted to keep this article relatively short. But, the idea is that SQL
injection is a real threat, and taking measures to prevent it is extremely important.
Now, the question is: how can SQL injection attacks be prevented? That is the topic of the next
question, SQL Injection Prevention.
How to prevent SQL injection attacks?
In our earlier tutorial on SQL Injection, one way to have prevented the SQL injection attack was by simply
having the user input sanitized – which we briefly discussed. Since we are dealing with email addresses in
our example, this means that we should be able to safely exclude certain characters which don’t normally
appear in email addresses. Here is a list of characters that would normally appear in emails, and anything
else should not be allowed inside the database – the user should just receive an error saying something like
“Invalid email address” if he tries to input an email address with any characters other than the ones below:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789
! $ & * - = ^ ` | ~ # % ' + / ? _ { } @ .
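A minimal sketch of the character whitelist above, in Python. The helper name is hypothetical, and it only checks characters, not full email syntax. Note what the last call shows: the single quote is on the list of allowed characters, so the hacker's probe passes the check, which is one reason sanitizing alone is not enough.

```python
import string

# The characters listed above; anything else is rejected.
ALLOWED_EMAIL_CHARS = frozenset(
    string.ascii_letters + string.digits + "!$&*-=^`|~#%'+/?_{}@."
)

def has_only_allowed_chars(candidate: str) -> bool:
    """Hypothetical helper: True if `candidate` is non-empty and uses only
    the allowed characters. It does not validate full email syntax."""
    return bool(candidate) and all(ch in ALLOWED_EMAIL_CHARS for ch in candidate)

print(has_only_allowed_chars("joe@ymail.com"))                     # True
print(has_only_allowed_chars("Y'; UPDATE table SET email = 'x'"))  # False: spaces, ';'
print(has_only_allowed_chars("hacker@programmerinterview.com'"))   # True: ' is allowed
```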
Sanitizing input is not enough to prevent SQL injection
Unfortunately, just sanitizing user inputs is not enough to prevent SQL injection – as you will see in the
examples below. So, let’s explore some other options and see what works and why – it’s good to know all
the options, so be sure to read everything.
What about escaping strings? Shouldn’t this remove the threat of quotes in SQL injection?
In case you forgot what “escaping” means in the context of programming, basically it’s just allowing special
characters (like single/double quotes, percent signs, backslashes, etc.) in strings to be saved so that they
remain as part of the string, and are not misinterpreted as something else. For example, if we want to
include a single quote in a string that gets output to the browser in PHP (note in the word “it’s” we have a
single quote that will be output), then we have to add a backslash to the single quote so that PHP outputs it
as a single quote:
echo 'Programmer Interview - It\'s Great!';
So, when this is displayed on a webpage it will look like:
Programmer Interview - It's Great!
This is what’s called escaping strings. If we did not escape the quote in our string, PHP would not output
anything; instead it would stop with a parse error, because the single quote is also what encloses the
string passed to the echo statement.
Now, how would escaping the quotes have helped in our previous example? Remember our hacker is trying
to input this harmful/malicious code into the email form field:
Y';
UPDATE table
SET email = 'hacker@ymail.com'
WHERE email = 'joe@ymail.com';
What if we escape the quotes in the string above before we pass the SQL to the database? Well, that would
mean the quotes in the string become a part of the string that is searched for using the Emailinput field – in
effect the query is searching for an email address that is equal to that giant string. In other words, the
quotes are part of the string literal, and will not be interpreted as SQL. In MySQL, we can escape a quote
simply by prepending a quote with another quote – basically 2 single quotes will be interpreted as one quote
– which is what we do in the example below. So, the actual SQL that will be run looks like this:
-- every quote in the hacker's input has been doubled:
SELECT data
FROM table
WHERE Emailinput = 'Y'';
UPDATE table
SET email = ''hacker@ymail.com''
WHERE email = ''joe@ymail.com'';';
The key in the example above is that the quotes are now being treated as part of a string that gets
compared to a field in the table, and NOT being translated as actual SQL – it’s very important that you
understand the distinction because it is exactly the problem that escaping quotes solves for us.
If we do not escape quotes, those quotes become part of the SQL itself, which allows the hacker to run
2 statements at once, and that is exactly what is so dangerous. The 2nd statement
(UPDATE table SET email = 'hacker@ymail.com' WHERE email = 'joe@ymail.com';) is what really messes
things up, because it allows the hacker to change the email address of an existing account to his own email
address. And that 2nd statement is only allowed to run because the quotes are not escaped. Escaping a
string is also known as quotesafing, since you are essentially making the SQL query “safe” for quotes.
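A sketch of quote doubling using SQLite via Python's sqlite3 module (an assumption; SQLite follows the standard rule that '' inside a literal is one quote, and it does not treat backslash as an escape). The helper quote_literal is hypothetical. The payload from the text becomes a single string literal, so the SELECT simply finds no match and Joe's email address is untouched.

```python
import sqlite3

def quote_literal(value: str) -> str:
    """Hypothetical helper: wrap `value` in single quotes, doubling every
    single quote inside it (the standard SQL escape described above)."""
    return "'" + value.replace("'", "''") + "'"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (data TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('joe-account', 'joe@ymail.com')")
conn.commit()

payload = "Y'; UPDATE users SET email = 'hacker@ymail.com' WHERE email = 'joe@ymail.com';"
sql = "SELECT data FROM users WHERE email = " + quote_literal(payload)

matches = conn.execute(sql).fetchall()                     # one SELECT, no UPDATE
emails = conn.execute("SELECT email FROM users").fetchall()
print(quote_literal("Jack O'Leary"))   # 'Jack O''Leary'
print(matches, emails)                 # [] [('joe@ymail.com',)]
```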
Just Escaping Strings Does Not Prevent SQL Injection
Although we went through an example in which escaping the string prevented the SQL injection attack, just
escaping strings is actually not enough protection against SQL injection attacks. A decent hacker can run
another attack, by exploiting the fact that some databases allow people to escape strings in more than just
one way. MySQL actually allows you to escape quotes in a variety of different ways – in fact as you can see
below in some information pulled straight from the MySQL reference pages, you can easily escape quote
characters by preceding them with a backslash – a “\” :
There are several ways to include quote characters within a
string that goes into a MySQL query:
1. A “'” inside a string quoted with “'” may be written as “''”.
2. A “"” inside a string quoted with “"” may be written as “""”.
3. Precede the quote character by an escape character (“\”).
Let’s say that we choose to escape quotes manually, by adding a second single quote every time a string
comes in with a quote. We want quotes to be allowed because, if we have a name field, people with quotes in
their name should be able to save their name without any issues; for instance, someone with the name Jack
O’Leary should be able to be saved in our database without the quote causing any issues.
So, if we are retrieving someone’s name from our database, then the SQL may look like this:
SELECT *
FROM customers
WHERE name = 'Jack O''Leary';
-- this works great
And this works perfectly fine because the 2 consecutive single quotes will be interpreted as one single
quote, and MySQL will search for Jack O’Leary (with one quote), and not Jack O’’Leary (with 2 quotes).
But, let’s say a clever hacker realizes that you may be running a MySQL database, and knows that MySQL
also allows you to escape quotes by preceding the quote character with a backslash, so a quote could also
be escaped like this: \'
So, our clever hacker tries to insert a string like this into a form field:
\'; DROP TABLE users; -- 
But after we do our own manual string escaping (by adding the extra quote), that string turns into this:
\''; DROP TABLE users; -- 
So, the SQL that is run will look like this:
SELECT *
FROM customers
WHERE name = '\''; DROP TABLE users; -- ';
What happens when this SQL is run? MySQL reads the \' as an escaped quote, so '\'' is interpreted as a
string containing a single quote, meaning that the system will just search for a name consisting of one
quote. The 2nd quote (the one that comes right after the \') now closes that string, which allows the hacker
to end the first statement with a semicolon and then run another malicious statement (the DROP TABLE
users; code). The trailing -- turns the application's leftover quote into a comment.
The hacker essentially fools the system into NOT escaping one of the extra quotes by taking advantage of 2
things here:


1. The application developer is trying to escape quotes himself by just appending an extra quote.
2. MySQL supports escape mechanisms other than just appending a quote. In this case, the hacker
also used the backslash escape mechanism to run his malicious code.
Remember, the quotes are key because it allows the hacker to close one statement and run any extra
statement of his or her choosing.
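SQLite cannot reproduce this attack because it has no backslash escapes, so the sketch below instead models MySQL's default string-literal rules in a few lines of Python. This is a simplified, hypothetical lexer (it ignores MySQL special sequences such as \n and the NO_BACKSLASH_ESCAPES mode), meant only to show where the string literal ends. With naive quote doubling, the literal closes right after \'' and the DROP TABLE is left over as SQL; a variant that also escapes backslashes keeps the entire payload inside the string.

```python
def read_quoted_literal(text: str):
    """Simplified, hypothetical model of how MySQL (default SQL mode) reads a
    single-quoted string, starting just after the opening quote. A backslash
    escapes the next character and '' stands for one quote; special
    sequences such as \\n are NOT modeled. Returns (value, rest_of_sql)."""
    value, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            value.append(text[i + 1])            # \' -> '  and  \\ -> \
            i += 2
        elif ch == "'" and text[i + 1:i + 2] == "'":
            value.append("'")                    # '' -> '
            i += 2
        elif ch == "'":
            return "".join(value), text[i + 1:]  # closing quote found
        else:
            value.append(ch)
            i += 1
    raise ValueError("unterminated string literal")

def naive_escape(s: str) -> str:
    # The hand-rolled approach from the text: double every single quote.
    return s.replace("'", "''")

def escape_backslash_too(s: str) -> str:
    # Hypothetical fix: escape backslashes first, so \' cannot eat the added quote.
    return s.replace("\\", "\\\\").replace("'", "''")

payload = "\\'; DROP TABLE users; -- "   # the hacker's input:  \'; DROP TABLE users; --
# The application wraps the escaped input as  name = '<escaped>';
print(read_quoted_literal(naive_escape(payload) + "';"))
# ("'", "; DROP TABLE users; -- ';")  the string ends early; the rest is SQL
print(read_quoted_literal(escape_backslash_too(payload) + "';"))
# the whole payload stays inside the string; only ";" is left over
```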
Let’s repeat this again: Just escaping quotes is not enough to prevent SQL injection
The lesson here is that escaping quotes is unfortunately not enough to prevent all SQL injection attacks, and
it is also extremely difficult to do correctly on your own. Because of the latter, many languages that provide
database interface libraries have a function that will handle escaping strings for you. These functions
handle both parsing of the string and quotesafing, so when you use them you have a much better chance of
getting things right.
If you are looking for actual examples of those functions, PHP has a function called
mysql_real_escape_string (the old mysql extension was removed in PHP 7; mysqli_real_escape_string and
PDO::quote are its successors), and Perl’s DBI has a method called quote. You absolutely should be
using these functions before using form data in your queries.
Provide a definition and example of a prepared statement in PHP, Java, and Perl. What are the advantages of using prepared statements? How do prepared statements help prevent SQL injection attacks?
Prepared statements, also known as parameterized statements or parameterized SQL, can be thought of
as templates for SQL statements. Prepared statements allow database engines to run SQL statements
more efficiently when the same (or similar) SQL is used over and over again; we’ll explain the details
below. The key feature of a prepared statement is that values can be plugged into the query after the
query is “prepared” and ready to be executed. This will make more sense when you see the examples
below.
Prepared Statements use Placeholders
Prepared statements typically use question marks (?) as placeholders for where the actual values used in
the SQL should be “plugged” in (some interfaces, such as PDO, also accept named placeholders like :email).
The values supplied for the placeholders are known as bound parameters, since they are parameters that
are passed to the SQL and “bound” to it at a later time.
Confused yet? Well, some examples should clear it up – it’s really not difficult to understand at all.
Examples of Prepared Statements
Below we present some examples of prepared statements in Java, PHP, and Perl. Here we are using the
interface libraries that each language provides to communicate with different database environments (like
MySQL, Oracle, etc.). As you may already know, Java uses a library known as JDBC, PHP uses something
called PDO (PHP Data Objects), and Perl uses something called the Perl DBI (Perl Database Interface).
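Example of a prepared statement in Java using JDBC:
/* Assumes an open java.sql.Connection named connection and a String
   variable email that is set beforehand; "users" stands in for your
   table name ("table" itself is a reserved word in SQL). */
try (java.sql.PreparedStatement stmt =
         connection.prepareStatement(
             "SELECT * FROM users WHERE EMAIL = ?")) {
    // Set the first "?" to the value of email (JDBC parameters start at 1)
    stmt.setString(1, email);
    try (java.sql.ResultSet rs = stmt.executeQuery()) {
        // ... read the matching rows from rs ...
    }
}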
Example of a prepared statement in PHP using PDO:
$stmt = $dbh->prepare("SELECT * FROM users WHERE EMAIL = ?");
/* The statement below sets "?" to the value stored in the
   $email variable (assumed to be set beforehand). PDO's
   execute() takes an array with one value per placeholder: */
$stmt->execute([$email]);
$rows = $stmt->fetchAll();
Example of a prepared statement in Perl using Perl DBI:
my $stmt = $dbh->prepare('SELECT * FROM users WHERE EMAIL = ?');
# The statement below sets "?" to the value stored in the
# $email variable (assumed to be set beforehand). Perl uses
# "#" comments; /* ... */ is not valid Perl.
$stmt->execute($email);
Looking at the examples above, you can see that even though the syntax details are different for each
language, they are all fundamentally the same: they all use a “?” as a placeholder for the value that will be
passed in later. And they all “prepare” the SQL first and execute it later, which is of course the whole point
behind prepared statements. A good way to think of a prepared statement is as a template for SQL, because
it is not a complete SQL statement until values are supplied for the placeholder areas.
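For comparison, here is the same placeholder pattern in Python's built-in sqlite3 module (Python and SQLite are our choice for a self-contained sketch; the users table and the email address are made up). sqlite3 has no separate prepare() call: it compiles the SQL text when execute() runs and keeps compiled statements in a per-connection cache, but the "?" placeholder and the separately supplied value work the same way as in the JDBC, PDO and DBI examples.

```python
import sqlite3

# Hypothetical users table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, name TEXT)")
conn.execute("INSERT INTO users VALUES ('jdoe@example.com', 'John Doe')")

email = "jdoe@example.com"  # assumed to be set beforehand, as in the examples

# SQL template with a "?" placeholder; the value travels separately as a tuple.
stmt = "SELECT name FROM users WHERE email = ?"
rows = conn.execute(stmt, (email,)).fetchall()
print(rows)  # [('John Doe',)]
```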
What exactly happens when SQL is “prepared”?
Prepared SQL is created by calling the respective prepare method in each language, as you can see in the
examples above. The prepared SQL template is sent to the DBMS (whether it’s MySQL, DB2, or another
system) with the placeholder values (the “?”) left blank. The DBMS then parses, compiles, and performs
query optimization on the template, and stores the result. It cannot execute that result yet, because there is
no data in the placeholders/parameters. The SQL is only executed once the respective execute function is
called and data is passed in for the parameters.
What are the advantages of using prepared statements?
Prepared statements provide 2 primary benefits. The first is better performance. Even though a prepared
statement can be executed many times, it is compiled and optimized only once by the database engine.
Because a prepared statement does not have to be compiled and optimized every time the values in the
query change, it offers a distinct performance advantage. But keep in mind that not all query optimization
can happen when a prepared statement is compiled, because the best query plan may also depend on the
specific parameter values being passed in. The best query plan may also change over time, as the database
tables and indices change.
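A rough sketch of the "compile once, execute many times" idea, again using Python's sqlite3 with a hypothetical users table: executemany() takes one INSERT template and a list of parameter tuples. As we understand CPython's implementation, the statement is prepared once and then re-bound and re-run for each tuple, instead of a new SQL string being parsed for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, name TEXT)")

new_users = [
    ("ann@example.com", "Ann"),
    ("bob@example.com", "Bob"),
    ("cy@example.com", "Cy"),
]

# One template, many executions: only the bound values change per row.
conn.executemany("INSERT INTO users (email, name) VALUES (?, ?)", new_users)

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 3
```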
Why are prepared statements so effective against SQL injection?
The second advantage of using prepared statements is that they are the best solution to preventing SQL
injection attacks. If you are not familiar with SQL injection, it’s highly recommended that you read our
article on SQL injection – every programmer should know what SQL injection is. A short, non-academic
description of SQL injection is this: any time an application runs SQL based on some user input through a
web form, then a hacker could potentially pass in some input with the intent of having his input run as part
of your SQL, and either steal or corrupt your users’ data.
Now, back to our discussion: prepared statements help so much in preventing SQL injection because the
values that will be inserted into a SQL query are sent to the SQL server after the query itself has been sent
and prepared. In other words, the data input by a potential hacker is sent separately from the prepared
query statement, so it cannot be interpreted as SQL, and the hacker cannot run his own SQL on your
application. Any input that comes in is interpreted only as data, never as part of your application’s SQL
code, which is exactly why prepared statements prevent SQL injection attacks.
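The difference is easy to see side by side. In this Python/sqlite3 sketch (hypothetical users table and attack string), the same malicious input is first pasted into the SQL text and then passed as a bound parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

attack = "x' OR '1'='1"  # hypothetical malicious form input

# Vulnerable: the input is spliced into the SQL text, so its quote ends the
# string literal and the rest is parsed as SQL (WHERE ... OR '1'='1').
unsafe_sql = "SELECT name FROM users WHERE name = '" + attack + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized: the same input is bound as one value and only compared.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (attack,)).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```

The concatenated query returns every row, while the parameterized one simply finds no user whose name is literally "x' OR '1'='1".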
What is the difference between parameterized queries and prepared statements?
Both parameterized queries and prepared statements are exactly the same thing. Prepared statement seems
to be the more commonly used term, but there is no difference between the two terms. Parameterized queries
and prepared statements are features of database management systems that basically act
as templates in which SQL can be executed. The actual values that are passed into the SQL are the
parameters (for example, which value needs to be searched for in the WHERE clause), which is why these
templates are called parameterized queries. And, the SQL inside the template is also parsed, compiled, and
optimized before the SQL is sent off to be executed – in other words “prepared”. That is why these
templates are often called prepared statements as well. So, just remember that they are two different
names for the same thing. You can read a more detailed description about prepared statements (a.k.a.
parameterized queries) and why they are useful here: Prepared statements and SQL injection.
What is blind SQL Injection? Provide an example of blind sql injection as well.
In our SQL Injection Tutorial, we discussed how hackers use error messages from the database that
they are trying to attack in order to determine whether or not that database is vulnerable to a SQL
injection attack. But what if database error messages are suppressed so that they are not displayed on
the web page of a site that is under attack? Do hackers have some other way of running a SQL injection
attack?
It turns out that hackers do actually have a way to run a SQL injection attack even when database error
messages are disabled. This form of SQL injection is known as blind SQL injection.
Blind SQL injection versus SQL injection
What exactly is the difference between blind SQL injection and normal SQL injection? Well, in normal SQL
injection hackers rely on error messages returned from the database in order to give them some clues on
how to proceed with their SQL injection attack. But with blind SQL injection the hacker does not need
to see any error messages in order to run his/her attack on the database – and that is exactly why it is
called blind SQL injection. So, even if the database error messages are turned off a hacker can still run a
blind SQL injection attack.
Below, we walk through a hypothetical blind SQL injection attack.
Example of Blind SQL Injection
For our example, let’s suppose that we have a fake example social networking site – let’s call it
mybigspace.com – that has different profiles for people (just like Facebook). Each user on the site
mybigspace.com has a unique ID number assigned to them that identifies their profile. And, a query string is
used to retrieve each individual’s profile – so in the URL below, the user with an ID of 1008 will be pulled up
and displayed on the page. Let’s say that the user ID of 1008 belongs to a user named “John Doe”. Here is
what the URL that’s used to load John Doe’s profile would look like:
// this is John Doe's page:
http://www.mybigspace.com?id=1008
Let’s assume that the user ID would be used to retrieve the user’s profile details (like links to pictures,
his/her birthday, etc.) from a database. So, if a user requests the URL “http://www.mybigspace.com?id=1008”,
then that query string would be used to run some SQL on the servers of mybigspace.com. That SQL could
look like this:
SELECT * FROM profiles WHERE ID = '1008';
We are assuming that there is a master table called profiles which stores all the different profiles of people
who are on the social networking site.
But now let’s say that the hacker tries to inject some SQL into the URL query string – so the hacker tries to
load this URL in his/her browser:
http://www.mybigspace.com?id=1008 AND 1=1
Blind SQL Injection uses simple boolean expressions
Loading the URL above might result in the server of mybigspace.com running the SQL below. Note that the
SQL below contains a simple boolean expression, “1=1”, which is always true because one is always equal
to one. That expression is appended to the query string in the URL above. Here is the SQL we discussed:
SELECT * FROM profiles WHERE ID = '1008' AND 1=1;
We said that loading the URL “http://www.mybigspace.com?id=1008 AND 1=1” might result in
mybigspace.com running the SQL above. We said might because it depends on whether the server allows
the extra characters after the 1008 to be injected into the SQL. If the server does accept that SQL and
allows it to be run, then the page that belongs to “John Doe” would be loaded just fine. To be sure the extra
SQL really ran rather than being ignored, the hacker can also request “...id=1008 AND 1=2”, which is always
false: if that page fails to load while the “1=1” version loads, the hacker knows that his SQL injection
attack worked, which means that the site mybigspace.com is vulnerable to SQL injection attacks.
Of course, if the server does not respond with John Doe’s page when the URL
“http://www.mybigspace.com?id=1008 AND 1=1” is requested, and instead just returns something like a
“Page not found”, then the hacker knows that a blind SQL injection attack is probably not possible.
So, let’s continue with the assumption that the website is vulnerable to blind SQL injection. Now, the hacker
can use more sophisticated queries to gather information about the server environment, and he can work
his way into getting some potentially sensitive data. For instance, if the hacker wants to find out which
version of MySQL the server is running (assuming that it is running a MySQL database), then the hacker
could try to load this URL, which has some extra SQL appended to check whether the server is running
MySQL version 5:
http://www.mybigspace.com?id=1008 AND substring(@@version, 1, 1)=5
The SQL “substring(@@version, 1, 1)=5” checks whether the first character of the MySQL version string is
5, that is, whether the server is running MySQL version 5. If it is, the whole WHERE condition is true and the
page loads normally (again assuming that the website is vulnerable to SQL injection and simply runs the
SQL that is part of the query string). If mybigspace.com’s server is not running MySQL version 5, then
“substring(@@version, 1, 1)=5” evaluates to false, so the whole WHERE condition is false and no profile is
retrieved. The page will probably not load, and so the hacker knows that the version of MySQL being run is
not version 5.
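The true/false probing above can be simulated locally with Python's sqlite3 (our choice for a self-contained sketch). Assumptions to note: the handler, table layout and unquoted numeric id are hypothetical, and since SQLite has no @@version, its sqlite_version() function stands in for the MySQL version probe. The only thing the "attacker" ever sees is whether the page loads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER, name TEXT)")
conn.execute("INSERT INTO profiles VALUES (1008, 'John Doe')")


def page_loads(id_param: str) -> bool:
    """Hypothetical vulnerable handler for ?id=...: the raw value is pasted
    into the SQL. No error details are shown; the page either loads or not."""
    sql = "SELECT name FROM profiles WHERE id = " + id_param
    return conn.execute(sql).fetchone() is not None


print(page_loads("1008"))          # True: normal request
print(page_loads("1008 AND 1=1"))  # True: injected condition is always true
print(page_loads("1008 AND 1=2"))  # False: injected condition is always false
# SQLite's counterpart to the @@version probe: is the major version 3?
print(page_loads("1008 AND substr(sqlite_version(), 1, 1) = '3'"))  # True
print(page_loads("1008 AND substr(sqlite_version(), 1, 1) = '2'"))  # False
```

Each request answers exactly one yes/no question, which is why the hacker has to repeat it many times to learn anything substantial.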
Blind SQL Injection Prevention
As we have made pretty clear so far, a blind SQL injection attack can be done even if the display of
database error messages is turned off. So, turning off error messages is clearly not enough for prevention
purposes. Prepared statements are great for preventing blind SQL injection because the SQL is compiled
before any user input is added, which makes it impossible for user input to change, and therefore
compromise, the integrity of the SQL statement.
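Here is the same hypothetical profile lookup rewritten with a "?" placeholder, as a Python/sqlite3 sketch. A normal id still works because SQLite converts the text '1008' to a number when comparing it with the INTEGER column (its type-affinity rules); the probe strings cannot be converted, so they are compared as plain text and match nothing. The "always true" and "always false" probes now behave identically, so there is no signal left to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER, name TEXT)")
conn.execute("INSERT INTO profiles VALUES (1008, 'John Doe')")


def page_loads_safe(id_param: str) -> bool:
    """Hypothetical fixed handler: the SQL text never changes and the
    query-string value is bound to the "?" placeholder as plain data."""
    row = conn.execute(
        "SELECT name FROM profiles WHERE id = ?", (id_param,)).fetchone()
    return row is not None


print(page_loads_safe("1008"))          # True
print(page_loads_safe("1008 AND 1=1"))  # False: the whole string is one value
print(page_loads_safe("1008 AND 1=2"))  # False: same answer as the 1=1 probe
```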
You can also use a vulnerability assessment tool to test your application and see how it responds to blind
SQL injection attacks. Many such tools exist, both free and commercial, and they are great at helping you
find blind SQL injection vulnerabilities before an attacker does.
Blind SQL Injection is slower than normal attacks
The hacker can continue in this way, and slowly find out more and more information about the database
system under attack. Blind SQL injection is quite a bit slower than normal SQL injection because the
database system does not display error messages: each request only tells the hacker whether a condition
was true or false, so extracting even a short piece of data takes many requests.
What is the difference between a left outer join and a right outer join?
It is best to illustrate the differences between left outer joins and right outer joins by use of an
example. Here we have 2 tables that we will use for our example:
Employee
EmpID  EmpName
13     Jason
8      Alex
3      Ram
17     Babu
25     Johnson

Location
EmpID  EmpLoc
13     San Jose
8      Los Angeles
3      Pune, India
17     Chennai, India
39     Bangalore, India
For the purpose of our example, it is important to note that the very last employee in the
Employee table (Johnson, who has an ID of 25) is not in the Location table. Also, no one from
the Employee table is from Bangalore (the employee with ID 39 is not in the Employee table).
These facts will be significant in the discussion that follows.
A left outer join
Using the tables above, here is what the SQL for a left outer join would look like:
select * from employee left outer join location
on employee.empID = location.empID;
In the SQL above, we are joining on the condition that the employee IDs match in the tables
Employee and Location. So, we are essentially combining 2 tables into 1, based on the
condition that the employee IDs match. Note that we can drop the "outer" in left outer join,
which gives us the SQL below. This is equivalent to what we have above.
select * from employee left join location
on employee.empID = location.empID;
What do left and right mean?
A left outer join retains all of the rows of the “left” table, regardless of whether there is a row
that matches on the “right” table. What are the “left” and “right” tables? That’s easy: the “left”
table is simply the table that comes first in the join statement. In this case it is the Employee
table, and it’s called the “left” table because it appears to the left of the keyword “join”. So, the
“right” table in this case would be Location. The SQL above will give us the result set shown
below.
Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India
25              Johnson           NULL            NULL
As you can see from the result set, all of the rows from the “left” table (Employee) are returned
when we do a left outer join. The last row of the Employee table (which contains the "Johnson"
entry) is displayed in the results even though there is no matching row in the Location table. The
non-matching columns in that last row are filled with "NULL"; in general, we get "NULL" as the
entry wherever there is no match.
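You can reproduce this left outer join with Python's sqlite3 module (our choice for a runnable sketch; the table contents are copied from the example above, and SQLite is only a stand-in for whatever database you use). SQL NULL comes back as Python's None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empID INTEGER, empName TEXT)")
conn.execute("CREATE TABLE location (empID INTEGER, empLoc TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(13, "Jason"), (8, "Alex"), (3, "Ram"), (17, "Babu"), (25, "Johnson")])
conn.executemany("INSERT INTO location VALUES (?, ?)",
                 [(13, "San Jose"), (8, "Los Angeles"), (3, "Pune, India"),
                  (17, "Chennai, India"), (39, "Bangalore, India")])

rows = conn.execute("""
    SELECT * FROM employee LEFT OUTER JOIN location
    ON employee.empID = location.empID
""").fetchall()
for row in rows:
    print(row)  # Johnson's row ends with None, None (SQL NULLs)
```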
What is a right outer join?
A right outer join is pretty much the same thing as a left outer join, except that all the rows from
the right table are displayed in the result set, regardless of whether or not they have matching
values in the left table. This is what the SQL looks like for a right outer join:
select * from employee right outer join location
on employee.empID = location.empID;
-- taking out the "outer", this also works:
select * from employee right join location
on employee.empID = location.empID;
Using the tables presented above, we can show what the result set of a right outer join would
look like:
Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India
NULL            NULL              39              Bangalore, India
We can see that the last row returned in the result set contains the row that was in the Location
table but had no matching “empID” in the Employee table (the "Bangalore, India" entry).
Because there is no row in the Employee table that has an employee ID of "39", we have
NULLs in that row for the Employee columns.
So, what is the difference between the right and left outer joins?
The difference is simple: in a left outer join, all of the rows from the “left” table will be
displayed, regardless of whether there are any matching columns in the “right” table. In a right
outer join, all of the rows from the “right” table will be displayed, regardless of whether there are
any matching columns in the “left” table. Hopefully the example that we gave above helped
clarify this as well.
Should I use a right outer join or a left outer join?
Actually, it doesn’t matter. The right outer join does not add any functionality that the left outer
join didn’t already have, and vice versa. All you would have to do to get the same results from a
right outer join and a left outer join is switch the order in which the tables appear in the SQL
statement. If that’s confusing, just take a closer look at the examples given above.
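Here is that "switch the order" idea as a Python/sqlite3 sketch, using the same sample tables. Listing the columns explicitly keeps the output in the same Employee-then-Location order. SQLite only understands RIGHT JOIN from version 3.39.0 onward, so the native form is compared only when the bundled SQLite is new enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empID INTEGER, empName TEXT)")
conn.execute("CREATE TABLE location (empID INTEGER, empLoc TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(13, "Jason"), (8, "Alex"), (3, "Ram"), (17, "Babu"), (25, "Johnson")])
conn.executemany("INSERT INTO location VALUES (?, ?)",
                 [(13, "San Jose"), (8, "Los Angeles"), (3, "Pune, India"),
                  (17, "Chennai, India"), (39, "Bangalore, India")])

cols = "e.empID, e.empName, l.empID, l.empLoc"

# A right outer join written as a left outer join with the tables swapped.
swapped = conn.execute(
    f"SELECT {cols} FROM location l LEFT JOIN employee e ON e.empID = l.empID"
).fetchall()

if sqlite3.sqlite_version_info >= (3, 39, 0):
    right = conn.execute(
        f"SELECT {cols} FROM employee e RIGHT JOIN location l ON e.empID = l.empID"
    ).fetchall()
    assert set(right) == set(swapped)  # same rows either way

for row in swapped:
    print(row)
```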
In SQL, what’s the difference between a full join and an inner join?
A brief explanation of a join
Let’s start with a quick explanation of a join. Joins are used to combine the data from two tables,
with the result being a new, temporary table. The temporary table is created based on column(s)
that the two tables share, which represent meaningful column(s) of comparison. The goal is to
extract meaningful data from the resulting temporary table. Joins are performed based on
something called a predicate, which specifies the condition to use in order to perform a join.
It is best to illustrate the differences between full joins and inner joins by use of an example.
Here we have 2 tables that we will use for our example:
Employee
EmpID  EmpName
13     Jason
8      Alex
3      Ram
17     Babu
25     Johnson

Location
EmpID  EmpLoc
13     San Jose
8      Los Angeles
3      Pune, India
17     Chennai, India
39     Bangalore, India
For the purpose of our example, it is important to note that the very last employee in the
Employee table (Johnson, who has an ID of 25) is not in the Location table. Also, no one from the
Employee table is from Bangalore (the employee with ID 39 is not in the Employee table). These
facts will be significant in the discussion that follows.
Full joins
Let’s start the explanation with full joins. Here is what the SQL for a full join would look like,
using the tables above:
select * from employee full join location
on employee.empID = location.empID;
A full join will return all rows that match based on the “employee.empID = location.empID” join
predicate, and it will even return all the rows that do not match – which is why it is called a full
join. The SQL above will give us the result set shown below:
Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India
25              Johnson           NULL            NULL
NULL            NULL              39              Bangalore, India
You can see in the table above that the full outer join returned all the rows from both tables,
and where the tables do have a match on the empID, that is made clear in the results. Anywhere
there was not a match on the empID, there is a “NULL” for the column value. So, that is what a
full join will look like.
A full join is also known as a full outer join
It’s good to remember that a full join is also known as a full outer join, because it combines the
features of both a left outer join and a right outer join.
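Because a full join combines a left and a right outer join, it can be written that way too. This Python/sqlite3 sketch (same sample tables) builds it as a left join plus the Location rows that found no Employee match; that emulation is our choice, needed because SQLite only supports FULL OUTER JOIN natively from version 3.39.0, so the native form is checked only when available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empID INTEGER, empName TEXT)")
conn.execute("CREATE TABLE location (empID INTEGER, empLoc TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(13, "Jason"), (8, "Alex"), (3, "Ram"), (17, "Babu"), (25, "Johnson")])
conn.executemany("INSERT INTO location VALUES (?, ?)",
                 [(13, "San Jose"), (8, "Los Angeles"), (3, "Pune, India"),
                  (17, "Chennai, India"), (39, "Bangalore, India")])

cols = "e.empID, e.empName, l.empID, l.empLoc"

# Left outer join, plus the right-side rows that had no match on the left.
rows = conn.execute(f"""
    SELECT {cols} FROM employee e LEFT JOIN location l ON e.empID = l.empID
    UNION ALL
    SELECT {cols} FROM location l LEFT JOIN employee e ON e.empID = l.empID
    WHERE e.empID IS NULL
""").fetchall()

if sqlite3.sqlite_version_info >= (3, 39, 0):
    native = conn.execute(
        f"SELECT {cols} FROM employee e FULL OUTER JOIN location l ON e.empID = l.empID"
    ).fetchall()
    assert set(native) == set(rows)

for row in rows:
    print(row)
```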
What about inner joins?
Now that we’ve gone over full joins, we can contrast those with the inner join. The difference
between an inner join and a full join is that an inner join will return only the rows that actually
match based on the join predicate – which in this case is “employee.empID = location.empID”.
Once again, this is best illustrated via an example. Here’s what the SQL for an inner join will
look like:
select * from employee inner join location on
employee.empID = location.empID
This can also be written as:
select * from employee, location
where employee.empID = location.empID
Now, here is what the result of running that SQL would look like:
Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India
The difference between the full join and inner join
We can see that an inner join will only return rows in which there is a match based on the join
predicate. In this case, that means any time the Employee and Location tables share an
Employee ID, a row will be generated in the results to show the match. Looking at the original
tables, one can see that exactly the Employee IDs shared by both tables are displayed in the
results. With a full join, on the other hand, the result set retains all of the rows from both tables.
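A quick Python/sqlite3 check (same sample tables; SQLite is just our stand-in database) that the two inner join spellings shown above return the same four matched rows, with Johnson (25) and Bangalore (39) left out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empID INTEGER, empName TEXT)")
conn.execute("CREATE TABLE location (empID INTEGER, empLoc TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(13, "Jason"), (8, "Alex"), (3, "Ram"), (17, "Babu"), (25, "Johnson")])
conn.executemany("INSERT INTO location VALUES (?, ?)",
                 [(13, "San Jose"), (8, "Los Angeles"), (3, "Pune, India"),
                  (17, "Chennai, India"), (39, "Bangalore, India")])

# Explicit INNER JOIN ... ON syntax.
explicit = conn.execute(
    "SELECT * FROM employee INNER JOIN location "
    "ON employee.empID = location.empID").fetchall()

# Older comma syntax with the join predicate in the WHERE clause.
implicit = conn.execute(
    "SELECT * FROM employee, location "
    "WHERE employee.empID = location.empID").fetchall()

print(len(explicit), set(explicit) == set(implicit))  # 4 True
```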
In SQL, what is the difference between a left join and a left outer join?
There is actually no difference between a left join and a left outer join – they both refer to the
exact same operation in SQL. An example will help clear this up.
Here we have 2 tables that we will use for our example:
Employee
EmpID  EmpName
13     Jason
8      Alex
3      Ram
17     Babu
25     Johnson

Location
EmpID  EmpLoc
13     San Jose
8      Los Angeles
3      Pune, India
17     Chennai, India
39     Bangalore, India
It’s important to note that the very last row in the Employee table does not exist in the Location
table. Also, the very last row in the Location table does not exist in the Employee table. These
facts will prove to be significant in the discussion that follows.
Left Outer Join
Here is what the SQL for a left outer join would look like, using the tables above:
select * from employee left outer join location
on employee.empID = location.empID;
We can also remove the "outer" in left outer join, which gives us the SQL below. Running the SQL
with the “outer” keyword gives exactly the same results as running it without the “outer”. Here is
the SQL without the “outer” keyword:
select * from employee left join location
on employee.empID = location.empID;
A left outer join (also known as a left join) retains all of the rows of the left table, regardless of
whether there is a row that matches on the right table. The SQL above will give us the result set
shown below.
Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India
25              Johnson           NULL            NULL
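To confirm that the keyword makes no difference, this Python/sqlite3 sketch (same sample tables, SQLite as a stand-in) runs both spellings and compares the results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empID INTEGER, empName TEXT)")
conn.execute("CREATE TABLE location (empID INTEGER, empLoc TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(13, "Jason"), (8, "Alex"), (3, "Ram"), (17, "Babu"), (25, "Johnson")])
conn.executemany("INSERT INTO location VALUES (?, ?)",
                 [(13, "San Jose"), (8, "Los Angeles"), (3, "Pune, India"),
                  (17, "Chennai, India"), (39, "Bangalore, India")])

with_outer = conn.execute(
    "SELECT * FROM employee LEFT OUTER JOIN location "
    "ON employee.empID = location.empID").fetchall()
without_outer = conn.execute(
    "SELECT * FROM employee LEFT JOIN location "
    "ON employee.empID = location.empID").fetchall()

print(set(with_outer) == set(without_outer))  # True
```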
What is the difference between a right outer join and a right join?
Once again, a right outer join is exactly the same as a right join. This is what the SQL looks like:
select * from employee right outer join location
on employee.empID = location.empID;
-- taking out the "outer", this would give us
-- the same results:
select * from employee right join location
on employee.empID = location.empID;
Using the tables presented above, we can show what the result set of a right outer join would
look like:
Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India
NULL            NULL              39              Bangalore, India
We can see that the last row returned in the result set contains the row that was in the Location
table, but not in the Employee table (the "Bangalore, India" entry). Because there is no matching
row in the Employee table that has an employee ID of "39", we have NULLs in the result set for
the Employee columns.
In SQL, what’s the difference between the having clause and the group by statement?
In SQL, the having clause and the group by statement work together when using aggregate functions like
SUM, AVG, MAX, etc. This is best illustrated by an example. Suppose we have a table called emp_bonus as
shown below. Note that the table has multiple entries for employees A and B – which means that both
employees A and B have received multiple bonuses.
emp_bonus
Employee  Bonus
A         1000
B         2000
A         500
C         700
B         1250
If we want to calculate the total bonus amount that each employee has received, then we would write a
SQL statement like this:
select employee, sum(bonus) from emp_bonus group by employee;
The Group By Clause
In the SQL statement above, you can see that we use the "group by" clause with the employee column. The
group by clause allows us to find the sum of the bonuses for each employee, because each employee is
treated as his or her very own group. Using the ‘group by’ in combination with the ‘sum(bonus)’ statement
will give us the sum of all the bonuses for employees A, B, and C.
Running the SQL above would return this:
Employee  Sum(Bonus)
A         1500
B         3250
C         700
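The same grouping can be tried in Python's sqlite3 module (our choice for a runnable sketch, with the emp_bonus rows copied from the table above). The ORDER BY is an addition of ours so that the printed order is predictable; GROUP BY alone does not guarantee any order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_bonus (employee TEXT, bonus INTEGER)")
conn.executemany("INSERT INTO emp_bonus VALUES (?, ?)",
                 [("A", 1000), ("B", 2000), ("A", 500), ("C", 700), ("B", 1250)])

# One output row per employee; SUM() adds up that employee's bonuses.
rows = conn.execute(
    "SELECT employee, SUM(bonus) FROM emp_bonus "
    "GROUP BY employee ORDER BY employee").fetchall()
print(rows)  # [('A', 1500), ('B', 3250), ('C', 700)]
```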
Now, suppose we wanted to find the employees who received more than $1,000 in bonuses for the year of
2012 (this assumes, of course, that the emp_bonus table contains bonuses only for the year of 2012). This
is when we need to use the HAVING clause to add the additional check to see if the sum of bonuses is
greater than $1,000, and this is what the SQL looks like:
GOOD SQL:
select employee, sum(bonus) from emp_bonus
group by employee having sum(bonus) > 1000;
And the result of running the SQL above would be this:
Employee  Sum(Bonus)
A         1500
B         3250
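Adding the HAVING filter to the same Python/sqlite3 sketch (same sample data; the ORDER BY is again ours, only for a predictable printout) drops employee C, whose total of 700 fails the SUM(bonus) > 1000 check:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_bonus (employee TEXT, bonus INTEGER)")
conn.executemany("INSERT INTO emp_bonus VALUES (?, ?)",
                 [("A", 1000), ("B", 2000), ("A", 500), ("C", 700), ("B", 1250)])

# GROUP BY forms the per-employee groups; HAVING then filters on the aggregate.
rows = conn.execute(
    "SELECT employee, SUM(bonus) FROM emp_bonus "
    "GROUP BY employee HAVING SUM(bonus) > 1000 "
    "ORDER BY employee").fetchall()
print(rows)  # [('A', 1500), ('B', 3250)]
```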
Difference between having clause and group by statement
So, from the example above, we can see that the group by clause is used to group column(s) so that
aggregates (like SUM, MAX, etc.) can be used to find the necessary information. The having clause is
used with the group by clause when comparisons need to be made with those aggregate functions – like to
see if the SUM is greater than 1,000, as in our example above. So, the having clause and group by
statements are not really alternatives to each other – but they are used alongside one another!
In SQL, how and when would you do a group by with multiple columns? Also provide an example.
In SQL, the group by statement is used along with aggregate functions like SUM, AVG, MAX, etc. Using
the group by statement with multiple columns is useful in many different situations – and it is best
illustrated by an example. Suppose we have a table shown below called Purchases. The Purchases table will
keep track of all purchases made at a fictitious store.
Purchases
purchase_date            item            items_purchased
2011-03-25 00:00:00.000  Wireless Mouse  2
2011-03-25 00:00:00.000  Wireless Mouse  5
2011-03-25 00:00:00.000  MacBook Pro     1
2011-04-01 00:00:00.000  Paper Clips     20
2011-04-01 00:00:00.000  Stapler         3
2011-04-01 00:00:00.000  Paper Clips     15
2011-05-15 00:00:00.000  DVD player      3
2011-05-15 00:00:00.000  DVD player      8
2011-05-15 00:00:00.000  Stapler         5
2011-05-16 00:00:00.000  MacBook Pro     2
Now, let’s suppose that the owner of the store wants to find out, for a given date, how many of each product
were sold in the store. Then we would write this SQL in order to find that out:
select purchase_date, item, sum(items_purchased) as
"Total Items" from Purchases group by item, purchase_date;
Running the SQL above would return this:
purchase_date            item            Total Items
2011-03-25 00:00:00.000  Wireless Mouse  7
2011-03-25 00:00:00.000  MacBook Pro     1
2011-04-01 00:00:00.000  Paper Clips     35
2011-04-01 00:00:00.000  Stapler         3
2011-05-15 00:00:00.000  DVD player      11
2011-05-15 00:00:00.000  Stapler         5
2011-05-16 00:00:00.000  MacBook Pro     2
Note that in the SQL we wrote, the group by statement uses multiple columns: “group by item,
purchase_date;”. This forms one group for each combination of item and purchase date, so for a given date
we can find how many units of each item were purchased on that date. This is why the group by statement
with multiple columns is so useful!
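A runnable version of this multi-column grouping, as a Python/sqlite3 sketch. For brevity the dates are stored as plain 'YYYY-MM-DD' text without the time part, and the ORDER BY is ours so the rows print in a predictable order; the totals match the result table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchases (purchase_date TEXT, item TEXT, items_purchased INTEGER)")
conn.executemany("INSERT INTO Purchases VALUES (?, ?, ?)", [
    ("2011-03-25", "Wireless Mouse", 2),
    ("2011-03-25", "Wireless Mouse", 5),
    ("2011-03-25", "MacBook Pro", 1),
    ("2011-04-01", "Paper Clips", 20),
    ("2011-04-01", "Stapler", 3),
    ("2011-04-01", "Paper Clips", 15),
    ("2011-05-15", "DVD player", 3),
    ("2011-05-15", "DVD player", 8),
    ("2011-05-15", "Stapler", 5),
    ("2011-05-16", "MacBook Pro", 2),
])

# One group per (item, purchase_date) pair; SUM() totals each group.
rows = conn.execute("""
    SELECT purchase_date, item, SUM(items_purchased) AS "Total Items"
    FROM Purchases
    GROUP BY item, purchase_date
    ORDER BY purchase_date, item
""").fetchall()
for row in rows:
    print(row)
```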
In SQL, how do distinct and order by work together?
The best way to illustrate this is through an example. Let’s say that we have a table called Orders like the
one below – where each row represents a separate order.
Orders
ordernumber  order_date  cust_id  salesperson_id  Amount
10           8/2/96      4        2               540
20           1/30/99     4        8               1800
30           7/14/95     9        1               460
40           1/29/98     7        2               2400
50           2/3/98      6        7               600
60           3/2/98      6        7               720
70           5/6/98      9        7               150
Now suppose that we want to retrieve all of the salesperson IDs and sort them in descending order
according to their highest respective Order Amount value (that would be the Amount column). This will
serve as a ranking of the salespeople to see who has the most valuable orders. And, of course, we only want
each salesperson ID to be displayed once in the results; we don’t really care about all of their order
amounts, just their highest order amount value.
So, you might think that you can just write some SQL like this to get what you want:
SELECT DISTINCT salesperson_id
FROM Orders
ORDER BY Amount DESC
-- in descending order, returns highest amount first...
DISTINCT and Order By in MySQL
You might expect that query to return this in MySQL:
salesperson_id
--------------
2
8
7
1
Running the SQL above in MySQL actually returns this as the result set:
salesperson_id
--------------
8
7
2
1
But, wait a minute…if you just look at the Orders table above you can see that the salesperson_id with the
highest corresponding Amount is not 8, but 2 – because the salesperson_id of 2 has an order with an
amount of 2400! And, 2 appears 3rd in the list. So what the heck is going on here – why is our SQL
returning such strange results?
Well, let’s analyze the query a bit more to see what is actually happening. We are asking for every distinct
salesperson_id in the Orders table, ordered by their corresponding order Amount. But, the problem here is
that the salespeople with salesperson_id values of 2 and 7 both have multiple orders in the Orders table.
The query itself is not specific enough
So, in the query above we are asking MySQL to retrieve every distinct value of the salesperson_id and order
those results by their corresponding Amount value. For example, when it comes across orders with a
salesperson_id of 2, it does not know whether we want the row where the order amount is 540 or 2400,
and it has to choose only one of those rows because we specifically asked for distinct values of the
salesperson_id. This means that it just picks one of those rows arbitrarily, since we never really told it
which one. And MySQL evidently chose the row where the amount is 540, because 2 would have been
returned at the top of our list if it had chosen the row where the Amount is 2400.
But, you might be thinking that we specify that we want to order the results by the descending Amount
values – so why doesn’t the SQL just take the highest value for each salesperson_id and use that? Well,
because we never really told SQL that is what we actually wanted! Look closely at the SQL and you
will see what I mean – do we ever actually specify to choose the highest Amount for EACH salesperson_id
and to use that value? No, we don’t!
And that means the problem is that the SQL is not specific enough: we have to tell the RDBMS exactly what
we want in order to get the right results, otherwise we get results that do not make sense. In other words,
when you do stupid things, stupid things happen.
Why does MySQL allow columns in the ORDER BY if they are not part of the
SELECT DISTINCT list?
Actually, running the query above would result in an error message in other RDBMS’s like SQL Server. The
only reason MySQL allows it is because it assumes you know what you are doing – the query would actually
make sense if the Amount value was the same across different rows for a given salesperson_id. As always,
an example will help clarify what we mean here. Let’s suppose that the Orders table looks like this instead:
Orders
ordernumber  order_date  cust_id  salesperson_id  Amount
10           8/2/96      4        2               2400
20           1/30/99     4        8               1800
30           7/14/95     9        1               460
40           1/29/98     7        2               2400
50           2/3/98      6        7               600
60           3/2/98      6        7               600
70           5/6/98      9        7               600
Now, if we run that same exact query:
SELECT DISTINCT salesperson_id
FROM Orders
ORDER BY Amount DESC
-- in descending order, returns highest amount first...
We will now get results that make sense:
salesperson_id
--------------
2
8
7
1
The reason we get the results that we expected is that now all the rows with a salesperson_id of 2 have the
same Amount (2400), and all the rows with a salesperson_id of 7 have the same Amount (600). This means that
even though MySQL will arbitrarily choose a row in the group of rows with a salesperson_id of 2 or 7, each
row will have the same exact Amount value as all the others in its group, so it does not matter which row
MySQL chooses in a given group: you will get the same results.
So, we can say that it is safe to ORDER BY a column that is not in the select list while selecting a different
column with DISTINCT, as long as every group of rows that shares a distinct value also shares the same value
in the ORDER BY column. That sounds confusing, but it should make sense if you paid attention to our example.
If that condition does not hold, then you run the risk of getting some very unexpected results, as we showed
above.
Now, the question is what is a good workaround to the problem we presented above? Read on below to find
out – the solution we present should work across most (if not all) RDBMS’s.
Workaround for the “ORDER BY items must appear in the select list if
SELECT DISTINCT is specified” error message in SQL Server
As we mentioned above, MySQL would allow you to run a query like this without throwing any error:
SELECT DISTINCT salesperson_id
FROM Orders
ORDER BY Amount DESC
But, SQL Server actually does throw an error which says “ORDER BY items must appear in the select list if
SELECT DISTINCT is specified”. So, the question is what is a good way to modify our SQL so that we can
work around this error message and get the results that we actually want?
Well, you might think that in order to fix the error message that you would get in SQL Server you could just
write some code like this:
SELECT DISTINCT salesperson_id, Amount
FROM Orders
ORDER BY Amount DESC
But, think carefully about what the SQL above is doing. It is applying the DISTINCT keyword to both the
salesperson_id and Amount columns – which basically means that every row where those 2 columns have a
distinct combination of values will be returned in the results. Take a look at the Orders table and you can
see that every row in the table has a distinct combination of the salesperson_id and Amount values, which
also means that the salesperson_id and Amount will be returned from every row in the table when the SQL
above is run. Of course, the results will be ordered by the Amount in descending order.
And, this is what the results will look like when we run the SQL above:
salesperson_id  Amount
--------------  ------
2               2400
8               1800
7               720
7               600
2               540
1               460
7               150
But, is this what we actually wanted? No! What we really want is the list of salesperson IDs in order of who
has the highest valued order, where each salesperson ID appears only once in the result list. All the query
above gives us is every row’s salesperson_id and Amount combination in the table, ordered by
the Amount value.
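To see that DISTINCT really applies to the pair of columns, here is a minimal sketch that runs the query through Python's built-in sqlite3 module against an in-memory SQLite database. SQLite is an assumption standing in for SQL Server, and the Orders table is reduced to the salesperson_id and Amount pairs from the result listed above; the other columns are left out, and distinct_pairs is just an illustrative name.

```python
import sqlite3

# salesperson_id / Amount pairs taken from the result listed above;
# the other Orders columns are omitted because this query does not use them.
ORDERS = [(2, 2400), (8, 1800), (7, 720), (7, 600), (2, 540), (1, 460), (7, 150)]


def distinct_pairs():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (salesperson_id INTEGER, Amount INTEGER)")
    conn.executemany("INSERT INTO Orders VALUES (?, ?)", ORDERS)
    # DISTINCT applies to the whole (salesperson_id, Amount) combination,
    # so a salesperson with several different amounts appears several times.
    rows = conn.execute(
        "SELECT DISTINCT salesperson_id, Amount FROM Orders ORDER BY Amount DESC"
    ).fetchall()
    conn.close()
    return rows


rows = distinct_pairs()
print(len(rows))                          # 7: one row per order
print([sid for sid, _ in rows].count(7))  # 3: salesperson 7 appears three times
```

Because every (salesperson_id, Amount) pair in this data is unique, DISTINCT removes nothing, which is exactly the problem described above.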
So what is a workaround for this problem? In other words, how can we be more specific to get what we
really want? Well, let’s rephrase the problem: what if we say we want to retrieve each salesperson ID
sorted by their respective highest dollar amount value (and only have each salesperson_id returned
once)? This is different from just saying that we want each distinct salesperson ID sorted by their Amount,
because we are being more specific by saying that we want to sort by their respective highest dollar
amount value. Hopefully you see the difference.
Now that we have a more specific question in mind, let’s see if we can come up with a more
specific answer so that we can write the correct SQL. Well, since we want to find the highest Amount value
for each salesperson_id, what SQL construct do you think we should use? If you guessed GROUP BY, you
would be correct: in order to find the highest value for each salesperson_id, we need to group the rows
by salesperson_id with the GROUP BY clause. Then, we can order the results by the maximum Amount in each
group. So, this is what the SQL would look like:
SELECT salesperson_id
FROM Orders
GROUP BY salesperson_id
ORDER BY MAX(Amount) DESC
-- in descending order, returns highest amount first...
-- DISTINCT is not needed: GROUP BY already returns each salesperson_id once
Just to clarify how the group by will work – for the “group” of salesperson ID’s equal to 7, the maximum
value of the amount would be 720. And for the “group” of salesperson ID’s equal to 2, the maximum value
of the amount would be 2400. So, running the SQL above would give us these results, which is correct:
salesperson_id
--------------
2
8
7
1
Finally we have a query that makes sense, which also gives us results that make sense!
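The GROUP BY version can be checked the same way. The sketch below assumes Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL or SQL Server, loaded with the salesperson_id and Amount pairs from the two-column result listed earlier in this section; the function name is purely illustrative.

```python
import sqlite3

# salesperson_id / Amount pairs from the two-column result listed earlier
ORDERS = [(2, 2400), (8, 1800), (7, 720), (7, 600), (2, 540), (1, 460), (7, 150)]


def salespeople_by_top_order():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (salesperson_id INTEGER, Amount INTEGER)")
    conn.executemany("INSERT INTO Orders VALUES (?, ?)", ORDERS)
    # GROUP BY yields one row per salesperson_id; MAX(Amount) is well defined
    # for each group, so ordering by it is no longer ambiguous.
    rows = conn.execute(
        """
        SELECT salesperson_id
        FROM Orders
        GROUP BY salesperson_id
        ORDER BY MAX(Amount) DESC
        """
    ).fetchall()
    conn.close()
    return [sid for (sid,) in rows]


print(salespeople_by_top_order())  # [2, 8, 7, 1]
```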
In SQL, what is the default sort order of the Order By clause?
By default, the order by statement will sort in ascending order if no order (whether ascending or
descending) is explicitly specified. This means that because the default sort order is ascending, the values
will be sorted starting from the “smallest” value to the largest. This is true in all major RDBMS’s – including
MySQL, Oracle, Microsoft SQL Server, Teradata, SAP, and others.
An example showing the Order By default sort order:
Take a look at the simple table below.
Customers
cust_id  cust_name
79       Joe
32       Bill
87       Akash
14       Sam
Now, let’s write some SQL to retrieve the cust_name values sorted by their respective cust_id’s, but note
that we do not specify whether to sort by descending or ascending order:
select cust_name
FROM Customers
ORDER BY cust_id
Because the order by will work in ascending order by default, the SQL above will return the following
results:
cust_name
Sam
Bill
Joe
Akash
Now you have seen the default behavior of the Order By clause in SQL – it will sort in ascending order.
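A quick way to confirm the default is to run the query without ASC or DESC. This is a small sketch using Python's sqlite3 module with an in-memory SQLite database (standing in for the RDBMS's named above) and the Customers rows from the table above; names_by_id is an illustrative name.

```python
import sqlite3


def names_by_id():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (cust_id INTEGER, cust_name TEXT)")
    conn.executemany(
        "INSERT INTO Customers VALUES (?, ?)",
        [(79, "Joe"), (32, "Bill"), (87, "Akash"), (14, "Sam")],
    )
    # No ASC or DESC keyword, so the rows come back in ascending cust_id order
    rows = conn.execute("SELECT cust_name FROM Customers ORDER BY cust_id").fetchall()
    conn.close()
    return [name for (name,) in rows]


print(names_by_id())  # ['Sam', 'Bill', 'Joe', 'Akash']
```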
Suppose that you are given the following simple database table called
Employee that has 2 columns named EmployeeID and Salary:
Employee
EmployeeID  Salary
3           200
4           800
7           450
Write a SQL query to get the second highest salary from the table above.
Also write a query to find the nth highest salary in SQL, where n can be any
number.
The easiest way to start with a problem like this is to ask yourself a simpler question first. So, let’s ask
ourselves how can we find the highest salary in a table? Well, you probably know that is actually really easy
– we can just use the MAX aggregate function:
select MAX(Salary) from Employee;
Remember that SQL is based on set theory
You should remember that SQL uses sets as the foundation for most of its queries. So, the question is how
can we use set theory to find the 2nd highest salary in the table above? Think about it on your own for a bit
– even if you do not remember much about sets, the answer is very easy to understand and something that
you might be able to come up with on your own.
Figuring out the answer to find the 2nd highest salary
What if we try to exclude the highest salary value from the result set returned by the SQL that we run? If
we remove the highest salary from a group of salary values, then we will have a new group of values
whose highest salary is actually the 2nd highest in the original Employee table.
So, if we can somehow select the highest value from a result set that excludes the highest value, then we
would actually be selecting the 2nd highest salary value. Think about that carefully and see if you can come
up with the actual SQL yourself before you read the answer that we provide below. Here is a small hint to
help you get started: you will have to use the “NOT IN” SQL operator.
Solution to finding the 2nd highest salary in SQL
Now, here is what the SQL will look like:
SELECT MAX(Salary) FROM Employee
WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee )
Running the SQL above would return us “450”, which is of course the 2nd highest salary in the Employee
table.
An explanation of the solution
The SQL above first finds the highest salary value in the Employee table using “(select MAX(Salary) from
Employee)”. Then, adding the “WHERE Salary NOT IN” in front creates a new set of Salary values that does
not include the highest Salary value. For instance, the highest salary in our Employee table is 800, so that
value will be excluded from the results by the “NOT IN” operator, and only the values 200 and 450 will be
retained.
This now means that the highest value in this new result set will actually be the 2nd highest value in the
Employee table. So, we then select the max Salary from the new result set, and that gives us 2nd highest
Salary in the Employee table. And that is how the query above works.
An alternative solution using the not equals SQL operator
We can actually use the not equals operator – the “<>” – instead of the NOT IN operator as an alternative
solution to this problem. This is what the SQL would look like:
select MAX(Salary) from Employee
WHERE Salary <> (select MAX(Salary) from Employee )
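Both versions can be tried side by side. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module and an in-memory SQLite database instead of a production RDBMS, and make_employee_db and second_highest are hypothetical helper names, not part of any library.

```python
import sqlite3


def make_employee_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, Salary INTEGER)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", rows)
    return conn


def second_highest(conn, use_not_equals=False):
    # Only these two fixed operators are ever interpolated into the SQL
    op = "<>" if use_not_equals else "NOT IN"
    sql = f"""
        SELECT MAX(Salary) FROM Employee
        WHERE Salary {op} (SELECT MAX(Salary) FROM Employee)
    """
    return conn.execute(sql).fetchone()[0]


conn = make_employee_db([(3, 200), (4, 800), (7, 450)])
print(second_highest(conn))                       # 450
print(second_highest(conn, use_not_equals=True))  # 450
```

The `<>` form works here because the subquery returns exactly one value (the maximum); NOT IN would also accept a subquery returning several values.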
How would you write a SQL query to find the Nth highest salary?
What we did above was write a query to find the 2nd highest Salary value in the Employee table. But,
another commonly asked interview question is how can we use SQL to find the Nth highest salary, where N
can be any number whether it’s the 3rd highest, 4th highest, 5th highest, 10th highest, etc? This is also an
interesting question – try to come up with an answer yourself before reading the one below to see what you
come up with.
The answer and explanation to finding the nth highest salary in SQL
Here we will present one possible answer to finding the nth highest salary first, and the explanation of
that answer after since it’s actually easier to understand that way. Note that the first answer we present
is actually not optimal from a performance standpoint since it uses a subquery, but we think that it will be
interesting for you to learn about because you might just learn something new about SQL. If you want to
see the more optimal solutions first, you can skip down to the section titled “Find the nth highest salary in
SQL without a subquery” instead.
The SQL below will give you the correct answer – but you will have to plug in an actual value for N of
course. This SQL to find the Nth highest salary should work in SQL Server, MySQL, DB2, Oracle, Teradata,
and almost any other RDBMS:
SELECT * /*This is the outer query part */
FROM Employee Emp1
WHERE (N-1) = ( /* Subquery starts here */
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)
How does the query above work?
The query above can be quite confusing if you have not seen anything like it before – pay special attention
to the fact that “Emp1” appears in both the subquery (also known as an inner query) and the “outer” query.
The outer query is just the part of the query that is not the subquery/inner query – both parts of the query
are clearly labeled in the comments.
The subquery is a correlated subquery
The subquery in the SQL above is actually a specific type of subquery known as a correlated subquery. It is
called a correlated subquery because the subquery uses a value from the outer query in its WHERE clause.
In this case that value is Emp1.Salary, which comes from the Emp1 table alias as we pointed out earlier. A
normal subquery can be run independently of the outer query, but a correlated subquery can NOT be run
independently of the outer query. If you want to read more about the differences between correlated and
uncorrelated subqueries you can go here: Correlated vs Uncorrelated Subqueries.
The most important thing to understand in the query above is that the subquery is evaluated each and
every time a row is processed by the outer query. In other words, the inner query can not be processed
independently of the outer query since the inner query uses the Emp1 value as well.
Finding nth highest salary example and explanation
Let’s step through an actual example to see how the query above will actually execute step by step.
Suppose we are looking for the 2nd highest Salary value in our table above, so our N is 2. This means that
the query will look like this:
SELECT *
FROM Employee Emp1
WHERE (1) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)
You can probably see that Emp1 and Emp2 are just aliases for the same Employee table – it’s like we just
created 2 separate clones of the Employee table and gave them different names.
Understanding and visualizing how the query above works
Let’s assume that we are using this data:
Employee
EmployeeID  Salary
3           200
4           800
7           450
For the sake of our explanation, let’s assume that N is 2 – so the query is trying to find the 2nd highest
salary in the Employee table. The first thing that the query above does is process the very first row of the
Employee table, which has an alias of Emp1.
The salary in the first row of the Employee table is 200. Because the subquery is correlated to the outer
query through the alias Emp1, it means that when the first row is processed, the query will essentially look
like this – note that all we did is replace Emp1.Salary with the value of 200:
SELECT *
FROM Employee Emp1
WHERE (1) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > 200)
So, what exactly is happening when that first row is processed? Well, if you pay special attention to the
subquery you will notice that it’s basically searching for the count of salary entries in the Employee table
that are greater than 200. Basically, the subquery is trying to find how many salary entries are greater than
200. Then, that count of salary entries is checked to see if it equals 1 in the outer query, and if so then
everything from that particular row in Emp1 will be returned.
Note that Emp1 and Emp2 are both aliases for the same table – Employee. Emp2 is only being used in the
subquery to compare all the salary values to the current salary value chosen in Emp1. This allows us to find
the number of salary entries (the count) that are greater than 200. And if this number is equal to N-1
(which is 1 in our case) then we know that we have a winner – and that we have found our answer.
But the subquery will return 2 when Emp1.Salary is 200, because there are 2 salaries (800 and 450)
greater than 200 in the Employee table. And since 2 is not equal to 1, the row with a salary of 200 will
not be returned.
So, what happens next? Well, the SQL processor will move on to the next row which is 800, and the
resulting query looks like this:
SELECT *
FROM Employee Emp1
WHERE (1) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > 800)
Since there are no salaries greater than 800, the subquery returns 0 for that row, which does not equal 1,
so the row with a salary of 800 is not returned either. The query then moves on to the last row, where
Emp1.Salary is 450. Exactly one salary (800) is greater than 450, so the count is 1 and we have found our
answer. More precisely, the entire row with the desired salary would be returned, and this is what it would
look like:
EmployeeID  Salary
7           450
It’s also worth pointing out that the reason DISTINCT is used in the query above is because there may be
duplicate salary values in the table. In that scenario, we only want to count repeated salaries just once,
which is exactly why we use the DISTINCT operator.
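The row-by-row walkthrough above can be reproduced by moving the correlated subquery into the SELECT list, so the count it computes for each Emp1 row becomes visible. This is a sketch using Python's sqlite3 module and an in-memory SQLite database as a stand-in for the databases mentioned in the article, with the three Employee rows from the example; the greater_count alias is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(3, 200), (4, 800), (7, 450)])

# Same correlated subquery as above, moved into the SELECT list so the
# count computed for each Emp1 row can be inspected directly.
trace = conn.execute(
    """
    SELECT Emp1.EmployeeID,
           Emp1.Salary,
           (SELECT COUNT(DISTINCT Emp2.Salary)
              FROM Employee Emp2
             WHERE Emp2.Salary > Emp1.Salary) AS greater_count
    FROM Employee Emp1
    ORDER BY Emp1.EmployeeID
    """
).fetchall()

for employee_id, salary, greater_count in trace:
    print(employee_id, salary, greater_count)
# 3 200 2
# 4 800 0
# 7 450 1

# With N = 2 the outer query keeps only rows whose count equals N - 1 = 1
print([row for row in trace if row[2] == 2 - 1])  # [(7, 450, 1)]
```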
A high level summary of how the query works
Let’s go through a high level summary of how someone would have come up with the SQL in the first place
– since we showed you the answer first without really going through the thought process one would use to
arrive at that answer.
Think of it this way – we are looking for a pattern that will lead us to the answer. One way to look at it is
that the 2nd highest salary would have just one salary that is greater than it. The 4th highest salary would
have 3 salaries that are greater than it. In more general terms, in order to find the Nth highest salary,
we just find the salary that has exactly N-1 salaries greater than itself. And that is exactly what the
query above accomplishes – it simply finds the salary that has N-1 salaries greater than itself and returns
that value as the answer.
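Here is a sketch of the general pattern with N bound as a query parameter instead of being typed into the SQL. It assumes Python's sqlite3 module and an in-memory SQLite database; nth_highest is a hypothetical helper name, employee 9 with a salary of 800 is a made-up row added to show why COUNT(DISTINCT ...) matters, and the query selects DISTINCT Emp1.Salary rather than SELECT * so that a duplicated salary is returned only once.

```python
import sqlite3


def nth_highest(conn, n):
    # The correlated-subquery pattern from above with N bound as a parameter.
    # SELECT DISTINCT Emp1.Salary (rather than SELECT *) returns a duplicated
    # salary only once.
    rows = conn.execute(
        """
        SELECT DISTINCT Emp1.Salary
        FROM Employee Emp1
        WHERE (? - 1) = (
            SELECT COUNT(DISTINCT Emp2.Salary)
            FROM Employee Emp2
            WHERE Emp2.Salary > Emp1.Salary)
        """,
        (n,),
    ).fetchall()
    return rows[0][0] if rows else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, Salary INTEGER)")
# Employee 9 is a hypothetical extra row that duplicates the top salary
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)", [(3, 200), (4, 800), (7, 450), (9, 800)]
)
print([nth_highest(conn, n) for n in range(1, 5)])  # [800, 450, 200, None]
```

Thanks to COUNT(DISTINCT ...), the duplicated 800 still counts as a single salary greater than 450, so 450 remains the 2nd highest.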
Find the nth highest salary using the TOP keyword in SQL Server
We can also use the TOP keyword (for databases that support the TOP keyword, like SQL Server) to find the
nth highest salary. Here is some fairly simple SQL that would help us do that:
SELECT TOP 1 Salary
FROM (
SELECT DISTINCT TOP N Salary
FROM Employee
ORDER BY Salary DESC
) AS Emp
ORDER BY Salary
To understand the query above, first look at the subquery, which simply finds the N highest salaries in the
Employee table and arranges them in descending order. Then, the outer query will actually rearrange those
values in ascending order, which is what the very last line “ORDER BY Salary” does, because of the fact that
the ORDER BY Default is to sort values in ASCENDING order. Finally, that means the Nth highest salary will
be at the top of the list of salaries, which means we just want the first row, which is exactly what “SELECT
TOP 1 Salary” will do for us!
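SQLite has no TOP keyword, so the sketch below swaps in LIMIT to mirror the same two-step idea: keep the N highest distinct salaries, then take the smallest of them. It runs through Python's sqlite3 module against an in-memory database, and nth_highest_top_style is a hypothetical name.

```python
import sqlite3


def nth_highest_top_style(conn, n):
    # LIMIT takes the place of TOP in both queries: the inner query keeps the
    # n highest distinct salaries, the outer query re-sorts them ascending and
    # keeps the first (smallest) one.
    row = conn.execute(
        """
        SELECT Salary FROM (
            SELECT DISTINCT Salary
            FROM Employee
            ORDER BY Salary DESC
            LIMIT ?
        ) AS Emp
        ORDER BY Salary
        LIMIT 1
        """,
        (n,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(3, 200), (4, 800), (7, 450)])
print([nth_highest_top_style(conn, n) for n in (1, 2, 3)])  # [800, 450, 200]
```

One thing the sketch makes easy to check: if N is larger than the number of distinct salaries, the inner query simply returns all of them, so this pattern (like the TOP query it mirrors) returns the lowest salary rather than no row at all.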
Find the nth highest salary without using the TOP keyword
There are many other solutions to finding the nth highest salary that do not need to use the TOP keyword,
one of which we already went over. Keep reading for more solutions.
Find the nth highest salary in SQL without a subquery
The solution we gave above actually does not do well from a performance standpoint. This is because the
use of the subquery can really slow down the query. With that in mind, let’s go through some different
solutions to this problem for different database vendors. Because each database vendor (whether it’s
MySQL, Oracle, or SQL Server) has a different SQL syntax and functions, we will go through solutions for
specific vendors. But keep in mind that the solution presented above using a subquery should work across
different database vendors.
Find the nth highest salary in MySQL
In MySQL, we can just use the LIMIT clause along with an offset to find the nth highest salary. If that
doesn’t make sense take a look at the MySQL-specific SQL to see how we can do this:
SELECT Salary FROM Employee
ORDER BY Salary DESC LIMIT n-1,1
Note that the DESC used in the query above simply arranges the salaries in descending order, so from
highest salary to lowest. Then, the key part of the query to pay attention to is the “LIMIT N-1, 1”. The LIMIT
clause takes two arguments in that query: the first argument specifies the offset of the first row to return,
and the second specifies the maximum number of rows to return. So, it’s saying that the offset of the first
row to return should be N-1, and the max number of rows to return is 1. What exactly is the offset? Well,
the offset is the number of rows to skip before the first returned row, so an offset of 0 means the very first
row. Since the rows are arranged in descending order, skipping N-1 rows lands on the Nth row, which
contains the Nth highest salary. Keep in mind that MySQL only accepts integer constants (or placeholders in
prepared statements) as LIMIT arguments, so you have to replace N-1 with an actual number, for example
LIMIT 1, 1 for the 2nd highest salary. And if the table can contain duplicate salaries, use SELECT DISTINCT
Salary so that a repeated value is not counted twice.
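SQLite, used here through Python's sqlite3 module as a stand-in for MySQL, accepts MySQL's LIMIT offset, count form as well as the more explicit LIMIT count OFFSET offset form. The sketch below uses the latter with N-1 bound as a parameter, adds DISTINCT so that a repeated salary does not take up two positions in the list, and includes a hypothetical employee 9 who shares the top salary; nth_highest_offset is an illustrative name.

```python
import sqlite3


def nth_highest_offset(conn, n):
    # Skip n - 1 rows of the descending list of distinct salaries
    # and return the next one (or None if there is no such row).
    row = conn.execute(
        "SELECT DISTINCT Salary FROM Employee ORDER BY Salary DESC LIMIT 1 OFFSET ?",
        (n - 1,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, Salary INTEGER)")
# Employee 9 is a hypothetical row duplicating the top salary
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)", [(3, 200), (4, 800), (7, 450), (9, 800)]
)
print([nth_highest_offset(conn, n) for n in range(1, 5)])  # [800, 450, 200, None]
```

Without DISTINCT, the two 800 rows would occupy offsets 0 and 1, and asking for N = 2 would return 800 instead of 450.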
Find the nth highest salary in SQL Server
In SQL Server, there is no such thing as a LIMIT clause. But, we can still use the offset to find the nth
highest salary without using a subquery – just like the solution we gave above in MySQL syntax. But, the
SQL Server syntax will be a bit different. Here is what it would look like:
SELECT Salary FROM Employee
ORDER BY Salary DESC
OFFSET N-1 ROWS
FETCH NEXT 1 ROWS ONLY
Note that the OFFSET ... FETCH syntax used above is only available in SQL Server 2012 and up, and that N-1
has to be replaced with an actual number. Let me know in the comments if you notice anything else about the query.
Find the nth highest salary in Oracle using row_number
Oracle versions before 12c don’t support using an offset like MySQL and SQL Server (Oracle 12c added an
OFFSET ... FETCH clause), but we can use the row_number analytic function in Oracle to solve this problem.
Here is what the Oracle-specific SQL would look like to find the nth highest salary:
select * from (
select Emp.*,
row_number() over (order by Salary DESC) rownumb
from Employee Emp
)
where rownumb = n; /*n is nth highest salary*/
The first thing you should notice in the query above is that inside the subquery the salaries are arranged in
descending order. Then, the row_number analytic function is applied against the list of descending salaries.
Applying the row_number function against the list of descending salaries means that each row will be
assigned a row number starting from 1. And since the rows are arranged in descending order the row with
the highest salary will have a 1 for the row number. Note that the row number is given the alias rownumb in
the SQL above.
This means that in order to find the 3rd or 4th highest salary we simply look for the 3rd or 4th row. The
query above will then compare the rownumb to n, and if they are equal will return everything in that row.
And that will be our answer!
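A sketch of the same row_number approach, run through Python's sqlite3 module against an in-memory database. It assumes the SQLite library bundled with your Python is version 3.25 or later, since that is when SQLite added window functions such as ROW_NUMBER(); nth_highest_row_number is a hypothetical name.

```python
import sqlite3


def nth_highest_row_number(conn, n):
    # Number the rows from 1 in descending salary order, then keep row n.
    return conn.execute(
        """
        SELECT EmployeeID, Salary FROM (
            SELECT Emp.*,
                   ROW_NUMBER() OVER (ORDER BY Salary DESC) AS rownumb
            FROM Employee Emp
        )
        WHERE rownumb = ?
        """,
        (n,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(3, 200), (4, 800), (7, 450)])
print(nth_highest_row_number(conn, 2))  # (7, 450)
```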
Find the nth highest salary in Oracle using RANK
Oracle also provides a RANK function that just assigns a ranking numeric value (with 1 being the highest)
for some sorted values. So, we can use this SQL in Oracle to find the nth highest salary using the RANK
function:
select * FROM (
select EmployeeID, Salary
,rank() over (order by Salary DESC) ranking
from Employee
)
WHERE ranking = N;
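The rank function will assign a ranking to each row starting from 1. This query is quite similar to the
one where we used the row_number() analytic function, and works in the same way. The difference shows up
when there are duplicate salaries: rows with equal salaries get the same rank, and RANK then skips the
following rank values (for example 1, 1, 3), so no row may have a ranking of exactly N. If your table can
contain duplicate salaries, DENSE_RANK, which assigns rank values without gaps (1, 1, 2), is usually the
better fit.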
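To see how RANK behaves when salaries repeat, the sketch below compares it with DENSE_RANK on a table with a tie. It assumes Python's sqlite3 module with SQLite 3.25 or later (for window functions) in place of Oracle, adds a hypothetical employee 9 whose salary of 800 ties with employee 4, and uses an illustrative helper name, salaries_at_rank.

```python
import sqlite3


def salaries_at_rank(conn, rank_function, n):
    # rank_function is restricted to the two window functions being compared
    if rank_function not in ("RANK", "DENSE_RANK"):
        raise ValueError("expected RANK or DENSE_RANK")
    rows = conn.execute(
        f"""
        SELECT DISTINCT Salary FROM (
            SELECT Salary,
                   {rank_function}() OVER (ORDER BY Salary DESC) AS ranking
            FROM Employee
        )
        WHERE ranking = ?
        """,
        (n,),
    ).fetchall()
    return [salary for (salary,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, Salary INTEGER)")
# Employee 9 is a hypothetical row that ties with the top salary
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)", [(3, 200), (4, 800), (7, 450), (9, 800)]
)
print(salaries_at_rank(conn, "RANK", 2))        # []     ranks are 1, 1, 3, 4
print(salaries_at_rank(conn, "DENSE_RANK", 2))  # [450]  ranks are 1, 1, 2, 3
```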
We’ve now gone through many different solutions in different database vendors like Oracle, MySQL, and SQL
Server. Hopefully now you understand how to solve a problem like this, and you have improved your SQL
skills in the process! Be sure to leave a comment if you have any questions or observations.
What is a role in a database?
A database role is a collection of any number of permissions/privileges that can be assigned to one or more
users. A database role is also given a name for that collection of privileges.
The majority of today’s RDBMS’s come with predefined roles that can be assigned to any user. But, a
database user can also create his/her own role if he or she has the CREATE ROLE privilege.
Advantages of Database Roles
Why are database roles needed? Well, let’s go over some of the advantages of using database roles and why
they would be necessary:
Roles continue to live in the database even after users are deleted/dropped
Many times a DBA (Database Administrator) has to drop user accounts for various reasons – say, for
example, an employee quits the company so his/her user account is removed from the system. Now
suppose that those same user accounts need to be recreated later on – just assume that same employee rejoins the company later on and needs his same account. That employee’s user account probably had a lot of
specific permissions assigned to it. So, when his/her account was deleted then all of those permissions were
deleted as well, which creates a hassle for the DBA who has to reassign all of those permissions one by one.
But, if a role was being used then all of those permissions could have just been bundled into one role – and
then the process of re-instating that employee into the system would mean that the DBA simply reassigns
the role to the employee. And, of course that role could also be used for other users as well. So, this is a big
advantage of using a database role.
Roles save DBA’s time
Another advantage is the fact that a DBA can grant a lot of privileges with one simple command by
assigning a user to a role.
Database roles are present before user accounts are created
And finally, an advantage of database roles is that they can be used to assign a group of permissions that
can be re-used for new users who belong to a specific group of people who need those permissions. For
example, you may want to have a group of permissions in a role reserved just for some advanced users who
know what they are doing and assign that role to a user only when a new advanced user needs that role. Or,
you can have a group of privileges for users who are all working on the same project and need the same
type of access.
Disadvantages of Database Roles
The main disadvantage of using a database role is that a role may be granted to a user, but that role may
have more privileges than that user may actually need. This could cause a potential security issue if that
user abuses his extra privileges and potentially ruins some part of the database.
An example of this is that in older versions of Oracle (before release 10.2), there was a role called CONNECT,
which included privileges like CREATE TABLE, CREATE VIEW, CREATE SESSION, ALTER SESSION, and
several other privileges. But having all of these privileges is probably too much for a normal business user.
That is probably why in newer versions of Oracle (since version 10.2), the CONNECT role has been changed
so that it only has the CREATE SESSION privilege.
How to create a database role
Most RDBMS’s use the CREATE ROLE syntax to define a role. And then, the GRANT statement is used to give
permissions to that database role. But, the exact details vary from one RDBMS to another so it’s best to
consult the documentation.
Example of a database role
Here is an example of what creating a database role could look like:
CREATE ROLE advancedUsers;
GRANT UPDATE ON SOMETABLE
TO advancedUsers;
What does the CREATE USER Statement do in SQL? What is the syntax
and other details?
It’s pretty obvious what the CREATE USER statement does: it allows you to create a user in the database.
Most of the popular databases out there already provide some sort of graphical interface that allows you to
create users without actually typing in any SQL, like phpMyAdmin, which is a PHP interface to the MySQL
database. Note that the SQL standard itself leaves the definition of users to each implementation, so
CREATE USER is a vendor extension, although most major RDBMS’s support it in some form.
SQL CREATE USER Syntax
Here is what the syntax of the CREATE USER statement looks like:
CREATE USER username
[IDENTIFIED BY password]
[other options];
CREATE USER Identified By
The Identified By clause lets you say how the database should authenticate the user. The exact syntax of
the Identified by clause varies from one database to another.
What is a database transaction? How do database transactions work?
Provide an example of a database transaction.
In databases, a transaction is a set of separate actions that must all be completely processed, or none
processed at all. When you think of a transaction, you should think of the phrase “all or nothing”, because
that is a defining feature of database transactions – either every part of the transaction is completed, or
nothing at all.
One thing that’s important to understand is that a transaction can consist of multiple SQL statements – not
just one. An example would be transferring some funds from one bank customer to another. This scenario
would have to both credit one customer and debit another, requiring updates to different rows in a table, but
it would be considered a single transaction.
A commonly used synonym for a transaction is a unit of work.
The acronym ACID can be used to remember the properties of database transactions. Here is what each
letter in the acronym ACID stands for:
Atomicity. This means that a transaction must remain whole: it’s all or nothing. So, the transaction
as a whole must either fully succeed or fully fail. If and when the transaction is a success, all of the
changes must be saved by the system. If the transaction fails, then all of the changes made by the
transaction must be completely undone and the system must revert to its original state before
the changes were applied. The term rollback is used for the process that undoes any changes made
by a transaction that has failed; think of it as the database rolling back the changes of a failed
transaction. The term commit is used to refer to the process which makes the transaction changes
permanent; think of it as the database fully committing the transaction changes once and for all.
Consistency. This means that a transaction should change the database from one consistent state to
another.
Isolation. This means that each transaction should do its work independently of other transactions
that might be running at the same time.
Durability. This means that any changes made by a transaction that has run to completion should
stay permanent, even if the database fails or shuts down due to something like power loss. You
might be confused, because clearly data in a database is always changing, so how could anything be
permanent? Well, permanent in this context simply means that the change made by the transaction
will not disappear if and when the database encounters some failure or shuts down.
RDBMS’s and Transaction support
Most RDBMS’s have support for transactions. What this means is that they are able to identify both the start
and end of every transaction and also log all changes made by a transaction in order to be ready for a
rollback if necessary. Of course, the way in which transactions are supported by each RDBMS varies from
one RDBMS vendor to another – so Oracle is different from MySQL in the way it supports transactions, as is
DB2 from SQL Server, etc.
What is a transaction log?
Most of the RDBMS’s that have transaction support record all of the transactions along with any changes
made by those transactions inside a transaction log. Inside the transaction log there is a copy of what the
database looked like before and after any changes made by a transaction. This means that if a rollback is
necessary, then the record of what the database looked like before the changes were applied can be used to
reverse those changes that were made by the transaction. Also, a commit of a transaction is not really
considered finished until the transaction log has a record of the commit. If there is some sort of power
failure that brings down a database, then the transaction log may be the only way that data can be
recovered, especially because database changes are not written to disk immediately, and may not have
made it to disk before the database outage.
An example of a database transaction
Since transaction support differs from one database to another, it’s hard to give an example of a transaction
without going into the specific syntax details of a particular RDBMS. But some RDBMS’s allow you to start a
transaction with a SQL statement like “START TRANSACTION” or “BEGIN TRANSACTION”. Then,
you follow that statement with the SQL that you would like to run as part of the transaction, and finish with
COMMIT to make the changes permanent or ROLLBACK to undo them.
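As a concrete illustration of the bank-transfer example from earlier in this section, here is a sketch using Python's sqlite3 module and an in-memory SQLite database, where BEGIN TRANSACTION, COMMIT and ROLLBACK are plain SQL statements. The Accounts table, its CHECK constraint that forbids negative balances, and the transfer and balances helpers are all assumptions made up for the example. The credit runs first, so a failing debit shows the rollback undoing work that already happened.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Accounts (customer TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 5000), ("B", 0)])


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as a single all-or-nothing unit of work."""
    conn.execute("BEGIN TRANSACTION")
    try:
        conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE customer = ?", (amount, dst)
        )
        # Violates the CHECK constraint if src would go negative
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE customer = ?", (amount, src)
        )
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # undo the credit that already ran
        return False
    conn.execute("COMMIT")
    return True


def balances(conn):
    rows = conn.execute("SELECT customer, balance FROM Accounts ORDER BY customer")
    return dict(rows.fetchall())


print(transfer(conn, "A", "B", 5000), balances(conn))  # True {'A': 0, 'B': 5000}
print(transfer(conn, "A", "B", 1), balances(conn))     # False {'A': 0, 'B': 5000}
```

In the second call, B is briefly credited with 1 before the debit of A fails; the ROLLBACK removes that credit, so neither row changes.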
What is a database deadlock? Provide an example and explanation of a
deadlock in a database.
In a database, a deadlock is a situation that occurs when two or more database sessions each hold a lock on
some data, and each session requests a lock on data that another session has already locked. Because each
session is waiting for another one to release its lock, none of them can make progress, and they would wait
indefinitely. This situation is known as deadlock.
If you are confused, some examples of deadlock should definitely help clarify what goes on during deadlock.
And, you should probably read our explanation of database locks before proceeding since that will help your
understanding as well.
Database deadlock example
Suppose we have two database sessions called A and B. Let’s say that session A requests and gets a lock on
some data, and let’s call that data Y. Then session B gets a lock on some other data that we will call Z. But
now, let’s say that session A needs a lock on data Z in order to run another SQL statement, but that lock is
currently held by session B. And, let’s say that session B needs a lock on data Y, but that lock is currently
held by session A. This means that session A is waiting for session B’s lock and session B is waiting for
session A’s lock. And this is what deadlock is all about!
Let’s go through a more detailed (and less abstract) example of deadlock so that you can get a more specific
idea of how deadlock can arise.
Database deadlock example in banking
Let’s use an example of two database users working at a bank, and let’s call those database users X and Y.
Let’s say that user X works in the customer service department and has to update the database for two of
the bank’s customers, because one customer (call him customer A) incorrectly received $5,000 in his
account when it should have gone to another customer (call him customer B). So user X has to debit
customer A’s account by $5,000 and also credit customer B’s account by $5,000.
Note that the crediting of customer B and debiting of customer A will be run as a single transaction – this is
important for the discussion that follows.
Now, let’s also say that the other database user – Y – works in the IT department and has to go through the
customers table and update the zip code of all customers who currently have a zip code of 94520, because
that zip code has now been changed to 94521. So, the SQL for this would simply have a WHERE clause that
would limit the update to customers with a zip code of 94520.
Also, both customers A and B currently have zip codes of 94520, which means that their information will be
updated by database user Y.
Here is a breakdown of the events in our fictitious example that lead to deadlock:
1. Database user X in the customer service department selects customer A’s data and updates A’s
bank balance to debit/decrease it by $5,000. However, what’s important here is that no COMMIT has
been issued yet, because database user X still has to update customer B’s balance to
increase/credit it by $5,000, and those 2 separate SQL statements will run as a single SQL
transaction. Most importantly, this means that database user X still holds a lock on the row for
customer A because his transaction is not fully committed yet (he still has to update
customer B). The lock on the row for customer A will stay until the transaction is committed.
2. Database user Y then runs his SQL to update the zip codes for customers with zip codes of
94520. The SQL updates customer B’s zip code. But, because the SQL statement from user Y
must run as a single transaction, the transaction has not committed yet, since not all of the
customers have had their zip codes changed yet. So, this means that database user Y holds a
lock on the row for customer B.
3. Now, Database user X still has to run the SQL statement that will update customer B’s balance to
increase it by $5,000. But, now the problem is that database user Y has a lock on the row for
customer B. This means that the request to update customer B’s balance must wait for user Y to
release the lock on customer B. So, database user X is waiting for user Y to release a lock on
customer B.
4. Now, the SQL statement being run by user Y tries to update the zip code for customer A. But,
this update cannot happen because user X holds a lock on customer A’s row. So, user Y is waiting
for a lock to be released by user X.
5. Now you can see that we have user X waiting for user Y to release a lock and user Y waiting for
user X to release a lock. This is the situation of deadlock, since neither user can make any progress,
and nothing happens because they are both waiting for each other. So, in theory, these two
database sessions will be stalled forever. But, read on to see how some DBMS’s deal with this
unique situation.
Database deadlock prevention
So now you have seen an example of deadlock. The question is how do DBMSs deal with it? Well, very few
modern DBMSs actually prevent or avoid deadlocks, because doing so requires a lot of overhead. A DBMS
that tries to prevent deadlocks has to predict what a database user will do next: the theory behind deadlock
prevention is that each lock request is inspected to see if granting it could lead to a deadlock. If that is the
case, then the lock is not granted.
Database deadlock detection
Instead of deadlock prevention, the more popular approach to dealing with database deadlocks is
deadlock detection. What is deadlock detection? Well, deadlock detection lets deadlocks happen, notices
them, and then resolves each one by aborting one of the requests that caused it.
How does deadlock detection work?
There are two common approaches to deadlock detection:
1. Whenever a session is waiting for a lock to be released, it is in what’s known as a “lock wait” state. One
way deadlock detection is implemented is to simply limit the lock wait time to a preset value (like 5 seconds).
So, if a session waits more than 5 seconds for a lock to free up, then that session will be terminated.
2. The RDBMS can regularly inspect all the locks currently in place to see if there are any sessions that
have locked each other out and are in a state of deadlock.
In either of the deadlock detection methods, one of the requests will have to be terminated to stop the
deadlock. This also means that any transaction changes which came before the request will have to be rolled
back so that the other request can make progress and finish.
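The second approach is often described in terms of a wait-for graph: each waiting session points to the session(s) holding the lock it wants, and a deadlock shows up as a cycle in that graph. The PHP sketch below finds such a cycle with a depth-first search. The function name findDeadlockCycle() and the array layout are assumptions made for illustration, not how any particular RDBMS implements detection, and picking the session to abort is simplified to the last one in the cycle.

```php
<?php
// Hypothetical input: session => list of sessions whose locks it is waiting for.
// Returns the sessions that form a cycle (a deadlock), or null if there is none.
function findDeadlockCycle(array $waitsFor): ?array
{
    $state = [];   // session => 'visiting' (on the current DFS path) | 'done'
    $path = [];    // the current DFS path

    $visit = function (string $session) use (&$visit, &$state, &$path, $waitsFor): ?array {
        $state[$session] = 'visiting';
        $path[] = $session;
        foreach ($waitsFor[$session] ?? [] as $holder) {
            if (($state[$holder] ?? null) === 'visiting') {
                // Back edge: everything from $holder to the end of the path is a cycle.
                return array_slice($path, array_search($holder, $path, true));
            }
            if (!isset($state[$holder])) {
                $cycle = $visit($holder);
                if ($cycle !== null) {
                    return $cycle;
                }
            }
        }
        array_pop($path);
        $state[$session] = 'done';
        return null;
    };

    foreach (array_keys($waitsFor) as $session) {
        if (!isset($state[$session])) {
            $cycle = $visit((string)$session);
            if ($cycle !== null) {
                return $cycle;
            }
        }
    }
    return null;
}

// The banking example: X waits for Y's lock on customer B, Y waits for X's lock on customer A.
$cycle = findDeadlockCycle(['X' => ['Y'], 'Y' => ['X']]);
if ($cycle !== null) {
    // Simplified victim choice: abort (and roll back) the last session in the cycle.
    echo 'Deadlock among: ', implode(', ', $cycle), '; aborting ', end($cycle), PHP_EOL;
}
```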
What is the difference between == and === in PHP?
When comparing values in PHP for equality you can use either the == operator or the === operator. What’s
the difference between the 2? Well, it’s quite simple. The == operator just checks to see if the left and right
values are equal, after converting them to a common type if their types differ (PHP calls this type juggling).
But, the === operator (note the extra “=”) actually checks to see if the left and right
values are equal, and also checks to see if they are of the same variable type (like whether they are both
booleans, ints, etc.).
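A few loose and strict comparisons side by side make the difference concrete. The values below are arbitrary examples; the comments describe PHP's standard comparison rules (for instance, a loose comparison involving a bool or null converts both sides to bool first).

```php
<?php
// Each expression is evaluated once and stored under a readable label.
$checks = [
    '0 == false'     => 0 == false,     // true: both sides converted to bool
    '0 === false'    => 0 === false,    // false: int vs bool
    '"1" == 1'       => "1" == 1,       // true: numeric string compared as a number
    '"1" === 1'      => "1" === 1,      // false: string vs int
    'null == false'  => null == false,  // true: both sides converted to bool
    'null === false' => null === false, // false: null vs bool
];

foreach ($checks as $expr => $result) {
    echo str_pad($expr, 16), var_export($result, true), PHP_EOL;
}
```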
An example of when you need to use the === operator in PHP
It’s good to know the difference between the 2 types of operators that check for equality. But, it’s even
better to understand when and why you would need to use the === operator versus the == operator.
So, we want to give you an example of when you must use the === operator: when developing in PHP, you
may need to use the strpos function, which returns the position (index) of the first occurrence of one string
inside another string, or false if it does not occur at all.
When using the strpos function, it may return 0 to mean that the string you are searching for is at the 0th
index (or the very first position) of the other string that you are searching. Suppose, for whatever reason,
we want to make sure that an input string does not contain the string “xyz”. Then we would have this PHP
code:
//bad code:
if ( strpos( $inputString, 'xyz' ) == false ) {
// do something
}
But, there is a problem with the code above: strpos will return 0 (as in the 0th index) if the ‘xyz’ string
happens to be at the very beginning of $inputString. The problem is that 0 is also treated as false in PHP,
and when the == operator is used to compare 0 and false, PHP will say that 0 and false are equal. That is
not what we wanted to happen: even though the $inputString variable contains the string ‘xyz’, the equality
of 0 and false tells us that $inputString does not contain the ‘xyz’ string. So, the problem is the way the
return value of strpos is compared to the boolean value false. But, what is the solution? Well, as you probably
guessed, we can simply use the === operator for the comparison. As we described earlier, the === operator
will say that the 2 things being compared are equal only if both the type and the value of the operands are
equal. So, if we compare a 0 to a false, they will not be considered equal, which is exactly the kind of behavior
we want. Here is what the good code will look like:
//good code:
if ( strpos( $inputString, 'xyz' ) === false ) {
// do something
}
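Wrapped in a small helper, the strict check can be exercised on both cases. The function name doesNotContainXyz() is hypothetical; the sketch simply calls the built-in strpos() the same way as the good code above, and also evaluates the buggy loose comparison for contrast.

```php
<?php
// Hypothetical helper: true when $inputString does not contain 'xyz' anywhere,
// including at index 0.
function doesNotContainXyz(string $inputString): bool
{
    // strpos() returns an int position or false; only the strict check
    // separates "found at position 0" from "not found".
    return strpos($inputString, 'xyz') === false;
}

var_dump(doesNotContainXyz('xyz at the start'));        // bool(false)
var_dump(doesNotContainXyz('no match here'));           // bool(true)
var_dump(strpos('xyz at the start', 'xyz') == false);   // bool(true): the bug in the "bad code"
```

On PHP 8.0 and later, `!str_contains($inputString, 'xyz')` expresses the same check without a position to compare against.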
How would you parse HTML in PHP?
If you have programmed in PHP, you may have come across the need to parse an HTML document at some
point – because it is something that needs to be done in many different scenarios.
But, how should you approach this problem? The first answer that you may think of is to use regular
expressions, since they are good for finding patterns in strings. However, the reality is that HTML
documents can be quite complex, and trying to find patterns through regular expressions can become quite
difficult and painful. But there is good news – there is already a library in PHP that is meant for parsing
HTML: Parse HTML in PHP.
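As one possible approach, which may or may not be the library referenced above, PHP's bundled DOM extension (the DOMDocument class) can parse HTML into a tree that you query by tag name instead of by regular expression. The extractLinks() helper and the sample HTML below are made up for illustration, and the sketch assumes the dom extension is enabled.

```php
<?php
// Hypothetical helper: return the text and href of every <a> element.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often not well-formed; collect libxml's warnings
    // internally instead of printing them, then discard them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $links[] = ['text' => trim($anchor->textContent), 'href' => $anchor->getAttribute('href')];
    }
    return $links;
}

// Note the unclosed <p>: the parser copes with it, which a regex would struggle to.
$html = '<html><body><p>See <a href="/docs">the docs</a> and <a href="https://example.com">this site</a>.</body></html>';
print_r(extractLinks($html));
```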
Has the problem already been solved?
It’s always good to remember that whenever you are looking to solve a difficult problem, look to see if
someone else has already solved it – because in the real world, you will want to save as much time on the
projects that you work on as possible. And, a lot of times if someone has encountered the same problem as
you, then there may be a solution that is already out there on the Web.
In PHP, what are magic methods and how are they used?
PHP functions that start with a double underscore (“__”) are called magic functions (and/or methods) in
PHP. Except for __autoload(), which is a stand-alone function, they are defined inside classes. The magic
functions available in PHP are: __construct(), __destruct(), __call(), __callStatic(), __get(), __set(),
__isset(), __unset(), __sleep(), __wakeup(), __toString(), __invoke(), __set_state(), __clone(), and
__autoload().
Why are they called magic functions?
The definition of a magic function is provided by the programmer – meaning you, as the programmer, will
actually write the definition. This is important to remember – PHP does not provide the definitions of the
magic functions – the programmer must actually write the code that defines what the magic function will do.
But, magic functions will never directly be called by the programmer – actually, PHP will call the function
‘behind the scenes’. This is why they are called ‘magic’ functions – because they are never directly called,
and they allow the programmer to do some pretty powerful things. Confused? An example will help make
this clear.
Example of using the __construct() magic function in PHP
The most commonly used magic function is __construct(). This is because as of PHP version 5, the
__construct method is basically the constructor for your class. If PHP 5 cannot find the __construct()
function for a given class, then it will search for a function with the same name as the class name; this is
the old way of writing constructors in PHP, where you would just define a function with the same name as
the class. (As of PHP 8.0, a method with the same name as the class is no longer treated as a constructor.)
Now, here is an example of a class with the __construct() magic function:
class Animal {
public $height; // height of animal
public $weight; // weight of animal
public function __construct($height, $weight)
{
$this->height = $height; //set the height instance variable
$this->weight = $weight; //set the weight instance variable
}
}
In the code above, we have a simple __construct function defined that just sets the height and weight of an
animal object. So let’s say that we create an object of the Animal class with this code:
$obj = new Animal(5, 150);
What happens when we run the code above? Well, a call to the __construct() function is made because that
is the constructor in PHP 5. And $obj will be an object of the Animal class with a height of 5 and a
weight of 150. So, the __construct function is called behind the scenes. Magical, isn’t it?
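Here is a runnable version of the Animal example with one more magic method from the list above, __toString(). PHP calls it automatically whenever the object is used as a string, so, just like __construct(), it is never called directly. The __toString() body and its output format are made up for illustration.

```php
<?php
class Animal
{
    public $height; // height of animal
    public $weight; // weight of animal

    public function __construct($height, $weight)
    {
        $this->height = $height; // set the height instance variable
        $this->weight = $weight; // set the weight instance variable
    }

    // Hypothetical format; PHP calls this whenever the object is used as a string.
    public function __toString(): string
    {
        return "Animal (height: {$this->height}, weight: {$this->weight})";
    }
}

$obj = new Animal(5, 150);   // PHP calls __construct(5, 150) behind the scenes
echo $obj, PHP_EOL;          // PHP calls __toString(): Animal (height: 5, weight: 150)
```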
If you’re looking for another example of a magical function, then just check out the next page – where we
give an example of the __autoload function in PHP and how it’s used.
In PHP, what is the __autoload function? Can you provide an example of how it’s used?
PHP functions that start with a double underscore – a “__” – are called magic functions in PHP. The
__autoload function is also a magic function because it has a double underscore in front of it as well. If you
want to read a little bit more about magic functions in general, you can go here: Magic Functions in PHP.
Why is the __autoload function used?
In PHP, the __autoload function is used to simplify the job of the programmer by including classes
automatically without the programmer having to add a very large number of include statements. An
example will help clarify. Suppose we have the following code:
include "class/class.Foo.php";
include "class/class.AB.php";
include "class/class.XZ.php";
include "class/class.YZ.php";
$foo = new Foo;
$ab = new AB;
$xz = new XZ;
$yz = new YZ;
Note in the code above that we have to include each of the 4 different class files separately – because we
are creating an instance of each class, we absolutely must have each class file. Of course, we are assuming
that developers are defining only one class per source file – which is good practice when writing object
oriented programs, even though you are allowed to have multiple classes in one source file.
The __autoload function simplifies inclusion of class files in PHP
Imagine if we need to use 20 or even 30 different classes within this one file – writing out each include
statement can become a huge pain. And this is exactly the problem that the PHP __autoload function solves
– it allows PHP to load the classes for us automatically! So, instead of the code above, we can use the
__autoload function as shown below:
// Note: __autoload() was deprecated in PHP 7.2 and removed in PHP 8.0.
function __autoload($class_name)
{
require_once "./class/class." . $class_name . ".php";
}
$foo = new Foo;
$ab = new AB;
$xz = new XZ;
$yz = new YZ;
How does the __autoload function work?
Because the __autoload function is a magic function, it will not be called directly by you, the
programmer. Instead, it is called behind the scenes by PHP – that’s what makes it magical. But, when
does the __autoload function actually get called? Well, in the code above, the __autoload function will be
called 4 times, because PHP will not recognize the Foo, AB, XZ, and YZ classes so PHP will make a call to the
__autoload function each time it does not recognize a class name.
Also, in the code above, we can see that the autoload function takes the class name as a parameter (the
$class_name variable). PHP passes the $class_name variable behind the scenes to the __autoload function
whenever it finds that it doesn’t recognize the class name that is being used in a given statement. For
instance, when PHP sees the “$foo = new Foo;” line, it does not recognize the Foo class because the Foo
class was never included or “required” as part of the current file. So, PHP then (behind the scenes) passes
the “Foo” class to the __autoload function, and if the class file is found by the autoload function then it is
included by the “require_once” statement.
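One last thing worth noticing in the code above is how we use the $class_name variable in the
"./class/class." . $class_name . ".php" piece of the code. Basically, this allows us to point to the correct file
dynamically. The assumption here is also that the class folder is in the current working directory, because a
path starting with "./" is resolved relative to the current working directory (often, but not always, the
directory of the running script).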
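Since __autoload() was deprecated in PHP 7.2 and removed in PHP 8.0, current PHP versions express the same idea by passing a loader to spl_autoload_register(). The minimal sketch below keeps the class.<ClassName>.php naming convention from the example above; to stay self-contained, it writes a hypothetical class.Foo.php into a temporary directory instead of relying on a real class folder.

```php
<?php
// Hypothetical setup so the sketch runs on its own: write one class file,
// using the "class.<ClassName>.php" naming convention, into a temp directory.
$classDir = sys_get_temp_dir() . '/autoload_demo_' . getmypid();
if (!is_dir($classDir)) {
    mkdir($classDir);
}
file_put_contents(
    $classDir . '/class.Foo.php',
    '<?php class Foo { public function hello(): string { return "Hello from Foo"; } }'
);

// Register the loader; PHP calls it with the class name whenever a class is unknown.
spl_autoload_register(function (string $className) use ($classDir): void {
    $file = $classDir . '/class.' . $className . '.php';
    if (is_file($file)) {   // unknown names fall through to other loaders or a normal error
        require_once $file;
    }
});

$foo = new Foo();           // triggers the loader with "Foo"
echo $foo->hello(), PHP_EOL;
```

Unlike __autoload(), spl_autoload_register() can be called several times, so different libraries can each register their own loader.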
When is the __autoload function called?
The __autoload function is called anytime a reference to an unknown class is made in your code. Of course,
the __autoload function must be defined by you in order to actually be called.
Does the __autoload function work with static function calls?
Yes, it does. Remember that a static function is a function that can be called by just using the class name in
which it is defined – and there is no need to create an object of the class. The code below, which has a call
to a static function, will still run the __autoload function:
function __autoload($class_name)
{
require_once "./class/class." . $class_name . ".php";
}
//this is a call to a static function
SampleClass::staticFunctionCall($param);
In the code above, the class SampleClass is not recognized because it is not explicitly included anywhere in
the code. This means that PHP will make a call to the __autoload function when it realizes that the
SampleClass definition is not provided anywhere. Once the __autoload function is called, the
class.SampleClass.php file will be included in order to have the definition of the SampleClass class. Of
course, the SampleClass is needed because a call is being made to a static function that belongs to the
SampleClass class.
When else would the __autoload function be called automatically by PHP?
One last thing that is interesting and worth noting is that even calling the class_exists() function (which just
checks whether a given class is defined) will call the __autoload function by default. The second parameter of
class_exists, $autoload, can be set to false to disable the automatic call to the __autoload
function.
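The behavior described in the last few paragraphs can be observed with a loader that records every class name PHP asks it for. This sketch uses spl_autoload_register(), the modern replacement for __autoload(), rather than __autoload() itself; the temporary class.SampleClass.php file and the $requested log are hypothetical scaffolding for the demonstration.

```php
<?php
// Hypothetical class file written to a temp directory so the sketch is self-contained.
$requested = [];
$classDir = sys_get_temp_dir() . '/autoload_trigger_demo_' . getmypid();
if (!is_dir($classDir)) {
    mkdir($classDir);
}
file_put_contents(
    $classDir . '/class.SampleClass.php',
    '<?php class SampleClass { public static function staticFunctionCall($p) { return "got $p"; } }'
);

// The loader records each class name PHP passes to it, then loads the file if it exists.
spl_autoload_register(function (string $className) use (&$requested, $classDir): void {
    $requested[] = $className;
    $file = $classDir . '/class.' . $className . '.php';
    if (is_file($file)) {
        require_once $file;
    }
});

// 1. A static call on a not-yet-loaded class triggers the loader.
$result = SampleClass::staticFunctionCall('x');
// 2. class_exists() triggers it by default (MissingClass is recorded, result is false) ...
$exists = class_exists('MissingClass');
// 3. ... but not when the second argument ($autoload) is false.
$existsNoAutoload = class_exists('OtherMissingClass', false);

print_r($requested);   // SampleClass and MissingClass, but not OtherMissingClass
```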
In PHP, what is the difference between self and $this?
In very general terms, we can say that $this is used to reference the current object, whereas self is used
to access the current class itself. But, there are more specific details we will discuss below that you should
definitely know about. Since we believe strongly in examples as a teaching aid, take a look at the examples
we have come up with below. In the examples below, we have a class called Animal and a derived class
called Tiger. The Tiger class overrides the whichClass() function – which is also very important to note for
the discussion that follows. Here is some code where we use the $this pointer:
class Animal {
public function whichClass() {
echo "I am an Animal!";
}
/*
Note that this method uses the $this keyword so
the calling object's class type (Tiger) will be
recognized and the Tiger class version of the
whichClass method will be called.
*/
public function sayClassName() {
$this->whichClass();
}
}
class Tiger extends Animal {
public function whichClass() {
echo "I am a Tiger!";
}
}
$tigerObj = new Tiger();
$tigerObj->sayClassName();
Running the code above will output this:
I am a Tiger!
In the code above, we create an object of the Tiger class and call it $tigerObj. And, inside the Animal class’s
version of the sayClassName() function, you can see the call to $this->whichClass(). Because the $this
pointer always references the current object, and we are dealing with an object of the Tiger class above, the
version of whichClass() that gets called is the one defined in the Tiger class. This is a valid example of
polymorphism in PHP.
Using “self” instead
Now, if we change the “sayClassName()” function to use the self keyword instead of the $this variable, we
would get a different result. So, suppose our code now looks like this; the only change we made is inside
sayClassName(), and everything else is exactly the same:
class Animal {
public function whichClass() {
echo "I am an Animal!";
}
/* This method has been changed to use the
self keyword instead of $this
*/
public function sayClassName() {
self::whichClass();
}
}
class Tiger extends Animal {
public function whichClass() {
echo "I am a Tiger!";
}
}
$tigerObj = new Tiger();
$tigerObj->sayClassName();
Running the code above will output this:
I am an Animal!
Using self in PHP can turn off polymorphic behavior and bypass the vtable
So, what exactly is going on when we change the code to use “self” instead? When self is used, PHP calls the
version of whichClass() that belongs to the class in which the calling code is written. Since self is being used
within the Animal class, the version of whichClass() that gets called is the one that belongs to the Animal
class. If we compare self to $this, then we can see that $this calls the version of whichClass() that belongs to
the class of the calling object. Remember, the $this variable is basically a reference to the current object,
which in this case is of type Tiger. So, when $this is used to call whichClass(), PHP uses the version that is in
the Tiger class.
In the example above, self is essentially turning off polymorphic behavior; in C++ terms, the call bypasses the
vtable. If that is confusing, you can (and probably should) read more about vtables over here: Vtables.
$this versus self when used inside a static function
Let’s say you try to use the $this pointer inside a static method. So, let’s say you have some code that looks
like this:
class Animal {
public static $name;
//trying to use $this in a static function:
public static function nameChange()
{
$this->name = "Programmer Interview";
}
}
$animalObj = new Animal();
$animalObj->nameChange();
What actually happens when you run the code above is that you will end up with this error: “Fatal error:
Using $this when not in object context…”. Why do we get this error? Well, think about what you are doing
here – you are using the $this pointer inside a static function. And, static functions can actually be called
without using an object of the same class – you can call the nameChange function directly just using the
class name like this:
Animal::nameChange();
If the static nameChange function is ever called directly by just using the Animal class name, then the $this
variable really has no meaning because $this is meant to be used to reference the current object – and
there is no object in the scenario presented above. And that is exactly why you get that error message.
Now, what if we try to use the self variable inside the static nameChange function instead? So our code now
looks like this:
class Animal {
public static $name;
//trying to use self in a static function:
public static function nameChange()
{
self::$name = "Programmer Interview";
}
}
$animalObj = new Animal();
$animalObj->nameChange();
The code above runs just fine, without error. And, this is actually a big reason why self is used – to access
static members of the class. Now, just for the sake of showing you something else that is interesting, let’s
make a small change so that $name is no longer a static member variable:
class Animal {
//$name is no longer a static variable..
public $name;
//trying to use self in a static function:
public static function nameChange()
{
self::$name = "Programmer Interview";
}
}
$animalObj = new Animal();
$animalObj->nameChange();
Now that $name is no longer a static variable, running the code above gives us an error: “Fatal error:
Access to undeclared static property: Animal::$name”. What is the reason we get this error now?
Explanation of the “Fatal error: Access to undeclared static property” error
The reason we get that error is that self::$name always refers to a static property, and the Animal class no
longer has a static property called $name; it only has a non-static (instance) property. Inside a static function
we cannot reach that instance property through $this either, because static functions can be called without
an object, while non-static member variables belong to objects. You can read more about this here (even
though it’s in the context of C++, the same concept applies to PHP): Accessing non static members from a
static function
$this vs self when accessing static members and calling static functions
Let’s see how $this and self behave when either trying to access static members or call static functions. Let’s
start with the self keyword:
class Animal {
public static function whichClass() {
echo "I am an Animal!";
}
public function sayClassName() {
self::whichClass();
}
}
$animalObj = new Animal();
$animalObj ->sayClassName();
The code above uses the self keyword to invoke a static function – and it runs just fine without giving any
errors. Now, let’s change it to use the $this variable instead of self:
class Animal {
public static function whichClass() {
echo "I am an Animal!";
}
public function sayClassName() {
$this->whichClass();
}
}
$animalObj = new Animal();
$animalObj ->sayClassName();
The code above actually runs just fine and outputs “I am an Animal!”. So, invoking a static function with the
$this variable is not a problem. And that actually makes sense because static functions do not even need an
object in order to be invoked.
Example of calling static member variable with $this
Accessing a static member variable with the $this pointer does not cause a fatal error, but it is not
recommended. Let’s take a look at an example to help clarify. Suppose you try to run the following code; note
the use of $this to access the $name member variable:
class Animal {
public static $name;
public static function whichClass() {
echo "I am an Animal!";
}
public function sayClassName() {
$this->name = "My name is Animal";
}
}
$animalObj = new Animal();
$animalObj ->sayClassName();
So, in the code above, we create an instance of the Animal class in $animalObj, and then we call the method
sayClassName, which uses the $this pointer to access the static member variable $name. The code above runs
without a fatal error, although PHP typically raises a notice such as “Accessing static property Animal::$name
as non static”. But, you should know that the assignment only affects the current instance/object and not the
static variable, which isn’t what you would expect if you change a static member variable. Confused? Well,
check out this example:
class Animal {
public static $name;
public function setClassName() {
$this->name = "My name is Animal";
//echo $this->name;
}
}
$animalObj = new Animal();
$animalObj2 = new Animal();
$animalObj->setClassName();
echo $animalObj->name;
echo $animalObj2->name;
//what happens here?
Note that we set the $name static variable for the $animalObj object by calling the setClassName method,
but we do not set that variable for the $animalObj2 object. What do you think will happen when we try to
output the value of the $name static variable in $animalObj2 (in this line: echo $animalObj2->name; )?
Well, if you guessed that it would output “My name is Animal”, then you are actually wrong! You can verify
this for yourself by running the simple code above. Yes, “echo $animalObj->name;” will output the text “My
name is Animal”, but when a static member variable is set in one object using the $this pointer, that value
does NOT transfer to other instances of the same class, even though sharing the value across instances is
exactly what you would expect from a static member variable. In fact, it’s not setting the static variable at all:
it’s just creating another, non-static variable called $name that only belongs to that particular object.
You can confirm this fact by trying to output the value of the static variable using the correct syntax: “echo
Animal::$name;” (keep in mind that in PHP, static properties cannot be accessed through an object
using the arrow operator ->).
You will see that nothing is output when you try “echo Animal::$name;”, even after setting what you may
think is the static variable $name using $this. For this exact reason, in PHP, you should not use the $this
pointer to set a static member variable; instead, you should use the self keyword, which you can see an
example of below.
Example of using self to access static member variable
Now, let’s say that we use the self keyword in the same example:
class Animal {
public static $name;
public static function whichClass() {
echo "I am an Animal!";
}
public function sayClassName() {
self::$name = "My name is Animal";
}
}
$animalObj = new Animal();
$animalObj ->sayClassName();
Now, this code also runs just fine: self is accessing the static $name member variable that belongs
to the Animal class, and that makes sense because no objects are involved or implied. If we try to output
the value of the $name static member variable using the code “echo Animal::$name;”, then it will output
“My name is Animal”, because we correctly used the self keyword to set the static member variable. So, in
PHP, you should always refer to static variables using a static context (e.g. by using self or
the class name).
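To see that a static member variable set through self really is shared, the following sketch adds a hypothetical readName() method that reads self::$name from a second object. Both lines print the same value because there is only one $name, owned by the Animal class itself.

```php
<?php
class Animal
{
    public static $name;

    public function sayClassName(): void
    {
        self::$name = "My name is Animal";   // writes the one class-wide value
    }

    // Hypothetical helper added for this demonstration.
    public function readName(): ?string
    {
        return self::$name;                  // reads the same class-wide value
    }
}

$animalObj = new Animal();
$animalObj2 = new Animal();
$animalObj->sayClassName();                  // set through the first object only

echo Animal::$name, PHP_EOL;                 // My name is Animal
echo $animalObj2->readName(), PHP_EOL;       // My name is Animal (shared, not per object)
```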
Accessing a variable with the same name as a static member variable inside a function
But, another interesting question is what would happen if self is not used, and we just used the $name by
itself like in this example:
class Animal {
public static $name;
public static function whichClass() {
echo "I am an Animal!";
}
public function sayClassName() {
$name = "My name is Animal";
}
}
$animalObj = new Animal();
$animalObj ->sayClassName();
Well, in this scenario, the $name variable used in the sayClassName function is actually a local variable that
is created inside the sayClassName function, and that is not the same static $name variable that belongs to
the class as a whole. So, you would actually have to use the self keyword if you want to reference the static
member variable that belongs to the class as a whole.
Summary of the differences between self and $this
Finally, we are done. Let’s now go through a quick summary of the differences between self and $this that
we covered in our examples above:
- self refers to the current class
- self can be used to call static functions and reference static member variables
- self can be used inside static functions
- self can also turn off polymorphic behavior by bypassing the vtable
- $this refers to the current object
- $this can be used to call static functions
- $this should not be used to access static member variables. Use self instead.
- $this cannot be used inside static functions
In PHP, what is the difference between self and static?
The differences between self and static are fairly easy to understand with some good examples. So, let’s
take a look at some actual code. Suppose we have the following class – called Car – which has two simple
methods called model and getModel. Note the use of the self keyword:
Example of self in PHP
class Car
{
public static function model()
{
self::getModel();
}
protected static function getModel()
{
echo "I am a Car!";
}
}
Suppose we make this call to the static model function in the Car class – since it is a static function we can
of course call the function directly using only the class name:
Car::model();
The output after calling the model function in the Car class will be:
I am a Car!
The self keyword simply makes a call to the getModel function that belongs to the Car class, which just
outputs the text “I am a Car!” to the page.
Let’s say that we decide to add a new class, called Mercedes, that derives from the Car class. Here is what it
looks like:
class Mercedes extends Car
{
protected static function getModel()
{
echo "I am a Mercedes!";
}
}
Because the Mercedes class derives from the Car class, it will inherit the model function that is defined in
the Car class.
So, what do you think will happen if we call the static model function, but this time we use the Mercedes
class instead – like this:
Mercedes::model();
Well, the output from the function call above may not be what you expect – this is what the output looks
like:
I am a Car!
You may have been expecting the Mercedes::model(); call to have output “I am a Mercedes!”. So, what is
going on here?
Explaining self
The model function is defined inside the Car class, and it is not overridden by the Mercedes class – but
the model function is of course inherited by the Mercedes class. As a result, when we call the version of
model inside the Mercedes class, the scope of the function is still inside the Car class – because the
function definition is inside the Car class. The way the keyword “self” works is that it will call
the current class’s implementation of the getModel function – and since the model function is defined inside
the Car class, the current class would be the Car class. So, it will call the Car class implementation of
getModel and NOT the Mercedes class implementation.
This behavior may be considered undesirable because it is not polymorphic, and is not aligned with object
oriented design principles. But, there is an alternative solution that can get us that kind of behavior – and it
involves using the static keyword in a different way from how you may normally use it.
The static keyword and late static binding
In PHP 5.3, a new feature called late static bindings was added – and this can help us get the polymorphic
behavior that may be preferable in this situation. In simplest terms, late static bindings means that a call to
a static function that is inherited will “bind” to the calling class at runtime. So, in our example above, if we
use late static binding it would mean that when we make a call to “Mercedes::model();”, then the getModel
function in the Mercedes class will be called instead of the getModel function in the Car class. Mercedes is of
course the calling class in our example.
Example of late static binding in PHP
Now, the question is how can we actually make late static binding work for us? Well, all we have to do is
replace the “self::getModel();” call inside the Car class with “static::getModel();” instead. So, this is what
our new Car class will look like – note that we do not have to make any change to the Mercedes class:
class Car
{
    public static function model()
    {
        static::getModel();
    }
    protected static function getModel()
    {
        echo "I am a Car!";
    }
}
Now, if we make this call:
Mercedes::model();
Our output will be this:
I am a Mercedes!
Late static binding was not possible before PHP 5.3
Note that before PHP 5.3 late static binding was not possible – and trying to run the code above in any
version of PHP before 5.3 will return an error.
PHP self versus static
Now that we changed the code in our example to use static instead of self, you can see the difference is that
self references the current class, whereas the static keyword allows the function to bind to the calling class
at runtime.
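To see both keywords side by side, here is a minimal runnable sketch. The class names Vehicle and Truck and the method names describeWithSelf, describeWithStatic and name are hypothetical, chosen only for this illustration; the methods return strings instead of echoing so that the two results are easy to compare.

```php
<?php
// Hypothetical classes for illustration only (not from the original text).
class Vehicle
{
    // self:: is resolved against the class in which this method is written
    public static function describeWithSelf()
    {
        return self::name();
    }

    // static:: is resolved at runtime against the class that was called
    public static function describeWithStatic()
    {
        return static::name();
    }

    protected static function name()
    {
        return "Vehicle";
    }
}

class Truck extends Vehicle
{
    protected static function name()
    {
        return "Truck";
    }
}

echo Truck::describeWithSelf(), "\n";   // Vehicle
echo Truck::describeWithStatic(), "\n"; // Truck
```

Both methods are inherited unchanged by Truck; only the keyword differs, and only static:: picks up the overridden name method.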
How would you find out if a string contains another string in PHP?
Suppose we have a string that is stored in a PHP variable called $aString. And we want to find out if
inside of $aString there is another substring – let’s just say for the sake of an example that we are
looking for the string “Waldo” inside of the larger string.
Now let’s say that the value of the larger string ($aString) is “Where is Waldo?”, and we just want to
find out if $aString contains “Waldo”. PHP provides us with a function called strpos that will allow us to find
the existence of one string inside of another. Here is an example of how to use the strpos function:
Example of how to find out if one string contains another in PHP
if (strpos($aString, 'Waldo') !== false) {
    echo 'I found Waldo!';
}
But, there is something you should be aware of when using the strpos function: if you are looking for
“Waldo” inside a string that looks like this: “heyWaldo are you there?”, then the strpos function will return
successfully with the string position of “Waldo”, basically saying that “Waldo” was indeed found. This is of
course a problem if you only want to search for the string “Waldo” as a separate word, and not as part of
another word.
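If you do need to match “Waldo” only as a separate word, one option (not covered in the original text) is a regular expression with the \b word-boundary anchor. Here is a minimal sketch using preg_match; the helper name containsWord is hypothetical:

```php
<?php
// Hypothetical helper: true only when $word appears as a whole word.
function containsWord($haystack, $word)
{
    // \b matches a boundary between a word character and a non-word character;
    // preg_quote escapes any regex metacharacters that $word might contain
    $pattern = '/\b' . preg_quote($word, '/') . '\b/';
    return preg_match($pattern, $haystack) === 1;
}

var_dump(containsWord("Where is Waldo?", "Waldo"));             // bool(true)
var_dump(containsWord("heyWaldo are you there?", "Waldo"));     // bool(false)
var_dump(strpos("heyWaldo are you there?", "Waldo") !== false); // bool(true)
```

The last line shows the difference: strpos still reports a match inside “heyWaldo”, while the word-boundary check does not.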
Strpos never returns true
One thing about the strpos function that you should remember is that it never returns the boolean value of
true. The strpos function returns a value indicating the position of the first occurrence of the substring being
searched for. If the substring is not found “false” is returned instead – which is why in the code above we
check for false instead of true.
!== vs != in PHP
One thing worth noting in the code above is that we used the !== operator instead of the != operator
(which has one less “=”). What’s the difference between the 2 operators?
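You can think of the !== operator as being more ‘strict’ than the != operator. The != operator converts the
two operands to a common type before comparing their values, whereas the !== operator does no conversion:
it says that the two operands are not equal if their values differ or if their types differ. In other words, !==
returns false only when both operands have the same type and the same value.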
This is desirable behavior because the strpos function can return a 0 if the string being searched contains
the substring as the very first element. The 0 would represent the 0th index of the larger string – meaning
the first position in that string. So, if $aString is “Waldo is here”, and we are searching for “Waldo”, then the
strpos function will return a 0. This means that the check being performed will be to see if 0 is not equal to
false. But the problem is that 0 is also considered as the integer equivalent of the boolean ‘false’ in PHP,
which means that the statement “0 != false” will be considered false, because 0 is equal to false in PHP.
But, if we run “0 !== false” instead, then that statement will be considered true, because !== also checks
whether 0 and false are of the same type. Since 0 is an integer and false is a boolean, they are not identical,
so comparing 0 and false for strict inequality returns true, unlike the “0 != false” check, which returns
false.
Hopefully that was not too confusing and if you need more details on that concept you can read about it
here: Difference between == and === in PHP.
If we had this code instead, where we use != and not !==, then it would be a problem:
Problematic code to find a substring inside a larger string
if (strpos($aString, 'Waldo') != false) {
    echo 'I found Waldo!';
}
The code above fails whenever “Waldo” is at the very start of $aString, for the reasons discussed above. When
checking the return value of strpos, always use !== instead of !=.
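The following short script shows the pitfall in action: strpos returns 0 when the substring is at the very start, and the loose and strict comparisons disagree about it. The sample strings are just for illustration:

```php
<?php
$aString = "Waldo is here";

$pos = strpos($aString, 'Waldo');
var_dump($pos);           // int(0): found at the very first position

// Loose comparison: 0 == false, so != wrongly reports "not found"
var_dump($pos != false);  // bool(false)

// Strict comparison: int 0 and bool false differ in type, so !== is true
var_dump($pos !== false); // bool(true)

// A substring that really is absent makes strpos return false
var_dump(strpos($aString, 'Carmen')); // bool(false)
```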
How to delete an element from an array in php?
When deleting an element from an array in PHP, a good function to use is the unset function. Here is an
example of its usage:
An example of using unset to delete an element from an array:
$anArray = array("X", "Y", "Z");
unset($anArray[0]);
//'dumps' the content of $anArray to the page:
var_dump($anArray);
The output of the var_dump function will be:
array(2) { [1]=> string(1) "Y" [2]=> string(1) "Z" }
Unset leaves all of the index values the same after an element is deleted
In our example above, the $anArray array will have values of “Y” and “Z” at indices of 1 and 2,
respectively, after the element “X” is deleted using the unset function. This means that the indices for the
other elements were not changed to adjust for the fact that the very first element (“X”) was deleted.
This would also mean that if you delete an element in the very middle of an array using unset then it would
leave a gap in that array. Suppose we have this code:
$anArray = array("V", "W", "X", "Y", "Z");
unset($anArray[2]);
//'dumps' the content of $anArray to the page:
var_dump($anArray);
This would output the following:
array(4) { [0]=> string(1) "V" [1]=> string(1) "W"
[3]=> string(1) "Y" [4]=> string(1) "Z" }
Note in the output above that there is now no index #2: there is 0, 1, 3, and 4. Not having continuous
index values could be a drawback, for example if you later loop over the array with a for loop that counts
from 0 up to count($anArray) - 1. So what are your alternatives? Well, you could use a function called
array_splice instead.
Using array_splice to delete an element from an array
The array_splice function is used to take one part of an array and replace it with some other contents. It can
also be used to delete an element in an array. Here is an example and explanation of how to use
array_splice to delete an element:
$anArray = array("V", "W", "X", "Y", "Z");
/*The 2 represents the offset - which basically
means move 2 positions from the beginning of the
array and that will take us to the "X" element. The 1
represents the number of elements that you want to
delete. Since we just want to delete 1 element, we
set the length parameter to 1.
And, since we are not replacing that element with anything
- we do just want to delete it - we leave out the optional
4th parameter (the replacement)
*/
array_splice($anArray, 2, 1);
var_dump($anArray);
Running the code above will return us this:
array(4) { [0]=> string(1) "V" [1]=> string(1) "W"
[2]=> string(1) "Y" [3]=> string(1) "Z" }
So, using array_splice will set the index values back to their correct order, and we will be good to go again.
There is however an assumption here that we should point out – the array_splice function accepts a value
for the offset, not the index. The offset that we used in the example above happened to be equal to the index
value. This is the case when the array we are dealing with already has a continuous integer index value, but
if the array has been changed for whatever reason before array_splice is used, that may not always be the
case.
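To make the offset/index distinction concrete, here is a sketch that first removes “V” with unset (so the keys no longer start at 0) and then converts a key into an offset before calling array_splice. The helper name spliceOutKey is hypothetical; it relies only on array_keys, array_search and array_splice:

```php
<?php
// Hypothetical helper: removes the element stored under $key by first
// working out that key's offset.
function spliceOutKey(array &$arr, $key)
{
    // array_keys lists the keys in order, so the position of $key
    // in that list is exactly the offset array_splice expects
    $offset = array_search($key, array_keys($arr), true);
    if ($offset !== false) {
        array_splice($arr, $offset, 1);
    }
}

$anArray = array("V", "W", "X", "Y", "Z");
unset($anArray[0]);          // keys are now 1, 2, 3, 4

// Careful: array_splice($anArray, 2, 1) here would remove "Y",
// because "Y" is at offset 2 even though its key is 3.
spliceOutKey($anArray, 2);   // removes "X", the element with key 2
print_r($anArray);           // W, Y, Z re-indexed as 0, 1, 2
```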
Using the array_values and unset functions to delete an element from an array
If you use the array_values function right after the unset function, then you can set the index values back to
their normal values, without any gaps in between the numbers. An example will help clarify:
Example of using array_values and unset to delete an element from an array:
$anArray = array("V", "W", "X", "Y", "Z");
/*
This will cause index 2 to go missing,
so the array indices of $anArray will be
0,1,3,4 - obviously the 2 is missing
*/
unset($anArray[2]);
/*
array_values will take an array as an input
and then take the array values (not the keys,
but just the values), and numerically index those values
into a new array.
This means that array_values will essentially re-index
the array that is given as an input, which restores the indices
to the correct order of 0,1,2, and 3:
*/
$anArray = array_values($anArray);
//'dumps' the content of $anArray to the page:
var_dump($anArray);
This will now output:
array(4) { [0]=> string(1) "V" [1]=> string(1) "W"
[2]=> string(1) "Y" [3]=> string(1) "Z" }
The array_values function will replace non-numeric key indices with numeric values
One thing we should point out about the array_values function is that if the keys are non-numeric, they
will be replaced with numeric values anyway. So, if you have an array that uses strings as keys (an
associative array), then array_values will discard those strings and replace them with numeric indices starting
at 0. This is something you should definitely watch out for if you do decide to use the array_values function.
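Here is a small sketch of that side effect, using a hypothetical associative array of names and ages:

```php
<?php
$ages = array("alice" => 31, "bob" => 27, "carol" => 45);
unset($ages["bob"]);

// The string keys survive unset, so there is no numeric "gap" to fix
print_r($ages);               // keys: alice, carol

// array_values throws the string keys away and numbers the values from 0
print_r(array_values($ages)); // keys: 0, 1
```

If you need to keep the string keys, simply do not call array_values on an associative array.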
Now you have seen the different options that you have available to you when deleting an element from an
array in PHP. And you are also aware of any potential side effects – which method you choose to use is
entirely up to you.
How to delete an element in an array if you only know the value in PHP
You may want to delete an element in an array for which you only know the value (but not the
corresponding key). In that scenario, you will have to search the array for the value you want first in order
to get the corresponding key. You can use the array_search function to do that – it will simply take the array
you want to search along with the value you want to search for as the parameters, and will return the
corresponding key if it is found. Then, you can use the unset function to remove the element as before.
And finally, here is an example of how to do it:
An example of deleting an element in an array if you only know the value in PHP
$key = array_search($valueToSearch, $arrayToSearch);
if ($key !== false) {
    unset($arrayToSearch[$key]);
}
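Here is a runnable version of the snippet above with some sample data (the values are made up for illustration). Note that array_search only returns the first matching key; the second half of the sketch uses array_keys with a search value, which returns every matching key, in case the value occurs more than once:

```php
<?php
$arrayToSearch = array("apple", "banana", "cherry", "banana");
$valueToSearch = "banana";

// array_search returns the key of the FIRST match, or false if none
$key = array_search($valueToSearch, $arrayToSearch);
if ($key !== false) {   // !== because a valid key could be 0
    unset($arrayToSearch[$key]);
}
print_r($arrayToSearch); // apple, cherry, banana (keys 0, 2, 3)

// To remove every occurrence, ask array_keys for all matching keys
foreach (array_keys($arrayToSearch, $valueToSearch) as $k) {
    unset($arrayToSearch[$k]);
}
print_r($arrayToSearch); // apple, cherry (keys 0, 2)
```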
How would you convert a PHP variable to a string? Is there something like the toString method that Java has in PHP?
PHP has casting operators that can be used to convert non-string variables into strings. Here is how to use
a casting operator to convert a variable to a string in PHP:
Example of PHP’s equivalent to toString
// this is an integer:
$nonStringVar = 123;
/*now $stringVar is a string because
the "(string)" performs a type cast
and returns the string equivalent
of the integer
*/
$stringVar = (string)$nonStringVar;
The casting operator is fairly close in functionality to the toString method in Java, but read below, because
PHP also has its own __toString method.
Using the strval function to convert a variable to a string
You can also use the function strval to get the string value of a variable.
// this is an integer:
$nonStringVar = 123;
//now $aString is a string
$aString = strval($nonStringVar);
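The cast and strval follow the same conversion rules. Here is a quick sketch of a few cases worth knowing, using var_dump so that the type and length of each result are visible:

```php
<?php
// How (string) and strval() convert some common scalar values
var_dump((string)123);  // string(3) "123"
var_dump(strval(1.5));  // string(3) "1.5"
var_dump((string)true); // string(1) "1"
var_dump((string)false); // string(0) ""
var_dump((string)null); // string(0) ""
```

Casting an array to a string gives the text “Array” along with an “Array to string conversion” diagnostic, and casting an object that does not define __toString fails with an error, which is where the __toString method described below comes in.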
PHP does have a __toString magic method
PHP also provides a method called __toString that is defined by the programmer, and that basically tells
a class how to act when it is treated like a string. When would a class be treated like a string? Well,
suppose you have a class called SomeClass, and an object of that class called $someObject. If you decide to
do something like this: “echo $someObject;”, then what exactly should be output in that scenario? Well, that
is up to you to decide – because whatever you define in the __toString method is what will be output to the
page. You can decide to output all of the class’s instance variables, output a particular string, etc. – but
whatever you do the __toString method will have to return a string.
Here is an example of the __toString method in action:
class SomeClass
{
    public $aVariable;
    public function __construct($aVariable)
    {
        $this->aVariable = $aVariable;
    }
    public function __toString()
    {
        return $this->aVariable;
    }
}
$someObject = new SomeClass('Testing 123');
/* this will indirectly call the __toString method,
which returns 'Testing 123' for echo to output
*/
echo $someObject;
The __toString method is a magic method – you can read more about magic methods here if you are not
familiar with them already: Magic methods in PHP.
What is the best way to return data in the JSON format from PHP?
If you are running PHP 5.2 or greater, you can use the built-in json_encode function to return JSON
formatted data from PHP. Here is an example of its usage:
Example of returning JSON from PHP
$tennisArray = array('Djokovic' => 1, 'Federer' => 2,
'Nadal' => 3, 'Murray' => 4);
echo json_encode($tennisArray);
The code above will output the JSON formatted data like this:
{"Djokovic":1,"Federer":2,"Nadal":3,"Murray":4}
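When the JSON is sent to a browser or an AJAX call, it usually helps to also send a Content-Type header of application/json before any output. Here is a slightly extended sketch of the example above that does that, and also shows how a plain list is encoded and how json_decode reverses the process:

```php
<?php
$tennisArray = array('Djokovic' => 1, 'Federer' => 2,
                     'Nadal' => 3, 'Murray' => 4);

// Tell the client the body is JSON (header() has no visible effect on the CLI)
header('Content-Type: application/json');
echo json_encode($tennisArray), "\n";
// {"Djokovic":1,"Federer":2,"Nadal":3,"Murray":4}

// An array with keys 0..n-1 is encoded as a JSON array, not an object
echo json_encode(array('Djokovic', 'Federer')), "\n";
// ["Djokovic","Federer"]

// json_decode with true as the 2nd argument gives back an associative array
var_dump(json_decode(json_encode($tennisArray), true) === $tennisArray); // bool(true)
```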
How do you return JSON from PHP if you are running a version before PHP 5.2?
If you are running a version of PHP that came before 5.2, then you can install the JSON extension from PECL,
available here: JSON and PHP
What is the best way to remove or turn off warning messages in PHP?
There will be times when you will see a warning message output to the browser after running your PHP
script and you may want to turn off that warning message. Obviously, it’s a lot better to get to the root of
the problem and fix that instead. But, if you know that you do not need to fix the root of the problem (for
whatever reason), in order to remove a warning message in PHP all you have to do is use
the error_reporting function in PHP. If you don’t care about how the function works, then just skip to the
section that says “The code to turn off error messages in PHP” to see the code that you should use to turn
off warning messages in PHP. Otherwise, keep reading.
Using the error_reporting function to turn off warnings in PHP
The error_reporting function in PHP basically allows you to set the kind of error reporting that you want.
How does the error_reporting function work? Well, you simply pass the error_reporting function the types of
errors that you want to have reported on the page, using predefined constants (names that each stand for an
integer value) to describe them.
The E_PARSE constant tells PHP that compile time parse errors should be reported and displayed on the
page as you can read about here. Since you definitely want to know about any compile time errors, you
should pass this constant to the function. The E_ERROR constant tells PHP that the details of any fatal runtime errors should be reported and displayed – this is also something you definitely want, since you should
always know what the cause of any fatal run-time errors is.
Now that you understand a bit more about how the error_reporting function works – here is the actual code
to use:
The code to turn off error messages in PHP:
You have to place this line of code before the code that is causing the warning to be displayed. If you place
this code after the offending code, it will not suppress the warning, because the warning will already have
been displayed. Here is the line of code to use:
error_reporting(E_ERROR | E_PARSE);
The code above does not have the E_WARNING constant being passed in
Because the function above does not include the “E_WARNING” constant, the non-fatal run-time warnings
will not be displayed on the page when a PHP script is run. And that is exactly what prevents the warning
message from appearing on the page.
How do the constants work in the error_reporting function?
In the example above, E_PARSE and E_ERROR are both constants – which means that they are actually
numbers represented by text, so E_PARSE really is just some text that represents the numeric value of 4,
and E_ERROR is text representing the numeric value of 1. Read on to understand how those constants work.
How does the OR operator work with the error_reporting function?
Note that the code above uses the “|” character, which is PHP’s bitwise OR operator (not the logical OR),
applied to the constants that are passed into the error_reporting function. You will notice if you look at this
page that those constants are all powers of 2, so each one has exactly one bit set. When they are OR’ed
together, the bit for each constant is retained, and that tells the error_reporting function which errors need
to be displayed. For example, E_ERROR | E_PARSE is 1 | 4, which is 5.
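The bit arithmetic can be checked directly in PHP. This sketch prints the values of a few constants and shows how OR combines them and how AND tests for a single bit:

```php
<?php
// Each E_* constant is a single bit, i.e. a power of 2
var_dump(E_ERROR);           // int(1)
var_dump(E_WARNING);         // int(2)
var_dump(E_PARSE);           // int(4)

// Bitwise OR combines the bits: 1 | 4 = 5
var_dump(E_ERROR | E_PARSE); // int(5)

// Bitwise AND tests whether a particular bit is switched on
$level = E_ERROR | E_PARSE;
var_dump(($level & E_WARNING) === 0); // bool(true): warnings are not reported

// error_reporting() returns the previous level, so it can be restored later
$previous = error_reporting($level);
error_reporting($previous);
```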
Another way to hide or remove warning messages in PHP
Another option to remove warning messages in PHP is to use what is called the error control operator –
which is basically just the at sign – the “@”. When the “@” sign is put in front of an expression, any error
message that might be generated by that expression will be ignored.
In PHP versions before 8.0, the “@” error control prefix operator will even disable error reporting for critical
run-time errors. For this reason, you should only use this operator if you really know what you are doing. The
“@” can only be used in front of expressions, so it cannot be used in front of a function or class definition, a
for loop, etc. But it can be used in front of a call to a function. Here is what it would look like in that scenario:
@someFunctionCall( );
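As an illustration, here is a sketch that suppresses the warning file_get_contents raises for a missing file, then uses error_get_last to find out what went wrong instead of silently ignoring it. The file path is hypothetical:

```php
<?php
// Hypothetical path that does not exist
$path = '/no/such/file.txt';

// Without the @ this call would print a "failed to open stream" warning
$contents = @file_get_contents($path);

if ($contents === false) {
    // The warning was hidden from the page, but PHP still recorded it
    $error = error_get_last();
    echo "Could not read file: ",
        ($error !== null ? $error['message'] : 'unknown error'), "\n";
}
```

Checking the return value like this keeps the page free of warnings without hiding the fact that something failed.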
Advanced PHP Practice Interview Questions And Answers
Here we present some more challenging practice PHP interview questions and answers that were asked in a
real interview for a PHP web developer position. These questions are really good to not just test your PHP
skills, but also your general web development knowledge. We think that you will benefit a lot, and gain
some good practice by going through these questions. The questions are for intermediate to somewhat
advanced PHP software engineers, but even if you are just a beginner or fresher you should be able to
understand the answers and explanations we give – but you may not be able to come up with the answers
on your own. Here is the first part of the question – read it carefully to really understand it, and we give a
simple, easy to understand explanation of everything in this question:
Write a PHP script to report the total download size of any URL. You may not use any 3rd-party
code that performs the entire task described below.
No HTML interface is necessary for this exercise; you can write this as a command-line script that
accepts the URL as an argument.
For a single-file resource such as an image or SWF, the script would simply report on the total
size of the document.
For a complex resource such as an HTML document, the script would need to parse it to find
references to embedded, included resources: javascript files, CSS files, iframes, etc.
The goal of this exercise is to output the following information for a given URL:
- total number of HTTP requests
- total download size for all requests
So, there are 2 primary goals that this question asks us to solve: For any URL, find the total number of HTTP
requests generated by that URL, and also find the total download size for all requests. You may not
understand what is meant by an HTTP request, but don’t worry we explain it all below.
We’ll have to break down this question into more manageable pieces since it is a lot to comprehend. So,
we’ll go with the divide and conquer approach. Let’s start with the easier parts of the question first.
Accepting arguments in PHP scripts
The question says that “No HTML interface is necessary; you can write this as a command-line script that
accepts the URL as an argument”.
So, let’s say that we want to write this as a command-line script. The question is: how do we
retrieve arguments inside a PHP command-line script?
Well, if we plan on having the script called from the command line as “ourscript.php www.theurl.com”,
where the URL is passed as an argument, then inside the PHP script we can grab the URL value by using the
PHP variable “$argv[1];”. Inside our PHP script the code to retrieve the URL passed in as an argument would
look like:
/*
If this script is invoked as ourscript.php www.theurl.com,
then $argv[1] will hold the value www.theurl.com, and
that value will be stored in the $URL variable as well
*/
$URL = $argv[1];
That’s very simple code – now, let’s move on to other parts of the question.
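The script should also cope with being started without a URL. Here is a small sketch; the helper name getUrlArgument is hypothetical, and it takes the argument array as a parameter so that it can be tried with a simulated $argv:

```php
<?php
// Hypothetical helper: returns the URL argument, or null if none was given.
function getUrlArgument(array $args)
{
    // $args[0] is always the script name; the first real argument is $args[1]
    return isset($args[1]) ? $args[1] : null;
}

// When run as "php ourscript.php www.theurl.com", PHP fills $argv with:
$simulatedArgv = array("ourscript.php", "www.theurl.com");
var_dump(getUrlArgument($simulatedArgv));         // string(14) "www.theurl.com"
var_dump(getUrlArgument(array("ourscript.php"))); // NULL
```

In the real script you would call getUrlArgument($argv) and, if it returns null, print a usage message and stop.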
How to connect to a URL in PHP?
It should also be clear that we will need to somehow be able to connect to a URL and view the contents of
the page that the URL points to. What is the best way to do this? Well, PHP provides an extension for a library
called cURL that may already be included in your installation of PHP by default. cURL stands for client URL,
and it allows you to connect to a URL and retrieve information from that page – like the HTML content of the
page, the HTTP headers and their associated data, etc. You will see the use of cURL in our code below –
don’t worry if you’ve never used cURL before, it’s fairly easy to understand!
Understanding resources
If you are confused by what exactly is meant by the term “resource” in the question above, then you should
just think of a web resource as a generic term for a file. So, a CSS file, a Javascript file, an HTML file, a SWF
(a file used for Adobe Flash) file, an image file (jpg, png, etc) – each of these is a different type of resource,
and as you know there are many more types of resources on the web.
The difference between single file resources and other resources
The question specifically calls HTML files complex resources because of the simple fact that HTML
documents are complex – they can contain many references to single file resources like image files, and
SWF files. A single file resource does not contain references to other resources – a jpg or gif file can not
contain a reference to another file, and that is why they are both considered single file resources. An HTML
file, on the other hand, is also considered a resource itself, but because it contains references to other
resources, it is not considered to be a single file resource.
In order to retrieve a resource from the web server where that resource is stored, a web browser has to
make an HTTP request. Read on to understand more about HTTP requests.
What exactly is an HTTP request?
The question asks for two major things from a URL – the total number of HTTP requests and the total
download size for all requests. The download size is easy enough to understand, but you may be confused
by what exactly is meant by an HTTP request. HTTP is the protocol used to communicate on the web. When
you visit a webpage, your browser will make an HTTP request to the server that hosts that webpage, and
the server on which the webpage is hosted will respond with an HTTP response.
But what is important to understand here is that your browser will probably have to make multiple HTTP
requests in order to retrieve a single HTML page at a given URL, because that webpage will probably have
some CSS files to go along with it, some Javascript files, and probably some images as well. Each one of
those resources needs a separate HTTP request: 2 image files, 2 Javascript files, and 2 CSS files means 6
separate HTTP requests, on top of the request for the HTML page itself. Each HTTP request asks for exactly
one resource, so we cannot have 1 request for 6 different resources; instead we must have 6 requests for
those 6 different resources.
So, for the purpose of this interview question, we have to find out the number of HTTP requests that will be
made for a given URL – hopefully what that means is now clear to you. We’ll go more in depth on this later –
and show some actual code – as we cover some other things as well.
How to find the download size of a file?
The question also asks us to find the total download size of a URL. But what if that URL passed into the
script just points to a single file resource like a JPG file or a GIF file? Well, for a single file resource we just
need to find the size of that particular file and then return it as the answer, and we are done. But, for an
HTML document we will need to find the total size of all resources that are embedded and included on the
page and return that as the answer – because you must remember that we want the total download size
of a URL.
So, let’s write a PHP function that will return the download size of a single file resource. How should we
approach writing this function – what is the easiest way to find the download size of a single file resource on
the web?
Well, there is an HTTP header called “Content-Length” which will actually tell us the size of a particular
resource file in the HTTP response (after the resource is requested). So, all we have to do is use PHP’s built-in
“get_headers” function, which will retrieve all the HTTP headers sent by the server in response to an HTTP
request.
The get_headers function accepts a URL as an argument. So, the PHP code to retrieve the “Content-Length”
header would look like this:
function get_remote_file_size($url) {
    //passing 1 makes get_headers return an array keyed by header name
    $headers = get_headers($url, 1);
    if (isset($headers['Content-Length']))
        return $headers['Content-Length'];
    //checks for lower case "L" in Content-length:
    if (isset($headers['Content-length']))
        return $headers['Content-length'];
    //no Content-Length header was found
    return false;
}
But, there is actually a problem with this code: you will not always receive the Content-Length header in an
HTTP response. In other words, the HTTP Content-Length header is not guaranteed to be sent back by the
web server hosting that particular URL, because it depends on the configuration of the server. This means
that you need an alternative that always works in case the approach above fails.
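The header lookup itself can also be made a little more robust. Header names in HTTP are case-insensitive, and when get_headers follows a redirect the same header can appear once per response, in which case its value is an array. Here is a sketch of a helper that handles both cases; the name content_length_from_headers is hypothetical, and it expects the array returned by get_headers($url, 1):

```php
<?php
// Hypothetical helper: finds Content-Length in a get_headers($url, 1) array,
// whatever its capitalisation. Returns null if the size is unknown.
function content_length_from_headers($headers)
{
    if (!is_array($headers)) {
        return null;              // get_headers() failed and returned false
    }
    // Normalise header names to lower case before looking them up
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['content-length'])) {
        return null;              // the server did not send the header
    }
    $value = $headers['content-length'];
    // One value per response after a redirect: the last is the final resource
    if (is_array($value)) {
        $value = end($value);
    }
    return (int) $value;
}
```

get_remote_file_size could then return content_length_from_headers(get_headers($url, 1)) when that is not null, and fall back to the cURL download otherwise.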
An alternative to using the content-length header
Well, we can actually download the file ourselves and then just get the download size for that URL. How can
we do this? Well, this is where we can use cURL as we discussed above. Once we download the resource, we
can retrieve the download size using the CURLINFO_SIZE_DOWNLOAD parameter. So, using this approach
as a backup to our first approach, we can come up with this code (everything after the “no Content-Length header” comment is new):
function get_remote_file_size($url) {
    $headers = get_headers($url, 1);
    if (isset($headers['Content-Length']))
        return $headers['Content-Length'];
    //checks for lower case "L" in Content-length:
    if (isset($headers['Content-length']))
        return $headers['Content-length'];
    //the code below runs if no "Content-Length" header is found:
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
    ));
    //download the resource; only its size is needed
    curl_exec($c);
    //the number of bytes that were downloaded:
    $size = curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    //close the handle BEFORE returning, since code after a return never runs
    curl_close($c);
    return $size;
}
How should we parse HTML in PHP?
What exactly is meant by the sentence “For a complex resource such as an HTML document, the script
would need to parse it to find references to embedded, included resources: javascript files, CSS files,
iframes, etc.”?
Well, as you probably know, an HTML page often uses other files to render the HTML page – like CSS file(s)
for styling, Javascript file(s) for adding more functionality to the HTML page, and so on. But the question is
how do we take an HTML page and find all of those resources in the HTML page. Of course, this is easy to do
if we are reading the HTML page with the human eye. But, we want to find these resources using a program
that will read the HTML for us. This is actually more complicated than it seems – and the process by which a
program (like PHP) reads an HTML file and analyzes the text to extract meaningful data (like resources) is
known as parsing the HTML. Any text can be parsed, but we are exclusively focused on HTML for the
purpose of this interview question.
Parsing HTML in PHP is definitely something that you do not want to do on your own, because it is so
complex – as you can read about here: How to parse HTML in PHP. The best way to parse HTML in PHP is to
use a library that already exists – because writing an entire library from scratch to do this would obviously
be considered way too much work for an answer to an interview question.
Note that the question states that “You may not use any 3rd-party code that performs the entire task
described below”. This just means you cannot use 3rd party code to perform the entire task, but using a
PHP library to help you with part of this question is perfectly OK. Of course, you should clarify this with your
interviewer if you are in doubt, but we know for sure that for this particular question there’s no way that the
interviewers would be expecting you to perform this task without using a library to help you parse the HTML.
With that in mind, here is the library we plan on using: PHP HTML parser.
Note that the instructions say: “For a single-file resource such as an image or SWF, the script would simply
report on the total size of the document.”
This means that if the URL points to a single file resource like an image file, we can just return the size of the file
and we are done. But, how can we distinguish between a single-file resource and a non-single file resource?
Well, we could just say that all non-HTML pages are single file resources. That statement is not entirely true,
as you can read about in part 3, but we will pretend it is for the sake of keeping things simple.
But wait, you might be thinking – what about PHP, JSP, ASP and all of those pages? Well, of course there is
some application specific logic embedded in those pages, but once those pages are rendered in a browser
they are all HTML pages, regardless of what their file extension may be.
So, all we have to do in order to determine if a URL points to a single file resource is to see if it is an HTML
page – if it is not an HTML page, then we know that the file is a single resource file.
Using the HTTP Content-Type Header
But how do we check to see if a webpage is an HTML page? Clearly we can’t just look at the URL by itself,
because a PHP page, JSP page, etc. are all HTML pages, but the file extension does not tell us that. Well,
once again we can use the HTTP headers to our advantage – in this case, we just have to take a look at the
HTTP Content-Type header.
And, if the Content-Type header starts with “text/html” (servers often append a character set, as in
“text/html; charset=UTF-8”), then we know that we are dealing with an HTML page. But if the Content-Type
header for the URL is not “text/html”, then we know that we are dealing with a single file resource, and we
can just return the size.
Let’s write some code in PHP that will tell us if a given URL is actually an HTML page by checking the HTTP
headers. Here is a PHP function that will do that for us:
function check_if_html($url){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    // only request the headers, not the body
    curl_setopt($ch, CURLOPT_NOBODY, TRUE);
    curl_exec($ch);
    // NULL when the server sent no Content-Type header, so cast to a string
    $contentType = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    // this is HTML if the Content-Type contains "text/html"
    return strpos($contentType, 'text/html') !== false;
}
In the code above, we just use a simple cURL connection to the URL to retrieve the headers, and then check
the Content-Type header to see if it contains the text “text/html”. If it does, then we return true; otherwise, we
return false.
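The strpos check above is case-sensitive, while media types in a Content-Type header are not, so a server that sends "Text/HTML" would be treated as a single file resource. Below is a minimal sketch of a stricter comparison; the helper name is_html_content_type is hypothetical, and it deliberately recognizes only text/html (not, for example, application/xhtml+xml).

```php
<?php
// Hypothetical helper: decide whether a Content-Type header value
// describes an HTML document. The media type is compared
// case-insensitively and any parameters such as "; charset=UTF-8"
// are ignored.
function is_html_content_type(?string $contentType): bool
{
    if ($contentType === null || $contentType === '') {
        return false; // no Content-Type header was sent
    }
    // Keep only the media type that comes before any ";" parameters.
    $mediaType = strtolower(trim(explode(';', $contentType, 2)[0]));
    return $mediaType === 'text/html';
}

var_dump(is_html_content_type('text/html; charset=UTF-8')); // bool(true)
var_dump(is_html_content_type('TEXT/HTML'));                 // bool(true)
var_dump(is_html_content_type('image/png'));                 // bool(false)
```

Inside check_if_html, the final line could then become `return is_html_content_type(curl_getinfo($ch, CURLINFO_CONTENT_TYPE));`, with the curl_close call moved before it.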
Then, we can add some code that will actually call the function to determine if a URL points to just a single
resource file:
/*
check to see if the URL points to an HTML page,
if it doesn't then we are dealing with a single
file resource:
*/
if (!check_if_html($URL))
{
    $totalSize = get_remote_file_size($URL);
    echo "Final Total Download Size: $totalSize Bytes\n";
    //a single resource is one HTTP request
    $totalNumResources += 1;
    echo "Final total HTTP requests: $totalNumResources\n";
    return;
}
How to find the total number of HTTP requests
We mentioned that we would need to find the total number of HTTP requests generated by a given URL –
let’s figure out how to write some code that will do that for us. It’s clear that we must have some variable
that maintains a total count of all HTTP requests, and this variable will be incremented as we come across
more and more HTTP requests.
We know that images will be wrapped in an “img” tag – so if we just do a search for all img tags we can
take a look at the src attribute, and find the size of any given image. For each image we find, we can
increment the variable that holds the total count of the HTTP requests. We can also do the same for CSS
files – they will be referenced inside “link” tags, and also for JavaScript files, which will be referenced inside
“script” tags.
We will need to use the simple HTML DOM parser that we discussed earlier in order to find all of the
references to CSS, Javascript, and image files. Here’s what the code looks like – note that we are using the
simple HTML DOM library functionality to parse through the HTML. Also note that we are using a variable
called $totalNumResources to hold the total number of resources, and another variable called $totalSize to
hold the total size of all of the resources:
include('simple_html_dom.php');
$URL = $argv[1];
$totalSize = 0;
$totalNumResources = 0;
// Create DOM from URL or file
$html = file_get_html($URL);
// find all images!!
foreach($html->find('img') as $element){
$size = get_remote_file_size($element->src);
$totalSize = $totalSize + $size;
$totalNumResources += 1;
/*
echo "Total Size So Far: $totalSize.\n";
echo "total resources: $totalNumResources .\n";
echo "IMAGE SIZE: $size.\n";
echo "$element->src.\n";
*/
}
// find all CSS files
foreach($html->find('link') as $element)
{
if (strpos($element->href,'.css') !== false) {
$size = get_remote_file_size($element->href);
echo "SIZE: $size.\n";
$totalSize = $totalSize + $size;
$totalNumResources += 1;
}
}
// find all script tags
foreach($html->find('script') as $element)
{
//make sure this is javascript
if (strpos($element->src,'.js') !== false) {
$size = get_remote_file_size($element->src);
echo " Javascript SIZE: $size.\n";
$totalSize = $totalSize + $size;
$totalNumResources += 1;
}
}
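The opening of this answer notes that interviewers may expect the HTML to be parsed without a parsing library. One possible alternative to simple_html_dom is PHP's built-in DOM extension (DOMDocument). The sketch below assumes that extension is available; the function name find_resource_urls is hypothetical, and it identifies stylesheets by rel="stylesheet" rather than by a ".css" substring, which is a swapped-in heuristic. URLs are returned exactly as written in the page, so relative ones would still need to be resolved against the page URL before get_remote_file_size could fetch them.

```php
<?php
// Hypothetical helper: list the image, stylesheet and external script
// URLs referenced by an HTML string, using the built-in DOM extension.
function find_resource_urls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->getAttribute('src') !== '') {
            $urls[] = $img->getAttribute('src');
        }
    }
    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel may hold several space-separated tokens, e.g. "alternate stylesheet".
        $rel = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        if (in_array('stylesheet', $rel, true) && $link->getAttribute('href') !== '') {
            $urls[] = $link->getAttribute('href');
        }
    }
    foreach ($doc->getElementsByTagName('script') as $script) {
        // Inline <script> blocks have no src and need no extra request.
        if ($script->getAttribute('src') !== '') {
            $urls[] = $script->getAttribute('src');
        }
    }
    return $urls;
}
```

Each returned URL would then be passed to get_remote_file_size, and $totalNumResources incremented once per URL, just as in the loops above.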
The answer to Advanced PHP Interview Question Part 1
Finally, we present our complete answer to the advanced PHP interview question part 1 below – with all the
source code you need to answer the first portion of the question. You can also continue on to Part 2 of the
PHP Interview Questions and Answers.
include('simple_html_dom.php');
$URL = $argv[1];
$totalSize = 0;
$totalNumResources = 0;
/*
check to see if the URL points to an HTML page,
if it doesn't then we are dealing with a single
file resource:
*/
if (!check_if_html($URL))
{
    $totalSize = get_remote_file_size($URL);
    echo "Final Total Download Size: $totalSize Bytes\n";
    //a single resource is still an HTTP request
    $totalNumResources += 1;
    echo "Final total HTTP requests: $totalNumResources\n";
    return;
}
/* at this point we know we are dealing with an HTML document
which also counts as a resource, so increment the $totalNumResources
variable by 1
*/
$totalNumResources += 1;
$html = file_get_html($URL);
// find all images:
foreach($html->find('img') as $element){
$size = get_remote_file_size($element->src);
$totalSize = $totalSize + $size;
$totalNumResources += 1;
/*
echo "Total Size So Far: $totalSize.\n";
echo "total resources: $totalNumResources .\n";
echo "IMAGE SIZE: $size.\n";
echo "$element->src.\n";
*/
}
// Find all CSS:
foreach($html->find('link') as $element)
{
if (strpos($element->href,'.css') !== false) {
$size = get_remote_file_size($element->href);
$totalSize = $totalSize + $size;
$totalNumResources += 1;
/*
echo "total resources: $totalNumResources .\n";
echo "Total Size So Far: $totalSize.\n";
echo "$element->href.\n";
*/
}
//only output the ones with 'css' inside...
}
//find all javascript:
foreach($html->find('script') as $element)
{
//check to see if it is javascript file:
if (strpos($element->src,'.js') !== false) {
$size = get_remote_file_size($element->src);
//echo " JS SIZE: $size.\n";
$totalSize = $totalSize + $size;
$totalNumResources += 1;
/*
echo "Total Size So Far: $totalSize.\n";
echo "total resources: $totalNumResources .\n";
echo "$element->src.\n";
*/
}
}
echo "Final total download size: $totalSize Bytes\n";
echo "Final total HTTP requests: $totalNumResources\n";
function get_remote_file_size($url) {
    $headers = get_headers($url, 1);
    if (isset($headers['Content-Length'])) return $headers['Content-Length'];
    //this one checks for lower case "L" in Content-length:
    if (isset($headers['Content-length'])) return $headers['Content-length'];
    // no Content-Length header: download the file and measure it
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
    ));
    curl_exec($c);
    $size = curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    // close the handle before returning
    curl_close($c);
    return $size;
}
/*checks content type header to see if it is
an HTML page...
*/
function check_if_html($url){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    // only request the headers, not the body
    curl_setopt($ch, CURLOPT_NOBODY, TRUE);
    curl_exec($ch);
    // NULL when the server sent no Content-Type header, so cast to a string
    $contentType = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    // this is HTML if the Content-Type contains "text/html"
    return strpos($contentType, 'text/html') !== false;
}
If you see some improvements we can make to the code above, please let us know in the comments.
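One possible improvement concerns get_remote_file_size, which checks only two spellings of Content-Length. HTTP header names are case-insensitive, so a server may also send "content-length". In addition, get_headers() follows redirects by default, and with its second argument set to 1 it returns an array of values when a header appears in more than one response. A minimal sketch of a more forgiving lookup, using a hypothetical helper name:

```php
<?php
// Hypothetical helper: read Content-Length from the array returned by
// get_headers($url, 1). Matches the header name case-insensitively and,
// when redirects produced several values, uses the last one, which
// belongs to the final response.
function content_length_from_headers($headers): ?int
{
    if (!is_array($headers)) {
        return null; // get_headers() returns false on failure
    }
    foreach ($headers as $name => $value) {
        // Status lines such as "HTTP/1.1 200 OK" sit under integer keys.
        if (is_string($name) && strcasecmp($name, 'Content-Length') === 0) {
            $last = is_array($value) ? end($value) : $value;
            return (int) $last;
        }
    }
    return null; // caller can fall back to downloading the file with cURL
}
```

In get_remote_file_size, the two isset checks could then be replaced by `$length = content_length_from_headers(get_headers($url, 1));` followed by `if ($length !== null) return $length;`.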
Advanced PHP Practice Interview Questions And Answers Part 2
This is a continuation of the practice PHP interview question from part 1. Here is the additional part of the
question that we want you to try to answer:
The code should also be able to handle the URL in the src attribute of an iframe.
And, here is the original question for your convenience:
Write a PHP script to report the total download size of any URL. You may not use any 3rd-party
code that performs the entire task described below.
No HTML interface is necessary for this exercise; you can write this as a command-line script that
accepts the URL as an argument.
For a single-file resource such as an image or SWF, the script would simply report on the total
size of the document.
For a complex resource such as an HTML document, the script would need to parse it to find
references to embedded, included resources: javascript files, CSS files, iframes, etc.
The goal of this exercise is to output the following information for a given URL:
- total number of HTTP requests
- total download size for all requests
How to handle an iframe src
The second part of the question states that we will need to be able to handle the URL in the src attribute
of an iframe tag. What exactly does that mean? Well, the src attribute of an iframe tag points
to another HTML page. When an iframe tag is used in a page it’s like embedding another HTML page within
that page. And since the whole point of this exercise is to find the number of HTTP requests being made
along with the total download size for all requests, we basically have to follow the iframe src URL ourselves
and figure out how many new HTTP requests are created from that URL and what their total download size
will be.
For example, we actually use an iframe tag on this page to embed the Facebook likebox, which you can see
on the bottom of the left hand sidebar. This is what our iframe tag looks like (you can also see this if you
“View Source” for this page) – note that the iframe src actually points to a php page called “likebox.php”:
<iframe src="http://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fpages%2FProgrammerInterview%2F120896424636091&width=238&colorscheme=light&show_faces=false&stream=false&header=true&height=62"
scrolling="no" frameborder="0" style="border:none; overflow:hidden; width:230px; height:70px;"
allowTransparency="true"></iframe>
You can see the result on this page itself when the iframe is actually rendered – there is a like button, a
count of likes, some text, and an image of a nerd. The like button and the nerd will be 2 separate HTTP
requests. The iframe src itself counts as an HTTP request as well – because the browser will have to make a
request for whatever URL the iframe src points to.
View source does not show you markup generated by iframe
But, the thing is that when we read the HTML on the page, we will only see the iframe tag – we will not see
the markup that is created by the iframe. This is an important point to understand, and you can confirm
this fact by just doing a view source on this page. Even if we tell cURL to retrieve the page for us, the HTML
returned will have the iframe tag in its original form, and not in its rendered form. For that reason, if we
want to find out the number of HTTP requests that the iframe will generate, we have to take the URL in the
iframe's src attribute and evaluate it just like we did for the original URL.
What this means is that, for the document the iframe points to, we will essentially have to re-use the same
code that we used to find the requests and download size of the top-level (containing) document. Think
about that for a second and see if you can come up with a good approach to solve that problem on your own.
It turns out that we can use recursion to call our existing code again, passing in the URL value from the
iframe src attribute. This way we can find the number of HTTP requests and the total download size for the
requests that will come from the iframe src URL by re-using the code that we have already written.
In order to use recursion here, we should have our code wrapped inside a function. So, with that in mind we
create the function below that we call start – note that we deliberately left out the code which is used to
find images/css/javascript since we just want to focus on the iframe piece and making a recursive call.
Using recursion to answer PHP Interview question part 2
So, we can just make a recursive call to the start function inside a loop over the iframe tags, as shown below:
function start($URL){
if (!check_if_html($URL))
{
$totalSize = get_remote_file_size($URL);
echo "Final Total Download Size: $totalSize Bytes ";
$totalNumResources += 1;//single resource is an HTTP request
echo "Final total HTTP requests: $totalNumResources" ;
return;
}
/* at this point we know we are dealing with an HTML document
which also counts as a resource, so increment the
$totalNumResources variable by 1:
*/
$html = file_get_html($URL);
$totalNumResources += 1;
foreach($html->find('iframe') as $element)
{
echo "IFRAME" . "$element->src.\n";
start($element->src);
}
} //CLOSING BRACE FOR "START" FUNCTION
But, wait a second. What will happen to the $totalSize and $totalNumResources variables? Well, with the
implementation above, each call to start has its own local copies of those variables: the recursive call starts
without the running totals, and whatever it counts is thrown away when it returns, so the caller's values are
exactly what they were before the recursive call. This makes no sense
– what we really want is to count the number of HTTP requests and calculate the download size of the
requests that are added by the iframe. Remember – we want a cumulative sum of the HTTP requests and
download size, including whatever resources are added to the page by the iframe.
Saving the values of the PHP variables
So, there clearly needs to be a way to save the value of those variables while the recursive call is made – so
that the recursive call can just add on to those values. The way to do this is to pass in the values of the
$totalSize and $totalNumResources variables into the recursive call to the start function – so the start
function will now have to be modified so it can accept two extra parameters.
But, just passing the variables $totalSize and $totalNumResources into the recursive call is not enough – we
also need to return those variables from the function itself. If those values are incremented during the
recursive call, we need to be sure to retain the modified values even after the recursive call is over. So, we
will have to use the code below, where start accepts the two totals as parameters and returns them in an array:
function start($URL, $totalSize, $totalNumResources){
if (!check_if_html($URL))
{
//a single resource is one HTTP request: add it to the running totals
$totalSize += get_remote_file_size($URL);
$totalNumResources += 1;
//return the totals so the caller's list() receives them
return array($totalSize, $totalNumResources);
}
/* at this point we know we are dealing with an
HTML document which also counts as a resource,
so increment the $totalNumResources variable by 1
*/
$html = file_get_html($URL);
$totalNumResources += 1;
foreach($html->find('iframe') as $element)
{
echo "IFRAME:" . "$element->src.\n";
list($totalSize, $totalNumResources) =
start($element->src, $totalSize, $totalNumResources);
}
return array($totalSize, $totalNumResources) ;
} //closing brace for 'start' function...
Note that we use the list function in PHP to hold the variables that will be returned once the start function
returns from the recursive call.
Another interesting thing that we should point out in the code above is the fact that we do not have any
code inside the iframe foreach loop that increments the $totalNumResources by 1. This is because during
the recursive call the $totalNumResources variable will be incremented by 1 anyway, because the iframe
URL is counted as a separate HTML document.
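To see the accumulator pattern on its own, without any network access, here is a self-contained sketch. The $site array is a hypothetical in-memory stand-in for the web, and count_page is a hypothetical simplified version of start. Unlike the listings in this answer, it also adds the HTML document's own size to the total, which is a design choice of this sketch.

```php
<?php
// Hypothetical in-memory stand-in for the web: each URL maps to the
// document's own size in bytes, the sizes of the resources it embeds,
// and the URLs of its iframes.
$site = [
    'index.html'  => ['size' => 1000, 'resources' => [200, 300], 'iframes' => ['likebox.php']],
    'likebox.php' => ['size' => 400,  'resources' => [50, 50],   'iframes' => []],
];

// Same pattern as start(): the running totals go in as parameters
// and come back out in an array.
function count_page(array $site, string $url, int $totalSize, int $totalNumResources): array
{
    $page = $site[$url];

    // The document itself is one HTTP request.
    $totalSize += $page['size'];
    $totalNumResources += 1;

    // Every embedded image/CSS/JS file is one more request.
    foreach ($page['resources'] as $resourceSize) {
        $totalSize += $resourceSize;
        $totalNumResources += 1;
    }

    // No increment here: the recursive call counts the iframe document.
    foreach ($page['iframes'] as $iframeUrl) {
        list($totalSize, $totalNumResources) =
            count_page($site, $iframeUrl, $totalSize, $totalNumResources);
    }
    return array($totalSize, $totalNumResources);
}

list($size, $requests) = count_page($site, 'index.html', 0, 0);
echo "$size bytes in $requests requests\n"; // 2000 bytes in 6 requests
```

The iframe page contributes 500 bytes and 3 requests, and those are added on top of the 1500 bytes and 3 requests already counted for index.html because the totals travel through the recursive call.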
The final answer to Advanced PHP Interview Question Part 2
Now, here is what the complete PHP code looks like which includes the recursive call to the start function,
and is our final answer to part 2 of the PHP interview questions:
include('simple_html_dom.php');
$URL = $argv[1];
$totalSize = 0;
$totalNumResources = 0;
list($totalSize, $totalNumResources) =
start($URL, $totalSize, $totalNumResources);
echo "Final total download size: $totalSize Bytes\n";
echo "Final total HTTP requests: $totalNumResources\n";
function start($URL, $totalSize, $totalNumResources){
if (!check_if_html($URL))
{
//a single resource is one HTTP request: add it to the running totals
$totalSize += get_remote_file_size($URL);
$totalNumResources += 1;
return array($totalSize, $totalNumResources);
}
/* at this point we know we are dealing with an HTML document,
which also counts as a resource */
$html = file_get_html($URL);
$totalNumResources += 1;
// find all images!!
foreach($html->find('img') as $element){
$size = get_remote_file_size($element->src);
$totalSize = $totalSize + $size;
$totalNumResources += 1;
//echo "Here is the total size: $totalSize.\n";
// echo "total resources: $totalNumResources .\n";
}
// Find all css
foreach($html->find('link') as $element)
{
if (strpos($element->href,'.css') !== false) {
$size = get_remote_file_size($element->href);
//echo "SIZE: $size.\n";
$totalSize = $totalSize + $size;
$totalNumResources += 1;
//echo "total resources: $totalNumResources .\n";
//echo "Here is the total size: $totalSize.\n";
//echo "$element->href.\n";
}
//only output the ones with 'css' inside...
}
foreach($html->find('script') as $element)
{
if (strpos($element->src,'.js') !== false) {
$size = get_remote_file_size($element->src);
$totalSize = $totalSize + $size;
$totalNumResources += 1;
// echo "Here is the total size: $totalSize.\n";
//echo "total resources: $totalNumResources .\n";
//echo "$element->src.\n";
}
}
foreach($html->find('iframe') as $element)
{
//echo "IFRAME" . "$element->src.\n";
/* Don't count the iframe as a request here: the
recursive call counts it as an HTML document. */
list($totalSize, $totalNumResources) =
start($element->src, $totalSize, $totalNumResources);
}
return array($totalSize, $totalNumResources) ;
} //CLOSING BRACE FOR THE FUNCTION "START"...
function get_remote_file_size($url) {
    $headers = get_headers($url, 1);
    if (isset($headers['Content-Length']))
        return $headers['Content-Length'];
    //this one checks for lower case "L" in Content-length (different from above)
    if (isset($headers['Content-length']))
        return $headers['Content-length'];
    // no Content-Length header: download the file and measure it
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
    ));
    curl_exec($c);
    $size = curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    // close the handle before returning
    curl_close($c);
    return $size;
}
function check_if_html($url){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    // only request the headers, not the body
    curl_setopt($ch, CURLOPT_NOBODY, TRUE);
    curl_exec($ch);
    // NULL when the server sent no Content-Type header, so cast to a string
    $contentType = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    // this is HTML if the Content-Type contains "text/html"
    return strpos($contentType, 'text/html') !== false;
}
Advanced PHP Practice Interview Questions And Answers Part 3
This is the last portion of our PHP practice interview questions and answers. Here is part 3 of the PHP
practice interview question:
Given the previous two portions of this question, can you name some of the drawbacks or
disadvantages of the solution you provided?
And, here are parts 1 and 2 of the original question for your convenience:
Write a PHP script to report the total download size of any URL. You may not use any 3rd-party
code that performs the entire task described below.
No HTML interface is necessary for this exercise; you can write this as a command-line script that
accepts the URL as an argument.
For a single-file resource such as an image or SWF, the script would simply report on the total
size of the document.
For a complex resource such as an HTML document, the script would need to parse it to find
references to embedded, included resources: javascript files, CSS files, iframes, etc.
The goal of this exercise is to output the following information for a given URL:
- total number of HTTP requests
- total download size for all requests
The code should also be able to handle the URL in the src attribute of an iframe.
There are a lot of potential problems with the solution that we gave for this interview question. And, some of
those problems are unavoidable given that this is an interview question, and you don’t have a few weeks to
give a perfect answer. Try to see if you can think of any potential problems on your own.
No Javascript was executed to find additional resources
One disadvantage is the fact that our PHP code does not try to execute any Javascript in order to find
additional resources. What exactly does that mean? Well, many websites have some Javascript code that,
when executed, will request and display specific resources on their page – like GIFs, JPGs, SWFs, or whatever
else the Javascript wants to display (it’s really up to whoever wrote the Javascript code).
One specific example of Javascript requesting and displaying resources are websites (like this one) which
use Google Adsense to put Google’s ads on their website. In order to do this, a Javascript script is provided
by Google. And, some Javascript variables are also passed to the script to tell the script what size the ad
should be.
That Javascript is then executed by the browser and the correct ad is generated – it could be a SWF file
(which is a Flash format), a jpg, a gif, or whatever file type Google determines is appropriate to display at
the moment for whoever is viewing the webpage.
But, we can’t possibly be expected to run this Javascript in order to see what kind of resource is generated –
doing this could be tricky. And, there’s no guarantee that trying to execute this Javascript would even be
successful. This means that this will be one less resource that will be counted – and even more if there are
multiple ads on a page (like there are on this page). So, that’s one drawback of the PHP code we came up
with to answer this interview question. And, this means that fewer HTTP requests are counted than the true
number of HTTP requests on pages that use Javascript to request additional resources.
Checking for duplicate resources
One thing we admittedly do not do in our implementation is check for duplicate resources – like
images/CSS/Javascript files being referenced more than once in the HTML. Even though the files are
referenced multiple times in the HTML file, they still result in only one HTTP request because browsers are
smart enough to only make the request once. But, we are counting each one of those as a separate request
– this should be fairly simple to fix, but it is an issue with our code which could result in double-counting of
HTTP requests.
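The fix can be as small as collecting the resource URLs first and removing repeats before fetching their sizes. A minimal sketch, assuming the URLs have already been collected from the img/link/script tags and normalized to one form (so that "logo.png" and an absolute URL for the same file compare equal); the helper name unique_resources is hypothetical.

```php
<?php
// Hypothetical helper: keep each resource URL once, the way a browser
// makes a single request for a file referenced several times.
function unique_resources(array $urls): array
{
    // array_unique() keeps the first occurrence of each value;
    // array_values() renumbers the keys 0, 1, 2, ...
    return array_values(array_unique($urls));
}

$found = ['logo.png', 'style.css', 'logo.png', 'app.js', 'style.css'];
$unique = unique_resources($found);
echo count($unique) . " requests instead of " . count($found) . "\n"; // 3 requests instead of 5
```

$totalNumResources would then be increased by count($unique), and get_remote_file_size called once per entry.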
Checking for resources inside CSS documents
Another thing that we did not do is check for resources named inside external CSS documents – like
“background-image”. That means our implementation will not count those resources, which would lead to a
total number of HTTP requests that is lower than the actual number.
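Counting those resources would mean downloading each stylesheet and looking for url(...) references inside it. The sketch below does that with a regular expression, which is a simplification: it does not skip CSS comments, does not catch @import rules written without url(), and returns URLs as written (relative ones resolve against the stylesheet's own URL, not the page's). The function name find_css_urls is hypothetical.

```php
<?php
// Hypothetical helper: find url(...) references, such as those used by
// background-image, in the text of a stylesheet.
function find_css_urls(string $css): array
{
    // Matches url(x), url('x') and url("x"); group 2 captures x.
    preg_match_all('/url\(\s*([\'"]?)([^\'")]+)\1\s*\)/i', $css, $matches);
    $urls = array_map('trim', $matches[2]);
    // data: URIs are embedded in the CSS and need no extra request.
    $urls = array_filter($urls, fn(string $url): bool => stripos($url, 'data:') !== 0);
    return array_values($urls);
}
```

Each URL found this way would add one HTTP request and its size to the totals, in the same way as the images found in the HTML.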
Checking for browser specific code
Another thing we did not do in our implementation is check for browser specific code – as in the HTML tags
that look like this: “<!--[if IE]>”. This could also potentially affect the HTTP request count and total
download size that our implementation reports, because certain files could conditionally be included
depending on what browser the user is using. A common usage of this is to use a different stylesheet for
older Internet Explorer browsers. This would mean that our implementation would double count – both the
stylesheet for Internet Explorer and the stylesheet for non-Internet Explorer browsers.
Browser cache
One thing that may be challenging is accounting for the resources which are already cached by the browser.
If a resource is cached, then it means that the browser will not generate a new HTTP request for that
resource because the browser will just use the version of the resource that it has saved in its cache. This
means that since we are not even taking browser caching into account in our implementation (something
that would probably be very difficult), in situations where cached resources are used we would definitely be
over-estimating the number of HTTP requests because we just count every single resource in the HTML as
an HTTP request.
Final thoughts on Advanced PHP interview question and answer part 3
Now, those are just some of the potential drawbacks of our answer to the PHP interview question.
Remember, you can see the final solution we came up with right here: PHP Interview Question Part 2.
Writing something that is very accurate given all the complexities would be very challenging and time
consuming – and would certainly not be expected in a PHP interview question like this. Most likely, what
interviewers are looking for with a question like this is that you have some essential PHP skills, and a good
foundation of knowledge for how the web works.
How would you return an array from a function in PHP?
If you have a function and you want to return multiple values from that function then you can easily return
an array from the function. This is what it would look like:
function someFunc( ) {
$aVariable = 10;
$aVariable2 = 20;
return array($aVariable, $aVariable2);
}
How to retrieve the values returned from a function in PHP
You will probably want to retrieve the values returned from the function after you call it. What is the best
way to do this? Well, there is actually a nice way to do this using the list function in PHP. Here is an example
of how to retrieve the values from the array returned by a function – assuming that we are calling the same
someFunc function that we showed above:
list($var1, $var2) = someFunc( );
//will print out values from someFunc
echo "$var1 $var2";
Now, $var1 and $var2 will hold the same values as $aVariable and $aVariable2 from the function someFunc.
Another option for retrieving the values returned from a function in PHP
Another possibility is to just store the return values in an array as well – here is an example:
$results = someFunc();
echo $results[0];
echo $results[1];
Note that in the example above everything returned from the call to someFunc is stored in the $results array
– and the echo statements will output the values returned from someFunc.
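Since PHP 7.1, the short [] syntax can be used in place of list(), and keys can be named when the function returns an associative array. The example below repeats someFunc from above; someNamedFunc is a hypothetical function added for illustration.

```php
<?php
function someFunc() {
    $aVariable = 10;
    $aVariable2 = 20;
    return array($aVariable, $aVariable2);
}

// Short destructuring syntax, equivalent to list($var1, $var2) = someFunc();
[$var1, $var2] = someFunc();
echo "$var1 $var2\n"; // 10 20

// Hypothetical function returning an associative array:
// the caller picks values by key instead of by position.
function someNamedFunc() {
    return ['width' => 10, 'height' => 20];
}
['width' => $w, 'height' => $h] = someNamedFunc();
echo "$w x $h\n"; // 10 x 20
```

Returning named keys keeps the call site readable when a function returns more than two or three values, since the caller no longer depends on their order.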
How do you delete cookies in PHP? Also, provide an example showing
how it’s done.
The interesting thing about deleting a cookie in PHP is the fact that you must use the same PHP function
that you would use to create the cookie – and that is the setcookie function.
Deleting cookies using the setcookie function in PHP
The setcookie() function can actually accept up to seven arguments (name, value, expires, path, domain, secure
and httponly), but only one argument is actually required: the cookie name. If you use the setcookie function,
and just pass a cookie name without a
value, then it will have the same effect as deleting the existing cookie with the same exact name. For
example, to create a cookie called first_name, you use this line:
setcookie('first_name', 'Robert');
And to delete the first_name cookie, you would do this:
Example of deleting a cookie in PHP:
setcookie('first_name');
But, as an extra safety measure, you should also set the expiration time to a time in the past – as you can
see below where we pass in “time() - 300” for the expiration date. This is the way we recommend that you
delete the cookie in PHP:
Recommended way to delete a cookie in PHP:
setcookie('first_name', '', time()-300);
Parameters that must be set when deleting a cookie
When you delete a cookie, you should always use the same exact parameters that were used to create the
cookie in the first place. For example, if you set the domain and path when you created the cookie, then you
should use those parameters again when deleting the cookie.
Other interesting facts about deleting cookies in PHP
When deleting a cookie, that deletion does not actually take effect until the page has been reloaded or
another page has been accessed. This means that a cookie will still be available to a given page even after
that page has deleted that cookie – but once the page is reloaded or another page is accessed in that
browser window the cookie will be deleted.
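The points above can be combined in a small helper. This is a minimal sketch; delete_cookie is a hypothetical name, and it assumes it is called before any output is sent, because cookies travel in the HTTP response headers. Besides telling the browser to drop the cookie, it removes the value from $_COOKIE so the rest of the current script no longer sees it.

```php
<?php
// Hypothetical helper: delete a cookie using the same path and domain
// it was created with, and an expiration time in the past.
function delete_cookie(string $name, string $path = '/', string $domain = ''): void
{
    setcookie($name, '', time() - 3600, $path, $domain);
    // The browser drops the cookie on its next request; unsetting it
    // here hides the old value for the remainder of this request too.
    unset($_COOKIE[$name]);
}

// Created earlier with, for example:
//   setcookie('first_name', 'Robert', time() + 86400, '/');
// and deleted later with the same path:
//   delete_cookie('first_name', '/');
```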
What’s the difference between a cookie and a session in PHP?
PHP sessions improve upon cookies because they allow web applications to store and retrieve more
information than cookies. PHP sessions actually use cookies, but they add more functionality and security.
Sessions store data on the server, not on the browser like cookies
The main difference between a session and a cookie is that session data is stored on the server, whereas
cookies store data in the visitor’s browser. Sessions use a session identifier to locate a particular user’s
session data. This session identifier is normally stored in the user’s web browser in a cookie, but the
sensitive data that needs to be more secure — like the user’s ID, name, etc. — will always stay on the
server.
Sessions are more secure than cookies
So, why exactly should we use sessions when cookies work just fine? Well, as we already mentioned,
sessions are more secure because the relevant information is stored on the server and not sent back and
forth between the client and server. The second reason is that some users either turn off cookies or reject
them. In that scenario, sessions, while designed to work with a cookie, can actually work without cookies as
a workaround, as you can read about here: Can PHP sessions work without cookies?.
Sessions need extra space, unlike cookies
PHP sessions, unlike cookies which are just stored on the user’s browser, need a temporary directory on
the server where PHP can store the session data. For servers running Unix this isn’t a problem at all,
because the /tmp directory is meant to be used for things like this. But, if your server is running Windows
and a version of PHP earlier than 4.3.6, then the server will need to be configured – here is what to do:
Create a new folder on your Windows server – you can call it something like C:\temp. You want to be sure
that every user can read and write to this folder. Then, you will need to edit your php.ini file, and set the
value of session.save_path to point to the folder which you created on the Windows server (in this case, that
folder is C:\temp). And finally, you will need to restart your web server so that the changes in the
php.ini file take effect.
Sessions must use the session_start function
A very important thing to remember when using sessions is that each page that will use a session must
begin by calling the session_start() function. The session_start() function tells PHP to either start a brand
new session or access an existing one.
How session_start in PHP uses cookies
The first time the session_start() function is used, it will try to send a cookie with a name of PHPSESSID and
a value of something that looks like a30f8670baa8e10a44c878df89a2044b – which is the session identifier
that contains 32 hexadecimal characters. Because cookies must be sent before any data is sent to the browser,
this also means that session_start must be called before any data is sent to the Web browser.
Registering values to the session
After the session_start function is called, values can be registered to the session using the $_SESSION
associative array. This is what it would look like:
$_SESSION['name'] = 'Jack';
$_SESSION['last_name'] = 'Lopez';
Can sessions work without cookies? If so, how does a session work without
cookies enabled in PHP?
This is a great interview question because even if you do not know the answer, you could come up with a
fairly accurate answer on your own with some basic knowledge of PHP sessions and some analytical
thinking. See if you can possibly think of how PHP sessions would work without cookies enabled in the
browser.
The answer to how PHP sessions can work without cookies
Sessions in PHP normally do use cookies to function. But, PHP sessions can also work without cookies in
case cookies are disabled or rejected by the browser that the PHP server is trying to communicate with.
How PHP sessions work without cookies
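When transparent session ID support is enabled in php.ini (session.use_trans_sid = 1, with
session.use_only_cookies = 0), PHP does two things in order to work without cookies: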
1. For every HTML form that PHP finds in your HTML code (which of course can be part of a PHP file), PHP
will automatically add a hidden input tag with the name PHPSESSID right after the <form> tag. The value of
that hidden input tag would be whatever value PHP assigns your session ID. So, for example, the hidden
input could look something like this:
<form>
<input type="hidden" name="PHPSESSID" value="12345678" >
</form>
This way, when the form is submitted to the server, PHP will be able to retrieve the session identifier from
the form and will know who it is communicating with on the other end, and will also know which session
to associate the form parameters with if it is adding the form parameters to the PHP session.
2. PHP will find all the links in your HTML code, and will modify those links so that they have a GET
parameter appended to the link itself. That GET parameter will also have the name of PHPSESSID, and the
value will of course be the unique session identifier – so the PHP session ID will basically be a part of the
URL query string.
So, for example, if your code has a link that originally looks like this:
<a href="http://www.example.com">Go to this link</a>
When modified by PHP to include the session ID, it could look something like this:
<a href="http://www.example.com?PHPSESSID=72aa95axyz6cd67d82ba0f809277326dd">Go to this link</a>
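The link rewrite in step 2 amounts to appending one more query-string parameter. PHP performs this itself when transparent session IDs are enabled; the hypothetical helper below only illustrates what the rewritten URL looks like, and it assumes the URL has no "#fragment" part.

```php
<?php
// Hypothetical helper: append the session name and ID to a link,
// using "?" for the first query parameter and "&" otherwise.
function add_session_id(string $url, string $sessionName, string $sessionId): string
{
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . urlencode($sessionName) . '=' . urlencode($sessionId);
}

echo add_session_id('http://www.example.com', 'PHPSESSID', 'abc123'), "\n";
// http://www.example.com?PHPSESSID=abc123
```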
PHPSESSID can have its name changed in the php.ini file
Note that we said PHPSESSID is the name that will be used to hold the PHP session value. The name
PHPSESSID can actually be changed to whatever you want if you modify the session.name value in the
php.ini file.
What is a disadvantage of using PHP sessions without cookies enabled?
A disadvantage of using PHP sessions without cookies is that if you share a URL that has the
PHP session ID appended to it with someone else, then they could potentially use the same exact session
that you were using. It also opens you up to session hijacking – where a user’s session is deliberately stolen
so that a hacker can impersonate you and do some damage.