PHP Type Casting Tutorial
September 3, 2014 by Ankur Kumar Singh

Type casting means using the value of a variable as a different data type. In other words, typecasting is a way to use a variable of one data type as another data type. Typecasting is an explicit conversion, because the programmer explicitly states the data type to cast to. In this tutorial we will explore various aspects of PHP type casting.

PHP Type casting

PHP does not require (and, for ordinary variables, does not support) a type declaration when a variable is declared. Instead, a variable takes its type from the value assigned to it or from the context in which it is used. For example:

<?php
$i = 1;
var_dump($i); // $i is an integer
$i = 2.3;
var_dump($i); // $i is a float
$i = "php type casting";
var_dump($i); // $i is a string
?>

In the example above, the type of $i changes each time a value of a different type is assigned. Because of this flexibility we do not always need to cast variables. Sometimes, however, we cast a value for extra safety. For example, when we take integer input from a user, we should cast it to an integer.

Type casting in PHP works the same way as in C: write the desired type name in parentheses before the variable you want to cast. For example, to cast a string to an integer:

<?php
$string_var = "string value for php type";
$int_var = (int) $string_var;
var_dump($int_var); // int(0), because the string does not start with a number
?>

We can cast to the following data types in PHP:
1. Integer using (int) or (integer)
2. Boolean using (bool) or (boolean)
3. Float using (float) or (double); (real) also worked in older versions but was removed in PHP 8.0
4. String using (string)
5. Array using (array)
6. Object using (object)
7. Null using (unset); this cast was deprecated in PHP 7.2 and removed in PHP 8.0

PHP type Casting to Integer

Using the (int) or (integer) keyword we can convert a value of any data type to an integer. The intval() function performs the same conversion.
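To compare the two, here is a minimal sketch that runs both conversions over a few sample values. The helper name toIntBothWays is hypothetical and exists only for this illustration. For base 10 the (int) cast and intval() agree; intval() additionally accepts a base as its second argument.

```php
<?php
// Hypothetical helper for this sketch: returns [(int) cast, intval()] for one value.
function toIntBothWays($value): array
{
    return [(int) $value, intval($value)];
}

$samples = ["42", "42abc", "abc", 3.99, -3.99, true, null];

foreach ($samples as $value) {
    [$cast, $viaIntval] = toIntBothWays($value);
    // var_export() shows each original value unambiguously
    echo var_export($value, true), " => (int): ", $cast, ", intval(): ", $viaIntval, "\n";
}

// intval() can also parse other bases through its second argument
echo intval("ff", 16), "\n"; // 255
echo intval("101", 2), "\n"; // 5
```

A string with leading digits such as "42abc" is converted from those digits, a non-numeric string such as "abc" becomes 0, and floats are truncated toward zero in both cases.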
If we convert a boolean to an integer, false becomes 0 and true becomes 1. For example:

<?php
$bool_false = false;
$int_val = (int) $bool_false;
var_dump($int_val); // int(0)
$bool_true = true;
$int_val = (int) $bool_true;
var_dump($int_val); // int(1)
?>

If we convert a resource to an integer, we get the unique ID of that resource. For example:

<?php
$fp = fopen("filename.txt", "w");
$int_cast = (int) $fp;
var_dump($int_cast); // the numeric ID of this resource
?>

If we cast a float to an integer, the fractional part is discarded (the number is truncated toward zero). So casting 10.9 to an integer gives 10, and casting -10.9 gives -10.

<?php
$float_num = 10.9;
echo (int) $float_num; // 10
?>

Conversion from a string to a number follows a fairly complicated set of rules. For the complete list you can refer to http://php.net/manual/en/language.types.string.php#language.types.string.conversion.

PHP type casting to Boolean

We can cast any variable to a boolean using the (bool) or (boolean) keyword. A value converts to false if it is false itself, the integer 0, the float 0.0, the empty string "", the string "0", an empty array, or null. Almost every other value converts to true.

<?php
var_dump((bool) 1);     // bool(true)
var_dump((bool) 0);     // bool(false)
var_dump((bool) "");    // bool(false)
var_dump((bool) "ank"); // bool(true)
?>

PHP type casting to Float

Except for strings, a value is cast to float by first converting it to an integer and then converting that integer to a float. Converting an object to float raises a notice in PHP 5.

PHP type casting to string

We can convert any data type to a string using (string). String conversion also happens automatically wherever a string is needed, for example with echo or string concatenation. In most cases the value itself does not change, but boolean false is converted to "" and true to "1".
Below is an example:

<?php
$boo_true = true;
var_dump((string) $boo_true); // string(1) "1"
var_dump((string) false);     // string(0) ""
var_dump((string) 1);         // string(1) "1"
?>

PHP type casting to array

We can convert a variable of any data type to an array using the (array) keyword. Converting a scalar value creates an array that holds the value at index 0, and converting null gives an empty array. For example:

<?php
var_dump((array) 5);    // array with the value 5 at index 0
var_dump((array) NULL); // empty array
?>

Array conversion is commonly used with objects. Each property becomes a key/value pair: public properties keep their plain names as keys, while protected and private properties are also included, but under specially prefixed (mangled) keys.

Abstract Classes and Interface in PHP
March 24, 2013 by Ankur Kumar Singh

Abstract classes and interfaces play a very important role in object-oriented programming (OOP) in PHP. In this section we will discuss the following points:
1. What abstract classes are
2. What interfaces are
3. How to implement abstract classes in PHP
4. How to implement interfaces in PHP
5. Differences between abstract classes and interfaces

What is an abstract class

As the name suggests, an abstract class is not concrete: it cannot be used on its own. Abstract classes are classes that cannot be instantiated directly; in other words, you cannot create an object of an abstract class. Abstract classes are created for inheritance, so they are meant to be extended by child classes.

Many people say that an abstract class must have at least one abstract method. An abstract method is a method that is only declared, without a body. That rule is not accurate: an abstract class does not need to contain any abstract method. The reverse is true, though: if a class has at least one abstract method, the class itself must be declared abstract.

Abstract classes are often used as base classes, because they are not available for creating objects directly and can only act as the parent of other classes. You can use abstract classes within a class hierarchy.
That is, one abstract class can also extend another abstract class.

Abstract classes in PHP

Abstract classes in PHP are similar to those in other OOP languages. You create an abstract class with the abstract keyword, and once a class is abstract you cannot create an object of it:

abstract class abc
{
    public function xyz()
    {
        return 1;
    }
}

$a = new abc(); // Error: cannot instantiate abstract class abc

The code above causes an error in PHP. Abstract classes in PHP exist only to be extended by other classes:

abstract class testParent
{
    public function abc()
    {
        // body of your function
    }
}

class testChild extends testParent
{
    public function xyz()
    {
        // body of your function
    }
}

$a = new testChild();

In the example above we create an object of the testChild class, which extends the abstract class testParent. The abstract class itself is only available for inheritance. The main motive for making a class abstract in PHP is to prevent direct instantiation (object creation).

Implementation of abstract method

Abstract methods are methods of an abstract class that are only declared (they have no body); they are defined (implemented) in the child class. You make a method abstract with the abstract keyword. You declare abstract methods in an abstract class, and the methods of an interface behave like abstract methods as well. Following is an example of implementing an abstract method:

abstract class abc
{
    abstract protected function f1($a, $b);
}

class xyz extends abc
{
    protected function f1($name, $address)
    {
        echo "$name , $address";
    }
}

$a = new xyz();

In class abc we declared the abstract method f1, and class xyz, which extends abc, implements f1. If your abstract class has an abstract method, every concrete class that extends it must implement that method; otherwise PHP throws a fatal error (unless the child class is itself declared abstract). When implementing an abstract method, the child class must use the same visibility or a less restricted one.
abstract class parentTest
{
    abstract protected function f1();
    abstract public function f2();
    // abstract private function f3(); // this would cause an error
}

class childTest extends parentTest
{
    public function f1()
    {
        // body of your function
    }

    public function f2()
    {
        // body of your function
    }

    protected function f3()
    {
        // body of your function
    }
}

$a = new childTest();

In the code above, the abstract class lists three methods, but a private abstract method always causes an error: a private method is available only within the class that declares it, so no child class could ever implement it. f1 is protected in the parent, and in the child we define it as public, because public is less restricted than protected. f2 is already public, so it stays public in the child class, since no visibility is less restricted than public.

What is an Interface?

An interface in OOP enforces the definition of a set of methods in a class. By implementing an interface, a class is forced to define a specific set of methods. For example, if you are creating classes to render HTML elements, each of them needs to set the id and name of its HTML tag. In this case you can create an interface with methods like setId and setName. Then anyone who creates a class to render an HTML tag and implements your interface must define the setId and setName methods in that class. In other words, an interface lets you define part of the contract of your objects. Interfaces are very useful when you are designing the architecture of an OOP-based application.

Interface in PHP

Interfaces in PHP work much like interfaces in other OOP languages. You create an interface with the interface keyword, and by implementing an interface in a class you specify the set of methods that the class must implement.
Apart from that, interfaces are declared much like classes. Following is a very small example of an interface in PHP:

interface abc
{
    public function xyz($b);
}

In the code above we create an interface named abc with a method xyz. Whenever you implement the abc interface in a class, you have to create a method named xyz; if you do not, PHP throws an error. You implement an interface in a class with the implements keyword. Let us implement the interface abc in a class:

class test implements abc
{
    public function xyz($b)
    {
        // your function body
    }
}

Methods in an interface can only be declared public; any other visibility causes an error. Also, do not use the abstract keyword on the methods of an interface.

You can also extend an interface like a class, using the extends keyword:

interface template1
{
    public function f1();
}

interface template2 extends template1
{
    public function f2();
}

class abc implements template2
{
    public function f1()
    {
        // your function body
    }

    public function f2()
    {
        // your function body
    }
}

Here template2 inherits all methods of template1, so whenever you implement template2 in a class you have to create the methods of both interfaces. One interface can also extend multiple interfaces in PHP:

interface template1
{
    public function f1();
}

interface template2
{
    public function f2();
}

interface template3 extends template1, template2
{
    public function f3();
}

class test implements template3
{
    public function f1()
    {
        // your function body
    }

    public function f2()
    {
        // your function body
    }

    public function f3()
    {
        // your function body
    }
}

You can also implement more than one interface in a PHP class.
interface template1
{
    public function f1();
}

interface template2
{
    public function f2();
}

class test implements template1, template2
{
    public function f1()
    {
        // your function body
    }

    public function f2()
    {
        // your function body
    }
}

Before PHP 5.3.9, a class could not implement two interfaces that declare a method with the same name. Newer versions allow it as long as the duplicate methods have the same signature.

The parameters of the method in the class must be compatible with the parameters in the interface signature. Following are some examples:

interface template1
{
    public function f1($a);
}

class test implements template1
{
    public function f1($a)
    {
        echo $a;
    }
}

The above works. But the following does not:

interface template1
{
    public function f1($a);
}

class test implements template1
{
    public function f1() // incompatible: the $a parameter is missing
    {
        echo $a;
    }
}

However, the parameter does not need the same name as in the interface ($a); you can use any name. For example:

interface template1
{
    public function f1($a);
}

class test implements template1
{
    public function f1($name)
    {
        echo $name;
    }
}

If the interface gives a parameter a default value, the implementing class can use a different default value. For example:

interface template1
{
    public function f1($a = 20);
}

class test implements template1
{
    public function f1($name = "ankur")
    {
        echo $name;
    }
}

In the sections above we have discussed interfaces and abstract classes in PHP. Both serve similar purposes, but they have some differences.

Differences between abstract class and interface in PHP

Following are the main differences between abstract classes and interfaces in PHP:
1. In an abstract class, not every method has to be abstract. In an interface, every method is abstract.
2. An interface can extend several interfaces (multiple and multilevel inheritance), and a class can implement several interfaces. A class can extend only one abstract class (single and multilevel inheritance only).
3. Methods of a PHP interface must be public. Abstract methods in an abstract class can be public or protected.
4. In an abstract class you can both declare methods and define them (give them a body). In an interface you can only declare methods.
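To see how the two fit together, here is a minimal sketch built on the HTML example from the interface section. The names HtmlElement, BaseElement and Div are hypothetical: the interface declares the public contract, the abstract class implements the shared setId() and setName() logic once and leaves tagName() abstract, and the concrete Div class only fills in that gap. It assumes PHP 7.4 or later (typed properties).

```php
<?php
// Hypothetical names throughout: HtmlElement, BaseElement and Div are illustrative only.

// The interface only declares what every element must offer (public methods, no bodies).
interface HtmlElement
{
    public function setId(string $id): void;
    public function setName(string $name): void;
    public function render(string $content): string;
}

// The abstract class implements the shared parts once and leaves the tag name abstract.
abstract class BaseElement implements HtmlElement
{
    protected string $id = '';
    protected string $name = '';

    public function setId(string $id): void
    {
        $this->id = $id;
    }

    public function setName(string $name): void
    {
        $this->name = $name;
    }

    // Every concrete element must supply its own tag name.
    abstract protected function tagName(): string;

    public function render(string $content): string
    {
        $tag = $this->tagName();
        // Escape attribute values and content before building the markup
        return sprintf('<%s id="%s" name="%s">%s</%s>',
            $tag,
            htmlspecialchars($this->id),
            htmlspecialchars($this->name),
            htmlspecialchars($content),
            $tag);
    }
}

// A concrete class: it extends one abstract class and, through it, satisfies the interface.
class Div extends BaseElement
{
    protected function tagName(): string
    {
        return 'div';
    }
}

$div = new Div();
$div->setId('main');
$div->setName('content');
echo $div->render('Hello'), "\n"; // <div id="main" name="content">Hello</div>

try {
    new BaseElement(); // not allowed: BaseElement is abstract
} catch (Error $e) {
    echo $e->getMessage(), "\n";
}
```

The split follows the differences listed above: the interface can only declare methods, while the abstract class can also hold state and method bodies that every child reuses.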
Overloading and Overriding in PHP
March 24, 2013 by Ankur Kumar Singh

Method overloading and overriding are basic and useful features of any OOP language. In this tutorial we will discuss how to implement method overloading and overriding in PHP. First we will explore the basics of overloading and overriding, and then we will implement them in PHP. Before going further, I assume that you have basic knowledge of classes and inheritance in PHP, and that you understand magic methods in PHP.
You need magic methods because overloading in PHP is implemented with them.

What is Method Overriding in OOP?

The basic meaning of overriding in OOP is the same as its real-world meaning. In the real world, overriding means replacing a behavior inherited from a parent, and the same is true in OOP: method overriding means replacing a parent class method in a child class, or, in simple technical terms, changing the behavior of the method. Overriding is the process of re-declaring a parent class method in a child class. It is normally needed when your parent class has a method, but in your child class you want the same method with different behavior. By overriding a method you can completely change the behavior it had in the parent class. To implement method overriding, we create a method with the same name in the child class.

What is Method Overloading in OOP?

Overloading in OOP is similar to overloading in the real world, where it means assigning extra work to the same machine or person. In OOP, method overloading means asking one method to do some extra work, or in some cases different work. Method overloading is normally based on the arguments passed to the function: we achieve it by letting the same function accept different arguments.

Overloading and Overriding in PHP

I hope the basic concepts of overloading and overriding are clear now. Let us explore how to implement them in PHP. Implementing overriding in PHP is very easy: if your parent class has a function, you create a function with the same name in your child class to override it. Overloading, however, cannot be implemented in PHP by creating two functions with the same name and different arguments.
This is because PHP does not allow two methods with the same name in one class. To implement overloading we need the help of a magic method. In the sections below we will explore overloading and overriding one by one.

Overloading in PHP

Since we cannot implement overloading by creating two functions with the same name in a class, we will use the magic method __call. PHP invokes __call when code calls a method on an object that is not available in the class. So we will not create the method itself and will rely on __call instead. __call receives two arguments: the name of the called method and an array of the arguments passed to it. With either switch/case or if/else we can then implement overloading. Following is a very simple example of overloading in PHP:

class test
{
    public function __construct()
    {
        // your logic for the constructor
    }

    public function __call($method_name, $parameter)
    {
        // overloading logic for the method name overlodedFunction
        if ($method_name == "overlodedFunction") {
            $count = count($parameter);
            switch ($count) {
                case 1:
                    // business logic when overlodedFunction gets 1 argument
                    echo "You are passing 1 argument";
                    break;
                case 2:
                    // business logic when overlodedFunction gets 2 arguments
                    echo "You are passing 2 arguments";
                    break;
                default:
                    throw new Exception("Bad argument");
            }
        } else {
            throw new Exception("Function $method_name does not exist");
        }
    }
}

$a = new test();
$a->overlodedFunction("ankur");
$a->overlodedFunction("techflirt", "ankur");

In the class test above, the magic method __call manages the overloading. We never created a method named overlodedFunction, so whenever overlodedFunction is called, __call is invoked with two values: the name of the called method and the array of arguments passed to it. Inside __call, the if condition ensures that our overloading logic applies only to overlodedFunction. In the if block we count the arguments and apply the matching business logic.

Overriding in PHP

Overriding in PHP is very easy. Since overriding is the process of modifying an inherited method, you only need to create a method with the same name in the child class. Following is an example of overriding a method in PHP:

class testParent
{
    public function f1()
    {
        echo 1;
    }

    public function f2()
    {
        echo 2;
    }
}

class testChild extends testParent
{
    // overriding function f2; the new parameter must be optional
    // to stay compatible with testParent::f2()
    public function f2($a = "")
    {
        echo "$a";
    }
}

$a = new testChild();
$a->f2("ankur"); // prints ankur

In the example above we override function f2. When overriding, you are free to change the business logic. You can also make the method more visible (but not less visible) and add extra parameters, provided they are optional; otherwise PHP reports that the declaration is not compatible with the parent method.

Inheritance in PHP
March 24, 2013 by Ankur Kumar Singh

Inheritance is a concept in object-oriented programming. With the help of inheritance, one class gets the properties and methods of another class, which takes code reusability to a higher level. PHP supported basic class inheritance well before PHP 5, and PHP 5 introduced a much more complete object model (visibility, abstract classes, interfaces). In this chapter we will explore the basic concepts of inheritance and then discuss how to implement inheritance in PHP. This tutorial is for beginners who want to learn the basic concepts of inheritance in PHP.
Before going further, I assume that you have some idea of OOP in PHP. Later in this chapter we will also cover some advanced aspects of inheritance.

What is inheritance?

Inheritance is a design principle in OOP. By implementing inheritance, one class can inherit (get) the properties and methods of another class. The class that inherits the features of another class is known as the child class, and the class being inherited from is known as the parent class. The concept is the same as inheritance in the real world: a child inherits characteristics from its parents. In OOP, one class inherits the characteristics of another class. Inheritance helps you increase the reusability of your code.

Let us take an example from everyday programming. Suppose you are going to create classes to render different HTML tags (div, span, form, table, etc.), so you create classes named html_div, html_span and html_form. You create a different class for each because each element is different in nature: a form has action and method attributes and contains input elements, while a table has tbody, tr, th and td. But think for a moment: some attributes, and the way they are rendered, are the same for every element. For example, all of the HTML elements mentioned above have name, id and class attributes, and those attributes are rendered the same way. So in this case you can create a parent class named HTML and have all of your classes (div, span, form) inherit from it. Following is the generic code structure of inheritance in OOP, using the HTML attributes example.
I am using PHP syntax for better understanding:

class HTML
{
    protected $name;
    protected $id;

    protected function basicAttribute()
    {
        return "name='$this->name' id='$this->id'";
    }
}

class HTML_div extends HTML
{
    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getDiv($content)
    {
        $basicAttribute = $this->basicAttribute();
        return "<div $basicAttribute >$content</div>";
    }
}

class HTML_span extends HTML
{
    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getSpan($content)
    {
        $basicAttribute = $this->basicAttribute();
        return "<span $basicAttribute >$content</span>";
    }
}

The code above is an example of basic inheritance in PHP. All protected and public methods of the HTML class are directly accessible in the HTML_div and HTML_span classes, so neither child class needs to repeat the logic for rendering id and name. This saves time and makes the code more modular.

I hope the basic idea of inheritance is clear. Now let us move on to implementing inheritance in PHP.

Inheritance in php

Inheritance in PHP is as simple as in other OOP languages; since PHP 5 the goal has been to provide solid OOP support. The code in the previous section is a typical example of inheritance in PHP. To implement inheritance you need at least two classes: a parent class and a child class. The child class can use all public and protected properties and methods of the parent class (private members are not accessible in the child). You implement inheritance in PHP with the extends keyword.
Let us take the HTML example above again, with some modifications:

class HTML
{
    protected $name;
    public $id;
    private $width; // private: not accessible from HTML_div

    protected function basicAttribute()
    {
        return "name='$this->name' id='$this->id'";
    }
}

class HTML_div extends HTML
{
    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getDiv($content)
    {
        $basicAttribute = $this->basicAttribute();
        return "<div $basicAttribute>$content</div>";
    }
}

$objDiv = new HTML_div('bloc_main', 'avc');
echo $objDiv->getDiv('this is an example of inheritance in php');

In the code above, class HTML_div inherits properties and methods from class HTML. It can use the public $id, the protected $name and the protected basicAttribute(), but not the private $width.

Multilevel and Multiple inheritance in PHP

PHP supports multilevel inheritance but not multiple inheritance. Put simply, a PHP child class cannot extend more than one parent class. In multilevel inheritance, a parent class inherits from a grandparent class and a child class inherits from that parent. The child therefore also gets the public and protected members of the grandparent class. Hierarchical inheritance, where several child classes extend the same parent (like HTML_div and HTML_span above), is also possible.

Example of multiple inheritance in PHP

class test1
{
    // Your class body
}

class test2
{
    // Your class body
}

class test3 extends test1, test2
{
    // Your class body
}

The code above will not work in PHP (it is a parse error), because PHP is a single inheritance language.

Example of multilevel inheritance in PHP

class GrandParentClass
{
    // Body of your class
}

class ParentClass extends GrandParentClass
{
    // Body of your class
}

class ChildClass extends ParentClass
{
    // Body of your class
}

The names above carry a Class suffix because parent is a reserved word in PHP, so you cannot name a class parent. This is a very basic example of multilevel inheritance, which PHP supports. Here ParentClass inherits the members of GrandParentClass, and ChildClass inherits the members of ParentClass. ChildClass therefore has members from both its parent and its grandparent.

Static Methods and Property in Inheritance in PHP

In our HTML_div example, we saw that we can use the $this-> syntax to access the properties and methods of the parent (HTML) class.
When a method in the parent or child class is static, you access static methods or properties with the self and parent keywords instead. You do not have to make a method static to use self or parent, though. This is very useful when the parent and child classes both have a property or method with the same name: self:: calls the current class's version and parent:: calls the parent class's version.

self and parent with static methods:

class ParentClass
{
    public static function abc()
    {
        // your function body
    }
}

class ChildClass extends ParentClass
{
    public static function xyz()
    {
        // your function body
    }

    public function callStatic()
    {
        self::xyz();   // method of ChildClass
        parent::abc(); // method of ParentClass
    }
}

self and parent without static:

class ParentClass
{
    protected function xyz()
    {
        // your function body
    }
}

class ChildClass extends ParentClass
{
    public function xyz()
    {
        // your function body
    }

    public function callBoth()
    {
        self::xyz();   // ChildClass::xyz() on the current object
        parent::xyz(); // ParentClass::xyz() on the current object
    }
}

Static Methods and Properties in PHP
March 24, 2013 by Ankur Kumar Singh

Static methods and properties are a very useful feature of PHP. You can access them directly, without creating an object of the class. PHP has no static class keyword. A class whose methods and properties are all static is sometimes informally called a static class. Like other class members, static methods and properties are treated as public if no visibility is declared.

Static Properties/Variables in PHP

A static property belongs to the class itself. You access it directly from the class with the :: (scope resolution operator). You declare a static property with the static keyword, and adding that keyword makes any property static. The following is a basic example of a static variable in a PHP class:

class test
{
    public static $a; // static variable
}

test::$a = 5;
echo test::$a;

You cannot access a regular property in the static way; doing so generates a fatal error. Within the class, you can access a static property with the self keyword.
To access a static property of the parent class, you use the parent keyword.

class testParent
{
    public static $var1;
}

class testChild extends testParent
{
    public static $var2;
    public $abc = 2;

    function testFunction()
    {
        self::$var2 = 3;   // static property of this class
        parent::$var1 = 5; // static property of the parent class
    }
}

echo testChild::$abc; // fatal error: $abc is not a static property

Static variables or properties are the best way to preserve a value across different instances of a class. Please go through the following example for a better explanation:

class test
{
    private static $no_of_call = 0;

    public function __construct()
    {
        self::$no_of_call = self::$no_of_call + 1;
        echo "No of time object of the class created is: " . self::$no_of_call;
    }
}

$objT = new test();  // prints: No of time object of the class created is: 1
$objT2 = new test(); // prints: No of time object of the class created is: 2

A static variable or property is therefore very useful when you want to share data between different objects of the same class. We will see better examples of static properties in the chapter PHP Design Patterns.

Static Methods or functions

Static methods work the same way as static properties. You make a function or method static with the static keyword. As with static variables, you access any visible static method with ::.

class test
{
    static function abc($param1, $param2)
    {
        echo "$param1 , $param2";
    }
}

test::abc("ankur", "techflirt");

If you call a regular (non-static) method statically, PHP 5 raises an E_STRICT warning. PHP 7 raises a deprecation notice instead, and PHP 8 throws an Error. Accessing a regular property statically, by contrast, is always a fatal error. Let us take the example above:

class test
{
    function abc($param1, $param2)
    {
        echo "$param1 , $param2";
    }
}

test::abc("ankur", "techflirt"); // PHP 5: works with an E_STRICT warning; PHP 8: Error

A static method is called directly on the class, so the $this variable is not available inside it.
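The two halves of this post can be combined. The following minimal sketch pairs a shared static property with a static accessor method. The class VisitCounter and its method createdCount() are hypothetical. The static method has no $this, so it reaches the shared counter through self::.

```php
<?php
// Hypothetical class: a static property shared by all instances plus a static accessor.
class VisitCounter
{
    private static $created = 0; // one copy shared by every VisitCounter object

    public function __construct()
    {
        self::$created++; // every new object bumps the same shared counter
    }

    // Static method: there is no $this here, so shared state is reached through self::
    public static function createdCount()
    {
        return self::$created;
    }
}

$first = new VisitCounter();
$second = new VisitCounter();
echo VisitCounter::createdCount(), "\n"; // 2, and no object is needed for the call
```

Making the counter private and exposing it only through a static method keeps outside code from resetting it by accident.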
Magic Methods in PHP
March 24, 2013 by Ankur Kumar Singh

Magic methods are special predefined methods that PHP calls automatically when certain events happen. Their names start with the prefix __, for example __call, __get and __set. I am including magic methods in my OOP tutorial because they are mostly used in PHP classes. If you have gone through my previous chapter, you have already seen the __construct function. __construct is a magic method that PHP calls automatically when an object of the class is created. PHP has various magic methods; here we will discuss the most common ones used in object oriented programming. First, let us review the magic methods with a short description of each.

List of Magic Methods in PHP

Magic method: Description
__construct: Called when someone creates an object of your class. It is used to create the constructor in PHP 5.
__destruct: Called when an object of your class is destroyed, for example when its last reference is removed with unset(). It is the opposite of __construct.
__get: Called when code attempts to read a property of the class that is inaccessible or unavailable.
__set: Called when code attempts to set a value on a property that is inaccessible or unavailable in your class.
__isset: Triggered when isset() is applied to a property of the class that is inaccessible or unavailable.
__unset: The counterpart of __isset. Triggered when unset() is called on an inaccessible or unavailable property of the class.
__call: Triggered when code attempts to call a method of the class that is either inaccessible or unavailable.
__callStatic: Executed when code calls an inaccessible or unavailable method in a static context.
__sleep: Triggered when you serialize an object of your class.
__wakeup: Executed when you unserialize an object of your class.
__toString: Executed when your object is used as a string, for example with echo.
__invoke: Called when you use an object of your class as a function.

The list above covers the most commonly used magic methods in PHP object oriented programming. Each runs when a specific event occurs on your class object. For example, if you simply echo your object, PHP triggers its __toString method. Let us group the related magic methods and analyze how they work.

__construct and __destruct magic methods in PHP

PHP triggers __construct when an object is created and __destruct when an object is deleted. The following is a very basic example of the __construct and __destruct magic methods in PHP:

class test
{
    function __construct()
    {
        echo 1;
    }

    function __destruct()
    {
        echo 2;
    }
}

$objT = new test(); // __construct runs automatically and prints 1
unset($objT);       // __destruct is triggered and prints 2

__get, __set, __call and __callStatic Magic Methods

__get, __set, __call and __callStatic all deal with inaccessible methods and properties of the class.

__get takes one argument and executes when code reads an inaccessible property of the object. Its argument is the name of the property.

__set takes two arguments and executes when code tries to set a value on an inaccessible property. The first parameter is the name of the property and the second is the value being set.

__call fires when code tries to call a method of your class that is either inaccessible or not available. It takes two parameters: a string holding the name of the method, and an array of the arguments passed to it.

__callStatic is a static magic method. It executes when code calls an inaccessible or unavailable method of your class statically, and it takes the same two parameters as __call.
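A common practical use of __get, __set and __call is a property bag that stores undeclared properties in a private array. The following is a minimal sketch of that idea. The class name Settings and the "getX()" naming rule in __call are hypothetical choices for illustration, not part of PHP or of the original tutorial.

```php
<?php
// Hypothetical "property bag": undeclared properties are redirected into a private array.
class Settings
{
    private $data = array();

    // Runs on writes to undeclared/inaccessible properties, e.g. $s->theme = 'dark'
    public function __set($name, $value)
    {
        $this->data[$name] = $value;
    }

    // Runs on reads of undeclared/inaccessible properties; returns null if unknown
    public function __get($name)
    {
        return array_key_exists($name, $this->data) ? $this->data[$name] : null;
    }

    // Runs on calls to undefined methods; here getTheme() maps to the 'theme' entry
    public function __call($method, $arguments)
    {
        if (strpos($method, 'get') === 0) {
            return $this->__get(lcfirst(substr($method, 3)));
        }
        throw new BadMethodCallException("Undefined method $method");
    }
}

$s = new Settings();
$s->theme = 'dark';        // calls __set('theme', 'dark')
echo $s->theme, "\n";      // calls __get('theme') and prints dark
echo $s->getTheme(), "\n"; // calls __call('getTheme', array()) and prints dark
```

Throwing BadMethodCallException for names the class does not recognize prevents a typo in a method name from failing silently.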
The following is an example of the __get, __set, __call and __callStatic magic methods:

class test
{
    function __get($name)
    {
        echo "__get executed with name $name ";
    }

    function __set($name, $value)
    {
        echo "__set executed with name $name , value $value";
    }

    function __call($name, $parameter)
    {
        $a = print_r($parameter, true); // convert the arguments array to a string
        echo "__call executed with name $name , parameter $a";
    }

    public static function __callStatic($name, $parameter)
    {
        $a = print_r($parameter, true); // convert the arguments array to a string
        echo "__callStatic executed with name $name , parameter $a";
    }
}

$a = new test();
$a->abc = 3;                                // __set will be executed
$app = $a->pqr;                             // __get will be triggered
$a->getMyName('ankur', 'techflirt', 'etc'); // __call will be executed
test::xyz('1', 'qpc', 'test');              // __callStatic will be executed

__isset and __unset magic methods

__isset and __unset are counterparts of each other. __isset executes when isset() is applied to a property that is not defined or not accessible. Its argument is the name of the property.

__unset triggers when unset() is applied to a property that is not defined or not accessible. Its argument is also the name of the property. The following is an example of the __isset and __unset magic methods in PHP:

class test
{
    function __isset($name)
    {
        echo "__isset is called for $name";
    }

    function __unset($name)
    {
        echo "__unset is called for $name";
    }
}

$a = new test();
isset($a->x); // triggers __isset('x')
unset($a->c); // triggers __unset('c')

Classes and Objects Tutorial in PHP
March 24, 2013 by Ankur Kumar Singh

Classes and objects are a key part of object oriented programming (OOP) in PHP. If you came here directly, I am assuming that you have basic knowledge of OOP. If you are new to OOP and not yet confident about the basics of classes and objects, please first read Basics of OOP.

Who can read this tutorial and what can you learn here?
This tutorial is for beginners who have some basic knowledge of OOP. If you know what OOP, objects and classes are and want to learn how they are implemented, you are in the right place. You should have a basic understanding of OOP, classes and objects, but deep technical knowledge of OOP is not required. We will start from the basic concepts: how to create a class in PHP and how to create an object of a class. In the later part, we will explore some advanced concepts of objects and classes. In short, if you have basic knowledge of OOP and want to learn how classes and objects are implemented in PHP, you are in the right place.

Class In PHP

Basic class support (the basic object oriented structure) was available in PHP 4. Complete class features such as access modifiers and interfaces arrived in PHP 5. Creating a class in PHP is very easy: you use the class keyword. The following is a basic class example:

class myOwnClass
{
    // variables of the class
    var $variable1;
    var $variable2;

    // function of the class
    function mergeVariable()
    {
        return $this->variable1 . $this->variable2;
    }
}

Class myOwnClass is created with the keyword class. The class name is a plain string without spaces, and the complete body of the class is enclosed within braces {}. All variables of this class are defined at the beginning of the class, and each starts with the var keyword. From PHP 5 you can also declare a variable with its level of visibility. For example, if you want $variable1 to be accessible from anywhere, you can write public $variable1 instead of var $variable1. If you use var $variable1 in PHP 5, the variable is treated as public by default.

Next comes the function declaration of your class. As in the example above, you can directly declare a function as function mergeVariable().
This is the most basic way of creating a function within a class, and PHP 4 supports it. In PHP 5 you can also apply visibility to your function, so in PHP 5 you can write the same function as public function mergeVariable(). If you do not define visibility, your function is treated as public by default. The following is the same class in PHP 5:

class myOwnClass
{
    // variables of the class
    public $variable1;
    private $variable2;

    // function of the class
    public function mergeVariable()
    {
        return $this->variable1 . $this->variable2;
    }
}

So the basic architecture of a class is almost the same in PHP 4 and PHP 5, except for the use of visibility. We will cover visibility in depth in the chapter Visibility in OOP. For now, just think of visibility as the access level of your class's methods and variables. If you want code outside the class to access a variable or function through the object, make it public. If you do not want that, make it private.

You can also pass values to your class through the parameters of a class function. A function of a class works like a general function in PHP. For example:

class myOwnClass
{
    // variables of the class
    public $variable1;
    private $variable2;

    // function of the class
    public function mergeVariable($third_var)
    {
        return $this->variable1 . $this->variable2 . $third_var;
    }
}

You cannot create a class named stdClass in PHP, because it is a reserved PHP class. stdClass represents a generic standard object and is used to create an empty object. You can use it without declaring it. If you try to declare a class named stdClass anyway, PHP throws a fatal error with the following message:

Fatal error: Cannot redeclare class stdClass

It is recommended not to start the names of your class functions with __, as in __call. Functions that start with __ look like PHP magic methods, which reserve that prefix.

You cannot split a class across several files, but you can split the code inside a class function into multiple files. Let us understand this by example.
The following is not allowed in your class:

class myClass
{
    public $abc;

    public function test()
    {
        return $this->abc;
    }

    include "abc.php"; // parse error: include cannot appear directly in a class body
}

But the following is allowed:

class myClass
{
    public $abc;

    public function test()
    {
        include "abc.php"; // allowed: include inside a method body
    }
}

So keep the practices above in mind while creating classes.

Object In PHP

Classes are useless without objects. An object is an instance of your class. To solve a problem with a class, you need to create an object of that class. You create an object with the new keyword.

$objClass = new myClass();

The code above creates an object of class myClass in the variable $objClass. You can create multiple objects of the same class, and each object is distinct from the others.

$objClass1 = new myClass();
$objClass2 = new myClass();

To fully understand objects, let us create a complete class and some objects of it. Here I will create a class for interest calculation, then create objects of that class and calculate interest.

// Creating class interestCalculator
class interestCalculator
{
    public $rate;
    public $duration;
    public $capital;

    public function calculateInterest()
    {
        return ($this->rate * $this->duration * $this->capital) / 100;
    }
}

// Creating two objects of class interestCalculator to calculate interest on different amounts
$calculator1 = new interestCalculator();
$calculator2 = new interestCalculator();

$calculator1->rate = 3;
$calculator1->duration = 2;
$calculator1->capital = 300;

$calculator2->rate = 3.2;
$calculator2->duration = 3;
$calculator2->capital = 400;

$interest1 = $calculator1->calculateInterest();
$interest2 = $calculator2->calculateInterest();

echo "Your interest on capital $calculator1->capital with rate $calculator1->rate for duration $calculator1->duration is $interest1 <br/> ";
echo "Your interest on capital $calculator2->capital with rate $calculator2->rate for duration $calculator2->duration is $interest2 <br/> ";

Please run the code above in a browser. You will get the following output:

Your interest on capital 300 with rate 3 for duration 2 is 18
Your interest on capital 400 with rate 3.2 for duration 3 is 38.4
Now please analyze the code above carefully. We have created two objects of the interestCalculator class, in the variables $calculator1 and $calculator2. The two objects hold different property values: for example, the capital of $calculator1 is 300 and the capital of $calculator2 is 400. When you call calculateInterest on either object, it calculates interest from that object's own properties. Now look at the code of the interestCalculator class:

class interestCalculator
{
    public $rate;
    public $duration;
    public $capital;

    public function calculateInterest()
    {
        return ($this->rate * $this->duration * $this->capital) / 100;
    }
}

The class has three variables, or properties: $rate, $duration and $capital. Now look at the function calculateInterest. Its body uses the variable $this, a special variable that refers to the current object, the one the method was called on. So $this is different for the two objects of the interestCalculator class. For $calculator1, $this->rate is 3, and for $calculator2, $this->rate is 3.2.

public function calculateInterest()
{
    $rate = 5;
    return ($this->rate * $this->duration * $this->capital) / 100;
}

In the function above, $this->rate and $rate are different. $this->rate always holds the value assigned to the object's property, but $rate is a fixed local value. If you replace $this->rate with $rate, your rate of interest will always be 5:

public function calculateInterest()
{
    $rate = 5;
    return ($rate * $this->duration * $this->capital) / 100;
}

You can also create an object of a class in a few other ways. The following are some examples:

$className = 'interestCalculator';
$calc1 = new $className();

From PHP 5.3 onward, you can also create a new object of the same class from an existing object:

$cls1 = new interestCalculator();
$cls2 = new $cls1;

Constructor of Classes and Objects

A constructor is a function defined in your PHP class.
PHP calls the constructor automatically when you create an object of the class. As soon as you write $object = new yourClass(), the constructor of the class runs. In PHP 4, you create a constructor by writing a function with the same name as your class. From PHP 5, you can also create a constructor by defining the magic function __construct. Please go through the examples of constructors below.

PHP 4 constructor (also works in PHP 5; deprecated in PHP 7 and no longer treated as a constructor in PHP 8)

class interestCalculator
{
    var $rate;
    var $duration;
    var $capital;

    // Constructor of the class
    function interestCalculator()
    {
        $this->rate = 3;
        $this->duration = 4;
    }
}

PHP 5 constructor

class interestCalculator
{
    public $rate;
    public $duration;
    public $capital;

    // Constructor of the class
    public function __construct()
    {
        $this->rate = 3;
        $this->duration = 4;
    }
}

In both cases, whenever an instance of the class is created, rate is set to 3 and duration is set to 4. The difference lies in how the constructor is declared. PHP 4 limited you to a function with the same name as the class. PHP 5 lets you use either that function or a function named __construct.

You can also pass parameters to the constructor.

class interestCalculator
{
    public $rate;
    public $duration;
    public $capital;

    // Constructor of the class
    public function __construct($rate, $duration)
    {
        $this->rate = $rate;
        $this->duration = $duration;
    }
}

$objCls = new interestCalculator(3.2, 7); // passing values of $rate and $duration

If your constructor has parameters, you need to pass values for them when you create the object, as in $objCls = new interestCalculator(3.2, 7). If you do not, PHP reports an error: a warning in PHP 5, or an ArgumentCountError from PHP 7.1.

Playing with visibility and other features of the constructor

Let us explore the constructors of our classes and objects in more depth. Everything described here applies to PHP 5.
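Constructor visibility decides where new may be called, because new is checked against the scope of the calling code. The following minimal sketch uses hypothetical names (AppConfig, getInstance()) to show a singleton-style class. Its private constructor blocks new from outside the class, while a static factory method inside the class is still allowed to call it.

```php
<?php
// Hypothetical singleton-style class: the private constructor blocks `new` from outside,
// but code inside the class (the static factory below) may still call it.
class AppConfig
{
    private static $instance = null;
    public $rate;

    private function __construct($rate)
    {
        $this->rate = $rate;
    }

    public static function getInstance()
    {
        if (self::$instance === null) {
            self::$instance = new self(3.2); // allowed: this code runs inside the class
        }
        return self::$instance;
    }
}

$a = AppConfig::getInstance();
$b = AppConfig::getInstance();
var_dump($a === $b); // bool(true): both calls return the same object
// new AppConfig(5); // would fail here, because the constructor is private
```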
Did you notice that I made my constructor function public? If not, please go back to the section above and check. The constructor is public so that it can be called from outside the class, which is what happens when we create an object with new. If your constructor is private, PHP throws an error when you try to create an object from outside the class. Let us try the code below:

class interestCalculator
{
    public $rate;
    public $duration;
    public $capital;

    // Constructor of the class
    private function __construct($rate, $duration)
    {
        $this->rate = $rate;
        $this->duration = $duration;
    }
}

$objCls = new interestCalculator(3.2, 7); // passing values of $rate and $duration

It gives the following output:

Fatal error: Call to private interestCalculator::__construct() from invalid context

You can also define a constructor as a function with the same name as the class (even in PHP 5). The following code fails in the same way:

class interestCalculator
{
    public $rate;
    public $duration;
    public $capital;

    // Constructor of the class
    private function interestCalculator($rate, $duration)
    {
        $this->rate = $rate;
        $this->duration = $duration;
    }
}

$objCls = new interestCalculator(3.2, 7);

You will receive the following error:

Fatal error: Call to private interestCalculator::interestCalculator() from invalid context

In short, if your constructor is private, code outside the class cannot create an object with new and receives an error instead. Code inside the class, such as a static method, can still call a private constructor; the singleton pattern relies on this.

As we have seen, you can define a constructor either as a function with the same name as the class or as a function named __construct. What happens if a single class has both? Let us try this code:

class test
{
    public function __construct()
    {
        echo 1;
    }

    function test()
    {
        echo 2;
    }
}

$t = new test(); // output will be 1

The output is 1, which means PHP called your __construct function. So if a class has __construct, that takes preference.
If the __construct function is not present, PHP looks for a function with the same name as the class. What happens if a class has both and its __construct function is private? Try this code:

class test
{
    private function __construct()
    {
        echo 1;
    }

    function test()
    {
        echo 2;
    }
}

$t = new test();

You will get the following error:

Fatal error: Call to private test::__construct() from invalid context

Best Practices for Classes and Objects

The following are some best practices for using classes and objects in your application.

1. Instead of assigning the class variables after creating an object, use a constructor.
2. Use visibility as required. Do not make your variables and methods more restricted or more open than needed. Too much restriction hurts your flexibility; too little undermines your structure.
3. Follow a naming convention in your classes and objects, such as camel case for all public methods and an _ prefix for all protected methods and variables. This makes your code easier to read.
4. Do not try to do everything in a single class. Create classes that match specific requirements. This saves you time and execution cost.
5. Always try to put every class in a separate file and follow a naming convention.

New Features in PHP 5.4
February 23, 2013 by Ankur Kumar Singh

PHP 5.4 is the first major release after PHP 5.3. The PHP community has put in a lot of effort and introduced some new features in PHP 5.4. Some features that were planned for PHP 6 also arrived in PHP 5.4, and the new version removes various things that were tedious for developers. In this post I will describe some of the major changes in PHP 5.4. If you would like to upgrade PHP from an older version to 5.4 before we begin, you can read my previous post, Upgrade PHP in XAMPP. After installing PHP 5.4, your phpinfo page shows the new version.
First I will give an overview of all the changes in PHP 5.4, and then I will describe the major ones.

PHP 5.4.0 was officially released on 1 March 2012, and its exciting features made it popular quickly. According to the PHP community, PHP 5.4 makes the following changes.

The following legacy features have been removed:

1. break $var / continue $var syntax (variable arguments to break and continue)
2. Safe mode and the related ini options
3. The register_globals and register_long_arrays INI options
4. The highlight.bg INI option
5. The session_is_registered(), session_register() and session_unregister() functions
6. Support for putenv("TZ=..") for setting the timezone

The following are some improvements made in PHP 5.4:

1. Short array syntax support added
2. Support for the Class::{expr}() syntax added
3. Compile-time dependency on the mbstring extension removed
4. Traits introduced
5. $this support in closures restored
6. Callable type hinting
7. Improved ternary operator performance when arrays are involved
8. Zend Engine improvements
9. Improvements in some core functions
10. Built-in CLI web server introduced
11. Some improvements in curl and file system functions

For the complete list of changes, see the PHP 5.4 Change Log page on the PHP website. Now let us discuss some of the major changes in PHP 5.4, with code.

Performance Improvement in PHP 5.4

In PHP 5.4 the community has tried its best to improve the performance of PHP. Any performance gain always depends on how your code is written. However, I tested the same code on both PHP 5.3 and PHP 5.4 and found PHP 5.4 to be around 30% better than the older release in terms of memory utilization. For speed, I tested Zend Framework, and it performs very well on PHP 5.4. So if your website or web application is a little slow, upgrade your PHP version; it may really help. The FastCGI request handler is also faster in PHP 5.4. The new version also introduces class, function and constant caches.
These caches will really help if your website is built on the OOP paradigm.

Traits

Traits are a major new feature, introduced for the first time in PHP 5.4. According to Rasmus Lerdorf (the creator of PHP), traits are compiler-assisted copy and paste. Traits let us reuse code in PHP. In general terms, a trait is a mechanism for reusing code in a single inheritance language. A trait looks similar to a class and groups related functionality, but it cannot be instantiated directly. Some other programming languages offer similar functionality under the name mixins. Below is an example of traits:

<?php
trait global_class_functions
{
    public function helloword()
    {
        echo 'this is trait helloword';
    }
}

class base
{
    use global_class_functions; // copies the trait's methods into this class
}

$objBase = new base();
$objBase->helloword(); // prints 'this is trait helloword'
?>

Callable Typehinting

PHP 5.4 introduced callable type hinting, a very useful feature of the new version. It benefits developers who want their functions to be strictly typed. The following is sample code for callable type hinting:

<?php
function test_callback_function()
{
    return 123;
}

class cls
{
    public function mthd()
    {
        return 456;
    }

    public static function smthd()
    {
        return 789;
    }
}

function test_callable(callable $a)
{
    return $a();
}

echo test_callable('test_callback_function'); // callable as a function name, returns 123
echo test_callable(['cls', 'smthd']);         // callable as a static class method, returns 789
echo test_callable([(new cls), 'mthd']);      // callable as an object method, returns 456
?>

Short Array Syntax

Many other programming languages already use a short syntax for declaring arrays. With this release, PHP has also implemented short array syntax.
Now you can also define an array in PHP in the following way:

<?php
$arr1 = [1, 2, 'test'];
print_r($arr1);
$arr2 = ['a' => 1, 'b' => 'Ankur'];
print_r($arr2);
?>

Isn't it much easier to declare arrays in the new PHP version?

Function array dereferencing

PHP now supports directly dereferencing the array returned by a function. This lets you read a value from a returned array without declaring an extra variable. In earlier versions of PHP, you first stored the function's output in a variable and then read the value from that array. From the new version, you can access the value directly. The following is an example of array dereferencing on a function return value:

<?php
function test_array_ref()
{
    return [1, 2, 3, 4];
}

function test_ref_2()
{
    return ['one' => 'Ankur', 'two' => 'puttul'];
}

echo test_array_ref()[3]; // returns 4
echo test_ref_2()['one']; // returns Ankur
?>

__invoke: Object as function

The __invoke magic method (available since PHP 5.3) lets you use an object as a function. Whenever you call an object as if it were a function, PHP automatically executes its __invoke method. Printing an object with echo triggers __toString, not __invoke. In the example below, I calculate the area of a rectangle by calling the object as a function:

<?php
class clsRect
{
    private $height;
    private $width;

    // Constructor
    function __construct($height, $width)
    {
        $this->height = $height;
        $this->width = $width;
    }

    // Magic method __invoke, runs when the object is called like a function
    function __invoke()
    {
        echo $this->height * $this->width;
    }
}

// Using __invoke
$objRect = new clsRect(10, 25);
$objRect(); // prints 250
?>

Interface for Json and Session Handling

The new version of PHP introduces two new interfaces: JsonSerializable for JSON and SessionHandlerInterface for session handling. Both are part of the PHP 5.4 release.
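This post does not demonstrate JsonSerializable, so here is a minimal sketch of it. The Rectangle class is hypothetical, loosely modeled on clsRect above. When json_encode() meets an object that implements JsonSerializable, it encodes whatever jsonSerialize() returns. This lets the class decide which private data to expose, and it can even add computed values.

```php
<?php
// Hypothetical class: jsonSerialize() decides what json_encode() outputs,
// so private properties can be exposed deliberately (or left out).
class Rectangle implements JsonSerializable
{
    private $height;
    private $width;

    public function __construct($height, $width)
    {
        $this->height = $height;
        $this->width = $width;
    }

    // The attribute line silences the PHP 8.1+ return-type deprecation;
    // PHP versions before 8 read the same line as an ordinary # comment.
    #[\ReturnTypeWillChange]
    public function jsonSerialize()
    {
        return array(
            'height' => $this->height,
            'width'  => $this->width,
            'area'   => $this->height * $this->width, // computed, not stored
        );
    }
}

echo json_encode(new Rectangle(10, 25)), "\n"; // {"height":10,"width":25,"area":250}
```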
SessionHandlerInterface is a native PHP interface. It lets you handle PHP sessions with your own class: all you need to do is implement the interface. The following is an example implementation of SessionHandlerInterface:

<?php
class CustomSessionHandler implements SessionHandlerInterface
{
    private $sessionPath;

    public function open($sessionPath, $sessionName)
    {
        $this->sessionPath = $sessionPath;
        if (!is_dir($this->sessionPath)) {
            mkdir($this->sessionPath, 0777);
        }
        return true;
    }

    public function close()
    {
        return true;
    }

    public function read($id)
    {
        return (string) @file_get_contents("$this->sessionPath/sess_$id");
    }

    public function write($id, $data)
    {
        return file_put_contents("$this->sessionPath/sess_$id", $data) === false ? false : true;
    }

    public function destroy($id)
    {
        $file = "$this->sessionPath/sess_$id";
        if (file_exists($file)) {
            unlink($file);
        }
        return true;
    }

    public function gc($maxlifetime)
    {
        // delete session files older than $maxlifetime seconds
        foreach (glob("$this->sessionPath/sess_*") as $file) {
            if (filemtime($file) + $maxlifetime < time() && file_exists($file)) {
                unlink($file);
            }
        }
        return true;
    }
}

$objSess = new CustomSessionHandler();
session_set_save_handler($objSess, true);
session_start();
$_SESSION['a'] = 2;
?>

Built-in web server for CLI

PHP 5.4 introduced a built-in web server in the CLI (command line interface). The CLI web server is not recommended for production environments, although in the end the choice is yours; I have been using it and have not noticed any harm. It is built for easily testing scripts from the command line, which suits Linux geeks. For example, php -S localhost:8000 serves the current directory at http://localhost:8000.

Comparison Between WordPress Joomla and Drupal
October 14, 2012 by Ankur Kumar Singh

This “Comparison Between WordPress Joomla and Drupal” is my first post on techflirt.com. For the last week I have been comparing CMSs, trying to figure out the best one for my needs.
I have developed lots of websites on Joomla, Drupal and WordPress. I have good experience with all three and a good understanding of the features each of them provides. All three CMSs are good, but when you have to decide for a website where you plan to provide out-of-the-box services, the choice is a bit tough. In this article I will first describe the most useful features of each of these three CMSs and then share my thoughts on how to make the decision.

Drupal:
Drupal is not only an open-source CMS tool but also a very good CMS framework. It gives you lots of features for creating an optimized CMS site, and you can set up a site in a very flexible way. Views and CCK make Drupal very powerful: with them you can build your CMS your own way. Drupal's structured code makes it very reliable, and Drupal has very strong community support. Following are the key features of Drupal:
1. Nodes
2. Taxonomy
3. Views
4. CCK
5. Extensive access control
6. Best theme integration
These features are very rich in terms of content management, and they make Drupal not only a CMS tool but also a CMS framework. The basic difference between a CMS tool and a CMS framework is that a CMS framework is a tool with which you can create a CMS tool for yourself. So Drupal is a feature-rich and flexible CMS framework with which one can create a good CMS tool for oneself.

Drupal also has good community support. You can get very good documentation at drupal.org, and there are lots of forums around the web about Drupal. I am quite sure that if you have a problem with your Drupal site, someone on the web has the answer. You can get lots of Drupal modules free of cost, and lots of paid modules are available too. If you are looking for a developer to maintain your Drupal website, you can find one very easily and at a very competitive price. Now the other part of the story.
As I have already mentioned, Drupal is a very feature-rich content management framework, so it is a bit complex. A Drupal view is good if you create it correctly; otherwise it will hamper website performance. Because it is so feature rich, Drupal needs some extra care from your side, and you need to hire a good developer to integrate your website design (commonly called a “theme”) into Drupal.

Joomla:
Joomla is a very good open-source CMS tool. Once Joomla is set up properly, you can forget about the CMS tool itself and concentrate on managing your content. Joomla is a mid-level content management tool. It has a good set of pre-built features for managing website content in various ways, and specific content presentation styles that suit almost every business. You can display your content in a blog layout, or very easily switch to a magazine-style layout. So Joomla is not as big as Drupal, but its generic feature set makes it the best CMS tool.

Joomla is very easy to use. No special training is needed to run a Joomla-based website; it is very easy for webmasters. Joomla has good community support and very good documentation at joomla.org. You can very easily find a module to extend Joomla, and very good paid support and services are available. A webmaster searching the internet will find lots of Joomla developers. The cost of maintaining a website in Joomla is a bit lower than in Drupal, but Joomla's feature set is also a bit smaller than Drupal's. As I already mentioned, it is a mid-level content management tool.

WordPress:
First of all, let me make clear that WordPress is not a generic content management system (CMS); it is a CMS for blogging sites, or web blogs. A lot of people think that it is a generic CMS, but in reality it is not; that one can use it as a CMS is a different thing. WordPress is a very easy-to-use CMS for blog websites.
It is designed with blog websites in mind. Some people use WordPress as their website CMS because it suits their requirements: a small website needs category navigation, content presented in a specific style, and a back end for posting, and all of these are available in WordPress. As a blog management tool it is very feature rich, but as a generic CMS it does not have many features. WordPress has super community support, and you can find good documentation for WordPress everywhere. It is also very easy to use: you can log in to the admin panel and start managing your website without even reading the documentation. I personally love WordPress for its usability and efficiency.

My personal advice for you on CMS selection
If you have a very big website and want to use a CMS tool, go for Drupal. Drupal has good infrastructure for managing a big site very efficiently, but it requires serious development work and some maintenance. So if you have a good budget and a long-term vision for your website, go for Drupal. Also, if your business plan changes frequently and requires frequent changes in your website design and workflow, Drupal is again the best choice. In a nutshell, Drupal is for big websites. If you are concerned about administration security for your website within your company, Drupal provides a very rich access control list (ACL) system, which makes it more secure.

If your website is a typical mid-level content-serving website and your business concept is stable and not going to change, go for Joomla. Joomla is especially suitable for mid-level publishers; it has the feature set required to run a normal CMS website. If your content security needs and website traffic are at a mid level, you can go for Joomla. The maintenance cost of your website is also a bit lower than with Drupal.
So if your pocket is a bit light and you have a vision for a good website, you can choose Joomla. It is not premium, but it is not far from premium.

If you are running a small website or blog, go for WordPress. WordPress is a cost-effective solution for such website needs. But if you need a different level of security or a different content placement pattern, it is not a good solution. If your concept changes quickly and your content display pattern changes with it, WordPress is not suitable, because you will have to ask your developer to change the theme each time. So WordPress is fantastic for bloggers and for people who want to run a website at a very nominal cost.

Phishing is a fraudulent attempt, usually made through email, to steal your personal information. The best way to protect yourself from phishing is to learn how to recognize a phish.

What are the advantages of a DBMS over traditional file-based systems?
Ans: Database management systems were developed to handle the following difficulties of typical file-processing systems supported by conventional operating systems:
1. Data redundancy and inconsistency
2. Difficulty in accessing data
3. Data isolation (multiple files and formats)
4. Integrity problems
5. Atomicity of updates
6. Concurrent access by multiple users
7. Security problems
Source: http://cs.nyu.edu/courses/spring01/G22.2433-001/mod1.2.pdf

What are super, primary, candidate and foreign keys?
Ans: A superkey is a set of attributes of a relation schema upon which all attributes of the schema are functionally dependent; no two rows can have the same values for the superkey attributes. A candidate key is a minimal superkey, i.e., no proper subset of the candidate key's attributes is a superkey. A primary key is one of the candidate keys: one candidate key is selected as the most important and becomes the primary key. There cannot be more than one primary key in a table. A foreign key is a field (or collection of fields) in one table that uniquely identifies a row of another table.
What is the difference between primary key and unique constraints?
Ans: A primary key cannot have NULL values, while a unique constraint can. There is only one primary key in a table, but there can be multiple unique constraints.

What is database normalization?
Ans: It is a process of analyzing the given relation schemas based on their functional dependencies and primary keys to achieve the following desirable properties:
1) Minimizing redundancy
2) Minimizing insertion, deletion, and update anomalies
Relation schemas that do not meet these properties are decomposed into smaller relation schemas that could meet them.
Source: http://cs.tsu.edu/ghemri/CS346/ClassNotes/Normalization.pdf

What is SQL?
SQL (Structured Query Language) is a language designed for querying, inserting, and modifying data in a relational database system.

What are the differences between DDL, DML and DCL in SQL?
Ans: Following are some details of the three.
DDL stands for Data Definition Language. SQL statements like CREATE, ALTER, DROP and RENAME come under this.
DML stands for Data Manipulation Language. SQL statements like SELECT, INSERT, UPDATE and DELETE come under this.
DCL stands for Data Control Language. SQL statements like GRANT and REVOKE come under this.

What is the difference between HAVING and WHERE clause?
Ans: HAVING is used to specify a condition for a group or an aggregate function used in a SELECT statement. The WHERE clause selects rows before grouping; the HAVING clause selects rows after grouping. Unlike the HAVING clause, the WHERE clause cannot contain aggregate functions.

What is a join?
Ans: An SQL join is used to combine data from two or more tables, based on a common field between them. For example, consider the following two tables.
Student Table
EnrollNo  StudentName  Address
1000      geek1        geeksquiz1
1001      geek2        geeksquiz2
1002      geek3        geeksquiz3

StudentCourse Table
CourseID  EnrollNo
1         1000
2         1000
3         1000
1         1002
2         1003

Following is a join query that shows the names of the students enrolled in the different course IDs:

SELECT StudentCourse.CourseID, Student.StudentName
FROM StudentCourse
INNER JOIN Student
ON StudentCourse.EnrollNo = Student.EnrollNo
ORDER BY StudentCourse.CourseID;

The above query would produce the following result. The StudentCourse row with EnrollNo 1003 has no matching student, so the inner join leaves it out.

CourseID  StudentName
1         geek1
1         geek3
2         geek1
3         geek1

What is identity?
Ans: Identity (or AutoNumber) is a column that automatically generates numeric values. A start and increment value can be set, but most DBAs leave these at 1. A GUID column also generates values; these cannot be controlled. Identity/GUID columns do not need to be indexed.

What is a view in SQL? How do you create one?
Ans: A view is a virtual table based on the result set of an SQL statement. We can create one using the CREATE VIEW syntax:

CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

What are the uses of a view?
1. Views can represent a subset of the data contained in a table; consequently, a view can limit the degree of exposure of the underlying tables to the outer world: a given user may have permission to query the view while being denied access to the rest of the base table.
2. Views can join and simplify multiple tables into a single virtual table.
3. Views can act as aggregated tables, where the database engine aggregates data (sum, average, etc.) and presents the calculated results as part of the data.
4. Views can hide the complexity of data; for example, a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying table.
5. Views take very little space to store; the database contains only the definition of a view, not a copy of all the data it presents.
6. Depending on the SQL engine used, views can provide extra security.
Source: Wiki Page

What is a trigger?
Ans: A trigger is code associated with insert, update or delete operations on a table. The code is executed automatically whenever the associated query is executed on the table. Triggers can be useful for maintaining integrity in a database.

What is a stored procedure?
Ans: A stored procedure is like a function that contains a set of operations compiled together. It contains operations that are commonly used in an application to do common database tasks.

What is the difference between a trigger and a stored procedure?
Ans: Unlike stored procedures, triggers cannot be called directly. They can only be associated with queries.

What is a transaction? What are ACID properties?
Ans: A database transaction is a set of database operations that must be treated as a whole, meaning either all of the operations are executed or none of them are. An example is a bank transfer from one account to another: either both the debit and the credit operations are executed, or neither is. ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.

What are indexes?
Ans: A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and the extra storage space needed to maintain an extra copy of the data. Data can be stored on disk in only one order. To support faster access by other values (for example, a binary search on a different column), indexes are created on tables. These indexes need extra space on disk, but they allow faster searches on frequently searched values.

What are clustered and non-clustered indexes?
Ans: A clustered index is the index according to which data is physically stored on disk.
Therefore, only one clustered index can be created on a given database table. Non-clustered indexes don’t define the physical ordering of data, but a logical ordering. Typically, a tree is created whose leaves point to disk records; B-trees or B+ trees are used for this purpose.

In SQL, what’s the difference between an inner and outer join?
Joins are used to combine the data from two tables, with the result being a new, temporary table. The temporary table is created based on column(s) that the two tables share, which represent meaningful column(s) of comparison. The goal is to extract meaningful data from the resulting temporary table. Joins are performed based on something called a predicate, which specifies the condition to use in order to perform a join. A join can be either an inner join or an outer join, depending on how one wants the resulting table to look.

It is best to illustrate the differences between inner and outer joins by use of an example. Here are the 2 tables that we will use:

Employee
EmpID  EmpName
13     Jason
8      Alex
3      Ram
17     Babu
25     Johnson

Location
EmpID  EmpLoc
13     San Jose
8      Los Angeles
3      Pune, India
17     Chennai, India
39     Bangalore, India

It’s important to note that the last row in the Employee table does not exist in the Location table, and the last row in the Location table does not exist in the Employee table. These facts will prove to be significant in the discussion that follows.

Outer Joins
Let’s start the explanation with outer joins. Outer joins can be further divided into left outer joins, right outer joins, and full outer joins.
Here is what the SQL for a left outer join would look like, using the tables above:

select * from employee left outer join location
on employee.empID = location.empID;

In this SQL we are joining on the condition that the employee IDs match in the rows of both tables. So we are essentially combining 2 tables into 1, based on the condition that the employee IDs match. Note that we can drop the "outer" in left outer join, which gives us the SQL below; it is equivalent to the SQL above.

select * from employee left join location
on employee.empID = location.empID;

A left outer join retains all of the rows of the left table, regardless of whether there is a matching row in the right table. The SQL above will give us the result set shown below.

Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India
25              Johnson           NULL            NULL

The Join Predicate, a geeky term you should know
Earlier we mentioned something called a join predicate. In the SQL above, the join predicate is "on employee.empID = location.empID". This is the heart of any type of join, because it determines which common column of the 2 tables will be used to "join" them. As you can see from the result set, all of the rows from the left table are returned when we do a left outer join. The last row of the Employee table (which contains the "Johnson" entry) is displayed in the results even though there is no matching row in the Location table, and the non-matching columns in that row are filled with NULL. So we have NULL as the entry wherever there is no match.

A right outer join is pretty much the same thing as a left outer join, except that the rows that are retained are from the right table.
This is what the SQL looks like:

select * from employee right outer join location
on employee.empID = location.empID;

-- taking out the "outer", this also works:
select * from employee right join location
on employee.empID = location.empID;

Using the tables presented above, we can show what the result set of a right outer join would look like:

Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India
NULL            NULL              39              Bangalore, India

We can see that the last row returned in the result set contains the row that was in the Location table but not in the Employee table (the "Bangalore, India" entry). Because there is no row in the Employee table with an employee ID of 39, we have NULLs in the result set for the Employee columns.

Inner Joins
Now that we’ve gone over outer joins, we can contrast them with the inner join. The difference between an inner join and an outer join is that an inner join returns only the rows that actually match based on the join predicate. Once again, this is best illustrated via an example. Here’s what the SQL for an inner join looks like:

select * from employee inner join location
on employee.empID = location.empID;

This can also be written as:

select * from employee, location
where employee.empID = location.empID;

Now, here is what the result of running that SQL would look like:

Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
13              Jason             13              San Jose
8               Alex              8               Los Angeles
3               Ram               3               Pune, India
17              Babu              17              Chennai, India

Inner vs Outer Joins
We can see that an inner join only returns rows in which there is a match based on the join predicate. In this case, that means that every time the Employee and Location tables share an employee ID, a row is generated in the results to show the match. Looking at the original tables, one can see that exactly the employee IDs shared by both tables are displayed in the results.
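The three joins compared above can be reproduced with a minimal sketch in Python's built-in sqlite3 module. The lowercase table names mirror the SQL above, and the ORDER BY clauses are added only to make the output order predictable. One assumption to flag: SQLite gained native RIGHT JOIN support only in version 3.39, so this sketch emulates the right outer join as a LEFT JOIN with the two tables swapped, which returns the same rows.

```python
import sqlite3

# In-memory copy of the Employee and Location tables used in the join examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (EmpID INTEGER, EmpName TEXT);
    CREATE TABLE location (EmpID INTEGER, EmpLoc TEXT);
    INSERT INTO employee VALUES
        (13, 'Jason'), (8, 'Alex'), (3, 'Ram'), (17, 'Babu'), (25, 'Johnson');
    INSERT INTO location VALUES
        (13, 'San Jose'), (8, 'Los Angeles'), (3, 'Pune, India'),
        (17, 'Chennai, India'), (39, 'Bangalore, India');
""")

COLUMNS = "employee.EmpID, employee.EmpName, location.EmpID, location.EmpLoc"

# Inner join: only rows whose EmpID appears in both tables.
inner_rows = conn.execute(
    f"SELECT {COLUMNS} FROM employee INNER JOIN location "
    "ON employee.EmpID = location.EmpID ORDER BY employee.EmpID"
).fetchall()

# Left outer join: every employee row, NULL (None in Python) where no location matches.
left_rows = conn.execute(
    f"SELECT {COLUMNS} FROM employee LEFT JOIN location "
    "ON employee.EmpID = location.EmpID ORDER BY employee.EmpID"
).fetchall()

# Right outer join emulated as a LEFT JOIN with the tables swapped:
# every location row, NULL where no employee matches.
right_rows = conn.execute(
    f"SELECT {COLUMNS} FROM location LEFT JOIN employee "
    "ON employee.EmpID = location.EmpID ORDER BY location.EmpID"
).fetchall()

for name, rows in [("inner", inner_rows), ("left", left_rows), ("right", right_rows)]:
    print(name, rows)
```

Python's None stands in for the NULLs in the result tables: the left join ends with Johnson's unmatched row, and the emulated right join ends with the unmatched Bangalore row.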
But with a left or right outer join, the result set retains all of the rows from either the left or the right table.

In SQL, what are the differences between primary, foreign, and unique keys?
The one thing that primary, unique, and foreign keys all have in common is that each type of key can consist of more than one column from a given table. In other words, foreign, primary, and unique keys are not restricted to a single column: each type of key can cover multiple columns. This is something many people in software are not aware of. Of course, the database programmer is the one who actually defines which columns are covered by a foreign, primary, or unique key.

That is one similarity, but there are also some major differences between primary, unique, and foreign keys, and we will go over them in this article. But first, we want to give a thorough explanation of why foreign keys are necessary in some situations.

What is the point of having a foreign key?
Foreign keys are used to reference unique columns in another table. For example, a foreign key can be defined on table A and reference some unique column(s) in table B. Why would you want a foreign key? Whenever it makes sense to have a relationship between columns in two different tables.

An example of when a foreign key is necessary
Suppose that we have an Employee table and an Employee Salary table, and assume that every employee has a unique ID. The Employee table can be said to hold the ‘master list’ of all employee IDs in the company. If we want to store employee salaries in another table, do we want to recreate the entire master list of employee IDs in the Employee Salary table as well? No, we don’t want to do that, because it’s inefficient.
It would make a lot more sense to define a relationship between an employee ID column in the Employee Salary table and the “master” employee ID column in the Employee table, one where the Employee Salary table simply references the employee ID in the Employee table. This way, whenever someone’s employee ID is updated in the Employee table, it can also automatically be updated in the Employee Salary table. Sounds good, right? Nobody has to manually update the employee IDs in the Employee Salary table every time an ID is updated in the master list inside the Employee table. And if an employee is removed from the Employee table, he/she can also automatically be removed (by the RDBMS) from the Employee Salary table. Of course, all of this behavior has to be defined by the database programmer, but hopefully you get the point.

Foreign keys and referential integrity
Foreign keys have a lot to do with the concept of referential integrity. What we discussed in the previous paragraph covers some of the principles behind referential integrity. You can and should read a more in-depth article on that concept: Referential integrity explained.

Can a table have multiple unique, foreign, and/or primary keys?
A table can have multiple unique and foreign keys. However, a table can have only one primary key.

Can a unique key have NULL values? Can a primary key have NULL values?
Unique key columns are allowed to hold NULL values (how many NULLs are allowed depends on the database system). The values in a primary key column, however, can never be NULL.

Can a foreign key reference a non-primary key?
Yes, a foreign key can actually reference a key that is not the primary key of a table. But a foreign key must reference a unique key.

Can a foreign key contain NULL values?
Yes, a foreign key column can hold NULL values unless it is declared NOT NULL. A NULL in a foreign key column simply means that the row does not reference any row in the other table, so the foreign key constraint is not checked for it.
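The answers above can be checked with a minimal sketch in Python's sqlite3 module. The table and column names (employee_salary, badge_code, email) are hypothetical, chosen only for illustration. Two SQLite-specific assumptions: foreign keys are enforced only after PRAGMA foreign_keys = ON, and SQLite allows several NULLs in a UNIQUE column, while other systems differ (SQL Server, for example, allows only one NULL in a unique constraint).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default

conn.executescript("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,  -- the one primary key of the table
        badge_code TEXT UNIQUE,          -- first unique key
        email      TEXT UNIQUE           -- second unique key
    );
    CREATE TABLE employee_salary (
        -- foreign key that references a unique, non-primary column
        badge_code TEXT REFERENCES employee (badge_code),
        salary     INTEGER
    );
    INSERT INTO employee VALUES (1, 'B-001', 'a@example.com');
    INSERT INTO employee VALUES (2, 'B-002', NULL);  -- unique columns may hold NULL...
    INSERT INTO employee VALUES (3, 'B-003', NULL);  -- ...and SQLite allows several
""")

def rejected(sql):
    """Run one statement and report whether the database refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_unique = rejected("INSERT INTO employee VALUES (4, 'B-001', NULL)")
valid_fk = rejected("INSERT INTO employee_salary VALUES ('B-002', 5000)")
null_fk = rejected("INSERT INTO employee_salary VALUES (NULL, 4000)")
dangling_fk = rejected("INSERT INTO employee_salary VALUES ('B-999', 3000)")

print(duplicate_unique, valid_fk, null_fk, dangling_fk)  # True False False True
```

The duplicate badge code and the salary row pointing at a non-existent badge are refused, while the NULL foreign key is accepted because it references nothing.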
Some other differences between foreign, primary, and unique keys
While unique and primary keys both enforce uniqueness on the column(s) of one table, foreign keys define a relationship between two tables. A foreign key identifies a column or group of columns in one (referencing) table that refers to a column or group of columns in another (referenced) table. In our example above, the Employee table is the referenced table and the Employee Salary table is the referencing table. As we stated earlier, both unique and primary keys can be referenced by foreign keys.

http://www.programmerinterview.com/index.php/database-sql/simple-key-in-sql/
What is a simple key in a DBMS?
In a database table, a simple key is just a single attribute (which is just a column) that can uniquely identify a row. So any single column in a table that can uniquely identify a row is a simple key. It is called a simple key because it is composed of just one column (as opposed to multiple columns).

Example of a simple key
Let’s go through an example of a simple key. Consider a table called Employees. If every employee has a unique ID stored in a column called EmployeeID, then the EmployeeID column is a simple key, because it is a single column that can uniquely identify every row in the table (where each row is a separate employee). Simple, isn’t it?

What is the definition of a secondary key?
You may have heard the term secondary key in Oracle, MySQL, SQL Server, or whatever other DBMS you are dealing with. What exactly is a secondary key? Let’s start with a definition, followed by a simple example that will help you understand further. A given table may have more than one choice for a primary key: there may be another column (or combination of columns, for a multi-column primary key) that qualifies as a primary key.
Any column or combination of columns that could qualify as the primary key is known as a candidate key, because it is considered a candidate for the primary key. The candidate keys that are not selected as the primary key are known as secondary keys.

Example of a Secondary Key in SQL
Let’s go through an example of a secondary key. Consider a table called Managers that stores all of the managers in a company. Each manager has a unique Manager ID number, a physical address, and an email address. Let’s say that the Manager ID is chosen as the primary key of the Managers table. Both the physical address and the email address could have been selected as the primary key, because each is unique for every manager row in the Managers table. But because the email address and physical address were not selected as the primary key, they are considered secondary keys.

Provide a definition and example of a superkey in SQL.
In SQL, a superkey is a set of columns in a table for which no two rows share the same combination of values. So the superkey is unique for each and every row in the table. A superkey can also be just a single column.

Example of a superkey
Suppose we have a table called Managers that holds all the managers in a company. The table has columns called ManagerID, Name, Title, and DepartmentID. Every manager has his/her own ManagerID, so that value is always unique in each and every row. This means that if we combine the ManagerID value of any given row with any other column values, we get a unique set of values. So for the combinations (ManagerID, Name), (ManagerID, Title), (ManagerID, DepartmentID), (ManagerID, Name, DepartmentID), etc., no two rows in the table share the exact same combination of values, because the ManagerID is always unique and different for each row.
This means that pairing the ManagerID with any other column(s) ensures that the combination is also unique across all rows in the table. And that is exactly what defines a superkey: any combination of column(s) whose combined values are unique across all rows in a table. So all of the combinations of columns in the Managers table that we gave earlier are superkeys. Even the ManagerID column by itself is a superkey, although a special type of superkey, as you can read below.

What is a minimal superkey?
A minimal superkey is a superkey from which no column can be removed without losing uniqueness. In other words, it is a set of columns whose combined values are unique for every row in the table, where no proper subset of those columns has the same property. Remember that we mentioned earlier that a superkey can be just a single column. So in our example above, ManagerID is a minimal superkey, since it is unique for each and every row in the Managers table and cannot be made any smaller.

Can a table have multiple minimal superkeys?
Yes, a table can have multiple minimal superkeys. Let’s use our Managers table example again. Suppose we add a column for the Social Security Number (which, for our non-American readers, is a unique 9-digit number assigned to every citizen of the USA) to the Managers table; let’s just call it SSN. Since that column clearly has a unique value for every row in the table, it is also a minimal superkey, because it is only one column and it is unique for every row.

Can a minimal superkey have more than one column?
Absolutely. If no single column is unique for every row in a given table, but a combination of columns produces a unique value for every row, then that combination of columns can be a minimal superkey. This is of course provided that no column can be dropped from the combination while it still produces a unique value for each row.
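The superkey definitions above can be turned into a small check. This minimal sketch in plain Python tests whether a set of columns is a superkey of some sample rows, then finds the minimal superkeys (candidate keys) by trying column sets from smallest to largest and skipping supersets of keys already found. The Managers rows and SSN values are made up for illustration. One caveat: sample data can only show which column sets happen to be unique in those rows; whether a set really is a key is a rule the schema designer fixes for all possible rows.

```python
from itertools import combinations

def is_superkey(rows, cols):
    """True if no two rows share the same combination of values in cols."""
    seen = set()
    for row in rows:
        values = tuple(row[c] for c in cols)
        if values in seen:
            return False
        seen.add(values)
    return True

def minimal_superkeys(rows, columns):
    """Return the superkeys from which no column can be removed (the candidate keys)."""
    found = []
    for size in range(1, len(columns) + 1):
        for cols in combinations(columns, size):
            # A superset of a minimal superkey is a superkey, but not a minimal one.
            if any(set(key) <= set(cols) for key in found):
                continue
            if is_superkey(rows, cols):
                found.append(cols)
    return found

# Hypothetical Managers rows; the SSN values are fake placeholders.
columns = ("ManagerID", "Name", "Title", "DepartmentID", "SSN")
managers = [
    {"ManagerID": 1, "Name": "Alice", "Title": "Director", "DepartmentID": 10, "SSN": "000-00-0001"},
    {"ManagerID": 2, "Name": "Alice", "Title": "Director", "DepartmentID": 20, "SSN": "000-00-0002"},
    {"ManagerID": 3, "Name": "Bob", "Title": "Lead", "DepartmentID": 10, "SSN": "000-00-0003"},
    {"ManagerID": 4, "Name": "Alice", "Title": "Lead", "DepartmentID": 10, "SSN": "000-00-0004"},
    {"ManagerID": 5, "Name": "Alice", "Title": "Director", "DepartmentID": 10, "SSN": "000-00-0005"},
]

print(minimal_superkeys(managers, columns))          # [('ManagerID',), ('SSN',)]
print(is_superkey(managers, ("ManagerID", "Name")))  # True: a superkey, but not minimal
```

Because rows 1 and 5 share the same Name, Title and DepartmentID, no combination of those three columns is a superkey here, so ManagerID and SSN are the only minimal superkeys.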
Why is it called a superkey?
It is called a superkey because of the terms superset and subset from RDBMS (set) theory. A superkey is essentially any superset of a key, and every such superset will of course uniquely identify a row in a table.

Superkey versus candidate key
We discussed minimal superkeys and defined exactly what they are. Candidate keys are minimal superkeys, so candidate keys and minimal superkeys mean exactly the same thing.

What’s referential integrity?
Referential integrity is a relational database concept in which multiple tables share a relationship based on the data stored in the tables, and that relationship must remain consistent.

The concept of referential integrity, and one way in which it’s enforced, is best illustrated by an example. Suppose company X has 2 tables, an Employee table and an Employee Salary table. In the Employee table we have 2 columns: the employee ID and the employee name. In the Employee Salary table, we have 2 columns: the employee ID and the salary for the given ID.

Now, suppose we want to remove an employee because he no longer works at company X. Then we would remove his entry in the Employee table. Because he also exists in the Employee Salary table, we would also have to manually remove him from there. Manually removing the employee from the Employee Salary table can become quite a pain. And if there are other tables in which company X uses that employee, he would have to be deleted from those tables as well, which is an even bigger pain.

By enforcing referential integrity, we can solve that problem, so that we wouldn’t have to manually delete him from the Employee Salary table (or any others). Here’s how: first we would define the employee ID column in the Employee table to be our primary key. Then, we would define the employee ID column in the Employee Salary table to be a foreign key that points to that primary key, the employee ID column in the Employee table.
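The primary key and foreign key setup just described, together with the three referential integrity rules listed in the following paragraphs, can be sketched with Python's sqlite3 module. This is a minimal sketch: the table and column names and the sample employees Alice and Bob are hypothetical, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless this is on

conn.executescript("""
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT
    );
    CREATE TABLE employee_salary (
        employee_id INTEGER NOT NULL
            REFERENCES employee (employee_id)
            ON DELETE CASCADE    -- rule 2: cascading delete
            ON UPDATE CASCADE,   -- rule 3: cascading update
        salary INTEGER
    );
    INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO employee_salary VALUES (1, 50000), (2, 45000);
""")

# Rule 1: a salary row must point at an existing employee.
try:
    conn.execute("INSERT INTO employee_salary VALUES (99, 10000)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Rule 3: changing Bob's primary key also changes his salary row.
conn.execute("UPDATE employee SET employee_id = 20 WHERE employee_id = 2")

# Rule 2: deleting Alice also deletes her salary row.
conn.execute("DELETE FROM employee WHERE employee_id = 1")

salaries = conn.execute("SELECT employee_id, salary FROM employee_salary").fetchall()
print(orphan_rejected, salaries)  # True [(20, 45000)]
```

No application code touches employee_salary after the setup; the database applies the cascades itself, which is the point of declaring the constraint.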
Once we have defined our foreign-key-to-primary-key relationship, we would need to add what’s called a ‘constraint’ to the Employee Salary table. The particular constraint we would add is called a ‘cascading delete’: any time an employee is removed from the Employee table, any entries that employee has in the Employee Salary table are also automatically removed from the Employee Salary table.

Note in the example given above that referential integrity is something that must be enforced, and that we enforced only one rule of referential integrity (the cascading delete). There are actually 3 rules that referential integrity enforces:
1. We may not add a record to the Employee Salary table unless the foreign key for that record points to an existing employee in the Employee table.
2. If a record in the Employee table is deleted, all corresponding records in the Employee Salary table must be deleted using a cascading delete. This was the example we gave earlier.
3. If the primary key for a record in the Employee table changes, all corresponding records in the Employee Salary table must be modified using what's called a cascading update.

It’s worth noting that most RDBMSs (relational databases like Oracle, DB2, Teradata, etc.) can automatically enforce referential integrity if the right settings are in place. But a large part of the burden of maintaining referential integrity is placed upon whoever designs the database schema, basically whoever defined the tables and their structure/relationships in the database that you are using. Referential integrity is an important concept, and you simply must know it for any programmer interview.

In SQL, what’s the difference between the having clause and the where clause?
The difference between the having and where clauses is best illustrated by an example. Suppose we have a table called emp_bonus, as shown below. Note that the table has multiple entries for employees A and B.
emp_bonus
Employee  Bonus
A         1000
B         2000
A         500
C         700
B         1250

If we want to calculate the total bonus that each employee received, then we would write a SQL statement like this:

select employee, sum(bonus) from emp_bonus group by employee;

The Group By Clause

In the SQL statement above, you can see that we use the "group by" clause with the employee column. The group by clause collects the rows for each employee into one group, so that sum(bonus) is calculated once per employee. Using "group by" in combination with "sum(bonus)" gives us the sum of all the bonuses for each of employees A, B, and C. Running the SQL above would return this:

Employee  Sum(Bonus)
A         1500
B         3250
C         700

Now, suppose we wanted to find the employees who received more than $1,000 in bonuses for the year of 2007. You might think that we could write a query like this:

BAD SQL:
select employee, sum(bonus) from emp_bonus
group by employee where sum(bonus) > 1000;

The WHERE clause does not work with aggregates like SUM

The SQL above will not work, because the where clause doesn't work with aggregates like sum, avg, max, etc. (and a where clause has to come before group by in any case). Instead, what we need to use is the having clause. The having clause was added to SQL precisely so that we could compare aggregates to other values, just as the where clause is used with non-aggregates. Now, the correct SQL will look like this:

GOOD SQL:
select employee, sum(bonus) from emp_bonus
group by employee having sum(bonus) > 1000;

With the data above, this returns A (1500) and B (3250). C is filtered out because its total is only 700.

Difference between having and where clause

So we can see that the difference between the having and where clause in SQL is that the where clause cannot be used with aggregates, but the having clause can. One way to think of it is that the where clause filters individual rows before they are grouped. The having clause is an additional filter applied to the groups after the aggregates have been calculated.

How do database indexes work? And, how do indexes help? Provide a tutorial on database indexes.
Let's start out our tutorial and explanation of why you would need a database index by going through a very simple example. Suppose that we have a database table called Employee with three columns: Employee_Name, Employee_Age, and Employee_Address. Assume that the Employee table has thousands of rows.

Now, let's say that we want to run a query to find all the details of any employees who are named "Jesus". So, we decide to run a simple query like this:

SELECT * FROM Employee WHERE Employee_Name = 'Jesus'

What would happen without an index on the table?

Once we run that query, what exactly goes on behind the scenes to find employees who are named Jesus? Well, the database software would literally have to look at every single row in the Employee table to see if the Employee_Name for that row is "Jesus". We want every row with the name "Jesus" in it, so we cannot just stop looking once we find one such row, because other rows could have that name too. So, every row up until the last row must be searched. In this scenario, that means the database has to examine thousands of rows to find the rows with the name "Jesus". This is what is called a full table scan.

How a database index can help performance

You might be thinking that doing a full table scan sounds inefficient for something so simple. Shouldn't software be smarter? It's almost like looking through the entire table with the human eye: very slow and not at all sleek. But, as you probably guessed by the title of this article, this is where indexes can help a great deal. The whole point of having an index is to speed up search queries by cutting down the number of records/rows in a table that need to be examined.

What is an index?

So, what is an index? Well, an index is a data structure (most commonly a B-tree) that stores the values for a specific column in a table. An index is created on a column of a table.
So, the key points to remember are that an index consists of column values from one table, and that those values are stored in a data structure. The index is a data structure; remember that.

What kind of data structure is an index?

B-trees are the most commonly used data structures for indexes. B-trees are popular for indexes because they are time efficient: look-ups, deletions, and insertions can all be done in logarithmic time. Another major reason B-trees are commonly used is that the data stored inside a B-tree is kept in sorted order. The RDBMS typically determines which data structure is actually used for an index. But in some scenarios with certain RDBMSs, you can specify which data structure you want your database to use when you create the index itself.

How does a hash table index work?

Hash tables are another data structure that you may see being used as indexes. These indexes are commonly referred to as hash indexes. Hash indexes are used because hash tables are extremely efficient when it comes to just looking up values. So, queries that compare for equality to a string can retrieve values very fast if they use a hash index. For instance, the query we discussed earlier (SELECT * FROM Employee WHERE Employee_Name = 'Jesus') could benefit from a hash index created on the Employee_Name column. In a hash index, the column value is the key into the hash table. The value mapped to that key is just a pointer to the row data in the table. Since a hash table is basically an associative array, a typical entry would look something like "Jesus => 0x28939". Here 0x28939 is a reference to the table row where Jesus is stored in memory.
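To make the contrast concrete, here is a toy sketch in Python. A plain list stands in for the Employee table, and a full table scan checks every row. A dict plays the role of a hash index, mapping each Employee_Name to row positions that stand in for the disk pointers. The sample rows and function names are hypothetical, and real database hash indexes are far more involved. A list of positions is stored per key because, as noted above, several rows can share the same name.

```python
from collections import defaultdict

# Hypothetical sample rows: (Employee_Name, Employee_Age, Employee_Address).
employee = [
    ("Jesus", 33, "Nazareth"),
    ("Maria", 41, "Lima"),
    ("Jesus", 27, "Madrid"),
    ("Ahmed", 39, "Cairo"),
]

def full_table_scan(rows, name):
    """Visit every row; return the matches and how many rows were examined."""
    matches, examined = [], 0
    for row in rows:
        examined += 1  # no early exit: a later row could also match
        if row[0] == name:
            matches.append(row)
    return matches, examined

def build_hash_index(rows, column):
    """Map each value in `column` to the positions ("pointers") of its rows."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[row[column]].append(row_id)
    return index

def hash_lookup(index, rows, name):
    """Equality lookup: one hash probe, then follow each stored pointer."""
    return [rows[row_id] for row_id in index.get(name, [])]

name_index = build_hash_index(employee, 0)
scan_rows, examined = full_table_scan(employee, "Jesus")
index_rows = hash_lookup(name_index, employee, "Jesus")
print(examined, index_rows)
```

Both approaches return the same rows. The scan always touches all 4 rows, while the hash lookup goes straight to the 2 matching positions.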
Looking up a value like "Jesus" in a hash table index and getting back a reference to the row in memory is obviously a lot faster than scanning the table to find all the rows with a value of "Jesus" in the Employee_Name column.

The disadvantages of a hash index

Hash tables are not sorted data structures, and there are many types of queries which hash indexes cannot help with. For instance, suppose you want to find all of the employees who are less than 40 years old. How could you do that with a hash table index? Well, the hash index can't help here. A hash table is only good for looking up key-value pairs, which means queries that check for equality (like "WHERE name = 'Jesus'"). The keys of a hash table are not sorted or stored in any particular order, so there is no way to jump straight to "all ages below 40". This is why hash indexes are usually not the default data structure for database indexes: they aren't as flexible as B-trees when used as the index data structure. Also see: Binary trees versus Hash Tables.

What are some other types of indexes?

Indexes that use an R-tree data structure are commonly used to help with spatial problems. For instance, a query like "Find all of the Starbucks within 2 kilometers of me" is the type of query that could show enhanced performance if the database table uses an R-tree index.

Another type of index is a bitmap index. It works well on columns that contain only a few distinct values (like true and false), each repeated across many rows; basically, columns with low selectivity.

How does an index improve performance?

Because an index is basically a data structure that is used to store column values, looking up those values becomes much faster. And, if an index uses the most common data structure type, a B-tree, then the data structure is also sorted. Having the column values sorted can be a major performance enhancement; read on to find out why.
Let's say that we create a B-tree index on the Employee_Name column. Then, when we search for employees named "Jesus" using the SQL we showed earlier, the database does not have to search the entire Employee table. Instead, the database will use the index to find employees named Jesus, because the index is sorted alphabetically by the employee's name. And, because it is sorted, searching for a name is a lot faster, since all names starting with a "J" will be right next to each other in the index! It's also important to note that the index stores pointers to the table rows so that other column values can be retrieved; read on for more details on that.

What exactly is inside a database index?

So, now you know that a database index is created on a column in a table, and that the index stores the values in that specific column. But it is important to understand that a database index does not store the values in the other columns of the same table. For example, if we create an index on the Employee_Name column, the Employee_Age and Employee_Address column values are not also stored in the index. If we did store all the other columns in the index, it would be just like creating another copy of the entire table, which would take up far too much space and would be very inefficient.

An index also stores a pointer to the table row

So, if the value that we are looking for is found in an index (like "Jesus"), how does the database find the other values in the same row (like the address of Jesus and his age)? Well, it's quite simple: database indexes also store pointers to the corresponding rows in the table. A pointer is just a reference to the location where the row data is stored on disk. So, in addition to the column value, the index also stores a pointer to the row in the table where that value lives.
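Continuing the toy model, the sketch below uses the same hypothetical rows; the helper names are illustrative and not any database's API. It keeps (value, row position) entries in sorted order, roughly like the leaf level of a B-tree, and uses Python's bisect module for binary search. Equal names end up next to each other, so an equality lookup returns one contiguous slice. The same sorted layout also answers the "employees younger than 40" range query that a hash index cannot help with. This is only an approximation: a real B-tree also supports cheap insertions and deletions, which two sorted Python lists do not.

```python
import bisect

# Hypothetical sample rows: (Employee_Name, Employee_Age, Employee_Address).
employee = [
    ("Jesus", 33, "Nazareth"),
    ("Maria", 41, "Lima"),
    ("Jesus", 27, "Madrid"),
    ("Ahmed", 39, "Cairo"),
]

def build_sorted_index(rows, column):
    """Return parallel lists: sorted column values and their row positions (pointers)."""
    pairs = sorted((row[column], row_id) for row_id, row in enumerate(rows))
    keys = [value for value, _ in pairs]
    row_ids = [row_id for _, row_id in pairs]
    return keys, row_ids

def equal_lookup(index, rows, value):
    """Binary-search the contiguous run of equal keys, then follow the pointers."""
    keys, row_ids = index
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return [rows[i] for i in row_ids[lo:hi]]

def less_than_lookup(index, rows, limit):
    """Range query: every key before the first key >= limit."""
    keys, row_ids = index
    end = bisect.bisect_left(keys, limit)
    return [rows[i] for i in row_ids[:end]]

name_index = build_sorted_index(employee, 0)
age_index = build_sorted_index(employee, 1)
jesus_rows = equal_lookup(name_index, employee, "Jesus")
under_40 = less_than_lookup(age_index, employee, 40)
print(jesus_rows)
print(under_40)
```

The address values in jesus_rows come from following the stored row positions. The index itself only holds names and pointers.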
This means that one of the values (or nodes) in the index for Employee_Name could be something like ("Jesus", 0x82829), where 0x82829 is the address on disk (the pointer) where the row data for "Jesus" is stored. Without that pointer, all you would have is a single value. That value would be meaningless, because you could not retrieve the other values in the same row, like the address and the age of an employee.

How does a database know when to use an index?

When a query like "SELECT * FROM Employee WHERE Employee_Name = 'Jesus'" is run, the database will check to see if there is an index on the column(s) being queried. Assume the Employee_Name column does have an index created on it. The database will then have to decide whether it actually makes sense to use the index to find the values being searched. In some scenarios it is less efficient to use the database index and more efficient just to scan the entire table. Read this article to understand more about those scenarios: Selectivity in SQL.

Can you force the database to use an index on a query?

Generally, you will not tell the database when to use an index; that decision is made by the database itself (specifically, by its query optimizer). That said, many databases (like Oracle and MySQL) let you specify that you want an index to be used, for example through optimizer hints in Oracle or FORCE INDEX in MySQL.

How to create an index in SQL:

Here's what the actual SQL would look like to create an index on the Employee_Name column from our example earlier:

CREATE INDEX name_index ON Employee (Employee_Name);

How to create a multi-column index in SQL:

We could also create an index on two of the columns in the Employee table, as shown in this SQL. The index gets its own name, since two indexes on the same table cannot share the name name_index:

CREATE INDEX name_age_index ON Employee (Employee_Name, Employee_Age);

What is a good analogy for a database index?

A very good analogy is to think of a database index as an index in a book.
Suppose you have a book about dogs and you are looking for the section on Golden Retrievers. Why would you flip through the entire book (the equivalent of a full table scan in database terminology)? You can just go to the index at the back of the book, which tells you the exact pages where you can find information on Golden Retrievers. Similarly, just as a book index contains a page number, a database index contains a pointer to the row containing the value that you are searching for in your SQL.

What is the cost of having a database index?

So, what are some of the disadvantages of having a database index? Well, for one thing, it takes up space, and the larger your table, the larger your index. Another performance hit comes from the fact that whenever you add, delete, or update rows in the corresponding table, the same operations have to be done to your index. Remember that an index needs to contain the same up-to-the-minute data as whatever is in the table column(s) that the index covers.

As a general rule, an index should only be created on a table if the data in the indexed column will be queried frequently.

What is a self join? Explain it with an example and tutorial.

Let's illustrate the need for a self join with an example. Suppose we have the following table, called employee. The employee table has 2 columns, one for the employee name (called employee_name) and one for the employee location (called employee_location):

employee
employee_name  employee_location
Joe            New York
Sunil          India
Alex           Russia
Albert         Canada
Jack           New York

Now, suppose we want to find out which employees are from the same location as the employee named Joe. In this example, that location would be New York.
For the sake of our example, let's assume that we cannot just search the table directly with a simple query like the one below. Maybe we don't want to hardcode the city name in the SQL query:

SELECT employee_name FROM employee WHERE employee_location = 'New York'

So, instead of a query like that, we could write a nested SQL query like the one below. A nested query is basically a query within another query, more commonly called a subquery:

SELECT employee_name
FROM employee
WHERE employee_location IN (
    SELECT employee_location
    FROM employee
    WHERE employee_name = 'Joe')

Using a subquery for such a simple question can be inefficient. Is there a more efficient and elegant solution to this problem? It turns out that there is an alternative that is often more efficient: we can use something called a self join.

A self join is basically when a table is joined to itself. The way you should visualize a self join for a given table is by imagining that a join is performed between two identical copies of that table. And that is exactly why it is called a self join: it's just the same table being joined to another copy of itself rather than to a different table.

How does a self join work

Before we come up with a solution for this problem using a self join, we should go over some concepts so that you can fully understand how a self join works. This will also make the SQL in our self join tutorial, which you will see further below, a lot easier to understand.

A self join must have aliases

In a self join we are joining the same table to itself by essentially creating two copies of that table. But how do we distinguish between the two copies of the table, when there is only one table name? Well, when we do a self join, the tables absolutely must be given aliases. Otherwise the column names would be ambiguous.
In other words, we would not know which table's columns are being referenced without using aliases for the two copies of the table. If you don't already know what an alias is, it's simply another name given to a table, and that name is then used in the SQL query to reference the table. So, we will just use the aliases e1 and e2 for the employee table when we do a self join.

Self join predicate

As with any join, there must be a condition upon which a self join is performed; we cannot just arbitrarily say "do a self join" without specifying some condition. That condition will be our join predicate. If you need a refresher on join predicates (or just joins in general), then check this link out: Inner vs. Outer joins.

Now, let's come up with a solution to the original problem using a self join instead of a subquery. This will help illustrate how exactly a self join works. The key question that we must ask ourselves is: what should our join predicate be in this example? Well, we want to find all the employees who have the same location as Joe. Both tables are the same table (employee, aliased as e1 and e2), and we want to match them on location. So our join predicate should clearly be "WHERE e1.employee_location = e2.employee_location". But is that enough to give us what we want? No, it's not. We also want to filter the rows returned, since we only want people who are from the same location as Joe.

So, how can we filter the rows returned so that only people from Joe's location are returned? Well, we can simply add a condition on one of the tables (e2 in our example) so that it only returns the row where the name is Joe. Then, the other table (e1) will match up all the names that have the same location in e2, because of our join predicate, "WHERE e1.employee_location = e2.employee_location". We will then just select the names from e1, and not e2, because e2 will only have Joe's name.
If that's confusing, then keep reading to understand more about how the query will work. So, the self join query that we come up with looks like this:

Self Join SQL Example

SELECT e1.employee_name
FROM employee e1, employee e2
WHERE e1.employee_location = e2.employee_location
AND e2.employee_name = 'Joe';

This query will return the names Joe and Jack, since Jack is the only other person who lives in New York like Joe.

Queries that refer to the same table more than once can often be simplified by rewriting them as self joins. Depending on the database and its query optimizer, there may be a performance benefit as well.

What does a self join look like?

It helps tremendously to visualize the actual results of a self join internally. Remember that a self join is just like any other join, where the two tables are merged into one temporary table. First, visualize two separate copies of the employee table, given the aliases e1 and e2. These copies would simply look like the listing below. We shortened the column names from employee_name and employee_location to just Name and Location for convenience:

e1                      e2
Name    Location        Name    Location
Joe     New York        Joe     New York
Sunil   India           Sunil   India
Alex    Russia          Alex    Russia
Albert  Canada          Albert  Canada
Jack    New York        Jack    New York

And the final result of running the self join query above (the actual joined table) would look like this:

e1.employee_name  e1.employee_location  e2.employee_name  e2.employee_location
Joe               New York              Joe               New York
Jack              New York              Joe               New York

Self joins versus inner joins

Are self joins and inner joins the same? You might be wondering if all self joins are also inner joins. After all, our example above uses an inner join, because only the rows that match based on the join predicate are returned; non-matching rows are not returned. Well, it turns out that a self join and an inner join are completely different concepts.
A self join could just as well be an outer join or an inner join; it just depends on how the query is written. We could easily change the query we used above to do a LEFT OUTER JOIN. The query would still be a self join, but it wouldn't give us the results we want in our example. So, we use an implied inner join instead, because that gives us the correct results. Remember that a query is a self join as long as the two tables being joined are exactly the same table. Whether it's an inner join or an outer join depends on what is specified in the SQL. And inner/outer joins are separate concepts entirely from a self join.

Self joins manager employee example

The most commonly used example for self joins is the classic employee manager table. The table is called Employee, but it holds all employees, including their managers. Every employee has an ID, and there is also a column for the manager ID. So, for example, let's say we have a table that looks like this, and we call it Employee:

EmployeeID  Name            ManagerID
1           Sam             10
2           Harry           4
4           Manager         NULL
10          AnotherManager  NULL

Notice that in the table above there are two managers, conveniently named "Manager" and "AnotherManager". Those managers don't have managers of their own, as shown by the NULL value in their ManagerID column.

Now, given the table above, how can we show each employee's name next to his/her manager's name? We want the results nicely arranged, with the employee in one column and the manager's name in the other. Well, it turns out we can use a self join to do this. Try to come up with the SQL on your own before reading our answer.

Self join manager employee answer

In order to come up with a correct answer for this problem, our goal should be to perform a self join that will have both the employee information and the manager information in one row. First, since we are doing a self join, it helps to visualize the one table as two tables; let's give them aliases of e1 and e2.
Now, with that in mind, we want the employee's information on one side of the joined table and the manager's information on the other side. So, let's say that we want e1 to hold the employee information and e2 to hold the corresponding manager's information. What should our join predicate be in that case? Well, the join predicate should look like "ON e1.ManagerID = e2.EmployeeID". This says that we should join the two tables (a self join) based on the condition that the manager ID in e1 is equal to the employee ID in e2. In other words, for each employee row in e1, the matching row in e2 holds that employee's manager.

An illustration will help clarify this. Suppose we use that predicate and just select everything after we join the tables. So, our SQL would look like this:

SELECT *
FROM Employee e1
INNER JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID

The results of running the query above would look like this:

e1.EmployeeID  e1.Name  e1.ManagerID  e2.EmployeeID  e2.Name         e2.ManagerID
1              Sam      10            10             AnotherManager  NULL
2              Harry    4             4              Manager         NULL

Note that only 2 rows are returned. An inner join is performed, so a row is returned only when an employee's manager ID matches some employee ID. The 2 people without managers have a manager ID of NULL, and a NULL manager ID never matches any employee ID. So those two employees are not returned as part of table e1.

Now, remember that we only want to return the names of the employee and corresponding manager as a pair. So, we can fine-tune the SQL as follows:

SELECT e1.Name, e2.Name
FROM Employee e1
INNER JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID

Running the SQL above would return:

e1.Name  e2.Name
Sam      AnotherManager
Harry    Manager

And that is the answer to the employee manager problem using a self join!
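The two self joins in this article can be checked with Python's sqlite3 module, using SQLite as an assumed test database. The sketch uses the explicit JOIN ... ON syntax, which is equivalent to the comma-and-WHERE form shown earlier, and adds ORDER BY only so the output order is predictable. Each example gets its own in-memory database, because SQLite treats the table names employee and Employee as the same name. The sketch also runs the LEFT OUTER JOIN variant mentioned earlier, which keeps the two managers (with a NULL manager name) instead of dropping them.

```python
import sqlite3

# Example 1: employees who share Joe's location.
loc_db = sqlite3.connect(":memory:")
loc_db.executescript("""
    CREATE TABLE employee (employee_name TEXT, employee_location TEXT);
    INSERT INTO employee VALUES
        ('Joe', 'New York'), ('Sunil', 'India'), ('Alex', 'Russia'),
        ('Albert', 'Canada'), ('Jack', 'New York');
""")
same_location = loc_db.execute("""
    SELECT e1.employee_name
    FROM employee e1
    JOIN employee e2 ON e1.employee_location = e2.employee_location
    WHERE e2.employee_name = 'Joe'
    ORDER BY e1.employee_name
""").fetchall()

# Example 2: each employee paired with his or her manager.
mgr_db = sqlite3.connect(":memory:")
mgr_db.executescript("""
    CREATE TABLE Employee (EmployeeID INTEGER, Name TEXT, ManagerID INTEGER);
    INSERT INTO Employee VALUES
        (1, 'Sam', 10), (2, 'Harry', 4),
        (4, 'Manager', NULL), (10, 'AnotherManager', NULL);
""")
inner_pairs = mgr_db.execute("""
    SELECT e1.Name, e2.Name
    FROM Employee e1
    INNER JOIN Employee e2 ON e1.ManagerID = e2.EmployeeID
    ORDER BY e1.EmployeeID
""").fetchall()

# Still a self join, but an outer one: employees without a manager are kept.
outer_pairs = mgr_db.execute("""
    SELECT e1.Name, e2.Name
    FROM Employee e1
    LEFT OUTER JOIN Employee e2 ON e1.ManagerID = e2.EmployeeID
    ORDER BY e1.EmployeeID
""").fetchall()

print(same_location)
print(inner_pairs)
print(outer_pairs)
```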
Suppose we have the Employee table below, and we want to retrieve all of the cities that the employees live in, but we don't want any duplicates. How can we do this in SQL?

employee
employee_name  employee_location
Joe            New York
Sunil          India
Alex           Russia
Albert         Canada
Jack           New York
Alex           Russia

In SQL, the DISTINCT keyword will allow us to do that. Here's what the simple SQL would look like:

SELECT DISTINCT employee_location FROM employee;

Running this query will return the following results (the row order may differ between databases, since no ORDER BY is specified):

employee_location
New York
India
Russia
Canada

So, you can see that the duplicated values "New York" and "Russia" are each returned only once.

It's worth noting that the DISTINCT keyword can be used with more than one column. In that case, each unique combination of the column values is returned once. Again, this is best illustrated by an example. Suppose we run the following SQL:

SELECT DISTINCT employee_name, employee_location FROM employee;

If we run the SQL above, it will return this:

employee_name  employee_location
Joe            New York
Sunil          India
Alex           Russia
Albert         Canada
Jack           New York

Note that the second "Alex, Russia" entry is missing from the result set above. This is because we are selecting distinct combinations of name and location. When 2 entries have exactly the same name and location, the SQL above returns only one of them.

In the table below, how would you retrieve the unique values for employee_location without using the DISTINCT keyword?

employee
employee_name  employee_location
Joe            New York
Sunil          India
Alex           Russia
Albert         Canada
Jack           New York
Alex           Russia

We can actually accomplish this with the GROUP BY clause. Here's what the SQL would look like:

SELECT employee_location FROM employee GROUP BY employee_location;

Running this query will return the following results:

employee_location
New York
India
Russia
Canada

So, you can see that the duplicated values "New York" and "Russia" are again returned only once.
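The DISTINCT and GROUP BY approaches can be compared side by side in SQLite through Python's sqlite3 module. This is an assumed test setup, not something the article prescribes. ORDER BY is added to make the row order deterministic, since neither query guarantees an order without it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_name TEXT, employee_location TEXT);
    INSERT INTO employee VALUES
        ('Joe', 'New York'), ('Sunil', 'India'), ('Alex', 'Russia'),
        ('Albert', 'Canada'), ('Jack', 'New York'), ('Alex', 'Russia');
""")

# Unique locations via DISTINCT.
distinct_locations = conn.execute(
    "SELECT DISTINCT employee_location FROM employee ORDER BY employee_location"
).fetchall()

# Unique locations via GROUP BY (one output row per group).
grouped_locations = conn.execute(
    "SELECT employee_location FROM employee "
    "GROUP BY employee_location ORDER BY employee_location"
).fetchall()

# DISTINCT over two columns removes only exact (name, location) duplicates.
distinct_pairs = conn.execute(
    "SELECT DISTINCT employee_name, employee_location FROM employee"
).fetchall()

print(distinct_locations == grouped_locations, len(distinct_pairs))
```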
This is a valid alternative to using the DISTINCT keyword. If you need a refresher on the GROUP BY clause, then check out this question: Group By and Having. This question would probably be asked just to see how good you are at coming up with alternative options for SQL queries, although it probably doesn't prove much about your SQL skills.

Practice SQL Interview questions and Answers

There's no better way to improve your SQL skills than to practice with some real SQL interview questions, and these SQL practice problems are a great way to do that online. We recommend first creating the simple tables presented below in the RDBMS software of your choice (MySQL, Oracle, DB2, SQL Server, etc.). Then try to figure out the answers on your own if possible. The following SQL practice exercises were actually taken from real interview tests with Google and Amazon. Once again, we highly recommend that you try to answer these exercises on your own before reading the given solutions. The practice problems are based on the tables presented below.

Salesperson
ID  Name   Age  Salary
1   Abe    61   140000
2   Bob    34   44000
5   Chris  34   40000
7   Dan    41   52000
8   Ken    57   115000
11  Joe    38   38000

Customer
ID  Name      City      Industry Type
4   Samsonic  pleasant  J
6   Panasung  oaktown   J
7   Samony    jackson   B
9   Orange    Jackson   B

Orders
Number  order_date  cust_id  salesperson_id  Amount
10      8/2/96      4        2               540
20      1/30/99     4        8               1800
30      7/14/95     9        1               460
40      1/29/98     7        2               2400
50      2/3/98      6        7               600
60      3/2/98      6        7               720
70      5/6/98      9        7               150

Given the tables above, find the following:

a. The names of all salespeople that have an order with Samsonic.
b. The names of all salespeople that do not have any order with Samsonic.
c. The names of salespeople that have 2 or more orders.
d. Write a SQL statement to insert rows into a table called highAchiever(Name, Age), where a salesperson must have a salary of 100,000 or greater to be included in the table.
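If you want to follow the recommendation above, here is one way to create and populate the practice tables. It is sketched with Python's sqlite3 module and an in-memory SQLite database, which is an assumption; any RDBMS will do. The column types, the REFERENCES clauses, and the name IndustryType (for the "Industry Type" column) are our own choices. The order dates are stored as plain text exactly as printed.

```python
import sqlite3

def create_practice_tables(conn):
    """Create and populate Salesperson, Customer and Orders from the tables above."""
    conn.executescript("""
        CREATE TABLE Salesperson (
            ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER);
        CREATE TABLE Customer (
            ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, IndustryType TEXT);
        CREATE TABLE Orders (
            Number INTEGER PRIMARY KEY, order_date TEXT,
            cust_id INTEGER REFERENCES Customer(ID),
            salesperson_id INTEGER REFERENCES Salesperson(ID),
            Amount INTEGER);

        INSERT INTO Salesperson VALUES
            (1, 'Abe', 61, 140000), (2, 'Bob', 34, 44000), (5, 'Chris', 34, 40000),
            (7, 'Dan', 41, 52000), (8, 'Ken', 57, 115000), (11, 'Joe', 38, 38000);
        INSERT INTO Customer VALUES
            (4, 'Samsonic', 'pleasant', 'J'), (6, 'Panasung', 'oaktown', 'J'),
            (7, 'Samony', 'jackson', 'B'), (9, 'Orange', 'Jackson', 'B');
        INSERT INTO Orders VALUES
            (10, '8/2/96', 4, 2, 540), (20, '1/30/99', 4, 8, 1800),
            (30, '7/14/95', 9, 1, 460), (40, '1/29/98', 7, 2, 2400),
            (50, '2/3/98', 6, 7, 600), (60, '3/2/98', 6, 7, 720),
            (70, '5/6/98', 9, 7, 150);
    """)

conn = sqlite3.connect(":memory:")
create_practice_tables(conn)
counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("Salesperson", "Customer", "Orders")}
print(counts)
```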
Let's start by answering part a. It's obvious that we would need to do a SQL join, because the data in one table will not be enough to answer this question. This is a good question to get some practice with SQL joins, so see if you can come up with the solution.

Now, what tables should we use for the join? We know that the customer ID of Samsonic is 4, so we can use that information and do a simple join with the Salesperson and Orders tables. The SQL would look like this:

select Salesperson.Name from Salesperson, Orders
where Salesperson.ID = Orders.salesperson_id and cust_id = 4;

With the sample data, this returns Bob and Ken.

We can also use subqueries (a query within a query) to come up with another possible answer. Here is an alternative, possibly less efficient, solution using a subquery. Note the IN operator: the subquery can return more than one salesperson ID, so a plain "=" comparison would not work:

select Salesperson.Name from Salesperson
where Salesperson.ID IN (
    select Orders.salesperson_id from Orders, Customer
    where Orders.cust_id = Customer.ID and Customer.Name = 'Samsonic');

Let's now work on answering parts B and C of the original question, using the same tables as above.

Here is part B: Find the names of all salespeople that do not have any orders with Samsonic.

This is part C: Find the names of salespeople that have 2 or more orders.

Part B of the question asks for the names of the salespeople who do not have an order with Samsonic.
A good way to approach this problem is to break it down. First, find all the salespeople who do have an order with Samsonic. Then we can work with that list to get all the salespeople who do not have an order with Samsonic.

So, let's start by just getting a list of the IDs of all the salespeople who have an order with Samsonic. We can get this list by joining the Customer and Orders tables, with the condition that the customer is Samsonic. The SQL for this will look like:

select Orders.salesperson_id from Orders, Customer
where Orders.cust_id = Customer.ID and Customer.Name = 'Samsonic'

This will give us a list of the IDs of all the salespeople who have an order with Samsonic. Now, we can get a list of the names of all the salespeople who do NOT have an order with Samsonic. SQL has a NOT IN operator that easily allows us to exclude elements of the result set. We can use this to our advantage. Here is one possible answer to question B, and this is what the final SQL will look like:

select Salesperson.Name from Salesperson
where Salesperson.ID NOT IN (
    select Orders.salesperson_id from Orders, Customer
    where Orders.cust_id = Customer.ID and Customer.Name = 'Samsonic')

With the sample data, this returns Abe, Chris, Dan, and Joe.

Now, let's work on answering part C. As always, it's best to break the problem down into more manageable pieces. So, let's focus on one table: the Orders table. Looking at that table, we can find the IDs that belong to the salespeople who have 2 or more orders. This will require the "group by" syntax in SQL, which allows us to group by whatever column we choose. In this case, we would group by the salesperson_id column, because we want to know how many orders were placed under each salesperson ID.
With that said, we can write this SQL:

select salesperson_id from Orders
group by salesperson_id
having count(salesperson_id) > 1

Note how we used the having clause instead of the where clause, because we are using the count aggregate. Now we have a SQL statement that gives us the IDs of the salespeople who have more than 1 order. But what we really want is the names of the salespeople who have those IDs. This is actually quite simple if we join the Salesperson and Orders tables and reuse the SQL that we came up with earlier. It would look like this:

SELECT name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.id
GROUP BY name, salesperson_id
HAVING COUNT(salesperson_id) > 1

Based on our tables, this SQL will return the names of Bob and Dan.

We've finally come to the last part of this question. Part D is presented below again for your convenience.

Part D: Write a SQL statement to insert rows into a table called highAchiever(Name, Age), where a salesperson must have a salary of 100,000 or greater to be included in the table.

Looking at part D, it's easy to write the SQL condition that the salary of the salesperson must be greater than or equal to 100,000. It would look like this: "WHERE SALARY >= 100000". The only slightly difficult part of this question is how we insert values into the highAchiever table while selecting values from the Salesperson table. It turns out that the SQL for this is:

insert into highAchiever (name, age)
select name, age from salesperson where salary >= 100000;

We are inserting values into the highAchiever table based on what we select from another table. So we don't use the "values" clause that we would normally use when inserting.
This is what a regular insertion looks like (note the use of the VALUES clause):

insert into highAchiever (name, age) values ('Jackson', 28)

As you can see, the answer to this one is pretty simple.
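The answers to parts B, C and D can be sanity-checked against a small in-memory database. This is a sketch under stated assumptions: the practice tables are not reproduced in this section, so the rows below borrow the Salesperson, Customer and Orders data shown in the advanced questions later on, and Python's built-in sqlite3 module stands in for whatever database the interview assumes. The SQL strings follow the answers above, with the salary condition written as >= 100000, as part D requires.

```python
import sqlite3

# Sample rows borrowed from the advanced-question tables (assumption: the
# practice questions use the same data). Order amounts are not needed here.
SALESPEOPLE = [(1, "Abe", 61, 140000), (2, "Bob", 34, 44000), (5, "Chris", 34, 40000),
               (7, "Dan", 41, 52000), (8, "Ken", 57, 115000), (11, "Joe", 38, 38000)]
CUSTOMERS = [(4, "Samsonic"), (6, "Panasung"), (7, "Samony"), (9, "Orange")]
# (Number, cust_id, salesperson_id)
ORDERS = [(10, 4, 2), (20, 4, 8), (30, 9, 1), (40, 7, 2),
          (50, 6, 7), (60, 6, 7), (70, 9, 7)]


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Salesperson (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER);
        CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Orders (Number INTEGER PRIMARY KEY, cust_id INTEGER, salesperson_id INTEGER);
        CREATE TABLE highAchiever (Name TEXT, Age INTEGER);
    """)
    conn.executemany("INSERT INTO Salesperson VALUES (?, ?, ?, ?)", SALESPEOPLE)
    conn.executemany("INSERT INTO Customer VALUES (?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", ORDERS)
    return conn


def part_b(conn):
    # Part B: salespeople with no order from Samsonic, via NOT IN.
    rows = conn.execute("""
        select Salesperson.Name from Salesperson
        where Salesperson.ID NOT IN (
            select Orders.salesperson_id from Orders, Customer
            where Orders.cust_id = Customer.ID and Customer.Name = 'Samsonic')
    """).fetchall()
    return sorted(name for (name,) in rows)


def part_c(conn):
    # Part C: salespeople with more than one order, via GROUP BY ... HAVING.
    rows = conn.execute("""
        SELECT name FROM Orders, Salesperson
        WHERE Orders.salesperson_id = Salesperson.id
        GROUP BY name, salesperson_id
        HAVING COUNT(salesperson_id) > 1
    """).fetchall()
    return sorted(name for (name,) in rows)


def part_d(conn):
    # Part D: INSERT ... SELECT copies qualifying rows without a VALUES clause.
    conn.execute("""
        insert into highAchiever (name, age)
        select name, age from Salesperson where salary >= 100000
    """)
    return sorted(conn.execute("SELECT Name, Age FROM highAchiever").fetchall())


if __name__ == "__main__":
    conn = build_db()
    print(part_b(conn))  # ['Abe', 'Chris', 'Dan', 'Joe']
    print(part_c(conn))  # ['Bob', 'Dan']
    print(part_d(conn))  # [('Abe', 61), ('Ken', 57)]
```

One design note on part B: if the subquery could ever return a NULL salesperson_id, the NOT IN test would evaluate to unknown for every row and return nothing; a NOT EXISTS formulation does not have that pitfall.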
Practice SQL Interview Question #2

This question was asked in a Google interview. Given the 2 tables below, User and UserHistory:

User: user_id, name, phone_num
UserHistory: user_id, date, action

1. Write a SQL query that returns the name, phone number and most recent date for any user that has logged in over the last 30 days (you can tell a user has logged in if the action field in UserHistory is set to "logged_on"). Every time a user logs in, a new row is inserted into the UserHistory table with the user_id, the current date and the action (where action = "logged_on").

2. Write a SQL query to determine which user_ids in the User table are not contained in the UserHistory table (assume the UserHistory table has a subset of the user_ids in the User table). Do not use the SQL MINUS statement. Note: the UserHistory table can have multiple entries for each user_id.

Note that your SQL should be compatible with MySQL 5.0 and should avoid using subqueries.

Let's start with #1 by breaking the problem down into smaller, more manageable problems.
Then we can take the pieces and combine them into a solution to the overall problem. Figuring out how to tell whether a user has logged on in the past 30 days seems like a good place to start, so let's see how to express this in MySQL (MySQL's date functions are the place to look). MySQL has a date_sub function: we pass it the current date and an interval of 30 days, and it returns the date 30 days before today. Once we have that date, we can compare it with the date in the UserHistory table to see whether it falls within the last 30 days. The remaining question is how to retrieve the current date. This is simple, because MySQL has a built-in function called curdate() that returns the current date. So, using the date_sub function, we can come up with this piece of SQL:

UserHistory.date >= date_sub(curdate(), interval 30 day)

This checks that the date in the UserHistory table falls within the last 30 days. Note that we use the >= operator to compare dates: we are simply saying that the date in the UserHistory table is greater than or equal to the date returned by date_sub. A date is "greater" than another date when it occurs later. So 2007-09-07 is considered "greater" than 2006-08-19, because 2007-09-07 occurs later than 2006-08-19.

Now, that's only one piece of the overall problem, so let's continue. The problem asks us to retrieve the name, phone number, and most recent date for any user that has logged in over the last 30 days. One table has the user_id and the phone number, but only the other table contains the actual date. Clearly, we have to join the 2 tables to combine the data into a form that lets us solve this problem.
And since the 2 tables share only one column, user_id, it's clear which column we will use to join them. Doing the join, selecting the required fields, and applying the date condition looks like this:

select name, phone_num, date
from User, UserHistory
where User.user_id = UserHistory.user_id
  and UserHistory.date >= date_sub(curdate(), interval 30 day)

So far, we are selecting the name, phone number, and date for any user that has logged in over the last 30 days. But wait a minute: the problem specifically asks for "the most recent date for any user that's logged in over the last 30 days." The problem is that we could get multiple entries for a user who logged on more than once in the last 30 days. That is not what we want. We want the most recent date that someone logged on in the last 30 days, which means at most 1 row per user. So how do we get the most recent date? This is quite simple again, because MySQL provides a MAX aggregate function. Given a group of dates, MAX returns the "maximum" date, which is just the most recent one. Because MAX is an aggregate function, we need a GROUP BY clause to specify which column defines each group of dates. So now our SQL looks like this:

select User.name, User.phone_num, max(UserHistory.date)
from User, UserHistory
where User.user_id = UserHistory.user_id
  and UserHistory.date >= date_sub(curdate(), interval 30 day)
group by (User.user_id);

Now all we need is to add the condition that checks that the user's action equals "logged_on".
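Why that last condition matters can be seen by running the query with and without it on a tiny copy of the two tables. This is a sketch with invented sample rows (the users Ann, Ben and Cy, their phone numbers, and a "logged_off" action are all hypothetical), run through Python's built-in sqlite3 module. SQLite has no date_sub or curdate, so the sketch uses SQLite's date('now', '-30 days') as the equivalent cutoff.

```python
import sqlite3
from datetime import date, timedelta


def days_ago(n: int) -> str:
    # ISO 'YYYY-MM-DD' strings compare correctly as text in SQLite.
    return (date.today() - timedelta(days=n)).isoformat()


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE User (user_id INTEGER PRIMARY KEY, name TEXT, phone_num TEXT);
        CREATE TABLE UserHistory (user_id INTEGER, date TEXT, action TEXT);
    """)
    # Hypothetical sample data.
    conn.executemany("INSERT INTO User VALUES (?, ?, ?)",
                     [(1, "Ann", "555-0101"), (2, "Ben", "555-0102"), (3, "Cy", "555-0103")])
    conn.executemany("INSERT INTO UserHistory VALUES (?, ?, ?)", [
        (1, days_ago(10), "logged_on"),
        (1, days_ago(2), "logged_on"),    # Ann logged on twice recently
        (2, days_ago(5), "logged_on"),
        (2, days_ago(3), "logged_off"),   # Ben's latest recent row is not a login
        (3, days_ago(60), "logged_on"),   # Cy's only login is too old
    ])
    return conn


# MAX + GROUP BY without the action filter.
WITHOUT_ACTION = """
    select User.name, User.phone_num, max(UserHistory.date)
    from User, UserHistory
    where User.user_id = UserHistory.user_id
      and UserHistory.date >= date('now', '-30 days')
    group by User.user_id
"""

# Same query with the action filter added.
WITH_ACTION = """
    select User.name, User.phone_num, max(UserHistory.date)
    from User, UserHistory
    where User.user_id = UserHistory.user_id
      and UserHistory.action = 'logged_on'
      and UserHistory.date >= date('now', '-30 days')
    group by User.user_id
"""

if __name__ == "__main__":
    conn = build_db()
    print(sorted(conn.execute(WITHOUT_ACTION).fetchall()))
    print(sorted(conn.execute(WITH_ACTION).fetchall()))
```

Ann's two recent logins collapse into one row with the later date, Cy is excluded by the 30-day cutoff, and Ben's reported date moves from his log-off row to his actual login once the action filter is added. SQLite, like MySQL 5.0, accepts User.name and User.phone_num without listing them in the GROUP BY; that is safe here only because each user_id has exactly one name and phone number.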
So the final SQL, and the answer to the problem, looks like this:

select User.name, User.phone_num, max(UserHistory.date)
from User, UserHistory
where User.user_id = UserHistory.user_id
  and UserHistory.action = 'logged_on'
  and UserHistory.date >= date_sub(curdate(), interval 30 day)
group by (User.user_id);

Phew! We are finally done with question 1. Now let's continue with question 2, which uses the same User and UserHistory tables: write a SQL query to determine which user_ids in the User table are not contained in the UserHistory table, without using the SQL MINUS statement or any subqueries (the SQL should be compatible with MySQL 5.0, and UserHistory can have multiple entries for each user_id).

Basically, we want the user_ids that exist in the User table but not in the UserHistory table. A regular inner join on the user_id column would only return the rows in which the User and UserHistory tables share the same user_id values. But the question specifically asks for the user_ids that are in the User table but are not in the UserHistory table, so an inner join will not work.

What if, instead of an inner join, we use a left outer join on the user_id column? This retains all the user_id values from the User table (our "left" table) even when there is no matching user_id in the "right" table (here, UserHistory). When there is no matching record in the right table, its columns simply show up as NULL. This means that any row with a NULL UserHistory.user_id corresponds to a user_id that exists in the User table but not in the UserHistory table. This is exactly what we need to answer the question.
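To see the NULLs described above, the sketch below runs the left outer join with no filter at all on a tiny in-memory database. The three users and their history rows are invented for illustration, and SQLite (via Python's built-in sqlite3 module) is used only as a convenient stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (user_id INTEGER PRIMARY KEY, name TEXT, phone_num TEXT);
    CREATE TABLE UserHistory (user_id INTEGER, date TEXT, action TEXT);
    -- Hypothetical rows: user 1 has two history rows, user 2 one, user 3 none.
    INSERT INTO User VALUES (1, 'Ann', '555-0101'), (2, 'Ben', '555-0102'), (3, 'Cy', '555-0103');
    INSERT INTO UserHistory VALUES
        (1, '2024-01-05', 'logged_on'),
        (1, '2024-01-09', 'logged_on'),
        (2, '2024-01-07', 'logged_on');
""")

# Left outer join with no WHERE clause: every User row is kept, and a user
# without a matching UserHistory row gets NULL (None in Python) on the right.
joined = conn.execute("""
    select u.user_id, uh.user_id
    from User as u
    left join UserHistory as uh on u.user_id = uh.user_id
    order by u.user_id, uh.date
""").fetchall()

for row in joined:
    print(row)
# (1, 1)
# (1, 1)
# (2, 2)
# (3, None)
```

User 1 appears twice because it has two history rows; only user 3 has a NULL on the right-hand side, and that NULL is what an IS NULL test on uh.user_id picks out.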
So here's what the SQL looks like:

select distinct u.user_id
from User as u
left join UserHistory as uh on u.user_id = uh.user_id
where uh.user_id is null

You may be confused by the "User as u" and "UserHistory as uh" syntax. Those are called aliases. Aliases let us assign a shorter name to a table, which makes for cleaner and more compact SQL. In the example above, "u" is another name for the User table and "uh" is another name for the UserHistory table. We also use the DISTINCT keyword to ensure that each user_id is returned only once. Strictly speaking it is a safeguard rather than a necessity: because the WHERE clause keeps only users with no UserHistory match, each such user produces exactly one row anyway, as long as user_id is unique in the User table.

That concludes our series of practice SQL interview questions. If you are looking for more advanced and challenging SQL interview questions, then check out our other articles: Advanced SQL practice questions.

Advanced SQL Interview Questions and Answers

Here are some complex SQL interview problems for people who are looking for more advanced and challenging questions, along with the answers and complete explanations. Try to figure out the answer to each question yourself before reading the answer.

Suppose we have 2 tables called Orders and Salesperson, shown below:

Salesperson
ID   Name    Age  Salary
1    Abe     61   140000
2    Bob     34   44000
5    Chris   34   40000
7    Dan     41   52000
8    Ken     57   115000
11   Joe     38   38000

Orders
Number  order_date  cust_id  salesperson_id  Amount
10      8/2/96      4        2               540
20      1/30/99     4        8               1800
30      7/14/95     9        1               460
40      1/29/98     7        2               2400
50      2/3/98      6        7               600
60      3/2/98      6        7               720
70      5/6/98      9        7               150

Now suppose that we want to write SQL that conforms to the SQL standard. We want to retrieve the names of all salespeople that have more than 1 order from the tables above. You can assume that each salesperson has only one ID.
If that is the case, then what (if anything) is wrong with the following SQL?

SELECT Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT(salesperson_id) > 1

The answer and explanation to advanced SQL question 1

There is definitely something wrong with the SQL above, and it is something that most beginner SQL programmers may not notice. The problem is that the SQL standard says we cannot select a column that is not part of the GROUP BY clause unless it is contained within an aggregate function. If we try to run the SQL above in SQL Server, we get an error that looks like this:

Column 'Name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

You might be confused now, so let's explain what that error means in plain English and through some simple examples. The most important thing to take away from this discussion is exactly why we get that error and how to avoid it. There is a good reason for the error, so read on to understand why.

You can see in the bad SQL above that the Name column is not part of the GROUP BY clause, nor is it contained within an aggregate function (like SUM, MAX, etc.). As the error suggests, we can fix it either by wrapping the Name column inside an aggregate function or by adding it to the GROUP BY clause. So if we want to write SQL that complies with the standard, we could add the Name column to the GROUP BY:

SELECT Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id, Name -- we added the Name column to the group by, and now it works!
HAVING COUNT(salesperson_id) > 1

The SQL above runs just fine without giving any error. We could also fix the problem by putting the Name column inside an aggregate function and selecting that instead.
So we could write this SQL instead, and it would be perfectly legal according to the SQL standard. We chose the MAX aggregate function, but MIN, for example, would work just as well:

SELECT MAX(Name) -- put Name in an aggregate function
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT(salesperson_id) > 1

Adding the Name column to the GROUP BY or wrapping it in an aggregate will certainly fix the error, but it's very important to note that, in general, both of those changes can alter the data that is returned in ways you may not want.

Why does the selected column have to be in the group by clause or part of an aggregate function?

So now you understand how to fix the error, but do you understand why it is a problem in the first place? You should, because that is the most important thing to understand! So let's explain some more about why SQL gives the error shown above.

First, let's talk a little more about aggregate functions. You probably know what aggregate functions in SQL are; we used one in the example above. In case you forgot, an aggregate function computes a single value from the values of a column across a group of rows. Here are some of the commonly used aggregate functions:

AVG() - Returns the average value
COUNT() - Returns the number of rows
FIRST() - Returns the first value
LAST() - Returns the last value
MAX() - Returns the largest value
MIN() - Returns the smallest value
SUM() - Returns the sum

Note that FIRST() and LAST() are not part of standard SQL; they are available only in a few products, such as Microsoft Access.

To illustrate why the SQL standard says that a selected column has to be in the GROUP BY clause or part of an aggregate function, let's use another example. Suppose we have tables called Starbucks_Stores and Starbucks_Employees.
In case you don't already know, Starbucks is a popular coffee shop/cafe in the USA:

Starbucks_Employees
ID   Name    Age  HourlyRate  StoreID
1    Abe     61   14          10
2    Bob     34   10          30
5    Chris   34   9           40
7    Dan     41   11          50
8    Ken     57   11          60
11   Joe     38   13          70

Starbucks_Stores
store_id  city
10        San Francisco
20        Los Angeles
30        San Francisco
40        Los Angeles
50        San Francisco
60        New York
70        San Francisco

Now, given the tables above, let's say that we write some SQL like this:

SELECT count(*) as num_employees, HourlyRate
FROM Starbucks_Employees JOIN Starbucks_Stores
ON Starbucks_Employees.StoreID = Starbucks_Stores.store_id
GROUP BY city

It looks like the SQL above would just return the number of Starbucks employees in each city, along with the HourlyRate, because it groups the employees by the city they work in (thanks to the GROUP BY city clause).

The problem with selecting a non-aggregate column that is not in the group by

But the real question is: what exactly would be returned for the HourlyRate in the SQL above? Would it return every employee's hourly rate separated by commas? Since we group by city, would it return the highest hourly rate for each city? Would it return the hourly rates as a distinct list, so that the 2 employees making 11 dollars an hour have the 11 returned only once? The problem is that we do not know what will be returned, because we are not specific enough about what we are asking for in the SQL! If what we ask for is not specific enough, the SQL processor cannot know what to return. This is why almost all database implementations return an error when the SQL above is run (with the notable exception of MySQL), and this is why the SQL does not conform to the standard. In SQL Server, running the SQL above returns the same error that we showed earlier. Let's explain even further in case the problem with that SQL is not crystal clear.
The order of operations for the SQL above is:

1. The 2 tables are joined on the condition that the Starbucks_Employees.StoreID column value is equal to the Starbucks_Stores.store_id column value.
2. Groups are then created for each city, which means that each distinct city gets its own "group". So there will be a total of 3 groups: one each for San Francisco, New York, and Los Angeles.
3. The data we are interested in is selected from each group created in step 2.

Because we end up with a different group for each city, selecting count(*) finds the total number of rows in each group. But the problem is that when we select HourlyRate, there are multiple values for HourlyRate within a group. For example, in the group for San Francisco there are 4 different values for HourlyRate: 14, 10, 11, and 13. So which value of HourlyRate should be selected from each group? It could be any one of those values, which is why that SQL results in an error: what we are asking for is NOT specific enough. Hopefully this is crystal clear to you now.

If HourlyRate were wrapped in an aggregate function like MAX, the query would simply return the highest HourlyRate within each group. That is why an aggregate function fixes the SQL error: only one value is selected from any given group. So this SQL is perfectly fine because we are more specific about what we ask for, but it only works for you if you actually want the highest HourlyRate for each city:

SELECT count(*) as num_employees, MAX(HourlyRate)
FROM Starbucks_Employees JOIN Starbucks_Stores
ON Starbucks_Employees.StoreID = Starbucks_Stores.store_id
GROUP BY city

Fix the error by adding the column to the group by clause

Another way to fix the error is to simply add the HourlyRate column to the GROUP BY clause.
This also means that wrapping HourlyRate in an aggregate function is no longer necessary. So you could write SQL like this, and it would fix the error:

SELECT count(*) as num_employees, HourlyRate
FROM Starbucks_Employees JOIN Starbucks_Stores
ON Starbucks_Employees.StoreID = Starbucks_Stores.store_id
GROUP BY city, HourlyRate

This creates groups based on the unique combinations of the values in the HourlyRate and city columns. There will be a different group for each HourlyRate and city combination, so $11, San Francisco and $11, New York are 2 different groups. If you need to read up more on this topic, see our article Group By With Multiple Columns.

With the SQL above, each group has only one value for HourlyRate, so there is no ambiguity when selecting HourlyRate: there is only one possible value to select. It is now very clear that one and only one HourlyRate value can be returned for each group.

Adding the column to the group by clause fixes the error but will alter the data that is returned

But one very important thing to note is that even though adding the column to the GROUP BY fixes the error, it also changes the groups that are created. This means that the data returned will be completely different from what was returned before. The count(*) function no longer returns the number of employees in a given city; instead it returns the number of rows in each group created by the unique combination of the HourlyRate and city columns.

MySQL – selecting non-aggregate columns not in the group by

One very important thing that you should know is that MySQL can actually allow non-aggregated columns in the select list even if they are not part of the GROUP BY clause (a quick side note: a non-aggregated column is simply a column that is not wrapped within an aggregate function).
What this means is that, with MySQL's traditional default settings, you will not receive an error if you try to run any of the "bad" SQL above. (Since MySQL 5.7.5, the ONLY_FULL_GROUP_BY SQL mode is enabled by default, and it rejects such queries unless the extra columns are functionally dependent on the GROUP BY columns; the behavior described in this section applies when that mode is disabled.) The reason it is allowed in MySQL is that MySQL assumes you know what you are doing, and it does make sense in some scenarios. For instance, let's refer back to the SQL that we started with:

SELECT Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT(salesperson_id) > 1

The original SQL above works just fine in MySQL because there is a 1 to 1 mapping of salesperson name to ID, meaning that for every unique salesperson ID there is only one possible name. Another way of saying this is that each salesperson can only have one name. So when we create groups based on the salesperson ID (which is done by GROUP BY salesperson_id), each group has one and only one name.

This SQL will also run in MySQL without returning an error:

SELECT count(*) as num_employees, HourlyRate
FROM Starbucks_Employees JOIN Starbucks_Stores
ON Starbucks_Employees.StoreID = Starbucks_Stores.store_id
GROUP BY city

But even though the code above does not return an error, the HourlyRate returned by MySQL will be an arbitrary (indeterminate) value from within each group. This is because when we create groups based on the city, each group can have different values for HourlyRate. In other words, there is no one-to-one mapping between HourlyRate and city like the one we had between the salesperson ID and the name. So, because we are not specific about which HourlyRate we want, MySQL returns an arbitrary value. For instance, in the group for San Francisco, MySQL could return the HourlyRate of any employee who works in San Francisco: whether it is 14, 10, 11, or 13, we don't really know, since the choice is arbitrary.

That concludes part 1 of our more difficult and complex SQL questions.
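The two unambiguous fixes from part 1 can be compared on the Starbucks sample data. This sketch uses Python's built-in sqlite3 module as a stand-in database (an assumption; the discussion above is about SQL Server and MySQL). SQLite, like MySQL without ONLY_FULL_GROUP_BY, would also accept the ambiguous query without an error, so the sketch runs only the two well-defined versions. The city column is added to each select list purely so the rows are identifiable; that is our addition, not part of the queries above.

```python
import sqlite3

EMPLOYEES = [(1, "Abe", 61, 14, 10), (2, "Bob", 34, 10, 30), (5, "Chris", 34, 9, 40),
             (7, "Dan", 41, 11, 50), (8, "Ken", 57, 11, 60), (11, "Joe", 38, 13, 70)]
STORES = [(10, "San Francisco"), (20, "Los Angeles"), (30, "San Francisco"),
          (40, "Los Angeles"), (50, "San Francisco"), (60, "New York"), (70, "San Francisco")]

FROM_CLAUSE = """
    FROM Starbucks_Employees JOIN Starbucks_Stores
    ON Starbucks_Employees.StoreID = Starbucks_Stores.store_id
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Starbucks_Employees "
                 "(ID INTEGER, Name TEXT, Age INTEGER, HourlyRate INTEGER, StoreID INTEGER)")
    conn.execute("CREATE TABLE Starbucks_Stores (store_id INTEGER, city TEXT)")
    conn.executemany("INSERT INTO Starbucks_Employees VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
    conn.executemany("INSERT INTO Starbucks_Stores VALUES (?, ?)", STORES)
    return conn


def max_rate_per_city(conn):
    # Fix 1: aggregate the column, so each city group yields exactly one value.
    sql = "SELECT city, count(*) AS num_employees, MAX(HourlyRate)" + FROM_CLAUSE + "GROUP BY city"
    return sorted(conn.execute(sql).fetchall())


def per_city_and_rate(conn):
    # Fix 2: group by the column too; count(*) now counts (city, rate) pairs.
    sql = "SELECT city, HourlyRate, count(*) AS num_employees" + FROM_CLAUSE + "GROUP BY city, HourlyRate"
    return sorted(conn.execute(sql).fetchall())


if __name__ == "__main__":
    conn = build_db()
    print(max_rate_per_city(conn))
    print(per_city_and_rate(conn))
```

On this data the first query reports 4 employees in San Francisco with a top rate of 14, while every count in the second query is 1, which illustrates the point above that count(*) no longer reports employees per city once HourlyRate joins the GROUP BY.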
This is part 2 of our advanced practice SQL interview questions and answers. We highly suggest that you read part 1 of our advanced SQL interview questions before reading this, since a lot of the concepts in this portion are discussed in more depth in part 1.

The problem is based on the tables below, where salespeople have orders with certain customers that are in the Customer table.

Salesperson
ID   Name    Age  Salary
1    Abe     61   140000
2    Bob     34   44000
5    Chris   34   40000
7    Dan     41   52000
8    Ken     57   115000
11   Joe     38   38000

Customer
ID  Name      City      Industry Type
4   Samsonic  pleasant  J
6   Panasung  oaktown   J
7   Samony    jackson   B
9   Orange    Jackson   B

Orders
Number  order_date  cust_id  salesperson_id  Amount
10      8/2/96      4        2               2400
20      1/30/99     4        8               1800
30      7/14/95     9        1               460
40      1/29/98     7        2               540
50      2/3/98      6        7               600
60      3/2/98      6        7               720
70      5/6/98      9        7               150

In the tables above, each order in the Orders table is associated with a given customer through the cust_id foreign key column, which references the ID column in the Customer table.

Here is the problem: find the largest order amount for each salesperson and the associated order number, along with the customer to whom that order belongs. You can present your answer in any database's SQL: MySQL, Microsoft SQL Server, Oracle, etc.

The answer to the problem and explanation

This question seems quite simple, but as you will soon find out, it is deceptively complex. For each salesperson, all we need to retrieve is the largest order amount and the associated order number. To retrieve that information, we should be able to simply join the Orders and Salesperson tables wherever Salesperson.ID is equal to Orders.salesperson_id (this is our join predicate).
Then we could group the results of that join by the Orders.salesperson_id column and retrieve both the highest-valued order (by using max(Amount)) and the associated order number. Let's say that we choose to write our answer in MySQL. In MySQL we could legally write code that looks like this:

SELECT Orders.Number, max(Amount)
FROM Orders JOIN Salesperson
ON Salesperson.ID = Orders.salesperson_id
GROUP BY Orders.salesperson_id

If we run that code, it returns this result:

Number  max(Amount)
30      460
10      2400
50      720
20      1800

The problem with the data returned in MySQL

But there's a problem with the results returned from running the SQL above, and it should be fairly obvious once you look at the data in the tables above. Here is the problem: order number 50 does not have an amount of 720; that amount actually belongs to order number 60. So what is going on here? Why do the results return an order number that is not even in the same row as the max(Amount) of 720? And why are all of the other results correct?

Understanding the group by statement is critical

Well, we have to explain a bit more about what's going on with the GROUP BY. If you have already read part 1 of the advanced SQL interview questions, you should understand exactly what the problem is with the SQL above, and you can safely skip down to the section titled "New approach to the problem – start with a subquery". If you want to reinforce the concepts presented in part 1, though, we highly recommend that you read this entire explanation of this rather difficult interview question.

When we group by the salesperson ID, one group is created for each salesperson ID that appears in the result of the join. Because Chris (ID 5) and Joe (ID 11) have no orders, the inner join drops them, so 4 groups are created: one each for IDs 1, 2, 7, and 8. Inside each group are the rows that share that salesperson ID value.
When we select max(Amount), MySQL simply looks for the highest value of Amount within each group and returns that value. And when we select Orders.Number, MySQL does not return every Orders.Number value from each group; it selects only one value from each group.

Our SQL is not specific enough

But which order number should be returned from each group? A group can contain more than one order number whenever more than one row belongs to it. And that is exactly the problem: the SQL that we wrote is not specific enough, so MySQL just arbitrarily returns one of the Orders.Number values within each group. In this case, because order number 50 is part of the group for salesperson_id 7, MySQL returned 50. It could just as well have returned order number 60 or 70; the point is that it arbitrarily chooses one order number from each group. For the group for salesperson ID 2, the fact that order number 10 was chosen (order number 10 corresponds to the largest order amount, 2400) is pure coincidence; MySQL could have returned order number 40, which is also part of that group.

Most relational database implementations would throw an error if we tried to run the SQL above, because the results are potentially arbitrary, as we just illustrated. MySQL is the exception: it allows us to run the SQL above error-free, but as we illustrated, the data returned may not make any sense. Be sure to read part 1 of the advanced SQL interview questions for more details on why.

Well, now we know that there is definitely an issue with the SQL above, so how can we write a good query that gives us exactly what we want, along with the correct order number?
New approach to the problem – start with a subquery

Now let's break the problem down into more manageable pieces, starting with a simple subquery. Here is a subquery that gets the highest-valued order for each salesperson:

SELECT salesperson_id, MAX(Amount) AS MaxOrder
FROM Orders
GROUP BY salesperson_id

Running the query above returns this:

salesperson_id  MaxOrder
1               460
2               2400
7               720
8               1800

The query above gives us each salesperson_id and that salesperson's highest order amount, but it still does not give us the order number associated with that amount. So how can we find the order number as well? Clearly we need to do something else with the subquery above to get the correct order number. What are our options? Try to come up with an answer on your own before reading on.

Well, we can join with the results of the subquery above. But on what condition should the join be done, and what exactly should we join the subquery with? What if we join the subquery with the Orders table, on the condition that the salesperson_id matches AND that the value in the Orders table's Amount column equals the amount (MaxOrder) returned from the subquery? This way, we can match up the correct order number with the maximum order amount for a given salesperson_id. With that in mind, we can write this query:

select salesperson_id, Number as OrderNum, Amount
from Orders
JOIN ( -- this is our subquery from above:
  SELECT salesperson_id, MAX(Amount) AS MaxOrder
  FROM Orders
  GROUP BY salesperson_id
) as TopOrderAmountsPerSalesperson
USING (salesperson_id)
where Amount = MaxOrder

Running the query above returns this:

salesperson_id  OrderNum  Amount
8               20        1800
1               30        460
2               10        2400
7               60        720

How does the query work exactly?

It's actually pretty simple.
First, the subquery (which acts as a derived table here, named TopOrderAmountsPerSalesperson) returns the highest order amount for each salesperson, along with the salesperson ID. So now we have each salesperson's highest order amount and his/her ID in a derived table. That derived table is then joined with the entire Orders table on the condition that the salesperson ID matches and that the Amount from the Orders table matches the MaxOrder amount from the derived table. What's the point of this? That join gives us the correct order number, since it matches on both the salesperson ID and the amount. (If a salesperson has 2 orders with exactly the same top amount, both rows will match; we deal with that corner case below.) And remember that the whole reason we are doing this is to avoid the original problem of selecting a non-aggregated column alongside a GROUP BY.

Now, retrieving the salesperson name is simple. Try to figure it out on your own. Here is how we retrieve the salesperson name: we just add another join with the Salesperson table and select the Name:

SELECT salesperson_id, Name, Orders.Number AS OrderNumber, Orders.Amount
FROM Orders
JOIN Salesperson ON Salesperson.ID = Orders.salesperson_id
JOIN (
  SELECT salesperson_id, MAX(Amount) AS MaxOrder
  FROM Orders
  GROUP BY salesperson_id
) AS TopOrderAmountsPerSalesperson
USING (salesperson_id)
WHERE Amount = MaxOrder

Running the query above returns this:

salesperson_id  Name  OrderNumber  Amount
1               Abe   30           460
2               Bob   10           2400
7               Dan   60           720
8               Ken   20           1800

And finally we have our answer! But one last thing: let's check for corner cases. What would happen if we add one more row to the table where a given salesperson has 2 or more orders with the same value for the highest amount?
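That question can also be answered by simply running the query. The sketch below loads the part 2 tables (only the columns the queries need) into an in-memory SQLite database via Python's built-in sqlite3 module, standing in for MySQL, and adds order 80, which ties Bob's top amount of 2400. It writes the joins with explicit ON conditions instead of USING, and it goes two steps beyond the answer in the text; both are our additions: it joins Customer so each row also reports the customer the problem asks for, and it breaks ties by keeping the lowest order number with MIN(Number), which is deterministic and needs no non-aggregated columns in a grouped select.

```python
import sqlite3

SALESPEOPLE = [(1, "Abe"), (2, "Bob"), (5, "Chris"), (7, "Dan"), (8, "Ken"), (11, "Joe")]
CUSTOMERS = [(4, "Samsonic"), (6, "Panasung"), (7, "Samony"), (9, "Orange")]
# (Number, cust_id, salesperson_id, Amount): the part 2 Orders rows plus
# order 80, which ties Bob's top amount of 2400.
ORDERS = [(10, 4, 2, 2400), (20, 4, 8, 1800), (30, 9, 1, 460), (40, 7, 2, 540),
          (50, 6, 7, 600), (60, 6, 7, 720), (70, 9, 7, 150), (80, 7, 2, 2400)]

# Derived table: the top amount per salesperson (same as the subquery in the text).
TOP_AMOUNTS = "SELECT salesperson_id, MAX(Amount) AS MaxOrder FROM Orders GROUP BY salesperson_id"

# The derived-table join from the text, written with ON instead of USING.
ALL_TOP_ORDERS = f"""
    SELECT o.salesperson_id, s.Name, o.Number, o.Amount
    FROM Orders AS o
    JOIN Salesperson AS s ON s.ID = o.salesperson_id
    JOIN ({TOP_AMOUNTS}) AS t ON t.salesperson_id = o.salesperson_id
    WHERE o.Amount = t.MaxOrder
    ORDER BY o.salesperson_id, o.Number
"""

# One row per salesperson: pick the lowest order number among tied top orders,
# then join Customer to report who placed that order.
ONE_TOP_ORDER = f"""
    SELECT o.salesperson_id, s.Name, o.Number, o.Amount, c.Name AS Customer
    FROM Orders AS o
    JOIN Salesperson AS s ON s.ID = o.salesperson_id
    JOIN Customer AS c ON c.ID = o.cust_id
    WHERE o.Number IN (
        SELECT MIN(o2.Number)
        FROM Orders AS o2
        JOIN ({TOP_AMOUNTS}) AS t ON t.salesperson_id = o2.salesperson_id
        WHERE o2.Amount = t.MaxOrder
        GROUP BY o2.salesperson_id
    )
    ORDER BY o.salesperson_id
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Salesperson (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Orders (Number INTEGER PRIMARY KEY, cust_id INTEGER,
                             salesperson_id INTEGER, Amount INTEGER);
    """)
    conn.executemany("INSERT INTO Salesperson VALUES (?, ?)", SALESPEOPLE)
    conn.executemany("INSERT INTO Customer VALUES (?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", ORDERS)
    return conn


if __name__ == "__main__":
    conn = build_db()
    for row in conn.execute(ALL_TOP_ORDERS):
        print(row)  # Bob appears twice: orders 10 and 80
    for row in conn.execute(ONE_TOP_ORDER):
        print(row)
```

The first query returns two rows for Bob (orders 10 and 80). The second returns one row per salesperson, and because the order number is chosen before Customer is joined, the customer always belongs to the order that is reported.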
For example, let’s add this row to the Orders table:

    Number  order_date  cust_id  salesperson_id  Amount
    80      02/19/94    7        2               2400

This means that the salesperson with an ID of 2 now has 2 orders with an amount of 2400 in the Orders table. If we run the SQL above again, we get this result (note the extra row for Bob):

    salesperson_id  Name  OrderNumber  Amount
    1               Abe   30           460
    2               Bob   40           2400
    7               Dan   60           720
    8               Ken   20           1800
    2               Bob   80           2400

Now, if we only want one of Bob’s orders to show up, how can we eliminate the duplicate? Again, try to figure this out on your own before reading our answer.

Well, we could add GROUP BY salesperson_id, Amount to the end of the query. That creates a separate group for each unique combination of salesperson ID and amount, and gives us this query:

    SELECT salesperson_id, Salesperson.Name, Number AS OrderNumber, Amount
    FROM Orders
    JOIN Salesperson ON Salesperson.ID = Orders.salesperson_id
    JOIN (
        SELECT salesperson_id, MAX(Amount) AS MaxOrder
        FROM Orders
        GROUP BY salesperson_id
    ) AS TopOrderAmountsPerSalesperson
    USING (salesperson_id)
    WHERE Amount = MaxOrder
    GROUP BY salesperson_id, Amount

Now, even with the duplicate row in the Orders table, this query returns:

    salesperson_id  Name  OrderNumber  Amount
    1               Abe   30           460
    2               Bob   40           2400
    7               Dan   60           720
    8               Ken   20           1800

Note that this query selects Number without aggregating it or listing it in the GROUP BY. MySQL accepts that only when the ONLY_FULL_GROUP_BY mode is off (it is on by default since MySQL 5.7), and databases such as SQL Server and PostgreSQL reject it. Even where it runs, which of the tied order numbers comes back is not guaranteed. A more portable version selects MIN(Number) AS OrderNumber and groups by salesperson_id, Name, Amount.

And that’s it: we have a final answer to this difficult interview question! This concludes our series of complex SQL interview questions. Hopefully you found them challenging!

What’s the difference between data mining and data warehousing?

Data mining is the process of finding patterns in a given data set. These patterns can often provide meaningful and insightful data to whoever is interested in that data. Data mining is used today in a wide variety of contexts: in fraud detection, as an aid in marketing campaigns, and even by supermarkets to study their consumers.
Data warehousing can be described as the process of centralizing or aggregating data from multiple sources into one common repository.

Example of data mining

If you’ve ever used a credit card, you may know that credit card companies will alert you when they think your card is being used fraudulently by someone else. This is a perfect example of data mining. Credit card companies have a history of your past purchases and know geographically where those purchases were made. If some purchases are suddenly made in a city far from where you live, the company is alerted to possible fraud, because its data mining shows that you don’t normally make purchases in that city. The credit card company can then disable your card for that transaction, or just flag your card for suspicious activity.

Another interesting example of data mining is how one grocery store in the USA used the data it collected on its shoppers to find patterns in their shopping habits. They found that when men bought diapers on Thursdays and Saturdays, they also had a strong tendency to buy beer. The grocery store could have used this valuable information to increase its profits. One thing they could have done, odd as it sounds, is move the beer display closer to the diapers. Or they could simply have made sure not to give any discounts on beer on Thursdays and Saturdays. This is data mining in action: extracting meaningful data from a huge data set.

Example of data warehousing: Facebook

A great example of data warehousing that everyone can relate to is what Facebook does. Facebook basically gathers all of your data (your friends, your likes, who you stalk, etc.) and then stores that data in one central repository.
Facebook most likely stores your friends, your likes, etc. in separate databases. Even so, it wants to take the most relevant and important information and put it into one central aggregated database. Why would it want to do this? For many reasons. It wants you to see the ads you’re most likely to click on, it wants the friends it suggests to be the most relevant to you, and so on. Keep in mind that this is the data mining phase, in which meaningful data and patterns are extracted from the aggregated data. But underlying all these motives is the main one: to make more money. After all, Facebook is a business.

We can say that data warehousing is basically a process in which data from multiple sources/databases is combined into one comprehensive and easily accessible database. That data is then readily available to any business professionals, managers, etc. who need it to create forecasts, and who basically use it for data mining.

Data warehousing vs. data mining

Remember that data warehousing is a process that must occur before any data mining can take place. In other words, data warehousing is the process of compiling and organizing data into one common database, and data mining is the process of extracting meaningful data from that database. The data mining process relies on the data compiled in the data warehousing phase in order to detect meaningful patterns.

In the Facebook example that we gave, the data mining will typically be done by business users who are not engineers. They will most likely receive help from engineers when they are trying to manipulate their data. The data warehousing phase is a strictly engineering phase, where no business users are involved.
This gives us another way of defining the 2 terms: data mining is typically done by business users with the assistance of engineers, and data warehousing is typically done exclusively by engineers.

What is ternary (also known as three-valued) logic in SQL?

This question is best illustrated by an example. Suppose we have the following SQL table with the columns modelNumber and laptopModel:

    Computer {
        modelNumber CHAR(30) NOT NULL,
        laptopModel CHAR(15)
    }

Assume that the table stores entries for all makes of PCs and laptops, and that the laptopModel field is set only when the entry is a laptop. Given that information, let’s answer a question that illustrates three-valued logic: how would you write a SQL statement that returns only the PCs, and no laptops, from the table above?

You might think this is very easy, and the first answer that comes to mind may be this:

    SELECT * FROM Computer WHERE laptopModel = NULL

SQL uses ternary/three-valued logic

Actually, the SQL above will not return anything at all, not even the PCs that are in the table! The reason is that SQL uses ternary, or three-valued, logic. You need to understand ternary logic in order to write correct SQL queries.

SQL logical operations have 3 possible values

This is an important fact to remember: logical operations in SQL have 3 possible values, NOT 2. What are those 3 values? They are TRUE, FALSE, and UNKNOWN. As its name suggests, the UNKNOWN value means that the result is unknown or unrepresentable. In the query above, the condition laptopModel = NULL evaluates to UNKNOWN for every row, and a WHERE clause keeps only the rows whose condition is TRUE.

The equality operator

The problem with the SQL statement above is that we used the equality operator (the “=”) to test for a NULL column value. In the majority of databases, a comparison to NULL returns UNKNOWN. This is true even when comparing NULL to NULL.
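The behavior described above is easy to observe with Python’s sqlite3 module. This sketch assumes a few hypothetical rows in the Computer table, where a NULL laptopModel marks a PC. sqlite3 reports the UNKNOWN result of NULL = NULL as Python’s None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Computer (modelNumber CHAR(30) NOT NULL, laptopModel CHAR(15))")
# Hypothetical rows: a NULL laptopModel marks a PC, a non-NULL value marks a laptop.
conn.executemany(
    "INSERT INTO Computer VALUES (?, ?)",
    [("PC-100", None), ("PC-200", None), ("LT-300", "UltraBook")],
)

# NULL = NULL is UNKNOWN, which sqlite3 returns as None (not True).
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# WHERE keeps only rows whose condition is TRUE, so "= NULL" matches nothing.
wrong = conn.execute(
    "SELECT modelNumber FROM Computer WHERE laptopModel = NULL"
).fetchall()

# IS NULL is the correct test and returns just the PCs.
right = conn.execute(
    "SELECT modelNumber FROM Computer WHERE laptopModel IS NULL ORDER BY modelNumber"
).fetchall()

print(null_eq_null)  # None
print(wrong)         # []
print(right)         # [('PC-100',), ('PC-200',)]
```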
The correct way to check for a NULL or non-NULL column is to use the IS NULL or IS NOT NULL syntax. So the SQL query should be changed to this:

    SELECT * FROM Computer WHERE laptopModel IS NULL

This is a common mistake, so be sure to account for UNKNOWN values in WHERE clause conditions.

Let’s say that you are given a SQL table called “Compare” (the schema is shown below) with only one column, called “Numbers”.

    Compare {
        Numbers INT(4)
    }

Write a SQL query that will return the maximum value from the “Numbers” column, without using a SQL aggregate like MAX or MIN.

This problem is difficult because it forces you to think outside the box. You have to use whatever SQL you know to solve the problem without the most obvious solution (doing a “SELECT MAX...” from the table).

Probably the best way to start breaking this problem down is to create a sample table with some actual data that matches the given schema. Here is a sample table to start out with:

    Compare
    Numbers
    30
    70
    -8
    90

The value that we want to extract from the table above is 90, since it is the maximum value in the table. How can we extract this value in a creative way? It has to be creative, since we can’t use the MAX or MIN aggregates. Well, what are the properties of the highest number (90 in our example)? We could say that there are no numbers larger than 90, but that doesn’t sound very promising in terms of solving this problem. We could also say that 90 is the only number that does not have a number greater than it. If we can somehow return every value that does not have a greater value, then we would be returning only 90, and that would solve the problem. So we should try to design a SQL statement that returns every number that does not have another number greater than it. Sounds fun, right?

Let’s start out simple by figuring out which numbers do have a number greater than themselves. This is an easier query.
We can start by joining the Compare table with itself. This is called a self join; if you are not familiar with self joins, see our Example of self join in SQL. Using a self join, we can create all the possible pairs in which the value in one column is greater than the corresponding value in the other column. This is exactly what the following query does:

    SELECT Smaller.Numbers, Larger.Numbers
    FROM Compare AS Larger
    JOIN Compare AS Smaller ON Smaller.Numbers < Larger.Numbers

Using the sample table we created, we end up with this table after running the query above:

    Smaller  Larger
    -8       90
    30       90
    70       90
    -8       70
    30       70
    -8       30

Now we have every value in the "Smaller" column except the largest value, 90. So all we have to do is find the value that is in the Compare table but not in the Smaller column, and that will give us the maximum value. We can easily do this using the NOT IN operator in SQL.

Before we do that, though, we have to change the query above so that it selects only the "Smaller" column, because that is the only column we are interested in:

    SELECT Smaller.Numbers
    FROM Compare AS Larger
    JOIN Compare AS Smaller ON Smaller.Numbers < Larger.Numbers

Now all we have to do is apply the NOT IN operator to find the max value:

    SELECT Numbers
    FROM Compare
    WHERE Numbers NOT IN (
        SELECT Smaller.Numbers
        FROM Compare AS Larger
        JOIN Compare AS Smaller ON Smaller.Numbers < Larger.Numbers
    )

This gives us what we want: the maximum value. But there is one small problem with the SQL above. If the maximum value is repeated in the Compare table, it will be returned once for each occurrence. We can prevent that by simply using the DISTINCT keyword.
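Here is a sketch that runs the self join and the NOT IN query against the sample table using Python’s sqlite3 module. It also builds a second table in which 90 appears twice, so that the duplicate problem and the DISTINCT fix are visible. The helper names (make_db, max_value) are made up for this example. The ORDER BY on the pairs query is an addition for predictable output.

```python
import sqlite3


def make_db(numbers):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Compare (Numbers INTEGER)")
    conn.executemany("INSERT INTO Compare VALUES (?)", [(n,) for n in numbers])
    return conn


# Every (smaller, larger) pair produced by the self join.
PAIRS = """
    SELECT Smaller.Numbers, Larger.Numbers
    FROM Compare AS Larger
    JOIN Compare AS Smaller ON Smaller.Numbers < Larger.Numbers
    ORDER BY Smaller.Numbers, Larger.Numbers
"""

# The maximum is the only value that never shows up on the "smaller" side.
MAX_WITHOUT_AGGREGATE = """
    SELECT {distinct} Numbers
    FROM Compare
    WHERE Numbers NOT IN (
        SELECT Smaller.Numbers
        FROM Compare AS Larger
        JOIN Compare AS Smaller ON Smaller.Numbers < Larger.Numbers
    )
"""


def max_value(conn, distinct=True):
    sql = MAX_WITHOUT_AGGREGATE.format(distinct="DISTINCT" if distinct else "")
    return conn.execute(sql).fetchall()


sample = make_db([30, 70, -8, 90])
print(sample.execute(PAIRS).fetchall())
# [(-8, 30), (-8, 70), (-8, 90), (30, 70), (30, 90), (70, 90)]
print(max_value(sample))               # [(90,)]

repeated = make_db([30, 70, -8, 90, 90])
print(max_value(repeated, distinct=False))  # [(90,), (90,)]
print(max_value(repeated))                  # [(90,)]
```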
So, here’s what the query looks like now:

    SELECT DISTINCT Numbers
    FROM Compare
    WHERE Numbers NOT IN (
        SELECT Smaller.Numbers
        FROM Compare AS Larger
        JOIN Compare AS Smaller ON Smaller.Numbers < Larger.Numbers
    )

And there we have our final answer.

Of course, some of you may be saying that there is a much simpler solution to this problem, and you would be correct. Here is a simpler answer using the SQL TOP clause along with the ORDER BY clause. This is what it would look like in SQL Server:

    SELECT TOP 1 -- select the very top entry in the result set
        Numbers
    FROM Compare
    ORDER BY Numbers DESC

MySQL does not have a TOP clause, so this is what it would look like in MySQL using just ORDER BY and LIMIT:

    SELECT Numbers
    FROM Compare
    ORDER BY Numbers DESC -- order in descending order
    LIMIT 1 -- retrieve only one value

So even though there are much simpler answers, it is nice to know the more complicated answer using a self join. That way you can impress your interviewer with your knowledge.

Provide an example of SQL Injection

A SQL injection attack is exactly what the name suggests. A hacker tries to “inject” his harmful/malicious SQL code into someone else’s database and force that database to run his SQL. This could potentially ruin their database tables, and even extract valuable or private information from them. The idea behind SQL injection is to have the application under attack run SQL that it was never supposed to run. How do hackers do this? As always, it’s best to show this with examples that will act as a tutorial on SQL injection.

SQL Injection Example

In this tutorial on SQL injection, we present a few different examples of SQL injection attacks, along with how those attacks can be prevented. SQL injection attacks typically start with a hacker entering his or her harmful/malicious code into a specific form field on a website.
In case you don’t already know, a website ‘form’ is something you have definitely used. For example, when you log into Facebook you are using a form. A form input field can be any field on a form that asks for your information; whether it’s an email address or a password, these are all form fields.

For our example of SQL injection, we will use a hypothetical form that many people have probably dealt with before: the “email me my password” form. Many websites have one in case a user forgets his or her password.

A typical “email me my password” form works like this. It takes an email address as input from the user, and then the application searches the database for that email address. If the application does not find that email address in the database, it simply does not send an email with a new password to anyone. However, if the application does find that email address in its database, it sends an email to that address with a new password, or with whatever information is required to reset the password.

But since we are talking about SQL injection, what would happen if a hacker did not enter a valid email address? What if he entered some harmful SQL code that he wants to run on someone else’s database to steal information or ruin data? Let’s explore that with an example, starting with how a hacker would typically begin to figure out how a system works.
Starting the SQL Injection Process

The SQL that retrieves the email address in the “email me my password” form would typically look something like the statement below. Keep in mind that this SQL is really embedded within a scripting language like PHP (it depends on which scripting language the application uses):

    SELECT data FROM table WHERE Emailinput = '$email_input';

This is, of course, a guess at what the application’s SQL looks like, because the hacker has no access to the application code. The “$email_input” variable holds whatever text the user enters into the email address form field.

Step 1: Figure out how the application handles bad inputs

Before a hacker can really start taking advantage of a weak or insecure application, he must first figure out how the application handles a simple bad input. Think of this initial step as the hacker “feeling out” his opponent before he releases the really bad SQL. So the first step a hacker would typically take is to enter an email address with a quote appended to the end into the email form field. We will explain why further below. For now, the input from the hacker would look something like this (pay special attention to the quote appended to the end of the email address):

    hacker@programmerinterview.com'

If the hacker puts that exact text into the email address form field, then there are basically 2 possibilities:

1. The application will first “sanitize” the input by removing the extra quote at the end, because we will assume that the application considers email addresses with quotes potentially malicious. (As a side note, email addresses can actually contain quotes according to IETF standards.) Sanitizing data is the act of stripping out any characters that aren’t needed from the data that is supplied, which in our case is the email address.
Then the application may run the sanitized input in the database query and search for that email address in the database (without the quote, of course).

2. The application will not sanitize the input first. It will take the input from the hacker and immediately run it as part of the SQL. This is what the hacker is hoping for, and we will assume that this is what our hypothetical application does. This is also known as constructing the SQL literally, without sanitizing. It means that the application would run the SQL below; pay extra attention to the extra quote at the end of the WHERE clause:

    SELECT data FROM table WHERE Emailinput = 'hacker@programmerinterview.com'';

Now, what would happen if the application executed the SQL above? The SQL parser would see the extra quote mark at the end and abort with a syntax error.

The error response is key, and tells the hacker a lot

But what will the hacker actually see on the form page when he enters this email address with a quote at the end? That depends on how the application is set up to handle database errors. The key point is that he will most likely not receive a message like “This email address is unknown. Please register to create an account”, which is what he would see if the application were sanitizing the input. Since we are assuming that the application is not sanitizing its input, the hacker would most likely see something like “Internal error” or “Database error”. Now the hacker also knows that the input to the database is not being sanitized. And if the application is not sanitizing its input, the database can most probably be exploited, destroyed, and/or manipulated in some way that could be very bad for the application owner.
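The probe from Step 1 can be reproduced with a hypothetical vulnerable lookup in Python’s sqlite3 module. The users table, its columns, and the vulnerable_lookup and probe helpers are all invented for this sketch. The point is only that pasting raw input into the SQL turns a stray quote into a database error that leaks out to the caller. SQLite reads the final doubled quote as the start of an unterminated string literal rather than as a lone extra quote, but either way the query fails.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the article's placeholder "table".
    conn.execute("CREATE TABLE users (email TEXT, data TEXT)")
    conn.execute("INSERT INTO users VALUES ('joe@ymail.com', 'joe-data')")
    return conn


def vulnerable_lookup(conn, email_input):
    # UNSAFE on purpose: the raw input is pasted straight into the SQL text.
    sql = "SELECT data FROM users WHERE email = '" + email_input + "';"
    return conn.execute(sql).fetchall()


def probe(conn, email_input):
    """Describe what an attacker would see for one input."""
    try:
        rows = vulnerable_lookup(conn, email_input)
    except sqlite3.OperationalError as exc:
        # A raw database error reaching the user hints that input is not sanitized.
        return "database error: " + str(exc)
    return "found" if rows else "unknown email"


conn = make_db()
print(probe(conn, "hacker@programmerinterview.com"))   # unknown email
print(probe(conn, "hacker@programmerinterview.com'"))  # starts with "database error:"
```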
Step 2: Run the actual SQL injection attack

Now that the hacker knows the database is vulnerable, he can attack further to get some really good information. What could our hacker do? If he has managed to figure out the layout of the table, he could just type this harmful code into the form field where the email address would normally go:

    Y'; UPDATE table SET email = 'hacker@ymail.com' WHERE email = 'joe@ymail.com';

Note that the SQL above is completely valid, legitimate SQL. After the Y there is a quote followed by a semicolon. This lets the hacker close the first statement and then run another statement of his own! If the application under attack runs this malicious code, the SQL looks like this:

    SELECT data FROM table WHERE Emailinput = 'Y';
    UPDATE table SET email = 'hacker@ymail.com' WHERE email = 'joe@ymail.com';

Can you see what this code is doing? It changes the email address that belongs to “joe@ymail.com” to “hacker@ymail.com”. In other words, the hacker changes a user’s account so that it uses his own email address, hacker@ymail.com. He can then reset the password and have it sent to his own email address! Now he has a login and a password for the application, but under someone else’s account.

In the example above, we skipped some steps that a hacker would have taken to figure out the table name and the table layout, because we wanted to keep this article relatively short. The idea, though, is that SQL injection is a real threat, and taking measures to prevent it is extremely important.

How to prevent SQL injection attacks?

In our earlier tutorial on SQL injection, we briefly discussed one way the attack could have been prevented: sanitizing the user input.
Since we are dealing with email addresses in our example, we should be able to safely exclude certain characters that don’t normally appear in email addresses. Here is a list of characters that would normally appear in email addresses. Anything else should not be allowed into the database, and the user should just receive an error such as “Invalid email address” if he tries to enter an email address containing any other character:

    abcdefghijklmnopqrstuvwxyz
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    0123456789
    ! $ & * - = ^ ` | ~ # % ' + / ? _ { } @ .

Sanitizing input is not enough to prevent SQL injection

Unfortunately, just sanitizing user input is not enough to prevent SQL injection, as you will see in the examples below. So let’s explore some other options and see what works and why. It’s good to know all the options, so be sure to read everything.

What about escaping strings? Shouldn’t this remove the threat of quotes in SQL injection?

In case you forgot what “escaping” means in programming: it allows special characters (like single/double quotes, percent signs, backslashes, etc.) to stay part of a string instead of being misinterpreted as something else. For example, suppose we want PHP to output a string containing a single quote to the browser (note the single quote in the word “it’s”). Then we have to put a backslash before the single quote so that PHP outputs it as a single quote:

    echo 'Programmer Interview - It\'s Great!';

When this is displayed on a webpage, it will look like:

    Programmer Interview - It's Great!

This is what’s called escaping strings. If we did not escape the quote, it would end the string early, because the quote is also used to enclose the characters in the echo statement. Nothing would be output, and PHP would report a parse error instead.
Now, how would escaping the quotes have helped in our previous example? Remember that our hacker is trying to enter this harmful/malicious code into the email form field:

    Y'; UPDATE table SET email = 'hacker@ymail.com' WHERE email = 'joe@ymail.com';

What if we escape the quotes in the string above before we pass the SQL to the database? Then the quotes become part of the string that is compared against the Emailinput field. In effect, the query searches for an email address equal to that giant string. In other words, the quotes are part of the string literal and will not be interpreted as SQL. In MySQL, we can escape a quote simply by preceding it with another quote (2 single quotes are interpreted as one quote), which is what we do in the example below. So the SQL that actually runs looks like this:

    SELECT data FROM table WHERE Emailinput = 'Y''; UPDATE table SET email = ''hacker@ymail.com'' WHERE email = ''joe@ymail.com'';';

Here, the quote after the Y and every quote around the two email addresses have been doubled, so the entire input stays inside one string literal. The key point is that the quotes are now treated as part of a string that gets compared to a field in the table, NOT as actual SQL. It’s very important that you understand this distinction, because it is exactly the problem that escaping quotes solves for us. If we do not escape quotes, they become part of the SQL. That basically allows the hacker to run 2 statements at once, and that is exactly what is so dangerous. The 2nd statement (“UPDATE table SET email = 'hacker@ymail.com' WHERE email = 'joe@ymail.com';”) is what really messes things up, because it lets the hacker change the email address of an existing account to his own. And that 2nd statement can run only because the quotes are not escaped.
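Here is a sqlite3 sketch of the difference escaping makes, under a few stated assumptions. Python’s execute() refuses to run more than one statement. To imitate an application whose database driver accepts stacked statements, the sketch therefore runs the composed SQL with executescript(). The users table and the helpers are hypothetical. The attack string leaves off its final quote and semicolon and lets the query template supply them, so the unescaped SQL comes out exactly like the two statements shown in Step 2. Like MySQL, SQLite reads two single quotes inside a string as one quote.

```python
import sqlite3

# The Step 2 attack text minus its final quote and semicolon; the query
# template below supplies those.
ATTACK = "Y'; UPDATE users SET email = 'hacker@ymail.com' WHERE email = 'joe@ymail.com"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT, data TEXT)")  # hypothetical table
    conn.execute("INSERT INTO users VALUES ('joe@ymail.com', 'joe-data')")
    conn.commit()
    return conn


def escape_quotes(value):
    # Double every single quote; inside a string literal '' means one quote.
    return value.replace("'", "''")


def run_lookup(conn, email_input, escape):
    value = escape_quotes(email_input) if escape else email_input
    sql = "SELECT data FROM users WHERE email = '" + value + "';"
    # executescript() runs every statement in the string, imitating a driver
    # that accepts stacked statements (plain execute() refuses more than one).
    conn.executescript(sql)
    return [row[0] for row in conn.execute("SELECT email FROM users")]


print(run_lookup(make_db(), ATTACK, escape=False))  # ['hacker@ymail.com']
print(run_lookup(make_db(), ATTACK, escape=True))   # ['joe@ymail.com']
```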
Escaping a string is also known as quotesafing, since you are essentially making the SQL query “safe” for quotes.

Just Escaping Strings Does Not Prevent SQL Injection

We just went through an example in which escaping the string prevented the SQL injection attack. Even so, escaping strings alone is actually not enough protection against SQL injection. A decent hacker can run another attack by exploiting the fact that some databases let you escape quotes in more than one way. MySQL, for one, offers several ways to escape quotes. As the following excerpt from the MySQL reference manual shows, you can also escape a quote character by preceding it with a backslash (“\”):

There are several ways to include quote characters within a string that goes into a MySQL query:
1. A “'” inside a string quoted with “'” may be written as “''”.
2. A “"” inside a string quoted with “"” may be written as “""”.
3. Precede the quote character by an escape character (“\”).

Let’s say that we choose to escape quotes manually, by adding a single quote every time a string comes in with a quote. After all, if we have a name field, people with quotes in their names should be able to save their names without any issues. For instance, someone named Jack O’Leary should be saved in our database without the quote causing problems. So, if we are retrieving someone’s name from our database, the SQL may look like this:

    SELECT * FROM customers WHERE name = 'Jack O''Leary'; -- this works great

This works perfectly fine, because the 2 consecutive single quotes are interpreted as one single quote. MySQL searches for Jack O'Leary (with one quote), not Jack O''Leary (with 2 quotes).
But let’s say a clever hacker realizes that you may be running a MySQL database. He knows that MySQL also lets you escape a quote by preceding it with a backslash, so a quote could also be escaped like this: \'

So our clever hacker tries to insert a string like this into the form field:

    \'; DROP TABLE users; --

But after we do our own manual string escaping (by adding the extra quote), that string turns into this:

    \''; DROP TABLE users; --

So the SQL that runs looks like this:

    SELECT * FROM customers WHERE name = '\''; DROP TABLE users; --';

What happens when this SQL is run? MySQL interprets '\'' as a string containing a single quote, so the system will just search for a name consisting of a single quote. The 2nd quote (the one that comes after the \') closes the string. That lets the hacker end the first statement, insert a semicolon, and then run another malicious statement (the DROP TABLE users; code).

The hacker essentially fools the system into NOT escaping one of the extra quotes. He does this by taking advantage of 2 things:
1. The application developer is trying to escape quotes himself by just adding an extra quote.
2. MySQL supports escape mechanisms other than doubling a quote. In this case, the hacker used the backslash escape mechanism to run his malicious code.

Remember, the quotes are key, because they let the hacker close one statement and run any extra statement of his or her choosing.

Let’s repeat this again: Just escaping quotes is not enough to prevent SQL injection

The lesson here is that escaping quotes is unfortunately not enough to prevent all SQL injection attacks, and it is also extremely difficult to do correctly on your own. Because of the latter, many languages that provide database interface libraries include a function that will handle escaping strings for you.
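The two ways of reading a quote can be modeled in a few lines of Python. This is a toy model, not MySQL’s real lexer. text_after_literal follows only the two escape rules quoted above (a doubled quote, and a backslash that escapes the next character), and only for the first single-quoted literal. It also ignores settings such as MySQL’s NO_BACKSLASH_ESCAPES mode. Both escaping helpers are hypothetical. The second one also doubles backslashes, which closes this particular hole, but a library escaping function or a prepared statement is still the safer choice.

```python
def naive_escape(value: str) -> str:
    """The flawed manual approach: only double every single quote."""
    return value.replace("'", "''")


def backslash_aware_escape(value: str) -> str:
    """Hypothetical helper that escapes backslashes before doubling quotes."""
    return value.replace("\\", "\\\\").replace("'", "''")


def text_after_literal(sql: str) -> str:
    """Return the SQL that follows the first single-quoted literal in `sql`."""
    i = sql.index("'") + 1              # just after the opening quote
    while i < len(sql):
        ch = sql[i]
        if ch == "\\":                   # backslash escapes the next character
            i += 2
        elif ch == "'" and sql[i + 1:i + 2] == "'":
            i += 2                       # '' is an escaped quote inside the literal
        elif ch == "'":
            return sql[i + 1:]           # closing quote: the rest is parsed as SQL
        else:
            i += 1
    raise ValueError("unterminated string literal")


ATTACK = "\\'; DROP TABLE users; --"     # the characters  \'; DROP TABLE users; --
TEMPLATE = "SELECT * FROM customers WHERE name = '{}';"

for escape in (naive_escape, backslash_aware_escape):
    query = TEMPLATE.format(escape(ATTACK))
    print(escape.__name__, "->", repr(text_after_literal(query)))
# naive_escape -> "; DROP TABLE users; --';"
# backslash_aware_escape -> ';'
```

With the naive escaping, everything after the literal (including DROP TABLE users) would be parsed as SQL. With backslashes escaped too, only the final semicolon remains outside the string.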
Library escaping functions handle both the parsing of the string and quotesafing, so when you use them you have a much better chance of getting things right. For concrete examples: PHP has mysql_real_escape_string (part of the old mysql extension, which was removed in PHP 7.0; mysqli_real_escape_string is its replacement), and Perl’s DBI provides a quote method on database handles. You absolutely should be using these functions before using form data in your queries.

Provide a definition and example of a prepared statement in PHP, Java, and Perl. What are the advantages of using prepared statements? How do prepared statements help prevent SQL injection attacks?

Prepared statements, also known as parameterized statements or parameterized SQL, can be thought of as templates for SQL statements. They let database engines run SQL statements more efficiently when the same (or similar) SQL is used over and over again; we explain the details below. The key feature of a prepared statement is that values can be plugged into the query after the query has been “prepared” and is ready to be executed. This will make more sense when you see the examples below.

Prepared Statements use Placeholders

Prepared statements use question marks (?) as placeholders that mark where the actual values should be “plugged” into the SQL. These placeholders are also known as bound parameters, since they are essentially parameters that are passed to the SQL and “bind” to it at a later time. Confused yet? Some examples should clear it up; it’s really not difficult to understand at all.

Examples of Prepared Statements

Below we present some examples of prepared statements in Java, PHP, and Perl. Here we are using the interface libraries that each language provides to communicate with different database environments (like MySQL, Oracle, etc.).
As you may already know, Java uses a library known as JDBC, PHP uses something called PDO (PHP Data Objects), and Perl uses the Perl DBI (Perl Database Interface).

Example of a prepared statement in Java using JDBC:

    java.sql.PreparedStatement stmt = connection.prepareStatement(
        "SELECT * FROM table WHERE EMAIL = ?");

    /* The statement below sets "?" to the actual value stored in the email
       variable; we are assuming that email is set beforehand: */
    stmt.setString(1, email);
    java.sql.ResultSet rs = stmt.executeQuery();

Example of a prepared statement in PHP using PDO:

    $stmt = $dbh->prepare("SELECT * FROM table WHERE EMAIL = ?");

    /* The statement below sets "?" to the actual value stored in the $email
       variable; we are assuming that $email is set beforehand.
       PDOStatement::execute() takes an array of parameter values: */
    $stmt->execute(array($email));

Example of a prepared statement in Perl using Perl DBI:

    my $stmt = $dbh->prepare('SELECT * FROM table WHERE EMAIL = ?');

    # The statement below sets "?" to the actual value stored in the $email
    # variable; we are assuming that $email is set beforehand:
    $stmt->execute($email);

Looking at the examples above, you can see that even though the syntax details differ between languages, they are all fundamentally the same. They all use a “?” as a placeholder for the value that will be passed in later. And they all “prepare” the SQL first and execute it later, which is of course the whole point behind prepared statements. A good way to think of a prepared statement is as a template for SQL. It is not a complete SQL statement, because it does not yet have the values it needs in the placeholder areas.

What exactly happens when SQL is “prepared”?

Prepared SQL is created by calling the respective prepare method in each language, as you can see in the examples above. The prepared SQL template is sent to the DBMS (whether it’s MySQL, DB2, or whatever) with the placeholder values (the “?”) left blank.
Then, the DBMS will parse, compile, and perform query optimization on the template. After that, the DBMS stores the result. It cannot execute the result yet, because there is no data in the placeholders/parameters to execute with. The SQL is only executed once the respective execute function is called and data is passed in for the parameters.

What are the advantages of using prepared statements?

Prepared statements provide 2 primary benefits. The first is better performance. Even though a prepared statement can be executed many times, it is compiled and optimized only once by the database engine. Because a prepared statement does not have to be compiled and optimized each and every time the values in the query change, it offers a distinct performance advantage.

But keep in mind that not all query optimization can occur when a prepared statement is compiled. This is because the best query plan may also depend on the specific values of the parameters being passed in. The best query plan may also change over time, because the database tables and indices change over time.

Why are prepared statements so effective against SQL injection?

The second advantage of using prepared statements is that they are the best solution for preventing SQL injection attacks. If you are not familiar with SQL injection, it's highly recommended that you read our article on SQL injection – every programmer should know what SQL injection is. A short, non-academic description of SQL injection is this: any time an application runs SQL based on some user input through a web form, a hacker could potentially pass in input with the intent of having it run as part of your SQL, and either steal or corrupt your users' data.
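To make that description concrete, here is a minimal sketch that again assumes Python's sqlite3 and a made-up users table. It contrasts SQL built by string concatenation with the same lookup done through a bound parameter. The helper names lookup_unsafe and lookup_safe are hypothetical. The payload x' OR '1'='1 closes the quoted string early and appends an always-true condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "alice@example.com"), ("bob", "bob@example.com")])

def lookup_unsafe(connection, name):
    # VULNERABLE: the user's input is pasted directly into the SQL text,
    # so quote characters in it can change the structure of the query.
    sql = "SELECT email FROM users WHERE name = '" + name + "'"
    return connection.execute(sql).fetchall()

def lookup_safe(connection, name):
    # SAFE: the input is bound to the placeholder and treated only as data.
    return connection.execute(
        "SELECT email FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "x' OR '1'='1"
print(len(lookup_unsafe(conn, payload)))  # 2 (every row is returned)
print(len(lookup_safe(conn, payload)))    # 0 (no user has that literal name)
```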
Now, back to our discussion. Prepared statements help so much in preventing SQL injection because the values that will be inserted into a SQL query are sent to the SQL server after the actual query has been sent. In other words, the data input by a potential hacker is sent separately from the prepared query statement. This means that there is absolutely no way for the data input by a hacker to be interpreted as SQL, and no way for the hacker to run his own SQL on your application. Any input that comes in is interpreted only as data, never as part of your own application's SQL code – which is exactly why prepared statements prevent SQL injection attacks.

What is the difference between parameterized queries and prepared statements?

Parameterized queries and prepared statements are exactly the same thing. “Prepared statement” seems to be the more commonly used term, but there is no difference between the two terms. Parameterized queries and prepared statements are features of database management systems that basically act as templates in which SQL can be executed. The actual values that are passed into the SQL are the parameters (for example, the value that needs to be searched for in the WHERE clause), which is why these templates are called parameterized queries. The SQL inside the template is also parsed, compiled, and optimized before it is sent off to be executed – in other words, it is “prepared”. That is why these templates are often called prepared statements as well. So, just remember that they are two different names for the same thing. You can read a more detailed description of prepared statements (a.k.a. parameterized queries) and why they are useful here: Prepared statements and SQL injection.

What is blind SQL Injection? Provide an example of blind SQL injection as well.
In our SQL Injection Tutorial, we discussed how hackers use error messages from the database they are attacking to determine whether or not it is vulnerable to a SQL injection attack. But what if database error messages are suppressed so that they are not displayed on the web pages of a site under attack? Do hackers have some other way of running a SQL injection attack?

It turns out that hackers do have a way to run a SQL injection attack even when database error messages are disabled. This form of SQL injection is known as blind SQL injection.

Blind SQL injection versus SQL injection

What exactly is the difference between blind SQL injection and normal SQL injection? In normal SQL injection, hackers rely on error messages returned from the database for clues about how to proceed with their attack. With blind SQL injection, the hacker does not need to see any error messages in order to run his/her attack on the database. That is exactly why it is called blind SQL injection. So, even if database error messages are turned off, a hacker can still run a blind SQL injection attack.

Here we present a tutorial on blind SQL injection using a hypothetical attack as the example.

Example of Blind SQL Injection

For our example, let's suppose that we have a fake social networking site, called mybigspace.com, that has different profiles for people (just like Facebook). Each user on mybigspace.com has a unique ID number that identifies their profile. A query string is used to retrieve each individual's profile, so in the URL below, the user with an ID of 1008 will be pulled up and displayed on the page. Let's say that the user ID of 1008 belongs to a user named “John Doe”.
Here is what the URL used to load John Doe's profile would look like:

    // this is John Doe's page:
    http://www.mybigspace.com?id=1008

Let's assume that the user ID is used to retrieve the user's profile details (like links to pictures, his/her birthday, etc.) from a database. So, if a user requests the URL “http://www.mybigspace.com?id=1008”, that query string would be used to run some SQL on the servers of mybigspace.com. That SQL could look like this:

    SELECT * FROM profiles WHERE ID = 1008;

We are assuming that there is a master table called profiles which stores the profiles of everyone on the social networking site. But now let's say that a hacker tries to inject some SQL into the URL query string by loading this URL in his/her browser:

    http://www.mybigspace.com?id=1008 AND 1=1

Blind SQL Injection uses simple boolean expressions

Loading the URL above might result in the server of mybigspace.com running the SQL below. Note that this SQL contains a simple boolean expression, “1=1”, which is always true because one is always equal to one. That expression is appended to the query string in the URL above. Here is the SQL we discussed:

    SELECT * FROM profiles WHERE ID = 1008 AND 1=1;

We said that loading the URL “http://www.mybigspace.com?id=1008 AND 1=1” might result in mybigspace.com running the SQL above. We say might because it depends on whether the server allows the extra characters after the 1008 to be injected into the SQL. If the server does accept that SQL and allows it to be run, then the page that belongs to “John Doe” will load just fine. The hacker will then know that his SQL injection attack worked, which means that mybigspace.com is vulnerable to SQL injection attacks.
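Here is a minimal sketch of why the “AND 1=1” probe works. It assumes a hypothetical profile handler, written in Python with sqlite3 (not mybigspace.com's real code), that pastes the raw id value into the SQL without quotes. The page loads when the appended condition is true and fails when it is false. That difference is all the hacker needs, and no error message is ever shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER, name TEXT)")
conn.execute("INSERT INTO profiles VALUES (1008, 'John Doe')")

def load_profile_unsafe(connection, id_param):
    # VULNERABLE (hypothetical handler): the raw query-string value is
    # concatenated into the SQL, so any appended condition is executed too.
    sql = "SELECT name FROM profiles WHERE id = " + id_param
    row = connection.execute(sql).fetchone()
    return row[0] if row else "Page not found"

print(load_profile_unsafe(conn, "1008"))          # John Doe
print(load_profile_unsafe(conn, "1008 AND 1=1"))  # John Doe (true condition)
print(load_profile_unsafe(conn, "1008 AND 1=2"))  # Page not found (false condition)
```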
Of course, suppose the server does not respond with John Doe's page when the URL “http://www.mybigspace.com?id=1008 AND 1=1” is requested, and instead returns something like “Page not found”. Then the hacker knows that a blind SQL injection attack is probably not possible.

So, let's continue with the assumption that the website is vulnerable to blind SQL injection. Now the hacker can use more sophisticated queries to gather information about the server environment and work his way toward potentially sensitive data. For instance, suppose the hacker wants to find out which version of MySQL the server is running (assuming that it is running a MySQL database). He could try to load this URL, which has some extra SQL appended to check whether the server is running MySQL version 5:

    http://www.mybigspace.com?id=1008 AND substring(@@version, 1, 1)=5

The SQL “substring(@@version, 1, 1)=5” checks whether the version of MySQL currently running is version 5 (through the “=5” check). If it is version 5, the page will load normally because the SQL runs without a problem. This again assumes that the website is vulnerable to SQL injection and simply runs the SQL that is part of the query string.

If mybigspace.com's server is not running MySQL version 5, then the expression “substring(@@version, 1, 1)=5” will be false. The profile will not be retrieved, so the page will probably not load, and the hacker knows that the version of MySQL being run is not version 5.

Blind SQL Injection Prevention

As we have made pretty clear so far, a blind SQL injection attack can be carried out even if the display of database error messages is turned off. So, clearly, turning off error messages is not enough for prevention purposes.
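Hiding error messages does not help, but a bound parameter does. Here is a minimal sketch of a hypothetical profile handler, again using Python's sqlite3 with illustrative table and function names, that binds the query-string value instead of pasting it in. The whole value is compared against the id column as data. The “AND …” probes can no longer change the query, so the true and false probes get the same “not found” answer and leak nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER, name TEXT)")
conn.execute("INSERT INTO profiles VALUES (1008, 'John Doe')")

def load_profile_safe(connection, id_param):
    # The raw query-string value is bound to "?". SQLite converts "1008" to the
    # integer 1008 for comparison with the INTEGER column, but "1008 AND 1=1"
    # is not a number, so it stays text and simply matches no row.
    row = connection.execute(
        "SELECT name FROM profiles WHERE id = ?", (id_param,)
    ).fetchone()
    return row[0] if row else "Page not found"

print(load_profile_safe(conn, "1008"))          # John Doe
print(load_profile_safe(conn, "1008 AND 1=1"))  # Page not found
print(load_profile_safe(conn, "1008 AND 1=2"))  # Page not found
```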
Prepared statements are great for preventing blind SQL injection because the SQL is compiled before any user input is added. This makes it impossible for user input to change, and therefore compromise, the integrity of the SQL statement.

You can also use a vulnerability assessment tool to test your application and see how it responds to blind SQL injection attacks. There are many tools out there that will do this for you for a small fee, and they are great at helping you prevent blind SQL injection attacks.

Blind SQL Injection is slower than normal attacks

The hacker can continue in this way, slowly finding out more and more information about the database system under attack. You can also see that blind SQL injection is quite a bit slower than normal SQL injection attacks, because the hacker has to deal with a database system that does not display error messages.

What is the difference between a left outer join and a right outer join?

It is best to illustrate the differences between left outer joins and right outer joins by use of an example. Here are the 2 tables that we will use for our example:

    Employee                  Location
    EmpID   EmpName           EmpID   EmpLoc
    13      Jason             13      San Jose
    8       Alex              8       Los Angeles
    3       Ram               3       Pune, India
    17      Babu              17      Chennai, India
    25      Johnson           39      Bangalore, India

For the purpose of our example, it is important to note that the very last employee in the Employee table (Johnson, who has an ID of 25) is not in the Location table. Also, no one from the Employee table is from Bangalore (the employee with ID 39 is not in the Employee table). These facts will be significant in the discussion that follows.

A left outer join

Using the tables above, here is what the SQL for a left outer join would look like:

    select * from employee left outer join location
    on employee.empID = location.empID;

In the SQL above, we are joining on the condition that the employee IDs match in the tables Employee and Location.
So, we will essentially be combining 2 tables into 1, based on the condition that the employee IDs match. Note that we can get rid of the “outer” in left outer join, which gives us the SQL below. This is equivalent to what we have above.

    select * from employee left join location
    on employee.empID = location.empID;

What do left and right mean?

A left outer join retains all of the rows of the “left” table, regardless of whether there is a matching row in the “right” table. What are the “left” and “right” tables? That's easy: the “left” table is simply the table that comes first in the join statement. In this case it is the Employee table, which is called the “left” table because it appears to the left of the keyword “join”. So, the “right” table in this case is Location. The SQL above will give us the result set shown below.

    Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
    13              Jason             13              San Jose
    8               Alex              8               Los Angeles
    3               Ram               3               Pune, India
    17              Babu              17              Chennai, India
    25              Johnson           NULL            NULL

As you can see from the result set, all of the rows from the “left” table (Employee) are returned when we do a left outer join. The last row of the Employee table (which contains the “Johnson” entry) is displayed in the results even though there is no matching row in the Location table. The non-matching columns in that last row are filled with “NULL”, so we have NULL as the entry wherever there is no match.

What is a right outer join?

A right outer join is pretty much the same thing as a left outer join, except that all the rows from the right table are displayed in the result set, regardless of whether or not they have matching values in the left table.
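Here is a minimal sketch that reproduces the left outer join result above. It also shows the right outer join written as a left join with the tables swapped. It uses Python's sqlite3 with an in-memory copy of the example tables. SQLite only added RIGHT and FULL OUTER JOIN in version 3.39.0, and Python may link an older library, so the sketch relies on the swap. The article points out below that the swap is equivalent. An explicit column list keeps the column order the same in both queries, since select * would list the swapped table's columns first.

```python
import sqlite3

# In-memory copy of the example Employee and Location tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empID INTEGER, empName TEXT);
    CREATE TABLE location (empID INTEGER, empLoc TEXT);
    INSERT INTO employee VALUES
        (13, 'Jason'), (8, 'Alex'), (3, 'Ram'), (17, 'Babu'), (25, 'Johnson');
    INSERT INTO location VALUES
        (13, 'San Jose'), (8, 'Los Angeles'), (3, 'Pune, India'),
        (17, 'Chennai, India'), (39, 'Bangalore, India');
""")

COLUMNS = "employee.empID, employee.empName, location.empID, location.empLoc"

# Left outer join: every employee row, with None (NULL) where no location matches.
left_rows = conn.execute(f"""
    SELECT {COLUMNS}
    FROM employee LEFT OUTER JOIN location
      ON employee.empID = location.empID
""").fetchall()

# Right outer join expressed as a left join with the tables swapped:
# every location row, with None where no employee matches.
right_rows = conn.execute(f"""
    SELECT {COLUMNS}
    FROM location LEFT JOIN employee
      ON employee.empID = location.empID
""").fetchall()

for row in left_rows:
    print(row)  # the Johnson row is (25, 'Johnson', None, None)
for row in right_rows:
    print(row)  # the Bangalore row is (None, None, 39, 'Bangalore, India')
```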
This is what the SQL looks like for a right outer join:

    select * from employee right outer join location
    on employee.empID = location.empID;

    -- taking out the "outer", this also works:
    select * from employee right join location
    on employee.empID = location.empID;

Using the tables presented above, we can show what the result set of a right outer join would look like:

    Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
    13              Jason             13              San Jose
    8               Alex              8               Los Angeles
    3               Ram               3               Pune, India
    17              Babu              17              Chennai, India
    NULL            NULL              39              Bangalore, India

We can see that the last row in the result set contains the row that was in the Location table but had no matching “empID” in the Employee table (the “Bangalore, India” entry). Because there is no row in the Employee table with an employee ID of 39, we have NULLs in that row for the Employee columns.

So, what is the difference between the right and left outer joins?

The difference is simple. In a left outer join, all of the rows from the “left” table are displayed, regardless of whether there are any matching columns in the “right” table. In a right outer join, all of the rows from the “right” table are displayed, regardless of whether there are any matching columns in the “left” table. Hopefully the example that we gave above helped clarify this as well.

Should I use a right outer join or a left outer join?

Actually, it doesn't matter. The right outer join does not add any functionality that the left outer join doesn't already have, and vice versa. To get the same results from a left outer join as from a right outer join, all you have to do is switch the order in which the tables appear in the SQL statement. With select *, the columns will simply come out in a different order. If that's confusing, just take a closer look at the examples given above.

In SQL, what's the difference between a full join and an inner join?

A brief explanation of a join

Let's start with a quick explanation of a join.
Joins are used to combine the data from two tables, with the result being a new, temporary table. The temporary table is created based on column(s) that the two tables share, which represent meaningful column(s) of comparison. The goal is to extract meaningful data from the resulting temporary table. Joins are performed based on something called a predicate, which specifies the condition to use in order to perform a join.

It is best to illustrate the differences between full joins and inner joins by use of an example. Here are the 2 tables that we will use for our example:

    Employee                  Location
    EmpID   EmpName           EmpID   EmpLoc
    13      Jason             13      San Jose
    8       Alex              8       Los Angeles
    3       Ram               3       Pune, India
    17      Babu              17      Chennai, India
    25      Johnson           39      Bangalore, India

For the purpose of our example, it is important to note that the very last employee in the Employee table (Johnson, who has an ID of 25) is not in the Location table. Also, no one from the Employee table is from Bangalore (the employee with ID 39 is not in the Employee table). These facts will be significant in the discussion that follows.

Full joins

Let's start the explanation with full joins. Here is what the SQL for a full join would look like, using the tables above:

    select * from employee full join location
    on employee.empID = location.empID;

A full join will return all rows that match based on the “employee.empID = location.empID” join predicate. It will also return all the rows that do not match, which is why it is called a full join.
The SQL above will give us the result set shown below:

    Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
    13              Jason             13              San Jose
    8               Alex              8               Los Angeles
    3               Ram               3               Pune, India
    17              Babu              17              Chennai, India
    25              Johnson           NULL            NULL
    NULL            NULL              39              Bangalore, India

You can see in the table above that the full outer join returned all the rows from both tables. Where the tables do have a match on empID, that is made clear in the results. Anywhere there was no match on empID, there is a “NULL” for the column value. So, that is what a full join looks like.

A full join is also known as a full outer join

It's good to remember that a full join is also known as a full outer join, because it combines the features of both a left outer join and a right outer join.

What about inner joins?

Now that we've gone over full joins, we can contrast them with the inner join. The difference between an inner join and a full join is that an inner join returns only the rows that actually match based on the join predicate, which in this case is “employee.empID = location.empID”. Once again, this is best illustrated via an example. Here's what the SQL for an inner join looks like:

    select * from employee inner join location
    on employee.empID = location.empID;

This can also be written as:

    select * from employee, location
    where employee.empID = location.empID;

Now, here is what the result of running that SQL would look like:

    Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
    13              Jason             13              San Jose
    8               Alex              8               Los Angeles
    3               Ram               3               Pune, India
    17              Babu              17              Chennai, India

The difference between the full join and inner join

We can see that an inner join only returns rows in which there is a match based on the join predicate. In this case, that means that any time the Employee and Location tables share an employee ID, a row will be generated in the results to show the match.
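Here is a minimal sketch that checks both result sets with Python's sqlite3 and an in-memory copy of the example tables. SQLite only supports FULL JOIN from version 3.39.0, so this sketch emulates the full join with a technique that older SQLite versions also accept. It takes all left-join rows, then appends the Location rows that found no matching employee, which is the part a right join would add. The IS NULL test works here because empID is never NULL in the Employee table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empID INTEGER, empName TEXT);
    CREATE TABLE location (empID INTEGER, empLoc TEXT);
    INSERT INTO employee VALUES
        (13, 'Jason'), (8, 'Alex'), (3, 'Ram'), (17, 'Babu'), (25, 'Johnson');
    INSERT INTO location VALUES
        (13, 'San Jose'), (8, 'Los Angeles'), (3, 'Pune, India'),
        (17, 'Chennai, India'), (39, 'Bangalore, India');
""")

# Inner join: only rows whose empID appears in both tables.
inner_rows = conn.execute("""
    SELECT e.empID, e.empName, l.empID, l.empLoc
    FROM employee AS e INNER JOIN location AS l ON e.empID = l.empID
""").fetchall()

# Full outer join emulated: every left-join row, plus the location rows
# that have no matching employee.
full_rows = conn.execute("""
    SELECT e.empID, e.empName, l.empID, l.empLoc
    FROM employee AS e LEFT JOIN location AS l ON e.empID = l.empID
    UNION ALL
    SELECT e.empID, e.empName, l.empID, l.empLoc
    FROM location AS l LEFT JOIN employee AS e ON e.empID = l.empID
    WHERE e.empID IS NULL
""").fetchall()

print(len(inner_rows))  # 4
print(len(full_rows))   # 6
```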
Looking at the original tables, one can see that the employee IDs shared by those tables are the ones displayed in the results. But with a full join, the result set retains all of the rows from both tables.

In SQL, what is the difference between a left join and a left outer join?

There is actually no difference between a left join and a left outer join. They both refer to the exact same operation in SQL. An example will help clear this up.

Here are the 2 tables that we will use for our example:

    Employee                  Location
    EmpID   EmpName           EmpID   EmpLoc
    13      Jason             13      San Jose
    8       Alex              8       Los Angeles
    3       Ram               3       Pune, India
    17      Babu              17      Chennai, India
    25      Johnson           39      Bangalore, India

It's important to note that the very last row in the Employee table does not exist in the Location table. Also, the very last row in the Location table does not exist in the Employee table. These facts will prove to be significant in the discussion that follows.

Left Outer Join

Here is what the SQL for a left outer join would look like, using the tables above:

    select * from employee left outer join location
    on employee.empID = location.empID;

We can remove the “outer” from left outer join, which gives us the SQL below. Running the SQL with the “outer” keyword gives exactly the same results as running it without. Here is the SQL without the “outer” keyword:

    select * from employee left join location
    on employee.empID = location.empID;

A left outer join (also known as a left join) retains all of the rows of the left table, regardless of whether there is a matching row in the right table. The SQL above will give us the result set shown below.
    Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
    13              Jason             13              San Jose
    8               Alex              8               Los Angeles
    3               Ram               3               Pune, India
    17              Babu              17              Chennai, India
    25              Johnson           NULL            NULL

What is the difference between a right outer join and a right join?

Once again, a right outer join is exactly the same as a right join. This is what the SQL looks like:

    select * from employee right outer join location
    on employee.empID = location.empID;

    -- taking out the "outer", this gives us
    -- the same results:
    select * from employee right join location
    on employee.empID = location.empID;

Using the tables presented above, we can show what the result set of a right outer join would look like:

    Employee.EmpID  Employee.EmpName  Location.EmpID  Location.EmpLoc
    13              Jason             13              San Jose
    8               Alex              8               Los Angeles
    3               Ram               3               Pune, India
    17              Babu              17              Chennai, India
    NULL            NULL              39              Bangalore, India

We can see that the last row in the result set contains the row that was in the Location table but not in the Employee table (the “Bangalore, India” entry). Because there is no row in the Employee table with an employee ID of 39, we have NULLs in the result set for the Employee columns.

In SQL, what's the difference between the having clause and the group by statement?

In SQL, the having clause and the group by statement work together when using aggregate functions like SUM, AVG, MAX, etc. This is best illustrated by an example. Suppose we have a table called emp_bonus, as shown below. Note that the table has multiple entries for employees A and B, which means that both of them have received multiple bonuses.

    emp_bonus
    Employee  Bonus
    A         1000
    B         2000
    A         500
    C         700
    B         1250

If we want to calculate the total bonus amount that each employee has received, then we would write a SQL statement like this:

    select employee, sum(bonus) from emp_bonus group by employee;

The Group By Clause

In the SQL statement above, you can see that we use the “group by” clause with the employee column.
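Here is a minimal sketch using Python's sqlite3 and an in-memory copy of the emp_bonus rows. It runs the GROUP BY query above and, for comparison, the HAVING version that the article builds up to. The ORDER BY employee is an addition made here so that the output order is predictable, since GROUP BY alone does not guarantee an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_bonus (employee TEXT, bonus INTEGER);
    INSERT INTO emp_bonus VALUES
        ('A', 1000), ('B', 2000), ('A', 500), ('C', 700), ('B', 1250);
""")

# GROUP BY forms one group per employee; SUM aggregates each group.
totals = conn.execute("""
    SELECT employee, SUM(bonus) FROM emp_bonus
    GROUP BY employee ORDER BY employee
""").fetchall()

# HAVING filters whole groups after aggregation (a WHERE clause cannot test SUM).
over_1000 = conn.execute("""
    SELECT employee, SUM(bonus) FROM emp_bonus
    GROUP BY employee HAVING SUM(bonus) > 1000 ORDER BY employee
""").fetchall()

print(totals)     # [('A', 1500), ('B', 3250), ('C', 700)]
print(over_1000)  # [('A', 1500), ('B', 3250)]
```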
The group by clause allows us to find the sum of the bonuses for each employee, because each employee is treated as his or her very own group. Using ‘group by’ in combination with ‘sum(bonus)’ gives us the sum of all the bonuses for each of employees A, B, and C.

Running the SQL above would return this:

    Employee  Sum(Bonus)
    A         1500
    B         3250
    C         700

Now, suppose we wanted to find the employees who received more than $1,000 in bonuses for the year 2012, assuming of course that the emp_bonus table contains bonuses only for 2012. This is when we need the HAVING clause, which adds the extra check that the sum of bonuses is greater than $1,000. This is what the SQL looks like:

    GOOD SQL:
    select employee, sum(bonus) from emp_bonus
    group by employee having sum(bonus) > 1000;

And the result of running the SQL above would be this:

    Employee  Sum(Bonus)
    A         1500
    B         3250

Difference between having clause and group by statement

So, from the example above, we can see that the group by clause is used to group column(s) so that aggregates (like SUM, MAX, etc.) can be used to find the necessary information. The having clause is used with the group by clause when comparisons need to be made with those aggregate functions. In our example above, that comparison was checking whether the SUM is greater than 1,000. So, the having clause and the group by statement are not really alternatives to each other; they are used alongside one another!

In SQL, how and when would you do a group by with multiple columns? Also provide an example.

In SQL, the group by statement is used along with aggregate functions like SUM, AVG, MAX, etc. Using the group by statement with multiple columns is useful in many different situations, and it is best illustrated by an example. Suppose we have the table shown below, called Purchases. The Purchases table keeps track of all purchases made at a fictitious store.
    Purchases
    purchase_date            item            items_purchased
    2011-03-25 00:00:00.000  Wireless Mouse  2
    2011-03-25 00:00:00.000  Wireless Mouse  5
    2011-03-25 00:00:00.000  MacBook Pro     1
    2011-04-01 00:00:00.000  Paper Clips     20
    2011-04-01 00:00:00.000  Stapler         3
    2011-04-01 00:00:00.000  Paper Clips     15
    2011-05-15 00:00:00.000  DVD player      3
    2011-05-15 00:00:00.000  DVD player      8
    2011-05-15 00:00:00.000  Stapler         5
    2011-05-16 00:00:00.000  MacBook Pro     2

Now, let's suppose that the owner of the store wants to find out, for a given date, how many of each product were sold in the store. Then we would write this SQL in order to find that out:

    select purchase_date, item, sum(items_purchased) as "Total Items"
    from Purchases
    group by item, purchase_date;

Running the SQL above would return this:

    purchase_date            item            Total Items
    2011-03-25 00:00:00.000  Wireless Mouse  7
    2011-03-25 00:00:00.000  MacBook Pro     1
    2011-04-01 00:00:00.000  Paper Clips     35
    2011-04-01 00:00:00.000  Stapler         3
    2011-05-15 00:00:00.000  DVD player      11
    2011-05-15 00:00:00.000  Stapler         5
    2011-05-16 00:00:00.000  MacBook Pro     2

Note that in the SQL we wrote, the group by statement uses multiple columns: “group by item, purchase_date;”. This allows us to group the individual items for a given date. We are dividing the results by the date the items were purchased, and then, for each date, finding how many of each item were purchased. This is why the group by statement with multiple columns is so useful!

In SQL, how do distinct and order by work together?

The best way to illustrate this is through an example. Let's say that we have a table called Orders like the one below, where each row represents a separate order.
    Orders
    ordernumber  order_date  cust_id  salesperson_id  Amount
    10           8/2/96      4        2               540
    20           1/30/99     4        8               1800
    30           7/14/95     9        1               460
    40           1/29/98     7        2               2400
    50           2/3/98      6        7               600
    60           3/2/98      6        7               720
    70           5/6/98      9        7               150

Now suppose that we want to retrieve all of the salesperson IDs and sort them in descending order according to their highest respective order Amount value (the Amount column). This will serve as a ranking of the salespeople to see who has the most valuable orders. And, of course, we only want each salesperson ID to be displayed once in the results. We don't care about all of their order amounts, just their highest one.

So, you might think that you can just write some SQL like this to get what you want:

    SELECT DISTINCT salesperson_id FROM Orders
    ORDER BY Amount DESC -- in descending order, returns highest amount first...

DISTINCT and Order By in MySQL

If we run that query in MySQL, you may have expected it to return this:

    salesperson_id
    --------------
    2
    8
    7
    1

Running the SQL above in MySQL actually returns this as the result set:

    salesperson_id
    --------------
    8
    7
    2
    1

But wait a minute! If you look at the Orders table above, you can see that the salesperson_id with the highest corresponding Amount is not 8, but 2, because salesperson_id 2 has an order with an amount of 2400. Yet 2 appears 3rd in the list. So what the heck is going on here? Why is our SQL returning such strange results?

Well, let's analyze the query a bit more to see what is actually happening. We are asking for every distinct salesperson_id in the Orders table, ordered by their corresponding order Amount. But the problem is that the salespeople with salesperson_id values of 2 and 7 both have multiple orders in the Orders table.

The query itself is not specific enough

So, in the query above we are asking MySQL to retrieve every distinct value of salesperson_id and order those results by their corresponding Amount value.
For example, when it comes across the orders with a salesperson_id of 2, it does not know whether we want the row where the order amount is 540 or the row where it is 2400. It has to choose only one of those rows, because we specifically asked for distinct values of salesperson_id. This means that it just chooses one of those rows arbitrarily, since we never told it which one. And MySQL is evidently choosing the row where the amount is 540, because 2 would have been returned at the top of our list if it had chosen the row where the Amount is 2400.

But, you might be thinking, we specified that we want to order the results by descending Amount values. So why doesn't the SQL just take the highest value for each salesperson_id and use that? Well, because we never actually told SQL that this is what we wanted! Look closely at the SQL and you will see what I mean. Do we ever actually specify that it should choose the highest Amount for EACH salesperson_id and use that value? No, we don't! So the problem is that the SQL is not specific enough. We have to tell the RDBMS exactly what we want in order to get the right results; otherwise we get results that do not make sense. In other words, when you do stupid things, stupid things happen.

Why does MySQL allow columns in the ORDER BY if they are not part of the SELECT DISTINCT list?

Actually, running the query above would result in an error message in other RDBMSs like SQL Server. The only reason MySQL allows it is that it assumes you know what you are doing. The query would actually make sense if the Amount value were the same across all the rows for a given salesperson_id. As always, an example will help clarify what we mean here.
Let's suppose that the Orders table looks like this instead:

    Orders
    ordernumber  order_date  cust_id  salesperson_id  Amount
    10           8/2/96      4        2               2400
    20           1/30/99     4        8               1800
    30           7/14/95     9        1               460
    40           1/29/98     7        2               2400
    50           2/3/98      6        7               600
    60           3/2/98      6        7               600
    70           5/6/98      9        7               600

Now, if we run that same exact query:

    SELECT DISTINCT salesperson_id FROM Orders
    ORDER BY Amount DESC -- in descending order, returns highest amount first...

We will now get results that make sense:

    salesperson_id
    --------------
    2
    8
    7
    1

We now get the results we expected because, within each salesperson_id group (2 and 7), every row has the same exact Amount. MySQL will still arbitrarily choose one row from the group of rows with a salesperson_id of 2 or 7. But each row in a given group has the same Amount as all the others, so it does not matter which row MySQL chooses; you will get the same results.

So, it is safe to select a column with DISTINCT while ordering by a different, non-selected column, provided one condition holds. All the rows that share a value of the DISTINCT column must also share the same value of the ORDER BY column. That sounds confusing, but it should make sense if you paid attention to our example. If that condition does not hold, then you run the risk of getting some very unexpected results, as we showed above.

Now, the question is: what is a good workaround for the problem we presented above? Read on to find out. The solution we present should work across most (if not all) RDBMSs.

Workaround for the “ORDER BY items must appear in the select list if SELECT DISTINCT is specified” error message in SQL Server

As we mentioned above, MySQL would allow you to run a query like this without throwing any error:

    SELECT DISTINCT salesperson_id FROM Orders ORDER BY Amount DESC

But SQL Server actually does throw an error, which says “ORDER BY items must appear in the select list if SELECT DISTINCT is specified”.
So, what is a good way to modify our SQL to work around this error message and get the results that we actually want?

You might think that you could fix the SQL Server error message by just writing some code like this:

    SELECT DISTINCT salesperson_id, Amount FROM Orders
    ORDER BY Amount DESC

But think carefully about what the SQL above is doing. It applies the DISTINCT keyword to both the salesperson_id and Amount columns. This means that every row with a distinct combination of values in those 2 columns will be returned. Take a look at the original Orders table and you can see that every row has a distinct combination of salesperson_id and Amount values. So the salesperson_id and Amount from every row in the table will be returned when the SQL above is run. Of course, the results will be ordered by Amount in descending order. This is what the results will look like when we run the SQL above:

    salesperson_id  Amount
    2               2400
    8               1800
    7               720
    7               600
    2               540
    1               460
    7               150

But is this what we actually wanted? No! What we really want is the list of salesperson IDs in order of who has the highest valued order, with each salesperson ID appearing only once in the result list. All the query above gives us is basically every row's salesperson_id and Amount combination, ordered by the Amount value.

So what is a workaround for this problem? In other words, how can we be more specific to get what we really want? Well, let's rephrase the problem. What if we say we want to retrieve each salesperson ID sorted by their respective highest dollar amount value, with each salesperson_id returned just once? This is different from just saying that we want each distinct salesperson ID sorted by their Amount. We are being more specific by saying that we want to sort by their respective highest dollar amount value.
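The difference is easy to check with a minimal sketch using Python's sqlite3 and an in-memory copy of the original Orders rows. The two-column DISTINCT returns all seven rows. Grouping by salesperson_id and ordering by MAX(Amount) is the approach the article works toward next, and it returns each ID once. Every Amount value in this data is distinct, so the ordering of both results is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (ordernumber INTEGER, order_date TEXT, cust_id INTEGER,
                         salesperson_id INTEGER, Amount INTEGER);
    INSERT INTO Orders VALUES
        (10, '8/2/96', 4, 2, 540),  (20, '1/30/99', 4, 8, 1800),
        (30, '7/14/95', 9, 1, 460), (40, '1/29/98', 7, 2, 2400),
        (50, '2/3/98', 6, 7, 600),  (60, '3/2/98', 6, 7, 720),
        (70, '5/6/98', 9, 7, 150);
""")

# DISTINCT applies to the (salesperson_id, Amount) pair, so all 7 rows survive.
pairs = conn.execute("""
    SELECT DISTINCT salesperson_id, Amount FROM Orders ORDER BY Amount DESC
""").fetchall()

# One row per salesperson, ranked by that salesperson's largest order.
ranking = [row[0] for row in conn.execute("""
    SELECT salesperson_id FROM Orders
    GROUP BY salesperson_id ORDER BY MAX(Amount) DESC
""")]

print(len(pairs))  # 7
print(ranking)     # [2, 8, 7, 1]
```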
Hopefully you see the difference. Now that we have a more specific question in mind, let’s see if we can come up with a more specific answer so that we can write the correct SQL.

Since we want to find the highest Amount value for each salesperson_id, what SQL construct do you think we should use? If you guessed GROUP BY, you would be correct. To find the highest value for each group of salesperson_ids, we need the GROUP BY clause. Then we can order the results by the maximum Amount in each group. This is what the SQL would look like:

    SELECT salesperson_id
    FROM Orders
    GROUP BY salesperson_id
    ORDER BY MAX(Amount) DESC -- in descending order, returns highest amount first...

Note that DISTINCT is not needed here, because GROUP BY already returns each salesperson_id only once. In SQL Server, keeping DISTINCT would most likely bring back the same error, since MAX(Amount) does not appear in the select list.

Just to clarify how the GROUP BY works: for the group of rows with a salesperson_id of 7, the maximum Amount is 720. For the group with a salesperson_id of 2, the maximum Amount is 2400. So, running the SQL above gives us these results, which are correct:

salesperson_id
--------------
2
8
7
1

Finally we have a query that makes sense, and it gives us results that make sense!

In SQL, what is the default sort order of the ORDER BY clause?

By default, ORDER BY sorts in ascending order if no order (ascending or descending) is explicitly specified. This means values are sorted from the “smallest” value to the largest. This is true in all major RDBMS’s, including MySQL, Oracle, Microsoft SQL Server, Teradata, SAP, and others.

An example showing the ORDER BY default sort order:

Take a look at the simple table below.
Customers
cust_id  cust_name
79       Joe
32       Bill
87       Akash
14       Sam

Now, let’s write some SQL to retrieve the cust_name values sorted by their respective cust_id values. Note that we do not specify whether to sort in descending or ascending order:

    SELECT cust_name
    FROM Customers
    ORDER BY cust_id

Because ORDER BY sorts in ascending order by default, the SQL above returns the following results:

cust_name
Sam
Bill
Joe
Akash

Now you have seen the default behavior of the ORDER BY clause in SQL: it sorts in ascending order.

Suppose that you are given the following simple database table called Employee that has 2 columns named Employee ID and Salary:

Employee
Employee ID  Salary
3            200
4            800
7            450

Write a SQL query to get the second highest salary from the table above. Also write a query to find the nth highest salary in SQL, where n can be any number.

The easiest way to start with a problem like this is to ask yourself a simpler question first. So, how can we find the highest salary in a table? That is actually really easy: we can just use the MAX aggregate function:

    SELECT MAX(Salary) FROM Employee;

Remember that SQL is based on set theory

SQL uses sets as the foundation for most of its queries. So, how can we use set theory to find the 2nd highest salary in the table above? Think about it on your own for a bit. Even if you do not remember much about sets, the answer is very easy to understand, and you might be able to come up with it on your own.

Figuring out the answer to find the 2nd highest salary

What if we exclude the highest salary value from the result set returned by the SQL that we run? If we remove the highest salary from a group of salary values, we will have a new group of values whose highest salary is actually the 2nd highest in the original Employee table.
So, if we can somehow select the highest value from a result set that excludes the highest value, then we are actually selecting the 2nd highest salary value. Think about that carefully and see if you can come up with the actual SQL yourself before you read the answer that we provide below. Here is a small hint to help you get started: you will have to use the “NOT IN” SQL operator.

Solution to finding the 2nd highest salary in SQL

Here is what the SQL looks like:

    SELECT MAX(Salary) FROM Employee
    WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee)

Running the SQL above returns “450”, which is of course the 2nd highest salary in the Employee table.

An explanation of the solution

The SQL above first finds the highest salary value in the Employee table using “(SELECT MAX(Salary) FROM Employee)”. Adding “WHERE Salary NOT IN” in front of it then creates a new set of Salary values that does not include the highest Salary value. For instance, suppose the highest salary in the Employee table were 200,000. That value would be excluded by the “NOT IN” operator, and all values except 200,000 would be retained. This means that the highest value in this new result set is actually the 2nd highest value in the Employee table. So, we then select the max Salary from the new result set, and that gives us the 2nd highest Salary in the Employee table. And that is how the query above works.

An alternative solution using the not equals SQL operator

We can also use the not equals operator, “<>”, instead of the NOT IN operator as an alternative solution to this problem. This is what the SQL would look like:

    SELECT MAX(Salary) FROM Employee
    WHERE Salary <> (SELECT MAX(Salary) FROM Employee)

How would you write a SQL query to find the Nth highest salary?
What we did above was write a query to find the 2nd highest Salary value in the Employee table. But another commonly asked interview question is how to use SQL to find the Nth highest salary, where N can be any number: the 3rd highest, 4th highest, 5th highest, 10th highest, and so on. This is also an interesting question, so try to come up with an answer yourself before reading the one below.

The answer and explanation to finding the nth highest salary in SQL

Here we will present one possible answer to finding the nth highest salary first, and explain it afterward, since it’s actually easier to understand that way. Note that this first answer is not optimal from a performance standpoint, since it uses a correlated subquery. We still think it is worth studying, because you might just learn something new about SQL. If you want to see the more optimal solutions first, you can skip down to the section titled “Find the nth highest salary in SQL without a subquery” instead.

The SQL below will give you the correct answer, but you will have to plug in an actual value for N, of course. This SQL to find the Nth highest salary should work in SQL Server, MySQL, DB2, Oracle, Teradata, and almost any other RDBMS:

    SELECT * /* This is the outer query part */
    FROM Employee Emp1
    WHERE (N-1) = ( /* Subquery starts here */
        SELECT COUNT(DISTINCT(Emp2.Salary))
        FROM Employee Emp2
        WHERE Emp2.Salary > Emp1.Salary)

How does the query above work?

The query above can be quite confusing if you have not seen anything like it before. Pay special attention to the fact that “Emp1” appears in both the subquery (also known as an inner query) and the “outer” query. The outer query is just the part of the query that is not the subquery/inner query. Both parts of the query are clearly labeled in the comments.
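Before stepping through the query by hand, it can help to run it. The sketch below is only an illustration under our own assumptions:

- It uses Python’s built-in sqlite3 module with an in-memory SQLite database.
- It names the ID column EmployeeID, since “Employee ID” contains a space.
- It binds N as a query parameter (the ? placeholder) instead of typing it into the SQL.

It also runs the NOT IN query from the previous section so the two answers for N = 2 can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(3, 200), (4, 800), (7, 450)])

# Correlated subquery: keep the Emp1 rows that have exactly N-1 distinct
# salaries greater than their own. N is bound through the ? placeholder.
NTH_HIGHEST = """
    SELECT *
    FROM Employee Emp1
    WHERE (? - 1) = (
        SELECT COUNT(DISTINCT Emp2.Salary)
        FROM Employee Emp2
        WHERE Emp2.Salary > Emp1.Salary)
"""

# The NOT IN approach from the previous section (2nd highest only).
SECOND_HIGHEST = """
    SELECT MAX(Salary) FROM Employee
    WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee)
"""


def nth_highest(n):
    """Return the full Employee row(s) holding the nth highest salary."""
    return conn.execute(NTH_HIGHEST, (n,)).fetchall()


for n in (1, 2, 3):
    print(n, nth_highest(n))

print(conn.execute(SECOND_HIGHEST).fetchone()[0])
```

For N = 2 both queries agree on 450. Asking for an N larger than the number of distinct salaries simply returns no rows.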
The subquery is a correlated subquery

The subquery in the SQL above is a specific type of subquery known as a correlated subquery. It is called a correlated subquery because it uses a value from the outer query in its WHERE clause. In this case that value is Emp1.Salary, which comes from the Emp1 table alias we pointed out earlier. A normal subquery can be run independently of the outer query, but a correlated subquery can NOT be run independently of the outer query. If you want to read more about the differences between correlated and uncorrelated subqueries, see Correlated vs Uncorrelated Subqueries.

The most important thing to understand about the query above is that the subquery is evaluated every time a row is processed by the outer query. In other words, the inner query cannot be processed independently of the outer query, since the inner query uses the Emp1 value as well.

Finding nth highest salary example and explanation

Let’s step through an actual example to see how the query above executes. Suppose we are looking for the 2nd highest Salary value in our table above, so N is 2. This means that the query will look like this:

    SELECT *
    FROM Employee Emp1
    WHERE (1) = (
        SELECT COUNT(DISTINCT(Emp2.Salary))
        FROM Employee Emp2
        WHERE Emp2.Salary > Emp1.Salary)

You can probably see that Emp1 and Emp2 are just aliases for the same Employee table. It’s as if we created 2 separate clones of the Employee table and gave them different names.

Understanding and visualizing how the query above works

Let’s assume that we are using this data:

Employee
Employee ID  Salary
3            200
4            800
7            450

For the sake of our explanation, let’s assume that N is 2, so the query is trying to find the 2nd highest salary in the Employee table. The first thing the query above does is process the very first row of the Employee table, which has the alias Emp1. The salary in the first row of the Employee table is 200.
Because the subquery is correlated to the outer query through the alias Emp1, the query essentially looks like this when the first row is processed (all we did is replace Emp1.Salary with the value 200):

    SELECT *
    FROM Employee Emp1
    WHERE (1) = (
        SELECT COUNT(DISTINCT(Emp2.Salary))
        FROM Employee Emp2
        WHERE Emp2.Salary > 200)

So, what exactly is happening when that first row is processed? If you pay special attention to the subquery, you will notice that it counts the distinct salary values in the Employee table that are greater than 200. The outer query then checks whether that count equals 1. If it does, everything from that particular row in Emp1 is returned.

Note that Emp1 and Emp2 are both aliases for the same table, Employee. Emp2 is only used in the subquery to compare all the salary values to the current salary value chosen in Emp1. This lets us find the number of salary values (the count) that are greater than 200. If this number is equal to N-1 (which is 1 in our case), then we know that we have a winner and have found our answer.

But the subquery will clearly return 2 when Emp1.Salary is 200, because there are 2 salaries greater than 200 in the Employee table (800 and 450). Since 2 is not equal to 1, the salary of 200 will not be returned.

So, what happens next? The SQL processor moves on to the next row, which is 800, and the resulting query looks like this:

    SELECT *
    FROM Employee Emp1
    WHERE (1) = (
        SELECT COUNT(DISTINCT(Emp2.Salary))
        FROM Employee Emp2
        WHERE Emp2.Salary > 800)

Since there are no salaries greater than 800, the count is 0 and this row is not returned either. The query then moves on to the last row and finds the answer, 450: only one salary (800) is greater than 450, so the count is 1.
More precisely, the entire row with the desired salary is returned, and this is what it would look like:

EmployeeID  Salary
7           450

It’s also worth pointing out that DISTINCT is used in the query above because there may be duplicate salary values in the table. In that scenario, we want to count repeated salaries just once, which is exactly why we use the DISTINCT operator.

A high level summary of how the query works

Let’s go through a high level summary of how someone would have come up with this SQL in the first place. We showed you the answer first, without going through the thought process one would use to arrive at it.

Think of it this way: we are looking for a pattern that will lead us to the answer. One way to look at it is that the 2nd highest salary has exactly one salary that is greater than it. The 4th highest salary has 3 salaries that are greater than it. In more general terms, to find the Nth highest salary, we just find the salary that has exactly N-1 salaries greater than itself. And that is exactly what the query above accomplishes: it finds the salary that has N-1 salaries greater than itself and returns that row as the answer.

Find the nth highest salary using the TOP keyword in SQL Server

We can also use the TOP keyword (in databases that support it, like SQL Server) to find the nth highest salary. Here is some fairly simple SQL that does that:

    SELECT TOP 1 Salary
    FROM (
        SELECT DISTINCT TOP N Salary
        FROM Employee
        ORDER BY Salary DESC
    ) AS Emp
    ORDER BY Salary

To understand the query above, first look at the subquery. It simply finds the N highest distinct salaries in the Employee table and arranges them in descending order. The outer query then rearranges those values in ascending order. That is what the very last line, “ORDER BY Salary”, does, because ORDER BY sorts values in ascending order by default.
Finally, that means the Nth highest salary will be at the top of the list of salaries. So we just want the first row, which is exactly what “SELECT TOP 1 Salary” does for us!

Find the nth highest salary without using the TOP keyword

There are many other solutions to finding the nth highest salary that do not use the TOP keyword, one of which we already went over. Keep reading for more solutions.

Find the nth highest salary in SQL without a subquery

The correlated subquery solution we gave above does not perform well, because the subquery may be re-evaluated for every row of the outer query. That can really slow the query down on large tables. With that in mind, let’s go through some different solutions to this problem for different database vendors. Each database vendor (whether it’s MySQL, Oracle, or SQL Server) has different SQL syntax and functions, so we will go through solutions for specific vendors. But keep in mind that the solution presented above using a subquery should work across different database vendors.

Find the nth highest salary in MySQL

In MySQL, we can use the LIMIT clause along with an offset to find the nth highest salary. If that doesn’t make sense, take a look at this MySQL-specific SQL to see how we can do it:

    SELECT Salary FROM Employee
    ORDER BY Salary DESC
    LIMIT n-1, 1

As with N in the earlier queries, replace n-1 with an actual number, because MySQL’s LIMIT clause does not accept an expression like n-1. Also note that, unlike the correlated subquery solution, this query counts duplicate salaries separately. Selecting DISTINCT Salary instead would count each salary value only once.

Note that the DESC used in the query above simply arranges the salaries in descending order, from highest salary to lowest. The key part of the query to pay attention to is “LIMIT n-1, 1”. The LIMIT clause takes two arguments in that query:

- The first argument specifies the offset of the first row to return.
- The second argument specifies the maximum number of rows to return.

So, the offset of the first row to return is n-1, and the maximum number of rows to return is 1.

What exactly is the offset?
The offset is just a number that represents how many rows to skip, counting from the very first row. The offset counts from 0 and the rows are arranged in descending order. So the row at an offset of N-1 contains the Nth highest salary. For example, with N = 2 the offset is 1, which skips the highest salary and returns the second one.

Find the nth highest salary in SQL Server

In SQL Server, there is no LIMIT clause. But we can still use an offset to find the nth highest salary without using a subquery, just like the MySQL solution above. The SQL Server syntax is a bit different, though. Here is what it would look like:

    SELECT Salary FROM Employee
    ORDER BY Salary DESC
    OFFSET N-1 ROWS
    FETCH NEXT 1 ROWS ONLY

Note that the OFFSET/FETCH syntax is only available in SQL Server 2012 and later. Let me know in the comments if you notice anything else about the query.

Find the nth highest salary in Oracle using row_number

Older versions of Oracle (before 12c) don’t support an offset like MySQL and SQL Server do. But we can use the row_number analytic function in Oracle to solve this problem. Here is what the Oracle-specific SQL would look like to find the nth highest salary:

    SELECT * FROM (
        SELECT Emp.*,
               row_number() OVER (ORDER BY Salary DESC) rownumb
        FROM Employee Emp
    )
    WHERE rownumb = n; /* n is nth highest salary */

The first thing you should notice in the query above is that inside the subquery the salaries are arranged in descending order. The row_number analytic function is then applied against that list of descending salaries, so each row is assigned a row number starting from 1. Since the rows are arranged in descending order, the row with the highest salary has a row number of 1. Note that the row number is given the alias rownumb in the SQL above. This means that to find the 3rd or 4th highest salary we simply look for the row where rownumb is 3 or 4.
The query above then compares rownumb to n, and if they are equal, it returns everything in that row. And that is our answer!

Find the nth highest salary in Oracle using RANK

Oracle also provides a RANK function that assigns a numeric ranking to sorted values. Here, 1 is the highest salary, because we sort in descending order. So, we can use this SQL in Oracle to find the nth highest salary with the RANK function:

    SELECT * FROM (
        SELECT EmployeeID, Salary,
               rank() OVER (ORDER BY Salary DESC) ranking
        FROM Employee
    )
    WHERE ranking = N;

The rank function assigns a ranking to each row starting from 1. This query is quite similar to the one where we used the row_number() analytic function, and it works the same way when all salaries are different. With duplicate salaries, though, the two differ:

- row_number gives tied rows different numbers.
- RANK gives tied rows the same rank and then skips numbers (for example 1, 2, 2, 4), so a given N may match no rows.

Oracle’s DENSE_RANK function assigns ranks without gaps, which matches the DISTINCT counting used in the correlated subquery solution.

We’ve now gone through many different solutions for different database vendors like Oracle, MySQL, and SQL Server. Hopefully you now understand how to solve a problem like this, and you have improved your SQL skills in the process! Be sure to leave a comment if you have any questions or observations.

What is a role in a database?

A database role is a named collection of any number of permissions/privileges that can be assigned to one or more users. Most of today’s RDBMS’s come with predefined roles that can be assigned to any user. But a database user can also create his or her own role if he or she has the CREATE ROLE privilege.

Advantages of Database Roles

Why are database roles needed? Let’s go over some of the advantages of using database roles and why they can be necessary:

Roles continue to live in the database even after users are deleted/dropped

Many times a DBA (Database Administrator) has to drop user accounts for various reasons. For example, an employee quits the company, so his or her user account is removed from the system.
Now suppose that those same user accounts need to be recreated later on. Just assume that the same employee rejoins the company and needs the same account. That employee’s user account probably had a lot of specific permissions assigned to it. When the account was deleted, all of those permissions were deleted as well. This creates a hassle for the DBA, who has to reassign all of those permissions one by one. But if a role had been used, all of those permissions could have been bundled into one role. Reinstating that employee in the system would then simply mean reassigning the role to the employee. And, of course, that role could be used for other users as well. This is a big advantage of using database roles.

Roles save DBAs time

Another advantage is that a DBA can grant a lot of privileges with one simple command by assigning a role to a user.

Database roles are present before user accounts are created

Finally, database roles can be used to define a group of permissions that can be reused for new users who belong to a specific group of people who need those permissions. For example, you may want to reserve a role with a group of permissions just for advanced users who know what they are doing. You would then assign that role to a user only when a new advanced user needs it. Or, you can have a group of privileges for users who are all working on the same project and need the same type of access.

Disadvantages of Database Roles

The main disadvantage of using a database role is that a role may be granted to a user even though it has more privileges than that user actually needs. This could cause a security issue if that user abuses the extra privileges and potentially damages some part of the database.
An example of this is that in older versions of Oracle (before release 10.2), there was a role called CONNECT. It included privileges like CREATE TABLE, CREATE VIEW, CREATE SESSION, ALTER SESSION, and several others. But having all of these privileges is probably too much for a normal business user. That is probably why, in newer versions of Oracle (since version 10.2), the CONNECT role has been changed so that it only has the CREATE SESSION privilege.

How to create a database role

Most RDBMS’s use the CREATE ROLE syntax to define a role. The GRANT statement is then used to give permissions to that role. But the exact details vary from one RDBMS to another, so it’s best to consult the documentation.

Example of a database role

Here is an example of what creating a database role could look like:

    CREATE ROLE advancedUsers;
    GRANT UPDATE ON SOMETABLE TO advancedUsers;

What does the CREATE USER Statement do in SQL? What is the syntax and other details?

As its name suggests, the CREATE USER statement allows you to create a user in the database. Most of the popular databases also provide some sort of graphical interface that allows you to create users without typing in any SQL. One example is phpMyAdmin, which is a PHP interface to the MySQL database. Note that the SQL standard itself leaves the definition of users to each implementation. So CREATE USER is a vendor extension whose details differ between databases, even though most major databases provide it.

SQL CREATE USER Syntax

Here is the general form of the CREATE USER statement:

    CREATE USER username [IDENTIFIED BY password] [other options];

CREATE USER Identified By

The IDENTIFIED BY clause lets you specify how the database should authenticate the user. The exact syntax of the IDENTIFIED BY clause varies from one database to another.

What is a database transaction? How do database transactions work? Provide an example of a database transaction.

In databases, a transaction is a set of separate actions that must all be completely processed, or none processed at all.
When you think of a transaction, think of the phrase “all or nothing”. That is a defining feature of database transactions: either every part of the transaction is completed, or nothing at all. It’s important to understand that a transaction can consist of multiple SQL statements, not just one. An example would be transferring funds from one bank customer to another. This scenario has to both debit one customer and credit another, requiring updates to different rows in a table, but it is still considered a single transaction. A commonly used synonym for a transaction is a unit of work.

The acronym ACID can be used to remember the properties of database transactions. Here is what each letter in ACID stands for:

Atomicity. A transaction must remain whole: it’s all or nothing. The transaction as a whole must either fully succeed or fully fail. If the transaction succeeds, all of its changes must be saved by the system. If the transaction fails, all of the changes made by the transaction must be completely undone, and the system must revert to its original state before the changes were applied. The term rollback is used for the process that undoes the changes made by a failed transaction. Think of it as the database rolling back the changes of a failed transaction. The term commit refers to the process that makes the transaction’s changes permanent. Think of it as the database fully committing the transaction’s changes once and for all.

Consistency. A transaction should change the database from one consistent state to another.

Isolation. Each transaction should do its work independently of other transactions that might be running at the same time.

Durability.
This means that any changes made by transactions that have run to completion should be permanent, even if the database fails or shuts down due to something like a power loss. You might be confused, because data in a database is clearly always changing, so how could anything be permanent? Permanent in this context simply means that the changes made by the transaction will not disappear if and when the database encounters a failure or shuts down.

RDBMS’s and Transaction support

Most RDBMS’s support transactions. This means that they are able to identify both the start and end of every transaction. They also log all changes made by a transaction in order to be ready for a rollback if necessary. Of course, the way transactions are supported varies from one RDBMS vendor to another. Oracle differs from MySQL in the way it supports transactions, as does DB2 from SQL Server, and so on.

What is a transaction log?

Most of the RDBMS’s that support transactions record all of the transactions, along with any changes made by those transactions, in a transaction log. The transaction log contains a copy of what the data looked like before and after any changes made by a transaction. If a rollback is necessary, the record of what the data looked like before the changes were applied can be used to reverse the changes made by the transaction. Also, a commit of a transaction is not really considered finished until the transaction log has a record of the commit. If some sort of power failure brings down a database, the transaction log may be the only way that data can be recovered. This is especially true because database changes are not written to disk immediately and may not have made it to disk before the outage.
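To make the commit and rollback behavior described above concrete, here is a minimal sketch of the funds-transfer example as a single transaction. It is an illustration only. It uses Python’s built-in sqlite3 module, a hypothetical accounts table, and hypothetical customer names A, B and C. The transfer() helper is ours, not part of any library. When the credit step fails because the target account does not exist, the debit that already ran is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical accounts table for illustration.
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 5000), ("B", 0)])
conn.commit()


def transfer(source, target, amount):
    """Debit source and credit target as one unit of work (all or nothing)."""
    try:
        # 'with conn' commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, source))
            cur = conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                               (amount, target))
            if cur.rowcount != 1:
                # Fail after the debit has already run, to show it being undone.
                raise ValueError(f"unknown account: {target}")
        return True
    except ValueError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer("A", "B", 5000), balances())  # committed
print(transfer("B", "C", 1000), balances())  # rolled back: there is no account C
```

Python’s sqlite3 connection, used as a context manager (with conn:), commits when the block finishes normally. It rolls back when an exception escapes the block, which is what gives the transfer its all-or-nothing behavior here.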
An example of a database transaction

Because transaction support differs from one database to another, it’s hard to give an example of a transaction without going into the specific syntax of a particular RDBMS. But some RDBMS’s let you start a transaction with a SQL statement like “START TRANSACTION” or “BEGIN TRANSACTION”. You then follow that statement with the SQL that you would like to run as part of the transaction. Finish with COMMIT, or with ROLLBACK to undo the changes.

What is a database deadlock? Provide an example and explanation of a deadlock in a database.

In a database, a deadlock occurs when two or more database sessions each have some data locked, and each session requests a lock on data that another session has already locked. Because the sessions are waiting for each other, nothing can get done, and the sessions just waste time instead. This scenario, where nothing happens because sessions are waiting indefinitely for each other, is known as deadlock.

If you are confused, some examples of deadlock should definitely help clarify what goes on. You should probably also read our explanation of database locks before proceeding, since that will help your understanding as well.

Database deadlock example

Suppose we have two database sessions called A and B. Let’s say that session A requests and gets a lock on some data, which we will call Y. Session B then has a lock on some data that we will call Z. Now, let’s say that session A needs a lock on data Z in order to run another SQL statement, but that lock is currently held by session B. And session B needs a lock on data Y, but that lock is currently held by session A. This means that session A is waiting for session B’s lock, and session B is waiting for session A’s lock. And this is what deadlock is all about!
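The circular wait just described can be modeled as a wait-for graph. Each session points to the session holding the lock it needs, and a deadlock shows up as a cycle. The sketch below is a simplified illustration of that idea, not how any particular DBMS implements detection. It assumes each session waits on at most one other session, and the function name find_deadlock is hypothetical.

```python
def find_deadlock(waits_for):
    """Return the sessions forming a cycle in the wait-for graph, or None.

    waits_for maps each session to the session holding the lock it wants
    (in this simplified model a session waits for at most one other session).
    """
    for start in waits_for:
        path = []
        session = start
        # Follow the chain of waits until it ends or revisits a session.
        while session in waits_for and session not in path:
            path.append(session)
            session = waits_for[session]
        if session in path:
            # We came back to a session already on this path: that is a cycle.
            return path[path.index(session):]
    return None


# Session A holds Y and wants Z; session B holds Z and wants Y.
print(find_deadlock({"A": "B", "B": "A"}))  # ['A', 'B']
# Session C waits on A, but A is not waiting on anyone: no deadlock.
print(find_deadlock({"C": "A"}))  # None
```

A DBMS that finds such a cycle would then pick one session in it and abort its request, as discussed in the deadlock detection section.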
Let’s go through a more detailed (and less abstract) example so that you can get a more specific idea of how deadlock can arise.

Database deadlock example in banking

Let’s use an example of two database users working at a bank, whom we will call X and Y. User X works in the customer service department. He has to update the database for two of the bank’s customers, because one customer (call him customer A) incorrectly received $5,000 in his account when it should have gone to another customer (call him customer B). So user X has to debit customer A’s account by $5,000 and credit customer B’s account by $5,000. Note that the crediting of customer B and the debiting of customer A will be run as a single transaction. This is important for the discussion that follows.

Now, let’s also say that the other database user, Y, works in the IT department. He has to go through the customers table and update the zip code of all customers who currently have a zip code of 94520, because that zip code has been changed to 94521. The SQL for this would simply have a WHERE clause that limits the update to customers with a zip code of 94520. Also, both customers A and B currently have zip codes of 94520, which means that their information will be updated by database user Y.

Here is a breakdown of the events in our fictitious example that lead to deadlock:

1. Database user X in the customer service department selects customer A’s data and updates A’s bank balance to debit/decrease it by $5,000. What’s important here is that no COMMIT has been issued yet. Database user X still has to update customer B’s balance to increase/credit it by $5,000, and those 2 separate SQL statements run as a single transaction. Most importantly, this means that database user X still holds a lock on the row for customer A, because his transaction is not fully committed yet (he still has to update customer B).
The lock on the row for customer A will stay until the transaction is committed.

2. Database user Y then runs his SQL to update the zip codes for customers with a zip code of 94520. The SQL updates customer B’s zip code. The SQL statement from user Y runs as a single transaction, and it has not committed yet because not all of the customers have had their zip codes changed. This means that database user Y holds a lock on the row for customer B.

3. Database user X still has to run the SQL statement that updates customer B’s balance to increase it by $5,000. But now the problem is that database user Y has a lock on the row for customer B. The request to update customer B’s balance must wait for user Y to release that lock, so database user X is waiting for user Y.

4. Now, the SQL statement being run by user Y tries to update the zip code for customer A. But this update cannot happen, because user X holds a lock on customer A’s row. So, user Y is waiting for a lock to be released by user X.

5. Now you can see that user X is waiting for user Y to release a lock, and user Y is waiting for user X to release a lock. This is deadlock: neither user can make any progress because they are both waiting for each other. In theory, these two database sessions would be stalled forever. But read on to see how some DBMS’s deal with this situation.

Database deadlock prevention

So now you have seen an example of deadlock. The question is how DBMS’s deal with it. Very few modern DBMS’s actually prevent or avoid deadlocks, because a lot of overhead is required to do so.
This is because a DBMS that tries to prevent deadlocks has to predict what a database user will do next. The theory behind deadlock prevention is that each lock request is inspected to see if it has the potential to cause contention; if it does, the lock is not allowed to be placed.

Database deadlock detection

Instead of deadlock prevention, the more popular approach to dealing with database deadlocks is deadlock detection. Deadlock detection is based on the principle that, once a deadlock occurs, one of the requests that caused it should be aborted.

How does deadlock detection work? There are two common approaches:

1. Whenever a session is waiting for a lock to be released, it is in what’s known as a “lock wait” state. One way to implement deadlock detection is simply to cap the lock wait time at a preset limit (like 5 seconds). If a session waits more than 5 seconds for a lock to free up, then that session will be terminated.

2. The RDBMS can regularly inspect all the locks currently in place to see if any two sessions have locked each other out and are in a state of deadlock.

With either detection method, one of the requests has to be terminated to break the deadlock. This also means that any changes its transaction made before that request have to be rolled back, so that the other request can make progress and finish.

What is the difference between == and === in PHP?

When comparing values in PHP for equality, you can use either the == operator or the === operator. What’s the difference between the two? It’s quite simple. The == operator just checks whether the left and right values are equal, converting them to a common type first if their types differ (PHP calls this type juggling). The === operator (note the extra “=”) checks whether the left and right values are equal and also whether they are of the same type (for example, whether they are both booleans, both ints, and so on).
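To make the difference concrete, here is a small sketch that evaluates a few pairs of values with both operators. The pairs are our own illustrative choices, and the foreach destructuring syntax assumes PHP 7.1 or later; the comments at the end show what each line prints.

```php
<?php
// Each entry: [description, left operand, right operand].
$pairs = [
    ['1 vs "1"',       1,     "1"],
    ['0 vs false',     0,     false],
    ['null vs false',  null,  false],
    ['"abc" vs "abc"', "abc", "abc"],
];

$results = [];
foreach ($pairs as [$label, $left, $right]) {
    $results[$label] = [
        'loose'  => $left == $right,  // compares values after type juggling
        'strict' => $left === $right, // also requires identical types
    ];
    echo $label, ': == ', var_export($left == $right, true),
         ', === ', var_export($left === $right, true), "\n";
}
// 1 vs "1": == true, === false
// 0 vs false: == true, === false
// null vs false: == true, === false
// "abc" vs "abc": == true, === true
```

As a rule of thumb, === is the safer choice whenever the operands could legitimately have different types, such as the return value of strpos in the next example.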
An example of when you need to use the === operator in PHP

It’s good to know the difference between the two equality operators, but it’s even better to understand when and why you would need to use === rather than ==. So here is an example of when you must use the === operator.

When developing in PHP, you may need to use the strpos function (it’s worth reading up on this function first in order to understand our example; don’t worry, it’s a very quick read). The strpos function may return 0 to mean that the string you are searching for is at the 0th index (the very first position) of the string you are searching in.

Suppose, for whatever reason, we want to make sure that an input string does not contain the string “xyz”. Then we might write this PHP code:

    //bad code:
    if ( strpos( $inputString, 'xyz' ) == false ) {
        // do something
    }

But there is a problem with the code above: strpos will return 0 (as in the 0th index) if ‘xyz’ appears at the very beginning of $inputString. And 0 is treated as false in PHP: when the == operator is used to compare 0 and false, PHP says that they are equal. That is not what we want. Even though $inputString contains the string ‘xyz’, the equality of 0 and false tells us that $inputString does not contain ‘xyz’. So the problem lies in the way the return value of strpos is compared to the boolean value false.

What is the solution? As you probably guessed, we can simply use the === operator for the comparison. As we described earlier, the === operator says that the two things being compared are equal only if both their type and their value are equal.
So if we compare 0 to false, they will not be considered equal, which is exactly the behavior we want. Here is what the good code looks like:

    //good code:
    if ( strpos( $inputString, 'xyz' ) === false ) {
        // do something
    }

How would you parse HTML in PHP?

If you have programmed in PHP, you may have come across the need to parse an HTML document, because it is something that needs to be done in many different scenarios. But how should you approach this problem?

The first answer that you may think of is to use regular expressions, since they are good at finding patterns in strings. However, HTML documents can be quite complex, and trying to find patterns in them with regular expressions can quickly become difficult and painful. The good news is that there is already a library in PHP that is meant for parsing HTML: Parse HTML in PHP.

Has the problem already been solved?

Whenever you are looking to solve a difficult problem, it’s always good to check whether someone else has already solved it, because in the real world you will want to save as much time as possible on the projects you work on. Very often, if someone has encountered the same problem as you, there is already a solution out there on the Web.

In PHP, what are magic methods and how are they used?

PHP functions that start with a double underscore (“__”) are called magic functions (or magic methods) in PHP. Almost all of them are methods defined inside classes; the exception in the list below is __autoload(), which is a stand-alone function defined outside of any class. The magic functions available in PHP include: __construct(), __destruct(), __call(), __callStatic(), __get(), __set(), __isset(), __unset(), __sleep(), __wakeup(), __toString(), __invoke(), __set_state(), __clone(), and __autoload().

Why are they called magic functions?
The definition of a magic function is provided by the programmer, meaning that you, as the programmer, actually write the definition. This is important to remember: PHP does not provide the definitions of the magic functions; the programmer must write the code that defines what each magic function does. But magic functions are never called directly by the programmer. Instead, PHP calls them “behind the scenes”. This is why they are called “magic” functions: they are never called directly, and they allow the programmer to do some pretty powerful things. Confused? An example will help make this clear.

Example of using the __construct() magic function in PHP

The most commonly used magic function is __construct(), because as of PHP 5 the __construct() method is the constructor for your class. If PHP 5 cannot find a __construct() function for a given class, it will search for a function with the same name as the class; this is the old way of writing constructors in PHP. (That fallback was deprecated in PHP 7 and removed in PHP 8.)

Now, here is an example of a class with the __construct() magic function:

    class Animal {
        public $height; // height of animal
        public $weight; // weight of animal

        public function __construct($height, $weight) {
            $this->height = $height; //set the height instance variable
            $this->weight = $weight; //set the weight instance variable
        }
    }

In the code above, we have a simple __construct function that just sets the height and weight of an animal object. So let’s say that we create an object of the Animal class with this code:

    $obj = new Animal(5, 150);

What happens when we run the code above? A call to the __construct() function is made, because that is the constructor as of PHP 5. The $obj object will be an object of the Animal class with a height of 5 and a weight of 150. So the __construct function is called behind the scenes.
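Here is a runnable sketch of the Animal example. Besides the constructor, we have added a __toString() method (it is not part of the original example) to show a second magic method that PHP invokes on its own, in this case whenever the object is used as a string.

```php
<?php
class Animal
{
    public $height; // height of the animal
    public $weight; // weight of the animal

    // Called automatically by PHP when "new Animal(...)" is evaluated.
    public function __construct($height, $weight)
    {
        $this->height = $height;
        $this->weight = $weight;
    }

    // Called automatically by PHP whenever the object is used as a string.
    public function __toString()
    {
        return "Animal with height {$this->height} and weight {$this->weight}";
    }
}

$obj = new Animal(5, 150); // no explicit call to __construct()
echo $obj, "\n";           // no explicit call to __toString()
// prints: Animal with height 5 and weight 150
```

Neither magic method appears in a call anywhere in the code; PHP decides when to run them.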
Magical, isn’t it? If you’re looking for another example of a magic function, read on: the next section gives an example of the __autoload function in PHP and how it’s used.

In PHP, what is the __autoload function? Can you provide an example of how it’s used?

PHP functions that start with a double underscore (“__”) are called magic functions in PHP. The __autoload function is also a magic function, because its name starts with a double underscore as well. For more on magic functions in general, see the previous section, Magic Functions in PHP.

Why is the __autoload function used?

In PHP, the __autoload function is used to simplify the job of the programmer by including classes automatically, without the programmer having to add a very large number of include statements. An example will help clarify. Suppose we have the following code:

    include "class/class.Foo.php";
    include "class/class.AB.php";
    include "class/class.XZ.php";
    include "class/class.YZ.php";

    $foo = new Foo;
    $ab = new AB;
    $xz = new XZ;
    $yz = new YZ;

Note that in the code above we have to include each of the 4 class files separately: because we are creating an instance of each class, we absolutely must have each class file. Of course, we are assuming that developers define only one class per source file, which is good practice when writing object-oriented programs, even though PHP allows multiple classes in one source file.

The __autoload function simplifies inclusion of class files in PHP

Imagine if we needed to use 20 or even 30 different classes within this one file: writing out each include statement would become a huge pain. This is exactly the problem that the PHP __autoload function solves; it allows PHP to load the classes for us automatically!
So, instead of the code above, we can use the __autoload function as shown below:

    function __autoload($class_name) {
        require_once "./class/class." . $class_name . ".php";
    }

    $foo = new Foo;
    $ab = new AB;
    $xz = new XZ;
    $yz = new YZ;

How does the __autoload function work?

Because the __autoload function is a magic function, it will not be called directly by you, the programmer. Instead, it is called behind the scenes by PHP; that’s what makes it magical.

But when does the __autoload function actually get called? In the code above, the __autoload function will be called 4 times, because PHP will not recognize the Foo, AB, XZ, and YZ classes, and PHP calls the __autoload function each time it does not recognize a class name.

Also, in the code above, we can see that the __autoload function takes the class name as a parameter (the $class_name variable). PHP passes the class name to the __autoload function behind the scenes whenever it finds that it doesn’t recognize the class name being used in a given statement. For instance, when PHP sees the “$foo = new Foo;” line, it does not recognize the Foo class, because the Foo class was never included or “required” in the current file. So PHP (behind the scenes) passes the string “Foo” to the __autoload function, and if the class file is found, it is included by the require_once statement.

One last thing worth noticing in the code above is how we use the $class_name variable in the "./class/class." . $class_name . ".php" piece of the code. This lets us point to the correct file dynamically. Note that a path starting with "./" is resolved relative to the current working directory, so this code assumes that the class folder can be reached from there (for example, that it sits in the same directory as the script being run).

When is the __autoload function called?

The __autoload function is called any time a reference to an unknown class is made in your code. Of course, you must define the __autoload function in order for it to actually be called.
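One caveat: __autoload() was deprecated in PHP 7.2 and removed in PHP 8.0, so on current versions the same idea is expressed by registering an autoloader with spl_autoload_register(). The sketch below keeps the "class/class.<ClassName>.php" naming convention used above; to stay self-contained, it writes a hypothetical class.Foo.php file into a temporary directory before using it.

```php
<?php
// Hypothetical class folder, created in the system temp directory so the
// example can run anywhere.
$classDir = sys_get_temp_dir() . '/autoload_demo_' . getmypid() . '/class';
if (!is_dir($classDir)) {
    mkdir($classDir, 0777, true);
}

// Write a tiny class file that follows the "class.<ClassName>.php" convention.
file_put_contents(
    $classDir . '/class.Foo.php',
    '<?php class Foo { public function hello() { return "Hello from Foo"; } }'
);

$autoloadCalls = []; // records each class name PHP asks us to load

// PHP calls this closure behind the scenes whenever it meets an unknown class.
spl_autoload_register(function ($class_name) use ($classDir, &$autoloadCalls) {
    $autoloadCalls[] = $class_name;
    $file = $classDir . '/class.' . $class_name . '.php';
    if (is_file($file)) {
        require_once $file;
    }
});

$foo = new Foo();          // Foo is unknown, so the autoloader runs once
echo $foo->hello(), "\n";  // prints "Hello from Foo"
$foo2 = new Foo();         // Foo is now defined, so the autoloader is not called again
```

A design advantage over __autoload() is that spl_autoload_register() can be called several times; PHP tries each registered autoloader in turn until the class is defined.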
Does the __autoload function work with static function calls?

Yes, it does. Remember that a static function is a function that can be called using just the name of the class in which it is defined; there is no need to create an object of the class. The code below, which calls a static function, will still trigger the __autoload function:

    function __autoload($class_name) {
        require_once "./class/class." . $class_name . ".php";
    }

    //this is a call to a static function
    SampleClass::staticFunctionCall($param);

In the code above, the class SampleClass is not recognized, because it is not explicitly included anywhere in the code. This means that PHP will call the __autoload function when it realizes that the SampleClass definition is not available. Once the __autoload function is called, the class.SampleClass.php file will be included, which provides the definition of the SampleClass class. Of course, that definition is needed because a call is being made to a static function that belongs to the SampleClass class.

When else would the __autoload function be called automatically by PHP?

One last interesting thing worth noting is that even calling the class_exists PHP function (which just checks whether a given class is defined) will call the __autoload function by default. The class_exists function has an optional second parameter, $autoload, which you can set to false to disable the automatic call to the __autoload function.

In PHP, what is the difference between self and $this?

In very general terms, we can say that $this is used to reference the current object, whereas self is used to access the current class itself. But there are more specific details, discussed below, that you should definitely know about.

Since we believe strongly in examples as a teaching aid, take a look at the examples below. In them, we have a class called Animal and a derived class called Tiger.
The Tiger class overrides the whichClass() function, which is also very important to note for the discussion that follows. Here is some code where we use the $this pointer:

    class Animal {
        public function whichClass() {
            echo "I am an Animal!";
        }

        /* Note that this method uses the $this keyword, so the
           calling object's class type (Tiger) will be recognized
           and the Tiger class version of the whichClass method
           will be called. */
        public function sayClassName() {
            $this->whichClass();
        }
    }

    class Tiger extends Animal {
        public function whichClass() {
            echo "I am a Tiger!";
        }
    }

    $tigerObj = new Tiger();
    $tigerObj->sayClassName();

Running the code above will output this:

    I am a Tiger!

In the code above, we create an object of the Tiger class and call it $tigerObj. Inside the Animal class’s version of the sayClassName() function, you can see the call to $this->whichClass(). Because the $this pointer always references the current object, and we are dealing with an object of the Tiger class, the version of whichClass() that gets called is the one defined in the Tiger class. This is an example of polymorphism in PHP.

Using “self” instead

Now, if we change the sayClassName() function to use the self keyword instead of the $this variable, we get a different result. Suppose our code now looks like this; the only change is inside sayClassName(), and everything else is exactly the same:

    class Animal {
        public function whichClass() {
            echo "I am an Animal!";
        }

        /* This method has been changed to use the
           self keyword instead of $this */
        public function sayClassName() {
            self::whichClass();
        }
    }

    class Tiger extends Animal {
        public function whichClass() {
            echo "I am a Tiger!";
        }
    }

    $tigerObj = new Tiger();
    $tigerObj->sayClassName();

Running the code above will output this:

    I am an Animal!

Using self in PHP can turn off polymorphic behavior by bypassing the vtable

So, what exactly is going on when we change the code to use “self” instead?
When self is used, PHP simply calls the version of whichClass() that is in the same class. Since self is being used within the Animal class, the version of whichClass() that gets called is the one that belongs to the Animal class.

If we compare self to $this, we can see that $this calls the version of whichClass() that belongs to the same class type as the calling object. Remember, the $this variable is a reference to the current object, which in this case is of type Tiger. So when the $this variable is used to call whichClass(), PHP uses the version in the Tiger class.

In the example above, self is essentially turning off polymorphic behavior by bypassing the vtable. If that is confusing, you can (and probably should) read more about vtables.

$this versus self when used inside a static function

Let’s say you try to use the $this pointer inside a static method, with some code that looks like this:

    class Animal {
        public static $name;

        //trying to use $this in a static function:
        public static function nameChange() {
            $this->name = "Programmer Interview";
        }
    }

    $animalObj = new Animal();
    $animalObj->nameChange();

When you run the code above, you will end up with this error: “Fatal error: Using $this when not in object context…”. Why do we get this error? Think about what you are doing here: you are using the $this pointer inside a static function. Static functions can be called without an object of the class; you can call the nameChange function directly using just the class name, like this:

    Animal::nameChange();

If the static nameChange function is ever called directly through the Animal class name, then the $this variable has no meaning, because $this is meant to reference the current object, and there is no object in that scenario.
And that is exactly why you get that error message. Now, what if we try to use self inside the static nameChange function instead? Our code now looks like this:

    class Animal {
        public static $name;

        //trying to use self in a static function:
        public static function nameChange() {
            self::$name = "Programmer Interview";
        }
    }

    $animalObj = new Animal();
    $animalObj->nameChange();

The code above runs just fine, without error. And this is actually a big reason why self is used: to access static members of the class.

Now, just to show you something else that is interesting, let’s make a small change so that $name is no longer a static member variable:

    class Animal {
        //$name is no longer a static variable..
        public $name;

        //trying to use self in a static function:
        public static function nameChange() {
            self::$name = "Programmer Interview";
        }
    }

    $animalObj = new Animal();
    $animalObj->nameChange();

Now that $name is no longer a static variable, running the code above gives us an error: “Fatal error: Access to undeclared static property: Animal::$name”. Why do we get this error now?

Explanation of the “Fatal error: Access to undeclared static property” error

self::$name only looks for a static property called $name, and the class no longer declares one; $name is now an ordinary (non-static) member variable. More generally, inside a static function there is no current object, so non-static member variables cannot be reached through self or $this. This makes sense: static functions can be called without an object, while non-static member variables belong to individual objects, so there may be no object whose variable could be used. You can read more about this in Accessing non static members from a static function (written in the context of C++, but the same concept applies to PHP).

$this vs self when accessing static members and calling static functions

Let’s see how $this and self behave when trying to access static members or call static functions.
Let’s start with the self keyword:

    class Animal {
        public static function whichClass() {
            echo "I am an Animal!";
        }

        public function sayClassName() {
            self::whichClass();
        }
    }

    $animalObj = new Animal();
    $animalObj->sayClassName();

The code above uses the self keyword to invoke a static function, and it runs just fine without any errors. Now, let’s change it to use the $this variable instead of self:

    class Animal {
        public static function whichClass() {
            echo "I am an Animal!";
        }

        public function sayClassName() {
            $this->whichClass();
        }
    }

    $animalObj = new Animal();
    $animalObj->sayClassName();

The code above also runs just fine and outputs “I am an Animal!”. So invoking a static function through the $this variable is not a problem. That actually makes sense, because static functions do not even need an object in order to be invoked.

Example of accessing a static member variable with $this

Accessing a static member variable through the $this pointer does not cause a fatal error, but it is not recommended. Let’s look at an example to help clarify. Suppose you try to run the following code; note the use of $this to access the $name member variable:

    class Animal {
        public static $name;

        public static function whichClass() {
            echo "I am an Animal!";
        }

        public function sayClassName() {
            $this->name = "My name is Animal";
        }
    }

    $animalObj = new Animal();
    $animalObj->sayClassName();

In the code above, we create an instance of the Animal class in $animalObj and then call the method sayClassName, which uses the $this pointer to access what looks like the static member variable $name. The code runs without a fatal error (at most, PHP reports a notice that a static property is being accessed as non-static). But you should know that this assignment only affects the current instance/object, which isn’t what you would expect when changing a static member variable. Confused?
Well, check out this example:

    class Animal {
        public static $name;

        public function setClassName() {
            $this->name = "My name is Animal";
            //echo $this->name;
        }
    }

    $animalObj = new Animal();
    $animalObj2 = new Animal();

    $animalObj->setClassName();

    echo $animalObj->name;
    echo $animalObj2->name; //what happens here?

Note that we set the $name variable for the $animalObj object by calling the setClassName method, but we do not set it for the $animalObj2 object. What do you think will happen when we try to output the value of $name for $animalObj2 (in the line echo $animalObj2->name;)? If you guessed that it would output “My name is Animal”, then you are actually wrong! You can verify this for yourself by running the simple code above.

Yes, “echo $animalObj->name;” will output the text “My name is Animal”. But when a static member variable is set in one object using the $this pointer, that value does NOT carry over to other instances of the same class, even though that is exactly what you would expect from a static member variable. In fact, PHP is not setting the static variable at all; it is just creating a separate, non-static variable called $name that belongs only to that particular object. You can confirm this by trying to output the value of the static variable using the correct syntax, “echo Animal::$name;” (keep in mind that in PHP, static properties cannot be accessed through an object using the arrow operator ->). You will see that nothing is output when you try “echo Animal::$name;”, even after setting what you may think is the static variable $name using $this.

For this exact reason, in PHP you should not use the $this pointer to set a static member variable. Instead, you should use the self keyword, which you can see an example of below.
Example of using self to access a static member variable

Now, let’s say that we use the self keyword in the same example:

    class Animal {
        public static $name;

        public static function whichClass() {
            echo "I am an Animal!";
        }

        public function sayClassName() {
            self::$name = "My name is Animal";
        }
    }

    $animalObj = new Animal();
    $animalObj->sayClassName();

This code also runs just fine. self accesses the static $name member variable that belongs to the Animal class, which makes sense because no particular object is involved or implied. If we output the value of the $name static member variable using “echo Animal::$name;”, it will output just fine, because we correctly used the self keyword to set the static member variable. So, in PHP, you should always refer to static variables using a static context (e.g. by using self, or the class name).

Accessing a variable with the same name as a static member variable inside a function

Another interesting question is what would happen if self is not used, and we just used $name by itself, like in this example:

    class Animal {
        public static $name;

        public static function whichClass() {
            echo "I am an Animal!";
        }

        public function sayClassName() {
            $name = "My name is Animal";
        }
    }

    $animalObj = new Animal();
    $animalObj->sayClassName();

In this scenario, the $name variable used in the sayClassName function is a local variable created inside that function, and it is not the same as the static $name variable that belongs to the class as a whole. So you would have to use the self keyword if you want to reference the static member variable that belongs to the class.

Summary of the differences between self and $this

Finally, we are done.
Let’s now go through a quick summary of the differences between self and $this that we covered in our examples above:

- self refers to the current class
- self can be used to call static functions and reference static member variables
- self can be used inside static functions
- self can also turn off polymorphic behavior by bypassing the vtable
- $this refers to the current object
- $this can be used to call static functions
- $this should not be used to access static member variables. Use self instead.
- $this cannot be used inside static functions

In PHP, what is the difference between self and static?

The differences between self and static are fairly easy to understand with some good examples, so let’s look at some actual code. Suppose we have the following class, called Car, which has two simple methods called model and getModel. Note the use of the self keyword:

Example of self in PHP

    class Car {
        public static function model() {
            self::getModel();
        }

        protected static function getModel() {
            echo "I am a Car!";
        }
    }

Suppose we make this call to the static model function in the Car class (since it is a static function, we can of course call it directly using only the class name):

    Car::model();

The output after calling the model function in the Car class will be:

    I am a Car!

The self keyword simply makes a call to the getModel function that belongs to the Car class, which outputs the text “I am a Car!” to the page.

Let’s say that we decide to add a new class, called Mercedes, that derives from the Car class. Here is what it looks like:

    class Mercedes extends Car {
        protected static function getModel() {
            echo "I am a Mercedes!";
        }
    }

Because the Mercedes class derives from the Car class, it will inherit the model function that is defined in the Car class.
So, what do you think will happen if we call the static model function, but this time through the Mercedes class, like this:

    Mercedes::model();

The output from the function call above may not be what you expect. This is what it looks like:

    I am a Car!

You may have been expecting the Mercedes::model(); call to output “I am a Mercedes!”. So what is going on here?

Explaining self

The model function is defined inside the Car class, and it is not overridden by the Mercedes class, but it is of course inherited by the Mercedes class. As a result, when we call the version of model inside the Mercedes class, the scope of the function is still the Car class, because the function definition is inside the Car class. The keyword “self” calls the current class’s implementation of getModel, and since the model function is defined inside the Car class, the current class is Car. So it will call the Car class implementation of getModel and NOT the Mercedes class implementation.

This behavior may be considered undesirable because it is not polymorphic, and it is not aligned with object-oriented design principles. But there is an alternative that gives us that kind of behavior, and it involves using the static keyword in a different way from how you may normally use it.

The static keyword and late static binding

In PHP 5.3, a new feature called late static bindings was added, and it can help us get the polymorphic behavior that may be preferable in this situation.
In simplest terms, late static binding means that a call to a static function that is inherited will “bind” to the calling class at runtime. So, in our example above, if we use late static binding, then when we call “Mercedes::model();”, the getModel function in the Mercedes class will be called instead of the getModel function in the Car class. Mercedes is, of course, the calling class in our example.

Example of late static binding in PHP

Now, how can we actually make late static binding work for us? All we have to do is replace the “self::getModel();” call inside the Car class with “static::getModel();”. This is what our new Car class looks like; note that we do not have to make any change to the Mercedes class:

    class Car {
        public static function model() {
            static::getModel();
        }

        protected static function getModel() {
            echo "I am a Car!";
        }
    }

Now, if we make this call:

    Mercedes::model();

Our output will be this:

    I am a Mercedes!

Late static binding was not possible before PHP 5.3

Note that before PHP 5.3, late static binding was not possible, and trying to run the code above in any version of PHP before 5.3 will produce an error.

PHP self versus static

Now that we have changed the code in our example to use static instead of self, you can see the difference: self references the current class (the class in which the method is defined), whereas the static keyword allows the function to bind to the calling class at runtime.

How would you find out if a string contains another string in PHP?

Suppose we have a string stored in a PHP variable called $aString, and we want to find out whether $aString contains another substring. For the sake of an example, let’s say we are looking for the string “Waldo” inside the larger string. Let’s say the value of the larger string ($aString) is “Where is Waldo?”, and we just want to find out whether $aString contains “Waldo”.
PHP provides a function called strpos that lets us find out whether one string exists inside another. Here is an example of how to use the strpos function:

Example of how to find out if one string contains another in PHP

    if (strpos($aString, 'Waldo') !== false) {
        echo 'I found Waldo!';
    }

But there is something you should be aware of when using the strpos function: if you are looking for “Waldo” inside a string like “heyWaldo are you there?”, then strpos will successfully return the position of “Waldo”, basically saying that “Waldo” was indeed found. This is a problem if you only want to find “Waldo” as a separate word, and not as part of another word.

Strpos never returns true

One thing to remember about the strpos function is that it never returns the boolean value true. strpos returns the integer position of the first occurrence of the substring being searched for. If the substring is not found, false is returned instead, which is why the code above checks for false instead of true.

!== vs != in PHP

One thing worth noting in the code above is that we used the !== operator instead of the != operator (which has one less “=”). What’s the difference between the two? You can think of the !== operator as being more ‘strict’ than the != operator: !== says that the two operands are not identical if either their values or their types differ, and it never converts one type to another before comparing. This is desirable because the strpos function can return 0 if the string being searched contains the substring at the very beginning. The 0 represents the 0th index of the larger string, meaning the first position in that string. So, if $aString is “Waldo is here”, and we are searching for “Waldo”, then the strpos function will return 0.
With the != operator, the check would then be whether 0 is not equal to false. The problem is that PHP treats 0 as equal to false in a loose comparison, so the expression “0 != false” evaluates to false, and the code would conclude that “Waldo” was not found even though it is right at the start of the string. If we use “0 !== false” instead, the expression evaluates to true, because the strict comparison also checks whether 0 and false have the same type. Since 0 is an integer and false is a boolean, they are not identical, so the strict inequality check returns true, unlike the “0 != false” check, which returns false. Hopefully that was not too confusing; if you need more details on the concept, you can read about it here: Difference between == and === in PHP.

If we had used != instead of !==, as in the code below, we would have a problem:

Problematic code to find a substring inside a larger string

```php
if (strpos($aString, 'Waldo') != false) {
    echo 'I found Waldo!';
}
```

The code above fails whenever the substring appears at position 0, for the reasons discussed above. When checking the return value of strpos, always use !== instead of !=.

How to delete an element from an array in PHP?

A good function to use when deleting an element from an array in PHP is unset. Here is an example of its usage:

An example of using unset to delete an element from an array:

```php
$anArray = array("X", "Y", "Z");
unset($anArray[0]);
//'dumps' the content of $anArray to the page:
var_dump($anArray);
```

The output of the var_dump function will be:

```
array(2) {
  [1]=>
  string(1) "Y"
  [2]=>
  string(1) "Z"
}
```

Unset leaves all of the index values the same after an element is deleted

In our example above, after the element “X” is deleted using unset, the $anArray array has the values “Y” and “Z” at indices 1 and 2, respectively. In other words, the indices of the remaining elements are not changed to account for the fact that the very first element (“X”) was deleted.
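One practical consequence of unset keeping the original indices is easy to check in code. This short sketch (every variable name other than $anArray is illustrative) shows the remaining keys, a lookup of the missing index, and where a newly appended element ends up:

```php
<?php
$anArray = array("X", "Y", "Z");
unset($anArray[0]);

// The remaining elements keep their keys 1 and 2; nothing is renumbered.
$keysAfterUnset = array_keys($anArray);    // [1, 2]

// Index 0 no longer exists, so code that assumes keys 0..count-1 breaks.
$hasFirstIndex = isset($anArray[0]);       // false

// Appending with [] continues after the largest integer key used so far,
// so "W" is stored under key 3, not 0.
$anArray[] = "W";
$keysAfterAppend = array_keys($anArray);   // [1, 2, 3]

print_r($anArray);
```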
Because unset keeps the existing indices, deleting an element in the middle of an array leaves a gap in it. Suppose we have this code:

```php
$anArray = array("V", "W", "X", "Y", "Z");
unset($anArray[2]);
//'dumps' the content of $anArray to the page:
var_dump($anArray);
```

This would output the following:

```
array(4) {
  [0]=>
  string(1) "V"
  [1]=>
  string(1) "W"
  [3]=>
  string(1) "Y"
  [4]=>
  string(1) "Z"
}
```

Note that in the output above there is no longer an index 2; the indices are 0, 1, 3, and 4. Non-continuous index values can be a drawback, for example for code that loops over the indices from 0 to count($anArray) - 1. So what are your alternatives? One option is a function called array_splice.

Using array_splice to delete an element from an array

The array_splice function removes a portion of an array and can optionally replace it with other elements, so it can also be used simply to delete an element. Here is an example, with an explanation, of how to use array_splice to delete an element:

```php
$anArray = array("V", "W", "X", "Y", "Z");

/* The 2 is the offset: move 2 positions from the beginning of the array,
   which takes us to the "X" element. The 1 is the length: the number of
   elements to remove. Since we only want to delete the element and not
   replace it with anything, we omit the optional 4th parameter
   (the replacement). */
array_splice($anArray, 2, 1);
var_dump($anArray);
```

Running the code above gives us this:

```
array(4) {
  [0]=>
  string(1) "V"
  [1]=>
  string(1) "W"
  [2]=>
  string(1) "Y"
  [3]=>
  string(1) "Z"
}
```

So array_splice renumbers the index values into a continuous sequence, and we are good to go again. There is one assumption here that we should point out, though: array_splice takes an offset (a position counted from the start of the array), not an index (a key). The offset we used in the example above happened to be equal to the index value.
This is the case when the array already has continuous integer indices starting at 0, but if the array has been changed for whatever reason before array_splice is called, that may no longer be true.

Using the array_values and unset functions to delete an element from an array

If you call the array_values function right after unset, you can restore the index values to a continuous sequence without any gaps. An example will help clarify:

Example of using array_values and unset to delete an element from an array:

```php
$anArray = array("V", "W", "X", "Y", "Z");

/* This removes index 2, so the indices of $anArray
   become 0, 1, 3, 4: the 2 is missing */
unset($anArray[2]);

/* array_values takes an array as input, takes just its values (not the
   keys), and numerically indexes those values in a new array. In other
   words, it re-indexes the input array, restoring the indices to
   0, 1, 2, and 3: */
$anArray = array_values($anArray);

//'dumps' the content of $anArray to the page:
var_dump($anArray);
```

This will now output:

```
array(4) {
  [0]=>
  string(1) "V"
  [1]=>
  string(1) "W"
  [2]=>
  string(1) "Y"
  [3]=>
  string(1) "Z"
}
```

The array_values function will replace non-numeric keys with numeric values

One thing we should point out about array_values is that if the keys are non-numeric, they will be replaced with numeric values anyway. So, if you have an array that uses strings as keys (an associative array, which works much like a hash table), array_values will discard those strings and replace them with numeric indices. This is something you should definitely watch out for if you decide to use array_values.

Now you have seen the different options available to you when deleting an element from an array in PHP.
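To compare the approaches side by side, the sketch below applies each one to the same five-element array, then shows the offset-versus-index pitfall of array_splice after an earlier unset, and array_values discarding string keys. The variable names and sample keys are illustrative only.

```php
<?php
$original = array("V", "W", "X", "Y", "Z");

// 1. unset(): removes key 2 and leaves a gap (keys 0, 1, 3, 4).
$withUnset = $original;
unset($withUnset[2]);

// 2. array_splice(): removes 1 element at offset 2 and renumbers (keys 0 to 3).
$withSplice = $original;
array_splice($withSplice, 2, 1);

// 3. unset() then array_values(): removes key 2, then re-indexes (keys 0 to 3).
$withValues = $original;
unset($withValues[2]);
$withValues = array_values($withValues);

// Pitfall: after an earlier unset(), offset 2 no longer means key 2.
$gappy = $original;
unset($gappy[0]);            // keys are now 1, 2, 3, 4
array_splice($gappy, 2, 1);  // offset 2 is the third element, "Y" (key 3), not "X"

// array_values() also throws away string keys.
$labels = array('first' => 'V', 'second' => 'W');
$reindexed = array_values($labels);   // keys become 0 and 1

print_r($gappy);
```

Arrays are copied on assignment in PHP, so each approach starts from an unmodified copy of $original.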
You are also aware of the potential side effects of each, so which method you choose is entirely up to you.

How to delete an element in an array if you only know the value in PHP

You may want to delete an element from an array when you only know its value, but not its key. In that case, you first have to search the array for the value in order to get the corresponding key. The array_search function does exactly that: it takes the value to search for and the array to search as its parameters, and returns the corresponding key if the value is found (or false if it is not). Note that it only returns the first matching key. You can then use unset to remove the element as before. Here is an example:

An example of deleting an element in an array if you only know the value in PHP

```php
$key = array_search($valueToSearch, $arrayToSearch);
// compare strictly: a valid key of 0 would be treated as false by !=
if ($key !== false) {
    unset($arrayToSearch[$key]);
}
```

How would you convert a PHP variable to a string?

Is there something like Java’s toString method in PHP? PHP has casting operators that can be used to convert non-string variables into strings. Here is how to use a casting operator to convert a variable to a string in PHP:

Example of PHP’s equivalent to toString

```php
// this is an integer:
$nonStringVar = 123;

/* now $stringVar is a string, because "(string)" performs a type cast
   and returns the string equivalent of the integer */
$stringVar = (string)$nonStringVar;
```

The casting operator is fairly close in functionality to the toString method in Java, but read below, because PHP also has its own __toString method.

Using the strval function to convert a variable to a string

You can also use the strval function to get the string value of a variable.
```php
// this is an integer:
$nonStringVar = 123;

// now $stringVar is a string:
$stringVar = strval($nonStringVar);
```

PHP does have a __toString magic method

PHP also provides a method called __toString, which the programmer defines and which tells a class how to behave when one of its objects is treated like a string. When would an object be treated like a string? Suppose you have a class called SomeClass and an object of that class called $someObject. If you write something like “echo $someObject;”, what exactly should be output? That is up to you to decide, because whatever your __toString method returns is what will be output to the page. You can output all of the class’s instance variables, a particular string, and so on, but whatever you do, the __toString method must return a string. Here is an example of the __toString method in action:

```php
class SomeClass
{
    public $aVariable;

    public function __construct($aVariable)
    {
        $this->aVariable = $aVariable;
    }

    public function __toString()
    {
        // __toString must return a string, so cast the value to be safe
        return (string) $this->aVariable;
    }
}

$someObject = new SomeClass('Testing 123');

/* this will indirectly call the __toString method,
   which will output the string 'Testing 123' */
echo $someObject;
```

The __toString method is a magic method; you can read more about magic methods here if you are not already familiar with them: Magic methods in PHP.

What is the best way to return data in the JSON format from PHP?

If you are running PHP 5.2 or later, you can use the built-in json_encode function to return JSON-formatted data from PHP. Here is an example of its usage:

Example of returning JSON from PHP

```php
$tennisArray = array('Djokovic' => 1, 'Federer' => 2, 'Nadal' => 3, 'Murray' => 4);
echo json_encode($tennisArray);
```

The code above will output the JSON-formatted data like this:

```
{"Djokovic":1,"Federer":2,"Nadal":3,"Murray":4}
```

If you are running a version before PHP 5.2, how can you return JSON from PHP?
If you are running a version of PHP earlier than 5.2, you can use the JSON extension from PECL (the PHP Extension Community Library), available here: JSON and PHP

What is the best way to remove or turn off warning messages in PHP?

There will be times when you see a warning message in the browser after running your PHP script, and you may want to turn that warning off. Obviously, it is much better to find the root cause of the problem and fix it instead. But if you know that you do not need to fix the root cause (for whatever reason), you can remove the warning message with PHP’s error_reporting function. If you don’t care how the function works, skip to the section called “The code to turn off error messages in PHP” to see the code you should use. Otherwise, keep reading.

Using the error_reporting function to turn off warnings in PHP

The error_reporting function lets you set which kinds of errors PHP reports. How does it work? You pass it the types of errors you want reported, expressed as predefined constants (names that stand for integer values).

The E_PARSE constant tells PHP that compile-time parse errors should be reported, as you can read about here. Since you definitely want to know about any compile-time errors, you should pass this constant to the function. The E_ERROR constant tells PHP that fatal run-time errors should be reported. This is also something you definitely want, since you should always know the cause of any fatal run-time error.
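Because each error constant is a single bit, a combined level can be decoded back into the constants it contains with the bitwise AND operator. The helper describe_error_level() below is hypothetical and only checks four common constants; it reads the built-in E_* constants without changing the current error_reporting setting.

```php
<?php
// Hypothetical helper: list which of a few E_* constants are set in $level.
function describe_error_level(int $level): array
{
    $names = array();
    foreach (array('E_ERROR', 'E_WARNING', 'E_PARSE', 'E_NOTICE') as $name) {
        // & keeps only the bits present in both values, so a non-zero
        // result means this constant's bit is switched on in $level.
        if (($level & constant($name)) !== 0) {
            $names[] = $name;
        }
    }
    return $names;
}

print_r(describe_error_level(E_ERROR | E_PARSE));   // E_ERROR and E_PARSE only
print_r(describe_error_level(E_ALL & ~E_WARNING));  // every checked constant except E_WARNING
```

The second call shows the complementary idiom: starting from E_ALL and clearing a single bit with & ~E_WARNING.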
Now that you understand a bit more about how the error_reporting function works, here is the actual code to use.

The code to turn off error messages in PHP

You have to place this line of code before the code that causes the warning. If you place it after the offending code, it will not suppress the warning message. Here is the line of code to use:

```php
error_reporting(E_ERROR | E_PARSE);
```

The code above does not pass the E_WARNING constant

Because the call above does not include the E_WARNING constant, non-fatal run-time warnings will not be reported when the PHP script runs. That is exactly what keeps the warning message from appearing on the page.

How do the constants work in the error_reporting function?

In the example above, E_PARSE and E_ERROR are both constants, which means they are names that stand for numbers: E_PARSE represents the value 4, and E_ERROR represents the value 1. Read on to understand how those constants work together.

How does the OR operator work with the error_reporting function?

Note that the call above uses “|”, PHP’s bitwise OR operator, to combine the constants passed into error_reporting. If you look at this page, you will notice that the individual error constants are all powers of 2 (1, 2, 4, 8, and so on), so each one occupies its own bit. When they are OR’ed together, each constant’s bit is kept in the result, and that tells error_reporting which errors need to be reported. For example, E_ERROR | E_PARSE is 1 | 4, which equals 5.

Another way to hide or remove warning messages in PHP

Another option for removing warning messages in PHP is the error control operator, which is simply the at sign, “@”. When “@” is put in front of an expression, any error message that expression might generate will be ignored.
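Here is a small sketch of the “@” operator in front of a function call. The file path is a made-up example that is assumed not to exist. It shows that “@” only hides the warning: the function still fails, and the built-in error_get_last() still records the suppressed message, so the failure can be handled explicitly.

```php
<?php
// Made-up path that is assumed not to exist.
$path = '/this/path/should/not/exist.txt';

// Without "@", file_get_contents() would emit a warning for a missing file.
$contents = @file_get_contents($path);

if ($contents === false) {
    // The warning was hidden, but error_get_last() still recorded it.
    $lastError = error_get_last();
    echo "Could not read the file: " . $lastError['message'] . "\n";
}
```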
Before PHP 8.0, the “@” error control operator even suppressed messages for fatal run-time errors; since PHP 8.0 it no longer silences fatal errors. Either way it hides problems, so you should only use this operator if you really know what you are doing. The “@” can only be used in front of expressions, so it cannot be used in front of a function or class definition, a for loop, and so on. It can, however, be used in front of a function call, like this:

```php
@someFunctionCall();
```

Advanced PHP Practice Interview Questions And Answers

Here we present some more challenging practice PHP interview questions and answers that were asked in a real interview for a PHP web developer position. These questions are good for testing not just your PHP skills but also your general web development knowledge, and we think you will gain a lot of good practice by working through them. The questions are aimed at intermediate to somewhat advanced PHP software engineers, but even if you are a beginner or fresher you should be able to understand the answers and explanations we give, even though you may not be able to come up with the answers on your own. Here is the first part of the question. Read it carefully to really understand it; we give a simple, easy-to-understand explanation of everything in it below:

Write a PHP script to report the total download size of any URL. You may not use any 3rd-party code that performs the entire task described below. No HTML interface is necessary for this exercise; you can write this as a command-line script that accepts the URL as an argument. For a single-file resource such as an image or SWF, the script would simply report on the total size of the document. For a complex resource such as an HTML document, the script would need to parse it to find references to embedded, included resources: javascript files, CSS files, iframes, etc.
The goal of this exercise is to output the following information for a given URL:

1. total number of HTTP requests
2. total download size for all requests

So, the question asks us to solve 2 primary goals: for any URL, find the total number of HTTP requests generated by that URL, and find the total download size of all of those requests. You may not know what is meant by an HTTP request, but don’t worry, we explain it all below.

This question is a lot to take in, so we’ll break it down into more manageable pieces with a divide-and-conquer approach, starting with the easier parts first.

Accepting arguments in PHP scripts

The question says that “No HTML interface is necessary for this exercise; you can write this as a command-line script that accepts the URL as an argument”. So, let’s write this as a command-line script. How do we retrieve arguments inside a PHP command-line script? If the script is called from the command line as “ourscript.php www.theurl.com”, with the URL passed as an argument, then inside the script we can read the URL from the PHP variable “$argv[1]” ($argv[0] holds the script name). Inside our PHP script, the code to retrieve the URL passed in as an argument looks like this:

```php
/* If this script is invoked as ourscript.php www.theurl.com,
   then $argv[1] will hold the value www.theurl.com, and that
   value will be stored in the $URL variable as well */
$URL = $argv[1];
```

That’s very simple code, so let’s move on to other parts of the question.

How to connect to a URL in PHP?

It should also be clear that we need some way to connect to a URL and read the contents of the page it points to. What is the best way to do this? PHP provides a library called cURL that may already be included in your PHP installation by default.
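Since the script depends on both the URL argument and the cURL extension, it can help to check for both up front. The helper startup_error() below is a hypothetical sketch; in the real script you would pass it $argv and the result of the built-in extension_loaded('curl'), but here it is called with literal values so the example runs anywhere.

```php
<?php
// Hypothetical helper: returns an error message, or null if the script can run.
// $args has the same shape as PHP's $argv: $args[0] is the script name.
function startup_error(array $args, bool $curlAvailable): ?string
{
    if (!isset($args[1]) || $args[1] === '') {
        return "Usage: ourscript.php <url>";
    }
    if (!$curlAvailable) {
        return "The cURL extension is not enabled in this PHP installation.";
    }
    return null;
}

// In the real script this would be:
//   $error = startup_error($argv, extension_loaded('curl'));
$error = startup_error(array('ourscript.php'), true);
if ($error !== null) {
    echo $error, "\n";
    // a real script would stop here, for example with exit(1);
}
```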
cURL stands for client URL, and it allows you to connect to a URL and retrieve information from it, such as the HTML content of the page or the HTTP headers and their values. You will see cURL used in our code below; don’t worry if you’ve never used cURL before, it’s fairly easy to understand!

Understanding resources

If you are confused by what exactly the term “resource” means in the question above, just think of a web resource as a generic term for a file. A CSS file, a JavaScript file, an HTML file, a SWF file (used for Adobe Flash), an image file (JPG, PNG, etc.): each of these is a different type of resource, and as you know there are many more types of resources on the web.

The difference between single-file resources and other resources

The question calls HTML files complex resources for the simple reason that HTML documents are complex: they can contain many references to single-file resources like image files and SWF files. A single-file resource does not contain references to other resources. A JPG or GIF file cannot contain a reference to another file, which is why both are considered single-file resources. An HTML file is also a resource itself, but because it contains references to other resources, it is not considered a single-file resource.

To retrieve a resource from the web server where it is stored, a web browser has to make an HTTP request. Read on to understand more about HTTP requests.

What exactly is an HTTP request?

The question asks for two major things about a URL: the total number of HTTP requests and the total download size of all requests. The download size is easy enough to understand, but you may be confused about what exactly is meant by an HTTP request. HTTP is the protocol used to communicate on the web.
When you visit a webpage, your browser makes an HTTP request to the server that hosts that webpage, and the server responds with an HTTP response. What is important to understand here is that your browser will probably have to make multiple HTTP requests to display a single HTML page at a given URL, because that page will probably come with some CSS files, some JavaScript files, and probably some images as well. Each of those resources requires a separate HTTP request: 2 image files, 2 JavaScript files, and 2 CSS files mean 6 separate HTTP requests (plus 1 for the HTML page itself). In HTTP, each request asks for exactly one resource, so we cannot have 1 request for 6 different resources; instead, we must have 6 requests for those 6 resources.

So, for the purpose of this interview question, we have to find out the number of HTTP requests that will be made for a given URL; hopefully what that means is now clear. We’ll go into more depth on this later, and show some actual code, as we cover some other things as well.

How to find the download size of a file?

The question also asks us to find the total download size of a URL. But what if the URL passed into the script just points to a single-file resource like a JPG or GIF file? For a single-file resource, we just need to find the size of that particular file and return it as the answer, and we are done. For an HTML document, however, we need the size of the document plus the total size of all resources embedded and included on the page, because we want the total download size of the URL.

So, let’s write a PHP function that returns the download size of a single-file resource. How should we approach writing this function? What is the easiest way to find the download size of a single-file resource on the web?
There is an HTTP header called “Content-Length” that tells us the size of the resource in the HTTP response (after the resource is requested). So, all we have to do is use PHP’s built-in get_headers function, which retrieves all the HTTP headers sent by the server in response to a request for a given URL. The get_headers function accepts a URL as an argument, and if we pass 1 as its second argument, it returns the headers as an associative array keyed by header name. So, the PHP code to retrieve the “Content-Length” header looks like this:

```php
function get_remote_file_size($url)
{
    $headers = get_headers($url, 1);
    if (isset($headers['Content-Length']))
        return $headers['Content-Length'];
    //checks for lower case "L" in Content-length:
    if (isset($headers['Content-length']))
        return $headers['Content-length'];
}
```

There is a problem with this code, though: you will not always receive a Content-Length header in an HTTP response. The web server hosting a particular URL is not required to send it; whether it does depends on how the server is configured and how the response is generated. This means we need an alternative for when the approach above fails.

An alternative to using the Content-Length header

We can download the file ourselves and measure its size. How can we do this? This is where cURL, which we discussed above, comes in. Once we download the resource, we can retrieve the download size using the CURLINFO_SIZE_DOWNLOAD option of curl_getinfo.
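The two isset checks only catch two spellings of the header name, while HTTP header names are case-insensitive. Also, when a URL redirects, get_headers($url, 1) can return an array of values for a repeated header. The hypothetical helper below sketches a more tolerant lookup; it works on an array shaped like get_headers($url, 1) output, so it can be tested without a network connection.

```php
<?php
// Hypothetical helper: find Content-Length in an array shaped like the
// result of get_headers($url, 1), ignoring the header name's letter case.
function content_length_from_headers($headers): ?int
{
    if (!is_array($headers)) {
        return null; // get_headers() returns false when the request fails
    }
    foreach ($headers as $name => $value) {
        // Status lines have numeric keys; header names are string keys.
        if (is_string($name) && strcasecmp($name, 'Content-Length') === 0) {
            // A header repeated across redirects comes back as an array;
            // the last value usually belongs to the final response.
            if (is_array($value)) {
                $value = end($value);
            }
            return (int) $value;
        }
    }
    return null; // the server did not send Content-Length
}

// Sample input in the shape get_headers($url, 1) can return after one redirect.
$headers = array(
    0 => 'HTTP/1.1 301 Moved Permanently',
    'content-length' => array('0', '5120'),
    1 => 'HTTP/1.1 200 OK',
);
var_dump(content_length_from_headers($headers)); // int(5120)
```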
Using the download approach as a backup to the Content-Length check, we can come up with the code below (everything after the “runs if no Content-Length header is found” comment is new):

```php
function get_remote_file_size($url)
{
    $headers = get_headers($url, 1);
    if (isset($headers['Content-Length']))
        return $headers['Content-Length'];
    //checks for lower case "L" in Content-length:
    if (isset($headers['Content-length']))
        return $headers['Content-length'];

    //the code below runs if no "Content-Length" header is found:
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
    ));
    curl_exec($c);
    $size = curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    // close the handle before returning: code after a return statement never runs
    curl_close($c);
    return $size;
}
```

How should we parse HTML in PHP?

What exactly is meant by the sentence “For a complex resource such as an HTML document, the script would need to parse it to find references to embedded, included resources: javascript files, CSS files, iframes, etc.”? As you probably know, an HTML page often relies on other files to render: CSS files for styling, JavaScript files for adding functionality, and so on. The question is how we take an HTML page and find all of those resources in it. That is easy if we are reading the HTML page ourselves, but we want a program to read the HTML for us. This is more complicated than it seems, and the process by which a program (like a PHP script) reads an HTML file and analyzes the text to extract meaningful data (like resources) is known as parsing the HTML. Any text can be parsed, but we are focused exclusively on HTML for the purpose of this interview question. Parsing HTML in PHP is definitely not something you want to do on your own, because it is so complex, as you can read about here: How to parse HTML in PHP.
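As an alternative to a third-party parser, PHP’s bundled DOM extension (the DOMDocument class) can also extract these references. The sketch below assumes the DOM extension is enabled, as it is in most PHP builds. find_resource_urls() is a hypothetical name, and checking rel="stylesheet" instead of looking for “.css” in the href is a design choice of this sketch, not the method used later in this article.

```php
<?php
// A minimal sketch using PHP's bundled DOM extension (DOMDocument), not the
// simple_html_dom library used in this article. find_resource_urls() is hypothetical.
function find_resource_urls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely perfectly valid; keep libxml's parse
    // warnings out of the output instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array('img' => array(), 'css' => array(), 'js' => array());

    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->getAttribute('src') !== '') {
            $urls['img'][] = $img->getAttribute('src');
        }
    }
    foreach ($doc->getElementsByTagName('link') as $link) {
        // Assumption: a stylesheet is a <link> whose rel is exactly "stylesheet".
        if (strtolower(trim($link->getAttribute('rel'))) === 'stylesheet') {
            $urls['css'][] = $link->getAttribute('href');
        }
    }
    foreach ($doc->getElementsByTagName('script') as $script) {
        // Inline scripts have no src and need no extra HTTP request.
        if ($script->getAttribute('src') !== '') {
            $urls['js'][] = $script->getAttribute('src');
        }
    }
    return $urls;
}

$html = '<html><head><link rel="stylesheet" href="style.css">'
      . '<script src="app.js"></script><script>var x = 1;</script></head>'
      . '<body><img src="logo.png"></body></html>';
print_r(find_resource_urls($html));
```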
The best way to parse HTML in PHP is to use a library that already exists, because writing an entire parsing library from scratch would obviously be far too much work for an answer to an interview question. Note that the question states “You may not use any 3rd-party code that performs the entire task described below”. This just means you cannot use 3rd-party code to perform the entire task; using a PHP library to help with part of this question is perfectly OK. Of course, you should clarify this with your interviewer if you are in doubt, but for this particular question we are confident the interviewers would not expect you to perform this task without a library to help you parse the HTML. With that in mind, here is the library we plan on using: PHP HTML parser.

Note that the instructions say: “For a single-file resource such as an image or SWF, the script would simply report on the total size of the document.” This means that if the URL points to a single-file resource like an image file, we can just return the size of the file and we are done. But how can we distinguish between a single-file resource and a non-single-file resource? We could just say that all non-HTML pages are single-file resources. That statement is not entirely true, as you can read about in part 3, but we will pretend it is for the sake of keeping things simple.

But wait, you might be thinking, what about PHP, JSP, ASP, and similar pages? Of course those pages contain some application-specific logic, but it runs on the server, and what the browser receives is HTML regardless of the file extension. So, all we have to do to determine whether a URL points to a single-file resource is check whether it is an HTML page; if it is not, then we know it is a single-file resource.

Using the HTTP Content-Type Header

But how do we check whether a webpage is an HTML page?
Clearly we can’t just look at the URL by itself, because PHP pages, JSP pages, and so on all return HTML, but the file extension does not tell us that. Once again we can use the HTTP headers to our advantage: in this case, we just have to look at the HTTP Content-Type header. If the Content-Type header contains “text/html” (it often carries a charset parameter as well, as in “text/html; charset=UTF-8”), then we know we are dealing with an HTML page. If it does not, then we know we are dealing with a single-file resource, and we can just return its size.

Let’s write some PHP code that tells us whether a given URL is actually an HTML page by checking the HTTP headers. Here is a PHP function that does that for us:

```php
function check_if_html($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    // we only need the headers, so don't download the body
    curl_setopt($ch, CURLOPT_NOBODY, TRUE);
    $data = curl_exec($ch);
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);

    // CURLINFO_CONTENT_TYPE is null if the server sent no valid Content-Type header
    if ($contentType !== null && strpos($contentType, 'text/html') !== false)
        return TRUE; // this is HTML, yes!
    else
        return FALSE;
}
```

In the code above, we use a simple cURL connection to the URL to retrieve the headers, and then check the Content-Type header to see whether it contains the text “text/html”. If it does, we return true; otherwise we return false.
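The strpos test is case-sensitive, while media types are case-insensitive, so a server that sends “Text/HTML” would be missed. A slightly stricter sketch, assuming you want to treat only the text/html media type (and not, say, application/xhtml+xml) as HTML, compares the media type exactly while ignoring case and any parameters. is_html_content_type() is a hypothetical helper.

```php
<?php
// Hypothetical helper: decide from a Content-Type value whether it is HTML.
// Accepts null because curl_getinfo($ch, CURLINFO_CONTENT_TYPE) returns null
// when the server sent no valid Content-Type header.
function is_html_content_type(?string $contentType): bool
{
    if ($contentType === null || $contentType === '') {
        return false;
    }
    // The media type is everything before the first ";" (e.g. "; charset=UTF-8").
    $parts = explode(';', $contentType, 2);
    $mediaType = strtolower(trim($parts[0]));
    return $mediaType === 'text/html';
}

var_dump(is_html_content_type('text/html; charset=UTF-8')); // bool(true)
var_dump(is_html_content_type('image/png'));                // bool(false)
```

Inside check_if_html, the final if/else could then become a single return of is_html_content_type($contentType).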
Then, we can add some code that calls check_if_html to determine whether a URL points to just a single-file resource:

```php
/* check to see if the URL points to an HTML page; if it doesn't,
   then we are dealing with a single file resource: */
if (!check_if_html($URL)) {
    $totalSize = get_remote_file_size($URL);
    echo "Final Total Download Size: $totalSize Bytes\n";
    $totalNumResources += 1; // a single resource is still one HTTP request
    echo "Final total HTTP requests: $totalNumResources\n";
    return;
}
```

How to find the total number of HTTP requests

We mentioned that we need to find the total number of HTTP requests generated by a given URL, so let’s figure out how to write some code that does that for us. Clearly we need a variable that keeps a running count of all HTTP requests, incremented each time we come across another one.

We know that images are embedded with an “img” tag, so if we search for all img tags we can look at each src attribute and find the size of any given image. For each image we find, we increment the variable that holds the total count of HTTP requests. We can do the same for CSS files, which are referenced inside “link” tags, and for JavaScript files, which are referenced inside “script” tags. We will use the simple HTML DOM parser we discussed earlier to find all of the references to CSS, JavaScript, and image files.

Here’s what the code looks like. Note that we use the simple HTML DOM library to parse the HTML, a variable called $totalNumResources to hold the total number of resources, and another variable called $totalSize to hold the total size of all of the resources:

```php
include('simple_html_dom.php');

$URL = $argv[1];
$totalSize = 0;
$totalNumResources = 0;

// Create DOM from URL or file
$html = file_get_html($URL);

// find all images:
foreach ($html->find('img') as $element) {
    $size = get_remote_file_size($element->src);
    $totalSize = $totalSize + $size;
    $totalNumResources += 1;
    /*
    echo "Total Size So Far: $totalSize.\n";
    echo "total resources: $totalNumResources .\n";
    echo "IMAGE SIZE: $size.\n";
    echo "$element->src.\n";
    */
}

// find all CSS files
foreach ($html->find('link') as $element) {
    if (strpos($element->href, '.css') !== false) {
        $size = get_remote_file_size($element->href);
        echo "CSS SIZE: $size\n";
        $totalSize = $totalSize + $size;
        $totalNumResources += 1;
    }
}

// find all script tags
foreach ($html->find('script') as $element) {
    //make sure this is javascript
    if (strpos($element->src, '.js') !== false) {
        $size = get_remote_file_size($element->src);
        echo "Javascript SIZE: $size\n";
        $totalSize = $totalSize + $size;
        $totalNumResources += 1;
    }
}
```

The answer to Advanced PHP Interview Question Part 1

Finally, we present our complete answer to part 1 of the advanced PHP interview question below, with all the source code you need to answer the first portion of the question. You can also continue on to Part 2 of the PHP Interview Questions and Answers.
```php
include('simple_html_dom.php');

$URL = $argv[1];
$totalSize = 0;
$totalNumResources = 0;

/* check to see if the URL points to an HTML page; if it doesn't,
   then we are dealing with a single file resource: */
if (!check_if_html($URL)) {
    $totalSize = get_remote_file_size($URL);
    echo "Final Total Download Size: $totalSize Bytes\n";
    $totalNumResources += 1; // a single resource is still an HTTP request
    echo "Final total HTTP requests: $totalNumResources\n";
    return;
}

/* at this point we know we are dealing with an HTML document, which
   also counts as a resource, so add its size and increment
   $totalNumResources by 1 */
$totalSize += get_remote_file_size($URL);
$totalNumResources += 1;

$html = file_get_html($URL);

// find all images:
foreach ($html->find('img') as $element) {
    $size = get_remote_file_size($element->src);
    $totalSize = $totalSize + $size;
    $totalNumResources += 1;
    /*
    echo "Total Size So Far: $totalSize.\n";
    echo "total resources: $totalNumResources .\n";
    echo "IMAGE SIZE: $size.\n";
    echo "$element->src.\n";
    */
}

// Find all CSS:
foreach ($html->find('link') as $element) {
    //only count the ones with '.css' inside...
    if (strpos($element->href, '.css') !== false) {
        $size = get_remote_file_size($element->href);
        $totalSize = $totalSize + $size;
        $totalNumResources += 1;
        /*
        echo "total resources: $totalNumResources .\n";
        echo "Total Size So Far: $totalSize.\n";
        echo "$element->href.\n";
        */
    }
}

//find all javascript:
foreach ($html->find('script') as $element) {
    //check to see if it is javascript file:
    if (strpos($element->src, '.js') !== false) {
        $size = get_remote_file_size($element->src);
        //echo " JS SIZE: $size.\n";
        $totalSize = $totalSize + $size;
        $totalNumResources += 1;
        /*
        echo "Total Size So Far: $totalSize.\n";
        echo "total resources: $totalNumResources .\n";
        echo "$element->src.\n";
        */
    }
}

echo "Final total download size: $totalSize Bytes\n";
echo "Final total HTTP requests: $totalNumResources\n";

function get_remote_file_size($url)
{
    $headers = get_headers($url, 1);
    if (isset($headers['Content-Length']))
        return $headers['Content-Length'];
    //this one checks for lower case "L" in Content-length:
    if (isset($headers['Content-length']))
        return $headers['Content-length'];

    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
    ));
    curl_exec($c);
    $size = curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    curl_close($c); // close before returning
    return $size;
}

/*checks content type header to see if it is an HTML page... */
function check_if_html($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, TRUE);
    curl_setopt($ch, CURLOPT_NOBODY, TRUE);
    $data = curl_exec($ch);
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);

    if ($contentType !== null && strpos($contentType, 'text/html') !== false)
        return TRUE; // this is HTML, yes!
    else
        return FALSE;
}
```

If you see some improvements we can make to the code above, please let us know in the comments. For example, src and href values are often relative (such as “/images/logo.png”), and get_headers needs an absolute URL, so those values would need to be resolved against $URL first.

Advanced PHP Practice Interview Questions And Answers Part 2

This is a continuation of the practice PHP interview question from part 1.
Here is the additional part of the question that we want you to try to answer:

The code should also be able to handle the URL in the src attribute of an iframe.

And here is the original question for your convenience:

Write a PHP script to report the total download size of any URL. You may not use any 3rd-party code that performs the entire task described here. No HTML interface is necessary for this exercise; you can write this as a command-line script that accepts the URL as an argument. For a single-file resource such as an image or SWF, the script would simply report on the total size of the document. For a complex resource such as an HTML document, the script would need to parse it to find references to embedded, included resources: JavaScript files, CSS files, iframes, etc. The goal of this exercise is to output the following information for a given URL:

- total number of HTTP requests
- total download size for all requests

How to handle an iframe src

The second part of the question states that we need to handle the URL in the src attribute of an iframe tag. What exactly does that mean? The src attribute of an iframe tag points to another HTML page, so using an iframe is like embedding a second HTML page inside the first one. The whole point of this exercise is to find the number of HTTP requests and the total download size of all requests. That means we have to follow the iframe src URL ourselves and work out how many new HTTP requests it creates and what their total download size will be.

For example, we use an iframe tag on this page to embed the Facebook Like Box, which you can see at the bottom of the left-hand sidebar. This is what our iframe tag looks like (you can also see it if you use "View Source" on this page). Note that the iframe src points to a PHP page called "likebox.php":

```html
<iframe src="http://www.facebook.com/plugins/likebox.php?href=http%3A%2F%2Fwww.facebook.com%2Fpages%2FProgrammerInterview%2F120896424636091&width=238&colorscheme=light&show_faces=false&stream=false&header=true&height=62" scrolling="no" frameborder="0" style="border:none; overflow:hidden; width:230px; height:70px;" allowTransparency="true"></iframe>
```

When the iframe is rendered, it shows a Like button, a count of likes, some text, and an image of a nerd. The Like button and the nerd image are 2 separate HTTP requests. The iframe src itself also counts as an HTTP request, because the browser has to request whatever URL the iframe src points to.

View source does not show you markup generated by an iframe

When we read the HTML of the page, we see only the iframe tag. We do not see the markup that the iframe produces. This is an important point, and you can confirm it by doing a View Source on this page. Even if we tell cURL to retrieve the page for us, the returned HTML contains the iframe tag in its original form, not in its rendered form.

For that reason, to find out how many HTTP requests the iframe will generate, we have to take the URL from the iframe's src attribute and evaluate it just as we did the original URL. In other words, we need to run the same code on the iframe's document that we already run on the top-level (containing) document. Think about that for a second and see if you can come up with a good approach to this problem on your own.

It turns out that we can use recursion: call our existing code again and pass in the URL from the iframe src attribute.
This way, we can find the number of HTTP requests and the total download size of the requests that come from the iframe src URL, simply by re-using the code we have already written. To use recursion here, our code has to be wrapped inside a function. With that in mind, we create the function below, which we call start. We deliberately left out the code that finds images, CSS, and JavaScript, because we want to focus on the iframe piece and the recursive call.

Using recursion to answer PHP Interview question part 2

So, we can make a recursive call to the start function as shown below:

```php
function start($URL)
{
    // These are local variables: every call to start() gets its own copies.
    $totalSize = 0;
    $totalNumResources = 0;

    if (!check_if_html($URL)) {
        $totalSize = get_remote_file_size($URL);
        echo "Final total download size: $totalSize Bytes\n";
        $totalNumResources += 1; // a single resource is an HTTP request
        echo "Final total HTTP requests: $totalNumResources\n";
        return;
    }

    /* At this point we know we are dealing with an HTML document, which also
       counts as a resource, so increment $totalNumResources by 1: */
    $totalNumResources += 1;
    $html = file_get_html($URL);

    foreach ($html->find('iframe') as $element) {
        echo "IFRAME: $element->src\n";
        start($element->src); // the recursive call
    }
} // closing brace for the "start" function
```

But wait a second. What happens to the $totalSize and $totalNumResources variables? In the implementation above they are local to each call of start(). The recursive call therefore begins counting from zero, and whatever it counts is thrown away when it returns. The caller's values are exactly what they were before the recursive call. That is not what we want. We want a cumulative sum of the HTTP requests and the download size, including whatever resources the iframe adds to the page.
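Here is a minimal offline model of the problem just described. Instead of fetching URLs, it uses a hypothetical in-memory $site array to describe each page. The function names count_locals() and count_by_reference() are illustrative only. The first function reproduces the scoping problem. The second shows one alternative to the return-value approach used in the next section: passing the counters by reference with &, so that every recursive call adds directly to the caller's variables.

```php
<?php
// Hypothetical in-memory "site": each page lists its own size in bytes, the
// sizes of the images/CSS/JS it loads, and the URLs of the iframes it embeds.
$site = [
    'index.html' => ['size' => 1000, 'resources' => [200, 300], 'iframes' => ['frame.html']],
    'frame.html' => ['size' => 500, 'resources' => [50], 'iframes' => []],
];

// Like the first version of start(): the counters are locals of each call,
// and the recursive call's result is discarded, so the iframe's work is lost.
function count_locals(array $site, string $url): array
{
    $page = $site[$url];
    $totalSize = $page['size'] + array_sum($page['resources']);
    $totalNumResources = 1 + count($page['resources']); // the page + its resources
    foreach ($page['iframes'] as $src) {
        count_locals($site, $src); // return value ignored
    }
    return [$totalSize, $totalNumResources];
}

// Alternative: by-reference parameters let every call add to the caller's
// variables directly, so nothing needs to be returned.
function count_by_reference(array $site, string $url, int &$totalSize, int &$totalNumResources): void
{
    $page = $site[$url];
    $totalSize += $page['size'] + array_sum($page['resources']);
    $totalNumResources += 1 + count($page['resources']);
    foreach ($page['iframes'] as $src) {
        count_by_reference($site, $src, $totalSize, $totalNumResources);
    }
}

[$size, $requests] = count_locals($site, 'index.html');
// $size = 1500, $requests = 3: frame.html was counted and then thrown away

$size = 0;
$requests = 0;
count_by_reference($site, 'index.html', $size, $requests);
// $size = 2050, $requests = 5: index.html (1500, 3) plus frame.html (550, 2)
```

Returning the totals, as the article's code does, keeps start() free of side effects. References avoid the list() bookkeeping. Either approach gives a correct cumulative count.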
Saving the values of the PHP variables

There clearly needs to be a way to keep the values of those variables across the recursive call, so that the recursive call can add on to them. One way to do this is to pass the values of $totalSize and $totalNumResources into the recursive call. That means start now has to accept two extra parameters.

Passing $totalSize and $totalNumResources into the recursive call is not enough, though. PHP passes them by value, so the function also has to return them. If the recursive call increments those values, we need to keep the modified values after it is over. Every path through start must therefore return the pair, including the branch for a single-file resource; otherwise list() would receive null and both totals would be lost. Here is the updated code:

```php
function start($URL, $totalSize, $totalNumResources)
{
    if (!check_if_html($URL)) {
        $totalSize += get_remote_file_size($URL);
        $totalNumResources += 1; // a single resource is an HTTP request
        return array($totalSize, $totalNumResources);
    }

    /* At this point we know we are dealing with an HTML document, which also
       counts as a resource, so increment $totalNumResources by 1. */
    $html = file_get_html($URL);
    $totalNumResources += 1;

    foreach ($html->find('iframe') as $element) {
        echo "IFRAME: $element->src\n";
        list($totalSize, $totalNumResources) = start($element->src, $totalSize, $totalNumResources);
    }
    return array($totalSize, $totalNumResources);
} // closing brace for the "start" function
```

Note that we use PHP's list() construct to unpack the array returned by the recursive call back into the two variables. Another interesting thing about the code above is that nothing inside the iframe foreach loop increments $totalNumResources by 1.
This is because the recursive call increments $totalNumResources by 1 anyway, since the iframe URL is counted as a separate HTML document.

The final answer to Advanced PHP Interview Question Part 2

Here is the complete PHP code, including the recursive call to the start function. This is our final answer to part 2 of the PHP interview question:

```php
<?php
include('simple_html_dom.php');

$URL = $argv[1];
$totalSize = 0;
$totalNumResources = 0;
list($totalSize, $totalNumResources) = start($URL, $totalSize, $totalNumResources);
echo "Final total download size: $totalSize Bytes\n";
echo "Final total HTTP requests: $totalNumResources\n";

function start($URL, $totalSize, $totalNumResources)
{
    if (!check_if_html($URL)) {
        $totalSize += get_remote_file_size($URL);
        $totalNumResources += 1; // a single resource is an HTTP request
        return array($totalSize, $totalNumResources);
    }

    // The HTML document itself is one request, and its bytes count too.
    $totalSize += get_remote_file_size($URL);
    $totalNumResources += 1;
    $html = file_get_html($URL);

    // Find all images:
    foreach ($html->find('img') as $element) {
        $totalSize += get_remote_file_size($element->src);
        $totalNumResources += 1;
    }

    // Find all CSS: only <link> tags whose href contains '.css'
    foreach ($html->find('link') as $element) {
        if (strpos($element->href, '.css') !== false) {
            $totalSize += get_remote_file_size($element->href);
            $totalNumResources += 1;
        }
    }

    // Find all JavaScript files:
    foreach ($html->find('script') as $element) {
        if (strpos($element->src, '.js') !== false) {
            $totalSize += get_remote_file_size($element->src);
            $totalNumResources += 1;
        }
    }

    // No increment here: the recursive call counts the iframe's document
    // as an HTML resource of its own.
    foreach ($html->find('iframe') as $element) {
        list($totalSize, $totalNumResources) = start($element->src, $totalSize, $totalNumResources);
    }
    return array($totalSize, $totalNumResources);
} // closing brace for the "start" function

function get_remote_file_size($url)
{
    $headers = get_headers($url, 1);
    // Header names keep the server's capitalization, so check both spellings.
    foreach (array('Content-Length', 'Content-length') as $name) {
        if (isset($headers[$name])) {
            $length = $headers[$name];
            // After a redirect there is one value per response; the last one
            // belongs to the file that was actually served.
            return (int) (is_array($length) ? end($length) : $length);
        }
    }

    // No Content-Length header: download the file and measure it instead.
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
    ));
    curl_exec($c);
    $size = curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    curl_close($c); // release the handle before returning
    return (int) $size;
}

/* Checks the Content-Type header to see whether the URL is an HTML page. */
function check_if_html($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true); // HEAD request: headers only
    curl_exec($ch);
    $contentType = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    return strpos($contentType, 'text/html') !== false; // this is HTML, yes!
}
```

Advanced PHP Practice Interview Questions And Answers Part 3

This is the last portion of our PHP practice interview questions and answers.
Here is part 3 of the PHP practice interview question:

Given the previous two portions of this question, can you name some of the drawbacks or disadvantages of the solution you provided?

And here are parts 1 and 2 of the original question for your convenience:

Write a PHP script to report the total download size of any URL. You may not use any 3rd-party code that performs the entire task described here. No HTML interface is necessary for this exercise; you can write this as a command-line script that accepts the URL as an argument. For a single-file resource such as an image or SWF, the script would simply report on the total size of the document. For a complex resource such as an HTML document, the script would need to parse it to find references to embedded, included resources: JavaScript files, CSS files, iframes, etc. The goal of this exercise is to output the following information for a given URL:

- total number of HTTP requests
- total download size for all requests

The code should also be able to handle the URL in the src attribute of an iframe.

There are a lot of potential problems with the solution we gave for this interview question. Some of them are unavoidable: this is an interview question, and you don't have a few weeks to give a perfect answer. Try to see if you can think of any potential problems on your own.

No JavaScript was executed to find additional resources

One disadvantage is that our PHP code does not execute any JavaScript to find additional resources. What exactly does that mean? Many websites have JavaScript code that, when executed, requests and displays specific resources on the page, such as GIFs, JPGs, SWFs, or whatever else the script wants to display (that is up to whoever wrote the JavaScript code).
One specific example of JavaScript requesting and displaying resources is websites (like this one) that use Google AdSense to put Google's ads on their pages. To do this, Google provides a JavaScript script, and some JavaScript variables are passed to the script to tell it what size the ad should be. The browser then executes that JavaScript and the right ad is generated. It could be a SWF file (a Flash format), a JPG, a GIF, or whatever file type Google decides is appropriate to show at that moment to whoever is viewing the page.

But we can't reasonably be expected to run this JavaScript to see what kind of resource it generates. Doing so would be tricky, and there is no guarantee that executing the JavaScript would even succeed. As a result, each ad is one resource our script does not count, and more are missed when a page has several ads (as this page does). So that is one drawback of our code: on pages that use JavaScript to request additional resources, it counts fewer HTTP requests than the page really makes.

Checking for duplicate resources

One thing we admittedly do not do in our implementation is check for duplicate resources, such as images, CSS, or JavaScript files that are referenced more than once in the HTML. Even though such a file is referenced multiple times, it results in only one HTTP request, because browsers are smart enough to make the request only once. We count each reference as a separate request instead. This should be fairly simple to fix, but as it stands our code can double-count HTTP requests.

Checking for resources inside CSS documents

Another thing we did not do is check for resources referenced inside external CSS documents, such as images named in a background-image rule.
That means our implementation will not count those resources, so the total number of HTTP requests it reports will be lower than the actual number.

Checking for browser-specific code

Another thing our implementation does not check for is browser-specific code, such as HTML conditional comments that look like this: <!--[if IE]>. These can affect the HTTP request count and the total download size, because certain files are included only for certain browsers. A common use is loading a different stylesheet for older Internet Explorer browsers. A real browser downloads only the stylesheet meant for it. Depending on how our HTML parser treats those comments, however, our script may skip the Internet Explorer stylesheet entirely or count it alongside the standard one.

Browser cache

One thing that may be challenging is accounting for resources that the browser has already cached. If a resource is cached, the browser does not make a new HTTP request for it; it just uses the copy saved in its cache. Our implementation does not take browser caching into account at all (doing so would probably be very difficult), and it counts every resource in the HTML as an HTTP request. So whenever cached resources are used, we overestimate the number of HTTP requests.

Final thoughts on Advanced PHP interview question and answer part 3

Those are just some of the potential drawbacks of our answer to the PHP interview question. Remember, our final solution is the code in part 2 above. Writing something very accurate, given all these complexities, would be challenging and time-consuming, and it would certainly not be expected in a PHP interview question like this.
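That said, two of the gaps above take only a few lines to close: double-counted duplicates, and resources referenced from inside CSS files. The sketch below assumes the URLs have already been collected into an array and the stylesheet text has already been downloaded. The helper names unique_requests() and css_urls() are hypothetical, and the regular expression is a deliberate simplification.

```php
<?php
// Sketch only; unique_requests() and css_urls() are hypothetical helpers.

// A browser requests a repeated URL once per page load, so count each URL once.
function unique_requests(array $urls): array
{
    return array_values(array_unique($urls));
}

// Extracts url(...) references (background-image, @font-face src, ...) from
// stylesheet text, accepting url(x), url('x') and url("x").
// Simplification: url() inside /* comments */ is still matched, and an
// @import "file.css" written without url() is missed.
function css_urls(string $css): array
{
    preg_match_all('/url\(\s*([\'"]?)([^\'")]+)\1\s*\)/i', $css, $matches);
    $urls = array_map('trim', $matches[2]);
    // data: URIs are embedded in the stylesheet and need no HTTP request.
    $urls = array_filter($urls, fn (string $u): bool => stripos($u, 'data:') !== 0);
    return unique_requests($urls);
}
```

Each URL found this way would then go through get_remote_file_size() once. Note that URLs inside a stylesheet are relative to the stylesheet's own URL, not to the HTML page.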
Most likely, what interviewers are looking for with a question like this is essential PHP skills and a good foundation of knowledge about how the web works.

How would you return an array from a function in PHP?

If you want to return multiple values from a function, you can simply return an array. This is what it would look like:

```php
function someFunc()
{
    $aVariable = 10;
    $aVariable2 = 20;
    return array($aVariable, $aVariable2);
}
```

How to retrieve the values returned from a function in PHP

You will probably want to retrieve the values returned from the function after you call it. What is the best way to do this? A nice way is to use PHP's list() construct. Here is how to unpack the array returned by the someFunc function shown above:

```php
list($var1, $var2) = someFunc();
// prints the values returned by someFunc: 10 20
echo "$var1 $var2";
```

Now $var1 and $var2 hold the same values as $aVariable and $aVariable2 inside someFunc.

Another option for retrieving the values returned from a function in PHP

Another possibility is to store the returned array in a variable and read its elements by index:

```php
$results = someFunc();
echo $results[0]; // 10
echo $results[1]; // 20
```

In the example above, everything returned by someFunc is stored in the $results array, and the echo statements output the returned values.

How do you delete cookies in PHP? Also, provide an example showing how it's done.

The interesting thing about deleting a cookie in PHP is that you use the same function you use to create the cookie: setcookie.

Deleting cookies using the setcookie function in PHP

The setcookie() function can accept up to seven arguments, but only one of them is required: the cookie name.
If you call setcookie with just a cookie name and no value, it has the same effect as deleting the existing cookie with that exact name. For example, to create a cookie called first_name, you use this line:

```php
setcookie('first_name', 'Robert');
```

And to delete the first_name cookie, you would do this:

Example of deleting a cookie in PHP:

```php
setcookie('first_name');
```

As an extra safety measure, you should also set the expiration time to a time in the past. In the example below, we pass time() - 300 (five minutes ago) as the expiration time. This is the way we recommend deleting a cookie in PHP:

Recommended way to delete a cookie in PHP:

```php
setcookie('first_name', '', time() - 300);
```

Parameters that must be set when deleting a cookie

When you delete a cookie, always use the same parameters that were used to create it. For example, if you set the domain and path when you created the cookie, pass the same domain and path again when deleting it.

Other interesting facts about deleting cookies in PHP

Deleting a cookie does not take effect until the page has been reloaded or another page has been accessed. The $_COOKIE array for the current request was filled in from the cookies the browser sent with that request, so the cookie is still available to the page that deleted it. Once the browser receives the response, it drops the cookie, and the next request (a reload or another page in that browser) no longer includes it.

What's the difference between a cookie and a session in PHP?

PHP sessions improve upon cookies because they allow web applications to store and retrieve more information than cookies can. PHP sessions actually use cookies, but they add more functionality and security.

Sessions store data on the server, not in the browser like cookies

The main difference between a session and a cookie is that session data is stored on the server, whereas cookies store data in the visitor's browser.
Sessions use a session identifier to locate a particular user's session data. This session identifier is normally stored in the user's web browser in a cookie. The sensitive data that needs to be more secure (the user's ID, name, and so on) always stays on the server.

Sessions are more secure than cookies

So why should we use sessions when cookies work just fine? First, as already mentioned, sessions are more secure, because the relevant information is stored on the server and is not sent back and forth between the client and the server. Second, some users turn off cookies or reject them. In that case, sessions (although designed to work with a cookie) can work without cookies as a workaround, as explained in the answer to "Can PHP sessions work without cookies?".

Sessions need extra space, unlike cookies

Unlike cookies, which are stored in the user's browser, PHP sessions need a temporary directory on the server where PHP can store the session data. For servers running Unix this isn't a problem at all, because the /tmp directory is meant for exactly this kind of thing. But if your server runs Windows with a version of PHP earlier than 4.3.6, you will need to configure the server:

1. Create a new folder on your Windows server; you can call it something like C:\temp. Make sure every user can read and write to this folder.
2. Edit your php.ini file and set session.save_path to the folder you created (in this case, C:\temp).
3. Restart your web server so that the changes in php.ini take effect.

Sessions must use the session_start function

A very important thing to remember when using sessions is that each page that uses a session must begin by calling the session_start() function.
The session_start() function tells PHP to either start a brand-new session or resume an existing one.

How session_start in PHP uses cookies

The first time session_start() is called, it tries to send a cookie named PHPSESSID. Its value looks something like a30f8670baa8e10a44c878df89a2044b: the session identifier, a string of 32 hexadecimal characters. Because cookies must be sent before anything else is sent to the browser, session_start must also be called before any output is sent to the web browser.

Registering values to the session

After session_start is called, values can be registered to the session using the $_SESSION associative array. This is what it would look like:

```php
$_SESSION['name'] = 'Jack';
$_SESSION['last_name'] = 'Lopez';
```

Can sessions work without cookies? If so, how does a session work without cookies enabled in PHP?

This is a great interview question. Even if you do not know the answer, you can come up with a fairly accurate one on your own with some basic knowledge of PHP sessions and some analytical thinking. See if you can work out how PHP sessions would work without cookies enabled in the browser.

The answer to how PHP sessions can work without cookies

Sessions in PHP normally do use cookies. But PHP sessions can also work without cookies, in case cookies are disabled or rejected by the browser that the PHP server is communicating with. This has to be switched on in php.ini: session.use_trans_sid must be enabled and session.use_only_cookies disabled. Neither is the default in current PHP versions.

How PHP sessions work without cookies

PHP does two things in order to work without cookies:

1. For every HTML form that PHP finds in your HTML output (which can of course come from a PHP file), PHP automatically adds a hidden input tag named PHPSESSID right after the <form> tag. The value of that hidden input is the session ID PHP has assigned to you.
For example, the hidden input could look something like this:

```html
<form>
  <input type="hidden" name="PHPSESSID" value="12345678">
</form>
```

This way, when the form is submitted to the server, PHP can retrieve the session identifier from the form. It then knows who it is communicating with on the other end. It also knows which session to associate the form parameters with, if it is adding them to the PHP session.

2. PHP finds the links in your HTML output that point to your own site (here, www.example.com) and appends a GET parameter to each of them. That GET parameter is also named PHPSESSID, and its value is the unique session identifier, so the session ID becomes part of the URL's query string. For example, suppose your code has a link that originally looks like this:

```html
<a href="http://www.example.com">Go to this link</a>
```

When modified by PHP to include the session ID, it could look something like this:

```html
<a href="http://www.example.com?PHPSESSID=72aa95axyz6cd67d82ba0f809277326dd">Go to this link</a>
```

PHPSESSID can have its name changed in the php.ini file

We said that PHPSESSID is the name used to hold the PHP session ID. You can change the name PHPSESSID to whatever you want by modifying the session.name value in the php.ini file.

What is a disadvantage of using PHP sessions without cookies enabled?

One disadvantage concerns sharing URLs. If you share a URL that has the PHP session ID appended to it, the person you share it with could potentially use your exact session. It also opens you up to session hijacking, where someone deliberately steals a user's session ID so that they can impersonate that user and do some damage.
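If an application can require cookies, the same php.ini settings that enable cookie-less sessions can also rule them out. Below is a minimal sketch, assuming you would rather accept that visitors who block cookies get a fresh, empty session on every request. The settings can also go in php.ini itself. When set from code, they must come before session_start() and before any output.

```php
<?php
// Sketch: keep the session ID out of URLs and hidden form fields.
// Apply these before session_start() (or set them in php.ini instead).
ini_set('session.use_only_cookies', '1'); // ignore session IDs sent in GET/POST
ini_set('session.use_trans_sid', '0');    // never rewrite links/forms with the ID
ini_set('session.use_strict_mode', '1');  // refuse session IDs this server never issued
```

With these values, a session ID copied from a shared URL is simply ignored. Strict mode additionally makes PHP replace an unknown ID with a newly generated one instead of adopting it.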