Laboratory Manual
Computer Lab Practice - II (Distributed Systems)
Final Year - Information Technology

Teaching Scheme
Theory: Nil
Practical: 2 Hrs/Week

Examination Scheme
Term Work: 50 Marks
Practical: 50 Marks
Oral: Nil

Prepared By Prof. Dinesh A. Zende
Department of Information Technology
Vidya Pratishthan's College of Engineering
Baramati – 413133, Dist- Pune (M.S.) INDIA
December 2012

Table of Contents
1 Implementation of Chat application using socket programming in Java
  1.1 Problem Statement
  1.2 Pre Lab
  1.3 Hardware and Software Requirement
  1.4 Theory
    1.4.1 Using UDP Socket
    1.4.2 Using TCP Socket
  1.5 Procedure
  1.6 Post Lab
  1.7 Viva Questions
2 Implementation of Remote Method Invocation using Java RMI
  2.1 Problem Statement
  2.2 Pre Lab
  2.3 Hardware and Software Requirement
  2.4 Theory
  2.5 Procedure
  2.6 Post Lab
  2.7 Viva Questions
3 Implementation of Client-Server architecture using Socket Programming in Linux
  3.1 Problem Statement
  3.2 Pre Lab
  3.3 Hardware and Software Requirement
  3.4 Theory
  3.5 Procedure
  3.6 Post Lab
  3.7 Viva Questions
4 Case Study on Cloud Computing
  4.1 Problem Statement
  4.2 Pre Lab
  4.3 Theory

List of Figures
1.1 Client GUI
2.1 General RMI Architecture
2.2 RMI Invocation
3.1 File Server Architecture
3.2 Steps to establish socket communication
4.1 Architecture of cloud computing
Assignment 1
Implementation of Chat application using socket programming in Java

1.1 Problem Statement
Implement a simple chat system using socket programming (TCP sockets).
1. The chat system has two types of components: (a) a chat room and (b) the client.
2. The system supports at most 3 clients; each client can enter or leave the system at any time. The client GUI can be designed as shown in Figure 1.1.
3. The chat room is a long-lived 'server' component and has no GUI.
4. Every message is broadcast to all clients connected to the chat room.
Figure 1.1: Client GUI

1.2 Pre Lab
• Concepts of sockets, ports and transport-level protocols
• Knowledge of computer networks
• Knowledge of programming in core Java
• Knowledge of network programming in Java

1.3 Hardware and Software Requirement
1. Hardware Requirement
• Computer with 1 GHz processor, 256 MB RAM, 40 GB HDD with network support.
2. Software Requirement
• JDK 1.6 or higher
• (optional) NetBeans 6.9 IDE

1.4 Theory
• Inter Process Communication (IPC) can be implemented with socket programming in two ways:
1. Using UDP sockets
2. Using TCP sockets
• In this assignment you will implement a chat server. A client process sends a string (a message) to the server, and the server sends the same string to all connected client processes, i.e. it broadcasts the message.
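The broadcast behaviour described above can be sketched with the ServerSocket, Socket, DataInputStream and DataOutputStream classes listed in 1.4.2. This is a minimal sketch under stated assumptions, not the required solution: the class name ChatRoom, the default port 5000 and the choice of one reader thread per client are illustrative, and the code needs Java 8 or later because it uses lambdas. Each client's output stream is kept in a list so that every received message can be written to all clients, including the sender.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical chat room server: relays every message it receives to all connected clients.
public class ChatRoom {
    static final int MAX_CLIENTS = 3;                 // limit taken from the problem statement

    private final ServerSocket serverSocket;
    // One output stream per connected client; copy-on-write so broadcasts can iterate safely.
    private final List<DataOutputStream> clients = new CopyOnWriteArrayList<>();

    public ChatRoom(int port) throws IOException {
        serverSocket = new ServerSocket(port);        // port 0 lets the OS pick a free port
    }

    public int getPort() { return serverSocket.getLocalPort(); }

    public int clientCount() { return clients.size(); }

    // Runs the accept loop on a background (daemon) thread.
    public void start() {
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    // Accepts clients until the server socket is closed.
    void acceptLoop() {
        while (!serverSocket.isClosed()) {
            try {
                Socket socket = serverSocket.accept();
                if (clients.size() >= MAX_CLIENTS) {  // room is full: refuse this client
                    socket.close();
                    continue;
                }
                DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                clients.add(out);
                Thread reader = new Thread(() -> relay(socket, out));
                reader.setDaemon(true);
                reader.start();
            } catch (IOException e) {
                // accept() throws once close() has been called; other errors skip this client
            }
        }
    }

    // Reads messages from one client and broadcasts each of them.
    private void relay(Socket socket, DataOutputStream out) {
        try (Socket s = socket;
             DataInputStream in = new DataInputStream(s.getInputStream())) {
            while (true) {
                broadcast(in.readUTF());              // EOFException when the client leaves
            }
        } catch (IOException e) {
            // client disconnected
        } finally {
            clients.remove(out);                      // the client has left the chat room
        }
    }

    private void broadcast(String message) {
        for (DataOutputStream out : clients) {
            synchronized (out) {                      // two readers may broadcast at once
                try {
                    out.writeUTF(message);
                    out.flush();
                } catch (IOException e) {
                    clients.remove(out);
                }
            }
        }
    }

    public void close() throws IOException { serverSocket.close(); }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 5000;  // assumed default port
        ChatRoom room = new ChatRoom(port);
        System.out.println("Chat room listening on port " + room.getPort());
        room.acceptLoop();                            // keep the server in the foreground
    }
}
```

A client only needs a Socket to the chat room's port, a DataOutputStream to send with writeUTF(), and a loop that calls readUTF() on a DataInputStream to display what the room broadcasts.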
Lab Manual - Computer Lab Practice - II, VPCOE, Baramati

1.4.1 Using UDP Socket
public class DatagramPacket
A DatagramPacket is usually created with one of the following constructors (the first two are used for receiving, the last two for sending):
• public DatagramPacket(byte[] ibuf, int size);
• public DatagramPacket(byte[] ibuf, int offset, int size);
• public DatagramPacket(byte[] ibuf, int size, InetAddress ipaddr, int port);
• public DatagramPacket(byte[] ibuf, int offset, int size, InetAddress ipaddr, int port);
Public Instance Methods
• public InetAddress getAddress(); Returns the IP address of the remote host to which the datagram is being sent or from which it was received.
• public byte[] getData(); Returns the byte array of data contained in the datagram. Mostly used to retrieve data from the datagram after it has been received.
• public int getLength(); Returns the length of the valid data in the byte array returned by getData(). This is usually not equal to the length of the whole byte array.
• public int getPort(); Returns the port number on the remote host to which the datagram is being sent or from which it was received.
Passed To: DatagramSocket.receive(); DatagramSocket.send(); DatagramSocketImpl.receive(); DatagramSocketImpl.send(); MulticastSocket.send();

public class DatagramSocket
• This class defines a socket that can send and receive unreliable datagram packets over the network using the UDP protocol.
• Datagram communication is not stream-based, and no connection is established between the sender and the receiver. Datagram packets are called "unreliable" because the protocol makes no attempt to ensure that they arrive, or to resend them if they do not.
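The following sketch exercises DatagramPacket and DatagramSocket on the loopback interface: one socket sends a datagram, a second socket receives it, and the text is rebuilt from only the first getLength() bytes of getData(). The class name UdpDemo, the 1024-byte receive buffer and the 2-second timeout are illustrative assumptions; the code needs Java 7 or later (try-with-resources, StandardCharsets).

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: send one datagram to a local receiver and return what it received.
public class UdpDemo {
    static String sendAndReceive(String message) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);  // any free port
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000);  // throw instead of blocking forever if the datagram is lost

            // Sending packet: data plus destination address and port.
            byte[] data = message.getBytes(StandardCharsets.UTF_8);
            DatagramPacket out = new DatagramPacket(data, data.length,
                    loopback, receiver.getLocalPort());
            sender.send(out);

            // Receiving packet: only a buffer; receive() fills it in.
            byte[] buf = new byte[1024];
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            receiver.receive(in);

            // Only the first getLength() bytes of getData() are valid.
            return new String(in.getData(), 0, in.getLength(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sendAndReceive("Hello over UDP"));
    }
}
```

On a real network the receive timeout matters: because UDP does not retransmit, a lost datagram would otherwise leave receive() blocked indefinitely.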
Public Constructors
• public DatagramSocket();
• public DatagramSocket(int port);
• public DatagramSocket(int port, InetAddress ipaddr);
All of the above constructors throw SocketException.
Public Instance Methods
• public void close();
• public InetAddress getLocalAddress();
• public int getLocalPort();
• public int getSoTimeout() throws SocketException;
• public void receive(DatagramPacket p) throws IOException;
• public void send(DatagramPacket p) throws IOException;
• public void setSoTimeout(int timeout) throws SocketException;

1.4.2 Using TCP Socket
For the TCP implementation you have to use the following classes.
public class ServerSocket
Public Constructors
• public ServerSocket(int port);
Public Methods
• public Socket accept();
• public void close();
• public InetAddress getInetAddress();
• public int getLocalPort();
public class Socket
Public Constructors
• public Socket(String host, int port);
• public Socket(InetAddress aHost, int port);
Methods
• public InetAddress getInetAddress();
• public InputStream getInputStream();
• public OutputStream getOutputStream();

1.5 Procedure
Steps for Implementation of UDP Socket (Server)
1. Create a DatagramSocket and bind it to the specified port.
2. Create a DatagramPacket with an empty buffer to hold the data read from the port.
3. Receive a DatagramPacket from the port using the receive method.
4. Display the received packet data.
Steps for Implementation of UDP Socket (Client)
1. Create a DatagramSocket.
2. Create a DatagramPacket specifying the remote host, the port number and the data to be sent.
3. Send this packet using the send method.
Steps for Implementation of TCP Socket (Server)
1. Create a ServerSocket with the specification of the port number.
2.
The ServerSocket starts listening on that port as soon as it is created; unlike the C socket API, Java has no separate listen method.
3. When a client requests a connection, accept it using the accept method of the ServerSocket, which returns a new Socket for that client.
4. Read from and write to the socket using DataInputStream and DataOutputStream respectively.
Steps for Implementation of TCP Socket (Client)
1. Create a Socket with the specification of the host machine (server) and the port number.
2. Create a DataInputStream and a DataOutputStream for reading from and writing to the socket.
3. Write data in (modified) UTF-8 encoding with the writeUTF() method, and read it on the other side with readUTF().

1.6 Post Lab
• Implement an application for a chat server and multiple clients using both TCP and UDP. Compare the use of TCP and UDP sockets for this application. Which is better suited?
• This assignment shows how to build client and server applications that communicate using sockets, and how reliable (TCP) and unreliable (UDP) communication differ.

1.7 Viva Questions
1. What is a distributed system?
2. Give a few examples of distributed systems.
3. What is the difference between a networked system and a distributed system?
4. Name a few characteristics of distributed systems.
5. Name some case studies of distributed systems that you have studied.
6. If you were asked to design a distributed system for a client, which design issues would you consider?
7. Explain the TCP and UDP protocols.
8. What are the different challenges faced by distributed systems?
9. Name the popular system models in distributed systems.
10. Explain the difference between message-oriented communication and stream-oriented communication.
11. What are layered protocols?

Assignment 2
Implementation of Remote Method Invocation using Java RMI

2.1 Problem Statement
Write a program to implement a simple student database application using RMI. The remote client consists of a GUI for performing different database operations (e.g. insert, delete, update) and retrieving data through RMI.

2.2 Pre Lab
• Concepts of sockets, ports and transport-level protocols
• Knowledge of TCP and UDP socket programming
• Knowledge of programming in core Java
• Knowledge of Remote Method Invocation

2.3 Hardware and Software Requirement
1. Hardware Requirement
• Computer with 1 GHz processor, 256 MB RAM, 40 GB HDD with network support.
2. Software Requirement
• JDK 1.6 or higher
• (optional) NetBeans 6.9 IDE

2.4 Theory
• The server must first bind its name in the registry.
• The client looks up the server name in the registry to obtain a remote reference.
• When a client invokes a remote method, the call is first forwarded to the stub.
• The stub is responsible for sending the remote call over to the server-side skeleton.
• The stub opens a socket to the remote server, marshals (serializes) the object parameters and forwards the data stream to the skeleton.
• The skeleton receives the remote call, unmarshals the parameters and invokes the actual remote object implementation; the result is then serialized back to the stub.
• Since JDK 1.2 a generic server-side dispatcher takes the place of generated skeleton classes, but the stub/skeleton model still describes how a call travels.

2.5 Procedure
Steps for Developing an RMI System
1. Define the remote interface.
2.
Develop the remote object by implementing the remote interface.
3. Develop the client program.
4. Compile the Java source files.
5. Generate the client stubs and server skeletons.
6. Start the RMI registry.
7. Start the remote server objects.
8. Run the client.
Figure 2.1: General RMI Architecture
Figure 2.2: RMI Invocation

• Step 1: Defining the Remote Interface
To create an RMI application, the first step is to define the remote interface between the client and server objects.

/* SampleServer.java */
import java.rmi.*;

public interface SampleServer extends Remote {
    public int sum(int a, int b) throws RemoteException;
}

• Step 2: Develop the remote object by implementing the remote interface.
– The server is a simple unicast remote server.
– Create the server by extending java.rmi.server.UnicastRemoteObject.

/* SampleServerImpl.java */
import java.rmi.*;
import java.rmi.server.*;

public class SampleServerImpl extends UnicastRemoteObject implements SampleServer {
    SampleServerImpl() throws RemoteException {
        super();   // exports this object so that it can receive remote calls
    }

    public int sum(int a, int b) throws RemoteException {
        return a + b;
    }
}

– The server must bind its name in the registry; the client will look up that name.
– Use the java.rmi.Naming class to bind the server name to the registry. In this example the name is SAMPLE-SERVER.
– An RMISecurityManager can be installed in the server's main method to protect its resources during remote communication. It is required only when classes must be downloaded from a remote codebase, so the main method below does not install one.
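The RMIServer main method registers the object with Naming.rebind rather than bind. The sketch that follows illustrates why rebind is the usual choice: bind refuses a name that is already in use (for example, a stale binding left in a still-running registry by a previous server run), whereas rebind replaces the old binding. It is a hedged, self-contained illustration: the Greeter interface, the class name BindVsRebind and port 2100 are assumptions, and it starts an in-process registry with LocateRegistry.createRegistry instead of the separate rmiregistry command.

```java
import java.rmi.AlreadyBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class BindVsRebind {
    // Hypothetical remote interface used only for this demonstration.
    public interface Greeter extends Remote {
        String greet() throws RemoteException;
    }

    static class GreeterImpl extends UnicastRemoteObject implements Greeter {
        private final String text;
        GreeterImpl(String text) throws RemoteException { super(); this.text = text; }
        public String greet() throws RemoteException { return text; }
    }

    static final int PORT = 2100;  // assumed free port; the default registry port is 1099

    static String demo() throws Exception {
        Registry registry = LocateRegistry.createRegistry(PORT);  // in-process registry
        GreeterImpl first = new GreeterImpl("first");
        GreeterImpl second = new GreeterImpl("second");
        StringBuilder log = new StringBuilder();
        try {
            registry.bind("SAMPLE-SERVER", first);
            try {
                registry.bind("SAMPLE-SERVER", second);   // same name a second time
                log.append("second bind accepted;");
            } catch (AlreadyBoundException e) {
                log.append("second bind rejected;");
            }
            registry.rebind("SAMPLE-SERVER", second);     // replaces the existing binding
            Greeter current = (Greeter) registry.lookup("SAMPLE-SERVER");
            log.append(current.greet());
        } finally {
            // Unexport everything so the JVM can exit.
            UnicastRemoteObject.unexportObject(first, true);
            UnicastRemoteObject.unexportObject(second, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
        return log.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```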
//RMIServer.java
import java.rmi.*;

public class RMIServer {
    public static void main(String args[]) {
        try {
            //create a local instance of the remote object
            SampleServerImpl server = new SampleServerImpl();
            //put the local instance in the registry; the name must match the client's
            //lookup name exactly (no trailing spaces)
            Naming.rebind("SAMPLE-SERVER", server);
            System.out.println("Server waiting.....");
        } catch (java.net.MalformedURLException me) {
            System.out.println("Malformed URL: " + me.toString());
        } catch (RemoteException re) {
            System.out.println("Remote exception: " + re.toString());
        }
    }
}

• Step 3: Develop the client program
– In order for the client object to invoke methods on the server, it must first look up the name of the server in the registry.
– Use the java.rmi.Naming class to look up the server name.
– The server name is specified as a URL in the form rmi://host:port/name
– The default RMI port is 1099.
– The name specified in the URL must exactly match the name that the server has bound to the registry.
– In this example, the name is SAMPLE-SERVER.
– The remote method is invoked through the reference returned by the lookup (remoteObject, whose type is the remote interface SampleServer), followed by the remote method name sum.
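Steps 2 and 3 can be tried end to end in a single JVM. The sketch below is a hedged illustration, not the manual's required file layout: it creates an in-process registry on an assumed port 2099 with LocateRegistry.createRegistry (so no separate rmiregistry is needed), binds a SampleServer implementation with Naming.rebind, looks it up with a URL of the form rmi://host:port/name, and calls sum through the stub. The class name RmiRoundTrip is hypothetical, and the java.rmi.server.hostname property is set to 127.0.0.1 so that the exported stub points at the loopback address.

```java
import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiRoundTrip {
    // Same shape as SampleServer from Step 1, nested to keep the sketch in one file.
    public interface SampleServer extends Remote {
        int sum(int a, int b) throws RemoteException;
    }

    // Same behaviour as SampleServerImpl from Step 2.
    static class SampleServerImpl extends UnicastRemoteObject implements SampleServer {
        SampleServerImpl() throws RemoteException { super(); }
        public int sum(int a, int b) throws RemoteException { return a + b; }
    }

    static final int PORT = 2099;  // assumed free port (the default registry port is 1099)
    static final String URL = "rmi://localhost:" + PORT + "/SAMPLE-SERVER";

    // Starts a registry, binds the server, looks it up as a client would, and calls sum.
    static int run(int a, int b) throws Exception {
        // Make exported stubs use the loopback address (must be set before exporting).
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        Registry registry = LocateRegistry.createRegistry(PORT);
        SampleServerImpl server = new SampleServerImpl();
        try {
            Naming.rebind(URL, server);                                     // server side
            SampleServer remoteObject = (SampleServer) Naming.lookup(URL);  // client side
            return remoteObject.sum(a, b);                                  // call goes via the stub
        } finally {
            // Unexport both objects so the JVM can exit.
            UnicastRemoteObject.unexportObject(server, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(" 1 + 2 = " + run(1, 2));
    }
}
```

In the two-process setup of this assignment the same calls are split between RMIServer (rebind) and SampleClient (lookup and sum), with rmiregistry running as a separate process.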
//SampleClient.java
import java.rmi.*;

public class SampleClient {
    public static void main(String[] args) {
        //get the remote object from the registry
        try {
            String url = "//localhost/SAMPLE-SERVER";
            SampleServer remoteObject = (SampleServer) Naming.lookup(url);
            System.out.println("Got remote object");
            System.out.println(" 1 + 2 = " + remoteObject.sum(1, 2));
        } catch (RemoteException exc) {
            System.out.println("Remote error: " + exc.toString());
        } catch (java.net.MalformedURLException exc) {
            System.out.println("Malformed URL: " + exc.toString());
        } catch (NotBoundException exc) {
            System.out.println("NotBound: " + exc.toString());
        }
    }
}

• Step 4 and 5: Compile the Java source files and generate the client stubs and server skeletons
– Once the interface and its implementation are complete, generate the stub (and, for old JDKs, skeleton) code. The RMI system provides an RMI compiler (rmic) that takes the compiled implementation class and produces the stub code from it. From JDK 1.5 onwards stubs are generated dynamically at run time, so the rmic step is optional there (rmic was removed in JDK 15).
Follow these steps to compile and run the RMI application:

c:\jdk1.4\RMI> set PATH=c:\jdk1.4\bin;%PATH%
c:\jdk1.4\RMI> set CLASSPATH=.
c:\jdk1.4\RMI> javac SampleServer.java
c:\jdk1.4\RMI> javac SampleServerImpl.java
c:\jdk1.4\RMI> javac RMIServer.java
c:\jdk1.4\RMI> javac SampleClient.java
c:\jdk1.4\RMI> rmic SampleServerImpl
c:\jdk1.4\RMI> start rmiregistry
c:\jdk1.4\RMI> start java RMIServer
c:\jdk1.4\RMI> java SampleClient

• The RMI registry must be running before the server binds its name; start it manually with the rmiregistry command.
• The rmiregistry uses port 1099 by default. You can run rmiregistry on a different port by giving the new port number: rmiregistry <new port>. The server and client URLs must then include that port, e.g. //localhost:<new port>/SAMPLE-SERVER.
• On Windows, type the following at the command line: start rmiregistry

Advancements: Create an RMI application for the following requirements
1. Unit Converter Application
2.
Currency Converter Application
3. Simple Calculator
4. Time Server
5. Echo Server
6. String Operations

2.6 Post Lab
Develop an RMI server on which the database resides. The RMI client has a GUI with functions such as insert, delete and update.

2.7 Viva Questions
1. What are RPC and LRPC?
2. What is the advantage of RPC2 over RPC?
3. How do we provide security to RMI classes?
4. What are layered protocols?
5. What is Remote Method Invocation?
6. What is a Distributed File System (DFS)?
7. What do you mean by automounting?
8. What are the advances in CODA compared to AFS?
9. Which is the most important feature of CODA?
10. What are stubs and skeletons?
11. How does communication take place in NFS?
12. Explain the naming concept in NFS.
13. How does synchronization take place in NFS?
14. How do you implement file locking in NFS?
15. What are Vice and Virtue in CODA?

Assignment 3
Implementation of Client-Server architecture using Socket Programming in Linux

3.1 Problem Statement
Consider a client-server architecture (as shown in Figure 3.1) in which a user stores a file on a main server. The main server splits the file into two or more fragments and stores each fragment on a separate storage server. When the client retrieves the file from the main server, the main server fetches the fragments from the storage servers and presents them to the user as a single file.
Figure 3.1: File Server Architecture

3.2 Pre Lab
• Concepts of sockets, ports and transport-level protocols
• Concepts of computer networks
• Knowledge of programming in C under Linux

3.3 Hardware and Software Requirement
1. Hardware Requirement
• Computer with 1 GHz processor, 256 MB RAM, 40 GB HDD with network support.
2.
Software Requirement
• Operating System - Linux
• (optional) GEdit or any other editor

3.4 Theory
In this assignment you will implement a client-server architecture using sockets. A socket is a communication mechanism that allows client/server systems to be developed either locally, on a single machine, or across a network. The client and the main server communicate using sockets, and so do the main server and the fragment (storage) servers.

3.5 Procedure
1. The server creates a socket by calling the socket system call, which returns a socket descriptor (or -1 on error).
#include<sys/types.h>
#include<sys/socket.h>
int socket(int family, int type, int protocol);
Figure 3.2: Steps to establish socket communication
2. A socket is named (given a local address and port) using bind.
int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addr_len);
If successful, it returns 0; otherwise -1.
3. To accept incoming connections on a socket, a server program must create a queue to store pending requests. The listen system call creates this queue.
int listen(int sockfd, int backlog);
If successful, it returns 0; otherwise -1.
4. The server accepts incoming requests by calling accept. accept creates a new socket, distinct from the named (listening) socket, which is used for communication with that client. It returns the new socket descriptor, or -1 on error.
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
5. The client creates a socket using the socket system call and sends a connection request to the server through the connect system call.
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
If successful, it returns 0; otherwise -1.
6. Once the connection is established, further communication is done using read and write.
ssize_t read(int sockfd, void *buf, size_t len);
ssize_t write(int sockfd, const void *buf, size_t len);
7.
Finally, the client and the server call close to close the connection.
int close(int sockfd);

Simple network client example:
#include<sys/types.h>
#include<sys/socket.h>
#include<stdio.h>
#include<string.h>
#include<netinet/in.h>
#include<arpa/inet.h>
#include<unistd.h>
#include<stdlib.h>

int main()
{
    int sockfd;
    socklen_t len;
    struct sockaddr_in address;
    int result;
    char ch = 'A';

    //Create a socket for the client
    sockfd = socket(AF_INET, SOCK_STREAM, 0);

    //Name the socket as agreed with the server
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = inet_addr("127.0.0.1");
    address.sin_port = htons(1234);   //port must be in network byte order
    len = sizeof(address);

    //Connect our socket to the server socket
    result = connect(sockfd, (struct sockaddr *)&address, len);
    if (result == -1) {
        perror("oops: client1");
        exit(1);
    }

    //Read and write via sockfd
    write(sockfd, &ch, 1);
    read(sockfd, &ch, 1);
    printf("\nServer says: %c\n", ch);
    close(sockfd);
    exit(0);
}

Simple network server example:
#include<sys/types.h>
#include<sys/socket.h>
#include<stdio.h>
#include<string.h>
#include<netinet/in.h>
#include<arpa/inet.h>
#include<unistd.h>
#include<stdlib.h>

int main()
{
    int server_sockfd, client_sockfd;
    socklen_t server_len, client_len;
    struct sockaddr_in server_address;
    struct sockaddr_in client_address;

    //Create and name the socket
    server_sockfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&server_address, 0, sizeof(server_address));
    server_address.sin_family = AF_INET;
    server_address.sin_addr.s_addr = inet_addr("127.0.0.1");
    server_address.sin_port = htons(1234);   //same port as the client, network byte order
    server_len = sizeof(server_address);
    if (bind(server_sockfd, (struct sockaddr *)&server_address, server_len) == -1) {
        perror("bind");
        exit(1);
    }

    //Create a connection queue and wait for clients
    listen(server_sockfd, 5);
    while (1) {
        char ch;
        printf("server waiting\n");

        //Accept a connection
        client_len = sizeof(client_address);
        client_sockfd = accept(server_sockfd, (struct sockaddr *)&client_address,
                               &client_len);

        //Read from and write to the client on client_sockfd
        read(client_sockfd, &ch, 1);
        ch++;
        write(client_sockfd, &ch, 1);
        close(client_sockfd);
    }
}

Compiling and running the server and client programs
$ cc -o Serverapp server2.c
$ cc -o Clientapp client2.c
$ ./Serverapp &
$ ./Clientapp

3.6 Post Lab
This assignment shows how to write socket programs in C under Linux.

3.7 Viva Questions
1. Explain the TCP and UDP protocols.
2. Explain the differences between TCP and UDP.
3. Which system calls are used in the server-side program?
4. Which system calls are used in the client-side program?
5. What does the accept system call return?
6. Explain the different socket address structures.
7. List the different address families used in socket programming.
8. Explain the arguments of the socket system call.

Assignment 4
Case Study on Cloud Computing

4.1 Problem Statement
Perform a case study on cloud computing covering its definition, benefits, drawbacks and all the service models: Process as a Service, Platform as a Service, Information as a Service, Integration as a Service, Security as a Service, Storage as a Service, Governance or Management as a Service, Testing as a Service (TaaS) and Infrastructure as a Service.

4.2 Pre Lab
• Knowledge of computer networks

4.3 Theory
Definition
Cloud computing is a technology that uses the internet and central remote servers to maintain data and applications.
• Cloud computing allows consumers and businesses to use applications without installing them and to access their personal files from any computer with internet access.
• This technology allows much more efficient computing by centralizing storage, memory, processing and bandwidth.
• Examples: Yahoo Mail, Gmail and Hotmail.
• You don't need any software or a server of your own to use them.
All a consumer needs is an internet connection to start sending email. The server and the email-management software are all in the cloud (internet) and are managed entirely by the cloud service provider (Yahoo, Google, etc.).

Characteristics of Cloud Computing
1. On-demand self-service: individuals can set themselves up without needing anyone's help.
2. Ubiquitous network access: services are available through standard Internet-enabled devices.
3. Location-independent resource pooling: processing and storage demands are balanced across a common infrastructure, with no particular resource assigned to any individual user.
4. Rapid elasticity: consumers can increase or decrease capacity at will.
5. Pay per use: consumers are charged fees based on their usage of a combination of computing power, bandwidth and/or storage.

Architecture
Figure 4.1: Architecture of cloud computing

Advantages of Cloud Computing
1. Reduced Cost: Cloud technology is paid for incrementally, saving organizations money.
2. Increased Storage: Organizations can store more data than on private computer systems.
3. Highly Automated: IT personnel no longer need to worry about keeping software up to date.
4. Flexibility: Cloud computing offers much more flexibility than past computing methods.
5. More Mobility: Employees can access information wherever they are, rather than having to remain at their desks.
6. Allows IT to Shift Focus: No longer having to worry about constant server updates and other computing issues, organizations are free to concentrate on innovation.

Disadvantages of Cloud Computing
1. Security and Privacy: The biggest concerns about cloud computing are security and privacy. Users might not be comfortable handing over their data to a third party. This is an even greater concern for companies that wish to keep their sensitive information on cloud servers.
While most service vendors ensure that their servers are kept free from viruses and malware, this is still a concern, given that many users from around the world access the same servers. Privacy is another issue with cloud servers: ensuring that a client's data is not accessed by unauthorized users is of great importance for any cloud service. To make their servers more secure, cloud service vendors have developed password-protected accounts, security servers through which all transferred data must pass, and data-encryption techniques. After all, the success of a cloud service depends on its reputation, and any sign of a security breach would result in a loss of clients and business.
2. Dependency (loss of control)
(a) Quality problems with the CSP (Cloud Service Provider): the customer has no influence on maintenance levels and fix frequency when using cloud services from a CSP.
(b) Little or no insight into the CSP's contingency procedures, especially backup, restore and disaster recovery.
(c) Measurement of resource usage and end-user activities lies in the hands of the CSP.
3. Cost: Higher costs. While in the long run cloud hosting is a lot cheaper than traditional technologies, the fact that it is currently new and still being researched and improved actually makes it more expensive. Data centers have to buy or develop the software that will run the cloud, rewire the machines and fix unforeseen problems (which are always there). This makes their initial cloud offers more expensive. As in all other industries, the first customers pay a higher price and have to deal with more issues than those who switch later (although it would be very hard to create and improve new technologies without these early adopters).
4.
Decreased flexibility: Like the others on this list, this is only a temporary problem: current technologies are still in the testing stages, so they do not yet offer the flexibility they promise. That will change in the future, but some current users may find, for example, that their cloud server is difficult or impossible to upgrade without losing some data.
5. Knowledge and integration
Knowledge: More and deeper knowledge is required to implement and manage SLA contracts with CSPs. Since all knowledge about the working of the cloud (e.g. hardware, software, virtualization, deployment) is concentrated at the CSP, it is hard to get a grip on the CSP.
Integration: Integration with equipment hosted in other data centers is difficult to achieve.
Peripheral integration: (Bulk) printers and local security IT equipment (e.g. access systems) are difficult to integrate, as are (personal) USB devices, smartphones, and groupware and email systems.

Laboratory Manual
Computer Laboratory Practice-II (Information Retrieval)
Final Year - Information Technology

Teaching Scheme
Theory: Nil
Practical: 02 Hrs/Week/Batch

Examination Scheme
Term Work: 50 Marks
Practical: 50 Marks
Oral: Nil

Prepared By Prof. Shah Sahil K.
Department of Information Technology
Vidya Pratishthan's College of Engineering
Baramati – 413133, Dist- Pune (M.S.) INDIA
December 2013

Table of Contents
1 Implementation of Conflation Algorithm
  1.1 Problem Statement
  1.2 Pre Lab
  1.3 Hardware and Software Requirement
  1.4 Theory
    1.4.1 Conflation Algorithm
    1.4.2 Luhn's idea
    1.4.3 M.F.Porter's Algorithm
  1.5 Procedure
  1.6 Post Lab
  1.7 Viva Questions
2 Implementation of Single Pass Clustering Algorithm
  2.1 Problem Statement
  2.2 Pre Lab
  2.3 Theory
    2.3.1 Clustering
    2.3.2 Single Pass Clustering
  2.4 Procedure
  2.5 Post Lab
  2.6 Viva Questions
3 Implementation of Inverted Index Structure
  3.1 Problem Statement
  3.2 Pre Lab
  3.3 Theory
    3.3.1 File Structure
    3.3.2 Indexing
  3.4 Procedure
  3.5 Post Lab
  3.6 Viva Questions
4 Implementation of Feature Extraction in 2D Images
  4.1 Problem Statement
  4.2 Pre Lab
  4.3 Theory
    4.3.1 Feature Extraction
    4.3.2 Use of feature extraction
  4.4 Procedure
  4.5 Post Lab
  4.6 Viva Questions
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Color . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TABLE OF CONTENTS TABLE OF CONTENTS 5 Case Study 5.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Post Lab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 15 15 15 References 16 ii List of Figures 1.1 Relation between frequency of word and significance of word [Luhn’s idea] . . . . . . . . 3 3.1 Example of Inverted Index Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 4.1 Histogram for a 2D Color Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 iii Assignment 1 Implementation of Conflation Algorithm 1.1 Problem Statement Develop an automated text processing system which generates the document representative of the text by giving weightage to the words appearing in the text.(Use - Luhn’s concept of automatic text analysis & Working concept of conflation algorithm.) 1.2 Pre Lab • Luhn’s Idea • M.F.Porter’s Suffix Stripping Algorithm 1.3 Hardware and Software Requirement • System with minimum 512MB RAM • JDK 1.7 • Java editor viz. Netbeans IDE 6.8/Higher Version,Eclipse etc. 1 1.4. THEORY 1.4 Implementation of Conflation Algorithm Theory Information Retrieval Calvin Mooers coined the term information retrieval in 1950. In the context of library and information science, we mean to get back information, which is, in a way, hidden, from normal sight or vision. According to, J.H. 
Shera: It is,“The process of locating and selecting data, relevant to a given requirement.” Calvin Mooers:“Searching and retrieval of information from storage, according to specification by subject.” 1.4.1 Conflation Algorithm In order to develop an automated text processing system which by means of computable methods with the minimum of human intervention will generate from the input text (full text, abstract, or title) a document representative adequate for use in an automatic retrieval system,conflation algorithm is mainly useful. A document will be indexed by a name if one of its significant words occurs as a member of that class. Such a system will usually consist of three parts: 1. Removal of high frequency words(Stop words & Non words Removal) 2. Suffix stripping (Using M.F.Porter’s Algorithm) 3. Detection & Removal of equivalent stems. 1.4.2 Luhn’s idea Luhn proposed that “the frequency of word occurrence in an article furnishes a useful measurement of word significance”. Luhn used Zipf’s Law as a null hypothesis to specify two cut-offs, an upper and a lower,thus excluding non-significant words. The words exceeding the upper cut-off were considered to be common and those below the lower cut-off rare, and therefore not contributing significantly to the content of the article. He thus devised a counting technique for finding significant words. The same is shown by using a plot of frequency versus rank. Stop words These are the very common words occurring frequently in a sentence and which does not have any meaning and these will not contribute in relevance of the sentence. Example of stop words include but not limited to words like a,an,the,is,was,are,were,he,she,it etc. Non words These are the words/notations used in order to represent the sentence with proper formatting characters. Example of non words include all formatting(or special) characters like ?,“,”,;,:,& etc. 
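The stop-word and non-word removal just described, together with Luhn’s upper and lower cut-offs, can be sketched in Java as follows. This is a minimal illustration, not the manual’s reference solution: the class name LuhnFilter is hypothetical, the stop-word list holds only the examples given above, treating every character outside a-z as a non-word separator is an assumption (it also drops digits), and the cut-off values are left to the caller.

```java
import java.util.*;

/** Minimal sketch: stop-word/non-word removal followed by Luhn-style frequency cut-offs. */
final class LuhnFilter {

    // Hypothetical, deliberately tiny stop-word list; a real system would load a full list from a file.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "is", "was", "are", "were", "he", "she", "it"));

    /** Lower-cases the text, splits on every non-letter (dropping non-words) and removes stop words. */
    static List<String> preprocess(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Counts the occurrences of each remaining term. */
    static Map<String, Integer> frequencies(List<String> tokens) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String t : tokens) {
            Integer count = freq.get(t);
            freq.put(t, count == null ? 1 : count + 1);
        }
        return freq;
    }

    /** Keeps the terms whose frequency lies between the lower and upper cut-offs, both inclusive. */
    static Set<String> significant(Map<String, Integer> freq, int lower, int upper) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            int f = e.getValue();
            if (f >= lower && f <= upper) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

For example, preprocess("The cat sat; the cat ran? A dog sat.") keeps cat, sat, cat, ran, dog, sat. In the full procedure the tokens would pass through suffix stripping and stem conflation before being counted; this sketch counts surface words only.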
Figure 1.1: Relation between frequency of word and significance of word [Luhn’s idea]

The removal of high-frequency words (‘stop’ words or ‘fluff’ words) is one way of implementing Luhn’s upper cut-off. This is normally done by comparing the input text with a ‘stop word list’ of words to be removed. The process has two advantages: non-significant words are removed and therefore do not interfere during retrieval, and the size of the total document file can be reduced by between 30 and 50 per cent.

1.4.3 M.F. Porter’s Algorithm
Terms with a common stem usually have similar meanings, for example CONNECT, CONNECTED, CONNECTING, CONNECTION, CONNECTIONS. The performance of an IR system will be improved if term groups such as this are conflated into a single term. This may be done by removing the various suffixes -ED, -ING, -ION, -IONS, etc. to leave the single term CONNECT. In addition, suffix stripping reduces the total number of terms in the IR system, and hence the size and complexity of the data in the system, which is always advantageous.

The algorithm assumes the following definitions. A consonant in a word is a letter other than A, E, I, O or U, and other than Y preceded by a consonant. A letter that is not a consonant is a vowel. A single consonant is denoted by c and a single vowel by v; a list ccc... of length greater than 0 is denoted by C, and a list vvv... of length greater than 0 by V. Any word, or part of a word, therefore has one of the four forms:
CVCV ... C
CVCV ... V
VCVC ... C
VCVC ... V
These may all be represented by the single form
[C]VCVC ... [V]
where the square brackets denote the optional presence of their contents.
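The consonant/vowel classification above, and the reduction of a word to its [C]VCVC ... [V] form, can be sketched in Java as follows. The measure method counts the VC pairs in that form, which is the measure m defined in the next paragraph. This is a minimal sketch, not Porter’s full stemmer: the class and method names are hypothetical, the input is assumed to be a single alphabetic word, and no suffix rules are applied.

```java
/** Minimal sketch of Porter's consonant/vowel classification and the measure m. */
final class PorterMeasure {

    /** Returns true if w.charAt(i) is a consonant in Porter's sense; w must be lower case. */
    static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                // 'y' is a consonant at the start of a word or after a vowel, a vowel after a consonant.
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    /** Reduces a word to its [C]VCVC...[V] form by collapsing each run of c's into C and of v's into V. */
    static String form(String word) {
        String w = word.toLowerCase();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < w.length(); i++) {
            char symbol = isConsonant(w, i) ? 'C' : 'V';
            if (sb.length() == 0 || sb.charAt(sb.length() - 1) != symbol) {
                sb.append(symbol);
            }
        }
        return sb.toString();
    }

    /** Counts the VC pairs in the collapsed form, i.e. m in [C](VC)^m[V]. */
    static int measure(String word) {
        String f = form(word);
        int m = 0;
        for (int i = 0; i + 1 < f.length(); i++) {
            if (f.charAt(i) == 'V' && f.charAt(i + 1) == 'C') {
                m++;
            }
        }
        return m;
    }
}
```

For instance, TROUBLE collapses to CVCV (m = 1) and REPLAC to CVCVC (m = 2), matching the examples in Porter’s description. A rule such as (m > 1) EMENT -> would check measure() on the stem left after removing the suffix.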
Using (VC)^m to denote VC repeated m times, this may again be written as
$$[C](VC)^{m}[V]$$
where m is called the measure of any word or word part represented in this form. Some examples:
m = 0: TR, EE, TREE, Y, BY
m = 1: TROUBLE, OATS, TREES, IVY
m = 2: TROUBLES, PRIVATE, OATEN, ORRERY

The rules for removing a suffix are given in the form
(condition) S1 -> S2
This means that if a word ends with the suffix S1, and the stem before S1 satisfies the given condition, S1 is replaced by S2. The condition is usually given in terms of m, e.g.
(m > 1) EMENT ->
Here S1 is ‘EMENT’ and S2 is null. This rule maps REPLACEMENT to REPLAC, since REPLAC is a word part for which m = 2.

After suffix stripping, equivalent stems are detected. For two stems to be equivalent they must match except for their endings, which themselves must appear as equivalent in a list of stem-endings. For example, stems such as ABSORB- and ABSORPT- are conflated because there is an entry in the list defining B and PT as equivalent stem-endings if the preceding characters match.

Document representative
A document representative is the list of significant words, i.e. the terms whose frequency of occurrence lies between Luhn’s upper and lower cut-offs. These are often referred to as the document’s index terms or keywords.

1.5 Procedure
1. A text file is taken as input to the conflation algorithm.
2. Maintain/create a database containing the list of stop words and non words.
3. Process the input file to remove the stop words and non words. This step is known as document preprocessing.
4. Give the preprocessed file as input to M.F. Porter’s suffix stripping algorithm.
5. Detect the equivalent stems and find the frequency of occurrence of each term in the document.
6. Based on Luhn’s idea, decide the upper cut-off (the maximum frequency value of a term) and the lower cut-off (which can be decided relative to that maximum frequency value). Apply Luhn’s idea to decide the significant word set.
Input: Any text (.txt, .doc) file.
Output: Set of index terms/keywords (document representative)

1.6 Post Lab
After completing this assignment, analyze the performance of the conflation algorithm on different inputs and write your concluding points accordingly. Discuss the areas where conflation algorithms are widely used.

1.7 Viva Questions
1. Define information retrieval. Also, discuss the advantages of an IR system.
2. Explain Luhn’s idea.
3. Define document representative.
4. What are the major steps in the conflation algorithm?

Assignment 2
Implementation of Single Pass Clustering Algorithm

2.1 Problem Statement
Implement the single pass clustering algorithm for clustering text documents.
Input: 4-5 text files represented in the vector space model (term (keyword) vs. document matrix)

2.2 Pre Lab
• Concept of document clustering
• Concept of IR models

2.3 Theory
2.3.1 Clustering
Clustering is a central unsupervised learning problem: like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. Clustering can be defined as “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters. Clustering methods are commonly classified into the following types:
1. Graph theoretic approach
2. Hierarchical clustering

2.3.2 Single Pass Clustering
Clustering algorithms that require only one pass through the file of object descriptions are known as single-pass algorithms. Given a collection of clusters and a threshold value h, the similarity of a new document n to every existing cluster is computed. If the highest similarity exceeds h, document n is appended to that cluster; otherwise (including when no cluster exists yet), a new cluster containing only document n is created.
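The single-pass procedure just described can be sketched in Java as follows. It uses cosine similarity as the matching function and, when a document joins a cluster, recomputes that cluster’s representative as the normalised centroid given in step 5 of the algorithm. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, documents are assumed to arrive already as term-weight vectors of equal length, each document joins at most one cluster (no overlap), and the threshold h is chosen by the caller.

```java
import java.util.*;

/** Minimal sketch of single-pass clustering of term-weight vectors with a cosine-similarity threshold. */
final class SinglePassClustering {

    /** Cosine similarity of two equal-length vectors (0 if either vector is all zeros). */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            normA += a[k] * a[k];
            normB += b[k] * b[k];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Normalised centroid: (1/n) * sum of D_i / ||D_i|| over the n member vectors. */
    static double[] centroid(List<double[]> members) {
        double[] c = new double[members.get(0).length];
        for (double[] d : members) {
            double norm = 0;
            for (double x : d) {
                norm += x * x;
            }
            norm = Math.sqrt(norm);
            if (norm == 0) {
                continue; // an all-zero document adds nothing to the centroid's direction
            }
            for (int k = 0; k < c.length; k++) {
                c[k] += d[k] / norm;
            }
        }
        for (int k = 0; k < c.length; k++) {
            c[k] /= members.size();
        }
        return c;
    }

    /**
     * Processes the documents once, in order. A document joins the cluster whose representative
     * is most similar to it if that similarity exceeds h; otherwise it starts a new cluster.
     * Returns the cluster number assigned to each document.
     */
    static int[] cluster(double[][] docs, double h) {
        List<List<double[]>> members = new ArrayList<>();
        List<double[]> representatives = new ArrayList<>();
        int[] assignment = new int[docs.length];
        for (int i = 0; i < docs.length; i++) {
            int best = -1;
            double bestSim = Double.NEGATIVE_INFINITY;
            for (int c = 0; c < representatives.size(); c++) {
                double sim = cosine(docs[i], representatives.get(c));
                if (sim > bestSim) {
                    bestSim = sim;
                    best = c;
                }
            }
            if (best >= 0 && bestSim > h) {
                members.get(best).add(docs[i]);
                representatives.set(best, centroid(members.get(best))); // recompute the representative
                assignment[i] = best;
            } else {
                List<double[]> fresh = new ArrayList<>();
                fresh.add(docs[i]);
                members.add(fresh);
                representatives.add(docs[i].clone()); // the document represents its new cluster
                assignment[i] = members.size() - 1;
            }
        }
        return assignment;
    }
}
```

With the four vectors (1,1,0,0), (1,1,0,1), (0,0,1,1), (0,0,2,1) and h = 0.5, the first two documents end up in cluster 0 and the last two in cluster 1. Because assignments are never revisited, the result depends on the order in which documents are presented.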
Clearly, single pass clustering is suitable for incremental clustering of temporal data (or data streams) since, once a document is assigned to a cluster, that assignment is never changed later.

Algorithm
1. Object descriptors (document representatives) are processed serially. The objects (input documents) are described using the vector model. The vector for a document $d_i$ is $(W_{1i}, W_{2i}, \ldots, W_{ki})$, where $W_{ki}$ is the weight (frequency) of term k in document $d_i$.
2. The first object becomes the cluster representative (or centroid) of the first cluster.
3. Each subsequent object is matched against all cluster representatives existing at its processing time. When a new document (object descriptor) $d_i$ ($i > 1$) comes in, calculate its similarity to every cluster C using the cosine similarity between the cluster representative and the document.
4. A given object (document) is assigned to one cluster (or more, if overlap is allowed) according to some condition (threshold value) on the matching function.
5. When an object is assigned to a cluster, the representative for that cluster is recomputed. If $D_1, D_2, \ldots, D_n$ are the documents in the cluster and each $D_i$ is represented by a numerical vector $(d_1, d_2, \ldots, d_t)$, then the centroid C of the cluster is given by
$$C = \frac{1}{n}\sum_{i=1}^{n}\frac{D_i}{\lVert D_i \rVert}, \qquad \lVert D_i \rVert = \sqrt{d_1^2 + d_2^2 + \cdots + d_t^2}$$
6. If an object fails the test (condition), it becomes the cluster representative of a new cluster.

2.4 Procedure
1. 4-5 text files are taken as input to the single pass clustering algorithm. These input files should be represented in term vs. document matrix form (vector space model representation).
2. Pass each input text file (document) serially through the algorithm until all documents are covered.
Input: Collection of objects (documents) to be clustered, in vector space format.
Output: Clusters of the given objects

2.5 Post Lab
After completing this assignment, analyze the performance of the single pass clustering algorithm on different inputs and write your concluding points accordingly. Also, compare single pass clustering with the single link clustering algorithm.

2.6 Viva Questions
1. Define clustering.
2. Discuss different IR models.
3. Explain the cluster hypothesis in short.
4. Define cluster representative/centroid of a cluster.
5. What are the alternatives to the single pass clustering algorithm?

Assignment 3
Implementation of Inverted Index Structure

3.1 Problem Statement
Implement an inverted index/file structure for a set of documents. Consider 3 to 4 text documents.

3.2 Pre Lab
• Concept of file structures in IR systems
• Concept of term indexing

3.3 Theory
3.3.1 File Structure
For a set of ‘attributes’ or ‘features’ A and a set of ‘values’ V for a text document, a record R is a subset of the cartesian product $A \times V$ in which each attribute has one and only one value. Thus R is a set of ordered pairs of the form (an attribute, its value). For example, the record for a document that has been processed by an automatic content analysis algorithm would be
$$R = \{(K_1, x_1), (K_2, x_2), \ldots, (K_m, x_m)\}$$
Records are collected into logical units called files, which enable one to refer to a set of records by name, the file name. The records within a file are often organized according to relationships between the records. This logical organization has become known as a file structure (or data structure).

Figure 3.1: Example of Inverted Index Structure

3.3.2 Indexing
In general, indexing is the technique of mapping identifiers to sets of objects in order to speed up the search for those objects. From the IR perspective, the objects are documents or document representatives.
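The mapping described above, in the inverted-file form this assignment builds (each keyword mapped to its (document no., position) pairs, kept in document-number order), can be sketched in Java as follows, together with the one-pass ‘and’/‘or’ merges on sorted document lists discussed in the inverted file description. This is a minimal sketch: the class and method names are hypothetical, splitting on non-alphanumeric characters is an assumed tokeniser, and in practice stop words would first be removed as in Assignment 1.

```java
import java.util.*;

/** Minimal sketch of an inverted index: keyword -> postings of (document no., word position). */
final class InvertedIndex {

    /** One occurrence of a keyword: the document number and the word position inside it. */
    static final class Posting {
        final int doc;
        final int position;

        Posting(int doc, int position) {
            this.doc = doc;
            this.position = position;
        }

        @Override
        public String toString() {
            return "(" + doc + "," + position + ")";
        }
    }

    // TreeMap keeps the keywords (the directory) in alphabetical order.
    private final Map<String, List<Posting>> index = new TreeMap<>();

    /** Indexes one document. Add documents in increasing docNo order so every postings list stays sorted. */
    void addDocument(int docNo, String text) {
        int position = 0;
        for (String word : text.toLowerCase().split("[^a-z0-9]+")) {
            if (word.isEmpty()) {
                continue;
            }
            List<Posting> postings = index.get(word);
            if (postings == null) {
                postings = new ArrayList<>();
                index.put(word, postings);
            }
            postings.add(new Posting(docNo, position++));
        }
    }

    /** All (document no., position) pairs for a keyword, or an empty list. */
    List<Posting> postings(String keyword) {
        List<Posting> p = index.get(keyword.toLowerCase());
        return p == null ? Collections.<Posting>emptyList() : p;
    }

    /** Sorted, duplicate-free document numbers containing the keyword. */
    List<Integer> documents(String keyword) {
        List<Integer> docs = new ArrayList<>();
        for (Posting p : postings(keyword)) {
            if (docs.isEmpty() || docs.get(docs.size() - 1).intValue() != p.doc) {
                docs.add(p.doc);
            }
        }
        return docs;
    }

    /** 'and': one pass through two sorted document lists. */
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) {
                out.add(x);
                i++;
                j++;
            } else if (x < y) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    /** 'or': one pass through two sorted document lists. */
    static List<Integer> union(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            if (j == b.size() || (i < a.size() && a.get(i) < b.get(j))) {
                out.add(a.get(i++));
            } else if (i == a.size() || b.get(j) < a.get(i)) {
                out.add(b.get(j++));
            } else { // equal document numbers: emit once and advance both lists
                out.add(a.get(i));
                i++;
                j++;
            }
        }
        return out;
    }
}
```

After indexing document 1 “information retrieval system” and document 3 “information system design”, postings("system") holds (1,2) and (3,1), and intersect over the lists for “information” and “system” returns documents 1 and 3. Keeping each list sorted is what makes the single-pass merges possible; the price, as the theory notes, is slower updates.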
Inverted index/file structure
An inverted file is a file structure in which every list contains only one record. Remember that a list is defined with respect to a keyword K, so every K-list contains only one record. This implies that the directory will be such that $n_i = h_i$ for all i, that is, the number of records containing $K_i$ equals the number of $K_i$-lists. So the directory will have an address for each record containing $K_i$. For document retrieval this means that, given a keyword, we can immediately locate the addresses of all the documents containing that keyword.

The definition of inverted files does not require the addresses in the directory to be in any order. However, to facilitate operations such as conjunction (‘and’) and disjunction (‘or’) on any two inverted lists, the addresses are normally kept in record number order. This means that ‘and’ and ‘or’ operations can be performed with one pass through both lists. The penalty is that the inverted file becomes slower to update.

3.4 Procedure
1. 3-4 text files are taken as input to build the inverted index structure.
2. Process each input text file (document) word by word.
3. For each distinct keyword, maintain a data structure containing the keyword and the (document no., position of the keyword in the document) pair for each of its occurrences.
Input: 3-4 text files
Output: Inverted index structure of the input files

3.5 Post Lab
After completing this assignment, analyze the performance of the inverted index structure in query evaluation by a search engine and write your concluding points accordingly. Also discuss the role of the inverted index structure in search engine optimization.

3.6 Viva Questions
1. Define indexing.
2. Which indexing structure is widely used in search engines?
3. Compare different indexing structures.

Assignment 4
Implementation of Feature Extraction in 2D Color Images

4.1 Problem Statement
Implement feature extraction for a 2D color image. Extract any one feature, such as color, texture, or aspect ratio.

4.2 Pre Lab
• Concept of multimedia IR
• Concept of feature extraction

4.3 Theory
4.3.1 Feature Extraction
Transforming the input data into a set of features is called feature extraction. If the features are carefully chosen, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input. Alternatively, feature extraction can be described as a method of capturing the visual content of images for indexing and retrieval. Features of images used in multimedia IR can be of the following types:
1. Visual features (primitive or low-level image features)
These are the most basic features, describing the structure of the image. Examples:
a. Edge
b. Corner
c. Ridge
2. Domain-specific features
These features depict characteristics of the image domain, e.g. fingerprints, human faces, eye retina.
3. General features
E.g. color, texture, shape, height, width, aspect ratio.

4.3.2 Use of feature extraction
• It gives a reduced representation of the original data, so that repetitions can be omitted.
• With carefully chosen features, the relevant information is retained, so the desired task can be performed on the reduced representation instead of the full-size input.
The choice of features to extract should consider the following concerns:
• The features should carry enough information about the image and should not require any domain-specific knowledge for their extraction.
• They should be easy to compute, so that the approach is feasible for a large image collection and rapid retrieval.
• They should relate well to human perceptual characteristics, since users will ultimately judge the suitability of the retrieved images.
Because perception is subjective, no single best representation exists for a given feature. Color is one of the most widely used features in image retrieval.

4.4 Procedure
Process of feature extraction:
1. Any 2D color image is taken as the input file.
2. Scan the input image in a single pass and maintain a count of the number of pixels found at each feature value (color, intensity, texture etc.).
3. An 8-bit image has 256 gray levels/bins (0-255). The extraction process finds, for each pixel (x, y) of the image, the gray level it has (for a color image, each pixel is first converted to a gray level, or a separate count is kept per color channel). This process is applied to the whole image.
4. The final output is 256 gray levels/bins, each holding the number of pixels that have that gray level. These values can be used to generate a histogram, here a graph showing the number of pixels in the image at each intensity value found in that image. For an 8-bit grayscale image there are 256 possible intensities, so the histogram graphically displays 256 numbers showing the distribution of pixels among those gray levels.

Figure 4.1: Histogram for a 2D Color Image

Input: 2D color image
Output: Extracted features of the input image

4.5 Post Lab
After completing this assignment, analyze the use of feature extraction in multimedia content retrieval. Discuss the role of feature extraction in retrieving relevant content (images, videos etc.). Write your concluding points accordingly.

4.6 Viva Questions
1. Define multimedia IR.
2. Which features are mostly extracted by search engines?
3. Compare text retrieval vs. multimedia retrieval.
4. Define feature extraction. How is it useful in reducing the storage space of multimedia documents?

Assignment 5
Case Study

5.1 Problem Statement
Study any recent technology/topic that contributes to information retrieval systems.

5.2 Theory
Explain the presentation topic, covering each point thoroughly. Use examples and diagrams to make the explanation more effective.

5.3 Post Lab
Analyze and compare the topic of study and write the concluding points accordingly.

References
[1] C. J. van Rijsbergen, “Information Retrieval” (e-book available at www.dcs.gla.ac.uk).
[2] R. Baeza-Yates & B. Ribeiro-Neto, “Modern Information Retrieval”, Pearson Education, ISBN 81-297-0274-6.
[3] M. F. Porter, “An algorithm for suffix stripping”, originally published in July 1980.
[4] Bob Boiko, “Content Management Bible”, 2nd Edition, Wiley, ISBN 978-0-7645-7371-2, e-book available.