Computer Lab Practice - II (Distributed Systems) Laboratory Manual

Laboratory Manual
Computer Lab Practice - II
(Distributed Systems)
Final Year - Information Technology
Teaching Scheme
Theory: --
Practical: 2 Hrs/Week
Examination Scheme
Term Work: 50 Marks
Practical: 50 Marks
Oral: --
Prepared By
Prof. Dinesh A. Zende
Department of Information Technology
Vidya Pratishthan’s College of Engineering
Baramati – 413133, Dist- Pune (M.S.)
INDIA
December 2012
Table of Contents
1 Implementation of Chat application using socket programming in Java
1.1 Problem Statement
1.2 Pre Lab
1.3 Hardware and Software Requirement
1.4 Theory
1.4.1 Using UDP Socket
1.4.2 Using TCP Socket
1.5 Procedure
1.6 Post Lab
1.7 Viva Questions
2 Implementation of Remote Method Invocation using Java RMI
2.1 Problem Statement
2.2 Pre Lab
2.3 Hardware and Software Requirement
2.4 Theory
2.5 Procedure
2.6 Post Lab
2.7 Viva Questions
3 Implementation of Client-Server architecture using Socket Programming in Linux
3.1 Problem Statement
3.2 Pre Lab
3.3 Hardware and Software Requirement
3.4 Theory
3.5 Procedure
3.6 Post Lab
3.7 Viva Questions
4 Case Study on Cloud Computing
4.1 Problem Statement
4.2 Pre Lab
4.3 Theory
References
List of Tables
List of Figures
1.1 Client GUI
2.1 General RMI Architecture
2.2 RMI Invocation
3.1 File Server Architecture
3.2 Steps to establish socket communication
4.1 Architecture of cloud computing
Assignment 1
Implementation of Chat application using socket programming in Java
1.1 Problem Statement
Implement a simple chat system using socket programming (TCP sockets).
1. The chat system includes two types of components:
(a) A chat room, and
(b) The client.
2. The system supports a maximum of 3 clients; each can enter or leave the system at any time. The client GUI can be designed as shown in Figure 1.1.
3. The chat room is a long-lived 'server' component, and there is no GUI on the server side.
4. All messages are broadcast to all clients connected to the chat room.
Figure 1.1: Client GUI
1.2 Pre Lab
• Concepts of Sockets, Ports, Transport Level Protocols
• Knowledge of Computer Networks
• Knowledge of Programming in Core Java.
• Knowledge of Network Programming in Java.
1.3 Hardware and Software Requirement
1. Hardware Requirement
• Computer with 1 GHz Processor, 256 MB RAM, 40 GB HDD with network
support.
2. Software Requirement
• JDK 1.6 or Higher
• (optional) Netbeans 6.9 IDE
1.4 Theory
• Inter-process communication (IPC) can be implemented with socket programming in two ways:
1. Using UDP Sockets
2. Using TCP Sockets
• In this assignment you will implement a chat server. A client process sends a string (message) to the server, and in response the server process sends the same string to all connected client processes, i.e. it broadcasts the message.
Lab Manual - Computer Lab Practice - II
2
VPCOE, Baramati
1.4.1 Using UDP Socket
public class DatagramPacket
DatagramPackets can be created with one of the following four constructors (further variants that take a SocketAddress also exist):
public DatagramPacket(byte[ ] ibuf, int size);
public DatagramPacket(byte[ ] ibuf, int offset, int size);
public DatagramPacket(byte[ ] ibuf, int size, InetAddress ipaddr, int port);
public DatagramPacket(byte[ ] ibuf, int offset, int size, InetAddress ipaddr, int port);
The first two create a packet for receiving data; the last two create a packet to be sent to the given address and port.
Public Instance Methods
• public InetAddress getAddress();
Returns the address of the machine to which the datagram is being sent, or from which it was received.
• public byte[] getData();
Returns the byte array of data contained in the datagram. Mostly used to retrieve data from the datagram after it has been received.
• public int getLength();
Returns the length of the valid data contained in the byte array returned by the getData() method. This typically does not equal the length of the whole byte array.
• public int getPort();
Returns the port number on the remote host to which the datagram is being sent, or from which it was received.
Passed To:
DatagramSocket.receive();
DatagramSocket.send();
DatagramSocketImpl.receive();
DatagramSocketImpl.send();
MulticastSocket.send();
public class DatagramSocket
• This class defines a socket that can receive and send unreliable datagram packets over the network using the UDP protocol.
• A datagram socket does not implement any kind of stream-based communication protocol, and no connection is established between the sender and the receiver. Datagram packets are called "unreliable" because the protocol makes no attempt to ensure that they arrive or to resend them if they do not.
Public Constructors
• public DatagramSocket();
• public DatagramSocket(int port);
• public DatagramSocket(int port,InetAddress ipaddr);
All above constructors throw exception SocketException;
Public Instance Methods
• public void close();
• public InetAddress getLocalAddress();
• public int getLocalPort();
• public int getSoTimeout() throws SocketException;
• public void receive(DatagramPacket p) throws IOException;
• public void send(DatagramPacket p) throws IOException;
• public void setSoTimeout(int timeout) throws SocketException;
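The two classes above can be exercised together on a single machine. The following is a minimal sketch, not a complete chat program: it assumes Java 7 or later (for try-with-resources and StandardCharsets), and the class name UdpLoopbackDemo and the method sendAndReceive are illustrative names that do not come from the manual.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: sends one datagram over the loopback interface.
public class UdpLoopbackDemo {

    // Sends the message to a receiver socket on this machine and returns
    // the text that the receiver extracted from the datagram.
    public static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // stop waiting after 2 seconds

            // Sending side: the packet carries data, address and port.
            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(out, out.length, loopback, receiver.getLocalPort()));

            // Receiving side: the packet only supplies an empty buffer.
            byte[] buffer = new byte[1024];
            DatagramPacket incoming = new DatagramPacket(buffer, buffer.length);
            receiver.receive(incoming);

            // getLength() is the number of valid bytes, not the buffer size.
            return new String(incoming.getData(), incoming.getOffset(),
                    incoming.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Received: " + sendAndReceive("hello chat room"));
    }
}
```

Because UDP gives no delivery guarantee, the receive timeout is what keeps the sketch from blocking forever if the datagram is lost.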
1.4.2 Using TCP Socket
The TCP implementation makes use of the following classes:
public class ServerSocket
Public Constructors
• public ServerSocket (int port);
Public Methods
• public Socket accept();
• public void close();
• public InetAddress getInetAddress();
• public int getLocalPort();
public class Socket
Public Constructors
• public Socket(String host, int port);
• public Socket(InetAddress aHost, int port);
Methods
• public InetAddress getInetAddress();
• public InputStream getInputStream();
• public OutputStream getOutputStream();
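As a rough illustration of how ServerSocket and Socket fit together, the sketch below runs a one-shot echo server and a client in the same program. It assumes Java 8 or later (a lambda is used for the server thread); TcpEchoDemo, echoOnce, reader and writer are hypothetical names chosen for this example only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one server thread echoes one line to one client.
public class TcpEchoDemo {

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // The second argument enables auto-flush on println.
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Starts a server on a free port, sends one line to it and returns the echo.
    public static String echoOnce(String line) {
        try (ServerSocket server = new ServerSocket(0)) { // 0 = any free port
            Thread serverThread = new Thread(() -> {
                // accept() blocks until a client connects and returns a new Socket.
                try (Socket peer = server.accept()) {
                    writer(peer).println(reader(peer).readLine());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.setDaemon(true);
            serverThread.start();

            // Client side: connect, write one line, read the reply.
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(),
                    server.getLocalPort())) {
                writer(client).println(line);
                return reader(client).readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Echo: " + echoOnce("hello"));
    }
}
```

Unlike the UDP case, the bytes travel over an established connection, so the client can rely on the reply arriving in order or on an exception being raised.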
1.5 Procedure
Steps for Implementation of UDP Socket (Server)
1. Create a DatagramSocket and bind to specified port.
2. Create an instance of DatagramPacket to be read from port.
3. Receive DatagramPacket from port using receive method.
4. Display received packet data.
Steps for Implementation of UDP Socket (Client)
1. Create a DatagramSocket.
2. Create a DatagramPacket with the specification of the remote host, the port number and the data to be sent.
3. Send this packet using the send method.
Steps for Implementation of TCP Socket (Server)
1. Create a ServerSocket with the specification of the port number.
2. The ServerSocket starts listening on that port as soon as it is created; Java has no separate listen method.
3. When there is a request for a connection, accept it using the accept method of the ServerSocket, which returns a Socket for that client.
4. Read from and write to the socket using DataInputStream and DataOutputStream respectively.
Steps for Implementation of TCP Socket (Client)
1. Create a Socket with the specification of the host machine (server) and the port number.
2. Create a DataInputStream and a DataOutputStream for reading from and writing to the Socket.
3. Write data with the writeUTF() method, which encodes the string in modified UTF-8, and read it on the other side with readUTF().
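The TCP steps above can be combined into a small broadcasting chat room. This is only a sketch under stated assumptions: Java 8 or later, one thread per client, and no GUI. ChatRoomSketch and its methods are hypothetical names; the limit of 3 clients is taken from the problem statement.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of the chat room (server) component.
public class ChatRoomSketch {
    static final int MAX_CLIENTS = 3; // limit taken from the problem statement

    private final ServerSocket serverSocket;
    private final List<DataOutputStream> clients = new CopyOnWriteArrayList<>();

    public ChatRoomSketch(int port) throws IOException {
        serverSocket = new ServerSocket(port); // port 0 = any free port
    }

    public int port() { return serverSocket.getLocalPort(); }

    public void close() throws IOException { serverSocket.close(); }

    // Accepts clients on a background thread until the server socket is closed.
    public void start() {
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket socket = serverSocket.accept();
                    if (clients.size() >= MAX_CLIENTS) {
                        socket.close(); // room is full
                        continue;
                    }
                    DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                    clients.add(out);
                    Thread worker = new Thread(() -> serve(socket, out));
                    worker.setDaemon(true);
                    worker.start();
                }
            } catch (IOException e) {
                // accept() fails once close() is called; the loop ends here
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    // Reads messages from one client and relays each one to every client.
    private void serve(Socket socket, DataOutputStream out) {
        try (Socket s = socket; DataInputStream in = new DataInputStream(s.getInputStream())) {
            while (true) {
                broadcast(in.readUTF());
            }
        } catch (IOException e) {
            // EOFException or a connection reset means the client has left
        } finally {
            clients.remove(out);
        }
    }

    private void broadcast(String message) {
        for (DataOutputStream client : clients) {
            try {
                synchronized (client) { // one writer at a time per client
                    client.writeUTF(message);
                    client.flush();
                }
            } catch (IOException e) {
                clients.remove(client); // drop clients that cannot be reached
            }
        }
    }

    // Two clients join; the message sent by the second one reaches the first.
    public static String demo() {
        try {
            ChatRoomSketch room = new ChatRoomSketch(0);
            room.start();
            try (Socket a = new Socket(InetAddress.getLoopbackAddress(), room.port());
                 Socket b = new Socket(InetAddress.getLoopbackAddress(), room.port())) {
                a.setSoTimeout(5000); // do not wait forever for the broadcast
                new DataOutputStream(b.getOutputStream()).writeUTF("hello from B");
                return new DataInputStream(a.getInputStream()).readUTF();
            } finally {
                room.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Client A received: " + demo());
    }
}
```

A thread per client keeps the sketch short; with only three clients the cost is negligible, and the copy-on-write list lets the broadcast loop iterate safely while clients join or leave.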
1.6 Post Lab
• Implement an application for a chat server and multiple clients using both TCP and UDP. Compare the usage of TCP and UDP sockets with respect to this application. Which is more suitable?
• From this assignment you learn how to build client and server applications that communicate using sockets, and how reliable and unreliable communication take place between them.
1.7 Viva Questions
1. What is a distributed system?
2. Give a few examples of distributed systems.
3. What is the difference between a networked system and a distributed system?
4. Name a few characteristics of distributed systems.
5. Name some case studies of distributed systems that you have studied.
6. If you are asked to design a distributed system for your client, which design issues will you consider?
7. Explain the TCP and UDP protocols.
8. What are the different challenges faced by distributed systems?
9. Name the popular system models in distributed systems.
10. Explain the difference between message-oriented communication and stream-oriented communication.
11. What are layered protocols?
Assignment 2
Implementation of Remote Method Invocation using Java RMI
2.1 Problem Statement
Write a program to implement a simple student database application using RMI. The remote client consists of a GUI for performing different database operations (for example insert, delete, update) and for retrieving data through RMI.
2.2 Pre Lab
• Concepts of Sockets, Ports, Transport Level Protocols
• Knowledge of TCP and UDP Socket Programming
• Knowledge of Programming in Core Java.
• Knowledge of Remote Method Invocation.
2.3 Hardware and Software Requirement
1. Hardware Requirement
• Computer with 1 GHz Processor, 256 MB RAM, 40 GB HDD with network
support.
2. Software Requirement
• JDK 1.6 or Higher
• (optional) Netbeans 6.9 IDE
2.4 Theory
• The server must first bind its name to the registry.
• The client looks up the server name in the registry to obtain a remote reference.
• When the client invokes a remote method, the call is first forwarded to the stub.
• The stub is responsible for sending the remote call over to the server-side skeleton: it opens a socket to the remote server, marshals the object parameters and forwards the data stream to the skeleton.
• The skeleton contains a method that receives the remote call, unmarshals the parameters and invokes the actual remote object implementation.
• The skeleton then serializes the result and sends it back to the stub, which returns it to the client.
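The marshaling and unmarshaling performed by the stub and the skeleton can be pictured with plain Java object streams. This is a simplified sketch and not the wire protocol that RMI actually uses; MarshalSketch, marshal and unmarshalAndInvoke are hypothetical names, and the only "remote" method it understands is an integer sum.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration of what a stub and a skeleton do with parameters.
public class MarshalSketch {

    // "Stub" side: flatten the method name and the arguments into bytes.
    public static byte[] marshal(String method, int a, int b) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeUTF(method);
                out.writeInt(a);
                out.writeInt(b);
            } // closing the stream flushes everything into 'bytes'
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Skeleton" side: rebuild the call from the bytes and dispatch it.
    public static int unmarshalAndInvoke(byte[] request) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(request))) {
            String method = in.readUTF();
            int a = in.readInt();
            int b = in.readInt();
            if ("sum".equals(method)) {
                return a + b; // stands in for the real remote object
            }
            throw new IllegalArgumentException("Unknown method: " + method);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] request = marshal("sum", 1, 2);
        System.out.println("Result: " + unmarshalAndInvoke(request));
    }
}
```

In a real RMI call the byte array would travel over a socket between the two halves, and the result would be marshaled back in the same way.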
2.5 Procedure
Steps for Developing an RMI System
1. Define the remote interface.
2. Develop the remote object by implementing the remote interface.
3. Develop the client program.
4. Compile the Java source files.
5. Generate the client stubs and server skeletons.
6. Start the RMI registry.
7. Start the remote server objects.
8. Run the client.
Figure 2.1: General RMI Architecture
Figure 2.2: RMI Invocation
• Step 1: Define the remote interface
To create an RMI application, the first step is to define a remote interface between the client and server objects.
/* SampleServer.java */
import java.rmi.*;
public interface SampleServer extends Remote
{
public int sum(int a,int b) throws RemoteException;
}
• Step 2: Develop the remote object by implementing the remote interface.
– The server is a simple unicast remote server.
– Create the server by extending java.rmi.server.UnicastRemoteObject.
– The server can use the RMISecurityManager to protect its resources while engaging in remote communication.
/* SampleServerImpl.java */
import java.rmi.*;
import java.rmi.server.*;
import java.rmi.registry.*;
public class SampleServerImpl extends UnicastRemoteObject
        implements SampleServer
{
    // The constructor must declare RemoteException because the
    // superclass constructor exports the object and may throw it.
    SampleServerImpl() throws RemoteException
    {
        super();
    }
    // Implementation of the remote method declared in SampleServer
    public int sum(int a,int b) throws RemoteException
    {
        return a + b;
    }
}
– The server must bind its name to the registry so that the client can look it up.
– Use the java.rmi.Naming class to bind the server name to the registry. In this example the name is SAMPLE-SERVER.
– In the main method of your server object, the RMI security manager can be created and installed. The listing below omits this step and assumes that all classes are available locally.
//RMIServer.java
import java.rmi.Naming;
import java.rmi.RemoteException;
public class RMIServer
{
    public static void main(String args[])
    {
        try
        {
            //create a local instance of the object
            SampleServerImpl server = new SampleServerImpl();
            //put the local instance in the registry; the name must match
            //the name used by the client exactly (no trailing space)
            Naming.rebind("SAMPLE-SERVER", server);
            System.out.println("Server waiting.....");
        }
        catch (java.net.MalformedURLException me)
        {
            System.out.println("Malformed URL: " + me.toString());
        }
        catch (RemoteException re)
        {
            System.out.println("Remote exception: " + re.toString());
        }
    }
}
• Step 3: Develop the client program
– In order for the client object to invoke methods on the server, it must first look up the name of the server in the registry.
– Use the java.rmi.Naming class to look up the server name.
– The server name is specified as a URL in the form rmi://host:port/name
– The default RMI port is 1099.
– The name specified in the URL must exactly match the name that the server has bound to the registry.
– In this example, the name is SAMPLE-SERVER.
– The remote method is invoked on the remote reference (remoteObject) with ordinary method-call syntax, for example remoteObject.sum(1,2).
//SampleClient.java
import java.rmi.*;
import java.rmi.server.*;
public class SampleClient
{
    public static void main(String[] args)
    {
        try
        {
            //get the remote object from the registry
            String url = "//localhost/SAMPLE-SERVER";
            SampleServer remoteObject = (SampleServer)Naming.lookup(url);
            System.out.println("Got remote object");
            //invoke the remote method like a local one
            System.out.println(" 1 + 2 = " + remoteObject.sum(1,2) );
        }
        catch (RemoteException exc) {
            System.out.println("Error in lookup: " + exc.toString()); }
        catch (java.net.MalformedURLException exc) {
            System.out.println("Malformed URL: " + exc.toString()); }
        catch (java.rmi.NotBoundException exc) {
            System.out.println("NotBound: " + exc.toString()); }
    }
}
• Step 4 and 5: Compile the Java source files and generate the client stubs and server skeletons
– Once the interface and its implementation are compiled, you need to generate the stub and skeleton code. The RMI system provides an RMI compiler (rmic) that takes the compiled implementation class and produces the stub code itself.
– From JDK 5 onwards stubs are generated dynamically at run time, so the rmic step is needed only for older clients; rmic was removed from the JDK in version 15.
Follow these steps to compile and run the RMI application:
c:\jdk1.4\RMI> set PATH=%PATH%;c:\jdk1.4\bin
c:\jdk1.4\RMI> set CLASSPATH=.
c:\jdk1.4\RMI> javac SampleServer.java
c:\jdk1.4\RMI> javac SampleServerImpl.java
c:\jdk1.4\RMI> javac RMIServer.java
c:\jdk1.4\RMI> javac SampleClient.java
c:\jdk1.4\RMI> rmic SampleServerImpl
c:\jdk1.4\RMI> start rmiregistry
• RMI applications must register their remote objects with the registry, and the registry must be started manually by running rmiregistry.
• The rmiregistry uses port 1099 by default. You can also bind rmiregistry to a different port by giving the new port number as: rmiregistry <new port>
• On Windows, type the following at the command line: start rmiregistry
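As an alternative to starting rmiregistry by hand, a registry can also be created inside the server process with java.rmi.registry.LocateRegistry. The sketch below is an assumption-laden illustration, not the procedure prescribed above: it runs server and client in one JVM, uses a hypothetical interface Adder that mirrors SampleServer, and sets the java.rmi.server.hostname property to 127.0.0.1 so that it works on a single machine.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical demo: registry, server object and client in one process.
public class InProcessRegistryDemo {

    // Same shape as the SampleServer interface of this assignment.
    public interface Adder extends Remote {
        int sum(int a, int b) throws RemoteException;
    }

    public static class AdderImpl implements Adder {
        public int sum(int a, int b) {
            return a + b;
        }
    }

    // Creates a registry on the given port, binds an Adder and calls it.
    public static int callSum(int port, int a, int b) {
        // Assumption: everything runs on one machine, so stubs should
        // point at the loopback address.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        try {
            Registry registry = LocateRegistry.createRegistry(port);
            AdderImpl impl = new AdderImpl();
            // Port 0 lets RMI choose a free port for the exported object.
            Adder stub = (Adder) UnicastRemoteObject.exportObject(impl, 0);
            try {
                registry.rebind("SAMPLE-SERVER", stub);

                // Client side: locate the registry and look the name up.
                Registry clientView = LocateRegistry.getRegistry("127.0.0.1", port);
                Adder remote = (Adder) clientView.lookup("SAMPLE-SERVER");
                return remote.sum(a, b);
            } finally {
                // Unexport so that the JVM can exit after the demo.
                UnicastRemoteObject.unexportObject(impl, true);
                UnicastRemoteObject.unexportObject(registry, true);
            }
        } catch (RemoteException | NotBoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("1 + 2 = " + callSum(1099, 1, 2));
    }
}
```

Exporting the object explicitly with exportObject is used here instead of extending UnicastRemoteObject; both approaches yield a stub that can be bound in the registry.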
Advancements:
Create an RMI application for following requirements
1. Unit Converter Application
2. Currency Converter Application
3. Simple Calculator
4. Time Server
5. Echo Server
6. String Operations
2.6 Post Lab
You have to develop an RMI server on which the database resides. The RMI client will have a GUI with functions such as insert, delete and update.
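One possible shape for the remote interface of the student database is sketched below. Everything in it is an assumption made for illustration: the names StudentService, Student and InMemoryStudentService are hypothetical, the record has only a roll number and a name, and a ConcurrentHashMap stands in for the real database so that the sketch stays self-contained.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a remote interface for the student database.
public class StudentServiceSketch {

    // Objects passed through RMI by value must be Serializable.
    public static class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int rollNo;
        public final String name;

        public Student(int rollNo, String name) {
            this.rollNo = rollNo;
            this.name = name;
        }
    }

    // Every remote method must declare RemoteException.
    public interface StudentService extends Remote {
        boolean insert(Student s) throws RemoteException;
        boolean update(Student s) throws RemoteException;
        boolean delete(int rollNo) throws RemoteException;
        Student find(int rollNo) throws RemoteException;
    }

    // In-memory stand-in for the database-backed server implementation.
    public static class InMemoryStudentService implements StudentService {
        private final Map<Integer, Student> table = new ConcurrentHashMap<>();

        // Returns false if the roll number is already present.
        public boolean insert(Student s) {
            return table.putIfAbsent(s.rollNo, s) == null;
        }

        // Returns false if there is no record to update.
        public boolean update(Student s) {
            return table.replace(s.rollNo, s) != null;
        }

        public boolean delete(int rollNo) {
            return table.remove(rollNo) != null;
        }

        // Returns null if the roll number is unknown.
        public Student find(int rollNo) {
            return table.get(rollNo);
        }
    }
}
```

The implementation can be exported and bound in the registry in the same way as SampleServerImpl; the GUI client then only needs the StudentService interface and the Student class.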
2.7 Viva Questions
1. What is RPC and LRPC?
2. What is the advantage of RPC2 over RPC?
3. How do we provide security to RMI classes?
4. What are layered protocols?
5. What is Remote Method Invocation?
6. What is a Distributed File System (DFS)?
7. What do you mean by automounting?
8. What are the advances in Coda compared to AFS?
9. Which is the most important feature of Coda?
10. What are stubs and skeletons?
11. How does communication take place in NFS?
12. Explain the naming concept in NFS.
13. How does synchronization take place in NFS?
14. How do you implement file locking in NFS?
15. What are Vice and Virtue in relation to Coda?
Assignment 3
Implementation of Client-Server architecture using Socket Programming in Linux
3.1 Problem Statement
Imagine a client-server architecture (as shown in Figure 3.1) in which a user stores a file on a server. The main server splits the file into two or more fragments and stores each fragment on a separate storage server. When the client retrieves the file from the main server, the main server retrieves the fragments from the storage servers and presents them to the user as one file.
Figure 3.1: File Server Architecture
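The splitting and reassembly performed by the main server is independent of the programming language. The sketch below shows the idea in Java, to match the earlier listings in this manual, although the assignment itself is to be written in C under Linux. FragmentSketch, split and join are hypothetical names, and the fragments are kept in memory instead of being sent to storage servers.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the main server's fragment logic.
public class FragmentSketch {

    // Splits the file contents into the requested number of fragments.
    // The last fragments may be shorter (or empty) if the length does
    // not divide evenly.
    public static List<byte[]> split(byte[] file, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<byte[]> fragments = new ArrayList<>();
        int chunk = (file.length + parts - 1) / parts; // ceiling division
        for (int i = 0; i < parts; i++) {
            int from = Math.min(i * chunk, file.length);
            int to = Math.min(from + chunk, file.length);
            fragments.add(Arrays.copyOfRange(file, from, to));
        }
        return fragments;
    }

    // Concatenates the fragments in order to rebuild the original file.
    public static byte[] join(List<byte[]> fragments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] fragment : fragments) {
            out.write(fragment, 0, fragment.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "hello world".getBytes();
        List<byte[]> fragments = split(original, 2);
        System.out.println("Fragments: " + fragments.size());
        System.out.println("Rebuilt:   " + new String(join(fragments)));
    }
}
```

In the real assignment each fragment would be written to a different storage server over its own socket, and the main server would need to remember the order of the fragments in order to join them correctly.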
3.2 Pre Lab
• Concepts of Sockets, Ports, Transport Level Protocols
• Concepts of Computer Network
• Knowledge of Programming in C under Linux.
3.3 Hardware and Software Requirement
1. Hardware Requirement
• Computer with 1 GHz Processor, 256 MB RAM, 40 GB HDD with network
support.
2. Software Requirement
• Operating System - Linux
• (optional) GEdit or any other Editor
3.4 Theory
In this assignment you will implement a client-server architecture using sockets. A socket is a communication mechanism that allows client/server systems to be developed either locally, on a single machine, or across a network. The client and the main server communicate using sockets, and so do the main server and the storage servers.
3.5 Procedure
1. The server creates a socket by calling the socket system call. The socket is a resource assigned to the server process and cannot be shared with another process.
#include <sys/types.h>
#include <sys/socket.h>
int socket(int family, int type, int protocol);
If successful, returns a socket descriptor, otherwise -1.
2. The socket is named using bind.
int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addr_len);
If successful, returns 0, otherwise -1.
3. To accept incoming connections on a socket, a server program must create a queue to store pending requests. The listen system call creates the queue for incoming connections.
int listen(int sockfd, int backlog);
If successful, returns 0, otherwise -1.
4. The server accepts incoming requests by calling accept. When the server calls accept, a new socket is created that is distinct from the named socket and is used for communication with the client.
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
If successful, returns the descriptor of the new socket, otherwise -1.
5. The client creates a socket using the socket system call and sends a connection request to the server through the connect system call.
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
If successful, returns 0, otherwise -1.
6. Once the connection is established, further communication is done using read and write.
ssize_t read(int sockfd, void *buf, size_t len);
ssize_t write(int sockfd, const void *buf, size_t len);
7. Finally, the client and the server call close to close the connection.
int close(int sockfd);
Figure 3.2: Steps to establish socket communication
Simple network client example:
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdlib.h>
int main()
{
    int sockfd;
    socklen_t len;
    struct sockaddr_in address;
    int result;
    char ch = 'A';
    //Create the socket and fill in the server address
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = inet_addr("127.0.0.1");
    address.sin_port = htons(1234); //port must be in network byte order
    len = sizeof(address);
    //Connect our socket to the server socket
    result = connect(sockfd, (struct sockaddr *) &address, len);
    if(result == -1)
    { perror("oops:client1"); exit(1); }
    //Read and write via sockfd
    write(sockfd, &ch, 1);
    read(sockfd, &ch, 1);
    printf("\n Server says : %c\n", ch);
    close(sockfd);
    exit(0);
}
Simple network server example:
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdlib.h>
int main()
{
    int server_sockfd, client_sockfd;
    socklen_t server_len, client_len;
    struct sockaddr_in server_address;
    struct sockaddr_in client_address;
    //Create and name the socket
    server_sockfd = socket(AF_INET, SOCK_STREAM, 0);
    server_address.sin_family = AF_INET;
    server_address.sin_addr.s_addr = inet_addr("127.0.0.1");
    server_address.sin_port = htons(1234); //same port as the client
    server_len = sizeof(server_address);
    bind(server_sockfd, (struct sockaddr *) &server_address, server_len);
    //Create a connection queue and wait for the clients
    listen(server_sockfd, 5);
    while(1)
    {
        char ch;
        printf("server waiting \n");
        //Accept a connection
        client_len = sizeof(client_address);
        client_sockfd = accept(server_sockfd, (struct sockaddr *) &client_address, &client_len);
        //Read from and write to the client on client_sockfd:
        //the server replies with the next character
        read(client_sockfd, &ch, 1);
        ch++;
        write(client_sockfd, &ch, 1);
        close(client_sockfd);
    }
}
Compiling and Running server and client programs
$ cc -o Serverapp server2.c
$ cc -o Clientapp client2.c
$ ./Serverapp &
$ ./Clientapp
3.6 Post Lab
From this assignment you can study how to write a socket program in C under Linux.
3.7 Viva Questions
1. Explain the TCP and UDP protocols.
2. Explain the difference between TCP and UDP.
3. Which system calls are used in the server-side program?
4. Which system calls are used in the client-side program?
5. What does the accept system call return?
6. Explain the different socket address structures.
7. List the different address families used in socket programming.
8. Explain the parameters of the socket system call.
Assignment 4
Case Study on Cloud Computing
4.1 Problem Statement
Perform a case study on cloud computing covering its definition, benefits and drawbacks, and all the service models: Process as a Service, Platform as a Service, Information as a Service, Integration as a Service, Security as a Service, Storage as a Service, Governance or Management as a Service, Testing as a Service (TaaS) and Infrastructure as a Service.
4.2 Pre Lab
• Knowledge of Computer Networks
4.3 Theory
Definition: Cloud computing is a technology that uses the internet and central remote servers to maintain data and applications.
• Cloud computing allows consumers and businesses to use applications without installation and to access their personal files from any computer with internet access.
• This technology allows for much more efficient computing by centralizing storage, memory, processing and bandwidth.
• Examples: Yahoo email, Gmail and Hotmail.
• You do not need software or a server of your own to use them. All a consumer needs is an internet connection to start sending emails. The server and the email management software are on the cloud (internet) and are managed entirely by the cloud service provider (Yahoo, Google, etc.).
Characteristics of Cloud computing
1. On-demand self-service: individuals can set themselves up without needing anyone’s
help;
2. Ubiquitous network access: available through standard Internet-enabled devices;
3. Location independent resource pooling: processing and storage demands are balanced across a common infrastructure with no particular resource assigned to any
individual user;
4. Rapid elasticity: consumers can increase or decrease capacity at will.
5. Pay per use: consumers are charged fees based on their usage of a combination of
computing power, bandwidth use and/or storage
Architecture
Figure 4.1: Architecture of cloud computing
Advantages of cloud computing
1. Reduced Cost
Cloud technology is paid incrementally, saving organizations money.
2. Increased Storage
Organizations can store more data than on private computer systems.
3. Highly Automated
No longer do IT personnel need to worry about keeping software up to date.
4. Flexibility
Cloud computing offers much more flexibility than past computing methods.
5. More Mobility
Employees can access information wherever they are, rather than having to remain at their desks.
6. Allows IT to Shift Focus
No longer having to worry about constant server updates and other computing issues, government organizations will be free to concentrate on innovation.
Disadvantages of Cloud Computing
1. Security and Privacy
The biggest concerns about cloud computing are security and privacy. Users might not be comfortable handing over their data to a third party. This is an even greater concern for companies that wish to keep their sensitive information on cloud servers. While most service vendors ensure that their servers are kept free from viruses and malware, it is still a concern, considering that a number of users from around the world are accessing the server. Privacy is another issue with cloud servers. Ensuring that a client's data is not accessed by any unauthorized user is of great importance for any cloud service. To make their servers more secure, cloud service vendors have developed password-protected accounts, security servers through which all transferred data must pass, and data encryption techniques. After all, the success of a cloud service depends on its reputation, and any sign of a security breach would result in a loss of clients and business.
2. Dependency (loss of control)
(a) Quality problems with the CSP (Cloud Service Provider): the customer has no influence on maintenance levels and fix frequency when using cloud services from a CSP.
(b) Little or no insight into the CSP's contingency procedures, especially backup, restore and disaster recovery.
(c) Measurement of resource usage and end-user activities lies in the hands of the CSP.
3. Cost
Higher costs. While in the long run, cloud hosting is a lot cheaper than traditional
technologies, the fact that it’s currently new and has to be researched and improved
actually makes it more expensive. Data centers have to buy or develop the software
that’ll run the cloud, rewire the machines and fix unforeseen problems (which are
always there). This makes their initial cloud offers more expensive. Like in all other
industries, the first customers pay a higher price and have to deal with more issues
than those who switch later (although it would be very hard to create and improve
new technologies without these initial adopters).
4. Decreased flexibility
This is only a temporary problem (as the others on this list), but current technologies
are still in the testing stages, so they don’t really offer the flexibility they promise.
Of course, that’ll change in the future, but some of the current users might have
to deal with the facts that their cloud server is difficult or impossible to upgrade
without losing some data, for example.
5. Knowledge and integration
Knowledge:
More and deeper knowledge is required for implementing and managing SLA contracts with CSPs. Since all knowledge about the working of the cloud (e.g. hardware, software, virtualization, deployment) is concentrated at the CSP, it is hard to get a grip on the CSP.
Integration:
Integration with equipment hosted in other data centers is difficult to achieve. Peripherals such as (bulk) printers and local security IT equipment (e.g. access systems) are difficult to integrate, as are (personal) USB devices, smartphones, groupware and email systems.
References
[1] Dr. P. K. Sinha, "Distributed Operating Systems Concepts and Design", Prentice Hall India (PHI).
[2] Andrew S. Tanenbaum and Maarten van Steen, "Distributed Systems - Principles and Paradigms", Prentice Hall India (PHI).
[3] Elliotte Rusty Harold, "Java Network Programming", Third Edition, O'Reilly.
[4] Herbert Schildt, "Java - The Complete Reference", TMH.
[5] Neil Matthew et al., "Beginning Linux Programming", Third Edition, Wrox Publications.
[6] W. Richard Stevens, "UNIX Network Programming", Prentice Hall India (PHI).
[7] David S. Linthicum, "Cloud Computing and SOA Convergence in your Enterprise - A Step by Step Guide".
Laboratory Manual
Computer Laboratory Practice-II
(Information Retrieval)
Final Year - Information Technology
Teaching Scheme
Theory: --
Practical: 02 Hrs/Week/Batch
Examination Scheme
Term Work: 50 Marks
Practical: 50 Marks
Oral: --
Prepared By
Prof. Shah Sahil K.
Department of Information Technology
Vidya Pratishthan’s College of Engineering
Baramati – 413133, Dist- Pune (M.S.)
INDIA
December 2013
Table of Contents
1 Implementation of Conflation Algorithm
1.1 Problem Statement
1.2 Pre Lab
1.3 Hardware and Software Requirement
1.4 Theory
1.4.1 Conflation Algorithm
1.4.2 Luhn’s idea
1.4.3 M.F.Porter’s Algorithm
1.5 Procedure
1.6 Post Lab
1.7 Viva Questions
2 Implementation of Single Pass Clustering Algorithm
2.1 Problem Statement
2.2 Pre Lab
2.3 Theory
2.3.1 Clustering
2.3.2 Single Pass Clustering
2.4 Procedure
2.5 Post Lab
2.6 Viva Questions
3 Implementation of Inverted Index Structure
3.1 Problem Statement
3.2 Pre Lab
3.3 Theory
3.3.1 File Structure
3.3.2 Indexing
3.4 Procedure
3.5 Post Lab
3.6 Viva Questions
4 Implementation of Feature Extraction in 2D Color Images
4.1 Problem Statement
4.2 Pre Lab
4.3 Theory
4.3.1 Feature Extraction
4.3.2 Use of feature extraction
4.4 Procedure
4.5 Post Lab
4.6 Viva Questions
5 Case Study
5.1 Problem Statement
5.2 Theory
5.3 Post Lab
References
List of Figures
1.1 Relation between frequency of word and significance of word [Luhn’s idea]
3.1 Example of Inverted Index Structure
4.1 Histogram for a 2D Color Image
Assignment 1
Implementation of Conflation Algorithm
1.1 Problem Statement
Develop an automated text processing system which generates the document representative of a text
by giving weightage to the words appearing in it. (Use Luhn’s concept of automatic text analysis
and the working concept of the conflation algorithm.)
1.2 Pre Lab
• Luhn’s Idea
• M.F. Porter’s Suffix Stripping Algorithm
1.3 Hardware and Software Requirement
• System with minimum 512 MB RAM
• JDK 1.7
• Java editor, viz. NetBeans IDE 6.8 or higher, Eclipse, etc.
1.4 Theory
Information Retrieval
Calvin Mooers coined the term information retrieval in 1950. In the context of library and information
science, it means getting back information which is, in a way, hidden from normal sight.
J.H. Shera defines it as “the process of locating and selecting data, relevant to a given requirement.”
Calvin Mooers defines it as “searching and retrieval of information from storage, according to
specification by subject.”
1.4.1 Conflation Algorithm
A conflation algorithm is used to build an automated text processing system which, by means of
computable methods and with the minimum of human intervention, generates from the input text (full
text, abstract, or title) a document representative adequate for use in an automatic retrieval system.
The algorithm groups words into classes, one for each stem detected, and a document is indexed by a
class name if one of its significant words occurs as a member of that class. Such a system will
usually consist of three parts:
1. Removal of high frequency words (stop word and non word removal)
2. Suffix stripping (using M.F. Porter’s algorithm)
3. Detection and removal of equivalent stems
1.4.2 Luhn’s idea
Luhn proposed that “the frequency of word occurrence in an article furnishes a useful measurement of
word significance”. Luhn used Zipf’s Law as a null hypothesis to specify two cut-offs, an upper and a
lower, thus excluding non-significant words. The words exceeding the upper cut-off were considered to
be common and those below the lower cut-off rare, and therefore not contributing significantly to the
content of the article. He thus devised a counting technique for finding significant words. This is
illustrated by the plot of frequency versus rank in Figure 1.1.
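The counting technique can be sketched in Java as follows. This is only a minimal illustration: the class name `LuhnCutoff`, its method names, and the treatment of both cut-offs as inclusive bounds are assumptions made for this sketch, and the manual does not prescribe how the cut-off values themselves are chosen.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of any library.
class LuhnCutoff {
    // Counts how often each term occurs in the token list.
    static Map<String, Integer> termFrequencies(String[] tokens) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String t : tokens) {
            Integer old = freq.get(t);
            freq.put(t, old == null ? 1 : old + 1);
        }
        return freq;
    }

    // Keeps the terms whose frequency lies between the lower and upper
    // cut-off (both inclusive, an assumption of this sketch).
    static Map<String, Integer> significantTerms(Map<String, Integer> freq,
                                                 int lower, int upper) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (e.getValue() >= lower && e.getValue() <= upper) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] tokens = "the cat the dog the cat bird".split(" ");
        System.out.println(significantTerms(termFrequencies(tokens), 2, 2));
    }
}
```

With the cut-offs both set to 2, the very frequent term "the" and the rare terms "dog" and "bird" are dropped, leaving only "cat".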
Stop words
These are very common words which occur frequently in sentences, carry little meaning of their own,
and do not contribute to the relevance of the sentence.
Examples of stop words include, but are not limited to: a, an, the, is, was, are, were, he, she, it.
Non words
These are the characters/notations used to give a sentence its proper formatting.
Examples of non words include all formatting (or special) characters such as ? “ ” ; : &
Figure 1.1: Relation between frequency of word and significance of word [Luhn’s idea]
The removal of high frequency words, ‘stop’ words or ‘fluff’ words is one way of implementing Luhn’s
upper cut-off. This is normally done by comparing the input text with a ‘stop word list’ of words which
are to be removed. The advantages of the process are not only that non-significant words are removed
and will therefore not interfere during retrieval, but also that the size of the total document file can be
reduced by between 30 and 50 per cent.
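Stop word and non word removal can be sketched as below. The class `StopWordFilter` and its tiny stop word list are assumptions for illustration only; a real system would load a much fuller list from a file or database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of any library.
class StopWordFilter {
    // Illustrative stop word list only (assumption).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "is", "was", "are", "were", "he", "she", "it"));

    static List<String> preprocess(String text) {
        // Non words: replace every character that is not a letter,
        // digit, or whitespace with a space.
        String cleaned = text.toLowerCase(Locale.ROOT)
                             .replaceAll("[^a-z0-9\\s]", " ");
        List<String> kept = new ArrayList<>();
        for (String token : cleaned.trim().split("\\s+")) {
            // Stop words: drop tokens found in the stop word list.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(preprocess("The cat is on the mat; it was happy & calm."));
    }
}
```

Using a `HashSet` keeps each stop word lookup cheap, which matters because every token of the input is compared against the list.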
1.4.3 M.F. Porter’s Algorithm
Terms with a common stem will usually have similar meanings, for example: CONNECT, CONNECTED,
CONNECTING, CONNECTION, CONNECTIONS. Performance of an IR system will be
improved if term groups such as this are conflated into a single term. This may be done by removal of
the various suffixes -ED, -ING, -ION, -IONS, etc. to leave the single term CONNECT. In addition, the
suffix stripping process will reduce the total number of terms in the IR system, and hence reduce the
size and complexity of the data in the system, which is always advantageous.
The algorithm rests on the following definitions. A ‘consonant’ in a word is a letter other than A, E,
I, O or U, and other than Y preceded by a consonant. If a letter is not a consonant it is a ‘vowel’.
A consonant is denoted by c and a vowel by v. A list ccc... of length greater than 0 will be denoted
by C, and a list vvv... of length greater than 0 will be denoted by V. Any word, or part of a word,
therefore has one of the four forms:
CVCV ... C
CVCV ... V
VCVC ... C
VCVC ... V
These may all be represented by the single form [C]VCVC ... [V], where the square brackets denote
arbitrary presence of their contents. Using (VC)^m to denote VC repeated m times, this may again be
written as:
[C](VC)^m[V]
m will be called the ‘measure’ of any word or word part when represented in this form.
Some examples are as follows:
m = 0: TR, EE, TREE, Y, BY
m = 1: TROUBLE, OATS, TREES, IVY
m = 2: TROUBLES, PRIVATE, OATEN, ORRERY
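The measure can be computed mechanically by counting the places where a run of vowels is followed by a consonant. The following is a minimal Java sketch under that reading of the definition; the class and method names (`PorterMeasure`, `isConsonant`, `measure`) are chosen for illustration and are not taken from any library.

```java
// Hypothetical helper for illustration; not part of any library.
class PorterMeasure {
    // A letter is a consonant unless it is A, E, I, O, U,
    // or a Y that is preceded by a consonant.
    static boolean isConsonant(String w, int i) {
        char c = Character.toLowerCase(w.charAt(i));
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
            return false;
        }
        if (c == 'y') {
            return i == 0 || !isConsonant(w, i - 1);
        }
        return true;
    }

    // Returns m for a word written in the form [C](VC)^m[V]:
    // each vowel-to-consonant transition closes one VC pair.
    static int measure(String w) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean vowel = !isConsonant(w, i);
            if (prevVowel && !vowel) {
                m++;
            }
            prevVowel = vowel;
        }
        return m;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"TREE", "TROUBLE", "TROUBLES"}) {
            System.out.println(w + " -> m = " + measure(w));
        }
    }
}
```

For example, it yields m = 0 for TREE, m = 1 for TROUBLE and m = 2 for TROUBLES.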
The ‘rules’ for removing a suffix will be given in the form:
(condition) S1 → S2
This means that if a word ends with the suffix S1 and the stem before S1 satisfies the given condition,
S1 is replaced by S2. The condition is usually given in terms of m, e.g.:
(m > 1) EMENT →
Here S1 is ‘EMENT’ and S2 is null. This would map REPLACEMENT to REPLAC, since REPLAC is
a word part for which m = 2.
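A single rule of this kind can be sketched in Java as below. This is an illustration of one rule only, not the full Porter stemmer; the class `SuffixRule` and the signature of `apply` are assumptions, and the measure is recomputed locally so that the sketch is self-contained.

```java
// Hypothetical helper for illustration; not part of any library.
class SuffixRule {
    static boolean isConsonant(String w, int i) {
        char c = Character.toLowerCase(w.charAt(i));
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
            return false;
        }
        if (c == 'y') {
            return i == 0 || !isConsonant(w, i - 1);
        }
        return true;
    }

    // Measure m of a word part in the form [C](VC)^m[V].
    static int measure(String w) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean vowel = !isConsonant(w, i);
            if (prevVowel && !vowel) {
                m++;
            }
            prevVowel = vowel;
        }
        return m;
    }

    // Applies a rule of the form "(m > minMeasure) S1 -> S2".
    // The word is returned unchanged if it does not end in S1 or
    // if the stem before S1 fails the condition. Matching is
    // case-sensitive, so word and s1 must use the same case.
    static String apply(String word, String s1, String s2, int minMeasure) {
        if (!word.endsWith(s1)) {
            return word;
        }
        String stem = word.substring(0, word.length() - s1.length());
        return measure(stem) > minMeasure ? stem + s2 : word;
    }

    public static void main(String[] args) {
        System.out.println(apply("REPLACEMENT", "EMENT", "", 1));
        System.out.println(apply("CEMENT", "EMENT", "", 1));
    }
}
```

REPLACEMENT is reduced to REPLAC because its stem has m = 2, while CEMENT is left alone because the stem C has m = 0 and so fails the condition m > 1.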
For two stems to be equivalent they must match except for their endings, which themselves must appear
in the list as equivalent. For example, stems such as ABSORB- and ABSORPT- are conflated because
there is an entry in the list defining B and PT as equivalent stem-endings if the preceding characters
match.
Document representative
It is a list of the significant words of a document, i.e. the words whose frequency of occurrence lies
between Luhn’s two cut-offs. These are often referred to as the document’s index terms or keywords.
1.5 Procedure
1. Take a text file as input to the conflation algorithm.
2. Create/maintain a database containing the list of stop words and non words.
3. Process the input file to remove the stop words and non words. This step is known as document
preprocessing.
4. Give the preprocessed file as input to M.F. Porter’s suffix stripping algorithm.
5. Detect the equivalent stems and find the frequency of occurrence of each term in the document.
6. Based on Luhn’s idea, decide the upper cut-off (from the maximum frequency value of a term) and the
lower cut-off (which can also be derived from the maximum frequency value), then apply Luhn’s idea to
select the significant word set.
Input: Any text (.txt, .doc) file.
Output: Set of index terms/keywords (document representative)
1.6 Post Lab
After completing this assignment, analyze the performance of the conflation algorithm on different
inputs and write your concluding points accordingly. Discuss the areas where the conflation algorithm
is widely used.
1.7 Viva Questions
1. Define Information Retrieval. Also, discuss the advantages of an IR system.
2. Explain Luhn’s idea.
3. Define document representative.
4. What are the major steps in the conflation algorithm?
Assignment 2
Implementation of Single Pass Clustering Algorithm
2.1 Problem Statement
Implement the single pass clustering algorithm for clustering text documents.
Input: 4-5 text files represented in the Vector Space Model (term (keyword) vs. document matrix)
2.2 Pre Lab
• Concept of Document Clustering
• Concept of IR Models
2.3 Theory
2.3.1 Clustering
Clustering can be considered the most important unsupervised learning problem; like every other
problem of this kind, it deals with finding a structure in a collection of unlabeled data. A definition of
clustering could be “the process of organizing objects into groups whose members are similar in some
way”. A cluster is therefore a collection of objects which are “similar” to one another and “dissimilar”
to the objects belonging to other clusters. Clustering methods are commonly classified into the following types:
1. Graph Theoretic Approach
2. Hierarchical Clustering
2.3.2 Single Pass Clustering
Clustering algorithms which require only one pass over the file of object descriptions are known as
‘Single-Pass Algorithms’.
Given a collection of clusters and a threshold value h, a new document n is appended to the cluster
with which it has the highest similarity, provided that this similarity exceeds h. If no such cluster
exists, a new cluster is generated which contains only the document n. Clearly, single pass clustering
is suitable for incremental clustering of temporal data (or data streams) since, once a document is
assigned to a cluster, the assignment is not changed in the future.
Algorithm
1. Object descriptors (document representatives) are processed serially. The objects (input documents)
are described using the vector model.
The vector for a document di is (W1i, W2i, ..., Wki), where Wki represents the weight (frequency) of
term k in document di.
2. The first object becomes the cluster representative (or centroid) of the first cluster.
3. Each subsequent object is matched against all cluster representatives existing at its processing
time. When a new document (object descriptor) di (i > 1) comes in, calculate its similarity to
every existing cluster C using the cosine similarity between the cluster representative and the document.
4. A given object (document) is assigned to one cluster (or more if overlap is allowed) according to
some condition (threshold value) on the matching function.
5. When an object is assigned to a cluster, the representative for that cluster is recomputed.
If D1, D2, ..., Dn are the documents in the cluster and each Di is represented by a numerical vector (d1, d2, ..., dt), then the centroid C of the cluster is given by
$$C = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i}{\lVert D_i \rVert}$$
where $\lVert D_i \rVert = \sqrt{d_1^2 + d_2^2 + \dots + d_t^2}$ is the Euclidean norm of Di.
6. If an object fails a certain test (condition), it becomes the cluster representative of a new cluster.
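The steps above can be sketched in Java as follows. This is a minimal illustration, not a reference implementation: the class `SinglePassClustering`, the nested `Cluster` type, and the choices of a non-overlapping assignment and a centroid taken as the mean of the length-normalized member vectors are all assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
class SinglePassClustering {
    // One cluster: indices of its member documents plus the
    // running sum of their unit-length vectors.
    static class Cluster {
        final List<Integer> members = new ArrayList<>();
        final double[] sum;

        Cluster(int dims) {
            sum = new double[dims];
        }

        void add(int docIndex, double[] doc) {
            double n = norm(doc);
            for (int k = 0; k < doc.length; k++) {
                sum[k] += (n == 0) ? 0 : doc[k] / n;
            }
            members.add(docIndex);
        }

        // Centroid: average of the normalized member vectors.
        double[] centroid() {
            double[] c = new double[sum.length];
            for (int k = 0; k < sum.length; k++) {
                c[k] = sum[k] / members.size();
            }
            return c;
        }
    }

    static double norm(double[] v) {
        double s = 0;
        for (double x : v) {
            s += x * x;
        }
        return Math.sqrt(s);
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
        }
        double denom = norm(a) * norm(b);
        return denom == 0 ? 0 : dot / denom;
    }

    // Processes the documents serially, in one pass.
    static List<Cluster> cluster(double[][] docs, double threshold) {
        List<Cluster> clusters = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            Cluster best = null;
            double bestSim = threshold;
            for (Cluster c : clusters) {
                double sim = cosine(docs[i], c.centroid());
                if (sim > bestSim) {
                    bestSim = sim;
                    best = c;
                }
            }
            if (best == null) {
                // No cluster is similar enough: start a new one.
                best = new Cluster(docs[i].length);
                clusters.add(best);
            }
            best.add(i, docs[i]); // the centroid changes on the next call
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[][] docs = {{1, 1, 0, 0}, {2, 2, 0, 0}, {0, 0, 1, 1}, {0, 0, 3, 2}};
        for (Cluster c : cluster(docs, 0.8)) {
            System.out.println(c.members);
        }
    }
}
```

Each row of `docs` is one document vector (one column of the term vs. document matrix). Note that the result depends on the order in which documents arrive, which is a known property of single pass methods.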
2.4 Procedure
1. Take 4-5 text files as input to the single pass clustering algorithm. These input files should be
represented in term vs. document matrix form (Vector Space Model representation).
2. Pass each input text file (document) serially through the algorithm until all documents are covered.
Input: Collection of objects (documents) to be clustered, in vector space format.
Output: Clusters of the given objects
2.5 Post Lab
After completing this assignment, analyze the performance of the single pass clustering algorithm on
different inputs and write your concluding points accordingly. Also, compare single pass clustering with
the single link clustering algorithm.
2.6 Viva Questions
1. Define Clustering.
2. Discuss different IR models.
3. Explain the Cluster Hypothesis in short.
4. Define cluster representative/centroid of a cluster.
5. What are the alternatives to the single pass clustering algorithm?
Assignment 3
Implementation of Inverted Index Structure
3.1 Problem Statement
Implement an inverted index/file structure for a set of documents.
Consider 3 to 4 text documents.
3.2 Pre Lab
• Concept of File Structures in IR System
• Concept of Term Indexing
3.3 Theory
3.3.1 File Structure
For a set of ‘attributes’ or ‘features’ A and a set of ‘values’ V for a text document, a record R is a subset
of the Cartesian product A × V in which each attribute has one and only one value. Thus R is a set of
ordered pairs of the form (an attribute, its value). For example, the record for a document which has
been processed by an automatic content analysis algorithm would be
R = {(K1, x1), (K2, x2), ..., (Km, xm)}
Records are collected into logical units called files. They enable one to refer to a set of records by
name, the file name. The records within a file are often organized according to relationships between
the records. This logical organization has become known as a file structure (or data structure).
Figure 3.1: Example of Inverted Index Structure
3.3.2 Indexing
In general, indexing is the technique of mapping identifiers to sets of objects in order to speed up
searching for those objects. From the IR perspective, the objects are documents or document representatives.
Inverted index/file structure
An inverted file is a file structure in which every list contains only one record. Remember that a list
is defined with respect to a keyword K, so every K-list contains only one record. This implies that the
directory will be such that ni = hi for all i, that is, the number of records containing Ki will equal the
number of Ki-lists. So the directory will have an address for each record containing Ki. For document
retrieval this means that given a keyword we can immediately locate the addresses of all the documents
containing that keyword. The definition of inverted files does not require that the addresses in the
directory are in any order. However, to facilitate operations such as conjunction (‘and’) and disjunction
(‘or’) on any two inverted lists, the addresses are normally kept in record number order. This means
that ‘and’ and ‘or’ operations can be performed with one pass through both lists. The penalty we pay
is of course that the inverted file becomes slower to update.
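The one-pass ‘and’ and ‘or’ operations on two lists kept in record number order can be sketched as a merge. The class `PostingLists` and its method names are assumptions for illustration; each input array is assumed to be sorted in ascending order and free of duplicates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
class PostingLists {
    // 'and': record numbers present in both sorted lists.
    static List<Integer> and(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    // 'or': record numbers present in either sorted list, each once.
    static List<Integer> or(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length || j < b.length) {
            if (j == b.length || (i < a.length && a[i] < b[j])) {
                out.add(a[i++]);
            } else if (i == a.length || b[j] < a[i]) {
                out.add(b[j++]);
            } else {
                out.add(a[i]); // same record in both lists
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7};
        int[] b = {3, 4, 7, 9};
        System.out.println(and(a, b));
        System.out.println(or(a, b));
    }
}
```

Both methods advance two cursors and never step back, so the work is proportional to the combined length of the two lists.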
3.4 Procedure
1. Take 3-4 text files as input in order to build the inverted index structure.
2. Process each input text file (document) word by word.
3. For each distinct keyword, maintain a data structure containing the keyword and its list of
(document no., position of keyword in the whole document) pairs.
Input: 3-4 text files
Output: Inverted index structure of the input files
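The procedure can be sketched in Java as below. For brevity the documents are passed in as strings rather than read from files, and the names `InvertedIndex`, `Posting` and `build`, the 1-based numbering, and the rule that a word is any run of letters or digits are all assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of any library.
class InvertedIndex {
    // One posting: the document a keyword occurs in and its
    // word position there (both counted from 1).
    static class Posting {
        final int docNo;
        final int position;

        Posting(int docNo, int position) {
            this.docNo = docNo;
            this.position = position;
        }

        @Override
        public String toString() {
            return "(" + docNo + "," + position + ")";
        }
    }

    static Map<String, List<Posting>> build(String[] documents) {
        Map<String, List<Posting>> index = new TreeMap<>();
        for (int d = 0; d < documents.length; d++) {
            String[] words = documents[d].toLowerCase(Locale.ROOT)
                                         .split("[^a-z0-9]+");
            int position = 0;
            for (String w : words) {
                if (w.isEmpty()) {
                    continue; // split can yield a leading empty token
                }
                position++;
                List<Posting> list = index.get(w);
                if (list == null) {
                    list = new ArrayList<>();
                    index.put(w, list);
                }
                list.add(new Posting(d + 1, position));
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = {"new home sales", "home sales rise", "new sales"};
        for (Map.Entry<String, List<Posting>> e : build(docs).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Because documents are processed in order, each keyword’s list ends up sorted by document number. In an actual assignment, stop word removal and stemming would normally be applied to the words before they are indexed.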
3.5 Post Lab
After completing this assignment, analyze the performance of the inverted index structure in the query
evaluation of a search engine and write your concluding points accordingly. Also discuss the role of
the inverted index structure in search engine optimization.
3.6 Viva Questions
1. Define Indexing.
2. Which indexing structure is widely used in search engines?
3. Compare different indexing structures.
Assignment 4
Implementation of Feature Extraction in 2D Color Images
4.1 Problem Statement
Implement feature extraction for a 2D color image.
(Extract any one feature, such as color, texture, aspect ratio, etc.)
4.2 Pre Lab
• Concept of Multimedia IR
• Concept of Feature Extraction
4.3 Theory
4.3.1 Feature Extraction
Transforming the input data into a set of features is called feature extraction. If the features are
carefully chosen, the feature set is expected to capture the relevant information from the input data,
so that the desired task can be performed using this reduced representation instead of the full-size
input. Alternatively, feature extraction can be described as a method of capturing the visual content
of images for indexing and retrieval. Features of images used in multimedia IR can be of the following types:
1. Visual features (primitive or low-level image features)
These are the most basic features, tied to the structure of the image. Examples are listed below:
a. Edge b. Corner c. Ridge of image
2. Domain-specific features
These features depict the characteristics of the image domain. Ex: fingerprints, human face, eye
retina.
3. General features
Ex: color, texture, shape, height, width, aspect ratio.
4.3.2 Use of feature extraction
• It gives a reduced representation of the original data, so that repetitions can be omitted.
• A carefully chosen feature set captures the relevant information in the input data, so the desired
task can be performed on the reduced representation instead of the full-size input.
The choice of features to be extracted should take the following concerns into account:
• The features should carry enough information about the image and should not require any domain-specific knowledge for their extraction.
• They should be easy to compute, so that the approach is feasible for a large image collection
and rapid retrieval.
• They should relate well to human perceptual characteristics, since users will finally determine
the suitability of the retrieved images.
Because perception is subjective, there is no single best representation for a feature. Color
is one of the most widely used features in image retrieval.
4.4 Procedure
Process of feature extraction
1. Take any 2D color image as the input file.
2. Scan the input image in a single pass and maintain a count of the number of pixels found at each
value of the chosen feature (color, intensity, texture, etc.).
3. Each 8-bit image consists of gray levels/bins 0-255. The extraction process involves finding the
pixels (x, y) of the image which have a particular gray level. This process is applied to the whole
image.
4. The final output will be 256 gray levels/bins, each containing the pixels having the respective gray
level value. These extracted values can be used to generate a histogram. (In this case, it is a graph
showing the number of pixels in an image at each different intensity value found in that image.)
For an 8-bit gray scale image there are 256 different possible intensities, and so the histogram will
graphically display 256 numbers showing the distribution of pixels amongst those gray scale values.
Figure 4.1: Histogram for a 2D Color Image
Input: 2D color image
Output: Extracted features of the input image
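The histogram step can be sketched in Java with the standard `java.awt.image.BufferedImage` class. The class name `GrayHistogram` is hypothetical, and converting each color pixel to a gray level by averaging its red, green and blue components is an assumption of this sketch; weighted luminance formulas are a common alternative.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of any library.
class GrayHistogram {
    // Returns 256 bins; bins[g] is the number of pixels whose
    // gray level is g.
    static int[] histogram(BufferedImage image) {
        int[] bins = new int[256];
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Simple average of the three channels (assumption).
                int gray = (r + g + b) / 3;
                bins[gray]++;
            }
        }
        return bins;
    }

    public static void main(String[] args) {
        // Tiny in-memory image so that the sketch needs no input file.
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0xFFFFFF); // white
        img.setRGB(1, 0, 0x000000); // black
        int[] bins = histogram(img);
        System.out.println("black: " + bins[0] + ", white: " + bins[255]);
    }
}
```

To process a real file, the image could be loaded with `javax.imageio.ImageIO.read(new java.io.File(path))` before calling `histogram`; the bin counts can then be plotted as the histogram.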
4.5 Post Lab
After completing this assignment, analyze the use of feature extraction in multimedia content
retrieval. Discuss the role of feature extraction in retrieving relevant content (images, videos, etc.).
Write your concluding points accordingly.
4.6 Viva Questions
1. Define Multimedia IR.
2. Which features are most commonly extracted by search engines?
3. Compare text retrieval with multimedia retrieval.
4. Define feature extraction. How is it useful in reducing the storage space of multimedia documents?
Assignment 5
Case Study
5.1 Problem Statement
Study any recent technology/topic that contributes to information retrieval systems.
5.2 Theory
Explain the presentation topic by clearly stating each point thoroughly. Use examples and diagrams to
make the explanation more effective.
5.3 Post Lab
Analyze and compare the topic of study and write the concluding points accordingly.
References
[1] C.J. van Rijsbergen, “Information Retrieval” (e-book available at www.dcs.gla.ac.uk)
[2] Baeza-Yates & Ribeiro-Neto, “Modern Information Retrieval”, Pearson Education, ISBN 81-297-0274-6
[3] M.F. Porter, “An algorithm for suffix stripping”, originally published in July 1980.
[4] Bob Boiko, “Content Management Bible”, 2nd Edition, Wiley, ISBN 978-0-7645-7371-2,
e-book available.