International Journal of Engineering Research


About Publication House

Innovative Research Publications (IRP) is a fast-growing international academic publisher that publishes international journals in the fields of Engineering, Science and Management. IRP is establishing a distinctive and independent profile in the international arena. Our publications are distinctive for their relevance to the target groups and for their stimulating contribution to R&D. Our journals are the product of a dynamic interchange between scientists, authors, publishers and designers.

Objectives:

· Publishing national and international journals, magazines, books and other material, in online as well as print versions, to provide high-quality, high-standard publications in national and international journals

· Organizing technical events such as seminars, workshops, conferences and symposia to disseminate the knowledge of researchers

· Collaborating with educational and research organizations to expand awareness about R&D

· Helping financially weak researchers to promote their research at the world level

Our Journals

1. International Journal of Scientific Engineering and Technology

ISSN : 2277-1581

Subject : Science, Engineering, Management and Agriculture Engineering

Last Date for submitting paper : 10th of each month

Web : www.ijset.com, Email : editor@ijset.com

2. International Journal of Engineering Research

ISSN : 2319-6890

Subject : Engineering

Last Date for submitting paper : 10th of each month

Web : www.ijer.in, Email : editor@ijer.in

Innovative Research Publications

Gulmohar, Bhopal M.P. India, Contact No.: +91-9752135004

Web : www.irpindia.org, Email : info@irpindia.org


ISSN : 2319-6890(Online)

2347-5013(Print)

International Journal of Engineering Research (IJER)

Volume 4 Issue 1

Jan. 01, 2015


Editor in Chief :

Dr. R.K. Singh

Web : www.ijer.in, Email : editor@ijer.in

Department of Chemical Engg, JDIET, Yavatmal (M.S) India

Contact No.: +91-9752135004

Cambridge Institute of Technology, K.R. Puram, Bangalore

Editorial Board

Editor in Chief

Dr. R. K. Singh,

Professor and Head, Department of Electronics and Communication,

KNIT Sultanpur U.P., India

Managing Editor

Mr. J. K. Singh, Managing Editor

Innovative Research Publications, Bhopal M.P. India

Advisory Board

1. Dr. Asha Sharma, Jodhpur, Rajasthan, India

2. Dr. Subhash Chander Dubey, Jammu India

3. Dr. Rajeev Jain, Jabalpur M.P. India

4. Dr. C P Paul, Indore M.P. India

5. Dr. S. Satyanarayana, Guntur, A.P, India.

Organizing Committee

List of Contents

S.No.  Manuscript Detail  Page No.

1. Simulation and Analysis of an Energy Efficient Protocol Ad-LEACH for Smart Home Wireless Sensor Network
   R. Kavitha, Dr. Nasira G M (pp. 336-339)

2. Troubleshooter: Solution Finder for Log Errors from Multiple Solution Sources
   Smita B Patil, Dr. D R Shashikumar (pp. 340-342)

3. A Web Enabled Wireless Sensor Networks System for Precision Agriculture Applications Using Internet of Things
   Sowmya L., Krishna Kumar P R. (pp. 343-345)

4. A Survey on Developing an Efficient Optimization Technique for Cluster Head Selection in Wireless Sensor Network
   Sridhar R., Dr. N Guruprasad (pp. 346-348)

5. Spatial and Location Based Rating Systems
   Vinay Kumar M., N. Rajesh (pp. 349-352)

6. Energy Efficient Zone-based Proactive Source Routing to Minimize Overhead in Mobile Ad-hoc Networks
   Lakshmi K. M., Levina Tukaram (pp. 353-357)

7. Homomorphic Encryption based Query Processing
   Aruna T.M., M.S. Satyanarayana, Madhurani M.S. (pp. 358-361)

8. Intrusion Detection System against Bandwidth DDoS Attack
   Basavaraj Muragod, Sai Madhvi D. (pp. 362-364)

9. Information Retrieval with Keywords Using Nearest Neighbor Search
   Prathibha (pp. 365-367)

10. Mining the Frequent Item Set Using Pre Post Algorithm
    Manasa M. S., Shivakumar Dallali (pp. 368-370)

11. Design and Implementation of Research Proposal Selection
    Neelambika Biradar, Prajna M., Dr. Antony P J (pp. 371-373)

12. Light Weight Integrated Log Parsing Tool: LOG ANALYZER
    Priyanka Sigamani S., Dr. D. R. Shashi Kumar (pp. 374-379)

13. Towards Secure and Dependable for Reliable Data Fusion in Wireless Sensor Networks under Byzantine Attacks
    Valmeeki B.R., Krishna Kumar P.R., Shreemantha M.C. (pp. 380-382)

14. Identity and Access Management to Encrypted Cloud Database
    Archana A., Dr. Suresh L., Dr. Chandrakanth Naikodi (pp. 383-386)

15. An Analysis of Multipath AOMDV in Mobile Adhoc Networks
    S. Muthusamy, Dr. C. Poongodi (pp. 387-389)

16. Light Weight SNMP Based Network Management and Control System for a Homogeneous Network
    Brunda Reddy H K, K Satyanarayan Reddy (pp. 390-391)

17. Lagrange Based Quadratic Interpolation for Color Image Demosaicking
    Shilpa N.S., Shivakumar Dalali (pp. 392-395)

18. Case Study: Leveraging Biometrics to Big Data
    Shivakumar Dalali, Dr. Suresh L., Dr. Chandrakant Naikodi (pp. 396-398)

19. 'DHERO' - Mobile Location Based Fast Services
    Bharath D., Anand S Uppar (pp. 399-401)

20. 'Im@' - A Technique for Sharing Location
    Bhavya A., Balapradeep K. N., Dr. Antony P. J. (pp. 402-405)

21. Self-Vanishing of Data in the Cloud Using Intelligent Storage
    Shruthi, Ramya N., Swathi S.M., Sreelatha P.K. (pp. 406-408)

22. A Survey on Various Comparisons on Anonymous Routing Protocol in MANETs
    Dr. Rajashree V. Biradar, K. Divya Bhavani (pp. 409-411)

23. Stock Market Prediction: A Survey
    Guruprasad S., Rajshekhar Patil, Dr. Chandramouli H, Veena N (pp. 412-414)

24. Identifying and Monitoring the Internet Traffic with Hadoop
    Ranganatha T.G., Narayana H.M (pp. 415-418)

25. Authentication Prevention Online Technique in Auction Fraud Detection
    Anitha K., Priyanka M., Radha Shree B (pp. 419-423)

26. Differential Query Services Using Efficient Information Retrieval Query Scheme in Cost-Efficient Cloud Environment
    Shwetha R., Kishor Kumar K., Dr. Antony P. J. (pp. 424-427)

27. An Efficient and Effective Information Hiding Scheme Using Symmetric Key Cryptography
    Sushma U., Dr. D.R. Shashi Kumar (pp. 428-431)

28. ATM Deployment Using Rank Based Genetic Algorithm with Convolution
    Kulkarni Manjusha M. (pp. 432-435)

29. Data Oblivious Caching Framework for Hadoop Using MapReduce in Big Data
    Sindhuja M., Hemalatha S. (pp. 436-439)

30. PAPR Reduction for STBC MIMO-OFDM Using Modified PTS Technique Combined with Interleaving and Pulse Shaping
    Poonam, Sujatha S (pp. 440-444)

31. Generation of Migration List of Media Streaming Applications for Resource Allocation in Cloud Computing
    Vinitha Pandiyan, Preethi, Manjunath S. (pp. 445-447)

32. Child Tracking in School Bus Using GPS and RFID
    Shilpitha Swarna, Prithvi B. S., Veena N. (pp. 448-449)

33. Risk Mitigation by Overcoming Time Latency during Maternity - An IoT Based Approach
    Sanjana Ghosh, R. Valliammai, Kiran Babu T.S., Manoj Challa (pp. 450-452)

34. Predicting Future Resources for Dynamic Resource Management Using Virtual Machines in Cloud Environment
    Vidya Myageri, Mrs. Preethi S. (pp. 453-457)

35. Pixel Based Approach of Satellite Image Classification
    Rohith K.M., Dr. D.R. Shashi Kumar, Venu Gopal A.S. (pp. 458-460)

International Journal of Engineering Research

Volume No.4, Issue Special 5

ISSN:2319-6890(online), 2347-5013(print)

19 & 20 May 2015

Simulation and Analysis of an Energy Efficient Protocol Ad-LEACH for Smart Home Wireless Sensor Network

R. Kavitha¹, Dr. Nasira G M²

¹Department of Computer Science, Christ University, Bangalore, Karnataka, India

²Department of Computer Applications, Chikkanna Govt. College, Tirupur, Tamilnadu, India

kavitha.r@christuniversity.in, nasiragm99@yahoo.com

Abstract

Wireless Sensor Networks are one of the emerging technologies in the research field. A WSN is made up of many inexpensive, tiny sensors connected through a wireless network. Routing with minimal energy is the major challenge in this kind of network. Many cluster-based routing protocols have been developed to overcome this challenge. In this paper, we propose a new energy-efficient protocol, Ad-LEACH. The performance of this protocol is compared with other protocols using MATLAB. The simulation results show that the proposed protocol performs better in terms of energy efficiency. This protocol can also be used in a smart home wireless sensor network.

Keywords

Wireless Sensor Network, Cluster, Energy Efficient, Smart Home

I. INTRODUCTION

A wireless sensor network (WSN) is made up of a large number of small sensors with low-power transceivers. It helps to gather data in different environments. Each sensor collects data and sends it through the network to a single processing centre, the base station (BS). The collected data help to determine the features of the environment or to detect the state of an object in the network. As shown in Fig 1, in a smart home (SH) environment [6] each sensor-equipped device is considered a node in the WSN. The status of the device is sensed by the sensor, and the SH-WSN passes the sensed information to the BS. The collected information is used to know the current state of the smart home, and it plays a great role in controlling the smart home.

Each node in the WSN spends energy to transmit collected data to its CH. Each CH spends energy to receive data from all the nodes in the cluster, to aggregate the collected data and to transmit it to the BS. The network protocol plays a vital role in this data communication. Since the WSN consumes energy for communication, identifying an energy-efficient protocol is a major and critical task. Many protocols have been proposed for WSNs. Among those, the LEACH (Low Energy Adaptive Clustering Hierarchy) protocol helps to save energy in smaller WSNs such as a smart home.


Fig 1. Wireless Sensor Network in a Smart Home

II. LEACH (LOW ENERGY ADAPTIVE CLUSTERING HIERARCHY)

The first hierarchical cluster-based routing protocol for wireless sensor networks is LEACH [1]. This protocol divides the nodes in the network into clusters. As shown in Fig 2, each cluster has a Cluster Head (CH). This dedicated CH node has extra privileges: it is responsible for creating and maintaining a TDMA (Time Division Multiple Access) schedule [7], and it sends the aggregated data from the nodes to the BS using CDMA (Code Division Multiple Access). The remaining nodes in the network, other than the CHs, are cluster members. The LEACH protocol operates in rounds. Each round consists of two phases:

Set-up Phase
  o Advertisement Phase
  o Cluster Set-up Phase

Steady Phase
  o Schedule Creation
  o Data Transmission

A. Setup Phase

Each node independently decides whether or not it will become a CH. The decision is based on when the node last served as a CH: a node that has not been a CH for a long time has a higher chance of becoming a CH than a node that has been a CH recently. During the advertisement phase, an advertisement packet lets the CHs inform their neighbours that they have become CHs. Non-CH nodes receive this CH information with the strongest signal strength. By sending their IDs, the member nodes inform the CH that they have become members of that cluster. All the CHs communicate with the cluster members using TDMA. Now the CH knows the number of member nodes and their IDs in the cluster. Based on all messages received within the cluster, the CH creates a TDMA schedule, picks a CDMA code randomly, and broadcasts the TDMA table to the cluster members. After that, the steady-state phase begins.

Fig 2. LEACH Protocol

B. Steady-State Phase

Actual data transmission begins in this phase. Nodes start sending their data to the CH during their allocated TDMA slots. A minimal amount of energy is used in this transmission. Every non-CH node's radio can be turned off until that node's allocated TDMA slot, which minimizes the energy dissipation in these nodes. The CH aggregates the received data and sends it to the BS.
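To make the two phases concrete, the following sketch implements one LEACH round: threshold-based CH self-election followed by cluster joining on the strongest (here, nearest) advertisement. It is written in Python purely as an illustration, since the paper only reports MATLAB simulations and shows no code; the node dictionary fields and the 5% CH probability are assumptions, and the threshold is the standard formulation from [1].

```python
import math
import random

P = 0.05  # assumed desired fraction of cluster heads per round

def elect_cluster_heads(nodes, rnd):
    """Set-up phase: each node independently decides whether to become a CH.

    Uses the standard LEACH threshold T(n) from [1]; nodes that served as CH
    within the last 1/P rounds are not eligible in this epoch."""
    heads = []
    threshold = P / (1 - P * (rnd % int(1 / P)))
    for node in nodes:
        last = node["last_ch_round"]
        if last is not None and rnd - last < int(1 / P):
            continue  # served as CH recently, skip this round
        if random.random() < threshold:
            node["last_ch_round"] = rnd
            heads.append(node)
    return heads

def join_clusters(nodes, heads):
    """Advertisement / cluster set-up: each non-CH node joins the CH whose
    advertisement arrives strongest (approximated here by the nearest CH)."""
    if not heads:
        return {}
    clusters = {id(h): [] for h in heads}
    for node in nodes:
        if node in heads:
            continue
        nearest = min(heads, key=lambda h: math.dist(node["pos"], h["pos"]))
        clusters[id(nearest)].append(node)  # node sends its ID to that CH
    return clusters

# Illustrative network: 100 nodes on a 100 x 100 field, as in Table I below.
nodes = [{"pos": (random.uniform(0, 100), random.uniform(0, 100)),
          "last_ch_round": None} for _ in range(100)]
clusters = join_clusters(nodes, elect_cluster_heads(nodes, rnd=0))
```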

C. Disadvantage of LEACH

The LEACH protocol selects the CH randomly, without considering energy consumption. In this protocol, a node with less energy has the same priority to become CH as a node with more energy. As a result, nodes die soon, which quickly leads to network failure. Because of this drawback of LEACH, much research has been done to make the protocol perform better [5].

III. LITERATURE REVIEW

Cluster Based Routing Protocol (CBRP) [2] is a distributed energy-efficient protocol for data gathering in wireless sensor networks. This protocol elects CHs based only on a node's own residual energy. After the CH selection, CBRP establishes a spanning tree over all of the CHs. Only the root node of the spanning tree communicates with the sink node, by single-hop communication. The energy consumed by the nodes in the network for all communication is calculated using the free space model. CBRP was shown to save a great deal of energy and extend the network lifetime.

Hierarchical Cluster-based Routing (HCR) [3] generates energy-efficient clusters in a sensor network. It generates a head-set for each cluster, and members of the head-set are selected as CH using a round-robin technique. The formed clusters are maintained for a short period of time called a round. A round consists of two phases, an election phase and a data transfer phase. In the first phase, the sensor nodes form the cluster with its head-set. In the second phase, the members of the head-set transmit the collected data to the BS in turn. The HCR protocol is more energy efficient than traditional cluster-based routing techniques for continuous monitoring applications.

Enhanced-LEACH (En-LEACH) [4] is another version of the LEACH protocol. A probability method is used to select the cluster head, based on the formula

Cluster Head = Energy of the node / Energy of the Cluster

The objectives of En-LEACH are
  o to handle cluster-head failure, and
  o to account for the non-uniform and dynamic residual energy of the nodes.

After implementation, the results show that the first node death occurs almost two times later than in LEACH, and the last node death occurs much later than in LEACH.

Ad-LEACH is the new approach proposed here for the wireless sensor network. In this approach, the CH is elected based on two criteria: (i) the node's residual energy, and (ii) the node should not have served as a CH recently. The highest-priority node is elected as the CH, and the next-priority node is designated the CH-Rep (Cluster Head Representative). In the LEACH protocol, the CH loses more energy than a normal node because it spends energy to receive data from all the cluster nodes and to transmit to the base station, which is far away; it spends less energy transmitting, since it transmits to a single base station, and more energy receiving, since it receives from many nodes. Here the CH-Rep helps the CH to save energy: it receives data from all the nodes in the cluster, aggregates them and transmits the result to the CH, and the CH then transmits to the base station. In Ad-LEACH the CH therefore spends less energy than in LEACH, since it receives data from only one node, the CH-Rep.
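As a rough illustration of the Ad-LEACH election described above (not the authors' implementation, which is not shown), the sketch below ranks the nodes of a cluster by residual energy, skips nodes that served as CH recently, and returns the top two candidates as CH and CH-Rep. The recent_window value and the node fields are assumptions; the paper gives the criteria but no formula.

```python
def elect_ch_and_rep(cluster_nodes, current_round, recent_window=5):
    """Illustrative Ad-LEACH election for one cluster (assumes >= 2 nodes).

    Priority: highest residual energy among nodes that have not served as CH
    within `recent_window` rounds.  The best node becomes CH; the runner-up
    becomes CH-Rep, which aggregates member data and forwards it to the CH."""
    def eligible(node):
        last = node.get("last_ch_round")
        return last is None or current_round - last >= recent_window

    candidates = sorted((n for n in cluster_nodes if eligible(n)),
                        key=lambda n: n["energy"], reverse=True)
    if len(candidates) < 2:  # fall back if too few nodes are eligible
        candidates = sorted(cluster_nodes, key=lambda n: n["energy"],
                            reverse=True)
    ch, ch_rep = candidates[0], candidates[1]
    ch["last_ch_round"] = current_round
    return ch, ch_rep
```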

IV. IMPLEMENTATION

Nowadays, research in the area of low-energy radios is a great challenge for researchers. There are different theories about the radio model and energy dissipation in the transmit and receive modes. Fig 3 shows the radio energy dissipation model [1] used in our work.


Fig 3. First Order Radio Model

In this model, the radio dissipates E_elec = 50 nJ/bit to run the transmitter or receiver circuitry and epsilon_amp = 100 nJ/bit/m^2 for the transmit amplifier. Using this radio model, to transmit a k-bit message over a distance d the radio spends

E_Tx(k, d) = E_elec * k + epsilon_amp * k * d^2

and to receive a k-bit message the radio spends

E_Rx(k) = E_elec * k

All the nodes in the network have energy E(n), except the base station. The characteristics of the test network are specified in Table I.

TABLE I. Parameters used in the simulation

Parameter       Value
No. of Nodes    100
Network Size    100 x 100
Node Position   Random
Data Size       1000 bits
No. of Rounds   50
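The two radio-model formulas above are simple to evaluate directly; the following sketch codes them up with the constants stated in the text so they can be reused inside a round-based simulation. Note that the original LEACH paper [1] gives the amplifier constant as 100 pJ/bit/m^2, whereas the text above states 100 nJ/bit/m^2; the value below follows the text.

```python
E_ELEC = 50e-9    # 50 nJ/bit for the transmitter/receiver electronics (from the text)
EPS_AMP = 100e-9  # 100 nJ/bit/m^2 for the transmit amplifier, as stated above

def tx_energy(k_bits, d_metres):
    """E_Tx(k, d) = E_elec*k + eps_amp*k*d^2 (first order radio model)."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * d_metres ** 2

def rx_energy(k_bits):
    """E_Rx(k) = E_elec*k."""
    return E_ELEC * k_bits

# Example with the Table I packet size: a 1000-bit message over 50 m.
print(tx_energy(1000, 50), rx_energy(1000))  # joules spent to send / receive
```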

V. RESULT

The Ad-LEACH protocol is simulated using MATLAB. The sensor network consists of 100 nodes scattered randomly over a 100 x 100 square field. Table I lists all the parameters used to implement the Ad-LEACH protocol. In this simulation, every node begins with 20 J of energy.

Fig 4a. Sensors dead (dots) and alive (circles) after 50 rounds in the LEACH protocol

Fig 4b. Sensors dead (dots) and alive (circles) after 50 rounds in the Ad-LEACH protocol

After 50 rounds of transmission, many sensor nodes are dead (dots) and very few remain alive (circles) in the LEACH protocol, whereas in Ad-LEACH many sensor nodes remain alive (circles) after 50 rounds of transmission. These results are shown in Fig 4a and Fig 4b respectively.

Figure 5. Network Lifetime

After 50 rounds of transmission, the network lifetimes of the LEACH and Ad-LEACH protocols are compared in Figure 5. The simulation shows that Ad-LEACH performs better than LEACH in terms of the network lifetime of the sensor network.

Many protocols have been proposed for energy-efficient WSNs. The performance of the Ad-LEACH protocol is compared with three other main protocols, CBRP, HCR and En-LEACH, with all four protocols measured against the base protocol LEACH. Table II shows this comparative analysis.

TABLE II. Comparative study of CBRP, HCR, En-LEACH and Ad-LEACH

Selection of CH:
  CBRP: node with the highest residual energy
  HCR: one node from the head-set, by round robin
  En-LEACH: probability method
  Ad-LEACH: node's residual energy, and the node should not have served as a CH recently

Network lifetime compared with LEACH:
  CBRP: 80% more;  HCR: 30% more;  En-LEACH: 50% more;  Ad-LEACH: 60% more

Data transfer mechanism between CH and BS:
  CBRP: minimal spanning tree with single-hop communication
  HCR: decided by a Genetic Algorithm (GA)
  En-LEACH: TDMA
  Ad-LEACH: CDMA

Special feature:
  CBRP: only the spanning-tree root node communicates with the BS
  HCR: head-set approach
  En-LEACH: all cluster members are kept informed about the energy status of their cluster head
  Ad-LEACH: a node which has not served as a CH recently will become a CH

Drawback:
  CBRP: ambiguity when the same Parent Selection Value (PSV) occurs for two nodes
  HCR: the GA needs improvement while generating the hierarchical structure
  En-LEACH: saves less energy than CBRP
  Ad-LEACH: a node with more energy that has served as CH recently will not be the next CH

We tracked the lifetime of the nodes in the network. The summarized results are as follows:

The first node dies in the 7th round with Ad-LEACH, whereas it dies in the 1st round with the LEACH protocol.

At the end of the 50th round almost 85 nodes are dead in LEACH, but in Ad-LEACH only 55 nodes are dead, which shows that the Ad-LEACH protocol can help the network stay alive longer.

VI. CONCLUSIONS

Routing in wireless sensor networks for smart homes is an emerging area of research. Since sensor networks are designed for specific applications, designing efficient routing protocols for them is very important. In this paper, we discussed Ad-LEACH, a cluster-based routing protocol that can be used in smaller network applications such as a smart home and that minimizes the energy used by the entire network. Ad-LEACH is better than LEACH in terms of CH selection, optimizing CH energy consumption and extending network lifetime.

REFERENCES

[1] Wendi Rabiner Heinzelman, Anantha Chandrakasan, and Hari Balakrishnan, "Energy-Efficient Communication Protocol for Wireless Microsensor Networks", Proceedings of the 33rd Hawaii International Conference on System Sciences, IEEE, 2000.

[2] Bager Zarei, Mohammad Zeynali and Vahid Majid Nezhad, "Novel Cluster Based Routing Protocol in Wireless Sensor Networks", IJCSI International Journal of Computer Science, 2010.

[3] Sajid Hussain and Abdul W. Matin, "Hierarchical Cluster-based Routing in Wireless Sensor Networks", IPSN, US.

[4] Suyog Pawar, Prabha Kasliwal, "Design and Evaluation of En-LEACH Routing Protocol for Wireless Sensor Network", IEEE International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2012.

[5] Debnath Bhattacharyya, Tai-hoon Kim, and Subhajit Pal, "Comparative Study of Wireless Sensor Networks and Their Routing Protocols", 2010.

[6] R. Kavitha, Dr. G. M. Nasira, Dr. M. Nachamai, "Smart home systems using wireless sensor network - A comparative analysis", International Journal of Computer Engineering & Technology (IJCET), 2012.

[7] Suyog Pawar, Prabha Kasliwal, "Design and Evaluation of En-LEACH Routing Protocol for Wireless Sensor Network".



Troubleshooter: Solution Finder for log errors from Multiple Solution Sources

Smita B Patil, Dr. D R Shashikumar

M.Tech Student, Dept. of CSE, CiTech, Bangalore, Karnataka, India

smitapatil.july90@gmail.com

Abstract:

Currently, one of the challenges for the escalation team is to have a unified tool through which all the knowledge bases can be searched together. With this tool, Troubleshooter, we facilitate the team in pulling information from all the knowledge sources, such as the engineering KB, Documentation and JIRA, or even from an external Google search. Further, within the engineering knowledge base we have multiple sources such as the bug tracking system, the Forum and the customer-searchable knowledge base. With the proposed tool, "Troubleshooter: Solution Finder for Log Errors", we integrate all the knowledge sources under one umbrella through which the team can search. This reduces the time taken to search all knowledge sources independently and helps the team correlate the data available across knowledge sources. Additionally, the tool can search based on the product selected and thus helps filter the results that are relevant to the selected product.

Index Terms: Search solutions in single solution sources, search solutions in multiple solution sources, history for search results, product-based search.

I. INTRODUCTION

The escalation team in industry examines log files manually for errors. Once the employee finds errors in the log file, his or her next task is to find solutions for the obtained error(s) with respect to the products.

With the proposed system, Troubleshooter, we facilitate the team in pulling information from all the knowledge sources, such as the engineering KB, Documentation and JIRA, or even from an external Google search. Further, within the engineering knowledge base we have multiple sources such as the bug tracking system, the Forum and the customer-searchable knowledge base. With this proposed tool, "Troubleshooter: Solution Finder for Log Errors", we integrate all the knowledge sources under one umbrella through which the team can search.

The tool, Troubleshooter: Solution Finder for log errors from Multiple Solution Sources, provides a solution search across different solution sources with respect to products such as SRM, NCM/UIM, ViPR, SMARTS and Watch4Net.

The Troubleshooter also provides a search history for the user. With the search history one can go directly back to previous search operations. The GUI of the tool is very simple and user friendly. Troubleshooter is a standalone tool, so it needs to be installed on each user's personal computer.

This tool reduces the time taken to search all knowledge sources independently and helps the team correlate the data available across knowledge sources. The results from the different solution sources are displayed on different tabs of a tabbed pane, where each tab displays the name of the solution source so that it is easy for the user to know from where the solutions were fetched.

The tool also comes with a list of standard Java errors and some frequently occurring errors for the different products: SRM, NCM/UIM, ViPR, SMARTS and Watch4Net. By providing this list of errors, Troubleshooter helps the user directly select an error from the provided list and perform the search operation instead of typing the error manually.

The tool is very useful for finding solutions from different solution sources at the same time, as there is no other application that provides solutions from multiple solution sources at once. The tool authenticates the user once and never asks again for authentication. Without this tool, one needs to authenticate with every solution source individually, giving a username and password manually multiple times whenever a solution source needs authentication, which is an overhead for the user.

II. Existing System

There are many existing systems for finding errors in log files, for example XpoLog, Log Analyzer, Event Log Analyzer, Piwik, nxlog and Octopussy. These log analysis systems available in the market are used to analyze a log file and find the errors in it, but there is no existing system that finds solutions for the errors obtained after a log search. We can therefore conclude that there is no existing system that provides an efficient way of finding solutions from different solution sources within the same framework. There are, however, knowledge sources from which the user can get solutions individually, independently of the other knowledge bases.

III. Solution Search in Multiple Solution Sources

The objective of the tool "Troubleshooter: Solution Finder for log errors from Multiple Solution Sources" is to provide an efficient way of finding solutions for log errors from single or multiple solution sources and to display the solutions within the same framework, so that one can easily correlate the data available across all the knowledge sources and find the most relevant solution among all the results obtained for the errors found by the log file search.

The tool lets the user perform a search in single or multiple solution sources: Knowledge Base, Clear Quest, JIRA, Forum and Documentation. We integrate all the knowledge sources under one umbrella through which the team can search. This reduces the time taken to search all knowledge sources independently and helps the team correlate the data available across knowledge sources.

Additionally, the tool can search based on the product selected and thus helps filter the results that are relevant to the selected product: SRM, NCM, ViPR, SMARTS and Watch4Net.

The tool also provides a search history for the user: if the user faces the same errors frequently, he or she can use the history to go back to previous search operations and retrieve the relevant solution instead of searching the multiple solution sources again for the same error. The GUI of the tool is very simple and user friendly, and tool tips are provided for all components so that anyone can easily operate it. The tool is standalone, so it needs to be installed on each user's personal computer.

The tool reduces the user's workload by giving an option to search for solutions to an error in multiple solution sources. The results from the different solution sources are displayed on different tabs of the tabbed pane, where each tab displays the name of the solution source so that it is easy for the user to know from where the solutions were fetched, and the tool also provides product-filtered search results based on the particular product the user requires.

The tool also comes with some standard Java errors and a set of errors for the products SRM, NCM/UIM, ViPR, SMARTS and Watch4Net. This helps the user directly select an error from the provided list and perform the search operation instead of typing the error manually.

The tool is very useful for finding solutions from different solution sources because it integrates all the knowledge sources under one umbrella through which the team can search. It authenticates the user once and never asks again for authentication. Without this tool, one needs to authenticate with every solution source individually, giving a username and password manually multiple times whenever a solution source needs authentication, which again is an overhead for the user.
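The paper does not show the tool's internal API (it appears to be a desktop GUI application), so the following Python sketch only illustrates the "one umbrella" idea under stated assumptions: a common search interface, one adapter per knowledge source, and results grouped per source so that each source can be rendered on its own tab. All class names, the example source and the example URL are hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Solution:
    source: str   # knowledge source the hit came from, e.g. "KB" or "JIRA"
    title: str
    url: str

class SolutionSource(Protocol):
    name: str
    def search(self, error: str, product: str) -> list[Solution]: ...

class KnowledgeBaseSource:
    """Illustrative adapter; a real adapter would call the source's own API
    and authenticate once, as described above."""
    name = "KB"
    def search(self, error, product):
        return [Solution(self.name, f"KB article on '{error}' ({product})",
                         "http://kb.example/1")]

def troubleshoot(error, product, sources):
    """Search the selected error in every selected source and group the
    results per source, one group per display tab."""
    return {src.name: src.search(error, product) for src in sources}

results_by_tab = troubleshoot("NullPointerException", "SRM",
                              [KnowledgeBaseSource()])
```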

Fig 1. System architecture of the tool Troubleshooter: Solution Finder for log errors from Multiple Solution Sources


Errors obtained from the log file search are given as input to the tool. The tool then asks the user to enter the name of the solution source(s) and the name of the product, based on the user's requirements. After getting proper input from the user, the tool searches for the given error in the specified solution sources for the specified product. After the solution search, the tool displays the output from the different solution sources on different tabs of the display.

IV. Search History

The history contains information such as Product Selected (the type of product the user has selected), Searched For (the query for which the user wants to find a solution from multiple solution sources with respect to a product), Solution Sources (the name of the solution source(s) in which the user wants to search the query), and the Date and Time of the search.
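A minimal sketch of a single history entry carrying exactly the fields named above; the class and field names are assumptions, since the paper does not show how the history is stored.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class HistoryEntry:
    product_selected: str        # product chosen by the user, e.g. "ViPR"
    searched_for: str            # the error/query text that was searched
    solution_sources: list[str]  # names of the sources that were searched
    timestamp: datetime = field(default_factory=datetime.now)

history: list[HistoryEntry] = []
history.append(HistoryEntry("ViPR", "OutOfMemoryError", ["KB", "JIRA"]))
```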

V. Results

A. Initial Screen

Fig 2: Initial Screen of the Tool-Troubleshooter: Solution Finder for log errors from Multiple Solution Sources

This snapshot shows how the initial page looks. The initial page contains the knowledge base search, where the user needs to select the knowledge sources and products based on his or her requirements and post the query in the text field. The query can be entered manually by the user, selected from the standard Java errors list, or selected from the product-based errors that the tool provides.

The solution search takes place by searching for the query posted in the text field in the selected knowledge bases for the selected product. The query the user wants to search for may be a standard error, a product-based error, an error found after a log search, or text typed manually by the user. The results obtained from the multiple solution sources are displayed in the display window.

The results from the different solution sources are displayed on different tabs of the tabbed pane, where each tab displays the name of the solution source so that it is easy for the user to know from where the solutions were fetched. The different tabs with the knowledge base names can be seen in the initial screen in Fig 2.

B. History Table

The history of the searched results is shown in the history tab of the display window, as shown in Fig 3. The history contains information such as Product Selected (the type of product the user has selected), Searched For (the query), Solution Sources (the solution source(s) searched), and the Date and Time of the search.

Fig 3: History of the searched results

C. Solutions from Solution Sources


Fig 4: Solutions from the CQ knowledge source with respect to the ViPR product, for a query selected from the product-based errors

Fig 4 shows how the tool displays the results from the CQ knowledge base for the product ViPR and for a query selected from the product-based errors. The solutions from the CQ knowledge base are displayed in the CQ tab of the display. The user can view more detailed information by clicking on the links shown in the display window. Similarly, the tool searches for solutions in the other solution sources, and the solutions obtained from the different solution sources are displayed on different tabs of the tabbed pane in the display window, where each tab displays the name of the solution source so that it is easy for the user to know from where the solutions were fetched.

VI. Conclusion

With the tool Troubleshooter: Solution Finder for log errors from Multiple Solution Sources, one can search in single or multiple solution sources and get solutions from each of them. Using this proposed tool, we integrate all the knowledge sources under one umbrella through which the team can search. This reduces the time taken to search all knowledge sources independently and helps the team correlate the data available across knowledge sources. Additionally, the tool can search based on the product selected and thus helps filter the results that are relevant to the selected product. The tool also provides a history of searched results; with this history, the user can go back to a previous search if the same error comes up again.

REFERENCES

i. http://www.xpolog.com/
ii. https://eventloganalyzer.codeplex.com/
iii. http://en.wikipedia.org/wiki/Piwik
iv. https://toolbox.googleapps.com/apps/loganalyzer/
v. http://en.wikipedia.org/wiki/JIRA
vi. http://www.google.com/custom?q
vii. http://nxlog-ce.sourceforge.net/
viii. http://sourceforge.net/projects/syslog-analyzer/
ix. https://confluence.atlassian.com/display/JIRA/JIRA+Requirements


A Web Enabled Wireless Sensor Networks System For Precision Agriculture Applications Using Internet Of Things

Sowmya L¹, Krishna Kumar P R²

¹Student of M.Tech (CSE), IV Semester

²Associate Professor, Dept. of M.Tech (CSE), Cambridge Institute of Technology, Bengaluru-036

sowmyaklrsv@gmail.com, rana.krishnakumar@citech.edu.in

ABSTRACT - Environmental monitoring systems and sensor systems have increased in importance over the years. However, increases in measurement points mean increases in installation and maintenance cost. Moreover, measurement points, once built and installed, can be tedious to relocate in the future. Therefore, the purpose of this work is to present a project called "A web enabled wireless sensor network system for precision agriculture application using Internet of Things", which is capable of intelligently monitoring agricultural conditions in a pre-programmed manner. The proposed system consists of three stations: Sensor Node, Router and Server. To allow better monitoring of the climate conditions in an agricultural environment such as a field or greenhouse, the sensor station is equipped with several sensor elements such as temperature, humidity, pollution and soil moisture sensors. The communication between the sensor node and the server is achieved via wireless ZigBee modules. The overall system architecture shows advantages in cost, size, flexibility and power. It is believed that the outcomes of the project open opportunities for further research and development of a ZigBee-based wireless sensor network as a portable and flexible sensing system for an agricultural environment.

Index Terms – Internet of Things, Precision Agriculture, Zigbee based Wireless Sensor Network.

1. INTRODUCTION

Agriculture products are dependent upon environmental factors where plant growth and development are largely affected by the conditions experienced. Similarly diseases that occur due to environmental factors can cause plant growth to be significantly affected.

Agriculture environments such as fields and greenhouses allow growers to produce plants with an emphasis on agriculture yield and productivity. In addition, it also provides the possibility to grow plants in environments previously not suited for the task.

In particular, the use of a greenhouse provides plants with protection from harsh weather conditions and diseases, and a controlled environment.

Agricultural environments are complex systems, where a significant change in one environmental factor can have an adverse effect on another. Environmental factors can affect survival and growth, in particular with regard to germination, sprouting, flowering and fruit development. They can also indicate an increased risk of disease and be used for prediction of upcoming changes in the environment. It is therefore of particular interest to monitor these environmental factors, especially for any control and management systems that might be implemented.

Temperature, humidity, pollution and soil moisture are the variables of interest to growers. Manual collection of data for the desired factors can be sporadic rather than continuous and can introduce variations from incorrect measurement taking. This can make it difficult to control these important factors. Sensor networks have been deployed for a wide variety of applications, and awareness has increased with regard to introducing this technology into the agricultural environment. Sensors are becoming the solution to many existing problems in industry with their ability to operate in a wide range of environments.

Sensor nodes can reduce the time and effort required to monitor an environment. This method reduces the risk of information being lost or misplaced. It also allows placement in critical locations without the need to put personnel at risk. Monitoring systems can permit quicker response times to adverse factors and conditions, better quality control of produce and lower labour cost. This technology allows remote measurement of factors such as temperature, humidity, soil moisture and pollution.

In agricultural field studies, sensing devices are mainly needed for two purposes: (i) to sense and communicate with actuators, and (ii) to sense the parameters and send the information to a remote base station for expert analysis.

In this paper, an attempt has been made to develop KrishiSense, a web-enabled WSN system for agriculture applications using IoT that integrates the Open Geospatial Consortium's Sensor Web Enablement standards into the sensing system, thereby enabling interoperability between different standardized sensing devices. KrishiSense is an interconnection between multiple researchers, scientists, farmers and the extension community through multiple protocols and distributed web-connected platforms, thus facilitating human participatory sensing.

2. WORKING PRINCIPLE

In this project we are developing a system based on precision agriculture. We use a temperature sensor, humidity sensor, soil moisture sensor and pollution sensor to monitor the different parameters of the agricultural system, and a ZigBee module to wirelessly transmit the readings to a PC and update them on the server. From the server, updates of all sensor information about the agricultural site can be obtained. The soil moisture and temperature sensors are analog sensors and are connected to the ADC; the humidity sensor and the carbon dioxide and carbon monoxide sensors are digital sensors and are connected directly to the controller. All these sensors are monitored, and the data they collect are updated on the PC, which in turn updates the server. The owner can remotely see any activity on his or her PDA or smartphone by going to the server, making this a very efficient way of monitoring the agricultural site and increasing productivity.
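The gateway software is not shown in the paper; purely as an illustration of the flow just described, the sketch below reads sensor frames arriving over the ZigBee module's serial link on the PC side and forwards each reading to the server. The serial port, the JSON line frame format, the server URL and the use of the pyserial and requests packages are all assumptions.

```python
import json
import serial    # pyserial, assumed available on the PC gateway
import requests

PORT = "/dev/ttyUSB0"                            # assumed ZigBee coordinator port
SERVER_URL = "http://example.org/api/readings"   # placeholder server endpoint

def read_frame(link):
    """Assume each frame is one JSON line, e.g.
    {"temp": 29.4, "humidity": 61, "soil": 512, "co2": 417}."""
    return json.loads(link.readline().decode().strip())

def main():
    with serial.Serial(PORT, 9600, timeout=10) as link:
        while True:
            reading = read_frame(link)
            # Forward the reading to the server so the owner can view it
            # remotely from a PDA or smartphone.
            requests.post(SERVER_URL, json=reading, timeout=5)

if __name__ == "__main__":
    main()
```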

3. SYSTEM ARCHITECTURE

The system architecture consists of three parts: (i) the sensing part, (ii) the server part and (iii) the client part (Fig 1). All three parts are interconnected over the Internet, forming a WSN-based closed-loop Internet of Things platform.

Figure 1. System Architecture

The system architecture uses a low-powered system-on-chip single-board computer as the data collection and dissemination platform. It also acts as a Wi-Fi hotspot for the other sensor nodes and as an access point for the farmers/stakeholders.

3.1 Sensing Part

The sensing system comprises SoC hardware with a Linux operating system and software such as Apache. A number of sensors, such as temperature, humidity and soil temperature, are connected to the hardware, with a sampling interval of at least 15 minutes. The sensing system transfers the information to the remote server through FTP.
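A minimal sketch of the node-side loop just described, assuming the observations are appended to a daily CSV file and pushed to the remote server over FTP at the 15-minute interval; the host name, the anonymous login, the file naming and the placeholder read_sensors() are assumptions.

```python
import csv
import time
from datetime import datetime
from ftplib import FTP

FTP_HOST = "sos.example.org"   # assumed remote SOS server address
INTERVAL_S = 15 * 60           # minimum 15-minute sampling interval

def read_sensors():
    """Placeholder: return one observation from the attached sensors."""
    return {"time": datetime.now().isoformat(),
            "air_temp": 0.0, "humidity": 0.0, "soil_temp": 0.0}

def upload(path):
    with FTP(FTP_HOST) as ftp:     # anonymous login assumed
        ftp.login()
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {path}", fh)

while True:
    row = read_sensors()
    fname = f"obs_{datetime.now():%Y%m%d}.csv"
    with open(fname, "a", newline="") as fh:
        csv.DictWriter(fh, fieldnames=row.keys()).writerow(row)
    upload(fname)                  # re-upload the growing daily file
    time.sleep(INTERVAL_S)
```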

3.2 Server Part

The server part acts as the data assimilation platform of the sensing system. The remote SOS server facilitates the FTP communication with the field level, which in turn simplifies the remote configuration of the field sensor nodes.

3.3 Client Part

The client part of the sensing system can communicate through multiple protocols and multi-modal communication platforms such as mobile and the Internet. The sensing system's configuration tasks, such as setting the information collection and transfer interval, changes in hardware and fault detection, can be authenticated through the remote configuration module.

4. RESULTS

The web-enabled WSN system for agriculture using IoT has been designed to precisely monitor citrus crop resources such as soil, water and weather requirements and their management in Vidarbha, a semi-arid tropical region of Maharashtra, India. The guidelines for automatic weather station deployment were followed for sensor placement in the agricultural sensing system.

The main improvements of this project over existing sensing systems are: (i) real-time processing of the raw analog voltage information, (ii) conversion of raw observations into real values, and (iii) real-time updating of observations into the SOS database. These facilitate the interoperability of the agricultural system with any other sensing system.

The placement of the various sensors is as follows: 1) two sensors each for temperature and humidity are placed at a height of 5 m from the ground, 2) similarly, two sensors are placed at a height of 2.5 m from the ground, 3) at a radial distance of 5 m from the orange tree, two soil temperature sensors and one soil moisture sensor are placed at depths of 15, 30 and 30 cm respectively, and 4) the remaining channel is closed using a pull-down resistor (Figure 2).
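The results mention converting the raw analog voltage readings into real values before they are inserted into the SOS database; the sketch below shows one such conversion for an assumed 10-bit ADC and a linear analog temperature sensor. The reference voltage and the sensor scale factors are illustrative assumptions, not values from the paper.

```python
ADC_BITS = 10   # assumed ADC resolution
V_REF = 3.3     # assumed reference voltage of the SoC board

def adc_to_voltage(raw):
    """Convert a raw ADC count (0 .. 2**ADC_BITS - 1) into volts."""
    return raw * V_REF / (2 ** ADC_BITS - 1)

def voltage_to_celsius(volts, mv_per_deg=10.0, offset_mv=500.0):
    """Linear sensor assumption: 10 mV per degree C with a 500 mV offset at 0 C."""
    return (volts * 1000.0 - offset_mv) / mv_per_deg

raw_count = 220
temp_c = voltage_to_celsius(adc_to_voltage(raw_count))  # about 21 degrees C
```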


Figure 2. Agricultural sensing system field layout

5. CONCLUSION

The study showed that, with proper selection of the data processing location, a real-time agro-meteorological sensing system can be designed. The web-enabled WSN system for agriculture using IoT helps plants grow in better conditions. Manpower can be reduced by applying this method, and it also helps out-of-season plants to be grown, since changes in the weather need not affect plant growth. Finally, the main application is that the system can be used to find out which geographical areas are suitable for agriculture.

REFERENCES

i. S. Li, J. Cui, Z. Li, "Wireless Sensor Network for Precise Agriculture Monitoring", Fourth International Conference on Intelligent Computation Technology and Automation, Shenzhen, China, March 28-29, 2011.

ii. A. Piedra, F.B. Capistros, F. Dominguez, A. Touhafi, "Wireless Sensor Networks for Environmental Research: A Survey on Limitations and Challenges", EuroCon 2013, Zagreb, Croatia, 1-4 July, 2013; F. Soto, J. Suardiaz, et al., "Wireless Sensor Networks for Precision Horticulture in Southern Spain", Computers and Electronics in Agriculture, vol. 68, pp. 25-35, 2009.

iii. V.M. Patil, S.S. Durbha, J. Adinarayana, "Standards-Based Sensor Web for Agro-Informatics Applications", International Geoscience and Remote Sensing Symposium (IGARSS 2012), Munich, Germany, pp. 6681-6684, 22-27 July, 2012.

iv. A. Piedra, F.B. Capistros, F. Dominguez, A. Touhafi, "Wireless Sensor Networks for Environmental Research: A Survey on Limitations and Challenges", EuroCon 2013, Zagreb, Croatia, 1-4 July, 2013.

v. I. Mampentzidou, E. Karapistoli, A.A. Economides, "Basic Guidelines for Deploying Wireless Sensor Networks in Agriculture", Fourth International Workshop on Mobile Computing and Networking Technologies, pp. 864-869, 2012.

vi. M. Botts, A. Robin, J. Davidson, I. Simonis, "OpenGIS Sensor Web Enablement (SWE) Architecture Document", Open Geospatial Consortium, OGC 06-021r1, 2006, http://www.opengeospatial.org/standards, accessed on: 26 Nov., 2013.



A Survey on Developing an Efficient Optimization Technique for Cluster Head Selection in Wireless Sensor Network

Sridhar R.¹, Dr. N Guruprasad²

¹Dept of ISE, Global Academy Of Technology, Bangalore

²Dept. of CSE, Raja Rajeswari College Of Engineering, Bangalore

srimln@yahoo.com, nguruprasad18@gmail.com

Abstract - This paper proposes an efficient optimization technique for the selection of cluster heads with distinctive features, which will produce dynamic, energy-efficient wireless sensor networks. It proposes an energy-efficient artificial bee colony (ABC) algorithm for cluster head selection in wireless sensor networks. The optimization is in the sense that the wireless sensor network chooses the cluster head and then provides efficient sensor readings with fewer nodes and reduced energy consumption.

The cluster heads perform effective aggregation of the sensor readings. The sensor readings are added dynamically to the cluster head from the sensor nodes. The cluster head then aggregates the sensor readings and passes them to the base station (BS). The expected outcome of the paper is a remarkable reduction in energy consumption because of the dynamic and efficient optimization technique for the selection of cluster heads in wireless sensor networks.

Keywords – Wireless Sensor networks, cluster head, optimization

1. Introduction

A wireless sensor network consists of tiny sensing devices, which normally run on battery power. Sensor nodes are densely deployed in the region of interest. Each device has sensing and wireless communication capabilities, which enable it to sense and gather information from the environment and then send the data and messages to other nodes in the sensor network or to the remote base station [16]. Wireless sensor networks have been envisioned to have a wide range of applications in both military and civilian domains [10]. Due to the limited energy of sensor nodes, researchers have designed many energy-efficient routing protocols to prolong the lifetime of sensor networks [21]. The energy source of sensor nodes in wireless sensor networks (WSNs) is usually a battery, which is undesirable, even impossible, to recharge or replace. Therefore, improving the energy efficiency and maximizing the network lifetime are the major challenges in sensor networks [20].

Considering the limited energy capabilities of an individual sensor, a sensor node can sense only a very limited area, so a wireless sensor network has a large number of sensor nodes deployed at very high density (up to 20 nodes/m) [22, 5, 12], which causes severe problems such as scalability, redundancy and radio channel contention [16]. Energy efficiency is one of the core challenges in wireless sensor networks because energy is scarce and valuable. In order to minimize energy expenditure and maximize network lifetime,

Minimizing the number of communications by eliminating or aggregating redundant sensed data saves much amount of energy

[13]. Hierarchical or cluster-based routing, are well-known techniques with special advantages related to scalability and efficient communication. As such, the concept of hierarchical routing is also utilized to perform energy-efficient routing in

WSNs. In a hierarchical architecture, higher energy nodes can be used to process and send the information while low energy nodes can be used to perform the sensing in the proximity of the target. Some of routing protocols in this group are: LEACH [14] ,

PEGASIS [11] , TEEN [19] and APTEEN [6] . Placing few heterogeneous nodes in wireless sensor network is an effective way to increase network lifetime and reliability.

However, the LEACH algorithm selects the cluster heads dynamically and frequently by round mechanism, which makes the cluster heads broadcast messages to all the general nodes in the establishment stage with additional energy consumption.

Thus modification on clustering algorithms becomes inevitable for energy efficient wireless sensor networks [4].

The concept of cluster heads alleviates the some problems regarding the hierarchical clustering to an extent. I.e., some sensor nodes become cluster heads and collect all traffic from their respective cluster. The cluster head aggregates the collected data and then sends it to its base station [6]. The nodes in wireless sensor networks are often assumed to be organized into clusters. A typical scenario is that sensor readings are first collected in each cluster by a designated node, known as cluster head, that aggregates them and sends only the result of the aggregation to the base station [1]. In a homogeneous network, cluster head uses more energy than non-cluster head nodes. As a result, network performance decreases since the cluster head nodes go down before other nodes do. Besides, if the algorithm strives to balance the energy consumption of every node, cluster heads will be selected dynamically and frequently, which results in additional energy consumption for the cluster head set-up. At the same time, some residual energy of general nodes cannot be used effectively because of receiving broadcast message frequently from the new cluster head. Thus energy efficient cluster head selection algorithm is very important issue in clustered WSNs. Hence defining the optimization technique for

Page 346

International Journal of Engineering Research

Volume No.4, Issue Special 5 cluster head selection minimizes the consumption of energy in

WSN.

2. Literature Survey

Our work is motivated by a number of prior works related to clustering in wireless sensor networks. Some of them are analyzed here.

Ozlem Durmaz Incel et al. [17] have proposed a method for fast data collection in tree-based wireless sensor networks. In their work, they explored and evaluated a number of different techniques using realistic simulation models under the many-to-one communication paradigm known as convergecast. They first consider time scheduling on a single frequency channel with the aim of minimizing the number of time slots required (the schedule length) to complete a convergecast. Next, they combined scheduling with transmission power control to mitigate the effects of interference, and showed that while power control helps in reducing the schedule length under a single frequency, scheduling transmissions using multiple frequencies is more efficient. They gave lower bounds on the schedule length when interference is completely eliminated, and proposed algorithms that achieve these bounds. They also evaluated the performance of various channel assignment methods and found empirically that for moderate-size networks of about 100 nodes, the use of multi-frequency scheduling can suffice to eliminate most of the interference. The data collection rate is then no longer limited by interference but by the topology of the routing tree. To this end, they constructed degree-constrained spanning trees and capacitated minimal spanning trees, and showed significant improvement in scheduling performance over different deployment densities. Lastly, they evaluated the impact of different interference and channel models on the schedule length.

Guoliang Xing et al. [8] have proposed a rendezvous-based data collection approach in which a subset of nodes serve as rendezvous points that buffer and aggregate data originating from sources and transfer them to the base station when it arrives. This approach combines the advantages of controlled mobility and in-network data caching and can achieve a desirable balance between network energy saving and data collection delay. They proposed efficient rendezvous design algorithms with provable performance bounds for mobile base stations with variable and fixed tracks, respectively. The effectiveness of their approach was validated through both theoretical analysis and extensive simulations.

Dan Wu et al. [3] have proposed a method that focuses on how to select a proper transmission scheme, with the goal of improving energy efficiency, e.g. prolonging the network lifetime. In particular, they model the transmission scheme selection problem as a non-transferable coalition formation game, with the characteristic function based on the network lifetime. Then, a simple algorithm based on a merge-and-split rule and the Pareto order is proposed to form coalition groups among individual sensor nodes. The resulting coalitional structure is characterized through novel stability notions and shows which transmission scheme is employed and which cluster nodes are chosen to collaborate with the cluster head. Extensive simulation results are provided to demonstrate the effectiveness of their proposed game model and algorithm.

Dali Wei et al. [2] have proposed a distributed clustering algorithm, Energy-efficient Clustering (EC), that determines suitable cluster sizes depending on the hop distance to the data sink, while achieving approximate equalization of node lifetimes and reduced energy consumption levels. They additionally proposed a simple energy-efficient multi-hop data collection protocol to evaluate the effectiveness of EC and to calculate the end-to-end energy consumption of this protocol; yet EC is suitable for any data collection protocol that focuses on energy conservation. Performance results demonstrate that EC extends network lifetime and achieves energy equalization more effectively than two well-known clustering algorithms, HEED and UCR.

Otgonchimeg Buyanjargal and Youngmi Kwon [22] have proposed a modified version of the Low Energy Adaptive Clustering Hierarchy (LEACH) protocol, a well-known energy-efficient clustering algorithm for WSNs. Their modified protocol, called "Adaptive and Energy Efficient Clustering Algorithm for Event-Driven Application in Wireless Sensor Networks (AEEC)", aims at prolonging the lifetime of a sensor network by balancing the energy usage of the nodes. AEEC gives nodes with more residual energy a higher chance of being selected as cluster head. They also used elector nodes, which take responsibility for collecting the energy information of the nearest sensor nodes and selecting the cluster head. They compared the performance of their AEEC algorithm with the LEACH protocol using simulations.

Dilip Kumar et al. [4] have studied the impact of heterogeneity of nodes, in terms of their energy, in wireless sensor networks that are hierarchically clustered. They assumed that a percentage of the population of sensor nodes is equipped with additional energy resources, that the sensor nodes are randomly distributed and not mobile, and that the coordinates of the sink and the dimensions of the sensor field are known. Homogeneous clustering protocols assume that all sensor nodes are equipped with the same amount of energy and, as a result, cannot take advantage of node heterogeneity. Adopting this approach, they introduced an energy-efficient heterogeneous clustered scheme for wireless sensor networks based on weighted election probabilities for each node to become a cluster head according to its residual energy.

Finally, the simulation results demonstrated that their proposed heterogeneous clustering approach was more effective in prolonging the network lifetime compared with LEACH.

Xiang Min et al. [20] have presented a clustering algorithm that mainly aims to reduce the total energy consumption by choosing optimum parameters. By optimizing the one-hop distance and the clustering angle, all nodes are divided into static clusters of different sizes, which maintains connectivity and reduces the energy consumption for inter-cluster communication. Besides, with a continuous working mechanism in which the cluster head acts as the local control centre, the frequency of cluster head updating is reduced, and so is the energy consumed in setting up a new cluster head. With this clustering algorithm, the total energy consumption for inter-cluster and intra-cluster communication is reduced. The simulation results show that the system lifetime is extended effectively.

3. Proposed Methodology

In 2005, Karaboga proposed the Artificial Bee Colony (ABC) algorithm, which is based on a particular intelligent behaviour of honeybee swarms. ABC was developed by observing how real bees find nectar and share information about food sources with the other bees in the hive.

The agents in ABC are the employed bee, the onlooker bee and the scout.

The employed bee stays on a food source and keeps the neighbourhood of that source in its memory.

The onlooker bee obtains information about food sources from the employed bees in the hive and selects one of the food sources to gather nectar from.

The scout is responsible for finding new food sources and new nectar.

Procedure of ABC:

Initialize (Move the scouts).

Move the onlookers.

Move the scouts only if the counters of the employed bees hit the limit.

Update the memory

Check the termination condition

The ABC procedure thus optimizes the choice of cluster head and then yields efficient sensor readings with fewer active nodes and reduced energy consumption.
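As an illustration of the procedure above, the following Python sketch drives cluster-head selection with a minimal ABC loop. The node model, the fitness function (residual energy divided by mean distance to the other nodes) and parameter names such as n_sources and limit are assumptions made for this sketch, not part of the original proposal.

import random
import math

# Hypothetical node model: each node has a position and residual energy.
random.seed(1)
NODES = [{"x": random.uniform(0, 100), "y": random.uniform(0, 100),
          "energy": random.uniform(0.2, 1.0)} for _ in range(50)]

def fitness(head_idx):
    """Higher is better: favour heads with more residual energy and a small
    average distance to the remaining nodes (assumed objective)."""
    head = NODES[head_idx]
    dists = [math.hypot(head["x"] - n["x"], head["y"] - n["y"])
             for i, n in enumerate(NODES) if i != head_idx]
    return head["energy"] / (1.0 + sum(dists) / len(dists))

def neighbour(_idx):
    """Local search step: propose a random alternative cluster-head candidate."""
    return random.randrange(len(NODES))

def abc_cluster_head(n_sources=8, limit=5, iterations=30):
    sources = [random.randrange(len(NODES)) for _ in range(n_sources)]  # scouts initialise
    trials = [0] * n_sources
    for _ in range(iterations):
        # Employed bee phase: local search around each food source.
        for i in range(n_sources):
            cand = neighbour(sources[i])
            if fitness(cand) > fitness(sources[i]):
                sources[i], trials[i] = cand, 0
            else:
                trials[i] += 1
        # Onlooker bee phase: pick sources with probability proportional to fitness.
        fits = [fitness(s) for s in sources]
        total = sum(fits)
        for _ in range(n_sources):
            i = random.choices(range(n_sources), weights=[f / total for f in fits])[0]
            cand = neighbour(sources[i])
            if fitness(cand) > fitness(sources[i]):
                sources[i], trials[i] = cand, 0
            else:
                trials[i] += 1
        # Scout phase: abandon sources whose trial counter hit the limit.
        for i in range(n_sources):
            if trials[i] >= limit:
                sources[i], trials[i] = random.randrange(len(NODES)), 0
    return max(sources, key=fitness)

print("Selected cluster head:", abc_cluster_head())

The same skeleton could be re-run each round with updated residual energies so that the selected head rotates as batteries drain, which is the behaviour the optimization is meant to encourage.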

4. Objectives

A study of recent techniques for developing a cluster head selection technique for wireless sensor networks.

Developing an optimization technique for the selection of cluster heads in a dynamic wireless sensor network environment, to achieve energy-efficient aggregation of sensor readings from the cluster head to the base station (BS).

Analysis of the proposed technique using various simulation set-ups against different existing techniques.

5. Possible outcome

The expected outcome of this work is a remarkable reduction in energy consumption, owing to the dynamic and efficient optimization technique for the selection of cluster heads in sensor networks.


References

i. L. Buttyan and T. Holczer, "Perfectly anonymous data aggregation in wireless sensor networks," in IEEE 7th International Conference on Mobile Adhoc and Sensor Systems (MASS), pp. 513-528, 2010.
ii. Dali Wei, Yichao Jin, Serdar Vural, Klaus Moessner, Rahim Tafazolli, "An Energy Efficient Clustering Solution for Wireless Sensor Networks," IEEE Transactions on Wireless Communications, vol. 10, no. 11, 2011.
iii. Dan Wu, Yueming Cai, Jinlong Wang, "A Coalition Formation Framework for Transmission Scheme Selection in Wireless Sensor Networks," IEEE Transactions on Vehicular Technology, vol. 60, no. 6, 2011.
iv. Dilip Kumar, Trilok C. Aseri, R. B. Patel, "EEHC: Energy efficient heterogeneous clustered scheme for wireless sensor networks," Computer Communications, vol. 32, pp. 662-667, 2009.
v. D. Tian and N. D. Georganas, "A Node Scheduling Scheme for Energy Conservation in Large Wireless Sensor Networks," Thesis, Multimedia Communications Research Laboratory, School of Information Technology and Engineering, University of Ottawa, 2002.
vi. Ewa Hansen, Jonas Neander, Mikael Nolin, Mats Björkman, "Efficient Cluster Formation for Sensor Networks," MRTC report ISSN 1404-3041 ISRN MDH-MRTC-199/2006-1-SE, Mälardalen Real-Time Research Centre, Mälardalen University, March 2006.
vii. G. Pei and C. Chien, "Low Power TDMA in Large Wireless Sensor Networks," Military Communications Conference, vol. 1, pp. 347-351, 2001.
viii. Guoliang Xing, Minming Li, Tian Wang, Weijia Jia and Jun Huang, "Efficient Rendezvous Algorithms for Mobility-Enabled Wireless Sensor Networks," IEEE Transactions on Mobile Computing, vol. 11, no. 1, 2012.
ix. Hnin Yu Shwe, Jiang Xiaohong, Susumu Horiguchi, "Energy saving in wireless sensor networks," Journal of Communication and Computer, vol. 6, no. 5, 2009.
x. I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, "A survey on sensor networks," IEEE Communications Magazine, pp. 102-114, 2002.
xi. J. Kulik, W. Heinzelman, and H. Balakrishnan, "Negotiation-based protocols for disseminating information in wireless sensor networks," Wireless Networks, vol. 8, no. 2/3, pp. 69-185, 2002.
xii. J. M. McCune, "Adaptability in sensor networks," Undergraduate Thesis in Computer Engineering, University of Virginia, April 2003.
xiii. K. Intae and R. Poovendran, "Maximizing static network lifetime of wireless broadcast ad hoc networks," in Proceedings of the IEEE International Conference on Communications, pp. 2256-2261, 2003.
xiv. Liyang Yu, Neng Wang, Wei Zhang and Chunlei Zheng, "GROUP: a Grid-clustering Routing Protocol for Wireless Sensor Networks," in Proceedings of Wireless Communications, Networking and Mobile Computing, pp. 1-5, 2006.
xv. M. Gerla, T. Kwon, and G. Pei, "On Demand Routing in Large Ad Hoc Wireless Networks with Passive Clustering," in Proceedings of the IEEE Wireless Communications and Networking Conference, pp. 100-105.
xvi. Mohammad Zeynali, Leili Mohammad Khanli and Amir Mollanejad, "TBRP: Novel Tree Based Routing Protocol in Wireless Sensor Network," International Journal of Grid and Distributed Computing, vol. 2, no. 4, 2009.
xvii. Ozlem Durmaz Incel, Amitabha Ghosh, Bhaskar Krishnamachari, and Krishnakant Chintalapudi, "Fast Data Collection in Tree-Based Wireless Sensor Networks," IEEE Transactions on Mobile Computing, vol. 11, no. 1, 2012.
xviii. W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "Energy-Efficient Communication Protocol for Wireless Microsensor Networks," in Proceedings of the 33rd International Conference on System Sciences, Maui, Hawaii, 2000.
xix. W. Ye, J. Heidemann, and D. Estrin, "An Energy-Efficient MAC Protocol for Wireless Sensor Networks," in Proceedings of IEEE INFOCOM, pp. 1567-1576, 2002.
xx. Xiang Min, Shi Wei-ren, Jiang Chang-jiang and Zhang Ying, "An energy efficient clustering algorithm for maximizing lifetime of wireless sensor networks," AEU - International Journal of Electronics and Communications, vol. 64, no. 4, pp. 289-298, 2010.
xxi. Xianghui Wang and Guoyin Zhang, "DECP: A Distributed Election Clustering Protocol for Heterogeneous Wireless Sensor Networks," Computational Science, vol. 4489/2007, pp. 105-108, 2007.
xxii. M. Ye, C. F. Li, G. Chen, J. Wu, "EECS: An Energy Efficient Clustering Scheme in Wireless Sensor Networks," in Proceedings of the IEEE International Performance Computing and Communications Conference, pp. 535-540, 2005.


Spatial and Location Based Rating Systems

Vinay Kumar M., N. Rajesh

Dept. of ISE, The National Institute of Engineering, Mysore.

Affiliated to Visveswaraya Technological University, Belgavi, Karnataka

{vinaym884, nrajeshin}@gmail.com

Abstract: Recommender systems form an integral part of one's day-to-day activities in today's world of the Internet. We come across recommender systems in many fields during our day-to-day online transactions. However, earlier systems are not well equipped with spatial ratings of items based on location. In this paper we propose an efficient and scalable recommender system which uses location-based ratings for rating items.

Index Terms — spatial ratings, filtering, preference locality, collaborative filtering, recommender systems.

I. INTRODUCTION

With the advent of the Internet we come across many day-to-day transactions being done over networks. When dealing with such online transactions, viz. mobile Internet, online ticket booking, online shopping, online money transfer and so on (a popular example in today's Internet world is Paytm, a widely used method for online money transfer, online recharges and mobile banking), we also often come across large inventories with a large number of listed items. In such cases we look at the ratings given to different items by different users so as to shortlist items based on their recommendations. Often these recommendations lead us to the items best suited to our requirements. Even though there are plenty of advantages in utilizing such ratings and recommendations, these systems have certain limitations: they often do not consider the spatial location of the users who have given the ratings. In this paper we utilize location-based ratings for the set of items listed in the inventory. For instance, web-based applications such as MovieLens and Netflix provide location-based ratings for different items contributed by community members. Currently, a myriad of applications can produce location-based ratings that embed user and/or item locations. Basically, there are three novel classes of location-based ratings, namely spatial ratings for non-spatial items, non-spatial ratings for spatial items, and spatial ratings for spatial items. Certain location-aware networks (e.g., Foursquare [4] and Facebook Places [5]) allow users to "check in" at spatial destinations (e.g., restaurants) and rate their visit, and are thus capable of associating both user and item locations with ratings. Based on the spatial ratings provided by users, new users who are unaware of certain items and their quality can learn many hidden things about different products without actually experiencing them, saving both time and effort. Hence location-aware systems are becoming popular, and the spatial ratings provided by such systems are better, if not the best, when compared with earlier recommender systems.

II. MATERIAL AND METHODOLOGY

A. Location aware query model

Location-based systems provide recommendations based on the location-aware ratings present in the given system. They support both continuous and snapshot queries.

B. Collaborative filtering technique for item filtering

This is a popular technique used by recommender systems. There are two notions of it, one narrow and the other more general. In the general sense, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. [2]. Applications of collaborative filtering typically involve very large data sets. In the newer, narrower sense, collaborative filtering automatically filters items based on the preferences or tastes of users. The difficulties with this technique include the need for active user participation, an easy way of representing the user's interests to the system, and algorithms that can match people with similar interests.

C. Filtering technique based on content

This is another popular filtering technique used in designing recommender systems. It is based on the description of the given item as well as the preferences of the users derived from their profiles. A widely used approach is the vector space representation. The user profile is created from a model of the user's preferences and the history of the user's interaction with the recommender system. Machine learning techniques such as cluster analysis, decision trees, Bayesian classifiers and artificial neural networks are used to estimate the probability that the user will like an item. Direct feedback from a user, usually in the form of a like or dislike button, can be used to assign higher or lower weights to the importance of certain attributes (using Rocchio classification or similar techniques). A number of content-based recommender systems aim at providing movie recommendations, such as the Internet Movie Database, Rotten Tomatoes, Jinni, etc.

D. Hybrid recommender systems

A hybrid recommender system combines collaborative filtering and content-based filtering, and such approaches have been found to be useful in many scenarios. They can be implemented in several ways, one of which is to make content-based and collaborative predictions separately and then combine them. Netflix is a popular example of a hybrid system; it makes comparison-based recommendations by identifying similar users with similar actions.

E. Certain popular recommender systems

1. Mobile recommender systems: this is a trending area of research in the field of recommender systems. With the advent of smart phones it is now possible to offer personalized, context-sensitive recommendations. It is a very difficult area of research because mobile data is heterogeneous, noisy and more complex for recommender systems to deal with, and such systems also suffer from the transplantation problem: recommendations may not apply in all regions (for instance, it would be unwise to recommend a recipe in an area where all of the ingredients may not be available). One popular example of a mobile recommender system, which requires basic Internet connectivity, offers potentially profitable routes to taxi drivers in a city. This system takes as input GPS traces of the routes that taxi drivers took while working, which include location (latitude and longitude), time stamps, and operational status (with or without passengers). It then recommends a list of pickup points along a route that will lead to optimal occupancy times and profits. This type of system is obviously location-dependent, and as it must operate on a handheld or embedded device, the computation and energy requirements must remain low.

2. Risk-aware recommender systems: these are intelligent recommender systems that take into account the varieties of risk associated with recommendations and the steps that need to be taken to counter them. The performance of the recommender system also depends on the risk factor, hence risk has to be taken into account before providing any recommendations.

F. Preference Locality

Preference locality is a popular idea which suggests that users from a spatial region (e.g., a neighbourhood) prefer items (e.g., movies, destinations) that differ markedly from the items preferred by users in other, even adjacent, regions. This technique is adopted widely in location-based rating systems in order to provide recommendations.

III. RESULTS AND TABLES

Fig:- Similarity calculation based on items.

In order to compute the similarity we use a cosine similarity function based on item similarity. The cosine similarity between items ip and iq is calculated using only the co-rated dimensions, i.e., the ratings of users who rated both items. Preference locality is the technique used for spatial user ratings for non-spatial items. Three things are required for such recommendations: locality, scalability and influence.

For spatial user ratings on non-spatial items, i.e., the tuple (user, ulocation, rating, item), the location-aware system produces recommendations by employing a user partitioning technique that exploits preference locality. This technique uses an adaptive pyramid structure to partition ratings by their user location attribute into spatial regions of varying sizes at different hierarchies. For a querying user located in a region R, we apply an existing collaborative filtering technique that utilizes only the ratings located in R.

Travel penalty is the technique we utilize in order to provide recommendations for spatial items using non-spatial ratings, i.e., the tuple (user, rating, item, ilocation). Both techniques, travel penalty and user partitioning, are used together to produce recommendations for spatial items using spatial ratings, i.e., the tuple (user, ulocation, rating, item, ilocation).
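A minimal sketch of the two ideas just described, with a fixed grid standing in for the adaptive pyramid and a linear travel penalty; all names (cell_of, region_ratings, alpha, the sample coordinates) are illustrative assumptions.

import math

def cell_of(lat, lon, cell_size=1.0):
    """Map a user location to a grid cell id (a fixed grid stands in for the
    adaptive pyramid of varying-size regions used in the paper)."""
    return (int(lat // cell_size), int(lon // cell_size))

def region_ratings(ratings, user_loc):
    """User partitioning: keep only ratings whose user location falls in the
    same region R as the querying user."""
    region = cell_of(*user_loc)
    return [r for r in ratings if cell_of(*r["ulocation"]) == region]

def travel_penalty(user_loc, item_loc, alpha=0.05):
    """Penalise spatial items by their distance from the querying user."""
    d = math.hypot(user_loc[0] - item_loc[0], user_loc[1] - item_loc[1])
    return alpha * d

def recommend_spatial(items, predicted, user_loc, k=3):
    """Rank spatial items by predicted rating minus travel penalty."""
    scored = [(predicted[i["id"]] - travel_penalty(user_loc, i["ilocation"]), i["id"])
              for i in items]
    return [iid for _, iid in sorted(scored, reverse=True)[:k]]

# Example: ratings carry a user location; items carry an item location.
ratings = [{"user": "u1", "ulocation": (12.9, 77.6), "item": "i1", "rating": 5},
           {"user": "u2", "ulocation": (12.8, 77.5), "item": "i2", "rating": 4},
           {"user": "u3", "ulocation": (28.6, 77.2), "item": "i1", "rating": 2}]
local = region_ratings(ratings, user_loc=(12.95, 77.59))
print([r["user"] for r in local])            # only users from the querying user's region

items = [{"id": "i1", "ilocation": (12.9, 77.6)}, {"id": "i2", "ilocation": (13.3, 77.7)}]
print(recommend_spatial(items, {"i1": 4.0, "i2": 4.2}, user_loc=(12.9, 77.6)))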

Fig:- Item based collaborative model building.

Fig:- Pyramidal data structure technique for providing recommendations.

A. Model building and recommendation generation

Given a querying user u, recommendations are produced by computing u's predicted rating P(u, i) for each item i not rated by u [9]. Before this computation, we reduce each similarity list L to contain only items rated by user u.
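The extracted text does not reproduce the prediction formula itself; the sketch below assumes the standard item-based collaborative filtering prediction of Sarwar et al. [9], i.e., a weighted sum of the user's own ratings over the similarity list reduced to items the user has rated, with item-item similarity computed by cosine over co-rated dimensions.

import math

def cosine_corated(ratings_p, ratings_q):
    """Cosine similarity between two items over co-rated dimensions only,
    i.e., using ratings from users who rated both items."""
    common = set(ratings_p) & set(ratings_q)
    if not common:
        return 0.0
    dot = sum(ratings_p[u] * ratings_q[u] for u in common)
    norm_p = math.sqrt(sum(ratings_p[u] ** 2 for u in common))
    norm_q = math.sqrt(sum(ratings_q[u] ** 2 for u in common))
    return dot / (norm_p * norm_q)

def predict(user_ratings, sim_list):
    """P(u, i): weighted sum over the similarity list reduced to items rated by u."""
    reduced = [(j, s) for j, s in sim_list if j in user_ratings]
    if not reduced:
        return 0.0
    return (sum(s * user_ratings[j] for j, s in reduced) /
            sum(abs(s) for _, s in reduced))

# item -> {user: rating} (made-up data for illustration)
item_ratings = {"ip": {"u1": 5, "u2": 3, "u3": 4},
                "iq": {"u1": 4, "u2": 2},
                "ir": {"u2": 5, "u3": 4}}
sim_list_ip = [(q, cosine_corated(item_ratings["ip"], item_ratings[q]))
               for q in ("iq", "ir")]
print(round(predict({"iq": 4, "ir": 5}, sim_list_ip), 2))  # predicted rating of ip for this user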

B. Our Contribution to the paper

Recommender systems are not a new topic; they have had an impact since the early days of the Internet. Even though there have been many improvements from traditional recommender systems to today's well-equipped ones, there is always scope for improvement. Even in today's location-based rating systems there are no proper authentication methods for registering the users who give recommendations on items. Here we propose to design the recommender system with a sophisticated encryption mechanism to make sure that the users who provide ratings are genuine. We also propose to employ an automatic database alteration mechanism in which we maintain the database of recommendations and ratings for each item and, after a certain period of time, the database is automatically refreshed in order to eliminate the ratings of inactive users. Only the ratings provided by active users within a certain period are retained, because maintaining older recommendations may not be useful and there is a chance that the quality of the listed items deteriorates over time. We also prefer to maintain databases in every major location and make provision for users to give ratings related to their experience by registering themselves on the local website, and the ratings provided by the users should be monitored only by the central database administrator, not by the owners of the organizations.

We adopt the RSA encryption algorithm because of its interoperability: one cannot encrypt or decrypt a message using a different algorithm than the one used to create it, and it is freely available for non-commercial use.

c = ENCRYPT(m) = m^e mod n    (1)
m = DECRYPT(c) = c^d mod n    (2)

The values of m, c, d, e and n are computed by following the standard steps of the RSA algorithm.
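A toy numeric illustration of equations (1) and (2) with deliberately tiny, insecure parameters; production RSA uses large primes and padding, and the key values below are purely illustrative.

from math import gcd

# Toy RSA key generation with tiny primes (insecure, illustration only).
p, q = 61, 53
n = p * q                      # modulus n = 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent, e*d ≡ 1 (mod phi), Python 3.8+

def encrypt(m):
    return pow(m, e, n)        # c = m^e mod n, equation (1)

def decrypt(c):
    return pow(c, d, n)        # m = c^d mod n, equation (2)

m = 123
c = encrypt(m)
print(c, decrypt(c) == m)      # ciphertext and round-trip check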

Fig:- Depicts the application of RSA in providing both confidentiality and authentication.

C. Some experimental results of the location-aware rating system

Fig-Quality experiments for varying locality.

Fig- Quality experiments for varying answer sizes.

D. Experimental evaluation of location aware systems

The experimental evaluation is based on three data sets.

(1) Foursquare: a real data set consisting of spatial user ratings for spatial items derived from Foursquare user histories.

(2) MovieLens: a real data set consisting of spatial user ratings for non-spatial items taken from the popular MovieLens recommender system [7]. The Foursquare and MovieLens data are used to test recommendation quality.

(3) Synthetic: a synthetically generated data set consisting of spatial user ratings for spatial items for venues in the state of Minnesota, USA; we use this data to test scalability and query efficiency. The quality of the recommendation results is depicted in the figures above.

IV. CONCLUSION

Our proposed spatially aware recommender rating system tackles problems which were not solved by earlier traditional recommender systems. Our system deals with three kinds of ratings: (a) spatial ratings for non-spatial items, (b) spatial ratings for spatial items, and (c) non-spatial ratings for spatial items. In addition to these methods we also add a proper authentication mechanism for the users who give ratings for the items. We adopt travel penalty and user partitioning techniques in order to support spatial items and spatial ratings respectively; both techniques can be applied separately or in concert to provide location-based ratings. The experimental results and evaluation show that spatial recommender systems are more efficient and scalable than traditional recommender systems.

ACKNOWLEDGMENT

The successful publishing of this paper would be incomplete without the mention of the people who made it possible and whose constant guidance crowned my effort with success.

I would like to thank the Principal of our college, Dr. G. L. Shekar, for his wholehearted support in facilitating the publication of this paper.

I would like to thank my guide, Sri N. Rajesh, Assistant Professor, Department of ISE, NIE-Mysore, for his valuable inputs and guidance in the presentation of this paper.

Finally, I would like to thank all the teaching and non-teaching staff for their co-operation.


REFERENCES

i. G. Linden et al., "Amazon.com Recommendations: Item-to-Item Collaborative Filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, 2003.
ii. Netflix: http://www.netflix.com.
iii. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," in CSCW, 1994.
iv. Foursquare: http://foursquare.com.
v. The Facebook Blog, "Facebook Places": http://tinyurl.com/3aetfs3.
vi. G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," TKDE, vol. 17, no. 6, pp. 734-749, 2005.
vii. MovieLens: http://www.movielens.org/.
viii. New York Times, "A Peek Into Netflix Queues": http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflixmap.html.
ix. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-Based Collaborative Filtering Recommendation Algorithms," in WWW, 2001.
x. J. S. Breese, D. Heckerman, and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in UAI, 1998.
xi. W. G. Aref and H. Samet, "Efficient Processing of Window Queries in the Pyramid Data Structure," in PODS.


Energy Efficient Zone-based Proactive Source Routing to Minimize Overhead in Mobile Ad-hoc Networks

Lakshmi K. M., Levina Tukaram

Dept Of CSE, Alpha College Of Engineering, Bangalore, India lakshmi.km682@gmail.com, levinajunias@gmail.com

Abstract: Opportunistic data forwarding has become a hot topic in multihop wireless networking, but it is not used in mobile ad hoc networks (MANETs) due to the lack of an efficient, lightweight proactive source routing scheme with strong source routing capability. Proactive Source Routing (PSR) uses Breadth-First Spanning Trees (BFSTs) and maintains more network topology information to facilitate source routing. Its overhead is much smaller than that of traditional DV-based protocols, link-state (LS) routing protocols and reactive source routing protocols, but the computational and memory overhead involved in maintaining BFSTs to reach every node in denser networks is high. In this paper a Zone-based Proactive Source Routing protocol is proposed. The Zone Routing Protocol (ZRP) uses partition-based routing: source routing inside the zone and on-demand routing outside the zone. This approach combines the advantages of both proactive and zone-based routing protocols. The simulations show that Z-PSR, the zone-based proactive source routing protocol, performs better than PSR.

Keywords: PSR, BFST, Link State, Source routing, Ad-hoc Network.

I. INTRODUCTION

Mobile ad-hoc networks (MANETs) are self-organized and self-configurable wireless communication networks. A MANET is a complex distributed system containing wireless mobile nodes which can freely move and dynamically self-organize into arbitrary and temporary ad-hoc network topologies. It allows people and devices to seamlessly internetwork in areas without pre-existing communication infrastructure, e.g., battlefield communications, emergency operations and disaster recovery environments. A great deal of research results have been published since its early days in the 1980s [i]. The salient research challenges in this area are link access control, security, end-to-end transfer and providing support for real-time multimedia streaming [ii]. In the research on MANETs, the network layer has received a considerable amount of attention, and a large number of routing protocols with differing objectives for various specific needs have been proposed [iii].

Figure 1 shows an example of a mobile ad-hoc network and its communication technology. As shown in the figure, an ad hoc network might consist of several home-computing devices, including laptops, cellular phones and so on. Each node can communicate directly with any other node that resides within its transmission range. To communicate with nodes beyond this range, a node needs to use intermediate nodes to relay messages hop by hop.

Opportunistic data forwarding utilizes the broadcast nature of wireless communication links [iv] when data packets are handled in a multihop wireless network. In traditional IP forwarding, intermediate nodes look up a forwarding table to find a dedicated next hop, whereas opportunistic data forwarding broadcasts the data packet and allows potentially multiple downstream nodes to act on it. One of the initial works on opportunistic data forwarding is selection diversity forwarding by Larsson [v]. In that work the transmitter sends the packet to multiple receivers, selects the best forwarder among those receivers that successfully received the data, and requests the selected node to forward the data. The overhead of this approach is high and should be reduced before it can be implemented in practical networks. This issue was addressed in the seminal work on ExOR [vi], which outlines a solution at the link and network layers. In ExOR, all nodes in the network can overhear all packets on the air and therefore more nodes can potentially forward a packet, provided they are included in the forwarder list carried by the packet. The contention feature of the medium-access-control (MAC) sublayer is effectively utilized so that the forwarder closest to the destination accesses the medium. Therefore, the MAC sublayer can determine the actual next-hop forwarder and make better use of long-haul transmissions.

A lightweight proactive source routing (PSR) protocol has been proposed to facilitate opportunistic data forwarding in MANETs. In this protocol, each node maintains a breadth-first search spanning tree of the network rooted at itself. This routing information is periodically exchanged among neighbouring nodes to keep the network topology information up to date, and hence PSR allows a node to have full-path information to all other nodes in the network. The communication cost is only linear in the number of nodes. Thus, it supports both source routing and conventional IP forwarding. However, the computational and memory overhead involved in maintaining the BFSTs to reach every node in denser networks is high.

In this paper, Z-PSR (Zone-based Proactive Source Routing) is proposed: a protocol which is lightweight, source routed, uses Breadth-First Spanning Trees and is based on PSR [vii] and ZRP [viii].

Fig 1: Mobile Ad-hoc Network


The remainder of this paper is organized as follows. Section II reviews related work on routing protocols in MANETs. Section III describes the design and implementation details of our proposed Zone-based proactive source routing scheme. The computer simulation, related experimental results, and comparisons between PSR and Z-PSR are presented in Section IV. Section V concludes this paper with a discussion of future research.

II. RELATED WORK

Routing is the process of establishing path and forwarding packets from source node to destination node. MANET routing protocols could be broadly classified into three major categories: proactive, reactive and hybrid as shown in Figure 2.

Fig 2. Routing Protocols in MANETs (Proactive/Table-driven: DSDV, OLSR; Reactive/On-demand: DSR, AODV; Hybrid: ZRP, TORA)

A. Proactive Routing Protocol or Table driven Routing protocols

In a proactive routing protocol, each mobile node maintains a routing table; when a route to a destination is needed, the routing information can be obtained from the routing table immediately.

Proactive protocols maintain the table and keep updating it as the topology of the network changes. When it is required to forward data to a particular node, the route can be obtained from the table easily and immediately, so no time is spent in a route discovery process and a shortest path can be found without delay. However, in denser networks these protocols are not suitable because of the high control traffic: periodic updates consume bandwidth and excessive flooding can lead to network clogging. Examples: DSDV [ix] (Destination-Sequenced Distance-Vector), OLSR [x] (Optimized Link State Routing).

B. Reactive Routing Protocols or On-demand routing protocols

Reactive routing protocols are also called on-demand routing protocols. These protocols are more efficient than proactive routing protocols in terms of routing overhead. The idea behind this type of protocol is to find a route between a source and a destination only when it is needed. This reduces the routing overhead, whereas overhead is higher in proactive protocols since the nodes maintain routes to all other nodes in the network regardless of whether they are used. In reactive protocols there is no need to maintain routes which are not currently being used; on-demand routing protocols thus avoid the cost of maintaining unused routes. Ad-hoc On-demand Distance Vector (AODV) [xi] and Dynamic Source Routing (DSR) [xii] are examples of reactive or on-demand protocols.

C. Hybrid Routing Protocol

A hybrid routing protocol is a combination of proactive and reactive routing protocols. It uses a table-driven approach within a given zone around the node, and a demand-driven approach is then applied outside of that zone. Examples: Zone Routing Protocol (ZRP), Temporally Ordered Routing Algorithm (TORA).

Proactive Source Routing (PSR)

PSR uses a table-driven approach and is the base for the newly proposed Zone-based proactive source routing protocol, so it is necessary to know the working of PSR in order to understand the proposed protocol. The lightweight proactive source routing (PSR) protocol facilitates opportunistic data forwarding in mobile ad-hoc networks. To facilitate source routing, PSR maintains more network topology information than distance vector (DV) routing. PSR provides every node with a breadth-first spanning tree (BFST) of the entire network rooted at itself. To achieve this, nodes periodically broadcast the tree structures they have built in each iteration. A node can expand and refresh its knowledge about the network topology by constructing a deeper and more recent BFST based on the information collected from neighbours during the most recent iteration. This routing information is then distributed to the neighbours in the next round of operation. Thus, using this routing scheme, each node has full-path information to all other nodes in the network. The communication cost of PSR is only linear in the number of nodes in the network, and both source routing and conventional IP forwarding are supported by PSR.

The details of PSR are described in the following three sections. Before that, we review some graph-theoretic terms used here. The network is modelled as an undirected graph G = (V, E), where V is the set of nodes (or vertices) in the network and E is the set of wireless links (or edges). Two nodes u and v are connected by an edge e = (u, v) ∈ E if they are close to each other and can directly communicate with a given reliability. Given a node v, N(v) denotes its open neighbourhood, i.e., {u ∈ V | (u, v) ∈ E}. Similarly, N[v] denotes its closed neighbourhood, i.e., N(v) ∪ {v}. (Refer to [14] for other graph-theoretic notions.)

A. Route Update

The update operation of PSR is iterative and distributed among all nodes in the network due to its proactive nature. At the beginning, node v is only aware of the existence of itself; therefore its BFST contains only the root node v. By exchanging BFSTs with its neighbours, it is able to construct a BFST within N[v], i.e., the star graph S_v centred at v. In each subsequent iteration, nodes exchange their spanning trees with their neighbours. Towards the end of each operation interval, node v has received a set of routing messages from its neighbours packaging their BFSTs. The most recent information from each neighbour is incorporated by node v to update its own BFST, and at the end of the period it broadcasts this tree structure to its neighbours. Formally, v has received BFSTs from some of its neighbours; the BFST received from neighbour u ∈ N(v), possibly in a recent previous iteration, is denoted T_u and cached. The union graph constructed by node v is

G_v = S_v ∪ ⋃_{u ∈ N(v)} (T_u − v).    (1)

Here, T − x denotes the operation of removing the subtree of T rooted at node x. Two special cases are: if x is not in T, then T − x = T, and if x is the root of T, then T − x = ∅. Node v then calculates a BFST of G_v, denoted T_v, and places T_v in a routing packet that it broadcasts to its neighbours.
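A small Python sketch of the update in Eq. (1): node v merges its star graph with the cached neighbour trees (each with v's own subtree removed) and recomputes a breadth-first spanning tree of the union. The dictionary-based tree representation and helper names are assumptions made for illustration.

from collections import deque

def remove_subtree(tree, x):
    """T - x: drop node x and all of its descendants from a tree given as
    {child: parent}, where the root maps to None."""
    if x not in tree:
        return dict(tree)
    doomed, changed = {x}, True
    while changed:
        changed = False
        for child, parent in tree.items():
            if parent in doomed and child not in doomed:
                doomed.add(child)
                changed = True
    return {c: p for c, p in tree.items() if c not in doomed}

def tree_edges(tree):
    return {(c, p) for c, p in tree.items() if p is not None}

def bfst(edges, root):
    """Breadth-first spanning tree of the union graph, rooted at `root`."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    parent, seen, q = {root: None}, {root}, deque([root])
    while q:
        u = q.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                parent[w] = u
                q.append(w)
    return parent

# Node 'v' with neighbours 'a' and 'b'; S_v is the star centred at v.
S_v = {"v": None, "a": "v", "b": "v"}
T_a = {"a": None, "v": "a", "c": "a", "d": "c"}   # BFST cached from neighbour a
T_b = {"b": None, "v": "b", "e": "b"}             # BFST cached from neighbour b

union_edges = tree_edges(S_v)
for T_u in (T_a, T_b):
    union_edges |= tree_edges(remove_subtree(T_u, "v"))   # G_v per Eq. (1)
print(bfst(union_edges, "v"))   # new T_v as {node: parent}, rooted at 'v'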

B. Neighbourhood Trimming

When a neighbour node is deemed lost, all of its relevant information should be removed from the topology repository maintained by the detecting node, together with its contribution to the network connectivity. This process is called neighbourhood trimming. Consider node v. The neighbour trimming procedure is triggered at v for neighbour u in either of the following cases:

1) no routing update or data packet has been received from this neighbour for a given period of time;

2) a data transmission to node u has failed, as reported by the link layer.

C. Streamlined Differential Update

In PSR, "full dump" routing messages are interleaved with "differential updates". The idea is to send the full update messages less frequently than the shorter messages containing only the difference between the current and previous knowledge of a node's routing module. The routing update is further streamlined in two ways. First, we use a compact tree representation in full-dump and differential update messages to halve the size of these messages. Second, as the network changes, every node attempts to maintain an updated BFST so that the differential update messages are even shorter.

III. ZONE-BASED PROACTIVE SOURCE ROUTING

Fig 3. Routing zone of node S with zone radius β = 2

Proactive routing uses more bandwidth to maintain routing information, while reactive routing involves long route request delays. Reactive routing protocols also flood the entire network inefficiently for route determination. The Zone Routing Protocol (ZRP) addresses these problems by combining the best properties of both approaches; ZRP can be called a hybrid reactive/proactive routing protocol.

It can be assumed that, in an ad-hoc network, the largest part of the traffic is directed to nearby nodes. Hence, ZRP reduces the proactive scope to a zone centred on each node. Maintaining routing information is easier in a limited zone, and the amount of routing information that is never used is minimized. Nodes that are farther away can still be reached with reactive routing. Since all nodes proactively store local routing information, route requests can be performed more efficiently without querying all the nodes in the network.

ZRP has a flat view of the network despite the use of zones. In this way, the organizational overhead associated with hierarchical protocols can be avoided. Hierarchical routing protocols depend on the strategic assignment of gateways or landmarks so that every node in the network can access all levels, especially the top level, and nodes belonging to different subnets must send their communication through a subnet common to both nodes, which may congest parts of the network. Since the zones overlap, ZRP can be categorized as a flat protocol; hence network congestion can be reduced and optimal routes can be detected. Further, the behaviour of ZRP is adaptive: it depends on the behaviour of the users and the current configuration of the network.

The routing zone has a radius r expressed in hops. Thus, the zone includes all nodes whose distance from the node in question is at most r hops. An example routing zone is shown in Figure 3, where the routing zone of S includes the nodes A-I, but not K. The radius is marked as a circle around the node in question. It should be noted that the zone is defined in hops, not as a physical distance. The nodes of a zone are divided into peripheral nodes and interior nodes: nodes whose minimum distance to the central node is exactly equal to the zone radius r are peripheral nodes, and nodes whose minimum distance is less than r are interior nodes. In Figure 3, the nodes A-F are interior nodes, the nodes G-J are peripheral nodes and the node K is outside the routing zone. Note that node H can be reached by two paths, one of length 2 and one of length 3 hops; since the shortest path is less than or equal to the zone radius, the node is within the zone. By adjusting the transmission power of the nodes, the number of nodes in the routing zone of Figure 3 can be regulated: lowering the power reduces the number of nodes within direct reach, and vice versa. To provide adequate reachability and redundancy, the number of neighbouring nodes should be sufficient; on the other hand, too large a coverage results in many zone members, and the update traffic becomes excessive. Further, large transmission coverage adds to the probability of local contention.

Protocol Design

The new routing protocol proposed in this paper is named Z-PSR because it combines the advantages of both PSR and ZRP. The basic problems of PSR are discussed below.

In a denser network, the overhead involved in maintaining a BFST that reaches every node in the network becomes high in PSR, and the time taken to search for a route in the set of BFSTs is also high. Even though PSR reduces the overhead in terms of communication bytes, it fails to reduce the computational and memory overhead incurred by each node in finding a route. This results in high energy consumption.

The objectives of the Zone-based proactive source routing protocol are as follows:

1. Develop a routing protocol which minimizes the computation overhead of searching for a route.

2. The protocol should reduce the memory occupied by each BFST.

3. The protocol should find a route to the destination with minimum delay.

4. Minimize energy consumption compared with the existing PSR protocol.

The following steps are taken in order to meet the above objectives.

1. Each node maintains a BFST of its one-hop or two-hop neighbours only, as opposed to PSR where every node needs to maintain a BFST reaching every other node in the network.

2. Whether to maintain a one-hop or two-hop neighbour BFST is decided by a radius parameter. If the radius is 1, a node maintains a BFST reaching its one-hop neighbours; if the radius is 2, it maintains a BFST reaching its two-hop neighbours, and so on. The simulations in this paper use radius 2.

3. When a node needs to send data to one of its one-hop or two-hop neighbours, it uses the BFST maintained at that node. When it needs to send data to other nodes (other than one- or two-hop neighbours), it sends the data to one of its two-hop neighbours which has a BFST reaching the destination.

4. The challenge in this protocol is to determine which two-hop neighbour has a BFST reaching the destination. A node receives BFSTs from all the nodes, but it need not store them; periodic update messages are also sent by the neighbouring nodes. So when a node needs to transmit data to a node which is not its one- or two-hop neighbour, it checks the updates from its neighbours to see whether one of them has a path to the destination. The BFST messages are transmitted as broadcast messages; here the protocol therefore follows the concept of the link state vector algorithm, which passes information about neighbours to all the nodes in the network.

5. Thus, only when needed does a node accept and process the broadcast messages carrying BFSTs of other nodes.

6. This reduces computation overhead and memory overhead while keeping the communication overhead at the same level as PSR. A simplified sketch of this forwarding decision is given below.
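A simplified sketch of the forwarding decision, assuming each node stores its own zone BFST (as a child-to-parent map) plus the node sets advertised in the latest broadcast updates from its zone-edge neighbours; the data structures and names are illustrative.

def next_hop(zone_bfst, dest):
    """Source-route inside the zone: walk up the parent pointers from the
    destination until we reach a direct neighbour of this node."""
    hop = dest
    while zone_bfst.get(hop) is not None and zone_bfst[zone_bfst[hop]] is not None:
        hop = zone_bfst[hop]
    return hop

def forward(node, dest, zone_bfst, neighbour_updates):
    """Z-PSR style forwarding: use the local zone BFST when the destination is
    inside the zone, otherwise hand the packet towards a peripheral (zone-edge)
    node whose last broadcast update advertised a path to the destination."""
    if dest in zone_bfst:
        return next_hop(zone_bfst, dest)                  # intra-zone source routing
    for neighbour, advertised in neighbour_updates.items():
        if dest in advertised:                            # lookup outside the zone
            return neighbour
    return None                                           # no route known yet

# Zone BFST of node 'v' (radius 2): {node: parent}, root 'v'.
zone_bfst = {"v": None, "a": "v", "b": "v", "c": "a", "d": "b"}
# Nodes advertised by the zone-edge neighbours in their most recent updates.
neighbour_updates = {"c": {"f", "g"}, "d": {"h"}}

print(forward("v", "c", zone_bfst, neighbour_updates))   # 'a' (inside the zone)
print(forward("v", "h", zone_bfst, neighbour_updates))   # 'd' (outside, via zone edge)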

IV. PERFORMANCE EVALUATION

Fig.5 shows the graphical result of length of the BFST to be stored at each node. It is clear from the graph that Z-PSR needs to store only shorter length BFSTs compared to PSR.

Fig.5 Length of BFST at each node (x-axis: Node Id, 1-16; y-axis: BFST length; series: Z-PSR, PSR)

Fig. 6 below shows the packet delivery ratio for both PSR and Z-PSR; Z-PSR maintains a 99.9% delivery ratio.

Fig.6 Packet Delivery ratio vs. Simulation Time (x-axis: simulation time in seconds, 1-140; y-axis: packet delivery ratio; series: PSR, Z-PSR)

V. CONCLUSION

Routing in ad hoc networks is always challenging. This paper proposed a routing protocol based on two existing protocols, ZRP and PSR. The simulation results show that the proposed protocol outperforms the existing PSR protocol on which it is based. The simulations can be extended by varying the number of nodes and the node mobility.

REFERENCES

i. I. Chlamtac, M. Conti, and J.-N. Liu, "Mobile ad hoc networking: Imperatives and challenges," Ad Hoc Netw., vol. 1, no. 1, pp. 13-64, Jul. 2003.
ii. M. Al-Rabayah and R. Malaney, "A new scalable hybrid routing protocol for VANETs," IEEE Trans. Veh. Technol., vol. 61, no. 6, pp. 2625-2635, Jul. 2012.
iii. R. Rajaraman, "Topology control and routing in ad hoc networks: A survey," ACM SIGACT News, vol. 33, no. 2, pp. 60-73, Jun. 2002.
iv. Y. P. Chen, J. Zhang, and I. Marsic, "Link-layer-and-above diversity in multi-hop wireless networks," IEEE Commun. Mag., vol. 47, no. 2, pp. 118-124, Feb. 2009.
v. P. Larsson, "Selection diversity forwarding in a multihop packet radio network with fading channel and capture," ACM Mobile Comput. Commun. Rev., vol. 5, no. 4, pp. 47-54, Oct. 2001.
vi. S. Biswas and R. Morris, "ExOR: Opportunistic multi-hop routing for wireless networks," in Proc. ACM Conf. SIGCOMM, Philadelphia, PA, USA, Aug. 2005, pp. 133-144.
vii. Zehua Wang, Cheng Li, Yuanzhu Chen, "PSR: A lightweight Proactive Source Routing protocol for Mobile Ad Hoc Networks," IEEE Transactions on Vehicular Technology, vol. 63, no. 2, February 2014.
viii. Zone Routing Protocol, available online: http://www.cs.mun.ca/~yzchen/papers/tvt2014.pdf.
ix. C. E. Perkins and P. Bhagwat, "Highly dynamic Destination-Sequenced Distance-Vector routing (DSDV) for mobile computers," Comput. Commun. Rev., vol. 24, pp. 234-244, Oct. 1994.
x. T. Clausen and P. Jacquet, "Optimized Link State Routing Protocol (OLSR)," RFC 3626, Oct. 2003. [Online]. Available: http://www.ietf.org/rfc/rfc3626.txt
xi. C. E. Perkins and E. M. Royer, "Ad hoc On-Demand Distance Vector (AODV) routing," RFC 3561, Jul. 2003. [Online]. Available: http://www.ietf.org/rfc/rfc3561.txt
xii. D. B. Johnson, Y.-C. Hu, and D. A. Maltz, "On the Dynamic Source Routing Protocol (DSR) for mobile ad hoc networks for IPv4," RFC 4728, Feb. 2007. [Online]. Available: http://www.ietf.org/rfc/rfc4728.txt


Homomorphic Encryption based Query Processing

Aruna T.M., M.S. Satyanarayana, Madhurani M.S.

Department of ISE, SVCE, Bangalore, Karnataka, India
aruna150888@gmail.com, satyanarayanams@outlook.com, ms.madhu2009@gmail.com

ABSTRACT: In a private database query system a client issues queries to a database server and obtains the results without learning anything else about the database and without the server learning the query. In this work we develop tools for implementing private database queries using homomorphic encryption (HE), that is, using an encryption system that supports only limited computations on encrypted data. We show that a polynomial encoding of the database enables an efficient implementation of several different query types using only low-degree computations on ciphertexts. Specifically, we study two separate settings that offer different privacy/efficiency trade-offs. In the basic client-server setting, we show that additive homomorphisms are sufficient to implement conjunction and threshold queries. We obtain further efficiency improvements using an additive system that also supports a single homomorphic multiplication on ciphertexts. This implementation hides all aspects of the client's query from the server, and reveals nothing to the client about non-matching records. To improve performance further we turn to the "Isolated-Box" architecture of De Cristofaro et al. In that architecture the role of the database server is split between two non-colluding parties. The server encrypts and pre-processes the n-record database contents and also prepares an encrypted inverted index; it sends the encrypted database and inverted index to a proxy, but keeps the decryption keys to itself. The client interacts with both server and proxy for every query, and privacy holds as long as the server and proxy do not collude. We show that using a system that supports only log(n) multiplications on encrypted data it is possible to implement conjunctions and threshold queries efficiently. We implemented our protocols for the Isolated-Box architecture using the homomorphic encryption system by Brakerski, and compared it to a simpler implementation that only uses Paillier's additively homomorphic encryption system. The implementation using somewhat homomorphic encryption was able to handle a query with a few thousand matches out of a million-record database in just a few minutes, far outperforming the implementation using additively homomorphic encryption.

Keywords: Cipher Text, Homomorphic Encryption, Threshold, Non Colluding Parties.

1. INTRODUCTION

Enabling private database queries is an important (and hard) research problem arising in many real-world settings. The problem can be thought of as a generalization of symmetric private information retrieval (SPIR) [1, 2] where clients can retrieve records by specifying complex queries. For example, the client may ask for the records of all people of age 25 to 29 who also live in Bangalore, and the server should return these records without learning what the query was or even how many records match the query. Unfortunately, being a generalization of SPIR, private database queries is subject to all the same inherent inefficiency constraints as SPIR, making the design of practical schemes for private database queries a challenging task. In this work we explore the use of homomorphic encryption (SWHE) [3] for the design of private database query protocols. In particular, we show that certain polynomial encodings of the database let us implement interesting query types using only homomorphic computations involving low-degree polynomials. There are now several encryption schemes that efficiently support the low-degree homomorphic computations on encrypted data that we need [4, 5].

In this work we consider two different settings. The first is the traditional, two-party, client-server setting. In this setting the server has the database, the client has a query, and we seek a protocol that gives the client all (and only) those records that match its query without the server learning what the query is. As mentioned above, in this setting the server must process the entire database for every query (or else it would learn that the unprocessed records do not match the query). Moreover, the server has to return to the client as much data as the number of records in the database, or else it would learn some information about the number of records that match the query.

To bypass these severe limitations, we consider also a different model in which the database server is split into two entities (called here "server" and "proxy"), and privacy holds only so long as these two entities do not collude. This approach was taken in particular by De Cristofaro et al. [6], where they support private evaluation of a few simple query types and report performance very close to a non-private off-the-shelf MySQL system. However, the architecture of De Cristofaro et al. cannot handle conjunctions: the client can ask for all the records with age=25 OR name='Bob', but cannot ask for all the records with age=25 AND name='Bob'. In this work we show how to implement conjunctions, disjunctions, and threshold queries in a similar architecture.

1.1. Our Protocols

The protocols and tools we present in this work are aimed at revealing to the client the indexes of the records that match its query, leaving it to a standard follow-up protocol to fetch the records themselves. Also, we only consider honest-but-curious security in this work. Our protocols can be enhanced to handle malicious adversaries using generic tools such as [7]. It is an interesting open problem to design more efficient protocols in the malicious setting specific to the private database queries problem.

Our protocols manipulate polynomials derived from the database, using the client's query, so as to obtain a new polynomial whose roots are the indexes of the matching records. This representation is well suited for conjunction and threshold queries, since it lets us use techniques similar to the Kissner-Song protocol for (multi-)set intersection [8] (building on prior work by Freedman et al. [9]). We sketch our protocols below.

1.1.1. The Two-Party Setting

In this setting, the server has a database and the client has a secret key for a SWHE scheme. The server encodes the database as a bivariate polynomial D(x, y), where for every record number r and for every column (or attribute) a, if record r has value v for attribute a then D(r, a) = v. (The space that it takes to specify this polynomial D is roughly the same as the size of the original database.)

Consider a conjunction query specified by the attribute-value pairs {(ai, vi) : 1 ≤ i ≤ t}, i.e., the SQL query

SELECT * FROM db WHERE a1 = v1 AND ... AND at = vt

The client constructs a univariate query polynomial Q(y) such that Q(ai) = vi for i = 1, ..., t, and sends to the server the encrypted coefficients of the polynomial Q. For simplicity, we assume for now that the client also sends to the server all the attributes ai in the clear (but of course not the corresponding values vi).

Given the database polynomial D(x, y) and the encrypted query polynomial Q(y), the server uses the additive homomorphism of the cryptosystem to compute the encrypted polynomial A(x, y) = D(x, y) − Q(y). Note that for every record r in the database and every attribute ai in the query, we have A(r, ai) = 0 if and only if D(r, ai) = vi, namely, this record matches the condition ai = vi from the query.

The server returns to the client the encrypted polynomial B, and the client decrypts and factors B to find its roots, thus learning the indexes of the records that match its query. The client can use PIR or ORAM protocols to get the records themselves. In Section 6 we describe this protocol in more detail, show how to adapt it to the harder case where the attributes ai must also be kept secret, and also how to modify it to handle disjunctions and threshold queries.
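The plaintext logic of this two-party protocol can be sketched as follows, with all encryption omitted: the database is encoded as D(r, a), the client builds Q(y) by interpolation, and a record matches the conjunction exactly when D(r, ai) − Q(ai) = 0 for every queried attribute. The field size, attribute codes and sample records below are assumptions for illustration.

# Work over a small prime field so polynomial interpolation behaves nicely.
P = 101

def interpolate(points):
    """Lagrange interpolation: return a function y -> Q(y) mod P with Q(a_i) = v_i."""
    def Q(y):
        total = 0
        for i, (ai, vi) in enumerate(points):
            num, den = 1, 1
            for j, (aj, _) in enumerate(points):
                if i != j:
                    num = num * (y - aj) % P
                    den = den * (ai - aj) % P
            total = (total + vi * num * pow(den, -1, P)) % P
        return total
    return Q

# Database "polynomial" D(r, a): D[r][a] = value of attribute a in record r.
D = {1: {"age": 25, "city": 7}, 2: {"age": 31, "city": 7}, 3: {"age": 25, "city": 9}}
attr_code = {"age": 2, "city": 3}   # attributes mapped to field points

# Conjunction query: age = 25 AND city = 7, encoded as the query polynomial Q.
query = [("age", 25), ("city", 7)]
Q = interpolate([(attr_code[a], v) for a, v in query])

for r, row in D.items():
    # A(r, a_i) = D(r, a_i) - Q(a_i); the record matches iff it is 0 for all queried a_i.
    matches = all((row[a] - Q(attr_code[a])) % P == 0 for a, _ in query)
    print(r, "matches" if matches else "no match")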

1.1.2. The Three-Party Setting

The three parties in this setting are a client with a query; a proxy that has an inverted index for the database (as well as an encrypted and permuted version of the database itself); and a server, which prepared the inverted index during a preprocessing step and now keeps only the keys that were used to create it. Specifically, the server keeps some "hashing keys" and the secret key for a SWHE scheme. (We stress that we do not make black-box use of the set-intersection protocol; in particular, we do not know whether other protocols for set intersection (e.g., [10, 11, 12]) can be used in our setting.) For every attribute-value pair (a, v) in the database, the inverted index contains a record (tg, Enc(A(x))) where tg is a tag, computed as tg = Hash("a = v"), and A(x) is a polynomial whose roots are exactly the record indexes r that contain this attribute-value pair.

In the basic three-party protocol, given a query

SELECT * FROM db WHERE a1 = v1 AND ... AND at = vt

the client (with oblivious help from the server) computes the tags tg_i = Hash("ai = vi") and sends them to the proxy. The proxy fetches all the encrypted polynomials Ai(x), chooses random polynomials Ri(x) of "appropriate degrees" and computes the encrypted polynomial B(x) = Σ_{i=1..t} Ri(x)·Ai(x). The proxy returns the encrypted B to the client, who again uses oblivious help from the server to decrypt B, and then factors it to find its roots, which are the indexes of the matching records (whp).
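In the clear, the proxy's combination step looks as follows: each Ai(x) has the matching record indexes as roots, and only the common roots survive (with high probability) in B(x) = Σ Ri(x)·Ai(x), while the random Ri mask everything else. The sketch works over a small prime field with no encryption; the field size, degrees and root sets are illustrative.

import random

P = 10007            # small prime field (illustrative)
M = 20               # number of records

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    return [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % P for i in range(n)]

def poly_from_roots(roots):
    poly = [1]
    for r in roots:
        poly = poly_mul(poly, [-r % P, 1])   # multiply by (x - r)
    return poly

def poly_eval(poly, x):
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % P
    return acc

def random_poly(deg):
    return [random.randrange(P) for _ in range(deg + 1)]

# Inverted-index polynomials: roots are the record indexes matching each predicate.
A1 = poly_from_roots({3, 7, 12, 15})      # records with a1 = v1
A2 = poly_from_roots({5, 7, 15, 18})      # records with a2 = v2

# Proxy's combination: B(x) = R1(x)*A1(x) + R2(x)*A2(x) with random R_i.
B = poly_add(poly_mul(random_poly(2), A1), poly_mul(random_poly(2), A2))

# The client factors B; here we simply test every record index for being a root.
matches = [r for r in range(1, M + 1) if poly_eval(B, r) == 0]
print(matches)        # with high probability exactly the intersection {7, 15}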

This technique offers a space/degree tradeoff, where the proxy stores more information if we are to use a SWHE scheme supporting only lower-degree functions. In one extreme case, using fully homomorphic encryption, the proxy need not store any more information than in the basic protocol above. On the other extreme, we can use a quadratic homomorphic scheme by having the proxy store roughly m times as much as in the basic protocol (for an m-record database). In the middle, we can have the proxy storage grow by an O(log m) factor and use an O(log m)-homomorphic scheme. We discuss in Section 3 some other optimizations to this three-party protocol. Also, in Section 4 we discuss an optimization that applies in both the 2-party and 3-party settings, where we use homomorphic batching (similar to [13, 14]) to speed up the computation.

2. HOMOMORPHIC ENCRYPTION

Fix a particular plaintext space P which is a ring. (For example, our plaintexts could be bits, P = F2, or binary polynomials modulo a cyclotomic polynomial, P = F2[X]/Φm(X), etc.) Let C be a class of arithmetic circuits over the plaintext space P. A somewhat homomorphic (public-key) encryption scheme relative to C is specified by the usual procedures KeyGen, Enc, Dec (for key generation, encryption, and decryption, respectively) and the additional procedure Eval, which takes a circuit from C and one ciphertext per input to that circuit, and returns one ciphertext per output of that circuit. The security requirement is the usual notion of semantic security [16], namely it should be hard to distinguish between the encryptions of any two messages, even if the public key is known to the attacker and even if the two messages are chosen by the attacker.

The functionality requirement from homomorphic schemes is that for every circuit π ∈ C and every set of inputs to π, if we choose the keys at random, encrypt all the inputs, run the Eval procedure on these ciphertexts, and decrypt the result, we get the same thing as evaluating π on this set of inputs (except perhaps with negligible probability). See [3] for more details. In this work we use "low-degree" somewhat homomorphic encryption, namely homomorphic encryption schemes relative to the class of low-degree polynomials. The degree of polynomials that we need to evaluate varies between protocols: some require only additive homomorphism, while others require that the scheme support polynomials of higher degree (as much as O(log m) for an m-record database).

Two important properties of SWHE schemes are compactness and circuit privacy. Compactness roughly means that the size of the evaluated ciphertext does not depend on the complexity of the circuit that was evaluated. Circuit privacy means that even the holder of the secret key cannot learn from the evaluated ciphertext anything about the circuit beyond the output value.


3. HOMOMORPHIC ENCRYPTION SCHEMES

Paillier Cryptosystem

Recall that the Paillier cryptosystem works over Z*_{n^2} for an RSA modulus n of unknown factorization. The scheme has plaintext space P = Z_n and ciphertext space Z*_{n^2}. The scheme is additively homomorphic, with homomorphic addition implemented by multiplying the corresponding ciphertexts in Z*_{n^2}. Similarly, we can homomorphically multiply a ciphertext c ∈ Z*_{n^2} by a constant a ∈ Z_n by computing c^a mod n^2.
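A toy Paillier instance (insecure parameters, the common g = n + 1 simplification) makes the two homomorphic operations above concrete; the primes and messages are arbitrary illustrative values.

```python
# Toy Paillier: Enc(m1)*Enc(m2) mod n^2 decrypts to m1+m2, and Enc(m)^a mod n^2
# decrypts to a*m. Parameters are far too small to be secure.
import math, random

p, q = 293, 433                   # toy primes; real deployments use >= 1024-bit n
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)      # Carmichael lambda(n)
g = n + 1
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)   # decryption constant

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

c1, c2 = enc(41), enc(1000)
assert dec((c1 * c2) % n2) == (41 + 1000) % n     # homomorphic addition
assert dec(pow(c1, 7, n2)) == (41 * 7) % n        # multiplication by a constant
print("ok")
```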

Brakerski's Leveled Homomorphic Cryptosystem

For our homomorphic encryption system, we use a ring-LWE-based variant of Brakerski's scale-invariant homomorphic cryptosystem [5]. Specifically, our implementation operates over polynomial rings modulo a cyclotomic polynomial. Let Φm(x) denote the m-th cyclotomic polynomial; then we work over the ring R = Z[x]/Φm(x). Specifically, we take our plaintext space to be P = R_p = Z_p[x]/Φm(x) and our ciphertext space to be R_q = Z_q[x]/Φm(x). In this scheme, our secret keys and ciphertexts are vectors of elements in R_q. Now, if c1 and c2 are encryptions of messages m1 and m2 under a secret key s, then c1 + c2 is an encryption of m1 + m2. To homomorphically multiply a ciphertext c by a public scalar a ∈ R_p, we compute a·c.

Homomorphic multiplication of two ciphertexts is performed using a scaled tensor product. That is, c_prod = ⌊(p/q)·(c1 ⊗ c2)⌉ is an encryption of m1·m2 under the tensored secret key s ⊗ s. Here, ⌊x⌉ denotes rounding x to the nearest integer. Using a technique called key-switching, the resulting product ciphertext c_prod can be transformed into a regular ciphertext c′_prod encrypted under s, such that c′_prod is a valid encryption of m1·m2. As noted in Section 4, one of the main advantages of using a ring-LWE-based homomorphic scheme is the fact that we can pack multiple plaintext messages into one ciphertext using a technique called batching. To use batching, we partition a database with r records into ℓ separate databases, each containing approximately r/ℓ records. If we assume that the records associated with each tag are split uniformly across the databases, then the degrees of the underlying polynomials are correspondingly reduced by a factor of ℓ. In our implementation, ℓ ≥ 5000, so this translates to a substantial improvement in performance.

We now consider a choice for the plaintext modulus p for use in the Brakerski scheme. From Lemma 1, the probability of a false positive (mistaking an element not in the intersection to be in the intersection) is given by |U| / |F_p|. If we tolerate a false positive rate of at most 0 < λ < 1, then we require that |F_p| ≥ (1/λ)·|U| = r/λ, where r is the number of records in the database. Additionally, to maximize the number of plaintext slots, we choose p such that p ≡ 1 (mod m). To summarize, we choose our plaintext modulus p such that p ≡ 1 (mod m) and p ≥ r/λ.
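This parameter choice can be made concrete with a small helper that searches for the smallest prime p with p ≡ 1 (mod m) and p ≥ r/λ. The function name and the example values of m, r, and λ are illustrative assumptions; real parameter selection must additionally satisfy the security constraints on m and q.

```python
# Illustrative search for the plaintext modulus: smallest prime p with
# p ≡ 1 (mod m) and p >= r / lambda, per the false-positive bound above.
def is_prime(n):
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def choose_plaintext_modulus(m, num_records, fp_rate):
    lower = int(num_records / fp_rate)
    p = lower + ((1 - lower) % m)   # first value >= lower in the class 1 mod m
    while not is_prime(p):
        p += m                      # stay in the residue class 1 mod m
    return p

# e.g. a million-record database, lambda = 10^-3, cyclotomic index m = 8192
print(choose_plaintext_modulus(8192, 10**6, 1e-3))
```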

4. EXPERIMENTAL SETUP

We implemented the three-party protocol using both the Paillier and Brakerski cryptosystems as the underlying homomorphic encryption scheme. Our implementation was done in C++ using the NTL library over GMP. Our code was compiled using g++ 4.6.3 on Ubuntu 12.04. We ran all timing experiments on cluster machines with multicore AMD Opteron processors running at 2.1 GHz. The machines had 512 KB of cache and 96 GB of available memory. Note that because NTL is not thread-safe, all of our experiments were conducted in a single-threaded, single-processor environment. Memory usage during the computation generally stayed below 10 GB. In the Paillier-based scheme, we used a 1024-bit RSA modulus for all of our experiments. For the Brakerski system, we chose parameters m, p, q to obtain 128-bit security and a false positive rate of λ = 10^-3, according to the analysis presented in the Appendix. Since the Brakerski system supports both the batching and modular reduction optimizations described in Section 4 and Section 3.2, respectively, we considered three different experimental setups to assess the viability of these optimizations. Below, we describe each of our experiments. The parameters used in our FHE scheme for each setup are listed in Table 1.

NoMR: Brakerski scheme without the modular reduction optimization.

In the NoMR setup, we used only the batching capabilities of the Brakerski system. Since we were not performing the modular reduction optimization from Section 3.2, this setup only required homomorphic addition. Because we did not need homomorphic multiplication, we were able to use smaller parameters for the Brakerski system and therefore reduce the cost of each homomorphic operation.

MR: Brakerski scheme with the modular reduction optimization.

In the MR setup, we considered the modular reduction optimization. Recall that in the final step of the three-party protocol, the proxy computes the polynomial B(x) = A1(x)·R1(x) + A′(x)·R′(x), where the degree of A1(x) is less than the degree of A′(x). When we perform modular reduction, we first compute A′(x) (mod A1(x)) and then compute B(x) (mod A1(x)). Observe that this optimization reduces both the degree of the polynomial B(x) that the proxy sends to the client and the cost of computing B(x). To perform this optimization, the FHE scheme must support at least one multiplication. Enabling support for homomorphic multiplication translated to larger parameters in the scheme, thus increasing the cost of each homomorphic operation. Since we are performing fewer operations overall, however, the modular reduction can yield substantial gains in the case where there is a significant difference in the number of records associated with the smallest and largest tags in the query. We assessed these tradeoffs in the MR experiment. Due to the significant cost of performing homomorphic multiplication, we focused on the case where we just needed a single multiply.
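The effect of the reduction is easy to see on plaintext polynomials over a toy prime field: if r is a root of both A1 and A′, it remains a root of A′ mod A1, so the common roots (the matching records) survive while the degree drops. The record sets below are made up for illustration; this is not the encrypted computation itself.

```python
# Reducing A'(x) modulo A1(x) keeps every common root while lowering the degree.
P = 10007

def poly_from_roots(roots):
    c = [1]
    for r in roots:
        nxt = [0] * (len(c) + 1)
        for i, ci in enumerate(c):
            nxt[i] = (nxt[i] - ci * r) % P
            nxt[i + 1] = (nxt[i + 1] + ci) % P
        c = nxt
    return c

def poly_eval(c, x):
    acc = 0
    for ci in reversed(c):
        acc = (acc * x + ci) % P
    return acc

def poly_mod(a, b):
    """Remainder of a(x) divided by b(x) over F_P (coefficients low-to-high)."""
    a = a[:]
    inv_lead = pow(b[-1], -1, P)
    while len(a) >= len(b) and any(a):
        shift = len(a) - len(b)
        factor = (a[-1] * inv_lead) % P
        for i, bi in enumerate(b):
            a[i + shift] = (a[i + shift] - factor * bi) % P
        while a and a[-1] == 0:
            a.pop()
    return a or [0]

A1 = poly_from_roots([3, 42, 77])               # records containing the smallest tag
A2 = poly_from_roots([1, 3, 8, 42, 57, 99])     # records containing a much larger tag

A2_red = poly_mod(A2, A1)                       # degree drops from 6 to at most 2
assert all(poly_eval(A2_red, r) == 0 for r in (3, 42))   # common roots survive
print(len(A2_red) - 1)                          # degree of the reduced polynomial
```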

MRNoKS:

Brakerski scheme with the modular reduction optimization but without key switching.

Recall that when we homomorphically multiply two ciphertexts in the Brakerski system, we obtain a tensored ciphertext (i.e., a higher-dimensional ciphertext) encrypted under a tensored secret key. Normally, we perform a key-switching operation that transforms the tensored ciphertext into a new ciphertext encrypted under the normal secret key. If left unchecked, the length of the ciphertexts grows exponentially with the number of successive multiplications; thus, the key-switching procedure is important for constraining the length of the ciphertexts. In our application, we perform a single multiplication, and so the key-switching procedure may be unnecessary. Since the key-switching operation has non-negligible cost, we can achieve improved performance at the expense of slightly longer ciphertexts (and thus increased bandwidth) by not performing the key switch. We assessed this time/space tradeoff in the third setup, denoted MRNoKS.

Query type

In each of our experiments, we operated over a database with 10^6 records and performed queries consisting of five tags.

As usual, let d 1 ≤ d 2 ≤ · · · ≤ d 5 denote the number of elements associated with each tag tg1 , . . . , tg5. We profiled our system on two different sets of queries: balanced queries and unbalanced queries. In a balanced query, the number of elements associated with each tag was approximately the same: d 1 ≈ d 2 ≈ · · · ≈ d 5.

In an unbalanced query, the number of elements associated with each tag varies significantly. Specifically, d 1 is at most 5% of d 5. There are many examples where a query would be unbalanced. For instance, consider a database of people living in

Northern California and suppose we run a query for the records of people named Joe and who live in San Francisco. In this case, the number of people living in San Francisco will be significantly greater than the number of people named Joe.

Queries like these where we compute an intersection of a large set with a much smaller set are very common and so, it is important that we can perform such queries efficiently. Finally, for each query, we measured the computation time as well as the total network bandwidth required by each of our setups. Note that due to the poor scalability of the Paillier system, we were not able to perform the full set of experiments using the Paillier cryptosystem.

5. CONCLUSION AND FUTURE ENHANCEMENT

This paper presents new protocols and tools that can be used to construct a private database query system supporting a rich set of queries. We showed how a polynomial representation of the database enables private conjunction, range, and threshold queries. The basic scheme uses only an additively homomorphic system such as Paillier, but we showed that significant performance improvements can be obtained using a stronger homomorphic system that supports both homomorphic additions and a few homomorphic multiplications on ciphertexts. Our experiments quantify this improvement, showing a real-world example where lattice-based homomorphic systems can outperform their factoring-based counterparts.

REFERENCES

i. B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan, "Private information retrieval," J. ACM, vol. 45, no. 6, pp. 965–981, 1998.
ii. Y. Gertner, Y. Ishai, E. Kushilevitz, and T. Malkin, "Protecting data privacy in private information retrieval schemes," in STOC '98, 1998, pp. 151–160.
iii. C. Gentry, "A fully homomorphic encryption scheme," Ph.D. dissertation, Stanford University, 2009, crypto.stanford.edu/craig.
iv. Z. Brakerski, C. Gentry, and V. Vaikuntanathan, "Fully homomorphic encryption without bootstrapping," in Innovations in Theoretical Computer Science (ITCS'12), 2012, available at http://eprint.iacr.org/2011/277.
v. Z. Brakerski, "Fully homomorphic encryption without modulus switching from classical GapSVP," in Advances in Cryptology - CRYPTO 2012, ser. Lecture Notes in Computer Science, vol. 7417. Springer, 2012, pp. 868–886.
vi. E. D. Cristofaro, Y. Lu, and G. Tsudik, "Efficient techniques for privacy-preserving sharing of sensitive information," in Trust and Trustworthy Computing - TRUST 2011, ser. Lecture Notes in Computer Science, J. M. McCune, B. Balacheff, A. Perrig, A.-R. Sadeghi, A. Sasse, and Y. Beres, Eds., vol. 6740. Springer, 2011, pp. 239–253.
vii. Y. Ishai, M. Prabhakaran, and A. Sahai, "Founding cryptography on oblivious transfer - efficiently," in CRYPTO, 2008, pp. 572–591.
viii. L. Kissner and D. X. Song, "Privacy-preserving set operations," in Advances in Cryptology - CRYPTO 2005, ser. Lecture Notes in Computer Science, V. Shoup, Ed., vol. 3621. Springer, 2005, pp. 241–257.
ix. M. J. Freedman, K. Nissim, and B. Pinkas, "Efficient private matching and set intersection," in Advances in Cryptology - EUROCRYPT 2004, ser. Lecture Notes in Computer Science, C. Cachin and J. Camenisch, Eds., vol. 3027. Springer, 2004, pp. 1–19.
x. S. Jarecki and X. Liu, "Fast secure computation of set intersection," in Security and Cryptography for Networks - SCN 2010, ser. Lecture Notes in Computer Science, J. A. Garay and R. D. Prisco, Eds., vol. 6280. Springer, 2010, pp. 418–435.
xi. E. D. Cristofaro, J. Kim, and G. Tsudik, "Linear-complexity private set intersection protocols secure in malicious model," in Advances in Cryptology - ASIACRYPT 2010, ser. Lecture Notes in Computer Science, M. Abe, Ed., vol. 6477. Springer, 2010, pp. 213–231.
xii. Y. Huang, D. Evans, and J. Katz, "Private set intersection: Are garbled circuits better than custom protocols?" in Proceedings of the Network and Distributed System Security Symposium - NDSS 2012. The Internet Society, 2012.
xiii. N. P. Smart and F. Vercauteren, "Fully homomorphic SIMD operations," manuscript at http://eprint.iacr.org/2011/133, 2011.
xiv. C. Gentry, S. Halevi, and N. P. Smart, "Fully homomorphic encryption with polylog overhead," Eurocrypt 2012, to appear. Full version at http://eprint.iacr.org/2011/566, 2012.
xv. P. Paillier, "Public-key cryptosystems based on composite degree residuosity classes," in Proc. of EUROCRYPT'99, 1999, pp. 223–238.
xvi. D. Boneh (Stanford University), C. Gentry (IBM Research), S. Halevi (IBM Research), F. Wang (Stanford University), and D. Wu (Stanford University), "Private Database Queries Using Somewhat Homomorphic Encryption."


Intrusion Detection System against Bandwidth DDoS Attack

Basavaraj Muragod, SaiMadhvi.D

Computer Science & Engg Dept, RYMEC, Bellary KA, India

Email: muragodbasu05@gmail.com, saidmadhavi@yahoo.co.in

ABSTRACT: Distributed denial of service (DDoS) is a rapidly growing problem. The traditional architecture of the internet is highly exposed to bandwidth distributed denial of service (BW-DDoS) attacks, which disrupt network infrastructure operation by sending a huge number of packets to cause congestion and delayed responses, cutting connectivity between client and server. According to Akamai's Prolexic Quarterly Global DDoS Attack Report, Q4 of 2014 showed a 39% increase in bandwidth-DDoS attacks compared to Q1 of 2014. In this paper, we describe the different types of BW-DDoS attacks on the internet and build an intrusion detection system to detect such attacks.

Keywords — DDoS, Bandwidth-DDoS, Internet, Congestion.

I. Introduction

The internet is a group of two or more devices, nodes, or terminals connected by a large number of network devices. Denial of Service (DoS) attacks are very common in the world of the internet today. A distributed denial of service (DDoS) attack is a form of DoS that uses multiple machines to prevent the legitimate use of services. Internet services are especially exposed to bandwidth DDoS attacks, and the increase of these attacks has put servers and network devices at risk.

BW-DDoS attacks aim to deny normal services to legitimate users by sending huge traffic volumes to the target machines or networks to exhaust their services, connections, and bandwidth. The BW-DDoS attacker uses different methods and attacking agents such as zombies. Zombies are groups of internet-connected computers compromised by an attacker that can be used to perform malicious tasks on a victim. Fig. 1.1 shows an attacker using three zombies to generate a high volume of malicious traffic towards a network over the Internet, causing legitimate users to be unable to access the services.

Fig. 1.1 Illustration of BW-DDoS Attack scenarios

To address these security issues we need an intrusion detection system (IDS); we use intrusion detection to find the intruder in the network. IDS can be categorized into two models: signature-based intrusion detection and anomaly-based intrusion detection. Signature-based intrusion detection monitors packets on the network and compares them against a database of signatures or attributes of known malicious threats. Anomaly-based intrusion detection monitors network traffic and compares it against a base profile. The base profile identifies what is normal for that network: what bandwidth is typically used, which protocols and ports are used, and which devices connect to each other; the user is alerted when traffic is detected that is anomalous and significantly different from the base profile. In this paper, we use the anomaly-based intrusion detection method to identify the intruder in the network.
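A minimal sketch of such anomaly-based detection, assuming a simple per-interval bandwidth baseline and a standard-deviation threshold (the window, traffic figures, and threshold factor are illustrative, not taken from the paper's implementation):

```python
# Anomaly detection against a base profile: learn normal per-second bandwidth,
# then flag intervals far above that baseline. All numbers are illustrative.
from statistics import mean, pstdev

def build_profile(training_kbps):
    """Base profile = mean and standard deviation of normal traffic."""
    return mean(training_kbps), pstdev(training_kbps)

def detect_anomalies(live_kbps, profile, k=3.0):
    """Flag any interval more than k standard deviations above the baseline."""
    base, spread = profile
    threshold = base + k * max(spread, 1e-9)
    return [i for i, bw in enumerate(live_kbps) if bw > threshold]

normal_traffic = [110, 95, 102, 120, 98, 105, 99, 115]       # kbps, normal periods
live_traffic   = [101, 97, 2400, 2550, 108, 2300, 95, 100]   # kbps, under attack

profile = build_profile(normal_traffic)
print(detect_anomalies(live_traffic, profile))   # -> [2, 3, 5]
```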

II. Material and Methodology

We observed information on DDoS attack statistics obtained in the first quarter of 2014 on networks of various sectors in the world, including financial sector networks. The source of the data is the Prolexic Attack Report Q4 2014 [2] provided by Prolexic Technologies, the world's largest and most trusted DDoS attack mitigation provider. Ten of the world's largest banks and the leading e-commerce companies use the services of Prolexic to protect themselves from DDoS attacks. The data covers all DDoS attacks dealt with by Prolexic in different regions of the world. Key information extracted from the report, comparing the first quarter of 2014 with the last quarter of 2014, is:

i) The total number of DDoS attacks increased by 25%.

ii) The total number of BW-DDoS attacks increased by 39%.

iii) 60 to 86.5 percent of BW-DDoS attacks targeted the network.

iv) A decline was observed in UDP flood attacks.

2.1. Motivation behind BW-DDoS Attacks

The motivation behind BW-DDoS attacks may be personal, social, or financial. An attacker may act out of personal revenge, for publicity, or for political motivation. However, most BW-DDoS attacks are launched by organized groups targeting financial websites such as banks or stock exchanges.

2.2. Different Types of BW-DDoS Attacks

In this paper, we discussed different types of BW-DDoS attacks on the internet.

2.2.1. Flood Attack

A flood attack is a direct attack in which zombies flood the victim system directly with a large amount of traffic. This traffic exhausts the victim's network bandwidth, so that other legitimate users are unable to access the service or experience server slowdown. The following packets are normally used in these attacks.


TCP floods: A stream of TCP packets with various flags set is sent to the victim's IP address. The SYN, ACK, and RST flags are commonly used.

ICMP/IGMP flood: The Internet Control Message Protocol (ICMP) is used for sending control messages in a TCP/IP network and for reporting possible errors in the communication environment. The Internet Group Management Protocol (IGMP) is a communication protocol used by hosts and routers on an IP network to establish multicast group memberships. The attacker floods the target with massive numbers of ICMP/IGMP messages to consume the bandwidth of the target.

UDP flood: UDP (User Datagram Protocol) is a connectionless transport protocol. The attacker sends a huge number of UDP packets to the target, which can lead to congestion of the victim's bandwidth and degrade services.

2.2.2. Reflection Attack

Reflection attacks fool legitimate users by sending not requested response to victim hosts. The reflection attack also known as DRDoS (Distributed Reflection Denial of Service). It exploits the requested responses of routers and servers

(reflectors) and to reflect the attack traffic as well as hide the attack source.

Fig.2.2 Reflection attack Scenario

2.2.3. Amplification attack

An amplification attack uses the zombies' bandwidth most effectively. Each packet sent by a compromised computer causes the transmission of larger packets to the victim by non-compromised machines. The response data must be larger than the request data; the larger the amplification, the more effective the bandwidth consumption.

DNS Amplification Attack

The Domain Name System (DNS) is a core service of the Internet, and DNS response packets are larger than the query packets. The attackers send queries with large UDP messages to open DNS resolvers and spoof the source IP address to be the target's address. Upon receiving the query, the DNS resolver sends the resolution back to the attack target. Flooded by large quantities of resolution responses, the target suffers network congestion, leading to bandwidth distributed denial of service.

2.3. Methods for Attack detection and mitigation

The proposed system is used to identify the attacker and the attack traffic in the network. The methods used include filtering, rate limiting, and detouring.


2.3.1. Filtering

Filtering can take place at various locations in the network: at the source end, the core (routers), or the victim end. To be effective against BW-DDoS, filtering must occur before the congested link.

2.3.2. Rate limiting

The rate limiting method is used to limit or block unwanted traffic flows in the network.
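Rate limiting is commonly realized with a token bucket; the following generic sketch (arbitrary rate and burst values, not the mechanism of the paper's Java simulation) illustrates the idea.

```python
# Generic token-bucket rate limiter: packets are forwarded only while tokens
# are available; tokens refill at `rate` per second up to `burst`.
import time

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False          # drop (or queue) the packet

bucket = TokenBucket(rate=100, burst=20)   # 100 packets/s sustained, bursts of 20
forwarded = sum(bucket.allow() for _ in range(1000))
print(f"forwarded {forwarded} of 1000 back-to-back packets")
```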

2.3.3. Detouring

The detouring method is used to bypass the default routing when an attack happens in the network; it mitigates a bandwidth-DDoS attack when some routes are congested.

The above-mentioned methods can be applied by control centers located at different points in the network, such as the source end, the core, the victim end, or distributed across several ends.

In source-end detection, the source devices identify malicious packets in outgoing traffic and filter or rate-limit the traffic.

In core detection, any router in the network can identify malicious traffic and filter or rate-limit it. With the filtering technique there is a possibility that legitimate packets will also be dropped; the core is, however, a better place to rate-limit all the traffic.

In victim-end detection, the victim detects malicious incoming traffic and filters or rate-limits it. This is the place where legitimate and attack traffic can be most clearly distinguished; however, attack traffic reaching the victim may already have caused effects such as denied service and bandwidth saturation.

In our proposed system, we use distributed ends, because distributing the detection and mitigation methods over different ends is more advantageous. An attack can be identified at the victim end, where an attack signature can be generated; based on this signature, the victim can request upstream routers to rate-limit such attack traffic. We use an intrusion detection system to detect attacks and a prevention system at the network level.

III. Results and Tables

The simulation is implemented on the Java platform. In our simulation, we used a set of parameters to establish the proposed system, to identify the intruder in the network, and to study how the performance of the network is affected by these attacks. The simulation parameters are provided in Table I. We implement router-based identification to send queries to the destination based on the bandwidth provided to the intermediate nodes. We observed that attackers such as zombies act as if they have higher bandwidth to transfer queries to the destination, which leads to excess traffic in the network and delays the response from the network.

Table I: Simulation parameters

Number of nodes: 10
MAC: 802.11
Simulation time: 20 sec
Traffic source: CBR
Packet size: 512
Dimension of area: 800×600


The simulation is carried out for a normal case and an attack case. We simulated BW-DDoS attacks consisting of a client and a server, with a number of zombies distributed in between. Each zombie has some bandwidth and sends its traffic to the server, which creates traffic in the network; using the IDS manager we find the attacker. Fig. 3.1 shows bandwidth vs. time.

Fig. 3.1 Bandwidth vs. time

This graph represents heavy traffic in the network due to the attacking flow; hence congestion occurs in the network. The proposed scheme identifies the congestion in the network using the IDS manager.

IV. Conclusion

The ultimate goal of bandwidth DDoS attacks is to consume the bandwidth of the server or of the links of the network devices, so that legitimate traffic is unable to reach the target system. Our proposed mechanism helps to detect bandwidth DDoS attacks. We believe this is acceptable performance, given that the attack prevented has a large impact on the performance of the protocol. The proposed mechanism can also help to secure the network from other DDoS attacks by changing the security parameters in accordance with the nature of the attacks.


Acknowledgement

It is my immense pleasure to express my deep sense of gratitude and indebtedness to my highly respected and esteemed guide, Ms. SaiMadhavi D., whose invaluable guidance, inspiration, constant encouragement, sincere criticism, and sympathetic attitude made this paper possible.

References

i. A. Mitrokotsa and C. Douligeris, "Denial-of-Service Attacks," Network Security: Current Status and Future Directions (Chapter 8), Wiley Online Library, pp. 117–134, June 2006.
ii. Prolexic Technologies Inc., "Prolexic Attack Report, Q3 2014 – Q1 2013," http://www.prolexic.com/attackreports.
iii. L. Zhang, S. Yu, D. Wu, and P. Watters, "A Survey on Latest Botnet Attack and Defense," Proc. of 10th Int'l Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), IEEE, pp. 53–60, November 2011.
iv. A. Mishra, B. B. Gupta, and R. C. Joshi, "A Comparative Study of Distributed Denial of Service Attacks, Intrusion Tolerance and Mitigation Techniques," Proc. of European Intelligence and Security Informatics Conference (EISIC), IEEE, pp. 286–289, September 2011.
v. K. W. M. Ghazali and R. Hassan, "Flooding Distributed Denial of Service Attacks - A Review," Journal of Computer Science 7(8), Science Publications, 2011, pp. 1218–1223.
vi. H. Beitollahi and G. Deconinck, "Denial of Service Attacks: A Tutorial," Electrical Engineering Department (ESAT), University of Leuven, Technical Report 08-2011-0115, August 2011.
vii. N. Ahlawat and C. Sharma, "Classification and Prevention of Distributed Denial of Service Attacks," International Journal of Advanced Engineering Sciences and Technologies, vol. 3, issue 1, 2011, pp. 52–60.
viii. B. B. Gupta, P. K. Agrawal, R. C. Joshi, and M. Misra, "Estimating Strength of a DDoS Attack Using Multiple Regression Analysis," Communications in Computer and Information Science, Springer, 2011, vol. 133, part 3, pp. 280–289.
ix. B. B. Gupta, P. K. Agrawal, A. Mishra, and M. K. Pattanshetti, "On Estimating Strength of a DDoS Attack Using Polynomial Regression Model," Communications in Computer and Information Science, Springer, 2011, vol. 193, part 2, pp. 244–249.
x. S. Kent and R. Atkinson, "Security Architecture for the Internet Protocol," RFC 2401, November 1998.


Information Retrieval With Keywords Using Nearest Neighbor Search

Prathibha

Deptt of CSE, VTU, SJBIT Bangalore, Karnataka, India
Email: pattidiggavi@gmail.com

Abstract:

Many search engines are used to search for anything from anywhere fast; this system is used to search for the nearest neighbor using keywords. The existing system works mainly on finding the top-k nearest neighbors, where each node has to match the whole set of query keywords; it does not consider the density of objects in the spatial space and is inefficient for incremental queries. To overcome such problems, a spatial inverted index is introduced. The existing system works on the IR2-tree, but it is not efficient; to overcome this, a triplet form of spatial index is introduced, which is accurate and offers efficient response time. The index is converted into tuples, and the matching keywords are found through latitude and longitude using nearest neighbour search. A spatial keyword query typically takes a location and a set of keywords as input parameters and returns the matched objects according to certain spatial constraints and textual patterns.

Keywords — query keywords, NN search, IR2-tree

I. INTRODUCTION

In the modern century technology plays an important role: it has an immense presence in every person's life and has reduced human work. Under certain circumstances searching can still be tedious, especially for moderately educated people. Our nearest neighbour search finds the nearest locations available to the user. One might assume that a solution can be found through the internet; it is possible, but it takes several cumbersome steps to locate the destination, returns a lot of inappropriate details, and takes time. This may be acceptable when there is no emergency, but in most situations time plays a predominant role: people search instantaneously, so time is a crucial factor. Using our method we can easily track down the exact place.

There are straightforward ways to support queries that combine spatial and text features. For example, for a query asking for the nearest restaurant whose menu contains a set of keywords, we could first fetch all the restaurants whose menus contain the keywords and then, from the retrieved restaurants, find the nearest one. Similarly, one could also do it in reverse by targeting the spatial conditions first: browse all the restaurants in ascending order of their distance to the query point until encountering one whose menu has all the keywords.

The major drawback of these straightforward approaches is that they will fail to provide real-time answers on difficult inputs. A typical example is when the real nearest neighbor lies quite far away from the query point, while all the closer neighbors are missing at least one of the query keywords.

The best method to date for nearest neighbor search with keywords is due to Felipe et al. [12]. They nicely integrate two well-known concepts: the R-tree [2], a popular spatial index, and the signature file [11], an effective method for keyword-based document retrieval. By doing so they develop a structure called the IR2-tree [12], which has the strengths of both R-trees and signature files. Like R-trees, the IR2-tree preserves objects' spatial proximity, which is the key to solving spatial queries efficiently. On the other hand, like signature files, the IR2-tree is able to filter a considerable portion of the objects that do not contain all the query keywords, thus significantly reducing the number of objects to be examined. The IR2-tree, however, also inherits a drawback of signature files: false hits. That is, a signature file, due to its conservative nature, may still direct the search to some objects even though they do not have all the keywords.

II. RELATED WORK

The literature review covers the IR2-tree, drawbacks of the IR2-tree, and previous methods.

IR2-Tree: The IR2-tree [1] combines the R-tree and the signature file. First we review signature files; then IR2-trees are discussed, assuming knowledge of R-trees and the best-first algorithm [1] for nearest neighbor search. A signature file is a hashing-based framework, also known as superimposed coding (SC) [1].

Drawbacks of the IR2-Tree

The IR2-tree is the first access method for answering nearest neighbour queries. It is a popular technique for indexing data, but it has some drawbacks that impact its efficiency. The disadvantage called false hits affects it seriously: the false positive ratio is large when the final result is far away from the query point, and also when the result is simply empty. In these cases, the query algorithm must load the documents of many objects; as each loading necessitates a random access, it incurs costly overhead [1].

Keyword search on spatial databases: This work mainly focuses on finding the top-k nearest neighbors, and each node has to match the whole set of query keywords. As this method matches the whole query against each node, it does not consider the density of data objects in the spatial space; when the number of queries increases, efficiency and speed drop.

They present an efficient method to answer top-k spatial keyword queries. This work has the following contributions: 1) the problem of top-k spatial keyword search is defined; 2) the IR2-tree is proposed as an efficient indexing structure to store spatial and textual information for a set of objects, and efficient algorithms are given to maintain the IR2-tree, that is, to insert and delete objects; 3) an efficient incremental algorithm is presented to answer top-k spatial keyword queries using the IR2-tree. Its performance is estimated and compared to the current approaches; real datasets are used in the experiments, showing significant improvement in execution times. Disadvantages:


1. Each node has to match the query keywords, which affects performance, is time consuming, and enlarges the search space.

2. The IR2-tree has the drawbacks noted above.

Processing Spatial-Keyword (SK) Queries in Geographic Information Retrieval (GIR) Systems: Location-based information is stored in a GIS database, and the information entities of such databases have both spatial and textual descriptions. This paper proposes a framework for a GIR system and focuses on indexing strategies that can process spatial keyword queries. The contributions of this paper are: 1) it gives a framework for query processing in Geographic Information Retrieval (GIR) systems; 2) it develops a novel indexing structure called the KR*-tree that captures the joint distribution of keywords in space and significantly improves performance over existing index structures; 3) experiments are conducted on real GIS datasets showing the effectiveness of the techniques compared to existing solutions. It introduces two index structures to store spatial and textual information.

A) Separate index for spatial and text attributes:

Advantages:

1. Ease of maintaining two separate indices.

2. The performance bottleneck lies in the number of candidate objects generated during the filtering stage.

Disadvantages:

1. If spatial filtering is done first, many objects may lie within a query's spatial extent, but very few of them are relevant to the query keywords. This increases the disk access cost by generating a large number of candidate objects, and the subsequent stage of keyword filtering becomes expensive.

B) Hybrid index

Advantages and limitations:

1. When a query contains keywords that are closely correlated in space, this approach suffers from the extra disk cost of accessing the R*-tree and from high overhead in the subsequent merging process.

Hybrid Index Structures for Location-based Web Search: There is growing research interest in location-based web search, i.e., searching web content whose topic is related to a particular place or region. This type of search involves location information, which should be indexed as well as the text information; a text search engine is set-oriented, whereas location information is two-dimensional and lives in Euclidean space. In the previous paper we saw two separate indexes for spatial and text information, which creates a new problem: how to combine the two types of indexes. This paper uses a hybrid index structure to handle textual and location-based queries, with the help of inverted files and R*-trees. It considers three strategies to combine these indexes: 1) inverted file and R*-tree double index; 2) first inverted file, then R*-tree; 3) first R*-tree, then inverted file. It implements a search engine to check the performance of the hybrid structure, which contains four parts: (1) an extractor, which detects the geographical scopes of web pages and represents them as multiple MBRs based on geographical coordinates; (2) an indexer, which builds hybrid index structures to integrate text and location information; (3) a ranker, which ranks the results.

Disadvantages:

1. The indexer must build hybrid index structures to integrate the text and location information of web pages. To textually index web pages, inverted files are a good choice; to spatially index web pages, two-dimensional spatial indexes are used. The two involve different approaches, which can degrade the performance of the indexer.

2. In the ranking phase, geographical and non-geographical rankings are combined; the combination of the two rankings and the computation of geographical relevance may affect the performance of ranking.

III. PROPOSED SYSTEM

A spatial database manages multidimensional objects (such as points, rectangles, etc.) and provides quick access to those objects. The importance of spatial databases lies in representing entities of reality in a geometric manner; for example, locations of restaurants, hotels, and hospitals are described as points on a map, whereas larger extents such as parks, lakes, and landscapes are described as a mix of rectangles.

In this paper we design a proposed system called the spatial inverted index (SI-index). The SI-index preserves the spatial location of data points and builds an R-tree on every inverted list at little space overhead. Figure 1 shows the architecture diagram of the proposed system: the end user sends a spatial keyword query to the server, the server retrieves data from the database to process the query and sends the results back to the end user. The server frequently updates location information in the database.

Fig.1 Architecture of the proposed system

Let us take an example as shown in Figure 2. First we locate the leaf node(s) containing q. Next, imagine a circle centered at q being expanded from a starting radius of 0; we call this circle the search region. Each time the circle hits the boundary of a node region, the contents of that node are put on the queue, and each time the circle hits an object, we have found the object next nearest to q. Note that when the circle hits a node or an object, we are guaranteed that the node or object is already in the priority queue, since the node that contains it must already have been hit (this is guaranteed by the consistency condition).
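The expanding-circle search described above is the classic best-first traversal with a priority queue ordered by distance. A small self-contained sketch follows (hand-built two-level hierarchy, Euclidean distances); the data and field names are illustrative assumptions, and this is a generic best-first search, not the IR2-tree or SI-index itself.

```python
# Best-first (incremental) nearest-neighbor search: the heap holds index nodes
# (keyed by the minimum distance to their bounding box) and data objects, so
# objects pop off in exactly the order the expanding circle reaches them.
import heapq, itertools, math

def mindist(q, rect):
    """Minimum Euclidean distance from point q to an axis-aligned rectangle."""
    (x1, y1), (x2, y2) = rect
    dx = max(x1 - q[0], 0, q[0] - x2)
    dy = max(y1 - q[1], 0, q[1] - y2)
    return math.hypot(dx, dy)

def incremental_nn(q, root):
    counter = itertools.count()            # tie-breaker so dicts are never compared
    heap = [(0.0, next(counter), root)]
    while heap:
        dist, _, entry = heapq.heappop(heap)
        if entry["kind"] == "object":
            yield dist, entry["name"]      # next-nearest object found
        else:
            for child in entry["children"]:
                d = (mindist(q, child["rect"]) if child["kind"] == "node"
                     else math.dist(q, child["point"]))
                heapq.heappush(heap, (d, next(counter), child))

leaf1 = {"kind": "node", "rect": ((0, 0), (4, 4)), "children": [
    {"kind": "object", "name": "cafe",  "point": (1, 1)},
    {"kind": "object", "name": "park",  "point": (3, 4)}]}
leaf2 = {"kind": "node", "rect": ((6, 5), (9, 9)), "children": [
    {"kind": "object", "name": "hotel", "point": (7, 8)}]}
root = {"kind": "node", "rect": ((0, 0), (9, 9)), "children": [leaf1, leaf2]}

for d, name in incremental_nn((2, 2), root):
    print(f"{name} at distance {d:.2f}")   # cafe, then park, then hotel
```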


Fig.2 The circle around query object q depicts the search region after reporting o as next nearest object.

We design a variant of the inverted index that is optimized for multidimensional points and is thus named the spatial inverted index (SI-index). This access method successfully incorporates point coordinates into a conventional inverted index with small extra space, owing to a delicate compact storage scheme. Meanwhile, an SI-index preserves the spatial locality of data points, and comes with an R-tree built on every inverted list at little space overhead. As a result, it offers two competing ways for query processing.

• We can (sequentially) merge multiple lists very much like merging traditional inverted lists by ids.

• Alternatively, we can also leverage the R-trees to browse the points of all relevant lists in ascending order of their distances to the query point.
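The first option is essentially a sorted-postings intersection; a minimal sketch follows, assuming the record ids in each inverted list are sorted (the lists themselves are made up for the example).

```python
# Merge the (sorted) inverted lists of all query keywords by advancing one
# pointer per list, keeping the ids that appear in every list.
def merge_lists(lists):
    result, pointers = [], [0] * len(lists)
    while all(p < len(l) for p, l in zip(pointers, lists)):
        current = [l[p] for p, l in zip(pointers, lists)]
        smallest, largest = min(current), max(current)
        if smallest == largest:
            result.append(smallest)                     # id present in every list
            pointers = [p + 1 for p in pointers]
        else:
            # advance only the lists that are behind the current maximum id
            pointers = [p + 1 if l[p] < largest else p
                        for p, l in zip(pointers, lists)]
    return result

inverted = {"pizza": [2, 5, 9, 14, 21], "vegan": [5, 7, 14, 30]}
print(merge_lists([inverted["pizza"], inverted["vegan"]]))   # -> [5, 14]
```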

NN Search Algorithm

Figure 3 shows the flow diagram of the NN search: the input is the latitude and longitude together with the keywords and the range; the algorithm calculates the distance to each candidate place, sorts the distances in ascending order, filters the places, matches their keywords with the query keywords, and finally returns the list of matched places.
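A compact end-to-end illustration of this flow, assuming haversine distances in kilometres and a toy set of places (all names, coordinates, keywords, and the range are invented for the example):

```python
# Fig. 3 flow: distance calculation -> ascending sort -> range filter -> keyword match.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def nn_keyword_search(lat, lon, keywords, range_km, places):
    keywords = set(keywords)
    scored = sorted(((haversine_km(lat, lon, p["lat"], p["lon"]), p) for p in places),
                    key=lambda pair: pair[0])           # ascending by distance
    return [(round(d, 2), p["name"]) for d, p in scored
            if d <= range_km and keywords <= set(p["tags"])]

places = [
    {"name": "Udupi Grand", "lat": 12.995, "lon": 77.696, "tags": ["veg", "dosa"]},
    {"name": "Biryani Hub", "lat": 12.990, "lon": 77.710, "tags": ["biryani"]},
    {"name": "Green Leaf",  "lat": 13.020, "lon": 77.640, "tags": ["veg", "thali"]},
]
print(nn_keyword_search(12.997, 77.700, ["veg"], 5, places))   # nearest matching places
```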

A comparison of the existing IR2-tree and the proposed NN search algorithm, in milliseconds, demonstrates the efficiency of the NN search.

V. CONCLUSION

Many applications call for a search engine that is able to efficiently support novel forms of spatial queries integrated with keyword search. The present solutions to such queries either incur prohibitive space consumption or are unable to provide real-time answers. The proposed system remedies the situation by developing an access method referred to as the spatial inverted index (SI-index). Not only is the SI-index fairly space-economical, it also has the ability to perform keyword-augmented nearest neighbour search in time on the order of dozens of milliseconds. Moreover, because the SI-index is based on the standard technology of inverted indexes, it is readily incorporable into a commercial search engine that applies massive parallelism, implying its immediate industrial merits.

Fig.3 Flow diagram of the NN search: (latitude, longitude, keywords, range) → distance calculation → sorting → filtered places → keyword matching algorithm → matched places list.

IV. RESULT AND ANALYSIS

Comparing the existing and proposed systems, the primary set of experiments checks the performance of various combinations of fast nearest neighbour search and the existing search methods. All methods are tested under two request patterns: information analysis and results. More specifically, we are particularly interested in the overall number of results, the search delay during a spatial data search, and the average interval of an information extraction, since these are the dominant factors affecting the service quality experienced by the users.


REFERENCES

i. D. Felipe, V. Hristidis, and N. Rishe, "Keyword search on spatial databases," in Proc. of International Conference on Data Engineering (ICDE), pages 656–665, 2008.
ii. X. Cao, L. Chen, G. Cong, C. S. Jensen, Q. Qu, A. Skovsgaard, D. Wu, and M. L. Yiu, "Spatial keyword querying," in ER, pages 16–29, 2012.
iii. G. Cong, C. S. Jensen, and D. Wu, "Efficient retrieval of the top-k most relevant spatial web objects," PVLDB, 2(1):337–348, 2009.
iv. R. Hariharan, B. Hore, C. Li, and S. Mehrotra, "Processing spatial keyword (SK) queries in geographic information retrieval (GIR) systems," in Proc. of Scientific and Statistical Database Management (SSDBM), 2007.
v. Yanwei Xu, Jihong Guan, Fengrong Li, Shuigeng Zhou, "Scalable continual top-k keyword search in relational databases," Data and Knowledge Engineering 86 (2013) 206–223.
vi. V. Hristidis and Y. Papakonstantinou, "Discover: Keyword search in relational databases," in Proc. of Very Large Data Bases (VLDB), pages 670–681, 2002.
vii. I. Kamel and C. Faloutsos, "Hilbert R-tree: An improved R-tree using fractals," in Proc. of Very Large Data Bases (VLDB), pages 500–509, 1994.
viii. J. Lu, Y. Lu, and G. Cong, "Reverse spatial and textual k nearest neighbor search," in Proc. of ACM Management of Data (SIGMOD), pages 349–360, 2011.
ix. S. Stiassny, "Mathematical analysis of various superimposed coding methods," Am. Doc., 11(2):155–169, 1960.
x. D. Zhang, Y. M. Chee, A. Mondal, A. K. H. Tung, and M. Kitsuregawa, "Keyword search in spatial databases: Towards searching by document," in Proc. of International Conference on Data Engineering (ICDE), pages 688–699, 2009.


Mining the Frequent Item Set Using Pre Post Algorithm

Manasa M. S., Shivakumar Dallali

Deptt of CSE, CiTech ,K R Puram, Bengaluru ,Karnataka, India

Abstract: The fundamental techniques adopted in data mining to retrieve data from a database are the Apriori algorithm, the FP-tree, and the Eclat algorithm, used with data mining tasks such as association rule mining, classification, and clustering. The Apriori algorithm traverses the database many times to generate frequent items and takes more space; the FP-tree is advantageous compared to the Apriori algorithm, but it does not use memory well. To overcome the drawbacks of these algorithms, in this paper we adopt a new technique called the PPC-tree. This tree is constructed based on pre/post traversal; the PrePost algorithm constructs the tree based on a vertical traversal of the database, scanning the database twice. The PPC-tree looks similar to the FP-tree, but the tree is constructed vertically. The time and space utilized by this algorithm are less than those of other techniques, and the experiments show the performance, stability, and scalability of the algorithm.

Keywords: Apriori algorithm, FP-tree, PPC-tree, PrePost algorithm

1 Introduction

Frequent itemset mining is a technique proposed for data mining by Agrawal, Imielinski, and Swami (1993). Since frequent itemset mining is a technique, it is used together with the basic data mining tasks such as classification, clustering, etc., and based on it new algorithms have been proposed for many applications, providing more efficiency, scalability, and optimality. Frequent itemset techniques can be classified into three groups.

1. Techniques based on the candidate generate-and-test strategy: this is the basic technique of data mining. Candidate sets are generated repeatedly until all candidates have been produced: first the one-item candidate sets are generated, then the already-generated candidate sets are used to generate the next set of candidates, and so on until all candidates have been generated.

2. Techniques based on the divide-and-conquer strategy: here the dataset is compressed using a divide-and-conquer method to construct a tree-like structure such as the FP-tree; the FP-tree is used to reduce the space and increase the time efficiency.

3. Techniques based on a hybrid approach: here both techniques, candidate generate-and-test and divide-and-conquer, are combined. The candidate generate-and-test method is used to compress the candidate set into a vertical data format, and the divide-and-conquer method is used to construct the tree-like structure and the frequent itemsets for the data stored in the vertical data format.

There are many techniques to find frequent itemsets, but they have drawbacks: they take more time to construct the tree structure, they become more complex to understand, and they need more space to store data. To overcome these drawbacks, in recent years Deng and Wang gave us a new technique called PrePost coding to generate the frequent itemsets; this method is based on the FP-tree structure, and the data are stored in the form of a tree-like structure. The PrePost code (PPC-tree) approach has two steps of execution: first it constructs the tree-like structure by traversing the dataset, and then, using the tree structure, it constructs the frequent itemsets using the Apriori algorithm.

2 Related work

The algorithm that we use for mining the frequent itemsets is a combination of the Apriori method and the FP-growth method. The Apriori method scans the database and prunes for the frequent itemsets. The Apriori algorithm works on the candidate generate-and-test strategy: it scans the database of n items, generates the candidate 1-itemsets and keeps the frequent ones, and each set of frequent k-itemsets is then used to generate the candidate (k+1)-itemsets; this procedure continues until all frequent itemsets have been generated, after which the database is pruned. Many algorithms, such as that of Agrawal and Srikant (1994), adopt this Apriori-like method. The advantage of Apriori-like methods is that they provide good performance by reducing the size of the candidate sets; however, the Apriori method is very expensive, since we need to scan the database repeatedly and then check a large set of candidates in the database for matching items.

FP-growth stores the data of the database using a tree data structure called the FP-tree; it does not use candidate generation and instead uses a partitioning, divide-and-conquer approach to store the data. The advantage of the FP-tree is that it reduces the search space and generates frequent itemsets without candidate generation.

The PPC algorithm works by combining the advantages of the Apriori algorithm and FP-growth.
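For reference, the candidate generate-and-test behaviour can be sketched as a generic textbook Apriori; this is only an illustration, not the implementation compared in Section 4. The sample transactions reuse the database of Table 1 with minimum support count 2 (40%).

```python
# Generic Apriori sketch: repeatedly generate k-item candidates from the
# (k-1)-item frequent sets and keep those whose support meets the threshold.
from itertools import combinations

def apriori(transactions, min_support):
    transactions = [set(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    support = lambda s: sum(s <= t for t in transactions)
    frequent = {}
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    k = 1
    while current:
        frequent.update({c: support(c) for c in current})
        # join step: union pairs of frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k + 1}
        # prune step: keep candidates whose every k-subset is frequent, then test support
        current = [c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k))
                   and support(c) >= min_support]
        k += 1
    return frequent

db = [["a", "c", "g"], ["e", "a", "c", "b"], ["f", "e", "c", "b", "i"],
      ["b", "f", "h"], ["b", "f", "e", "c", "d"]]
for itemset, sup in sorted(apriori(db, 2).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), sup)
```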

3 PPC-tree

3.1 Problem statement of the PPC-tree

Let I = {i1, i2, . . . , im} be the universal item set. Let DB = {T1, T2, . . . , Tn} be a transaction database, where each Tk (1 ≤ k ≤ n) is a transaction, i.e., a set of items such that Tk ⊆ I. We call A an itemset if A is a set of items. A transaction T is said to contain A if and only if A ⊆ T. Let SP_A be the support of itemset A, which is the number of transactions in DB that contain A. Let ξ be the predefined minimum support and |DB| be the number of transactions in DB. An itemset A is frequent if SP_A is no less than ξ × |DB|. Given a transaction database DB and a threshold ξ, the problem of mining frequent itemsets is to discover the set of all itemsets that have support no less than ξ × |DB|, which is also called the minimum (support) threshold.

3.2 Overview of the PPC-tree

The PPC-tree is a tree-like structure in which each node carries five values: item name, count, child nodes, pre-order, and post-order. The order index of a node is calculated in two manners: when traversing the tree in a pre-order manner and when traversing the tree in a post-order manner.

Let us understand the working of the PPC-tree by considering an example: database D is in use and the minimum support value is 20%. First the algorithm removes all items whose frequency is less than the minimum support value and sorts the items that satisfy the minimum support in decreasing order of frequency. Then it inserts the data whose support meets the minimum support value, and the tree is constructed by traversing in pre-order and post-order, with each node carrying its N-list information. The advantages of the PrePost code algorithm are: (i) N-lists are much more compact than previously proposed vertical structures; (ii) the support of a candidate frequent itemset can be determined through N-list intersection operations, which have a time complexity of O(m + n + k), where m and n are the cardinalities of the two N-lists and k is the cardinality of the resulting N-list.

3.3 PPC-tree construction and design

Definitions of the PPC-tree

Definition 1: The PPC-tree is a tree-like structure in which:

1. One node of the tree is named the root node; it has a null value, and the child nodes of the root form the subtrees of the tree.

2. Each node of a subtree consists of five fields: item-name, count, childNode-list, pre-order, and post-order. item-name specifies the frequent item this node represents; count gives the number of transactions represented by the portion of the path reaching this node; childNode-list gives the children of the node; pre-order gives the pre-order rank of the node; post-order gives the post-order rank of the node.

The differences between the FP-tree and the PPC-tree:

1. The FP-tree has two extra fields in each node, a node-link and a header table structure, which hold the connections between nodes carrying the same item; in the PPC-tree we do not have these, and instead we use the pre-order and post-order method.

2. In the PPC-tree every node has a pre-order field and a post-order field, which the FP-tree does not have. The pre-order of a node is determined by traversing the tree in pre-order: the root of a subtree, N, is visited before its children, from left to right, and the visit time of N is recorded as its pre-order. The post-order traversal visits node N after all its children have been traversed from left to right, and records the visit time as its post-order.

3. After an FP-tree is built, it is used for frequent pattern mining during the whole FP-growth algorithm, which is a recursive and complex process. The PPC-tree, however, is only used for generating the Pre-Post code of each node; after collecting the Pre-Post codes of the frequent items, the PPC-tree has finished its entire task and could be deleted.

Algorithm of PPC-tree Construction

Input: A transaction database DB and a minimum support threshold ξ.

Output: PPC-tree, F1 (the set of frequent 1-patterns)

Method: Construct-PPC-tree(DB, ξ) {

// Generate frequent 1-patterns
(1) Scan DB once to find the set of frequent 1-patterns (frequent items) F1 and their supports. Sort F1 in support-descending order as If, which is the list of ordered frequent items.

// Construct the PPC-tree
(2) Create the root of a PPC-tree, PPT, and label it as "null". Scan DB again. For each transaction T in DB, arrange its frequent items into the order of If and then insert it into the PPC-tree. (This process is the same as that of the FP-tree [2].)

// Generate the Pre-Post code of each node
(3) Scan the PPC-tree by preorder traversal to generate the pre-order. Scan the PPC-tree again by postorder traversal to generate the post-order. }

The PPC-tree algorithm can be understood easily by using the following example.

Example: Let the transaction database DB be given by the left two columns of Table 1, with minimum support 40%. The PPC-tree storing DB is shown in Figure 1. Note that, based on Algorithm 1, the PPC-tree is constructed from the last column of Table 1.

Table 1: Transaction database

Id   Items              Ordered frequent items
1    a, c, g            c, a
2    e, a, c, b         b, c, e, a
3    f, e, c, b, i      b, c, e, f
4    b, f, h            b, f
5    b, f, e, c, d      b, c, e, f

Obviously, the second column and the last column are equivalent for mining frequent patterns under the given minimum support threshold. In the last column of Table 1, all infrequent items are eliminated and the frequent items are listed in support-descending order. This ensures that the DB can be efficiently represented by a compressed tree structure.

For Pre-Post code generation, we traverse the PPC-tree twice, by preorder and by postorder. After that, we get Figure 1. In this figure, the node labelled (3,7) means that its pre-order is 3, its post-order is 7, its item-name is b, and its count is 4.

Fig. 1: PPC-tree structure for the example database
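A compact sketch of Algorithm 1, with node fields mirroring Definition 1 (the class and function names are illustrative): build the prefix tree from the ordered frequent items of each transaction, then assign pre-order and post-order ranks by two traversals. Run on the Table 1 database with minimum support 40% and ranks starting at 0 at the root, the node for item b comes out with codes (3,7) and count 4, matching the example above.

```python
# Sketch of Construct-PPC-tree: prefix-tree insertion of ordered frequent items,
# followed by one preorder and one postorder traversal to assign the ranks.
from collections import Counter

class Node:
    def __init__(self, item=None):
        self.item, self.count = item, 0
        self.children = {}            # item -> Node
        self.pre = self.post = None

def build_ppc_tree(db, min_support):
    threshold = min_support * len(db)
    freq = Counter(i for t in db for i in set(t))
    f1 = {i: c for i, c in freq.items() if c >= threshold}
    order = sorted(f1, key=lambda i: (-f1[i], i))           # support-descending

    root = Node()
    for t in db:                                            # second scan of DB
        node = root
        for item in [i for i in order if i in set(t)]:      # ordered frequent items
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    pre, post = [0], [0]
    def walk(node):
        node.pre, pre[0] = pre[0], pre[0] + 1
        for child in node.children.values():
            walk(child)
        node.post, post[0] = post[0], post[0] + 1
    walk(root)
    return root, f1

db = [["a", "c", "g"], ["e", "a", "c", "b"], ["f", "e", "c", "b", "i"],
      ["b", "f", "h"], ["b", "f", "e", "c", "d"]]
root, f1 = build_ppc_tree(db, 0.4)

def show(node, depth=0):
    if node.item is not None:
        print("  " * depth + f"({node.pre},{node.post}) {node.item}:{node.count}")
    for child in node.children.values():
        show(child, depth + 1)
show(root)   # item b is printed with codes (3,7) and count 4
```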

4 Experimental results

The three computers have the same configuration: the CPU is an AMD Athlon dual-core processor clocked at 2.11 GHz, with 2 GB of memory. T10I4D100K and Pumsb act as the experimental data. We compare the runtime of three algorithms, PrePost, FP-tree, and Apriori, when they are run on the two datasets. From the experimental results we see that the runtime becomes shorter when the support increases; this is evident. The results also show that the performance of the parallel algorithm is not as good as PrePost on a small dataset. The reason is that each node needs to send messages to the others in the cluster, and the delay of the network bandwidth is unpredictable, so I/O operations occupy most of the runtime, affecting the performance of the algorithm. Contrarily, PrePost has the advantage of data localization; but when the dataset is large, PrePost cannot be run at a lower support threshold due to memory overflow.


5 Conclusion

The PPC-tree is used to retrieve frequent itemsets from a database in data mining. Using the PPC-tree, the speed of retrieval can be improved; the PPC-tree uses a prefix method to construct a tree structure over the data in the database.

The tree structure is constructed vertically: the database is scanned twice, and the frequent itemsets are generated using the Apriori algorithm.

The PPC-tree based PrePost algorithm is more efficient than the fundamental techniques used in data mining to mine the data. This work can be further developed by applying a parallel algorithm.

6 References

i. Agrawal R., Srikant R. Fast algorithms for mining association rules. In: Proc. 20th Int. Conf. on Very Large Data Bases (VLDB), 1994, pp. 487-499.

ii. Deng Z. H., Wang Z. H., Jiang J. J. A new algorithm for fast mining frequent itemsets using N-lists. Science China Information Sciences, 2012, 55(9): 2008-2030.

iii. Savasere A., Omiecinski E., Navathe S. An efficient algorithm for mining association rules in large databases. In: Proc. 21st International Conference on Very Large Data Bases (VLDB'95), Zurich, 1995, pp. 432-443.

iv. Mannila H., Toivonen H., Verkamo A. Efficient algorithms for discovering association rules. AAAI Workshop on Knowledge Discovery in Databases, pp. 181-192, Jul. 1994.

v. Shi Yue-mei, Hu Guo-hua. A sampling algorithm for mining association rules in distributed databases. In: 2009 First International Workshop on Database Technology and Applications, 2009, pp. 431-434.

vi. Han J., Pei J., Yin Y. Mining frequent patterns without candidate generation. ACM SIGMOD Record, 2000, 29(2): 1-12.

vii. Mobasher B., Dai H., Luo T., et al. Effective personalization based on association rule discovery from web usage data. In: Proc. 3rd International Workshop on Web Information and Data Management, ACM, 2001, pp. 9-15.



Design and Implementation of Research Proposal Selection

Neelambika Biradar, Prajna M., Dr. Antony P J

Deptt. of CSE, KVGCE Sullia D.K neelambika.biradar1@gmail.com, prajnamanu@gmail.com, antonypjohn@gmail.com


Abstract—

As a large number of research centers and educational institutes open day by day, research project selection has become an important task for government and private research funding agencies. As a large number of research proposals are received, the next step is to group them according to their similarities in research discipline. By applying a text mining approach, the classification of project proposals can be done automatically, and the ranking of proposals can be done based on feature vector values, which yields the effective proposals in sorted order. The outcome indicates an effective way of selecting the best research proposals among them.

Keywords—Ontology based text mining, Classification,

Clustering.

I. INTRODUCTION

Ontology patterns were introduced by Blomqvist and Sandkuhl in 2005 [1]. Later the same year, Gangemi (2005) presented his work on ontology design patterns [2]. Rainer Malik et al. (2006) used a combination of text mining algorithms to extract the keywords relevant for their study from various databases.

II. LITERATURE SURVEY

In the existing system, the selection of research proposals is done manually: proposals are submitted to the funding agency and are classified into groups according to their titles and keywords, entirely by human effort.

This is not suitable for large volumes of data, because the manual process may misplace proposals into the wrong groups. Misplacement of proposals can happen for the following reasons. First, keywords may convey an incomplete meaning of the whole proposal. Second, keywords provided by applicants may be misleading and give only a partial representation of the proposal. Third, manual grouping depends on the judgement of the area expert.

III. BACKGROUND

In computer science, an ontology can be described as a set of concepts, that is, knowledge within a domain and the relations between pairs of concepts. Ontologies are used in various domains as a form of knowledge representation about the world. In this project, the ontology is a model for describing the world that provides a mapping between properties and relationship types, giving a close relationship between the ontology and the real world.

Research project selection is an important task for government as well as private funding agencies. It is a challenging multi-stage task that begins with a call for proposals by the funding agencies. Earlier, classification was a manual method, but it has been extended so that it can be done automatically based on feature vector values. After submission of the proposals, the next step is to apply preprocessing steps such as data cleaning to remove all the stop words from the proposals.

Web technology defines many stop words, and the data cleaning step removes all of them from the submitted proposals. The cleaned words are treated as tokens, each assigned a unique id. The number of times a token is repeated gives its frequency. The inverse document frequency of a term (IDFT) is then computed from the number of documents (proposals). Multiplying the term frequency by the IDFT value gives the feature vector value. Finally, the proposals are listed in descending order of their feature vector value, so the proposal with the highest value appears at the top.
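As a rough illustration of this scoring (a standard TF-IDF style computation assumed for the sketch, not necessarily the authors' exact formula), the following Java code tokenizes cleaned proposals, computes term frequency and inverse document frequency, and sums the products into a per-proposal feature value used for ranking; the sample tokens are hypothetical.

import java.util.*;

public class ProposalScorer {
    // term frequency: how often each token appears in one cleaned proposal
    static Map<String, Integer> termFrequency(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // inverse document frequency: log(N / number of proposals containing the token)
    static Map<String, Double> inverseDocumentFrequency(List<List<String>> proposals) {
        Map<String, Integer> docCount = new HashMap<>();
        for (List<String> p : proposals)
            for (String t : new HashSet<>(p)) docCount.merge(t, 1, Integer::sum);
        Map<String, Double> idf = new HashMap<>();
        int n = proposals.size();
        for (Map.Entry<String, Integer> e : docCount.entrySet())
            idf.put(e.getKey(), Math.log((double) n / e.getValue()));
        return idf;
    }

    // feature vector value of one proposal: sum over its tokens of frequency * IDFT
    static double score(List<String> proposal, Map<String, Double> idf) {
        double s = 0.0;
        for (Map.Entry<String, Integer> e : termFrequency(proposal).entrySet())
            s += e.getValue() * idf.getOrDefault(e.getKey(), 0.0);
        return s;
    }

    public static void main(String[] args) {
        List<List<String>> proposals = Arrays.asList(
            Arrays.asList("ontology", "text", "mining", "cluster"),
            Arrays.asList("sensor", "network", "energy"),
            Arrays.asList("ontology", "classification", "decision", "tree"));
        Map<String, Double> idf = inverseDocumentFrequency(proposals);
        for (List<String> p : proposals)
            System.out.printf("score = %.3f for %s%n", score(p, idf), p);
    }
}

Sorting the proposals by this score in descending order places the highest-valued proposal at the top of the list.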

This project uses ontology based text mining, namely classification and clustering algorithms. The proposed system builds a research ontology and applies a decision tree algorithm to classify the data into disciplines using the created ontology; the result of the classification then helps to form clusters of similar data.

A. Ontology

Ontology has several technical advantages: it is flexible and easily accommodates heterogeneous data. Nowadays, ontology has become prominent in research, especially in computer science. An ontology is a knowledge repository that defines terms and concepts and also represents the relationships between the various concepts. It is a tree-like structure, as defined by Gangemi (2005). The ontology in this paper is created from the submitted proposals, whose keywords represent the overall project. Creating a list of keywords for a specific area is itself an area of ontology engineering [2]. With this ontology, it becomes easy to classify proposals into their respective areas by checking how many times the relevant words appear in a paper.
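A minimal sketch of this keyword-count classification, under the assumption of two hypothetical research areas with hand-picked keyword sets (not the ontology actually built in this project), could look as follows: the proposal is assigned to the area whose ontology keywords occur most often in it.

import java.util.*;

public class OntologyClassifier {
    // hypothetical ontologies: research area -> keywords listed for that area
    static final Map<String, Set<String>> ONTOLOGY = Map.of(
        "Data Mining", Set.of("ontology", "mining", "cluster", "classification"),
        "Networks", Set.of("sensor", "routing", "energy", "wireless"));

    // assign the proposal to the area whose keywords appear most often in its tokens
    static String classify(List<String> proposalTokens) {
        String best = "Unclassified";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> area : ONTOLOGY.entrySet()) {
            int hits = 0;
            for (String t : proposalTokens) if (area.getValue().contains(t)) hits++;
            if (hits > bestHits) { bestHits = hits; best = area.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList("ontology", "based", "mining", "of", "cluster")));
    }
}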

B. Classification

Based on the data, input text can be classified into a number of classes. Various text mining techniques are used for the classification of text data, such as Support Vector Machines, Bayesian classifiers, Decision Trees, Neural Networks, Latent Semantic Analysis, Genetic algorithms [8], etc.

The main steps involved in classification are:

1. Document preprocessing

2. Feature extraction/selection

3. Model selection

4. Training and testing the classifier

Pre-processing: Data pre-processing significantly reduces the size of the input text documents. It involves sentence boundary detection, natural language stop word elimination and stemming. Stop words are function words that occur frequently in the language of the text, so they are not useful for classification.

Feature extraction: The linked list that contains the pre-processed data is used to collect the features of the document. This is done by comparing the linked list with the keywords of the ontologies of the different areas, and the refined vector acts as the feature vector for that proposal.

Model selection: The paper is then assigned to a research area by clustering the proposals. This can be done with many approaches, but this paper uses the k-means algorithm. The created ontology is used for training: given the ontology and the feature vector, the trained model specifies the corresponding research area.

Training and testing: The feature vectors of the research projects are given as training data to the network. The trained network is then tested with a different proposal's feature vector, so that the class to which the proposal belongs can be obtained.

C. Clustering

A cluster is a collection of similar objects grouped together. A few definitions of a cluster are given below; a sketch of the k-means procedure follows the list.

1. A cluster is a set of entities which are alike, and entities from different clusters are not alike.

2. A cluster is an aggregation of points in the test space such that the distance between any two points in the cluster is less than the distance between any point in the cluster and any point not in it.

3. Clusters are connected regions of a multi-dimensional space containing a high density of points, separated by regions containing a low density of points.

4. Clustering means grouping similar types of objects into one cluster.

Clustering is a technique used to group documents having similar features. Documents within a cluster are similar to each other and dissimilar to documents in any other cluster. Clustering algorithms create a vector of topics for each document and measure the weights of how well the document fits into each cluster [9].
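Since the model selection step above relies on k-means, the following self-contained Java sketch shows the standard k-means loop over feature vectors; the two-dimensional sample data, the number of clusters and the fixed iteration count are illustrative assumptions, not the settings used by the authors.

import java.util.Arrays;

public class KMeans {
    // squared Euclidean distance between two feature vectors of equal length
    static double dist(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
        return d;
    }

    // returns the cluster index assigned to each point after a fixed number of iterations
    static int[] cluster(double[][] points, int k, int iterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();   // seed with the first k points
        int[] assign = new int[points.length];
        for (int it = 0; it < iterations; it++) {
            // assignment step: attach every point to its nearest centroid
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (dist(points[p], centroids[c]) < dist(points[p], centroids[best])) best = c;
                assign[p] = best;
            }
            // update step: recompute each centroid as the mean of its members
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assign[p]]++;
                for (int i = 0; i < dim; i++) sums[assign[p]][i] += points[p][i];
            }
            for (int c = 0; c < k; c++)
                if (counts[c] > 0)
                    for (int i = 0; i < dim; i++) centroids[c][i] = sums[c][i] / counts[c];
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] features = { {0.9, 0.1}, {0.8, 0.2}, {0.1, 0.9}, {0.2, 0.8} };
        System.out.println(Arrays.toString(cluster(features, 2, 10)));
    }
}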

IV . PROPOSED SYSTEM

The proposed system is based on ontology based text mining and includes four phases. Ontology based text mining clusters the research proposals according to their domain.

Unstructured text is processed, and interesting information and knowledge are extracted from it by applying text mining.

The inverse document frequency is obtained from the log of the ratio of the number of documents to the frequency, and multiplying this value by the frequency gives the feature vector value. Finally, the papers can be ranked based on their feature vector values.

V. METHODOLOGY

In this paper, research projects are clustered into specific areas using the ontologies of different areas. The modules of the proposed system are described below and shown in Figure 2.

Module 1: In the first module, users submit the proposals; up to five proposals can be submitted at a time. Proposals along with their abstracts are sent and stored in the ontology.

Module 2: By applying a preprocessing step such as data cleaning, all the stop words are removed from the proposals. The cleaned data is then given as input to the next module.


Module 3: The cleaned data words are called tokens, and each token is assigned a unique id. Counting the number of times a token is repeated gives the frequency computation.

Module 4: The log of the ratio of the number of documents to the frequency of the term is called the inverse document frequency of the term (IDFT).

Figure 2: System Architecture

VI. RESULT AND DISCUSSION

Here the results of the proposed system are discussed. A government or private funding agency calls for proposals.

Users then submit their research proposals, and by applying pre-processing steps such as data cleaning, all the stop words are removed. Each token has its own unique id, and counting the number of times a token appears gives its frequency.

Using the appropriate formulas, the IDFT and the feature vector can be calculated as shown in the screenshot below, and based on the feature vector values the papers are ranked. In Figure 3, a few of the feature vector values are zero; this means those tokens are unique and not repeated in any of the proposals, which leads to a frequency value of 1. Hence, by applying Eq. (1), we obtain the results shown in the last two columns of the screenshot, computed using Equations 1 and 2 respectively.

Finally, the papers are sorted based on their feature vector values.

VII. CONCLUSION

This paper has presented the use of ontology based text mining for the grouping of proposals. A research ontology is constructed to categorize the concepts in the different discipline areas and to form relationships among them. The text mining technique provides different methods, such as classification and clustering, for extracting important information from unstructured text documents. The feature vector value is calculated from the number of times a token is repeated in the proposals multiplied by the inverse document frequency value. Finally, proposals are ranked based on their feature vector values.

The proposal with the highest feature vector value is taken as the most effective research proposal. Hence the time consumed is reduced compared to the manual approach.

Figure 3: Result Analysis

REFERENCES

i. Blomqvist E. and Sandkuhl K. (2005), "Patterns in ontology engineering: Classification of ontology patterns", In Proceedings of the 7th International Conference on Enterprise Information Systems.

ii. Gangemi A. (2005), "Ontology design patterns for semantic web content", In The Semantic Web - ISWC 2005, Springer.

iii. S. Bechhofer et al., "OWL Web Ontology Language Reference", W3C Recommendation, vol. 10, 2004.

iv. Henriksen A. D. and Traynor A. J. (1999), "A practical R&D project selection scoring tool", IEEE Trans. Eng. Manag., Vol. 46.

v. Y. H. Sun, J. Ma, Z. P. Fan, and J. Wang, "A group decision support approach to evaluate experts for R&D project selection", IEEE Trans. Eng. Manag., vol. 55, no. 1, pp. 158-170, Feb. 2008.

vi. Jian Ma, Wei Xu, Yong-hong Sun, Efraim Turban, Shouyang Wang, "An Ontology-Based Text-Mining Method to Cluster Proposals for Research Project Selection", IEEE Trans. on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 42, no. 3, May 2012.

vii. Jay Prakash Pandey et al., "Automatic ontology creation for research paper classification", vol. 2, no. 4, Nov. 2013.

viii. Jain A. K. and Dubes R. C., "Algorithms for Clustering Data", Prentice-Hall, 1988.

ix. Rainer Malik, Lude Franke and Arno Siebes (2006), "Combination of text-mining algorithms increases the performance", Bioinformatics.

x. A. Maedche and V. Zacharias, "Clustering Ontology-based Metadata in the Semantic Web", In Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02), Helsinki, Finland, pp. 342-360, 2002.

xi. W. D. Cook, B. Golany, M. Kress, M. Penn, and T. Raviv, "Optimal allocation of proposals to reviewers to facilitate effective ranking", Manage. Sci., vol. 51, no. 4, pp. 655-661, Apr. 2005.


Light Weight Integrated Log Parsing Tool:: LOG ANALYZER

Priyanka Sigamani S.,Dr. D. R. Shashi Kumar

Dept. Of CSE, Cambridge Institute of Technology, Bangalore, Karnataka

Email-ID{priyanka.13scs22@citech.edu.in,hod.cse@citech.edu.in}

Abstract - The amount of content in log files is increasing drastically; manually finding the errors in those files and fixing the issues is very difficult, complex and time consuming, and is not an efficient method to follow. The Log Analyzer tool overcomes these difficulties. It is highly automated, with advanced functionalities that are not provided by other tools. Log Analyzer helps analysts find bugs in a log file with less effort. Where some tools fail even to open one complete file to perform a search, this tool searches for errors not only in a single file but in multiple files and even an entire folder. It displays the results to the user with the necessary highlighters and other options, and it provides both simple and advanced search options.

Keywords - Log Analyzer, Simple Search, Keyword Search, Date and Time Range Search.

I. INTRODUCTION

Log analysis (or system and network log analysis) is an art and science that seeks to make sense of computer-generated records. The process of creating such records is called data logging. Logs are emitted by network devices, operating systems, applications and all manner of intelligent or programmable devices. A stream of messages in time sequence often comprises a log. Logs may be directed to files and stored on disk, or directed as a network stream to a log collector. Log messages must usually be interpreted with respect to the internal state of their source (e.g., an application) and announce security-relevant or operations-relevant events (e.g., a user login, or a system error).

Logs are often created by software developers to aid in debugging the operation of an application. The syntax and semantics of data within log messages are usually application or vendor specific. Terminology may also vary; for example, the authentication of a user to an application may be described as a login, a logon, a user connection or an authentication event.

Hence, log analysis must interpret messages within the context of an application, vendor, system or configuration in order to make useful comparisons to messages from different log sources. Log message format or content may not always be fully documented.

II. MOTIVATION

The task of the log analyst is to induce the system to emit the full range of messages in order to understand the complete domain from which the messages must be interpreted. The analyst also provides solutions to the bugs once they are examined.

Similarly, products like W4N, ViPR, NCM, ECS, SRM, SMARTS etc. also generate log files which have to be analyzed by the log analyst to fix issues and ensure the products are bug free, keeping them up and running without any interruption. If any bug is found it has to be fixed as soon as possible to avoid other malfunctions. LOG ANALYZER was developed to find bugs in the log files easily and fix them at the earliest; this tool helps the analyst find bugs within the log files in many ways, and it provides three main functionalities for locating a bug in the log files.

III. SCOPE

This project supports finding bugs within log files by providing the features below:

Standalone Desktop Based Tool.

Easy, Friendly User interface.

Ability to perform search on single, multiple files.

Support for Multiple Products: Logs from multiple products like W4N, ViPR, and NCM etc…can be searched within the framework.

Supports Multiple File Type

Search Entire Folder.

Multiple Search Option.

Multiple Options to Display Errors.

Provide Highlighters for the Errors found.

Adjustable Panels for readability.

IV. LITERATURE SURVEY

The literature survey is mainly carried out in order to analyze the background of the current project, which helps to find flaws in the existing system and guides which unsolved problems can be worked on. Log Analyzer tools are available in the market, but most of them are web based applications with limited features.

1. What is a Search Log?

A search log is a file (i.e., a log) of the communications (i.e., transactions) between a system and the users of that system.

Rice and Borgman (1983) present transaction logs as a data collection method that automatically captures the type, content, or time of transactions made by a person from a terminal with that system.

Peters (1993) views transaction logs as electronically recorded interactions between on-line information retrieval systems and the persons who search for the information found in those systems. For Web searching, a search log is an electronic record of interactions that have occurred during a searching episode between a Web search engine and users searching for information on that Web search engine. A Web search engine may be a general-purpose search engine, a niche search engine, a searching application on a single Website, or variations on these broad classifications. The users may be humans or computer programs acting on behalf of humans.

Interactions are the communication exchanges that occur between users and the system. Either the user or the system may initiate elements of these exchanges.

2. How are These Interactions Collected?

The process of recording the data in the search log is relatively straightforward. Servers record and store the interactions between searchers (i.e., actually Web browsers on a particular computer) and search engines in a log file (i.e., the transaction log) on the server using a software application. Thus, most search logs are server-side recordings of interactions.

Major Web search engines execute millions of these interactions per day.

The server software application can record various types of data and interactions depending on the file format that the server software supports.

3. Why Collect This Data?

Once the server collects and records the data in a file, one must analyze this data in order to obtain beneficial information.

A few tools are given below:

PowerGREP: PowerGREP is a powerful Windows grep tool. It can quickly search through large numbers of files on a PC or network, including text and binary files, compressed archives, MS Word documents, Excel spreadsheets, PDF files, OpenOffice files, etc. It finds the information you want with powerful text patterns (regular expressions) specifying the form of what you want instead of literal text, and it can search and replace with one or many regular expressions to comprehensively maintain web sites, source code, reports, etc., as well as extract statistics and knowledge from log files and large data sets.

WebLog Expert: WebLog Expert is a fast and powerful access log analyzer. It gives information about a site's visitors: activity statistics, accessed files, paths through the site, information about referring pages, search engines, browsers, operating systems, and more. The program produces easy-to-read reports that include both text information (tables) and charts. Viewing the WebLog Expert sample report gives a general idea of the variety of information it can provide about a site's usage.

Log Parser Lizard: Log Parser Lizard is a GUI for Microsoft Log Parser, arguably the best one available on the market today. Log Parser is a very powerful and versatile query tool that provides universal query access (using SQL) to text-based data, such as log files, XML files, and TSV/CSV text files, as well as key data sources on the Microsoft Windows operating system, such as the Windows Event Log, IIS logs, the registry, the file system, the Active Directory services and much more.

Piwik, Oracle Log Analyzer and Wget are a few other log analysis tools.

V. EXISTING SYSTEM

Presently there is no specific tool available to find the exact bugs in a log file; different log analysts use different methods to find the bugs within the log file. One tool currently in use is Notepad++, which can search a file for multiple keywords: you can specify a list of keywords to search for in the current file and filter out the lines that match any keyword in this list. It was developed mainly for analyzing log files where you are interested in more than one keyword and the order in which they appear.

Matching lines are listed with their line numbers in a separate panel in the plug-in window.

Double clicking a matched line in this panel takes you to the corresponding line in the original document.

Options are provided to copy the filtered lines to the clipboard and to highlight matches in the original file.

It supports case sensitive search, whole word matching and regular expressions; regular expressions are enabled by default.

VI. PROBLEM STATEMENT

Notepad++ is not efficient in all ways:

It only performs keyword search.

It does not perform other advanced searches, like automatically grepping the exceptions from a log file.

It fails to load a log file more than a few MB in size.

The log analyst must go through the log files line by line in order to fix a bug, which is time consuming.

If a larger log file has to be searched, it must first be split into many chunks, and each chunk opened and searched for errors manually.

Some other tool must be used to split the files, and each split must then be opened in Notepad++ every time.

If there are many such large files, the time complexity increases.

If the issue is critical, it has to be escalated within a short amount of time; having to split the file and then find errors in each split will affect the client's environment while the client waits for the issue to be fixed.

VII. PROPOSED SYSTEM

The proposed system has a lot of new features which would help the log analysts perform the log analysis quickly and accurately. Following are the features of the Log Analyzer.

Standalone Desktop Based Tool: The tool is a standalone desktop application that users can install on a laptop or personal computer. It is a lightweight tool (it uses minimal system resources).

Easy, User Friendly Interface: The tool has a very understandable GUI, where tool tips are provided for all components to guide the user along the correct path so that the tool can be used effectively and efficiently.

Ability to perform search on single and multiple files: Where other tools have problems opening even a single file to find errors, this tool can perform a search on multiple files in a single selection.

Provides a folder search option: Where searching for an error in a single large file is a challenge, here an entire folder can be searched.

Support for Multiple Products: Logs from multiple products like W4N, ViPR, NCM etc. can be searched within the framework. This tool is not specific to a product.

Supports Multiple File Types: The tool supports different file types; .txt and .log files are accepted as input.

Multiple Search Options:

o Simple and Advanced Search: In Simple Search the log analyst can select a file, multiple files or a complete folder and search for both WARNING and SEVERE entries, or restrict the search to either one.

o Manual Keyword Based Search: In Keyword Search the log analyst can select a file, multiple files or a complete folder and search for a keyword of the analyst's choice. The analyst can also select the kind of pattern matching required, such as Match Case, Starts With or Ends With (All Cases displays Match Case, Starts With and Ends With together).

o Automatic Search of Standard Java Errors in the Log File: When Simple, Keyword or Date and Time Range searches are performed on the log files, Java exceptions are automatically grepped and displayed to the user.

o Search Based on Date and Time Range: This is a unique and very useful feature where a time range can be specified and everything within the time range is displayed to the user.

Multiple Options to Display Errors: Errors displayed to the user have different colours, for example SEVERE in red and WARNING in green. For the files selected, corresponding tabbed panes are generated, each displaying the errors specific to that file name.

Close Options: The tabs created can be closed independently, or a right-click option is available to close all the open tabs.

Provide Highlighters for the Errors Found: For all the searches performed (Simple, Keyword and Date and Time Range), colour highlighters are provided so the log analyst can locate the errors in the file, with different colours indicating the severity level.

o Simple Search: SEVERE - red, WARNING - green.

o Keyword Search: the searched keyword gets a yellow highlight.

o Date and Time Range Search: Start Time - green, End Time - green.

Adjustable Panels for readability: This feature is mainly for the log analyst's readability.

Display the Progress of the Search: When the user is performing a search, the tool indicates whether it is progressing and when the search is completed.

Abort Operation: If the user is no longer interested in the search, he or she can abort it at any point of time.

Open Files: The tool also enables the log analyst to open the selected file from within the tool, using the supporting software required to open that particular file type. The files are listed in a table for each of the search options, along with their paths, so the log analyst can identify which log file is which while performing a search on multiple files or a folder.

VIII. SYSTEM ARCHITECTURE.

This section presents the architecture for the Log Analyzer Tool.

Figure 1. Log Analyser Architecture

Input: The input to the tool is one or more log files, or a folder.

Operation: Simple, Keyword, and Date and Time Range Search are the three modules implemented in this project.

Output: The parsed result expected by the log analyst.

1. Detailed System Architecture

The detailed architecture explains how the flow of the Log Analyzer Tool works. Initially, when the Log Analyzer Tool is accessed, it loads the initial screen with a tool bar and a display screen. The tool bar contains all the components that facilitate the different operations the Log Analyzer provides, and the display screen shows the results the analyst expects.

2. Modules

This project mainly consists of 3 modules. They are:

This project mainly consists of 3 Modules. They are: -

Simple Search

Keyword Search

Date and Time Range Search


Figure 2. Detailed Architecture of Log Analyzer

2.1 Simple Search

Initially, the user is given the option to select a single log file, multiple log files or an entire log folder on which to perform the Simple Search operation. After the user selects the files on which log parsing must be performed, the selected file names along with their full paths are displayed in a table, which lets the analyst know on which components the search is performed; the tool also allows the analyst to open a file within the framework to cross-check that the operation is being performed on the file of interest. Options are also provided to delete a single file, or the entire list of files, from the table. The analyst then selects one of the error display options (Default, Severe or Warning).

Figure 3. Block Diagram for Simple Search

By selecting Default, the tool displays both Severe and Warning entries with highlighters so the user can easily locate them; selecting Warning or Severe displays only entries of that level. The analyst then clicks the Start Search button, which performs the search on the files for the selected error display option and shows the grepped results in individual tabs created per file, so it is clear which result belongs to which file. Once the user has started the search, an option is provided to abort the operation. Every tab changes colour to indicate that the operation has completed for that file, a progress bar shows that the operation is still in progress, and a "Search Completed" message pops up after the entire search has completed.

2.2 Keyword Search

Figure 4. Block Diagram for Keyword Search

Keyword Search is used when the analyst already has a guess about the reason for the failure of the product. This module allows the analyst to select the log files, or an entire folder of log files, and then enter the query string believed to be related to the failure. The analyst selects the matching type (Match Case, Starts With, Ends With, All Cases) and starts the search; the grepped lines are displayed with a yellow highlighter. At any point the search can be aborted and resumed, providing flexibility for the analyst.
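The matching step can be pictured with the following small Java sketch (an illustrative assumption, not the tool's actual source): it scans a log file and reports the lines, with their line numbers, that satisfy the chosen matching type; the file name and keyword are hypothetical.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class KeywordSearch {
    enum MatchType { MATCH_CASE, STARTS_WITH, ENDS_WITH }

    // returns true when a log line satisfies the selected matching type for the keyword
    static boolean matches(String line, String keyword, MatchType type) {
        switch (type) {
            case STARTS_WITH: return line.trim().startsWith(keyword);
            case ENDS_WITH:   return line.trim().endsWith(keyword);
            default:          return line.contains(keyword);     // MATCH_CASE: case-sensitive substring
        }
    }

    // prints matching lines of one file with their line numbers, as a result tab would list them
    static void search(Path logFile, String keyword, MatchType type) throws IOException {
        List<String> lines = Files.readAllLines(logFile);
        for (int i = 0; i < lines.size(); i++)
            if (matches(lines.get(i), keyword, type))
                System.out.println(logFile + ":" + (i + 1) + ": " + lines.get(i));
    }

    public static void main(String[] args) throws IOException {
        search(Paths.get("server.log"), "NullPointerException", MatchType.MATCH_CASE);
    }
}

Running the same routine over every selected file, or over every file in a folder, gives the multi-file and folder search behaviour described above.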

2.3 Date and Time Range Search

Similar to the Simple and Keyword searches, the Date and Time Range Search greps the content between the given start and end times and displays it to the user/analyst, who can study the product using the information available in the log files. The analyst studies the log information around the time the failure occurred in the product and provides a relevant solution to the problem faced by the client.
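As an illustration of the mechanics (an assumption, not the tool's code), a date and time range filter only needs to parse the timestamp at the start of each log line and keep the lines that fall between the start and end times; the timestamp pattern and file name used here are hypothetical.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DateRangeSearch {
    // hypothetical log timestamp layout, e.g. "2015-05-19 10:42:31 SEVERE ..."
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static void search(Path logFile, LocalDateTime start, LocalDateTime end) throws IOException {
        for (String line : Files.readAllLines(logFile)) {
            if (line.length() < 19) continue;                  // too short to carry a timestamp
            try {
                LocalDateTime ts = LocalDateTime.parse(line.substring(0, 19), FMT);
                if (!ts.isBefore(start) && !ts.isAfter(end))   // inclusive range check
                    System.out.println(line);
            } catch (Exception e) {
                // continuation lines without a leading timestamp are skipped in this sketch
            }
        }
    }

    public static void main(String[] args) throws IOException {
        search(Paths.get("server.log"),
               LocalDateTime.of(2015, 5, 19, 9, 0, 0),
               LocalDateTime.of(2015, 5, 19, 11, 0, 0));
    }
}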



Figure 5. Block Diagram for Date and Time Range Search


IX.IMPLEMENTATION

The implementation phase of the project is where the detailed design is actually transformed into working code. Aim of the phase is to translate the design into a best possible solution in a suitable programming language. This section covers the implementation aspects of the project, giving details of the programming language and development environment used.

1. Language Used For Implementation

Java is chosen as the programming language. A few reasons for selecting Java can be outlined as follows: it is platform independent, object oriented and distributed, it has a rich standard library, and, mainly, it has Swing support.

Swing was developed to provide a more sophisticated set of GUI components than the earlier Abstract Window Toolkit. Swing provides a native look and feel that emulates the look and feel of several platforms, and it also supports a pluggable look and feel that allows applications to have a look and feel unrelated to the underlying platform. It possesses the traits of platform independence, extensibility and configurable look and feel.
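For flavour, a minimal Swing skeleton of such a tool's initial screen (a tool bar plus a display area, as described in the architecture section) might look like the following; this is an illustrative sketch, not the Log Analyzer's actual code, and the button labels are assumptions.

import javax.swing.*;
import java.awt.BorderLayout;

public class LogAnalyzerFrame {
    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Log Analyzer");
            JToolBar toolBar = new JToolBar();
            toolBar.add(new JButton("Open Files"));
            toolBar.add(new JButton("Start Search"));
            toolBar.add(new JButton("Abort"));

            JTextArea display = new JTextArea();        // grepped results with highlighters would go here
            display.setEditable(false);

            frame.setLayout(new BorderLayout());
            frame.add(toolBar, BorderLayout.NORTH);
            frame.add(new JScrollPane(display), BorderLayout.CENTER);
            frame.setSize(800, 600);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}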

2. Development Environment

A platform is a crucial element in software development. A platform might be simply defined as "a place to launch software". In this project, NetBeans IDE 8.0.1 is used for implementation.

2.1 NetBeans IDE 8.0.1

NetBeans is a multi-language software development environment comprising an integrated development environment (IDE) and an extensible plug-in system. It is written primarily in Java and can be used to develop applications in Java and, by means of the various plug-ins, in other languages as well, including C, C++, COBOL, Python, Perl, PHP, and others.

NetBeans employs plug-ins in order to provide all of its functionality on top of (and including) the runtime system, in contrast to some other applications where functionality is typically hard coded.

The NetBeans SDK includes the NetBeans java development tools (JDT), offering an IDE with a built-in incremental Java compiler and a full model of the Java source files. This allows for advanced refactoring techniques and code analysis.

X. INTERPRETATION OF RESULTS

The following snapshots define the results or outputs that we will get after step by step execution of all the modules of the system.

Below are the snapshots for the Simple Search, Keyword Search and Date and Time Range Search modules with their different functionalities.


1. Log Analyzer Build

2. Initial Screen after the tool is launched

3. Simple Search

4. Keyword Search


5. Date and Time Range Search

Start Time

End Time


ACKNOWLEDGEMENT

This project is supported by EMC Software & Services India Pvt. Ltd., Bangalore. My sincere thanks to the Escalation Engineering Team members for all their support and guidance in carrying out this work.



XI. CONCLUSION

This tool reduces the time the analyst spends finding bugs within log files manually. It helps the analyst find bugs within a log file, or a folder of log files, in less time by providing operations like Simple Search, Keyword Search and Date and Time Range Search within the same framework. The choice of search type depends on the situation and the experience of the analyst. Where loading even a single log file into an application used to be a problem, Log Analyzer searches for errors and bugs in a log file, multiple log files or a complete folder of log files, providing appropriate highlighters for each search type.

XII. FUTURE ENHANCEMENT

In future, a remote-host capability could be added within the framework, which would search for bugs in the product directly on the client machine rather than asking the client to copy the log files to a shared location accessible to both the analyst and the client. Another enhancement could be an option to display either all errors or only distinct errors: the all-errors option would be used when the analyst wants to know the count of bugs within the log file, while the distinct-errors option would show only one occurrence of each error that caused the failure.


Towards Secure and Dependable for Reliable Data Fusion in Wireless Sensor Networks under Byzantine Attacks

Valmeeki B.R., Krishna Kumar P.R., Shreemantha M.C.

Dept. of M.Tech (CSE), Cambridge Institute of Technology, B'lore-36
valmeeki1991@gmail.com, rana.krishnakumar@citech.edu.in, smchatrabana@gmail.com

Abstract - The data storage attack is a severe attack that can be easily launched by a pair of external attackers in Wireless Sensor Networks. In this attack, an attacker sniffs packets or data at one point in the network and injects fake contents or wrong waiting times for the corresponding nodes. This system proposes a novel attacker detection and positioning scheme based on a mobile Location Based Server (LBS), which can not only detect the existence of network node attacks but also accurately localize the attackers so that the system can eliminate them from the network, while strengthening the digital signature using the Secure Hash Algorithm 512 (SHA-512) for security reasons.

Index terms - Wireless Sensor Networks, Location Based Server, Digital Signature, Secure Hash SHA-512.

1. INTRODUCTION

Wireless Sensor Networks are spatially distributed autonomous sensors that monitor physical or environmental conditions such as temperature, pressure and sound. In order to find the attacker details on the mobile phone, the phone consists of three logical parts involved in the data exchange. The hardware component is the insecure communication unit of the device, responsible for Bluetooth, the Location Based Server (LBS) or the mobile device used for communication with the external machine. The mobile user can connect with the LBS server via a Bluetooth device to communicate with the mobile. The user finds the Bluetooth server name and then logs in on the mobile to view all current attackers in the storage node of the Wireless Sensor Network.

Wireless Sensor Networks are one of the areas in the field of wireless communication where delay is particularly high. They are a promising technology for vehicular, disaster response, underwater and satellite networks. Delay tolerant networks are characterized by large end-to-end communication latency and the lack of an end-to-end path from a source to its destination, and they pose several challenges to the security of WSNs. There are many attacks at the network layer, so we consider the most common types of attacks on these networks; such attacks cause serious damage in terms of latency and data availability. An entropy-based anomaly detection algorithm is used to detect external attackers and prevent them from attacking data from the outside environment.
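To make the entropy idea concrete, the following Java sketch (an illustrative assumption, not the detection procedure of this paper) computes the Shannon entropy of the values reported around a node and flags an anomaly when the entropy drifts from a learned baseline by more than a chosen threshold; the sample readings, baseline and threshold are placeholders.

import java.util.HashMap;
import java.util.Map;

public class EntropyDetector {
    // Shannon entropy (in bits) of a set of discrete observations
    static double entropy(int[] observations) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : observations) counts.merge(v, 1, Integer::sum);
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / observations.length;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // flag an anomaly when entropy deviates too far from the baseline learned under normal traffic
    static boolean isAnomalous(int[] observations, double baselineEntropy, double threshold) {
        return Math.abs(entropy(observations) - baselineEntropy) > threshold;
    }

    public static void main(String[] args) {
        int[] normal = {21, 22, 21, 23, 22, 21, 22};    // e.g. ordinary temperature readings
        int[] attacked = {21, 90, 22, 90, 90, 21, 90};  // readings polluted with injected fake contents
        double baseline = entropy(normal);
        System.out.println("normal anomalous?   " + isAnomalous(normal, baseline, 0.5));
        System.out.println("attacked anomalous? " + isAnomalous(attacked, baseline, 0.5));
    }
}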

SHA-512 is one member of a family of cryptographic hash functions that together are known as SHA-2. The basic computation for the algorithm takes as input a block of data that is 1024 bits (128 bytes) and a state vector that is 512 bits (64 bytes) in size, and it produces a modified state vector. It is a follow-on to the earlier hash algorithms MD5 and SHA-1, and it is becoming increasingly important for secure internet traffic and other authentication problems. As SHA-512 processing involves a large amount of computation, it is critical that applications use the most efficient implementations available.

The algorithm operates on 64-bit QWORDs, so the state is viewed as 8 QWORDs (commonly called A...H) and the input data is viewed as 16 QWORDs. The standard for the SHA-2 algorithm specifies a procedure for adding padding to the input data to make it an integral number of blocks in length; this happens at a higher level than the code described in this document. This paper is only concerned with updating the hash state values for any integral number of blocks.

The SHA-512 algorithm is very similar to SHA-256, and most of the general optimization principles described in this system apply here as well. The main differences in the algorithm specification are that SHA-512 uses blocks, digests and data types twice the size of those of SHA-256. In addition, SHA-512 is specified with a larger number of rounds of processing (80 rather than 64).
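In Java, producing a SHA-512 digest (for example over the file contents whose signature this system strengthens) can be done with the standard MessageDigest API; the sketch below is illustrative and is not the authors' optimized implementation.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512Demo {
    // returns the SHA-512 digest of the given bytes as a lowercase hex string
    static String sha512Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        byte[] digest = md.digest(data);                // padding and block handling are done internally
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String message = "file contents to be protected";
        System.out.println(sha512Hex(message.getBytes(StandardCharsets.UTF_8)));
    }
}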

2. IMPLEMENTATION

Fig 1: Finding the attacker in mobile devices using a Location Based Server.


Rapid growth in wireless technology and mobile devices has made it possible to deliver new types of location-centric applications and services to users. A Location Based Service (LBS) is a general class of computer program-level services that use location data to control features. As such, an LBS is an information system and has a number of uses in social networking today, for example as an entertainment service accessible with mobile devices through the mobile network, using information on the geographical position of the mobile device. This has become more and more important with the expansion of the smart phone and tablet markets.

The sender first browses the file to be sent to the destination and initially redistributes the SHA-512 standards. The sender then sends the browsed file to the router before it is delivered to the destination. On receiving the file from the sender, the router checks the details of the end users and the attacker details.

The attacking system may be of two types: injecting fake contents into particular nodes, or introducing a wrong waiting time (time delay) in delivering the file to the destination.

The type of attack can be viewed by the mobile user by connecting with the mobile LBS via a Bluetooth device. The LBS first checks the mobile user's login and then sends the attacker details: the mobile user requests the attacker details from the Location Based Server, and the LBS sends the attacker response details so that the user can view them.

3. PERFORMANCE EVALUATION


The entropy-based anomaly detection scheme incorporates knowledge and behaviour for detecting time-varying attacks in a wireless sensor network. Compared with existing mechanisms, the entropy scheme achieves a high filtering probability, high reliability and optimal utilization of energy. This work has been implemented in the Java language, and the results show effective data transmission in wireless sensor networks. Figure 2 illustrates the total energy of all sensor nodes in the data transmission, which also indicates the balance of energy consumption in the network, and Figure 3 shows the comparison of time in the data transmission. The results demonstrate that the entropy-based anomaly detection scheme achieves a high en-route filtering probability, high reliability and optimal utilization of energy.

Fig 2: Comparison of time delay in data transmission

Fig 3: Comparison of energy consumption in data transmission

4. CONCLUSION

This paper has presented a malicious node detection scheme for adaptive data fusion under time-varying attacks. The detection procedure is analyzed using the entropy-defined trust model and has been shown to be optimal from the information theory point of view. It is observed that nodes launching dynamic attacks take longer and require more complex procedures to be detected than those conducting static attacks, with detection carried out via hand-held mobile devices using the LBS and with the digital signature scheme strengthened using the SHA-512 standard. The adaptive fusion procedure has been shown to provide a significant improvement in system performance under both static and dynamic attacks. Further research can be conducted on adaptive detection under Byzantine attacks with soft decision reports.



Identity and Access Management To Encrypted Cloud Database

Archana A., Dr. Suresh L., Dr. Chandrakanth Naikodi

Deptt of CSE, Cambridge Institute of Technology, Bangalore, India

Abstract — Security is one of the top concerns about cloud computing and the on-demand business model. Worries over data privacy and financial exposure from data breaches may be the cloud service providers' greatest roadblocks to new business. As the cloud infrastructure grows, so does the presence of unsecured privileged identities that hold elevated permissions to access data, run programs and change configuration settings. When data is placed in the cloud, the cloud provider should ensure the security and availability of the data. Encryption helps in securing the data but is not by itself complete. In this paper we propose an architecture that implements identity and access management in encrypted cloud databases. By enforcing access control and identity management, users are guaranteed the security of their data.

This approach minimizes the data leakage problem. The correctness and feasibility of the proposal are demonstrated through formal models, while the integration into a cloud based architecture is left to future work.

Index Terms - Cloud Security, Confidentiality, Identity and Access Control Management

I.INTRODUCTION

In a cloud context, where confidential information is placed in the infrastructure of untrusted third parties, ensuring the confidentiality and security of data is of primary importance [2][5]. In order to fulfil these requirements there are a few data management choices. The original data should be accessible only by the trusted parties; if the data can be accessed by any untrusted party, then the data needs to be encrypted. Satisfying these requirements has different levels of complexity depending on the type of cloud service. There are several solutions that ensure confidentiality in the storage-as-a-service paradigm, but in the database-as-a-service (DBaaS) paradigm ensuring confidentiality is still an open research area. In this context a secure DBaaS is used, which does not expose unencrypted data to the cloud provider and preserves the DBaaS qualities, such as availability (readiness of data), efficiency of data (reliability) and elastic scalability [8].

The confidentiality of data stored in the cloud can be achieved through encryption, but it must be guaranteed that all decryption keys are managed by the tenant (client)/end-user and never by the cloud provider. We cannot adopt the transparent data encryption feature [7][1], because that approach makes it possible to build a trusted DBMS over untrusted storage: the DBMS is trusted and decrypts data before their use. Therefore this approach is not applicable to DBaaS, because we consider the cloud provider to be untrusted.

Even the proposal of the main authors in [8] has some risk of information leakage, because the encryption of the cloud database information is based on one master key shared by all users.

The enforcement of access control policies through encryption schemes guarantees that data outsourced to public cloud databases is always managed in an encrypted way, thus guaranteeing confidentiality for data in use and at rest in the cloud. It minimizes information leakage in the case of user key loss or a compromised client machine, and even in the worst scenario, where a malicious but legitimate user colludes with cloud provider personnel by disclosing his decryption keys. In such a case a partial data leakage is inevitable, but it is limited to the data set accessible by that user, while other data remains inviolable through standard attack techniques.
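As a rough illustration of the client-side encryption such a scheme relies on (a generic AES-GCM sketch under assumed key handling, not the construction proposed in this paper or in [8]), a tenant user would encrypt a column value with a key tied to the data set they are authorized for before it ever reaches the cloud database.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class ColumnEncryption {
    public static void main(String[] args) throws Exception {
        // key for one authorized data set; in practice it would be generated and distributed
        // by the tenant, never by the cloud provider (assumption made for this sketch)
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey dataSetKey = kg.generateKey();

        byte[] iv = new byte[12];                       // fresh random nonce for every value
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, dataSetKey, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("salary=90000".getBytes(StandardCharsets.UTF_8));

        // only the IV and the ciphertext would be stored in the cloud database column
        System.out.println(Base64.getEncoder().encodeToString(iv) + ":" +
                           Base64.getEncoder().encodeToString(ciphertext));
    }
}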

Access control is only one subset of identity management (IM). Identity management covers a whole range of functions such as access control, user provisioning, directory services, account auditing, role and group management, single sign-on (SSO) and privileged account management.

Access control differs from identity management in that access control is strictly concerned with providing authentication credentials: the point is to provide user access, not to prove identity. According to identity management experts, this narrow focus leads to cases of mistaken identity, where people who should not have access to the system, such as malicious users, masquerade as legitimate users to gain unauthorized access. Identity management therefore revolves around verifying users, ideally with multiple pieces of proof of their identity, before issuing credentials.

II.LITERATURE SURVEY

Security in cloud is one of the major areas of research. The survey shows that the researchers are focusing on various techniques to enhance the data security in cloud.

Ryan K. L. Ko et al. [4] studied the problems and challenges of the trusted cloud, where an unauthorized user can access the entire data without disturbing the actual user. An unauthorized person may do two things, namely access the data and insert duplicate data, because cloud storage provides geographically distributed databases; it is therefore not a trusted place to store users' data.

For this problem, Ryan K. L. Ko et al. proposed a TrustCloud framework to achieve a trusted cloud for the user, providing the service by making use of detective controls in the cloud environment. The detection process has accountability access with the cloud: the user is responsible for their data, and must declare accountability with the cloud through technical and policy-based services. By providing accountability through the user, this approach may solve the problem of the untrusted party, and hence it provides privacy, security, accountability and auditability.

Muhammad Rizwan Asghar et al. [4] discuss the problems of enforcing security policies in a cloud environment. With the rapid growth of data in the cloud, problems arise from untrusted parties accessing the data, and existing approaches are too immature to ensure that data is kept safe in cloud environments. Security is therefore a great issue, and enforcing security on the owner's data with a high level of protection can be very expensive for users. L. Ferretti et al. studied the problem of data leakage of a legitimate user's data in the cloud environment by the cloud provider: when no data is encrypted and security is provided only for the front-end database, with no control over the back-end databases, malicious attackers may gain access to the outsourced data.

Luca Ferretti[7] proposed a novel scheme that integrates data encryption with user access control mechanisms. It can be used to guarantee confidentiality of data with respect to a public cloud infrastructure and minimize the risks of internal data leakage even in the worst case of a legitimate user colluding with some cloud provider personnel.

III.ENCRYPTION AND IAM MODEL

We consider a typical scenario in which a tenant organization requires a database service from a public cloud DBaas provider.

In the tenant, there is a database administrator (DBA) role and multiple database users. The DBA is a trusted subject: he has complete access to all database information and is in charge of enforcing the access control policies of the tenant. Each tenant user has a different level of trust and a consequent authorization to access a specified subset of the database information; this database view is limited by the tenant access control policies, which are implemented through an authorization mechanism.

Figure 1 shows the considered scenario, where the cloud database stores all the tenant data and the tenant manages the following types of information: access control policies, user credentials and root credentials. Tenant data denotes the information that is stored in the cloud database; it is accessed by the trusted database users, such as Alice and Bob, through SQL operations. Access control policies are the rules of the tenant that define which data can be accessed by each user. For example, Alice's authorized (tenant) data denotes all and only the tenant data to which Alice has legitimate access as defined by the access control policies. Users' authorized data can be accessed by one or multiple tenant users, as in the case of Alice's and Bob's authorized data. Alice's credentials include all the information that she requires to access and execute SQL operations on all and only her authorized data. The DBA is the only tenant subject that has access to the root credentials, which grant complete access to the cloud database information and data.

IV. DESIGN AND IMPLEMENTATION

Fig 1: Reference model for multiple users accessing encrypted cloud databases

IAM technology can be used to initiate, capture, record and manage user identities and their related access permissions in an automated fashion. This ensures that access privileges are granted according to our interpretation of policy and that all individuals and services are properly authenticated, authorized and audited.

An IAM implementation involves four steps:

* Asset inventory
* Risk assessment
* Architecture review
* Implementation

These steps should flow from the information security policy that the company has already drafted.


Security should be provided to the entire organization, beginning with hardware such as servers, routers and workstations, including the databases; apart from this, there are also software applications and crucial data such as customer and employee information and transaction records.

First, the inventory is divided based upon risk, for example into high-risk data and low-risk data.

Risk is then determined by assessing the value of the data and determining how much loss or damage its compromise could cause. The main consideration is that the access control policy is based on the level of risk: high-risk assets require stronger controls. For example, an expensive two-factor authentication scheme is not warranted for access to publicly available information.

The third step is the architecture review, which is concerned with what systems are running: Windows or Unix. For Windows, Active Directory might be the access management system of choice, since it is primarily designed for the Windows architecture; for Unix- and Linux-based systems it might be LDAP.

Finally, the implementation depends on how many different applications need to be accessed. If there are multiple applications, each with its own user ID and password, then a Single Sign-On (SSO) system will be considered.

V. CONCLUSION

We propose a design in which IAM is integrated with encrypted cloud databases. The IAM mechanism helps an organization to control access to critical business systems and data. This method also ensures role-based governance with regard to access management, and it limits the risk of information leakage due to internal users. The paper gives an overall idea and the formal models that demonstrate the correctness and feasibility of the proposed scheme.

REFERENCES

i. G. Cattaneo, L. Catuogno, A. Del Sorbo, and P. Persiano, "The Design and Implementation of a Transparent Cryptographic File System for Unix," Proc. FREENIX Track: 2001 USENIX Ann. Technical Conf., April 2001.

ii. M. Armbrust et al., "A View of Cloud Computing," Comm. of the ACM, vol. 53, no. 4, pp. 50-58, 2010.

iii. Ryan K. L. Ko, Peter Jagadpramana, Miranda Mowbray, Siani Pearson, Markus Kirchberg, Qianhui Liang, Bu Sung Lee, "TrustCloud: A Framework for Accountability and Trust in Cloud Computing," IEEE, 2011.

iv. Muhammad Rizwan Asghar, Mihaela Ion, Bruno Crispo, "ESPOON: Enforcing Encrypted Security Policies in Outsourced Environments," 2011 Sixth International Conference on Availability, Reliability and Security.

v. W. Jansen and T. Grance, "Guidelines on Security and Privacy in Public Cloud Computing," Technical Report Special Publication 800-144, NIST, 2011.

vi. Luca Ferretti, Michele Colajanni and Mirco Marchetti, "Access Control Enforcement on Query-Aware Encrypted Cloud Databases," IEEE, 2013.

vii. "Oracle Advanced Security," Oracle Corporation, http://www.Oracle.com/technetwork/database/options/advanced_secu rity, April 2013.

viii. L. Ferretti, M. Colajanni, and M. Marchetti, "Distributed, Concurrent, and Independent Access to Encrypted Cloud Databases," IEEE Transactions on Parallel and Distributed Systems, 2014.


An Analysis of Multipath AOMDV in Mobile Ad-hoc Networks

S. Muthusamy, Dr. C. Poongodi

Department of IT, Bannari Amman Institute of Technology, Sathyamangalam
muthusamybecse@gmail.com, poongodic@bitsathy.ac.in

ABSTRACT: A Mobile Ad-hoc Network (MANET) is a dynamic wireless network that can be formed without the need for any pre-existing infrastructure. It is an autonomous system of mobile nodes connected by wireless links. Each node in a MANET operates as a router to forward packets and also as an end system. The nodes are free to move within the network in a self-organizing manner and often change location. Proactive, reactive and hybrid are the three main classes of routing protocols. A reactive (on-demand) routing strategy is a popular category for wireless ad hoc routing: each node tries to reduce routing overhead by sending routing packets only when a communication is requested. This survey compares the performance of two on-demand reactive routing protocols for MANETs, namely Ad hoc On-demand Distance Vector (AODV) and Ad-hoc On-demand Multipath Distance Vector (AOMDV) routing. AODV is a reactive gateway discovery algorithm in which a mobile device of a MANET connects to a gateway only when it is needed. AOMDV was designed to solve problems in highly dynamic ad hoc networks where link failures and route breaks occur commonly. AOMDV maintains routes for destinations and uses sequence numbers to determine the freshness of routing information and to prevent routing loops in active communication. AOMDV is a timer-based protocol and provides a way for mobile nodes to respond to link breaks and topology changes. This survey finds that the performance of AOMDV is better than that of AODV in terms of packet delivery ratio, lifetime of the network, lifetime of the system and end-to-end delay.

Key Words: MANET, AODV, DSR, AOMDV, Routing.

1. INTRODUCTION

Network

In information technology (IT), a network is a series of points or nodes interconnected by communication paths or links. Networks can also be interconnected with other networks through routers, and networks may contain sub-networks. A group of interconnected computers and peripherals is capable of sharing software and hardware resources between many users, either over wired links such as cables or wirelessly. The Internet is a global network of networks: through a system of routers, servers, switches and the links connecting them, it enables users of telephones or data communication lines to exchange information over long distances.

MANET

A mobile ad hoc network (MANET) is a collection of mobile nodes in a wireless architecture. MANET dynamically establishes the network in the absence of fixed infrastructure.


One of the typical features of a MANET is that each node must be able to act as a router to find the optimal path to forward a packet at low cost. As nodes may be moving continuously, entering and leaving the network, the topology of the network changes automatically. MANETs are an emerging technology for both civilian and military applications. One of the important research areas in MANETs is establishing and maintaining the ad hoc network through the use of routing protocols.

Routing In MANET

Routing is based on the direct flow of data from source to destination in order to maximize network performance. It places two fundamental requirements on the routing protocol: (i) the protocol should be distributed, and (ii) the protocol should be able to compute multiple loop-free routes while keeping the communication overhead to a minimum.

Attacks In MANET

Attacks in a MANET can be categorized into passive attacks and active attacks. A passive attack does not actually disrupt the operation of the network; for example, snooping is unauthorized access to another person's data. An active attack attempts to alter or destroy the data being exchanged in the network.

Challenges in MANET

One of the main challenges in ad-hoc networking is the efficient delivery of data packets to the mobile nodes. The topology is not predetermined because the network does not have a centralized control mechanism, so routing in ad-hoc networks is a challenge due to the frequently shifting topology. The design of robust routing algorithms that adapt to the frequent and randomly changing network topology is another big challenge. This paper compares and evaluates the performance of two on-demand routing protocols, namely the Ad-hoc On-demand Distance Vector (AODV) routing protocol, which is unipath, and the Ad hoc On-demand Multipath Distance Vector (AOMDV) routing protocol, which is multipath. AOMDV incurs more routing overhead and packet delay than AODV, but it has better efficiency in terms of the number of packets dropped and packet delivery.

2. LITERATURE SURVEY

Information communication is a necessary practice in the information era and is done by forwarding information from one node to another. This forwarding task is done with the help of routing. Routing is a challenging task since there is no central coordinator, such as a base station or fixed routers as in other wireless networks, to manage routing decisions. Each node acts as a router/base station to forward information; hence a special form of routing protocol is necessary, and an ample number of routing protocols have been developed for MANETs.

Routing protocols for mobile ad hoc networks are broadly classified into the following categories:

1. Proactive or table-driven routing protocols (DSDV)
2. Hybrid routing protocols (ZRP)
3. Reactive or on-demand routing protocols (DSR, AODV, AOMDV)

[6] A proactive (table-driven) routing protocol is an approach where each router builds its own routing table based on the information it learns by exchanging information with the network's other routers. This is achieved by exchanging update messages between routers on a regular basis to keep the routing table at each router up-to-date. Each router then consults its own routing table to route a packet from its source to its destination. When a source node or an intermediate node consults the routing table, up-to-date path information is immediately available and can be used by the node, because each router or node in the network periodically updates routes to all reachable nodes via messages broadcast by the other nodes in the network. The advantage of these protocols is that routing information about the available paths in the network is maintained even if these paths are not currently used.

[7] Hybrid routing protocols aggregate sets of nodes into zones in the network topology: the network is partitioned into zones and a proactive approach is used within each zone to maintain routing information, while a reactive approach is used to route packets between different zones. A route to a destination within the same zone is established without delay, whereas a route discovery and route maintenance procedure is required for destinations in other zones. Examples are the zone routing protocol (ZRP) and the zone-based hierarchical link state (ZHLS) routing protocol. The advantage of this class of protocols is the compromise it offers on scalability with respect to the frequency of end-to-end connections, the total number of nodes and the frequency of topology change.

[1] Ad hoc On-Demand Distance Vector routing (AODV) is an on-demand, single-path, loop-free distance vector protocol. It combines the on-demand route discovery mechanism of DSR with the concept of destination sequence numbers from DSDV. However, unlike DSR, which uses source routing, AODV takes a hop-by-hop routing approach. The AODV protocol enables dynamic, self-starting, multi-hop routing between participating mobile nodes wishing to establish and maintain an ad hoc network.

[8] The operation of the protocol has two phases: route discovery and route maintenance. In ad-hoc routing, when a route to some destination is needed, the protocol starts route discovery. The source node sends a route request (RREQ) message to its neighbors; if those nodes do not have any information about the destination node, they forward the message to all their neighbors, and so on. If any neighbor node has information about the destination node, it sends a route reply message to the initiator of the route request.


Through this process a path is recorded in the intermediate nodes; this path identifies the route back to the initiator and is called the reverse path. Since each node forwards the route request message to all of its neighbors, more than one copy of the original route request may arrive at a node. A unique ID is assigned when a route request message is created. When a node receives an RREQ, it checks the ID and the address of the initiator, and discards the message if it has already processed that request. A node that has information about a path to the destination sends a route reply message to the neighbor from which it received the route request; that neighbor does the same, which is possible thanks to the reverse path. The route reply (RREP) message thus travels back along the reverse path. When the route reply message reaches the initiator, the route is ready and the initiator starts sending data packets. The AODV protocol maintains the invariant that destination sequence numbers increase monotonically along a valid route, thus preventing routing loops. Less memory space is required, as information about only the active routes is maintained, which increases performance.
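To make the RREQ handling just described concrete, here is a minimal Python sketch of how a single node might suppress duplicate requests and record the reverse path. The Node class, its field names and return values are illustrative assumptions, not part of any actual AODV implementation.

```python
# Minimal sketch of AODV-style RREQ handling at one node (illustrative only).
# Plain dictionaries stand in for a node's routing state; all names are hypothetical.

class Node:
    def __init__(self, addr):
        self.addr = addr
        self.seen_rreqs = set()      # (originator, rreq_id) pairs already processed
        self.reverse_paths = {}      # originator -> neighbour that forwarded the RREQ
        self.routes = {}             # destination -> (next_hop, dest_seq, hop_count)

    def handle_rreq(self, rreq, from_neighbour):
        key = (rreq["originator"], rreq["id"])
        if key in self.seen_rreqs:                 # duplicate copy of the same request: discard
            return ("discard", None)
        self.seen_rreqs.add(key)
        self.reverse_paths[rreq["originator"]] = from_neighbour   # record the reverse path
        dest = rreq["destination"]
        route = self.routes.get(dest)
        if dest == self.addr or (route and route[1] >= rreq["dest_seq"]):
            return ("rrep", from_neighbour)        # reply back along the reverse path
        fwd = dict(rreq, hop_count=rreq["hop_count"] + 1)
        return ("rebroadcast", fwd)                # otherwise forward the request onward

node = Node("n7")
print(node.handle_rreq({"originator": "n1", "id": 42, "destination": "n9",
                        "dest_seq": 3, "hop_count": 2}, from_neighbour="n4"))
```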

[2] The Dynamic Source Routing (DSR) is an on demand source routing protocol that employs route discovery and route maintenance procedures. In DSR, each node maintains a route cache with entries that are continuously updated as a node learns new routes. A node wishing to send a packet will first inspect its route cache to see whether it already has a route to the destination. If there is no valid route in the cache, the sender initiates a route discovery procedure by broadcasting a route request packet, which contains the address of the destination, the address of the source, and a unique request ID. As this request propagates through the network, each node inserts its own address into the request packet before rebroadcasting it. As a consequence, a request packet records a route consisting of all nodes it has visited. When a node receives a request packet and finds its own address recorded in the packet, it discards this packet and does not rebroadcast it further. A node keeps a cache of recently forwarded request packets, recording their sender addresses and request IDs, and discards any duplicate request packets.

3. PROBLEM STATEMENT

A. OVERVIEW

The Ad-hoc On-demand Multipath Distance Vector routing (AOMDV) protocol [9] is an extension of the AODV protocol for computing multiple loop-free and link-disjoint paths [1]. The routing table entry for each destination contains a list of next-hop addresses along with the corresponding hop counts. All next hops share the same sequence number, which helps in keeping track of a route.

B. HOP COUNT

For each destination, a node maintains the advertised hop count, defined as the maximum hop count over all paths to that destination; it is used when sending route advertisements for the destination. A duplicate route advertisement received by a node defines an alternate path to the destination. Loop freedom is guaranteed if a node only accepts an alternate path whose hop count is lower than the advertised hop count for that destination. Because the maximum hop count is used, the advertised hop count does not change for the same sequence number. The next-hop list and the advertised hop count are reinitialized when a route advertisement with a greater sequence number is received for the destination.
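The route update rule above can be sketched in Python as follows. This is a simplified illustration under the assumption that a route entry is a plain dictionary; update_route is a hypothetical helper, not the protocol's actual code, and the full AOMDV rule has further details that are omitted here.

```python
# Sketch of the AOMDV advertised-hop-count rule described above (illustrative, simplified).

def update_route(entry, adv_seq, adv_hops, neighbour):
    """entry: {'seq': int, 'advertised_hops': int or None, 'next_hops': list of (node, hops)}."""
    if adv_seq > entry["seq"]:
        # Fresher sequence number: reinitialise the next-hop list and the advertised hop count.
        entry["seq"] = adv_seq
        entry["advertised_hops"] = None
        entry["next_hops"] = []
    if adv_seq == entry["seq"]:
        if entry["advertised_hops"] is None:
            # The advertised hop count is fixed once per sequence number.
            entry["advertised_hops"] = adv_hops + 1
        if adv_hops < entry["advertised_hops"]:
            # Accepting only strictly shorter alternate paths guarantees loop freedom.
            entry["next_hops"].append((neighbour, adv_hops + 1))
    return entry

route = {"seq": 5, "advertised_hops": None, "next_hops": []}
print(update_route(route, adv_seq=6, adv_hops=3, neighbour="n2"))
```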

AOMDV allows intermediate nodes to reply to RREQs while selecting disjoint paths. AOMDV is a better on-demand routing protocol than AODV since it provides better statistics for packet delivery and number of packets dropped.

AOMDV can be used to discover node-disjoint or link-disjoint routes. To find node-disjoint routes, each node does not immediately reject duplicate RREQs: each RREQ arriving via a different neighbor of the source defines a node-disjoint path. This is because nodes do not re-broadcast duplicate RREQs, so any two RREQs arriving at an intermediate node via different neighbors of the source cannot have traversed the same node.

To obtain multiple link-disjoint routes, the destination replies to duplicate RREQs, but only to those arriving via unique neighbors. After the first hop, the RREPs follow the reverse paths, which are node-disjoint and thus link-disjoint. The trajectories of the RREPs may intersect at an intermediate node, but each takes a different reverse path to the source to ensure link disjointness.

4. ARCHITECTURE DESIGN

Comparison between AODV and AOMDV

The architecture design shown in Figure 1 extends the AODV protocol to compute multiple disjoint loop-free paths in a route discovery. We assume that every node has a unique identifier (UID), e.g. an IP address, a typical assumption for ad hoc routing protocols. For simplicity, we also assume that all links are bidirectional, that is, a link exists from node i to node j if and only if there is a link from j to i. AOMDV can be applied even in the presence of unidirectional links, with additional techniques to help discover bidirectional paths in such scenarios.

Figure 1: Architecture design of the AOMDV protocol.

1. Both protocols are based on the distance vector concept and use a hop-by-hop routing approach.
2. Both find routes on demand using a route discovery procedure.
3. In AOMDV, RREQ propagation from the source towards the destination establishes multiple reverse paths at the intermediate nodes as well as at the destination.
4. These reverse paths are traversed back to form multiple forward paths to the destination at the source and intermediate nodes.
5. The multiple paths discovered are loop-free and disjoint; detailed route update rules are applied at each node during the multipath route discovery procedure. In AODV, overhead is incurred to discover multiple paths, whereas AOMDV discovers them without this additional overhead; neither protocol employs any special control packets.

5. CONCLUSION

This paper evaluated the performance of AODV, AOMDV and DSR using ns-2. The comparison was based on packet delivery fraction, throughput and end-to-end delay. We conclude that in a static network (pause time 50 s), AOMDV gives better performance than AODV and DSR in terms of packet delivery fraction and throughput, but the worst performance in terms of end-to-end delay.

REFERENCES

i. W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "An Application-Specific Protocol Architecture for Wireless Microsensor Networks," IEEE Trans. Wireless Comm., vol. 1, no. 4, pp. 660-670, Oct. 2002.

ii. L. B. Oliveira et al., "SecLEACH - On the Security of Clustered Sensor Networks," Signal Processing, vol. 87, pp. 2882-2895, 2007.

iii. P. Banerjee, D. Jacobson, and S. Lahiri, "Security and Performance Analysis of a Secure Clustering Protocol for Sensor Networks," Proc. IEEE Sixth Int'l Symp. Network Computing and Applications (NCA), pp. 145-152, 2007.

iv. K. Zhang, C. Wang, and C. Wang, "A Secure Routing Protocol for Cluster-Based Wireless Sensor Networks Using Group Key Management," Proc. Fourth Int'l Conf. Wireless Comm., Networking and Mobile Computing (WiCOM), pp. 1-5, 2008.

v. A. Shamir, "Identity-Based Cryptosystems and Signature Schemes," Proc. Advances in Cryptology (CRYPTO), pp. 47-53, 1985.

vi. J. Liu et al., "Efficient Online/Offline Identity-Based Signature for Wireless Sensor Network," Int'l J. Information Security, vol. 9, no. 4, pp. 287-296, 2010.


Light Weight SNMP Based Network Management and Control System for a Homogeneous Network

Brunda Reddy H K, K Satyanarayan Reddy

Dept. of CSE (M.Tech), Cambridge Institute of Technology, B'lore-36
brundha1991@gmail.com, satyanarayanreddy.cse@citech.edu.in

Abstract

Network information helps in diagnosing faults and errors in a network, and remedying such faults and errors is a major task of an organization's network management system. This paper introduces a mechanism that uses a lightweight Simple Network Management Protocol (SNMP) based solution to address discrete kinds of network devices and to discover interface-to-interface connectivity among the devices as well as basic information about those devices. The paper proposes algorithms to discover the network, the device types, and interface-to-interface connectivity, and concentrates on a subnet of an organization's network.

Keywords — MIB, OID, SNMP, Topology, Subnetwork.

I. INTRODUCTION

Network topology is an illustration of the nodes and links in a network and of how the nodes are interconnected with each other. A physical network topology refers to the physical connectivity relationships that exist among entities or nodes in a network. One physical network can correspond to many logical topologies, in which the network is divided into logical segments through subnets.

An organization consists of many departments, and an organization-level network consists of many subnetworks. Network topologies change constantly as nodes and links join the network and as network capacity is increased to deal with added traffic. Keeping track of the network topology manually is a frustrating and often impossible job. An inexperienced network administrator joining an organization faces many problems due to the lack of a discovery tool; even for an experienced person, keeping track of devices and their connectivity details without a proper method of visually presenting them becomes a difficult task. To avoid these problems, accurate topology information is necessary for simulation, network management and so on.

Thus, there is a considerable need for automatic discovery of network topology. This paper proposes a lightweight SNMP-based solution. The solution is simple, effective and easy to use, because connections can still be found even for hosts or devices that do not support SNMP. It performs better than other systems, generates the least amount of traffic and consumes little network bandwidth. This paper concentrates on subnetwork-level discovery within the organization-level topology.

Related work: Discovering the topology of the Internet is a problem that has attracted the attention of many networking researchers. Network connectivity discovery is a well-known area, and there are many interesting mechanisms, such as ping, traceroute, DNS, the address resolution protocol (ARP), and SNMP, available to discover network elements and the connectivity among them.

R. Siamwalla et al. [ii] proposed mechanisms to discover topology by combining ping, traceroute, SNMP, DNS, and ARP. These methods can discover only L3-level devices, and the paper did not propose any method to discover L2- or host-level devices, although it showed that SNMP performs better than all the other mechanisms. Yuri et al. [iii] proposed a mechanism that is heterogeneous, but it requires ICMP spoofing to obtain the complete forwarding table, which is not allowed in most of today's networks; although they explained the connectivity algorithm well, they did not provide details of the SNMP MIBs required for collecting network topology information. Lowekamp et al. [iv] proposed a mechanism that does not require the complete forwarding information of bridges; their approach contradicts that of Yuri et al. [iii]. Suman Pandey et al. [i] extended the work of Lowekamp et al. [iv] and proposed a complete topological discovery mechanism covering L2-L2, L3-L3, L2-L3, and L2/L3 to end-host connectivity.

This paper extends the work of Suman Pandey et al. [i] and discovers details of each network device found in the organization's subnetwork that supports SNMP; for devices that do not support SNMP, it uses ICMP echo requests to check whether the device is alive and displays ping information and some basic information about the device.

The organization of this paper is as follows: the network topology discovery algorithms are explained in section 2, the implementation in section 3, and our conclusions and future work in section 4.

II. NETWORK DISCOVERY ALGORITHM

This section explains how the network nodes, the connections between them, and the details of each discovered device are found. Since the approach of this paper depends mainly on SNMP, it first analyzes the Management Information Base (MIB) objects required to discover the network and the devices in it. Those MIBs are then used to discover the network, the type of each device, the details of particular devices, and the connectivity between the switch and the network devices.


A. MIBs Used

The discovery mechanism used in this paper is completely based on SNMP. Table 1 lists all the SNMP MIB objects needed.


TABLE I
MIBS USED FOR NETWORK DISCOVERY

MIB-II (RFC 1213 [ix], RFC 1759 [vi], RFC 1514 [vii]): sysServices, sysDescr, ifTable, ipAddrTable, ipRouteTable, ipNetToMediaTable, hrSystemUptime, hrSystemNumUsers, hrSystemProcesses, hrMemorySize, hrStorageTable, hrDeviceTable, prtInputTable, prtOutputTable.

BRIDGE-MIB for connectivity discovery [x]: dot1dTpFdbAddress, dot1dTpFdbPort.

B. Overall Network Discovery Algorithm

Algorithm 1 shows the overall network connectivity discovery (an illustrative code sketch is given after Algorithm 1 below). To discover a network, the basic input needed by our system is the IP address of at least one switch used in a department or subnet of the organization, together with basic connection information, i.e. the IP address of the local host, the SNMP community string, the SNMP version, and the port number.

The network discovery process uses the ARP cache table and ICMP utilities to discover the devices in the network. For each discovered device, the system checks whether the device is alive; if it is alive, it checks whether SNMP is supported. If the device supports SNMP, the device type is discovered, e.g. L2/L3 switch, workstation, or printer.

Depending on the type of device, the appropriate MIB information is fetched from the SNMP agents and stored in the database. The MIB information retrieved from the SNMP agent is used to find the connectivity among devices in the network. In this way, the connections between L2/L3 switches, workstations and all other devices, such as printers, are discovered.

TABLE 2
ALGORITHM 1: OVERALL ALGORITHM

1. Take a switch IP address as input.
2. Network device discovery:
   a) device discovery using ipNetToMediaNetAddress;
   b) device discovery using ipNetToMediaPhysAddress.
3. Device type discovery.
4. Device details discovery based on device type.
5. Connectivity discovery.
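A rough Python sketch of this overall flow is given below. Here ping, snmp_get, snmp_walk and classify are placeholders supplied by the caller (for example wrappers around an SNMP library and ICMP echo); they are assumptions for illustration and not the authors' implementation.

```python
# Illustrative sketch of the overall discovery flow of Algorithm 1 (not the authors' code).

def discover_network(seed_switch_ip, ping, snmp_get, snmp_walk, classify):
    """Breadth-first discovery starting from one switch.
    ping(ip) -> bool, snmp_get(ip, oid) -> value or None,
    snmp_walk(ip, table) -> list of values, classify(ip) -> 'switch'|'printer'|'workstation'."""
    inventory, pending = {}, [seed_switch_ip]
    while pending:
        ip = pending.pop()
        if ip in inventory or not ping(ip):           # skip devices already seen or not alive
            continue
        if snmp_get(ip, "sysDescr") is None:           # SNMP not supported: record ping info only
            inventory[ip] = {"type": "non-snmp"}
            continue
        dev_type = classify(ip)                        # step 3: device type discovery
        inventory[ip] = {"type": dev_type}
        if dev_type == "switch":                       # step 2: the switch's ARP cache seeds more IPs
            pending.extend(snmp_walk(ip, "ipNetToMediaNetAddress"))
    return inventory
```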

C. Recursive Network Device Discovery Algorithm

This paper makes use of the simple, workable architecture of the RFC 1213, RFC 1759, and RFC 1514 managed objects for managing TCP/IP- and UDP-based networks. It utilizes the minimum workable architecture of RFC 1213 to discover the network topology. This RFC defines managed objects that are standard and implemented by all vendors, and the information found in RFC 1213 is sufficient for discovering all the devices in the network.

Network devices are discovered by using the ipNetToMediaTable object. This paper utilizes the ipNetToMediaPhysAddress and ipNetToMediaNetAddress columns maintained by the ipNetToMediaTable: ipNetToMediaNetAddress contains the IP addresses of all the devices connected to a particular switch, and ipNetToMediaPhysAddress contains the MAC addresses of all devices connected to the switch.

As soon as a node is discovered, all unique ipNetToMediaNetAddress entries are used to discover another set of new nodes. One device connected to a switch can thus help in discovering more devices, and the discovery process is recursive (a code sketch is given after Algorithm 2 below).

For devices that do not support SNMP, ICMP echo requests are used to check whether the device is alive. If the device is alive, ping details and some basic information about the device are displayed. Depending on the number of subnetworks, a device can have multiple IP addresses; all the IP addresses of a device can be obtained using the ipAdEntAddr object of the ipAddrTable.

TABLE 3
ALGORITHM 2: RECURSIVE ALGORITHM FOR DEVICE DISCOVERY

1. Given a set of switch IP addresses, discover devices through each switch.
2. ipNetToMediaNetAddress (IP addresses):
   a) for each switch, get all the IP addresses of devices from ipNetToMediaNetAddress;
   b) if there is no ipNetToMediaNetAddress, then return;
   c) call ipNetToMediaNetAddress recursively for all the switch IP addresses.
3. ipNetToMediaPhysAddress (MAC addresses):
   a) for each switch, get all MAC addresses from ipNetToMediaPhysAddress;
   b) if there is no ipNetToMediaPhysAddress, then return;
   c) call ipNetToMediaPhysAddress recursively for each switch.
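The recursion of Algorithm 2 can be sketched in Python as follows. The arp_table function is a hypothetical helper that stands in for an SNMP walk of ipNetToMediaNetAddress and ipNetToMediaPhysAddress on one switch; it is an assumption for illustration only.

```python
# Sketch of the recursive device discovery of Algorithm 2 (hypothetical helper names).
# arp_table(switch_ip) is assumed to return a list of (ip_address, mac_address) pairs.

def discover_devices(switch_ips, arp_table, seen=None):
    """Recursively collect every (ip, mac) pair reachable through the given switches' ARP caches."""
    seen = seen if seen is not None else {}
    for switch_ip in switch_ips:
        new_candidates = []
        for ip, mac in arp_table(switch_ip):
            if ip not in seen:
                seen[ip] = mac
                new_candidates.append(ip)          # a newly found device may itself be a switch
        if new_candidates:
            discover_devices(new_candidates, arp_table, seen)   # recurse through new candidates
    return seen
```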

D. Device Type Discovery Algorithm

The type of a device is discovered by using the sysServices MIB object: the value of sysServices is converted into a seven-bit string, in which each bit corresponds to one of the seven layers of the OSI network model (a code sketch is given after Algorithm 3 below).

For the devices discovered through the switch, the type can be determined using the printer MIB and the bridge MIB. Devices that support the bridge MIB are of type switch, devices that support the printer MIB are of type printer, and devices that support neither MIB are of type workstation or end host. In this way both the type of the input switch and the types of the devices connected to the switch are discovered.

TABLE 4
ALGORITHM 3: DEVICE TYPE DISCOVERY ALGORITHM

I. Discovering the switch type:
1. For each switch given as input,
2. the sysServices object is used;
3. convert sysServices into a seven-bit string;
4. the type of switch is obtained on the basis of the enabled bits of the seven-bit string of sysServices;
5. repeat.

II. Discovered-device type discovery:
1. If the device supports the printer MIB, then return printer;
2. else if the device supports the bridge MIB, then return switch;
3. else return workstation;
4. repeat the discovered-device type discovery for all devices discovered.
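The sysServices decoding and the MIB-based classification can be sketched as follows. The helper names and the assumption that sysServices is already available as an integer are illustrative only; per RFC 1213, bit k of the value corresponds to OSI layer k.

```python
# Sketch of the device-type discovery of Algorithm 3 (simplified, illustrative).

def switch_layers(sys_services):
    """Decode the sysServices integer into the set of OSI layers the device operates at."""
    bits = format(int(sys_services), "07b")[::-1]          # least significant bit = layer 1
    return {layer for layer, bit in enumerate(bits, start=1) if bit == "1"}

def classify_device(supports_printer_mib, supports_bridge_mib):
    """Printer MIB implies a printer, Bridge MIB a switch, otherwise a workstation/end host."""
    if supports_printer_mib:
        return "printer"
    if supports_bridge_mib:
        return "switch"
    return "workstation"

print(switch_layers(6))               # sysServices = 6 -> layers {2, 3}: an L2/L3 switch
print(classify_device(False, True))   # 'switch'
```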

E. Device Details Discovery

Once the type of every device has been discovered, the details of each type of device are retrieved from its specific MIB. For switches, interface details, route details, and details of the connected devices are fetched. Interface details are maintained by the ifTable object, and for the route details of each switch the ipRouteTable MIB object is used. For end-host details discovery, this paper utilizes the MIBs of RFC 1514.

Workstation details include uptime, number of users, current processes, memory size, storage details and the device description. For retrieving these details the hrSystemUptime, hrSystemNumUsers, hrSystemProcesses and hrMemorySize objects are used, and for storage details the hrStorageTable MIB is used. For the description of each workstation, the hrDeviceDescr object of hrDeviceTable is used. For printer details discovery, this paper utilizes the RFC 1759 MIB module: prtInputTable is used to retrieve the input details of a printer and prtOutputTable is used for the output details of each printer.

F. Connectivity Discovery

An organization's network is made up of distinct types of devices, and finding the connections between the various types of devices and the switch is challenging work. In this section we find interface-to-interface connectivity; Algorithm 4 explains the connectivity between the devices of the network and the switch (a code sketch is given after Algorithm 4 below). After the device types have been discovered, the process of discovering connectivity begins.

For discovering connectivity, only the Bridge MIB is used. In the bridge MIB of a switch, the dot1dTpFdbTable MIB object is used: dot1dTpFdbAddress contains the MAC addresses of the devices connected to the switch, and dot1dTpFdbPort contains the port numbers of the switch to which those devices are connected. Using these two MIB objects we find the connectivity between the switch and the devices; Algorithm 4 explains the procedure.

Since dot1dTpFdbAddress contains the MAC addresses of the devices connected to the switch, we also need to retrieve the IP addresses of those devices. To do so, we build one set of all MAC addresses and IP addresses of devices connected to the switch from ipNetToMediaPhysAddress and ipNetToMediaNetAddress, and another set of all MAC addresses and port numbers from dot1dTpFdbAddress and dot1dTpFdbPort. By mapping the ipNetToMediaPhysAddress set onto the dot1dTpFdbAddress set, the matching MAC addresses are found; for those MAC addresses, the corresponding IP addresses and port numbers are retrieved from the ipNetToMediaNetAddress and dot1dTpFdbPort sets respectively. One switch can be connected to another switch through some interface; that interface is called a learned port, while the ports to which devices are physically connected are called physical ports of the switch.

In this way we can also find which switch is connected to which port of another switch. This method is heuristic, but by verifying it manually we found that it generates correct results for some switches of the organization's subnetwork; refining the method is left as future work.

TABLE 5
ALGORITHM 4: CONNECTIVITY DISCOVERY ALGORITHM

1. For each switch, to discover the interface to which each device is connected:
   a) get the set of MAC addresses of the switch from dot1dTpFdbAddress, i.e. {Mia, ..., Mir};
   b) get the set of port numbers of the switch from dot1dTpFdbPort, i.e. {Pia, ..., Pir};
   c) get the set of MAC addresses of the switch from ipNetToMediaPhysAddress, i.e. {Mja, ..., Mjr};
   d) get the set of IP addresses of the switch from ipNetToMediaNetAddress, i.e. {Nja, ..., Njr};
   e) to get the IP address of a device, map set (a) and set (c);
   f) if any MAC address from both sets matches, then the corresponding IP address and port number are retrieved;
   g) repeat from 1.
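The set mapping of Algorithm 4 amounts to joining the forwarding table with the ARP cache on the MAC address. A minimal Python sketch follows, with toy data and hypothetical helper names; it is an illustration, not the authors' implementation.

```python
# Sketch of the connectivity discovery of Algorithm 4: join the Bridge-MIB forwarding table
# with the switch's ARP cache on the MAC address to get (port, mac, ip) triples.

def map_connectivity(fdb_entries, arp_entries):
    """fdb_entries: list of (mac, port) from dot1dTpFdbAddress / dot1dTpFdbPort.
    arp_entries:  list of (mac, ip)  from ipNetToMediaPhysAddress / ipNetToMediaNetAddress."""
    mac_to_ip = {mac: ip for mac, ip in arp_entries}
    links = []
    for mac, port in fdb_entries:
        if mac in mac_to_ip:                        # MAC seen in both tables: device attached here
            links.append({"port": port, "mac": mac, "ip": mac_to_ip[mac]})
    return links

fdb = [("aa:bb:cc:dd:ee:01", 3), ("aa:bb:cc:dd:ee:02", 7)]
arp = [("aa:bb:cc:dd:ee:01", "10.0.0.21"), ("aa:bb:cc:dd:ee:09", "10.0.0.30")]
print(map_connectivity(fdb, arp))   # [{'port': 3, 'mac': 'aa:bb:cc:dd:ee:01', 'ip': '10.0.0.21'}]
```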

III. IMPLEMENTATION

We used Java 1.6, Apache Tomcat, the SNMP4J API [viii], and the NetBeans 6.9.1 IDE with the JDK, along with JFreeChart [v] for plotting graphs. We developed and tested our system on Red Hat Linux 5.4 with a 2.80 GHz Pentium 4 CPU and 512 MB RAM.

We applied our discovery system to a subnet of an organization's network: a department of ISRO (ISAC). We found a few switches, printers, workstations, a server and a gateway. Among those switches, one is a central LAN switch used to connect two departments. We noticed that the number of switches and routers stays the same, but the number of workstations varies. Some of the discovered devices may not be in use or alive, so we used ICMP echo requests to check whether each device is alive. For a device that is alive, we check whether it supports SNMP; if SNMP is not supported, we display some basic information and ping details to show that the device is alive but does not support SNMP. We compared the time taken for device discovery through each switch, and Figure 1 shows the time taken to discover the devices. Our system took 3 seconds to discover the 160 devices connected to the central LAN switch (switch 1), and 2 seconds to discover the 33, 33, 33 and 32 devices connected to switches 2, 3, 4 and 5 respectively. It took 3 to 4 seconds to discover the device type and details of each device.


Figure 1. Test results of the ISAC department: number of devices discovered (0-180) versus time taken for discovery in seconds (0-6) for switches 1-5.

IV. CONCLUSION AND FUTURE WORK

In this paper we focused on discovering the devices of a subnetwork of an organization; we also discovered the connectivity between the switch and the devices as well as some details of those devices. We discovered different types of devices, including switches, printers and end hosts, and enhanced the existing technique of device type discovery. We utilized the SNMP mechanism, which is the most efficient mechanism and generates the least amount of traffic in comparison to the mechanisms used in other research.

Since our discovery system was applied to one subnetwork of an organization, our future goal is to discover the entire organization's network. For visualizing a network we aim to represent it in graphical form and to include more link characteristics, such as link capacity and link failure, in the graphical representation. To notify the SNMP manager (client) about problems at an SNMP agent (server), such as a disk crash, we plan to use SNMP traps in the future.


ACKNOWLEDGMENT

Many thanks to Dr. Suresh L, Principal, Cambridge Institute of Technology, Bangalore, for his continuous encouragement in every aspect during the course of the work and his guidance in its betterment. I would also like to thank Shri D. K. Mohan, Chairman, Cambridge Institute of Technology, for providing excellent infrastructure and a platform over the course of the work.

I am thankful to the esteemed organization (ISRO) for giving me the opportunity to do the project, and I also thank my guide B. Prabakaran for his guidance in completing this project. My special thanks to Nikhil Kumar, an administrator in a department of ISRO, for his support in testing this project.

REFERENCES

i. Suman Pandey, Mi-Jung Choi, Sung-Joo Lee, James W. Hong, "IP Network Topology Discovery Using SNMP," POSTECH, Korea, 2013.

ii. R. Siamwalla, R. Sharma, and S. Keshav, "Discovering Internet Topology," Cornell Univ., Ithaca, NY, Technical Report.

iii. Y. Breitbart, M. Garofalakis, B. Jai, C. Martin, R. Rastogi, A. Silberschatz, "Topology Discovery in Heterogeneous IP Networks: The NetInventory System," IEEE/ACM Transactions on Networking.

iv. B. Lowekamp, D. R. O'Hallaron, T. R. Gross, "Topology Discovery for Large Ethernet Networks," ACM SIGCOMM, San Diego, CA, USA, pp. 237-248.

v. JFreeChart implementations, http://www.jfree.org.

vi. R. Smith, F. Wright, S. Zilles, J. Gyllenskog, "Management Information Base for Printer," RFC 1759, IETF, March 1995.

vii. P. Grillo, S. Waldbusser, "Host Resources MIB," RFC 1514, September 1993.

viii. SNMP, SNMP4J API, http://www.smp4j.org.

ix. K. McCloghrie, M. Rose, "Management Information Base for Network Management of TCP/IP-based Internets, MIB-II," RFC 1213, IETF, March 1991.

x. E. Decker, P. Langille, A. Rijsinghani, K. McCloghrie, "Bridge MIB," RFC 1493, July 1993.


Lagrange Based Quadratic Interpolation for Color Image Demosaicking

Shilpa N.S., Shivakumar Dalali

Dept. of Computer Science, Cambridge Institute of Technology, Bangalore.

shilpans11@gmail.com, shivakumar.dalali@gmail.com

Abstract

Digital image processing has become very popular over the past few decades, but an increasing level of noise affects the quality of images, and this noise has to be removed to improve image quality. The Bayer color filter array (CFA) gives information about the intensity of light in the red, green and blue (RGB) wavelength regions. The CFA image captured by the image sensor is then demosaicked to obtain a full color (RGB) image. The present work presents a novel color image demosaicking algorithm using a Lagrange quadratic interpolation method together with a directional interpolation method. By introducing Lagrange interpolation, the interpolation direction of the center missing color component can be determined with minimum error, and the center missing color component is interpolated using quadratic interpolation by exploiting the intra-channel correlation of the neighboring pixels. In addition, the present work strengthens the image quality and provides superior performance both objectively and subjectively.

The Bayer pattern gives special importance to the number of green sensors, to mimic the human eye's greater sensitivity to green light. The demosaicking method is based on interpolation to convert the two-dimensional Bayer-encoded image into the true color RGB image, which is an M-by-N-by-3 array.

The sensor alignment is specified by a text string describing the Bayer pattern: the string represents the order of the red, green and blue sensors for the four pixels in the upper-left corner of the image (left-to-right, top-to-bottom).

Fig. 1. Bayer CFA pattern (GRBG), with acquired pixels labelled 1A-16A:

G1A  R2A  G3A  R4A
B5A  G6A  B7A  G8A
G9A  R10A G11A R12A
B13A G14A B15A G16A

Keywords — Color filter array (CFA) interpolation, demosaicking, Lagrange quadratic interpolation.


I. INTRODUCTION

Human eyes can perceive a few million colors, most of which can be produced by mixing just the three primary colors, red, green and blue, in varying proportions. Image sensors are used to acquire these primary colors.

Three separate sensors would be required for a camera to acquire a full color image directly. To reduce cost and space, many cameras use a single sensor covered with a color filter array (CFA). In the CFA-based sensor configuration, 2×2 Bayer patterns are commonly used to acquire a color image, as shown in Fig. 1. A color image contains three RGB planes, but the CFA image contains only some of the color pixels of each plane, and the remaining pixels are missing.

The missing color components are estimated from the acquired neighboring pixels contained in the CFA image. This process is called interpolation; it is applied to each missing pixel to obtain a full color image, and the color interpolation process is known as demosaicking.

Although many different CFA patterns have been proposed for cameras, the most prevalent is the 2×2 'GRBG' Bayer pattern shown in Fig. 1. The color reproduction quality depends on the CFA template and the demosaicking algorithm that are employed.

Various demosaicking algorithms [1]-[8] based on the Bayer pattern have been proposed over the past few decades.

A Bayer filter array, or CFA, represents the arrangement of color filters such that each sensor in a single-sensor digital camera acquires only red, green or blue pixels.



Existing methods obtain a full color image by utilizing the color differences between the RGB planes. Each method has its own advantages and disadvantages with respect to the interpolation, and demosaicking presents many challenges in achieving efficient and effective interpolation that yields a full 24-bit color image with little degradation.

The interpolation technique proposed in this paper is the simplest approach to the demosaicking problem: it treats the color planes separately and fills in the missing pixels in each plane using a Lagrange-based quadratic interpolation. This approach is most effective in smooth regions; however, existing methods lead to color artifacts and lower resolution in regions with texture and edge structures. To overcome these issues, the proposed method reduces color artifacts and gives good resolution at edges. The interpolation is introduced in both the horizontal and vertical directions and is applied to each of the three RGB planes independently.

The rest of the paper is organized as follows. Section II describes the proposed Lagrange-based quadratic interpolation algorithm, Section III presents the experimental results, and Section IV the conclusions.


II. PROPOSED METHOD

A. Lagrange quadratic interpolation

The motivation for the proposed method comes from observing the traditional demosaicking methods [1]-[8]. Because of inaccurate edge information, the center missing color component cannot be interpolated accurately when only inadequate information about irregular edges and texture details exists. Here, the edge directions of the neighboring pixels are estimated in order to determine the main direction locally by Lagrange quadratic interpolation.
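For reference, the general Lagrange form on which a quadratic interpolation of a missing pixel can be based is sketched below in Python. The function name, the three sample positions and the example values are illustrative assumptions; the paper's own equations (1)-(4) are not reproduced here.

```python
# Sketch of quadratic Lagrange interpolation of a missing sample (illustrative only).
# For three sample positions x0, x1, x2 with values y0, y1, y2, the Lagrange polynomial
# evaluated at position x estimates the missing value.

def lagrange_quadratic(x, xs, ys):
    """xs = (x0, x1, x2): pixel positions; ys = (y0, y1, y2): acquired intensities."""
    x0, x1, x2 = xs
    y0, y1, y2 = ys
    l0 = (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
    l1 = (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
    l2 = (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
    return y0 * l0 + y1 * l1 + y2 * l2

# Example: estimate the pixel at position 1 from acquired pixels at positions 0, 2 and 4.
print(lagrange_quadratic(1, (0, 2, 4), (120, 128, 140)))   # 123.5
```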

B. Interpolating the missing Green component

The missing green components can be interpolated by using the adjacent neighboring acquired pixels, as shown in Fig. 2. A missing green component, for example G2I, is interpolated using the acquired green components G1A and G3A. Similarly, the remaining missing green components can be obtained using the neighboring acquired green pixels in the green plane.


Fig. 2. Demosaic Green plane.

Horizontal Interpolation

The missing pixel G2I in the first row can be obtained using equation (1), and the missing pixel G10I in the third row using equation (2).

First row: here X is the position of the missing pixel, X1,0 is the position of the acquired pixel G1A, and X1,1 is the position of the acquired pixel G3A.

Third row: here X is the position of the missing pixel, X3,0 is the position of the acquired pixel G9A, and X3,1 is the position of the acquired pixel G11A.

Vertical Interpolation

In the vertical interpolation only two pixels in each column are interpolated, because the pixel G6A is already acquired. The missing pixel G5I in the first column can be obtained using equation (3), and the missing pixel G7I in the third column using equation (4).

First column: here Y is the position of the missing pixel, Y1,0 is the position of the acquired pixel G1A, and Y1,1 is the position of the acquired pixel G9A.

Third column: here Y is the position of the missing pixel, Y3,0 is the position of the acquired pixel G3A, and Y3,1 is the position of the acquired pixel G11A.

C. Interpolating the Missing Red Component

Fig. 3. Demosaic Red plane.

The missing red components are interpolated analogously: horizontal interpolation is applied along the rows containing acquired red pixels (first and third rows), and vertical interpolation along the first, second and third columns.

D. Interpolating the Missing Blue Component

Fig. 4. Demosaic Blue plane.

The missing blue components are interpolated in the same way, with horizontal interpolation along the rows containing acquired blue pixels and vertical interpolation along the columns.
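As a simplified illustration of direction-aware interpolation of the green plane of a GRBG mosaic, the following Python sketch averages the two acquired green neighbours along the direction with the smaller gradient. It uses plain neighbour averaging rather than the paper's Lagrange quadratic formulas, and the function name and layout assumption are illustrative.

```python
import numpy as np

# Illustrative sketch only: fill missing green values of a GRBG Bayer mosaic by
# interpolating along the direction (horizontal or vertical) with the smaller gradient.

def interpolate_green(cfa):
    """cfa: 2-D array of raw sensor values in a GRBG layout (green where row+col is even).
    Border pixels are left unchanged for simplicity."""
    cfa = np.asarray(cfa, dtype=float)
    green = cfa.copy()
    h, w = cfa.shape
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if (r + c) % 2 == 0:                       # this position already holds a green value
                continue
            dh = abs(cfa[r, c - 1] - cfa[r, c + 1])    # horizontal green gradient
            dv = abs(cfa[r - 1, c] - cfa[r + 1, c])    # vertical green gradient
            if dh <= dv:                               # interpolate along the smoother direction
                green[r, c] = (cfa[r, c - 1] + cfa[r, c + 1]) / 2.0
            else:
                green[r, c] = (cfa[r - 1, c] + cfa[r + 1, c]) / 2.0
    return green
```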


III. Experimental Results

After applying the proposed interpolation, better results are obtained. Comparing the marked regions in Fig. 5 and Fig. 7, the marked region obtained with bilinear interpolation does not convey the detailed texture information, whereas the marked region in Fig. 7 shows better texture details. The marked regions in Fig. 8 and Fig. 10 show the difference between the original and demosaicked images in the smooth region.

Fig. 5. Original image of size 512×512.

Fig. 6. CFA image of size 512×512.


Fig. 7. Demosaicked image using Lagrange interpolation.

Fig. 8. Original image.

Fig. 9. CFA image.

Fig. 10. Demosaicked image.

IV. Conclusion

In this paper we proposed an efficient demosaicking algorithm that applies a Lagrange-based quadratic interpolation method along the horizontal and vertical directions. The algorithm is efficient, very simple and consumes little time, and it gives better image quality not only in smooth regions but also in irregular edge and texture details.

Using this method, the true color (RGB) image is obtained independently for each plane with little degradation. In future work, this method will be applied to all other directions around the missing pixels to obtain the true color image.

References

i. Xiangdong Chen, Gwanggil Jeon, Jechang Jeong, "Voting-based directional interpolation method and its application to still color image demosaicking," vol. 24, no. 2, February 2014.

ii. I. Pekkucuksen and Y. Altunbasak, "Edge strength filter based color filter array interpolation," IEEE Trans. Image Process., vol. 21, no. 1, pp. 393-397, Jan. 2012.

iii. K. H. Chung and Y. H. Chan, IEEE Trans. Image Process., vol. 15, no. 10, pp. 2944-2945, Oct. 2006.

iv. N. X. Lian, L. Chang, Y. P. Tan, and V. Zagorodnov, "Adaptive filtering for color filter array demosaicking," IEEE Trans. Image Process., vol. 16, no. 10, pp. 2515-2525, Oct. 2007.

v. R. Lukac, K. N. Plataniotis, and D. Hatzinakos, "Color image zooming on the Bayer pattern," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 11, pp. 1457-1492, Nov. 2005.

vi. D. Paliy, V. Katkovnik, R. Bilcu, S. Alenius, and K. Egiazarian, "Spatially adaptive color filter array interpolation for noiseless and noisy data," Int. J. Imag. Syst. Technol., vol. 17, no. 3, pp. 105-122, 2007.

vii. L. Zhang, X. Wu, A. Buades, and X. Li, "Color demosaicking by local directional interpolation and non-local adaptive thresholding," J. Electron. Imaging, vol. 20, no. 2, p. 023016, 2011.

viii. A. Buades, B. Coll, J. M. Morel, and C. Sbert, "Self-similarity driven color demosaicking," IEEE Trans. Image Process., vol. 18, no. 6, pp. 1192-1202, Jun. 2009.

ix. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600-612, Apr. 2004.

x. R. Lukac and K. N. Plataniotis, "A normalized model for color ratio based demosaicking schemes," in Int. Conf. on Image Process., 2004, vol. 3, pp. 1657-1660.


Case Study: Leveraging Biometrics to Big Data

Shivakumar Dalali 1, Dr. Suresh L. 2, Dr. Chandrakant Naikodi 3
1 VTU, Belgaum
2,3 Dept. of CSE, Cambridge Institute of Technology
shivakumar.dalali@gmail.coM, suriakls@gmail.com, chandrakant.naikodi@yahoo.in

Abstract- In order to manage the exponentially increasing quantity of biometric data, it must be handled from a big data perspective, using technologies capable of processing massive amounts of data efficiently and securely. The main challenge in the biometric industry is to overcome all the threats arising during the different phases of the biometric system development life cycle. Current biometric models underline the importance and significance of big data. This paper identifies the most important challenges encountered and the critical criteria to be followed in biometric analysis, and proposes a general approach for big data biometric analysis.

I. INTRODUCTION

Most people on the internet authenticate using passwords. The biggest threat with password-based authentication is the existence of too many password-account pairings for each user, which leads either to forgotten credentials or to the same user name and password being reused across multiple sites [1]. One possible solution to this problem is the use of biometric systems [2][6][14]. Biometric authentication techniques try to validate the identity of a user based on his or her physiological or behavioral traits, but their use on the internet is still relatively modest; the main reasons are the accessibility and scalability of existing biometric technology.

Similar issues are also encountered in other deployment domains of biometric technology, such as forensics and law enforcement. For example, according to [3], the biometric databases of the Federal Bureau of Investigation, the US State Department, the Department of Defense, the Department of Homeland Security, and the Aadhaar project in India are expected to grow significantly over the next few years to accommodate several hundred million (or even billions) of identities. Such expectations make it necessary to devise highly scalable biometric technology, capable of operating on enormous amounts of data, which in turn requires sufficient storage capacity and significant processing power.

A. Big Data Mining platform

In data mining systems, the mining algorithms require computationally intensive units for data analysis and comparison. The computing platform needs two types of resources: data and computing processors. For small-scale data mining tasks, a single desktop containing a hard disk and a CPU is sufficient, and indeed many data mining algorithms are designed for this type of problem [5][9].

Big data mining relies on cluster computers with a high-performance computing platform, and a data mining task is deployed by running parallel programming tools. The role of the software component is to ensure that a single data mining task, such as finding the best match for a query in a database with billions of records, is split into many small tasks, each of which runs on one or multiple computing nodes [11][13].
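As an illustration of splitting one matching task into many small tasks, the following Python sketch divides a query against a toy template gallery into chunks processed in parallel. Multiprocessing on a single machine stands in for a cluster, and Euclidean distance stands in for a real biometric matcher; all names and sizes are assumptions for the example.

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

# Illustrative sketch: split one matching query over a large template gallery into chunks,
# process each chunk in parallel, then take the globally best match.

def best_match_in_chunk(args):
    query, chunk, offset = args
    dists = np.linalg.norm(chunk - query, axis=1)     # distance of the query to each template
    i = int(np.argmin(dists))
    return offset + i, float(dists[i])

def best_match(query, gallery, n_chunks=4):
    chunks = np.array_split(gallery, n_chunks)
    offsets = np.cumsum([0] + [len(c) for c in chunks[:-1]])
    with ProcessPoolExecutor() as pool:
        results = pool.map(best_match_in_chunk,
                           [(query, c, o) for c, o in zip(chunks, offsets)])
    return min(results, key=lambda r: r[1])            # (index in gallery, distance)

if __name__ == "__main__":
    gallery = np.random.rand(10000, 128)               # toy gallery of 128-D feature vectors
    query = gallery[1234] + 0.01 * np.random.rand(128)
    print(best_match(query, gallery))
```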

One solution to the outlined issues is to move existing biometric technology onto a big data platform, which ensures appropriate scalability of the technology, a sufficient amount of storage, parallel processing capabilities, and new types of tools to analyze the data; with the widespread availability of mobile devices it also provides an accessible entry point for applications and services that rely on mobile clients. Big data biometric analysis is thus capable of addressing issues related to the next generation of biometric technology, while at the same time offering new analytical tools that can be used alongside existing biometric systems.

However, moving existing biometric technology to the big data environment is a nontrivial task, and biometric architects, developers and researchers who attempt it should be aware of the challenges encountered with big data [10][12].

The paper is structured as follows. In section 2 we concentrate on the challenges, considerations and trends for big data in the field of biometrics. Section 3 concentrates on working strategies, such as the operating territory and the focus areas for big data biometric analysis. In section 4 we propose a general approach for big data biometric analysis.

Finally the paper is concluded with some comments.

II. RELATED WORK

Big data biometric analysis is a highly active field that gained popularity only a few years ago. Since the field covers a wide range of areas relating to all phases of big data analysis in biometrics, it is natural that not all aspects of the field are appropriately covered in the available scientific literature; this is also true for big data biometric analysis specifically [7][8].

This paper tries to cover the challenges faced in the big data environment, because big data offers many insights for the analysis. At the same time, we need to take many considerations into account and mark out the operating territories in the field of biometrics.

A. Challenges for Big data biometric analysis

Improvements in electronic devices and the multiplication of data collection sources produce enormous amounts of biometric data. Processing data of this volume, variety, value and velocity poses the following challenges in the field of biometrics.

Handling large data: Almost every electronic sensing device generates some kind of digital data, but most of it is not used, due to challenges such as storage, analysis and the closed nature of existing biometric systems; this also applies to biometric devices and biometric data. Handling such huge biometric data is a real challenge and requires new types of storage, analytical skills and open systems.


New biometric modes and multimodality: Different types of biometric modes are gaining popularity. New physical biometric modes such as palm print, hand geometry, voice, signature, DNA and hand veins, soft biometric modes such as knuckle image samples, conjunctival vasculature and tattoo images, and new behavioral biometric modes such as blinking pattern, ECG, EEG, gait, game strategy, keystroke dynamics and text style all need to be addressed properly. Even multimodal biometrics requires new techniques for fast analysis, and developing such an environment is a real challenge for big data biometric analysis [4][16].

New processing capabilities and algorithms: Big data and the use of the cloud make it necessary to change traditional processing. The real challenge is therefore to develop new types of algorithms and new processing ideas for biometric data.

Interoperable system: Due to diverse modalities which can utilize tens of petabytes of biometric material in data storage, in near future these systems will not only be used to identify individuals in near real time, they will also need to share information between multiple organizations in order to successfully accomplish a wide range of identity management missions. This idea needs interoperable systems [15].

Very large biometric repositories: Traditional biometric systems have begun to reach the limits of their scalability due to a surge in enrollment and the increasing multi-modality of databases. To overcome these problems, need to have a large biometric repository.

B. Considerations for big data biometric analysis

Because of the volume, value, variety and velocity characteristics of big data, the design of big data biometrics needs the following kinds of considerations.

Matching/processing algorithms: Most existing biometric algorithms are proprietary, unique and not interoperable, and some are very expensive to implement and maintain. For these reasons the matching and processing algorithms need to be refined or redefined.

Data fusion: Multimodal biometric systems are now common, which makes modelling more difficult. When fusing the different data modes, filtering out bad data must be considered, because bad data can be worse than no data.

Analysis: Biometric data analysis algorithms need to consider complex relationships and eliminate duplicate/multiple identities.

Storage: Big data needs huge storage, for which the cloud is a natural choice. Data needs to be accessed from "the edge", which requires adequate throughput to meet performance, security, classification, privacy and protection requirements.

C. Trends in big data biometrics

Big data analysis has some trends which apply to big data biometrics as follows:

Keeping all inputs/samples instead of only the best one, which impacts storage, processing, etc.

Collecting biometric samples at the edge of the envelope creates new processing problems in eliminating duplicates and improving quality, orientation and resolution.

New biometric modes, e.g. voice, gait and scent, are continuously being added to existing biometric systems.

Exploitation of soft biometrics such as scars (marks) and tattoos is creating new requirements for biometric system designers.

III. WORKING STRATEGIES

Raising existing biometric systems to the level of big data biometric systems requires ideas such as operating territories and focused areas.

A. Big data biometrics operating territory

In order to find the insights present in big data biometrics we need some operating territories, which are as follows:

1) Expand and improve open source biometric algorithms.

2) Refine the system model and tune the existing biometric algorithms for big data biometrics.

3) Adapt big data analytics for biometrics.

4) Enhance visualization tools.

B. Focused areas

To elevate biometric systems to big data biometric systems we need to focus on the following areas:

Analytic stack.

Pipelines: Parallel computing and algorithms.

Data visualization.

Biometric specific system modelling, biometric fusion environment, visualization for biometrics and intelligence.

IV. GENERAL APPROACH FOR BIG DATA BIOMETRIC PROCESSING SYSTEM

Fig. 1 shows the different components of a big data biometric processing system.

Diversified biometric data capture: For a single trait, data may be captured from different sources, so the captured data may be heterogeneous and diversified. For example, in face recognition, face images may be collected from different sources such as mobile cameras, digital cameras, surveillance cameras and Facebook images. Preprocessing may then be used to extract salient features from these diversified data.

Fig 1: General approach for big data biometric processing system

Parallel preprocessing algorithms: Preprocessing is required to clean the samples, which may be affected by various types of noise and interference, and to prepare them in an appropriate format for feature extraction or biometric analysis. Preprocessing big data biometrics therefore needs new parallel preprocessing algorithms able to remove noise and interference from biometric data.

Parallel big biometric algorithms: Big data biometric analysis needs parallel feature extraction and data analysis or data mining algorithms that are suitable to meet the considerations mentioned in Section II.B. The extracted biometric templates are stored in big databases or cloud storage as individual references.

Big data biometric recognition often makes use of a comparator module that can operate in two different modes, namely user verification and user identification. The former performs authentication in the "are you who you claim to be" mode: it involves a straightforward one-to-one comparison, and the final verdict is a binary accept or reject decision. The latter performs an exhaustive one-to-many search over the entire user database to answer the "who are you" question; its main aim is to find the closest matching identity, if any exists.
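A minimal sketch of these two comparator modes is given below. The plain feature vectors, the Euclidean matcher and the threshold value are illustrative assumptions; a real big data biometric system would substitute its own matcher and a distributed gallery.

import java.util.HashMap;
import java.util.Map;

// Sketch of the two comparator modes: 1:1 verification and 1:N identification.
public class BiometricComparator {

    // Lower distance = better match; the threshold used below is illustrative only.
    static double matchScore(double[] probe, double[] reference) {
        double sum = 0.0;
        for (int i = 0; i < probe.length; i++) {
            double d = probe[i] - reference[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Verification: "are you who you claim to be" - a single 1:1 comparison, accept/reject.
    static boolean verify(double[] probe, double[] claimedTemplate, double threshold) {
        return matchScore(probe, claimedTemplate) <= threshold;
    }

    // Identification: "who are you" - exhaustive 1:N search for the closest identity, if any.
    static String identify(double[] probe, Map<String, double[]> gallery, double threshold) {
        String best = null;
        double bestScore = Double.MAX_VALUE;
        for (Map.Entry<String, double[]> e : gallery.entrySet()) {
            double s = matchScore(probe, e.getValue());
            if (s < bestScore) {
                bestScore = s;
                best = e.getKey();
            }
        }
        return (bestScore <= threshold) ? best : null;   // null means no identity found
    }

    public static void main(String[] args) {
        Map<String, double[]> gallery = new HashMap<>();
        gallery.put("alice", new double[]{0.1, 0.9, 0.3});
        gallery.put("bob",   new double[]{0.8, 0.2, 0.5});

        double[] probe = {0.12, 0.88, 0.31};
        System.out.println("verified as alice: " + verify(probe, gallery.get("alice"), 0.1));
        System.out.println("identified as: " + identify(probe, gallery, 0.1));
    }
}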

V. CONCLUSION

Leveraging existing biometrics into big data has enormous potential market value and as such attracts the interest of research and development groups from all around the world. This paper highlighted the challenges, considerations, trends, operating territories and focused areas that need to be considered when designing big data biometrics, and presented a general approach for big data biometric processing systems designed as an analysis stack.

BIBLIOGRAPHY

i. D. Balfanz, "The Future of Authentication", IEEE Security and Privacy, vol. 10, pp. 22-27, 2012.
ii. Edmund Kohlwey, Abel Sussman, Jason Trost, Amber Maurer (Booz Allen Hamilton), "Leveraging the Cloud for Big Data Biometrics", 2011 IEEE World Congress on Services.
iii. Jernej Bule, Peter Peer, "Fingerprint Verification as a Service in KC CLASS", Proceedings of the 1st International CLoud Assisted ServiceS, Bled, 25 October 2012.
iv. Girish Rao Salanke N S, N. Maheswari, Andrews Samraj, S. Sadhasivam, "Enhancement in the Design of Biometric Identification System based on Photoplethysmography Data", Proceedings of ICGHPC, March 2013.
v. Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding, "Data Mining with Big Data", IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, January 2014.
vi. Sharifah Mumtazah Syed Ahmad, Borhanuddin Mohd Ali, Wan Azizun Wan Adnan, "Issues and Challenges of Biometric Applications as Access Control Tools of Information Security", IJICIC, vol. 8, 2012.
vii. David Hagan, "Biometric Systems and Big Data", Big Data Conference, May 8-9, 2012.
viii. A. Machanavajjhala, J. P. Reiter, "Big Privacy: Protecting Confidentiality in Big Data", ACM Crossroads, vol. 19, no. 1, pp. 20-23, 2012.
ix. B. Huberman, "Sociology of Science: Big Data Deserve a Bigger Audience", Nature, vol. 482, p. 308, 2012.
x. A. Labrinidis, H. Jagadish, "Challenges and Opportunities with Big Data", Proc. VLDB Endowment, vol. 5, no. 12, pp. 2032-2033, 2012.
xi. D. Luo, C. Ding, H. Huang, "Parallelization with Multiplicative Algorithms for Big Data Mining", Proc. IEEE 12th Int'l Conf. Data Mining, pp. 489-498, 2012.
xii. X. Wu, K. Yu, W. Ding, H. Wang, X. Zhu, "Online Feature Selection with Streaming Features", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 35, no. 5, pp. 1178-1192, May 2013.
xiii. Joseph Rickert, "Big Data Analysis with Revolution R Enterprise", January 2011.
xiv. Peter Peer, Jernej Bule, "Building Cloud-based Biometric Services", Informatica 37, pp. 115-122, 2013.
xv. Joshua C. Klontz, Brendan F. Klare, Scott Klum, Anil K. Jain, Mark J. Burge, "Open Source Biometric Recognition", IEEE BTAS, Sept. 2013.
xvi. N. S. Girish Rao Salanke, N. Maheswari, Andrews Samraj, "An Enhanced Intrinsic Biometric in Identifying People by Photoplethysmography Signal", Proceedings of ICSIP 2012, Springer India, 2013.
xvii. S. N. S. Raghava, "Iris Recognition on Hadoop: A Biometrics System Implementation on Cloud Computing", Proceedings of IEEE CCIS, 2011.
xviii. Karl Ricanek Jr., Chris Boehnen, "Facial Analytics: From Big Data to Law Enforcement", IEEE Computer Society, 2012.
xix. Randal E. Bryant, Randy H. Katz, Edward D. Lazowska, "Big-Data Computing: Creating Revolutionary Breakthroughs in Commerce".
xx. Manas Baveja, Hongsong Yuan, Lawrence M. Wein, "Asymptotic Biometric Analysis for Large Gallery Sizes", December 2010.
xxi. Anil K. Jain, Brendan Klare, Unsang Park, "Face Recognition: Some Challenges in Forensics".

‘DHERO’ - Mobile Location Based Fast Services

Bharath D., Anand S Uppar
Dept of CSE, SDIT Mangalore, DK
Bharathd363@gmail.com

ABSTRACT: Location based services are a part of mobile multimedia services through which users can find services and products. Such services support people in their daily errands, and there are numerous application areas such as mobile work, shopping, sports, tourism, deliveries, community services, public transport and safety. Mobile location based services build on standard technologies such as mobile devices, wireless networks and maps. In particular, they use the positioning capability of a mobile device: GPS technology extracts the position of the user, and nearby services are determined from that location. One of the major issues in location based services is giving users privacy controls without greatly degrading the service. The main aspects of this application are service provision, location sharing and user safety.

Key Terms: LBS (Location Based Services)

I. INTRODUCTION

In this computerized era people depend on technology for daily errands. Nowadays online shopping sites, home deliveries and mobile services are gaining popularity because they save time and money; many service-providing and messenger applications are now implemented on mobile platforms.

The mobile information society is developing rapidly as mobile telecommunications moves from third to fourth generation technology. The Internet and its services are moving to wireless devices, the convergence of content and technology is deepening, and the market is being reorganised; different actors want to secure their place through mobile applications. Today most people choose online sites and mobile applications to buy the products they need, which directly drives growth in mobile and web technology. Success in online e-commerce rests on two principal marketing channels: first, a web user interface through which users buy products from online sites; second, a mobile application through which users can buy specific products. The mobile application is a key success factor for online business, and quality of service and user interface standards are decisive for both mobile technology and online shopping sites.

In recent years Android-enabled mobile phones have attracted special attention because of their user-friendly environment and high-level operation support. Among the many features Android offers, one of the most interesting is GPS, which helps in getting driving directions and provides location information. Providing location based information through GPS is what is called LBS.

A location sharing application provides a personal location sharing service with known persons, based on user permission.

The location sharing service is mainly concerned with user safety: it directly sends the user's location to emergency stations such as police stations, fire stations and hospitals, which then provide the safety service to that user based on the location. It also provides guidance on the required route and transport; all these services are based on Google nearby search.

Fig 1: Location based service components.

II. LITERATURE SURVEY

This section briefly presents the related works on location based services and its applications.

Today we are in the era of mobile applications and online shopping sites, which are replacing traditional marketing with fast online marketing.

At present, about 40% of users choose online sites to buy the products they need and the remaining 60% choose mobile applications. For example, Flipkart, Myntra, Amazon and Snapdeal are professional online shopping e-commerce organizations providing services to the user's location.

These organizations started with online sites, but nowadays they concentrate mostly on mobile applications because of the rapid growth in mobile users and technology; they have developed mobile applications to market their vendors' products.

Nowadays the number of Android-based applications is increasing rapidly: a developer survey conducted in April 2013 found that 71% of mobile developers target Android, and there are about one billion active Android users per month.

Mobile-based marketing is becoming increasingly popular because of the rapid growth in smartphone users, and mobile location based services (MLBS) are a considerably profitable opportunity for both service providers and users. An MLBS application exploits knowledge about the geographical position to determine the address at which an ordered product should be delivered. Applications such as OLX, Quikr and WhatsApp have gained popularity in short periods; they use web services as the interface between the database and the application. A mobile location based service depends on obtaining the current location of the user, which requires several LBS components.

There are a huge number of applications for location based services, online services and nearby service provider search; most of them work on top of Google nearby place search. Android applications related to LBS mainly concentrate on the current location of the user, which is obtained using GPS or the network provider.

Mobile location based services are classified into many categories; the major types are:

Entertainment services: location sharing with friends and community, and location based gaming.

Information services: nearby location checking, local weather reports, finding a taxi and entertainment information.

Navigation services: navigation to a particular place, map services and traffic information.

E-commerce services: nearby store search, sales information about products, and mobile transactions or billing.

Security and tracking services: finding nearby emergency service providers and tracking a particular person's activity.

Mobile LBS most commonly require GPS together with an internet service to locate the user. In most applications a web service is the common technique used to provide the interface between the database and the mobile application.

Most of these applications are built around e-commerce functionality, which includes searching for nearby shops and products based on the user's location, as in the OLX and Quikr mobile applications. These are purely commerce service applications: they obtain the user's current location, search for the required products and display them to the user.

E-commerce services such as Flipkart and Amazon provide both mobile and web user interfaces, which increases the quality of service offered to users.

Service provider organizations mainly depend on e-commerce technology to market their vendors' products through online sites, and nowadays they back this with mobile-based marketing.

E-commerce companies consider only high-volume vendors or vendors supplying bulk products. For this reason, middle-level sellers cannot sell their products through online sites or mobile applications.

At present, local service providers such as home delivery services, local transport services and hotels are losing business because high-technology support, such as online shopping and advertising through applications, is available mainly to large-scale businesses. Medium and small-scale businesses face problems due to this lack of technology support, since they cannot spend much money building web pages for individual services. To overcome this problem we propose an application that plays an important role at the medium and small-scale business level.

III. PROPOSED SYSTEM

Nowadays there exist many location based service applications, for example location sharing with friends and parents, or emergency contact based on the user's location. However, there is no application that provides a combination of such services, so here we propose a technique for combining them. Our proposed application is implemented for both service providers and users to exchange information between them; in parallel, we introduce a location sharing entertainment service that provides an interface for user-to-user information exchange.

A. Requirements

This work is designed for service providers and users. The proposed system offers a two-way interface for service providers and a single interface for normal users: the first is an Android mobile application that provides the primary interface for both users and service providers; the second is a web user interface through which service providers can use the e-commerce service for their business from a browser.

B. Architecture

A main aspect of this application is verifying service provider information to avoid fake registrations, since fake service providers cause failures of the services ordered by users. To avoid this problem we propose a dashboard Android application for the owner.

Two interfaces are provided for service providers.

Android application: it provides an application interface for both service providers and normal users. This application is developed in Android Studio (Java, XML) with the support of a web service (PHP) connected to a database (MySQL).

Fig 2: Architecture diagram for proposed system

Web application: it provides a web user interface for service providers to upload their information and view their respective business orders from customers.

C. Security in application

A major aspect of mobile applications is protecting the privacy of user data from unauthorized parties. The application returns responses based on the user's authentication information; primary authentication is based on user name and password.

In this application information is exchanged through a web service; to keep the URL private we pass a secret key along with the base URL. This provides safe information exchange between the application and the database through the web service.

The following statement describes the URL access security used by the web service:

URL = I(info) + key(ss)

where key(ss) is the server-side secret key and I(info) is the respective user information, which includes authentication information plus service information.
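As a concrete illustration of this idea, the sketch below signs the request information with a server-side secret before it is appended to the base URL, so the web service can reject requests whose signature does not match. The base URL, parameter names and the HMAC construction are assumptions for the example, not the exact scheme used in the application.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

// Sketch of "URL = I(info) + key(ss)": the request parameters are signed with a shared secret.
public class SignedUrlSketch {

    // HMAC-SHA256 of the request data, hex encoded.
    static String sign(String data, String serverSecret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(serverSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        byte[] raw = mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : raw) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        String info = "user=bharath&service=delivery&lat=12.91&lng=74.85";  // I(info), illustrative
        String key = sign(info, "server-side-secret");                       // key(ss), illustrative
        String url = "https://example.com/api/order.php?" + info + "&sig=" + key;  // hypothetical endpoint
        System.out.println(url);  // the server recomputes the signature and honours only matching requests
    }
}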

IV. CONCLUSION

This paper has presented the mobile location based fast services application, which provides commerce services to users with fast delivery. Its main advantage is that service providers are filtered based on the respective user's location: the longitude and latitude coordinates of the mobile device determine the user's location, so service delivery is comparably fast. It also supports medium and small-scale businesses by providing a mobile interface through which service providers add their items into the application, along with a web user interface for service providers.

‘Im@’ - A Technique for Sharing Location

Bhavya A.1, Balapradeep K. N.2, Dr. Antony P. J.3
1Dept of CSE, VTU Belgaum, KVGCE Sullia, DK
2,3Dept of CSE, KVGCE Sullia, DK
Bhavya2535@gmail.com

Abstract: The use of mobile phones has become a part of our daily life. Most mobile phones and smartphones today are equipped with Global Positioning System (GPS) sensors to obtain location information, and location based services (LBS) are used to obtain knowledge about the geographical position. Many applications already share one's location with others as location coordinates (longitude and latitude) that can be viewed on a Google map, so-called map based location sharing. This paper gives a detailed description of sharing a location with friends as text (text based location sharing) instead of on a map, since map based sharing is more time consuming than text based sharing. The longitude and latitude (geographical coordinates) are used to obtain the location, which is converted into text form and shared with friends. The application is also enriched with near-by services: it lists all services near the user's location, where near-by services are the organizations that provide services to users close to the organization's location.

Keywords: GPS (Global Positioning System), location based services (LBS), longitude and latitude.


I. INTRODUCTION

In this computerized era of Facebook, WhatsApp, Twitter and the like, sharing our day-to-day activities with friends and family has become a trend, and sharing one's location also plays a very important role.

In earlier days the telephone was used to exchange information; as technology advanced this led to the evolution of the mobile phone, on which information can be exchanged through text messages and calls. Later, mobile phones were equipped with many additional features such as Bluetooth, Wi-Fi, Internet and GPS [1], which led to the era of smartphones. A smartphone is a mobile phone with an advanced operating system; typical features include a touch screen interface, a digital camera, the ability to run third party applications, inbuilt GPS, web browsing and mobile payment. The first smartphone was the "Ericsson R380", released by Ericsson Mobile Communications in 2000, and 2008 saw the first phone to use Android, the HTC Dream.

Android is an open source platform founded by Andy Rubin and owned by Google. Android [2] is the software that runs as the smartphone's operating system, and it also supports the execution of local and third party applications. There exist many open source mobile platforms, but iOS from Apple, Android from Google, Symbian from the Symbian Foundation and Windows from Microsoft are the most popular. Android provides a platform from which any application can be downloaded; according to research done so far, more than 68,000 applications are available and the number of applications downloaded on Android-enabled mobile phones has crossed one billion.

In recent years Android-enabled mobile phones have attracted special attention because of their features; among the many features Android offers, one of the most interesting is GPS, which helps in getting driving directions and provides location information. The main purpose of using GPS here is to provide location based information, also called a location based service [3].

Location based services provide the location of a person or device, and that location can be shared with others. A related capability is tracking the location of another person or device; when the device reports its own position this is called "self-reporting position" rather than tracking.

The location of a person or device is obtained as location coordinates (longitude and latitude). GPS sensors inside the device sense the accurate location, obtain the longitude and latitude of the position, and the location can be displayed on a Google map.

Fig. 1 LBS Components

Fig. 1 shows the LBS components. The LBS application is the specific application as a whole, together with the smartphone's sensors and a server to store data; the LBS middleware sits between the LBS application and the core LBS features. The core LBS features are location tracking, which tracks the location of the device; the GIS (Geographic Information System) provider, which supplies the Google maps; and the location collection service (LCS), which collects the longitude and latitude information of the device.

Today all smartphones have location sensing capability built in; a successful location based service depends on providing accurate location coordinates.

II. LITERATURE SURVEY

This section briefly presents the related works on location based services and its applications.

Today we are in the era of smartphones and iPhones, which are replacing bulky desktops in every respect. There are a huge number of applications in which a person walking along the road needs relevant data, and such data can be obtained through location based services. GPS as a positioning system is becoming increasingly popular, and it is easy these days to use a map by connecting the device to a GPS receiver. GPS chips inserted into the device obtain the user's accurate location from satellite signals, and the location can be viewed on a Google map.

Chris Hulls and Alex Haro proposed the application Life360 [4] in 2008. This is a family-network location based service application which allows family members to share their location and communicate easily by adding family members into the application, forming a family circle. Its main features are that a person can instantly see where other family members are located, can choose whether or not to share a particular location at a particular time, and can chat with other members within the circle, thereby providing family safety; it also gives an alert when a new person enters the circle or when a family member leaves it.

Another noteworthy location sharing application is Find My Friends [5], released by Apple in 2011. It allows a user to track another person's location and to share his own location with people of his choice; if someone wants to track another person, a notification is sent to that person as a request. Location sharing can be turned on or off at any time, and since the location is obtained by GPS, tracking and sharing become difficult whenever GPS is turned off.

In the paper "GPS and SMS based child tracking system using smart phone" [6], A. Al-Mazloum, E. Omer and M. F. A. Abdullah presented an application for tracking a child using a smartphone. It was developed specifically to provide child safety: once the application is installed by both child and parent, the parent can track the child's activities. The parent obtains the child's location by sending a request message, the child's phone responds to the request, and the parent can then view the child's location on a Google map, thereby providing child safety.

In the paper titled "Android based mobile application development and its security" [7], Suhas Holla and Mahima M. Katti give a detailed description of how to achieve security for applications downloaded from Google, since Android is an open source mobile operating system platform. Anyone can upload an application to Google, so some may take advantage and upload applications with security flaws, which leads to computer crime; such problems can be mitigated by using a layered approach to developing Android applications, including an application sandbox that detects suspicious applications both statically and dynamically.

The application Nearest Friends Notification Reminder [8] provides a notification when any friend in the user's friend list moves into the same location. The GPS tracker tracks a friend's location only when that friend enters the same location as the user. The advantage of this application is that it helps in meeting a friend who is in the same area or place.

Google has the built-in feature Search Nearby [9] to search nearby locations. It helps a person find nearby places and, along with the location, provides navigation and bookmark options: the navigate option gives directions to the location by showing the route, and the bookmark option lets a place be marked as of interest, saving the location so that directions can be retrieved later.

III. PROPOSED SYSTEM

Location based services provide the location of the device using GPS; the location is obtained as geographical coordinates (longitude and latitude). Many applications already exist to share the location of a person or device, but the purpose of the proposed system is to share the location with friends in text form. It also searches for nearby services and displays the contacts of registered nearby service providers, so that the user can interact with them and get their work done.

A. Requirements

This work is designed for both users and service providers. Both must have smartphones that support GPS, and both must register before using the application. Users use the application to view their friends' locations and the nearby service providers; service providers can interact only with users who wish to use the services they provide.

This application is developed using Android Studio, a MySQL database and a web service written in PHP. Android OS is used for the implementation because it targets the largest number of users.

B. Application Architecture

We propose a solution to share the location and to access nearby services using GPS technology, which exists in all advanced smartphones. The simple idea of this application is to share and track a person's location and to search for nearby services.

Fig.2. Architecture diagram of proposed system

1) GPS provider: The GPS provider supplies the exact location of the device or person. Almost all location based services use GPS to obtain the location: GPS receivers inside the device analyse satellite signals and identify the user's location as geographical coordinates.

2) Mobile client: Mobile clients are the people who use the mobile application; in this application both users and service providers are mobile clients. A mobile client obtains its location through GPS and stores it in the server database, so that clients can retrieve the stored location information at any time.

3) Server: The server is connected to the database to store information, both location information and the user's personal information. Whenever a person's location changes, the update is stored in the database, so a user can track another person's position as saved there; a user can access only the current location information of persons present in the database.

Fig. 2 shows the architecture diagram of the proposed system. The mobile client can be either a user or a service provider. A user can share or track the location of friends in the friend list: the location is obtained through the GPS provider, which takes the longitude and latitude of the person's position, converts it into text form (see the sketch at the end of this section) and stores it in the server database. When a user wishes to view friends' locations, refreshing the application retrieves the location of every friend in the friend list from the database and displays it in text form. Fig. 3 shows the friends list together with each person's name and location, where the location is displayed in text form as the person's status. The user can also access nearby services: clicking on nearby services shows the registered services near the current location, and the user can interact with the service providers of interest and get their work done. For example, suppose you are near a restaurant named Taj; if that restaurant is registered with the application it will automatically appear in the service provider list, and you can click on the restaurant's name and order food. The restaurant learns your location details once you order and delivers the order to your location. Fig. 4 shows the near-by services with respect to the user's location.

Fig. 3. Friends list

Service providers register with the application by providing their location and service details; after registration, whenever a user enters the area where the service provider exists, that service is automatically displayed in the user's nearby service list. Once listed, the user can see the location of the service and interact with the provider to get the work done. The service provider learns the user's location only when the user shows interest; the provider's only role is to deliver the service to users.

Fig. 4. Near-by services

In order to use the application, users and service providers must register; a person can register either as a user or as a service provider.

4) User registration: A user registers with the application by providing personal details such as phone number, city and password; after registration the user's personal data and location information are stored in the user database, and any changes regarding the user are updated there. Once registered, the user can log in to the application, add friends, share a location, view friends' locations and search near-by service providers.

5) Service provider registration: A service provider registers with the application by providing name, place, service and phone number; after registration the provider becomes visible to users in that place. The provider can interact with a user only if the user shows interest in the service, and can see the user and the user's location only until the requirement is fulfilled; after that the provider can no longer see the user.

The application interface connects the database and the mobile client; the location server stores the location information in the database, where it can be stored and retrieved at any time.
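The conversion from coordinates to a text address mentioned above could be done on the client with the standard Android Geocoder, as in the following sketch; the helper name and the fallback behaviour are assumptions for illustration, not the application's exact implementation.

import android.content.Context;
import android.location.Address;
import android.location.Geocoder;
import java.io.IOException;
import java.util.List;
import java.util.Locale;

// Sketch of the "coordinates to text" step used for text based location sharing.
public class LocationTextHelper {

    // Returns a human-readable address for the given coordinates, or a plain
    // "lat,lng" string when reverse geocoding is unavailable (e.g., no network).
    public static String toTextLocation(Context context, double latitude, double longitude) {
        try {
            Geocoder geocoder = new Geocoder(context, Locale.getDefault());
            List<Address> results = geocoder.getFromLocation(latitude, longitude, 1);
            if (results != null && !results.isEmpty()) {
                return results.get(0).getAddressLine(0);   // e.g. "Sullia, Karnataka, India"
            }
        } catch (IOException e) {
            // Geocoder needs a backend service; fall through to the raw coordinates.
        }
        return latitude + "," + longitude;
    }
}

The text returned by this helper is what would be stored on the server and shown as the friend's status, so only a short string, rather than map tiles, travels over the network.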

IV. CONCLUSION

Nowadays many applications exist based on location based services. The proposed application automatically updates the location every 30 seconds and shares the location address in text form with friends. Basic applications that share the location in a map view require a high-speed network, whereas sharing a location in text form also works over a low-speed network. The system uses longitude and latitude to share the location. An additional feature is the nearby services option, which shows all services that exist near the user's location, so the user can make use of a nearby service by interacting with the service of interest. The application provides secured location sharing based on authorization, and privacy is achieved by providing the data only to subscribed users.

REFERENCES

i. Chandra, A., Jain, S., Qadeer, M.A., "GPS Locator: An Application for Location Tracking and Sharing Using GPS for Java Enabled Handhelds", 2011 International Conference on Computational Intelligence and Communication Networks (CICN), pp. 406-410, 7-9 Oct. 2011.
ii. http://en.wikipedia.org/wiki/android
iii. Sandeep Kumar, Mohammed Abdul Qadeer, Archana Gupta, "Location Based Services using Android", IEEE, 2009.
iv. Jennie W. Lai, Lorelle Vanno, Michael W. Link, Jennie Pearson, Hala Makowska, Karen Benezra, Mark Green, "Life 360: Usability of Mobile Devices for Time Use Surveys", AAPOR, May 14-17, 2009.
v. http://en.wikipedia.org/wiki/Find_My_Friends
vi. A. Al-Mazloum, E. Omer, M. F. A. Abdullah, "GPS and SMS-Based Child Tracking System Using Smart Phone", International Journal of Electrical, Computer, Electronics and Communication Engineering, vol. 7, no. 2, 2013.
vii. Suhas Holla, Mahima M. Katti, "Android based mobile application development and its security", IEEE, 2012.
viii. http://blogs.wsj.com/digits/.../facebook-to-notify-users-when-friends-are-nearby/
ix. http://en.wikipedia.org/wiki/Nearby

Self-Vanishing of Data in the Cloud Using Intelligent Storage

Shruthi, 2nd sem M.Tech, Dept of CSE, SVIT, Bengaluru
Ramya N, 2nd sem M.Tech, Dept of CSE, SVIT, Bengaluru
Swathi S.M, 2nd sem M.Tech, Dept of CSE, SVIT, Bengaluru
Sreelatha P.K, Assistant Professor, Dept of CSE, SVIT, Bengaluru

ABSTRACT

The cloud is meant for storing large amounts of data securely for long periods of time, and a user may store some of his confidential data there. To maintain good consistency, the cloud service provider replicates the data geographically without the permission of the authorized user. Since the data is confidential and is replicated and stored on different servers, it can be misused, and malicious activity can be performed by an unauthorized user or by the cloud service provider itself. To overcome these conflicts, SeDaS is proposed; self-destruction is mainly used to protect confidential data. The user specifies a time interval for each piece of confidential data stored in the cloud, and after that interval expires the data and its replicated copies are self-destructed without notifying the authorized user. This paper uses active storage and cryptographic techniques to address these challenges.

Keywords: Cloud computing, self-destructing data, active storage framework, data privacy.

I. INTRODUCTION

Cloud computing plays a major role for organizations and individuals in storing large amounts of data. The cloud provides not only storage but also services such as infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS) and software-as-a-service (SaaS); because of these services, organizations and individuals are turning towards the cloud.

As the cloud is an internet based technology it also provides mobility, so people are increasingly interested in storing and retrieving personal data there. Such personal data may contain passwords, passport numbers, account numbers and other important documents. Instead of maintaining individual files, all the files can be stored in a single directory in the cloud. The user specifies a time for each confidential file stored in the cloud; after the user-specified time expires, the file and all its replicas are self-destructed from the cloud.

Not every confidential file stored in the cloud is required for a long period of time. To delete a file after some time, the Vanish methodology was proposed: the secret key is divided and stored in a distributed hash table (DHT), a characteristic structure of P2P networks, and the nodes in the DHT are refreshed every 8 hours. The keys present in a node are therefore deleted, after which the user may not obtain enough key shares to decrypt the file.

One disadvantage of the Vanish methodology is that the key cannot survive for a long period of time. To overcome this challenge SeDaS is proposed, which depends on an Active Storage Framework. The SeDaS system mainly stipulates two modules: a self-destruct method object that is associated with each secret key, and a survival-time parameter for each secret key.

SeDaS offers:

1) A key distribution algorithm based on Shamir's algorithm, used as the core algorithm to store the client's distributed key shares in the object storage system.

2) An object-based storage interface, built on the Active Storage Framework, to store and manage the divided keys.

3) Secure deletion of files and of the random encryption keys stored in secondary storage.

II. RELATED WORK

Levy et al. (2009) proposed "Vanish: Increasing Data Privacy with Self-Destructing Data" [2].

Personal data are cached, copied, and archived by third parties, often without our knowledge or control. We wish to ensure that all copies of certain data become unreadable after a user-specified time, without any specific action on the part of the user, and even if an attacker obtains both a cached copy of that data and the user's cryptographic keys and passwords.

With a novel integration of cryptographic techniques, Vanish overcomes these challenges. The goal is to self-destruct the data automatically once it is no longer useful. The Vanish system leverages the services provided by decentralized, global-scale P2P infrastructures, in particular Distributed Hash Tables (DHTs), which implement a robust index-value database on a collection of P2P nodes. Vanish encrypts a user's data locally with a random encryption key not known to the user, destroys the local copy of the key, and then sprinkles pieces of the key (Shamir secret shares) across random indices (and thus random nodes) in the DHT.

Vanish architecture

Vanish takes a data object D and encapsulates it into a VDO: it picks a random data key K and encrypts D with K to obtain a ciphertext C.

Figure 1: Vanish System Architecture

As the figure shows, Vanish splits the data key K into N pieces K1 ... KN using threshold secret sharing. The application or the user can set the threshold, which is the parameter of the secret sharing; to reconstruct the original key, at least a threshold number of the N shares is required.
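The key-splitting step can be illustrated with a small (threshold, N) Shamir secret sharing sketch over a prime field, shown below. It is only an illustration of the technique Vanish and SeDaS rely on; the prime, parameter names and sizes are assumptions, not the Vanish implementation.

import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Minimal Shamir secret sharing sketch: split a key into n shares, any k of which reconstruct it.
public class ShamirSketch {

    // 2^127 - 1 is a Mersenne prime; any prime larger than the secret would do.
    static final BigInteger P = BigInteger.valueOf(2).pow(127).subtract(BigInteger.ONE);
    static final SecureRandom RNG = new SecureRandom();

    // Split 'secret' into n shares with threshold k using a random degree-(k-1) polynomial.
    static Map<Integer, BigInteger> split(BigInteger secret, int n, int k) {
        BigInteger[] coeff = new BigInteger[k];
        coeff[0] = secret;
        for (int i = 1; i < k; i++) coeff[i] = new BigInteger(P.bitLength() - 1, RNG);
        Map<Integer, BigInteger> shares = new HashMap<>();
        for (int x = 1; x <= n; x++) {
            BigInteger y = BigInteger.ZERO;
            BigInteger xb = BigInteger.valueOf(x);
            for (int i = k - 1; i >= 0; i--) {          // Horner evaluation of f(x) mod P
                y = y.multiply(xb).add(coeff[i]).mod(P);
            }
            shares.put(x, y);
        }
        return shares;
    }

    // Reconstruct the secret from any k shares via Lagrange interpolation at x = 0.
    static BigInteger reconstruct(Map<Integer, BigInteger> shares) {
        BigInteger secret = BigInteger.ZERO;
        for (Map.Entry<Integer, BigInteger> e : shares.entrySet()) {
            BigInteger xi = BigInteger.valueOf(e.getKey());
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (Integer j : shares.keySet()) {
                if (j.equals(e.getKey())) continue;
                BigInteger xj = BigInteger.valueOf(j);
                num = num.multiply(xj.negate()).mod(P);
                den = den.multiply(xi.subtract(xj)).mod(P);
            }
            BigInteger term = e.getValue().multiply(num).multiply(den.modInverse(P)).mod(P);
            secret = secret.add(term).mod(P);
        }
        return secret;
    }

    public static void main(String[] args) {
        BigInteger key = new BigInteger(100, RNG);            // stand-in for the data key K
        Map<Integer, BigInteger> shares = split(key, 5, 3);   // N = 5 shares, threshold = 3
        Map<Integer, BigInteger> subset = new HashMap<>();
        subset.put(1, shares.get(1));
        subset.put(3, shares.get(3));
        subset.put(5, shares.get(5));
        System.out.println(key.equals(reconstruct(subset)));  // prints true
    }
}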

Advantages

Vanish targets post-facto, retroactive attacks; that is, it defends the user against future attacks on old, forgotten, or unreachable copies of data. The attacker's job is very difficult, since he must deploy an infrastructure capable of attacking all users at all times. The solution uses existing, popular, well-researched technology in use since 2001, and it requires no special security hardware or special operations on the part of the user. It exploits the inherent half-life (churn) of nodes in the DHT, so the data is definitely destroyed.

Disadvantages

This mechanism is not universally applicable to all users or data types. They focus in particular on sensitive data that a user would prefer to see destroyed early rather than fall into the wrong hands. Vanish applications may compose VDOs with traditional encryption systems like PGP and GPG. In this case, the user will naturally need to manipulate the PGP/GPG keys and passphrases. It does not defend against denial of service attacks that could prevent reading of the data during its lifetime.

Tang et al. (2010) proposed "FADE: A Secure Overlay Cloud Storage System with File Assured Deletion" [3].

Keeping data permanently is undesirable, as data may be unexpectedly disclosed in the future due to malicious attacks on the cloud or careless management by cloud operators. The challenge of achieving assured deletion is that we have to trust cloud storage providers to actually delete data, but they may be reluctant to do so. Also, cloud storage providers typically keep multiple backup copies of data for fault-tolerance reasons, and it is uncertain, from cloud clients' perspectives, whether providers reliably remove all backup copies upon deletion requests.

FADE is a secure overlay cloud storage system that provides fine-grained access control and assured deletion for outsourced data on the cloud, while working seamlessly atop today's cloud storage services. In FADE, active data files that remain on the cloud are associated with a set of user-defined file access policies (e.g., time expiration, read/write permissions of authorized users), such that data files are accessible only to users who satisfy the file access policies.

In addition, FADE generalizes time-based file assured deletion (i.e., data files are assuredly deleted upon time expiration) into a more fine-grained approach called policy-based file assured deletion, in which data files are assuredly deleted when the associated file access policies are revoked and become obsolete.

The FADE system

The FADE system is composed of two main entities:

FADE clients. A FADE client (or client for short) is an interface that bridges the data source (e.g., file system) and the cloud. It applies encryption (decryption) to the outsourced data files uploaded to (downloaded from) the cloud. It also interacts with the key managers to perform the necessary cryptographic key operations.

Key managers. A group of key managers together maintains the policy-based keys that FADE uses for assured deletion and access control; each key manager is a standalone entity.

Advantages

FADE decouples the management of encrypted data and encryption keys, such that encrypted data remains on third-party (untrusted) cloud storage providers, while encryption keys are independently maintained by a key manager service, whose trustworthiness can be enforced using a quorum scheme. FADE generalizes time-based file assured deletion into a more fine-grained approach called policy based file assured deletion, in which files are associated with more flexible file access policies and are assuredly deleted when the associated file access policies are revoked and become obsolete.

Disadvantages

It does not support operations on a batch of files, block update operations are not supported, and metadata has to be transmitted along with each file.

III. SYSTEM ARCHITECTURE

The SeDaS architecture, shown in Figure 2, consists of three components based on the Active Storage Framework:

i. Metadata server.
ii. Application node.
iii. Storage node.

The metadata server handles user, session, server, key and file management. The application node is used by the user to access services such as storage in SeDaS. The storage node mainly consists of two subsystems: a <key,value> store subsystem and an Active Storage Object runtime subsystem. The store subsystem is based on the Object Storage Component, which manages the objects stored in the storage node; each object is uniquely identified by an objectID, which is used as the key. The Active Storage Object runtime subsystem is based on the Active Storage Agent model.

Figure 2: Architecture of SeDaS System

Active Storage Object: The user object defines the active storage object, which has a time-to-live (TTL) property. The time-to-live value is specified by the user for his private files; it is the survival time that determines how long the file exists in the cloud. After the TTL value expires, the file is self-destructed.
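The following sketch illustrates the time-to-live idea: an object that carries a user-specified TTL and erases its payload once that time has passed. The class and method names are assumptions for illustration and are not taken from the SeDaS implementation.

import java.time.Instant;

// Illustrative self-destructing storage object with a user-specified time-to-live.
class SelfDestructingObject {
    private byte[] payload;
    private final Instant expiresAt;

    SelfDestructingObject(byte[] payload, long ttlSeconds) {
        this.payload = payload.clone();
        this.expiresAt = Instant.now().plusSeconds(ttlSeconds);
    }

    // Called by the storage node's runtime before every access (and periodically).
    synchronized byte[] read() {
        if (Instant.now().isAfter(expiresAt)) {
            destroy();
            return null;                                  // data is no longer available
        }
        return payload.clone();
    }

    private void destroy() {
        if (payload != null) {
            java.util.Arrays.fill(payload, (byte) 0);     // overwrite before dropping the reference
            payload = null;
        }
    }
}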

IV. CONCLUSION

In cloud computing, data privacy and security are the main concerns. This paper surveyed data privacy for sensitive files stored in the cloud, concentrating on the SeDaS methodology. The SeDaS architecture consists of three components: the application node, the metadata server and the storage node. By using cryptographic techniques and active storage, SeDaS provides security for confidential files.

V. REFERENCES

[1] Lingfang Zeng, Shibin Chen, Qingsong Wei, and Dan Feng, "SeDas: A Self-Destructing Data System Based on Active Storage Framework," IEEE Transactions on Magnetics, vol. 49, no. 6, June 2013.
[2] Levy et al., "Vanish: Increasing Data Privacy with Self-Destructing Data," 2009.
[3] Tang et al., "FADE: A Secure Overlay Cloud Storage System with File Assured Deletion," 2010.
[4] A. Shamir, "How to Share a Secret," Commun. ACM, vol. 22, no. 11, pp. 612-613, 1979.
[5] R. Geambasu, T. Kohno, A. Levy, and H. M. Levy, "Vanish: Increasing Data Privacy with Self-Destructing Data," in Proc. USENIX Security Symp., Montreal, Canada, Aug. 2009, pp. 299-315.

A Survey on Various Comparisons of Anonymous Routing Protocols in MANETs

Dr. Rajashree V. Biradar1, K. Divya Bhavani2
Dept of CSE, Bellary, Karnataka, India
rajashreebiradar@yahoo.com, kdivya9209@gmail.com

Abstract: A MANET is an infrastructure-less wireless network consisting of a collection of mobile devices. Secure routing is a challenging task in MANETs; anonymous routing protocols have been developed to overcome this problem. This paper focuses on a comparison of different existing anonymous routing protocols based on routing category, design, advantages and disadvantages.

Keywords: Mobile ad hoc network, routing, security, anonymity

I. Introduction

MANET stands for "Mobile Ad Hoc Network"; it is a type of infrastructure-less ad hoc network. A MANET is an autonomous collection of mobile nodes sharing a wireless channel without any centralized control or established communication backbone.

The mobile nodes communicate with each other via radio waves: nodes within radio range of each other communicate directly, whereas nodes out of range use intermediate nodes to route their packets. Each node has a wireless interface, so these networks are also called multi-hop networks. A MANET is a self-configuring network of mobile routers and associated hosts connected by wireless links. The routers (mobile devices, nodes) are free to move randomly and organize themselves arbitrarily; thus the network's wireless topology may change rapidly and unpredictably, giving it a dynamic topology, and each mobile node has limited resources such as battery, processing power and on-board memory.

Table 1.0 Characteristics of MANETs

1. Distributed network: The control of the network is distributed among the nodes, i.e. there is no background network for central control of the network operations.

2. Multi-hop routing: Nodes out of radio range communicate with each other with the help of one or more intermediate nodes.

3. Dynamic network topology: The nodes in a MANET are mobile, hence the network topology may change rapidly and randomly over time.

4. Self-configuration: Computation is decentralized; nodes have independent computational, switching (or routing) and communication capabilities.

5. Bandwidth constraint: Resources are constrained; limited bandwidth is available between two intermediate nodes.

6. Device access flexibility: Access to the channel cannot be restricted and the communication medium is accessible to any entity.

Table 2.0 Applications of MANETs

Tactical networks: Military communication and operations.

Sensor networks: Data tracking of environmental conditions, animal movements, chemical/biological detection.

Home and enterprise: Home/office wireless networking, personal area networks (PAN), personal networks (PN).

Commercial and civilian: E-commerce (electronic payments anytime and anywhere); vehicular services (road or accident guidance, transmission of road and weather conditions, taxi cab networks, inter-vehicle networks).

Emergency services: Policing and fire fighting, supporting doctors and nurses in hospitals, disaster recovery.

Figure 1.0 Mobile Ad hoc network

II. Routing protocols in MANETs

A routing protocol defines the set of rules used by routers to determine the most appropriate paths on which to forward packets towards their intended destinations. MANET routing protocols are classified into three types.

Figure 1.1 Type of routing protocol

Proactive Routing Protocols:

Proactive routing protocols are also called table-driven routing protocols. In this scheme each node in the network maintains a routing table that records routing information from itself to every other node in the network.

Example: Destination-Sequenced Distance-Vector (DSDV), Cluster Gateway Switch Routing Protocol (CGSR) and Wireless Routing Protocol (WRP).

Reactive routing protocol:

In this routing protocol, routes can be identified whenever the node demands to send the packet from source to destination. Hence this protocol is also known as source-initiated on demand driven routing protocols.

Example: Ad Hoc On-Demand Distance Vector Routing (AODV) and Dynamic Source Routing (DSR).

Hybrid routing protocol:

A hybrid routing protocol is a combination of proactive and reactive routing protocols. It uses the route discovery mechanism of reactive protocols and the table maintenance mechanism of proactive protocols. For instance, table-driven routing can be used between networks and on-demand routing inside a network, or vice versa.

Example: Zone Routing Protocol (ZRP)

Significance of Anonymous routing protocol

Anonymity is the quality or state of being unknown or unacknowledged, or the state of not being identifiable within a set of subjects (the anonymity set). The concept of anonymity has recently attracted attention in mobile wireless security research.


Proactive routing and global-knowledge-based routing schemes are used in infrastructure networks to provide anonymity protection, but they are not applicable to mobile ad hoc networks. In a hostile environment, an adversary may attack the routing information in order to track messages sent from the source to the destination node. Anonymous routing protocols counter such malicious nodes or attackers by protecting both the routing process and the locations of the nodes in the network.

III. Types of anonymous routing protocols

ANODR: Anonymous On-Demand Routing with Untraceable Routes for Mobile Ad Hoc Networks

ANODR is an anonymous on-demand routing protocol for mobile ad hoc networks deployed in hostile environments, addressing two closely related unlinkability problems, namely route anonymity and location privacy. The ANODR design is based on "broadcast with trapdoor information". The protocol avoids the counting-to-infinity problem of other distance-vector protocols by using sequence numbers on route updates.

AO2P: Ad Hoc On-Demand Position-Based Private Routing Protocol

AO2P is one of the important anonymous routing protocols. It is an ad hoc on-demand position-based private routing algorithm, proposed mainly for communication anonymity.

Zone Routing Protocol ZRP

Zone routing protocol is for mobile ad hoc networks which localizes the nodes into sub-networks (zones). It incorporates the merits of on-demand and proactive routing protocols. Anonymity zones are used to protect both source and destination privacy. Local flooding is used in destination anonymity zones to guarantee data delivery.

ALARM protocol (Anonymous Location-Aided Routing in MANETs)

ALARM demonstrates the feasibility of obtaining, at the same time, both strong privacy and strong security properties. It uses the nodes' current locations to construct a secure MANET map. Based on the current map, each node can decide which other nodes it wants to communicate with.

ALERT: An Anonymous Location-Based Efficient Routing Protocol in MANETs

ALERT dynamically partitions a network field into zones and randomly chooses nodes in zones as intermediate relay nodes, which form a non-traceable anonymous route. It randomly chooses a node in the other zone as the next relay node and uses the GPSR algorithm to send the data to the relay node.


Table 3.0 Comparison of anonymous routing protocols

ANODR
  Routing category: Reactive
  Design based on: Broadcast with trapdoor information
  Pros: Better trade-off between routing performance and security protection.
  Cons: 1. Routing performance varies significantly when different cryptosystems are utilized. 2. Does not focus on security. 3. Lower efficiency.

AO2P
  Routing category: Reactive
  Design based on: Receiver contention; R-AO2P channel access
  Pros: Gives identity and location anonymity for source and destination.
  Cons: Less significant routing performance, hence a lower packet delivery ratio.

Zone Routing Protocol (ZRP)
  Routing category: Hybrid
  Design based on: Local broadcasting
  Pros: Increases the scalability of MANETs and maintains the route.
  Cons: 1. Maintaining a high level of topological information at the nodes requires more memory and power consumption. 2. Focuses only on destination anonymity.

ALARM
  Routing category: Proactive
  Design based on: Group signature and location-based forwarding
  Pros: 1. Reliable data transmission. 2. Provides mutual authentication between the mobile nodes.
  Cons: Does not provide route anonymity.

ALERT
  Routing category: Hybrid
  Design based on: Zone partition and random relay choosing
  Pros: Location and routing anonymity with high performance at low cost.
  Cons: Not completely bullet-proof to all attacks.


Conclusion

This survey paper presents an overview of routing protocols and the importance of anonymous routing protocols. It gives an idea about the different existing anonymous routing protocols in MANETs along with their merits and demerits; MANETs still face the challenge of defending against attacks while providing highly secure and efficient routes.

References

i. Dr. S. S. Dhenakaran, A. Parvathavarthini, "An Overview of Routing Protocols in Mobile AdHoc Network", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 2, February 2013.
ii. Aarti, Dr. S. S. Tyagi, "Study of MANET: Characteristics, Challenges, Application and Security Attacks", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 5, May 2013.
iii. Prabhu K., Senthil Kumar, "A Survey on Various Manet Routing Protocols Based on Anonymous Communication", International Journal of Innovative Research in Computer and Communication Engineering, Vol. 3, Issue 1, January 2015.
iv. Anuj K. Gupta, Harsh Sadawarti, Anil K. Verma, "Review of Various Routing Protocols for MANETs", International Journal of Information and Electronics Engineering, Vol. 1, No. 3, November 2011.
v. Shino Sara Varghese, J. Immanuel John Raja, "A Survey on Anonymous Routing Protocols in MANET", Recent Advances in Networking, VLSI and Signal Processing.
vi. Jojy Saramma John, R. Rajesh, "Efficient Anonymous Routing Protocols in Manets: A Survey", International Journal of Computer Trends and Technology (IJCTT), Volume 11, Number 1, May 2014.
vii. Xiaoxin Wu, Jun Liu, Xiaoyan Hong, Elisa Bertino, "Anonymous Geo-Forwarding in MANETs through Location Cloaking", IEEE Transactions on Parallel and Distributed Systems, Vol. 19, No. 10, October 2008.
viii. Harjeet Kaur, Varsha Sahni, Dr. Manju Bala, "A Survey of Reactive, Proactive and Hybrid Routing Protocols in MANET: A Review", (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 4 (3), 2013.
ix. Karim El Defrawy, Gene Tsudik, "Security Issues in ALARM Protocol for Mutual Authentication in MANET: A Review", International Journal of Innovative Research in Computer and Communication Engineering, Vol. 2, Issue 5, May 2014.
x. Snehlata Handrale, Prof. S. K. Pathan, "An Overview of Anonymous Routing ALERT Protocol", (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (2), 2014, 1607-1609.
xi. Abhishek Gupta, Samidha D. Sharma, "A Survey on Location Based Routing Protocols in Mobile Adhoc Networks", (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (2), 2014, 994-997.
xii. Haiying Shen, Lianyu Zhao, "ALERT: An Anonymous Location-Based Efficient Routing Protocol in MANETs", IEEE Transactions on Mobile Computing, Vol. 12, No. 6, June 2013.
xiii. Xiaoxin Wu, Bharat Bhargava, "AO2P: Ad Hoc On-Demand Position-Based Private Routing Protocol", IEEE Transactions on Mobile Computing, Vol. 4, No. 4, July/August 2005.



Stock Market Prediction: A Survey

Guruprasad S.1, Rajshekhar Patil2, Dr. Chandramouli H.3, Veena N.4

1,2 Dept. of CSE, BMSIT, Bengaluru.
3,4 Dept. of CSE, EPCET, Bengaluru.
guruprasad@bmsit.in, Hcmcool123@gmail.com, Veena_guruprasad@rediffmail.com

Abstract: The stock market is a widely used investment scheme promising high returns, but it carries risks, so an intelligent stock prediction model is necessary. Stock market prediction is a technique to forecast the future value of stock markets based on current as well as historical data available in the market, and it is mainly based on technical analysis and fundamental analysis. The literature offers several techniques for predicting stock market values. This paper surveys the use of Neural Network (NN), Data Mining, Hidden Markov Model (HMM), Neuro-Fuzzy system, Rough Set data model and Support Vector Machine techniques for predicting stock market variation. A methodology is also proposed for forecasting with better accuracy compared to the traditional methods.

Keywords: Data Mining, Hidden Markov Model, Neural Network, Neuro-Fuzzy system, Rough Set.

1. Introduction

The stock market plays a vital role in economic performance; it is commonly used to deduce the economic situation of a particular nation. However, information regarding the stock market is typically incomplete, uncertain and indefinite, which makes predicting future economic performance a challenging task. More specifically, the stock market's variations are analyzed and predicted in order to gain knowledge that can guide investors on when to buy, sell or hold a financial asset. In general, prediction means knowing about the future, so for investment or trading in the market, prediction of market value is essential. Market movements change frequently, are difficult to value ahead of time, and are disorganized in nature [1]. Hence, anticipating the stock market using only technical analysis methods, similar to time-series analysis, is very difficult. Fundamental analysis typically works best over longer periods of time, whereas technical analysis is more appropriate for short-term trading. Researchers have made several attempts to predict the performance of financial markets, and many artificial intelligence techniques such as Neural Networks and Fuzzy Systems have been proposed [2]. Since it is difficult to interpret their results, they cannot clearly visualize the nature of the interactions between technical indicators and stock market variations. The difficulty with technical analysis is that it requires a complete pattern to make an accurate prediction of the stock movement; preferably, such a forecast should be made before the pattern is completed. The vital idea of successful stock market prediction is achieving the best results while minimizing inaccurate forecasts of prices.

Fig 1: Various prediction techniques

Recent techniques such as NN, Data Mining, Neuro-Fuzzy systems, HMM and Rough Set data models offer useful tools for anticipating the noisy environment of the stock market, and this article surveys such intelligent techniques for anticipating market prices. A stock market index represents the movement in the "average of several individual stocks"; characteristics of individual stocks are therefore not taken into consideration in the forecasting process. To overcome this drawback, researchers could develop a model to forecast individual stock prices [3].

2. LITERATURE REVIEW

Phichhang Ou and Hengshan Wang applied ten different data mining techniques to anticipate the price variation of the Hang Seng index of the Hong Kong stock market [4]. Among those ten methods, LS-SVM and SVM generated the highest-ranking predictive performance. SVM was generally better than LS-SVM for in-sample prediction, whereas in terms of hit rate and error rate LS-SVM was better than SVM for out-of-sample forecasting.

Suresh et al. used different data mining techniques that are able to discover hidden patterns and forecast future trends and behaviors in the financial market [5]. Pattern-matching techniques were found to be descriptive in time-series analysis. They used an algorithm to accommodate a flexible and dynamic pattern-matching task in time-series analysis. Apart from the segment size, the instance-to-sub-time-series size ratio affects the system performance; in this paper the ratio was set to 1 and was also reduced to obtain better results.

Binoy et al. used a hybrid decision tree-neuro-fuzzy methodology for stock market forecasting, proposing an automated stock market trend anticipation system based on a decision tree-adaptive neuro-fuzzy hybrid [6]. Technical analysis is first used for feature extraction, and a decision tree is then used for feature selection. The reduced dataset obtained by these two steps is fed as input to train and test the adaptive neuro-fuzzy system for next-day stock prediction. They tested the proposed system on four major international stock market datasets, and their experimental results clearly showed that the hybrid system produces much higher accuracy than a stand-alone decision tree based system or an Adaptive Neuro-Fuzzy Inference System (ANFIS). The proposed neuro-fuzzy system is shown in Fig 2.

Aditya Gupta and Bhuvan Dhingra in [7] used a Hidden Markov Model (HMM) for predicting market prices. With the help of historical stock prices, they present a Maximum a Posteriori (MAP) HMM approach for anticipating the stock value for the next day. For training the continuous HMM they consider the intraday high and low values and the fractional change in the stock price. Some existing methods based on HMMs and Artificial Neural Networks use the Mean Absolute Percentage Error (MAPE) to measure the inaccuracy rate. They tested their approach on several markets and compared the performance. The model uses a latency of a certain number of days to predict the stock value for the next day: using the already trained continuous HMM, a MAP decision is made over all the possible values of the stock. They assume the underlying hidden states emit the visible observations, namely the fractional change, fractional high and fractional low.


Md. Rafiul Hassan et al. in [8] deployed a fusion model combining a Hidden Markov Model (HMM), Artificial Neural Networks (ANN) and Genetic Algorithms (GA) for financial market prediction. In the proposed fusion model, an ANN is employed as a black box to introduce noise to the observation sequences so that they may be better fitted by the HMM; a GA is then applied to find the optimal initial parameters for the HMM, given the transformed observation sequences. Using this fusion model, a number of alternative data items can be found in the historical data; these data items correspond to market behavior similar to the current day. The average of the price differences for the identified data items is calculated and added to the current day's price, and the value obtained is the forecast value for that day. The model consists of two phases:

Phase 1: Optimizations of HMM.

Phase 2: Using weighted average method to obtain the forecast.

The schematic representation of the fusion model is shown in Fig 3.

Fig 2: Block diagram of neuro-fuzzy system.


Fig 3: Block diagram for fusion model.
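Phase 2 of this fusion scheme can be pictured with a short Python sketch. It assumes a generic loglik callable (for example, the log-likelihood of a trained HMM over a fixed-length window of prices); the window length, the number of matched days k and the variable names are illustrative assumptions rather than the setup of [8].

import numpy as np

def fusion_forecast(prices, loglik, window=10, k=5):
    # prices: 1-D array of historical closing prices, oldest first
    # loglik: callable scoring a length-`window` slice of prices
    today_ll = loglik(prices[-window:])
    candidates = []
    for t in range(window, len(prices) - 1):
        past_ll = loglik(prices[t - window:t])
        # similarity of a past window to today's window, measured in likelihood space
        candidates.append((abs(past_ll - today_ll), prices[t + 1] - prices[t]))
    candidates.sort(key=lambda c: c[0])
    # average the next-day price differences of the k most similar historical days
    avg_diff = np.mean([diff for _, diff in candidates[:k]])
    return prices[-1] + avg_diff   # forecast for the next day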

A. E. Hassanien et al. proposed a generic rough set predictive model using a data set consisting of daily variations of a stock traded by the Gulf Bank of Kuwait [9]. The objective was to modify existing market predictive models based on the rough set approach and to construct a rough set data model that significantly reduces the total number of generated decision rules while keeping the degree of dependency intact. They created an information table consisting of several market indicators such as closing price, high price, low price, trade, value, average, momentum, 5-day disparity, price oscillator, RSI (relative strength index) and ROC (rate of change). These indicators act as conditional attributes of the decision table which predicts price movement.


Fig 3.3: Rough set data model

The major steps involved in the creation of the rough set data model are as follows:

Step 1: Create a detailed information system table with a set of real-valued attributes.

Step 2: Discover the minimal subset of conditional attributes that discerns all the objects present in the decision class.

Step 3: Divide the relevant set of attributes into different variable sets. Then compute the dependency degree and the classification quality, calculate the discrimination factor for each combination to find the highest-discrimination combinations, and add the highest-discrimination combination to the final reduct set.

Step 4: For each generated reduct set and its corresponding objects, construct the decision rules.
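To illustrate the dependency-degree computation that Steps 2 and 3 rely on, here is a toy Python sketch; the decision table, attribute names and values are invented for the example and are not the Kuwait stock data of [9].

from collections import defaultdict

def partition(rows, attrs):
    # group objects into equivalence classes of the indiscernibility relation over `attrs`
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def dependency_degree(rows, cond_attrs, dec_attr):
    # gamma(C -> D): fraction of objects lying in the positive region of the decision partition
    decision_blocks = partition(rows, [dec_attr])
    positive = 0
    for block in partition(rows, cond_attrs):
        # a conditional block counts only if it fits entirely inside one decision block
        if any(block <= d for d in decision_blocks):
            positive += len(block)
    return positive / len(rows)

# toy decision table: momentum and RSI as conditional attributes, next-day move as decision
table = [
    {"momentum": "up",   "rsi": "high", "move": "up"},
    {"momentum": "up",   "rsi": "low",  "move": "up"},
    {"momentum": "down", "rsi": "high", "move": "down"},
    {"momentum": "down", "rsi": "low",  "move": "up"},
]
print(dependency_degree(table, ["momentum", "rsi"], "move"))   # 1.0: both attributes fully determine the decision
print(dependency_degree(table, ["momentum"], "move"))          # 0.5: only the 'up' block is consistent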

3. Conclusion

This paper surveyed different methodologies for stock market prediction, namely Data Mining, Neural Network, Neuro-Fuzzy system, Hidden Markov Model and Rough Set data models. It also highlighted the fusion model that merges the Hidden Markov Model (HMM), Artificial Neural Networks (ANN) and Genetic Algorithms (GA). NNs and HMMs have the ability to extract useful information from a data set, so they play a vital role in stock market prediction. These approaches are used to monitor the behavior and fluctuation of market prices. Hidden Markov Models and Rough Set data models are used frequently in anticipating market prices.

4. Acknowledgement

We express our sincere thanks to all the authors whose papers in the area of stock market prediction are published in various conference proceedings and journals.

References

i. S. Arun Joe Babulo, B. Janaki, C. Jeeva, "Stock Market Indices Prediction with Various Neural Network Models", International Journal of Computer Science and Mobile Applications, Vol. 2, Issue 3, pp. 32-35, March 2014.
ii. URL: http://www.learnartificialneuralnetworks.com/stockmarketprediction.html
iii. Kuo R. J., Lee L. C. and Lee C. F., "Integration of Artificial NN and Fuzzy Delphi for Stock Market Forecasting", IEEE International Conference on Systems, Man, and Cybernetics, Vol. 2, pp. 1073-1078, Jan 1996.
iv. Phichhang Ou and Hengshan Wang, "Prediction of Stock Market Index Movement by Ten Data Mining Techniques", Canadian Center of Science and Education, Vol. 3, No. 12, December 2009.
v. M. Suresh Babu, N. Geethanjali and B. Sathyanarayana, "Forecasting of Indian Stock Market Index Using Data Mining & Artificial Neural Network", International Journal of Advance Engineering & Application, Vol. 3, Issue 4, pp. 312-316, May 2011.
vi. Binoy B. Nair, N. Mohana Dharini, V. P. Mohandas, "A Stock Market Trend Prediction System Using a Hybrid Decision Tree-Neuro-Fuzzy System", International Conference on Advances in Recent Technologies in Communication and Computing, pp. 381-385, June 2010.
vii. Aditya Gupta and Bhuwan Dhingra, "Stock Market Prediction Using Hidden Markov Models", IEEE, pp. 381-385, December 2012.
viii. Md. Rafiul Hassan and Baikunth Nath, "Stock Market Forecasting Using Hidden Markov Model: A New Approach", Proceedings of the 5th International Conference on Intelligent Systems Design and Applications, 0-7695-2286-06/05, IEEE, 2005.
ix. Hameed Al-Qaheri, A. E. Hassanien, and A. Abraham, "A Generic Scheme for Generating Prediction Rules Using Rough Set", Neural Network World, Vol. 18, No. 3, pp. 181-198, 2008.
x. Yang Kongyu, Min Wu, and Jihui Lin, "The Application of Fuzzy Neural Networks in Stock Price Forecasting Based on Genetic Algorithm Discovering Fuzzy Rules", Eighth International Conference on Natural Computation (ICNC), pp. 470-474, IEEE, 2012.
xi. Victor Devadoss and T. Anthony Alphonnse Ligori, "Stock Predictions using Artificial Neural Networks", International Journal of Data Mining Techniques and Applications, Vol. 02, pp. 283-291, December 2013.


Identifying and Monitoring the Internet Traffic with Hadoop

Ranganatha T. G., Narayana H. M.

Deptt. of CSE, M. S. Engineering College, Bangalore
tgranganatha@gmail.com

ABSTRACT: Handling internet traffic these days is not easy; its explosive growth makes it hard to collect, store and analyze on a single machine. Hadoop has become a popular framework for massive data analytics. It facilitates scalable data processing and storage services on a distributed computing system consisting of commodity hardware. In this paper, I present a Hadoop-based traffic analysis and control system which accepts input from Wireshark (a log file) and outputs a summary containing the complete internet traffic details. I have also implemented a congestion control algorithm to control online network traffic.

KEYWORDS: Single machine, Hadoop, Commodity Hardware, Wireshark.

1. INTRODUCTION

The Internet has made great progress and brought much more convenience to people's daily lives in recent years, yet the fact that it still provides only a kind of best-effort (BoF) service to applications has not changed since its invention.

Mininet is a network emulator that provides an instant virtual network on a laptop. It runs a collection of end-hosts, switches, routers and links on a single Linux kernel, using lightweight virtualization to make a single system look like a complete network, with the system code running in the same kernel.

OpenDaylight is a controller used to control the flows running in Mininet; Mininet connects to the controller and sets up an n-tree topology.

Wireshark is a tool used to capture packets in the network. It is a free, open-source packet analyzer used for network troubleshooting, analysis, software and communication protocol development, and education.

Hadoop was originally designed for batch-oriented processing jobs, such as creating web page indices or analyzing log data. Hadoop is widely used by IBM, Yahoo!, Facebook, Twitter and others to develop and execute large-scale analytics and applications for huge data sets. Apache Hadoop is a platform that provides pragmatic, cost-effective, scalable infrastructure for building many of the types of applications described earlier. Made up of a distributed file system called the Hadoop Distributed File System (HDFS) and a computation layer that implements a processing paradigm called MapReduce, Hadoop is an open-source, batch data processing system for enormous amounts of data. We live in a flawed world, and Hadoop is designed to survive in it by not only tolerating hardware and software failures but also treating them as first-class conditions that happen regularly. Hadoop uses a cluster of plain commodity servers with no specialized hardware or network infrastructure to form a single, logical storage and compute platform, or cluster, that can be shared by multiple individuals or groups. Computation in Hadoop MapReduce is performed in parallel, automatically, with a simple abstraction for developers that obviates complex synchronization and network programming. Unlike many other distributed data processing systems, Hadoop runs the user-provided processing logic on the machine where the data lives rather than dragging the data across the network, which is a huge win for performance.

The main contribution of my work lies in designing and implementing the control of internet traffic through big-data analytics. First, I create a virtual network using the Mininet tool, which instantly builds a virtual network of switches, routers and hosts on my laptop; it is controlled using the OpenDaylight controller. To capture the packet flows from the virtual network, a Wireshark-like tool is used: the packet log is captured, saved in a text file, and given as input to Hadoop, which processes the large log file and produces a summary report containing the flow analysis details, such as the sender IP, the destination IP and the number of bytes sent. Using this report, the traffic is then controlled with the congestion control algorithm.

The main objectives of the work include: to design and implement a traffic flow identification system using Hadoop; such a traffic flow identification system will be very useful for network administrators to monitor faults and to plan for the future.

2. BACKGROUND WORK

Over the past few years, a lot of tools have been developed and widely used for monitoring internet traffic. Mininet is a tool widely used to set up a virtual network on a laptop, so that the flow of packets in the virtual network can be simulated and identified. Wireshark is a popular traffic analyzer that offers a user-friendly graphical interface. Tcpdump is also a very popular tool for capturing and analyzing internet traffic. OpenDaylight is a controller used to control the packets in the Mininet virtual network; it decides from where to where the packets need to be sent. Most MapReduce applications on Hadoop are developed to analyze large text, log or web files; here, packet processing and analysis on Hadoop analyzes the trace file in blocks, processing each block and producing results in parallel in a distributed environment.


Methodology

For flow analysis we use MapReduce algorithms to reassemble the flows in the network, and the k-means algorithm for efficient clustering of the packets within the network. For flow control we use a congestion control algorithm to control the internet traffic with Hadoop; this lets us control the packet flow easily and effectively. A small sketch of the clustering step is given below.
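A minimal sketch of that clustering step, assuming scikit-learn and simple per-flow statistics (packet count, byte count, duration) as features, is shown here; the feature choice and k = 3 are illustrative assumptions, not the exact configuration of this system.

import numpy as np
from sklearn.cluster import KMeans

# each row is one reassembled flow: [packet_count, total_bytes, duration_seconds]
flows = np.array([
    [12,   9000,   0.8],
    [3,     400,   0.1],
    [450, 620000, 12.5],
    [5,     700,   0.2],
    [300, 410000,  9.0],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(flows)   # cluster id per flow
print(labels)                        # bulk transfers end up grouped apart from short flows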

3. LITERATURE SURVEY

A lot of research has been done on measuring the performance of internet traffic using Hadoop. J. Shafer, S. Rixner and Alan L. Cox [2] discuss the performance of the distributed Hadoop file system. Hadoop is the most accepted framework for managing huge amounts of data in a distributed environment, and it makes use of a user-level file system in a distributed manner. HDFS (the Hadoop Distributed File System) is portable across both hardware and software platforms. The paper presents a detailed performance analysis of HDFS and reveals several performance issues. The first is an architectural bottleneck in the Hadoop implementation which results in inefficient usage of HDFS; the second is a portability limitation that prevents the Java implementation from using features of the native platform. The paper tries to find solutions for the bottleneck and portability problems in HDFS.

T. Benson, A. Akella, and D. A. Maltz [3] wrote a paper on "Network traffic characteristics of data centers in the wild". In this paper the researchers conduct an empirical study of the networks in a few data centers belonging to different types of organizations, enterprises and universities. In spite of the great interest in developing networks for data centers, little is known about the characteristics of network-level traffic. They gather information about the SNMP topology and its statistics, as well as packet-level traces, and examine the packet-level and flow-level transmission properties. They observe the influence of network traffic on network utilization, congestion, link utilization and packet drops.

A. W. Moore and K. Papagiannaki [4] give a traffic classification based on the full packet payload. In this paper a comparison is made between port-based classification and content-based classification; the data used for the comparison are full-payload packet traces collected from an internet site. The comparison examined traffic classified based on the utilization of well-known ports, and the paper showed that port-based classification can identify 70% of the overall traffic. L. Bernaille and R. Teixeira [5] argue that port-based classification is not a reliable method for analysis; their paper proposes a technique that relies on observing the first five packets of a TCP connection to identify the application.

J. Erman, M. Arlitt, and A. Mahanti [5], in "Traffic Classification Using Clustering Algorithms", evaluated three different clustering algorithms, namely K-Means, DBSCAN and AutoClass, for the network traffic classification problem. Their analysis is based on each algorithm's ability to produce clusters that have a high predictive power for a single traffic class, and on each algorithm's ability to generate a minimal number of clusters that contain the majority of the connections. The results showed that the AutoClass algorithm produces the best overall accuracy.

4. EXISTING SYSTEM

Today the number of internet users is growing very rapidly; each and every person uses the internet in one way or another, so internet traffic also increases. It is not easy for a single machine to handle very large internet traffic, and storing and processing such large data cannot be handled by a single system. The problem is that handling the internet traffic using a single server does not scale to bigger networks and introduces the chance of a single point of failure.

5. PROPOSED SYSTEM

Figure 1 System Architecture

I. Overview

Handling internet traffic consists of three main components, namely Mininet (the network), Wireshark and the Hadoop cluster. Figure 1 shows the key components required for flow analysis. The functions of these three components are described below:

Mininet: Mininet is the tool used to set up the network. Mininet is a network emulator: it creates a realistic virtual network, running real kernel, switch and application code on a single machine (VM, cloud or native), and uses lightweight virtualization to make a single system look like a complete network.

Wireshark: Wireshark is the tool used to capture, filter and inspect the packet flow in the network. A network analysis tool formerly known as Ethereal, it captures packets in real time and displays them in a human-readable format. Wireshark includes filters, color-coding and other features that let you dig deep into network traffic and inspect individual packets.

Hadoop Cluster: It consists of two parts:

1. Flow analysis holds the complete detail of the traffic analysis of the big data; it takes care of the MapReduce and clustering algorithms used to obtain the output.

2. Flow control takes care of how to control a large amount of traffic without collision and loss of packets.

II. Setting Up the Network Through Mininet

Mininet is a tool used to build the virtual network within a laptop, so that any number of switches can be connected between the sender and destination hosts.


Figure 2:- Mininet Setup

In the above figure, host1 and host2 are the source and destination of the virtual network within the laptop, and S1 and S2 are the switches present between the hosts. The corresponding paths are controlled using the OpenDaylight controller, through which the virtual network setup in the computer can be controlled; flows and operations on the network can be modified or changed by OpenDaylight.

III. Capturing Packet Flow Using Wireshark

Wireshark is the tool used to capture and inspect the packet flow within the network. After setting up the network through Mininet, the next step is to capture the packet flows from the source host to the destination host across the switches that connect the end hosts. The Wireshark tool captures the packet flow details in the form of log files; the collected log files are stored in a text file, which is then processed in the next step.

IV. Analyzing Packet Flows in Hadoop

For a traffic trace collected with Wireshark-like tools, there is no obvious flow information available, so the first step before analysis is to recover flows from the individual packets. Our system implements a set of MapReduce applications, including a flow builder, which can quickly and efficiently reassemble flows and conversations even if they are stored across several trace files. The second step, flow clustering, aims to extract groups of flows that share common or similar characteristics and patterns within the same clusters. In this paper I chose the k-means algorithm to identify different groups of flows according to their statistical data.
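The flow builder can be sketched in the Hadoop Streaming style as a small Python script: the mapper keys every packet record by its 5-tuple and the reducer sums packets and bytes per flow. The whitespace-separated input layout (src, dst, sport, dport, proto, bytes, as might be exported from a Wireshark/tshark log) and the script name are assumptions for illustration, not the exact format used here.

import sys

def mapper(lines):
    # emit: "5-tuple key \t 1,bytes" for every packet record
    for line in lines:
        parts = line.split()
        if len(parts) < 6:
            continue                       # skip malformed records
        src, dst, sport, dport, proto, nbytes = parts[:6]
        key = ",".join([src, dst, sport, dport, proto])
        print(f"{key}\t1,{nbytes}")

def reducer(lines):
    # input arrives sorted by key; sum packet and byte counts per flow
    current, pkts, total = None, 0, 0
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        p, b = value.split(",")
        if key != current:
            if current is not None:
                print(f"{current}\t{pkts}\t{total}")
            current, pkts, total = key, 0, 0
        pkts += int(p)
        total += int(b)
    if current is not None:
        print(f"{current}\t{pkts}\t{total}")

if __name__ == "__main__":
    # run as: flow_builder.py map    (mapper phase)
    #         flow_builder.py reduce (reducer phase)
    (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)

With Hadoop Streaming, such a script would be supplied through the usual -mapper and -reducer options of the streaming jar, with the trace text files given as -input.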

V. Congestion Control Algorithm to Control Flows in Hadoop

As seen above, the flows cannot be handled by a single system when a huge amount of traffic arrives, so we plan to control the flows within the network; in this way a lot of congestion and packet loss in the network can be avoided.

Figure 3:- Flows controlled in Hadoop

Figure 3 above shows how the flows over the entire network can be controlled. I have implemented the congestion control algorithm to control the flows from the source host to the destination hosts. If the number of bytes of a flow exceeds a specified range, the path from host-1 to host-2 is changed; otherwise, if the byte count does not exceed the range, the old path from host-1 to host-2 is used for packet transmission. The algorithm I wrote handles control based on bytes only, as in the two cases below:

Case-1: If bytes >= specified threshold, the path is changed accordingly.

Case-2: If bytes < specified threshold, the path remains the same old one.
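The two cases translate directly into a tiny Python sketch; the byte threshold and the two path names are placeholders, since the actual range and topology labels are not stated in the paper.

BYTE_THRESHOLD = 50_000          # assumed cut-off; the real range is not stated

def choose_path(flow_bytes, primary="h1-s1-s2-h2", alternate="h1-s1-s3-s2-h2"):
    # Case-1: heavy flow, divert to the alternate path to relieve congestion
    if flow_bytes >= BYTE_THRESHOLD:
        return alternate
    # Case-2: light flow, keep the original path
    return primary

print(choose_path(120_000))      # -> alternate path
print(choose_path(2_000))        # -> primary path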

6. SCREEN SHOTS

Figure-1:- Setting Up of Network Through Mininet.


Figure-2:- Controlling of network through open daylight controller.

Figure-4:- Capturing Packet Flow using Wire shark.

7. CONCLUSION

In this paper, I have presented work on identifying and monitoring internet traffic with Hadoop. The network is set up, the trace file is obtained and given as input to Hadoop, and flow analysis is performed. We also implemented the congestion control algorithm to control internet traffic: flow control is carried out using the congestion control algorithm within the Hadoop cluster.

REFERENCES

i. M. Yu, L. Jose, and R. Miao, "Software defined traffic measurement with OpenSketch", in Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), Vol. 13, 2013.
ii. J. Shafer, S. Rixner, and Alan L. Cox, "The Hadoop Distributed Filesystem: Balancing Portability and Performance", 2010.
iii. T. Benson, A. Akella, and D. A. Maltz, "Network traffic characteristics of data centers in the wild", in Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, ACM, 2010, pp. 267-280.
iv. A. W. Moore and K. Papagiannaki, "Toward the accurate identification of network applications", in Passive and Active Network Measurement, Springer, 2005, pp. 41-54.
v. J. Erman, M. Arlitt, and A. Mahanti, "Traffic classification using clustering algorithms", in Proceedings of the 2006 SIGCOMM Workshop on Mining Network Data, ACM, 2006, pp. 281-286.
vi. Apache Hadoop Website, http://hadoop.apache.org/
vii. Yuanjun Cai, Bin Wu, Xinwei Zhang, Min Luo and Jinzhao Su, "Flow identification and characteristics mining from internet traffic with Hadoop", 978-1-4799-4383-8/1, IEEE, 2014.

Figure-3:- Setting up the content that needs to be captured in Wireshark.


Authentication Prevention Online Technique in Auction Fraud Detection

Anitha K., Priyanka M., Radha Shree B.

Deptt. of CSE, Raja Rajeswari College of Engineering, Kumbalgudu, Bangalore 560074
anithakrishna14@gmail.com, priyagowda444@gmail.com, radha13shree@gmail.com

Abstract — The e-business sector is rapidly evolving, and so are the needs for web marketplaces that anticipate customers' needs and their trust towards products with good ratings. While most people benefit from online trading websites, culprits also take advantage of them to conduct fraudulent activities against honest parties and obtain fake profits. Understanding the requirements for analysing user needs and providing trust, in order to improve the usability and user retention of a website, can therefore be addressed by personalization and by using a fraud product detection system.

Keywords — online auction, fraud detection, fraud prevention, online authentication.

1. INTRODUCTION

Since the emergence of the World Wide Web (WWW), electronic commerce, commonly known as e-commerce, has become more and more popular. Websites such as eBay and similar start-ups allow Internet users to buy and sell products and provide services online, which benefits everyone in terms of convenience and profitability. The regular online shopping business model allows sellers to sell a product or service at a fixed price, and buyers can choose to purchase if they find it a good deal. An online auction, however, is a different business structure in which items are sold through price bidding. There is usually a starting price and an expiration time specified by the retailer. Once the auction starts, potential buyers bid against one another, and the winner gets the item with the highest winning bid.

As with any format supporting economic transactions, online auctions attract criminals who indulge in fraudulent activities. The typical types of auction fraud are: products purchased by the buyer are not delivered by the retailer; the delivered products do not match the descriptions posted by the retailer; malicious retailers may even post non-existing or fake items with fake descriptions to cheat buyers, and request that payments be sent directly to them via bank-to-bank wire transfer. Furthermore, some culprits use e-mail techniques to steal highly rated retailers' accounts, so that potential buyers can easily be cheated because of the high ratings. People affected by fraudulent transactions usually lose their money, and in most cases it is not refundable. As a result, the reputation of the online auction services is hurt significantly by these crimes.

To provide some protection against fraud, online marketplaces often compensate fraud victims to cover their loss up to a certain amount. To reduce the amount of such compensation and improve their online reputation, providers often adopt the following approaches to control and prevent fraud. The identities of registered users are validated through email, SMS or phone verification. A rating system where buyers provide feedback is commonly used, so that fraudulent retailers can be caught soon after the first wave of buyer complaints. In addition, proactive moderation systems are built to allow human experts to manually investigate suspicious retailers or buyers. Even though e-commerce sites spend a large budget fighting fraud with a moderation system, there are still many outstanding and challenging cases. Criminals and fraudulent sellers frequently change their accounts and IP addresses to avoid being caught. Also, it is usually infeasible for human experts to investigate every buyer and seller to determine whether they are committing fraud, especially when the e-commerce site attracts a lot of traffic. The patterns of fraudulent sellers often change constantly to take advantage of temporal trends: for instance, fraudulent sellers tend to sell the "hottest" products at the time to attract more potential victims, and whenever they find a loophole in the fraud detection system they immediately exploit the weakness.

In this paper, we consider the application of an authentication prevention technique for auction fraud detection at a major auction site, where hundreds to thousands of new auctions take place every day. It is therefore necessary to develop an automatic prevention system that only directs suspicious cases for expert inspection and passes the rest as clean cases. The moderation system for this site extracts rule-based features to make decisions: with years of experience, human experts have created many sets of rules to detect suspicious fraudulent sellers, and the resulting features are often binary. For instance, we can create a binary feature from a rating rule, where the feature value is 1 if the rating of a seller is lower than a threshold and 0 otherwise. The final prevention decision is based on the fraud score of each case, which is computed while preserving a database of retailers and buyers with all the basic details, and investigation is carried out while keeping the experts' workload at a reasonable level.
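As a small illustration of how such binary rule features could feed a fraud score, the Python sketch below binarizes a seller-rating rule, combines it linearly with two other rule indicators, and queues the case for expert review when the score crosses a threshold. The rules, weights, threshold and field names are invented for the example; the site's real feature set is confidential.

# illustrative rule inputs; the deployed system's rules and weights are confidential
BLACKLIST = {"seller_9431"}
HOT_CATEGORIES = {"phones", "game consoles"}
WEIGHTS = {"low_rating": 1.5, "blacklisted": 3.0, "hot_item": 0.5}
REVIEW_THRESHOLD = 2.0

def rule_features(case, rating_threshold=3.0):
    # each rule yields a 0/1 indicator of suspiciousness
    return {
        "low_rating":  1 if case["seller_rating"] < rating_threshold else 0,
        "blacklisted": 1 if case["seller_id"] in BLACKLIST else 0,
        "hot_item":    1 if case["category"] in HOT_CATEGORIES else 0,
    }

def fraud_score(case):
    feats = rule_features(case)
    return sum(WEIGHTS[name] * value for name, value in feats.items())

case = {"seller_id": "seller_9431", "seller_rating": 2.1, "category": "phones"}
score = fraud_score(case)
print(score, "-> queue for expert review" if score >= REVIEW_THRESHOLD else "-> pass as clean")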

Since fraudulent sellers change their patterns very fast, the model must also evolve dynamically; for offline models it is often non-trivial to address this need. Based on the expert reviews, if a case is determined to be fraudulent, all the cases from this retailer, along with his pending products, are removed immediately; therefore, smart fraudulent sellers tend to change their patterns immediately to avoid being caught. Also, since the training data comes from human labelling, the high cost makes it almost impossible to obtain a very large sample. For such systems (a relatively small sample size, many features, and temporal patterns), authentication prevention feature selection is often required to obtain good performance. Human experts are also willing to see the results of this feature selection in order to monitor the effectiveness of the current set of features.

Our Contribution: In this paper, we study the problem of building online models for the authentication prevention technique system, which essentially evolves dynamically over time. We propose a rank probit authentication model framework, and for the binary response we apply a well-known technique from the statistical literature called stochastic search variable selection (SSVS).

The paper is organised as follows. In Section 2 we first summarize several specific features of the application and describe the authentication prevention framework with fitting details. We review the related work in the literature in Section 3. In Section 4 we show the experimental results that compare all the models proposed in this paper and several simple baselines. Finally, we conclude and discuss future work in Section 5.

2. OUR METHODOLOGY

Our application is to detect online fraud for a major website where hundreds of thousands of new auction cases are posted every day. Every new case is sent to the authentication prevention system in advance to assess the risk of it being fraud. The system is characterised by the following features:

Rule-based features: Human experts with years of experience have created many rules to detect whether a user is fraudulent. An example of such a rule is a "blacklist", i.e. whether the user has been detected or complained about as a culprit before. Each rule can be regarded as a binary feature that indicates the fraud likelihood.

Selective labelling: If the fraud score is above a certain threshold, the case enters a queue for further investigation by human experts. Once evaluated, the final result is labelled as a Boolean value, i.e. genuine or fraud. Cases with higher scores have higher priority in the queue; cases whose fraud scores are below the threshold are determined to be clean by the system without any human judgment.

Fraud churn: Once one case is labelled as fraud by human experts, it is very likely that the retailer is not trustworthy and may be selling other fraudulent products; hence all the items submitted by the same retailer are labelled as fraud too. The fraudulent retailer, along with his/her cases, is removed from the website immediately once detected.

User feedback: Buyers can file complaints to claim a loss if they have recently been cheated by fraudulent retailers. Similarly, retailers may also complain if their products have been mistakenly judged as fraudulent.

3. RELATED WORK

Online auction fraud has always been recognized as an important issue. There are guides on websites that teach people how to prevent online auction fraud (e.g. [35, 14]). [10] categorizes auction fraud into several types and proposes strategies to fight them. Reputation systems are used extensively by websites to detect auction fraud, although many of them use naive approaches. [31] summarized several key properties of a good reputation system and the challenges modern reputation systems face in extracting user feedback. Other representative work connecting reputation systems with online auction fraud detection includes [32, 17, 28], where the last work [28] introduced a Markov random field model with a belief propagation algorithm for user reputation. Beyond reputation systems, machine-learned models have been applied to moderation systems for monitoring and detecting fraud: [7] proposed training decision trees to select good sets of features and make predictions, [23] developed another approach that uses social network analysis and decision trees, and [38] proposed an offline regression modelling framework for the auction fraud detection moderation system which incorporates domain knowledge such as coefficient bounds and multiple instance learning.

In this paper we treat the fraud detection problem as a binary classification problem. The most frequently used models for binary classification include logistic regression [26], probit regression [3], support vector machines (SVM) [12] and decision trees [29]. Feature selection for regression models is often done by introducing a penalty on the coefficients; typical penalties include ridge regression [34] (L2 penalty) and Lasso [33] (L1 penalty). Compared to ridge regression, Lasso shrinks the unnecessary coefficients to zero instead of to small values, which provides both interpretability and good performance. Stochastic search variable selection (SSVS) [16] uses a "spike and slab" prior [19] so that the posterior of each coefficient has some probability of being 0. Another approach is to treat variable selection as model selection, i.e. put priors on models (e.g. a Bernoulli prior on each coefficient being 0) and compute the marginal posterior probability of the model given the data; people then either use Markov Chain Monte Carlo to sample models from the model space and apply Bayesian model averaging [36], or do a stochastic search in the model space to find the posterior mode [18]. Among non-linear models, tree models usually handle non-linearity and variable selection simultaneously; representative work includes decision trees [29], random forests [5], gradient boosting [15] and Bayesian additive regression trees (BART) [8].

Online modelling considers the scenario in which the input arrives one batch at a time; when a batch of input is received the model has to be updated according to the data and then make predictions for the next batch. The concept of online modelling has been applied to many areas, such as stock price forecasting (e.g. [22]), web content optimization [1], and web spam detection (e.g. [9]). Compared to offline models, online learning usually requires much lighter computation and memory loads; hence it can be widely used in real-time systems with continuous streams of input. For online feature selection, representative applied work includes [11] for the problem of object tracking in computer vision and [21] for content-based image retrieval; both approaches are simple, while in this paper the embedding of SSVS into online modelling is more principled. Multiple instance learning, which handles training data consisting of bags of instances labelled positive or negative, was originally proposed by [13]. Many papers have been published in the application area of image classification, such as [25, 24]. The logistic regression framework of multiple instance learning is presented in [30], along with an SVM framework.

4. EXPERIMENTS

We conduct our experiments on a real online auction fraud detection data set collected from a major auction website. We consider the following online models:

ON-PROB is the online probit regression model.


ON-SSVSB is the online probit regression model with a "spike and slab" prior on the coefficients, where the coefficients for the binary rule features are bounded to be positive.

ON-SSVSBMIL is the online probit regression model with multiple instance learning and a "spike and slab" prior on the coefficients; the coefficients for the binary rule features are also bounded to be positive.

For all the above online models we ran 10000 iterations plus 1000 burn-in iterations to guarantee the convergence of the Gibbs sampling.
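For orientation, the Gibbs step for a plain Bayesian probit regression (the classical Albert-Chib data-augmentation scheme, without the spike-and-slab prior, the positivity bounds or the multiple-instance extension used by ON-SSVSB/ON-SSVSBMIL) can be sketched in Python as follows; it is a generic textbook illustration, not the authors' code.

import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, n_iter=1000, burn_in=100, tau2=10.0, seed=0):
    # X: (n, p) features, y: (n,) binary labels; N(0, tau2 I) prior on the coefficients
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)   # posterior covariance of beta given z
    draws = []
    for it in range(n_iter + burn_in):
        mu = X @ beta
        # latent utilities: truncated above 0 when y = 1, below 0 when y = 0
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi)
        # draw beta | z from its Gaussian full conditional
        beta = rng.multivariate_normal(V @ (X.T @ z), V)
        if it >= burn_in:
            draws.append(beta)
    return np.mean(draws, axis=0)                   # posterior-mean coefficients

# the fraud score of a new case x is then the probit probability Phi(x . beta)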

We compare the online models with a set of offline models similar to [38]. For observation i, we denote the binary response as y_i and the feature set as x_i. For multiple instance learning purposes we assume retailer i has K_i cases and denote the feature set for each case l as x_il. The offline models are:

Expert has the human-tuned coefficients set by domain experts based on their knowledge and recent fraud-fighting experience.

OF-LR is the offline logistic regression model that minimizes the loss function.

OF-MIL is the offline logistic regression with multiple instance learning that optimizes the loss function.

OF-BMIL is the bounded offline logistic regression with multiple instance learning that optimizes the loss function subject to β ≥ T, where T is the pre-defined vector of lower bounds.

All the above offline models can be fitted via the standard L-BFGS algorithm [39]. This section is organized as follows. We first introduce the data and describe the general settings of the models. We then specify the evaluation metric for the experiments: the rate of missed customer complaints. Finally, we show the performance of all the models.

4.1 The Data and Model Setting

Our application is a real authentication prevention and fraud detection system designed for a major online auction website that attracts hundreds of thousands of new auction postings every day. The data consist of around 2M expert-labelled auction cases, of which around 20K are labelled as fraud, during September and October 2010. Besides the labelled data we also have unlabelled cases which passed the "pre-screening" of the moderation system (using the Expert model); the number of unlabelled cases in the data is about 6M-10M. For each observation there is a set of features indicating how "suspicious" it is; to prevent future fraudulent retailers from gaming our system, the exact number and nature of these features are highly confidential and cannot be released. Besides the expert-labelled binary response, the data also contain a daily list of customer complaints filed by the victims of the culprits. Our data in October 2010 contain a sample of around 500 customer complaints.

Figure 1: Fraction of bags versus the number of cases per bag ("bag size") submitted by fraudulent and clean sellers respectively. A bag contains all the cases submitted by a seller in the same day.

Human experts often label cases in a batched way, i.e. at any point in time they select the currently most "suspicious" retailer in the system and examine all of his/her cases posted on that day. If any of these cases is fraudulent, all of this retailer's cases are labelled as fraud. Therefore we put all the cases submitted by a retailer in the same day into one bag. In Figure 1 we show the distribution of the bag size posted by fraudulent and clean retailers respectively. From the figure we see that some proportion of retailers sell more than one item in a day, and that the number of bags (retailers) decays exponentially as the bag size increases. This indicates that applying multiple instance learning can be useful for this data.

Figure 2: Boxplots of the rates of missed customer complaints on a daily basis for all the offline and online models, obtained at a 100% workload rate.

It is also interesting to see that fraudulent retailers tend to post more auction cases than clean retailers, since this potentially leads to higher fake profit. We conduct our experiments for the offline models OF-LR, OF-MIL and OF-BMIL as follows: we train the models using the data from September and then test them on the data from October. For the online models ON-PROB, ON-SSVSB and ON-SSVSBMIL, we create batches of various sizes (e.g. one day, 1/2 day, etc.) starting from the beginning of September to the end of October, update the models for every batch, and test the models on the next batch. To compare them fairly with the offline models, only the batches in October are used for evaluation. For any test batch, we regard the number of labelled cases as the expected 100% workload N, and for any model we re-rank all the cases (labelled and unlabelled) in the batch and select the first M cases with the highest scores. We call M/N the "workload rate" in the following. For a specific workload rate such as 100%, we count the number of reported fraud complaints m among the M cases. Denoting the total number of reported complaints in the test batch as C, we define the rate of missed complaints as 1 - m/C for workload rate M/N. Note that since in model evaluation we re-rank all the cases, including both labelled and unlabelled data, different models with the same workload rate (even 100%) usually have different rates of missed customer complaints.

4.2 Analysing Metric

In this paper we adopt an analyzing metric introducedin [38] that directly reflects how many culprits a structure cancatch: the rate of missed complaints, which is the part of customer feedback that the structure cannot capture as culprit. Note that in our argue structure A is better than structure B if given the same workload rate, the rate of missed customer feedback for A is less than B.

application, the labelled data was notcreated through random sampling, but via a pre-screening authentication system using the experienced-tuned coefficients (thedata were created when only the experienced structure was deployed).This in realityintroduces biases in the analyzing for the metrics which only use the labeled observations but ignore theunlabelled ones. This rateof missed feedback metric howevercovers both labelled and unlabelled data since buyersdo not know which cases are labelled, hence it is unbiasedor analyzing the structure performance.

Recall that our data were generated as follows: For each case the authentication system uses a human-tuned linear scoringfunction to determine whether to send it for skilled labeling. If so, skilled review it and make a genuine or culprit judgment; otherwise it would be determined as clean andnot reviewed by anyone. Although for those cases that arenot labelled we do not quickly know from the systemwhether they are genuine or not, the real culprit cases wouldstill show up from the feedback filed by victims of theculprits. Therefore, if we want to prove that one machinelearned structures is better than another, we have to make surethat with the same or even less skilled labeling workload, the former structure is able to catch more culprits (i.e. generateless customer complaints) than the latter one.

Figure 4: For ON-SSVSBMIL with daily batches, delta= 0.7 and omega = 0.9, the posterior probability of B jt

= 0 (j is the feature index) over time for a selected set of features.

Figure 3: The rates of missed customer complaints for work load rates equal to 25%, 50%, 75% and 100% for all the offline models and online models with daily batches.

For any test group, we regard the number of labeled casesas the expected 100% workload N, and for any structure we
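To make the metric concrete, the following is a minimal sketch (ours, not the authors' code) of computing the rate of missed complaints for one test group under a given workload rate; the scores and complaint flags are hypothetical.

```python
def rate_of_missed_complaints(scores, has_complaint, workload_rate, n_labeled):
    """Compute 1 - m/C for one test group.

    scores: model scores for all cases (labelled and unlabelled) in the group.
    has_complaint: parallel booleans, True if a customer complaint was filed
                   against that case (i.e. it turned out to be fraud).
    workload_rate: M/N, e.g. 1.0 for a 100% workload.
    n_labeled: N, the number of expert-labelled cases in the group.
    """
    M = int(round(workload_rate * n_labeled))
    # Re-rank every case by score and take the top M for expert review.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    reviewed = set(order[:M])
    C = sum(has_complaint)                       # all reported frauds in the group
    m = sum(has_complaint[i] for i in reviewed)  # reported frauds the model caught
    return 1.0 - m / C if C else 0.0

# Toy group: 6 cases, 4 of them labelled, complaints against cases 0 and 3.
scores = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3]
complaints = [True, False, False, True, False, False]
print(rate_of_missed_complaints(scores, complaints, 1.0, n_labeled=4))  # 0.5
```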


Figure 5: For ON-SSVSBMIL with daily batches, delta= 0.7 and omega = 0.9, the posterior mean of Bjt (j is the feature index) over time for a selected set of features.

Finally, the most interesting set of features are the ones that have a large variation of p_jt from day to day. One important reason to use online feature selection in our application is to capture the dynamics of those unstable features. In Figure 5 we show the posterior mean of a randomly selected set of features. It is obvious that while some feature coefficients are always close to 0 (unimportant features), there are also many features with large variation of the coefficient values.

5. CONCLUSION AND FUTURE WORK

In this paper we build online structures for the fraud detection and prevention system designed for a major online auction website. Through empirical experiments on real-world online auction fraud data, we show that our proposed online probit structure framework, which combines online feature selection, bounding coefficients from expert knowledge and multiple instance learning, can significantly improve over baselines and the human-tuned model. Note that this online structuring framework can easily be extended to many other applications, such as web spam detection, content optimization and so forth.

Regarding future work, one direction is to include the adjustment of the selection bias in the online structure training process. It has been proven to be very effective for offline structures in [38]. The main idea there is to assume all the unlabelled samples have response equal to 0 with a very small weight. Since the unlabelled samples are obtained from an effective prevention system, it is reasonable to assume that with high probability they are genuine. Another future work is to deploy the online structures described in this paper to the real production system, and also to other applications.

REFERENCES

i. D. Agarwal, B. Chen, and P. Elango. Spatio-temporal models for estimating click-through rate. In Proceedings of the 18th International Conference on World Wide Web, pages 21–30. ACM, 2009.

ii. S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. Advances in Neural Information Processing Systems, pages 577–584, 2003.

iii. C. Bliss. The calculation of the dosage-mortality curve. Annals of Applied Biology, 22(1):134–167, 1935.

iv. A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis, volume 53. Cambridge University Press, New York, 1998.

v. L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

vi. R. Brent. Algorithms for Minimization without Derivatives. Dover Publications, 2002.

vii. D. Chau and C. Faloutsos. Fraud detection in electronic auction. In European Web Mining Forum (EWMF 2005), page 87.

viii. H. Chipman, E. George, and R. McCulloch. BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1):266–298, 2010.

ix. W. Chu, M. Zinkevich, L. Li, A. Thomas, and B. Tseng. Unbiased online active learning in data streams. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 195–203. ACM, 2011.

x. C. Chua and J. Wareham. Fighting internet auction fraud: An assessment and proposal. Computer, 37(10):31–37, 2004.

xi. R. Collins, Y. Liu, and M. Leordeanu. Online selection of discriminative tracking features. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1631–1643, 2005.

xii. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2006.

xiv. T. Dietterich, R. Lathrop, and T. Lozano-Pérez. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71, 1997.

xv. Federal Trade Commission. Internet auctions: A guide for buyers and sellers. http://www.ftc.gov/bcp/conline/pubs/online/auctions.htm, 2004.

xvi. J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.

xvii. E. George and R. McCulloch. Stochastic search variable selection. Markov Chain Monte Carlo in Practice, 68:203–214, 1995.

xviii. D. Gregg and J. Scott. The role of reputation systems in reducing online auction fraud. International Journal of Electronic Commerce, 10(3):95–120, 2006.

xix. C. Hans, A. Dobra, and M. West. Shotgun stochastic search for "large p" regression. Journal of the American Statistical Association, 102(478):507–516, 2007.

xx. H. Ishwaran and J. Rao. Spike and slab variable selection: frequentist and Bayesian strategies. The Annals of Statistics, 33(2):730–773, 2005.

xxi. T. Jaakkola and M. Jordan. A variational approach to Bayesian logistic regression models and their extensions. In Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics. Citeseer, 1997.

xxiii. W. Jiang, G. Er, Q. Dai, and J. Gu. Similarity-based online feature selection in content-based image retrieval. IEEE Transactions on Image Processing, 15(3):702–712, 2006.

xxiv. K. Kim. Financial time series forecasting using support vector machines. Neurocomputing, 55(1-2):307–319, 2003.

xxv. Y. Ku, Y. Chen, and C. Chiu. A proposed data mining approach for internet auction fraud detection. Intelligence and Security Informatics, pages 238–243, 2007.

xxvi. O. Maron and T. Lozano-Pérez. A framework for multiple-instance learning. In Advances in Neural Information Processing Systems, pages 570–576, 1998.

xxvii. O. Maron and A. Ratan. Multiple-instance learning for natural scene classification. In The Fifteenth International Conference on Machine Learning, 1998.

xxviii. P. McCullagh and J. Nelder. Generalized Linear Models. Chapman & Hall/CRC, 1989.

xxix. A. B. Owen. Infinitely imbalanced logistic regression. Journal of Machine Learning Research, 8:761–773, 2007.

xxx. S. Pandit, D. Chau, S. Wang, and C. Faloutsos. Netprobe: a fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web, pages 201–210. ACM, 2007.

xxxi. J. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.

xxxii. V. Raykar, B. Krishnapuram, J. Bi, M. Dundar, and R. Rao. Bayesian multiple instance learning: automatic feature selection and inductive transfer. In Proceedings of the 25th International Conference on Machine Learning, pages 808–815. ACM, 2008.

xxxiii. P. Resnick, K. Kuwabara, R. Zeckhauser, and E. Friedman. Reputation systems. Communications of the ACM, 43(12):45–48, 2000.

xxxiv. P. Resnick, R. Zeckhauser, J. Swanson, and K. Lockwood. The value of reputation on eBay: A controlled experiment. Experimental Economics, 9(2):79–101, 2006.

xxxv. R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267–288, 1996.

xxxvi. A. Tikhonov. On the stability of inverse problems. In Dokl. Akad. Nauk SSSR, volume 39, pages 195–198, 1943.

xxxvii. USA Today. How to avoid online auction fraud. http://www.usatoday.com/tech/columnist/2002/05/07/yaukey.htm, 2002.

xxxviii. L. Wasserman. Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44(1):92–107, 2000.

xxxix. M. West and J. Harrison. Bayesian Forecasting and Dynamic Models. Springer Verlag, 1997.

xl. L. Zhang, J. Yang, W. Chu, and B. Tseng. A machine-learned proactive moderation system for auction fraud detection. In 20th ACM Conference on Information and Knowledge Management (CIKM). ACM, 2011.


Differential Query Services Using Efficient Information Retrieval Query Scheme In Cost-Efficient Cloud Environment

Shwetha R., Kishor Kumar K., Dr. Antony P. J.

Department of Computer Science and Engineering, KVGCE, Sullia, DK

Shwethas183@gmail.com

Abstract

Cloud computing is a new technology where users obtain services through the Internet on demand. In this new technology users should receive the services without much delay and the costs should also be reduced. The most important aspects in this environment are maintaining privacy and efficiency. The original keyword-based file retrieval scheme proposed by Ostrovsky allows users to retrieve the requested files without leaking any information, but it causes heavy querying overhead. In this paper we present an efficient information retrieval query (EIRQ) scheme to reduce the querying overhead in the cloud. In EIRQ, a user gives a query along with a rank, and then retrieves files based on the rank. The rank determines the percentage of matched files that will be returned to the user.

Keywords - Cloud computing, cost efficiency, differential query services, privacy.

I INTRODUCTION

Cloud computing is the delivery of computing resources over the Internet. It has been widely adopted in broad applications and is becoming more pervasive. The main reasons behind cloud computing's sharp growth are increases in computing power and data storage, the exponential growth of social network data, and modern data centres, some of which can suffer from high maintenance costs and low utilization. There are also challenges in the development of reliable and cost-effective cloud-based systems. Cloud computing presents a new way to supplement the current consumption and delivery model for IT services based on the Internet, by providing dynamically scalable and often virtualized resources as a service over the Internet. Cloud computing is the use of computing resources (hardware and software) which are available in a remote location and accessible over the network. Users are able to buy these computing resources as a utility, on demand. The name comes from the common use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams. Cloud computing entrusts remote services with a user's data, software and computation.

Cloud computing as an emerging technology is expected to reshape information technology processes in the near future [1]. Due to the overwhelming merits of cloud computing, e.g., cost-effectiveness, flexibility and scalability, more and more organizations choose to outsource their data for sharing in the cloud. As a typical cloud application, an organization subscribes to the cloud services and authorizes its staff to share files in the cloud. Each file is described by a set of keywords, and the staff, as authorized users, can retrieve files of their interest by querying the cloud with certain keywords. In such an environment, how to protect user privacy from the cloud, which is a third party outside the security boundary of the organization, becomes a key problem.

User privacy can be classified into search privacy and access privacy [2]. Search privacy means that the cloud knows nothing about what the user is searching for, and access privacy means that the cloud knows nothing about which files are returned to the user. When the files are stored in clear form, one solution to protect user privacy is for the user to request all of the files from the cloud; this way, the cloud cannot know which files the user is really interested in. While this does provide the necessary privacy, the communication cost is high. Private searching was proposed by Ostrovsky et al. [3][4], which allows a user to retrieve files of interest from an untrusted server without leaking any information. However, the Ostrovsky scheme has a high computational cost, as it requires the cloud to process the query on every file in a collection. Otherwise, the cloud will assume that certain files, without processing, are of no interest to the user. It will quickly become a performance bottleneck when the cloud needs to process thousands of queries over a collection of hundreds of thousands of files. To make private searching applicable in a cloud environment, previous work [7] designed a cooperative private searching protocol (COPS), where a proxy server, called the aggregation and distribution layer (ADL), is introduced between the users and the cloud. The ADL deployed inside an organization has two main functionalities: aggregating user queries and distributing search results. Under the ADL, the computation cost incurred on the cloud can be largely reduced, since the cloud only needs to execute a combined query once, no matter how many users are executing queries.

Furthermore, the communication cost incurred on the cloud will also be reduced, since files shared by the users need to be returned only once. Motivated by this goal, the new scheme, named Efficient Information Retrieval for Ranked Query (EIRQ), allows each user to provide his own percentage along with the query to determine the percentage of matched files to be returned. The basic idea of EIRQ is to construct a privacy-preserving mask matrix that allows the cloud to filter out a certain percentage of matched files before returning them to the ADL.

This is not trivial work, since the cloud needs to correctly filter out files according to the rank of the queries without knowing anything about user privacy.


II. RELATED WORK

A number of methods have been proposed in recent years to provide user privacy and also regarding private searching schemes.

Private searching on streaming data (2005). In this paper, R. Ostrovsky and W. Skeith [1] considered the problem of private searching on streaming data. They showed that in this model we can efficiently implement searching for documents under a secret criterion (such as the presence or absence of a hidden combination of hidden keywords) under various cryptographic assumptions. The results can be viewed in a variety of ways: as a generalization of the notion of Private Information Retrieval, as positive results on privacy-preserving data mining, and as a delegation of hidden program computation.

Searchable symmetric encryption (2006) allows a party to outsource the storage of his data to another party in a private manner, while maintaining the ability to selectively search over it. This problem has been the focus of active research and several security definitions and constructions have been proposed. In this paper the authors begin by reviewing existing notions of security and propose new and stronger security definitions. R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky [2] presented two constructions that are shown to be secure under the new definitions. Interestingly, in addition to satisfying stronger security guarantees, the new constructions are more efficient than all previous constructions. Further, prior work on SSE only considered the setting where only the owner of the data is capable of submitting search queries. They also consider the natural extension where an arbitrary group of parties other than the owner can submit search queries. SSE is formally defined in this multi-user setting, and an efficient construction is presented.

Private searching on streaming data, Journal of Cryptology (2007). Private searching on streaming data is a process of dispatching to a public server a program which searches streaming sources of data without revealing the searching criteria and then sends back a buffer containing the findings. From an Abelian group homomorphic encryption, the searching criteria can be constructed by only simple combinations of keywords, for example, disjunction of keywords. The recent breakthrough in fully homomorphic encryption has allowed arbitrary searching criteria to be constructed theoretically. Here a new private query is considered, which searches for documents from streaming data on the basis of keyword frequency, such that the frequency of a keyword is required to be higher or lower than a given threshold. This form of query can help in finding more relevant documents. Based on state-of-the-art fully homomorphic encryption techniques, disjunctive, conjunctive, and complement constructions are given for private threshold queries based on keyword frequency. Combining the basic constructions, a generic construction for arbitrary private threshold queries based on keyword frequency is further presented. The protocols are semantically secure as long as the underlying fully homomorphic encryption scheme is semantically secure.

Hierarchical attribute-based encryption and scalable user revocation for sharing data in cloud servers (2011). Access control is one of the most important security mechanisms in cloud computing. Attribute-based encryption provides an approach that allows data owners to integrate data access policies within the encrypted data. However, little work has been done to explore flexible authorization in specifying the data user's privileges and enforcing the data owner's policy in cloud-based environments. In this paper, G. Wang, Q. Liu, J. Wu, and M. Guo [4] propose a hierarchical attribute-based access control scheme by extending ciphertext-policy attribute-based encryption (CP-ABE) with a hierarchical structure of multiple authorities and exploiting attribute-based signatures (ABS). The proposed scheme not only achieves scalability due to its hierarchical structure, but also inherits fine-grained access control with authentication in supporting write privileges on outsourced data in cloud computing. In addition, it decouples the task of policy management from security enforcement by using the eXtensible Access Control Markup Language (XACML) framework. Extensive analysis shows that this scheme is both efficient and scalable in dealing with access control for outsourced data in cloud computing.

Efficient information retrieval for ranked queries in cost-effective cloud environments (2012). Cloud computing as an emerging technology trend is expected to reshape the advances in information technology. This paper addresses two fundamental issues in a cloud environment: privacy and efficiency. It first reviews a private keyword-based file retrieval scheme proposed by Ostrovsky et al. [5]. Then, based on an aggregation and distribution layer (ADL), a scheme termed efficient information retrieval for ranked query (EIRQ) is presented to further reduce querying costs incurred in the cloud. Queries are classified into multiple ranks, where a higher ranked query can retrieve a higher percentage of matched files. Extensive evaluations have been conducted on an analytical model to examine the effectiveness of this scheme.

Newconstructions and practical applications for private stream searching (2013).A system for private stream searching allows a client to retrieve documents matching some search criteria from a remote server while the server evaluating the request remains provably oblivious to the search criteria. In this extended abstract, we give a high level outline of a new scheme for this problem and an experimental analysis of its scalability.

The new scheme is highly efficient in practice. We demonstrate the practical applicability of the scheme by considering its performance in the demanding scenario of providing a privacy preserving version of the Google News Alerts service.

III. PROPOSED SYSTEM

Here the newly proposed scheme, called the Efficient Information Retrieval System, is introduced. This new system uses a flexible ranking mechanism which allows users to provide a rank and personally decide how many matched files the cloud returns.

The basic idea is to construct a matrix that allows the cloud to filter out a certain percentage of matched files. The new scheme reduces the querying overhead and also the computational costs. The EIRQ system protects user privacy while allowing each user to retrieve matched files on demand. This is not an easy task, because the cloud needs to correctly filter out files according to the rank of the queries without knowing anything about user privacy. The scheme has two extensions: the first extension requires the least amount of modifications from the Ostrovsky scheme, and the second extension provides privacy by leaking the least amount of information to the cloud.

Figure 1: Architecture Of the Proposed system

The proposed system has the following four modules:

Differential Query Services

The novel concept proposed here is a differential query service, added to COPS, where the users are allowed to personally decide how many matched files will be returned. This is motivated by the fact that, in certain cases, there are a lot of files matching a user's query, but the user is interested in only a certain percentage of the matched files. To illustrate, let us assume that Alice wants to retrieve 2% of the files that contain keywords "A, B", and Bob wants to retrieve 20% of the files that contain keywords "A, C". The cloud holds 1,000 files, where {F1, . . . , F500} and {F501, . . . , F1000} are described by keywords "A, B" and "A, C", respectively. In the Ostrovsky scheme, the cloud will have to return 2,000 files. In the COPS scheme, the cloud will have to return 1,000 files. In our scheme, the cloud only needs to return 110 files (10 for Alice and 100 for Bob). Therefore, by allowing the users to retrieve matched files on demand, the bandwidth consumed in the cloud can be largely reduced.
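The arithmetic behind this example can be sketched as follows; the user names, match counts and rank percentages are the hypothetical ones used above, and the function name is ours.

```python
def files_returned(matches_per_query, rank_percent):
    """Number of files returned under a differential query service:
    each user receives only the requested percentage of files matching
    his or her query."""
    return {user: int(round(n * rank_percent[user]))
            for user, n in matches_per_query.items()}

matches = {"Alice": 500, "Bob": 500}     # files matching "A, B" and "A, C"
ranks = {"Alice": 0.02, "Bob": 0.20}     # 2% and 20% ranks
per_user = files_returned(matches, ranks)
print(per_user, sum(per_user.values()))  # {'Alice': 10, 'Bob': 100} 110
# For comparison: the Ostrovsky scheme processes each query over all 1,000
# files (2,000 returned in total); COPS returns the 1,000 matched files once.
```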

Efficient Information Retrieval for Ranked Query

The new scheme proposed is termed Efficient Information Retrieval for Ranked Query (EIRQ), in which each user can choose the rank of his query to determine the percentage of matched files to be returned. The basic idea of EIRQ is to construct a privacy-preserving mask matrix that allows the cloud to filter out a certain percentage of matched files before returning them to the ADL. This is not trivial work, since the cloud needs to correctly filter out files according to the rank of the queries without knowing anything about user privacy. Focusing on different design goals, we provide two extensions: the first extension emphasizes simplicity by requiring the least amount of modifications from the Ostrovsky scheme, and the second extension emphasizes privacy by leaking the least amount of information to the cloud.
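The actual mask-matrix construction is not given here; the toy sketch below only illustrates the idea of keeping a rank-determined fraction of the matched files before they are returned to the ADL (the file names and ranks are hypothetical, and the random selection is our simplification, not the EIRQ mechanism).

```python
import random

def filter_by_rank(matched_files, rank_percent, seed=0):
    """Keep only the requested percentage of matched files before returning
    them to the ADL; this mask over the matched set stands in, very loosely,
    for the privacy-preserving mask matrix in EIRQ (illustrative only)."""
    k = max(1, int(len(matched_files) * rank_percent))
    rng = random.Random(seed)
    return rng.sample(matched_files, k)

matched = [f"F{i}" for i in range(1, 501)]   # files matching a query
print(len(filter_by_rank(matched, 0.02)))    # 10 files kept for a 2% rank
```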

Aggregation and Distribution Layer

An ADL is deployed in an organization that authorizes its staff to share data in the cloud. The staff members, as the authorized users, send their queries to the ADL, which will aggregate user queries and send a combined query to the cloud.


Then, the cloud processes the combined query on the file collection and returns a buffer that contains all of the matched files to the ADL, which will distribute the search results to each user. To aggregate sufficient queries, the organization may require the ADL to wait for a period of time before running our schemes, which may incur a certain querying delay.
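A rough sketch (ours) of the aggregation side of the ADL: the users' keyword queries are merged into one combined query so the cloud processes the file collection only once; the query contents are hypothetical.

```python
def aggregate_queries(user_queries):
    """ADL-style aggregation (illustrative): merge all users' keyword queries
    into one combined query so the cloud executes it only once; the ADL keeps
    the per-user queries so results can later be distributed back."""
    return sorted({kw for kws in user_queries.values() for kw in kws})

queries = {"Alice": {"A", "B"}, "Bob": {"A", "C"}}
print(aggregate_queries(queries))   # ['A', 'B', 'C'] -- sent to the cloud once
```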

Ranked Queries

To further reduce the communication cost, a differential query service is provided by allowing each user to retrieve matched files on demand. Specifically, a user selects a particular rank for his query to determine the percentage of matched files to be returned. This feature is useful when there are a lot of files that match a user's query, but the user only needs a small subset of them.

IV EXPERIMENTAL RESULTS & EVALUATION

In this section, we compare the three EIRQ schemes from the following aspects: file survival rate and the computation/communication cost incurred on the cloud. Then, based on the simulation results, we deploy our program in Amazon Elastic Compute Cloud (EC2) to test the transfer-in and transfer-out time incurred on the cloud when executing private searches. In the previous scheme there was a large querying overhead and more bandwidth was consumed.

First, we test the transfer-in time in the real cloud, which is mainly incurred by receiving queries from the ADL. Then, we test the transfer-out time at the cloud, which is mainly incurred by returning files to the ADL. The results are shown below.

Therefore, EIRQ-Efficient is the most suitable scheme to deploy in a cloud environment. For example, the time to transfer a query from the ADL to the cloud is less than 100 seconds, and the time to transfer the buffer from the cloud to the ADL is less than 500 seconds when there are fewer than 4 common keywords.


V CONCLUSION

In this paper, we proposed three EIRQ schemes based on an ADL to provide differential query services while protecting user privacy. By using our schemes, a user can retrieve different percentages of matched files by specifying queries of different ranks. By further reducing the communication cost incurred on the cloud, the EIRQ schemes make the private searching technique more applicable to a cost-efficient cloud environment.

However, in the EIRQ schemes, we simply determine the rank of each file by the highest rank of the queries it matches. For our future work, we will try to design a flexible ranking mechanism for the EIRQ schemes.

REFERENCES

i. R. Ostrovsky and W. Skeith, "Private searching on streaming data," in Proc. of CRYPTO, 2005.

ii. R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, "Searchable symmetric encryption: improved definitions and efficient constructions," in Proc. of ACM CCS, 2006.

iii. "Private searching on streaming data," Journal of Cryptology, 2007.

iv. G. Wang, Q. Liu, J. Wu, and M. Guo, "Hierarchical attribute-based encryption and scalable user revocation for sharing data in cloud servers," Computers & Security, 2011.

v. Q. Liu, C. C. Tan, J. Wu, and G. Wang, "Efficient information retrieval for ranked queries in cost-effective cloud environments," in Proc. of IEEE INFOCOM, 2012.

vi. J. Bethencourt, D. Song, and B. Waters, "New constructions and practical applications for private stream searching," in Proc. of IEEE S&P, 2013.



An Efficient and Effective Information Hiding Scheme using Symmetric Key cryptography

Sushma U., Dr. D.R. Shashi Kumar

Dept. Of CSE,Cambridge Institute Of Technology Bangalore - 36, India ssushma999@gmail.com,shashikumar.cse@citech.edu.in

Abstract : Modern life depends heavily on Internet communication. Through the net we can transfer data anywhere in the world. The Internet was born out of academic efforts to share information; it never actually strove for high security. It plays a key role in bringing people online; it is very easy and effective, but dangerous too in terms of data hacking and eavesdropping by hackers. Data transmitted over the Internet must therefore be secured and kept private. Image encryption is a suitable process to protect image data during transmission. There are many cryptographic algorithms used to secure multimedia data such as images, but they have definite advantages and disadvantages. So there is a requirement to develop a strong image cryptography algorithm for securing images during transfer. In this paper, a new symmetric key cryptography algorithm is proposed for color images, treated as 3D RGB arrays. In this algorithm a different type of key generation method is introduced; this technique is unique and is used for the first time for key generation. Two public keys are used in the cryptography process. Key generation is very important in symmetric as well as asymmetric key cryptographic algorithms. Here, we propose a new symmetric key cryptography algorithm for image data to provide secure transmission during network communication. All the concepts related to this area are explained. The algorithm is totally lossless, such that image pixels are preserved during encryption and decryption.

Key Words — Encryption, Decryption, Image.

I. Introduction

With the rapid development of computer network communication, it is easy to obtain digital images through the network and to further use, copy and distribute them. Digital technology brings us much convenience, but it also gives attackers or unlawful users a chance to hack our personal data. Generally, there are two major approaches used to protect images. One is information hiding, which includes anonymity, watermarking, steganography and covert channels. The other is encryption, which includes classical cryptographic algorithms [1]. The field of encryption and security is becoming essential in the twenty-first century, when an enormous amount of data is transmitted over local networks as well as the Internet. Digital data and images account for more than two-thirds of the data transmitted over the Internet [2]. Thus, a very dependable and robust encryption algorithm is required when data are transmitted over unsecured channels. Data encryption and data embedding are the most important means that can be used to transmit the desired information with a high level of security and dependability while it passes through unsecured channels [3]. The difficulties with multimedia data such as digital images, documents, audio and video stem from two factors: multimedia data are often very large and need to be processed in real time [4]. Encryption algorithms like DES, IDEA and RSA are not suitable for practical image encryption, especially under the conditions of online communication [5].

Militaries, governments and private businesses have used encryption for a long time to enable secret communication. Conventional cryptographic systems have mainly been developed for securing alphanumeric data rather than image and audio signals. Encrypting audio signals with conventional encryption requires an extensive amount of processing power and time. A fast, dependable and powerful algorithm is needed to encrypt both image and audio with less processing time and a high level of precision [6].

Technological advances in digital content processing, production and delivery have given rise to a range of recent signal processing applications in which security threats cannot be handled in the established style. These applications range from multimedia content creation and distribution to advanced biometric signal processing for authentication, biometric identification and access management. In some of these cases, security and privacy threats may block the adoption of new image and video processing services. Therefore, the use of cryptographic techniques in image and video processing applications is becoming more common. The cryptographic methods used in these applications operate on two-dimensional (2D) matrices [7].

II. Literature Review

According to Dr. Mohammad V. Malakooti and Mojtaba Raeisi Nejad [6], they have proposed an algorithm for images based on a novel lossless digital encryption system for multimedia that uses orthogonal transforms for the encryption of image data. This technique is based on block-cipher symmetric key cryptography. The authors emphasise the development of a novel lossless digital encryption system for multimedia. They used the symmetric properties of the orthogonal transforms to compute the inverse of the orthogonal matrices during the execution of the decryption procedure. They used several classical image encryption transforms, for example the Discrete Cosine Transform (DCT), the Hadamard Transform (HT) and also the Malakooti Transform (MT) [6]. According to Sahar Mazloom and Amir-Masud Eftekhari-Moghadam, image encryption is somewhat different from encrypting text data because of some inherent features of images.

Images have an enormous information capacity and a high correlation between pixels, which are often difficult to handle like a text message. The unusually interesting properties of chaotic maps, such as sensitivity to initial conditions and random-like behaviour, have attracted the attention of cryptographers for developing new encryption algorithms. The authors proposed a new symmetric key cryptography scheme for images. This algorithm uses a confusion-diffusion scheme which employs the ideas of a chaotic two-dimensional (2D) standard map and a one-dimensional (1D) Logistic map. The algorithm uses a 128-bit secret key and is explicitly designed for color images, which are 3D arrays of an RGB data stream [11].

The authors emphasise that valuable multimedia content, such as digital images, is still vulnerable to unauthorized access while in storage and during transmission over a network. The stream of digital images also requires high network bandwidth for transmission. In this paper, they present a novel scheme which combines the Discrete Wavelet Transform (DWT) for image compression and the block-cipher Data Encryption Standard (DES) for image encryption. The simulation results show that the proposed technique improves the security of image transmission over the Internet as well as the transmission rate [9]. In this paper the authors point out that images are entirely different from text in several respects, such as their high redundancy and correlation, their local structure and their amplitude-frequency characteristics. Therefore, the methods of ordinary ciphering cannot be applied directly to images. They improve the properties of confusion and diffusion in terms of particular exponential chaotic maps, and design a key scheme for resistance to grey-code attack, data attack and differential attack [1]. N.K. Pareek, Vinod Patidar and K.K. Sud [8] describe chaos-based cryptographic algorithms as a new and efficient way to develop a secure image cryptography technique. They proposed a new approach for image cryptography based on chaotic maps in order to meet the requirements of secure image transfer. In the proposed image cryptography algorithm, an external secret key of 80-bit size and two chaotic logistic maps are used. The initial conditions for the two logistic maps are derived using the external secret key by giving different weightage to each of its bits. The proposed cryptography algorithm uses eight different types of operations to encrypt the pixels of an image, and which one of them is used for a particular pixel is decided by the outcome of the logistic map. To make the cipher image more robust against any attack, the key value is changed after encrypting each block of sixteen pixels of the image [8].

III. The Proposed Work On Color Image Cryptography

There are many cryptographic algorithms for text data, but fewer cryptography algorithms exist for multimedia data. RSA, AES, DES, MD5 and so on are not used for image cryptography because multimedia such as images, video and audio is typically large in size.

At present many cryptography algorithms are used for image data, but some have weaknesses, so there is a need for a new symmetric key cryptography algorithm for image data. High-bitrate multimedia data in the network require highly secure transmission, high processing resources and fast computation. A new cryptographic algorithm has been developed to address multimedia content security in the network channel. This algorithm represents the proposed scheme for color image encryption in the network, treating the image as a 3D matrix. In this paper, we describe the implementation method for the proposed algorithm. There is a need for security for multimedia data such as images, video and audio during network communication. The algorithm is based on a new technique for key generation from images. In this paper a new key generation procedure is developed for encryption and decryption; this key generation method is unique. Two different public keys are used in the cryptography process. The key generation code is kept separate from the encryption and decryption program in order to hide the key values from the user and from attackers. The following steps describe the proposed algorithm.

IV. Key Generation Algorithm

Step 1: Calculate two different values, a and b, with the help of p and q.

Step 2: Two public keys are created with the help of the four values p, q, a, b.

Step 3: Create a matrix of size 2x2 or 4x4, i.e. the first public key.

Step 4: Create another matrix of size 16x16 or 8x8, i.e. the second public key.

Step 5: These two public keys are used for encryption and decryption.
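The paper does not disclose how a and b are derived from p and q, so the sketch below is a hypothetical interpretation that only fixes the sizes of the two key matrices; the derivation of a and b and the use of NumPy are our assumptions.

```python
import numpy as np

def generate_keys(p, q, small=4, large=16):
    """Hypothetical key generation following the steps above: derive two
    values a, b from p and q, then build a small (e.g. 4x4) and a large
    (e.g. 16x16) key matrix. The derivation of a and b is an assumption,
    since the paper does not specify it."""
    a, b = (p + q) % 256, (p * q) % 256            # assumed derivation
    rng = np.random.default_rng(a * 256 + b)       # keys determined by a, b
    key1 = rng.integers(1, 10, size=(small, small))   # first public key
    key2 = rng.integers(0, 256, size=(large, large))  # second public key
    return key1, key2

key1, key2 = generate_keys(61, 53)
print(key1.shape, key2.shape)   # (4, 4) (16, 16)
```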

V. Encryption Algorithm

The procedure of the image encryption algorithm is as follows:

Step 1: Enter two large numbers p, q.

Step 2: Import the required plain image and convert it into a matrix with the help of a MATLAB command.

Step 3: Divide the whole matrix into n sub-matrices of size 2x2 or 4x4.

Step 4: Multiply each of the n 2x2 or 4x4 matrices with the first public key.

Step 5: Combine all n matrices to form a single matrix of size equal to the plain image.

Step 6: Divide the whole matrix into n sub-matrices of size 8x8 or 16x16.

Step 7: Add the second public key to each of the n 8x8 or 16x16 matrices.

Step 8: Transpose the matrix.

Step 9: Combine all n matrices to form a single matrix of size equal to the plain image.

Step 10: The resultant matrix is the required cipher image.
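A minimal sketch of the block-wise encryption above together with its inverse (the decryption procedure of the next section), assuming element-wise multiplication and addition so that every step is exactly invertible; the block sizes, key values and the use of NumPy in place of MATLAB are our assumptions, not the authors' implementation.

```python
import numpy as np

def blockwise(img, size, fn):
    """Apply fn to every size x size block of img (dimensions assumed divisible)."""
    out = img.copy()
    for i in range(0, img.shape[0], size):
        for j in range(0, img.shape[1], size):
            out[i:i+size, j:j+size] = fn(img[i:i+size, j:j+size])
    return out

def encrypt(img, key1, key2):
    # Steps 3-5: multiply each small block by the first key (element-wise).
    x = blockwise(img, key1.shape[0], lambda b: b * key1)
    # Steps 6-8: add the second key to each large block, then transpose.
    x = blockwise(x, key2.shape[0], lambda b: b + key2)
    return x.T

def decrypt(cipher, key1, key2):
    # Invert the steps in reverse order: transpose, subtract, divide.
    x = cipher.T
    x = blockwise(x, key2.shape[0], lambda b: b - key2)
    return blockwise(x, key1.shape[0], lambda b: b // key1)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32))     # stand-in for one colour channel
key1 = rng.integers(1, 10, size=(4, 4))       # first public key (no zeros)
key2 = rng.integers(0, 256, size=(16, 16))    # second public key
cipher = encrypt(img, key1, key2)
assert np.array_equal(decrypt(cipher, key1, key2), img)   # lossless round trip
```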


VI. Decryption Algorithm

The procedure of the image decryption algorithm is as follows:

Step 1: Import the cipher image and convert it into a matrix with the help of a MATLAB command.

Step 2: Divide the whole matrix into n sub-matrices of size 8x8 or 16x16.

Step 3: Transpose the matrix.

Step 4: Subtract the second public key from each of the n 8x8 or 16x16 matrices.

Step 5: Combine all n matrices to form a single matrix of size equal to the cipher image.

Step 6: Divide the whole matrix into n sub-matrices of size 2x2 or 4x4.

Step 7: Divide each of the n 2x2 or 4x4 matrices by the first public key.

Step 8: Combine all n matrices to form a single matrix of size equal to the cipher image.

Step 9: The resultant matrix is the plain image.

The performance of the encrypted and decrypted images has been tested and the results examined through the MATLAB simulator; the proposed algorithm is simple and lossless, yet it is difficult for intruders to obtain the key.

In symmetric key cryptography, the same keys are used by the sender and the receiver. We have explained all the concepts related to this research area. The algorithm avoids loss of image pixels during encryption and decryption. We propose a lossless digital encryption model based on a new technique of key generation for images. The values of the secret keys were obtained from the new key generator method, the results were tested, and the performance of the encryption and decryption procedure can be further improved using parallel or other intelligent algorithms that make the scheme more secure and simple, yet the key value harder to discover.

VII. Experimental Result Analysis

To validate our proposal, the test setup is implemented in the Java language with the use of the MATLAB simulator running on the Windows platform. We used RGB color images. Based on the proposed algorithm, we developed software for the encryption and decryption of images, and the results are shown in Figures 1 and 3. Two matrices are used for key generation: the first is our first public key of size 2x2 or 4x4, and the second is our second public key of size 8x8 or 16x16. The original Rose image is tested in MATLAB using the proposed algorithm for encryption and decryption. All the scan modules were developed within the new cryptography algorithm. The key generation code is separated from the encryption and decryption program in order to hide the key values from the user and from attackers. Test results for the Rose image are shown in Figures 1 to 8.


Figure 1 shows the original image which is used for the cryptography. The proposed encryption and decryption algorithm was applied to this image (Figure 1, original image) and the algorithm works properly with the procedure. With this image, a check was made to examine whether data loss is prevented or not. Figure 2 represents the cipher image of the original image. This output was obtained after applying the newly generated keys to the original image and performing the encryption process. Figure 3 represents the decoded or plain image after the decryption procedure. The decoded image was obtained by applying the newly generated keys and the decryption algorithm in the reverse order of the encryption algorithm.

VIII. Comparison Between Encrypted And Decrypted Histograms

Figure 4 shows the histogram of the original image. The histogram is used for finding the intensity of pixels at different points. The imhist(I) command is used for displaying a histogram of the image "I" over a gray-scale color bar. The number of bins in the histogram is specified by the image type. If I is a grayscale image, imhist uses a default value of 256 bins. We analysed the intensity of pixels at different points through the histograms shown in Figure 4 and Figure 5. An image histogram shows how the pixels in an image are distributed by graphing the number of pixels at each color intensity level. The histogram of the original/shuffled image is similar to the histogram of the plain image. This implies that the corresponding statistical information in the shuffled image is the same as in the plain image.

IX. Evaluation:

Let N = number of rows = number of columns and n = N x N. The mean square error between the original image M1 and the decrypted image M2 is

MSE = (1/n) * sum over i, j of [M1(i,j) - M2(i,j)]^2

We have compared our mean square error estimation with Table 2 of [6]. Table 1 shows our result, in which the MSE is zero for the different blocks. Square matrices are used for the comparison. Figure 6 shows a snapshot of the MATLAB output of the mean square error between the original image and the plain (decrypted) image.
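A small sketch (ours, not the authors' MATLAB code) of the losslessness check via the mean square error.

```python
import numpy as np

def mse(m1, m2):
    """Mean square error between two equally sized square images."""
    diff = m1.astype(np.float64) - m2.astype(np.float64)
    return np.mean(diff ** 2)

# For a lossless scheme, decrypt(encrypt(img)) == img, so the MSE is exactly 0.
img = np.random.default_rng(1).integers(0, 256, size=(32, 32))
print(mse(img, img))          # 0.0
print(mse(img, img + 1))      # 1.0 -- any pixel change shows up immediately
```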


X. CONCLUSION

In this work, we develop a new technique for symmetric key encryption of image data. The technique is unique, straightforward for encryption and decryption, and hard to break. The mean square error of our result is also zero for every block of the matrix. In this paper, we implement an approach which gives lossless image transmission. The research work provides a new algorithm for the image encryption and decryption procedure. The algorithm delivers an acceptable quality of service and an appropriate security level for the image data. We successfully simulated the concept of key-based encryption and decryption of images through MATLAB and analysed the results. Several shortcomings were discovered and overcome effectively. The outcomes were compared with other encryption methods, which demonstrated the effectiveness of the new encryption system. This paper provides a valuable approach in the developing area of cryptography.

References

i. Linhua Zhang, Xiaofeng Liao, Xuebing Wang, "An image encryption approach based on chaotic maps," Department of Computer Science and Engineering, Chongqing University, Chongqing, China, 2005, 759-765, www.elsevier.com/locate/chaos.

ii. S. Changxiang, Z. Huangguo, F. Dengguo, C. Zhenfu and H. Jiwu, "Survey of Information Security," Science in China Press, 2007.

iii. H. Cheng, X. Li, "Partial Encryption of Compressed Images and Videos," IEEE Transactions on Signal Processing, Vol. 48, No. 8, August 2000.

iv. R. Rudraraju, B.A., "Digital Data Security Using Encryption," Master's Paper, University of Texas at San Antonio, 2010.

v. Khan UM, Kh M., "Classical and chaotic encryption techniques for the security of satellite images," in IEEE International Symposium on Biometrics and Security Technologies (ISBAST 2008), vol. 5, no. 23-24, 2008, pp. 1-6.

vi. Dr. Mohammad V. Malakooti, Mojtaba Raeisi Nejad Dobuneh, "A Lossless Digital Encryption System for Multimedia Using Orthogonal Transforms," Islamic Azad University, Dubai, UAE, 978-1-4673-0734-5/12, IEEE, 2012.

vii. W. Puech, Z. Erkin, M. Barni, S. Rane, and R. L. Lagendijk, "Emerging Cryptographic Challenges in Image and Video Processing," Mitsubishi Electric Research Laboratories, TR2012-067, September 2012.

viii. N.K. Pareek, Vinod Patidar, K.K. Sud, "Cryptography using multiple one-dimensional chaotic maps," Communications in Nonlinear Science and Numerical Simulation 10 (2005) 715-723.

ix. Philip P. Dang and Paul M. Chau, "Image Encryption for Secure Internet Multimedia Applications," IEEE, 2000; Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA, 92093, 0098 3063/2000.

x. Symmetric key cryptography, definition and clarification, date accessed 2009, http://en.wikipedia.org/wiki/Symmetrickey algorithm.

xi. Sahar Mazloom, Amir-Masud Eftekhari-Moghadam, "Color Image Cryptosystem using Chaotic Maps," IEEE, 2011; Faculty of Electrical, Computer and IT Engineering, 978-1-4244-9915-1.

xii. William Stallings, Cryptography and Network Security, Third Edition, Feb. 2007.

xiii. Andrew S. Tanenbaum, Computer Networks, Third Edition, Chapter 2, CDMA.


ATM Deployment using Rank Based Genetic Algorithm with convolution

Kulkarni Manjusha M.

Department of CSE, V.T.U., SJBIT, Bangalore, India manjusha.shastry@gmail.com

Abstract — The ATM is the most significant service provided by the banking sector to customers. Optimally deploying ATMs is very complex. The effective deployment of ATMs depends upon various factors such as where the customers live, where they work, the roads they travel and the cost to reach an ATM. Genetic algorithms are used to solve such optimization problems using techniques such as inheritance, mutation, selection, and crossover. A bank's decision to deploy ATMs should be logical as well as profitable, providing greater convenience and covering a larger market area with the maximum number of customers. The objective is to minimize the total number of machines while covering all the customer demands in the selected area. This study proposes a Rank Based Genetic Algorithm using convolution for solving the Banking ATM Location Problem (RGAC). RGAC is an ATM deployment strategy based on the rank concept which gives a highly feasible solution in reasonable time. RGAC gives a cost-efficient allocation of ATMs and computes the percentage coverage (PC, covering the whole area), which is high as it covers the customer demands by maximizing the service utility of each machine.

Key Words: Genetic Algorithms (GAs), Rank, Automated Teller Machines (ATM), Percentage Coverage (PC), Client Utility Matrix (CU), Service Utility Matrix (SU), Rank Based Genetic Algorithm using convolution (RGAC).

I.INTRODUCTION

ATM is an electronic banking outlet, which allows customers to complete basic transactions without the aid of a branch representative or teller. ATMs are scattered throughout cities, allowing customers easier access to their accounts. ATMs have become a competitive weapon to commercial banks whose objective is to capture the maximum potential customers. The fact is that commercial banks compete not only on the dimension of price but also on the dimension of location. ATM optimal

Deployment Strategies offer the opportunity to provide greater convenience and to attract more customers by covering the money market with sufficient ATM facilities. These strategies also provide greater cost efficiency by finding optimal number of ATMs to be installed and greater profitability by increasing the ATM user base in order to earn much more transactions and services fees as well as through the inflow of deposits from the depositors.

The location depends on the transactions demanded by the customer of proprietary ATM and non-proprietary ATM. A bank’s decision to deploy ATMs should be a rational Economic decision using the best ATM deployment strategy that takes into account the high computation complexities. This paper proposes

a new Rank Based Genetic Algorithm for solving the banking

ATM location problem using Convolution (RGAC) which outperforms the Heuristic Algorithm based on Convolution

(HAC) algorithm that is inefficient while market size increases.

(RGAC) increases the search efficiency by improving the evolutionary process while meeting a feasible solution.

Moreover, RGAC has proved to be a robust approach for solving the ATMs deployment problem and is able to provide high quality solutions in a reasonable time.

The rest of the paper is structured as follows: Section II indicates some important related work on RGAC. A detailed description of the problem encoding and specific operators is given in Section III. Section IV explains RGAC, and Section VI includes concluding remarks.

II. RELATED WORK

The study [1] investigated placement of minimum number of

ATM machines covering all customer demands in given geographical area. They have developed a heuristic algorithm to efficiently solve the optimal location problem of ATM placement by formulating a mathematical model.

In this study, the problem of finding the minimum number of

ATM’s and their locations given arbitrary demand patterns is considered. They have considered one particular area and divided the parts of that accordingly such as area with no demand, high demand, and normal demand and so on with color code. Using the variables the placement problem is modeled.

The study [2] presents the problem of WiFiDP (WiFi

Network Design problem) grouping problem. A hybrid grouping genetic algorithm (HGGA) is proposed as a convenient method to solve such problems with providing a smaller and low cost connection service. The popularity of WiFi-enabled devices represents an enormous market potential for wireless networking services and mobile applications, based on this technology. The deployment of citywide WiFi access networks is a location problem as well as its a large assignment. In this case, the grouping genetic algorithm is combined with a repairing procedure, to ensure feasible solutions, and with a local search to improve its performance for the case of the WiFiDP. The grouping genetic algorithm (GGA) is a class of evolutionary algorithm specially modified to tackle grouping problems, i.e.

scenarios in which a number of items must be assigned to a set of predefined groups. Thus, in the GGA, the encoding, crossover, and mutation operator of traditional GAs are modified, obtaining a compact algorithm with very good performance in problems of grouping.

The study [3] investigates the ATM placement problem, since the ATM is a significant service provided by banks to customers. Many banks utilize ATMs to make cash withdrawal available to their customers at all times. They have formulated the ATM allocation problem as an optimization problem with a mathematical model by considering various factors for deployment such as the price of buying or leasing an ATM, the cost of deployment, the cost of operation, and the characteristics of the ATMs to be deployed.

The study [4] determines cash management for ATM network. They have proposed one approach based on artificial neural network to forecast a daily cash demand for every ATM in the network and on the optimization procedure to estimate the optimal cash load for every ATM.. ANN are used for tasks such as pattern recognition, classification and time series fore-casting.

The key to all forecasting applications is to capture and process the historical data so that they provide insight into the future.

The primary objective of cash forecasting is to ensure that cash is used efficiently throughout the branch network. Cash forecasting is integral to the effective operation of an ATM/branch network optimization procedure.

The study [5] addresses the grid computing environment, where computational resources, software and data are shared at a large scale. The management of resources and computational tasks is a critical and complex undertaking because these resources and tasks are geographically distributed, heterogeneous and dynamic in nature. The authors proposed a Rank Based Genetic Scheduler for Grid Computing Systems (RGSGCS) for scheduling independent tasks in the grid environment by minimizing makespan and flowtime. The performance of the system is evaluated by distributing a number of resources across the networked computers.

III. PROBLEM FORMULATION

The ATM placement problem is modeled and defined mathematically. The variables used in modeling the intended problem are shown in Table I. The optimization problem is organized so as to realize market clearance; in other words, the difference between CU and SU should be minimized. This difference is expressed in equation (1):

E = SU - CU ≥ 0        (1)

where E is the difference matrix of size (I×J) after assigning the total number of machines, SU is the service utility matrix and CU is the client utility matrix.

A. Client Utility Matrix CU:

Any exercise to optimize the deployment of ATMs must start with a thorough understanding of the customer base and an identification of customer priorities. CU is generated by the following procedure:

• The first step is to categorize people based on where they live, where they work and where they may need money to pay for shopping and other transactions. The science of grouping people in a geographical area according to socioeconomic criteria is known as geodemography. Commercial geodemography has been used to target ATM services to the bank's clients based on their lifestyle and location. In this study the geodemographic approach is applied by conducting a survey of potential customers as well as collecting geographical, demographic, economic and traffic data. Other considerations include safety, cost, convenience and visibility. Quite often, malls, supermarkets, gas stations and other high-traffic shopping areas are prime locations for ATM sites. In this paper, the priorities for different potential ATM locations


are assigned based on an a priori analysis of all the applicable factors. The related data are entered using the SPSS program. The variables used are the customer's age, income, education and marital status, which constitute the demographic and economic factors. The traffic data are represented by a variable such as the location importance, which encompasses factors like the number of residents, the number of public institutes, the number of private institutes, and the state of the street (main street, by-street or crossroad).

The procedure is then to compute the mean value of these variables for each customer, segment the customers according to their areas, and compute the cumulative mean value for the customers belonging to each segment. Each cumulative mean value represents one element of the matrix G(x×y). The elements of G(x×y) range from 0 to 10: a high value of g(x,y) means there are more potential customers in that area, whereas a small value means there are fewer.

• Generate the submatrices cur; the matrix cur is given in equation (2) and Figure 1:

cur = G(x×y) × U(m×n) × 10        (2)

Figure 1. The cur matrix.

where r = 1, 2, …, (m×n)(I×J). U(m×n) is the degradation of client utility; assuming m = 3, n = 3, U(m×n) is given in Figure 2:

Figure 2: Degradation of client utility matrix

The CU matrix can be obtained by replacing each element of G(x×y) with its corresponding matrix cur, as in Figure 3.

Figure 3. Client Utility Matrix CU.

The reason behind calculating cur is that, cur will be strongest at the center of the areas, and it will degrade as one moves away from it.
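As a minimal illustration of this construction (with a hypothetical 2×2 demand grid G and a 3×3 degradation pattern U, both invented for the example), replacing every element g(x,y) of G by the block g(x,y) × U × 10 is exactly a Kronecker product:

```python
import numpy as np

# Hypothetical demand matrix G (values in 0..10) over a 2x2 grid of areas,
# and a 3x3 client-utility degradation pattern U peaking at its centre.
G = np.array([[8.0, 3.0],
              [5.0, 1.0]])
U = np.array([[0.2, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 0.2]])

# Each element of G is replaced by its sub-matrix cur = g * U * 10 (equation 2),
# so CU is the Kronecker product of G with 10*U.
CU = np.kron(G, 10.0 * U)
print(CU.shape)   # (2*3, 2*3) = (6, 6): an I x J client-utility grid
```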

B. Service Utility Matrix SU:

The deployment of ATMs is a one-off project, done only once, so it is essential to distribute the limited number of ATMs in such a way as to maximize the utility of the services. In order to


find SU, this study assumes that ATMs are homogeneous; in line with this, there exists only one matrix A. Matrix A, which represents the degradation of an ATM's utility as one moves away from its location, is predetermined and held constant for all machines. The rectilinear distance model is adopted, as shown in Figure 4.



Figure 4. Service Matrix A (The rectilinear distance model)

The matrix Ln indicates the location of the nth machine. If this location is denoted by the coordinates (un,vn) then all elements of Ln are equal to zero except for coordinates (un,vn) where they are equal to one as in the equation 3 and figure 5.


The matrix SU can be obtained by convolving the two matrices A and L, as in equation (4) and Figure 6. Notice that the objective of the convolution here is to surround the unique non-zero element of Ln with the service pattern matrix A; the convolution can therefore be performed very efficiently by simply centering the elements of the A matrix at (un, vn).

SU = A * L (4)

Where: the symbol * indicates the convolution product.
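A small sketch of equation (4), assuming a hypothetical 3×3 service-pattern matrix A and a 6×6 location matrix L with a single ATM; a 2-D convolution with 'same' padding centres A on every non-zero element of L:

```python
import numpy as np
from scipy.signal import convolve2d

# Hypothetical rectilinear-distance service pattern A (cf. Figure 4) and a
# 6x6 location matrix L holding one ATM at coordinates (2, 3).
A = np.array([[0.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 0.0]])
L = np.zeros((6, 6))
L[2, 3] = 1.0

# SU = A * L (equation 4): the convolution centres the service pattern A
# at every ATM location marked in L.
SU = convolve2d(L, A, mode='same')
```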

C. Percentage Coverage (PC):

In order to satisfy a client, his utility should be met by covering his demand, and the Service Utility should be maximized through effective deployment of ATMs; this saves the cost of providing additional ATMs. PC is computed as the percentage of ψ (ψ equals one at all points of E where SU is greater than or equal to CU) relative to the number of elements in E. PC is given in equation (5):

where ψ is given in equation (6). In addition to PC, another important measure, the total Client Utility satisfied γ, is calculated; the formula of γ is given in equation (7). The value of γ ranges between [0, 1] and approaches one only when all elements of E are zero or positive, denoting the saturation level of Client Utility.
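The following sketch shows one way these measures could be evaluated for a candidate deployment. Since the exact form of γ (equation 7) is not reproduced in the text, it is computed here as the fraction of total client utility that is covered, which is an assumption:

```python
import numpy as np

def coverage_metrics(CU, SU):
    """Return the difference matrix E, percentage coverage PC and an
    (assumed) client-utility satisfaction ratio gamma."""
    E = SU - CU                       # equation (1): E should be >= 0 everywhere
    psi = (E >= 0).astype(float)      # 1 wherever SU covers CU
    PC = 100.0 * psi.sum() / E.size   # equation (5): percentage of covered points
    gamma = (CU * psi).sum() / CU.sum()   # assumed form of the satisfied utility
    return E, PC, gamma
```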

The algorithm returns both the γ and PC values with the solution, as will be shown in the simulations section. The PC and γ values are essential in measuring the goodness of an ATM deployment.

In order to deploy fewer ATMs without negatively affecting PC and γ, if the value of PC equals one hundred (100), the number of ATMs is reduced by one; the trial continues reducing N as long as PC stays within the acceptable limit (i.e. above the lower limit of 99). Otherwise, when the value of PC falls below the acceptable limit, the trial increases the number of ATMs until PC reaches the acceptable limit. These conditions are presented in equation (9), where k = 1, 2, …

V. RANK BASED GENETIC ALGORITHM FOR SOLVING THE BANKING ATM LOCATION PROBLEM (RGAC)

GA is used to solve optimization problems by imitating the genetic process of biological organisms. A potential solution to a specific problem may be represented as a chromosome containing a series of genes. A set of chromosomes makes up the population. By using Selection, Crossover and Mutation

Operators, GA is able to evolve the population to generate an optimal solution.

A. Chromosome Representation

The efficiency of GA depends largely on the representation of a chromosome which is composed of a series of genes.

Here each gene represents an ATM location which is equal to one or zero based on binding of the ATM to its location as in equation 3. As a result, L represents the chromosome.

Population Initialization is generated randomly.

B. Fitness Equation

A fitness equation must be devised to determine the quality of a given chromosome; it always returns a single numerical value. In determining the fitness equation, it is necessary to maximize the percentage coverage PC of CU. RGAC takes the PC value, as presented in equation (5), as the fitness of a given chromosome.

C. Evolutionary Process

The evolutionary process is accomplished by applying Rank based Roulette Wheel Selection (RRWS); crossover and mutation operate from one generation to the next. The Selection Operator determines how many and which individuals are kept in the next generation. The Crossover Operator controls how genes are exchanged between individuals, while the Mutation Operator allows random gene alteration of an individual. Besides these standard genetic operators, an Elitism phase is used to preserve the best candidates. These stages are discussed in detail below. First, in order to carry out the RRWS, the relative probability (equation 14) and the cumulative proportion of each chromosome are calculated.

Pi = Rank (fitness); (14)
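A compact sketch of rank-based roulette-wheel selection, under the usual convention (an assumption here, not stated in the paper) that the worst chromosome gets rank 1 and the best gets rank N, so that selection probability is proportional to rank:

```python
import random

def rank_roulette_select(population, fitnesses, n_select):
    """Select chromosomes with probability proportional to their fitness rank."""
    # Rank 1 = worst fitness, rank N = best fitness (assumed convention).
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    ranks = [0] * len(population)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank                      # equation (14): P_i = Rank(fitness_i)
    total = sum(ranks)
    # random.choices implements the roulette wheel over the rank weights.
    return random.choices(population, weights=[r / total for r in ranks], k=n_select)
```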

After that, the one-point Crossover and Mutation Operators (Algorithms 1 and 2) are applied to the chromosomes from the selection phase. The Mutation Operator runs through the genes of each chromosome and mutates each gene according to a


Mutation Rate Pm. Finally, Elitism combines the parent population with the modified population (the candidates generated by Crossover and Mutation Operators), and takes the best chromosomes to the next generation. The purpose of this phase is to preserve the best chromosomes from being lost. After this phase, the algorithm continues to the next iteration. RGAC is presented in the algorithm 3.

D. Performance Analysis

RGAC needs to execute a few hundred iterations to come up with an optimal solution. The shortcoming of HAC, in contrast, is convergence to a local optimum. According to the simulation results, RGAC is effective in speeding up convergence while still producing a feasible result, and RGAC outperforms HAC in the PC and γ values of the final deployment.

Algorithm 1: One-point Crossover

1: for i = 1 to popSize/2 do
2:   Select two chromosomes p1, p2 randomly
3:   Select a crossover point Point randomly
4:   Save the coordinates of the ATM locations of the two chromosomes p1, p2 in row1, col1, row2, col2
5:   if random[0, 1] < probCrossover then
6:     for k = Point+1 to ChromosomeLength do
7:       Swap the coordinates of the ATM locations of the two chromosomes p1, p2
8:     end for
9:     Keep the newly produced chromosomes as candidates
10:  end if
11: end for
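Reading Algorithm 1 as swapping the ATM coordinates that lie beyond a random crossover point, a sketch of the operator could look like the following, where each chromosome is stored as a list of (row, col) ATM coordinates (this list-of-coordinates encoding is an assumption for illustration):

```python
import random

def one_point_crossover(p1, p2, prob_crossover=0.8):
    """Swap the ATM coordinates after a random crossover point (cf. Algorithm 1)."""
    c1, c2 = list(p1), list(p2)              # chromosomes as lists of (row, col) pairs
    if random.random() < prob_crossover:
        point = random.randint(1, len(c1) - 1)
        # Exchange the ATM locations that lie after the crossover point.
        c1[point:], c2[point:] = c2[point:], c1[point:]
    return c1, c2
```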

Algorithm 2: Mutation

1: for i = 1 to popSize do
2:   Select a chromosome p randomly
3:   Save the coordinates of the ATM locations of chromosome p in row, col
4:   if random[0, 1] < probMutation then
5:     Select one ATM location of chromosome p randomly and set it to zero
6:     Generate one ATM location of chromosome p randomly and set it to one
7:     Add the newly produced chromosome as a candidate
8:   end if
9: end for

Algorithm 3: RGAC

1: Generate an initial population P of size N1 randomly
2: Evaluate each chromosome using equations (1), (5) and (7)
3: for g = 1 to MaximumGenerations do
4:   Generate an offspring population from P:
5:     Rank based Roulette Wheel Selection
6:     Crossover and Mutation (Algorithms 1 and 2)
7:   Evaluate each chromosome resulting from the Crossover and Mutation stages using equations (1), (5) and (7)
8:   (Elitism) Select the members of the combined population based on fitness to form the population P of the next generation
9:   Evaluate each chromosome using equations (1), (5) and (7)
10:  if the stopping criterion has been reached then
11:    return the PC and γ values and the best ATM location matrix L found
12:    break
13:  end if
14: end for

RGAC is effective in speeding up convergence while producing a feasible result. The fitness equation gives very good results and high-quality solutions even as the complexity of the problem increases.

VI CONCLUSIONS AND FUTURE WORK

This paper presents the RGAC technique, for deploying ATMs in the Banking world, which increases search efficiency by improving the evolutionary process while meeting a feasible result. Moreover, RGAC has proved to be a robust approach for solving the ATMs deployment problem and is able to provide high quality solutions in a reasonable time as compared to HAC.

The simulation results show that RGAC solves the optimization problem of ATM deployment by maximizing PC and minimizing N; thus RGAC matches the two objectives of banks, namely attaining the highest client utility and improving the cost efficiency of the banks. In the future, the goodness of the results could be extended by including other measures, such as the variance or standard deviation, in order to obtain less dispersion in the matrix E.

REFERENCES

i. M. A. Aldajani and H. K. Alfares, "Location of banking automatic teller machines based on convolution," Comput. Ind. Eng., vol. 57, no. 4, pp. 1194-1201, 2009.
ii. Figueras and M. Solarski, "A hybrid grouping genetic algorithm for citywide ubiquitous WiFi access deployment," in CEC'09: Proceedings of the Eleventh Congress on Evolutionary Computation, Piscataway, NJ, USA: IEEE Press, 2009, pp. 2172-2179.
iii. A. Qadrei and S. Habib, "Allocation of heterogeneous banks' automated teller machines," in INTENSIVE '09: Proceedings of the 2009 First International Conference on Intensive Applications and Services, Washington, DC, USA: IEEE Computer Society, 2009, pp. 16-21.
iv. Rimvydas Simutis, Darius Dilijonas, L. B. J. F. P. D., "Optimization of cash management for ATM network," Information Technology and Control, 2007.
v. Wael Abdulal, Omar Al Jadaan, A. J. S. R., "Rank based genetic scheduler for grid computing systems," in The International Conference on Computational Intelligence and Communication Networks (CICN 2010), IEEE, 2010.
vi. Alaa Alhaffa and Wael Abdulal, "A Market-Based Study of Optimal ATM's Deployment Strategy," International Journal of Machine Learning and Computing, vol. 1, no. 1, April 2011.
vii. Omar Al Jadaan, Lakshmi Rajamani, C. R. Rao, "Improved selection operator for GA," Journal of Theoretical and Applied Information Technology, JATIT, 2005-2008.
viii. Rakesh Kumar and Jyotishree, "Blending Roulette Wheel Selection & Rank Selection in Genetic Algorithms," International Journal of Machine Learning and Computing, vol. 2, no. 4, August 2012.
ix. D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. New York, NY: Addison-Wesley, 1989.
x. M. Wagner, "The optimal cash deployment strategy: modeling a network of automated teller machines," M.Sc. thesis in Accounting, Hanken Swedish School of Economics and Business Administration, 2007.


Data Oblivious Caching Framework for Hadoop using MapReduce in Big data

Sindhuja.M, Hemalatha.S

Assistant Professor - Information Technology, PG Scholar - Software Engineering
sindhuja.m@rajalakshmi.edu.in, hemzmohan12@gmail.com

Abstract—

The invention of online social networks, smart phones, the fine tuning of ubiquitous computing and many other technological advancements have led to the generation of multiple petabytes of structured, unstructured and semi-structured data. These massive data sets have led to the birth of distributed data processing and storage technologies such as Apache Hadoop and MongoDB. Data that are huge in volume take more time to process for a particular method and can cause failures in a distributed system. To address this issue, the Hadoop framework was developed for big data processing, and it is being used in many large-scale organizations. It processes huge amounts of data in a small amount of time, even on large distributed systems. Its main advantage is its automatic fault tolerance, which handles the failure of systems during execution or processing using the MapReduce programming technique. However, execution time is still an issue when large amounts of data are delivered and processed repeatedly for a particular process, and the existing method has no mechanism to reduce recomputation time. The proposed approach is a Hadoop distributed oblivious caching system for big data processing, which can handle both types of cache memory: a local cache and a distributed cache. The distributed cache reduces recomputation time and increases the cache hit ratio.

Index Terms— Big data, Distributed Cache, Hadoop, MapReduce.

I. INTRODUCTION

Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using on-hand data management tools or traditional data processing applications. It requires new technologies and architectures so that it becomes possible to extract value from the data by capturing and analysing them. Its properties, such as volume, velocity, variety, variability, value and complexity, lead to many challenges. Big data is an imminent technology in the market that can bring huge benefits to business organizations, but there are various challenges and issues in bringing about and adapting this implementation. Apache Hadoop [3] is an open-source implementation which is widely used in distributed systems; it follows a clustered approach and allows massive amounts of data to be stored. Essentially, it accomplishes two jobs: massive data storage and data processing. Hadoop follows a master/slave architecture which decouples the system metadata from the application data, and it can be used to implement the MapReduce framework.

II. BACKGROUND

A. Data storage in Hadoop

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on large clusters of commodity hardware; Hadoop can also be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets, including files that reach into the petabytes. HDFS is scalable and highly fault tolerant: by facilitating the transfer of data between nodes, it enables the Hadoop system to continue running even if one of the nodes fails. This decreases the risk of catastrophic failure, even in the event of multiple node failures.

B. Data analytics using MapReduce

Figure 1: MapReduce architecture (the input is split across Map tasks, whose outputs are sorted and merged and passed to Reduce tasks that produce the output)

MapReduce [10] has become the most popular framework for large-scale processing and analysis. MapReduce is specified as two phases. First, the Map phase, specified by a map function (also called the Mapper), takes key/value pairs as input, possibly performs some computation on them, and produces intermediate results in the form of key/value pairs. Second, the Reduce phase (also called the Reducer) processes these results as specified by a reduce function. The data from the map phase are shuffled (i.e. exchanged and merge-sorted) and passed to the machines performing the reduce function. The Mapper and Reducer tasks can be


performed in parallel. This parallelism also offers some possibility of recovering from partial failure of servers or storage during the operation: if one Mapper or Reducer fails, the job can be rescheduled.
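As a minimal illustration of these two phases (a generic word count, not the caching framework itself), a Hadoop-Streaming-style mapper and reducer could look like the following; the mapper emits key/value pairs and the reducer aggregates the values that share a key after the shuffle:

```python
# mapper.py: emits one (word, 1) pair per word read from stdin
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py: input arrives sorted by key, so counts can be summed per word
import sys

current, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(value)
if current is not None:
    print(f"{current}\t{count}")
```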

III. RELATED WORK

An enormous amount of work on in-memory storage and caching has been carried out, and data-oblivious caching draws on ideas from this prior work. It mainly deals with parallel jobs, leading to a coordinated system that improves job completion time and cluster efficiency. The issues and challenges of big data are discussed in [1], and a collaborative research program into methodologies for big data analysis and design has begun [12].

Dache [13] uses a novel cache description scheme, and a cache request and response protocol has been designed for processing the cached data. It is a distributed, layered cache system built on top of the Hadoop Distributed File System which provides multiple cache services. A distributed hash table is used to provide the cache service in a P2P style. Three replicas are maintained in three different cache services; the replication improves robustness and alleviates the workload. The system maintains three layers, namely the in-memory cache, a snapshot of the local disk, and the actual disk view. The cache service can be accessed by integrating applications with the client library of the system.

Memcached [4] is a distributed caching system designed as an object-access layer between the application and the underlying relational database. The cache manager of Dache can utilize Memcached to accelerate query responses because cache items are generally small.

RAMCloud [9] and prior work on databases such as MMDB [2] store all data in RAM only. While this is suitable for web servers, it is unlikely to work in data-intensive clusters for capacity reasons: Facebook, for instance, has more storage on disk than aggregate memory. The proposed system therefore treats memory as a constrained cache. Work on speculative execution of tasks [15] re-executes tasks that potentially slow down an entire MapReduce job in order to accelerate it, but it does not address the data sharing problem identified here; that mechanism is orthogonal to the proposed work and could be integrated straightforwardly.

Distributed file systems such as Zebra [5] and xFS, developed for the Sprite operating system, make use of client-side in-memory block caching, and suggest using the cache only for small systems. However, these systems use relatively simple eviction policies and do not coordinate scheduling with locality, since they were designed for use by a network of workstations.

According to PACMan [14], when multiple jobs run in parallel a job's running time decreases only when all the inputs needed to run the job are cached; caching only part of the inputs does not improve performance. These massive distributed clustered systems have large memories, and job execution performance can be improved if these memories are utilized to the fullest. PACMan is a caching service that coordinates access to the distributed caches. The system aims at minimizing the total execution time of jobs by evicting those items whose inputs are not completely cached; for this purpose the LIFE sticky eviction policy was developed.

The multiple intelligent cache mechanism [11] distributes the cache over Redis servers, and the Redis server (a single place to store all cached data) serves client requests. This mechanism helps improve performance, lowering access latency and increasing throughput. To increase the performance of MapReduce, a new design of HDFS was proposed that introduces the multiple intelligent caching concept with a Redis server.

Collaborative caching [6] presents the design of a proactive fetching and caching mechanism based on Memcached and integrates it with Hadoop: Hadoop is paired with Memcached servers that store metadata about objects cached at the data nodes. The blocks to be cached are decided on the basis of a two-level greedy approach.

To overcome limitations of the Hadoop system, an in-memory cache scheme [7] has been developed. Data locality and task parallelism can be improved in a multi-core environment by integrating the in-memory cache scheme with the cached data; the performance of Hadoop increased by 1.5x to 3.5x.

IV. DESIGN

Figure 2: The Docache infrastructure

Figure 2 shows the overall infrastructure of the system. Docache is a mechanism used to access cached data with less time and fewer resources. All the local caches are coordinated by the distributed cache, called Docache. It uses a data-oblivious caching algorithm for processing the data, because such an algorithm is much easier to analyze than a real cache's characteristics, such as its replacement policy. The framework does not depend on variables or hardware parameters such as the cache size or cache line length, makes efficient use of processor caches, and reduces memory bandwidth requirements.

For application data, a distributed cache keeps a copy of a subset of the data in the database and it is also temporary in nature.


The framework works by a divide-and-conquer approach, where the problem is divided into many sub-problems that are processed by the MapReduce framework. Centralized cache management in HDFS allows the user to specify the data to be cached by HDFS: the name node communicates with the data nodes that have the desired blocks on disk and instructs them to cache the blocks in their local caches as well as in the centralized cache. A cached item can be fetched from both the local cache and the remote cache. The data from the user are processed by MapReduce: the input is first split into key/value pairs and each key/value pair is assigned to a map task. The intermediate result produced by a map task is cached both on the local node and in the distributed cache; Docache itself does not contain the actual data. The reducers then perform the reduce tasks; the number of reduce tasks is always lower than the number of map tasks.
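The paper does not give a concrete API for Docache, so the following is only a hypothetical sketch of the check-cache-before-recompute flow it describes; the names local_cache, distributed_cache and compute are invented for illustration:

```python
def get_or_compute(key, local_cache, distributed_cache, compute):
    """Hypothetical Docache flow: try the local cache, then the distributed
    cache, and only then recompute and publish the intermediate result."""
    if key in local_cache:                     # cheapest case: same-node hit
        return local_cache[key]
    value = distributed_cache.get(key)         # remote hit via cached-block metadata
    if value is None:
        value = compute(key)                   # cache miss: run the map task again
        distributed_cache.put(key, value)      # share the intermediate result
    local_cache[key] = value
    return value
```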

The metadata are in the form of key/value pairs. Docache retrieves the searched data quickly and helps to access cached items in a short time with fewer resources. The name node is the central coordinator that coordinates all the data nodes; it is the master of HDFS and directs the slave nodes to perform low-level I/O tasks. The name node updates the mapping of a cached block to its data node. Each data node reports its locally cached blocks to the name node and to Docache, and the data nodes are responsible for data replication across multiple nodes.

V. METHODOLOGY

A. Hadoop cluster

A Hadoop cluster can be formed by a single node or by multiple nodes. The cluster mainly consists of a name node, data nodes, a job tracker and task trackers; each node in the cluster is either a master or a slave, the slave nodes being the data nodes and task trackers. HDFS is the primary storage in a Hadoop application: it exposes a file system namespace and allows the user to store data in it. An HDFS file consists of many blocks, each of 64 MB, and each block is replicated three times so that the blocks can be processed quickly and reliably. Data nodes are responsible for serving read and write requests, while the name node is the repository for all HDFS metadata. A 4 GB data set was then uploaded into the file system.

B. Distributed cache

Accessing data from a cache is much faster than accessing it from disk. To improve the overall performance and efficiency of the cache, a distributed cache is used. It reduces the overall time taken to search for the required data; in these experiments the time is reduced by almost 80%. The distributed cache allows us to query, search and analyze more cache entries in memory, with results for complex searches returned in less than a second. Time-consuming and expensive processing can be avoided by querying the cache directly instead of the database and mapping query results to cache lookups.

C. Cache Replacement Policy

Keeping a cached item for a long time wastes resources and memory, so unused items should be removed from memory to make room for new cached items. Many well-known cache replacement policies, such as LRU and LFU [8], are not effective on their own due to the mobility of the tasks. The Adaptive Replacement Cache (ARC) is an efficient cache replacement policy for cache utilization. This algorithm constantly balances LRU and LFU to improve the combined result. It splits the cache directory into two lists, T1 and T2, for recently referenced and frequently referenced entries. Any entry in L1 that is referenced once more gets another chance and enters L2. Entries entering the cache (T1, T2) move the adaptation target marker; if no free space exists in the cache, the marker also determines whether T1 or T2 evicts an entry.

ARC has low space complexity: a realistic implementation had a total space overhead of less than 0.75%. It has low time complexity, virtually identical to LRU, and adapts to different workloads and cache sizes. In particular, it gives very little cache space to sequential workloads, thus avoiding a key limitation of LRU. For a huge real-life workload generated by a large commercial search engine with a cache greater than 3 GB, ARC's hit ratio was dramatically better than that of LRU.

Algorithm: ARC(c)

Initialize T1 = B1 = T2 = B2 = ∅ and p = 0. x denotes the requested page.

Case I. x ∈ T1 ∪ T2 (a hit in ARC(c) and DBL(2c)): move x to the top of T2.

Case II. x ∈ B1 (a miss in ARC(c), a hit in DBL(2c)): adapt p = min{c, p + max{|B2|/|B1|, 1}}; REPLACE(p); move x to the top of T2 and place it in the cache.

Case III. x ∈ B2 (a miss in ARC(c), a hit in DBL(2c)): adapt p = max{0, p - max{|B1|/|B2|, 1}}; REPLACE(p); move x to the top of T2 and place it in the cache.

Case IV. x ∉ L1 ∪ L2 (a miss in both DBL(2c) and ARC(c)):
  case (i) |L1| = c: if |T1| < c then delete the LRU page of B1 and REPLACE(p); else delete the LRU page of T1 and remove it from the cache.
  case (ii) |L1| < c and |L1| + |L2| ≥ c: if |L1| + |L2| = 2c then delete the LRU page of B2; REPLACE(p).
  Finally, put x at the top of T1 and place it in the cache.

Subroutine REPLACE(p): if |T1| ≥ 1 and ((x ∈ B2 and |T1| = p) or |T1| > p), then move the LRU page of T1 to the top of B1 and remove it from the cache; else move the LRU page of T2 to the top of B2 and remove it from the cache.

D. Performance evaluation

The calculated execution times are reported both as numbers and as a graph. The cache execution time is compared with the normal execution time from disk memory; the graph below shows the run times of different runs of the same search.


VI. EXPERIMENTAL RESULTS

Hadoop was run in single-node (pseudo-distributed) mode. The number of mappers was set to 6 in these experiments, and the number of reducers varies according to the application. The input of the application is cached in the local cache, and the distributed cache is then used to access the locally cached items.

Table 1: Experimental result of searching for a word, showing the time consumed by each run.

Run    Run time (ms)
1      13750
2      25
3      23
4      20

Figure 3: Comparison of different runs

For this application, book publication data of about 4 GB in size was used; the time taken by the different runs is given in Table 1 above.

VII. CONCLUSION

This paper has exposed the major data-analysis problems that need to be addressed in big data processing and storage. We have described Data Oblivious Caching, an in-memory coordinated caching system for data processing in Hadoop using the MapReduce framework. In the MapReduce framework, Mapper nodes process a given set of data and save the intermediate data in local files; the Reducer nodes then copy these data from the Mapper nodes and later aggregate them to produce the final result.

In the data-oblivious cache, data can be fetched from the local cache as well as from the remote cache. Centralized caching can improve overall cluster utilization, and a replicated copy of the cached data is maintained for high availability. Hence the cache memories reduce recomputation time and increase the cache hit ratio. Future development will focus on enhancing the caching mechanism with more advanced algorithms using Hadoop and MapReduce.

References

i. Avita Katal, Mohammad Wazid, R. H. Goudar, "Big Data: Issues, Challenges, Tools and Good Practices," IEEE, 2013.
ii. H. Garcia-Molina and K. Salem, "Main Memory Database Systems: An Overview," IEEE Transactions on Knowledge and Data Engineering, 1992.
iii. Hadoop, http://hadoop.apache.org/, 2013.
iv. Jing Zhang, Gongqing Wu, Xuegang Hu, Xindong Wu, "A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services," 13th ACM/IEEE International Conference, 2012.
v. John H. Hartman and John K. Ousterhout, "The Zebra Striped Network File System," in ACM SOSP, 1993.
vi. Meenakshi Shrivastava, Dr. Hans-Peter Bischof, "Hadoop: Collaborative Caching in Real Time HDFS," Google, 2013.
vii. Memcached: A distributed memory object caching system, http://memcached.org/, 2013.
viii. Nimrod Megiddo, Dharmendra S. Modha, "Outperforming LRU with an Adaptive Replacement Cache Algorithm," IEEE, 2004.
ix. J. Ousterhout et al., "The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM," SIGOPS Operating Systems Review, 2009.
x. Pietro Michiardi, "MapReduce: Theory and Practice of Data-Intensive Applications," Eurecom, 2011.
xi. K. Senthil Kumar, K. Satheesh Kumar, S. Chandrasekar, "Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop," IJIET, vol. 4, issue 1, June 2014.
xii. Stephen Kaisler, Frank Armour, J. Alberto Espinosa, William Money, "Big Data: Issues and Challenges Moving Forward," 46th Hawaii International Conference on System Sciences, IEEE, 2013.
xiii. Yaxiong Zhao, Jie Wu, "Dache: A Data Aware Caching for Big Data Applications Using the MapReduce Framework," vol. 19, no. 1, February 2014.
xiv. Zhiwei Xiao, Haibo Chen, Binyu Zang, "A Hierarchical Approach to Maximizing MapReduce Efficiency," international conference, 2011.
xv. M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, "Improving MapReduce Performance in Heterogeneous Environments," in Proc. of OSDI 2008, Berkeley, CA, USA, 2008.



PAPR Reduction For STBC MIMO-OFDM Using Modified PTS Technique

Combined With Interleaving and Pulse Shaping

Poonam, Sujatha S

Dept. of TCE, CMRIT-Bengaluru pkkpoonam2@gmail.com, sujatha.s@cmrit.ac.in

Abstract: Multiple Input Multiple Output - Orthogonal Frequency Division Multiplexing (MIMO-OFDM) is a very attractive technology that has recently been proposed for wireless communication. It provides high-data-rate services and better system performance, improving data throughput and delivering the highest capacity as well. However, MIMO-OFDM suffers from the drawback of a high Peak to Average Power Ratio (PAPR) for large numbers of subcarriers, which can affect the system output. To overcome the PAPR problem, an effective technique, PTS (partial transmit sequence), is used. In this paper, a modified PTS technique combined with interleaving and a pulse shaping method is presented to improve the performance of a MIMO-OFDM system in terms of PAPR reduction. The basic idea behind PTS is to analyse the influence of the number of detected peaks on PAPR performance and on the system complexity by combining signal subblocks and rotation factors. The simulation results are computed using MATLAB and show that the PAPR performance is clearly improved by using modified PTS combined with interleaving and pulse shaping for STBC MIMO-OFDM.

Key-Words: MIMO-OFDM, PAPR, STBC, Partial Transmit Sequences, Interleaved Subblock Partition Scheme, Raised-Cosine pulse shape

1. Introduction

Orthogonal Frequency Division Multiplexing (OFDM) is a high-speed wireless communication technology with a demanding future in mobile communication systems. It provides high data rates and high-quality multimedia services to mobile users, delivers high data throughput and yields an efficient wideband communication system. Owing to all these advantages, OFDM has been playing an important role in various communication systems. Multiple antennas are used to increase the capacity of wireless links and have therefore attracted a great deal of interest in communication systems. Space-time codes combined with OFDM result in wideband communication. By using multiple antennas at the transmitter as well as at the receiver, spatial diversity can be achieved without increasing the transmit power or signal bandwidth. Therefore, many high-speed data transmission standards have adopted it, such as WiMAX (IEEE 802.16), WLAN (IEEE 802.11a/g) and digital video broadcasting (DVB).

MIMO-OFDM is the technology that combines multiple-input, multiple-output (MIMO), which multiplies capacity by transmitting different signals over multiple antennas, with orthogonal frequency division multiplexing (OFDM). MIMO-OFDM has several advantages: high data throughput, robustness against multipath fading, high power spectral efficiency and better performance. At the same time, however, MIMO-OFDM suffers from the PAPR problem when the system is implemented. PAPR, the peak to average power ratio, increases the complexity of the analog-to-digital and digital-to-analog converters and, as a result, reduces the efficiency of the radio-frequency (RF) power amplifier.

Several techniques are used to reduce PAPR in MIMO-OFDM systems. The techniques are categorised into three types: distortion methods, distortionless methods and other methods. These methods include clipping, companding, selective mapping (SLM), partial transmit sequence (PTS), active constellation extension (ACE) and tone reservation (TR). Clipping uses a predetermined threshold, which helps reduce PAPR to a low value. Interleaving combined with PTS has also been introduced in MIMO-OFDM implementations to reduce PAPR. Interleaving is basically the reordering of consecutive bytes of data over a large sequence before transmission, to reduce the effect of burst errors. In this paper, PAPR is reduced by PTS combined with interleaving and a pulse shaping method in MIMO-OFDM. SLM and PTS belong to the probabilistic class because several different signals are generated but only the minimum-PAPR signal is transmitted. In SLM, several signals contain the same information and the OFDM signal with the lowest PAPR is selected; SLM is a flexible technique but it requires high computational complexity and has low bandwidth efficiency.

Therefore, an effective technique, PTS (Partial Transmit Sequence), is used in this paper, which helps reduce PAPR to a minimum value. PTS is a distortionless and attractive technique used to improve the statistics of a multicarrier signal. In PTS, the input data are divided into smaller disjoint subsequences, an IFFT is performed on each, and each subsequence is multiplied by a rotating phase factor. The outputs combined with the rotating phase factors are then added to obtain the OFDM symbol for transmission. Every subsequence contributes to the PAPR reduction: the PAPR is computed for each resulting sequence, and the signal sequence with the minimum PAPR is selected and transmitted. The partitioning types for PAPR reduction can be categorised as interleaved partition, adjacent partition and pseudo-random partition.

However, PTS in its modified form is a better option than ordinary PTS, because in ordinary PTS all phase factor combinations are considered, and the complexity increases with the number


of subsequences. Hence, modified PTS is considered, to achieve the PAPR reduction and to reduce the system complexity as well. Modified PTS is a successful technique in which the real and imaginary parts are separately multiplied by phase factors.

In previous work on the same concept using PTS, PAPR reduction is jointly optimized over the real and imaginary parts, which are separately multiplied by phase factors, for different numbers of subcarriers and subsequences. PTS combined with interleaving is very helpful for PAPR reduction in MIMO-OFDM systems.

2. MIMO-OFDM and PAPR in OFDM systems

A MIMO-OFDM system consists of a transmitter and a receiver end. Transmission antennas are used at the transmitting end: an input data bit stream is fed into space-time coding, then modulated by OFDM and finally fed to the antennas for radiation. At the receiving end, the incoming signals are fed into a signal detector and processed before the original signal is recovered.

The PAPR of OFDM is defined as the ratio between the maximum power and the average power of the signal; the PAPR of the OFDM signal x(t) is given by equation (1):
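Written out from this definition, equation (1) takes the standard form

PAPR = max over 0 ≤ t ≤ T of |x(t)|^2 / E[ |x(t)|^2 ]        (1)

where T is the OFDM symbol duration and E[·] denotes the expectation (average power); in practice the ratio is usually expressed in dB.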


The Partial Transmit Sequence (PTS) algorithm is a technique for improving the statistics of a multicarrier signal. Its basic idea is to divide the original OFDM sequence into several subsequences and to multiply each subsequence by different phase factors until an optimum value is found. First, consider the data block as a vector X = [X1, X2, …, X(N-1)]^T. The data vector X is then partitioned into disjoint sets, represented by the vectors {Xm, m = 1, 2, …, M}. Here we assume that the data clusters consist of contiguous sets of subcarriers and are of equal size.

Fig. 1. Block diagram for Partial Transmit Sequences

The objective is to combine the M clusters using equation (6) so as to obtain an optimal solution.

The Cumulative Distribution Function (CDF) is used to measure the efficiency of a PAPR technique. The Complementary CDF (CCDF) is used to measure the probability that the PAPR of a certain data block exceeds a given threshold; it is implemented using a Gaussian distribution with zero mean and a variance of 0.5. The corresponding equation describes the probability that the OFDM signal exceeds the threshold value. The input to the high power amplifier (HPA) must be a continuous-time signal; therefore, oversampling is used to approximate the CCDF of the PAPR of the continuous OFDM signal, and the CCDF of the PAPR using oversampling is recalculated accordingly.

3. PTS AND MODIFIED PTS SCHEME

The Partial Transmit Sequence (PTS) algorithm, first proposed by S. H. Muller and J. B. Huber, is a technique for improving the statistics of a multi-carrier signal. Its basic idea is to divide the original OFDM sequence into several sub-sequences and to multiply each sub-sequence by different weights until an optimum value is chosen.

where {bm, m = 1, 2, …, M} are weighting phase factors, assumed to be rotated through different combinations, and Xm is the partially transmitted sequence. Increasing the number of phase factors decreases the PAPR of the OFDM signal but in turn increases the hardware complexity of the system.

The partial transmit sequence scheme is an attractive solution to reduce PAPR in MIMO-OFDM systems without any distortion of the transmitted signals. In the PTS scheme, the input data block is partitioned into disjoint subblocks, and each subblock is multiplied by a phase weighting factor obtained with an optimization algorithm. If the subblocks are optimally phase shifted, they exhibit minimum PAPR and consequently reduce the PAPR of the merged signal. The number of subblocks (V) and the subblock partition scheme determine the PAPR reduction. The main drawback of PTS arises from the computation of multiple IFFTs and from a search space that grows rapidly with the number of transmit antennas and subblocks, resulting in high computational complexity.
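A compact NumPy sketch of this ordinary (exhaustive-search) PTS, using V interleaved subblocks and phase factors drawn from {1, -1, j, -j}; the subcarrier count, subblock count and QAM-like symbols below are illustrative assumptions, not the paper's exact simulation setup:

```python
import itertools
import numpy as np

def papr_db(x):
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def pts(X, V=4, phases=(1, -1, 1j, -1j)):
    """Ordinary PTS: split X into V interleaved subblocks, IFFT each,
    and exhaustively search the phase-factor combination that minimises PAPR."""
    N = len(X)
    subblocks = np.zeros((V, N), dtype=complex)
    for v in range(V):
        subblocks[v, v::V] = X[v::V]           # interleaved subblock partition
    parts = np.fft.ifft(subblocks, axis=1)     # one IFFT per subblock
    best_papr, best_x, best_b = np.inf, None, None
    for b in itertools.product(phases, repeat=V):
        candidate = np.tensordot(np.array(b, dtype=complex), parts, axes=1)
        p = papr_db(candidate)
        if p < best_papr:
            best_papr, best_x, best_b = p, candidate, b
    return best_x, best_b, best_papr

# Example: one random 16-QAM-like OFDM symbol with N = 64 subcarriers.
X = (np.random.choice([-3, -1, 1, 3], 64)
     + 1j * np.random.choice([-3, -1, 1, 3], 64))
x_opt, b_opt, papr_opt = pts(X)
```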

In general, subblock partitioning can be classified into three categories: interleaved partition, adjacent partition, and pseudo-random partition. In the interleaved method, subcarriers spaced equally apart are allocated to the same subblock. In the adjacent scheme, successive subcarriers are assigned to the same subblock sequentially. Lastly, in the pseudo-random scheme each subcarrier is assigned to one of the subblocks at random. The computational complexity of the interleaved subblock partitioning scheme is reduced considerably compared to that of the adjacent and pseudo-random partition schemes, and this partitioning scheme considerably reduces the envelope fluctuations of the transmitted waveform.

4. INTERLEAVING

Interleaving is the reordering of the data to be transmitted so that successive bytes of data are scattered over a larger series of data, which reduces the impact of burst errors. The use of interleaving greatly increases the ability of error-protection codes to correct burst errors: many error-protection coding processes can correct small numbers of errors but cannot correct errors that occur in groups.

Fig. 3. Interleaving operation

A data symbol vector is encoded with a space-time encoder, which takes a single stream of binary input data and transforms it into two vectors S1 and S2; these are given to the IFFT blocks and fed to the two transmitter antennas respectively. Symbols S1 and S2 represent the two continuous time-domain signals, each combined from V clusters with weighting factors (indexed v = 1, 2, …, V) assumed to be perfect rotations.

Fig. 2. Block diagram of the OFDM system using the PTS technique

5. PROPOSED SCHEME

Modified PTS technique: In the PTS technique, an exhaustive search algorithm is used to find the minimum-PAPR signal after multiplication by the phase factors, and the mathematical complexity of an exhaustive search is high. In the modified PTS technique [10], a neighbourhood search algorithm is therefore used to find the minimum-PAPR signals.

Fig. 4: Structure of the transmitter end using modified PTS combined with interleaving and pulse shaping

6. RESULTS

The proposed work was evaluated in MATLAB simulation, and the results obtained are shown in the figures below. The analysis of the PAPR reduction performance for modified PTS combined with interleaving and pulse shaping was carried out using MATLAB simulation. The simulation parameters used in the analysis of PAPR reduction are listed in the following table.


Table: Simulation parameters

Parameter                              Value
Number of OFDM blocks                  1000
Number of subcarriers (N)              64, 128, 256, 512, 1024
Number of subblocks (V)                4
Oversampling factor (L)                2, 4, 8, 16
Rolloff factor (α)                     0.6 (range 0 to 1)
Subblock partitioning scheme           Interleaving and pulse shaping method
Number of transmit antennas (Mt)       2
Modulation scheme                      16-QAM
Phase weighting factors (b)            1, -1, j, -j

The modified PTS technique represents the input information as subblocks, which are modulated using 16-QAM and QPSK as modulation schemes. The phase rotating factors are transmitted directly to the receiver through the subblocks. The PAPR performance is evaluated using the CCDF, which describes the probability that the new PAPR value is smaller than the original PAPR.
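The CCDF curves shown in the figures can be reproduced in outline by measuring the PAPR of many random OFDM symbols and counting how often it exceeds each threshold. The block below is a generic sketch of that measurement (block count, subcarrier number and QPSK mapping are assumptions), independent of the specific modified-PTS implementation:

```python
import numpy as np

def ccdf_of_papr(num_blocks=1000, N=256, oversample=4):
    """Empirical CCDF of PAPR for random QPSK OFDM symbols."""
    paprs = np.empty(num_blocks)
    for i in range(num_blocks):
        X = np.exp(1j * (np.pi / 2) * np.random.randint(0, 4, N))   # QPSK symbols
        # Oversample by zero-padding the middle of the spectrum before the IFFT.
        Xo = np.concatenate([X[:N // 2], np.zeros((oversample - 1) * N), X[N // 2:]])
        x = np.fft.ifft(Xo)
        p = np.abs(x) ** 2
        paprs[i] = 10 * np.log10(p.max() / p.mean())
    thresholds = np.linspace(4, 12, 81)
    ccdf = [(paprs > t).mean() for t in thresholds]   # P(PAPR > threshold)
    return thresholds, ccdf
```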

The figures below present the PAPR reduction output and the CCDF of MIMO-OFDM. Figure 5 shows the MIMO-OFDM PAPR reduction of the OFDM subsequences for different numbers of subcarriers, N = 64, 128 and 256. For these subcarrier counts the PAPR performance improves from 8.6 dB to 5.6 dB, and the PAPR decreases as the number of subcarriers used in MIMO-OFDM decreases.

This PAPR performance has been achieved by using the modified PTS technique combined with interleaving and the pulse shaping method.

Fig. 6. CCDF of PAPR for different subcarriers N = 64, 128, 256, when V = 8, L = 4, α = 0.6 and Mt = 2

Fig. 5. CCDF of PAPR for different subcarriers N = 64, 128, 256, when V = 4, L = 4, α = 0.6 and Mt = 2


Fig. 7. CCDF of PAPR for different oversampling factors L = 2, 4, 8, 16, when N = 256, V = 4, α = 0.6 and Mt = 2

The results are obtained using the modified PTS technique combined with interleaving and the pulse shaping method. The waveform results for different numbers of subcarriers, N = 64, 128, 256, 512 and 1024, are shown in the figures. As the number of subcarriers in MIMO-OFDM increases, the PAPR also increases.

7. CONCLUSION

This paper revolves around the modified PTS technique, which helps reduce the PAPR of MIMO-OFDM. The PTS technique has been used along with interleaving and a pulse shaping method. The results show that the PAPR reduction is improved to a great extent, i.e. the PAPR is reduced from 9.5 dB to 5.2 dB. PTS combined with interleaving and pulse shaping is an effective technique for STBC MIMO-OFDM systems, achieving a better trade-off between complexity and PAPR performance. It also provides high data rates and helps deliver data throughput in a much better way. MIMO-OFDM has several advantages and is very useful in digital multimedia and wireless broadband mobile communication systems.


8. REFERENCES

i. S. B. Weinstein and Paul M. Ebert, "Data Transmission by Frequency-Division Multiplexing Using the Discrete Fourier Transform," IEEE Transactions on Communication Technology, vol. 19, no. 5, pp. 628-634, October 1971.
ii. Dae-Woon Lim, Seok-Joong Heo, and Jong-Seon No, "An Overview of Peak-to-Average Power Ratio Reduction Schemes for OFDM Signals," Journal of Communications and Networks, vol. 11, no. 3, pp. 229-239, June 2009.
iii. S. H. Muller and J. B. Huber, "OFDM with reduced peak-to-average power ratio by optimum combination of partial transmit sequences," IEEE Electronics Letters, vol. 33, no. 5, pp. 368-369, February 1997.
iv. Parneet Kaur, Ravinder Singh, "Complementary Cumulative Distribution Function for Performance Analysis of OFDM Signals," IOSR Journal of Electronics and Communication Engineering (IOSR-JECE), ISSN: 2278-2834, vol. 2, issue 5, pp. 05-07, Sep-Oct 2012.
v. Taewon Hwang, Chenyang Yang, Gang Wu, Shaoqian Li and Geoffrey Ye Li, "OFDM and its Wireless Applications: A Survey," IEEE Transactions on Vehicular Technology, vol. 58, no. 4, pp. 1673-1694, May 2009.
vi. S. Sujatha and P. Dananjayan, "PAPR reduction techniques in OFDM systems using DCT and IDCT," Journal of Theoretical and Applied Information Technology, vol. 64, no. 3, 30 June 2014.
vii. Zhongpeng Wang, "Combined DCT and Companding for PAPR Reduction in OFDM Signals," Journal of Signal and Information Processing, vol. 2, pp. 100-104, 2011.
viii. T. Jiang and Y. Imai, "An Overview: Peak-to-Average Power Ratio Reduction Techniques for OFDM Signals," IEEE Transactions on Broadcasting, vol. 54, no. 2, pp. 257-268, 2008.
ix. P. Mukunthan and P. Dananjayan, "PAPR reduction of an OFDM signal using modified PTS combined with interleaving and pulse shaping method," European Journal of Scientific Research, vol. 74, no. 4, pp. 475-486, May 2012.


Generation of Migration List of Media Streaming Applications for Resource

Allocation in Cloud Computing

Vinitha Pandiyan, Preethi, Manjunath S.

Dept. of CSE, Cambridge Institute of Technology, Bangalore, India

Abstract- The recent trend and requirement for large storage in cloud computing has made migration and cloud virtualization technology increasingly popular and valuable in the cloud computing environment, owing to the benefits of server consolidation, live migration and resource isolation. Live migration of virtual machines can be used to implement energy saving and load balancing in cloud data centres. However, to our knowledge, most of the previous work concentrated on the implementation of the migration technology itself and did not consider the impact of the resource reservation strategy on migration efficiency. This paper focuses on the live migration strategy of multiple virtual machines with different resource reservation methods. We first describe the live migration framework of multiple virtual machines with resource reservation technology. As soon as the virtual machine size increases, the data in the migration list is transferred to the corresponding virtual machine.

Keywords: virtualization technology, Amazon, Google, Yahoo!, Microsoft, IBM and Sun

1. Introduction

Cloud computing has recently received considerable attention in both the academic and the industrial community as a new computing paradigm that provides dynamically scalable and virtualized resources as a service over the internet. Currently, several large companies, such as Amazon, Google, Yahoo!, Microsoft, IBM and Sun, are developing their own cloud platforms for consumers and enterprises to access cloud resources through services.

Recently, with the rapid development of virtualization technology, more and more data centres use this technology to build new-generation data centres to support cloud computing, owing to benefits such as server consolidation, live migration and resource isolation. Live migration of virtual machines means that the virtual machine appears responsive all the time during the migration from the clients' perspective. Compared with traditional suspend/resume migration, live migration holds many benefits such as energy saving, load balancing and online maintenance, and many live migration methods have been proposed to improve migration efficiency. As live migration technology becomes widely used in modern cloud computing data centres, live migration of multiple virtual machines becomes more and more frequent. Unlike single virtual machine migration, the live migration of multiple virtual machines faces many new problems, such as migration failures due to insufficient resources on the target machine, migration conflicts due to concurrent migrations, and migration thrashing due to dynamic changes of virtual machine workloads. All these issues must be overcome to maximize migration efficiency in virtualized cloud data centre environments. In this paper, we study

the live migration efficiency of multiple virtual machines from an experimental perspective and investigate different resource reservation methods and migration strategies in live migrations. We first describe the live migration framework of multiple virtual machines with resource reservation technology. Then we perform a series of experiments to investigate the impacts of different resource reservation methods on the performance of live migration on both the source machine and the target machine. Additionally, we also analyze the efficiency of the parallel migration strategy and the workload-aware migration strategy. Metrics such as downtime, total migration time, and workload performance overheads are measured.

2. Related Work

Resource allocation is one of the major aspects of cloud computing, and dynamic resource allocation has its own challenges that have to be addressed while implementing it; many techniques have been developed in order to deal with it. The cloud comprises data centre hardware and software [1]. The resource allocation concept has been analyzed in many computing areas, such as grid computing and operating systems. Prediction plays a very important role during the process of resource allocation; the prediction of CPU utilization for upcoming demand has been studied in the literature. A prediction method based on a Radial Basis Function (RBF) network was proposed by Y. Lu et al. for predicting user access demand, which also introduced the concept of multi-scaling. The statistical expected value can be obtained by the service provider using these methods [13][14]. A Content Delivery Network (CDN) and central resources provided by the data centre should be used to include multi-scaling for Video On Demand (VOD) with guaranteed Quality of Service (QoS) [1]. There is a need to consolidate the cost involved in the streaming process along with an optimized user experience. Cloud providers offer different types of resource provisioning plans that can be chosen as required [15]; the two common types are the on-demand plan and resource reservation, and on analyzing both, resource reservation is found to be less expensive than the on-demand plan.

3. System Architecture

The components of the system architecture include:

Service Provider

Cloud server

Android User

Service Provider: The login operation is the main operation carried out at the service provider module. Login into the cloud server is performed by the cloud administrator and involves many crucial aspects of the entire process. On successful completion of the login process, the multimedia files can be browsed. The details of the sent files can be obtained along with the memory balance details.

Fig. 1: System Architecture

Cloud server: Here the major operation is viewing all the multimedia files along with memory maintenance, which retrieves information about how much memory is utilized and how much space remains. The cloud server also holds the prediction details, which provide predictive information about the future space requirements of files based on previous statistical data. While the media files are viewed, the service provider/cloud administrator has to check the amount of space currently available before a new file can be added. If the available space is less than the space the new file requires, the concept of the migration list comes into play. Data that does not fit into the available space is placed on the migration list: it is ready to be added to the cloud server but is temporarily stored in a different place because of insufficient space. The migration list can therefore be defined as the temporary space where data is held whenever the memory is not sufficient for the addition to complete. After the memory is expanded with sufficient space for the new file, the re-migration process takes place, i.e. the data placed on the migration list is added to the server.
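As an illustration of the migration-list behaviour just described, a minimal Python sketch; the class and method names (CloudStore, expand_capacity, etc.) are hypothetical and stand in for the described system rather than reproducing it:

import random  # not required; shown only to keep the sketch self-contained if extended

class CloudStore:
    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.used_mb = 0
        self.files = {}
        self.migration_list = []   # data waiting until enough space is available

    def add_file(self, name, size_mb):
        # If the file fits, store it directly; otherwise park it on the migration list.
        if self.used_mb + size_mb <= self.capacity_mb:
            self.files[name] = size_mb
            self.used_mb += size_mb
            return "stored"
        self.migration_list.append((name, size_mb))
        return "queued on migration list"

    def expand_capacity(self, extra_mb):
        # After the memory is expanded, re-migrate queued files that now fit.
        self.capacity_mb += extra_mb
        pending, self.migration_list = self.migration_list, []
        for name, size_mb in pending:
            self.add_file(name, size_mb)   # re-queues the file if it still does not fit

store = CloudStore(capacity_mb=100)
print(store.add_file("video1.mp4", 80))   # stored
print(store.add_file("video2.mp4", 50))   # queued on migration list
store.expand_capacity(60)                 # re-migration: video2.mp4 now fits
print(store.files)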

Android user: The Android user should first register with the cloud by providing the necessary details. Once the registration process is completed successfully, the user can log in with the corresponding username and password.

The available files can be viewed and operations on them carried out. On selecting a particular file, the user can view the rank of that file along with the comments provided by other users.

In this way the user can also provide comments and a rank for the particular multimedia file.


Fig. 2: Data Flow

4. Conclusion

Live migration of virtual machines is an efficient technology used to implement energy saving and load balancing in virtualized cloud computing data centers. In this paper, we studied the live migration efficiency of multiple virtual machines from an experimental perspective and investigated different resource reservation methods in the live migration process, as well as other complex migration strategies such as parallel migration and workload-aware migration. Experimental results show that:

(1) Live migration of virtual machines brings some performance overhead. (2) The performance overhead of live migration is affected by the memory size, CPU resources, and the workload types. (3) Resource reservation in the target machine is necessary to avoid migration failures. (4) Adequate system resources in the source machine allow a larger number of parallel migrations and better migration efficiency. (5) The workload-aware migration strategy can efficiently improve the performance of the migrated workload. Based on these experimental findings, three optimization methods, namely optimization in the source machine, parallel migration of multiple virtual machines, and the workload-aware migration strategy, are proposed to improve the migration efficiency. Future work will include designing and implementing an intelligent live migration mechanism to improve the live migration efficiency in the multiple virtual machines scenario and studying the migration strategies as an optimization problem using mathematical modelling methods.

References

i. M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50-58, 2010.

ii. C. Waldspurger, "Memory resource management in VMware ESX server," ACM SIGOPS Operating Systems Review, vol. 36, no. SI, p. 194, 2002.

iii. P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in Proceedings of the nineteenth ACM Symposium on Operating Systems Principles, 2003, p. 177.

iv. D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, "The eucalyptus open-source cloud-computing system," in Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009, pp. 124-131.

v. K. Ye, X. Jiang, D. Ye, and D. Huang, "Two optimization mechanisms to improve the isolation property of server consolidation in virtualized multicore server," in Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010, pp. 281-288.

vi. C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live migration of virtual machines," in Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation, Volume 2, 2005, p. 286.

vii. M. Nelson, B. Lim, and G. Hutchins, "Fast transparent migration for virtual machines," in Proceedings of the USENIX Annual Technical Conference, 2005, p. 25.

viii. D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, "The eucalyptus open-source cloud-computing system," in Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009, pp. 124-131.

ix. M. Nelson, B. Lim, and G. Hutchins, "Fast transparent migration for virtual machines," in Proceedings of the USENIX Annual Technical Conference, 2005, p. 25.

x. Y. Luo, B. Zhang, X. Wang, Z. Wang, Y. Sun, and H. Chen, "Live and incremental whole system migration of virtual machines using block-bitmap," in Proceedings of the IEEE International Conference on Cluster Computing, 2008, pp. 99-106.

xi. H. Liu, H. Jin, X. Liao, L. Hu, and C. Yu, "Live migration of virtual machine based on full system trace and replay," in Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009, pp. 101-110.

xii. M. Hines and K. Gopalan, "Post-copy based live virtual machine migration using adaptive pre-paging and dynamic self-ballooning," in Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2009, pp. 51-60.

xiii. K. Ye, J. Che, X. Jiang, J. Chen, and X. Li, "vTestkit: A performance benchmarking framework for virtualization environments," in Proceedings of the Fifth ChinaGrid Annual Conference, 2010, pp. 130-136.

xiv. D. Niu, Z. Liu, B. Li, and S. Zhao, "Demand forecast and performance prediction in peer-assisted on-demand streaming systems," in Proc. of IEEE INFOCOM, 2011, pp. 421-425.

xv. D. Niu, H. Xu, B. Li, and S. Zhao, "Quality assured cloud bandwidth auto-scaling for video-on-demand applications," in Proc. of IEEE INFOCOM, 2012, pp. 421-425.

xvi. E. White, M. O'Gara, P. Romanski, P. Whitney, "Cloud pricing models," article in Cloud Expo: whitepaper, 2012. http://java.syscon.com/node/2409759?page=0,1.


Child Tracking in School Bus Using GPS and RFID


ShilpithaSwarna, Prithvi B. S., Veena N.

Dept of CSE, SVIT, Bangalore

Abstract: Much research on real-time vehicle tracking has been conducted, and tracking a school bus is similarly important; sending a child to school by bus can be nerve-wracking for parents. It is important to know whether their child has boarded the right bus, is safe on it, and has reached the correct destination (i.e. the school) on time. According to statistics from the World Health Organization (WHO), in India about 41% of child deaths are due to a lack of road transportation safety. This paper presents a reliable real-time tracking system using the Global Positioning System (GPS), Global System for Mobile communication (GSM) services and RFID or smartcards, which keeps track of the child at all times. Parents can log in from their mobile or the web to track the bus, see whether it is running late, and minimize the time children wait at the bus stop, reducing their exposure to criminal predators, bad weather or other dangerous conditions.

Key-Words: tracking, global positioning system, global system for mobile communication, RFID, Google Maps API.

2. Introduction

There is a huge demand for tracking devices, which are considered life-saving devices. These devices keep track of children and update their parents with real-time tracking information. During disasters, such systems help parents track their children's location. According to Hind [1], tracking provides several services such as recovering stolen assets and keeping track of employee behavior in the workplace environment.

Parents must be assured of their child's safety on the school bus; sending a child to school by bus can be nerve-wracking for them.

Parents should know whether the child has boarded the bus, safely reached the school, and found the right bus to reach home on time. Real-time tracking of children using GPS installed in the school bus lets parents be more relaxed about their safety while travelling; installing such safety components makes bus and child tracking easier and safer, adds accountability, and increases convenience and savings.

Providing safety measures for children travelling on the school bus is an important concern for parents. Using GPS tracking, GSM services and RFID or smartcards for real-time tracking reassures parents about their ward's safety, since they can log in from anywhere to find the location of the bus and child on their mobiles or the web.

3. System Implementation

The GPS receiver receives the location coordinates (longitude, latitude, speed, device data) from the satellite at a resolution (frequency) based on the user requirement, e.g. 10 readings/minute, in NMEA format, which contains raw data with a large amount of information.

The microcontroller processes the raw data according to the algorithm present. The location coordinates are passed to the GSM modem, which provides serial communication to the server (database).

Figure 1: Block diagram of school bus tracking

Figure 1 shows the overview of the tracking system installed in the school bus.

RFID is used to detect the child's login to and logout from the bus. There are two types of RFID, namely active and passive. The active RFID reader is centralized in the bus, while the passive RFID reader is installed at the door.

Figure 2: RFID-integrated GPS tracking device.

The raw GPS data collected from the device installed in the school bus is sent to the NMEA server at the configured timings (i.e. 10 readings/minute).


Figure 3: Tracking Architecture

Figure 3 shows the tracking architecture: the GPS device installed in the school bus provides raw GPS data to the NMEA server, which parses the raw input; the parsed output is sent to the server, where it is processed and, based on the SMS logic, a message is triggered to the parents. The parents can log in with their mobiles or the web to access this location information.

To access their child's location information, parents are given an ID and password. If the driver exceeds the speed limit, an alert can be generated for the administrator.

Figure 4: View of the location on a mobile

Figure 6: NMEA output of the GPS receiver.

The NMEA server takes the raw GPS data (in unreadable, encoded form) as input and produces parsed output.
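As an illustration of this parsing step, a minimal Python sketch (not the authors' implementation; it assumes standard $GPRMC sentences and skips checksum validation) that extracts latitude, longitude and speed from a raw NMEA record:

def nmea_to_decimal(value, hemisphere):
    # NMEA encodes latitude as ddmm.mmmm and longitude as dddmm.mmmm.
    degrees = int(float(value) / 100)
    minutes = float(value) - degrees * 100
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal

def parse_gprmc(sentence):
    # Example sentence: "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,,*6A"
    fields = sentence.split(",")
    if not fields[0].endswith("GPRMC") or fields[2] != "A":
        return None   # not a valid/active fix
    return {
        "latitude": nmea_to_decimal(fields[3], fields[4]),
        "longitude": nmea_to_decimal(fields[5], fields[6]),
        "speed_knots": float(fields[7]),
    }

print(parse_gprmc("$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,,*6A"))
# {'latitude': 48.1173, 'longitude': 11.5166..., 'speed_knots': 22.4}

The parsed coordinates can then be stored in the database and plotted with the Google Maps API, and the speed field can drive the over-speed alert mentioned above.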

4. Conclusion and Future work

RFID and GPS tracking is designed and implemented in the school bus; parents can track their children's location and are provided with more reliable information about the boarding and departure of the child. The location coordinates (latitude, longitude) are converted by the Google Maps API.

The system's accuracy depends on GPS accuracy, which in turn depends on weather conditions and satellite coverage, and there may be a delay in the tracking message provided to the parents about their ward.

REFERENCES

i. Hind Abdalsalam Abdallah Dafalla, "Design and implementation of an accurate real time GPS tracking system."

ii. "What is PHP," URL: http://www.techrepublic.com/, accessed on 3 April 2011.

iii. Michael Kofler, "The Definitive Guide to MySQL 5, third edition," New York, 2005, pp. 3-7.

iv. Official Google Maps API website, URL: http://code.google.com/apis/maps/faq.

v. OZEKI NG SMS gateway website, URL: http://www.ozekisms.com/index.php, February 2011.


Risk Mitigation by Overcoming Time Latency during Maternity - An IoT Based Approach

Sanjana Ghosh, R. Valliammai, Kiran Babu T.S., Manoj Challa

sagh12cs@cmrit.ac.in, vamr12cs@cmrit.ac.in, kiran.ts@cmrit.ac.in, manoj.c@cmrit.ac.in

Abstract - With the recent development of the IoT (Internet of Things), wireless devices have entered medical science with a broad spectrum of possibilities. Along with improving the quality of life of patients, wireless technology enables patients to be monitored remotely during emergencies and provides them with health information, reminders, and support, potentially extending the reach of health care by making it available anytime. The wireless sensor nodes are inserted into the vaginal canal and can detect the electrical signals associated with uterine contractions, sensing that labor has begun. These sensors detect signals directly from the specific points in the body where they originate and are responsible for sensing uterine contractions, even during preterm labor. The sensors transmit information to cellphones, which in turn alert the maternity centers so that the patient receives apt treatment.

Keywords: Internet of Things, wireless technology for healthcare.

II. RELATED WORKS

1. The SureCALL Labor Monitor technology is used for detecting the onset of labor. Uterine electromyography (EMG) labor monitoring detects uterine muscle contractions from abdominal recordings of electrical signals generated in the uterus; uterine EMG activity can be measured by abdominal surface electrodes.

Tocodynamometers are external pressure measurement devices which are being used to measure the contractions of the uterus and are the primary type of external monitor. The patient wears the device on a tightly attached belt, which must maintain a constant pressure on a pressure-sensing device. As the uterus contracts, a strain gauge measures the pressure exerted by the abdomen on the device.

I. INTRODUCTION

The Internet of Things refers to a wireless connectivity medium between objects. The Internet of Things is not only a global network for people to communicate with one another, but it is also a platform where devices communicate electronically with each other and the world around them. Beyond connectivity for everyone from any time and place, we will now have connectivity for anything.

Embedding short-range mobile transceivers into a wide array of additional gadgets, enabling new forms of communication between people and things and between things themselves, is the essence of IoT. The term "Internet of Things" has been formulated to describe a number of technologies and research disciplines that enable the Internet to reach out into the real world of physical objects.

In the current scenario it is being realized that the integration of small microcontrollers with sensors can result in the creation of extremely useful and powerful devices, which form an integral part of sensor networks. These devices are termed sensor nodes. Nodes are able to communicate with each other over different protocols.

In a society where the joint family system is not prevailing, particularly in nuclear families, an emergency can arise at any time. A typical case is a working couple where the husband is away at work and the wife, alone at home, goes into labor and has to be put in contact with the hospital for an emergency.

For this purpose smart sensors, which combine a sensor and a microcontroller, make it possible to couple the power of the IoT with wireless technology for healthcare by accurately monitoring, measuring and analyzing a variety of health status indicators.

Unfortunately, the tocodynamometer is recognized to be lacking in accuracy and in comfort. Many other factors affect the pressure measurement, such as the amount of fat, instrument placement and uterine wall pressure; body movements, gastric activity and other non-labor-induced stresses on the device can be misinterpreted as labor contractions.

As a result, the Tocodynamometer crudely and indirectly measures uterine contractions and therefore cannot identify true labor contractions. The devices themselves are also uncomfortable and inconvenient for the patient. They are not suitable for ambulatory uses and have not proven to be very effective for home uterine monitoring. Cost, instrument reliability and patient mobility are also issues.

The SureCALL Wireless Remote model couples wirelessly to a cellular phone within range, where a customized application transmits patient recordings to a clinic or monitoring service.


Wireless sensor nodes are typically very small electronic devices, equipped with a limited power source.

An embedded system is a part of a product that has a specific task to perform, or it has a dedicated function. These systems are not directly interacted with or controlled by end users. In any embedded application, the hardware given to a system is unique and the software has to be specially written to make optimum use of the available resources.

In this embedded application, contraction signals in the uterus are detected by the sensor node. These analog signals are converted into digital signals by the ADC converter, and passed on to the microcontroller. The digital signal is checked for its validity within the microcontroller. Validity includes checking of the specific threshold for contractions.

2. This device is developed to detect a woman's likelihood of delivering a premature baby. CervoCheck is a small ring-like structure embedded with sensors that pick up electrical signals associated with uterine contractions. The ring is designed so that it is easy to insert in a woman's vaginal canal at a physician's office or hospital.

Once the signal is validated, the contraction processing is done, and if this processing is successful the microcontroller activates the GSM modem. The GSM modem is disabled until a confirmed contraction signal is processed. The GSM radio is kept disabled by design so that potentially harmful radio signals are not kept on, for the safety of the foetus.

The babies born before 37 weeks gestation are considered to be preterm while the normal gestation period is 40 weeks. By detecting preterm contractions with greater accuracy, this system could allow doctors to take steps at an early stage to prevent preterm births.

This new system picks up early signs that a woman is going into labor too soon. This device has only been tested on animals at this point. This device can prolong the pregnancy for almost six weeks.

The ring is made up of medical grade biocompatible silicone elastomer. Sensors are embedded within the ring and are designed to pick up electrical signals that are associated with uterine contractions. These sensors detect signals directly from the places in the body where they originate, and not through the abdominal wall.

The GPS module provides the present position as latitude and longitude. The present position's address is obtained from the Google map data stored in the external memory.

The microcontroller selects the current address. This current address is processed by the microcontroller for the SMS constructor module. The default message and the location are combined and sent to the GSM modem.

The GSM modem is interfaced with the microcontroller for SMS communication with cellphones. The default message and the current location are combined and sent to the GSM modem. The numbers stored in the GSM SIM are fetched by the microcontroller. Once the emergency numbers are fetched, an SMS is sent to each of these emergency numbers.
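A minimal sketch of the flow described above: threshold-based validation of the digitized contraction signal, then composing an SMS with the stored location. The threshold value, helper names and sample data are illustrative assumptions, not the authors' firmware; the AT commands shown (AT+CMGF, AT+CMGS) are the standard GSM text-mode commands.

import io
import random

CONTRACTION_THRESHOLD = 520   # illustrative ADC threshold for a confirmed contraction

def read_adc_sample():
    # Stand-in for reading one digitized sample from the sensor node's 10-bit ADC.
    return random.randint(0, 1023)

def send_sms(modem, number, text):
    # Standard GSM-modem AT-command sequence for sending one text message.
    modem.write(b"AT+CMGF=1\r")                         # select text mode
    modem.write(('AT+CMGS="%s"\r' % number).encode())
    modem.write(text.encode() + b"\x1a")                # Ctrl+Z terminates the message body

def monitor_once(modem, emergency_numbers, current_address):
    sample = read_adc_sample()
    # Validity check: only a sample above the contraction threshold activates the GSM modem.
    if sample >= CONTRACTION_THRESHOLD:
        message = "Uterine contractions detected. Location: " + current_address
        for number in emergency_numbers:
            send_sms(modem, number, message)

# Demo with an in-memory buffer standing in for the serial link to the GSM modem.
monitor_once(io.BytesIO(), ["+910000000000"], "12.9716 N, 77.5946 E")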

IV. PROPOSED DESIGN

To find signs of preterm labor, physicians have long relied on a tocodynamometer, but this device is not effective or accurate at detecting preterm labor very early in a pregnancy.

III. EMBEDDED APPLICATION

The wireless sensor nodes containing the sensors for sensing the muscle contractions are inserted into the vaginal canal and can detect electrical signals associated with uterine wall contractions, a sign that labor has begun.


FUTURE ENHANCEMENT

The implanted wireless sensor networks can be enhanced to detect the heart rate of the foetus and the pulse rate of the mother. This information can be transmitted to the cellphones and the healthcare centers.

REFERENCES

i. CervoCheck, developed by Johns Hopkins graduates Karin Hwang, Chris Courville, Deepika Sagaram and Rose Huang.

ii. Hunhu Healthcare, "Raising the bar in fall detection technology."

iii. M. Al-Jemeli and F. A. Hussin, "An energy efficient cross-layer network operation model for IEEE 802.15.4-based mobile wireless sensor networks."

iv. H. Furtado and R. Trobec, "Applications of wireless sensors in medicine," MIPRO 2011, Proceedings of the 34th International Convention.

v. Stanislava Stanković, "Medical applications based on wireless sensor networks," 2005.

CONCLUSION

Sensor-based networks coupled with the IoT are the latest advancement in technology and are being explored to their fullest extent at present. Medical applications based on the Internet of Things are still research projects with good potential for utilization. A great number of medical scenarios are being covered by these applications, which opens a wide spectrum of benefits for medical practitioners. The solution proposed for this problem can be a game changer in achieving high success rates in timely and safe deliveries.

It can play a vital role in mitigating risks and overcoming the time lapse during an emergency.


Predicting Future Resources for Dynamic Resource Management using Virtual Machines in Cloud Environment

Vidya Myageri (1), Mrs. Preethi S. (2)
(1) Dept. of Computer Science & Engineering, (2) Dept. of Information Technology
Cambridge Institute of Technology, K.R. Puram, Bangalore, India
Email: gmvidya_viddu@yahoo.com, preethi.srinivas2002@gmil.com

ABSTRACT: Cloud computing has become an optimal solution for business customers to maintain and promote their business needs to clients via the internet. Nowadays cloud computing allows business customers to scale their resource usage up and down based on need. In order to achieve resource multiplexing in cloud computing, recent research has introduced dynamic resource allocation through virtual machines. Existing dynamic approaches follow unevenness procedures to allocate the available resources based on the current workload of the systems. An unexpected demand for a huge amount of resources in the future may cause allocation failure or a system hang problem. In this paper we present a new systematic approach to predict the future resource demands of the cloud from past usage. This approach uses a resource prediction algorithm to estimate future needs and avoid allocation failures in cloud resource management, and a skewness algorithm to determine the unevenness in the multidimensional resource utilization of a server.

Keywords: Dynamic resource allocation, Cloud computing, Resource Prediction Algorithm, Virtual machine migration, Load balancing, Skewness, Green computing, Hot-spot migration, Cold-spot migration.

INTRODUCTION

Cloud computing is a fast-growing technology that is currently being widely studied [1]. It has moved computing and data away from desktop and portable PCs into large data centers [2]. It has the capability to harness the power of the Internet and wide area networks (WANs) to use resources that are available remotely, thereby providing cost-effective solutions to most real-life requirements [3][4]. A majority of business customers are interested in cloud computing and have started migrating their applications to the cloud environment to promote their business operations to end clients with low investment and high availability. Due to this increased adoption, Resource Management in Cloud (RMC) has become an important research aspect in this area.

Earlier approaches [5][6] used an evenness procedure in resource distribution to allocate the available resources among the running applications. This approach may lead to resource overflow due to allocating more resources than required, and to resource underflow due to allocating fewer resources than required. The resource needs of a running application change from time to time depending on the number of live clients.

In order to overcome the resource overflow and underflow problems of evenness distribution, recent research introduced dynamic resource management [7, 8] with virtualized systems. These systems consider the available resources at the server and allocate them to applications based on application workload requirements. To achieve this, dynamic management systems follow unevenness algorithms and on-demand resource allocation strategies. This approach manages resources dynamically in an efficient manner with virtualization of cloud systems. Dynamic mapping of virtual requirements to physical resources also helps to avoid SLA violations [9] in the cloud environment. Sometimes an unexpected demand for a huge amount of resources in the future may cause allocation failure or a system hang problem.

In order to mitigate these problems, in this paper we present a new systematic approach to predict the future resource demands of the cloud from past usage. This approach analyzes the resource allocation logs of the virtual server and the SLA agreements and follows a demand prediction algorithm to estimate future needs and avoid allocation failures in cloud resource management. Our approach uses present and past statistics to predict future requirements in an efficient manner. To do this we propose two different methodologies in this paper: (i) hours-bounded and (ii) days-bounded resource prediction techniques. By integrating the results of these methodologies our approach assesses the reliable resource requirements of the future.

Experimental results support that our strategy is more scalable and reliable than existing approaches.

The rest of the paper is organized as follows: Section 2 discusses related work, followed by the proposed system design, which consists of the load-balancing cloud architecture, the load prediction algorithm and the skewness algorithm, and finally the results and future scope.

RELATED WORK

Cloud computing is an emerging technology that is rapidly consolidating itself as the next big step in the development and deployment of an increasing number of distributed applications. Cloud computing has nowadays become quite popular among a community of cloud users by offering a variety of resources. Cloud computing platforms, such as those provided by Microsoft, Amazon, Google, IBM, and Hewlett-Packard, let developers deploy applications across computers hosted by a central organization. These applications can access a large network of computing resources that are deployed and managed by a cloud computing provider.

In cloud platforms, resource allocation (or load balancing) takes place at two levels. First, when an application is uploaded to the cloud, the load balancer assigns the requested instances to physical computers, attempting to balance the computational load of multiple applications across physical computers. Second, when an application receives multiple incoming requests, these requests should each be assigned to a specific application instance to balance the computational load across a set of instances of the same application. For example, Amazon EC2 [15] uses elastic load balancing (ELB) to control how incoming requests are handled. Application designers can direct requests to instances in specific availability zones, to specific instances, or to instances demonstrating the shortest response times.

Elnozahy et al. [11] have investigated the problem of power-efficient resource management in a single web-application environment with fixed SLAs (response time) and load balancing handled by the application. As in [13], two power-saving techniques are applied: switching the power of computing nodes on/off and Dynamic Voltage and Frequency Scaling (DVFS). The main idea of the policy is to estimate the total CPU frequency required to provide the necessary response time and determine the optimal number of nodes. However, the transition time for switching the power of a node is not considered. Only a single application is assumed to run in the system and, as in [10], the load balancing is supposed to be handled by an external system. The algorithm is centralized, which creates a single point of failure and reduces scalability. Despite the variable nature of the workload, unlike [11], the resource usage data are not approximated, which results in potentially inefficient decisions due to fluctuations. Nathuji and Schwan [14] have studied power management techniques in the context of virtualized data centers, which had not been done before.

Besides hardware scaling and VM consolidation, the authors have introduced and applied a new power management technique called "soft resource scaling". The idea is to emulate hardware scaling by providing less resource time for a VM using the Virtual Machine Monitor's (VMM) scheduling capability. The authors found that a combination of "hard" and "soft" scaling may provide higher power savings due to the limited number of hardware scaling states. The authors have proposed an architecture where the resource management is divided into local and global policies. At the local level the system leverages the guest OS's power management strategies. However, such management may be inefficient, as the guest OS may be legacy or power-unaware.

Resources may be provisioned with various virtualization techniques. This may ensure higher throughput and usage than existing cloud resource services. Future work is required on evolutionary techniques that will further result in better resource allocation, leading to improved resource utilization. The existing resource allocation strategies have the following limitations:

a) Since users rent resources from remote servers, they do not have control over their resources.

b) A migration problem occurs when users want to switch to another provider for better storage of their data; it is not easy to transfer huge amounts of data from one provider to another.

c) Deeper knowledge is required for allocating and managing resources in the cloud, since all knowledge about the working of the cloud mainly depends upon the cloud service provider.

Hence the existing systems have limitations such as migration of resources, overloading at the server, and migrating only the working set of an idle VM. To overcome these limitations, the following system is used.

1. SYSTEM DESIGN

A. System architecture

The proposed system presents the design and implementation of an automated resource management system that achieves a good balance. The proposed system makes the following three contributions:
a) It develops a resource allocation system that can avoid overload in the system effectively while minimizing the number of servers used.
b) It introduces the concept of "skewness" to measure the uneven utilization of a server. By minimizing skewness, we can improve the overall utilization of servers in the face of multidimensional resource constraints.
c) It designs a load prediction algorithm that can capture the future resource usage of applications accurately without looking inside the VMs. The algorithm can capture the rising trend of resource usage patterns and helps reduce placement churn significantly.

Fig 1: System architecture

Fig. 1 represents the architecture of dynamic resource allocation for the cloud computing environment. It consists of N servers, each running two virtual machines (VMs), which are connected to the VM Scheduler; the scheduler is connected to the internet to distribute the resources dynamically to the clients, who access the resources through the internet. A virtual machine (VM) is a software implementation of a computing environment in which an operating system or program can be installed and run. The VM Scheduler is invoked periodically and receives the resource demand history of the VMs, the capacity and load history of the servers, and the current layout of VMs on servers.

This paper presents a skewness algorithm, which uses green computing techniques, and a load prediction algorithm, which uses past resource usage to predict the resources for the present working environment.


B. Skewness Algorithm

The paper introduces the concept of skewness to quantify the unevenness in the utilization of multiple resources on a server. By minimizing the skewness, we can combine different types of workloads nicely and improve the overall utilization of server resources.

Let n be the number of PMs and m be the number of VMs in the system, respectively. The number of resources (CPU, memory, I/O, network, etc.) that should be considered is usually a small constant, so the calculation of the skewness and temperature metrics for a single server takes a constant amount of time. During load prediction, we need to apply the FUSD (fast up, slow down) algorithm to each VM and each PM, so the time complexity is O(n + m). We define the resource skewness of a server p as

skewness(p) = sqrt( sum_i ( r_i / r_avg - 1 )^2 ),

where r_i is the utilization of the i-th resource of server p and r_avg is the average utilization of all resources for server p.

In practice, we do not consider all types of resources because not all of them are performance-critical, so in the above calculation we only need to consider bottleneck resources. By minimizing the skewness in this way we can improve the overall resource utilization of the server by combining different types of workloads.
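A minimal sketch of the skewness computation defined above; the utilization numbers are illustrative, and only the resources treated as bottlenecks would be passed in:

import math

def skewness(utilizations):
    # utilizations: per-resource utilization of one server, e.g. {"cpu": 0.8, "mem": 0.3, "net": 0.4}
    values = list(utilizations.values())
    avg = sum(values) / len(values)
    return math.sqrt(sum((u / avg - 1.0) ** 2 for u in values))

# A server with balanced utilization has lower skewness than an unbalanced one.
print(skewness({"cpu": 0.5, "mem": 0.5, "net": 0.5}))   # 0.0
print(skewness({"cpu": 0.9, "mem": 0.2, "net": 0.4}))   # about 1.02, CPU is the bottleneck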

C. Hot spot migration

The goal is to reduce the temperature of the hot spots to at or below the warm threshold. The hot-spot nodes are sorted (by quicksort) in descending order of temperature, and the VM with the highest temperature is migrated away first. The destination is decided based on the least-loaded (coldest) node. After every migration the status of each node is updated. This procedure continues until all hot spots are eliminated.

The VM whose removal reduces the skewness of the identified hot-spot server the most is considered first. For each VM in the list, we check whether a destination server can be found to accommodate it; that server must not become a hot spot after accepting this VM. Among all such servers, we select the one whose skewness can be reduced the most by accepting this VM. Note that this reduction can be negative, which means we select the server whose skewness increases the least. If a destination server is found, the VM is migrated to that server and the predicted load of the related servers is updated. Otherwise, we move on to the next VM in the list and try to find a destination server for it. As long as a destination server is found for any of its VMs, this run of the algorithm is considered a success and we move on to the next hot spot. Note that each run of the algorithm migrates away at most one VM from the overloaded server. This does not necessarily eliminate the hot spot, but at least reduces its temperature. If it remains a hot spot in the next decision run, the algorithm will repeat this process.

It is possible to design the algorithm so that it can migrate away multiple VMs during each run, but this can add more load on the related servers during a period when they are already overloaded. It was decided to use this more conservative approach and leave the system some time to react before initiating additional migrations. Two scenarios are considered in hot-spot mitigation. In the first scenario, the VMs running in identified hot spots are migrated to warm-spot servers that will not become hot by accommodating the VMs. In the second scenario, if sufficient warm spots are not available to accommodate the VMs in the hot spot, some load is also migrated to the nodes in the cold spot to mitigate the hot spots.
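A minimal sketch of this hot-spot mitigation loop. It simplifies the paper's scheme in two stated ways: per-server utilization stands in for "temperature", and the destination is the least-utilized acceptable server rather than the one with the largest skewness reduction; thresholds and the dictionary layout are illustrative assumptions.

def utilization(server):
    return server["load"] / server["capacity"]

def pick_destination(servers, source_name, vm_load, warm_threshold):
    # Accept only servers that stay at or below the warm threshold after taking the VM,
    # and among those pick the least-utilized one (simplification of the skewness rule).
    candidates = [name for name, s in servers.items()
                  if name != source_name
                  and (s["load"] + vm_load) / s["capacity"] <= warm_threshold]
    if not candidates:
        return None
    return min(candidates, key=lambda name: utilization(servers[name]))

def mitigate_hot_spots(servers, hot_threshold=0.9, warm_threshold=0.75):
    # servers: {name: {"load": float, "capacity": float, "vms": {vm_name: vm_load}}}
    migrations = []
    hot = sorted((name for name, s in servers.items() if utilization(s) > hot_threshold),
                 key=lambda name: utilization(servers[name]), reverse=True)
    for name in hot:
        source = servers[name]
        # Move away at most one VM per run, starting with the largest one.
        for vm, vm_load in sorted(source["vms"].items(), key=lambda kv: kv[1], reverse=True):
            target = pick_destination(servers, name, vm_load, warm_threshold)
            if target is not None:
                source["vms"].pop(vm)
                source["load"] -= vm_load
                servers[target]["vms"][vm] = vm_load
                servers[target]["load"] += vm_load
                migrations.append((vm, name, target))
                break
    return migrations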

D. Predicting future resource need

Recent cloud architectures face a resource management problem due to unexpected requirements for huge amounts of resources (CPU time, memory, network, etc.) arriving asynchronously. Current dynamic resource allocation methodologies have the capability to map virtual system resources to physical systems dynamically depending on workload. This process adjusts the available resources with the help of hot-spot and cold-spot migration among the physical machines.

Dynamic resource allocation may fail under some circumstances, when there is suddenly an unexpected huge resource requirement for a physical machine. To address the above problem, in this paper we introduce the "Predicting Future Resource Requirement Framework (PFRRF)" to assess future resource needs. This framework is an extension to the dynamic resource allocation and management architecture; PFRRF considers the log files of the resource allocation system to analyze and estimate future requirements.

To achieve this, our framework uses the Resource Prediction Algorithm (RPA), which takes structured log-file content as training data and uses time-bounded methodologies for the decision-making process in resource assessment. Semi-structured log-file data is transformed into structured log files containing the periodic resource allocation charts (PRAC) for every physical machine running in the cloud environment. A PRAC contains the individual resource-level mapping for each physical machine and the hot-spot and cold-spot thresholds.

After that, the PRACs of every physical system are sorted by date and clustered individually at the physical-system level.

These individual clusters are given as input data to the RPA to predict the future workload of every physical machine. After the prediction results for the physical machines are computed, resource migration is performed based on the resource overflow and underflow results. This migration helps to determine the additional resource requirements for the future based on current and past log data. Finally, the sum of the additional requirements at every physical-machine level gives the final requirement, which is to be added to the cloud resource pool to avoid future underflows and hanging problems.

4. RESOURCE PREDICTION ALGORITHM

Input: present and past resource usage chart
Output: future resource prediction chart (FRPC)
PM: physical machine; RUC: resource usage chart

foreach pm in cloud do
    RUC ← getUsageChart(pm)
    SRUC ← doUsageAnalysis(RUC)
    PRUC ← predictFutureRequirements(SRUC)
    if (PRUC <= THRESHOLD) setUnderFlowFlag()
    else setOverFlowFlag()
    wishList.addToWishlist(PRUC)
end foreach
FRPC ← doMigration(wishList)
return FRPC
End

First, the RPA considers every physical machine in the cloud and generates the resource usage chart (RUC) for every machine based on the cloud resource consumption log file. Based on the RUC data, the RPA creates the SRUC and predicts the future requirements (PRUC). If the predicted requirements are less than the threshold, the resource underflow flag is set; otherwise the resource overflow flag is set, and the entry is added to the wish list. The migration method [12] takes the wish list as input and performs mapping to return the FRPC.
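A runnable sketch of this prediction step under simplifying assumptions: a plain moving-average predictor stands in for the paper's prediction method, and the threshold, data and helper names are illustrative rather than the authors' implementation.

def predict_future_requirement(usage_history, window=3):
    # Simple moving-average predictor over the most recent entries of the usage chart.
    recent = usage_history[-window:]
    return sum(recent) / len(recent)

def build_frpc(cloud_usage_charts, threshold):
    # cloud_usage_charts: {pm_name: [historical utilization values from the log files]}
    wish_list = []
    for pm, ruc in cloud_usage_charts.items():
        pruc = predict_future_requirement(ruc)
        flag = "underflow" if pruc <= threshold else "overflow"
        wish_list.append({"pm": pm, "predicted": pruc, "flag": flag})
    return wish_list   # the migration step would map this wish list onto physical resources

print(build_frpc({"pm1": [0.4, 0.5, 0.7, 0.8], "pm2": [0.3, 0.2, 0.2, 0.1]}, threshold=0.5))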

Fig 2: Predicted vs. consumed memory utilization representation

2. SIMULATION AND RESULTS

In this section we discuss the performance of our framework and compare it with other approaches to demonstrate its efficiency. Our experiments mainly concentrate on the prediction of future resource requirements based on present and past usage details from server log files. We observed three important resources: memory, CPU and network.

In order to perform these experiments we selected a Unix-based private cloud hosting center that manages more than 20 applications of various technologies. On a pilot basis we took 8 applications to adopt our framework separately. For the training data we used the last 12 months of resource allocation charts from the log files. This cloud center already follows its own resource allocation technology for all physical machines running in the hosting center. Along with checking allocation accuracy, this testing also considers the SLA violations for every physical machine.

RPA evaluation: Every physical machine of the cloud hosting center has 8 GB of RAM, a Core i7 (2nd Gen) processor and a 20 GB hard disk. Apart from this, additional resources are available with the hosting center to allocate dynamically as per requirements. We deploy 8 virtual machines mapped to the 8 physical machines running the applications. Every five minutes the resource allocation is updated to adjust the resources among the physical machines, and the same is written to the log files.

We gave the last one year's resource allocation chart to the experimental cloud center. After analyzing this data, our framework predicted the next month's resource requirements hour-wise and day-wise, as applicable, at the physical-machine level. We then compared the resources the framework predicted for the future with the resources consumed in the specific hours and days. These experimental results are shown in Table 1 below, and the accuracy of the prediction is also represented with graphs.


Fig 3: Predicted vs. consumed CPU utilization representation

Fig 4: Predicted vs. consumed Network utilization representation

3. CONCLUSION

In this paper, we presented the Predicting Future Resource Requirement Framework (PFRRF) to assess future resource needs. This framework is an extension to the dynamic resource allocation and management architecture. The review shows that dynamic resource allocation is a growing need of cloud providers serving more users with fewer systems. Based on the changing demand, the proposed system multiplexes virtual to physical resources. The system uses the skewness metric to mix VMs with different resource requirements.


The proposed algorithm achieves overload avoidance by predicting the future resource needs based on present and past allocation data from resource log files, and by using green computing technology we can turn off the idle servers. The skewness algorithm supports load balancing as well as green computing.

We use the Resource Prediction Algorithm (RPA) to assess the future need effectively. The experiments demonstrate the prediction accuracy and the green computing benefits for memory, CPU time and network resources in the physical machines of the cloud.

REFERENCES

i. Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia, "Above the clouds: A Berkeley view of cloud computing," UCB/EECS-2009-28.

ii. R. Buyya, R. Ranjan, and R. N. Calheiros, "Modeling and simulation of scalable cloud computing environments and the CloudSim toolkit: Challenges and opportunities," Proc. of the 7th High Performance Computing and Simulation Conference (HPCS 09), IEEE Computer Society, June 2009.

iii. Nidhi Jain Kansal, "Cloud load balancing techniques: A step towards green computing," IJCSI International Journal of Computer Science Issues, January 2012, vol. 9, issue 1, no. 1, pp. 238-246, ISSN (Online): 1694-0814.

iv. R. P. Mahowald, "Worldwide software as a service 2010-2014 forecast: Software will never be same," IDC, 2010.

v. R. Nathuji and K. Schwan, "VirtualPower: coordinated power management in virtualized enterprise systems," in Proc. of the ACM SIGOPS Symposium on Operating Systems Principles (SOSP'07), 2007.

vi. J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle, "Managing energy and server resources in hosting centers," ACM, New York, NY, USA, 2001, pp. 103-116.

vii. Atsuo Inomata, Taiki Morikawa, Minoru Ikebe, and Sk. Md. Mizanur Rahman, "Proposal and evaluation of dynamic resource allocation method based on the load of VMs on IaaS," IEEE, 2010, 978-1-4244-8704-2/11.

viii. Fetahi Wuhib and Rolf Stadler, "Distributed monitoring and resource management for large cloud environments," IEEE, 2011, pp. 970-975.

ix. Hadi Goudarzi and Massoud Pedram, "Multi-dimensional SLA-based resource allocation for multi-tier cloud computing systems," IEEE 4th International Conference on Cloud Computing, 2011, pp. 324-331.

x. E. Pinheiro, R. Bianchini, E. V. Carrera, and T. Heath, "Load balancing and unbalancing for power and performance in cluster-based systems," in Proceedings of the Workshop on Compilers and Operating Systems for Low Power, 2001, pp. 182-195.

xi. E. Elnozahy, M. Kistler, and R. Rajamony, "Energy-efficient server clusters," Power-Aware Computer Systems, 2003, pp. 179-197.

xii. J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle, "Managing energy and server resources in hosting centers," in Proceedings of the 18th ACM Symposium on Operating Systems Principles, ACM, New York, NY, USA, 2001, pp. 103-116.

xiii. R. Nathuji and K. Schwan, "VirtualPower: coordinated power management in virtualized enterprise systems," ACM SIGOPS Operating Systems Review, vol. 41, no. 6, 2007, pp. 265-278.

xiv. "Amazon Elastic Compute Cloud (Amazon EC2)," http://aws.amazon.com/ec2/.


Pixel Based Approach of Satellite Image Classification

Rohith. K.M., Dr.D.R.Shashi Kumar, VenuGopal A.S.

Dept. of CSE, CiTech, Bengaluru -036 rohitgowda40@gmail.com, venudon02@gmail.com

Abstract--In this project a pixel-based approach is used for urban land cover classification from high-resolution satellite images using K-means clustering and ISODATA clustering. Pixel-based image analysis, i.e. segmentation of the image by clustering pixels into homogeneous objects, subsequent classification or labeling of the pixels, and modeling based on the characteristics of the pixels, is done using a MATLAB GUI model. When applied to a satellite image, the clustering approach involves two requirements. First, each group or cluster is homogeneous, i.e. examples that belong to the same group are similar to each other. Second, each group or cluster should be different from the other clusters, i.e. examples that belong to one cluster should be different from the examples of other clusters. The algorithm was implemented in a MATLAB GUI model and was tested on remotely sensed images of different sensors, resolutions and complexity levels.

Keywords: pixel based approach, high resolution satellite image.

INTRODUCTION

Clustering algorithms for remote sensing images are usually divided into two categories: pixel-based and object-based approaches. Using pixel-based clustering algorithms for high-resolution remote sensing images, one often finds the "pepper and salt" effect in the results because of the lack of spatial information among pixels. In contrast, object-based clustering algorithms are based not on the spectral features of individual pixels but on image objects, i.e. segments. Consequently, in terms of semantics, the quality of the image objects is heavily dependent on the segmentation algorithm.

A novel clustering algorithm has been proposed to detect geo-objects from high-spatial-resolution remote sensing images using both neighborhood spatial information and the probabilistic latent semantic analysis model (NSPLSA). That algorithm is based not on pixels or segments but on densely overlapped sub-images, i.e. rectangular regions of prefixed size. The probabilistic latent semantic analysis model (PLSA), also called the aspect model, is employed to model all sub-images. Every pixel in each sub-image is allocated a topic label, and the cluster label of every pixel in the large satellite image is derived from the topic labels of the multiple sub-images that cover the pixel. Unsupervised clustering is a fundamental tool in image processing for geoscience and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori

information about the data is available. The problem of clustering points in multidimensional space can be posed formally as one of a number of well-known optimization problems, such as the Euclidean k-median problem, in which the objective is to minimize the sum of distances to the nearest center, the Euclidean k-center problem, in which the objective is to minimize the maximum distance, and the k-means problem, in which the objective is to minimize the sum of squared distances.

Efficient solutions are known to exist only in special cases such as the planar 2-center problem. There are no efficient exact solutions known to any of these problems for general k, and some formulations are known to be NP-hard. Efficient approximation algorithms have been developed in some cases.

These include constant-factor approximations for the k-center problem, the k-median problem and the k-means problem. There are also approximation algorithms for the k-median and k-means problems, including improvements based on coresets. Work on the k-center algorithm for moving data points, as well as a linear-time implementation of a 2-factor approximation of the k-center problem, has also been introduced. In spite of progress on theoretical bounds, approximation algorithms for these clustering problems are still not suitable for practical implementation in multidimensional spaces when k is not a small constant. This is due to very fast-growing dependencies of the asymptotic running times on the dimension and/or on k. In practice, it is common to use heuristic approaches, which seek to find a reasonably good clustering but do not provide guarantees on the quality of the results. These include randomized approaches, such as CLARA and CLARANS, and methods based on neural networks. One of the most popular and widely used clustering heuristics in remote sensing is ISODATA. A set of n data points in d-dimensional space is given along with an integer k indicating the initial number of clusters and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space.

Although there is no specific optimization criterion, the algorithm is similar in spirit to the well-known k-means clustering method, in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant advantage of ISODATA over k-means is that the user need only provide an initial estimate of the number of clusters; based on various heuristics the algorithm may alter the number of clusters by either deleting small clusters, merging nearby clusters, or splitting large clusters. The algorithm will be described in the next section. As currently implemented, ISODATA can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. Our objective in this paper is not to provide a new or better clustering algorithm, but rather to show how computational geometry methods can be applied to produce a faster implementation of ISODATA clustering. There are a number of minor variations of ISODATA that appear in the literature. These

variations involve issues such as termination conditions, but they are equivalent in terms of their overall structure. We focus on a widely used version, called ISOCLUS, which will be presented in the next section. The running times of ISODATA and ISOCLUS are dominated by the time needed to compute the nearest of the k cluster centers to each of the n points. This can be reduced to the problem of answering n nearest-neighbor queries over a set of size k, which naively would involve O(kn) time. To improve the running time, an obvious alternative would be to store the k centers in a spatial index such as a kd-tree. However, this is not the best approach, because k is typically much smaller than n, and the center points are constantly changing, requiring the tree to be constantly updated. Kanungo proposed a more efficient and practical approach by storing the points, rather than the cluster centers, in a kd-tree. The tree is then used to solve the reverse nearest-neighbor problem; that is, for each center, we compute the set of points for which this center is the closest. This method is called the filtering algorithm. We show how to modify this approach for ISOCLUS. The modifications are not trivial. First, in order to perform the sort of aggregate processing that the filtering algorithm employs, it was necessary to modify the way in which the ISOCLUS algorithm computes the degree of dispersion within each cluster. In order to further improve execution times, we have also introduced an approximate version of the filtering algorithm. A user-supplied approximation error bound greater than zero is provided to the algorithm, and each point is associated with a center whose distance from the point is not farther than (1 + the error bound) times the distance to its true nearest neighbor. This result may be of independent interest because it can be applied to k-means clustering as well. The running time of the filtering algorithm is a subtle function of the structure of the clusters and centers, and so rather than presenting a worst-case asymptotic analysis, we present an empirical analysis of its efficiency based on both synthetically generated data sets and actual data sets from a common application in remote sensing and geostatistics.

As the experiments show, depending on the various input parameters (that is, dimension, data size, number of centers, etc.), the algorithm presented runs faster than a straightforward implementation of ISOCLUS by factors ranging from 1.3 to over 50. In particular, the improvements are very good for typical applications in geostatistics, where the data size n and the number of centers k are large and the dimension d is relatively small. Thus, we feel that this algorithm can play an important role in the analysis of geostatistical data and in other applications of data clustering. The remainder of the paper is organized as follows. In the next section we describe a variant of ISODATA, called ISOCLUS, whose modification is the focus of this paper. In Section 3 we provide background on basic tools, such as the kd-tree data structure and the filtering algorithm, that will be needed in our efficient implementation of ISOCLUS.

K-MEANS ALGORITHM

This algorithm takes as input a predefined number of clusters, i.e., the K from its name; "means" stands for an average, the average location of all the members of a particular cluster.

When dealing with clustering techniques, one has to adopt the notion of a higher-dimensional space, a space in which the orthogonal dimensions are the attributes. The value of each attribute of an example represents the distance of the example from the origin along that attribute axis. Of course, in order to use this geometry efficiently, the values in the data set must all be numeric (categorical data must be transformed into numeric values) and should be normalized to allow fair computation of the overall distances in a multi-attribute space.
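
As a small illustration of this preprocessing step, the sketch below rescales every numeric attribute to the range [0, 1] (min-max scaling) so that no single attribute dominates the distance computation; the function name normalize_min_max is ours, and other scalings such as z-scores would serve equally well.

```python
import numpy as np

def normalize_min_max(X):
    """Rescale each column (attribute) of X to [0, 1] so that distances
    in the multi-attribute space are not dominated by one attribute."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0        # constant attributes: avoid division by zero
    return (X - lo) / span
```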

The k-means algorithm is a simple, iterative procedure in which a crucial concept is that of the "centroid". A centroid is an artificial point in the space of records that represents the average location of a particular cluster; the coordinates of this point are the averages of the attribute values of all examples that belong to the cluster. The steps of the k-means algorithm are given below.

(i) Randomly select k points (they can also be examples) to be the seeds for the centroids of the k clusters.

(ii) Assign each example to the centroid closest to it, forming in this way k exclusive clusters of examples.

(iii) Calculate new centroids of the clusters: for each cluster, average all attribute values of the examples belonging to it.

(iv) Check whether the cluster centroids have changed their coordinates. If yes, start again from step (ii); if not, cluster detection is finished and all examples have their cluster memberships defined.

Usually this iterative procedure of redefining centroids and reassigning the examples to clusters needs only a few iterations to converge; for a discussion on cluster detection, see the literature. A minimal sketch of the procedure is given below.
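
The sketch below is one possible NumPy rendering of steps (i)-(iv); the function name simple_kmeans and the choice of stopping test (the centroids no longer move) are ours.

```python
import numpy as np

def simple_kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means following steps (i)-(iv): seed, assign, re-average, repeat."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()   # step (i)
    for _ in range(max_iter):
        # step (ii): assign every example to its closest centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # step (iii): new centroids are the per-cluster attribute averages
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # step (iv): stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```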

To summarize, clustering techniques are used when there are natural groupings in a data set. Clusters should then represent groups of items that have a lot in common. Creating clusters prior to the application of some other data mining technique may reduce the complexity of the problem by dividing the space of the data set.

These space partitions can then be mined separately, and such a two-step procedure may give improved results compared with data mining that does not use clustering.

RESULTS


Fig. 1. Classification results for a pixel-based image: the first image shows the input image, the second shows the k-means clustering after iteration 1, the third shows the k-means clustering after iteration 2, and finally the classification result of ISODATA clustering is displayed.

k-means clustering, iterative self-organizing data analysis technique (ISODATA) clustering and pixel-oriented classification (fuzzy based) are examined. On the one hand, the qualitative evaluation considers the semantics of the segmentation results; on the other hand, the results are also quantitatively evaluated in terms of purity and entropy.
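
For reference, the sketch below shows one standard way to compute purity and entropy from cluster labels and ground-truth class labels; the function name purity_and_entropy is ours, and it assumes reference class labels are available for the evaluated pixels.

```python
import numpy as np

def purity_and_entropy(cluster_labels, class_labels):
    """Purity and cluster-weighted entropy of a clustering against reference classes."""
    cluster_labels = np.asarray(cluster_labels)
    class_labels = np.asarray(class_labels)
    n = len(cluster_labels)
    purity, entropy = 0.0, 0.0
    for c in np.unique(cluster_labels):
        members = class_labels[cluster_labels == c]
        counts = np.array([(members == t).sum() for t in np.unique(class_labels)],
                          dtype=float)
        purity += counts.max() / n                    # fraction in the dominant class
        p = counts[counts > 0] / counts.sum()         # class distribution inside the cluster
        entropy += (counts.sum() / n) * (-(p * np.log2(p)).sum())
    return purity, entropy   # higher purity and lower entropy indicate better clusters
```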

CONCLUSION

k-means clustering, iterative self-organizing data analysis technique (ISODATA) clustering and pixel-oriented classification (fuzzy based) were examined. The qualitative evaluation considered the semantics of the segmentation results, while the quantitative evaluation was carried out in terms of purity and entropy.

REFERENCES

i. Kan Xu, Wen Yang, Gang Liu, and Hong Sun, "Unsupervised Satellite Image Classification Using Markov Field Topic Model," 2013.

ii. A. Delong, A. Osokin, H. N. Isack, and Y. Boykov, "Fast approximate energy minimization with label costs," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2010, pp. 2173–2180.

iii. M. Liénou, H. Maître, and M. Datcu, "Semantic annotation of satellite images using latent Dirichlet allocation," IEEE Geosci. Remote Sens. Lett., vol. 7, no. 1, pp. 28–32, 2010.

iv. D. Larlus and F. Jurie, "Latent mixture vocabularies for object categorization and segmentation," Image Vis. Comput., vol. 27, no. 5, pp. 523–534, Apr. 2009.

v. D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, vol. 8. Cambridge, U.K.: Cambridge Univ., 2003, p. 12.

vi. N. Memarsadeghi, D. Mount, N. S. Netanyahu, J. Le Moigne, and M. de Berg, "A fast implementation of the ISODATA clustering algorithm," Int. J. Comput. Geom. Appl., vol. 17, no. 1, pp. 71–103, 2007.

vii. A. Rosenberg and J. Hirschberg, "V-measure: A conditional entropy based external cluster evaluation measure," in Proc. Joint Conf. EMNLP-CoNLL, 2007, pp. 410–420.

viii. W. Tang, H. Yi, and Y. Chen, "An object-oriented semantic clustering algorithm for high-resolution remote sensing images using the aspect model," IEEE Geosci. Remote Sens. Lett., vol. 8, no. 3, pp. 522–526, May 2011.

ix. J. Verbeek and B. Triggs, "Region classification with Markov field aspect models," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2007, pp. 1–8.

x. W. Yang, D. Dai, J. Wu, and C. He, "Weakly supervised polarimetric SAR image classification with multi-modal Markov aspect model," in Proc. ISPRS TC VII Symp. (Part B), 100 Years ISPRS - Advancing Remote Sensing Science, Vienna, Austria, Jul. 5–7, 2010, pp. 669–673.
