

International Journal of Application or Innovation in Engineering & Management (IJAIEM)


Volume 3, Issue 3, March 2014 ISSN 2319 - 4847

Peer-to-Peer Resource Detection by Random Walkers Using Rateless Codes Based on Interest Clusters



and R.Jayaraj



M.Tech Scholar, Computer Science & Engineering Department, SRM University, Chennai


Assistant Professor, Computer Science & Engineering Department, SRM University, Chennai


The Peer-to-Peer (P2P) network is a loosely coupled application for efficiently sharing resources in an overlay network. P2P networks are classified into two categories: structured networks, which follow a predefined structure, and unstructured networks, which are subject to churn. Unstructured networks have higher churn rates because peers may join or leave the network at any time. To enhance search performance in unstructured networks, our proposed method applies the random walk method within the subnets formed by interest-based clusters. The main objective of this search technique is to reduce the cost of searching for resources in large networks. Using the random walk principle and a rateless coding method, a continuous flow of control packets is exchanged among the nodes in the network. A rateless decoding mechanism is provided that can cope with asynchronous information updates in a churning network.

Keywords: Rateless codes, Interest clusters, Luby Transform codes, Random search strategy.


A Peer-to-Peer (P2P) network is a decentralized, distributed network in which each node acts as both a supplier and a consumer of resources. Overlays are used for indexing and peer discovery, and make the P2P system independent of the physical network topology. Based on how nodes are linked to each other within the overlay network, and how resources are indexed and located, networks can be classified as unstructured or structured. Unstructured P2P networks do not impose a particular structure on the overlay by design; instead, they are formed by nodes that connect to each other randomly. Query resolution is simplified because a peer can obtain the global details of all the peers within its interest-cluster subnet.

Users with the same interests are grouped into a cluster using the CAC-SPIRP method. A Content Abundant Cluster (CAC) consists of self-identified peers that organize themselves around a pool of objects frequently accessed in that community. To improve search quality, a method called Selectively Prefetching Indices from Responding Peers (SPIRP) is used, which provides a generic search method. In this technique, a query is first routed to the Content Abundant Cluster; if a solution to the query exists, the responding peer sends its index to the requesting peer. This avoids network overhead and congestion, and thus reduces both the query response time and the cost of searching in the P2P network. Communication among nodes is constrained by a maximum transfer unit set for packets.
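The SPIRP step can be illustrated with a minimal Python sketch. It assumes that peers are identified by name, that a peer's index is simply a set of object names, and that the CAC is an ordered list of peers; `Peer` and `spirp_query` are hypothetical names, not part of the original method. The point it shows is that once a responder's index has been prefetched, later queries for objects in that index cost no network messages.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Peer:
    """A peer holding a local index of object names (illustrative)."""
    name: str
    objects: set[str]
    # Indices prefetched from responding peers: object name -> holder name.
    prefetched: dict[str, str] = field(default_factory=dict)


def spirp_query(requester: Peer, obj: str, cac: list[Peer]) -> tuple[str | None, int]:
    """Resolve `obj`, returning (holder name, number of CAC peers contacted).

    A hit in the prefetched index costs nothing. Otherwise the query goes to
    the Content Abundant Cluster; the first peer holding `obj` responds with
    its whole index, which the requester caches for later queries.
    """
    if obj in requester.prefetched:
        return requester.prefetched[obj], 0
    contacted = 0
    for peer in cac:
        contacted += 1
        if obj in peer.objects:
            for o in peer.objects:  # selectively prefetch the responder's index
                requester.prefetched.setdefault(o, peer.name)
            return peer.name, contacted
    return None, contacted


cac = [Peer("p1", {"a.mp3"}), Peer("p2", {"b.mp3", "c.mp3"})]
me = Peer("me", set())
print(spirp_query(me, "b.mp3", cac))  # ('p2', 2)
print(spirp_query(me, "c.mp3", cac))  # ('p2', 0): served from the prefetched index
```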

Each node in the cluster requires a local view of the global information state, because the rendezvous points maintained in the system provide only an initial list of nodes to contact. Following the random walk principle, every node continuously transfers packets, in a controlled manner, to its neighboring nodes. Each node that receives a packet updates it with its local information using rateless coding and forwards it to a randomly chosen neighbor. In this way, any node can retrieve the global data.

Our contribution

The contribution of this paper is three-fold:

1. Our proposed model organizes peers into interest-based clusters, each maintained by a super node that manages many normal nodes; active nodes hold the cluster vector information of these clusters.

2. When a query is sent to the network, it first searches the neighbouring nodes within its cluster; if the resource is not found, it searches the active nodes' cluster vector information and directs the query to the matching cluster.

3. Random network coding is performed within the clusters: rateless codes combine packets with XOR operations and carry information to other nodes in the network. Since the information changes asynchronously, it is kept static for a given time period until all the updates from the nodes reach the destination nodes' message queues. This enables the decoding mechanism to cope with asynchronous updates at every node.


Previous works on data gathering and resource searching, which use various tools and techniques, face several problems.

Volume 3, Issue 3, March 2014 Page 122


In previous work, the gossip algorithm [6] gathers data using classical network coding, in which a linear combination of packets is sent to each node. The cost of coding and decoding is high, and the algorithm does not satisfy consistency requirements. It does not keep backlogs of update information at each node, so if a node crashes, its update information is lost. In addition, the gossip messages are redundant. In [7], the use of a feedback mechanism makes the process wait a long time to complete its task. The advantage of that system is that it uses structured multicast protocols, which explicitly build a spanning tree and send messages along its paths so that each message is received exactly once at each node. Unfortunately, when any node fails, the tree must be rebuilt.

[1] Gnutella is an example of an unstructured, decentralized P2P network that uses a blind search technique based on flooding. To search for the same file again after some time, a peer must send the query request again to all of its peers, so repeatedly searching for the same resource consumes considerable time.

[8] The Equation-Based Adaptive Search (EBAS) algorithm is a feedback-based algorithm that maintains popularity estimates of the resources in the network using feedback from previous queries and their results. Analyzing the feedback from each node takes considerable time and makes the process complex.

In our proposed system, random walkers carry rateless packets to each node within the cluster, and each node performs a simple XOR operation. The cost of encoding and decoding the information is low, because only XOR operations are needed rather than complex algorithms.

The search is made more efficient and simpler by dividing the network into subnets based on interest clusters; a query is routed directly to one of the clusters based on the cluster vector information.

Each node randomly sets a degree for the packet it generates, and the random walker carrying the packet travels around the cluster network. A time-to-live (TTL) is initialized for the packet; if the nodes still exist in the network, an acknowledgment is sent back stating that the random walker is still alive, the timer is reset, and the process starts again.
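A minimal Python sketch of how one random walker might build an LT-coded packet inside a cluster. It assumes the cluster is given as an adjacency list, that each node contributes one fixed-size data block, and that the walker XORs in the block of each distinct node it visits until it reaches its randomly chosen degree or its TTL (counted in hops) runs out. The acknowledgment and timer-reset exchange is not modeled, and `random_walk_packet` with its parameters is a hypothetical illustration.

```python
from __future__ import annotations

import random


def random_walk_packet(
    graph: dict[int, list[int]],
    blocks: dict[int, bytes],
    start: int,
    degree: int,
    ttl: int,
    rng: random.Random,
) -> tuple[set[int], bytes]:
    """Build one coded packet by walking the cluster from `start`.

    The walker XORs in the block of every distinct node it visits and stops
    once `degree` blocks are combined or `ttl` hops have been spent.
    Returns (ids of the combined blocks, coded payload).
    """
    payload = bytearray(len(blocks[start]))
    combined: set[int] = set()
    node, hops = start, 0
    while True:
        if node not in combined:
            combined.add(node)
            for i, b in enumerate(blocks[node]):
                payload[i] ^= b
            if len(combined) == degree:
                break
        if hops == ttl:
            break  # TTL expired before the target degree was reached
        node = rng.choice(graph[node])  # step to a random neighbour
        hops += 1
    return combined, bytes(payload)


# The originating node draws the degree at random, e.g.:
rng = random.Random(7)
cluster = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
data = {0: b"\x01", 1: b"\x02", 2: b"\x04"}
ids, coded = random_walk_packet(cluster, data, 0, rng.randint(1, 3), 10, rng)
```

Returning the set of combined ids alongside the payload mirrors the LT prefix described in Section 3.4, which a receiver needs in order to decode.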


In this paper, we model interest clusters based on users' interests; recognizing the users' common interest patterns makes the search process more efficient. The active node maintains the details of the super nodes that manage the clusters, and each super node maintains the details of its normal nodes. When a node in a cluster issues a query, the query is first checked within its own cluster; if the resource is not found, it is routed to other super nodes, and if it is still not found, it is finally routed to the active node, which searches the cluster feature vectors in the network. At start-up, all other peers are interested in retrieving the information and cooperate to obtain it. Every peer stores coded packets in its output buffer so that they can be sent to a randomly selected peer. A received packet should not be linearly dependent on previously received packets, because a dependent packet carries no new information and wastes time; peers therefore aim to send packets without such dependency. Each peer is allowed to combine and forward packets from its output buffer. Every peer in the network is characterized by its upload bandwidth ($B_u$) and download bandwidth ($B_d$), both expressed in packets per second for the given packet size.
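The requirement that a received packet not be linearly dependent on earlier packets can be checked with Gaussian elimination over GF(2). The sketch below assumes each packet carries its coding coefficients as an integer bitmask (bit j set when source block j was XORed in), which is an illustrative encoding rather than the paper's format. It also shows the conversion from a bandwidth in bit/s to the packets-per-second figure used for $B_u$ and $B_d$; `InnovationChecker` and `packets_per_second` are hypothetical names.

```python
class InnovationChecker:
    """Tracks received coefficient vectors over GF(2) as integer bitmasks."""

    def __init__(self) -> None:
        self.basis: dict[int, int] = {}  # leading bit position -> basis vector

    def is_innovative(self, coeffs: int) -> bool:
        """Return True, and keep the vector, if `coeffs` is linearly
        independent of every vector accepted so far."""
        v = coeffs
        while v:
            lead = v.bit_length() - 1
            if lead not in self.basis:
                self.basis[lead] = v
                return True
            v ^= self.basis[lead]  # eliminate the leading bit
        return False  # reduced to zero: dependent, carries no new information


def packets_per_second(bandwidth_bps: int, packet_size_bytes: int) -> int:
    """Whole packets per second a link of `bandwidth_bps` bit/s can carry."""
    return bandwidth_bps // (8 * packet_size_bytes)


checker = InnovationChecker()
checker.is_innovative(0b011)   # True
checker.is_innovative(0b110)   # True
checker.is_innovative(0b101)   # False: 0b011 XOR 0b110
packets_per_second(1_000_000, 1250)  # 100
```

Storing one basis vector per leading bit keeps each check to at most one XOR per coefficient position, matching the XOR-only cost argument made above.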

3.1 Construction of Interest Based Clusters

(1) When a new node needs to join the network, it joins an interest cluster based on a detection process. If the node is the first node of its interest, meaning that no such cluster existed previously, a new domain is created for that interest and the node makes itself the super node. The super node selects its nearest router, the active node, which acts as part of the backbone network, and stores its interest cluster's feature vector in that active node.

The cluster feature vector $c_i = (c_{i,1}, c_{i,2}, \dots, c_{i,f})$ is the average of the feature vectors of all the nodes in the cluster:

$$c_i = \frac{1}{k} \sum_{j=1}^{k} w_j \qquad (1)$$

where $k$ is the total number of nodes in the cluster and $w_j$ is the feature vector of node $j$.

(2) Because of churn, peers may leave and rejoin the network at any time. If a node that previously left the cluster needs to rejoin the network, it is not the first node of its cluster, since the cluster already exists. It can therefore find its active node from the cluster feature vector information it holds, and sends a join application to that active node.

(3) Before admitting the node, the active node calculates the similarity between each cluster's feature vector and the feature vector of the new node. A threshold is calculated for each incoming node, and the node is added to a cluster only if the similarity is greater than that threshold; the new node simply selects the cluster with the largest similarity. Once accepted by an active node, the new node contacts the super node of that cluster to complete its registration and calculates its similarity to the other nodes in the cluster. Finally, the newly added node selects the first N nodes, ranked by similarity, as its neighbors from the list of cluster members provided by the super node.

(4) Each active node has a limit on the number of clusters it can hold. If the number of clusters in the current active node exceeds this upper limit, the new node selects the nearest active node to the current one in the backbone network and joins there. If the new node's cluster vector information is not present in any active node in the network, the new node is elected as the super node and a new interest-based cluster is created, which subsequent nodes join according to their specific interests.

(5) If the new node cannot find a suitable cluster in the current active node, it checks the tables of the neighbor active nodes for matching cluster vector information, searching the neighbor active node tables in turn until a match is found. If the vector information is not found in any active node table, the node is selected as the super node and a new domain is created. The network topology follows a mesh structure, as depicted in Figure 1.
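Steps (1) to (5) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: feature vectors are plain float lists, cosine similarity stands in for the similarity measure (the text does not name one), a single fixed threshold is used with the "greater than" test of step (3), neighbor active nodes are searched breadth-first, and a new cluster is placed on the first visited active node that is below its cluster limit. All names (`ActiveNode`, `join`, `pick_neighbors`) are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def centroid(vectors: list[list[float]]) -> list[float]:
    """Cluster feature vector c_i: element-wise mean of the k member vectors (Eq. 1)."""
    k = len(vectors)
    return [sum(col) / k for col in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Similarity measure (assumption: cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class ActiveNode:
    name: str
    limit: int  # maximum number of clusters this active node may hold
    clusters: dict[str, list[float]] = field(default_factory=dict)  # cluster id -> c_i
    neighbors: list[ActiveNode] = field(default_factory=list)


def join(w: list[float], start: ActiveNode, threshold: float, new_id: str) -> tuple[str, str]:
    """Return (active node name, cluster id) that a node with vector `w` joins."""
    seen, queue, visited = {start.name}, [start], []
    while queue:  # current active node first, then neighbours breadth-first
        node = queue.pop(0)
        visited.append(node)
        best = max(((cosine(w, c), cid) for cid, c in node.clusters.items()), default=None)
        if best is not None and best[0] > threshold:
            return node.name, best[1]  # most similar cluster above the threshold
        for nb in node.neighbors:
            if nb.name not in seen:
                seen.add(nb.name)
                queue.append(nb)
    for node in visited:  # no match anywhere: become super node of a new domain
        if len(node.clusters) < node.limit:
            node.clusters[new_id] = list(w)
            return node.name, new_id
    raise RuntimeError("every reachable active node is at its cluster limit")


def pick_neighbors(w: list[float], members: dict[str, list[float]], n: int) -> list[str]:
    """The first N cluster members ranked by similarity to the new node."""
    return sorted(members, key=lambda m: cosine(w, members[m]), reverse=True)[:n]
```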

Figure 1. Node Structure

3.1.1 The structure of nodes in the network

The active node acts as a router that plays a special role in the backbone network. It manages the key information of the interest clusters, such as the limit on the number of clusters in the active node, the super node table, the cluster vector information of all the super nodes attached to it, and its routing function. Furthermore, the active node maintains a neighbor active node table, which is used when the current active node is temporarily absent due to churn. It also maintains the forwarding information table. The active nodes thus form a mesh topology that redirects a query to a neighbor active node when the actual active node is absent for any reason. The information maintained by the active node is shown in Table 1.


Table 1. Information maintained by the active node

Local resource table

Table of the super nodes belonging to the active node

Neighbour active node table

Forwarding information table

The super node is the representative node of a cluster. It holds the information of each normal node in the cluster, including the interest feature vector of every normal node. It also maintains the details of the other super nodes attached to the same active node, to support efficient resource search in the absence of the main super node.

The super node keeps a backup of its table structure so that, in the absence of the super node, these details can be provided to a new super node elected from among the normal nodes. The information maintained by the super node is shown in Table 2.


Table 2. Information maintained by the super node

Backup super node table

Local resource table

Neighbour super node table (super nodes belonging to the same active node)

Table of the normal nodes belonging to the super node

Forwarding information table

Normal nodes are mainly responsible for maintaining the detailed resource information to be shared in the network. They share this information within the cluster using the rateless coding method described in the following sections. The normal node also maintains the super node table and the forwarding information table. A normal node can act as the super node in the absence of the super node, obtaining the details of the other normal nodes from the backup super node table. The information maintained by the normal node is shown in Table 3.



Table 3. Information maintained by the normal node

Local resource table

Super node table (the super node in charge of the normal node)

Neighbour normal node table

Forwarding information table
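Tables 1 to 3 map naturally onto record types. The following minimal Python sketch assumes each table is an in-memory dictionary or list keyed by node identifiers; the paper does not fix a representation, and all class and field names are hypothetical. Where the backup super node table physically lives is also an assumption: here the super node simply refreshes a deep copy that could be handed to a backup node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NormalNode:  # Table 3
    node_id: str
    local_resources: dict[str, str] = field(default_factory=dict)  # resource -> location
    super_node: str | None = None  # super node in charge of this normal node
    neighbor_normal_nodes: list[str] = field(default_factory=list)
    forwarding: dict[str, str] = field(default_factory=dict)  # destination -> next hop


@dataclass
class SuperNode:  # Table 2
    node_id: str
    local_resources: dict[str, str] = field(default_factory=dict)
    normal_nodes: dict[str, list[float]] = field(default_factory=dict)  # id -> interest vector
    neighbor_super_nodes: list[str] = field(default_factory=list)  # same active node
    forwarding: dict[str, str] = field(default_factory=dict)
    backup: dict[str, list[float]] = field(default_factory=dict)  # backup super node table

    def refresh_backup(self) -> None:
        """Deep-copy the normal node table so a successor can take over."""
        self.backup = {k: list(v) for k, v in self.normal_nodes.items()}


@dataclass
class ActiveNodeTables:  # Table 1
    node_id: str
    cluster_limit: int
    local_resources: dict[str, str] = field(default_factory=dict)
    super_nodes: dict[str, list[float]] = field(default_factory=dict)  # id -> cluster vector
    neighbor_active_nodes: list[str] = field(default_factory=list)
    forwarding: dict[str, str] = field(default_factory=dict)
```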

3.1.2 Maintenance of the Interest Clusters

The maintenance strategy of the nodes is given as follows:

(1) Within the cluster, nodes may leave or join the network at any time, so elections are held periodically among the super nodes. When a super node leaves the network, a normal node is elected as the new super node and obtains all the necessary details from the backup super node table; it then takes charge of the overall structure within the cluster. In addition, the backup node stands up and holds an election to produce a new super node when the old super node exits.

(2) Nodes in a P2P system have high churn rates, so nodes may leave and join at any time, and the neighbor table is modified according to the updated node information. Asynchronous updates may occur within the cluster at any time and must be recorded for every activity. A TTL is set for each packet in the cluster; when it expires and the random walker reports that it is still alive, the acknowledgment timer is reset and coded packets carrying the updated information are forwarded among the nodes.

(3) A new node is elected as the super node when the backup node is abnormal. The normal nodes notify each other that the backup node is abnormal, and one of them is elected as the super node to avoid the problem.

(4) Finally, when a new super node is elected, it informs its active node of its cluster vector information and its neighbor super node table information.
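The failover in steps (1) and (3) can be sketched as follows. The text does not specify how the election is decided, so this sketch assumes that the backup super node table records each normal node's upload bandwidth and that the live node with the highest bandwidth wins, with ties broken by the smallest id. `elect_super_node` and `fail_over` are hypothetical names.

```python
from __future__ import annotations


def elect_super_node(candidates: dict[str, float], alive: set[str]) -> str | None:
    """Pick a live normal node as the new super node.

    Assumption: highest upload bandwidth wins; ties go to the smallest id.
    """
    live = [n for n in candidates if n in alive]
    if not live:
        return None
    return min(live, key=lambda n: (-candidates[n], n))


def fail_over(backup_table: dict[str, float], alive: set[str]) -> tuple[str | None, dict[str, float]]:
    """Elect a successor and rebuild its normal node table from the backup."""
    leader = elect_super_node(backup_table, alive)
    table = {n: bw for n, bw in backup_table.items() if n in alive and n != leader}
    return leader, table
```

After `fail_over` returns, the new super node would report its cluster vector and neighbor super node table to its active node, as step (4) describes.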

3.1.3 Random Search Strategy

(1) A query begins at a normal node, which first searches its neighbor nodes within the same cluster; if no successful results are obtained, the query is forwarded to the super node for processing.

(2) When a super node receives a query message from its own cluster, it first searches for the resources in its local storage, and forwards the query message to its neighbor super nodes if the related resources are not found. Finally, if the related resources are not found in any of the super nodes or their normal nodes, the super node hands the query over to the active node.

(3) When the query comes from one of its super nodes, the active node first searches for the resources in its local storage, and relays the query to its neighbor active nodes if no successful reply is returned.
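The three-level search can be sketched in Python as follows. The sketch assumes that each node's local storage is a set of resource names and that neighbor active nodes answer only from their own local storage (forwarding into their clusters is omitted). `Node` and `search` are illustrative names; the returned trail lists the nodes consulted, in order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    resources: set[str] = field(default_factory=set)
    neighbors: list[Node] = field(default_factory=list, repr=False)  # same level
    members: list[Node] = field(default_factory=list, repr=False)    # managed nodes
    parent: Node | None = field(default=None, repr=False)            # super/active node


def search(origin: Node, resource: str) -> tuple[str | None, list[str]]:
    """Return (name of the node holding `resource` or None, nodes consulted)."""
    trail: list[str] = []

    def check(n: Node) -> bool:
        trail.append(n.name)
        return resource in n.resources

    # (1) the normal node itself, then its neighbours in the same cluster
    for n in [origin, *origin.neighbors]:
        if check(n):
            return n.name, trail
    sup = origin.parent
    if sup is None:
        return None, trail
    # (2) the super node, then neighbour super nodes and their normal nodes
    if check(sup):
        return sup.name, trail
    for s in sup.neighbors:
        for n in [s, *s.members]:
            if check(n):
                return n.name, trail
    active = sup.parent
    if active is None:
        return None, trail
    # (3) the active node, then its neighbour active nodes
    for a in [active, *active.neighbors]:
        if check(a):
            return a.name, trail
    return None, trail
```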

3.2 Peer States

Each peer in the network goes through the following states in its lifecycle:

1. WAIT: The peer is in the waiting state while it has not received any coded packets. It changes to the DECODER state as soon as it receives its first coded packet.

2. OFF: The peer no longer responds to incoming packets or distributes data; it has received enough coded packets and has successfully decoded all k information packets.

3. SEEDER: The peer has already received and successfully decoded the k information packets, but some of its neighbors are still in the DECODER state. In this case, the peer generates new Luby Transform coded packets, saturating its upload bandwidth towards those neighbors. As soon as all its neighbors have decoded the original information, the peer changes its state to OFF.

4. DECODER: The peer is in the DECODER state while it has not yet received enough packets to decode the original information. As soon as it receives enough packets to decode the information data, the peer signals this event to all its seeding nodes so that they stop pushing more coded packets, and it changes its state to SEEDER.

The source peers, which initially hold the information, begin in the SEEDER state, while all other peers start in the WAIT state. All peers in the SEEDER state encode the original data using the Robust Soliton Distribution and send the coded packets to their neighbors in the DECODER state. All peers in the DECODER state run the on-the-fly Gaussian elimination decoding algorithm and progressively construct their generator matrix G from the generating equations of the received coded packets. At the same time, these peers insert only their essential packets into an output buffer OB.
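The lifecycle can be summarized as a small state machine. In this Python sketch the event names are hypothetical, the WAIT to DECODER transition is assumed to fire on the first coded packet, and the encoding and decoding work done in each state is not modeled.

```python
from enum import Enum, auto


class State(Enum):
    WAIT = auto()
    DECODER = auto()
    SEEDER = auto()
    OFF = auto()


class Event(Enum):
    PACKET_RECEIVED = auto()     # a coded packet arrives
    DECODED = auto()             # all k information packets recovered
    NEIGHBORS_DECODED = auto()   # every neighbour has decoded as well


TRANSITIONS = {
    (State.WAIT, Event.PACKET_RECEIVED): State.DECODER,
    # On DECODED the peer also tells its seeders to stop pushing packets.
    (State.DECODER, Event.DECODED): State.SEEDER,
    (State.SEEDER, Event.NEIGHBORS_DECODED): State.OFF,
}


def next_state(state: State, event: Event) -> State:
    """Apply one event; events without a listed transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def initial_state(is_source: bool) -> State:
    """Source peers start as SEEDERs; every other peer starts in WAIT."""
    return State.SEEDER if is_source else State.WAIT
```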

3.3 Protocol Description

Throttling peers in the DECODER state, and combining packets before relaying them to neighbours in the DECODER state, reduces the number of duplicate packets while retaining the ability to spread received packets as soon as possible. Several strategies that a peer in the DECODER state can use to relay coded packets are compared below.


Store and Recover (SR): A peer does not forward any of the received coded packets that are used to recover the k original blocks. This means that a peer starts to forward packets only when it switches to the SEEDER state.



Relay (RE): At every transmission opportunity, a peer selects a packet in OB and forwards it. Such a packet is deleted from OB in order to relay it only once. The procedure is repeated until OB is empty or the upload capacity is saturated.


Random Relay (RR): At every transmission opportunity, a peer randomly draws from OB enough packets to fully use its upload capacity and sends them to its neighbours.


Random Relay with Combinations (RRC): At every transmission opportunity, the peer randomly draws from OB enough packets to fully use its upload capacity, XORs each of them with a randomly chosen row of the decoding matrix, and sends them to its neighbours. This amounts to combining each selected packet with a set of previously received packets at the cost of a single packet XOR operation.

The aim of the RR and RRC strategies is to send as much information as possible, since high utilization of the upload bandwidth reduces the time needed to spread the information data.

These strategies may be too aggressive, i.e., they could flood the network with duplicate packets. For this reason, we consider two variants of the previous strategies, namely TRR and TRRC, in which the upload bandwidth of RR and RRC is throttled. In particular, at every transmission opportunity the number of relayed packets is limited to a minimum.
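The relay strategies differ only in what a DECODER-state peer sends at each transmission opportunity. In this minimal Python sketch, a packet is assumed to be a pair of integers (coefficient bitmask, payload), random draws from OB are made with replacement, and the TRR/TRRC limit is a hypothetical `throttle` parameter defaulting to one packet, because the text only says the number is "limited to a minimum". The function name `relay` is likewise illustrative.

```python
import random

Packet = tuple[int, int]  # (coefficient bitmask, payload as an integer)
STRATEGIES = {"SR", "RE", "RR", "RRC", "TRR", "TRRC"}


def xor(p: Packet, q: Packet) -> Packet:
    return (p[0] ^ q[0], p[1] ^ q[1])


def relay(strategy: str, ob: list[Packet], rows: list[Packet], capacity: int,
          rng: random.Random, throttle: int = 1) -> list[Packet]:
    """Packets a DECODER-state peer sends in one transmission opportunity.

    ob: output buffer OB; rows: rows of the decoding matrix so far;
    capacity: packets the upload bandwidth allows; throttle: TRR/TRRC cap.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy {strategy!r}")
    if strategy == "SR":
        return []  # forward nothing until the peer becomes a SEEDER
    if strategy == "RE":
        sent = ob[:capacity]
        del ob[:capacity]  # each buffered packet is relayed only once
        return sent
    if not ob:
        return []
    budget = min(capacity, throttle) if strategy in ("TRR", "TRRC") else capacity
    picks = [rng.choice(ob) for _ in range(budget)]  # drawn with replacement
    if strategy in ("RRC", "TRRC") and rows:
        # One XOR with a random decoding-matrix row per relayed packet.
        picks = [xor(p, rng.choice(rows)) for p in picks]
    return picks
```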

3.4 Random walk Luby Transform coding

The sender encodes the information to be sent to the network. The uncoded message is divided into $n$ blocks of equal length, and each coded packet is produced by the following encoding process. The degree $d$, $1 \le d \le n$, of the next packet is chosen at random. If $M_i$ is the $i$-th block of the message, the data portion of the next packet is computed as

$$M_{i_1} \oplus M_{i_2} \oplus \dots \oplus M_{i_d}$$

where $\{i_1, i_2, \dots, i_d\}$ are the randomly chosen indices of the $d$ blocks included in this packet. A prefix is appended to the encoded packet, defining how many blocks $n$ are in the message, how many blocks $d$ have been exclusive-ORed into the data portion of this packet, and the list of indices $\{i_1, i_2, \dots, i_d\}$. Finally, some form of error-detecting code (perhaps as simple as a cyclic redundancy check) is applied to the packet, and the packet is transmitted.

3.5 Asynchronous Luby Transform Decoding

If a received packet is redundant, it is discarded. If the received packet is clean and has degree $d > 1$, it is first processed against all the fully decoded blocks in the message queuing area (as described in the next step), and is then stored in a buffer area if its reduced degree is still greater than 1. When a new, clean packet of degree $d = 1$ (block $M_i$) is received, or the degree of the current packet is reduced to 1 by the preceding step, it is moved to the message queuing area and then matched against all the packets of degree $d > 1$ residing in the buffer. It is exclusive-ORed into the data portion of every buffered packet that was encoded using $M_i$; the degree of each such packet is decremented, and its list of indices is adjusted to reflect the application of $M_i$. When this process reduces a buffered packet of degree $d = 2$ to degree 1, that packet is in turn moved to the message queuing area and processed against the packets remaining in the buffer. When all $n$ blocks of the message have been moved to the message queuing area, the receiver signals the transmitter that the message has been successfully decoded.
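The peeling procedure can be sketched in Python. This minimal sketch assumes packets have already passed the error-detecting check and arrive as (set of block indices, payload) pairs. The dictionary `decoded` plays the role of the message queuing area, `buffer` holds packets of degree greater than 1, and returning the recovered blocks stands in for signalling the transmitter; `lt_decode` is a hypothetical name.

```python
from __future__ import annotations


def lt_decode(n: int, packets: list[tuple[set[int], bytes]]) -> list[bytes] | None:
    """Peeling decoder: the n blocks in order, or None if more packets are needed."""
    decoded: dict[int, bytes] = {}        # message queuing area
    buffer: list[list] = []               # [index set, payload], degree > 1
    ripple: list[tuple[int, bytes]] = []  # degree-1 blocks awaiting processing

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    for idx, data in packets:
        idx = set(idx)  # copy so the caller's packets are not modified
        for i in [i for i in idx if i in decoded]:  # strip known blocks
            data = xor(data, decoded[i])
            idx.discard(i)
        if len(idx) == 1:
            ripple.append((idx.pop(), data))
        elif len(idx) > 1:
            buffer.append([idx, data])
        # len(idx) == 0: redundant packet, discarded
        while ripple:
            i, block = ripple.pop()
            if i in decoded:
                continue
            decoded[i] = block
            for entry in buffer:  # apply M_i to every buffered packet using it
                if i in entry[0]:
                    entry[1] = xor(entry[1], block)
                    entry[0].discard(i)
                    if len(entry[0]) == 1:
                        ripple.append((next(iter(entry[0])), entry[1]))
            buffer = [e for e in buffer if len(e[0]) > 1]
        if len(decoded) == n:
            return [decoded[i] for i in range(n)]
    return None
```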


In this paper, we have presented recent advances in searching for resources in unstructured networks, where peers may join and leave the network at any time. Because of the high churn rate, searching for a particular resource in such a network is difficult. To make the search process easier, the network is split into clusters based on users' interests.

Instead of searching the entire network, only the relevant part of it, namely the cluster, is searched, which reduces the query response time and the cost of searching. This overcomes the problem of blind search methods that rely on flooding.

The advantage of this system is that random walkers use Luby Transform codes within the clusters only, since asynchronous information updates inside a cluster need to be known only to its super node. The active node maintains only the super node table and does not track update activity inside the clusters; those details are maintained and updated only in the super node tables. This reduces network overhead.

In future work, the election of new nodes as super nodes should be reduced, since each new super node creates a new cluster and increases the complexity of searching larger clusters; instead, a new node could join as a normal node by adding some of its interest features to the vector information of the super node. Future work should also focus on enriching the router's cluster vector information with more relevant information about the clusters, in order to reduce the number of clusters added to the network and improve search efficiency.


[1] V. Bioglio, M. Grangetto, R. Gaeta, and M. Sereno, "An optimal partial decoding algorithm for rateless codes," in IEEE International Symposium on Information Theory (ISIT), Aug. 2011, pp. 2731


[2] P. Trunfio, D. Talia, H. Papadakis, P. Fragopoulou, Mordacchini, M. Pennanen, K. Popov, V. Vlassov, and S. Haridi, "Peer-to-peer resource discovery in Grids: Models and systems," Future Generation Computer Systems, vol. 23, no. 7, pp. 864–878, 2007.

[3] S. Kim, K. Ko, and S. Chung, "Incremental Gaussian elimination decoding of raptor codes over BEC," IEEE Communications Letters, vol. 12, no. 4, pp. 307–309, Apr. 2008.

[4] L. Alvisi et al., "How robust are gossip-based communication protocols?" Operating Systems Review, vol. 41, no. 5, pp. 14–18, Oct. 2007.

[5] C. Gkantsidis, M. Mihail, and A. Saberi, "Random walks in peer-to-peer networks," in Proc. IEEE INFOCOM, Mar. 2004.

[6] S. Floyd, M. Handley, J. Padhye, and J. Widmer, "Equation-based congestion control for unicast applications," in Proc. SIGCOMM 2000, Stockholm, Sweden, Aug. 2000, pp. 43–56.
