SMART PRO TECHNOLOGIES
9885652333, www.smartprotech.net
BloomCast: Efficient and Effective Full-Text Retrieval in
Unstructured P2P Networks
ABSTRACT:
Efficient and effective full-text retrieval in unstructured peer-to-peer networks
remains a challenge in the research community. First, it is difficult, if not
impossible, for unstructured P2P systems to effectively locate items with
guaranteed recall. Second, existing schemes to improve search success rate often
rely on replicating a large number of item replicas across the wide area network,
incurring a large amount of communication and storage costs. In this paper, we
propose BloomCast, an efficient and effective full-text retrieval scheme, in
unstructured P2P networks. By leveraging a hybrid P2P protocol, BloomCast
replicates the items uniformly at random across the P2P networks, achieving a
guaranteed recall at a communication cost of
O(√N), where N is the size of the
network. Furthermore, by casting Bloom Filters instead of the raw documents
across the network, BloomCast significantly reduces the communication and
storage costs for replication. We demonstrate the power of BloomCast design
through both mathematical proof and comprehensive simulations based on the
query logs from a major commercial search engine and NIST TREC WT10G data
collection. Results show that BloomCast achieves an average query recall of 91
percent, which outperforms the existing WP algorithm by 18 percent, while
BloomCast greatly reduces the search latency for query processing by 57 percent.
SYSTEM ARCHITECTURE:
EXISTING SYSTEM:
The existing system has two major issues. First, it is difficult, if not
impossible, for unstructured P2P systems to effectively locate items with
guaranteed recall. Second, existing schemes to improve search success rate often
rely on replicating a large number of item replicas across the wide area network,
incurring a large amount of communication and storage costs.
Existing P2P search schemes fall into two categories: DHT-based global indexes and
federated search engines over unstructured protocols.
DHT-based search engines are based on distributed indexes that partition a
logically global inverted index in a physically distributed manner.
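The partitioning described above can be sketched in Java as follows. This is a minimal illustration, not the design of any particular DHT: the class name TermPartitionSketch and its methods are hypothetical, and String.hashCode() modulo the node count stands in for a real DHT lookup. Each term is owned by exactly one node, so a query must name the exact indexed term to find its posting list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a logically global inverted index split across nodes by term.
final class TermPartitionSketch {
    private final int nodeCount;
    // nodes.get(i) is the index slice held by node i: term -> document ids.
    private final List<Map<String, Set<String>>> nodes = new ArrayList<>();

    TermPartitionSketch(int nodeCount) {
        this.nodeCount = nodeCount;
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Assumption: a plain hash modulo nodeCount replaces a real DHT key lookup.
    int responsibleNode(String term) {
        return Math.floorMod(term.toLowerCase(Locale.ROOT).hashCode(), nodeCount);
    }

    // Publishes every whitespace-separated term of a document to the node that owns it.
    void publish(String docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            nodes.get(responsibleNode(term))
                 .computeIfAbsent(term, t -> new TreeSet<>())
                 .add(docId);
        }
    }

    // Multi-keyword search: fetch one posting list per term and intersect them.
    Set<String> search(String... terms) {
        Set<String> result = null;
        for (String term : terms) {
            String key = term.toLowerCase(Locale.ROOT);
            Set<String> postings =
                nodes.get(responsibleNode(key)).getOrDefault(key, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

In this sketch a lookup for a partial term such as "bloo" returns nothing even when "bloom" is indexed, which illustrates the exact-match limitation of term-keyed lookups.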
In a federated search engine over unstructured P2P networks, queries are processed by
flooding. Unstructured P2P networks are commonly believed to be the best candidate for
supporting full-text retrieval because the query evaluation operations can be handled
at the nodes that store the relevant documents.
Replication strategies are widely used to improve search performance in
unstructured P2P networks. The first type is the query-popularity-aware strategy.
The second type, such as the WP scheme, is independent of the popularity of the
query.
DISADVANTAGES OF EXISTING SYSTEM:
– Because of the exact-match problem of DHTs, DHT-based schemes provide poor full-text search capability.
– Search recall is not guaranteed with acceptable communication cost
using a flooding-based scheme.
– Popularity-aware replication is inefficient for insoluble queries, that is,
queries for rare items. The query frequency is difficult or even impossible to
obtain in a distributed P2P system. Existing replication strategies also
need to replicate the full document across the network, incurring
possibly unacceptable communication and storage costs.
PROPOSED SYSTEM:
In the proposed system, we introduce BloomCast, a novel, efficient, and effective
full-text retrieval scheme for unstructured P2P networks. BloomCast is a
query-popularity-independent replication strategy. We show mathematically that
BloomCast guarantees recall at a communication cost of O(√N), where N is the
size of the network.
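The intuition behind an O(√N) cost can be checked numerically. The sketch below is not the proof from the paper. It is the standard birthday-paradox style calculation, under the assumption that a document is replicated on r distinct nodes and a query is sent to q distinct nodes, both chosen uniformly at random and independently among N nodes. The class name RecallBoundSketch and its method are hypothetical.

```java
// Hypothetical sketch: probability that a random query set meets a random replica set.
final class RecallBoundSketch {

    // Exact probability that q uniformly random distinct query nodes include
    // at least one of the r replica nodes in a network of n nodes.
    static double successProbability(int n, int r, int q) {
        if (r + q > n) {
            return 1.0; // pigeonhole: the two node sets must overlap
        }
        double miss = 1.0;
        for (int i = 0; i < q; i++) {
            // The i-th query node avoids all r replica nodes.
            miss *= (double) (n - r - i) / (n - i);
        }
        return 1.0 - miss;
    }

    public static void main(String[] args) {
        int n = 10_000;
        int root = (int) Math.ceil(Math.sqrt(n));
        for (int c = 1; c <= 3; c++) {
            double exact = successProbability(n, c * root, c * root);
            double approx = 1.0 - Math.exp(-(double) c * c);
            System.out.printf("c=%d r=q=%d exact=%.4f approx=%.4f%n",
                    c, c * root, exact, approx);
        }
    }
}
```

With r = q = c√N the miss probability is roughly e^(-c²), so a small constant c already gives a high success probability while each document and each query touches only O(√N) nodes.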
ADVANTAGES OF PROPOSED SYSTEM:
– By replicating the encoded term sets using Bloom Filters instead of
raw documents among peers, the communication and storage costs are
greatly reduced, while full-text multi-keyword search is still
supported.
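To illustrate how a document's term set can be encoded and tested for several keywords at once, here is a minimal Bloom filter sketch in Java. It is an assumption-laden illustration rather than the encoding used by BloomCast: the class name TermBloomFilter, the FNV-1a style hashing with double hashing, and the whitespace tokenizer are all hypothetical choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.Locale;

// Hypothetical sketch: a Bloom filter over the terms of one document.
final class TermBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the filter
    private final int hashCount; // number of hash positions per term

    TermBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Encodes the whitespace-separated terms of a document into a new filter.
    static TermBloomFilter encode(String text, int size, int hashCount) {
        TermBloomFilter filter = new TermBloomFilter(size, hashCount);
        for (String term : text.split("\\s+")) {
            if (!term.isEmpty()) {
                filter.add(term);
            }
        }
        return filter;
    }

    void add(String term) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(term, i));
        }
    }

    // May return a false positive, but never a false negative.
    boolean mightContain(String term) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(term, i))) {
                return false;
            }
        }
        return true;
    }

    // Multi-keyword test: every query term must pass the membership check.
    boolean mightContainAll(String... terms) {
        for (String term : terms) {
            if (!mightContain(term)) {
                return false;
            }
        }
        return true;
    }

    int sizeInBytes() {
        return (size + 7) / 8;
    }

    // Double hashing: position i is (h1 + i * h2) mod size.
    private int index(String term, int i) {
        byte[] data = term.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    private static int fnv1a(byte[] data, int seed) {
        int hash = seed;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 16777619;
        }
        return hash;
    }
}
```

A filter has a fixed size regardless of how long the document is (128 bytes for 1024 bits in this sketch), which is where the communication and storage savings come from. The price is a small false-positive rate, so a positive match may still need to be confirmed against the actual document.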
MODULES:
• Node creation
• BloomCast replication model generation
• BloomCast
• Bloom filter
• Query recall
MODULE DESCRIPTION
• Node creation
  – To retrieve full text efficiently, nodes are created in the P2P network.
  – Each node sends its documents to other nodes chosen uniformly at random in the
    unstructured P2P network.
  – Creating the nodes in an unstructured P2P network reduces the
    communication and storage costs.
• BloomCast replication model generation
  – The replication model is generated from the document replicas
    and the query replicas.
  – The BloomCast replica count is estimated as the number of nodes holding a
    replica of the document and of the query.
  – This replication count is used to evaluate the search success
    rate of the queries issued by the user.
• BloomCast
  – BloomCast consists of network size estimation, node
    subset sampling, the replication protocol, and query evaluation.
  – The network size is estimated by the DHT subsystem, which maintains
    the local repository of replicas.
  – Node subsets are then assigned to reduce the communication and
    storage costs.
  – A query is evaluated by distributing an optimal number of query
    replicas at random across the network.
• Bloom filter
  – The Bloom filter maintains the hash table for document replicas and
    query replicas.
  – The Bloom filter reduces memory usage and lets the search engine perform
    full-text retrieval efficiently and effectively.
• Query recall
  – Query recall returns the matching replicas and Bloom filters without any
    loss.
  – Query recall retrieves full text in the unstructured P2P network
    while reducing the communication and storage costs.
  – It retrieves the data quickly and satisfies the user's requirements.
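The modules above can be tied together in a small simulation. This is a sketch under stated assumptions, not the project's implementation: the class name BloomCastSimulationSketch and its methods are hypothetical, a plain term set stands in for each document's Bloom filter, and node sampling uses a local shuffle instead of a DHT-based sampling service.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: random replication, random query casting, and recall.
final class BloomCastSimulationSketch {
    // nodes.get(i) is node i's local repository: document id -> term set.
    // Assumption: a term set stands in for the document's Bloom filter.
    private final List<Map<String, Set<String>>> nodes = new ArrayList<>();
    private final Random random;

    BloomCastSimulationSketch(int nodeCount, long seed) {
        this.random = new Random(seed);
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Node subset sampling: picks distinct node ids uniformly at random.
    private List<Integer> sampleNodes(int count) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            ids.add(i);
        }
        Collections.shuffle(ids, random);
        return ids.subList(0, Math.min(count, ids.size()));
    }

    // Replication: stores the document's term set on `replicas` random nodes.
    void replicate(String docId, Set<String> terms, int replicas) {
        for (int id : sampleNodes(replicas)) {
            nodes.get(id).put(docId, terms);
        }
    }

    // Query evaluation: each sampled node checks its local repository.
    Set<String> query(Set<String> queryTerms, int queryReplicas) {
        Set<String> hits = new TreeSet<>();
        for (int id : sampleNodes(queryReplicas)) {
            for (Map.Entry<String, Set<String>> entry : nodes.get(id).entrySet()) {
                if (entry.getValue().containsAll(queryTerms)) {
                    hits.add(entry.getKey());
                }
            }
        }
        return hits;
    }

    // Recall: fraction of the relevant documents that the query retrieved.
    static double recall(Set<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 1.0;
        }
        Set<String> found = new HashSet<>(retrieved);
        found.retainAll(relevant);
        return (double) found.size() / relevant.size();
    }
}
```

In a run of this sketch one would typically set both the replica count and the query replica count to a small multiple of √N and average the recall over many queries. The parameter values are left to the reader because the source does not state the ones used in the original evaluation.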
SYSTEM CONFIGURATION:
HARDWARE REQUIREMENTS:
• Processor    : Pentium III
• Speed        : 1.1 GHz
• RAM          : 256 MB (min)
• Hard Disk    : 20 GB
• Floppy Drive : 1.44 MB
• Keyboard     : Standard Windows Keyboard
• Mouse        : Two- or Three-Button Mouse
• Monitor      : SVGA
SOFTWARE REQUIREMENTS:
• Operating System : Windows 95/98/2000/XP
• Front End        : Java
REFERENCE:
Hanhua Chen, Xucheng Luo, Yunhao Liu, Tao Gu, Kaiji Chen, and
Lionel M. Ni, "BloomCast: Efficient and Effective Full-Text Retrieval in
Unstructured P2P Networks," IEEE Transactions on Parallel and
Distributed Systems, vol. 23, no. 2, February 2012.