Design and Implementation of HTTP-Gnutella Gateway Baoning Wu (baw4) Wei Zhang (wez5)

CSE Department
Lehigh University
Motivation




Peer-to-peer networking is a hot topic.
Can P2P nodes search and get files from Web
sites?
Can one P2P network search and get files from
other P2P networks?
In our project, we have built a special gateway
between Gnutella and Web sites.
Related Work





David McNab has launched a Freenet search engine.
Asiayeah is a Gnutella search engine.
Filedonkey.com is an eDonkey search engine.
Kalepa Networks, Inc. is working on connecting different P2P systems.
Our work goes in roughly the reverse direction of the above: it lets Gnutella nodes search and fetch files from Web sites.
Mechanism of Gnutella Searching




Node A sends a query to its neighbor B.
Node B broadcasts the query to its neighbors C and D.
Node C has the object node A needs and returns a query hit message to node B.
Node B forwards the query hit message back to node A by consulting its local state.
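The four steps above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions, not the Gnutella wire protocol: the graph, the `flood_query` function, and the `came_from` table (standing in for the local state each node keeps about where a query arrived from) are hypothetical names introduced here.

```python
from collections import deque


def flood_query(graph, origin, holders, ttl=7):
    """Flood a query from `origin` and return the reverse path of each hit.

    graph   -- dict mapping a node name to the list of its neighbors
    holders -- set of nodes that have the requested object
    ttl     -- maximum number of hops the query may travel
    """
    # came_from[n] is the neighbor from which n first received the query.
    # It plays the role of the "local state" used to route query hits back.
    came_from = {origin: None}
    queue = deque([(origin, ttl)])
    hit_paths = []

    while queue:
        node, remaining = queue.popleft()

        if node in holders and node != origin:
            # Route the query hit back hop by hop along the reverse path.
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            hit_paths.append(path)

        if remaining == 0:
            continue  # TTL exhausted: do not forward any further

        for neighbor in graph[node]:
            if neighbor not in came_from:  # drop duplicates of the query
                came_from[neighbor] = node
                queue.append((neighbor, remaining - 1))

    return hit_paths


if __name__ == "__main__":
    topology = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
    print(flood_query(topology, "A", {"C"}))
```

With the topology from the slide, the query hit from node C travels back through node B to node A.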
Architecture of HTTP-Gnutella Gateway
Mechanism of the gateway
1. Node A broadcasts a query message, directly or indirectly, to the HTTP-Gnutella gateway.
2. The HTTP-Gnutella gateway translates the query message and forwards it to the search engine.
3. The search engine returns a set of query results to the gateway.
4. The gateway translates the results into Gnutella format and forwards them to node A.
5. If node A initiates a download request to the gateway, the gateway translates the Gnutella request into a well-formatted HTTP request to the Web server.
6. The gateway fetches the data from the Web server.
7. The gateway forwards the data from the Web server to node A.
Handle Query Messages


We still use the original Gnutella mechanism to decide whether to forward a message.
The gateway captures all queries with a hop count below 5 and sends them to the search engine.
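The capture rule can be sketched as follows. This is a minimal illustration that assumes the 23-byte descriptor header of the Gnutella 0.4 protocol (16-byte message ID, payload type, TTL, hops, little-endian payload length) and a query payload made of a 2-byte minimum speed followed by a NUL-terminated search string. The names `Descriptor`, `parse_header`, `should_capture`, and `search_text` are hypothetical and not taken from the gateway's code.

```python
import struct
from typing import NamedTuple

# Assumed Gnutella 0.4 header layout: MUID, type, TTL, hops, payload length.
HEADER = struct.Struct("<16sBBBI")
QUERY = 0x80                 # payload type of a Query descriptor
CAPTURE_HOPS_LIMIT = 5       # threshold stated on the slide


class Descriptor(NamedTuple):
    muid: bytes
    payload_type: int
    ttl: int
    hops: int
    payload_length: int


def parse_header(data: bytes) -> Descriptor:
    """Decode the fixed-size descriptor header at the start of `data`."""
    return Descriptor(*HEADER.unpack_from(data))


def should_capture(header: Descriptor) -> bool:
    """True if the gateway should pass this query on to the search engine."""
    return header.payload_type == QUERY and header.hops < CAPTURE_HOPS_LIMIT


def search_text(payload: bytes) -> str:
    """Extract the search string from a query payload.

    The first two bytes (minimum speed) are skipped; the string ends at
    the first NUL byte.
    """
    criteria = payload[2:].split(b"\x00", 1)[0]
    return criteria.decode("latin-1")
```

Forwarding to other Gnutella neighbors is a separate decision and is left to the usual TTL and duplicate checks.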
Search Engine API


The Google search engine API allows at most 1,000 requests per day.
The search engine API consists of three main functions:
  Query conversion
  Extraction of URLs
  Measurement of content size
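The three functions listed above might look roughly like the sketch below. It uses only the Python standard library and does not reproduce the Google API. The function names, the `q` parameter name, and the URL pattern are assumptions made for illustration.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical pattern: anything that looks like an http(s) URL.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")


def convert_query(gnutella_search: str) -> str:
    """Query conversion: turn a Gnutella search string into a URL query string."""
    keywords = gnutella_search.split()
    return urlencode({"q": " ".join(keywords)})


def extract_urls(result_text: str) -> list:
    """Extraction of URLs: collect distinct URLs in order of first appearance."""
    urls = []
    for url in URL_PATTERN.findall(result_text):
        if url not in urls:
            urls.append(url)
    return urls


def content_size(headers):
    """Read the size from a mapping of response headers, or None if absent."""
    value = headers.get("Content-Length")
    if value is not None and value.isdigit():
        return int(value)
    return None


def measure_content_size(url: str, timeout: float = 10.0):
    """Measurement of content size: issue an HTTP HEAD request for `url`."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return content_size(response.headers)
```

A HEAD request returns only the response headers, so the size can be learned without downloading the object itself.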

Generate Query Hit Messages

Two options were considered:
  Let Gnutella nodes contact Web servers directly
  Let the gateway work as a proxy

Following the proxy approach, the gateway fills in its own IP address and a fixed port number (currently 9999) in the query hit messages.
File names are the URLs of the Web objects.
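A query hit payload built this way could be sketched as below. The layout is an assumption based on the Gnutella 0.4 protocol (hit count, port, IP address, speed, result set, servent identifier); `build_query_hit_payload` and its parameters are hypothetical names, and the descriptor header is omitted.

```python
import socket
import struct


def build_query_hit_payload(gateway_ip, port, results, servent_id, speed_kbps=0):
    """Build a Query Hit payload that advertises the gateway as the source.

    results    -- list of (url, size_in_bytes) pairs from the search engine
    servent_id -- 16-byte identifier of the gateway
    """
    if len(results) > 255:
        raise ValueError("a single query hit carries at most 255 results")
    if len(servent_id) != 16:
        raise ValueError("servent identifier must be 16 bytes")

    parts = [
        struct.pack("<BH", len(results), port),  # number of hits, port
        socket.inet_aton(gateway_ip),            # gateway IP, network byte order
        struct.pack("<I", speed_kbps),           # advertised speed
    ]
    for index, (url, size) in enumerate(results, start=1):
        parts.append(struct.pack("<II", index, size))    # file index, file size
        parts.append(url.encode("ascii") + b"\x00\x00")  # file name is the URL
    parts.append(servent_id)
    return b"".join(parts)


if __name__ == "__main__":
    payload = build_query_hit_payload(
        "123.123.123.123", 9999,
        [("http://www.foo.com/foo.mp3", 1000)],
        servent_id=bytes(16),
    )
    print(len(payload), "bytes")
```

Because the advertised address is the gateway's own, the Gnutella node sends its download request to the gateway rather than to the Web server.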
Downloading Service

Translate the Gnutella download request into a well-formatted HTTP request, e.g.
GET /get/1234/http://www.foo.com/foo.mp3 HTTP/1.1
User-Agent: Gnutella
Host: 123.123.123.123:6346
=>
GET http://www.foo.com/foo.mp3 HTTP/1.1
User-Agent: Gnutella
Host: www.foo.com


The gateway must handle Gnutella handshakes properly.
It also records the number of bytes transferred.
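The request translation shown in the example can be sketched as a small function. This is one possible approach, not the gateway's actual code: `translate_download_request` and `GNUTELLA_TARGET` are hypothetical names, and the sketch assumes the request target always has the form `/get/<index>/<URL>`.

```python
import re
from urllib.parse import urlsplit

# Assumed shape of the Gnutella download target: /get/<file index>/<URL>
GNUTELLA_TARGET = re.compile(r"^/get/(?P<index>\d+)/(?P<url>.+)$")


def translate_download_request(raw: str) -> str:
    """Rewrite a Gnutella download request into an HTTP request for the Web server."""
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ")

    match = GNUTELLA_TARGET.match(target)
    if method != "GET" or match is None:
        raise ValueError("not a Gnutella download request")

    url = match.group("url")
    host = urlsplit(url).netloc
    if not host:
        raise ValueError("file name is not an absolute URL")

    out = [f"GET {url} {version}"]
    saw_host = False
    for line in header_lines:
        name = line.split(":", 1)[0].strip().lower()
        if name == "host":
            out.append(f"Host: {host}")  # point Host at the Web server
            saw_host = True
        else:
            out.append(line)             # keep other headers unchanged
    if not saw_host:
        out.append(f"Host: {host}")
    return "\r\n".join(out) + "\r\n\r\n"


if __name__ == "__main__":
    gnutella_request = (
        "GET /get/1234/http://www.foo.com/foo.mp3 HTTP/1.1\r\n"
        "User-Agent: Gnutella\r\n"
        "Host: 123.123.123.123:6346\r\n\r\n"
    )
    print(translate_download_request(gnutella_request))
```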
Problems & Solutions

Irregular handshakes
  We handle all possibilities.

File size
  We use an HTTP HEAD request to get the file size.

Broken pipe signal
  We use a forked process.
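The idea behind the forked-process solution can be sketched as follows, assuming a Unix-like system: if a Gnutella node disconnects in the middle of a transfer, the broken pipe signal terminates only the child that was serving it, and the gateway process keeps running. `serve_in_child` and `write_to_closed_pipe` are hypothetical names, and the closed pipe merely stands in for a dropped client connection.

```python
import os
import signal


def serve_in_child(handler):
    """Run `handler()` in a forked child and return the child's wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: restore the default SIGPIPE action (terminate the process).
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)
        exit_code = 1
        try:
            handler()
            exit_code = 0
        finally:
            os._exit(exit_code)  # never return into the parent's code path
    # Parent: a real gateway would go on accepting connections instead of
    # blocking here; waiting makes the outcome observable in this sketch.
    _, status = os.waitpid(pid, 0)
    return status


def write_to_closed_pipe():
    """Simulate sending data to a peer that has already gone away."""
    read_end, write_end = os.pipe()
    os.close(read_end)
    os.write(write_end, b"data")  # triggers SIGPIPE in the child


if __name__ == "__main__":
    status = serve_in_child(write_to_closed_pipe)
    if os.WIFSIGNALED(status):
        print("child ended by signal", os.WTERMSIG(status), "and the parent survived")
```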
Experiment Results




Outline
Basic verification and validation
Log file format
Results #1 to #4
Basic Verification & Validation


Run our special gateway on machine 1 and a normal gtk-gnutella client on machine 2. After machine 2 connects to machine 1, we use machine 2 to send query messages and download requests to machine 1.
For each file downloaded through machine 1, we use wget to get the same file directly from the Web server and use diff to check that the two copies are identical.
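The same identity check can be written in Python, as an alternative to running diff by hand. This is a small sketch; `same_content` and the file paths are hypothetical.

```python
import filecmp


def same_content(via_gateway_path: str, direct_path: str) -> bool:
    """Compare two files byte by byte, like `diff` on binary files.

    shallow=False forces a comparison of the contents rather than
    only the file metadata.
    """
    return filecmp.cmp(via_gateway_path, direct_path, shallow=False)
```

For example, `same_content("gateway/foo.mp3", "wget/foo.mp3")` would return `True` only if the copy fetched through the gateway matches the copy fetched directly.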
Log File Format




Log 1
Time stamp, MUID, IP address, Type, Query
Log 2
Time stamp, IP address, URL, Size, Code, Success
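A reader for Log 2 might look like the sketch below. The slide gives only the field order, so the comma separator, the integer encoding of Size and Code, and the use of "1" for Success are all assumptions, and `DownloadRecord` and `parse_download_record` are hypothetical names.

```python
import csv
from dataclasses import dataclass


@dataclass
class DownloadRecord:
    timestamp: str
    ip: str
    url: str
    size: int
    code: int
    success: bool


def parse_download_record(line: str) -> DownloadRecord:
    """Parse one Log 2 line, assuming comma-separated fields.

    Note: a URL that itself contains a comma would need quoting
    for this simple scheme to work.
    """
    fields = next(csv.reader([line], skipinitialspace=True))
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    timestamp, ip, url, size, code, success = fields
    return DownloadRecord(timestamp, ip, url, int(size), int(code), success == "1")


if __name__ == "__main__":
    sample = "1041379200, 10.0.0.1, http://www.foo.com/foo.mp3, 1000, 200, 1"
    print(parse_download_record(sample))
```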
Result #1




No. of Query messages: 319,245
No. of Query Hit messages: 930,860
No. of served requests: 113,391
Average Response Time: 16.33 seconds
Result #2
[Figure: number of requests (y-axis, 0 to 100,000) versus number of responses (x-axis, 1 to 9)]
Result #3







No. of download requests: 952
No. of distinct IP addresses: 67
No. of served requests: 945
No. of successfully served requests: 740
Total size transferred: 244,227,881 bytes
Average response time: 3.15 seconds
Average total download time: 15.92 seconds
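A few ratios follow directly from the figures above. The sketch below only does arithmetic on the reported numbers. The mean size per file assumes that the whole transferred volume belongs to the successfully served requests, which the slide does not state. `summarize_downloads` is a hypothetical helper name.

```python
def summarize_downloads(requests, distinct_ips, served, successful, total_bytes):
    """Derive simple ratios from the download counters reported on the slide."""
    return {
        "served_fraction": served / requests,           # served / requested
        "success_fraction": successful / served,        # successful / served
        "requests_per_ip": requests / distinct_ips,     # mean requests per address
        "mean_bytes_per_success": total_bytes / successful,
    }


if __name__ == "__main__":
    summary = summarize_downloads(
        requests=952,
        distinct_ips=67,
        served=945,
        successful=740,
        total_bytes=244_227_881,
    )
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```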
Result #4
[Figure: number of downloaded files (y-axis, 0 to 120) for each of 20 different sites (x-axis, 1 to 20)]
Future Work



Support a variety of file types and measure their
popularity
Build a gateway to connect different P2P systems
Deployment of such gateways
Conclusion



An HTTP-Gnutella gateway was built and served real Gnutella users.
In only 5 days, the gateway transferred about 244 MB of data from Web sites to Gnutella nodes.
The system achieved all of our design goals.
Questions?