Design and Implementation of HTTP-Gnutella Gateway

Baoning Wu (baw4), Wei Zhang (wez5)
CSE Department, Lehigh University

Motivation
- Peer-to-peer networking is a hot topic.
- Can P2P nodes search and get files from Web sites?
- Can one P2P network search and get files from other P2P networks?
- In our project, we have built a special gateway between Gnutella and Web sites.

Related Work
- David McNab has launched a Freenet search engine.
- Asiayeah is a Gnutella search engine.
- Filedonkey.com is an eDonkey search engine.
- Kalepa Networks, Inc. is working on connecting different P2P systems.
- Our work goes in roughly the reverse direction of the above: it lets Gnutella nodes search and fetch Web content.

Mechanism of Gnutella Searching
- Node A sends a query to its neighbor B.
- Node B broadcasts the query to its neighbors C and D.
- Node C has the objects node A needs and returns a query hit message to node B.
- Node B forwards the query hit message back toward node A by consulting its local state.

Architecture of HTTP-Gnutella Gateway

Mechanism of the gateway
1. Node A broadcasts a query message, directly or indirectly, to the HTTP-Gnutella gateway.
2. The gateway translates the query message and forwards it to the search engine.
3. The search engine returns a set of query results to the gateway.
4. The gateway translates the results into Gnutella format and forwards them to node A.
5. If node A initiates a download request to the gateway, the gateway translates the Gnutella request into a well-formed HTTP request to the Web server.
6. The gateway fetches the data from the Web server.
7. The gateway forwards the data from the Web server to node A.

Handle Query Messages
- We still use the original Gnutella mechanism to decide whether to forward a message.
- The gateway captures all queries with a hop count below 5 and sends them to the search engine.

Search Engine API
- The Google search engine API has a limit of up to 1,000 requests per day.
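A minimal sketch of how the hop-count rule and the daily request limit above might be combined when deciding which queries go to the search engine. It assumes the 23-byte Gnutella 0.4 descriptor header (16-byte message ID, payload type, TTL, hops, little-endian payload length). The names `Header`, `parse_header` and `QueryFilter` are hypothetical and are not taken from the gateway's actual code.

```python
import struct
from dataclasses import dataclass

QUERY = 0x80        # Gnutella 0.4 payload descriptor value for a Query
MAX_HOPS = 5        # from the slides: capture queries with hops < 5
DAILY_LIMIT = 1000  # from the slides: up to 1,000 API requests per day


@dataclass
class Header:
    muid: bytes   # 16-byte message ID
    kind: int     # payload descriptor (0x80 = Query)
    ttl: int
    hops: int
    length: int   # payload length in bytes


def parse_header(raw: bytes) -> Header:
    """Parse the 23-byte descriptor header (assumed Gnutella 0.4 layout)."""
    muid, kind, ttl, hops, length = struct.unpack("<16sBBBI", raw[:23])
    return Header(muid, kind, ttl, hops, length)


class QueryFilter:
    """Decides whether a message should be sent to the search engine."""

    def __init__(self, daily_limit: int = DAILY_LIMIT):
        self.daily_limit = daily_limit
        self.used = 0  # assumption: the caller resets this once per day

    def should_capture(self, header: Header) -> bool:
        # Only Query messages that have travelled fewer than MAX_HOPS hops.
        if header.kind != QUERY or header.hops >= MAX_HOPS:
            return False
        # Stop once the daily search-engine budget is spent.
        if self.used >= self.daily_limit:
            return False
        self.used += 1
        return True
```

Forwarding to other Gnutella neighbors is left to the ordinary Gnutella rules, as the slide states; the filter only governs what is passed to the search engine.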
- The search engine API consists of three main functions:
  - Query conversion
  - Extraction of URLs
  - Measurement of content size

Generate Query Hit Messages
- Two considerations:
  - Let Gnutella nodes contact Web servers directly
  - Let the gateway work as a proxy
- The gateway fills in its own IP address and a specific port number (currently 9999) in the query hit messages.
- File names are the URLs of Web objects.

Downloading Service
- Translate the Gnutella download request into a well-formed HTTP request, e.g.

    GET /get/1234/http://www.foo.com/foo.mp3 HTTP/1.1
    User-Agent: Gnutella
    Host: 123.123.123.123:6346

  =>

    GET http://www.foo.com/foo.mp3 HTTP/1.1
    User-Agent: Gnutella
    Host: www.foo.com

- It should handle Gnutella handshakes properly.
- It also records the bytes transferred.

Problems & Solutions
- Irregular handshakes: we handle all possibilities.
- File size: we use an HTTP HEAD request to get the file size.
- Broken pipe signal: we use a forked process.

Experiment Results Outline
- Basic verification and validation
- Log file format
- Results #1 to #4

Basic Verification & Validation
- Run our gateway on machine 1 and a normal gtk-gnutella client on machine 2.
- After machine 2 connects to machine 1, machine 2 sends query messages and download requests to machine 1.
- For each file downloaded through machine 1, we use wget to fetch the same file directly from the Web server and diff to check that the two copies are identical.

Log File Format
- Log 1: time stamp, MUID, IP address, type, query
- Log 2: time stamp, IP address, URL, size, code, success

Result #1
- No. of Query messages: 319,245
- No. of Query Hit messages: 930,860
- No. of served requests: 113,391
- Average response time: 16.33 seconds

Result #2
[Chart: number of requests (0 to 100,000) versus number of responses (1 to 9)]

Result #3
- No. of download requests: 952
- No. of different IP addresses: 67
- No. of served requests: 945
- No. of successfully served requests: 740
- Total size transferred: 244,227,881 bytes
- Average response time: 3.15 seconds
- Average total download time: 15.92 seconds

Result #4
[Chart: number of downloaded files (0 to 120) versus different sites (1 to 20)]

Future Work
- Support a variety of file types and measure their popularity
- Build a gateway to connect different P2P systems
- Deployment of such gateways

Conclusion
- An HTTP-Gnutella gateway was built and worked for Gnutella users.
- In only 5 days, the gateway transferred about 244 MB of data from Web sites to Gnutella nodes.
- The system achieved all of our design goals.

Questions?
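For reference, the request translation shown under Downloading Service can be sketched as below. This is only an illustration under stated assumptions: the request arrives as text with CRLF line endings, the target has the form `/get/<index>/<URL>`, and the function name `translate_request` is hypothetical rather than the gateway's own.

```python
from urllib.parse import urlsplit


def translate_request(raw: str) -> str:
    """Rewrite a Gnutella download request into an HTTP request for the Web server.

    Assumes a request line such as
    'GET /get/1234/http://www.foo.com/foo.mp3 HTTP/1.1'.
    """
    head, _, _ = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)

    # '/get/1234/http://...' -> ['', 'get', '1234', 'http://...']
    parts = target.split("/", 3)
    if method != "GET" or len(parts) != 4 or parts[1] != "get":
        raise ValueError("not a Gnutella download request")

    url = parts[3]
    host = urlsplit(url).netloc
    if not host:
        raise ValueError("file name is not an absolute URL")

    out = [f"{method} {url} {version}"]
    saw_host = False
    for line in lines[1:]:
        if line.lower().startswith("host:"):
            # Point the Host header at the Web server, not the gateway.
            out.append(f"Host: {host}")
            saw_host = True
        else:
            out.append(line)
    if not saw_host:
        out.append(f"Host: {host}")
    return "\r\n".join(out) + "\r\n\r\n"
```

Other headers, such as User-Agent, are passed through unchanged, matching the example on the slide. Handshake handling and byte accounting are outside this sketch.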