crawling: routing and bandwidth conservation

Hi, I'm working on implementing a crawler which will eventually provide xcache-like functionality with some additional bells and whistles. I'm currently using jtella for the Gnutella stack implementation.
For the implementation I'm researching ways to perform crawling by repurposing PONG/QUERYHIT messages without the need to flood the network with PINGs, which I think is a good thing as most crawlers waste precious bandwidth.
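To make "repurposing PONGs" concrete, here is a minimal sketch of the harvesting step I have in mind. It assumes the 0.4 PONG payload layout as I read it (2-byte little-endian port, then a 4-byte IPv4 address in network order, then the file and KB counts). `PongHarvester` is a name I made up for illustration, not a jtella class, and how the raw payload bytes are obtained from jtella is left out.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of jtella.
final class PongHarvester {
    // Assumed 0.4 PONG payload: port (2 bytes, little-endian), IPv4 address
    // (4 bytes, network order), files shared (4 bytes), KB shared (4 bytes).
    static final int PONG_PAYLOAD_LENGTH = 14;

    // Hosts learned so far, in the order they were first seen.
    private final Set<String> hosts = new LinkedHashSet<>();

    /** Extracts "ip:port" from a PONG payload. */
    static String parseHost(byte[] payload) {
        if (payload == null || payload.length < PONG_PAYLOAD_LENGTH) {
            throw new IllegalArgumentException("PONG payload must be at least 14 bytes");
        }
        // Mask with 0xFF because Java bytes are signed.
        int port = (payload[0] & 0xFF) | ((payload[1] & 0xFF) << 8);
        String ip = (payload[2] & 0xFF) + "." + (payload[3] & 0xFF) + "."
                + (payload[4] & 0xFF) + "." + (payload[5] & 0xFF);
        return ip + ":" + port;
    }

    /** Records the host advertised by a PONG; returns true if it was new. */
    boolean observe(byte[] payload) {
        return hosts.add(parseHost(payload));
    }

    Set<String> hosts() {
        return Collections.unmodifiableSet(hosts);
    }
}
```

The crawler would call `observe` for every PONG that passes through it, whether or not it sent the matching PING. QUERYHITs carry a host address too, but at a different offset, so they would need their own parser.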
In my initial investigation I'm seeing a lot of traffic that, if my reading of the protocol is correct, I shouldn't be seeing. Specifically, when I connect to a single host (e.g. connect1.gnutellanet.com:6346) and never send a PING, PONGs and QUERYHITs are still forwarded to me.
My question: is this expected behavior (PONGs/QUERYHITs are broadcast) or are there just a lot of broken implementations?
It's my understanding that the only messages which are broadcast (i.e. sent to all connected servents except for the sender) are PING and QUERY messages. All other messages (PONG and QUERYHIT messages) are "routed" back to the sender: each servent records the GUID of a message and the connection it arrived on, and uses that information to send the reply back the same way.
Are these assumptions correct?
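To spell out the behavior I'm assuming, here is a rough sketch of the forwarding decision as I understand it. It is not taken from jtella or any other servent: `RouteTable` and its method are names of my own, connections are plain string ids, GUIDs are plain strings, and TTL/hops handling and table expiry are left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the routing rules described above; not a jtella class.
final class RouteTable {
    // Payload descriptor values from the 0.4 protocol header.
    static final int PING = 0x00;
    static final int PONG = 0x01;
    static final int QUERY = 0x80;
    static final int QUERY_HIT = 0x81;

    // GUID of a request -> id of the connection it first arrived on.
    private final Map<String, String> requestSource = new HashMap<>();

    /** Returns the connection ids the message should be forwarded to. */
    List<String> route(int type, String guid, String from, List<String> connections) {
        if (type == PING || type == QUERY) {
            if (requestSource.containsKey(guid)) {
                return Collections.emptyList(); // duplicate request: drop it
            }
            requestSource.put(guid, from);
            // Broadcast: every connection except the one it came from.
            List<String> targets = new ArrayList<>(connections);
            targets.remove(from);
            return targets;
        }
        if (type == PONG || type == QUERY_HIT) {
            // Replies reuse the request's GUID and follow the recorded path back.
            String back = requestSource.get(guid);
            return back == null
                    ? Collections.<String>emptyList() // never saw the request: drop it
                    : Collections.singletonList(back);
        }
        return Collections.emptyList(); // other message types are not modeled here
    }
}
```

Under this model a PONG or QUERYHIT only ever goes out on one connection, and only if the matching PING or QUERY came in on it. A node that has only ever received requests over a connection, and never sent or forwarded one onto it, should therefore see no replies on it, which is why the traffic I described surprises me.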