September 26th, 2002
arne_bab
[0.7 proposal] Count search replies

I already posted this once, but a typo in the subject has clearly kept most people from ever looking inside the post.
So here it is again under a new subject; I hope you'll forgive me for posting the same thing twice.

Summary: Insert a Query Reply Count (QRC) into the queries.
The number of contacted hosts goes down for popular files.
Fewer replies are sent that never get seen.
It stops the *.mp3 query flood without ever forbidding it.

-----

You could include the number of search replies a query already got in the forwarded query.

That would sort out *.mp3 (and similar requests) without ever needing to forbid them, and it would let users find rare files more easily while still keeping network traffic down when searching for popular files.

I got that idea when I calculated how many hosts you can reach if every node has just 3 connections and you have a max-hop of 7: fewer than 3000.
I also calculated that with a max-hop of 7 you'd need more than 5 connections to be able to reach all gnutella users (about 150,000).

With HTL=7 you'd need over 17 hosts per node to reach every computer user of this world. With HTL=15 you'd need fewer than 5 (about 4.5).
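
A small sketch of the arithmetic behind these numbers, using the same simplification as the estimates above (reach = connections ^ HTL). The function names are made up for illustration and are not part of any gnutella client:

```python
def reachable_hosts(connections: int, htl: int) -> int:
    """Rough reach estimate: every node passes the query to
    `connections` others, for `htl` hops."""
    return connections ** htl


def connections_needed(users: float, htl: int) -> float:
    """Connections per node needed so that connections ** htl == users."""
    return users ** (1.0 / htl)


if __name__ == "__main__":
    print(reachable_hosts(3, 7))            # 3 connections, max-hop 7
    print(connections_needed(150_000, 7))   # all gnutella users, max-hop 7
    print(reachable_hosts(17, 7))           # 17 hosts per node, HTL 7
```

The estimate ignores loops in the network and hosts that are reached more than once, so a real network reaches fewer hosts than this.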

The approach I suggest may allow you to increase the HTL again without choking the network.

As Kazaa might go down soon, gnutella could get many more users, which means more connections would be needed to find most files, especially rare ones. Another approach would be to increase the HTL, but that would slow down the network, as popular files would get more and more replies which aren't really needed.

So I got the idea that you can reduce the number of hosts which are contacted for popular files by limiting the number of replies each query gets.

The Query Reply Count (QRC) would be included in each query. When a node sends its replies, it counts them and adds them to the QRC. If the QRC is greater than 10 afterwards, the node doesn't forward the query.
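
A minimal sketch of that forwarding rule, to make the idea concrete. The `Query` class, its field names and `handle_query` are hypothetical; this is not the real gnutella message format, only the limit of 10 comes from the proposal:

```python
from dataclasses import dataclass
from typing import Optional

QRC_LIMIT = 10  # threshold from the proposal above


@dataclass
class Query:
    search: str
    ttl: int       # remaining hops (HTL)
    qrc: int = 0   # hypothetical field: replies this query already got


def handle_query(query: Query, local_hits: int) -> Optional[Query]:
    """Return the query to forward to the neighbors, or None to drop it.

    `local_hits` is the number of replies this node sends back itself.
    """
    qrc = query.qrc + local_hits
    ttl = query.ttl - 1
    if qrc > QRC_LIMIT or ttl <= 0:
        return None  # enough replies already, or no hops left
    return Query(query.search, ttl, qrc)
```

A query like *.mp3 that matches many files on the first node is dropped right there, while a query for a rare file keeps travelling until its hops run out.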

Here are some sample calculations:

For the first one I assume that every user has 3 connections and every third user has some of the requested files. To make it easier I set the QRC limit to 1 (a node that has the file doesn't forward the query). The HTL is 7.
The number of hosts contacted without the QRC would be 3^7 = 2187 hosts.
With the QRC you have 2^7 = 128 contacted hosts and 2059 not contacted hosts, that means:
If
HTL=7
Hosts=3
popularity is 1/3
contacted hosts: o=128
not contacted hosts: x=2059

The same for
HTL 7
Hosts 6
Pop=1/3:
contacted hosts: o=16,384
not contacted hosts: x=263,552

If the popularity is smaller you have:
HTL=7
Hosts=3
Pop=1/6
o=610
x=1576

or:
HTL=7
Hosts=6
Pop=1/6
o=78,125
x=201,811
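
The four cases above can be reproduced with a few lines. This is only a sketch of the model as I read it from the numbers: a node that has the file does not forward the query, so on average hosts * (1 - popularity) neighbors pass it on at each hop. The function name and the `one_in` parameter are made up for illustration, and fractional results are rounded down:

```python
from fractions import Fraction


def qrc_reach(hosts: int, htl: int, one_in: int) -> tuple[int, int]:
    """Return (contacted, not_contacted) when every `one_in`-th user has the file.

    Exact fractions avoid floating-point noise; both values are rounded down.
    """
    popularity = Fraction(1, one_in)
    total = hosts ** htl                              # reach without the QRC
    contacted = (hosts * (1 - popularity)) ** htl     # reach with the QRC
    return int(contacted), int(total - contacted)


if __name__ == "__main__":
    for hosts, one_in in [(3, 3), (6, 3), (3, 6), (6, 6)]:
        o, x = qrc_reach(hosts, 7, one_in)
        print(f"HTL=7 Hosts={hosts} Pop=1/{one_in}: o={o} x={x}")
```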

That means if every 3rd user has the file, you contact only about 5.9% of the nodes which are in your reach (128 of 2187).
If every 6th has it, you contact only about 28% (610 of 2187).

That number increases a bit, as you want more than one reply for each request.
If only every hundredth user has the file, you'd contact 93% of the hosts.

Requests like *.mp3 stop at the first nodes and get replies from at most 5 hosts (the direct neighbors).

That trick would enable you to scale gnutella up again without creating too much traffic. Naturally, like everything else, it depends on users sharing.

Closing summary: Insert a Query Reply Count (QRC) into the queries.
The number of contacted hosts goes down for popular files.
Fewer replies are sent that never get seen.
It stops the *.mp3 flood without ever forbidding it.

Finally, I'm attaching a chart (.gif) which shows how it works for popularity = 1/3 and hosts per node = 3.
(Please ignore the "half-transparent" parts. Only the black lines count.)
Arne Bab.

Comments? Are there errors in the calculation? Something else?
Attached Images
File Type: gif qrc-chart.gif (13.2 KB, 326 views)