#7
April 8th, 2002
Nosferatu
Daemon
 
Join Date: March 25th, 2002
Location: Romania
Posts: 64
Re: Re: Re: Re: Re: Re: Search based blocking and network clustering

Quote:
Originally posted by Smilin' Joe Fission
Until the system starts becoming abused. Yes, it will be abused.
Well, this is an opinion, not a fact. Below you ask me to present hard statistics.
Can you at least tell me what type of abuse you think is inevitable?
Quote:
Have you personally counted all of the Gnutella clients?
Why do you want me to do it personally? You trust me more than other sources?!
Quote:
Have you counted how many of those clients are active at one time? I can pull a number out of my butt too, but unless I can guarantee its accuracy, it's meaningless. I think you're severely overestimating the number of active Gnutella clients out there.
OK, looks like you got me - I was guessing.
Here is the real figure: it's around 300k to half a million at the moment (http://www.limewire.com/index.jsp/size).
I guess their guess is better than mine. It's around 30 times the generally accepted average horizon size.
Quote:
So that gives you the right to break the network?
'Break' is your opinion, not mine. Can you at least put forward an argument showing how the network would be 'broken' after this change in a way that it is not 'broken' now, or broken to a much larger degree?
Quote:
Think of it this way... I'm connected to the network, but I just happen to be connected to 10 hosts that are searching for terms that you deem unwanted. These 10 hosts I'm connected to are connected to other clients which support search blocking. Even if I'm searching for something legitimate, under your proposal, the clients searching for inappropriate terms will be disconnected. I and the thousand or more other hosts, being unfortunate enough to connect to these 10 hosts now have an ever shifting, ever changing horizon because they're being disconnected all the time.
OK, the scenario you have proposed seems unlikely to me and the logic is inconsistent.

You are assuming that the 10 randomly selected hosts you have chosen are ALL searching for something widely considered inappropriate. This is already unlikely, but no doubt will happen very, very occasionally.

By definition, because this term is widely considered inappropriate, and as you say you are searching for only things widely considered legitimate, then the next ten hosts you pick up are pretty much guaranteed to accept your searches and connections.
Quote:
So, you're not only hurting the ones initiating the improper searches, but you're also affecting the hundreds or even thousands of clients connected to them.
Degree of likelihood that all clients connected to one of these 'naughty' guys are ALL connected to ALL ten of them and no one else: 0.00000000..etc..00001 %
Considered 'impossible'.
Even if the impossible happened, all anyone would experience is 10 seconds to a minute of searching for 10 new hosts. The other 'hundreds or even thousands of clients' who, in your impossible scenario, are all connected to these 10 'naughty' hosts have been receiving pongs through them right up until the 'naughty' guys perform their 'naughty' search. They will therefore already know of a great number of 'nice' hosts, so they should find a new one without even having to visit a host cache.

Remember, the above is not going to happen.

'Naughty' searchers are going to appear rarely, one at a time.

There is a scenario where what you describe is going to happen: during start up, if a very large number of people adopt the strategy of 'specialising' their searches.

Let's look at it this way. I will try to describe a reasonable, but worst-case, scenario, where you the user are not searching for something considered inappropriate by most people.

Say a very high number of people think specialist searches implemented using grep are a good idea, i.e. search me for iso files only, or search me for mp3s only. What do you think the upper limit would be? 40% of people might think this way? I think that is a very, very conservative figure.

OK, for a very back-of-the-envelope kind of figure,

n * p = t
where
n: total number of trials
p: probability that a connection will not reject you when you search
t: target number of host connections

gives
n = t/p

If we say p = 0.6 (ie 60% 'good' connections, 40% 'bad')
and you want to keep up 10 connections,
n = 10/0.6
n = 16.67

On average, a user searching for something which the specialists reject has to talk to 17-odd hosts at startup in order to establish 10 good hosts.

This does not consider any additional host-rejection scenarios.

We can generalise the answer by setting t = 1:
n = 1/0.6
n = 1.67 (about 1.7)
You have to connect to, on average, about 1.7 times as many hosts, if there are 40% of people wanting to specialise and you search for something else. 40% is an astonishingly high proportion, and you have said yourself that you don't think this idea will take off at all.

How about if we assume 20%, still a very high figure, but perhaps a realistic high-point.
p = 0.8
n = 1/0.8
n = 1.25

Only a 25% increase in number of initial host connections required at startup.
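
For anyone who wants to check the back-of-the-envelope figures above, here is a rough Python sketch. It assumes every host accepts or rejects you independently with the same probability p, which is my simplification and not a measured property of the network. The function names (expected_attempts, simulate_attempts) are made up for illustration only.

```python
import random


def expected_attempts(target, p_accept):
    """Average number of hosts to try in order to keep `target` good ones (n = t / p)."""
    if not 0 < p_accept <= 1:
        raise ValueError("p_accept must be in (0, 1]")
    return target / p_accept


def simulate_attempts(target, p_accept, runs=20000, seed=1):
    """Monte Carlo cross-check: keep trying hosts until `target` of them accept."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    total = 0
    for _ in range(runs):
        accepted = 0
        attempts = 0
        while accepted < target:
            attempts += 1
            # Assumption: each host accepts independently with probability p_accept.
            if rng.random() < p_accept:
                accepted += 1
        total += attempts
    return total / runs


if __name__ == "__main__":
    print(round(expected_attempts(10, 0.6), 2))  # 40% specialists, 10 connections
    print(expected_attempts(1, 0.8))             # 20% specialists, per connection
    print(simulate_attempts(10, 0.6))            # should land close to the first figure
```

The simulation is only there as a sanity check on n = t/p; it says nothing about how many people would actually specialise.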

And as I said before, this ignores the effect of hosts caching hosts who are similar to themselves. (Perhaps this effect would be insignificant anyway until you have done a few searches).

Anyway, I guess this means it might be a good idea, when you are rejected by a host and have plenty of hosts in your cache, to delete the rejecting host from your cache. That increases the chances that clients cache hosts with similar search/drop criteria.
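
A minimal sketch of that cache-pruning idea, in Python. Everything here is hypothetical: the name handle_rejection, the use of a plain set as the host cache, and the min_spare threshold of 20 are my own placeholders, not anything taken from an existing Gnutella client.

```python
def handle_rejection(host_cache, rejecting_host, min_spare=20):
    """Drop a host that rejected us, but only if the cache is comfortably full.

    host_cache     -- set of host addresses we know about (assumed representation)
    rejecting_host -- address of the host that just refused our search/connection
    min_spare      -- assumed threshold for "plenty of hosts"; tune to taste
    Returns True if the host was forgotten, False if we kept it.
    """
    if len(host_cache) > min_spare:
        # Plenty of alternatives: forget the host whose criteria don't match ours.
        host_cache.discard(rejecting_host)
        return True
    # Cache is thin: keep the host rather than risk running out of candidates.
    return False


if __name__ == "__main__":
    cache = {"10.0.0.%d" % i for i in range(1, 31)}  # 30 made-up addresses
    print(handle_rejection(cache, "10.0.0.5"))  # cache is large, so the host is dropped
    print(len(cache))
```

Over time the cache would drift towards hosts that accept your kind of search, which is the clustering effect I am after.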
Quote:
You're constantly changing their horizons.
Please demonstrate how this is a bad thing. Many of the clients I have tried have constantly changing hosts, ie constantly changing horizons. So what? So the clients I search in ten minutes are different clients to the ones I searched ten minutes ago? I think this is a good thing.

I am more likely to find a result, eventually. I re-search every ten minutes, and get a different group of results. I can still download from most of the machines I located ten minutes ago, if I don't find anything. The only ones I might not be able to download from are firewalled IPs.
Quote:
I do huh? Et tu? I suppose you're the master of statistics. Do YOU have the "statistics" to support your claims?
Maybe. Depends how badly you want them, how much time I have, and whether someone else provides an answer first.

Added later: oops, I confused statistics with probability. No, I do not have the statistics, but I can model guessed probabilities; see later posting.
Quote:
Didn't think so.
You should let people answer before you answer them back.
Quote:
So I'm telling you what I think will happen to the network if your proposal is implemented. Unless you have the data to back up your claims, all you're doing is saying what you THINK will happen if your proposal is implemented.

Fine. Get back to me when that happens.
...
In other words "Don't bother saying anything unless you agree with me." That sure makes for a great debate.
Well, I said what I think and you said what you think. I was simply saying that people shouldn't flood the channel if they can't add anything not already covered. That makes for a good debate too.

Nos

"We can't train that boy as a Jedi because he is too old and too full of fear"

Last edited by Nosferatu; April 9th, 2002 at 12:53 AM.