Searching the Smart way

A rather old article suggests that 70% of the servents are not sharing: http://www.firstmonday.dk/issues/iss...dar/index.html

Think of it: if the 70% figure is right, then 91% (1 - 0.3*0.3) of the bandwidth consumed by search requests today is wasted. Why? Because 91% of the requests go either from freeloader to freeloader, from freeloader to sharer, or from sharer to freeloader. All of these should be shaved away. In my opinion, this is the area where Gnutella clients could develop the most.

I am not a developer of Gnutella clients, I just have some ideas: don't send search requests to freeloaders. That is, you should not drop the freeloaders from your list of connected hosts; doing that would damage the network. But you should test and rate the hosts with some method. Brainstorming: send a search built up from frequent words in the user's shared directory and/or earlier searches, set the TTL to 1, and register the number of files returned. Using this number as a rating, the host list should become better and better for the user.

So, what happens to the freeloaders (the "glue" of our community)? They would be given a rating of 0 by the method above. That puts them in the same group as hosts that share files I am not interested in. You should still allow them to connect to your servent. You should answer and forward their searches. But you should never send the user's searches to them.

So, what would happen if all the clients used this strategy? I think the searches would become much more efficient and everyone would live happily ever after.

Stig Eide
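To make the brainstorming concrete, here is a rough Python sketch of the probe-and-rate idea. Everything here is made up for illustration (the function names, the `send_probe` callback, and the toy file names); a real client would send an actual TTL=1 query and count the hits that come back.

```python
# Hypothetical sketch of the probe-and-rate idea: build a test query
# from frequent words in the shared directory, send it with TTL=1 to
# each neighbour, and rate each neighbour by the number of results.
from collections import Counter

def build_probe_query(shared_filenames, num_words=3):
    """Pick the most frequent words across shared file names as a probe."""
    words = Counter()
    for name in shared_filenames:
        words.update(name.lower().replace(".", " ").replace("_", " ").split())
    return " ".join(w for w, _ in words.most_common(num_words))

def rate_hosts(hosts, send_probe):
    """Rate each host by its result count for the probe (0 = freeloader-like)."""
    return {host: send_probe(host) for host in hosts}

# Toy demonstration with a fake send_probe callback:
shared = ["miles_davis_live.mp3", "miles_davis_quintet.mp3", "jazz_mix.mp3"]
query = build_probe_query(shared)
fake_results = {"host_a": 5, "host_b": 0}   # host_b looks like a freeloader
ratings = rate_hosts(["host_a", "host_b"], fake_results.get)
```

Hosts rating 0 would stay connected and keep having their searches routed, exactly as the post suggests; they just would not receive this user's own searches.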
bad idea

You will partly stop routing search queries and either make the network inoperable or less operable. If a freeloader is between you and a sharer of a file you want, you will never get it with your routing rules. Superpeers and query caches can help to reduce traffic; since freeloaders are usually modem users, they can be shielded behind a superpeer. See also my list of anti-freeloading features for alternative ideas against freeloading: http://www.gnutellaforums.com/showth...9&pagenumber=3

Greets, Moak
If everybody sends their requests to those on their "buddy list" (those who share files they like), then the number of hits will be much higher. I am sure this can be proven both mathematically and by simulation, but today being the first day of the rest of my life, I could not be bothered ;) Stig
Hmm, without new statistics/simulations I doubt it. I think your idea will decrease horizon and availability, while supernodes together with caches will increase the horizon.
What is a supernode? Isn't that just a centralized server in disguise?
Supernodes, superpeers, ultrapeers (different names, all the same). The forum search will give you e.g. this thread: "Improving Gnutella performance" http://www.gnutellaforums.com/showth...&threadid=5254

A very short summary of superpeers and other ideas around (if you're interested):
1. A superpeer concept for dynamic traffic routing = reduces backbone traffic + improves network topology + increases horizon (more available files)
2. Search caches = reduce the high amount of doubly/multiply routed search backbone traffic
3. Swarming technology = makes use of the high amount of wasted bandwidth + will spread often-requested files + balances load + fewer "busy" servents (more available files)
4. Add more ideas here... brainstorming is always fine :)
A freeloader may provide a path to non-freeloaders. Even though it may also provide a path to other freeloaders, cutting off the path entirely will cut off all of those non-freeloaders, decreasing the number of available files even more. Also remember that a freeloader may be connected to more than one other node, which increases this possibility even more.
If you don't send a search request to a freeloader, everyone connected through that freeloader, whether freeloaders themselves or not, will never see that request, thus reducing the number of responses you get even more. That's why it will hurt.
It is true that those who are connected only through freeloaders won't see the query. But that's not an error, it's a feature! ;)

OK, time for some mathematics. Say you send out a query with TTL (time to live) 4, and say everyone is connected to 3 hosts.

The old, inefficient and stupid way: your query will reach 3**1 + 3**2 + 3**3 + 3**4 = 3 + 9 + 27 + 81 = 120 hosts. But since only 30% of these are sharing, you reach just 36 sharing hosts. The network is fed with 120 queries, and you reach 36 hosts that have files to share.

The new, efficient and Smart way: since only one of the three hosts is sharing, you send the query only to that host. He, in turn, sends it only to the sharing hosts on his list, and so on. This means you could send a query with a TTL of 120 and still cause no more network traffic than the old way. Since that reaches 120 sharing hosts, you reach 3.33 times as many as the old method. This is obvious! ;)

Stig
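The arithmetic in the flooded case can be checked in a couple of lines. This is a toy sketch; the 30% sharing figure is the thread's assumption, not a measured constant.

```python
# Toy check of the numbers in the post: fan-out 3, TTL 4, and the
# thread's assumption that only 30% of hosts share anything.
def broadcast_reach(fanout, ttl):
    """Hosts reached by a flooded query: fanout + fanout^2 + ... + fanout^ttl."""
    return sum(fanout ** hop for hop in range(1, ttl + 1))

total_hosts = broadcast_reach(3, 4)       # 3 + 9 + 27 + 81 = 120
sharing_hosts = int(total_hosts * 0.3)    # only 36 of them actually share
```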
In practice, the best way to manage this is to have two host lists: one that you receive queries from (as today) and one that you send searches to. The new list should keep improving as you drop hosts that do not respond to your searches and/or to some automatic test searches performed by the client, based on your shared directory. As for the freeloaders, no one would want them on the list of hosts to send searches to. But that is good for them: less traffic over their modems.
IMHO you are treating the symptoms; perhaps it is better to encourage sharing, as suggested in other threads.
Keeping to your example (TTL of 4, 3 connected hosts at each node), you gave us the 120-host / 36-sharer calculation above.
BUT, that is only in the current scenario, where you send a query to each connected node, regardless of whether it is a freeloader or not. Under the scenario you are proposing, refraining from sending a query to any node known as a freeloader, you end up with a different number. If 70% of the 3 connected nodes are freeloaders (2.1 ~ 2), then you'll end up with 1 (0.9) non-freeloader per 3 nodes. So: 1^1 + 1^2 + 1^3 + 1^4 = 4. Four possible nodes, in comparison to 36 possible nodes, is a drastic reduction in my opinion.

In both your case and mine, we are also assuming an even spread of freeloaders, which is obviously never the case. What if all 3 of your connected nodes turn out to be freeloaders? You'd not be sending out *any* queries to anyone.

However, I can agree that you could have a preference for nodes that seem to return more results on average, although you should never refrain from sending a Query message.

-- Mike
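Redoing the arithmetic for both scenarios with the same geometric sum makes the gap explicit (note that with an effective fan-out of 1 the sum is simply 1 + 1 + 1 + 1 = 4). This is only a sketch of the counting argument, under the thread's even-spread assumption.

```python
# Same geometric sum as before, but with the effective fan-out dropped
# to the ~1 sharing neighbour per node that the proposal would query.
def broadcast_reach(fanout, ttl):
    return sum(fanout ** hop for hop in range(1, ttl + 1))

sharing_reached_now = int(broadcast_reach(3, 4) * 0.3)   # 36 sharers today
sharing_reached_smart = broadcast_reach(1, 4)            # 1 + 1 + 1 + 1 = 4
```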
Anyways, thanks for your input. Stig Eide
Every input is welcome! :)
Just one more thing though. A TTL of 120 will not survive long, as most clients will drop messages after 7~10 hops. -- Mike
Also, say you are connected to three other nodes, and they are all freeloaders. Where does the search go now?
My measurements have shown that even if you use a TTL greater than 7, with very high probability (I'd say 99.9%, though I haven't calculated the exact number yet) you won't get back packets with a TTL > 7. So your horizon today is "always" 7 hops in the Gnutella network. Cutting the freeloaders out of your searches will therefore give you fewer hits, and raising the TTL won't help. But as you might know, there are other ways to prioritize a search/hit ... like eDonkey does.

@blb: Searches carry a TTL, a count of how many more times the search may be forwarded in the network. So the freeloaders will forward the search as long as TTL > 1 ...
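The forwarding rule described here can be sketched in a few lines. This is not code from any real client; the cap of 7 is the horizon measured in this post, and the exact drop policy varies between implementations.

```python
# Rough sketch of the TTL handling described above: a query is
# forwarded while it has TTL left, and most clients cap the total
# path length (claimed TTL plus hops already taken) at about 7.
MAX_HOPS = 7  # the ~7-hop horizon measured in the post

def should_forward(ttl, hops):
    """Forward a query only if it still has TTL left and its claimed
    ttl + hops does not exceed this client's cap."""
    return ttl > 1 and (ttl + hops) <= MAX_HOPS
```

Under this rule a freeloader forwards searches like anyone else, which is why it still costs the freeloader bandwidth even though it never answers.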
First, I know that a request with TTL=120 will (thank God) not survive; it was just to make a comparison between the old, inefficient method and my Smart method. It is easier to compare the efficiency of two methods if they consume the same amount of bandwidth. Anyways, you have to admit that the current method of sending searches blindly is inefficient?

My (Smart ;) ) method would use two lists of hosts for each client:

One list to send and forward queries to. This list should be cultivated from hosts that respond to your searches. This way you have the great benefit of being close to hosts that have files you want.

One list to receive queries from. You should not care who is on this list. But you know that these hosts prefer your files, if they are using the Smart method.

My claim is that this method will make the searches much more efficient, because searches are only sent to those who actually have files, and because the probability that the hosts seeing your search will return a hit is higher: they are closer to you in "taste".

You can think of it as insiders and outsiders. The outsiders are freeloaders and send their requests to the insiders. The insiders send the requests to other insiders. A cute picture: http://www.geocities.com/stigeide/s.html

Peace! Stig Eide
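The two-list idea can be sketched as a small data structure. The promotion rule and the fallback to all neighbours are my guesses at the unstated details, and the host identifiers are purely illustrative.

```python
# Sketch of the two-list "Smart" routing idea: queries are sent only to
# the cultivated send-list of hosts that have answered past searches,
# while any host may remain on the receive-list.
class SmartPeer:
    def __init__(self, neighbours):
        self.receive_from = set(neighbours)  # anyone may connect and query us
        self.send_to = set()                 # cultivated list of responders

    def record_hit(self, host):
        """A host answered one of our searches: promote it to the send-list."""
        if host in self.receive_from:
            self.send_to.add(host)

    def query_targets(self):
        """Route our own searches only to known responders, if we have any."""
        return self.send_to or self.receive_from  # fall back to everyone

peer = SmartPeer(["a", "b", "c"])
peer.record_hit("b")   # "b" answered one of our searches
```

The fallback matters: without it, a freshly started client (or one surrounded by freeloaders, as blb asked above) would have nowhere to send a query at all.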
But think of the underlying issue though: bandwidth. At least, I think that is your concern here. The ones who carry the biggest burden with Queries are those who cannot respond to them but merely broadcast them. So that would always be the freeloader. In my opinion, that is not such a bad thing at all.

A much better solution to this issue would be to refuse to send a result to a freeloader. QueryHits are usually bigger, even when the broadcasted Queries are summed up. By refraining from sending a QueryHit to someone known as a freeloader, you will:
1) reduce bandwidth
2) give the freeloader an incentive to contribute

I've followed some discussions here about that as well. The best one I've seen, and also something I'm looking to implement in my client, is a "rating". When the user runs his Gnutella client for the first time, his rating is 50/50. But the more he downloads without uploading, the more his ratio drops. Below a certain ratio, he/she is assumed to be a "freeloader". This data can easily be sent with a Query.

Obviously, there are a few drawbacks to that method. The first would be those who figure out what to edit to increase their ratio. The second would be where the user shares stuff that, in general, is not popular. There's a third one too, where the user could rename junk/text files to popular files in order to increase his ratio. However, with the introduction of what is known as HUGE (a hashing system), that trick will most likely not survive for long.

-- Mike
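A minimal sketch of the ratio rating described above, under stated assumptions: the 50/50 starting point is from the post, but the threshold value, the class name, and the exact ratio formula are inventions for illustration.

```python
# Hedged sketch of the upload/download ratio rating: every client
# starts at 50/50 and is flagged a freeloader once the share of
# uploads in its total traffic drops below a (hypothetical) cutoff.
FREELOADER_THRESHOLD = 0.2  # assumed cutoff, not from the post

class PeerRating:
    def __init__(self):
        self.uploaded = 50    # start at 50/50 as suggested in the post
        self.downloaded = 50

    def record_upload(self, n=1):
        self.uploaded += n

    def record_download(self, n=1):
        self.downloaded += n

    def ratio(self):
        return self.uploaded / (self.uploaded + self.downloaded)

    def is_freeloader(self):
        return self.ratio() < FREELOADER_THRESHOLD

r = PeerRating()
for _ in range(200):
    r.record_download()   # download-only behaviour drags the ratio down
```

The drawbacks Mike lists apply directly: the counters live on the client, so anyone who can edit them can fake the ratio, which is why he points at hashing (HUGE) as a partial fix.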
I personally do not believe in Mojos: IMHO Gnutella should be designed to stand against bad clients/abuse and be as free as possible (think about the web: free and easy information access was its success), and we still have many other unused possibilities left to prevent freeloading. Greets, Moak
It would only be "harmful" if other clients relied on Query Hits for passive searching.

-- Mike
I agree with Mike pretty much, though I also doubt there is a bullet-proof way of rating that cannot be faked.

Following the discussion here, I think we are somehow talking about two things: one is bandwidth, the other is how to "do something against freeloaders" ... and I don't see what the two have to do with each other. At least freeloaders are "needed" to keep the network together (though they might lower your effective horizon because of "unnecessary" hops = freeloaders in your network path). I think the bandwidth problem is a Gnutella protocol issue, not a freeloader issue ...
Another thing: cutting out those who do not respond to your queries may lead to a "smaller network" / smaller horizon for your client, but this smaller horizon may have more clients with files you look for ... or am I totally fantasizing now :) ?
Hi Mike!
Moaky Moak

PS: I know I repeat myself: I believe superpeers and swarming will be a better solution to decrease bandwidth and deal with "useless" nodes. I personally follow a way of encouraging people to share instead of tolerating freeloading. I see freeloading as a selfish symptom; people do it because they can't see the bigger context, that they hurt themselves.
But cracking in general: whatever tickles your fancy :) -- Mike
Assuming that there's a freeloader between you and me:
-- If you send a search request, I will still send a search result, because the destination is not the freeloader.
-- If the freeloader sends out a search request, neither you nor I will send a search result, because the destination is the freeloader.

I only refrain from sending the search hit if the *destination* is the freeloader. Anyone before or after this freeloader will still receive his search hits.

-- Mike
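The rule Mike clarifies here fits in a single function. This is only a sketch: the parameter names are illustrative, and how a client identifies "known freeloaders" is exactly the rating question discussed above.

```python
# Sketch of the destination rule: a QueryHit is suppressed only when
# the query's *originator* is a known freeloader; hits merely routed
# through a freeloader are still forwarded normally.
def route_query_hit(hit_destination, next_hop, known_freeloaders):
    """Return the next hop for a QueryHit, or None to drop it."""
    if hit_destination in known_freeloaders:
        return None    # originator freeloads: withhold the result
    return next_hop    # intermediate freeloaders still forward it
```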
Hi Mike, ah, I understand. Just to make sure: does this mean you want to block freeloaders from receiving any search result?
Moak: Yes.
Can I summarize your idea as: let freeloaders connect, but don't send/route search results targeted to them (drive them crazy)? *asking*