New Feature Requests: Your idea for a cool new feature, or a LimeWire annoyance that has to get changed.
Quote:
It says that a cryptographically strong hash algorithm that can produce 2^n distinct values will produce two identical hashes with 50% probability after hashing only about 2^(n/2) distinct files. With 6 hex digits, such a hash would generate only 2^24 possible values, with a 50% chance of collision after hashing only about 2^12 files (provided the hash algorithm really is cryptographically strong). That means hashing only... 4096 files before a collision becomes likely.

If we say that about 200,000 hosts are reachable at one time, each sharing a very modest average of 40 files, we need 200,000 * 40 distinct values, i.e. 8 million (about 2^23), with at most one pair of colliding files. Add connection churn and the fact that millions of Gnutella users can introduce new files at any time, and the number of needed distinct values goes over 2^32 possible hash values, so the hash must be twice as large (at least 64 bits).

The cryptographic strength of SHA-1 is not 128 bits as you think but just above 64 bits (two years ago it was estimated at about 96 bits, but cryptanalysis has shown the strength is a bit lower). SHA-1 has still not been broken, but this is one good reason why the European Union launched the NESSIE evaluation project, as did the US government. Consensus settled on SHA-256 and Whirlpool... whose estimated cryptographic strength is currently 192 bits. Tiger-160 was eliminated because of evaluation time and implementation delay (its estimated 128-bit strength is not enough for the coming decade). 128+160-bit "Bitprints" have a strength of about 192 bits, roughly the same as SHA-256. Conclusion: we must not reduce the size of SHA-1 hashes to less than 128 bits...
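The birthday-bound arithmetic above can be checked with a short script. This is a sketch: the function name and the closed-form approximation p ≈ 1 - e^(-k(k-1)/2N) are illustrative, not taken from the post.

```python
import math

def collision_probability(n_bits: int, n_items: int) -> float:
    """Birthday-bound approximation: probability that at least two of
    n_items uniformly random values collide in a space of 2**n_bits."""
    space = 2.0 ** n_bits
    # Standard approximation: p ~= 1 - exp(-k(k-1) / (2N))
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * space))

# A 6-hex-digit (24-bit) hash after 2**12 = 4096 files:
print(collision_probability(24, 4096))        # ~ 0.39, i.e. near 50%
# 8 million files (200,000 hosts * 40 files) against a 24-bit space:
print(collision_probability(24, 8_000_000))   # ~ 1.0, collisions certain
```

Note the exact 50% point is at about 1.18 * 2^(n/2) items; 2^(n/2) gives roughly 39%, which is why the post's "50% at 2^(n/2)" is a round-number shorthand.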
__________________ LimeWire is international. Help translate LimeWire to your own language. Visit: http://www.limewire.org/translate.shtml Last edited by verdyp; May 16th, 2004 at 01:15 AM. |
Philippe, you didn't understand... the SHA-1 is normally 32 digits, but once it is computed we cut it down to the first 6 digits to requery using less bandwidth. Yes, there are more chances of collision, but the search result contains the 32-digit SHA-1, which the downloader can compare against the complete SHA-1 of the file he is downloading; so no downloading of collisions, only possible collisions in search results. In the end, users get more chances to complete rare files, and the network isn't under the pressure it would be with complete-SHA-1 requeries. Do I have to understand that you prefer the Kademlia approach? I too feel that it is a less quick-and-dirty way to deal with requeries... Ciao
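The truncate-then-verify scheme described above can be sketched as follows. This is a minimal illustration assuming Gnutella's Base32 SHA-1 URN form (32 characters for 160 bits); the helper names are made up, not LimeWire code.

```python
import base64
import hashlib

def full_sha1_base32(data: bytes) -> str:
    """Gnutella-style SHA-1 URN body: 160 bits in Base32 = 32 characters."""
    return base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")

def truncated_sha1(data: bytes, n_chars: int = 6) -> str:
    """The short prefix actually sent in the requery (illustrative helper)."""
    return full_sha1_base32(data)[:n_chars]

def verify(result_full_hash: str, wanted_full_hash: str) -> bool:
    # A responding host matches on the short prefix, but the downloader
    # always compares the full 32-character hash carried in the search
    # result, so a prefix collision can never cause a wrong download.
    return result_full_hash == wanted_full_hash
```

The bandwidth saving comes from the requery carrying 6 characters instead of 32; the full-hash check moves collision handling to the receiving client, where it costs nothing on the wire.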
__________________ Liens d'intérêt /Links of interest: Gnutellaforums en français /The House's rules you have to respect / First search the forum, then create a thread / Free software alternatives! - Logiciels alternatifs gratuits!/ |
OK, I had not understood that you added the verification step after finding a reference. That's a good idea which is statistically correct. Since a 6-hex-digit hash produces collisions 50% of the time for roughly every 4096 randomly found files, the total risk of collision when querying a source will be below 1% (but it will not be zero, and that's why the verification step is required!). So the overhead of verifying the source with the complete hash will be extremely small compared to the gain in bandwidth for locating candidate sources.

This statistical optimization is similar to the one performed in QRP with 16-bit hash values (64K tables), whose cryptographic strength is about 6 bits (64 average values, enough to cut traffic to 2% for moderately filled QRP tables). If QRP tables are filled at 25%, the collision risk is estimated at roughly 80%, and QRP loses its capacity to filter searches efficiently; however, for the more than 80% of leaf nodes whose QRP tables are filled below 10%, the collision risk for non-matching keywords falls under 25%, which means QRP can filter 75% of queries sent to shielded leaf nodes, saving much bandwidth on UltraPeers and allowing them to support more leaf nodes.
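The QRP filtering idea mentioned above can be illustrated with a toy model. Assumptions to flag: CRC32 stands in for LimeWire's real QRP hash function, and the "forward only if every keyword hits a set slot" rule is the simplified textbook form; the exact percentages in the post come from LimeWire's own measurements and tuning, which this sketch does not reproduce.

```python
import zlib

TABLE_SLOTS = 1 << 16  # the 64K-entry table size discussed above

def qrp_slot(keyword: str) -> int:
    """Illustrative stand-in for LimeWire's QRP keyword hash (CRC32 here)."""
    return zlib.crc32(keyword.lower().encode("utf-8")) % TABLE_SLOTS

def build_table(shared_keywords):
    """A leaf's routing table: the set of slots its shared keywords hash to."""
    return {qrp_slot(kw) for kw in shared_keywords}

def should_forward(query_keywords, table) -> bool:
    # An UltraPeer forwards a query to a leaf only if every keyword hits
    # a set slot; sparsely filled tables therefore drop most non-matching
    # queries, while fuller tables pass more false positives through.
    return all(qrp_slot(kw) in table for kw in query_keywords)
```

The fill ratio of the table directly controls the false-forward rate, which is why the post distinguishes leaves below 10% fill from those near 25%.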
OK Philippe, glad you got my idea. I was referring to Kademlia as it is the other alternative to the scheme I proposed for requeries... i.e. I want LW to ABSOLUTELY include requeries. I just want to discuss with folks which way is best. Ciao
Although I don't like requeries, there is a way to limit the number of keywords in queries: look for alternative subsets of keywords in separate queries.
You know, Philippe, I'm wary of requeries too; they could destroy the network. But since LW clients are clustered together, that diminishes the impact of other badly behaving clients. Yes, QRP content would have to be changed to match truncated SHA-1s.

I'm thinking: once a leaf requeries, why wouldn't the UP remember it and, say, send the accumulated hashes from all its leaves in one big search twice an hour? That way the UP could control requeries, and it could structure those hashes in a way that saves bandwidth. I'm just proposing here, as I'm not a protocol guru like you, Philippe. The requeries I want are also automated, not manual, and they happen only when a host needs more sources (one query per hour per client max). Also, without them Gnutella will never be a good place to look for rare big files like videos; requeries would bring diversity to big files, an advantage of eMule and Overnet.
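The UltraPeer batching proposal above could look something like this sketch. Everything here is hypothetical (class and method names invented to illustrate the idea, not LimeWire code): the UP collects truncated hashes from its leaves and flushes them as one aggregated search on a fixed schedule.

```python
import time

class RequeryBatcher:
    """Sketch of the proposal: an UltraPeer accumulates truncated hashes
    from its leaves and sends them as one combined search per interval."""

    def __init__(self, interval_seconds: float = 1800.0):  # twice an hour
        self.interval = interval_seconds
        self.pending = set()
        self.last_flush = time.monotonic()

    def add(self, truncated_hash: str) -> None:
        # A set collapses duplicate requests from different leaves for free,
        # which is part of the bandwidth saving the post describes.
        self.pending.add(truncated_hash)

    def flush_if_due(self, send) -> None:
        """Call periodically; `send` receives one sorted batch of hashes."""
        now = time.monotonic()
        if self.pending and now - self.last_flush >= self.interval:
            send(sorted(self.pending))
            self.pending.clear()
            self.last_flush = now
```

Rate limiting per leaf (the "one query per hour per client" rule) would sit in front of `add`, on the UltraPeer side.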
If you want to contribute, there's a Chord subproject in LimeWire, which has been stale for a long time and is experimental, but which could be the base of such a search architecture for unique IDs. In Chord, some hosts are responsible for managing routes for some "finger bits". Those hosts should preferably be UltraPeers (they need long availability to minimize the cost of structure updates), but for now we don't have enough UltraPeers running the latest version to deploy this parallel network.

Now that LimeWire 4.0 will be officially released, and that it will clearly demonstrate an advance in the efficiency of Gnutella searches with much reduced bandwidth usage, the LimeWire cluster of UltraPeers could become a good candidate for starting to experiment with Chord (or other structured topologies). Before that, we need to see the impact of the LimeWire 4.0 deployment on Gnutella. The next weeks will be used to monitor the evolution of traffic and see how much we can gain on the overall network (and also to correct or fine-tune some statistical or usage-policy parameters).

For now, I'm not inclined to change the QRP algorithm and routing. It's probably better to think about a parallel topology with separate routing tables for such searches by URNs. QRP was designed and tuned mostly for keyword searches, with a very weak distribution of keywords. Including URNs in QRP had a negative impact on routing efficiency (compensated by the introduction of the low-TTL/high-degree topology in association with dynamic querying). My opinion is that if we can remove URNs from QRP tables, the network will be better off, but we can't do that before a new, more appropriate routing topology is made and tested for URNs, then deployed, tuned and scaled.
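Chord's core lookup rule, which the subproject mentioned above builds on, fits in a few lines. This is a minimal sketch of the successor rule only (no finger tables, no joins), with a deliberately tiny 16-bit identifier space for illustration; real Chord uses the full 160-bit SHA-1 space.

```python
import hashlib
from bisect import bisect_right

M = 16  # small identifier space for illustration; Chord uses 160 bits

def node_id(name: str) -> int:
    """Hash a node name or key onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << M)

class ChordRing:
    """Chord's successor rule: the host responsible for a key is the
    first node whose id follows the key's id around the ring."""

    def __init__(self, node_names):
        self.ids = sorted(node_id(n) for n in node_names)

    def successor(self, key: str) -> int:
        k = node_id(key)
        i = bisect_right(self.ids, k)
        return self.ids[i % len(self.ids)]  # wrap around the ring
```

In the scenario discussed above, the node names would be long-lived UltraPeers and the keys would be file URNs, so a URN lookup is routed to one responsible host instead of being flooded.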
Last edited by verdyp; May 16th, 2004 at 08:38 AM.
Any chance someone can start up a new thread with the requery/DHT discussion, and/or copy the already-requested features in this thread to a new thread that I'll make sticky?... It'd be very useful to _just_ have feature requests in a thread. |