New Feature Requests: Your idea for a cool new feature, or a LimeWire annoyance that has to get changed.
Quote:
It says that a cryptographically strong hash algorithm that can produce 2^n distinct values will produce two identical hashes with 50% probability after hashing only about 2^(n/2) distinct files. With 6 hex digits, such a hash would generate only 2^24 possible values, with a 50% chance of collision after hashing only about 2^12 files (provided the hash algorithm really is cryptographically strong). That means hashing only... 4096 files before a collision becomes likely.

If we say that about 200,000 hosts are reachable at one time, each sharing a very modest average of 40 files, we need 200,000 * 40 distinct values, i.e. 8 million (about 2^23), with at most one pair of colliding files. Add connection churn and the fact that millions of Gnutella users can introduce new files at any time, and the number of needed distinct values goes over 2^32 possible hash values, so the hash must be twice as large (at least 64 bits).

The cryptographic strength of SHA-1 is not 128 bits as you think but just above 64 bits (two years ago it was estimated at about 96 bits, but cryptanalysis has shown the strength is a bit lower). SHA-1 has still not been broken, but this is one good reason why the European Union launched the NESSIE evaluation project, as did the US government. Consensus settled on SHA-256 and Whirlpool... whose estimated cryptographic strength is currently 192 bits. Tiger-160 was eliminated because of evaluation time and implementation delay (its estimated 128-bit strength is not enough for the coming decade). 128+160-bit "Bitprints" have a strength of about 192 bits, roughly the same as SHA-256. Conclusion: we must not reduce the size of SHA-1 hashes to less than 128 bits...
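The birthday-bound arithmetic above can be checked with a short script. This is a sketch: the function name and the closed-form approximation p ≈ 1 - e^(-k(k-1)/2N) are illustrative, not taken from the post.

```python
import math

def collision_probability(n_bits: int, n_items: int) -> float:
    """Birthday-bound approximation: probability that at least two of
    n_items uniformly random values collide in a space of 2**n_bits."""
    space = 2.0 ** n_bits
    # Standard approximation: p ~= 1 - exp(-k(k-1) / (2N))
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * space))

# A 6-hex-digit (24-bit) hash after 2**12 = 4096 files:
print(collision_probability(24, 4096))        # ~ 0.39, i.e. near 50%
# 8 million files (200,000 hosts * 40 files) against a 24-bit space:
print(collision_probability(24, 8_000_000))   # ~ 1.0, collisions certain
```

Note the exact 50% point is at about 1.18 * 2^(n/2) items; 2^(n/2) gives roughly 39%, which is why the post's "50% at 2^(n/2)" is a round-number shorthand.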
__________________ LimeWire is international. Help translate LimeWire to your own language. Visit: http://www.limewire.org/translate.shtml Last edited by verdyp; May 16th, 2004 at 01:15 AM. |
Philippe, you didn't understand... the SHA-1 is normally 32 digits, but once it is computed we cut it down to the first 6 digits to requery using less bandwidth. Yes, there are more chances of collision, but the search result contains the 32-digit SHA-1, which the downloader can compare against the complete SHA-1 of the file he is downloading; so no downloading of collisions, only possible collisions in search results. In the end, users get more chances to complete rare files, and the network isn't under the pressure it would be with complete-SHA-1 requeries. Do I have to understand that you prefer the Kademlia approach? I too feel that it is a less quick-and-dirty way to deal with requeries... Ciao
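The truncate-then-verify scheme described above can be sketched as follows. This is a minimal illustration assuming Gnutella's Base32 SHA-1 URN form (32 characters for 160 bits); the helper names are made up, not LimeWire code.

```python
import base64
import hashlib

def full_sha1_base32(data: bytes) -> str:
    """Gnutella-style SHA-1 URN body: 160 bits in Base32 = 32 characters."""
    return base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")

def truncated_sha1(data: bytes, n_chars: int = 6) -> str:
    """The short prefix actually sent in the requery (illustrative helper)."""
    return full_sha1_base32(data)[:n_chars]

def verify(result_full_hash: str, wanted_full_hash: str) -> bool:
    # A responding host matches on the short prefix, but the downloader
    # always compares the full 32-character hash carried in the search
    # result, so a prefix collision can never cause a wrong download.
    return result_full_hash == wanted_full_hash
```

The bandwidth saving comes from the requery carrying 6 characters instead of 32; the full-hash check moves collision handling to the receiving client, where it costs nothing on the wire.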
__________________ Liens d'intérêt /Links of interest: Gnutellaforums en français /The House's rules you have to respect / First search the forum, then create a thread / Free software alternatives! - Logiciels alternatifs gratuits!/ |
OK, I had not understood that you added the verification step after finding a reference. That's a good idea which is statistically correct. Since a 6-hex-digit hash produces collisions 50% of the time for roughly every 4096 randomly found files, the total risk of collision when querying a source will be below 1% (but it will not be zero, and that's why the verification step is required!). So the overhead of verifying the source with the complete hash will be extremely small compared to the gain in bandwidth for locating candidate sources.

This statistical optimization is similar to the one performed in QRP with 16-bit hash values (64K tables), whose cryptographic strength is about 6 bits (64 average values, enough to cut traffic to 2% for moderately filled QRP tables). If QRP tables are filled at 25%, the collision risk is estimated at roughly 80%, and QRP loses its capacity to filter searches efficiently; however, for the more than 80% of leaf nodes whose QRP tables are filled below 10%, the collision risk for non-matching keywords falls under 25%, which means QRP can filter 75% of queries sent to shielded leaf nodes, saving much bandwidth on UltraPeers and allowing them to support more leaf nodes.
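The QRP filtering idea mentioned above can be illustrated with a toy model. Assumptions to flag: CRC32 stands in for LimeWire's real QRP hash function, and the "forward only if every keyword hits a set slot" rule is the simplified textbook form; the exact percentages in the post come from LimeWire's own measurements and tuning, which this sketch does not reproduce.

```python
import zlib

TABLE_SLOTS = 1 << 16  # the 64K-entry table size discussed above

def qrp_slot(keyword: str) -> int:
    """Illustrative stand-in for LimeWire's QRP keyword hash (CRC32 here)."""
    return zlib.crc32(keyword.lower().encode("utf-8")) % TABLE_SLOTS

def build_table(shared_keywords):
    """A leaf's routing table: the set of slots its shared keywords hash to."""
    return {qrp_slot(kw) for kw in shared_keywords}

def should_forward(query_keywords, table) -> bool:
    # An UltraPeer forwards a query to a leaf only if every keyword hits
    # a set slot; sparsely filled tables therefore drop most non-matching
    # queries, while fuller tables pass more false positives through.
    return all(qrp_slot(kw) in table for kw in query_keywords)
```

The fill ratio of the table directly controls the false-forward rate, which is why the post distinguishes leaves below 10% fill from those near 25%.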
OK Philippe, glad you got my idea. I was referring to Kademlia as it is the other alternative to the scheme I proposed for requeries... i.e. I want LW to ABSOLUTELY include requeries. I just want to discuss with folks which way is best. Ciao
Although I don't like requeries, there is a way to limit the number of keywords in queries: look for alternative subsets of keywords in separate queries.
You know, Philippe, I'm wary of requeries too; they could destroy the network. But since LW clients are clustered together, that diminishes the impact of other badly behaving clients. Yes, QRP content would have to be changed to match truncated SHA-1s.

I'm thinking: once a leaf requeries, why wouldn't the UP remember it and, say, send the accumulated hashes from all its leaves in one big search twice an hour? That way the UP could control requeries, and it could structure those hashes in a way that saves bandwidth. I'm just proposing here, as I'm not a protocol guru like you, Philippe. The requeries I want are also automated, not manual, and they happen only when a host needs more sources (one query per hour per client max). Also, without them Gnutella will never be a good place to look for rare big files like videos; requeries would bring diversity to big files, an advantage of eMule and Overnet.
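The UltraPeer batching proposal above could look something like this sketch. Everything here is hypothetical (class and method names invented to illustrate the idea, not LimeWire code): the UP collects truncated hashes from its leaves and flushes them as one aggregated search on a fixed schedule.

```python
import time

class RequeryBatcher:
    """Sketch of the proposal: an UltraPeer accumulates truncated hashes
    from its leaves and sends them as one combined search per interval."""

    def __init__(self, interval_seconds: float = 1800.0):  # twice an hour
        self.interval = interval_seconds
        self.pending = set()
        self.last_flush = time.monotonic()

    def add(self, truncated_hash: str) -> None:
        # A set collapses duplicate requests from different leaves for free,
        # which is part of the bandwidth saving the post describes.
        self.pending.add(truncated_hash)

    def flush_if_due(self, send) -> None:
        """Call periodically; `send` receives one sorted batch of hashes."""
        now = time.monotonic()
        if self.pending and now - self.last_flush >= self.interval:
            send(sorted(self.pending))
            self.pending.clear()
            self.last_flush = now
```

Rate limiting per leaf (the "one query per hour per client" rule) would sit in front of `add`, on the UltraPeer side.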
If you want to contribute, there's a Chord subproject in LimeWire, which has been stale for a long time and is experimental, but which could be the base of such a search architecture for unique IDs. In Chord, some hosts are responsible for managing routes for some "finger bits". Those hosts should preferably be UltraPeers (they need long availability to minimize the cost of structure updates), but for now we don't have enough UltraPeers running the latest version to deploy this parallel network.

Now that LimeWire 4.0 will be officially released, and that it will clearly demonstrate an advance in the efficiency of Gnutella searches with much reduced bandwidth usage, the LimeWire cluster of UltraPeers could become a good candidate for starting to experiment with Chord (or other structured topologies). Before that, we need to see the impact of the LimeWire 4.0 deployment on Gnutella. The next weeks will be used to monitor the evolution of traffic and see how much we can gain on the overall network (and also to correct or fine-tune some statistical or usage-policy parameters).

For now, I'm not inclined to change the QRP algorithm and routing. It's probably better to think about a parallel topology with separate routing tables for such searches by URNs. QRP was designed and tuned mostly for keyword searches, with a very weak distribution of keywords. Including URNs in QRP had a negative impact on routing efficiency (compensated by the introduction of the low-TTL/high-degree topology in association with dynamic querying). My opinion is that if we can remove URNs from QRP tables, the network will be better off, but we can't do that before a new, more appropriate routing topology is made and tested for URNs, then deployed, tuned and scaled.
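Chord's core lookup rule, which the subproject mentioned above builds on, fits in a few lines. This is a minimal sketch of the successor rule only (no finger tables, no joins), with a deliberately tiny 16-bit identifier space for illustration; real Chord uses the full 160-bit SHA-1 space.

```python
import hashlib
from bisect import bisect_right

M = 16  # small identifier space for illustration; Chord uses 160 bits

def node_id(name: str) -> int:
    """Hash a node name or key onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << M)

class ChordRing:
    """Chord's successor rule: the host responsible for a key is the
    first node whose id follows the key's id around the ring."""

    def __init__(self, node_names):
        self.ids = sorted(node_id(n) for n in node_names)

    def successor(self, key: str) -> int:
        k = node_id(key)
        i = bisect_right(self.ids, k)
        return self.ids[i % len(self.ids)]  # wrap around the ring
```

In the scenario discussed above, the node names would be long-lived UltraPeers and the keys would be file URNs, so a URN lookup is routed to one responsible host instead of being flooded.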
Last edited by verdyp; May 16th, 2004 at 08:38 AM.
Any chance someone can start up a new thread with the requery/DHT discussion, and/or copy the already-requested features in this thread to a new thread that I'll make sticky?... It'd be very useful to _just_ have feature requests in a thread. |