Proposal for development of Gnutella

When a servant processes a query and returns the filename and filesize of a match, it should also return a hash code. This would allow downloading from multiple sources even when users rename the original file. A good hash (20 bytes) combined with a filesize check should avoid "false duplicates".

Marc. marc@szs.ca
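A rough sketch of what the client side of this could look like (Python purely for illustration; the query-hit fields here are hypothetical, not taken from any Gnutella spec): hash each shared file once, then fold together query hits that share the same (SHA-1, filesize) pair regardless of filename.

```python
import hashlib
from collections import defaultdict

def sha1_of_file(path, chunk_size=64 * 1024):
    """Compute the SHA-1 digest (20 bytes, hex-encoded) of a file in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def group_query_hits(hits):
    """Fold query hits that describe the same file, even under different names.

    `hits` is a list of dicts with 'host', 'filename', 'filesize', 'sha1'
    (hypothetical structure); hits sharing (sha1, filesize) count as one file.
    """
    groups = defaultdict(list)
    for hit in hits:
        groups[(hit["sha1"], hit["filesize"])].append(hit)
    return groups
```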
Yep, I suggest/vote for it too. There is already a well-documented proposal named 'HUGE' [1]. From their "Motivation & Goals":

o Folding together the display of query results which represent the exact same file -- even if those identical files have different filenames.
o Parallel downloading from multiple sources ("swarming") with final assurance that the complete file assembled matches the remote source files.
o Safe "resume from alternate location" functionality, again with final assurance of file integrity.
o Cross-indexing GnutellaNet content against external catalogs (e.g. Bitzi) or foreign P2P systems (e.g. FastTrack, OpenCola, MojoNation, Freenet, etc.)

[1] "HUGE" - http://groups.yahoo.com/group/the_gd...roposals/HUGE/ (Yahoo account required)
It could be simple. The HUGE thing looks complicated for no reason. The risk of making an error on duplicate files is close to impossible (1 in billions * billions) with a 160-bit hash and a filesize check. You simply do the download and check that the received file matches the hash...

And it's quite simple to code a component that can download from multiple sources (swarming). You test the servers for resume capability, split the file to download into blocks, and create threads that request those blocks from different servers; you can even create multiple threads (connections) per server.

In order to improve download/connection speed, each client should have a list of other clients that have the same file and reply not only with its own IP but with the IPs of others that can provide the same file. This could be done if hubs (supernodes) are inserted into the network. They could scan for duplicate files!

I have already programmed a swarm component in Delphi and it's working well. I will now work on it to add on-the-fly add/remove of new download sources. If anyone wants to work on it, let me know and I will send you the sources. It uses Indy for TCP access.

Marc. marc@szs.ca
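A minimal sketch of the block-splitting idea described above (Python rather than Delphi, purely for illustration; the block size and source list are assumptions). Each planned range would then be fetched on its own thread/connection via an HTTP Range request against a server that supports resume.

```python
def plan_blocks(filesize, sources, block_size=256 * 1024):
    """Split a download of `filesize` bytes into byte ranges and assign each
    range to a source round-robin. Returns (start, end, source) tuples,
    with `end` exclusive."""
    plan = []
    start = 0
    i = 0
    while start < filesize:
        end = min(start + block_size, filesize)
        plan.append((start, end, sources[i % len(sources)]))
        start = end
        i += 1
    return plan

# Example: a 1 MB file spread over two (made-up) sources.
print(plan_blocks(1_000_000, ["10.0.0.1:6346", "10.0.0.2:6346"]))
```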
You can do that with HUGE. It also describes the encoding between the two nulls, then the new GET request. I think it prefers SHA1 for the hash, but which one you use is flexible: CRC, MD5...

The question I have: which is the best algorithm? Can someone give a summary/overview? Hmm, it should be unique enough within a typical horizon (high security is not the topic), small in size (to keep broadcast traffic low), and fast to calculate.

Last edited by Moak; January 6th, 2002 at 02:34 PM.
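Not an answer, but a quick way to compare the candidates on those criteria yourself: a small Python sketch that prints digest size and a rough hashing speed for CRC32, MD5 and SHA-1 on about 1 MB of data (timings are machine-dependent, so treat the numbers as indicative only).

```python
import hashlib
import time
import zlib

data = bytes(range(256)) * 4096  # ~1 MB of test data

def bench(name, fn, digest_bits):
    """Time 50 passes over the test data and report throughput."""
    start = time.perf_counter()
    for _ in range(50):
        fn(data)
    elapsed = time.perf_counter() - start
    mb_per_s = 50 * len(data) / (1024 * 1024) / elapsed
    print(f"{name:6s} {digest_bits:3d} bits  ~{mb_per_s:7.1f} MB/s")

bench("CRC32", zlib.crc32, 32)
bench("MD5", lambda d: hashlib.md5(d).digest(), 128)
bench("SHA1", lambda d: hashlib.sha1(d).digest(), 160)
```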
Follow up

About HUGE: when I said HUGE looks complicated, I meant that from what you tell me it's more about verification of data integrity. I prefer less verification and better speed (a smaller protocol), as long as verification is good enough.

About CRC: it's really not a good idea to use CRC16 or CRC32. The latter only gives 4 billion values, and that's not enough; you could get wrong duplicate files. SHA1 uses 20 bytes (160 bits), which gives far more possibilities. To give an idea, it would be around 4 billion * 4 billion * 4 billion * ... You get the point: with that many possibilities you reduce the chance of false duplicates.

SHA1 speed is fast enough (over 1 MB/sec), but it's not that important: a client program could generate all the SHA1 hashes at startup and cache this information in memory. 1000 files would require only 20 KB of memory. Doing a hash for each query would not be a good idea...

Marc
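A minimal sketch of that startup cache idea (Python for illustration; the directory layout is a placeholder): hash every shared file once, keep only the 20-byte digests in memory, and answer queries from the cache instead of rehashing.

```python
import hashlib
import os

def build_hash_cache(shared_dir):
    """Hash every file under `shared_dir` once at startup.

    Returns {path: 20-byte SHA-1 digest}; 1000 files cost about
    1000 * 20 bytes = 20 KB of digest data, plus the path strings.
    """
    cache = {}
    for root, _dirs, files in os.walk(shared_dir):
        for name in files:
            path = os.path.join(root, name)
            h = hashlib.sha1()
            with open(path, "rb") as f:
                while chunk := f.read(64 * 1024):
                    h.update(chunk)
            cache[path] = h.digest()  # 20 bytes per file
    return cache
```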
Hmm, what do you mean by 'verification'? The HUGE goals describe exactly what we really need today: a) efficient multisegmented downloading/caching (grouping the same files together from QueryHits for parallel downloads or query caches); b) efficient automatic requerying (finding alternative download locations).

I agree, the protocol should be as small as possible. While you agree with SHA1 (I still have no clue about the advantages/disadvantages of CRCxx, SHA1, MD5, TigerTree etc.), what could be done better than described in the HUGE paper? I think HUGE is pretty simple. It describes hash positioning in Query/QueryHits and the necessary HTTP headers. Then it encodes the hash to make it fit into Gnutella traffic (a null inside the hash must be encoded!) and also into HTTP traffic. For example, the well-known 'GnutellaProtocol04.pdf' becomes 'urn:sha1:PLSTHIPQGSSZTS5FJUPAKUZWUGYQYPFB'. Perhaps you don't agree with Base32 encoding of the hash; what could be done better?

CU, Moak
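For reference, the urn:sha1 form that HUGE describes can be produced along these lines (a Python sketch; the filename is just a placeholder). Base32 of a 20-byte digest is exactly 32 characters with no '=' padding, and its alphabet (A-Z, 2-7) is safe in URLs and HTTP headers.

```python
import base64
import hashlib

def sha1_urn(path):
    """Return a HUGE-style URN: 'urn:sha1:' + Base32-encoded SHA-1 digest."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(64 * 1024):
            h.update(chunk)
    return "urn:sha1:" + base64.b32encode(h.digest()).decode("ascii")

# e.g. sha1_urn("GnutellaProtocol04.pdf")
# -> something like 'urn:sha1:PLSTHIPQGSSZTS5FJUPAKUZWUGYQYPFB'
```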
Follow up

I know nothing about HUGE; I simply took this from your previous post:

> Safe "resume from alternate location" functionality, again with final assurance of file integrity.

For me, "final assurance" means that once the download is complete you must do some kind of block check against the original source, with multiple CRCs, to verify that all the blocks received match the original file. That is what I call "final assurance". Like I said, I don't know HUGE; what I'm proposing is not "final assurance": only look up the SHA1 and download the file from all matching sources, without a check at the end of the transfer. If HUGE works that way then they can't claim "final assurance of file integrity", but it's exactly what I want to do. "Final assurance" would use too much bandwidth; performance would be better with a small risk of a corrupted file, and if that risk is in the range of 1/10000000000000000 it sounds OK to me. I will try to take time and check the HUGE principle.

CRC vs SHA1: CRC is as good as SHA1 for randomly assigning a number to a piece of data. But SHA1 adds security: SHA1 was built so that it's practically impossible to recreate the original data from the hash key (good for password storing). And of course it generates larger numbers, since it's a 20-byte key vs 4 bytes for CRC32.

Marc.
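A back-of-the-envelope check of those odds (Python, just arithmetic): the chance that any two of N distinct files collide under a b-bit hash is roughly N^2 / 2^(b+1) (the birthday approximation), which is why CRC32 becomes risky long before SHA-1 does.

```python
def collision_probability(num_files, hash_bits):
    """Birthday-bound approximation: chance that at least two of
    `num_files` distinct files share the same `hash_bits`-bit digest.
    Only meaningful while the result stays well below 1."""
    return num_files ** 2 / 2 ** (hash_bits + 1)

# 10,000 distinct files visible on the network:
print(f"CRC32: {collision_probability(10_000, 32):.2e}")   # ~1.2e-02, roughly 1 in 100
print(f"SHA-1: {collision_probability(10_000, 160):.2e}")  # ~3.4e-41, negligible
```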
Base32?

The only thing I find somewhat weird about HUGE is that the SHA1 is Base32 encoded. This means only 5 bits of every 8-bit byte are used. It just doesn't make sense... oh well.

The GET request is somewhat strange as well... a simple:

GET urn:sha1:452626526(more of this ****)SDGERT GNUTELLA/0.4

would work just as well...

Some thoughts..
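For what it's worth, the encoding trade-off is easy to see (a Python sketch on a made-up digest): the raw 20 bytes can contain nulls and non-printable characters, hex is safe but longest, Base32 is case-insensitive and URL/header-safe, and Base64 is shorter but case-sensitive and uses characters like '/' that are awkward in URLs.

```python
import base64
import hashlib

digest = hashlib.sha1(b"example data").digest()  # 20 raw bytes

print("raw   :", len(digest), "bytes (may contain NUL; not header/URL safe)")
print("hex   :", len(digest.hex()), "chars")                               # 40
print("base32:", len(base64.b32encode(digest)), "chars")                   # 32
print("base64:", len(base64.b64encode(digest)), "chars (case-sensitive)")  # 28
```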