Gnutella Forums  

#1, January 5th, 2002, Unregistered (Guest)
Proposal for development of Gnutella

When a servent processes a query and returns the filename and filesize of a match, it should also return a hash code.

This would allow downloading from multiple sources even when users have renamed the original file.

A good hash (20 bytes) combined with a filesize check should avoid "false duplicates".
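
As a rough sketch of the idea (hypothetical code, not from any actual client), a servent could key each shared file by its SHA-1 digest plus its filesize:

    import hashlib
    import os

    def file_identity(path):
        """Return (sha1_hex, filesize), identifying content independent of filename."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest(), os.path.getsize(path)

    # Two copies of the same file under different names compare equal:
    # file_identity("original.mp3") == file_identity("renamed.mp3")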

Marc.
marc@szs.ca
#2, January 5th, 2002, Moak

Yep, I suggest/vote for it too.

There is already a well-documented proposal named 'HUGE' [1]. From its "Motivation & Goals":

o Folding together the display of query results which represent the exact same file -- even if those identical files have different filenames.

o Parallel downloading from multiple sources ("swarming") with final assurance that the complete file assembled matches the remote source files.

o Safe "resume from alternate location" functionality, again with final assurance of file integrity.

o Cross-indexing GnutellaNet content against external catalogs (e.g. Bitzi) or foreign P2P systems (e.g. FastTrack, OpenCola, MojoNation, Freenet, etc.)

[1] "HUGE" - http://groups.yahoo.com/group/the_gd...roposals/HUGE/ (Yahoo account required)
#3, January 6th, 2002, Unregistered (Guest)
Could be simple.

The HUGE thing looks complicated for no reason. The risk of a false match on duplicate files is close to impossible (on the order of 1 in billions of billions) with a 160-bit hash plus a filesize check. You simply do the download and check that the received file matches the hash...

And it's quite simple to code a component that can download from multiple sources (swarming). You simply test the servers for resume capability, split the file to download into blocks, and create threads that request these blocks from different servers; you can even create multiple threads (connections) per server.
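
A bare-bones sketch of that block splitting and assignment (block size and the round-robin policy are illustrative assumptions, not from the post; a real client would fetch each block with an HTTP Range request and retry failures on another source):

    def split_into_blocks(filesize, block_size=256 * 1024):
        """Split a download of `filesize` bytes into (offset, length) blocks."""
        return [(off, min(block_size, filesize - off))
                for off in range(0, filesize, block_size)]

    def assign_blocks(blocks, sources):
        """Deal blocks out to resume-capable sources, round-robin."""
        return [(sources[i % len(sources)], blk)
                for i, blk in enumerate(blocks)]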

In order to improve download/connection speed, each client should keep a list of other clients that have the same file and reply not only with its own IP but with the IPs of others that can provide the same file. This could be done if hubs (supernodes) are inserted into the network. They could scan for duplicate files!
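
A toy sketch of such a hub-side index (purely illustrative; nothing in the thread specifies this structure):

    class AltSourceIndex:
        """Map a file's (sha1, size) identity to the peers known to share it."""
        def __init__(self):
            self.sources = {}

        def add(self, sha1, size, peer):
            self.sources.setdefault((sha1, size), set()).add(peer)

        def lookup(self, sha1, size):
            # Answer with every known provider, not just the replying host.
            return sorted(self.sources.get((sha1, size), ()))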

I have already programmed a swarming component in Delphi and it's working well. I will now work on adding on-the-fly add/remove of download sources.

If anyone wants to work on it, let me know and I will send you the sources. It uses Indy for TCP access.

Marc.
marc@szs.ca
Reply With Quote
  #4 (permalink)  
Old January 6th, 2002
Moak's Avatar
Guest
 
Join Date: September 7th, 2001
Location: Europe
Posts: 816
Moak is flying high
Default

I thought HUGE is simple and flexible?

It does explain a lot of the basics and also details how to include hashes in binary Gnutella messages (did you notice that you have to encode 0x00, and that compatibility with other/older clients is guaranteed?). If you think it can be done more easily, write a paper... I prefer easy solutions. :-)

PS: About what you said on swarming and a list of alternative downloads: yes, that is another advantage once we have hashes and query-hit caches. I'm a big fan of superpeers, hashes, swarming and multisegmented downloading. :-)
#5, January 6th, 2002, veniamin

I am not sure, but I think a CRC could do the job. For each file in a query hit we can put its CRC between the two nulls, like Gnotella does for MP3 files.
#6, January 6th, 2002, Moak

You can do that with HUGE. It also describes the encoding between the two nulls, plus the new GET request. I think it prefers SHA1 for the hash, but which one you use is flexible: CRC, MD5...

The question I have: which is the best algorithm? Can someone give a summary/overview? Hmm, it should be unique enough within a typical horizon (high security is not the topic), small in size (to keep broadcast traffic low), and fast to calculate.
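
Digest sizes are fixed by the algorithms (CRC32: 4 bytes, MD5: 16 bytes, SHA1: 20 bytes); relative speed is easy to measure. A throwaway timing sketch (illustrative only, not a benchmark from this thread):

    import hashlib
    import time
    import zlib

    data = b"x" * (16 * 1024 * 1024)  # 16 MB of dummy input

    for name, fn in [("crc32 (4 bytes)", lambda d: zlib.crc32(d)),
                     ("md5  (16 bytes)", lambda d: hashlib.md5(d).digest()),
                     ("sha1 (20 bytes)", lambda d: hashlib.sha1(d).digest())]:
        start = time.perf_counter()
        fn(data)
        print(name, round(time.perf_counter() - start, 4), "s")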

#7, January 6th, 2002, Unregistered (Guest)
Follow up

About HUGE: when I said HUGE looks complicated, I meant that from what you tell me, it's more about verification of data integrity.

I prefer less verification and better speed (a smaller protocol), as long as the verification is good enough.

About CRC: it's really not a good idea to use CRC16 or CRC32. The latter gives only 4 billion (2^32) values, and that's not enough; you could get false duplicates. SHA1 uses 20 bytes (160 bits), which gives vastly more possibilities: 2^160 is 4 billion multiplied by itself five times. You get the point; with that many possibilities you reduce the chance of false duplicates to practically nothing.

SHA1 is fast enough (over 1 MB/sec), but speed is not that important: a client program could generate all the SHA1 hashes at startup and cache them in memory. 1000 files would require only 20 KB of memory. Hashing on every query would not be a good idea...
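
A minimal sketch of that startup cache (the layout is invented for illustration; the 20 raw bytes per digest are what make the 20 KB figure work out):

    import hashlib
    import os

    def build_hash_cache(shared_dir):
        """Hash every shared file once at startup; ~20 bytes per entry."""
        cache = {}
        for name in os.listdir(shared_dir):
            path = os.path.join(shared_dir, name)
            if os.path.isfile(path):
                with open(path, "rb") as f:
                    cache[path] = hashlib.sha1(f.read()).digest()  # 20 bytes
        return cache

    # Query handling then looks digests up in `cache` instead of rehashing.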

Marc
#8, January 6th, 2002, Moak
hmm

What do you mean by 'verification'?
The HUGE goals describe exactly what we really need today: a) efficient multisegmented downloading/caching (grouping identical files together from query hits for parallel downloads or query caches), and b) efficient automatic requerying (finding alternative download locations).

I agree, the protocol should be as small as possible.
While you agree with SHA1 (I still have no clue about the advantages/disadvantages of CRCxx, SHA1, MD5, TigerTree etc.), what could be done better than the HUGE paper describes? I think HUGE is pretty simple. It describes hash positioning in Queries/Queryhits and the necessary HTTP headers. Then it encodes the hash to make it fit into Gnutella traffic (nulls inside the hash must be encoded!) and also into HTTP traffic. For example, the well-known 'GnutellaProtocol04.pdf' becomes 'urn:sha1:PLSTHIPQGSSZTS5FJUPAKUZWUGYQYPFB'.
Perhaps you don't agree with Base32 encoding of the hash; what could be done better?
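
For reference, producing such a urn:sha1 string takes only a few lines (a sketch; the HUGE draft itself defines the exact format):

    import base64
    import hashlib

    def sha1_urn(path):
        """Encode a file's SHA-1 digest as a HUGE-style urn:sha1 string.
        20 digest bytes -> exactly 32 Base32 characters, no padding."""
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).digest()
        return "urn:sha1:" + base64.b32encode(digest).decode("ascii")

    # e.g. sha1_urn("GnutellaProtocol04.pdf")
    # -> "urn:sha1:PLSTHIPQGSSZTS5FJUPAKUZWUGYQYPFB" (the example above)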

CU, Moak
#9, January 7th, 2002, Unregistered (Guest)
Follow up

I know nothing about HUGE; I simply got this from your previous post:

> Safe "resume from alternate location" functionality, again with final assurance of file integrity.

For me "final assurance" mean once download is complete you must do some kind of block check on the original source, with multiple CRC to verify that all the block receive match the original file.

That is what I call "final assurance". Like I said, I don't know HUGE; what I'm proposing is not final assurance: just compute a SHA1, download the file from all matching sources, and perform no check at the end of the transfer. If HUGE does that too, then it can't claim "final assurance of file integrity", but it's exactly what I want to do.

To have "final assurance" would use to much bandwidth, performance would be better with a small risk for corrupted file, if it's in the range or 1/10000000000000000 sound ok to me.

I will try to take the time to look into the HUGE proposal.

CRC vs SHA1: for deriving an essentially random number from a piece of data, CRC works much like SHA1. But SHA1 adds security: it was built so that it is practically impossible to reconstruct the original data from the hash (good for password storage). And of course it generates a larger number, since its key is 20 bytes vs 4 bytes for CRC32.

Marc.
#10, January 7th, 2002, Tamama
Base32?

The only thing I find somewhat weird about HUGE is that the SHA1 is Base32 encoded. This means only 5 bits of every 8-bit byte are used. Just doesn't make sense... oh well.

The GET request is somewhat strange as well... a simple:

GET urn:sha1:452626526(more of this ****)SDGERT GNUTELLA/0.4

would work just as well...
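
For comparison, the HUGE draft routes such requests through an RFC 2169-style "uri-res" resolver path instead of putting the URN straight into the request line; Base32 (rather than Base64) is presumably used so the URN survives case-insensitive, URL-restricted contexts. A sketch of that style of request (details approximate):

    GET /uri-res/N2R?urn:sha1:PLSTHIPQGSSZTS5FJUPAKUZWUGYQYPFB HTTP/1.1
    Host: 10.0.0.1:6346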

Some thoughts..