Gnutella Forums  


General Gnutella Development Discussion For general discussion about Gnutella development.


  #11 (permalink)  
Old January 7th, 2002
Moak's Avatar
Guest
 
Join Date: September 7th, 2001
Location: Europe
Posts: 816
Moak is flying high
Default

Just a question again: you don't agree with Base32 encoding of the hash, so what could be done better?
  #12 (permalink)  
Old January 7th, 2002
Unregistered
Guest
 
Posts: n/a
Default

Tamaha,

Hi, Base32 does not reduce the number of bits, it changes the way they are displayed/sent.

For example:

2 is "10" in base 2. Both "2" and "10" contain exactly the same information; it's just displayed in another way. Base32 and Base64 conversions are used to ensure that data can travel between different operating systems and computers and still have the same value.
  #13 (permalink)  
Old January 7th, 2002
Moak's Avatar
Default wrong

BASE32 codes 5 bits into one char (8 bits).
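
A quick sketch of what that means in bytes, using only the Python standard library. The input data is made up; the 20-byte digest just stands in for a SHA1 hash:

```python
import base64
import hashlib

# A 20-byte (160-bit) SHA1 digest of some example data.
digest = hashlib.sha1(b"example file contents").digest()

# Base32 packs 5 bits of input into each 8-bit output character.
encoded = base64.b32encode(digest)

binary_len = len(digest)    # 20 bytes if the hash is sent as raw binary
base32_len = len(encoded)   # 32 characters: 160 bits / 5 bits per character
```

The value decodes back unchanged, but the encoded form takes 32 bytes instead of 20.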
  #14 (permalink)  
Old January 9th, 2002
Unregistered
Guest
 
Posts: n/a
Default

Wrong? I guess you don't understand.

Yes, Base32 uses 5 bits, but when it encodes
11111111 (255) it does not scrap the remaining 3 bits;
as I previously explained, it only changes how they are sent.

In this case 255 would be sent as 00011111 then 00000111.

So sending byte 31 and byte 7 in Base32 is the same as sending byte 255 in Base256: the result is the same, the value is 255.

Changing the base DOES NOT CHANGE THE VALUE, it changes how it's displayed. Otherwise it's not a base change, it's a value change.
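
A minimal sketch of that arithmetic in Python. It splits a number into 5-bit digits, least significant first, which is the positional view described here and not the RFC-style Base32 text encoding (that one pads on the right and maps digits to letters). The function names are made up for illustration:

```python
def to_base32_digits(value):
    """Split a non-negative integer into 5-bit digits, least significant first."""
    digits = []
    while True:
        digits.append(value & 0b11111)
        value >>= 5
        if value == 0:
            break
    return digits

def from_base32_digits(digits):
    """Rebuild the integer from 5-bit digits, least significant first."""
    value = 0
    for position, digit in enumerate(digits):
        value += digit << (5 * position)
    return value

digits = to_base32_digits(255)          # [31, 7], i.e. 00011111 then 00000111
restored = from_base32_digits(digits)   # 255 again
```

The value survives the round trip, but it now occupies two bytes on the wire instead of one, which is the size increase being discussed.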


Marc.
  #15 (permalink)  
Old January 9th, 2002
Moak's Avatar
Default

Please look at the existing code.
  #16 (permalink)  
Old January 11th, 2002
Moak's Avatar
Post Hashes in Queries (SMALL != HUGE)

Hi,
I thought more about hashes in Gnutella Queries. I personally think they should be as small as possible, because I expect increased Query/Queryhit traffic from new clients with features like automatic resume and multisegmented downloads. Automatic requeries are a key technology for those features, so Query traffic, especially for hashes, will increase. Perhaps it will also be necessary to group multiple searches into a single message (multiple searches in one Gnutella Query, to avoid repeatedly sending the 23-byte Gnutella descriptor header plus other repeated payload). In my eyes a small query/hash will be necessary, as small as possible.

Different people have different ideas of a small hash. It should still be unique enough to fit our needs. AFAIK these are the common suggestions:

* CRC-32, size 32 bit (256 hashes/KB *) [1]
* MD5, size 128 bit (64 hashes/KB) [2]
* SHA1, size 160 bit (51.2 hashes/KB) [3]
* Tiger, size 192 bit or truncated to 128 or 160 (42.7 hashes/KB at 192 bit) [4]
* Snefru, size 256 bit (32 hashes/KB) [5]
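
The hashes/KB figures can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming 1 KB = 1024 bytes and raw binary digests without any protocol overhead:

```python
# Digest sizes in bits, as listed above.
HASH_BITS = {
    "CRC-32": 32,
    "MD5": 128,
    "SHA1": 160,
    "Tiger": 192,
    "Snefru": 256,
}

def hashes_per_kb(bits, kb_bytes=1024):
    """How many raw binary digests of the given size fit into one KB."""
    return kb_bytes / (bits / 8)

table = {name: round(hashes_per_kb(bits), 1) for name, bits in HASH_BITS.items()}
# CRC-32: 256.0, MD5: 64.0, SHA1: 51.2, Tiger: 42.7 (42.66... before rounding), Snefru: 32.0
```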

I'm not sure which hash to prefer. There seems to be nothing between 32 bit (CRC-32) and 128 bit (MD5) in length. CRC-32 will not be unique enough within a typical Gnutella horizon, so it is better to start with MD5 or higher. Is it possible to truncate a big hash to e.g. 64 bits, and does this make sense? I'm not familiar with cryptography, this is only a short summary... perhaps someone else wants to add some more qualified comments? :-)
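
For a rough feeling of "unique enough", here is a sketch of the usual birthday-bound estimate in Python. It assumes hash values are uniformly distributed (optimistic for CRC-32, which is a checksum and not a cryptographic hash) and a horizon of one million distinct files, which is an invented figure for illustration only:

```python
import math

def collision_probability(num_files, hash_bits):
    """Birthday-bound approximation: chance that at least two of
    num_files distinct files share the same hash_bits-bit hash,
    assuming uniformly distributed hash values."""
    space = 2.0 ** hash_bits
    return -math.expm1(-num_files * (num_files - 1) / (2.0 * space))

# Hypothetical horizon size (an assumption, not a measured number).
HORIZON_FILES = 1_000_000

estimates = {bits: collision_probability(HORIZON_FILES, bits) for bits in (32, 64, 128)}
```

Under these assumptions a 32 bit hash is practically certain to collide somewhere in the horizon, a 64 bit hash collides with a probability of roughly 2.7e-8, and a 128 bit hash is far below that. The estimate covers accidental collisions only, not deliberately crafted ones.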

At least the hash should IMHO be pure binary inside the query (not Base32 encoded, which blows up the size again); in HTTP headers it might be BaseWhatever encoded to gain the highest HTTP/1.x compatibility. I think indexing speed is secondary [6]. Indexing local shared files can be performed in the background on first startup (meanwhile the client does not answer with its own hashes, but can already search for hashes).

* = pure binary hash; descriptor headers and other protocol overhead are not included

[1] CRC-32 - http://www.ietf.org/rfc/rfc1510.txt (ISO 3309)
[2] MD5 - http://www.faqs.org/rfcs/rfc1321.html
[3] SHA1 - http://www.faqs.org/rfcs/rfc3174.html
[4] Tiger - http://www.cs.technion.ac.il/~biham/Reports/Tiger/
[5] Snefru - http://www.faqs.org/faqs/cryptography-faq/part07/
[6] Hash Indexing Speed - http://groups.yahoo.com/group/the_gdf/message/1970

Last edited by Moak; January 11th, 2002 at 02:39 PM.
  #17 (permalink)  
Old January 12th, 2002
Moak's Avatar
Default

PS: Some people suggest combining a small CRC hash with the filesize (which is already in Queryhits, but not in Queries or other Gnutella descriptors)... let's play around with this idea. This would be a 32 bit hash + 32 bit filesize (taken from Gnutella protocol v0.4) = 64 bit key. Perhaps it would be more unique to use a real 64 bit hash instead of the 64 bit CRC+filesize combo, e.g. a truncated MD5?

Here is an overview of the minimum possibilities:
* CRC-32, size 32 bit
* CRC-32 filesize combo, size 64 bit
* MD5 truncated, size 64 bit
* MD5, size 128 bit
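
A minimal Python sketch of how the two 64 bit variants could be built. The function names and the little-endian field order are assumptions for illustration; nothing here is taken from an existing client:

```python
import hashlib
import struct
import zlib

def crc32_filesize_key(data):
    """64-bit key: 32-bit CRC-32 followed by the 32-bit file size.
    Little-endian byte order is an assumption of this sketch."""
    crc = zlib.crc32(data) & 0xFFFFFFFF
    size = len(data) & 0xFFFFFFFF  # a 32-bit field cannot hold sizes of 4 GiB or more
    return struct.pack("<II", crc, size)

def truncated_md5_key(data):
    """64-bit key: the first 8 bytes of the 128-bit MD5 digest."""
    return hashlib.md5(data).digest()[:8]

sample = b"example file contents"
combo_key = crc32_filesize_key(sample)   # 8 bytes
md5_key = truncated_md5_key(sample)      # 8 bytes
```

Both keys are 8 bytes long. The difference is that half of the combo key repeats information a Queryhit already carries.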

Notes: I have chosen MD5 in this case because it is the smallest and fastest compared to the other hashes (SHA1, Tiger, Snefru). CRC-32 alone is too small. The CRC-filesize combo might be enough. The truncated 64 bit MD5 might be mathematically more unique, but it wastes 32 bits of information in Queryhits (not in Query, GET, PUSH), since Queryhits already carry the filesize. The next higher alternative is a 128 bit MD5; e.g. FastTrack uses an MD5 hash AFAIK.

I'm not sure if a minimum alternative is the best solution for Gnutella's future. Perhaps a 64 bit key makes us happy now, but in future, with more superpeers and bigger horizons, we might want a bigger hash (MD5 or SHA1)?

One possibility could be an encoding à la HUGE: the hash has a prefix telling the hash type. For binary Gnutella messages (Query/Queryhits) this could be a payload like: byte 0 = hash type, remaining bytes = binary hash. The protocol defines a list of known hashes; since clients need a common solution, this list will be short, e.g. start with the CRC-filesize combo today and use SHA1 in future. In HTTP-like Gnutella messages we can work with encoded hashes (not binary), similar to the HUGE proposal [1].
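
The "byte 0 = hash type, remaining bytes = binary hash" layout could look roughly like this in Python. The type codes and names in `HASH_TYPES` are invented for the sketch; no such list has been agreed on:

```python
import hashlib

# Hypothetical type codes and digest lengths in bytes (not from any spec).
HASH_TYPES = {
    1: ("crc32+filesize", 8),
    2: ("md5", 16),
    3: ("sha1", 20),
}

def pack_hash(type_code, digest):
    """Prefix a raw binary digest with its one-byte hash type."""
    name, length = HASH_TYPES[type_code]
    if len(digest) != length:
        raise ValueError(f"{name} digest must be {length} bytes, got {len(digest)}")
    return bytes([type_code]) + digest

def unpack_hash(payload):
    """Read the type byte, then exactly as many digest bytes as that type needs."""
    if not payload:
        raise ValueError("empty payload")
    type_code = payload[0]
    if type_code not in HASH_TYPES:
        raise ValueError(f"unknown hash type {type_code}")
    name, length = HASH_TYPES[type_code]
    digest = payload[1:1 + length]
    if len(digest) != length:
        raise ValueError("truncated digest")
    return name, digest

payload = pack_hash(3, hashlib.sha1(b"example file contents").digest())
name, digest = unpack_hash(payload)   # ("sha1", 20 raw bytes)
```

Because the type implies the digest length, a client that does not know a type can at least reject it cleanly, and the overhead stays at one byte per hash.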

Conclusion: I have none. I suggest to implement and test a minimum solution (CRC32-filesize combo) and a bigger hash (MD5 or SHA1) for a while. With more experience in a real world environment we can hopefully find a suitable solution. Feedback, tests and mathematical analysis are welcome!

[1] "HUGE" - http://groups.yahoo.com/group/the_gd...roposals/HUGE/ (Yahoo account required)

Last edited by Moak; January 12th, 2002 at 07:54 AM.
  #18 (permalink)  
Old January 12th, 2002
Moak's Avatar
Default

PPS: Here is a summary of why Base32 or Base64 encoding would be used in HTTP-style requests/headers.

------ snip ------
From http://groups.yahoo.com/group/the_gdf/message/2442:
- We could choose any encoding, but...
- Base32 is useful for compatibility with URLs and domain names, so...
- We might as well use it in protocol-fields, saving extra conversions and developer inconvenience.
------ snap ------

Which sounds logical... BUT... at least we could also use Base64 after a '?' in the location/URN. An HTTP GET could look like this:

Definition: GET /get/hash?[URN] HTTP/1.0
Base32: GET /get/hash?sha1:BCMD5DIPKJJTG2GHI2AZ9HG7HZUN5ZPH HTTP/1.0
Base64: GET /get/hash?sha1:/9n6YmKqNRmcLIiKC+2xRccm68 HTTP/1.0

Right now I prefer Base64 for HTTP encoding (it's smaller than Base32), binary encoding inside binary Gnutella messages, and a smaller hash than SHA1.
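
To compare the sizes, here is a small Python sketch that builds both request lines for a real SHA1 digest. Note one substitution that is not part of the proposal above: it uses the URL-safe Base64 alphabet ('-' and '_' instead of '+' and '/') and strips the '=' padding, because '+' and '/' have special meaning in URLs. The input data is made up:

```python
import base64
import hashlib

digest = hashlib.sha1(b"example file contents").digest()  # 20 raw bytes

b32 = base64.b32encode(digest).decode("ascii")
# URL-safe Base64 variant with the trailing '=' padding removed.
b64 = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")

request_b32 = f"GET /get/hash?sha1:{b32} HTTP/1.0"
request_b64 = f"GET /get/hash?sha1:{b64} HTTP/1.0"

lengths = (len(digest), len(b32), len(b64))  # (20, 32, 27)
```

So Base64 saves 5 characters per SHA1 hash over Base32. The price is that Base64 is case-sensitive, while Base32 survives case folding, which is the domain-name compatibility argument quoted above.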

Last edited by Moak; January 12th, 2002 at 04:41 PM.
  #19 (permalink)  
Old January 27th, 2002
veniamin's Avatar
Devotee
 
Join Date: December 17th, 2001
Posts: 24
veniamin is flying high
Default

If a client responds to a query with query hits that each carry a URN, then a lot of bandwidth is being wasted.

What I want to say is that for some files a client should not include a URN in the Query Hit.
For example, text files are usually small and can simply be downloaded again after an incomplete download, so there is no need to apply a URN to a text file. This holds not just for .TXT files but also for .HTM, .XML, .DOC and other text files. Another reason is that if you alter a text file even slightly (e.g. add a comma), the hash changes. So many users can have (almost) the same text file, but not exactly identical copies, and send different URNs.
URNs should be used only for binary files, and not for every binary file: for small binary files (800~1000KB) a client should not reply with a URN.

Also, when a client searches for the same file on other servers, it should use the size of the file as well, not just the hash. This way we can use an algorithm smaller than SHA1. If two binary files have different file sizes, there is no chance that they are the same (or one of them is corrupted).
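
As a rough sketch, the rule could be written like this in Python. The extension list and the 1000 KB threshold are assumptions picked from the range mentioned above, and both function names are made up:

```python
import os

# Assumed values for illustration; the post only names a rough 800~1000KB
# range and a few text-like extensions.
TEXT_EXTENSIONS = {".txt", ".htm", ".xml", ".doc"}
MIN_SIZE_FOR_URN = 1000 * 1024  # bytes

def should_send_urn(filename, size_bytes):
    """Decide whether a query hit for this file should carry a URN."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in TEXT_EXTENSIONS:
        return False
    return size_bytes >= MIN_SIZE_FOR_URN

def same_file(size_a, hash_a, size_b, hash_b):
    """Different sizes rule out a match before the hashes are even compared."""
    return size_a == size_b and hash_a == hash_b
```

For example, `should_send_urn("song.mp3", 4_000_000)` is true, while a text file or a 10 KB binary gets no URN.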
  #20 (permalink)  
Old January 27th, 2002
Unregistered
Guest
 
Posts: n/a
Default

Confused: do you really mean "Uniform Resource Names" (URNs) or simply "hashes"?
All times are GMT -7.