April 14th, 2002
Smilin' Joe Fission (Gnutella Veteran)

Quote:
Originally posted by Nosferatu
Can't find the word tiger or anything that looks like hashing of parts of the file at

http://rfc-gnutella.sourceforge.net/...-huge-0_92.txt or http://rfc-gnutella.sourceforge.net/...-huge-0_93.txt .
Perhaps it was discussed and then dropped .. ? Got a reference? I found http://bitzi.com/ , which proposes/uses tiger-tree (http://bitzi.com/developer/bitprint) as an attempt to index as many files as they can .. looks like a good project to incorporate into gnutella clients - have a bitzi index lookup.
You're right... I remember reading about the value of the Tiger Tree hash, and without actually looking at the HUGE proposal again to verify it, I assumed that was where I originally saw it. However, the HUGE proposal does include a provision for a 39-character tiger tree value, but it doesn't explain how that value is used or generated.
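For what it's worth, 39 characters is exactly what you'd get from Base32-encoding a Tiger digest: Tiger produces 192 bits (24 bytes), and Base32 packs 5 bits per character, so ceil(192 / 5) = 39. A quick sanity check in Python, assuming (as with the SHA1 values in HUGE) the tiger tree value is Base32-encoded:
Code:
import base64

# Tiger produces a 192-bit (24-byte) digest; Base32 encodes 5 bits
# per character, so ceil(192 / 5) = 39 characters once the '='
# padding is dropped.
fake_digest = b"\x00" * 24  # placeholder for a real 24-byte Tiger digest
encoded = base64.b32encode(fake_digest).rstrip(b"=")
print(len(encoded))  # prints 39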

Quote:
Also found the Tiger Hash algorithm homepage (http://www.cs.technion.ac.il/~biham/Reports/Tiger/) and the tiger-tree homepage (http://sourceforge.net/projects/tigertree/).

Unfortunately, between these three sources, I can't find a description of the tiger-tree process in words I can understand. "TigerTree is based on the Tiger hash algorithm, applied to each 1024-byte block of a file, then combined up through a binary hash tree to a final summary value" (http://bitzi.com/developer/bitprint) really doesn't cut it for me.

Anyone know what it means? They imply that it can be used to verify incremental portions of the file .. but I don't understand the process.
I'll see if I can restate how it works correctly. Basically, the Tiger algorithm hashes the file in 1024-byte blocks. The block hashes are then combined pairwise up a binary tree to arrive at the final hash, which looks something like:
Code:
A   B   C   D
 \ /     \ /
  E       F
   \     /
    \   /
     \ /
      G
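In other words, E = hash(A + B), F = hash(C + D), and G = hash(E + F). Here's a minimal sketch of building such a tree, with SHA-256 standing in for Tiger since Tiger isn't in Python's standard hashlib (the real TigerTree scheme may differ in details, such as how odd leftover nodes are handled):
Code:
import hashlib

BLOCK_SIZE = 1024  # TigerTree hashes the file in 1024-byte blocks

def h(data):
    # SHA-256 stands in for Tiger, which isn't in Python's hashlib
    return hashlib.sha256(data).digest()

def build_tree(path):
    """Return every level of the tree: levels[0] holds the block
    hashes (A, B, C, D), and levels[-1][0] is the root (G)."""
    leaves = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            leaves.append(h(block))
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev, nxt = levels[-1], []
        for i in range(0, len(prev), 2):
            if i + 1 < len(prev):
                nxt.append(h(prev[i] + prev[i + 1]))  # hash each pair
            else:
                nxt.append(prev[i])  # odd node out is carried up as-is
        levels.append(nxt)
    return levels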
From what I remember, working with "expected" hash values from a trusted source, one can then determine which of the component hashes is bad. For instance, given the value of E, one could verify that blocks A and B are correct, and given the value of G, one could verify E and F. If, for instance, block C were corrupted, the hash for F would be wrong, so E and F combined would not equal G. Knowing that F is wrong, a person could then trace the fault back to either block C or D. And, given the true hash values for C and D, individual comparisons would show that C is the bad block.
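Continuing the sketch above, that trace-back is just a walk from the root down to the offending leaf. The function below is a hypothetical illustration, assuming both trees came from build_tree() over files of the same size, with the expected tree published by a trusted source like Bitzi:
Code:
def find_bad_block(expected, actual):
    """Return the index of the first corrupted 1024-byte block,
    or None if the root hashes (G) already agree."""
    top = len(expected) - 1
    if expected[top][0] == actual[top][0]:
        return None  # roots match: the whole file checks out
    node = 0
    for level in range(top, 0, -1):  # descend one level at a time
        left, right = 2 * node, 2 * node + 1
        # follow the left child if it disagrees (or has no sibling);
        # otherwise the fault must sit under the right child
        if (right >= len(expected[level - 1])
                or expected[level - 1][left] != actual[level - 1][left]):
            node = left
        else:
            node = right
    return node  # block number; byte offset is node * BLOCK_SIZE

In the A-through-G example, a corrupted C makes F (and therefore G) disagree, while A and B still match under E, so the walk goes right to F and then left to C: block index 2. Only that one 1024-byte block would need to be fetched again.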

I hope this makes a shred of sense; it's early morning as I'm writing this and my brain is falling asleep. Besides that, I can't seem to find the reference material I got this from.
Quote:
I think this is a really exciting idea. Could save a lot of bandwidth downloading broken mp3s etc, for example.
Or, in my case, broken video files. I really hate having to download a 500+ MB file again because a few frames in the middle somehow got garbled during transfer.