Quote:
Quote: To implement this most reliably and sensibly would require, instead of the HUGE proposal's technique of always and only hashing the whole file, a query 'please hash the file from range x-y'. | I believe this was part of the HUGE proposal as well... the part about using a Tiger tree to hash sections of a file. Is it not? |
Can't find the word tiger or anything that looks like hashing of parts of the file at
http://rfc-gnutella.sourceforge.net/...-huge-0_92.txt or
http://rfc-gnutella.sourceforge.net/...-huge-0_93.txt .
Perhaps it was discussed and then dropped .. ? Got a reference? I found that http://bitzi.com/ proposes/uses <A HREF="http://bitzi.com/developer/bitprint">tiger-tree</A> as an attempt to index as many files as they can .. looks like a good project to incorporate into gnutella clients - have a bitzi index lookup.
Also found the <A HREF="http://www.cs.technion.ac.il/~biham/Reports/Tiger/">Tiger Hash algorithm homepage</A> and the <A HREF="http://sourceforge.net/projects/tigertree/">tiger-tree homepage</A>.
Unfortunately between these three sources I can't find a description of the tiger-tree process in words I can understand. <A HREF="http://bitzi.com/developer/bitprint">"TigerTree is based on the Tiger hash algorithm, applied to each 1024-byte block of a file, then combined up through a binary hash tree to a final summary value"</A> really doesn't cut it for me.
Anyone know what it means? They imply that it can be used for incremental portions of the file .. but I don't understand the process.
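As far as I can tell, that one-liner is describing a binary hash tree (a Merkle tree): hash every 1024-byte block of the file to get the leaves, then repeatedly hash adjacent pairs together until a single root value remains. Here's a rough sketch of that process in Python. Note the hedging: Python's hashlib has no Tiger, so sha256 stands in purely to show the tree structure, and the real TigerTree spec may differ in details (e.g. how it marks leaf vs. internal nodes, or handles an odd node):

```python
import hashlib

BLOCK_SIZE = 1024  # leaf size mentioned in the Bitzi description

def _h(data: bytes) -> bytes:
    # Stand-in for the Tiger hash; hashlib has no Tiger,
    # so sha256 is used here only to illustrate the structure.
    return hashlib.sha256(data).digest()

def tree_hash(data: bytes) -> bytes:
    # 1. Hash each 1024-byte block of the file (the leaves).
    leaves = [_h(data[i:i + BLOCK_SIZE])
              for i in range(0, len(data), BLOCK_SIZE)] or [_h(b"")]
    # 2. Combine adjacent pairs level by level until one root remains.
    level = leaves
    while len(level) > 1:
        nxt = [_h(level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node promoted unchanged (an assumption)
        level = nxt
    return level[0]
```

This is also why it works for incremental portions: to verify one 1024-byte block you only need that block plus the sibling hashes along its path to the root, not the whole file.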
These bitzi guys are JUST doing hashing of files, and are covering any files you care to name .. so they probably have thrashed this issue out pretty well.
Also, if there aren't competing schemes to index all of filespace, then it really makes a lot of sense to use their hashing scheme so that you can link in and allow users to search Bitzi's index to see what it has to say about what the user receives in their search results. I think this is a <B><I>really exciting idea</I></B>. It could save a lot of bandwidth downloading broken mp3s etc, for example.
Quote:
Quote: I think if you look at the file size and the hash, you have enough certainty to call it a definite match in searching for alternate download sources. A better technique is described above in the first portion of this post. | Personally, I would trust just the hash, because a file of a different size should theoretically generate a different hash. But that's just my opinion. |
The file size could be used to eliminate the case where two files have the same hash. It's possible, and while I don't know enough to say this is definitely the case, I would expect the chance of two files of the same size having the same hash to be much smaller than the chance of two files of differing sizes having the same hash. It's just a way to rule out 99% of the files that could have a duplicate hash but be different files.
Nos