Quote:
Originally posted by Nosferatu I just had a conversation on irc .. someone had a good idea, maybe some of you have heard it before.
Anyway, the idea is this: hash the first meg of the file as well as the whole file.
So that way you can tell that 'buffy vampire.divx' 20M is the same file as 'buffy vampyre.divx' 80M, and get at least the first 20M.
Then you repeat the search later for files with first meg hash = x.
To implement this most reliably and sensibly, instead of the HUGE proposal's technique of always and only hashing the whole file, the best implementation would be to have a query 'please hash the file from range x-y'. |
I believe this was part of the HUGE proposal as well... The part about using a Tiger tree to hash sections of a file. Is it not?
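For what it's worth, the 'please hash the file from range x-y' idea from the quote can be sketched roughly like this. This is only an illustration, not anything taken from the HUGE proposal: the names hash_range, first_meg_hash and whole_file_hash are made up, "first meg" is assumed to mean 1 MiB, and plain SHA1 is used instead of a Tiger tree.

```python
import hashlib
import os

MEG = 1024 * 1024  # assumption: "first meg" means the first 1 MiB


def hash_range(path, start, end, chunk_size=64 * 1024):
    """Return the SHA1 hex digest of bytes [start, end) of the file at path."""
    digest = hashlib.sha1()
    remaining = max(0, end - start)
    with open(path, "rb") as f:
        f.seek(start)
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:  # the file is shorter than the requested range
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def first_meg_hash(path):
    """Hash only the first meg, so partial and complete copies can match."""
    return hash_range(path, 0, MEG)


def whole_file_hash(path):
    """Hash the complete file, as in the whole-file approach."""
    return hash_range(path, 0, os.path.getsize(path))
```

A 20M partial copy and the 80M complete file would then share the same first_meg_hash even though their whole_file_hash values differ, which is what lets the later search for "first meg hash = x" find more sources.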
Quote:
Well, this is the question. Is the HASH indeed large enough to have a unique value for each individual permutation of a 1G file, and if not, does it really matter? |
I believe it is large enough in practice, although I haven't verified it. SHA1 accepts files up to 2^64 bits long and produces a 160-bit hash, so strictly speaking it can't give a unique value to every possible file (there are far more possible files than hashes). However, with 2^160 possible values, I would think that two different real files ending up with the same hash is vanishingly unlikely.
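To put a rough number on that, here is a back-of-the-envelope sketch using the standard birthday-bound approximation. It assumes the hash behaves like a random function (so it says nothing about deliberately constructed collisions), and the function name and the figure of 10^12 files are just made up for illustration.

```python
def collision_probability(num_files, hash_bits=160):
    """Approximate chance that any two of num_files distinct files share a hash.

    Birthday-bound approximation p ~= n * (n - 1) / 2^(bits + 1),
    which is only meaningful while the result is much smaller than 1.
    """
    return num_files * (num_files - 1) / 2 ** (hash_bits + 1)


# Hypothetical example: a trillion distinct files on the network.
p = collision_probability(10 ** 12)
print(p)
```

Even with a trillion files, the estimate comes out many orders of magnitude below one in a billion, so accidental collisions are not the thing to worry about.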
Quote:
Certainly we are not going to generate each version of a 1G file that is possible .. ever (well, unless some pr!ck sits down in the far future and does it on purpose as a programming exercise using some newfangled superdupercomputer we can't even imagine yet .. but I stray from the topic). We do need a hash that has enough values that most probably each individual file we generate will have a unique value .. but it can't be known for sure unless you actually generate the hash for each file (i.e. generate each file). |
Agreed.
Quote:
I think if you look at the file size and the hash, you have enough certainty to call it a definite match in searching for alternate download sources. A better technique is described above in the first portion of the post. |
Personally, I would trust just the hash because a file of a different size should theoretically generate a different hash. But that's just my opinion.
Quote:
I did a quick one on my calculator based on the figure for 'mass of observable universe' from O'Hanian 'Physics' textbook .. and 1e70 would seem to be what "they" think (the scientists). But I agree about the drugs |
Well, hopefully they'll do a count someday to find out an exact number. Heh.