Follow-up about Huge: when I said Huge looks complicated, I meant that, from what you told me, it's mostly about verifying data integrity.
I prefer less verification and better speed (a smaller protocol),
as long as the verification is good enough.
About CRC: it's really not a good idea to use CRC16 or CRC32. CRC32 gives only about 4 billion possible values (2^32), which is not enough; you could get false duplicate files. SHA-1 produces a 20-byte (160-bit) digest, which gives vastly more possibilities: 2^160, that is 4 billion × 4 billion × 4 billion × 4 billion × 4 billion (about 1.5 × 10^48).
You get the point: with this many possible values, you greatly reduce the chance of false duplicates.
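To put rough numbers on the CRC32 vs SHA-1 difference, here is a small birthday-bound estimate. It assumes the hash values behave like uniformly random numbers, so it only models accidental collisions, not deliberately crafted ones; the function name `collision_probability` and the 100,000-file example are illustrative choices, not part of any protocol.

```python
import math


def collision_probability(num_files: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_files different files
    share the same hash value, using the birthday bound
        p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    Assumes hash outputs are uniformly random (accidental collisions only)."""
    pairs = num_files * (num_files - 1) / 2
    space = 2.0 ** hash_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    n = 100_000
    for name, bits in (("CRC32", 32), ("SHA-1", 160)):
        print(f"{name}: {collision_probability(n, bits):.3g}")
```

With 100,000 files this estimate gives roughly a 69% chance of at least one false duplicate with CRC32, versus something on the order of 10^-39 with SHA-1.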
SHA-1 is fast enough (over 1 MB/s), but speed isn't that important: a client program could compute all the SHA-1 hashes at startup and cache
them in memory. 1,000 files would require only 20 KB
of memory (1,000 × 20 bytes). Computing a hash for each query would not be a good idea...
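A minimal sketch of the hash-once-at-startup idea, assuming a client simply keeps an in-memory dictionary from SHA-1 digest to file paths. The function names (`sha1_of_file`, `build_sha1_cache`, `lookup`) are hypothetical and not part of Huge or any existing client. Note that the 20-bytes-per-file figure counts only the raw digests; a Python dictionary adds its own per-entry overhead on top of that.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # read in 64 KB pieces so large files don't fill memory


def sha1_of_file(path):
    """Return the 20-byte SHA-1 digest of a file's contents."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.digest()


def build_sha1_cache(paths):
    """Hash every file once (e.g. at startup) and index paths by digest."""
    cache = {}
    for p in paths:
        cache.setdefault(sha1_of_file(p), []).append(str(p))
    return cache


def lookup(cache, digest):
    """Answer a query from memory; no file is re-hashed."""
    return cache.get(digest, [])


if __name__ == "__main__":
    import sys

    cache = build_sha1_cache(sys.argv[1:])  # file paths given on the command line
    print(len(cache), "distinct digests,", 20 * len(cache), "bytes of raw digest data")
    for digest, names in cache.items():
        if len(names) > 1:
            print("duplicate:", names)
```

Files with identical contents end up under the same digest, so duplicate detection and query answering are just dictionary lookups after the startup pass.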
Marc