Gnutella Forums

Gnutella Forums (https://www.gnutellaforums.com/)
-   General Gnutella Development Discussion (https://www.gnutellaforums.com/general-gnutella-development-discussion/)
-   -   How to produce an unique File index ? (https://www.gnutellaforums.com/general-gnutella-development-discussion/20227-how-produce-unique-file-index.html)

gavelin May 7th, 2003 07:07 AM

How to produce a unique file index?
 
Hi everybody,
does anyone have an idea for assigning a unique file number, in order to uniquely identify each file among all the shared files of a client?
The assignment SHOULD be fast, and looking a file up by its index SHOULD be fast too. Memory usage SHOULD be as low as possible.

This question came up during the development of elinul, an open-source C# Gnutella client, soon to be available.

Thanks,

Paradog May 7th, 2003 07:29 AM

The best and most secure thing you could do:
hash it. I recommend MD5; no idea why the GDF buddies keep using SHA-1.

der_schreckliche_sven May 7th, 2003 07:55 AM

SHA-1 produces 160-bit hashes instead of 128-bit hashes (like MD5), which makes it a little more secure.

The main reason for using SHA-1 instead of MD5 was AFAIK that MD5 has one or two known weaknesses which could allow an attacker (the RIAA for example) to corrupt downloads by offering corrupted files with the same MD5 hash as normal files.

SHA-1 is about 50% slower than MD5, but that does not really matter, because modern CPUs can hash data much faster than (at least IDE) drives can read it. When hashing, it's the hard disk activity, not the CPU activity, that slows the computer down in most cases.
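To make the digest-size difference concrete, here is a minimal sketch (in Java, since the standard `MessageDigest` API exposes both algorithms; the class and variable names are illustrative, not from any client mentioned in the thread):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class HashSizes {
    public static void main(String[] args) throws Exception {
        byte[] data = "example file contents".getBytes(StandardCharsets.UTF_8);
        // Compute both digests over the same input.
        byte[] sha1 = MessageDigest.getInstance("SHA-1").digest(data);
        byte[] md5  = MessageDigest.getInstance("MD5").digest(data);
        System.out.println("SHA-1: " + (sha1.length * 8) + " bits"); // 160 bits (20 bytes)
        System.out.println("MD5:   " + (md5.length * 8) + " bits");  // 128 bits (16 bytes)
    }
}
```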

Paradog May 7th, 2003 10:32 AM

Good to know terrible Sven

tshdos May 7th, 2003 04:55 PM

If you just need the index for query hits, use a simple 32-bit hash.
In C#, just use the GetHashCode method of the string containing the file's full path, and store the path in a lookup table with the hash as the key.
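A minimal sketch of that lookup-table idea (shown in Java, where `String.hashCode` plays the same role as C#'s `GetHashCode`; the class name is made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class PathIndex {
    // Map from the 32-bit hash of the full path to the path itself.
    private final Map<Integer, String> table = new HashMap<>();

    public int add(String fullPath) {
        int index = fullPath.hashCode(); // 32-bit hash, analogous to C#'s GetHashCode
        table.put(index, fullPath);
        return index;
    }

    public String lookup(int index) {
        return table.get(index); // null if no file has this index
    }
}
```

Note that two different paths can hash to the same 32-bit value, in which case the second `add` silently overwrites the first entry; the next reply raises exactly this objection.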

der_schreckliche_sven May 7th, 2003 10:10 PM

Not necessarily a good idea. It's possible for two files to end up with the same hash, and in that rare case your program would not work correctly.

If it's just for the purpose of storing it in a table, I would not use a hash function at all. I would just use a simple incrementing index for each file.
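The incrementing-index approach can be sketched in a few lines (Java for illustration; names are hypothetical). The position in the list is the file's index, so collisions are impossible by construction and lookup is O(1):

```java
import java.util.ArrayList;
import java.util.List;

public class SharedFileList {
    // The file's unique index is simply its position in this list.
    private final List<String> files = new ArrayList<>();

    public int add(String fullPath) {
        files.add(fullPath);
        return files.size() - 1; // incrementing index: 0, 1, 2, ...
    }

    public String lookup(int index) {
        return files.get(index); // O(1) array access, no collisions possible
    }
}
```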

gavelin May 8th, 2003 04:39 AM

file indexes + file seeking
 
The fact is: if I store just a file index associated with each file name, I must keep this correspondence in memory. If the user shares a lot of files, that can add up.

If I use a hash table, I can have the problem mentioned previously: two files can have the same hash. But together with the file name, I can still find the right file, no?

I'd like to raise another problem: do I have to scan my drive every time a query comes in for some criteria (for instance "madonna mp3")? Keeping all the shared files in memory can take a lot of space, but it is faster to search, no?

thx,
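One common answer to the query question is an in-memory inverted index: split each file name into keywords once at startup, and map each keyword to the set of matching file indexes, so a query like "madonna mp3" never touches the disk. A sketch (Java for illustration; class name, tokenization rule, and all identifiers are assumptions, not from any client in the thread):

```java
import java.util.*;

public class KeywordIndex {
    // keyword (lower-case) -> indexes of files whose names contain it
    private final Map<String, Set<Integer>> byKeyword = new HashMap<>();
    private final List<String> files = new ArrayList<>();

    public int add(String fileName) {
        files.add(fileName);
        int index = files.size() - 1;
        // Tokenize on anything that is not a letter or digit.
        for (String word : fileName.toLowerCase().split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                byKeyword.computeIfAbsent(word, k -> new HashSet<>()).add(index);
            }
        }
        return index;
    }

    // Return the names of files matching every keyword in the query.
    public List<String> search(String query) {
        Set<Integer> hits = null;
        for (String word : query.toLowerCase().split("[^a-z0-9]+")) {
            if (word.isEmpty()) continue;
            Set<Integer> matches = byKeyword.getOrDefault(word, Collections.emptySet());
            if (hits == null) hits = new HashSet<>(matches); // first keyword seeds the set
            else hits.retainAll(matches);                    // later keywords intersect it
        }
        List<String> result = new ArrayList<>();
        if (hits != null) for (int i : hits) result.add(files.get(i));
        return result;
    }
}
```

The memory cost is roughly one set entry per (keyword, file) pair, which for typical file names is far smaller than re-reading the directory tree on every query.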

tshdos May 8th, 2003 09:29 AM

Quote:

Originally posted by der_schreckliche_sven
Not necessarily a good idea. It's possible for two files to end up with the same hash, and in that rare case your program would not work correctly.

If it's just for the purpose of storing it in a table, I would not use a hash function at all. I would just use a simple incrementing index for each file.

True. I have yet to see it happen, but with only a 32-bit hash you will have collisions at some point: by the birthday paradox, the odds of at least one collision pass 50% at around 77,000 files.

As for speed, you really wouldn't see much of a difference either way, but an incremental index does seem simpler.

prh99 June 10th, 2003 07:22 AM

As I believe was stated earlier, pick a number to start with and just increment it for every file that is shared. If you use a long integer you can uniquely identify a fairly significant number of files (a 64-bit integer gives over 9 quintillion distinct values), far more than your average Gnutella user will ever share.

