Gnutella Forums  

General Gnutella Development Discussion

#1 | May 7th, 2003
gavelin (Apprentice) | Joined: March 18th, 2003 | Marseille, France | 5 posts
How to produce a unique file index?

Hi everybody,
does anyone have an idea for assigning a unique file number, in order to uniquely identify one file among all the shared files of a client?
This assignment SHOULD be fast, and looking the file up by its index SHOULD be fast too. Memory usage SHOULD be as low as possible.

This question comes up in the development of elinul, a C# open source Gnutella client, soon available.

Thanks,
#2 | May 7th, 2003
Paradog (Distinguished Member) | Joined: April 5th, 2002 | Germoney | 739 posts

The best and most secure thing you could do is hash it. I recommend MD5; no idea why the GDF buddies keep using SHA-1.
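
A minimal sketch of the "hash it" suggestion in C#, using the framework's MD5 class; the helper name Md5OfFile and the hex formatting are illustrative choices, not something from elinul or any other client:

[code]
using System;
using System.IO;
using System.Security.Cryptography;

class FileHashing
{
    // Illustrative helper: returns the MD5 digest of a file as a lowercase hex string.
    static string Md5OfFile(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] digest = md5.ComputeHash(stream);   // 16 bytes = 128 bits
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
[/code]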
#3 | May 7th, 2003
der_schreckliche_sven (Devotee) | Joined: May 2nd, 2003 | 22 posts

SHA-1 produces 160-bit hashes instead of 128-bit hashes (like MD5), which makes it a little more secure.

The main reason for using SHA-1 instead of MD5 was, AFAIK, that MD5 has one or two known weaknesses which could allow an attacker (the RIAA, for example) to corrupt downloads by offering corrupted files with the same MD5 hash as the normal files.

SHA-1 is about 50% slower than MD5, but that does not really matter, because modern CPUs can hash data much faster than (at least IDE) drives can read it. When hashing, it is usually the hard disk activity rather than the CPU that slows the computer down.
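
For comparison, a small sketch that computes both digests for the same data; "shared.mp3" is only a placeholder file name:

[code]
using System;
using System.IO;
using System.Security.Cryptography;

class DigestComparison
{
    static void Main()
    {
        byte[] data = File.ReadAllBytes("shared.mp3");   // placeholder path

        byte[] md5Digest = MD5.Create().ComputeHash(data);     // 16 bytes = 128 bits
        byte[] sha1Digest = SHA1.Create().ComputeHash(data);   // 20 bytes = 160 bits

        Console.WriteLine("MD5:   {0} bits", md5Digest.Length * 8);
        Console.WriteLine("SHA-1: {0} bits", sha1Digest.Length * 8);
    }
}
[/code]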
#4 | May 7th, 2003
Paradog (Distinguished Member) | Joined: April 5th, 2002 | Germoney | 739 posts

Good to know, terrible Sven.
#5 | May 7th, 2003
tshdos (Gnutella Veteran) | Joined: March 24th, 2002 | Virginia | 101 posts

If you just need the index for queryhits, use a simple 32-bit hash.
In C#, just use the GetHashCode method of the string containing the full path of the file, and store the path in a lookup table with the hash as the key.
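
A sketch of that suggestion, assuming a plain in-memory dictionary keyed by the path's GetHashCode value (class and method names are illustrative):

[code]
using System.Collections.Generic;

class IndexByPathHash
{
    // Lookup table: 32-bit hash of the full path -> full path.
    static Dictionary<int, string> table = new Dictionary<int, string>();

    static int AddFile(string fullPath)
    {
        int index = fullPath.GetHashCode();   // 32-bit hash of the path string
        table[index] = fullPath;              // a colliding path would silently overwrite the old entry
        return index;
    }

    static string Lookup(int index)
    {
        string path;
        return table.TryGetValue(index, out path) ? path : null;
    }
}
[/code]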
#6 | May 7th, 2003
der_schreckliche_sven (Devotee) | Joined: May 2nd, 2003 | 22 posts

Not necessarily a good idea. Two different files can end up with the same hash, and in that rare collision case your program would not work correctly.

If it's just for storing it in a table, I would not use a hash function at all. I would just use a simple incrementing index for each file.
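
A sketch of the incrementing-index variant; a 32-bit counter is used here because it fits the 4-byte file index field of a queryhit, and again the names are illustrative:

[code]
using System.Collections.Generic;

class IncrementalIndex
{
    static uint nextIndex = 1;   // 32-bit index, matching the queryhit file index field
    static Dictionary<uint, string> files = new Dictionary<uint, string>();

    static uint AddFile(string fullPath)
    {
        uint index = nextIndex++;   // unique as long as fewer than 2^32 files are ever added
        files[index] = fullPath;
        return index;
    }

    static string Lookup(uint index)
    {
        string path;
        return files.TryGetValue(index, out path) ? path : null;
    }
}
[/code]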
#7 | May 8th, 2003
gavelin (Apprentice) | Joined: March 18th, 2003 | Marseille, France | 5 posts
File indexes + file seeking

The fact is, if I store just a file index associated with a file name, I must keep this correspondence in memory. If the user shares a lot of files, this can become significant.

If I use a hash table, I can run into the problem mentioned previously: two files can end up with the same hash. But with the file name as well, I can still find the right file, no?

I'd like to raise another problem: do I have to go to my drive every time a query comes in for some criteria (for instance "madonna mp3")? Keeping all the shared files in memory can be huge, but it is faster to search, no?
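
A minimal sketch of the in-memory approach for that second question: the shared file names are kept in a list and queries are matched against it, so the disk is only touched when the shared folders are rescanned. The naive keyword matching and all names are illustrative assumptions, not elinul code:

[code]
using System.Collections.Generic;

class SharedFileList
{
    // In-memory list of (index, file name) pairs, rebuilt when the shared folders are rescanned.
    static List<KeyValuePair<uint, string>> shared = new List<KeyValuePair<uint, string>>();

    // Returns the indexes of files whose name contains every keyword of the query.
    static List<uint> Search(string query)
    {
        string[] keywords = query.ToLowerInvariant().Split(' ');
        List<uint> hits = new List<uint>();

        foreach (KeyValuePair<uint, string> entry in shared)
        {
            string name = entry.Value.ToLowerInvariant();
            bool match = true;
            foreach (string keyword in keywords)
            {
                if (keyword.Length > 0 && name.IndexOf(keyword) < 0)
                {
                    match = false;
                    break;
                }
            }
            if (match)
                hits.Add(entry.Key);
        }
        return hits;
    }
}
[/code]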

thx,
#8 | May 8th, 2003
tshdos (Gnutella Veteran) | Joined: March 24th, 2002 | Virginia | 101 posts

Quote:
Originally posted by der_schreckliche_sven
Not necessarily a good idea. Two different files can end up with the same hash, and in that rare collision case your program would not work correctly.

If it's just for storing it in a table, I would not use a hash function at all. I would just use a simple incrementing index for each file.

True, I have yet to see it happen, but I am sure that with only a 32-bit hash you will get collisions at some point.

As for speed, you really wouldn't see much of a difference either way, but an incremental index does seem easier.
#9 | June 10th, 2003
prh99 (Disciple) | Joined: July 18th, 2002 | 19 posts

As I believe was stated earlier, pick a number to start with and just increment it for every file that is shared. If you use a long integer, you can uniquely identify a fairly significant number of files, more than your average Gnutella user will ever share.
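
A sketch of that counter; Interlocked.Increment is an extra assumption to keep it safe if several threads add files at once, which the post itself does not ask for:

[code]
using System.Threading;

class LongIndex
{
    static long nextIndex = 0;

    // A long allows 2^63 - 1 (about 9.2 * 10^18) distinct indexes,
    // far more files than any Gnutella user will ever share.
    static long NextFileIndex()
    {
        return Interlocked.Increment(ref nextIndex);
    }
}
[/code]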