Gnutella Forums  

General Gnutella Development Discussion

#1 | May 7th, 2003
gavelin (Apprentice) | Joined: March 18th, 2003 | Marseille, France | 5 posts
How to produce a unique file index?

Hi everybody,
does anyone have an idea for assigning a unique file number, in order to uniquely identify one file among all the shared files of a client?
This assignment SHOULD be fast, and looking the file up by its index SHOULD be fast too. Memory usage SHOULD be as low as possible.

This question comes up in the development of elinul, a C# open source Gnutella client, soon available.

Thanks,
#2 | May 7th, 2003
Paradog (Distinguished Member) | Joined: April 5th, 2002 | Germoney | 739 posts

The best and most secure thing you could do is hash it. I recommend MD5; no idea why the GDF buddies keep using SHA-1.
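
A minimal sketch of the "hash it" suggestion in C#, using the framework's MD5 class; the helper name Md5OfFile and the hex formatting are illustrative choices, not something from elinul or any other client:

[code]
using System;
using System.IO;
using System.Security.Cryptography;

class FileHashing
{
    // Illustrative helper: returns the MD5 digest of a file as a lowercase hex string.
    static string Md5OfFile(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] digest = md5.ComputeHash(stream);   // 16 bytes = 128 bits
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
[/code]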
#3 | May 7th, 2003
der_schreckliche_sven (Devotee) | Joined: May 2nd, 2003 | 22 posts

SHA-1 produces 160-bit hashes instead of 128-bit hashes (like MD5), which makes it a little more secure.

The main reason for using SHA-1 instead of MD5 was, AFAIK, that MD5 has one or two known weaknesses which could allow an attacker (the RIAA, for example) to corrupt downloads by offering corrupted files with the same MD5 hash as the normal files.

SHA-1 is about 50% slower than MD5, but that does not really matter, because modern CPUs can hash data much faster than (at least IDE) drives can read it. When hashing, it is usually the hard disk activity rather than the CPU that slows the computer down.
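
For comparison, a small sketch that computes both digests for the same data; "shared.mp3" is only a placeholder file name:

[code]
using System;
using System.IO;
using System.Security.Cryptography;

class DigestComparison
{
    static void Main()
    {
        byte[] data = File.ReadAllBytes("shared.mp3");   // placeholder path

        byte[] md5Digest = MD5.Create().ComputeHash(data);     // 16 bytes = 128 bits
        byte[] sha1Digest = SHA1.Create().ComputeHash(data);   // 20 bytes = 160 bits

        Console.WriteLine("MD5:   {0} bits", md5Digest.Length * 8);
        Console.WriteLine("SHA-1: {0} bits", sha1Digest.Length * 8);
    }
}
[/code]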
#4 | May 7th, 2003
Paradog (Distinguished Member) | Joined: April 5th, 2002 | Germoney | 739 posts

Good to know, terrible Sven.
#5 | May 7th, 2003
tshdos (Gnutella Veteran) | Joined: March 24th, 2002 | Virginia | 101 posts

If you just need the index for queryhits, use a simple 32-bit hash.
In C#, just use the GetHashCode method of the string containing the full path of the file, and store the path in a lookup table with the hash as the key.
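
A sketch of that suggestion, assuming a plain in-memory dictionary keyed by the path's GetHashCode value (class and method names are illustrative):

[code]
using System.Collections.Generic;

class IndexByPathHash
{
    // Lookup table: 32-bit hash of the full path -> full path.
    static Dictionary<int, string> table = new Dictionary<int, string>();

    static int AddFile(string fullPath)
    {
        int index = fullPath.GetHashCode();   // 32-bit hash of the path string
        table[index] = fullPath;              // a colliding path would silently overwrite the old entry
        return index;
    }

    static string Lookup(int index)
    {
        string path;
        return table.TryGetValue(index, out path) ? path : null;
    }
}
[/code]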
#6 | May 7th, 2003
der_schreckliche_sven (Devotee) | Joined: May 2nd, 2003 | 22 posts

Not necessarily a good idea. Two different files can end up with the same hash, and in that rare collision case your program would not work correctly.

If it's just for storing it in a table, I would not use a hash function at all. I would just use a simple incrementing index for each file.
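
A sketch of the incrementing-index variant; a 32-bit counter is used here because it fits the 4-byte file index field of a queryhit, and again the names are illustrative:

[code]
using System.Collections.Generic;

class IncrementalIndex
{
    static uint nextIndex = 1;   // 32-bit index, matching the queryhit file index field
    static Dictionary<uint, string> files = new Dictionary<uint, string>();

    static uint AddFile(string fullPath)
    {
        uint index = nextIndex++;   // unique as long as fewer than 2^32 files are ever added
        files[index] = fullPath;
        return index;
    }

    static string Lookup(uint index)
    {
        string path;
        return files.TryGetValue(index, out path) ? path : null;
    }
}
[/code]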
#7 | May 8th, 2003
gavelin (Apprentice) | Joined: March 18th, 2003 | Marseille, France | 5 posts
File indexes + file seeking

The fact is, if I store just a file index associated with a file name, I must keep this correspondence in memory. If the user shares a lot of files, this can become significant.

If I use a hash table, I can run into the problem mentioned previously: two files can end up with the same hash. But with the file name as well, I can still find the right file, no?

I'd like to raise another problem: do I have to go to my drive every time a query comes in for some criteria (for instance "madonna mp3")? Keeping all the shared files in memory can be huge, but it is faster to search, no?
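
A minimal sketch of the in-memory approach for that second question: the shared file names are kept in a list and queries are matched against it, so the disk is only touched when the shared folders are rescanned. The naive keyword matching and all names are illustrative assumptions, not elinul code:

[code]
using System.Collections.Generic;

class SharedFileList
{
    // In-memory list of (index, file name) pairs, rebuilt when the shared folders are rescanned.
    static List<KeyValuePair<uint, string>> shared = new List<KeyValuePair<uint, string>>();

    // Returns the indexes of files whose name contains every keyword of the query.
    static List<uint> Search(string query)
    {
        string[] keywords = query.ToLowerInvariant().Split(' ');
        List<uint> hits = new List<uint>();

        foreach (KeyValuePair<uint, string> entry in shared)
        {
            string name = entry.Value.ToLowerInvariant();
            bool match = true;
            foreach (string keyword in keywords)
            {
                if (keyword.Length > 0 && name.IndexOf(keyword) < 0)
                {
                    match = false;
                    break;
                }
            }
            if (match)
                hits.Add(entry.Key);
        }
        return hits;
    }
}
[/code]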

thx,
#8 | May 8th, 2003
tshdos (Gnutella Veteran) | Joined: March 24th, 2002 | Virginia | 101 posts

Quote:
Originally posted by der_schreckliche_sven
Not necessarily a good idea. Two different files can end up with the same hash, and in that rare collision case your program would not work correctly.

If it's just for storing it in a table, I would not use a hash function at all. I would just use a simple incrementing index for each file.

True, I have yet to see it happen, but I am sure that with only a 32-bit hash you will get collisions at some point.

As for speed, you really wouldn't see much of a difference either way, but an incremental index does seem easier.
#9 | June 10th, 2003
prh99 (Disciple) | Joined: July 18th, 2002 | 19 posts

As I believe was stated earlier, pick a number to start with and just increment it for every file that is shared. If you use a long integer, you can uniquely identify a fairly significant number of files, more than your average Gnutella user will ever share.
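
A sketch of that counter; Interlocked.Increment is an extra assumption to keep it safe if several threads add files at once, which the post itself does not ask for:

[code]
using System.Threading;

class LongIndex
{
    static long nextIndex = 0;

    // A long allows 2^63 - 1 (about 9.2 * 10^18) distinct indexes,
    // far more files than any Gnutella user will ever share.
    static long NextFileIndex()
    {
        return Interlocked.Increment(ref nextIndex);
    }
}
[/code]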