Partial File Sharing Protocol Development This thread is for discussing and writing an open protocol for sharing partial files across the gnutella network. This feature will greatly increase the amount of resources available to the network. Everyone's input is welcome. Here are two great threads at the_gdf to get started: http://groups.yahoo.com/group/the_gdf/message/6807 http://groups.yahoo.com/group/the_gdf/message/6918 Thanks everyone, GF |
PFSP 0.1 - Very Very Rough
Here is a very rough draft:

Partial File Sharing Protocol 0.1

1 Introduction

1.1 Purpose
Creation of an open protocol to allow the sharing of partially downloaded files between gnutella clients. There are many benefits to this, and it can be done with the same (or even greater) confidence as the sharing of complete, hashed files.

The best way to share partial files is to use tree hashes. This approach actually provides greater confidence than the current full file hash because the client can confirm individual segments of the file. The current full file hash can only confirm that the file is the desired file after it has been fully downloaded. If there is an error, there is no way to tell where it is, and the full file must be discarded. This could be a huge waste of bandwidth, especially when throwing out a 700 MB file.

An added bonus of the tree hash method is the ability to resume from nearly identical files. This solves the problem we are having with files that are the same but have different metadata. For example, as soon as I play many video files, something in them changes and the file cannot be swarmed any longer. The same problem exists when the same song has different meta tags at the end. The currently proposed solution to the problem is to do a second hash of the file without the portion containing the metadata. This is very file type specific and offers little additional benefit when compared with the tree hash, which would offer the benefits of true swarming, partial file sharing, and the ability to share nearly identical files.

For example: I want to download songx, so I search for it and find it. There are quite a few versions with the same size, bitrate, etc., but they have different metadata, so the hash is different. Well, with the tree hash you could use all of those sources to swarm from for the parts of the file that are the same! This would greatly increase swarming speeds while providing the same security and confidence we currently have with hashed files!

2 Protocol Definition

2.1 Tiger Tree Hash
A Tiger Tree hash MUST be generated for each file shared by the client. It is calculated by applying the Tiger hash to 1024-byte blocks of a stream, then combining the interim values through a binary hash tree. Clients MUST NOT share partial files that have not had a Tiger Tree Hash value calculated.
See:
http://sourceforge.net/projects/tigertree/
http://www.cs.technion.ac.il/~biham/Reports/Tiger/
http://bitzi.com/developer/bitprint

2.2 Sub Hash Ranges
Clients MUST store sub ranges of 1 MB size and may choose to store smaller ranges.

2.3 Transmissions and Display of Tiger Tree Hashes

3.1 Queries and Replies
Queries by clients that can handle Partial File Share MUST indicate this in the query (GGEP?). All replies of partial files MUST indicate the ranges available and any X-Alternate-Locations for any parts of the file. Clients SHOULD NOT display partial file results to the user UNLESS the location of a full file is found or the ranges returned cover the range of the full file.

3.1.2 Searching by Sub Hash
Clients MAY search by any 1 MB sub hash.

3.2 Requesting Ranges
HTTP Range GETs are the best standard multivendor way to request parts of a larger file. Clients SHOULD request all required ranges.

3.3 Uploading Partial Files
A client with a complete file SHOULD randomly upload ranges of the file. If ranges of the file are requested, then the client SHOULD randomly choose which ranges to supply first. The intention is to propagate the whole file across the network as rapidly as possible. Clients MAY use X-Alternate-Locations to decide which ranges are rarest and preferentially upload those ranges.

3.4 Sharing Partial Files
Clients that are capable of sharing partial files MUST share partial files by default. Clients MAY allow users to deactivate the sharing of partial files.
To Do:
- Add a new GGEP extension to queries that specifies that you want to see partial files.
- Add a new GGEP extension to queryhits that simply specifies the percentage complete for partial hits (a simple partial/full flag would do as well).
- In an HTTP GET request, add an X-Gnutella-Partial-File header that lists the IP/ports of servers thought to have at least X percent of a file. Do not list the percentage here.
- Servers that support partial file GETs should also support a new CGI-type request of a style like GET /uri-res/N2PR?urn:sha1:PLSTHIPQGSSZTS5FJUPAKUZWUGYQYPFB where the returned payload is a CSV file of the tuple (start, stop, active).
- Clients need to be able to indicate the smallest increment of hash they will provide. I'm not sure it makes sense to store all of the 1024bit (?) hashes. Anyway, there should be a header indicating that the client stores hashes as small as X. Data would be the best way to decide, of course, but 1 MB "feels" like a logical size.
- There should be a mechanism for searching by a sub-hash. I guess if clients store 1 MB hashes it would be possible to search by any of the 1 MB hashes, but it would probably be less computationally intensive if there was an agreed-on sub-hash to search by. I'm not sure how the metadata at the start of a file works. If it is always the same size regardless of content (and I doubt this), then it would be possible to search for sub-hashes after the first part of the file to do sub-hash matches between nearly identical files with different "early" metadata. I guess it would make sense to use either the first 1 MB or the second 1 MB for searching by sub-hash. |
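For anyone who wants to play with the idea in section 2.1 of the draft, here is a minimal sketch of a binary hash tree over 1024-byte blocks. Big assumptions: Python's standard library has no Tiger, so SHA-1 is used as a stand-in (the digests will NOT match real Tiger Tree values), and the 0x00/0x01 prefix bytes and the "carry an unpaired node up unchanged" rule follow my reading of the THEX format rather than anything in this draft. The names `tree_levels` and `root_hash` are made up for illustration.

```python
import hashlib

LEAF_SIZE = 1024  # base block size named in the draft


def _h(data):
    # Stand-in hash (assumption): SHA-1, because Tiger is not in hashlib.
    return hashlib.sha1(data).digest()


def tree_levels(data, leaf_size=LEAF_SIZE):
    """Return every level of the hash tree, leaf level first, root level last."""
    blocks = [data[i:i + leaf_size] for i in range(0, len(data), leaf_size)] or [b""]
    # Leaf hashes get a 0x00 prefix, internal hashes a 0x01 prefix (assumed from THEX).
    level = [_h(b"\x00" + block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                parents.append(_h(b"\x01" + level[i] + level[i + 1]))
            else:
                parents.append(level[i])  # unpaired node is carried up unchanged
        level = parents
        levels.append(level)
    return levels


def root_hash(data, leaf_size=LEAF_SIZE):
    """Root of the tree: the single value that would identify the whole file."""
    return tree_levels(data, leaf_size)[-1][0]
```

The point of the structure is that two files differing in only one block still agree on every leaf hash except that one, which is what would make swarming from nearly identical files possible.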
Re: PFSP 0.1 - Very Very Rough
And that's how I would do it: Remote clients return queryreplies for partial files if the file is requested by hash. The local client tries to establish an HTTP/1.1 connection to them and requests the ranges of the file it needs. The remote client (if no other error occurs) answers with a 206 (Partial Content) if it has a subsection of the requested range and with a 416 (Requested range not satisfiable) if it doesn't. Then the downloading is done and everybody is happy. The main advantage I see with this way of handling partial uploads is that it's easy to code and relatively secure (do we need absolute security?). It needs fewer changes to the protocol and has most of the benefits your proposal has. I'd rather try using the hammer I have at home before I go buy a bigger one. |
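The 206/416 decision described above can be sketched roughly like this. It is only an illustration under my own assumptions: the function name `answer_range_request` and the representation of stored data as a list of inclusive (start, end) byte ranges are invented here, and a real servent would also have to build the actual HTTP response.

```python
def answer_range_request(have, want_start, want_end):
    """Pick the HTTP status for a range request against a partial file.

    have: list of inclusive (start, end) byte ranges already downloaded.
    Returns (206, (start, end)) with the first overlapping piece that can be
    served, or (416, None) when nothing in the requested range is available.
    """
    for start, end in sorted(have):
        lo = max(start, want_start)
        hi = min(end, want_end)
        if lo <= hi:  # the stored range overlaps the requested one
            return 206, (lo, hi)
    return 416, None
```

For example, with bytes 0-99 and 200-299 on disk, a request for 50-250 would be answered with 206 for bytes 50-99, and a request for 100-199 with 416.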
security It is actually rather easy to fake a full file hash: just lie! There is no way for one client to know that the other client lied until it has downloaded the whole file and rehashed it itself. Then, if there is a problem, it doesn't know whether it was just a mistake in transferring data. If the file was multi-sourced, there is no way to know which of the many clients it downloaded from lied. This is a MAJOR vulnerability in the current gnutella network. One rogue client could search the net, find the size and hash of files, and then use the same file size and hash to respond to ALL queries it can, send garbage data as just a small part of a swarm, and destroy thousands or possibly millions of file transfers with minimal bandwidth usage. |
Re: security
I say just give simple partial sharing a try, to see if tree hashes are really necessary. This simple kind of partial sharing could be ready in a month without tedious discussions in the GDF. Your kind of partial sharing will need half a year at least until it's implemented. |
Yes, I have no doubt that there will be major attacks on the gnutella network at some point. The argument that there are already so many holes in gnutella security, so why not a few more, isn't a very good approach. We need to be working to patch up those holes, not create new ones. |
In fact, corruption of a file segment is more common in partial file transfer than a faked file or file segment. Without a tree hash (or some equivalent mechanism), a single corrupted file chunk will not be detected until the entire file has been downloaded, at which point the corruption is found and the whole file has to be dumped. People should at least try out some of the p2p software that has implemented partial file transfer (e.g. eDonkey) and get some first-hand user experience before jumping in to argue about whether 'partial file transfer' is necessary and how it should work. It's quite safe to say that once you've tasted p2p software based on 'partial file transfer', you will never want to go back. |
eDonkey isn't a good example, since it's one of the most poorly designed p2p-clients out there. |
Taliban, the java source for Tiger is available here: http://www.cryptix.org/products/jce/index.html It does not have the "Tree" functionality of course. |
Oh I'm not even going to try tiger tree since I'm not a good programmer at all. |
The best? Hmm, edonk has some great features but has a long, long way to go. The two greatest features that I would love to see in gnutella are: -ability to share partials (with tiger tree this can be done even better than it is with edonk) -ability to create hyperlinks. I know BS is working on this. |
you know? like "the one-eyed is king among the blinds" and right now nothing is better than edonkey when it comes to "real" filesharing. (i dont mean downloading an mp3, i mean getting a very large very specific file fast and reliable) Gnutella sure has the better networkdesign but as you already said is missing some important features in comparison to edonkey (and btw did you read http://www.zeropaid.com/news/article.../05152002a.php ?) [edit:] just to make sure, i love gnutella, would i be hanging around here otherwise? |
One of the reasons that many don't feel excited about eDonkey, besides its boring GUI, is that there are too many tricks (most undocumented) to go through before you can really take advantage of its powerful search and swarmed file transfer. When I first used eDonkey, I usually got 1-5 kb/s download speed; now, with help from the 'Bot', the server list, and all the proper configuration, the speed can go up to 100 kb/s. The speed I mean here, 50 kb/s to 100 kb/s, is very stable; it stays at this level for hours. When I look at the list of sources that feed the download (of course, most are PARTIAL files), there are usually over a hundred sources, with a few of them transferring and the rest in 'Queue' status. Many also hate eDonkey because it forces you to share partial files (those still in the middle of downloading) and purposely slows you down sometimes (conspiracy) to force you to stay online longer and keep sharing. So an MP3 file may appear to take longer to download than on Gnutella, but a DivX will take much less time, and you can be more confident of having it in a short amount of time. At the moment, all I need is eDonkey. Compared to what eDonkey can do, the rest of the P2P world are really all jokes; take a look at http://www.sharereactor.com. However, the fear that eDonkey may one day disappear is always there, as it's a closed source program; sooner or later it will draw attention as it becomes more and more a dominant delivery channel for the 'stuff' (luckily, not many tech media are paying attention to eDonkey, given its low profile). That's why Gnutella needs to catch up and get the lead back while there is still a chance. |
Re: Re: PFSP 0.1 - Very Very Rough |
The X-Alternate-Location is added to the HTTP header of the response when somebody is trying to download from you. |
file chunk size I am probably going to change the protocol from 1 MB chunks -> either 1 MB chunks or 5% of the file, whichever is LARGER. Anyone have any input, opinions, data, etc. on this? |
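To make the proposed rule concrete, here is a small sketch. Assumptions on my part: "1mb" is taken to mean 1 MiB (1,048,576 bytes), 5% is rounded up to a whole byte, and the function name `chunk_size` is just for illustration.

```python
MIB = 1024 * 1024  # assumption: "1mb" means 1 MiB


def chunk_size(file_size):
    """Chunk size in bytes: 1 MiB or 5% of the file, whichever is larger."""
    five_percent = (file_size * 5 + 99) // 100  # 5% of the file, rounded up
    return max(MIB, five_percent)
```

Under this rule a 10 MiB file keeps 1 MiB chunks (5% would only be 0.5 MiB), while a 700 MiB file gets 35 MiB chunks, so no file ever needs more than 20 chunk hashes once it is over 20 MiB.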
I would skip the lines where it says the servents must not share partial files without a tiger tree hash. With tiger tree you can download from those hosts as well, since you will see quickly if the data is not what you want. Before downloading a file, always get the tiger tree hash (of the 1 MB chunks) first. Then simply download the file from any location you find and calculate the hash of each new MB you downloaded, so you can quickly verify you are downloading the right stuff. QueryReplies shouldn't contain Alternate-Locations. Another idea which saves gnutella traffic: servents should not return queryreplies for incomplete files at all. If X-Alternate-Locations in HTTP headers work all right, each location that has the complete file will return X-Alternate-Locations of all servents that accessed the file recently, so you should be able to gather the locations quickly while connecting to all the hosts having the file. Servents may NOT search by sub-hash. Let's say you are sharing 20 gigs of data; that means you would have to keep 20,000 subhashes in a library and even search through this library. When a servent requests a file, it requests all the ranges it needs. The remote servent answers either with HTTP 206, listing the ranges it has and sending one of them, or with HTTP 416 if it cannot satisfy the request. I would prefer it if the ranges were requested and sent one by one. Servents should not upload ranges of the file randomly but satisfy the ranges as they were requested, since I wouldn't want to break HTTP here. Servent settings (e.g. whether they share partial files by default) don't belong in this protocol. |
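The "hash each new MB and compare" step suggested above could look roughly like this. Assumptions: the expected per-chunk hashes have already been fetched from somewhere trusted, SHA-1 stands in for Tiger (which Python's hashlib does not provide), and `bad_chunks` is a made-up helper name.

```python
import hashlib

MIB = 1024 * 1024  # assumption: "1MB chunks" means 1 MiB


def bad_chunks(data, expected, chunk_size=MIB):
    """Return the indexes of chunks whose hash does not match the expected list.

    data: the bytes downloaded so far, starting at offset 0.
    expected: one digest per chunk, in order (SHA-1 here as a stand-in).
    """
    bad = []
    for index, want in enumerate(expected):
        chunk = data[index * chunk_size:(index + 1) * chunk_size]
        if hashlib.sha1(chunk).digest() != want:
            bad.append(index)
    return bad
```

Only the chunks listed in the result would need to be fetched again, possibly from a different source, instead of throwing away the whole file.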
thank you
Thanks again for the feedback |
updated I updated the protocol: http://groups.yahoo.com/group/the_gdf/message/7176 |
In your proposal it is not a breach of the protocol not to support partial file sharing ( if it was you'd have to form a completely new network without all the older clients ), so it's up to the clients if they want to support it by default, optionally or not at all. |
minor update |
You can't do this because it's too much like freenet and will aid in the pirating of copyrighted software and we won't help you steal software! |
only aids in downloading pirated material if you look for pirated material |
PFSP 0.2 - Programmers needed!
Well, time for you programmers to add this amazing feature to all the open source clients out there.

PFSP 0.2
A great thanks goes out to Tor for updating the PFSP 0.2. The PFSP is now usable and hopefully someone is willing to implement it.
http://groups.yahoo.com/group/the_gdf/message/7984
____________________________
Partial File Sharing Protocol 0.2

Here, the server is the host that is providing the file, and the client is the host that requests the file.

1. Partial File Transfer
The server assigns file indexes for partial files, and allows HTTP requests for them. Only partial requests (with a Range header) are accepted. Servers that support uri-res file requests should also allow such requests for partial files. Servers should keep the file index when the file is completed and moved to the incoming files folder.

The X-Available-Ranges header is used by the server to inform the client about what ranges are available.
X-Available-Ranges: bytes=0-10,20-30

The client requests the ranges it wants using the Range header.
Range: bytes=0-
means the client wants any ranges the server can provide. The server then provides the range it wants to upload using a 206 Partial Content response. This allows the server to upload different ranges to different hosts, and save bandwidth by allowing them to get the other parts from each other.

The 206 response contains a Content-Range header of the form
Content-Range: bytes <start>-<end>/<total_size>
Note that <total_size> is the size of the FULL file.

If the server is unable to provide any part of the requested ranges, it returns a 416 Requested Range Not Satisfiable response.

2. Tree Hashes
Tree hashes are not absolutely required for Partial File Sharing, so you don't have to implement this part at first. TigerTree can be implemented if/when corrupt files become a problem. The reason that it is in this proposal is that Partial File Sharing might cause corrupt files to spread faster.

TigerTree hashes are computed using a 1024 byte base size. It is then up to each vendor to decide how many sub-hashes to actually store. Storing (and advertising) the top 10 levels of the tree might be a good decision. It would allow a resolution of about 2 MB on a 1 GB file, and requires only about 25 kB of hash data per file.

The tree is provided as specified in the Tree Hash EXchange format (THEX) at
http://www.open-content.net/specs/dr...e-thex-01.html
It basically says that the hash tree is provided as a long stream of binary data starting with the root hash, then the two hashes it is computed from, and so on.

To inform the client about where the hash tree can be retrieved, the server includes an X-Thex-URI header of this form
X-Thex-URI: <URI> ; <ROOT>
<URI> is any valid URI. It can be to a uri-res translator, and can even point to another host. The client can then retrieve desired parts of the hash tree by doing range requests for the specified URI. <ROOT> is the root TigerTree hash in base32 format.

3. How to find the location of partial files.
This proposal does not affect Gnutella messages in any way. The only available means of spreading the location of a partial file is through the download mesh in X-Gnutella-Alternate-Location headers. I think this should work very well. Since those who share a partial file are also downloading the same file, they will be able to send alt-loc headers to other hosts sharing the full file.

It would be good if the available ranges could be specified in the X-Gnutella-Alternate-Location headers, but I don't really know how to do that most efficiently. The information would quickly become outdated, and is not very important anyway.

Spreading partial files in the download mesh will cause servents that do not support partial file sharing to receive addresses to partial sources. I don't think that is a problem. The worst thing that can happen is that they won't be able to use those sources.

4. Sample negotiation:
Here is a sample negotiation. I don't think it will look exactly like this, but it should show the headers in action. Clients might want to request a small range first, to get the list of available ranges. There are some line breaks in long headers below.

Client:
GET /get/1234/my_song.mp3 HTTP/1.1
User-Agent: FooBar/1.0
Host: 123.123.123.123:6346
Connection: Keep-Alive
Range: bytes=73826-
X-Gnutella-Content-URN: urn:sha1:QLFYWY2RI5WZCTEP6MJKR5CAFGP7FQ5X
X-Gnutella-Alternate-Location: http://theclient.com:6346/get/2468/my_song.mp3

Server:
HTTP/1.1 206 Partial Content
Server: FooBar/1.0
Content-Type: audio/mpeg
Content-Range: bytes 73826-83825/533273
Content-Length: 10000
Connection: Keep-Alive
X-Available-Ranges: bytes=0-285749
X-Gnutella-Content-URN: urn:sha1:QLFYWY2RI5WZCTEP6MJKR5CAFGP7FQ5X
X-Thex-URI: /uri-res/n2x?urn:sha1:QLFYWY2RI5WZCTEP6MJKR5CAFGP7FQ5X;
    VEKXTRSJPTZJLY2IKG5FQ2TCXK26SECFPP4DX7I

<10000 bytes of data>

"n2x" above is an example. Someone should comment on what should be used. Since the URI is provided in the X-Thex-URI header, each vendor can choose how to provide the THEX data.

5. Issues
* A server can decide to upload only a part of the requested range. This means that clients cannot be sure to get the file in sequential order.
* Also, clients cannot decide how many bytes to download per request. Perhaps the server should be required to upload a range that begins with the first requested byte. |
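For implementers, here is a rough sketch of how a client might read the two range headers used in the negotiation above. Everything about the code is my own assumption (function names, the tolerant "bytes=" or "bytes " prefix, raising ValueError on malformed input); the proposal only defines what the headers look like on the wire.

```python
import re


def parse_available_ranges(value):
    """'bytes=0-10,20-30' (or 'bytes 0-10,20-30') -> [(0, 10), (20, 30)]."""
    match = re.fullmatch(r"\s*bytes[=\s]\s*(.*)", value)
    if match is None:
        raise ValueError("not a byte range list: %r" % value)
    ranges = []
    for part in match.group(1).split(","):
        start, _, end = part.strip().partition("-")
        ranges.append((int(start), int(end)))  # int() rejects empty or junk pieces
    return ranges


def parse_content_range(value):
    """'bytes 73826-83825/533273' -> (73826, 83825, 533273)."""
    match = re.fullmatch(r"\s*bytes\s+(\d+)-(\d+)/(\d+)\s*", value)
    if match is None:
        raise ValueError("not a Content-Range value: %r" % value)
    start, end, total = (int(group) for group in match.groups())
    return start, end, total
```

Since the server may answer with a different sub-range than the one requested, a client would presumably use the parsed Content-Range, not its own Range header, to decide where to write the received bytes.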
Partial File Sharing Protocol 0.2.1 is now available. I don't expect there to be any big changes from this version. This document and earlier versions of it are now available for reference at
http://groups.yahoo.com/group/the_gd...roposals/PFSP/
/Tor
---------------------
Partial File Sharing Protocol 0.2.1

Here, the server is the host that is providing the file, and the client is the host that requests the file.

1. Partial File Transfer
The server allows HTTP requests for partial files, at URIs chosen by the server. They can for example be assigned a file index and shared at "/get/index/filename", or simply at "/partials/filename". Only partial requests (with a Range header) are accepted. Servers that support uri-res file requests should also allow such requests for partial files. Servers should make sure that the URI to a partial file does not become invalid when the file is completed.

The X-Available-Ranges header is used by the server to inform the client about what ranges are available.
X-Available-Ranges: bytes 0-10,20-30

The client requests the range it wants using the Range header.
Range: bytes=0-
means the client wants any ranges the server can provide. The server then provides the range it wants to upload using a 206 Partial Content response. This allows the server to upload different ranges to different hosts, and save bandwidth by allowing them to get the other parts from each other. The server can decide to upload any range inside the requested range. This means that the client cannot be sure that the first byte in the response is the first requested byte.

The 206 response contains a Content-Range header of the form
Content-Range: bytes <start>-<end>/<total_size>
Note that <total_size> is the size of the COMPLETE file.

If the server is unable to provide any part of the requested range, it returns a "503 Requested Range Not Available" (the Reason Phrase is just my recommendation). If the client continues to request the same range, the server may send a 404. The X-Available-Ranges header will tell a PFSP enabled client what ranges it can request.

If the client provides an "Accept:" header with "multipart/byteranges" in it, the server may respond with multiple ranges at once. The client may send multiple ranges in the Range: header if it sends an Accept header with multipart/byteranges in the same header set. This is standard HTTP/1.1 stuff, but I doubt that Gnutella servents will support it. If you do not want multipart support, just ignore it and everything will work fine. You should, however, be aware that there can be multiple ranges specified in one "Range:" header. Servents are then allowed to choose any range within the specified ranges, or simply read the first range only.

2. Tree Hashes
Tree hashes are not absolutely required for Partial File Sharing, so you don't have to implement this part at first. TigerTree can be implemented if/when corrupt files become a problem. The reason that it is in this document is that Partial File Sharing might cause corrupt files to spread faster.

TigerTree hashes are computed using a 1024 byte base size. It is then up to each vendor to decide how many sub-hashes to actually store. Storing (and advertising) the top 10 levels of the tree might be a good decision. It would allow a resolution of about 2 MB on a 1 GB file, and requires only about 25 kB of hash data per file.

The tree is provided as specified in the Tree Hash EXchange format (THEX) at
http://www.open-content.net/specs/dr...e-thex-01.html
It basically says that the hash tree is provided as a long stream of binary data starting with the root hash, then the two hashes it is computed from, and so on.

To inform the client about where the hash tree can be retrieved, the server includes an X-Thex-URI header of this form
X-Thex-URI: <URI> ; <ROOT>
<URI> is any valid URI. It can be to a uri-res translator, and can even point to another host. The client can then retrieve desired parts of the hash tree by doing range requests for the specified URI. The THEX data is shared as if it were a partial file. If a client requests a subrange of the THEX data that the server does not store, and is not willing to calculate on the fly, the server uses the same routines as if it were a partial file where the requested range is not available. <ROOT> is the root TigerTree hash in base32 format.

3. How to find the location of partial files.
This protocol does not affect Gnutella messages in any way. The only available means of spreading the location of a partial file is through the download mesh in X-Gnutella-Alternate-Location headers. I think this should work very well. Since those who share a partial file are also downloading the same file, they will be able to send alt-loc headers to other hosts sharing the full file.

Spreading partial files in the download mesh will cause servents that do not support partial file sharing to receive addresses to partial sources. I don't think that is a problem. The worst thing that can happen is that they won't be able to use those sources.

4. Sample negotiation:
Here is a sample negotiation. I don't think it will look exactly like this, but it should show the headers in action. Clients might want to request a small range first, to get the list of available ranges. There are some line breaks in long headers below.

Client:
GET /get/partials/my_song.mp3 HTTP/1.1
User-Agent: FooBar/1.0
Host: 123.123.123.123:6346
Connection: Keep-Alive
Range: bytes=73826-
X-Gnutella-Content-URN: urn:sha1:QLFYWY2RI5WZCTEP6MJKR5CAFGP7FQ5X
X-Gnutella-Alternate-Location: http://theclient.com:6346/get/2468/my_song.mp3

Server:
HTTP/1.1 206 Partial Content
Server: FooBar/1.0
Content-Type: audio/mpeg
Content-Range: bytes 73826-83825/533273
Content-Length: 10000
Connection: Keep-Alive
X-Available-Ranges: bytes 0-285749
X-Gnutella-Content-URN: urn:sha1:QLFYWY2RI5WZCTEP6MJKR5CAFGP7FQ5X
X-Thex-URI: /uri-res/n2x?urn:sha1:QLFYWY2RI5WZCTEP6MJKR5CAFGP7FQ5X;
    VEKXTRSJPTZJLY2IKG5FQ2TCXK26SECFPP4DX7I

<10000 bytes of data>

"n2x" above is an example. Someone should comment on what should be used. Since the URI is provided in the X-Thex-URI header, each vendor can choose how to provide the THEX data. |
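The "top 10 levels" figures in section 2 can be checked with a little arithmetic. The sketch below assumes a 24-byte (192-bit) Tiger digest, binary units (1 GB = 2^30 bytes), and a tree where an unpaired node is carried up a level, so a level that is k steps above the leaves has ceil(leaves / 2^k) nodes. The function name `top_levels_cost` is made up for this example.

```python
TIGER_DIGEST_BYTES = 24  # Tiger produces 192-bit digests


def top_levels_cost(file_size, levels=10, leaf_size=1024):
    """Return (bytes of hash data, bytes covered per deepest stored hash)."""
    leaves = max(1, -(-file_size // leaf_size))  # ceiling division
    depth = (leaves - 1).bit_length()            # levels below the root
    nodes = 0
    for level in range(min(levels, depth + 1)):  # level 0 is the root
        span = 2 ** (depth - level)              # leaves under one node at this level
        nodes += -(-leaves // span)
    deepest = min(levels - 1, depth)
    resolution = leaf_size * 2 ** (depth - deepest)
    return nodes * TIGER_DIGEST_BYTES, resolution
```

For a 1 GB file this gives 2^20 leaves and a depth of 20; the top 10 levels hold 1 + 2 + ... + 512 = 1023 hashes, which is 1023 x 24 = 24,552 bytes (about 25 kB), and each hash on the deepest stored level covers 2^11 leaves, i.e. 2 MB. That agrees with the numbers quoted in the document.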
now available in shareaza 1.4!? PFSP is available in shareaza. Haven't tried it yet, but the future of gnutella is here. |
The future of Gnutella does not depend on one person or client; it is everyone as a group. |