Here is a very rough draft:
Partial File Sharing Protocol 0.1
1 Introduction
1.1 Purpose
Create an open protocol that allows partially downloaded files to be shared between Gnutella clients. This has many benefits, and it can be done with the same confidence as the sharing of complete, hashed files, or even greater.
The best way to share partial files is to use tree hashes. This
approach actually provides greater confidence than the current full
file hash because the client can verify individual segments of the
file. The current full file hash can only confirm that the file is the
desired file after it has been fully downloaded. If there is an error,
there is no way to tell where it is, and the full file must be
discarded. This can be a huge waste of bandwidth, especially when
throwing out a 700 MB file.
An added bonus of the tree hash method is the ability to resume from
nearly identical files. This solves the problem we are having with
files that are the same but with different metadata. For example, as
soon as I play many video files, something in them changes and the
file cannot be swarmed any longer. The same problem exists when the
same song has different meta tags at the end. The currently proposed
solution is to compute a second hash of the file without
the portion containing the metadata. This is very file-type specific
and offers little additional benefit compared with the tree hash,
which would offer true swarming, partial file sharing,
and the ability to share nearly identical files.
For example:
I want to download songx, so I search for it and find it. There are
quite a few versions with the same size, bitrate, etc., but they have
different metadata, so the hash is different.
With the tree hash, you could swarm from all of those sources for
the parts of the file that are the same. This would
greatly increase swarming speeds while providing the same security
and confidence we currently have with hashed files.
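A minimal sketch of the songx case, under stated assumptions: each file is cut into fixed-size segments, each segment is hashed on its own, and two versions can be swarmed together wherever the hashes at the same position agree. SHA-256 stands in for Tiger because Python's hashlib does not include Tiger, the function names are hypothetical, and the demo uses 1024-byte segments to stay small. Note that this only works when the differing metadata does not shift the rest of the file, as with a fixed-size tag at the end.

```python
import hashlib


def segment_hashes(data: bytes, segment_size: int) -> list[bytes]:
    """Hash each fixed-size segment of the data independently."""
    # Assumption: SHA-256 is a stand-in for Tiger so the sketch runs anywhere.
    return [
        hashlib.sha256(data[i:i + segment_size]).digest()
        for i in range(0, len(data), segment_size)
    ]


def shared_segments(a: list[bytes], b: list[bytes]) -> list[int]:
    """Return the indices of segments whose hashes match at the same position."""
    return [i for i, (ha, hb) in enumerate(zip(a, b)) if ha == hb]


# Two versions of the same song that differ only in a trailing 128-byte tag.
audio = bytes(range(256)) * 14  # 3584 bytes of identical "audio"
file_a = audio + b"TAG" + b"artist one".ljust(125, b"\0")
file_b = audio + b"TAG" + b"artist two".ljust(125, b"\0")

common = shared_segments(segment_hashes(file_a, 1024), segment_hashes(file_b, 1024))
print(common)  # [0, 1, 2]: only the last segment, which holds the tag, differs
```

Their full-file hashes differ, yet three of the four segments could be fetched from either source.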
2 Protocol Definition
2.1 Tiger Tree Hash
A Tiger Tree Hash MUST be generated for each file shared by the client. It is calculated by applying the Tiger hash to 1024-byte blocks of a stream, then combining the interim values through a binary hash tree.
Clients MUST NOT share partial files that have not had a Tiger Tree Hash value calculated.
See:
http://sourceforge.net/projects/tigertree/
http://www.cs.technion.ac.il/~biham/Reports/Tiger/
http://bitzi.com/developer/bitprint
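The tree construction described in 2.1 can be sketched roughly as follows. This is an illustration, not a reference implementation: the hash function is passed in as a parameter, and SHA-256 is used as a stand-in because Python's hashlib does not provide Tiger. The 0x00 leaf prefix, the 0x01 internal-node prefix, and the promotion of an unpaired node to the next level are assumptions borrowed from the usual tree hash convention and are not stated in this draft.

```python
import hashlib
from typing import Callable

BLOCK_SIZE = 1024  # leaf block size from section 2.1


def tree_hash(data: bytes, digest: Callable[[bytes], bytes]) -> bytes:
    """Return the root of a binary hash tree built over 1024-byte blocks."""
    # Leaf level: hash every block; empty input is treated as one empty block.
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    level = [digest(b"\x00" + block) for block in blocks]
    # Combine neighbouring pairs until a single root remains.
    while len(level) > 1:
        parents = [
            digest(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            parents.append(level[-1])  # unpaired node moves up unchanged
        level = parents
    return level[0]


def stand_in(data: bytes) -> bytes:
    # Assumption: SHA-256 replaces Tiger here only so the sketch is runnable.
    return hashlib.sha256(data).digest()


root = tree_hash(b"a" * 3000, stand_in)  # three blocks: 1024 + 1024 + 952 bytes
print(root.hex())
```

Because every internal node depends only on the blocks beneath it, a client holding the node for a given range can verify that range without the rest of the file.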
2.2 Sub Hash Ranges
Clients MUST store sub-hashes for ranges of 1 MB and MAY choose to store sub-hashes for smaller ranges.
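To get a feel for the storage cost of different range sizes, here is a small back-of-the-envelope sketch. It assumes that 1 MB means 1024 blocks of 1024 bytes, and it counts only the hashes at one granularity, not the tree nodes above them. The 24-byte figure is the size of a 192-bit Tiger digest; the function names are hypothetical.

```python
TIGER_DIGEST_BYTES = 24  # Tiger produces a 192-bit digest
MB = 1024 * 1024         # assumption: 1 MB = 1024 blocks of 1024 bytes


def hash_count(file_size: int, range_size: int) -> int:
    """Number of sub-hashes needed to cover the file at this range size."""
    return -(-file_size // range_size)  # ceiling division


def storage_bytes(file_size: int, range_size: int) -> int:
    """Bytes needed to store one digest per range."""
    return hash_count(file_size, range_size) * TIGER_DIGEST_BYTES


size = 700 * MB
print(hash_count(size, MB), storage_bytes(size, MB))      # 700 16800
print(hash_count(size, 1024), storage_bytes(size, 1024))  # 716800 17203200
```

For a 700 MB file, 1 MB ranges cost about 16 KB of hashes, while keeping every 1024-byte block hash costs about 16 MB.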
2.3 Transmissions and Display of Tiger Tree Hashes
3.1 Queries and Replies
Clients that can handle partial file sharing MUST indicate this in their queries (GGEP?).
All replies for partial files MUST indicate the ranges available and any X-Alternate-Locations for any parts of the file.
Clients SHOULD NOT display partial file results to the user unless the location of a full file is found or the returned ranges together cover the full file.
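The display rule above amounts to an interval-coverage check. A minimal sketch, assuming ranges are collected from all replies as (start, end) byte offsets with an exclusive end; the function names and that representation are illustrative, not part of the draft.

```python
def covers_whole_file(ranges: list[tuple[int, int]], file_size: int) -> bool:
    """True if the union of (start, end) ranges covers bytes 0..file_size."""
    next_needed = 0  # first byte offset not yet known to be available
    for start, end in sorted(ranges):
        if start > next_needed:
            return False  # gap before this range
        next_needed = max(next_needed, end)
    return next_needed >= file_size


def should_display(full_source_known: bool,
                   ranges: list[tuple[int, int]],
                   file_size: int) -> bool:
    """Show a partial result only if the whole file is obtainable."""
    return full_source_known or covers_whole_file(ranges, file_size)


print(should_display(False, [(500, 1000), (0, 600)], 1000))  # True
print(should_display(False, [(0, 400), (500, 1000)], 1000))  # False
```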
3.1.2 Searching by Sub Hash
Clients MAY search by any 1 MB sub-hash.
3.2 Requesting Ranges
HTTP Range GETs are the best standard, multi-vendor way to request
parts of a larger file.
Clients SHOULD request all required ranges.
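As an illustration, a client could work out which ranges it still needs and turn them into an HTTP Range header value. This sketch assumes ranges are tracked as (start, end) byte offsets with an exclusive end; HTTP byte ranges use an inclusive last byte, hence the `end - 1`. The function names are hypothetical.

```python
def missing_ranges(have: list[tuple[int, int]], file_size: int) -> list[tuple[int, int]]:
    """Return the (start, end) gaps not covered by the ranges already held."""
    missing = []
    cursor = 0  # first byte offset not yet accounted for
    for start, end in sorted(have):
        if start > cursor:
            missing.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < file_size:
        missing.append((cursor, file_size))
    return missing


def range_header(ranges: list[tuple[int, int]]) -> str:
    """Format exclusive-end ranges as an HTTP Range header value."""
    if not ranges:
        raise ValueError("nothing to request")
    return "bytes=" + ",".join(f"{start}-{end - 1}" for start, end in ranges)


needed = missing_ranges([(0, 1000), (2000, 3000)], 4000)
print(range_header(needed))  # bytes=1000-1999,3000-3999
```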
3.3 Uploading Partial Files
A client with a complete file SHOULD randomly upload ranges of the file. If ranges of the file are requested then the client SHOULD randomly choose which ranges to supply first. The intention is to propagate the whole file across the network as rapidly as possible. Clients MAY use X-Alternate-Locations to decide which ranges are rarest and preferentially upload those ranges.
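One possible way to combine the random and rarest-first rules above, as a sketch only. It assumes the uploader has already derived, from X-Alternate-Locations, a count of known sources per range index; how that count is obtained is not defined in this draft, and `pick_range` is a hypothetical name.

```python
import random
from typing import Optional


def pick_range(requested: list[int],
               availability: Optional[dict[int, int]] = None) -> int:
    """Choose which requested range index to upload first."""
    if not availability:
        # No source information: plain random choice, as the SHOULD requires.
        return random.choice(requested)
    # Prefer the ranges with the fewest known sources; break ties randomly.
    rarest = min(availability.get(i, 0) for i in requested)
    candidates = [i for i in requested if availability.get(i, 0) == rarest]
    return random.choice(candidates)


# Range 1 has only one known source, so it is uploaded first.
print(pick_range([0, 1, 2], {0: 3, 1: 1, 2: 5}))  # 1
```

Ranges missing from the availability map are treated as having no known sources, so they are picked before anything else.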
3.4 Sharing Partial Files
Clients that are capable of sharing partial files MUST share partial files by default. Clients MAY allow users to disable the sharing of partial files.
To Do:
- Add a new GGEP extension to queries that specifies that you want to
see partial files.
- Add a new GGEP extension to queryhits that simply specifies the
percentage complete for partial hits. (A simple partial/full flag
would do as well.)
- In an HTTP GET request, add an X-Gnutella-Partial-File header that
lists the IP/ports of servers thought to have at least X percent
of a file. Do not list the percentage here.
- Servers that support partial file GETs should also support a new
CGI-style request such as
GET /uri-res/N2PR?urn:sha1:PLSTHIPQGSSZTS5FJUPAKUZWUGYQYPFB
where the returned payload is a CSV file of (start, stop, active)
tuples.
- Clients need to be able to indicate the smallest hash increment
they will provide. I'm not sure it makes sense to store all of
the hashes of the individual 1024-byte blocks. Anyway, there should
be a header indicating that the client stores hashes as small as X.
Real data would be the best way to decide, of course, but 1 MB
"feels" like a logical size.
- There should be a mechanism for searching by a sub-hash. I guess if
clients store 1 MB hashes it would be possible to search by any of the
1 MB hashes, but it would probably be less computationally intensive
if there were an agreed-on sub-hash to search by. I'm not sure how the
metadata at the start of a file works. If it is always the same size
regardless of content (and I doubt this), then it would be possible to
search by sub-hashes taken after the first part of the file, to match
nearly identical files with different "early" metadata. I guess it
would make sense to use either the first 1 MB or the second 1 MB for
searching by sub-hash.
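For the N2PR item in the list above, parsing the proposed CSV payload of (start, stop, active) tuples might look like the sketch below. The draft does not define the encoding or meaning of the "active" field, so treating it as a 0/1 flag is purely an assumption, as are the `PartialRange` and `parse_ranges` names and the sample payload.

```python
import csv
import io
from typing import NamedTuple


class PartialRange(NamedTuple):
    start: int
    stop: int
    active: bool  # assumption: encoded as "1" or "0" in the payload


def parse_ranges(payload: str) -> list[PartialRange]:
    """Parse a CSV payload with one start,stop,active tuple per line."""
    rows = csv.reader(io.StringIO(payload))
    return [
        PartialRange(int(row[0]), int(row[1]), row[2].strip() == "1")
        for row in rows
        if row  # skip blank lines
    ]


sample = "0,1048576,0\n2097152,3145728,1\n"
for entry in parse_ranges(sample):
    print(entry.start, entry.stop, entry.active)
```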