Hi, after talking with Tamama, Emixode and Bmk I found it necessary to rework the proposal a little bit. :-)
Main goals for a new 0.7 protocol are:
- fully HTTP-like connections, so an easy parser is enough
- straightforward and simple connection scheme, easy and fast
- very flexible design for later needs, avoiding a 0.8 in the near future *g*
- don't hurt old v0.4 servants, let them still be operable
Appendix A: Extended HTTP style connecting
First of all we make the connection sequence fully HTTP-like (so the same parser can be used on all incoming connections). I suggest using 'CONNECT REQUEST GNUTELLA/0.7'; the response 'GNUTELLA/0.7 200 OK' is already correct.
Then I suggest a slightly different connection handshake from what I wrote in the original post. A v0.7 servant will implement two alternatives:
* 1st handshake alternative:
Fast-Connect
Overview: Default handshake, fixed 2 steps (may be repeated in case of error codes)
Here is a sample interaction:
Code:
Client                                  Server                                  Comments
----------------------------------------------------------------------------------------
CONNECT REQUEST GNUTELLA/0.7<cr><lf>                                            <- step 1
User-Agent: AoloA/1.0<cr><lf>
Query-Routing: 0.2, 0.1<cr><lf>
<cr><lf>
                                        GNUTELLA/0.7 200 OK<cr><lf>             <- step 2
                                        User-Agent: Peeranha/1.2<cr><lf>
                                        Query-Routing: 0.1<cr><lf>
                                        Your-IP: 194.246.250.222<cr><lf>
                                        <cr><lf>
[binary messages]                       [binary messages]
Fast-Connect is the default and will be the most used connection type. Easy and simple. Each client usually announces _all_ (interesting) features it supports in the headers, once. Unknown headers are ignored; for known headers the highest version supported by both sides is used. After those two steps (two clients = two steps) the binary messages start automatically.
Two exceptions: a) If the server does not respond with '200 OK', the same procedure is started again (the server may disconnect, but it doesn't need to). b) If the client or the server sends a 'Full-Handshake: Yes' header, they _have_ to continue with the full handshake (described next).
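A minimal sketch in C of the "highest version supported by both sides" rule. The function names and the packing of a version into one number (major * 1000 + minor) are my own assumptions for illustration, not part of the proposal. Header values are assumed to be comma-separated 'major.minor' lists, as in the sample above.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Reads the next "major.minor" entry from a comma-separated list and
 * advances *p past it. Returns major * 1000 + minor, or a negative
 * value at the end of the list or on malformed input.
 */
static long parse_version(const char **p)
{
    char *end;
    long major, minor = 0;

    while (**p == ' ' || **p == ',')
        (*p)++;                          /* skip separators */
    if (**p == '\0')
        return -1;                       /* end of list */
    major = strtol(*p, &end, 10);
    if (end == *p || major < 0)
        return -2;                       /* not a version number */
    if (*end == '.') {
        const char *m = end + 1;
        minor = strtol(m, &end, 10);
        if (end == m || minor < 0)
            return -2;
    }
    *p = end;
    return major * 1000 + minor;
}

/* Returns the highest version present in both lists, or -1 if none. */
long highest_common_version(const char *mine, const char *theirs)
{
    long best = -1, a, b;
    const char *p = mine;

    assert(mine != NULL && theirs != NULL);
    while ((a = parse_version(&p)) >= 0) {
        const char *q = theirs;
        while ((b = parse_version(&q)) >= 0)
            if (a == b && a > best)
                best = a;
    }
    return best;
}
```

With the sample headers, highest_common_version("0.2, 0.1", "0.1") yields 1, i.e. both sides would speak Query-Routing 0.1.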
* 2nd handshake alternative:
Full-Handshake
Overview: Optional, minimum 4 steps (+ 2*x steps)
Here is a sample interaction:
Code:
Client                                  Server                                  Comments
----------------------------------------------------------------------------------------
CONNECT REQUEST GNUTELLA/0.7<cr><lf>
User-Agent: CatDog/2.4<cr><lf>
Query-Routing: 0.1<cr><lf>
<cr><lf>
                                        GNUTELLA/0.7 200 OK<cr><lf>
                                        User-Agent: CatDog/2.5<cr><lf>
                                        Query-Routing: 0.2, 0.1<cr><lf>
                                        Your-IP: 194.246.250.222<cr><lf>
                                        Full-Handshake: Yes<cr><lf>             <- recognize this
                                        Query-Routing-Extra: feature xyz requested<cr><lf>
                                        <cr><lf>
CONNECT HANDSHAKE GNUTELLA/0.7<cr><lf>                                          <- more handshake
Query-Routing-Extra: feature xyz accepted<cr><lf>
<cr><lf>
                                        GNUTELLA/0.7 200 OK<cr><lf>
                                        <cr><lf>
START BINARY GNUTELLA/0.7<cr><lf>                                               <- ending handshake
<cr><lf>
                                        GNUTELLA/0.7 200 OK<cr><lf>
                                        <cr><lf>
[binary messages]                       [binary messages]
The Full-Handshake is similar to number 1 but optional: it is used whenever one side needs/requests it with the 'Full-Handshake' header (see comments above). The first 2 steps are the same as in number 1, but this alternative requires an additional 2-step ending handshake before the binary messages begin (= 4 steps minimum). In the example above you see two more handshake steps in between (= 6 steps); even more are possible. The ending handshake is needed as a separator/transition from HTTP handshaking to binary data, because this alternative allows more than 2 fixed steps. In contrast to the original v0.7 proposal, this is less "hacky". :-)
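The step sequence can be written as a small state machine. This is only a sketch in C from the server's point of view; the state names and the function hs_next are hypothetical and not defined by the proposal.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical state names, not part of the proposal. */
typedef enum { HS_IDLE, HS_NEGOTIATING, HS_BINARY, HS_ERROR } hs_state;

/*
 * Returns the state reached after the given request line has been
 * answered with '200 OK'. 'line' is the request line without <cr><lf>;
 * 'full_handshake' is nonzero if either side sent 'Full-Handshake: Yes'.
 */
hs_state hs_next(hs_state cur, const char *line, int full_handshake)
{
    assert(line != NULL);
    if (cur == HS_BINARY || cur == HS_ERROR)
        return HS_ERROR;                 /* no text requests expected here */
    if (strcmp(line, "CONNECT REQUEST GNUTELLA/0.7") == 0)
        return full_handshake ? HS_NEGOTIATING : HS_BINARY; /* (re)start */
    if (cur == HS_NEGOTIATING) {
        if (strcmp(line, "CONNECT HANDSHAKE GNUTELLA/0.7") == 0)
            return HS_NEGOTIATING;       /* the optional 2*x extra steps */
        if (strcmp(line, "START BINARY GNUTELLA/0.7") == 0)
            return HS_BINARY;            /* ending handshake */
    }
    return HS_ERROR;
}
```

A Fast-Connect goes from HS_IDLE straight to HS_BINARY; the six-step sample above passes through HS_NEGOTIATING twice before START BINARY ends the handshake.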
Why do all this? The whole connection scheme was designed for easy parsing + efficiency + flexibility. The technical background: while we will mostly use the first, intuitive two-step handshake, we introduce the optional flexible handshake for future/extended use. Currently we do not need it; in particular, if someone wants to transfer proprietary data, let those clients deal with it. Proprietary data is nothing a handshake should care about, better improve the protocol for the sake of everyone. The second alternative especially may look oversized, but after long discussions I think this is the way to fit all needs, and it is highly HTTP oriented (we already use HTTP and it promises to be a robust and successful design). Developers, please play around with all the possibilities you can imagine, this design should hopefully make your code easy & flexible.
Summary: A v0.7 client will understand the following HTTP and GNUTELLA methods: GET, PUT, CONNECT, START (and GIV, GNUTELLA).
GET for downloading files - e.g. 'GET /get/1283/gnutti.deb HTTP/1.0'
PUT for uploading files (was GIV in v0.4) - e.g. 'PUT 1283:72814A49E69D0F43AAB400/gnutti.deb HTTP/1.0'
GIV for backwards compatibility only, see PUT - e.g. 'GIV 1283:72814A49E69D0F43AAB400/gnutti.deb HTTP/1.0'
CONNECT to initiate a connection handshake - e.g. 'CONNECT REQUEST GNUTELLA/0.7'
CONNECT to continue a started connection handshake - e.g. 'CONNECT HANDSHAKE GNUTELLA/0.7'
GNUTELLA for backwards compatibility only, see CONNECT - e.g. 'GNUTELLA CONNECT/0.4'
START to end the handshake and start the binary data stream - e.g. 'START BINARY GNUTELLA/0.7'
Notes: Do not continue an established handshake with CONNECT REQUEST (this will reset the headers and start over), use CONNECT HANDSHAKE. You should not add proprietary data to a Fast-Connect; it would be ignored by all other clients and wastes bandwidth. Instead take a look at the Full-Handshake example.
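Because every request starts with one of these method words, a single dispatcher can serve all incoming connections. A minimal sketch in C; the enum and function names are assumptions made for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for the request types listed in the summary. */
typedef enum {
    M_UNKNOWN, M_GET, M_PUT, M_GIV,
    M_CONNECT_REQUEST, M_CONNECT_HANDSHAKE,
    M_GNUTELLA_04, M_START_BINARY
} method_t;

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Classifies the first line of an incoming connection or request. */
method_t classify_request(const char *line)
{
    assert(line != NULL);
    if (starts_with(line, "GET "))               return M_GET;
    if (starts_with(line, "PUT "))               return M_PUT;
    if (starts_with(line, "GIV "))               return M_GIV;   /* v0.4 upload */
    if (starts_with(line, "CONNECT REQUEST "))   return M_CONNECT_REQUEST;
    if (starts_with(line, "CONNECT HANDSHAKE ")) return M_CONNECT_HANDSHAKE;
    if (starts_with(line, "GNUTELLA CONNECT/"))  return M_GNUTELLA_04;
    if (starts_with(line, "START BINARY "))      return M_START_BINARY;
    return M_UNKNOWN;
}
```

The caller would still have to check the protocol token at the end of the line (e.g. 'GNUTELLA/0.7' or 'HTTP/1.0'); the sketch only covers the method part.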
Some geek notes: Our Gnutella specification consists of HTTP 1.0 methods (GET, PUT) and GNUTELLA-specific methods (e.g. CONNECT, START); all are fully HTTP-like. Our Gnutella-specific methods allow multiple headers, so they actually look more like an HTTP 1.1 request. We assume 'Content-Length: 0' if the Content-Length header is not given in Gnutella-specific methods (Tamama *g*). This avoids adding 'Content-Length: 0' to every CONNECT|START request, since we have no content/payload here.
Response codes are typical HTTP-like responses including a status code - e.g. 'GNUTELLA/0.7 200 OK' [1].
Appendix B: New GUID-Tagging
We introduce a new GUID-Tagging style, first the documentation then additional explanations:
* Byte 16 is used to show the highest supported protocol version, pseudo code: guid[15]=0x07 in this protocol version (up to v25.5 is possible with 8 bits, every value below 4 is treated as v0.4).
* Byte 9 is used to signal whether the peer supports special features. It is bit-coded and low active (a 0 bit means the feature is present); 0xFF means no features (backwards compatibility with older clients). The features coded in those bits are important for routing/broadcasting messages:
Bit 0: Peer acts as superpeer (important for network structure)
Bit 1: Peer acts as proxy/tunneling peer (important for firewalled or NAT-routed hosts)
Bit 2: Peer does understand and request metadata
Bit 3: Peer does understand and request file hash
Bit 4: Peer does understand and use UNICODE (UTF-8 encoded)
Bit 5: Peer does have chat support (important for community idea)
Bit 6-7: reserved
The 2 highest bits are reserved/locked and should always be 1! Again backwards compatibility: very old clients had a 10XXXXXX setting, which we have to avoid. The default for current v0.4/v0.6 clients is 11111111, which is also our default = no features.
Why use (old-fashioned) GUID tagging instead of handshaking? It's not either/or: there is a new idea behind it and both work together. GUID features are routed along the network (they tell a far away host about status features), while handshake features can only be used with directly connected hosts. You can shake hands with directly connected hosts, but not with a peer that is hops=5 away.
Mainly the GUID feature bits are used to avoid unnecessarily routed/broadcasted messages, e.g. you only need to send metadata to a peer that understands metadata. When a peer just searches for alternative download locations (searching with a hash), the other peers should not answer with metadata, to save bandwidth. Secondarily, the GUID protocol version provides a faster connect. A host cache will receive a PING from any host connecting and will answer with collected PONGs. [2] Those PONGs will now carry the Gnutella protocol version of the servant inside the GUID. E.g. you can now directly connect to an old 0.4 servant with the old protocol, or only connect to superpeers. Connecting is faster.
Notes: The feature bits should be discussed before finishing this protocol version. To detect client version and features use the following pseudo code (this is NO 100% guarantee!):
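#include <stdint.h>
/* Feature masks, valid after inverting guid[8] (bits are low active on the wire). */
enum {
    FEAT_SUPERPEER   = 0x01, /* bit 0 */
    FEAT_PROXYTUNNEL = 0x02, /* bit 1 */
    FEAT_METADATA    = 0x04, /* bit 2 */
    FEAT_FILEHASH    = 0x08, /* bit 3 */
    FEAT_UNICODE     = 0x10, /* bit 4 */
    FEAT_CHAT        = 0x20  /* bit 5, bits 6-7 are reserved */
};
/* Heuristic only; guid points to the 16 GUID bytes of a message. */
void detect_peer(const uint8_t *guid, int *protocol, uint8_t *features)
{
    *protocol = 4; /* fallback to v0.4 */
    *features = 0; /* no features yet */
    if ((guid[8] & 0xC0) == 0x80) *protocol = 4; /* very old 10XXXXXX clients */
    else if (guid[15] >= 0x04) *protocol = guid[15];
    if (*protocol >= 7) *features = (uint8_t)((guid[8] ^ 0xFF) & 0x3F); /* invert, drop reserved bits */
}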
Appendix C: Character set and UNICODE expansion
While talking with some other developers, I found out that my idea was too short/confusing, here is a more detailed suggestion:
* First, use ISO_LATIN 1 as the basic charset in all HTTP headers and/or Gnutella strings (probably you already do this). This won't hurt anyone and will fit most European/American/Australian needs to transport national special characters (e.g. äåæ). But... and this is a big "but", this character set will not fit Asian or Russian needs, nor many other languages. Since Gnutella messages are routed through various servants, it is impossible to translate one character set into another; the solution is UNICODE.
* Second, use optional UNICODE in Gnutella strings (Query/Queryhits), if you really need it. Unfortunately Unicode is a two-byte code per character, and it contains zero bytes (e.g. 'a' becomes 'a', '\0'), which is a problem too. So Unicode must be encoded; we use UTF-8 (see later post for details).
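To illustrate why UTF-8 avoids the zero-byte problem, here is a sketch of the standard UTF-8 encoding of a single code point in C. The function name utf8_encode is my own; the proposal itself defers the details to a later post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encodes one Unicode code point as UTF-8 into out[] (up to 4 bytes).
 * Returns the number of bytes written, or 0 for an invalid code point.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                     /* ASCII stays a single byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                    /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Plain ASCII text is unchanged, every byte of a multi-byte sequence has its high bit set, and only the code point U+0000 itself produces a zero byte, so zero-terminated Query strings keep working.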
Why not use Unicode always? Oops, the binary data stream would not be backwards compatible with v0.4, and it would also blow up Query/Queryhit unnecessarily compared to ISO_LATIN 1 (1 byte vs multibyte characters, bigger size = bigger query traffic, eeks). Why not use a character set other than ISO_LATIN 1? Whatever you choose, it wouldn't fit a worldwide need. Do we need to negotiate UNICODE in the connect handshake? No, since UNICODE is routed in Query/Queryhits, it will travel the whole network mesh and is not affected by directly connected hosts. Instead we use the GUID tagging (see above).
Notes: The two-step solution above is my suggestion; perhaps it is better to skip Latin1 completely and use UTF-8 only (topic of further investigation). With the two-step solution all clients speak a common character set (ISO_LATIN 1), which is very easy to implement; UNICODE (UTF-8) is optional for clients which want to offer extended language support or do not use the ISO_LATIN 1 codepage on their system. Since UNICODE (UTF-8) is encoded inside the Query/Queryhit, every "basic" client will route it without any problem. Okay, an old client will not understand it, but that is no problem, e.g. a non-Asian client will usually have no Asian files.
I might post a new proposal for a "specialized horizons" header. This allows grouping similar interests and traffic together + keeping in touch worldwide. In this special context it will help to route e.g. Asian Unicode primarily between ppl offering Asian content.
Appendix D: Backwards/Upwards compatibility
Backwards compatibility with incoming protocol v0.4 is fully guaranteed. All incoming v0.4 clients should be answered with the simple v0.4 handshake. Outgoing connections should be started depending on the GUID tagging; however, if v0.7 doesn't succeed, try v0.4. About v0.6 backwards compatibility: since its handshake is incompatible with v0.7, I suggest making this decision client vendor dependent. If you wanna support v0.6 just do it, it's optional.
Here is a sample handshake, old client knocking at the door:
Code:
Client                                  Server                                  Comments
----------------------------------------------------------------------------------------
GNUTELLA CONNECT/0.4<lf><lf>                                                    <- old client
                                        GNUTELLA OK<lf><lf>                     <- ok, we speak old 0.4
[binary messages]                       [binary messages]
Upwards compatibility is also supported (as long as the handshake is compatible, hopefully it is).
Here is a sample handshake, newer client tries to connect:
Code:
Client                                  Server                                  Comments
----------------------------------------------------------------------------------------
CONNECT REQUEST GNUTELLA/0.8<cr><lf>                                            <- new version!?!
User-Agent: AoloA/1.8<cr><lf>
<cr><lf>
                                        GNUTELLA/0.7 501 Not Implemented<cr><lf>
                                        User-Agent: AoloA/1.7<cr><lf>
                                        Gnutella-Protocol: 0.7, 0.4<cr><lf>
                                        <cr><lf>                                <- Server doesn't disconnect
CONNECT REQUEST GNUTELLA/0.7<cr><lf>                                            <- Client starts over
User-Agent: AoloA/1.8<cr><lf>
<cr><lf>
                                        GNUTELLA/0.7 200 OK<cr><lf>
                                        User-Agent: AoloA/1.7<cr><lf>
                                        Your-IP: 194.246.250.222<cr><lf>
                                        <cr><lf>
[binary messages]                       [binary messages]
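On the client side, the decision between "start binary messages" and "start over" hangs on the status code of the response line. A minimal C sketch; parse_status_code and handshake_accepted are hypothetical helper names, and the 100..599 range check is my assumption based on the usual HTTP status code range.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Extracts the status code from a response line such as
 * 'GNUTELLA/0.7 501 Not Implemented'. Returns -1 if the line does not
 * look like a v0.7-style response (e.g. the v0.4 reply 'GNUTELLA OK').
 */
int parse_status_code(const char *line)
{
    int major, minor, code;

    assert(line != NULL);
    if (sscanf(line, "GNUTELLA/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    if (code < 100 || code > 599)
        return -1;                       /* outside the HTTP status range */
    return code;
}

/* Nonzero if the response accepts the request (status 200). */
int handshake_accepted(const char *line)
{
    return parse_status_code(line) == 200;
}
```

In the sample above the first response gives 501, so the client reads the Gnutella-Protocol header and repeats CONNECT REQUEST with 0.7; the second response gives 200 and the binary messages can start.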
Appendix E: Misc
More stuff for brainstorming and documentation....
Let's collect and write down everything that was introduced since v0.4 and is established fact in modern clients! I am thinking of blocking web browsers (with the Referer header), what else?
Let's think about a chat specification. Perhaps IRC based or IRC DCC for client-to-client communication, perhaps HTTP based.
Integrate the Bye-Descriptor [3] into the v0.7 protocol. Make this message fully HTTP-like too (pure text string, no byte codes) [1]. Perhaps we should also notify downloaders and send them a new HTTP BYE message?
Define the clients' re-search and retry behaviour. When a peer drops the connection, you want to resume/reconnect. Such a retry should be well defined to avoid "hammering" foreign IPs.
Most important points at the end, a big TODO: work out and define protocol features for superpeers, dynamic traffic routing, hashes, metadata and (friendly) anti-freeloading behaviour (we had various posts in this forum discussing those topics).
Ah, don't forget to vote for 'Superpeer'... not Ultra-Hyper-Giga-Peer. :-) Keep marketing and proprietary ideas out of Gnutella!
Comments, more ideas? Greets, Moak
PS: I might summarize all ideas into one paper later, we still need to work out this proposal. Feedback is highly welcome! Send me a PM (private message) or meet me on IRC. :-)
[1] HTTP/1.0 Status Codes, RFC 1945 -
http://www1.ics.uci.edu/pub/ietf/htt...l#Status-Codes
[2] Gnutella Host Caches -
http://www.gnutellaforums.com/showth...&threadid=5807
[3] Bye Descriptor -
http://groups.yahoo.com/group/the_gd...ls/BYE/Bye.txt (Yahoo account required)