(Version 0.999b, Author: Mark "Moak" Seuffert, Editing: TruStarWarrior)
Programming your own Gnutella servent, after reading the protocol specification and tons of documentation [1], usually begins with client/server programming, which quickly raises the question, "What is a good network concept?". I have collected ideas about Gnutella network design and will describe a so-called 'asynchronous server' design here. Before I begin, let me say that this article is intended for advanced programmers. Please feel free to ask questions, make comments or help improve this article!
Ideas and goals for a Gnutella network core, in no specific order:
- flexible socket handling (listening socket, incoming client socket and outgoing socket)
- buffered data stream (line buffered, block buffered, file streaming and output buffering)
- low memory and CPU consumption (superpeer suitable)
- handle high load and high number of connections
- SOCKS proxy ability
- portability for platforms supporting Berkeley sockets (Unix, Windows, etc.)
- reusability of code for other Internet projects (base class is not Gnutella specific)
- highly object oriented (OO)
Basic thoughts
An asynchronous server will notify you when something interesting happens, e.g. a new incoming connection, data that can be sent, or data that has been received on a socket. So you do not have to poll for events and do not have to start a new thread for every incoming client connection. Instead, the server waits for changes and then notifies you within a single thread. For example, the server calls an OnReceive() function whenever new data arrives or OnSend() when more data can be sent.
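To make the callback idea concrete, here is a minimal sketch of such an event interface (hypothetical names, modelled after CAsyncSocket; a select() loop in the background dispatches these events for all sockets):

// Minimal sketch of an asynchronous socket interface (hypothetical names).
class CAsyncSocketBase
{
public:
    virtual ~CAsyncSocketBase() {}
    virtual void OnAccept()  {}   // listening socket: new incoming connection
    virtual void OnConnect() {}   // outgoing socket: connection established
    virtual void OnReceive() {}   // new data arrived, ready to be read
    virtual void OnSend()    {}   // socket is ready to accept more output data
    virtual void OnClose()   {}   // peer closed the connection or error occurred
};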
I chose an asynchronous server because it is very flexible, fast and free of thread overhead, strong in handling a high number of connections (perfect for superpeers), and easy to synchronize (everything runs within one thread, so few semaphores are needed). If you are not familiar with socket concepts, these are the possible alternatives:
a) blocking server (handles only one request at a time, then blocks again while waiting)
b) threaded server (handles simultaneous requests, one thread for every new socket)
c) asynchronous server (handles simultaneous requests, one server thread handles all sockets)
The server model always depends on the application requirements. For example, the well-known Apache webserver is a pre-threaded server, which is a variation of the threaded server (minimized thread overhead, but more complicated). Apache's concept is great for handling thousands (!) of small requests with multiple CPUs. Gnutella usually has longer, persistent connections and runs on standard PCs, so I recommend an asynchronous server model here.
Well, if you're interested in a Windows-only Gnutella servent (eeks, not my idea) you can use Microsoft's CAsyncSocket class [2] and derive a class for further improvements. I prefer writing my own class, which will basically be compatible with CAsyncSocket, whose design I really like after having used it in past projects. Advantages: your own class will be fully portable to all platforms supporting Berkeley sockets (your network core will run on Linux, Windows, Mac, etc.)... and then you are "forced" to understand what's going on, grin. However, if you don't want to invent your own socket class, then use existing code, e.g. Trolltech's Qt library [5] (Qtella is a nice example servent [6]) or, as already mentioned, Microsoft's CAsyncSocket.
Perhaps you want to spend some more time studying the basics if you have never programmed Internet applications [3] [4]. In any case, please take a short look at Microsoft's MFC class CAsyncSocket. It gives you a taste of what to do in your own code design, and it's very well documented. Alternatively, check the Mutella source code [7], see the file asyncsocket.cpp. Mutella is a sweet Unix console client. Unfortunately, it needs more code documentation, but according to Mutella's author, Max Zaitsev, specific questions are always appreciated.
When you're familiar with object-oriented programming and 'design patterns', have a look at 'Reactor'; it describes the asynchronous server model, or event-driven multiplexing, that we use here [13].
Talking Gnutella
Now let's have a look into the Gnutella protocol [8]. There are two different kinds of data streams: 1) line buffered in connection handshaking or HTTP file requests and 2) block buffered in binary Gnutella messages. Let me anticipate that there is also a third type, 3) file streaming in file transfers. You might say, "We can read a whole file into a memory block, so it's block buffered too...". But files can be huge, and reading a multi-megabyte file into main memory isn't a good idea. Also, reading blocks of a file is unnecessary (without a better reason), since all file streams are already buffered by your OS or low-level library. So we come to a third kind of data stream, file streaming... Actually, it is just read/write (with memory mapped I/O) when we come to implementation.
Okay, let's take a closer look, since a brief understanding of what happens in a Gnutella servent is important before stepping further. We take the Gnutella v0.4 protocol for simplicity (the upcoming v0.6 protocol would make no difference here) and trace a client-server connection: a client connects to the server's listening port, the server recognizes this and establishes a new client socket for further communication. Now the handshaking begins [9].
The client sends:
GNUTELLA CONNECT/0.4<lf><lf>
The server responds with:
GNUTELLA OK<lf><lf>
All lines are terminated with a double line feed.
An HTTP file request is similar. When simplified, it looks like this:
GET /get/1283/gnutti.deb HTTP/1.0<cr><lf>
We call this type of data stream line buffered. A possible implementation: collect every piece of incoming data until you find a <lf> and then return the full input line. Your server reads and evaluates the stream line by line.
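A minimal sketch of such a line buffer (hypothetical class name, errors ignored):

#include <string>

// Collects raw bytes and hands out complete lines (hypothetical helper class).
class CLineBuffer
{
    std::string m_buffer;              // bytes received so far
public:
    void Append(const char* data, int len) { m_buffer.append(data, len); }

    // Returns true and fills 'line' once a complete <lf>-terminated line is available.
    bool GetLine(std::string& line)
    {
        std::string::size_type pos = m_buffer.find('\n');
        if (pos == std::string::npos)
            return false;              // no complete line yet, keep collecting
        line = m_buffer.substr(0, pos);
        if (!line.empty() && line[line.size()-1] == '\r')
            line.erase(line.size()-1); // strip an optional <cr>
        m_buffer.erase(0, pos + 1);
        return true;
    }
};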
After this handshaking, the connection is established and stays open permanently (as long as neither side shuts down). Client and server continue by sending binary Gnutella messages. Now, the interesting part begins! A Gnutella message is called a 'descriptor' and consists of a header plus the payload. The binary data stream looks like this:
(header)(payload ....)(header)(payload ....) etc...
The header has a fixed length of 23 bytes; the payload is between 0 and 64K bytes. There are no separators in the Gnutella data stream, so you have to read the payload length from the header. We call this type of data stream block buffered. A possible implementation: read 23 bytes into a buffer for the header. Allocate a new buffer with the payload length read from the header and read the payload into this buffer. Your server reads and evaluates the stream data block by data block, and then answers with its own messages (e.g. 'QueryHits' and 'Pongs') or just forwards them to other servents. So far, those are the basic network functions. Are you ready to go a step further?
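A minimal sketch of that header handling (hypothetical names): the payload length sits in bytes 19-22 of the 23-byte header, stored in little-endian byte order [8].

const int HEADER_SIZE = 23;              // fixed Gnutella header length
const unsigned int MAX_PAYLOAD = 65536;  // sanity limit, drop anything bigger

// Bytes 19..22 of the header hold the payload length, little-endian [8].
unsigned int PayloadLength(const unsigned char* header)
{
    return  (unsigned int)header[19]
          | ((unsigned int)header[20] << 8)
          | ((unsigned int)header[21] << 16)
          | ((unsigned int)header[22] << 24);
}

// Per-socket reading state (sketch): collect HEADER_SIZE bytes into
// 'header', call PayloadLength(), allocate 'payload' with new[], collect
// that many bytes, evaluate the descriptor, release both, start over.
struct MessageReader
{
    unsigned char  header[HEADER_SIZE];
    unsigned int   headerBytes;      // how much of the header we have so far
    unsigned char* payload;          // allocated once the header is complete
    unsigned int   payloadBytes;     // how much of the payload we have so far
};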
Buffering traffic
As we already know, we have a buffered input, which is either line-by-line or block-by-block. What about the output traffic?
You might think we send the output data as soon as we have it, so just send & forget, right? Actually, this wouldn't work! Gnutella uses TCP/IP for the transportation of packets across the Internet. TCP is a stream protocol. This means that if you send 100 bytes, the receiving end could receive all 100 bytes at once, or four 25-byte chunks, or whatever. In any case, the receiving peer has to acknowledge that it's ready to receive more bytes. The other peer might be connected via a slower connection (e.g. a modem user) and can't receive packets at the same speed as you send them out. Your computer makes sure it does not send out more bytes than the other peer can receive and will therefore stop sending more data. The solution for this problem is buffered output.
A possible implementation is simple: we write every output into a big output buffer (one for every socket) and return immediately. In the background, all buffered data is transmitted whenever a socket is ready for more data.
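A minimal sketch (hypothetical class name): Write() only appends to the buffer and OnSend() flushes as much as the socket currently accepts.

#include <string>
#include <sys/socket.h>

// Per-socket output buffer (sketch): Write() never blocks; the data is
// flushed from OnSend() whenever the socket can take more bytes.
class CBufferedOutput
{
    int         m_socket;
    std::string m_outBuffer;             // pending outgoing bytes
public:
    CBufferedOutput(int s) : m_socket(s) {}

    void Write(const char* data, int len)
    {
        m_outBuffer.append(data, len);   // store and return immediately
    }

    // Called by the asynchronous server when the socket is writable.
    void OnSend()
    {
        while (!m_outBuffer.empty())
        {
            int sent = (int)send(m_socket, m_outBuffer.data(), m_outBuffer.size(), 0);
            if (sent <= 0)
                break;                   // would block or error, try again later
            m_outBuffer.erase(0, sent);
        }
    }
};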
The application output buffer in common clients is about 64K per socket, at least big enough to buffer a few messages. What happens if the other peer is too slow to receive the buffered Gnutella data? This shouldn't happen for a long period of time, but it could happen. In this case you need to drop single descriptors (meaning you don't copy them into the output buffer). If the connection is very slow or the other peer doesn't accept any bytes for a long time, you should drop the connection! Also, don't forget to catch TCP error messages; if an error occurs (e.g. the peer disappears), drop the connection.
Again, let's take a closer look at what's going on inside a Gnutella servent. Let's assume the connection is already established and a new message arrives, e.g. a Gnutella 'Query' descriptor:
- new data arrives
- header is stored in a buffer: calling the new() operator with a fixed length of 23 bytes
- payload is stored in a buffer: calling the new() operator with a variable length
- message is evaluated: TTL decreased, hops increased, 'QueryHits' generated
- message is forwarded: calling memcpy() to copy the message into every output buffer of connected servents
- temporary buffers are freed: calling the delete() operator twice
- start from the top
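Put into code, the evaluation and forwarding steps might look like this (a sketch with hypothetical names; CConnection stands for a connected servent with the buffered Write() from above):

#include <vector>

// Hypothetical connection object, one per connected servent; Write()
// appends to that socket's output buffer (see the sketch above).
struct CConnection
{
    void Write(const char* data, unsigned int len) { /* buffered output */ }
};

// Evaluate the routing fields and forward one descriptor (sketch).
void ForwardMessage(std::vector<CConnection*>& neighbours, CConnection* origin,
                    unsigned char* header, unsigned char* payload,
                    unsigned int payloadLen)
{
    if (header[17] <= 1)                  // TTL would reach zero:
        return;                           // evaluate locally, don't forward
    header[17] -= 1;                      // decrease TTL  (byte 17)
    header[18] += 1;                      // increase hops (byte 18)

    for (unsigned int i = 0; i < neighbours.size(); ++i)
    {
        if (neighbours[i] == origin)
            continue;                     // never send back to the sender
        neighbours[i]->Write((const char*)header, 23);          // the memcpy()
        neighbours[i]->Write((const char*)payload, payloadLen); // happens here
    }
}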
Another kind of traffic is the file transfer. It's different from the binary message traffic because it doesn't need an output buffer for intermediate storage; we just send a local file. Let's have a look. A connection is already established and a file is requested:
- new data arrives
- request is stored in a buffer (line buffer): using CString or an equivalent string container class
- request is evaluated: parse the HTTP request and search the database of locally shared files
- send the file (direct file streaming with memory mapped I/O) or send an error message
- close socket
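A minimal sketch of the file streaming on a Unix-like system with memory mapped I/O (hypothetical function name, error handling reduced; a real servent would drive the send loop from OnSend() instead of blocking):

#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <unistd.h>

// Stream a local file to a socket via memory mapped I/O (sketch).
bool SendFileMapped(int socket, const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
    {
        close(fd);
        return false;
    }

    // Map the whole file; the OS pages it in as we send it, so a
    // multi-megabyte file never has to be read into its own buffer.
    void* data = mmap(0, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                            // the mapping survives the close
    if (data == MAP_FAILED)
        return false;

    const char* p = (const char*)data;
    off_t remaining = st.st_size;
    while (remaining > 0)                 // blocking loop for brevity only
    {
        ssize_t sent = send(socket, p, remaining, 0);
        if (sent <= 0)
            break;                        // error or peer disappeared
        p += sent;
        remaining -= sent;
    }
    munmap(data, st.st_size);
    return remaining == 0;
}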
Okay, so far we have our basic network design. You could already write a small Gnutella servent. Congratulations! Let's improve our network design and search for possible weak spots...
Advanced thoughts: Reference buffering
Did you notice that nearly all of the Gnutella message traffic is copied between buffers (memcpy)? Every routed message is copied from an input buffer into one or more output buffers. This can be a lot of work (higher CPU consumption), especially for a superpeer or Gnutella proxy with constant traffic.
Is there a way around calling memcpy() repeatedly? Yes, but it's not simple and I'm not sure if it's worth the extra code... But nonetheless, here it is: instead of copying every routed message into an output buffer, we keep each input buffer and just add a pointer to an output queue. The previously used output buffer (see above) is replaced with an output queue that collects pointers to buffers that have to be sent. Since a message could be sent to multiple queues (one for every connected servent), we keep a reference count of how often the input buffer still has to be sent. When the reference count reaches zero, the buffer is released.
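A minimal sketch of such a reference counted buffer and output queue (hypothetical names; note that the single server thread makes the counter safe without locks):

#include <deque>

// A buffer that knows how many output queues still hold a pointer to it.
struct CRefBuffer
{
    unsigned char* data;
    unsigned int   length;
    int            refCount;        // number of queues this buffer is still in

    void Release()
    {
        if (--refCount == 0)        // the last queue has sent the buffer
        {
            delete [] data;
            delete this;
        }
    }
};

// Per-socket output queue: holds pointers instead of copies.
typedef std::deque<CRefBuffer*> COutputQueue;

// Route one message to several servents without memcpy() (sketch).
void RouteMessage(CRefBuffer* msg, COutputQueue* queues[], int count)
{
    msg->refCount = count;          // one reference per destination
    for (int i = 0; i < count; ++i)
        queues[i]->push_back(msg);  // just store the pointer
    // Each socket calls msg->Release() after it has sent the buffer.
}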
I think you can imagine the extra code we have to add to our socket class. But with this 'reference buffering', we avoid repeated calls of memcpy() for routing messages and gain lower CPU consumption! I would be glad to hear any feedback!
There is another weak spot in the message routing: we repeatedly allocate memory and release it. Since Gnutella messages have different sizes, we constantly allocate memory blocks of different sizes. This can cause heap fragmentation. I expect a normal servent will not care about it, but consider a superpeer that runs for days. It might be worth improving the memory management again.
As a solution, I suggest adding a 'buffer memory management' that allocates memory blocks of fixed size. If the user requests a buffer bigger than the defined block size, the memory management allocates multiple blocks and links them together in a list. The memory blocks should be pre-allocated in a dynamic memory pool to avoid calling new() and delete() unnecessarily. This pool allocates memory blocks on startup; more are allocated when required. Also, we need different sizes in our memory pool: at least 23-byte blocks for headers and bigger blocks (e.g. 1 KB chunks) for the payload. You may want to add another granularity (different buffer block sizes for efficient storage).
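A minimal sketch of such a pool for one block size (hypothetical names; the linking of multiple blocks for oversized buffers is left out):

#include <vector>
#include <cstddef>

// Pool of fixed-size memory blocks: blocks are pre-allocated and recycled
// instead of calling new/delete for every Gnutella message (sketch).
class CBlockPool
{
    std::size_t        m_blockSize;
    std::vector<char*> m_freeBlocks;            // recycled, ready for reuse
public:
    CBlockPool(std::size_t blockSize, int preallocate)
        : m_blockSize(blockSize)
    {
        for (int i = 0; i < preallocate; ++i)
            m_freeBlocks.push_back(new char[blockSize]);
    }

    char* Allocate()
    {
        if (m_freeBlocks.empty())               // pool exhausted:
            return new char[m_blockSize];       // grow on demand
        char* block = m_freeBlocks.back();
        m_freeBlocks.pop_back();
        return block;
    }

    void Free(char* block)
    {
        m_freeBlocks.push_back(block);          // recycle, don't release
    }
};

// Two pools as suggested above: 23 byte header blocks, 1 KB payload chunks.
// CBlockPool headerPool(23, 256);
// CBlockPool payloadPool(1024, 256);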
Once again, this requires some extra code for our socket class (split it up into its own buffer management class). But we avoid repeated calls of new() + delete() and prevent heap fragmentation. Again, I would be glad to hear any feedback!
Advanced thoughts: Packets and MTU
Newcomers to network programming almost always run into the same problem: expecting data packets to be transferred and received in the same way they were sent out. Let me say this: packets are illusions!
When we talked about 'buffering traffic', we already mentioned the streaming nature of TCP/IP. When sending 100 bytes to another peer, do not expect them to be received by the other peer all at once. They might be split up into smaller chunks, but not necessarily. So, for example, when you need to receive 2000 bytes, you must collect byte chunks until you've got 2000 bytes. Always collect bytes from a TCP socket, no matter how few bytes you expect!
Do you remember the OnReceive() function from the beginning? An asynchronous server will notify you when (more) bytes have been received and can be read from a socket. The job of a buffered socket class is to collect those bytes until the requested block size is reached (for a block buffered data stream) or until a delimiter is received, e.g. <lf> (for a line buffered data stream).
Some socket classes do not provide buffered streams out of the box, e.g. the previously mentioned CAsyncSocket. Such a low-level class is perfect for further modifications since it's only a raw interface to the sockets. I suggest that you write a class named CAsyncSocketBuffered, which would provide a flexible input and output buffering system for every kind of Internet project.
So far, we've talked about packets in TCP. Now, let's talk about how to use them effectively! Yeah, let's do some hardcore stuff!
Basically, when you send out data on a TCP socket with a call of send(), it will be sent out immediately in a TCP packet. Experienced users say, "Oh, I build on TCP's Nagle algorithm" [4] [10]: the data will be buffered to build a full frame (Nagle algorithm) and be sent as fast as possible (avoiding delayed ACK roundtrips). As a result, the line is used with full efficiency... So far, this is merely theory!
This isn't true when sending huge amounts of data or when the other peer behaves rudely, e.g. calling receive() with a small number of bytes. In both cases, the TCP/IP stack can't buffer your data streams optimally anymore. TCP is very sensitive to the sizes of send and receive buffers. In Gnutella, we sometimes want to fire up the line and need to reach maximum transfer speed.
Well, the ideal case in networking is that each program always sends a full frame of data with each call to send(). This maximizes the percentage of useful data in a packet and minimizes data stream delays. The worst case is when small packets are sent, e.g. the transmission of single-character messages, which results in 41-byte packets (one byte of data, 40 bytes of TCP/IP header).
Now, what is a full frame size? It's the MTU (Maximum Transmission Unit) of your network interface. The MTU is the largest packet that a given network medium can carry. Ethernet, for example, has an MTU of about 1500 bytes. The actual size always depends on your OS and transport interface. However, when a frame is translated from one type of network to another by a gateway or a router, it can be fragmented if the destination network type has a smaller MTU. For gaining maximum speed, we don't care about fragmentation, since your router or ISP will handle it for us; we just try our best!
When we speak about sending full frames, we mean the exact available size. Query your interface MTU and subtract 40 bytes for the TCP/IP header; you get the so-called MSS, the "Maximum Segment Size" (for TCP it's MTU - 40 bytes).
Now, what about sending data as fast as possible? From what we have discussed until now, we send byte chunks of MSS size and so achieve sending full frames. Pretty good, but there is still a small problem left. TCP does some synchronization between frames: a peer sends a so-called "ACK" (acknowledgement) to signal that it received the last packet. Peer A sends a packet, peer B receives it, B sends an ACK to A. To avoid long ACK roundtrips (delays between frames), we don't wait for ACKs and instead feed the TCP/IP stack with more data. A size of 3*MSS promises the best performance and works perfectly together with Nagle (details are discussed by Comer and Lin [12]).
A possible implementation: First, query your interface MTU and calculate the MSS (MSS = MTU - 40). Set your low-level TCP/IP send buffer at least 3 times as large as the MSS (call setsockopt(s, SOL_SOCKET, SO_SNDBUF, ...) before bind/listen). Second, read full MSS chunks with a call of receive() and write full 3*MSS chunks with a call of send() whenever possible. Both points are important!
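Put into code, the setup might look like this (a sketch; the MTU value is queried or preset as discussed below):

#include <sys/socket.h>

// Size the TCP send buffer for full frames (sketch).
bool SetupSendBuffer(int s, int mtu)
{
    int mss = mtu - 40;                   // MSS = MTU minus TCP/IP headers
    int sndbuf = 3 * mss;                 // 3*MSS keeps the line busy [12]
    return setsockopt(s, SOL_SOCKET, SO_SNDBUF,
                      (const char*)&sndbuf, sizeof(sndbuf)) == 0;
}

// Call before bind()/listen(); afterwards read in MSS sized chunks and
// write in 3*MSS sized chunks whenever possible.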
All application buffers (those that store line buffered or block buffered traffic, or Gnutella messages) should be allocated in an MSS-oriented size; here is a formula: size = n * (MTU - 40). If you have problems querying the correct MTU size ('ifconfig' on Unix, strange registry settings on Windows), then choose an INI-file solution. Finding the correct MTU is indeed difficult, because you also have to know the corresponding device of the local Internet connection... usually this is the default route ('route' on Unix/Windows), but you can't count on that. An INI-file presetting of MTU = 1500 could be an acceptable solution.
What about our socket class with 'reference buffering' (see the paragraph above)? Here we have small blocks, 23 bytes for headers, and bigger blocks for the payload. Again, send full frames whenever possible and choose an MSS-oriented block size for the payload chunks. Quite easy. You might think about concatenating smaller packets, but I recommend that you send them directly since this occurs seldom. Typical Gnutella traffic consists of:
- small messages ('Ping' 23 bytes, 'Pong' 37 bytes), a low percentage in the future
- big messages ('Query' and 'QueryHit', with more metadata and grouped queries > MSS)
- file transfer and upcoming file tunneling (many MSS chunks)
Finally, we have gained high TCP performance with a high data flow, low CPU and low memory consumption. Feel free to improve your code and play around with alternatives! Great sources for advanced TCP/IP programming are the books "Unix Network Programming" by Stevens [14] and "Effective TCP/IP Programming" by Snader [15]. Highly recommended on every network programmer's desk.
Advanced thoughts: Firewalls
Firewalls are becoming more common; they provide higher security against possible attacks and block unwanted connections. From the view of a socket class, there are only three possibilities:
a) traffic goes through
b) traffic is blocked
c) traffic is blocked but a proxy is available
We ignore the first two cases since we can't do anything about them. We need to handle the third case, which occurs when a proxy is available and we can't connect directly. The function of a proxy is to take traffic or requests and route them to the destination. An additional authorization handshake might be required to use a proxy. Common proxy protocols are SOCKS4 and SOCKS5 [11].
Adding proxy functionality to your socket class isn't very complicated. Unfortunately, we don't have the time to describe the details here. Simplified, you would route traffic via the proxy and add a transparent layer for the authorization handshaking. Transparent means you don't have to change your existing code, only your socket class. Various sample source code and libraries are available on the Internet; ask your favorite search engine.
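To give you an idea, here is a minimal sketch of the SOCKS4 CONNECT request [11] (hypothetical function name, no user authorization, blocking I/O for brevity):

#include <sys/socket.h>
#include <arpa/inet.h>
#include <cstring>

// Ask a SOCKS4 proxy to connect us to destIp/destPort (sketch) [11].
// 's' is a socket already connected to the proxy.
bool Socks4Connect(int s, unsigned int destIp, unsigned short destPort)
{
    unsigned char request[9];
    request[0] = 4;                        // SOCKS protocol version 4
    request[1] = 1;                        // command: CONNECT
    unsigned short port = htons(destPort);
    std::memcpy(request + 2, &port, 2);    // destination port
    unsigned int ip = htonl(destIp);
    std::memcpy(request + 4, &ip, 4);      // destination IPv4 address
    request[8] = 0;                        // empty user id, null terminated

    if (send(s, (const char*)request, 9, 0) != 9)
        return false;

    unsigned char reply[8];                // the reply is always 8 bytes
    if (recv(s, (char*)reply, 8, MSG_WAITALL) != 8)
        return false;
    return reply[1] == 90;                 // 90 means "request granted"
}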
From a more general point of view, firewalls are much more complex. Consider Gnutella's PUSH descriptor when it comes to sharing files with "firewalled" peers (here, firewalled means that the peer can't accept incoming connections). Well, this is a subject for an entirely separate article.
I hope you had fun thinking about the insides of Gnutella! The idea of this article was to encourage new developers and help them code their own Gnutella client or understand existing ones. Before you ask, I will say this: sorry, I don't have a finished socket class. I will let you know as soon as I find the time to write one. If you're interested in cooperating, let me know. Your feedback is appreciated!
[1] Programming a Gnutella Client -
http://www.gnutellaforums.com/showth...&threadid=4638
[2] MSDN Class CAsyncSocket -
http://msdn.microsoft.com/library/en...syncsocket.asp
[3] TCP Stream Socket Application -
http://msdn.microsoft.com/library/en...winsock_14.asp
[4] Winsock Programmer's FAQ -
http://tangentsoft.net/wskfaq/index.html
[5] Trolltech QT Network Class -
http://doc.trolltech.com/3.0/network.html
[6] Qtella Source Code -
http://www.gnutelladev.com/source/qtella.html
[7] Mutella Source Code -
http://www.gnutelladev.com/source/mutella.html
[8] Gnutella v0.4 Protocol -
http://rfc-gnutella.sf.net/Developme...0_4-rev1_2.pdf
[9] Gnutella Handshaking Overview -
http://www.gnucleus.com/research/connect.html
[10] Congestion Control in IP/TCP -
http://www.faqs.org/rfcs/rfc896.html
[11] SOCKS Proxy -
http://www.socks.nec.com/socksprot.html
[12] TCP Buffering and Performance, Comer and Lin -
http://citeseer.nj.nec.com/comer95tcp.html
[13] Design Pattern 'Reactor' -
http://www.cs.wustl.edu/~schmidt/patterns-ace.html
[14] Unix Network Programming, Volume 1, Second Edition, Stevens, ISBN 0-13-490012-X
[15] Effective TCP/IP Programming, Snader, ISBN 0-201-61589-4
Thx to TruStar, Tama, Tremix, Maksik, Morgwen