General Gnutella Development Discussion For general discussion about Gnutella development. |
Quote:
Broadcasts NEVER cross subnets. Multicasts don't cross subnets if you don't have a multicast router (which is common, I agree) or if you set your TTL to 1. The most common configuration today is a switched network that sniffs the IGMP membership messages and only floods multicasts to interested hosts, thus reducing any potential negative impact on the network. Sure, multicasts can't span subnets on most networks today, but if you would bother reading what I frigging wrote you'd see that I was saying they are simply smarter than broadcasts, even in the TTL=1 case. Even in the oldest and crappiest network, the multicast traffic will still reach all the hosts it has to (in such a network it is functionally the same as a broadcast).

The IETF generally considers the use of broadcast to be deprecated for any new protocol (look at OSPF: broadcast would work fine, but why send the packets to uninterested parties?).

Finally, most affordable switches don't give you the tools you need to block those broadcasts; you simply can't filter out all broadcasts, as you need ARP on most networks.

If you want to be ignorant, that's fine, but please don't influence protocol design in areas you obviously don't understand.
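For readers who have not used multicast, the TTL=1 case mentioned above can be sketched roughly as follows. This is only an illustration, not part of any Gnutella proposal: the group address, the port, and both function names are made up for the example.

```python
import socket
import struct

# Hypothetical values, chosen only for this illustration.
GROUP = "239.255.42.99"   # an address from the administratively scoped range
PORT = 6347

def make_multicast_sender(ttl: int = 1) -> socket.socket:
    """Return a UDP socket whose outgoing multicast datagrams carry `ttl`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # With TTL=1 the datagram stays on the local subnet: a multicast router
    # would decrement the TTL to 0 and drop the packet instead of forwarding it.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("b", ttl))
    return sock

def announce(sock: socket.socket, payload: bytes) -> int:
    """Send one datagram to the (hypothetical) group; returns bytes sent."""
    return sock.sendto(payload, (GROUP, PORT))
```

Only hosts that joined the group receive the datagram on an IGMP-aware switch, which is the difference from a broadcast that the post is arguing about.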
UTF, UNICODE technical (long)

As to why the new protocol should extend to UNICODE, and why this should be implemented using UTF-8:

UNICODE aspires to define all characters of all languages. Right now, an address space of 2 bytes (about 64,000 characters) has been defined to cover most languages. This is being extended to 4 bytes, but let's keep it at 2 bytes for now.

UTF (more correctly UTF-8) and UCS are both ways to express the 2-byte number of a character (I skip 4-byte UNICODE here). UCS simply is the number in 2 bytes, so it may contain null bytes. Normally, when talking about UNICODE, the UCS-2 (= 2 bytes) method of expressing UNICODE is what is being referred to.

UTF-8 uses 1, 2 or 3 bytes to express the 2-byte number of a UNICODE character. Null bytes do not occur (apart from the NUL character itself). This works as follows:

<table border=1 cols=4>
<tr><td>UNICODE character number range (in hex)</td><td>UTF byte 1 (in binary)</td><td>UTF byte 2</td><td>UTF byte 3</td></tr>
<tr><td>0000 - 007f</td><td>0xxxxxxx</td><td>(none)</td><td>(none)</td></tr>
<tr><td>0080 - 07ff</td><td>110xxxxx</td><td>10xxxxxx</td><td>(none)</td></tr>
<tr><td>0800 - ffff</td><td>1110xxxx</td><td>10xxxxxx</td><td>10xxxxxx</td></tr>
</table>

<i><font color=red>UTF can also have 4 bytes, and using the same scheme express a character number up to U+10ffff. That won't be relevant right now, but may be in future. Provisions should be taken for upward compatibility with possible 4-byte UTF code sequences.</font></i>

The first byte of a multi-byte UTF sequence gives the sequence length as the number of 1-bits before its first 0-bit; the following 1 or 2 bytes are easily recognizable as belonging to a UTF sequence by their 2 highest bits (10), which gives them a value between 80 and BF. The bits marked 'x' here hold the number of the character in the UNICODE table.

Thus, a UTF character of 1 byte length has exactly the same number as the corresponding ASCII character. However, a Latin-1 special character is also a single byte with a value beyond 7f, so it is not possible to say whether a single such byte is a Latin-1 character or part of a UTF sequence.

In conclusion, extending the encoding scheme of the protocol from ASCII to UTF would leave current clients still working, as null bytes do not occur. Old clients would of course treat each byte of a UTF sequence as a separate character, leading to funny names in the search results. But you get that even now, and searches containing e.g. German special characters do not really work right now: these characters will normally just be ignored. Moving to UCS might make some old clients fail, as a character might contain a null byte.

Compared to UCS, UTF for a single character takes either less space (for ASCII text), exactly the same space (for the special European characters and any characters up to UNICODE 07ff, for example Russian), or 1 byte more (most notably for Asian languages). As the bulk of the traffic will very probably remain ASCII for a long time to come, the increase in load from using UTF should be tolerable. You gain a worldwide audience, and you stay compatible with the current standard. Keep in mind that Latin-1 right now neither is standard nor works well. Lastly, if at some point in the future you want an extension to cover UNICODE characters up to U+10ffff, UTF-8 can still be used.

Please have a look at the <a href=http://www.unicode.org/>UNICODE Consortium</a>. Demonstration pages for UNICODE (always UTF encoded) can be found anywhere on the web. One such is <a href=http://www.geocities.com/Tokyo/Pagoda/1675/unicode-page.html>here</a>.

If you go for Latin-1, then you need a mechanism to identify a message as Latin-1 or as UNICODE. If you use UCS, then you probably cannot maintain downward compatibility. You will also get new problems when, at some point in the future, characters up to U+10ffff should be supported.
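To make the bit layout in the table concrete, here is a small hand-written encoder, checked against Python's built-in codec. It is a sketch for illustration only; `utf8_encode` and `sizes` are names invented for this example, and the 4-byte branch is the "same scheme" extension mentioned in the note above.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one UNICODE character number by hand (surrogates are not checked)."""
    if code_point < 0:
        raise ValueError("negative character number")
    if code_point <= 0x7F:                      # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:                  # 11110xxx + three 10xxxxxx bytes
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("beyond U+10FFFF")

def sizes(text: str) -> tuple:
    """Return (UTF-8 length, UCS-2 length) in bytes for BMP-only text."""
    # For characters up to U+FFFF, UTF-16 big-endian is byte-identical to UCS-2.
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))

# 'a', a-umlaut, Cyrillic 'Ya', and a CJK character: 1, 2, 2 and 3 bytes in UTF-8.
for ch in ("a", "\u00e4", "\u042f", "\u65e5"):
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", utf8_encode(ord(ch)).hex(), sizes(ch))
```

The same file name therefore costs less, the same, or one byte more per character than UCS-2, depending on the script, and the UTF-8 form of ordinary text never contains a null byte, whereas the UCS-2 form of plain ASCII always does.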
Re: UTF, UNICODE technical (long) Quote:
Re: Re: UTF, UNICODE technical (long) Quote:
Yes. To my knowledge, this would make a comprehensive basis for internationalization. With a view to the future, it's best to provide for UTF-8 up to 4 bytes, even though currently only up to 3 will be used. Once the protocol has been defined that way, it's just up to the clients to fill it with life. Hopefully the cutting edge of clients will support entry and display of characters not in the system default codepage... from past experience, I'm worried about this.
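On the receiving side, providing for 4-byte sequences mostly means not hard-coding a maximum of 3 when reading the length from the first byte. A rough sketch of what a client could do, with function names invented for this example:

```python
from typing import List

def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes the sequence starting with `lead` occupies."""
    if lead < 0x80:             # 0xxxxxxx: plain ASCII
        return 1
    if 0xC0 <= lead < 0xE0:     # 110xxxxx
        return 2
    if 0xE0 <= lead < 0xF0:     # 1110xxxx
        return 3
    if 0xF0 <= lead < 0xF8:     # 11110xxx: the 4-byte case to provide for
        return 4
    # 10xxxxxx is a continuation byte, 11111xxx is not used at all.
    raise ValueError(f"byte {lead:#04x} cannot start a UTF-8 sequence")

def split_utf8(data: bytes) -> List[bytes]:
    """Split a byte string into per-character sequences (structural check only)."""
    sequences = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        seq = data[i:i + n]
        # Every byte after the first must look like 10xxxxxx.
        if len(seq) != n or any(b & 0xC0 != 0x80 for b in seq[1:]):
            raise ValueError("truncated or malformed sequence")
        sequences.append(seq)
        i += n
    return sequences
```

This only checks the structure of each sequence; it does not reject overlong encodings or character numbers above U+10ffff, which a real client would also want to do before displaying a file name.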