Gnutella Forums - View Single Post - Hallo! We want being able to point to one more question

verdyp · #18 · December 12th, 2004

How can we avoid being technical in such discussions?
When the 3-character limit was added, it was because of the huge and unnecessary traffic caused by searches that match too many results, such as "the", "mp3", or even "*".
LimeWire has integrated filters that force its users to be more selective, so that requests matching too many results are not routed at random through a large part of the network, where they would completely fill the available bandwidth.

The problem with those requests is not that some results will never be returned (due to bandwidth limitations), but that they completely fill the shared capacity that would otherwise route more specific and more useful requests. This limitation is inherent to the propagation model of these requests: statistically, they behave exactly like network-wide broadcasts, consuming an extremely large amount of bandwidth, both for the broadcast requests themselves and, even worse, for the responses sent back from all over the network.

I note, however, that as Gnutella has evolved, responses routed directly to the requester use less shared bandwidth, because many intermediate nodes no longer have to carry this traffic: the only node overwhelmed by it is the requesting host itself, whose incoming bandwidth will be completely saturated by responses. But LimeWire now has an algorithm to limit and control this incoming flow easily, so these replies are less of a problem than they were in the past. It remains true that these requests still propagate as broadcasts, without efficient routing. But LimeWire's router integrates algorithms that limit the impact of broadcasts by not routing a request immediately in all candidate directions.
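To make the idea of "not routing a request immediately in all candidate directions" concrete, here is a minimal sketch in Python. It is not LimeWire's actual code: the function name, the `send_query` callback, and the `wanted_results` threshold are all assumptions made up for illustration. It only shows the principle of probing a few neighbors at a time and stopping as soon as enough results have come back.

```python
def staged_query(neighbors, send_query, wanted_results=100, batch_size=1):
    """Forward a query to neighbors in small batches instead of all at once.

    send_query is a hypothetical callback that forwards the query to one
    neighbor and returns the number of results that came back through it.
    wanted_results is an illustrative threshold, not a real protocol value.
    """
    received = 0
    probed = 0
    for start in range(0, len(neighbors), batch_size):
        # A popular ("greedy") query reaches the threshold after a few
        # probes, so most of the network never sees it.
        if received >= wanted_results:
            break
        for neighbor in neighbors[start:start + batch_size]:
            received += send_query(neighbor)
            probed += 1
    return probed, received
```

With this scheme a greedy request costs only a few hops, while a rare one still ends up reaching every neighbor, which is the behavior we want.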

If you look at the "what's new?" feature, you'll note that it looks like a request that can match many replies from many hosts, possibly nearly all of them! This feature now works marvelously, so it's possible that these "greedy" short requests are no longer a problem in LimeWire.

These algorithms are mostly heuristics; they are not perfect. So we still need to study the impact carefully before we lower the limit. What is clear is that the most problematic greedy requests in the past were those containing only ASCII letters. The 3-character limit was devised at a time when only ASCII requests were possible on Gnutella, so it works well only for English and for some languages rarely present on the web. Below this limit, too many files matched, because so many filenames contain "words" of 1 or 2 ASCII letters or digits. In other words, these requests were not selective enough.

But now that we can search for international strings, with accented Latin letters or with other scripts, the 3-character minimum becomes excessive, because a search for a 1-character or 2-character Han ideographic word will most often be more selective than a search for a 3-letter English word. In fact, accented Latin letters, and even Greek or Cyrillic letters, are much less present on the network, even on hosts run by users who use these scripts natively to name some of their shared files.

Note, however, that if I search for "Pô", the search may look selective for the French name of a river in northern Italy, but in fact the effective search string will be "po", because searches are matched and routed so that minor differences of orthography are hidden. I can search for "café" and get results with "CAFE", "Café", or "cafes", or results where the acute accent above the e is encoded separately after the letter as a combining diacritic, because of the way various hosts and systems encode or strip this accent in their filenames. So a search for "Pô" is still not selective enough. This is a place for later improvement, with more dynamic behavior based on the actual frequencies of results.
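As an illustration of this folding, here is a small Python sketch built on the standard `unicodedata` module. It is only an approximation of the idea and not the normalization LimeWire really applies; the function name `fold_query` is made up for the example. It decomposes the string, drops the combining marks, and ignores case.

```python
import unicodedata


def fold_query(text: str) -> str:
    """Approximate the accent- and case-insensitive form of a search string."""
    # NFKD splits a precomposed letter such as "ô" into "o" + combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, whether they were precomposed or typed separately.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() is a more aggressive lower() meant for caseless matching.
    return without_marks.casefold()
```

Under this sketch, "Pô" and "po" fold to the same string, and so do "café", "CAFE", and "cafe" followed by a separate combining acute accent. Matching "cafes" as well would need an extra step that is not shown here.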

For now, we must keep this limit for ASCII letters (which applies to nearly all Latin letters, with a few exceptions like "ø" or "æ", which may be decomposed into "o" and "ae", or Latin letters like the Icelandic/Old English "thorn", the Nordic "eth", or the rare "esh"), but I don't see why we should not relax it for non-Latin scripts:

Notably for Cyrillic letters (Russian, Bulgarian, Serbian...), Han ideographs (Chinese Hanzi, Japanese Kanji, or Korean Hanja), Hiragana and Katakana syllables (Japanese), and less urgently for the Greek alphabet, the Hebrew and Arabic abjads, and some Indian scripts.
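To show what such a relaxed rule could look like, here is a rough Python sketch of a script-dependent minimum length. Everything in it is an assumption for discussion: the per-script thresholds are invented, the function names are hypothetical, and the script is guessed from the Unicode character name, which is only a heuristic (Python's standard library does not expose the Unicode Script property). Each character contributes a "weight" of 1 divided by the minimum length of its script, and a query is accepted once the weights add up to at least 1.

```python
import unicodedata
from fractions import Fraction

# Hypothetical minimum lengths per script, for illustration only.
MIN_LENGTH_BY_PREFIX = {
    "CJK": 1,        # Han ideographs
    "HIRAGANA": 2,
    "KATAKANA": 2,
    "CYRILLIC": 2,
}
DEFAULT_MIN_LENGTH = 3  # the existing limit, kept for ASCII/Latin and the rest


def min_length_for(ch: str) -> int:
    """Guess the script from the Unicode character name (a heuristic)."""
    name = unicodedata.name(ch, "")
    for prefix, length in MIN_LENGTH_BY_PREFIX.items():
        if name.startswith(prefix):
            return length
    return DEFAULT_MIN_LENGTH


def is_selective_enough(query: str) -> bool:
    """Accept a query when its characters carry enough total weight."""
    # Fold accents and case first, so that "Pô" is judged as "po".
    decomposed = unicodedata.normalize("NFKD", query)
    folded = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()
    # Only letters and digits count; "*" and punctuation carry no weight.
    weight = sum(
        Fraction(1, min_length_for(ch)) for ch in folded if ch.isalnum()
    )
    return weight >= 1
```

With these made-up thresholds, "the" and "mp3" still pass exactly as they do today, "po", "Pô", and "*" are rejected, while a single Han ideograph or a 2-letter Cyrillic word is accepted. The right thresholds would have to come from measuring actual result frequencies on the network.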