|
Register | FAQ | The Twelve Commandments | Members List | Calendar | Arcade | Find the Best VPN | Today's Posts | Search |
Open Discussion topics Discuss the time of day, whatever you want to. This is the hangout area. If you have LimeWire problems, post them here too. |
Welcome To Gnutella Forums You are currently viewing our boards as a guest which gives you limited access to view most discussions and access our other features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload content, fun aspects such as the image caption contest and play in the arcade, and access many other special features after your registration and email confirmation. Registration is fast, simple and absolutely free so please, join our community today! (click here) (Note: we use Yandex mail server so make sure yandex is not on your email filter or blocklist.) Confirmation emails might be found in your Junk folder, especially for Yahoo or GMail. If you have any problems with the Gnutella Forum registration process or your Gnutella Forum account login, please contact us (this is not for program use questions.) Your email address must be legitimate and verified before becoming a full member of the forums. Please be sure to disable any spam filters you may have for our website, so that email messages can reach you. Note: Any other issue with registration, etc., send a Personal Message (PM) to one of the active Administrators: Lord of the Rings or Birdy. Once registered but before posting, members MUST READ the FORUM RULES (click here) and members should include System details - help us to help you (click on blue link) in their posts if their problem relates to using the program. Whilst forum helpers are happy to help where they can, without these system details your post might be ignored. And wise to read How to create a New Thread Thank you If you are a Spammer click here. This is not a business advertising forum, all member profiles with business advertising will be banned, all their posts removed. Spamming is illegal in many countries of the world. Guests and search engines cannot view member profiles. Deutsch? Español? Français? Nederlands? Hilfe in Deutsch, Ayuda en español, Aide en français et LimeWire en français, Hulp in het Nederlands Forum Rules Support Forums Before you post to one of the specific Client Help and Support Conferences in Gnutella Client Forums please look through other threads and Stickies that may answer your questions. Most problems are not new. The Search function is most useful. Also the red Stickies have answers to the most commonly asked questions. (over 90 percent). If your problem is not resolved by a search of the forums, please take the next step and post in the appropriate forum. There are many members who will be glad to help. If you are new to the world of file sharing please do not be shy! Everyone was ‘new’ when they first started. When posting, please include details for: Your Operating System ....... Your version of your Gnutella Client (* this is important for helping solve problems) ....... Your Internet connection (56K, Cable, DSL) ....... The exact error message, if one pops up Any other relevant information that you think may help ....... Try to make your post descriptive, specific, and clear so members can quickly and efficiently help you. To aid helpers in solving download/upload problems, LimeWire and Frostwire users must specify whether they are downloading a torrent file or a file from the Gnutella network. Members need to supply these details >>> System details - help us to help you (click on blue link) Moderators There are senior members on the forums who serve as Moderators. These volunteers keep the board organized and moving. Moderators are authorized to: (in order of increasing severity) Move posts to the correct forums. Many times, members post in the wrong forum. These off-topic posts may impede the normal operation of the forum. Edit posts. Moderators will edit posts that are offensive or break any of the House Rules. Delete posts. Posts that cannot be edited to comply with the House Rules will be deleted. Restrict members. This is one of the last punishments before a member is banned. Restrictions may include placing all new posts in a moderation queue or temporarily banning the offender. Ban members. The most severe punishment. Three or more moderators or administrators must agree to the ban for this action to occur. Banning is reserved for very severe offenses and members who, after many warnings, fail to comply with the House Rules. Banning is permanent. Bans cannot be removed by the moderators and probably won't be removed by the administration. The Rules 1. Warez, copyright violation, or any other illegal activity may NOT be linked or expressed in any form. Topics discussing techniques for violating these laws and messages containing locations of web sites or other servers hosting illegal content will be silently removed. Multiple offenses will result in consequences. File names are not required to discuss your issues. If filenames are copyright then do not belong on these forums & will be edited out or post removed. Picture sample attachments in posts must not include copyright infringement. 2. Spamming and excessive advertising will not be tolerated. Commercial advertising is not allowed in any form, including using in signatures. 3. There will be no excessive use of profanity in any forum. 4. There will be no racial, ethnic, or gender based insults, or any other personal attacks. 5. Pictures may be attached to posts and signatures if they are not sexually explicit or offensive. Picture sample attachments in posts must not include copyright infringement. 6. Remember to post in the correct forum. Take your time to look at other threads and see where your post will go. If your post is placed in the wrong forum it will be moved by a moderator. There are specific Gnutella Client sections for LimeWire, Phex, FrostWire, BearShare, Gnucleus, Morpheus, and many more. Please choose the correct section for your problem. 7. If you see a post in the wrong forum or in violation of the House Rules, please contact a moderator via Private Message or the "Report this post to a moderator" link at the bottom of every post. Please do not respond directly to the member - a moderator will do what is required. 8. Any impersonation of a forum member in any mode of communication is strictly prohibited and will result in banning. 9. Multiple copies of the same post will not be tolerated. Post your question, comment, or complaint only once. There is no need to express yourself more than once. Duplicate posts will be deleted with little or no warning. Keep in mind a forum censor may temporarily automatically hold up your post, if you do not see your post, do not post again, it will be dealt with by a moderator within a reasonable time. Authors of multiple copies of same post may be dealt with by moderators within their discrete judgment at the time which may result in warning or infraction points, depending on severity as adjudged by the moderators online. 10. Posts should have descriptive topics. Vague titles such as "Help!", "Why?", and the like may not get enough attention to the contents. 11. Do not divulge anyone's personal information in the forum, not even your own. This includes e-mail addresses, IP addresses, age, house address, and any other distinguishing information. Don´t use eMail addresses in your nick. Reiterating, do not post your email address in posts. This is for your own protection. 12. Signatures may be used as long as they are not offensive or sexually explicit or used for commercial advertising. Commercial weblinks cannot be used under any circumstances and will result in an immediate ban. 13. Dual accounts are not allowed. Cannot explain this more simply. Attempts to set up dual accounts will most likely result in a banning of all forum accounts. 14. Video links may only be posted after you have a tally of two forum posts. Video link posting with less than a 2 post tally are considered as spam. Video link posting with less than a 2 post tally are considered as spam. 15. Failure to show that you have read the forum rules may result in forum rules breach infraction points or warnings awarded against you which may later total up to an automatic temporary or permanent ban. Supplying system details is a prerequisite in most cases, particularly with connection or installation issues. Violation of any of these rules will bring consequences, determined on a case-by-case basis. Thank You! Thanks for taking the time to read these forum guidelines. We hope your visit is helpful and mutually beneficial to the entire community. |
| LinkBack | Thread Tools | Display Modes |
| ||||
You seem to have interesting and valuable knowledge about Asian scripts. If you have some programming experience, could you join to our team of open-source contributors to help improve further the internationalization of LimeWire? Unfortunately, my knowledges of these scripts is only theorical, based on the works done in the Unicode standard and related works such as ICU and UniHan properties, but not based on linguistic and semantics. One thing for which we have no clew is the support of Thai (which unfortunately has a visual ordering in Unicode because of the support of the legacy national TIS-620 standard, instead of a logical one used in other scripts, and also because Thai, like many other Asian languages, do not use any space to separate words). In the past, I proposed to index Asian filenames by splitting them arbitrarily in units of 2 or 3 character positions, but the number of generated keywords would have been a bit too high: Suppose that the title "ABCDEFG" is present, where each letter is a ideograph, or a Hiragana or Katakana letter or a Thai letter, the generated searchable and indexed keywords would have been: "AB", "BC", "CD", "DE", "EF", "FG" (if these two-letter "keywords" respect the minimum UTF-8 length restrictions given in my previous message) "ABC", "BCD", "CDE", "DEF", "EFG" Note that there may exist situations where a SINGLE character is a significant keyword. In LimeWire, we currently detect keyword separations either with: - spaces and controls - the general category of characters, so that punctuations or symbols become equivalent to spaces. - the script type of the character: a transition in Japanese between Hiragana or Katakana or Kanji or Latin implies a keyword separation. What we really need is a lexer. There are several open-source projects related to such lexical analysis of Asian texts (notably for implementing automatic translators, or input method editors or word processors). The problem is that they often depend on a local database that will store the lexical entities, or long lists of lexical rules. Some projects perform something else: lexical analysis is performed automatically, from an initially empty dictionnary, by statistical analysis of frequent lexical radicals, so that frequently used prefixes and suffixes can be identified (this is also useful for non Asian languages, like German, or for other Latin-written European or African languages like Hungarian or Berber). This is a research domain which is highly protected by many patents, notably those owned by famous dictionnary editors, or web search engines like Google... Documents on this subject, which would be freely available and that would allow royaltee-free redistribution in open-source software are difficult to find... But I suppose that this has been studied since centuries within some old books whose text is now available in the public domain. My searches within public libraries like the BDF in France have not found something significant (and getting copies of these documents is often complicate or even expensive, unless these books have been converted to numeric formats avaliable online). Also most of these books imply at least a good knowledge of the referenced languages, something I don't have... It's probably easier to do by natives of countries speaking and writing those languages, that's why we call for contributions by open-sourcers...
__________________ LimeWire is international. Help translate LimeWire to your own language. Visit: http://www.limewire.org/translate.shtml |
| ||||
In principle, Lao and Khmer will be less complex than Thai, because they were encoded in Unicode using the logical model, which makes full-text searches easier to implement. For LimeWire, it means that Lao and Khmer can be handled like other Indian/Brahmic scripts (we don't care if the visual order differs from the logical order, or if there exists input methods that use a visual order, given that the conversion of these texts to Unicode will use a logical order.) But it's not true for Thai, because Thai is supported in Thailand by an old and widely used TIS-620 standard, made since long in the 70's by IBM for the Thai government which made it mandatory for the representation of Thai texts. Unicode has then borrowed this situation, because it wanted to keep a roundtrip compatibility with the very large existing corpus of Thai texts in computers, files, databases, and input methods, encoded since long with TIS-620 or one of its predecessors. Thai has always been encoded with the visual order where some letters that are logically after another one (including in phonetic, structure, or collation) must be entered before it, because it will be written graphically in a single right-to-left direction (this visual ordering comes from the legacy limitations of font and display technologies in the 70s). India has chosen to preserve the logical ordering, which looks more like the way Indian users think about their language, and how they spell it orally (there has existed some typewriters in India using the visual orer, but these were considered difficult or illogical to use; for computers, the ISCII standard was created with the correct assumption that computers would make themselves the visual ordering.) Thai is complex because there does not exist a reliable algorithm to convert from the visual order (encoded) to the logical order (that would be useful for searches and collation). In practice, a Thai collation system comes with a large database containing most Thai words or radicals encoded in logical order. This database could be used to create useful lexical entities in LimeWire, but it is large, and may be subject to some copyright restrictions (I don't know exactly the status of the Thai database that comes with IBM's ICU, i.e. if it can be redistributed freely, like ICU itself). If LimeWire incorporated this database for Thai users, may be this should be an optional download, because of its size. As far as I know, no other Asian language needs such a database for collation, but such a database may be needed by a lexer to split sentences into keywords, due to the absence of mandatory spaces. But it's possible that these Asian users have learned to insert spaces or punctuation within their filenames to help the indexation of their files. We have no feedback about this from Chinese or Japanese users, so I don't know if their attempts to search files in their language is successful or not. If not, may be we should consider implementing at least some basic lexer like the one I exposed above (every 2 or 3 or 4 characters within a sequence of letters of the same Asian script).
__________________ LimeWire is international. Help translate LimeWire to your own language. Visit: http://www.limewire.org/translate.shtml |
| ||||
I may try to research around to see what I can find out (of those that search in their native language) & see what their responses are. But I won't reply in a hurry it could take quite some time. Particularly this time of year (people on breaks & on holidays overseas, etc.) Thai font licenced! Well I guess that would explain the limitation of characters. Some of mine come freely from a university apparently. So how does Java handle these fonts? Seem to be fine from here. Everytime I visit hotm'l it's in thai text. Not all correctly displayed but. How much do you think LW would pay me if I translated a version totally to Thai & had OSX type iTunes support. lol (ummm just a joke!] |
| ||||
I won't reply you about the payment. A LimeWire crew would better replyto you privately. I contribute to LimeWire as a free open-source contributer, managing most of the work with candidate translators, validating their input, testing them in LimeWire, helping LimeWire for beta-tests, or helping with proposed optimizations or corrections. I'm not paid to do that, but I have no contractual obligation with LimeWire, but just need to respect its working etiquette. For these reasons, Limewire has granted to me a limited access to their development platform, and I received a couple of checks and some goodies as a way thank me for my past contributions.
__________________ LimeWire is international. Help translate LimeWire to your own language. Visit: http://www.limewire.org/translate.shtml |
| ||||
Note that I already have some open-source lexers, but they are imlplemented in C, not Java (however this source is not difficult to port). The only good question is the validity of these lexicon extractors, but some of them are used in wellknown search engines, such as mnoGoSearch, which comes with a lexicon extractor that: - first extract tokens from text using separators, or returning 1-char tokens for Han and Thai characters - then uses a dynamic programming to determine lexicons from multiple paths based on word frequencies with maximum path length (this requires a word frequency dictionnary, available for Thai and Chinese, Thai being the smallest, Mandarin/simplified Chinese containing 3 times more entries, and Traditional Chinese being twice the size of Mandarin). The algorithm implementation is small, and efficient, but the main problem is to encode the dictionnaries in a small and efficient way. For Thai, this could fit in less than 100KB, but for Mandarin it would require about 300 to 400KB, and for Traditional Chinese about 1MB (in memory at run-time)...
__________________ LimeWire is international. Help translate LimeWire to your own language. Visit: http://www.limewire.org/translate.shtml |
| ||||
Poor Yukino: if he/she is still trying to follow this thread and she/he is using some kind of translating software to understand what is being talked about then he/she is pretty much done for.
__________________ iMac G4 OSX 10.3.9 RAM 256MB LW 4.10.5 Basic ADSL anything from 3 to 8Mbps/around 1024kbps "Raise your can of Beer on high And seal your fate forever Our best years have passed us by The Golden Age Of Leather" -Blue Öyster Cult- |
| ||||
How can we avoid to be technical for such discussions? At the time when such 3-characters limit was added, it was because there was a huge and unnecessary traffic caused by searches with too many results like "the", or "mp3", or even "*". Limewire has integrated some filters forcing its users to be more selective, so that requests will not be randomly routed throughout a large part of the network, where there are too many results, so that these requests will completely fill the available bandwidth. The problem with those request is not that all results will not be returned (due to bandwidth limitation), but the fact that they will completely fill the space shared that would allow routing more specific and more useful requests. This limitation is inherent to the propagation model of these requests: they behave, statistically, exactly like network-wide broadcasts, consuming an extremely large bandwidth both for the broadcasted requests themselves but even worse for the distributed responses spreaded from throughout the network. I note however, that since Gnutella has evolved, the routing of responses directly to the requester will use a lower shared bandwidth, because many intermediate nodes will not have to support this traffic: the only node that will be overwelmed by this traffic will be the requested host itself, whose input bandwidth will be completely saturated by responses. But Limewire now has an algorithm to easily limit and control this incoming flow. So these replies are less a problem than they were in the past. It remains true that these requests will still propagate as broadcast, without efficient routing. But LimeWire integrates in its router, some algorithms that limit the impact of broadcasts, by not routing a request immediately to all candidate directions. If you look at the "what's new?" feature, you'll note that it looks like a request that can match many replies from many hosts, possibly nearly all! This feature now works marvelously, and it's possible that these "greedy" short requests would no longer be a problem in LimeWire. These algorithms are mostly heuristics, they are not perfect. So we still need to carefully study the impact if we lower them. What is clear is that the most problematic greedy requests in the past was with requests containing only ASCII letters. The 3-characters limit was imagined at a time where only ASCII requests were possible on Gnutella, so working well only in English and some languages rarely present on the web. Under this old limit, we had too many files containing "words" with 1 or 2 ASCII letters or digits. In other terms, these requests were not selective enough. But as we can now search for international strings, with accents on Latin letters, or using other scripts, the 3 characters minimum becomes excessive, because a search for a 1-letter or 2-letter Han ideographic word will be most often more selective than a search for a 3-letter English word. In fact Latin letters with accents or even Greek letters or Cyrillic letters are much less present on the network, even on hosts run by users using this script natively for naming a part of their shared files. Note however that if I search for "Pô", this search may look selective for the French name of a River in the North of Italy, but in fact the effective search string will be "po" (because searches are matched and routed so that minor differences of orthography are hidden: I can search for "café" and I get results with "CAFE" or "Café" or "cafes" or results where the accute accent above e is encoded separately after the e letter as a combining diacritic, because of the way various hosts and systems encode or strip this accent in their filenames...). So a search for "Pô" is still not selective enough. This is a place for later improvement, with more dynamic behavior based on actual frequencies of results. For now, we must keep this limit for ASCII letters (which apply to nearly all Latin letters with few exceptions like "ø" or "æ" which may be decomposed into "o" and "ae", or Latin letters like the Icelandic/old English "thorn", the Nordic "eth", or the rare "esh"), but I don't see why we should not relax it for non Latin scripts: Notably Cyrillic letters (for Russian, Bulgarian, Serbian...), Han ideographs (for Chinese Hanzi characters, Japanese Kanjis or Korean Hanjas), Hiragana and Katakana syllables (for Japanese), and less urgently for the Greek alphabet, and the Hebrew and Arabic abugidas, and some Indian scripts.
__________________ LimeWire is international. Help translate LimeWire to your own language. Visit: http://www.limewire.org/translate.shtml |
| |
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
everybody point and laugh | wrestlingles | Open Discussion topics | 2 | November 10th, 2005 05:30 AM |
Hallo Deutschland | DaleKaufm | Deutsch | 1 | June 21st, 2005 03:02 PM |
What is the point if i cant transfer???? | Unregistered | General Mac OSX Support | 9 | November 1st, 2002 10:14 AM |
What is the point of the MP3 player in version 1.8? | Unregistered | Open Discussion topics | 2 | November 13th, 2001 08:54 AM |
Point Server with P2P | allautoweb | General Gnutella / Gnutella Network Discussion | 0 | April 29th, 2001 07:09 PM |