Gnutella Forums  

December 11th, 2004
verdyp
LimeWire is International
 
Join Date: January 13th, 2002
Location: Nantes, FR; Rennes, FR
Posts: 306

In principle, Lao and Khmer will be less complex than Thai, because they were encoded in Unicode using the logical model, which makes full-text searches easier to implement. For LimeWire, this means that Lao and Khmer can be handled like the other Indic/Brahmic scripts: we don't care if the visual order differs from the logical order, or if some input methods use a visual order, given that the conversion of these texts to Unicode will use the logical order.

But this is not true for Thai, because Thai is supported in Thailand by the old and widely used TIS-620 standard, created long ago (in the 70s) by IBM for the Thai government, which made it mandatory for the representation of Thai text. Unicode then inherited this situation, because it wanted to keep round-trip compatibility with the very large existing corpus of Thai text in computers, files, databases, and input methods, long encoded with TIS-620 or one of its predecessors.

Thai has always been encoded in visual order: some letters that logically come after another one (phonetically, structurally, and in collation) must be entered before it, because the text is stored exactly as it is written graphically, in a single left-to-right direction (this visual ordering comes from the legacy limitations of the font and display technologies of the 70s). India chose to preserve the logical ordering, which is closer to the way Indian users think about their languages and how they spell them aloud. (Some typewriters in India used the visual order, but they were considered difficult or illogical to use; for computers, the ISCII standard was created with the correct assumption that computers would perform the visual reordering themselves.)
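To make the difference concrete, here is a small Python sketch (my own illustration, not LimeWire code) that lists the stored order of the characters in a Thai word and in a Devanagari syllable. The helper name `stored_order` is hypothetical; it only relies on the standard `unicodedata` module.

```python
import unicodedata

def stored_order(text):
    """Return the Unicode character names of text, in stored (encoded) order."""
    return [unicodedata.name(ch) for ch in text]

# The word "Thai": the vowel SARA AI MAIMALAI is stored first,
# although it is pronounced after the consonant THO THAHAN.
thai_word = "\u0e44\u0e17\u0e22"

# Devanagari "ki": the vowel sign I is drawn to the left of KA,
# but it is stored after KA (logical order).
devanagari_syllable = "\u0915\u093f"

for sample in (thai_word, devanagari_syllable):
    print(sample, stored_order(sample))
```

In the Thai sample the vowel comes first in memory, so a search or collation routine that compares code points directly sees the vowel before the consonant it belongs to.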

Thai is complex because there is no reliable algorithm to convert from the (encoded) visual order to the logical order that would be useful for searches and collation. In practice, a Thai collation system comes with a large database containing most Thai words or radicals encoded in logical order. This database could be used to create useful lexical entities in LimeWire, but it is large, and it may be subject to some copyright restrictions (I don't know the exact status of the Thai database that comes with IBM's ICU, i.e. whether it can be redistributed freely, like ICU itself).
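For illustration only, this is roughly what a purely mechanical reordering would look like in Python. It is a naive sketch under my own assumptions, not what ICU or LimeWire does: it swaps each preposed vowel (U+0E40 to U+0E44) with the single consonant that follows it. The names `PREPOSED_VOWELS`, `is_thai_consonant` and `naive_logical_order` are hypothetical.

```python
# Thai vowels that are stored before the consonant they follow in speech:
# SARA E, SARA AE, SARA O, SARA AI MAIMUAN, SARA AI MAIMALAI.
PREPOSED_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}

def is_thai_consonant(ch):
    """True for the Thai consonants KO KAI (U+0E01) to HO NOKHUK (U+0E2E)."""
    return 0x0E01 <= ord(ch) <= 0x0E2E

def naive_logical_order(text):
    """Swap each preposed vowel with the ONE consonant that follows it.

    This is only a rule of thumb: it cannot tell whether the vowel
    belongs to a single consonant or to a longer cluster.
    """
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREPOSED_VOWELS and is_thai_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return "".join(chars)

print(naive_logical_order("\u0e44\u0e17\u0e22"))
```

The swap moves the vowel after exactly one consonant, which is a guess. Deciding how many consonants the vowel really spans needs knowledge of the words themselves, which is where the database comes in.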

If LimeWire incorporated this database for Thai users, maybe it should be an optional download, because of its size.

As far as I know, no other Asian language needs such a database for collation, but such a database may be needed by a lexer to split sentences into keywords, due to the absence of mandatory spaces. It's possible, though, that these Asian users have learned to insert spaces or punctuation in their filenames to help the indexing of their files. We have no feedback about this from Chinese or Japanese users, so I don't know whether their attempts to search for files in their language are successful or not. If not, maybe we should consider implementing at least a basic lexer like the one I described above (every 2, 3, or 4 characters within a sequence of letters of the same Asian script).
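A minimal Python sketch of such a basic lexer could look like the following. It is only an illustration under several assumptions of mine: the script table `SCRIPT_RANGES` is a small, hypothetical subset of code point ranges (not a real Unicode script database), the n-grams overlap, and the names `script_of` and `ngram_keywords` are made up for this example.

```python
from itertools import groupby

# Simplified, hypothetical script ranges (NOT a full Unicode script table).
SCRIPT_RANGES = {
    "han": (0x4E00, 0x9FFF),
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
    "thai": (0x0E00, 0x0E7F),
}

def script_of(ch):
    """Return a script label, "other" for other letters/digits, None for separators."""
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    return "other" if ch.isalnum() else None

def ngram_keywords(text, sizes=(2, 3, 4)):
    """Split text into keywords: overlapping n-grams inside each run of one
    Asian script, and whole lowercase words for everything else."""
    keywords = []
    for script, group in groupby(text, key=script_of):
        run = "".join(group)
        if script is None:
            continue  # spaces and punctuation only separate runs
        if script == "other":
            keywords.append(run.lower())
            continue
        grams = [run[i:i + n] for n in sizes for i in range(len(run) - n + 1)]
        # A run shorter than the smallest n-gram size is kept whole.
        keywords.extend(grams if grams else [run])
    return keywords

print(ngram_keywords("\u6771\u4eac\u30bf\u30ef\u30fc.mp3"))
```

With a filename made of two kanji followed by three katakana and ".mp3", this gives one 2-gram for the kanji run, two 2-grams and one 3-gram for the katakana run, and the keyword "mp3". Overlapping n-grams inflate the index, so whether 2, 3, or 4 characters is the right size would need testing with real filenames.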
__________________
LimeWire is international. Help translate LimeWire to your own language.
Visit: http://www.limewire.org/translate.shtml