#109674 - 2003-12-05 07:12 PM
Re: UNICODE support for READLINE
Allen
KiX Supporter
Registered: 2003-04-19
Posts: 4549
Loc: USA
IIRC, I ran into the same problem while writing the Addprinter() UDF. The way I got around it was to shell out and TYPE the file to another file.
Code:
shell '%comspec% /c type "$driverinf">%temp%\addprinter.txt'
Once the new file was created, ReadLine worked normally. However, you are right: it would be nice to be able to read Unicode files directly.
#109675 - 2003-12-08 11:37 AM
Re: UNICODE support for READLINE
Richard H.
Administrator
Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
Quote:
Maybe something like supporint chr(0) in strings would be sufficient.
Supporting wide character/double-byte/unicode is a good idea and I believe will become more and more important. I think that simply changing the internal support for basic strings is not going to be sufficient to support Unicode.
Unicode characters are not simply an ASCII character preceded by a Chr(0). That Chr(0) is there for a reason, see the Unicode home page for the full spec.
This means that you need to be careful when reading, writing, substringing, instringing, concatenating, testing and converting strings to preserve the character set information.
The better solution is to have a new "wide string" basic type and either update the string functions to auto-magically support both or add wide string functions.
Conversion between wide and non-wide strings would be automatic in the same way as (say) between strings and number types. The extra byte would of course be lost when converting from a wide to a non-wide string, and would have to be set to zero when converting from a non-wide to a wide string.
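To make the conversion rule concrete, here is a minimal sketch in Python rather than KiXtart, since KiXtart has no wide string type to demonstrate with. It assumes a wide character is a single 16-bit value and a non-wide character is a single byte; the function names are made up for illustration and are not part of any KiXtart proposal.

```python
# Illustrative model only: a "wide" character is one 16-bit value,
# a "non-wide" character is one byte.

def narrow_to_wide(data: bytes) -> list[int]:
    """Widen each byte to a 16-bit value; the extra (high) byte is zero."""
    return [b for b in data]

def wide_to_narrow(units: list[int]) -> bytes:
    """Keep only the low byte of each 16-bit value; the high byte is lost."""
    return bytes(u & 0xFF for u in units)

wide = narrow_to_wide(b"KiX")
print([hex(u) for u in wide])   # ['0x4b', '0x69', '0x58']
print(wide_to_narrow(wide))     # b'KiX' (lossless for single-byte text)

euro = [0x20AC]                 # Euro sign, U+20AC
print(wide_to_narrow(euro))     # b'\xac' (the high byte 0x20 is gone)
```

The last line shows why the narrowing direction is lossy: anything above &00FF comes back as an unrelated single-byte character.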
Specifying characters is also an interesting task. "Basic Latin" isn't a problem as it corresponds to 7-bit ASCII (&0000 - &007F), so the automatic conversion could handle that. How do you specify Cyrillic, Greek, box-drawing or mathematical characters though, especially when the high-order byte is often a non-printable? Perhaps a new conversion function, so if you wanted the currency symbol for the Euro you would use:
Code:
$wsEuroSymbol=CWStr(&20AC)
This is going to be a lot of work, so a short-term measure would be:
- Update "OPEN", to recognise wide character files, and allow the specification of wide character when creating files.
- Update "Readline" to silently drop the leading byte of each wide character.
- Update "Writeline" to prefix each character with Chr(0).
While this doesn't actually provide Unicode support, it will allow the reading and writing of files which contain only the "Basic Latin" and "Latin-1 Supplement" codes which may be sufficient for administration purposes.
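The short-term measure can be sketched roughly as follows. This is Python, not KiXtart, and only a model of the idea: the function names are hypothetical, and it assumes the file starts with a byte order mark. Note that Windows normally writes such files little-endian, so the zero byte follows each Basic Latin character on disk rather than preceding it; the sketch checks the byte order mark to decide which byte of each pair to keep.

```python
import codecs

def _split_lines(text: str) -> list[str]:
    """Split on CRLF and drop the empty piece after the final line ending."""
    lines = text.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()
    return lines

def read_lines_dropping_high_byte(raw: bytes) -> list[str]:
    """Model of the proposed ReadLine: keep the low byte of each 16-bit pair."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        body, low = raw[2:], 0      # little-endian: low byte comes first
    elif raw.startswith(codecs.BOM_UTF16_BE):
        body, low = raw[2:], 1      # big-endian: low byte comes second
    else:
        return _split_lines(raw.decode("latin-1"))  # not a wide file
    narrow = body[low::2]           # every second byte, starting at the low one
    return _split_lines(narrow.decode("latin-1"))

def write_lines_adding_zero_byte(lines: list[str]) -> bytes:
    """Model of the proposed WriteLine: pad every character with a zero byte."""
    text = "".join(line + "\r\n" for line in lines)
    narrow = text.encode("latin-1")  # raises UnicodeEncodeError above U+00FF
    out = bytearray(codecs.BOM_UTF16_LE)
    for b in narrow:
        out += bytes((b, 0))         # little-endian: character byte, then zero
    return bytes(out)

data = write_lines_adding_zero_byte(["[Version]", "Signature=$Windows NT$"])
print(read_lines_dropping_high_byte(data))
```

Because U+0000 to U+00FF map one-to-one onto Latin-1, the padded output is valid little-endian UTF-16 for that range, which is why this is enough for files such as driver .inf files that stay within those two blocks.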