#101893 - 2003-06-09 06:26 PM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
KiX Master Guru
*****

Registered: 2001-06-05
Posts: 22346
Loc: OK
no.
don't screw the charset-tables with "ascii" = "on"

leave that out.
_________________________
!

download KiXnet

#101894 - 2003-06-09 06:28 PM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
some explanation.
this is a udf, thus we can't go and change kixtart's global settings.
let them be like they are.

indeed the output to console can differ from the one written to file.
anyway, we... you don't need to care about that.
that is not the problem of the udf but the person using it.
#101895 - 2003-06-09 10:49 PM Re: HTMLtoText, has anyone done that?
masken Offline
MM club member
*****

Registered: 2000-11-27
Posts: 1222
Loc: Gothenburg, Sweden
hmm... I think we have something here? [Smile]
Comments, errors?

I'm a bit worried about the "$Char = $Chars[$AltAPos]", cause that will result in a stop if the code isn't found (-1 returned)...

code:
FUNCTION HTMLtoText($string)
;| Exit if there's nothing to convert
IF INSTR($string, "&") = 0 OR INSTR($string, ";") = 0
    EXIT 2
ENDIF
DIM $Counter
DIM $CodeAlts[102]
$CodeAlts[0] = "&quot;"
$CodeAlts[1] = "&amp;"
$CodeAlts[2] = "&lt;"
$CodeAlts[3] = "&gt;"
$CodeAlts[4] = "&trade;"
$CodeAlts[5] = "&nbsp;"
$CodeAlts[6] = "&iexcl;"
$CodeAlts[7] = "&cent;"
$CodeAlts[8] = "&pound;"
$CodeAlts[9] = "&curren;"
$CodeAlts[10] = "&yen;"
$CodeAlts[11] = "&brvbar;"
$CodeAlts[12] = "&sect;"
$CodeAlts[13] = "&uml;"
$CodeAlts[14] = "&copy;"
$CodeAlts[15] = "&ordf;"
$CodeAlts[16] = "&laquo;"
$CodeAlts[17] = "&not;"
$CodeAlts[18] = "&shy;"
$CodeAlts[19] = "&reg;"
$CodeAlts[20] = "&macr;"
$CodeAlts[21] = "&deg;"
$CodeAlts[22] = "&plusmn;"
$CodeAlts[23] = "&sup2;"
$CodeAlts[24] = "&sup3;"
$CodeAlts[25] = "&acute;"
$CodeAlts[26] = "&micro;"
$CodeAlts[27] = "&para;"
$CodeAlts[28] = "&middot;"
$CodeAlts[29] = "&cedil;"
$CodeAlts[30] = "&sup1;"
$CodeAlts[31] = "&ordm;"
$CodeAlts[32] = "&raquo;"
$CodeAlts[33] = "&frac14;"
$CodeAlts[34] = "&frac12;"
$CodeAlts[35] = "&frac34;"
$CodeAlts[36] = "&iquest;"
$CodeAlts[37] = "&Agrave;"
$CodeAlts[38] = "&Aacute;"
$CodeAlts[39] = "&Acirc;"
$CodeAlts[40] = "&Atilde;"
$CodeAlts[41] = "&Auml;"
$CodeAlts[42] = "&Aring;"
$CodeAlts[43] = "&AElig;"
$CodeAlts[44] = "&Ccedil;"
$CodeAlts[45] = "&Egrave;"
$CodeAlts[46] = "&Eacute;"
$CodeAlts[47] = "&Ecirc;"
$CodeAlts[48] = "&Euml;"
$CodeAlts[49] = "&Igrave;"
$CodeAlts[50] = "&Iacute;"
$CodeAlts[51] = "&Icirc;"
$CodeAlts[52] = "&Iuml;"
$CodeAlts[53] = "&ETH;"
$CodeAlts[54] = "&Ntilde;"
$CodeAlts[55] = "&Ograve;"
$CodeAlts[56] = "&Oacute;"
$CodeAlts[57] = "&Ocirc;"
$CodeAlts[58] = "&Otilde;"
$CodeAlts[59] = "&Ouml;"
$CodeAlts[60] = "&times;"
$CodeAlts[61] = "&Oslash;"
$CodeAlts[62] = "&Ugrave;"
$CodeAlts[63] = "&Uacute;"
$CodeAlts[64] = "&Ucirc;"
$CodeAlts[65] = "&Uuml;"
$CodeAlts[66] = "&Yacute;"
$CodeAlts[67] = "&THORN;"
$CodeAlts[68] = "&szlig;"
$CodeAlts[69] = "&agrave;"
$CodeAlts[70] = "&aacute;"
$CodeAlts[71] = "&acirc;"
$CodeAlts[72] = "&atilde;"
$CodeAlts[73] = "&auml;"
$CodeAlts[74] = "&aring;"
$CodeAlts[75] = "&aelig;"
$CodeAlts[76] = "&ccedil;"
$CodeAlts[77] = "&egrave;"
$CodeAlts[78] = "&eacute;"
$CodeAlts[79] = "&ecirc;"
$CodeAlts[80] = "&euml;"
$CodeAlts[81] = "&igrave;"
$CodeAlts[82] = "&iacute;"
$CodeAlts[83] = "&icirc;"
$CodeAlts[84] = "&iuml;"
$CodeAlts[85] = "&eth;"
$CodeAlts[86] = "&ntilde;"
$CodeAlts[87] = "&ograve;"
$CodeAlts[88] = "&oacute;"
$CodeAlts[89] = "&ocirc;"
$CodeAlts[90] = "&otilde;"
$CodeAlts[91] = "&ouml;"
$CodeAlts[92] = "&divide;"
$CodeAlts[93] = "&oslash;"
$CodeAlts[94] = "&ugrave;"
$CodeAlts[95] = "&uacute;"
$CodeAlts[96] = "&ucirc;"
$CodeAlts[97] = "&uuml;"
$CodeAlts[98] = "&yacute;"
$CodeAlts[99] = "&thorn;"
$CodeAlts[100] = "&yuml;"

DIM $Chars[102]
$Chars[0] = CHR(34)
$Chars[1] = "&"
$Chars[2] = "<"
$Chars[3] = ">"
$Chars[4] = "™"
$Chars[5] = " "
$Chars[6] = "¡"
$Chars[7] = "¢"
$Chars[8] = "£"
$Chars[9] = "¤"
$Chars[10] = "¥"
$Chars[11] = "¦"
$Chars[12] = "§"
$Chars[13] = "¨"
$Chars[14] = "©"
$Chars[15] = "ª"
$Chars[16] = "«"
$Chars[17] = "¬"
$Chars[18] = "­"
$Chars[19] = "®"
$Chars[20] = "¯"
$Chars[21] = "°"
$Chars[22] = "±"
$Chars[23] = "²"
$Chars[24] = "³"
$Chars[25] = "´"
$Chars[26] = "µ"
$Chars[27] = "¶"
$Chars[28] = "·"
$Chars[29] = "¸"
$Chars[30] = "¹"
$Chars[31] = "º"
$Chars[32] = "»"
$Chars[33] = "¼"
$Chars[34] = "½"
$Chars[35] = "¾"
$Chars[36] = "¿"
$Chars[37] = "À"
$Chars[38] = "Á"
$Chars[39] = "Â"
$Chars[40] = "Ã"
$Chars[41] = "Ä"
$Chars[42] = "Å"
$Chars[43] = "Æ"
$Chars[44] = "Ç"
$Chars[45] = "È"
$Chars[46] = "É"
$Chars[47] = "Ê"
$Chars[48] = "Ë"
$Chars[49] = "Ì"
$Chars[50] = "Í"
$Chars[51] = "Î"
$Chars[52] = "Ï"
$Chars[53] = "Ð"
$Chars[54] = "Ñ"
$Chars[55] = "Ò"
$Chars[56] = "Ó"
$Chars[57] = "Ô"
$Chars[58] = "Õ"
$Chars[59] = "Ö"
$Chars[60] = "×"
$Chars[61] = "Ø"
$Chars[62] = "Ù"
$Chars[63] = "Ú"
$Chars[64] = "Û"
$Chars[65] = "Ü"
$Chars[66] = "Ý"
$Chars[67] = "Þ"
$Chars[68] = "ß"
$Chars[69] = "à"
$Chars[70] = "á"
$Chars[71] = "â"
$Chars[72] = "ã"
$Chars[73] = "ä"
$Chars[74] = "å"
$Chars[75] = "æ"
$Chars[76] = "ç"
$Chars[77] = "è"
$Chars[78] = "é"
$Chars[79] = "ê"
$Chars[80] = "ë"
$Chars[81] = "ì"
$Chars[82] = "í"
$Chars[83] = "î"
$Chars[84] = "ï"
$Chars[85] = "ð"
$Chars[86] = "ñ"
$Chars[87] = "ò"
$Chars[88] = "ó"
$Chars[89] = "ô"
$Chars[90] = "õ"
$Chars[91] = "ö"
$Chars[92] = "÷"
$Chars[93] = "ø"
$Chars[94] = "ù"
$Chars[95] = "ú"
$Chars[96] = "û"
$Chars[97] = "ü"
$Chars[98] = "ý"
$Chars[99] = "þ"
$Chars[100] = "ÿ"

;|Locals used below
DIM $CodeTexts, $Code, $Char, $AltAPos, $Text
;|Pass the variable itself rather than a quoted "$string"
$CodeTexts = SPLIT($string, "&")
;|Element 0 is the text before the first "&", so it never starts with a code
FOR $Counter = 1 TO UBOUND($CodeTexts)
    IF INSTR($CodeTexts[$Counter], ";") <> 0
        ;|We have an array element starting with a code, that perhaps also
        ;|contains some text after the code; extract the code-only part
        $Code = SPLIT($CodeTexts[$Counter], ";")[0]
        IF LEFT($Code, 1) = "#"
            ;|we have a numeric code
            $Char = CHR(SUBSTR($Code, 2, LEN($Code)))
        ELSE
            ;|we have an altcode
            $AltAPos = ASCAN($CodeAlts, "&" + $Code + ";")
            $Char = $Chars[$AltAPos]
        ENDIF
        ;|re-assemble the array element, there might be text after the code part
        $CodeTexts[$Counter] = $Char + SUBSTR($CodeTexts[$Counter], INSTR($CodeTexts[$Counter], ";") + 1, LEN($CodeTexts[$Counter]))
    ELSE
        ;|no code in this element: put back the "&" that SPLIT removed
        $CodeTexts[$Counter] = "&" + $CodeTexts[$Counter]
    ENDIF
NEXT
FOR EACH $Text IN $CodeTexts
    $HTMLtoText = $HTMLtoText + $Text
NEXT
ENDFUNCTION



[ 09. June 2003, 22:51: Message edited by: masken ]
_________________________
The tart is out there

#101896 - 2003-06-09 11:08 PM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
like I said to you in IM.
leave the first entry of the array empty.
this way you can check simply if ascan() and then:

$Char = $Chars[iif(-1<$AltAPos,$AltAPos,0)]

well, there are other ways too...
anyway, the code looks too long still [Razz]
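
For what it's worth, here is a minimal sketch of that empty-first-entry idea. The tables $Names and $Chars are hypothetical cut-down stand-ins, not the full arrays from the UDF. Slot 0 is kept empty, so a failed ASCAN() lookup falls back to it instead of indexing the array with -1:

```kixtart
; Sketch only: $Names and $Chars are hypothetical cut-down tables.
Dim $Names[3], $Chars[3], $Pos

$Names[0] = ""         ; slot 0 is the sentinel and matches no real code
$Chars[0] = ""         ; an unknown code therefore decodes to nothing
$Names[1] = "&lt;"
$Chars[1] = "<"
$Names[2] = "&gt;"
$Chars[2] = ">"
$Names[3] = "&amp;"
$Chars[3] = "&"

$Pos = AScan($Names, "&gt;")        ; known code: AScan gives the entry's index
? "known:   [" + $Chars[IIf(-1 < $Pos, $Pos, 0)] + "]"

$Pos = AScan($Names, "&bogus;")     ; unknown code: AScan gives -1
? "unknown: [" + $Chars[IIf(-1 < $Pos, $Pos, 0)] + "]"
```

Mapping unknown codes to an empty string silently drops them; putting the original text back instead would be another reasonable choice.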
#101897 - 2003-06-09 11:29 PM Re: HTMLtoText, has anyone done that?
masken Offline
I'd rather have it understandable than 1337  - [Wink]

I'm slow heh.. but I'll figure out what you mean by that soon. Gotta sleep right now though, thanks for the help so far m8's [Smile]

Edit
Ok... think it's more or less done. Posted it here:
http://www.kixtart.org/cgi-bin/ultimatebb.cgi?ubb=get_topic&f=12&t=000423
Thanks for the help Lonkero & Richard [Smile]


[ 10. June 2003, 09:35: Message edited by: masken ]
#101898 - 2003-06-10 10:14 AM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
what is the part that needs the case-sensitivity?
I see that you fancy using those options [Wink]

but, I indeed don't see a place where you do string compare (which is the reason to need case-sensitivity)...
and the html-codes (the arrays) are case-insensitive
#101899 - 2003-06-10 10:35 AM Re: HTMLtoText, has anyone done that?
masken Offline
Lonk, try this when having disabled case sensitivity (just a generated thing):
code:
$htmlstring = "color supervisor: &#34;Terminated&#34; scenes &ccedil;(just some ccedil capsulted text)&ccedil; (uncredited) "

...there are upper- and lowercase special characters, and the only thing that tells them apart in the alternative codes is the case they're written in.

ie:
code:
&Ccedil; <> &ccedil;



[ 10. June 2003, 10:40: Message edited by: masken ]
#101900 - 2003-06-10 11:16 AM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
yes, it's different if you use case-sensitivity.
but you are not supposed to!
that is the point.
#101901 - 2003-06-10 11:48 AM Re: HTMLtoText, has anyone done that?
masken Offline
Yes, you are [Wink] Or you'll get a different character than the one meant in the HTML source.
#101902 - 2003-06-10 11:51 AM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
no.
do you mean that there is character named "Ç" and different one "&CceDil;"?

I don't think so.
#101903 - 2003-06-10 11:55 AM Re: HTMLtoText, has anyone done that?
masken Offline
Lonk, look at the sample I provided... There's a "Ç" and there's a "ç". Array items 45 and 77 respectively...

hmm.. I guess the most optimal would be to first check the array with case sensitivity, and if -1 is returned, check without case sensitivity?

[ 10. June 2003, 12:08: Message edited by: masken ]
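
A rough sketch of that two-pass lookup, for illustration only. FindCode is a hypothetical helper that is not part of the posted UDF, and it assumes ASCAN() follows the CaseSensitivity option, which is what this thread reports rather than something checked here:

```kixtart
; Hypothetical helper, not part of the posted UDF. It assumes AScan()
; follows the CaseSensitivity option, as reported in this thread.
Function FindCode($Table, $Code)
    Dim $Old, $Nul
    $Old = SetOption("CaseSensitivity", "On")
    $FindCode = AScan($Table, $Code)             ; pass 1: exact case
    If $FindCode < 0
        $Nul = SetOption("CaseSensitivity", "Off")
        $FindCode = AScan($Table, $Code)         ; pass 2: any case
    EndIf
    $Nul = SetOption("CaseSensitivity", $Old)    ; restore the caller's setting
EndFunction

; Small demo table, again only for illustration
Dim $Demo[1]
$Demo[0] = "&Ccedil;"
$Demo[1] = "&ccedil;"
? FindCode($Demo, "&ccedil;")
? FindCode($Demo, "&CCEDIL;")
```

If the assumption holds, the first lookup should land on the lowercase entry, while the oddly cased one falls back to whichever entry matches first when case is ignored. The helper still returns -1 when neither pass finds anything, so the caller has to handle that.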
#101904 - 2003-06-11 12:15 AM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
indeed.
weird [Embarrassed] [Confused] [Eek!] [Mad] [Roll Eyes] [Cool]
#101905 - 2003-06-11 12:22 AM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
no...
trying again.
"&CceDil;"
"Ç"
"ç"

indeed.
note:
html codes are lcase().
if the first char is ucase() it means it's capital.
this info we can then use to change the udf to work without the setoption [Wink]
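
One way to read that hint, as a sketch under assumptions: StartsUpper is a hypothetical helper that looks only at the character code of the first letter. Asc() returns a number, so the test should not depend on the CaseSensitivity option at all:

```kixtart
; Hypothetical helper: reports whether a code such as "Ccedil" starts with
; an uppercase A-Z. Asc() yields a number, so the comparison is numeric
; and does not involve any string compare.
Function StartsUpper($Code)
    Dim $First
    $First = Asc(Left($Code, 1))
    $StartsUpper = 0
    If ($First >= 65) And ($First <= 90)
        $StartsUpper = 1
    EndIf
EndFunction

? StartsUpper("Ccedil")
? StartsUpper("ccedil")
```

A lookup could then pick between a table of capitals and a table of small letters based on this flag. That part is left out here because it depends on how the arrays end up being organised.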
#101906 - 2003-06-11 12:27 AM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
first of all, we don't need the "&" and ";" marks in the arrays, do we?
strip them off to save strokes.
then the script can be instead of:
code:
$AltAPos = ASCAN($CodeAlts, "&" + $Code + ";")
$Char = $Chars[IIF(-1 < $AltAPos, $AltAPos, 0)]

something like...

no, it's ascan that is case-insensitive...
let me think for a sec.
if skipping the setoption, ascan can't be used [Frown]
#101907 - 2003-06-10 01:29 PM Re: HTMLtoText, has anyone done that?
masken Offline
Lonk, for me it all works using the SETOPTION(), the correct chars are returned. Try:
code:
"color supervisor: &#34;Terminated&#34; scenes &ccedil;(blah...)&ccedil; &Ccedil;(uncredited)&Ccedil; "

for example.

Didn't we need "&" and ";", since ASCAN() searches for exact matches in the array?

hmm.. since SETOPTION is needed, then we can use a LCASE() if the return from ASCAN is -1 ? My problem with using the ASCAN return code the last time was that I couldn't use:

$returns = ASCAN($array, "whatever")
IF $returns = -1

...shouldn't this work?

[ 10. June 2003, 13:29: Message edited by: masken ]
#101908 - 2003-06-10 01:33 PM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
we don't need those characters as we indeed want exact match.
remove the chars from your arrays (thus saving at least 200 bytes) and then you can remove them from your ascan part too.
#101909 - 2003-06-10 01:34 PM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
also, the arrays should be dimmed with 102 elements [Wink]
#101910 - 2003-06-10 01:49 PM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
hey, try this...
couldn't post it here...

and I know I promised that I won't code it for you [Wink]
so I just changed it a little and skipped testing [Razz]

anyway, removed some things I saw as not needed.
and, you must never put variables into quotes as you may end up crashing your script.
so, check:
http://www.kixtart.org/cgi-bin/ultimatebb.cgi?ubb=get_topic&f=12&t=000423#000002
#101911 - 2003-06-10 01:56 PM Re: HTMLtoText, has anyone done that?
Lonkero Administrator Offline
not sure whether split works case-insensitively without the setoption, but that sounds stupid already.
thus left the setoptions out.

also, now that there is no ascan anymore, the arrays can be combined into one (too lazy to show): in the last loop, split on element 0 of the counter's element and join with element 1 of the counter's element.
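
A small sketch of what that combined-array loop could look like, under assumptions. $Pairs is a hypothetical cut-down table, the "code|character" layout is just one possible way to combine the two arrays, and whether SPLIT() honours case without the setoption is left untested here, so the sample sticks to codes that have no upper/lowercase twin:

```kixtart
; Sketch only: $Pairs is a hypothetical combined table, one "code|character"
; string per entry. Each code is replaced by splitting on it and joining
; the pieces back together with the character.
Dim $Pairs[3], $Pair, $Parts, $Text

$Pairs[0] = "&quot;|" + Chr(34)
$Pairs[1] = "&lt;|<"
$Pairs[2] = "&gt;|>"
$Pairs[3] = "&amp;|&"      ; handled last so "&amp;lt;" ends up as "&lt;", not "<"

$Text = "1 &lt; 2 &amp;&amp; 3 &gt; 2"
For Each $Pair In $Pairs
    $Parts = Split($Pair, "|")                        ; [0] = code, [1] = character
    $Text = Join(Split($Text, $Parts[0]), $Parts[1])  ; replace every occurrence
Next
? $Text
```

This does one split and one join per table entry regardless of what the input contains, and it does not handle numeric codes such as &#34;, which would still need the CHR() branch.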
#101912 - 2003-06-10 02:26 PM Re: HTMLtoText, has anyone done that?
masken Offline
Lonk, as on IM, this would split and join arrays 120 times, statically by the array elements, instead of just finding the right CHR directly. Think that will be slower?

I've adjusted the code now, so that the previous CaseSensitivity will be reset when the script exits.