#101874 - 2003-06-05 04:38 PM
Re: HTMLtoText, has anyone done that?
|
Kdyer
KiX Supporter
   
Registered: 2001-01-03
Posts: 6241
Loc: Tigard, OR
|
Have you had a look at something like this?
http://www.ascii.cl/htmlcodes.htm
Kent
#101880 - 2003-06-06 01:01 PM
Re: HTMLtoText, has anyone done that?
|
masken
MM club member
   
Registered: 2000-11-27
Posts: 1222
Loc: Gothenburg, Sweden
|
Ok, this works, with some limitations. Please comment:
code:
;FUNCTION      HTMLtoText()
;
;ACTION        Convert a HTML source string into plain text
;
;AUTHOR        masken (masken|at|home.se)
;
;VERSION       1.0
;
;DATE CREATED  2003-06-06
;
;DATE MODIFIED -
;
;KIXTART       4.12+
;
;SYNTAX        HTMLtoText([STRING])
;
;PARAMETERS    STRING
;              The string you want to convert.
;
;RETURNS       1 If all HTML codes were converted
;              2 If there were no codes to convert
;              3 If not all codes could be converted
;
;REMARKS       * If there's a ";" in the $STRING (ie; the HTML source),
;                the conversion will go wrong (there shouldn't be any though).
;              * Only handles decimal codes. Some webpage code might have hex codes.
;              * Doesn't handle alternatives, like "&quot;" and "&nbsp;" for example.
;
;              The list of decimal codes used for the arrays were taken from here:
;              http://tergestesoft.com/~eddysworld/spechars.htm
;
;DEPENDENCIES  none
;
;EXAMPLE       $htmlstring = "Implac&#225;vel"
;              $result = HTMLtoText($htmlstring)
;
;              $result will be: "Implacável"
;
;KIXTART BBS   http://www.kixtart.org/cgi-bin/ultimatebb.cgi?ubb=get_topic&f=14&t=000765
;
;===============================================================================
Function HTMLtoText($string)
  ;| Exit if there's nothing to convert
  IF INSTR($string, "&#") = 0
    $HTMLtoText = $string ;| hand the input back unchanged
    EXIT 2
  ENDIF
  ;| HTML decimal codes, in the same order as the $Char table
  DIM $Code[157]
  $Code[0] = "&#09;" $Code[1] = "&#10;" $Code[2] = "&#13;" $Code[3] = "&#32;"
  $Code[4] = "&#33;" $Code[5] = "&#34;" $Code[6] = "&#35;" $Code[7] = "&#36;"
  $Code[8] = "&#37;" $Code[9] = "&#38;" $Code[10] = "&#39;" $Code[11] = "&#40;"
  $Code[12] = "&#41;" $Code[13] = "&#42;" $Code[14] = "&#43;" $Code[15] = "&#44;"
  $Code[16] = "&#45;" $Code[17] = "&#46;" $Code[18] = "&#47;" $Code[19] = "&#58;"
  $Code[20] = "&#59;" $Code[21] = "&#60;" $Code[22] = "&#61;" $Code[23] = "&#62;"
  $Code[24] = "&#63;" $Code[25] = "&#64;" $Code[26] = "&#91;" $Code[27] = "&#92;"
  $Code[28] = "&#93;" $Code[29] = "&#94;" $Code[30] = "&#95;" $Code[31] = "&#96;"
  $Code[32] = "&#123;" $Code[33] = "&#124;" $Code[34] = "&#125;" $Code[35] = "&#126;"
  $Code[36] = "&#130;" $Code[37] = "&#131;" $Code[38] = "&#132;" $Code[39] = "&#133;"
  $Code[40] = "&#134;" $Code[41] = "&#135;" $Code[42] = "&#136;" $Code[43] = "&#137;"
  $Code[44] = "&#138;" $Code[45] = "&#139;" $Code[46] = "&#140;" $Code[47] = "&#145;"
  $Code[48] = "&#146;" $Code[49] = "&#147;" $Code[50] = "&#148;" $Code[51] = "&#149;"
  $Code[52] = "&#150;" $Code[53] = "&#151;" $Code[54] = "&#152;" $Code[55] = "&#153;"
  $Code[56] = "&#154;" $Code[57] = "&#155;" $Code[58] = "&#156;" $Code[59] = "&#159;"
  $Code[60] = "&#160;" $Code[61] = "&#161;" $Code[62] = "&#162;" $Code[63] = "&#163;"
  $Code[64] = "&#164;" $Code[65] = "&#165;" $Code[66] = "&#166;" $Code[67] = "&#167;"
  $Code[68] = "&#168;" $Code[69] = "&#169;" $Code[70] = "&#170;" $Code[71] = "&#171;"
  $Code[72] = "&#172;" $Code[73] = "&#173;" $Code[74] = "&#174;" $Code[75] = "&#175;"
  $Code[76] = "&#176;" $Code[77] = "&#177;" $Code[78] = "&#178;" $Code[79] = "&#179;"
  $Code[80] = "&#180;" $Code[81] = "&#181;" $Code[82] = "&#182;" $Code[83] = "&#183;"
  $Code[84] = "&#184;" $Code[85] = "&#185;" $Code[86] = "&#186;" $Code[87] = "&#187;"
  $Code[88] = "&#188;" $Code[89] = "&#189;" $Code[90] = "&#190;" $Code[91] = "&#191;"
  $Code[92] = "&#192;" $Code[93] = "&#193;" $Code[94] = "&#194;" $Code[95] = "&#195;"
  $Code[96] = "&#196;" $Code[97] = "&#197;" $Code[98] = "&#198;" $Code[99] = "&#199;"
  $Code[100] = "&#200;" $Code[101] = "&#201;" $Code[102] = "&#202;" $Code[103] = "&#203;"
  $Code[104] = "&#204;" $Code[105] = "&#205;" $Code[106] = "&#206;" $Code[107] = "&#207;"
  $Code[108] = "&#208;" $Code[109] = "&#209;" $Code[110] = "&#210;" $Code[111] = "&#211;"
  $Code[112] = "&#212;" $Code[113] = "&#213;" $Code[114] = "&#214;" $Code[115] = "&#215;"
  $Code[116] = "&#216;" $Code[117] = "&#217;" $Code[118] = "&#218;" $Code[119] = "&#219;"
  $Code[120] = "&#220;" $Code[121] = "&#221;" $Code[122] = "&#222;" $Code[123] = "&#223;"
  $Code[124] = "&#224;" $Code[125] = "&#225;" $Code[126] = "&#226;" $Code[127] = "&#227;"
  $Code[128] = "&#228;" $Code[129] = "&#229;" $Code[130] = "&#230;" $Code[131] = "&#231;"
  $Code[132] = "&#232;" $Code[133] = "&#233;" $Code[134] = "&#234;" $Code[135] = "&#235;"
  $Code[136] = "&#236;" $Code[137] = "&#237;" $Code[138] = "&#238;" $Code[139] = "&#239;"
  $Code[140] = "&#240;" $Code[141] = "&#241;" $Code[142] = "&#242;" $Code[143] = "&#243;"
  $Code[144] = "&#244;" $Code[145] = "&#245;" $Code[146] = "&#246;" $Code[147] = "&#247;"
  $Code[148] = "&#248;" $Code[149] = "&#249;" $Code[150] = "&#250;" $Code[151] = "&#251;"
  $Code[152] = "&#252;" $Code[153] = "&#253;" $Code[154] = "&#254;" $Code[155] = "&#255;"
DIM $Char[157] $Char[0] = CHR(9) $Char[1] = CHR(10) $Char[2] = CHR(13) $Char[3] = " " $Char[4] = "!" $Char[5] = CHR(34) $Char[6] = "#" $Char[7] = CHR(36) $Char[8] = "%" $Char[9] = "&" $Char[10] = CHR(39) $Char[11] = "(" $Char[12] = ")" $Char[13] = "*" $Char[14] = "+" $Char[15] = "," $Char[16] = "-" $Char[17] = "." $Char[18] = "/" $Char[19] = ":" $Char[20] = ";" $Char[21] = "<" $Char[22] = "=" $Char[23] = ">" $Char[24] = "?" $Char[25] = "@" $Char[26] = "[" $Char[27] = "\" $Char[28] = "]" $Char[29] = "^" $Char[30] = "_" $Char[31] = "`" $Char[32] = "{" $Char[33] = "|" $Char[34] = "}" $Char[35] = "~" $Char[36] = "‚" $Char[37] = "ƒ" $Char[38] = "„" $Char[39] = "…" $Char[40] = "†" $Char[41] = "‡" $Char[42] = "ˆ" $Char[43] = "‰" $Char[44] = "Š" $Char[45] = "‹" $Char[46] = "Œ" $Char[47] = "‘" $Char[48] = "’" $Char[49] = "“" $Char[50] = "”" $Char[51] = "•" $Char[52] = "–" $Char[53] = "—" $Char[54] = "˜" $Char[55] = "™" $Char[56] = "š" $Char[57] = "›" $Char[58] = "œ" $Char[59] = "Ÿ" $Char[60] = " " $Char[61] = "¡" $Char[62] = "¢" $Char[63] = "£" $Char[64] = "¤" $Char[65] = "¥" $Char[66] = "¦" $Char[67] = "§" $Char[68] = "¨" $Char[69] = "©" $Char[70] = "ª" $Char[71] = "«" $Char[72] = "¬" $Char[73] = "" $Char[74] = "®" $Char[75] = "¯" $Char[76] = "°" $Char[77] = "±" $Char[78] = "²" $Char[79] = "³" $Char[80] = "´" $Char[81] = "µ" $Char[82] = "¶" $Char[83] = "·" $Char[84] = "¸" $Char[85] = "¹" $Char[86] = "º" $Char[87] = "»" $Char[88] = "¼" $Char[89] = "½" $Char[90] = "¾" $Char[91] = "¿" $Char[92] = "À" $Char[93] = "Á" $Char[94] = "Â" $Char[95] = "Ã" $Char[96] = "Ä" $Char[97] = "Å" $Char[98] = "Æ" $Char[99] = "Ç" $Char[100] = "È" $Char[101] = "É" $Char[102] = "Ê" $Char[103] = "Ë" $Char[104] = "Ì" $Char[105] = "Í" $Char[106] = "Î" $Char[107] = "Ï" $Char[108] = "Ð" $Char[109] = "Ñ" $Char[110] = "Ò" $Char[111] = "Ó" $Char[112] = "Ô" $Char[113] = "Õ" $Char[114] = "Ö" $Char[115] = "×" $Char[116] = "Ø" $Char[117] = "Ù" $Char[118] = "Ú" $Char[119] = "Û" $Char[120] = "Ü" $Char[121] = 
"Ý" $Char[122] = "Þ" $Char[123] = "ß" $Char[124] = "à" $Char[125] = "á" $Char[126] = "â" $Char[127] = "ã" $Char[128] = "ä" $Char[129] = "å" $Char[130] = "æ" $Char[131] = "ç" $Char[132] = "è" $Char[133] = "é" $Char[134] = "ê" $Char[135] = "ë" $Char[136] = "ì" $Char[137] = "í" $Char[138] = "î" $Char[139] = "ï" $Char[140] = "ð" $Char[141] = "ñ" $Char[142] = "ò" $Char[143] = "ó" $Char[144] = "ô" $Char[145] = "õ" $Char[146] = "ö" $Char[147] = "÷" $Char[148] = "ø" $Char[149] = "ù" $Char[150] = "ú" $Char[151] = "û" $Char[152] = "ü" $Char[153] = "ý" $Char[154] = "þ" $Char[155] = "ÿ"
  $MaxTries = LEN($string) / 5
  WHILE INSTR($string, "&#") <> 0 AND $Tries < $MaxTries
    $Tries = $Tries + 1
    $CodeStart = INSTR($string, "&#")
    $CodeEnd = INSTR($string, ";") + 1
    $CodeToReplace = SUBSTR($string, $CodeStart, $CodeEnd - $CodeStart)
    $CodeAPos = ASCAN($Code, $CodeToReplace)
    IF $CodeAPos <> -1 ;| -1 means the code is not in the $Code table
      $CharToInsert = $Char[$CodeAPos]
      $string = SUBSTR($string, 1, $CodeStart - 1) + $CharToInsert + SUBSTR($string, $CodeEnd, LEN($string))
    ENDIF
  LOOP
  $HTMLtoText = $string
  ;| The status is passed back through the exit code
  IF INSTR($HTMLtoText, "&#") <> 0
    EXIT 3
  ELSE
    EXIT 1
  ENDIF
EndFunction
[ 06. June 2003, 13:07: Message edited by: masken ]
_________________________
The tart is out there
#101885 - 2003-06-07 10:35 AM
Re: HTMLtoText, has anyone done that?
|
masken
MM club member
   
Registered: 2000-11-27
Posts: 1222
Loc: Gothenburg, Sweden
|
hehe.. thanks m8
btw; I took the liberty to make your code readable too
code:
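FUNCTION html2text($TxtToConvert)
  DIM $Count, $Code
  $TxtToConvert = SPLIT($TxtToConvert, "&#")
  ; Element 0 is the text before the first "&#", so decoding starts at element 1
  FOR $Count = 1 TO UBOUND($TxtToConvert)
    ; Everything up to the first ";" is the decimal code
    $Code = SPLIT($TxtToConvert[$Count], ";")[0]
    ; Swap the code for its character and skip past the ";"
    $TxtToConvert[$Count] = CHR(VAL($Code)) + SUBSTR($TxtToConvert[$Count], LEN($Code) + 2)
  NEXT
  ; Glue the pieces back together into one string
  $html2text = JOIN($TxtToConvert, "")
ENDFUNCTION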
I'll look into whether the CHR() is really useful here... but if you look at the good old ASCII table displayer, you'll see that the codes don't match?
code:
BREAK ON
CLS
AT(9,22) "Starting Windows in progress..."
BOX(10,10,12,65,"single")
$COL = 12
; WHILE $COL < 64 ; disabled: this WHILE has no matching LOOP
SETASCII("on")
$x = 0 ; first character code to display
$a = 1
CLS
WHILE $a < 6 ;Change this value to get more or less numbers
  $y = 1
  WHILE $y <= 80
    $z = 1
    WHILE $z <= 20
      ; print the code followed by the character it produces
      AT($z,$y) "$x " + CHR($x)
      $z = $z + 1
      $x = $x + 1
    LOOP
    $y = $y + 10
  LOOP
  ?
  SHELL "%COMSPEC% /c pause"
  $a = $a + 1
LOOP
EXIT
#101887 - 2003-06-09 09:41 AM
Re: HTMLtoText, has anyone done that?
|
Richard H.
Administrator
   
Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
|
Be careful when talking about ASCII codes (and tables).
Standard ASCII is 7 bit, and so only covers codes 0-127.
The "extended" ASCII sets are 8 bit (0-255), and the extra symbols are used for national language replacement and things like line drawing characters.
The biggest problem is that the characters above 127 are not constant. This is why in DOS you used to have to load "codeset" pages, which installed a local version of the symbols - these are often called things like "Latin-8" or "IBM Line drawing".
We still have code pages even now, especially in email and HTML - check out the "charset=" header found in mail headers and in web server responses, which defines an ISO character set. You may occasionally get email from the Internet with a warning that the character set is not supported, where it comes from a country which uses a different alphabet. In the Roman alphabet world we would have trouble dealing with Kanji or Cyrillic alphabets unless those code pages have been installed.
Printers also have code pages - when you select a font in say Word, the application will translate it into typeface, weight, and character set amongst other things.
What all this means is that the symbols which appear may well be different on different PCs for the same number. Indeed, the symbol that appears in your word processing document may not be the one that appears on the page when you print it.
If you want to see the same things you need to be sure that the devices are set the same.
The adoption of Unicode will avoid these problems, as each symbol will have a unique number.
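To make the 7-bit point concrete, here is a minimal KiXtart sketch. IsPlainAscii() is a hypothetical helper, not something posted in this thread, and it assumes ASC() returns the numeric code of a single character. It reports whether a string holds only codes 0-127, i.e. characters that should look the same whatever code page is loaded.

```kixtart
; Hypothetical helper, not from the thread.
; Returns 1 if every character of $Text is 7-bit ASCII (codes 0-127), else 0.
Function IsPlainAscii($Text)
  Dim $i
  $IsPlainAscii = 1
  For $i = 1 To Len($Text)
    ; Codes above 127 depend on the code page that is loaded
    If Asc(SubStr($Text, $i, 1)) > 127
      $IsPlainAscii = 0
    EndIf
  Next
EndFunction

? IsPlainAscii("Implacable")
? IsPlainAscii("Implac" + Chr(225) + "vel")
```

The first call should report 1 and the second 0, since code 225 lies in the range that changes from one code page to the next.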
#101889 - 2003-06-09 04:45 PM
Re: HTMLtoText, has anyone done that?
|
masken
MM club member
   
Registered: 2000-11-27
Posts: 1222
Loc: Gothenburg, Sweden
|
Yeah... you're right about the ASCII codes there, Richard. But I think in webpages it's a static table? I.e. the one below? This would also make sure any local OS language gets the codes right?
Right now though... why doesn't this work? ASCAN() always returns -1. The IF NOT case still kicks in too.
code:
FUNCTION HTMLtoText($string)
  ;| Exit if there's nothing to convert
  IF INSTR($string, "&") = 0 OR INSTR($string, ";") = 0
    EXIT 2
  ENDIF
  ;| Each entry: character + "&&" + decimal code [+ "&&" + named code]
  DIM $Codes[157]
  $Codes[0] = CHR(9) + "&&" + "&#09;"
  $Codes[1] = CHR(10) + "&&" + "&#10;"
  $Codes[2] = CHR(13) + "&&" + "&#13;"
  $Codes[3] = " " + "&&" + "&#32;"
  $Codes[4] = "!" + "&&" + "&#33;"
  $Codes[5] = CHR(34) + "&&" + "&#34;" + "&&" + "&quot;"
  $Codes[6] = "#" + "&&" + "&#35;"
  $Codes[7] = CHR(36) + "&&" + "&#36;"
  $Codes[8] = "%" + "&&" + "&#37;"
  $Codes[9] = "&" + "&&" + "&#38;" + "&&" + "&amp;"
  $Codes[10] = CHR(39) + "&&" + "&#39;"
  $Codes[11] = "(" + "&&" + "&#40;"
  $Codes[12] = ")" + "&&" + "&#41;"
  $Codes[13] = "*" + "&&" + "&#42;"
  $Codes[14] = "+" + "&&" + "&#43;"
  $Codes[15] = "," + "&&" + "&#44;"
  $Codes[16] = "-" + "&&" + "&#45;"
  $Codes[17] = "." + "&&" + "&#46;"
  $Codes[18] = "/" + "&&" + "&#47;"
  $Codes[19] = ":" + "&&" + "&#58;"
  $Codes[20] = ";" + "&&" + "&#59;"
  $Codes[21] = "<" + "&&" + "&#60;" + "&&" + "&lt;"
  $Codes[22] = "=" + "&&" + "&#61;"
  $Codes[23] = ">" + "&&" + "&#62;" + "&&" + "&gt;"
  $Codes[24] = "?" + "&&" + "&#63;"
  $Codes[25] = "@" + "&&" + "&#64;"
  $Codes[26] = "[" + "&&" + "&#91;"
  $Codes[27] = "\" + "&&" + "&#92;"
  $Codes[28] = "]" + "&&" + "&#93;"
  $Codes[29] = "^" + "&&" + "&#94;"
  $Codes[30] = "_" + "&&" + "&#95;"
  $Codes[31] = "`" + "&&" + "&#96;"
  $Codes[32] = "{" + "&&" + "&#123;"
  $Codes[33] = "|" + "&&" + "&#124;"
  $Codes[34] = "}" + "&&" + "&#125;"
  $Codes[35] = "~" + "&&" + "&#126;"
  $Codes[36] = "‚" + "&&" + "&#130;"
  $Codes[37] = "ƒ" + "&&" + "&#131;"
  $Codes[38] = "„" + "&&" + "&#132;"
  $Codes[39] = "…" + "&&" + "&#133;"
  $Codes[40] = "†" + "&&" + "&#134;"
  $Codes[41] = "‡" + "&&" + "&#135;"
  $Codes[42] = "ˆ" + "&&" + "&#136;"
  $Codes[43] = "‰" + "&&" + "&#137;"
  $Codes[44] = "Š" + "&&" + "&#138;"
  $Codes[45] = "‹" + "&&" + "&#139;"
  $Codes[46] = "Œ" + "&&" + "&#140;"
  $Codes[47] = "‘" + "&&" + "&#145;"
  $Codes[48] = "’" + "&&" + "&#146;"
  $Codes[49] = "“" + "&&" + "&#147;"
  $Codes[50] = "”" + "&&" + "&#148;"
  $Codes[51] = "•" + "&&" + "&#149;"
  $Codes[52] = "–" + "&&" + "&#150;"
  $Codes[53] = "—" + "&&" + "&#151;"
  $Codes[54] = "˜" + "&&" + "&#152;"
  $Codes[55] = "™" + "&&" + "&#153;" + "&&" + "&trade;"
  $Codes[56] = "š" + "&&" + "&#154;"
  $Codes[57] = "›" + "&&" + "&#155;"
  $Codes[58] = "œ" + "&&" + "&#156;"
  $Codes[59] = "Ÿ" + "&&" + "&#159;"
  $Codes[60] = CHR(160) + "&&" + "&#160;" + "&&" + "&nbsp;"
  $Codes[61] = "¡" + "&&" + "&#161;" + "&&" + "&iexcl;"
  $Codes[62] = "¢" + "&&" + "&#162;" + "&&" + "&cent;"
  $Codes[63] = "£" + "&&" + "&#163;" + "&&" + "&pound;"
  $Codes[64] = "¤" + "&&" + "&#164;" + "&&" + "&curren;"
  $Codes[65] = "¥" + "&&" + "&#165;" + "&&" + "&yen;"
  $Codes[66] = "¦" + "&&" + "&#166;" + "&&" + "&brvbar;"
  $Codes[67] = "§" + "&&" + "&#167;" + "&&" + "&sect;"
  $Codes[68] = "¨" + "&&" + "&#168;" + "&&" + "&uml;"
  $Codes[69] = "©" + "&&" + "&#169;" + "&&" + "&copy;"
  $Codes[70] = "ª" + "&&" + "&#170;" + "&&" + "&ordf;"
  $Codes[71] = "«" + "&&" + "&#171;" + "&&" + "&laquo;"
  $Codes[72] = "¬" + "&&" + "&#172;" + "&&" + "&not;"
  $Codes[73] = CHR(173) + "&&" + "&#173;" + "&&" + "&shy;"
  $Codes[74] = "®" + "&&" + "&#174;" + "&&" + "&reg;"
  $Codes[75] = "¯" + "&&" + "&#175;" + "&&" + "&macr;"
  $Codes[76] = "°" + "&&" + "&#176;" + "&&" + "&deg;"
  $Codes[77] = "±" + "&&" + "&#177;" + "&&" + "&plusmn;"
  $Codes[78] = "²" + "&&" + "&#178;" + "&&" + "&sup2;"
  $Codes[79] = "³" + "&&" + "&#179;" + "&&" + "&sup3;"
  $Codes[80] = "´" + "&&" + "&#180;" + "&&" + "&acute;"
  $Codes[81] = "µ" + "&&" + "&#181;" + "&&" + "&micro;"
  $Codes[82] = "¶" + "&&" + "&#182;" + "&&" + "&para;"
  $Codes[83] = "·" + "&&" + "&#183;" + "&&" + "&middot;"
  $Codes[84] = "¸" + "&&" + "&#184;" + "&&" + "&cedil;"
  $Codes[85] = "¹" + "&&" + "&#185;" + "&&" + "&sup1;"
  $Codes[86] = "º" + "&&" + "&#186;" + "&&" + "&ordm;"
  $Codes[87] = "»" + "&&" + "&#187;" + "&&" + "&raquo;"
  $Codes[88] = "¼" + "&&" + "&#188;" + "&&" + "&frac14;"
  $Codes[89] = "½" + "&&" + "&#189;" + "&&" + "&frac12;"
  $Codes[90] = "¾" + "&&" + "&#190;" + "&&" + "&frac34;"
  $Codes[91] = "¿" + "&&" + "&#191;" + "&&" + "&iquest;"
  $Codes[92] = "À" + "&&" + "&#192;" + "&&" + "&Agrave;"
  $Codes[93] = "Á" + "&&" + "&#193;" + "&&" + "&Aacute;"
  $Codes[94] = "Â" + "&&" + "&#194;" + "&&" + "&Acirc;"
  $Codes[95] = "Ã" + "&&" + "&#195;" + "&&" + "&Atilde;"
  $Codes[96] = "Ä" + "&&" + "&#196;" + "&&" + "&Auml;"
  $Codes[97] = "Å" + "&&" + "&#197;" + "&&" + "&Aring;"
  $Codes[98] = "Æ" + "&&" + "&#198;" + "&&" + "&AElig;"
  $Codes[99] = "Ç" + "&&" + "&#199;" + "&&" + "&Ccedil;"
  $Codes[100] = "È" + "&&" + "&#200;" + "&&" + "&Egrave;"
  $Codes[101] = "É" + "&&" + "&#201;" + "&&" + "&Eacute;"
  $Codes[102] = "Ê" + "&&" + "&#202;" + "&&" + "&Ecirc;"
  $Codes[103] = "Ë" + "&&" + "&#203;" + "&&" + "&Euml;"
  $Codes[104] = "Ì" + "&&" + "&#204;" + "&&" + "&Igrave;"
  $Codes[105] = "Í" + "&&" + "&#205;" + "&&" + "&Iacute;"
  $Codes[106] = "Î" + "&&" + "&#206;" + "&&" + "&Icirc;"
  $Codes[107] = "Ï" + "&&" + "&#207;" + "&&" + "&Iuml;"
  $Codes[108] = "Ð" + "&&" + "&#208;" + "&&" + "&ETH;"
  $Codes[109] = "Ñ" + "&&" + "&#209;" + "&&" + "&Ntilde;"
  $Codes[110] = "Ò" + "&&" + "&#210;" + "&&" + "&Ograve;"
  $Codes[111] = "Ó" + "&&" + "&#211;" + "&&" + "&Oacute;"
  $Codes[112] = "Ô" + "&&" + "&#212;" + "&&" + "&Ocirc;"
  $Codes[113] = "Õ" + "&&" + "&#213;" + "&&" + "&Otilde;"
  $Codes[114] = "Ö" + "&&" + "&#214;" + "&&" + "&Ouml;"
  $Codes[115] = "×" + "&&" + "&#215;" + "&&" + "&times;"
  $Codes[116] = "Ø" + "&&" + "&#216;" + "&&" + "&Oslash;"
  $Codes[117] = "Ù" + "&&" + "&#217;" + "&&" + "&Ugrave;"
  $Codes[118] = "Ú" + "&&" + "&#218;" + "&&" + "&Uacute;"
  $Codes[119] = "Û" + "&&" + "&#219;" + "&&" + "&Ucirc;"
  $Codes[120] = "Ü" + "&&" + "&#220;" + "&&" + "&Uuml;"
  $Codes[121] = "Ý" + "&&" + "&#221;" + "&&" + "&Yacute;"
  $Codes[122] = "Þ" + "&&" + "&#222;" + "&&" + "&THORN;"
  $Codes[123] = "ß" + "&&" + "&#223;" + "&&" + "&szlig;"
  $Codes[124] = "à" + "&&" + "&#224;" + "&&" + "&agrave;"
  $Codes[125] = "á" + "&&" + "&#225;" + "&&" + "&aacute;"
  $Codes[126] = "â" + "&&" + "&#226;" + "&&" + "&acirc;"
  $Codes[127] = "ã" + "&&" + "&#227;" + "&&" + "&atilde;"
  $Codes[128] = "ä" + "&&" + "&#228;" + "&&" + "&auml;"
  $Codes[129] = "å" + "&&" + "&#229;" + "&&" + "&aring;"
  $Codes[130] = "æ" + "&&" + "&#230;" + "&&" + "&aelig;"
  $Codes[131] = "ç" + "&&" + "&#231;" + "&&" + "&ccedil;"
  $Codes[132] = "è" + "&&" + "&#232;" + "&&" + "&egrave;"
  $Codes[133] = "é" + "&&" + "&#233;" + "&&" + "&eacute;"
  $Codes[134] = "ê" + "&&" + "&#234;" + "&&" + "&ecirc;"
  $Codes[135] = "ë" + "&&" + "&#235;" + "&&" + "&euml;"
  $Codes[136] = "ì" + "&&" + "&#236;" + "&&" + "&igrave;"
  $Codes[137] = "í" + "&&" + "&#237;" + "&&" + "&iacute;"
  $Codes[138] = "î" + "&&" + "&#238;" + "&&" + "&icirc;"
  $Codes[139] = "ï" + "&&" + "&#239;" + "&&" + "&iuml;"
  $Codes[140] = "ð" + "&&" + "&#240;" + "&&" + "&eth;"
  $Codes[141] = "ñ" + "&&" + "&#241;" + "&&" + "&ntilde;"
  $Codes[142] = "ò" + "&&" + "&#242;" + "&&" + "&ograve;"
  $Codes[143] = "ó" + "&&" + "&#243;" + "&&" + "&oacute;"
  $Codes[144] = "ô" + "&&" + "&#244;" + "&&" + "&ocirc;"
  $Codes[145] = "õ" + "&&" + "&#245;" + "&&" + "&otilde;"
  $Codes[146] = "ö" + "&&" + "&#246;" + "&&" + "&ouml;"
  $Codes[147] = "÷" + "&&" + "&#247;" + "&&" + "&divide;"
  $Codes[148] = "ø" + "&&" + "&#248;" + "&&" + "&oslash;"
  $Codes[149] = "ù" + "&&" + "&#249;" + "&&" + "&ugrave;"
  $Codes[150] = "ú" + "&&" + "&#250;" + "&&" + "&uacute;"
  $Codes[151] = "û" + "&&" + "&#251;" + "&&" + "&ucirc;"
  $Codes[152] = "ü" + "&&" + "&#252;" + "&&" + "&uuml;"
  $Codes[153] = "ý" + "&&" + "&#253;" + "&&" + "&yacute;"
  $Codes[154] = "þ" + "&&" + "&#254;" + "&&" + "&thorn;"
  $Codes[155] = "ÿ" + "&&" + "&#255;" + "&&" + "&yuml;"
  $MaxTries = LEN("$string") / 5
  WHILE INSTR($string, "&") <> 0 AND $Tries < $MaxTries
    $Tries = $Tries + 1
    $CodeFound = 0
    $CodeStart = INSTR("$string", "&")
    $CodeEnd = INSTR("$string", ";") + 1
    $CodeToReplace = SUBSTR("$string", $CodeStart, $CodeEnd - $CodeStart)
    $CodeLength = LEN("$CodeToReplace")
    ;---TEST
    ? "CodeStart: " + $CodeStart
    ? "CodeEnd: " + $CodeEnd
    ? "CodeToReplace: " + CHR(34) + $CodeToReplace + CHR(34)
    ? "CodeLength: " + $CodeLength
    ;---/TEST
    IF $CodeLength > 3 AND $CodeLength < 9 ;|All codes are between 4-8 characters long.
      $CodeAPos = ASCAN($Codes, $CodeToReplace)
      ? "CodeAPos: " + $CodeAPos
      IF NOT $CodePos < 0 ;|The code exists
        $CodeFound = 1
        $CharToInsert = $Codes[$CodeAPos]
        ;---TEST
        ? "CharToInsert: " + $CharToInsert
        ;---/TEST
        $CharToInsert = SPLIT($CharToInsert, "&&")[0]
        ;---TEST
        ? "CharToInsert: " + $CharToInsert
        ;---/TEST
        $string = SUBSTR("$string", 1, $CodeStart - 1) + $CharToInsert + SUBSTR("$string", $CodeEnd, LEN("$string"))
      ENDIF
    ENDIF
    IF $CodeFound <> 1 ;|we need to skip the part which isn't convertible before the next loop
      $stringNoChar = $stringNoChar + SUBSTR("$string", 1, $CodeEnd)
      $string = SUBSTR("$string", $CodeEnd + 1, LEN("$string"))
    ENDIF
  LOOP
  $HTMLtoText = $string
  IF INSTR($HTMLtoText, "&#") <> 0
    EXIT 3
  ELSE
    EXIT 1
  ENDIF
ENDFUNCTION
[ 09. June 2003, 16:55: Message edited by: masken ]
#101891 - 2003-06-09 05:13 PM
Re: HTMLtoText, has anyone done that?
|
Richard H.
Administrator
   
Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK
|
Actually, the "standard" symbols are taken from a couple of code pages - more information here
The idea is that it is up to the renderer to decide what the character should look like. That information is contained in the HTTP headers which are sent before the HTML data - you won't see them if you "view" the HTML text, but you can get hold of them through the COM interface.
If you are simply converting the characters from the "&nnn;" format you should leave them as whatever appears by using the CHR() function, otherwise you are translating to a completely different character.
If you are writing a rendering application then you can change the characters to whatever you want, but you need to be aware that if you ignore the code set which the webserver is expecting you to have installed you may end up with odd results.
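A rough sketch of reading that header through COM, assuming the Microsoft.XMLHTTP object is available on the machine. GetContentType() is a made-up name and the URL in the usage line is only a placeholder; getResponseHeader() is a method of the XMLHTTP request object.

```kixtart
; Hypothetical helper: return the Content-Type header the web server sends for $URL.
Function GetContentType($URL)
  Dim $http
  $http = CreateObject("Microsoft.XMLHTTP")
  If @ERROR
    Exit @ERROR                   ; the COM object could not be created
  EndIf
  $http.open("GET", $URL, 0)      ; 0 = wait for the reply (synchronous)
  $http.send
  ; The header carries the MIME type and, when the server sets it, the charset
  $GetContentType = $http.getResponseHeader("Content-Type")
EndFunction

? GetContentType("http://www.example.com/")   ; placeholder URL
```

A typical value looks like "text/html; charset=ISO-8859-1". If the server leaves the charset out, the sketch gives you no way to tell which code page was intended.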
#101892 - 2003-06-09 05:17 PM
Re: HTMLtoText, has anyone done that?
|
masken
MM club member
   
Registered: 2000-11-27
Posts: 1222
Loc: Gothenburg, Sweden
|
Oh, thanks Lonk... ASCAN() needs perfect array matches, not partial... guess the manual could be a bit more precise there than "Searches an array for an element containing the same value as an expression.".
But there's still a need for an array for the "alternative" codes (last in some of the array entries above).
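For the named codes, one option that sidesteps ASCAN() altogether is a pair of parallel lists plus the SPLIT()/JOIN() replace idiom. This is only a minimal sketch: NamedToText() and the short lists in it are illustrative, not the table posted above. As far as I know KiX compares strings case-insensitively by default, so pairs such as "&Eacute;" and "&eacute;" would need extra care that this sketch does not attempt.

```kixtart
; Hypothetical helper, not from the thread: decode a few named HTML codes.
Function NamedToText($Text)
  Dim $Names, $Chars, $i
  ; Parallel lists: every $Names[n] is replaced by $Chars[n].
  ; "&amp;" comes last so that "&amp;lt;" ends up as "&lt;" and not as "<".
  $Names = Split("&quot;|&lt;|&gt;|&nbsp;|&amp;", "|")
  $Chars = Split(Chr(34) + "|<|>| |&", "|")
  For $i = 0 To UBound($Names)
    ; Split on the code, then Join with its character: replaces every occurrence
    $Text = Join(Split($Text, $Names[$i]), $Chars[$i])
  Next
  $NamedToText = $Text
EndFunction

? NamedToText("&lt;b&gt; &amp;amp; fish &amp; chips")
```

With the order used here the usage line should come out as "<b> &amp; fish & chips". The non-breaking space is mapped to a plain space on purpose, since CHR(160) would again depend on the code page.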
Ok Richard, thanks. The HTML in this case is the page returned by another function, which Lonkero wrote:
code:
; Loads a webpage into a variable
Function GetPage($URL)
  DIM $htmldata
  $htmldata = CreateObject("microsoft.XMLhttp")
  $htmldata.open("GET",$URL,not 1)
  $htmldata.send
  $getpage=$htmldata.responsetext ;or responsebody
EndFunction
So I guess what you're saying is that I should use Lonkero's CHR() based conversion for numeric format entries (with SETASCII("ON") in KiX), and an array for the Name format codes?
[ 09. June 2003, 17:30: Message edited by: masken ]