XML parsing and unicode text fields
Posted: Wed Aug 12, 2009 3:17 am
I'm having an issue parsing XML from various websites. The XML tags parse just fine, but numeric character references in the inner text (for example, &#x2019;, which should become ’) are not converted to Unicode characters. For example (first code block below):
This returns a tag whose inner text is the literal string "&#x2019;", which is wrong. I can match these references with a regular expression and convert them with numToChar with useUnicode set to true, but the character still doesn't appear properly in the text field. Sample code (second code block below):
Either numToChar doesn't work properly in this situation or I'm doing something wrong. Even if numToChar is returning the right character, what the field displays is very wrong, regardless of the font I use in the field.
Of course, the ideal would be for revXML to handle the translation itself (maybe it does and I'm doing something wrong?).
Any thoughts?
Jeff M.
Code:
<some_tag>&#x2019;</some_tag>
Code:
set useUnicode to true
-- strip all the html tags from the description
put it into tDesc
-- fix all the unicode characters
repeat while matchText(tDesc, "&#x([0-9A-Fa-f]+);", tUninum)
  -- tUninum holds the hex digits; despite its name, tUnihex receives the decimal value
  put baseConvert(tUninum, 16, 10) into tUnihex
  -- with useUnicode set to true, numToChar returns a double-byte character
  put numToChar(tUnihex) into tUnichar
  replace "&#x" & tUninum & ";" with tUnichar in tDesc
end repeat
put tDesc into field "Test"
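To pin down the result I'm after, here is a minimal sketch of the same conversion written in Python rather than Revolution script, so it only illustrates the intended logic and is not a fix for the field display. The names HEX_REF and decode_hex_refs and the sample string are made up for this illustration.

```python
import re

# Matches hexadecimal numeric character references such as &#x2019;
HEX_REF = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_hex_refs(text: str) -> str:
    """Replace each &#xHHHH; reference with the character it names."""

    def to_char(match: "re.Match[str]") -> str:
        code_point = int(match.group(1), 16)
        # Leave values that are not valid Unicode scalar values untouched
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return HEX_REF.sub(to_char, text)


# Hypothetical sample modelled on the tag shown in the first code block
sample = "<some_tag>It&#x2019;s here</some_tag>"
decoded = decode_hex_refs(sample)
```

In this sketch each reference is replaced in a single pass, so a reference that cannot be converted is simply left in place instead of being matched again.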