How to parse xml files with special Unicode chars?

Mon, 2006-11-20 03:10
Joined: 2006-11-20
Forum posts: 1
Hi all,

I'm using the XML Framework API in 3rd Edition to parse XML files, but CSenXmlReader always fails on files that contain non-ASCII Unicode characters, such as Chinese text.
I've tried converting the whole file to Unicode and to UTF-8 with the provided character conversion APIs, but it still doesn't work.

Here is the code I use to parse the XML file. It works well for XML files that contain only English text.
Code:
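// Open a file server session and make sure it is closed if a leave occurs.
RFs fs;
User::LeaveIfError(fs.Connect());
CleanupClosePushL(fs);

// Create the reader and the DOM fragment; push both so they are not leaked on a leave.
CSenXmlReader* reader = CSenXmlReader::NewL( EConvertTagsToLowerCase | ELastFeature );
CleanupStack::PushL(reader);
CSenDomFragment* dom = CSenDomFragment::NewL();
CleanupStack::PushL(dom);

// Wire the reader and the DOM fragment together.
dom->SetReader(*reader);
reader->SetContentHandler(*dom);

// Parse the file; err receives the leave code if parsing fails.
TRAPD( err, reader->ParseL(fs, aFileToParse) );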

However, when the file contains Chinese text, an error is trapped when executing "reader->ParseL(fs, aFileToParse)".

And the file to parse may look like:
Code:
<meta>
  <head>标题</head>
  ... ...
</meta>

I used RDebug to retrieve the error code as below:
Code:
RDebug::Printf("%d", err);

The error code I got is -996. I looked it up in the parser error enumeration in xmlparsererrors.h (S60_3rd\Epoc32\include\xml\), and it seems to be an EXmlInvalidToken error.

But I still have no idea what exactly causes this error.
How should I encode the file to prevent it? Please help!

By the way, I also tried Japanese text and Greek letters, and the error always occurs.

Mon, 2007-11-12 18:49
Joined: 2007-09-04
Forum posts: 5
Re: How to parse xml files with special Unicode chars?

Hi,

Does anybody have an idea how to solve this kind of problem? I'm trying to use the Nokia Web Services API, and I get the same error (EXmlInvalidToken) when the XML response contains ä or ö characters.

br,
Mikko

Mon, 2007-11-12 19:12
NewLC Administrator, Symbian Accredited, Forum Nokia Champion
Joined: 2003-01-14
Forum posts: 1918
Re: How to parse xml files with special Unicode chars?

I do not have your problem. I can use the XML parser (not the whole Web Service API though) and successfully parse content with many Nordic characters (ö, æ, ø, etc.).

I parse content in UTF-8 format (thus stored in a TDes8).


Eric Bustarret
NewLC Founder & CEO / Professional Symbian OS Consultant

Mon, 2007-11-12 22:07
Joined: 2005-11-20
Forum posts: 1154
Re: How to parse xml files with special Unicode chars?

Maybe your XML files are not correct, in quite a subtle way? E.g. the very first line says encoding="UTF-8", but then in the content the umlaut characters are not really in UTF-8 but in ISO-8859-1 / Latin 1, which is a contradiction that a parser probably won't tolerate.
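
One way to check for this kind of mismatch is to test whether the raw bytes are well-formed UTF-8 before handing them to the parser. The following is a minimal sketch in standard C++ (not Symbian-specific); the function name isValidUtf8 is hypothetical and not part of any Nokia or Symbian API. A lone Latin-1 byte such as 0xE4 (ä) fails the check, while the UTF-8 sequence 0xC3 0xA4 passes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects overlong forms, surrogates, and code points above U+10FFFF.
bool isValidUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 < 0x80) { ++i; continue; }  // plain ASCII byte

        std::size_t extra = 0;    // continuation bytes expected
        unsigned long cp = 0;     // code point being assembled
        unsigned long minCp = 0;  // smallest code point allowed for this length
        if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; minCp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (i + extra >= n) return false;  // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minCp || cp > 0x10FFFF) return false;   // overlong or out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        i += extra + 1;
    }
    return true;
}
```

If a file or response declares encoding="UTF-8" but fails a check like this, the content is most likely in another encoding such as ISO-8859-1.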


René Brunner

Tue, 2007-11-13 17:40
Joined: 2007-09-04
Forum posts: 5
Re: How to parse xml files with special Unicode chars?

rbrunner, maybe you are right. I noticed the same thing. I'll have to take a closer look at this problem today.

Tue, 2007-11-13 17:46
Joined: 2007-09-04
Forum posts: 5
Re: How to parse xml files with special Unicode chars?

Well, now it works. My text wasn't UTF-8 encoded even though the header's charset was UTF-8. I thought all text was automatically encoded to UTF-8 if the header's charset is utf-8. I was wrong.

I added a utf8_encode() call to my PHP code and now it works.

Before: return new SOAP_Value('Osoite','string', $rivi["osoite"])
Now: return new SOAP_Value('Osoite','string', utf8_encode($rivi["osoite"]))
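
For reference, PHP's utf8_encode() converts ISO-8859-1 text to UTF-8. The conversion itself is simple and can be sketched in standard C++ as below; the function name latin1ToUtf8 is hypothetical, and the sketch assumes the input really is ISO-8859-1 (running it on text that is already UTF-8 would double-encode it).

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 bytes to UTF-8.
// Every Latin-1 byte maps to the code point U+0000..U+00FF of the same value,
// so bytes below 0x80 are copied and the rest become a two-byte sequence.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size() * 2);
    for (char ch : latin1) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));  // ASCII is unchanged
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte
        }
    }
    return out;
}
```

With this mapping, the Latin-1 byte 0xE4 (ä) becomes the two bytes 0xC3 0xA4, which is the form a parser expects when the document declares UTF-8.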

Thanks for the help!

BR,
Mikko
