Tuesday, October 18, 2005

UTF-8, Java, XML and my Linux kernel

Posted to the Dom4j list after receiving "?" instead of Unicode characters while parsing XML.
-----------
I've searched and searched for a solution to my problem, and this is my last hope. Apache Maven 1.x uses Dom4j for various XML processing and provides a Jelly tag for parsing XML into a DefaultDocument. I'm seeing Unicode characters translated into "?", and the hooks are definitely not what I need.

I've replaced the 1.4 version shipped with Maven with 1.6.1 and have created a simple test goal to reproduce the behavior. I've tried two different ways of parsing the file: one using a StringReader over a file opened as UTF-8, and the other using the default xml:parse tag in Jelly. Both give the same results. So, my question is, has anyone else experienced anything similar? Should I be posting my question on the Maven list? Any ideas?

Thanks in advance,
-RR-

UPDATE
Pretty much right after I sent my post I realized that this could be a JVM/system issue rather than a Dom4j one. Further testing, plus loading NLS (Native Language Support) into my kernel, resolved the problem. The System.out issue was resolved by setting LC_ALL="en_US.utf8", and then I proceeded with my kernel modification.
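
For anyone chasing the same symptom, here is a minimal sketch of how the "?" can appear on output even when the XML was parsed correctly. It is not taken from my Maven goal: the class name EncodingCheck, the helper printed and the sample string are made up for illustration, and it uses only the JDK. It assumes the JVM picks its default encoding from the locale, which is how the JVMs I was using behaved.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class EncodingCheck {

    // Returns the bytes a PrintStream bound to the named charset writes for text.
    static byte[] printed(String text, String charsetName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer, true, charsetName);
            out.print(text);
            out.flush();
            return buffer.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) throws Exception {
        String sample = "caf\u00e9"; // "cafe" with an acute accent on the e

        // What the JVM decided at startup, usually from LC_ALL / LANG.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // An ASCII-only stream cannot represent the accented character,
        // so the encoder substitutes '?' (byte 0x3F).
        byte[] ascii = printed(sample, "US-ASCII");
        System.out.println("US-ASCII last byte: 0x" + Integer.toHexString(ascii[3] & 0xFF));

        // A UTF-8 stream writes the two-byte sequence 0xC3 0xA9 instead.
        byte[] utf8 = printed(sample, "UTF-8");
        System.out.println("UTF-8 byte count:   " + utf8.length);

        // Wrapping System.out with an explicit charset sidesteps the locale default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(sample);
    }
}
```

If the first two lines report an ASCII-style charset, the "?" is most likely being introduced when the text is printed, not when it is parsed, and fixing the locale (or wrapping the stream as in the last lines) is the place to start.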
