Tuesday, October 18, 2005

UTF-8, Java, XML and my linux kernel

Posted to the Dom4j list after getting "?" instead of Unicode characters while parsing XML:
I've searched and searched for a fix, and this is my last hope. Apache Maven 1.x uses Dom4j for various XML processing tasks and provides a Jelly tag for parsing XML into a DefaultDocument. I'm seeing Unicode characters translated into "?", and those hooks are definitely not what I need.

I've replaced the 1.4 version shipped with Maven with 1.6.1 and have
created a simple test goal to replicate the behavior. I've
tried two different ways of parsing the file: one using a StringReader over the file's contents read in as UTF-8, and the other using the default xml:parse tag in Jelly. Both gave the same result. So, my question is: has anyone else experienced anything similar? Should I be posting my question on the Maven list? Any ideas?
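A quick way to separate "the parser decoded it wrong" from "the output mangled it" is to hand the parser raw bytes and let it read the encoding from the XML declaration, then compare the result against the expected characters instead of printing them. This is a minimal sketch, assuming the JDK's built-in JAXP DOM parser as a stand-in for Dom4j (Dom4j is not in the JDK; its `SAXReader` also accepts an `InputStream` or a `Reader`). The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Maven, Jelly or Dom4j.
public class Utf8XmlParse {

    // Option 1: give the parser raw bytes so it honours
    // <?xml ... encoding="UTF-8"?> instead of the platform default charset.
    static String rootTextFromBytes(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Option 2: decode explicitly as UTF-8 ourselves and pass a Reader,
    // similar in spirit to "a reader opened in UTF-8" from the post.
    static String rootTextFromUtf8Reader(byte[] xml) {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(xml), StandardCharsets.UTF_8)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(r));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // \\u escapes keep this source file independent of its own encoding.
        String expected = "Caf\u00e9 \u65e5\u672c";
        byte[] xml = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?><title>"
                + expected + "</title>").getBytes(StandardCharsets.UTF_8);

        // Compare instead of printing: System.out may itself be unable to
        // encode these characters, which would hide a correct parse.
        System.out.println(rootTextFromBytes(xml).equals(expected));      // true
        System.out.println(rootTextFromUtf8Reader(xml).equals(expected)); // true
    }
}
```

If both comparisons come back true but the printed text still shows "?", the parser is fine and the problem is on the output side.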

Right after I sent my post, I realized that this could be a
JVM/system issue. After further testing, loading NLS (Native Language Support)
into my kernel resolved the problem. The garbled System.out output was fixed by setting LC_ALL="en_US.utf8", and then I proceeded with my kernel modification.
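The "?" symptom is what Java produces when it encodes text into a charset that cannot represent it: each unmappable character becomes the charset's replacement byte, which is '?' for US-ASCII. On Linux the JVM picks its console encoding from the locale (LC_ALL, LC_CTYPE, LANG), which is why changing LC_ALL helped. The sketch below shows that substitution and, as an alternative to relying on the locale, wraps System.out in a PrintStream that always writes UTF-8. The class name is hypothetical, and the terminal itself still has to be set to UTF-8 for the last line to display correctly.

```java
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class illustrating where the "?" comes from.
public class ConsoleEncoding {

    // Encode and decode with the same charset. Characters the charset cannot
    // represent are replaced with its default replacement byte ('?' for ASCII).
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) throws Exception {
        String sample = "Caf\u00e9";

        // Depends on the locale the JVM was started with.
        System.out.println("default charset: " + Charset.defaultCharset());

        System.out.println(roundTrip(sample, StandardCharsets.US_ASCII));             // Caf?
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // true

        // Force UTF-8 on stdout regardless of LC_ALL. The String-named
        // constructor works on old JVMs too but may throw
        // UnsupportedEncodingException, hence "throws Exception" above.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(sample);
    }
}
```

Setting the locale, as in the post, fixes every Java program on the machine at once; wrapping System.out only fixes the program that does it, but does not depend on how the JVM was launched.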

