Wednesday, August 24, 2011

Java / XML UTF-8 marshalling and validating pains

This is a post on the little pains I am sure most people have endured while working in Java with XML from different sources that contains 'funny' (non-ASCII) characters.

If you use a FileWriter instead of a FileOutputStream, it will use your platform's default encoding. On JDKs before Java 18 that can be cp1252, ISO-8859-1 or US-ASCII rather than UTF-8.
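To make the effect concrete, here is a small sketch (not the snippet from the original post) that encodes the same string with different charsets and compares the resulting bytes. The class and method names are made up for illustration. A FileWriter created without a charset behaves like the "default charset" case.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how the chosen charset changes the bytes written.
final class DefaultCharsetDemo {

    // Encodes the text through a Writer, the same way a FileWriter would,
    // but with the charset stated explicitly.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent (U+00E9) on the e

        // This is what a FileWriter without an explicit charset would use.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // UTF-8 needs two bytes for U+00E9, ISO-8859-1 needs one.
        System.out.println("UTF-8 length:      " + encode(text, StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 length: " + encode(text, StandardCharsets.ISO_8859_1).length);

        // US-ASCII cannot represent U+00E9 at all, so the writer substitutes '?'.
        byte[] ascii = encode(text, StandardCharsets.US_ASCII);
        System.out.println("US-ASCII last byte: " + (char) ascii[ascii.length - 1]);
    }
}
```

The point is that the same characters produce different bytes, and an XML file that declares UTF-8 but was written with one of the other charsets will not parse cleanly.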

A way to ensure that you are using UTF-8 is to wrap a FileOutputStream in an OutputStreamWriter and name the charset explicitly.
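A minimal sketch of that approach, assuming Java 7 or later for StandardCharsets and try-with-resources. The class name, method name and temp file are illustrative only.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes text to a file as UTF-8 regardless of the platform default.
final class Utf8FileWriting {

    static void writeUtf8(Path target, String content) throws IOException {
        // The charset is passed to OutputStreamWriter, so the default encoding is never consulted.
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(target.toFile()), StandardCharsets.UTF_8))) {
            writer.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("utf8-demo", ".xml");
        writeUtf8(file, "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>Zo\u00eb</name>");
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
        Files.delete(file);
    }
}
```

On Java 11 and later, `new FileWriter(file, StandardCharsets.UTF_8)` or `Files.newBufferedWriter(path, StandardCharsets.UTF_8)` should do the same job with less wrapping.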

If you are reading XML from a stream, it is probably safer to state the encoding explicitly than to rely on the platform default.
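One way to do that, sketched here with the JDK's DOM parser, is to wrap the stream in an InputStreamReader with UTF-8 and hand it to the parser through an InputSource. This is my reconstruction of the idea, not necessarily what the original snippet did, and it assumes the bytes really are UTF-8. When an InputSource carries a character stream, the parser reads it directly and ignores any encoding declaration in the document. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parses a byte stream that is known to contain UTF-8 XML.
final class Utf8XmlReading {

    static Document parseUtf8(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Decode the bytes ourselves as UTF-8 instead of letting anything guess.
        InputSource source = new InputSource(new InputStreamReader(in, StandardCharsets.UTF_8));
        return builder.parse(source);
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = "<name>Zo\u00eb</name>".getBytes(StandardCharsets.UTF_8);
        Document doc = parseUtf8(new ByteArrayInputStream(bytes));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If the source reliably declares its own encoding, passing the raw InputStream to the parser and letting it read the XML declaration is a reasonable alternative.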

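When validating XML held in a String against a schema, note the .getBytes("UTF-8") used to turn the String into bytes. A plain .getBytes() would use the platform default encoding, which may not match the encoding the XML declares.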
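Here is a rough sketch of validating a String against a schema with the JDK's javax.xml.validation API. The tiny inline schema, the class name and the method name are invented for the example. I use StandardCharsets.UTF_8 instead of the "UTF-8" string, which is equivalent on Java 7+ and avoids the checked UnsupportedEncodingException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates an XML string against a small inline schema.
final class Utf8SchemaValidation {

    // Illustrative schema: a single <name> element containing a string.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"name\" type=\"xs:string\"/>"
          + "</xs:schema>";

    static boolean isValid(String xml) throws IOException {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();

            // Encode explicitly as UTF-8 so the bytes match the XML declaration.
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            validator.validate(new StreamSource(new ByteArrayInputStream(bytes)));
            return true;
        } catch (SAXException e) {
            // Thrown for schema problems and for documents that fail validation.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>Zo\u00eb</name>";
        System.out.println("valid: " + isValid(xml));
        System.out.println("valid: " + isValid("<other/>"));
    }
}
```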


When marshalling, use a stream rather than a writer, so that the marshaller (and not the writer's charset) decides how characters become bytes.
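The original snippet here was probably JAXB, but JAXB is no longer bundled with the JDK since Java 11. To keep the sketch runnable on a plain JDK, it uses the JDK's Transformer to serialize a DOM document instead, which makes the same point: give the serializer an OutputStream and tell it the encoding. With JAXB the equivalent would be setting the Marshaller.JAXB_ENCODING property and calling marshal(Object, OutputStream). The class and method names below are made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: serializes a one-element document to UTF-8 bytes.
// Swapped-in technique: javax.xml.transform rather than JAXB.
final class Utf8Serializing {

    static byte[] toUtf8Xml(String name) throws Exception {
        // Build <name>...</name> in memory.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("name");
        root.setTextContent(name);
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // The serializer writes both the XML declaration and the bytes in this encoding.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // A byte stream, not a Writer: no second charset gets a chance to disagree.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = toUtf8Xml("Zo\u00eb");
        System.out.println(new String(xml, StandardCharsets.UTF_8));
    }
}
```

With a Writer, the declared encoding and the bytes actually written can drift apart, because the Writer applies its own charset after the serializer has produced characters.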


Lastly, if you are getting XML from an unknown source, it may contain characters outside the ranges that XML 1.0 allows. These cannot appear in a document in any encoding, so they have to be removed before parsing.
The code I used for this came from another blog.
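The snippet from that blog is not reproduced here, so what follows is my own sketch of the same idea, with hypothetical class and method names. It keeps only code points allowed by the XML 1.0 Char production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF) and drops everything else, including unpaired surrogates.

```java
// Hypothetical helper: removes characters that XML 1.0 does not allow.
final class XmlCharFilter {

    // True if the code point matches the XML 1.0 "Char" production.
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the input without illegal characters; null becomes the empty string.
    static String stripIllegalXmlChars(String in) {
        if (in == null || in.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(in.length());
        // Walk by code point so that valid surrogate pairs are kept together.
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Contains a NUL and a U+0001 control character, neither legal in XML 1.0.
        String dirty = "a\u0000b\u0001c";
        System.out.println(stripIllegalXmlChars(dirty)); // prints "abc"
    }
}
```

Run this over the raw text before handing it to a parser or marshaller. Note that it silently drops data, so it may be worth logging when the output is shorter than the input.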




