Wednesday, February 2, 2011

How to get international Unicode characters from a form input field/servlet parameter into a string



Question:

[ I have a servlet-based app that generates and processes HTML forms. I would like to support multiple languages/character sets, which will be stored as Unicode (UTF-8) in the database. I am setting the following tags, for example:

<meta content="text/html; charset=Shift_JIS" http-equiv="Content-Type"> and
<form accept-charset="Shift_JIS"...

I am also setting the locale and content type on the HttpServletResponse to "ja" and "text/html; charset=Shift_JIS" respectively.

Unfortunately, the CharacterEncoding on the HttpServletRequest from the post is always null even though it's set properly in the browser.

Does anyone have a sample showing how to get international Unicode characters from a form input field/servlet parameter into a string, and from a Java string into a form input or text field? ]

Answer:

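If request.getCharacterEncoding() returns null, the request did not declare a charset in its Content-Type header, and the servlet container falls back to ISO-8859-1 when it decodes the parameters into Strings.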


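So to get the correct Unicode string, assuming the form was actually submitted as UTF-8, you have to re-decode the parameter, something like this (for a form submitted as Shift_JIS, as in the question, the second charset would be "Shift_JIS" instead):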


String myparam = request.getParameter("myparamname");
if (myparam != null) {
    // Recover the raw bytes the container decoded as ISO-8859-1, then decode them as UTF-8
    myparam = new String(myparam.getBytes("ISO-8859-1"), "UTF-8");
}
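To see why the re-decoding above works, here is a minimal, self-contained sketch that needs no servlet container. The class and method names (ParamDecoding, decodeAsContainerDefault, repair) are made up for illustration, and it assumes Java 7 or later for StandardCharsets. It simulates the container decoding UTF-8 bytes as ISO-8859-1 and then reverses that step; the round trip is lossless because ISO-8859-1 maps every byte value to exactly one character.

```java
import java.nio.charset.StandardCharsets;

public class ParamDecoding {

    // Simulates what the container does when getCharacterEncoding() is null:
    // the raw request bytes are decoded as ISO-8859-1.
    public static String decodeAsContainerDefault(byte[] rawBytes) {
        return new String(rawBytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses the wrong decoding: get the original bytes back,
    // then decode them with the charset the browser really used (assumed UTF-8 here).
    public static String repair(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u65E5\u672C\u8A9E";                 // three Japanese characters
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8); // what a UTF-8 form would send
        String garbled = decodeAsContainerDefault(wire);      // what getParameter() would return
        String fixed = repair(garbled);
        System.out.println(fixed.equals(typed));              // prints: true
    }
}
```

An alternative that may be simpler, if you control the servlet or a filter in front of it, is to call request.setCharacterEncoding("UTF-8") before the first call to getParameter(); the container then decodes the POST body correctly and no repair step is needed.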
