| Summary: | Incorrect UTF-8 conversion for non-BMP characters | ||
|---|---|---|---|
| Product: | JacORB | Reporter: | Peter Klotz <peter.klotz> |
| Component: | ORB | Assignee: | Mailinglist to track bugs <jacorb-bugs> |
| Status: | RESOLVED FIXED | ||
| Severity: | enhancement | CC: | gotthard.witsch, jacorb |
| Priority: | P5 | ||
| Version: | 3.3 | ||
| Hardware: | PC | ||
| OS: | Linux | ||
| Attachments: | Patch for UTF-8 conversion problem; JUnit Test for CodeSet.write_string | ||
|
Description
Peter Klotz
2013-11-13 05:30:21 UTC
Created attachment 431 [details]
Patch for UTF-8 conversion problem
To solve the conversion problem I attached a patch. The following changes were made:

In org.jacorb.orb.CDROutputStream, the string conversion is now done by calling codeSet.write_string. To support this, the class org.jacorb.orb.giop.CodeSet received a new method write_string with the following signature: public void write_string( OutputBuffer buffer, String s, boolean write_bom, boolean write_length, int giop_minor ).

In its default implementation it does the same as CDROutputStream did before: every character of the string is converted with the write_char method. The inner class Utf8CodeSet overrides this method and uses the String's getBytes(Charset charset) method to obtain the bytes for transmission. The bytes are then added to the buffer with the buffer's write_byte method.

getBytes(Charset charset) is preferred to getBytes(String charsetName), because getBytes(String charsetName) does not specify what happens if characters cannot be encoded.

Thanks for the patch. Do you have the tests you are using?

Created attachment 432 [details]
JUnit Test for CodeSet.write_string
I uploaded the tests for the new CodeSet.write_string method.

Thanks for the patch and test! Fixed by https://github.com/JacORB/JacORB/pull/103 |
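
For readers who want to see why converting one char at a time goes wrong for non-BMP characters, here is a minimal standalone sketch. It is not the JacORB code: the class Utf8Sketch and the methods encodePerChar and encodeWholeString are hypothetical names made up for illustration, and encodePerChar only assumes that a per-character conversion treats each UTF-16 code unit independently. Only String.getBytes(Charset) and StandardCharsets.UTF_8 are real JDK APIs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of JacORB.
public class Utf8Sketch {

    // Assumed per-character approach: each UTF-16 code unit is encoded on
    // its own, so a surrogate pair becomes two 3-byte sequences, which is
    // not valid UTF-8.
    static byte[] encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Whole-string approach as described in the patch: the JDK encoder
    // sees both surrogates and emits one 4-byte UTF-8 sequence.
    static byte[] encodeWholeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nonBmp = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        System.out.println("per char:     " + encodePerChar(nonBmp).length + " bytes");
        System.out.println("whole string: " + encodeWholeString(nonBmp).length + " bytes");
    }
}
```

For U+1F600 the per-character variant yields 6 bytes (ED A0 BD ED B8 80) while the whole-string variant yields the correct 4 bytes (F0 9F 98 80). For BMP-only strings both variants produce the same bytes, which is consistent with the problem only showing up for non-BMP characters.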