How to use ICU with UTF-16?

Venemo

2013-11-08T00:59:57

I'm looking into using ICU for Unicode string processing in a native Node.js module because it seems to me that v8::String (according to these docs) doesn't have a C++ API for this purpose.

To my knowledge V8 expects UTF-16 in ExternalStringResource and other APIs, so I'd like to use ICU for UTF-16 processing. I specifically need to:

Iterate over the characters (code points, not just the 16-bit code units) of a UTF-16 string
Tell the number of characters (code points, not just the 16-bit code units) that a UTF-16 string contains
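To make the two requirements concrete, here is a minimal sketch in plain C++ (no ICU) of what iterating and counting code points over a borrowed UTF-16 buffer involves. The names nextCodePoint and countCodePoints are hypothetical helpers invented for this illustration; they are not part of ICU or V8. The buffer is only read, never copied, and an unpaired surrogate is passed through as a single code point, which is an assumption about the desired error handling.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: decodes the code point starting at units[i] and
// advances i past it (by one code unit, or two for a surrogate pair).
// An unpaired surrogate is returned unchanged as a single code point.
inline char32_t nextCodePoint(const char16_t* units, std::size_t length,
                              std::size_t& i) {
    assert(units != nullptr && i < length);
    char32_t c = units[i++];
    // Lead surrogate followed by a trail surrogate: combine the pair.
    if (c >= 0xD800 && c <= 0xDBFF && i < length) {
        const char16_t trail = units[i];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++i;
            c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
        }
    }
    return c;
}

// Hypothetical helper: counts code points by walking the buffer once.
inline std::size_t countCodePoints(const char16_t* units, std::size_t length) {
    std::size_t i = 0;
    std::size_t count = 0;
    while (i < length) {
        nextCodePoint(units, length, i);
        ++count;
    }
    return count;
}
```

For comparison, ICU appears to cover the same ground on UChar buffers with the U16_NEXT macro (unicode/utf16.h) for stepping and u_countChar32 (unicode/ustring.h) for counting, so the hand-written loop above is mainly useful for seeing what those do.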

So I looked at the ICU documentation and found the UnicodeString and CharacterIterator classes. However, UnicodeString doesn't have a fromUTF16 method, only fromUTF8 and fromUTF32.

The other thing I'm unsure about is whether the UnicodeString constructor copies the data I give it. I'd very much prefer a zero-copy approach: I'd only work with an immutable object, so it shouldn't perform any copy operations and should just use the buffer I point it at.

I'm also unsure whether I can just use UCharIterator (assuming I can somehow get a UChar* from my UTF-16 strings).

So my question is: How do I use ICU for the above purposes?

Copyright license:
Author: Venemo. Reproduced under the CC BY-SA 4.0 license with a link to the original source and disclaimer.
Link: https://stackoverflow.com/questions/19842014/how-to-use-icu-with-utf-16

