I'm looking into using ICU for Unicode string processing in a native Node.js module because, as far as I can tell, v8::String (according to these docs) doesn't offer a C++ API for this purpose.
To my knowledge, V8 expects UTF-16 in ExternalStringResource and other APIs, so I'd like to use ICU for UTF-16 processing. I specifically need to:
- Iterate over the characters (not just the 16-bit code units) of a UTF-16 string
- Tell the number of characters (not just the 16-bit code units) that a UTF-16 string contains
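To make the distinction concrete, here is a hand-rolled sketch of what I mean by both requirements. It is not ICU code: the names next_code_point and count_code_points are made up for illustration, and it assumes that "character" means a Unicode code point (not a grapheme cluster) and that unpaired surrogates are simply passed through.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode the code point that starts at index i of a
// UTF-16 buffer and advance i past it (by 1 or 2 code units).
// Assumption of this sketch: an unpaired surrogate is returned unchanged.
inline char32_t next_code_point(const char16_t* s, std::size_t len, std::size_t& i) {
    char32_t c = s[i++];
    // A lead surrogate followed by a trail surrogate encodes one code point
    // above U+FFFF.
    if (c >= 0xD800 && c <= 0xDBFF && i < len) {
        const char16_t trail = s[i];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++i;
            c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
        }
    }
    return c;
}

// Hypothetical helper: count code points rather than 16-bit code units.
// The buffer is only read, never copied.
inline std::size_t count_code_points(const char16_t* s, std::size_t len) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < len;) {
        next_code_point(s, len, i);
        ++n;
    }
    return n;
}
```

For a string such as u"a\U0001F600b", the buffer holds four code units, but iterating with next_code_point visits three characters. What I'm after is the ICU way of doing the same thing.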
So I looked at the ICU documentation and found the UnicodeString and CharacterIterator classes. However, UnicodeString doesn't have a fromUTF16 method, only fromUTF8 and fromUTF32.
The other thing I'm unsure about is whether the UnicodeString constructor copies the data I give it. I'd very much prefer a zero-copy approach: I'd only work with an immutable object, so it shouldn't need to perform any copy operations and could just use the buffer I point it at.
I'm also unsure whether I can just use UCharIterator (assuming I can somehow get a UChar* from my UTF-16 strings).
So my question is: How do I use ICU for the above purposes?
Copyright License:
Author:「Venemo」,Reproduced under the CC 4.0 BY-SA copyright license with link to original source & disclaimer.
Link to:https://stackoverflow.com/questions/19842014/how-to-use-icu-with-utf-16