Choosing encoding for icu::UnicodeString

2015-12-29T23:13:20

I needed a way to convert a string to lower case that is safe for both ASCII and UTF-16LE (as found in some Windows registry strings), and came across this question: How to convert std::string to lower case?

The answer that seemed "most correct" to me (I'm not using Boost) was one that demonstrated using the ICU library.

In that answer, the author passed the encoding "ISO-8859-1" to the UnicodeString constructor. Why is this the correct value, and how do I know which encoding to use?

ISO-8859-1 has worked in the few unit tests I've run against ASCII-encoded strings containing only Latin characters, but I don't like using it without knowing why.
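
To make the observation concrete, here is a minimal sketch of what decoding bytes as ISO-8859-1 amounts to. It uses only the standard library; `decodeLatin1` is a hypothetical helper written for illustration and is not ICU's implementation. ISO-8859-1 maps every byte value 0x00..0xFF to the Unicode code point with the same value, and ASCII is its lower half, which is why ASCII-only test strings come out unchanged.

```cpp
#include <string>

// Hypothetical helper, not part of ICU: decode bytes as ISO-8859-1.
// Each byte becomes the UTF-16 code unit with the same numeric value.
std::u16string decodeLatin1(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}
```

Decoding "Hello" this way yields the same five characters. The same function applied to the UTF-8 bytes 0xC3 0xA9 (the letter é) yields two code units, U+00C3 and U+00A9, rather than the single U+00E9. So ISO-8859-1 only gives the right result when the bytes really are ASCII or Latin-1; the encoding name passed to the constructor has to match how the input was actually encoded.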

If it matters, I'm mainly concerned with manipulating English data that is typically stored as ASCII, but the Windows registry can store values as UTF-16LE, and I don't want to block myself from supporting other languages down the road by littering my code with calls that are not Unicode-safe.
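
For the registry case, a rough sketch of what UTF-16LE input looks like at the byte level may help. `decodeUtf16Le` is a hypothetical, standard-library-only helper, assumed here purely for illustration. It pairs bytes into 16-bit code units, low byte first, and does not validate surrogate pairs.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of ICU: combine UTF-16LE bytes into
// 16-bit code units. Little-endian means the low byte comes first.
std::u16string decodeUtf16Le(const std::string& bytes) {
    if (bytes.size() % 2 != 0) {
        throw std::invalid_argument("UTF-16 data must have an even byte count");
    }
    std::u16string out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}
```

The four bytes 'H', 0x00, 'i', 0x00 decode to "Hi". Since icu::UnicodeString holds UTF-16 code units internally, data that is already UTF-16 can be handed over as code units (on a little-endian platform such as Windows, UTF-16LE is the native byte order) instead of going through a named codepage such as "ISO-8859-1".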

Copyright License:
Author: Matthew. Reproduced under the CC BY-SA 4.0 license with a link to the original source and disclaimer.
Link to: https://stackoverflow.com/questions/34513831/choosing-encoding-for-icuunicodestring
