Instances of sys_string are immutable. To do modifications you use a separate "builder" class. This is similar to how many other languages do it, and it results in improved performance and the elimination of a whole class of errors.

Unicode-first. Instances of sys_string always store Unicode characters in UTF-8, UTF-16 or UTF-32, depending on the platform. Iteration can be done in any of these encodings, and all operations (case conversion, case-insensitive comparison, trimming) are specified as actions on a sequence of Unicode codepoints using Unicode algorithms.

Operations similar to Python or ECMAScript strings. You can do things like rtrim, split, join, starts_with etc. in a way proven to be natural and productive in those languages.

Concatenation does not allocate temporaries and copies addends once. This means that result = s1 + s2 + s3 performs one memory allocation and one copy of each of s1, s2 and s3 into the result, not 2 allocations and 5 copies like in other languages or with std::string.

Bidirectional UTF-8/UTF-16/UTF-32 views. You can view a sys_string as a sequence of UTF-8/16/32 characters and iterate forward or backward equally efficiently. Consider, for example, trying to find the last instance of Unicode whitespace in UTF-8 data.
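To make the "last Unicode whitespace in UTF-8 data" problem concrete, here is a minimal sketch of what it takes without a bidirectional codepoint view. It uses only the standard library and is not sys_string's API: the names is_unicode_whitespace and find_last_whitespace are hypothetical, and the input is assumed to be well-formed UTF-8 (no validation is done).

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: codepoints carrying the Unicode White_Space property.
bool is_unicode_whitespace(char32_t c) {
    return (c >= 0x09 && c <= 0x0D) || c == 0x20 || c == 0x85 || c == 0xA0 ||
           c == 0x1680 || (c >= 0x2000 && c <= 0x200A) || c == 0x2028 ||
           c == 0x2029 || c == 0x202F || c == 0x205F || c == 0x3000;
}

// Hypothetical helper: byte offset of the last whitespace codepoint, if any.
// Assumes `text` is well-formed UTF-8; malformed input is not handled.
std::optional<std::size_t> find_last_whitespace(std::string_view text) {
    std::size_t end = text.size();
    while (end > 0) {
        // Walk back over continuation bytes (10xxxxxx) to reach the lead byte.
        std::size_t start = end - 1;
        while (start > 0 &&
               (static_cast<unsigned char>(text[start]) & 0xC0) == 0x80)
            --start;

        // Decode the codepoint occupying bytes [start, end).
        const std::size_t len = end - start;
        const auto lead = static_cast<unsigned char>(text[start]);
        char32_t cp = (len == 1) ? lead : (lead & (0xFFu >> (len + 1)));
        for (std::size_t i = start + 1; i < end; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(text[i]) & 0x3Fu);

        if (is_unicode_whitespace(cp))
            return start;
        end = start;  // continue with the previous codepoint
    }
    return std::nullopt;
}
```

A byte-wise reverse search for ' ' would miss characters such as U+00A0 or U+3000, which is why the search has to step codepoint by codepoint. A bidirectional UTF-8 view hides this bookkeeping behind ordinary reverse iteration.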
Some platforms, like Windows, support multiple kinds of native string types. Internally, sys_string is a specialization of the template sys_string_t, whose Storage parameter defines which kind of native string type to use. The default storage for sys_string is picked for you based on your platform (you can change it via compilation options), but you can also use other specializations directly in your code if necessary.
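The general shape of such a storage-parameterized design can be sketched as follows. This is only an illustration of the pattern, not the library's actual definitions: toy_string_t, toy_string, utf8_storage and utf16_storage are hypothetical names, and std::string / std::u16string stand in for real platform-native string types.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical storage policies: each one names the string type it wraps.
struct utf8_storage  { using native_type = std::string; };
struct utf16_storage { using native_type = std::u16string; };

// Hypothetical string template: all platform specifics come from Storage.
template <class Storage>
class toy_string_t {
public:
    using native_type = typename Storage::native_type;

    explicit toy_string_t(native_type value) : m_value(std::move(value)) {}

    // Hands out the wrapped representation without any conversion.
    const native_type & native() const noexcept { return m_value; }

    // Length in storage units (bytes for UTF-8, 16-bit units for UTF-16).
    std::size_t storage_size() const noexcept { return m_value.size(); }

private:
    native_type m_value;
};

// The default alias is chosen per platform; other specializations
// remain usable by naming them explicitly.
#if defined(_WIN32)
using toy_string = toy_string_t<utf16_storage>;
#else
using toy_string = toy_string_t<utf8_storage>;
#endif
```

Code that needs a specific representation can name toy_string_t<utf16_storage> directly, while everything else uses the toy_string alias and picks up the platform default.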
For example, on Apple's platforms sys_string stores an NSString * internally, allowing zero-cost conversion. On Android, where no-op conversions to Java strings are impossible for technical reasons, the internal storage is designed to make conversions as cheap as possible.
sys_string is immutable, Unicode-first and exposes convenient operations similar to Python or ECMAScript strings. It uses a separate sys_string_builder class to construct strings. It provides fast concatenation via the + operator that does not allocate temporary strings. The library exposes bidirectional UTF-8/UTF-16/UTF-32 views of sys_string as well as of any random access containers. Interoperability with platform-native string types means that sys_string makes conversions to and from native string types as efficient as possible, ideally zero-cost operations. Native string types are things like NSString * or CFStringRef on macOS/iOS, Java String on Android, const wchar_t *, HSTRING or BSTR on Windows, and const char * on Linux.
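One common way to get a + operator that allocates no temporary strings is to have it build a lightweight expression object and materialize the result only once. The sketch below shows that general technique with the standard library; it is not taken from sys_string's implementation, and piece and concat_expr are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical leaf: a non-owning view of one operand.
struct piece {
    std::string_view text;
    std::size_t size() const { return text.size(); }
    void append_to(std::string & out) const { out.append(text); }
};

// Hypothetical node: remembers both sides of a '+' without copying text.
template <class L, class R>
struct concat_expr {
    L lhs;
    R rhs;

    std::size_t size() const { return lhs.size() + rhs.size(); }

    void append_to(std::string & out) const {
        lhs.append_to(out);
        rhs.append_to(out);
    }

    // Materialize: reserve the total size up front, then copy each operand once.
    std::string str() const {
        std::string out;
        out.reserve(size());  // at most one allocation for the whole chain
        append_to(out);
        return out;
    }
};

inline concat_expr<piece, piece> operator+(piece a, piece b) {
    return {a, b};
}

template <class L, class R>
concat_expr<concat_expr<L, R>, piece> operator+(concat_expr<L, R> a, piece b) {
    return {a, b};
}
```

With these definitions, (piece{"foo"} + piece{"bar"} + piece{"baz"}).str() builds the nested expression without touching the heap and then copies each operand into the result exactly once. Because piece is non-owning, the operands must outlive the expression.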
This library provides a C++ string class, sys_string, that is optimized for interoperability with platform-native string types.