std.utf

Encode and decode UTF-8, UTF-16 and UTF-32 strings.

UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'.


Category	Functions
Decode	decode decodeFront
Lazy decode	byCodeUnit byChar byWchar byDchar byUTF
Encode	encode toUTF8 toUTF16 toUTF32 toUTFz toUTF16z
Length	codeLength count stride strideBack
Index	toUCSindex toUTFindex
Validation	isValidDchar validate
Miscellaneous	replacementDchar UseReplacementDchar UTFException

Members

Aliases

UseReplacementDchar alias UseReplacementDchar = Flag!"useReplacementDchar": Whether or not to replace invalid UTF with replacementDchar.
byChar alias byChar = byUTF!char
byDchar alias byDchar = byUTF!dchar
byWchar alias byWchar = byUTF!wchar: Iterate an input range of characters by char, wchar, or dchar. These aliases simply forward to byUTF with the corresponding C argument.
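As a rough illustration of how the three aliases differ, the sketch below walks one string through each of them and counts the resulting elements. The sample string is an arbitrary choice made for this example.

```d
import std.range : walkLength;
import std.utf : byChar, byWchar, byDchar;

void main()
{
    // 'h', U+00E9 (two UTF-8 code units), 'l', 'l', 'o'
    string s = "h\u00E9llo";

    assert(s.byChar.walkLength == 6);  // UTF-8 code units
    assert(s.byWchar.walkLength == 5); // UTF-16 code units
    assert(s.byDchar.walkLength == 5); // code points
}
```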

Classes

UTFException class UTFException: Exception thrown on errors in std.utf functions.

Functions

byCodeUnit auto byCodeUnit(R r): Iterate a range of chars, wchars, or dchars by code unit.
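A minimal sketch of byCodeUnit on a string, assuming a UTF-8 input chosen for this example. It shows that the elements are code units rather than decoded code points.

```d
import std.range.primitives : ElementType;
import std.utf : byCodeUnit;

void main()
{
    string s = "h\u00E9llo";
    auto r = s.byCodeUnit;

    // Elements are UTF-8 code units, not dchars.
    static assert(is(ElementType!(typeof(r)) == immutable char));
    assert(r.length == 6);

    // U+00E9 occupies two code units: 0xC3 0xA9.
    assert(r[1] == 0xC3 && r[2] == 0xA9);
}
```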
codeLength ubyte codeLength(dchar c): Returns the number of code units that are required to encode the code point c when C is the character type used to encode it.
codeLength size_t codeLength(InputRange input): Returns the number of code units that are required to encode input in a string whose character type is C. This is particularly useful when slicing one string with the length of another and the two string types use different character types.
count size_t count(const(C)[] str): Returns the total number of code points encoded in str.
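The following sketch contrasts the two codeLength overloads with count. The strings are illustrative values, and the slicing step is one possible use of the range overload.

```d
import std.utf : codeLength, count;

void main()
{
    // Code units needed for a single code point in each encoding.
    assert(codeLength!char('\u00E9') == 2);
    assert(codeLength!wchar('\u00E9') == 1);
    assert(codeLength!wchar('\U0001F600') == 2); // surrogate pair
    assert(codeLength!dchar('\U0001F600') == 1);

    // Slice a UTF-8 string using the length of a UTF-16 prefix.
    wstring w = "h\u00E9llo"w;
    string s = "h\u00E9llo, world";
    assert(codeLength!char(w) == 6);
    assert(s[0 .. codeLength!char(w)] == "h\u00E9llo");

    // count reports code points, not code units.
    assert(s.length == 13);
    assert(count(s) == 12);
}
```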
decode dchar decode(S str, size_t index): Decodes and returns the code point starting at str[index]. index is passed by ref and is advanced to one past the decoded code point. If the code point is not well-formed, then a UTFException is thrown and index remains unchanged.
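A short sketch of stepping through a string with decode, assuming a sample string that mixes one-, two-, and four-unit UTF-8 sequences. The invalid byte in the second half is an arbitrary choice made to trigger the exception path.

```d
import std.exception : assertThrown;
import std.utf : decode, UTFException;

void main()
{
    string s = "a\u00E9\U0001F600"; // 1 + 2 + 4 UTF-8 code units
    size_t i = 0;

    assert(decode(s, i) == 'a' && i == 1);
    assert(decode(s, i) == '\u00E9' && i == 3);
    assert(decode(s, i) == '\U0001F600' && i == 7);
    assert(i == s.length);

    // A lone 0xFF byte is not well-formed UTF-8.
    char[] bad = [0xFF];
    size_t j = 0;
    assertThrown!UTFException(decode(bad, j));
    assert(j == 0); // index is left unchanged on failure
}
```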
decodeBack dchar decodeBack(S str, size_t numCodeUnits)
dchar decodeBack(S str): decodeBack is a variant of decode which specifically decodes the last code point. Unlike decode, decodeBack accepts any bidirectional range of code units (rather than just a string or random access range). It also takes the range by ref and pops off the elements as it decodes them. If numCodeUnits is passed in, it is set to the number of code units in the decoded code point.
decodeFront dchar decodeFront(S str, size_t numCodeUnits)
dchar decodeFront(S str): decodeFront is a variant of decode which specifically decodes the first code point. Unlike decode, decodeFront accepts any input range of code units (rather than just a string or random access range). It also takes the range by ref and pops off the elements as it decodes them. If numCodeUnits is passed in, it is set to the number of code units in the decoded code point.
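To make the popping behaviour of decodeFront and decodeBack concrete, here is a small sketch on a string of illustrative content. Both calls shrink the string they are given.

```d
import std.utf : decodeFront, decodeBack;

void main()
{
    string s = "\u00E9ab\U0001F600";
    size_t n;

    // Decodes U+00E9 and removes its two code units from the front.
    assert(s.decodeFront(n) == '\u00E9' && n == 2);
    assert(s == "ab\U0001F600");

    // Decodes U+1F600 and removes its four code units from the back.
    assert(s.decodeBack(n) == '\U0001F600' && n == 4);
    assert(s == "ab");
}
```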
encode size_t encode(char[4] buf, dchar c)
size_t encode(wchar[2] buf, dchar c)
size_t encode(dchar[1] buf, dchar c): Encodes c into the static array, buf, and returns the actual length of the encoded character (a number between 1 and 4 for char[4] buffers and a number between 1 and 2 for wchar[2] buffers).
encode void encode(char[] str, dchar c)
void encode(wchar[] str, dchar c)
void encode(dchar[] str, dchar c): Encodes c in str's encoding and appends it to str.
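The two families of encode overloads can be sketched as follows. The euro sign and the emoji are example inputs chosen to produce multi-unit sequences.

```d
import std.utf : encode;

void main()
{
    // Static-buffer overload: returns how many code units were written.
    char[4] buf;
    size_t n = encode(buf, '\u20AC'); // euro sign
    assert(n == 3);
    assert(buf[0 .. n] == "\xE2\x82\xAC");

    wchar[2] wbuf;
    assert(encode(wbuf, '\U0001F600') == 2); // surrogate pair

    // Appending overload: grows the array in place.
    char[] s = "price: ".dup;
    encode(s, '\u20AC');
    assert(s == "price: \u20AC");
}
```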
isValidDchar bool isValidDchar(dchar c): Check whether the given Unicode code point is valid.
stride uint stride(S str, size_t index)
uint stride(S str): Calculate the length of the UTF sequence starting at index in str.
strideBack uint strideBack(S str, size_t index)
uint strideBack(S str): Calculate the length of the UTF sequence ending one code unit before index in str.
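A brief sketch of stride and strideBack on one sample string. The indices in the comments refer to UTF-8 code units.

```d
import std.utf : stride, strideBack;

void main()
{
    // Layout: 'a' at 0, U+00E9 at 1..3, U+1F600 at 3..7
    string s = "a\u00E9\U0001F600";

    assert(stride(s, 0) == 1);
    assert(stride(s, 1) == 2);
    assert(stride(s, 3) == 4);
    assert(stride(s) == 1);        // sequence at the front

    assert(strideBack(s, 3) == 2); // sequence ending just before index 3
    assert(strideBack(s) == 4);    // sequence at the back
}
```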
toUCSindex size_t toUCSindex(const(C)[] str, size_t index): Given index into str, and assuming that index is at the start of a UTF sequence, toUCSindex determines the number of UCS characters up to index. In other words, index is the index of the code unit that begins a code point, and the return value is the position of that code point counted in code points.
toUTF16 wstring toUTF16(S s): Encodes the elements of s to UTF-16 and returns a newly GC-allocated wstring of the elements.
toUTF16z const(wchar)* toUTF16z(const(C)[] str): toUTF16z is a convenience function for toUTFz!(const(wchar)*).
toUTF32 dstring toUTF32(S s): Encodes the elements of s to UTF-32 and returns a newly GC-allocated dstring of the elements.
toUTF8 string toUTF8(S s): Encodes the elements of s to UTF-8 and returns a newly allocated string of the elements.
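As an illustration of the eager conversions toUTF8, toUTF16, and toUTF32, this sketch round-trips one example string through each encoding.

```d
import std.utf : toUTF8, toUTF16, toUTF32;

void main()
{
    string s = "a\U0001F600";

    wstring w = s.toUTF16;
    dstring d = s.toUTF32;

    assert(s.length == 5); // 1 + 4 UTF-8 code units
    assert(w.length == 3); // 1 + surrogate pair
    assert(d.length == 2); // two code points

    // Converting back yields the original text.
    assert(w.toUTF8 == s);
    assert(d.toUTF8 == s);
}
```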
toUTFindex size_t toUTFindex(const(C)[] str, size_t n): Given a UCS index n into str, returns the UTF index. In other words, n is the position of a code point counted in code points, and the return value is the array index of its first code unit.
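The relationship between toUCSindex and toUTFindex can be sketched on a single string. The sample text is an assumption made for this example.

```d
import std.utf : toUCSindex, toUTFindex;

void main()
{
    // Code units per code point: 1 ('a'), 2 (U+00E9), 4 (U+1F600), 1 ('z')
    string s = "a\u00E9\U0001F600z";

    // Code unit index -> code point index.
    assert(toUCSindex(s, 0) == 0);
    assert(toUCSindex(s, 1) == 1);
    assert(toUCSindex(s, 3) == 2);
    assert(toUCSindex(s, 7) == 3);

    // Code point index -> code unit index.
    assert(toUTFindex(s, 2) == 3);
    assert(toUTFindex(s, 3) == 7);
}
```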
validate void validate(S str): Checks whether str is well-formed Unicode. Throws a UTFException if it is not.
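A minimal sketch of validation, covering both validate and isValidDchar. The invalid byte sequence and the two rejected code point values are illustrative choices.

```d
import std.exception : assertNotThrown, assertThrown;
import std.utf : isValidDchar, UTFException, validate;

void main()
{
    // Well-formed input passes silently.
    assertNotThrown!UTFException(validate("h\u00E9llo"));

    // These bytes are all continuation bytes with no lead byte.
    char[] bad = [167, 133, 175];
    assertThrown!UTFException(validate(bad));

    // Surrogates and values above U+10FFFF are not valid code points.
    assert(isValidDchar('a'));
    assert(!isValidDchar(cast(dchar) 0xD800));
    assert(!isValidDchar(cast(dchar) 0x110000));
}
```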

Templates

byUTF template byUTF(C, UseReplacementDchar useReplacementDchar = Yes.useReplacementDchar): Iterate an input range of characters by char type C by encoding the elements of the range.
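The effect of the useReplacementDchar flag can be sketched as below. The input containing a stray 0xF0 byte is an assumption for illustration. The example only checks that a replacement character appears and does not rely on how many code units the decoder consumes.

```d
import std.algorithm.searching : canFind;
import std.exception : assertThrown;
import std.range : walkLength;
import std.typecons : No;
import std.utf : byUTF, replacementDchar, UTFException;

void main()
{
    // 0xF0 starts a four-unit sequence that is never completed.
    string s = "hello\xF0betty";

    // Default: invalid input is replaced with replacementDchar.
    assert(s.byUTF!dchar.canFind(replacementDchar));

    // With No.useReplacementDchar, invalid input throws instead.
    assertThrown!UTFException(
        s.byUTF!(dchar, No.useReplacementDchar).walkLength);
}
```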
toUTFz template toUTFz(P): Returns a C-style zero-terminated string equivalent to str. str must not contain embedded '\0' characters, because any C function will treat the first '\0' that it sees as the end of the string. If str.empty is true, then a string containing only '\0' is returned.
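A small sketch of toUTFz and toUTF16z, using C's strlen to confirm the terminator. The input strings are illustrative.

```d
import core.stdc.string : strlen;
import std.utf : toUTF16z, toUTFz;

void main()
{
    // UTF-8, zero-terminated: strlen counts the six code units.
    const(char)* p = toUTFz!(const(char)*)("h\u00E9llo");
    assert(strlen(p) == 6);

    // UTF-16, zero-terminated.
    const(wchar)* w = toUTF16z("hi");
    assert(w[0] == 'h' && w[1] == 'i' && w[2] == 0);

    // An empty string yields a pointer to a lone terminator.
    const(char)* e = toUTFz!(const(char)*)("");
    assert(e[0] == '\0');
}
```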

Variables

replacementDchar enum dchar replacementDchar: Inserted in place of invalid UTF sequences. Its value is '\uFFFD' (U+FFFD REPLACEMENT CHARACTER).
