byUTF

Iterates an input range of characters as code units of char type C, re-encoding the elements of the range as needed.

UTF sequences that cannot be converted to the specified encoding are either replaced by U+FFFD, per "5.22 Best Practice for U+FFFD Substitution" of the Unicode Standard 6.2, or result in a thrown UTFException. Hence byUTF is not symmetric. The algorithm is lazy and does not allocate memory. The @nogc, pure, nothrow, and @safe attributes are inferred from the r parameter.
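As a minimal sketch of the lazy, allocation-free behavior described above, the hypothetical helper `utf16Length` (not part of std.utf) counts the UTF-16 code units of a UTF-8 string without building a converted copy. It assumes the default replacement mode, under which the attributes are expected to be inferred as shown. The last assertion illustrates one reading of "not symmetric": an invalid input byte comes back as the encoding of U+FFFD, not as the original byte.

```d
import std.utf : byChar, byUTF;

// Hypothetical helper: number of UTF-16 code units needed to encode `s`.
// byUTF transcodes one code point at a time, so no buffer is allocated.
size_t utf16Length(const(char)[] s) @safe pure nothrow @nogc
{
    size_t n = 0;
    foreach (unit; s.byUTF!wchar())
        ++n;
    return n;
}

unittest
{
    import std.algorithm.comparison : equal;

    assert(utf16Length("hell\u00F6") == 5);  // ö fits in one UTF-16 code unit
    assert(utf16Length("\U00010437") == 2);  // encoded as a surrogate pair

    // Transcoding to UTF-32 and back to UTF-8 does not restore invalid input:
    // the bad sequence has already been replaced by U+FFFD.
    auto roundTrip = "hello\xF0betty".byChar.byUTF!dchar.byUTF!char;
    assert(roundTrip.equal("hello\uFFFDetty".byChar));
}
```

The helper is marked `@nogc nothrow` only because replacement mode never throws. With UseReplacementDchar.no the same loop could throw UTFException.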

template byUTF(C, UseReplacementDchar useReplacementDchar = Yes.useReplacementDchar)
auto ref byUTF(R)(R r)

Parameters

C

char, wchar, or dchar

useReplacementDchar

UseReplacementDchar.yes means invalid UTF is replaced with replacementDchar; UseReplacementDchar.no means a UTFException is thrown for invalid UTF.
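The two modes can be sketched with a small hypothetical helper, `decodesCleanly` (a name made up for this illustration). It walks the input in throwing mode and reports whether any invalid sequence was hit. Because byUTF is lazy, the exception surfaces while iterating, not when the range is constructed. The sketch also assumes that `Yes.useReplacementDchar` and `UseReplacementDchar.yes` are two spellings of the same flag value.

```d
import std.typecons : No, Yes;
import std.utf : byChar, byUTF, UseReplacementDchar, UTFException;

// The Flag shorthand and the enum member are assumed to name the same value.
static assert(Yes.useReplacementDchar == UseReplacementDchar.yes);
static assert(No.useReplacementDchar == UseReplacementDchar.no);

// Hypothetical helper: true if `s` decodes to UTF-32 without errors.
bool decodesCleanly(const(char)[] s)
{
    try
    {
        // Throwing mode: decoding happens lazily, inside the loop.
        foreach (c; s.byChar.byUTF!(dchar, No.useReplacementDchar)) {}
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

unittest
{
    assert(decodesCleanly("hell\u00F6"));
    assert(!decodesCleanly("hello\xF0betty"));
}
```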

Return Value

A forward range if R is a range and not auto-decodable, as defined by std.traits.isAutodecodableString, and if the base range is also a forward range.

Or, if R is a range and it is auto-decodable and is(ElementEncodingType!typeof(r) == C), then the range is passed to byCodeUnit.

Otherwise, an input range of characters.
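The range categories above can be checked at compile time. The following is a sketch only: `InputOnly` is a hypothetical single-pass range written for this illustration, and the assertions reflect the cases as documented here, not a guarantee about every Phobos version.

```d
import std.range.primitives : isForwardRange, isInputRange;
import std.utf : byCodeUnit, byUTF;

// Hypothetical input-only range of chars (it has no `save`).
struct InputOnly
{
    string data;
    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

unittest
{
    import std.algorithm.comparison : equal;

    // Base range is a forward range and not auto-decodable: result is a forward range.
    auto fwd = "hello".byCodeUnit.byUTF!wchar();
    static assert(isForwardRange!(typeof(fwd)));
    auto saved = fwd.save;
    fwd.popFront();
    assert(fwd.front == 'e');
    assert(saved.front == 'h');   // the saved copy is unaffected

    // Auto-decodable string whose code unit type already matches C:
    // the elements are those produced by byCodeUnit.
    assert("hell\u00F6".byUTF!char().equal("hell\u00F6".byCodeUnit()));

    // Input-only base range: the result is an input range but not a forward range.
    auto single = InputOnly("hi").byUTF!wchar();
    static assert(isInputRange!(typeof(single)));
    static assert(!isForwardRange!(typeof(single)));
    assert(single.equal("hi"));
}
```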

Throws

UTFException if an invalid UTF sequence is encountered and useReplacementDchar is set to UseReplacementDchar.no

GC: Does not use GC if useReplacementDchar is set to UseReplacementDchar.yes

Examples

import std.algorithm.comparison : equal;
import std.utf : byUTF;

// hellö as a range of `char`s, which are UTF-8
assert("hell\u00F6".byUTF!char().equal(['h', 'e', 'l', 'l', 0xC3, 0xB6]));

// `wchar`s are able to hold the ö in a single element (UTF-16 code unit)
assert("hell\u00F6".byUTF!wchar().equal(['h', 'e', 'l', 'l', 'ö']));

// 𐐷 is four code units in UTF-8, two in UTF-16, and one in UTF-32
assert("𐐷".byUTF!char().equal([0xF0, 0x90, 0x90, 0xB7]));
assert("𐐷".byUTF!wchar().equal([0xD801, 0xDC37]));
assert("𐐷".byUTF!dchar().equal([0x00010437]));
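
// Invalid UTF-8 is either replaced with U+FFFD or reported via UTFException
import std.algorithm.comparison : equal;
import std.exception : assertThrown;
import std.utf : byChar, byUTF, UseReplacementDchar, UTFException;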

assert("hello\xF0betty".byChar.byUTF!(dchar, UseReplacementDchar.yes).equal("hello\uFFFDetty"));
assertThrown!UTFException("hello\xF0betty".byChar.byUTF!(dchar, UseReplacementDchar.no).equal("hello betty"));
