an input range of characters (including strings) or a type that implicitly converts to a string type.
If r is not an auto-decodable string (an auto-decodable string being a narrow string or a user-defined type that implicitly converts to a string type), then r is returned.
Otherwise, r is converted to its corresponding string type (if it's not already a string) and wrapped in a random-access range whose element type is the string's element encoding type (its code unit), and that range is returned. The range supports slicing.
If r is quirky enough to be a struct or class which is an input range of characters on its own (i.e. it has the input range API as member functions), and it's implicitly convertible to a string type, then r is returned, and no implicit conversion takes place.
If r is wrapped in a new range, then that range has a source property for returning the string that's currently contained within that range.
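The return rules above can be checked at compile time. The following is a minimal sketch, assuming a compiler with autodecoding enabled for the narrow-string case (guarded by `isAutodecodableString`); the variable names are illustrative only.

```d
import std.range : only;
import std.traits : isAutodecodableString;
import std.utf : byCodeUnit;

void main()
{
    // A dstring is not auto-decodable, so it comes back unchanged.
    dstring wide = "hello"d;
    static assert(is(typeof(wide.byCodeUnit) == dstring));
    assert(wide.byCodeUnit is wide);

    // A non-string range of characters is also passed through as is.
    auto chars = only('a', 'b', 'c');
    static assert(is(typeof(chars.byCodeUnit) == typeof(chars)));

    // A narrow string is wrapped in a new range, which exposes `source`.
    static if (isAutodecodableString!string)
    {
        auto wrapped = "hello".byCodeUnit;
        static assert(!is(typeof(wrapped) == string));
        assert(wrapped.source == "hello");
    }
}
```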
import std.range.primitives;
import std.traits : isAutodecodableString;
import std.utf : byCodeUnit;

auto r = "Hello, World!".byCodeUnit();
static assert(hasLength!(typeof(r)));
static assert(hasSlicing!(typeof(r)));
static assert(isRandomAccessRange!(typeof(r)));
static assert(is(ElementType!(typeof(r)) == immutable char));

// contrast with the range capabilities of standard strings (with or
// without autodecoding enabled).
auto s = "Hello, World!";
static assert(isBidirectionalRange!(typeof(s)));
static if (isAutodecodableString!(typeof(s)))
{
    // with autodecoding enabled, strings are non-random-access ranges of
    // dchar.
    static assert(is(ElementType!(typeof(s)) == dchar));
    static assert(!isRandomAccessRange!(typeof(s)));
    static assert(!hasSlicing!(typeof(s)));
    static assert(!hasLength!(typeof(s)));
}
else
{
    // without autodecoding, strings are normal arrays.
    static assert(is(ElementType!(typeof(s)) == immutable char));
    static assert(isRandomAccessRange!(typeof(s)));
    static assert(hasSlicing!(typeof(s)));
    static assert(hasLength!(typeof(s)));
}
byCodeUnit does no Unicode decoding
import std.utf : byCodeUnit;

string noel1 = "noe\u0308l"; // noël using e + combining diaeresis
assert(noel1.byCodeUnit[2] != 'ë');
assert(noel1.byCodeUnit[2] == 'e');

string noel2 = "no\u00EBl"; // noël using a precomposed ë character
// Because string is UTF-8, the code unit at index 2 is just
// the first of a sequence that encodes 'ë'
assert(noel2.byCodeUnit[2] != 'ë');
byCodeUnit exposes a source property when wrapping narrow strings.
import std.algorithm.comparison : equal;
import std.range : popFrontN;
import std.traits : isAutodecodableString;
import std.utf : byCodeUnit;
{
    auto range = byCodeUnit("hello world");
    range.popFrontN(3);
    assert(equal(range.save, "lo world"));
    static if (isAutodecodableString!string) // only enabled with autodecoding
    {
        string str = range.source;
        assert(str == "lo world");
    }
}
// source only exists if the range was wrapped
{
    auto range = byCodeUnit("hello world"d);
    static assert(!__traits(compiles, range.source));
}
Refer to the std.uni docs for a reference on Unicode terminology.
For a range that iterates by grapheme cluster (written character) see std.uni.byGrapheme.
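To make the difference between the iteration levels concrete, here is a small sketch that counts the same text three ways. It assumes the default configuration in which `std.uni.byGrapheme` accepts a `string` directly; `byDchar` is used for the code point count so that it does not depend on autodecoding.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

void main()
{
    // "noël" spelled with 'e' followed by a combining diaeresis (U+0308)
    string noel = "noe\u0308l";

    // U+0308 takes two UTF-8 code units; every other letter takes one.
    assert(noel.byCodeUnit.length == 6);

    // Five code points: n, o, e, U+0308, l
    assert(noel.byDchar.walkLength == 5);

    // Four grapheme clusters: 'e' and U+0308 combine into one.
    assert(noel.byGrapheme.walkLength == 4);
}
```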
Iterates a range of char, wchar, or dchar by code unit.
The purpose is to bypass the special-case decoding that std.range.primitives.front performs on character arrays. As a result, code that uses ranges with byCodeUnit can be nothrow, while std.range.primitives.front throws when it encounters invalid Unicode sequences.
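The nothrow point can be illustrated with a short sketch. `countCodeUnits` is a hypothetical helper written for this example, not part of Phobos, and the `assertThrown` check only applies while autodecoding is enabled.

```d
import std.exception : assertThrown;
import std.range.primitives : front;
import std.traits : isAutodecodableString;
import std.utf : byCodeUnit, UTFException;

// Hypothetical helper: walks the code units without decoding them,
// so the function can carry the nothrow attribute.
size_t countCodeUnits(const(char)[] s) @safe pure nothrow @nogc
{
    size_t n;
    foreach (c; s.byCodeUnit)
        ++n;
    return n;
}

void main()
{
    // The byte 0xFF never appears in well-formed UTF-8.
    const(char)[] bad = ['a', '\xFF', 'b'];

    // No decoding takes place, so the invalid byte is just another element.
    assert(countCodeUnits(bad) == 3);

    static if (isAutodecodableString!(typeof(bad)))
    {
        // front tries to decode a code point and rejects the invalid byte.
        assertThrown!UTFException(bad[1 .. $].front);
    }
}
```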
A code unit is a building block of the UTF encodings. Generally, an individual code unit does not represent what's perceived as a full character (a.k.a. a grapheme cluster in Unicode terminology). Many characters are encoded with multiple code units. For example, the UTF-8 code units for ø are 0xC3 0xB8. That means an individual element of byCodeUnit often does not form a character on its own. Attempting to treat it as one while iterating over the resulting range will give nonsensical results.
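The ø example above can be verified directly. This is a minimal sketch; the character is written as the escape `\u00F8` so the result does not depend on the source file's encoding.

```d
import std.utf : byCodeUnit;

void main()
{
    // ø (U+00F8) as UTF-8: two code units, 0xC3 followed by 0xB8
    string s = "\u00F8";
    auto units = s.byCodeUnit;
    assert(units.length == 2);
    assert(units[0] == 0xC3 && units[1] == 0xB8);

    // The same character as UTF-16 fits in a single code unit.
    wstring w = "\u00F8"w;
    assert(w.byCodeUnit.length == 1);
    assert(w.byCodeUnit[0] == 0x00F8);
}
```

Neither of the two UTF-8 elements is a character by itself, which is why indexing into the range should be treated as access to encoding units rather than to text.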