My thoughts on bitfields and recap of binary literals

Posted 2022-09-12

A misconception from DConf led to a thread on the forum about what it means to simplify D, and I separately wrote up some thoughts on how bitfields ought to work.

In the community

Binary literals thread

At DConf, Walter incorrectly claimed that binary literals were deprecated from the language as an example of a successful simplification of the core language. Binary literals were never actually deprecated, so a viewer took this as a plan to do so in the future and created a thread saying "please don't".

Walter doubled down, claiming the feature is useless and removing it would simplify D. Several people came out saying they use the literals, some searched public code repos and found several uses, but he stood firm. I mentioned that this kind of thing is going to lead to a fork. Why? Not this decision in particular, but rather the problems behind it.

There are two underlying issues. First, there's no investigation with actually existing D users. In fact, there's active disdain for us - people in that thread saying "we use it" continued to get replies of "nobody uses it, remove it". Second, there's no consistent theory behind the actions. They say "let's simplify D" when it comes to changing small, isolated features, yet simultaneously keep adding actual complexity in the form of features that interact with other features and have knock-on effects across the rest of the language.

Superficial simplification isn't of much value, especially when it goes against actually existing D users' value proposition. D, as a language, used to agree that some complexity is inherent to programs, and if it isn't tackled through the language, it still exists, just now in user code. It's worth remembering there's some legit truth in that.
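To make the "it still exists, just now in user code" point concrete, here's a rough sketch of the same flag mask written three ways. The `bin` template and `parseBinary` function are hypothetical helpers made up for this example, not anything in Phobos; they're loosely modeled on how `std.conv.octal` took over from octal literals.

```d
// Hypothetical user-code stand-in for a binary literal (not part of Phobos).
// It parses a string of '0', '1' and '_' characters at compile time.
template bin(string digits)
{
    enum ulong bin = parseBinary(digits);
}

ulong parseBinary(string digits)
{
    ulong value = 0;
    foreach (char c; digits)
    {
        if (c == '_')
            continue; // digit separator, same role as in 0b1000_0000
        assert(c == '0' || c == '1', "not a binary digit");
        value = (value << 1) | (c - '0');
    }
    return value;
}

enum ubyte readyFlag = 0b1000_0000;    // built-in binary literal
enum ubyte readyHex  = 0x80;           // same value, bit position less obvious
enum ubyte readyLib  = bin!"1000_0000"; // same value, parser lives in user code

static assert(readyFlag == readyHex && readyHex == readyLib);

void main() {}
```

Removing the literal wouldn't remove the need; it would just move a small parser like this into every codebase that wants readable bit patterns.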

Bitfields

Walter implemented bitfields for ImportC, matching what C compilers do, then tried to transition this to D. Several of us objected, and it currently exists behind a preview switch. I don't ever want to see C bitfields added to D. But I wouldn't mind some form of language construct for the concept.

C bitfields are bad

Even in pure C code, C bitfields are bad. They're only good for one thing: getting the compiler's help in packing private, internal-use-only structs into more compact memory layouts. This is because the layout is implementation-defined - the compiler is free to pack the data where it fits, not to match any particular positioning, so it can't be used to match an external structure like hardware registers. Compilers can change the layout between versions too if they choose, which is why it is only good for internal things: a public ABI would get no guarantees.

Additionally, the syntax and type rules are outright bizarre. The type you give doesn't behave the way a type does on any other variable declaration. It is used to determine signedness and default alignment - not size or much of anything else. Things get int-promoted in C anyway, so the type doesn't have much bearing on what it pretends to be either.

When you mix this with D, things get even weirder. D has reflection. What happens when you ask for all members? .tupleof? Or .init? How would you reflect over the bitmasks? How does it interact with value range propagation? D requires explicit casts when bits are discarded, so how does this interact with the narrowing conversions rule? C doesn't have many answers. Even things C does have answers to don't necessarily translate to D, like sizeof, offsetof, typeof, taking the address of the field, and the alignment rules. What works for C doesn't necessarily work for D because of feature interactions.
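For comparison, here's what those questions look like for an ordinary field today. The `Header` struct is a made-up example; the point is only that every one of these queries has a well-defined answer for a normal member, and a sub-byte field would need answers for each of them too.

```d
struct Header
{
    uint payloadSize;
    ubyte flags;
}

// Each query below has a well-defined answer for an ordinary field.
static assert(Header.tupleof.length == 2);
static assert([__traits(allMembers, Header)] == ["payloadSize", "flags"]);
static assert(Header.flags.offsetof == 4);
static assert(Header.flags.sizeof == 1);
static assert(is(typeof(Header.flags) == ubyte));
static assert(Header.init.flags == 0);

void main()
{
    Header h;
    ubyte* p = &h.flags; // a whole field has an address; a 3-bit slice of a byte does not
    *p = 0x05;
    assert(h.flags == 0x05);

    int wide = 300;
    // Narrowing needs an explicit cast; the implicit form is rejected.
    static assert(!__traits(compiles, h.flags = wide));
    h.flags = cast(ubyte) wide;
    assert(h.flags == 44); // 300 & 0xFF
}
```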

Adding C bitfields to D is a recipe for a lot of bugs and not very much use value.

D bitfields could be good

All that said, I'm not necessarily against adding something to D. The Phobos bitfields template solves some of the C problems - at least its layout is defined and since it generates D code, its reflection... well it still isn't great since it is generated code, but it at least follows the D rules. The syntax is a bit weird too, since the types are used for return values of the getter properties, but not necessarily any other aspect of the type for storage in the struct. This isn't all bad, but we may be able to do better.
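For reference, this is roughly what the Phobos approach looks like in use. The struct follows the shape of the example in the `std.bitmanip` documentation; the comments about what reflection sees are my reading of the generated code, so treat them as a sketch rather than a spec.

```d
import std.bitmanip : bitfields;

struct A
{
    int a;
    // Fields are allocated starting from the least significant bit,
    // and the widths must sum to 8, 16, 32 or 64.
    mixin(bitfields!(
        uint, "x",    2,
        int,  "y",    3,
        uint, "z",    2,
        bool, "flag", 1));
}

// Reflection sees the generated code: one hidden storage field next
// to `a`, with x/y/z/flag showing up as property functions instead.
static assert(A.tupleof.length == 2);

void main()
{
    A obj;
    obj.x = 2;
    obj.y = -3;      // the getter returns int, but storage is only 3 bits
    obj.flag = true;
    assert(obj.x == 2);
    assert(obj.y == -3);
    assert(obj.z == 0);
    assert(obj.flag);
}
```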

I haven't thought it through too much, but my idea is something along the lines of breaking down an existing field into a list of bits. Hypothetically:

// hypothetical syntax, not valid D today
ulong fields {
    msb : 1,
    _reserved : 62,
    lsb : 1,
};

So you get the bitfields contained in the struct in a defined way by breaking down an existing field. The existing field gets its normal alignment, sizeof, reflection from existing code, etc.

Inside the braces, all the bits must be assigned; the bit counts must add up to type.sizeof * 8. Hence, listing the reserved bits explicitly is required.

The order of them is defined strictly from msb to lsb from top to bottom.
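As a sketch of what those rules would amount to, here's a hand-written equivalent in today's D. The `Packet` struct and its accessor bodies are assumptions for illustration; the hypothetical language feature would presumably generate something similar, but nothing here is an actual design.

```d
struct Packet
{
    ulong fields; // ordinary field: normal alignment, sizeof and reflection

    // Widths listed msb-first, mirroring the hypothetical declaration.
    enum msbBits = 1;
    enum reservedBits = 62;
    enum lsbBits = 1;

    // Every bit must be accounted for, reserved ones included.
    static assert(msbBits + reservedBits + lsbBits == ulong.sizeof * 8);

    // msb is the first entry, so it sits at bit 63; lsb is last, at bit 0.
    @property bool msb() const { return ((fields >> 63) & 0x1) != 0; }
    @property bool lsb() const { return (fields & 0x1) != 0; }

    @property void msb(bool v)
    {
        fields = (fields & ~(1UL << 63)) | (cast(ulong) v << 63);
    }

    @property void lsb(bool v)
    {
        fields = (fields & ~1UL) | (v ? 1 : 0);
    }
}

static assert(Packet.sizeof == ulong.sizeof);

void main()
{
    Packet p;
    p.msb = true;
    assert(p.fields == 0x8000_0000_0000_0000);
    p.lsb = true;
    assert(p.fields == 0x8000_0000_0000_0001);
    p.msb = false;
    assert(p.fields == 1 && p.lsb && !p.msb);
}
```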

We'd add new reflection traits to pull these fields. Perhaps you can check something like is(field == __bits) to identify it, then call allMembers to get the things inside, and fetch the bit count from there, or maybe sizeof of a bits inner member reports in bits. Since none of this is returned unless you do something previously illegal - ask for members of an integral field - it lets us define some new details while maintaining sane compatibility with everything else, which just sees it as a ulong field (or whatever). We'd probably also want something like __traits(bitmask) for a field so you can create your own accessor properties.
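To give a feel for what a bitmask trait could hand back, here's a library-level approximation. `BitField`, `layout` and `bitmaskOf` are hypothetical names invented for this sketch; the real thing would come from the compiler rather than a hand-maintained table.

```d
// One entry per sub-field, listed msb-first. Hypothetical stand-in for
// whatever the new reflection traits would return.
struct BitField
{
    string name;
    uint width;
}

enum BitField[] layout = [
    BitField("msb", 1),
    BitField("_reserved", 62),
    BitField("lsb", 1),
];

// Rough equivalent of the proposed __traits(bitmask): the mask of the
// named sub-field inside a ulong, computed from the msb-first layout.
ulong bitmaskOf(const BitField[] fields, string name)
{
    uint top = 64; // bits not yet assigned, counting down from the msb
    foreach (f; fields)
    {
        top -= f.width;
        if (f.name == name)
        {
            // Build `width` ones without ever shifting by 64, then move into place.
            immutable ulong ones = f.width == 64 ? ulong.max : (1UL << f.width) - 1;
            return ones << top;
        }
    }
    assert(0, "no such bit field: " ~ name);
}

static assert(bitmaskOf(layout, "msb") == 0x8000_0000_0000_0000);
static assert(bitmaskOf(layout, "_reserved") == 0x7FFF_FFFF_FFFF_FFFE);
static assert(bitmaskOf(layout, "lsb") == 0x1);

void main()
{
    ulong fields = 0;
    enum mask = bitmaskOf(layout, "msb");
    fields |= mask; // a user-written "setter" built from the computed mask
    assert((fields & mask) != 0);
    assert((fields & bitmaskOf(layout, "lsb")) == 0);
}
```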

We might also want some types inside the braces to act as the return types of the accessors. Otherwise, it would use the outer type. Similarly, we might consider a thing with signedness, but I'd prefer to make people cast that in accessors and require the containing thing be unsigned. Value range propagation - since accessing one of these works through bitmasking - ought to already work. If you did ubyte a = fields.msb; it should allow it. fields is a ulong, sure, but fields.msb is going to do a (fields >> 63) & 0x1 operation, which VRP already understands. These things should just work.
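The value range propagation part can be checked against today's compiler, since it only depends on the shift-and-mask expression. A minimal sketch, with `topBit` being a made-up helper name:

```d
// The return expression is typed ulong, but its range is known to be
// 0..1, so it converts to ubyte without a cast.
ubyte topBit(ulong fields)
{
    return (fields >> 63) & 0x1;
}

void main()
{
    ulong fields = 0x8000_0000_0000_0001;

    ubyte a = (fields >> 63) & 0x1; // accepted: VRP proves the value fits
    assert(a == 1);

    // A 3-bit slice fits in a ubyte as well.
    ubyte low3 = fields & 0x7;
    assert(low3 == 1);

    // Without a tight enough shift or mask, up to 63 bits survive,
    // so the implicit narrowing is rejected.
    static assert(!__traits(compiles, a = fields >> 1));

    assert(topBit(fields) == 1);
    assert(topBit(1) == 0);
}
```

One caveat from this sketch: a library accessor declared as returning ulong would hide that range information from the caller, which is part of why doing the masking in the language itself is attractive.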

A fair question is whether you'd access it through obj.fields.msb or just through obj.msb. I prefer the former - you can mix in a reflection-based helper to forward the names if you want them on the outer object.
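Here's one way that opt-in forwarding might look. Everything in it is an assumption for illustration: `Bits` stands in for the broken-down field, and `ForwardBits` is a hypothetical mixin template that takes the names explicitly, where a fuller version could discover them through reflection.

```d
// Hypothetical helper: forwards each listed name to the `fields` member
// of whatever struct mixes it in.
mixin template ForwardBits(names...)
{
    static foreach (name; names)
    {
        mixin("auto " ~ name ~ "() const { return fields." ~ name ~ "; }");
    }
}

// Stand-in for a ulong broken down into named bits.
struct Bits
{
    ulong raw;
    bool msb() const { return ((raw >> 63) & 0x1) != 0; }
    bool lsb() const { return (raw & 0x1) != 0; }
}

struct Obj
{
    Bits fields;
    mixin ForwardBits!("msb", "lsb"); // opt in to the shorter obj.msb spelling
}

void main()
{
    Obj obj;
    obj.fields.raw = 1UL << 63;
    assert(obj.fields.msb); // explicit path, always available
    assert(obj.msb);        // forwarded by the mixin
    assert(!obj.lsb);
}
```

Keeping obj.fields.msb as the default means the outer struct's namespace stays clean unless the author asks for the shortcut.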

The bottom line here is I'd rather treat the bitfields as breaking down an existing field instead of being magic half-fields. Then we define existing reflection, field layout, etc. in terms of the existing field and offer new functions for getting the details of the bits. Something along these lines I think would actually benefit D.