Write barriers could work

Posted 2022-10-31

I sketch out some thoughts on how D could easily and efficiently add write barriers to enable more GC strategies without affecting other code, analogizing them to bounds checks.

Core D Development Statistics

In the community

Community announcements

See more at the announce forum.

Thoughts on pointer barriers

GC talk came up again and I got thinking a little bit about pointer barriers in the compiler. I don't think it is helpful to think of them as new types per se, but rather as something akin to range checks in slices - a facility in the normal type that adds safety and capability, but can be bypassed both locally and globally if necessary for special cases.

You still have the same pointers D has today. The only difference to the language itself is adding a way to bypass the GC write barrier, probably some kind of core.memory.__raw(T) which would look something like struct __raw(T) { T ptr; alias ptr this; auto opAssign(T rhs) { /* compiler magic to bypass the bounds check */ } }. Note the alias this - if you did actually have one of these, it'd implicitly convert right back to the normal pointer. On the other hand, bypassing the check is always an explicit act at construction time. Its only real job is to indicate that you want to bypass the normal check here, you might even use it similarly to slice.ptr[0] to locally bypass a bounds check. Important to realize that you never have to actually use this; it is strictly an optional performance enhancement for special cases, just like how you can bypass a bounds check.

In the compiler, again analogous to bounds checking, you can add -barrier=[none|writes] as a command line switch to control it globally. Certain GC implementations would not be available if the barriers are not compiled in.

The barrier itself would be delegated to a druntime function or a llvm intrinsic, or whatever is appropriate for the compiler and garbage collector. (This can make swapping out collectors at runtime an additional performance hit, since that'd imply the barrier is also a function pointer, but all these are reasonable decisions to look at.) The function would most likely look like core.memory.__ptr_write(void** where, void* what) so give the implementation maximum flexibility. With intrinsics and inlining, when the barriers are turned off, this would surely reduce to mov [where], what; anyway.

A few things to note:

  • A const pointer never needs a write barrier because it is never written. This doesn't need any special attention in the front end since const T* a; a = something; doesn't compile anyway.
  • A self-assign pointer, e.g. int* a; a++; or int[] a; a = a[1 .. $];, doesn't need a write barrier. Why? Because the goal of this is to protect race conditions that change the GC's behavior, and the GC only cares about which block it points to... and these operations are never permitted to change which block it points to. The slice would fails a bounds check if it went outside, and even the raw pointer going outside its original allocation block is undefined behavior according to C rules. You'd just have to make sure the actual pointer replacement is atomic; make sure it actually generates inc ptr for example instead of a read-modify-write. I'm pretty sure the compiler already does this anyway, but it'd have to be specified to be sure.

    The compiler frontend doesn't even need to do anything special for this case either btw, since the backend can detect these self-modify cases and elide the check.

    I expect this would mean a great many D functions don't generate the barrier at all, even when it is enabled.... unless I'm missing something important. The barriers happen when you reassign pointers to another block.

  • It might be possible to elide barriers for borrowed references too, since you know there's a canonical reference somewhere else that the GC will see. Again, the key question is: would this write affect the GC's behavior in the middle of a run? If so, you need to protect it. If not, you can skip it. Just better safe than sorry; an extra barrier you didn't need is a small performance hit. A missing barrier you needed is a major runtime bug.
  • These elided things combined with conservative stack scanning likely still exclude copying/moving GC implementations. But it should be enough to enable certain types of generational GCs and most implementations of incremental and concurrent GCs. Of course, you might want a command line switch to generate more code to enable those other schemes too, but I'm trying to sketch out the simplest thing that could possibly work.
  • All this applies only to direct writes to pointers, NOT to writes done through pointers.
    char* a;
    a = x; // this gets a barrier
    *a = x; // this does not

    It is only the mutation of the pointer value that matters. The GC doesn't care about the value of a random char, only about which memory blocks are referenced and unreferenced.

EDIT

I posted this but didn't explain some of the practical benefits of these different GC impls! So here's some very brief things:

Right now, the GC mark phase stops all threads so it doesn't get into race conditions, when it marks one thing then it changes while the GC is working elsewhere. With the write barriers, you can avoid stopping a thread until it tries to actually write a pointer value

So say you have an audio thread that is consuming GC resources but only writing to an existing ushort[] buffer... its activities will never invalidates the GC's work-in-progress so it can just keep going. This would just work without explicit user work, no need to unregister it.

Another work thread may be processing some data then will send the buffer to a function. This one MIGHT be able to keep running through the whole thing, depending on if the borrowed pointer thing actually worked, but a simpler and likely less buggy implementation will probably left it finish processing the data then only actually stop when it copies the buffer pointer to the function, which would trigger the write barrier. And by then, the GC might be almost done, meaning the actual time stopped is much smaller than the current implementation.

DONE WITH EDIT

Additional reading: http://www.infognition.com/blog/2014/the_real_problem_with_gc_in_d.html is an article from 2014 about how generational GCs can provide a major win and absolutely require write barriers to implement. I'd note that generational GCs are often also copying, since this lets them reduce memory fragmentation when promoting items to older generations, thus reducing wasted memory. But it isn't strictly necessary; you can still free the unreferenced ones and then promote the whole area to older generation, allocating a new pool for new young objects. Not ideal but definitely doable.

Also, a post about the GC implementation in particular 4 years ago: https://web.archive.org/web/20180622071748/https://olshansky.me//gc/runtime/dlang/2017/06/14/inside-d-gc.html I don't know how much of that has changed since then.

I actually find today's GC perfectly good enough, but improvements are generally good anyway and I think the write barriers would be a net again without being particularly invasive in user code.

Next week, I'll talk about how to optimize your objects to reduce GC pauses in today's D with a general understanding the NO_SCAN flag. In short, arrays without pointers are not scanned at all and you can use this to your advantage. I'll also touch upon strategic deferrals of collections when you know you are in a pure allocation loop.