A potential GC puzzler discovered

Posted 2021-09-20

I was going to write about minigui grid, but instead a discussion came up that looked like a codegen bug and I had to help with the investigation.

Core D Development Statistics

In the community

Community announcements

See more at the announce forum.

My investigation

The problem

Consider the following code:

import std.stdio;
import core.memory;

class C {
    ~this() { writeln("dtor"); }
}
void main()
{
    auto c = new C;
    GC.collect();
    GC.collect();
    writeln("here");
}

Do those GC.collects collect c? You might think no, because c is still in scope when GC.collect is called; that's what I thought too. But turns out the answer is actually "sometimes". Let me show you what dmd -m32 makes out of that on Linux:

Dump of assembler code for function _Dmain:
   0x0807f88c <+0>:     push   ebp
   0x0807f88d <+1>:     mov    ebp,esp
   0x0807f88f <+3>:     sub    esp,0x8
   0x0807f892 <+6>:     sub    esp,0xc
   0x0807f895 <+9>:     mov    eax,0x80d0210
   0x0807f89a <+14>:    push   eax
   0x0807f89b <+15>:    call   0x8083100 <_d_newclass>
   0x0807f8a0 <+20>:    add    esp,0x10
   0x0807f8a3 <+23>:    call   0x80812f0 <_D4core6memory2GC7collectFNbZv>
   0x0807f8a8 <+28>:    call   0x80812f0 <_D4core6memory2GC7collectFNbZv>
   0x0807f8ad <+33>:    sub    esp,0x8
   0x0807f8b0 <+36>:    mov    ecx,0x80bb029
   0x0807f8b5 <+41>:    push   ecx
   0x0807f8b6 <+42>:    push   0x5
   0x0807f8b8 <+44>:    call   0x807f8e4 <_D3std5stdio__T7writelnTAyaZQnFNfQjZv>
   0x0807f8bd <+49>:    add    esp,0x8
   0x0807f8c0 <+52>:    xor    eax,eax
   0x0807f8c2 <+54>:    leave
   0x0807f8c3 <+55>:    ret

As a piece of context, keep in mind that eax stores function return values.

So if we stepped through this to right after the _d_newclass call, you could find that eax will be set to something like 0xf7b13000, which is the address of the new class we just created.

But now you'll notice there is no store of that eax anywhere before the GC collect call. Tons of operations are liable to clear that register out; it is quite temporary and if not explicitly saved it isn't going to stay around. Indeed, GC.collect clears eax very early on just as an incidental process of loading the GC proxy pointer.

The only reason why there's two GC.collects here is that the first one doesn't actually collect.... and to be honest, I'm not sure why it doesn't, since the register is wiped before it is scanned. Maybe it is still left in an unreferenced stack frame from the new class call that isn't wiped until after another function. I'm able to partially confirm that by replacing the first GC.collect with a call to this:

void foo() {
        ubyte[1000] spam;
}

(If the optimizer was turned on, it would of course remove this call as useless, but for this particular test, I'm not optimizing meaning that spam = 0 auto-initializer is still there. Reproducing these bugs are tricky because adding debug info and changing optimization settings can hide or show the problem.)

So yeah, there's just another reference on the stack that is not yet overwritten when the first collect happens. The function then overwrites it in the process of its work, so the second collect now finds it unreferenced and collects the instance.

Is this a problem?

In most cases, this doesn't matter. The variable is never used again and the compiler knows it, meaning it has no need to actually store it. Sure, it is freed, but since it is never used again, the user will never notice unless they are tracking GC finalizers like this.

Any use of the variable after the GC collect call will cause the compiler to output a store to the stack which the GC will not stomp. If it was a struct destructor, the timing of that is well defined, so the compiler will keep it around for that too.

0x0807f8d7 <+15>:    call   0x8083190 <_d_newclass>
0x0807f8dc <+20>:    add    esp,0x10
0x0807f8df <+23>:    mov    DWORD PTR [ebp-0x4],eax

That mov instruction there is storing to the stack as a local variable.

Eliminating the mov before is considered an optimization by the compiler, which is why it is so finicky and only manifests in certain times. gdc -O eliminated the mov, but plain gdc didn't. dmd -m32 did, but dmd -g -m32 did not.

But so far, the variable is not used again, so nothing matters. However, consider this:

import std.stdio;
import core.memory;

import core.stdc.stdlib;

class C {
    ~this() { writeln("dtor"); }
    void use() { writeln("used ", cast(void*) this); }
}

void main()
{
    void** magic = cast(void**) malloc((void*).sizeof);
    auto c = new C;
    //c.use();
    *magic = cast(void*) c;
    GC.collect();
    GC.collect();

    auto c2 = cast(C) *magic;
    c2.use();
    writeln("here ");
}

That commented c.use line makes a difference - if you use the variable, dmd will not elide that mov, meaning it gets saved and NOT gc'd. You'll see it print the same address twice.

But if you comment that first c.use... now you get a segmentation fault. Why? Well, since the local register got clobbered and the GC can't see something hidden inside malloc (unless you tell it to look there with GC.addRoot or GC.addRange), we smuggled a reference out behind the compiler's back. The harmless, invisible optimization just turned into a mysterious runtime crash.

This is a reduced sample, of course nobody would write it like this exactly, but this is the kind of thing that can happen when you pass pointers to C functions. It is fairly rare - generally the act of passing it to a C function means a copy is made on the local stack, and the GC only ever triggers when you call it, which C functions rarely do - but it can happen. If you pass it down a pipe or similar event system, pointers can be hidden from the GC in a kernel buffer, and it is easy to assume that because you have the local variable in D still, the GC must be able to see it.

The D spec even suggests using this very pattern to keep the reference alive. Yet the compilers can optimize that store away...

So, is this a bug?

Well, saying is a problem is one thing, but is it a bug?

First, let me define how I declare bugs: I do it in a bit of a legalistic way, and I believe the compiler is entitled to a defense. By "legalistic", I mean I try to find exact wording in the spec that is violated by this behavior. I will quote a section of text in my indictment. But by entitled to a defense, I'll try to argue how that doesn't apply or find wiggle room in there to allow it.

What that means is my label of "bug" might not be too important. Some things are buggy but useful and some things are not bugs but not helpful either. In some cases, the spec is buggy and in some the implementation is buggy.

But one thing about my approach is it does hint at cross-compiler issues. If the spec allows it, a reasonable alternate compiler might do it. In this case, it is a codegen difference, but we can reproduce with the different backends - dmd and gdc don't share this part of the code. So that shows it is probably generally a reasonable thing to do as it has independent implementations.

Indeed, the spec gives flexibility on when class finalizers are run. So I don't think this is a bug per se... but the spec also implies the local variable pins it for the whole lexical scope, which is clearly false in practice given the presence of this optimization. And indeed, this optimization here is no big deal, but it could be relevant in other places, so this kind of flexibility is useful.

So... I lean toward innocent until proven guilty. I don't think this is a bug right now, unless we want to point fingers at the spec's suggestion on interfacing with C.

But wow this can be a subtle problem when it comes up. Its rareness makes it almost worse since when it almost never happens, when it does happen, you think you've lost your mind.

So I don't know. Another blog ending that way. At least I understand the problem even if the correct solution eludes me at this time.

Thanks

Special thanks to the folks on the forum and chatrooms Monday who reduced this problem and talked it through as a group.