Based on the code for array bounds checking, I implemented null pointer checking in the opend compiler recently: https://github.com/opendlang/opend/commit/9c4bdb8e6623625d1b299d0fb932282d3ad0a0e5 (then had to fixup the dmd side in a followup commit: https://github.com/opendlang/opend/commit/69a575803b77ce6e694661744294dc3420c1bf33 ). Also similar to array bounds checks, you can turn it off with -check=null=off command line switch to dmd, ldmd2, or the opend forwarder.
I'm also keen on adding a pragma to disable it on a per-function basis, but this isn't done yet.
This works by modifying the code gen to insert if(ptr is null) onNullPointerError(); before each operation that would dereference ptr. onNullPointerError will throw a NullPointerError. Just like RangeError, this is thrown before any invalid memory operation is actually performed, so you are permitted to catch it and carry on normally if you like; it does not invoke any undefined behavior in the language.
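To make the rewrite concrete, here is a hand-written sketch of roughly what a checked dereference behaves like. It is not the compiler's actual output: `SketchNullPointerError`, `onNullSketch`, and `checkedLoad` are made-up stand-ins so the example builds with any D compiler, and deriving the error class from `Error` (as `RangeError` does) is an assumption of the sketch.

```d
// Hypothetical stand-ins for illustration only; the real NullPointerError
// and onNullPointerError are provided by OpenD, not by this file.
class SketchNullPointerError : Error {
    this(string file = __FILE__, size_t line = __LINE__) {
        super("null pointer", file, line);
    }
}

void onNullSketch(string file = __FILE__, size_t line = __LINE__) {
    throw new SketchNullPointerError(file, line);
}

// What you write:          return *p;
// Roughly how it behaves with the check inserted:
int checkedLoad(int* p) {
    if (p is null) onNullSketch(); // runs before the dereference
    return *p;                     // only reached when p is non-null
}

unittest {
    int x = 42;
    assert(checkedLoad(&x) == 42);

    bool caught = false;
    try {
        int unused = checkedLoad(null);
    } catch (SketchNullPointerError e) {
        caught = true; // thrown before any invalid memory access happened
    }
    assert(caught);
}
```

Because the throw happens before the load, the catch block runs with the program still in a well-defined state, which is the property that makes catching and carrying on reasonable. The unittest runs with `-unittest -main` on a stock compiler.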
However, note that it is only added to D code (and is disabled inside druntime's internals too), so if you call into C code or something else compiled separately, they still work the same way as before and may trigger the normal cpu/os fault. I still recommend you try to do explicit checks in your code (again, just like how you shouldn't actually have RangeErrors in practice either), have a failover system so your service can recover from a crash, and be prepared to use a debugger as necessary, but this should help you make a better time of unexpected problems.
My implementation may have missed some spots; as we find them, we'll go back and patch it up. The current version already catches most common cases:
    import arsd.core; // for writeln

    import core.stdc.stdio; // for printf
    import core.exception; // for NullPointerError

    void main(string[] args) {
        try {
            int* a = null;
            int b = *a;
        } catch (NullPointerError re) {
            printf("*ptr\n");
            writeln(re);
        }


        /+ // this case is still broken on dmd
        try {
            Object obj;
            string delegate() a = &obj.toString;
            a();
        } catch (Throwable re) {
            printf("dg_with_null_object()\n");
        }
        +/

        try {
            static class A {int b;}
            A a;
            int b = a.b;
        } catch (Throwable re) {
            printf("obj.member\n");
            writeln(re);
        }

        try {
            Object a;
            string s = a.toString();
        } catch (Throwable re) {
            printf("obj.toString\n");
            writeln(re);
        }

        try {
            void function() a = null;
            a();
        } catch (Throwable re) {
            printf("fn()\n");
            writeln(re);
        }

        try {
            void delegate() a = null;
            a();
        } catch (Throwable re) {
            printf("dg()\n");
            writeln(re);
        }


        try {
            static class A {
                int b;
                final int omg() { return this.b; }
            }
            A a;
            int b = a.omg;
        } catch (Throwable re) {
            printf("obj.omg\n");
            writeln(re);
        }
    }
Another known case it misses as of this writing is:
    void crashme() @safe {
        int* a;
        int b = a[0];
    }
I thought about making this the explicit way to skip the check, but a[0], unlike a[1], does not trigger the safer by default warning (nor the error with explicit @safe), since the compiler sees it as the same safety-wise as *a, so this is easy to fall into accidentally! I'll probably fix this in the next patch release. I'll also try to fix the delegate of a null object then as well.
There may be other cases I just never thought to test. If you find one, let me know!
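Until the missed spots are patched, an explicit check in your own code covers them, and that is the recommended habit anyway. A minimal sketch, where `firstOr` is a made-up helper name and the fallback behavior is just one possible policy:

```d
// Hypothetical helper: read the first element through a pointer, or use
// a fallback when the pointer is null. The guard is written by hand, so
// it does not depend on the compiler inserting a check for a[0].
int firstOr(const(int)* a, int fallback) {
    if (a is null) return fallback; // explicit check, no dereference yet
    return a[0];                    // only reached when a is non-null
}

unittest {
    int value = 7;
    assert(firstOr(&value, -1) == 7);
    assert(firstOr(null, -1) == -1);
}
```

Whether to return a fallback, throw, or assert is a design choice for the calling code; the point is only that the null case is handled before the pointer is used.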
So, what else have I seen in my testing so far?
I expect your results will vary, so I encourage you to try it and judge for yourself, but in my case, the performance difference between checks on and off was minimal - it seems the llvm optimizer and/or the cpu's branch predictor do a good job keeping things working well.
For my test, I took my online bingo game and compiled it before and after with a few different compiler settings - plain dmd, ldc -Oz, and ldc -O2 - and ran a web benchmarker against it after confirming a deliberate null pointer use worked as expected.
Using -O2 with ldc:
    null checks: off. binary size: 10,967,320. web result: 1543 rps, p99 23ms.
    null checks: on.  binary size: 10,779,768. web result: 1507 rps, p99 25ms.
That's about a 2% difference in this test (similar to what I find for the cost of array bounds checking). The file actually got smaller here, which I don't understand. Maybe the optimizer was somehow able to delete more dead code with this or something, but in all the other tests the file got between 1 and 10% larger, or 40 KB - 800 KB.
I don't support betterC, but the codegen approach also works there, using C assert instead of D throw.
    extern(C) int main() {
        int* a;
        int b = *a;
        return 0;
    }

    $ opend ldmd2 -betterC npe2 -g
    $ ./npe2
    npe2: npe2.d:3: npe2.main: Assertion `null pointer' failed.
    Aborted
(It should work with dmd as well, but dmd seems to output misaligned strings, and the C assert uses a simd strlen which doesn't like that. I can see about fixing this later too. Upstream dmd does the same thing, so this is not specific to OpenD.)
A user suggested OpenD try integrating with Zig's build helper. After working with a wrapper shell script and some config variables, I got druntime+phobos cross-compiled for a new target using zig cc to do the link, then built an application with little extra work.
I think there's potential in this approach. The hardest thing about cross compiling is getting the system libraries linked. For Windows, this is fairly easy since the system import libraries are easy to get (opend install xpack-win64 downloads them for you and it is a stable target), but that is not really the case on most of the unixes. However, the Zig people (notably a couple names I recognized as former D contributors!) solved this in a fairly clever way - zig creates stub import libraries from interface lists on demand.
Interestingly, this is quite similar to how things work for Windows - an import library has the name and some minimal "call into the real library" code. The zig stub has the name and is then replaced with the real library on the target machine by the system dynamic linker. In neither case do you require the full real library at link time, making the cross compilation requirement much simpler.
I thought about cloning it or downloading their list+generator on demand, but I think it is easier to ask the user to install zig, and then opend can try to use it for various targets. We could even build the druntime on the user's machine as part of setup, though that is a bit finicky (it requires the source package, cmake, and other dependencies the user would have to set up). Even if the xpack includes them prebuilt, it might be a decent compromise of user effort vs compilation target breadth.
This will see more development in the future. I might provide a couple prebuilt xpacks using this zig program later this year.
I have written before about how we'd like to experiment with write barriers to make the GC more concurrent. I believe a write barrier implementation is basically the same as the null pointer check, just using a different variable. The null pointer check is basically *ptr -> (ptr || call_throw, *ptr). A write barrier would be like *ptr -> (!blocked || (call_wait, goto check_again), *ptr): if writes are not blocked, go straight to the access; otherwise wait and check again. You'd probably want to do it with some kind of atomic op to avoid race conditions, but this is still the same rewrite idea in the same parts of the compiler.
But, while I called the null pointer check one of OpenD's riskiest changes thus far because of how much it affects, it still has a layer of protection under it - if the codegen is wrong or if it missed a spot, the failure is going to be contained to another segfault. (Yes, it took two commits because I missed a problem the first time, but it came up as soon as I tried the install package on real world code.) Whereas with the write barrier, a missed spot or mistake is more likely to cause use-after-free GC errors, which are much harder to detect. So it'll surely take the crown as the next riskiest thing.
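To show how close the two rewrites are, here is a hand-written sketch of the write barrier shape. Nothing like this exists in OpenD today: `gcBlocksWrites`, `waitForCollector`, and `barrieredStore` are made-up names, and the wait hook simply clears the flag so the example terminates on its own.

```d
import core.atomic : atomicLoad, atomicStore;

// Hypothetical flag a collector would set while mutator writes must wait.
shared bool gcBlocksWrites = false;

// Hypothetical wait hook. A real one would block until the collector is
// done; this one counts calls and clears the flag itself for the demo.
int waitCalls = 0;
void waitForCollector() {
    waitCalls++;
    atomicStore(gcBlocksWrites, false);
}

// What you write:          *p = value;
// Roughly how it would behave with a barrier inserted:
void barrieredStore(int* p, int value) {
    while (atomicLoad(gcBlocksWrites)) // check, then check again after waiting
        waitForCollector();
    *p = value;                        // the original store
}

unittest {
    int x;
    barrieredStore(&x, 1);             // flag clear: no waiting
    assert(x == 1 && waitCalls == 0);

    atomicStore(gcBlocksWrites, true);
    barrieredStore(&x, 2);             // flag set: waits once, then stores
    assert(x == 2 && waitCalls == 1);
}
```

The sketch still leaves a window between the final check and the store, so a real implementation would need a stronger atomic protocol than a plain flag. It only illustrates that the barrier sits in the same place as the null check: a test and a call inserted before the memory operation.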
We're a step closer to this now, but still have no real timeline to move forward the rest of the way.
OpenD has now shipped a NullPointerError. It is enabled by default and can be disabled via a command line switch.
To get opend, go to https://opendlang.org/ and follow the link to get started, then download from GitHub.