My webasm updates, gdc in D

Posted 2021-10-18

I wrote a malloc and free finally! Some talk on assumeSafeAppend and webasm drawing.

Core D Development Statistics

In the community

Community announcements

See more at the announce forum.

GDC using new frontend

If you want to build a branch yourself, gdc using newer dmd code is available in Iain's branch now. I got a report from a user that s/he was able to build it without difficulty and it worked well for their special purpose (running on esoteric hardware).

My custom webassembly update

GC rox but

I haven't done much on webassembly since last dconf and with the next one coming up again soon, I wanted to refresh myself on it and maybe even make a garbage collector.

I've argued previously that druntime's standard GC should just work in webassembly. I am now fairly certain that I was wrong about that. The standard GC can read the webassembly memory space, but it cannot read webassembly parameters as that stack is hidden from the application.

It might be possible with compiler support; some llvm intrinsics related to GC and maybe it could make a shadow stack or something that can be scanned. I don't know enough to say that for sure.

But this means the druntime webasm port is going to need more work to figure out, probably with some level of compiler support (same with exceptions and even blocking calls btw, llvm has intrinsics to help with exceptions that might work and blocking calls actually need stop+resume points too).

And for my part, with my minimal webasm runtime and special library ports, I don't think I will bother writing a GC for it anymore. Instead, I'll probably just do a basic mallocator with free.

Realistically, lightweight webassembly is a different enough target from Windows and Linux that programs are likely to need some kind of adaptation anyway. This can be kinda minimal depending on the kind of program - like async event loops tend to basically work just because the code is already written that way - but there might be other things necessary too like version blocks with alternate implementations. That's the price of keeping things light; you don't want to emulate everything since that costs code - sometimes a lot of code - elsewhere.

Now that I have a custom runtime and am open to versioning in custom code, this gives me some options I wouldn't normally condone, like doing some manual frees, using assumeSafeAppend or similar to optimize the ~= operator, or even replacing the global allocator with an autorelease pool at times. I can either version these out or make them harmless no-ops in real D to maintain most of the compatibility with other builds.
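As a rough illustration of the ~= optimization mentioned above, here is a minimal sketch in ordinary D. The helper name refill is made up for this example; reserve and assumeSafeAppend are the usual druntime functions. The idea is just to reuse one buffer across rounds instead of letting each append sequence start a fresh allocation.

```d
/// Hypothetical helper: empties `buffer` and fills it with 0 .. count,
/// telling the runtime it may append into the existing block.
void refill(ref int[] buffer, int count)
{
    buffer.length = 0;          // forget the old contents, keep the slice
    buffer.assumeSafeAppend();  // promise that no other slice uses the tail
    foreach (i; 0 .. count)
        buffer ~= i;            // appends can now reuse the old capacity
}

void main()
{
    int[] buffer;
    buffer.reserve(16);         // one up-front allocation
    foreach (round; 0 .. 3)
        refill(buffer, 8);      // each round rewrites the same array
    assert(buffer.length == 8);
}
```

The promise made by assumeSafeAppend is on you: if another slice still refers to the old elements, they will be overwritten.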

Or, of course, I can always just leak memory. Seriously, that might be acceptable in certain situations. And there are other pooling strategies that range from harmless to beneficial in other scenarios too; I just don't often do them because it is extra work that isn't really necessary normally. But there's a variety of available options.

What I added so far

First, I had to actually implement a malloc that was more than push-the-pointer nonsense. It needs to actually support free, extend, and alignment. (Javascript data view objects need the data to be aligned! I kept getting an "index X is out of bounds" error when trying to construct a Float64Array and it baffled me for a while until I realized the address wasn't a multiple of 8. So yeah, gotta align these things.) I've never actually done that before!
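The alignment part is the usual round-up-to-a-power-of-two trick. This is a generic sketch, not the code from my runtime; the name alignUp is just for illustration.

```d
/// Rounds `address` up to the next multiple of `alignment`.
/// `alignment` must be a nonzero power of two (8 for a Float64Array).
size_t alignUp(size_t address, size_t alignment) pure nothrow @nogc @safe
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Adding alignment - 1 and masking off the low bits rounds upward.
    return (address + alignment - 1) & ~(alignment - 1);
}

void main()
{
    assert(alignUp(13, 8) == 16); // an unaligned address gets bumped up
    assert(alignUp(16, 8) == 16); // an aligned address is left alone
}
```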

Of course, I could probably just bring in one of the many higher quality malloc implementations out there. But where's the fun in that?! And besides, they're all kinda big and part of the reason for me doing this all myself is I want to keep the webassembly distribution small.

In my info block, I put in blockSize and used fields. This allowed me to implement druntime's assumeSafeAppend function, and druntime's reasonably good append operator, all with the built-in array syntax, just like the real thing. (Doing this in a user-defined type is easy enough regardless; that's what you'd typically do in betterC, and thanks to operator overloads it is pretty nice.)

But here, I want to take the code I already wrote that uses the append operator on built-in arrays and use it as-is. I'm trying to minimize the code changes in the actual program. Using things like reserve and assumeSafeAppend is often a good idea in the normal D implementation too and now my mini webasm runtime can do all these reasonably efficiently too... whereas before I could use them but it just leaked memory and wasted a lot of time moving data around.
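To give an idea of how blockSize and used can drive the append logic, here is a simplified sketch. The field names come from the description above, but the struct layout and the function names tryExtendInPlace and assumeSafeAppendSketch are assumptions for illustration, not the actual runtime code. A slice may only grow in place if it ends exactly where the used region ends and the block still has room.

```d
/// Simplified stand-in for the allocator's per-block header.
struct BlockInfo
{
    size_t blockSize; // payload bytes available in this block
    size_t used;      // payload bytes currently claimed by array data
}

/// Tries to grow a slice that ends `sliceEnd` bytes into the block.
/// Returns false when the caller has to reallocate instead.
bool tryExtendInPlace(ref BlockInfo info, size_t sliceEnd, size_t extraBytes)
{
    if (sliceEnd != info.used)
        return false; // something else already claimed the bytes after us
    if (extraBytes > info.blockSize - info.used)
        return false; // not enough room left in this block
    info.used += extraBytes;
    return true;
}

/// Declares that this slice owns everything after its end.
void assumeSafeAppendSketch(ref BlockInfo info, size_t sliceEnd)
{
    info.used = sliceEnd;
}

void main()
{
    auto info = BlockInfo(32, 8);
    assert(tryExtendInPlace(info, 8, 8));  // slice ends at `used`: grows
    assert(!tryExtendInPlace(info, 8, 4)); // stale slice: must reallocate
    assumeSafeAppendSketch(info, 8);       // caller vouches for the tail
    assert(tryExtendInPlace(info, 8, 4));  // now it can grow in place again
}
```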

However, now that I no longer think writing a webasm GC is actually doable at this time, I have to open myself to some small source edits. I could just call free(arr.ptr); as needed, but when everything else is managed by ~= it seemed a bit annoying to have to keep the old pointer, compare and free outside.

Instead, I added a new function arr.assumeUniqueReference which sets a flag in the memory block. When you do that and it reallocates, it outright frees the old pointer instead of leaving it behind for the GC to verify it is safe to free. In real D, I can either simply version this out, or have a no-op function you can call but it doesn't necessarily do anything.
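The no-op option for real D could look something like this sketch. The signature here is a guess for illustration (the webasm runtime's own definition isn't shown), and appendAll is just a made-up caller.

```d
version (WebAssembly)
{
    // The custom runtime would provide the real assumeUniqueReference,
    // which sets a flag in the memory block (not shown here).
}
else
{
    /// Stand-in for regular druntime: the GC cleans up old blocks,
    /// so there is nothing to record.
    void assumeUniqueReference(T)(T[] arr) {}
}

/// Hypothetical caller that compiles the same way on both targets.
int[] appendAll(int[] values)
{
    int[] result;
    result.assumeUniqueReference(); // only matters on the webasm runtime
    foreach (v; values)
        result ~= v;                // a reallocation may free the old block there
    return result;
}

void main()
{
    assert(appendAll([1, 2, 3]) == [1, 2, 3]);
}
```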

I'm tempted to do some kind of reference count option for delegates too. It could stick that right in the memory block too and again just version or stub those out for real D. The scary thing here would be if you call addRef on something that the compiler did NOT allocate a closure for... what does it do then? I don't know yet. I'll probably just go with a call to free. The LWDR - lightweight D runtime - project in the works right now for embedded ARM situations does that too. (In fact, LWDR used my webasm runtime as it was a few months ago as a starting point.) I'm helping the author and we share some ideas, though the two projects are not quite interchangeable due to the different targets. (Like he probably WILL do a GC of some sort.)

In other webasm news

I also optimized the port of simpledisplay's ScreenPainter to webassembly. The old one leaned on my eval function which, well, evals a string in Javascript. This is very slow. I optimized it a little to cache results and it made a difference (though I kinda wish I took strings only at compile time in it instead of a runtime string. I might change that, since if they are all compile time it makes building a list of them and indexing the cache a lot easier), but I wanted to try something else too.

I ended up making it into a little stack machine that builds up info in a reused array, then does a single call across the Javascript bridge to execute it.
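Roughly, the shape is like the sketch below. The opcode value, the struct, and the delegate standing in for the Javascript import are all made up for this example; the real painter has more operations and calls an actual import. The point is that drawing calls only write numbers into a reused array, and the bridge gets crossed once per flush.

```d
enum double opLine = 1; // hypothetical opcode: followed by x1, y1, x2, y2

struct CommandBuffer
{
    private double[] storage; // reused between frames, only ever grows
    private size_t count;     // how many entries are filled in right now

    private void put(double value)
    {
        if (count == storage.length)
            storage.length = storage.length ? storage.length * 2 : 64;
        storage[count++] = value;
    }

    void drawLine(double x1, double y1, double x2, double y2)
    {
        put(opLine); put(x1); put(y1); put(x2); put(y2);
    }

    /// Hands everything queued so far to `bridge` in one call, then resets.
    void flush(scope void delegate(const(double)[]) bridge)
    {
        if (count)
            bridge(storage[0 .. count]);
        count = 0; // keep the storage for the next frame
    }
}

void main()
{
    CommandBuffer painter;
    size_t bridgeCalls;
    painter.drawLine(0, 0, 10, 10);
    painter.drawLine(10, 10, 20, 0);
    painter.flush((const(double)[] commands) { bridgeCalls++; });
    assert(bridgeCalls == 1); // two lines, one trip across the bridge
}
```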

The result I saw so far:

CPU    operation
18%    eval
14%    naive cache of evaled functions
13%    bytecode array

I also want to try a smarter cache since I really do like that code style - each D module can define its own JS bridge as-needed - and I also want to try ditching my silly bytecode function and just calling JS imports directly and see what happens. Preparing arbitrary JS imports is a pain but I could always do something else about that in the build (my webasm server here just recompiles as needed when it serves so I don't even run the make anyway).

None of these results are very good though. The test program is a simple line animation; this should be like 0.3% CPU. It isn't just the bridge's fault though; Firefox's canvas seems to not be especially well implemented anyway.

But meh it is good enough for what little I do with this. I still don't find webasm all that useful anyway lol.

Anyway, it is coming together to be reasonably useful. Hopefully by Nov 20, when I'm on DConf Online again doing another livestream, I'll fix enough of this to make finishing the programs nice and easy. I'd rather demo minigui on that stream if I have time anyway!