cgi.d's new scheduler, static this tricks

Posted 2019-08-05

This week, I wrote a new schedule server. Below, I describe how it works.

BTW in other news, there's some cool new stuff coming out in dmd - better template error messages, and getLocation for better custom errors. Next release is exciting to me!

Core D Development Statistics

In the community

Community announcements

See more at the announce forum.

What Adam is working on

I first talked about this several months ago, but I have only now implemented the schedule server in cgi.d. The functions discussed below are currently only implemented on Linux, and only on the git master version of cgi.d. The usage code looks like this:

import arsd.cgi;
import std.datetime; // DateTime, plus seconds via core.time

void fn(string n) {
        import std.stdio;
        writeln("hi ", n);
}

void test(Cgi cgi) {
        schedule!fn("asap").asap();                       // run as soon as possible
        schedule!fn("delay").delay(5.seconds);            // run after a delay
        schedule!fn("at").at(DateTime(2019,8,7,1,23,0));  // run at a specific time
}

On the inside, the schedule function gathers arguments and puts them into a ScheduledJobHelper struct. This struct contains the methods to do the scheduling itself - asap, delay, and at right now. These communicate with the timer server, which is embedded in the arsd.cgi library and is currently run separately via ./yourprogram --timer-server. I'll probably have it run automatically on demand later, or possibly run in a helper thread of your application depending on build flags.
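To make the shape of that helper concrete, here is a rough sketch. It is not the code from cgi.d: JobHelper, job, and submit are hypothetical names, and the methods only return a description string instead of talking to a timer server. It also includes the "was it used?" destructor check that is the first trick discussed below.

```d
import core.time : Duration, seconds;

// Hypothetical stand-in for a scheduling helper; NOT the real cgi.d struct.
struct JobHelper {
    string name;
    string[] args;
    private bool consumed;

    // Forbid copies so there is exactly one destructor check per job.
    @disable this(this);

    // Each scheduling method marks the helper as used.
    string asap() { return submit("asap"); }
    string delay(Duration d) { return submit("delay " ~ d.toString()); }

    private string submit(string when) {
        consumed = true;
        // A real implementation would send the job to the timer server here.
        return name ~ " " ~ when;
    }

    ~this() {
        // Only checked in builds where asserts are enabled.
        assert(consumed, "job was created but never scheduled");
    }
}

// Gathers the arguments; does nothing until a method is called on the result.
JobHelper job(string name, string[] args...) {
    return JobHelper(name, args.dup);
}
```

With this, job("fn", "a").asap() is fine, while a bare job("fn", "a"); statement trips the assertion when the temporary is destroyed.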

The timer server will wait until the specified time - on Linux, via the timerfd mechanism, so the kernel is responsible for waking the program up when the time has come - and then call the function in a new process via shell execute.
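For readers who haven't used timerfd, here is a minimal sketch of the waiting part on its own, using the druntime bindings in core.sys.linux.timerfd. It is Linux-only and is not the timer server's actual code; waitMsecs is a hypothetical helper name, and a real server would put the descriptor in an event loop rather than block on read.

```d
import core.sys.linux.timerfd : timerfd_create, timerfd_settime;
import core.sys.posix.time : itimerspec, CLOCK_MONOTONIC;
import core.sys.posix.unistd : read, close;

// Blocks until a one-shot timer expires, then returns how many
// expirations the kernel reported.
// Note: ms must be > 0; an all-zero it_value disarms the timer.
ulong waitMsecs(int ms) {
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd == -1)
        throw new Exception("timerfd_create failed");
    scope(exit) close(fd);

    itimerspec spec;                      // it_interval stays zero: one-shot
    spec.it_value.tv_sec = ms / 1000;
    spec.it_value.tv_nsec = (ms % 1000) * 1_000_000;
    if (timerfd_settime(fd, 0, &spec, null) == -1)
        throw new Exception("timerfd_settime failed");

    // The kernel wakes us up here once the timer has fired.
    ulong expirations;
    if (read(fd, &expirations, expirations.sizeof) != expirations.sizeof)
        throw new Exception("reading the timerfd failed");
    return expirations;
}
```

The process sleeps inside read without polling, which is the "kernel is responsible for waking the program up" part.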

There are a few tricks in the implementation I'd like to show you all:

  • Just calling schedule!fn(args...) is a no-op; you MUST call one of the methods to actually do anything. To help you detect this, a destructor ~this() { assert(consumed); } will fail an assertion (in builds where asserts are enabled) if the struct goes out of scope without a method called.

    This isn't great, I kinda wish the D language had some kind of "must use return value" indicator we can put on a function.

  • I said the function is run in a new process. How does it do this? Well, certainly one option would have been to fork and then just block until the kernel wakes up! It is simple and kinda works, I've done that before. But here, I wanted a different approach.

    In this implementation, the schedule function actually creates a shared static this constructor:

    private immutable void delegate(string[])[string] scheduledJobHandlers;
    template schedule(alias fn, T...) if(is(typeof(fn) == function)) {
        ScheduledJobHelper schedule(T args) {
            // snip a little
            return ScheduledJobHelper(fn.mangleof, sargs);
        }

        shared static this() {
            scheduledJobHandlers[fn.mangleof] = delegate(string[] sargs) {
                // snip impl
            };
        }
    }

    This is a long-form eponymous template. It is called like any other function - as you saw above - but also includes the second declaration, the shared static this constructor.

    The compiler will automatically combine all these static constructors into one at the end of the build, meaning you can call this as many times as you want, in as many different places as you want, and it will all initialize that single scheduledJobHandlers associative array at runtime! Even the immutable part just works, since the compiler treats the many as one. e pluribus unum lol.

    Now, when the program runs, it can look up the function in that associative array and run it!

    Hence we see what the timer scheduler does: it just calls your_application --timed-job mangle_here args go here... and it then looks it up in that array. This gives it flexibility for use by several different applications, all cgi modes, crash resilience, and easy use manually from the command line too. In theory (but not implemented), I can also make a management interface to see the waiting background/timed jobs.

    This lets us gather compile time info into one place without an explicit registration step.

  • Since it runs the function in a new process, the documentation warns against depending on any global variables. I wish D's pure was a little more fine-grained. See, in this case, pure is wrong: a delayed job might need to do I/O, write to a database, etc. It just shouldn't be allowed to read non-immutable globals or write to any (directly, anyway; it IS allowed to write THROUGH one, e.g. database.write would be fine). Static guarantees for these would be great.

    But alas, no such thing exists, so I just warn in the documentation instead.

  • But note that it specifically takes a function instead of a delegate - this limits the context passed to just the given arguments, avoiding hidden dependencies that wouldn't work when the job is recalled later.

  • Edit: I forgot to mention one more thing:

    auto sample = delegate() {
        fn(args);
    };

    That snippet is inside the schedule function. The delegate is never run... so why is it there?

    That's a hack to get some type checking and decent error messages for the forwarded argument list. I could use if(__traits(compiles, fn(args))), but then you lose detailed error messages. I want it to act like a regular function call, so I had to compile one normally.

    The only weird thing is I don't actually want to call it *here*, hence wrapping it in a function that is never called.
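The two template tricks from the list above, registration through a shared static this and the never-called delegate used for type checking, can be tried together in a small standalone sketch. This is an illustration under assumptions, not the cgi.d source: enqueue, Job, handlers, greet, and lastGreeting are hypothetical names, and the registry is a plain __gshared table of function pointers rather than the immutable delegate table shown earlier.

```d
import std.conv : to;
import std.traits : Parameters;

// Hypothetical registry, keyed by mangled name (simplified to __gshared).
__gshared void function(string[])[string] handlers;

// Hypothetical job record: which handler to run and its stringified arguments.
struct Job { string key; string[] args; }

template enqueue(alias fn, T...) if (is(typeof(fn) == function)) {
    // Eponymous member: this is what a call like enqueue!fn(args) invokes.
    Job enqueue(T args) {
        // Never called. It exists only so fn(args) is compiled as a normal
        // call, giving the usual detailed errors on a bad argument list.
        auto typeCheck = delegate() { fn(args); };

        string[] sargs;
        foreach (arg; args)
            sargs ~= to!string(arg);
        return Job(fn.mangleof, sargs);
    }

    // Emitted with each instantiation and run before main(), so merely
    // using enqueue!fn somewhere registers a handler for fn.
    shared static this() {
        handlers[fn.mangleof] = function(string[] sargs) {
            Parameters!fn typed;              // one variable per parameter
            foreach (idx, ref arg; typed)
                arg = to!(typeof(arg))(sargs[idx]);
            fn(typed);
        };
    }
}

// Demo only: a global makes the effect observable inside one process.
// A real job runs in a new process and should not rely on globals.
__gshared string lastGreeting;

void greet(string name, int times) {
    lastGreeting = name ~ " x" ~ to!string(times);
}
```

Calling enqueue!greet("bob", 3) returns a Job whose key can later be looked up in handlers, which is roughly what a --timed-job style entry point would do with its command-line arguments. Passing the wrong argument types, say enqueue!greet(3, "bob"), fails to compile at the fn(args) line with an ordinary function-call error.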

Well, there are a few places where I wish D could do a bit more, but it is still pretty cool that it does what it does so easily.