cgi.d hybrid server basically working, terminal.d can redirect stdout to a window if requested

Posted 2020-09-28

I made some improvements to cgi.d's implementation and added the capability to redirect stdout and stderr into terminal.d's embedded emulator.

Core D Development Statistics

In the community

Community announcements

See more at the announce forum.

A quick note

You might notice the "posted" date on these articles is always Monday, but I often don't actually publish until several days later.

The way it works is I auto-generate the source file consistently on Monday. This attaches the date and copies in the bugzilla, github, and forum activity. Doing this consistently keeps the weekly digest comparable when going through the archives.

However, I do not actually formally publish and update the website until I either finish writing the long-form content, or just give up and decide to skip a proper article that week. This can take me a pretty long time.

If you hate this, let me know...

What Adam is working on

cgi.d

After writing the last entry here, I spent a few hours playing with the idea of a hybrid server in cgi.d, and I got it working. It is now listed on the http benchmark site here: https://github.com/tchaloupka/httpbench

If you look at the numbers, it is competitive. Different modes work better or worse for different loads, but the new hybrid server consistently does a solidly OK job.

The biggest difference though came from fixing a buffering bug that had been in cgi.d for years. The code used to look something like this:

import std.conv : to;
import std.stdio : write;

string[] data;
data ~= "Header: " ~ value;
data ~= "More: " ~ stuff ~ ":" ~ to!string(number);
foreach(datum; data)
   write(datum); // each element still goes out in its own write call

As you can see, it basically doesn't buffer at all - it spends time concatenating the strings, but then loops through and writes them out individually anyway. The body content was then written in another separate call (which at least was correctly buffered).

Just joining ahead of time and writing at once doubled the speed on the simple hello world benchmark. Putting it all into a nicer buffer, skipping 99% of the allocations, got even more speed out of it.
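
Here's a minimal sketch of the buffered approach (the variable names are carried over from the snippet above for illustration; this is not cgi.d's actual code):

import std.array : appender;
import std.format : formattedWrite;
import std.stdio : write;

void sendHeaders(string value, string stuff, int number) {
	// Accumulate all the pieces into one growable buffer...
	auto buffer = appender!(char[]);
	buffer ~= "Header: ";
	buffer ~= value;
	buffer ~= "More: ";
	buffer ~= stuff;
	buffer ~= ":";
	buffer.formattedWrite("%d", number); // formats in place, no to!string allocation
	// ...then issue a single write call for the whole batch.
	write(buffer.data);
}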

This fix applies to all modes and made a very big difference. Low concurrency hello world test on Linux: nginx gets 55k requests per second (more is better) and 10ms max latency (less is better). Go's fasthttp gets 54k, 10ms. Rust's actix scores 53k, 1ms. And cgi.d's processes mode gets 48k, 10ms. The new hybrid gets 32k here, 13ms. Previously, the process server would only get about 20k on this test, so it more than doubled, moving it from bottom tier to upper-middle tier.

At high concurrency, the processes server remains at 48k, 8ms. Still not bad, but meanwhile nginx grows to 97k, 329ms, Go clocks in at 82k, 30ms, and Rust reaches 94k, 7ms. So by comparison, cgi.d's processes server is no longer doing well. This is because of the problems it has with keep-alive I mentioned last week.

But this is where the new hybrid server comes in: it does 86k requests per second, 48ms max latency, now competitive with Go again (a bit more throughput but also more latency, though not enough to be particularly problematic) and not too far behind Rust.

Of course, this test is only for hello world. I expect Go will once again take a clear lead in other tests and Rust's lead will probably grow. But on the other hand, these improvements do not sacrifice cgi.d's strengths in reliability, ease of distribution, third-party compatibility, etc. So overall, I'm pretty happy with the hybrid. Solidly OK in all concurrency levels while remaining flexible and easy to use.

Another nice benefit of the new hybrid model is the websocket server now just works inline - you don't have to pass the connections off to a helper process anymore (though you still might want to, so I'll continue to work on that mode as well; the fibers are slightly heavyweight compared to the minimal custom data structure you can build in a helper process). The event-driven fibers in worker threads can handle a variety of workloads.

I'm not quite ready to make it a default yet though. I still need to do more testing and I'm concerned about breaking user code. With the old http server, you could somewhat reliably use static TLS (thread-local storage) data. With the hybrid, since your fiber can be moved between worker threads so easily, this no longer holds. But... you weren't really supposed to rely on that anyway, to stay compatible with one-process-per-request CGI mode. Still, though, it was useful and I'd use it sometimes myself. So it will take a bit more thought to decide if I want to change the default or just make it opt-in.
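
To illustrate the hazard (a contrived sketch, not code from cgi.d - module-level variables in D are thread-local by default):

// one copy per worker thread, NOT per request or fiber
int perThreadCache;

void handleRequest() {
	perThreadCache = 42;     // stored in worker thread A's copy
	// ... the fiber yields on I/O and may be resumed by a different thread ...
	auto x = perThreadCache; // may read worker thread B's copy: not 42!
}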

I still have more I can optimize here too and might do more work toward it as I have time. But I also want to fix some of the inner API (did you know you can subclass Cgi and custom-process body payloads?) and formalize that for a documented release too - so more coming soon one way or another. I also need to do the non-Linux implementations at some point.

Nevertheless, I'm pretty happy with it for a day's tweaks.

terminal.d

This week, I was asked if it was possible to hook C's stdout and stderr into the TerminalDirectToEmulator mode of terminal.d, which keeps a consistent user-level API while transparently either popping up a GUI window with a custom terminal emulator if possible, or falling back to the normal terminal if not.

I said "don't cross the streams". They said "but thou must". Well, I figured it shouldn't be too bad: I know how to redirect the standard streams to a self-pipe, and if I hook that into the terminal buffer... it might just work. I still suggest not mixing methods - if you are using the terminal API, consistently use the methods on Terminal so its internal state stays in sync and the buffers don't get reordered - but I do suppose there's some code that just can't be changed, and even a reordered buffer is better than a crash.

Yes, a crash - if you build a Windows GUI executable and someone tries to stderr.writeln, Phobos will actually check the return value of both write and flush... and thus eventually throw a "bad file descriptor" exception because the stderr stream doesn't even exist in that environment.

Well, I started hacking in the self-pipe redirection and just wasn't happy with it at all. Buffers would get out of sync, causing output to be randomly interleaved; cursors would get misaligned, throwing off prompts; and it would sometimes still crash or hang. Clearly not good enough to ship to users.

I had to go back to the drawing board. It occurred to me there is one way to keep everything ordered: always use the same pipe. Yes, redirect stdout's receiving end to the terminal... but then ALSO redirect the terminal's sending end through stdout. This seems a bit silly to me: instead of terminal.write going straight to the inner buffer which is picked up upon flush to terminal.display, terminal.write would go through stdout.write, terminal.flush would forward to stdout.flush, which would then be picked up by terminal.read and moved to the inner buffer, which finally ends up at terminal.display. A roundabout way of getting to the same place.

But the benefit is since all the various methods go through the same buffer and same chokepoint, they will no longer be reordered! Even third party code is unlikely to break things.

Well, getting the redirects to ACTUALLY happen was a pain - I'll write about this in a separate section at the end of this article - but it worked!

...until I tried to call getline. Then it wrote in the wrong place again. See, getline in TerminalDirectToEmulator mode would skip the painful hassle of talking through the pipe and instead just read the cursor position directly out of the embedded terminal. Much easier, and it worked beautifully before since the parts shared a buffer and thread synchronization objects... but with this change, there's a C FILE* doing its own separate buffering! And the flush would return when the other thread received the data... but it wouldn't tell me when the other thread was actually *done* with the data. So the cursor may or may not be updated by the time this thread tried to read it. And C's fwrite doesn't know to flag the sync event that terminal.d watches to make all this work.

I spent a lot of time trying to figure this out. Maybe I could turn off C's buffering and use a mutex? Nope. Maybe a read-ready event from the OS with direct writing through the pipe? Nope. I just about gave up on it and fell back to saying "just don't do that, it is buggy".

But after searching the code, only this terminal cursor function needed attention, and maybe it could just write through the pipe again like it does on a separate terminal. The actual code for that is Posix-specific (and fairly complicated) by necessity, but perhaps I could simplify it here and hook back into one of those special synchronization objects, since I'd then control both sides again.

So that's what I did: set a flag and write a magic number to the pipe. The read side, if the flag is set, watches for the magic number. When it comes, it discards it, resets the flag, and triggers the sync event telling the other thread it is allowed to move forward and read the cursor position.
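
In sketch form, it looks something like this (the names are hypothetical, with a druntime Event standing in for terminal.d's actual synchronization objects, and it assumes the marker arrives in a single read):

import core.sync.event : Event;

immutable ubyte[4] syncMagic = [0xDE, 0xAD, 0xBE, 0xEF]; // hypothetical marker value
__gshared bool syncRequested;
__gshared Event cursorSync;
shared static this() { cursorSync.initialize(false, false); } // auto-reset, starts unsignaled

// Writer side: request a sync point, then block until the reader catches up.
void waitForCursorSync(void delegate(const(ubyte)[]) writeToPipe) {
	syncRequested = true;
	writeToPipe(syncMagic[]); // travels through the same pipe as normal output
	cursorSync.wait();        // the reader has now drained everything ahead of the marker
	// safe to read the cursor position out of the emulator here
}

// Reader side, inside the loop that drains the pipe:
void onPipeData(const(ubyte)[] data) {
	if(syncRequested && data == syncMagic[]) {
		syncRequested = false;
		cursorSync.set(); // discard the marker and release the writer
		return;
	}
	// ... otherwise feed the data to the terminal emulator as usual ...
}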

It worked! No more reordering... all appears to be in sync on Windows and Linux systems alike. Doesn't appear to have broken the fallbacks or anything else either. I think I can deploy this... it just took quite a bit longer than the half-day I thought it would take, even after accounting for the baby keeping my arms busy.

Next curiosity was the process hanging if I closed the window while it was waiting for input. wtf? It correctly closed the window and correctly sent a HUP signal (which is translated into a "Terminal disconnected" exception) to the other thread, which was supposed to shut it down. The exception was thrown, so why didn't it actually terminate?

Get this: I spent some time in the debugger and it kept ending up in the unhandled exception function... which tries to write to stderr... which waits for the write to complete so it can error check one more time (who actually checks those errors?! Well, Phobos and druntime do.), but it is now talking to a window that no longer exists and thus will never process it.

Turns out I just forgot to close the pipe handle, figuring it would be done at process termination anyway. But after explicitly closing the handle, the stderr write completes (it fails, there's nowhere for the message to go, but at least it completes) finally allowing the process to terminate. What a journey.

There's still a problem here: naughty libraries trying to write text in module constructors will hit trouble before the Terminal object is constructed. I think such naughty libraries should not be used, especially on a Windows system. But... if they must be, there'll be little choice but to have a wrapper shell script redirect things ahead of time. And then the new code in terminal.d will try to respect that redirection and not put it in the window - meaning there will need to be a command line argument or something to override it. But that's all potentially doable if the third-party library situation leaves no alternative.

And I still need to test all the scenarios on Linux before I release this. But I'll be releasing soon enough.

Aside - redirection of streams

Redirecting the C streams on Windows is a bit of a pain. See, if you build a GUI executable, the std in/out/err handles are all set to null unless the user redirects them to something at startup time. The Microsoft C runtime sees this and sets them to an invalid file descriptor. The Digital Mars C runtime does it differently, which tripped me up - I got it working on plain dmd relatively easily by creating a new Windows HANDLE and passing it in, but then it failed with dmd -m64 (and dmd -m32mscoff), which use the Microsoft runtime. Blargh.

So I'll focus on the Microsoft runtime. Using Windows' SetStdHandle doesn't affect it. You can't use the _open_osfhandle function because the _dup2 call to associate it with stdout will fail with the error "invalid file handle".

I found only one function that actually works: freopen. It works on the FILE* level rather than the file descriptor level, so it actually affects the right thing. The problem is it only takes a filename string, not an existing HANDLE, so I couldn't pass it my anonymous pipe.

So I switched to a named pipe (which is better anyway since you can use them with overlapped I/O, which I took advantage of later), and yay, freopen successfully connected to it and gave me a valid file for stdout to use!

The steps: 1) CreateNamedPipe, 2) freopen to that named pipe. Not so hard but it took me a while to get there from my starting point.
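
A rough sketch of those steps (Windows-only; the pipe name here is made up for illustration and error handling is omitted):

import core.stdc.stdio : freopen, stdout;
import core.sys.windows.winbase;
import core.sys.windows.windef;

void redirectStdoutToEmulatorPipe() {
	enum pipeName = `\\.\pipe\terminal_emulator_demo`; // hypothetical name
	// Server end: the terminal emulator thread will read from this handle
	// (FILE_FLAG_OVERLAPPED enables the overlapped I/O mentioned above).
	HANDLE server = CreateNamedPipeA(pipeName.ptr,
		PIPE_ACCESS_INBOUND | FILE_FLAG_OVERLAPPED,
		PIPE_TYPE_BYTE | PIPE_WAIT,
		1,          // a single client instance
		4096, 4096, // out/in buffer sizes
		0, null);
	// Client end: freopen connects stdout's FILE* to the pipe by name,
	// which also gives the stream a valid C file descriptor.
	freopen(pipeName.ptr, "wb", stdout);
	// (The emulator side would then ConnectNamedPipe(server, ...) and read.)
}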

Next challenge: I want stderr to also be redirected back to this same stream. Can't freopen to the same pipe (well, not without making the named pipe server accept a second client, which could have been done, but that's not what I wanted to do)... and the normal solution of _dup2 runs into that invalid file descriptor error again.

The solution is a combination: 1) freopen("NUL", stderr). This gives it a valid file descriptor. (And if you wanted to just have writes succeed but didn't care about seeing them, that's how you do it btw.) 2) _dup2(_fileno(stdout), _fileno(stderr)). Finally, it is done!

Side note: D's imports don't seem to define _dup2, but that's trivial to declare yourself: extern(C) int _dup2(int, int);.
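
Putting the stderr half together (again a sketch; Microsoft runtime only, error handling omitted):

import core.stdc.stdio : FILE, freopen, stdout, stderr;

// Neither of these is in the default imports, but the declarations are trivial:
extern(C) int _dup2(int, int);
extern(C) int _fileno(FILE*);

void redirectStderrIntoStdout() {
	// Step 1: point stderr at NUL so it gets a valid file descriptor at all.
	freopen("NUL", "wb", stderr);
	// Step 2: now that its descriptor is valid, clone stdout's over it.
	_dup2(_fileno(stdout), _fileno(stderr));
}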

This is similar to, but not the same as, having a GUI program grab a console. After you AttachConsole(ATTACH_PARENT_PROCESS) || AllocConsole();, you freopen("CONOUT$", stdout); freopen("CONIN$", stdin); freopen("CONOUT$", stderr);. But since there are magic names for the console, it is a bit simpler.
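
For comparison, that console-grabbing dance in sketch form (the constant is declared inline in case your bindings don't expose it):

import core.stdc.stdio : freopen, stdin, stdout, stderr;
import core.sys.windows.wincon : AllocConsole, AttachConsole;
import core.sys.windows.windef : DWORD;

enum DWORD ATTACH_PARENT_PROCESS = cast(DWORD)-1;

void grabConsole() {
	// Reuse the parent's console if launched from one, else create a new one...
	if(!AttachConsole(ATTACH_PARENT_PROCESS))
		AllocConsole();
	// ...then hook the C streams up to the console's magic device names.
	freopen("CONOUT$", "wb", stdout);
	freopen("CONIN$", "rb", stdin);
	freopen("CONOUT$", "wb", stderr);
}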

The last thing to check in this redirection is to not blast the user's shell redirections, if present. You can detect these through GetStdHandle and GetFileType, but the invalid descriptor can also be checked at the C level: if(_fileno(stdout) < 0), you know it is invalid and can be changed.
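
Both checks in sketch form (the helper name is made up):

import core.stdc.stdio : FILE, stdout;
import core.sys.windows.winbase : GetStdHandle, GetFileType,
	STD_OUTPUT_HANDLE, FILE_TYPE_UNKNOWN;

extern(C) int _fileno(FILE*);

// True if the user (or shell) already pointed stdout somewhere.
bool stdoutAlreadyRedirected() {
	auto handle = GetStdHandle(STD_OUTPUT_HANDLE);
	bool osLevel = handle !is null && GetFileType(handle) != FILE_TYPE_UNKNOWN;
	bool cLevel = _fileno(stdout) >= 0; // a valid descriptor means it goes somewhere
	return osLevel || cLevel;
}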

Taking care of all this seems to work adequately. You might SetStdHandle on top of it all too, but the C streams are separate, so you don't really have to if they're all you're worried about.

One note: the stdout etc. here are from import core.stdc.stdio, not the same-named symbols from std.stdio, which have a D wrapper on them. This is all using the C streams directly - I'd suggest just importing core.stdc.stdio in these modules, or at least using import c = core.stdc.stdio; and then c.stdout.