Linux is better than BSD. Directory watchers and decompressors in arsd.

Posted 2023-04-03

Is that a flamebait headline? Better click to find out.

Core D Development Statistics
In the community

Community announcements

What Adam is working on

decompressLzma and Gzip
Directory watchers
Utility functions
Network helpers

Core D Development Statistics

6 bugs fixed
6 bugs and enhancement requests opened
Bugs closed WONTFIX:
- @disable this inconsistent between structs and classes
Bugs closed INVALID:
- ImportC: Need better error messages involving structs
18 pull requests merged into the language: 16 into DMD and druntime and 2 into Phobos
6 pull requests merged into the website.

In the community

Community announcements

Hipreme Engine is fully ported to macOS

See more at the announce forum.

What Adam is working on

I've done several more things over the last couple of weeks.

decompressLzma and Gzip

I got a complaint that arsd.archive's lzma decoder was hard to use, so I made a new interface to it. It was so much nicer that I decided to do the same for gzip too. The new functions are here:

http://dpldocs.info/experimental-docs/arsd.archive.decompressGzip.html

http://dpldocs.info/experimental-docs/arsd.archive.decompressLzma.html

(The docs are virtually identical. They are supposed to be ditto'd as related items, but apparently there's a bug in adrdox I need to look at. Either way, the functions work the same way, so swapping algorithms is as simple as swapping function names instead of using wholly different objects like before.)

The new function drives things for you through the delegates: it calls your chunkReceiver when decompressed data is ready and calls your bufferFiller when it needs more input. It also lets you do a bit of configuration if you don't like the defaults.
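To make the internally driven style concrete, here's a minimal D sketch of the same shape on a toy format. Everything in it is hypothetical: the made-up run-length format of (count, value) byte pairs, the name `decompressRle`, and the delegate signatures (the filler fills part of the buffer it is given and returns the filled slice, with an empty slice meaning end of input). It is not arsd.archive's actual API, just the pattern: the function owns the loop and both buffers, and the caller only supplies data and consumes chunks.

```d
/// Toy internally driven decoder for a made-up run-length format whose input
/// is a sequence of (count, value) byte pairs. It pulls input through
/// `bufferFiller` and pushes output through `chunkReceiver`.
/// Hypothetical names and signatures; not arsd.archive's API.
void decompressRle(
    ubyte[] delegate(ubyte[] buffer) bufferFiller,
    void delegate(const(ubyte)[] chunk) chunkReceiver,
    size_t inputBufferSize = 4,   // tiny defaults, just to exercise refills
    size_t outputBufferSize = 8)
{
    auto input = new ubyte[](inputBufferSize);
    auto output = new ubyte[](outputBufferSize);
    size_t outLen;
    int pendingCount = -1; // count byte still waiting for its value byte

    // Hand the finished part of the output buffer to the caller.
    void flush()
    {
        if (outLen)
        {
            chunkReceiver(output[0 .. outLen]);
            outLen = 0;
        }
    }

    while (true)
    {
        // Ask the caller for more input; an empty slice means end of input.
        auto got = bufferFiller(input);
        if (got.length == 0)
            break;
        foreach (b; got)
        {
            if (pendingCount < 0)
            {
                pendingCount = b; // first byte of a pair: the repeat count
                continue;
            }
            foreach (_; 0 .. pendingCount)
            {
                if (outLen == output.length)
                    flush();
                output[outLen++] = b;
            }
            pendingCount = -1;
        }
    }
    flush();
}
```

The two trailing parameters stand in for the configuration knobs: a caller only passes them when the default buffer sizes don't suit.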

I might still change it so the chunkReceiver can return a value telling it you've had enough, so it stops processing the file early. Otherwise, I'm pretty happy with it. It takes some of the trickier parts of buffer management, which the objects leave to you, out of your hands.

Generally speaking, the objects are externally driven: you call their functions when the right data is in the right place. These functions are internally driven: they call your functions when they need you to put the right data in the right place. The externally driven one can be more flexible (though having a fiber yield in the bufferFiller can give you much of that flexibility back, it does so with its own caveats), but the internally driven one is almost always easier to use. With these, doing a tar.gz or tar.xz read is now simple, since the functions can feed right into each other.
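For contrast, an externally driven version of a toy run-length decoder (input is (count, value) byte pairs) might look like the sketch below. The names are hypothetical, not arsd's objects. The caller pushes bytes whenever it has them, split however it likes, and the object remembers a half-read pair between calls. Because `put` just takes whatever it is given, an internally driven stage's chunk receiver can forward each chunk straight into an object like this, which is one way two stages can feed into each other.

```d
/// Externally driven toy decoder: the caller decides when to push input.
/// Hypothetical sketch, not arsd.archive's API.
struct RleDecoder
{
    private int pendingCount = -1; // count byte waiting for its value byte

    /// Decodes as much of `input` as possible, appending to `output`.
    /// A pair split across two calls is completed on the next call.
    void put(const(ubyte)[] input, ref ubyte[] output)
    {
        foreach (b; input)
        {
            if (pendingCount < 0)
            {
                pendingCount = b;
                continue;
            }
            foreach (_; 0 .. pendingCount)
                output ~= b;
            pendingCount = -1;
        }
    }

    /// True when no pair is left half-read, i.e. the stream may end here.
    bool atBoundary() const
    {
        return pendingCount < 0;
    }
}
```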

Getting data to fill the buffer from memory or a file is simple here, but getting it off a network stream can be a bit harder. Well, it doesn't have to be: you can do a blocking socket.receive call and hand back however much of the buffer it filled, and the function will make use of it. So it's still easy to use with the blocking receive function.
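With a blocking socket, the filler really is that short. Here's a minimal sketch using Phobos' std.socket, assuming a filler convention of "fill part of the buffer, return the filled slice, return an empty slice at end of input". The name `fillerFromSocket` and that convention are hypothetical, not arsd's exact signature. `Socket.receive` returns the number of bytes read, 0 once the peer has closed, or `Socket.ERROR`.

```d
import std.socket : Socket, SocketOSException;

/// Adapts a blocking socket to a pull-style buffer filler (hypothetical
/// convention: return the filled slice, or an empty slice at end of input).
ubyte[] delegate(ubyte[]) fillerFromSocket(Socket socket)
{
    return (ubyte[] buffer) {
        // Blocks until some data arrives or the peer closes the connection.
        auto n = socket.receive(buffer);
        if (n == Socket.ERROR)
            throw new SocketOSException("receive failed");
        // n == 0 means the peer closed: report end of input.
        return buffer[0 .. cast(size_t) n];
    };
}
```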

What about with a non-blocking function, where you get data-ready notifications? That's where the externally driven object is gonna work better on its own... but you can use a fiber.

You start a fiber and call the decompress function from there. When it requests data, you kick off the async read operation to the requested buffer (or you can stream through an intermediate, but going direct is often nicer if it is sized appropriately for decent performance) and yield.

When the read completes, its event handler resumes your fiber, which then returns to the decompressor, and work continues. This uses the internally driven API with an externally driven data source, at the cost of remembering to run it from inside a fiber. Not bad at all.
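Here's a minimal sketch of that fiber bridge using druntime's `core.thread.Fiber`. `countBytes` is a hypothetical stand-in for the decompress function (it just pulls through a filler until end of input), and `FiberBridge.onDataReady` plays the part of the async read's completion handler. A real event loop would issue the read directly into `requested` instead of copying data in as this sketch does.

```d
import core.thread : Fiber;

/// Stand-in for an internally driven API: pulls input through `bufferFiller`
/// until it returns an empty slice, and reports how many bytes it saw.
size_t countBytes(ubyte[] delegate(ubyte[]) bufferFiller)
{
    auto buffer = new ubyte[](8);
    size_t total;
    for (auto got = bufferFiller(buffer); got.length; got = bufferFiller(buffer))
        total += got.length;
    return total;
}

/// Runs the pull-style consumer inside a fiber so that an event-driven
/// (non-blocking) source can feed it. Hypothetical names throughout.
final class FiberBridge
{
    private Fiber fiber;
    private ubyte[] requested; // buffer the consumer asked us to fill
    private ubyte[] filled;    // part of it filled by the latest event
    size_t result;

    this()
    {
        fiber = new Fiber(() {
            result = countBytes((ubyte[] buffer) {
                requested = buffer; // here a real program starts the async read
                Fiber.yield();      // suspend until onDataReady resumes us
                return filled;
            });
        });
        fiber.call(); // run until the consumer first asks for data
    }

    /// Call from the read-complete handler; empty `data` means end of stream.
    void onDataReady(const(ubyte)[] data)
    {
        assert(data.length <= requested.length);
        requested[0 .. data.length] = data[];
        filled = requested[0 .. data.length];
        fiber.call(); // resume the consumer until it needs more or finishes
    }

    bool done() const
    {
        return fiber.state == Fiber.State.TERM;
    }
}
```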

Directory watchers

I continued my arsd.core work, including adding a directory watcher class. I happened to have my BSD box on over the weekend, so I started with the kqueue-based implementation and... I'm not impressed.

I've heard so many good things about kqueue, but had never used it myself until recently. There are some things I do like about it: its signal scheme is just plain better than what Linux offers, and the add and wait combination in a single call is ok (it isn't as great as some say, but it is nice to have sometimes). Overall, though, I prefer Linux's way with epoll, timerfd, eventfd, etc. And, of course, Windows offers a lot of nice functionality too.

Now, having worked with a bit of kqueue, Windows' functions, and Linux's inotify... kqueue is again the disappointing one of the bunch. You have to open each file in a directory as well as the directory itself, whereas Windows and Linux will just tell you what has changed from a single watch on the top level.

Supporting kqueue thus has some significant implications for the API design. I think what I'll do is take a glob pattern and scan for matching files there, and maybe apply the same pattern as an automatic filter on the other systems.
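A sketch of that design using Phobos' std.file and std.path, assuming a shallow (non-recursive) watch. `filesToWatch` lists the files a kqueue backend would have to open one by one, and `passesFilter` applies the same glob to the names Linux or Windows report for the whole directory. Both function names are hypothetical, and no actual kqueue or inotify calls are made here.

```d
import std.algorithm.iteration : map;
import std.algorithm.sorting : sort;
import std.array : array;
import std.file : dirEntries, SpanMode;
import std.path : baseName, globMatch;

/// kqueue-style: the matching files each need their own open descriptor
/// (plus one for the directory itself), so enumerate them up front.
string[] filesToWatch(string dir, string pattern)
{
    auto names = dirEntries(dir, pattern, SpanMode.shallow)
        .map!(e => e.name.baseName)
        .array;
    names.sort();
    return names;
}

/// inotify/Windows-style: the OS reports every name in the directory,
/// so the same pattern becomes an automatic filter on the reported names.
bool passesFilter(string reportedName, string pattern)
{
    return globMatch(reportedName, pattern);
}
```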

Utility functions

arsd.core is also taking scattered utility functions from other modules and consolidating them, which gives me an opportunity to write a few small ones that I just didn't deem worthwhile before.

One of these is a flagsToString function. This does things like

enum Flags {
	none = 0,
	a = 1,
	b = 2
}

assert(flagsToString!Flags(3) == "a | b");

This is implemented primarily to support error messages, but might be useful elsewhere too.
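One way such a function can work is compile-time introspection over the enum's members. This is a minimal sketch, not arsd.core's implementation; how it renders a zero value (the zero member's name, if any) and leftover unnamed bits (a hex remainder) are my assumptions.

```d
/// The example enum from the text.
enum Flags
{
    none = 0,
    a = 1,
    b = 2,
}

/// Renders `value` as the names of the enum members whose bits are all set.
string flagsToString(E)(ulong value) if (is(E == enum) && is(E : ulong))
{
    import std.format : format;

    string result;
    string zeroName; // name of a member equal to 0, if the enum has one
    ulong covered;   // bits explained by named members

    // Unrolled at compile time: one block per member, in declaration order.
    static foreach (name; __traits(allMembers, E))
    {{
        enum ulong bits = cast(ulong) __traits(getMember, E, name);
        static if (bits == 0)
        {
            if (zeroName is null)
                zeroName = name;
        }
        else
        {
            if ((value & bits) == bits)
            {
                if (result.length)
                    result ~= " | ";
                result ~= name;
                covered |= bits;
            }
        }
    }}

    // Bits with no matching member are shown as a hex remainder.
    const leftover = value & ~covered;
    if (leftover)
    {
        if (result.length)
            result ~= " | ";
        result ~= format("0x%x", leftover);
    }

    if (result.length)
        return result;
    return zeroName is null ? "0" : zeroName;
}
```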

Speaking of error messages, I also made a LimitedVariant, which, as its name implies, is a variant with limited capabilities. It really just holds numbers or strings, the kind of thing passed to system functions. I decided to try to pack it into the same space as a single D string, though I might change that to a separate tag later. It has some small-string optimization for storing short strings in-situ. I might also use this for the new database.d revamp eventually, but I'm still not entirely sure. Anyway, my tag also differentiates between binary, octal, hex, and decimal numbers. They're all stored the same way, of course, but the tag lets the automatic printer present each one a bit better to the user. I want the actual number for program inspection and a nice representation for the user, and this does help.
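A minimal sketch of the idea in D, with hypothetical names (`LimitedValue`, `Repr`). Unlike the packed layout described above, it keeps a separate tag next to a union (the alternative the paragraph mentions) and skips the small-string optimization. What it does show is the tag's job: the number is stored once, and the tag only decides how the printer presents it.

```d
import std.format : format;

/// How a stored value should be presented to the user.
enum Repr : ubyte { decimal, hex, octal, binary, text }

/// Holds either a number or a string; hypothetical sketch, not arsd.core's type.
struct LimitedValue
{
    static LimitedValue fromNumber(long n, Repr repr = Repr.decimal)
    {
        assert(repr != Repr.text, "use fromString for text");
        LimitedValue v;
        v.tag = repr;
        v.number = n;
        return v;
    }

    static LimitedValue fromString(string s)
    {
        LimitedValue v;
        v.tag = Repr.text;
        v.str = s;
        return v;
    }

    /// The raw number, for program inspection.
    long asNumber() const
    {
        assert(tag != Repr.text);
        return number;
    }

    /// The presentation chosen by the tag, for the user.
    string toString() const
    {
        switch (tag)
        {
            case Repr.hex:    return format("0x%x", number);
            case Repr.octal:  return format("0%o", number);
            case Repr.binary: return format("0b%b", number);
            case Repr.text:   return str;
            default:          return format("%d", number);
        }
    }

private:
    Repr tag;
    union
    {
        long number; // used by every tag except Repr.text
        string str;  // used only by Repr.text
    }
}
```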

Sadly, my current implementation can't use my flagsToString function, since the packed tag just isn't big enough for arbitrary types. That's another reason I might change it later. But a bunch of the OS-level calls don't have a convenient flags enum anyway; they tend to be independent constants that aren't introspection-friendly. Bleh.

Of course, instead of an array of limited variants, I could also have generated new subclasses to hold the members. And since I do that elsewhere anyway, I might delete all this stuff before the release and use that facility instead. We'll see.

First, I gotta keep hammering away at finishing the necessary code to get the breakage batched up on schedule.

Network helpers

I also started porting over some network helpers from other modules and unifying their interface. UDP listeners can be set up for per-thread or any-thread callbacks. TCP listeners can automatically dispatch connections to their own workers. And local named pipes and unix sockets can use it or a separate interface to aid with things like single-instance programs.
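For the single-instance case, a common trick with a unix socket is: try to connect, and if somebody answers, another instance is already running; otherwise claim the path and listen. Here's a minimal Posix-oriented sketch with Phobos' std.socket, where `tryBecomePrimary` is a hypothetical name and not the arsd.core interface.

```d
import std.file : exists, remove;
import std.socket : AddressFamily, Socket, SocketOSException, SocketType, UnixAddress;

/// Returns a listening socket if this process is the first instance, or null
/// if another instance is already accepting connections on `path`.
Socket tryBecomePrimary(string path)
{
    // Probe: if someone accepts our connection, an instance is already running.
    auto probe = new Socket(AddressFamily.UNIX, SocketType.STREAM);
    scope (exit) probe.close();
    try
    {
        probe.connect(new UnixAddress(path));
        return null; // a real program would forward its arguments here
    }
    catch (SocketOSException)
    {
        // Nobody is listening: either no file, or a stale one left by a crash.
    }

    if (exists(path))
        remove(path); // clear a stale socket file so bind can succeed

    auto listener = new Socket(AddressFamily.UNIX, SocketType.STREAM);
    listener.bind(new UnixAddress(path));
    listener.listen(4);
    return listener;
}
```

There is a small race between the probe and `bind`, so two instances started at the same instant could both get past the probe; the sketch ignores that.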

I think that if I can keep up this pace, the core module will be ready for beta testing in another two weeks or so. Then I still have some stuff I'm gonna try to do in sdpy, minigui, terminal, http2, simpleaudio, and game.d... more on this next week perhaps.
