di files are currently useless - the compiler does the same thing for the original source anyway

Posted 2023-10-09

Consider a potential new paradigm for caching CTFE results: simply wrapping them inside a standard runtime function. Using a property getter around a template instance, instead of declaring a variable, means the result is emitted to the object file and not recalculated on import.

Core D Development Statistics

5 bugs fixed
6 bugs and enhancement requests opened
Bugs closed WONTFIX:
- missed opportunity to propagate `final` to aliased symbol
Bugs closed INVALID:
- DIP1000 fails to determine proper lifetime for struct
- Interface address is offset by 16bytes, causing memory leaks
11 pull requests merged into the language: 11 into DMD and druntime and 0 into Phobos
1 pull request merged into the website.

In the community

Community announcements

See more at the announce forum.

CTFE Caching

Steven Schveighoffer, on the Discord chat room, found some old code written by Martin Nowak in Phobos with a curious comment: it said it wraps some CTFE in an auto return function so that semantic isn't run on the function body unless the function is actually called.

Inside dmd, there are various functions called semantic that do a whole mess of activities, notably including running any necessary CTFE in the function and processing imports in the function body. (They also calculate inferred attributes, but we'll come back to that thought.) The semantic family of functions is often one of the most expensive parts of a build (in specific cases, but notably including CTFE), so avoiding it can save much time.

An auto function isn't especially interesting for caching CTFE, however, since its body is always processed when the function is called, regardless of whether it is being compiled or imported. This is the same reason why inferred attributes work for them - the body is necessarily processed when the function is used, so the compiler doesn't need to be told explicit attributes - but it also means any CTFE inside the function is reprocessed any time it is used, even if it had been done before and the result is already in an object file. Just importing an auto function won't trigger the CTFE in its body, but importing it and calling it will, even if the module was precompiled.

This is an improvement over a global variable (whether immutable or enum makes no difference here), which is always computed, even if just imported and never used, but it isn't good enough yet. I'd like to be able to import a function AND call it, still getting to use precompiled CTFE results.
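A minimal sketch of the two shapes compared so far. The names (`table`, `tableGlobal`, `tableAuto`) are hypothetical and not part of the test rig below, and in a single file everything is a root module, so this only illustrates the shapes, not the import behavior:

```d
// Hypothetical names, for illustration only.
template table() {
    string build() {
        string s;
        foreach (i; 0 .. 100)
            s ~= "ok";
        return s;
    }

    enum table = build();
}

// Shape 1: a global. As described above, its initializer is computed
// whenever the module is imported, even if nothing uses the variable.
immutable string tableGlobal = table!();

// Shape 2: an auto function. Merely importing it costs nothing, but any
// compilation that calls it has to process the body to infer the return type.
auto tableAuto() {
    return table!();
}

void main() {
    assert(tableGlobal.length == 200);
    assert(tableAuto() == tableGlobal);
}
```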

Well, that old code comment, together with Walter's consistent claim that inferred attributes can harm compile speeds (which I think is [https://dpldocs.info/this-week-in-d/Blog.Posted_2022_07_11.html#compile-speeds|overblown]), made me think: is semantic actually run on the function bodies of imported non-root modules right now at all?

I made a test rig to investigate:

// ctfe.d
template thing() {
        string helper() {
                string s;
                foreach(i; 0 .. 50000)
                        s ~= "ok";
                return s;
        }

        enum thing = helper;
}

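// Global alternatives (always computed on import, used or not):
// immutable getThing2 = thing!();

// enum getThing2 = thing!();

string getThing() {
        // Deliberate errors, commented out so the module compiles on its own:
        // import ctfe3;      // non-existent module
        // int a = "foo";     // type mismatch
        return thing!();
}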

And a main module:

// ctfe2.d
import ctfe;

void main() {
        string s = getThing();

        // enum foo = getThing() ~ "lol";
}

The append loop in the imported module is tuned to give me a compile time of about one second on my computer. As I often have to remind people, compile speeds are less about the number of lines in a build and more about what you do with them. CTFE's ~= operator has a horrific implementation in dmd (and ldc and gdc) so it is easy to explode your compile times by using it. Normally, I recommend you avoid it for this reason, but when you want to make something build slowly deliberately, it is an easy thing to reach for.

I can now tell if the CTFE is run by just watching the compile time. A one second build vs about a 1/10th second build is easy to notice.

The deliberately failing type mismatch, int a = "foo";, and the import of a non-existent module are further ways to confirm that semantic is not run, since those errors are generated during the semantic runs too. I commented them out here so the module compiles on its own, but if you uncommented them, errors would NOT be generated when just importing it!

The key takeaway is that in the current dmd implementation, non-template, non-auto returning function bodies are completely skipped in non-root modules, that is, modules being imported but not compiled in this run of the compiler. Sure, they are lexed and at least minimally parsed to know when the function ends, but there is no semantic analysis, no ctfe, no code generation - the typically relatively expensive parts of a build are all skipped.
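To make the distinction concrete, here is a sketch of the three kinds of function involved. The names are made up, and the comments restate the finding above for current dmd rather than anything the language guarantees:

```d
// Explicit return type, not a template: per the finding above, current dmd
// skips this body entirely when the module is only imported.
int plain(int x) {
    return x * 2;
}

// auto return: the body has to be analyzed wherever the function is used,
// because the return type (and attributes) are inferred from it.
auto inferred(int x) {
    return x * 2;
}

// Template: the body is analyzed by whichever compilation instantiates it.
T templated(T)(T x) {
    return x * 2;
}

void main() {
    assert(plain(21) == 42);
    assert(inferred(21) == 42);
    assert(templated(21) == 42);
}
```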

Seeing this makes me understand why Walter thinks compile times may be adversely affected by universal attribute inference. I still say that is overblown - these are the relatively expensive parts of a compile being sometimes skipped, but most of the time they still aren't expensive in an absolute sense.

But, sometimes they are very expensive! This ctfe inside getThing is an example of where we indeed don't want semantic run again unless it is absolutely necessary. I think I'd amend my inferred attributes proposal to add a way to suppress the inference of attributes and keep the current behavior for this use.

Anyway, about a year ago, I wrote about a new concept of di files and mentioned one of the troubles was keeping a function available for CTFE while skipping its body... and it turns out that is already solved, today! You simply don't use the di generation... ever. (Now if it were redefined to fix the template emission problem described in my old blog post, then it would have some value, but it could still simply keep the body attached to the function and the compiler can skip it.)

Take a look at the test rig again (with other stuff snipped out this time):

template thing() {
        string helper() {
                string s;
                foreach(i; 0 .. 50000)
                        s ~= "ok";
                return s;
        }

        enum thing = helper;
}

string getThing() {
        return thing!();
}

There are a couple of layers on top of the actual CTFE calculation done in the helper function: a wrapper template and then a wrapper standard function. These both serve similar goals, but one at compile time and one at run time.

The wrapper template serves to cache the CTFE result at compile time. When the template is referenced again with the same arguments, the compiler pulls the result back out of its in-memory cache. It also suppresses code generation of the helper function, preventing it from existing in the object file for runtime use and keeping the compiler from wasting time running code generation on it.
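One way to watch the compile-time half of that is a sketch like the following, which adds a hypothetical `pragma(msg)` marker to the rig's template (the marker and the names `first` and `second` are not in the original rig). The compiler should print the message once, since both references use the same template instance:

```d
template thing() {
    // Hypothetical marker: emitted by the compiler once per distinct instance.
    pragma(msg, "instantiating thing!()");

    string helper() {
        string s;
        foreach (i; 0 .. 50)
            s ~= "ok";
        return s;
    }

    enum thing = helper();
}

enum first = thing!();   // instantiates the template and runs helper via CTFE
enum second = thing!();  // same arguments: reuses the existing instance

static assert(first == second);

void main() {}
```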

Then the wrapper function - note that it has an explicit return type - causes the CTFE result to be written to the object file. The next time the module is imported, the body is skipped and the template and its CTFE are not run again; callers instead reference the result from the precompiled object file.

However, if you do want to use the result at CTFE, you can - the body is still there and can be used on demand, but you'll pay the CTFE cost once per compilation job again. So it isn't cached in such a way that CTFE in separately compiled units can reuse it, but it does work for users just trying to call the function in their code normally.
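Both uses side by side, as a single-file sketch with the loop shortened so it builds quickly (in one file everything is a root module, so this shows only that both uses are legal, not the caching):

```d
template thing() {
    string helper() {
        string s;
        foreach (i; 0 .. 50)
            s ~= "ok";
        return s;
    }

    enum thing = helper();
}

string getThing() {
    return thing!();
}

void main() {
    // Ordinary runtime call: the case that can use the precompiled result.
    string s = getThing();

    // CTFE use: forces the body to be processed in this compilation job.
    enum foo = getThing() ~ "lol";

    static assert(foo.length == 103);
    assert(s.length == 100);
    assert(foo[0 .. $ - 3] == s);
}
```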

This is a pretty good result, available in today's compilers, and only a minor adjustment to your code. I'm looking forward to trying it in a more realistic situation than my test rig above.

(PS: one last note on separate compilation. One place where you can get the biggest benefit is where there's some really expensive internal thing with a really basic public interface. If you generate code to glue together web interfaces or scripting languages, which boils down to a lookup table to generated functions in the public interface, you can make gains by structuring the code such that all the generation is inside these kinds of basic functions. Even if the body is expensive, the body is not processed on import if you follow the rules above. This might be worth exploring more too.)
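A rough sketch of what that structure might look like. Everything here (`Api`, `Handler`, `findHandler`) is hypothetical and only one guess at the shape; the point is that the generated dispatch lives inside the body of a plain function with an explicit return type:

```d
// Hypothetical API being exposed to a script or web layer.
struct Api {
    static int twice(int x) { return x * 2; }
    static int square(int x) { return x * x; }
}

alias Handler = int function(int);

// Basic public interface: not a template, explicit return type.
// All the compile-time generation happens inside the body.
Handler findHandler(string name) {
    switch (name) {
        // Generate one case per member of Api that fits the Handler type.
        static foreach (member; __traits(allMembers, Api)) {
            static if (is(typeof(&__traits(getMember, Api, member)) : Handler)) {
                case member:
                    return &__traits(getMember, Api, member);
            }
        }
        default:
            return null;
    }
}

void main() {
    assert(findHandler("twice")(21) == 42);
    assert(findHandler("square")(5) == 25);
    assert(findHandler("missing") is null);
}
```

An importer that only calls `findHandler` at run time would, by the rules above, not need to repeat the reflection and generation in its body.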
