Thoughts on inferred attributes

Posted 2022-07-11

Inferred Attributes

In today's D, attributes are inferred for auto-returning functions and for templates:

auto foo() { }

void bar() {}

pragma(msg, typeof(foo)); // pure nothrow @nogc @safe void()
pragma(msg, typeof(bar)); // void()

Notice how foo automatically got pure nothrow @nogc @safe, without me having to write it. This is attribute inference.
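To see what that inference buys a caller, here is a minimal sketch (the names inferred and declared are made up for illustration). Both functions have the same body, but only the auto-returning one can be used from strictly attributed code:

```d
// Hypothetical names, for illustration only.
auto inferred() { return 1; } // auto return: attributes are inferred from the body
int  declared() { return 1; } // explicit return type: nothing is inferred in today's D

// A strictly attributed caller may use the inferred function...
static assert( __traits(compiles, () pure nothrow @nogc @safe { return inferred(); }));

// ...but not the plain one, even though its body is identical.
static assert(!__traits(compiles, () pure nothrow @nogc @safe { return declared(); }));
```

The __traits(compiles, ...) checks are just a way to make the compiler confirm both claims without stopping the build.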

One proposal out there, pushed especially by rikki cattermole among others, is to extend this inference to all functions. This would make said functions easier to use by people who care about the attributes, without making them any harder for library authors to write.

I like the idea. But there are some caveats that need to be considered:

  • Run-time dispatch
  • Function documentation
  • Compatibility contracts
  • Attributes dependent on arguments
  • ABI and .di file compatibility

Let's see how to address them.

Run-time dispatch

The first thing to realize is that these are static attributes, which puts them directly at odds with run-time dispatch unless you force them by writing them out explicitly.

interface Foo {
	void func();
}

class Bar : Foo {
	void func() {}
}

In the example, Foo.func can obviously not be inferred, since it has no body available to analyze. You can subclass it and do whatever you want; the interface makes no guarantees (unless, of course, the interface does specify some attributes, which forces all implementations of it to be at least that strict too).
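As a rough sketch of that parenthetical case, here is an interface that does spell out its attributes. Shape, Square and totalArea are hypothetical names, not from any library:

```d
// Hypothetical types, for illustration only.
interface Shape {
    // No body to analyze here, so the guarantee has to be written out.
    int area() pure nothrow @nogc @safe;
}

class Square : Shape {
    int side;
    this(int side) { this.side = side; }

    // Every implementation must be at least as strict as the interface.
    int area() pure nothrow @nogc @safe { return side * side; }
}

// The caller never sees a concrete body, yet it can still be attributed,
// because the interface made the promise on behalf of all implementations.
int totalArea(Shape[] shapes) pure nothrow @nogc @safe {
    int sum = 0;
    foreach (s; shapes)
        sum += s.area();
    return sum;
}
```

The attributes on Shape.area are what let totalArea be checked at all; without them, nothing is known about the body that ends up being called.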

But Bar.func does have a body. Can we infer it? Well, if it was final, I'd say yes: you can use stricter attributes on a child implementation than the parent interface requires. So the relationship with the parent interface is no blocker; you can infer as tightly as you want and still fulfil that.

However, since Bar.func is not final, the truth is the function body actually called when you call it is NOT actually known. Consider:

// module-level state that the override touches
int[] some_global;
int x;

class Baz : Bar {
	override void func() { some_global ~= x; }
}

void main() {
	Bar b = new Baz();
	b.func();
}

At this point, the main function cannot be @nogc or pure, even though Bar.func - the interface it chooses to call through - might have qualified for both. The reason is the override, which causes a run-time substitution of the function body.

So my lesson here is: we can infer attributes on a final method, but not a virtual one, because runtime dispatch allows the body to be substituted by anything that implements the formal interface. Restricting child classes like that must be an explicit attribute, otherwise you'd break substitution.

This failure to infer would propagate out to main, unless I marked Baz.func as final and was sure to call it through that interface:

class Baz : Bar {
	final override void func() { some_global ~= x; }
}

void main() @safe {
	// this would be ok, it can still infer @safe
	// on Baz because it is `final` and thus we know
	// for sure which function body is actually used here.
	Baz b = new Baz();
	b.func();
}
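The "body is known, so it can be inferred" rule lines up with something that already works today: template methods are never virtual, so they do get inference. A small sketch, with a hypothetical Counter class, using the empty-template-parameter-list trick:

```d
// Hypothetical class, for illustration only.
class Counter {
    int count;

    // The empty template parameter list `()` makes this a template method.
    // Template methods are never virtual, so the compiler knows the body
    // and infers its attributes today.
    int next()() { return ++count; }
}

// Compiles only because `next` inferred to at least @safe nothrow @nogc.
int bump(Counter c) @safe nothrow @nogc {
    return c.next();
}
```

This is a workaround in current D, not the proposal itself; the proposal would give an ordinary final method the same treatment without the extra parentheses.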

Function documentation

Auto-generated documentation tends to look only at the attributes written in the source, but it can be useful to know what was actually inferred. The doc generator would want the compiler to report the inferred attributes, and it should call out that they were inferred rather than declared. This matters because of compatibility contracts, so read on.

Compatibility contracts

If a library makes a change in a future version that causes the inferred attributes to change, is this a breaking change? Well, my answer (as it always is with defining breaking changes) is "it depends on what you promised". The documentation defines those promises, which is why I think it is so important that the documentation calls out the difference between a static guarantee and an inferred reality.

If I write a function:

Color parseColor(string name) @safe @nogc { ... }

Then I'm promising it is guaranteed to be @safe and @nogc - if I were to release a new version that does an allocation, for example, that'd be a breaking change.

But, at the same time, it may very well infer to pure, and you'd then be able to use it from a pure function. It'd compile, but since I didn't list pure there, I'm not promising it will stay pure in the future.

Of course, it very well may! I might add pure in a future version and expand my guarantee to it. But, if I were to change it in the future to, for example, check the user's locale from a global variable, this would no longer infer to pure. And since I didn't list it, I could call this a new feature (with a minor version bump) instead of a breaking change (with a major version bump).

I'm perfectly ok with this, as long as the documentation clearly shows the difference between a coincidental inference that lets it work for now and a long-term compatibility guarantee that I'll keep it that way. Yes, an implementation change might break your build. But that's really no different from you relying on something I've labeled a bug and later release a fix for.
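Here is one way the promised-versus-inferred split could look in code. This is only a sketch: the Color struct and the function body are placeholders I made up, and the empty template parameter list is there purely so today's compiler infers the unlisted attributes, standing in for what the proposal would do for ordinary functions.

```d
struct Color { ubyte r, g, b; }

// ASSUMPTION: `()` is a stand-in to get inference in today's D,
// and the body is a made-up placeholder.
Color parseColor()(string name) @safe @nogc {
    if (name == "red")   return Color(255, 0, 0);
    if (name == "green") return Color(0, 255, 0);
    return Color(0, 0, 0);
}

// Promised: written in the signature, so this must keep compiling.
static assert(__traits(compiles, () @safe @nogc { return parseColor("red"); }));

// Not promised: this passes only because `pure` happens to be inferred right now.
static assert(__traits(compiles, () pure { return parseColor("red"); }));
```

A user who depends on the second check could keep it in their own code as a canary: if a later release stops inferring pure, the failure shows up at that line with a clear reason instead of somewhere deep in a call chain.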

Attributes dependent on arguments

Inferred attributes changing because the implementation changed is one thing, but what if it changed because of user code passing a different argument?

This works today with templates, at the cost of another instantiation:

void forward(Dg)(Dg dg) {
	dg();
}

void main() {
	void delegate() notSafe;
	void delegate() @safe yesSafe;

	// when you call `forward`, it automatically
	// instantiates it with the type of the argument.

	// so calling `forward(notSafe);` it'd do this:
	pragma(msg, typeof(forward!(typeof(notSafe))));
	// Note that when creating with notSafe, it is
	// @system void(void delegate() dg)

	// and calling `forward(yesSafe);` it'd do this:
	pragma(msg, typeof(forward!(typeof(yesSafe))));
	// But when created with yesSafe, it is
	// @safe void(void delegate() @safe dg)

	// Note the one is `@system` and the other is `@safe`.
	// It inferred different types from the arguments.

	// But notice it is different types and different
	// addresses:
	assert(&forward!(typeof(notSafe)) != &forward!(typeof(yesSafe)));

	// Because they are different instantiations!
}

What really happens with the templates is the given arguments are pasted into a new function body, then that new body is passed through the attribute inference system.

A common pattern is to use an annotated unittest to demonstrate that it may have the attribute in certain circumstances, with the test providing an example that the compiler checks as it compiles the test.
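For the forward template from earlier, that pattern might look like the sketch below. The test body is my own example; the leading /// is what marks it as a documented unittest attached to the function above it.

```d
void forward(Dg)(Dg dg) {
	dg();
}

/// With a @safe delegate, `forward` is usable from @safe code.
@safe unittest {
	int calls;
	forward(() @safe { calls++; });
	assert(calls == 1);
}
```

If a later change to forward made it unconditionally @system, this test would stop compiling, so the "works with @safe arguments" claim stays checked rather than just stated.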

I plan on adding detection of "maybe, dependent on argument" attributes to adrdox soon by scanning attached unittests for them. This is another category for documentation: always guaranteed (attributes present in the declaration), guaranteed with appropriate arguments (attributes present in the unittest), and not guaranteed, but works today (attributes inferred).

If we were inferring non-template function bodies though, there is only one body, so to get it to infer on arguments, we'd have to make that part of a single type, instead of depending on multiple types coming in.

There's an "Argument dependent attributes" DIP about that: https://github.com/dlang/DIPs/pull/198 and I believe another competing proposal, though I can't find that link right now. I think of the concept as "inout for attributes": just like inout can make the return constness dependent on an argument's constness, this idea would make the function's attributes dependent on the argument's attributes. It'd make the rest of the function as strict as the attribute requires - just like how inout works like const inside the function - but defer the final result to depend on what it is passed. And it'd do this with a single function body, with no need for multiple copies of the same function.
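For anyone who hasn't used inout, this is the existing feature the analogy leans on. The helper name firstHalf is hypothetical; the point is just that one body serves every constness:

```d
// Hypothetical helper, for illustration only.
// One function body; the constness of the result follows the argument.
inout(int)[] firstHalf(inout(int)[] arr) {
    // Inside the body, `arr` is treated like const: elements cannot be modified.
    return arr[0 .. $ / 2];
}

// Mutable in, mutable out.
static assert(is(typeof(firstHalf((int[]).init)) == int[]));

// Immutable in, immutable out, with no second instantiation.
static assert(is(typeof(firstHalf((immutable(int)[]).init)) == immutable(int)[]));
```

An attribute version of this would, by analogy, check the body against the strictest case and let each call site see the attributes its arguments allow. That part is the proposed idea, not something the compiler does today.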

In any case, while this is something to consider for attribute inference, it would require a new feature in the type system so it is technically an orthogonal idea. Probably a worthwhile one regardless of inference; it'd be useful with explicit guaranteed attributes too!

ABI and .di file compatibility

Another concern with inferred attributes - and the main one Walter Bright brings up in response to the proposals - is that attributes become part of the mangled name, meaning that if an implementation tweak changed the inferred attributes, it would also change the name the linker sees. That can lead to "undefined reference" errors if a separately written binding no longer matches.
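You can see the attribute-in-the-name effect without involving the linker at all. A D function's mangled symbol embeds its mangled type, so this sketch just compares two function pointer types that differ only in attributes (Plain and Strict are made-up alias names):

```d
// Two function pointer types that differ only in their attributes.
alias Plain  = void function();
alias Strict = void function() pure nothrow @nogc @safe;

// Print both mangled types at compile time to compare them by eye.
pragma(msg, Plain.mangleof);
pragma(msg, Strict.mangleof);

// The attributes are encoded in the mangling, so the strings differ.
static assert(Plain.mangleof != Strict.mangleof);
```

A binding that declares the Plain signature against an object file that exports the Strict one would therefore be asking the linker for a different symbol.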

I put this last because it is the least of my worries; I don't really understand why Walter is concerned about it. It is not surprising to me that changing the code might break the interface, though I suppose there is a difference between editing a function body and seeing the linked name change, versus editing the visible prototype itself and seeing a change in how it links.

This is mitigated by two facts: 1) separate bindings are relatively rare in D, since most projects simply use the source, and 2) you can auto-generate them with dmd -H. Now, dmd -H is actually pretty bad, but it does work in some cases already. (I'd like to fix it, and maybe I'll write in more detail later; the compiler ought to be able to make it work much better than it does. I also think the .di generation, or maybe the dmd json output including inferred values, might be a good source of data for a docgen.)

And when you do get a linker error, it is easy enough to update your bindings, though going back to the compatibility contract concept, needing to update bindings might come as a surprise. But still, the library author ought to be the one providing .di bindings to a D thing anyway, so it wouldn't come as a surprise to them!

So while this can be trouble in extraordinary situations, I think the authors themselves are most likely to see it, and them fixing it is not too difficult today and can be even easier tomorrow.

Compile speeds

A bonus concern: will inferring attributes add to compile times? I doubt it would matter, but if it does, .di generation can come back and cache the result anyway.

I'd kinda like to see .di and .obj files get bundled officially and be well supported for some of these cases. If a function is in the object file, generate the matching declaration so the compiler knows what it can and can't reuse.

Conclusion

I think inferring attributes can be extended effectively right now. Using it on functions with a body known at compile time (so basically anything non-virtual and not just a no-body declaration) ought to work just fine and practically extend the usability of these attributes without being a burden on library authors.

The promise of maintaining compatibility is still a burden on library authors, but they'd opt into that the same as today. The big difference is when they write it without thinking about it, things are far more likely to work and this might help provide pressure to maintain compatibility if that is desired.

I say we ought to do it.