Adam's string interpolation proposal

Posted 2019-05-13

Well, DConf was this week, but I only caught a couple of hours of it due to technical difficulties on day one, the timezone on day two, and personal life on day three, so I will have to write about it later.

This week, though, I will talk about D string interpolation and lay out my proposal.

Core D Development Statistics

In the community

Community announcements

See more at the announce forum.

My string interpolation proposal

The idea

Starting from Jonathan Marler's proposal, I want to suggest a few additions and details. The big difference is that I really want the interpolated string to produce a new type. See here for some initial comments: https://github.com/dlang/DIPs/pull/140#discussion_r283385684

Here's the basic idea. Given:

int i = 50;
string cool = "amazing";
auto res = i"test $i$(30+5) $cool";

res is one of these:

struct ResultOfInterpolation__anonymous_name {
	// this allows easy DbI if necessary to identify we have one
	// of these
	enum __d_isInterpolatedString = true;

	// the original string literal is also available for
	// future introspection. We might process this to get
	// original variable names and stuff or whatever in libs.
	enum string __d_originalString = "test $i$(30+5) $cool";

	// notice the types of even-numbered members are always string;
	// odd ones are whatever the type of the expression inside the
	// interpolated string worked out to.
	//
	// (actually, it would be `args[0]` instead of `arg0` in the likely
	// real implementation, but I am spelling it out here to be explicit.)
	string arg0 = "test ";
	auto arg1 = 50; // the i got passed by value down here
	string arg2 = ""; // the gap between interpolations is always a string
	auto arg3 = 35; // the 30 + 5 is passed again
	string arg4 = " ";
	auto arg5 = "amazing"; // the cool var, passed by value

	// note that this is a zero-arg template: it is only compiled
	// when used, and works for constness, etc., too, as an added bonus.
	string toString()() {
		import std.conv : text;
		return text(this.tupleof);
	}

	// maybe this, read on
	alias toString this;
}

The names of the members are all well-defined and reliable for use by libraries and reflection.
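For instance, continuing the res example above (a sketch under the lowering described here; a real implementation might expose args[0] and so on rather than arg0):

// the enum members live on the type, so they are compile-time accessible
static assert(typeof(res).__d_isInterpolatedString);
static assert(typeof(res).__d_originalString == "test $i$(30+5) $cool");

// the value members are ordinary fields, captured by value
assert(res.arg0 == "test ");
assert(res.arg1 == 50);
assert(res.arg5 == "amazing");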

Usage

This would be used in one of three main ways, plus a bonus:

  1. As a string:
    string s = i"foo $(bar)".toString();

    You'd call .toString on it (or it could be .text; I could go either way, but I lean toward toString because that is the traditional name for such a function) and get a simple string. This is your most basic use case, and it is covered without requiring an explicit import on the user's end.

    This calls out to Phobos to actually do the conversions, so the compiler doesn't have to know any of that. Yes, it is a bit gross having a lower-level struct call up to Phobos, but it is only done when called, it is in theory replaceable for embedded and similar environments (provide your own std.conv module), and it is a template, so you only pay for it if you use it; I'm OK with it. Note that the exponentiation operator is precedent for built-in language syntax delegating to Phobos.

    If we wanted to, we could also add alias toString this to that generated object, and make string s = i"foo $(bar)"; just work as well. So would string s = "foo" ~ i"foo $(bar)";, as the ~ operator would also trigger the implicit conversion via alias this. I'm not 100% sold on doing that - it would be a slightly hidden implicit runtime allocation, and it might have other edge cases (alias this can be finicky with existing Phobos templates; wouldn't it be weird if std.file.exists(i"$(my_directory)/filename") gave an ugly compile error? but that's not a dealbreaker to me, just a slight concern).

    Anyway, I am leaning toward adding this, just not quite at a "yes" yet. Y'all can probably tip me over into adding it if you think it is a good idea. This would cover the common use case even more transparently.

  2. As a tuple:
    import std.stdio;
    
    writeln(i"foo $(bar)".tupleof);

    You can call .tupleof on it and get the expanded values and pass them to a function, same as Jonathan Marler's current implementation. (Yes, you could just do `import std.conv; i"foo $(bar)".tupleof.text` as well to explicitly call Phobos, obviating #1 here, but I think that is common enough to build in.) This lets you pass it to functions that don't explicitly know about interpolated strings, but without turning it into an intermediate string and losing other information (or incurring the runtime allocation) in the process.

    I'd probably actually modify writeln, text, and other common standard library functions to understand the interpolated object and just work, so you could surely do writeln(i"foo $(bar)") directly. But I wanted to demonstrate that tupleof can be used without specialized library functions anyway.
  3. As a specialized object to library functions that are designed for it:
    // in user code:
    sqlQuery(i"foo $(bar)"); // passed directly, processed correctly!

    // in a library:
    void sqlQuery(T)(T value) if(is(typeof(T.__d_isInterpolatedString))) {
    	// custom-process the value, *reliably* differentiating
    	// the string literals from the interpolated values
    }

    You write a templated function that is specialized to handle one of those objects for your particular use case and, as a library author, do whatever you want with it. By making it a new type, we avoid confusion with any other list of random strings or variadic arguments, and we get a hook point for adding other functionality.

    I put in two enum members: __d_isInterpolatedString, which makes identifying one of these in template constraints and static if conditions trivial (just check for the presence of that member), and __d_originalString, which holds the original string literal from the code, enabling further library processing in the future. It is just a string literal, and the compiler already knows it, so we might as well have it here. For example, sqlQuery might process it to issue a compile-time error via static assert for malformed SQL syntax! Yes, even with the interpolation object itself passed by value, __d_originalString, being an enum attached to the type, is still inspectable at compile time! (See the sketch right after this list for what such a library function might look like.)

    (whoa, I wasn't intending that last bit and didn't realize it until I wrote the above paragraph... but that might actually be the killer feature of this proposal. wow. but I digress)

    Then, the toString method enables use-case #1 conveniently. .tupleof, for use-case #2, is provided for free, just like any other struct.

    Finally, the arguments are spelled out as members with predictable names, so a library template can do the rest via D's existing reflection facilities. Alternatively, it might be a compiler tuple simply encapsulated in the struct and accessed by the index operator, but if we do that, more compiler magic must be implemented to keep .tupleof working, or we would replace it with .args at the usage point, or something. I could go either way.

    Just like in the Marler proposal, it is designed to always go string, arg, string, arg, so you can differentiate i"$(foo)($bar)" from i"$(foo)bar" and do the necessary escaping, conversion, etc. work.

  4. And, as an added bonus:
    auto value = i"foo $(bar)";
    later_on_process(value.toString());

    Since it is a struct, the interpolated string can also be held in intermediate variables, passed as enums, and possibly even manipulated (that might be bad style, but there's no technical reason why we couldn't), just like any other struct, until you decide to transform it at the usage point by one of the three methods above.
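To make the library-author use case more concrete, here is a rough sketch of what an interpolation-aware sqlQuery might look like under this proposal. The escapeSqlValue helper is hypothetical and deliberately naive; only the __d_ members and the string/arg alternation come from the proposal itself:

import std.conv : to;

// hypothetical helper - a real driver would do proper escaping or
// parameter binding instead of this naive quoting
string escapeSqlValue(T)(T value) {
	return "'" ~ to!string(value) ~ "'";
}

void sqlQuery(T)(T value) if(is(typeof(T.__d_isInterpolatedString))) {
	// the original literal is an enum on the type, so it is available
	// at compile time even though `value` arrived as a runtime argument;
	// a real library could static assert on malformed SQL here
	static assert(T.__d_originalString.length > 0, "empty query?");

	string sql;
	// members alternate: even indices are literal string fragments,
	// odd indices are interpolated values that need escaping
	foreach(i, member; value.tupleof) {
		static if(i % 2 == 0)
			sql ~= member; // literal fragment, passed through as-is
		else
			sql ~= escapeSqlValue(member); // interpolated value, escaped
	}

	// ... hand `sql` off to the database here ...
}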

Why not in a library?

Pure library solutions

The obvious question about such a heavily library-accessible solution is: why not just go all the way and make it a pure library entity?

import my.interpolation : i;

mixin(i!"foo $(bar)");

That's kinda heavy at the usage point; I probably would never bother using that. (just like with octal... I wrote that template myself too, and I think my implementation is brilliant... but since the crappy C-style octal literals were deprecated (and they deserved to be!), I have still never actually used octal!123 in real life. I just write it as hex or binary - as built-in literals - instead.)

To be fair, it is possible to do, and since the whole implementation is library-provided, each library could do its own custom processing. Strictly speaking, string interpolation is not new to D.

But, at the same time, template metaprogramming was also not a new idea when D got it. C++ had been able to do it for ages. D's success there was not making the impossible possible... it was making the possible *approachable*. I think we can do the same with built-in interpolation syntax. And by making it the object described above, we sacrifice nothing in library flexibility (while keeping the compiler implementation simple). Even as I was writing this document, new potential ideas arose on how to leverage this to write better libraries.

And with built-in syntax at the usage point, people might actually *use* those better libraries, instead of avoiding them like I do with the (otherwise really cool!) octal template.

Hybrid library solutions

A hybrid implementation is my preference: the compiler simply translates the i"..." syntax to __d_interpolated_string!"original string given"(tuple expansion...) and lets druntime define the surrounding struct. If we can make this work, I'd actually like to do it. I think it'd be a simpler implementation, it avoids the compiler outputting a Phobos reference directly (though druntime still would; I'm OK with that since it is hidden behind a template function), and it lets us experiment in other environments more.

I'm slightly concerned about one thing relative to the Marler proposal though:

alias a = something;
enum b = "bar";
foo!i"$(a) $(b)";

With that one, and the tuple implementation, the alias-ness of a and the enum-ness of b are still passed to foo. Passing through a library middleman function would drop this; they would become values indistinguishable from runtime constructions.

However... I think that's perfectly fine. An interpolated string is about values, it doesn't need to perfectly preserve alias and enum distinctions. It would be kinda cool if it did... but it doesn't have to, and I want to avoid complicating it unnecessarily.

Now, we could try passing it as template arguments, but the following MUST work:

int a = 0;
foo(i"$(a + 5)");

And if that were sent as a template parameter, you would get "variable a cannot be read at compile time" errors. So I'd pass just the string itself as a template parameter (which allows the enum __d_originalString member of the struct to be processed by future templates), and pass the rest as values.

Of course, CTFE can still work with values, so the interpolated string syntax can still be used in a CTFE context; it just doesn't force one.
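For example (a sketch assuming the proposed syntax and the toString above; none of this compiles today):

// CTFE works when the interpolated values are themselves compile-time values
enum greeting = i"two plus two is $(2 + 2)".toString();
static assert(greeting == "two plus two is 4");

// but nothing forces compile time - plain runtime values are fine too
int answer = 42;
string s = i"the answer is $(answer)".toString();
assert(s == "the answer is 42");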

Thus, the compiler-to-runtime-library interface looks like this:

// user writes
auto a = i"$(foo)";

// compiler rewrites to:
// note that the first argument is always a string, even if empty
auto a = __d_interpolated_string!"$(foo)"("", foo);

// library then generates the struct from my introduction

And the implementation in the runtime library might look like:

auto __d_interpolated_string(string str, T...)(T args) {
	static struct Impl {
		enum __d_isInterpolatedString = true;
		enum __d_originalString = str;

		T args;

		string toString()() {
			import std.conv;
			return text(this.tupleof);
		}
	}

	return Impl(args);
}

I haven't confirmed that all the details of constness, scope, etc. work, but those should be inferred by existing template and struct rules anyway - it passes my quick tests at least! (And tbh, even if it doesn't work with every possible thing, I still say it is worth doing.)
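For reference, a quick test along those lines might look like this - it just calls the hypothetical __d_interpolated_string by hand, exactly as the compiler would after the rewrite (assuming the definition above is in the same module):

void main() {
	int foo = 42;

	// hand-written version of what the compiler would emit for i"$(foo)"
	auto a = __d_interpolated_string!"$(foo)"("", foo);

	// the enum members are attached to the generated type
	static assert(typeof(a).__d_isInterpolatedString);
	static assert(typeof(a).__d_originalString == "$(foo)");

	// and toString delegates to std.conv.text over the captured values
	assert(a.toString() == "42");
}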

The existing pull request by Jonathan Marler should be easy to adapt for this - all it needs to do is wrap the expansion in that __d_interpolated_string call, adding the original literal as a template argument. No other magic inside the compiler.

Syntax inside the string

I have consistently used i"$(bar)" in these examples, as that is my preference, but there are some thoughts as to making i"$bar" work too. If we go with that, I'd propose we piggy-back off the UDA grammar here.

So, the rule would be: if @token.sequence compiles, so would i"$token.sequence".
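To illustrate the rule (a sketch of the proposed shorthand, not something that compiles today):

// @bar is valid UDA grammar, so $bar would be accepted
auto s1 = i"hello $bar";

// @user.name is also valid UDA grammar, so the dotted form would work too
auto s2 = i"hello $user.name";

// anything beyond that grammar still needs the explicit parenthesized form
auto s3 = i"hello $(user.name.length + 1)";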

However, I selfishly want to keep the rule very, very simple for two reasons:

  • I want the string to be trivially parsable by a hand-written library, by syntax highlighters, etc. Scanning for $( and matching parentheses is very simple and can be done reliably even with weaksauce editor syntax definition restrictions.
  • Glancing at a string with the naked eye, I want it to be easy to tell what is code, too. i"$$$foo!"bar"". How do we parse that? Of course, we can define rules to handle it, and of course, they can be pretty simple for computers to implement.

    I just want it to be simple enough that human readers and writers get it right at first glance too. Again, $() does that and I think the parens are acceptable syntax to require.

(lol, ironically, I unbalanced the parentheses in the block above. adrdox uses balanced parens to find blocks too, but it doesn't tokenize. D's lexer does, so it is balanced parens *in the token stream*, not in the text per se. Which means the hand-written parsers do still need to at least understand string literals inside those parens, and know that a ")" inside one doesn't count as an openParensCount--; call. But this is still lexing, not parsing, so I'm OK with it.)

I also like this being primarily done in the lexer, which doesn't know what a token sequence means. Again, requiring matching parentheses means it can just tokenize and skip over the stuff inside to get to the end of the i"" block.
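As a sketch of how simple that scanning is, here is roughly what such a hand-written helper could look like (hypothetical, not part of the proposal, and it only knows about double-quoted string literals):

// returns the index just past the ')' that closes the $( at `start`,
// or size_t.max if the parens never balance
size_t findInterpolationEnd(string s, size_t start) {
	assert(start + 1 < s.length && s[start] == '$' && s[start + 1] == '(');
	int depth = 0;
	size_t i = start + 1;
	while(i < s.length) {
		char c = s[i];
		if(c == '"') {
			// skip over a double-quoted string literal so a ')' inside
			// it doesn't affect the depth counter
			i++;
			while(i < s.length && s[i] != '"') {
				if(s[i] == '\\')
					i++; // skip the escaped character
				i++;
			}
		} else if(c == '(') {
			depth++;
		} else if(c == ')') {
			depth--;
			if(depth == 0)
				return i + 1;
		}
		i++;
	}
	return size_t.max; // unbalanced
}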

As for printing a literal $... I'd actually not even provide special syntax for that. Here's how you'd do it with no special rules:

assert("$" == i"$("$")".toString());

Inside the parentheses, we can have any D expression (so long as, after tokenization, the parentheses balance). And a string literal containing a $ is a valid D expression.

So we don't need special escaping syntax at all! I understand that is a little wordy, so I am not opposed to having something; I just don't think it is necessary.

Interpolated token strings?

What about iq{ $(bar) }? I know other people want interpolation to be available in token strings for mixin purposes, but I have to vote against that. Consider this:

mixin(iq{
   string s = i"foo $(bar)";
});

At which level is that $(bar) meant to be interpolated? On the outside of the mixin, or in the code generated by the mixin? Well, of course, the lexer could see it is inside its own i"..." token and bypass it, but I still just... don't love it.

Besides, you shouldn't need to interpolate anything except a declaration name in a generated mixin string! And you could always concatenate them. Concatenation of names just isn't ugly enough to warrant new syntax at all, especially when the new syntax would be problematic. (And I don't want to encourage more interpolation inside mixins. If you are converting stuff to strings to mix it in, you are usually doing it wrong.)

So no, in my proposal, ONLY the i"..." string gets this interpolation feature (plus perhaps i"..."w and i"..."d variants if we want them, which would work exactly the same way: they'd just declare the literal args as type wstring or dstring instead of string, and could alias this to a wstring toWstring()() { return wtext(this.tupleof); } instead - a library function can detect the variant by checking whether typeof(args[0]) is wstring or dstring instead of string).

All other string literal types are not changed.
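For illustration, a width-aware library consumer could detect the variant like this (a sketch assuming those w/d variants exist; it keys off the first literal segment, which is always present, even if empty):

void consume(T)(T value) if(is(typeof(T.__d_isInterpolatedString))) {
	// the first member is always the leading literal segment, so its
	// type tells us which variant we were handed
	static if(is(typeof(value.tupleof[0]) == wstring))
		pragma(msg, "got a UTF-16 interpolated string");
	else static if(is(typeof(value.tupleof[0]) == dstring))
		pragma(msg, "got a UTF-32 interpolated string");
	else
		pragma(msg, "got a UTF-8 interpolated string");

	// ... process value.tupleof as usual ...
}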

Miscellaneous details

What about formatting or mixing with other stuff? Well, all of that is delegated to the library. The interpolated string is really just sugar for creating a struct that defines a few members and methods. You can answer such questions today by creating the struct yourself by hand and trying it.

Just remember, the syntax sugar makes that constructor pattern a lot easier, and my goal here is simply to lower the barrier of use enough that this makes D libraries even better than they already are.