Adam shares Windows console secrets - DO NOT USE chcp!!

Posted 2019-11-25

The community announcement about a WebAssembly proposal is solid, and the author has already done some good work toward it. I'm excited to see where he goes with it!

Meanwhile, I am using an older technology and want to clear up some common misconceptions about it: how to properly do input and output with the Windows (Vista+) console API.


In the community

Community announcements

See more at the announce forum.

Tip of the Week

Unicode

A common piece of advice about using the Windows console is to run chcp 65001 (or call its API equivalents, SetConsoleOutputCP and SetConsoleCP) to use UTF-8 on it. While this works to some extent, it brings with it some strange bugs, especially in conjunction with other programs. Last week I worked on a project that saw random font changes because it spawned another process that interacted badly with the modified console state. I do not recommend doing this.

A better way is to sidestep the code page question entirely by using the wide character (UTF-16) APIs that Windows provides, such as ReadConsoleW and WriteConsoleW. Then it will work no matter the console's settings, without side effects. The tricky part is that these APIs only work on actual consoles, so if you want to support pipes and file redirection as well, you will need to test and branch.

Let me show you some code:

// this detects if the given stream (0 = stdin, 1 = stdout, 2 = stderr)
// is a console, or something else.
bool isConsole(int fd) {
	version(Posix) {
		import core.sys.posix.unistd;
		return cast(bool) isatty(fd);
	} else version(Windows) {
		import core.sys.windows.windows;
		auto handle = GetStdHandle(fd == 0 ? STD_INPUT_HANDLE : fd == 2 ? STD_ERROR_HANDLE : STD_OUTPUT_HANDLE);
		// GetConsoleMode only succeeds on real console handles. Checking
		// GetFileType(handle) == FILE_TYPE_CHAR would also accept the NUL
		// device and serial ports, where ReadConsoleW/WriteConsoleW then fail.
		DWORD mode;
		return GetConsoleMode(handle, &mode) != 0;
	} else
		static assert(0);
}

version(Windows)
string readln() {
	import core.sys.windows.windows;
	import std.conv;

	if(isConsole(0)) {
		// if in a console, we want to
		// use ReadConsoleW and convert that
		// input data to UTF-8 to return.

		wchar[] input;
		wchar[2048] staticBuffer;
		// this loops because the buffer might be too small for a line
		// (though I doubt that will actually happen)
		while(input.length == 0 || input[$ - 1] != '\n') {
			// if we are on a second loop, we need to copy the input
			// away so the next read doesn't smash data we must store
			if(input.ptr is staticBuffer.ptr)
				input = input.dup;
			DWORD chars = staticBuffer.length;
			if(!ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), staticBuffer.ptr, chars, &chars, null))
				throw new Exception("read stdin failed " ~ to!string(GetLastError()));
			if(input is null)
				input = staticBuffer[0 .. chars];
			else
				input ~= staticBuffer[0 .. chars];
		}

		char[] buffer;
		auto got = WideCharToMultiByte(CP_UTF8, 0, input.ptr, cast(int) input.length, null, 0, null, null);
		if(got == 0)
			throw new Exception("conversion preparation failed " ~ to!string(GetLastError()));
		buffer.length = got;

		got = WideCharToMultiByte(CP_UTF8, 0, input.ptr, cast(int) input.length, buffer.ptr, cast(int) buffer.length, null, null);
		if(got == 0)
			throw new Exception("conversion actual failed " ~ to!string(GetLastError()));

		// drop the terminator, or maybe you can convert it
		// from \r\n to \n or whatever.
		auto ret = cast(string) buffer[0 .. got];
		if(ret.length && ret[$ - 1] == 10)
			ret = ret[0 .. $ - 1];
		if(ret.length && ret[$ - 1] == 13)
			ret = ret[0 .. $ - 1];
		return ret;
	} else {
		// for pipe or file redirection, just use the normal
		// thing. UTF-8 may be cool there, but you should do
		// whatever is best for interoperability.
		import std.stdio;
		import std.string : chomp;
		// chomp drops the trailing \n (or \r\n) to match the console branch
		return stdin.readln().chomp;
	}
}

version(Windows)
void writeln(string s) { // you might actually do a variadic template for full compatibility, but that's trivial
	import core.sys.windows.windows;
	import std.conv;

	// again, it is important to branch on output
	// being the console or redirected.
	if(isConsole(1)) {
		wchar[2048] staticBuffer;
		wchar[] buffer = staticBuffer[];
		DWORD i = 0;

		// in here I do the conversion in D instead of
		// using the Windows function, since I know it is
		// going UTF-8 to UTF-16, but you could do
		// it the other way too.
		foreach(wchar c; s) {
			if(i + 2 >= buffer.length)
				buffer.length = buffer.length * 2;
			buffer[i++] = c;
		}
		if(i + 2 > buffer.length)
			buffer.length = buffer.length + 2;
		// adding the new line
		buffer[i++] = 13;
		buffer[i++] = 10;
		DWORD actual;
		// and now calling the wide char function
		if(!WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), buffer.ptr, i, &actual, null))
			throw new Exception("write console failed " ~ to!string(GetLastError()));
	} else {
		// non-console should still be UTF-8, let it work normally
		// for interop (WriteConsoleW would fail here anyway)
		import std.stdio;
		stdout.writeln(s);
	}
}

version(Posix) {
	import std.stdio : readln, writeln; // these are OK there
}

// and now this code will just work
void main() {
	writeln("\&euro; euros"); // \&euro; is D's named character entity for the euro sign
	auto r = readln(); // try using alt+ codes here too
	writeln(r);
}
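The only platform-independent part of the sample above is the UTF-8/UTF-16 conversion, so it can be pulled out and checked without a Windows console. This is a minimal sketch, not part of the original code: `toConsoleLine` and `fromConsoleLine` are hypothetical names, and both use the same `foreach` transcoding trick as the `writeln` above (the `readln` above uses WideCharToMultiByte instead).

```d
// Hypothetical helper (not from the original post): converts a UTF-8 line
// into the UTF-16 code units WriteConsoleW expects and appends "\r\n".
wstring toConsoleLine(string s)
{
	wchar[] result;
	// A wchar loop variable makes D decode the UTF-8 input and re-encode it
	// as UTF-16, emitting surrogate pairs for characters outside the BMP.
	foreach (wchar c; s)
		result ~= c;
	result ~= "\r\n"w;
	return result.idup;
}

// Hypothetical helper: converts UTF-16 console input back to UTF-8 and
// drops a trailing "\n" or "\r\n", like the readln sample does.
string fromConsoleLine(const(wchar)[] input)
{
	char[] result;
	// Same trick in the other direction: a char loop variable re-encodes
	// each decoded UTF-16 character as UTF-8.
	foreach (char c; input)
		result ~= c;
	if (result.length && result[$ - 1] == '\n')
		result = result[0 .. $ - 1];
	if (result.length && result[$ - 1] == '\r')
		result = result[0 .. $ - 1];
	return result.idup;
}

void main()
{
	import std.stdio : writeln;

	enum line = "\u20AC euros";
	auto wide = toConsoleLine(line);
	// On Windows, `wide` is what you would hand to WriteConsoleW.
	assert(wide.length == 9); // 7 characters plus "\r\n"
	assert(fromConsoleLine(wide) == line);
	writeln("round trip ok");
}
```

A character outside the Basic Multilingual Plane, such as U+1F600, becomes two UTF-16 code units (a surrogate pair), so a buffer sized in wchars can hold fewer characters than its length suggests.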

I actually suggest Phobos do something like this internally, since it is a little more sane for users. My terminal.d uses these wide char functions (though it ALSO sets the UTF-8 code page; I think I am going to remove that in my next commit, since it is unnecessary).

Tab completion

Since Windows Vista, ReadConsoleW also has an interesting, poorly documented feature: some built-in support for tab completion. (One of the top search results for it is now a Stack Overflow answer I wrote myself about 2 1/2 years ago: https://stackoverflow.com/a/43836992/1457000 )

ReadConsoleW also uses Windows' built-in command history. cmd.exe uses this, and your programs can too. You can also configure the history, see: https://docs.microsoft.com/en-us/windows/console/setconsolehistoryinfo

Anyway, when using these features you should also do an isConsole check, but I'll leave that out of the next sample program for brevity.

import core.sys.windows.windows;

// druntime's Windows bindings don't include
// this struct, but it is the one used from MSDN
// see: https://docs.microsoft.com/en-us/windows/console/console-readconsole-control
struct CONSOLE_READCONSOLE_CONTROL {
	ULONG nLength;
	ULONG nInitialChars;
	ULONG dwCtrlWakeupMask;
	ULONG dwControlKeyState;
}


// see the sample above for more correct code with pipe detection,
// error handling, charset conversion, and buffer reuse. I am
// simplifying this to focus on the one special parameter.
void main() {
	wchar[2048] buffer;
	DWORD chars;
	CONSOLE_READCONSOLE_CONTROL crc;

	// abstracting into a nested function to reuse later
	void readAgain(int keep) {
		// this is required by the API, it is basically the Win32
		// api's way to handle backward compatibility and polymorphism
		crc.nLength = CONSOLE_READCONSOLE_CONTROL.sizeof;

		// this tells how many chars in the buffer you want Windows to
		// preserve. I'll come back to this.
		crc.nInitialChars = keep;

		// this is the most interesting parameter: it tells which keys
		// are going to trigger the return. It is a 32 bit mask, thus can
		// only do ASCII characters 1-31 (inclusive). You can NOT wake
		// up on spacebar or other keys.
		//
		// Like with other bitmasks, you add an item to it by oring on
		// 1 << bit for the thing. So `1 << '\t'` adds tab to the list.
		// 1 << 4 adds ctrl+d to the list (generally, ctrl+X is represented
		// by the ASCII character matching the alphabetical order:
		// ctrl+a = 1, ctrl+b = 2, ctrl+c = 3, etc.)
		crc.dwCtrlWakeupMask |= 1 << '\t'; // add tab
		crc.dwCtrlWakeupMask |= 1 << 4; // add ctrl+d
		// so now we'll wake up on tab OR on ctrl+d

		// the last item, dwControlKeyState is actually set BY the function, so
		// we will leave it at 0.
		crc.dwControlKeyState = 0;

		chars = cast(DWORD) (buffer.length - keep);

		ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), buffer.ptr, chars, &chars, &crc);

		// after the call, the function populates two members: the `chars`
		// variable contains the amount of the buffer used, and
		// `crc.dwControlKeyState` tells some of the user's keyboard state when the
		// function returned, if it returned early (pressing enter does NOT set this!).

		// Now, the buffer will have the content of what the user typed... with the
		// character at the current cursor position REPLACED with the completion key
		// the user pressed.

		auto input = buffer[0 .. chars]; // slice it to what we actually got

		// for example to test the control key state, though it isn't that useful IMO
		// maybe shift+tab could be something else
		if(crc.dwControlKeyState & SHIFT_PRESSED)
			{}

		// I don't know if there is any other way to detect this other than
		// a scan.. if(crc.dwControlKeyState) may work - it is definitely zero
		// if not early returned, but it may also be zero if it is early return,
		// but the user had numlock turned off, didn't hit shift, etc. too.
		//
		// so let's linear scan and see if we got an early return

		foreach(idx, ch; input) {
			if(ch == '\t') {
				// tab complete requested here

				// so we should fill in from this char up with the completion, then everything after it
				// already in the buffer is likely to be discarded, though you can move it if you want to keep
				// it. Note that the internal cursor will be at the end of the input no matter what.
				auto addition = "tab-completed"w;
				buffer[idx .. idx + addition.length] = addition[];
				// after you copy your completion into the buffer, you will want to update the
				// screen display for the user - if completing from the cursor, you can write out the
				// new text, plus some spaces to clear out previous input to avoid confusing the user.
				WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), buffer.ptr + idx, cast(DWORD) addition.length, null, null);
				// get where the cursor is now...
				if(chars > idx + addition.length) {
					// there was other stuff before we need to erase...

					// fill with spaces
					buffer[idx + addition.length .. chars] = ' ';

					// get current cursor position
					CONSOLE_SCREEN_BUFFER_INFO sbi;
					GetConsoleScreenBufferInfo(GetStdHandle(STD_OUTPUT_HANDLE), &sbi);

					DWORD actuallyWritten;
					// write some spaces to erase the old stuff
					// at the current position, without moving the cursor again
					WriteConsoleOutputCharacterW(GetStdHandle(STD_OUTPUT_HANDLE),
						// where the spaces are
						buffer.ptr + idx + addition.length,
						// how many spaces there are (cast needed on 64 bit,
						// where idx is a 64 bit size_t)
						cast(DWORD) (chars - idx - addition.length),
						// write at the current cursor position
						sbi.dwCursorPosition,
						// it returns chars actually written, we don't
						// care but it will access violation if not here
						&actuallyWritten);

					// it might be better btw to clear the whole line rather than write spaces but meh
				}
				// keep the stuff up to the index plus the addition, and read again!
				readAgain(cast(int) (idx + addition.length));
				break;
			} else if(ch == 4) {
				// ctrl+d done here
				// let's just truncate here, though it could possibly complete some way too
				chars = cast(DWORD) idx;
				// stop scanning: anything past idx is no longer part of the input
				break;
			}
		}

	}

	// so let's call it with an initial empty buffer, hence 0
	readAgain(0);

	auto msg = "\nThanks for typing! You said:\n\n"w;
	WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), msg.ptr, cast(DWORD) msg.length, null, null);
	WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), buffer.ptr, chars, null /* chars written */, null /* reserved */);
}
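The wake-up mask arithmetic and the linear scan for the completion key do not depend on the console, so they can be factored out and tested anywhere. Here is a small sketch assuming hypothetical helpers `ctrlKey`, `wakeupMask` and `findWakeupKey`; none of these are Windows or druntime APIs, and the value `wakeupMask` returns is simply what you would assign to `crc.dwCtrlWakeupMask`.

```d
// Hypothetical helper: the control code that ctrl+<letter> produces,
// following the ctrl+a = 1, ctrl+b = 2, ... rule from the comments above.
ubyte ctrlKey(char letter)
{
	assert(letter >= 'a' && letter <= 'z', "expected a lowercase letter");
	return cast(ubyte) (letter - 'a' + 1);
}

// Hypothetical helper: builds a dwCtrlWakeupMask value by setting bit k
// for each key code k. Only codes below 32 fit in the 32-bit mask.
uint wakeupMask(const(uint)[] keys...)
{
	uint mask = 0;
	foreach (k; keys) {
		assert(k < 32, "only control characters can wake up ReadConsoleW");
		mask |= 1u << k;
	}
	return mask;
}

// Hypothetical helper: the linear scan from the sample, generalized.
// Returns the index of the first character whose bit is set in `mask`
// (the key that ended the read early), or -1 if there is none, which
// means the user pressed enter instead.
ptrdiff_t findWakeupKey(const(wchar)[] input, uint mask)
{
	foreach (i, ch; input)
		if (ch < 32 && (mask & (1u << ch)))
			return cast(ptrdiff_t) i;
	return -1;
}

void main()
{
	import std.stdio : writeln;

	// wake up on tab or ctrl+d, as in the sample above
	immutable mask = wakeupMask('\t', ctrlKey('d'));
	writeln(mask); // 528
	writeln(findWakeupKey("cd Doc\t"w, mask)); // 6
}
```

With tab (code 9) and ctrl+d (code 4), the mask is `(1 << 9) | (1 << 4)`, which is 528.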

Windows' built-in line editing, history, and completion are a little awkward on the API side as well as on the user side, especially if you are used to things like GNU readline, but they are functional and can be driven with relatively little code. There's very close to zero documentation for this on the 'net, so hopefully you find this useful.

Or, of course, you can use my arsd.terminal, which has a built-in getline function that is even easier to use :)

But, to be fair, the Windows function currently works better with some forms of input. I still need to fix my library's handling of alt+nnnn input and double-wide Unicode characters. Such things work automatically on Windows when using the native API.