WG14 Meeting Report

WG14 - 71st meeting

The 71st meeting of WG14 has concluded. It took place in Minneapolis at the start of October 2024. So what are some of the highlights?

Unicode - N3366

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3366.htm
One of my favourites is JeanHeyd Meneide's "Restartable Functions for Efficient Character Conversions". C originally had two character types, and therefore two kinds of string: char and wchar_t, and the encoding of those strings was not specified. Later standards added new character types (char16_t and char32_t in C11, char8_t in C23), whose strings are mandated to be encoded as UTF-8, UTF-16, and UTF-32 respectively.

Unfortunately, there were no standard functions to convert whole strings between char* (or wchar_t*) strings and the UTF-* strings. There are functions that convert individual characters, though (mbrtoc8, mbrtoc16 and mbrtoc32, plus their c8rtomb, c16rtomb and c32rtomb counterparts), so that works, right? ...right? No. Not at all.

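For example, there is no function to convert a wchar_t directly to a char32_t: you have to go via char first, and the encoding of char is locale-dependent, so the conversion fails for any character the current locale cannot represent. Along comes N3366, giving us a whole family of functions that look like this: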

	stdc_mcerr stdc_mcsnrtoc32sn(
		size_t *restrict output_size, char32_t   *restrict *restrict output,
		size_t *restrict input_size,  const char *restrict *restrict input,
		mbstate_t *restrict state);

This is not as scary as it first appears!
First: ignore restrict - it basically means "If I can get to something through this pointer, then I will not access that thing in any other way". Skipping that, we end up with:

	stdc_mcerr stdc_mcsnrtoc32sn(
		size_t* output_size, char32_t**   output,
		size_t* input_size,  const char** input,
		mbstate_t* state);

The return type, stdc_mcerr, is an error code (or success!). The name itself breaks down like so:

stdc_ - (the new standard prefix)
mcs - a char string (in the locale's multibyte encoding)
n - of the specified size (input_size)
r - restartable (some encodings require state to be maintained between calls; that state lives in the mbstate_t)
to - to
c32s - a char32_t string
n - of the specified size (output_size)

Still not the prettiest, but it is powerful. The pointers are well-justified, and I would do JeanHeyd a disservice by attempting to simplify his words (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3366.htm#design).

These functions exist for conversions from and to each character (and string) type, and an implementation is available here: https://ztdcuneicode.readthedocs.io/en/latest/

Thanks JeanHeyd!

Zero-length operations on null pointers - N3322

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3322.pdf
Zero-length operations on null pointers are now officially okay (for a lot of functions). Until now, memcpy(dest, nullptr, count) was undefined behaviour, no matter the value of count. But if you're not copying anything, what's the problem if one or both of the pointer arguments is nullptr? Intuitively, nothing should touch them unless count is nonzero. Although it is technically undefined behaviour, code that relies on it is very common in practice. With one paper, Aaron Ballman removed a huge swathe of undefined behaviour and simultaneously made the language more intuitive.

Thanks Aaron!

`abs`? `uabs`! - N3349

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3349.pdf
abs(...) and friends can trigger undefined behaviour if called with INT_MIN (or LONG_MIN, and so on). Why? Because abs returns a signed type, which cannot store -INT_MIN: with two's complement, -INT_MIN is one larger than INT_MAX.
Lenard Mollenkopf and Anton Zellerhoff: "hey, this is UB, how about we make uabs and friends, that return an unsigned type?"
WG14: "sure!"
Overall, this is a small, easy-to-understand paper that Makes Things Better(TM).

Thanks Lenard and Anton!

Declarations in `if` and `switch` - N3356

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3356.htm
Following C++ (which has had this since C++17), it is now possible to declare a variable in the condition of an if statement, and that variable is in scope in both the if and the else branches. My experience with this in C++ is that it's a nice feature to have, and it avoids the extra pair of braces you would otherwise need to limit the variable's scope. This also applies to switch statements, which can include a declaration such as switch (int cheese = select_cheese_type()). Instead of:

	int scones_to_eat = get_hunger();
	if (scones_to_eat < 3)
	{
		nom_nom_nom(scones_to_eat);
	}
	else
	{
		share_scones(scones_to_eat);
	}

we can now write:

	if (int scones_to_eat = get_hunger(); scones_to_eat < 3)
	{
		nom_nom_nom(scones_to_eat);
	}
	else
	{
		share_scones(scones_to_eat);
	}

and avoid introducing scones_to_eat into the enclosing scope. It's another small, easy-to-understand change that increases consistency with C++.

Thanks Alex!

Case Ranges: `case 1 ... 3` - N3370

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3370.htm
You know how in C23, within a switch(...) statement, each case label can only name a single integer constant? Not any more! They can now include ranges! Before, you could kind of, maybe, possibly cover a single range by using the default label and listing every other value explicitly, but that's...not ideal. Now, instead of:

	switch(int scones_to_eat = get_hunger())
	{
		case 0:
		case 1:
		case 2:
			nom_nom_nom(scones_to_eat);
			break;
		case 3:
		case 4:
		case 5:
		case 6:
		case 7:
			stash_scones(scones_to_eat);
			break;
		default:
			share_scones(scones_to_eat);
			break;
	}

we can write:

	switch(int scones_to_eat = get_hunger())
	{
		case 0 ... 2:
			nom_nom_nom(scones_to_eat);
			break;
		case 3 ... 7:
			stash_scones(scones_to_eat);
			break;
		default:
			share_scones(scones_to_eat);
			break;
	}

which I think is much nicer :)

Thanks again Alex!

`_Lengthof(an_array)` - N3369

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3369.pdf
WG14 loves standardising existing practice, and there are few macros more common than the one that gives the number of elements in a (non variable-length) array.

You've probably written it yourself. Well, now the standard is going to provide it for you, so:

	#define LENGTH_OF_ARRAY(x)  (sizeof(x) / sizeof((x)[0]))

	int scones_per_shop[] = {3, 4, 5, 10, 0, 0};
	size_t number_of_shops = LENGTH_OF_ARRAY(scones_per_shop);

can now be reduced to:

	int scones_per_shop[] = {3, 4, 5, 10, 0, 0};
	size_t number_of_shops = _Lengthof(scones_per_shop);

This has the nice bonus of not needing to include a header with that macro everywhere you want to use it. There was some discussion over the name, and various alternatives were suggested (with variations on capitalisation/contraction):

nelems(scones_per_shop)
nelementsof(scones_per_shop)
lengthof(scones_per_shop)
dimensionof(scones_per_shop)

But the committee eventually went with lengthof ('uglified' to _Lengthof to avoid breaking anything, since names starting with an underscore followed by a capital letter are reserved by the standard for its own use).

I think this was the right decision. dimensionof would have been misleading: whilst C does not, strictly speaking, have 2D arrays, not many people would look at arr[10][10], ask for its dimension, and expect the answer 10. _Lengthof seems like the best choice to me: I associate size with a number of bytes, and length with something higher-level and more semantic (e.g. the number of characters in a string, or the number of items in an array). Chris Bazley did some investigating within ARM and found that lengthof (or some variant thereof) was by far the most popular, and I think this would hold true outside ARM too.

Thanks Alejandro and Chris!

Named loops - N3355

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3355.htm
Basically, by putting a label directly in front of a loop, you can use labelled break and continue statements to break out of, or continue, that loop specifically, even from within a nested loop or a switch(...)! Sure, this was already possible with goto, but there is a lot of animosity towards goto, and labelled break and continue are more restricted in what they can do.

	search_fridge:
	for (int shelf = 0; shelf < top_shelf; ++shelf)
	{
		for (int i = 0; i < shelf_width; ++i)
		{
			if (fridge[shelf][i] == Gouda)
				break search_fridge;
			printf("This isn't the cheese I want!\n");
		}
	}

Thanks yet again, Alex!

Concluding Thoughts

Some of the accepted papers were conveniences or syntactic sugar that reduce friction, particularly for newcomers to C; others were significant features, such as the character conversion functions.

A lot of undefined behaviour was removed, making the language more predictable and intuitive, which is always helpful. The committee acknowledges the challenges posed by more modern languages, such as Rust, and though C will remain C, work is being done to improve safety and security.

Not all of these changes passed without friction or disagreement, but they got there in the end, and I think C2y will be a major upgrade over C23. I'm already looking forward to being able to use it.
