The 71st meeting of WG14 has concluded. It took place in Minneapolis at the start of October 2024. So what are some of the highlights?
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3366.htm
One of my favourites is JeanHeyd Meneide's "Restartable Functions for Efficient Character Conversions". Basically, C used to have two character types, which meant two types of strings: char and wchar_t, and the encoding of their strings was not specified. New character types were added: char8_t, char16_t, and char32_t, whose strings are mandated to be encoded as UTF-8, UTF-16, and UTF-32 respectively.
Unfortunately, functions did not exist to convert between char* strings and the utf-* strings, nor from the wchar_t strings. There are functions that convert individual characters though, so that works, right? ...right? No. Not at all.
There are no such functions to convert a wchar_t to char32_t without first going via char, the encoding of which is locale-dependent. Along comes N3366, and gives us a load of functions that look like this:
stdc_mcerr stdc_mcsnrtoc32sn(
    size_t *restrict output_size, char32_t *restrict *restrict output,
    size_t *restrict input_size, const char *restrict *restrict input,
    mbstate_t *restrict state);
This is not as scary as it first appears! restrict basically means "If I can get to something through this pointer, then I will not access that thing in any other way". Skipping that, we end up with:
stdc_mcerr stdc_mcsnrtoc32sn(
    size_t* output_size, char32_t** output,
    size_t* input_size, const char** input,
    mbstate_t* state);
The name breaks down like so:
stdc_mcerr - error type (or success!)
stdc_ - (the new standard prefix)
mcs - a char string
n - of the specified size
r - restartable (some encodings require state to be maintained between calls)
to - to
c32s - a char32_t string
n - of the specified size
Still not the prettiest, but it is powerful. The pointers are well-justified, and I would do JeanHeyd a disservice by attempting to simplify his words (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3366.htm#design).
These functions are available from and to each character (and string) type, and an implementation is available here: https://ztdcuneicode.readthedocs.io/en/latest/
Thanks JeanHeyd!
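For contrast, this is roughly what the character-at-a-time route via char looks like today, using C11's mbrtoc32 from <uchar.h>. It is only a minimal sketch: mbs_to_c32s_sketch is a made-up name, the input is interpreted in whatever encoding the current locale uses, and every failure is collapsed into a single error value.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Converts a NUL-terminated multibyte string (in the current locale's
 * encoding) into at most out_cap char32_t values, one character at a time.
 * Returns the number of code points written, or (size_t)-1 on invalid
 * input or if the output buffer is too small.
 * mbs_to_c32s_sketch is a hypothetical helper, not part of any standard. */
size_t mbs_to_c32s_sketch(const char *in, char32_t *out, size_t out_cap)
{
    mbstate_t state;
    memset(&state, 0, sizeof state); /* start in the initial conversion state */

    size_t remaining = strlen(in);
    size_t written = 0;

    while (remaining > 0)
    {
        char32_t c;
        size_t used = mbrtoc32(&c, in, remaining, &state);

        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1; /* invalid or truncated sequence */
        if (used == 0)
            break;             /* a null character was converted */
        if (written == out_cap)
            return (size_t)-1; /* no room left in the output */

        out[written++] = c;

        if (used == (size_t)-3)
            continue;          /* output from earlier input; nothing consumed */

        in += used;
        remaining -= used;
    }
    return written;
}
```

The caller has to manage the state object, the loop, and the buffer bookkeeping by hand, which is the sort of boilerplate the sized, restartable string functions are meant to absorb.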
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3322.pdf
Zero-length operations on null pointers are now officially okay (for a lot of functions). Up to now, memcpy(dest, nullptr, count) has been undefined behaviour, no matter the value of count. But if you're not copying anything, what's the problem if one or both of the other arguments are nullptr? Intuitively, nothing should be touching them unless count != 0. Though it is technically undefined behaviour, it's actually very common to rely on it. With one paper, Aaron Ballman removed a huge swathe of undefined behaviour, and simultaneously made the language more intuitive.
Thanks Aaron!
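As a rough illustration of the guard that strictly portable code has needed until now, here is a small sketch. append_bytes is a hypothetical helper, not something from the paper.

```c
#include <stddef.h>
#include <string.h>

/* Appends n bytes from src to buf, which already holds *len bytes.
 * src may be a null pointer when n is 0 (for example, an empty buffer
 * that was never allocated). append_bytes is a hypothetical helper. */
void append_bytes(unsigned char *buf, size_t *len,
                  const unsigned char *src, size_t n)
{
    /* Before N3322, memcpy with a null src is undefined behaviour even
     * when n == 0, so strictly portable code needs this check. */
    if (n != 0)
        memcpy(buf + *len, src, n);
    *len += n;
}
```

Once the zero-length case is defined, the if becomes redundant and the call to memcpy can stand on its own.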
abs? uabs! - N3349
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3349.pdf
abs(...) and friends can trigger undefined behaviour if called with INT_MIN. Why? Because abs returns a signed type, which cannot store -INT_MIN.
Lenard Mollenkopf and Anton Zellerhoff: "hey, this is UB, how about we make uabs and friends, that return an unsigned type?"
WG14: "sure!"
Overall, this is a small, easy-to-understand paper that Makes Things Better(TM).
Thanks Lenard and Anton!
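To see why an unsigned return type sidesteps the problem, here is a minimal sketch of the idea. uabs_sketch is a made-up name and is not the function specified in N3349; it assumes a two's complement int, which C23 already requires.

```c
#include <limits.h>

/* Absolute value of x as an unsigned int, defined for every int value,
 * including INT_MIN. uabs_sketch is a hypothetical name. */
unsigned int uabs_sketch(int x)
{
    /* Converting to unsigned first means the negation wraps modulo
     * UINT_MAX + 1 instead of overflowing a signed int. */
    unsigned int ux = (unsigned int)x;
    return x < 0 ? 0u - ux : ux;
}
```

The magnitude of INT_MIN is INT_MAX + 1, which does not fit in an int but does fit in an unsigned int.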
if and switch - N3356
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3356.htm
Following C++'s practice, it is now possible to declare a variable in an if clause, which is in-scope for both the if... and the else... blocks. My experience with this in C++ is that it's a nice feature to have, and can avoid braces. This also applies to switch statements, which can include a declaration such as switch(int cheese = select_cheese_type())...
Instead of:
int scones_to_eat = get_hunger();
if (scones_to_eat < 3)
{
    nom_nom_nom(scones_to_eat);
}
else
{
    share_scones(scones_to_eat);
}
we can now write:
if (int scones_to_eat = get_hunger(); scones_to_eat < 3)
{
    nom_nom_nom(scones_to_eat);
}
else
{
    share_scones(scones_to_eat);
}
and avoid introducing scones_to_eat into the containing scope. It's another small, easy-to-understand change that increases consistency with C++.
Thanks Alex!
case 1 ... 3 - N3370
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3370.htm
You know how in C23, within a switch(...) statement, the case statements can only have a single integer label? Not any more! They can now include ranges! Before, you could kind of, maybe, possibly have a single range by using the default label and specifying every other option, but that's...not ideal. Now, instead of:
switch(int scones_to_eat = get_hunger())
{
    case 0:
    case 1:
    case 2:
        nom_nom_nom(scones_to_eat);
        break;
    case 3:
    case 4:
    case 5:
    case 6:
    case 7:
        stash_scones(scones_to_eat);
        break;
    default:
        share_scones(scones_to_eat);
        break;
}
we can write:
switch(int scones_to_eat = get_hunger())
{
    case 0 ... 2:
        nom_nom_nom(scones_to_eat);
        break;
    case 3 ... 7:
        stash_scones(scones_to_eat);
        break;
    default:
        share_scones(scones_to_eat);
        break;
}
which I think is much nicer :)
Thanks again Alex!
_Lengthof(an_array) - N3369
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3369.pdf
WG14 loves standardising existing practice, and there are few macros more common than the one that gives the number of elements in a (non variable-length) array.
You've probably written it yourself: #define LENGTH_OF_ARRAY(x) (sizeof(x) / sizeof(x[0])). Well now it's going to be provided for you by the standard:
#define LENGTH_OF_ARRAY(x) (sizeof(x) / sizeof(x[0]))
int scones_per_shop[] = {3, 4, 5, 10, 0, 0};
int number_of_shops = LENGTH_OF_ARRAY(scones_per_shop);
can now be reduced to:
int scones_per_shop[] = {3, 4, 5, 10, 0, 0};
int number_of_shops = _Lengthof(scones_per_shop);
Which has the nice bonus of not needing to include a header with that macro everywhere you want to use it. There was some discussion over the name, and various ones were suggested (with variations on capitalisation/contraction):
nelems(scones_per_shop)
nelementsof(scones_per_shop)
lengthof(scones_per_shop)
dimensionof(scones_per_shop)
But the committee eventually went with lengthof ('uglified' to _Lengthof to avoid breaking anything, since names starting with an underscore followed by a capital letter are reserved by the standard for its own use).
I think this was the right decision: dimensionof would have been misleading, because whilst C does not strictly speaking have 2D arrays, not many people would look at arr[10][10] and give the answer 10. _Lengthof seems like the best choice to me - I associate size with a number of bytes, and length with something higher-level and more semantic (e.g. the number of characters in a string, or the number of items in an array). Chris Bazley did some investigating within ARM and found that lengthof (or some variant thereof) was by far the most popular, and I think this would hold true outside ARM too.
Thanks Alejandro and Chris!
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3355.htm
Basically, by putting a label directly in front of a for loop, you can use labelled break and continue statements to break out of that loop specifically, even from within a switch(...)! Sure, this was possible with goto, but there is a large amount of animosity towards using goto, and these are more restricted in what they can do.
search_fridge:
for (int shelf = 0; shelf < top_shelf; ++shelf)
{
    for (int i = 0; i < shelf_width; ++i)
    {
        if (fridge[shelf][i] == Gouda)
            break search_fridge;
        printf("This isn't the cheese I want!\n");
    }
}
Thanks yet again, Alex!
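For comparison, this is a sketch of how the same early exit is usually written today with goto. The cheese enum, the fridge dimensions, and find_gouda are all illustrative names of my own.

```c
#include <stdbool.h>
#include <stddef.h>

enum cheese { Cheddar, Brie, Gouda };

#define SHELVES 3
#define SHELF_WIDTH 4

/* Looks for Gouda in the fridge. On success, stores its position in
 * *shelf_out and *slot_out and returns true; otherwise returns false.
 * find_gouda and the types above are hypothetical. */
bool find_gouda(enum cheese fridge[SHELVES][SHELF_WIDTH],
                size_t *shelf_out, size_t *slot_out)
{
    bool found = false;
    size_t shelf = 0;
    size_t i = 0;

    for (shelf = 0; shelf < SHELVES; ++shelf)
    {
        for (i = 0; i < SHELF_WIDTH; ++i)
        {
            if (fridge[shelf][i] == Gouda)
            {
                found = true;
                goto done_searching; /* leaves both loops at once */
            }
        }
    }

done_searching:
    if (found)
    {
        *shelf_out = shelf;
        *slot_out = i;
    }
    return found;
}
```

The label here sits after the loops and can be jumped to from anywhere in the function, whereas a labelled break names the loop itself and can only leave it.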
Some of the accepted papers were conveniences or syntactic sugar, reducing the friction for those who are newcomers to C, and others were significant features, such as the character conversion functions.
A lot of undefined behaviour was removed, making the language more predictable and intuitive, which is always helpful. The committee acknowledges the challenges posed by more modern languages, such as Rust, and though C will remain C, work is being done to improve safety and security.
These changes weren't all passed without friction or disagreement, but they got there in the end, and I think C2y will be a major upgrade to C23 - I'm already looking forward to being able to use it.