Up-to-date trunk clang has introduced a built-in operator called
_Countof, which is like the 'lenof' macro in this code (returns the
number of elements in a statically-declared array object) but with the
safety advantage that it provokes a compile error if you accidentally
use it on a pointer. In this commit I add a cmake-time check for it,
and conditional on that, switch over the definition of lenof.
This should add a safety check for accidental uses of lenof(pointer).
When I tested it with new clang, this whole code base compiled cleanly
with the new setting, so there aren't currently any such accidents.
clang cites C2y as the source for _Countof: WG14 document N3369
initially proposed it under a different name, and then there was a big
internet survey about naming (in which of course I voted for lenof!),
and document N3469 summarises the results, which show that the name
_Countof and/or countof won. Links:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3369.pdfhttps://www.open-std.org/jtc1/sc22/wg14/www/docs/n3469.htm
My reading of N3469 seems to say that there will _either_ be _Countof
by itself, _or_ lowercase 'countof' as a new keyword, but they don't
say which. They say they _don't_ intend to do the same equivocation we
had with _Complex and _Bool, where you have a _Countof keyword and an
optional header file defining a lowercase non-underscore macro
wrapping it. But there hasn't been a new whole draft published since
N3469 yet, so I don't know what will end up in it when there is.
However, as of now, _Countof exists in at least one compiler, and that
seems like enough reason to implement it here. If it becomes 'countof'
in the real standard, then we can always change over later. (And in
that case it would probably make sense to rename the macro throughout
the code base to align with what will become the new standard usage.)
I'm not sure why these have never bothered me before, but a test build
I just made for a completely different reason complained about them!
findtest() did a binary search using a while loop, and then used
variables set in the loop body, which gcc objected to on the grounds
that the body might have run 0 times and not initialised those
variables. Also in the same function gcc objected to the idea that
findrelpos234() might have returned NULL and not set 'index'. I think
neither of these things can actually have _happened_, but let's stop
the compiler complaining anyway.
I mentioned recently (in commit 9e7d4c53d80b6eb) message that I'm no
longer fond of the variable name 'ret', because it's used in two quite
different contexts: it's the return value from a subroutine you just
called (e.g. 'int ret = read(fd, buf, len);' and then check for error
or EOF), or it's the value you're preparing to return from the
_containing_ routine (maybe by assigning it a default value and then
conditionally modifying it, or by starting at NULL and reallocating,
or setting it just before using the 'goto out' cleanup idiom). In the
past I've occasionally made mistakes by forgetting which meaning the
variable had, or accidentally conflating both uses.
If all else fails, I now prefer 'retd' (short for 'returned') in the
former situation, and 'toret' (obviously, the value 'to return') in
the latter case. But even better is to pick a name that actually says
something more specific about what the thing actually is.
One particular bad habit throughout this codebase is to have a set of
functions that deal with some object type (say 'Foo'), all *but one*
of which take a 'Foo *foo' parameter, but the foo_new() function
starts with 'Foo *ret = snew(Foo)'. If all the rest of them think the
canonical name for the ambient Foo is 'foo', so should foo_new()!
So here's a no-brainer start on cutting down on the uses of 'ret': I
looked for all the cases where it was being assigned the result of an
allocation, and renamed the variable to be a description of the thing
being allocated. In the case of a new() function belonging to a
family, I picked the same name as the rest of the functions in its own
family, for consistency. In other cases I picked something sensible.
One case where it _does_ make sense not to use your usual name for the
variable type is when you're cloning an existing object. In that case,
_neither_ of the Foo objects involved should be called 'foo', because
it's ambiguous! They should be named so you can see which is which. In
the two cases I found here, I've called them 'orig' and 'copy'.
As in the previous refactoring, many thanks to clang-rename for the
help.
I think a lot of these were inserted by a prior run through GNU indent
many years ago. I noticed in a more recent experiment that that tool
doesn't always correctly distinguish which instances of 'id * id' are
pointer variable declarations and which are multiplications, so it
spaces some of the former as if they were the latter.
I found these while going through the code, and decided if we're going
to have them then we should compile them. They didn't all compile
first time, proving my point :-)
I've enhanced the tree234 test so that it has a verbose option, which
by default is off.
Now that the new CMake build system is encouraging us to lay out the
code like a set of libraries, it seems like a good idea to make them
look more _like_ libraries, by putting things into separate modules as
far as possible.
This fixes several previous annoyances in which you had to link
against some object in order to get a function you needed, but that
object also contained other functions you didn't need which included
link-time symbol references you didn't want to have to deal with. The
usual offender was subsidiary supporting programs including misc.c for
some innocuous function and then finding they had to deal with the
requirements of buildinfo().
This big reorganisation introduces three new subdirectories called
'utils', one at the top level and one in each platform subdir. In each
case, the directory contains basically the same files that were
previously placed in the 'utils' build-time library, except that the
ones that were extremely miscellaneous (misc.c, utils.c, uxmisc.c,
winmisc.c, winmiscs.c, winutils.c) have been split up into much
smaller pieces.