From: Gavin Smith
Subject: Re: using size_t for search bindings in info reader bug hidden versus underflow
Date: Tue, 8 Oct 2024 17:05:59 +0100

On Tue, Oct 08, 2024 at 01:46:56AM +0200, Patrice Dumas wrote:
> Hello,
>
> In the info reader, as part of an effort to avoid comparison of signed
> and unsigned integers, and also to have a clearer code, I am considering
> setting SEARCH_BINDING start and end offsets to size_t instead of long.
> Indeed, this should be a bug if they are negative (although there were
> places in the code where they could become negative temporarily, before
> being reset to 0 right after, which I modified).
As Eli said, unsigned types in C can be dangerous. I have spent hours
in the past on several occasions trying to debug programs that misused
unsigned types, including the info reader. Having suffered this, I
object to making this change on the basis of so-called clarity or
correctness.

Simply blindly changing long to size_t throughout a program is unlikely
to be correct. I remember that some types were changed to size_t in the
info reader before I started working on it, but as a result some checks
no longer worked, such as a check that a value was greater than or equal
to 0, which an unsigned value always satisfies due to language semantics.

In C, if an integer of unsigned type is added to or subtracted from an
integer of signed type, then the integer of signed type is converted
to the unsigned type before the operation, and the result is of the
unsigned type. (Although I think that is not quite the full story:
if one of the integers is of type "unsigned short int" and "short int"
is a narrower type than "int", then the "unsigned short int" is converted
to "int" instead, that is, conversion in the other direction, from
unsigned to signed. The exact rules are difficult to remember if you
don't use them every day. If all the involved types are signed you
don't really need to worry about it.)
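
To make the two cases above concrete, here is a minimal sketch. The function names are made up for illustration, and the second result assumes a platform where "short int" is narrower than "int", as is usual:

```c
#include <limits.h>

/* Hypothetical helpers, for illustration only. */

/* "int" and "unsigned int" have the same rank, so the int operand
   is converted to unsigned int before the addition and the result
   wraps modulo UINT_MAX + 1. */
unsigned int mixed_sum(void)
{
    int s = -1;
    unsigned int u = 0;
    return u + s;               /* yields UINT_MAX, not -1 */
}

/* Both "unsigned short" operands are promoted to "int" first when
   int can represent every unsigned short value, so the subtraction
   is done in signed arithmetic. */
int promoted_difference(void)
{
    unsigned short a = 1, b = 2;
    return a - b;               /* -1 where short is narrower than int */
}
```

So the same-looking expression can give a huge positive value or a small negative one depending only on the widths of the types involved.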
This means that comparisons involving unsigned types do not obey basic
algebraic properties such as "a > b" being equivalent to "a - b > 0".
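
A small sketch of how that equivalence breaks with size_t operands (the function names are hypothetical, chosen only for this example):

```c
#include <stddef.h>

/* Direct comparison: true only when a really is larger than b. */
int greater_direct(size_t a, size_t b)
{
    return a > b;
}

/* Comparison via the difference: when a < b the subtraction wraps
   around to a large positive value, so this is true whenever
   a != b, not only when a > b. */
int greater_by_difference(size_t a, size_t b)
{
    return a - b > 0;
}
```

With a = 2 and b = 5, greater_direct gives 0 while greater_by_difference gives 1. With signed operands, and no overflow, the two would agree.
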
Although I don't remember now the exact reason that unsigned types
caused a problem in "info", you can imagine a situation where they
would if searching in a range of offsets centred at a location, such
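as looking for a node around the value in the tags table. Pseudocode:

    unsigned node_location;
    unsigned min, max;

    min = node_location - 1000;   /* wraps if node_location < 1000 */
    if (min < 0)                  /* never true: min is unsigned */
      min = 0;
    max = node_location + 1000;
    if (max > document_end)
      max = document_end;

The check here that min < 0 will not work: min is unsigned, so the
subtraction wraps around to a very large value instead of going
negative. Why should we need to worry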
about this? Basically unsigned types always need to be treated with
care. They may appear to work correctly most of the time but could
easily lead to bugs. (Negative numbers were invented for a reason and
there is nothing wrong with using them.)
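
For comparison, a sketch of the lower-bound computation written both ways. The names and the "radius" parameter are invented for this example and are not taken from the info reader source:

```c
/* Signed offsets: the obvious check works as written. */
long window_start_signed(long node_location, long radius)
{
    long min = node_location - radius;
    if (min < 0)
        min = 0;
    return min;
}

/* Unsigned offsets: the difference can never be negative, so the
   comparison has to be made before subtracting. */
unsigned long window_start_unsigned(unsigned long node_location,
                                    unsigned long radius)
{
    return node_location > radius ? node_location - radius : 0;
}
```

Both return 0 for a node at offset 300 with a radius of 1000, but the unsigned version only does so because the order of the test and the subtraction was rearranged by hand, which is exactly the kind of care that is easy to forget.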