[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: JSON/YAML/TOML/etc. parsing performance

From: Eli Zaretskii
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Fri, 06 Oct 2017 10:40:58 +0300

> Cc: address@hidden, address@hidden
> From: Paul Eggert <address@hidden>
> Date: Thu, 5 Oct 2017 18:58:34 -0700
> > AFAIU, ptrdiff_t overflows are the _only_ reason for json.c checks
> > whether a size_t value is too large
> In the most recent patch I proposed, there were only two such checks; 
> there were two other checks for too-large size_t that were fixnum 
> checks, not ptrdiff_t checks.
> However, I can see that you really don't like those checks. So I tweaked 
> the proposed patch to remove them all from json.c. Please see the 
> attached 3 patches (the first is just Philipp's patch rebased to 
> master). The basic idea here is that json.c should be using xmalloc for 
> allocation anyway, for reasons other than size overflow checking. And 
> once it uses the Emacs malloc we can make sure that it never allocates 
> objects that are too large for ptrdiff_t.

Thanks, that's better, although it just pushes the extra checks down
to our memory allocation routines.  It is better because now these
checks are applied to all the places where Lisp objects are created,
instead of requiring higher-level code to include these checks.

> > I'm not arguing for general replacement of ptrdiff_t with size_t, only
> > for doing that in those primitives where negative values are a clear
> > mistake/bug.
> This is exactly where we should be cautious about using unsigned types. 
> The longstanding Emacs style is to prefer signed integers, to avoid the 
> common typos that occur with unsigned. If we start changing the style, 
> these primitives (or people debugging these primitives) often won't be 
> able to distinguish buggy from valid-but-enormous cases.

Those valid-but-enormous values are either invalid (if they are larger
than PTRDIFF_MAX), or easily uncovered by looking at higher call-stack
frames in the debugger, where the actual size of the object being
created is visible.  Yes, it requires some diligence, but then
debugging Emacs is already more complicated than that of C programs
which use only C objects, so I don't see that as a significant

> And when we compile such primitives (or their callers) with
> -fsanitize=undefined, we won't be able to catch integer-overflow
> bugs automatically at runtime, since unsigned integer arithmetic
> silently wraps around even when -fsanitize=undefined is used.

I don't envision many primitives to need this kind of change, so
again, the disadvantage doesn't sound too significant to me.  The
advantage is IMO significant, as doing so will remove the need for
checks that a size_t value doesn't overflow a ptrdiff_t value, so we
will have an overall speedup.  Emacs is accused of being slow, so I
think we should avoid overhead that is only needed in a tiny fraction
of valid use cases.

> I help maintain several GNU programs that use unsigned types for sizes, 
> and find that style to be trickier than the style Emacs uses, with 
> respect to integer-overflow bugs. I've been gradually changing some of 
> the non-Emacs GNU code to use signed types, and the results have been 
> encouraging: the code is more readable and more obviously correct.

I'm not sure that experience is 100% applicable to Emacs, because
Emacs has special needs due to the fact that our integers are narrower
than the corresponding C integral types.

And once again, I'm not arguing for a wholesale switch to size_t, only
for its judicious use in a few primitives that create Lisp objects.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]