bug-gnu-emacs

bug#41615: [feature/native-comp] Dump prettier C code.


From: Nicolas Bértolo
Subject: bug#41615: [feature/native-comp] Dump prettier C code.
Date: Sun, 31 May 2020 14:26:46 -0300

> I believe bzero is unnecessary given these are statically allocated.

Ok with me.

> For memcpy we can just use the standard library implementation, given
> that elns are linked to it.  The other advantage is that this way (here
> at least) memcpy is not inlined even at speed 3, so we don't run into
> the optimizer issue!

This is good!

> All in all, it is even a little faster than the stock patch and closer
> to the one with the specific GCC blob support.

Good.

> Let me know if you like the attached and if it does the job for you too.

I like it. I see calls to memcpy even with -O3, which is great.
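
For reference, a minimal sketch of the two ways of filling the static
data discussed in this thread: a hand-written copy loop in the spirit of
naive_memcpy(), and a plain call to libc's memcpy(). The names blob,
static_data, init_naive and init_libc are hypothetical and are not taken
from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical constant blob; the strings emitted for a real eln are
   assumed to be much larger.  */
static const char blob[] = "example relocation data";

/* Statically allocated, so it starts out zeroed without any bzero call.  */
static char static_data[sizeof blob];

/* Hand-written byte copy loop, standing in for the naive_memcpy()
   mentioned in the thread (the real definition may differ).  */
static void *
naive_memcpy (void *dest, const void *src, size_t n)
{
  char *d = dest;
  const char *s = src;
  for (size_t i = 0; i < n; i++)
    d[i] = s[i];
  return dest;
}

/* Variant 1: fill the static data with the hand-written loop.  */
void
init_naive (void)
{
  naive_memcpy (static_data, blob, sizeof blob);
}

/* Variant 2: fill the static data with the standard library memcpy.  */
void
init_libc (void)
{
  memcpy (static_data, blob, sizeof blob);
}
```

Compiling something like this with gcc -O3 -S and reading the assembly
is one way to check whether a call to memcpy survives. With a blob this
small GCC may well expand both variants inline, so the result depends on
the data size and the GCC version.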

Nico

On Sun, 31 May 2020 at 13:57, Andrea Corallo (<akrl@sdf.org>) wrote:
>
> Nicolas Bértolo <nicolasbertolo@gmail.com> writes:
>
> >> I like this considerably less :)
> >
> > Ok, let's say goodbye to this patch.
> >
> >> It introduces quite some complexity and the same advantage in
> >> debuggability can be achieved with something like the attached 8 line
> >> patch (untested).
> >
> > Sounds good, I haven't tested it either.
> >
> >> Generally speaking I want to try to keep our back-end as simple as we
> >> manage to.
> >
> > I initially wrote this patch chasing the reason for slow compile
> > times. I think that a 10k line C file should be compiled much faster
> > than what gccjit achieves. I thought that "uncommon" (for C) ways of
> > doing things were causing gccjit to get stuck trying to optimize them
> > hard, until it gave up. I thought that filling the static data using
> > memcpy() and constant strings would help GCC recognize this as a
> > constant initialization and hopefully just store a completely
> > initialized copy in memory.
> >
> > I found that GCC would inline memcpy() and the static initialization
> > would turn into a very long unrolled loop with SSE instructions. I
> > tested this with -O3 only in gccjit to force maximum optimization. I
> > found this super strange considering that
> > -ftree-loop-distribute-patterns is enabled at -O3 and it should
> > recognize the naive_memcpy() function as an implementation of memcpy()
> > and issue calls to libc's implementation. Instead, it was inlining and
> > unrolling it.
>
> Ok, you confirm the suspicions I wrote about in the other mail!
>
> I've used your patch as a base; apart from minor changes here and
> there, I've stripped out the definitions of bzero and memcpy.
>
> I believe bzero is unnecessary given these are statically allocated.
>
> For memcpy we can just use the standard library implementation, given
> that elns are linked to it.  The other advantage is that this way (here
> at least) memcpy is not inlined even at speed 3, so we don't run into
> the optimizer issue!
>
> All in all, it is even a little faster than the stock patch and closer
> to the one with the specific GCC blob support.
>
> Let me know if you like the attached and if it does the job for you too.
>
> Thanks
>
>   Andrea
>
> --
> akrl@sdf.org




