Re: [Tinycc-devel] Optimizing for avx512

Single-pass compilers can still do peephole optimization. I once demonstrated this with the case where tcc generates

MOV EAX,const

MOV [location],EAX

and caught this case to generate instead

MOV DWORD [location],const
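
A minimal sketch of such a peephole rule in C. The `insn` structure, the opcode names and `peephole()` are hypothetical and are not tcc internals. The sketch also assumes the register is not read again after the store; a real implementation would have to check that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini instruction set, for illustration only. */
enum op { MOV_REG_IMM, MOV_MEM_REG, MOV_MEM_IMM, OTHER };

struct insn {
    enum op op;
    int reg;   /* register number (0 = EAX), -1 if unused */
    int mem;   /* id of the memory location, -1 if unused */
    int imm;   /* constant, 0 if unused */
};

/*
 * Rewrites code[0..n-1] in place and returns the new instruction count.
 * The pair
 *     MOV reg,const
 *     MOV [location],reg
 * becomes
 *     MOV DWORD [location],const
 * ASSUMPTION: reg is dead after the store (not checked here).
 */
static int peephole(struct insn *code, int n)
{
    int out = 0;

    assert(code != NULL && n >= 0);

    for (int i = 0; i < n; i++) {
        if (i + 1 < n &&
            code[i].op == MOV_REG_IMM &&
            code[i + 1].op == MOV_MEM_REG &&
            code[i].reg == code[i + 1].reg) {
            struct insn fused = { MOV_MEM_IMM, -1, code[i + 1].mem, code[i].imm };
            code[out++] = fused;
            i++;                      /* skip the store we just absorbed */
        } else {
            code[out++] = code[i];    /* copy everything else unchanged */
        }
    }
    return out;
}
```

The rule only looks at a two-instruction window, so it fits a single-pass code generator: the pair can be matched while the second instruction is being emitted.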

Back to the topic: AVX-512 is difficult to support at the language level. In the case of tcc it is impossible, because converting array operations into AVX-512 instructions requires a global view of the program.

However, it is possible at the library level. Simply write the matrix manipulation functions in assembly language. Inside each function, load the data into AVX registers, do the calculations in registers, and write the result back to memory on exit.
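
A rough sketch of the library-level idea. It uses compiler intrinsics rather than hand-written assembly, so this file would be compiled with GCC or Clang and only linked into the tcc-built program. The function name `vec_add_f32` is hypothetical, and the scalar fallback is there for CPUs without AVX-512.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

/* Built for AVX-512F regardless of the global -m flags (GCC/Clang). */
__attribute__((target("avx512f")))
static void add_f32_avx512(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Load 16 floats per operand into 512-bit registers, add, store. */
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(dst + i, _mm512_add_ps(va, vb));
    }
    /* Remaining elements that do not fill a whole register. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
#endif

static void add_f32_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Hypothetical library entry point: dst[i] = a[i] + b[i]. */
void vec_add_f32(float *dst, const float *a, const float *b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));

#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx512f")) {
        add_f32_avx512(dst, a, b, n);
        return;
    }
#endif
    add_f32_scalar(dst, a, b, n);
}
```

The caller only sees an ordinary C function, so the code compiled by tcc does not need to know anything about AVX-512.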

On Sat, Feb 5, 2022 at 4:55 PM Christian Jullien <eligis@orange.fr> wrote:

An optimizing compiler needs several passes to operate:
- constant folding
- register allocation
- peephole optimization
- branch prediction
...

When it knows the target, it can reorganize code to keep as much data as possible in L1 cache and to have the longest series of instructions that can be executed without breaking the pipeline, i.e. instructions run nearly in parallel.

Tcc, which is a one-pass compiler, definitely loses on this point. On the other hand, one pass makes it damn fast and that's why we love it.

We can't have the butter and the money for the butter.

-----Original Message-----
From: rempas@tutanota.com [mailto:rempas@tutanota.com]
Sent: Saturday, February 05, 2022 16:10
To: Jullien; Tinycc Devel
Cc: Tinycc Devel
Subject: Re: [Tinycc-devel] Optimizing for avx512

5 Feb 2022, 11:01 from eligis@orange.fr:

> The price to pay for its really fast compilation is that the generated code is poor compared to gcc, clang or vc++ (among others). Depending on your program, expect it to be roughly 2 to 4x slower.
>
I would say that this is not always the case. Correct me if I'm wrong, but aren't most optimizations (with a few exceptions) needed because the programmer wrote bad code and the compiler found better instructions to do the same thing? Inline assembly exists, after all, so if you really care about performance you should probably use it in the most critical algorithms/functions. I've seen some code run equally fast with TCC and GCC, so I suppose optimization doesn't always work magic. Or you may get a 5% improvement or even less. In any case, I would suggest using both: TCC for most of the code, and GCC/Clang for the critical parts that benefit hugely from the optimizations these compilers can do.

But of course just my opinion on the topic.

_______________________________________________
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel

From:	Samir Ribić
Subject:	Re: [Tinycc-devel] Optimizing for avx512
Date:	Sat, 5 Feb 2022 17:13:40 +0100