libffcall

Re: [Libffcall] Return small structs in registers for powerpc on openbsd


From: Josh Elsasser
Subject: Re: [Libffcall] Return small structs in registers for powerpc on openbsd
Date: Mon, 28 Oct 2019 13:35:14 -0700
User-agent: Mutt/1.5.22 (2013-10-16)

On Thu, Oct 17, 2019 at 07:16:38PM -0700, Josh Elsasser wrote:
> On Thu, Oct 17, 2019 at 11:29:05PM +0200, Bruno Haible wrote:
> > Hi Josh,
> > 
> > Thank you for the patch.
> > It will take me a while to make sure your patch is good, and include it.
> 
> Of course, thanks for taking the time to look.
> 
> > Maybe you can help me with some infos:
> >   - Which OpenBSD version and which qemu-system-ppc or qemu-system-ppc64
> >     options do you recommend for running it in a VM?
> 
> I've never tried that before, I'll poke around and see if I can find a
> working configuration.

I was unable to run openbsd/macppc under qemu, but had some luck with
netbsd 7.1.1 and qemu 2.8. After building qemu with the ppc-softmmu
target and downloading
https://cdn.netbsd.org/pub/NetBSD/NetBSD-7.1.1/images/NetBSD-7.1.1-macppc.iso
I created the disk image and started qemu:

$ qemu-img create -f qcow2 netbsdmacppc.qcow2 10g
$ qemu-system-ppc -m 512 -M g3beige -hda netbsdmacppc.qcow2 \
    -cdrom NetBSD-7.1.1-macppc.iso -prom-env boot-device=cd:,ofwboot.xcf \
    -net nic,model=i82551 -net user

It should boot into the netbsd installer. As per the netbsd install
doc, I chose (S)hell and then partitioned manually: (note C vs. c)

# stty erase '^?'
# pdisk /dev/wd0c
i
C 64 65472 boot Apple_HFS
c 65536 196608 swap b
c 262144 20709376 root a
w
q
# sysinst

Next I followed the menus to install, selecting "Use existing
partitions", installing sets from cdrom, and then configuring the
system. I skipped networking as the installer has no DHCP client.

Then I exited to the shell and finished setting up:

# mount /dev/wd0a /mnt
# chroot /mnt /bin/ksh
# echo dhcp > /etc/ifconfig.fxp0
# dhcpcd -4w
# echo PKG_PATH=http://cdn.netbsd.org/pub/pkgsrc/packages/NetBSD/powerpc/7.1/All > /etc/pkg_install.conf
# pkg_add hfsutils
# export PATH="$PATH:/usr/pkg/bin"
# hformat /dev/wd0d
# hcopy /usr/mdec/ofwboot.xcf :
# humount
# mount /dev/cd0a /mnt
# mv /netbsd /netbsd.orig
# gzip -d < /mnt/macppc/binary/kernel/netbsd-GENERIC_MD.gz > /netbsd
# halt -p

Note that I moved the original kernel out of the way and replaced it
with one containing the embedded installer ramdisk. The normal kernel
hung on boot for me and I worked around it by booting the ramdisk
kernel with -a to force the mountroot prompt. You'll have to enter -a
at the Boot: prompt and wd0a at the "root device" prompt. This is
the command I used to boot post-install:

$ qemu-system-ppc -m 512 -M g3beige -hda netbsdmacppc.qcow2 \
    -prom-env boot-command='boot hd:,ofwboot.xcf -a' \
    -net nic,model=i82551 -net user,hostfwd=tcp::2222-:22

Good luck, hopefully my success is reproducible.

> >   - If gcc 4.1.2 needs patches, to make it compile, maybe a different
> >     version is simpler to use. How about using gcc 4.2.4 instead?
> 
> I'll give 4.2.4 a try.
> 
> >   - Do you have a formal documentation of the "secure plt ABI"?
> 
> This is the reference I was reading:
> 
> https://www.polyomino.org.uk/publications/2011/Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf
> 
> I also found this helpful:
> 
> http://www.sourceware.org/ml/binutils/2005-05/txt00011.txt

I've dropped the secure plt bits from this version of the diff as
they're not needed for netbsd. I'll send another diff after
openbsd/netbsd small struct return is taken care of.

> > > Replace the unused powerpc small-struct-copying code with hppa's,
> > > which copies structs of all sizes.
> > 
> > Oh really? I thought that the hppa code for small structs was so braindead
> > that it would remain the only OS that needs this code. The code for
> > m68k, s390, sparc is simpler.
> 
> The existing code doesn't handle 3, 5, 6, or 7 byte structs. It looks
> like riscv uses the hppa code too. I wonder if the compiler pads
> structs to a full word on those other platforms?
> 
> > Bruno
> 
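
To make the gap concrete, here is a minimal sketch (not libffcall code) of which struct sizes the old powerpc return path copies back versus the patched one. The names old_code_handles, patched_code_handles and struct rgb are made up for illustration, and it assumes a 32-bit target where sizeof(char), sizeof(short), sizeof(int) and 2*sizeof(__avword) are 1, 2, 4 and 8.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: sizes the old return code copies back, i.e.
   sizeof(char), sizeof(short), sizeof(int), 2*sizeof(__avword),
   assumed here to be 1, 2, 4 and 8 bytes. */
static bool old_code_handles(size_t size)
{
  return size == 1 || size == 2 || size == 4 || size == 8;
}

/* Hypothetical helper: sizes the patched return code copies back. */
static bool patched_code_handles(size_t size)
{
  return size > 0 && size <= 8;
}

/* Example of a struct that is commonly 3 bytes long, because its
   members only need byte alignment. A compiler is still allowed to
   pad it, so check sizeof(struct rgb) on the target in question. */
struct rgb { unsigned char r, g, b; };
```

A struct like struct rgb falls through every branch of the old code when its size is 3, which is why the copy-back never happens for it.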

Here's an updated patch, with the secure plt bits removed, netbsd
added to the ifdefs with openbsd, and with a couple obvious mistakes
fixed. I think I've figured out how to build a proper release tarball
now, and have checked that it builds and passes all tests for powerpc
linux and netbsd.

diff --git avcall/avcall-internal.h avcall/avcall-internal.h
index 0357a77..3a8d318 100644
--- avcall/avcall-internal.h
+++ avcall/avcall-internal.h
@@ -191,7 +191,7 @@ typedef int __av_alist_verify[2*(__AV_ALIST_SIZE_BOUND - (int)sizeof(__av_alist)
 #define __av_start_struct3(LIST)  \
   ((LIST).flags |= __AV_REGISTER_STRUCT_RETURN, 0)
 #endif
-#if (defined(__i386__) && !defined(_WIN32)) || defined(__m68k__) || (defined(__powerpc__) && !defined(__powerpc64__)) || (defined(__s390__) && !defined(__s390x__))
+#if (defined(__i386__) && !defined(_WIN32)) || defined(__m68k__) || (defined(__powerpc__) && !defined(__powerpc64__) && !(defined(__OpenBSD__) || defined(__NetBSD__))) || (defined(__s390__) && !defined(__s390x__))
 #define __av_reg_struct_return(LIST,TYPE_SIZE,TYPE_SPLITTABLE)  \
   ((TYPE_SIZE) == 1 || (TYPE_SIZE) == 2 || (TYPE_SIZE) == 4            \
    || ((TYPE_SIZE) == 8 && (TYPE_SPLITTABLE)                           \
@@ -247,6 +247,15 @@ typedef int __av_alist_verify[2*(__AV_ALIST_SIZE_BOUND - (int)sizeof(__av_alist)
 #define __av_start_struct3(LIST)  \
   ((LIST).flags |= __AV_REGISTER_STRUCT_RETURN, 0)
 #endif
+#if defined(__powerpc__) && !defined(__powerpc64__) && (defined(__OpenBSD__) || defined(__NetBSD__))
+#define __av_reg_struct_return(LIST,TYPE_SIZE,TYPE_SPLITTABLE)  \
+  ((TYPE_SIZE) <= 8)
+/* Turn on __AV_REGISTER_STRUCT_RETURN if __AV_SMALL_STRUCT_RETURN was set
+ * and the struct will actually be returned in registers.
+ */
+#define __av_start_struct3(LIST)  \
+  ((LIST).flags |= __AV_REGISTER_STRUCT_RETURN, 0)
+#endif
 #if (defined(__powerpc64__) && !defined(__powerpc64_elfv2__)) || defined(__s390x__)
 #define __av_reg_struct_return(LIST,TYPE_SIZE,TYPE_SPLITTABLE)  \
   0
diff --git avcall/avcall-powerpc.c avcall/avcall-powerpc.c
index 5d1b6f8..17cbde8 100644
--- avcall/avcall-powerpc.c
+++ avcall/avcall-powerpc.c
@@ -196,19 +196,96 @@ avcall_call(av_alist* list)
   } else
   if (l->rtype == __AVstruct) {
     if (l->flags & __AV_REGISTER_STRUCT_RETURN) {
-      if (l->rsize == sizeof(char)) {
-        RETURN(char, i);
-      } else
-      if (l->rsize == sizeof(short)) {
-        RETURN(short, i);
-      } else
-      if (l->rsize == sizeof(int)) {
-        RETURN(int, i);
-      } else
-      if (l->rsize == 2*sizeof(__avword)) {
+      if (l->rsize > 0 && l->rsize <= 8) {
         void* raddr = l->raddr;
-        ((__avword*)raddr)[0] = i;
-        ((__avword*)raddr)[1] = iret2;
+        #if 0 /* Unoptimized */
+        if (l->rsize == 1) {
+          ((unsigned char *)raddr)[0] = (unsigned char)(i);
+        } else
+        if (l->rsize == 2) {
+          ((unsigned char *)raddr)[0] = (unsigned char)(i>>8);
+          ((unsigned char *)raddr)[1] = (unsigned char)(i);
+        } else
+        if (l->rsize == 3) {
+          ((unsigned char *)raddr)[0] = (unsigned char)(i>>16);
+          ((unsigned char *)raddr)[1] = (unsigned char)(i>>8);
+          ((unsigned char *)raddr)[2] = (unsigned char)(i);
+        } else
+        if (l->rsize == 4) {
+          ((unsigned char *)raddr)[0] = (unsigned char)(i>>24);
+          ((unsigned char *)raddr)[1] = (unsigned char)(i>>16);
+          ((unsigned char *)raddr)[2] = (unsigned char)(i>>8);
+          ((unsigned char *)raddr)[3] = (unsigned char)(i);
+        } else
+        if (l->rsize == 5) {
+          ((unsigned char *)raddr)[0] = (unsigned char)(i);
+          ((unsigned char *)raddr)[1] = (unsigned char)(iret2>>24);
+          ((unsigned char *)raddr)[2] = (unsigned char)(iret2>>16);
+          ((unsigned char *)raddr)[3] = (unsigned char)(iret2>>8);
+          ((unsigned char *)raddr)[4] = (unsigned char)(iret2);
+        } else
+        if (l->rsize == 6) {
+          ((unsigned char *)raddr)[0] = (unsigned char)(i>>8);
+          ((unsigned char *)raddr)[1] = (unsigned char)(i);
+          ((unsigned char *)raddr)[2] = (unsigned char)(iret2>>24);
+          ((unsigned char *)raddr)[3] = (unsigned char)(iret2>>16);
+          ((unsigned char *)raddr)[4] = (unsigned char)(iret2>>8);
+          ((unsigned char *)raddr)[5] = (unsigned char)(iret2);
+        } else
+        if (l->rsize == 7) {
+          ((unsigned char *)raddr)[0] = (unsigned char)(i>>16);
+          ((unsigned char *)raddr)[1] = (unsigned char)(i>>8);
+          ((unsigned char *)raddr)[2] = (unsigned char)(i);
+          ((unsigned char *)raddr)[3] = (unsigned char)(iret2>>24);
+          ((unsigned char *)raddr)[4] = (unsigned char)(iret2>>16);
+          ((unsigned char *)raddr)[5] = (unsigned char)(iret2>>8);
+          ((unsigned char *)raddr)[6] = (unsigned char)(iret2);
+        } else
+        if (l->rsize == 8) {
+          ((unsigned char *)raddr)[0] = (unsigned char)(i>>24);
+          ((unsigned char *)raddr)[1] = (unsigned char)(i>>16);
+          ((unsigned char *)raddr)[2] = (unsigned char)(i>>8);
+          ((unsigned char *)raddr)[3] = (unsigned char)(i);
+          ((unsigned char *)raddr)[4] = (unsigned char)(iret2>>24);
+          ((unsigned char *)raddr)[5] = (unsigned char)(iret2>>16);
+          ((unsigned char *)raddr)[6] = (unsigned char)(iret2>>8);
+          ((unsigned char *)raddr)[7] = (unsigned char)(iret2);
+        }
+       #else /* Optimized: fewer conditional jumps, fewer memory accesses */
+        uintptr_t count = l->rsize; /* > 0, ≤ 2*sizeof(__avword) */
+        __avword* wordaddr = (__avword*)((uintptr_t)raddr & ~(uintptr_t)(sizeof(__avword)-1));
+        uintptr_t start_offset = (uintptr_t)raddr & (uintptr_t)(sizeof(__avword)-1); /* ≥ 0, < sizeof(__avword) */
+        uintptr_t end_offset = start_offset + count; /* > 0, < 3*sizeof(__avword) */
+        if (count <= sizeof(__avword)) {
+          /* Use iret. */
+          if (end_offset <= sizeof(__avword)) {
+            /* 0 < end_offset ≤ sizeof(__avword) */
+            __avword mask0 = ((__avword)2 << (sizeof(__avword)*8-start_offset*8-1)) - ((__avword)1 << (sizeof(__avword)*8-end_offset*8));
+            wordaddr[0] ^= (wordaddr[0] ^ (i << (sizeof(__avword)*8-end_offset*8))) & mask0;
+          } else {
+            /* sizeof(__avword) < end_offset < 2*sizeof(__avword), start_offset > 0 */
+            __avword mask0 = ((__avword)2 << (sizeof(__avword)*8-start_offset*8-1)) - 1;
+            __avword mask1 = - ((__avword)1 << (2*sizeof(__avword)*8-end_offset*8));
+            wordaddr[0] ^= (wordaddr[0] ^ (i >> (end_offset*8-sizeof(__avword)*8))) & mask0;
+            wordaddr[1] ^= (wordaddr[1] ^ (i << (2*sizeof(__avword)*8-end_offset*8))) & mask1;
+          }
+        } else {
+          /* Use iret, iret2. */
+          __avword mask0 = ((__avword)2 << (sizeof(__avword)*8-start_offset*8-1)) - 1;
+          if (end_offset <= 2*sizeof(__avword)) {
+            /* sizeof(__avword) < end_offset ≤ 2*sizeof(__avword) */
+            __avword mask1 = - ((__avword)1 << (2*sizeof(__avword)*8-end_offset*8));
+            wordaddr[0] ^= (wordaddr[0] ^ ((i << (2*sizeof(__avword)*8-end_offset*8)) | (iret2 >> (end_offset*4-sizeof(__avword)*4) >> (end_offset*4-sizeof(__avword)*4)))) & mask0;
+            wordaddr[1] ^= (wordaddr[1] ^ (iret2 << (2*sizeof(__avword)*8-end_offset*8))) & mask1;
+          } else {
+            /* 2*sizeof(__avword) < end_offset < 3*sizeof(__avword), start_offset > 0 */
+            __avword mask2 = - ((__avword)1 << (3*sizeof(__avword)*8-end_offset*8));
+            wordaddr[0] ^= (wordaddr[0] ^ (i >> (end_offset*8-2*sizeof(__avword)*8))) & mask0;
+            wordaddr[1] = (i << (3*sizeof(__avword)*8-end_offset*8)) | (iret2 >> (end_offset*8-2*sizeof(__avword)*8));
+            wordaddr[2] ^= (wordaddr[2] ^ (iret2 << (3*sizeof(__avword)*8-end_offset*8))) & mask2;
+          }
+        }
+      #endif
       }
     }
   }
diff --git avcall/avcall.h avcall/avcall.h
index f429dc8..cfd6492 100644
--- avcall/avcall.h
+++ avcall/avcall.h
@@ -140,7 +140,7 @@ enum __AV_alist_flags
 #if defined(__sparc__) && !defined(__sparc64__) && defined(__sun) && (defined(__SUNPRO_C) || defined(__SUNPRO_CC)) /* SUNWspro cc or CC */
                                  __AV_SUNPROCC_STRUCT_RETURN,
 #else
-#if (defined(__i386__) && (defined(_WIN32) || defined(__CYGWIN__) || (defined(__MACH__) && defined(__APPLE__)) || defined(__FreeBSD__) || defined(__DragonFly__) || defined(__OpenBSD__))) || defined(__m68k__) || defined(__mipsn32__) || defined(__mips64__) || defined(__sparc64__) || defined(__hppa__) || defined(__hppa64__) || defined(__arm__) || defined(__armhf__) || defined(__arm64__) || defined(__powerpc64_elfv2__) || defined(__ia64__) || defined(__x86_64__) || defined(__riscv32__) || defined(__riscv64__)
+#if (defined(__i386__) && (defined(_WIN32) || defined(__CYGWIN__) || (defined(__MACH__) && defined(__APPLE__)) || defined(__FreeBSD__) || defined(__DragonFly__) || defined(__OpenBSD__))) || defined(__m68k__) || defined(__mipsn32__) || defined(__mips64__) || defined(__sparc64__) || defined(__hppa__) || defined(__hppa64__) || defined(__arm__) || defined(__armhf__) || defined(__arm64__) || (defined(__powerpc__) && !defined(__powerpc64__) && (defined(__OpenBSD__) || defined(__NetBSD__))) || defined(__powerpc64_elfv2__) || defined(__ia64__) || defined(__x86_64__) || defined(__riscv32__) || defined(__riscv64__)
                                  __AV_SMALL_STRUCT_RETURN |
 #endif
 #if defined(__GNUC__) && !((defined(__mipsn32__) || defined(__mips64__)) && ((__GNUC__ == 3 && __GNUC_MINOR__ >= 4) || (__GNUC__ > 3)))
diff --git callback/vacall_r/vacall_r.h callback/vacall_r/vacall_r.h
index 0e66c2e..b883029 100644
--- callback/vacall_r/vacall_r.h
+++ callback/vacall_r/vacall_r.h
@@ -167,7 +167,7 @@ enum __VA_alist_flags
 #if defined(__sparc__) && !defined(__sparc64__) && defined(__sun) && (defined(__SUNPRO_C) || defined(__SUNPRO_CC)) /* SUNWspro cc or CC */
                                  __VA_SUNPROCC_STRUCT_RETURN,
 #else
-#if (defined(__i386__) && (defined(_WIN32) || defined(__CYGWIN__) || (defined(__MACH__) && defined(__APPLE__)) || defined(__FreeBSD__) || defined(__DragonFly__) || defined(__OpenBSD__))) || defined(__m68k__) || defined(__mipsn32__) || defined(__mips64__) || defined(__sparc64__) || defined(__hppa__) || defined(__hppa64__) || defined(__arm__) || defined(__armhf__) || defined(__arm64__) || defined(__powerpc64_elfv2__) || defined(__ia64__) || defined(__x86_64__) || defined(__riscv32__) || defined(__riscv64__)
+#if (defined(__i386__) && (defined(_WIN32) || defined(__CYGWIN__) || (defined(__MACH__) && defined(__APPLE__)) || defined(__FreeBSD__) || defined(__DragonFly__) || defined(__OpenBSD__))) || defined(__m68k__) || defined(__mipsn32__) || defined(__mips64__) || defined(__sparc64__) || defined(__hppa__) || defined(__hppa64__) || defined(__arm__) || defined(__armhf__) || defined(__arm64__) || (defined(__powerpc__) && !defined(__powerpc64__) && (defined(__OpenBSD__) || defined(__NetBSD__))) || defined(__powerpc64_elfv2__) || defined(__ia64__) || defined(__x86_64__) || defined(__riscv32__) || defined(__riscv64__)
                                  __VA_SMALL_STRUCT_RETURN |
 #endif
 #if defined(__GNUC__) && !((defined(__mipsn32__) || defined(__mips64__)) && ((__GNUC__ == 3 && __GNUC_MINOR__ >= 4) || (__GNUC__ > 3)))
diff --git vacall/vacall-internal.h vacall/vacall-internal.h
index e29a6d4..b4d60d0 100644
--- vacall/vacall-internal.h
+++ vacall/vacall-internal.h
@@ -279,7 +279,7 @@ typedef struct vacall_alist
 #define __va_start_struct1(LIST,TYPE_SIZE,TYPE_ALIGN,TYPE_SPLITTABLE)  \
   ((LIST)->flags |= __VA_REGISTER_STRUCT_RETURN, 0)
 #endif
-#if (defined(__i386__) && !defined(_WIN32)) || defined(__m68k__) || (defined(__powerpc__) && !defined(__powerpc64__)) || (defined(__s390__) && !defined(__s390x__))
+#if (defined(__i386__) && !defined(_WIN32)) || defined(__m68k__) || (defined(__powerpc__) && !defined(__powerpc64__) && !(defined(__OpenBSD__) || defined(__NetBSD__))) || (defined(__s390__) && !defined(__s390x__))
 #define __va_reg_struct_return(LIST,TYPE_SIZE,TYPE_SPLITTABLE)  \
   ((TYPE_SIZE) == 1 || (TYPE_SIZE) == 2 || (TYPE_SIZE) == 4            \
    || ((TYPE_SIZE) == 8 && (TYPE_SPLITTABLE)                           \
@@ -355,6 +355,16 @@ typedef struct vacall_alist
     && ((LIST)->flags |= __VA_REGISTER_DOUBLESTRUCT_RETURN),                   \
    0)
 #endif
+#if defined(__powerpc__) && !defined(__powerpc64__) && (defined(__OpenBSD__) || defined(__NetBSD__))
+#define __va_reg_struct_return(LIST,TYPE_SIZE,TYPE_SPLITTABLE)  \
+  ((TYPE_SIZE) <= 8)
+/* Turn on __VA_REGISTER_STRUCT_RETURN if __VA_SMALL_STRUCT_RETURN was set
+ * and the struct will actually be returned in registers.
+ */
+#define __va_start_struct1(LIST,TYPE_SIZE,TYPE_ALIGN,TYPE_SPLITTABLE)  \
+  ((LIST)->flags |= __VA_REGISTER_STRUCT_RETURN,   \
+   0)
+#endif
 #if (defined(__powerpc64__) && !defined(__powerpc64_elfv2__)) || defined(__s390x__)
 #define __va_reg_struct_return(LIST,TYPE_SIZE,TYPE_SPLITTABLE)  \
   0
diff --git vacall/vacall-powerpc.c vacall/vacall-powerpc.c
index ea3e208..04a790e 100644
--- vacall/vacall-powerpc.c
+++ vacall/vacall-powerpc.c
@@ -155,18 +155,93 @@ vacall_receiver (__vaword word1, __vaword word2, __vaword word3, __vaword word4,
   } else
   if (list.rtype == __VAstruct) {
     if (list.flags & __VA_REGISTER_STRUCT_RETURN) {
-      if (list.rsize == sizeof(char)) {
-        iret = *(unsigned char *) list.raddr;
-      } else
-      if (list.rsize == sizeof(short)) {
-        iret = *(unsigned short *) list.raddr;
-      } else
-      if (list.rsize == sizeof(int)) {
-        iret = *(unsigned int *) list.raddr;
-      } else
-      if (list.rsize == 2*sizeof(__vaword)) {
-        iret  = ((__vaword *) list.raddr)[0];
-        iret2 = ((__vaword *) list.raddr)[1];
+      if (list.rsize > 0 && list.rsize <= 8) {
+        #if 0 /* Unoptimized */
+        if (list.rsize == 1) {
+          iret =   ((unsigned char *) list.raddr)[0];
+        } else
+        if (list.rsize == 2) {
+          iret =  (((unsigned char *) list.raddr)[0] << 8)
+                |  ((unsigned char *) list.raddr)[1];
+        } else
+        if (list.rsize == 3) {
+          iret =  (((unsigned char *) list.raddr)[0] << 16)
+                | (((unsigned char *) list.raddr)[1] << 8)
+                |  ((unsigned char *) list.raddr)[2];
+        } else
+        if (list.rsize == 4) {
+          iret =  (((unsigned char *) list.raddr)[0] << 24)
+                | (((unsigned char *) list.raddr)[1] << 16)
+                | (((unsigned char *) list.raddr)[2] << 8)
+                |  ((unsigned char *) list.raddr)[3];
+        } else
+        if (list.rsize == 5) {
+          iret  =   ((unsigned char *) list.raddr)[0];
+          iret2 =  (((unsigned char *) list.raddr)[1] << 24)
+                 | (((unsigned char *) list.raddr)[2] << 16)
+                 | (((unsigned char *) list.raddr)[3] << 8)
+                 |  ((unsigned char *) list.raddr)[4];
+        } else
+        if (list.rsize == 6) {
+          iret  =  (((unsigned char *) list.raddr)[0] << 8)
+                 |  ((unsigned char *) list.raddr)[1];
+          iret2 =  (((unsigned char *) list.raddr)[2] << 24)
+                 | (((unsigned char *) list.raddr)[3] << 16)
+                 | (((unsigned char *) list.raddr)[4] << 8)
+                 |  ((unsigned char *) list.raddr)[5];
+        } else
+        if (list.rsize == 7) {
+          iret  =  (((unsigned char *) list.raddr)[0] << 16)
+                 | (((unsigned char *) list.raddr)[1] << 8)
+                 |  ((unsigned char *) list.raddr)[2];
+          iret2 =  (((unsigned char *) list.raddr)[3] << 24)
+                 | (((unsigned char *) list.raddr)[4] << 16)
+                 | (((unsigned char *) list.raddr)[5] << 8)
+                 |  ((unsigned char *) list.raddr)[6];
+        } else
+        if (list.rsize == 8) {
+          iret  =  (((unsigned char *) list.raddr)[0] << 24)
+                 | (((unsigned char *) list.raddr)[1] << 16)
+                 | (((unsigned char *) list.raddr)[2] << 8)
+                 |  ((unsigned char *) list.raddr)[3];
+          iret2 =  (((unsigned char *) list.raddr)[4] << 24)
+                 | (((unsigned char *) list.raddr)[5] << 16)
+                 | (((unsigned char *) list.raddr)[6] << 8)
+                 |  ((unsigned char *) list.raddr)[7];
+        }
+        #else /* Optimized: fewer conditional jumps, fewer memory accesses */
+        uintptr_t count = list.rsize; /* > 0, ≤ 2*sizeof(__vaword) */
+        __vaword* wordaddr = (__vaword*)((uintptr_t)list.raddr & ~(uintptr_t)(sizeof(__vaword)-1));
+        uintptr_t start_offset = (uintptr_t)list.raddr & (uintptr_t)(sizeof(__vaword)-1); /* ≥ 0, < sizeof(__vaword) */
+        uintptr_t end_offset = start_offset + count; /* > 0, < 3*sizeof(__vaword) */
+        if (count <= sizeof(__vaword)) {
+          /* Assign iret. */
+          __vaword mask0 = ((__vaword)2 << (sizeof(__vaword)*8-start_offset*8-1)) - 1;
+          if (end_offset <= sizeof(__vaword)) {
+            /* 0 < end_offset ≤ sizeof(__vaword) */
+            iret = (wordaddr[0] & mask0) >> (sizeof(__vaword)*8-end_offset*8);
+          } else {
+            /* sizeof(__vaword) < end_offset < 2*sizeof(__vaword), start_offset > 0 */
+            iret = ((wordaddr[0] & mask0) << (end_offset*8-sizeof(__vaword)*8))
+                   | (wordaddr[1] >> (2*sizeof(__vaword)*8-end_offset*8));
+          }
+        } else {
+          /* Assign iret, iret2. */
+          __vaword mask0 = ((__vaword)2 << (sizeof(__vaword)*8-start_offset*8-1)) - 1;
+          if (end_offset <= 2*sizeof(__vaword)) {
+            /* sizeof(__vaword) < end_offset ≤ 2*sizeof(__vaword) */
+            iret = (wordaddr[0] & mask0) >> (2*sizeof(__vaword)*8-end_offset*8);
+            iret2 = ((wordaddr[0] & mask0) << (end_offset*4-sizeof(__vaword)*4) << (end_offset*4-sizeof(__vaword)*4))
+                    | (wordaddr[1] >> (2*sizeof(__vaword)*8-end_offset*8));
+          } else {
+            /* 2*sizeof(__vaword) < end_offset < 3*sizeof(__vaword), start_offset > 0 */
+            iret = ((wordaddr[0] & mask0) << (end_offset*8-2*sizeof(__vaword)*8))
+                   | (wordaddr[1] >> (3*sizeof(__vaword)*8-end_offset*8));
+            iret2 = (wordaddr[1] << (end_offset*8-2*sizeof(__vaword)*8))
+                    | (wordaddr[2] >> (3*sizeof(__vaword)*8-end_offset*8));
+          }
+        }
+        #endif
       }
     }
   }
diff --git vacall/vacall.h vacall/vacall.h
index 9485356..4946916 100644
--- vacall/vacall.h
+++ vacall/vacall.h
@@ -128,7 +128,7 @@ enum __VA_alist_flags
 #if defined(__sparc__) && !defined(__sparc64__) && defined(__sun) && (defined(__SUNPRO_C) || defined(__SUNPRO_CC)) /* SUNWspro cc or CC */
                                  __VA_SUNPROCC_STRUCT_RETURN,
 #else
-#if (defined(__i386__) && (defined(_WIN32) || defined(__CYGWIN__) || (defined(__MACH__) && defined(__APPLE__)) || defined(__FreeBSD__) || defined(__DragonFly__) || defined(__OpenBSD__))) || defined(__m68k__) || defined(__mipsn32__) || defined(__mips64__) || defined(__sparc64__) || defined(__hppa__) || defined(__hppa64__) || defined(__arm__) || defined(__armhf__) || defined(__arm64__) || defined(__powerpc64_elfv2__) || defined(__ia64__) || defined(__x86_64__) || defined(__riscv32__) || defined(__riscv64__)
+#if (defined(__i386__) && (defined(_WIN32) || defined(__CYGWIN__) || (defined(__MACH__) && defined(__APPLE__)) || defined(__FreeBSD__) || defined(__DragonFly__) || defined(__OpenBSD__))) || defined(__m68k__) || defined(__mipsn32__) || defined(__mips64__) || defined(__sparc64__) || defined(__hppa__) || defined(__hppa64__) || defined(__arm__) || defined(__armhf__) || defined(__arm64__) || (defined(__powerpc__) && !defined(__powerpc64__) && (defined(__OpenBSD__) || defined(__NetBSD__))) || defined(__powerpc64_elfv2__) || defined(__ia64__) || defined(__x86_64__) || defined(__riscv32__) || defined(__riscv64__)
                                  __VA_SMALL_STRUCT_RETURN |
 #endif
 #if defined(__GNUC__) && !((defined(__mipsn32__) || defined(__mips64__)) && ((__GNUC__ == 3 && __GNUC_MINOR__ >= 4) || (__GNUC__ > 3)))
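
For anyone skimming the diff, the byte layout handled by the "#if 0" (unoptimized) branches above can be sketched generically. This is only an illustration under assumptions: store_small_struct, load_small_struct and tail_mask are hypothetical names, uint32_t stands in for a 32-bit __avword/__vaword, and the patch itself compiles the optimized word-masking branch instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a struct of 1..8 bytes out of the (iret, iret2) pair.
   Sizes up to 4 sit right-justified in iret alone; larger sizes
   sit right-justified in the 64-bit value iret:iret2. */
static void store_small_struct(unsigned char *dst, size_t size,
                               uint32_t iret, uint32_t iret2)
{
  uint64_t v = (size <= 4) ? (uint64_t)iret
                           : (((uint64_t)iret << 32) | iret2);
  for (size_t k = 0; k < size; k++)
    dst[k] = (unsigned char)(v >> (8 * (size - 1 - k)));
}

/* Inverse direction, as in the callback return path. For sizes up
   to 4 the patch leaves iret2 untouched; this sketch zeroes it. */
static void load_small_struct(const unsigned char *src, size_t size,
                              uint32_t *iret, uint32_t *iret2)
{
  uint64_t v = 0;
  for (size_t k = 0; k < size; k++)
    v = (v << 8) | src[k];
  if (size <= 4) {
    *iret = (uint32_t)v;
    *iret2 = 0;
  } else {
    *iret = (uint32_t)(v >> 32);
    *iret2 = (uint32_t)v;
  }
}

/* Same shape as the mask0 expression in the optimized branch: selects
   the bytes of a big-endian 32-bit word from byte start_offset (0..3)
   to the end of the word. */
static uint32_t tail_mask(unsigned start_offset)
{
  return ((uint32_t)2 << (32 - start_offset * 8 - 1)) - 1;
}
```

Writing the mask as 2 << (n-1) rather than 1 << n keeps the shift count below the word width when start_offset is 0, where a shift by the full 32 bits would be undefined in C.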


