[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PULL 41/47] target/arm/vec_helper: Handle oprsz less than 16 bytes in i
From: |
Peter Maydell |
Subject: |
[PULL 41/47] target/arm/vec_helper: Handle oprsz less than 16 bytes in indexed operations |
Date: |
Tue, 1 Sep 2020 16:18:17 +0100 |
In the gvec helper functions for indexed operations, for AArch32
Neon the oprsz (total size of the vector) can be less than 16 bytes
if the operation is on a D reg. Since the inner loop in these
helpers always goes from 0 to segment, we must clamp it based
on oprsz to avoid processing a full 16 byte segment when asked to
handle an 8 byte wide vector.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200828183354.27913-43-peter.maydell@linaro.org
---
target/arm/vec_helper.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 20f153b47a1..b27b90e1dd8 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -1040,7 +1040,8 @@ DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32)
#define DO_MUL_IDX(NAME, TYPE, H) \
void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
{ \
- intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \
+ intptr_t i, j, oprsz = simd_oprsz(desc); \
+ intptr_t segment = MIN(16, oprsz) / sizeof(TYPE); \
intptr_t idx = simd_data(desc); \
TYPE *d = vd, *n = vn, *m = vm; \
for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \
@@ -1061,7 +1062,8 @@ DO_MUL_IDX(gvec_mul_idx_d, uint64_t, )
#define DO_MLA_IDX(NAME, TYPE, OP, H) \
void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
{ \
- intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \
+ intptr_t i, j, oprsz = simd_oprsz(desc); \
+ intptr_t segment = MIN(16, oprsz) / sizeof(TYPE); \
intptr_t idx = simd_data(desc); \
TYPE *d = vd, *n = vn, *m = vm, *a = va; \
for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \
@@ -1086,7 +1088,8 @@ DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -, )
#define DO_FMUL_IDX(NAME, TYPE, H) \
void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
{ \
- intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \
+ intptr_t i, j, oprsz = simd_oprsz(desc); \
+ intptr_t segment = MIN(16, oprsz) / sizeof(TYPE); \
intptr_t idx = simd_data(desc); \
TYPE *d = vd, *n = vn, *m = vm; \
for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \
@@ -1108,7 +1111,8 @@ DO_FMUL_IDX(gvec_fmul_idx_d, float64, )
void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, \
void *stat, uint32_t desc) \
{ \
- intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \
+ intptr_t i, j, oprsz = simd_oprsz(desc); \
+ intptr_t segment = MIN(16, oprsz) / sizeof(TYPE); \
TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1); \
intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1); \
TYPE *d = vd, *n = vn, *m = vm, *a = va; \
--
2.20.1
- [PULL 31/47] target/arm: Implement fp16 for Neon fp compare-vs-0, (continued)
- [PULL 31/47] target/arm: Implement fp16 for Neon fp compare-vs-0, Peter Maydell, 2020/09/01
- [PULL 33/47] target/arm: Implement fp16 for Neon VRSQRTS, Peter Maydell, 2020/09/01
- [PULL 34/47] target/arm: Implement fp16 for Neon pairwise fp ops, Peter Maydell, 2020/09/01
- [PULL 35/47] target/arm: Implement fp16 for Neon float-integer VCVT, Peter Maydell, 2020/09/01
- [PULL 32/47] target/arm: Implement fp16 for Neon VRECPS, Peter Maydell, 2020/09/01
- [PULL 36/47] target/arm: Convert Neon VCVT fixed-point to gvec, Peter Maydell, 2020/09/01
- [PULL 37/47] target/arm: Implement fp16 for Neon VCVT fixed-point, Peter Maydell, 2020/09/01
- [PULL 39/47] target/arm: Implement fp16 for Neon VRINT-with-specified-rounding-mode, Peter Maydell, 2020/09/01
- [PULL 40/47] target/arm: Implement fp16 for Neon VRINTX, Peter Maydell, 2020/09/01
- [PULL 38/47] target/arm: Implement fp16 for Neon VCVT with rounding modes, Peter Maydell, 2020/09/01
- [PULL 41/47] target/arm/vec_helper: Handle oprsz less than 16 bytes in indexed operations,
Peter Maydell <=
- [PULL 42/47] target/arm/vec_helper: Add gvec fp indexed multiply-and-add operations, Peter Maydell, 2020/09/01
- [PULL 43/47] target/arm: Implement fp16 for Neon VMUL, VMLA, VMLS, Peter Maydell, 2020/09/01
- [PULL 44/47] target/arm: Enable FP16 in '-cpu max', Peter Maydell, 2020/09/01
- [PULL 45/47] hw/arm/sbsa-ref: add "reg" property to DT cpu nodes, Peter Maydell, 2020/09/01
- [PULL 47/47] hw/arm/sbsa-ref : Add embedded controller in secure memory, Peter Maydell, 2020/09/01
- [PULL 46/47] hw/misc/sbsa_ec : Add an embedded controller for sbsa-ref, Peter Maydell, 2020/09/01
- Re: [PULL 00/47] target-arm queue, Peter Maydell, 2020/09/01
- Re: [PULL 00/47] target-arm queue, no-reply, 2020/09/02