> +#define PACK_WIDTH (4 << SHIFT)
Incorrect for AVX; it was correct with the #ifdef in v1.
Perhaps just (SHIFT ? 8 : 4)?
That's intentional: the AVX patches change it to an #ifndef that AVX overrides. For now, the purpose of the series is to keep things simple and loop-ified, with AVX remaining in the background. But I can use the ternary operator if you prefer; that's a good suggestion too.
Paolo
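To make the disagreement concrete, here is a small sketch that evaluates both candidate definitions for each register width. It assumes SHIFT is 0 for MMX, 1 for SSE and 2 for AVX, and that the pack helpers work within 128-bit lanes, so 8 is the intended value for AVX. The function names are hypothetical stand-ins for the two macro forms.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two candidate PACK_WIDTH definitions,
 * written as functions so every SHIFT value can be compared at run time.
 * Assumption: SHIFT == 0 for MMX, 1 for SSE (XMM), 2 for AVX (YMM). */
int pack_width_shifted(int shift) { return 4 << shift; }      /* (4 << SHIFT)     */
int pack_width_ternary(int shift) { return shift ? 8 : 4; }   /* (SHIFT ? 8 : 4)  */

/* Both forms agree for MMX and SSE; they only diverge for AVX. */
void check_pack_width_agreement(void)
{
    assert(pack_width_shifted(0) == pack_width_ternary(0));
    assert(pack_width_shifted(1) == pack_width_ternary(1));
}

/* Print both values for each width, to show where they differ. */
void show_pack_widths(void)
{
    for (int shift = 0; shift <= 2; shift++) {
        printf("SHIFT=%d: 4 << SHIFT = %2d, SHIFT ? 8 : 4 = %d\n",
               shift, pack_width_shifted(shift), pack_width_ternary(shift));
    }
}
```

With SHIFT = 2 the shifted form yields 16 while the ternary form stays at 8, which is the AVX case the #ifndef override is meant to cover.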
I think this should be parameterized on the larger of
the two types in the insn, because right now you get
some weird arithmetic in e.g. punpck*dq.
r~
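One possible reading of the suggestion, sketched under stated assumptions: for punpckldq, each loop iteration produces one 64-bit pair from two 32-bit source elements, so the trip count can be derived from the wider (pair) type directly rather than from a byte width divided back down by the element size. The `Reg128` union and `punpckldq_sketch` function are hypothetical and cover a single 128-bit register only; they are not the helpers in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit register layout used only for this sketch. */
typedef union {
    uint32_t l[4];
    uint64_t q[2];
} Reg128;

/* punpckldq: interleave the low dwords of a and b into d.
 * The loop bound comes from the larger type (one uint64_t pair per
 * iteration), so it is simply sizeof(Reg128) / sizeof(uint64_t). */
void punpckldq_sketch(Reg128 *d, const Reg128 *a, const Reg128 *b)
{
    Reg128 r;
    const size_t pairs = sizeof(Reg128) / sizeof(uint64_t); /* 2 pairs */

    for (size_t i = 0; i < pairs; i++) {
        r.l[2 * i]     = a->l[i]; /* low dword of the pair from a  */
        r.l[2 * i + 1] = b->l[i]; /* high dword of the pair from b */
    }
    *d = r; /* write through a temporary so d may alias a or b */
}

/* Usage check: {1,2,3,4} and {5,6,7,8} interleave to {1,5,2,6}. */
void punpckldq_selftest(void)
{
    Reg128 a = { { 1, 2, 3, 4 } }, b = { { 5, 6, 7, 8 } }, d;

    punpckldq_sketch(&d, &a, &b);
    assert(d.l[0] == 1 && d.l[1] == 5 && d.l[2] == 2 && d.l[3] == 6);
}
```

Writing through a temporary keeps the helper correct when the destination is also a source, which is the common case for two-operand SSE forms.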