A zmm register is 512 bits wide, so each one holds 16 single-precision (32-bit) floats. Eight floats is the width of the 256-bit ymm registers used by AVX and AVX2.
VADDPS zmm1,zmm2,zmm3 ; adds the 16 floats in zmm2 to the 16 floats in zmm3 and stores the 16 results in zmm1
VSUBPS zmm1,zmm2,zmm3 ; subtracts the 16 floats in zmm3 from the 16 floats in zmm2 and stores the 16 results in zmm1
VDIVPS zmm1,zmm2,zmm3 ; divides the 16 floats in zmm2 by the 16 floats in zmm3 and stores the 16 results in zmm1
VMOVUPS [memory location],zmm1 ; stores the 16 floats (64 bytes) in zmm1 to the memory location, which does not need to be aligned
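In C these instructions are usually reached through the AVX-512 intrinsics declared in immintrin.h. _mm512_add_ps, _mm512_sub_ps and _mm512_div_ps typically compile to VADDPS, VSUBPS and VDIVPS. _mm512_loadu_ps and _mm512_storeu_ps compile to VMOVUPS loads and stores, and each __m512 value lives in one zmm register. This is a minimal sketch assuming GCC or Clang on x86-64. The function name arith16 is hypothetical, and the function must only be called on a CPU that supports AVX-512F.

```c
#include <immintrin.h>

/* Hypothetical helper: sum = a + b, diff = a - b, quot = a / b for 16 floats.
   target("avx512f") lets GCC/Clang emit AVX-512 code for this function only,
   so call it only on a CPU that supports AVX-512F. The compiler chooses
   which zmm registers to use. */
__attribute__((target("avx512f")))
void arith16(const float a[16], const float b[16],
             float sum[16], float diff[16], float quot[16])
{
    __m512 va = _mm512_loadu_ps(a);                /* 16 floats -> zmm */
    __m512 vb = _mm512_loadu_ps(b);

    _mm512_storeu_ps(sum,  _mm512_add_ps(va, vb)); /* VADDPS, then VMOVUPS store */
    _mm512_storeu_ps(diff, _mm512_sub_ps(va, vb)); /* VSUBPS: a - b */
    _mm512_storeu_ps(quot, _mm512_div_ps(va, vb)); /* VDIVPS: a / b */
}
```

The operand order matches the assembly: _mm512_sub_ps(va, vb) and _mm512_div_ps(va, vb) compute va - vb and va / vb, like zmm2 - zmm3 and zmm2 / zmm3 above.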
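VMOVAPS performs the same store, but the memory address must be divisible by 64 (64-byte aligned). With any other address, the instruction raises a general-protection fault, which typically terminates the program. On CPUs that support AVX-512, VMOVUPS is generally just as fast as VMOVAPS when the address happens to be aligned. Most of the speed benefit therefore comes from aligning the data, not from the choice of instruction: an unaligned 64-byte access spans two cache lines.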
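To use VMOVAPS, the data has to be placed on a 64-byte boundary when the memory is allocated. This is a minimal sketch assuming GCC or Clang on x86-64 with C11. aligned_alloc(64, ...) returns an address that is a multiple of 64, and the intrinsic _mm512_store_ps typically compiles to VMOVAPS. The names fill_aligned and alloc_floats_aligned64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <immintrin.h>

/* Hypothetical helper: writes `value` into the 16 floats at dst with an
   aligned store. dst must be 64-byte aligned, otherwise VMOVAPS faults.
   Call only on a CPU that supports AVX-512F. */
__attribute__((target("avx512f")))
void fill_aligned(float *dst, float value)
{
    assert((uintptr_t)dst % 64 == 0);            /* address must be a multiple of 64 */
    _mm512_store_ps(dst, _mm512_set1_ps(value)); /* VMOVAPS [dst], zmm */
}

/* Hypothetical allocator: room for n floats starting on a 64-byte boundary
   (release with free()). C11 aligned_alloc expects the size to be a
   multiple of the alignment, so the byte count is rounded up to 64. */
float *alloc_floats_aligned64(size_t n)
{
    size_t bytes = (n * sizeof(float) + 63) / 64 * 64;
    return aligned_alloc(64, bytes);
}
```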
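Check whether your CPU supports AVX-512 before using these instructions. On a CPU without it, they raise an invalid-opcode exception, and on Linux the program dies with "Illegal instruction". Support depends on the processor model, not just the brand:
Xeon: many models support it (for example the Xeon Scalable line, starting with Skylake-SP), but older Xeons and some other models do not.
Pentium and Celeron: generally not supported.
Core: some generations support it (for example the Skylake-X based X-series, Ice Lake, Tiger Lake and Rocket Lake). Others do not, for example the consumer Skylake through Comet Lake models, and Alder Lake and later hybrid designs, where it is disabled.
AMD: supported starting with Zen 4.
On Linux, the avx512f flag in /proc/cpuinfo shows whether the foundation instructions are available.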
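A program can also make this check at run time and fall back to ordinary code when AVX-512 is missing, which is safer than relying on the processor's brand. The sketch below assumes GCC or Clang on x86-64, where __builtin_cpu_supports("avx512f") is a compiler built-in that reports whether the AVX-512 Foundation instructions (which include VADDPS and VMOVUPS) can be used. The function add_arrays and its helpers are hypothetical names.

```c
#include <stddef.h>
#include <immintrin.h>

/* AVX-512 path: 16 floats per iteration (VMOVUPS loads, VADDPS, VMOVUPS store).
   Only called after the run-time check in add_arrays succeeds. */
__attribute__((target("avx512f")))
static void add_arrays_avx512(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 sum = _mm512_add_ps(_mm512_loadu_ps(a + i), _mm512_loadu_ps(b + i));
        _mm512_storeu_ps(dst + i, sum);
    }
    for (; i < n; i++)          /* leftover 0..15 elements */
        dst[i] = a[i] + b[i];
}

/* Portable fallback for CPUs without AVX-512. */
static void add_arrays_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Returns 1 if the AVX-512 Foundation instructions can be used, else 0. */
int has_avx512f(void)
{
    return __builtin_cpu_supports("avx512f") != 0;
}

/* Hypothetical entry point: dst[i] = a[i] + b[i] for i < n, taking the
   AVX-512 path only when the CPU supports it. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    if (has_avx512f())
        add_arrays_avx512(dst, a, b, n);
    else
        add_arrays_scalar(dst, a, b, n);
}
```

Because the loads and stores are unaligned, the arrays can start at any address, and n does not need to be a multiple of 16.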