This is exactly my point... If MATLAB had to form a C99 complex
matrix and then call zGEMM, rather than make four calls to dGEMM and do
the additions, it would be slower, since creating the C99 complex array
and copying the data into it comes at a cost. In fact xGEMM performs
the operation
C = alpha * A * B + beta * C
and so you might, for example, call dGEMM once passing the two imaginary
parts as A and B with a beta of zero, then make a second call with that
result as C, a beta of -1, and A and B being the real parts, thus
giving you the real part of the complex matrix multiply with two calls
to dGEMM on the real and imaginary parts of the matrices. The same goes
for the calculation of the imaginary part of the matrix multiply. The
underlying code of zGEMM has to do something similar in any case; it
just does the four multiplications element by element instead, so
it is no surprise that it runs at much the same speed as what MATLAB does.
The absence of cGEMM from the symbols of the numeric library is a pretty
good indication that the above is exactly what MathWorks does, as I see no
reason to handle matrix multiplies differently between double and single
precision values.
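
To make the scheme concrete, here is a rough sketch in plain Python. The gemm() function is a naive, hypothetical stand-in for dGEMM (it is not the real BLAS interface, and I am not claiming this is MathWorks' actual code); it only mimics C = alpha * A * B + beta * C. The real part is built as Ai*Bi with beta = 0 followed by Ar*Br with beta = -1, and the imaginary part as Ar*Bi with beta = 0 followed by Ai*Br with beta = 1.

```python
# Naive reference GEMM: a hypothetical stand-in for BLAS dGEMM, not the real API.
def gemm(alpha, A, B, beta, C):
    """Return alpha * A * B + beta * C for matrices stored as lists of rows."""
    n, k, m = len(A), len(B), len(B[0])
    return [[alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
             for j in range(m)]
            for i in range(n)]


def zeros(n, m):
    """Return an n-by-m matrix of zeros."""
    return [[0.0] * m for _ in range(n)]


def complex_matmul_split(Ar, Ai, Br, Bi):
    """Complex product from split real/imaginary parts using four real GEMM calls."""
    n, m = len(Ar), len(Br[0])

    # Real part: Ar*Br - Ai*Bi
    Cr = gemm(1.0, Ai, Bi, 0.0, zeros(n, m))  # Cr = Ai*Bi
    Cr = gemm(1.0, Ar, Br, -1.0, Cr)          # Cr = Ar*Br - Cr

    # Imaginary part: Ar*Bi + Ai*Br
    Ci = gemm(1.0, Ar, Bi, 0.0, zeros(n, m))  # Ci = Ar*Bi
    Ci = gemm(1.0, Ai, Br, 1.0, Ci)           # Ci = Ai*Br + Ci

    return Cr, Ci
```

Note that the split storage never has to be interleaved into a single complex array, which is the copy the zGEMM route would need.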