The RTL native environment is close to a usable state for Linux x86-64. Most instructions have been translated to assembler; what remains is to debug them and write test cases, which will take some time.
Consider the simple code,
(define (f n)
  (let loop ((n n) (s 0))
    (if (eq? n 0)
        s
        (loop (- n 1) (+ n s)))))
for n = 100 000 000 I get
                time    relative to native
old vm          2.74s   5x
rtl vm          1.43s   3x
rtl, native     0.54s   1x
One can do better, but then we must make sure to use the CPU registers and not memory-based registers. I would expect a direct C loop to take less than 0.1s.
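
For comparison, here is a minimal sketch of what such a direct C loop might look like. It is not part of the RTL work; the names f and timed_f are hypothetical, chosen only for illustration, and no timing is claimed for it here. With optimization enabled, a C compiler will normally keep n and s in CPU registers, and it may even replace the whole loop with a closed-form expression, so any measurement should be read with that in mind.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Same computation as the Scheme loop: n + (n-1) + ... + 1.
   Hypothetical helper, written only to mirror the benchmark. */
static uint64_t f(uint64_t n)
{
    uint64_t s = 0;
    while (n != 0) {
        s += n;   /* (+ n s) */
        n -= 1;   /* (- n 1) */
    }
    return s;
}

/* Runs f(n), stores the sum in *out and returns the CPU time used,
   in seconds, as measured by clock(). */
static double timed_f(uint64_t n, uint64_t *out)
{
    assert(out != NULL);
    clock_t start = clock();
    *out = f(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling timed_f(100000000ULL, &sum) from a main function and printing the returned value gives a rough figure to set against the table above. The sum for n = 100 000 000 is 5000000050000000, which fits comfortably in a 64-bit integer.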